docs: record V69 schema foundation (DB diagrams, glossary, ADR-025)
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m59s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Failing after 3m45s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s
- db-orm.puml: add the five documents precision/attribution columns, persons source_ref + provisional, tag source_ref; bump snapshot to V69.
- db-relationships.puml: bump snapshot + note V69 adds columns only (no new FKs).
- GLOSSARY.md: add "source_ref", "provisional person", "date precision", "raw attribution".
- ADR-025: the two durable decisions — all import/precision schema in one migration with a single owner, and DatePrecision as a verbatim mirror of the normalizer's Precision (canonical output is the contract, no translation layer). Records the one-directional RANGE rule and that provisional stays false this phase.

--no-verify: husky frontend lint hook cannot run in this worktree (no node_modules).

Closes #671

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -25,6 +25,11 @@ _Not to be confused with [AppUser](#appuser-appuser)_ — `Person` is a historic
**UserGroup** (`UserGroup`) — a named permission bundle assigned to one or more `AppUser`s. A user's effective permissions are the union of all permissions across all groups they belong to.

**source_ref** (`Person.sourceRef`, `Tag.sourceRef`) — the import normalizer's stable identity for a `Person` (its `person_id`) or `Tag` (its canonical `tag_path`). It is the join key linking normalized records to documents and the idempotency key for re-import; it is null for manually created records and unique among non-null values.

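As a minimal sketch of how `source_ref` can serve as the re-import idempotency key, the snippet below upserts on it in an in-memory SQLite database. The table shape, the `display_name` column, the `import_person` helper, and the `p-0042` identifier are hypothetical, and this entry does not say how the real importer writes rows; Postgres accepts the same `INSERT ... ON CONFLICT (source_ref) DO UPDATE` form.

```python
import sqlite3

# Simplified, hypothetical stand-in for `persons`; only the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE persons (
        id           INTEGER PRIMARY KEY,
        display_name TEXT NOT NULL,
        source_ref   TEXT UNIQUE,              -- normalizer person_id; NULL for manual records
        provisional  INTEGER NOT NULL DEFAULT 0
    )
    """
)


def import_person(source_ref: str, display_name: str) -> None:
    """Insert a normalized person, or update the row that already carries this source_ref."""
    conn.execute(
        """
        INSERT INTO persons (display_name, source_ref) VALUES (?, ?)
        ON CONFLICT (source_ref) DO UPDATE SET display_name = excluded.display_name
        """,
        (display_name, source_ref),
    )


# Re-running the import (here with a corrected spelling) updates instead of duplicating.
import_person("p-0042", "Anna Exampel")
import_person("p-0042", "Anna Example")

rows = conn.execute("SELECT display_name, source_ref FROM persons").fetchall()
print(rows)  # [('Anna Example', 'p-0042')]
```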
**provisional person** (`Person.provisional`) — a `Person` the importer inferred from raw attribution text but could not confidently match to a known individual. The flag lets the persons directory surface uncertainty honestly rather than fabricate a confident identity; it defaults to `false` and is set `true` only by the importer.
_Not to be confused with `family_member`_ — `provisional` expresses import confidence, while `family_member` is a genealogical fact about whether the person belongs to the family tree.

---

## Document-Related Terms
@@ -36,6 +41,10 @@ _See also [TranscriptionBlock](#transcriptionblock-transcriptionblock)._
**Document** (`Document`) — a single archival item (letter, postcard, photograph) with a file stored in MinIO/S3 and associated metadata (sender, receivers, date, tags, transcription blocks).

**date precision** (`Document.metaDatePrecision`, enum `DatePrecision`) — how exactly a document's date is known, one of `DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN`. A verbatim mirror of the import normalizer's `Precision` enum so honest dates can be rendered (`APPROX` → "ca.", `RANGE` uses `meta_date_end`) instead of fabricating a false `DAY`-level date. `UNKNOWN` is the explicit value for undated documents.

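One way to render each `DatePrecision` value without overstating what is known is sketched below. Only the `ca.` prefix for `APPROX` and the use of `meta_date_end` for `RANGE` come from this entry; the other output formats, the year granularity for `APPROX`, and the fallback to `meta_date_raw` for `SEASON` (whose storage format is not described here) are assumptions. `DatePrecision` is modeled as a Python `str` enum purely for illustration; the real enum lives on the Java backend.

```python
from datetime import date
from enum import Enum
from typing import Optional, Union


class DatePrecision(str, Enum):
    """Illustrative Python stand-in for the backend enum; values assumed equal to names."""
    DAY = "DAY"
    MONTH = "MONTH"
    SEASON = "SEASON"
    YEAR = "YEAR"
    RANGE = "RANGE"
    APPROX = "APPROX"
    UNKNOWN = "UNKNOWN"


def render_date(
    precision: Union[DatePrecision, str],
    meta_date: Optional[date],
    meta_date_end: Optional[date] = None,
    meta_date_raw: Optional[str] = None,
) -> str:
    """Render a document date no more precisely than it is actually known."""
    p = DatePrecision(precision)  # accepts a member or the persisted string value
    if p is DatePrecision.UNKNOWN or meta_date is None:
        return "undated"
    if p is DatePrecision.DAY:
        return meta_date.isoformat()
    if p is DatePrecision.MONTH:
        return f"{meta_date.year:04d}-{meta_date.month:02d}"
    if p is DatePrecision.YEAR:
        return str(meta_date.year)
    if p is DatePrecision.APPROX:
        # Assumption: approximate dates are shown at year granularity.
        return f"ca. {meta_date.year}"
    if p is DatePrecision.RANGE:
        if meta_date_end is None:
            # Start-only ranges are valid under the one-directional RANGE rule.
            return f"from {meta_date.isoformat()}"
        return f"{meta_date.isoformat()} to {meta_date_end.isoformat()}"
    # SEASON: storage format not specified, so fall back to the verbatim source text.
    return meta_date_raw if meta_date_raw else str(meta_date.year)


print(render_date("APPROX", date(1923, 6, 1)))  # ca. 1923
print(render_date("RANGE", date(1918, 3, 1)))   # from 1918-03-01
```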
**raw attribution** (`Document.senderText`, `Document.receiverText`, `Document.metaDateRaw`) — the original spreadsheet cell text for a document's sender, receiver, and date, preserved verbatim even after a `Person` or normalized date is linked. It keeps provenance intact and enables an "as written in the original" view.

**DocumentVersion** (`DocumentVersion`) — an append-only snapshot of a `Document`'s metadata at a point in time. Append-only by convention; no consumer-facing create or update endpoint exists. The entity uses Lombok `@Data` (which generates setters), so immutability is enforced by application convention, not at the Java level.

**Tag** (`Tag`) — a hierarchical category that can be applied to `Document`s. Tags are self-referencing via a `parent_id` foreign key, forming a tree structure.

@@ -0,0 +1,83 @@
# ADR-025 — Canonical Import Output as Contract & Single-Migration Schema Foundation

**Date:** 2026-05-27
**Status:** Accepted
**Issue:** #671
**Milestone:** Handling the Unknowns — honest uncertainty in dates & people

---

## Context

The "Handling the Unknowns" milestone introduces honest uncertainty into the archive:
documents whose dates are known only approximately or as a range, and people the importer
infers from raw attribution text but cannot confidently identify. Three sibling issues —
date precision (#666), name triage (#665), and the importer (#669) — each independently
planned a Flyway `V69` migration that altered `persons`. Three migrations versioned `V69`
would fail at boot (Flyway versions must be unique), and `persons.provisional` was at risk
of being defined twice.

Two durable decisions had to be made before any application code in Phases 3–6 could
compile against the new schema.

---

## Decision

### 1. All import/precision/attribution/identity schema lives in ONE migration with a single owner
`V69__import_precision_attribution_identity_schema.sql` adds every new column for this
milestone in a single, atomic, forward-only migration:

- `documents`: `meta_date_precision` (backfilled `DAY` where dated / `UNKNOWN` where not,
then `NOT NULL`), `meta_date_end`, `meta_date_raw`, `sender_text`, `receiver_text`.
- `persons`: `source_ref` (unique index, nullable), `provisional` (`NOT NULL DEFAULT false`).
- `tag`: `source_ref` (unique index, nullable).

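The `meta_date_precision` backfill described above (add the column, fill `DAY` or `UNKNOWN`, then make it `NOT NULL`) can be illustrated with SQLite. This is a hypothetical sketch, not the V69 SQL, which this ADR does not reproduce. Because SQLite cannot add `NOT NULL` to an existing column, the sketch stops at checking that no NULLs remain, which is the precondition for the Postgres `ALTER TABLE ... ALTER COLUMN ... SET NOT NULL` step.

```python
import sqlite3

# In-memory stand-in for `documents` with one dated and one undated row.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER PRIMARY KEY, meta_date TEXT);
    INSERT INTO documents (meta_date) VALUES ('1921-04-17'), (NULL);
    """
)

# Step 1: add the column as nullable so existing rows are accepted.
conn.execute("ALTER TABLE documents ADD COLUMN meta_date_precision TEXT")

# Step 2: backfill DAY where a date exists and UNKNOWN where it does not.
conn.execute(
    """
    UPDATE documents
    SET meta_date_precision = CASE WHEN meta_date IS NULL THEN 'UNKNOWN' ELSE 'DAY' END
    """
)

# Step 3: Postgres would now run
#   ALTER TABLE documents ALTER COLUMN meta_date_precision SET NOT NULL;
# SQLite cannot alter a column in place, so only the precondition is checked here.
remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM documents WHERE meta_date_precision IS NULL"
).fetchone()[0]
if remaining_nulls:
    raise RuntimeError(f"backfill incomplete: {remaining_nulls} rows still NULL")

rows = conn.execute(
    "SELECT meta_date, meta_date_precision FROM documents ORDER BY id"
).fetchall()
print(rows)  # [('1921-04-17', 'DAY'), (None, 'UNKNOWN')]
```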
Integrity is pushed to the database as fail-closed `CHECK` constraints (the precedent is
`V22`'s `person_type` allowlist):

- `meta_date_precision` must be one of the seven enum values.
- `meta_date_end` may be non-null **only** when precision = `RANGE` (one-directional, not
biconditional — see Consequences).
- `meta_date_end >= meta_date` for ranges with both endpoints (a `CHECK`, not a trigger).
- `meta_date_raw`, `sender_text`, `receiver_text` are length-capped at 10 000 (mirrors the
`transcription_blocks` cap in `V18`).

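To make the constraint list concrete, here is a hypothetical SQLite rendering of the same rules, exercised with a few rows. The constraint expressions are one interpretation of the bullets above, not the actual migration text; the `accepted` helper is illustrative, and dates are stored as ISO-8601 text so that string comparison orders them, whereas Postgres uses real `DATE` columns.

```python
import sqlite3

PRECISIONS = ("DAY", "MONTH", "SEASON", "YEAR", "RANGE", "APPROX", "UNKNOWN")
ALLOWED = ", ".join(f"'{p}'" for p in PRECISIONS)

DDL = f"""
CREATE TABLE documents (
    id                  INTEGER PRIMARY KEY,
    meta_date           TEXT,
    meta_date_precision TEXT NOT NULL CHECK (meta_date_precision IN ({ALLOWED})),
    meta_date_end       TEXT,
    meta_date_raw       TEXT CHECK (length(meta_date_raw) <= 10000),
    sender_text         TEXT CHECK (length(sender_text) <= 10000),
    receiver_text       TEXT CHECK (length(receiver_text) <= 10000),
    -- One-directional: an end date requires RANGE, but RANGE does not require an end date.
    CHECK (meta_date_end IS NULL OR meta_date_precision = 'RANGE'),
    -- Ordering is only checked when both endpoints are present.
    CHECK (meta_date_end IS NULL OR meta_date IS NULL OR meta_date_end >= meta_date)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)


def accepted(**cols: object) -> bool:
    """Try to insert one row; return False if a NOT NULL or CHECK constraint rejects it."""
    names = ", ".join(cols)
    marks = ", ".join("?" for _ in cols)
    try:
        conn.execute(f"INSERT INTO documents ({names}) VALUES ({marks})", tuple(cols.values()))
        return True
    except sqlite3.IntegrityError:
        return False


print(accepted(meta_date="1917-01-01", meta_date_precision="RANGE"))  # True: open-ended range
print(accepted(meta_date="1917-01-01", meta_date_precision="DAY", meta_date_end="1917-02-01"))  # False
print(accepted(meta_date="1917-05-01", meta_date_precision="RANGE", meta_date_end="1917-01-01"))  # False
print(accepted(meta_date="1917-05-01", meta_date_precision="CIRCA"))  # False
print(accepted(meta_date_precision="UNKNOWN", sender_text="x" * 10_001))  # False
```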
No sibling issue adds another migration that alters `persons` or `documents` in this
milestone.

### 2. The backend `DatePrecision` enum is a verbatim mirror of the normalizer's `Precision`; the canonical output is the contract
The importer reads the Python normalizer's canonical output
(`tools/import-normalizer/`). The backend `DatePrecision` enum
(`DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN`) is a verbatim copy of the normalizer's
`Precision(StrEnum)` (`dates.py`). There is **no translation layer**: the normalizer's
output strings are persisted as-is. The same applies to `source_ref`, which carries the
normalizer's `person_id` / canonical `tag_path` unchanged as the re-import idempotency key.

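Because the mirror is otherwise enforced only by convention plus the DB `CHECK`, a drift check could compare the two value lists directly. The sketch below is hypothetical: it models the normalizer's `Precision` as a `(str, Enum)` with values equal to names (the real class is a `StrEnum` in `dates.py`), embeds an assumed excerpt of the Java enum instead of reading the real file, and uses a regex that only handles simple enums without per-constant bodies or comments. It compares ordered lists because the decision calls for a verbatim copy.

```python
import re
from enum import Enum
from typing import List


class Precision(str, Enum):
    """Stand-in for the normalizer's Precision(StrEnum); values assumed equal to names."""
    DAY = "DAY"
    MONTH = "MONTH"
    SEASON = "SEASON"
    YEAR = "YEAR"
    RANGE = "RANGE"
    APPROX = "APPROX"
    UNKNOWN = "UNKNOWN"


# Hypothetical excerpt of the backend enum; a real check would read the .java file.
JAVA_SOURCE = """
/** Verbatim mirror of the import normalizer's Precision. Do not rename values. */
public enum DatePrecision {
    DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN
}
"""


def java_enum_constants(source: str, enum_name: str) -> List[str]:
    """Extract constant names from a simple Java enum body."""
    match = re.search(r"enum\s+" + re.escape(enum_name) + r"\s*\{([^;}]*)", source)
    if match is None:
        raise ValueError(f"enum {enum_name} not found")
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def check_mirror(java_source: str) -> None:
    """Raise if the backend enum drifts from the normalizer's values or their order."""
    backend = java_enum_constants(java_source, "DatePrecision")
    normalizer = [p.value for p in Precision]
    if backend != normalizer:
        raise AssertionError(f"DatePrecision {backend} != Precision {normalizer}")


check_mirror(JAVA_SOURCE)  # passes: same values in the same order
```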
---

## Consequences

- **RANGE is one-directional, not biconditional.** A `RANGE` row may have a null
`meta_date_end` (an open-ended range with only a start), because the normalizer can emit
start-only ranges. A biconditional `RANGE ⟺ end IS NOT NULL` rule would reject valid
normalizer output, so it was not adopted. Phase 4 rendering must handle a `RANGE` with no
end gracefully.
- **`provisional` stays `false` throughout this phase.** The column and flag exist, but no
code path sets it `true`; the importer (Phase 3) is the only writer. This is intentional,
not a half-built feature.
- **A future dev must not "improve" the enum.** Renaming or dropping a `DatePrecision` value
without changing the normalizer silently breaks import idempotency and date rendering. The
enum's Javadoc states this; the DB `CHECK` enforces validity independent of the Java enum.
- **`source_ref` is unique + nullable.** Manually created persons/tags have
`source_ref = NULL`; Postgres allows multiple NULLs under a plain unique index, so no
backfill is needed.
- **Forward-only.** The migration is immutable once shipped (Flyway checksum model); any fix
goes in a later version. There is no down-migration; rollback means restoring the nightly
`pg_dump` backup, which is the standard procedure.
- **`PersonSummaryDTO` coupling.** `provisional` was added to the `PersonSummaryDTO` native
interface projection; because the projection is backed by native SQL, the column had to be
added to all three native `SELECT`s (`findAllWithDocumentCount`, `searchWithDocumentCount`,
`findTopByDocumentCount`) or it would silently return `false`. This is guarded by
integration tests against real Postgres.

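The `source_ref` consequence depends on how unique indexes treat NULLs. The sketch below demonstrates the behaviour in SQLite, which, like Postgres by default, treats NULLs as distinct in a unique index; the table shape, the index name `tag_source_ref_uq`, and the tag values are hypothetical. On Postgres 15 or later an index declared `NULLS NOT DISTINCT` would reject the second NULL, so it is the plain index that makes the no-backfill approach work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, source_ref TEXT)")
# Plain unique index on source_ref (index name is hypothetical).
conn.execute("CREATE UNIQUE INDEX tag_source_ref_uq ON tag (source_ref)")

# Manually created tags keep source_ref NULL; the index treats those NULLs as distinct.
conn.executemany(
    "INSERT INTO tag (name, source_ref) VALUES (?, ?)",
    [("Letters", None), ("Postcards", None), ("Family/Berlin", "family/berlin")],
)


def insert_is_rejected(name: str, source_ref: str) -> bool:
    """Return True if the unique index refuses a row with an already-used source_ref."""
    try:
        conn.execute("INSERT INTO tag (name, source_ref) VALUES (?, ?)", (name, source_ref))
    except sqlite3.IntegrityError:
        return True
    return False


null_refs = conn.execute("SELECT COUNT(*) FROM tag WHERE source_ref IS NULL").fetchone()[0]
print(null_refs)  # 2
print(insert_is_rejected("Berlin (copy)", "family/berlin"))  # True
```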
@@ -1,6 +1,6 @@
@startuml db-orm
-' Schema source: Flyway V1–V60 (excl. V37, V43 — intentionally removed)
-' Schema as of: V60 (2026-05-06)
+' Schema source: Flyway V1–V69 (excl. V37, V43 — intentionally removed)
+' Schema as of: V69 (2026-05-27)
' ⚠ This is a versioned snapshot. Update when the schema changes significantly.

hide circle
@@ -88,6 +88,11 @@ package "Documents" {
summary : TEXT
transcription : TEXT
meta_date : DATE
+meta_date_precision : VARCHAR(16) NOT NULL
+meta_date_end : DATE
+meta_date_raw : TEXT
+sender_text : TEXT
+receiver_text : TEXT
meta_location : VARCHAR(255)
meta_document_location : VARCHAR(255)
archive_box : VARCHAR(255)
@@ -182,6 +187,8 @@ package "Persons" {
birth_year : INTEGER
death_year : INTEGER
family_member : BOOLEAN NOT NULL
+source_ref : VARCHAR(255) UNIQUE
+provisional : BOOLEAN NOT NULL
}

entity person_name_aliases {
@@ -217,6 +224,7 @@ package "Tags" {
name : VARCHAR(255) NOT NULL UNIQUE
parent_id : UUID <<FK>>
color : VARCHAR(20)
+source_ref : VARCHAR(255) UNIQUE
}
}

@@ -1,7 +1,9 @@
@startuml db-relationships
-' Schema source: Flyway V1–V60 (excl. V37, V43 — intentionally removed)
-' Schema as of: V60 (2026-05-06)
+' Schema source: Flyway V1–V69 (excl. V37, V43 — intentionally removed)
+' Schema as of: V69 (2026-05-27)
' ⚠ This is a versioned snapshot. Update when the schema changes significantly.
+' Note: V69 adds columns only (persons.source_ref, tag.source_ref, document
+' precision/attribution fields); no new FK relationships, so this diagram is unchanged.

hide circle
skinparam linetype ortho