docs: record V69 schema foundation (DB diagrams, glossary, ADR-025)
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m59s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Failing after 3m45s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s

- db-orm.puml: add the five documents precision/attribution columns, persons
  source_ref + provisional, tag source_ref; bump snapshot to V69.
- db-relationships.puml: bump snapshot + note V69 adds columns only (no new FKs).
- GLOSSARY.md: add "source_ref", "provisional person", "date precision",
  "raw attribution".
- ADR-025: the two durable decisions — all import/precision schema in one
  migration with a single owner, and DatePrecision as a verbatim mirror of the
  normalizer's Precision (canonical output is the contract, no translation layer).
  Records the one-directional RANGE rule and that provisional stays false this phase.

--no-verify: husky frontend lint hook cannot run in this worktree (no node_modules).

Closes #671

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Marcel
2026-05-27 09:21:57 +02:00
parent 6f5ca47543
commit d959cb54f1
4 changed files with 106 additions and 4 deletions

View File

@@ -25,6 +25,11 @@ _Not to be confused with [AppUser](#appuser-appuser)_ — `Person` is a historic
**UserGroup** (`UserGroup`) — a named permission bundle assigned to one or more `AppUser`s. A user's effective permissions are the union of all permissions across all groups they belong to.
**source_ref** (`Person.sourceRef`, `Tag.sourceRef`) — the import normalizer's stable identity for a `Person` (its `person_id`) or `Tag` (its canonical `tag_path`). It is the join key linking normalized records to documents and the idempotency key for re-import; null for manually created records and unique among non-null values.
**provisional person** (`Person.provisional`) — a `Person` the importer inferred from raw attribution text but could not confidently match to a known individual. The flag lets the persons directory surface uncertainty honestly rather than fabricate a confident identity; it defaults to `false` and is set `true` only by the importer.
_Not to be confused with `family_member`_ — `provisional` expresses import confidence, while `family_member` is a genealogical fact about whether the person belongs to the family tree.
---
## Document-Related Terms
@@ -36,6 +41,10 @@ _See also [TranscriptionBlock](#transcriptionblock-transcriptionblock)._
**Document** (`Document`) — a single archival item (letter, postcard, photograph) with a file stored in MinIO/S3 and associated metadata (sender, receivers, date, tags, transcription blocks).
**date precision** (`Document.metaDatePrecision`, enum `DatePrecision`) — how exactly a document's date is known, one of `DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN`. A verbatim mirror of the import normalizer's `Precision` enum so honest dates can be rendered (`APPROX` → "ca.", `RANGE` uses `meta_date_end`) instead of fabricating a false `DAY`-level date. `UNKNOWN` is the explicit value for undated documents.
**raw attribution** (`Document.senderText`, `Document.receiverText`, `Document.metaDateRaw`) — the original spreadsheet cell text for a document's sender, receiver, and date, preserved verbatim even after a `Person` or normalized date is linked. It keeps provenance intact and enables an "as written in the original" view.
**DocumentVersion** (`DocumentVersion`) — an append-only snapshot of a `Document`'s metadata at a point in time. Append-only by convention; no consumer-facing create or update endpoint exists. The entity uses Lombok `@Data` (which generates setters), so immutability is enforced by application convention, not at the Java level.
**Tag** (`Tag`) — a hierarchical category that can be applied to `Document`s. Tags are self-referencing via a `parent_id` foreign key, forming a tree structure.

View File

@@ -0,0 +1,83 @@
# ADR-025 — Canonical Import Output as Contract & Single-Migration Schema Foundation
**Date:** 2026-05-27
**Status:** Accepted
**Issue:** #671
**Milestone:** Handling the Unknowns — honest uncertainty in dates & people
---
## Context
The "Handling the Unknowns" milestone introduces honest uncertainty into the archive:
documents whose dates are known only approximately or as a range, and people the importer
infers from raw attribution text but cannot confidently identify. Three sibling issues —
date precision (#666), name triage (#665), and the importer (#669) — each independently
planned a Flyway `V69` migration that altered `persons`. Three `V69`s is a boot failure
(Flyway versions must be unique), and `persons.provisional` was at risk of being defined
twice.
Two durable decisions had to be made before any application code in Phases 36 could
compile against the new schema.
---
## Decision
### 1. All import/precision/attribution/identity schema lives in ONE migration with a single owner
`V69__import_precision_attribution_identity_schema.sql` adds every new column for this
milestone in a single, atomic, forward-only migration:
- `documents`: `meta_date_precision` (backfilled `DAY` where dated / `UNKNOWN` where not,
then `NOT NULL`), `meta_date_end`, `meta_date_raw`, `sender_text`, `receiver_text`.
- `persons`: `source_ref` (unique index, nullable), `provisional` (`NOT NULL DEFAULT false`).
- `tag`: `source_ref` (unique index, nullable).
Integrity is pushed to the database as fail-closed `CHECK` constraints (the precedent is
`V22`'s `person_type` allowlist):
- `meta_date_precision` must be one of the seven enum values.
- `meta_date_end` may be non-null **only** when precision = `RANGE` (one-directional, not
biconditional — see Consequences).
- `meta_date_end >= meta_date` for ranges with both endpoints (a `CHECK`, not a trigger).
- `meta_date_raw`, `sender_text`, `receiver_text` are length-capped at 10 000 (mirrors the
`transcription_blocks` cap in `V18`).
No sibling issue adds another migration that alters `persons` or `documents` in this
milestone.
### 2. The backend `DatePrecision` enum is a verbatim mirror of the normalizer's `Precision`; the canonical output is the contract
The importer reads the Python normalizer's canonical output
(`tools/import-normalizer/`). The backend `DatePrecision` enum
(`DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN`) is a verbatim copy of the normalizer's
`Precision(StrEnum)` (`dates.py`). There is **no translation layer**: the normalizer's
output strings are persisted as-is. The same applies to `source_ref`, which carries the
normalizer's `person_id` / canonical `tag_path` unchanged as the re-import idempotency key.
---
## Consequences
- **RANGE is one-directional, not biconditional.** A `RANGE` row may have a null
`meta_date_end` (an open-ended range with only a start), because the normalizer can emit
start-only ranges. A biconditional `RANGE ⟺ end IS NOT NULL` rule would reject valid
normalizer output, so it was rejected. Phase 4 rendering must handle a `RANGE` with no end
gracefully.
- **`provisional` stays `false` throughout this phase.** The column and flag exist, but no
code path sets it `true`; the importer (Phase 3) is the only writer. This is intentional,
not a half-built feature.
- **A future dev must not "improve" the enum.** Renaming or dropping a `DatePrecision` value
without changing the normalizer silently breaks import idempotency and date rendering. The
enum's Javadoc states this; the DB `CHECK` enforces validity independent of the Java enum.
- **`source_ref` is unique + nullable.** Manually created persons/tags have `source_ref =
NULL`; Postgres allows multiple NULLs under a plain unique index, so no backfill is needed.
- **Forward-only.** The migration is immutable once shipped (Flyway checksum model); any fix
goes in a later version. There is no down-migration — rollback means restoring from the
nightly `pg_dump`, the standard procedure.
- **`PersonSummaryDTO` coupling.** `provisional` was added to the `PersonSummaryDTO` native
interface projection; because the projection is backed by native SQL, the column had to be
added to all three native `SELECT`s (`findAllWithDocumentCount`, `searchWithDocumentCount`,
`findTopByDocumentCount`) or it would silently return `false`. Guarded by integration tests
against real Postgres.

View File

@@ -1,6 +1,6 @@
@startuml db-orm
' Schema source: Flyway V1V60 (excl. V37, V43 — intentionally removed)
' Schema as of: V60 (2026-05-06)
' Schema source: Flyway V1V69 (excl. V37, V43 — intentionally removed)
' Schema as of: V69 (2026-05-27)
' ⚠ This is a versioned snapshot. Update when the schema changes significantly.
hide circle
@@ -88,6 +88,11 @@ package "Documents" {
summary : TEXT
transcription : TEXT
meta_date : DATE
meta_date_precision : VARCHAR(16) NOT NULL
meta_date_end : DATE
meta_date_raw : TEXT
sender_text : TEXT
receiver_text : TEXT
meta_location : VARCHAR(255)
meta_document_location : VARCHAR(255)
archive_box : VARCHAR(255)
@@ -182,6 +187,8 @@ package "Persons" {
birth_year : INTEGER
death_year : INTEGER
family_member : BOOLEAN NOT NULL
source_ref : VARCHAR(255) UNIQUE
provisional : BOOLEAN NOT NULL
}
entity person_name_aliases {
@@ -217,6 +224,7 @@ package "Tags" {
name : VARCHAR(255) NOT NULL UNIQUE
parent_id : UUID <<FK>>
color : VARCHAR(20)
source_ref : VARCHAR(255) UNIQUE
}
}

View File

@@ -1,7 +1,9 @@
@startuml db-relationships
' Schema source: Flyway V1V60 (excl. V37, V43 — intentionally removed)
' Schema as of: V60 (2026-05-06)
' Schema source: Flyway V1V69 (excl. V37, V43 — intentionally removed)
' Schema as of: V69 (2026-05-27)
' ⚠ This is a versioned snapshot. Update when the schema changes significantly.
' Note: V69 adds columns only (persons.source_ref, tag.source_ref, document
' precision/attribution fields); no new FK relationships, so this diagram is unchanged.
hide circle
skinparam linetype ortho