# ADR-025 — Canonical Import Output as Contract & Single-Migration Schema Foundation

**Date:** 2026-05-27

**Status:** Accepted

**Issue:** #671

**Milestone:** Handling the Unknowns — honest uncertainty in dates & people

---

## Context

The "Handling the Unknowns" milestone introduces honest uncertainty into the archive: documents whose dates are known only approximately or as a range, and people the importer infers from raw attribution text but cannot confidently identify.

Three sibling issues each independently planned a Flyway `V69` migration that altered `persons`: date precision (#666), name triage (#665), and the importer (#669). Three `V69` migrations would fail at boot (Flyway versions must be unique), and `persons.provisional` was at risk of being defined twice.

Two durable decisions had to be made before any application code in Phases 3–6 could compile against the new schema.

---

## Decision

### 1. All import/precision/attribution/identity schema lives in ONE migration with a single owner

`V69__import_precision_attribution_identity_schema.sql` adds every new column for this milestone in a single, atomic, forward-only migration:

- `documents`: `meta_date_precision` (backfilled `DAY` where dated / `UNKNOWN` where not, then `NOT NULL`), `meta_date_end`, `meta_date_raw`, `sender_text`, `receiver_text`.
- `persons`: `source_ref` (unique index, nullable), `provisional` (`NOT NULL DEFAULT false`).
- `tag`: `source_ref` (unique index, nullable).

Integrity is pushed to the database as fail-closed `CHECK` constraints (the precedent is `V22`'s `person_type` allowlist):

- `meta_date_precision` must be one of the seven enum values.
- `meta_date_end` may be non-null **only** when precision = `RANGE` (one-directional, not biconditional; see Consequences).
- `meta_date_end >= meta_date` for ranges with both endpoints (a `CHECK`, not a trigger).
- `meta_date_raw`, `sender_text`, `receiver_text` are length-capped at 10 000 (mirrors the `transcription_blocks` cap in `V18`).

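
The `CHECK` rules listed above can be exercised in a small, self-contained sketch. This is not the V69 migration itself: it uses Python's built-in `sqlite3` as a stand-in for Postgres, and the column types, the ISO-8601 text dates, and the `accepts` helper are all assumptions made for illustration. `sender_text` and `receiver_text` are omitted because their cap would look the same as the one on `meta_date_raw`.

```python
import sqlite3

# SQLite stand-in for illustration only; the real V69 migration targets Postgres
# and its exact DDL is not reproduced here. Column types are assumptions.
DDL = """
CREATE TABLE documents (
    id                  INTEGER PRIMARY KEY,
    meta_date           TEXT,  -- ISO-8601 string, so text order matches date order
    meta_date_precision TEXT NOT NULL
        CHECK (meta_date_precision IN
               ('DAY','MONTH','SEASON','YEAR','RANGE','APPROX','UNKNOWN')),
    meta_date_end       TEXT,
    meta_date_raw       TEXT CHECK (length(meta_date_raw) <= 10000),
    -- One-directional: an end date implies RANGE, but RANGE does not imply an end.
    CHECK (meta_date_end IS NULL OR meta_date_precision = 'RANGE'),
    -- Ordering is only enforced when both endpoints are present.
    CHECK (meta_date_end IS NULL OR meta_date IS NULL OR meta_date_end >= meta_date)
);
"""


def accepts(conn, meta_date, precision, meta_date_end=None, raw=None):
    """Return True if the row passes every constraint, False if the DB rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO documents "
                "(meta_date, meta_date_precision, meta_date_end, meta_date_raw) "
                "VALUES (?, ?, ?, ?)",
                (meta_date, precision, meta_date_end, raw),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```

With made-up sample dates, `accepts(conn, "1943-05-01", "RANGE")` passes (a start-only range), while `accepts(conn, "1943-05-01", "DAY", "1943-06-01")` is rejected because an end date is only allowed on a `RANGE` row. The constraints fail closed: any row the rules do not explicitly allow is refused by the database, regardless of what the application layer does.
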

No sibling issue adds another migration that alters `persons` or `documents` in this milestone.

### 2. The backend `DatePrecision` enum is a verbatim mirror of the normalizer's `Precision`; the canonical output is the contract

The importer reads the Python normalizer's canonical output (`tools/import-normalizer/`). The backend `DatePrecision` enum (`DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN`) is a verbatim copy of the normalizer's `Precision(StrEnum)` (`dates.py`). There is **no translation layer**: the normalizer's output strings are persisted as-is. The same applies to `source_ref`, which carries the normalizer's `person_id` / canonical `tag_path` unchanged as the re-import idempotency key.

---

## Consequences

- **RANGE is one-directional, not biconditional.** A `RANGE` row may have a null `meta_date_end` (an open-ended range with only a start), because the normalizer can emit start-only ranges. A biconditional `RANGE ⟺ end IS NOT NULL` rule would reject valid normalizer output, so it was not adopted. Phase 4 rendering must handle a `RANGE` with no end gracefully.
- **`provisional` stays `false` throughout this phase.** The column and flag exist, but no code path sets it `true` yet; the importer (Phase 3) will be the only writer. This is intentional, not a half-built feature.
- **A future dev must not "improve" the enum.** Renaming or dropping a `DatePrecision` value without changing the normalizer silently breaks import idempotency and date rendering. The enum's Javadoc states this; the DB `CHECK` enforces validity independent of the Java enum.
- **`source_ref` is unique + nullable.** Manually created persons/tags have `source_ref = NULL`; Postgres allows multiple NULLs under a plain unique index, so no backfill is needed.
- **Forward-only.** The migration is immutable once shipped (Flyway checksum model); any fix goes in a later version. There is no down-migration; rollback means restoring from the nightly `pg_dump`, the standard procedure.
- **`PersonSummaryDTO` coupling.** `provisional` was added to the `PersonSummaryDTO` native interface projection; because the projection is backed by native SQL, the column had to be added to all three native `SELECT`s (`findAllWithDocumentCount`, `searchWithDocumentCount`, `findTopByDocumentCount`) or it would silently return `false`. Guarded by integration tests against real Postgres.
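
The "unique + nullable" consequence for `source_ref` can be seen in a minimal sketch. It again uses Python's built-in `sqlite3` as a stand-in, which (like Postgres) treats NULLs as distinct under a plain unique index. The table shape, the index name, the `upsert_person` helper, and the sample values are hypothetical; the ADR does not specify how the importer performs its idempotent write.

```python
import sqlite3

# SQLite stand-in for illustration; the real schema lives in Postgres.
# Table shape, index name, and helper below are assumptions, not project code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    source_ref  TEXT,                        -- NULL for manually created rows
    provisional INTEGER NOT NULL DEFAULT 0   -- stand-in for a boolean column
);
CREATE UNIQUE INDEX persons_source_ref_uq ON persons (source_ref);
""")


def upsert_person(conn, source_ref, name):
    """Update the row keyed by source_ref, or insert it if no such row exists."""
    with conn:
        cur = conn.execute(
            "UPDATE persons SET name = ? WHERE source_ref = ?", (name, source_ref)
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO persons (name, source_ref) VALUES (?, ?)",
                (name, source_ref),
            )


def person_count(conn):
    """Return the total number of rows in persons."""
    return conn.execute("SELECT COUNT(*) FROM persons").fetchone()[0]


# Two manually created persons: both have source_ref = NULL and coexist.
with conn:
    conn.execute("INSERT INTO persons (name) VALUES ('Manual A')")
    conn.execute("INSERT INTO persons (name) VALUES ('Manual B')")

# Importing the same hypothetical source_ref twice updates instead of duplicating.
upsert_person(conn, "p-001", "First spelling")
upsert_person(conn, "p-001", "Corrected spelling")
```

After this runs, the table holds three rows: the two manual entries with a NULL `source_ref` and one imported person. Because the key is carried through unchanged from the normalizer, a re-import finds the same row again, which is why no translation of `source_ref` is allowed.
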