As the archive owner I want one Flyway migration and domain model carrying all import/precision/attribution/identity fields so downstream phases compile against a single, collision-free schema #671

marcel · 2026-05-26T22:17:43+02:00

Phase 2 of the "Handling the Unknowns" milestone — the schema foundation

Multi-persona review found that the original date (#666), name-triage (#665), and importer (#669) issues each planned a separate V69 Flyway migration altering persons. Three migrations sharing version V69 would make Flyway fail at application startup with a duplicate-version error, and persons.provisional was at risk of being defined twice. This phase consolidates every new column into ONE migration with a single owner, plus the entity/enum/DTO surface, so that Phases 3-6 simply compile against a finished schema. It contains no import logic, no rendering, and no UI.

The single migration (next free version — confirm at implementation time; head was `V68__add_grafana_reader_role.sql`)

documents

meta_date_precision varchar(16) — the precision enum; backfill then NOT NULL.
meta_date_end date NULL — range end (only for RANGE).
meta_date_raw text NULL — original date cell, verbatim.
sender_text text NULL, receiver_text text NULL — raw attribution preserved even when a person is linked.

persons

source_ref varchar unique, indexed — the normalizer person_id; the join key for documents → persons and the idempotency key for re-import.
provisional boolean NOT NULL DEFAULT false.

tag

source_ref varchar unique, indexed — keyed on the canonical tag_path.

Integrity at the DB layer (review consensus — Markus/Nora/Tobias):

CHECK that meta_date_precision is one of the seven enum values.
CHECK that meta_date_end is non-null only when precision = RANGE, and null otherwise.
CHECK/trigger that meta_date_end >= meta_date for ranges.
Backfill before the NOT NULL: meta_date_precision = 'DAY' where meta_date is set, 'UNKNOWN' where null — then add NOT NULL.
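
The backfill rule and the constraints listed above can be restated as plain Java predicates, which may help when writing the migration's integration test. This is only a sketch under assumptions: `DateIntegrityRules`, its method names, and the rule labels are hypothetical and not part of the codebase; the authoritative rules live in the SQL migration. It also assumes the ordering rule is evaluated only when both dates are present, and it does not require a RANGE row to have an end date, since the issue does not say so.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; the real rules live in the SQL migration.
final class DateIntegrityRules {

    // Backfill rule: rows with a meta_date become DAY, rows without become UNKNOWN.
    static DatePrecision backfill(LocalDate metaDate) {
        return metaDate != null ? DatePrecision.DAY : DatePrecision.UNKNOWN;
    }

    // Returns the labels of the rules a row would violate; an empty list means it passes.
    static List<String> violations(LocalDate metaDate, LocalDate metaDateEnd, DatePrecision precision) {
        List<String> violated = new ArrayList<>();
        // Mirrors NOT NULL on meta_date_precision (applied after the backfill).
        if (precision == null) {
            violated.add("precision_not_null");
        }
        // Mirrors: meta_date_end may be non-null only when precision = RANGE.
        if (metaDateEnd != null && precision != DatePrecision.RANGE) {
            violated.add("end_only_for_range");
        }
        // Mirrors: meta_date_end >= meta_date for ranges.
        // Assumption: only evaluated when both dates are present.
        if (precision == DatePrecision.RANGE && metaDate != null && metaDateEnd != null
                && metaDateEnd.isBefore(metaDate)) {
            violated.add("end_not_before_start");
        }
        return violated;
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(1923, 5, 1);
        System.out.println(backfill(start));
        System.out.println(backfill(null));
        System.out.println(violations(start, start.plusDays(30), DatePrecision.RANGE));
        System.out.println(violations(start, start.plusDays(30), DatePrecision.YEAR));
        System.out.println(violations(start, start.minusDays(1), DatePrecision.RANGE));
    }
}

// The seven precision values named in this issue.
enum DatePrecision { DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN }
```

A valid range yields an empty list, a YEAR row carrying an end date reports `end_only_for_range`, and a range whose end precedes its start reports `end_not_before_start`.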

Domain model

New DatePrecision enum mirroring the normalizer's seven values verbatim: DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN (no translation layer; APPROX is rendered "ca." in Phase 4).
Entity fields on Document (@Enumerated(STRING) precision with @Schema(requiredMode = REQUIRED); the rest nullable), Person (source_ref, plus provisional with @Schema(requiredMode = REQUIRED) and @Builder.Default), Tag (source_ref). No business logic.
DTO surface: add the precision fields to DocumentUpdateDTO, DocumentBatchMetadataDTO, and DocumentListItem; add provisional (and source_ref if needed) to PersonSummaryDTO so Phase 5 can filter without a new field. Caveat: PersonSummaryDTO is a native-query INTERFACE PROJECTION consumed by ~3 @Query methods in PersonRepository (findAllWithDocumentCount, searchWithDocumentCount, findTopByDocumentCount). The new provisional column must be added to the SELECT list of every one of those native queries, or the projection silently returns false. Verifying this needs an integration test against real Postgres, not a unit test.
Run npm run generate:api so the TS types pick up DatePrecision and the new fields.
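
Because the enum mirrors the normalizer's values verbatim and is stored by name, the string list in the database CHECK and the Java enum have to stay identical. A minimal sketch of how a test might derive the expected list from the enum follows; `PrecisionCheckList` and its methods are hypothetical names introduced for illustration, not existing code.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper; illustrates keeping the CHECK value list and the enum in step.
final class PrecisionCheckList {

    // Builds the quoted literal list that a CHECK ... IN (...) constraint would need.
    static String sqlInList() {
        return Arrays.stream(DatePrecision.values())
                .map(v -> "'" + v.name() + "'")
                .collect(Collectors.joining(", "));
    }

    // Parses a normalizer-supplied or stored string with no translation layer.
    // Enum.valueOf throws IllegalArgumentException for any name outside the seven values.
    static DatePrecision parse(String raw) {
        return DatePrecision.valueOf(raw);
    }

    public static void main(String[] args) {
        System.out.println(sqlInList());
        System.out.println(parse("APPROX"));
    }
}

// The seven precision values named in this issue, in the order the issue lists them.
enum DatePrecision { DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN }
```

Matching is case-sensitive, so a value such as `approx` is rejected rather than silently mapped. The longest name, `UNKNOWN`, has seven characters and fits the proposed `varchar(16)` column.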

Documentation (blocker)

docs/architecture/db/db-orm.puml + db-relationships.puml — all new columns.
docs/GLOSSARY.md — "date precision", "source_ref", "provisional person", "raw attribution".
ADR — "the importer reads the normalizer's canonical output, and all import-related schema lives in one migration" (the lasting decision behind Phases 1-3).

Acceptance criteria

Scenario: One migration, no collision
  Given the schema migration runs on a fresh database
  Then it applies cleanly as a single Flyway version
  And no other milestone issue adds a migration that alters persons or documents

Scenario: Precision backfill is correct and constrained
  Given existing documents with and without a meta_date
  When the migration runs
  Then dated rows get precision 'DAY' and undated rows get 'UNKNOWN'
  And meta_date_precision becomes NOT NULL
  And a row with precision != 'RANGE' cannot have a non-null meta_date_end

Scenario: Identity column is unique
  Then persons.source_ref and tag.source_ref are unique-indexed
  And persons.provisional defaults to false

Scenario: Types regenerate
  When npm run generate:api runs against the dev backend
  Then DatePrecision and the new document/person fields appear in the generated TS types

Out of scope

Reading/loading any canonical file → Phase 3 (importer).
Date rendering / formatter / buildTitle → Phase 4.
Persons directory UI → Phase 5; undated browse → Phase 6.

Dependency

Independent of Phase 1 #670 (parallelisable), but both #670 and this issue must land before Phase 3 (importer), which compiles against these columns and consumes the canonical files.

marcel added this to the Handling the Unknowns — honest uncertainty in dates & people milestone 2026-05-26 22:17:43 +02:00

marcel added the P0-critical feature labels 2026-05-26 22:18:33 +02:00

Markus Keller — Senior Application Architect

Observations

Recommendations

Open Decisions

Felix Brandt — Senior Fullstack Developer

Observations

Recommendations

Open Decisions

Nora Steiner ("NullX") — Application Security Engineer

Observations

Recommendations

Open Decisions

Sara Holt — Senior QA Engineer

Observations

Recommendations

Open Decisions

Tobias Wendt — DevOps & Platform Engineer

Observations

Recommendations

Open Decisions

Elicit — Requirements Engineer & Business Analyst

Observations

Recommendations

Open Decisions

Leonie Voss — UX & Accessibility Lead

Observations

Recommendations

Open Decisions

Decision Queue — Action Required

Data model / constraints

Scope confirmation

Implemented on feature/671-schema-foundation (branched from docs/import-migration)

Tests run (Testcontainers Postgres, per-class — never full suite)

Decisions applied (the two open items)

npm run generate:api — hand-edited, must be re-validated in CI
