As the archive owner I want one Flyway migration and domain model carrying all import/precision/attribution/identity fields so downstream phases compile against a single, collision-free schema #671
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Phase 2 of the "Handling the Unknowns" milestone — the schema foundation
Multi-persona review found that the original date (#666), name-triage (#665), and importer (#669) issues each planned a separate
V69Flyway migration alteringpersons— threeV69s is a boot failure, andpersons.provisionalwas at risk of being defined twice. This phase consolidates every new column into ONE migration with a single owner, plus the entity/enum/DTO surface, so Phases 3-6 just compile against a finished schema. No import logic, no rendering, no UI here.The single migration (next free version — confirm at implementation time; head was
V68__add_grafana_reader_role.sql)documentsmeta_date_precision varchar(16)— the precision enum; backfill thenNOT NULL.meta_date_end date NULL— range end (only forRANGE).meta_date_raw text NULL— original date cell, verbatim.sender_text text NULL,receiver_text text NULL— raw attribution preserved even when a person is linked.personssource_ref varcharunique, indexed — the normalizerperson_id; the join key for documents → persons and the idempotency key for re-import.provisional boolean NOT NULL DEFAULT false.tagsource_ref varcharunique, indexed — keyed on the canonicaltag_path.Integrity at the DB layer (review consensus — Markus/Nora/Tobias):
CHECKthatmeta_date_precisionis one of the seven enum values.CHECKthatmeta_date_endis non-null only when precision =RANGE, and null otherwise.CHECK/trigger thatmeta_date_end >= meta_datefor ranges.meta_date_precision = 'DAY'wheremeta_dateis set,'UNKNOWN'where null — then addNOT NULL.Domain model
DatePrecisionenum mirroring the normalizer's seven values verbatim:DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN(no translation layer;APPROXis rendered "ca." in Phase 4).Document(@Enumerated(STRING)precision with@Schema(requiredMode = REQUIRED); the rest nullable),Person(source_ref,provisionalwith@SchemaREQUIRED +@Builder.Default),Tag(source_ref). No business logic.DocumentUpdateDTO,DocumentBatchMetadataDTO,DocumentListItem; addprovisional(andsource_refif needed) toPersonSummaryDTOso Phase 5 can filter without a new field. Caveat:PersonSummaryDTOis a native-query INTERFACE PROJECTION consumed by ~3@Querymethods inPersonRepository(findAllWithDocumentCount,searchWithDocumentCount,findTopByDocumentCount), so the newprovisionalfield must be added to all of those native SELECTs or it silently returns false — this needs an integration test (real Postgres), not a unit test.npm run generate:apiso the TS types pick upDatePrecisionand the new fields.Documentation (blocker)
docs/architecture/db/db-orm.puml+db-relationships.puml— all new columns.docs/GLOSSARY.md— "date precision", "source_ref", "provisional person", "raw attribution".Acceptance criteria
Out of scope
Dependency
Independent of Phase 1 #670 (parallelisable), but both #670 and this issue must land before Phase 3 (importer), which compiles against these columns and consumes the canonical files.
Markus Keller — Senior Application Architect
This is exactly the right move. Consolidating three competing
V69s into one owned migration is textbook "get the schema right before application code." I confirmed the migration head isV68__add_grafana_reader_role.sql, soV69is the correct next version — no collision today.Observations
V22__person_type_title_nullable_firstname.sql(CHECK (person_type IN ('PERSON', 'INSTITUTION', 'GROUP', 'UNKNOWN'))) verbatim. Follow that style exactly so the next reviewer recognizes the pattern.PersonSummaryDTOis a native-query interface projection, not a class. It is consumed by three native@Querymethods inPersonRepository(findAllWithDocumentCount,searchWithDocumentCount,findTopByDocumentCount). Adding aprovisionalgetter to the interface is silent unlessp.provisionalis added to all three SELECT lists — otherwise the getter returns null at runtime with no compile error. This is the single biggest hidden-coupling trap in the issue.tagtable is singulartag(confirmed inV1__initial_schema.sql:73), and theTagentity has no@Tableannotation — relies on the default. The migration mustALTER TABLE tag, nottags.DatePrecisionenum is a verbatim mirror of the normalizer'sPrecision(StrEnum)(dates.py:9— DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN). Good: no translation layer. This is the lasting decision the ADR must capture.Recommendations
DatePrecisionenum is a verbatim mirror of the Python normalizer'sPrecision; the canonical output is the contract — no translation layer." Record the second prong explicitly so a future dev doesn't "improve" the enum and silently break import idempotency.db-orm.puml+db-relationships.puml) — five newdocumentscolumns, twopersonscolumns, onetagcolumn. The relationship diagram needs no new lines (no FKs), but the ORM diagram needs every attribute. This is a blocker per my doc-currency table.CHECK (meta_date_end IS NULL OR meta_date_end >= meta_date)is a single immutable-row check — no trigger needed. The issue says "CHECK/trigger"; pick CHECK. A trigger is operational overhead and harder to test. Reserve a trigger only if you need cross-row logic, which you don't here.source_refnullable. It is an import-only identity key; manually created persons/tags will never have one. Unique + nullable is correct in Postgres (multiple NULLs allowed). Make sure the unique index is a plain unique index, not a constraint that rejects NULLs.Open Decisions
meta_date_endis non-null only when precision =RANGE. But the normalizer also emitsRANGEprecision withmeta_dateset to the range start (e.g.dates.py:213returns(year-01-01, RANGE)). Confirm the intended contract: does everyRANGErow require a non-nullmeta_date_end, or is aRANGEwith only a start (open-ended range) legal? The two CHECKs as written (end non-null only for RANGE+end >= meta_date) permit aRANGErow with null end — decide if that is intended or ifRANGE ⟺ end IS NOT NULLshould be biconditional. Cost: a biconditional CHECK is stricter and may reject normalizer output that emits open ranges; the looser version permits ambiguous range rows downstream.Felix Brandt — Senior Fullstack Developer
Clean, well-scoped schema-only issue. The TDD path here is mostly integration-level (migration + entity mapping), and I want to flag a couple of construction sites that the issue's DTO list glosses over.
Observations
PersonSummaryDTOis an interface projection backed by native SQL (PersonRepository.findAllWithDocumentCountand two siblings). Addingboolean isProvisional()to the interface compiles fine but returns null/false silently unless I addp.provisionalto all three native SELECT column lists. There is no compile-time safety net here — this needs an integration test asserting the projected value, not a unit test.DocumentListItemis a record with exactly one construction site:DocumentService.toListItem()(line 755). AddingmetaDatePrecision/metaDateEndto the record forces a change there. Good news: it's one place. I'll add the precision field as@Schema(requiredMode = REQUIRED)since every document row will have a backfilled precision.DocumentUpdateDTOandDocumentBatchMetadataDTOare plain@DataPOJOs — trivial to extend, no construction sites to chase.DatePrecisionenum maps verbatim todates.py:9'sPrecision(StrEnum). I'll use@Enumerated(EnumType.STRING)on theDocumentfield, matching howscriptTypeandstatusare already done inDocument.java.Recommendations
meta_date_precisionis NOT NULL." Then a second: "dated row backfills toDAY, undated toUNKNOWN." This is the red/green that proves the backfill, and it can only run on real Postgres (H2 won't honor the CHECK).provisionalwith@Builder.Default private boolean provisional = false;and@Schema(requiredMode = REQUIRED)— identical to the existingfamilyMemberfield inPerson.java:55. Reuse that exact pattern; don't invent a new one.Document, use the same shape asstatus:@Enumerated(EnumType.STRING) @Column(name = "meta_date_precision", nullable = false) @Schema(requiredMode = REQUIRED) @Builder.Default private DatePrecision metaDatePrecision = DatePrecision.UNKNOWN;. A non-null default in the builder prevents NPEs in tests that construct documents without going through import.npm run generate:apiis in the acceptance criteria for a reason — the TSDatePrecisionunion and the new optional fields must land in the generated types in this PR, or Phase 4/5 frontend work has no types to compile against. Verify the generated diff is committed.Open Decisions
None. The construction sites are identifiable and the patterns to copy already exist in the codebase.
Nora Steiner ("NullX") — Application Security Engineer
Schema-and-entity issue with no new endpoints, so the attack surface is small. My focus is on the integrity constraints (these are security controls — they fail closed) and the new free-text columns.
Observations
sender_text/receiver_textare newtextcolumns carrying raw attribution. No length bound. CompareV18__add_transcription_blocks.sql:5which caps text atCHECK (length(text) <= 10000). Unbounded user-influenced text is a (mild) DoS / storage-abuse vector and an inconsistency with existing convention. Same applies tometa_date_raw.source_refis the re-import idempotency key. That makes it security-relevant: if two distinct normalizer persons collide onsource_ref, the unique index fails closed (good) — but the import phase must surface that as a structured conflict, not a silent skip. That's Phase 3's job; just ensure the unique index exists so the DB enforces it rather than application code (no race window).V22'sIN (...)form so an out-of-enum value is rejected at write time, not discovered later as corrupt rendering input.Recommendations
CHECK (length(...) <= N)tometa_date_raw,sender_text, andreceiver_textto match thetranscription_blocksprecedent. These cells originate from spreadsheet imports; bound them (10000 is the house number). Defense in depth at the DB layer costs nothing and prevents a malformed/huge cell from bloating a row.IN ('DAY','MONTH',...)list, not a reference to anything app-side. The whole point is that the DB enforces validity independent of the Java enum — if someone deletes a Java enum value, the DB still rejects bad data.@RequirePermissionwork here since there are no new write endpoints — correct scope. When Phase 5 exposesprovisionalviaPersonSummaryDTO, confirm it doesn't leak anything sensitive; a boolean "this person is provisional" is metadata, not PII, so I have no concern there.Open Decisions
None. The constraints are the security control and they're already specified; I'm only asking to extend the existing length-bound convention to the new text columns.
Sara Holt — Senior QA Engineer
This is a migration-and-mapping issue, so the test strategy lives almost entirely at the integration layer against real Postgres. The acceptance criteria are written as Gherkin already — good — but a few are not yet covered by an obvious test and one will be invisible to H2.
Observations
postgres:16-alpine, never H2. The backfill, the NOT NULL transition, and the CHECK constraints are all PostgreSQL-specific behaviors. An H2 run would pass while the real migration fails. The backend already gates at 88% branch coverage (backend CLAUDE.md) and uses Testcontainers — this issue must land its migration test in that suite.DAY, undated →UNKNOWN) plus the NOT NULL transition plus the negative CHECK (non-RANGE row cannot have a non-nullmeta_date_end). That's four behaviors → four tests, one logical assertion each.PersonSummaryDTOprojection change is the highest regression risk and the easiest to under-test. Addingprovisionalto the interface compiles even if the native query forgets to SELECT it — the getter just returns false. Only an integration test that inserts a provisional person and reads it back throughfindAllWithDocumentCountwill catch a forgotten column.Recommendations
MigrationIntegrationTest(or extend the existing one) with these named tests:migration_backfills_dated_rows_to_DAY_precisionmigration_backfills_undated_rows_to_UNKNOWN_precisionmeta_date_precision_is_not_null_after_migrationnon_range_row_rejects_non_null_meta_date_end(expect aDataIntegrityViolationException)range_row_with_end_before_start_is_rejectedpersons_source_ref_unique_index_rejects_duplicatetag_source_ref_unique_index_rejects_duplicatepersons_provisional_defaults_to_falsePerson, call the repository'sfindAllWithDocumentCount, assert the returnedPersonSummaryDTO.isProvisional()is true. This is the test that guards against the silent-null trap.@Sqlfixture or insert pre-migration rows in a Flyway-aware test.npm run generate:apioutput is verified, not just run. A Vitest/type-check assertion isn't practical for generated types, but the PR diff must showDatePrecisionin the generated file — make that a manual checklist item in the PR description.Open Decisions
None. The acceptance criteria map cleanly to integration tests; I've named them above.
Tobias Wendt — DevOps & Platform Engineer
The headline of this issue is a DevOps concern: three competing
V69migrations would be a boot failure on the next deploy. Consolidating to one owned version is exactly the fix. I confirmed the current head isV68__add_grafana_reader_role.sql, soV69is free.Observations
UPDATE ... SET meta_date_precision = 'DAY'followed byALTER TABLE ... SET NOT NULLwill take a table lock. For the current archive size this is seconds, not a concern — but the migration must do the backfill before the NOT NULL in the same migration, exactly as the issue specifies. Order matters; a single-file migration is correct.docs/architecture/c4/l2-containers.pumlandDEPLOYMENT.mdare untouched — correct.Recommendations
persons.source_refandtag.source_refon a populated table: since all existing rows havesource_ref = NULL(the column is new), the unique index builds instantly and NULLs don't collide in Postgres. NoCREATE INDEX CONCURRENTLYneeded at current scale — a plain index inside the migration is fine and keeps it atomic. If the archive ever grows to where this locks too long, revisit, but don't pre-optimize now.pg_dump, which is the standard procedure — nothing new to build.Open Decisions
None. No infrastructure or config surface is touched; the migration ordering the issue specifies is the correct and safe approach at current data volume.
Elicit — Requirements Engineer & Business Analyst
I'm in Brownfield mode. This issue is unusually well-formed for a schema phase: it has Gherkin acceptance criteria, an explicit out-of-scope list, and a stated dependency on #670 before Phase 3. That's a Definition-of-Ready pass on most criteria. My job is to hunt ambiguity and missing edges before code starts.
Observations
meta_dateand the column already somehow populated (shouldn't happen — new column — but worth a one-line assumption), and (2) whethertag.source_refbackfill is needed at all, or whether all existing tags simply get NULL (the issue implies NULL; make it explicit).npm run generate:api" is an acceptance criterion that depends on a running dev backend. That's an environmental precondition, not a behavior the migration controls. It's correctly listed, but flag that this criterion is verified by a human checking the committed diff, not by an automated gate.Recommendations
tagandpersonsrows receivesource_ref = NULL; only the importer (Phase 3) populates it." This closes the ambiguity about whether a tag backfill is in scope here (it isn't).RANGErow must have an end. State the intended rule as a Gherkin line either way — an untestable gap becomes a testable assertion.provisional personin particular needs a definition that distinguishes it from the existingfamilyMemberboolean so future readers don't conflate them.Open Decisions
RANGEprecision row required to havemeta_date_end, or may it be open-ended? This determines whether the CHECK is biconditional. Cost: business call about whether the normalizer ever emits an open-ended range; if it does, the strict rule rejects valid import data.provisionalever set totrueby anything in this phase? The issue adds the column withDEFAULT falsebut all import logic is out of scope. Confirm that's intentional (column exists, stays false until Phase 3 writes it) so a reviewer doesn't expect a code path that sets it. Cost: none if intentional — just needs a one-line confirmation to avoid a "feature looks half-built" flag later.Leonie Voss — UX & Accessibility Lead
This is a schema + domain-model phase with explicitly no UI ("no rendering, no UI here"). So I have no surface to audit today. I checked one thing that will matter downstream, to make sure the data model doesn't quietly constrain the UX in Phases 4-6.
Observations
DatePrecisionenum carries the semantic distinction the UI needs. Phase 4 will renderAPPROXas "ca." and presumablySEASON/MONTH/YEARwith reduced specificity ("Frühjahr 1943", "1943"). The enum being a verbatim mirror of the normalizer means the rendering layer has the full fidelity it needs — nothing is collapsed to a lossyDate. That's the right call for the dual audience: a 60+ reader benefits enormously from "ca. 1943" over a fake-precise "01.01.1943".meta_date_rawpreserving the verbatim original cell is a UX asset, not just provenance. It enables a future "as written in the original" tooltip/disclosure for transcribers — exactly the kind of recognition-over-recall cue the senior audience values. Worth keeping in mind for Phase 4/5 even though it's out of scope now.provisionalas a first-class boolean means Phase 5 can give provisional persons a distinct, honest visual treatment (a badge with text + icon, never color alone) rather than hiding uncertainty. Good foundation.Recommendations
UNKNOWNrenders as a real label (e.g. "Datum unbekannt"), never an empty cell — an empty date field reads as "missing/broken" to users rather than "honestly unknown", which is the whole point of this milestone.Open Decisions
None — no UI in scope. My input is forward-looking for Phase 4, not a decision needed now.
Decision Queue — Action Required
2 decisions need your input before implementation starts. Everything else is a concrete recommendation the team will apply directly.
Data model / constraints
meta_date_endmay be non-null only when precision =RANGE, but does not say whether aRANGErow must have an end. The normalizer (dates.py) emitsRANGEwithmeta_dateas the range start — sometimes possibly with no end. Decide:RANGE ⟺ end IS NOT NULL): rejects any open-ended range; cleaner downstream rendering. Cost: rejects valid normalizer output if it ever emits a start-only range.end non-null → precision = RANGE, but RANGE may have null end): permits open-ended ranges. Cost: Phase 4 must handle aRANGEwith no end gracefully.(Raised by: Markus, Elicit)
Scope confirmation
provisionalstayfalsethroughout this phase? The column is added withDEFAULT falsebut all import logic that would set ittrueis out of scope (Phase 3). Confirm this is intentional — column exists, no code path sets it yet — so reviewers don't flag it as a half-built feature. A one-line "yes, Phase 3 populates it" in the issue closes this. (Raised by: Elicit)Implemented on
feature/671-schema-foundation(branched fromdocs/import-migration)Schema foundation complete via red→green TDD. Five atomic commits:
662927f9feat(schema):V69__import_precision_attribution_identity_schema.sql— one migration adding all columns + DB CHECK constraints;DatePrecisionenum (verbatim mirror of normalizer); entity fields on Document/Person/Tag. 14 new Testcontainers tests.0f07a95bfeat(person):provisionaladded toPersonSummaryDTOinterface and all three native SELECTs (findAllWithDocumentCount,searchWithDocumentCount+ its GROUP BY,findTopByDocumentCount) — guarded by 3 Postgres integration tests against the silent-false trap.c27c83f5feat(document): precision/attribution fields onDocumentListItem(carried throughtoListItem),DocumentUpdateDTO,DocumentBatchMetadataDTO.6f5ca475feat(frontend): hand-editedgenerated/api.ts(see note below).d959cb54docs: db-orm.puml + db-relationships.puml, GLOSSARY (4 terms), ADR-025.Tests run (Testcontainers Postgres, per-class — never full suite)
MigrationIntegrationTest— 44/44 ✅ (14 new V69)PersonRepositoryTest— 30/30 ✅ (3 new projection)DocumentListItemIntegrationTest4/4,DocumentSearchResultTest8/8,DocumentControllerTest97/97,FlywayConfigTest3/3 ✅./mvnw clean package -DskipTests✅Decisions applied (the two open items)
RANGErow may have a nullmeta_date_end(open-ended range), since the normalizer emits start-only ranges. Recorded in ADR-025.provisionalstaysfalsethis phase — column exists, only Phase 3 sets it true. Recorded in ADR-025.npm run generate:api— hand-edited, must be re-validated in CIThe worktree has no
node_modulesand no running dev backend, so I hand-editedfrontend/src/lib/generated/api.tsto mirror exactly whatgenerate:apiproduces (DatePrecision union, new Document/DocumentListItem/DTO fields, Personprovisional+sourceRef, TagsourceRef, PersonSummaryDTOprovisional). Action for reviewer/CI: re-runnpm run generate:apiagainst the dev backend to confirm byte parity, and fix any frontend test mock factories that now need the new required fields (provisional,metaDatePrecision) —npm run checkcould not run here.Committed locally only (no push / no PR per instruction) — owner will push + open the PR.