From 4fa2b83c0df84bf3be0d349fcead044209964214 Mon Sep 17 00:00:00 2001 From: Marcel Date: Wed, 27 May 2026 11:04:27 +0200 Subject: [PATCH] docs(adr-025): record document-authoritative collections and non-transactional orchestrator Clarify that idempotency precedence is domain-specific: Person/Tag scalar fields preserve human edits, while document sender/receivers/tags are canonical-authoritative (cleared and re-populated on re-import so a shrunk set prunes stale links). Pin the cross-loader provisional precedence. Record that runImport() is non-transactional (per-loader transactions only) and the partial-failure-then-retry recovery is safe because the import is idempotent. Refs #669 Co-Authored-By: Claude Opus 4.7 --- ...-and-single-migration-schema-foundation.md | 34 ++++++++++++++++--- 1 file changed, 29 insertions(+), 5 deletions(-) diff --git a/docs/adr/025-canonical-import-and-single-migration-schema-foundation.md b/docs/adr/025-canonical-import-and-single-migration-schema-foundation.md index 8cfd897b..94f3991e 100644 --- a/docs/adr/025-canonical-import-and-single-migration-schema-foundation.md +++ b/docs/adr/025-canonical-import-and-single-migration-schema-foundation.md @@ -72,11 +72,26 @@ loader uses `RelationshipService`, never the relationship repository. Settled sub-decisions: -- **Idempotency precedence = preserve human edits.** Persons/tags upsert by `source_ref`, - documents by `index`. On re-import a non-blank field a human changed in-app is never - overwritten (blank fields are filled from canonical), and `provisional` is monotonic — once - a human confirms a person (`false`) it never reverts to `true`. Verified against real - Postgres in `CanonicalImportIntegrationTest`. +- **Idempotency precedence is domain-specific.** Persons/tags upsert by `source_ref`, + documents by `index`. Two distinct rules apply: + - **Person/Tag scalar fields = preserve human edits.** On re-import a non-blank field a human + changed in-app is never overwritten (blank fields are filled from canonical via the single + `preferHuman` idiom), and `provisional` is monotonic-downward — once a human confirms a + person (`false`) it never reverts to `true`. Because the orchestrator loads the register and + tree *before* documents, a person already `false` can never be flipped provisional by a + later document row that references the same `source_ref`, regardless of document-row order. + - **Document sender/receivers/tags = canonical-authoritative.** A document's sender, receiver + set, and tag set are owned by the canonical row, not the archivist. On re-import of a + PLACEHOLDER document `DocumentImporter` clears and re-populates `receivers`/`tags` so a row + whose set *shrinks* prunes the removed links rather than accumulating stale ones. The + "preserve human edits" rule above does **not** extend to these collections. The raw + `sender_text`/`receiver_text` cells are always retained verbatim (a separate invariant). + Note non-PLACEHOLDER documents are skipped entirely (`ALREADY_EXISTS`), so once a document + has a file the importer never touches it again — this bounds the authoritative-overwrite + blast radius to placeholder rows. + Verified against real Postgres in `CanonicalImportIntegrationTest` + (`reimport_preservesHumanEditedPersonField`, `reimport_prunesRemovedReceiverAndTag…`, + `import_neverFlipsRegisterPersonToProvisional…`). - **Name policy = Option A.** The normalizer resolved attribution upstream: the document sheet carries the resolved slug in `sender_person_id` / `receiver_person_ids` and the raw cell in `sender_name` / `receiver_names`. The importer routes register-first by `source_ref` @@ -114,6 +129,15 @@ Settled sub-decisions: - **Forward-only.** The migration is immutable once shipped (Flyway checksum model); any fix goes in a later version. There is no down-migration — rollback means restoring from the nightly `pg_dump`, the standard procedure. +- **`runImport()` is non-transactional — per-loader transactions only.** The orchestrator + does not wrap the four loaders in a single transaction; each loader (or the per-call + `upsertBySourceRef` / `DocumentImporter.load`) carries its own `@Transactional` boundary. A + partial failure mid-run (e.g. the document loader throws after tags + persons committed) + leaves the earlier loaders' data committed and the `ImportStatus` set to `FAILED`. This is + acceptable precisely because the import is idempotent: re-running is safe and converges to + the same state, so the operational recovery for a partial failure is simply to fix the + offending artifact and re-trigger the import — no manual cleanup of half-written data is + required. A future maintainer must not assume all-or-nothing semantics. - **`PersonSummaryDTO` coupling.** `provisional` was added to the `PersonSummaryDTO` native interface projection; because the projection is backed by native SQL, the column had to be added to all three native `SELECT`s (`findAllWithDocumentCount`, `searchWithDocumentCount`,