docs(adr-025): record document-authoritative collections and non-transactional orchestrator
Clarify that idempotency precedence is domain-specific: Person/Tag scalar fields preserve human edits, while document sender/receivers/tags are canonical-authoritative (cleared and re-populated on re-import so a shrunk set prunes stale links). Pin the cross-loader provisional precedence. Record that runImport() is non-transactional (per-loader transactions only) and the partial-failure-then-retry recovery is safe because the import is idempotent. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -72,11 +72,26 @@ loader uses `RelationshipService`, never the relationship repository.
|
||||
|
||||
Settled sub-decisions:
|
||||
|
||||
- **Idempotency precedence = preserve human edits.** Persons/tags upsert by `source_ref`,
|
||||
documents by `index`. On re-import a non-blank field a human changed in-app is never
|
||||
overwritten (blank fields are filled from canonical), and `provisional` is monotonic — once
|
||||
a human confirms a person (`false`) it never reverts to `true`. Verified against real
|
||||
Postgres in `CanonicalImportIntegrationTest`.
|
||||
- **Idempotency precedence is domain-specific.** Persons/tags upsert by `source_ref`,
|
||||
documents by `index`. Two distinct rules apply:
|
||||
- **Person/Tag scalar fields = preserve human edits.** On re-import a non-blank field a human
|
||||
changed in-app is never overwritten (blank fields are filled from canonical via the single
|
||||
`preferHuman` idiom), and `provisional` is monotonic-downward — once a human confirms a
|
||||
person (`false`) it never reverts to `true`. Because the orchestrator loads the register and
|
||||
tree *before* documents, a person already `false` can never be flipped provisional by a
|
||||
later document row that references the same `source_ref`, regardless of document-row order.
|
||||
- **Document sender/receivers/tags = canonical-authoritative.** A document's sender, receiver
|
||||
set, and tag set are owned by the canonical row, not the archivist. On re-import of a
|
||||
PLACEHOLDER document `DocumentImporter` clears and re-populates `receivers`/`tags` so a row
|
||||
whose set *shrinks* prunes the removed links rather than accumulating stale ones. The
|
||||
"preserve human edits" rule above does **not** extend to these collections. The raw
|
||||
`sender_text`/`receiver_text` cells are always retained verbatim (a separate invariant).
|
||||
Note non-PLACEHOLDER documents are skipped entirely (`ALREADY_EXISTS`), so once a document
|
||||
has a file the importer never touches it again — this bounds the authoritative-overwrite
|
||||
blast radius to placeholder rows.
|
||||
Verified against real Postgres in `CanonicalImportIntegrationTest`
|
||||
(`reimport_preservesHumanEditedPersonField`, `reimport_prunesRemovedReceiverAndTag…`,
|
||||
`import_neverFlipsRegisterPersonToProvisional…`).
|
||||
- **Name policy = Option A.** The normalizer resolved attribution upstream: the document sheet
|
||||
carries the resolved slug in `sender_person_id` / `receiver_person_ids` and the raw cell in
|
||||
`sender_name` / `receiver_names`. The importer routes register-first by `source_ref`
|
||||
@@ -114,6 +129,15 @@ Settled sub-decisions:
|
||||
- **Forward-only.** The migration is immutable once shipped (Flyway checksum model); any fix
|
||||
goes in a later version. There is no down-migration — rollback means restoring from the
|
||||
nightly `pg_dump`, the standard procedure.
|
||||
- **`runImport()` is non-transactional — per-loader transactions only.** The orchestrator
|
||||
does not wrap the four loaders in a single transaction; each loader (or the per-call
|
||||
`upsertBySourceRef` / `DocumentImporter.load`) carries its own `@Transactional` boundary. A
|
||||
partial failure mid-run (e.g. the document loader throws after tags + persons committed)
|
||||
leaves the earlier loaders' data committed and the `ImportStatus` set to `FAILED`. This is
|
||||
acceptable precisely because the import is idempotent: re-running is safe and converges to
|
||||
the same state, so the operational recovery for a partial failure is simply to fix the
|
||||
offending artifact and re-trigger the import — no manual cleanup of half-written data is
|
||||
required. A future maintainer must not assume all-or-nothing semantics.
|
||||
- **`PersonSummaryDTO` coupling.** `provisional` was added to the `PersonSummaryDTO` native
|
||||
interface projection; because the projection is backed by native SQL, the column had to be
|
||||
added to all three native `SELECT`s (`findAllWithDocumentCount`, `searchWithDocumentCount`,
|
||||
|
||||
Reference in New Issue
Block a user