docs(adr-025): record document-authoritative collections and non-transactional orchestrator
Clarify that idempotency precedence is domain-specific: Person/Tag scalar fields preserve human edits, while document sender/receivers/tags are canonical-authoritative (cleared and re-populated on re-import so a shrunk set prunes stale links). Pin the cross-loader provisional precedence. Record that runImport() is non-transactional (per-loader transactions only) and the partial-failure-then-retry recovery is safe because the import is idempotent.

Refs #669
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -72,11 +72,26 @@ loader uses `RelationshipService`, never the relationship repository.
 
 Settled sub-decisions:
 
-- **Idempotency precedence = preserve human edits.** Persons/tags upsert by `source_ref`,
-  documents by `index`. On re-import a non-blank field a human changed in-app is never
-  overwritten (blank fields are filled from canonical), and `provisional` is monotonic — once
-  a human confirms a person (`false`) it never reverts to `true`. Verified against real
-  Postgres in `CanonicalImportIntegrationTest`.
+- **Idempotency precedence is domain-specific.** Persons/tags upsert by `source_ref`,
+  documents by `index`. Two distinct rules apply:
+  - **Person/Tag scalar fields = preserve human edits.** On re-import a non-blank field a human
+    changed in-app is never overwritten (blank fields are filled from canonical via the single
+    `preferHuman` idiom), and `provisional` is monotonic-downward — once a human confirms a
+    person (`false`) it never reverts to `true`. Because the orchestrator loads the register and
+    tree *before* documents, a person already `false` can never be flipped provisional by a
+    later document row that references the same `source_ref`, regardless of document-row order.
+  - **Document sender/receivers/tags = canonical-authoritative.** A document's sender, receiver
+    set, and tag set are owned by the canonical row, not the archivist. On re-import of a
+    PLACEHOLDER document `DocumentImporter` clears and re-populates `receivers`/`tags` so a row
+    whose set *shrinks* prunes the removed links rather than accumulating stale ones. The
+    "preserve human edits" rule above does **not** extend to these collections. The raw
+    `sender_text`/`receiver_text` cells are always retained verbatim (a separate invariant).
+    Note non-PLACEHOLDER documents are skipped entirely (`ALREADY_EXISTS`), so once a document
+    has a file the importer never touches it again — this bounds the authoritative-overwrite
+    blast radius to placeholder rows.
+  Verified against real Postgres in `CanonicalImportIntegrationTest`
+  (`reimport_preservesHumanEditedPersonField`, `reimport_prunesRemovedReceiverAndTag…`,
+  `import_neverFlipsRegisterPersonToProvisional…`).
 - **Name policy = Option A.** The normalizer resolved attribution upstream: the document sheet
   carries the resolved slug in `sender_person_id` / `receiver_person_ids` and the raw cell in
   `sender_name` / `receiver_names`. The importer routes register-first by `source_ref`
@@ -114,6 +129,15 @@ Settled sub-decisions:
 - **Forward-only.** The migration is immutable once shipped (Flyway checksum model); any fix
   goes in a later version. There is no down-migration — rollback means restoring from the
   nightly `pg_dump`, the standard procedure.
+- **`runImport()` is non-transactional — per-loader transactions only.** The orchestrator
+  does not wrap the four loaders in a single transaction; each loader (or the per-call
+  `upsertBySourceRef` / `DocumentImporter.load`) carries its own `@Transactional` boundary. A
+  partial failure mid-run (e.g. the document loader throws after tags + persons committed)
+  leaves the earlier loaders' data committed and the `ImportStatus` set to `FAILED`. This is
+  acceptable precisely because the import is idempotent: re-running is safe and converges to
+  the same state, so the operational recovery for a partial failure is simply to fix the
+  offending artifact and re-trigger the import — no manual cleanup of half-written data is
+  required. A future maintainer must not assume all-or-nothing semantics.
 - **`PersonSummaryDTO` coupling.** `provisional` was added to the `PersonSummaryDTO` native
   interface projection; because the projection is backed by native SQL, the column had to be
   added to all three native `SELECT`s (`findAllWithDocumentCount`, `searchWithDocumentCount`,
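The two precedence rules recorded in the first hunk can be illustrated with a minimal Java sketch. Only the name `preferHuman` comes from the ADR; its signature here, the helpers `mergeProvisional` and `replaceWithCanonical`, and the class `PrecedenceSketch` are hypothetical. The sketch assumes "blank" means null or whitespace-only and treats any non-blank stored value as a human edit. It is not the project's implementation.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the two idempotency rules; not the real importer code.
final class PrecedenceSketch {

    // Scalar rule (Person/Tag): keep a non-blank stored value, otherwise
    // fill the blank field from the canonical value.
    static String preferHuman(String existing, String canonical) {
        return (existing == null || existing.isBlank()) ? canonical : existing;
    }

    // Monotonic-downward flag: the result is true only if both sides are true,
    // so a stored false (human-confirmed) can never revert to true.
    static boolean mergeProvisional(boolean existing, boolean incoming) {
        return existing && incoming;
    }

    // Collection rule (document receivers/tags): canonical-authoritative.
    // Clearing first means a shrunk canonical set prunes the removed links.
    static <T> void replaceWithCanonical(Set<T> links, List<T> canonical) {
        links.clear();
        links.addAll(canonical);
    }
}
```

The contrast is the point: the scalar helper never discards a stored value, while the collection helper always does, which is why the ADR states the "preserve human edits" rule does not extend to these collections.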
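The recovery story for the non-transactional orchestrator can be sketched in the same way. In the sketch below, `ImportStatus` and `FAILED` are names from the ADR, while `DONE`, the `Loader` interface, and the `runImport` signature are hypothetical. Each `load()` call stands in for a loader that commits its own transaction; the real code uses `@Transactional` boundaries, which are not modelled here.

```java
import java.util.List;

// Hypothetical illustration of per-loader commits with no outer transaction.
final class OrchestratorSketch {

    // FAILED appears in the ADR; DONE is an assumed success value.
    enum ImportStatus { DONE, FAILED }

    // Stand-in for one loader; each call is assumed to commit independently.
    interface Loader {
        void load();
    }

    // Runs the loaders in order. Nothing is rolled back when a later loader
    // throws: whatever the earlier loaders wrote stays committed.
    static ImportStatus runImport(List<Loader> loadersInOrder) {
        try {
            for (Loader loader : loadersInOrder) {
                loader.load();
            }
            return ImportStatus.DONE;
        } catch (RuntimeException e) {
            return ImportStatus.FAILED;
        }
    }
}
```

If every loader is an idempotent upsert, calling `runImport` again after fixing the offending artifact rewrites the already-committed rows to the same values and then completes the remaining loaders, so no manual cleanup step is needed.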