chore(import): stop tracking real family PII canonical artifacts

The four files in tools/import-normalizer/out/ contain real names, addresses, and attribution prose for ~163 living/deceased family members and were committed by mistake. They are now removed from the index (kept on disk for local development) and gitignored.

The canonical artifacts are produced locally from the Python normalizer and synced into IMPORT_HOST_DIR out-of-band alongside the PDFs.

The contract between normalizer and importer is the header schema, not the file contents: CanonicalSheetReader fails closed on a missing header, which is what locks the contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
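To illustrate what "fails closed on a missing header" can mean in practice, here is a minimal Python sketch. It is not the project's CanonicalSheetReader (which this commit does not show). The CSV format, the column names in `REQUIRED_HEADERS`, and the function and exception names are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical header schema; the real column names are not shown in this commit.
REQUIRED_HEADERS = ("index", "title", "date")


class MissingHeaderError(ValueError):
    """Raised when a canonical sheet lacks a required column."""


def read_canonical_sheet(text, required=REQUIRED_HEADERS):
    """Parse a CSV sheet, refusing to return any rows if a header is missing."""
    reader = csv.DictReader(io.StringIO(text))
    present = reader.fieldnames or []  # None for an empty file
    missing = [name for name in required if name not in present]
    if missing:
        # Fail closed: reject the whole file rather than import partial data.
        raise MissingHeaderError(
            "missing required header(s): " + ", ".join(missing)
        )
    return list(reader)
```

Because the check runs before any row is returned, a sheet with a renamed or dropped column is rejected outright, so the header schema can serve as the contract without the files themselves being tracked.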
@@ -36,7 +36,9 @@
# accidentally share an import source. Must be
# readable by the backend container's UID
# (currently root via the OpenJDK image — any
-# world-readable directory works).
+# world-readable directory works). Canonical
+# artifacts are NOT in git (PII — ADR-025); ops
+# syncs them in beside the PDFs out-of-band.
networks:
archiv-net:
@@ -224,6 +226,10 @@ services:
# Read-only; the canonical importer only reads them from /import.
# Required — no default — so staging and prod cannot accidentally share an
# import source. CI workflows pin this per-env (see .gitea/workflows/).
+# NOTE: the canonical artifacts are NOT version-controlled (they contain real
+# family PII — see ADR-025). Ops must produce them locally from the Python
+# normalizer (tools/import-normalizer/) and sync them into this host path
+# alongside the <index>.pdf corpus before triggering an import.
volumes:
- ${IMPORT_HOST_DIR:?Set IMPORT_HOST_DIR to a host path holding the import payload (canonical artifacts + <index>.pdf files). See docs/DEPLOYMENT.md.}:/import:ro
environment:
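
Since the canonical artifacts now reach the host out-of-band, a pre-flight check before triggering an import may be useful. The following is a rough Python sketch, not part of this commit. The function name `check_import_dir` is hypothetical, and the artifact file names must be supplied by the caller because the diff does not list them. The empty-or-unset test mirrors the `${IMPORT_HOST_DIR:?...}` interpolation in the volume mapping, which makes Compose abort when the variable is unset or empty.

```python
import os
from pathlib import Path


def check_import_dir(env=None, artifact_names=()):
    """Verify the import payload directory and return its PDF file names.

    artifact_names is supplied by the caller; the real canonical artifact
    names are an assumption here and are not taken from the commit.
    """
    env = os.environ if env is None else env
    raw = env.get("IMPORT_HOST_DIR", "")
    if not raw:
        # Same condition as ${VAR:?msg}: unset or empty is an error.
        raise RuntimeError("IMPORT_HOST_DIR is not set")
    root = Path(raw)
    if not root.is_dir():
        raise RuntimeError(f"{root} is not a directory")
    missing = [name for name in artifact_names if not (root / name).is_file()]
    if missing:
        raise RuntimeError("missing canonical artifacts: " + ", ".join(missing))
    pdfs = sorted(p.name for p in root.glob("*.pdf"))
    if not pdfs:
        raise RuntimeError(f"no .pdf files found in {root}")
    return pdfs
```

Running such a check in the deployment workflow before the import step would surface a forgotten sync as an explicit error rather than as an importer failure inside the container.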