Canonical importer can leave orphaned S3 objects when a document's file changes #675
Context
Backlog item surfaced in the multi-persona review of PR #674 (Phase 3 modular importer, #669) — DevOps (Wendt).
The DocumentImporter uploads a document's file to S3 on the PLACEHOLDER → UPLOADED transition. Nothing cleans up the previous object if a document that already had a file is later re-imported with a different file (or its file column changes). The old S3 object becomes orphaned: it consumes storage and is never referenced again.
Bounded, not urgent: per ADR-025, once a document has a file the importer largely leaves it alone, so this only bites on the relatively rare "file changed for an existing index" path. There is no data-loss or correctness impact; this is purely storage hygiene.
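A minimal Python sketch of how the orphan arises. Everything in it is hypothetical and for illustration only: Document, naive_upload, the dict standing in for the S3 bucket, and the docs/<index>/vN.pdf key layout are not the importer's actual code. It uploads twice for the same index (the "file changed" path), then takes the set difference between bucket keys and referenced paths.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical stand-in for the S3 bucket: object key -> object bytes.
Bucket = Dict[str, bytes]


@dataclass
class Document:
    """Hypothetical document row: an index plus the S3 key of its file, if any."""
    index: str
    file_path: Optional[str] = None


def naive_upload(doc: Document, key: str, data: bytes, bucket: Bucket) -> None:
    """Upload the new object and repoint the row, mirroring the behaviour described
    in the issue: whatever object doc.file_path referenced before stays in the bucket."""
    bucket[key] = data
    doc.file_path = key


bucket: Bucket = {}
doc = Document(index="A-001")
naive_upload(doc, "docs/A-001/v1.pdf", b"first", bucket)
naive_upload(doc, "docs/A-001/v2.pdf", b"second", bucket)  # file changed on re-import

# Orphans are keys in the bucket that no document references any more.
orphans = set(bucket) - {doc.file_path}
print(sorted(orphans))  # ['docs/A-001/v1.pdf']
```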
Suggested approach
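Either:
1. When a re-import changes an existing document's file, delete the previous file_path object from S3 (within the same transaction boundary / after-commit hook), or
2. Run a scheduled reconciliation that deletes S3 objects not referenced by any documents.file_path.
Prefer the inline delete-on-replace if it's cheap; otherwise a scheduled reconciliation.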
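A sketch of both options in Python, under stated assumptions. The Transaction class is a toy stand-in for whatever after-commit hook the codebase has (for example Django's transaction.on_commit, if the project happens to use Django); Document, replace_file, find_orphans and the dict-backed bucket are all hypothetical names, not existing code. The key point is ordering: the old object is deleted only once the row update that stops referencing it has committed, so a failed import never leaves a row pointing at a deleted object.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Bucket = Dict[str, bytes]  # hypothetical stand-in for the S3 bucket: key -> bytes


@dataclass
class Document:
    """Hypothetical document row."""
    index: str
    file_path: Optional[str] = None


@dataclass
class Transaction:
    """Toy transaction: callbacks registered with on_commit run only when commit() runs."""
    _callbacks: List[Callable[[], None]] = field(default_factory=list)

    def on_commit(self, fn: Callable[[], None]) -> None:
        self._callbacks.append(fn)

    def commit(self) -> None:
        for fn in self._callbacks:
            fn()
        self._callbacks.clear()


def replace_file(doc: Document, key: str, data: bytes, bucket: Bucket, tx: Transaction) -> None:
    """Option 1, delete-on-replace: upload the new object, repoint the row, and
    defer deletion of the previous object until the transaction commits."""
    old_key = doc.file_path
    bucket[key] = data
    doc.file_path = key
    # Skip the delete when the key is unchanged, otherwise we would remove the new file.
    if old_key is not None and old_key != key:
        # pop() with a default makes the delete idempotent if the object is already gone.
        tx.on_commit(lambda: bucket.pop(old_key, None))


def find_orphans(bucket: Bucket, referenced_paths: Set[str]) -> Set[str]:
    """Option 2, reconciliation: keys in the bucket that no file_path references."""
    return set(bucket) - referenced_paths


bucket: Bucket = {}
doc = Document(index="A-001")
tx = Transaction()
replace_file(doc, "docs/A-001/v1.pdf", b"first", bucket, tx)
tx.commit()
replace_file(doc, "docs/A-001/v2.pdf", b"second", bucket, tx)
assert "docs/A-001/v1.pdf" in bucket  # old object survives until commit
tx.commit()
print(sorted(bucket))  # ['docs/A-001/v2.pdf']
print(find_orphans(bucket, {"docs/A-001/v2.pdf"}))  # set()
```

One gap the inline approach does not close: if the transaction rolls back after the new object was uploaded, that new object is itself unreferenced. A periodic find_orphans-style sweep catches both cases, which is one argument for keeping the reconciliation even if delete-on-replace is adopted.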
Out of scope
Not part of #669 — that PR's review explicitly classified this as a follow-up. No migration involved.