Importer file lookup walks the whole import dir per row (O(rows×tree)) #676
Context
Backlog item surfaced in the multi-persona review of PR #674 (Phase 3 modular importer, #669) — DevOps (Wendt) / Architect (Keller).
The DocumentImporter resolves each row's file via a recursive walk of the import directory (findFileRecursive), once per document row. With N rows and a directory tree of size T this is O(rows × tree): fine for the current corpus, but it scales poorly as either the document count or the import-folder size grows. This is a pre-existing shape (the old MassImportService matched files similarly), not introduced by #669, but worth fixing before the corpus grows.
Suggested approach
Walk the import directory once up front and build a Map<filename, path> (or Map<basename, List<path>> to surface ambiguity), then look each row up in O(1).
Out of scope
Not part of #669 — classified as a follow-up in that PR's review. No behavior change intended, only complexity; covered by the existing importer tests.
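Sketch
For illustration, the suggested approach could look roughly like the following. This is a minimal sketch, not the importer's actual code: the class name ImportFileIndex and its methods build and lookup are hypothetical, and it assumes rows reference files by plain file name. It walks the tree once, groups regular files by name, and leaves the handling of multiple matches to the caller, since the issue does not say how findFileRecursive resolves duplicates today.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: indexes the import directory once so each row becomes a map lookup. */
final class ImportFileIndex {

    // File name -> every path in the tree carrying that name.
    private final Map<String, List<Path>> byName;

    private ImportFileIndex(Map<String, List<Path>> byName) {
        this.byName = byName;
    }

    /** Walks the tree a single time (O(tree)) and groups regular files by file name. */
    static ImportFileIndex build(Path importDir) throws IOException {
        try (Stream<Path> paths = Files.walk(importDir)) {
            Map<String, List<Path>> index = paths
                    .filter(Files::isRegularFile)
                    .collect(Collectors.groupingBy(p -> p.getFileName().toString()));
            return new ImportFileIndex(index);
        }
    }

    /**
     * Expected O(1) per row. Returns all candidates: an empty list means "not found",
     * more than one entry means the name is ambiguous and the caller must decide.
     */
    List<Path> lookup(String fileName) {
        return byName.getOrDefault(fileName, List.of());
    }
}
```

With an index like this, the importer would call build once before iterating the rows and lookup once per row, so the total cost becomes O(tree + rows) instead of O(rows × tree). To keep the change behavior-neutral, the duplicate-name case would need to pick the same file the recursive walk picks today.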