Importer file lookup walks the whole import dir per row (O(rows×tree)) #676

Closed
opened 2026-05-27 11:21:18 +02:00 by marcel · 0 comments
Owner

Context

Backlog item surfaced in the multi-persona review of PR #674 (Phase 3 modular importer, #669) — DevOps (Wendt) / Architect (Keller).

The DocumentImporter resolves each row's file via a recursive walk of the import directory (findFileRecursive), once per document row. With N rows and a directory tree of T entries, the lookup cost is O(N × T), i.e. O(rows × tree). That is fine for the current corpus, but it scales poorly as either the document count or the import-folder size grows. This is a pre-existing shape (the old MassImportService matched files similarly), not introduced by #669, but it is worth fixing before the corpus grows.

Suggested approach

  • Walk the import directory once at the start of the document load, building a Map<filename, path> (or Map<basename, List<path>> to surface ambiguity), then look each row up in O(1).
  • Keep the existing security guard (basename validation + canonical-path containment within the import dir) when resolving from the map.
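A minimal sketch of the two points above, assuming the index lives in a small helper class. `ImportFileIndex`, `build`, `resolve`, and `candidates` are hypothetical names for illustration, not the importer's actual API, and the guard shown here only approximates the existing basename and containment checks:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical helper: a one-time index of the import directory, keyed by basename. */
final class ImportFileIndex {
    private final Path root;                       // canonical import directory
    private final Map<String, List<Path>> byName;  // basename -> every file with that name

    private ImportFileIndex(Path root, Map<String, List<Path>> byName) {
        this.root = root;
        this.byName = byName;
    }

    /** Walks the import directory once and groups regular files by basename. */
    static ImportFileIndex build(Path importDir) throws IOException {
        Path root = importDir.toRealPath();
        Map<String, List<Path>> byName = new HashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .forEach(p -> byName
                     .computeIfAbsent(p.getFileName().toString(), k -> new ArrayList<>())
                     .add(p));
        }
        return new ImportFileIndex(root, byName);
    }

    /** All files sharing this basename, so the caller can report ambiguity. */
    List<Path> candidates(String fileName) {
        return List.copyOf(byName.getOrDefault(fileName, List.of()));
    }

    /** Expected O(1) lookup; empty if the name is invalid, missing, ambiguous, or escapes the root. */
    Optional<Path> resolve(String fileName) throws IOException {
        // Basename validation (assumed shape of the guard): reject anything path-like.
        if (fileName == null || fileName.isBlank()
                || fileName.contains("/") || fileName.contains("\\")
                || fileName.equals(".") || fileName.equals("..")) {
            return Optional.empty();
        }
        List<Path> matches = byName.getOrDefault(fileName, List.of());
        if (matches.size() != 1) {
            return Optional.empty(); // not found, or more than one file with this name
        }
        // Canonical-path containment: the resolved file must still live under the import dir.
        Path real = matches.get(0).toRealPath();
        return real.startsWith(root) ? Optional.of(real) : Optional.empty();
    }
}
```

Under these assumptions the importer would call `ImportFileIndex.build(importDir)` once before iterating the rows and `resolve(row's file name)` per row, which brings the total cost to roughly O(T + N). Whether an ambiguous basename should fail the row or fall back to some ordering is a decision for the existing importer behavior; the sketch simply returns empty and exposes `candidates` for reporting.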

Out of scope

Not part of #669; classified as a follow-up in that PR's review. No behavior change is intended, only a reduction in lookup complexity, so the existing importer tests should cover the change.

marcel added the P3-later and refactor labels 2026-05-27 11:21:29 +02:00
Reference: marcel/familienarchiv#676