docs(import): document index-based PDF resolution in ADR-025 and DEPLOYMENT

File resolution is now by index (<index>.pdf), not by the datei/file
column. Update the ADR-025 security sub-decision and its consequence
(the recursive walk and the file column are gone; a bad index skips its
row with a loud SkipReason; a symlink escape still aborts via the
containment assertion) and DEPLOYMENT §6 (PDFs must be named
<index>.pdf and live flat in the import dir).

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
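
A minimal sketch of the resolution rule described above, written in Python for brevity. It is not the importer's code. The index pattern and the names `resolve_pdf`, `Resolution` and `ContainmentViolation` are hypothetical. Only the `<index>.pdf` lookup, the `SkipReason` skip, the placeholder fallback and the containment assertion come from this commit. The sketch validates the index, looks up `<index>.pdf` directly in the import dir, treats a missing file as a placeholder, and aborts if the canonical path escapes the dir.

```python
import logging
import re
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional

log = logging.getLogger("import")

# HYPOTHETICAL validation rule: letters (Unicode allowed, e.g. "Mü"), a hyphen,
# then digits, as in "W-0124". The importer's real pattern is not shown here.
INDEX_PATTERN = re.compile(r"[^\W\d_]+-\d+")


class SkipReason(Enum):  # only the name SkipReason comes from the commit
    INVALID_INDEX = "invalid index"


class ContainmentViolation(Exception):  # hypothetical name for the aborting assertion
    """The canonical PDF path escapes the import dir (e.g. via a symlink)."""


@dataclass
class Resolution:  # hypothetical result type
    pdf: Optional[Path] = None         # canonical path when <index>.pdf exists
    skip: Optional[SkipReason] = None  # set when the row is skipped
    # pdf is None and skip is None -> the document becomes a PLACEHOLDER


def resolve_pdf(import_dir: Path, index: str) -> Resolution:
    # 1. Strict index validation: a bad index skips only this row, loudly.
    if not INDEX_PATTERN.fullmatch(index):
        log.warning("skipping row: %s (%r)", SkipReason.INVALID_INDEX.value, index)
        return Resolution(skip=SkipReason.INVALID_INDEX)
    # 2. Direct lookup of importDir/<index>.pdf, with no recursive walk.
    root = import_dir.resolve(strict=True)
    candidate = root / f"{index}.pdf"
    if not candidate.exists():  # also False for a dangling symlink
        return Resolution()     # -> PLACEHOLDER
    # 3. Canonical-path containment: a symlink escape aborts the whole import.
    real = candidate.resolve(strict=True)
    if not real.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ContainmentViolation(f"{candidate} resolves to {real}, outside {root}")
    return Resolution(pdf=real)
```

The sketch keeps the asymmetry the commit describes. A malformed index costs one row. A path that resolves outside the import dir raises and stops the run.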
Marcel
2026-05-27 21:03:57 +02:00
committed by marcel
parent 32d9a33550
commit 658277e97c
2 changed files with 33 additions and 14 deletions

@@ -577,10 +577,15 @@ python3 -m venv .venv && .venv/bin/pip install -r requirements.txt # once, on
 # writes the four canonical artifacts into ./out/
 ```
-**Dev:** place all four canonical artifacts **plus** the referenced PDFs into `./import/`
+**Dev:** place all four canonical artifacts **plus** the PDFs into `./import/`
 at the repo root (the dev compose bind-mounts it to `/import`, which is `app.import.dir`).
-The orchestrator smoke-checks that all four artifacts are present before starting and fails
-closed (`IMPORT_ARTIFACT_INVALID`) if any is missing.
+Each PDF must be named `<index>.pdf` (e.g. `W-0124.pdf`, `Mü-0001.pdf`) and live flat in the
+import dir: since #686 the importer resolves a document's PDF directly by its index
+(`importDir/<index>.pdf`), not via a `datei`/`file` column — the recursive directory walk and
+its basename/homoglyph guards are gone, replaced by strict index validation plus a
+canonical-path containment assertion (a document whose `<index>.pdf` is absent simply becomes a
+`PLACEHOLDER`). The orchestrator smoke-checks that all four artifacts are present before
+starting and fails closed (`IMPORT_ARTIFACT_INVALID`) if any is missing.
 **Staging/production:**
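
The new DEPLOYMENT rule requires flat `<index>.pdf` files. Before an import, an operator could run a pre-flight check that lists three things: documents that would become `PLACEHOLDER`, PDFs that match no index, and PDFs in subdirectories. The importer no longer finds those last ones because the recursive walk is gone. The sketch below is hypothetical. It assumes you already have the list of indexes from the canonical artifacts, and this page does not show how to extract them. `preflight` and `PreflightReport` are illustrative names and are not part of the importer.

```python
from pathlib import Path
from typing import Iterable, NamedTuple


class PreflightReport(NamedTuple):  # hypothetical report type
    placeholders: list[str]  # indexes with no flat <index>.pdf -> PLACEHOLDER
    unmatched: list[str]     # flat *.pdf files whose stem matches no index
    nested: list[str]        # PDFs in subdirectories: ignored now (no recursive walk)


def preflight(import_dir: Path, indexes: Iterable[str]) -> PreflightReport:
    """Hypothetical pre-flight check; `indexes` would come from the canonical artifacts."""
    wanted = set(indexes)
    # Only files directly in the import dir count, matching importDir/<index>.pdf.
    flat = {p.stem for p in import_dir.glob("*.pdf") if p.is_file()}
    # Anything deeper is reported so it can be moved up before the import runs.
    nested = sorted(
        p.relative_to(import_dir).as_posix()
        for p in import_dir.rglob("*.pdf")
        if p.parent != import_dir
    )
    return PreflightReport(
        placeholders=sorted(wanted - flat),
        unmatched=sorted(flat - wanted),
        nested=nested,
    )
```

For example, `preflight(Path("./import"), ["W-0124", "Mü-0001"])` reports the dev import dir against two indexes.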