Import normalizer: offline tool to normalize the raw archive spreadsheets #663

Merged
marcel merged 172 commits from docs/import-migration into main 2026-05-28 15:05:51 +02:00
2 changed files with 6 additions and 1 deletion
Showing only changes of commit e4a154406e

@@ -592,7 +592,7 @@ closed (`IMPORT_ARTIFACT_INVALID`) if any is missing.
2. Make sure `IMPORT_HOST_DIR=<host-path>` is set in `.env.staging` / `.env.production` (the nightly/release workflows already write this — see §3). Compose refuses to start without it.
3. Redeploy the stack so the bind mount takes effect; if the mount is already in place, skip to step 4.
4. Call `POST /api/admin/trigger-import` (requires `ADMIN` permission), or click the "Import starten" button on `/admin/system`.
-5. The import runs asynchronously: poll `GET /api/admin/import-status`, watch `/admin/system`, or tail the backend logs. Re-running is safe: the import is idempotent (upsert by `source_ref` / document `index`) and never overwrites a human-edited field.
+5. The import runs asynchronously: poll `GET /api/admin/import-status`, watch `/admin/system`, or tail the backend logs. Re-running is safe and idempotent (upsert by `source_ref` / document `index`). Person and tag scalar fields you edited in the app are preserved on re-import; a document's sender/receivers/tags are **canonical-authoritative**: a re-import re-applies them to exactly match the export, so a link removed from the export is removed from the document (the raw sender/receiver cell text is always kept).
---
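
The re-import semantics described in step 5 can be illustrated with a minimal in-memory sketch. Everything here is hypothetical and only mirrors the wording of the docs: the `Person` and `Document` classes, the `edited_fields` set used to mark human edits, and the row dictionaries are assumptions for illustration, not the project's actual entities or persistence code.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    source_ref: str
    name: str
    # Hypothetical bookkeeping: names of scalar fields a human changed in the app.
    edited_fields: set = field(default_factory=set)


@dataclass
class Document:
    index: str
    raw_sender: str  # raw cell text from the export, always stored
    sender_refs: set = field(default_factory=set)
    tags: set = field(default_factory=set)


def upsert_person(store: dict, row: dict) -> Person:
    """Upsert by source_ref; a human-edited scalar field is left alone."""
    person = store.get(row["source_ref"])
    if person is None:
        person = Person(source_ref=row["source_ref"], name=row["name"])
        store[person.source_ref] = person
    elif "name" not in person.edited_fields:
        person.name = row["name"]
    return person


def upsert_document(store: dict, row: dict) -> Document:
    """Upsert by index; links are replaced so they exactly match the export."""
    doc = store.get(row["index"])
    if doc is None:
        doc = Document(index=row["index"], raw_sender=row["raw_sender"])
        store[doc.index] = doc
    doc.raw_sender = row["raw_sender"]
    # Canonical-authoritative: replace rather than merge, so a link that
    # disappeared from the export also disappears from the document.
    doc.sender_refs = set(row["sender_refs"])
    doc.tags = set(row["tags"])
    return doc
```

Running either function twice with the same row leaves the store unchanged, which is the idempotency the step relies on. The asymmetry is deliberate in this sketch: person fields are guarded by the edit marker, while document links are overwritten unconditionally.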

@@ -138,6 +138,11 @@ Settled sub-decisions:
the same state, so the operational recovery for a partial failure is simply to fix the
offending artifact and re-trigger the import — no manual cleanup of half-written data is
required. A future maintainer must not assume all-or-nothing semantics.
- **Path-escape aborts the whole import (fail-closed), by design.** A path-traversal or
symlink-escape in a row's file path is treated as an attack signal: the import aborts rather
than recording the row as a `SkippedFile` and continuing. This is a deliberate owner decision
(2026-05-27) over a per-file skip — a malicious path must surface loudly, not be silently
tolerated.
- **`PersonSummaryDTO` coupling.** `provisional` was added to the `PersonSummaryDTO` native
interface projection; because the projection is backed by native SQL, the column had to be
added to all three native `SELECT`s (`findAllWithDocumentCount`, `searchWithDocumentCount`,