spec(import): decide and document mass-import operator policy (3 open questions) #534

Open
opened 2026-05-11 20:14:22 +02:00 by marcel · 0 comments

Type: Spec / operator policy (3 open questions to resolve)
Priority: P2-medium — answers should land before the first real prod import campaign so the runbook reflects intent
Source: review of #526 by Elicit (req engineer), comment #8647 (https://git.raddatz.cloud/marcel/familienarchiv/pulls/526#issuecomment-8647)
Parent PR: #526 (mass-import bind mount)

Summary

#526 unlocked the mass-import card on production but left three operator-policy questions unanswered. Decide each, document the answer in docs/DEPLOYMENT.md §6.4 (or a new ADR if architectural), and file implementation follow-ups only where current behaviour diverges from the decision.

Context

#526 unlocked the existing /admin/system mass-import card on production. The feature now works, but its operator policy is unspecified. These three questions should be answered before the first real import campaign so the runbook in docs/DEPLOYMENT.md §6.4 reflects an actual policy rather than mechanical steps.

Open Questions

OQ-IMP-001: Is the bind mount permanent or campaign-only?

#526 makes it permanent: staging and prod always have /import mounted, even when no import is running.

Trade-off:

  • Permanent: simpler ops, no env-file edits per campaign. Slight attack-surface increase (a compromised backend always has read access to IMPORT_HOST_DIR).
  • Campaign-only: point IMPORT_HOST_DIR at a tmpfs or empty dir between campaigns, then set it back to the real path before each import. Lower attack surface, more ops friction.

Recommendation: decide and document. If "permanent", make the doc explicit so future-Marcel doesn't second-guess.

OQ-IMP-002: Is there a size ceiling per import?

The current rsync payload is 2.6 GB across 1,607 files. MassImportService has no upper bound; it walks the entire /import tree.

Trade-off:

  • No cap: simple, fits the current corpus, fails naturally on OOM or a full disk.
  • Cap (e.g. 10k files or 50 GB): explicit rejection with a clear error rather than a silent slow-down.

Document the intended ceiling so future operators know whether 10× the current payload is in or out of scope.
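
To make the "Cap" option concrete, here is a minimal sketch of a pre-flight check that measures the tree before any import work starts and rejects it with a clear error. It is only an illustration under stated assumptions: the class ImportCeilingCheck, its method names, and the limits (10,000 files, 50 GiB, copied from the example above) are hypothetical and do not exist in MassImportService today.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical pre-flight check; nothing like this exists in MassImportService today.
final class ImportCeilingCheck {

    // Placeholder limits taken from the "Cap" option; the real values are what OQ-IMP-002 must decide.
    static final long MAX_FILES = 10_000;
    static final long MAX_BYTES = 50L * 1024 * 1024 * 1024; // 50 GiB

    record Totals(long files, long bytes) {}

    // Walks the tree once and sums the count and size of all regular files.
    static Totals measure(Path root) throws IOException {
        long[] acc = new long[2]; // [0] = file count, [1] = total bytes
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                try {
                    acc[0]++;
                    acc[1] += Files.size(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return new Totals(acc[0], acc[1]);
    }

    // Throws with an explicit message instead of letting the import slow down or run out of memory.
    static void requireWithinCeiling(Totals t, long maxFiles, long maxBytes) {
        if (t.files() > maxFiles || t.bytes() > maxBytes) {
            throw new IllegalStateException(String.format(
                "Import rejected: %d files / %d bytes exceeds ceiling of %d files / %d bytes",
                t.files(), t.bytes(), maxFiles, maxBytes));
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : "/import");
        Totals totals = measure(root);
        requireWithinCeiling(totals, MAX_FILES, MAX_BYTES);
        System.out.println("Within ceiling: " + totals);
    }
}
```

If the decision is "no cap", the measure step alone could still be useful in the runbook as a way to record payload size before a campaign.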

OQ-IMP-003: What's the recovery story for a partial import?

If MassImportService fails halfway through (e.g. S3 write error on file 800/1607), ImportStatus reports FAILED with a message, but the database already has roughly 800 new rows and 800 new S3 objects. On a re-run, does the de-dup via findByOriginalFilename correctly skip the files already imported and resume at the first missing one? Or does it leave the placeholder rows in an inconsistent state, for example a row without a matching S3 object?

Possible answers, each with operator-doc implications:

  • Resumable by design — re-run picks up where it left off. Acceptable; document the de-dup behaviour explicitly.
  • Manual cleanup required — operator must truncate documents / clean S3 before retry. Worse, but document the steps.
  • Transactional — all-or-nothing. Best, but requires changes to the service.
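
The "resumable by design" option can be sketched as follows. This is an in-memory model, not the real service: DocumentStore, ObjectStorage and runImport are hypothetical stand-ins, and only the name findByOriginalFilename comes from the issue. It shows the property a resumable import would need: a row is written only after its upload succeeded, so a re-run can skip by filename without finding half-written entries. Whether MassImportService actually orders its writes this way is exactly what OQ-IMP-003 has to verify.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model of a resumable import; not the real MassImportService.
final class ResumableImportSketch {

    // Stand-in for the documents table, keyed by original filename.
    static final class DocumentStore {
        private final Map<String, String> objectKeyByFilename = new LinkedHashMap<>();

        Optional<String> findByOriginalFilename(String filename) {
            return Optional.ofNullable(objectKeyByFilename.get(filename));
        }

        void save(String filename, String objectKey) {
            objectKeyByFilename.put(filename, objectKey);
        }

        int size() {
            return objectKeyByFilename.size();
        }
    }

    // Stand-in for the S3 client; put() may throw to simulate a write error.
    interface ObjectStorage {
        String put(String filename);
    }

    // Imports every file that has no row yet and returns how many were newly imported.
    static int runImport(List<String> filenames, DocumentStore store, ObjectStorage storage) {
        int imported = 0;
        for (String filename : filenames) {
            if (store.findByOriginalFilename(filename).isPresent()) {
                continue; // already imported by an earlier run
            }
            String objectKey = storage.put(filename); // upload first
            store.save(filename, objectKey);          // record the row only after the upload succeeded
            imported++;
        }
        return imported;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.pdf", "b.pdf", "c.pdf");
        DocumentStore store = new DocumentStore();
        int first = runImport(files, store, name -> "objects/" + name);
        int second = runImport(files, store, name -> "objects/" + name);
        System.out.println("first run imported " + first + ", second run imported " + second);
    }
}
```

If the real service writes the row before the upload, a failure between the two steps leaves a row that the filename check would wrongly treat as done, which is the case where the "manual cleanup" answer applies.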

Acceptance criteria

  • OQ-IMP-001 answered in docs/DEPLOYMENT.md §6.4 — "permanent" vs "campaign-only" stated explicitly with rationale
  • OQ-IMP-002 answered in docs/DEPLOYMENT.md §6.4 — intended size ceiling stated (or "no cap, fails naturally" stated)
  • OQ-IMP-003 answered in docs/DEPLOYMENT.md §6.4 — recovery story documented; if "resumable by design", de-dup behaviour explained; if "manual cleanup", steps enumerated
  • Where the current behaviour matches the decision, no code change needed
  • Where current behaviour diverges, an implementation follow-up issue is filed and linked here

Linked NFRs

  • Operability: Operator-facing features MUST have documented policy for mount lifecycle, size ceilings, and partial-failure recovery before first production use.
  • Documentation: docs/DEPLOYMENT.md is the canonical runbook for production operations.

Definition of Ready

  • All three questions scoped and trade-offs articulated
  • Acceptance criteria explicit per question
  • Documentation location identified (docs/DEPLOYMENT.md §6.4)
  • Follow-up-issue policy stated

🤖 Generated with Claude Code during /implement on #526

marcel added the P2-medium and documentation labels 2026-05-11 20:36:44 +02:00
Reference: marcel/familienarchiv#534