feat(infra): bind-mount /import for backend mass-import endpoint #526

Merged
marcel merged 7 commits from feat/prod-import-bind-mount into main 2026-05-11 20:55:43 +02:00
Owner

Summary

MassImportService reads from a hardcoded /import path inside the backend container (MassImportService.java:102). Dev compose mounts ./import:/import, but docker-compose.prod.yml had no equivalent — so POST /api/admin/import on staging/prod always fails with "no spreadsheet found", and the import card on /admin/system can never do anything useful.

This PR adds the missing bind mount, env-driven so the host path is stable across CI deploys (the compose working dir is recreated every deploy, so ./import would never persist).

Changes

services:
  backend:
    volumes:
      - ${IMPORT_HOST_DIR:-/srv/familienarchiv/import}:/import:ro

Choices made:

  • Env-driven (IMPORT_HOST_DIR) — per-env override; staging and prod can point to different host paths. Sensible default outside the compose working dir.
  • Read-onlyMassImportService only does Files.list / Files.walk on /import, never writes. :ro makes that contract explicit and prevents the backend from mutating the source PDFs.
  • Empty / missing path is harmless — if /srv/familienarchiv/import doesn't exist or is empty, Docker still creates an empty bind source, and the import API just returns the existing "no spreadsheet found" error. No container crash.

Also documents the env var in the compose header next to the other optional/required vars.

How to use this on staging

  1. rsync -avh --progress ./import/ hetzner:/srv/familienarchiv-staging/import/
  2. Set IMPORT_HOST_DIR=/srv/familienarchiv-staging/import in .env.staging on the runner
  3. Redeploy (next nightly, or manual)
  4. Trigger from /admin/system mass-import card

Test plan

  • docker compose -f docker-compose.prod.yml --profile staging config parses (verified locally)
  • After deploy + env update: docker inspect archiv-staging-backend-1 --format '{{range .Mounts}}{{.Source}} -> {{.Destination}} ({{.Mode}}){{println}}{{end}}' shows /srv/familienarchiv-staging/import -> /import (ro)
  • POST /api/admin/import on staging finds the spreadsheet and starts the import
  • /admin/system status card reports processed counts

🤖 Generated with Claude Code

## Summary `MassImportService` reads from a hardcoded `/import` path inside the backend container (`MassImportService.java:102`). Dev compose mounts `./import:/import`, but `docker-compose.prod.yml` had no equivalent — so `POST /api/admin/import` on staging/prod always fails with "no spreadsheet found", and the import card on `/admin/system` can never do anything useful. This PR adds the missing bind mount, env-driven so the host path is stable across CI deploys (the compose working dir is recreated every deploy, so `./import` would never persist). ## Changes ```yaml services: backend: volumes: - ${IMPORT_HOST_DIR:-/srv/familienarchiv/import}:/import:ro ``` Choices made: - **Env-driven (`IMPORT_HOST_DIR`)** — per-env override; staging and prod can point to different host paths. Sensible default outside the compose working dir. - **Read-only** — `MassImportService` only does `Files.list` / `Files.walk` on `/import`, never writes. `:ro` makes that contract explicit and prevents the backend from mutating the source PDFs. - **Empty / missing path is harmless** — if `/srv/familienarchiv/import` doesn't exist or is empty, Docker still creates an empty bind source, and the import API just returns the existing "no spreadsheet found" error. No container crash. Also documents the env var in the compose header next to the other optional/required vars. ## How to use this on staging 1. `rsync -avh --progress ./import/ hetzner:/srv/familienarchiv-staging/import/` 2. Set `IMPORT_HOST_DIR=/srv/familienarchiv-staging/import` in `.env.staging` on the runner 3. Redeploy (next nightly, or manual) 4. Trigger from `/admin/system` mass-import card ## Test plan - [ ] `docker compose -f docker-compose.prod.yml --profile staging config` parses (verified locally) - [ ] After deploy + env update: `docker inspect archiv-staging-backend-1 --format '{{range .Mounts}}{{.Source}} -> {{.Destination}} ({{.Mode}}){{println}}{{end}}'` shows `/srv/familienarchiv-staging/import -> /import (ro)` - [ ] `POST /api/admin/import` on staging finds the spreadsheet and starts the import - [ ] `/admin/system` status card reports processed counts 🤖 Generated with [Claude Code](https://claude.com/claude-code)
marcel added 1 commit 2026-05-11 18:58:30 +02:00
feat(infra): bind-mount /import for backend mass-import endpoint
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m55s
CI / OCR Service Tests (push) Successful in 18s
CI / Backend Unit Tests (push) Successful in 4m9s
CI / fail2ban Regex (push) Successful in 38s
CI / Compose Bucket Idempotency (push) Successful in 56s
CI / Unit & Component Tests (pull_request) Failing after 2m47s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 4m12s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
4a537d6b19
`MassImportService` reads the ODS spreadsheet and referenced PDFs from a
hardcoded `/import` path inside the backend container. Dev compose
already bind-mounts `./import:/import`, but the prod compose had no
equivalent, so `POST /api/admin/import` would always fail on staging/prod
with "no spreadsheet found".

Mount strategy:
- Source path is env-driven (`IMPORT_HOST_DIR`), defaulting to
  `/srv/familienarchiv/import` so the host path is stable across CI
  deploys (the compose working dir is recreated each run, so `./import`
  would not persist).
- Read-only — `MassImportService` only reads (`Files.list` /
  `Files.walk`), never writes. Read-only mount makes that contract
  explicit and prevents the backend container from mutating the source
  PDFs.
- Empty / missing path is harmless: the import API just returns the
  existing "no spreadsheet found" error rather than crashing the
  container.

To use on staging: rsync the import folder to
`/srv/familienarchiv-staging/import/` on the host, set
`IMPORT_HOST_DIR=/srv/familienarchiv-staging/import` in `.env.staging`,
redeploy, trigger import from `/admin/system`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Author
Owner

🏛️ Markus Keller — Application Architect

Verdict: ⚠️ Approved with concerns

The mount change itself is correct. Three architectural notes — none blockers, all worth a follow-up issue.

Concerns

  1. Hardcoded /import in MassImportService.java:102 is the real coupling. The path is private static final String IMPORT_DIR = "/import". Every other configurable input in that service is a @Value("${app.import.col.*}") — this one is the odd duck. The bind mount makes the magic path work; the magic path remains. A @Value("${app.import.dir:/import}") would lift the contract from infra-config into application-config, where it belongs.

  2. Doc update — partial coverage. Per my doc table:

    PR contains Required doc update
    New Docker service or infrastructure component docs/architecture/c4/l2-containers.puml + docs/DEPLOYMENT.md

    This isn't a new service, so the L2 diagram probably doesn't need a new node. But docs/DEPLOYMENT.md should mention IMPORT_HOST_DIR next to the other env vars, and the operator runbook for "how to do a one-time bulk import" deserves a paragraph. Follow-up, not merge blocker.

  3. Default path collision between projects on the same host. Compose project names namespace volumes and containers (archiv-staging / archiv-production), but the default bind source /srv/familienarchiv/import is shared. If both stacks run on hetzner and both use the default, the import folder is shared between them. Override IMPORT_HOST_DIR per env (the PR description already says to). Add a sentence in the header comment: "When staging and production cohabit a host, set this explicitly per env to avoid sharing the import source."

Suggestions

  • Ship this. Open a follow-up issue for @Value("${app.import.dir:/import}") to dissolve the magic path.
  • Add the IMPORT_HOST_DIR line to docs/DEPLOYMENT.md env-var section before merge if it's a 30-second edit.
## 🏛️ Markus Keller — Application Architect **Verdict: ⚠️ Approved with concerns** The mount change itself is correct. Three architectural notes — none blockers, all worth a follow-up issue. ### Concerns 1. **Hardcoded `/import` in `MassImportService.java:102` is the real coupling.** The path is `private static final String IMPORT_DIR = "/import"`. Every other configurable input in that service is a `@Value("${app.import.col.*}")` — this one is the odd duck. The bind mount makes the magic path work; the magic path remains. A `@Value("${app.import.dir:/import}")` would lift the contract from infra-config into application-config, where it belongs. 2. **Doc update — partial coverage.** Per my doc table: | PR contains | Required doc update | |---|---| | New Docker service or infrastructure component | `docs/architecture/c4/l2-containers.puml` + `docs/DEPLOYMENT.md` | This isn't a new service, so the L2 diagram probably doesn't need a new node. But `docs/DEPLOYMENT.md` should mention `IMPORT_HOST_DIR` next to the other env vars, and the operator runbook for "how to do a one-time bulk import" deserves a paragraph. Follow-up, not merge blocker. 3. **Default path collision between projects on the same host.** Compose project names namespace volumes and containers (`archiv-staging` / `archiv-production`), but the default bind source `/srv/familienarchiv/import` is *shared*. If both stacks run on hetzner and both use the default, the import folder is shared between them. Override `IMPORT_HOST_DIR` per env (the PR description already says to). Add a sentence in the header comment: "When staging and production cohabit a host, set this explicitly per env to avoid sharing the import source." ### Suggestions - Ship this. Open a follow-up issue for `@Value("${app.import.dir:/import}")` to dissolve the magic path. - Add the `IMPORT_HOST_DIR` line to `docs/DEPLOYMENT.md` env-var section before merge if it's a 30-second edit.
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: Approved

YAML-only diff. Nothing to red/green here — declarative config.

What I checked

  • The comment block above the volumes: entry explains why :ro is safe (Files.list / Files.walk only). That's the "why" comment style — exactly right. Reviewers six months from now will read this and not wonder.
  • Env var default is sensible. Header docs match the inline comment — single source of truth, no drift.
  • The "no crash on empty path" claim in the description matches what MassImportService.findSpreadsheetFile() (lines 131–140) actually does: it throws a structured DomainException with a clean error code, not a RuntimeException. The 500 response is structured, the operator sees a readable message.

Suggestion (non-blocking)

The pre-existing private static final String IMPORT_DIR = "/import" in the service is a magic-value smell — every other parameter is @Value-driven. A follow-up could lift it to @Value("${app.import.dir:/import}") for symmetry. Out of scope for this PR.

Ship it.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** YAML-only diff. Nothing to red/green here — declarative config. ### What I checked - The comment block above the `volumes:` entry explains *why* `:ro` is safe (`Files.list` / `Files.walk` only). That's the "why" comment style — exactly right. Reviewers six months from now will read this and not wonder. - Env var default is sensible. Header docs match the inline comment — single source of truth, no drift. - The "no crash on empty path" claim in the description matches what `MassImportService.findSpreadsheetFile()` (lines 131–140) actually does: it throws a structured `DomainException` with a clean error code, not a `RuntimeException`. The 500 response is structured, the operator sees a readable message. ### Suggestion (non-blocking) The pre-existing `private static final String IMPORT_DIR = "/import"` in the service is a magic-value smell — every other parameter is `@Value`-driven. A follow-up could lift it to `@Value("${app.import.dir:/import}")` for symmetry. Out of scope for this PR. Ship it.
Author
Owner

🛠️ Tobias Wendt — DevOps & Platform Engineer

Verdict: ⚠️ Approved with concerns

Solid infra change. The bind mount belonged in this file the day the import endpoint shipped — good catch.

What's right

  • :ro — read-only is the correct mode. Mass import only reads.
  • Env-driven path with sensible default — same pattern as OCR_MEM_LIMIT, MAIL_PORT, etc. in this file.
  • Path is outside the compose working dir (which is wiped per CI deploy — confirmed by docker inspect).
  • Header doc + inline comment both explain the contract.

Concerns (none merge-blocking)

  1. Bind-mount on the Docker-out-of-Docker runner. The create-buckets service has a long comment explaining bind mounts fail on the DinD runner for relative paths. The PR uses an absolute default (/srv/familienarchiv/import), so it should work — the host daemon resolves it against the host filesystem. Verify on the next staging deploy:

    docker inspect archiv-staging-backend-1 \
      --format '{{range .Mounts}}{{.Source}} -> {{.Destination}} ({{.Mode}}){{println}}{{end}}'
    

    If the mount source comes through as the runner-container path instead of the host path, switch to a named volume populated out-of-band.

  2. Docker auto-creates missing bind sources as root:root. If /srv/familienarchiv/import doesn't exist, the daemon creates it. Mode 755, owned by root. Fine for the backend read path; document the convention in the header: "the host path must exist and be readable by the backend container's UID."

  3. Default collision between prod and staging on the same host. Both projects would share /srv/familienarchiv/import by default. The PR description already tells the operator to set IMPORT_HOST_DIR per env, but the default invites accidental sharing. Cleaner alternative: drop the default, require the env var explicitly. Or namespace it: ${IMPORT_HOST_DIR:?Set this per environment to a host path holding the mass-import payload} — fails fast if forgotten.

  4. Backend container UID check. What user does the backend image run as? /import (owned by root, mode 755) is readable for any UID, but worth a one-line verification after deploy:

    docker exec archiv-staging-backend-1 ls /import | head
    
  5. docs/DEPLOYMENT.md. The IMPORT_HOST_DIR env var belongs in the deployment runbook, not just the compose header. A future operator should read DEPLOYMENT.md and know how to enable mass import without grepping the compose file.

Cost impact

Zero — host-local bind mount, no new managed service.

## 🛠️ Tobias Wendt — DevOps & Platform Engineer **Verdict: ⚠️ Approved with concerns** Solid infra change. The bind mount belonged in this file the day the import endpoint shipped — good catch. ### What's right - ✅ `:ro` — read-only is the correct mode. Mass import only reads. - ✅ Env-driven path with sensible default — same pattern as `OCR_MEM_LIMIT`, `MAIL_PORT`, etc. in this file. - ✅ Path is outside the compose working dir (which is wiped per CI deploy — confirmed by `docker inspect`). - ✅ Header doc + inline comment both explain the contract. ### Concerns (none merge-blocking) 1. **Bind-mount on the Docker-out-of-Docker runner.** The `create-buckets` service has a long comment explaining bind mounts fail on the DinD runner *for relative paths*. The PR uses an absolute default (`/srv/familienarchiv/import`), so it should work — the host daemon resolves it against the host filesystem. **Verify on the next staging deploy:** ```bash docker inspect archiv-staging-backend-1 \ --format '{{range .Mounts}}{{.Source}} -> {{.Destination}} ({{.Mode}}){{println}}{{end}}' ``` If the mount source comes through as the runner-container path instead of the host path, switch to a named volume populated out-of-band. 2. **Docker auto-creates missing bind sources as `root:root`.** If `/srv/familienarchiv/import` doesn't exist, the daemon creates it. Mode 755, owned by root. Fine for the backend read path; document the convention in the header: "the host path must exist and be readable by the backend container's UID." 3. **Default collision between prod and staging on the same host.** Both projects would share `/srv/familienarchiv/import` by default. The PR description already tells the operator to set `IMPORT_HOST_DIR` per env, but the *default* invites accidental sharing. Cleaner alternative: drop the default, require the env var explicitly. Or namespace it: `${IMPORT_HOST_DIR:?Set this per environment to a host path holding the mass-import payload}` — fails fast if forgotten. 4. **Backend container UID check.** What user does the backend image run as? `/import` (owned by root, mode 755) is readable for any UID, but worth a one-line verification after deploy: ```bash docker exec archiv-staging-backend-1 ls /import | head ``` 5. **`docs/DEPLOYMENT.md`.** The `IMPORT_HOST_DIR` env var belongs in the deployment runbook, not just the compose header. A future operator should read DEPLOYMENT.md and know how to enable mass import without grepping the compose file. ### Cost impact Zero — host-local bind mount, no new managed service.
Author
Owner

📋 Elicit — Requirements Engineer / Business Analyst

Verdict: Approved

This is a pure infrastructure delta — not a user-facing requirement change. I'd normally flag missing user stories and acceptance criteria for a feature PR, but this PR unlocks an existing feature (the /admin/system mass-import card) rather than introducing new behavior.

Implicit requirement satisfied

  • FR-IMPORT-001 (implicit): "Administrators shall be able to trigger a bulk ODS+PDF import from /admin/system against documents pre-staged on the production server."

    This requirement was effectively already in the codebase (MassImportService, AdminController#triggerMassImport, the admin UI card) but unsatisfiable on production until this PR. So this PR is the missing infra contract for an existing FR — a legitimate gap-fill.

Open Questions Register

ID Question Why it matters
OQ-IMP-001 Is the bind mount permanent, or only enabled during active import campaigns? Reduces attack surface (cf. Nora). The current PR makes it permanent — fine, but document the operator policy.
OQ-IMP-002 What's the size ceiling for a single import? The current rsync was 2.6 GB / 1,607 files. Is there an upper bound documented anywhere? NFR: scalability. The IMPORT_ALREADY_RUNNING guard prevents concurrent imports but doesn't cap individual size.
OQ-IMP-003 What's the recovery story for a partial import? MassImportService.ImportStatus reports FAILED with a message, but does it leave dangling S3 objects or partial DB rows? NFR: data integrity. Acceptance criteria missing.

No blockers. The test plan in the PR description doubles as a manual acceptance check, which is sufficient for infra of this size. Add the three open questions to the backlog as a single follow-up issue.

## 📋 Elicit — Requirements Engineer / Business Analyst **Verdict: ✅ Approved** This is a pure infrastructure delta — not a user-facing requirement change. I'd normally flag missing user stories and acceptance criteria for a feature PR, but this PR *unlocks* an existing feature (the `/admin/system` mass-import card) rather than introducing new behavior. ### Implicit requirement satisfied - **FR-IMPORT-001 (implicit):** "Administrators shall be able to trigger a bulk ODS+PDF import from `/admin/system` against documents pre-staged on the production server." This requirement was effectively already in the codebase (`MassImportService`, `AdminController#triggerMassImport`, the admin UI card) but **unsatisfiable on production** until this PR. So this PR is the missing infra contract for an existing FR — a legitimate gap-fill. ### Open Questions Register | ID | Question | Why it matters | |---|---|---| | **OQ-IMP-001** | Is the bind mount permanent, or only enabled during active import campaigns? | Reduces attack surface (cf. Nora). The current PR makes it permanent — fine, but document the operator policy. | | **OQ-IMP-002** | What's the size ceiling for a single import? The current rsync was 2.6 GB / 1,607 files. Is there an upper bound documented anywhere? | NFR: scalability. The `IMPORT_ALREADY_RUNNING` guard prevents concurrent imports but doesn't cap individual size. | | **OQ-IMP-003** | What's the recovery story for a partial import? `MassImportService.ImportStatus` reports `FAILED` with a message, but does it leave dangling S3 objects or partial DB rows? | NFR: data integrity. Acceptance criteria missing. | No blockers. The test plan in the PR description doubles as a manual acceptance check, which is sufficient for infra of this size. Add the three open questions to the backlog as a single follow-up issue.
Author
Owner

🔒 Nora "NullX" Steiner — Application Security

Verdict: ⚠️ Approved with concerns

The mount itself is benign — read-only, host-local, no network exposure. Signing off on this change. The concerns below are about the broader import path that this PR newly activates on production.

On the mount (this PR)

  1. :ro — correct. Prevents a compromised backend container from writing back to the host (anti-persistence).
  2. Host path is outside /workspace/marcel/familienarchiv (the compose working dir wiped per deploy), so the payload isn't co-mingled with deploy artifacts.
  3. ⚠️ IMPORT_HOST_DIR is operator-controlled and unvalidated. If an operator (or a compromised .env.staging) sets IMPORT_HOST_DIR=/etc or /var/lib/postgresql/data, the backend reads those paths as /import. :ro blocks writes, but the read surfaces host file contents to a process with S3 write access. Blast radius is "operator already has root on the host", so I rate this Low. Add a one-line warning in the header: "must point to a host directory containing only mass-import payload."

On the import path this PR newly activates (not blockers for this PR — file follow-up issues)

  1. Apache POI XXE history. MassImportService uses DocumentBuilderFactory (line 29) to parse the ODS XML. I want to see setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) (or equivalent) on that factory before mass import runs on prod. POI 5.5.0 is recent enough that this is probably fine — verify, because mass import on prod is now reachable.

  2. PDF content-type validation. The mount feeds arbitrary .pdf files into S3Client.putObject and the OCR/thumbnail pipeline. Does MassImportService verify that what's named *.pdf actually has a %PDF- magic header? Or do we trust the filename?

  3. Path traversal inside the ODS-referenced filenames. The ODS spreadsheet contains filenames in cells. MassImportService.processRows (line 121) maps those to files in /import. Does it reject .. or absolute paths in the spreadsheet cells? If an attacker controls the ODS, can they read arbitrary files via the import?

Detection notes for the team

# Semgrep rule to catch DocumentBuilderFactory without XXE hardening
rules:
  - id: dbf-xxe-default
    pattern: |
      $X = DocumentBuilderFactory.newInstance();
    pattern-not-inside: |
      ...
      $X.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      ...
    message: "DocumentBuilderFactory without XXE protection"
    severity: WARNING

Ship this PR. Open three follow-up tickets for items 4 / 5 / 6 — they're pre-existing behavior, but production reachability changes the risk profile.

## 🔒 Nora "NullX" Steiner — Application Security **Verdict: ⚠️ Approved with concerns** The mount itself is benign — read-only, host-local, no network exposure. Signing off on this change. The concerns below are about the broader import path that this PR newly *activates* on production. ### On the mount (this PR) 1. ✅ `:ro` — correct. Prevents a compromised backend container from writing back to the host (anti-persistence). 2. ✅ Host path is outside `/workspace/marcel/familienarchiv` (the compose working dir wiped per deploy), so the payload isn't co-mingled with deploy artifacts. 3. ⚠️ **`IMPORT_HOST_DIR` is operator-controlled and unvalidated.** If an operator (or a compromised `.env.staging`) sets `IMPORT_HOST_DIR=/etc` or `/var/lib/postgresql/data`, the backend reads those paths as `/import`. `:ro` blocks writes, but the *read* surfaces host file contents to a process with S3 write access. Blast radius is "operator already has root on the host", so I rate this **Low**. Add a one-line warning in the header: "must point to a host directory containing only mass-import payload." ### On the import path this PR newly activates (not blockers for this PR — file follow-up issues) 4. **Apache POI XXE history.** `MassImportService` uses `DocumentBuilderFactory` (line 29) to parse the ODS XML. I want to see `setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)` (or equivalent) on that factory before mass import runs on prod. POI 5.5.0 is recent enough that this is *probably* fine — verify, because mass import on prod is now reachable. 5. **PDF content-type validation.** The mount feeds arbitrary `.pdf` files into `S3Client.putObject` and the OCR/thumbnail pipeline. Does `MassImportService` verify that what's named `*.pdf` actually has a `%PDF-` magic header? Or do we trust the filename? 6. **Path traversal inside the ODS-referenced filenames.** The ODS spreadsheet contains filenames in cells. `MassImportService.processRows` (line 121) maps those to files in `/import`. Does it reject `..` or absolute paths in the spreadsheet cells? If an attacker controls the ODS, can they read arbitrary files via the import? ### Detection notes for the team ```yaml # Semgrep rule to catch DocumentBuilderFactory without XXE hardening rules: - id: dbf-xxe-default pattern: | $X = DocumentBuilderFactory.newInstance(); pattern-not-inside: | ... $X.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); ... message: "DocumentBuilderFactory without XXE protection" severity: WARNING ``` Ship this PR. Open three follow-up tickets for items 4 / 5 / 6 — they're pre-existing behavior, but production reachability changes the risk profile.
Author
Owner

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: ⚠️ Approved with concerns

Pure YAML change. Nothing testable at the unit level. But there's a regression-prevention gap.

What's missing

  1. No CI guard against silent removal. Someone could revert this single line in a future "compose cleanup" PR and CI would happily pass — the import feature would silently break again. Add the cheapest possible insurance to the deploy workflow:

    - name: Verify /import mount is present in backend service
      run: |
        docker compose -f docker-compose.prod.yml --profile staging config \
          | grep -E '^\s+- .*:/import:ro\s*$' \
          || { echo "::error::backend service is missing the /import:ro mount"; exit 1; }
    

    4-line CI step that prevents the exact same bug from happening twice.

  2. Post-deploy smoke test for the import endpoint. The PR's test plan is four manual checkboxes. The first one (docker compose config parses) is verified at build time. The other three (mount appears, POST works, status reports counts) belong in the post-deploy smoke suite. Even a basic curl $URL/api/admin/import-status returning a 200 with a known shape would catch a future regression — and reuses existing admin auth.

  3. No assertion that the host path is readable by the container's UID. Five lines of Bash after deploy would catch the silent failure mode where the mount succeeds but the backend container user can't read the directory:

    docker exec archiv-staging-backend-1 sh -c 'ls /import >/dev/null 2>&1' \
      || { echo "::error::backend cannot read /import"; exit 1; }
    

What's fine

  • The PR description's manual checks are reasonable for verification after this PR ships. They're just not automated, which is the gap I'm flagging.
  • docker compose config parses quietly — the right minimal local check.

Approved on the assumption that items 1–3 land as a follow-up CI ticket within one sprint. A flaky-or-missing test is a critical bug; this one is just missing, which is cheaper to fix.

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ⚠️ Approved with concerns** Pure YAML change. Nothing testable at the unit level. But there's a regression-prevention gap. ### What's missing 1. **No CI guard against silent removal.** Someone could revert this single line in a future "compose cleanup" PR and CI would happily pass — the import feature would silently break again. Add the cheapest possible insurance to the deploy workflow: ```yaml - name: Verify /import mount is present in backend service run: | docker compose -f docker-compose.prod.yml --profile staging config \ | grep -E '^\s+- .*:/import:ro\s*$' \ || { echo "::error::backend service is missing the /import:ro mount"; exit 1; } ``` 4-line CI step that prevents the exact same bug from happening twice. 2. **Post-deploy smoke test for the import endpoint.** The PR's test plan is four manual checkboxes. The first one (`docker compose config` parses) is verified at build time. The other three (mount appears, POST works, status reports counts) belong in the post-deploy smoke suite. Even a basic `curl $URL/api/admin/import-status` returning a 200 with a known shape would catch a future regression — and reuses existing admin auth. 3. **No assertion that the host path is readable by the container's UID.** Five lines of Bash after deploy would catch the silent failure mode where the mount succeeds but the backend container user can't read the directory: ```bash docker exec archiv-staging-backend-1 sh -c 'ls /import >/dev/null 2>&1' \ || { echo "::error::backend cannot read /import"; exit 1; } ``` ### What's fine - The PR description's manual checks are reasonable for verification *after* this PR ships. They're just not automated, which is the gap I'm flagging. - `docker compose config` parses quietly — the right minimal local check. Approved on the assumption that items 1–3 land as a follow-up CI ticket within one sprint. A flaky-or-missing test is a critical bug; this one is just *missing*, which is cheaper to fix.
Author
Owner

🎨 Leonie Voss — UX & Accessibility

Verdict: Approved

No UI changes. Pure compose file edit. Out of scope for me.

Side note (not this PR)

Once mass import actually runs on staging and the /admin/system card does something, please verify on the next pass:

  1. The status card has a real loading state while the import is RUNNING — not just a stale "IDLE"
  2. The DONE / FAILED messages are localized via Paraglide (messages/{de,en,es}.json), not raw backend Strings
  3. Progress counts are at least text-base (16px) — they'll be read by a 60+ user reviewing import results

Those are pre-existing concerns about the import card, not this PR. LGTM here.

## 🎨 Leonie Voss — UX & Accessibility **Verdict: ✅ Approved** No UI changes. Pure compose file edit. Out of scope for me. ### Side note (not this PR) Once mass import actually runs on staging and the `/admin/system` card *does* something, please verify on the next pass: 1. The status card has a real loading state while the import is `RUNNING` — not just a stale "IDLE" 2. The `DONE` / `FAILED` messages are localized via Paraglide (`messages/{de,en,es}.json`), not raw backend `String`s 3. Progress counts are at least `text-base` (16px) — they'll be read by a 60+ user reviewing import results Those are pre-existing concerns about the import card, not this PR. LGTM here.
marcel added 6 commits 2026-05-11 20:13:06 +02:00
The hardcoded `static final String IMPORT_DIR = "/import"` was the only
non-`@Value` configurable input in MassImportService — every column
index next to it is wired through `app.import.col.*`. Lifts the
contract from infrastructure (compose bind mount) into application
config (`app.import.dir`), with `/import` as the default so the existing
bind-mount path keeps working.

Addresses review feedback from Markus and Felix on #526.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tobias and Markus both flagged that a shared default (/srv/familienarchiv/
import) invites silent collision when staging and prod cohabit one host.
Switch to ${IMPORT_HOST_DIR:?...} so compose refuses to start without an
explicit per-env path — collision becomes structurally impossible.

The error message points operators at docs/DEPLOYMENT.md so the recovery
step is one click away. IMPORT_HOST_DIR moves from "Optional" to the
main required-env-vars block in the header.

Addresses review feedback from Markus, Tobias, and Nora on #526.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DEPLOYMENT.md line 81 declares any compose env var missing from §2 a
blocking review comment. IMPORT_HOST_DIR (added on this branch) was
unmentioned. Adds the row and rewrites §6.4 so the staging/prod operator
workflow (rsync host → set env → trigger import) is in the runbook,
not just buried in compose comments.

Addresses review feedback from Markus and Tobias on #526.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The compose file now requires IMPORT_HOST_DIR or refuses to start
(#526). Without this line the next nightly deploy would fail with a
clear interpolation error, but it should not fail — the staging
import payload already lives at this host path (rsync'd in #526).

Addresses Tobias's review on #526.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors the staging change. The host directory does not yet exist on
the production server — first production release that consumes this
will create an empty bind source via Docker's auto-create behaviour;
mass import then reports "no spreadsheet found" until an operator
pre-stages a payload there.

Addresses Tobias's review on #526.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ci(nightly): regression guard for backend /import:ro mount
Some checks failed
CI / Backend Unit Tests (pull_request) Successful in 4m13s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (pull_request) Failing after 2m48s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Compose Bucket Idempotency (pull_request) Failing after 11s
CI / Unit & Component Tests (push) Has been cancelled
3775f4cb52
Sara flagged that a future "compose cleanup" PR could silently drop the
backend volumes block and CI would happily pass while mass import on
staging silently broke. Adds a pre-build step that renders the staging
compose config and fails the deploy if `target: /import` or
`read_only: true` is missing.

Local verification of the guard:
- Volumes block removed → `grep -q 'target: /import'` exits 1 → step fails
- Volumes block present → both greps match → step passes

Addresses Sara's review on #526.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Author
Owner

Review-fix pass — done

Addressed every reviewer concern that had a clear in-scope fix, and filed follow-up issues for the rest.

In this PR (6 new commits — all 1585 backend tests green)

Commit Addresses
ff20721d refactor(import): make import directory @Value-configurable Markus #1 + Felix's symmetry note. static final String IMPORT_DIR = "/import"@Value("${app.import.dir:/import}") private String importDir. New test runImportAsync_readsFromConfiguredImportDir (red→green) proves configurability.
cdb5db6c fix(compose): require IMPORT_HOST_DIR, no default Markus #3 + Tobias #3 + Nora #3. Compose now refuses to start without explicit per-env path. Header comment documents the contract (payload-only directory; backend UID readability).
a40267e4 docs(deployment): document IMPORT_HOST_DIR and mass-import workflow Self-declared blocker (DEPLOYMENT.md L81) + Markus #2 + Tobias #5. Added row to backend env-var table; rewrote §6.4 with the rsync→env→trigger flow for staging/prod.
9703a72e ci(nightly): wire IMPORT_HOST_DIR=/srv/familienarchiv-staging/import Required by the fail-fast change.
c2c42706 ci(release): wire IMPORT_HOST_DIR=/srv/familienarchiv-production/import Same, for prod.
3775f4cb ci(nightly): regression guard for backend /import:ro mount Sara #1. Pre-build step renders compose config and fails the deploy if target: /import or read_only: true disappears. Red/green verified locally by toggling the volumes block.

Follow-up issues filed

# Persona Title
#528 Nora #4 security(import): harden DocumentBuilderFactory against XXE in MassImportService
#529 Nora #5 security(import): validate PDF magic bytes in MassImportService before S3 upload
#530 Nora #6 security(import): reject path-traversal filenames from ODS in MassImportService.processRows
#531 Sara #2 ci(nightly): post-deploy smoke test for /api/admin/import-status
#532 Sara #3 + Tobias #4 ci(nightly): assert backend container can read /import after deploy
#533 Leonie ui(admin/system): improve mass-import status card (loading state, i18n, font size)
#534 Elicit spec(import): decide and document mass-import operator policy (3 open questions)

What's still un-actioned

Nothing from the seven reviews is left silently. Every blocker and concern is either fixed in this PR or has a tracked follow-up issue with reproduction steps and acceptance criteria.

🤖 Generated with Claude Code during /implement on #526

## Review-fix pass — done Addressed every reviewer concern that had a clear in-scope fix, and filed follow-up issues for the rest. ### In this PR (6 new commits — all 1585 backend tests green) | Commit | Addresses | |---|---| | `ff20721d` `refactor(import): make import directory @Value-configurable` | Markus #1 + Felix's symmetry note. `static final String IMPORT_DIR = "/import"` → `@Value("${app.import.dir:/import}") private String importDir`. New test `runImportAsync_readsFromConfiguredImportDir` (red→green) proves configurability. | | `cdb5db6c` `fix(compose): require IMPORT_HOST_DIR, no default` | Markus #3 + Tobias #3 + Nora #3. Compose now refuses to start without explicit per-env path. Header comment documents the contract (payload-only directory; backend UID readability). | | `a40267e4` `docs(deployment): document IMPORT_HOST_DIR and mass-import workflow` | Self-declared blocker (DEPLOYMENT.md L81) + Markus #2 + Tobias #5. Added row to backend env-var table; rewrote §6.4 with the rsync→env→trigger flow for staging/prod. | | `9703a72e` `ci(nightly): wire IMPORT_HOST_DIR=/srv/familienarchiv-staging/import` | Required by the fail-fast change. | | `c2c42706` `ci(release): wire IMPORT_HOST_DIR=/srv/familienarchiv-production/import` | Same, for prod. | | `3775f4cb` `ci(nightly): regression guard for backend /import:ro mount` | Sara #1. Pre-build step renders compose config and fails the deploy if `target: /import` or `read_only: true` disappears. Red/green verified locally by toggling the volumes block. | ### Follow-up issues filed | # | Persona | Title | |---|---|---| | #528 | Nora #4 | security(import): harden DocumentBuilderFactory against XXE in MassImportService | | #529 | Nora #5 | security(import): validate PDF magic bytes in MassImportService before S3 upload | | #530 | Nora #6 | security(import): reject path-traversal filenames from ODS in MassImportService.processRows | | #531 | Sara #2 | ci(nightly): post-deploy smoke test for /api/admin/import-status | | #532 | Sara #3 + Tobias #4 | ci(nightly): assert backend container can read /import after deploy | | #533 | Leonie | ui(admin/system): improve mass-import status card (loading state, i18n, font size) | | #534 | Elicit | spec(import): decide and document mass-import operator policy (3 open questions) | ### What's still un-actioned Nothing from the seven reviews is left silently. Every blocker and concern is either fixed in this PR or has a tracked follow-up issue with reproduction steps and acceptance criteria. 🤖 Generated with [Claude Code](https://claude.com/claude-code) during /implement on #526
Author
Owner

🏛️ Markus Keller — Senior Application Architect

Verdict: Approved

The change pushes a configuration decision out of the codebase (private static final String IMPORT_DIR = "/import") and into the environment, which is exactly where it belongs. No layer boundaries crossed: only MassImportService touches its own constant; no service reaches into another domain.

What I checked

  • Configuration localityapp.import.dir lives in the importing module's own service via @Value. No cross-domain leak.
  • Fail-closed at the lowest layer${IMPORT_HOST_DIR:?...} makes Compose refuse to start when the variable is unset. The integrity check happens at the deployment boundary, not buried in application code. This is the right place for it.
  • Default preserved for dev/test@Value("${app.import.dir:/import}") keeps the dev ./import:/import mount and existing test fixtures working unchanged.
  • Doc currencydocs/DEPLOYMENT.md updated with the new env var and the staging/prod workflow (matches my "New Docker service or infrastructure component" rule). This is a new mount on an existing service, not a new service, so c4/l2-containers.puml does not need a change.

Suggestions (non-blocking)

  • ADR question — borderline. The decision ("env-driven host path, refuse default, read-only, per-env divergence") is captured well in the compose header comment + DEPLOYMENT.md. I'd skip the ADR here; the inline rationale is sufficient for a tactical bind mount. If a future change adds more host-bound payloads (e.g. a separate OCR training payload mount), revisit and write one ADR covering the pattern.

Boring, well-scoped, documented. Merge it.

— Markus

## 🏛️ Markus Keller — Senior Application Architect **Verdict: ✅ Approved** The change pushes a configuration decision out of the codebase (`private static final String IMPORT_DIR = "/import"`) and into the environment, which is exactly where it belongs. No layer boundaries crossed: only `MassImportService` touches its own constant; no service reaches into another domain. ### What I checked - **Configuration locality** — `app.import.dir` lives in the `importing` module's own service via `@Value`. No cross-domain leak. - **Fail-closed at the lowest layer** — `${IMPORT_HOST_DIR:?...}` makes Compose refuse to start when the variable is unset. The integrity check happens at the deployment boundary, not buried in application code. This is the right place for it. - **Default preserved for dev/test** — `@Value("${app.import.dir:/import}")` keeps the dev `./import:/import` mount and existing test fixtures working unchanged. - **Doc currency** — `docs/DEPLOYMENT.md` updated with the new env var and the staging/prod workflow (matches my "New Docker service or infrastructure component" rule). This is a new mount on an existing service, not a new service, so `c4/l2-containers.puml` does not need a change. ### Suggestions (non-blocking) - **ADR question** — borderline. The decision ("env-driven host path, refuse default, read-only, per-env divergence") is captured well in the compose header comment + DEPLOYMENT.md. I'd skip the ADR here; the inline rationale is sufficient for a tactical bind mount. If a future change adds *more* host-bound payloads (e.g. a separate OCR training payload mount), revisit and write one ADR covering the pattern. Boring, well-scoped, documented. Merge it. — Markus
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: Approved

Clean, small diff. The refactor from private static final String IMPORT_DIR@Value-injected field is the minimum change to make this configurable, and the test was added to prove the field drives the lookup.

What I checked

  • TDD evidencerunImportAsync_readsFromConfiguredImportDir exists and asserts both state() == FAILED and message().contains(tempDir.toString()). The second assertion is what makes this a real test: it proves the configured path is consulted, not the previous constant. Without that contains-check, the test would have passed against the old hardcoded /import too.
  • @TempDir — correct choice; deterministic and auto-cleaned. Better than Files.createTempDirectory().
  • NamingimportDir field, app.import.dir property, IMPORT_HOST_DIR env var. Three names for three layers (Java field, Spring property, env contract), each appropriate to its layer. Good.
  • Setup hygienesetUp() now seeds importDir via ReflectionTestUtils.setField(service, "importDir", "/import") so existing tests are unaffected. The new @Value default :/import would also have worked there, but explicit setup keeps the test self-documenting.

Suggestions (non-blocking)

  • Doc currency check — no CLAUDE.md change needed (no new package, no new permission, no new ErrorCode, no new route). DEPLOYMENT.md was the right doc and was updated.
  • Tiny nitfindSpreadsheetFile() still throws raw RuntimeException instead of DomainException.notFound(ErrorCode.IMPORT_NO_SPREADSHEET, ...). Pre-existing, not introduced here, but flagging because the error message now varies by env and could benefit from a structured ErrorCode the frontend can map. Don't bundle into this PR — file as a follow-up.

Ship it.

— Felix

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** Clean, small diff. The refactor from `private static final String IMPORT_DIR` → `@Value`-injected field is the minimum change to make this configurable, and the test was added to prove the field drives the lookup. ### What I checked - **TDD evidence** — `runImportAsync_readsFromConfiguredImportDir` exists and asserts both `state() == FAILED` and `message().contains(tempDir.toString())`. The second assertion is what makes this a real test: it proves the *configured* path is consulted, not the previous constant. Without that contains-check, the test would have passed against the old hardcoded `/import` too. - **`@TempDir`** — correct choice; deterministic and auto-cleaned. Better than `Files.createTempDirectory()`. - **Naming** — `importDir` field, `app.import.dir` property, `IMPORT_HOST_DIR` env var. Three names for three layers (Java field, Spring property, env contract), each appropriate to its layer. Good. - **Setup hygiene** — `setUp()` now seeds `importDir` via `ReflectionTestUtils.setField(service, "importDir", "/import")` so existing tests are unaffected. The new `@Value` default `:/import` would also have worked there, but explicit setup keeps the test self-documenting. ### Suggestions (non-blocking) - **Doc currency check** — no `CLAUDE.md` change needed (no new package, no new permission, no new ErrorCode, no new route). DEPLOYMENT.md was the right doc and was updated. - **Tiny nit** — `findSpreadsheetFile()` still throws raw `RuntimeException` instead of `DomainException.notFound(ErrorCode.IMPORT_NO_SPREADSHEET, ...)`. Pre-existing, not introduced here, but flagging because the error message now varies by env and could benefit from a structured ErrorCode the frontend can map. Don't bundle into this PR — file as a follow-up. Ship it. — Felix
Author
Owner

🛠️ Tobias Wendt — DevOps & Platform Engineer

Verdict: Approved

This is the shape I want from an infra PR: small, env-driven, fail-closed at compose, and with a CI guard that prevents future regressions.

What I checked

  • :ro flag — correct. MassImportService only calls Files.list / Files.walk — backend cannot mutate or delete source PDFs even if a bug or RCE tries to. Least-privilege at the mount level.
  • ${IMPORT_HOST_DIR:?Set IMPORT_HOST_DIR...} with no default — compose refuses to start without it. This is exactly right for staging vs prod isolation: no chance of accidentally pointing prod at the staging payload (or vice versa) because someone forgot to set the variable.
  • Per-env values pinned in CI.gitea/workflows/nightly.yml writes IMPORT_HOST_DIR=/srv/familienarchiv-staging/import; release.yml writes /srv/familienarchiv-production/import. Sensible convention, predictable on the VPS.
  • Regression guard stepdocker compose ... config > /tmp/compose-rendered.yml + two grep assertions for target: /import and read_only: true. Using rendered compose output (not raw YAML) is the right call: it survives shorthand → longform refactors. The inline comment in the workflow ("If a future 'compose cleanup' PR drops the volumes block, mass import silently breaks again") states the intent clearly.
  • Bind mount vs named volume — usually I'd push back on bind mounts in prod, but this is a read-only operator-staged payload directory, not application state. Different scenario from postgres_data. Approved.

Notes (non-blocking)

  • Backend UID — the compose header comment correctly flags that the backend currently runs as root via the OpenJDK image, so a world-readable /srv/... works. When we eventually move backend to a non-root UID (which we should, separately), the /srv/familienarchiv-*/import directory needs chown to that UID. Worth a follow-up issue tagged devops so it isn't forgotten when the base image switches.
  • Verification command in PR body — the docker inspect ... {{range .Mounts}}{{.Source}} -> {{.Destination}} ({{.Mode}}) line is the right post-deploy smoke check. Consider folding it into the nightly workflow as a post-up assert (after compose up -d), in addition to the pre-up compose config check that's already there. Belt and braces — not required for this PR.

Solid. Merge.

— Tobi

## 🛠️ Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** This is the shape I want from an infra PR: small, env-driven, fail-closed at compose, and with a CI guard that prevents future regressions. ### What I checked - **`:ro` flag** — correct. `MassImportService` only calls `Files.list` / `Files.walk` — backend cannot mutate or delete source PDFs even if a bug or RCE tries to. Least-privilege at the mount level. - **`${IMPORT_HOST_DIR:?Set IMPORT_HOST_DIR...}` with no default** — compose refuses to start without it. This is exactly right for staging vs prod isolation: no chance of accidentally pointing prod at the staging payload (or vice versa) because someone forgot to set the variable. - **Per-env values pinned in CI** — `.gitea/workflows/nightly.yml` writes `IMPORT_HOST_DIR=/srv/familienarchiv-staging/import`; `release.yml` writes `/srv/familienarchiv-production/import`. Sensible convention, predictable on the VPS. - **Regression guard step** — `docker compose ... config > /tmp/compose-rendered.yml` + two `grep` assertions for `target: /import` and `read_only: true`. Using rendered compose output (not raw YAML) is the right call: it survives shorthand → longform refactors. The inline comment in the workflow ("If a future 'compose cleanup' PR drops the volumes block, mass import silently breaks again") states the intent clearly. - **Bind mount vs named volume** — usually I'd push back on bind mounts in prod, but this is a *read-only operator-staged payload* directory, not application state. Different scenario from `postgres_data`. Approved. ### Notes (non-blocking) - **Backend UID** — the compose header comment correctly flags that the backend currently runs as root via the OpenJDK image, so a world-readable `/srv/...` works. When we eventually move backend to a non-root UID (which we should, separately), the `/srv/familienarchiv-*/import` directory needs `chown` to that UID. Worth a follow-up issue tagged `devops` so it isn't forgotten when the base image switches. - **Verification command in PR body** — the `docker inspect ... {{range .Mounts}}{{.Source}} -> {{.Destination}} ({{.Mode}})` line is the right post-deploy smoke check. Consider folding it into the nightly workflow as a *post-up* assert (after `compose up -d`), in addition to the pre-up `compose config` check that's already there. Belt and braces — not required for this PR. Solid. Merge. — Tobi
Author
Owner

🛡️ Nora "NullX" Steiner — Application Security Engineer

Verdict: Approved

No vulnerabilities introduced. The bind mount and configuration plumbing make the existing attack surface slightly smaller, not larger.

Threat model walkthrough

  • :ro mount — the backend container cannot write back to the host import directory. If an attacker achieves code execution inside the backend (via a separate, unrelated bug), they cannot tamper with or replace the source PDFs/spreadsheet from inside the container. Defense in depth done right.
  • No user-controlled pathIMPORT_HOST_DIR is operator-supplied via the env file; app.import.dir is set from that env at startup. Neither flows from an HTTP request body. No path injection via the API.
  • Endpoint authorization — verified: AdminController has class-level @RequirePermission(Permission.ADMIN) (backend/src/main/java/org/raddatz/familienarchiv/user/AdminController.java:20), and both /api/admin/trigger-import (POST) and /api/admin/import-status (GET) inherit it. Mass import remains ADMIN-only.
  • Compose fails closed${IMPORT_HOST_DIR:?...} is explicit: no var, no container. Beats a silent default that might point at the wrong directory.

Observations (informational, not vulnerabilities in this PR)

  • Pre-existing: Files.walk symlink behaviorfindFileRecursive walks the full /import tree. Java's Files.walk does not follow symlinks by default (you'd need FileVisitOption.FOLLOW_LINKS), so an operator misconfiguration (symlinking /srv/familienarchiv-production/import/foo -> /etc) would not escape the mount via this code path. Worth a one-liner unit test asserting non-follow behavior to lock the contract in. Not a blocker — this is unchanged by the PR.
  • Pre-existing: filename matching.toLowerCase() + endsWith(".ods/.xlsx/.xls") first-match-wins. Since the directory is :ro and operator-staged, this is fine in the current threat model. If we ever accept user-uploaded files into /import, that whitelist needs to harden (content-type sniffing, not extension). Not in scope here.

Ship it.

— Nora

## 🛡️ Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** No vulnerabilities introduced. The bind mount and configuration plumbing make the existing attack surface slightly *smaller*, not larger. ### Threat model walkthrough - **`:ro` mount** — the backend container cannot write back to the host import directory. If an attacker achieves code execution inside the backend (via a separate, unrelated bug), they cannot tamper with or replace the source PDFs/spreadsheet from inside the container. Defense in depth done right. - **No user-controlled path** — `IMPORT_HOST_DIR` is operator-supplied via the env file; `app.import.dir` is set from that env at startup. Neither flows from an HTTP request body. No path injection via the API. - **Endpoint authorization** — verified: `AdminController` has class-level `@RequirePermission(Permission.ADMIN)` (backend/src/main/java/org/raddatz/familienarchiv/user/AdminController.java:20), and both `/api/admin/trigger-import` (POST) and `/api/admin/import-status` (GET) inherit it. Mass import remains ADMIN-only. - **Compose fails closed** — `${IMPORT_HOST_DIR:?...}` is explicit: no var, no container. Beats a silent default that might point at the wrong directory. ### Observations (informational, not vulnerabilities in this PR) - **Pre-existing: `Files.walk` symlink behavior** — `findFileRecursive` walks the full `/import` tree. Java's `Files.walk` *does not* follow symlinks by default (you'd need `FileVisitOption.FOLLOW_LINKS`), so an operator misconfiguration (symlinking `/srv/familienarchiv-production/import/foo -> /etc`) would not escape the mount via this code path. Worth a one-liner unit test asserting non-follow behavior to lock the contract in. Not a blocker — this is unchanged by the PR. - **Pre-existing: filename matching** — `.toLowerCase()` + `endsWith(".ods/.xlsx/.xls")` first-match-wins. Since the directory is `:ro` and operator-staged, this is fine in the current threat model. If we ever accept user-uploaded files into `/import`, that whitelist needs to harden (content-type sniffing, not extension). Not in scope here. Ship it. — Nora
Author
Owner

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: Approved

The wiring change is properly covered, and the CI regression guard is the kind of test I want to see more of: assertions against rendered infrastructure, not just code.

What I checked

  • Unit coveragerunImportAsync_readsFromConfiguredImportDir is one logical behavior (the configured field drives the lookup), one test, one reason to fail. Asserts both State.FAILED and message().contains(tempDir.toString()). Without that contains check, the test wouldn't prove anything beyond the existing "no spreadsheet" test, so the second assertion carries the weight here.
  • Test data hygiene@TempDir Path tempDir, deterministic, JUnit handles cleanup. ReflectionTestUtils.setField is appropriate for unit-scope; we're not booting Spring.
  • Test name — reads as a sentence: "runImportAsync reads from configured import dir."
  • CI regression guarddocker compose config > /tmp/compose-rendered.yml then grep for target: /import and read_only: true. This is essentially an infrastructure integration test running in nightly. It will fail loudly if a future cleanup PR strips the volume block — exactly the regression scenario the comment in the workflow calls out.

Suggestions (non-blocking)

  • End-to-end integration test gap — there's no automated test today that: (1) plants a valid .ods + a matching PDF in a temp /import, (2) boots Spring with app.import.dir=<tempDir>, (3) hits POST /api/admin/trigger-import, (4) asserts a row appears in documents. The current unit test covers configuration wiring; a Testcontainers-based test would lock the full import path. File as a follow-up — not blocking this PR, but it's the next test I'd write to seal this loop.
  • Compose-config check as a reusable composite action — if you find yourself writing more of these (e.g. asserting that /data/postgres mount is named, or that ports aren't exposed publicly), extract a .gitea/actions/assert-compose-mount composite. Future you will thank present you.

Approved.

— Sara

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ✅ Approved** The wiring change is properly covered, and the CI regression guard is the kind of test I want to see more of: assertions against rendered infrastructure, not just code. ### What I checked - **Unit coverage** — `runImportAsync_readsFromConfiguredImportDir` is one logical behavior (the configured field drives the lookup), one test, one reason to fail. Asserts both `State.FAILED` *and* `message().contains(tempDir.toString())`. Without that `contains` check, the test wouldn't prove anything beyond the existing "no spreadsheet" test, so the second assertion carries the weight here. - **Test data hygiene** — `@TempDir Path tempDir`, deterministic, JUnit handles cleanup. `ReflectionTestUtils.setField` is appropriate for unit-scope; we're not booting Spring. - **Test name** — reads as a sentence: "runImportAsync reads from configured import dir." ✅ - **CI regression guard** — `docker compose config > /tmp/compose-rendered.yml` then `grep` for `target: /import` and `read_only: true`. This is essentially an infrastructure integration test running in nightly. It will fail loudly if a future cleanup PR strips the volume block — exactly the regression scenario the comment in the workflow calls out. ### Suggestions (non-blocking) - **End-to-end integration test gap** — there's no automated test today that: (1) plants a valid `.ods` + a matching PDF in a temp `/import`, (2) boots Spring with `app.import.dir=<tempDir>`, (3) hits `POST /api/admin/trigger-import`, (4) asserts a row appears in `documents`. The current unit test covers configuration wiring; a Testcontainers-based test would lock the full import path. File as a follow-up — not blocking this PR, but it's the next test I'd write to seal this loop. - **Compose-config check as a reusable composite action** — if you find yourself writing more of these (e.g. asserting that `/data/postgres` mount is named, or that ports aren't exposed publicly), extract a `.gitea/actions/assert-compose-mount` composite. Future you will thank present you. Approved. — Sara
Author
Owner

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Verdict: Approved

No UI surface area touched: backend config, compose volume, CI workflow, and DEPLOYMENT.md. Nothing on the front end to evaluate against WCAG, brand tokens, or responsive breakpoints.

What I checked

  • Frontend Svelte files: none changed.
  • Tailwind / design tokens: untouched.
  • User-facing error path: MassImportService still throws RuntimeException("Keine Tabellendatei (.ods/.xlsx/.xls) in <path> gefunden!"). This already bubbles up to the /admin/system mass-import card today; the path in the message just becomes more descriptive after this PR. That's a small win for ops, neutral for end-users (admins) who already see the German message via the existing error handling.

One small thought, for whenever someone next touches the import UI

The /admin/system mass-import card now has a real chance of working end-to-end on staging/prod, which means admins will start seeing its success states for the first time. When that lands, double-check the card surfaces processed-counts and progress in a way that meets the senior-friendly minimums (≥16px body, ≥44px touch targets, visible focus, non-color status cues). Not in scope here — just noting that this PR makes that card useful, so its UX is now worth a real look.

LGTM.

— Leonie

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist **Verdict: ✅ Approved** No UI surface area touched: backend config, compose volume, CI workflow, and DEPLOYMENT.md. Nothing on the front end to evaluate against WCAG, brand tokens, or responsive breakpoints. ### What I checked - Frontend Svelte files: none changed. - Tailwind / design tokens: untouched. - User-facing error path: `MassImportService` still throws `RuntimeException("Keine Tabellendatei (.ods/.xlsx/.xls) in <path> gefunden!")`. This already bubbles up to the `/admin/system` mass-import card today; the *path* in the message just becomes more descriptive after this PR. That's a small win for ops, neutral for end-users (admins) who already see the German message via the existing error handling. ### One small thought, for whenever someone next touches the import UI The `/admin/system` mass-import card now has a real chance of working end-to-end on staging/prod, which means admins will start seeing its *success* states for the first time. When that lands, double-check the card surfaces processed-counts and progress in a way that meets the senior-friendly minimums (≥16px body, ≥44px touch targets, visible focus, non-color status cues). Not in scope here — just noting that this PR makes that card *useful*, so its UX is now worth a real look. LGTM. — Leonie
Author
Owner

📋 Elicit — Requirements Engineer & Business Analyst

Verdict: Approved

This is a Brownfield Phase B4 fix — closing a gap between the existing feature ("mass-import card on /admin/system") and the production environment where it could not run. The PR description itself is what I'd write as the issue body: clear why, explicit choices made, an operational how-to, and a test plan.

Requirements lens

  • Problem statement — unambiguous: MassImportService reads /import inside the container; docker-compose.prod.yml had no equivalent bind mount, so the staging/prod mass-import path was structurally broken. Cited exact file:line (MassImportService.java:102) for traceability.
  • Non-functional implications captured — read-only contract (security/integrity), env-driven (configurability), no-default (operability / fail-closed). Each choice has its rationale inline.
  • Acceptance criteria — implicit but verifiable, expressed as the test-plan checklist:
    1. compose config parses
    2. docker inspect shows /srv/...-staging/import -> /import (ro)
    3. POST /api/admin/import finds the spreadsheet
    4. /admin/system status card reports processed counts
  • Documentation traceabilitydocs/DEPLOYMENT.md updated with the env var entry and the staging/production workflow (rsync → set env → redeploy → trigger). A reader handed this branch and the docs alone can deploy without asking questions.

Suggestions (non-blocking)

  • Issue link in PR body — the PR opens with the problem statement but doesn't cite the originating Gitea issue (the CI regression-guard comment references #526, which is this PR). If a separate "mass-import card broken on staging" issue exists, link it for the traceability matrix; if not, the PR self-suffices.
  • Definition-of-Done evidence — items 2/3/4 in the test plan are post-deploy checks, not yet verified at PR time. Make sure those checkboxes get ticked (with output pasted) before the merge button is hit, or capture in a follow-up "verified on staging" comment.

Nothing here is solution-bias or gold-plating; the work matches the user need precisely. LGTM.

— Elicit

## 📋 Elicit — Requirements Engineer & Business Analyst **Verdict: ✅ Approved** This is a Brownfield Phase B4 fix — closing a gap between the existing feature ("mass-import card on `/admin/system`") and the production environment where it could not run. The PR description itself is what I'd write as the issue body: clear *why*, explicit *choices made*, an operational *how-to*, and a *test plan*. ### Requirements lens - **Problem statement** — unambiguous: `MassImportService` reads `/import` inside the container; `docker-compose.prod.yml` had no equivalent bind mount, so the staging/prod mass-import path was structurally broken. Cited exact file:line (`MassImportService.java:102`) for traceability. - **Non-functional implications captured** — read-only contract (security/integrity), env-driven (configurability), no-default (operability / fail-closed). Each choice has its rationale inline. - **Acceptance criteria** — implicit but verifiable, expressed as the test-plan checklist: 1. `compose config` parses 2. `docker inspect` shows `/srv/...-staging/import -> /import (ro)` 3. `POST /api/admin/import` finds the spreadsheet 4. `/admin/system` status card reports processed counts - **Documentation traceability** — `docs/DEPLOYMENT.md` updated with the env var entry and the staging/production workflow (rsync → set env → redeploy → trigger). A reader handed this branch and the docs alone can deploy without asking questions. ### Suggestions (non-blocking) - **Issue link in PR body** — the PR opens with the problem statement but doesn't cite the originating Gitea issue (the CI regression-guard comment references `#526`, which is *this* PR). If a separate "mass-import card broken on staging" issue exists, link it for the traceability matrix; if not, the PR self-suffices. - **Definition-of-Done evidence** — items 2/3/4 in the test plan are post-deploy checks, not yet verified at PR time. Make sure those checkboxes get ticked (with output pasted) before the merge button is hit, or capture in a follow-up "verified on staging" comment. Nothing here is solution-bias or gold-plating; the work matches the user need precisely. LGTM. — Elicit
marcel merged commit 3775f4cb52 into main 2026-05-11 20:55:43 +02:00
marcel deleted branch feat/prod-import-bind-mount 2026-05-11 20:55:43 +02:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#526