feat(infra): bind-mount /import for backend mass-import endpoint #526
Reference in New Issue
Block a user
Delete Branch "feat/prod-import-bind-mount"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
MassImportServicereads from a hardcoded/importpath inside the backend container (MassImportService.java:102). Dev compose mounts./import:/import, butdocker-compose.prod.ymlhad no equivalent — soPOST /api/admin/importon staging/prod always fails with "no spreadsheet found", and the import card on/admin/systemcan never do anything useful.This PR adds the missing bind mount, env-driven so the host path is stable across CI deploys (the compose working dir is recreated every deploy, so
./importwould never persist).Changes
Choices made:
IMPORT_HOST_DIR) — per-env override; staging and prod can point to different host paths. Sensible default outside the compose working dir.MassImportServiceonly doesFiles.list/Files.walkon/import, never writes.:romakes that contract explicit and prevents the backend from mutating the source PDFs./srv/familienarchiv/importdoesn't exist or is empty, Docker still creates an empty bind source, and the import API just returns the existing "no spreadsheet found" error. No container crash.Also documents the env var in the compose header next to the other optional/required vars.
How to use this on staging
rsync -avh --progress ./import/ hetzner:/srv/familienarchiv-staging/import/IMPORT_HOST_DIR=/srv/familienarchiv-staging/importin.env.stagingon the runner/admin/systemmass-import cardTest plan
docker compose -f docker-compose.prod.yml --profile staging configparses (verified locally)docker inspect archiv-staging-backend-1 --format '{{range .Mounts}}{{.Source}} -> {{.Destination}} ({{.Mode}}){{println}}{{end}}'shows/srv/familienarchiv-staging/import -> /import (ro)POST /api/admin/importon staging finds the spreadsheet and starts the import/admin/systemstatus card reports processed counts🤖 Generated with Claude Code
🏛️ Markus Keller — Application Architect
Verdict: ⚠️ Approved with concerns
The mount change itself is correct. Three architectural notes — none blockers, all worth a follow-up issue.
Concerns
Hardcoded
/importinMassImportService.java:102is the real coupling. The path isprivate static final String IMPORT_DIR = "/import". Every other configurable input in that service is a@Value("${app.import.col.*}")— this one is the odd duck. The bind mount makes the magic path work; the magic path remains. A@Value("${app.import.dir:/import}")would lift the contract from infra-config into application-config, where it belongs.Doc update — partial coverage. Per my doc table:
docs/architecture/c4/l2-containers.puml+docs/DEPLOYMENT.mdThis isn't a new service, so the L2 diagram probably doesn't need a new node. But
docs/DEPLOYMENT.mdshould mentionIMPORT_HOST_DIRnext to the other env vars, and the operator runbook for "how to do a one-time bulk import" deserves a paragraph. Follow-up, not merge blocker.Default path collision between projects on the same host. Compose project names namespace volumes and containers (
archiv-staging/archiv-production), but the default bind source/srv/familienarchiv/importis shared. If both stacks run on hetzner and both use the default, the import folder is shared between them. OverrideIMPORT_HOST_DIRper env (the PR description already says to). Add a sentence in the header comment: "When staging and production cohabit a host, set this explicitly per env to avoid sharing the import source."Suggestions
@Value("${app.import.dir:/import}")to dissolve the magic path.IMPORT_HOST_DIRline todocs/DEPLOYMENT.mdenv-var section before merge if it's a 30-second edit.👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
YAML-only diff. Nothing to red/green here — declarative config.
What I checked
volumes:entry explains why:rois safe (Files.list/Files.walkonly). That's the "why" comment style — exactly right. Reviewers six months from now will read this and not wonder.MassImportService.findSpreadsheetFile()(lines 131–140) actually does: it throws a structuredDomainExceptionwith a clean error code, not aRuntimeException. The 500 response is structured, the operator sees a readable message.Suggestion (non-blocking)
The pre-existing
private static final String IMPORT_DIR = "/import"in the service is a magic-value smell — every other parameter is@Value-driven. A follow-up could lift it to@Value("${app.import.dir:/import}")for symmetry. Out of scope for this PR.Ship it.
🛠️ Tobias Wendt — DevOps & Platform Engineer
Verdict: ⚠️ Approved with concerns
Solid infra change. The bind mount belonged in this file the day the import endpoint shipped — good catch.
What's right
:ro— read-only is the correct mode. Mass import only reads.OCR_MEM_LIMIT,MAIL_PORT, etc. in this file.docker inspect).Concerns (none merge-blocking)
Bind-mount on the Docker-out-of-Docker runner. The
create-bucketsservice has a long comment explaining bind mounts fail on the DinD runner for relative paths. The PR uses an absolute default (/srv/familienarchiv/import), so it should work — the host daemon resolves it against the host filesystem. Verify on the next staging deploy:If the mount source comes through as the runner-container path instead of the host path, switch to a named volume populated out-of-band.
Docker auto-creates missing bind sources as
root:root. If/srv/familienarchiv/importdoesn't exist, the daemon creates it. Mode 755, owned by root. Fine for the backend read path; document the convention in the header: "the host path must exist and be readable by the backend container's UID."Default collision between prod and staging on the same host. Both projects would share
/srv/familienarchiv/importby default. The PR description already tells the operator to setIMPORT_HOST_DIRper env, but the default invites accidental sharing. Cleaner alternative: drop the default, require the env var explicitly. Or namespace it:${IMPORT_HOST_DIR:?Set this per environment to a host path holding the mass-import payload}— fails fast if forgotten.Backend container UID check. What user does the backend image run as?
/import(owned by root, mode 755) is readable for any UID, but worth a one-line verification after deploy:docs/DEPLOYMENT.md. TheIMPORT_HOST_DIRenv var belongs in the deployment runbook, not just the compose header. A future operator should read DEPLOYMENT.md and know how to enable mass import without grepping the compose file.Cost impact
Zero — host-local bind mount, no new managed service.
📋 Elicit — Requirements Engineer / Business Analyst
Verdict: ✅ Approved
This is a pure infrastructure delta — not a user-facing requirement change. I'd normally flag missing user stories and acceptance criteria for a feature PR, but this PR unlocks an existing feature (the
/admin/systemmass-import card) rather than introducing new behavior.Implicit requirement satisfied
FR-IMPORT-001 (implicit): "Administrators shall be able to trigger a bulk ODS+PDF import from
/admin/systemagainst documents pre-staged on the production server."This requirement was effectively already in the codebase (
MassImportService,AdminController#triggerMassImport, the admin UI card) but unsatisfiable on production until this PR. So this PR is the missing infra contract for an existing FR — a legitimate gap-fill.Open Questions Register
IMPORT_ALREADY_RUNNINGguard prevents concurrent imports but doesn't cap individual size.MassImportService.ImportStatusreportsFAILEDwith a message, but does it leave dangling S3 objects or partial DB rows?No blockers. The test plan in the PR description doubles as a manual acceptance check, which is sufficient for infra of this size. Add the three open questions to the backlog as a single follow-up issue.
🔒 Nora "NullX" Steiner — Application Security
Verdict: ⚠️ Approved with concerns
The mount itself is benign — read-only, host-local, no network exposure. Signing off on this change. The concerns below are about the broader import path that this PR newly activates on production.
On the mount (this PR)
:ro— correct. Prevents a compromised backend container from writing back to the host (anti-persistence)./workspace/marcel/familienarchiv(the compose working dir wiped per deploy), so the payload isn't co-mingled with deploy artifacts.IMPORT_HOST_DIRis operator-controlled and unvalidated. If an operator (or a compromised.env.staging) setsIMPORT_HOST_DIR=/etcor/var/lib/postgresql/data, the backend reads those paths as/import.:roblocks writes, but the read surfaces host file contents to a process with S3 write access. Blast radius is "operator already has root on the host", so I rate this Low. Add a one-line warning in the header: "must point to a host directory containing only mass-import payload."On the import path this PR newly activates (not blockers for this PR — file follow-up issues)
Apache POI XXE history.
MassImportServiceusesDocumentBuilderFactory(line 29) to parse the ODS XML. I want to seesetFeature("http://apache.org/xml/features/disallow-doctype-decl", true)(or equivalent) on that factory before mass import runs on prod. POI 5.5.0 is recent enough that this is probably fine — verify, because mass import on prod is now reachable.PDF content-type validation. The mount feeds arbitrary
.pdffiles intoS3Client.putObjectand the OCR/thumbnail pipeline. DoesMassImportServiceverify that what's named*.pdfactually has a%PDF-magic header? Or do we trust the filename?Path traversal inside the ODS-referenced filenames. The ODS spreadsheet contains filenames in cells.
MassImportService.processRows(line 121) maps those to files in/import. Does it reject..or absolute paths in the spreadsheet cells? If an attacker controls the ODS, can they read arbitrary files via the import?Detection notes for the team
Ship this PR. Open three follow-up tickets for items 4 / 5 / 6 — they're pre-existing behavior, but production reachability changes the risk profile.
🧪 Sara Holt — QA Engineer & Test Strategist
Verdict: ⚠️ Approved with concerns
Pure YAML change. Nothing testable at the unit level. But there's a regression-prevention gap.
What's missing
No CI guard against silent removal. Someone could revert this single line in a future "compose cleanup" PR and CI would happily pass — the import feature would silently break again. Add the cheapest possible insurance to the deploy workflow:
4-line CI step that prevents the exact same bug from happening twice.
Post-deploy smoke test for the import endpoint. The PR's test plan is four manual checkboxes. The first one (
docker compose configparses) is verified at build time. The other three (mount appears, POST works, status reports counts) belong in the post-deploy smoke suite. Even a basiccurl $URL/api/admin/import-statusreturning a 200 with a known shape would catch a future regression — and reuses existing admin auth.No assertion that the host path is readable by the container's UID. Five lines of Bash after deploy would catch the silent failure mode where the mount succeeds but the backend container user can't read the directory:
What's fine
docker compose configparses quietly — the right minimal local check.Approved on the assumption that items 1–3 land as a follow-up CI ticket within one sprint. A flaky-or-missing test is a critical bug; this one is just missing, which is cheaper to fix.
🎨 Leonie Voss — UX & Accessibility
Verdict: ✅ Approved
No UI changes. Pure compose file edit. Out of scope for me.
Side note (not this PR)
Once mass import actually runs on staging and the
/admin/systemcard does something, please verify on the next pass:RUNNING— not just a stale "IDLE"DONE/FAILEDmessages are localized via Paraglide (messages/{de,en,es}.json), not raw backendStringstext-base(16px) — they'll be read by a 60+ user reviewing import resultsThose are pre-existing concerns about the import card, not this PR. LGTM here.
Tobias and Markus both flagged that a shared default (/srv/familienarchiv/ import) invites silent collision when staging and prod cohabit one host. Switch to ${IMPORT_HOST_DIR:?...} so compose refuses to start without an explicit per-env path — collision becomes structurally impossible. The error message points operators at docs/DEPLOYMENT.md so the recovery step is one click away. IMPORT_HOST_DIR moves from "Optional" to the main required-env-vars block in the header. Addresses review feedback from Markus, Tobias, and Nora on #526. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>Review-fix pass — done
Addressed every reviewer concern that had a clear in-scope fix, and filed follow-up issues for the rest.
In this PR (6 new commits — all 1585 backend tests green)
ff20721drefactor(import): make import directory @Value-configurablestatic final String IMPORT_DIR = "/import"→@Value("${app.import.dir:/import}") private String importDir. New testrunImportAsync_readsFromConfiguredImportDir(red→green) proves configurability.cdb5db6cfix(compose): require IMPORT_HOST_DIR, no defaulta40267e4docs(deployment): document IMPORT_HOST_DIR and mass-import workflow9703a72eci(nightly): wire IMPORT_HOST_DIR=/srv/familienarchiv-staging/importc2c42706ci(release): wire IMPORT_HOST_DIR=/srv/familienarchiv-production/import3775f4cbci(nightly): regression guard for backend /import:ro mounttarget: /importorread_only: truedisappears. Red/green verified locally by toggling the volumes block.Follow-up issues filed
What's still un-actioned
Nothing from the seven reviews is left silently. Every blocker and concern is either fixed in this PR or has a tracked follow-up issue with reproduction steps and acceptance criteria.
🤖 Generated with Claude Code during /implement on #526
🏛️ Markus Keller — Senior Application Architect
Verdict: ✅ Approved
The change pushes a configuration decision out of the codebase (
private static final String IMPORT_DIR = "/import") and into the environment, which is exactly where it belongs. No layer boundaries crossed: onlyMassImportServicetouches its own constant; no service reaches into another domain.What I checked
app.import.dirlives in theimportingmodule's own service via@Value. No cross-domain leak.${IMPORT_HOST_DIR:?...}makes Compose refuse to start when the variable is unset. The integrity check happens at the deployment boundary, not buried in application code. This is the right place for it.@Value("${app.import.dir:/import}")keeps the dev./import:/importmount and existing test fixtures working unchanged.docs/DEPLOYMENT.mdupdated with the new env var and the staging/prod workflow (matches my "New Docker service or infrastructure component" rule). This is a new mount on an existing service, not a new service, soc4/l2-containers.pumldoes not need a change.Suggestions (non-blocking)
Boring, well-scoped, documented. Merge it.
— Markus
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
Clean, small diff. The refactor from
private static final String IMPORT_DIR→@Value-injected field is the minimum change to make this configurable, and the test was added to prove the field drives the lookup.What I checked
runImportAsync_readsFromConfiguredImportDirexists and asserts bothstate() == FAILEDandmessage().contains(tempDir.toString()). The second assertion is what makes this a real test: it proves the configured path is consulted, not the previous constant. Without that contains-check, the test would have passed against the old hardcoded/importtoo.@TempDir— correct choice; deterministic and auto-cleaned. Better thanFiles.createTempDirectory().importDirfield,app.import.dirproperty,IMPORT_HOST_DIRenv var. Three names for three layers (Java field, Spring property, env contract), each appropriate to its layer. Good.setUp()now seedsimportDirviaReflectionTestUtils.setField(service, "importDir", "/import")so existing tests are unaffected. The new@Valuedefault:/importwould also have worked there, but explicit setup keeps the test self-documenting.Suggestions (non-blocking)
CLAUDE.mdchange needed (no new package, no new permission, no new ErrorCode, no new route). DEPLOYMENT.md was the right doc and was updated.findSpreadsheetFile()still throws rawRuntimeExceptioninstead ofDomainException.notFound(ErrorCode.IMPORT_NO_SPREADSHEET, ...). Pre-existing, not introduced here, but flagging because the error message now varies by env and could benefit from a structured ErrorCode the frontend can map. Don't bundle into this PR — file as a follow-up.Ship it.
— Felix
🛠️ Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
This is the shape I want from an infra PR: small, env-driven, fail-closed at compose, and with a CI guard that prevents future regressions.
What I checked
:roflag — correct.MassImportServiceonly callsFiles.list/Files.walk— backend cannot mutate or delete source PDFs even if a bug or RCE tries to. Least-privilege at the mount level.${IMPORT_HOST_DIR:?Set IMPORT_HOST_DIR...}with no default — compose refuses to start without it. This is exactly right for staging vs prod isolation: no chance of accidentally pointing prod at the staging payload (or vice versa) because someone forgot to set the variable..gitea/workflows/nightly.ymlwritesIMPORT_HOST_DIR=/srv/familienarchiv-staging/import;release.ymlwrites/srv/familienarchiv-production/import. Sensible convention, predictable on the VPS.docker compose ... config > /tmp/compose-rendered.yml+ twogrepassertions fortarget: /importandread_only: true. Using rendered compose output (not raw YAML) is the right call: it survives shorthand → longform refactors. The inline comment in the workflow ("If a future 'compose cleanup' PR drops the volumes block, mass import silently breaks again") states the intent clearly.postgres_data. Approved.Notes (non-blocking)
/srv/...works. When we eventually move backend to a non-root UID (which we should, separately), the/srv/familienarchiv-*/importdirectory needschownto that UID. Worth a follow-up issue taggeddevopsso it isn't forgotten when the base image switches.docker inspect ... {{range .Mounts}}{{.Source}} -> {{.Destination}} ({{.Mode}})line is the right post-deploy smoke check. Consider folding it into the nightly workflow as a post-up assert (aftercompose up -d), in addition to the pre-upcompose configcheck that's already there. Belt and braces — not required for this PR.Solid. Merge.
— Tobi
🛡️ Nora "NullX" Steiner — Application Security Engineer
Verdict: ✅ Approved
No vulnerabilities introduced. The bind mount and configuration plumbing make the existing attack surface slightly smaller, not larger.
Threat model walkthrough
:romount — the backend container cannot write back to the host import directory. If an attacker achieves code execution inside the backend (via a separate, unrelated bug), they cannot tamper with or replace the source PDFs/spreadsheet from inside the container. Defense in depth done right.IMPORT_HOST_DIRis operator-supplied via the env file;app.import.diris set from that env at startup. Neither flows from an HTTP request body. No path injection via the API.AdminControllerhas class-level@RequirePermission(Permission.ADMIN)(backend/src/main/java/org/raddatz/familienarchiv/user/AdminController.java:20), and both/api/admin/trigger-import(POST) and/api/admin/import-status(GET) inherit it. Mass import remains ADMIN-only.${IMPORT_HOST_DIR:?...}is explicit: no var, no container. Beats a silent default that might point at the wrong directory.Observations (informational, not vulnerabilities in this PR)
Files.walksymlink behavior —findFileRecursivewalks the full/importtree. Java'sFiles.walkdoes not follow symlinks by default (you'd needFileVisitOption.FOLLOW_LINKS), so an operator misconfiguration (symlinking/srv/familienarchiv-production/import/foo -> /etc) would not escape the mount via this code path. Worth a one-liner unit test asserting non-follow behavior to lock the contract in. Not a blocker — this is unchanged by the PR..toLowerCase()+endsWith(".ods/.xlsx/.xls")first-match-wins. Since the directory is:roand operator-staged, this is fine in the current threat model. If we ever accept user-uploaded files into/import, that whitelist needs to harden (content-type sniffing, not extension). Not in scope here.Ship it.
— Nora
🧪 Sara Holt — QA Engineer & Test Strategist
Verdict: ✅ Approved
The wiring change is properly covered, and the CI regression guard is the kind of test I want to see more of: assertions against rendered infrastructure, not just code.
What I checked
runImportAsync_readsFromConfiguredImportDiris one logical behavior (the configured field drives the lookup), one test, one reason to fail. Asserts bothState.FAILEDandmessage().contains(tempDir.toString()). Without thatcontainscheck, the test wouldn't prove anything beyond the existing "no spreadsheet" test, so the second assertion carries the weight here.@TempDir Path tempDir, deterministic, JUnit handles cleanup.ReflectionTestUtils.setFieldis appropriate for unit-scope; we're not booting Spring.docker compose config > /tmp/compose-rendered.ymlthengrepfortarget: /importandread_only: true. This is essentially an infrastructure integration test running in nightly. It will fail loudly if a future cleanup PR strips the volume block — exactly the regression scenario the comment in the workflow calls out.Suggestions (non-blocking)
.ods+ a matching PDF in a temp/import, (2) boots Spring withapp.import.dir=<tempDir>, (3) hitsPOST /api/admin/trigger-import, (4) asserts a row appears indocuments. The current unit test covers configuration wiring; a Testcontainers-based test would lock the full import path. File as a follow-up — not blocking this PR, but it's the next test I'd write to seal this loop./data/postgresmount is named, or that ports aren't exposed publicly), extract a.gitea/actions/assert-compose-mountcomposite. Future you will thank present you.Approved.
— Sara
🎨 Leonie Voss — UX Designer & Accessibility Strategist
Verdict: ✅ Approved
No UI surface area touched: backend config, compose volume, CI workflow, and DEPLOYMENT.md. Nothing on the front end to evaluate against WCAG, brand tokens, or responsive breakpoints.
What I checked
MassImportServicestill throwsRuntimeException("Keine Tabellendatei (.ods/.xlsx/.xls) in <path> gefunden!"). This already bubbles up to the/admin/systemmass-import card today; the path in the message just becomes more descriptive after this PR. That's a small win for ops, neutral for end-users (admins) who already see the German message via the existing error handling.One small thought, for whenever someone next touches the import UI
The
/admin/systemmass-import card now has a real chance of working end-to-end on staging/prod, which means admins will start seeing its success states for the first time. When that lands, double-check the card surfaces processed-counts and progress in a way that meets the senior-friendly minimums (≥16px body, ≥44px touch targets, visible focus, non-color status cues). Not in scope here — just noting that this PR makes that card useful, so its UX is now worth a real look.LGTM.
— Leonie
📋 Elicit — Requirements Engineer & Business Analyst
Verdict: ✅ Approved
This is a Brownfield Phase B4 fix — closing a gap between the existing feature ("mass-import card on
/admin/system") and the production environment where it could not run. The PR description itself is what I'd write as the issue body: clear why, explicit choices made, an operational how-to, and a test plan.Requirements lens
MassImportServicereads/importinside the container;docker-compose.prod.ymlhad no equivalent bind mount, so the staging/prod mass-import path was structurally broken. Cited exact file:line (MassImportService.java:102) for traceability.compose configparsesdocker inspectshows/srv/...-staging/import -> /import (ro)POST /api/admin/importfinds the spreadsheet/admin/systemstatus card reports processed countsdocs/DEPLOYMENT.mdupdated with the env var entry and the staging/production workflow (rsync → set env → redeploy → trigger). A reader handed this branch and the docs alone can deploy without asking questions.Suggestions (non-blocking)
#526, which is this PR). If a separate "mass-import card broken on staging" issue exists, link it for the traceability matrix; if not, the PR self-suffices.Nothing here is solution-bias or gold-plating; the work matches the user need precisely. LGTM.
— Elicit