Commit Graph

15 Commits

Author SHA1 Message Date
Marcel
cdb5db6c68 fix(compose): require IMPORT_HOST_DIR, no default
Tobias and Markus both flagged that a shared default (/srv/familienarchiv/
import) invites silent collision when staging and prod cohabit one host.
Switch to ${IMPORT_HOST_DIR:?...} so compose refuses to start without an
explicit per-env path — collision becomes structurally impossible.

The error message points operators at docs/DEPLOYMENT.md so the recovery
step is one click away. IMPORT_HOST_DIR moves from "Optional" to the
main required-env-vars block in the header.

Addresses review feedback from Markus, Tobias, and Nora on #526.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 20:03:57 +02:00
Marcel
4a537d6b19 feat(infra): bind-mount /import for backend mass-import endpoint
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m55s
CI / OCR Service Tests (push) Successful in 18s
CI / Backend Unit Tests (push) Successful in 4m9s
CI / fail2ban Regex (push) Successful in 38s
CI / Compose Bucket Idempotency (push) Successful in 56s
CI / Unit & Component Tests (pull_request) Failing after 2m47s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 4m12s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
`MassImportService` reads the ODS spreadsheet and referenced PDFs from a
hardcoded `/import` path inside the backend container. Dev compose
already bind-mounts `./import:/import`, but the prod compose had no
equivalent, so `POST /api/admin/import` would always fail on staging/prod
with "no spreadsheet found".

Mount strategy:
- Source path is env-driven (`IMPORT_HOST_DIR`), defaulting to
  `/srv/familienarchiv/import` so the host path is stable across CI
  deploys (the compose working dir is recreated each run, so `./import`
  would not persist).
- Read-only — `MassImportService` only reads (`Files.list` /
  `Files.walk`), never writes. Read-only mount makes that contract
  explicit and prevents the backend container from mutating the source
  PDFs.
- Empty / missing path is harmless: the import API just returns the
  existing "no spreadsheet found" error rather than crashing the
  container.

To use on staging: rsync the import folder to
`/srv/familienarchiv-staging/import/` on the host, set
`IMPORT_HOST_DIR=/srv/familienarchiv-staging/import` in `.env.staging`,
redeploy, trigger import from `/admin/system`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 18:57:47 +02:00
Marcel
5f3529439a fix(infra): frontend healthcheck on 127.0.0.1, not localhost
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m53s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 4m33s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
CI / Unit & Component Tests (push) Failing after 2m52s
CI / OCR Service Tests (push) Successful in 18s
CI / Backend Unit Tests (push) Successful in 4m23s
CI / fail2ban Regex (push) Successful in 39s
CI / Compose Bucket Idempotency (push) Successful in 1m0s
The new alpine-based frontend production image (`node:20.19.0-alpine3.21`)
resolves `localhost` only to `::1` in /etc/hosts. SvelteKit's adapter-node
binds to 0.0.0.0 (IPv4 only), so `wget http://localhost:3000/login` from
inside the container connects to ::1 and gets "Connection refused" every
15s. Container goes unhealthy → `docker compose up --wait` fails → nightly
staging deploy fails. The app itself is fine.

Switching to 127.0.0.1 bypasses /etc/hosts and matches what Node actually
listens on.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 18:49:32 +02:00
Marcel
3668555421 fix(compose): mark create-buckets as one-shot for up --wait
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m47s
CI / OCR Service Tests (push) Successful in 17s
CI / Backend Unit Tests (push) Successful in 4m12s
CI / fail2ban Regex (push) Successful in 37s
CI / Compose Bucket Idempotency (push) Successful in 56s
CI / Unit & Component Tests (pull_request) Failing after 2m49s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m13s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
Closes #510.

`docker compose up -d --wait` exits 1 even when every service is
healthy because the one-shot `create-buckets` exits 0 and --wait
expects "running". The whole stack came up fine on staging, but the
workflow gate failed before the smoke step could run.

Two changes:

1. create-buckets: `restart: "no"` declares one-shot intent.
2. backend.depends_on: add `create-buckets: service_completed_successfully`.

With both, compose v2.20+ understands create-buckets is a one-shot
that must complete successfully, and --wait treats exited(0) as the
target state. Backend startup now also correctly gates on bucket
bootstrap (closes a latent race where backend could start before
the archiv-app policy was bound).

Verified `docker compose config --quiet` parses and the resolved
config shows the right dependency graph.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 16:33:04 +02:00
Marcel
f8f0951bd5 fix(minio): bake bootstrap.sh into image instead of bind-mounting
Some checks failed
CI / Unit & Component Tests (push) Has been cancelled
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (pull_request) Failing after 2m50s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 4m9s
CI / fail2ban Regex (pull_request) Failing after 12s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
Closes #506.

Under Docker-out-of-Docker (the production Gitea Actions runner), the
host daemon resolves the relative bind-mount path against the host
filesystem — not the runner container's /workspace. The script is not
there, so Docker creates an empty directory at /bootstrap.sh and the
entrypoint fails with `/bootstrap.sh: Is a directory`.

Bake the script into a tiny derived image (infra/minio/Dockerfile) so
there is no runtime path resolution. Works in DooD, regular Docker,
and CI.

Unblocks the staging / production deploy pipelines from #497 / #499
and turns the Compose Bucket Idempotency CI job green.

Verified locally:
- `docker compose ... config --quiet` parses
- `docker compose ... build create-buckets` builds the image
- bootstrap.sh exists as a +x file at /bootstrap.sh inside the image

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 15:32:36 +02:00
Marcel
9adde3cd89 refactor(compose): rename docker network archive-net to archiv-net
The docker network was the only `archive-*` identifier in either
compose file; everything else (user, db, bucket, service account,
project name) uses the `archiv-*` spelling. Reviewers' eyes stuttered
on it on the prod compose review (round 2 of PR #499 — Markus and
Tobi). Renamed in both prod and dev compose for consistency and
updated the single doc reference to the dev-project-prefixed
network name.

Operational note: applying this change to a running stack will
recreate the network on the next `docker compose up`; containers
restart, named volumes are unaffected.

`docker compose config --quiet` passes for both compose files and
for the staging profile. Sweep confirms zero `archive-net`
references remain in the tree.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 14:10:39 +02:00
Marcel
1873f50f7f infra(mailpit): use nc -z healthcheck instead of wget
The mailpit service healthcheck previously assumed `wget` ships in
the axllent/mailpit image. That's true for v1.29.7 but is not part
of the image's contract — a future Alpine slim-down could drop wget
and silently disable the healthcheck. Switched to BusyBox `nc -z
localhost 8025`, which is a TCP-port open check with no dependency
beyond BusyBox itself.

Verified inside axllent/mailpit:v1.29.7 that `nc` is present
(/usr/bin/nc, BusyBox v1.37.0) and that the proposed command
returns 0 against an open port and non-zero against a closed one.
Compose still parses with `--profile staging`. Addresses @tobi's
round-2 suggestion on PR #499.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 14:08:23 +02:00
Marcel
a4f2047bcc security(ocr): pin ALLOWED_PDF_HOSTS=minio in prod ocr-service env
Production never sources PDFs from localhost or 127.0.0.1 — the OCR
service only reads from MinIO over the internal docker network. The
Python default (`minio,localhost,127.0.0.1`) was permissive on
purpose for local dev, but in production a future change to that
default — or a host-env override — would silently broaden the SSRF
surface. Pinning the env var explicitly here freezes the allowlist
to the one hostname production actually needs.

`docker compose config --quiet` and `--profile staging config
--quiet` both still pass. Verified the resolved config emits
`ALLOWED_PDF_HOSTS: minio`. Addresses @nora's round-2 suggestion on
PR #499 — "five characters of YAML, lifetime guarantee".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 14:07:16 +02:00
Marcel
33300e4ad9 chore(infra): drop aspirational Renovate comments from compose
The repo's renovate.json only configures TipTap grouping; Renovate is
not currently active against MinIO / mc / mailpit / Postgres / Node /
Caddy. The "Renovate keeps it current" comments were aspirational —
those tags will rot until Renovate is bootstrapped (tracked in a
follow-up issue).

The "Pinned mc release; Renovate keeps it current" comment is gone
already since the create-buckets entrypoint was extracted to a script
in the preceding MinIO-policy commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 13:12:55 +02:00
Marcel
91f70e652d security(minio): scope archiv-app to bucket-only IAM policy
Replaces MinIO's built-in `readwrite` policy (which grants s3:* on
arn:aws:s3:::* — every bucket present and future) with a bucket-scoped
custom policy `archiv-app-policy`:

  - s3:GetObject / s3:PutObject / s3:DeleteObject on familienarchiv/*
  - s3:ListBucket / s3:GetBucketLocation on familienarchiv

The previous configuration silently regressed the least-privilege guarantee
that the service-account separation was supposed to provide: a future
second bucket (logs, backups, mc-mirror staging) would have been
read/write/delete-accessible to a compromised backend.

While at it, two follow-on fixes:

  1. Extract the entrypoint to infra/minio/bootstrap.sh. The previous
     inline `/bin/sh -c "..."` was already at the YAML-escaping ceiling;
     adding the policy-JSON heredoc would have made it unreadable.

  2. Replace the `| grep -q readwrite || exit 1` fatal-check with a
     POSIX `case` substring match. The minio/mc image ships coreutils +
     bash but NOT grep/awk/sed — the original check was a no-op that
     ALWAYS exited 1 (verified locally). The new check passes on the
     first invocation and on every subsequent re-deploy.

Idempotency verified locally: two consecutive `docker compose run --rm
create-buckets` invocations both exit 0 with the user bound to the
new policy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 13:07:56 +02:00
Marcel
4eb5eba347 feat(infra): parameterize OCR mem_limit via OCR_MEM_LIMIT
Hardcoded `mem_limit: 12g` only works on CX42+ (16 GB) hosts; a CX32 (8
GB) cannot honour it. Make both mem_limit and memswap_limit driven by
the OCR_MEM_LIMIT env var, defaulting to 12g so prod deploys on a CX42
keep current behaviour. Operators on smaller hosts override to 6g.

Verified compose interpolation produces 12 GiB by default and 6 GiB when
OCR_MEM_LIMIT=6g.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 12:01:23 +02:00
Marcel
47c5f77c81 fix(infra): fail loud when archiv-app is missing the readwrite policy
The previous `mc admin policy attach … || true` swallowed every failure
mode: a renamed policy, an mc CLI signature change, or a transient MinIO
error would leave the bootstrap container exiting zero with the service
account possessing no permissions, and the backend would then fail every
S3 call after a "successful" deploy.

Replace the silent fallback with verify-after: keep the attach (idempotent
in current mc, redundant in older versions), then assert via `mc admin
user info` that `readwrite` ends up on archiv-app. A genuine attach
failure now exits 1 and blocks the stack from starting.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 12:00:34 +02:00
Marcel
a36f25cfc3 fix(infra): pin minio/mc client tag
Removes the implicit `:latest` from the create-buckets bootstrap
container. Pins to RELEASE.2025-08-13T08-35-41Z so a breaking change in
mc CLI syntax cannot silently brick deploys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 11:59:18 +02:00
Marcel
c9ac83b2ba fix(infra): pin axllent/mailpit tag
Removes `:latest` from the mailpit service; pins to v1.29.7 so staging
deploys are reproducible. Renovate keeps the tag current.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 11:58:34 +02:00
Marcel
ecb930e5f9 feat(infra): add docker-compose.prod.yml for production/staging
Standalone production compose file (not an overlay) that runs the
full stack on a single host. Environment isolation is achieved via
the docker compose project name (-p archiv-production / -p
archiv-staging) so the two environments cohabit cleanly.

Key choices, resolved in #497 review:
- Named volumes for persistent data (no host bind mounts)
- MinIO pinned to a specific RELEASE tag (no :latest)
- Backend uses MinIO service account (S3_ACCESS_KEY=archiv-app),
  not root credentials; create-buckets bootstraps the account
- Mailpit lives under profiles: [staging] so no real SMTP secret
  is ever wired into the staging deploy
- OCR mem_limit 12g + healthcheck (start_period 120s) copied from
  the dev compose so docker compose up -d --wait works in CI
- Backend admin credentials wired through APP_ADMIN_USERNAME /
  APP_ADMIN_PASSWORD; first deploy locks the password in
  permanently because UserDataInitializer is idempotent on email
- All host ports bound to 127.0.0.1; Caddy fronts external traffic

Refs #497.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 21:53:19 +02:00