The four files in tools/import-normalizer/out/ contain real names,
addresses, and attribution prose for ~163 living/deceased family members
and were committed by mistake. They are now removed from the index
(kept on disk for local development) and gitignored.
The canonical artifacts are produced locally from the Python normalizer
and synced into IMPORT_HOST_DIR out-of-band alongside the PDFs. The
contract between normalizer and importer is the header schema, not the
file contents — CanonicalSheetReader fails closed on a missing header,
which is what locks the contract.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The mass-import card no longer parses an ODS spreadsheet and MassImportService
was deleted (#674); /import now holds the normalizer's canonical artifacts
(canonical-*.xlsx + canonical-persons-tree.json) plus <index>.pdf files, read
by the canonical importer. Fix the IMPORT_HOST_DIR descriptions in
DEPLOYMENT.md and docker-compose.prod.yml accordingly.
Refs #686
Wires the new GRAFANA_DB_PASSWORD secret through the deploy pipeline:
- docker-compose.prod.yml: backend env now passes GRAFANA_DB_PASSWORD
through so Flyway V68 can resolve the ${grafanaDbPassword} placeholder
in production and staging (it already worked in local dev via
docker-compose.yml).
- release.yml + nightly.yml: declare GRAFANA_DB_PASSWORD as a required
Gitea secret, write it into .env.production / .env.staging (consumed
by archive-backend), and into /opt/familienarchiv/obs-secrets.env
(consumed by obs-grafana's PostgreSQL datasource).
Operator action before the next deploy: add a GRAFANA_DB_PASSWORD value
to the Gitea repo secrets (openssl rand -hex 32).
Refs #651.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
VITE_SENTRY_DSN is a Vite build-time variable baked into the JS bundle.
Without an ARG/ENV in the Dockerfile build stage and a build.args entry in
docker-compose.prod.yml, the SDK initialised with enabled=false regardless
of the Gitea secret value.
- frontend/Dockerfile: add ARG VITE_SENTRY_DSN + ENV before npm run build
- docker-compose.prod.yml: add build.args.VITE_SENTRY_DSN with empty fallback
- nightly.yml: write VITE_SENTRY_DSN secret into .env.staging
Requires Gitea secret VITE_SENTRY_DSN to be set to the GlitchTip project #1 DSN.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pass SENTRY_DSN env var through to the backend container so the Sentry SDK
actually ships exceptions to GlitchTip — the variable was written to
.env.staging by nightly.yml but never forwarded into the container.
Enable Spring Boot 4.0 ECS structured logging (LOGGING_STRUCTURED_FORMAT_CONSOLE=ecs)
so Loki receives single-entry JSON log lines with parsed log.level, enabling
detected_level filtering in Grafana instead of 50-line unlinked stack trace blobs.
Update Grafana Loki dashboard query from | logfmt to | json to match the new format.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Makes the upload size cap explicit in both dev and prod compose files.
After the @sveltejs/kit bump (GHSA-2crg-3p73-43xp), the default 512KB
limit is now enforced — 50M covers multi-page Kurrent/Sütterlin PDFs
(typically 500KB–15MB) without being reckless.
Caddy's client_max_body_size must be set to match when the reverse
proxy config is committed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
alpine:3 is a moving tag — pinning to 3.21 makes builds reproducible and
rollbacks possible. networks: [] removes the init container from the project
network since it only needs volume access, not network access (least privilege).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TMPDIR=/app/cache/.tmp routes Surya model staging to the SSD-backed cache
volume instead of the 512 MB /tmp tmpfs. The ocr-volume-init one-shot service
runs first to ensure correct ownership (uid 1000) and creates /app/cache/.tmp
on fresh volumes, making AC #6 ("fresh volume still works") a permanent
infrastructure-as-code guarantee rather than a manual chown step.
Both docker-compose.yml and docker-compose.prod.yml are updated in the same
commit to prevent the silent drift that occurred with the 512 MB tmpfs comment.
Fixes#614. See ADR-021.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirror the CIS Docker §4.1/§4.6 hardening from docker-compose.yml to the
production/staging compose file, which is standalone (not an overlay).
- Fix cache volume mount path: ocr-cache:/root/.cache → /app/cache (matches
the non-root user's HF_HOME/XDG_CACHE_HOME, avoids PermissionError)
- Add HF_HOME, XDG_CACHE_HOME, TORCH_HOME env vars so HuggingFace, ketos,
and PyTorch all write to the declared writable volumes, not HOME
- Add read_only: true, tmpfs (/tmp:512m), cap_drop: [ALL],
no-new-privileges:true — matching the dev baseline
Also extend DEPLOYMENT.md §8 upgrade notes to cover all three environments
(dev/production/staging), each with its correct project-namespaced volume name.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tempo only handles traces; sending metrics to /v1/metrics returns 404.
Prometheus already scrapes Spring Boot metrics via the pull-model at
/actuator/prometheus, so OTLP metric push is redundant and noisy.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Change OTEL default endpoint from port 4317 (gRPC) to 4318 (HTTP) to
match Spring Boot's HttpExporter; sending HTTP/1.1 to a gRPC listener
caused "Connection reset" errors
- Add otel.logs.exporter=none: Promtail captures Docker logs via the
logging driver; sending logs to Tempo's OTLP endpoint (which only
handles traces) produced 404 errors
- Add management.metrics.tags.application to every metric so Grafana's
Spring Boot Observability dashboard (ID 17175) can filter by the
application label_values() template variable
- Add MANAGEMENT_METRICS_TAGS_APPLICATION and OTEL_LOGS_EXPORTER env
vars to docker-compose.prod.yml; production Tempo endpoint already
uses 4318
- Add MANAGEMENT_TRACING_SAMPLING_PROBABILITY to prod compose with
0.1 default to avoid 100% trace sampling in production
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The endpoint belongs in the compose file (hardcoded to the in-network
Tempo service) rather than per-environment workflow files. This covers
both staging (nightly.yml) and production (release.yml) with a single
change and removes the duplicate from the nightly env-file block.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs introduced when the management port was split from the app port:
1. Backend healthcheck hit localhost:8080/actuator/health (app port) —
actuator is on management.server.port=8081, so every probe got a 404
from the main MVC dispatcher, marking the container permanently unhealthy.
Fix: change the probe to localhost:8081.
2. OTEL_EXPORTER_OTLP_ENDPOINT was not set in .env.staging, so the exporter
fell back to http://localhost:4317 (the CI-safe default) instead of
http://tempo:4317 (the in-network Tempo service). Fix: inject the correct
endpoint in the nightly env-file generation step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The name: archiv-net declaration (needed so docker-compose.observability.yml
can join the network as external: true) caused the compose-idempotency CI job
to collide with any archiv-net left on the runner from staging or a previous
run. mc would resolve 'minio' to the wrong container and fail with a signature
mismatch.
Make the network name interpolable via COMPOSE_NETWORK_NAME (default: archiv-net
so production/staging behaviour is unchanged). Inject COMPOSE_NETWORK_NAME=
test-idem-archiv-net into the stub env file so the idempotency test always
gets a fully isolated network.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docker-compose.prod.yml: add `name: archiv-net` so the network has a
stable Docker name regardless of compose project name (-p flag).
Both staging and production share the same host-level network, which
is correct since the observability stack is a single shared instance.
- nightly.yml / release.yml: add observability env vars (POSTGRES_USER,
PORT_GRAFANA=3003, PORT_GLITCHTIP=3002, PORT_PROMETHEUS=9090,
GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, GLITCHTIP_DOMAIN) to the
env file, then `docker compose -f docker-compose.observability.yml up -d`
after the app deploy step. PORT_GRAFANA=3003 avoids collision with
staging frontend on 3001.
Requires two new Gitea secrets: GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tobias and Markus both flagged that a shared default (/srv/familienarchiv/
import) invites silent collision when staging and prod cohabit one host.
Switch to ${IMPORT_HOST_DIR:?...} so compose refuses to start without an
explicit per-env path — collision becomes structurally impossible.
The error message points operators at docs/DEPLOYMENT.md so the recovery
step is one click away. IMPORT_HOST_DIR moves from "Optional" to the
main required-env-vars block in the header.
Addresses review feedback from Markus, Tobias, and Nora on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`MassImportService` reads the ODS spreadsheet and referenced PDFs from a
hardcoded `/import` path inside the backend container. Dev compose
already bind-mounts `./import:/import`, but the prod compose had no
equivalent, so `POST /api/admin/import` would always fail on staging/prod
with "no spreadsheet found".
Mount strategy:
- Source path is env-driven (`IMPORT_HOST_DIR`), defaulting to
`/srv/familienarchiv/import` so the host path is stable across CI
deploys (the compose working dir is recreated each run, so `./import`
would not persist).
- Read-only — `MassImportService` only reads (`Files.list` /
`Files.walk`), never writes. Read-only mount makes that contract
explicit and prevents the backend container from mutating the source
PDFs.
- Empty / missing path is harmless: the import API just returns the
existing "no spreadsheet found" error rather than crashing the
container.
To use on staging: rsync the import folder to
`/srv/familienarchiv-staging/import/` on the host, set
`IMPORT_HOST_DIR=/srv/familienarchiv-staging/import` in `.env.staging`,
redeploy, trigger import from `/admin/system`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The new alpine-based frontend production image (`node:20.19.0-alpine3.21`)
resolves `localhost` only to `::1` in /etc/hosts. SvelteKit's adapter-node
binds to 0.0.0.0 (IPv4 only), so `wget http://localhost:3000/login` from
inside the container connects to ::1 and gets "Connection refused" every
15s. Container goes unhealthy → `docker compose up --wait` fails → nightly
staging deploy fails. The app itself is fine.
Switching to 127.0.0.1 bypasses /etc/hosts and matches what Node actually
listens on.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#510.
`docker compose up -d --wait` exits 1 even when every service is
healthy because the one-shot `create-buckets` exits 0 and --wait
expects "running". The whole stack came up fine on staging, but the
workflow gate failed before the smoke step could run.
Two changes:
1. create-buckets: `restart: "no"` declares one-shot intent.
2. backend.depends_on: add `create-buckets: service_completed_successfully`.
With both, compose v2.20+ understands create-buckets is a one-shot
that must complete successfully, and --wait treats exited(0) as the
target state. Backend startup now also correctly gates on bucket
bootstrap (closes a latent race where backend could start before
the archiv-app policy was bound).
Verified `docker compose config --quiet` parses and the resolved
config shows the right dependency graph.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#506.
Under Docker-out-of-Docker (the production Gitea Actions runner), the
host daemon resolves the relative bind-mount path against the host
filesystem — not the runner container's /workspace. The script is not
there, so Docker creates an empty directory at /bootstrap.sh and the
entrypoint fails with `/bootstrap.sh: Is a directory`.
Bake the script into a tiny derived image (infra/minio/Dockerfile) so
there is no runtime path resolution. Works in DooD, regular Docker,
and CI.
Unblocks the staging / production deploy pipelines from #497 / #499
and turns the Compose Bucket Idempotency CI job green.
Verified locally:
- `docker compose ... config --quiet` parses
- `docker compose ... build create-buckets` builds the image
- bootstrap.sh exists as a +x file at /bootstrap.sh inside the image
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The docker network was the only `archive-*` identifier in either
compose file; everything else (user, db, bucket, service account,
project name) uses the `archiv-*` spelling. Reviewers' eyes stuttered
on it on the prod compose review (round 2 of PR #499 — Markus and
Tobi). Renamed in both prod and dev compose for consistency and
updated the single doc reference to the dev-project-prefixed
network name.
Operational note: applying this change to a running stack will
recreate the network on the next `docker compose up`; containers
restart, named volumes are unaffected.
`docker compose config --quiet` passes for both compose files and
for the staging profile. Sweep confirms zero `archive-net`
references remain in the tree.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The mailpit service healthcheck previously assumed `wget` ships in
the axllent/mailpit image. That's true for v1.29.7 but is not part
of the image's contract — a future Alpine slim-down could drop wget
and silently disable the healthcheck. Switched to BusyBox `nc -z
localhost 8025`, which is a TCP-port open check with no dependency
beyond BusyBox itself.
Verified inside axllent/mailpit:v1.29.7 that `nc` is present
(/usr/bin/nc, BusyBox v1.37.0) and that the proposed command
returns 0 against an open port and non-zero against a closed one.
Compose still parses with `--profile staging`. Addresses @tobi's
round-2 suggestion on PR #499.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Production never sources PDFs from localhost or 127.0.0.1 — the OCR
service only reads from MinIO over the internal docker network. The
Python default (`minio,localhost,127.0.0.1`) was permissive on
purpose for local dev, but in production a future change to that
default — or a host-env override — would silently broaden the SSRF
surface. Pinning the env var explicitly here freezes the allowlist
to the one hostname production actually needs.
`docker compose config --quiet` and `--profile staging config
--quiet` both still pass. Verified the resolved config emits
`ALLOWED_PDF_HOSTS: minio`. Addresses @nora's round-2 suggestion on
PR #499 — "five characters of YAML, lifetime guarantee".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The repo's renovate.json only configures TipTap grouping; Renovate is
not currently active against MinIO / mc / mailpit / Postgres / Node /
Caddy. The "Renovate keeps it current" comments were aspirational —
those tags will rot until Renovate is bootstrapped (tracked in a
follow-up issue).
The "Pinned mc release; Renovate keeps it current" comment is gone
already since the create-buckets entrypoint was extracted to a script
in the preceding MinIO-policy commit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces MinIO's built-in `readwrite` policy (which grants s3:* on
arn:aws:s3:::* — every bucket present and future) with a bucket-scoped
custom policy `archiv-app-policy`:
- s3:GetObject / s3:PutObject / s3:DeleteObject on familienarchiv/*
- s3:ListBucket / s3:GetBucketLocation on familienarchiv
The previous configuration silently regressed the least-privilege guarantee
that the service-account separation was supposed to provide: a future
second bucket (logs, backups, mc-mirror staging) would have been
read/write/delete-accessible to a compromised backend.
While at it, two follow-on fixes:
1. Extract the entrypoint to infra/minio/bootstrap.sh. The previous
inline `/bin/sh -c "..."` was already at the YAML-escaping ceiling;
adding the policy-JSON heredoc would have made it unreadable.
2. Replace the `| grep -q readwrite || exit 1` fatal-check with a
POSIX `case` substring match. The minio/mc image ships coreutils +
bash but NOT grep/awk/sed — the original check was a no-op that
ALWAYS exited 1 (verified locally). The new check passes on the
first invocation and on every subsequent re-deploy.
Idempotency verified locally: two consecutive `docker compose run --rm
create-buckets` invocations both exit 0 with the user bound to the
new policy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hardcoded `mem_limit: 12g` only works on CX42+ (16 GB) hosts; a CX32 (8
GB) cannot honour it. Make both mem_limit and memswap_limit driven by
the OCR_MEM_LIMIT env var, defaulting to 12g so prod deploys on a CX42
keep current behaviour. Operators on smaller hosts override to 6g.
Verified compose interpolation produces 12 GiB by default and 6 GiB when
OCR_MEM_LIMIT=6g.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous `mc admin policy attach … || true` swallowed every failure
mode: a renamed policy, an mc CLI signature change, or a transient MinIO
error would leave the bootstrap container exiting zero with the service
account possessing no permissions, and the backend would then fail every
S3 call after a "successful" deploy.
Replace the silent fallback with verify-after: keep the attach (idempotent
in current mc, redundant in older versions), then assert via `mc admin
user info` that `readwrite` ends up on archiv-app. A genuine attach
failure now exits 1 and blocks the stack from starting.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Removes the implicit `:latest` from the create-buckets bootstrap
container. Pins to RELEASE.2025-08-13T08-35-41Z so a breaking change in
mc CLI syntax cannot silently brick deploys.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Removes `:latest` from the mailpit service; pins to v1.29.7 so staging
deploys are reproducible. Renovate keeps the tag current.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Standalone production compose file (not an overlay) that runs the
full stack on a single host. Environment isolation is achieved via
the docker compose project name (-p archiv-production / -p
archiv-staging) so the two environments cohabit cleanly.
Key choices, resolved in #497 review:
- Named volumes for persistent data (no host bind mounts)
- MinIO pinned to a specific RELEASE tag (no :latest)
- Backend uses MinIO service account (S3_ACCESS_KEY=archiv-app),
not root credentials; create-buckets bootstraps the account
- Mailpit lives under profiles: [staging] so no real SMTP secret
is ever wired into the staging deploy
- OCR mem_limit 12g + healthcheck (start_period 120s) copied from
the dev compose so docker compose up -d --wait works in CI
- Backend admin credentials wired through APP_ADMIN_USERNAME /
APP_ADMIN_PASSWORD; first deploy locks the password in
permanently because UserDataInitializer is idempotent on email
- All host ports bound to 127.0.0.1; Caddy fronts external traffic
Refs #497.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>