familienarchiv

Author	SHA1	Message	Date
Marcel	46d1f5c6d8	chore(import): stop tracking real family PII canonical artifacts The four files in tools/import-normalizer/out/ contain real names, addresses, and attribution prose for ~163 living/deceased family members and were committed by mistake. They are now removed from the index (kept on disk for local development) and gitignored. The canonical artifacts are produced locally from the Python normalizer and synced into IMPORT_HOST_DIR out-of-band alongside the PDFs. The contract between normalizer and importer is the header schema, not the file contents — CanonicalSheetReader fails closed on a missing header, which is what locks the contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 10:20:38 +02:00
Marcel	a4c2b6289d	docs: drop stale MassImportService/ODS references from import deploy docs The mass-import card no longer parses an ODS spreadsheet and MassImportService was deleted (#674); /import now holds the normalizer's canonical artifacts (canonical-*.xlsx + canonical-persons-tree.json) plus <index>.pdf files, read by the canonical importer. Fix the IMPORT_HOST_DIR descriptions in DEPLOYMENT.md and docker-compose.prod.yml accordingly. Refs #686	2026-05-27 22:08:45 +02:00
Marcel	bcba4dab80	ci(observability): inject GRAFANA_DB_PASSWORD from Gitea secrets Wires the new GRAFANA_DB_PASSWORD secret through the deploy pipeline: - docker-compose.prod.yml: backend env now passes GRAFANA_DB_PASSWORD through so Flyway V68 can resolve the ${grafanaDbPassword} placeholder in production and staging (it already worked in local dev via docker-compose.yml). - release.yml + nightly.yml: declare GRAFANA_DB_PASSWORD as a required Gitea secret, write it into .env.production / .env.staging (consumed by archive-backend), and into /opt/familienarchiv/obs-secrets.env (consumed by obs-grafana's PostgreSQL datasource). Operator action before the next deploy: add a GRAFANA_DB_PASSWORD value to the Gitea repo secrets (openssl rand -hex 32). Refs #651. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-21 20:21:27 +02:00
Marcel	cdc3e2e4c8	fix(deploy): wire VITE_SENTRY_DSN as Docker build arg for frontend GlitchTip (#645) VITE_SENTRY_DSN is a Vite build-time variable baked into the JS bundle. Without an ARG/ENV in the Dockerfile build stage and a build.args entry in docker-compose.prod.yml, the SDK initialised with enabled=false regardless of the Gitea secret value. - frontend/Dockerfile: add ARG VITE_SENTRY_DSN + ENV before npm run build - docker-compose.prod.yml: add build.args.VITE_SENTRY_DSN with empty fallback - nightly.yml: write VITE_SENTRY_DSN secret into .env.staging Requires Gitea secret VITE_SENTRY_DSN to be set to the GlitchTip project #1 DSN. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 09:54:04 +02:00
Marcel	e89a90ff66	fix(deploy): wire SENTRY_DSN and enable ECS JSON logging for prod (#641) Pass SENTRY_DSN env var through to the backend container so the Sentry SDK actually ships exceptions to GlitchTip — the variable was written to .env.staging by nightly.yml but never forwarded into the container. Enable Spring Boot 4.0 ECS structured logging (LOGGING_STRUCTURED_FORMAT_CONSOLE=ecs) so Loki receives single-entry JSON log lines with parsed log.level, enabling detected_level filtering in Grafana instead of 50-line unlinked stack trace blobs. Update Grafana Loki dashboard query from | logfmt to | json to match the new format. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 08:16:00 +02:00
Marcel	378023c53d	chore(infra): set BODY_SIZE_LIMIT=50M in frontend service Makes the upload size cap explicit in both dev and prod compose files. After the @sveltejs/kit bump (GHSA-2crg-3p73-43xp), the default 512KB limit is now enforced — 50M covers multi-page Kurrent/Sütterlin PDFs (typically 500KB–15MB) without being reckless. Caddy's client_max_body_size must be set to match when the reverse proxy config is committed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 13:55:10 +02:00
Marcel	3182da8d92	fix(infra): pin ocr-volume-init to alpine:3.21 and drop project network alpine:3 is a moving tag — pinning to 3.21 makes builds reproducible and rollbacks possible. networks: [] removes the init container from the project network since it only needs volume access, not network access (least privilege). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 11:21:55 +02:00
Marcel	1f7b08b74f	fix(ocr): add TMPDIR env var and ocr-volume-init service to compose files TMPDIR=/app/cache/.tmp routes Surya model staging to the SSD-backed cache volume instead of the 512 MB /tmp tmpfs. The ocr-volume-init one-shot service runs first to ensure correct ownership (uid 1000) and creates /app/cache/.tmp on fresh volumes, making AC #6 ("fresh volume still works") a permanent infrastructure-as-code guarantee rather than a manual chown step. Both docker-compose.yml and docker-compose.prod.yml are updated in the same commit to prevent the silent drift that occurred with the 512 MB tmpfs comment. Fixes #614. See ADR-021. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 10:56:10 +02:00
Marcel	7769dbc9f4	security(ocr): apply container hardening baseline to docker-compose.prod.yml Mirror the CIS Docker §4.1/§4.6 hardening from docker-compose.yml to the production/staging compose file, which is standalone (not an overlay). - Fix cache volume mount path: ocr-cache:/root/.cache → /app/cache (matches the non-root user's HF_HOME/XDG_CACHE_HOME, avoids PermissionError) - Add HF_HOME, XDG_CACHE_HOME, TORCH_HOME env vars so HuggingFace, ketos, and PyTorch all write to the declared writable volumes, not HOME - Add read_only: true, tmpfs (/tmp:512m), cap_drop: [ALL], no-new-privileges:true — matching the dev baseline Also extend DEPLOYMENT.md §8 upgrade notes to cover all three environments (dev/production/staging), each with its correct project-namespaced volume name. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 17:43:18 +02:00
Marcel	cea94ce260	fix(obs): disable OTLP metric export (Prometheus scrapes pull-model) Tempo only handles traces; sending metrics to /v1/metrics returns 404. Prometheus already scrapes Spring Boot metrics via the pull-model at /actuator/prometheus, so OTLP metric push is redundant and noisy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 15:46:45 +02:00
Marcel	45a992f5a8	fix(obs): fix OTLP transport port and add application metrics tag - Change OTEL default endpoint from port 4317 (gRPC) to 4318 (HTTP) to match Spring Boot's HttpExporter; sending HTTP/1.1 to a gRPC listener caused "Connection reset" errors - Add otel.logs.exporter=none: Promtail captures Docker logs via the logging driver; sending logs to Tempo's OTLP endpoint (which only handles traces) produced 404 errors - Add management.metrics.tags.application to every metric so Grafana's Spring Boot Observability dashboard (ID 17175) can filter by the application label_values() template variable - Add MANAGEMENT_METRICS_TAGS_APPLICATION and OTEL_LOGS_EXPORTER env vars to docker-compose.prod.yml; production Tempo endpoint already uses 4318 - Add MANAGEMENT_TRACING_SAMPLING_PROBABILITY to prod compose with 0.1 default to avoid 100% trace sampling in production Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 15:46:45 +02:00
Marcel	fed427dc4a	fix(infra): set OTEL_EXPORTER_OTLP_ENDPOINT in docker-compose.prod.yml The endpoint belongs in the compose file (hardcoded to the in-network Tempo service) rather than per-environment workflow files. This covers both staging (nightly.yml) and production (release.yml) with a single change and removes the duplicate from the nightly env-file block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 17:43:23 +02:00
Marcel	cf78ab2f8e	fix(staging): correct backend healthcheck port and OTel endpoint Two bugs introduced when the management port was split from the app port: 1. Backend healthcheck hit localhost:8080/actuator/health (app port) — actuator is on management.server.port=8081, so every probe got a 404 from the main MVC dispatcher, marking the container permanently unhealthy. Fix: change the probe to localhost:8081. 2. OTEL_EXPORTER_OTLP_ENDPOINT was not set in .env.staging, so the exporter fell back to http://localhost:4317 (the CI-safe default) instead of http://tempo:4317 (the in-network Tempo service). Fix: inject the correct endpoint in the nightly env-file generation step. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 17:37:15 +02:00
Marcel	c8883d0e40	fix(ci): isolate compose-idempotency network from archiv-net collisions The name: archiv-net declaration (needed so docker-compose.observability.yml can join the network as external: true) caused the compose-idempotency CI job to collide with any archiv-net left on the runner from staging or a previous run. mc would resolve 'minio' to the wrong container and fail with a signature mismatch. Make the network name interpolable via COMPOSE_NETWORK_NAME (default: archiv-net so production/staging behaviour is unchanged). Inject COMPOSE_NETWORK_NAME=test-idem-archiv-net into the stub env file so the idempotency test always gets a fully isolated network. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:33:07 +02:00
Marcel	d7d225af77	devops(observability): wire observability stack into nightly and release deploys - docker-compose.prod.yml: add `name: archiv-net` so the network has a stable Docker name regardless of compose project name (-p flag). Both staging and production share the same host-level network, which is correct since the observability stack is a single shared instance. - nightly.yml / release.yml: add observability env vars (POSTGRES_USER, PORT_GRAFANA=3003, PORT_GLITCHTIP=3002, PORT_PROMETHEUS=9090, GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, GLITCHTIP_DOMAIN) to the env file, then `docker compose -f docker-compose.observability.yml up -d` after the app deploy step. PORT_GRAFANA=3003 avoids collision with staging frontend on 3001. Requires two new Gitea secrets: GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 11:22:37 +02:00
Marcel	cdb5db6c68	fix(compose): require IMPORT_HOST_DIR, no default Tobias and Markus both flagged that a shared default (/srv/familienarchiv/import) invites silent collision when staging and prod cohabit one host. Switch to ${IMPORT_HOST_DIR:?...} so compose refuses to start without an explicit per-env path — collision becomes structurally impossible. The error message points operators at docs/DEPLOYMENT.md so the recovery step is one click away. IMPORT_HOST_DIR moves from "Optional" to the main required-env-vars block in the header. Addresses review feedback from Markus, Tobias, and Nora on #526. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 20:03:57 +02:00
Marcel	4a537d6b19	feat(infra): bind-mount /import for backend mass-import endpoint `MassImportService` reads the ODS spreadsheet and referenced PDFs from a hardcoded `/import` path inside the backend container. Dev compose already bind-mounts `./import:/import`, but the prod compose had no equivalent, so `POST /api/admin/import` would always fail on staging/prod with "no spreadsheet found". Mount strategy: - Source path is env-driven (`IMPORT_HOST_DIR`), defaulting to `/srv/familienarchiv/import` so the host path is stable across CI deploys (the compose working dir is recreated each run, so `./import` would not persist). - Read-only — `MassImportService` only reads (`Files.list` / `Files.walk`), never writes. Read-only mount makes that contract explicit and prevents the backend container from mutating the source PDFs. - Empty / missing path is harmless: the import API just returns the existing "no spreadsheet found" error rather than crashing the container. To use on staging: rsync the import folder to `/srv/familienarchiv-staging/import/` on the host, set `IMPORT_HOST_DIR=/srv/familienarchiv-staging/import` in `.env.staging`, redeploy, trigger import from `/admin/system`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 18:57:47 +02:00
Marcel	5f3529439a	fix(infra): frontend healthcheck on 127.0.0.1, not localhost The new alpine-based frontend production image (`node:20.19.0-alpine3.21`) resolves `localhost` only to `::1` in /etc/hosts. SvelteKit's adapter-node binds to 0.0.0.0 (IPv4 only), so `wget http://localhost:3000/login` from inside the container connects to ::1 and gets "Connection refused" every 15s. Container goes unhealthy → `docker compose up --wait` fails → nightly staging deploy fails. The app itself is fine. Switching to 127.0.0.1 bypasses /etc/hosts and matches what Node actually listens on. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 18:49:32 +02:00
Marcel	3668555421	fix(compose): mark create-buckets as one-shot for up --wait Closes #510. `docker compose up -d --wait` exits 1 even when every service is healthy because the one-shot `create-buckets` exits 0 and --wait expects "running". The whole stack came up fine on staging, but the workflow gate failed before the smoke step could run. Two changes: 1. create-buckets: `restart: "no"` declares one-shot intent. 2. backend.depends_on: add `create-buckets: service_completed_successfully`. With both, compose v2.20+ understands create-buckets is a one-shot that must complete successfully, and --wait treats exited(0) as the target state. Backend startup now also correctly gates on bucket bootstrap (closes a latent race where backend could start before the archiv-app policy was bound). Verified `docker compose config --quiet` parses and the resolved config shows the right dependency graph. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 16:33:04 +02:00
Marcel	f8f0951bd5	fix(minio): bake bootstrap.sh into image instead of bind-mounting Closes #506. Under Docker-out-of-Docker (the production Gitea Actions runner), the host daemon resolves the relative bind-mount path against the host filesystem — not the runner container's /workspace. The script is not there, so Docker creates an empty directory at /bootstrap.sh and the entrypoint fails with `/bootstrap.sh: Is a directory`. Bake the script into a tiny derived image (infra/minio/Dockerfile) so there is no runtime path resolution. Works in DooD, regular Docker, and CI. Unblocks the staging / production deploy pipelines from #497 / #499 and turns the Compose Bucket Idempotency CI job green. Verified locally: - `docker compose ... config --quiet` parses - `docker compose ... build create-buckets` builds the image - bootstrap.sh exists as a +x file at /bootstrap.sh inside the image Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 15:32:36 +02:00
Marcel	9adde3cd89	refactor(compose): rename docker network archive-net to archiv-net The docker network was the only `archive-` identifier in either compose file; everything else (user, db, bucket, service account, project name) uses the `archiv-` spelling. Reviewers' eyes stuttered on it on the prod compose review (round 2 of PR #499 — Markus and Tobi). Renamed in both prod and dev compose for consistency and updated the single doc reference to the dev-project-prefixed network name. Operational note: applying this change to a running stack will recreate the network on the next `docker compose up`; containers restart, named volumes are unaffected. `docker compose config --quiet` passes for both compose files and for the staging profile. Sweep confirms zero `archive-net` references remain in the tree. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 14:10:39 +02:00
Marcel	1873f50f7f	infra(mailpit): use nc -z healthcheck instead of wget The mailpit service healthcheck previously assumed `wget` ships in the axllent/mailpit image. That's true for v1.29.7 but is not part of the image's contract — a future Alpine slim-down could drop wget and silently disable the healthcheck. Switched to BusyBox `nc -z localhost 8025`, which is a TCP-port open check with no dependency beyond BusyBox itself. Verified inside axllent/mailpit:v1.29.7 that `nc` is present (/usr/bin/nc, BusyBox v1.37.0) and that the proposed command returns 0 against an open port and non-zero against a closed one. Compose still parses with `--profile staging`. Addresses @tobi's round-2 suggestion on PR #499. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 14:08:23 +02:00
Marcel	a4f2047bcc	security(ocr): pin ALLOWED_PDF_HOSTS=minio in prod ocr-service env Production never sources PDFs from localhost or 127.0.0.1 — the OCR service only reads from MinIO over the internal docker network. The Python default (`minio,localhost,127.0.0.1`) was permissive on purpose for local dev, but in production a future change to that default — or a host-env override — would silently broaden the SSRF surface. Pinning the env var explicitly here freezes the allowlist to the one hostname production actually needs. `docker compose config --quiet` and `--profile staging config --quiet` both still pass. Verified the resolved config emits `ALLOWED_PDF_HOSTS: minio`. Addresses @nora's round-2 suggestion on PR #499 — "five characters of YAML, lifetime guarantee". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 14:07:16 +02:00
Marcel	33300e4ad9	chore(infra): drop aspirational Renovate comments from compose The repo's renovate.json only configures TipTap grouping; Renovate is not currently active against MinIO / mc / mailpit / Postgres / Node / Caddy. The "Renovate keeps it current" comments were aspirational — those tags will rot until Renovate is bootstrapped (tracked in a follow-up issue). The "Pinned mc release; Renovate keeps it current" comment is gone already since the create-buckets entrypoint was extracted to a script in the preceding MinIO-policy commit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 13:12:55 +02:00
Marcel	91f70e652d	security(minio): scope archiv-app to bucket-only IAM policy Replaces MinIO's built-in `readwrite` policy (which grants s3:* on arn:aws:s3:::* — every bucket present and future) with a bucket-scoped custom policy `archiv-app-policy`: - s3:GetObject / s3:PutObject / s3:DeleteObject on familienarchiv/* - s3:ListBucket / s3:GetBucketLocation on familienarchiv The previous configuration silently regressed the least-privilege guarantee that the service-account separation was supposed to provide: a future second bucket (logs, backups, mc-mirror staging) would have been read/write/delete-accessible to a compromised backend. While at it, two follow-on fixes: 1. Extract the entrypoint to infra/minio/bootstrap.sh. The previous inline `/bin/sh -c "..."` was already at the YAML-escaping ceiling; adding the policy-JSON heredoc would have made it unreadable. 2. Replace the `| grep -q readwrite || exit 1` fatal-check with a POSIX `case` substring match. The minio/mc image ships coreutils + bash but NOT grep/awk/sed — the original check was a no-op that ALWAYS exited 1 (verified locally). The new check passes on the first invocation and on every subsequent re-deploy. Idempotency verified locally: two consecutive `docker compose run --rm create-buckets` invocations both exit 0 with the user bound to the new policy. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 13:07:56 +02:00
Marcel	4eb5eba347	feat(infra): parameterize OCR mem_limit via OCR_MEM_LIMIT Hardcoded `mem_limit: 12g` only works on CX42+ (16 GB) hosts; a CX32 (8 GB) cannot honour it. Make both mem_limit and memswap_limit driven by the OCR_MEM_LIMIT env var, defaulting to 12g so prod deploys on a CX42 keep current behaviour. Operators on smaller hosts override to 6g. Verified compose interpolation produces 12 GiB by default and 6 GiB when OCR_MEM_LIMIT=6g. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 12:01:23 +02:00
Marcel	47c5f77c81	fix(infra): fail loud when archiv-app is missing the readwrite policy The previous `mc admin policy attach … || true` swallowed every failure mode: a renamed policy, an mc CLI signature change, or a transient MinIO error would leave the bootstrap container exiting zero with the service account possessing no permissions, and the backend would then fail every S3 call after a "successful" deploy. Replace the silent fallback with verify-after: keep the attach (idempotent in current mc, redundant in older versions), then assert via `mc admin user info` that `readwrite` ends up on archiv-app. A genuine attach failure now exits 1 and blocks the stack from starting. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 12:00:34 +02:00
Marcel	a36f25cfc3	fix(infra): pin minio/mc client tag Removes the implicit `:latest` from the create-buckets bootstrap container. Pins to RELEASE.2025-08-13T08-35-41Z so a breaking change in mc CLI syntax cannot silently brick deploys. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 11:59:18 +02:00
Marcel	c9ac83b2ba	fix(infra): pin axllent/mailpit tag Removes `:latest` from the mailpit service; pins to v1.29.7 so staging deploys are reproducible. Renovate keeps the tag current. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 11:58:34 +02:00
Marcel	ecb930e5f9	feat(infra): add docker-compose.prod.yml for production/staging Standalone production compose file (not an overlay) that runs the full stack on a single host. Environment isolation is achieved via the docker compose project name (-p archiv-production / -p archiv-staging) so the two environments cohabit cleanly. Key choices, resolved in #497 review: - Named volumes for persistent data (no host bind mounts) - MinIO pinned to a specific RELEASE tag (no :latest) - Backend uses MinIO service account (S3_ACCESS_KEY=archiv-app), not root credentials; create-buckets bootstraps the account - Mailpit lives under profiles: [staging] so no real SMTP secret is ever wired into the staging deploy - OCR mem_limit 12g + healthcheck (start_period 120s) copied from the dev compose so docker compose up -d --wait works in CI - Backend admin credentials wired through APP_ADMIN_USERNAME / APP_ADMIN_PASSWORD; first deploy locks the password in permanently because UserDataInitializer is idempotent on email - All host ports bound to 127.0.0.1; Caddy fronts external traffic Refs #497. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 21:53:19 +02:00

30 Commits
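
Commit 91f70e652d swaps a `grep -q` pipeline for a POSIX `case` substring match because the minio/mc image ships no grep. The following is only a minimal sketch of that technique, not the contents of infra/minio/bootstrap.sh: the helper name `contains` and the sample text standing in for `mc admin user info` output are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the text in $1 contains the substring $2.
# Uses only the shell's built-in pattern matching, so no grep/awk/sed is needed.
contains() {
  case "$1" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumed sample text; the real `mc admin user info` output format may differ.
sample_info="AccessKey: archiv-app
Status: enabled
PolicyName: archiv-app-policy"

# Verify-after pattern: fail loudly if the expected policy name is absent.
if contains "$sample_info" "archiv-app-policy"; then
  echo "policy attached"
else
  echo "policy missing" >&2
  exit 1
fi
```

Quoting `"$2"` inside the pattern keeps any glob characters in the policy name literal, so the match stays a plain substring test.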