familienarchiv

Author	SHA1	Message	Date
Marcel	49c5324352	fix(ci): use Bash array for curl --resolve to fix smoke tests Quoting RESOLVE as a string and expanding with "$RESOLVE" passes the flag and its value as a single token to curl; curl rejects the whole string as an unknown option (exit 2). Switching to a Bash array and "${RESOLVE[@]}" ensures the two words are always passed as separate arguments regardless of quoting context. Also aligns release.yml gateway detection with nightly.yml: replaces `ip route` (requires iproute2) with /proc/net/route (always available in the job container, no extra package needed). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 12:01:44 +02:00
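The quoting difference described in 49c5324352 can be reproduced without curl. The following is a minimal sketch, not the workflow's actual step: `count_args` is a hypothetical stand-in for curl, and the hostname and IP are placeholder values.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for curl: prints how many arguments it received.
count_args() { echo "$#"; }

HOST="staging.example.invalid"   # placeholder hostname (assumption)
HOST_IP="172.17.0.1"             # placeholder bridge gateway (assumption)

# String form: the quoted expansion reaches the command as ONE argument.
RESOLVE_STR="--resolve ${HOST}:443:${HOST_IP}"
as_string=$(count_args "$RESOLVE_STR")

# Array form: each element stays a separate argument.
RESOLVE=(--resolve "${HOST}:443:${HOST_IP}")
as_array=$(count_args "${RESOLVE[@]}")

echo "string form: ${as_string} argument(s), array form: ${as_array} argument(s)"
```

Leaving the string unquoted would also split it into two words, but that relies on word splitting and breaks as soon as a value contains whitespace or glob characters. The array form keeps the two words separate in every quoting context.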
Marcel	c43f45a472	Merge branch 'fix/issue-601-obs-stack-permanent'	2026-05-16 10:19:59 +02:00
Marcel	55ccd5f3c0	ci(obs): replace rsync with rm+cp in deploy step rsync is not present in the act_runner job container image. rm -rf + cp -r gives identical semantics (including removal of deleted files) using only coreutils, which are always available. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 10:18:42 +02:00
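The "removal of deleted files" behaviour that 55ccd5f3c0 relies on can be checked with temporary directories. This is a sketch under assumed paths: `src` stands in for the workspace copy of infra/observability and `dest` for the permanent deploy location, and the file names are invented for the example.

```bash
#!/usr/bin/env bash
src=$(mktemp -d)                   # stands in for the workspace config tree
dest_root=$(mktemp -d)             # stands in for the permanent deploy root
dest="$dest_root/observability"

# First deploy: two config files exist.
echo "a" > "$src/keep.yml"
echo "b" > "$src/stale.yml"
rm -rf "$dest" && cp -r "$src" "$dest"

# Second deploy: stale.yml has been deleted from the repository.
rm "$src/stale.yml"
rm -rf "$dest" && cp -r "$src" "$dest"

remaining=$(ls "$dest")
echo "$remaining"                  # only keep.yml is left
```

Because the destination is removed before each copy, a file deleted at the source cannot linger, which is the effect `rsync -a --delete` provided.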
marcel	0bb0a314ad	ci(obs): add obs-glitchtip to health assertion loop (now has /_health/ healthcheck)	2026-05-16 09:36:37 +02:00
Marcel	6720a5aeb2	chore(obs): improve deploy maintainability from review feedback - Move POSTGRES_USER to obs.env (non-secret, constant across envs) - Replace cp -r with rsync -a --delete so removed config files are purged from /opt/familienarchiv on next deploy instead of lingering - Document --env-file ordering contract in validate + start steps: obs.env first (defaults), obs-secrets.env second (wins on dupes) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 09:20:08 +02:00
Marcel	9662ff5f8c	ci(obs): quote heredoc delimiter in nightly obs-secrets.env write Prevents shell from expanding '$' in Gitea-rendered secret values. Without the quote, a password like 'P@$s5w0rd' has '$s5w0rd' silently expanded to '' — writing a truncated value to obs-secrets.env. '<<'EOF'' suppresses shell expansion; Gitea's '${{ }}' template rendering already ran before the shell sees the script. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 09:03:46 +02:00
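The truncation described in 9662ff5f8c is easy to reproduce. The sketch below writes the same line twice, once with an unquoted and once with a quoted heredoc delimiter. The temporary file and the use of GRAFANA_ADMIN_PASSWORD as the key are assumptions for illustration; the password is the example value from the commit message.

```bash
#!/usr/bin/env bash
out=$(mktemp)

# Unquoted delimiter: the shell expands $s5w0rd (unset) to an empty string.
cat > "$out" <<EOF
GRAFANA_ADMIN_PASSWORD=P@$s5w0rd
EOF
unquoted=$(cat "$out")

# Quoted delimiter: the body is written verbatim.
cat > "$out" <<'EOF'
GRAFANA_ADMIN_PASSWORD=P@$s5w0rd
EOF
quoted=$(cat "$out")

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
rm -f "$out"
```

The unquoted variant silently writes `GRAFANA_ADMIN_PASSWORD=P@`, while the quoted variant preserves the full value.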
Marcel	f5c7be932b	ci(obs): document POSTGRES_HOST derivation from Compose project name The container names archiv-staging-db-1 and archiv-production-db-1 are derived from the Compose project name + service name. A project rename silently breaks the obs stack DB connection. Add a comment at the point of definition so the dependency is obvious when someone changes it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 08:54:17 +02:00
Marcel	dec0001bd1	ci(obs): chmod 600 obs-secrets.env after creation in both workflows The heredoc creates the file with default umask permissions (644 — world-readable). Setting 600 immediately after creation prevents other processes on the host from reading the Grafana, GlitchTip, and Postgres credentials. Defence-in-depth for the single-tenant VPS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 08:53:49 +02:00
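The permission change in dec0001bd1 can be observed directly. This is a sketch assuming a umask of 022 (which yields the 644 mode mentioned above); the temporary path and the placeholder secret are invented for the example.

```bash
#!/usr/bin/env bash
umask 022                              # assumed default; new files come out as 644
secrets="$(mktemp -d)/obs-secrets.env"

cat > "$secrets" <<'EOF'
GRAFANA_ADMIN_PASSWORD=example-only
EOF
before=$(ls -l "$secrets" | cut -c1-10)

chmod 600 "$secrets"                   # owner read/write only
after=$(ls -l "$secrets" | cut -c1-10)

echo "before: $before  after: $after"
```

Running the write under `umask 077` would create the file as 600 from the start; whether the workflows could adopt that is not something the commit states.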
Marcel	53cf1837b2	fix(obs): set POSTGRES_HOST per environment — staging/prod use compose auto-names not archive-db Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 00:21:53 +02:00
Marcel	c5139851b8	ci(obs): GitOps obs env split in nightly — obs.env in git, secrets fresh from Gitea Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 00:18:38 +02:00
Marcel	79735e23e0	ci(obs): assert obs-loki/prometheus/grafana/tempo are healthy after stack up Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 00:01:48 +02:00
Marcel	df37113d38	ci(obs): add compose config dry-run before obs stack up to catch .env substitution errors Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 00:01:17 +02:00
Marcel	7e52494880	fix(ci): deploy obs configs to /opt/familienarchiv/ before starting stack The observability stack's bind-mount sources pointed to workspace-relative paths. When CI wiped the workspace between runs, containers kept running but their config files disappeared — causing Docker to auto-create directories at the missing paths and crash the services on next restart. Fix: mount /opt/familienarchiv/ into CI job containers via runner-config.yaml, then copy infra/observability/ and docker-compose.observability.yml there before docker compose up. Compose runs from the permanent path, so bind mounts resolve to stable host paths that survive workspace wipes. Docker Compose reads /opt/familienarchiv/.env automatically (no --env-file flag), which is managed on the server and persists between CI runs. Closes #601 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 21:59:23 +02:00
Marcel	56c3e51657	fix(ci): replace overlay2 sync with workspace bind mount for DooD runner-config.yaml: correct path to /srv/gitea-workspace (VPS, not Synology). docker-compose.observability.yml: revert 5 bind mounts to plain relative paths; OBS_CONFIG_DIR variable is no longer needed. nightly.yml / release.yml: remove OBS_CONFIG_DIR env injection and the "Sync observability configs to host" step from both workflows. With workdir_parent=/srv/gitea-workspace and an identical host<->container bind mount, $(pwd) inside job containers resolves to a real host path the daemon can find — no privileged container, no overlay2 inspection, no nsenter. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 19:36:55 +02:00
Marcel	1fc47888d5	fix(ci): sync observability configs to host before docker compose up (#598) DooD runner only shares /var/run/docker.sock — no workspace directory is mapped to the host daemon. Relative bind mounts in docker-compose.observability.yml resolved to paths that didn't exist on the host; Docker auto-created directories in their place, causing 'not a directory' mount failures for all five config files. Fix: - docker-compose.observability.yml: replace hardcoded ./infra/observability/ prefix with ${OBS_CONFIG_DIR:-./infra/observability} so the path is configurable while remaining backwards-compatible for local use. - nightly.yml / release.yml: add a 'Sync observability configs to host' step that finds the job container's overlay2 MergedDir (the container's full filesystem as seen from the host mount namespace), then uses the existing nsenter/alpine pattern to cp the config tree into a stable host path (/srv/familienarchiv-{staging,production}/obs-configs). OBS_CONFIG_DIR is injected into the env file so Compose picks it up. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 19:02:53 +02:00
Marcel	fed427dc4a	fix(infra): set OTEL_EXPORTER_OTLP_ENDPOINT in docker-compose.prod.yml The endpoint belongs in the compose file (hardcoded to the in-network Tempo service) rather than per-environment workflow files. This covers both staging (nightly.yml) and production (release.yml) with a single change and removes the duplicate from the nightly env-file block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 17:43:23 +02:00
Marcel	cf78ab2f8e	fix(staging): correct backend healthcheck port and OTel endpoint Two bugs introduced when the management port was split from the app port: 1. Backend healthcheck hit localhost:8080/actuator/health (app port) — actuator is on management.server.port=8081, so every probe got a 404 from the main MVC dispatcher, marking the container permanently unhealthy. Fix: change the probe to localhost:8081. 2. OTEL_EXPORTER_OTLP_ENDPOINT was not set in .env.staging, so the exporter fell back to http://localhost:4317 (the CI-safe default) instead of http://tempo:4317 (the in-network Tempo service). Fix: inject the correct endpoint in the nightly env-file generation step. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 17:37:15 +02:00
Marcel	ada3a3ccaf	devops(ci): add --remove-orphans to observability stack deploy steps Both nightly and release workflows were missing --remove-orphans on the observability compose up, while the main app deploy step already had it. Without it, containers removed from docker-compose.observability.yml linger as unnamed orphans until manually pruned. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 14:55:28 +02:00
Marcel	4a7349543a	devops(ci): wire SENTRY_DSN into staging and production env files Adds SENTRY_DSN as an optional secret (empty by default) so it can be set after GlitchTip first-run without requiring another code change. Backend reads it via application.yaml; empty value keeps Sentry disabled. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 13:45:07 +02:00
Marcel	f15e004645	devops(ci): add --wait to observability stack startup Prometheus, Loki, Tempo, and Grafana all define healthchecks in docker-compose.observability.yml. Without --wait, the step exits 0 as soon as containers are created, masking startup failures silently. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 13:44:16 +02:00
Marcel	d7d225af77	devops(observability): wire observability stack into nightly and release deploys - docker-compose.prod.yml: add `name: archiv-net` so the network has a stable Docker name regardless of compose project name (-p flag). Both staging and production share the same host-level network, which is correct since the observability stack is a single shared instance. - nightly.yml / release.yml: add observability env vars (POSTGRES_USER, PORT_GRAFANA=3003, PORT_GLITCHTIP=3002, PORT_PROMETHEUS=9090, GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, GLITCHTIP_DOMAIN) to the env file, then `docker compose -f docker-compose.observability.yml up -d` after the app deploy step. PORT_GRAFANA=3003 avoids collision with staging frontend on 3001. Requires two new Gitea secrets: GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 11:22:37 +02:00
Marcel	9c26c00eee	fix(ci): replace iproute2 `ip` with /proc/net/route for gateway detection `ip route` (iproute2) is not installed in the Gitea runner container, causing the smoke test step to exit 127. /proc/net/route is a kernel virtual file that is always present on Linux; awk decodes the little-endian hex gateway field to dotted-decimal without any external binary dependency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:50:56 +02:00
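The /proc/net/route decoding described in 9c26c00eee can be sketched as follows. This is an illustration, not the workflow's actual step: the names `default_gateway` and `hex2dec` and the sample routing table are assumptions made for the example. The hex conversion is written by hand so the sketch does not depend on gawk's `strtonum`.

```bash
#!/usr/bin/env bash
# Print the default gateway from a /proc/net/route-style table.
# default_gateway and hex2dec are hypothetical names for this sketch.
default_gateway() {
  awk '
    # Convert a hex string to a number without gawk-only strtonum().
    function hex2dec(h,    i, n) {
      n = 0
      for (i = 1; i <= length(h); i++)
        n = n * 16 + index("0123456789ABCDEF", toupper(substr(h, i, 1))) - 1
      return n
    }
    # Default route: destination 00000000 with a non-zero gateway.
    $2 == "00000000" && $3 != "00000000" {
      # The field is little-endian, so the last byte pair is the first octet.
      a = hex2dec(substr($3, 7, 2)); b = hex2dec(substr($3, 5, 2))
      c = hex2dec(substr($3, 3, 2)); d = hex2dec(substr($3, 1, 2))
      printf "%d.%d.%d.%d\n", a, b, c, d
      exit
    }
  ' "${1:-/proc/net/route}"
}

# Sample table (the real file is tab-separated; awk splits on either).
sample=$(mktemp)
printf '%s\n' \
  'Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT' \
  'eth0 00000000 010011AC 0003 0 0 0 00000000 0 0 0' \
  'eth0 000011AC 00000000 0001 0 0 0 0000FFFF 0 0 0' > "$sample"

gateway=$(default_gateway "$sample")
echo "$gateway"   # 172.17.0.1
rm -f "$sample"
```

Called without an argument, the function reads /proc/net/route directly. An empty result means no default route was found, which is the case the HOST_IP guard in f1032865f3 is meant to catch.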
Marcel	6d16be4669	fix(ci): quote \$RESOLVE in all curl calls Unquoted variable expansion is safe here since the value contains no spaces or glob characters, but quoting is the correct default and keeps the script consistent with surrounding style. Addresses review suggestion by Felix Brandt and Tobias Wendt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:26:35 +02:00
Marcel	f1032865f3	fix(ci): guard against empty HOST_IP in smoke test If `ip route show default` returns no output the old code passed an empty string to curl --resolve, producing a confusing error 6 ("couldn't resolve host") with no indication that gateway detection had failed. The new guard exits immediately with a clear message. Addresses review concern raised by Tobias Wendt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:26:35 +02:00
Marcel	3056311c24	fix(ci): resolve smoke test host via bridge gateway, not 127.0.0.1 Job containers run in bridge network mode (runner-config.yaml). Inside a bridge-networked container 127.0.0.1 is the container's own loopback; Caddy on the host is unreachable there, causing an immediate ECONNREFUSED. Use the Docker bridge gateway IP instead — the host's docker0 interface where Caddy (bound on 0.0.0.0:443) is reachable from the container. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:10:17 +02:00
Marcel	544b96bc9e	fix(ci): pin Reload Caddy to alpine:3.21 digest, add reload-vs-restart rationale - Switch ubuntu:22.04 (floating, ~70 MB) to alpine:3.21 pinned by sha256 digest (~5 MB); util-linux installed at run time via apk add - Add explicit comment explaining why `reload` not `restart`: SIGHUP re-reads config in-process without dropping TLS connections Addresses Tobias + Nora blocker from PR review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 07:42:28 +02:00
Marcel	d750d5cee2	fix(ci): reload Caddy via nsenter, not sudo systemctl `sudo systemctl reload caddy` does not work from inside a DooD job container: `systemctl` is absent from Ubuntu container images and container processes cannot reach the host systemd without entering its namespaces. Replace with `docker run --privileged --pid=host ubuntu:22.04 nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy`, which uses the already-mounted Docker socket to spin up a privileged sibling container that enters the host PID namespace via nsenter. Tested live on the Hetzner VPS. No sudoers entry required. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 07:42:28 +02:00
Marcel	90f52eae41	ci(nightly): reload Caddy before smoke test Adds a `sudo systemctl reload caddy` step between the docker compose deploy and the smoke test. This ensures any committed Caddyfile changes are applied before the public surface is verified. Previously the workflow had no mechanism to push Caddyfile changes to the running host daemon. A Caddyfile edit would land in the repo but Caddy would keep serving the previous config, causing the smoke test to catch a stale header or still-proxied /actuator route rather than the intended current config. This step also surfaces the root cause of today's port-443 failure explicitly: if Caddy is not running, the step fails with a clear service error rather than a misleading "Failed to connect to port 443" from curl. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 07:42:28 +02:00
Marcel	3775f4cb52	ci(nightly): regression guard for backend /import:ro mount Sara flagged that a future "compose cleanup" PR could silently drop the backend volumes block and CI would happily pass while mass import on staging silently broke. Adds a pre-build step that renders the staging compose config and fails the deploy if `target: /import` or `read_only: true` is missing. Local verification of the guard: - Volumes block removed → `grep -q 'target: /import'` exits 1 → step fails - Volumes block present → both greps match → step passes Addresses Sara's review on #526. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 20:08:30 +02:00
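The two-grep guard from 3775f4cb52 can be sketched against saved config text. The function name `import_mount_guard` and both sample fragments are assumptions for this example; in the workflow the input would be the rendered staging compose config rather than hand-written files.

```bash
#!/usr/bin/env bash
# import_mount_guard is a hypothetical name; $1 is a file holding rendered compose config.
import_mount_guard() {
  grep -q 'target: /import' "$1" && grep -q 'read_only: true' "$1"
}

with_mount=$(mktemp)
without_mount=$(mktemp)
printf '%s\n' 'services:' '  backend:' '    volumes:' '      - type: bind' \
  '        source: /srv/familienarchiv-staging/import' '        target: /import' \
  '        read_only: true' > "$with_mount"
printf '%s\n' 'services:' '  backend:' '    image: example' > "$without_mount"

if import_mount_guard "$with_mount"; then present=pass; else present=fail; fi
if import_mount_guard "$without_mount"; then removed=pass; else removed=fail; fi
echo "volumes present: $present, volumes removed: $removed"
rm -f "$with_mount" "$without_mount"
```

The two patterns are matched independently, so this sketch would also pass if `read_only: true` appeared on a different volume. A stricter check would need to parse the rendered YAML.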
Marcel	9703a72e6c	ci(nightly): wire IMPORT_HOST_DIR=/srv/familienarchiv-staging/import The compose file now requires IMPORT_HOST_DIR or refuses to start (#526). Without this line the next nightly deploy would fail with a clear interpolation error, but it should not fail — the staging import payload already lives at this host path (rsync'd in #526). Addresses Tobias's review on #526. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 20:05:55 +02:00
Marcel	54a8f7f8e9	fix(workflows): match runner label — runs-on ubuntu-latest, not self-hosted Closes #508. Our gitea-runner advertises labels ubuntu-latest / ubuntu-24.04 / ubuntu-22.04. `runs-on: self-hosted` never matches → dispatched deploy jobs sit in the queue forever. The runner is still genuinely self-hosted (DooD socket, joined to gitea_gitea net, single-tenant per ADR-011) — the `self-hosted` token was just an unconfirmed assumption about the label name. Unblocks #497 / #499 first deploy. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 16:15:53 +02:00
Marcel	440a191138	infra(workflows): annotate env-file cleanup as load-bearing The `if: always()` conditional on the env-file cleanup step in both deploy workflows is what makes the ADR-011 single-tenant runner trust model safe: secrets land on disk before each deploy and are wiped unconditionally afterwards. A future workflow refactor that drops `if: always()` would silently leave plaintext secrets on the runner on any failed deploy. The ADR documents this; the workflow file did not. Adds a prominent inline comment so the next reader of the YAML sees the constraint without having to cross-reference ADR-011. No behaviour change — both workflows still parse. Addresses @nora's round-2 suggestion on PR #499 — "linchpin of the ADR-011 trust model". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 14:09:12 +02:00
Marcel	09680557ef	security(caddy): add Permissions-Policy header Adds `Permissions-Policy: camera=(), microphone=(), geolocation=()` to the shared (security_headers) snippet, so both archiv vhosts and the git vhost deny browser APIs the app does not use. Reduces blast radius of an XSS landing in a privileged origin. The deploy smoke steps in nightly.yml and release.yml gain a matching assertion against the canonical header value, so a future Caddyfile edit that drops or loosens the header (e.g. `camera=(self)`) fails the deploy instead of regressing silently. `caddy validate` against caddy:2 passes; both workflow YAMLs parse. Addresses @nora's round-2 suggestion on PR #499 — "lower-impact than CSP but nearly free". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 14:06:13 +02:00
Marcel	8fcf653cb0	ci(smoke): pin HSTS to preload-list-eligible value Replaces the presence-only `grep -qi strict-transport-security` smoke assertion in both nightly.yml and release.yml with a value-pinning regex that requires `max-age=31536000`, `includeSubDomains`, and `preload`. A future Caddyfile edit that drops any of those three parts now fails the deploy smoke step instead of passing silently. Verified locally that the new pattern matches the preload-eligible value and rejects three degraded forms (short max-age, missing includeSubDomains, missing preload). Addresses @sara's round-2 note on PR #499 — "presence check, not value check". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 14:05:02 +02:00
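A value-pinning check of the kind 8fcf653cb0 describes might look like the sketch below. The exact regex used in the workflows is not shown in the commit message, so the pattern here is an assumption, as are the helper names `hsts_is_pinned` and `check`. It is exercised against the preload-eligible value and the three degraded forms listed above.

```bash
#!/usr/bin/env bash
# hsts_is_pinned is a hypothetical helper; it reads response headers on stdin.
hsts_is_pinned() {
  grep -Eqi '^strict-transport-security:[[:space:]]*max-age=31536000;[[:space:]]*includeSubDomains;[[:space:]]*preload'
}

# Feed one header value through the check and report the verdict.
check() {
  if printf 'Strict-Transport-Security: %s\r\n' "$1" | hsts_is_pinned; then
    echo accept
  else
    echo reject
  fi
}

good=$(check 'max-age=31536000; includeSubDomains; preload')
short_age=$(check 'max-age=3600; includeSubDomains; preload')
no_subdomains=$(check 'max-age=31536000; preload')
no_preload=$(check 'max-age=31536000; includeSubDomains')
echo "$good $short_age $no_subdomains $no_preload"
```

This pattern also fixes the directive order, which is stricter than the header syntax requires. That is acceptable for a smoke test pinned to one known Caddyfile value, but it would need loosening if the directives were ever reordered.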
Marcel	fe1451f570	ci(smoke): pin curl to 127.0.0.1 via --resolve The smoke step previously curled the public hostname unconditionally, which routes the runner's request via DNS → router → back into the same host. Many SOHO routers do not implement hairpin NAT (or do so only after a firmware update), so the deploy may pass on day one and silently fail on day 90. --resolve "<host>:443:127.0.0.1" pins the hostname to the runner's loopback while keeping SNI on the public name (so the cert validates correctly and the Caddy vhost block matches). The smoke test now verifies that the Caddy-on-the-same-host is serving the right hostname end-to-end, with no router dependency. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 13:12:05 +02:00
Marcel	f2ec81547b	ci(deploy): add --pull to docker compose build for CVE pickup Without --pull, the host's Docker layer cache wins: if a CVE drops in node:20.19.0-alpine3.21 / postgres:16-alpine and the vendor re-publishes the same tag, the runner keeps serving the cached layer until the cache is manually cleared — a silent supply-chain blind spot. Adding --pull to both `compose build` invocations costs a single re-pull per run and lifts the base-image patch lag from "next host prune" to "next nightly". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 13:10:59 +02:00
Marcel	83565c6bb5	docs(ci): document workflow operational assumptions The two deploy workflows make two non-obvious assumptions that future maintainers should not have to rediscover by reading the diff: 1. Single-tenant self-hosted runner — the .env.* file lands on disk during the deploy and is cleaned up unconditionally. Multi-tenant usage would require switching to stdin-piped env input. 2. Host docker layer cache is authoritative — there is no actions/cache directive; a host-level `docker system prune` will cold-start the next build. Both notes added as block comments at the top of each workflow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 12:06:48 +02:00
Marcel	c523721ce8	feat(ci): smoke test staging deploy after up --wait Healthchecks prove containers are healthy on the docker network; they do not prove the public URL is reachable, HSTS still fires, or /actuator is still blocked at the edge. Add a post-deploy smoke step to nightly.yml that: 1. GETs https://staging.raddatz.cloud/login (frontend reachable) 2. asserts the response includes the Strict-Transport-Security header 3. asserts /actuator/health returns 404 (defense-in-depth verified) Failure aborts the workflow before the env-file cleanup step. The cleanup step still runs because it is `if: always()`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 12:05:00 +02:00
Marcel	59349dfe93	feat(ci): add nightly staging deploy workflow Runs daily at 02:00 (and on workflow_dispatch). Builds the prod compose stack with BuildKit, writes a transient .env.staging from Gitea secrets, then `docker compose up -d --wait` so the job fails loudly if any service's healthcheck never reports healthy. The --profile staging flag starts the mailpit catcher in place of a real SMTP relay; no production SMTP credentials touch the staging environment. The .env.staging file is cleaned up in `if: always()` to avoid leaving secrets in the runner workspace between runs. Refs #497. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 21:55:41 +02:00

39 Commits