familienarchiv

Author	SHA1	Message	Date
Marcel	efa01337a5	ci: restrict push trigger to main — eliminate duplicate runs on feature branches Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 11:09:52 +02:00
Marcel	3de0d2f0fe	fix(ci): add IMPORT_HOST_DIR stub to compose-idempotency env file Docker Compose interpolates all variables in the full file even when only a subset of services is requested. The backend service uses IMPORT_HOST_DIR with :? (hard-required), causing the idempotency job to abort before any container starts. A dummy path satisfies the parser; the backend service is never started in this job so the path need not exist. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 10:58:38 +02:00
Marcel	0abbc147e2	ci(unit-tests): add negative self-test case to upload-artifact guard The previous self-test proved the regex catches @v5 (positive case). This adds a negative case proving @v3 is NOT flagged — guards against a false-positive that would break every CI run permanently. Suggested by Sara Holt in review of PR #558. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 10:58:19 +02:00
Marcel	fa46492759	ci(workflows): downgrade upload-artifact v4 → v3 — Gitea act_runner limitation (ADR-014) Reverts the re-regression introduced in `410b91e2`. Gitea Actions (act_runner) does not implement the v4 artifact protocol — jobs report failure even when all tests pass. Pins all three call sites back to @v3 and adds load-bearing inline comments pointing to ADR-014 / #557. This commit makes the grep guard added in the previous commit GREEN. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 10:58:19 +02:00
Marcel	3965541879	ci(unit-tests): add grep guard for (upload\|download)-artifact@v4+ Adds a repo-invariant check in the same 'Assert' block as the ADR-012 birpc guard. Anchored to YAML `uses:` lines so the inline self-test fixture does not false-positive. Fails with an actionable error referencing ADR-014 / #557. Guard is intentionally RED at this commit — the three v4 call sites are downgraded in the next commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 10:58:19 +02:00
Marcel	c820884765	ci(coverage-flake-probe): add workflow_dispatch matrix job (20 parallel runs) Verification mechanism for the 20-run acceptance criterion of issue #553. Triggered manually via workflow_dispatch, runs the full coverage suite 20× in parallel against a single SHA, asserts zero `[birpc] rpc is closed` lines in every cell. One fire, parallel cost (~one main-job's wall-clock), deterministic signal for the teardown race. Cheaper than 20 sequential push events and tests the same property the AC names. Closes the verification gap raised by Tobias and Elicit in the issue discussion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:12:04 +02:00
Marcel	67cd56acc7	ci(unit-tests): extend grep guard to async vi.mock with dynamic import The pdfjs-dist literal grep added in `9260866f` only caught one named trigger of the birpc teardown race; the underlying mechanism (ADR 012 / #553) is any async vi.mock factory whose body performs `await import(...)`. Add a second PCRE-multiline grep matching that shape. Scoped to */.{spec,test}.ts under frontend/src/, excluding __meta__ (which holds the fixture strings exercising the meta-test). Defence in depth pairs with the ESLint rule (saves at edit time) and the in-suite meta-test (catches when tests run). Verified locally with real GNU grep against a planted synthetic offender. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:11:09 +02:00
Marcel	9260866f47	ci(unit-tests): add early grep check for banned vi.mock pdfjs-dist pattern Adds a static grep step that runs after Lint and before the test suite. Fails in ~1 s if any file under frontend/src/ contains the banned vi.mock('pdfjs-dist' pattern, catching the regression before Playwright spins up. Belt-and-suspenders with the ESLint rule (ADR 012). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 12:32:23 +02:00
Marcel	1ead1f293f	ci(coverage): document that birpc guard covers coverage run only Adds a comment above the assertion step so a future developer diagnosing a birpc-related failure in `npm test` knows where to find the diagnostic. Addresses Sara Holt + Tobias Wendt round-4 observation on PR #536. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:57:28 +02:00
Marcel	729f5c66d6	ci(coverage): use grep -F for birpc guard to avoid BRE escaping -F (fixed string) matches the literal pattern [birpc] rpc is closed without relying on BRE bracket escaping, making the intent explicit and immune to accidental regex interpretation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:57:28 +02:00
Marcel	d40f477397	ci(coverage): include coverage log in artifact upload The birpc guard step writes to /tmp/coverage-test-<run_id>.log and exits 1 when a race is detected. Without this file in the artifact, the evidence disappears when the runner tears down — only the exit code remained visible. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:57:28 +02:00
Marcel	cf78957476	ci(coverage): harden coverage guard step - Add explicit set -eo pipefail so npm test:coverage exit code propagates through the pipe (not just tee's always-0 exit) - Scope log file to github.run_id to prevent stale-log false positives on retried steps sharing the same runner /tmp - Tighten grep pattern to \[birpc\] rpc is closed to avoid matching unrelated log lines that happen to contain "rpc is closed" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:57:28 +02:00
Marcel	3594204214	ci(coverage): simplify coverage step and pin shell to bash - removes unreachable `; exit ${PIPESTATUS[0]}` — already covered by pipefail (Tobias) - adds explicit `shell: bash` to both new steps for clarity (Tobias) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:57:28 +02:00
Marcel	538adb43a9	ci(guard): fail unit-tests job if [birpc] rpc is closed appears in coverage run Captures npm run test:coverage output with tee and adds an always-run step that greps for the teardown-race fingerprint. Any future regression where a vi.mock factory races with birpc teardown will now surface as an explicit CI failure rather than a silent exit-1 after all tests report green (#535). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:57:28 +02:00
Marcel	9c26c00eee	fix(ci): replace iproute2 `ip` with /proc/net/route for gateway detection `ip route` (iproute2) is not installed in the Gitea runner container, causing the smoke test step to exit 127. /proc/net/route is a kernel virtual file that is always present on Linux; awk decodes the little-endian hex gateway field to dotted-decimal without any external binary dependency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:50:56 +02:00
Marcel	6d16be4669	fix(ci): quote \$RESOLVE in all curl calls Unquoted variable expansion is safe here since the value contains no spaces or glob characters, but quoting is the correct default and keeps the script consistent with surrounding style. Addresses review suggestion by Felix Brandt and Tobias Wendt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:26:35 +02:00
Marcel	f1032865f3	fix(ci): guard against empty HOST_IP in smoke test If `ip route show default` returns no output the old code passed an empty string to curl --resolve, producing a confusing error 6 ("couldn't resolve host") with no indication that gateway detection had failed. The new guard exits immediately with a clear message. Addresses review concern raised by Tobias Wendt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:26:35 +02:00
Marcel	3056311c24	fix(ci): resolve smoke test host via bridge gateway, not 127.0.0.1 Job containers run in bridge network mode (runner-config.yaml). Inside a bridge-networked container 127.0.0.1 is the container's own loopback; Caddy on the host is unreachable there, causing an immediate ECONNREFUSED. Use the Docker bridge gateway IP instead — the host's docker0 interface where Caddy (bound on 0.0.0.0:443) is reachable from the container. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 09:10:17 +02:00
Marcel	544b96bc9e	fix(ci): pin Reload Caddy to alpine:3.21 digest, add reload-vs-restart rationale - Switch ubuntu:22.04 (floating, ~70 MB) to alpine:3.21 pinned by sha256 digest (~5 MB); util-linux installed at run time via apk add - Add explicit comment explaining why `reload` not `restart`: SIGHUP re-reads config in-process without dropping TLS connections Addresses Tobias + Nora blocker from PR review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 07:42:28 +02:00
Marcel	d29169eb39	fix(ci): add Caddy reload step to release workflow Same gap as nightly.yml: production deploys also need Caddy to reload the updated Caddyfile before the smoke test validates the public surface. Uses the same nsenter pattern introduced in the previous commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 07:42:28 +02:00
Marcel	d750d5cee2	fix(ci): reload Caddy via nsenter, not sudo systemctl `sudo systemctl reload caddy` does not work from inside a DooD job container: `systemctl` is absent from Ubuntu container images and container processes cannot reach the host systemd without entering its namespaces. Replace with `docker run --privileged --pid=host ubuntu:22.04 nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy`, which uses the already-mounted Docker socket to spin up a privileged sibling container that enters the host PID namespace via nsenter. Tested live on the Hetzner VPS. No sudoers entry required. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 07:42:28 +02:00
Marcel	90f52eae41	ci(nightly): reload Caddy before smoke test Adds a `sudo systemctl reload caddy` step between the docker compose deploy and the smoke test. This ensures any committed Caddyfile changes are applied before the public surface is verified. Previously the workflow had no mechanism to push Caddyfile changes to the running host daemon. A Caddyfile edit would land in the repo but Caddy would keep serving the previous config, causing the smoke test to catch a stale header or still-proxied /actuator route rather than the intended current config. This step also surfaces the root cause of today's port-443 failure explicitly: if Caddy is not running, the step fails with a clear service error rather than a misleading "Failed to connect to port 443" from curl. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 07:42:28 +02:00
Marcel	e42c7b04c1	ci: drop redundant npm test step, coverage run covers it The test:coverage step runs the full suite under Istanbul; running `npm test` first executes every test twice for no extra signal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 21:50:28 +02:00
Marcel	3775f4cb52	ci(nightly): regression guard for backend /import:ro mount Sara flagged that a future "compose cleanup" PR could silently drop the backend volumes block and CI would happily pass while mass import on staging silently broke. Adds a pre-build step that renders the staging compose config and fails the deploy if `target: /import` or `read_only: true` is missing. Local verification of the guard: - Volumes block removed → `grep -q 'target: /import'` exits 1 → step fails - Volumes block present → both greps match → step passes Addresses Sara's review on #526. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 20:08:30 +02:00
Marcel	c2c42706c7	ci(release): wire IMPORT_HOST_DIR=/srv/familienarchiv-production/import Mirrors the staging change. The host directory does not yet exist on the production server — first production release that consumes this will create an empty bind source via Docker's auto-create behaviour; mass import then reports "no spreadsheet found" until an operator pre-stages a payload there. Addresses Tobias's review on #526. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 20:06:33 +02:00
Marcel	9703a72e6c	ci(nightly): wire IMPORT_HOST_DIR=/srv/familienarchiv-staging/import The compose file now requires IMPORT_HOST_DIR or refuses to start (#526). Without this line the next nightly deploy would fail with a clear interpolation error, but it should not fail — the staging import payload already lives at this host path (rsync'd in #526). Addresses Tobias's review on #526. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 20:05:55 +02:00
Marcel	6ba7254344	test(ci): assert prerender output is only /hilfe/transkription Addresses Sara's review request on #515. Without this gate, a future regression that turns prerender.crawl back on (or adds a new prerender entry whose nav links into protected routes) would silently bake /, /documents, /persons etc. to "redirect-to-login" HTML and re-introduce #514. Verified the script catches the current broken build state: $ find build/prerendered ... -not -path 'hilfe/*' ... build/prerendered/{index,documents,persons,geschichten,stammbaum}.html Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 17:00:54 +02:00
Marcel	54a8f7f8e9	fix(workflows): match runner label — runs-on ubuntu-latest, not self-hosted Closes #508. Our gitea-runner advertises labels ubuntu-latest / ubuntu-24.04 / ubuntu-22.04. `runs-on: self-hosted` never matches → dispatched deploy jobs sit in the queue forever. The runner is still genuinely self-hosted (DooD socket, joined to gitea_gitea net, single-tenant per ADR-011) — the `self-hosted` token was just an unconfirmed assumption about the label name. Unblocks #497 / #499 first deploy. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 16:15:53 +02:00
Marcel	e5363913ec	fix(fail2ban): pin polling backend so jail actually reads Caddy access log Closes #503. Debian's fail2ban package ships defaults-debian.conf with `[DEFAULT] backend = systemd`. Without an explicit override, our familienarchiv-auth jail inherits the systemd backend at runtime, reads from journald, and never inspects /var/log/caddy/access.log. A live login brute-force would not be banned. Add `backend = polling` to the jail and a CI step that links the jail into /etc/fail2ban/ and asserts `fail2ban-client -d` resolves it to the polling backend, not the inherited systemd backend. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 14:59:40 +02:00
Marcel	440a191138	infra(workflows): annotate env-file cleanup as load-bearing The `if: always()` conditional on the env-file cleanup step in both deploy workflows is what makes the ADR-011 single-tenant runner trust model safe: secrets land on disk before each deploy and are wiped unconditionally afterwards. A future workflow refactor that drops `if: always()` would silently leave plaintext secrets on the runner on any failed deploy. The ADR documents this; the workflow file did not. Adds a prominent inline comment so the next reader of the YAML sees the constraint without having to cross-reference ADR-011. No behaviour change — both workflows still parse. Addresses @nora's round-2 suggestion on PR #499 — "linchpin of the ADR-011 trust model". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 14:09:12 +02:00
Marcel	09680557ef	security(caddy): add Permissions-Policy header Adds `Permissions-Policy: camera=(), microphone=(), geolocation=()` to the shared (security_headers) snippet, so both archiv vhosts and the git vhost deny browser APIs the app does not use. Reduces blast radius of an XSS landing in a privileged origin. The deploy smoke steps in nightly.yml and release.yml gain a matching assertion against the canonical header value, so a future Caddyfile edit that drops or loosens the header (e.g. `camera=(self)`) fails the deploy instead of regressing silently. `caddy validate` against caddy:2 passes; both workflow YAMLs parse. Addresses @nora's round-2 suggestion on PR #499 — "lower-impact than CSP but nearly free". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 14:06:13 +02:00
Marcel	8fcf653cb0	ci(smoke): pin HSTS to preload-list-eligible value Replaces the presence-only `grep -qi strict-transport-security` smoke assertion in both nightly.yml and release.yml with a value-pinning regex that requires `max-age=31536000`, `includeSubDomains`, and `preload`. A future Caddyfile edit that drops any of those three parts now fails the deploy smoke step instead of passing silently. Verified locally that the new pattern matches the preload-eligible value and rejects three degraded forms (short max-age, missing includeSubDomains, missing preload). Addresses @sara's round-2 note on PR #499 — "presence check, not value check". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 14:05:02 +02:00
Marcel	fe1451f570	ci(smoke): pin curl to 127.0.0.1 via --resolve The smoke step previously curled the public hostname unconditionally, which routes the runner's request via DNS → router → back into the same host. Many SOHO routers do not implement hairpin NAT (or do so only after a firmware update), so the deploy may pass on day one and silently fail on day 90. --resolve "<host>:443:127.0.0.1" pins the hostname to the runner's loopback while keeping SNI on the public name (so the cert validates correctly and the Caddy vhost block matches). The smoke test now verifies that the Caddy-on-the-same-host is serving the right hostname end-to-end, with no router dependency. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 13:12:05 +02:00
Marcel	f2ec81547b	ci(deploy): add --pull to docker compose build for CVE pickup Without --pull, the host's Docker layer cache wins: if a CVE drops in node:20.19.0-alpine3.21 / postgres:16-alpine and the vendor re-publishes the same tag, the runner keeps serving the cached layer until the cache is manually cleared — a silent supply-chain blind spot. Adding --pull to both `compose build` invocations costs a single re-pull per run and lifts the base-image patch lag from "next host prune" to "next nightly". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 13:10:59 +02:00
Marcel	7e430998b8	security(fail2ban): widen jail to /forgot-password and rate-limit 429 The filter only watched /api/auth/login 401 — leaving the forgot-password endpoint open to: - email enumeration (slow brute-force probing which addresses exist) - password-reset brute-force against accounts whose addresses leak Widens the failregex to /api/auth/(login\|forgot-password) and adds 429 to the status alternation so a future in-app rate-limiter response is also caught by the jail (defense in depth). CI assertions extended to cover both new dimensions plus a negative case on an unrelated 401 endpoint (/api/documents) — pins that the widening did not over-match. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 13:10:08 +02:00
Marcel	156afa14a2	test(ci): add compose bucket-bootstrap idempotency job The create-buckets service in docker-compose.prod.yml runs on every `docker compose up` (one-shot, restart=no). A re-deploy that fails because the user/bucket/policy already exists would block the whole nightly/release pipeline — and the only way to find out today is to run a second deploy. This job runs the bootstrap twice against a throwaway minio stack and asserts both invocations exit 0. Caught at PR time, not at the third nightly deploy at 02:00. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 13:08:51 +02:00
Marcel	9652894aa4	test(ci): add fail2ban-regex regression job Caddy 2.x emits JSON access logs; the failregex in infra/fail2ban/filter.d/familienarchiv-auth.conf depends on the "remote_ip" → "uri" → "status" key order being stable. A future Caddy upgrade that reorders fields would break the jail silently (regex no longer matches → fail2ban returns 0 hits → host stops banning brute-force, discovered only at the next incident). This job pins the contract: a sample /api/auth/login 401 line must match (1 hit) and a /api/auth/login 200 line must not (0 hits). Catches a regression at PR time instead of in production. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 13:03:04 +02:00
Marcel	83565c6bb5	docs(ci): document workflow operational assumptions The two deploy workflows make two non-obvious assumptions that future maintainers should not have to rediscover by reading the diff: 1. Single-tenant self-hosted runner — the .env.* file lands on disk during the deploy and is cleaned up unconditionally. Multi-tenant usage would require switching to stdin-piped env input. 2. Host docker layer cache is authoritative — there is no actions/cache directive; a host-level `docker system prune` will cold-start the next build. Both notes added as block comments at the top of each workflow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 12:06:48 +02:00
Marcel	a91a3e1f61	feat(ci): smoke test production deploy after up --wait Mirrors the nightly.yml smoke step against archiv.raddatz.cloud. Catches the same three failure modes (Caddy not reloaded, DNS missing, HSTS dropped, /actuator block bypassed) on the prod path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 12:05:41 +02:00
Marcel	c523721ce8	feat(ci): smoke test staging deploy after up --wait Healthchecks prove containers are healthy on the docker network; they do not prove the public URL is reachable, HSTS still fires, or /actuator is still blocked at the edge. Add a post-deploy smoke step to nightly.yml that: 1. GETs https://staging.raddatz.cloud/login (frontend reachable) 2. asserts the response includes the Strict-Transport-Security header 3. asserts /actuator/health returns 404 (defense-in-depth verified) Failure aborts the workflow before the env-file cleanup step. The cleanup step still runs because it is `if: always()`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 12:05:00 +02:00
Marcel	334b507476	feat(ci): add release production deploy workflow Fires on `v*` tag push. Tags the built images with the git tag so rollbacks are a one-liner (TAG=<previous> docker compose ... up -d). `up -d --wait` blocks until every service healthcheck reports healthy; a bad release fails the workflow rather than crash-looping silently. The .env.production file containing all Gitea secrets is removed in `if: always()` after the deploy step. Refs #497. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 21:56:37 +02:00
Marcel	59349dfe93	feat(ci): add nightly staging deploy workflow Runs daily at 02:00 (and on workflow_dispatch). Builds the prod compose stack with BuildKit, writes a transient .env.staging from Gitea secrets, then `docker compose up -d --wait` so the job fails loudly if any service's healthcheck never reports healthy. The --profile staging flag starts the mailpit catcher in place of a real SMTP relay; no production SMTP credentials touch the staging environment. The .env.staging file is cleaned up in `if: always()` to avoid leaving secrets in the runner workspace between runs. Refs #497. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 21:55:41 +02:00
Marcel	eccecf35e3	ci: add combined coverage gate to unit-tests job Runs test:coverage (server v8 + client Istanbul) after tests, hard-gates on both 80% branch thresholds, and uploads coverage/ as an artifact. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 17:51:10 +02:00
Marcel	a158048f45	fix(ci): expose Docker socket env vars for Testcontainers in backend job DOCKER_HOST makes the socket explicit rather than relying on runner config propagation; TESTCONTAINERS_RYUK_DISABLED=true avoids Ryuk watchdog start failures in nested container environments. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 16:03:36 +02:00
Marcel	ac999066dd	fix(ci): add TZ=Europe/Berlin to frontend test step date-buckets.spec.ts midnight tests pass timezone-aware dates (+02:00) which are 22:00 UTC the prior day; setHours(0,0,0,0) uses local TZ. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 16:03:36 +02:00
Marcel	e2d74ff880	ci: add npm run build step to unit-tests job The prerender fix only prevents regression if the build is actually run in CI. Without this gate, a future prerendered route that becomes unreachable behind auth would fail silently until someone runs the build manually. Fits after the test step in the existing unit-tests job — no new job needed since node_modules is already cached for the Playwright container. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 14:25:32 +02:00
Marcel	410b91e2a5	chore: upgrade upload-artifact action from v3 to v4 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 14:54:29 +02:00
Marcel	84c09e41ef	test(ocr): add /train-sender auth tests and run sender registry tests in CI Add 503/403 auth tests for the /train-sender endpoint, matching the pattern already used for /train and /segtrain. Also surface test_sender_registry.py in CI (it needs no ML stack) and add pytest-asyncio to the install step. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 21:14:27 +02:00
Marcel	68b57918eb	ci: add ocr-tests job for spell_check and confidence unit tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 16:55:07 +02:00
Marcel	4b8e0637ce	fix(ci): pin DOCKER_API_VERSION=1.43 for Testcontainers on NAS runner Testcontainers 2.0.2 (via Spring Boot 4.0) negotiates Docker API 1.44, but the NAS runner has Docker Engine 24.x which caps at 1.43. Forcing the client version down unblocks tests until Docker is upgraded on the NAS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 12:28:57 +02:00
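
Note on commit 9c26c00eee: the message says the gateway is read from /proc/net/route and decoded from little-endian hex to dotted-decimal with awk. The workflow's actual awk script is not shown in this log, so the following is only a minimal sketch of the same decoding in Python. The function names and the sample table are hypothetical; the column layout (Iface, Destination, Gateway, ...) and the all-zero destination marking the default route are the standard Linux format. A None result corresponds to the empty-HOST_IP case that commit f1032865f3 guards against.

```python
import socket
import struct
from typing import Optional


def decode_gateway(hex_field: str) -> str:
    """Convert a little-endian hex IPv4 field from /proc/net/route to dotted-decimal."""
    # The kernel prints the address as a 32-bit integer in host byte order,
    # which is little-endian on x86/ARM; packing as "<L" restores wire order.
    return socket.inet_ntoa(struct.pack("<L", int(hex_field, 16)))


def default_gateway(route_table: str) -> Optional[str]:
    """Return the default-route gateway from /proc/net/route text, or None if absent."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # The default route is the entry whose Destination is all zeros.
        if len(fields) >= 3 and fields[1] == "00000000":
            return decode_gateway(fields[2])
    return None


# Hypothetical sample resembling a Docker bridge network; not taken from the runner.
SAMPLE = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t010011AC\t0003\t0\t0\t0\t00000000\t0\t0\t0\n"
    "eth0\t000011AC\t00000000\t0001\t0\t0\t0\t0000FFFF\t0\t0\t0\n"
)

if __name__ == "__main__":
    print(default_gateway(SAMPLE))
```

On a real host the input would be the contents of /proc/net/route; a caller should treat None as a detection failure and exit with a clear message rather than pass an empty address on to curl --resolve.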

66 Commits