UploadZone is the canonical browser-test template referenced from issue #496
implementation guidance. Adding afterEach(cleanup) makes it match the
TranscriptionPanelHeader pattern and prevents cross-test DOM leakage as more
tests are added in this branch.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per Felix's review on issue #496, tests should query observable behaviour via
ARIA roles, not test-only data-testid attributes. Replaces every
'document.querySelector([data-testid=...])' with 'page.getByRole(...)'.
The disabled-button click test uses force: true so Playwright bypasses its
enabled-check — the behaviour under test is precisely that the click is
ignored.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes scaffolding pages from initial Paraglide setup that were never
navigated to in production. Shrinks the measured coverage surface and
removes dead code from the production bundle. CLAUDE.md route tables
updated to drop the demo/ entry.
Refs #496.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sara flagged that a future "compose cleanup" PR could silently drop the
backend volumes block and CI would happily pass while mass import on
staging silently broke. Adds a pre-build step that renders the staging
compose config and fails the deploy if `target: /import` or
`read_only: true` is missing.
Local verification of the guard:
- Volumes block removed → `grep -q 'target: /import'` exits 1 → step fails
- Volumes block present → both greps match → step passes
Addresses Sara's review on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors the staging change. The host directory does not yet exist on
the production server — first production release that consumes this
will create an empty bind source via Docker's auto-create behaviour;
mass import then reports "no spreadsheet found" until an operator
pre-stages a payload there.
Addresses Tobias's review on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The compose file now requires IMPORT_HOST_DIR or refuses to start
(#526). Without this line the next nightly deploy would fail with a
clear interpolation error, but it should not fail — the staging
import payload already lives at this host path (rsync'd in #526).
Addresses Tobias's review on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DEPLOYMENT.md line 81 declares any compose env var missing from §2 a
blocking review comment. IMPORT_HOST_DIR (added on this branch) was
unmentioned. Adds the row and rewrites §6.4 so the staging/prod operator
workflow (rsync host → set env → trigger import) is in the runbook,
not just buried in compose comments.
Addresses review feedback from Markus and Tobias on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tobias and Markus both flagged that a shared default (/srv/familienarchiv/
import) invites silent collision when staging and prod cohabit one host.
Switch to ${IMPORT_HOST_DIR:?...} so compose refuses to start without an
explicit per-env path — collision becomes structurally impossible.
The error message points operators at docs/DEPLOYMENT.md so the recovery
step is one click away. IMPORT_HOST_DIR moves from "Optional" to the
main required-env-vars block in the header.
Addresses review feedback from Markus, Tobias, and Nora on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The hardcoded `static final String IMPORT_DIR = "/import"` was the only
non-`@Value` configurable input in MassImportService — every column
index next to it is wired through `app.import.col.*`. Lifts the
contract from infrastructure (compose bind mount) into application
config (`app.import.dir`), with `/import` as the default so the existing
bind-mount path keeps working.
Addresses review feedback from Markus and Felix on #526.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`MassImportService` reads the ODS spreadsheet and referenced PDFs from a
hardcoded `/import` path inside the backend container. Dev compose
already bind-mounts `./import:/import`, but the prod compose had no
equivalent, so `POST /api/admin/import` would always fail on staging/prod
with "no spreadsheet found".
Mount strategy:
- Source path is env-driven (`IMPORT_HOST_DIR`), defaulting to
`/srv/familienarchiv/import` so the host path is stable across CI
deploys (the compose working dir is recreated each run, so `./import`
would not persist).
- Read-only — `MassImportService` only reads (`Files.list` /
`Files.walk`), never writes. Read-only mount makes that contract
explicit and prevents the backend container from mutating the source
PDFs.
- Empty / missing path is harmless: the import API just returns the
existing "no spreadsheet found" error rather than crashing the
container.
To use on staging: rsync the import folder to
`/srv/familienarchiv-staging/import/` on the host, set
`IMPORT_HOST_DIR=/srv/familienarchiv-staging/import` in `.env.staging`,
redeploy, trigger import from `/admin/system`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The new alpine-based frontend production image (`node:20.19.0-alpine3.21`)
resolves `localhost` only to `::1` in /etc/hosts. SvelteKit's adapter-node
binds to 0.0.0.0 (IPv4 only), so `wget http://localhost:3000/login` from
inside the container connects to ::1 and gets "Connection refused" every
15s. Container goes unhealthy → `docker compose up --wait` fails → nightly
staging deploy fails. The app itself is fine.
Switching to 127.0.0.1 bypasses /etc/hosts and matches what Node actually
listens on.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- frontend/login: derive cookie `secure` flag from request URL protocol.
Pre-PR the cookie was only read by SSR so the flag didn't matter; now
the cookie IS the API credential and must be Secure on HTTPS or it
leaks a 24h Basic token on plaintext networks. Dev runs over HTTP and
would silently lose the cookie if we hardcoded `secure: true`, so the
flag follows `event.url.protocol === 'https:'`.
- SecurityConfig: rewrite the CSRF-disabled comment. The old
"browsers block cross-origin custom headers" justification no longer
holds once /api/* is authenticated via the cookie. Make the
load-bearing dependencies explicit: SameSite=strict on the auth_token
cookie + Spring's default CORS rejection.
- AuthTokenCookieFilter:
- Scope to /api/* only. /actuator/health and similar must not be
cookie-authenticated.
- Refuse malformed percent-encoding (URLDecoder throws); forward the
request without a promoted Authorization rather than crash.
- Use isBlank() instead of isEmpty() per Nora.
- Javadoc warning: getHeaderNames/getHeaders exposes the Basic
credential; any future header-iterating logger must scrub
Authorization before logging.
- Tests: add `passes_through_unchanged_when_request_is_outside_api_scope`
(/actuator/health with cookie should NOT be wrapped) and
`passes_through_unchanged_when_cookie_value_is_malformed_percent_encoding`.
Tighten the explicit-header test to verify same-instance forwarding
rather than just header equality.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#520.
The login action stores `Basic <base64>` in an HttpOnly `auth_token`
cookie. SSR fetches from hooks.server.ts explicitly set the
Authorization header. Vite's dev proxy does the same on every
/api/* request. Caddy in production does NOT. So browser-side
fetch() and EventSource() calls reach the backend without auth,
get 401 + WWW-Authenticate: Basic, and the browser pops a native
auth dialog over the SPA.
Add AuthTokenCookieFilter (Ordered.HIGHEST_PRECEDENCE, before any
Spring Security filter) that promotes the cookie to a request
header when no explicit Authorization is present. URL-decodes the
cookie value because SvelteKit URL-encodes spaces ("Basic " ->
"Basic%20") when serializing the cookie. Works the same for REST,
SSE (/api/notifications/stream, /api/ocr/jobs/.../progress), and
any other browser-direct backend call.
5 tests in AuthTokenCookieFilterTest cover: URL-decoded promotion,
explicit-Authorization-wins precedence, no-cookies pass-through,
absent-auth-token pass-through, empty-value pass-through.
Also: add `@ActiveProfiles("test")` to ThumbnailServiceIntegrationTest,
the one remaining @SpringBootTest in the suite that wasn't annotated.
After #516 made UserDataInitializer fail-closed outside dev/test/e2e,
this test's context load was throwing. Restores green main.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#518.
UserDataInitializer.initAdminUser was doing groupRepository.save(adminGroup)
unconditionally. If a previous boot had seeded the group but failed
before creating the admin user (or if the operator deleted just the
admin row to retry with a corrected APP_ADMIN_USERNAME), the next
seed attempt violated user_groups_name_key and aborted the context.
Switch to the same findByName(...).orElseGet(...) pattern initE2EData
already uses for the "Leser" group.
Tests in AdminSeedFailClosedTest:
- reuses_existing_Administrators_group_when_seeding_a_new_admin
- creates_Administrators_group_when_seeding_admin_on_a_fresh_database
Plus updated existing tests to stub groupRepository.save now that the
seed path also exercises it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#512.
The previous `(block_actuator)` snippet emitted `respond @actuator 404`
at the top level of each archive vhost. But each vhost also has a
catch-all `handle { reverse_proxy ... }` that matches /actuator/*
too. Caddy's `handle` blocks are mutually exclusive — once one matches,
the request never reaches a top-level `respond`. So /actuator/health
was being proxied to the backend, which 302s to /login.
Wrap the actuator response in its own `handle /actuator/*` block.
Caddy sorts `handle` blocks by path specificity, so /actuator/* wins
over the catch-all and the 404 is actually returned.
Verified with `caddy validate` against the caddy:2 image.
Also unblocks the nightly.yml smoke test's `/actuator/health → 404`
assertion, which has been failing since the first staging deploy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses Nora's review concern on #513/#516.
The previous fix only made env-vars take effect — it did NOT close the
fail-open default path. If an operator forgets APP_ADMIN_USERNAME /
APP_ADMIN_PASSWORD on first prod boot, the seeded admin is the
well-known `admin@familienarchiv.local` / `admin123` and is permanently
locked (UserDataInitializer only seeds when the row is missing).
Refuse to seed outside dev/test/e2e profiles when either credential
matches the documented default. The startup fails fast with a clear
message pointing at the env-var names and the permanence trap.
Also adds Markus/Felix/Sara's "pin the Java side" coverage: a
reflection test on the @Value placeholder catches a future rename
of `${app.admin.email:...}` back to `${app.admin.username:...}`,
which would otherwise pass the yaml-side test but silently break
the binding.
Tests:
- AdminSeedFailClosedTest pins fail-closed for non-local profiles
and verifies the dev/test/e2e bypass.
- AdminSeedPropertyKeyTest now also asserts the @Value placeholder
string on UserDataInitializer.adminEmail/adminPassword.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#513.
UserDataInitializer reads `@Value("${app.admin.email:...}")` but
application.yaml mapped APP_ADMIN_USERNAME to `app.admin.username`.
The keys never connected — env vars APP_ADMIN_USERNAME and
APP_ADMIN_PASSWORD were silently ignored and the admin user got
seeded with the hardcoded defaults admin@familyarchive.local /
admin123.
For production this is HIGH severity: DEPLOYMENT.md §3.5 documents
the admin password as permanently locked on first deploy. The
bug locked the lock-in to dev defaults, not to whatever an operator
set in PROD_APP_ADMIN_PASSWORD.
Rename yaml key from `username:` to `email:` so the Spring property
`app.admin.email` actually exists. Keep env-var name
APP_ADMIN_USERNAME (matches the already-set Gitea secrets and
DEPLOYMENT.md §3.3). Default value updated to an email-shape.
Added AdminSeedPropertyKeyTest (Binder pattern, no Spring context):
verifies both `app.admin.email` and `app.admin.password` resolve
from the yaml. Confirmed red without the fix, green with it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses Sara's review request on #515.
Without this gate, a future regression that turns prerender.crawl
back on (or adds a new prerender entry whose nav links into
protected routes) would silently bake /, /documents, /persons etc.
to "redirect-to-login" HTML and re-introduce #514.
Verified the script catches the current broken build state:
$ find build/prerendered ... -not -path 'hilfe/*' ...
build/prerendered/{index,documents,persons,geschichten,stammbaum}.html
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#514.
The build was prerendering protected routes via crawl from
/hilfe/transkription. Their load functions throw redirect('/login')
during the build (no auth cookie), so SvelteKit captured the redirect
as static HTML and shipped /app/build/prerendered/{index,documents,
persons,geschichten,stammbaum}.html with a `location.href=/login`
script. In production these files are served BEFORE hooks.server.ts
runs, so an authenticated user with a valid cookie is still served
the baked bounce-back page.
Setting `crawl: false` keeps the explicit /hilfe/transkription entry
prerendered (needed for the public help page) without dragging the
nav targets along with it.
Verified locally: build now emits only `hilfe/transkription.html`
under build/prerendered/, no index.html or documents.html etc.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#510.
`docker compose up -d --wait` exits 1 even when every service is
healthy because the one-shot `create-buckets` exits 0 and --wait
expects "running". The whole stack came up fine on staging, but the
workflow gate failed before the smoke step could run.
Two changes:
1. create-buckets: `restart: "no"` declares one-shot intent.
2. backend.depends_on: add `create-buckets: service_completed_successfully`.
With both, compose v2.20+ understands create-buckets is a one-shot
that must complete successfully, and --wait treats exited(0) as the
target state. Backend startup now also correctly gates on bucket
bootstrap (closes a latent race where backend could start before
the archiv-app policy was bound).
Verified `docker compose config --quiet` parses and the resolved
config shows the right dependency graph.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#508.
Our gitea-runner advertises labels ubuntu-latest / ubuntu-24.04 /
ubuntu-22.04. `runs-on: self-hosted` never matches → dispatched
deploy jobs sit in the queue forever. The runner is still
genuinely self-hosted (DooD socket, joined to gitea_gitea net,
single-tenant per ADR-011) — the `self-hosted` token was just an
unconfirmed assumption about the label name.
Unblocks #497 / #499 first deploy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#506.
Under Docker-out-of-Docker (the production Gitea Actions runner), the
host daemon resolves the relative bind-mount path against the host
filesystem — not the runner container's /workspace. The script is not
there, so Docker creates an empty directory at /bootstrap.sh and the
entrypoint fails with `/bootstrap.sh: Is a directory`.
Bake the script into a tiny derived image (infra/minio/Dockerfile) so
there is no runtime path resolution. Works in DooD, regular Docker,
and CI.
Unblocks the staging / production deploy pipelines from #497 / #499
and turns the Compose Bucket Idempotency CI job green.
Verified locally:
- `docker compose ... config --quiet` parses
- `docker compose ... build create-buckets` builds the image
- bootstrap.sh exists as a +x file at /bootstrap.sh inside the image
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes#503.
Debian's fail2ban package ships defaults-debian.conf with
`[DEFAULT] backend = systemd`. Without an explicit override, our
familienarchiv-auth jail inherits the systemd backend at runtime,
reads from journald, and never inspects /var/log/caddy/access.log.
A live login brute-force would not be banned.
Add `backend = polling` to the jail and a CI step that links the jail
into /etc/fail2ban/ and asserts `fail2ban-client -d` resolves it to
the polling backend, not the inherited systemd backend.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`archiv-app` is the bucket-scoped MinIO service account introduced
in PR #499 alongside the production deploy pipeline. Until now the
term only appeared in `infra/minio/bootstrap.sh` and the prod compose
file; a reader encountering `S3_ACCESS_KEY: archiv-app` had no
single-page reference distinguishing it from the MinIO root account.
Adds a new "Infrastructure Terms" section to docs/GLOSSARY.md so the
distinction (root account vs. application service account) and the
attached `archiv-app-policy` scope live in the canonical glossary
location. Cross-links to ADR-010 for the MinIO-stays-self-hosted
rationale. Addresses @elicit's round-2 recommendation on PR #499.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The docker network was the only `archive-*` identifier in either
compose file; everything else (user, db, bucket, service account,
project name) uses the `archiv-*` spelling. Reviewers' eyes stuttered
on it on the prod compose review (round 2 of PR #499 — Markus and
Tobi). Renamed in both prod and dev compose for consistency and
updated the single doc reference to the dev-project-prefixed
network name.
Operational note: applying this change to a running stack will
recreate the network on the next `docker compose up`; containers
restart, named volumes are unaffected.
`docker compose config --quiet` passes for both compose files and
for the staging profile. Sweep confirms zero `archive-net`
references remain in the tree.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The `if: always()` conditional on the env-file cleanup step in both
deploy workflows is what makes the ADR-011 single-tenant runner trust
model safe: secrets land on disk before each deploy and are wiped
unconditionally afterwards. A future workflow refactor that drops
`if: always()` would silently leave plaintext secrets on the runner
on any failed deploy.
The ADR documents this; the workflow file did not. Adds a prominent
inline comment so the next reader of the YAML sees the constraint
without having to cross-reference ADR-011. No behaviour change — both
workflows still parse. Addresses @nora's round-2 suggestion on PR
#499 — "linchpin of the ADR-011 trust model".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The mailpit service healthcheck previously assumed `wget` ships in
the axllent/mailpit image. That's true for v1.29.7 but is not part
of the image's contract — a future Alpine slim-down could drop wget
and silently disable the healthcheck. Switched to BusyBox `nc -z
localhost 8025`, which is a TCP-port open check with no dependency
beyond BusyBox itself.
Verified inside axllent/mailpit:v1.29.7 that `nc` is present
(/usr/bin/nc, BusyBox v1.37.0) and that the proposed command
returns 0 against an open port and non-zero against a closed one.
Compose still parses with `--profile staging`. Addresses @tobi's
round-2 suggestion on PR #499.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Production never sources PDFs from localhost or 127.0.0.1 — the OCR
service only reads from MinIO over the internal docker network. The
Python default (`minio,localhost,127.0.0.1`) was permissive on
purpose for local dev, but in production a future change to that
default — or a host-env override — would silently broaden the SSRF
surface. Pinning the env var explicitly here freezes the allowlist
to the one hostname production actually needs.
`docker compose config --quiet` and `--profile staging config
--quiet` both still pass. Verified the resolved config emits
`ALLOWED_PDF_HOSTS: minio`. Addresses @nora's round-2 suggestion on
PR #499 — "five characters of YAML, lifetime guarantee".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds `Permissions-Policy: camera=(), microphone=(), geolocation=()` to
the shared (security_headers) snippet, so both archiv vhosts and the
git vhost deny browser APIs the app does not use. Reduces blast radius
of an XSS landing in a privileged origin.
The deploy smoke steps in nightly.yml and release.yml gain a matching
assertion against the canonical header value, so a future Caddyfile
edit that drops or loosens the header (e.g. `camera=(self)`) fails the
deploy instead of regressing silently.
`caddy validate` against caddy:2 passes; both workflow YAMLs parse.
Addresses @nora's round-2 suggestion on PR #499 — "lower-impact than
CSP but nearly free".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the presence-only `grep -qi strict-transport-security` smoke
assertion in both nightly.yml and release.yml with a value-pinning
regex that requires `max-age=31536000`, `includeSubDomains`, and
`preload`. A future Caddyfile edit that drops any of those three
parts now fails the deploy smoke step instead of passing silently.
Verified locally that the new pattern matches the preload-eligible
value and rejects three degraded forms (short max-age, missing
includeSubDomains, missing preload). Addresses @sara's round-2 note
on PR #499 — "presence check, not value check".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The top-level deployment diagram lagged the C4 L2 diagram, which
correctly notes that SSE notifications are fronted by Caddy. The
mermaid showed Browser → Backend direct, which would only be true
if the backend port were exposed publicly (it is not — all docker
ports bind to 127.0.0.1).
Fixes the inconsistency Markus flagged on PR #499: the public
surface is Caddy and Caddy only.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the Caddy hop to seq-auth-flow.puml and surfaces the two
production-relevant header behaviours:
- Caddy terminates TLS and forwards X-Forwarded-Proto: https
- Spring Boot trusts this header (server.forward-headers-strategy:
native, ForwardedRequestCustomizer at the Jetty layer), so
request.getScheme() returns "https"
- The Set-Cookie response carries the Secure flag because the
observed scheme is https — without forward-headers-strategy this
would silently drop to plain http and the cookie would lose Secure
Closes the doc-currency gap flagged in the Markus review on PR #499:
"Auth flow change → docs/architecture/c4/seq-auth-flow.puml".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Records the operational assumption that nightly.yml and release.yml
bake in: the self-hosted runner is single-tenant, so writing secrets
to .env.staging / .env.production on disk and removing them via an
`if: always()` cleanup step is acceptable for v1.
Documents the three migration triggers (second repo on the runner,
untrusted PR execution, move to shared infrastructure) and the
one-step migration path (--env-file <(printf '%s' "$SECRET_BLOB"))
so the next operator does not silently break the trust assumption.
The in-comment notes at the top of both workflow files already point
at this ADR's content; this commit records the decision in the durable
location the doc-currency table demands.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Records the reversal of the earlier "migrate to Hetzner Object Storage"
direction in docs/infrastructure/production-compose.md. Documents the
cost/benefit (current 13 GB fits trivially on the VPS; OBS billing is
dominated by base fee at this size; migration is a three-env-var swap
plus `mc mirror`, no application rewrite cost).
Captures the four triggers that should re-open the decision (50 GB
threshold, healthcheck latency, VPS upgrade cost, backup runtime) so
the deferral does not become an indefinite punt.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Records the decision to make docker-compose.prod.yml a fully self-contained
file rather than an overlay over docker-compose.yml. Captures the cost
(env-var duplication across dev and prod files) and the benefit (single
file the reviewer can hold in their head, no Compose merge-rule
surprises, automatic project-name namespacing for cohabiting staging +
production on one host).
Surfaces the retirement of the earlier overlay narrative in
docs/infrastructure/production-compose.md so a future maintainer does
not reverse the choice out of ignorance.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The repo's renovate.json only configures TipTap grouping; Renovate is
not currently active against MinIO / mc / mailpit / Postgres / Node /
Caddy. The "Renovate keeps it current" comments were aspirational —
those tags will rot until Renovate is bootstrapped (tracked in a
follow-up issue).
The "Pinned mc release; Renovate keeps it current" comment is gone
already since the create-buckets entrypoint was extracted to a script
in the preceding MinIO-policy commit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The smoke step previously curled the public hostname unconditionally,
which routes the runner's request via DNS → router → back into the same
host. Many SOHO routers do not implement hairpin NAT (or do so only after
a firmware update), so the deploy may pass on day one and silently fail
on day 90.
--resolve "<host>:443:127.0.0.1" pins the hostname to the runner's
loopback while keeping SNI on the public name (so the cert validates
correctly and the Caddy vhost block matches). The smoke test now
verifies that the Caddy-on-the-same-host is serving the right
hostname end-to-end, with no router dependency.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Without --pull, the host's Docker layer cache wins: if a CVE drops in
node:20.19.0-alpine3.21 / postgres:16-alpine and the vendor re-publishes
the same tag, the runner keeps serving the cached layer until the cache
is manually cleared — a silent supply-chain blind spot.
Adding --pull to both `compose build` invocations costs a single
re-pull per run and lifts the base-image patch lag from "next host
prune" to "next nightly".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The filter only watched /api/auth/login 401 — leaving the forgot-password
endpoint open to:
- email enumeration (slow brute-force probing which addresses exist)
- password-reset brute-force against accounts whose addresses leak
Widens the failregex to /api/auth/(login|forgot-password) and adds 429 to
the status alternation so a future in-app rate-limiter response is also
caught by the jail (defense in depth).
CI assertions extended to cover both new dimensions plus a negative case
on an unrelated 401 endpoint (/api/documents) — pins that the widening
did not over-match.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The create-buckets service in docker-compose.prod.yml runs on every
`docker compose up` (one-shot, restart=no). A re-deploy that fails
because the user/bucket/policy already exists would block the whole
nightly/release pipeline — and the only way to find out today is to
run a second deploy.
This job runs the bootstrap twice against a throwaway minio stack and
asserts both invocations exit 0. Caught at PR time, not at the third
nightly deploy at 02:00.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces MinIO's built-in `readwrite` policy (which grants s3:* on
arn:aws:s3:::* — every bucket present and future) with a bucket-scoped
custom policy `archiv-app-policy`:
- s3:GetObject / s3:PutObject / s3:DeleteObject on familienarchiv/*
- s3:ListBucket / s3:GetBucketLocation on familienarchiv
The previous configuration silently regressed the least-privilege guarantee
that the service-account separation was supposed to provide: a future
second bucket (logs, backups, mc-mirror staging) would have been
read/write/delete-accessible to a compromised backend.
While at it, two follow-on fixes:
1. Extract the entrypoint to infra/minio/bootstrap.sh. The previous
inline `/bin/sh -c "..."` was already at the YAML-escaping ceiling;
adding the policy-JSON heredoc would have made it unreadable.
2. Replace the `| grep -q readwrite || exit 1` fatal-check with a
POSIX `case` substring match. The minio/mc image ships coreutils +
bash but NOT grep/awk/sed — the original check was a no-op that
ALWAYS exited 1 (verified locally). The new check passes on the
first invocation and on every subsequent re-deploy.
Idempotency verified locally: two consecutive `docker compose run --rm
create-buckets` invocations both exit 0 with the user bound to the
new policy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Caddy 2.x emits JSON access logs; the failregex in
infra/fail2ban/filter.d/familienarchiv-auth.conf depends on the
"remote_ip" → "uri" → "status" key order being stable. A future Caddy
upgrade that reorders fields would break the jail silently (regex no
longer matches → fail2ban returns 0 hits → host stops banning
brute-force, discovered only at the next incident).
This job pins the contract: a sample /api/auth/login 401 line must
match (1 hit) and a /api/auth/login 200 line must not (0 hits).
Catches a regression at PR time instead of in production.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drops @SpringBootTest + PostgresContainerConfig + @MockitoBean S3Client in
favour of Spring's Binder API against application.yaml. The new test binds
the property into the typed ServerProperties.ForwardHeadersStrategy enum,
so typos (`nativ`, `Native`, `framework `) and future enum renames fail
the build with BindException — addresses the silent-coercion concern that
the YAML-string assertion missed.
Verified the test goes red on a typo (BindException: Failed to convert
"nativ" → ForwardHeadersStrategy) and green on `native`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Updates DEPLOYMENT.md to match the infra changes in this PR:
§1 OCR memory — point operators at the new OCR_MEM_LIMIT env var instead
of telling them to edit "the prod overlay".
§2 OCR env vars — add OCR_MEM_LIMIT to the table.
§3.1 server setup — replace fail2ban prose with concrete `ln -sf`
commands referencing the committed jail/filter.
Document the single-tenant runner assumption near
the runner-registration step.
§3.4 first deploy — describe the new automated smoke test step.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The two deploy workflows make two non-obvious assumptions that future
maintainers should not have to rediscover by reading the diff:
1. Single-tenant self-hosted runner — the .env.* file lands on disk
during the deploy and is cleaned up unconditionally. Multi-tenant
usage would require switching to stdin-piped env input.
2. Host docker layer cache is authoritative — there is no
actions/cache directive; a host-level `docker system prune` will
cold-start the next build.
Both notes added as block comments at the top of each workflow.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors the nightly.yml smoke step against archiv.raddatz.cloud. Catches
the same three failure modes (Caddy not reloaded, DNS missing, HSTS
dropped, /actuator block bypassed) on the prod path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Healthchecks prove containers are healthy on the docker network; they
do not prove the public URL is reachable, HSTS still fires, or
/actuator is still blocked at the edge. Add a post-deploy smoke step
to nightly.yml that:
1. GETs https://staging.raddatz.cloud/login (frontend reachable)
2. asserts the response includes the Strict-Transport-Security header
3. asserts /actuator/health returns 404 (defense-in-depth verified)
Failure aborts the workflow before the env-file cleanup step. The
cleanup step still runs because it is `if: always()`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two files mirroring the on-host install layout:
infra/fail2ban/filter.d/familienarchiv-auth.conf
infra/fail2ban/jail.d/familienarchiv.conf
Filter parses the JSON access log emitted by Caddy (previous commit) and
matches 401 responses on /api/auth/login. Jail bans the offending IP for
30 min after 10 attempts in a 10-minute window.
Verified the failregex against four sample log lines via fail2ban-regex
in an alpine container:
- 2 brute-force 401 attempts → matched (ban)
- 1 successful login (POST /api/auth/login 200) → not matched
- 1 unrelated GET /login 200 → not matched
Date template "ts":{EPOCH} parses Caddy's Unix-epoch ts field.
The previous review iteration described this jail in DEPLOYMENT.md prose
only; committing it makes the security posture reproducible from a
fresh server build.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>