The smoke step previously curled the public hostname unconditionally,
which routes the runner's request via DNS → router → back into the same
host. Many SOHO routers do not implement hairpin NAT (or do so only after
a firmware update), so the deploy may pass on day one and silently fail
on day 90.
--resolve "<host>:443:127.0.0.1" pins the hostname to the runner's
loopback while keeping SNI on the public name (so the cert validates
correctly and the Caddy vhost block matches). The smoke test now
verifies that the Caddy-on-the-same-host is serving the right
hostname end-to-end, with no router dependency.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Without --pull, the host's Docker layer cache wins: if a CVE drops in
node:20.19.0-alpine3.21 / postgres:16-alpine and the vendor re-publishes
the same tag, the runner keeps serving the cached layer until the cache
is manually cleared — a silent supply-chain blind spot.
Adding --pull to both `compose build` invocations costs a single
re-pull per run and lifts the base-image patch lag from "next host
prune" to "next nightly".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The filter only watched /api/auth/login 401 — leaving the forgot-password
endpoint open to:
- email enumeration (slow brute-force probing which addresses exist)
- password-reset brute-force against accounts whose addresses leak
Widens the failregex to /api/auth/(login|forgot-password) and adds 429 to
the status alternation so a future in-app rate-limiter response is also
caught by the jail (defense in depth).
CI assertions extended to cover both new dimensions plus a negative case
on an unrelated 401 endpoint (/api/documents) — pins that the widening
did not over-match.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The create-buckets service in docker-compose.prod.yml runs on every
`docker compose up` (one-shot, restart=no). A re-deploy that fails
because the user/bucket/policy already exists would block the whole
nightly/release pipeline — and the only way to find out today is to
run a second deploy.
This job runs the bootstrap twice against a throwaway minio stack and
asserts both invocations exit 0. Caught at PR time, not at the third
nightly deploy at 02:00.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces MinIO's built-in `readwrite` policy (which grants s3:* on
arn:aws:s3:::* — every bucket present and future) with a bucket-scoped
custom policy `archiv-app-policy`:
- s3:GetObject / s3:PutObject / s3:DeleteObject on familienarchiv/*
- s3:ListBucket / s3:GetBucketLocation on familienarchiv
The previous configuration silently regressed the least-privilege guarantee
that the service-account separation was supposed to provide: a future
second bucket (logs, backups, mc-mirror staging) would have been
read/write/delete-accessible to a compromised backend.
While at it, two follow-on fixes:
1. Extract the entrypoint to infra/minio/bootstrap.sh. The previous
inline `/bin/sh -c "..."` was already at the YAML-escaping ceiling;
adding the policy-JSON heredoc would have made it unreadable.
2. Replace the `| grep -q readwrite || exit 1` fatal-check with a
POSIX `case` substring match. The minio/mc image ships coreutils +
bash but NOT grep/awk/sed — the original check was a no-op that
ALWAYS exited 1 (verified locally). The new check passes on the
first invocation and on every subsequent re-deploy.
Idempotency verified locally: two consecutive `docker compose run --rm
create-buckets` invocations both exit 0 with the user bound to the
new policy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Caddy 2.x emits JSON access logs; the failregex in
infra/fail2ban/filter.d/familienarchiv-auth.conf depends on the
"remote_ip" → "uri" → "status" key order being stable. A future Caddy
upgrade that reorders fields would break the jail silently (regex no
longer matches → fail2ban returns 0 hits → host stops banning
brute-force, discovered only at the next incident).
This job pins the contract: a sample /api/auth/login 401 line must
match (1 hit) and a /api/auth/login 200 line must not (0 hits).
Catches a regression at PR time instead of in production.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drops @SpringBootTest + PostgresContainerConfig + @MockitoBean S3Client in
favour of Spring's Binder API against application.yaml. The new test binds
the property into the typed ServerProperties.ForwardHeadersStrategy enum,
so typos (`nativ`, `Native`, `framework `) and future enum renames fail
the build with BindException — addresses the silent-coercion concern that
the YAML-string assertion missed.
Verified the test goes red on a typo (BindException: Failed to convert
"nativ" → ForwardHeadersStrategy) and green on `native`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Updates DEPLOYMENT.md to match the infra changes in this PR:
§1 OCR memory — point operators at the new OCR_MEM_LIMIT env var instead
of telling them to edit "the prod overlay".
§2 OCR env vars — add OCR_MEM_LIMIT to the table.
§3.1 server setup — replace fail2ban prose with concrete `ln -sf`
commands referencing the committed jail/filter.
Document the single-tenant runner assumption near
the runner-registration step.
§3.4 first deploy — describe the new automated smoke test step.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The two deploy workflows make two non-obvious assumptions that future
maintainers should not have to rediscover by reading the diff:
1. Single-tenant self-hosted runner — the .env.* file lands on disk
during the deploy and is cleaned up unconditionally. Multi-tenant
usage would require switching to stdin-piped env input.
2. Host docker layer cache is authoritative — there is no
actions/cache directive; a host-level `docker system prune` will
cold-start the next build.
Both notes added as block comments at the top of each workflow.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors the nightly.yml smoke step against archiv.raddatz.cloud. Catches
the same three failure modes (Caddy not reloaded, DNS missing, HSTS
dropped, /actuator block bypassed) on the prod path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Healthchecks prove containers are healthy on the docker network; they
do not prove the public URL is reachable, HSTS still fires, or
/actuator is still blocked at the edge. Add a post-deploy smoke step
to nightly.yml that:
1. GETs https://staging.raddatz.cloud/login (frontend reachable)
2. asserts the response includes the Strict-Transport-Security header
3. asserts /actuator/health returns 404 (defense-in-depth verified)
Failure aborts the workflow before the env-file cleanup step. The
cleanup step still runs because it is `if: always()`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two files mirroring the on-host install layout:
infra/fail2ban/filter.d/familienarchiv-auth.conf
infra/fail2ban/jail.d/familienarchiv.conf
Filter parses the JSON access log emitted by Caddy (previous commit) and
matches 401 responses on /api/auth/login. Jail bans the offending IP for
30 min after 10 attempts in a 10-minute window.
Verified the failregex against four sample log lines via fail2ban-regex
in an alpine container:
- 2 brute-force 401 attempts → matched (ban)
- 1 successful login (POST /api/auth/login 200) → not matched
- 1 unrelated GET /login 200 → not matched
Date template "ts":{EPOCH} parses Caddy's Unix-epoch ts field.
The previous review iteration described this jail in DEPLOYMENT.md prose
only; committing it makes the security posture reproducible from a
fresh server build.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds an (access_log) snippet writing JSON-formatted access logs to
/var/log/caddy/access.log with 10mb rolling and 14-file retention. Both
archive vhosts (archiv.raddatz.cloud and staging.raddatz.cloud) import
it; the git vhost is intentionally excluded.
This is the prerequisite for the fail2ban jail committed in the next
commit — fail2ban tails this file looking for 401 responses on
/api/auth/login to defend against credential stuffing.
Validated with `caddy validate` against caddy:2.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hardcoded `mem_limit: 12g` only works on CX42+ (16 GB) hosts; a CX32 (8
GB) cannot honour it. Make both mem_limit and memswap_limit driven by
the OCR_MEM_LIMIT env var, defaulting to 12g so prod deploys on a CX42
keep current behaviour. Operators on smaller hosts override to 6g.
Verified compose interpolation produces 12 GiB by default and 6 GiB when
OCR_MEM_LIMIT=6g.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous `mc admin policy attach … || true` swallowed every failure
mode: a renamed policy, an mc CLI signature change, or a transient MinIO
error would leave the bootstrap container exiting zero with the service
account possessing no permissions, and the backend would then fail every
S3 call after a "successful" deploy.
Replace the silent fallback with verify-after: keep the attach (idempotent
in current mc, redundant in older versions), then assert via `mc admin
user info` that `readwrite` ends up on archiv-app. A genuine attach
failure now exits 1 and blocks the stack from starting.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Removes the implicit `:latest` from the create-buckets bootstrap
container. Pins to RELEASE.2025-08-13T08-35-41Z so a breaking change in
mc CLI syntax cannot silently brick deploys.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Removes `:latest` from the mailpit service; pins to v1.29.7 so staging
deploys are reproducible. Renovate keeps the tag current.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- docs/infrastructure/production-compose.md: trimmed to VPS sizing,
cost breakdown, and Hetzner ecosystem rationale. The inline
compose spec (overlay + Hetzner OBS in prod) is retired; the
live file is now docker-compose.prod.yml at the repo root and
the Caddyfile lives at infra/caddy/Caddyfile. Observability
stack is called out as a not-yet-deployed gap (issue #498).
- docs/architecture/c4/l2-containers.puml: adds Caddy as a named
reverse-proxy container with the two port paths and notes the
archiv-app service-account split on MinIO access.
Refs #497.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Brings DEPLOYMENT.md in line with the production deployment landed
in #497:
- Topology diagram: frontend port 3000 (Node adapter), 127.0.0.1
binding, project-name isolation between prod and staging
- Caddyfile now lives in-tree at infra/caddy/Caddyfile (symlinked
onto the server)
- Dev vs prod table: documents the new deploy method (workflows +
--wait) and the prod-compose specific differences
- Env vars: adds MINIO_APP_PASSWORD; notes that prod compose
hardcodes the MinIO root user and the bucket name
- Bootstrap section: server hardening, fail2ban, Tailscale, the 16
Gitea secrets, and the workflow_dispatch first-deploy step
- Admin password warning: first deploy locks the password, secret
rotation after that point has no effect
- Rollback: TAG= override + docker compose up -d --wait
Refs #497.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fires on `v*` tag push. Tags the built images with the git tag so
rollbacks are a one-liner (TAG=<previous> docker compose ... up -d).
`up -d --wait` blocks until every service healthcheck reports
healthy; a bad release fails the workflow rather than crash-looping
silently. The .env.production file containing all Gitea secrets is
removed in `if: always()` after the deploy step.
Refs #497.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Runs daily at 02:00 (and on workflow_dispatch). Builds the prod
compose stack with BuildKit, writes a transient .env.staging from
Gitea secrets, then `docker compose up -d --wait` so the job fails
loudly if any service's healthcheck never reports healthy.
The --profile staging flag starts the mailpit catcher in place of
a real SMTP relay; no production SMTP credentials touch the staging
environment.
The .env.staging file is cleaned up in `if: always()` to avoid
leaving secrets in the runner workspace between runs.
Refs #497.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reverse proxy for the Familienarchiv host, validated against Caddy 2.
Includes both vhosts (production and staging), the Gitea vhost, and:
- HSTS, X-Content-Type-Options, Referrer-Policy headers on every site
- "-Server" header strip to hide the Caddy version
- /actuator/* responds 404 on both archive vhosts (defense in depth
for Spring Boot's management endpoints)
X-Frame-Options is intentionally not set in Caddy: Spring Security
configures frame-options SAMEORIGIN for the in-app PDF preview
iframe; a DENY header here would conflict.
Refs #497.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Standalone production compose file (not an overlay) that runs the
full stack on a single host. Environment isolation is achieved via
the docker compose project name (-p archiv-production / -p
archiv-staging) so the two environments cohabit cleanly.
Key choices, resolved in #497 review:
- Named volumes for persistent data (no host bind mounts)
- MinIO pinned to a specific RELEASE tag (no :latest)
- Backend uses MinIO service account (S3_ACCESS_KEY=archiv-app),
not root credentials; create-buckets bootstraps the account
- Mailpit lives under profiles: [staging] so no real SMTP secret
is ever wired into the staging deploy
- OCR mem_limit 12g + healthcheck (start_period 120s) copied from
the dev compose so docker compose up -d --wait works in CI
- Backend admin credentials wired through APP_ADMIN_USERNAME /
APP_ADMIN_PASSWORD; first deploy locks the password in
permanently because UserDataInitializer is idempotent on email
- All host ports bound to 127.0.0.1; Caddy fronts external traffic
Refs #497.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Multi-stage Dockerfile with three targets:
- development (dev server on :5173, used by docker-compose.yml)
- build (runs npm run build, produces SvelteKit Node-adapter output)
- production (self-contained node build server on :3000)
Node base pinned to node:20.19.0-alpine3.21 for reproducible CI
builds (Renovate will keep it current).
docker-compose.yml now specifies target: development for the
frontend so dev continues to use the dev-server stage. Without
this, Docker would default to the last stage (production).
Refs #497.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The route exports prerender = true and is listed in
svelte.config.js's prerender.entries. Until now the auth hook
redirected unauthenticated requests to /login, so the prerender
crawler hit a 302 and the build failed with "marked as prerenderable,
but were not prerendered".
Adding the path to PUBLIC_PATHS lets the crawler render the static
HTML; consistent with the route's intent as a public help page.
Surfaced by #497 (the production Docker build is the first place
npm run build runs in CI).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds server.forward-headers-strategy: native so that Jetty honours
X-Forwarded-{Proto,For,Host} from Caddy. Without this, getScheme(),
redirect URLs, and Spring Session "Secure" cookies reflect the
internal http hop instead of the original https client request.
Refs #497.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add lines, functions, and statements at 80% alongside branches in both
the server (vite.config.ts) and client (vitest.client-coverage.config.ts)
coverage gates — branch-only thresholds allow misleadingly sparse tests to
pass the gate.
Also adds a plugin-sync comment to vitest.client-coverage.config.ts listing
the four Vite plugins mirrored from vite.config.ts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Runs test:coverage (server v8 + client Istanbul) after tests, hard-gates
on both 80% branch thresholds, and uploads coverage/ as an artifact.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sequential && prevents the ENOTEMPTY race on coverage/.tmp. Server
uses v8 via --project=server; client uses the standalone Istanbul config.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Vitest 4 silently ignores per-project coverage overrides in test.projects,
so a standalone vitest.client-coverage.config.ts provides the root-level
Istanbul coverage block that Vitest actually honours.
Root vite.config.ts retains the v8 coverage block (reportsDirectory:
coverage/server) for the server project. The client config writes to
coverage/client and instruments all .svelte and .svelte.ts files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Istanbul instruments code at transpile time and works inside Chromium's
sandbox; v8 coverage is silently a no-op in browser mode.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- searchDocuments_relevance_returns_empty_when_offset_exceeds_maxInt:
proves the long→int guard fires and findFtsPageRaw is never called
- searchDocuments_relevance_handles_string_uuid_from_jdbc_driver:
exercises the toFtsPage String fallback branch for JDBC drivers that
return UUID columns as String instead of java.util.UUID
Addresses Sara's review concerns on PR #488.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract isPureTextRelevance() private static method to replace the
7-clause inline boolean in searchDocuments
- Guard long→int cast in relevanceSortedPageFromSql to prevent silent
overflow at page ≥43M (CWE-190)
- resolvePersonName now uses the typed API client (createApiClient)
instead of raw fetch, aligning with project conventions
- Update DocumentServiceTest stubs to match new FTS path (findFtsPageRaw
+ findAllById instead of findAllMatchingIdsByFts)
- Rewrite page.server.spec.ts person-name tests to mock via path-based
API dispatch, matching the new api.GET call site
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- DocumentFtsPagedIntegrationTest: Testcontainers repo-level tests for
findFtsPageRaw (page size, window total, last page, no matches, stopword)
- DocumentServiceSortTest: rewritten to stub findFtsPageRaw + findAllById
for the pure-text RELEVANCE path; verifies filter-active path stays in-memory
- DocumentServiceTest: update two enrichment tests to use new SQL-path stubs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pure-text RELEVANCE queries now use findFtsPageRaw (CTE + COUNT(*) OVER())
instead of loading all matching IDs into memory and sorting in-process.
Non-text paths (filters active, DATE sort) still use the in-memory path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Flyway V62 adds idx_documents_sender_id and idx_comments_author_id to speed up
FK-driven queries on the persons page and briefwechsel view. Closes#470.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add min-h-[44px] min-w-[44px] to all five PDF viewer buttons (prev,
next, zoom in, zoom out, annotation toggle) and widen icon-only
padding from p-1 to p-2. Adds aria-pressed to the annotation toggle
for correct toggle semantics (WCAG 2.2 §2.5.8 + ARIA 1.2).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the NAS runner configuration needed for Testcontainers.
Must be deployed to the runner host alongside the act_runner binary.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DOCKER_HOST makes the socket explicit rather than relying on runner
config propagation; TESTCONTAINERS_RYUK_DISABLED=true avoids Ryuk
watchdog start failures in nested container environments.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
date-buckets.spec.ts midnight tests pass timezone-aware dates (+02:00)
which are 22:00 UTC the prior day; setHours(0,0,0,0) uses local TZ.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Math.abs(Integer.MIN_VALUE) overflows back to Integer.MIN_VALUE (negative),
making the old pattern unsafe for any palette size that doesn't evenly divide
MIN_VALUE. Math.floorMod always returns a non-negative residue in [0, n-1],
eliminating the overflow edge case entirely.
Fixes SpotBugs RV_ABSOLUTE_VALUE_OF_HASHCODE (priority 1, CORRECTNESS).
Closes#471
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
getBlockComments was missing documentId; replyToBlockComment was missing
blockId. Spring silently ignored undeclared path variables — the segments
were parsed but never bound. Now both parameters are explicitly declared so
Spring rejects non-UUID values with 400.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Null dto.permissions now produces an empty HashSet instead of propagating null
into the @ElementCollection — prevents a silent NPE after V64 adds NOT NULL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
V63 deduplicates any phantom (group_id, permission) rows accumulated since
the initial schema. V64 sets NOT NULL on permission and adds pk_group_permissions.
V65 renames uq_tbmp_block_person to pk_tbmp for naming-convention consistency.
Integration tests confirm each constraint via pg_catalog.pg_constraint. Closes#469 (partial).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>