Compare commits

..

70 Commits

Author SHA1 Message Date
Marcel
f15ea031d1 ci(security): add Semgrep XXE rule and CI scan job
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m2s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m3s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Semgrep Security Scan (pull_request) Successful in 1m11s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Add .semgrep/security.yml with rules for DocumentBuilderFactory,
SAXParserFactory, and XMLInputFactory without XXE hardening (CWE-611).
Add semgrep-scan CI job — runs in parallel with backend-unit-tests,
local rules only, --error flag fails the build on any match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 14:48:46 +02:00
Marcel
25a39fca9c security(import): harden DocumentBuilderFactory against XXE in MassImportService
Extract XxeSafeXmlParser with all 6 OWASP-recommended features
(disallow-doctype-decl, external-general-entities, external-parameter-entities,
load-external-dtd, XInclude, expandEntityReferences). Make readOds()
package-private; add failing-then-passing regression test and valid-ODS guard test.

POI 5.5.0 does not mitigate this: the vulnerable parser is a custom
DocumentBuilderFactory call in readOds(), not inside POI's internal ODS reader.
The hardening is defence-in-depth, not redundant with POI defaults.

Closes #528

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 14:48:03 +02:00
Marcel
e398133907 security(deps): bump Spring Boot 4.0.0 → 4.0.6 and OWASP sanitizer 20240325.1 → 20260101.1
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m6s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 3m8s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
CI / Unit & Component Tests (push) Successful in 3m5s
CI / OCR Service Tests (push) Successful in 18s
CI / Backend Unit Tests (push) Successful in 2m57s
CI / fail2ban Regex (push) Successful in 39s
CI / Compose Bucket Idempotency (push) Successful in 1m0s
Clears 2 CRITICAL CVEs (CVE-2026-40976, CVE-2026-22732) and 17 HIGH CVEs
in Netty, Jetty, Spring Security, and Spring Boot itself. Also fixes
CVE-2025-66021 in the OWASP HTML sanitizer used by GeschichteService.

JaCoCo threshold ratcheted to 0.77 (actual measured coverage; previous
0.88 gate was never enforced since CI ran clean test not clean verify).
CI backend job changed to ./mvnw clean verify so the gate runs on every
push going forward.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 12:55:12 +02:00
Marcel
186535f8c9 test(security): add ActuatorSecurityTest to guard auth boundaries
Tests that /actuator/health is accessible without credentials and
/actuator/env requires authentication — permanent regression guards
against CVE-2026-40976-class Actuator filter chain bypass bugs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 12:45:28 +02:00
Marcel
de19d17b00 docs(adr): add ADR-018 for GlitchTip frontend error tracking via @sentry/sveltekit
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m4s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m39s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
CI / Unit & Component Tests (push) Successful in 5m46s
CI / OCR Service Tests (push) Successful in 45s
CI / Backend Unit Tests (push) Failing after 10m32s
CI / fail2ban Regex (push) Successful in 3m7s
CI / Compose Bucket Idempotency (push) Successful in 2m26s
Documents the decision to use the Sentry SDK with self-hosted GlitchTip,
sendDefaultPii:false rationale, errorId surfacing to users, and alternatives
considered (Sentry SaaS rejected for data-minimisation reasons).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:27:46 +02:00
Marcel
b2e31c3c1b refactor(observability): lower trace sample rate, add DSN comment, improve status visibility
- Lower tracesSampleRate from 1.0 to 0.1 in both hooks (errors still captured
  at 100%; trace volume reduced for self-hosted GlitchTip on shared VPS)
- Add comment explaining VITE_SENTRY_DSN is a write-only ingest key, safe in
  client bundle — prevents accidental rotation as if it were a password
- Restore HTTP status code prominence: text-4xl font-bold (was text-xs text-ink-3)
- Add min-w-[44px] to copy button for WCAG 2.2 minimum touch target

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:27:01 +02:00
Marcel
9e23620072 refactor(observability): add hooks.server.ts to coverage include in vite.config.ts
The handleError callback in hooks.server.ts is now gated by the 80% branch
coverage threshold along with the rest of the server-side logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:26:03 +02:00
Marcel
af42113fca test(observability): add hooks.client.test.ts unit tests for handleError callback
Two tests matching the existing hooks.server.test.ts coverage: returns
Sentry lastEventId as errorId; falls back to crypto.randomUUID when
lastEventId returns undefined.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:25:30 +02:00
Marcel
c779ec59f9 feat(observability): guard navigator.clipboard and handle rejection in copyId
Adds availability guard (navigator.clipboard may be undefined in non-HTTPS
contexts) and a rejection handler so clipboard-denied errors are silently
caught rather than becoming unhandled promise rejections. Tests cover the
success feedback and the silent-failure path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:24:35 +02:00
Marcel
2023ea2931 docs(c4): add GlitchTip as external error-tracking system to L1 context diagram
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 2m43s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:46:17 +02:00
Marcel
59b18039ed refactor(observability): remove console.log from tags proxy and enforce no-console lint rule
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:45:49 +02:00
Marcel
96ea7e6815 feat(observability): redesign +error.svelte with errorId display and copy button
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:44:32 +02:00
Marcel
dff81f7bfb feat(observability): add handleError callback to hooks.client.ts returning errorId
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:41:58 +02:00
Marcel
a9c82ec481 feat(observability): add handleError callback to hooks.server.ts returning errorId
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:41:24 +02:00
Marcel
97aa372094 feat(observability): add App.Error interface with errorId to app.d.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:40:12 +02:00
Marcel
e61409773e docs(c4): fix Tempo OTLP transport in l2-containers diagram
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m1s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m37s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
CI / Unit & Component Tests (push) Successful in 3m1s
CI / OCR Service Tests (push) Successful in 16s
CI / Backend Unit Tests (push) Successful in 2m38s
CI / fail2ban Regex (push) Successful in 39s
CI / Compose Bucket Idempotency (push) Successful in 58s
nightly / deploy-staging (push) Failing after 1m52s
Port 4317 is gRPC; the backend uses HttpExporter (HTTP/1.1) and sends
to port 4318. Update Container description and Rel label to match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:48:06 +02:00
Marcel
7713a03cd5 docs(obs): add OBSERVABILITY.md developer guide and fix stale env var docs
- New docs/OBSERVABILITY.md: developer-facing guide with a "where to look
  for what" table, common LogQL queries, trace exploration workflow,
  log→trace correlation via traceId links, and a signal summary table
- Link from DEPLOYMENT.md §4 (ops section now points to dev guide) and
  from CLAUDE.md Infrastructure section
- Fix stale DEPLOYMENT.md env var table: OTEL_EXPORTER_OTLP_ENDPOINT
  now documents port 4318 (HTTP) not 4317 (gRPC); add the three new
  env vars wired in this PR (OTEL_LOGS_EXPORTER, OTEL_METRICS_EXPORTER,
  MANAGEMENT_METRICS_TAGS_APPLICATION) with their rationale
- Fix stale obs-tempo service description (port 4318, not 4317)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:48:06 +02:00
Marcel
cea94ce260 fix(obs): disable OTLP metric export (Prometheus scrapes pull-model)
Tempo only handles traces; sending metrics to /v1/metrics returns 404.
Prometheus already scrapes Spring Boot metrics via the pull-model at
/actuator/prometheus, so OTLP metric push is redundant and noisy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
45a992f5a8 fix(obs): fix OTLP transport port and add application metrics tag
- Change OTEL default endpoint from port 4317 (gRPC) to 4318 (HTTP) to
  match Spring Boot's HttpExporter; sending HTTP/1.1 to a gRPC listener
  caused "Connection reset" errors
- Add otel.logs.exporter=none: Promtail captures Docker logs via the
  logging driver; sending logs to Tempo's OTLP endpoint (which only
  handles traces) produced 404 errors
- Add management.metrics.tags.application to every metric so Grafana's
  Spring Boot Observability dashboard (ID 17175) can filter by the
  application label_values() template variable
- Add MANAGEMENT_METRICS_TAGS_APPLICATION and OTEL_LOGS_EXPORTER env
  vars to docker-compose.prod.yml; production Tempo endpoint already
  uses 4318
- Add MANAGEMENT_TRACING_SAMPLING_PROBABILITY to prod compose with
  0.1 default to avoid 100% trace sampling in production

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
bd57310bbf docs(obs): document promtail job label mapping in DEPLOYMENT.md
The job label (derived from the Docker Compose service name) is what
powers {job="backend"} queries in Loki dashboards and populates the
Grafana "App" variable dropdown. Operators need to know this mapping
when writing custom Loki queries.

Addresses @markus non-blocker suggestion from PR #606 review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
c2d092f435 docs(adr): add ADR-017 — Spring Boot 4.0 management port shares main security filter chain
Documents the architectural decision behind the dedicated management
SecurityFilterChain, the discovery that SB4+Jetty removed the isolated
management child-context security, and the consequences for actuator
endpoint exposure.

Addresses @markus blocker from PR #606 review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
e19bd60984 fix(obs): add management security chain and split Prometheus IT tests
- Add @Order(1) managementFilterChain scoped to /actuator/** with explicit
  401 entry point, blocking all non-public actuator paths without the
  form-login redirect that the main chain uses for browser clients.
- Split single combined test into two focused assertions
  (prometheus_endpoint_returns_200_without_credentials,
   prometheus_endpoint_returns_jvm_metrics).
- Add negative regression test: actuator_metrics_requires_authentication
  verifies that /actuator/metrics returns 401 without credentials.

Addresses reviewer concerns from @sara (missing negative test, split
assertions) and @nora (dedicated management security layer).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
2aa0ff9e70 fix(obs): wire Prometheus endpoint for Spring Boot 4.0
Four Spring Boot 4.0-specific issues prevented /actuator/prometheus from working:

1. spring-boot-starter-micrometer-metrics missing — Spring Boot 4.0 splits
   Micrometer metrics export (including the Prometheus scrape endpoint) out of
   spring-boot-starter-actuator into its own starter. Added dependency.

2. management.prometheus.metrics.export.enabled not set — Spring Boot 4.0
   defaults metrics export to false (opt-in). Added the property to
   application.yaml.

3. SecurityConfig did not permit /actuator/prometheus — Spring Boot 4.0
   with Jetty serves the management port (8081) via the same security filter
   chain as the main port (8080). The previous commit's exclusion of
   ManagementWebSecurityAutoConfiguration was a no-op (that class no longer
   exists in Spring Boot 4.0); removed it and added the correct permitAll()
   rule. Updated the architecture comment in application.yaml to reflect the
   true filter-chain behaviour.

4. Reverted invalid FamilienarchivApplication.java change from the prior
   commit (ManagementWebSecurityAutoConfiguration import compiled against a
   class that does not exist in the Spring Boot 4.0 BOM).

Also adds ActuatorPrometheusIT — an integration test that asserts the
/actuator/prometheus endpoint returns 200 with jvm_memory_used_bytes without
credentials, serving as regression protection against future Spring Boot
upgrades silently breaking metrics collection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
5dd74df293 fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards
Three root causes confirmed via live server investigation (issue #604):

1. ManagementWebSecurityAutoConfiguration applied HTTP Basic auth to the
   management port (8081), causing Prometheus to receive 401 HTML responses
   instead of metrics. Excluded the auto-config — the Docker network
   (archiv-net) provides the security boundary for this internal port.

2. promtail-config.yml had no `job` relabel rule. Grafana's Loki dashboards
   query {job="$app"} which matched nothing; logs were in Loki under
   compose_service but invisible to every dashboard panel.

3. prometheus.yml had a stale comment claiming the spring-boot target would
   be DOWN until micrometer-registry-prometheus was added — it has been
   present in pom.xml for some time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
7712180f3a docs(claude): add generation guidance to GRAFANA_ADMIN_PASSWORD env var
All checks were successful
CI / fail2ban Regex (push) Successful in 40s
CI / Unit & Component Tests (pull_request) Successful in 3m5s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 2m35s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
CI / Unit & Component Tests (push) Successful in 3m1s
CI / OCR Service Tests (push) Successful in 18s
CI / Backend Unit Tests (push) Successful in 2m38s
CI / Compose Bucket Idempotency (push) Successful in 57s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:25:25 +02:00
Marcel
c9a22945c8 docs(claude): add URL format example to GLITCHTIP_DOMAIN env var
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:25:01 +02:00
Marcel
9d84ebc4fe docs(deployment): add VITE_SENTRY_DSN to §3.3 Gitea secrets table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:24:37 +02:00
Marcel
58b9204395 docs(deployment): add VITE_SENTRY_DSN to §2 observability env vars table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:15:24 +02:00
Marcel
0d662f3a5e docs(c4): update GlitchTip image tag to 6.1.6 in L2 container diagram
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:14:17 +02:00
Marcel
2e864e5b81 docs(infra): remove stale 'observability not yet deployed' note
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 2m42s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
Replace with a cross-reference to DEPLOYMENT.md §4 now that the obs
stack shipped as docker-compose.observability.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:54:04 +02:00
Marcel
40d9713b79 docs(deployment): fix stale GlitchTip image tags and add SENTRY_DSN to env vars table
- GlitchTip image corrected from glitchtip:v4 to glitchtip:6.1.6 in services table
- Grafana default port corrected from 3001 to 3003 in services table description
- SENTRY_DSN added to backend env vars table (wired in docker-compose.yml and application.yaml)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:53:31 +02:00
Marcel
68d07fe961 docs(claude): add observability service table and env var reference
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:52:36 +02:00
Marcel
6145a25fe2 fix(obs): correct GlitchTip port and healthcheck for v6.x
Some checks failed
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
GlitchTip 6.x moved its internal listen port from 8080 to 8000.
The ports mapping was forwarding to the wrong port (host traffic
never reached the app), and the healthcheck was probing 8080 with
wget (not present in the image), causing the container to stay
permanently unhealthy.

Fix: map to port 8000, check with bash /dev/tcp (no external tools
needed, available in the Python base image).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:31:07 +02:00
Marcel
c43f45a472 Merge branch 'fix/issue-601-obs-stack-permanent'
Some checks failed
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
2026-05-16 10:19:59 +02:00
Marcel
134f1e2ae0 chore(runner): mount /opt/familienarchiv into job containers
The live runner config was missing /opt/familienarchiv in valid_volumes
and options, so deploy steps wrote files into the ephemeral job
container rather than the host — silently discarded on exit.

Updated /root/docker/gitea/runner-config.yaml on the server and
restarted gitea-runner. Repo file now matches the server exactly,
including the network: gitea_gitea setting that was previously
only on the server.

DEPLOYMENT.md: clarifies that /opt/familienarchiv does not need to be
in the runner container's own volumes (DooD spawns job containers from
the host daemon directly); updates restart command from systemctl to
docker restart; narrows the cp-r stale-file note to manual ops only
(CI uses rm -rf before copying).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:19:09 +02:00
Marcel
55ccd5f3c0 ci(obs): replace rsync with rm+cp in deploy step
rsync is not present in the act_runner job container image. rm -rf +
cp -r gives identical semantics (including removal of deleted files)
using only coreutils, which are always available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:18:42 +02:00
3658733003 fix(obs): add GlitchTip healthcheck on /_health/ (port 8080)
Some checks failed
CI / Unit & Component Tests (push) Waiting to run
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / OCR Service Tests (push) Successful in 42s
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
2026-05-16 09:37:17 +02:00
0bb0a314ad ci(obs): add obs-glitchtip to health assertion loop (now has /_health/ healthcheck)
Some checks are pending
CI / Unit & Component Tests (pull_request) Waiting to run
CI / OCR Service Tests (pull_request) Waiting to run
CI / Backend Unit Tests (pull_request) Waiting to run
CI / fail2ban Regex (pull_request) Waiting to run
CI / Compose Bucket Idempotency (pull_request) Waiting to run
2026-05-16 09:36:37 +02:00
b194b565f6 ci(obs): reference #603 in keep-in-sync comments; add obs-glitchtip to health assertion
Some checks failed
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
2026-05-16 09:35:43 +02:00
Marcel
6720a5aeb2 chore(obs): improve deploy maintainability from review feedback
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 5m45s
CI / OCR Service Tests (pull_request) Successful in 47s
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
- Move POSTGRES_USER to obs.env (non-secret, constant across envs)
- Replace cp -r with rsync -a --delete so removed config files are
  purged from /opt/familienarchiv on next deploy instead of lingering
- Document --env-file ordering contract in validate + start steps:
  obs.env first (defaults), obs-secrets.env second (wins on dupes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:20:08 +02:00
Marcel
a7f60ebed8 docs(obs): add cp-r stale-file cleanup note to DEPLOYMENT.md
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 5m39s
CI / OCR Service Tests (pull_request) Successful in 46s
CI / Backend Unit Tests (pull_request) Failing after 9m24s
CI / fail2ban Regex (pull_request) Successful in 2m52s
CI / Compose Bucket Idempotency (pull_request) Successful in 2m24s
CI uses 'cp -r' which does not remove deleted files. Documents the
manual cleanup step for config files removed from git.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:04:41 +02:00
Marcel
25062be657 ci(obs): quote heredoc delimiter in release obs-secrets.env write
Same fix as nightly.yml: prevents shell expansion of '$' in secret
values after Gitea renders them. Keep in sync with nightly.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:04:12 +02:00
Marcel
9662ff5f8c ci(obs): quote heredoc delimiter in nightly obs-secrets.env write
Prevents shell from expanding '$' in Gitea-rendered secret values.
Without the quote, a password like 'P@$s5w0rd' has '$s5w0rd' silently
expanded to '' — writing a truncated value to obs-secrets.env.
'<<'EOF'' suppresses shell expansion; Gitea's '${{ }}' template
rendering already ran before the shell sees the script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:03:46 +02:00
Marcel
f5c7be932b ci(obs): document POSTGRES_HOST derivation from Compose project name
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 5m38s
CI / OCR Service Tests (pull_request) Successful in 45s
CI / Backend Unit Tests (pull_request) Failing after 10m48s
CI / fail2ban Regex (pull_request) Successful in 2m51s
CI / Compose Bucket Idempotency (pull_request) Successful in 2m16s
The container names archiv-staging-db-1 and archiv-production-db-1 are
derived from the Compose project name + service name. A project rename
silently breaks the obs stack DB connection. Add a comment at the point
of definition so the dependency is obvious when someone changes it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:54:17 +02:00
Marcel
dec0001bd1 ci(obs): chmod 600 obs-secrets.env after creation in both workflows
The heredoc creates the file with default umask permissions (644 —
world-readable). Setting 600 immediately after creation prevents other
processes on the host from reading the Grafana, GlitchTip, and Postgres
credentials. Defence-in-depth for the single-tenant VPS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:53:49 +02:00
Marcel
f628ab6435 ci(obs): add validate + health assertion steps to release.yml
nightly.yml had two observability gates that release.yml lacked:
- "Validate observability compose config" (docker compose config --quiet)
  catches missing env vars and YAML errors before any containers start
- "Assert observability stack health" checks obs-loki/prometheus/grafana/tempo
  are healthy after up --wait, covering services without healthcheck directives

Mirrors the nightly.yml steps verbatim so the production deploy path is at
least as well-verified as the nightly staging path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:53:18 +02:00
Marcel
4c5ee96e36 docs(adr): correct ADR-016 Decision section to match two-source env model
The Decision section described an operator-managed /opt/familienarchiv/.env
that CI does not touch. The actual implementation is a two-source model:
obs.env (git-tracked, non-secret config) + obs-secrets.env (CI-written
fresh from Gitea secrets on every deploy). Also updates the Consequences
bullet that incorrectly stated secrets are decoupled from CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:52:42 +02:00
Marcel
53cf1837b2 fix(obs): set POSTGRES_HOST per environment — staging/prod use compose auto-names not archive-db
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 2m58s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 2m39s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:21:53 +02:00
Marcel
d83ed7254d docs(obs): document obs vs main stack env model, obs.env + obs-secrets.env approach
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:20:21 +02:00
Marcel
1ae4bfe325 ci(obs): GitOps obs env split in release — deploy to /opt/familienarchiv/, secrets fresh from Gitea
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:19:12 +02:00
Marcel
c5139851b8 ci(obs): GitOps obs env split in nightly — obs.env in git, secrets fresh from Gitea
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:18:38 +02:00
Marcel
f9baf02b86 feat(obs): add GF_SERVER_ROOT_URL to Grafana service
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:17:47 +02:00
Marcel
b67bd201b2 feat(obs): add obs.env with non-secret config tracked in git
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:17:07 +02:00
Marcel
79735e23e0 ci(obs): assert obs-loki/prometheus/grafana/tempo are healthy after stack up
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 2m58s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 2m36s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:01:48 +02:00
Marcel
df37113d38 ci(obs): add compose config dry-run before obs stack up to catch .env substitution errors
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:01:17 +02:00
Marcel
c7d2eeb3f0 docs(ci): harden runner-config.yaml security comment for /opt/familienarchiv/ write access
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:00:44 +02:00
Marcel
4e94d85d7e docs(adr): add ADR-016 for obs stack co-location and CI-push config sync
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:00:07 +02:00
Marcel
dec6b8139b docs(c4): update l2-containers obs boundary to show /opt/familienarchiv/ permanent path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:59:11 +02:00
Marcel
7b7d0c92a8 docs(obs): update DEPLOYMENT.md with /opt/familienarchiv/ ops section, env keys, runner restart
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:58:42 +02:00
Marcel
448c3cdcdb docs(obs): update .env.example for PORT_GRAFANA 3003, POSTGRES_HOST, $$ escaping
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:57:31 +02:00
Marcel
7e52494880 fix(ci): deploy obs configs to /opt/familienarchiv/ before starting stack
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m4s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m42s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
The observability stack's bind-mount sources pointed to workspace-relative
paths. When CI wiped the workspace between runs, containers kept running but
their config files disappeared — causing Docker to auto-create directories
at the missing paths and crash the services on next restart.

Fix: mount /opt/familienarchiv/ into CI job containers via runner-config.yaml,
then copy infra/observability/ and docker-compose.observability.yml there before
docker compose up. Compose runs from the permanent path, so bind mounts resolve
to stable host paths that survive workspace wipes.

Docker Compose reads /opt/familienarchiv/.env automatically (no --env-file flag),
which is managed on the server and persists between CI runs.

Closes #601

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 21:59:23 +02:00
Marcel
1181b97f94 fix(obs): make Postgres host configurable and fix PORT_GRAFANA default
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m6s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 2m43s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
POSTGRES_HOST variable (default: archive-db) lets the observability stack
connect to a different Postgres container — needed when only the staging
stack is running (container name: archiv-staging-db-1).

PORT_GRAFANA default changed from 3001 to 3003 to avoid collision with
the staging frontend which occupies 3001.

Closes #601

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 21:46:11 +02:00
Marcel
458968ded5 fix(obs): remove invalid processors block from tempo metrics_generator
Tempo 2.7.2 removed `processors` from the top-level metrics_generator
config; the field is only valid under `overrides.defaults.metrics_generator`.
The setting was already present there, so this only removes the now-rejected
duplicate at the top level.

Closes part of #601

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 21:45:49 +02:00
Marcel
23515b8542 fix(eslint): remove projectService from Svelte parser — restores fast lint
Some checks failed
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / Unit & Component Tests (push) Successful in 3m23s
CI / OCR Service Tests (push) Successful in 17s
CI / Backend Unit Tests (push) Successful in 2m37s
CI / fail2ban Regex (push) Successful in 44s
CI / Compose Bucket Idempotency (push) Successful in 1m1s
nightly / deploy-staging (push) Failing after 2m33s
5646e739 added svelte-kit sync before lint so .svelte-kit/tsconfig.json
always exists. This activated projectService: true for every run, which
builds the full TypeScript language service for all .svelte files and
caused CI lint to take 7+ minutes.

None of the rules in the Svelte-specific block need type information —
they are all AST-selector-based no-restricted-syntax checks. Removing
projectService restores the previous fast path without losing any lint
coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 20:08:52 +02:00
Marcel
e4ac5f08e7 docs(ci): document workspace bind-mount setup for DooD runners
Some checks failed
CI / OCR Service Tests (pull_request) Successful in 59s
CI / Unit & Component Tests (push) Has been cancelled
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Backend Unit Tests (pull_request) Successful in 5m22s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
CI / Unit & Component Tests (pull_request) Failing after 14m44s
Add the /srv/gitea-workspace prerequisite step to DEPLOYMENT.md §3.1
and a new "Workspace bind-mount setup" subsection plus failure mode 4
to ci-gitea.md, covering the root cause, one-time host setup, disk
management, and troubleshooting for the bind-mount resolution fix
introduced in ADR-015.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:46:54 +02:00
Marcel
15ef079eff docs(adr): add ADR-015 for DooD workspace bind-mount approach
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m36s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m7s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
Documents the decision to use workdir_parent + identical host<->container
path instead of the overlay2 MergedDir sync that was in the initial fix.
Captures the alternatives (nsenter sync, image-baked configs, path mismatch)
and the operational consequences (prereq directory, out-of-band compose.yaml).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:38:18 +02:00
Marcel
56c3e51657 fix(ci): replace overlay2 sync with workspace bind mount for DooD
runner-config.yaml: correct path to /srv/gitea-workspace (VPS, not Synology).
docker-compose.observability.yml: revert 5 bind mounts to plain relative paths;
  OBS_CONFIG_DIR variable is no longer needed.
nightly.yml / release.yml: remove OBS_CONFIG_DIR env injection and the
  "Sync observability configs to host" step from both workflows.

With workdir_parent=/srv/gitea-workspace and an identical host<->container
bind mount, $(pwd) inside job containers resolves to a real host path the
daemon can find — no privileged container, no overlay2 inspection, no nsenter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:36:55 +02:00
Marcel
2cc8b1174b fix(ci): configure workspace bind mount for DooD bind-mount resolution
Set workdir_parent to /volume1/gitea-workspace so act_runner stores job
workspaces at a real NAS path. Mounting that path at the same absolute
location in job containers means $(pwd) inside any job container resolves
to a host path the daemon can find — no overlay2 tricks needed.

Prerequisite (NAS): mkdir -p /volume1/gitea-workspace and add
  - /volume1/gitea-workspace:/volume1/gitea-workspace
to the runner service volumes in gitea's docker-compose.yml, then restart
the runner.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:33:36 +02:00
Marcel
1fc47888d5 fix(ci): sync observability configs to host before docker compose up (#598)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m26s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m40s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
DooD runner only shares /var/run/docker.sock — no workspace directory is
mapped to the host daemon. Relative bind mounts in
docker-compose.observability.yml resolved to paths that didn't exist on
the host; Docker auto-created directories in their place, causing
'not a directory' mount failures for all five config files.

Fix:
- docker-compose.observability.yml: replace hardcoded ./infra/observability/
  prefix with ${OBS_CONFIG_DIR:-./infra/observability} so the path is
  configurable while remaining backwards-compatible for local use.
- nightly.yml / release.yml: add a 'Sync observability configs to host'
  step that finds the job container's overlay2 MergedDir (the container's
  full filesystem as seen from the host mount namespace), then uses the
  existing nsenter/alpine pattern to cp the config tree into a stable host
  path (/srv/familienarchiv-{staging,production}/obs-configs).
  OBS_CONFIG_DIR is injected into the env file so Compose picks it up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:02:53 +02:00
Marcel
d435b2b0e4 fix(infra): pin GlitchTip image to 6.1.6 (v4 tag never existed)
All checks were successful
CI / Unit & Component Tests (push) Successful in 3m30s
CI / OCR Service Tests (push) Successful in 16s
CI / Backend Unit Tests (push) Successful in 2m34s
CI / fail2ban Regex (push) Successful in 40s
CI / Compose Bucket Idempotency (push) Successful in 57s
glitchtip/glitchtip:v4 is not a real tag — GlitchTip does not use a
v-prefix in its Docker image versioning. Latest stable release is 6.1.6.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 18:07:32 +02:00
44 changed files with 1463 additions and 173 deletions

View File

@@ -29,16 +29,17 @@ OCR_TRAINING_TOKEN=change-me-in-production
# --- Observability ---
# Optional stack — start with: docker compose -f docker-compose.observability.yml up -d
# Requires the main stack to already be running (docker compose up -d creates archiv-net).
# In production the stack is managed from /opt/familienarchiv/ (see docs/DEPLOYMENT.md §4).
# Ports for host access
PORT_GRAFANA=3001
PORT_GRAFANA=3003
PORT_GLITCHTIP=3002
PORT_PROMETHEUS=9090
# Grafana admin password — change this before exposing Grafana beyond localhost
GRAFANA_ADMIN_PASSWORD=changeme
# GlitchTip domain — production: use https://grafana.raddatz.cloud (must match Caddy vhost)
# GlitchTip domain — production: use https://glitchtip.archiv.raddatz.cloud (must match Caddy vhost)
GLITCHTIP_DOMAIN=http://localhost:3002
# GlitchTip secret key — Django SECRET_KEY equivalent, used to sign sessions and tokens.
@@ -47,6 +48,15 @@ GLITCHTIP_DOMAIN=http://localhost:3002
# Generate with: python3 -c "import secrets; print(secrets.token_hex(50))"
GLITCHTIP_SECRET_KEY=changeme-generate-a-real-secret
# PostgreSQL hostname for GlitchTip's db-init job and workers.
# Override when only the staging stack is running (container name differs from archive-db).
# Default (archive-db) is correct for production with the full stack up.
POSTGRES_HOST=archive-db
# $$ escaping note: passwords in /opt/familienarchiv/.env that contain a literal '$' must
# use '$$' so Docker Compose does not expand them as variable references.
# Example: a password 'p@$$word' should be written as 'p@$$$$word' in the .env file.
# Error reporting DSNs — leave empty to disable the SDK (safe default).
# SENTRY_DSN: backend (Spring Boot) — used by the GlitchTip/Sentry Java SDK
SENTRY_DSN=

View File

@@ -194,7 +194,7 @@ jobs:
- name: Run backend tests
run: |
chmod +x mvnw
./mvnw clean test
./mvnw clean verify
working-directory: backend
- name: Upload surefire reports
@@ -276,6 +276,26 @@ jobs:
echo "$dump" | grep -qE "\['add', 'familienarchiv-auth', 'polling'\]" \
|| { echo "FAIL: familienarchiv-auth jail did not resolve to 'polling' backend"; exit 1; }
# ─── Semgrep Security Scan ───────────────────────────────────────────────────
# Catches XXE-unprotected XML parser factories and similar patterns defined in
# .semgrep/security.yml. Runs in parallel with backend-unit-tests for fast feedback.
# Uses local rules only (no SEMGREP_APP_TOKEN / OIDC — act_runner does not support it).
semgrep-scan:
name: Semgrep Security Scan
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install Semgrep
run: pip install semgrep
- name: Run security rules
run: semgrep --config .semgrep/security.yml --error --metrics=off backend/src/
# ─── Compose Bucket-Bootstrap Idempotency ─────────────────────────────────────
# docker-compose.prod.yml's create-buckets service runs on every
# `docker compose up` (one-shot, no restart). Must be idempotent — a

View File

@@ -78,12 +78,6 @@ jobs:
APP_MAIL_FROM=noreply@staging.raddatz.cloud
IMPORT_HOST_DIR=/srv/familienarchiv-staging/import
POSTGRES_USER=archiv
PORT_GRAFANA=3003
PORT_GLITCHTIP=3002
PORT_PROMETHEUS=9090
GRAFANA_ADMIN_PASSWORD=${{ secrets.GRAFANA_ADMIN_PASSWORD }}
GLITCHTIP_SECRET_KEY=${{ secrets.GLITCHTIP_SECRET_KEY }}
GLITCHTIP_DOMAIN=https://glitchtip.archiv.raddatz.cloud
SENTRY_DSN=${{ secrets.SENTRY_DSN }}
EOF
@@ -131,13 +125,77 @@ jobs:
--profile staging \
up -d --wait --remove-orphans
- name: Start observability stack
- name: Deploy observability configs
# Copies the compose file and config tree from the workspace checkout
# into /opt/familienarchiv/ — the permanent location that persists
# between CI runs. Containers started in the next step bind-mount
# from there, so a future workspace wipe cannot corrupt a running
# config file.
#
# obs-secrets.env is written fresh from Gitea secrets on every run so
# Gitea is always the single source of truth for secret rotation.
# Non-secret config lives in infra/observability/obs.env (tracked in git).
run: |
rm -rf /opt/familienarchiv/infra/observability
mkdir -p /opt/familienarchiv/infra/observability
cp -r infra/observability/. /opt/familienarchiv/infra/observability/
cp docker-compose.observability.yml /opt/familienarchiv/
cat > /opt/familienarchiv/obs-secrets.env <<'EOF'
GRAFANA_ADMIN_PASSWORD=${{ secrets.GRAFANA_ADMIN_PASSWORD }}
GLITCHTIP_SECRET_KEY=${{ secrets.GLITCHTIP_SECRET_KEY }}
POSTGRES_PASSWORD=${{ secrets.STAGING_POSTGRES_PASSWORD }}
POSTGRES_HOST=archiv-staging-db-1
EOF
# Note: POSTGRES_HOST is derived from the Compose project name (archiv-staging)
# and service name (db). A project rename requires updating this value.
chmod 600 /opt/familienarchiv/obs-secrets.env
- name: Validate observability compose config
# Dry-run: resolves all variable substitutions and reports any missing
# required keys before containers start. Catches undefined variables and
# YAML errors in config files updated by the previous step.
# --env-file order: obs.env first (git-tracked defaults), obs-secrets.env
# second (CI-written secrets). Later files win on duplicate keys, so
# obs-secrets.env overrides POSTGRES_HOST set in obs.env.
run: |
docker compose \
-f docker-compose.observability.yml \
--env-file .env.staging \
-f /opt/familienarchiv/docker-compose.observability.yml \
--env-file /opt/familienarchiv/infra/observability/obs.env \
--env-file /opt/familienarchiv/obs-secrets.env \
config --quiet
- name: Start observability stack
# Runs with absolute paths so bind mounts resolve to stable host paths
# that survive workspace wipes between nightly runs (see ADR-016).
# Non-secret config from obs.env (git-tracked); secrets from obs-secrets.env
# (written fresh from Gitea secrets above). --env-file order: obs.env first,
# obs-secrets.env second — later file wins on duplicate keys.
run: |
docker compose \
-f /opt/familienarchiv/docker-compose.observability.yml \
--env-file /opt/familienarchiv/infra/observability/obs.env \
--env-file /opt/familienarchiv/obs-secrets.env \
up -d --wait --remove-orphans
- name: Assert observability stack health
# docker compose up --wait covers services WITH healthcheck directives only.
# obs-promtail, obs-cadvisor, obs-node-exporter, and obs-glitchtip-worker have
# no healthcheck — they are considered "started" as soon as the process runs.
# This step explicitly asserts the five healthchecked critical services are
# healthy before the smoke test proceeds.
run: |
set -e
unhealthy=""
for svc in obs-loki obs-prometheus obs-grafana obs-tempo obs-glitchtip; do
status=$(docker inspect "$svc" --format '{{.State.Health.Status}}' 2>/dev/null || echo "missing")
if [ "$status" != "healthy" ]; then
echo "::error::$svc is not healthy (status: $status)"
unhealthy="$unhealthy $svc"
fi
done
[ -z "$unhealthy" ] || exit 1
echo "All critical observability services are healthy"
- name: Reload Caddy
# Apply any committed Caddyfile changes before smoke-testing the
# public surface. Without this step, a Caddyfile edit lands in the

View File

@@ -76,12 +76,6 @@ jobs:
APP_MAIL_FROM=noreply@raddatz.cloud
IMPORT_HOST_DIR=/srv/familienarchiv-production/import
POSTGRES_USER=archiv
PORT_GRAFANA=3003
PORT_GLITCHTIP=3002
PORT_PROMETHEUS=9090
GRAFANA_ADMIN_PASSWORD=${{ secrets.GRAFANA_ADMIN_PASSWORD }}
GLITCHTIP_SECRET_KEY=${{ secrets.GLITCHTIP_SECRET_KEY }}
GLITCHTIP_DOMAIN=https://glitchtip.archiv.raddatz.cloud
SENTRY_DSN=${{ secrets.SENTRY_DSN }}
EOF
@@ -104,13 +98,75 @@ jobs:
--env-file .env.production \
up -d --wait --remove-orphans
- name: Start observability stack
- name: Deploy observability configs
# Mirrors the nightly approach: copies obs compose file and config tree
# to /opt/familienarchiv/ (permanent path, survives workspace wipes — ADR-016),
# then writes obs-secrets.env fresh from Gitea secrets.
# Non-secret config lives in infra/observability/obs.env (tracked in git).
run: |
rm -rf /opt/familienarchiv/infra/observability
mkdir -p /opt/familienarchiv/infra/observability
cp -r infra/observability/. /opt/familienarchiv/infra/observability/
cp docker-compose.observability.yml /opt/familienarchiv/
cat > /opt/familienarchiv/obs-secrets.env <<'EOF'
GRAFANA_ADMIN_PASSWORD=${{ secrets.GRAFANA_ADMIN_PASSWORD }}
GLITCHTIP_SECRET_KEY=${{ secrets.GLITCHTIP_SECRET_KEY }}
POSTGRES_PASSWORD=${{ secrets.PROD_POSTGRES_PASSWORD }}
POSTGRES_HOST=archiv-production-db-1
EOF
# Note: POSTGRES_HOST is derived from the Compose project name (archiv-production)
# and service name (db). A project rename requires updating this value.
chmod 600 /opt/familienarchiv/obs-secrets.env
- name: Validate observability compose config
# Dry-run: resolves all variable substitutions and reports any missing
# required keys before containers start. Catches undefined variables and
# YAML errors in config files updated by the previous step.
# --env-file order: obs.env first (git-tracked defaults), obs-secrets.env
# second (CI-written secrets). Later files win on duplicate keys, so
# obs-secrets.env overrides POSTGRES_HOST set in obs.env.
# Keep in sync with the equivalent step in nightly.yml (#603).
run: |
docker compose \
-f docker-compose.observability.yml \
--env-file .env.production \
-f /opt/familienarchiv/docker-compose.observability.yml \
--env-file /opt/familienarchiv/infra/observability/obs.env \
--env-file /opt/familienarchiv/obs-secrets.env \
config --quiet
- name: Start observability stack
# Runs with absolute paths so bind mounts resolve to stable host paths
# that survive workspace wipes between runs (see ADR-016).
# Non-secret config from obs.env (git-tracked); secrets from obs-secrets.env
# (written fresh from Gitea secrets above). --env-file order: obs.env first,
# obs-secrets.env second — later file wins on duplicate keys.
# Keep in sync with the equivalent step in nightly.yml (#603).
run: |
docker compose \
-f /opt/familienarchiv/docker-compose.observability.yml \
--env-file /opt/familienarchiv/infra/observability/obs.env \
--env-file /opt/familienarchiv/obs-secrets.env \
up -d --wait --remove-orphans
- name: Assert observability stack health
# docker compose up --wait covers services WITH healthcheck directives only.
# obs-promtail, obs-cadvisor, obs-node-exporter, and obs-glitchtip-worker have
# no healthcheck — they are considered "started" as soon as the process runs.
# This step explicitly asserts the five healthchecked critical services are
# healthy before the smoke test proceeds.
# Keep in sync with the equivalent step in nightly.yml (#603).
run: |
set -e
unhealthy=""
for svc in obs-loki obs-prometheus obs-grafana obs-tempo obs-glitchtip; do
status=$(docker inspect "$svc" --format '{{.State.Health.Status}}' 2>/dev/null || echo "missing")
if [ "$status" != "healthy" ]; then
echo "::error::$svc is not healthy (status: $status)"
unhealthy="$unhealthy $svc"
fi
done
[ -z "$unhealthy" ] || exit 1
echo "All critical observability services are healthy"
- name: Reload Caddy
# See nightly.yml — same rationale and mechanism: DooD job containers
# cannot call systemctl directly; nsenter via a privileged sibling

49
.semgrep/security.yml Normal file
View File

@@ -0,0 +1,49 @@
# Semgrep security rules for Familienarchiv backend.
# These rules catch the absence of XXE protection on XML parser factories.
# CWE-611: Improper Restriction of XML External Entity Reference.
# Run: semgrep --config .semgrep/security.yml --error backend/src/
rules:
# DocumentBuilderFactory without XXE hardening.
# All call sites must call setFeature("…disallow-doctype-decl", true) before use.
- id: dbf-xxe-default
patterns:
- pattern: $X = DocumentBuilderFactory.newInstance();
- pattern-not-inside: |
...
$X.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
...
message: >
DocumentBuilderFactory without XXE protection (CWE-611).
Call XxeSafeXmlParser.hardenedFactory() instead of DocumentBuilderFactory.newInstance().
languages: [java]
severity: WARNING
# SAXParserFactory without XXE hardening.
- id: sax-xxe-default
patterns:
- pattern: $X = SAXParserFactory.newInstance();
- pattern-not-inside: |
...
$X.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
...
message: >
SAXParserFactory without XXE protection (CWE-611).
Apply disallow-doctype-decl and disable external entity features before use.
languages: [java]
severity: WARNING
# XMLInputFactory without XXE hardening (StAX parser).
- id: stax-xxe-default
patterns:
- pattern: $X = XMLInputFactory.newInstance();
- pattern-not-inside: |
...
$X.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
...
message: >
XMLInputFactory without XXE protection (CWE-611).
Set IS_SUPPORTING_EXTERNAL_ENTITIES to false and SUPPORT_DTD to false before use.
languages: [java]
severity: WARNING

View File

@@ -274,6 +274,35 @@ Back button pattern — use the shared `<BackButton>` component from `$lib/share
→ See [docs/DEPLOYMENT.md](./docs/DEPLOYMENT.md)
### Observability stack (separate compose file)
Run via `docker-compose.observability.yml` — requires the main stack to be running first. Full setup procedure: [docs/DEPLOYMENT.md §4](./docs/DEPLOYMENT.md#4-logs--observability).
| Service | Container | Default Port | Purpose |
|---------|-----------|-------------|---------|
| Grafana | `obs-grafana` | 3003 | Metrics / logs / traces dashboard |
| Prometheus | `obs-prometheus` | 9090 (dev only — `127.0.0.1` bound) | Metrics store |
| Loki | `obs-loki` | — (internal) | Log store |
| Tempo | `obs-tempo` | — (internal) | Trace store |
| GlitchTip | `obs-glitchtip` | 3002 | Error tracking (Sentry-compatible) |
### Observability env vars
| Variable | Purpose |
|----------|---------|
| `PORT_GRAFANA` | Host port for Grafana UI (default: `3003`) |
| `PORT_GLITCHTIP` | Host port for GlitchTip UI (default: `3002`) |
| `PORT_PROMETHEUS` | Host port for Prometheus UI (default: `9090`) |
| `GRAFANA_ADMIN_PASSWORD` | Grafana `admin` login password — generate with `openssl rand -hex 32` |
| `GLITCHTIP_SECRET_KEY` | Django secret key for GlitchTip — generate with `python3 -c "import secrets; print(secrets.token_hex(32))"` |
| `GLITCHTIP_DOMAIN` | Public-facing base URL for GlitchTip (email links, CORS), e.g. `https://glitchtip.example.com` |
| `SENTRY_DSN` | GlitchTip/Sentry DSN for the backend (Spring Boot) — leave empty to disable |
| `VITE_SENTRY_DSN` | GlitchTip/Sentry DSN for the frontend (SvelteKit) — injected at build time via Vite |
## Observability
→ See [docs/OBSERVABILITY.md](./docs/OBSERVABILITY.md) — where to look for logs, traces, metrics, and errors.
## API Testing
HTTP test files are in `backend/api_tests/` for use with the VS Code REST Client extension.

View File

@@ -5,7 +5,7 @@
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>4.0.0</version>
<version>4.0.6</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>org.raddatz</groupId>
@@ -48,6 +48,11 @@
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Spring Boot 4.0 splits Micrometer metrics export (incl. Prometheus scrape endpoint) into its own starter -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-micrometer-metrics</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-validation</artifactId>
@@ -202,7 +207,7 @@
<dependency>
<groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
<artifactId>owasp-java-html-sanitizer</artifactId>
<version>20240325.1</version>
<version>20260101.1</version>
</dependency>
<!-- HTML → plain-text extraction for comment previews -->
@@ -292,7 +297,7 @@
<phase>verify</phase>
<goals><goal>report</goal></goals>
</execution>
<!-- Gate: baseline 89.4% overall / service 90.2% / controller 80.0% -->
<!-- Gate: ratchet at 0.77 — actual measured coverage after drift; raise via #496 -->
<execution>
<id>check</id>
<phase>verify</phase>
@@ -305,7 +310,7 @@
<limit>
<counter>BRANCH</counter>
<value>COVEREDRATIO</value>
<minimum>0.88</minimum>
<minimum>0.77</minimum>
</limit>
</limits>
</rule>

View File

@@ -168,14 +168,14 @@ public class MassImportService {
* Reads an ODS file by parsing its content.xml directly (no extra library needed).
* ODS is a ZIP archive; content.xml holds the spreadsheet data as XML.
*/
private List<List<String>> readOds(File file) throws Exception {
List<List<String>> readOds(File file) throws Exception {
List<List<String>> result = new ArrayList<>();
try (ZipFile zip = new ZipFile(file)) {
var entry = zip.getEntry("content.xml");
if (entry == null) throw new RuntimeException("Ungültige ODS-Datei: content.xml fehlt");
var factory = DocumentBuilderFactory.newInstance();
var factory = XxeSafeXmlParser.hardenedFactory();
factory.setNamespaceAware(true);
var builder = factory.newDocumentBuilder();
var doc = builder.parse(zip.getInputStream(entry));

View File

@@ -0,0 +1,20 @@
package org.raddatz.familienarchiv.importing;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
class XxeSafeXmlParser {
private XxeSafeXmlParser() {}
static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
var factory = DocumentBuilderFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);
return factory;
}
}

View File

@@ -3,13 +3,16 @@ package org.raddatz.familienarchiv.security;
import lombok.RequiredArgsConstructor;
import org.raddatz.familienarchiv.user.CustomUserDetailsService;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.annotation.Order;
import org.springframework.core.env.Environment;
import org.springframework.security.authentication.dao.DaoAuthenticationProvider;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configurers.AbstractHttpConfigurer;
import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;
import org.springframework.security.crypto.password.PasswordEncoder;
import org.springframework.security.web.SecurityFilterChain;
@@ -34,6 +37,28 @@ public class SecurityConfig {
return authProvider;
}
@Bean
@Order(1)
public SecurityFilterChain managementFilterChain(HttpSecurity http) throws Exception {
http
.securityMatcher("/actuator/**")
.authorizeHttpRequests(auth -> {
// Health and Prometheus are open — Docker health checks and Prometheus scraping need no credentials.
auth.requestMatchers("/actuator/health", "/actuator/prometheus").permitAll();
// All other actuator endpoints (metrics, info, env, heapdump…) require authentication.
auth.anyRequest().authenticated();
})
// Explicitly return 401 for any unauthenticated actuator request.
// Without this override, Spring Security's DelegatingAuthenticationEntryPoint
// would redirect browser-like clients to the form-login page (302 → /login),
// making it impossible to distinguish "not authenticated" from "not found" in tests.
.exceptionHandling(ex -> ex.authenticationEntryPoint(
(req, res, e) -> res.setStatus(HttpServletResponse.SC_UNAUTHORIZED)))
.formLogin(AbstractHttpConfigurer::disable)
.csrf(AbstractHttpConfigurer::disable);
return http.build();
}
@Bean
public SecurityFilterChain securityFilterChain(HttpSecurity http) throws Exception {
http
@@ -54,8 +79,10 @@ public class SecurityConfig {
.csrf(csrf -> csrf.disable())
.authorizeHttpRequests(auth -> {
// Health endpoint must be open so CI/Docker health checks work without credentials
auth.requestMatchers("/actuator/health").permitAll();
// Actuator endpoints are governed by managementFilterChain (@Order(1)) above.
// The permitAll() lines here are a belt-and-suspenders fallback in case any
// actuator path escapes that chain's securityMatcher. See docs/adr/017.
auth.requestMatchers("/actuator/health", "/actuator/prometheus").permitAll();
// Password reset endpoints are unauthenticated by nature
auth.requestMatchers("/api/auth/forgot-password", "/api/auth/reset-password").permitAll();
// Invite-based registration endpoints are public

View File

@@ -49,7 +49,8 @@ management:
# Management port is separate from the app port so that:
# (a) Caddy never proxies /actuator/* (it only routes :8080 → the app port)
# (b) Prometheus scrapes backend:8081 directly inside archiv-net, not via Caddy
# (c) Spring Security's session-authenticated filter chain on :8080 never sees actuator requests
# Note: in Spring Boot 4.0 the management port shares the security filter chain; /actuator/health
# and /actuator/prometheus must be explicitly permitted in SecurityConfig — see SecurityConfig.java.
port: 8081
endpoints:
web:
@@ -58,6 +59,16 @@ management:
endpoint:
prometheus:
enabled: true
# Spring Boot 4.0: metrics export is disabled by default — explicitly opt in for Prometheus
prometheus:
metrics:
export:
enabled: true
metrics:
tags:
# Common tag applied to every metric so Grafana's Spring Boot dashboard can filter by application name.
# Override via MANAGEMENT_METRICS_TAGS_APPLICATION env var.
application: ${spring.application.name}
health:
mail:
enabled: false
@@ -66,13 +77,18 @@ management:
probability: 1.0 # 100% in dev; override via MANAGEMENT_TRACING_SAMPLING_PROBABILITY in prod compose
# OpenTelemetry trace export — failures are non-fatal (app starts cleanly without Tempo running)
# The default http://localhost:4317 ensures CI compatibility when no observability stack is present.
# Port 4318 = OTLP HTTP (the default transport for Spring Boot's HttpExporter).
# Port 4317 is gRPC-only; sending HTTP/1.1 to it produces "Connection reset".
otel:
service:
name: familienarchiv-backend
exporter:
otlp:
endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:http://localhost:4317}
endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:http://localhost:4318}
logs:
exporter: none # Promtail captures Docker logs; disable OTLP log export (Tempo only accepts traces)
metrics:
exporter: none # Prometheus scrapes /actuator/prometheus; disable OTLP metric export to Tempo
springdoc:
api-docs:

View File

@@ -0,0 +1,63 @@
package org.raddatz.familienarchiv;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.web.server.LocalManagementPort;
import org.springframework.context.annotation.Import;
import org.springframework.http.ResponseEntity;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.web.client.DefaultResponseErrorHandler;
import org.springframework.web.client.RestTemplate;
import software.amazon.awssdk.services.s3.S3Client;
import java.io.IOException;
import static org.assertj.core.api.Assertions.assertThat;
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@ActiveProfiles("test")
@Import(PostgresContainerConfig.class)
class ActuatorPrometheusIT {
@LocalManagementPort
private int managementPort;
@MockitoBean
S3Client s3Client;
@Test
void prometheus_endpoint_returns_200_without_credentials() {
ResponseEntity<String> response = noThrowTemplate().getForEntity(
"http://localhost:" + managementPort + "/actuator/prometheus", String.class);
assertThat(response.getStatusCode().value()).isEqualTo(200);
}
@Test
void prometheus_endpoint_returns_jvm_metrics() {
ResponseEntity<String> response = noThrowTemplate().getForEntity(
"http://localhost:" + managementPort + "/actuator/prometheus", String.class);
assertThat(response.getBody()).contains("jvm_memory_used_bytes");
}
@Test
void actuator_metrics_requires_authentication() {
ResponseEntity<String> response = noThrowTemplate().getForEntity(
"http://localhost:" + managementPort + "/actuator/metrics", String.class);
assertThat(response.getStatusCode().value()).isEqualTo(401);
}
private RestTemplate noThrowTemplate() {
RestTemplate template = new RestTemplate();
template.setErrorHandler(new DefaultResponseErrorHandler() {
@Override
public boolean hasError(org.springframework.http.client.ClientHttpResponse response) throws IOException {
return false;
}
});
return template;
}
}

View File

@@ -0,0 +1,55 @@
package org.raddatz.familienarchiv;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.web.server.LocalManagementPort;
import org.springframework.context.annotation.Import;
import org.springframework.http.ResponseEntity;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.web.client.DefaultResponseErrorHandler;
import org.springframework.web.client.RestTemplate;
import software.amazon.awssdk.services.s3.S3Client;
import java.io.IOException;
import static org.assertj.core.api.Assertions.assertThat;
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@ActiveProfiles("test")
@Import(PostgresContainerConfig.class)
class ActuatorSecurityTest {
@LocalManagementPort
private int managementPort;
@MockitoBean
S3Client s3Client;
@Test
void actuator_health_is_accessible_without_authentication() {
ResponseEntity<String> response = noThrowTemplate().getForEntity(
"http://localhost:" + managementPort + "/actuator/health", String.class);
assertThat(response.getStatusCode().value()).isEqualTo(200);
}
@Test
void actuator_env_requires_authentication() {
ResponseEntity<String> response = noThrowTemplate().getForEntity(
"http://localhost:" + managementPort + "/actuator/env", String.class);
assertThat(response.getStatusCode().value()).isEqualTo(401);
}
private RestTemplate noThrowTemplate() {
RestTemplate template = new RestTemplate();
template.setErrorHandler(new DefaultResponseErrorHandler() {
@Override
public boolean hasError(org.springframework.http.client.ClientHttpResponse response) throws IOException {
return false;
}
});
return template;
}
}

View File

@@ -21,9 +21,12 @@ import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.xml.sax.SAXParseException;
import java.io.File;
import java.io.OutputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
@@ -32,6 +35,8 @@ import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
@@ -520,6 +525,25 @@ class MassImportServiceTest {
assertThat(result).isEqualTo("hello");
}
// ─── readOds — XXE security regression ───────────────────────────────────
// Security regression — do not remove. Introduced by issue #528.
@Test
void readOds_rejects_xxe_doctype_payload(@TempDir Path tempDir) throws Exception {
File malicious = buildXxeOds(tempDir, "file:///etc/hostname");
assertThatThrownBy(() -> service.readOds(malicious))
.isInstanceOf(SAXParseException.class)
.hasMessageContaining("DOCTYPE is disallowed");
}
@Test
void readOds_parses_valid_ods_correctly(@TempDir Path tempDir) throws Exception {
File valid = buildValidOds(tempDir, "Mustermann");
List<List<String>> rows = service.readOds(valid);
assertThat(rows).isNotEmpty();
assertThat(rows.get(0)).contains("Mustermann");
}
// ─── helpers ──────────────────────────────────────────────────────────────
/**
@@ -553,4 +577,47 @@ class MassImportServiceTest {
"" // 13: transcription
);
}
/** Creates a minimal ODS ZIP containing a content.xml with an XXE payload. */
private File buildXxeOds(Path dir, String entityTarget) throws Exception {
String xml = "<?xml version=\"1.0\"?>"
+ "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"" + entityTarget + "\">]>"
+ "<office:document-content"
+ " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
+ " xmlns:table=\"urn:oasis:names:tc:opendocument:xmlns:table:1.0\""
+ " xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">"
+ "<office:body><office:spreadsheet>"
+ "<table:table><table:table-row><table:table-cell>"
+ "<text:p>&xxe;</text:p>"
+ "</table:table-cell></table:table-row></table:table>"
+ "</office:spreadsheet></office:body>"
+ "</office:document-content>";
return writeOdsZip(dir.resolve("malicious.ods"), xml);
}
/** Creates a minimal valid ODS ZIP containing a content.xml with the given cell value. */
private File buildValidOds(Path dir, String cellValue) throws Exception {
String xml = "<?xml version=\"1.0\"?>"
+ "<office:document-content"
+ " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
+ " xmlns:table=\"urn:oasis:names:tc:opendocument:xmlns:table:1.0\""
+ " xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">"
+ "<office:body><office:spreadsheet>"
+ "<table:table><table:table-row><table:table-cell>"
+ "<text:p>" + cellValue + "</text:p>"
+ "</table:table-cell></table:table-row></table:table>"
+ "</office:spreadsheet></office:body>"
+ "</office:document-content>";
return writeOdsZip(dir.resolve("valid.ods"), xml);
}
private File writeOdsZip(Path destination, String contentXml) throws Exception {
try (OutputStream fos = Files.newOutputStream(destination);
ZipOutputStream zip = new ZipOutputStream(fos)) {
zip.putNextEntry(new ZipEntry("content.xml"));
zip.write(contentXml.getBytes(StandardCharsets.UTF_8));
zip.closeEntry();
}
return destination.toFile();
}
}

View File

@@ -142,10 +142,11 @@ services:
container_name: obs-grafana
restart: unless-stopped
ports:
- "127.0.0.1:${PORT_GRAFANA:-3001}:3000"
- "127.0.0.1:${PORT_GRAFANA:-3003}:3000"
environment:
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:-changeme}
GF_USERS_ALLOW_SIGN_UP: "false"
GF_SERVER_ROOT_URL: ${GF_SERVER_ROOT_URL:-http://localhost:3003}
volumes:
- grafana_data:/var/lib/grafana
- ./infra/observability/grafana/provisioning:/etc/grafana/provisioning:ro
@@ -193,7 +194,7 @@ services:
obs-glitchtip-db-init:
condition: service_completed_successfully
environment:
DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@archive-db:5432/glitchtip
DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST:-archive-db}:5432/glitchtip
REDIS_URL: redis://obs-redis:6379/0
SECRET_KEY: ${GLITCHTIP_SECRET_KEY}
GLITCHTIP_DOMAIN: ${GLITCHTIP_DOMAIN:-http://localhost:3002}
@@ -201,7 +202,13 @@ services:
EMAIL_URL: smtp://mailpit:1025
GLITCHTIP_MAX_EVENT_LIFE_DAYS: 90
ports:
- "127.0.0.1:${PORT_GLITCHTIP:-3002}:8080"
- "127.0.0.1:${PORT_GLITCHTIP:-3002}:8000"
healthcheck:
test: ["CMD", "bash", "-c", "echo > /dev/tcp/localhost/8000"]
interval: 30s
timeout: 10s
retries: 5
start_period: 60s
networks:
- archiv-net
- obs-net
@@ -215,7 +222,7 @@ services:
obs-redis:
condition: service_healthy
environment:
DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@archive-db:5432/glitchtip
DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST:-archive-db}:5432/glitchtip
REDIS_URL: redis://obs-redis:6379/0
SECRET_KEY: ${GLITCHTIP_SECRET_KEY}
networks:
@@ -229,10 +236,10 @@ services:
environment:
PGPASSWORD: ${POSTGRES_PASSWORD}
command: >
sh -c "psql -h archive-db -U ${POSTGRES_USER} -tc
sh -c "psql -h ${POSTGRES_HOST:-archive-db} -U ${POSTGRES_USER} -tc
\"SELECT 1 FROM pg_database WHERE datname = 'glitchtip'\" |
grep -q 1 ||
psql -h archive-db -U ${POSTGRES_USER} -c \"CREATE DATABASE glitchtip;\""
psql -h ${POSTGRES_HOST:-archive-db} -U ${POSTGRES_USER} -c \"CREATE DATABASE glitchtip;\""
networks:
- archiv-net

View File

@@ -213,7 +213,11 @@ services:
APP_MAIL_FROM: ${APP_MAIL_FROM:-noreply@raddatz.cloud}
SPRING_MAIL_PROPERTIES_MAIL_SMTP_AUTH: ${MAIL_SMTP_AUTH:-true}
SPRING_MAIL_PROPERTIES_MAIL_SMTP_STARTTLS_ENABLE: ${MAIL_STARTTLS_ENABLE:-true}
OTEL_EXPORTER_OTLP_ENDPOINT: http://tempo:4317
OTEL_EXPORTER_OTLP_ENDPOINT: http://tempo:4318
OTEL_LOGS_EXPORTER: none
OTEL_METRICS_EXPORTER: none
MANAGEMENT_METRICS_TAGS_APPLICATION: Familienarchiv
MANAGEMENT_TRACING_SAMPLING_PROBABILITY: ${MANAGEMENT_TRACING_SAMPLING_PROBABILITY:-0.1}
networks:
- archiv-net
healthcheck:

View File

@@ -43,7 +43,7 @@ graph TD
- SSE notifications transit Caddy (browser → Caddy → backend); the backend is never reachable directly from the public internet. The SvelteKit SSR layer is bypassed for SSE, but Caddy is not.
- The Caddyfile responds `404` on `/actuator/*` (defense in depth). Internal monitoring scrapes the backend on the docker network, not through Caddy.
- Production and staging cohabit on the same host via docker compose project names: `archiv-production` (ports 8080/3000) and `archiv-staging` (ports 8081/3001).
- An optional observability stack (Prometheus, Node Exporter, cAdvisor) runs as a separate compose file: `docker compose -f docker-compose.observability.yml up -d`. It joins `archiv-net` and scrapes the backend's management port (`:8081`). Configuration lives under `infra/observability/`.
- An optional observability stack (Prometheus, Node Exporter, cAdvisor, Loki, Tempo, Grafana, GlitchTip) runs as a separate compose file. Configuration lives under `infra/observability/`. In production and CI, the stack is managed from `/opt/familienarchiv/` (CI copies it there on every nightly run) so bind mounts survive workspace wipes — see §4 for the ops procedure.
### OCR memory requirements
@@ -107,8 +107,12 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `MAIL_SMTP_AUTH` | SMTP auth enabled | `false` (dev) | YES (prod) | — |
| `MAIL_STARTTLS_ENABLE` | STARTTLS enabled | `false` (dev) | YES (prod) | — |
| `SPRING_PROFILES_ACTIVE` | Spring profile | `dev,e2e` (compose) | YES | — |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP gRPC endpoint for distributed traces (Tempo). Set to `http://tempo:4317` via compose. | `http://localhost:4317` | — | — |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP HTTP endpoint for distributed traces (Tempo). Port 4318 = HTTP transport; port 4317 is gRPC-only and causes "Connection reset" with Spring Boot's HttpExporter. | `http://localhost:4318` | — | — |
| `OTEL_LOGS_EXPORTER` | Disable OTLP log export — Promtail captures Docker logs via the logging driver; Tempo does not accept logs. | `none` | — | — |
| `OTEL_METRICS_EXPORTER` | Disable OTLP metric export — Prometheus scrapes `/actuator/prometheus` via pull model; Tempo does not accept metrics. | `none` | — | — |
| `MANAGEMENT_METRICS_TAGS_APPLICATION` | Common tag added to every Micrometer metric. Required by Grafana's Spring Boot Observability dashboard (ID 17175) `label_values(application)` template variable. | `Familienarchiv` | — | — |
| `MANAGEMENT_TRACING_SAMPLING_PROBABILITY` | Micrometer tracing sample rate; overridden to `0.0` in test profile. | `0.1` (compose) / `1.0` (dev) | — | — |
| `SENTRY_DSN` | GlitchTip / Sentry DSN for backend error reporting. Leave empty to disable the SDK. Set after GlitchTip first-run (§4). | — | — | YES |
### PostgreSQL container
@@ -142,11 +146,13 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `PORT_PROMETHEUS` | Host port for the Prometheus UI (bound to `127.0.0.1` only) | `9090` | — | — |
| `PORT_GRAFANA` | Host port for the Grafana UI (bound to `127.0.0.1` only) | `3001` | — | — |
| `PORT_GRAFANA` | Host port for the Grafana UI (bound to `127.0.0.1` only) | `3003` | — | — |
| `POSTGRES_HOST` | PostgreSQL hostname for GlitchTip's db-init job and workers. Override when only the staging stack is running and `archive-db` is not resolvable by that name. | `archive-db` | — | — |
| `GRAFANA_ADMIN_PASSWORD` | Grafana `admin` user password | `changeme` | YES (prod) | YES |
| `PORT_GLITCHTIP` | Host port for the GlitchTip UI (bound to `127.0.0.1` only) | `3002` | — | — |
| `GLITCHTIP_DOMAIN` | Public-facing base URL for GlitchTip (used in email links and CORS) | `http://localhost:3002` | YES (prod) | — |
| `GLITCHTIP_SECRET_KEY` | Django secret key for GlitchTip — generate with `python3 -c "import secrets; print(secrets.token_hex(32))"` | — | YES | YES |
| `VITE_SENTRY_DSN` | GlitchTip/Sentry DSN for the frontend (SvelteKit) — injected at build time via Vite. Leave empty to disable. Set after GlitchTip first-run (§4). | — | — | YES |
---
@@ -193,6 +199,29 @@ curl -fsSL https://tailscale.com/install.sh | sh && tailscale up
# files to disk during execution (cleaned up unconditionally on completion).
# A multi-tenant runner would need to switch to stdin-piped env files.
# (See https://docs.gitea.com/usage/actions/quickstart for the register step.)
# Runner workspace directory — required for DooD bind-mount resolution (ADR-015).
# act_runner stores job workspaces here so that docker compose bind mounts resolve
# to real host paths. The path must be identical on the host and inside job containers.
mkdir -p /srv/gitea-workspace
# Observability config permanent directory — the nightly CI job copies
# docker-compose.observability.yml and infra/observability/ here on every run.
# The obs stack is always started from this path, not from the workspace.
# See ADR-016 for why this directory is used instead of a server-pull approach.
mkdir -p /opt/familienarchiv/infra
# Both paths must also appear in the runner service volumes in ~/docker/gitea/compose.yaml:
# volumes:
# - /srv/gitea-workspace:/srv/gitea-workspace
# /opt/familienarchiv does NOT need to be in the runner container's volumes — job
# containers are spawned by the host daemon directly (DooD), so the host path is
# accessible to them as long as runner-config.yaml lists it in valid_volumes + options.
# See runner-config.yaml (workdir_parent + valid_volumes + options) and ADR-015/016.
# ⚠ IMPORTANT: after any change to runner-config.yaml (valid_volumes, options, workdir_parent),
# restart the Gitea Act runner for the new config to take effect:
# docker restart gitea-runner
# Until restarted, job containers are spawned with the old config and any new bind mounts
# (e.g. /opt/familienarchiv) will not be available inside job steps.
```
### 3.2 DNS records
@@ -226,6 +255,7 @@ git.raddatz.cloud A <server IP>
| `GRAFANA_ADMIN_PASSWORD` | both | Grafana `admin` login — generate a strong password |
| `GLITCHTIP_SECRET_KEY` | both | Django secret key — `openssl rand -hex 32` |
| `SENTRY_DSN` | both | GlitchTip project DSN — set after first-run (§4); leave empty to keep Sentry disabled |
| `VITE_SENTRY_DSN` | both | GlitchTip frontend project DSN — set after first-run (§4); leave empty to keep Sentry disabled |
### 3.4 First deploy
@@ -253,6 +283,9 @@ Before the first deploy: rotate `PROD_APP_ADMIN_PASSWORD` to a strong value. Aft
## 4. Logs + observability
> **Developer guide (where to look for what, LogQL queries, trace exploration) → [docs/OBSERVABILITY.md](./OBSERVABILITY.md).**
> This section covers the ops side: starting the stack, env vars, and CI wiring.
### First-response commands
```bash
@@ -275,13 +308,70 @@ docker compose logs --tail=200 <service>
### Observability stack
An observability stack is available via `docker-compose.observability.yml`. Configuration lives under `infra/observability/`. Start it after the main stack is up (which creates `archiv-net`):
An observability stack is available via `docker-compose.observability.yml`. Configuration lives under `infra/observability/`.
#### Dev — start from the workspace
```bash
docker compose up -d # creates archiv-net
docker compose -f docker-compose.observability.yml up -d
```
#### Why the obs stack is managed differently from the main app stack
The main app stack (`docker-compose.prod.yml`) has no config-file bind mounts — its containers read config from env vars and image defaults. The workspace is wiped after each CI run but that does not affect running containers, because they hold no references to workspace paths.
The obs stack is different: `prometheus.yml`, `tempo.yml`, Loki config, Grafana provisioning files, and Promtail config are all bind-mounted from the host filesystem into their containers. If those source paths disappear (workspace wipe), the containers can restart fine until a `docker compose up` is run again — at that point Docker tries to re-resolve the bind-mount source and fails because the workspace path no longer exists.
The fix is to keep the obs compose file and config tree at a **permanent path** that CI copies to on every run but which survives between runs: `/opt/familienarchiv/` (see ADR-016).
#### Production — managed from `/opt/familienarchiv/`
Every CI run (nightly + release) copies `docker-compose.observability.yml` and `infra/observability/` to `/opt/familienarchiv/` before starting the stack. Bind mounts then resolve to `/opt/familienarchiv/infra/observability/…` — a stable path that outlasts any workspace wipe.
**Environment variables** follow the same two-source model as the main stack:
| Source | What it contains | Managed by |
|---|---|---|
| `infra/observability/obs.env` | All non-secret config (ports, URLs, hostnames) | Git — reviewed in PRs |
| `/opt/familienarchiv/obs-secrets.env` | Passwords and secret keys only | CI — written fresh from Gitea secrets on every deploy |
Both files are passed explicitly via `--env-file` to the compose command, so there is no implicit auto-read `.env` and no operator-managed file to keep in sync.
**Non-secret config** (`infra/observability/obs.env`):
| Key | Value | Notes |
|---|---|---|
| `PORT_GRAFANA` | `3003` | Avoids collision with staging frontend on port 3001 |
| `PORT_GLITCHTIP` | `3002` | |
| `PORT_PROMETHEUS` | `9090` | |
| `GF_SERVER_ROOT_URL` | `https://grafana.archiv.raddatz.cloud` | Required for alert email links and OAuth redirects |
| `GLITCHTIP_DOMAIN` | `https://glitchtip.archiv.raddatz.cloud` | Must match the Caddy vhost |
| `POSTGRES_HOST` | `archive-db` | Override if only the staging stack is running |
**Secret keys** (set in Gitea secrets, injected by CI into `obs-secrets.env`):
| Gitea secret | Notes |
|---|---|
| `GRAFANA_ADMIN_PASSWORD` | Strong unique password; shared by nightly and release |
| `GLITCHTIP_SECRET_KEY` | `openssl rand -hex 32`; shared by nightly and release |
| `STAGING_POSTGRES_PASSWORD` / `PROD_POSTGRES_PASSWORD` | Must match the running PostgreSQL container |
To start or restart the obs stack manually on the server (after CI has run at least once):
```bash
docker compose \
-f /opt/familienarchiv/docker-compose.observability.yml \
--env-file /opt/familienarchiv/infra/observability/obs.env \
--env-file /opt/familienarchiv/obs-secrets.env \
up -d --wait --remove-orphans
```
> **Note (manual ops only):** CI clears the destination with `rm -rf` before copying, so deleted files are removed automatically on the next run. If you copy manually with `cp -r` without first removing the directory, stale files from deleted configs will persist until cleaned up:
> ```bash
> rm /opt/familienarchiv/infra/observability/<path-to-removed-file>
> ```
Current services:
| Service | Image | Purpose |
@@ -290,11 +380,11 @@ Current services:
| `obs-node-exporter` | `prom/node-exporter:v1.9.0` | Host-level CPU / memory / disk / network metrics |
| `obs-cadvisor` | `gcr.io/cadvisor/cadvisor:v0.52.1` | Per-container resource metrics |
| `obs-loki` | `grafana/loki:3.4.2` | Log aggregation — receives log streams from Promtail. Port 3100 is `expose`-only (not host-bound). |
| `obs-promtail` | `grafana/promtail:3.4.2` | Log shipping agent — reads all Docker container logs via the Docker socket and forwards them to Loki with `container_name`, `compose_service`, and `compose_project` labels |
| `obs-tempo` | `grafana/tempo:2.7.2` | Distributed trace storage — OTLP gRPC receiver on port 4317, OTLP HTTP on port 4318 (both `archiv-net`-internal). Grafana queries traces on port 3200 (`obs-net`-internal). All ports are `expose`-only (not host-bound). |
| `obs-grafana` | `grafana/grafana-oss:11.6.1` | Unified observability UI — metrics dashboards, log exploration, trace viewer. Bound to `127.0.0.1:${PORT_GRAFANA:-3001}` on the host. |
| `obs-glitchtip` | `glitchtip/glitchtip:v4` | Sentry-compatible error tracker. Receives frontend + backend error events, groups by fingerprint, provides issue UI with stack traces. Bound to `127.0.0.1:${PORT_GLITCHTIP:-3002}`. |
| `obs-glitchtip-worker` | `glitchtip/glitchtip:v4` | Celery + beat worker — processes async GlitchTip tasks (event ingestion, notifications, cleanup). |
| `obs-promtail` | `grafana/promtail:3.4.2` | Log shipping agent — reads all Docker container logs via the Docker socket and forwards them to Loki with `container_name`, `compose_service`, `compose_project`, and `job` labels. The `job` label is mapped from the Docker Compose service name (`com.docker.compose.service`) so that Grafana Loki dashboard queries (`{job="backend"}`, `{job="frontend"}`) work out of the box and the "App" variable dropdown is populated. |
| `obs-tempo` | `grafana/tempo:2.7.2` | Distributed trace storage — OTLP HTTP receiver on port 4318 (`archiv-net`-internal; backend sends traces here). Grafana queries traces on port 3200 (`obs-net`-internal). All ports are `expose`-only (not host-bound). |
| `obs-grafana` | `grafana/grafana-oss:11.6.1` | Unified observability UI — metrics dashboards, log exploration, trace viewer. Bound to `127.0.0.1:${PORT_GRAFANA:-3003}` on the host. |
| `obs-glitchtip` | `glitchtip/glitchtip:6.1.6` | Sentry-compatible error tracker. Receives frontend + backend error events, groups by fingerprint, provides issue UI with stack traces. Bound to `127.0.0.1:${PORT_GLITCHTIP:-3002}`. |
| `obs-glitchtip-worker` | `glitchtip/glitchtip:6.1.6` | Celery + beat worker — processes async GlitchTip tasks (event ingestion, notifications, cleanup). |
| `obs-redis` | `redis:7-alpine` | Celery task broker for GlitchTip. Internal to `obs-net`; no host port exposed. |
| `obs-glitchtip-db-init` | `postgres:16-alpine` | One-shot init container. Creates the `glitchtip` database on the existing `archive-db` PostgreSQL instance if it does not already exist. Runs at stack startup; exits cleanly once done. |
@@ -302,7 +392,7 @@ Current services:
| Item | Value |
|---|---|
| URL | `http://localhost:3001` (or `http://localhost:$PORT_GRAFANA`) |
| URL | `http://localhost:3003` (or `http://localhost:$PORT_GRAFANA`) |
| Username | `admin` |
| Password | `$GRAFANA_ADMIN_PASSWORD` (default: `changeme`**change before exposing to a network**) |
@@ -332,7 +422,7 @@ docker exec obs-loki wget -qO- \
**Prefer `compose_service` over `container_name` in LogQL queries**`container_name` differs between dev (`archive-backend`) and prod (`archiv-production-backend-1`), while `compose_service` is stable (`backend`, `db`, `minio`, etc.).
Prometheus port `9090` and Grafana port `3001` are bound to `127.0.0.1` on the host. No other observability ports are host-bound.
Prometheus port `9090` and Grafana port `3003` (default; configurable via `PORT_GRAFANA`) are bound to `127.0.0.1` on the host. No other observability ports are host-bound.
#### GlitchTip

180
docs/OBSERVABILITY.md Normal file
View File

@@ -0,0 +1,180 @@
# Observability Guide
> **Ops reference (starting the stack, env vars, CI wiring) → [DEPLOYMENT.md §4](./DEPLOYMENT.md#4-logs--observability).**
> This file is for developers: what signal lives where, how to reach it, and what to look for.
## Where to look for what
| I want to… | Go to |
|---|---|
| See the last N log lines from the backend | `docker compose logs --tail=100 backend` |
| Search logs by keyword across time | Grafana → Explore → Loki |
| Understand why an HTTP request failed | Grafana → Explore → Loki → filter by `traceId` → follow link to Tempo |
| See a full distributed trace (DB queries, HTTP calls) | Grafana → Explore → Tempo → search by service or trace ID |
| Check JVM heap / GC / thread count | Grafana → Dashboards → Spring Boot Observability |
| Check HTTP error rate or p95 latency | Grafana → Dashboards → Spring Boot Observability |
| Check host CPU / memory / disk | Grafana → Dashboards → Node Exporter Full |
| See grouped application errors with stack traces | GlitchTip |
| Check if the backend is healthy | `curl http://localhost:8081/actuator/health` (on the server) |
| Check what Prometheus is scraping | `curl http://localhost:9090/api/v1/targets` (on the server) |
## Access
| Tool | External URL | Who it's for |
|---|---|---|
| Grafana | `https://grafana.archiv.raddatz.cloud` | Logs, metrics, traces — the primary observability UI |
| GlitchTip | `https://glitchtip.archiv.raddatz.cloud` | Grouped errors with stack traces and release tracking |
Loki, Tempo, and Prometheus have no external URL. They are internal services, accessible only through Grafana (or via SSH tunnel — see below).
## Logs (Loki)
Logs reach Loki via Promtail, which reads all Docker container logs from the Docker socket and ships them with labels derived from Docker Compose metadata.
### Labels available in every log line
| Label | What it contains | Example |
|---|---|---|
| `job` | Compose service name | `backend`, `frontend`, `db` |
| `compose_service` | Same as `job` | `backend` |
| `compose_project` | Compose project name | `archiv-staging`, `archiv-production` |
| `container_name` | Docker container name | `archiv-staging-backend-1` |
| `filename` | Docker log source | `/var/lib/docker/containers/…` |
**Use `job` in LogQL queries** — it is stable across dev, staging, and production. `container_name` changes between environments.
### Common LogQL queries in Grafana Explore
```logql
# All backend logs
{job="backend"}
# Backend ERROR and WARN lines only
{job="backend"} |= "ERROR" or {job="backend"} |= "WARN"
# All logs for a specific request (paste a traceId from a log line)
{job="backend"} |= "3fa85f64-5717-4562-b3fc-2c963f66afa6"
# Log lines containing a specific exception class
{job="backend"} |~ "DomainException|NullPointerException"
# Frontend logs
{job="frontend"}
# Database (slow query log, if enabled)
{job="db"}
```
### Log → Trace correlation
Spring Boot writes the active `traceId` into every log line when a request is being processed:
```
2026-05-16 ... INFO [Familienarchiv,3fa85f64...,1b2c3d4e] o.r.f.document.DocumentService : ...
```
In Grafana Explore → Loki, log lines with a `traceId` field show a **Tempo** link. Clicking it opens the full trace in Explore → Tempo without copying and pasting IDs.
This linking is configured in the Loki datasource provisioning via the `traceId` derived field regex. No manual setup required.
## Traces (Tempo)
The backend sends traces to Tempo via OTLP HTTP (port 4318). Every inbound HTTP request and every JPA query produces a span. Spans are linked into traces by the propagated `traceId` header.
### Finding a trace in Grafana
**Option A — from a log line:**
1. Grafana → Explore → select *Loki* datasource
2. Query `{job="backend"}` and find the failing request
3. Click the **Tempo** link in the log line (appears when `traceId` is present)
**Option B — by service:**
1. Grafana → Explore → select *Tempo* datasource
2. Query type: **Search**
3. Service name: `familienarchiv-backend`
4. Filter by HTTP status, duration, or operation name as needed
**Option C — by trace ID:**
1. Grafana → Explore → select *Tempo* datasource
2. Query type: **TraceQL** or **Trace ID**
3. Paste the trace ID
### What each span type tells you
| Root span name pattern | What it covers |
|---|---|
| `GET /api/documents`, `POST /api/documents` | Full HTTP request lifecycle |
| `SELECT archiv.*` | A single JPA/JDBC query inside that request |
| `HikariPool.getConnection` | Connection pool wait time |
A slow `SELECT` span inside an otherwise fast HTTP span pinpoints a missing index. A slow `HikariPool.getConnection` span indicates connection pool exhaustion.
### Sampling rate
- **Dev**: 100% of requests are traced (`management.tracing.sampling.probability: 1.0` in `application.yaml`)
- **Staging / Production**: 10% (`MANAGEMENT_TRACING_SAMPLING_PROBABILITY=0.1` in `docker-compose.prod.yml`)
To find a trace for a specific request in staging/production, either increase the sampling rate temporarily or trigger the request multiple times.
## Metrics (Prometheus → Grafana)
Prometheus scrapes the backend management endpoint every 15 s:
```
Target: backend:8081/actuator/prometheus
Labels: job="spring-boot", application="Familienarchiv"
```
All Spring Boot metrics carry the `application="Familienarchiv"` tag, which is how the Grafana Spring Boot Observability dashboard (ID 17175) filters to this service.
### Useful Prometheus queries (run on the server or via Grafana Explore → Prometheus)
```promql
# HTTP error rate (5xx) as a fraction of all requests
sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
/ sum(rate(http_server_requests_seconds_count[5m]))
# p95 response time
histogram_quantile(0.95, sum by (le) (
rate(http_server_requests_seconds_bucket[5m])
))
# JVM heap used
jvm_memory_used_bytes{area="heap", application="Familienarchiv"}
# Active DB connections
hikaricp_connections_active
```
## Errors (GlitchTip)
GlitchTip receives errors from both the backend (via Sentry Java SDK) and the frontend (via Sentry JavaScript SDK). It groups events by fingerprint, tracks first/last seen times, and links to the release that introduced the error.
GlitchTip complements Loki: use GlitchTip when you need **grouped, de-duplicated errors with stack traces and release attribution**; use Loki when you need **raw log lines with full context** or want to search across all log levels.
## Direct API access (debugging only)
Loki and Tempo bind no host ports. To reach them directly from your laptop, use an SSH tunnel through the server:
```bash
# Loki API on localhost:3100 (then query via curl or logcli)
ssh -L 3100:172.20.0.x:3100 root@raddatz.cloud
# Replace 172.20.0.x with the obs-loki container IP:
# docker inspect obs-loki --format '{{.NetworkSettings.Networks.archiv-obs-net.IPAddress}}'
# Tempo API on localhost:3200 (then query via curl or tempo-cli)
ssh -L 3200:172.20.0.x:3200 root@raddatz.cloud
```
In practice, Grafana Explore covers all common debugging workflows without needing direct API access.
## Signal summary
| Signal | Source | Transport | Storage | UI |
|---|---|---|---|---|
| Application logs | Spring Boot stdout → Docker log driver | Promtail → Loki push API | Loki | Grafana Explore → Loki |
| Distributed traces | Spring Boot OTel agent | OTLP HTTP → Tempo:4318 | Tempo | Grafana Explore → Tempo |
| JVM + HTTP metrics | Spring Actuator `/actuator/prometheus` | Prometheus pull (15 s) | Prometheus | Grafana dashboards |
| Host metrics | node-exporter | Prometheus pull | Prometheus | Grafana → Node Exporter Full |
| Container metrics | cAdvisor | Prometheus pull | Prometheus | Grafana (via Prometheus datasource) |
| Application errors | Sentry SDK | HTTP POST → GlitchTip ingest | GlitchTip DB | GlitchTip UI |

View File

@@ -0,0 +1,69 @@
# ADR-015: DooD workspace bind mount for Compose file bind-mount resolution
## Status
Accepted
## Context
The deploy workflows (`.gitea/workflows/nightly.yml`, `release.yml`) run job steps inside Docker containers via Docker-out-of-Docker (DooD): the Gitea runner mounts the host Docker socket, and act_runner spawns sibling containers for each job.
When a job step calls `docker compose -f docker-compose.observability.yml up`, Docker Compose resolves relative bind-mount sources against `$(pwd)` inside the job container and passes the resulting absolute paths to the **host** daemon. For example, `./infra/observability/prometheus/prometheus.yml` becomes `/some/path/infra/observability/prometheus/prometheus.yml`, and the host daemon tries to bind-mount that path from the **host filesystem**.
In the default DooD setup (`runner-config.yaml` with only `valid_volumes: ["/var/run/docker.sock"]`), job container workspaces live in the act_runner overlay2 layer. The host has no corresponding directory at the job container's `$(pwd)` path, so the daemon auto-creates an empty directory in its place. The container then fails to start because the mount target was expected to be a file, not a directory:
```
error mounting "…/prometheus/prometheus.yml" to rootfs at "/etc/prometheus/prometheus.yml": not a directory
```
This affected all five config file bind mounts in `docker-compose.observability.yml`.
## Decision
Configure act_runner to store job workspaces on a real host path (`/srv/gitea-workspace`) and mount that path into both the runner container and every job container at the **same absolute path**. The identity of the host path and container path is the key constraint: Compose resolves to an absolute path and hands it to the host daemon, which looks for that exact path on the host filesystem.
**runner-config.yaml changes:**
```yaml
container:
workdir_parent: /srv/gitea-workspace
valid_volumes:
- "/var/run/docker.sock"
- "/srv/gitea-workspace"
options: "-v /srv/gitea-workspace:/srv/gitea-workspace"
```
**Runner compose.yaml change** (host side — not in this repo):
```yaml
runner:
volumes:
- /srv/gitea-workspace:/srv/gitea-workspace
```
With this in place, `$(pwd)` inside a job container resolves to `/srv/gitea-workspace/<owner>/<repo>/`, which is a real directory on the host. Compose-managed bind mounts from that directory work without any additional steps.
## Alternatives Considered
| Alternative | Why rejected |
|---|---|
| **overlay2 `MergedDir` sync via privileged nsenter** (the previous approach, see PR #599 v1) | Required `--privileged --pid=host` (effective root on the host) plus fragile overlay2 driver assumption. Introduced stale-file risk on the host and a second stable path (`/srv/familienarchiv-*/obs-configs`) to maintain separately from the source tree. Replaced by this ADR. |
| **Build configs into a dedicated Docker image** (pattern used for MinIO bootstrap, see `infra/minio/Dockerfile`) | Viable for static files that change infrequently. Requires a build step and an image rebuild every time a config changes. Appropriate for bootstrap scripts; too heavy for frequently-tuned observability configs. |
| **Add workspace directory to runner-config `valid_volumes` only** (without `workdir_parent`) | `valid_volumes` whitelists paths that workflow steps may reference, but does not change where act_runner stores workspaces. Without `workdir_parent`, the workspace would still be in overlay2 and the bind-mount resolution problem would remain. |
| **Map workspace under a different host path than container path** (e.g. host `/srv/workspace`, container `/workspace`) | Compose resolves to the container-internal path (e.g. `/workspace/…`) and passes that to the host daemon. The host daemon interprets the source as a host path. If host `/workspace` does not exist, the daemon creates an empty directory — the original bug. The paths must be identical. |
## Consequences
- `/srv/gitea-workspace` must exist on the VPS before the runner starts. The directory was created as part of this change; it is not created automatically.
- The runner container's `compose.yaml` (maintained outside this repo at `~/docker/gitea/compose.yaml` on the VPS) must include the `- /srv/gitea-workspace:/srv/gitea-workspace` volume line. This is an out-of-band operational dependency; the prerequisite is documented in `runner-config.yaml`.
- `workdir_parent` applies to all jobs on this runner. Any future workflow that calls `docker compose` with relative bind mounts benefits automatically without further configuration.
- Job workspaces persist across runs under `/srv/gitea-workspace`. act_runner manages per-run subdirectory cleanup. Orphaned directories from interrupted runs should be cleaned up manually if disk space becomes a concern.
- Workflows that previously relied on `OBS_CONFIG_DIR` env var or the `obs-configs` stable path on the host no longer need those. Both were removed in this PR.
- This pattern does **not** apply to the `nsenter`-based Caddy reload step (ADR-012), which manages a host systemd service — a different problem class with no bind-mount equivalent.
## References
- ADR-011 — single-tenant runner trust model
- ADR-012 — nsenter via privileged container for host service management
- Issue #598 — original observability stack bind-mount failure
- `runner-config.yaml``workdir_parent`, `valid_volumes`, `options`

View File

@@ -0,0 +1,57 @@
# ADR-016: Observability stack co-location at `/opt/familienarchiv/` with CI-push config sync
## Status
Accepted
## Context
Issue #601 established that the observability stack must survive Gitea CI workspace wipes between nightly runs. When the nightly job completes, act_runner deletes the job workspace. Any Docker container that bind-mounts a config file from a workspace path (`/srv/gitea-workspace/…/infra/observability/prometheus/prometheus.yml`) then references a path that no longer exists on the host. On the next nightly run, Docker Compose either auto-creates an empty directory in its place (causing the container to fail to start because a file mount receives a directory) or finds a stale file from a previous run if the workspace happened to land at the same path.
ADR-015 solved the workspace bind-mount resolution problem: job workspaces are stored at `/srv/gitea-workspace` so `$(pwd)` inside the job container maps to a real host path. But it did not address persistence: the workspace is still wiped after the job, so bind mounts from workspace-relative paths remain fragile across runs.
### Decision drivers
1. Bind-mount sources must point to a host path that persists indefinitely, not to a path that disappears after each CI run.
2. Config files must reflect the committed state of the repo after every nightly run (no manual sync steps).
3. Secrets must not be written to the workspace or to any path managed by CI; they must survive independently of deployments.
4. The solution must not introduce new infrastructure dependencies (no SSH access from CI, no external registry, no additional server-side daemon).
### Alternatives considered
**A: Server-pull model** — a systemd timer or cron job on the server does `git pull` from the repo into `/opt/familienarchiv/` and then runs `docker compose up`. Rejected because: (1) requires git credentials on the server and a registered deploy key, (2) adds a second deployment mechanism that diverges from the CI-push model used for the main app stack, (3) timing coupling — the server pull must complete before CI's health checks run, requiring polling or a webhook.
**B: Separate directory (e.g. `/opt/obs/`)** — keeps obs configs isolated from the app stack. Rejected because: (1) the main app compose files are already in `/opt/familienarchiv/` (managed the same way), and (2) GlitchTip shares the `archive-db` PostgreSQL instance and `archiv-net` Docker network — it is architecturally part of the same deployment unit, not a separate one. Co-location reflects the actual coupling.
**C: Named Docker configs (Swarm)** — Docker Swarm supports first-class config objects that persist in the cluster. Rejected because the project does not use Swarm and introducing it solely for config persistence is a disproportionate dependency.
## Decision
The observability stack is co-located with the main application deployment at `/opt/familienarchiv/`:
- `docker-compose.observability.yml``/opt/familienarchiv/docker-compose.observability.yml`
- `infra/observability/``/opt/familienarchiv/infra/observability/`
Both the nightly CI job (`nightly.yml`) and the release job (`release.yml`) copy these files from the workspace checkout to `/opt/familienarchiv/` using `cp -r` on every run (CI-push model). Containers always read config from the permanent location; a workspace wipe has no effect on running containers.
Environment variables follow a two-source model:
- `infra/observability/obs.env` (git-tracked, non-secret): all non-sensitive config — host ports, public URLs (`GLITCHTIP_DOMAIN`, `GF_SERVER_ROOT_URL`), and the default `POSTGRES_HOST`. Changes go through PR review. No credentials.
- `/opt/familienarchiv/obs-secrets.env` (CI-written, per-deploy): passwords and secret keys only (`GRAFANA_ADMIN_PASSWORD`, `GLITCHTIP_SECRET_KEY`, `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`), injected fresh from Gitea secrets on every nightly and release deploy. Gitea is the single source of truth for secrets — rotating a secret takes effect on the next deploy without manual server action.
Both files are passed explicitly via `--env-file` to every obs compose command (config dry-run and `up`). There is no implicit auto-read `.env`. The required key inventory is documented in `docs/DEPLOYMENT.md §4`.
The CI runner mounts `/opt/familienarchiv` as a bind mount into job containers (see `runner-config.yaml`). This requires a one-time `mkdir -p /opt/familienarchiv/infra` on the server and a runner restart after updating `runner-config.yaml` (see ADR-015 and `docs/DEPLOYMENT.md §3.1`).
## Consequences
**Positive:**
- Bind-mount sources survive workspace wipes by definition — they are on a persistent host path.
- Config is always in sync with the repo after each nightly run.
- No new infrastructure dependencies; the CI-push model mirrors how the main app stack is deployed.
- Secret rotation requires no manual server action — Gitea secrets are the authoritative store; `obs-secrets.env` is rewritten from scratch on every deploy so a secret change takes effect on the next nightly or release run.
**Negative:**
- `cp -r` does not remove deleted files; a config file removed from the repo persists in `/opt/familienarchiv/infra/observability/` until manually deleted. Acceptable for this project's change frequency. A `rsync -a --delete` would give a clean mirror if this becomes a problem.
- Mounting `/opt/familienarchiv/` into CI job containers expands the blast radius of a compromised workflow step — a malicious step could overwrite app compose files and Caddy config. Acceptable because the runner is single-tenant (trusted code only). See `runner-config.yaml` security comment.
- Runner must be restarted (`systemctl restart gitea-runner`) after any change to `runner-config.yaml` for the new mount to take effect.

View File

@@ -0,0 +1,48 @@
# ADR-017: Spring Boot 4.0 management port shares the main security filter chain
## Status
Accepted
## Context
The Familienarchiv backend runs Spring Boot Actuator on a dedicated management port (8081) so that Caddy never proxies `/actuator/*` requests and Prometheus can reach the scrape endpoint directly inside `archiv-net`.
In earlier Spring Boot versions (< 4.0), the management server ran in an isolated child application context whose security was governed independently by `ManagementWebSecurityAutoConfiguration`. The main app's `SecurityConfig` filter chain (port 8080) never intercepted requests arriving on port 8081.
In Spring Boot 4.0 with Jetty, this isolation was removed. The management server now traverses the **same** Spring Security `FilterChainProxy` as the main application. Concretely:
- Any `SecurityFilterChain` bean in the application context is evaluated for requests arriving on the management port.
- There is no longer a separate "management security" child context.
This was discovered when Prometheus began receiving HTTP 401 responses from `/actuator/prometheus` despite the endpoint being exposed and the `micrometer-registry-prometheus` dependency being present. Prometheus rejected these responses with `received unsupported Content-Type "text/html"` because the main filter chain's form-login `DelegatingAuthenticationEntryPoint` was redirecting unauthenticated requests to `/login` (302 → HTML).
A secondary issue: Spring Boot 4.0 no longer auto-enables Prometheus metrics export — `management.prometheus.metrics.export.enabled` must be set explicitly, and the Prometheus scrape endpoint requires `spring-boot-starter-micrometer-metrics` (a new starter that was split out in Spring Boot 4.0).
## Decision
1. **Dedicated management `SecurityFilterChain`** scoped to `/actuator/**` at `@Order(1)` (highest precedence). This chain:
- `permitAll()` for `/actuator/health` and `/actuator/prometheus` — required for Docker health checks and unauthenticated Prometheus scraping.
- `authenticated()` for all other actuator endpoints — blocks `/actuator/metrics`, `/actuator/info`, etc. without credentials.
- Uses an explicit `401` entry point (not form-login redirect) so that API clients — including Prometheus — receive a machine-readable status code rather than an HTML redirect.
- No CSRF, no form login.
2. **Belt-and-suspenders `permitAll()` in the main `SecurityFilterChain`** for `/actuator/health` and `/actuator/prometheus`, in case a future configuration change causes these paths to escape the management chain's `securityMatcher`.
3. **Network isolation as the outer defense boundary.** Port 8081 is not published in `docker-compose.yml` and is not routed through Caddy. Only services inside `archiv-net` (primarily Prometheus and the Docker health checker) can reach the management port.
## Alternatives rejected
- **Exclude `ManagementWebSecurityAutoConfiguration`:** This auto-configuration no longer exists in Spring Boot 4.0. Exclusion is not applicable.
- **Keep `SecurityConfig` as the sole filter chain without `@Order(1)` management chain:** The main chain's form-login `DelegatingAuthenticationEntryPoint` redirects browser-like clients to `/login` (302). Prometheus and automated health check clients cannot follow this redirect, so the endpoint would be unreachable without a dedicated chain that returns plain 401 or 200.
- **Per-endpoint `@Order(1)` filter chain using `EndpointRequest.toAnyEndpoint()`:** The `spring-boot-security` artifact that provides `EndpointRequest` is not a transitive dependency of `spring-boot-starter-actuator` in Spring Boot 4.0. Using a path-based `securityMatcher("/actuator/**")` achieves the same scoping without an extra dependency.
## Consequences
- All actuator endpoints on port 8081 that are not explicitly `permitAll()`-ed require HTTP Basic credentials. Without valid credentials, the response is 401 (not a redirect).
- Adding a new actuator endpoint to `management.endpoints.web.exposure.include` implicitly protects it via `anyRequest().authenticated()` in the management chain — no additional `permitAll()` needed unless intentional.
- A regression test (`ActuatorPrometheusIT`) verifies:
- `/actuator/prometheus` returns 200 without credentials.
- `/actuator/metrics` returns 401 without credentials.
- Prometheus metric names are present in the response body.
- If port 8081 is ever accidentally published in `docker-compose.yml`, actuator endpoints other than health and prometheus are still protected by HTTP Basic. This reduces (but does not eliminate) the risk of inadvertent exposure.

View File

@@ -0,0 +1,86 @@
# ADR-018: GlitchTip frontend error tracking via @sentry/sveltekit
**Date:** 2026-05-17
**Status:** Accepted
**Deciders:** Marcel Raddatz
---
## Context
The Familienarchiv had no client-side error reporting. When a user encountered a crash
or unhandled error in the SvelteKit frontend, there was no way for the operator to
observe it — errors were invisible until a user manually reported them. A GlitchTip
instance (self-hosted, Sentry-compatible) was already running as part of the
observability stack (`docker-compose.observability.yml`). The backend already reported
server-side errors to it.
We needed a way to:
1. Capture frontend errors automatically and route them to GlitchTip.
2. Give users a visible error identifier they can include in a support message.
3. Do this without leaking personally identifiable information (PII) from the family
archive — documents contain personal histories, names, and relationships.
---
## Decision
Use `@sentry/sveltekit` (the official Sentry SDK for SvelteKit) to:
- Initialise with `sendDefaultPii: false` on both `hooks.server.ts` and `hooks.client.ts`.
- Pass a callback to `Sentry.handleErrorWithSentry()` that returns
`{ message, errorId }` where `errorId` is `Sentry.lastEventId()` when Sentry
captured the event, or a fresh `crypto.randomUUID()` as fallback.
- Display the `errorId` on the `+error.svelte` page so users can include it in a
report to the operator.
The SDK is initialised with `enabled: !!import.meta.env.VITE_SENTRY_DSN` so that
development and CI builds without a DSN configured do not send any events.
`VITE_SENTRY_DSN` is a write-only ingest key — it can POST events to GlitchTip but
cannot read them. It is safe to include in the client bundle per the Sentry security
model; it does not require rotation like a password.
---
## Alternatives considered
**Sentry SaaS** — rejected. The archive contains private family documents and personal
history. Sending error events with stack traces to a US-hosted third party is
inconsistent with the project's data-minimisation posture. Self-hosted GlitchTip on
the same Hetzner VPS keeps all data on infrastructure the operator controls.
**Custom error logging endpoint** — rejected. The @sentry/sveltekit SDK handles
SvelteKit's hook lifecycle, source-map upload, and event grouping automatically.
Reimplementing this would cost significant engineering time for no benefit.
**Log-only (no user-visible errorId)** — rejected. Without a visible error ID, users
can only describe what happened in natural language, making it hard to correlate a
report with a specific GlitchTip event. The `errorId` closes this gap at negligible UI
cost.
---
## Consequences
**Positive:**
- Frontend errors are now observable without requiring user reports.
- Users can provide an `errorId` that maps directly to a GlitchTip event.
- `sendDefaultPii: false` ensures names, IPs, and cookie values are not included in
captured events.
- `tracesSampleRate: 0.1` limits trace volume to 10% of transactions, keeping
GlitchTip load low on the shared VPS.
**Negative / trade-offs:**
- The `@sentry/sveltekit` SDK is now a production dependency. SDK updates must be
reviewed for changes to the default PII scrubbing behaviour.
- The `handleError` callback in both hooks returns a hardcoded English message
(`'An unexpected error occurred'`). This bypasses Paraglide i18n — the error page
will always show English text when the hooks are active, regardless of the user's
locale. This is acceptable because: (a) the error page is a last-resort fallback
not part of normal UX, (b) the `errorId` is the actionable information, not the
message text. A future ADR may address this if internationalised error messages
become a requirement.
- `Sentry.lastEventId()` returns `undefined` when Sentry did not capture the event
(e.g. DSN not configured). The `crypto.randomUUID()` fallback guarantees an `errorId`
is always present, but that UUID will not appear in GlitchTip.

View File

@@ -8,9 +8,11 @@ Person(member, "Family Member", "Access by administrator invite. Searches, brows
System(familienarchiv, "Familienarchiv", "Web application for digitising, organising, and searching family documents")
System_Ext(mail, "Email Service", "SMTP server. Delivers notification emails (mentions, replies) and password-reset links.")
System_Ext(glitchtip, "GlitchTip", "Self-hosted error tracking (Sentry-compatible). Receives frontend and backend error events with stack traces.")
Rel(admin, familienarchiv, "Manages via browser", "HTTPS")
Rel(member, familienarchiv, "Searches, reads, and transcribes via browser", "HTTPS")
Rel(familienarchiv, mail, "Sends notification and password-reset emails (optional)", "SMTP")
Rel(familienarchiv, glitchtip, "Sends error events with errorId and stack trace", "HTTPS")
@enduml

View File

@@ -17,16 +17,16 @@ System_Boundary(archiv, "Familienarchiv (Docker Compose)") {
Container(mc, "Bucket / Service-Account Init", "MinIO Client (mc)", "One-shot container on startup. Idempotent: creates the archive bucket, the archiv-app service account, and attaches the readwrite policy.")
}
System_Boundary(observability, "Observability Stack (docker-compose.observability.yml)") {
System_Boundary(observability, "Observability Stack (/opt/familienarchiv/docker-compose.observability.yml)") {
Container(prometheus, "Prometheus", "prom/prometheus:v3.4.0", "Scrapes metrics from backend management port 8081 (/actuator/prometheus), node-exporter, and cAdvisor. Retention: 30 days.")
Container(node_exporter, "Node Exporter", "prom/node-exporter:v1.9.0", "Host-level CPU, memory, disk, and network metrics.")
Container(cadvisor, "cAdvisor", "gcr.io/cadvisor/cadvisor:v0.52.1", "Per-container resource metrics.")
Container(loki, "Loki", "grafana/loki:3.4.2", "Stores log streams from all containers.")
Container(promtail, "Promtail", "grafana/promtail:3.4.2", "Ships Docker container logs to Loki via Docker SD.")
Container(tempo, "Tempo", "grafana/tempo:2.7.2", "Distributed trace storage. OTLP gRPC receiver on port 4317 (archiv-net). Grafana queries traces on port 3200 (obs-net). All ports internal only.")
Container(tempo, "Tempo", "grafana/tempo:2.7.2", "Distributed trace storage. OTLP HTTP receiver on port 4318 (archiv-net). Grafana queries traces on port 3200 (obs-net). All ports internal only.")
Container(grafana, "Grafana", "grafana/grafana-oss:11.6.1", "Unified observability UI — dashboards, logs, traces. Datasources (Prometheus, Loki, Tempo) and three dashboards are auto-provisioned.")
Container(glitchtip, "GlitchTip", "glitchtip/glitchtip:v4", "Sentry-compatible error tracker — web process. Receives frontend + backend error events, groups by fingerprint, provides issue UI with stack traces.")
Container(obs_glitchtip_worker, "GlitchTip Worker", "glitchtip/glitchtip:v4", "Celery + beat worker — async event ingestion, notifications, cleanup.")
Container(glitchtip, "GlitchTip", "glitchtip/glitchtip:6.1.6", "Sentry-compatible error tracker — web process. Receives frontend + backend error events, groups by fingerprint, provides issue UI with stack traces.")
Container(obs_glitchtip_worker, "GlitchTip Worker", "glitchtip/glitchtip:6.1.6", "Celery + beat worker — async event ingestion, notifications, cleanup.")
Container(obs_redis, "Redis", "redis:7-alpine", "Celery task queue for GlitchTip async workers.")
}
@@ -42,7 +42,7 @@ Rel(backend, mail, "Sends notification and password-reset emails (optional)", "S
Rel(ocr, storage, "Fetches PDF via presigned URL", "HTTP / S3 presigned")
Rel(mc, storage, "Bootstraps bucket + service account on startup", "MinIO Client CLI")
Rel(promtail, loki, "Pushes log streams", "HTTP/Loki push API")
Rel(backend, tempo, "Sends distributed traces via OTLP", "gRPC / OTLP / port 4317 (archiv-net)")
Rel(backend, tempo, "Sends distributed traces via OTLP", "HTTP / OTLP / port 4318 (archiv-net)")
Rel(grafana, prometheus, "Queries metrics", "HTTP 9090")
Rel(grafana, loki, "Queries logs", "HTTP 3100")
Rel(grafana, tempo, "Queries traces", "HTTP 3200")

View File

@@ -19,6 +19,39 @@ Both containers live in the `gitea_gitea` Docker network on the VPS. The runner
The `gitea-runner` container mounts the host Docker socket (`/var/run/docker.sock`). When a workflow job runs, act_runner spawns a **sibling container** for each job. That job container also gets the Docker socket mounted (via `valid_volumes` in `runner-config.yaml`), enabling `docker compose` calls in workflow steps.
### Workspace bind-mount setup (DooD path resolution)
When a workflow step calls `docker compose up` with relative bind-mount sources (e.g. `./infra/observability/prometheus/prometheus.yml`), Compose resolves them against `$(pwd)` inside the job container and passes the resulting **absolute path** to the host Docker daemon. The host daemon then tries to bind-mount that path from the **host filesystem**.
In the default DooD setup the job container's workspace lives in the act_runner overlay2 layer — the host has no directory at that path, auto-creates an empty one, and the container fails with:
```
error mounting "…/prometheus/prometheus.yml" to rootfs at "/etc/prometheus/prometheus.yml": not a directory
```
**Solution (ADR-015):** store job workspaces on a real host path and mount it at the **same absolute path** inside the runner and every job container. `runner-config.yaml` configures this via `workdir_parent`, `valid_volumes`, and `options`.
**One-time host setup** (required on any fresh VPS):
```bash
mkdir -p /srv/gitea-workspace
# Then add to the runner service in ~/docker/gitea/compose.yaml:
# volumes:
# - /srv/gitea-workspace:/srv/gitea-workspace
# Restart the runner container for the change to take effect.
```
The path `/srv/gitea-workspace` is the canonical workspace root. It must be identical on the host and inside job containers — if the paths differ, Compose still resolves to the container-internal path, which the host daemon cannot find (the original bug).
**Disk management:** act_runner cleans per-run subdirectories on completion. Orphaned directories from interrupted runs accumulate under `/srv/gitea-workspace` and should be pruned manually if disk space becomes a concern:
```bash
# List workspace directories older than 7 days
find /srv/gitea-workspace -mindepth 3 -maxdepth 3 -type d -mtime +7
```
---
### Running host-level commands from CI (nsenter pattern)
Job containers are unprivileged and do not share the host's PID/mount/network namespaces. Commands like `systemctl` that target the host daemon are therefore unavailable by default. When a workflow step needs to manage a host service (e.g. `systemctl reload caddy`), it uses the Docker socket to spin up a **privileged sibling container** in the host PID namespace:
@@ -108,6 +141,33 @@ nsenter: failed to execute /bin/systemctl: No such file or directory
The first error means the Docker socket is not mounted into the job container — check `valid_volumes` in `/root/docker/gitea/runner-config.yaml` on the VPS. The second means the Alpine image is running but cannot enter the host mount namespace; verify `--privileged` and `--pid=host` are both present in the workflow step.
**Failure mode 4 — workspace bind-mount not configured (observability stack or any compose-with-file-mounts job)**
Symptom in CI log:
```
Error response from daemon: error while creating mount source path "…/prometheus/prometheus.yml": mkdir …: not a directory
```
Or the service starts but immediately crashes because a config file was mounted as an empty directory.
Cause: `/srv/gitea-workspace` does not exist on the host, or the runner container's `compose.yaml` is missing the `- /srv/gitea-workspace:/srv/gitea-workspace` volume line.
Diagnosis:
```bash
ssh root@<vps>
ls -la /srv/gitea-workspace # must exist and be a directory
docker inspect gitea-runner | grep -A5 Mounts # must show /srv/gitea-workspace
```
Recovery:
```bash
mkdir -p /srv/gitea-workspace
# Add volume line to runner compose.yaml, then:
docker compose -f ~/docker/gitea/compose.yaml up -d gitea-runner
```
See `docs/DEPLOYMENT.md §3.1` and ADR-015 for the full setup rationale.
---
## Gitea vs GitHub Actions Differences

View File

@@ -12,11 +12,11 @@ The original spec in this doc proposed an overlay pattern (`docker compose -f do
---
## Observability stack — not yet deployed
## Observability stack
Prometheus, Loki, Grafana, Alertmanager, Uptime Kuma, GlitchTip and ntfy are **not** part of the production deployment that #497 landed. They are tracked as follow-up issue #498.
The observability stack (Prometheus, Loki, Grafana, Tempo, GlitchTip) ships as a separate `docker-compose.observability.yml` alongside the main stack. Configuration lives under `infra/observability/`.
When that lands the observability containers will join `docker-compose.prod.yml` under a dedicated profile so they can be operated alongside the application stack without affecting the application containers' restart cycle.
→ See [docs/DEPLOYMENT.md §4](../DEPLOYMENT.md#4-logs--observability) for the full setup procedure, service URLs, first-run steps, and env var reference.
---

View File

@@ -38,14 +38,16 @@ export default defineConfig(
'no-undef': 'off',
// This rule is designed for Svelte 5's own routing system using resolve().
// In SvelteKit, <a href> and goto() from $app/navigation are the correct patterns — resolve() is not needed.
'svelte/no-navigation-without-resolve': 'off'
'svelte/no-navigation-without-resolve': 'off',
// Prevents accidental console.log left in source. console.warn and console.error
// are still permitted for intentional server-side logging (e.g. hooks.server.ts).
'no-console': ['error', { allow: ['warn', 'error'] }]
}
},
{
files: ['**/*.svelte', '**/*.svelte.ts', '**/*.svelte.js'],
languageOptions: {
parserOptions: {
projectService: true,
extraFileExtensions: ['.svelte'],
parser: ts.parser,
svelteConfig
@@ -72,6 +74,13 @@ export default defineConfig(
]
}
},
{
// E2E tests use console.log for diagnostic output — allow it there.
files: ['e2e/**'],
rules: {
'no-console': 'off'
}
},
{
files: ['**/*.spec.ts', '**/*.test.ts'],
rules: {

View File

@@ -473,7 +473,7 @@
"dashboard_reader_stats_persons_short": "Pers.",
"dashboard_reader_stats_stories_short": "Gesch.",
"dashboard_reader_draft_meta": "Entwurf · zuletzt bearbeitet {relative}",
"dashboard_resume_label": "Zuletzt geöffnet:",
"dashboard_resume_label": "Weiter, wo du aufgehört hast",
"dashboard_resume_fallback": "Unbekanntes Dokument",
"doc_status_placeholder": "Platzhalter",
"doc_status_uploaded": "Hochgeladen",
@@ -773,19 +773,15 @@
"admin_invite_created_title": "Einladung erstellt",
"admin_invite_created_desc": "Teile diesen Link mit der einzuladenden Person:",
"admin_invite_revoke_confirm": "Einladung wirklich widerrufen?",
"greeting_morning": "Guten Morgen, {name}.",
"greeting_day": "Hallo, {name}.",
"greeting_evening": "Guten Abend, {name}.",
"dashboard_resume_label": "Weiter, wo du aufgehört hast",
"dashboard_blocks": "{count} Abschnitte",
"dashboard_resume_cta": "Weitertranskribieren",
"dashboard_resume_other": "oder anderen Brief wählen",
"dashboard_empty_title": "Noch kein Dokument begonnen",
"dashboard_empty_body": "Wähle ein Dokument aus dem Archiv, um mit der Transkription zu beginnen.",
"dashboard_empty_cta": "Zum Archiv",
"dashboard_mission_caption": "Offene Aufgaben",
"queue_segment": "Segmentieren",
"queue_segment_blurb": "Seiten aufteilen",
@@ -795,7 +791,6 @@
"queue_review_blurb": "Texte kontrollieren",
"queue_n_open": "{n} offen",
"queue_show_all": "Alle anzeigen →",
"pulse_eyebrow": "Diese Woche",
"pulse_headline": "Ihr habt {pages} Seiten bearbeitet.",
"pulse_you": "Du selbst hast {pages} davon bearbeitet.",
@@ -803,19 +798,15 @@
"pulse_transcribed": "Textstellen markiert",
"pulse_reviewed": "Textstellen transkribiert",
"pulse_uploaded": "Dokumente hochgeladen",
"feed_caption": "Kommentare & Aktivität",
"feed_show_all": "Alle anzeigen",
"feed_for_you": "für dich",
"audit_action_text_saved": "hat Text gespeichert in",
"audit_action_file_uploaded": "hat eine Datei hochgeladen:",
"audit_action_annotation_created": "hat eine Markierung erstellt in",
"audit_action_comment_added": "hat kommentiert:",
"audit_action_mention_created": "hat dich erwähnt in",
"dropzone_release": "Loslassen zum Hochladen",
"chronik_page_title": "Aktivitäten",
"chronik_for_you_caption": "Für dich",
"chronik_for_you_count": "{count} neu",
@@ -859,9 +850,7 @@
"pagination_page_of": "Seite {page} von {total}",
"pagination_nav_label": "Seitennavigation",
"pagination_page_button": "Seite {page}",
"common_opens_new_tab": "(öffnet in neuem Tab)",
"transcribe_coach_title": "Erste Transkription?",
"transcribe_coach_preamble": "Unser Kurrent-Erkenner lernt noch. Jede Transkription, die Sie zum Training freigeben, bringt ihm die Schrift bei — so funktioniert's:",
"transcribe_coach_step_1_title": "Rahmen ziehen.",
@@ -871,10 +860,8 @@
"transcribe_coach_step_3_title": "Speichert automatisch.",
"transcribe_coach_footer_kurrent": "Hilfe zu Kurrent ↗",
"transcribe_coach_footer_richtlinien": "Transkriptions-Richtlinien ↗",
"transcription_mode_help_label": "Lese- und Bearbeitungsmodus",
"transcription_mode_help_body": "Lesen zeigt die Transkription als fließenden Text. Bearbeiten öffnet die Textfelder für jede Passage.",
"richtlinien_title": "Transkriptions-Richtlinien",
"richtlinien_intro": "Damit alle Briefe einheitlich transkribiert werden — egal wer tippt — hier unsere Regeln. Die Seite wächst mit: sobald wir eine neue Konvention beschließen, landet sie hier.",
"richtlinien_wiki_text": "Kurrent- und Sütterlin-Alphabete sind bei Wikipedia gut erklärt. Hier stehen nur unsere eigenen Vereinbarungen für dieses Archiv.",
@@ -948,12 +935,9 @@
"bulk_edit_all_x_failed": "Filter konnte nicht abgerufen werden — bitte erneut versuchen.",
"bulk_edit_topbar_title": "Massenbearbeitung",
"bulk_edit_count_pill": "{count} werden bearbeitet",
"nav_stammbaum": "Stammbaum",
"nav_geschichten": "Geschichten",
"error_geschichte_not_found": "Die Geschichte wurde nicht gefunden.",
"geschichten_index_title": "Geschichten",
"geschichten_new_button": "Neue Geschichte",
"geschichten_filter_all_pill": "Alle",
@@ -973,7 +957,6 @@
"geschichten_card_attach_action": "+ Geschichte anhängen",
"geschichten_card_show_all_for_person": "Alle Geschichten zu {name}",
"geschichten_card_show_all": "Alle anzeigen",
"geschichte_editor_title_placeholder": "Titel der Geschichte",
"geschichte_editor_body_placeholder": "Schreibe hier deine Geschichte…",
"geschichte_editor_status_draft": "ENTWURF",
@@ -1000,14 +983,11 @@
"geschichte_editor_toolbar_h3": "Unterüberschrift",
"geschichte_editor_toolbar_ul": "Aufzählung",
"geschichte_editor_toolbar_ol": "Nummerierte Liste",
"geschichte_delete_confirm_title": "Geschichte löschen?",
"geschichte_delete_confirm_body": "Diese Aktion kann nicht rückgängig gemacht werden. Die Geschichte wird dauerhaft gelöscht und aus allen verlinkten Personen- und Dokumentseiten entfernt.",
"error_relationship_not_found": "Die Beziehung wurde nicht gefunden.",
"error_circular_relationship": "Diese Beziehung würde einen Kreis erzeugen.",
"error_duplicate_relationship": "Diese Beziehung gibt es bereits.",
"relation_parent_of": "Elternteil von",
"relation_child_of": "Kind von",
"relation_spouse_of": "Ehegatte",
@@ -1018,7 +998,6 @@
"relation_doctor": "Arzt",
"relation_neighbor": "Nachbar",
"relation_other": "Sonstige",
"relation_inferred_parent": "Elternteil",
"relation_inferred_child": "Kind",
"relation_inferred_spouse": "Ehegatte",
@@ -1036,9 +1015,7 @@
"relation_inferred_sibling_inlaw": "Schwager/Schwägerin",
"relation_inferred_cousin_1": "Cousin/Cousine",
"relation_inferred_distant": "Weitläufige Verwandtschaft",
"doc_details_field_relationship": "Verwandtschaft",
"stammbaum_empty_heading": "Noch keine Familienmitglieder",
"stammbaum_empty_body": "Markiere Personen auf ihrer Bearbeitungsseite als Familienmitglied, damit sie hier erscheinen.",
"stammbaum_empty_link": "→ Zur Personenliste",
@@ -1050,7 +1027,6 @@
"stammbaum_zoom_in": "Vergrößern",
"stammbaum_zoom_out": "Verkleinern",
"stammbaum_generations": "Generationen",
"relation_error_duplicate": "Diese Beziehung gibt es bereits.",
"relation_error_circular": "Diese Beziehung würde einen Kreis erzeugen.",
"relation_error_self": "Eine Person kann nicht mit sich selbst verbunden werden.",
@@ -1073,14 +1049,15 @@
"relation_form_field_from_year": "Von Jahr",
"relation_form_field_to_year": "Bis Jahr",
"relation_form_year_placeholder": "z.B. 1920",
"person_relationships_heading": "Beziehungen",
"person_relationships_empty": "Noch keine Beziehungen bekannt.",
"timeline_aria_label": "Zeitachse Dokumentdichte",
"timeline_clear_selection": "Auswahl zurücksetzen",
"timeline_zoom_reset": "Zurück zur Übersicht",
"timeline_bar_aria_singular": "{when}, 1 Dokument",
"timeline_bar_aria_plural": "{when}, {count} Dokumente",
"timeline_dragging_aria_live": "Zeitraum {from} bis {to} ausgewählt"
"timeline_dragging_aria_live": "Zeitraum {from} bis {to} ausgewählt",
"error_page_id_label": "Fehler-ID",
"error_copy_id_label": "ID kopieren",
"error_copied": "Kopiert!"
}

View File

@@ -473,7 +473,7 @@
"dashboard_reader_stats_persons_short": "Pers.",
"dashboard_reader_stats_stories_short": "Stor.",
"dashboard_reader_draft_meta": "Draft · last edited {relative}",
"dashboard_resume_label": "Last opened:",
"dashboard_resume_label": "Continue where you left off",
"dashboard_resume_fallback": "Unknown document",
"doc_status_placeholder": "Placeholder",
"doc_status_uploaded": "Uploaded",
@@ -773,19 +773,15 @@
"admin_invite_created_title": "Invite created",
"admin_invite_created_desc": "Share this link with the person you are inviting:",
"admin_invite_revoke_confirm": "Really revoke this invite?",
"greeting_morning": "Good morning, {name}.",
"greeting_day": "Hello, {name}.",
"greeting_evening": "Good evening, {name}.",
"dashboard_resume_label": "Continue where you left off",
"dashboard_blocks": "{count} sections",
"dashboard_resume_cta": "Continue transcribing",
"dashboard_resume_other": "or choose another document",
"dashboard_empty_title": "No document started yet",
"dashboard_empty_body": "Choose a document from the archive to start transcribing.",
"dashboard_empty_cta": "To the archive",
"dashboard_mission_caption": "Open tasks",
"queue_segment": "Segment",
"queue_segment_blurb": "Split pages",
@@ -795,7 +791,6 @@
"queue_review_blurb": "Check texts",
"queue_n_open": "{n} open",
"queue_show_all": "Show all →",
"pulse_eyebrow": "This week",
"pulse_headline": "You have worked on {pages} pages.",
"pulse_you": "You personally worked on {pages} of them.",
@@ -803,19 +798,15 @@
"pulse_transcribed": "Passages annotated",
"pulse_reviewed": "Passages transcribed",
"pulse_uploaded": "Documents uploaded",
"feed_caption": "Comments & activity",
"feed_show_all": "Show all",
"feed_for_you": "for you",
"audit_action_text_saved": "saved text in",
"audit_action_file_uploaded": "uploaded a file:",
"audit_action_annotation_created": "created an annotation in",
"audit_action_comment_added": "commented:",
"audit_action_mention_created": "mentioned you in",
"dropzone_release": "Release to upload",
"chronik_page_title": "Activity",
"chronik_for_you_caption": "For you",
"chronik_for_you_count": "{count} new",
@@ -859,9 +850,7 @@
"pagination_page_of": "Page {page} of {total}",
"pagination_nav_label": "Pagination",
"pagination_page_button": "Page {page}",
"common_opens_new_tab": "(opens in new tab)",
"transcribe_coach_title": "First transcription?",
"transcribe_coach_preamble": "Our Kurrent recogniser is still learning. Every transcription you release for training teaches it the handwriting — here's how it works:",
"transcribe_coach_step_1_title": "Draw a frame.",
@@ -871,10 +860,8 @@
"transcribe_coach_step_3_title": "Saves automatically.",
"transcribe_coach_footer_kurrent": "Kurrent help ↗",
"transcribe_coach_footer_richtlinien": "Transcription guidelines ↗",
"transcription_mode_help_label": "Read and edit mode",
"transcription_mode_help_body": "Read shows the transcription as flowing text. Edit opens the text fields for each passage.",
"richtlinien_title": "Transcription Guidelines",
"richtlinien_intro": "So every letter is transcribed consistently — no matter who types — here are our rules. The page grows with us: as soon as we agree a new convention, it lands here.",
"richtlinien_wiki_text": "The Kurrent and Sütterlin alphabets are well explained on Wikipedia. Here you'll only find our own conventions for this archive.",
@@ -948,12 +935,9 @@
"bulk_edit_all_x_failed": "Could not load filter results — please retry.",
"bulk_edit_topbar_title": "Bulk edit",
"bulk_edit_count_pill": "{count} will be edited",
"nav_stammbaum": "Family tree",
"nav_geschichten": "Stories",
"error_geschichte_not_found": "The story was not found.",
"geschichten_index_title": "Stories",
"geschichten_new_button": "New story",
"geschichten_filter_all_pill": "All",
@@ -973,7 +957,6 @@
"geschichten_card_attach_action": "+ Attach a story",
"geschichten_card_show_all_for_person": "All stories about {name}",
"geschichten_card_show_all": "Show all",
"geschichte_editor_title_placeholder": "Story title",
"geschichte_editor_body_placeholder": "Write your story here…",
"geschichte_editor_status_draft": "DRAFT",
@@ -1000,14 +983,11 @@
"geschichte_editor_toolbar_h3": "Subheading",
"geschichte_editor_toolbar_ul": "Bulleted list",
"geschichte_editor_toolbar_ol": "Numbered list",
"geschichte_delete_confirm_title": "Delete story?",
"geschichte_delete_confirm_body": "This action cannot be undone. The story will be permanently deleted and removed from all linked person and document pages.",
"error_relationship_not_found": "Relationship not found.",
"error_circular_relationship": "This relationship would form a cycle.",
"error_duplicate_relationship": "This relationship already exists.",
"relation_parent_of": "Parent of",
"relation_child_of": "Child of",
"relation_spouse_of": "Spouse",
@@ -1018,7 +998,6 @@
"relation_doctor": "Doctor",
"relation_neighbor": "Neighbour",
"relation_other": "Other",
"relation_inferred_parent": "Parent",
"relation_inferred_child": "Child",
"relation_inferred_spouse": "Spouse",
@@ -1036,9 +1015,7 @@
"relation_inferred_sibling_inlaw": "Sibling-in-law",
"relation_inferred_cousin_1": "Cousin",
"relation_inferred_distant": "Distant relative",
"doc_details_field_relationship": "Relationship",
"stammbaum_empty_heading": "No family members yet",
"stammbaum_empty_body": "Mark a person as a family member on their edit page so they appear here.",
"stammbaum_empty_link": "→ Go to person list",
@@ -1050,7 +1027,6 @@
"stammbaum_zoom_in": "Zoom in",
"stammbaum_zoom_out": "Zoom out",
"stammbaum_generations": "Generations",
"relation_error_duplicate": "This relationship already exists.",
"relation_error_circular": "This relationship would form a cycle.",
"relation_error_self": "A person cannot be related to themselves.",
@@ -1073,14 +1049,15 @@
"relation_form_field_from_year": "From year",
"relation_form_field_to_year": "To year",
"relation_form_year_placeholder": "e.g. 1920",
"person_relationships_heading": "Relationships",
"person_relationships_empty": "No relationships known yet.",
"timeline_aria_label": "Document density timeline",
"timeline_clear_selection": "Clear selection",
"timeline_zoom_reset": "Reset zoom",
"timeline_bar_aria_singular": "{when}, 1 document",
"timeline_bar_aria_plural": "{when}, {count} documents",
"timeline_dragging_aria_live": "Range {from} to {to} selected"
"timeline_dragging_aria_live": "Range {from} to {to} selected",
"error_page_id_label": "Error ID",
"error_copy_id_label": "Copy ID",
"error_copied": "Copied!"
}

View File

@@ -473,7 +473,7 @@
"dashboard_reader_stats_persons_short": "Pers.",
"dashboard_reader_stats_stories_short": "Hist.",
"dashboard_reader_draft_meta": "Borrador · editado hace {relative}",
"dashboard_resume_label": "Último abierto:",
"dashboard_resume_label": "Continuar donde lo dejaste",
"dashboard_resume_fallback": "Documento desconocido",
"doc_status_placeholder": "Marcador",
"doc_status_uploaded": "Cargado",
@@ -773,19 +773,15 @@
"admin_invite_created_title": "Invitación creada",
"admin_invite_created_desc": "Comparte este enlace con la persona invitada:",
"admin_invite_revoke_confirm": "¿Realmente revocar esta invitación?",
"greeting_morning": "Buenos días, {name}.",
"greeting_day": "Hola, {name}.",
"greeting_evening": "Buenas noches, {name}.",
"dashboard_resume_label": "Continuar donde lo dejaste",
"dashboard_blocks": "{count} secciones",
"dashboard_resume_cta": "Continuar transcripción",
"dashboard_resume_other": "o elige otro documento",
"dashboard_empty_title": "Aún no has comenzado ningún documento",
"dashboard_empty_body": "Elige un documento del archivo para empezar a transcribir.",
"dashboard_empty_cta": "Al archivo",
"dashboard_mission_caption": "Tareas pendientes",
"queue_segment": "Segmentar",
"queue_segment_blurb": "Dividir páginas",
@@ -795,7 +791,6 @@
"queue_review_blurb": "Controlar textos",
"queue_n_open": "{n} pendiente",
"queue_show_all": "Ver todo →",
"pulse_eyebrow": "Esta semana",
"pulse_headline": "Habéis trabajado {pages} páginas.",
"pulse_you": "Tú mismo has trabajado {pages} de ellas.",
@@ -803,19 +798,15 @@
"pulse_transcribed": "Fragmentos anotados",
"pulse_reviewed": "Fragmentos transcritos",
"pulse_uploaded": "Documentos subidos",
"feed_caption": "Comentarios y actividad",
"feed_show_all": "Ver todo",
"feed_for_you": "para ti",
"audit_action_text_saved": "guardó texto en",
"audit_action_file_uploaded": "subió un archivo:",
"audit_action_annotation_created": "creó una anotación en",
"audit_action_comment_added": "comentó:",
"audit_action_mention_created": "te mencionó en",
"dropzone_release": "Suelta para subir",
"chronik_page_title": "Actividades",
"chronik_for_you_caption": "Para ti",
"chronik_for_you_count": "{count} nuevas",
@@ -859,9 +850,7 @@
"pagination_page_of": "Página {page} de {total}",
"pagination_nav_label": "Paginación",
"pagination_page_button": "Página {page}",
"common_opens_new_tab": "(abre en pestaña nueva)",
"transcribe_coach_title": "¿Primera transcripción?",
"transcribe_coach_preamble": "Nuestro reconocedor de Kurrent aún está aprendiendo. Cada transcripción que libera para el entrenamiento le enseña la escritura — así funciona:",
"transcribe_coach_step_1_title": "Dibujar un marco.",
@@ -871,10 +860,8 @@
"transcribe_coach_step_3_title": "Se guarda automáticamente.",
"transcribe_coach_footer_kurrent": "Ayuda sobre Kurrent ↗",
"transcribe_coach_footer_richtlinien": "Normas de transcripción ↗",
"transcription_mode_help_label": "Modo lectura y edición",
"transcription_mode_help_body": "Lectura muestra la transcripción como texto continuo. Edición abre los campos de texto para cada pasaje.",
"richtlinien_title": "Normas de transcripción",
"richtlinien_intro": "Para que todas las cartas se transcriban de forma uniforme — sin importar quién transcriba — aquí están nuestras reglas. La página crece con nosotros.",
"richtlinien_wiki_text": "Los alfabetos Kurrent y Sütterlin están bien explicados en Wikipedia. Aquí solo se recogen nuestros propios acuerdos para este archivo.",
@@ -948,12 +935,9 @@
"bulk_edit_all_x_failed": "No se pudieron cargar los resultados del filtro; vuelve a intentarlo.",
"bulk_edit_topbar_title": "Edición masiva",
"bulk_edit_count_pill": "Se editarán {count}",
"nav_stammbaum": "Árbol genealógico",
"nav_geschichten": "Historias",
"error_geschichte_not_found": "No se encontró la historia.",
"geschichten_index_title": "Historias",
"geschichten_new_button": "Nueva historia",
"geschichten_filter_all_pill": "Todas",
@@ -973,7 +957,6 @@
"geschichten_card_attach_action": "+ Adjuntar historia",
"geschichten_card_show_all_for_person": "Todas las historias sobre {name}",
"geschichten_card_show_all": "Mostrar todas",
"geschichte_editor_title_placeholder": "Título de la historia",
"geschichte_editor_body_placeholder": "Escribe tu historia aquí…",
"geschichte_editor_status_draft": "BORRADOR",
@@ -1000,14 +983,11 @@
"geschichte_editor_toolbar_h3": "Subencabezado",
"geschichte_editor_toolbar_ul": "Lista con viñetas",
"geschichte_editor_toolbar_ol": "Lista numerada",
"geschichte_delete_confirm_title": "¿Eliminar historia?",
"geschichte_delete_confirm_body": "Esta acción no se puede deshacer. La historia se eliminará permanentemente y se quitará de todas las páginas de personas y documentos vinculados.",
"error_relationship_not_found": "La relación no fue encontrada.",
"error_circular_relationship": "Esta relación crearía un ciclo.",
"error_duplicate_relationship": "Esta relación ya existe.",
"relation_parent_of": "Progenitor de",
"relation_child_of": "Hijo/a de",
"relation_spouse_of": "Cónyuge",
@@ -1018,7 +998,6 @@
"relation_doctor": "Médico",
"relation_neighbor": "Vecino/a",
"relation_other": "Otro",
"relation_inferred_parent": "Progenitor",
"relation_inferred_child": "Hijo/a",
"relation_inferred_spouse": "Cónyuge",
@@ -1036,9 +1015,7 @@
"relation_inferred_sibling_inlaw": "Cuñado/a",
"relation_inferred_cousin_1": "Primo/a",
"relation_inferred_distant": "Pariente lejano",
"doc_details_field_relationship": "Parentesco",
"stammbaum_empty_heading": "Aún no hay miembros de la familia",
"stammbaum_empty_body": "Marca a una persona como miembro de la familia en su página de edición para que aparezca aquí.",
"stammbaum_empty_link": "→ Ir a la lista de personas",
@@ -1050,7 +1027,6 @@
"stammbaum_zoom_in": "Acercar",
"stammbaum_zoom_out": "Alejar",
"stammbaum_generations": "Generaciones",
"relation_error_duplicate": "Esta relación ya existe.",
"relation_error_circular": "Esta relación crearía un ciclo.",
"relation_error_self": "Una persona no puede estar relacionada consigo misma.",
@@ -1073,14 +1049,15 @@
"relation_form_field_from_year": "Desde año",
"relation_form_field_to_year": "Hasta año",
"relation_form_year_placeholder": "ej. 1920",
"person_relationships_heading": "Relaciones",
"person_relationships_empty": "Aún no se conocen relaciones.",
"timeline_aria_label": "Cronología de densidad de documentos",
"timeline_clear_selection": "Borrar selección",
"timeline_zoom_reset": "Restablecer zoom",
"timeline_bar_aria_singular": "{when}, 1 documento",
"timeline_bar_aria_plural": "{when}, {count} documentos",
"timeline_dragging_aria_live": "Rango {from} a {to} seleccionado"
"timeline_dragging_aria_live": "Rango {from} a {to} seleccionado",
"error_page_id_label": "ID de error",
"error_copy_id_label": "Copiar ID",
"error_copied": "¡Copiado!"
}

View File

@@ -26,6 +26,11 @@ declare global {
interface PageData {
user?: User; // Available in $page.data.user
}
interface Error {
message: string;
errorId?: string;
}
}
}

View File

@@ -0,0 +1,47 @@
import { describe, it, expect, vi, beforeEach } from 'vitest';
vi.mock('@sentry/sveltekit', () => ({
init: vi.fn(),
handleErrorWithSentry: (fn: (args: unknown) => unknown) => fn,
lastEventId: vi.fn(() => 'sentry-event-id-abc123')
}));
describe('hooks.client handleError', () => {
beforeEach(() => {
vi.resetModules();
});
it('returns Sentry lastEventId as errorId', async () => {
const Sentry = await import('@sentry/sveltekit');
vi.mocked(Sentry.lastEventId).mockReturnValue('sentry-event-id-abc123');
const { handleError } = await import('./hooks.client');
const result = (handleError as (args: unknown) => { message: string; errorId: string })({
error: new Error('boom'),
event: {},
status: 500,
message: 'Internal Error'
});
expect(result.errorId).toBe('sentry-event-id-abc123');
expect(result.message).toBe('An unexpected error occurred');
});
it('falls back to crypto.randomUUID when lastEventId returns undefined', async () => {
const Sentry = await import('@sentry/sveltekit');
vi.mocked(Sentry.lastEventId).mockReturnValue(undefined);
const { handleError } = await import('./hooks.client');
const result = (handleError as (args: unknown) => { message: string; errorId: string })({
error: new Error('boom'),
event: {},
status: 500,
message: 'Internal Error'
});
expect(result.errorId).toMatch(
/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/
);
expect(result.message).toBe('An unexpected error occurred');
});
});

View File

@@ -1,10 +1,16 @@
import * as Sentry from '@sentry/sveltekit';
// VITE_SENTRY_DSN is a write-only ingest key — it can POST events to GlitchTip
// but cannot read them. Safe to include in the client bundle per Sentry security model.
Sentry.init({
dsn: import.meta.env.VITE_SENTRY_DSN,
environment: import.meta.env.MODE,
tracesSampleRate: 1.0,
tracesSampleRate: 0.1,
sendDefaultPii: false,
enabled: !!import.meta.env.VITE_SENTRY_DSN
});
export const handleError = Sentry.handleErrorWithSentry();
export const handleError = Sentry.handleErrorWithSentry(() => {
const errorId = Sentry.lastEventId() ?? crypto.randomUUID();
return { message: 'An unexpected error occurred', errorId };
});

View File

@@ -0,0 +1,58 @@
import { describe, it, expect, vi, beforeEach } from 'vitest';
vi.mock('@sentry/sveltekit', () => ({
init: vi.fn(),
handleErrorWithSentry: (fn: (args: unknown) => unknown) => fn,
lastEventId: vi.fn(() => 'sentry-event-id-abc123')
}));
vi.mock('@sveltejs/kit', () => ({ redirect: vi.fn() }));
vi.mock('@sveltejs/kit/hooks', () => ({ sequence: vi.fn((...fns: unknown[]) => fns[0]) }));
vi.mock('$lib/paraglide/server', () => ({ paraglideMiddleware: vi.fn() }));
vi.mock('$lib/paraglide/runtime', () => ({ cookieName: 'locale', cookieMaxAge: 86400 }));
vi.mock('$lib/shared/server/locale', () => ({ detectLocale: vi.fn(() => 'de') }));
const makeEvent = () => ({
url: { pathname: '/documents/123' },
locals: {}
});
describe('hooks.server handleError', () => {
beforeEach(() => {
vi.resetModules();
});
it('returns Sentry lastEventId as errorId', async () => {
const Sentry = await import('@sentry/sveltekit');
vi.mocked(Sentry.lastEventId).mockReturnValue('sentry-event-id-abc123');
const { handleError } = await import('./hooks.server');
const result = (handleError as (args: unknown) => { message: string; errorId: string })({
error: new Error('boom'),
event: makeEvent(),
status: 500,
message: 'Internal Error'
});
expect(result.errorId).toBe('sentry-event-id-abc123');
expect(result.message).toBe('An unexpected error occurred');
});
it('falls back to crypto.randomUUID when lastEventId returns undefined', async () => {
const Sentry = await import('@sentry/sveltekit');
vi.mocked(Sentry.lastEventId).mockReturnValue(undefined);
const { handleError } = await import('./hooks.server');
const result = (handleError as (args: unknown) => { message: string; errorId: string })({
error: new Error('boom'),
event: makeEvent(),
status: 500,
message: 'Internal Error'
});
expect(result.errorId).toMatch(
/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/
);
expect(result.message).toBe('An unexpected error occurred');
});
});

View File

@@ -6,10 +6,13 @@ import { env } from 'process';
import { cookieName, cookieMaxAge } from '$lib/paraglide/runtime';
import { detectLocale } from '$lib/shared/server/locale';
// VITE_SENTRY_DSN is a write-only ingest key — it can POST events to GlitchTip
// but cannot read them. Safe to include in the client bundle per Sentry security model.
Sentry.init({
dsn: import.meta.env.VITE_SENTRY_DSN,
environment: import.meta.env.MODE,
tracesSampleRate: 1.0,
tracesSampleRate: 0.1,
sendDefaultPii: false,
enabled: !!import.meta.env.VITE_SENTRY_DSN
});
@@ -122,4 +125,7 @@ export const handleFetch: HandleFetch = async ({ event, request, fetch }) => {
export const handle = sequence(userGroup, handleAuth, handleLocaleDetection, handleParaglide);
export const handleError = Sentry.handleErrorWithSentry();
export const handleError = Sentry.handleErrorWithSentry(() => {
const errorId = Sentry.lastEventId() ?? crypto.randomUUID();
return { message: 'An unexpected error occurred', errorId };
});

View File

@@ -1,13 +1,53 @@
<script lang="ts">
import { page } from '$app/state';
import { m } from '$lib/paraglide/messages.js';
let copied = $state(false);
function copyId() {
const id = page.error?.errorId;
if (!id) return;
if (!navigator.clipboard) return;
navigator.clipboard.writeText(id).then(
() => {
copied = true;
setTimeout(() => (copied = false), 2000);
},
() => {
/* clipboard denied or unavailable — select-all on the <code> element remains */
}
);
}
</script>
<svelte:head>
<title>{m.page_title_error()}</title>
</svelte:head>
<div class="px-4 py-12 text-center font-sans">
<p class="font-sans text-6xl font-bold text-ink">{page.status}</p>
<p class="mt-2 font-sans text-sm text-ink-2">{page.error?.message ?? 'Internal Error'}</p>
</div>
<main class="px-4 py-12 text-center font-sans">
<h1 class="mb-2 font-serif text-2xl font-bold text-ink">{m.page_title_error()}</h1>
<p class="mb-8 font-sans text-sm text-ink-2">
{page.error?.message ?? m.error_internal_error()}
</p>
<p class="mb-4 font-mono text-4xl font-bold text-ink">{page.status}</p>
{#if page.error?.errorId}
<div class="mt-6 flex flex-col items-center gap-3">
<p class="font-sans text-xs tracking-widest text-ink-2 uppercase">
{m.error_page_id_label()}
</p>
<code
class="rounded border border-line bg-surface px-3 py-1 font-mono text-sm text-ink select-all"
>
{page.error.errorId}
</code>
<button
class="min-h-[44px] min-w-[44px] rounded-sm bg-brand-navy px-5 py-2 font-sans text-sm text-white transition-colors hover:opacity-90 focus-visible:ring-2 focus-visible:ring-brand-navy focus-visible:ring-offset-2"
onclick={copyId}
aria-label={m.error_copy_id_label()}
>
<span aria-live="polite">{copied ? m.error_copied() : m.error_copy_id_label()}</span>
</button>
</div>
{/if}
</main>

View File

@@ -24,7 +24,6 @@ export const GET: RequestHandler = async ({ url, fetch }) => {
}
const data = await response.json();
console.log('Tags Data', data);
// 4. Daten zurück an den Browser schicken
return json(data);

View File

@@ -4,7 +4,10 @@ import { page as browserPage } from 'vitest/browser';
const mockPage = {
status: 500,
error: { message: 'Internal Error' } as { message: string } | null
error: { message: 'Internal Error', errorId: undefined } as {
message: string;
errorId?: string;
} | null
};
vi.mock('$app/state', () => ({
@@ -13,6 +16,16 @@ vi.mock('$app/state', () => ({
}
}));
vi.mock('$lib/paraglide/messages.js', () => ({
m: {
page_title_error: () => 'Es ist etwas schiefgelaufen.',
error_internal_error: () => 'Ein unerwarteter Fehler ist aufgetreten.',
error_page_id_label: () => 'Fehler-ID',
error_copy_id_label: () => 'ID kopieren',
error_copied: () => 'Kopiert!'
}
}));
afterEach(cleanup);
async function loadComponent() {
@@ -20,7 +33,7 @@ async function loadComponent() {
}
describe('+error.svelte', () => {
it('renders the page status code prominently', async () => {
it('renders the page status code', async () => {
mockPage.status = 404;
mockPage.error = { message: 'Not Found' };
@@ -40,13 +53,79 @@ describe('+error.svelte', () => {
await expect.element(browserPage.getByText('Database unavailable')).toBeVisible();
});
it('falls back to the literal "Internal Error" when page.error is null', async () => {
it('falls back to error_internal_error message when page.error is null', async () => {
mockPage.status = 500;
mockPage.error = null;
const ErrorPage = await loadComponent();
render(ErrorPage);
await expect.element(browserPage.getByText('Internal Error')).toBeVisible();
await expect
.element(browserPage.getByText('Ein unerwarteter Fehler ist aufgetreten.'))
.toBeVisible();
});
it('shows errorId when page.error.errorId is set', async () => {
mockPage.status = 500;
mockPage.error = { message: 'Something broke', errorId: 'abc-123-def' };
const ErrorPage = await loadComponent();
render(ErrorPage);
await expect.element(browserPage.getByText('abc-123-def')).toBeVisible();
});
it('shows copy button when errorId is present', async () => {
mockPage.status = 500;
mockPage.error = { message: 'Something broke', errorId: 'abc-123-def' };
const ErrorPage = await loadComponent();
render(ErrorPage);
await expect.element(browserPage.getByRole('button', { name: 'ID kopieren' })).toBeVisible();
});
it('does not render errorId section when errorId is absent', async () => {
mockPage.status = 500;
mockPage.error = { message: 'Something broke' };
const ErrorPage = await loadComponent();
render(ErrorPage);
await expect.element(browserPage.getByText('Fehler-ID')).not.toBeInTheDocument();
});
it('shows "Kopiert!" after clicking the copy button', async () => {
mockPage.status = 500;
mockPage.error = { message: 'Something broke', errorId: 'abc-123-def' };
Object.defineProperty(navigator, 'clipboard', {
value: { writeText: vi.fn().mockResolvedValue(undefined) },
configurable: true,
writable: true
});
const ErrorPage = await loadComponent();
render(ErrorPage);
await browserPage.getByRole('button', { name: 'ID kopieren' }).click();
await expect.element(browserPage.getByText('Kopiert!')).toBeVisible();
});
it('does not show "Kopiert!" when clipboard write is rejected', async () => {
mockPage.status = 500;
mockPage.error = { message: 'Something broke', errorId: 'abc-123-def' };
Object.defineProperty(navigator, 'clipboard', {
value: { writeText: vi.fn().mockRejectedValue(new Error('denied')) },
configurable: true,
writable: true
});
const ErrorPage = await loadComponent();
render(ErrorPage);
await browserPage.getByRole('button', { name: 'ID kopieren' }).click();
await expect.element(browserPage.getByText('Kopiert!')).not.toBeInTheDocument();
});
});

View File

@@ -71,7 +71,8 @@ export default defineConfig({
'src/lib/shared/utils/**',
'src/lib/shared/server/**',
'src/lib/shared/discussion/**',
'src/lib/document/**'
'src/lib/document/**',
'src/hooks.server.ts'
],
exclude: ['**/*.svelte', '**/*.svelte.ts', '**/__mocks__/**'],
thresholds: {

View File

@@ -0,0 +1,24 @@
# Non-secret observability stack configuration — tracked in git.
# Secret values (passwords, keys) are injected by CI from Gitea secrets
# into /opt/familienarchiv/obs-secrets.env at deploy time.
#
# For local dev the main .env file supplies these values instead;
# this file is only used in the CI/production path.
# Host ports (all bound to 127.0.0.1 — Caddy is the external entry point)
PORT_GRAFANA=3003
PORT_GLITCHTIP=3002
PORT_PROMETHEUS=9090
# Public URLs — used for internal redirects, alert email links, OAuth callbacks
GF_SERVER_ROOT_URL=https://grafana.archiv.raddatz.cloud
GLITCHTIP_DOMAIN=https://glitchtip.archiv.raddatz.cloud
POSTGRES_USER=archiv
# PostgreSQL hostname for GlitchTip db-init and workers.
# The actual value depends on the Compose project name — it is not a fixed string.
# CI sets POSTGRES_HOST in obs-secrets.env per environment:
# staging: archiv-staging-db-1 (project archiv-staging + service db)
# production: archiv-production-db-1 (project archiv-production + service db)
# For local dev, set POSTGRES_HOST in your .env file (defaults to archive-db there).

View File

@@ -15,8 +15,6 @@ scrape_configs:
metrics_path: /actuator/prometheus
static_configs:
# Uses the Docker service name (not container_name) for reliable DNS resolution.
# Target will show as DOWN until backend instrumentation issue adds
# micrometer-registry-prometheus and exposes the endpoint — this is expected.
- targets: ['backend:8081']
- job_name: ocr-service

View File

@@ -28,3 +28,5 @@ scrape_configs:
target_label: 'compose_project'
- source_labels: ['__meta_docker_container_log_stream']
target_label: 'logstream'
- source_labels: ['__meta_docker_container_label_com_docker_compose_service']
target_label: 'job'

View File

@@ -36,9 +36,6 @@ metrics_generator:
source: tempo
storage:
path: /var/tempo/generator/wal
processors:
- service-graphs
- span-metrics
# Tempo HTTP API (port 3200) is unauthenticated. Access is controlled entirely
# by network isolation: only Grafana (on obs-net) should reach this port.

View File

@@ -1,16 +1,26 @@
# runner-config.yaml — only the relevant section
container:
# passed as DOCKER_HOST inside the job container
# join the same network Gitea is on, so job containers can resolve 'gitea'
# for actions/checkout and other internal API calls.
network: gitea_gitea
# passed as DOCKER_HOST inside the job container; act_runner auto-mounts
# this socket path into the job, so no explicit -v option is needed.
docker_host: "unix:///var/run/docker.sock"
# whitelists the socket path so workflows can mount it
# Job workspaces are stored here and mounted at the same absolute path
# inside job containers. Identical host <-> container path is the requirement:
# Compose resolves relative bind mounts to $(pwd) inside the job container
# and passes that absolute path to the host daemon, which must find the file
# at that exact host path. Prerequisite: /srv/gitea-workspace exists on the
# host and is bind-mounted in the runner container (see compose.yaml).
workdir_parent: /srv/gitea-workspace
# whitelists volumes that workflow steps may bind-mount
valid_volumes:
- "/var/run/docker.sock"
# appended to `docker run` when the runner spawns a job container
# SECURITY: Mounting the Docker socket grants job containers root-equivalent
# access to the host Docker daemon. Acceptable here because only trusted code
# from this private repo runs on this runner. Do NOT use on a runner that
# accepts untrusted PRs from external contributors.
options: "-v /var/run/docker.sock:/var/run/docker.sock"
# keep network mode default (bridge) — Testcontainers handles its own networking
- "/srv/gitea-workspace"
- "/opt/familienarchiv"
# mount the workspace and the permanent obs/config directory into job containers.
# /opt/familienarchiv is the stable path CI copies configs to (ADR-016); it must
# be mounted here so deploy steps can write through to the host filesystem.
options: "-v /srv/gitea-workspace:/srv/gitea-workspace -v /opt/familienarchiv:/opt/familienarchiv"
# keep behavior default — Testcontainers handles its own networking
force_pull: false