Commit Graph

2567 Commits

Author SHA1 Message Date
Marcel
7769dbc9f4 security(ocr): apply container hardening baseline to docker-compose.prod.yml
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m4s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 18s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
Mirror the CIS Docker §4.1/§4.6 hardening from docker-compose.yml to the
production/staging compose file, which is standalone (not an overlay).

- Fix cache volume mount path: ocr-cache:/root/.cache → /app/cache (matches
  the non-root user's HF_HOME/XDG_CACHE_HOME, avoids PermissionError)
- Add HF_HOME, XDG_CACHE_HOME, TORCH_HOME env vars so HuggingFace, ketos,
  and PyTorch all write to the declared writable volumes, not HOME
- Add read_only: true, tmpfs (/tmp:512m), cap_drop: [ALL],
  no-new-privileges:true — matching the dev baseline

Also extend DEPLOYMENT.md §8 upgrade notes to cover all three environments
(dev/production/staging), each with its correct project-namespaced volume name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:43:18 +02:00
Marcel
74ca5ee35f docs(adr): ADR-019 — container hardening baseline (non-root + read-only)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m2s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m11s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 17s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:33:06 +02:00
Marcel
38973a014e docs: add XDG_CACHE_HOME/TORCH_HOME to OCR env table and upgrade notes for PR #611
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:32:02 +02:00
Marcel
fc8b4b164b security(ocr): redirect XDG cache and Torch home away from read-only HOME
Prevents PyTorch/Matplotlib/Ketos from writing to /home/ocr which is
on the read-only container filesystem — fixes Nora's blocker. Also
restores the explanatory comment on the ocr_cache volume mount.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:30:39 +02:00
Marcel
eb63df2000 test(ocr): add startup root canary tests for main.py lifespan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:29:47 +02:00
Marcel
53bd574660 test(ocr): replace vacuous startswith assertion with equality check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:26:58 +02:00
Marcel
581ba01d8d security(ocr): log warning on startup when running as root
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m10s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
Adds a canary log line if os.getuid() == 0. Produces an observable
signal in container logs if the USER directive is ever removed from
the Dockerfile, without requiring an external audit tool.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:51:00 +02:00
Marcel
9db42d6cc1 fix(ocr): resolve HTRMOPO_DIR from env var, not ~ expansion
With --no-create-home, os.path.expanduser("~") resolves to "/" causing
kraken get to write to /.local/share/htrmopo. Replace with
os.environ.get("HTRMOPO_DIR", "/app/models/.htrmopo") so the path is
explicit and override-friendly without a home directory.

Adds two tests verifying env-var resolution and ~-free default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:49:21 +02:00
Marcel
ab24786d2a security(ocr): harden compose — fix cache volume path, add read_only + cap_drop
Move ocr_cache mount from /root/.cache to /app/cache (correct path for
non-root user). Add HF_HOME so Hugging Face resolves to the same path.
Add runtime hardening: read_only, tmpfs /tmp (512 MB cap), cap_drop ALL,
no-new-privileges.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:47:18 +02:00
Marcel
1aca4c4a41 security(ocr): add non-root user and set HOME/HF_HOME in Dockerfile
CIS Docker §4.1: run uvicorn as UID 1000 (ocr) instead of root.
Creates /home/ocr and /app/cache with correct ownership so named
volumes inherit ocr:ocr on first Docker mount. Sets HOME and HF_HOME
so ~ expansion and Hugging Face caching resolve under /app, not /root.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:46:25 +02:00
Marcel
669eaa7c65 fix(ci): pin semgrep version, add pip cache, harden rule severity
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m2s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m55s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 18s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
CI / Unit & Component Tests (push) Successful in 3m3s
CI / OCR Service Tests (push) Successful in 19s
CI / Backend Unit Tests (push) Successful in 2m56s
CI / fail2ban Regex (push) Successful in 40s
CI / Semgrep Security Scan (push) Successful in 17s
CI / Compose Bucket Idempotency (push) Successful in 59s
- Pin semgrep to 1.163.0 to prevent silent upgrades breaking the scan
- Add cache: 'pip' to setup-python@v5 for faster CI runs
- Promote all three XXE Semgrep rules from WARNING to ERROR to match
  the --error CI flag intent
- Update SAX/StAX rule messages to reference XxeSafeXmlParser and
  the OWASP XXE prevention cheat sheet
- Remove stale issue reference from regression test comment
- Document XML metacharacter constraint on buildValidOds test helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:18:03 +02:00
Marcel
f15ea031d1 ci(security): add Semgrep XXE rule and CI scan job
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m2s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m3s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Semgrep Security Scan (pull_request) Successful in 1m11s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Add .semgrep/security.yml with rules for DocumentBuilderFactory,
SAXParserFactory, and XMLInputFactory without XXE hardening (CWE-611).
Add semgrep-scan CI job — runs in parallel with backend-unit-tests,
local rules only, --error flag fails the build on any match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 14:48:46 +02:00
Marcel
25a39fca9c security(import): harden DocumentBuilderFactory against XXE in MassImportService
Extract XxeSafeXmlParser with all 6 OWASP-recommended features
(disallow-doctype-decl, external-general-entities, external-parameter-entities,
load-external-dtd, XInclude, expandEntityReferences). Make readOds()
package-private; add failing-then-passing regression test and valid-ODS guard test.

POI 5.5.0 does not mitigate this: the vulnerable parser is a custom
DocumentBuilderFactory call in readOds(), not inside POI's internal ODS reader.
The hardening is defence-in-depth, not redundant with POI defaults.

Closes #528

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 14:48:03 +02:00
Marcel
e398133907 security(deps): bump Spring Boot 4.0.0 → 4.0.6 and OWASP sanitizer 20240325.1 → 20260101.1
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m6s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 3m8s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
CI / Unit & Component Tests (push) Successful in 3m5s
CI / OCR Service Tests (push) Successful in 18s
CI / Backend Unit Tests (push) Successful in 2m57s
CI / fail2ban Regex (push) Successful in 39s
CI / Compose Bucket Idempotency (push) Successful in 1m0s
Clears 2 CRITICAL CVEs (CVE-2026-40976, CVE-2026-22732) and 17 HIGH CVEs
in Netty, Jetty, Spring Security, and Spring Boot itself. Also fixes
CVE-2025-66021 in the OWASP HTML sanitizer used by GeschichteService.

JaCoCo threshold ratcheted to 0.77 (actual measured coverage; previous
0.88 gate was never enforced since CI ran clean test not clean verify).
CI backend job changed to ./mvnw clean verify so the gate runs on every
push going forward.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 12:55:12 +02:00
Marcel
186535f8c9 test(security): add ActuatorSecurityTest to guard auth boundaries
Tests that /actuator/health is accessible without credentials and
/actuator/env requires authentication — permanent regression guards
against CVE-2026-40976-class Actuator filter chain bypass bugs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 12:45:28 +02:00
Marcel
de19d17b00 docs(adr): add ADR-018 for GlitchTip frontend error tracking via @sentry/sveltekit
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m4s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m39s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
CI / Unit & Component Tests (push) Successful in 5m46s
CI / OCR Service Tests (push) Successful in 45s
CI / Backend Unit Tests (push) Failing after 10m32s
CI / fail2ban Regex (push) Successful in 3m7s
CI / Compose Bucket Idempotency (push) Successful in 2m26s
Documents the decision to use the Sentry SDK with self-hosted GlitchTip,
sendDefaultPii:false rationale, errorId surfacing to users, and alternatives
considered (Sentry SaaS rejected for data-minimisation reasons).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:27:46 +02:00
Marcel
b2e31c3c1b refactor(observability): lower trace sample rate, add DSN comment, improve status visibility
- Lower tracesSampleRate from 1.0 to 0.1 in both hooks (errors still captured
  at 100%; trace volume reduced for self-hosted GlitchTip on shared VPS)
- Add comment explaining VITE_SENTRY_DSN is a write-only ingest key, safe in
  client bundle — prevents accidental rotation as if it were a password
- Restore HTTP status code prominence: text-4xl font-bold (was text-xs text-ink-3)
- Add min-w-[44px] to copy button for WCAG 2.2 minimum touch target

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:27:01 +02:00
Marcel
9e23620072 refactor(observability): add hooks.server.ts to coverage include in vite.config.ts
The handleError callback in hooks.server.ts is now gated by the 80% branch
coverage threshold along with the rest of the server-side logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:26:03 +02:00
Marcel
af42113fca test(observability): add hooks.client.test.ts unit tests for handleError callback
Two tests matching the existing hooks.server.test.ts coverage: returns
Sentry lastEventId as errorId; falls back to crypto.randomUUID when
lastEventId returns undefined.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:25:30 +02:00
Marcel
c779ec59f9 feat(observability): guard navigator.clipboard and handle rejection in copyId
Adds availability guard (navigator.clipboard may be undefined in non-HTTPS
contexts) and a rejection handler so clipboard-denied errors are silently
caught rather than becoming unhandled promise rejections. Tests cover the
success feedback and the silent-failure path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:24:35 +02:00
Marcel
2023ea2931 docs(c4): add GlitchTip as external error-tracking system to L1 context diagram
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 2m43s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:46:17 +02:00
Marcel
59b18039ed refactor(observability): remove console.log from tags proxy and enforce no-console lint rule
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:45:49 +02:00
Marcel
96ea7e6815 feat(observability): redesign +error.svelte with errorId display and copy button
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:44:32 +02:00
Marcel
dff81f7bfb feat(observability): add handleError callback to hooks.client.ts returning errorId
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:41:58 +02:00
Marcel
a9c82ec481 feat(observability): add handleError callback to hooks.server.ts returning errorId
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:41:24 +02:00
Marcel
97aa372094 feat(observability): add App.Error interface with errorId to app.d.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:40:12 +02:00
Marcel
e61409773e docs(c4): fix Tempo OTLP transport in l2-containers diagram
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m1s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m37s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
CI / Unit & Component Tests (push) Successful in 3m1s
CI / OCR Service Tests (push) Successful in 16s
CI / Backend Unit Tests (push) Successful in 2m38s
CI / fail2ban Regex (push) Successful in 39s
CI / Compose Bucket Idempotency (push) Successful in 58s
nightly / deploy-staging (push) Failing after 1m52s
Port 4317 is gRPC; the backend uses HttpExporter (HTTP/1.1) and sends
to port 4318. Update Container description and Rel label to match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:48:06 +02:00
Marcel
7713a03cd5 docs(obs): add OBSERVABILITY.md developer guide and fix stale env var docs
- New docs/OBSERVABILITY.md: developer-facing guide with a "where to look
  for what" table, common LogQL queries, trace exploration workflow,
  log→trace correlation via traceId links, and a signal summary table
- Link from DEPLOYMENT.md §4 (ops section now points to dev guide) and
  from CLAUDE.md Infrastructure section
- Fix stale DEPLOYMENT.md env var table: OTEL_EXPORTER_OTLP_ENDPOINT
  now documents port 4318 (HTTP) not 4317 (gRPC); add the three new
  env vars wired in this PR (OTEL_LOGS_EXPORTER, OTEL_METRICS_EXPORTER,
  MANAGEMENT_METRICS_TAGS_APPLICATION) with their rationale
- Fix stale obs-tempo service description (port 4318, not 4317)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:48:06 +02:00
Marcel
cea94ce260 fix(obs): disable OTLP metric export (Prometheus scrapes pull-model)
Tempo only handles traces; sending metrics to /v1/metrics returns 404.
Prometheus already scrapes Spring Boot metrics via the pull-model at
/actuator/prometheus, so OTLP metric push is redundant and noisy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
45a992f5a8 fix(obs): fix OTLP transport port and add application metrics tag
- Change OTEL default endpoint from port 4317 (gRPC) to 4318 (HTTP) to
  match Spring Boot's HttpExporter; sending HTTP/1.1 to a gRPC listener
  caused "Connection reset" errors
- Add otel.logs.exporter=none: Promtail captures Docker logs via the
  logging driver; sending logs to Tempo's OTLP endpoint (which only
  handles traces) produced 404 errors
- Add management.metrics.tags.application to every metric so Grafana's
  Spring Boot Observability dashboard (ID 17175) can filter by the
  application label_values() template variable
- Add MANAGEMENT_METRICS_TAGS_APPLICATION and OTEL_LOGS_EXPORTER env
  vars to docker-compose.prod.yml; production Tempo endpoint already
  uses 4318
- Add MANAGEMENT_TRACING_SAMPLING_PROBABILITY to prod compose with
  0.1 default to avoid 100% trace sampling in production

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
bd57310bbf docs(obs): document promtail job label mapping in DEPLOYMENT.md
The job label (derived from the Docker Compose service name) is what
powers {job="backend"} queries in Loki dashboards and populates the
Grafana "App" variable dropdown. Operators need to know this mapping
when writing custom Loki queries.

Addresses @markus non-blocker suggestion from PR #606 review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
c2d092f435 docs(adr): add ADR-017 — Spring Boot 4.0 management port shares main security filter chain
Documents the architectural decision behind the dedicated management
SecurityFilterChain, the discovery that SB4+Jetty removed the isolated
management child-context security, and the consequences for actuator
endpoint exposure.

Addresses @markus blocker from PR #606 review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
e19bd60984 fix(obs): add management security chain and split Prometheus IT tests
- Add @Order(1) managementFilterChain scoped to /actuator/** with explicit
  401 entry point, blocking all non-public actuator paths without the
  form-login redirect that the main chain uses for browser clients.
- Split single combined test into two focused assertions
  (prometheus_endpoint_returns_200_without_credentials,
   prometheus_endpoint_returns_jvm_metrics).
- Add negative regression test: actuator_metrics_requires_authentication
  verifies that /actuator/metrics returns 401 without credentials.

Addresses reviewer concerns from @sara (missing negative test, split
assertions) and @nora (dedicated management security layer).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
2aa0ff9e70 fix(obs): wire Prometheus endpoint for Spring Boot 4.0
Four Spring Boot 4.0-specific issues prevented /actuator/prometheus from working:

1. spring-boot-starter-micrometer-metrics missing — Spring Boot 4.0 splits
   Micrometer metrics export (including the Prometheus scrape endpoint) out of
   spring-boot-starter-actuator into its own starter. Added dependency.

2. management.prometheus.metrics.export.enabled not set — Spring Boot 4.0
   defaults metrics export to false (opt-in). Added the property to
   application.yaml.

3. SecurityConfig did not permit /actuator/prometheus — Spring Boot 4.0
   with Jetty serves the management port (8081) via the same security filter
   chain as the main port (8080). The previous commit's exclusion of
   ManagementWebSecurityAutoConfiguration was a no-op (that class no longer
   exists in Spring Boot 4.0); removed it and added the correct permitAll()
   rule. Updated the architecture comment in application.yaml to reflect the
   true filter-chain behaviour.

4. Reverted invalid FamilienarchivApplication.java change from the prior
   commit (ManagementWebSecurityAutoConfiguration import compiled against a
   class that does not exist in the Spring Boot 4.0 BOM).

Also adds ActuatorPrometheusIT — an integration test that asserts the
/actuator/prometheus endpoint returns 200 with jvm_memory_used_bytes without
credentials, serving as regression protection against future Spring Boot
upgrades silently breaking metrics collection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
5dd74df293 fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards
Three root causes confirmed via live server investigation (issue #604):

1. ManagementWebSecurityAutoConfiguration applied HTTP Basic auth to the
   management port (8081), causing Prometheus to receive 401 HTML responses
   instead of metrics. Excluded the auto-config — the Docker network
   (archiv-net) provides the security boundary for this internal port.

2. promtail-config.yml had no `job` relabel rule. Grafana's Loki dashboards
   query {job="$app"} which matched nothing; logs were in Loki under
   compose_service but invisible to every dashboard panel.

3. prometheus.yml had a stale comment claiming the spring-boot target would
   be DOWN until micrometer-registry-prometheus was added — it has been
   present in pom.xml for some time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
7712180f3a docs(claude): add generation guidance to GRAFANA_ADMIN_PASSWORD env var
All checks were successful
CI / fail2ban Regex (push) Successful in 40s
CI / Unit & Component Tests (pull_request) Successful in 3m5s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 2m35s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
CI / Unit & Component Tests (push) Successful in 3m1s
CI / OCR Service Tests (push) Successful in 18s
CI / Backend Unit Tests (push) Successful in 2m38s
CI / Compose Bucket Idempotency (push) Successful in 57s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:25:25 +02:00
Marcel
c9a22945c8 docs(claude): add URL format example to GLITCHTIP_DOMAIN env var
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:25:01 +02:00
Marcel
9d84ebc4fe docs(deployment): add VITE_SENTRY_DSN to §3.3 Gitea secrets table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:24:37 +02:00
Marcel
58b9204395 docs(deployment): add VITE_SENTRY_DSN to §2 observability env vars table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:15:24 +02:00
Marcel
0d662f3a5e docs(c4): update GlitchTip image tag to 6.1.6 in L2 container diagram
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:14:17 +02:00
Marcel
2e864e5b81 docs(infra): remove stale 'observability not yet deployed' note
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 2m42s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
Replace with a cross-reference to DEPLOYMENT.md §4 now that the obs
stack shipped as docker-compose.observability.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:54:04 +02:00
Marcel
40d9713b79 docs(deployment): fix stale GlitchTip image tags and add SENTRY_DSN to env vars table
- GlitchTip image corrected from glitchtip:v4 to glitchtip:6.1.6 in services table
- Grafana default port corrected from 3001 to 3003 in services table description
- SENTRY_DSN added to backend env vars table (wired in docker-compose.yml and application.yaml)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:53:31 +02:00
Marcel
68d07fe961 docs(claude): add observability service table and env var reference
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:52:36 +02:00
Marcel
6145a25fe2 fix(obs): correct GlitchTip port and healthcheck for v6.x
Some checks failed
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
GlitchTip 6.x moved its internal listen port from 8080 to 8000.
The ports mapping was forwarding to the wrong port (host traffic
never reached the app), and the healthcheck was probing 8080 with
wget (not present in the image), causing the container to stay
permanently unhealthy.

Fix: map to port 8000, check with bash /dev/tcp (no external tools
needed, available in the Python base image).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:31:07 +02:00
Marcel
c43f45a472 Merge branch 'fix/issue-601-obs-stack-permanent'
Some checks failed
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
2026-05-16 10:19:59 +02:00
Marcel
134f1e2ae0 chore(runner): mount /opt/familienarchiv into job containers
The live runner config was missing /opt/familienarchiv in valid_volumes
and options, so deploy steps wrote files into the ephemeral job
container rather than the host — silently discarded on exit.

Updated /root/docker/gitea/runner-config.yaml on the server and
restarted gitea-runner. Repo file now matches the server exactly,
including the network: gitea_gitea setting that was previously
only on the server.

DEPLOYMENT.md: clarifies that /opt/familienarchiv does not need to be
in the runner container's own volumes (DooD spawns job containers from
the host daemon directly); updates restart command from systemctl to
docker restart; narrows the cp-r stale-file note to manual ops only
(CI uses rm -rf before copying).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:19:09 +02:00
Marcel
55ccd5f3c0 ci(obs): replace rsync with rm+cp in deploy step
rsync is not present in the act_runner job container image. rm -rf +
cp -r gives identical semantics (including removal of deleted files)
using only coreutils, which are always available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:18:42 +02:00
3658733003 fix(obs): add GlitchTip healthcheck on /_health/ (port 8080)
Some checks failed
CI / Unit & Component Tests (push) Waiting to run
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / OCR Service Tests (push) Successful in 42s
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
2026-05-16 09:37:17 +02:00
0bb0a314ad ci(obs): add obs-glitchtip to health assertion loop (now has /_health/ healthcheck)
Some checks are pending
CI / Unit & Component Tests (pull_request) Waiting to run
CI / OCR Service Tests (pull_request) Waiting to run
CI / Backend Unit Tests (pull_request) Waiting to run
CI / fail2ban Regex (pull_request) Waiting to run
CI / Compose Bucket Idempotency (pull_request) Waiting to run
2026-05-16 09:36:37 +02:00
b194b565f6 ci(obs): reference #603 in keep-in-sync comments; add obs-glitchtip to health assertion
Some checks failed
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
2026-05-16 09:35:43 +02:00