Commit Graph

2575 Commits

Author SHA1 Message Date
Marcel
a77b0c1221 feat(auth): AuthService — login/logout with audit logging and timing-safe credential rejection
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 19:21:46 +02:00
Marcel
393a3c25fd feat(auth): add INVALID_CREDENTIALS + SESSION_EXPIRED error codes; LOGIN_SUCCESS/FAILED/LOGOUT audit kinds
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 19:19:01 +02:00
Marcel
8c7a2741b0 feat(auth): configure Spring Session JDBC (fa_session, 8h idle, SameSite=strict)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 19:18:28 +02:00
Marcel
865c6ed796 feat(auth): add spring-session-jdbc 4.0.3 dependency
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 19:17:46 +02:00
Marcel
14542b6e33 migration: V67 — recreate spring_session tables (ADR-020)
Re-introduces tables dropped by V2. Canonical DDL from Spring
Session 3.x schema-postgresql.sql.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 19:16:25 +02:00
Marcel
de7053644b docs(adr): ADR-020 — stateful auth via Spring Session JDBC
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 19:15:51 +02:00
Marcel
f1e0b92f47 style(ocr): normalize cap_drop to block notation in docker-compose.yml
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m10s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
CI / Unit & Component Tests (push) Successful in 3m2s
CI / OCR Service Tests (push) Successful in 18s
CI / Backend Unit Tests (push) Successful in 3m0s
CI / fail2ban Regex (push) Successful in 42s
CI / Semgrep Security Scan (push) Successful in 18s
CI / Compose Bucket Idempotency (push) Successful in 1m1s
Aligns with the block sequence style used in docker-compose.prod.yml and
the rest of the compose file, removing the inline [ALL] inconsistency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 18:54:24 +02:00
Marcel
bead6f1811 fix(ocr): handle empty-string HTRMOPO_DIR env var with or-fallback
os.environ.get(key, default) returns "" when the key exists but is blank —
the default is only used when the key is absent. The or-fallback treats both
absence and blank values as "use the default".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 18:53:26 +02:00
Marcel
7769dbc9f4 security(ocr): apply container hardening baseline to docker-compose.prod.yml
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m4s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 18s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
Mirror the CIS Docker §4.1/§4.6 hardening from docker-compose.yml to the
production/staging compose file, which is standalone (not an overlay).

- Fix cache volume mount path: ocr-cache:/root/.cache → /app/cache (matches
  the non-root user's HF_HOME/XDG_CACHE_HOME, avoids PermissionError)
- Add HF_HOME, XDG_CACHE_HOME, TORCH_HOME env vars so HuggingFace, ketos,
  and PyTorch all write to the declared writable volumes, not HOME
- Add read_only: true, tmpfs (/tmp:512m), cap_drop: [ALL],
  no-new-privileges:true — matching the dev baseline

Also extend DEPLOYMENT.md §8 upgrade notes to cover all three environments
(dev/production/staging), each with its correct project-namespaced volume name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:43:18 +02:00
Marcel
74ca5ee35f docs(adr): ADR-019 — container hardening baseline (non-root + read-only)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m2s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m11s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 17s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:33:06 +02:00
Marcel
38973a014e docs: add XDG_CACHE_HOME/TORCH_HOME to OCR env table and upgrade notes for PR #611
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:32:02 +02:00
Marcel
fc8b4b164b security(ocr): redirect XDG cache and Torch home away from read-only HOME
Prevents PyTorch/Matplotlib/Ketos from writing to /home/ocr which is
on the read-only container filesystem — fixes Nora's blocker. Also
restores the explanatory comment on the ocr_cache volume mount.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:30:39 +02:00
Marcel
eb63df2000 test(ocr): add startup root canary tests for main.py lifespan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:29:47 +02:00
Marcel
53bd574660 test(ocr): replace vacuous startswith assertion with equality check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:26:58 +02:00
Marcel
581ba01d8d security(ocr): log warning on startup when running as root
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m10s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
Adds a canary log line if os.getuid() == 0. Produces an observable
signal in container logs if the USER directive is ever removed from
the Dockerfile, without requiring an external audit tool.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:51:00 +02:00
Marcel
9db42d6cc1 fix(ocr): resolve HTRMOPO_DIR from env var, not ~ expansion
With --no-create-home, os.path.expanduser("~") resolves to "/" causing
kraken get to write to /.local/share/htrmopo. Replace with
os.environ.get("HTRMOPO_DIR", "/app/models/.htrmopo") so the path is
explicit and override-friendly without a home directory.

Adds two tests verifying env-var resolution and ~-free default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:49:21 +02:00
Marcel
ab24786d2a security(ocr): harden compose — fix cache volume path, add read_only + cap_drop
Move ocr_cache mount from /root/.cache to /app/cache (correct path for
non-root user). Add HF_HOME so Hugging Face resolves to the same path.
Add runtime hardening: read_only, tmpfs /tmp (512 MB cap), cap_drop ALL,
no-new-privileges.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:47:18 +02:00
Marcel
1aca4c4a41 security(ocr): add non-root user and set HOME/HF_HOME in Dockerfile
CIS Docker §4.1: run uvicorn as UID 1000 (ocr) instead of root.
Creates /home/ocr and /app/cache with correct ownership so named
volumes inherit ocr:ocr on first Docker mount. Sets HOME and HF_HOME
so ~ expansion and Hugging Face caching resolve under /app, not /root.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:46:25 +02:00
Marcel
669eaa7c65 fix(ci): pin semgrep version, add pip cache, harden rule severity
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m2s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m55s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 18s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
CI / Unit & Component Tests (push) Successful in 3m3s
CI / OCR Service Tests (push) Successful in 19s
CI / Backend Unit Tests (push) Successful in 2m56s
CI / fail2ban Regex (push) Successful in 40s
CI / Semgrep Security Scan (push) Successful in 17s
CI / Compose Bucket Idempotency (push) Successful in 59s
- Pin semgrep to 1.163.0 to prevent silent upgrades breaking the scan
- Add cache: 'pip' to setup-python@v5 for faster CI runs
- Promote all three XXE Semgrep rules from WARNING to ERROR to match
  the --error CI flag intent
- Update SAX/StAX rule messages to reference XxeSafeXmlParser and
  the OWASP XXE prevention cheat sheet
- Remove stale issue reference from regression test comment
- Document XML metacharacter constraint on buildValidOds test helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:18:03 +02:00
Marcel
f15ea031d1 ci(security): add Semgrep XXE rule and CI scan job
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m2s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m3s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Semgrep Security Scan (pull_request) Successful in 1m11s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Add .semgrep/security.yml with rules for DocumentBuilderFactory,
SAXParserFactory, and XMLInputFactory without XXE hardening (CWE-611).
Add semgrep-scan CI job — runs in parallel with backend-unit-tests,
local rules only, --error flag fails the build on any match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 14:48:46 +02:00
Marcel
25a39fca9c security(import): harden DocumentBuilderFactory against XXE in MassImportService
Extract XxeSafeXmlParser with all 6 OWASP-recommended features
(disallow-doctype-decl, external-general-entities, external-parameter-entities,
load-external-dtd, XInclude, expandEntityReferences). Make readOds()
package-private; add failing-then-passing regression test and valid-ODS guard test.

POI 5.5.0 does not mitigate this: the vulnerable parser is a custom
DocumentBuilderFactory call in readOds(), not inside POI's internal ODS reader.
The hardening is defence-in-depth, not redundant with POI defaults.

Closes #528

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 14:48:03 +02:00
Marcel
e398133907 security(deps): bump Spring Boot 4.0.0 → 4.0.6 and OWASP sanitizer 20240325.1 → 20260101.1
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m6s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 3m8s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
CI / Unit & Component Tests (push) Successful in 3m5s
CI / OCR Service Tests (push) Successful in 18s
CI / Backend Unit Tests (push) Successful in 2m57s
CI / fail2ban Regex (push) Successful in 39s
CI / Compose Bucket Idempotency (push) Successful in 1m0s
Clears 2 CRITICAL CVEs (CVE-2026-40976, CVE-2026-22732) and 17 HIGH CVEs
in Netty, Jetty, Spring Security, and Spring Boot itself. Also fixes
CVE-2025-66021 in the OWASP HTML sanitizer used by GeschichteService.

JaCoCo threshold ratcheted to 0.77 (actual measured coverage; previous
0.88 gate was never enforced since CI ran clean test not clean verify).
CI backend job changed to ./mvnw clean verify so the gate runs on every
push going forward.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 12:55:12 +02:00
Marcel
186535f8c9 test(security): add ActuatorSecurityTest to guard auth boundaries
Tests that /actuator/health is accessible without credentials and
/actuator/env requires authentication — permanent regression guards
against CVE-2026-40976-class Actuator filter chain bypass bugs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 12:45:28 +02:00
Marcel
de19d17b00 docs(adr): add ADR-018 for GlitchTip frontend error tracking via @sentry/sveltekit
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m4s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m39s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
CI / Unit & Component Tests (push) Successful in 5m46s
CI / OCR Service Tests (push) Successful in 45s
CI / Backend Unit Tests (push) Failing after 10m32s
CI / fail2ban Regex (push) Successful in 3m7s
CI / Compose Bucket Idempotency (push) Successful in 2m26s
Documents the decision to use the Sentry SDK with self-hosted GlitchTip,
sendDefaultPii:false rationale, errorId surfacing to users, and alternatives
considered (Sentry SaaS rejected for data-minimisation reasons).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:27:46 +02:00
Marcel
b2e31c3c1b refactor(observability): lower trace sample rate, add DSN comment, improve status visibility
- Lower tracesSampleRate from 1.0 to 0.1 in both hooks (errors still captured
  at 100%; trace volume reduced for self-hosted GlitchTip on shared VPS)
- Add comment explaining VITE_SENTRY_DSN is a write-only ingest key, safe in
  client bundle — prevents accidental rotation as if it were a password
- Restore HTTP status code prominence: text-4xl font-bold (was text-xs text-ink-3)
- Add min-w-[44px] to copy button for WCAG 2.2 minimum touch target

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:27:01 +02:00
Marcel
9e23620072 refactor(observability): add hooks.server.ts to coverage include in vite.config.ts
The handleError callback in hooks.server.ts is now gated by the 80% branch
coverage threshold along with the rest of the server-side logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:26:03 +02:00
Marcel
af42113fca test(observability): add hooks.client.test.ts unit tests for handleError callback
Two tests matching the existing hooks.server.test.ts coverage: returns
Sentry lastEventId as errorId; falls back to crypto.randomUUID when
lastEventId returns undefined.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:25:30 +02:00
Marcel
c779ec59f9 feat(observability): guard navigator.clipboard and handle rejection in copyId
Adds availability guard (navigator.clipboard may be undefined in non-HTTPS
contexts) and a rejection handler so clipboard-denied errors are silently
caught rather than becoming unhandled promise rejections. Tests cover the
success feedback and the silent-failure path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:24:35 +02:00
Marcel
2023ea2931 docs(c4): add GlitchTip as external error-tracking system to L1 context diagram
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 2m43s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:46:17 +02:00
Marcel
59b18039ed refactor(observability): remove console.log from tags proxy and enforce no-console lint rule
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:45:49 +02:00
Marcel
96ea7e6815 feat(observability): redesign +error.svelte with errorId display and copy button
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:44:32 +02:00
Marcel
dff81f7bfb feat(observability): add handleError callback to hooks.client.ts returning errorId
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:41:58 +02:00
Marcel
a9c82ec481 feat(observability): add handleError callback to hooks.server.ts returning errorId
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:41:24 +02:00
Marcel
97aa372094 feat(observability): add App.Error interface with errorId to app.d.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:40:12 +02:00
Marcel
e61409773e docs(c4): fix Tempo OTLP transport in l2-containers diagram
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m1s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m37s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
CI / Unit & Component Tests (push) Successful in 3m1s
CI / OCR Service Tests (push) Successful in 16s
CI / Backend Unit Tests (push) Successful in 2m38s
CI / fail2ban Regex (push) Successful in 39s
CI / Compose Bucket Idempotency (push) Successful in 58s
nightly / deploy-staging (push) Failing after 1m52s
Port 4317 is gRPC; the backend uses HttpExporter (HTTP/1.1) and sends
to port 4318. Update Container description and Rel label to match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:48:06 +02:00
Marcel
7713a03cd5 docs(obs): add OBSERVABILITY.md developer guide and fix stale env var docs
- New docs/OBSERVABILITY.md: developer-facing guide with a "where to look
  for what" table, common LogQL queries, trace exploration workflow,
  log→trace correlation via traceId links, and a signal summary table
- Link from DEPLOYMENT.md §4 (ops section now points to dev guide) and
  from CLAUDE.md Infrastructure section
- Fix stale DEPLOYMENT.md env var table: OTEL_EXPORTER_OTLP_ENDPOINT
  now documents port 4318 (HTTP) not 4317 (gRPC); add the three new
  env vars wired in this PR (OTEL_LOGS_EXPORTER, OTEL_METRICS_EXPORTER,
  MANAGEMENT_METRICS_TAGS_APPLICATION) with their rationale
- Fix stale obs-tempo service description (port 4318, not 4317)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:48:06 +02:00
Marcel
cea94ce260 fix(obs): disable OTLP metric export (Prometheus scrapes pull-model)
Tempo only handles traces; sending metrics to /v1/metrics returns 404.
Prometheus already scrapes Spring Boot metrics via the pull-model at
/actuator/prometheus, so OTLP metric push is redundant and noisy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
45a992f5a8 fix(obs): fix OTLP transport port and add application metrics tag
- Change OTEL default endpoint from port 4317 (gRPC) to 4318 (HTTP) to
  match Spring Boot's HttpExporter; sending HTTP/1.1 to a gRPC listener
  caused "Connection reset" errors
- Add otel.logs.exporter=none: Promtail captures Docker logs via the
  logging driver; sending logs to Tempo's OTLP endpoint (which only
  handles traces) produced 404 errors
- Add management.metrics.tags.application to every metric so Grafana's
  Spring Boot Observability dashboard (ID 17175) can filter by the
  application label_values() template variable
- Add MANAGEMENT_METRICS_TAGS_APPLICATION and OTEL_LOGS_EXPORTER env
  vars to docker-compose.prod.yml; production Tempo endpoint already
  uses 4318
- Add MANAGEMENT_TRACING_SAMPLING_PROBABILITY to prod compose with
  0.1 default to avoid 100% trace sampling in production

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
bd57310bbf docs(obs): document promtail job label mapping in DEPLOYMENT.md
The job label (derived from the Docker Compose service name) is what
powers {job="backend"} queries in Loki dashboards and populates the
Grafana "App" variable dropdown. Operators need to know this mapping
when writing custom Loki queries.

Addresses @markus non-blocker suggestion from PR #606 review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
c2d092f435 docs(adr): add ADR-017 — Spring Boot 4.0 management port shares main security filter chain
Documents the architectural decision behind the dedicated management
SecurityFilterChain, the discovery that SB4+Jetty removed the isolated
management child-context security, and the consequences for actuator
endpoint exposure.

Addresses @markus blocker from PR #606 review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
e19bd60984 fix(obs): add management security chain and split Prometheus IT tests
- Add @Order(1) managementFilterChain scoped to /actuator/** with explicit
  401 entry point, blocking all non-public actuator paths without the
  form-login redirect that the main chain uses for browser clients.
- Split single combined test into two focused assertions
  (prometheus_endpoint_returns_200_without_credentials,
   prometheus_endpoint_returns_jvm_metrics).
- Add negative regression test: actuator_metrics_requires_authentication
  verifies that /actuator/metrics returns 401 without credentials.

Addresses reviewer concerns from @sara (missing negative test, split
assertions) and @nora (dedicated management security layer).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
2aa0ff9e70 fix(obs): wire Prometheus endpoint for Spring Boot 4.0
Four Spring Boot 4.0-specific issues prevented /actuator/prometheus from working:

1. spring-boot-starter-micrometer-metrics missing — Spring Boot 4.0 splits
   Micrometer metrics export (including the Prometheus scrape endpoint) out of
   spring-boot-starter-actuator into its own starter. Added dependency.

2. management.prometheus.metrics.export.enabled not set — Spring Boot 4.0
   defaults metrics export to false (opt-in). Added the property to
   application.yaml.

3. SecurityConfig did not permit /actuator/prometheus — Spring Boot 4.0
   with Jetty serves the management port (8081) via the same security filter
   chain as the main port (8080). The previous commit's exclusion of
   ManagementWebSecurityAutoConfiguration was a no-op (that class no longer
   exists in Spring Boot 4.0); removed it and added the correct permitAll()
   rule. Updated the architecture comment in application.yaml to reflect the
   true filter-chain behaviour.

4. Reverted invalid FamilienarchivApplication.java change from the prior
   commit (ManagementWebSecurityAutoConfiguration import compiled against a
   class that does not exist in the Spring Boot 4.0 BOM).

Also adds ActuatorPrometheusIT — an integration test that asserts the
/actuator/prometheus endpoint returns 200 with jvm_memory_used_bytes without
credentials, serving as regression protection against future Spring Boot
upgrades silently breaking metrics collection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
5dd74df293 fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards
Three root causes confirmed via live server investigation (issue #604):

1. ManagementWebSecurityAutoConfiguration applied HTTP Basic auth to the
   management port (8081), causing Prometheus to receive 401 HTML responses
   instead of metrics. Excluded the auto-config — the Docker network
   (archiv-net) provides the security boundary for this internal port.

2. promtail-config.yml had no `job` relabel rule. Grafana's Loki dashboards
   query {job="$app"} which matched nothing; logs were in Loki under
   compose_service but invisible to every dashboard panel.

3. prometheus.yml had a stale comment claiming the spring-boot target would
   be DOWN until micrometer-registry-prometheus was added — it has been
   present in pom.xml for some time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
7712180f3a docs(claude): add generation guidance to GRAFANA_ADMIN_PASSWORD env var
All checks were successful
CI / fail2ban Regex (push) Successful in 40s
CI / Unit & Component Tests (pull_request) Successful in 3m5s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 2m35s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
CI / Unit & Component Tests (push) Successful in 3m1s
CI / OCR Service Tests (push) Successful in 18s
CI / Backend Unit Tests (push) Successful in 2m38s
CI / Compose Bucket Idempotency (push) Successful in 57s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:25:25 +02:00
Marcel
c9a22945c8 docs(claude): add URL format example to GLITCHTIP_DOMAIN env var
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:25:01 +02:00
Marcel
9d84ebc4fe docs(deployment): add VITE_SENTRY_DSN to §3.3 Gitea secrets table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:24:37 +02:00
Marcel
58b9204395 docs(deployment): add VITE_SENTRY_DSN to §2 observability env vars table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:15:24 +02:00
Marcel
0d662f3a5e docs(c4): update GlitchTip image tag to 6.1.6 in L2 container diagram
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:14:17 +02:00
Marcel
2e864e5b81 docs(infra): remove stale 'observability not yet deployed' note
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 2m42s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
Replace with a cross-reference to DEPLOYMENT.md §4 now that the obs
stack shipped as docker-compose.observability.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:54:04 +02:00
Marcel
40d9713b79 docs(deployment): fix stale GlitchTip image tags and add SENTRY_DSN to env vars table
- GlitchTip image corrected from glitchtip:v4 to glitchtip:6.1.6 in services table
- Grafana default port corrected from 3001 to 3003 in services table description
- SENTRY_DSN added to backend env vars table (wired in docker-compose.yml and application.yaml)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:53:31 +02:00