Commit Graph

161 Commits

Author SHA1 Message Date
Marcel
1052295a6e docs(adr): add ADR-022 for CSRF, session revocation, and rate limiting
Documents the double-submit cookie CSRF pattern, sequential token-bucket
rate limiter with refund mechanic, and session revocation on password
change/reset — all implemented as part of issue #524.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 13:40:19 +02:00
Marcel
193a4d6ee6 docs(deployment): document ocr-volume-init bootstrap service in §8 upgrade notes
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m1s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m0s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
CI / Unit & Component Tests (push) Successful in 3m5s
CI / OCR Service Tests (push) Successful in 19s
CI / Backend Unit Tests (push) Successful in 3m1s
CI / fail2ban Regex (push) Successful in 43s
CI / Semgrep Security Scan (push) Successful in 18s
CI / Compose Bucket Idempotency (push) Successful in 59s
Explains what ocr-volume-init does (chown volumes + create TMPDIR), how to
verify it succeeded (docker logs), and what failure looks like. Addresses
reviewer concerns from @mkeller and @tobiwendt on PR #615.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 11:23:04 +02:00
Marcel
cfd49ff69e docs(ocr): document TMPDIR convention and add ADR-021
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m7s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 3m7s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 18s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
- ocr-service/README.md: add HF_HOME, XDG_CACHE_HOME, TORCH_HOME, TMPDIR rows
  to the environment variables table
- ocr-service/CLAUDE.md: LLM reminder — TMPDIR must stay on the cache volume
- docs/adr/021-tmpdir-persistent-volume-staging.md: records the decision,
  trade-offs, and rejected alternatives (Approach B / C) for issue #614
- ci.yml: add test_tmpdir.py to the OCR CI run (stdlib-only tests, no ML stack)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 10:58:10 +02:00
Marcel
9b21d6aee8 docs(c4): l3-security includes auth package and Spring Session JDBC
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m1s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 2m57s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
CI / Unit & Component Tests (push) Successful in 3m3s
CI / OCR Service Tests (push) Successful in 17s
CI / Backend Unit Tests (push) Successful in 2m58s
CI / fail2ban Regex (push) Successful in 42s
CI / Semgrep Security Scan (push) Successful in 19s
CI / Compose Bucket Idempotency (push) Successful in 58s
nightly / deploy-staging (push) Failing after 3m35s
Replace the stale Basic-Auth picture with the post-#523 model:
AuthSessionController + AuthService (the new auth/ package), Spring Session
JDBC (spring_session*, 8h idle timeout, fa_session cookie), and the
ChangeSessionIdAuthenticationStrategy bean used by login to defend against
session fixation. Addresses PR #612 / Markus M3.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 22:54:55 +02:00
Marcel
3438260090 docs: rewrite seq-auth-flow.puml for the Spring Session model (ADR-020)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m4s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m6s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Semgrep Security Scan (pull_request) Successful in 17s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
Removes the cookie-promotion step (auth_token → Authorization: Basic) and
splits the diagram into three labelled phases: Login, Authenticated
request, Logout. Adds the spring_session DB round-trip on every
authenticated request and the alt branch for an expired session
returning 401 → /login?reason=expired.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 20:59:44 +02:00
Marcel
de7053644b docs(adr): ADR-020 — stateful auth via Spring Session JDBC
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 19:15:51 +02:00
Marcel
7769dbc9f4 security(ocr): apply container hardening baseline to docker-compose.prod.yml
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m4s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 18s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
Mirror the CIS Docker §4.1/§4.6 hardening from docker-compose.yml to the
production/staging compose file, which is standalone (not an overlay).

- Fix cache volume mount path: ocr-cache:/root/.cache → /app/cache (matches
  the non-root user's HF_HOME/XDG_CACHE_HOME, avoids PermissionError)
- Add HF_HOME, XDG_CACHE_HOME, TORCH_HOME env vars so HuggingFace, ketos,
  and PyTorch all write to the declared writable volumes, not HOME
- Add read_only: true, tmpfs (/tmp:512m), cap_drop: [ALL],
  no-new-privileges:true — matching the dev baseline

Also extend DEPLOYMENT.md §8 upgrade notes to cover all three environments
(dev/production/staging), each with its correct project-namespaced volume name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:43:18 +02:00
Marcel
74ca5ee35f docs(adr): ADR-019 — container hardening baseline (non-root + read-only)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m2s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m11s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 17s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:33:06 +02:00
Marcel
38973a014e docs: add XDG_CACHE_HOME/TORCH_HOME to OCR env table and upgrade notes for PR #611
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:32:02 +02:00
Marcel
de19d17b00 docs(adr): add ADR-018 for GlitchTip frontend error tracking via @sentry/sveltekit
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m4s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m39s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
CI / Unit & Component Tests (push) Successful in 5m46s
CI / OCR Service Tests (push) Successful in 45s
CI / Backend Unit Tests (push) Failing after 10m32s
CI / fail2ban Regex (push) Successful in 3m7s
CI / Compose Bucket Idempotency (push) Successful in 2m26s
Documents the decision to use the Sentry SDK with self-hosted GlitchTip,
sendDefaultPii:false rationale, errorId surfacing to users, and alternatives
considered (Sentry SaaS rejected for data-minimisation reasons).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:27:46 +02:00
Marcel
2023ea2931 docs(c4): add GlitchTip as external error-tracking system to L1 context diagram
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 2m43s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 09:46:17 +02:00
Marcel
e61409773e docs(c4): fix Tempo OTLP transport in l2-containers diagram
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m1s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m37s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
CI / Unit & Component Tests (push) Successful in 3m1s
CI / OCR Service Tests (push) Successful in 16s
CI / Backend Unit Tests (push) Successful in 2m38s
CI / fail2ban Regex (push) Successful in 39s
CI / Compose Bucket Idempotency (push) Successful in 58s
nightly / deploy-staging (push) Failing after 1m52s
Port 4317 is gRPC; the backend uses HttpExporter (HTTP/1.1) and sends
to port 4318. Update Container description and Rel label to match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:48:06 +02:00
Marcel
7713a03cd5 docs(obs): add OBSERVABILITY.md developer guide and fix stale env var docs
- New docs/OBSERVABILITY.md: developer-facing guide with a "where to look
  for what" table, common LogQL queries, trace exploration workflow,
  log→trace correlation via traceId links, and a signal summary table
- Link from DEPLOYMENT.md §4 (ops section now points to dev guide) and
  from CLAUDE.md Infrastructure section
- Fix stale DEPLOYMENT.md env var table: OTEL_EXPORTER_OTLP_ENDPOINT
  now documents port 4318 (HTTP) not 4317 (gRPC); add the three new
  env vars wired in this PR (OTEL_LOGS_EXPORTER, OTEL_METRICS_EXPORTER,
  MANAGEMENT_METRICS_TAGS_APPLICATION) with their rationale
- Fix stale obs-tempo service description (port 4318, not 4317)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:48:06 +02:00
Marcel
bd57310bbf docs(obs): document promtail job label mapping in DEPLOYMENT.md
The job label (derived from the Docker Compose service name) is what
powers {job="backend"} queries in Loki dashboards and populates the
Grafana "App" variable dropdown. Operators need to know this mapping
when writing custom Loki queries.

Addresses @markus non-blocker suggestion from PR #606 review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
c2d092f435 docs(adr): add ADR-017 — Spring Boot 4.0 management port shares main security filter chain
Documents the architectural decision behind the dedicated management
SecurityFilterChain, the discovery that SB4+Jetty removed the isolated
management child-context security, and the consequences for actuator
endpoint exposure.

Addresses @markus blocker from PR #606 review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 15:46:45 +02:00
Marcel
9d84ebc4fe docs(deployment): add VITE_SENTRY_DSN to §3.3 Gitea secrets table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:24:37 +02:00
Marcel
58b9204395 docs(deployment): add VITE_SENTRY_DSN to §2 observability env vars table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:15:24 +02:00
Marcel
0d662f3a5e docs(c4): update GlitchTip image tag to 6.1.6 in L2 container diagram
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:14:17 +02:00
Marcel
2e864e5b81 docs(infra): remove stale 'observability not yet deployed' note
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 2m42s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
Replace with a cross-reference to DEPLOYMENT.md §4 now that the obs
stack shipped as docker-compose.observability.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:54:04 +02:00
Marcel
40d9713b79 docs(deployment): fix stale GlitchTip image tags and add SENTRY_DSN to env vars table
- GlitchTip image corrected from glitchtip:v4 to glitchtip:6.1.6 in services table
- Grafana default port corrected from 3001 to 3003 in services table description
- SENTRY_DSN added to backend env vars table (wired in docker-compose.yml and application.yaml)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:53:31 +02:00
Marcel
134f1e2ae0 chore(runner): mount /opt/familienarchiv into job containers
The live runner config was missing /opt/familienarchiv in valid_volumes
and options, so deploy steps wrote files into the ephemeral job
container rather than the host — silently discarded on exit.

Updated /root/docker/gitea/runner-config.yaml on the server and
restarted gitea-runner. Repo file now matches the server exactly,
including the network: gitea_gitea setting that was previously
only on the server.

DEPLOYMENT.md: clarifies that /opt/familienarchiv does not need to be
in the runner container's own volumes (DooD spawns job containers from
the host daemon directly); updates restart command from systemctl to
docker restart; narrows the cp-r stale-file note to manual ops only
(CI uses rm -rf before copying).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:19:09 +02:00
Marcel
a7f60ebed8 docs(obs): add cp-r stale-file cleanup note to DEPLOYMENT.md
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 5m39s
CI / OCR Service Tests (pull_request) Successful in 46s
CI / Backend Unit Tests (pull_request) Failing after 9m24s
CI / fail2ban Regex (pull_request) Successful in 2m52s
CI / Compose Bucket Idempotency (pull_request) Successful in 2m24s
CI uses 'cp -r' which does not remove deleted files. Documents the
manual cleanup step for config files removed from git.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:04:41 +02:00
Marcel
4c5ee96e36 docs(adr): correct ADR-016 Decision section to match two-source env model
The Decision section described an operator-managed /opt/familienarchiv/.env
that CI does not touch. The actual implementation is a two-source model:
obs.env (git-tracked, non-secret config) + obs-secrets.env (CI-written
fresh from Gitea secrets on every deploy). Also updates the Consequences
bullet that incorrectly stated secrets are decoupled from CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:52:42 +02:00
Marcel
d83ed7254d docs(obs): document obs vs main stack env model, obs.env + obs-secrets.env approach
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:20:21 +02:00
Marcel
4e94d85d7e docs(adr): add ADR-016 for obs stack co-location and CI-push config sync
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:00:07 +02:00
Marcel
dec6b8139b docs(c4): update l2-containers obs boundary to show /opt/familienarchiv/ permanent path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:59:11 +02:00
Marcel
7b7d0c92a8 docs(obs): update DEPLOYMENT.md with /opt/familienarchiv/ ops section, env keys, runner restart
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:58:42 +02:00
Marcel
e4ac5f08e7 docs(ci): document workspace bind-mount setup for DooD runners
Some checks failed
CI / OCR Service Tests (pull_request) Successful in 59s
CI / Unit & Component Tests (push) Has been cancelled
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Backend Unit Tests (pull_request) Successful in 5m22s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
CI / Unit & Component Tests (pull_request) Failing after 14m44s
Add the /srv/gitea-workspace prerequisite step to DEPLOYMENT.md §3.1
and a new "Workspace bind-mount setup" subsection plus failure mode 4
to ci-gitea.md, covering the root cause, one-time host setup, disk
management, and troubleshooting for the bind-mount resolution fix
introduced in ADR-015.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:46:54 +02:00
Marcel
15ef079eff docs(adr): add ADR-015 for DooD workspace bind-mount approach
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m36s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m7s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
Documents the decision to use workdir_parent + identical host<->container
path instead of the overlay2 MergedDir sync that was in the initial fix.
Captures the alternatives (nsenter sync, image-baked configs, path mismatch)
and the operational consequences (prereq directory, out-of-band compose.yaml).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:38:18 +02:00
Marcel
553e2f8898 docs(deployment): add observability secrets to §3.3 Gitea secrets table
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 5m35s
CI / OCR Service Tests (pull_request) Successful in 33s
CI / Backend Unit Tests (pull_request) Successful in 7m10s
CI / fail2ban Regex (pull_request) Successful in 1m54s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m39s
GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, and SENTRY_DSN were
referenced in the workflow env files but absent from the secrets table,
leaving the first-run operator without a complete checklist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 13:46:01 +02:00
Marcel
67004737f6 fix(observability): define obs_glitchtip_worker Container in C4 diagram
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 5m45s
CI / OCR Service Tests (pull_request) Successful in 36s
CI / Backend Unit Tests (pull_request) Failing after 23m49s
CI / fail2ban Regex (pull_request) Successful in 2m13s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m46s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 04:43:09 +02:00
Marcel
3ced565aa2 docs(observability): document GlitchTip services in DEPLOYMENT.md and C4 diagram
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 5m53s
CI / OCR Service Tests (pull_request) Successful in 32s
CI / Backend Unit Tests (pull_request) Failing after 23m39s
CI / fail2ban Regex (pull_request) Successful in 2m13s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m55s
Adds GlitchTip env vars to the observability env var table, extends the
services table, and adds a first-run section with superuser creation and
project setup steps. Updates the C4 L2 container diagram with GlitchTip
and Redis containers and their relationships.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 04:38:47 +02:00
Marcel
c99321e5cf docs(observability): document Grafana in DEPLOYMENT.md and C4 diagram
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 4m7s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 5m41s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Add Grafana row to the observability services table, Grafana access details
(URL, credentials, auto-provisioned datasources, pre-loaded dashboards), and
GRAFANA_ADMIN_PASSWORD to the env vars table in DEPLOYMENT.md.
Update C4 l2-containers.puml: replace placeholder Grafana entry with pinned
image version, expand observability boundary with node_exporter and cadvisor
containers, and add Rel() edges for Grafana → Prometheus, Loki, and Tempo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 04:04:09 +02:00
Marcel
6ce6122384 docs: add OTEL and tracing env vars to DEPLOYMENT.md
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m22s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Failing after 2m33s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 54s
2026-05-15 03:29:38 +02:00
Marcel
de08ffe989 devops(observability): add Tempo for distributed trace storage (OTLP receiver)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m22s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 4m32s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 56s
Closes #575

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 03:01:22 +02:00
Marcel
c1406a32f1 devops(observability): fix C4 diagram, security comment, and add Loki compactor block
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m22s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m33s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 56s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 02:25:34 +02:00
Marcel
22e1b25398 devops(observability): add Loki + Promtail for centralised container log aggregation
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m21s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m31s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
- Add obs-loki (grafana/loki:3.4.2) to docker-compose.observability.yml
  with healthcheck (wget /ready), expose-only port 3100, named volume loki_data
- Add obs-promtail (grafana/promtail:3.4.2) bridging archiv-net + obs-net,
  depends_on loki service_healthy, docker.sock:ro, promtail_positions volume
  for restart-safe position tracking
- Create infra/observability/loki/loki-config.yml: single-node TSDB schema v13,
  30-day retention, auth disabled (obs-net only), telemetry off
- Create infra/observability/promtail/promtail-config.yml: Docker SD scrape,
  container_name / compose_service / compose_project / logstream labels
- Update docs/DEPLOYMENT.md §4 with service table and Loki quick-check commands

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 02:18:22 +02:00
Marcel
0c66f6298b devops(observability): fix Prometheus port binding, scrape port, and update DEPLOYMENT.md
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m21s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m35s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
- Fix spring-boot scrape target from backend:8080 to backend:8081 (actuator/management port)
- Restrict Prometheus host port binding to 127.0.0.1 to prevent unintended external exposure
- Add observability stack (Prometheus, Node Exporter, cAdvisor) to topology description
- Add PORT_PROMETHEUS env var to DEPLOYMENT.md reference table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 01:52:28 +02:00
Marcel
cf8d22d81b docs: update DEPLOYMENT.md and C4 diagram for observability scaffold
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m31s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m31s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
Replace the stale "no monitoring infrastructure in place yet" note in
§4 with a brief description of the observability compose file and a
pointer to issue #581 for full docs.

Add a placeholder System_Boundary block for Prometheus + Loki + Grafana
to l2-containers.puml, showing the stack joins archiv-net.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 01:28:14 +02:00
Marcel
e2632a556d docs: align ErrorCode 4-step checklist in CLAUDE.md; note frontend sync in ARCHITECTURE.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 19:26:53 +02:00
Marcel
6210480952 docs(ci-gitea): replace '← upgraded from v3' with ADR-014 pin comment
Lines 203, 230, and 332 carried comments that actively encouraged
the regression (they read as if v4 is the canonical target). Replaced
with the correct pinned-at-v3 comment referencing ADR-014.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 10:58:19 +02:00
Marcel
e17f4110f1 docs(adr-014): record upload-artifact v3 pin and Gitea act_runner v4 limitation
Documents the three-incident history, the enforcement layers (inline
comments + grep guard + ADR), how to spot the symptom, and the explicit
upgrade trigger (act_runner v4 protocol support OR v3 CVE).

Cross-references ADR-011 (single-tenant Gitea runner) and #557.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 10:58:19 +02:00
Marcel
582191d014 docs(adr-013): set exact baseline to 75% (confirmed in CI)
Some checks failed
CI / Backend Unit Tests (pull_request) Successful in 4m20s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Failing after 11s
CI / Unit & Component Tests (pull_request) Failing after 2m35s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Unit & Component Tests (push) Failing after 2m35s
CI / OCR Service Tests (push) Successful in 17s
CI / Backend Unit Tests (push) Successful in 4m18s
CI / fail2ban Regex (push) Successful in 39s
CI / Compose Bucket Idempotency (push) Failing after 11s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 09:03:45 +02:00
Marcel
118100e58d chore(coverage): drop client branches threshold 80→75, add ADR-013
Some checks failed
CI / Unit & Component Tests (push) Has been cancelled
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (pull_request) Failing after 2m35s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m21s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Failing after 11s
Branches gate was blocking CI at 75% measured coverage. The 80% floor
suffers Istanbul parent/child denominator coupling (long-tail grind, per
#496) that makes the remaining gap disproportionately costly to close.
Drop branches to 75 to match current state; leave lines/functions/
statements at 80. ADR-013 documents the rationale and the ratchet rule
for raising the gate back incrementally.

Closes #556

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 08:54:59 +02:00
Marcel
2e6cc346ab docs(adr-012): document duplicate-id hazard and patch-package backport
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m33s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 4m12s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Failing after 11s
CI / Unit & Component Tests (push) Failing after 2m33s
CI / OCR Service Tests (push) Successful in 16s
CI / Backend Unit Tests (push) Successful in 4m16s
CI / fail2ban Regex (push) Successful in 38s
CI / Compose Bucket Idempotency (push) Failing after 11s
nightly / deploy-staging (push) Failing after 1m46s
Adds a second binding invariant section to ADR-012 covering the
duplicate-id mechanism named in #553's follow-up investigation: same
resolved module URL referenced via two distinct vi.mock id strings →
@vitest/browser-playwright leaks an orphan Playwright route → birpc-closed
crash in the next session.

Records the rule (one canonical id per mocked module, prefer the spelling
production uses, no-extension for .svelte rune modules), the in-suite
detector (no-duplicate-mock-ids.test.ts), and the patch-package backport
of vitest PR #10267 with its removal trigger.

Extends the existing Consequences enforcement list from four layers to
six, adding the duplicate-id detector and the patch-package layer.

Refs: #553 · vitest-dev/vitest#9957 · vitest-dev/vitest#10267

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 08:09:57 +02:00
Marcel
addf5c98db docs(adr-012): record the sync-factory invariant and the $app/state migration
Some checks failed
CI / Unit & Component Tests (push) Failing after 1m49s
CI / OCR Service Tests (push) Successful in 17s
CI / Backend Unit Tests (push) Successful in 4m13s
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
The previous revision allowed vi.mock for virtual modules on the "consumer
import is static" argument. #553 proved that argument wrong: a statically-
imported module with an async factory body whose dynamic import landed
after teardown still produced the race. The factory body — not the
consumer — is the failure surface.

- Drop the "residual exceptions" table.
- Add the binding invariant: factory bodies under `**/*.svelte.{test,spec}.ts`
  must be synchronous (no `await`, no `import(...)`).
- Document the canonical vi.hoisted + getter pattern, with file references.
- Record the $app/stores → $app/state architectural call (Markus's
  recommendation), removing one of the last two deprecated-import
  outliers.
- Record the preload-data=off hardening (Tobias's recommendation) as a
  pattern note.
- Update the Enforcement section to list all four defence layers (ESLint,
  CI grep, in-suite meta-test, CI birpc assert) and the coverage-flake-
  probe verification workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:14:31 +02:00
Marcel
3005782a75 docs(adr-012): correct pattern note to document button+goto, not anchor+preventDefault
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 17:43:11 +02:00
Marcel
7c8811e439 refactor(eslint): ban vi.mock('pdfjs-dist', ...) in spec/test files — ADR 012
Adds a no-restricted-syntax rule scoped to *.spec.ts / *.test.ts that
flags any vi.mock call whose first argument starts with 'pdfjs-dist'.
Turns the ~2-min CI wait into an immediate lint error on save.
Updates ADR 012 Enforcement section to document the rule.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 12:32:23 +02:00
Marcel
763e9f5708 docs(adr-012): add overlay navigation pattern note
Some checks failed
CI / Backend Unit Tests (push) Successful in 4m10s
CI / fail2ban Regex (push) Successful in 38s
CI / Compose Bucket Idempotency (push) Failing after 11s
CI / Unit & Component Tests (push) Failing after 1m50s
CI / OCR Service Tests (push) Successful in 16s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 11:35:40 +02:00
Marcel
7f07180c71 docs(adr-012): add enforcement note and libLoader revisit signal
Some checks failed
CI / OCR Service Tests (push) Successful in 15s
CI / Unit & Component Tests (push) Failing after 2m4s
CI / Backend Unit Tests (push) Successful in 4m6s
CI / fail2ban Regex (push) Successful in 38s
CI / Compose Bucket Idempotency (push) Failing after 10s
- Explicitly states no lint rule is planned; CI guard is the backstop
  (addresses Elicit OQ-001 from PR #536 round 4)
- Adds a "when to revisit" note: extract shared DynamicImportLoader<T>
  if 3+ components adopt the libLoader pattern
  (addresses Markus Keller round-4 observation on PR #536)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 09:57:28 +02:00