Commit Graph

16 Commits

Author SHA1 Message Date
Marcel
6145a25fe2 fix(obs): correct GlitchTip port and healthcheck for v6.x
Some checks failed
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
GlitchTip 6.x moved its internal listen port from 8080 to 8000.
The ports mapping was forwarding to the wrong port (host traffic
never reached the app), and the healthcheck was probing 8080 with
wget (not present in the image), causing the container to stay
permanently unhealthy.

Fix: map to port 8000, check with bash /dev/tcp (no external tools
needed, available in the Python base image).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:31:07 +02:00
3658733003 fix(obs): add GlitchTip healthcheck on /_health/ (port 8080)
Some checks failed
CI / Unit & Component Tests (push) Waiting to run
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / OCR Service Tests (push) Successful in 42s
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
2026-05-16 09:37:17 +02:00
Marcel
f9baf02b86 feat(obs): add GF_SERVER_ROOT_URL to Grafana service
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:17:47 +02:00
Marcel
1181b97f94 fix(obs): make Postgres host configurable and fix PORT_GRAFANA default
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m6s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 2m43s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
POSTGRES_HOST variable (default: archive-db) lets the observability stack
connect to a different Postgres container — needed when only the staging
stack is running (container name: archiv-staging-db-1).

PORT_GRAFANA default changed from 3001 to 3003 to avoid collision with
the staging frontend which occupies 3001.

Closes #601

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 21:46:11 +02:00
Marcel
56c3e51657 fix(ci): replace overlay2 sync with workspace bind mount for DooD
runner-config.yaml: correct path to /srv/gitea-workspace (VPS, not Synology).
docker-compose.observability.yml: revert 5 bind mounts to plain relative paths;
  OBS_CONFIG_DIR variable is no longer needed.
nightly.yml / release.yml: remove OBS_CONFIG_DIR env injection and the
  "Sync observability configs to host" step from both workflows.

With workdir_parent=/srv/gitea-workspace and an identical host<->container
bind mount, $(pwd) inside job containers resolves to a real host path the
daemon can find — no privileged container, no overlay2 inspection, no nsenter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:36:55 +02:00
Marcel
1fc47888d5 fix(ci): sync observability configs to host before docker compose up (#598)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m26s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m40s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
DooD runner only shares /var/run/docker.sock — no workspace directory is
mapped to the host daemon. Relative bind mounts in
docker-compose.observability.yml resolved to paths that didn't exist on
the host; Docker auto-created directories in their place, causing
'not a directory' mount failures for all five config files.

Fix:
- docker-compose.observability.yml: replace hardcoded ./infra/observability/
  prefix with ${OBS_CONFIG_DIR:-./infra/observability} so the path is
  configurable while remaining backwards-compatible for local use.
- nightly.yml / release.yml: add a 'Sync observability configs to host'
  step that finds the job container's overlay2 MergedDir (the container's
  full filesystem as seen from the host mount namespace), then uses the
  existing nsenter/alpine pattern to cp the config tree into a stable host
  path (/srv/familienarchiv-{staging,production}/obs-configs).
  OBS_CONFIG_DIR is injected into the env file so Compose picks it up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:02:53 +02:00
Marcel
d435b2b0e4 fix(infra): pin GlitchTip image to 6.1.6 (v4 tag never existed)
All checks were successful
CI / Unit & Component Tests (push) Successful in 3m30s
CI / OCR Service Tests (push) Successful in 16s
CI / Backend Unit Tests (push) Successful in 2m34s
CI / fail2ban Regex (push) Successful in 40s
CI / Compose Bucket Idempotency (push) Successful in 57s
glitchtip/glitchtip:v4 is not a real tag — GlitchTip does not use a
v-prefix in its Docker image versioning. Latest stable release is 6.1.6.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 18:07:32 +02:00
Marcel
cd715029eb feat(observability): add GlitchTip error tracking to observability stack
Adds obs-glitchtip, obs-glitchtip-worker, obs-redis, and obs-glitchtip-db-init
services to docker-compose.observability.yml. The one-shot db-init container
creates the dedicated glitchtip database on the existing archive-db PostgreSQL
instance automatically on first stack start.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 04:38:06 +02:00
Marcel
457c1d3aee fix(observability): add grafana healthcheck and service_healthy depends_on
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 4m19s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 5m32s
CI / fail2ban Regex (pull_request) Successful in 48s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 04:09:13 +02:00
Marcel
f3f8345b03 feat(observability): add Grafana with provisioned datasources and dashboards
Add obs-grafana service (grafana/grafana-oss:11.6.1) to docker-compose.observability.yml.
Datasources (Prometheus, Loki, Tempo) are auto-provisioned via
infra/observability/grafana/provisioning/datasources/datasources.yml with
cross-datasource linking (Loki traceId → Tempo, Tempo → Loki, service map via Prometheus).
Three dashboards are pre-loaded: Node Exporter Full (1860), Spring Boot Observability (17175),
Loki Logs (13639) — datasource template variables replaced with provisioned UIDs.
GRAFANA_ADMIN_PASSWORD added to .env.example.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 04:03:33 +02:00
Marcel
de08ffe989 devops(observability): add Tempo for distributed trace storage (OTLP receiver)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m22s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 4m32s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 56s
Closes #575

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 03:01:22 +02:00
Marcel
c1406a32f1 devops(observability): fix C4 diagram, security comment, and add Loki compactor block
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m22s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m33s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 56s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 02:25:34 +02:00
Marcel
22e1b25398 devops(observability): add Loki + Promtail for centralised container log aggregation
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m21s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m31s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
- Add obs-loki (grafana/loki:3.4.2) to docker-compose.observability.yml
  with healthcheck (wget /ready), expose-only port 3100, named volume loki_data
- Add obs-promtail (grafana/promtail:3.4.2) bridging archiv-net + obs-net,
  depends_on loki service_healthy, docker.sock:ro, promtail_positions volume
  for restart-safe position tracking
- Create infra/observability/loki/loki-config.yml: single-node TSDB schema v13,
  30-day retention, auth disabled (obs-net only), telemetry off
- Create infra/observability/promtail/promtail-config.yml: Docker SD scrape,
  container_name / compose_service / compose_project / logstream labels
- Update docs/DEPLOYMENT.md §4 with service table and Loki quick-check commands

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 02:18:22 +02:00
Marcel
0c66f6298b devops(observability): fix Prometheus port binding, scrape port, and update DEPLOYMENT.md
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m21s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m35s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
- Fix spring-boot scrape target from backend:8080 to backend:8081 (actuator/management port)
- Restrict Prometheus host port binding to 127.0.0.1 to prevent unintended external exposure
- Add observability stack (Prometheus, Node Exporter, cAdvisor) to topology description
- Add PORT_PROMETHEUS env var to DEPLOYMENT.md reference table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 01:52:28 +02:00
Marcel
0c9973fdff devops(observability): add Prometheus + Node Exporter + cAdvisor for host and container metrics
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m22s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m40s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 01:47:07 +02:00
Marcel
1d42be9882 devops(observability): scaffold docker-compose.observability.yml and infra/observability/ structure
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m19s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m28s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 55s
Creates the skeleton observability stack (no running services yet) that all
subsequent Grafana LGTM + GlitchTip issues depend on:

- docker-compose.observability.yml: external archiv-net join, obs-net bridge,
  named volumes for all five services, placeholder comments for each service
  group (Metrics/Logs/Traces/Dashboards/Error Tracking), startup-order note
- infra/observability/{prometheus,loki,promtail,tempo,grafana/provisioning/{datasources,dashboards}}/.gitkeep
- .env.example: new # --- Observability --- section with PORT_GRAFANA,
  PORT_GLITCHTIP, PORT_PROMETHEUS, GLITCHTIP_DOMAIN, GLITCHTIP_SECRET_KEY
  (with generation hint), SENTRY_DSN, VITE_SENTRY_DSN

Verified: docker compose -f docker-compose.observability.yml config exits 0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 01:23:03 +02:00