Marcel
457c1d3aee
fix(observability): add grafana healthcheck and service_healthy depends_on
...
CI / Unit & Component Tests (pull_request) Successful in 4m19s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 5m32s
CI / fail2ban Regex (pull_request) Successful in 48s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-15 04:09:13 +02:00
Marcel
f3f8345b03
feat(observability): add Grafana with provisioned datasources and dashboards
...
Add obs-grafana service (grafana/grafana-oss:11.6.1) to docker-compose.observability.yml.
Datasources (Prometheus, Loki, Tempo) are auto-provisioned via
infra/observability/grafana/provisioning/datasources/datasources.yml with
cross-datasource linking (Loki traceId → Tempo, Tempo → Loki, service map via Prometheus).
Three dashboards are pre-loaded: Node Exporter Full (1860), Spring Boot Observability (17175),
Loki Logs (13639) — datasource template variables replaced with provisioned UIDs.
GRAFANA_ADMIN_PASSWORD added to .env.example.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-15 04:03:33 +02:00
Marcel
de08ffe989
devops(observability): add Tempo for distributed trace storage (OTLP receiver)
...
CI / Unit & Component Tests (pull_request) Successful in 3m22s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 4m32s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 56s
Closes #575
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-15 03:01:22 +02:00
Marcel
c1406a32f1
devops(observability): fix C4 diagram, security comment, and add Loki compactor block
...
CI / Unit & Component Tests (pull_request) Successful in 3m22s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m33s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 56s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-15 02:25:34 +02:00
Marcel
22e1b25398
devops(observability): add Loki + Promtail for centralised container log aggregation
...
CI / Unit & Component Tests (pull_request) Successful in 3m21s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m31s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
- Add obs-loki (grafana/loki:3.4.2) to docker-compose.observability.yml
with healthcheck (wget /ready), expose-only port 3100, named volume loki_data
- Add obs-promtail (grafana/promtail:3.4.2) bridging archiv-net + obs-net,
depends_on loki service_healthy, docker.sock:ro, promtail_positions volume
for restart-safe position tracking
- Create infra/observability/loki/loki-config.yml: single-node TSDB schema v13,
30-day retention, auth disabled (obs-net only), telemetry off
- Create infra/observability/promtail/promtail-config.yml: Docker SD scrape,
container_name / compose_service / compose_project / logstream labels
- Update docs/DEPLOYMENT.md §4 with service table and Loki quick-check commands
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-15 02:18:22 +02:00
Marcel
0c66f6298b
devops(observability): fix Prometheus port binding, scrape port, and update DEPLOYMENT.md
...
CI / Unit & Component Tests (pull_request) Successful in 3m21s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m35s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
- Fix spring-boot scrape target from backend:8080 to backend:8081 (actuator/management port)
- Restrict Prometheus host port binding to 127.0.0.1 to prevent unintended external exposure
- Add observability stack (Prometheus, Node Exporter, cAdvisor) to topology description
- Add PORT_PROMETHEUS env var to DEPLOYMENT.md reference table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-15 01:52:28 +02:00
Marcel
0c9973fdff
devops(observability): add Prometheus + Node Exporter + cAdvisor for host and container metrics
...
CI / Unit & Component Tests (pull_request) Successful in 3m22s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m40s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-15 01:47:07 +02:00
Marcel
1d42be9882
devops(observability): scaffold docker-compose.observability.yml and infra/observability/ structure
...
CI / Unit & Component Tests (pull_request) Successful in 3m19s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m28s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 55s
Creates the skeleton observability stack (no running services yet) that all
subsequent Grafana LGTM + GlitchTip issues depend on:
- docker-compose.observability.yml: external archiv-net join, obs-net bridge,
named volumes for all five services, placeholder comments for each service
group (Metrics/Logs/Traces/Dashboards/Error Tracking), startup-order note
- infra/observability/{prometheus,loki,promtail,tempo,grafana/provisioning/{datasources,dashboards}}/.gitkeep
- .env.example: new # --- Observability --- section with PORT_GRAFANA,
PORT_GLITCHTIP, PORT_PROMETHEUS, GLITCHTIP_DOMAIN, GLITCHTIP_SECRET_KEY
(with generation hint), SENTRY_DSN, VITE_SENTRY_DSN
Verified: docker compose -f docker-compose.observability.yml config exits 0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-15 01:23:03 +02:00