devops(observability): add Prometheus + Node Exporter + cAdvisor for host and container metrics #585

Merged
marcel merged 2 commits from feat/issue-573-prometheus-metrics into main 2026-05-15 02:15:10 +02:00
3 changed files with 9 additions and 2 deletions
Showing only changes of commit 0c66f6298b


@@ -24,7 +24,7 @@ services:
- '--storage.tsdb.retention.time=30d'
- '--web.enable-lifecycle'
ports:
- - "${PORT_PROMETHEUS:-9090}:9090"
+ - "127.0.0.1:${PORT_PROMETHEUS:-9090}:9090"
healthcheck:
test: ["CMD", "wget", "-qO-", "http://localhost:9090/-/healthy"]
interval: 30s
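
For context, the change in this hunk publishes the Prometheus port on the loopback interface only, so the UI is not reachable from other hosts. A minimal sketch of how the surrounding service definition might look is given here. The image tag, the `--config.file` path, and the `external` network declaration are assumptions, since none of them appear in this hunk. Only the two remaining command flags, the port mapping, and the healthcheck come from the diff.

```yaml
# Hypothetical excerpt of docker-compose.observability.yml.
# Lines marked "assumption" are illustrative and not taken from the diff.
services:
  prometheus:
    image: prom/prometheus:latest            # assumption: image and tag are not shown
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'   # assumption
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    ports:
      # The host-IP prefix restricts the published port to loopback.
      - "127.0.0.1:${PORT_PROMETHEUS:-9090}:9090"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:9090/-/healthy"]
      interval: 30s
    networks:
      - archiv-net

networks:
  archiv-net:
    external: true                           # assumption: created by the main compose project
```

With this binding, one way to reach the UI from a workstation would be an SSH local port forward such as `ssh -N -L 9090:127.0.0.1:9090 <host>`, where `<host>` is a placeholder for the server.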


@@ -43,6 +43,7 @@ graph TD
- SSE notifications transit Caddy (browser → Caddy → backend); the backend is never reachable directly from the public internet. The SvelteKit SSR layer is bypassed for SSE, but Caddy is not.
- The Caddyfile responds `404` on `/actuator/*` (defense in depth). Internal monitoring scrapes the backend on the docker network, not through Caddy.
- Production and staging cohabit on the same host via docker compose project names: `archiv-production` (ports 8080/3000) and `archiv-staging` (ports 8081/3001).
- An optional observability stack (Prometheus, Node Exporter, cAdvisor) runs as a separate compose file: `docker compose -f docker-compose.observability.yml up -d`. It joins `archiv-net` and scrapes the backend's management port (`:8081`). Configuration lives under `infra/observability/`.
### OCR memory requirements
@@ -134,6 +135,12 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `BLLA_MODEL_PATH` | Kraken baseline layout analysis model path | `/app/models/blla.mlmodel` | — | — |
| `OCR_MEM_LIMIT` | Container memory cap for ocr-service in `docker-compose.prod.yml`. Set to `6g` on CX32 hosts; leave unset on CX42+ to use the 12g default | `12g` (prod compose default) | — | — |
### Observability stack (`docker-compose.observability.yml`)
| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `PORT_PROMETHEUS` | Host port for the Prometheus UI (bound to `127.0.0.1` only) | `9090` | — | — |
---
## 3. Bootstrap from scratch


@@ -17,7 +17,7 @@ scrape_configs:
# Uses the Docker service name (not container_name) for reliable DNS resolution.
# Target will show as DOWN until backend instrumentation issue adds
# micrometer-registry-prometheus and exposes the endpoint — this is expected.
- - targets: ['backend:8080']
+ - targets: ['backend:8081']
- job_name: ocr-service
metrics_path: /metrics
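
The hunk above moves the backend scrape target from the application port to the management port. A minimal sketch of what the resulting backend job could look like follows. The job name `backend`, the 15s interval, and the `/actuator/prometheus` path are assumptions: the path is the usual Spring Boot Actuator location once `micrometer-registry-prometheus` is on the classpath, but the actual values are not visible in this hunk.

```yaml
# Hypothetical backend job for the Prometheus config under infra/observability/.
# Only the target 'backend:8081' is taken from the diff.
scrape_configs:
  - job_name: backend                      # assumption
    metrics_path: /actuator/prometheus     # assumption: Actuator default path
    scrape_interval: 15s                   # assumption
    static_configs:
      # Docker service name plus management port, resolved on the docker network.
      - targets: ['backend:8081']
```

Because the target is addressed by service name on the docker network, the scrape does not pass through Caddy and is unaffected by its `404` on `/actuator/*`.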