docs: document observability stack in DEPLOYMENT.md and CLAUDE.md (#581) #605

Merged
marcel merged 8 commits from feat/issue-581-obs-docs into main 2026-05-16 11:32:19 +02:00
4 changed files with 36 additions and 8 deletions

View File

@@ -274,6 +274,31 @@ Back button pattern — use the shared `<BackButton>` component from `$lib/share
→ See [docs/DEPLOYMENT.md](./docs/DEPLOYMENT.md)
### Observability stack (separate compose file)
Run via `docker-compose.observability.yml` — requires the main stack to be running first. Full setup procedure: [docs/DEPLOYMENT.md §4](./docs/DEPLOYMENT.md#4-logs--observability).
| Service | Container | Default Port | Purpose |
|---------|-----------|-------------|---------|
| Grafana | `obs-grafana` | 3003 | Metrics / logs / traces dashboard |
| Prometheus | `obs-prometheus` | 9090 (dev only — `127.0.0.1` bound) | Metrics store |
| Loki | `obs-loki` | — (internal) | Log store |
| Tempo | `obs-tempo` | — (internal) | Trace store |
| GlitchTip | `obs-glitchtip` | 3002 | Error tracking (Sentry-compatible) |
### Observability env vars
| Variable | Purpose |
|----------|---------|
| `PORT_GRAFANA` | Host port for Grafana UI (default: `3003`) |
| `PORT_GLITCHTIP` | Host port for GlitchTip UI (default: `3002`) |
| `PORT_PROMETHEUS` | Host port for Prometheus UI (default: `9090`) |
| `GRAFANA_ADMIN_PASSWORD` | Grafana `admin` login password — generate with `openssl rand -hex 32` |
| `GLITCHTIP_SECRET_KEY` | Django secret key for GlitchTip — generate with `python3 -c "import secrets; print(secrets.token_hex(32))"` |
| `GLITCHTIP_DOMAIN` | Public-facing base URL for GlitchTip (email links, CORS), e.g. `https://glitchtip.example.com` |
| `SENTRY_DSN` | GlitchTip/Sentry DSN for the backend (Spring Boot) — leave empty to disable |
| `VITE_SENTRY_DSN` | GlitchTip/Sentry DSN for the frontend (SvelteKit) — injected at build time via Vite |
## API Testing
HTTP test files are in `backend/api_tests/` for use with the VS Code REST Client extension.

View File

@@ -109,6 +109,7 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `SPRING_PROFILES_ACTIVE` | Spring profile | `dev,e2e` (compose) | YES | — |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP gRPC endpoint for distributed traces (Tempo). Set to `http://tempo:4317` via compose. | `http://localhost:4317` | — | — |
| `MANAGEMENT_TRACING_SAMPLING_PROBABILITY` | Micrometer tracing sample rate; overridden to `0.0` in test profile. | `0.1` (compose) / `1.0` (dev) | — | — |
| `SENTRY_DSN` | GlitchTip / Sentry DSN for backend error reporting. Leave empty to disable the SDK. Set after GlitchTip first-run (§4). | — | — | YES |
### PostgreSQL container
@@ -148,6 +149,7 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `PORT_GLITCHTIP` | Host port for the GlitchTip UI (bound to `127.0.0.1` only) | `3002` | — | — |
| `GLITCHTIP_DOMAIN` | Public-facing base URL for GlitchTip (used in email links and CORS) | `http://localhost:3002` | YES (prod) | — |
| `GLITCHTIP_SECRET_KEY` | Django secret key for GlitchTip — generate with `python3 -c "import secrets; print(secrets.token_hex(32))"` | — | YES | YES |
| `VITE_SENTRY_DSN` | GlitchTip/Sentry DSN for the frontend (SvelteKit) — injected at build time via Vite. Leave empty to disable. Set after GlitchTip first-run (§4). | — | — | YES |
---
@@ -250,6 +252,7 @@ git.raddatz.cloud A <server IP>
| `GRAFANA_ADMIN_PASSWORD` | both | Grafana `admin` login — generate a strong password |
| `GLITCHTIP_SECRET_KEY` | both | Django secret key — `openssl rand -hex 32` |
| `SENTRY_DSN` | both | GlitchTip project DSN — set after first-run (§4); leave empty to keep Sentry disabled |
| `VITE_SENTRY_DSN` | both | GlitchTip frontend project DSN — set after first-run (§4); leave empty to keep Sentry disabled |
### 3.4 First deploy
@@ -373,9 +376,9 @@ Current services:
| `obs-loki` | `grafana/loki:3.4.2` | Log aggregation — receives log streams from Promtail. Port 3100 is `expose`-only (not host-bound). |
| `obs-promtail` | `grafana/promtail:3.4.2` | Log shipping agent — reads all Docker container logs via the Docker socket and forwards them to Loki with `container_name`, `compose_service`, and `compose_project` labels |
| `obs-tempo` | `grafana/tempo:2.7.2` | Distributed trace storage — OTLP gRPC receiver on port 4317, OTLP HTTP on port 4318 (both `archiv-net`-internal). Grafana queries traces on port 3200 (`obs-net`-internal). All ports are `expose`-only (not host-bound). |
| `obs-grafana` | `grafana/grafana-oss:11.6.1` | Unified observability UI — metrics dashboards, log exploration, trace viewer. Bound to `127.0.0.1:${PORT_GRAFANA:-3001}` on the host. |
| `obs-glitchtip` | `glitchtip/glitchtip:v4` | Sentry-compatible error tracker. Receives frontend + backend error events, groups by fingerprint, provides issue UI with stack traces. Bound to `127.0.0.1:${PORT_GLITCHTIP:-3002}`. |
| `obs-glitchtip-worker` | `glitchtip/glitchtip:v4` | Celery + beat worker — processes async GlitchTip tasks (event ingestion, notifications, cleanup). |
| `obs-grafana` | `grafana/grafana-oss:11.6.1` | Unified observability UI — metrics dashboards, log exploration, trace viewer. Bound to `127.0.0.1:${PORT_GRAFANA:-3003}` on the host. |
| `obs-glitchtip` | `glitchtip/glitchtip:6.1.6` | Sentry-compatible error tracker. Receives frontend + backend error events, groups by fingerprint, provides issue UI with stack traces. Bound to `127.0.0.1:${PORT_GLITCHTIP:-3002}`. |
| `obs-glitchtip-worker` | `glitchtip/glitchtip:6.1.6` | Celery + beat worker — processes async GlitchTip tasks (event ingestion, notifications, cleanup). |
| `obs-redis` | `redis:7-alpine` | Celery task broker for GlitchTip. Internal to `obs-net`; no host port exposed. |
| `obs-glitchtip-db-init` | `postgres:16-alpine` | One-shot init container. Creates the `glitchtip` database on the existing `archive-db` PostgreSQL instance if it does not already exist. Runs at stack startup; exits cleanly once done. |

View File

@@ -25,8 +25,8 @@ System_Boundary(observability, "Observability Stack (/opt/familienarchiv/docker-
Container(promtail, "Promtail", "grafana/promtail:3.4.2", "Ships Docker container logs to Loki via Docker SD.")
Container(tempo, "Tempo", "grafana/tempo:2.7.2", "Distributed trace storage. OTLP gRPC receiver on port 4317 (archiv-net). Grafana queries traces on port 3200 (obs-net). All ports internal only.")
Container(grafana, "Grafana", "grafana/grafana-oss:11.6.1", "Unified observability UI — dashboards, logs, traces. Datasources (Prometheus, Loki, Tempo) and three dashboards are auto-provisioned.")
Container(glitchtip, "GlitchTip", "glitchtip/glitchtip:v4", "Sentry-compatible error tracker — web process. Receives frontend + backend error events, groups by fingerprint, provides issue UI with stack traces.")
Container(obs_glitchtip_worker, "GlitchTip Worker", "glitchtip/glitchtip:v4", "Celery + beat worker — async event ingestion, notifications, cleanup.")
Container(glitchtip, "GlitchTip", "glitchtip/glitchtip:6.1.6", "Sentry-compatible error tracker — web process. Receives frontend + backend error events, groups by fingerprint, provides issue UI with stack traces.")
Container(obs_glitchtip_worker, "GlitchTip Worker", "glitchtip/glitchtip:6.1.6", "Celery + beat worker — async event ingestion, notifications, cleanup.")
Container(obs_redis, "Redis", "redis:7-alpine", "Celery task queue for GlitchTip async workers.")
}

View File

@@ -12,11 +12,11 @@ The original spec in this doc proposed an overlay pattern (`docker compose -f do
---
## Observability stack — not yet deployed
## Observability stack
Prometheus, Loki, Grafana, Alertmanager, Uptime Kuma, GlitchTip and ntfy are **not** part of the production deployment that #497 landed. They are tracked as follow-up issue #498.
The observability stack (Prometheus, Loki, Grafana, Tempo, GlitchTip) ships as a separate `docker-compose.observability.yml` alongside the main stack. Configuration lives under `infra/observability/`.
When that lands the observability containers will join `docker-compose.prod.yml` under a dedicated profile so they can be operated alongside the application stack without affecting the application containers' restart cycle.
→ See [docs/DEPLOYMENT.md §4](../DEPLOYMENT.md#4-logs--observability) for the full setup procedure, service URLs, first-run steps, and env var reference.
---