From 7b7d0c92a8f2e998134e7afdc5841e55ea499bb9 Mon Sep 17 00:00:00 2001
From: Marcel
Date: Fri, 15 May 2026 23:58:42 +0200
Subject: [PATCH] docs(obs): update DEPLOYMENT.md with /opt/familienarchiv/ ops section, env keys, runner restart

Co-Authored-By: Claude Sonnet 4.6
---
 docs/DEPLOYMENT.md | 53 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 48 insertions(+), 5 deletions(-)

diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index b906d66f..6f6c2e52 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -43,7 +43,7 @@ graph TD
 - SSE notifications transit Caddy (browser → Caddy → backend); the backend is never reachable directly from the public internet. The SvelteKit SSR layer is bypassed for SSE, but Caddy is not.
 - The Caddyfile responds `404` on `/actuator/*` (defense in depth). Internal monitoring scrapes the backend on the docker network, not through Caddy.
 - Production and staging cohabit on the same host via docker compose project names: `archiv-production` (ports 8080/3000) and `archiv-staging` (ports 8081/3001).
-- An optional observability stack (Prometheus, Node Exporter, cAdvisor) runs as a separate compose file: `docker compose -f docker-compose.observability.yml up -d`. It joins `archiv-net` and scrapes the backend's management port (`:8081`). Configuration lives under `infra/observability/`.
+- An optional observability stack (Prometheus, Node Exporter, cAdvisor, Loki, Tempo, Grafana, GlitchTip) runs as a separate compose file. Configuration lives under `infra/observability/`. In production and CI, the stack is managed from `/opt/familienarchiv/` (CI copies it there on every nightly run) so bind mounts survive workspace wipes — see §4 for the ops procedure.
 
 ### OCR memory requirements
 
@@ -142,7 +142,8 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
 | Variable | Purpose | Default | Required? | Sensitive? |
 |---|---|---|---|---|
 | `PORT_PROMETHEUS` | Host port for the Prometheus UI (bound to `127.0.0.1` only) | `9090` | — | — |
-| `PORT_GRAFANA` | Host port for the Grafana UI (bound to `127.0.0.1` only) | `3001` | — | — |
+| `PORT_GRAFANA` | Host port for the Grafana UI (bound to `127.0.0.1` only) | `3003` | — | — |
+| `POSTGRES_HOST` | PostgreSQL hostname for GlitchTip's db-init job and workers. Override when only the staging stack is running and `archive-db` is not resolvable by that name. | `archive-db` | — | — |
 | `GRAFANA_ADMIN_PASSWORD` | Grafana `admin` user password | `changeme` | YES (prod) | YES |
 | `PORT_GLITCHTIP` | Host port for the GlitchTip UI (bound to `127.0.0.1` only) | `3002` | — | — |
 | `GLITCHTIP_DOMAIN` | Public-facing base URL for GlitchTip (used in email links and CORS) | `http://localhost:3002` | YES (prod) | — |
@@ -202,6 +203,18 @@ mkdir -p /srv/gitea-workspace
 # volumes:
 # - /srv/gitea-workspace:/srv/gitea-workspace
 # See runner-config.yaml (workdir_parent + valid_volumes + options) and ADR-015.
+
+# Observability config permanent directory — the nightly CI job copies
+# docker-compose.observability.yml and infra/observability/ here on every run.
+# The obs stack is always started from this path, not from the workspace.
+# See ADR-016 for why this directory is used instead of a server-pull approach.
+mkdir -p /opt/familienarchiv/infra
+
+# ⚠ IMPORTANT: after any change to runner-config.yaml (valid_volumes, options, workdir_parent),
+# restart the Gitea Act runner on the host for the new config to take effect:
+# systemctl restart gitea-runner
+# Until restarted, job containers are spawned with the old config and any new bind mounts
+# (e.g. /opt/familienarchiv) will not be available inside job steps.
 ```
 
 ### 3.2 DNS records
@@ -284,13 +297,43 @@ docker compose logs --tail=200
 
 ### Observability stack
 
-An observability stack is available via `docker-compose.observability.yml`. Configuration lives under `infra/observability/`. Start it after the main stack is up (which creates `archiv-net`):
+An observability stack is available via `docker-compose.observability.yml`. Configuration lives under `infra/observability/`.
+
+#### Dev — start from the workspace
 
 ```bash
 docker compose up -d # creates archiv-net
 docker compose -f docker-compose.observability.yml up -d
 ```
 
+#### Production — managed from `/opt/familienarchiv/`
+
+The nightly CI job copies `docker-compose.observability.yml` and `infra/observability/` to `/opt/familienarchiv/` on every run, then starts the stack from there. Bind mounts in the compose file resolve to `/opt/familienarchiv/infra/observability/…` on the host, which survives workspace wipes between CI runs (see ADR-016).
+
+The obs stack reads secrets from `/opt/familienarchiv/.env` (Docker Compose auto-reads this file when launched from that directory). This file is managed by the operator — CI does **not** write or delete it.
+
+**Required keys in `/opt/familienarchiv/.env`:**
+
+| Key | Example / notes |
+|---|---|
+| `GRAFANA_ADMIN_PASSWORD` | Strong unique password |
+| `GLITCHTIP_SECRET_KEY` | `python3 -c "import secrets; print(secrets.token_hex(32))"` |
+| `GLITCHTIP_DOMAIN` | `https://glitchtip.archiv.raddatz.cloud` — must match the Caddy vhost |
+| `POSTGRES_USER` | Must match the `archiv` user set in `.env.staging` / `.env.production` |
+| `POSTGRES_PASSWORD` | Must match the running PostgreSQL container's password |
+| `PORT_GRAFANA` | `3003` (staging default; 3001 was used by staging frontend) |
+| `PORT_GLITCHTIP` | `3002` |
+| `PORT_PROMETHEUS` | `9090` |
+| `SENTRY_DSN` | Set after GlitchTip first-run; leave empty to disable |
+
+**`$$` escaping rule:** passwords that contain a literal `$` must use `$$` in this file so Docker Compose does not expand them as variable references. Example: a password `p@$word` must be written as `p@$$word`. Failure to escape produces a silently truncated password — Grafana or GlitchTip will start but reject logins.
+
+To start or restart the obs stack manually on the server:
+
+```bash
+docker compose -f /opt/familienarchiv/docker-compose.observability.yml up -d --wait --remove-orphans
+```
+
 Current services:
 
 | Service | Image | Purpose |
@@ -311,7 +354,7 @@ Current services:
 
 | Item | Value |
 |---|---|
-| URL | `http://localhost:3001` (or `http://localhost:$PORT_GRAFANA`) |
+| URL | `http://localhost:3003` (or `http://localhost:$PORT_GRAFANA`) |
 | Username | `admin` |
 | Password | `$GRAFANA_ADMIN_PASSWORD` (default: `changeme` — **change before exposing to a network**) |
 
@@ -341,7 +384,7 @@ docker exec obs-loki wget -qO- \
 
 **Prefer `compose_service` over `container_name` in LogQL queries** — `container_name` differs between dev (`archive-backend`) and prod (`archiv-production-backend-1`), while `compose_service` is stable (`backend`, `db`, `minio`, etc.).
 
-Prometheus port `9090` and Grafana port `3001` are bound to `127.0.0.1` on the host. No other observability ports are host-bound.
+Prometheus port `9090` and Grafana port `3003` (default; configurable via `PORT_GRAFANA`) are bound to `127.0.0.1` on the host. No other observability ports are host-bound.
 
 #### GlitchTip
 