diff --git a/docs/adr/016-obs-stack-co-location-ci-push.md b/docs/adr/016-obs-stack-co-location-ci-push.md
new file mode 100644
index 00000000..839d08f6
--- /dev/null
+++ b/docs/adr/016-obs-stack-co-location-ci-push.md
@@ -0,0 +1,52 @@
+# ADR-016: Observability stack co-location at `/opt/familienarchiv/` with CI-push config sync
+
+## Status
+
+Accepted
+
+## Context
+
+Issue #601 established that the observability stack must survive Gitea CI workspace wipes between nightly runs. When the nightly job completes, act_runner deletes the job workspace. Any Docker container that bind-mounts a config file from a workspace path (`/srv/gitea-workspace/…/infra/observability/prometheus/prometheus.yml`) then references a path that no longer exists on the host. On the next nightly run, Docker Compose either auto-creates an empty directory in its place (causing the container to fail to start because a file mount receives a directory) or finds a stale file from a previous run if the workspace happened to land at the same path.
+
+ADR-015 solved the workspace bind-mount resolution problem: job workspaces are stored at `/srv/gitea-workspace` so `$(pwd)` inside the job container maps to a real host path. But it did not address persistence: the workspace is still wiped after the job, so bind mounts from workspace-relative paths remain fragile across runs.
+
+### Decision drivers
+
+1. Bind-mount sources must point to a host path that persists indefinitely, not to a path that disappears after each CI run.
+2. Config files must reflect the committed state of the repo after every nightly run (no manual sync steps).
+3. Secrets must not be written to the workspace or to any path managed by CI; they must survive independently of deployments.
+4. The solution must not introduce new infrastructure dependencies (no SSH access from CI, no external registry, no additional server-side daemon).
+
+### Alternatives considered
+
+**A: Server-pull model.** A systemd timer or cron job on the server runs `git pull` from the repo into `/opt/familienarchiv/` and then runs `docker compose up`. Rejected because: (1) it requires git credentials and a registered deploy key on the server, (2) it adds a second deployment mechanism that diverges from the CI-push model used for the main app stack, and (3) it introduces timing coupling: the server pull must complete before CI's health checks run, which requires polling or a webhook.
+
+**B: Separate directory (e.g. `/opt/obs/`).** This keeps the observability configs isolated from the app stack. Rejected because: (1) the main app compose files already live in `/opt/familienarchiv/` and are deployed the same way, and (2) GlitchTip shares the `archive-db` PostgreSQL instance and the `archiv-net` Docker network, so it is architecturally part of the same deployment unit, not a separate one. Co-location reflects the actual coupling.
+
+**C: Named Docker configs (Swarm).** Docker Swarm supports first-class config objects that persist in the cluster. Rejected because the project does not use Swarm, and introducing it solely for config persistence is a disproportionate dependency.
+
+## Decision
+
+The observability stack is co-located with the main application deployment at `/opt/familienarchiv/`:
+
+- `docker-compose.observability.yml` → `/opt/familienarchiv/docker-compose.observability.yml`
+- `infra/observability/` → `/opt/familienarchiv/infra/observability/`
+
+The nightly CI job (`nightly.yml`) copies these files from the workspace checkout to `/opt/familienarchiv/` with `cp -r` on every run (CI-push model). Containers always read config from the permanent location, so a workspace wipe has no effect on running containers.
+
+Secrets are stored in `/opt/familienarchiv/.env` on the server. This file is managed by the operator; CI neither writes nor deletes it. Docker Compose reads it automatically when started from `/opt/familienarchiv/`. The required key inventory is documented in `docs/DEPLOYMENT.md §4`.
+
+The CI runner bind-mounts `/opt/familienarchiv` into job containers (see `runner-config.yaml`). This requires a one-time `mkdir -p /opt/familienarchiv/infra` on the server and a runner restart after updating `runner-config.yaml` (see ADR-015 and `docs/DEPLOYMENT.md §3.1`).
+
+## Consequences
+
+**Positive:**
+- Bind-mount sources survive workspace wipes by definition, because they live on a persistent host path.
+- Config is always in sync with the repo after each nightly run.
+- No new infrastructure dependencies; the CI-push model mirrors how the main app stack is deployed.
+- Secrets (`/opt/familienarchiv/.env`) are decoupled from CI, so a deployment cannot accidentally overwrite them.
+
+**Negative:**
+- `cp -r` does not remove deleted files: a config file removed from the repo persists in `/opt/familienarchiv/infra/observability/` until someone deletes it manually. This is acceptable given the project's low rate of config changes; switching to `rsync -a --delete` would produce a clean mirror if it becomes a problem.
+- Mounting `/opt/familienarchiv/` into CI job containers expands the blast radius of a compromised workflow step: a malicious step could overwrite the app compose files and the Caddy config. This is acceptable because the runner is single-tenant and runs trusted code only (see the security comment in `runner-config.yaml`).
+- The runner must be restarted (`systemctl restart gitea-runner`) after any change to `runner-config.yaml` for the new mount to take effect.
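
For illustration, here is a minimal shell sketch of the CI-push sync step that ADR-016 describes. The ADR does not show the actual `nightly.yml` step, so the function name `sync_obs_config` and its argument order are hypothetical. Only the paths and the use of `cp -r` come from the ADR. The sketch also makes the stale-file trade-off from the Consequences section concrete: `cp -r` merges into the existing directory and never deletes anything, and the operator-managed `.env` is never written.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the nightly CI-push step from ADR-016; the real nightly.yml step may differ.
set -eu

# sync_obs_config SRC DEST
#   SRC:  workspace checkout (defaults to the current directory)
#   DEST: permanent host path (defaults to /opt/familienarchiv)
sync_obs_config() {
  local src="${1:-.}"
  local dest="${2:-/opt/familienarchiv}"

  # Mirrors the one-time server setup step; harmless if the directory already exists.
  mkdir -p "$dest/infra"

  # The compose file lands next to the operator-managed .env, which this function never writes.
  cp "$src/docker-compose.observability.yml" "$dest/"

  # Copies into $dest/infra/observability, merging with any existing contents.
  # Files deleted from the repo are NOT removed from $dest (see Consequences).
  cp -r "$src/infra/observability" "$dest/infra/"

  # Clean-mirror alternative mentioned in the ADR (assumes rsync is available in the job image):
  # rsync -a --delete "$src/infra/observability/" "$dest/infra/observability/"
}
```

Inside the job container, where `/opt/familienarchiv` is bind-mounted, a call would look like `sync_obs_config "$PWD" /opt/familienarchiv`.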