familienarchiv/docs/adr/016-obs-stack-co-location-ci-push.md
Marcel 4c5ee96e36 docs(adr): correct ADR-016 Decision section to match two-source env model
The Decision section described an operator-managed /opt/familienarchiv/.env
that CI does not touch. The actual implementation is a two-source model:
obs.env (git-tracked, non-secret config) + obs-secrets.env (CI-written
fresh from Gitea secrets on every deploy). Also updates the Consequences
bullet that incorrectly stated secrets are decoupled from CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:52:42 +02:00

ADR-016: Observability stack co-location at /opt/familienarchiv/ with CI-push config sync

Status

Accepted

Context

Issue #601 established that the observability stack must survive Gitea CI workspace wipes between nightly runs. When the nightly job completes, act_runner deletes the job workspace. Any Docker container that bind-mounts a config file from a workspace path (/srv/gitea-workspace/…/infra/observability/prometheus/prometheus.yml) then references a path that no longer exists on the host. On the next nightly run, Docker Compose either auto-creates an empty directory in its place (causing the container to fail to start because a file mount receives a directory) or finds a stale file from a previous run if the workspace happened to land at the same path.
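The directory-instead-of-file failure described above can be caught before the stack starts. The following is a minimal sketch, not something taken from the repo: `check_file_mount_source` is a hypothetical helper name, and it only assumes that a bind-mount source such as prometheus.yml must be a regular file on the host.

```shell
# Hypothetical pre-flight guard (an illustration, not part of the repo):
# fail fast when a bind-mount source that must be a regular file is
# missing or has been auto-created as a directory.
check_file_mount_source() {
  local path="$1"
  if [ -d "$path" ]; then
    echo "ERROR: $path is a directory; expected a regular file" >&2
    return 1
  fi
  if [ ! -f "$path" ]; then
    echo "ERROR: $path does not exist" >&2
    return 1
  fi
  return 0
}
```

A job could call it as `check_file_mount_source <path-to>/prometheus.yml` before running docker compose up, so that a wiped workspace produces a clear error rather than a container that fails to start.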

ADR-015 solved the workspace bind-mount resolution problem: job workspaces are stored at /srv/gitea-workspace so $(pwd) inside the job container maps to a real host path. But it did not address persistence: the workspace is still wiped after the job, so bind mounts from workspace-relative paths remain fragile across runs.

Decision drivers

  1. Bind-mount sources must point to a host path that persists indefinitely, not to a path that disappears after each CI run.
  2. Config files must reflect the committed state of the repo after every nightly run (no manual sync steps).
  3. Secrets must not be written to the workspace or to any path managed by CI; they must survive independently of deployments.
  4. The solution must not introduce new infrastructure dependencies (no SSH access from CI, no external registry, no additional server-side daemon).

Alternatives considered

A: Server-pull model — a systemd timer or cron job on the server does git pull from the repo into /opt/familienarchiv/ and then runs docker compose up. Rejected because: (1) requires git credentials on the server and a registered deploy key, (2) adds a second deployment mechanism that diverges from the CI-push model used for the main app stack, (3) timing coupling — the server pull must complete before CI's health checks run, requiring polling or a webhook.

B: Separate directory (e.g. /opt/obs/) — keeps obs configs isolated from the app stack. Rejected because: (1) the main app compose files are already in /opt/familienarchiv/ (managed the same way), and (2) GlitchTip shares the archive-db PostgreSQL instance and archiv-net Docker network — it is architecturally part of the same deployment unit, not a separate one. Co-location reflects the actual coupling.

C: Named Docker configs (Swarm) — Docker Swarm supports first-class config objects that persist in the cluster. Rejected because the project does not use Swarm and introducing it solely for config persistence is a disproportionate dependency.

Decision

The observability stack is co-located with the main application deployment at /opt/familienarchiv/:

  • docker-compose.observability.yml → /opt/familienarchiv/docker-compose.observability.yml
  • infra/observability/ → /opt/familienarchiv/infra/observability/

Both the nightly CI job (nightly.yml) and the release job (release.yml) copy these files from the workspace checkout to /opt/familienarchiv/ using cp -r on every run (CI-push model). Containers always read config from the permanent location; a workspace wipe has no effect on running containers.
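As a rough illustration of the copy step, the sketch below wraps it in a function. The function name `push_obs_config` and its two parameters are assumptions made for this example; the actual commands in nightly.yml and release.yml may differ in detail.

```shell
# Sketch of the CI-push copy step (hypothetical helper, not the workflow's
# literal commands). $1 is the workspace checkout, $2 the permanent target.
push_obs_config() {
  local src="$1" dest="$2"
  mkdir -p "$dest/infra/observability"
  cp "$src/docker-compose.observability.yml" \
     "$dest/docker-compose.observability.yml"
  # The trailing "/." copies the directory's contents, so repeated runs
  # merge into the same target directory instead of nesting a new one.
  cp -r "$src/infra/observability/." "$dest/infra/observability/"
}
```

In a job step this would be invoked as `push_obs_config "$(pwd)" /opt/familienarchiv`. Because the target is overwritten in place on every run, containers that bind-mount from it see the committed config after the next restart or reload.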

Environment variables follow a two-source model:

  • infra/observability/obs.env (git-tracked, non-secret): all non-sensitive config — host ports, public URLs (GLITCHTIP_DOMAIN, GF_SERVER_ROOT_URL), and the default POSTGRES_HOST. Changes go through PR review. No credentials.
  • /opt/familienarchiv/obs-secrets.env (CI-written, per-deploy): passwords and secret keys only (GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_HOST), injected fresh from Gitea secrets on every nightly and release deploy. Gitea is the single source of truth for secrets — rotating a secret takes effect on the next deploy without manual server action.

Both files are passed explicitly via --env-file to every obs compose command (the config dry-run and up). No .env file is read implicitly. The required key inventory is documented in docs/DEPLOYMENT.md §4.
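The two-source model might look roughly like this in a deploy step. This is a sketch under several assumptions: the helper names `write_obs_secrets` and `obs_compose` are hypothetical, the Gitea secrets are assumed to be exported to the job as environment variables with the same names as the keys, secret values are assumed to contain no newlines, and the server-side location of obs.env is inferred from the copy step described above.

```shell
# Hypothetical helper: rewrite the secrets file from scratch on every deploy.
write_obs_secrets() {
  local target="$1"
  (
    # ${VAR:?} aborts this subshell if a secret is unset or empty,
    # before the old file is touched.
    : "${GRAFANA_ADMIN_PASSWORD:?}" "${GLITCHTIP_SECRET_KEY:?}" \
      "${POSTGRES_USER:?}" "${POSTGRES_PASSWORD:?}" "${POSTGRES_HOST:?}"
    umask 077          # new file is readable by its owner only
    rm -f "$target"    # drop any stale keys from a previous deploy
    {
      printf 'GRAFANA_ADMIN_PASSWORD=%s\n' "$GRAFANA_ADMIN_PASSWORD"
      printf 'GLITCHTIP_SECRET_KEY=%s\n' "$GLITCHTIP_SECRET_KEY"
      printf 'POSTGRES_USER=%s\n' "$POSTGRES_USER"
      printf 'POSTGRES_PASSWORD=%s\n' "$POSTGRES_PASSWORD"
      printf 'POSTGRES_HOST=%s\n' "$POSTGRES_HOST"
    } > "$target"
  )
}

# Hypothetical wrapper: every obs compose command gets both env files.
# $1 is the deployment root; remaining arguments are the compose subcommand.
obs_compose() {
  local root="$1"
  shift
  docker compose \
    -f "$root/docker-compose.observability.yml" \
    --env-file "$root/infra/observability/obs.env" \
    --env-file "$root/obs-secrets.env" \
    "$@"
}
```

A deploy would then run `write_obs_secrets /opt/familienarchiv/obs-secrets.env`, followed by `obs_compose /opt/familienarchiv config --quiet` as the dry-run and `obs_compose /opt/familienarchiv up -d` to start the stack (the `--quiet` and `-d` flags are choices made for this sketch). Routing every call through one wrapper keeps the two --env-file arguments from drifting apart between the dry-run and the real command.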

The CI runner mounts /opt/familienarchiv as a bind mount into job containers (see runner-config.yaml). This requires a one-time mkdir -p /opt/familienarchiv/infra on the server and a runner restart after updating runner-config.yaml (see ADR-015 and docs/DEPLOYMENT.md §3.1).

Consequences

Positive:

  • Bind-mount sources survive workspace wipes by definition — they are on a persistent host path.
  • Config is always in sync with the repo after each nightly run.
  • No new infrastructure dependencies; the CI-push model mirrors how the main app stack is deployed.
  • Secret rotation requires no manual server action — Gitea secrets are the authoritative store; obs-secrets.env is rewritten from scratch on every deploy so a secret change takes effect on the next nightly or release run.

Negative:

  • cp -r does not remove deleted files: a config file removed from the repo persists in /opt/familienarchiv/infra/observability/ until manually deleted. This is acceptable for this project's change frequency; switching to rsync -a --delete would give a clean mirror if it becomes a problem.
  • Mounting /opt/familienarchiv/ into CI job containers expands the blast radius of a compromised workflow step — a malicious step could overwrite app compose files and Caddy config. Acceptable because the runner is single-tenant (trusted code only). See runner-config.yaml security comment.
  • Runner must be restarted (systemctl restart gitea-runner) after any change to runner-config.yaml for the new mount to take effect.
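The difference between the current cp -r behaviour and the rsync alternative mentioned in the first negative consequence can be sketched as follows. Both function names are hypothetical, and the sketch assumes rsync is installed on the runner image, which this ADR does not state.

```shell
# cp -r merges into the destination: files deleted from the source
# are left behind in the destination.
push_merge() {
  mkdir -p "$2"
  cp -r "$1/." "$2/"
}

# rsync -a --delete makes the destination an exact mirror of the source.
# The trailing slashes mean "the contents of", not the directory itself.
push_mirror() {
  mkdir -p "$2"
  rsync -a --delete "$1/" "$2/"
}
```

With `push_merge`, a file removed from infra/observability/ in the repo stays under /opt/familienarchiv/infra/observability/ after the next run; with `push_mirror` it is deleted. Note that --delete removes anything in the destination that is not in the source, so it should only ever be pointed at a directory that CI fully owns.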