familienarchiv/docs/adr/016-obs-stack-co-location-ci-push.md
2026-05-16 00:00:07 +02:00

ADR-016: Observability stack co-location at /opt/familienarchiv/ with CI-push config sync

Status

Accepted

Context

Issue #601 established that the observability stack must survive Gitea CI workspace wipes between nightly runs. When the nightly job completes, act_runner deletes the job workspace. Any Docker container that bind-mounts a config file from a workspace path (/srv/gitea-workspace/…/infra/observability/prometheus/prometheus.yml) then references a path that no longer exists on the host. On the next nightly run, Docker Compose either auto-creates an empty directory in its place (causing the container to fail to start because a file mount receives a directory) or finds a stale file from a previous run if the workspace happened to land at the same path.

ADR-015 solved the workspace bind-mount resolution problem: job workspaces are stored at /srv/gitea-workspace so $(pwd) inside the job container maps to a real host path. But it did not address persistence: the workspace is still wiped after the job, so bind mounts from workspace-relative paths remain fragile across runs.

Decision drivers

  1. Bind-mount sources must point to a host path that persists indefinitely, not to a path that disappears after each CI run.
  2. Config files must reflect the committed state of the repo after every nightly run (no manual sync steps).
  3. Secrets must not be written to the workspace or to any path managed by CI; they must survive independently of deployments.
  4. The solution must not introduce new infrastructure dependencies (no SSH access from CI, no external registry, no additional server-side daemon).

Alternatives considered

A: Server-pull model — a systemd timer or cron job on the server does git pull from the repo into /opt/familienarchiv/ and then runs docker compose up. Rejected because: (1) requires git credentials on the server and a registered deploy key, (2) adds a second deployment mechanism that diverges from the CI-push model used for the main app stack, (3) timing coupling — the server pull must complete before CI's health checks run, requiring polling or a webhook.

B: Separate directory (e.g. /opt/obs/) — keeps obs configs isolated from the app stack. Rejected because: (1) the main app compose files are already in /opt/familienarchiv/ (managed the same way), and (2) GlitchTip shares the archive-db PostgreSQL instance and archiv-net Docker network — it is architecturally part of the same deployment unit, not a separate one. Co-location reflects the actual coupling.

C: Named Docker configs (Swarm) — Docker Swarm supports first-class config objects that persist in the cluster. Rejected because the project does not use Swarm and introducing it solely for config persistence is a disproportionate dependency.

Decision

The observability stack is co-located with the main application deployment at /opt/familienarchiv/:

  • docker-compose.observability.yml → /opt/familienarchiv/docker-compose.observability.yml
  • infra/observability/ → /opt/familienarchiv/infra/observability/

The nightly CI job (nightly.yml) copies these files from the workspace checkout to /opt/familienarchiv/ using cp -r on every run (CI-push model). Containers always read config from the permanent location; a workspace wipe has no effect on running containers.
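
The copy step described above can be sketched as follows. This is a minimal illustration, not the actual content of nightly.yml: the function name sync_obs_config, the throwaway demo directories, and the SECRET=keep-me line are assumptions made so the sketch can run anywhere without touching /opt/familienarchiv/.

```shell
#!/bin/sh
# Sketch of a CI-push sync step (hypothetical helper, not taken from nightly.yml).
set -eu

sync_obs_config() {
    workspace=$1    # repo checkout inside the job workspace
    deploy_root=$2  # persistent host path (/opt/familienarchiv in this ADR)

    mkdir -p "$deploy_root/infra/observability"
    cp "$workspace/docker-compose.observability.yml" "$deploy_root/"
    # The trailing "/." copies the directory's contents, so the result is the
    # same whether or not the destination directory already exists.
    cp -r "$workspace/infra/observability/." "$deploy_root/infra/observability/"
}

# Demo against temporary directories standing in for the workspace and /opt.
demo=$(mktemp -d)
mkdir -p "$demo/workspace/infra/observability/prometheus" "$demo/deploy"
echo "services: {}" > "$demo/workspace/docker-compose.observability.yml"
echo "global: {}" > "$demo/workspace/infra/observability/prometheus/prometheus.yml"
echo "SECRET=keep-me" > "$demo/deploy/.env"   # operator-managed file, never copied

sync_obs_config "$demo/workspace" "$demo/deploy"

# Simulate act_runner wiping the workspace after the job.
rm -rf "$demo/workspace"
ls -R "$demo/deploy"
```

After the simulated wipe, the copied config files are still present under the deploy root and the pre-existing .env is unchanged, which is the property the decision relies on.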

Secrets are stored in /opt/familienarchiv/.env on the server. This file is managed by the operator — CI does not write or delete it. Docker Compose auto-reads it when started from /opt/familienarchiv/. The required key inventory is documented in docs/DEPLOYMENT.md §4.
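
Because CI never writes .env, a missing key only surfaces when a container fails to start. One possible safeguard is a preflight check that the file contains every required key. The sketch below is an assumption, not part of the documented setup: check_env_keys is a hypothetical helper, and the EXAMPLE_* key names are placeholders (the real inventory lives in docs/DEPLOYMENT.md §4).

```shell
#!/bin/sh
# Hypothetical preflight check: verify that an .env file defines given keys.
set -eu

# Prints each missing key to stderr and returns non-zero if any is absent.
check_env_keys() {
    env_file=$1
    shift
    missing=0
    for key in "$@"; do
        if ! grep -q "^${key}=" "$env_file"; then
            echo "missing key: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demo with placeholder key names; the real ones are not known here.
demo=$(mktemp -d)
printf 'EXAMPLE_DB_PASSWORD=x\nEXAMPLE_SECRET_KEY=y\n' > "$demo/.env"

if check_env_keys "$demo/.env" EXAMPLE_DB_PASSWORD EXAMPLE_SECRET_KEY; then
    complete_result=ok
else
    complete_result=incomplete
fi

if check_env_keys "$demo/.env" EXAMPLE_DB_PASSWORD EXAMPLE_MISSING_KEY 2>/dev/null; then
    partial_result=ok
else
    partial_result=incomplete
fi

echo "all keys present: $complete_result; one key absent: $partial_result"
```

The check only reads the file, so it stays within the rule that CI does not write or delete .env.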

The CI runner mounts /opt/familienarchiv as a bind mount into job containers (see runner-config.yaml). This requires a one-time mkdir -p /opt/familienarchiv/infra on the server and a runner restart after updating runner-config.yaml (see ADR-015 and docs/DEPLOYMENT.md §3.1).
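
If the one-time setup or the runner restart is skipped, the mount is absent inside the job container and the copy step cannot reach the persistent path. A job could guard against this with a check like the one below. This is a sketch under assumptions: deploy_root_ready is a hypothetical helper, and a temporary directory stands in for /opt/familienarchiv.

```shell
#!/bin/sh
# Hypothetical guard: confirm the deploy root is present and writable.
set -eu

# Returns 0 when the deploy root has an infra/ directory and is writable.
deploy_root_ready() {
    root=$1
    [ -d "$root/infra" ] && [ -w "$root" ]
}

demo=$(mktemp -d)

# Before the one-time setup, the path does not exist.
if deploy_root_ready "$demo/opt"; then before=ready; else before=missing; fi

# Stands in for the one-time "mkdir -p /opt/familienarchiv/infra" on the server.
mkdir -p "$demo/opt/infra"

if deploy_root_ready "$demo/opt"; then after=ready; else after=missing; fi

echo "before setup: $before, after setup: $after"
```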

Consequences

Positive:

  • Bind-mount sources survive workspace wipes by definition — they are on a persistent host path.
  • Config is always in sync with the repo after each nightly run.
  • No new infrastructure dependencies; the CI-push model mirrors how the main app stack is deployed.
  • Secrets (/opt/familienarchiv/.env) are decoupled from CI — a deployment cannot accidentally overwrite them.

Negative:

  • cp -r does not remove deleted files; a config file removed from the repo persists in /opt/familienarchiv/infra/observability/ until manually deleted. Acceptable for this project's change frequency. An rsync -a --delete would give a clean mirror if this becomes a problem.
  • Mounting /opt/familienarchiv/ into CI job containers expands the blast radius of a compromised workflow step — a malicious step could overwrite app compose files and Caddy config. Acceptable because the runner is single-tenant (trusted code only). See runner-config.yaml security comment.
  • Runner must be restarted (systemctl restart gitea-runner) after any change to runner-config.yaml for the new mount to take effect.
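
The stale-file consequence of cp -r listed above can be reproduced in a few lines. The sketch below uses temporary directories and invented file names (old-rules.yml is hypothetical). It runs two simulated nightly copies, lists files that exist only in the deployed tree, and then applies the rsync alternative if rsync happens to be installed.

```shell
#!/bin/sh
# Demonstrates that cp -r leaves files behind after they are deleted upstream.
set -eu

demo=$(mktemp -d)
mkdir -p "$demo/repo/prometheus" "$demo/deployed"

# First simulated nightly run: the repo holds two config files.
echo "global: {}" > "$demo/repo/prometheus/prometheus.yml"
echo "groups: []" > "$demo/repo/prometheus/old-rules.yml"
cp -r "$demo/repo/." "$demo/deployed/"

# The file is removed from the repo, then the second run copies again.
rm "$demo/repo/prometheus/old-rules.yml"
cp -r "$demo/repo/." "$demo/deployed/"

# List files present in the deployed tree but absent from the repo.
stale=$(cd "$demo/deployed" && find . -type f | while read -r f; do
    [ -e "$demo/repo/$f" ] || echo "$f"
done)
echo "stale after cp -r: $stale"

# Optional: mirror with rsync when available; --delete removes the stale file.
if command -v rsync >/dev/null 2>&1; then
    rsync -a --delete "$demo/repo/" "$demo/deployed/"
fi
```

If rsync --delete is adopted, the destination should probably be scoped to infra/observability/ rather than /opt/familienarchiv/ itself, since --delete removes anything in the destination that is missing from the source, which would include the operator-managed .env.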