ADR-016: Observability stack co-location at /opt/familienarchiv/ with CI-push config sync
Status
Accepted
Context
Issue #601 established that the observability stack must survive Gitea CI workspace wipes between nightly runs. When the nightly job completes, act_runner deletes the job workspace. Any Docker container that bind-mounts a config file from a workspace path (/srv/gitea-workspace/…/infra/observability/prometheus/prometheus.yml) then references a path that no longer exists on the host. On the next nightly run, Docker Compose either auto-creates an empty directory in its place (causing the container to fail to start because a file mount receives a directory) or finds a stale file from a previous run if the workspace happened to land at the same path.
ADR-015 solved the workspace bind-mount resolution problem: job workspaces are stored at /srv/gitea-workspace so $(pwd) inside the job container maps to a real host path. But it did not address persistence: the workspace is still wiped after the job, so bind mounts from workspace-relative paths remain fragile across runs.
Decision drivers
- Bind-mount sources must point to a host path that persists indefinitely, not to a path that disappears after each CI run.
- Config files must reflect the committed state of the repo after every nightly run (no manual sync steps).
- Secrets must not be written to the workspace or committed to the repository, and Gitea secrets must remain their single source of truth so that rotating a secret needs no manual action on the server.
- The solution must not introduce new infrastructure dependencies (no SSH access from CI, no external registry, no additional server-side daemon).
Alternatives considered
A: Server-pull model — a systemd timer or cron job on the server does git pull from the repo into /opt/familienarchiv/ and then runs docker compose up. Rejected because: (1) requires git credentials on the server and a registered deploy key, (2) adds a second deployment mechanism that diverges from the CI-push model used for the main app stack, (3) timing coupling — the server pull must complete before CI's health checks run, requiring polling or a webhook.
B: Separate directory (e.g. /opt/obs/) — keeps obs configs isolated from the app stack. Rejected because: (1) the main app compose files are already in /opt/familienarchiv/ (managed the same way), and (2) GlitchTip shares the archive-db PostgreSQL instance and archiv-net Docker network — it is architecturally part of the same deployment unit, not a separate one. Co-location reflects the actual coupling.
C: Named Docker configs (Swarm) — Docker Swarm supports first-class config objects that persist in the cluster. Rejected because the project does not use Swarm and introducing it solely for config persistence is a disproportionate dependency.
Decision
The observability stack is co-located with the main application deployment at /opt/familienarchiv/:
- docker-compose.observability.yml → /opt/familienarchiv/docker-compose.observability.yml
- infra/observability/ → /opt/familienarchiv/infra/observability/
Both the nightly CI job (nightly.yml) and the release job (release.yml) copy these files from the workspace checkout to /opt/familienarchiv/ using cp -r on every run (CI-push model). Containers always read config from the permanent location; a workspace wipe has no effect on running containers.
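The copy step can be sketched as a small shell function. This is a hypothetical sketch, not the literal step in nightly.yml or release.yml. The function name sync_obs_config and the DEPLOY_ROOT override are assumptions, added so the step can be exercised outside the server. It also shows the merge behaviour of cp -r that the Negative consequences describe.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the CI-push copy step; not the literal nightly.yml/release.yml step.
set -euo pipefail

# Copy the observability compose file and config tree from a workspace checkout
# to the permanent deployment root. DEPLOY_ROOT is an assumed override for testing.
sync_obs_config() {
  local checkout="$1"
  local root="${DEPLOY_ROOT:-/opt/familienarchiv}"

  # The one-time server setup creates $root/infra. If it is missing, the bind
  # mount is probably not configured, so fail instead of writing into the job
  # container's own (ephemeral) filesystem.
  if [ ! -d "$root/infra" ]; then
    echo "error: $root/infra does not exist" >&2
    return 1
  fi

  cp "$checkout/docker-compose.observability.yml" "$root/docker-compose.observability.yml"
  # Because $root/infra already exists, cp -r copies into $root/infra/observability,
  # merging with what is there: changed files are overwritten, but files deleted
  # from the repo are left behind.
  cp -r "$checkout/infra/observability" "$root/infra/"
}

# Illustrative CI usage, from the root of the workspace checkout:
#   sync_obs_config "$(pwd)"
```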
Environment variables follow a two-source model:
- infra/observability/obs.env (git-tracked, non-secret): all non-sensitive config, namely host ports, public URLs (GLITCHTIP_DOMAIN, GF_SERVER_ROOT_URL), and the default POSTGRES_HOST. Changes go through PR review. No credentials.
- /opt/familienarchiv/obs-secrets.env (CI-written, per-deploy): passwords and secret keys only (GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_HOST), injected fresh from Gitea secrets on every nightly and release deploy. Gitea is the single source of truth for secrets; rotating a secret takes effect on the next deploy without manual server action.
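Writing the per-deploy secrets file could look like the following sketch. It assumes the workflow maps each Gitea secret into an environment variable of the same name (for example through an env: entry referencing ${{ secrets.GRAFANA_ADMIN_PASSWORD }}). The helper name write_obs_secrets, the single-quoting of values, and the owner-only file mode are illustrative choices, not the project's actual step. The file is rebuilt from scratch and renamed into place, so a missing secret never leaves a partial file behind.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Keys written to obs-secrets.env, as listed in this ADR.
OBS_SECRET_KEYS=(GRAFANA_ADMIN_PASSWORD GLITCHTIP_SECRET_KEY POSTGRES_USER POSTGRES_PASSWORD POSTGRES_HOST)

# Hypothetical helper: rewrite obs-secrets.env from environment variables.
write_obs_secrets() {
  local root="${DEPLOY_ROOT:-/opt/familienarchiv}"
  local out="$root/obs-secrets.env"
  local tmp="$out.tmp"
  local key value

  # Validate every value before touching the file.
  for key in "${OBS_SECRET_KEYS[@]}"; do
    value="${!key:-}"   # indirect expansion: the value of the variable named by $key
    if [ -z "$value" ]; then
      echo "error: secret $key is not set" >&2
      return 1
    fi
    # Compose takes single-quoted env-file values literally (no $ interpolation),
    # so a value must not itself contain a single quote or a newline.
    case "$value" in
      *"'"* | *$'\n'*)
        echo "error: secret $key contains a single quote or newline" >&2
        return 1
        ;;
    esac
  done

  rm -f "$tmp"
  # umask 077 inside a subshell: the new file is readable by its owner only.
  (
    umask 077
    for key in "${OBS_SECRET_KEYS[@]}"; do
      printf "%s='%s'\n" "$key" "${!key}"
    done > "$tmp"
  )
  # Rename into place so compose never reads a half-written file.
  mv -f "$tmp" "$out"
}
```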
Both files are passed explicitly via --env-file to every observability compose command (the config dry-run and up); no implicitly loaded .env file is involved. The required key inventory is documented in docs/DEPLOYMENT.md §4.
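A thin wrapper keeps the two --env-file flags identical across the dry-run and the real start. This is a sketch. The function name obs_compose and the file order are assumptions. Docker Compose reads multiple --env-file flags in order, and later files override earlier ones. Listing obs-secrets.env second therefore lets its POSTGRES_HOST replace the default from obs.env.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper: run docker compose for the observability stack with
# both env files. The order (obs.env first, obs-secrets.env second) is assumed.
obs_compose() {
  local root="${DEPLOY_ROOT:-/opt/familienarchiv}"
  docker compose \
    --env-file "$root/infra/observability/obs.env" \
    --env-file "$root/obs-secrets.env" \
    -f "$root/docker-compose.observability.yml" \
    "$@"
}

# Illustrative CI usage:
#   obs_compose config --quiet   # dry-run: validate the interpolated config without starting anything
#   obs_compose up -d
```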
The CI runner mounts /opt/familienarchiv as a bind mount into job containers (see runner-config.yaml). This requires a one-time mkdir -p /opt/familienarchiv/infra on the server and a runner restart after updating runner-config.yaml (see ADR-015 and docs/DEPLOYMENT.md §3.1).
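If the runner mount is missing, anything a job creates under /opt/familienarchiv lives only in the job container's filesystem and disappears with the job. A CI step could therefore confirm that the path really is a mount point before copying. This is a hypothetical sketch that assumes a Linux job container. It reads /proc/self/mountinfo, where the fifth field of each line is the mount point. The helper name is_mounted is illustrative, and paths containing spaces (escaped as \040 in mountinfo) are not handled.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeed only if the given path is a mount point in this mount namespace.
# Field 5 of each /proc/self/mountinfo line is the mount point.
is_mounted() {
  local path="${1:-/opt/familienarchiv}"
  local mountinfo="${2:-/proc/self/mountinfo}"
  awk -v p="$path" '$5 == p { found = 1 } END { exit !found }' "$mountinfo"
}

# Illustrative CI usage, before any copy step:
#   is_mounted /opt/familienarchiv || { echo "runner mount missing; check runner-config.yaml" >&2; exit 1; }
```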
Consequences
Positive:
- Bind-mount sources survive workspace wipes by definition — they are on a persistent host path.
- Config is always in sync with the repo after each nightly run.
- No new infrastructure dependencies; the CI-push model mirrors how the main app stack is deployed.
- Secret rotation requires no manual server action: Gitea secrets are the authoritative store, and obs-secrets.env is rewritten from scratch on every deploy, so a secret change takes effect on the next nightly or release run.
Negative:
- cp -r does not remove deleted files; a config file removed from the repo persists in /opt/familienarchiv/infra/observability/ until manually deleted. Acceptable for this project's change frequency. Switching to rsync -a --delete would give a clean mirror if this becomes a problem.
- Mounting /opt/familienarchiv/ into CI job containers expands the blast radius of a compromised workflow step: a malicious step could overwrite the app compose files and Caddy config. Acceptable because the runner is single-tenant (trusted code only). See the security comment in runner-config.yaml.
- The runner must be restarted (systemctl restart gitea-runner) after any change to runner-config.yaml for the new mount to take effect.
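If stale files become a problem, the rsync -a --delete alternative mentioned above could replace the cp -r of the config tree roughly as follows. This is a sketch that assumes rsync is installed in the job container. mirror_obs_config is a hypothetical name, and the --inplace flag is an addition not named in this ADR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: mirror the observability config tree so that files
# deleted from the repo are also deleted on the server.
mirror_obs_config() {
  local checkout="$1"
  local root="${DEPLOY_ROOT:-/opt/familienarchiv}"
  # Trailing slash on the source: copy its contents, not the directory itself.
  # --delete removes destination files that no longer exist in the source.
  # --inplace updates existing files without replacing their inode.
  rsync -a --delete --inplace "$checkout/infra/observability/" "$root/infra/observability/"
}
```

The --inplace flag matters for single-file bind mounts such as prometheus.yml. By default rsync writes each changed file to a temporary file and renames it over the destination, which gives the file a new inode. A running container that bind-mounts that file would keep seeing the old inode until it is recreated. GNU cp, by contrast, overwrites an existing destination file in place.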