docs(adr): add ADR-016 for obs stack co-location and CI-push config sync
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
52
docs/adr/016-obs-stack-co-location-ci-push.md
Normal file
@@ -0,0 +1,52 @@
# ADR-016: Observability stack co-location at `/opt/familienarchiv/` with CI-push config sync

## Status

Accepted

## Context

Issue #601 established that the observability stack must survive Gitea CI workspace wipes between nightly runs. When the nightly job completes, act_runner deletes the job workspace. Any Docker container that bind-mounts a config file from a workspace path (`/srv/gitea-workspace/…/infra/observability/prometheus/prometheus.yml`) then references a path that no longer exists on the host. On the next nightly run, Docker Compose either auto-creates an empty directory in its place (causing the container to fail to start because a file mount receives a directory) or finds a stale file from a previous run if the workspace happened to land at the same path.

ADR-015 solved the workspace bind-mount resolution problem: job workspaces are stored at `/srv/gitea-workspace` so `$(pwd)` inside the job container maps to a real host path. But it did not address persistence: the workspace is still wiped after the job, so bind mounts from workspace-relative paths remain fragile across runs.

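
The failure mode described above can be detected before `docker compose up` with a small pre-flight check. The following is a minimal sketch and not part of the repository: the function name `check_bind_source` is hypothetical, and the only assumption is that each file bind-mount source should be a regular file on the host.

```shell
# Hypothetical pre-flight check (not part of the repo): verify that a
# file bind-mount source is a regular file before starting containers.
check_bind_source() {
    src="$1"
    if [ -f "$src" ]; then
        echo "ok: $src"
        return 0
    elif [ -d "$src" ]; then
        # A directory where a file is expected: the state left behind
        # when a missing bind-mount source is auto-created.
        echo "error: $src is a directory, expected a file" >&2
        return 1
    else
        echo "error: $src does not exist" >&2
        return 1
    fi
}
```

Called with a workspace path after a wipe, a check like this would be expected to fail in one of the two error branches; called with a path on a persistent host directory, it should pass on every run.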
### Decision drivers

1. Bind-mount sources must point to a host path that persists indefinitely, not to a path that disappears after each CI run.
2. Config files must reflect the committed state of the repo after every nightly run (no manual sync steps).
3. Secrets must not be written to the workspace or to any path managed by CI; they must survive independently of deployments.
4. The solution must not introduce new infrastructure dependencies (no SSH access from CI, no external registry, no additional server-side daemon).

### Alternatives considered

**A: Server-pull model.** A systemd timer or cron job on the server runs `git pull` from the repo into `/opt/familienarchiv/` and then runs `docker compose up`. Rejected because: (1) it requires git credentials on the server and a registered deploy key; (2) it adds a second deployment mechanism that diverges from the CI-push model used for the main app stack; (3) it introduces timing coupling: the server pull must complete before CI's health checks run, which requires polling or a webhook.

**B: Separate directory (e.g. `/opt/obs/`).** Keeps obs configs isolated from the app stack. Rejected because: (1) the main app compose files already live in `/opt/familienarchiv/` and are managed the same way, and (2) GlitchTip shares the `archive-db` PostgreSQL instance and the `archiv-net` Docker network, so it is architecturally part of the same deployment unit, not a separate one. Co-location reflects the actual coupling.

**C: Named Docker configs (Swarm).** Docker Swarm supports first-class config objects that persist in the cluster. Rejected because the project does not use Swarm, and introducing it solely for config persistence would be a disproportionate dependency.

## Decision

The observability stack is co-located with the main application deployment at `/opt/familienarchiv/`:

- `docker-compose.observability.yml` → `/opt/familienarchiv/docker-compose.observability.yml`
- `infra/observability/` → `/opt/familienarchiv/infra/observability/`

The nightly CI job (`nightly.yml`) copies these files from the workspace checkout to `/opt/familienarchiv/` using `cp -r` on every run (CI-push model). Containers always read config from the permanent location; a workspace wipe has no effect on running containers.

Secrets are stored in `/opt/familienarchiv/.env` on the server. This file is managed by the operator; CI does not write or delete it. Docker Compose reads it automatically when started from `/opt/familienarchiv/`. The required key inventory is documented in `docs/DEPLOYMENT.md §4`.

The CI runner bind-mounts `/opt/familienarchiv` into job containers (see `runner-config.yaml`). This requires a one-time `mkdir -p /opt/familienarchiv/infra` on the server and a runner restart after updating `runner-config.yaml` (see ADR-015 and `docs/DEPLOYMENT.md §3.1`).

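
The copy step described above can be sketched as a small shell function. This is an illustration only: the actual step in `nightly.yml` is not reproduced in this ADR, the function name `sync_obs_config` is hypothetical, and the two paths are taken as arguments so the sketch can be tried against scratch directories.

```shell
# Sketch of the CI-push copy step (hypothetical helper, not the actual
# nightly.yml step).
sync_obs_config() {
    workspace="$1"   # CI checkout, e.g. the job's working directory
    target="$2"      # permanent location, /opt/familienarchiv in this ADR

    # The ADR calls for a one-time mkdir on the server; repeating it here
    # is harmless because mkdir -p is idempotent.
    mkdir -p "$target/infra" || return 1

    # Copy the compose file and the config tree. cp overwrites existing
    # files but never deletes anything, so $target/.env is left untouched.
    cp "$workspace/docker-compose.observability.yml" "$target/" || return 1
    cp -r "$workspace/infra/observability" "$target/infra/" || return 1
}
```

In a workflow step this might be invoked as `sync_obs_config "$(pwd)" /opt/familienarchiv`. Copying the directory into `$target/infra/` rather than onto `$target/infra/observability` keeps repeated runs from nesting a second `observability/` directory inside the first.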
## Consequences

**Positive:**

- Bind-mount sources survive workspace wipes because they live on a persistent host path.
- Config is always in sync with the repo after each nightly run.
- No new infrastructure dependencies; the CI-push model mirrors how the main app stack is deployed.
- Secrets (`/opt/familienarchiv/.env`) are decoupled from CI, so a deployment cannot accidentally overwrite them.

**Negative:**

- `cp -r` does not remove deleted files: a config file removed from the repo persists in `/opt/familienarchiv/infra/observability/` until manually deleted. This is acceptable for this project's change frequency; `rsync -a --delete` would give a clean mirror if it becomes a problem.
- Mounting `/opt/familienarchiv/` into CI job containers expands the blast radius of a compromised workflow step: a malicious step could overwrite app compose files and Caddy config. This is acceptable because the runner is single-tenant (trusted code only). See the security comment in `runner-config.yaml`.
- The runner must be restarted (`systemctl restart gitea-runner`) after any change to `runner-config.yaml` for the new mount to take effect.

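
The stale-file limitation of `cp -r` noted above can be made visible without switching to `rsync`. The following is a minimal sketch under stated assumptions: `list_stale_files` is a hypothetical helper, not something in the repository, and it only reports files; it does not delete anything.

```shell
# Hypothetical helper: print files that exist under the deployed config
# directory but no longer exist in the repo checkout.
list_stale_files() {
    repo_dir="$1"       # e.g. <workspace>/infra/observability
    deployed_dir="$2"   # e.g. /opt/familienarchiv/infra/observability

    # Walk the deployed tree and report every file that has no
    # counterpart at the same relative path in the repo.
    (cd "$deployed_dir" && find . -type f) | while IFS= read -r f; do
        rel="${f#./}"
        [ -e "$repo_dir/$rel" ] || echo "$rel"
    done
}
```

An empty result means the deployed tree contains nothing beyond what the repo provides. A non-empty result lists the candidates for manual deletion, which is the cleanup the `cp -r` approach leaves to the operator.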