docs(adr): add ADR-015 for DooD workspace bind-mount approach
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m36s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m7s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s

Documents the decision to use workdir_parent + identical host<->container
path instead of the overlay2 MergedDir sync that was in the initial fix.
Captures the alternatives (nsenter sync, image-baked configs, path mismatch)
and the operational consequences (prereq directory, out-of-band compose.yaml).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Marcel
2026-05-15 19:38:18 +02:00
parent 56c3e51657
commit 15ef079eff

View File

@@ -0,0 +1,69 @@
# ADR-015: DooD workspace bind mount for Compose file bind-mount resolution
## Status
Accepted
## Context
The deploy workflows (`.gitea/workflows/nightly.yml`, `release.yml`) run job steps inside Docker containers via Docker-out-of-Docker (DooD): the Gitea runner mounts the host Docker socket, and act_runner spawns sibling containers for each job.
When a job step calls `docker compose -f docker-compose.observability.yml up`, Docker Compose resolves relative bind-mount sources against `$(pwd)` inside the job container and passes the resulting absolute paths to the **host** daemon. For example, `./infra/observability/prometheus/prometheus.yml` becomes `/some/path/infra/observability/prometheus/prometheus.yml`, and the host daemon tries to bind-mount that path from the **host filesystem**.
In the default DooD setup (`runner-config.yaml` with only `valid_volumes: ["/var/run/docker.sock"]`), job container workspaces live in the act_runner overlay2 layer. The host has no corresponding directory at the job container's `$(pwd)` path, so the daemon auto-creates an empty directory in its place. The container then fails to start because the mount target was expected to be a file, not a directory:
```
error mounting "…/prometheus/prometheus.yml" to rootfs at "/etc/prometheus/prometheus.yml": not a directory
```
This affected all five config file bind mounts in `docker-compose.observability.yml`.
## Decision
Configure act_runner to store job workspaces on a real host path (`/srv/gitea-workspace`) and mount that path into both the runner container and every job container at the **same absolute path**. The identity of the host path and container path is the key constraint: Compose resolves to an absolute path and hands it to the host daemon, which looks for that exact path on the host filesystem.
**runner-config.yaml changes:**
```yaml
container:
workdir_parent: /srv/gitea-workspace
valid_volumes:
- "/var/run/docker.sock"
- "/srv/gitea-workspace"
options: "-v /srv/gitea-workspace:/srv/gitea-workspace"
```
**Runner compose.yaml change** (host side — not in this repo):
```yaml
runner:
volumes:
- /srv/gitea-workspace:/srv/gitea-workspace
```
With this in place, `$(pwd)` inside a job container resolves to `/srv/gitea-workspace/<owner>/<repo>/`, which is a real directory on the host. Compose-managed bind mounts from that directory work without any additional steps.
## Alternatives Considered
| Alternative | Why rejected |
|---|---|
| **overlay2 `MergedDir` sync via privileged nsenter** (the previous approach, see PR #599 v1) | Required `--privileged --pid=host` (effective root on the host) plus fragile overlay2 driver assumption. Introduced stale-file risk on the host and a second stable path (`/srv/familienarchiv-*/obs-configs`) to maintain separately from the source tree. Replaced by this ADR. |
| **Build configs into a dedicated Docker image** (pattern used for MinIO bootstrap, see `infra/minio/Dockerfile`) | Viable for static files that change infrequently. Requires a build step and an image rebuild every time a config changes. Appropriate for bootstrap scripts; too heavy for frequently-tuned observability configs. |
| **Add workspace directory to runner-config `valid_volumes` only** (without `workdir_parent`) | `valid_volumes` whitelists paths that workflow steps may reference, but does not change where act_runner stores workspaces. Without `workdir_parent`, the workspace would still be in overlay2 and the bind-mount resolution problem would remain. |
| **Map workspace under a different host path than container path** (e.g. host `/srv/workspace`, container `/workspace`) | Compose resolves to the container-internal path (e.g. `/workspace/…`) and passes that to the host daemon. The host daemon interprets the source as a host path. If host `/workspace` does not exist, the daemon creates an empty directory — the original bug. The paths must be identical. |
## Consequences
- `/srv/gitea-workspace` must exist on the VPS before the runner starts. The directory was created as part of this change; it is not created automatically.
- The runner container's `compose.yaml` (maintained outside this repo at `~/docker/gitea/compose.yaml` on the VPS) must include the `- /srv/gitea-workspace:/srv/gitea-workspace` volume line. This is an out-of-band operational dependency; the prerequisite is documented in `runner-config.yaml`.
- `workdir_parent` applies to all jobs on this runner. Any future workflow that calls `docker compose` with relative bind mounts benefits automatically without further configuration.
- Job workspaces persist across runs under `/srv/gitea-workspace`. act_runner manages per-run subdirectory cleanup. Orphaned directories from interrupted runs should be cleaned up manually if disk space becomes a concern.
- Workflows that previously relied on `OBS_CONFIG_DIR` env var or the `obs-configs` stable path on the host no longer need those. Both were removed in this PR.
- This pattern does **not** apply to the `nsenter`-based Caddy reload step (ADR-012), which manages a host systemd service — a different problem class with no bind-mount equivalent.
## References
- ADR-011 — single-tenant runner trust model
- ADR-012 — nsenter via privileged container for host service management
- Issue #598 — original observability stack bind-mount failure
- `runner-config.yaml``workdir_parent`, `valid_volumes`, `options`