# ADR-012: nsenter via privileged sibling container for host service management in CI
## Status
Accepted
## Context
The deploy workflows (`.gitea/workflows/nightly.yml`, `release.yml`) run job steps inside Docker containers under a Docker-out-of-Docker (DooD) setup: the Gitea runner container mounts the host Docker socket, and act_runner spawns a sibling container for each job. That job container also gets the Docker socket mounted (via `valid_volumes` in `runner-config.yaml`).
This architecture has one significant limitation: job containers cannot manage host services. Specifically:
- Job containers are not in the host's PID, mount, UTS, network, or IPC namespaces.
- There is no systemd PID 1 inside a job container — `systemctl` has nothing to talk to. `sudo` is not present in standard container images; even if it were, it would not help.
- Caddy runs as a host systemd service (not a Docker container), managing TLS certificates via Let's Encrypt. It must be running on the host to serve port 443.
The deploy workflows need to tell Caddy to reload its config after each deploy so that committed Caddyfile changes are applied before the smoke test validates the public surface. Without a reload step, Caddy silently serves the previous config and the smoke test may pass against stale configuration.
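The first limitation is easy to observe from inside any job container. A minimal probe (hypothetical, not part of the workflows) that shows what a container sees as its init process:

```shell
# Inside a job container, PID 1 is the container's entrypoint, not systemd,
# so systemctl has no daemon to connect to.
init_comm=$(cat /proc/1/comm)
echo "PID 1 is: $init_comm"
if [ "$init_comm" != "systemd" ]; then
  echo "no systemd here; systemctl would fail"
fi
```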
## Decision
Use the host Docker socket (already mounted in every job container via `runner-config.yaml`) to spin up a privileged sibling container in the host PID namespace, then use `nsenter` to enter all host namespaces and call `systemctl reload caddy`:
```yaml
- name: Reload Caddy
  run: |
    docker run --rm --privileged --pid=host \
      alpine:3.21@sha256:48b0309ca019d89d40f670aa1bc06e426dc0931948452e8491e3d65087abc07d \
      sh -c 'apk add --no-cache util-linux -q && nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy'
```
`nsenter -t 1 -m -u -n -p -i` enters the init process's mount, UTS, network, PID, and IPC namespaces, giving `systemctl` a view of the real host systemd daemon.
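Whether a process actually shares a given namespace with init can be checked via the `/proc/<pid>/ns/` links, which is what `nsenter` manipulates under the hood. A small illustrative sketch (not part of the workflows):

```shell
# Two processes share a namespace iff the corresponding /proc/<pid>/ns/
# links point at the same kernel object. After `nsenter -t 1 -n`, the
# comparison below would match; in an isolated container it does not.
self_net=$(readlink /proc/self/ns/net)
init_net=$(readlink /proc/1/ns/net 2>/dev/null || echo "unreadable")
echo "self: $self_net"
echo "pid1: $init_net"
if [ "$self_net" = "$init_net" ]; then
  echo "same network namespace as PID 1"
else
  echo "different (or unreadable) network namespace"
fi
```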
Alpine is used instead of Ubuntu: ~5 MB vs ~70 MB pull size, no unnecessary tooling. `util-linux` (which ships `nsenter`) is installed at run time; `apk add` takes ~1 s on the warm VPS cache. The image digest is pinned so any upstream change requires an explicit Renovate bump PR.
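The manual-review requirement on those bump PRs can be expressed in Renovate configuration. A sketch of the kind of rule meant (`packageRules`, `matchDatasources`, `matchPackageNames`, and `automerge` are real Renovate options; the repo's actual `renovate.json` rule may differ):

```json
{
  "packageRules": [
    {
      "matchDatasources": ["docker"],
      "matchPackageNames": ["alpine"],
      "automerge": false,
      "labels": ["needs-manual-review"]
    }
  ]
}
```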
`reload`, not `restart`: `systemctl reload` runs the unit's reload action, so Caddy re-reads its config in-process without dropping TLS connections or in-flight requests.
No sudoers entry is required: the Docker socket already grants root-equivalent host access. This pattern makes existing implicit privileges explicit rather than introducing new ones.
This decision applies the same pattern to both `nightly.yml` and `release.yml` since both deploy the app stack and must apply Caddyfile changes before smoke-testing the public surface.
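The duplicated step could eventually live in a composite action. A hypothetical sketch of what such an extraction might look like (the path and action name are illustrative, not the actual implementation):

```yaml
# .gitea/actions/reload-caddy/action.yml (hypothetical)
name: reload-caddy
description: Reload the host Caddy service via a privileged sibling container
runs:
  using: composite
  steps:
    - name: Reload Caddy
      shell: sh
      run: |
        docker run --rm --privileged --pid=host \
          alpine:3.21@sha256:48b0309ca019d89d40f670aa1bc06e426dc0931948452e8491e3d65087abc07d \
          sh -c 'apk add --no-cache util-linux -q && nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy'
```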
## Alternatives Considered
| Alternative | Why rejected |
|---|---|
| `sudo systemctl reload caddy` in the job container | No systemd PID 1 inside the container — `systemctl` has nothing to connect to. `sudo` is not present in container images and would not help even if it were. |
| Caddy admin API (`curl localhost:2019/load`) | Job containers do not share the host network namespace; `localhost:2019` on the host is unreachable. Exposing `:2019` on a host-bound port would add network attack surface with no benefit over the current approach. |
| SSH from the job container to the VPS host | Requires storing an SSH private key as a CI secret, managing `authorized_keys` on the host, and opening an inbound SSH path from the container. Adds key-management overhead for a pattern the Docker socket already enables more directly. |
| Running Caddy as a Docker container (instead of a host service) | Caddy manages TLS certificates via Let's Encrypt; running it in Docker complicates certificate persistence and renewal. As a host service, cert storage is straightforward and restarts do not risk rate-limit issues. This would be a larger infrastructure change unrelated to the CI gap. |
## Consequences
- The runner host's Docker socket access is now a capability relied upon for host service management, not just for running `docker compose` commands. This is stated explicitly in the YAML comment so future reviewers understand the trust boundary.
- The Caddyfile symlink on the VPS (`/etc/caddy/Caddyfile → /opt/familienarchiv/infra/caddy/Caddyfile`) is a required contract for CI to succeed. It is documented in `docs/DEPLOYMENT.md` §3.1 and `docs/infrastructure/ci-gitea.md`. If the symlink is absent or mis-pointed, `systemctl reload caddy` succeeds but Caddy serves stale config.
- Renovate will create bump PRs when a new Alpine 3.21 digest is published. Because the container runs `--privileged --pid=host`, these bump PRs must be reviewed manually and must not be auto-merged. A `packageRule` in `renovate.json` enforces this.
- The step is duplicated between `nightly.yml` and `release.yml` (tracked in issue #539 for extraction into a composite action).
- If Caddy is not running when the step executes, `systemctl reload` exits non-zero and the workflow aborts before the smoke test — preventing a misleading "port 443 refused" curl error.
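The symlink contract can also be checked mechanically before reloading. A small sketch of a preflight helper (hypothetical, not part of the workflows) that could run on the host:

```shell
# check_symlink LINK TARGET: succeed iff LINK resolves to the same path
# as TARGET (readlink -f canonicalizes both sides).
check_symlink() {
  [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}

# On the VPS this could guard the reload step, e.g.:
#   check_symlink /etc/caddy/Caddyfile /opt/familienarchiv/infra/caddy/Caddyfile \
#     || { echo "Caddyfile symlink missing or mis-pointed" >&2; exit 1; }
```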
## References
- `docs/infrastructure/ci-gitea.md` §"Running host-level commands from CI (nsenter pattern)" — full operational context, troubleshooting guide
- `docs/DEPLOYMENT.md` §3.1 — Caddyfile symlink bootstrap step
- ADR-011 — single-tenant runner trust model (Docker socket access scope)