From 90f52eae413055283fa3b4b30c54ed29c1ec2bca Mon Sep 17 00:00:00 2001 From: Marcel Date: Mon, 11 May 2026 21:49:32 +0200 Subject: [PATCH 01/10] ci(nightly): reload Caddy before smoke test Adds a `sudo systemctl reload caddy` step between the docker compose deploy and the smoke test. This ensures any committed Caddyfile changes are applied before the public surface is verified. Previously the workflow had no mechanism to push Caddyfile changes to the running host daemon. A Caddyfile edit would land in the repo but Caddy would keep serving the previous config, causing the smoke test to catch a stale header or still-proxied /actuator route rather than the intended current config. This step also surfaces the root cause of today's port-443 failure explicitly: if Caddy is not running, the step fails with a clear service error rather than a misleading "Failed to connect to port 443" from curl. Co-Authored-By: Claude Sonnet 4.6 --- .gitea/workflows/nightly.yml | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/.gitea/workflows/nightly.yml b/.gitea/workflows/nightly.yml index da11ebf7..2cb71392 100644 --- a/.gitea/workflows/nightly.yml +++ b/.gitea/workflows/nightly.yml @@ -120,6 +120,20 @@ jobs: --profile staging \ up -d --wait --remove-orphans + - name: Reload Caddy + # Apply any committed Caddyfile changes before smoke-testing the + # public surface. Without this step, a Caddyfile edit lands in the + # repo but Caddy keeps serving the previous config until someone + # reloads it manually — the smoke test would then catch a stale + # header or a still-proxied /actuator route rather than confirming + # the current config is live. + # + # `systemctl reload caddy` sends SIGHUP; Caddy re-reads /etc/caddy/Caddyfile + # (symlinked to infra/caddy/Caddyfile) without dropping connections. + # If Caddy is not running this step fails fast and clearly before the + # smoke test issues a misleading "port 443 refused" error. 
+ run: sudo systemctl reload caddy + - name: Smoke test deployed environment # Healthchecks confirm containers are healthy; they do NOT confirm the # public surface works. This step catches: Caddy not reloaded, HSTS -- 2.49.1 From d750d5cee2ff41930639e19dd51f4d56cbf50955 Mon Sep 17 00:00:00 2001 From: Marcel Date: Mon, 11 May 2026 22:28:24 +0200 Subject: [PATCH 02/10] fix(ci): reload Caddy via nsenter, not sudo systemctl `sudo systemctl reload caddy` does not work from inside a DooD job container: `systemctl` is absent from Ubuntu container images and container processes cannot reach the host systemd without entering its namespaces. Replace with `docker run --privileged --pid=host ubuntu:22.04 nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy`, which uses the already-mounted Docker socket to spin up a privileged sibling container that enters the host PID namespace via nsenter. Tested live on the Hetzner VPS. No sudoers entry required. Co-Authored-By: Claude Sonnet 4.6 --- .gitea/workflows/nightly.yml | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/.gitea/workflows/nightly.yml b/.gitea/workflows/nightly.yml index 2cb71392..72eaa60f 100644 --- a/.gitea/workflows/nightly.yml +++ b/.gitea/workflows/nightly.yml @@ -128,11 +128,23 @@ jobs: # header or a still-proxied /actuator route rather than confirming # the current config is live. # - # `systemctl reload caddy` sends SIGHUP; Caddy re-reads /etc/caddy/Caddyfile - # (symlinked to infra/caddy/Caddyfile) without dropping connections. - # If Caddy is not running this step fails fast and clearly before the - # smoke test issues a misleading "port 443 refused" error. - run: sudo systemctl reload caddy + # The runner executes job steps inside Docker containers (DooD). + # `systemctl` is not present in Ubuntu container images and cannot + # reach the host's systemd directly. 
We use the Docker socket + # (mounted into every job container via runner-config.yaml) to spin + # up a privileged sibling container in the host PID namespace; + # nsenter then enters the host's namespaces so systemctl talks to + # the real host systemd daemon. No sudoers entry is required — the + # Docker socket already grants root-equivalent host access. + # + # `systemctl reload caddy` sends SIGHUP; Caddy re-reads + # /etc/caddy/Caddyfile (symlinked to infra/caddy/Caddyfile) without + # dropping connections. If Caddy is not running this step fails fast + # before the smoke test issues a misleading "port 443 refused" error. + run: | + docker run --rm --privileged --pid=host \ + ubuntu:22.04 \ + nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy - name: Smoke test deployed environment # Healthchecks confirm containers are healthy; they do NOT confirm the -- 2.49.1 From d29169eb39013c79098bb0340cc9a7bed7f4ac4c Mon Sep 17 00:00:00 2001 From: Marcel Date: Mon, 11 May 2026 22:29:02 +0200 Subject: [PATCH 03/10] fix(ci): add Caddy reload step to release workflow Same gap as nightly.yml: production deploys also need Caddy to reload the updated Caddyfile before the smoke test validates the public surface. Uses the same nsenter pattern introduced in the previous commit. Co-Authored-By: Claude Sonnet 4.6 --- .gitea/workflows/release.yml | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/.gitea/workflows/release.yml b/.gitea/workflows/release.yml index 7d2d3618..c53797c4 100644 --- a/.gitea/workflows/release.yml +++ b/.gitea/workflows/release.yml @@ -93,6 +93,17 @@ jobs: --env-file .env.production \ up -d --wait --remove-orphans + - name: Reload Caddy + # See nightly.yml — same rationale and mechanism: DooD job containers + # cannot call systemctl directly; nsenter via a privileged sibling + # container reaches the host systemd. 
Must run after deploy (so the + # latest Caddyfile is on disk) and before the smoke test (so the + # public surface reflects the current config). + run: | + docker run --rm --privileged --pid=host \ + ubuntu:22.04 \ + nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy + - name: Smoke test deployed environment # See nightly.yml — same three checks, against the prod vhost. # --resolve pins archiv.raddatz.cloud to the runner's loopback so -- 2.49.1 From fe2cdaae83c5c627fb2c1a7b032fd982b0ab14bb Mon Sep 17 00:00:00 2001 From: Marcel Date: Mon, 11 May 2026 22:29:39 +0200 Subject: [PATCH 04/10] docs(ci): document DooD runner architecture and nsenter pattern Replace the stale generic runner provisioning docs with an accurate description of the actual two-container setup on the Hetzner VPS. Document the nsenter pattern for running host-level commands (systemctl) from containerised CI steps, and the Caddyfile symlink contract that the reload step depends on. Co-Authored-By: Claude Sonnet 4.6 --- docs/infrastructure/ci-gitea.md | 45 ++++++++++++++++++++++++++++----- 1 file changed, 39 insertions(+), 6 deletions(-) diff --git a/docs/infrastructure/ci-gitea.md b/docs/infrastructure/ci-gitea.md index 3f96583e..f58bfde4 100644 --- a/docs/infrastructure/ci-gitea.md +++ b/docs/infrastructure/ci-gitea.md @@ -4,16 +4,49 @@ This document covers the Gitea Actions CI workflow for Familienarchiv, including --- -## Self-Hosted Runner Provisioning +## Runner Architecture -Gitea Actions requires self-hosted runners. GitHub Actions provides `ubuntu-latest` for free; on Gitea you run the runner yourself. 
+Familienarchiv uses **two containers** on the same Hetzner VPS: -```bash -# On the VPS — register a Gitea Actions runner -docker run -d --name gitea-runner --restart unless-stopped -v /var/run/docker.sock:/var/run/docker.sock -v gitea-runner-data:/data -e GITEA_INSTANCE_URL=https://gitea.example.com -e GITEA_RUNNER_REGISTRATION_TOKEN= -e GITEA_RUNNER_NAME=vps-runner-1 -e GITEA_RUNNER_LABELS=ubuntu-latest:docker://node:20-bullseye gitea/act_runner:latest +| Container | Purpose | Config | +|---|---|---| +| `gitea` (Docker container) | Hosts Gitea itself | `infra/gitea/docker-compose.yml` | +| `gitea-runner` (Docker container) | Runs all CI and deploy jobs | `infra/gitea/docker-compose.yml` + `/root/docker/gitea/runner-config.yaml` | + +Both containers live in the `gitea_gitea` Docker network on the VPS. The runner connects to Gitea via the LAN IP so job containers (which don't share the `gitea_gitea` network) can also reach it. + +### Docker-out-of-Docker (DooD) + +The `gitea-runner` container mounts the host Docker socket (`/var/run/docker.sock`). When a workflow job runs, act_runner spawns a **sibling container** for each job. That job container also gets the Docker socket mounted (via `valid_volumes` in `runner-config.yaml`), enabling `docker compose` calls in workflow steps. + +### Running host-level commands from CI (nsenter pattern) + +Job containers are unprivileged and do not share the host's PID/mount/network namespaces. Commands like `systemctl` that target the host daemon are therefore unavailable by default. When a workflow step needs to manage a host service (e.g.
`systemctl reload caddy`), it uses the Docker socket to spin up a **privileged sibling container** in the host PID namespace: + +```yaml +- name: Reload Caddy + run: | + docker run --rm --privileged --pid=host \ + ubuntu:22.04 \ + nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy ``` -The runner label `ubuntu-latest` maps to the Docker image it uses -- this is how `runs-on: ubuntu-latest` in the workflow YAML continues to work unchanged. +`nsenter -t 1 -m -u -n -p -i` enters the init process's mount, UTS, network, PID, and IPC namespaces, giving `systemctl` a view of the real host systemd. No sudoers entry is required — the Docker socket already grants root-equivalent host access. + +### Caddyfile symlink contract + +The deploy workflows reload Caddy to pick up committed Caddyfile changes. This relies on a symlink that must exist on the VPS: + +``` +/etc/caddy/Caddyfile → /opt/familienarchiv/infra/caddy/Caddyfile +``` + +Created once during server bootstrap (see `docs/DEPLOYMENT.md §3.1`). Verify with: + +```bash +ls -la /etc/caddy/Caddyfile +# Expected: lrwxrwxrwx ... /etc/caddy/Caddyfile -> /opt/familienarchiv/infra/caddy/Caddyfile +``` --- -- 2.49.1 From 544b96bc9ec14e6946d1b69bf2e60ed149cda56a Mon Sep 17 00:00:00 2001 From: Marcel Date: Mon, 11 May 2026 22:43:55 +0200 Subject: [PATCH 05/10] fix(ci): pin Reload Caddy to alpine:3.21 digest, add reload-vs-restart rationale - Switch ubuntu:22.04 (floating, ~70 MB) to alpine:3.21 pinned by sha256 digest (~5 MB); util-linux installed at run time via apk add - Add explicit comment explaining why `reload` not `restart`: reload applies the new config in-process without dropping TLS connections Addresses Tobias + Nora blocker from PR review.
Co-Authored-By: Claude Sonnet 4.6 --- .gitea/workflows/nightly.yml | 33 ++++++++++++++++++++------------- .gitea/workflows/release.yml | 7 ++++--- 2 files changed, 24 insertions(+), 16 deletions(-) diff --git a/.gitea/workflows/nightly.yml b/.gitea/workflows/nightly.yml index 72eaa60f..490a91fc 100644 --- a/.gitea/workflows/nightly.yml +++ b/.gitea/workflows/nightly.yml @@ -129,22 +129,29 @@ jobs: # the current config is live. # # The runner executes job steps inside Docker containers (DooD). - # `systemctl` is not present in Ubuntu container images and cannot - # reach the host's systemd directly. We use the Docker socket - # (mounted into every job container via runner-config.yaml) to spin - # up a privileged sibling container in the host PID namespace; - # nsenter then enters the host's namespaces so systemctl talks to - # the real host systemd daemon. No sudoers entry is required — the - # Docker socket already grants root-equivalent host access. + # `systemctl` is not present in container images and cannot reach + # the host's systemd directly. We use the Docker socket (mounted + # into every job container via runner-config.yaml) to spin up a + # privileged sibling container in the host PID namespace; nsenter + # then enters the host's namespaces so systemctl talks to the real + # host systemd daemon. No sudoers entry is required — the Docker + # socket already grants root-equivalent host access. # - # `systemctl reload caddy` sends SIGHUP; Caddy re-reads - # /etc/caddy/Caddyfile (symlinked to infra/caddy/Caddyfile) without - # dropping connections. If Caddy is not running this step fails fast - # before the smoke test issues a misleading "port 443 refused" error. + # Alpine is used: ~5 MB vs ~70 MB for ubuntu, no unnecessary + # tooling, and the digest is pinned so any upstream change requires + # an explicit bump PR. util-linux (which ships nsenter) is installed + # at run time; apk add takes ~1 s on the warm VPS cache. 
+ # + # `reload` not `restart`: reload runs the unit's ExecReload, so Caddy + # applies the new config in-process without dropping TLS connections. + # `restart` would briefly stop the service, losing in-flight requests. + # + # If Caddy is not running this step fails fast before the smoke test + # issues a misleading "port 443 refused" error. run: | docker run --rm --privileged --pid=host \ - ubuntu:22.04 \ - nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy + alpine:3.21@sha256:48b0309ca019d89d40f670aa1bc06e426dc0931948452e8491e3d65087abc07d \ + sh -c 'apk add --no-cache util-linux -q && nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy' - name: Smoke test deployed environment # Healthchecks confirm containers are healthy; they do NOT confirm the diff --git a/.gitea/workflows/release.yml b/.gitea/workflows/release.yml index c53797c4..8d355da2 100644 --- a/.gitea/workflows/release.yml +++ b/.gitea/workflows/release.yml @@ -98,11 +98,12 @@ jobs: # cannot call systemctl directly; nsenter via a privileged sibling # container reaches the host systemd. Must run after deploy (so the # latest Caddyfile is on disk) and before the smoke test (so the - # public surface reflects the current config). + # public surface reflects the current config). Alpine with pinned + # digest; reload not restart — see nightly.yml for full rationale. run: | docker run --rm --privileged --pid=host \ - ubuntu:22.04 \ - nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy + alpine:3.21@sha256:48b0309ca019d89d40f670aa1bc06e426dc0931948452e8491e3d65087abc07d \ + sh -c 'apk add --no-cache util-linux -q && nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy' - name: Smoke test deployed environment # See nightly.yml — same three checks, against the prod vhost.
-- 2.49.1 From 4bb988824fd65c3b203128dc2f3a7010868bd886 Mon Sep 17 00:00:00 2001 From: Marcel Date: Mon, 11 May 2026 22:47:41 +0200 Subject: [PATCH 06/10] docs(ci): update nsenter example to Alpine, document alternatives considered Co-Authored-By: Claude Sonnet 4.6 --- docs/infrastructure/ci-gitea.md | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/docs/infrastructure/ci-gitea.md b/docs/infrastructure/ci-gitea.md index f58bfde4..d54adcfa 100644 --- a/docs/infrastructure/ci-gitea.md +++ b/docs/infrastructure/ci-gitea.md @@ -27,12 +27,22 @@ Job containers are unprivileged and do not share the host's PID/mount/network na - name: Reload Caddy run: | docker run --rm --privileged --pid=host \ - ubuntu:22.04 \ - nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy + alpine:3.21@sha256:48b0309ca019d89d40f670aa1bc06e426dc0931948452e8491e3d65087abc07d \ + sh -c 'apk add --no-cache util-linux -q && nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy' ``` `nsenter -t 1 -m -u -n -p -i` enters the init process's mount, UTS, network, PID, and IPC namespaces, giving `systemctl` a view of the real host systemd. No sudoers entry is required — the Docker socket already grants root-equivalent host access. +Alpine is used instead of Ubuntu: ~5 MB vs ~70 MB, and the digest is pinned to a specific sha256 so any upstream change requires an explicit Renovate bump PR. `util-linux` (which ships `nsenter`) is not part of the Alpine base image but is installed at run time in ~1 s from the warm VPS cache. + +#### Why not `sudo systemctl` in the job container? + +Job containers run as root inside an unprivileged Docker namespace. There is no systemd PID 1 inside the container — `systemctl` would attempt to reach a socket that does not exist. `sudo` is not present in container images and would not help even if it were. + +#### Why not Caddy's admin API? + +Caddy ships a localhost admin API at `:2019` by default.
Job containers do not share the host network namespace, so they cannot reach `localhost:2019` on the host. Exposing `:2019` on a host-bound port to make it reachable would add a network attack surface with no benefit over the current approach. + ### Caddyfile symlink contract The deploy workflows reload Caddy to pick up committed Caddyfile changes. This relies on a symlink that must exist on the VPS: -- 2.49.1 From 8536b2ebbdc516b38c9bfc542d3f88ac7beeb944 Mon Sep 17 00:00:00 2001 From: Marcel Date: Mon, 11 May 2026 22:52:34 +0200 Subject: [PATCH 07/10] docs(deploy): note Caddyfile symlink is a CI dependency Co-Authored-By: Claude Sonnet 4.6 --- docs/DEPLOYMENT.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md index a2dc55ca..58d2769e 100644 --- a/docs/DEPLOYMENT.md +++ b/docs/DEPLOYMENT.md @@ -151,6 +151,9 @@ ufw default deny incoming && ufw allow 22/tcp && ufw allow 80/tcp && ufw allow 4 apt install caddy # Use the Caddyfile from the repo (replace path with the runner's clone target) +# CI DEPENDENCY: the nightly and release workflows run `systemctl reload caddy` to +# pick up committed Caddyfile changes. They find the file via this symlink — if it +# is absent or points elsewhere, the reload succeeds but serves stale config. ln -sf /opt/familienarchiv/infra/caddy/Caddyfile /etc/caddy/Caddyfile systemctl reload caddy -- 2.49.1 From bbdf1c3e677699150f64bb924c77712a59e3101f Mon Sep 17 00:00:00 2001 From: Marcel Date: Mon, 11 May 2026 23:13:50 +0200 Subject: [PATCH 08/10] =?UTF-8?q?docs(adr):=20ADR-012=20=E2=80=94=20nsente?= =?UTF-8?q?r=20via=20privileged=20container=20for=20host=20service=20manag?= =?UTF-8?q?ement=20in=20CI?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Captures the architectural decision, alternatives considered (sudo systemctl, Caddy admin API, SSH), and consequences (symlink contract, Renovate review requirement, step duplication tracked in #539). 
Co-Authored-By: Claude Sonnet 4.6 --- ...enter-for-host-service-management-in-ci.md | 63 +++++++++++++++++++ 1 file changed, 63 insertions(+) create mode 100644 docs/adr/012-nsenter-for-host-service-management-in-ci.md diff --git a/docs/adr/012-nsenter-for-host-service-management-in-ci.md b/docs/adr/012-nsenter-for-host-service-management-in-ci.md new file mode 100644 index 00000000..17823a21 --- /dev/null +++ b/docs/adr/012-nsenter-for-host-service-management-in-ci.md @@ -0,0 +1,63 @@ +# ADR-012: nsenter via privileged sibling container for host service management in CI + +## Status + +Accepted + +## Context + +The deploy workflows (`.gitea/workflows/nightly.yml`, `release.yml`) run job steps inside Docker containers under a Docker-out-of-Docker (DooD) setup: the Gitea runner container mounts the host Docker socket, and act_runner spawns a sibling container for each job. That job container also gets the Docker socket mounted (via `valid_volumes` in `runner-config.yaml`). + +This architecture has one significant limitation: **job containers cannot manage host services**. Specifically: + +- Job containers are not in the host's PID, mount, UTS, network, or IPC namespaces. +- There is no systemd PID 1 inside a job container — `systemctl` has nothing to talk to. +- `sudo` is not present in standard container images; even if it were, it would not help. +- Caddy runs as a **host systemd service** (not a Docker container), managing TLS certificates via Let's Encrypt. It must be running on the host to serve port 443. + +The deploy workflows need to tell Caddy to reload its config after each deploy so that committed Caddyfile changes are applied before the smoke test validates the public surface. Without a reload step, Caddy silently serves the previous config and the smoke test may pass against stale configuration. 
+ +## Decision + +Use the host Docker socket (already mounted in every job container via `runner-config.yaml`) to spin up a **privileged sibling container** in the host PID namespace, then use `nsenter` to enter the host's namespaces and call `systemctl reload caddy`: + +```yaml +- name: Reload Caddy + run: | + docker run --rm --privileged --pid=host \ + alpine:3.21@sha256:48b0309ca019d89d40f670aa1bc06e426dc0931948452e8491e3d65087abc07d \ + sh -c 'apk add --no-cache util-linux -q && nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy' +``` + +`nsenter -t 1 -m -u -n -p -i` enters the init process's mount, UTS, network, PID, and IPC namespaces, giving `systemctl` a view of the real host systemd daemon. + +**Alpine is used** instead of Ubuntu: ~5 MB vs ~70 MB pull size, no unnecessary tooling. `util-linux` (which ships `nsenter`) is installed at run time; apk add takes ~1 s on the warm VPS cache. The image digest is pinned so any upstream change requires an explicit Renovate bump PR. + +**`reload` not `restart`**: reload runs the unit's `ExecReload`, so Caddy applies the new config in-process without dropping TLS connections or in-flight requests. + +**No sudoers entry is required**: the Docker socket already grants root-equivalent host access. This pattern makes existing implicit privileges explicit rather than introducing new ones. + +This decision applies the same pattern to both `nightly.yml` and `release.yml` since both deploy the app stack and must apply Caddyfile changes before smoke-testing the public surface. + +## Alternatives Considered + +| Alternative | Why rejected | +|---|---| +| `sudo systemctl reload caddy` in the job container | No systemd PID 1 inside the container — `systemctl` has nothing to connect to. `sudo` is not present in container images and would not help even if it were. | +| Caddy admin API (`curl localhost:2019/load`) | Job containers do not share the host network namespace; `localhost:2019` on the host is unreachable.
Exposing `:2019` on a host-bound port would add a network attack surface with no benefit over the current approach. | +| SSH from the job container to the VPS host | Requires storing an SSH private key as a CI secret, managing authorized_keys on the host, and opening an inbound SSH path from the container. Adds key management overhead for a pattern that the Docker socket already enables more directly. | +| Running Caddy as a Docker container (instead of host service) | Caddy manages TLS certificates via Let's Encrypt; running it in Docker complicates certificate persistence and renewal. As a host service, cert storage is straightforward and restarts do not risk rate-limit issues. This would be a larger infrastructure change unrelated to the CI gap. | + +## Consequences + +- The runner host's Docker socket access is now a capability relied upon for host service management, not just for running `docker compose` commands. This is stated explicitly in the YAML comment so future reviewers understand the trust boundary. +- The Caddyfile symlink on the VPS (`/etc/caddy/Caddyfile → /opt/familienarchiv/infra/caddy/Caddyfile`) is a required contract for CI to succeed. It is documented in `docs/DEPLOYMENT.md §3.1` and `docs/infrastructure/ci-gitea.md`. If the symlink is absent or mis-pointed, `systemctl reload caddy` succeeds but Caddy serves stale config. +- Renovate will create bump PRs when a new Alpine 3.21 digest is published. Because the container runs `--privileged --pid=host`, these bump PRs must be reviewed manually and must not be auto-merged. A `packageRule` in `renovate.json` enforces this. +- The step is duplicated between `nightly.yml` and `release.yml` (tracked in issue #539 for extraction into a composite action). +- If Caddy is not running when the step executes, `systemctl reload` exits non-zero and the workflow aborts before the smoke test — preventing a misleading "port 443 refused" curl error. 
+ +## References + +- `docs/infrastructure/ci-gitea.md` §"Running host-level commands from CI (nsenter pattern)" — full operational context, troubleshooting guide +- `docs/DEPLOYMENT.md` §3.1 — Caddyfile symlink bootstrap step +- ADR-011 — single-tenant runner trust model (Docker socket access scope) -- 2.49.1 From 58922bee537c7edc9c59a0970ced4b3ce6797f76 Mon Sep 17 00:00:00 2001 From: Marcel Date: Mon, 11 May 2026 23:14:35 +0200 Subject: [PATCH 09/10] docs(ci): add Troubleshooting section for Reload Caddy failures Covers the three failure modes Sara flagged: Caddy stopped (explicit systemctl error), symlink missing/mis-pointed (silent reload, stale smoke test), and Docker socket / nsenter unavailable (container error). Each failure mode includes symptoms and recovery steps. Co-Authored-By: Claude Sonnet 4.6 --- docs/infrastructure/ci-gitea.md | 50 +++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/docs/infrastructure/ci-gitea.md b/docs/infrastructure/ci-gitea.md index d54adcfa..017d6ba2 100644 --- a/docs/infrastructure/ci-gitea.md +++ b/docs/infrastructure/ci-gitea.md @@ -58,6 +58,56 @@ ls -la /etc/caddy/Caddyfile # Expected: lrwxrwxrwx ... /etc/caddy/Caddyfile -> /opt/familienarchiv/infra/caddy/Caddyfile ``` +### Troubleshooting: Reload Caddy step fails + +**Failure mode 1 — Caddy is stopped** + +Symptom in CI log: +``` +Failed to reload caddy.service: Unit caddy.service is not active. +``` + +Recovery: +```bash +ssh root@ +systemctl start caddy +systemctl status caddy # confirm Active: active (running) +``` + +Re-run the workflow via Gitea Actions → "Re-run workflow". + +**Failure mode 2 — Caddyfile symlink is missing or mis-pointed** + +This failure is silent — `systemctl reload caddy` exits 0 but Caddy reloads whatever `/etc/caddy/Caddyfile` currently resolves to. The smoke test may then pass against stale config. 
+ +Symptom: smoke test fails on the HSTS value or the `/actuator/health → 404` check despite the Reload Caddy step succeeding. + +Diagnosis: +```bash +ssh root@ +ls -la /etc/caddy/Caddyfile +# Should be: lrwxrwxrwx ... /etc/caddy/Caddyfile -> /opt/familienarchiv/infra/caddy/Caddyfile +``` + +Recovery if symlink is wrong or missing: +```bash +ln -sf /opt/familienarchiv/infra/caddy/Caddyfile /etc/caddy/Caddyfile +systemctl reload caddy +``` + +**Failure mode 3 — nsenter / Docker socket unavailable** + +Symptom in CI log: +``` +docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. +``` +or +``` +nsenter: failed to execute /bin/systemctl: No such file or directory +``` + +The first error means the Docker socket is not mounted into the job container — check `valid_volumes` in `/root/docker/gitea/runner-config.yaml` on the VPS. The second means the Alpine image is running but cannot enter the host mount namespace; verify `--privileged` and `--pid=host` are both present in the workflow step. + --- ## Gitea vs GitHub Actions Differences -- 2.49.1 From e9caa3a1f748f04989f040d3b839ac6a7e23eca9 Mon Sep 17 00:00:00 2001 From: Marcel Date: Mon, 11 May 2026 23:15:05 +0200 Subject: [PATCH 10/10] chore(renovate): require manual review for privileged CI image digest bumps Adds a packageRule matching .gitea/workflows/** digest updates with automerge: false. Digest bumps for images running --privileged --pid=host have root-equivalent host access and must not be auto-merged. Addresses Nora's review concern on #537. 
Co-Authored-By: Claude Sonnet 4.6 --- renovate.json | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/renovate.json b/renovate.json index bcb6238b..e4f29762 100644 --- a/renovate.json +++ b/renovate.json @@ -5,6 +5,13 @@ "matchPackagePatterns": ["^@tiptap/"], "groupName": "tiptap", "automerge": false + }, + { + "description": "Digest bumps for images used in privileged CI steps (--privileged --pid=host) must be reviewed manually — a compromised image has root-equivalent host access.", + "matchPaths": [".gitea/workflows/**"], + "matchUpdateTypes": ["digest"], + "automerge": false, + "reviewersFromCodeOwners": false } ] } -- 2.49.1