# CI with Gitea Actions

This document covers the Gitea Actions CI workflow for Familienarchiv, including the full workflow YAML, differences from GitHub Actions, and self-hosted runner provisioning.

---

## Runner Architecture

Familienarchiv uses **two containers** on the same Hetzner VPS, of which only `gitea-runner` executes jobs:

| Container | Purpose | Config |
|---|---|---|
| `gitea` (Docker container) | Hosts Gitea itself | `infra/gitea/docker-compose.yml` |
| `gitea-runner` (Docker container) | Runs all CI and deploy jobs | `infra/gitea/docker-compose.yml` + `/root/docker/gitea/runner-config.yaml` |

Both containers live in the `gitea_gitea` Docker network on the VPS. The runner connects to Gitea via the LAN IP so job containers (which don't share the `gitea_gitea` network) can also reach it.

### Docker-out-of-Docker (DooD)

The `gitea-runner` container mounts the host Docker socket (`/var/run/docker.sock`). When a workflow job runs, act_runner spawns a **sibling container** for each job. That job container also gets the Docker socket mounted (via `valid_volumes` in `runner-config.yaml`), enabling `docker compose` calls in workflow steps.

### Workspace bind-mount setup (DooD path resolution)

When a workflow step calls `docker compose up` with relative bind-mount sources (e.g. `./infra/observability/prometheus/prometheus.yml`), Compose resolves them against `$(pwd)` inside the job container and passes the resulting **absolute path** to the host Docker daemon. The host daemon then tries to bind-mount that path from the **host filesystem**. In the default DooD setup the job container's workspace lives in the act_runner overlay2 layer: the host has no directory at that path, auto-creates an empty one, and the container fails with:

```
error mounting "…/prometheus/prometheus.yml" to rootfs at "/etc/prometheus/prometheus.yml": not a directory
```

**Solution (ADR-015):** store job workspaces on a real host path and mount it at the **same absolute path** inside the runner and every job container.
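
As an illustration of the same-path rule, a runner configuration along the following lines would place job workspaces under a host directory and mount that directory at an identical path in each job container. This is a sketch, not the project's actual file: the `container:` nesting and every value shown are assumptions derived from the `/srv/gitea-workspace` path used in this document.

```yaml
# Hypothetical excerpt of runner-config.yaml; the real file may differ.
container:
  # Assumption: per-job workspaces are created under this host-backed directory.
  workdir_parent: /srv/gitea-workspace
  # Assumption: allow-list of host paths that job containers may mount.
  valid_volumes:
    - /var/run/docker.sock
    - /srv/gitea-workspace
  # Assumption: extra container options that mount the workspace root
  # at the same absolute path inside every job container.
  options: "-v /srv/gitea-workspace:/srv/gitea-workspace"
```

Because the source and target of the mount are identical, any absolute path Compose computes inside a job container also exists on the host.
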
`runner-config.yaml` configures this via `workdir_parent`, `valid_volumes`, and `options`.

**One-time host setup** (required on any fresh VPS):

```bash
mkdir -p /srv/gitea-workspace
# Then add to the runner service in ~/docker/gitea/compose.yaml:
#   volumes:
#     - /srv/gitea-workspace:/srv/gitea-workspace
# Restart the runner container for the change to take effect.
```

The path `/srv/gitea-workspace` is the canonical workspace root. It must be identical on the host and inside job containers: if the paths differ, Compose still resolves to the container-internal path, which the host daemon cannot find (the original bug).

**Disk management:** act_runner cleans per-run subdirectories on completion. Orphaned directories from interrupted runs accumulate under `/srv/gitea-workspace` and should be pruned manually if disk space becomes a concern:

```bash
# List workspace directories older than 7 days
find /srv/gitea-workspace -mindepth 3 -maxdepth 3 -type d -mtime +7
```

---

### Running host-level commands from CI (nsenter pattern)

Job containers are unprivileged and do not share the host's PID/mount/network namespaces. Commands like `systemctl` that target the host daemon are therefore unavailable by default.

When a workflow step needs to manage a host service (e.g. `systemctl reload caddy`), it uses the Docker socket to spin up a **privileged sibling container** in the host PID namespace:

```yaml
- name: Reload Caddy
  run: |
    docker run --rm --privileged --pid=host \
      alpine:3.21@sha256:48b0309ca019d89d40f670aa1bc06e426dc0931948452e8491e3d65087abc07d \
      sh -c 'apk add --no-cache util-linux -q && nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy'
```

`nsenter -t 1 -m -u -n -p -i` enters the init process's mount, UTS, network, PID, and IPC namespaces, giving `systemctl` a view of the real host systemd. No sudoers entry is required: the Docker socket already grants root-equivalent host access.
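
The same privileged-sibling pattern applies to any host-level command, not only `reload`. As a hedged sketch, a hypothetical pre-check step (not part of the documented workflows) could ask the host systemd whether Caddy is running before a reload is attempted. The step name and the use of `systemctl is-active` are assumptions for illustration; the image reference and `nsenter` flags are copied from the step above.

```yaml
# Hypothetical step: not taken from the project's workflows.
- name: Check Caddy is active (illustrative)
  run: |
    docker run --rm --privileged --pid=host \
      alpine:3.21@sha256:48b0309ca019d89d40f670aa1bc06e426dc0931948452e8491e3d65087abc07d \
      sh -c 'apk add --no-cache util-linux -q && nsenter -t 1 -m -u -n -p -i -- /bin/systemctl is-active caddy'
```

`systemctl is-active` exits non-zero when the unit is not active, so a step like this would fail before the reload runs and make a stopped Caddy easier to spot in the log.
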
Alpine is used instead of Ubuntu: ~5 MB vs ~70 MB, and the digest is pinned to a specific sha256 so any upstream change requires an explicit Renovate bump PR. `util-linux` (which ships `nsenter`) is not part of the Alpine base image but is installed at run time in ~1 s from the warm VPS cache.

This exact step now lives in the `reload-caddy` composite action (see [Composite actions](#composite-actions) below); both deploy workflows call it via `uses: ./.gitea/actions/reload-caddy`. The pinned digest moved with it, so Renovate's privileged-digest watch covers `.gitea/actions/**` as well as `.gitea/workflows/**`.

#### Why not `sudo systemctl` in the job container?

Job containers run as root inside an unprivileged Docker namespace. There is no systemd PID 1 inside the container, so `systemctl` would attempt to reach a socket that does not exist. `sudo` is not present in container images and would not help even if it were.

#### Why not Caddy's admin API?

Caddy ships a localhost admin API at `:2019` by default. Job containers do not share the host network namespace, so they cannot reach `localhost:2019` on the host. Exposing `:2019` on a host-bound port to make it reachable would add a network attack surface with no benefit over the current approach.

### Caddyfile symlink contract

The deploy workflows reload Caddy to pick up committed Caddyfile changes. This relies on a symlink that must exist on the VPS:

```
/etc/caddy/Caddyfile → /opt/familienarchiv/infra/caddy/Caddyfile
```

Created once during server bootstrap (see `docs/DEPLOYMENT.md §3.1`). Verify with:

```bash
ls -la /etc/caddy/Caddyfile
# Expected: lrwxrwxrwx ... /etc/caddy/Caddyfile -> /opt/familienarchiv/infra/caddy/Caddyfile
```

### Troubleshooting: Reload Caddy step fails

**Failure mode 1: Caddy is stopped**

Symptom in CI log:

```
Failed to reload caddy.service: Unit caddy.service is not active.
```

Recovery:

```bash
ssh root@
systemctl start caddy
systemctl status caddy   # confirm Active: active (running)
```

Re-run the workflow via Gitea Actions → "Re-run workflow".

**Failure mode 2: Caddyfile symlink is missing or mis-pointed**

This failure is silent: `systemctl reload caddy` exits 0 but Caddy reloads whatever `/etc/caddy/Caddyfile` currently resolves to. The smoke test may then pass against stale config.

Symptom: smoke test fails on the HSTS value or the `/actuator/health → 404` check despite the Reload Caddy step succeeding.

Diagnosis:

```bash
ssh root@
ls -la /etc/caddy/Caddyfile
# Should be: lrwxrwxrwx ... /etc/caddy/Caddyfile -> /opt/familienarchiv/infra/caddy/Caddyfile
```

Recovery if symlink is wrong or missing:

```bash
ln -sf /opt/familienarchiv/infra/caddy/Caddyfile /etc/caddy/Caddyfile
systemctl reload caddy
```

**Failure mode 3: nsenter / Docker socket unavailable**

Symptom in CI log:

```
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock.
```

or

```
nsenter: failed to execute /bin/systemctl: No such file or directory
```

The first error means the Docker socket is not mounted into the job container; check `valid_volumes` in `/root/docker/gitea/runner-config.yaml` on the VPS. The second means the Alpine image is running but cannot enter the host mount namespace; verify `--privileged` and `--pid=host` are both present in the workflow step.

**Failure mode 4: workspace bind-mount not configured (observability stack or any compose-with-file-mounts job)**

Symptom in CI log:

```
Error response from daemon: error while creating mount source path "…/prometheus/prometheus.yml": mkdir …: not a directory
```

Or the service starts but immediately crashes because a config file was mounted as an empty directory.

Cause: `/srv/gitea-workspace` does not exist on the host, or the runner container's `compose.yaml` is missing the `- /srv/gitea-workspace:/srv/gitea-workspace` volume line.

Diagnosis:

```bash
ssh root@
ls -la /srv/gitea-workspace                     # must exist and be a directory
docker inspect gitea-runner | grep -A5 Mounts   # must show /srv/gitea-workspace
```

Recovery:

```bash
mkdir -p /srv/gitea-workspace
# Add volume line to runner compose.yaml, then:
docker compose -f ~/docker/gitea/compose.yaml up -d gitea-runner
```

See `docs/DEPLOYMENT.md §3.1` and ADR-015 for the full setup rationale.

---

## Composite actions

The `nightly.yml` (staging) and `release.yml` (production) deploy workflows share their observability-stack deploy, Caddy reload, and smoke-test logic through three single-responsibility composite actions under `.gitea/actions/` (ADR-029). Before this, the shared logic was duplicated in both workflows and held together by `# Keep in sync with nightly.yml` comments, an unenforced honour-system invariant.

| Action | Inputs | Purpose |
|---|---|---|
| `deploy-obs` | `grafana_admin_password`, `grafana_db_password`, `glitchtip_secret_key`, `postgres_password`, `postgres_host` | Deploy obs configs + secrets to `/opt/familienarchiv`, validate the compose config, start the stack, assert the five healthchecked services |
| `reload-caddy` | (none) | Reload host Caddy via the privileged-sibling + nsenter pattern |
| `smoke-test` | `host` | Verify the public surface (login reachable, HSTS pinned, Permissions-Policy present, `/actuator → 404`) |

A workflow calls them by relative path, passing per-environment values as `with:` inputs:

```yaml
- uses: ./.gitea/actions/deploy-obs
  with:
    grafana_admin_password: ${{ secrets.GRAFANA_ADMIN_PASSWORD }}
    grafana_db_password: ${{ secrets.GRAFANA_DB_PASSWORD }}
    glitchtip_secret_key: ${{ secrets.GLITCHTIP_SECRET_KEY }}
    postgres_password: ${{ secrets.STAGING_POSTGRES_PASSWORD }}
    postgres_host: archiv-staging-db-1
- uses: ./.gitea/actions/reload-caddy
- uses: ./.gitea/actions/smoke-test
  with:
    host: staging.raddatz.cloud
```

### Checkout-first ordering rule

A local composite action (`uses: ./…`) only exists on disk
**after** the repo is checked out. `actions/checkout@v4` MUST therefore be the **first step** of any job that calls one: if a future reorder moves checkout later, every `uses: ./.gitea/actions/…` call fails because the action file is not yet on disk. Both deploy workflows pin checkout as step 1 for exactly this reason.

### Secrets inside composite actions

The `secrets.*` context is **not** available inside a composite action. Secrets are passed in as `inputs`, mapped to an `env:` block, and referenced as `$VAR`:

```yaml
inputs:
  grafana_admin_password:
    required: true   # no default: a missing secret must fail loudly, never fall back to empty
runs:
  using: composite
  steps:
    - shell: bash   # composite steps do NOT default the shell, so always declare it
      env:
        GRAFANA_ADMIN_PASSWORD: ${{ inputs.grafana_admin_password }}
      run: |
        cat > obs-secrets.env <