Files
familienarchiv/docs/infrastructure/ci-gitea.md
Marcel 02fb16a0bd
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m20s
CI / OCR Service Tests (pull_request) Successful in 24s
CI / Backend Unit Tests (pull_request) Successful in 3m39s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
docs(ci): document composite actions in ci-gitea.md
Adds a Composite actions section covering the checkout-first ordering rule, the
secrets-via-inputs + unquoted-heredoc constraint (with the five-key guard and
shell: bash requirement), and a step-by-step for adding an input. Notes that the
inline Reload Caddy example now lives in the reload-caddy action.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 19:25:32 +02:00

465 lines
19 KiB
Markdown

# CI with Gitea Actions
This document covers the Gitea Actions CI workflow for Familienarchiv, including the full workflow YAML, differences from GitHub Actions, and self-hosted runner provisioning.
---
## Runner Architecture
Familienarchiv uses **two runners** on the same Hetzner VPS:
| Runner | Purpose | Config |
|---|---|---|
| `gitea` (Docker container) | Hosts Gitea itself | `infra/gitea/docker-compose.yml` |
| `gitea-runner` (Docker container) | Runs all CI and deploy jobs | `infra/gitea/docker-compose.yml` + `/root/docker/gitea/runner-config.yaml` |
Both containers live in the `gitea_gitea` Docker network on the VPS. The runner connects to Gitea via the LAN IP so job containers (which don't share the `gitea_gitea` network) can also reach it.
### Docker-out-of-Docker (DooD)
The `gitea-runner` container mounts the host Docker socket (`/var/run/docker.sock`). When a workflow job runs, act_runner spawns a **sibling container** for each job. That job container also gets the Docker socket mounted (via `valid_volumes` in `runner-config.yaml`), enabling `docker compose` calls in workflow steps.
### Workspace bind-mount setup (DooD path resolution)
When a workflow step calls `docker compose up` with relative bind-mount sources (e.g. `./infra/observability/prometheus/prometheus.yml`), Compose resolves them against `$(pwd)` inside the job container and passes the resulting **absolute path** to the host Docker daemon. The host daemon then tries to bind-mount that path from the **host filesystem**.
In the default DooD setup the job container's workspace lives in the act_runner overlay2 layer — the host has no directory at that path, auto-creates an empty one, and the container fails with:
```
error mounting "…/prometheus/prometheus.yml" to rootfs at "/etc/prometheus/prometheus.yml": not a directory
```
**Solution (ADR-015):** store job workspaces on a real host path and mount it at the **same absolute path** inside the runner and every job container. `runner-config.yaml` configures this via `workdir_parent`, `valid_volumes`, and `options`.
**One-time host setup** (required on any fresh VPS):
```bash
mkdir -p /srv/gitea-workspace
# Then add to the runner service in ~/docker/gitea/compose.yaml:
# volumes:
# - /srv/gitea-workspace:/srv/gitea-workspace
# Restart the runner container for the change to take effect.
```
The path `/srv/gitea-workspace` is the canonical workspace root. It must be identical on the host and inside job containers — if the paths differ, Compose still resolves to the container-internal path, which the host daemon cannot find (the original bug).
**Disk management:** act_runner cleans per-run subdirectories on completion. Orphaned directories from interrupted runs accumulate under `/srv/gitea-workspace` and should be pruned manually if disk space becomes a concern:
```bash
# List workspace directories older than 7 days
find /srv/gitea-workspace -mindepth 3 -maxdepth 3 -type d -mtime +7
```
---
### Running host-level commands from CI (nsenter pattern)
Job containers are unprivileged and do not share the host's PID/mount/network namespaces. Commands like `systemctl` that target the host daemon are therefore unavailable by default. When a workflow step needs to manage a host service (e.g. `systemctl reload caddy`), it uses the Docker socket to spin up a **privileged sibling container** in the host PID namespace:
```yaml
- name: Reload Caddy
run: |
docker run --rm --privileged --pid=host \
alpine:3.21@sha256:48b0309ca019d89d40f670aa1bc06e426dc0931948452e8491e3d65087abc07d \
sh -c 'apk add --no-cache util-linux -q && nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy'
```
`nsenter -t 1 -m -u -n -p -i` enters the init process's mount, UTS, IPC, network, PID, and cgroup namespaces, giving `systemctl` a view of the real host systemd. No sudoers entry is required — the Docker socket already grants root-equivalent host access.
Alpine is used instead of Ubuntu: ~5 MB vs ~70 MB, and the digest is pinned to a specific sha256 so any upstream change requires an explicit Renovate bump PR. `util-linux` (which ships `nsenter`) is not part of the Alpine base image but is installed at run time in ~1 s from the warm VPS cache.
This exact step now lives in the `reload-caddy` composite action (see [Composite actions](#composite-actions) below); both deploy workflows call it via `uses: ./.gitea/actions/reload-caddy`. The pinned digest moved with it, so Renovate's privileged-digest watch covers `.gitea/actions/**` as well as `.gitea/workflows/**`.
#### Why not `sudo systemctl` in the job container?
Job containers run as root inside an unprivileged Docker namespace. There is no systemd PID 1 inside the container — `systemctl` would attempt to reach a socket that does not exist. `sudo` is not present in container images and would not help even if it were.
#### Why not Caddy's admin API?
Caddy ships a localhost admin API at `:2019` by default. Job containers do not share the host network namespace, so they cannot reach `localhost:2019` on the host. Exposing `:2019` on a host-bound port to make it reachable would add a network attack surface with no benefit over the current approach.
### Caddyfile symlink contract
The deploy workflows reload Caddy to pick up committed Caddyfile changes. This relies on a symlink that must exist on the VPS:
```
/etc/caddy/Caddyfile → /opt/familienarchiv/infra/caddy/Caddyfile
```
Created once during server bootstrap (see `docs/DEPLOYMENT.md §3.1`). Verify with:
```bash
ls -la /etc/caddy/Caddyfile
# Expected: lrwxrwxrwx ... /etc/caddy/Caddyfile -> /opt/familienarchiv/infra/caddy/Caddyfile
```
### Troubleshooting: Reload Caddy step fails
**Failure mode 1 — Caddy is stopped**
Symptom in CI log:
```
Failed to reload caddy.service: Unit caddy.service is not active.
```
Recovery:
```bash
ssh root@<vps>
systemctl start caddy
systemctl status caddy # confirm Active: active (running)
```
Re-run the workflow via Gitea Actions → "Re-run workflow".
**Failure mode 2 — Caddyfile symlink is missing or mis-pointed**
This failure is silent — `systemctl reload caddy` exits 0 but Caddy reloads whatever `/etc/caddy/Caddyfile` currently resolves to. The smoke test may then pass against stale config.
Symptom: smoke test fails on the HSTS value or the `/actuator/health → 404` check despite the Reload Caddy step succeeding.
Diagnosis:
```bash
ssh root@<vps>
ls -la /etc/caddy/Caddyfile
# Should be: lrwxrwxrwx ... /etc/caddy/Caddyfile -> /opt/familienarchiv/infra/caddy/Caddyfile
```
Recovery if symlink is wrong or missing:
```bash
ln -sf /opt/familienarchiv/infra/caddy/Caddyfile /etc/caddy/Caddyfile
systemctl reload caddy
```
**Failure mode 3 — nsenter / Docker socket unavailable**
Symptom in CI log:
```
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock.
```
or
```
nsenter: failed to execute /bin/systemctl: No such file or directory
```
The first error means the Docker socket is not mounted into the job container — check `valid_volumes` in `/root/docker/gitea/runner-config.yaml` on the VPS. The second means the Alpine image is running but cannot enter the host mount namespace; verify `--privileged` and `--pid=host` are both present in the workflow step.
**Failure mode 4 — workspace bind-mount not configured (observability stack or any compose-with-file-mounts job)**
Symptom in CI log:
```
Error response from daemon: error while creating mount source path "…/prometheus/prometheus.yml": mkdir …: not a directory
```
Or the service starts but immediately crashes because a config file was mounted as an empty directory.
Cause: `/srv/gitea-workspace` does not exist on the host, or the runner container's `compose.yaml` is missing the `- /srv/gitea-workspace:/srv/gitea-workspace` volume line.
Diagnosis:
```bash
ssh root@<vps>
ls -la /srv/gitea-workspace # must exist and be a directory
docker inspect gitea-runner | grep -A5 Mounts # must show /srv/gitea-workspace
```
Recovery:
```bash
mkdir -p /srv/gitea-workspace
# Add volume line to runner compose.yaml, then:
docker compose -f ~/docker/gitea/compose.yaml up -d gitea-runner
```
See `docs/DEPLOYMENT.md §3.1` and ADR-015 for the full setup rationale.
---
## Composite actions
The `nightly.yml` (staging) and `release.yml` (production) deploy workflows share their observability-stack deploy, Caddy reload, and smoke-test logic through three single-responsibility composite actions under `.gitea/actions/` (ADR-029). Before this, the shared logic was duplicated in both workflows and held together by `# Keep in sync with nightly.yml` comments — an unenforced honour-system invariant.
| Action | Inputs | Purpose |
|---|---|---|
| `deploy-obs` | `grafana_admin_password`, `grafana_db_password`, `glitchtip_secret_key`, `postgres_password`, `postgres_host` | Deploy obs configs + secrets to `/opt/familienarchiv`, validate the compose config, start the stack, assert the five healthchecked services |
| `reload-caddy` | — | Reload host Caddy via the privileged-sibling + nsenter pattern |
| `smoke-test` | `host` | Verify the public surface (login reachable, HSTS pinned, Permissions-Policy present, `/actuator → 404`) |
A workflow calls them by relative path, passing per-environment values as `with:` inputs:
```yaml
- uses: ./.gitea/actions/deploy-obs
with:
grafana_admin_password: ${{ secrets.GRAFANA_ADMIN_PASSWORD }}
grafana_db_password: ${{ secrets.GRAFANA_DB_PASSWORD }}
glitchtip_secret_key: ${{ secrets.GLITCHTIP_SECRET_KEY }}
postgres_password: ${{ secrets.STAGING_POSTGRES_PASSWORD }}
postgres_host: archiv-staging-db-1
- uses: ./.gitea/actions/reload-caddy
- uses: ./.gitea/actions/smoke-test
with:
host: staging.raddatz.cloud
```
### Checkout-first ordering rule
A local composite action (`uses: ./…`) only exists on disk **after** the repo is checked out. `actions/checkout@v4` MUST therefore be the **first step** of any job that calls one — if a future reorder moves checkout later, every `uses: ./.gitea/actions/…` call fails because the action file is not yet on disk. Both deploy workflows pin checkout as step 1 for exactly this reason.
### Secrets inside composite actions
The `secrets.*` context is **not** available inside a composite action. Secrets are passed in as `inputs`, mapped to an `env:` block, and referenced as `$VAR`:
```yaml
inputs:
grafana_admin_password:
required: true # no default — a missing secret must fail loudly, never fall back to empty
runs:
using: composite
steps:
- shell: bash # composite steps do NOT default the shell — always declare it
env:
GRAFANA_ADMIN_PASSWORD: ${{ inputs.grafana_admin_password }}
run: |
cat > obs-secrets.env <<EOF # unquoted EOF — $VAR expands at the shell layer
GRAFANA_ADMIN_PASSWORD=$GRAFANA_ADMIN_PASSWORD
EOF
```
Two load-bearing details:
- **Unquoted heredoc delimiter (`<<EOF`, not `<<'EOF'`).** With a quoted delimiter the shell writes the literal string `$GRAFANA_ADMIN_PASSWORD`, and `docker compose config --quiet` still passes (the variable is *present, just wrong*). The `deploy-obs` action guards against this with a five-key **non-empty** check (`grep -Eq "^KEY=.+"`) immediately after writing `obs-secrets.env`. `chmod 600` is the action's final operation so the file is never world-readable.
- **Every `run:` step declares `shell: bash`.** Composite actions do not inherit the workflow's default shell; a step without it fails to run.
### Adding an input to an action
To thread a new per-environment value (e.g. a new secret) through `deploy-obs`:
1. Add it under `inputs:` in `.gitea/actions/deploy-obs/action.yml` with `required: true` and **no `default:`**.
2. Map it in the relevant step's `env:` block: `NEW_KEY: ${{ inputs.new_key }}`.
3. Reference it as `$NEW_KEY` in the `run:` script — add a `NEW_KEY=$NEW_KEY` line to the heredoc **and** a matching entry to the five-key guard loop.
4. Pass it from **both** workflows' `with:` blocks. That is the whole point of the action: the contract lives in one place, so neither environment can silently drift.
---
## Gitea vs GitHub Actions Differences
### Context Variable Names
| GitHub Actions | Gitea Actions |
|---|---|
| `github.sha` | `gitea.sha` |
| `github.actor` | `gitea.actor` |
| `github.repository` | `gitea.repository` |
| `github.ref_name` | `gitea.ref_name` |
| `secrets.GITHUB_TOKEN` | `secrets.GITEA_TOKEN` (must be created manually) |
### Token Name Difference
```yaml
# GitHub Actions
password: ${{ secrets.GITHUB_TOKEN }}
# Gitea Actions — use a Gitea access token stored as a secret
password: ${{ secrets.GITEA_TOKEN }}
```
### Container Registry
```yaml
# GitHub Actions — GHCR
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
tags: ghcr.io/${{ github.repository }}/app:${{ github.sha }}
# Gitea Actions — Gitea Package Registry
registry: gitea.example.com
username: ${{ gitea.actor }}
password: ${{ secrets.GITEA_TOKEN }}
tags: gitea.example.com/${{ gitea.repository }}/app:${{ gitea.sha }}
```
---
## What Works Identically Between GitHub and Gitea Actions
- `uses: actions/checkout@v4` -- works unchanged
- `uses: actions/setup-java@v4` -- works unchanged
- `uses: actions/setup-node@v4` -- works unchanged
- `uses: actions/cache@v4` -- works unchanged
- `uses: docker/build-push-action@v5` -- works unchanged
- `container:` key for running jobs inside a Docker image -- works unchanged
- Secrets syntax `${{ secrets.MY_SECRET }}` -- works unchanged
---
## Full CI Workflow YAML
This is the complete `ci.yml` workflow, updated for Gitea with key changes highlighted.
```yaml
# Updated for Gitea — key changes highlighted
name: CI
on:
push:
pull_request:
jobs:
unit-tests:
name: Unit & Component Tests
runs-on: ubuntu-latest # matches runner label registered above
container:
image: mcr.microsoft.com/playwright:v1.58.2-noble
steps:
- uses: actions/checkout@v4
- name: Cache node_modules
uses: actions/cache@v4
with:
path: frontend/node_modules
key: node-modules-${{ hashFiles('frontend/package-lock.json') }}
- name: Install dependencies
if: steps.node-modules-cache.outputs.cache-hit != 'true'
run: npm ci
working-directory: frontend
- name: Lint
run: npm run lint
working-directory: frontend
- name: Run unit and component tests
run: npm test
working-directory: frontend
- name: Upload screenshots
if: always()
uses: actions/upload-artifact@v3 # pinned per ADR-014 — Gitea Actions does not implement v4 protocol. Do NOT upgrade.
with:
name: unit-test-screenshots
path: frontend/test-results/screenshots/
backend-unit-tests:
name: Backend Unit Tests
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-java@v4
with:
java-version: '21'
distribution: temurin
- name: Cache Maven repository
uses: actions/cache@v4
with:
path: ~/.m2/repository
key: maven-${{ hashFiles('backend/pom.xml') }}
restore-keys: maven-
- name: Run backend tests
run: |
chmod +x mvnw
./mvnw clean test
working-directory: backend
- name: Upload test results
if: always()
uses: actions/upload-artifact@v3 # pinned per ADR-014 — Gitea Actions does not implement v4 protocol. Do NOT upgrade.
with:
name: backend-test-results
path: backend/target/surefire-reports/
e2e-tests:
name: E2E Tests
runs-on: ubuntu-latest
env:
DOCKER_API_VERSION: "1.43"
POSTGRES_USER: archive_user
POSTGRES_PASSWORD: ci_db_password
POSTGRES_DB: family_archive_db
MINIO_ROOT_USER: minio_admin
MINIO_ROOT_PASSWORD: ci_minio_password
MINIO_DEFAULT_BUCKETS: archive-documents
PORT_DB: 5433
PORT_MINIO_API: 9100
PORT_MINIO_CONSOLE: 9101
PORT_BACKEND: 8080
PORT_FRONTEND: 3000
steps:
- uses: actions/checkout@v4
- name: Cleanup leftover containers
run: docker compose -f docker-compose.yml -f docker-compose.ci.yml down --volumes --remove-orphans || true
- name: Start DB and MinIO
run: docker compose -f docker-compose.yml -f docker-compose.ci.yml up -d db minio create-buckets
- name: Wait for DB
run: |
timeout 30 bash -c \
'until docker compose -f docker-compose.yml -f docker-compose.ci.yml exec -T db pg_isready -U archive_user; do sleep 2; done'
- name: Connect job container to compose network
run: docker network connect familienarchiv_archiv-net $(cat /etc/hostname)
- uses: actions/setup-java@v4
with:
java-version: '21'
distribution: temurin
- name: Cache Maven repository
uses: actions/cache@v4
with:
path: ~/.m2/repository
key: maven-${{ hashFiles('backend/pom.xml') }}
restore-keys: maven-
- name: Build backend
run: |
chmod +x mvnw
./mvnw clean package -DskipTests
working-directory: backend
- name: Start backend
run: |
java -jar backend/target/*.jar \
--spring.profiles.active=e2e \
--SPRING_DATASOURCE_URL=jdbc:postgresql://db:5432/family_archive_db \
--SPRING_DATASOURCE_USERNAME=archive_user \
--SPRING_DATASOURCE_PASSWORD=ci_db_password \
--S3_ENDPOINT=http://minio:9000 \
--S3_ACCESS_KEY=minio_admin \
--S3_SECRET_KEY=ci_minio_password \
--S3_BUCKET_NAME=archive-documents \
--S3_REGION=us-east-1 \
--APP_ADMIN_USERNAME=admin \
--APP_ADMIN_PASSWORD=${{ secrets.E2E_ADMIN_PASSWORD }} \
&
timeout 90 bash -c \
'until curl -sf http://localhost:8080/actuator/health | grep -q "UP"; do sleep 3; done'
- uses: actions/setup-node@v4
with:
node-version: 20
- name: Cache node_modules
id: node-modules-cache
uses: actions/cache@v4
with:
path: frontend/node_modules
key: node-modules-${{ hashFiles('frontend/package-lock.json') }}
- name: Install frontend dependencies
if: steps.node-modules-cache.outputs.cache-hit != 'true'
run: npm ci
working-directory: frontend
- name: Cache Playwright browsers
id: playwright-cache
uses: actions/cache@v4
with:
path: ~/.cache/ms-playwright
key: playwright-chromium-${{ hashFiles('frontend/package-lock.json') }}
- name: Install Playwright Chromium + system deps
if: steps.playwright-cache.outputs.cache-hit != 'true'
run: npx playwright install chromium --with-deps
working-directory: frontend
- name: Install Playwright system deps only
if: steps.playwright-cache.outputs.cache-hit == 'true'
run: npx playwright install-deps chromium
working-directory: frontend
- name: Run E2E tests
run: npm run test:e2e
working-directory: frontend
env:
E2E_BASE_URL: http://localhost:3000
E2E_USERNAME: admin
E2E_PASSWORD: ${{ secrets.E2E_ADMIN_PASSWORD }} # ← secret, not hardcoded
E2E_BACKEND_URL: http://localhost:8080
- name: Upload E2E results
if: always()
uses: actions/upload-artifact@v3 # pinned per ADR-014 — Gitea Actions does not implement v4 protocol. Do NOT upgrade.
with:
name: e2e-results
path: frontend/test-results/e2e/
```