ci(nightly): reload Caddy before smoke test #537

Merged
marcel merged 10 commits from fix/nightly-caddy-reload into main 2026-05-12 07:51:13 +02:00

10 Commits

Author SHA1 Message Date
Marcel
e9caa3a1f7 chore(renovate): require manual review for privileged CI image digest bumps
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 1m46s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m8s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Failing after 11s
CI / OCR Service Tests (push) Successful in 16s
CI / Unit & Component Tests (push) Failing after 1m52s
CI / Backend Unit Tests (push) Successful in 4m11s
CI / fail2ban Regex (push) Successful in 39s
CI / Compose Bucket Idempotency (push) Failing after 10s
Adds a packageRule matching .gitea/workflows/** digest updates with
automerge: false. Digest bumps for images running --privileged --pid=host
have root-equivalent host access and must not be auto-merged.

Addresses Nora's review concern on #537.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 07:42:28 +02:00
Marcel
58922bee53 docs(ci): add Troubleshooting section for Reload Caddy failures
Covers the three failure modes Sara flagged: Caddy stopped (explicit
systemctl error), symlink missing/mis-pointed (silent reload, stale
smoke test), and Docker socket / nsenter unavailable (container error).
Each failure mode includes symptoms and recovery steps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 07:42:28 +02:00
Marcel
bbdf1c3e67 docs(adr): ADR-012 — nsenter via privileged container for host service management in CI
Captures the architectural decision, alternatives considered (sudo
systemctl, Caddy admin API, SSH), and consequences (symlink contract,
Renovate review requirement, step duplication tracked in #539).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 07:42:28 +02:00
Marcel
8536b2ebbd docs(deploy): note Caddyfile symlink is a CI dependency
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 07:42:28 +02:00
Marcel
4bb988824f docs(ci): update nsenter example to Alpine, document alternatives considered
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 07:42:28 +02:00
Marcel
544b96bc9e fix(ci): pin Reload Caddy to alpine:3.21 digest, add reload-vs-restart rationale
- Switch ubuntu:22.04 (floating, ~70 MB) to alpine:3.21 pinned by sha256
  digest (~5 MB); util-linux installed at run time via apk add
- Add explicit comment explaining why `reload` not `restart`: SIGHUP
  re-reads config in-process without dropping TLS connections

Addresses Tobias + Nora blocker from PR review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 07:42:28 +02:00
Marcel
fe2cdaae83 docs(ci): document DooD runner architecture and nsenter pattern
Replace the stale generic runner provisioning docs with an accurate
description of the actual two-container setup on the Hetzner VPS.
Document the nsenter pattern for running host-level commands (systemctl)
from containerised CI steps, and the Caddyfile symlink contract that the
reload step depends on.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 07:42:28 +02:00
Marcel
d29169eb39 fix(ci): add Caddy reload step to release workflow
Same gap as nightly.yml: production deploys also need Caddy to reload
the updated Caddyfile before the smoke test validates the public surface.
Uses the same nsenter pattern introduced in the previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 07:42:28 +02:00
Marcel
d750d5cee2 fix(ci): reload Caddy via nsenter, not sudo systemctl
`sudo systemctl reload caddy` does not work from inside a DooD job
container: `systemctl` is absent from Ubuntu container images and
container processes cannot reach the host systemd without entering its
namespaces. Replace with `docker run --privileged --pid=host ubuntu:22.04
nsenter -t 1 -m -u -n -p -i -- /bin/systemctl reload caddy`, which uses
the already-mounted Docker socket to spin up a privileged sibling
container that enters the host PID namespace via nsenter. Tested live on
the Hetzner VPS. No sudoers entry required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 07:42:28 +02:00
Marcel
90f52eae41 ci(nightly): reload Caddy before smoke test
Adds a `sudo systemctl reload caddy` step between the docker compose
deploy and the smoke test. This ensures any committed Caddyfile changes
are applied before the public surface is verified.

Previously the workflow had no mechanism to push Caddyfile changes to
the running host daemon. A Caddyfile edit would land in the repo but
Caddy would keep serving the previous config, causing the smoke test to
catch a stale header or still-proxied /actuator route rather than the
intended current config.

This step also surfaces the root cause of today's port-443 failure
explicitly: if Caddy is not running, the step fails with a clear service
error rather than a misleading "Failed to connect to port 443" from curl.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 07:42:28 +02:00