familienarchiv/.gitea/workflows/nightly.yml at c3c1efe5f17b1b794d1b2cae76a22f964a2a96c3

Marcel 440a191138 infra(workflows): annotate env-file cleanup as load-bearing

The `if: always()` conditional on the env-file cleanup step in both
deploy workflows is what makes the ADR-011 single-tenant runner trust
model safe: secrets land on disk before each deploy and are wiped
unconditionally afterwards. A future workflow refactor that drops
`if: always()` would silently leave plaintext secrets on the runner
on any failed deploy.

The ADR documents this; the workflow file did not. Adds a prominent
inline comment so the next reader of the YAML sees the constraint
without having to cross-reference ADR-011. No behaviour change — both
workflows still parse. Addresses @nora's round-2 suggestion on PR
#499 — "linchpin of the ADR-011 trust model".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-11 14:09:12 +02:00

133 lines

5.5 KiB

YAML

 name: nightly
 # Builds and deploys the staging environment from main every night.
 # Runs on the self-hosted runner using Docker-out-of-Docker (the docker
 # socket is mounted in), so `docker compose build` produces images on
 # the host daemon and `docker compose up` consumes them directly — no
 # registry hop.
 #
 # Operational assumptions (see docs/DEPLOYMENT.md §3 for the full setup):
 #
 #   1. Single-tenant self-hosted runner. The "Write staging env file" step
 #      writes every secret to .env.staging on the runner filesystem; the
 #      `if: always()` cleanup step removes it. A multi-tenant runner
 #      would need to switch to docker compose --env-file <(stdin) instead.
 #
 #   2. Host docker layer cache is authoritative. There is no
 #      actions/cache; we rely on the host daemon to keep Maven and npm
 #      layers warm between runs. A `docker system prune` on the host
 #      will cause the next nightly build to be cold (5–10 min slower).
 #
 # Staging environment isolation:
 #   - project name: archiv-staging
 #   - host ports:   backend 8081, frontend 3001
 #   - profile:      staging (starts mailpit instead of a real SMTP relay)
 #
 # Required Gitea secrets:
 #   STAGING_POSTGRES_PASSWORD
 #   STAGING_MINIO_PASSWORD
 #   STAGING_MINIO_APP_PASSWORD
 #   STAGING_OCR_TRAINING_TOKEN
 #   STAGING_APP_ADMIN_USERNAME
 #   STAGING_APP_ADMIN_PASSWORD
 on:
   schedule:
     - cron: "0 2 * * *"
   workflow_dispatch:
 env:
   # Ensures the backend Dockerfile's `RUN --mount=type=cache` lines are
   # honoured (Maven cache survives between runs).
   DOCKER_BUILDKIT: "1"
 jobs:
   deploy-staging:
     runs-on: self-hosted
     steps:
       - uses: actions/checkout@v4
       - name: Write staging env file
         run: |
           cat > .env.staging <<EOF
           TAG=nightly
           PORT_BACKEND=8081
           PORT_FRONTEND=3001
           APP_DOMAIN=staging.raddatz.cloud
           POSTGRES_PASSWORD=${{ secrets.STAGING_POSTGRES_PASSWORD }}
           MINIO_PASSWORD=${{ secrets.STAGING_MINIO_PASSWORD }}
           MINIO_APP_PASSWORD=${{ secrets.STAGING_MINIO_APP_PASSWORD }}
           OCR_TRAINING_TOKEN=${{ secrets.STAGING_OCR_TRAINING_TOKEN }}
           APP_ADMIN_USERNAME=${{ secrets.STAGING_APP_ADMIN_USERNAME }}
           APP_ADMIN_PASSWORD=${{ secrets.STAGING_APP_ADMIN_PASSWORD }}
           MAIL_HOST=mailpit
           MAIL_PORT=1025
           MAIL_USERNAME=
           MAIL_PASSWORD=
           MAIL_SMTP_AUTH=false
           MAIL_STARTTLS_ENABLE=false
           APP_MAIL_FROM=noreply@staging.raddatz.cloud
           EOF
       - name: Build images
         # `--pull` forces re-fetching pinned base images so that a tag
         # re-published with CVE fixes (e.g. node:20.19.0-alpine3.21,
         # postgres:16-alpine) is picked up instead of being served
         # from the host's stale Docker layer cache.
         run: |
           docker compose \
             -f docker-compose.prod.yml \
             -p archiv-staging \
             --env-file .env.staging \
             --profile staging \
             build --pull
       - name: Deploy staging
         run: |
           docker compose \
             -f docker-compose.prod.yml \
             -p archiv-staging \
             --env-file .env.staging \
             --profile staging \
             up -d --wait --remove-orphans
       - name: Smoke test deployed environment
         # Healthchecks confirm containers are healthy; they do NOT confirm the
         # public surface works. This step catches: Caddy not reloaded, HSTS
         # header dropped or weakened, Permissions-Policy header loosened,
         # /actuator block bypassed.
         #
         # --resolve pins staging.raddatz.cloud to the runner's loopback so we
         # do NOT depend on the host router doing hairpin NAT (many SOHO
         # routers do not, or do so only after a firmware update). SNI still
         # uses the public hostname so the cert validates correctly.
         run: |
           set -e
           HOST="staging.raddatz.cloud"
           URL="https://$HOST"
           RESOLVE="--resolve $HOST:443:127.0.0.1"
           echo "Smoke test: $URL (pinned to 127.0.0.1)"
           curl -fsS $RESOLVE --max-time 10 "$URL/login" -o /dev/null
           # Pin the preload-list-eligible HSTS value, not just header presence:
           # a degraded `max-age=1` or a dropped `includeSubDomains; preload` must
           # fail this check rather than pass it silently.
           curl -fsS $RESOLVE --max-time 10 -I "$URL/" \
             | grep -Eqi 'strict-transport-security:[[:space:]]*max-age=31536000.*includeSubDomains.*preload'
           # Permissions-Policy denies APIs the app does not use (camera,
           # microphone, geolocation). A regression that loosens or drops the
           # header now fails the smoke step.
           curl -fsS $RESOLVE --max-time 10 -I "$URL/" \
             | grep -Eqi 'permissions-policy:[[:space:]]*camera=\(\),[[:space:]]*microphone=\(\),[[:space:]]*geolocation=\(\)'
           status=$(curl -s $RESOLVE -o /dev/null -w "%{http_code}" --max-time 10 "$URL/actuator/health")
           [ "$status" = "404" ] || { echo "expected 404 from /actuator/health, got $status"; exit 1; }
           echo "All smoke checks passed"
       - name: Cleanup env file
         # LOAD-BEARING: `if: always()` is the linchpin of the ADR-011
         # single-tenant runner trust model. Every secret in .env.staging
         # is plain text on the runner filesystem until this step runs.
         # If a future refactor drops `if: always()`, a failed deploy
         # leaves the env-file behind. Do not remove this conditional
         # without first re-evaluating ADR-011.
         if: always()
         run: rm -f .env.staging
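
The HSTS check in the smoke test step can be exercised offline against sample header lines. The following is a minimal sketch and not part of the workflow: it reuses the `grep -Eqi` pattern from the step verbatim, while the `hsts_ok` helper and the sample header values are hypothetical and exist only for illustration.

```shell
# Pattern copied from the "Smoke test deployed environment" step.
HSTS_PATTERN='strict-transport-security:[[:space:]]*max-age=31536000.*includeSubDomains.*preload'

# Hypothetical helper: exits 0 only if the given header line matches the pattern.
hsts_ok() {
  printf '%s\n' "$1" | grep -Eqi "$HSTS_PATTERN"
}

# Full preload-eligible value: accepted.
hsts_ok 'Strict-Transport-Security: max-age=31536000; includeSubDomains; preload' \
  && echo "full value: pass"

# Degraded max-age: rejected even though the other directives are present.
hsts_ok 'strict-transport-security: max-age=1; includeSubDomains; preload' \
  || echo "max-age=1: fail"

# Directives dropped: rejected even though max-age is correct.
hsts_ok 'Strict-Transport-Security: max-age=31536000' \
  || echo "directives dropped: fail"
```

The `-i` flag makes the match case-insensitive, so the check does not depend on how the server capitalises the header name. The pattern does require the directives in the order `max-age`, `includeSubDomains`, `preload`; a server that emits them in a different order would fail the check.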
