DevOps: Renovate runner + nightly npm audit early-warning (#818) (#821)

## Summary

Closes #818. Sets up the prevention layer so newly-published advisories are caught on a branch we own, not on a contributor's PR.

**What changed:**

- `renovate.json`: migrated 2 deprecated keys (`matchPackagePatterns` → `matchPackageNames`, `matchPaths` → `matchFileNames`); added `osvVulnerabilityAlerts`, `dependencyDashboard`, `vulnerabilityAlerts` (labels: security + P1-high), weekly routine `schedule`, and `lockFileMaintenance` (no automerge)
- `.gitea/workflows/renovate.yml`: **new** daily cron runner (`0 3 * * *`), pinned to `renovatebot/github-action@8217b3fc` (v46.1.15) with `renovate-version: "46.1.15"`, `RENOVATE_TOKEN` secret, Gitea platform/endpoint env vars
- `.gitea/workflows/nightly.yml`: added `npm-audit` job (parallel to `deploy-staging`, independent signal): shell self-test, `set +e` audit capture, jq-built deduped issue open/update, `NIGHTLY_AUDIT_TOKEN` via step env only, heartbeat on clean path
- `docs/adr/041-renovate-runner-setup.md`: **new** negative-space ADR (no auto GITEA_TOKEN, two-token rationale, OSV-vs-platform, digest-pin threat model, schedule-batches-routine-only, l2-containers omission)
- `docs/infrastructure/ci-gitea.md`: two-token model, PAT rotation cadence, OSV-vs-platform, nightly/PR-gate divergence table, runbook for nightly-opened issues
- `docs/infrastructure/self-hosted-catalogue.md`: fixed Renovate snippet (daily cron, digest pin, `RENOVATE_TOKEN`, fixed version, no root `automerge: true`)

**No `l2-containers.puml` entry:** Renovate is a scheduled CI job, not a long-lived container. Stated here as a decision, not an oversight (ADR-041).

## Manual steps required before the runner is live (not automated)

1. Create a dedicated bot account (e.g. `renovate-bot`) on the Gitea instance
2. Mint `RENOVATE_TOKEN` PAT (scopes: `contents` + `pull_request` + `issues`) → add as Gitea secret
3. Mint `NIGHTLY_AUDIT_TOKEN` PAT (scope: `issues` only) → add as Gitea secret
4. Configure `main` branch protection to forbid the bot pushing directly

## Acceptance criteria status

- [x] `renovate.json` deprecated keys migrated; vuln surfacing config enabled
- [x] `.gitea/workflows/renovate.yml` exists (digest-pinned, daily cron, fixed version)
- [x] `self-hosted-catalogue.md` snippet corrected (4 items)
- [x] `nightly.yml` npm-audit job: survives non-zero exit, deduped tracking issue, jq payload, NIGHTLY_AUDIT_TOKEN via env only, heartbeat on clean
- [x] ADR-041 records all negative-space decisions
- [x] `ci-gitea.md` documents two-token model + runbook
- [ ] Phase 0 manual gates: bot account creation, Renovate onboarding PR evidence, Dependency Dashboard screenshot (**requires manual provisioning**)
- [ ] Dedupe AC verified via `workflow_dispatch` (**requires NIGHTLY_AUDIT_TOKEN secret to be provisioned first**)
- [ ] `$GITHUB_STEP_SUMMARY` availability on this runner (**verify in first live run**)

Co-authored-by: Marcel <marcel@familienarchiv>
Reviewed-on: #821
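The "survives non-zero exit" criterion depends on capturing the audit's exit status instead of letting errexit abort the step. A minimal sketch of that capture pattern, assuming a hypothetical `fake_audit` function standing in for `npm audit` (it is not part of the workflow):

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for `npm audit`: emits a report, then exits non-zero,
# the way a run with findings would.
fake_audit() {
  echo '{"vulnerabilities": {}}'
  return 1
}

# Suspend errexit only around the command whose status we want to inspect,
# then restore it so later failures still abort the script.
set +e
fake_audit > /dev/null
AUDIT_EXIT=$?
set -e

if [ "$AUDIT_EXIT" -ne 0 ]; then
  echo "findings (exit ${AUDIT_EXIT}): file or update the tracking issue, then propagate the status"
else
  echo "clean: emit the heartbeat"
fi
```

Keeping the `set +e` / `set -e` window to a single command means a failure anywhere else in the step is still fatal.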
2026-06-13 12:13:35 +02:00
parent bde1237358
commit 83ca2eb34d
6 changed files with 440 additions and 14 deletions
--- a/.gitea/workflows/nightly.yml
+++ b/.gitea/workflows/nightly.yml
@@ -161,3 +161,147 @@ jobs:
        # without first re-evaluating ADR-011.
        if: always()
        run: rm -f .env.staging
+
+  npm-audit:
+    # Independent parallel job — a deploy failure cannot mask the audit signal
+    # and a clean audit cannot hide a broken deploy. Intentionally no `needs:`.
+    #
+    # Scans dev deps too (no --omit=dev), which is deliberately broader than the
+    # PR gate (ci.yml §Security audit) that uses --omit=dev. A nightly broader
+    # result is NOT a PR gate failure — it catches dev-tooling advisories (esbuild,
+    # Vite, etc.) early. See docs/infrastructure/ci-gitea.md §Nightly audit vs PR gate.
+    #
+    # Required Gitea secrets:
+    #   NIGHTLY_AUDIT_TOKEN  — PAT with issues scope only. An issues-only token
+    #                          means a leak via logs/process-args cannot push
+    #                          branches, open PRs, or read repo contents (ADR-041).
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Assert jq is available
+        run: command -v jq || { sudo apt-get update && sudo apt-get install -y jq; }
+
+      - name: Run npm audit and file tracking issue on findings
+        # Never run under set -x — NIGHTLY_AUDIT_TOKEN in env would leak to logs.
+        env:
+          NIGHTLY_AUDIT_TOKEN: ${{ secrets.NIGHTLY_AUDIT_TOKEN }}
+        run: |
+          MARKER="Nightly npm audit: high-severity advisory"
+          GITEA_URL="${{ github.server_url }}"
+          REPO="${{ github.repository }}"
+          RUN_URL="${GITEA_URL}/${REPO}/actions/runs/${{ github.run_id }}"
+
+          # --- Self-test (mirrors ci.yml §Assert pattern) ---
+          # Tests the exact jq test() call used in the dedupe step, before any
+          # API call, so a broken matcher fails loudly early rather than silently
+          # opening duplicate issues. Proves the regex only — create-vs-update
+          # decision is exercised by the workflow_dispatch AC.
+          echo "{\"title\": \"${MARKER}\"}" \
+            | jq -e --arg m "$MARKER" '.title | test($m; "i")' > /dev/null \
+            || { echo "FAIL: self-test — jq test() missed tracking issue title"; exit 1; }
+          echo '{"title": "fix(deps): update dependency esbuild (CVE-2025-12345)"}' \
+            | jq -e --arg m "$MARKER" '.title | test($m; "i") | not' > /dev/null \
+            || { echo "FAIL: self-test — jq test() incorrectly matched unrelated title"; exit 1; }
+          echo "Self-test passed."
+
+          # --- Run audit ---
+          # No npm ci: audit reads the lockfile and queries the registry's advisory API (needs network, but no install).
+          set +e
+          (cd frontend && npm audit --audit-level=high --json > /tmp/audit.json)
+          AUDIT_EXIT=$?
+          set -e
+
+          if [ "$AUDIT_EXIT" -ne 0 ]; then
+            # --- Build issue body with jq (never string-concat advisory text) ---
+            # Advisory overview/title text is registry-controlled; string-concat
+            # would be an injection/escaping vector into the API body. Truncate
+            # raw excerpt to 500 chars so a pathological overview can't produce
+            # a multi-MB PATCH body.
+            ISSUE_BODY=$(jq -r \
+              --arg run_url "$RUN_URL" \
+              '
+              (.vulnerabilities // {}) as $vulns |
+              ($vulns | to_entries |
+                map(select(.value.severity == "high" or .value.severity == "critical")) |
+                map("- **" + .key + "** (" + .value.severity + ")") |
+                if length > 0 then join("\n") else "_See raw output for details._" end) as $pkg_list |
+              "## npm audit: high/critical advisories\n\n" + $pkg_list +
+              "\n\n**Run:** " + $run_url +
+              "\n\n<details><summary>Raw audit excerpt (first 500 chars)</summary>\n\n```\n" +
+              (tostring | .[0:500]) +
+              "\n```\n\n</details>"
+              ' /tmp/audit.json)
+
+            # --- Dedupe: fetch open security issues, match by title marker ---
+            # Renovate vuln PRs also carry the "security" label, so >1 open
+            # "security" issue WILL occur. Title-match (not just label) ensures
+            # we deduplicate only our own tracking issue.
+            OPEN_ISSUES=$(curl -sf \
+              -H "Authorization: token $NIGHTLY_AUDIT_TOKEN" \
+              "${GITEA_URL}/api/v1/repos/${REPO}/issues?state=open&type=issues&labels=security&limit=50")
+
+            MATCHED=$(echo "$OPEN_ISSUES" | jq \
+              --arg m "$MARKER" \
+              '[.[] | select(.title | test($m; "i"))] | sort_by(.created_at)')
+            MATCH_COUNT=$(echo "$MATCHED" | jq 'length')
+
+            if [ "$MATCH_COUNT" -gt 0 ]; then
+              # Patch the oldest matched issue (append run URL to body).
+              ISSUE_NUMBER=$(echo "$MATCHED" | jq -r '.[0].number')
+              EXISTING_BODY=$(echo "$MATCHED" | jq -r '.[0].body // ""')
+              NEW_BODY=$(jq -rn \
+                --arg existing "$EXISTING_BODY" \
+                --arg run_url "$RUN_URL" \
+                '$existing + "\n\n---\n\nUpdated by run: " + $run_url')
+              PAYLOAD=$(jq -n --arg body "$NEW_BODY" '{"body": $body}')
+              curl -sf -X PATCH \
+                -H "Authorization: token $NIGHTLY_AUDIT_TOKEN" \
+                -H "Content-Type: application/json" \
+                -d "$PAYLOAD" \
+                "${GITEA_URL}/api/v1/repos/${REPO}/issues/${ISSUE_NUMBER}" > /dev/null
+              echo "Updated tracking issue #${ISSUE_NUMBER}"
+            else
+              # Closed prior issue that recurs → new issue (not reopened).
+              # A re-opened issue would obscure when the advisory was re-discovered.
+              PAYLOAD=$(jq -n \
+                --arg title "$MARKER" \
+                --arg body "$ISSUE_BODY" \
+                '{"title": $title, "body": $body}')
+              CREATED=$(curl -sf -X POST \
+                -H "Authorization: token $NIGHTLY_AUDIT_TOKEN" \
+                -H "Content-Type: application/json" \
+                -d "$PAYLOAD" \
+                "${GITEA_URL}/api/v1/repos/${REPO}/issues")
+              NEW_NUMBER=$(echo "$CREATED" | jq -r '.number')
+              echo "Opened new tracking issue #${NEW_NUMBER}"
+
+              # Labels are ignored on issue create in Gitea — add in a follow-up call.
+              LABEL_IDS=$(curl -sf \
+                -H "Authorization: token $NIGHTLY_AUDIT_TOKEN" \
+                "${GITEA_URL}/api/v1/repos/${REPO}/labels?limit=50" \
+                | jq '[.[] | select(.name == "security" or .name == "devops" or .name == "P1-high") | .id]')
+              curl -sf -X POST \
+                -H "Authorization: token $NIGHTLY_AUDIT_TOKEN" \
+                -H "Content-Type: application/json" \
+                -d "{\"labels\": $LABEL_IDS}" \
+                "${GITEA_URL}/api/v1/repos/${REPO}/issues/${NEW_NUMBER}/labels" > /dev/null
+            fi
+
+            exit "$AUDIT_EXIT"
+
+          else
+            # --- Heartbeat: proves the job ran and found nothing ---
+            # "No issue created" is only meaningful evidence when paired with a
+            # visible positive signal. Without this, a never-ran job is
+            # indistinguishable from a clean run.
+            #
+            # $GITHUB_STEP_SUMMARY availability is unproven on this runner
+            # (act_runner populates it, but this is the first run to verify it).
+            # Guard before use so an unset variable does not fail the clean-path.
+            MSG="✅ npm audit clean $(date -u)"
+            if [ -n "${GITHUB_STEP_SUMMARY:-}" ]; then
+              echo "$MSG" >> "$GITHUB_STEP_SUMMARY"
+            fi
+            echo "$MSG"
+          fi
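
The `${GITHUB_STEP_SUMMARY:-}` guard on the clean path can be exercised locally before the first live run. A minimal sketch, assuming a hypothetical `heartbeat` helper and a temp file standing in for the runner's summary file (neither is part of the workflow):

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: always log the heartbeat, and append it to the step
# summary only when the runner provides GITHUB_STEP_SUMMARY.
heartbeat() {
  msg="npm audit clean $(date -u)"
  if [ -n "${GITHUB_STEP_SUMMARY:-}" ]; then
    echo "$msg" >> "$GITHUB_STEP_SUMMARY"
  fi
  echo "$msg"
}

# Case 1: variable unset. The ':-' default keeps `set -u` from aborting.
unset GITHUB_STEP_SUMMARY
heartbeat

# Case 2: variable points at a writable file (a temp file stands in here).
SUMMARY_FILE="$(mktemp)"
GITHUB_STEP_SUMMARY="$SUMMARY_FILE" heartbeat
SUMMARY_LINES="$(wc -l < "$SUMMARY_FILE" | tr -d ' ')"
rm -f "$SUMMARY_FILE"
```

Both cases print the heartbeat to the log, so a clean run leaves a visible signal even on a runner that never sets the summary variable.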