devops(ci): add SAST/SCA/secret-scan/container-scan gates to .gitea/workflows/ci.yml #461

Open
opened 2026-05-07 17:22:49 +02:00 by marcel · 8 comments
Owner

Context

.gitea/workflows/ci.yml currently runs only lint + Vitest + JUnit + a sliver of pytest. No SAST, no SCA, no container scanning, no secret scanning, no SBOM. The pre-prod audit ran every one of those tools manually and found:

  • 19 HIGH/CRITICAL CVEs in backend/pom.xml (would have been caught by trivy / OWASP Dependency-Check).
  • 14 CRIT + 83 HIGH in the OCR image (would have been caught by trivy image).
  • 9 secrets in working tree + 2 in git history (would have been caught by gitleaks).
  • 1 priority-1 SpotBugs bug in AppUser.computeColor (would have been caught by SpotBugs).
  • 2 ERROR-severity OpenAPI issues (would have been caught by Spectral).

Without these gates, regressions reach production.

Approach

Add four new jobs to .gitea/workflows/ci.yml. Run them in parallel with the existing test jobs; gate merges on their success.

Job: frontend-sca

frontend-sca:
  name: Frontend SCA
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
      working-directory: frontend
    - run: npm audit --audit-level=high --omit=dev
      working-directory: frontend

Job: backend-sast-sca

backend-sast-sca:
  name: Backend SAST + SCA
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-java@v4
      with: { java-version: '21', distribution: temurin }
    - name: Trivy fs (Maven CVEs)
      run: |
        curl -fsSL https://github.com/aquasecurity/trivy/releases/download/v0.70.0/trivy_0.70.0_Linux-64bit.tar.gz | tar xz trivy
        ./trivy fs --scanners vuln --severity HIGH,CRITICAL --exit-code 1 backend/pom.xml
    - name: SpotBugs + FindSecBugs
      working-directory: backend
      run: |
        ./mvnw com.github.spotbugs:spotbugs-maven-plugin:check \
          -Dspotbugs.threshold=Medium \
          -Dspotbugs.plugins=com.h3xstream.findsecbugs:findsecbugs-plugin:1.13.0 \
          -Dspotbugs.failOnError=true

Pre-requisite: an <excludeFilterFile>spotbugs-exclude.xml</excludeFilterFile> to suppress the 288 Lombok EI_EXPOSE_REP/REP2 false positives the audit hit.

Job: secret-scan

secret-scan:
  name: Secret scan (gitleaks)
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with: { fetch-depth: 0 }
    - run: |
        curl -fsSL https://github.com/gitleaks/gitleaks/releases/download/v8.21.2/gitleaks_8.21.2_linux_x64.tar.gz | tar xz gitleaks
        ./gitleaks detect --source . --no-banner

Job: container-scan

Runs after build-and-push (per #142). Scans the three images we publish:

container-scan:
  name: Container scan
  needs: [build-and-push]
  runs-on: ubuntu-latest
  strategy:
    matrix:
      image: [familienarchiv-backend, familienarchiv-frontend, familienarchiv-ocr-service]
  steps:
    - run: |
        curl -fsSL https://github.com/aquasecurity/trivy/releases/download/v0.70.0/trivy_0.70.0_Linux-64bit.tar.gz | tar xz trivy
        ./trivy image --severity HIGH,CRITICAL --exit-code 1 \
          ${{ env.REGISTRY }}/${{ matrix.image }}:${{ github.sha }}

Job: openapi-lint

openapi-lint:
  name: OpenAPI Spectral
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with: { node-version: '24' }
    # boot the backend in dev profile, capture spec, lint
    - run: docker compose up -d backend db minio
    - run: |
        until curl -fs http://localhost:8080/v3/api-docs > openapi.json; do sleep 2; done
        npx --yes -p @stoplight/spectral-cli@6.13.1 spectral lint \
          openapi.json --ruleset .spectral.yaml --fail-severity error

Critical files

  • .gitea/workflows/ci.yml — add 4-5 new jobs
  • backend/spotbugs-exclude.xmlnew, suppress Lombok EI_EXPOSE_REP false positives
  • .spectral.yamlnew, extends spectral:oas with project-specific rules
  • .gitleaks.tomloptional, custom rules / allowlist for known false positives

Verification

  1. Push a branch that intentionally introduces a known-vulnerable dep (e.g., lodash@4.17.20). The frontend-sca or backend-sast-sca job fails.
  2. Push a branch that adds a fake secret (e.g., password = "hunter2"). secret-scan fails.
  3. Push a branch that breaks the OpenAPI spec (e.g., remove a @PathVariable name). openapi-lint fails.
  4. Push a clean branch — all gates pass green.

Acceptance criteria

  • CI fails on any HIGH/CRITICAL CVE in either Maven or npm production deps.
  • CI fails on any new committed secret.
  • CI fails on any HIGH/CRITICAL container CVE in published images (after #134/#135 image pipeline lands).
  • CI fails on any ERROR-severity OpenAPI lint finding.
  • SpotBugs runs with FindSecBugs and fails on priority-1 findings.
  • spotbugs-exclude.xml suppresses Lombok EI_EXPOSE_REP/REP2 false positives.

Effort

M — 3-4 days including baselining the Spotbugs exclusion file (suppressing 288 Lombok false positives) and testing each gate against intentional regressions.

Risk if not addressed

CVEs accumulate silently. Secrets get committed undetected. Container vulnerabilities reach production.

Tracked in audit doc as F-07 (Critical, escalated).

## Context `.gitea/workflows/ci.yml` currently runs only lint + Vitest + JUnit + a sliver of pytest. No SAST, no SCA, no container scanning, no secret scanning, no SBOM. The pre-prod audit ran every one of those tools manually and found: - **19 HIGH/CRITICAL CVEs** in `backend/pom.xml` (would have been caught by trivy / OWASP Dependency-Check). - **14 CRIT + 83 HIGH** in the OCR image (would have been caught by `trivy image`). - **9 secrets** in working tree + 2 in git history (would have been caught by gitleaks). - **1 priority-1 SpotBugs bug** in `AppUser.computeColor` (would have been caught by SpotBugs). - **2 ERROR-severity OpenAPI issues** (would have been caught by Spectral). Without these gates, regressions reach production. ## Approach Add four new jobs to `.gitea/workflows/ci.yml`. Run them in parallel with the existing test jobs; gate merges on their success. ### Job: `frontend-sca` ```yaml frontend-sca: name: Frontend SCA runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - run: npm ci working-directory: frontend - run: npm audit --audit-level=high --omit=dev working-directory: frontend ``` ### Job: `backend-sast-sca` ```yaml backend-sast-sca: name: Backend SAST + SCA runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-java@v4 with: { java-version: '21', distribution: temurin } - name: Trivy fs (Maven CVEs) run: | curl -fsSL https://github.com/aquasecurity/trivy/releases/download/v0.70.0/trivy_0.70.0_Linux-64bit.tar.gz | tar xz trivy ./trivy fs --scanners vuln --severity HIGH,CRITICAL --exit-code 1 backend/pom.xml - name: SpotBugs + FindSecBugs working-directory: backend run: | ./mvnw com.github.spotbugs:spotbugs-maven-plugin:check \ -Dspotbugs.threshold=Medium \ -Dspotbugs.plugins=com.h3xstream.findsecbugs:findsecbugs-plugin:1.13.0 \ -Dspotbugs.failOnError=true ``` Pre-requisite: an `<excludeFilterFile>spotbugs-exclude.xml</excludeFilterFile>` to suppress the 288 Lombok `EI_EXPOSE_REP/REP2` false positives the audit hit. ### Job: `secret-scan` ```yaml secret-scan: name: Secret scan (gitleaks) runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 with: { fetch-depth: 0 } - run: | curl -fsSL https://github.com/gitleaks/gitleaks/releases/download/v8.21.2/gitleaks_8.21.2_linux_x64.tar.gz | tar xz gitleaks ./gitleaks detect --source . --no-banner ``` ### Job: `container-scan` Runs after build-and-push (per #142). Scans the three images we publish: ```yaml container-scan: name: Container scan needs: [build-and-push] runs-on: ubuntu-latest strategy: matrix: image: [familienarchiv-backend, familienarchiv-frontend, familienarchiv-ocr-service] steps: - run: | curl -fsSL https://github.com/aquasecurity/trivy/releases/download/v0.70.0/trivy_0.70.0_Linux-64bit.tar.gz | tar xz trivy ./trivy image --severity HIGH,CRITICAL --exit-code 1 \ ${{ env.REGISTRY }}/${{ matrix.image }}:${{ github.sha }} ``` ### Job: `openapi-lint` ```yaml openapi-lint: name: OpenAPI Spectral runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-node@v4 with: { node-version: '24' } # boot the backend in dev profile, capture spec, lint - run: docker compose up -d backend db minio - run: | until curl -fs http://localhost:8080/v3/api-docs > openapi.json; do sleep 2; done npx --yes -p @stoplight/spectral-cli@6.13.1 spectral lint \ openapi.json --ruleset .spectral.yaml --fail-severity error ``` ## Critical files - `.gitea/workflows/ci.yml` — add 4-5 new jobs - `backend/spotbugs-exclude.xml` — **new**, suppress Lombok EI_EXPOSE_REP false positives - `.spectral.yaml` — **new**, extends `spectral:oas` with project-specific rules - `.gitleaks.toml` — **optional**, custom rules / allowlist for known false positives ## Verification 1. Push a branch that intentionally introduces a known-vulnerable dep (e.g., `lodash@4.17.20`). The `frontend-sca` or `backend-sast-sca` job fails. 2. Push a branch that adds a fake secret (e.g., `password = "hunter2"`). `secret-scan` fails. 3. Push a branch that breaks the OpenAPI spec (e.g., remove a `@PathVariable` name). `openapi-lint` fails. 4. Push a clean branch — all gates pass green. ## Acceptance criteria - [ ] CI fails on any HIGH/CRITICAL CVE in either Maven or npm production deps. - [ ] CI fails on any new committed secret. - [ ] CI fails on any HIGH/CRITICAL container CVE in published images (after #134/#135 image pipeline lands). - [ ] CI fails on any ERROR-severity OpenAPI lint finding. - [ ] SpotBugs runs with FindSecBugs and fails on priority-1 findings. - [ ] `spotbugs-exclude.xml` suppresses Lombok `EI_EXPOSE_REP/REP2` false positives. ## Effort M — 3-4 days including baselining the Spotbugs exclusion file (suppressing 288 Lombok false positives) and testing each gate against intentional regressions. ## Risk if not addressed CVEs accumulate silently. Secrets get committed undetected. Container vulnerabilities reach production. Tracked in audit doc as **F-07** (Critical, escalated).
marcel added this to the Production v1 milestone 2026-05-07 17:22:49 +02:00
marcel added the P0-criticaldevopssecurity labels 2026-05-07 17:23:34 +02:00
Author
Owner

🏗️ Markus Keller — Application Architect

Observations

  • The four new jobs (frontend-sca, backend-sast-sca, secret-scan, container-scan) are structurally sound and follow the correct parallelism pattern — they run alongside existing test jobs, which is the right call for wall-clock time.
  • The openapi-lint job has a fundamental design problem: it boots the full Docker Compose stack (backend db minio) inside the CI runner just to capture an OpenAPI spec. This makes openapi-lint the slowest, most fragile job in the pipeline and creates a hidden dependency on a running Postgres and MinIO instance that the other jobs don't need.
  • The container-scan job has a blocking prerequisite problem: it depends on build-and-push (issue #142), which does not exist yet in ci.yml. As a result this job cannot be merged as a functioning gate — it silently becomes a no-op until #142 lands. The issue correctly notes this as a prerequisite but the dependency should be called out more explicitly in the acceptance criteria.
  • The existing CI architecture commits to ubuntu-latest runners for all new jobs while existing jobs mix ubuntu-latest with the Playwright container image. This is consistent and correct.
  • backend-sast-sca inlines the Trivy binary download on every run (curl ... | tar xz trivy) without caching. On the project's self-hosted NAS runner, this is a network hit on every push. The same pattern appears for gitleaks.
  • spotbugs-exclude.xml is listed as a new file that suppresses 288 Lombok EI_EXPOSE_REP/REP2 false positives. This is architecturally correct — the Lombok @Data pattern on entities (per CLAUDE.md convention) will produce these warnings by design. The exclusion file is not noise suppression, it is an intentional design encoding that Lombok-annotated entities expose mutable collections by reference.
  • The AppUser.computeColor SpotBugs P1 finding deserves a look: Math.abs(id.hashCode()) % PALETTE.length has a known Java bug — Math.abs(Integer.MIN_VALUE) returns Integer.MIN_VALUE (negative), making % return a negative index. This causes an ArrayIndexOutOfBoundsException for roughly 1 in 2^31 UUIDs. SpotBugs will catch this correctly. The fix is Math.floorMod(id.hashCode(), PALETTE.length).
  • No ADR is proposed for the security toolchain selection. For a production-critical gate (blocking merges), an ADR should document why Trivy over OWASP Dependency-Check, gitleaks over truffleHog, etc.

Recommendations

  • Extract OpenAPI spec differently: run ./mvnw spring-boot:run in the backend-sast-sca job using an H2 profile (or a dedicated test profile that skips Postgres), wait for /v3/api-docs, dump the spec, then run Spectral. This avoids the docker compose up in CI entirely. Alternatively, configure SpringDoc to generate the spec at build time via springdoc-openapi-maven-plugin and lint the static output.
  • Cache the Trivy and gitleaks binaries: use actions/cache@v4 keyed on the binary version. On the self-hosted runner, the download cost is real. Cache both under ~/.tools/trivy-VERSION and ~/.tools/gitleaks-VERSION.
  • Block container-scan explicitly on the build pipeline issue: add a comment in the job YAML noting needs: [build-and-push] # requires #142 and add an acceptance criterion: "CI fails if #142 is not merged first." This prevents the job landing in a no-op state.
  • Fix AppUser.computeColor before this PR lands: the SpotBugs gate will catch it and block every future merge. Proactively fix Math.abs(id.hashCode())Math.floorMod(id.hashCode(), PALETTE.length) in this issue or open a dedicated bug.
  • Write the ADR: Document the toolchain choices in docs/adr/ADR-00X-ci-security-gates.md before merging. The "why Trivy, why gitleaks, why these severity thresholds" decisions have lasting consequences and will be questioned during future upgrades.
  • Update docs/architecture/c4/l2-containers.puml: per the doc-update rule, adding new CI infrastructure tools is a C4 L2 change. The CI pipeline is part of the delivery system.

Open Decisions

  • Should openapi-lint block only on ERROR-severity Spectral findings (as proposed), or also on WARN? The issue says --fail-severity error which is correct for a gate — but the .spectral.yaml ruleset content is not specified and the OAS ruleset has many warn-level rules that may hide real issues. Decide the severity boundary explicitly.
  • The spotbugs-exclude.xml suppresses 288 Lombok warnings globally. Should it suppress them only on entity classes (by package pattern) rather than globally? A global suppression hides the same warning class if it appears in a non-entity service class where it would be a real finding.
## 🏗️ Markus Keller — Application Architect ### Observations - The four new jobs (`frontend-sca`, `backend-sast-sca`, `secret-scan`, `container-scan`) are structurally sound and follow the correct parallelism pattern — they run alongside existing test jobs, which is the right call for wall-clock time. - The `openapi-lint` job has a fundamental design problem: it boots the **full Docker Compose stack** (`backend db minio`) inside the CI runner just to capture an OpenAPI spec. This makes `openapi-lint` the slowest, most fragile job in the pipeline and creates a hidden dependency on a running Postgres and MinIO instance that the other jobs don't need. - The `container-scan` job has a blocking prerequisite problem: it depends on `build-and-push` (issue #142), which **does not exist yet** in `ci.yml`. As a result this job cannot be merged as a functioning gate — it silently becomes a no-op until #142 lands. The issue correctly notes this as a prerequisite but the dependency should be called out more explicitly in the acceptance criteria. - The existing CI architecture commits to `ubuntu-latest` runners for all new jobs while existing jobs mix `ubuntu-latest` with the Playwright container image. This is consistent and correct. - `backend-sast-sca` inlines the Trivy binary download on every run (`curl ... | tar xz trivy`) without caching. On the project's self-hosted NAS runner, this is a network hit on every push. The same pattern appears for gitleaks. - `spotbugs-exclude.xml` is listed as a new file that suppresses 288 Lombok `EI_EXPOSE_REP/REP2` false positives. This is architecturally correct — the Lombok `@Data` pattern on entities (per CLAUDE.md convention) will produce these warnings by design. The exclusion file is not noise suppression, it is an intentional design encoding that Lombok-annotated entities expose mutable collections by reference. - The `AppUser.computeColor` SpotBugs P1 finding deserves a look: `Math.abs(id.hashCode()) % PALETTE.length` has a known Java bug — `Math.abs(Integer.MIN_VALUE)` returns `Integer.MIN_VALUE` (negative), making `%` return a negative index. This causes an `ArrayIndexOutOfBoundsException` for roughly 1 in 2^31 UUIDs. SpotBugs will catch this correctly. The fix is `Math.floorMod(id.hashCode(), PALETTE.length)`. - No ADR is proposed for the security toolchain selection. For a production-critical gate (blocking merges), an ADR should document *why* Trivy over OWASP Dependency-Check, gitleaks over truffleHog, etc. ### Recommendations - **Extract OpenAPI spec differently**: run `./mvnw spring-boot:run` in the `backend-sast-sca` job using an H2 profile (or a dedicated test profile that skips Postgres), wait for `/v3/api-docs`, dump the spec, then run Spectral. This avoids the `docker compose up` in CI entirely. Alternatively, configure SpringDoc to generate the spec at build time via `springdoc-openapi-maven-plugin` and lint the static output. - **Cache the Trivy and gitleaks binaries**: use `actions/cache@v4` keyed on the binary version. On the self-hosted runner, the download cost is real. Cache both under `~/.tools/trivy-VERSION` and `~/.tools/gitleaks-VERSION`. - **Block container-scan explicitly on the build pipeline issue**: add a comment in the job YAML noting `needs: [build-and-push] # requires #142` and add an acceptance criterion: "CI fails if #142 is not merged first." This prevents the job landing in a no-op state. - **Fix `AppUser.computeColor` before this PR lands**: the SpotBugs gate will catch it and block every future merge. Proactively fix `Math.abs(id.hashCode())` → `Math.floorMod(id.hashCode(), PALETTE.length)` in this issue or open a dedicated bug. - **Write the ADR**: Document the toolchain choices in `docs/adr/ADR-00X-ci-security-gates.md` before merging. The "why Trivy, why gitleaks, why these severity thresholds" decisions have lasting consequences and will be questioned during future upgrades. - **Update `docs/architecture/c4/l2-containers.puml`**: per the doc-update rule, adding new CI infrastructure tools is a C4 L2 change. The CI pipeline is part of the delivery system. ### Open Decisions - Should `openapi-lint` block only on `ERROR`-severity Spectral findings (as proposed), or also on `WARN`? The issue says `--fail-severity error` which is correct for a gate — but the `.spectral.yaml` ruleset content is not specified and the OAS ruleset has many `warn`-level rules that may hide real issues. Decide the severity boundary explicitly. - The `spotbugs-exclude.xml` suppresses 288 Lombok warnings globally. Should it suppress them only on entity classes (by package pattern) rather than globally? A global suppression hides the same warning class if it appears in a non-entity service class where it would be a real finding.
Author
Owner

👨‍💻 Felix Brandt — Fullstack Developer

Observations

  • The SpotBugs invocation in backend-sast-sca calls the plugin directly via ./mvnw com.github.spotbugs:spotbugs-maven-plugin:check without first adding the plugin to pom.xml. This will work, but the plugin version is implicitly resolved at invocation time. For reproducibility the plugin should be declared in <build><plugins> in pom.xml with a pinned version, so the same version is used locally and in CI.
  • AppUser.computeColor is identified as a P1 SpotBugs finding. The actual bug is Math.abs(id.hashCode()) % PALETTE.length where hashCode() can return Integer.MIN_VALUE and Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE (negative). PALETTE[negative] throws ArrayIndexOutOfBoundsException. This should be fixed before the SpotBugs gate is switched on, not as a follow-up — otherwise the gate will block the main branch immediately after merging this issue.
  • The secret-scan job uses fetch-depth: 0 (full history), which is correct — scanning only the last commit would miss secrets committed earlier and then "fixed" by deletion. This is the right default for gitleaks.
  • The frontend-sca step uses npm audit --omit=dev, which skips devDependencies. This is the right call for a production gate — dev tooling CVEs are noise. However, vitest-browser-svelte, playwright, and @inlang/paraglide-js are in devDependencies but their bundled output may not reach production. The --omit=dev flag is correct.
  • The openapi-lint job proposes starting the backend via docker compose up -d backend db minio and polling /v3/api-docs. This is fragile in several ways: (1) the job needs Docker-in-Docker or a Docker socket, which the NAS runner may not provide in a container context; (2) the 95-character until polling loop has no timeout — if the backend fails to start, the job hangs indefinitely.
  • The gitleaks binary is downloaded and extracted to . (current directory), meaning it becomes an untracked file in the working tree. If gitleaks then scans . it will find its own binary. Add the binary to a temp directory outside the repo, or use --source . carefully.
  • The CI inline shell uses curl -fsSL ... | tar xz trivy — this executes untrusted bytes from the internet directly in the runner. For a security-focused job, the binary hash should be verified before execution.

Recommendations

  • Add SpotBugs to pom.xml as a pinned plugin rather than calling it via mvnw plugin:goal. This makes the version explicit, allows local ./mvnw spotbugs:check during development, and ensures CI and local runs are identical:
    <plugin>
      <groupId>com.github.spotbugs</groupId>
      <artifactId>spotbugs-maven-plugin</artifactId>
      <version>4.9.3.0</version>
      <configuration>
        <threshold>Medium</threshold>
        <excludeFilterFile>spotbugs-exclude.xml</excludeFilterFile>
      </configuration>
    </plugin>
    
  • Fix AppUser.computeColor now: change Math.abs(id.hashCode()) % PALETTE.length to Math.floorMod(id.hashCode(), PALETTE.length). Math.floorMod always returns a non-negative result. Ship the fix in this PR or as a prerequisite — do not let the SpotBugs gate go live with a known P1 finding blocking the build.
  • Add a timeout to the openapi-lint polling loop:
    timeout 120 bash -c 'until curl -fs http://localhost:8080/v3/api-docs > openapi.json; do sleep 2; done'
    
    Without timeout, a backend startup failure silently hangs the CI runner for the runner's maximum job duration.
  • Verify binary checksums: download the .tar.gz and the corresponding .tar.gz.sha256sum, verify before executing. This is table stakes for a security-focused step:
    curl -fsSL .../trivy_0.70.0_Linux-64bit.tar.gz -o trivy.tar.gz
    echo "EXPECTED_SHA256  trivy.tar.gz" | sha256sum -c -
    tar xz -f trivy.tar.gz trivy
    
  • Move downloaded binaries to $RUNNER_TEMP or /tmp to prevent gitleaks from scanning its own binary when run with --source .. Extract to $(mktemp -d) and add that path to PATH.

Open Decisions

  • None — the recommendations above have clear winners.
## 👨‍💻 Felix Brandt — Fullstack Developer ### Observations - The SpotBugs invocation in `backend-sast-sca` calls the plugin directly via `./mvnw com.github.spotbugs:spotbugs-maven-plugin:check` without first adding the plugin to `pom.xml`. This will work, but the plugin version is implicitly resolved at invocation time. For reproducibility the plugin should be declared in `<build><plugins>` in `pom.xml` with a pinned version, so the same version is used locally and in CI. - `AppUser.computeColor` is identified as a P1 SpotBugs finding. The actual bug is `Math.abs(id.hashCode()) % PALETTE.length` where `hashCode()` can return `Integer.MIN_VALUE` and `Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE` (negative). `PALETTE[negative]` throws `ArrayIndexOutOfBoundsException`. This should be fixed before the SpotBugs gate is switched on, not as a follow-up — otherwise the gate will block the main branch immediately after merging this issue. - The `secret-scan` job uses `fetch-depth: 0` (full history), which is correct — scanning only the last commit would miss secrets committed earlier and then "fixed" by deletion. This is the right default for gitleaks. - The `frontend-sca` step uses `npm audit --omit=dev`, which skips `devDependencies`. This is the right call for a production gate — dev tooling CVEs are noise. However, `vitest-browser-svelte`, `playwright`, and `@inlang/paraglide-js` are in `devDependencies` but their bundled output may not reach production. The `--omit=dev` flag is correct. - The `openapi-lint` job proposes starting the backend via `docker compose up -d backend db minio` and polling `/v3/api-docs`. This is fragile in several ways: (1) the job needs Docker-in-Docker or a Docker socket, which the NAS runner may not provide in a container context; (2) the 95-character `until` polling loop has no timeout — if the backend fails to start, the job hangs indefinitely. - The gitleaks binary is downloaded and extracted to `.` (current directory), meaning it becomes an untracked file in the working tree. If gitleaks then scans `.` it will find its own binary. Add the binary to a temp directory outside the repo, or use `--source .` carefully. - The CI inline shell uses `curl -fsSL ... | tar xz trivy` — this executes untrusted bytes from the internet directly in the runner. For a security-focused job, the binary hash should be verified before execution. ### Recommendations - **Add SpotBugs to `pom.xml`** as a pinned plugin rather than calling it via `mvnw plugin:goal`. This makes the version explicit, allows local `./mvnw spotbugs:check` during development, and ensures CI and local runs are identical: ```xml <plugin> <groupId>com.github.spotbugs</groupId> <artifactId>spotbugs-maven-plugin</artifactId> <version>4.9.3.0</version> <configuration> <threshold>Medium</threshold> <excludeFilterFile>spotbugs-exclude.xml</excludeFilterFile> </configuration> </plugin> ``` - **Fix `AppUser.computeColor` now**: change `Math.abs(id.hashCode()) % PALETTE.length` to `Math.floorMod(id.hashCode(), PALETTE.length)`. `Math.floorMod` always returns a non-negative result. Ship the fix in this PR or as a prerequisite — do not let the SpotBugs gate go live with a known P1 finding blocking the build. - **Add a timeout to the `openapi-lint` polling loop**: ```bash timeout 120 bash -c 'until curl -fs http://localhost:8080/v3/api-docs > openapi.json; do sleep 2; done' ``` Without `timeout`, a backend startup failure silently hangs the CI runner for the runner's maximum job duration. - **Verify binary checksums**: download the `.tar.gz` and the corresponding `.tar.gz.sha256sum`, verify before executing. This is table stakes for a security-focused step: ```bash curl -fsSL .../trivy_0.70.0_Linux-64bit.tar.gz -o trivy.tar.gz echo "EXPECTED_SHA256 trivy.tar.gz" | sha256sum -c - tar xz -f trivy.tar.gz trivy ``` - **Move downloaded binaries to `$RUNNER_TEMP` or `/tmp`** to prevent gitleaks from scanning its own binary when run with `--source .`. Extract to `$(mktemp -d)` and add that path to `PATH`. ### Open Decisions - None — the recommendations above have clear winners.
Author
Owner

🔧 Tobias Wendt — DevOps & Platform Engineer

Observations

  • The CI workflow is running on a self-hosted NAS runner (evidence: DOCKER_API_VERSION: "1.43" set for Testcontainers in the existing backend-unit-tests job). The proposed jobs all use runs-on: ubuntu-latest without this variable. If the NAS runner's Testcontainers behavior is affected by Docker API version, the new jobs may need the same env var — though none of the new jobs use Testcontainers, so this is low risk.
  • The container-scan job depends on build-and-push (issue #142/#134/#135), which does not yet exist in the workflow. Currently the three services built in docker-compose.yml are all build: context builds with no published image names (familienarchiv-backend, familienarchiv-frontend, familienarchiv-ocr-service are proposed names). There is no image registry configured, no REGISTRY secret defined, and no env.REGISTRY variable in the current workflow. The entire container-scan job is non-functional until the image pipeline lands.
  • The openapi-lint job runs docker compose up -d backend db minio inside the CI runner. The self-hosted runner does not appear to run inside a container (the existing workflow runs containers via container: directive for the Playwright job but the runner itself has Docker access). This approach requires Docker socket access in the runner, which is a security tradeoff: any job that can call docker compose can also escape to the host via volume mounts. For a security-focused issue, this is worth flagging.
  • docker-compose.yml currently has minio/minio:latest (not pinned) and minio/mc (not pinned). This means the openapi-lint job that does docker compose up minio will pull :latest every run — non-reproducible and potentially slow on a cold NAS runner cache. This pre-existing issue becomes more visible once CI boots the stack on every push.
  • The Trivy and gitleaks binaries are downloaded fresh on every run with no caching. On the NAS runner with a local network, GitHub release downloads may be slow or throttled. Both binaries are stable across runs (same version pinned) and should be cached via actions/cache@v4.
  • The renovate.json currently has no configuration for CI workflow tool versions (Trivy 0.70.0, gitleaks 8.21.2). These will drift without automation. The Spectral CLI version (6.13.1 via npx --yes) will also drift if npx resolves to a newer version.
  • No ci: true or --json output flag is specified for Trivy/gitleaks, so failures produce human-readable output in CI logs. This is fine for readability but SARIF or JSON output would enable future GitHub/Gitea code scanning integration.

Recommendations

  • Do not merge container-scan until build-and-push (#142) is ready. Gate the job with if: false or comment it out entirely, with a note: # Activate after #142 (image build pipeline) lands. A job that always skips due to a missing needs target produces confusing CI output and may not actually skip — it may error.
  • Cache Trivy and gitleaks binaries using actions/cache@v4 keyed on the version string. These are 30–80MB binaries; downloading them on every push is wasteful on a home NAS runner:
    - uses: actions/cache@v4
      with:
        path: ~/.tools/trivy
        key: trivy-v0.70.0
    - name: Install Trivy (if not cached)
      run: |
        if [ ! -f ~/.tools/trivy/trivy ]; then
          mkdir -p ~/.tools/trivy
          curl -fsSL https://github.com/aquasecurity/trivy/releases/download/v0.70.0/trivy_0.70.0_Linux-64bit.tar.gz | tar xz -C ~/.tools/trivy trivy
        fi
        echo "$HOME/.tools/trivy" >> $GITHUB_PATH
    
  • Pin docker-compose.yml image tags before the openapi-lint job goes live. minio/minio:latest and minio/mc pull latest on every cold boot. Pin to specific versions (minio/minio:RELEASE.2025-04-22T22-12-26Z or equivalent) and add them to renovate.json.
  • Add Renovate rules for CI tool versions: add regexManagers to renovate.json to track Trivy, gitleaks, and Spectral versions in the workflow YAML. Without automation these versions stall.
  • Consider replacing the openapi-lint Docker Compose approach with a Maven build-time spec export (via springdoc-openapi-maven-plugin) to avoid needing Docker socket access in the lint job entirely. This is simpler, faster, and has no runtime infrastructure dependency.
  • Forward the DOCKER_API_VERSION: "1.43" env var to any job that might use Docker on the NAS runner, as a defensive measure, even if currently unnecessary.

Open Decisions

  • The openapi-lint job requires Docker socket access (to run docker compose up). This is a meaningful security boundary on a self-hosted runner. The alternative (Maven build-time spec) avoids this tradeoff entirely. Which approach fits the operational constraints of the NAS runner?
## 🔧 Tobias Wendt — DevOps & Platform Engineer ### Observations - The CI workflow is running on a **self-hosted NAS runner** (evidence: `DOCKER_API_VERSION: "1.43"` set for Testcontainers in the existing `backend-unit-tests` job). The proposed jobs all use `runs-on: ubuntu-latest` without this variable. If the NAS runner's Testcontainers behavior is affected by Docker API version, the new jobs may need the same env var — though none of the new jobs use Testcontainers, so this is low risk. - The `container-scan` job depends on `build-and-push` (issue #142/#134/#135), which does not yet exist in the workflow. Currently the three services built in `docker-compose.yml` are all `build:` context builds with no published image names (`familienarchiv-backend`, `familienarchiv-frontend`, `familienarchiv-ocr-service` are proposed names). There is no image registry configured, no `REGISTRY` secret defined, and no `env.REGISTRY` variable in the current workflow. The entire `container-scan` job is non-functional until the image pipeline lands. - The `openapi-lint` job runs `docker compose up -d backend db minio` inside the CI runner. The self-hosted runner does not appear to run inside a container (the existing workflow runs containers via `container:` directive for the Playwright job but the runner itself has Docker access). This approach requires Docker socket access in the runner, which is a security tradeoff: any job that can call `docker compose` can also escape to the host via volume mounts. For a security-focused issue, this is worth flagging. - `docker-compose.yml` currently has `minio/minio:latest` (not pinned) and `minio/mc` (not pinned). This means the `openapi-lint` job that does `docker compose up minio` will pull `:latest` every run — non-reproducible and potentially slow on a cold NAS runner cache. This pre-existing issue becomes more visible once CI boots the stack on every push. - The Trivy and gitleaks binaries are downloaded fresh on every run with no caching. On the NAS runner with a local network, GitHub release downloads may be slow or throttled. Both binaries are stable across runs (same version pinned) and should be cached via `actions/cache@v4`. - The `renovate.json` currently has no configuration for CI workflow tool versions (Trivy `0.70.0`, gitleaks `8.21.2`). These will drift without automation. The Spectral CLI version (`6.13.1` via `npx --yes`) will also drift if `npx` resolves to a newer version. - No `ci: true` or `--json` output flag is specified for Trivy/gitleaks, so failures produce human-readable output in CI logs. This is fine for readability but SARIF or JSON output would enable future GitHub/Gitea code scanning integration. ### Recommendations - **Do not merge `container-scan` until `build-and-push` (#142) is ready.** Gate the job with `if: false` or comment it out entirely, with a note: `# Activate after #142 (image build pipeline) lands`. A job that always skips due to a missing `needs` target produces confusing CI output and may not actually skip — it may error. - **Cache Trivy and gitleaks binaries** using `actions/cache@v4` keyed on the version string. These are 30–80MB binaries; downloading them on every push is wasteful on a home NAS runner: ```yaml - uses: actions/cache@v4 with: path: ~/.tools/trivy key: trivy-v0.70.0 - name: Install Trivy (if not cached) run: | if [ ! -f ~/.tools/trivy/trivy ]; then mkdir -p ~/.tools/trivy curl -fsSL https://github.com/aquasecurity/trivy/releases/download/v0.70.0/trivy_0.70.0_Linux-64bit.tar.gz | tar xz -C ~/.tools/trivy trivy fi echo "$HOME/.tools/trivy" >> $GITHUB_PATH ``` - **Pin docker-compose.yml image tags before the `openapi-lint` job goes live.** `minio/minio:latest` and `minio/mc` pull latest on every cold boot. Pin to specific versions (`minio/minio:RELEASE.2025-04-22T22-12-26Z` or equivalent) and add them to `renovate.json`. - **Add Renovate rules for CI tool versions**: add `regexManagers` to `renovate.json` to track Trivy, gitleaks, and Spectral versions in the workflow YAML. Without automation these versions stall. - **Consider replacing the `openapi-lint` Docker Compose approach** with a Maven build-time spec export (via `springdoc-openapi-maven-plugin`) to avoid needing Docker socket access in the lint job entirely. This is simpler, faster, and has no runtime infrastructure dependency. - **Forward the `DOCKER_API_VERSION: "1.43"` env var** to any job that might use Docker on the NAS runner, as a defensive measure, even if currently unnecessary. ### Open Decisions - The `openapi-lint` job requires Docker socket access (to run `docker compose up`). This is a meaningful security boundary on a self-hosted runner. The alternative (Maven build-time spec) avoids this tradeoff entirely. Which approach fits the operational constraints of the NAS runner?
Author
Owner

📋 Elicit — Requirements Engineer

Observations

  • The issue body is well-structured and specific. Findings are traceable (each gate maps to a specific audit finding: F-07 in 2026-05-07-pre-prod-architectural-review.md), and the acceptance criteria are checkboxes with clear pass/fail conditions. This is one of the better-specified DevOps issues in the backlog.
  • One acceptance criterion is ambiguous: "CI fails on any HIGH/CRITICAL container CVE in published images (after #134/#135 image pipeline lands)." The phrase "after ... lands" makes this criterion conditionally in scope — it may not be testable at the time this issue is completed. The criterion should either be removed from this issue and tracked in #134/#135, or rephrased as "CI job container-scan is present in the workflow and will activate once #142 is merged."
  • The verification steps (steps 1–4) are concrete and executable, which is good. However, step 4 ("Push a clean branch — all gates pass green") is not independently verifiable without the gates passing against the actual current codebase. The pre-prod audit found 19 HIGH/CRITICAL Maven CVEs — these need to be remediated (or baselined in a .trivyignore) before step 4 is achievable. This remediation work is not scoped in the effort estimate.
  • The effort estimate is "M — 3-4 days including baselining the SpotBugs exclusion file." The effort to create a .trivyignore baseline for 19 HIGH/CRITICAL Maven CVEs is not included. Trivy will fail on the current pom.xml immediately. Baselining those 19 CVEs (each requiring a decision: suppress-with-justification, upgrade dep, or accept risk) is non-trivial and is itself 1-2 days of work.
  • The .spectral.yaml ruleset is listed as "new" but its content is not specified anywhere in the issue. extends: spectral:oas gives you the built-in OAS ruleset, but the project-specific rules mentioned in the issue title ("2 ERROR-severity OpenAPI issues") are not described. Without knowing what the 2 existing errors are, a reviewer cannot confirm that the Spectral config will catch them.
  • The .gitleaks.toml is marked "optional" but the issue body states 9 secrets in the working tree + 2 in git history were found. If those 9 secrets are known false positives (e.g. test fixtures), the allowlist is mandatory for the gate to pass green, not optional.

Recommendations

  • Split the container-scan acceptance criterion out of this issue and move it to #142 (or whichever image pipeline issue it depends on). This issue should be self-contained and completable without #142.
  • Add an explicit sub-task for CVE baselining: "Create backend/.trivyignore with suppression entries for each of the 19 HIGH/CRITICAL CVEs found in the audit, each annotated with: CVE ID, suppression reason (no fix available / accept risk / pending upgrade), and target review date." This is part of the M estimate and should be visible.
  • Specify the .spectral.yaml rules: document the 2 ERROR-severity OpenAPI findings from the audit report so the ruleset can be verified against them. At minimum, reference which Spectral OAS rule IDs fire on the current spec.
  • Change .gitleaks.toml from optional to required if any of the 9 found secrets are test fixtures or known false positives. The gate cannot go green without the allowlist if gitleaks flags known-safe values. Clarify which of the 9 are real secrets (to be removed from history) vs. false positives (to be allowlisted).
  • Add a prerequisite task: "Verify the 9 secrets found in the audit are either removed from git history (via git filter-repo) or added to .gitleaks.toml allowlist before enabling the gate." This work must precede the gate being activated on main.

Open Decisions

  • For the 19 HIGH/CRITICAL Maven CVEs: should the .trivyignore baseline include a review date (e.g. 90 days) after which suppressed CVEs are automatically re-evaluated? This is a policy decision about acceptable technical debt in the dependency supply chain.
  • Are the 9 secrets found in the audit real credentials that need history rewriting, or test fixtures that can be allowlisted? The answer changes the scope of this issue significantly.
## 📋 Elicit — Requirements Engineer ### Observations - The issue body is well-structured and specific. Findings are traceable (each gate maps to a specific audit finding: F-07 in `2026-05-07-pre-prod-architectural-review.md`), and the acceptance criteria are checkboxes with clear pass/fail conditions. This is one of the better-specified DevOps issues in the backlog. - One acceptance criterion is ambiguous: "CI fails on any HIGH/CRITICAL container CVE in published images (after #134/#135 image pipeline lands)." The phrase "after ... lands" makes this criterion **conditionally in scope** — it may not be testable at the time this issue is completed. The criterion should either be removed from this issue and tracked in #134/#135, or rephrased as "CI job `container-scan` is present in the workflow and will activate once #142 is merged." - The verification steps (steps 1–4) are concrete and executable, which is good. However, step 4 ("Push a clean branch — all gates pass green") is not independently verifiable without the gates passing against the actual current codebase. The pre-prod audit found 19 HIGH/CRITICAL Maven CVEs — these need to be remediated (or baselined in a `.trivyignore`) before step 4 is achievable. This remediation work is not scoped in the effort estimate. - The effort estimate is "M — 3-4 days including baselining the SpotBugs exclusion file." The effort to create a `.trivyignore` baseline for 19 HIGH/CRITICAL Maven CVEs is not included. Trivy will fail on the current `pom.xml` immediately. Baselining those 19 CVEs (each requiring a decision: suppress-with-justification, upgrade dep, or accept risk) is non-trivial and is itself 1-2 days of work. - The `.spectral.yaml` ruleset is listed as "new" but its content is not specified anywhere in the issue. `extends: spectral:oas` gives you the built-in OAS ruleset, but the project-specific rules mentioned in the issue title ("2 ERROR-severity OpenAPI issues") are not described. Without knowing what the 2 existing errors are, a reviewer cannot confirm that the Spectral config will catch them. - The `.gitleaks.toml` is marked "optional" but the issue body states 9 secrets in the working tree + 2 in git history were found. If those 9 secrets are known false positives (e.g. test fixtures), the allowlist is **mandatory** for the gate to pass green, not optional. ### Recommendations - **Split the container-scan acceptance criterion out of this issue** and move it to #142 (or whichever image pipeline issue it depends on). This issue should be self-contained and completable without #142. - **Add an explicit sub-task for CVE baselining**: "Create `backend/.trivyignore` with suppression entries for each of the 19 HIGH/CRITICAL CVEs found in the audit, each annotated with: CVE ID, suppression reason (no fix available / accept risk / pending upgrade), and target review date." This is part of the M estimate and should be visible. - **Specify the `.spectral.yaml` rules**: document the 2 ERROR-severity OpenAPI findings from the audit report so the ruleset can be verified against them. At minimum, reference which Spectral OAS rule IDs fire on the current spec. - **Change `.gitleaks.toml` from optional to required** if any of the 9 found secrets are test fixtures or known false positives. The gate cannot go green without the allowlist if gitleaks flags known-safe values. Clarify which of the 9 are real secrets (to be removed from history) vs. false positives (to be allowlisted). - **Add a prerequisite task**: "Verify the 9 secrets found in the audit are either removed from git history (via `git filter-repo`) or added to `.gitleaks.toml` allowlist before enabling the gate." This work must precede the gate being activated on main. ### Open Decisions - For the 19 HIGH/CRITICAL Maven CVEs: should the `.trivyignore` baseline include a **review date** (e.g. 90 days) after which suppressed CVEs are automatically re-evaluated? This is a policy decision about acceptable technical debt in the dependency supply chain. - Are the 9 secrets found in the audit real credentials that need history rewriting, or test fixtures that can be allowlisted? The answer changes the scope of this issue significantly.
Author
Owner

🔐 Nora "NullX" Steiner — Security Engineer

Observations

  • The issue correctly identifies the audit findings and maps each to the right tooling. The gate selection (Trivy for SCA/container, SpotBugs+FindSecBugs for SAST, gitleaks for secrets, Spectral for API linting) is a solid baseline stack. Good choices.
  • gitleaks binary integrity: the secret-scan job downloads gitleaks directly from GitHub releases and executes it: curl -fsSL ... | tar xz gitleaks && ./gitleaks detect. There is no checksum verification. Executing an unverified binary in a CI runner that has access to repo secrets and environment variables is a supply-chain risk. CWE-494 (Download of Code Without Integrity Check) applies here. The same issue affects the Trivy download in backend-sast-sca.
  • gitleaks scans .: after extracting ./gitleaks to the current directory, gitleaks runs --source . — this includes the gitleaks binary itself. While gitleaks ignores binaries by default, the binary is also extracted alongside the repository working tree, which could interfere with scan results on certain file system states. Extract to a temp directory.
  • The SpotBugs AppUser.computeColor finding is a real bug: Math.abs(id.hashCode()) % PALETTE.length where hashCode() returns Integer.MIN_VALUE causes Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE (JLS §15.15.4), and Integer.MIN_VALUE % 8 is -0 (negative modulo), causing ArrayIndexOutOfBoundsException. This is a correctness bug that SpotBugs INTX_BAD_PERCENT_OPERATOR or similar will flag as Priority 1. It must be fixed before the gate goes live or it blocks every merge.
  • The secret-scan job scans full history (fetch-depth: 0) — this is correct and necessary to catch the 2 secrets found in git history in the audit. However, if those historical secrets are not removed from history (via git filter-repo or BFG) before this gate is enabled, the gate will permanently block main. The issue does not address this remediation.
  • FindSecBugs version 1.13.0 (as specified in the plugin invocation) should be verified against its release notes for coverage of Spring Boot 4 / Java 21 patterns. FindSecBugs has historically had gaps in detecting Spring-specific injection patterns in newer framework versions.
  • No DAST gate: the issue adds SAST (SpotBugs), SCA (Trivy, npm audit), secret scanning (gitleaks), and API linting (Spectral). It does not address dynamic testing (OWASP ZAP, Nuclei). For a project where the pre-prod audit found no CSP and an unprotected admin-default-password, a DAST probe against the running app on PR branches would catch runtime security regressions that static tools miss. This is out of scope for this issue but should be a follow-up.
  • Spectral linting an OpenAPI spec does not catch runtime security misconfigurations — missing @RequirePermission, accidentally permitAll() endpoints, or CORS misconfig. The 2 ERROR findings from the audit are structural API issues, which Spectral catches well. Security-specific OpenAPI checks (e.g., missing securitySchemes references on endpoints) require custom Spectral rules.

Recommendations

  • Verify binary checksums before execution for both Trivy and gitleaks. Download the .sha256sum file alongside the tarball and verify before extracting:
    curl -fsSL https://github.com/gitleaks/gitleaks/releases/download/v8.21.2/gitleaks_8.21.2_linux_x64.tar.gz -o gitleaks.tar.gz
    curl -fsSL https://github.com/gitleaks/gitleaks/releases/download/v8.21.2/checksums.txt | grep linux_x64.tar.gz | sha256sum -c -
    tar xz -C /tmp -f gitleaks.tar.gz gitleaks
    
    This closes the supply-chain gap. Without verification, a compromised GitHub release or MITM attack could execute arbitrary code with access to all CI secrets.
  • Remove the 2 historical secrets before enabling the secret-scan gate on main. Use git filter-repo --path <file> --invert-paths or BFG Repo Cleaner. This is non-negotiable — the gate will block main permanently otherwise. Document the remediation in the commit message.
  • Fix AppUser.computeColor as a prerequisite (not a follow-up). Use Math.floorMod(id.hashCode(), PALETTE.length) which is defined to return a non-negative result for any divisor. This is a one-line fix that unblocks the SpotBugs gate.
  • Add a --exit-code 1 and --no-progress flag to Trivy to suppress verbose output in CI logs and ensure non-zero exit on findings. The issue already has --exit-code 1; add --no-progress to keep logs clean.
  • Write a custom Spectral rule to detect endpoints missing the security field in the OpenAPI spec. This catches cases where @RequirePermission was forgotten and the endpoint is documented as unauthenticated. Add it to .spectral.yaml as a project-specific rule.
  • Track a follow-up issue for DAST: open a separate issue for adding a ZAP or Nuclei scan against the running stack on PR branches (can run nightly or only on PR to main). The static gates in this issue are necessary but not sufficient for the auth-related findings from the audit.

Open Decisions

  • The pre-prod audit found 9 secrets in the working tree. Are these real credentials (requiring immediate rotation + history rewrite) or test fixtures / placeholder values (allowlistable in .gitleaks.toml)? This determines whether the gitleaks gate can go live on the current branch at all.
## 🔐 Nora "NullX" Steiner — Security Engineer ### Observations - The issue correctly identifies the audit findings and maps each to the right tooling. The gate selection (Trivy for SCA/container, SpotBugs+FindSecBugs for SAST, gitleaks for secrets, Spectral for API linting) is a solid baseline stack. Good choices. - **gitleaks binary integrity**: the `secret-scan` job downloads gitleaks directly from GitHub releases and executes it: `curl -fsSL ... | tar xz gitleaks && ./gitleaks detect`. There is no checksum verification. Executing an unverified binary in a CI runner that has access to repo secrets and environment variables is a supply-chain risk. CWE-494 (Download of Code Without Integrity Check) applies here. The same issue affects the Trivy download in `backend-sast-sca`. - **gitleaks scans `.`**: after extracting `./gitleaks` to the current directory, gitleaks runs `--source .` — this includes the gitleaks binary itself. While gitleaks ignores binaries by default, the binary is also extracted alongside the repository working tree, which could interfere with scan results on certain file system states. Extract to a temp directory. - **The SpotBugs `AppUser.computeColor` finding is a real bug**: `Math.abs(id.hashCode()) % PALETTE.length` where `hashCode()` returns `Integer.MIN_VALUE` causes `Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE` (JLS §15.15.4), and `Integer.MIN_VALUE % 8` is `-0` (negative modulo), causing `ArrayIndexOutOfBoundsException`. This is a *correctness* bug that SpotBugs INTX_BAD_PERCENT_OPERATOR or similar will flag as Priority 1. It must be fixed before the gate goes live or it blocks every merge. - **The `secret-scan` job scans full history** (`fetch-depth: 0`) — this is correct and necessary to catch the 2 secrets found in git history in the audit. However, if those historical secrets are not removed from history (via `git filter-repo` or BFG) before this gate is enabled, the gate will permanently block `main`. The issue does not address this remediation. - **FindSecBugs version 1.13.0** (as specified in the plugin invocation) should be verified against its release notes for coverage of Spring Boot 4 / Java 21 patterns. FindSecBugs has historically had gaps in detecting Spring-specific injection patterns in newer framework versions. - **No DAST gate**: the issue adds SAST (SpotBugs), SCA (Trivy, npm audit), secret scanning (gitleaks), and API linting (Spectral). It does not address dynamic testing (OWASP ZAP, Nuclei). For a project where the pre-prod audit found no CSP and an unprotected admin-default-password, a DAST probe against the running app on PR branches would catch runtime security regressions that static tools miss. This is out of scope for this issue but should be a follow-up. - **Spectral linting an OpenAPI spec** does not catch runtime security misconfigurations — missing `@RequirePermission`, accidentally `permitAll()` endpoints, or CORS misconfig. The 2 ERROR findings from the audit are structural API issues, which Spectral catches well. Security-specific OpenAPI checks (e.g., missing `securitySchemes` references on endpoints) require custom Spectral rules. ### Recommendations - **Verify binary checksums before execution** for both Trivy and gitleaks. Download the `.sha256sum` file alongside the tarball and verify before extracting: ```bash curl -fsSL https://github.com/gitleaks/gitleaks/releases/download/v8.21.2/gitleaks_8.21.2_linux_x64.tar.gz -o gitleaks.tar.gz curl -fsSL https://github.com/gitleaks/gitleaks/releases/download/v8.21.2/checksums.txt | grep linux_x64.tar.gz | sha256sum -c - tar xz -C /tmp -f gitleaks.tar.gz gitleaks ``` This closes the supply-chain gap. Without verification, a compromised GitHub release or MITM attack could execute arbitrary code with access to all CI secrets. - **Remove the 2 historical secrets before enabling the `secret-scan` gate on `main`**. Use `git filter-repo --path <file> --invert-paths` or BFG Repo Cleaner. This is non-negotiable — the gate will block `main` permanently otherwise. Document the remediation in the commit message. - **Fix `AppUser.computeColor` as a prerequisite** (not a follow-up). Use `Math.floorMod(id.hashCode(), PALETTE.length)` which is defined to return a non-negative result for any divisor. This is a one-line fix that unblocks the SpotBugs gate. - **Add a `--exit-code 1` and `--no-progress` flag to Trivy** to suppress verbose output in CI logs and ensure non-zero exit on findings. The issue already has `--exit-code 1`; add `--no-progress` to keep logs clean. - **Write a custom Spectral rule** to detect endpoints missing the `security` field in the OpenAPI spec. This catches cases where `@RequirePermission` was forgotten and the endpoint is documented as unauthenticated. Add it to `.spectral.yaml` as a project-specific rule. - **Track a follow-up issue for DAST**: open a separate issue for adding a ZAP or Nuclei scan against the running stack on PR branches (can run nightly or only on PR to `main`). The static gates in this issue are necessary but not sufficient for the auth-related findings from the audit. ### Open Decisions - The pre-prod audit found 9 secrets in the working tree. Are these real credentials (requiring immediate rotation + history rewrite) or test fixtures / placeholder values (allowlistable in `.gitleaks.toml`)? This determines whether the gitleaks gate can go live on the current branch at all.
Author
Owner

🧪 Sara Holt — QA Engineer & Test Strategist

Observations

  • The issue specifies 4 verification steps at the end, which is good. However, steps 1–3 test the gates fail when they should — they do not test that existing code passes all gates clean. Step 4 ("push a clean branch — all gates pass green") is the acceptance criterion that matters most, and it cannot currently be satisfied: the audit found 19 HIGH/CRITICAL Maven CVEs, a P1 SpotBugs bug, 9 secrets in the working tree, and 2 Spectral errors. The test pyramid for this issue is: gate-fails-correctly tests (steps 1-3) and gate-passes-correctly test (step 4). Step 4 requires CVE baselining, secret remediation, and bug fixes before it can pass. This remediation work should be listed as explicit pre-conditions.
  • The verification steps are described as manual ("push a branch that intentionally introduces..."). For a CI gate issue, the verification should itself be automated and repeatable. The recommended approach: include a minimal test fixture — e.g., a known-vulnerable lodash version in a scratch package.json, a fake-secret string, a broken @PathVariable — in the PR and verify the gates catch them. Then remove the fixtures before merge. This is the standard approach for testing security gates.
  • There are no tests specified for the spotbugs-exclude.xml filter itself. If the exclusion file incorrectly suppresses a real finding (rather than a Lombok false positive), there is no test to catch it. A minimal integration test — "SpotBugs run on a class with a known EI_EXPOSE_REP violation passes (confirming the exclusion works) AND a class with a non-EI_EXPOSE_REP violation fails (confirming the exclusion is scoped)" — would validate the filter's correctness.
  • The frontend-sca job (npm audit --audit-level=high --omit=dev) will gate on the audit results at the time of first run. If there are any HIGH npm CVEs in production dependencies at merge time, this job will immediately block main. The current package.json should be audited first and any existing HIGH/CRITICAL vulnerabilities resolved before the gate is enabled.
  • The openapi-lint job has no mechanism to upload the openapi.json artifact, making it difficult to debug Spectral failures. If Spectral reports an error, the dev needs to re-run the job locally to see the spec — the spec should be uploaded as a CI artifact for inspection.
  • The .spectral.yaml ruleset content is not defined in the issue. Without knowing the rules, it is impossible to write tests for the Spectral gate. The 2 ERROR-severity issues found in the audit should be reproducible test cases: deliberately introduce each error condition, verify Spectral catches it, fix it, verify Spectral passes.

Recommendations

  • Run npm audit --audit-level=high --omit=dev locally before merging this issue and resolve any existing HIGH/CRITICAL findings in frontend/package.json. With only 7 production dependencies, this should be fast. Document the result (clean or 0 HIGH) in the PR description.
  • Upload openapi.json as a CI artifact in the openapi-lint job, always (not just on failure). Developers need to see the spec that Spectral rejected:
    - uses: actions/upload-artifact@v4
      if: always()
      with:
        name: openapi-spec
        path: openapi.json
    
  • Validate the SpotBugs exclusion filter by running SpotBugs locally against the backend before merging. The goal is to confirm: (a) 288 Lombok warnings are suppressed, (b) the AppUser.computeColor P1 finding appears (to confirm the gate would catch it if not fixed), (c) no other P1 findings exist that would block main after the gate goes live.
  • Document a test plan in the PR matching the 4 verification steps: for each step, specify the exact branch, commit, and command that was used to verify the gate fired correctly. This creates a reproducible audit trail.
  • Add a quality gate check for test coverage on the new CI configuration itself: while you cannot unit-test YAML workflow files, the integration between jobs (e.g., container-scan needing build-and-push) should be smoke-tested on a test branch before landing on main.

Open Decisions

  • None — the recommendations above are concrete actions with clear expected outcomes.
## 🧪 Sara Holt — QA Engineer & Test Strategist ### Observations - The issue specifies 4 verification steps at the end, which is good. However, steps 1–3 test the *gates fail when they should* — they do not test that *existing code passes all gates clean*. Step 4 ("push a clean branch — all gates pass green") is the acceptance criterion that matters most, and it cannot currently be satisfied: the audit found 19 HIGH/CRITICAL Maven CVEs, a P1 SpotBugs bug, 9 secrets in the working tree, and 2 Spectral errors. The test pyramid for this issue is: gate-fails-correctly tests (steps 1-3) and gate-passes-correctly test (step 4). Step 4 requires CVE baselining, secret remediation, and bug fixes before it can pass. This remediation work should be listed as explicit pre-conditions. - The verification steps are described as manual ("push a branch that intentionally introduces..."). For a CI gate issue, the verification should itself be automated and repeatable. The recommended approach: include a minimal test fixture — e.g., a known-vulnerable `lodash` version in a scratch `package.json`, a fake-secret string, a broken `@PathVariable` — in the PR and verify the gates catch them. Then remove the fixtures before merge. This is the standard approach for testing security gates. - There are no tests specified for the `spotbugs-exclude.xml` filter itself. If the exclusion file incorrectly suppresses a real finding (rather than a Lombok false positive), there is no test to catch it. A minimal integration test — "SpotBugs run on a class with a known EI_EXPOSE_REP violation passes (confirming the exclusion works) AND a class with a non-EI_EXPOSE_REP violation fails (confirming the exclusion is scoped)" — would validate the filter's correctness. - The `frontend-sca` job (`npm audit --audit-level=high --omit=dev`) will gate on the audit results at the time of first run. If there are any HIGH npm CVEs in production dependencies at merge time, this job will immediately block `main`. The current `package.json` should be audited first and any existing HIGH/CRITICAL vulnerabilities resolved before the gate is enabled. - The `openapi-lint` job has no mechanism to upload the `openapi.json` artifact, making it difficult to debug Spectral failures. If Spectral reports an error, the dev needs to re-run the job locally to see the spec — the spec should be uploaded as a CI artifact for inspection. - The `.spectral.yaml` ruleset content is not defined in the issue. Without knowing the rules, it is impossible to write tests for the Spectral gate. The 2 ERROR-severity issues found in the audit should be reproducible test cases: deliberately introduce each error condition, verify Spectral catches it, fix it, verify Spectral passes. ### Recommendations - **Run `npm audit --audit-level=high --omit=dev` locally before merging this issue** and resolve any existing HIGH/CRITICAL findings in `frontend/package.json`. With only 7 production dependencies, this should be fast. Document the result (clean or 0 HIGH) in the PR description. - **Upload `openapi.json` as a CI artifact** in the `openapi-lint` job, always (not just on failure). Developers need to see the spec that Spectral rejected: ```yaml - uses: actions/upload-artifact@v4 if: always() with: name: openapi-spec path: openapi.json ``` - **Validate the SpotBugs exclusion filter** by running SpotBugs locally against the backend before merging. The goal is to confirm: (a) 288 Lombok warnings are suppressed, (b) the `AppUser.computeColor` P1 finding appears (to confirm the gate would catch it if not fixed), (c) no other P1 findings exist that would block `main` after the gate goes live. - **Document a test plan in the PR** matching the 4 verification steps: for each step, specify the exact branch, commit, and command that was used to verify the gate fired correctly. This creates a reproducible audit trail. - **Add a quality gate check for test coverage on the new CI configuration itself**: while you cannot unit-test YAML workflow files, the integration between jobs (e.g., `container-scan` needing `build-and-push`) should be smoke-tested on a test branch before landing on `main`. ### Open Decisions - None — the recommendations above are concrete actions with clear expected outcomes.
Author
Owner

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Observations

This is a DevOps/CI issue with no frontend UI surface — no components, no routes, no Tailwind, no Svelte. There is nothing to review from a visual design, accessibility, or responsive layout perspective.

However, there is one indirect UX dimension worth noting: the openapi-lint Spectral gate will enforce the API contract that the TypeScript client (openapi-fetch) depends on. Broken OpenAPI specs cause broken TypeScript type generation, which in turn causes UX regressions (unexpected undefined values, broken form state, missing labels). The Spectral gate is therefore an indirect UX quality gate.

Recommendations

  • When writing .spectral.yaml rules, include a check that every API response schema field marked as required in the OpenAPI spec has @Schema(requiredMode = REQUIRED) on the corresponding Java entity field. This is the mechanism that drives TypeScript non-optional types (result.data!.id vs. result.data?.id). A missing requiredMode causes the TypeScript type to become optional, which propagates to the UI as potential undefined renders. This is a UX correctness concern, not just an API pedantry.
  • The CI feedback loop affects developer experience. Consider adding a --format github or equivalent flag to Spectral to get inline PR annotations on failing rules, rather than requiring developers to read raw log output.

Open Decisions

  • None.
## 🎨 Leonie Voss — UX Designer & Accessibility Strategist ### Observations This is a DevOps/CI issue with no frontend UI surface — no components, no routes, no Tailwind, no Svelte. There is nothing to review from a visual design, accessibility, or responsive layout perspective. However, there is one indirect UX dimension worth noting: the `openapi-lint` Spectral gate will enforce the API contract that the TypeScript client (`openapi-fetch`) depends on. Broken OpenAPI specs cause broken TypeScript type generation, which in turn causes UX regressions (unexpected `undefined` values, broken form state, missing labels). The Spectral gate is therefore an indirect UX quality gate. ### Recommendations - When writing `.spectral.yaml` rules, include a check that every API response schema field marked as `required` in the OpenAPI spec has `@Schema(requiredMode = REQUIRED)` on the corresponding Java entity field. This is the mechanism that drives TypeScript non-optional types (`result.data!.id` vs. `result.data?.id`). A missing `requiredMode` causes the TypeScript type to become optional, which propagates to the UI as potential `undefined` renders. This is a UX correctness concern, not just an API pedantry. - The CI feedback loop affects developer experience. Consider adding a `--format github` or equivalent flag to Spectral to get inline PR annotations on failing rules, rather than requiring developers to read raw log output. ### Open Decisions - None.
Author
Owner

Decision Queue — Issue #461

Grouped open decisions from the persona review. Each needs a concrete answer before or during implementation.


Theme A — Secret remediation scope (Nora + Elicit)

Question: Are the 9 secrets found in the working tree and 2 in git history real credentials requiring rotation + history rewrite, or test fixtures / placeholder values that can be allowlisted in .gitleaks.toml?

Why it matters: If any are real credentials, they must be rotated and history-rewritten (git filter-repo) before the secret-scan gate goes live on main. The gate will permanently block main otherwise. If all 9 are test fixtures, a .gitleaks.toml allowlist unblocks the gate without history rewriting — but .gitleaks.toml moves from "optional" (as currently listed) to required.

Blocks: secret-scan gate going live.


Theme B — CVE baseline policy (Elicit + Nora)

Question A: The pre-prod audit found 19 HIGH/CRITICAL Maven CVEs. Should .trivyignore suppressions include an expiry/review date (e.g., 90 days) after which they are re-evaluated automatically?

Why it matters: A review-date policy prevents .trivyignore from accumulating stale suppressions that become permanent technical debt. Without it, a CVE suppressed today because "no fix available" may have a fix in 3 months but remain suppressed forever.

Question B: For each of the 19 CVEs, what is the decision: upgrade the dependency, accept risk, or suppress-with-justification? This work is 1–2 days not currently in the effort estimate.

Blocks: Step 4 of the verification checklist ("push a clean branch — all gates pass green").


Theme C — openapi-lint implementation approach (Markus + Tobias)

Question: Should openapi-lint boot the full Docker Compose stack at CI runtime to generate the OpenAPI spec, or should it use the springdoc-openapi-maven-plugin to generate a static spec at build time?

Why it matters: The Docker Compose approach requires Docker socket access on the self-hosted NAS runner (a security boundary), has no timeout on the backend startup poll, and pulls minio/minio:latest (unpinned). The Maven build-time approach avoids all three problems but requires configuring a new Maven plugin.

Blocks: openapi-lint job design — affects both the workflow YAML and whether docker-compose.yml image pinning is a prerequisite.


Theme D — SpotBugs exclusion scope (Markus)

Question: Should spotbugs-exclude.xml suppress EI_EXPOSE_REP/REP2 warnings globally (for all classes) or only for classes in the entity packages (e.g., org.raddatz.familienarchiv.*.Document, *.Person, etc.)?

Why it matters: A global suppression hides the same warning class if it appears in a non-entity service class where it would be a real finding (a service exposing an internal mutable collection reference). A package-scoped suppression is more precise but requires listing all entity packages.

Blocks: spotbugs-exclude.xml authoring.


Theme E — Spectral severity boundary (Markus)

Question: Should openapi-lint gate on --fail-severity error only (as currently proposed), or also on --fail-severity warn?

Why it matters: The built-in spectral:oas ruleset has many warn-level rules (e.g., missing description on operations, inconsistent tag casing) that could indicate real issues without being hard errors. Gating on warn is stricter but may produce noise; gating only on error may miss meaningful API contract problems.

Blocks: .spectral.yaml ruleset authoring and the --fail-severity flag choice.

## Decision Queue — Issue #461 Grouped open decisions from the persona review. Each needs a concrete answer before or during implementation. --- ### Theme A — Secret remediation scope (Nora + Elicit) **Question:** Are the 9 secrets found in the working tree and 2 in git history real credentials requiring rotation + history rewrite, or test fixtures / placeholder values that can be allowlisted in `.gitleaks.toml`? **Why it matters:** If any are real credentials, they must be rotated and history-rewritten (`git filter-repo`) *before* the `secret-scan` gate goes live on `main`. The gate will permanently block `main` otherwise. If all 9 are test fixtures, a `.gitleaks.toml` allowlist unblocks the gate without history rewriting — but `.gitleaks.toml` moves from "optional" (as currently listed) to **required**. **Blocks:** `secret-scan` gate going live. --- ### Theme B — CVE baseline policy (Elicit + Nora) **Question A:** The pre-prod audit found 19 HIGH/CRITICAL Maven CVEs. Should `.trivyignore` suppressions include an expiry/review date (e.g., 90 days) after which they are re-evaluated automatically? **Why it matters:** A review-date policy prevents `.trivyignore` from accumulating stale suppressions that become permanent technical debt. Without it, a CVE suppressed today because "no fix available" may have a fix in 3 months but remain suppressed forever. **Question B:** For each of the 19 CVEs, what is the decision: upgrade the dependency, accept risk, or suppress-with-justification? This work is 1–2 days not currently in the effort estimate. **Blocks:** Step 4 of the verification checklist ("push a clean branch — all gates pass green"). --- ### Theme C — openapi-lint implementation approach (Markus + Tobias) **Question:** Should `openapi-lint` boot the full Docker Compose stack at CI runtime to generate the OpenAPI spec, or should it use the `springdoc-openapi-maven-plugin` to generate a static spec at build time? **Why it matters:** The Docker Compose approach requires Docker socket access on the self-hosted NAS runner (a security boundary), has no timeout on the backend startup poll, and pulls `minio/minio:latest` (unpinned). The Maven build-time approach avoids all three problems but requires configuring a new Maven plugin. **Blocks:** `openapi-lint` job design — affects both the workflow YAML and whether `docker-compose.yml` image pinning is a prerequisite. --- ### Theme D — SpotBugs exclusion scope (Markus) **Question:** Should `spotbugs-exclude.xml` suppress `EI_EXPOSE_REP/REP2` warnings globally (for all classes) or only for classes in the entity packages (e.g., `org.raddatz.familienarchiv.*.Document`, `*.Person`, etc.)? **Why it matters:** A global suppression hides the same warning class if it appears in a non-entity service class where it would be a real finding (a service exposing an internal mutable collection reference). A package-scoped suppression is more precise but requires listing all entity packages. **Blocks:** `spotbugs-exclude.xml` authoring. --- ### Theme E — Spectral severity boundary (Markus) **Question:** Should `openapi-lint` gate on `--fail-severity error` only (as currently proposed), or also on `--fail-severity warn`? **Why it matters:** The built-in `spectral:oas` ruleset has many `warn`-level rules (e.g., missing `description` on operations, inconsistent tag casing) that could indicate real issues without being hard errors. Gating on `warn` is stricter but may produce noise; gating only on `error` may miss meaningful API contract problems. **Blocks:** `.spectral.yaml` ruleset authoring and the `--fail-severity` flag choice.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#461