devops(ci): add SAST/SCA/secret-scan/container-scan gates to .gitea/workflows/ci.yml #461
Context
`.gitea/workflows/ci.yml` currently runs only lint + Vitest + JUnit + a sliver of pytest. No SAST, no SCA, no container scanning, no secret scanning, no SBOM. The pre-prod audit ran every one of those tools manually and found:
- […] `backend/pom.xml` (would have been caught by Trivy / OWASP Dependency-Check).
- […] `trivy image`).
- […] `AppUser.computeColor` (would have been caught by SpotBugs).
Without these gates, regressions reach production.
Approach
Add four new jobs to `.gitea/workflows/ci.yml`. Run them in parallel with the existing test jobs; gate merges on their success.
Job: frontend-sca
Job: backend-sast-sca
Pre-requisite: an `<excludeFilterFile>spotbugs-exclude.xml</excludeFilterFile>` entry to suppress the 288 Lombok `EI_EXPOSE_REP`/`REP2` false positives the audit hit.
Job: secret-scan
Job: container-scan
Runs after `build-and-push` (per #142). Scans the three images we publish:
Job: openapi-lint
Critical files
- `.gitea/workflows/ci.yml`: add 4-5 new jobs
- `backend/spotbugs-exclude.xml`: new, suppress Lombok `EI_EXPOSE_REP` false positives
- `.spectral.yaml`: new, extends `spectral:oas` with project-specific rules
- `.gitleaks.toml`: optional, custom rules / allowlist for known false positives
Verification
1. […] `lodash@4.17.20`). The `frontend-sca` or `backend-sast-sca` job fails.
2. […] `password = "hunter2"`). `secret-scan` fails.
3. […] `@PathVariable` name). `openapi-lint` fails.
Acceptance criteria
- `spotbugs-exclude.xml` suppresses Lombok `EI_EXPOSE_REP`/`REP2` false positives.
Effort
M: 3-4 days, including baselining the SpotBugs exclusion file (suppressing 288 Lombok false positives) and testing each gate against intentional regressions.
Risk if not addressed
CVEs accumulate silently. Secrets get committed undetected. Container vulnerabilities reach production.
Tracked in audit doc as F-07 (Critical, escalated).
🏗️ Markus Keller — Application Architect
Observations
- The four jobs (`frontend-sca`, `backend-sast-sca`, `secret-scan`, `container-scan`) are structurally sound and follow the correct parallelism pattern: they run alongside the existing test jobs, which is the right call for wall-clock time.
- The `openapi-lint` job has a fundamental design problem: it boots the full Docker Compose stack (`backend db minio`) inside the CI runner just to capture an OpenAPI spec. This makes `openapi-lint` the slowest, most fragile job in the pipeline and creates a hidden dependency on a running Postgres and MinIO instance that the other jobs don't need.
- The `container-scan` job has a blocking prerequisite problem: it depends on `build-and-push` (issue #142), which does not exist yet in `ci.yml`. As a result this job cannot be merged as a functioning gate; it silently becomes a no-op until #142 lands. The issue correctly notes this as a prerequisite, but the dependency should be called out more explicitly in the acceptance criteria.
- […] `ubuntu-latest` runners for all new jobs, while existing jobs mix `ubuntu-latest` with the Playwright container image. This is consistent and correct.
- `backend-sast-sca` inlines the Trivy binary download on every run (`curl ... | tar xz trivy`) without caching. On the project's self-hosted NAS runner, this is a network hit on every push. The same pattern appears for gitleaks.
- `spotbugs-exclude.xml` is listed as a new file that suppresses 288 Lombok `EI_EXPOSE_REP`/`REP2` false positives. This is architecturally correct: the Lombok `@Data` pattern on entities (per the CLAUDE.md convention) will produce these warnings by design. The exclusion file is not noise suppression; it encodes the intentional design decision that Lombok-annotated entities expose mutable collections by reference.
- The `AppUser.computeColor` SpotBugs P1 finding deserves a look: `Math.abs(id.hashCode()) % PALETTE.length` has a known Java pitfall. `Math.abs(Integer.MIN_VALUE)` returns `Integer.MIN_VALUE` (still negative), so `%` returns a negative index whenever `PALETTE.length` is not a power of two. That throws an `ArrayIndexOutOfBoundsException` for the one `hashCode()` value in 2^32 that equals `Integer.MIN_VALUE`. SpotBugs will catch this correctly. The fix is `Math.floorMod(id.hashCode(), PALETTE.length)`.
Recommendations
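The arithmetic behind the `computeColor` observation can be checked without a JVM. The following is a minimal sketch that emulates the two Java expressions with shell integer arithmetic, on the assumption that the shell uses 64-bit signed integers with C-style truncating division (as bash and dash do), which matches the sign behavior of Java's `%`. The palette lengths 8 and 12 are hypothetical examples, not values taken from the code base.

```shell
#!/bin/sh
# Emulates the Java expressions with shell arithmetic.
# Assumption: 64-bit signed integers and truncating division (bash, dash).

# Integer.MIN_VALUE; in Java, Math.abs(Integer.MIN_VALUE) returns this same value.
MIN_VALUE=-2147483648

# java_rem A N: emulates Java's  A % N  (the sign follows the dividend).
java_rem() { echo $(( $1 % $2 )); }

# floor_mod A N: emulates Math.floorMod(A, N) for a positive N.
floor_mod() { echo $(( ( ($1 % $2) + $2 ) % $2 )); }

# Hypothetical palette lengths: 8 (a power of two) and 12 (not a power of two).
for len in 8 12; do
  echo "length $len: abs-then-% gives $(java_rem "$MIN_VALUE" "$len"), floorMod gives $(floor_mod "$MIN_VALUE" "$len")"
done
```

With a power-of-two length the remainder is 0, so the faulty expression still happens to produce a valid index; with any other length the remainder is negative, which is the case that would throw.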
- […] `./mvnw spring-boot:run` in the `backend-sast-sca` job using an H2 profile (or a dedicated test profile that skips Postgres), wait for `/v3/api-docs`, dump the spec, then run Spectral. This avoids `docker compose up` in CI entirely. Alternatively, configure SpringDoc to generate the spec at build time via `springdoc-openapi-maven-plugin` and lint the static output.
- […] `actions/cache@v4` keyed on the binary version. On the self-hosted runner, the download cost is real. Cache both under `~/.tools/trivy-VERSION` and `~/.tools/gitleaks-VERSION`.
- […] `needs: [build-and-push] # requires #142` and add an acceptance criterion: "CI fails if #142 is not merged first." This prevents the job from landing in a no-op state.
- Fix `AppUser.computeColor` before this PR lands: the SpotBugs gate will catch it and block every future merge. Proactively change `Math.abs(id.hashCode())` to `Math.floorMod(id.hashCode(), PALETTE.length)` in this issue or open a dedicated bug.
- […] `docs/adr/ADR-00X-ci-security-gates.md` before merging. The "why Trivy, why gitleaks, why these severity thresholds" decisions have lasting consequences and will be questioned during future upgrades.
- […] `docs/architecture/c4/l2-containers.puml`: per the doc-update rule, adding new CI infrastructure tools is a C4 L2 change. The CI pipeline is part of the delivery system.
Open Decisions
- Should `openapi-lint` block only on `ERROR`-severity Spectral findings (as proposed), or also on `WARN`? The issue says `--fail-severity error`, which is correct for a gate, but the `.spectral.yaml` ruleset content is not specified and the OAS ruleset has many `warn`-level rules that may hide real issues. Decide the severity boundary explicitly.
- `spotbugs-exclude.xml` suppresses 288 Lombok warnings globally. Should it suppress them only on entity classes (by package pattern) rather than globally? A global suppression hides the same warning class if it appears in a non-entity service class where it would be a real finding.
👨‍💻 Felix Brandt — Fullstack Developer
Observations
- `backend-sast-sca` calls the plugin directly via `./mvnw com.github.spotbugs:spotbugs-maven-plugin:check` without first adding the plugin to `pom.xml`. This will work, but the plugin version is implicitly resolved at invocation time. For reproducibility the plugin should be declared in `<build><plugins>` in `pom.xml` with a pinned version, so the same version is used locally and in CI.
- `AppUser.computeColor` is identified as a P1 SpotBugs finding. The actual bug is `Math.abs(id.hashCode()) % PALETTE.length`, where `hashCode()` can return `Integer.MIN_VALUE` and `Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE` (negative). A negative remainder then indexes `PALETTE[negative]`, which throws `ArrayIndexOutOfBoundsException`. This should be fixed before the SpotBugs gate is switched on, not as a follow-up; otherwise the gate will block the main branch immediately after this issue is merged.
- The `secret-scan` job uses `fetch-depth: 0` (full history), which is correct: scanning only the last commit would miss secrets committed earlier and then "fixed" by deletion. This is the right default for gitleaks.
- The `frontend-sca` step uses `npm audit --omit=dev`, which skips `devDependencies`. This is the right call for a production gate: dev tooling CVEs are noise. `vitest-browser-svelte`, `playwright`, and `@inlang/paraglide-js` are in `devDependencies`, but their bundled output may not reach production, so the `--omit=dev` flag is correct.
- The `openapi-lint` job proposes starting the backend via `docker compose up -d backend db minio` and polling `/v3/api-docs`. This is fragile in several ways: (1) the job needs Docker-in-Docker or a Docker socket, which the NAS runner may not provide in a container context; (2) the 95-character `until` polling loop has no timeout, so if the backend fails to start, the job hangs indefinitely.
- […] `.` (current directory), meaning it becomes an untracked file in the working tree. If gitleaks then scans `.` it will find its own binary. Add the binary to a temp directory outside the repo, or use `--source .` carefully.
- […] `curl -fsSL ... | tar xz trivy`: this executes untrusted bytes from the internet directly in the runner. For a security-focused job, the binary hash should be verified before execution.
Recommendations
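The missing-timeout concern raised in the observations could be addressed with a bounded polling helper. This is a minimal sketch, not the loop from the issue: `wait_for` is a hypothetical helper name, and the URL, port, and 120-second budget in the usage comment are assumptions.

```shell
#!/bin/sh
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second until it succeeds or the deadline
# passes. Returns 0 on success and 1 on timeout.
wait_for() {
  timeout_s=$1
  shift
  deadline=$(( $(date +%s) + timeout_s ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout_s}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# Hypothetical usage in the openapi-lint job (URL and port are assumptions):
#   wait_for 120 curl -fsS -o openapi.json http://localhost:8080/v3/api-docs
```

Because the helper returns non-zero on timeout, a backend that never starts fails the step quickly instead of holding the runner until its maximum job duration.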
- […] `pom.xml` as a pinned plugin rather than calling it via `mvnw plugin:goal`. This makes the version explicit, allows local `./mvnw spotbugs:check` during development, and ensures CI and local runs are identical.
- Fix `AppUser.computeColor` now: change `Math.abs(id.hashCode()) % PALETTE.length` to `Math.floorMod(id.hashCode(), PALETTE.length)`. `Math.floorMod` returns a non-negative result for any positive divisor. Ship the fix in this PR or as a prerequisite; do not let the SpotBugs gate go live with a known P1 finding blocking the build.
- […] `openapi-lint` polling loop. […] `timeout`, a backend startup failure silently hangs the CI runner for the runner's maximum job duration.
- […] `.tar.gz` and the corresponding `.tar.gz.sha256sum`, verify before executing. This is table stakes for a security-focused step.
- […] `$RUNNER_TEMP` or `/tmp` to prevent gitleaks from scanning its own binary when run with `--source .`. Extract to `$(mktemp -d)` and add that path to `PATH`.
Open Decisions
🔧 Tobias Wendt — DevOps & Platform Engineer
Observations
- […] `DOCKER_API_VERSION: "1.43"` set for Testcontainers in the existing `backend-unit-tests` job). The proposed jobs all use `runs-on: ubuntu-latest` without this variable. If the NAS runner's Testcontainers behavior is affected by the Docker API version, the new jobs may need the same env var, though none of the new jobs use Testcontainers, so this is low risk.
- The `container-scan` job depends on `build-and-push` (issue #142/#134/#135), which does not yet exist in the workflow. Currently the three services built in `docker-compose.yml` are all `build:` context builds with no published image names (`familienarchiv-backend`, `familienarchiv-frontend`, `familienarchiv-ocr-service` are proposed names). There is no image registry configured, no `REGISTRY` secret defined, and no `env.REGISTRY` variable in the current workflow. The entire `container-scan` job is non-functional until the image pipeline lands.
- The `openapi-lint` job runs `docker compose up -d backend db minio` inside the CI runner. The self-hosted runner does not appear to run inside a container (the existing workflow runs containers via the `container:` directive for the Playwright job, but the runner itself has Docker access). This approach requires Docker socket access in the runner, which is a security tradeoff: any job that can call `docker compose` can also escape to the host via volume mounts. For a security-focused issue, this is worth flagging.
- `docker-compose.yml` currently has `minio/minio:latest` (not pinned) and `minio/mc` (not pinned). This means the `openapi-lint` job that does `docker compose up minio` will pull `:latest` on every cold run, which is non-reproducible and potentially slow on a cold NAS runner cache. This pre-existing issue becomes more visible once CI boots the stack on every push.
- […] `actions/cache@v4`.
- `renovate.json` currently has no configuration for CI workflow tool versions (Trivy `0.70.0`, gitleaks `8.21.2`). These will drift without automation. The Spectral CLI version (`6.13.1` via `npx --yes`) will also drift if `npx` resolves to a newer version.
- No `ci: true` or `--json` output flag is specified for Trivy/gitleaks, so failures produce human-readable output in CI logs. This is fine for readability, but SARIF or JSON output would enable future GitHub/Gitea code scanning integration.
Recommendations
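The per-push download cost noted in the observations could also be avoided with a version-keyed tool directory on the runner itself. This is a minimal sketch under a stated assumption: the self-hosted runner keeps its home directory between jobs. `ensure_tool` and the fetch commands are hypothetical names; the version numbers are the ones quoted in this review.

```shell
#!/bin/sh
# ensure_tool NAME VERSION FETCH_COMMAND [ARGS...]
# Prints the versioned tool directory on stdout. FETCH_COMMAND runs only when
# that directory does not exist yet and receives a staging directory as its
# last argument. The staging directory is renamed into place after a
# successful fetch, so an interrupted download is never mistaken for a hit.
ensure_tool() {
  name=$1; version=$2; shift 2
  dir="${TOOLS_DIR:-$HOME/.tools}/${name}-${version}"
  if [ ! -d "$dir" ]; then
    mkdir -p "$dir.partial" &&
      "$@" "$dir.partial" >&2 &&
      mv "$dir.partial" "$dir" || return 1
  fi
  echo "$dir"
}

# Hypothetical usage (fetch_trivy and fetch_gitleaks would download and
# verify the release archive, then unpack it into the directory they receive):
#   PATH="$(ensure_tool trivy 0.70.0 fetch_trivy):$PATH"
#   PATH="$(ensure_tool gitleaks 8.21.2 fetch_gitleaks):$PATH"
```

Keying the directory on the version string means a version bump misses the cache automatically, which is the same property the `actions/cache@v4` key is meant to provide.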
- […] `container-scan` until `build-and-push` (#142) is ready. Gate the job with `if: false` or comment it out entirely, with a note: `# Activate after #142 (image build pipeline) lands`. A job that always skips due to a missing `needs` target produces confusing CI output and may not actually skip; it may error.
- […] `actions/cache@v4` keyed on the version string. These are 30–80 MB binaries; downloading them on every push is wasteful on a home NAS runner.
- […] `openapi-lint` job goes live. `minio/minio:latest` and `minio/mc` pull latest on every cold boot. Pin to specific versions (`minio/minio:RELEASE.2025-04-22T22-12-26Z` or equivalent) and add them to `renovate.json`.
- Add `regexManagers` to `renovate.json` to track Trivy, gitleaks, and Spectral versions in the workflow YAML. Without automation these versions stall.
- Replace the `openapi-lint` Docker Compose approach with a Maven build-time spec export (via `springdoc-openapi-maven-plugin`) to avoid needing Docker socket access in the lint job entirely. This is simpler, faster, and has no runtime infrastructure dependency.
- Add the `DOCKER_API_VERSION: "1.43"` env var to any job that might use Docker on the NAS runner, as a defensive measure, even if currently unnecessary.
Open Decisions
- The `openapi-lint` job requires Docker socket access (to run `docker compose up`). This is a meaningful security boundary on a self-hosted runner. The alternative (Maven build-time spec) avoids this tradeoff entirely. Which approach fits the operational constraints of the NAS runner?
📋 Elicit — Requirements Engineer
Observations
- […] `2026-05-07-pre-prod-architectural-review.md`), and the acceptance criteria are checkboxes with clear pass/fail conditions. This is one of the better-specified DevOps issues in the backlog.
- […] `container-scan` is present in the workflow and will activate once #142 is merged."
- […] `.trivyignore`) before step 4 is achievable. This remediation work is not scoped in the effort estimate.
- […] `.trivyignore` baseline for 19 HIGH/CRITICAL Maven CVEs is not included. Trivy will fail on the current `pom.xml` immediately. Baselining those 19 CVEs (each requiring a decision: suppress-with-justification, upgrade dep, or accept risk) is non-trivial and is itself 1-2 days of work.
- The `.spectral.yaml` ruleset is listed as "new", but its content is not specified anywhere in the issue. `extends: spectral:oas` gives you the built-in OAS ruleset, but the project-specific rules mentioned in the issue ("2 ERROR-severity OpenAPI issues") are not described. Without knowing what the 2 existing errors are, a reviewer cannot confirm that the Spectral config will catch them.
- `.gitleaks.toml` is marked "optional", but the issue body states that 9 secrets in the working tree + 2 in git history were found. If those 9 secrets are known false positives (e.g. test fixtures), the allowlist is mandatory for the gate to pass green, not optional.
Recommendations
- […] `backend/.trivyignore` with suppression entries for each of the 19 HIGH/CRITICAL CVEs found in the audit, each annotated with: CVE ID, suppression reason (no fix available / accept risk / pending upgrade), and target review date." This is part of the M estimate and should be visible.
- […] `.spectral.yaml` rules: document the 2 ERROR-severity OpenAPI findings from the audit report so the ruleset can be verified against them. At minimum, reference which Spectral OAS rule IDs fire on the current spec.
- […] `.gitleaks.toml` from optional to required if any of the 9 found secrets are test fixtures or known false positives. The gate cannot go green without the allowlist if gitleaks flags known-safe values. Clarify which of the 9 are real secrets (to be removed from history) vs. false positives (to be allowlisted).
- […] `git filter-repo`) or added to the `.gitleaks.toml` allowlist before enabling the gate." This work must precede the gate being activated on main.
Open Decisions
- Should the `.trivyignore` baseline include a review date (e.g. 90 days) after which suppressed CVEs are automatically re-evaluated? This is a policy decision about acceptable technical debt in the dependency supply chain.
🔐 Nora "NullX" Steiner — Security Engineer
Observations
- The `secret-scan` job downloads gitleaks directly from GitHub releases and executes it: `curl -fsSL ... | tar xz gitleaks && ./gitleaks detect`. There is no checksum verification. Executing an unverified binary in a CI runner that has access to repo secrets and environment variables is a supply-chain risk. CWE-494 (Download of Code Without Integrity Check) applies here. The same issue affects the Trivy download in `backend-sast-sca`.
- […] `.`: after extracting `./gitleaks` to the current directory, gitleaks runs `--source .`, which includes the gitleaks binary itself. While gitleaks ignores binaries by default, the binary is also extracted alongside the repository working tree, which could interfere with scan results on certain file system states. Extract to a temp directory.
- The `AppUser.computeColor` finding is a real bug. In `Math.abs(id.hashCode()) % PALETTE.length`, a `hashCode()` of `Integer.MIN_VALUE` gives `Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE` (overflow on negation, JLS §15.15.4), so the remainder is zero or negative. If the palette length is a power of two (for example 8), `Integer.MIN_VALUE % 8` is `0` and the index happens to be valid; for any other length the remainder is negative and causes an `ArrayIndexOutOfBoundsException`. This is a correctness bug that SpotBugs flags as Priority 1 (`RV_ABSOLUTE_VALUE_OF_HASHCODE` or similar). It must be fixed before the gate goes live or it blocks every merge.
- The `secret-scan` job scans full history (`fetch-depth: 0`). This is correct and necessary to catch the 2 secrets found in git history in the audit. However, if those historical secrets are not removed from history (via `git filter-repo` or BFG) before this gate is enabled, the gate will permanently block `main`. The issue does not address this remediation.
- […] `@RequirePermission`, accidentally `permitAll()` endpoints, or CORS misconfig. The 2 ERROR findings from the audit are structural API issues, which Spectral catches well. Security-specific OpenAPI checks (e.g., missing `securitySchemes` references on endpoints) require custom Spectral rules.
Recommendations
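The integrity check called for by the CWE-494 observation can be sketched as a small helper that runs before any archive is unpacked. This is a minimal sketch, assuming GNU coreutils `sha256sum` is available on the runner; `verify_sha256`, `TARBALL`, and `EXPECTED_SHA256` are hypothetical names, and where the expected digest comes from (pinned in the workflow or read from a release checksum file) is left open.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if the SHA-256 digest of FILE equals EXPECTED_HEX.
# Assumption: sha256sum (GNU coreutils) is installed on the runner.
verify_sha256() {
  actual=$(sha256sum "$1" | cut -d' ' -f1)
  # An empty digest means sha256sum failed (for example, a missing file).
  [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# Hypothetical usage (TARBALL and EXPECTED_SHA256 are placeholders):
#   workdir=$(mktemp -d)
#   verify_sha256 "$TARBALL" "$EXPECTED_SHA256" || exit 1
#   tar -xzf "$TARBALL" -C "$workdir" && PATH="$workdir:$PATH"
```

Unpacking into a fresh `mktemp -d` directory also keeps the binary out of the repository working tree that gitleaks scans.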
- […] `.sha256sum` file alongside the tarball and verify before extracting.
- […] `secret-scan` gate on `main`. Use `git filter-repo --path <file> --invert-paths` or BFG Repo Cleaner. This is non-negotiable: the gate will block `main` permanently otherwise. Document the remediation in the commit message.
- Fix `AppUser.computeColor` as a prerequisite (not a follow-up). Use `Math.floorMod(id.hashCode(), PALETTE.length)`, which returns a non-negative result for any positive divisor. This is a one-line fix that unblocks the SpotBugs gate.
- […] `--exit-code 1` and `--no-progress` flag to Trivy to suppress verbose output in CI logs and ensure non-zero exit on findings. The issue already has `--exit-code 1`; add `--no-progress` to keep logs clean.
- […] `security` field in the OpenAPI spec. This catches cases where `@RequirePermission` was forgotten and the endpoint is documented as unauthenticated. Add it to `.spectral.yaml` as a project-specific rule.
- […] `main`). The static gates in this issue are necessary but not sufficient for the auth-related findings from the audit.
Open Decisions
- […] `.gitleaks.toml`)? This determines whether the gitleaks gate can go live on the current branch at all.
🧪 Sara Holt — QA Engineer & Test Strategist
Observations
- […] `lodash` version in a scratch `package.json`, a fake-secret string, a broken `@PathVariable`, in the PR and verify the gates catch them. Then remove the fixtures before merge. This is the standard approach for testing security gates.
- […] `spotbugs-exclude.xml` filter itself. If the exclusion file incorrectly suppresses a real finding (rather than a Lombok false positive), there is no test to catch it. A minimal integration test would validate the filter's correctness: a SpotBugs run on a class with a known `EI_EXPOSE_REP` violation passes (confirming the exclusion works), and a class with a non-`EI_EXPOSE_REP` violation fails (confirming the exclusion is scoped).
- The `frontend-sca` job (`npm audit --audit-level=high --omit=dev`) will gate on the audit results at the time of first run. If there are any HIGH npm CVEs in production dependencies at merge time, this job will immediately block `main`. The current `package.json` should be audited first and any existing HIGH/CRITICAL vulnerabilities resolved before the gate is enabled.
- The `openapi-lint` job has no mechanism to upload the `openapi.json` artifact, making it difficult to debug Spectral failures. If Spectral reports an error, the developer needs to re-run the job locally to see the spec; the spec should be uploaded as a CI artifact for inspection.
- The `.spectral.yaml` ruleset content is not defined in the issue. Without knowing the rules, it is impossible to write tests for the Spectral gate. The 2 ERROR-severity issues found in the audit should be reproducible test cases: deliberately introduce each error condition, verify Spectral catches it, fix it, verify Spectral passes.
Recommendations
- Run `npm audit --audit-level=high --omit=dev` locally before merging this issue and resolve any existing HIGH/CRITICAL findings in `frontend/package.json`. With only 7 production dependencies, this should be fast. Document the result (clean or 0 HIGH) in the PR description.
- Upload `openapi.json` as a CI artifact in the `openapi-lint` job, always (not just on failure). Developers need to see the spec that Spectral rejected.
- […] `AppUser.computeColor` P1 finding appears (to confirm the gate would catch it if not fixed), (c) no other P1 findings exist that would block `main` after the gate goes live.
- […] `container-scan` needing `build-and-push`) should be smoke-tested on a test branch before landing on `main`.
Open Decisions
🎨 Leonie Voss — UX Designer & Accessibility Strategist
Observations
This is a DevOps/CI issue with no frontend UI surface — no components, no routes, no Tailwind, no Svelte. There is nothing to review from a visual design, accessibility, or responsive layout perspective.
However, there is one indirect UX dimension worth noting: the `openapi-lint` Spectral gate will enforce the API contract that the TypeScript client (`openapi-fetch`) depends on. Broken OpenAPI specs cause broken TypeScript type generation, which in turn causes UX regressions (unexpected `undefined` values, broken form state, missing labels). The Spectral gate is therefore an indirect UX quality gate.
Recommendations
- […] `.spectral.yaml` rules, include a check that every API response schema field marked as `required` in the OpenAPI spec has `@Schema(requiredMode = REQUIRED)` on the corresponding Java entity field. This is the mechanism that drives TypeScript non-optional types (`result.data!.id` vs. `result.data?.id`). A missing `requiredMode` causes the TypeScript type to become optional, which propagates to the UI as potential `undefined` renders. This is a UX correctness concern, not just API pedantry.
- […] `--format github-actions` (or an equivalent flag) to Spectral to get inline PR annotations on failing rules, rather than requiring developers to read raw log output.
Open Decisions
Decision Queue — Issue #461
Grouped open decisions from the persona review. Each needs a concrete answer before or during implementation.
Theme A — Secret remediation scope (Nora + Elicit)
Question: Are the 9 secrets found in the working tree and 2 in git history real credentials requiring rotation + history rewrite, or test fixtures / placeholder values that can be allowlisted in `.gitleaks.toml`?
Why it matters: If any are real credentials, they must be rotated and history-rewritten (`git filter-repo`) before the `secret-scan` gate goes live on `main`. The gate will permanently block `main` otherwise. If all 9 are test fixtures, a `.gitleaks.toml` allowlist unblocks the gate without history rewriting, but `.gitleaks.toml` moves from "optional" (as currently listed) to required.
Blocks: `secret-scan` gate going live.
Theme B — CVE baseline policy (Elicit + Nora)
Question A: The pre-prod audit found 19 HIGH/CRITICAL Maven CVEs. Should `.trivyignore` suppressions include an expiry/review date (e.g., 90 days) after which they are re-evaluated automatically?
Why it matters: A review-date policy prevents `.trivyignore` from accumulating stale suppressions that become permanent technical debt. Without it, a CVE suppressed today because "no fix available" may have a fix in 3 months but remain suppressed forever.
Question B: For each of the 19 CVEs, what is the decision: upgrade the dependency, accept risk, or suppress-with-justification? This work is 1–2 days not currently in the effort estimate.
Blocks: Step 4 of the verification checklist ("push a clean branch — all gates pass green").
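If the answer to Question A is yes, the policy could be enforced mechanically. Trivy's documentation describes an optional `exp:YYYY-MM-DD` field on `.trivyignore` entries; whether the pinned Trivy version honors it should be confirmed before relying on it. Under that assumption, this minimal sketch lists suppressions that carry no expiry. `list_unexpiring` is a hypothetical helper name, and the CVE IDs in the usage comment are placeholders, not audit findings.

```shell
#!/bin/sh
# list_unexpiring FILE
# Prints every suppression line in a .trivyignore-style file that has no
# "exp:YYYY-MM-DD" field. Blank lines and "#" comment lines are skipped.
# Assumption: entries look like "CVE-ID exp:YYYY-MM-DD".
list_unexpiring() {
  grep -v -E '^[[:space:]]*(#|$)' "$1" |
    grep -v -E '[[:space:]]exp:[0-9]{4}-[0-9]{2}-[0-9]{2}([[:space:]]|$)'
}

# Hypothetical CI usage: fail when any suppression lacks an expiry date.
#   missing=$(list_unexpiring backend/.trivyignore)
#   [ -z "$missing" ] || { echo "suppressions without exp: $missing" >&2; exit 1; }
#
# Example file content (placeholder IDs):
#   # no fix available, revisit at the next review
#   CVE-2099-0001 exp:2099-01-01
#   CVE-2099-0002
```

The helper prints nothing, and exits non-zero like `grep`, when every entry has an expiry, so the usage comment tests the captured output instead of the exit status.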
Theme C — openapi-lint implementation approach (Markus + Tobias)
Question: Should `openapi-lint` boot the full Docker Compose stack at CI runtime to generate the OpenAPI spec, or should it use the `springdoc-openapi-maven-plugin` to generate a static spec at build time?
Why it matters: The Docker Compose approach requires Docker socket access on the self-hosted NAS runner (a security boundary), has no timeout on the backend startup poll, and pulls `minio/minio:latest` (unpinned). The Maven build-time approach avoids all three problems but requires configuring a new Maven plugin.
Blocks: `openapi-lint` job design; affects both the workflow YAML and whether `docker-compose.yml` image pinning is a prerequisite.
Theme D — SpotBugs exclusion scope (Markus)
Question: Should `spotbugs-exclude.xml` suppress `EI_EXPOSE_REP`/`REP2` warnings globally (for all classes) or only for classes in the entity packages (e.g., `org.raddatz.familienarchiv.*.Document`, `*.Person`, etc.)?
Why it matters: A global suppression hides the same warning class if it appears in a non-entity service class where it would be a real finding (a service exposing an internal mutable collection reference). A package-scoped suppression is more precise but requires listing all entity packages.
Blocks: `spotbugs-exclude.xml` authoring.
Theme E — Spectral severity boundary (Markus)
Question: Should `openapi-lint` gate on `--fail-severity error` only (as currently proposed), or also on `--fail-severity warn`?
Why it matters: The built-in `spectral:oas` ruleset has many `warn`-level rules (e.g., missing `description` on operations, inconsistent tag casing) that could indicate real issues without being hard errors. Gating on `warn` is stricter but may produce noise; gating only on `error` may miss meaningful API contract problems.
Blocks: `.spectral.yaml` ruleset authoring and the `--fail-severity` flag choice.