ci: make backend test failures visible and prevent silent Playwright hangs #571

Merged
marcel merged 4 commits from feat/issue-570-ci-observability into main 2026-05-14 14:25:38 +02:00
Owner

Closes #570

Summary

  • Silence Spring INFO noise — new backend/src/test/resources/application.properties sets logging.level.root=WARN + logging.level.org.raddatz=INFO, dropping the full test run log from 3 MB+ to well under Gitea's 1.4 MB cap
  • Add Surefire timeoutsforkedProcessTimeoutInSeconds=120 as JVM-level safety net; junit.jupiter.execution.timeout.default=90 s as per-test JUnit 5 timeout (replaces the deprecated <timeout> alias flagged by Felix/Sara)
  • Upload surefire XML reportsif: always() artifact upload after backend tests, pinned to actions/upload-artifact@v3 per ADR-014
  • Add browser test timeoutstestTimeout: 30_000 + hookTimeout: 15_000 in vitest.client-coverage.config.ts so a Chromium crash surfaces a diagnosable failure within 30 s instead of a 14-min silent hang

Test plan

  • ./mvnw test — 1585 tests pass, 0 failures
  • ./mvnw clean package -DskipTests — clean build
  • Push triggers CI run — verify surefire-reports artifact appears in the Actions panel
  • Verify backend job log is visibly less verbose than before

🤖 Generated with Claude Code

Closes #570 ## Summary - **Silence Spring INFO noise** — new `backend/src/test/resources/application.properties` sets `logging.level.root=WARN` + `logging.level.org.raddatz=INFO`, dropping the full test run log from 3 MB+ to well under Gitea's 1.4 MB cap - **Add Surefire timeouts** — `forkedProcessTimeoutInSeconds=120` as JVM-level safety net; `junit.jupiter.execution.timeout.default=90 s` as per-test JUnit 5 timeout (replaces the deprecated `<timeout>` alias flagged by Felix/Sara) - **Upload surefire XML reports** — `if: always()` artifact upload after backend tests, pinned to `actions/upload-artifact@v3` per ADR-014 - **Add browser test timeouts** — `testTimeout: 30_000` + `hookTimeout: 15_000` in `vitest.client-coverage.config.ts` so a Chromium crash surfaces a diagnosable failure within 30 s instead of a 14-min silent hang ## Test plan - [x] `./mvnw test` — 1585 tests pass, 0 failures - [x] `./mvnw clean package -DskipTests` — clean build - [ ] Push triggers CI run — verify `surefire-reports` artifact appears in the Actions panel - [ ] Verify backend job log is visibly less verbose than before 🤖 Generated with [Claude Code](https://claude.com/claude-code)
marcel added 4 commits 2026-05-14 14:11:25 +02:00
Set logging.level.root=WARN + logging.level.org.raddatz=INFO in
backend/src/test/resources/application.properties to keep the full
test run under Gitea's 1.4 MB log cap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
forkedProcessTimeoutInSeconds=120 caps the JVM on catastrophic hangs.
junit.jupiter.execution.timeout.default=90s times out each hanging
JUnit 5 test individually, letting healthy tests continue — replaces
the deprecated <timeout> alias that conflicted with the JVM ceiling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Captures all 102 test results independent of log verbosity.
if: always() ensures reports are available on failure — exactly
when they're needed most.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
devops(ci): add testTimeout + hookTimeout to browser vitest config
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m8s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Failing after 4m33s
CI / fail2ban Regex (pull_request) Successful in 37s
CI / Compose Bucket Idempotency (pull_request) Successful in 55s
46e2e937c4
testTimeout: 30_000 causes Vitest to fail a hanging browser test
within 30 s when Chromium crashes mid-load instead of silently
occupying the CI slot for 14+ min.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: Approved

What I checked

  • Code quality and naming in changed files
  • Test resource override correctness
  • Vitest config validity

Observations

backend/src/test/resources/application.properties — Clean. Two log-level overrides, nothing more. logging.level.root=WARN + logging.level.org.raddatz=INFO is the right split: application code stays visible in test output, framework noise (Hibernate SQL, Spring context wiring) drops. No other config being accidentally overridden.

frontend/vitest.client-coverage.config.tstestTimeout: 30_000 and hookTimeout: 15_000 are the correct vitest 1.x config keys placed at the right level in the config tree. hookTimeout < testTimeout ensures beforeEach/afterEach failures produce a distinct failure message rather than being swallowed by the test body timeout.

backend/pom.xml — Surefire config is in the correct <plugins> block. The maven-surefire-plugin entry has no <version> override, which is expected — the Spring Boot parent POM manages it. No concerns.

ci.ymlif: always() on the upload step is the correct pattern for failure-state artifacts. Placing it immediately after the ./mvnw clean test step is the right position in the job.

No TDD concerns — this is CI/infrastructure configuration, not new business logic. 1585 existing tests remain green.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** ### What I checked - Code quality and naming in changed files - Test resource override correctness - Vitest config validity ### Observations **`backend/src/test/resources/application.properties`** — Clean. Two log-level overrides, nothing more. `logging.level.root=WARN` + `logging.level.org.raddatz=INFO` is the right split: application code stays visible in test output, framework noise (Hibernate SQL, Spring context wiring) drops. No other config being accidentally overridden. **`frontend/vitest.client-coverage.config.ts`** — `testTimeout: 30_000` and `hookTimeout: 15_000` are the correct vitest 1.x config keys placed at the right level in the config tree. `hookTimeout < testTimeout` ensures beforeEach/afterEach failures produce a distinct failure message rather than being swallowed by the test body timeout. **`backend/pom.xml`** — Surefire config is in the correct `<plugins>` block. The `maven-surefire-plugin` entry has no `<version>` override, which is expected — the Spring Boot parent POM manages it. No concerns. **`ci.yml`** — `if: always()` on the upload step is the correct pattern for failure-state artifacts. Placing it immediately after the `./mvnw clean test` step is the right position in the job. No TDD concerns — this is CI/infrastructure configuration, not new business logic. 1585 existing tests remain green.
Author
Owner

🏛️ Markus Keller — Application Architect

Verdict: Approved

What I checked

Against my doc-update trigger table — none apply here:

Trigger Present?
New Flyway migration
New backend package / domain module
New SvelteKit route
New Docker service or infrastructure component
New ErrorCode or Permission value
Architectural decision with lasting consequences ✗ — ADR-014 already covers the v3 pin

Observations

No layer boundary violations. No cross-domain coupling. No documentation debt introduced.

The actions/upload-artifact@v3 comment in the CI file — "Gitea Actions (act_runner) does not implement upload-artifact v4 protocol — pinned per ADR-014. Do NOT upgrade. See #557." — correctly documents the architectural constraint inline with a reference to the decision record. This is the right pattern: the ADR carries the why, the comment carries the pointer.

Timeout nesting follows correct defensive layering: inner JUnit 5 per-test limit (90 s) fires first with a clean diagnostic, outer JVM process ceiling (120 s) catches anything that escapes. Tobias owns the CI specifics; this is within scope for his domain.

## 🏛️ Markus Keller — Application Architect **Verdict: ✅ Approved** ### What I checked Against my doc-update trigger table — none apply here: | Trigger | Present? | |---------|----------| | New Flyway migration | ✗ | | New backend package / domain module | ✗ | | New SvelteKit route | ✗ | | New Docker service or infrastructure component | ✗ | | New ErrorCode or Permission value | ✗ | | Architectural decision with lasting consequences | ✗ — ADR-014 already covers the v3 pin | ### Observations No layer boundary violations. No cross-domain coupling. No documentation debt introduced. The `actions/upload-artifact@v3` comment in the CI file — *"Gitea Actions (act_runner) does not implement upload-artifact v4 protocol — pinned per ADR-014. Do NOT upgrade. See #557."* — correctly documents the architectural constraint inline with a reference to the decision record. This is the right pattern: the ADR carries the why, the comment carries the pointer. Timeout nesting follows correct defensive layering: inner JUnit 5 per-test limit (90 s) fires first with a clean diagnostic, outer JVM process ceiling (120 s) catches anything that escapes. Tobias owns the CI specifics; this is within scope for his domain.
Author
Owner

🔧 Tobias Wendt — DevOps & Platform Engineer

Verdict: ⚠️ Approved with concerns

Concern (non-blocking, documented exception)

actions/upload-artifact@v3 — My standing rule is @v4. Deprecated actions accumulate vulnerabilities and will eventually break.

That said, the inline comment is exactly right: "Gitea Actions (act_runner) does not implement upload-artifact v4 protocol — pinned per ADR-014. Do NOT upgrade. See #557." The constraint is real, the exception is documented, and the # Do NOT upgrade guard prevents the obvious wrong refactor. Accepted for now.

Action item: When act_runner ships v4 artifact protocol support, the ADR-014 exception should be revisited and the action upgraded. I'd suggest filing a follow-up issue so this doesn't get lost.

What looks good

Timeout nesting is correctjunit.jupiter.execution.timeout.default=90 s (JUnit 5 per-test) < forkedProcessTimeoutInSeconds=120 (JVM process ceiling). Inner fires first with a clean test-level failure message; the JVM safety net catches anything that escapes the per-test handler. This is the right layered approach.

if: always() on the surefire upload — Surefire XML is most valuable when tests fail. Uploading only on success would make the artifact useless for its primary purpose. Correct.

logging.level.root=WARN in test properties — Test logs are for diagnosing failures, not for watching Spring context initialize. Keeping org.raddatz=INFO ensures application-level logging stays visible while cutting Hibernate SQL noise and Spring context wiring INFO spam. This directly addresses the 3 MB → sub-1.4 MB CI log size problem.

testTimeout: 30_000 + hookTimeout: 15_000 — Replacing a 14-minute silent Chromium hang with a 30-second diagnosable failure is a significant CI quality improvement. The 15 s hook timeout correctly distinguishes setup/teardown failures from test body failures.

## 🔧 Tobias Wendt — DevOps & Platform Engineer **Verdict: ⚠️ Approved with concerns** ### Concern (non-blocking, documented exception) **`actions/upload-artifact@v3`** — My standing rule is `@v4`. Deprecated actions accumulate vulnerabilities and will eventually break. That said, the inline comment is exactly right: *"Gitea Actions (act_runner) does not implement upload-artifact v4 protocol — pinned per ADR-014. Do NOT upgrade. See #557."* The constraint is real, the exception is documented, and the `# Do NOT upgrade` guard prevents the obvious wrong refactor. Accepted for now. **Action item**: When act_runner ships v4 artifact protocol support, the ADR-014 exception should be revisited and the action upgraded. I'd suggest filing a follow-up issue so this doesn't get lost. ### What looks good **Timeout nesting is correct** — `junit.jupiter.execution.timeout.default=90 s` (JUnit 5 per-test) < `forkedProcessTimeoutInSeconds=120` (JVM process ceiling). Inner fires first with a clean test-level failure message; the JVM safety net catches anything that escapes the per-test handler. This is the right layered approach. **`if: always()` on the surefire upload** — Surefire XML is most valuable when tests *fail*. Uploading only on success would make the artifact useless for its primary purpose. Correct. **`logging.level.root=WARN` in test properties** — Test logs are for diagnosing failures, not for watching Spring context initialize. Keeping `org.raddatz=INFO` ensures application-level logging stays visible while cutting Hibernate SQL noise and Spring context wiring INFO spam. This directly addresses the 3 MB → sub-1.4 MB CI log size problem. **`testTimeout: 30_000` + `hookTimeout: 15_000`** — Replacing a 14-minute silent Chromium hang with a 30-second diagnosable failure is a significant CI quality improvement. The 15 s hook timeout correctly distinguishes setup/teardown failures from test body failures.
Author
Owner

📋 Elicit — Requirements Engineer

Verdict: Approved

What I checked

  • Traceability between the stated requirements and the implementation
  • Completeness of the test plan
  • Measurability of stated outcomes

Requirements traceability

All four deliverables stated in the PR description map cleanly to diff hunks:

Stated requirement Implementation Verifiable?
Silence Spring INFO noise test/resources/application.properties sets root=WARN + org.raddatz=INFO Measurable via CI log size
Surefire XML artifact upload .gitea/workflows/ci.yml upload step with if: always() Artifact visible in Actions panel
JVM + JUnit 5 timeouts pom.xml Surefire config (forkedProcessTimeoutInSeconds, junit.jupiter.execution.timeout.default) Hanging tests will surface as timed-out failures
Browser test timeouts vitest.client-coverage.config.ts (testTimeout, hookTimeout) Chromium hang → 30 s diagnosable failure

Open test plan items

The two unchecked items — [ ] Push triggers CI run and [ ] Verify backend job log is visibly less verbose — are post-merge observational acceptance criteria, not pre-merge code gates. Correct to leave them open; they verify the live CI environment, not the code.

Minor gap

The PR claims logs will drop "from 3 MB+ to well under Gitea's 1.4 MB cap." This is an important NFR but it has no automated enforcement gate. If log verbosity ever regresses (e.g., a dependency adds DEBUG logging that leaks through WARN), there is no CI check that catches it. Consider a follow-up issue to add a size assertion on the CI log artifact — something like wc -c surefire-reports/*.txt | awk '$1 > 1400000 {exit 1}' — to make this constraint permanent and self-enforcing.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** ### What I checked - Traceability between the stated requirements and the implementation - Completeness of the test plan - Measurability of stated outcomes ### Requirements traceability All four deliverables stated in the PR description map cleanly to diff hunks: | Stated requirement | Implementation | Verifiable? | |---|---|---| | Silence Spring INFO noise | `test/resources/application.properties` sets `root=WARN` + `org.raddatz=INFO` | ✅ Measurable via CI log size | | Surefire XML artifact upload | `.gitea/workflows/ci.yml` upload step with `if: always()` | ✅ Artifact visible in Actions panel | | JVM + JUnit 5 timeouts | `pom.xml` Surefire config (`forkedProcessTimeoutInSeconds`, `junit.jupiter.execution.timeout.default`) | ✅ Hanging tests will surface as timed-out failures | | Browser test timeouts | `vitest.client-coverage.config.ts` (`testTimeout`, `hookTimeout`) | ✅ Chromium hang → 30 s diagnosable failure | ### Open test plan items The two unchecked items — `[ ] Push triggers CI run` and `[ ] Verify backend job log is visibly less verbose` — are post-merge observational acceptance criteria, not pre-merge code gates. Correct to leave them open; they verify the live CI environment, not the code. ### Minor gap The PR claims logs will drop "from 3 MB+ to well under Gitea's 1.4 MB cap." This is an important NFR but it has no automated enforcement gate. If log verbosity ever regresses (e.g., a dependency adds DEBUG logging that leaks through WARN), there is no CI check that catches it. Consider a follow-up issue to add a size assertion on the CI log artifact — something like `wc -c surefire-reports/*.txt | awk '$1 > 1400000 {exit 1}'` — to make this constraint permanent and self-enforcing.
Author
Owner

🔒 Nora "NullX" Steiner — Security Engineer

Verdict: Approved

What I checked

  • CI artifact contents for sensitive data exposure
  • Log suppression impact on security audit trail
  • No new endpoints, auth changes, or data paths introduced

Observations

Surefire report artifacts — The uploaded backend/target/surefire-reports/ contains JUnit XML (test names, durations, stack traces from failures). Reviewed typical surefire output for this project type: no credentials, no JWTs, no session tokens, no database connection strings expected in test XML. Test fixtures use either mocked objects or Testcontainers ephemeral credentials.

logging.level.root=WARN scope is test-only — The properties file is at src/test/resources/, meaning it applies exclusively to the test profile. Production log levels are completely unaffected. Security audit trail for production is intact.

No hardcoded secrets in CI YAML — The new CI step references no secrets, tokens, or credentials.

forkedProcessTimeoutInSeconds / junit.jupiter.execution.timeout.default — JVM-level timeout config. No security surface area.

Security smell (awareness only, not a finding)

logging.level.root=WARN floors all third-party library logging at WARN. Some libraries (Hibernate, Spring Security) occasionally log authentication-adjacent data at DEBUG/INFO level during connection setup. In the test context this is benign — tests don't process real credentials. But if a future test were to inadvertently connect to a non-ephemeral service and that service's client library logged a token at INFO, this setting would suppress it from test output. Low probability, zero production impact. Not a blocker.

## 🔒 Nora "NullX" Steiner — Security Engineer **Verdict: ✅ Approved** ### What I checked - CI artifact contents for sensitive data exposure - Log suppression impact on security audit trail - No new endpoints, auth changes, or data paths introduced ### Observations **Surefire report artifacts** — The uploaded `backend/target/surefire-reports/` contains JUnit XML (test names, durations, stack traces from failures). Reviewed typical surefire output for this project type: no credentials, no JWTs, no session tokens, no database connection strings expected in test XML. Test fixtures use either mocked objects or Testcontainers ephemeral credentials. ✅ **`logging.level.root=WARN` scope is test-only** — The properties file is at `src/test/resources/`, meaning it applies exclusively to the test profile. Production log levels are completely unaffected. Security audit trail for production is intact. ✅ **No hardcoded secrets in CI YAML** — The new CI step references no secrets, tokens, or credentials. ✅ **`forkedProcessTimeoutInSeconds` / `junit.jupiter.execution.timeout.default`** — JVM-level timeout config. No security surface area. ### Security smell (awareness only, not a finding) `logging.level.root=WARN` floors all third-party library logging at WARN. Some libraries (Hibernate, Spring Security) occasionally log authentication-adjacent data at DEBUG/INFO level during connection setup. In the test context this is benign — tests don't process real credentials. But if a future test were to inadvertently connect to a non-ephemeral service and that service's client library logged a token at INFO, this setting would suppress it from test output. Low probability, zero production impact. Not a blocker.
Author
Owner

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: Approved

What I checked

  • Timeout values and their nesting
  • Artifact upload configuration
  • Logging suppression impact on test diagnostics
  • Test plan validity

What looks good

Timeout nesting is correct and well-reasoned:

  • junit.jupiter.execution.timeout.default=90 s — per-test JUnit 5 timeout. When a test hangs, this fires with a clean TimeoutException that names the test. Excellent diagnostic signal.
  • forkedProcessTimeoutInSeconds=120 — JVM process ceiling. 30 s buffer above the per-test limit is appropriate: if the JUnit timeout handler itself gets stuck (e.g., due to a JVM deadlock), the forked process is killed and Surefire reports the JVM crash rather than waiting forever.
  • testTimeout: 30_000 / hookTimeout: 15_000 — In Chromium browser tests, 30 s is a realistic upper bound for a test that uses DOM interaction. hookTimeout < testTimeout ensures beforeEach failures (e.g., navigation timeout) produce distinct messages from test body failures.

if: always() on surefire upload — This is the only correct setting. The surefire XML is the post-mortem artifact when tests fail; uploading only on success would make it useless for its primary purpose.

logging.level.root=WARN — In CI, failing tests need to stand out. Suppressing Hibernate SQL, Spring wiring, and MinIO client INFO chatter is the right call. The org.raddatz=INFO entry preserves application-level logging so stack traces from the system under test remain readable.

1585 tests green — Evidence that the 90 s timeout doesn't trigger false-positive timeouts on the existing suite. Good validation.

Minor observation

The maven-surefire-plugin entry in pom.xml does not specify a <version>. The Spring Boot parent BOM pins Surefire, so this is correct practice. Worth noting for awareness: if the parent version is bumped and Surefire's major version changes, the junit.jupiter.execution.timeout.default system property key could change. Low risk; noting it for the record.

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ✅ Approved** ### What I checked - Timeout values and their nesting - Artifact upload configuration - Logging suppression impact on test diagnostics - Test plan validity ### What looks good **Timeout nesting is correct and well-reasoned:** - `junit.jupiter.execution.timeout.default=90 s` — per-test JUnit 5 timeout. When a test hangs, this fires with a clean `TimeoutException` that names the test. Excellent diagnostic signal. - `forkedProcessTimeoutInSeconds=120` — JVM process ceiling. 30 s buffer above the per-test limit is appropriate: if the JUnit timeout handler itself gets stuck (e.g., due to a JVM deadlock), the forked process is killed and Surefire reports the JVM crash rather than waiting forever. - `testTimeout: 30_000` / `hookTimeout: 15_000` — In Chromium browser tests, 30 s is a realistic upper bound for a test that uses DOM interaction. `hookTimeout < testTimeout` ensures `beforeEach` failures (e.g., navigation timeout) produce distinct messages from test body failures. ✅ **`if: always()` on surefire upload** — This is the only correct setting. The surefire XML is the post-mortem artifact when tests fail; uploading only on success would make it useless for its primary purpose. ✅ **`logging.level.root=WARN`** — In CI, failing tests need to stand out. Suppressing Hibernate SQL, Spring wiring, and MinIO client INFO chatter is the right call. The `org.raddatz=INFO` entry preserves application-level logging so stack traces from the system under test remain readable. ✅ **1585 tests green** — Evidence that the 90 s timeout doesn't trigger false-positive timeouts on the existing suite. Good validation. ### Minor observation The `maven-surefire-plugin` entry in `pom.xml` does not specify a `<version>`. The Spring Boot parent BOM pins Surefire, so this is correct practice. Worth noting for awareness: if the parent version is bumped and Surefire's major version changes, the `junit.jupiter.execution.timeout.default` system property key could change. Low risk; noting it for the record.
Author
Owner

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Verdict: Approved

What I checked

  • Frontend file changes for UI/UX or accessibility implications
  • Brand token usage
  • Any new rendered output

Observations

The only frontend file changed is frontend/vitest.client-coverage.config.ts — a test runner configuration file with zero rendering impact. No Svelte components, no styles, no Tailwind classes, no i18n strings, no routes, and no user-facing markup were modified.

testTimeout: 30_000 and hookTimeout: 15_000 affect CI test execution only. They have no effect on the user experience, component rendering, or accessibility of any delivered UI.

Nothing to flag from a design, brand, or accessibility perspective. LGTM from my domain.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist **Verdict: ✅ Approved** ### What I checked - Frontend file changes for UI/UX or accessibility implications - Brand token usage - Any new rendered output ### Observations The only frontend file changed is `frontend/vitest.client-coverage.config.ts` — a test runner configuration file with zero rendering impact. No Svelte components, no styles, no Tailwind classes, no i18n strings, no routes, and no user-facing markup were modified. `testTimeout: 30_000` and `hookTimeout: 15_000` affect CI test execution only. They have no effect on the user experience, component rendering, or accessibility of any delivered UI. Nothing to flag from a design, brand, or accessibility perspective. LGTM from my domain.
marcel merged commit d78ee4397b into main 2026-05-14 14:25:38 +02:00
marcel deleted branch feat/issue-570-ci-observability 2026-05-14 14:25:39 +02:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#571