ci: make backend test failures visible and prevent silent Playwright hangs #570

Closed
opened 2026-05-14 13:24:37 +02:00 by marcel · 8 comments
Owner

Problem

Two recurring CI failure modes are currently undiagnosable and slow to surface:

1. Backend — hidden failures, 17-minute hangs

Spring Boot's verbose startup output (banner + INFO for every bean per context restart) hits Gitea's 1.4 MB log cap before all 102 test classes have run. When a test in the later half fails or hangs, the log is already truncated — no failure message is visible, and the job silently occupies the runner for 10–17 minutes before timing out.

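Observed in run #1636 (https://git.raddatz.cloud/marcel/familienarchiv/actions/runs/1636/jobs/2): all visible tests passed, but the job ran 17 min (4× normal) and the failure itself was invisible.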

2. Frontend — Playwright/Chromium silent crash

During vitest run -c vitest.client-coverage.config.ts --coverage, the Chromium browser process can crash mid-test-load. Because birpc loses its connection without a teardown event, Vitest waits indefinitely, with no output and no error. The existing guard for the "[birpc] rpc is closed" message only catches teardown races; a mid-run crash bypasses it entirely.

Observed in run #1637 (https://git.raddatz.cloud/marcel/familienarchiv/actions/runs/1637/jobs/0): 534 server tests and 3 browser tests passed, then the log ends mid-Svelte-compilation at AnnotationEditOverlay.svelte, followed by 14 min of silence until the job timed out.


Plan

Backend

1. Silence Spring noise in the test log

Add to backend/src/test/resources/application.properties:

logging.level.root=WARN
logging.level.org.raddatz=INFO

This drops the log size from over 3 MB to well under 100 KB, so all 102 test classes fit inside the cap and failures are always visible.

2. Add a Surefire test timeout

Add to maven-surefire-plugin config in pom.xml:

<configuration>
    <forkedProcessTimeoutInSeconds>120</forkedProcessTimeoutInSeconds>
    <timeout>90</timeout>
</configuration>

A hanging integration test fails loudly in ≤2 min instead of silently consuming the CI slot for 13+ min.

3. Upload surefire XML reports as a CI artifact

Add after the backend test step in .gitea/workflows/ci.yml:

- name: Upload surefire reports
  if: always()
  uses: actions/upload-artifact@v3
  with:
    name: surefire-reports
    path: backend/target/surefire-reports/

These XML files record every test result independently of the console log, so they are unaffected by log verbosity and by the log cap.
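For triage, the downloaded surefire-reports artifact can be summarized without opening each file by hand. The TypeScript sketch below is hypothetical (`summarize` and `failingSuites` are illustrative names, not project code). It assumes Surefire's usual layout of one `TEST-<class>.xml` file per test class, whose `<testsuite>` root element carries `tests`, `failures`, `errors` and `skipped` attributes, and it reads those attributes with a regex rather than a full XML parser.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Counters read from the <testsuite> root element of one Surefire report.
interface SuiteSummary {
  name: string;
  tests: number;
  failures: number;
  errors: number;
  skipped: number;
}

// Extracts the counters from a single TEST-*.xml report.
// Regex-based sketch: it only inspects the opening <testsuite ...> tag.
function summarize(xml: string): SuiteSummary {
  const tag = /<testsuite\b([^>]*)>/.exec(xml);
  if (!tag) throw new Error("no <testsuite> element found");
  const attrs = tag[1];
  // Returns the value of attribute `key`, or "" when it is absent.
  const attr = (key: string): string => {
    const m = new RegExp(`\\b${key}="([^"]*)"`).exec(attrs);
    return m ? m[1] : "";
  };
  const num = (key: string): number => Number(attr(key) || "0");
  return {
    name: attr("name"),
    tests: num("tests"),
    failures: num("failures"),
    errors: num("errors"),
    skipped: num("skipped"),
  };
}

// Lists the suites in a report directory with at least one failure or error.
function failingSuites(dir: string): SuiteSummary[] {
  return readdirSync(dir)
    .filter((f) => f.startsWith("TEST-") && f.endsWith(".xml"))
    .map((f) => summarize(readFileSync(join(dir, f), "utf8")))
    .filter((s) => s.failures > 0 || s.errors > 0);
}
```

Pointed at an unpacked artifact, `failingSuites("backend/target/surefire-reports")` returns only the classes that need attention, even when the console log was truncated.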

Frontend

4. Add global timeouts to the browser vitest config

Add to vitest.client-coverage.config.ts:

testTimeout: 30_000,
hookTimeout: 15_000,

When Chromium crashes mid-load, Vitest times out the hanging test after 30 s and reports a diagnosable failure instead of hanging for 14 min.


Acceptance criteria

  • Backend log output for a full test run stays under 500 KB
  • A hanging Testcontainers test fails with a timeout error within 2 minutes
  • Surefire XML reports are available as downloadable artifacts on every backend run
  • A Playwright mid-run crash produces a timeout failure message within 30 s instead of a silent hang
marcel added the P2-medium and devops labels 2026-05-14 13:24:44 +02:00

🏗️ Markus Keller — Senior Application Architect

Observations

  • All four changes are targeted and non-invasive. No domain boundaries are crossed, no new infrastructure dependencies introduced.
  • Adding backend/src/test/resources/application.properties is the correct Spring Boot convention for test-scoped config overrides. It doesn't affect the main profile.
  • The Surefire <timeout>90</timeout> element deserves closer scrutiny — see Sara's review for the technical detail. If it silently aliases forkedProcessTimeoutInSeconds, the two values in the same block conflict.
  • No architecture documentation updates are needed: no new packages, services, routes, infrastructure components, or error codes. The ADR index doesn't need an entry for log-level tuning.

Recommendations

  • Proceed. Scope is appropriate and implementation is straightforward. The only item to resolve before merging is the Surefire XML config ambiguity (Sara's concern).
  • No ADR required — this is operational configuration, not a lasting architectural decision.

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

  • backend/src/test/resources/application.properties does not exist yet — this is a new file. No merging, no conflicts.
  • logging.level.root=WARN silences Testcontainers image pull and startup messages (logged at INFO). Container startup failures still surface at ERROR, so they remain visible. logging.level.org.raddatz=INFO preserves your own application logs — right call.
  • The vitest.client-coverage.config.ts additions belong at the same level as expect and browser inside the test: block — correct Vitest 4 placement. Default testTimeout is 5 000 ms, so 30 000 ms is a clear improvement without being recklessly long.
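As a sketch of that placement, assuming the file uses Vitest's `defineConfig` helper from `vitest/config` (the file's actual contents are not shown in this issue), the two keys sit directly inside `test`, next to the existing `expect` and `browser` options, which are left out here:

```typescript
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Existing options such as `expect` and `browser` stay here unchanged
    // (omitted in this sketch).

    // Per-test limit in ms: a test still running after 30 s fails with a timeout error.
    testTimeout: 30_000,
    // Limit in ms for beforeAll/beforeEach/afterEach/afterAll hooks.
    hookTimeout: 15_000,
  },
});
```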

Concern: <timeout>90</timeout> in Surefire is a deprecated alias, not a per-test timeout

In Maven Surefire 3.x, <timeout> is a deprecated alias for <forkedProcessTimeoutInSeconds>. Having both in the same <configuration> block creates a conflict — the effective ceiling ends up being 90 seconds (last-wins or alias precedence), not 120. With Testcontainers PostgreSQL startup taking 20–40 seconds on a cold cache, only ~50 seconds remain for all integration tests — likely insufficient.

Neither element gives you per-test JUnit 5 timeouts. A single hanging @Test still blocks until the JVM ceiling is hit. The correct approach:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <configuration>
        <forkedProcessTimeoutInSeconds>120</forkedProcessTimeoutInSeconds>
        <systemPropertyVariables>
            <junit.jupiter.execution.timeout.default>90 s</junit.jupiter.execution.timeout.default>
        </systemPropertyVariables>
    </configuration>
</plugin>

junit.jupiter.execution.timeout.default times out each hanging test individually (as long as the blocked code responds to thread interruption) and lets healthy tests continue, while the 120-second forkedProcessTimeoutInSeconds ceiling remains as a backstop for catastrophic hangs.
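The interplay of the two layers can be illustrated with a small model. This is a hypothetical TypeScript simulation, not Surefire or JUnit code: it assumes one forked JVM that runs tests back to back, treats every per-test timeout as stopping its test exactly at the deadline, and treats forkedProcessTimeoutInSeconds as a wall-clock limit on the whole fork.

```typescript
type Outcome = "passed" | "timed out" | "killed with fork";

// Models one forked JVM running tests back to back.
// durationsS: how long each test would take on its own (Infinity = hangs forever).
// perTestTimeoutS: per-test deadline, like junit.jupiter.execution.timeout.default.
// forkCeilingS: wall-clock limit for the whole fork, like forkedProcessTimeoutInSeconds.
function simulateFork(
  durationsS: number[],
  perTestTimeoutS: number,
  forkCeilingS: number,
): Outcome[] {
  const outcomes: Outcome[] = [];
  let elapsedS = 0;
  for (const duration of durationsS) {
    // In this model a test never runs longer than the per-test deadline.
    const spentS = Math.min(duration, perTestTimeoutS);
    if (elapsedS + spentS > forkCeilingS) {
      // The fork is killed mid-test: this test and every later one get no result.
      while (outcomes.length < durationsS.length) outcomes.push("killed with fork");
      return outcomes;
    }
    elapsedS += spentS;
    outcomes.push(duration > perTestTimeoutS ? "timed out" : "passed");
  }
  return outcomes;
}
```

With the proposed values, `simulateFork([5, Infinity, 5], 90, 120)` returns `["passed", "timed out", "passed"]`: the hang is reported by name and the next test still runs. In the same model, `simulateFork([30, 5, Infinity, 5], 90, 120)` ends with the fork killed, because the 90 s per-test deadline can only fire before the 120 s ceiling when the hanging test starts within the first 30 s of the fork.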

Recommendations

  • Remove <timeout>90</timeout>. Replace with the junit.jupiter.execution.timeout.default system property as above.
  • Keep <forkedProcessTimeoutInSeconds>120</forkedProcessTimeoutInSeconds> as the JVM-level safety net.
  • Everything else in the plan is correct as written.

🔧 Tobias Wendt — DevOps & Platform Engineer

Observations

  • All four changes are the right call. This addresses confirmed, observed CI pain (runs #1636 and #1637) with minimal footprint — exactly the kind of CI fix that earns its keep.
  • actions/upload-artifact@v3 is correct. ADR-014 (issue #557) pins artifact actions to v3; the unit-tests job already has an explicit self-check step that blocks v4+ with a grep guard. The proposed surefire upload step is already compliant. Do not bump to v4.
  • if: always() on the artifact upload step is essential — without it you only get artifacts on success, which is precisely when you don't need them. Correct pattern.
  • Placement of the surefire upload step after Run backend tests in backend-unit-tests is right. XML reports are written during test execution and are available regardless of exit code.
  • Maven cache key (maven-${{ hashFiles('backend/pom.xml') }}) will be invalidated once when the Surefire plugin config is added to pom.xml. Expected and acceptable — one cold build.
  • Surefire XML output for 102 test classes is typically 50–200 KB, well inside CI artifact limits.

Concern: <timeout>90</timeout> may override <forkedProcessTimeoutInSeconds>120</forkedProcessTimeoutInSeconds>

If <timeout> is a deprecated alias for forkedProcessTimeoutInSeconds in Surefire 3.x (it is), the effective JVM ceiling becomes 90 seconds, not 120. On a Testcontainers-heavy suite where container startup alone can consume 30–40 seconds, this leaves a very narrow window. See Felix's review for the correct fix.

Recommendations

  • Resolve the <timeout> vs forkedProcessTimeoutInSeconds conflict before merging (Felix's fix).
  • No other blockers. The log-level change and artifact upload step are ready to ship independently of the timeout fix if you want to split the work.

🔒 Nora "NullX" Steiner — Application Security Engineer

Observations

  • Surefire XML reports as CI artifacts: These contain test class names, method names, and exception stack traces from test failures — no production data, no credentials, no user-identifiable content. Safe to expose as CI artifacts scoped to the Gitea instance.
  • logging.level.root=WARN: Reduces verbosity; does not expose more data. Security-relevant events in this project (auth failures, permission denials) are recorded through AuditService — which logs at WARN or ERROR, not INFO — so they remain visible under the new log level. Spring Security's access-decision logging sits at DEBUG and is already suppressed in normal runs.
  • No new code paths, no new input validation boundaries, no authentication or authorization changes. This is purely test infrastructure.

No security concerns with any of the proposed changes.


🧪 Sara Holt — Senior QA Engineer

Observations

  • The logging fix directly improves test observability. Failures invisible in a truncated log are a false-negative factory — this is the correct fix and it's the highest-leverage change in the issue.
  • Surefire XML artifacts are a correct complement: XML reports capture every test result independent of log verbosity, and if: always() ensures they're available on failure. Zero-effort win.
  • testTimeout: 30_000 and hookTimeout: 15_000 are placed correctly in vitest.client-coverage.config.ts (same level as expect: and browser:). Default Vitest testTimeout is 5 000 ms — 30 s is an appropriate step-up for browser component tests with Svelte compilation overhead.
  • The existing "Assert no birpc teardown race" step checks for "[birpc] rpc is closed", which signals a teardown race. A mid-run Chromium crash produces a different failure mode (no message, silent hang). The testTimeout fix addresses the crash scenario; the birpc guard addresses the teardown scenario. They are complementary, not redundant.

Concern: <timeout>90</timeout> in Surefire is a deprecated alias, not a per-test JUnit 5 timeout

In Maven Surefire 3.x, <timeout> is a deprecated alias for <forkedProcessTimeoutInSeconds>. Having both in the same <configuration> block results in a conflicting or implementation-defined effective value — likely 90 seconds, overriding the intended 120. More importantly, neither element provides per-test timeouts in JUnit 5. A hanging @Test blocks until the JVM ceiling is hit — no diagnostic timeout error, no test name in the output.

The correct approach for per-test JUnit 5 timeouts:

<systemPropertyVariables>
    <junit.jupiter.execution.timeout.default>90 s</junit.jupiter.execution.timeout.default>
</systemPropertyVariables>

This times out each hanging test individually (with a meaningful test name in the error), lets all other tests continue running, and keeps forkedProcessTimeoutInSeconds=120 as a backstop for catastrophic JVM-level hangs.

Acceptance criteria gap: AC1 lacks automated enforcement

AC1 ("Backend log output stays under 500 KB") is measurable but has no automated check. The log-level change should keep it true, but without a check a later regression would go unnoticed. Adding a log-size assertion to the CI step makes AC1 a gate rather than a spot-check:

- name: Assert log size under 500 KB
  if: always()
  run: |
    size=$(wc -c < /tmp/backend-test-${{ github.run_id }}.log 2>/dev/null || echo 0)
    [ "$size" -lt 512000 ] || { echo "FAIL: backend test log exceeds 500 KB ($size bytes)"; exit 1; }

This is optional, since the WARN log level is expected to keep the log well below 500 KB, but it makes the acceptance criterion machine-verifiable.

Recommendations

  • Required: Fix the Surefire <timeout> config before merging. Remove <timeout>90</timeout>; replace with <systemPropertyVariables> containing junit.jupiter.execution.timeout.default.
  • Optional: Add the log-size CI assertion to make AC1 automatically verifiable.
  • Everything else is correct and ready.

🎨 Leonie Voss — UX Designer & Accessibility Strategist

This is a CI infrastructure change with no user-facing component. Checked: the proposed changes affect only backend/src/test/resources/application.properties, pom.xml, .gitea/workflows/ci.yml, and vitest.client-coverage.config.ts — no Svelte components, no routes, no CSS, no interaction patterns, no accessibility surface, no brand decisions.

No UX concerns from this side.


📋 Elicit — Requirements Engineer

Observations

  • The problem statement is concise and evidence-backed (run #1636 with 17-minute silent hang, run #1637 with 14-minute silent Chromium hang). That's exemplary requirements practice — each failure mode is traced to a concrete observation.
  • All four solutions map directly to the two failure modes. No scope creep, no gold-plating.

Acceptance criteria quality

| AC | Measurable? | Automatically enforced? | Notes |
|----|-------------|------------------------|-------|
| Backend log stays under 500 KB | ✅ | ❌ | No CI gate proposed; see Sara's review for an optional assertion |
| Hanging Testcontainers test fails within 2 min | ✅ | ⚠️ | Depends on the Surefire timeout fix being correct (see Felix/Sara) |
| Surefire XML reports downloadable | ✅ | ✅ | Directly verifiable in CI artifacts |
| Playwright crash produces timeout failure within 30 s | ✅ | ⚠️ | Structurally guaranteed by `testTimeout`; hard to reproduce deterministically in CI |

Edge case to clarify

The observed crash in run #1637 ended "mid-Svelte-compilation at AnnotationEditOverlay.svelte", before any test had been scheduled for execution. In Vitest browser mode, testTimeout: 30_000 applies per test once test execution starts. If Chromium dies before the first test is scheduled, the hang falls in the initialization/compilation phase rather than in test execution, so it is not obvious that testTimeout covers it.

The hookTimeout: 15_000 covers beforeAll/beforeEach hooks, which is the closest built-in mechanism to the pre-execution phase. In practice, testTimeout should still cause Vitest to eventually surface a timeout error for each pending test — but the 30-second window applies per test, not to the entire hanging suite. If 20 tests are queued when Chromium dies, worst case is 20 × 30 s = 10 minutes before the suite fails. Still a significant improvement over the 14-minute observed hang, but worth verifying empirically that the worst case is acceptable.
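The worst-case arithmetic above can be written out as a quick check. This is a minimal TypeScript sketch, assuming (as the paragraph does) that queued tests run one at a time and each waits out the full testTimeout; `worstCaseStallMs` and `maxQueuedWithin` are hypothetical helper names.

```typescript
// Worst case: every queued test runs into its own testTimeout, one after another.
function worstCaseStallMs(queuedTests: number, testTimeoutMs: number): number {
  return queuedTests * testTimeoutMs;
}

// Largest queue whose worst case still ends no later than the given limit.
function maxQueuedWithin(limitMs: number, testTimeoutMs: number): number {
  return Math.floor(limitMs / testTimeoutMs);
}

const testTimeoutMs = 30_000;
const observedHangMs = 14 * 60_000; // silent hang observed in run #1637

console.log(worstCaseStallMs(20, testTimeoutMs) / 60_000); // 10 (minutes)
console.log(maxQueuedWithin(observedHangMs, testTimeoutMs)); // 28
```

By the same arithmetic, the worst case stays at or below the 14-minute hang observed in run #1637 only while at most 28 tests are queued when Chromium dies.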

No blocking concerns — the fix is structurally correct. The edge case note is for implementation awareness only.


Implementation complete — PR opened

All four changes are implemented and all 1585 backend tests pass.

Commits

| Commit | Change |
|--------|--------|
| `dcb2b2f` | Create `backend/src/test/resources/application.properties` with `WARN` root level and `INFO` for `org.raddatz` |
| `811b80c` | Add `maven-surefire-plugin` to `pom.xml` with `forkedProcessTimeoutInSeconds=120` + `junit.jupiter.execution.timeout.default=90 s` system property (replaces deprecated `<timeout>` alias per Felix/Sara review) |
| `fd4d14f` | Add surefire XML artifact upload step with `if: always()`, pinned to `upload-artifact@v3` per ADR-014 |
| `46e2e93` | Add `testTimeout: 30_000` + `hookTimeout: 15_000` to `vitest.client-coverage.config.ts` |

Key design choice

The Surefire config uses junit.jupiter.execution.timeout.default as a system property rather than the deprecated <timeout> element. This gives per-test JUnit 5 timeouts (a hanging test fails with a named error after 90 s, healthy tests continue), with forkedProcessTimeoutInSeconds=120 as a JVM-level backstop for catastrophic hangs.

Reference: marcel/familienarchiv#570