test: fix flaky browser-mode tests in AnnotationShape and OcrTrainingCard specs #547

marcel · 2026-05-12T10:31:49+02:00

marcel commented

2026-05-12 10:31:49 +02:00

Closes #541

Summary

AnnotationShape + AnnotationLayer: replaced 4 synchronous query().toBeNull() assertions with await expect.element(...).not.toBeInTheDocument() — Svelte defers DOM updates to microtasks, so the synchronous snapshot races against the update and fails non-deterministically
OcrTrainingCard in-flight test: resolved pendingFetch and awaited the settled state (button not disabled) so the async finally { training = false } block completes before afterEach(cleanup) destroys the component
OcrTrainingCard success-dismiss test: scoped vi.useFakeTimers() / vi.runAllTimers() / vi.useRealTimers() to the describe block so the 5-second auto-dismiss timer fires on the live component, not after cleanup destroys it

Root causes addressed

Test	Anti-pattern	Fix
AnnotationShape (×2)	`query().toBeNull()` — synchronous DOM snapshot before Svelte microtask settles	`await expect.element().not.toBeInTheDocument()`
AnnotationLayer (×2)	same (Sara's sweep: ≤5 instances → fix in PR)	same
OcrTrainingCard in-flight	`resolveFetch()` last line → `finally` block runs after `cleanup` → `track_reactivity_loss`	await settled state before test exits
OcrTrainingCard success-dismiss	`setTimeout(5000)` fires after `cleanup` → `track_reactivity_loss`	`vi.useFakeTimers()` + `vi.runAllTimers()` scoped to describe block

Test plan

All 24 tests across the 3 spec files pass on 3 consecutive runs with no code changes
grep confirms zero remaining query().toBeNull() instances in src/
No regressions in the rest of the test suite

Notes

During review, discovered that PdfViewer.svelte.test.ts (added in a recent merge to main) re-introduces vi.mock('pdfjs-dist', factory) — the exact pattern that #535 eliminated. This causes the birpc teardown race locally. Tracked separately in #546; not in scope for this PR.

🤖 Generated with Claude Code

Closes #541 ## Summary - **AnnotationShape + AnnotationLayer**: replaced 4 synchronous `query().toBeNull()` assertions with `await expect.element(...).not.toBeInTheDocument()` — Svelte defers DOM updates to microtasks, so the synchronous snapshot races against the update and fails non-deterministically - **OcrTrainingCard in-flight test**: resolved `pendingFetch` and awaited the settled state (`button not disabled`) so the async `finally { training = false }` block completes before `afterEach(cleanup)` destroys the component - **OcrTrainingCard success-dismiss test**: scoped `vi.useFakeTimers()` / `vi.runAllTimers()` / `vi.useRealTimers()` to the describe block so the 5-second auto-dismiss timer fires on the live component, not after cleanup destroys it ## Root causes addressed | Test | Anti-pattern | Fix | |------|-------------|-----| | AnnotationShape (×2) | `query().toBeNull()` — synchronous DOM snapshot before Svelte microtask settles | `await expect.element().not.toBeInTheDocument()` | | AnnotationLayer (×2) | same (Sara's sweep: ≤5 instances → fix in PR) | same | | OcrTrainingCard in-flight | `resolveFetch()` last line → `finally` block runs after `cleanup` → `track_reactivity_loss` | await settled state before test exits | | OcrTrainingCard success-dismiss | `setTimeout(5000)` fires after `cleanup` → `track_reactivity_loss` | `vi.useFakeTimers()` + `vi.runAllTimers()` scoped to describe block | ## Test plan - [ ] All 24 tests across the 3 spec files pass on 3 consecutive runs with no code changes - [ ] `grep` confirms zero remaining `query().toBeNull()` instances in `src/` - [ ] No regressions in the rest of the test suite ## Notes During review, discovered that `PdfViewer.svelte.test.ts` (added in a recent merge to main) re-introduces `vi.mock('pdfjs-dist', factory)` — the exact pattern that #535 eliminated. This causes the birpc teardown race locally. Tracked separately in #546; not in scope for this PR. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

marcel added 2 commits 2026-05-12 10:31:49 +02:00

test(annotation): replace synchronous query().toBeNull() with async not.toBeInTheDocument() ae1688319e

Svelte defers DOM updates to microtasks; .query() is a synchronous
snapshot that can fire before the element disappears — making the
absence assertions in AnnotationShape and AnnotationLayer non-deterministic.

Sweeps all 4 instances across both spec files (Sara's ≤5 threshold).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test(ocr): fix track_reactivity_loss in OcrTrainingCard spec

CI / Unit & Component Tests (pull_request) Failing after 1m50s

Details

CI / OCR Service Tests (pull_request) Successful in 16s

Details

CI / Backend Unit Tests (pull_request) Successful in 4m11s

Details

CI / fail2ban Regex (pull_request) Successful in 39s

Details

CI / Compose Bucket Idempotency (pull_request) Failing after 11s

Details

CI / Unit & Component Tests (push) Failing after 1m50s

Details

CI / OCR Service Tests (push) Successful in 16s

Details

CI / Backend Unit Tests (push) Successful in 4m7s

Details

CI / fail2ban Regex (push) Successful in 37s

Details

CI / Compose Bucket Idempotency (push) Failing after 10s

Details

7ee038faaf

Two root causes:

1. In-flight test: resolveFetch() was the last line, leaving the async
   finally-block writing `training = false` after cleanup destroyed the
   component. Awaiting the button becoming re-enabled ensures the finally
   block settles before cleanup runs.

2. Success-dismiss test: startTraining() schedules setTimeout(5000) which
   fired after cleanup destroyed the component. vi.useFakeTimers() +
   vi.runAllTimers() scoped to the describe block drains the timer while
   the component is still alive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel commented

2026-05-12 11:12:38 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ✅ Approved

Precise, minimal, and correct. Each fix addresses exactly the identified anti-pattern — nothing more.

What I checked

AnnotationShape + AnnotationLayer (*.svelte.spec.ts): The switch from the synchronous .query().toBeNull() pattern to await expect.element(...).not.toBeInTheDocument() is the canonical vitest-browser assertion. .query() returns a synchronous DOM snapshot before Svelte's microtask queue settles, making absence assertions structurally non-deterministic. The await version lets the browser runner properly wait. Both files are consistent. ✓

OcrTrainingCard in-flight test (OcrTrainingCard.svelte.spec.ts, lines 114–122): Previously the test exited after resolveFetch() with the finally { training = false } block still pending — a classic race against afterEach(cleanup). The new approach awaits the settled state (button not disabled) before the test returns. This is the correct idiom: let the component reach a known stable state before cleanup runs. ✓

OcrTrainingCard success-dismiss timer scoping (lines 76–82): vi.useFakeTimers() in beforeEach, then vi.runAllTimers() before vi.useRealTimers() in afterEach — the order matters. Running all timers first drains any pending fake timers on the live component, preventing them from leaking into the next test's real-clock context. ✓

The import addition of beforeEach is correct; the dead comment removal (// Cleanup: resolve the pending promise) is good hygiene since the code now self-documents intent.

Suggestions (nice to have)

The PR description mentions a grep check for zero remaining .query().toBeNull() in src/. Worth codifying this as a lint rule or a CI grep -r step in the future — catch regressions before they reach review.

No blockers.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** Precise, minimal, and correct. Each fix addresses exactly the identified anti-pattern — nothing more. ### What I checked **AnnotationShape + AnnotationLayer** (`*.svelte.spec.ts`): The switch from the synchronous `.query().toBeNull()` pattern to `await expect.element(...).not.toBeInTheDocument()` is the canonical vitest-browser assertion. `.query()` returns a synchronous DOM snapshot before Svelte's microtask queue settles, making absence assertions structurally non-deterministic. The `await` version lets the browser runner properly wait. Both files are consistent. ✓ **OcrTrainingCard in-flight test** (`OcrTrainingCard.svelte.spec.ts`, lines 114–122): Previously the test exited after `resolveFetch()` with the `finally { training = false }` block still pending — a classic race against `afterEach(cleanup)`. The new approach awaits the settled state (`button not disabled`) before the test returns. This is the correct idiom: let the component reach a known stable state before cleanup runs. ✓ **OcrTrainingCard success-dismiss timer scoping** (lines 76–82): `vi.useFakeTimers()` in `beforeEach`, then `vi.runAllTimers()` *before* `vi.useRealTimers()` in `afterEach` — the order matters. Running all timers first drains any pending fake timers on the live component, preventing them from leaking into the next test's real-clock context. ✓ The import addition of `beforeEach` is correct; the dead comment removal (`// Cleanup: resolve the pending promise`) is good hygiene since the code now self-documents intent. ### Suggestions (nice to have) - The PR description mentions a `grep` check for zero remaining `.query().toBeNull()` in `src/`. Worth codifying this as a lint rule or a CI `grep -r` step in the future — catch regressions before they reach review. No blockers.

marcel commented

2026-05-12 11:12:45 +02:00

🏛️ Markus Keller — Application Architect

Verdict: ✅ Approved

This PR is entirely test-file changes. No production code, no domain model, no new routes, no infrastructure additions. Zero architectural implications.

Documentation check

Per my documentation update table, I verified whether any trigger condition applies:

Trigger	Present?
New Flyway migration	✗
New backend package or domain module	✗
New SvelteKit route	✗
New Docker service	✗
New `ErrorCode` or `Permission`	✗
New domain concept or term	✗
Architectural decision with lasting consequences	✗

No documentation updates required. The PR scope is correctly bounded to its stated purpose.

Observation

The PR description's footnote about PdfViewer.svelte.test.ts re-introducing vi.mock('pdfjs-dist', factory) (tracked in #546) is a clean escalation of a discovered problem — it's scoped out appropriately and not bundled here. That's the right call; mixing concerns would have made this PR harder to review and revert independently.

## 🏛️ Markus Keller — Application Architect **Verdict: ✅ Approved** This PR is entirely test-file changes. No production code, no domain model, no new routes, no infrastructure additions. Zero architectural implications. ### Documentation check Per my documentation update table, I verified whether any trigger condition applies: | Trigger | Present? | |---|---| | New Flyway migration | ✗ | | New backend package or domain module | ✗ | | New SvelteKit route | ✗ | | New Docker service | ✗ | | New `ErrorCode` or `Permission` | ✗ | | New domain concept or term | ✗ | | Architectural decision with lasting consequences | ✗ | No documentation updates required. The PR scope is correctly bounded to its stated purpose. ### Observation The PR description's footnote about `PdfViewer.svelte.test.ts` re-introducing `vi.mock('pdfjs-dist', factory)` (tracked in #546) is a clean escalation of a discovered problem — it's scoped out appropriately and not bundled here. That's the right call; mixing concerns would have made this PR harder to review and revert independently.

marcel commented

2026-05-12 11:12:52 +02:00

🔧 Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

This PR touches three frontend spec files only. No changes to Docker Compose, CI workflow files, Caddyfile, or any infrastructure component.

What I checked

No new dependencies added (package.json / pom.xml untouched)
No CI workflow changes — the existing vitest-browser run will pick up these fixes automatically
No new Docker services, volumes, or environment variables
No image tags changed

CI impact note

Flaky browser tests are a CI reliability problem as much as a code problem — they erode trust in the pipeline until teams start treating red runs as noise. Eliminating the query().toBeNull() anti-pattern and the timer/cleanup race directly improves pipeline signal quality. The 3-consecutive-run verification in the test plan is the right bar to set before merge.

The PdfViewer re-introduction of vi.mock('pdfjs-dist') (tracked in #546) is worth watching — if that lands in main before #546 is resolved, expect sporadic CI failures on the browser test job. Worth prioritising #546 promptly.

## 🔧 Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** This PR touches three frontend spec files only. No changes to Docker Compose, CI workflow files, Caddyfile, or any infrastructure component. ### What I checked - No new dependencies added (`package.json` / `pom.xml` untouched) - No CI workflow changes — the existing `vitest-browser` run will pick up these fixes automatically - No new Docker services, volumes, or environment variables - No image tags changed ### CI impact note Flaky browser tests are a CI reliability problem as much as a code problem — they erode trust in the pipeline until teams start treating red runs as noise. Eliminating the `query().toBeNull()` anti-pattern and the timer/cleanup race directly improves pipeline signal quality. The 3-consecutive-run verification in the test plan is the right bar to set before merge. The PdfViewer re-introduction of `vi.mock('pdfjs-dist')` (tracked in #546) is worth watching — if that lands in main before #546 is resolved, expect sporadic CI failures on the browser test job. Worth prioritising #546 promptly.

marcel commented

2026-05-12 11:13:00 +02:00

📋 Elicit — Requirements Engineer

Verdict: ✅ Approved

Reviewing from a requirements and specification completeness perspective.

PR description quality

The description is unusually well-structured for a test-only fix:

Root cause table: clearly maps each affected test to its anti-pattern and the concrete fix. This is the kind of documentation that makes future contributors understand why the tests look the way they do.
Test plan: three concrete, independently verifiable acceptance criteria — consecutive pass runs, grep verification, regression check. Each is testable without ambiguity.
Scope boundary: the PdfViewer observation is properly escalated to #546 rather than silently absorbed or silently ignored. This is correct scope management.

Requirement traceability

Closes #541 — the linked issue should describe the flakiness symptoms. The PR delivers on that. The fix is self-contained, reversible, and does not introduce new behaviour — it only makes existing behaviour deterministic.

One clarification worth noting

The test plan item "All 24 tests across the 3 spec files pass on 3 consecutive runs" — it's worth confirming in the merge checklist that this was actually run, not just stated. Flakiness by definition requires repeated runs to validate elimination. If CI only runs once, the 3-consecutive-run check is a manual step that should be explicitly confirmed before merge.

No blockers.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** Reviewing from a requirements and specification completeness perspective. ### PR description quality The description is unusually well-structured for a test-only fix: - **Root cause table**: clearly maps each affected test to its anti-pattern and the concrete fix. This is the kind of documentation that makes future contributors understand *why* the tests look the way they do. - **Test plan**: three concrete, independently verifiable acceptance criteria — consecutive pass runs, grep verification, regression check. Each is testable without ambiguity. - **Scope boundary**: the PdfViewer observation is properly escalated to #546 rather than silently absorbed or silently ignored. This is correct scope management. ### Requirement traceability Closes #541 — the linked issue should describe the flakiness symptoms. The PR delivers on that. The fix is self-contained, reversible, and does not introduce new behaviour — it only makes existing behaviour deterministic. ### One clarification worth noting The test plan item "All 24 tests across the 3 spec files pass on 3 consecutive runs" — it's worth confirming in the merge checklist that this was actually run, not just stated. Flakiness by definition requires repeated runs to validate elimination. If CI only runs once, the 3-consecutive-run check is a manual step that should be explicitly confirmed before merge. No blockers.

marcel commented

2026-05-12 11:13:07 +02:00

🔒 Nora "NullX" Steiner — Application Security Engineer

Verdict: ✅ Approved

This PR modifies three test spec files only. No production code paths, no authentication logic, no data handling, no API endpoints are touched.

Security assessment

vi.useFakeTimers() / vi.useRealTimers() scoping — the timer fake is properly contained to its describe block via beforeEach/afterEach. There's no risk of fake timers leaking into unrelated tests and masking timing-sensitive security controls (e.g. token expiry, session timeout assertions). The drain order — vi.runAllTimers() before vi.useRealTimers() — ensures no pending timer callbacks escape into real-clock context. ✓

No test credential exposure — the spec files mock fetch responses inline with { ok: true } / { ok: false } — no real credentials, no network calls, no secrets. ✓

No new test infrastructure — no new vi.mock() patterns, no new module boundary bypasses beyond what already exists in these files. ✓

Nothing of security concern here. The PR reduces non-determinism in the test suite, which is a net positive for test reliability — and reliable tests are a prerequisite for meaningful security regression coverage.

## 🔒 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** This PR modifies three test spec files only. No production code paths, no authentication logic, no data handling, no API endpoints are touched. ### Security assessment **`vi.useFakeTimers()` / `vi.useRealTimers()` scoping** — the timer fake is properly contained to its `describe` block via `beforeEach`/`afterEach`. There's no risk of fake timers leaking into unrelated tests and masking timing-sensitive security controls (e.g. token expiry, session timeout assertions). The drain order — `vi.runAllTimers()` *before* `vi.useRealTimers()` — ensures no pending timer callbacks escape into real-clock context. ✓ **No test credential exposure** — the spec files mock `fetch` responses inline with `{ ok: true }` / `{ ok: false }` — no real credentials, no network calls, no secrets. ✓ **No new test infrastructure** — no new `vi.mock()` patterns, no new module boundary bypasses beyond what already exists in these files. ✓ Nothing of security concern here. The PR reduces non-determinism in the test suite, which is a net positive for test reliability — and reliable tests are a prerequisite for meaningful security regression coverage.

marcel commented

2026-05-12 11:13:22 +02:00

🧪 Sara Holt — Senior QA Engineer

Verdict: ✅ Approved

This is exactly the kind of test-infrastructure maintenance that keeps a test suite trustworthy. Each fix addresses a real, documented root cause — not a symptom.

Detailed review

query().toBeNull() → await expect.element(...).not.toBeInTheDocument() (AnnotationShape, AnnotationLayer):

The synchronous .query() pattern takes a DOM snapshot before Svelte's microtask queue settles the update. In browser-mode testing (vitest-browser), DOM updates are microtask-deferred — the snapshot may capture the pre-update state, making absence assertions structurally non-deterministic. The await form hands control back to the browser's rendering pipeline and waits for the expectation to become true. This is the correct, documented vitest-browser API. ✓

Four instances fixed across two files, consistent treatment. ✓

OcrTrainingCard in-flight test (line 114–122):

The previous pattern:

resolveFetch({ ok: false });
// test exits
// afterEach(cleanup) runs
// finally { training = false } fires AFTER component is destroyed → track_reactivity_loss

The new pattern awaits button not disabled — a concrete settled state — before the test exits. This ensures the component's async cleanup path completes while the component is still alive. ✓

OcrTrainingCard success-dismiss timer scoping (lines 76–82):

Critical detail: vi.runAllTimers() is called before vi.useRealTimers() in afterEach. This is the correct order. Running all timers first drains the 5-second auto-dismiss callback on the live component. If useRealTimers() were called first, the pending fake timer callback would be orphaned — a scenario that can cause track_reactivity_loss or silent test pollution. ✓

The beforeEach → afterEach symmetry is clean and limited to the describe block that needs it. ✓

Suggestions (non-blocking)

Grep verification in CI: The test plan mentions grep for zero remaining .query().toBeNull() instances. Consider adding this as a package.json script ("test:antipattern-check": "grep -r 'query().toBeNull' src/ && exit 1 || exit 0") that runs in CI — prevents silent regression.
#546 priority: The re-introduced vi.mock('pdfjs-dist', factory) in PdfViewer.svelte.test.ts is the exact pattern eliminated by #535. If that lands before #546 is resolved, expect birpc teardown races to resurface in CI. Flag #546 as a follow-up dependency.

No blockers. The fixes are correct, minimal, and well-documented.

## 🧪 Sara Holt — Senior QA Engineer **Verdict: ✅ Approved** This is exactly the kind of test-infrastructure maintenance that keeps a test suite trustworthy. Each fix addresses a real, documented root cause — not a symptom. ### Detailed review **`query().toBeNull()` → `await expect.element(...).not.toBeInTheDocument()`** (AnnotationShape, AnnotationLayer): The synchronous `.query()` pattern takes a DOM snapshot *before* Svelte's microtask queue settles the update. In browser-mode testing (vitest-browser), DOM updates are microtask-deferred — the snapshot may capture the pre-update state, making absence assertions structurally non-deterministic. The `await` form hands control back to the browser's rendering pipeline and waits for the expectation to become true. This is the correct, documented vitest-browser API. ✓ Four instances fixed across two files, consistent treatment. ✓ **OcrTrainingCard in-flight test** (line 114–122): The previous pattern: ``` resolveFetch({ ok: false }); // test exits // afterEach(cleanup) runs // finally { training = false } fires AFTER component is destroyed → track_reactivity_loss ``` The new pattern awaits `button not disabled` — a concrete settled state — before the test exits. This ensures the component's async cleanup path completes while the component is still alive. ✓ **OcrTrainingCard success-dismiss timer scoping** (lines 76–82): Critical detail: `vi.runAllTimers()` is called *before* `vi.useRealTimers()` in `afterEach`. This is the correct order. Running all timers first drains the 5-second auto-dismiss callback on the live component. If `useRealTimers()` were called first, the pending fake timer callback would be orphaned — a scenario that can cause `track_reactivity_loss` or silent test pollution. ✓ The `beforeEach` → `afterEach` symmetry is clean and limited to the `describe` block that needs it. ✓ ### Suggestions (non-blocking) 1. **Grep verification in CI**: The test plan mentions `grep` for zero remaining `.query().toBeNull()` instances. Consider adding this as a `package.json` script (`"test:antipattern-check": "grep -r 'query().toBeNull' src/ && exit 1 || exit 0"`) that runs in CI — prevents silent regression. 2. **`#546` priority**: The re-introduced `vi.mock('pdfjs-dist', factory)` in `PdfViewer.svelte.test.ts` is the exact pattern eliminated by #535. If that lands before #546 is resolved, expect birpc teardown races to resurface in CI. Flag #546 as a follow-up dependency. No blockers. The fixes are correct, minimal, and well-documented.

marcel commented

2026-05-12 11:13:26 +02:00

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Verdict: ✅ Approved

This PR modifies three test spec files with no changes to any .svelte component, no template markup, no CSS, and no Tailwind classes.

UI/UX assessment

Nothing to review from a design, accessibility, or brand perspective. No visual states changed, no interaction patterns modified, no ARIA attributes touched.

One indirect observation worth noting: the OcrTrainingCard success-dismiss test verifies that a dismiss button exists with the correct 44×44px touch target (h-11 w-11). That test already exists and is passing — this PR preserves it by ensuring the timer runs against the live component (not after cleanup). The accessibility requirement on the dismiss button remains correctly tested. ✓

LGTM.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist **Verdict: ✅ Approved** This PR modifies three test spec files with no changes to any `.svelte` component, no template markup, no CSS, and no Tailwind classes. ### UI/UX assessment Nothing to review from a design, accessibility, or brand perspective. No visual states changed, no interaction patterns modified, no ARIA attributes touched. One indirect observation worth noting: the `OcrTrainingCard` success-dismiss test verifies that a dismiss button exists with the correct 44×44px touch target (`h-11 w-11`). That test already exists and is passing — this PR preserves it by ensuring the timer runs against the live component (not after cleanup). The accessibility requirement on the dismiss button remains correctly tested. ✓ LGTM.

marcel merged commit 7ee038faaf into main

2026-05-12 11:14:22 +02:00

marcel deleted branch feat/issue-541-flaky-browser-tests

2026-05-12 11:14:22 +02:00

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: marcel/familienarchiv#547