test: fix flaky browser-mode tests in AnnotationShape and OcrTrainingCard specs #547

Merged
marcel merged 2 commits from feat/issue-541-flaky-browser-tests into main 2026-05-12 11:14:22 +02:00
Owner

Closes #541

Summary

  • AnnotationShape + AnnotationLayer: replaced 4 synchronous query().toBeNull() assertions with await expect.element(...).not.toBeInTheDocument() — Svelte defers DOM updates to microtasks, so the synchronous snapshot races against the update and fails non-deterministically
  • OcrTrainingCard in-flight test: resolved pendingFetch and awaited the settled state (button not disabled) so the async finally { training = false } block completes before afterEach(cleanup) destroys the component
  • OcrTrainingCard success-dismiss test: scoped vi.useFakeTimers() / vi.runAllTimers() / vi.useRealTimers() to the describe block so the 5-second auto-dismiss timer fires on the live component, not after cleanup destroys it

Root causes addressed

Test Anti-pattern Fix
AnnotationShape (×2) query().toBeNull() — synchronous DOM snapshot before Svelte microtask settles await expect.element().not.toBeInTheDocument()
AnnotationLayer (×2) same (Sara's sweep: ≤5 instances → fix in PR) same
OcrTrainingCard in-flight resolveFetch() last line → finally block runs after cleanuptrack_reactivity_loss await settled state before test exits
OcrTrainingCard success-dismiss setTimeout(5000) fires after cleanuptrack_reactivity_loss vi.useFakeTimers() + vi.runAllTimers() scoped to describe block

Test plan

  • All 24 tests across the 3 spec files pass on 3 consecutive runs with no code changes
  • grep confirms zero remaining query().toBeNull() instances in src/
  • No regressions in the rest of the test suite

Notes

During review, discovered that PdfViewer.svelte.test.ts (added in a recent merge to main) re-introduces vi.mock('pdfjs-dist', factory) — the exact pattern that #535 eliminated. This causes the birpc teardown race locally. Tracked separately in #546; not in scope for this PR.

🤖 Generated with Claude Code

Closes #541 ## Summary - **AnnotationShape + AnnotationLayer**: replaced 4 synchronous `query().toBeNull()` assertions with `await expect.element(...).not.toBeInTheDocument()` — Svelte defers DOM updates to microtasks, so the synchronous snapshot races against the update and fails non-deterministically - **OcrTrainingCard in-flight test**: resolved `pendingFetch` and awaited the settled state (`button not disabled`) so the async `finally { training = false }` block completes before `afterEach(cleanup)` destroys the component - **OcrTrainingCard success-dismiss test**: scoped `vi.useFakeTimers()` / `vi.runAllTimers()` / `vi.useRealTimers()` to the describe block so the 5-second auto-dismiss timer fires on the live component, not after cleanup destroys it ## Root causes addressed | Test | Anti-pattern | Fix | |------|-------------|-----| | AnnotationShape (×2) | `query().toBeNull()` — synchronous DOM snapshot before Svelte microtask settles | `await expect.element().not.toBeInTheDocument()` | | AnnotationLayer (×2) | same (Sara's sweep: ≤5 instances → fix in PR) | same | | OcrTrainingCard in-flight | `resolveFetch()` last line → `finally` block runs after `cleanup` → `track_reactivity_loss` | await settled state before test exits | | OcrTrainingCard success-dismiss | `setTimeout(5000)` fires after `cleanup` → `track_reactivity_loss` | `vi.useFakeTimers()` + `vi.runAllTimers()` scoped to describe block | ## Test plan - [ ] All 24 tests across the 3 spec files pass on 3 consecutive runs with no code changes - [ ] `grep` confirms zero remaining `query().toBeNull()` instances in `src/` - [ ] No regressions in the rest of the test suite ## Notes During review, discovered that `PdfViewer.svelte.test.ts` (added in a recent merge to main) re-introduces `vi.mock('pdfjs-dist', factory)` — the exact pattern that #535 eliminated. This causes the birpc teardown race locally. Tracked separately in #546; not in scope for this PR. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
marcel added 2 commits 2026-05-12 10:31:49 +02:00
Svelte defers DOM updates to microtasks; .query() is a synchronous
snapshot that can fire before the element disappears — making the
absence assertions in AnnotationShape and AnnotationLayer non-deterministic.

Sweeps all 4 instances across both spec files (Sara's ≤5 threshold).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test(ocr): fix track_reactivity_loss in OcrTrainingCard spec
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 1m50s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m11s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Failing after 11s
CI / Unit & Component Tests (push) Failing after 1m50s
CI / OCR Service Tests (push) Successful in 16s
CI / Backend Unit Tests (push) Successful in 4m7s
CI / fail2ban Regex (push) Successful in 37s
CI / Compose Bucket Idempotency (push) Failing after 10s
7ee038faaf
Two root causes:

1. In-flight test: resolveFetch() was the last line, leaving the async
   finally-block writing `training = false` after cleanup destroyed the
   component. Awaiting the button becoming re-enabled ensures the finally
   block settles before cleanup runs.

2. Success-dismiss test: startTraining() schedules setTimeout(5000) which
   fired after cleanup destroyed the component. vi.useFakeTimers() +
   vi.runAllTimers() scoped to the describe block drains the timer while
   the component is still alive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: Approved

Precise, minimal, and correct. Each fix addresses exactly the identified anti-pattern — nothing more.

What I checked

AnnotationShape + AnnotationLayer (*.svelte.spec.ts): The switch from the synchronous .query().toBeNull() pattern to await expect.element(...).not.toBeInTheDocument() is the canonical vitest-browser assertion. .query() returns a synchronous DOM snapshot before Svelte's microtask queue settles, making absence assertions structurally non-deterministic. The await version lets the browser runner properly wait. Both files are consistent. ✓

OcrTrainingCard in-flight test (OcrTrainingCard.svelte.spec.ts, lines 114–122): Previously the test exited after resolveFetch() with the finally { training = false } block still pending — a classic race against afterEach(cleanup). The new approach awaits the settled state (button not disabled) before the test returns. This is the correct idiom: let the component reach a known stable state before cleanup runs. ✓

OcrTrainingCard success-dismiss timer scoping (lines 76–82): vi.useFakeTimers() in beforeEach, then vi.runAllTimers() before vi.useRealTimers() in afterEach — the order matters. Running all timers first drains any pending fake timers on the live component, preventing them from leaking into the next test's real-clock context. ✓

The import addition of beforeEach is correct; the dead comment removal (// Cleanup: resolve the pending promise) is good hygiene since the code now self-documents intent.

Suggestions (nice to have)

  • The PR description mentions a grep check for zero remaining .query().toBeNull() in src/. Worth codifying this as a lint rule or a CI grep -r step in the future — catch regressions before they reach review.

No blockers.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** Precise, minimal, and correct. Each fix addresses exactly the identified anti-pattern — nothing more. ### What I checked **AnnotationShape + AnnotationLayer** (`*.svelte.spec.ts`): The switch from the synchronous `.query().toBeNull()` pattern to `await expect.element(...).not.toBeInTheDocument()` is the canonical vitest-browser assertion. `.query()` returns a synchronous DOM snapshot before Svelte's microtask queue settles, making absence assertions structurally non-deterministic. The `await` version lets the browser runner properly wait. Both files are consistent. ✓ **OcrTrainingCard in-flight test** (`OcrTrainingCard.svelte.spec.ts`, lines 114–122): Previously the test exited after `resolveFetch()` with the `finally { training = false }` block still pending — a classic race against `afterEach(cleanup)`. The new approach awaits the settled state (`button not disabled`) before the test returns. This is the correct idiom: let the component reach a known stable state before cleanup runs. ✓ **OcrTrainingCard success-dismiss timer scoping** (lines 76–82): `vi.useFakeTimers()` in `beforeEach`, then `vi.runAllTimers()` *before* `vi.useRealTimers()` in `afterEach` — the order matters. Running all timers first drains any pending fake timers on the live component, preventing them from leaking into the next test's real-clock context. ✓ The import addition of `beforeEach` is correct; the dead comment removal (`// Cleanup: resolve the pending promise`) is good hygiene since the code now self-documents intent. ### Suggestions (nice to have) - The PR description mentions a `grep` check for zero remaining `.query().toBeNull()` in `src/`. Worth codifying this as a lint rule or a CI `grep -r` step in the future — catch regressions before they reach review. No blockers.
Author
Owner

🏛️ Markus Keller — Application Architect

Verdict: Approved

This PR is entirely test-file changes. No production code, no domain model, no new routes, no infrastructure additions. Zero architectural implications.

Documentation check

Per my documentation update table, I verified whether any trigger condition applies:

Trigger Present?
New Flyway migration
New backend package or domain module
New SvelteKit route
New Docker service
New ErrorCode or Permission
New domain concept or term
Architectural decision with lasting consequences

No documentation updates required. The PR scope is correctly bounded to its stated purpose.

Observation

The PR description's footnote about PdfViewer.svelte.test.ts re-introducing vi.mock('pdfjs-dist', factory) (tracked in #546) is a clean escalation of a discovered problem — it's scoped out appropriately and not bundled here. That's the right call; mixing concerns would have made this PR harder to review and revert independently.

## 🏛️ Markus Keller — Application Architect **Verdict: ✅ Approved** This PR is entirely test-file changes. No production code, no domain model, no new routes, no infrastructure additions. Zero architectural implications. ### Documentation check Per my documentation update table, I verified whether any trigger condition applies: | Trigger | Present? | |---|---| | New Flyway migration | ✗ | | New backend package or domain module | ✗ | | New SvelteKit route | ✗ | | New Docker service | ✗ | | New `ErrorCode` or `Permission` | ✗ | | New domain concept or term | ✗ | | Architectural decision with lasting consequences | ✗ | No documentation updates required. The PR scope is correctly bounded to its stated purpose. ### Observation The PR description's footnote about `PdfViewer.svelte.test.ts` re-introducing `vi.mock('pdfjs-dist', factory)` (tracked in #546) is a clean escalation of a discovered problem — it's scoped out appropriately and not bundled here. That's the right call; mixing concerns would have made this PR harder to review and revert independently.
Author
Owner

🔧 Tobias Wendt — DevOps & Platform Engineer

Verdict: Approved

This PR touches three frontend spec files only. No changes to Docker Compose, CI workflow files, Caddyfile, or any infrastructure component.

What I checked

  • No new dependencies added (package.json / pom.xml untouched)
  • No CI workflow changes — the existing vitest-browser run will pick up these fixes automatically
  • No new Docker services, volumes, or environment variables
  • No image tags changed

CI impact note

Flaky browser tests are a CI reliability problem as much as a code problem — they erode trust in the pipeline until teams start treating red runs as noise. Eliminating the query().toBeNull() anti-pattern and the timer/cleanup race directly improves pipeline signal quality. The 3-consecutive-run verification in the test plan is the right bar to set before merge.

The PdfViewer re-introduction of vi.mock('pdfjs-dist') (tracked in #546) is worth watching — if that lands in main before #546 is resolved, expect sporadic CI failures on the browser test job. Worth prioritising #546 promptly.

## 🔧 Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** This PR touches three frontend spec files only. No changes to Docker Compose, CI workflow files, Caddyfile, or any infrastructure component. ### What I checked - No new dependencies added (`package.json` / `pom.xml` untouched) - No CI workflow changes — the existing `vitest-browser` run will pick up these fixes automatically - No new Docker services, volumes, or environment variables - No image tags changed ### CI impact note Flaky browser tests are a CI reliability problem as much as a code problem — they erode trust in the pipeline until teams start treating red runs as noise. Eliminating the `query().toBeNull()` anti-pattern and the timer/cleanup race directly improves pipeline signal quality. The 3-consecutive-run verification in the test plan is the right bar to set before merge. The PdfViewer re-introduction of `vi.mock('pdfjs-dist')` (tracked in #546) is worth watching — if that lands in main before #546 is resolved, expect sporadic CI failures on the browser test job. Worth prioritising #546 promptly.
Author
Owner

📋 Elicit — Requirements Engineer

Verdict: Approved

Reviewing from a requirements and specification completeness perspective.

PR description quality

The description is unusually well-structured for a test-only fix:

  • Root cause table: clearly maps each affected test to its anti-pattern and the concrete fix. This is the kind of documentation that makes future contributors understand why the tests look the way they do.
  • Test plan: three concrete, independently verifiable acceptance criteria — consecutive pass runs, grep verification, regression check. Each is testable without ambiguity.
  • Scope boundary: the PdfViewer observation is properly escalated to #546 rather than silently absorbed or silently ignored. This is correct scope management.

Requirement traceability

Closes #541 — the linked issue should describe the flakiness symptoms. The PR delivers on that. The fix is self-contained, reversible, and does not introduce new behaviour — it only makes existing behaviour deterministic.

One clarification worth noting

The test plan item "All 24 tests across the 3 spec files pass on 3 consecutive runs" — it's worth confirming in the merge checklist that this was actually run, not just stated. Flakiness by definition requires repeated runs to validate elimination. If CI only runs once, the 3-consecutive-run check is a manual step that should be explicitly confirmed before merge.

No blockers.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** Reviewing from a requirements and specification completeness perspective. ### PR description quality The description is unusually well-structured for a test-only fix: - **Root cause table**: clearly maps each affected test to its anti-pattern and the concrete fix. This is the kind of documentation that makes future contributors understand *why* the tests look the way they do. - **Test plan**: three concrete, independently verifiable acceptance criteria — consecutive pass runs, grep verification, regression check. Each is testable without ambiguity. - **Scope boundary**: the PdfViewer observation is properly escalated to #546 rather than silently absorbed or silently ignored. This is correct scope management. ### Requirement traceability Closes #541 — the linked issue should describe the flakiness symptoms. The PR delivers on that. The fix is self-contained, reversible, and does not introduce new behaviour — it only makes existing behaviour deterministic. ### One clarification worth noting The test plan item "All 24 tests across the 3 spec files pass on 3 consecutive runs" — it's worth confirming in the merge checklist that this was actually run, not just stated. Flakiness by definition requires repeated runs to validate elimination. If CI only runs once, the 3-consecutive-run check is a manual step that should be explicitly confirmed before merge. No blockers.
Author
Owner

🔒 Nora "NullX" Steiner — Application Security Engineer

Verdict: Approved

This PR modifies three test spec files only. No production code paths, no authentication logic, no data handling, no API endpoints are touched.

Security assessment

vi.useFakeTimers() / vi.useRealTimers() scoping — the timer fake is properly contained to its describe block via beforeEach/afterEach. There's no risk of fake timers leaking into unrelated tests and masking timing-sensitive security controls (e.g. token expiry, session timeout assertions). The drain order — vi.runAllTimers() before vi.useRealTimers() — ensures no pending timer callbacks escape into real-clock context. ✓

No test credential exposure — the spec files mock fetch responses inline with { ok: true } / { ok: false } — no real credentials, no network calls, no secrets. ✓

No new test infrastructure — no new vi.mock() patterns, no new module boundary bypasses beyond what already exists in these files. ✓

Nothing of security concern here. The PR reduces non-determinism in the test suite, which is a net positive for test reliability — and reliable tests are a prerequisite for meaningful security regression coverage.

## 🔒 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** This PR modifies three test spec files only. No production code paths, no authentication logic, no data handling, no API endpoints are touched. ### Security assessment **`vi.useFakeTimers()` / `vi.useRealTimers()` scoping** — the timer fake is properly contained to its `describe` block via `beforeEach`/`afterEach`. There's no risk of fake timers leaking into unrelated tests and masking timing-sensitive security controls (e.g. token expiry, session timeout assertions). The drain order — `vi.runAllTimers()` *before* `vi.useRealTimers()` — ensures no pending timer callbacks escape into real-clock context. ✓ **No test credential exposure** — the spec files mock `fetch` responses inline with `{ ok: true }` / `{ ok: false }` — no real credentials, no network calls, no secrets. ✓ **No new test infrastructure** — no new `vi.mock()` patterns, no new module boundary bypasses beyond what already exists in these files. ✓ Nothing of security concern here. The PR reduces non-determinism in the test suite, which is a net positive for test reliability — and reliable tests are a prerequisite for meaningful security regression coverage.
Author
Owner

🧪 Sara Holt — Senior QA Engineer

Verdict: Approved

This is exactly the kind of test-infrastructure maintenance that keeps a test suite trustworthy. Each fix addresses a real, documented root cause — not a symptom.

Detailed review

query().toBeNull()await expect.element(...).not.toBeInTheDocument() (AnnotationShape, AnnotationLayer):

The synchronous .query() pattern takes a DOM snapshot before Svelte's microtask queue settles the update. In browser-mode testing (vitest-browser), DOM updates are microtask-deferred — the snapshot may capture the pre-update state, making absence assertions structurally non-deterministic. The await form hands control back to the browser's rendering pipeline and waits for the expectation to become true. This is the correct, documented vitest-browser API. ✓

Four instances fixed across two files, consistent treatment. ✓

OcrTrainingCard in-flight test (line 114–122):

The previous pattern:

resolveFetch({ ok: false });
// test exits
// afterEach(cleanup) runs
// finally { training = false } fires AFTER component is destroyed → track_reactivity_loss

The new pattern awaits button not disabled — a concrete settled state — before the test exits. This ensures the component's async cleanup path completes while the component is still alive. ✓

OcrTrainingCard success-dismiss timer scoping (lines 76–82):

Critical detail: vi.runAllTimers() is called before vi.useRealTimers() in afterEach. This is the correct order. Running all timers first drains the 5-second auto-dismiss callback on the live component. If useRealTimers() were called first, the pending fake timer callback would be orphaned — a scenario that can cause track_reactivity_loss or silent test pollution. ✓

The beforeEachafterEach symmetry is clean and limited to the describe block that needs it. ✓

Suggestions (non-blocking)

  1. Grep verification in CI: The test plan mentions grep for zero remaining .query().toBeNull() instances. Consider adding this as a package.json script ("test:antipattern-check": "grep -r 'query().toBeNull' src/ && exit 1 || exit 0") that runs in CI — prevents silent regression.

  2. #546 priority: The re-introduced vi.mock('pdfjs-dist', factory) in PdfViewer.svelte.test.ts is the exact pattern eliminated by #535. If that lands before #546 is resolved, expect birpc teardown races to resurface in CI. Flag #546 as a follow-up dependency.

No blockers. The fixes are correct, minimal, and well-documented.

## 🧪 Sara Holt — Senior QA Engineer **Verdict: ✅ Approved** This is exactly the kind of test-infrastructure maintenance that keeps a test suite trustworthy. Each fix addresses a real, documented root cause — not a symptom. ### Detailed review **`query().toBeNull()` → `await expect.element(...).not.toBeInTheDocument()`** (AnnotationShape, AnnotationLayer): The synchronous `.query()` pattern takes a DOM snapshot *before* Svelte's microtask queue settles the update. In browser-mode testing (vitest-browser), DOM updates are microtask-deferred — the snapshot may capture the pre-update state, making absence assertions structurally non-deterministic. The `await` form hands control back to the browser's rendering pipeline and waits for the expectation to become true. This is the correct, documented vitest-browser API. ✓ Four instances fixed across two files, consistent treatment. ✓ **OcrTrainingCard in-flight test** (line 114–122): The previous pattern: ``` resolveFetch({ ok: false }); // test exits // afterEach(cleanup) runs // finally { training = false } fires AFTER component is destroyed → track_reactivity_loss ``` The new pattern awaits `button not disabled` — a concrete settled state — before the test exits. This ensures the component's async cleanup path completes while the component is still alive. ✓ **OcrTrainingCard success-dismiss timer scoping** (lines 76–82): Critical detail: `vi.runAllTimers()` is called *before* `vi.useRealTimers()` in `afterEach`. This is the correct order. Running all timers first drains the 5-second auto-dismiss callback on the live component. If `useRealTimers()` were called first, the pending fake timer callback would be orphaned — a scenario that can cause `track_reactivity_loss` or silent test pollution. ✓ The `beforeEach` → `afterEach` symmetry is clean and limited to the `describe` block that needs it. ✓ ### Suggestions (non-blocking) 1. **Grep verification in CI**: The test plan mentions `grep` for zero remaining `.query().toBeNull()` instances. Consider adding this as a `package.json` script (`"test:antipattern-check": "grep -r 'query().toBeNull' src/ && exit 1 || exit 0"`) that runs in CI — prevents silent regression. 2. **`#546` priority**: The re-introduced `vi.mock('pdfjs-dist', factory)` in `PdfViewer.svelte.test.ts` is the exact pattern eliminated by #535. If that lands before #546 is resolved, expect birpc teardown races to resurface in CI. Flag #546 as a follow-up dependency. No blockers. The fixes are correct, minimal, and well-documented.
Author
Owner

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Verdict: Approved

This PR modifies three test spec files with no changes to any .svelte component, no template markup, no CSS, and no Tailwind classes.

UI/UX assessment

Nothing to review from a design, accessibility, or brand perspective. No visual states changed, no interaction patterns modified, no ARIA attributes touched.

One indirect observation worth noting: the OcrTrainingCard success-dismiss test verifies that a dismiss button exists with the correct 44×44px touch target (h-11 w-11). That test already exists and is passing — this PR preserves it by ensuring the timer runs against the live component (not after cleanup). The accessibility requirement on the dismiss button remains correctly tested. ✓

LGTM.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist **Verdict: ✅ Approved** This PR modifies three test spec files with no changes to any `.svelte` component, no template markup, no CSS, and no Tailwind classes. ### UI/UX assessment Nothing to review from a design, accessibility, or brand perspective. No visual states changed, no interaction patterns modified, no ARIA attributes touched. One indirect observation worth noting: the `OcrTrainingCard` success-dismiss test verifies that a dismiss button exists with the correct 44×44px touch target (`h-11 w-11`). That test already exists and is passing — this PR preserves it by ensuring the timer runs against the live component (not after cleanup). The accessibility requirement on the dismiss button remains correctly tested. ✓ LGTM.
marcel merged commit 7ee038faaf into main 2026-05-12 11:14:22 +02:00
marcel deleted branch feat/issue-541-flaky-browser-tests 2026-05-12 11:14:22 +02:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#547