test: fix flaky browser-mode tests in AnnotationShape and OcrTrainingCard specs #547
Reference in New Issue
Block a user
Delete Branch "feat/issue-541-flaky-browser-tests"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #541
Summary
query().toBeNull()assertions withawait expect.element(...).not.toBeInTheDocument()— Svelte defers DOM updates to microtasks, so the synchronous snapshot races against the update and fails non-deterministicallypendingFetchand awaited the settled state (button not disabled) so the asyncfinally { training = false }block completes beforeafterEach(cleanup)destroys the componentvi.useFakeTimers()/vi.runAllTimers()/vi.useRealTimers()to the describe block so the 5-second auto-dismiss timer fires on the live component, not after cleanup destroys itRoot causes addressed
query().toBeNull()— synchronous DOM snapshot before Svelte microtask settlesawait expect.element().not.toBeInTheDocument()resolveFetch()last line →finallyblock runs aftercleanup→track_reactivity_losssetTimeout(5000)fires aftercleanup→track_reactivity_lossvi.useFakeTimers()+vi.runAllTimers()scoped to describe blockTest plan
grepconfirms zero remainingquery().toBeNull()instances insrc/Notes
During review, discovered that
PdfViewer.svelte.test.ts(added in a recent merge to main) re-introducesvi.mock('pdfjs-dist', factory)— the exact pattern that #535 eliminated. This causes the birpc teardown race locally. Tracked separately in #546; not in scope for this PR.🤖 Generated with Claude Code
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
Precise, minimal, and correct. Each fix addresses exactly the identified anti-pattern — nothing more.
What I checked
AnnotationShape + AnnotationLayer (
*.svelte.spec.ts): The switch from the synchronous.query().toBeNull()pattern toawait expect.element(...).not.toBeInTheDocument()is the canonical vitest-browser assertion..query()returns a synchronous DOM snapshot before Svelte's microtask queue settles, making absence assertions structurally non-deterministic. Theawaitversion lets the browser runner properly wait. Both files are consistent. ✓OcrTrainingCard in-flight test (
OcrTrainingCard.svelte.spec.ts, lines 114–122): Previously the test exited afterresolveFetch()with thefinally { training = false }block still pending — a classic race againstafterEach(cleanup). The new approach awaits the settled state (button not disabled) before the test returns. This is the correct idiom: let the component reach a known stable state before cleanup runs. ✓OcrTrainingCard success-dismiss timer scoping (lines 76–82):
vi.useFakeTimers()inbeforeEach, thenvi.runAllTimers()beforevi.useRealTimers()inafterEach— the order matters. Running all timers first drains any pending fake timers on the live component, preventing them from leaking into the next test's real-clock context. ✓The import addition of
beforeEachis correct; the dead comment removal (// Cleanup: resolve the pending promise) is good hygiene since the code now self-documents intent.Suggestions (nice to have)
grepcheck for zero remaining.query().toBeNull()insrc/. Worth codifying this as a lint rule or a CIgrep -rstep in the future — catch regressions before they reach review.No blockers.
🏛️ Markus Keller — Application Architect
Verdict: ✅ Approved
This PR is entirely test-file changes. No production code, no domain model, no new routes, no infrastructure additions. Zero architectural implications.
Documentation check
Per my documentation update table, I verified whether any trigger condition applies:
ErrorCodeorPermissionNo documentation updates required. The PR scope is correctly bounded to its stated purpose.
Observation
The PR description's footnote about
PdfViewer.svelte.test.tsre-introducingvi.mock('pdfjs-dist', factory)(tracked in #546) is a clean escalation of a discovered problem — it's scoped out appropriately and not bundled here. That's the right call; mixing concerns would have made this PR harder to review and revert independently.🔧 Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
This PR touches three frontend spec files only. No changes to Docker Compose, CI workflow files, Caddyfile, or any infrastructure component.
What I checked
package.json/pom.xmluntouched)vitest-browserrun will pick up these fixes automaticallyCI impact note
Flaky browser tests are a CI reliability problem as much as a code problem — they erode trust in the pipeline until teams start treating red runs as noise. Eliminating the
query().toBeNull()anti-pattern and the timer/cleanup race directly improves pipeline signal quality. The 3-consecutive-run verification in the test plan is the right bar to set before merge.The PdfViewer re-introduction of
vi.mock('pdfjs-dist')(tracked in #546) is worth watching — if that lands in main before #546 is resolved, expect sporadic CI failures on the browser test job. Worth prioritising #546 promptly.📋 Elicit — Requirements Engineer
Verdict: ✅ Approved
Reviewing from a requirements and specification completeness perspective.
PR description quality
The description is unusually well-structured for a test-only fix:
Requirement traceability
Closes #541 — the linked issue should describe the flakiness symptoms. The PR delivers on that. The fix is self-contained, reversible, and does not introduce new behaviour — it only makes existing behaviour deterministic.
One clarification worth noting
The test plan item "All 24 tests across the 3 spec files pass on 3 consecutive runs" — it's worth confirming in the merge checklist that this was actually run, not just stated. Flakiness by definition requires repeated runs to validate elimination. If CI only runs once, the 3-consecutive-run check is a manual step that should be explicitly confirmed before merge.
No blockers.
🔒 Nora "NullX" Steiner — Application Security Engineer
Verdict: ✅ Approved
This PR modifies three test spec files only. No production code paths, no authentication logic, no data handling, no API endpoints are touched.
Security assessment
vi.useFakeTimers()/vi.useRealTimers()scoping — the timer fake is properly contained to itsdescribeblock viabeforeEach/afterEach. There's no risk of fake timers leaking into unrelated tests and masking timing-sensitive security controls (e.g. token expiry, session timeout assertions). The drain order —vi.runAllTimers()beforevi.useRealTimers()— ensures no pending timer callbacks escape into real-clock context. ✓No test credential exposure — the spec files mock
fetchresponses inline with{ ok: true }/{ ok: false }— no real credentials, no network calls, no secrets. ✓No new test infrastructure — no new
vi.mock()patterns, no new module boundary bypasses beyond what already exists in these files. ✓Nothing of security concern here. The PR reduces non-determinism in the test suite, which is a net positive for test reliability — and reliable tests are a prerequisite for meaningful security regression coverage.
🧪 Sara Holt — Senior QA Engineer
Verdict: ✅ Approved
This is exactly the kind of test-infrastructure maintenance that keeps a test suite trustworthy. Each fix addresses a real, documented root cause — not a symptom.
Detailed review
query().toBeNull()→await expect.element(...).not.toBeInTheDocument()(AnnotationShape, AnnotationLayer):The synchronous
.query()pattern takes a DOM snapshot before Svelte's microtask queue settles the update. In browser-mode testing (vitest-browser), DOM updates are microtask-deferred — the snapshot may capture the pre-update state, making absence assertions structurally non-deterministic. Theawaitform hands control back to the browser's rendering pipeline and waits for the expectation to become true. This is the correct, documented vitest-browser API. ✓Four instances fixed across two files, consistent treatment. ✓
OcrTrainingCard in-flight test (line 114–122):
The previous pattern:
The new pattern awaits
button not disabled— a concrete settled state — before the test exits. This ensures the component's async cleanup path completes while the component is still alive. ✓OcrTrainingCard success-dismiss timer scoping (lines 76–82):
Critical detail:
vi.runAllTimers()is called beforevi.useRealTimers()inafterEach. This is the correct order. Running all timers first drains the 5-second auto-dismiss callback on the live component. IfuseRealTimers()were called first, the pending fake timer callback would be orphaned — a scenario that can causetrack_reactivity_lossor silent test pollution. ✓The
beforeEach→afterEachsymmetry is clean and limited to thedescribeblock that needs it. ✓Suggestions (non-blocking)
Grep verification in CI: The test plan mentions
grepfor zero remaining.query().toBeNull()instances. Consider adding this as apackage.jsonscript ("test:antipattern-check": "grep -r 'query().toBeNull' src/ && exit 1 || exit 0") that runs in CI — prevents silent regression.#546priority: The re-introducedvi.mock('pdfjs-dist', factory)inPdfViewer.svelte.test.tsis the exact pattern eliminated by #535. If that lands before #546 is resolved, expect birpc teardown races to resurface in CI. Flag #546 as a follow-up dependency.No blockers. The fixes are correct, minimal, and well-documented.
🎨 Leonie Voss — UX Designer & Accessibility Strategist
Verdict: ✅ Approved
This PR modifies three test spec files with no changes to any
.sveltecomponent, no template markup, no CSS, and no Tailwind classes.UI/UX assessment
Nothing to review from a design, accessibility, or brand perspective. No visual states changed, no interaction patterns modified, no ARIA attributes touched.
One indirect observation worth noting: the
OcrTrainingCardsuccess-dismiss test verifies that a dismiss button exists with the correct 44×44px touch target (h-11 w-11). That test already exists and is passing — this PR preserves it by ensuring the timer runs against the live component (not after cleanup). The accessibility requirement on the dismiss button remains correctly tested. ✓LGTM.