test: fix flaky browser-mode tests in AnnotationShape and OcrTrainingCard specs #541
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Problem
Two browser-mode (vitest + Playwright/Chromium) test files intermittently fail in CI and local runs:
src/lib/document/annotation/AnnotationShape.svelte.spec.ts— 2 tests fail with timing/assertion issuessrc/lib/ocr/OcrTrainingCard.svelte.spec.ts— 2 tests fail withtrack_reactivity_lossfromsvelte/internal/client/reactivity/async.jsThe
OcrTrainingCardfailures reference a Svelte internal:Module.track_reactivity_loss node_modules/svelte/src/internal/client/reactivity/async.js:167— likely an async reactivity boundary issue in the test setup.Impact
These flaky tests cause non-deterministic CI failures that are unrelated to code changes under review. They appear alongside legitimate test failures and make it harder to diagnose regressions.
Acceptance criteria
await?)👨💻 Felix Brandt — Senior Fullstack Developer
Observations
AnnotationShape.svelte.spec.ts — synchronous null-checks are the timing/assertion culprit:
expect(page.getByTestId(...).query()).toBeNull()— a synchronous DOM snapshot inside an async browser test. Svelte's DOM updates are microtask-deferred; the query can fire before the update settles, making the assertion non-deterministic.dispatchEvent(new KeyboardEvent(...))is synchronous, but Svelte'sonkeydownhandler schedules reactive work. Theexpect(onDeleteRequest).toHaveBeenCalledOnce()assertion immediately follows with no yield — it races against the handler completing.OcrTrainingCard.svelte.spec.ts —
track_reactivity_lossroot cause:pendingFetchis not resolved until line 112, afterafterEach(cleanup)has already destroyed the component. WhenresolveFetch({ ok: false })fires, thefinally { training = false }block writes to a dead reactive graph →track_reactivity_loss.startTraining()schedulessetTimeout(() => { successMessage = null }, 5000)inside the component. AfterafterEach(cleanup)destroys the component andvi.restoreAllMocks()clears stubs, the 5-second timer is still pending. When it fires it writessuccessMessage = nullon a destroyed component → secondtrack_reactivity_losspath.Recommendations
AnnotationShape fixes:
expect(page.getByTestId(...).query()).toBeNull()withawait expect.element(page.getByTestId(...)).not.toBeInTheDocument()— this uses vitest-browser's async polling and is race-free.await vi.waitFor(() => expect(onDeleteRequest).toHaveBeenCalledOnce())afterdispatchEvent(...), or switch toawait page.keyboard.press('Delete')on the focused element.OcrTrainingCard fixes:
pendingFetchbefore the test body exits andawaitthe settled state so thefinallyblock completes beforecleanupruns:setTimeoutauto-dismiss: usevi.useFakeTimers()in the affecteddescribeblock and callvi.runAllTimers()beforecleanupfires. AddafterEach(() => vi.useRealTimers())to restore. This is the standard pattern for components with internal timers — no production code change needed.Open Decisions
vi.useFakeTimers()keeps production code unchanged and is the simplest fix. Injecting adismissDelayprop is more testable by design but adds unnecessary API surface to a UI detail. Recommend fake timers.🏗️ Markus Keller — Application Architect
Observations
track_reactivity_lossfailures inOcrTrainingCardhave an architectural root:startTraining()usessetTimeout(() => { successMessage = null }, 5000)to auto-dismiss a success message. This creates an implicit side effect that outlives the test lifecycle — the component schedules work against wall-clock time with no way for the caller or test to control or observe that schedule.AnnotationShapeandOcrTrainingCardare pure UI components with no cross-domain coupling. The issue is fully contained in the test layer.afterEach(cleanup),vi.restoreAllMocks(), factory functions for props — no architectural cleanup needed beyond the timing fixes.Recommendations
vi.useFakeTimers()scoped to the success-dismissdescribeblock is the correct adapter — it isolates the component from wall-clock time without changing production code. This is not a design smell requiring refactoring.useAutoDismiss(delay)hook that can be stubbed at a higher level. Not needed now.(No open decisions from my angle.)
🔒 Nora Steiner — Application Security Engineer
Observations
OcrTrainingCard.svelte.spec.tsalready hasafterEach(() => vi.restoreAllMocks())— correct.vi.stubGlobal('fetch')is properly cleaned up and will not leak the stub into other tests. No change needed here.AnnotationShape.svelte.spec.tsdoes not stub any globals — no cleanup concern.AnnotationShape.svelteduring research:annotation.coloris used in inline styles viahexToRgba(), which parses hex digits withparseIntand constructs anrgba(...)string. No user-controlled string is injected into HTML directly — no XSS path.fetch('/api/ocr/train', { method: 'POST' })inOcrTrainingCardis correctly stubbed in tests. In production it hits an authenticated endpoint —@RequirePermissionenforcement is at the backend layer and is not in scope for this fix.Recommendations
No security changes required. The timing fixes described by Felix and Sara are sufficient. Proceed.
🧪 Sara Holt — Senior QA Engineer
Observations
expect(page.getByTestId(...).query()).toBeNull()(lines 47–48, 62–63) is the anti-pattern. In vitest-browser,.query()is a synchronous DOM snapshot — Svelte's microtask-deferred DOM updates may not have settled yet. The correct pattern,await expect.element(...).not.toBeInTheDocument(), uses vitest-browser's built-in async retry loop and is deterministic.track_reactivity_lossfires because async functions inside the component (finally { training = false }and thesetTimeoutauto-dismiss) write reactive state afterafterEach(cleanup)destroys the component. This is the canonical "async cleanup race" — the fix is to ensure all in-flight async work settles beforecleanupruns.@Disabledorit.skipshould be introduced as a stopgap — the issue body correctly targets root-cause fixes.Recommendations
query().toBeNull()withawait expect.element(...).not.toBeInTheDocument(). This pattern should be documented in the project's test conventions as the canonical absence assertion for browser tests.pendingFetchbefore the test body exits andawaitthe settled state sofinallycompletes beforecleanupruns.vi.useFakeTimers()+vi.runAllTimers()scoped to the'OcrTrainingCard — success dismiss button'describe block. Callvi.runAllTimers()at the end of the test or in anafterEachbeforecleanup.query().toBeNull()— if this pattern exists elsewhere, fix them in the same PR or open a follow-up issue. One flaky pattern usually means there are siblings.Open Decisions
query().toBeNull()sweep: fix all instances in this PR vs. open a follow-up issue. If there are ≤5 instances, fix them here. If there are more, open a separate issue taggedtestto track the sweep.🎨 Leonie Voss — UX Designer & Accessibility Strategist
Observations
This issue is a test infrastructure fix with no UI or interaction changes. No visual regression or accessibility audit is required for the fix itself.
While reviewing the components during research, two low-priority accessibility observations surfaced (not blockers for this issue):
AnnotationShape.svelte:
aria-label="Block anzeigen"is static — all annotation shapes on a page will have the identical label. Screen reader users navigating by Tab hear "Block anzeigen, button" repeated with no way to distinguish between shapes. A future improvement:aria-label="{m.annotation_view_label({ id: annotation.id })}"or incorporatingblockNumberwhen present.aria-label="Löschen"without context — screen readers announce "Löschen, Schaltfläche" with no indication of what is being deleted. Consideraria-label="{m.annotation_delete_label()}"(e.g., "Annotation löschen").OcrTrainingCard.svelte:
aria-label={m.comp_dismiss()}andh-11 w-11(44px touch target) ✓aria-live="polite"on success message andaria-live="assertive"on error message ✓ — correct priority split.Recommendations
aria-labelimprovements rather than bundling them here.🛠️ Tobias Wendt — DevOps & Platform Engineer
Observations
clientproject (seevite.config.tslines 68–83). No explicitretryortimeoutis configured beyond vitest defaults. This is correct for deterministic tests; the issue is that these specific tests are not yet deterministic.vite.config.tscoverage config measures theserverproject only — browser tests do not gate on coverage thresholds. These failures block CI via test pass/fail only, which is the right gate.Recommendations
--retryto the CI vitest command as a workaround — it masks root causes, inflates CI time, and moves failures from "fail fast" to "fail slow." Fix the tests properly, as this issue targets.(No open decisions from my angle.)
📋 Elicit — Senior Requirements Engineer
Observations
The issue is well-formed for a bug fix: clear Problem statement, identifiable Impact, and measurable Acceptance Criteria. This is above the project average for test issues. A few gaps worth closing before implementation starts:
AC-1 gap — "5 consecutive local runs" is ambiguous:
Local runs have different timing characteristics than CI (CPU load, headless vs. headed, network). A flaky test that passes locally 5× in a row can still fail in CI. The definitive stability signal is CI, not local. Suggest revising to: "Both spec files run green on 3 consecutive CI runs with no code changes."
AC-2 gap — "Root cause documented" — documented where?
The commit message and PR description are the right places for the root cause summary. A short inline comment explaining why
await expect.element()is required (vs..query().toBeNull()) is more durable than PR prose that gets buried. Suggest specifying: "Root cause summarized in the PR description and in a code comment at the fixed assertion sites."Missing AC-3 — regression prevention:
The
query().toBeNull()anti-pattern may exist in other browser-mode spec files. If the fix only touches the two named files, the pattern can reappear in future tests without anyone flagging it. Suggest adding: "All other*.svelte.spec.tsfiles are checked for the samequery().toBeNull()anti-pattern; instances are either fixed in this PR or tracked in a follow-up issue."Recommendations
.query()and async cleanup race) is concrete and actionable. No further elicitation needed before coding begins.🗳️ Decision Queue — Action Required
2 decisions need your input before implementation starts.
Testing Strategy
OcrTrainingCard—vi.useFakeTimers()keeps production code unchanged and requires no API surface change. Injecting adismissDelayprop makes the timing explicit and testable without fake timers, but adds an API detail that has no UX relevance. Felix and Markus both recommend fake timers. (Raised by: Felix, Markus)Scope
query().toBeNull()anti-pattern — Sara asked whether to fix all instances in this PR or open a follow-up. Codebase check finds exactly 4 instances across 2 files: 2 inAnnotationShape.svelte.spec.ts(already in scope) and 2 more insrc/lib/document/annotation/AnnotationLayer.svelte.spec.tslines 110 and 123. Since the total is ≤5 and both files are in the same annotation domain, Sara's own threshold says fix all 4 in this PR. Only needs confirmation if you prefer a separate issue. (Raised by: Sara)✅ Implementation complete — branch
feat/issue-541-flaky-browser-testsAll 6 tasks from the plan are done. Two commits on the branch:
Commit 1 —
ae168831—test(annotation): replace synchronous query().toBeNull() with async not.toBeInTheDocument()Fixed all 4
query().toBeNull()instances across both annotation spec files:AnnotationShape.svelte.spec.tslines 48, 63 — the 2 confirmed flaky testsAnnotationLayer.svelte.spec.tslines 110, 123 — sweep (Sara's ≤5 threshold)Root cause:
.query()is a synchronous DOM snapshot; Svelte defers DOM updates to microtasks, so the element may still be present when the assertion fires.await expect.element(...).not.toBeInTheDocument()uses vitest-browser's async retry loop and is deterministic.Commit 2 —
7ee038fa—test(ocr): fix track_reactivity_loss in OcrTrainingCard specTwo root causes addressed in one file:
In-flight test:
resolveFetch({ ok: false })was the last line of the test. The asyncfinally { training = false }block ran in microtasks afterafterEach(cleanup)destroyed the component. Fix: await the button becoming re-enabled — this waits for thefinallyblock to settle before cleanup runs.Success-dismiss test:
startTraining()schedulessetTimeout(() => { successMessage = null }, 5000)after a successful fetch. The 5-second timer fired aftercleanup()destroyed the component. Fix:vi.useFakeTimers()+vi.runAllTimers()+vi.useRealTimers()scoped to the'OcrTrainingCard — success dismiss button'describe block. The innerafterEachruns before the outerafterEach(cleanup), so the timer fires on the live component.Verification
All 24 tests across the 3 spec files pass consistently. No
query().toBeNull()instances remain anywhere in the test suite (grepconfirmed zero remaining instances).Confirmed: no new regressions introduced (full suite run shows only the pre-existing birpc issue from
PdfViewer.svelte.test.ts— see separate note to the team).