test: fix flaky browser-mode tests in AnnotationShape and OcrTrainingCard specs #541

New Issue

marcel · 2026-05-12T09:35:57+02:00

marcel commented

2026-05-12 09:35:57 +02:00

Problem

Two browser-mode (vitest + Playwright/Chromium) test files intermittently fail in CI and local runs:

src/lib/document/annotation/AnnotationShape.svelte.spec.ts — 2 tests fail with timing/assertion issues
src/lib/ocr/OcrTrainingCard.svelte.spec.ts — 2 tests fail with track_reactivity_loss from svelte/internal/client/reactivity/async.js

The OcrTrainingCard failures reference a Svelte internal: Module.track_reactivity_loss node_modules/svelte/src/internal/client/reactivity/async.js:167 — likely an async reactivity boundary issue in the test setup.

Impact

These flaky tests cause non-deterministic CI failures that are unrelated to code changes under review. They appear alongside legitimate test failures and make it harder to diagnose regressions.

Acceptance criteria

Both spec files run green on 5 consecutive local runs with no code changes
Root cause documented (timing? async reactivity boundary? missing await?)
Fix applied and CI confirmed stable

## Problem Two browser-mode (vitest + Playwright/Chromium) test files intermittently fail in CI and local runs: - `src/lib/document/annotation/AnnotationShape.svelte.spec.ts` — 2 tests fail with timing/assertion issues - `src/lib/ocr/OcrTrainingCard.svelte.spec.ts` — 2 tests fail with `track_reactivity_loss` from `svelte/internal/client/reactivity/async.js` The `OcrTrainingCard` failures reference a Svelte internal: `Module.track_reactivity_loss node_modules/svelte/src/internal/client/reactivity/async.js:167` — likely an async reactivity boundary issue in the test setup. ## Impact These flaky tests cause non-deterministic CI failures that are unrelated to code changes under review. They appear alongside legitimate test failures and make it harder to diagnose regressions. ## Acceptance criteria - Both spec files run green on 5 consecutive local runs with no code changes - Root cause documented (timing? async reactivity boundary? missing `await`?) - Fix applied and CI confirmed stable

marcel added the P2-medium bug test labels 2026-05-12 09:36:03 +02:00

marcel referenced this issue

2026-05-12 09:46:49 +02:00

fix(#535): eliminate vi.mock(pdfjs-dist) birpc teardown race via libLoader injection #536

marcel commented

2026-05-12 10:01:38 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

AnnotationShape.svelte.spec.ts — synchronous null-checks are the timing/assertion culprit:

Lines 47–48 and 62–63 use expect(page.getByTestId(...).query()).toBeNull() — a synchronous DOM snapshot inside an async browser test. Svelte's DOM updates are microtask-deferred; the query can fire before the update settles, making the assertion non-deterministic.
Lines 152–155 and 168–176: dispatchEvent(new KeyboardEvent(...)) is synchronous, but Svelte's onkeydown handler schedules reactive work. The expect(onDeleteRequest).toHaveBeenCalledOnce() assertion immediately follows with no yield — it races against the handler completing.

OcrTrainingCard.svelte.spec.ts — track_reactivity_loss root cause:

In-flight test (lines 94–113): pendingFetch is not resolved until line 112, after afterEach(cleanup) has already destroyed the component. When resolveFetch({ ok: false }) fires, the finally { training = false } block writes to a dead reactive graph → track_reactivity_loss.
Success-dismiss test (lines 77–91): startTraining() schedules setTimeout(() => { successMessage = null }, 5000) inside the component. After afterEach(cleanup) destroys the component and vi.restoreAllMocks() clears stubs, the 5-second timer is still pending. When it fires it writes successMessage = null on a destroyed component → second track_reactivity_loss path.

Recommendations

AnnotationShape fixes:

Replace all expect(page.getByTestId(...).query()).toBeNull() with await expect.element(page.getByTestId(...)).not.toBeInTheDocument() — this uses vitest-browser's async polling and is race-free.
For keyboard dispatch, add a await vi.waitFor(() => expect(onDeleteRequest).toHaveBeenCalledOnce()) after dispatchEvent(...), or switch to await page.keyboard.press('Delete') on the focused element.

OcrTrainingCard fixes:

In-flight test: resolve pendingFetch before the test body exits and await the settled state so the finally block completes before cleanup runs:

resolveFetch({ ok: false });
await expect.element(page.getByRole('button', { name: /Training starten/i })).not.toBeDisabled();

For the setTimeout auto-dismiss: use vi.useFakeTimers() in the affected describe block and call vi.runAllTimers() before cleanup fires. Add afterEach(() => vi.useRealTimers()) to restore. This is the standard pattern for components with internal timers — no production code change needed.

Open Decisions

Fake timers vs. prop injection for the 5-second auto-dismiss: vi.useFakeTimers() keeps production code unchanged and is the simplest fix. Injecting a dismissDelay prop is more testable by design but adds unnecessary API surface to a UI detail. Recommend fake timers.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer ### Observations **AnnotationShape.svelte.spec.ts — synchronous null-checks are the timing/assertion culprit:** - Lines 47–48 and 62–63 use `expect(page.getByTestId(...).query()).toBeNull()` — a synchronous DOM snapshot inside an async browser test. Svelte's DOM updates are microtask-deferred; the query can fire before the update settles, making the assertion non-deterministic. - Lines 152–155 and 168–176: `dispatchEvent(new KeyboardEvent(...))` is synchronous, but Svelte's `onkeydown` handler schedules reactive work. The `expect(onDeleteRequest).toHaveBeenCalledOnce()` assertion immediately follows with no yield — it races against the handler completing. **OcrTrainingCard.svelte.spec.ts — `track_reactivity_loss` root cause:** - In-flight test (lines 94–113): `pendingFetch` is not resolved until line 112, *after* `afterEach(cleanup)` has already destroyed the component. When `resolveFetch({ ok: false })` fires, the `finally { training = false }` block writes to a dead reactive graph → `track_reactivity_loss`. - Success-dismiss test (lines 77–91): `startTraining()` schedules `setTimeout(() => { successMessage = null }, 5000)` inside the component. After `afterEach(cleanup)` destroys the component and `vi.restoreAllMocks()` clears stubs, the 5-second timer is still pending. When it fires it writes `successMessage = null` on a destroyed component → second `track_reactivity_loss` path. ### Recommendations **AnnotationShape fixes:** - Replace all `expect(page.getByTestId(...).query()).toBeNull()` with `await expect.element(page.getByTestId(...)).not.toBeInTheDocument()` — this uses vitest-browser's async polling and is race-free. - For keyboard dispatch, add a `await vi.waitFor(() => expect(onDeleteRequest).toHaveBeenCalledOnce())` after `dispatchEvent(...)`, or switch to `await page.keyboard.press('Delete')` on the focused element. **OcrTrainingCard fixes:** - In-flight test: resolve `pendingFetch` *before* the test body exits and `await` the settled state so the `finally` block completes before `cleanup` runs: ```typescript resolveFetch({ ok: false }); await expect.element(page.getByRole('button', { name: /Training starten/i })).not.toBeDisabled(); ``` - For the `setTimeout` auto-dismiss: use `vi.useFakeTimers()` in the affected `describe` block and call `vi.runAllTimers()` before `cleanup` fires. Add `afterEach(() => vi.useRealTimers())` to restore. This is the standard pattern for components with internal timers — no production code change needed. ### Open Decisions - **Fake timers vs. prop injection for the 5-second auto-dismiss**: `vi.useFakeTimers()` keeps production code unchanged and is the simplest fix. Injecting a `dismissDelay` prop is more testable by design but adds unnecessary API surface to a UI detail. Recommend fake timers.

marcel commented

2026-05-12 10:01:46 +02:00

🏗️ Markus Keller — Application Architect

Observations

The track_reactivity_loss failures in OcrTrainingCard have an architectural root: startTraining() uses setTimeout(() => { successMessage = null }, 5000) to auto-dismiss a success message. This creates an implicit side effect that outlives the test lifecycle — the component schedules work against wall-clock time with no way for the caller or test to control or observe that schedule.
This is a well-known testability trap: components that mutate their own state via uncontrolled async timers couple their behavior to real time. The problem shows up in tests first, but it also means production behavior (the dismiss timing) cannot be verified without a 5-second wait.
No layering or module boundary concerns here — both AnnotationShape and OcrTrainingCard are pure UI components with no cross-domain coupling. The issue is fully contained in the test layer.
The test setup is otherwise well-structured: afterEach(cleanup), vi.restoreAllMocks(), factory functions for props — no architectural cleanup needed beyond the timing fixes.

Recommendations

vi.useFakeTimers() scoped to the success-dismiss describe block is the correct adapter — it isolates the component from wall-clock time without changing production code. This is not a design smell requiring refactoring.
If the auto-dismiss pattern grows (multiple cards, multiple timers), consider a shared useAutoDismiss(delay) hook that can be stubbed at a higher level. Not needed now.
No ADR required — this is a test isolation correction, not a structural decision. No architectural diagrams need updating.

(No open decisions from my angle.)

## 🏗️ Markus Keller — Application Architect ### Observations - The `track_reactivity_loss` failures in `OcrTrainingCard` have an architectural root: `startTraining()` uses `setTimeout(() => { successMessage = null }, 5000)` to auto-dismiss a success message. This creates an implicit side effect that outlives the test lifecycle — the component schedules work against wall-clock time with no way for the caller or test to control or observe that schedule. - This is a well-known testability trap: components that mutate their own state via uncontrolled async timers couple their behavior to real time. The problem shows up in tests first, but it also means production behavior (the dismiss timing) cannot be verified without a 5-second wait. - No layering or module boundary concerns here — both `AnnotationShape` and `OcrTrainingCard` are pure UI components with no cross-domain coupling. The issue is fully contained in the test layer. - The test setup is otherwise well-structured: `afterEach(cleanup)`, `vi.restoreAllMocks()`, factory functions for props — no architectural cleanup needed beyond the timing fixes. ### Recommendations - `vi.useFakeTimers()` scoped to the success-dismiss `describe` block is the correct adapter — it isolates the component from wall-clock time without changing production code. This is not a design smell requiring refactoring. - If the auto-dismiss pattern grows (multiple cards, multiple timers), consider a shared `useAutoDismiss(delay)` hook that can be stubbed at a higher level. Not needed now. - No ADR required — this is a test isolation correction, not a structural decision. No architectural diagrams need updating. _(No open decisions from my angle.)_

marcel commented

2026-05-12 10:01:51 +02:00

🔒 Nora Steiner — Application Security Engineer

Observations

This is a test-only fix — no production security surface is affected.
OcrTrainingCard.svelte.spec.ts already has afterEach(() => vi.restoreAllMocks()) — correct. vi.stubGlobal('fetch') is properly cleaned up and will not leak the stub into other tests. No change needed here.
AnnotationShape.svelte.spec.ts does not stub any globals — no cleanup concern.
Reviewed AnnotationShape.svelte during research: annotation.color is used in inline styles via hexToRgba(), which parses hex digits with parseInt and constructs an rgba(...) string. No user-controlled string is injected into HTML directly — no XSS path.
The fetch('/api/ocr/train', { method: 'POST' }) in OcrTrainingCard is correctly stubbed in tests. In production it hits an authenticated endpoint — @RequirePermission enforcement is at the backend layer and is not in scope for this fix.

Recommendations

No security changes required. The timing fixes described by Felix and Sara are sufficient. Proceed.

## 🔒 Nora Steiner — Application Security Engineer ### Observations - This is a test-only fix — no production security surface is affected. - `OcrTrainingCard.svelte.spec.ts` already has `afterEach(() => vi.restoreAllMocks())` — correct. `vi.stubGlobal('fetch')` is properly cleaned up and will not leak the stub into other tests. No change needed here. - `AnnotationShape.svelte.spec.ts` does not stub any globals — no cleanup concern. - Reviewed `AnnotationShape.svelte` during research: `annotation.color` is used in inline styles via `hexToRgba()`, which parses hex digits with `parseInt` and constructs an `rgba(...)` string. No user-controlled string is injected into HTML directly — no XSS path. - The `fetch('/api/ocr/train', { method: 'POST' })` in `OcrTrainingCard` is correctly stubbed in tests. In production it hits an authenticated endpoint — `@RequirePermission` enforcement is at the backend layer and is not in scope for this fix. ### Recommendations No security changes required. The timing fixes described by Felix and Sara are sufficient. Proceed.

marcel commented

2026-05-12 10:02:03 +02:00

🧪 Sara Holt — Senior QA Engineer

Observations

Four non-deterministic failures across two files is a quality gate problem. Flaky tests cause teams to re-run CI speculatively instead of diagnosing confidently. The right response is exactly what this issue does: fix the root cause, not add retries.
Root cause in AnnotationShape: expect(page.getByTestId(...).query()).toBeNull() (lines 47–48, 62–63) is the anti-pattern. In vitest-browser, .query() is a synchronous DOM snapshot — Svelte's microtask-deferred DOM updates may not have settled yet. The correct pattern, await expect.element(...).not.toBeInTheDocument(), uses vitest-browser's built-in async retry loop and is deterministic.
Root cause in OcrTrainingCard: track_reactivity_loss fires because async functions inside the component (finally { training = false } and the setTimeout auto-dismiss) write reactive state after afterEach(cleanup) destroys the component. This is the canonical "async cleanup race" — the fix is to ensure all in-flight async work settles before cleanup runs.
The acceptance criterion "5 consecutive local runs" is achievable but local runs have different timing characteristics than CI. The real stability signal is consistent CI runs.
No @Disabled or it.skip should be introduced as a stopgap — the issue body correctly targets root-cause fixes.

Recommendations

AnnotationShape: replace all query().toBeNull() with await expect.element(...).not.toBeInTheDocument(). This pattern should be documented in the project's test conventions as the canonical absence assertion for browser tests.
OcrTrainingCard in-flight test: resolve pendingFetch before the test body exits and await the settled state so finally completes before cleanup runs.
OcrTrainingCard success test: use vi.useFakeTimers() + vi.runAllTimers() scoped to the 'OcrTrainingCard — success dismiss button' describe block. Call vi.runAllTimers() at the end of the test or in an afterEach before cleanup.
Scan the rest of the browser test suite for other instances of query().toBeNull() — if this pattern exists elsewhere, fix them in the same PR or open a follow-up issue. One flaky pattern usually means there are siblings.
CI gate: run CI at least twice with no code changes after the fix to confirm stability. State this in the PR description.

Open Decisions

Scope of the query().toBeNull() sweep: fix all instances in this PR vs. open a follow-up issue. If there are ≤5 instances, fix them here. If there are more, open a separate issue tagged test to track the sweep.

## 🧪 Sara Holt — Senior QA Engineer ### Observations - **Four non-deterministic failures across two files is a quality gate problem.** Flaky tests cause teams to re-run CI speculatively instead of diagnosing confidently. The right response is exactly what this issue does: fix the root cause, not add retries. - **Root cause in AnnotationShape**: `expect(page.getByTestId(...).query()).toBeNull()` (lines 47–48, 62–63) is the anti-pattern. In vitest-browser, `.query()` is a synchronous DOM snapshot — Svelte's microtask-deferred DOM updates may not have settled yet. The correct pattern, `await expect.element(...).not.toBeInTheDocument()`, uses vitest-browser's built-in async retry loop and is deterministic. - **Root cause in OcrTrainingCard**: `track_reactivity_loss` fires because async functions inside the component (`finally { training = false }` and the `setTimeout` auto-dismiss) write reactive state after `afterEach(cleanup)` destroys the component. This is the canonical "async cleanup race" — the fix is to ensure all in-flight async work settles before `cleanup` runs. - The acceptance criterion "5 consecutive local runs" is achievable but local runs have different timing characteristics than CI. The real stability signal is consistent CI runs. - No `@Disabled` or `it.skip` should be introduced as a stopgap — the issue body correctly targets root-cause fixes. ### Recommendations - **AnnotationShape**: replace all `query().toBeNull()` with `await expect.element(...).not.toBeInTheDocument()`. This pattern should be documented in the project's test conventions as the canonical absence assertion for browser tests. - **OcrTrainingCard in-flight test**: resolve `pendingFetch` before the test body exits and `await` the settled state so `finally` completes before `cleanup` runs. - **OcrTrainingCard success test**: use `vi.useFakeTimers()` + `vi.runAllTimers()` scoped to the `'OcrTrainingCard — success dismiss button'` describe block. Call `vi.runAllTimers()` at the end of the test or in an `afterEach` before `cleanup`. - **Scan the rest of the browser test suite** for other instances of `query().toBeNull()` — if this pattern exists elsewhere, fix them in the same PR or open a follow-up issue. One flaky pattern usually means there are siblings. - **CI gate**: run CI at least twice with no code changes after the fix to confirm stability. State this in the PR description. ### Open Decisions - **Scope of the `query().toBeNull()` sweep**: fix all instances in this PR vs. open a follow-up issue. If there are ≤5 instances, fix them here. If there are more, open a separate issue tagged `test` to track the sweep.

marcel commented

2026-05-12 10:02:09 +02:00

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Observations

This issue is a test infrastructure fix with no UI or interaction changes. No visual regression or accessibility audit is required for the fix itself.

While reviewing the components during research, two low-priority accessibility observations surfaced (not blockers for this issue):

AnnotationShape.svelte:

aria-label="Block anzeigen" is static — all annotation shapes on a page will have the identical label. Screen reader users navigating by Tab hear "Block anzeigen, button" repeated with no way to distinguish between shapes. A future improvement: aria-label="{m.annotation_view_label({ id: annotation.id })}" or incorporating blockNumber when present.
The delete button uses aria-label="Löschen" without context — screen readers announce "Löschen, Schaltfläche" with no indication of what is being deleted. Consider aria-label="{m.annotation_delete_label()}" (e.g., "Annotation löschen").

OcrTrainingCard.svelte:

Dismiss button correctly has aria-label={m.comp_dismiss()} and h-11 w-11 (44px touch target) ✓
aria-live="polite" on success message and aria-live="assertive" on error message ✓ — correct priority split.

Recommendations

Proceed with the test fix. The accessibility observations above are Low priority findings — suggest opening a separate follow-up issue for the aria-label improvements rather than bundling them here.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist ### Observations This issue is a test infrastructure fix with no UI or interaction changes. No visual regression or accessibility audit is required for the fix itself. While reviewing the components during research, two low-priority accessibility observations surfaced (not blockers for this issue): **AnnotationShape.svelte:** - `aria-label="Block anzeigen"` is static — all annotation shapes on a page will have the identical label. Screen reader users navigating by Tab hear "Block anzeigen, button" repeated with no way to distinguish between shapes. A future improvement: `aria-label="{m.annotation_view_label({ id: annotation.id })}"` or incorporating `blockNumber` when present. - The delete button uses `aria-label="Löschen"` without context — screen readers announce "Löschen, Schaltfläche" with no indication of *what* is being deleted. Consider `aria-label="{m.annotation_delete_label()}"` (e.g., "Annotation löschen"). **OcrTrainingCard.svelte:** - Dismiss button correctly has `aria-label={m.comp_dismiss()}` and `h-11 w-11` (44px touch target) ✓ - `aria-live="polite"` on success message and `aria-live="assertive"` on error message ✓ — correct priority split. ### Recommendations - Proceed with the test fix. The accessibility observations above are **Low** priority findings — suggest opening a separate follow-up issue for the `aria-label` improvements rather than bundling them here.

marcel commented

2026-05-12 10:02:18 +02:00

🛠️ Tobias Wendt — DevOps & Platform Engineer

Observations

Flaky browser-mode tests are a CI reliability problem first. Non-deterministic failures erode pipeline trust — when CI goes red, the team cannot distinguish a real regression from a timing race. That diagnostic overhead compounds over time.
The Vitest browser-mode project runs Chromium via Playwright in the client project (see vite.config.ts lines 68–83). No explicit retry or timeout is configured beyond vitest defaults. This is correct for deterministic tests; the issue is that these specific tests are not yet deterministic.
Current vite.config.ts coverage config measures the server project only — browser tests do not gate on coverage thresholds. These failures block CI via test pass/fail only, which is the right gate.
The acceptance criterion "CI confirmed stable" is the correct bar. The practical check is 2–3 consecutive green CI runs with no code changes after the fix is merged.

Recommendations

Do not add --retry to the CI vitest command as a workaround — it masks root causes, inflates CI time, and moves failures from "fail fast" to "fail slow." Fix the tests properly, as this issue targets.
Post-fix verification: trigger CI twice with an empty commit or two manual re-runs after the fix merges. Confirm both green before closing the issue.
No infrastructure changes required. The existing Gitea CI pipeline running vitest in browser mode is correctly set up. No runner config, Docker service, or Compose change is needed for this fix.

(No open decisions from my angle.)

## 🛠️ Tobias Wendt — DevOps & Platform Engineer ### Observations - Flaky browser-mode tests are a CI reliability problem first. Non-deterministic failures erode pipeline trust — when CI goes red, the team cannot distinguish a real regression from a timing race. That diagnostic overhead compounds over time. - The Vitest browser-mode project runs Chromium via Playwright in the `client` project (see `vite.config.ts` lines 68–83). No explicit `retry` or `timeout` is configured beyond vitest defaults. This is correct for deterministic tests; the issue is that these specific tests are not yet deterministic. - Current `vite.config.ts` coverage config measures the `server` project only — browser tests do not gate on coverage thresholds. These failures block CI via test pass/fail only, which is the right gate. - The acceptance criterion "CI confirmed stable" is the correct bar. The practical check is 2–3 consecutive green CI runs with no code changes after the fix is merged. ### Recommendations - **Do not add `--retry` to the CI vitest command as a workaround** — it masks root causes, inflates CI time, and moves failures from "fail fast" to "fail slow." Fix the tests properly, as this issue targets. - **Post-fix verification**: trigger CI twice with an empty commit or two manual re-runs after the fix merges. Confirm both green before closing the issue. - **No infrastructure changes required.** The existing Gitea CI pipeline running vitest in browser mode is correctly set up. No runner config, Docker service, or Compose change is needed for this fix. _(No open decisions from my angle.)_

marcel commented

2026-05-12 10:02:27 +02:00

📋 Elicit — Senior Requirements Engineer

Observations

The issue is well-formed for a bug fix: clear Problem statement, identifiable Impact, and measurable Acceptance Criteria. This is above the project average for test issues. A few gaps worth closing before implementation starts:

AC-1 gap — "5 consecutive local runs" is ambiguous:
Local runs have different timing characteristics than CI (CPU load, headless vs. headed, network). A flaky test that passes locally 5× in a row can still fail in CI. The definitive stability signal is CI, not local. Suggest revising to: "Both spec files run green on 3 consecutive CI runs with no code changes."

AC-2 gap — "Root cause documented" — documented where?
The commit message and PR description are the right places for the root cause summary. A short inline comment explaining why await expect.element() is required (vs. .query().toBeNull()) is more durable than PR prose that gets buried. Suggest specifying: "Root cause summarized in the PR description and in a code comment at the fixed assertion sites."

Missing AC-3 — regression prevention:
The query().toBeNull() anti-pattern may exist in other browser-mode spec files. If the fix only touches the two named files, the pattern can reappear in future tests without anyone flagging it. Suggest adding: "All other *.svelte.spec.ts files are checked for the same query().toBeNull() anti-pattern; instances are either fixed in this PR or tracked in a follow-up issue."

Recommendations

Update the issue body with the refined ACs above before implementation starts — this prevents the PR from being closed on "passes locally" evidence alone.
The root cause identified by Felix (synchronous .query() and async cleanup race) is concrete and actionable. No further elicitation needed before coding begins.

## 📋 Elicit — Senior Requirements Engineer ### Observations The issue is well-formed for a bug fix: clear **Problem** statement, identifiable **Impact**, and measurable **Acceptance Criteria**. This is above the project average for test issues. A few gaps worth closing before implementation starts: **AC-1 gap — "5 consecutive local runs" is ambiguous:** Local runs have different timing characteristics than CI (CPU load, headless vs. headed, network). A flaky test that passes locally 5× in a row can still fail in CI. The definitive stability signal is CI, not local. Suggest revising to: *"Both spec files run green on 3 consecutive CI runs with no code changes."* **AC-2 gap — "Root cause documented" — documented where?** The commit message and PR description are the right places for the root cause summary. A short inline comment explaining *why* `await expect.element()` is required (vs. `.query().toBeNull()`) is more durable than PR prose that gets buried. Suggest specifying: *"Root cause summarized in the PR description and in a code comment at the fixed assertion sites."* **Missing AC-3 — regression prevention:** The `query().toBeNull()` anti-pattern may exist in other browser-mode spec files. If the fix only touches the two named files, the pattern can reappear in future tests without anyone flagging it. Suggest adding: *"All other `*.svelte.spec.ts` files are checked for the same `query().toBeNull()` anti-pattern; instances are either fixed in this PR or tracked in a follow-up issue."* ### Recommendations - Update the issue body with the refined ACs above before implementation starts — this prevents the PR from being closed on "passes locally" evidence alone. - The root cause identified by Felix (synchronous `.query()` and async cleanup race) is concrete and actionable. No further elicitation needed before coding begins.

marcel commented

2026-05-12 10:03:00 +02:00

🗳️ Decision Queue — Action Required

2 decisions need your input before implementation starts.

Testing Strategy

Fake timers vs. prop injection for the 5-second auto-dismiss in OcrTrainingCard — vi.useFakeTimers() keeps production code unchanged and requires no API surface change. Injecting a dismissDelay prop makes the timing explicit and testable without fake timers, but adds an API detail that has no UX relevance. Felix and Markus both recommend fake timers. (Raised by: Felix, Markus)

Scope

Sweep scope for query().toBeNull() anti-pattern — Sara asked whether to fix all instances in this PR or open a follow-up. Codebase check finds exactly 4 instances across 2 files: 2 in AnnotationShape.svelte.spec.ts (already in scope) and 2 more in src/lib/document/annotation/AnnotationLayer.svelte.spec.ts lines 110 and 123. Since the total is ≤5 and both files are in the same annotation domain, Sara's own threshold says fix all 4 in this PR. Only needs confirmation if you prefer a separate issue. (Raised by: Sara)

## 🗳️ Decision Queue — Action Required _2 decisions need your input before implementation starts._ ### Testing Strategy - **Fake timers vs. prop injection for the 5-second auto-dismiss in `OcrTrainingCard`** — `vi.useFakeTimers()` keeps production code unchanged and requires no API surface change. Injecting a `dismissDelay` prop makes the timing explicit and testable without fake timers, but adds an API detail that has no UX relevance. Felix and Markus both recommend fake timers. _(Raised by: Felix, Markus)_ ### Scope - **Sweep scope for `query().toBeNull()` anti-pattern** — Sara asked whether to fix all instances in this PR or open a follow-up. Codebase check finds exactly 4 instances across 2 files: 2 in `AnnotationShape.svelte.spec.ts` (already in scope) and 2 more in `src/lib/document/annotation/AnnotationLayer.svelte.spec.ts` lines 110 and 123. Since the total is ≤5 and both files are in the same annotation domain, Sara's own threshold says **fix all 4 in this PR**. Only needs confirmation if you prefer a separate issue. _(Raised by: Sara)_

marcel commented

2026-05-12 10:30:18 +02:00

✅ Implementation complete — branch `feat/issue-541-flaky-browser-tests`

All 6 tasks from the plan are done. Two commits on the branch:

Commit 1 — `ae168831` — `test(annotation): replace synchronous query().toBeNull() with async not.toBeInTheDocument()`

Fixed all 4 query().toBeNull() instances across both annotation spec files:

AnnotationShape.svelte.spec.ts lines 48, 63 — the 2 confirmed flaky tests
AnnotationLayer.svelte.spec.ts lines 110, 123 — sweep (Sara's ≤5 threshold)

Root cause: .query() is a synchronous DOM snapshot; Svelte defers DOM updates to microtasks, so the element may still be present when the assertion fires. await expect.element(...).not.toBeInTheDocument() uses vitest-browser's async retry loop and is deterministic.

Commit 2 — `7ee038fa` — `test(ocr): fix track_reactivity_loss in OcrTrainingCard spec`

Two root causes addressed in one file:

In-flight test: resolveFetch({ ok: false }) was the last line of the test. The async finally { training = false } block ran in microtasks after afterEach(cleanup) destroyed the component. Fix: await the button becoming re-enabled — this waits for the finally block to settle before cleanup runs.

Success-dismiss test: startTraining() schedules setTimeout(() => { successMessage = null }, 5000) after a successful fetch. The 5-second timer fired after cleanup() destroyed the component. Fix: vi.useFakeTimers() + vi.runAllTimers() + vi.useRealTimers() scoped to the 'OcrTrainingCard — success dismiss button' describe block. The inner afterEach runs before the outer afterEach(cleanup), so the timer fires on the live component.

Verification

All 24 tests across the 3 spec files pass consistently. No query().toBeNull() instances remain anywhere in the test suite (grep confirmed zero remaining instances).

Confirmed: no new regressions introduced (full suite run shows only the pre-existing birpc issue from PdfViewer.svelte.test.ts — see separate note to the team).

## ✅ Implementation complete — branch `feat/issue-541-flaky-browser-tests` All 6 tasks from the plan are done. Two commits on the branch: ### Commit 1 — `ae168831` — `test(annotation): replace synchronous query().toBeNull() with async not.toBeInTheDocument()` Fixed all 4 `query().toBeNull()` instances across both annotation spec files: - `AnnotationShape.svelte.spec.ts` lines 48, 63 — the 2 confirmed flaky tests - `AnnotationLayer.svelte.spec.ts` lines 110, 123 — sweep (Sara's ≤5 threshold) Root cause: `.query()` is a synchronous DOM snapshot; Svelte defers DOM updates to microtasks, so the element may still be present when the assertion fires. `await expect.element(...).not.toBeInTheDocument()` uses vitest-browser's async retry loop and is deterministic. ### Commit 2 — `7ee038fa` — `test(ocr): fix track_reactivity_loss in OcrTrainingCard spec` Two root causes addressed in one file: **In-flight test:** `resolveFetch({ ok: false })` was the last line of the test. The async `finally { training = false }` block ran in microtasks after `afterEach(cleanup)` destroyed the component. Fix: await the button becoming re-enabled — this waits for the `finally` block to settle before cleanup runs. **Success-dismiss test:** `startTraining()` schedules `setTimeout(() => { successMessage = null }, 5000)` after a successful fetch. The 5-second timer fired after `cleanup()` destroyed the component. Fix: `vi.useFakeTimers()` + `vi.runAllTimers()` + `vi.useRealTimers()` scoped to the `'OcrTrainingCard — success dismiss button'` describe block. The inner `afterEach` runs before the outer `afterEach(cleanup)`, so the timer fires on the live component. ### Verification All 24 tests across the 3 spec files pass consistently. No `query().toBeNull()` instances remain anywhere in the test suite (`grep` confirmed zero remaining instances). **Confirmed**: no new regressions introduced (full suite run shows only the pre-existing birpc issue from `PdfViewer.svelte.test.ts` — see separate note to the team).

marcel referenced a pull request that will close this issue

2026-05-12 10:31:49 +02:00

test: fix flaky browser-mode tests in AnnotationShape and OcrTrainingCard specs #547

marcel referenced a pull request that will close this issue

2026-05-12 11:13:00 +02:00

test: fix flaky browser-mode tests in AnnotationShape and OcrTrainingCard specs #547

marcel closed this issue

2026-05-12 11:14:22 +02:00

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: marcel/familienarchiv#541

test: fix flaky browser-mode tests in AnnotationShape and OcrTrainingCard specs #541

Problem

Impact

Acceptance criteria

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

Recommendations

Open Decisions

🏗️ Markus Keller — Application Architect

Observations

Recommendations

🔒 Nora Steiner — Application Security Engineer

Observations

Recommendations

🧪 Sara Holt — Senior QA Engineer

Observations

Recommendations

Open Decisions

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Observations

Recommendations

🛠️ Tobias Wendt — DevOps & Platform Engineer

Observations

Recommendations

📋 Elicit — Senior Requirements Engineer

Observations

Recommendations

🗳️ Decision Queue — Action Required

Testing Strategy

Scope

✅ Implementation complete — branch feat/issue-541-flaky-browser-tests

Commit 1 — ae168831 — test(annotation): replace synchronous query().toBeNull() with async not.toBeInTheDocument()

Commit 2 — 7ee038fa — test(ocr): fix track_reactivity_loss in OcrTrainingCard spec

Verification

✅ Implementation complete — branch `feat/issue-541-flaky-browser-tests`

Commit 1 — `ae168831` — `test(annotation): replace synchronous query().toBeNull() with async not.toBeInTheDocument()`

Commit 2 — `7ee038fa` — `test(ocr): fix track_reactivity_loss in OcrTrainingCard spec`