Unit & Component Tests job exits 1 — birpc teardown race resurfaces from async vi.mock factory with dynamic import #553
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
TL;DR
The
Unit & Component Testsjob is red again with the same[birpc] rpc is closed, cannot call "resolveManualMock"unhandled rejection that #535 was meant to bury. #535 and its follow-up #546 killed one trigger (vi.mock('pdfjs-dist', …)+ dynamic import inusePdfRenderer.svelte.ts). They never fixed the underlying race — they just removed the most visible victim. A second instance of the same pattern was hiding in plain sight, and it has now started to bite.Smoking gun:
frontend/src/lib/document/EnrichmentBlock.svelte.spec.ts:10-13:This is the only file in the whole browser test suite (
src/**/*.svelte.{test,spec}.ts) with anasync () => …await import(…)vi.mockfactory. ADR-012 listed$app/storesas a "safe residual exception" because the consumer import is static — but that's not the dimension that matters. What matters is whether the factory body can produce a late birpc roundtrip, and this one can.What the user sees
Every recent push, on
feat/issue-545-notification-dropdown-iframe-fixand after merge ontomain, theUnit & Component Testsjob turns red while every individual test reports green. The failure is detected by the CI guard added in538adb43(Assert no birpc teardown race in coverage run).Failing runs:
feat/issue-545-notification-dropdown-iframe-fixadmin/tags/[id]/page.svelte.test.tsmainafter merge ofd21ba8feChronikFuerDichBox.svelte.spec.ts/geschichten/[id]/page.svelte.test.tsThe "last test before the crash" changes between runs. That alone is the giveaway: this is a race, not a bad test.
Failing job — exact symptom (run 1596, job 4696)
This is identical to the trace in #535 — same line numbers, same call chain.
Where the trace actually lands
node_modules/@vitest/browser-playwright/dist/index.js:977:The route handler is attached to the Playwright
BrowserContext, not scoped per test/iframe. Any in-flight HTTP request for a manually-mocked module that lands here after vitest closed the worker's birpc channel triggers the unhandled rejection. There is noif (rpc.closed) return route.continue()guard.The CI guard in
538adb43documents this — "birpc guard covers coverage run only" — so we already know the failure is conditioned on the client coverage config (vitest.client-coverage.config.ts, istanbul). Istanbul instrumentation widens the window between "last test green" and "worker fully shut down", which is exactly when the race lives.Why #535 and #546 did not close the door
#535 /
11547645(PdfViewer libLoader) + #546 (test re-introduction guard) + the ESLint rule + the CI grep guard collectively removedvi.mock('pdfjs-dist', …)as a source of late birpc calls.pdfjs-distwas being lazy-imported fromonMountinusePdfRenderer.svelte.ts, which fit the "dynamic import after teardown" pattern perfectly. That fix is real and good.But the mechanism ADR-012 names — any
ManualMockedModulefactory whose resolution depends on RPC at unpredictable timing — has another instance:The ADR's safety argument for
$app/stores:The premise — "factories are served synchronously" — is violated by this factory. The factory itself awaits a dynamic import. The "consumer is static" property does not protect the factory body from being driven through the same RPC channel during teardown.
Sanity check:
grep -rln "vi\.mock.*async.*=>"infrontend/src/**/*.svelte.{test,spec}.tsreturns exactly one file —EnrichmentBlock.svelte.spec.ts. So this is a single, addressable hot spot.Secondary suspect: SvelteKit hover prefetch
frontend/src/app.html:14:Several of the tests that finish right before the crash (
ChronikFuerDichBox,EnrichmentBlock,admin/tags/[id]/page,geschichten/[id]/page) render real<a href="/...">links that the SvelteKit router auto-prefetches. The prefetch fires a realfetchfor the target route's loader chunks; those chunks go through the same Playwright route handler. Even whenEscorcleanup()happens, an in-flight prefetch can still arrive at the handler after the iframe is torn down.This is unverified from the log alone, but it explains the cluster of suspect "last tests" being multi-link cards/dashboards.
What is not the cause (do not chase these)
TypeError: Cannot read properties of undefined (reading 'wrapDynamicImport')lines in the log. SvelteKit dev runtime stderr noise fromget_hooks (.svelte-kit/generated/server/internal.js:37); same as #535. Cosmetic.notifications.svelte.spec.ts. Intentional negative-path log line in passing tests.Caused by:line names.actions/upload-artifact@v4GHESNotSupportedError on the same job is unrelated and already-known; out of scope.Repro plan
frontend/:vi.mock('$app/stores', …)block inEnrichmentBlock.svelte.spec.ts(the$navigating-skeleton tests will fail their assertions — that's fine for this experiment) and run the coverage config 10× again. If birpc errors disappear or drop dramatically, the EnrichmentBlock factory is confirmed as the (current) dominant trigger.Acceptance criteria
EnrichmentBlock.svelte.spec.tscontains novi.mock(…, async () => …)factory withawait import(…)in the body.vi.mock(*, async () => …)factory anywhere infrontend/src/**/*.svelte.{test,spec}.tsperforms a dynamic import in its body.await import(…).vi.mock('pdfjs-dist', …)is generalised to flag anyvi.mock(*, async () => …)with a dynamic import in the body, scoped to**/*.{spec,test}.ts.Unit & Component Testsjob is green on at least 20 consecutive CI runs of an unmodified branch.npx vitest run -c vitest.client-coverage.config.ts --coverageis green on 10 consecutive runs.[birpc] rpc is closed, cannot call "resolveManualMock"lines across all of the above.Candidate fix paths (cheapest → most invasive)
1. Rewrite
EnrichmentBlock.svelte.spec.tsto remove the async factory (RECOMMENDED)Two viable shapes. Pick the one that matches the existing house style for this domain.
1a —
vi.hoisted(idiomatic vitest, ~5 line change):The factory becomes synchronous; nothing in it awaits anything. No
ManualMockedModuleround-trip is triggered after worker teardown because the factory body is a plain object literal.1b — prop injection (ADR-012's "test-host pattern"):
Add a
navigating?: Readable<unknown | null>prop toEnrichmentBlock.svelte(default to the real$navigatingimport). Tests pass a localwritable()and don't mock$app/storesat all. Heavier but matches the pattern that #535 / #546 chose.2. Tighten ADR-012 and the ESLint rule
**/*.svelte.{test,spec}.tsmust be synchronous and contain noawait import(…).eslint.config.jsno-restricted-syntaxfrom the pdfjs-dist-specific selector to anyCallExpression[callee.object.name='vi'][callee.property.name='mock']whose second argument isArrowFunctionExpression[async=true]AND contains anAwaitExpression > ImportExpressionanywhere in its body. Same message, pointing at ADR-012.3. Turn off SvelteKit link prefetch during tests
Add
data-sveltekit-preload-data="off"either to a global test-host wrapper or programmatically in test setup (document.body.dataset.sveltekitPreloadData = 'off'). Cheap, eliminates the secondary trigger named above. Has no impact on production behaviour because tests don't navigate.4. Workaround at the vitest level (escape hatch — only if 1–3 do not stick)
In
vitest.client-coverage.config.ts, settest.fileParallelism: falseor pinpool: 'forks'withpoolOptions.forks.singleFork: true. Serialises iframe teardown, but multiplies the coverage run wall-clock time by ~2-3×. Not recommended unless we have evidence 1–3 are insufficient.5. Upstream patch
The route handler at
@vitest/browser-playwright/dist/index.js:977should guard with something like:Worth filing upstream — this is a genuine library robustness gap, not just our usage. But not on this issue's critical path.
References
frontend/node_modules/@vitest/browser-playwright/dist/index.js:977frontend/node_modules/@vitest/mocker/dist/chunk-registry.js:161frontend/node_modules/@vitest/mocker/dist/chunk-registry.js:189🏛️ Markus Keller — Senior Application Architect
Observations
EnrichmentBlock.svelte.spec.ts:10-13is statically imported and still produces the race, because the factory itself awaits a dynamic import. One counter-example invalidates the safety claim.$app/stateas the modern reactive idiom. Agrepfor production imports finds 13+ files usingfrom '$app/state'and only two still onfrom '$app/stores':routes/admin/tags/[id]/+page.svelteandlib/document/EnrichmentBlock.svelte. EnrichmentBlock is an outlier from the established pattern, not a baseline.eslint.config.js:76-87→ CI grep at.gitea/workflows/ci.yml:39-45→ CI birpc assert at.gitea/workflows/ci.yml:58-66) is the right shape. It just needs to encode the correct invariant.Recommendations
**/*.svelte.{test,spec}.tsmust be synchronous — noawait, noimport()in the body. The mocked module's import semantics elsewhere are irrelevant; the factory is the failure surface.EnrichmentBlock.sveltefrom$app/stores.navigating→$app/state.navigating, matching the pattern inroutes/aktivitaeten/+page.svelte:117(navigating.type) androutes/documents/+page.svelte:261(navigating.to). This closes the deprecation, removes the mock surface entirely, and aligns one of the last two stragglers with the codebase convention.await import(…)in the body, which is the named mechanism.👨💻 Felix Brandt — Senior Fullstack Developer
Observations
The canonical pattern for shared mock state in this repo is already
vi.hoisted+ synchronousvi.mockfactory. See:routes/DropZone.svelte.spec.ts:7-10—const { invalidateAllMock } = vi.hoisted(() => ({ invalidateAllMock: vi.fn(...) })); vi.mock('$app/navigation', () => ({ invalidateAll: invalidateAllMock }));lib/notification/NotificationBell.svelte.spec.ts:6-26— hoistsgotoMock,mockMarkRead,mockNotificationListand uses sync factories.lib/shared/primitives/LanguageSwitcher.svelte.spec.ts:6—vi.hoisted(() => vi.fn()).EnrichmentBlock is the only browser spec that deviates with an
async () => { await import(…) }factory. It's the outlier.EnrichmentBlock.svelte:2imports the deprecated$app/stores. The component only uses$navigatingto drive a skeleton state (line 21). Production already uses$app/state.navigatinginroutes/documents/+page.svelte:261androutes/aktivitaeten/+page.svelte:117for the same purpose. The migration is one import line + one expression rewrite.We're on
feat/issue-545-notification-dropdown-iframe-fix. Do not bundle this fix into that PR — atomic commits, separate branch.Recommendations
EnrichmentBlock.sveltefrom$app/storesto$app/state. Changeimport { navigating } from '$app/stores'→import { navigating } from '$app/state', then!!$navigating→!!navigating.type. The component becomes a normal reactive-state consumer, the test stops needing to mock anything store-shaped, and you close the deprecated-import outlier in one commit.vi.hoistedpattern fromDropZone.svelte.spec.ts:9verbatim. Avoid therequire('svelte/store')form from option 1a in the issue body — Vite/ESM context makesrequire()brittle. Hoist a plain{ value: T }ref (likeNotificationBell.svelte.spec.ts:10) and have the factory return a{ subscribe }shim that reads from it.npx vitest run -c vitest.client-coverage.config.ts --coverage5× locally, confirm at least one red with the named birpc trace. Green phase = apply the fix, get 10× green locally. Refactor phase = generalise the ESLint rule per Markus's ADR update.CallExpression[callee.object.name='vi'][callee.property.name='mock']whose second arg isArrowFunctionExpression[async=true]containing anAwaitExpression > ImportExpressionin its body. Same message, points at ADR-012.Open Decisions
$app/statemigration vsvi.hoistedshim. The migration is one line of production code + one line of test code and closes the deprecated-import outlier; it's the right long-term call. The shim is a smaller surgical fix that leaves$app/storesin place; it's the right immediate call if you want to keep the production component untouched in the same commit that fixes the test infrastructure. They're not mutually exclusive — migration is just a tighter commit story.🛠️ Tobias Wendt — DevOps & Platform Engineer
Observations
The defense-in-depth chain is in the right shape but the layers fire at different costs:
eslint.config.js:76-87) — fails in seconds, catches at save time..gitea/workflows/ci.yml:39-45) — fails in ~10s, before the coverage run..gitea/workflows/ci.yml:58-66) — fails after the ~2-minute coverage step has burned.All three are currently scoped to literal
'pdfjs-dist'. Generalising the first two gives the biggest cost reduction; the third stays as belt-and-suspenders.Image is pinned correctly:
mcr.microsoft.com/playwright:v1.58.2-noble(ci.yml:15).actions/upload-artifact@v4is correctly in use (line 70). The@v3GHESNotSupportedError in the issue body is from a different failing job; ignore for this issue's scope.The "20 consecutive CI runs of an unmodified branch" acceptance criterion as written implies 20 push events. That's ~40 minutes of serial CI per acceptance check and operationally awkward.
Recommendations
workflow_dispatch-triggered job:coverage-flake-probe. Job matrixstrategy.matrix.run: [1..20], each cell runsnpm run test:coverageand asserts zero[birpc] rpc is closedlines. Triggered on demand. Pass = all 20 cells green. This is the cheap form of the acceptance criterion: one fire, parallel cost, deterministic signal.vi.mock('pdfjs-dist'string to aripgrep/PCRE pattern matchingvi\.mock\([^)]+,\s*asyncscoped tofrontend/src/**/*.{spec,test}.ts. Same step, broader catch. Pair with the ESLint rule generalisation Markus and Felix described.data-sveltekit-preload-data="off"in a test setup file (e.g. insidesrc/test-setup.tsor viadocument.body.dataset.sveltekitPreloadData = 'off'in a global beforeAll). Eliminates the secondary hover-prefetch trigger. Production unchanged.fileParallelism: false/singleFork: true). Serialising iframe teardown multiplies the slowest CI job's wall-clock — measurable regression on every push. Path 1+2+3 is cheaper and sufficient.@vitest/browser-playwrightfor the missingif (rpc.closed) return route.continue()guard atdist/index.js:977. Link the issue in the closing commit so the next maintainer can claim back the local workarounds when upstream lands a fix.Open Decisions
🛡️ Nora Steiner ("NullX") — Application Security Engineer
Observations
Unit & Component Testsjob is "sometimes red for no reason", they learn to rerun instead of read failures. That muscle memory is what causes real security regressions to be merged while the CI is red. It's not a vulnerability — it's a posture problem.Recommendations
git blameon a similar-looking race in 18 months should land at the ADR within one hop. Treat that line of the AC as non-negotiable.🧪 Sara Holt — Senior QA Engineer
Observations
EnrichmentBlock.svelte.spec.ts:10-13is the only deviation in the entire browser-mode spec corpus. Every other spec that needs to leak state acrossvi.mock's hoist uses thevi.hoistedpattern (seeDropZone.svelte.spec.ts:9,NotificationBell.svelte.spec.ts:9-10,LanguageSwitcher.svelte.spec.ts:6). The fix is conformance to existing house style, not invention.vitest.client-coverage.config.ts:43-48(80% / 80% / 80% / 80%) are correctly placed. The bug is in test-runner shutdown, not in coverage — the fix must not touch the thresholds.Recommendations
npx vitest run -c vitest.client-coverage.config.ts --coveragerepeated 5–10× — and confirm at least one red with the named[birpc] rpc is closed, cannot call "resolveManualMock"line. If you can't get the test red in 10 tries, the fix can't be validated. Don't skip this step.coverage-flake-probejob) → merge. Don't gate merge on 20 sequential push events.frontend/src/__meta__/no-async-mock-factories.test.tsor similar) that globssrc/**/*.svelte.{test,spec}.tsand fails if any file contains avi.mock(*, async () => { … await import(…) … })pattern. This catches a new factory the moment a contributor saves it in their editor — before CI runs, before lint runs in some setups. Belt-and-braces with ESLint, but the ESLint rule could be disabled or scoped wrong; an in-suite test is harder to bypass.🎨 Leonie Voss — UX & Accessibility
Observations
EnrichmentBlock.svelte:33-38. It'sh-[360px]withanimate-pulse,motion-reduce:animate-none,aria-busy="true", andaria-label="Lade aktualisierte Liste". These are correct — preserve them through any fix path.!!$navigating && topDocs.length === 0. The skeleton only appears when the user has nothing to look at and a navigation is in flight. That's the right semantic.Recommendations
$app/state(Felix's preferred path): match the trigger to the established pattern.!!navigating.type(as inroutes/aktivitaeten/+page.svelte:117) is the right shape — fires on any in-flight navigation. Don't switch tonavigating.to !== nullhere without a reason; both work, but the codebase has two conventions in flight and I'd prefer one.topDocsset, confirm the skeleton appears, confirmmotion-reduce: reducekills the pulse animation. The fix is in test infrastructure, but the expression rewrite touches the runtime trigger — easy to regress quietly.aria-busy, German label) is sound. No change recommended there.No Open Decisions from a UX standpoint — the fix doesn't change user-visible design intent.
📋 Elicit — Requirements Engineer
Observations
[birpc] rpc is closedlines), explicit named files (EnrichmentBlock.svelte.spec.ts), explicit ADR update mandate, and a "do not chase" list for red herrings. Most bug issues lack three of these; this one has all of them.workflow_dispatchon the same SHA, or 20 reruns of a single completed workflow? Each is a different cost profile and a different signal (push events test cache variance; matrix dispatch tests teardown determinism; rerun tests neither). Without resolving this, an implementer will pick the most expensive option by default.[birpc] rpc is closed, cannot call \"resolveManualMock\"lines across all of the above" — clear for the named string. Less clear about adjacent stderr noise (TypeErrorwrapDynamicImport, "Failed to fetch unread count"). The issue body explicitly excludes those as "do not chase", but the acceptance grep must match the exact named string only, not all stderr noise, or acceptance will fail on unrelated incidents.Recommendations
workflow_dispatchmatrix run (per Tobias'scoverage-flake-probejob). One fire, parallel cost, 20 independent iframe lifecycles. State this explicitly in the acceptance criteria.grep -qF '[birpc] rpc is closed, cannot call "resolveManualMock"' <log>. Anchor to the full message string — avoids false fails on unrelated stderr.Path 1a (hoisted shim),Path 1b (prop injection),Path 2 ($app/state migration). A reviewer grepping git history later can find the institutional memory in one query.Open Decisions
birpc/rpcnoise. The issue body's "do not chase" list excludes the second-order stderr; the grep should match it. Confirm the AC encodes the exact target string and nothing broader.🗳️ Decision Queue — Action Required
3 decisions need your input before implementation starts.
Architecture / Scope
$app/statemigration vsvi.hoistedshim. Migration changesEnrichmentBlock.svelte:2from deprecated$app/storesto modern$app/state(matching 13+ other production files) and removes the need to mock anything store-shaped — one tighter commit, closes one of two$app/storesoutliers, but touches the production component's runtime trigger expression. Hoisted shim leaves the production component untouched and changes only the spec — smaller blast radius, but leaves the deprecated import in place for a later migration. (Raised by: Felix)Infrastructure
Verification trigger mechanism for the 20-run acceptance criterion. Three options:
workflow_dispatch— one fire, 20 parallel cells against a single SHA, ~one CI run's worth of wall-clock. Tests teardown determinism only. Operationally cheap, deterministic signal.Pick once and write into the AC. (Raised by: Tobias, Elicit)
Acceptance Criteria
wrapDynamicImport, "Failed to fetch unread count"). The acceptance grep should anchor to the full string[birpc] rpc is closed, cannot call "resolveManualMock"to avoid false fails on unrelated stderr. Confirm this is the intent. (Raised by: Elicit)👨💻 Felix Brandt — Implementation Complete
Branch:
feat/issue-553-birpc-async-mock-factory(offmain—89860403).Fix path applied:
$app/statemigration (Markus's recommendation / Felix's Path 2).Commits (7, atomic)
9f1b8b42fix(enrichment-block):migrateEnrichmentBlock.sveltefrom$app/stores.navigating→$app/state.navigatingwith!!navigating.type; spec usesvi.hoisted+ syncvi.mockfactory with getter; deleted__mocks__/navigatingStore.ts3c9e40catest(setup):src/test-setup.tssetsdocument.body.dataset.sveltekitPreloadData = 'off', wired viasetupFilesin bothvite.config.tsandvitest.client-coverage.config.ts; meta-test asserts setup ran (eliminates Tobias's secondary trigger)636d61a8test(meta):src/__meta__/no-async-mock-factories.test.tsglobs browser specs and asserts none match the banned async-factory-with-dynamic-import pattern; demonstrated red→green by planting a synthetic offender5afebde3refactor(eslint):generalisedno-restricted-syntaxrule to flagvi.mock(..., async () => { ... await import(...) ... })via AST selector; kept the literalpdfjs-distrule alongside67cd56acci(unit-tests):extended the CI early-fail grep with PCRE-multiline matching the same pattern; excludes__meta__/(fixture strings)c8208847ci(coverage-flake-probe):newworkflow_dispatchworkflow,strategy.matrix.run: [1..20], each cell runs coverage and asserts zero[birpc] rpc is closed— the verification mechanism for the 20-run ACaddf5c98docs(adr-012):dropped the "residual exceptions" handwave, added the binding sync-factory invariant, recorded the$app/statemigration, listed all four defence layers and the probe workflowAcceptance criteria
EnrichmentBlock.svelte.spec.tscontains novi.mock(…, async () => …)factory withawait import(…)— replaced withvi.hoisted+ sync getter.vi.mock(*, async () => …)factory anywhere infrontend/src/**/*.svelte.{test,spec}.tsperforms a dynamic import — enforced by ESLint, CI grep, and the in-suite meta-test.pdfjs-distliteral kept (libLoader pattern enforcement) + new selector flags anyvi.mock(..., async () => { await import(...) }).Unit & Component Testsjob green on at least 20 consecutive CI runs — verifiable on demand via the newcoverage-flake-probe.ymlworkflow_dispatch (one fire = 20 parallel cells against this SHA).npx vitest run -c vitest.client-coverage.config.ts --coveragegreen on 10 consecutive runs — skipped locally per @marcel's instruction (terminal crashed mid-run); deferred to CI evidence.[birpc] rpc is closed, cannot call "resolveManualMock"lines — to be verified by thecoverage-flake-probejob and the permanent CI birpc assert.9f1b8b42body and this comment both state "Fix path applied:$app/statemigration (Markus's recommendation / Felix's Path 2)".Bonus hardenings (beyond AC, recommended by personas)
src/test-setup.ts. Closes the named secondary trigger.Next steps
mainfromfeat/issue-553-birpc-async-mock-factory.coverage-flake-probeworkflow on the PR head (manualworkflow_dispatch) to satisfy the 20-run AC./review-prfor the multi-persona PR review.👨💻 Felix Brandt — Local repro after fix + follow-up analysis
@marcel asked me to repro locally after the fix. Doing so revealed the named race still surfaces on this branch under
npx vitest run --project=client(non-coverage, 580 tests). Findings:What's confirmed working
EnrichmentBlock.svelte.spec.ts+__meta__/*) runs green every time.What still surfaces locally
Running the full 580-test client project surfaces the trace:
The crash is async; the "last test before the trace" rotates between runs (last seen:
admin/invites/page.svelte.test.ts:74, which has zerovi.mockcalls — confirming the trigger is not in that file but in a parallel iframe being torn down).Diagnostic added in
edde9292: also disabledata-sveltekit-preload-code(route JS chunks) insrc/test-setup.ts. The original fix turned off only loader-data prefetch; route-code chunks prefetched on hover/tap are also a candidate trigger.What this means
The remaining residual is consistent with the issue body's own analysis: the route handler at
@vitest/browser-playwright/dist/index.js:977does not guard againstmodule.resolve()being called after the worker's birpc channel closes. That's Path 5 in the issue body — upstream patch — explicitly out of this issue's critical path. The set of mitigations on this branch reduces the surface but doesn't close the underlying library gap.My recommendation
Push the branch to CI as-is and run the new
coverage-flake-probeworkflow (workflow_dispatch, 20-cell matrix). Three plausible CI outcomes:main. Mitigations meaningful but not bulletproof.fileParallelism: falseorsingleFork: true) as an opt-in for the coverage step only — trade ~2× coverage-run wall-clock for determinism.@vitest/browser-playwrightand apply Path 4 as the immediate band-aid.Latest branch head:
edde9292(8 commits total).