Reference in New Issue
Block a user
Delete Branch "feat/issue-553-birpc-async-mock-factory"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #553.
Summary
The first cycle of #553 closed one named trigger of the
[birpc] rpc is closed, cannot call "resolveManualMock"teardown race (asyncvi.mockfactory withawait import(…)inEnrichmentBlock.svelte.spec.ts). Felix's follow-up local run on the post-fix branch confirmed the race still surfaced under the full 580-test client project, with the "last test before crash" rotating randomly across files with zerovi.mockcalls.Root cause of the residual race, identified by a direct read of
@vitest/browser-playwright/dist/index.jsand confirmed against vitest issue #9957 / PR #10267:We hit the exact named trigger:
$lib/shared/services/confirm.sveltewas mocked under two distinct IDs across 8 spec files (no other duplicate-ID pair existed). This PR closes that gap from four sides:confirm.svelteto a single canonical ID (no-extension form) across production code (4 files) and specs (7 files)+layout.svelte,documents/[id]/+page.svelte,NameHistoryEditCard.svelte,TagDeleteGuard.svelte+ 7 spec filessrc/__meta__/no-duplicate-mock-ids.test.ts(NEW, 130 LOC, 7 tests: 6 self-test fixtures + 1 corpus scan)patch-packagebackport of vitest PR #10267 against@vitest/browser-playwright@4.1.0— closes the upstream race at the route-handler level (unroute existing predicate before installing replacement)patches/@vitest+browser-playwright+4.1.0.patch,package.json(postinstall hook + devDep)docs/adr/012-browser-test-mocking-strategy.mdThe patch from Layer 3 is reapplied automatically via
postinstall → patch-package. CI'snpm cistep exercises this. Delete the patch when@vitest/browser-playwrightpublishes a release containing PR #10267 — the meta-test (Layer 2) stays as a permanent contributor-facing guard.Defence in depth — six layers
ADR-012's enforcement chain is now:
no-restricted-syntax(scoped to**/*.{spec,test}.ts) — sync-factory invariantno-async-mock-factories.test.ts— sync-factory invariantno-duplicate-mock-ids.test.ts— one-canonical-ID-per-module invariantpatch-packagebackport of PR #10267 — closes the upstream race itselfCommits
Atomic per
feedback_atomic_commits.md— one logical change per commit (production import normalisation, test mock normalisation, meta-test, patch-package, docs).Test plan
npm run check— type checking (787 pre-existing errors in profile/users/stammbaum/admin/users; none introduced by this change)npm run lint— Prettier + ESLint (running under push CI #1603)npm run test— full vitest suite, both projects (under push CI). Newno-duplicate-mock-ids.test.tsmust pass; existingno-async-mock-factories.test.tsmust remain green;EnrichmentBlock.svelte.spec.tsmust remain greennpm cipostinstall replays the patch without error (under push CI)coverage-flake-probe.ymlworkflow_dispatch — 20-cell matrix run against this branch's head, all green, zero[birpc] rpc is closedlines. Triggered as run #1604Follow-up
Separately tracked: #554 —
audit: factory mocks → prop injection migration (sveltest pattern). Out of scope for this PR.References
frontend/src/lib/shared/services/confirm.test-host.svelte— canonical prop-injection example (informs the Layer 5 follow-up)Scans every src/**/*.svelte.{spec,test}.ts file for vi.mock first-arg strings, canonicalises each by stripping a trailing .js/.ts after .svelte, groups by canonical id, and fails if any canonical id is referenced under two or more distinct raw spellings. Mirrors the shape of src/__meta__/no-async-mock-factories.test.ts: source-text regex scan (no AST parser dependency), red/green self-test fixtures inline, then one corpus assertion that the whole suite is clean. This is the in-suite defence-in-depth layer for the duplicate-id birpc race named in ADR-012 / #553 and fixed upstream by vitest PR #10267. Harder to disable than ESLint (cross-file invariant ESLint cannot express anyway) and harder to scope around than a CI grep. Refs: #553 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Markus Keller — Senior Application Architect
Verdict: Approved
Five-commit, atomic, well-bounded change. Defence-in-depth with six layers is exactly the shape this class of "intermittent CI race" calls for — each layer catches the symptom at a different stage of the loop (editor → pre-test grep → in-suite → post-run birpc grep). Nothing here leaks across domain boundaries; the changes are all infrastructure-level (test setup, ESLint, patch-package, ADR).
What I like
When to remove the patch) and the in-suite meta-test stays as the permanent guard. This is the right way to backport — own the workaround, document its expiry condition.Concerns (non-blocking)
The
idPreficatestypo is reproduced verbatim in the patch. This is correct — patch-package must match the upstream file — but it does mean if upstream renames the variable in a point release, our patch will fail to apply silently. Tobias's CI exercise (npm cipostinstall) is the right guard. Worth a sentence in the ADR's "When to remove the patch" section noting how the failure will surface (postinstall step exits non-zero,npm cifails the unit-tests job before any test runs).The meta-test fires a
readFileSyncper spec file at suite startup. With ~50 browser specs today this is cheap; at 500+ it adds measurable cold-start time per CI run. Not actionable now — flag it as something to revisit if the meta-test ever appears in a CI flame graph.coverage-flake-probe.ymlis a 20-cell matrix on the Playwright Docker image. Tobias will price-check this, but architecturally I'd note that it'sworkflow_dispatch-only — won't fire on every push. Good choice. A 20× matrix on push would dominate CI capacity.LGTM on layering
No controller-to-repository violations to flag (this PR doesn't touch the backend). No cross-domain frontend imports introduced. ESLint boundaries rule unchanged. ADR cross-references all six layers and points at the upstream issue/PR for traceability. This is how
docs/adr/should be maintained.— Markus
Felix Brandt — Senior Fullstack Developer
Verdict: Approved with concerns
The diff is exactly what you'd want a teardown-race fix to look like: small, atomic, every commit one logical change, every layer of defence has a test behind it. The two new meta-tests (
no-duplicate-mock-ids.test.ts,no-async-mock-factories.test.ts) follow textbook red/green construction — 6 self-test fixtures bracketing the corpus scan. That's the right shape.What's right
refactor(confirm-service)→test(confirm-service)→test(meta)→build(patch-package)→docs(adr-012). Reads cleanly ingit log. No "while I'm here" smuggling.findDuplicateMockIds()is exported and unit-tested with 6 fixtures (positive, negative, edge cases including.svelte.tscanonicalisation) before the corpus scan asserts the live invariant. Red phase visible: the fixtures would have caught the bug, the corpus scan locks the fix in. This is the pattern I want to see for every meta-guard.canonicalise()is a 3-line function that does exactly one thing. No nested conditionals.vi.hoisted()inEnrichmentBlock.svelte.spec.ts(lines 9-17). Exactly the pattern ADR-012 prescribes. The getter defers the read until consumption time and the hoisted reference is initialised before the also-hoistedvi.mockruns.Concerns
Regex-based scan in
no-async-mock-factories.test.tsis fragile. The patternvi\.mock\([^)]*,\s*async[^{]*\{[\s\S]*?await\s+import\s*\(works on the corpus today but trips on legitimate code if anyone ever writes:The ESLint rule (
AwaitExpression > ImportExpressionin the AST selector) doesn't have this problem because it inspects the AST. The CI grep guard also doesn't (same regex, same blind spot — but at least the message is clear). Suggestion: add one negative-fixture test forimportOriginal()so we know the pattern works for that case, or annotate in the file that this scan is a backstop and the AST selector is the source of truth.canonicalise()strips only one trailing extension..svelte.jsand.svelte.tsare handled;.svelte.mjsand.svelte.cjsaren't. Probably not used anywhere — but if someone introduces one, the guard silently skips. One-line fix:if (id.match(/\.svelte\.[mc]?[jt]s$/)) return id.replace(/\.svelte\.[mc]?[jt]s$/, '.svelte').EnrichmentBlock.svelte.spec.tsline 25-28 has a typed helperfunction doc(...)that shadows thedocvariable name used elsewhere in this codebase. Nit. Rename tomakeDoc()to match the project's factory-function convention (seemakeUser,makeDocumentin the persona doc, and the broader test corpus). 5-line change.no-duplicate-mock-ids.test.tsdiscovery walksrecursive: trueon a NodereaddirSync— works, but blows up cold cache time on container starts. Same observation as Markus's point 2. Defer.Test plan you mentioned
npm run check— type-check (787 pre-existing errors, none new): fine, that backlog is tracked.coverage-flake-probe.ymlrun #1604: that's the acceptance criterion. Until it's green across all 20 cells, this is approved-with-concerns. Once it lands green, the verdict flips to a clean approval.— Felix
Tobias Wendt — DevOps & Platform Engineer
Verdict: Approved with concerns
CI-side defences are well-thought-out.
coverage-flake-probe.ymlas aworkflow_dispatch-only matrix replaces what would otherwise be 20 push events onmain— that's a real capacity saving and the right tool for "verify a flake-class problem is fixed."What's right
actions/cache@v4,actions/checkout@v4,actions/upload-artifact@v4— all current, none deprecated. Cachingfrontend/node_moduleskeyed onpackage-lock.jsonis the canonical pattern.mcr.microsoft.com/playwright:v1.58.2-noble). Matches the version of@playwright/testinpackage.json(1.58.2). Reproducible.postinstall: patch-packageis the right shape for a tactical backport — applied vianpm ciin CI, no separate apply step, the artefact is one file inpatches/.set -eo pipefail+tee+ post-step grep guard. Exactly how you want to capture and gate on a log signature without losing the exit code throughtee. Correct.fail-fast: falseon the 20-cell matrix. You want every cell to run so you see partial-failure patterns, not bail on cell #3. Correct.Concerns
coverage-flake-probe.ymlruns 20 cells × fullnpm run test:coverage. Each cell does its ownnpm ci(if cache misses) plus a coverage run that, looking at the test pyramid, is bounded by browser tests. Ballpark per cell: 3-6 minutes. 20 cells in parallel on a shared NAS runner means contention. Worth checking how many concurrent runners are configured before this becomes the default flake-probe — and worth adding a comment in the workflow header noting "use sparingly; 20-cell on shared runner."No explicit timeout on the test step.
unit-testsinci.ymldoesn't set one either, but the flake-probe is the place to set one — a hung cell could pin a runner for hours. Suggesttimeout-minutes: 15per cell.Upload coverage log on failureartefact name is run-indexed (coverage-log-run-${{ matrix.run }}). Good. But the path uses${{ github.run_id }}in the filename so the second invocation of the same workflow won't collide. Verify Gitea Actions populatesgithub.run_idthe same way GitHub does — if not, you'd want a fallback like${{ env.GITHUB_RUN_ID || github.sha }}. Quick check, not a blocker.The patch is applied via
postinstall, not as part of the build. If a contributor runsnpm install --ignore-scripts(e.g. in a sandboxed CI step they haven't audited yet), the patch silently skips. The in-suite meta-test (Layer 5) still fires the duplicate-ID guard, but the upstream-fix backport (Layer 6) doesn't. Not a blocker — but worth a defensive log line:postinstall: patch-package && echo "patches applied"so a missing patch is visible in CI output.No Renovate config visible for the
patches/directory. When@vitest/browser-playwrightreleases the version containing PR #10267, Renovate will bump the dep and the patch will fail to apply (filename embeds4.1.0). That's actually the right failure mode — loud, visible, forces a human to delete the patch. Worth ADR-012 calling this out explicitly under "When to remove the patch" — "the bump PR will fail postinstall; that is the signal to delete." Minor doc edit.LGTM otherwise
The grep guard in
ci.yml(Assert no birpc teardown race in coverage run) is correct —if: always()so it runs even if the test step exited 1,grep -qFfor fixed-string match, exit 1 with a clear message. Standard.Monthly cost impact of this PR: 0 EUR. No new infrastructure.
— Tobias
Elicit — Requirements Engineer & Business Analyst
Verdict: Approved with concerns
Working in Brownfield Phase B3 (issue triage) and B4 (gap analysis) mode for this PR.
Issue-quality assessment of #553 → this PR
The PR description meets the Definition of Ready bar I'd want for a solo-with-LLM workflow:
[birpc] rpc is closedlines." This is exactly the kind of NFR target I push for — not "make the flake go away" but "20 consecutive green coverage runs against this SHA." Falsifiable. Re-verifiable.Concerns and gaps
Acceptance criterion is gated on a workflow that hasn't reported back yet. The PR description marks
coverage-flake-probe.ymlrun #1604 as "Triggered" with empty checkboxes for lint/test/postinstall. Until run #1604 reports all 20 cells green, the PR's primary acceptance evidence is pending. This is fine for a draft, but the PR is not marked draft and is open for review. Recommendation: either mark Draft until #1604 completes, or post the run result as a follow-up comment before requesting merge.NFR coverage on the patch lifecycle is implicit, not explicit. ADR-012 says "delete the patch when
@vitest/browser-playwrightpublishes a release containing PR #10267" — but the trigger for that deletion isn't tracked. Suggestion: open a low-priority Gitea issue right now titled "Remove@vitest+browser-playwright+4.1.0.patchonce upstream PR #10267 is released" and link it from the ADR. Without this, the patch becomes invisible technical debt. (TD entry shape:TD-NN | Tactical backport of vitest PR #10267 | Drift risk if upstream API changes | When upstream releases | XS.)The PR closes #553 but the AC for #535 (original incident) was "60 consecutive green
workflow_dispatchCI runs againstmainafter the fix is merged." Is that still the bar, or has #553's 20-cell parallel probe superseded it? Per ADR-012's "Acceptance criterion for #535" line vs the PR's "20-cell matrix" — these are slightly different shapes (60 sequential post-merge vs 20 parallel pre-merge). Worth clarifying in the merge commit message or in #535 directly so the closing condition for the original incident is unambiguous.No user-facing impact mentioned. This is correct — it's a CI/test-infrastructure fix and shouldn't have one. But the PR template doesn't make this explicit. If you maintain a
CHANGELOG.mdor release notes, an "internal: closed CI flake from #553/#535" entry is the right shape.LGTM on requirements rigor
The PR description's six-layer enforcement chain table (layer / what / where) is the kind of artefact I'd produce as the output of a Phase B6 repackage. Issue references (vitest #9957, #10267) are linked. Out-of-scope item is parked with its tracking issue. Test plan has explicit checkboxes mapped to verification commands.
This is what spec-dense issue work looks like.
— Elicit
Nora "NullX" Steiner — Application Security Engineer
Verdict: Approved
This is a test-infrastructure fix. No security-sensitive surfaces are touched — no auth, no input validation, no JPQL, no permission annotations, no actuator config, no CORS, no headers. I reviewed the diff against my OWASP / CWE checklist and there is nothing for me to flag at the vulnerability level.
What I checked (and what came back clean)
patches/@vitest+browser-playwright+4.1.0.patch— Reviewed the patch hunk-by-hunk. It modifies the mocker'sregister/clearhandlers to unroute an existing predicate before installing a new one. No new input parsing, no new HTTP surface, no privilege boundary crossed. The addedunroutecall operates on Playwright's in-process route registry — not on user-controllable data. No threat surface added.postinstall: patch-package—patch-packageis a well-known utility (DefinitelyTyped uses it, Next.js uses it). It applies a diff againstnode_modules. Risk is supply-chain (the package itself) rather than application code. The patch file lives in version control, is reviewable, and is identical for every contributor and every CI run. Acceptable.coverage-flake-probe.yml—workflow_dispatchonly, runs in the Playwright container, no secrets read, artefact upload is a coverage log (no credentials embedded). Workflow-level review: clean.eval, noFunction, no dynamic-importof user-controlled paths. Read-onlyreadFileSync. Clean.One systemic observation (not a finding)
The PR is a good worked example of the "every vulnerability fix starts with a failing test" principle I want to see applied broadly — even though this isn't a vulnerability fix, the pattern (
findDuplicateMockIds()exported, six self-test fixtures bracketing the corpus scan) is exactly the shape I'd want for any future security regression test. The duplicate-ID meta-test is reusable as a template: scan corpus, extract pattern, assert invariant. Worth keeping in mind for the next security finding that needs codification.LGTM with one teaching note
If we ever do find ourselves needing
vi.mockof a sensitive module in browser tests (e.g. mocking the auth service to test logged-out flows), the ADR-012 invariants from this PR mean we'll be forced to do it via prop injection orvi.hoisted+ sync factory — which structurally prevents the kind of test-time auth bypass that I'd worry about in a JSDOM/Mock setup. Indirect security benefit.— Nora
Sara Holt — Senior QA Engineer
Verdict: Approved with concerns
This PR is in my domain. The whole change is test-infrastructure hardening. I'm happy with the strategy and most of the execution, but I have specific concerns about reliability and the test pyramid that I want addressed before this becomes the standard pattern.
What I like
findDuplicateMockIds()andhasAsyncMockFactoryWithDynamicImport()are both exported and have unit-test fixtures before the corpus scan. 6 + 5 self-tests respectively, covering positive, negative, edge cases (.svelte.ts, single-file duplicates, both-spellings-in-one-file). When the regex is wrong, the unit tests fail before the corpus scan does. Strong TDD discipline.coverage-flake-probe.yml20-cell matrix asserting zero[birpc] rpc is closedlines. This is a falsifiable quality gate, not "feels less flaky." It's the test plan I'd write.vitest-browser-svelte+ Playwright is the right tool for catching this exact class of teardown race. The meta-test sits inside the same test runner that's affected — it can't be bypassed by disabling a separate lint job.Concerns
The two meta-tests are duplicated patterns of "glob → readFileSync → scan → assert empty offenders." Currently 2 instances. If we add a third or fourth pattern guard, we'll have shotgun-surgery on the discovery code. Suggestion: extract
findBrowserSpecs()and the__dirname / SRC_ROOTsetup intosrc/__meta__/fixtures.ts(or similar) so future guards inherit the discovery. Defer if not blocking — 2 instances doesn't trigger the rule of three yet, but flag it.no-duplicate-mock-ids.test.tslines 119-129 — single assertion, but the report shape isn't great on failure. If the test fails, theexpect(report).toEqual({})produces a diff that lists every duplicate. Good. But the failure message doesn't tell the developer what to do — they'll need to find the ADR. Suggestion: whenduplicates.size > 0, log a one-line message beforeexpect:console.error('Duplicate vi.mock IDs found. See ADR-012 / fix: pick one canonical spelling.'). Cheap UX improvement.Self-test fixtures are inline strings. Standard for this kind of meta-test, fine. But the regex-based scanner in
no-async-mock-factories.test.tsdoesn't have a fixture for the legitimate-await-importOriginalcase (which would be a false positive). Felix flagged this too. Add one negative fixture and assert it does not match — even if the AST selector is the source of truth, the regex is what runs at corpus scan time.coverage-flake-probe.ymlhas notimeout-minutes. Tobias caught this too. From a QA reliability standpoint: a hung cell pollutes the signal — was the cell flaky, or did it time out at 6 hours? Addtimeout-minutes: 20per cell.No assertion that the patch applied. The
npm cipostinstall step runspatch-package, but if the file is missing or fails to apply, postinstall exits non-zero and CI fails — good. However, there's no positive assertion that the patched code is actually innode_modulesat test time. Could add a sanity check step:grep -F 'Backport of vitest PR #10267' frontend/node_modules/@vitest/browser-playwright/dist/index.js || exit 1. Optional — postinstall failure is already loud — but it would catch the case where someone runsnpm install --ignore-scriptslocally and pushes "works on my machine" thinking the upstream fix is in.LGTM on pyramid placement
The new tests live at the right layer (unit/in-suite for the meta-guards, CI workflow for the flake probe). E2E suite untouched. No new Playwright tests added — correct, because this isn't a user-facing change. Layer separation respected.
Will flip to clean Approve once run #1604 lands green across all 20 cells and the timeout/false-positive concerns are addressed (or explicitly deferred with a tracking issue).
— Sara
Leonie Voss — UX Designer & Accessibility Strategist
Verdict: Approved (LGTM — no UI surface in scope)
I reviewed the diff against my checklist (brand tokens, semantic HTML, WCAG AA contrast, 44px touch targets, focus visibility, dark-mode token remap, redundant cues, responsive 320px-first). None of those checks apply to this PR. This is a test-infrastructure and CI change — no visible UI is modified.
What I checked (and why each came back N/A)
frontend/src/routes/+layout.svelte— only line changed is theprovideConfirmServiceimport (.svelte.js→.svelte). No template, no styling, no accessibility surface affected.frontend/src/routes/documents/[id]/+page.svelte— same single-line import normalisation.NameHistoryEditCard.svelte,TagDeleteGuard.svelte— same. Import-spelling normalisation only.EnrichmentBlock.svelte— diff is just$app/stores→$app/stateand the derived expression now reads!!navigating.type. Same rendered output, same DOM, same accessibility surface, same focus order, samearia-busyandaria-labelon the skeleton (preserved at lines 35-37). No regression in the user-facing surface. The skeleton's existingaria-label="Lade aktualisierte Liste"is German-locale-hardcoded — that's a pre-existing concern not introduced by this PR.One thing I want to note (for future PRs, not this one)
EnrichmentBlock.svelte's skeleton (line 34) usesaria-busy="true"+aria-label="Lade aktualisierte Liste"(German hardcoded). Once the i18n migration of skeletons is done, this should use Paraglide'sm.*helpers — but this PR did not introduce that issue and is not the place to fix it. Tracking-wise, if there isn't already a "skeleton aria-labels via Paraglide" issue, that's worth filing separately. Not blocking.Nothing to flag at the user-experience level
No brand token changes, no contrast regressions, no touch-target changes, no focus-indicator changes, no responsive breakpoint changes, no dark-mode token impact, no animation/motion changes. Skeleton's
motion-reduce:animate-noneclass is preserved (line 34 — good, that's a WCAG 2.3.3 consideration).I'd happily see this merge from a UI/UX standpoint.
— Leonie