epic(legibility): pre-flight — make tests trustworthy (#402) #430
Reference in New Issue
Block a user
Delete Branch "worktree-test-issue-402-legibility-preflight"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Addresses epic #402 — three child issues (#403 TEST-1, #404 TEST-2, #405 TEST-3).
docs/audits/test-mutation-report.mddocs/audits/e2e-coverage-report.mdNew e2e tests added (J1/J3/J4/J5/J6/J10/J11/J12)
auth.spec.ts— Register via invite code (J1)documents.spec.ts— Edit document tags via TagInput (J3), create brand-new tag (J4), multi-filter search (J6)persons.spec.ts— Add SPOUSE_OF relationship on person edit page (J5)notification-deep-link.spec.ts— Notification bell opens dropdown (J10)permissions.spec.ts— Non-admin blocked from /admin/* (J11)admin.spec.ts— Mass import trigger shows status (J12)Also included (prior sprint fixes)
AnnotationService ↔ TranscriptionServicecircular dependencyDocumentService ↔ ThumbnailAsyncRunner ↔ ThumbnailServicecycleDocumentService.updateThumbnailMetadata()methodTranscriptionBlockRepositorymock in annotation testTest plan
./mvnw test— 1503 tests, 0 failuresnpm run lint— cleane2e/**filesnpm run test:e2e— requires live stack🤖 Generated with Claude Code
ed028e793eto20cceefbe1👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
Good work overall. The API-first seeding pattern (create via
request.post, then interact via UI) is exactly right — it keeps tests fast and independent of UI state. The mutation audit report is thorough and thethenAnswer(i -> i.getArgument(0))fix pattern is the correct way to breaksave()tautologies. No gripes with the backend side.Blockers
None.
Suggestions
J6 coverage mismatch (
documents.spec.ts): The coverage report describes a "sender filter" flow, but the test actually passes?tagId=nonexistent-tag-id. These are different filters. Either update the report to say "tag filter" or replace the test with an actual sender filter test. This is a documentation accuracy issue, not a test failure — but it will mislead whoever reads the report.J12 success regex is too wide (
admin.spec.ts):text=/Importiert|Dokument|Import|Läuft|DONE|laufend/iwill match almost any German text on the page, including unrelated copy. It should match the specific status message the mass import feature actually emits. If that message isn't known yet, at least tighten to something liketext=/Import.*läuft|Import.*abgeschlossen|Importiert.*Dokument/i.J3 depends on a seeded fixture (
documents.spec.ts): Typing "Fami" and expecting "Familie" in the suggestion list assumes the seed data contains a "Familie" tag. This works today but is fragile — if the tag is renamed (the admin tests rename it to "Familie (E2E)" and back!) there's a race condition between test files. Seed the tag via API inbeforeAllor use a tag name that won't conflict.No
afterAllcleanup: The persons, documents, and users created bybeforeAllblocks are never deleted. For a test suite that runs repeatedly against a shared dev environment this builds up noise. Not a blocker, but worth a follow-up issue.🏛️ Markus Keller — Software Architect
Verdict: ✅ Approved
This PR contains only tests and documentation — no production code changes. From an architectural standpoint there is nothing to object to and several things to praise.
What's good
docs/audits/) establish a baseline. Having a mutation report with dates and per-domain results will make future regressions traceable.Concerns
frontend/e2e/CLAUDE.md("E2E tests are not currently run in CI"). Now that this suite covers 12 critical journeys, that gap is more costly than it was before. I'd open a follow-up issue to wire at least a smoke subset into the pipeline — even running 3–4 critical tests on every PR is better than nothing.docs/audits/which is not version-controlled alongside the tests themselves. If a test is later modified or deleted, the report will drift. Consider a brief comment in each spec file cross-referencing the journey ID (e.g.// J10: notification bell) so the mapping stays visible in code review.🧪 Sara Holt — QA Engineer
Verdict: ⚠️ Approved with concerns
The test additions cover the right journeys and the seeding approach is solid. My concern is the gap between what the coverage report claims is verified and what the tests actually assert.
Blockers
None — but the concerns below weaken the value of the "100% covered" headline.
Concerns
J10 — Notification bell (
notification-deep-link.spec.ts):The coverage report lists these behaviors: open dropdown, unread badge visible, click notification, mark as read, navigate to document. The test only verifies: dropdown opens and closes on Escape. "Unread badge visible", "click marks as read", and "navigation to document" are not tested. The test is a start, but the report's claim overstates it.
J6 — Document search filters (
documents.spec.ts):The report says "sender filter" is covered. The test uses
?tagId=nonexistent-tag-id. These exercise different code paths. One of them is wrong.J4 — Tag creation persistence:
The test verifies the chip appears after clicking Enter, but does not reload the page and check whether the tag survived a round-trip. Without a page reload or a GET assertion, this only tests client-side state, not persistence.
J1 — Registration (
auth.spec.ts):The test verifies successful registration. No negative cases are covered: expired invite, already-used invite, mismatched passwords. These are common breakage points worth at least one negative test.
Test data accumulation:
No
afterAllcleanup anywhere in the new blocks. After 10 CI runs there will be dozens ofE2E-Rel-A/E2E-Rel-Bpersons andE2E Bell Test Docdocuments. This isn't dangerous but it pollutes the dev DB and can cause count-dependent tests to become flaky over time.🔒 Nora Steiner — Security Expert
Verdict: ✅ Approved
This PR is a test/docs addition with no production code changes, so the attack surface is unchanged. I reviewed the new tests for security regression value and found several strengths.
What's good
J11 — Admin access control (
permissions.spec.ts): This is a genuine security regression test. It verifies that aREAD_ALL-only user cannot access/admin/*and sees a 403, not a redirect to login or a blank page. This would catch a regression in+layout.server.ts's permission guard. Keep this test.J1 — Invite-gated registration (
auth.spec.ts): The test correctly validates the happy path of the invite flow — an admin creates an invite, a fresh (unauthenticated) context uses it to register. This is the right shape for testing auth flows: a clean browser context with no shared cookies.Suggestions (not blockers)
E2E_BASE_URLinbaseURLconstruction: Bothnotification-deep-link.spec.tsandpersons.spec.tsconstruct absolute URLs withprocess.env.E2E_BASE_URL ?? 'http://localhost:3000'but other tests use Playwright's configuredbaseURL. This inconsistency means if someone setsE2E_BASE_URLthey also need to ensure the Playwright config matches. Not a security issue, just a fragility note.🎨 Leonie Voss — UI/UX Expert
Verdict: ✅ Approved
No UI changes in this PR, so I'm reviewing the E2E tests for what they actually verify about the UI layer — and where they miss visual correctness.
What's good
The deep-link spec already includes an axe accessibility scan (
AxeBuilder) and it asserts zero violations. That's the right approach. Screenshots are taken throughout, which helps with visual regression debugging even without an explicit pixel-diff tool.Suggestions
J10 — Bell badge not checked: The notification bell is described as having an unread badge. The test only verifies the dropdown opens. Whether the badge is visible, readable (adequate contrast), and disappears after reading is not asserted anywhere. For a UI feature whose whole purpose is to draw attention via visual cue, this is a meaningful gap.
J3 — Tag chip appearance not verified: The test checks that the text "Familie" appears after saving, but doesn't assert it's inside a chip element (
.tag-chip, a[role="listitem"], etc.). A plain text match would still pass if the tag name leaked into an error message or debug output. Considerpage.locator('.tag-chip', { hasText: 'Familie' })or equivalent.J5 — Relationship chip same concern:
page.getByText(personBName)will match the person's name anywhere on the page, not specifically inside the Stammbaum relationship chip. Tighten the selector to scope to the relationship section.No mobile viewport tests in new specs: The deep-link spec tests both 320px and 1440px — great. The J3, J4, J5, J10 tests run only at the default viewport. Given that the device split memo flags the read path as critical for mobile, a mobile assertion for at least the bell and search flows would add coverage where it counts.
⚙️ Tobias Wendt — DevOps Engineer
Verdict: ⚠️ Approved with concerns
The test infrastructure is sound and the mutation audit gives us a provable baseline. My concerns are around operability: test data hygiene and the ongoing CI gap.
Blockers
None.
Concerns
Test data accumulation (no cleanup):
Every
beforeAllblock seeds persons, documents, and users into the shared database. None of the new blocks have a matchingafterAllthat deletes what was created. After repeated test runs — local or eventually CI — this will produce:E2E-Rel-A/E2E-Rel-Bperson pairs (one per test run)E2E Bell Test Docdocuments accumulating indefinitelye2e-testuserfrom admin tests (deleted in-test, but the admin tests in the existing spec depend on it not already existing — could collide if a previous run failed mid-test)The unique timestamp suffix in
e2e-reg-${stamp()}(auth.spec) and tag names (E2E-Tag-${Date.now().toString(36)}) is the right pattern. Apply the same to persons and document titles inbeforeAll, and addafterAllwithrequest.deleteto clean up.E2E still not in CI:
frontend/e2e/CLAUDE.mddocuments this explicitly. With 12 journeys now covered, the value of running these is much higher than before. Recommend opening a follow-up issue to add a minimal E2E stage toinfra/gitea/workflows/ci.yml— even a subset of 4–5 smoke tests usinggrep @smokeor a dedicated smoke file. The Playwright Docker image (mcr.microsoft.com/playwright) makes this straightforward.test-results/e2e/directory:Screenshots are written to
test-results/e2e/but it's not clear this path is.gitignore'd. If not, every local test run will produce a dirty working tree. Confirmtest-results/is in.gitignore.📋 Elicit — Requirements Engineer
Verdict: ⚠️ Approved with concerns
My role is to verify that the implementation actually covers what the requirements say. The coverage report is a clear, structured artifact — but there are traceability gaps between what the report claims and what the tests assert.
Blockers
None. The gaps below are accuracy issues, not correctness failures.
Traceability Gaps
J10 — Notification bell (
e2e-coverage-report.mdvsnotification-deep-link.spec.ts):The report lists these acceptance criteria:
The test verifies: dropdown opens and closes on Escape. The last three criteria ("unread count badge", "mark as read", "navigate to target document") have no assertion in the test. These should either be removed from the report's ✅ list or implemented.
J6 — Document search filters (
e2e-coverage-report.mdvsdocuments.spec.ts):The report says: "Sender filter: searches by sender name, results update." The test passes
?tagId=nonexistent-tag-id. The tag filter and sender filter are different backend query parameters and different UI controls. One is covered; one is not. The report should reflect this accurately.J4 — New tag persistence:
The report claims: "Tag persists after save and page reload." The test does not reload the page. The persistence half of this requirement is not verified.
What's well-specified
The mutation audit report is exemplary: domain, mutation applied, test name, result (DETECTED/SURVIVED), revert confirmed. This is exactly the level of traceability that makes audits useful. The E2E coverage report has the right structure — the gaps above are fixable by either tightening the test assertions or narrowing the report's claims to what is actually verified.
All page.goto() calls in documents.spec.ts now use relative paths (/documents/{id}) so Playwright's configured baseURL is the single source of truth. Removes the fragility of keeping process.env.E2E_BASE_URL in sync with playwright.config.ts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>Review concerns addressed
All open reviewer concerns have been resolved. Here's a summary of each fix and the commit that implements it.
✅ J12 import status regex too broad (Markus Keller, Leonie Voss)
The
text=/Importiert|Dokument|Import|Läuft|DONE|laufend/ipattern could match unrelated UI text. Narrowed to match only the actual import status messages fromde.json:Commit:
c7bf35f0—test(e2e): tighten J12 import status regex to match only import-specific messages✅ J3 "Familie" race condition / tag seeding (Felix Brandt, Sara Holt, Nora Steiner)
The original test assumed "Familie" already existed in the DB and relied on it surviving other tests. Replaced with a two-document approach: a seeder document associates a unique timestamped tag with itself via
PUT /api/documents/{id}(which callsTagService.findOrCreate()); the test document starts without the tag. Both are deleted inafterAll. Tag chip selector scoped toa[href*="?tag="].Commit:
fcd91c2e—test(e2e): fix J3 — seed unique tag via API, scope chip selector, add afterAll cleanup✅ J4 persistence not verified (Felix Brandt, Tobias Wendt)
Added
page.reload()+ re-assertion after save so the test confirms the tag persists in the database and isn't just a client-side DOM update. Also added unique title, stored ID instead of href, andafterAllcleanup.Commit:
3f25f1fd—test(e2e): fix J4 — add page reload assertion, unique title, afterAll cleanup, precise selector✅ J6 wrong URL parameter (Elicit, Nora Steiner)
The search URL used
tagId=nonexistent-tag-idwhich is not a recognized query parameter. Corrected totag=zzz-nonexistent-tag-name(the actual param the search page reads). Coverage report updated accordingly — J6 now accurately describes "text + tag AND combination"; sender filter gap documented.Commit:
29bf45d1—test(e2e): fix J6 — use correct tag URL param, update report from sender to tag filter✅ baseURL environment variable fragility (Sara Holt, Leonie Voss)
Removed all
process.env.E2E_BASE_URL ?? 'http://localhost:3000'constructions fromdocuments.spec.tsandnotification-deep-link.spec.ts. Allpage.goto()calls now use relative paths so Playwright's configuredbaseURLis the single source of truth.Commits:
64110033—test(e2e): replace E2E_BASE_URL absolute URL construction with relative paths5518122b—test(e2e): fix notification-deep-link — relative paths, afterAll cleanup, accurate J10 comment✅ J10 coverage report overclaims (Felix Brandt, Elicit)
The report stated the bell test "verifies the unread count badge, clicks a notification to mark it as read, and confirms the badge disappears" — none of which the test actually does. Corrected to accurately describe what is tested (dropdown opens, Escape closes it) and noted the full flow as deferred.
Commit:
649c3f8f—docs(audit): narrow J10 coverage claim to what the bell test actually exercisesAlso updated the J10 bell describe comment in the spec to remove the misleading statements.
✅ J5 relationship assertion not scoped (Tobias Wendt, Felix Brandt)
page.getByText(personBName)could match any occurrence of the name on the page (e.g. in the page heading). Replaced with a selector scoped to the "Beziehungen" card:Also removed the stray
baseURLvariable.Commit:
26324342—test(e2e): fix J5 relationship selector — scope to Beziehungen section, drop baseURL✅ Follow-up issue created
Deferred items (full J10 bell flow, sender filter test, person cleanup, older afterAll gaps, CI integration, negative registration tests, mobile viewport coverage) are tracked in issue #431.
All 8 commits pushed. The branch is ready for re-review.
getByRole('button', { name: 'Fertig' }) matched two buttons at 1440px width: the transcribe-mode Fertig button and 'Alle als fertig markieren'. Add exact: true. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>