test(legibility): mutation-test critical backend service tests; flag tautologies #403
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Context
Part of Epic #402 — Pre-flight. This is TEST-1: verify that critical-path service tests are real (would fail if the service were broken) before the big-bang refactor moves anything.
Per the Legibility Rubric, this addresses C8.1 (Critical).
Goal
Spot-check at least 5 service tests per Tier-1 backend domain (
document,person,tag,user,geschichte,notification,ocr— 7 domains × 5 tests = 35 tests minimum). For each, perform a small mutation in the service under test and confirm the test fails.Method
Two viable approaches — pick one:
Approach A — Manual targeted mutation (faster for solo + LLM)
For each selected test:
Approach B — PIT (mutation testing tool for Java/Maven)
pom.xml(dev profile only, not committed permanently)./mvnw org.pitest:pitest-maven:mutationCoverageApproach A is recommended for this issue — it gives concrete, per-test answers without requiring tooling setup.
Selection criteria for the 35 tests
For each of the 7 Tier-1 domains, sample tests in this order until 5 are picked:
MassImportService,RelationshipInferenceService)Required output
Add
docs/audits/test-mutation-report.mdcontaining:Where
ResultisDETECTED/NOT DETECTED (tautological).Summary at the bottom: per-domain pass-rate and a list of all
NOT DETECTEDtests, which feed into TEST-2.Acceptance criteria
docs/audits/test-mutation-report.mdexists with per-test resultsDefinition of Done
Report committed; closing comment summarizes the per-domain pass-rate and lists how many follow-up tests TEST-2 needs to write.
Dispatch
Best done by an agent with full file-edit access (not a read-only Explore agent). Memory: source SDKMAN before running mvn.
👨💻 Felix Brandt — Senior Fullstack Developer
Observations
*ServiceTest.javafiles underbackend/src/test/java/org/raddatz/familienarchiv/service/. Test counts per domain:DocumentServiceTest(147 tests),PersonServiceTest(49),UserServiceTest(51),TagServiceTest(40),NotificationServiceTest(27),GeschichteServiceTest(22),OcrServiceTest(8). The OCR domain stands out as thin.@ExtendWith(MockitoExtension.class)+@InjectMocks— correct pattern for fast unit tests. Mutations will be quick to run and revert since no Spring context boots.Recommendations
OcrServiceTest(8 tests). With the fewest tests and a domain that touches async state transitions, it's the highest-risk place for tautologies. Five mutations there will tell you whether the tests actually guard the job state machine.DocumentServiceTest, focus on the search/filter methods first — these tend to be complex (Specification-based), and a negated condition or dropped predicate is the classic tautology pattern in Mockito tests where the repo is fully mocked.git stashafter each mutation run rather than manual undo. One stash per mutation keeps reverts clean and eliminates "did I revert that?" doubt. Checkgit diffis empty before moving to the next domain.GeschichteServiceTestusesSecurityContextHoldermanipulation in@BeforeEach/@AfterEach. When mutatingGeschichteServicepermission checks, run the fullGeschichteServiceTestclass, not just one test — those tests share security context state and a mutation in one guard can cascade silently.Open Decisions
OcrServiceTestonly testsOcrService(8 tests). There are alsoOcrBatchServiceTest(separate file) andOcrProgressServiceTest. Does "ocr domain — 5 tests" mean 5 fromOcrServiceTestalone, or 5 from the combined ocr/* service tests? Worth clarifying before starting so the report's domain column is consistent.🏛️ Markus Keller — Application Architect
Observations
pom.xml), but branch coverage is a necessary, not sufficient, condition for test quality. A test that stubs the repo and never asserts the result passes JaCoCo green while catching nothing.docs/audits/does not exist yet. The issue requires creating it and committingtest-mutation-report.md. This is a documentation artifact, not code — its existence in the repo is the acceptance criterion. That's appropriate: audit findings have the same lifecycle as ADRs and belong in version control.Recommendations
docs/audits/directory alongsidedocs/adr/— consistent with the existing documentation structure (docs/adr/001-ocr-python-microservice.mdthrough006-...). Don't create it underbackend/or anywhere else.pom.xml, even in a dev profile. The issue correctly marks Approach B as "not preferred." A committed plugin adds noise to the build descriptor and is easy to forget to remove. Approach A produces the same actionable data with less infrastructure overhead.Open Decisions (none — Approach A is clearly the right choice here)
🧪 Sara Holt — QA Engineer & Test Strategist
Observations
OcrServiceTest). The issue's 5-per-domain minimum means OCR will need 5 of those 8 tested — that's 62.5% of the entire OCR unit test suite, which gives a meaningful signal but also means any tautologies found there have high impact.Recommendations
DocumentStatustransitions andGeschichteStatustransitions are the highest-value mutation targets — a droppedstatus = REVIEWEDassignment fails silently in a mocked unit test if the test only asserts thatsave()was called and not what was saved. UseArgumentCaptorto verify the status field, not just thatsavewas invoked.GeschichteServiceTest: check tests withSecurityContextHoldermanipulation carefully. A mutation that removes a permission guard will only be detected if the test actually asserts the exception — assertinggetByIdreturns a non-null result when the user has noBLOG_WRITEwould be a tautology even with the mock in place. This is the category most likely to surfaceNOT DETECTEDcases.GeschichteServiceTest.getById_throws_NOT_FOUND_for_draft_when_user_lacks_BLOG_WRITEis unambiguous in a way thatgetById_throws...alone is not../mvnw verify(not justtest) to confirm the JaCoCo 88% gate still passes with all mutations reverted. This is the "reverts verified" acceptance criterion.🔒 Nora "NullX" Steiner — Application Security Engineer
Observations
GeschichteServiceTestdirectly tests permission enforcement by manipulatingSecurityContextHolder. These are the most security-relevant unit tests in the service layer.UserServiceTesttests auth-adjacent flows (password change, email uniqueness). These two domains should be mutation-tested with extra care on theassertThatThrownBypatterns.NotificationServiceTest(27 tests) covers cross-user notification flows. A mutation that drops the recipient filter (e.g., sends a notification to all users instead of the target user) would be a data-exposure bug. Tests that only verifymailSender.send()was called but don't verify who it was sent to are tautological from a security perspective.Recommendations
GeschichteService.getById(), don't just comment out theDomainException.forbidden()throw — negate the permission check condition (!→ remove!). This tests whether the guard is what triggers the assertion, not just whether the exception path is reachable.NotificationServiceTest, look for ArgumentCaptor usage. Tests that verifysave(notification)was called but useany()as the argument matcher are security-blind — they can't detect mutations that send notifications to the wrong recipient. Flag these asNOT DETECTEDrisk even before mutating, as a pre-screening step.NOT DETECTEDcase involving auth/permission logic. These are not equal toNOT DETECTEDin a sort comparator — they need TEST-2 to write a test with a specific security assertion, not just a coverage improvement.changePasswordtests: verify that the mutation test checks both "wrong user cannot change another user's password" AND "correct user can change their own." If only the happy path is covered, a mutation removing the ownership check won't be detected.🚀 Tobias Wendt — DevOps & Platform Engineer
Observations
source SDKMAN before running mvn(noted in project memory). The agent executing this will need to source~/.sdkman/bin/sdkman-init.shbefore each./mvnw test -Dtest=ClassNameinvocation, otherwise Maven won't resolve Java 21 and test runs will fail with a confusing error rather than a mutation signal.docs/audits/directory doesn't exist yet. Creating it and committing the report file is part of the acceptance criteria. Trivial, but the PR should be just the audit report — no production code changes. Keeping that PR surgically small makes review and revert easy.Recommendations
source ~/.sdkman/bin/sdkman-init.sh && cd backend && ./mvnw test -Dtest=<ClassName>#<methodName>— run the single test method, not the whole class, to keep turnaround fast. Switch to the full class only when theGeschichteServiceTestSecurityContext sharing is relevant../mvnw clean verify(not justtest) at the end to confirm: (a) all mutations are reverted, (b) JaCoCo 88% gate still passes. Theverifyphase runs the JaCoCo check goal;testalone does not.docs/audits/test-mutation-report.md— consistent with the existingdocs/adr/convention. Createdocs/audits/as a new directory with just this file. NoREADME.mdneeded.📋 Elicit — Requirements Engineer
Observations
NOT DETECTEDtautologies) is directly the input to TEST-2. That traceability is exactly what a preflight audit should have.Recommendations
OcrServiceTestcurrently has 8 tests — all 5 will come from there. But what if a future domain had only 3 tests? The issue should clarify: "if fewer than 5 tests exist, test all available and note the domain as under-sampled." This avoids ambiguity during execution.NotificationServiceTestalone, or does it includeSseEmitterRegistrytests if any exist? The issue should canonically define which test class maps to each domain name in the 7-domain list. A one-line mapping table would close this.Reverted?column to the audit table makes the "all reverts verified" acceptance criterion machine-checkable from the report itself, rather than requiring the reader to trust the closing comment.🎨 Leonie Voss — UX Designer & Accessibility Strategist
Observations
What I checked
docs/audits/test-mutation-report.md, which is an internal developer document.No concerns from my angle.
🗳️ Decision Queue — Action Required
2 decisions need your input before implementation starts.
Scope / Domain Boundaries
OCR domain boundary —
OcrServiceTesthas 8 tests, but there are alsoOcrBatchServiceTestandOcrProgressServiceTestin the same test directory. Options: (A) "ocr domain" =OcrServiceTestonly, keeping domain names 1:1 with service classes, or (B) "ocr domain" = all three OCR test files treated as one domain. Option A is simpler and consistent with the other 6 domains (each maps to one service class). Option B gives a more complete signal but makes the domain column in the report ambiguous. (Raised by: Felix)Under-sampled domain policy — The issue requires ≥5 tests per domain, but if a domain has fewer than 5 tests total (e.g., a future domain with only 3), the issue is silent on what to do. Options: (A) "test all available tests and mark the domain as under-sampled in the report," or (B) "treat this as a blocker — TEST-2 must first add tests before mutation testing can proceed." Option A is pragmatic for a pre-flight audit; Option B is stricter but creates a dependency between TEST-2 and this issue. (Raised by: Elicit)
Report Format
Reverted?boolean column to every row, making revert confirmation traceable per-test, or (B) keep the table as-is and confirm revert once in the closing comment. Option A costs nothing in effort but makes the report self-contained and auditable. Option B is simpler and the closing comment is part of the Definition of Done anyway. (Raised by: Elicit)TEST-1 Complete ✅
All 35 mutations detected (100%).
Applied 5 targeted mutations to each of the 7 Tier-1 service domains (document, person, tag, user, geschichte, notification, ocr) and ran the paired test in isolation for each.
No tautological tests found. Every mutation caused a clear, specific assertion failure.
Full results:
docs/audits/test-mutation-report.mdPR: #430