epic(legibility): pre-flight — make tests trustworthy before big-bang refactor #402
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Epic context
This is Epic 3 of 5 for the Codebase Legibility Refactor (see #387).
The big-bang refactor (Epic 4) is only safe if the test suite actually catches regressions. AI-generated tests can pass while asserting tautologies (test sets up X, asserts X). This epic verifies — before the refactor moves any code — that critical-path service tests would FAIL if the service were broken.
This is the gate between Epic 1 (Audit) and Epic 4 (Refactor).
Scope
Acceptance criteria
Definition of Done
This epic closes when REFACTOR-1 / REFACTOR-2 (Epic 4) can run with full confidence that the test suite is the safety net. The "Refactor readiness" verdict in
docs/LEGIBILITY-READINESS.md(AUDIT-6) should reference Epic 3's completion.🏗️ Markus Keller — Senior Application Architect
Observations
DocumentServiceTest(147@Test),PersonServiceTest(49),TagServiceTest(40), andGeschichteServiceTest(22) exist and all use@ExtendWith(MockitoExtension.class)— no Spring context overhead, properly structured. The question this epic must answer is not "do tests exist?" but "would they fail if the production code was broken?"DocumentServicewould need to preserve that contract to be meaningful — a mutation that causesPersonServiceto be called instead ofPersonRepositorydirectly should be caught by existing tests.docs/LEGIBILITY-READINESS.mdreference in the acceptance criteria is a file that doesn't exist yet (it would be created in AUDIT-6). Ensure TEST-1/TEST-2/TEST-3 child issues do not depend on it existing before they can close.Recommendations
DocumentService.updateDocument,PersonService.mergePersons, andGeschichteService(new domain, most likely to have thin coverage). These are the paths where a refactor is most likely to change control flow.DocumentServiceTestactually exercises the path throughPersonServicewhensenderIdis set — not just stubs it away withthenReturn(null). If the service-to-service call is always short-circuited in tests, a refactor that wires upPersonRepositorydirectly would pass all tests..gitea/workflows/ci.yml, the backend job runs./mvnw clean test— this should exercise everything includingPersonServiceIntegrationTestandGeschichteServiceIntegrationTest. ConfirmMigrationIntegrationTestis included (it is, by path), which verifies Flyway migrations as a safety net for Epic 4's schema touches.Open Decisions (omit this section entirely if none)
org.pitest:pitest-maven) and can be scoped to specific packages. The decision is: run PIT as a one-off verification step and report results as comments, OR wire it into CI as a permanent gate. One-off is appropriate for this epic's purpose; a permanent gate is a separate investment. The issue doesn't specify — clarify in TEST-1.👨💻 Felix Brandt — Senior Fullstack Developer
Observations
DocumentServiceTest: most tests dowhen(repo.findById(id)).thenReturn(of(doc))then assert the returned value equalsdoc. That's not tautological — the test verifies the service returns what the repository gives it. But it doesn't verify what happens whenupdateDocumenttransforms the document — if the service sets the wrong field during an update andsave()returns a mock anyway, the test passes with a lie.DocumentServiceTesthas 147 tests, which is healthy volume. The risk area is the update/transformation path — tests that assert onverify(repo.save(captor.capture()))and then check the captured state. If a test useswhen(repo.save(any())).thenReturn(doc)wheredocis the pre-mutation object, the assertion on the returned value tells you nothing about whether the mutation was applied.auth.spec.ts(login/logout/redirect),documents.spec.ts(create, search),transcription.spec.ts(block editing),geschichten.spec.ts(story create + publish). The tag journey is not covered by a dedicated e2e spec — there's anadmin.spec.tsbut tag creation/application to a document and subsequent tag search is not an explicit test scenario. This is the gap to fill in TEST-3.PersonServiceTesthas 49 tests.findOrCreateandmergePersonsare the highest-stakes paths. ThemergePersonspath specifically is the one most likely to be touched in a refactor and most likely to have subtle behavioral mutations survive existing tests.Recommendations
--targetClasses=org.raddatz.familienarchiv.service.DocumentService,PersonService,TagService,GeschichteServiceand report surviving mutants that should have been caught. The "5 tests per Tier-1 domain" AC is the right minimum — I'd aim for zero survivors in the not-found, permission-denied, and cascade-delete paths specifically.when(repo.save(any())).thenReturn(inputDoc)withwhen(repo.save(any())).thenAnswer(i -> i.getArgument(0))(return whatever was actually saved) and then assert the captured argument's state. This makes the save assertion meaningful.Open Decisions (none)
🔒 Nora "NullX" Steiner — Application Security Engineer
Observations
GeschichteControllerTest: it uses@WithMockUser(authorities = "BLOG_WRITE")for write operations — that's correctly verifying the permission gate. However, the counterpart (a test that POSTs as a user withoutBLOG_WRITEand expects 403) needs to exist as well. If TEST-1's mutation of the@RequirePermissionannotation check onGeschichteControllerdoesn't cause a test failure, the permission boundary is not proven.DocumentControllerTest— the critical security invariant is that aREAD_ALL-only user cannot call write endpoints. If a mutation strips the@RequirePermissionannotation and the test suite still passes, that's a tautological security test.PersonServiceIntegrationTestandGeschichteServiceIntegrationTestboth exist. The latter runsauthenticateAs(writer, Permission.BLOG_WRITE)— that's integration-level permission testing, which is good. Confirm the negative case exists:authenticateAs(reader, Permission.READ_ALL)then attempt a write and expectDomainException.forbidden()../mvnw clean test. No SAST (Semgrep, SpotBugs) step is present. This epic is the right moment to note that gap — not to fix it, but to include it in the TEST-1 findings as an observability gap for Epic 4.Recommendations
@RequirePermissionannotation is not a test quality issue — it's a security gap. Document any such findings explicitly.403is returned when the user lacks the required authority, not just that200is returned when they have it.auth.spec.tse2e suite covers unauthenticated redirect and session-clearing on logout — that's solid. Consider adding apermissions.spec.tsscenario (it exists!) that specifically verifies a read-only user cannot access the Geschichten editor. Check ifpermissions.spec.tscurrently covers this — if not, it belongs in TEST-3.Open Decisions (none)
🧪 Sara Holt — Senior QA Engineer
Observations
DocumentServiceTesthas 147 tests — that's the richest domain.PersonServiceTesthas 49.TagServiceTesthas 40.GeschichteServiceTesthas 22 (the newest domain, the least battle-tested). The "at least 5 per Tier-1 domain" floor in the AC is appropriate — but I'd recommend the TEST-1 child issue explicitly lists which services are Tier-1, or ambiguity will cause scope drift.auth.spec.ts✅,documents.spec.ts(partial — create covered, tag application during create is not explicitly verified) ⚠️,transcription.spec.ts✅,geschichten.spec.ts✅. The tag journey (apply tag to document → search by tag → results filtered) is the clearest gap.e2e/CLAUDE.mdexplicitly notes that E2E tests are not run in CI. This means TEST-3's "Critical user journeys each have at least one e2e" is not enforced by the pipeline — someone could delete an e2e test and CI would still pass. The epic's DoD should state whether CI integration is in scope or deferred.transcription.spec.ts: 14 tests covering blocks, edit, persist on reload, reorder, delete — this is thorough and would catch most transcription regressions. Good baseline.docs/LEGIBILITY-READINESS.md" referenced in the DoD is a file that presumably gets created in AUDIT-6. Check that the dependency is not a blocker — if AUDIT-6 isn't merged yet when this epic closes, the verdict reference is a dangling pointer.Recommendations
DocumentService,PersonService,TagService,GeschichteService. This avoids "we did 5 on OcrService and skipped PersonService" outcomes.documents.spec.ts.npm run test:e2eagainst a Docker Compose stack." If you're leaving E2E out of CI, note that as a deliberate scope boundary so it doesn't quietly stay unfixed.@Transactionalrollback pattern inGeschichteServiceIntegrationTest(it uses Testcontainers + real Postgres) means integration tests do clean up after themselves. That's good — TEST-2 rewrites should follow the same pattern.makeDocument(),makeGeschichte()) rather than inline builders. The existingDocumentServiceTestalready uses a constantUNPAGED— that same DRY instinct should extend to entity setup.Open Decisions
🚀 Tobias Wendt — DevOps & Platform Engineer
Observations
ci.yml) runs three jobs:unit-tests(frontend),ocr-tests(Python), andbackend-unit-tests. The backend job runs./mvnw clean test, which includes Testcontainers integration tests — the Gitea runner has Docker access and setsDOCKER_API_VERSION: "1.43"to handle the older Docker 24.x API on the NAS. That's solid.e2e/CLAUDE.md: "E2E tests are not currently run in CI." This means the TEST-3 e2e coverage is only locally verified. From a pipeline perspective, an e2e test that never runs in CI is a documentation artifact, not a safety net.actions/upload-artifact@v3step inci.ymlis deprecated. It still works but security patches stopped. This is worth fixing in this epic's cleanup scope since you're touching CI quality anyway — swap to@v4. Low effort, reduces vulnerability surface.mcr.microsoft.com/playwright:v1.58.2-noble— that's correct practice. Maven is cached onbackend/pom.xmlhash, node_modules onpackage-lock.jsonhash — both right.target/pit-reports/artifact upload for reviewability.Recommendations
actions/upload-artifact@v3→@v4in the same PR that closes TEST-1 or TEST-2. This is a 1-line change and shouldn't be deferred past this epic.mvn org.pitest:pitest-maven:mutationCoverage -pl backend -Dpit.targetClasses=...step in CI (or at least upload the HTML report as an artifact). Without this, the mutation results can't be reviewed or tracked over time.e2e-testsjob that usesdocker/setup-buildx-actionor relies on the Gitea NAS runner's host Docker access (same as the Testcontainers job), startsdocker compose up -dfor the backend stack, then runsnpm run test:e2e. The NAS runner already has Docker — the infrastructure cost is low.backend-unit-testsjob.Open Decisions (none)
📋 Elicit (Requirements Engineer) — Brownfield Analysis
Observations
docs/LEGIBILITY-READINESS.md(AUDIT-6) — verify this file exists or is expected to be created before this epic closes, otherwise the DoD has an unresolved dependency.Recommendations
DocumentService,PersonService,TagService,GeschichteService. This makes the AC unambiguous at review time.Open Decisions
🎨 Leonie Voss — UX Designer & Accessibility Strategist
Observations
No direct UX or accessibility concerns from this epic — it is purely a backend/test quality initiative with no frontend visual changes.
I checked what's covered:
geschichten.spec.tsincludes an axe scan (AxeBuilder.withTags(['wcag2a', 'wcag2aa'])), andtranscription.spec.tsincludestranscription panel passes axe accessibility check. The blocking-severity filter ingeschichten.spec.ts(filtering toseriousandcriticalviolations only) is a pragmatic gate. These pass today and TEST-3 should verify they still pass after any test suite changes — but since this epic doesn't touch UI code, there's no regression risk here.One note for the record: the e2e suite's
accessibility.spec.tsruns Axe on key pages. If TEST-3 adds a new tag-filter journey, it should include an axe scan on the tag-filtered document list page if that path hasn't been scanned before.No open decisions from my angle.
🗳️ Decision Queue — Action Required
3 decisions need your input before implementation starts.
Test Strategy
Mutation testing tooling: one-off vs CI gate — PIT (
org.pitest:pitest-maven) is the standard Java choice. The question is whether to run it once during TEST-1 and report results as a comment, or wire it as a permanent CI gate. One-off is appropriate for this epic's purpose (verifying pre-refactor safety). A permanent gate is a separate investment with real CI time cost. Pick one and record it in the TEST-1 child issue. (Raised by: Markus)E2E in CI: in scope or deferred? — The DoD says "refactor can run with full confidence," but the e2e suite is not wired into CI (
e2e/CLAUDE.mdconfirms this). A passing e2e suite that only runs locally is a documentation artifact, not an enforced safety net. Options: (A) add ane2e-testsjob toci.ymlusing the NAS runner's existing Docker access — infrastructure cost is low since Testcontainers already uses it; (B) explicitly document this as a known gap that doesn't block Epic 4, with a follow-up issue. Either choice is defensible — the risk is leaving it unstated. (Raised by: Sara, Tobias)Requirements Clarity
DocumentService,PersonService,TagService,GeschichteServiceare the obvious candidates). Option B: define Tier-1 as "any service with ≥1 write endpoint that touches two or more entities." Option A is simpler and immediately actionable. (Raised by: Elicit, Sara)Epic complete ✅
All three child issues resolved:
Audit artifacts committed
docs/audits/test-mutation-report.md— full mutation-by-mutation resultsdocs/audits/e2e-coverage-report.md— 12-journey coverage mapPR: #430
Backend: 1503 tests, 0 failures