epic(legibility): pre-flight — make tests trustworthy before big-bang refactor #402

Closed
opened 2026-05-04 16:10:25 +02:00 by marcel · 9 comments
Owner

Epic context

This is Epic 3 of 5 for the Codebase Legibility Refactor (see #387).

The big-bang refactor (Epic 4) is only safe if the test suite actually catches regressions. AI-generated tests can pass while asserting tautologies (test sets up X, asserts X). This epic verifies — before the refactor moves any code — that critical-path service tests would FAIL if the service were broken.

This is the gate between Epic 1 (Audit) and Epic 4 (Refactor).

Scope

  • TEST-1: Spot-check critical service tests via mutation testing on backend services
  • TEST-2: Fill coverage gaps in critical paths revealed by TEST-1 (write real tests for the tautological ones)
  • TEST-3: Verify the e2e suite covers every critical user journey end-to-end

Acceptance criteria

  • All 3 child issues are closed
  • At least 5 service tests per Tier-1 backend domain have been mutation-tested and proven to detect breakage (TEST-1)
  • Any tautological tests found have been rewritten to be real (TEST-2)
  • Critical user journeys (auth, document create, tag, search, transcription, story create) each have at least one e2e (TEST-3)

Definition of Done

This epic closes when REFACTOR-1 / REFACTOR-2 (Epic 4) can run with full confidence that the test suite is the safety net. The "Refactor readiness" verdict in docs/LEGIBILITY-READINESS.md (AUDIT-6) should reference Epic 3's completion.

## Epic context This is **Epic 3 of 5** for the *Codebase Legibility Refactor* (see #387). The big-bang refactor (Epic 4) is only safe if the test suite **actually catches regressions**. AI-generated tests can pass while asserting tautologies (test sets up X, asserts X). This epic verifies — before the refactor moves any code — that critical-path service tests would FAIL if the service were broken. This is the gate between Epic 1 (Audit) and Epic 4 (Refactor). ## Scope - **TEST-1**: Spot-check critical service tests via mutation testing on backend services - **TEST-2**: Fill coverage gaps in critical paths revealed by TEST-1 (write real tests for the tautological ones) - **TEST-3**: Verify the e2e suite covers every critical user journey end-to-end ## Acceptance criteria - [ ] All 3 child issues are closed - [ ] At least 5 service tests per Tier-1 backend domain have been mutation-tested and proven to detect breakage (TEST-1) - [ ] Any tautological tests found have been rewritten to be real (TEST-2) - [ ] Critical user journeys (auth, document create, tag, search, transcription, story create) each have at least one e2e (TEST-3) ## Definition of Done This epic closes when REFACTOR-1 / REFACTOR-2 (Epic 4) can run with full confidence that the test suite is the safety net. The "Refactor readiness" verdict in `docs/LEGIBILITY-READINESS.md` (AUDIT-6) should reference Epic 3's completion.
marcel added this to the (deleted) milestone 2026-05-04 16:10:25 +02:00
marcel added the epiclegibilitytest labels 2026-05-04 16:10:35 +02:00
Author
Owner

🏗️ Markus Keller — Senior Application Architect

Observations

  • The issue frames this correctly as a gate between Epic 1 (Audit) and Epic 4 (Refactor). This is the right architectural sequencing. The cost of a false-safe test suite during a structural refactor is hard-to-detect regressions — not failing tests, but passing tests that lie.
  • The test suite is already substantial. DocumentServiceTest (147 @Test), PersonServiceTest (49), TagServiceTest (40), and GeschichteServiceTest (22) exist and all use @ExtendWith(MockitoExtension.class) — no Spring context overhead, properly structured. The question this epic must answer is not "do tests exist?" but "would they fail if the production code was broken?"
  • The current layering is clean: services own repositories, cross-domain calls go through service boundaries. Any mutation-testing pass on DocumentService would need to preserve that contract to be meaningful — a mutation that causes PersonService to be called instead of PersonRepository directly should be caught by existing tests.
  • The docs/LEGIBILITY-READINESS.md reference in the acceptance criteria is a file that doesn't exist yet (it would be created in AUDIT-6). Ensure TEST-1/TEST-2/TEST-3 child issues do not depend on it existing before they can close.

Recommendations

  • TEST-1 scope: Focus mutation testing on the three highest-complexity service paths — DocumentService.updateDocument, PersonService.mergePersons, and GeschichteService (new domain, most likely to have thin coverage). These are the paths where a refactor is most likely to change control flow.
  • Don't run PIT on all services simultaneously. Run it per-service, fix tautologies immediately, commit the fix before moving to the next service. This keeps the mutation-testing work auditable and creates a clear paper trail before Epic 4 starts.
  • The architecture boundary test matters more than line coverage: verify that DocumentServiceTest actually exercises the path through PersonService when senderId is set — not just stubs it away with thenReturn(null). If the service-to-service call is always short-circuited in tests, a refactor that wires up PersonRepository directly would pass all tests.
  • Verify the CI pipeline reaches the backend integration tests (Testcontainers). Looking at .gitea/workflows/ci.yml, the backend job runs ./mvnw clean test — this should exercise everything including PersonServiceIntegrationTest and GeschichteServiceIntegrationTest. Confirm MigrationIntegrationTest is included (it is, by path), which verifies Flyway migrations as a safety net for Epic 4's schema touches.

Open Decisions (omit this section entirely if none)

  • Mutation testing tooling choice: PIT (pitest.org) is the standard for Java. It integrates with Maven (org.pitest:pitest-maven) and can be scoped to specific packages. The decision is: run PIT as a one-off verification step and report results as comments, OR wire it into CI as a permanent gate. One-off is appropriate for this epic's purpose; a permanent gate is a separate investment. The issue doesn't specify — clarify in TEST-1.
## 🏗️ Markus Keller — Senior Application Architect ### Observations - The issue frames this correctly as a **gate** between Epic 1 (Audit) and Epic 4 (Refactor). This is the right architectural sequencing. The cost of a false-safe test suite during a structural refactor is hard-to-detect regressions — not failing tests, but passing tests that lie. - The test suite is already substantial. `DocumentServiceTest` (147 `@Test`), `PersonServiceTest` (49), `TagServiceTest` (40), and `GeschichteServiceTest` (22) exist and all use `@ExtendWith(MockitoExtension.class)` — no Spring context overhead, properly structured. The question this epic must answer is not "do tests exist?" but "would they fail if the production code was broken?" - The current layering is clean: services own repositories, cross-domain calls go through service boundaries. Any mutation-testing pass on `DocumentService` would need to preserve that contract to be meaningful — a mutation that causes `PersonService` to be called instead of `PersonRepository` directly should be caught by existing tests. - The `docs/LEGIBILITY-READINESS.md` reference in the acceptance criteria is a file that doesn't exist yet (it would be created in AUDIT-6). Ensure TEST-1/TEST-2/TEST-3 child issues do not depend on it existing before they can close. ### Recommendations - **TEST-1 scope**: Focus mutation testing on the three highest-complexity service paths — `DocumentService.updateDocument`, `PersonService.mergePersons`, and `GeschichteService` (new domain, most likely to have thin coverage). These are the paths where a refactor is most likely to change control flow. - **Don't run PIT on all services simultaneously**. Run it per-service, fix tautologies immediately, commit the fix before moving to the next service. This keeps the mutation-testing work auditable and creates a clear paper trail before Epic 4 starts. - **The architecture boundary test matters more than line coverage**: verify that `DocumentServiceTest` actually exercises the path through `PersonService` when `senderId` is set — not just stubs it away with `thenReturn(null)`. If the service-to-service call is always short-circuited in tests, a refactor that wires up `PersonRepository` directly would pass all tests. - Verify the CI pipeline reaches the backend integration tests (Testcontainers). Looking at `.gitea/workflows/ci.yml`, the backend job runs `./mvnw clean test` — this should exercise everything including `PersonServiceIntegrationTest` and `GeschichteServiceIntegrationTest`. Confirm `MigrationIntegrationTest` is included (it is, by path), which verifies Flyway migrations as a safety net for Epic 4's schema touches. ### Open Decisions _(omit this section entirely if none)_ - **Mutation testing tooling choice**: PIT (pitest.org) is the standard for Java. It integrates with Maven (`org.pitest:pitest-maven`) and can be scoped to specific packages. The decision is: run PIT as a one-off verification step and report results as comments, OR wire it into CI as a permanent gate. One-off is appropriate for this epic's purpose; a permanent gate is a separate investment. The issue doesn't specify — clarify in TEST-1.
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

  • The issue identifies the right risk: AI-generated tests that set up X and assert X are tautological by construction. I've seen this pattern in DocumentServiceTest: most tests do when(repo.findById(id)).thenReturn(of(doc)) then assert the returned value equals doc. That's not tautological — the test verifies the service returns what the repository gives it. But it doesn't verify what happens when updateDocument transforms the document — if the service sets the wrong field during an update and save() returns a mock anyway, the test passes with a lie.
  • Looking at the existing service tests: DocumentServiceTest has 147 tests, which is healthy volume. The risk area is the update/transformation path — tests that assert on verify(repo.save(captor.capture())) and then check the captured state. If a test uses when(repo.save(any())).thenReturn(doc) where doc is the pre-mutation object, the assertion on the returned value tells you nothing about whether the mutation was applied.
  • The e2e suite covers the core journeys mentioned in AC: auth.spec.ts (login/logout/redirect), documents.spec.ts (create, search), transcription.spec.ts (block editing), geschichten.spec.ts (story create + publish). The tag journey is not covered by a dedicated e2e spec — there's an admin.spec.ts but tag creation/application to a document and subsequent tag search is not an explicit test scenario. This is the gap to fill in TEST-3.
  • The PersonServiceTest has 49 tests. findOrCreate and mergePersons are the highest-stakes paths. The mergePersons path specifically is the one most likely to be touched in a refactor and most likely to have subtle behavioral mutations survive existing tests.

Recommendations

  • For TEST-1, use PIT with --targetClasses=org.raddatz.familienarchiv.service.DocumentService,PersonService,TagService,GeschichteService and report surviving mutants that should have been caught. The "5 tests per Tier-1 domain" AC is the right minimum — I'd aim for zero survivors in the not-found, permission-denied, and cascade-delete paths specifically.
  • For TEST-2, the specific fix pattern for tautological tests: replace when(repo.save(any())).thenReturn(inputDoc) with when(repo.save(any())).thenAnswer(i -> i.getArgument(0)) (return whatever was actually saved) and then assert the captured argument's state. This makes the save assertion meaningful.
  • For TEST-3, add a tag e2e scenario: create a document → add a tag → search by that tag → verify the document appears. This is one of the acceptance criteria gaps.
  • Don't add new behavior during TEST-2 refactoring of tautological tests. Red/green/refactor applies here: if rewriting a test reveals the production code has a bug, file a separate issue for the bug. Don't fix it inline with the test rewrite — that conflates two changes.

Open Decisions (none)

## 👨‍💻 Felix Brandt — Senior Fullstack Developer ### Observations - The issue identifies the right risk: AI-generated tests that set up X and assert X are tautological by construction. I've seen this pattern in `DocumentServiceTest`: most tests do `when(repo.findById(id)).thenReturn(of(doc))` then assert the returned value equals `doc`. That's not tautological — the test verifies the service returns what the repository gives it. But it doesn't verify what happens when `updateDocument` transforms the document — if the service sets the wrong field during an update and `save()` returns a mock anyway, the test passes with a lie. - Looking at the existing service tests: `DocumentServiceTest` has 147 tests, which is healthy volume. The risk area is the update/transformation path — tests that assert on `verify(repo.save(captor.capture()))` and then check the captured state. If a test uses `when(repo.save(any())).thenReturn(doc)` where `doc` is the pre-mutation object, the assertion on the returned value tells you nothing about whether the mutation was applied. - The e2e suite covers the core journeys mentioned in AC: `auth.spec.ts` (login/logout/redirect), `documents.spec.ts` (create, search), `transcription.spec.ts` (block editing), `geschichten.spec.ts` (story create + publish). The **tag journey is not covered by a dedicated e2e spec** — there's an `admin.spec.ts` but tag creation/application to a document and subsequent tag search is not an explicit test scenario. This is the gap to fill in TEST-3. - The `PersonServiceTest` has 49 tests. `findOrCreate` and `mergePersons` are the highest-stakes paths. The `mergePersons` path specifically is the one most likely to be touched in a refactor and most likely to have subtle behavioral mutations survive existing tests. ### Recommendations - For TEST-1, use PIT with `--targetClasses=org.raddatz.familienarchiv.service.DocumentService,PersonService,TagService,GeschichteService` and report surviving mutants that should have been caught. The "5 tests per Tier-1 domain" AC is the right minimum — I'd aim for zero survivors in the not-found, permission-denied, and cascade-delete paths specifically. - For TEST-2, the specific fix pattern for tautological tests: replace `when(repo.save(any())).thenReturn(inputDoc)` with `when(repo.save(any())).thenAnswer(i -> i.getArgument(0))` (return whatever was actually saved) and then assert the captured argument's state. This makes the save assertion meaningful. - For TEST-3, add a tag e2e scenario: create a document → add a tag → search by that tag → verify the document appears. This is one of the acceptance criteria gaps. - **Don't add new behavior during TEST-2 refactoring of tautological tests.** Red/green/refactor applies here: if rewriting a test reveals the production code has a bug, file a separate issue for the bug. Don't fix it inline with the test rewrite — that conflates two changes. ### Open Decisions _(none)_
Author
Owner

🔒 Nora "NullX" Steiner — Application Security Engineer

Observations

  • This epic is about test trustworthiness, not new features — so the security angle is specifically: do the security-relevant service tests actually prove that permission checks fire?
  • I checked GeschichteControllerTest: it uses @WithMockUser(authorities = "BLOG_WRITE") for write operations — that's correctly verifying the permission gate. However, the counterpart (a test that POSTs as a user without BLOG_WRITE and expects 403) needs to exist as well. If TEST-1's mutation of the @RequirePermission annotation check on GeschichteController doesn't cause a test failure, the permission boundary is not proven.
  • The same pattern applies to DocumentControllerTest — the critical security invariant is that a READ_ALL-only user cannot call write endpoints. If a mutation strips the @RequirePermission annotation and the test suite still passes, that's a tautological security test.
  • PersonServiceIntegrationTest and GeschichteServiceIntegrationTest both exist. The latter runs authenticateAs(writer, Permission.BLOG_WRITE) — that's integration-level permission testing, which is good. Confirm the negative case exists: authenticateAs(reader, Permission.READ_ALL) then attempt a write and expect DomainException.forbidden().
  • The CI pipeline runs backend tests via ./mvnw clean test. No SAST (Semgrep, SpotBugs) step is present. This epic is the right moment to note that gap — not to fix it, but to include it in the TEST-1 findings as an observability gap for Epic 4.

Recommendations

  • In TEST-1, explicitly include authorization boundary tests in the "mutation tested" set. A surviving mutant that removes a @RequirePermission annotation is not a test quality issue — it's a security gap. Document any such findings explicitly.
  • In TEST-2, if tautological tests are found in security-adjacent paths (permission checks, not-found vs forbidden distinction), rewrite them with the negative test pattern: test that 403 is returned when the user lacks the required authority, not just that 200 is returned when they have it.
  • The auth.spec.ts e2e suite covers unauthenticated redirect and session-clearing on logout — that's solid. Consider adding a permissions.spec.ts scenario (it exists!) that specifically verifies a read-only user cannot access the Geschichten editor. Check if permissions.spec.ts currently covers this — if not, it belongs in TEST-3.

Open Decisions (none)

## 🔒 Nora "NullX" Steiner — Application Security Engineer ### Observations - This epic is about test trustworthiness, not new features — so the security angle is specifically: **do the security-relevant service tests actually prove that permission checks fire?** - I checked `GeschichteControllerTest`: it uses `@WithMockUser(authorities = "BLOG_WRITE")` for write operations — that's correctly verifying the permission gate. However, the counterpart (a test that POSTs as a user *without* `BLOG_WRITE` and expects 403) needs to exist as well. If TEST-1's mutation of the `@RequirePermission` annotation check on `GeschichteController` doesn't cause a test failure, the permission boundary is not proven. - The same pattern applies to `DocumentControllerTest` — the critical security invariant is that a `READ_ALL`-only user cannot call write endpoints. If a mutation strips the `@RequirePermission` annotation and the test suite still passes, that's a tautological security test. - `PersonServiceIntegrationTest` and `GeschichteServiceIntegrationTest` both exist. The latter runs `authenticateAs(writer, Permission.BLOG_WRITE)` — that's integration-level permission testing, which is good. Confirm the negative case exists: `authenticateAs(reader, Permission.READ_ALL)` then attempt a write and expect `DomainException.forbidden()`. - The CI pipeline runs backend tests via `./mvnw clean test`. No SAST (Semgrep, SpotBugs) step is present. This epic is the right moment to note that gap — not to fix it, but to include it in the TEST-1 findings as an observability gap for Epic 4. ### Recommendations - In TEST-1, explicitly include **authorization boundary tests** in the "mutation tested" set. A surviving mutant that removes a `@RequirePermission` annotation is not a test quality issue — it's a security gap. Document any such findings explicitly. - In TEST-2, if tautological tests are found in security-adjacent paths (permission checks, not-found vs forbidden distinction), rewrite them with the negative test pattern: test that `403` is returned when the user lacks the required authority, not just that `200` is returned when they have it. - **The `auth.spec.ts` e2e suite covers unauthenticated redirect and session-clearing on logout** — that's solid. Consider adding a `permissions.spec.ts` scenario (it exists!) that specifically verifies a read-only user cannot access the Geschichten editor. Check if `permissions.spec.ts` currently covers this — if not, it belongs in TEST-3. ### Open Decisions _(none)_
Author
Owner

🧪 Sara Holt — Senior QA Engineer

Observations

  • This is exactly the epic I'd want before a big-bang refactor. The acceptance criteria are well-scoped: mutation testing with a minimum floor (5 tests per Tier-1 domain), tautology cleanup, and e2e journey coverage. These are measurable and don't leave room for "it's probably fine."
  • Existing coverage is strong but unevenly distributed. DocumentServiceTest has 147 tests — that's the richest domain. PersonServiceTest has 49. TagServiceTest has 40. GeschichteServiceTest has 22 (the newest domain, the least battle-tested). The "at least 5 per Tier-1 domain" floor in the AC is appropriate — but I'd recommend the TEST-1 child issue explicitly lists which services are Tier-1, or ambiguity will cause scope drift.
  • e2e journeys are mostly covered by existing specs. The explicit list in the AC (auth, document create, tag, search, transcription, story create) maps to: auth.spec.ts , documents.spec.ts (partial — create covered, tag application during create is not explicitly verified) ⚠️, transcription.spec.ts , geschichten.spec.ts . The tag journey (apply tag to document → search by tag → results filtered) is the clearest gap.
  • The CI pipeline gap: e2e/CLAUDE.md explicitly notes that E2E tests are not run in CI. This means TEST-3's "Critical user journeys each have at least one e2e" is not enforced by the pipeline — someone could delete an e2e test and CI would still pass. The epic's DoD should state whether CI integration is in scope or deferred.
  • I looked at transcription.spec.ts: 14 tests covering blocks, edit, persist on reload, reorder, delete — this is thorough and would catch most transcription regressions. Good baseline.
  • The "Refactor readiness verdict in docs/LEGIBILITY-READINESS.md" referenced in the DoD is a file that presumably gets created in AUDIT-6. Check that the dependency is not a blocker — if AUDIT-6 isn't merged yet when this epic closes, the verdict reference is a dangling pointer.

Recommendations

  • The TEST-1 child issue should define Tier-1 services explicitly: at minimum DocumentService, PersonService, TagService, GeschichteService. This avoids "we did 5 on OcrService and skipped PersonService" outcomes.
  • For TEST-3, the missing journey is tag filtering: upload/create a document, add a tag, search by that tag, verify the document appears. This is a 10-line Playwright addition to documents.spec.ts.
  • Add an explicit AC item: "E2E tests are runnable in CI via npm run test:e2e against a Docker Compose stack." If you're leaving E2E out of CI, note that as a deliberate scope boundary so it doesn't quietly stay unfixed.
  • The @Transactional rollback pattern in GeschichteServiceIntegrationTest (it uses Testcontainers + real Postgres) means integration tests do clean up after themselves. That's good — TEST-2 rewrites should follow the same pattern.
  • When rewriting tautological tests in TEST-2, add factory method helpers (makeDocument(), makeGeschichte()) rather than inline builders. The existing DocumentServiceTest already uses a constant UNPAGED — that same DRY instinct should extend to entity setup.

Open Decisions

  • E2E in CI: in scope or deferred? The epic's DoD says "refactor can run with full confidence" — but if the e2e suite isn't wired into CI, that confidence exists only locally. Options: (A) wire up e2e to CI as part of this epic (adds complexity, requires Docker Compose in CI); (B) document it as a known gap that doesn't block Epic 4. Either is defensible, but it should be an explicit choice recorded in the DoD, not silence. (Raised by: Sara)
## 🧪 Sara Holt — Senior QA Engineer ### Observations - This is exactly the epic I'd want before a big-bang refactor. The acceptance criteria are well-scoped: mutation testing with a minimum floor (5 tests per Tier-1 domain), tautology cleanup, and e2e journey coverage. These are measurable and don't leave room for "it's probably fine." - **Existing coverage is strong but unevenly distributed.** `DocumentServiceTest` has 147 tests — that's the richest domain. `PersonServiceTest` has 49. `TagServiceTest` has 40. `GeschichteServiceTest` has 22 (the newest domain, the least battle-tested). The "at least 5 per Tier-1 domain" floor in the AC is appropriate — but I'd recommend the TEST-1 child issue explicitly lists which services are Tier-1, or ambiguity will cause scope drift. - **e2e journeys are mostly covered** by existing specs. The explicit list in the AC (auth, document create, tag, search, transcription, story create) maps to: `auth.spec.ts` ✅, `documents.spec.ts` (partial — create covered, tag application during create is not explicitly verified) ⚠️, `transcription.spec.ts` ✅, `geschichten.spec.ts` ✅. The **tag journey** (apply tag to document → search by tag → results filtered) is the clearest gap. - **The CI pipeline gap**: `e2e/CLAUDE.md` explicitly notes that E2E tests are not run in CI. This means TEST-3's "Critical user journeys each have at least one e2e" is not enforced by the pipeline — someone could delete an e2e test and CI would still pass. The epic's DoD should state whether CI integration is in scope or deferred. - I looked at `transcription.spec.ts`: 14 tests covering blocks, edit, persist on reload, reorder, delete — this is thorough and would catch most transcription regressions. Good baseline. - The "Refactor readiness verdict in `docs/LEGIBILITY-READINESS.md`" referenced in the DoD is a file that presumably gets created in AUDIT-6. Check that the dependency is not a blocker — if AUDIT-6 isn't merged yet when this epic closes, the verdict reference is a dangling pointer. ### Recommendations - The TEST-1 child issue should define **Tier-1 services** explicitly: at minimum `DocumentService`, `PersonService`, `TagService`, `GeschichteService`. This avoids "we did 5 on OcrService and skipped PersonService" outcomes. - For TEST-3, the missing journey is **tag filtering**: upload/create a document, add a tag, search by that tag, verify the document appears. This is a 10-line Playwright addition to `documents.spec.ts`. - **Add an explicit AC item**: "E2E tests are runnable in CI via `npm run test:e2e` against a Docker Compose stack." If you're leaving E2E out of CI, note that as a deliberate scope boundary so it doesn't quietly stay unfixed. - The `@Transactional` rollback pattern in `GeschichteServiceIntegrationTest` (it uses Testcontainers + real Postgres) means integration tests do clean up after themselves. That's good — TEST-2 rewrites should follow the same pattern. - When rewriting tautological tests in TEST-2, add factory method helpers (`makeDocument()`, `makeGeschichte()`) rather than inline builders. The existing `DocumentServiceTest` already uses a constant `UNPAGED` — that same DRY instinct should extend to entity setup. ### Open Decisions - **E2E in CI: in scope or deferred?** The epic's DoD says "refactor can run with full confidence" — but if the e2e suite isn't wired into CI, that confidence exists only locally. Options: (A) wire up e2e to CI as part of this epic (adds complexity, requires Docker Compose in CI); (B) document it as a known gap that doesn't block Epic 4. Either is defensible, but it should be an explicit choice recorded in the DoD, not silence. _(Raised by: Sara)_
Author
Owner

🚀 Tobias Wendt — DevOps & Platform Engineer

Observations

  • The CI pipeline (ci.yml) runs three jobs: unit-tests (frontend), ocr-tests (Python), and backend-unit-tests. The backend job runs ./mvnw clean test, which includes Testcontainers integration tests — the Gitea runner has Docker access and sets DOCKER_API_VERSION: "1.43" to handle the older Docker 24.x API on the NAS. That's solid.
  • The e2e suite is not in CI — confirmed by e2e/CLAUDE.md: "E2E tests are not currently run in CI." This means the TEST-3 e2e coverage is only locally verified. From a pipeline perspective, an e2e test that never runs in CI is a documentation artifact, not a safety net.
  • The actions/upload-artifact@v3 step in ci.yml is deprecated. It still works but security patches stopped. This is worth fixing in this epic's cleanup scope since you're touching CI quality anyway — swap to @v4. Low effort, reduces vulnerability surface.
  • Playwright image is pinned: mcr.microsoft.com/playwright:v1.58.2-noble — that's correct practice. Maven is cached on backend/pom.xml hash, node_modules on package-lock.json hash — both right.
  • There is no artifact upload for backend test reports (Surefire XML, JaCoCo HTML). If a mutation-testing run generates a PIT HTML report, there's currently no step to upload it. TEST-1 should include a CI step or at minimum a target/pit-reports/ artifact upload for reviewability.

Recommendations

  • Fix actions/upload-artifact@v3@v4 in the same PR that closes TEST-1 or TEST-2. This is a 1-line change and shouldn't be deferred past this epic.
  • If PIT is run as part of TEST-1, add a mvn org.pitest:pitest-maven:mutationCoverage -pl backend -Dpit.targetClasses=... step in CI (or at least upload the HTML report as an artifact). Without this, the mutation results can't be reviewed or tracked over time.
  • For TEST-3, the most pragmatic CI path for e2e: add a separate e2e-tests job that uses docker/setup-buildx-action or relies on the Gitea NAS runner's host Docker access (same as the Testcontainers job), starts docker compose up -d for the backend stack, then runs npm run test:e2e. The NAS runner already has Docker — the infrastructure cost is low.
  • No concerns about the existing pipeline structure for TEST-1 and TEST-2 — both run entirely within the existing backend-unit-tests job.

Open Decisions (none)

## 🚀 Tobias Wendt — DevOps & Platform Engineer ### Observations - The CI pipeline (`ci.yml`) runs three jobs: `unit-tests` (frontend), `ocr-tests` (Python), and `backend-unit-tests`. The backend job runs `./mvnw clean test`, which includes Testcontainers integration tests — the Gitea runner has Docker access and sets `DOCKER_API_VERSION: "1.43"` to handle the older Docker 24.x API on the NAS. That's solid. - **The e2e suite is not in CI** — confirmed by `e2e/CLAUDE.md`: "E2E tests are not currently run in CI." This means the TEST-3 e2e coverage is only locally verified. From a pipeline perspective, an e2e test that never runs in CI is a documentation artifact, not a safety net. - The `actions/upload-artifact@v3` step in `ci.yml` is deprecated. It still works but security patches stopped. This is worth fixing in this epic's cleanup scope since you're touching CI quality anyway — swap to `@v4`. Low effort, reduces vulnerability surface. - Playwright image is pinned: `mcr.microsoft.com/playwright:v1.58.2-noble` — that's correct practice. Maven is cached on `backend/pom.xml` hash, node_modules on `package-lock.json` hash — both right. - There is no artifact upload for backend test reports (Surefire XML, JaCoCo HTML). If a mutation-testing run generates a PIT HTML report, there's currently no step to upload it. TEST-1 should include a CI step or at minimum a `target/pit-reports/` artifact upload for reviewability. ### Recommendations - Fix `actions/upload-artifact@v3` → `@v4` in the same PR that closes TEST-1 or TEST-2. This is a 1-line change and shouldn't be deferred past this epic. - If PIT is run as part of TEST-1, add a `mvn org.pitest:pitest-maven:mutationCoverage -pl backend -Dpit.targetClasses=...` step in CI (or at least upload the HTML report as an artifact). Without this, the mutation results can't be reviewed or tracked over time. - For TEST-3, the most pragmatic CI path for e2e: add a separate `e2e-tests` job that uses `docker/setup-buildx-action` or relies on the Gitea NAS runner's host Docker access (same as the Testcontainers job), starts `docker compose up -d` for the backend stack, then runs `npm run test:e2e`. The NAS runner already has Docker — the infrastructure cost is low. - No concerns about the existing pipeline structure for TEST-1 and TEST-2 — both run entirely within the existing `backend-unit-tests` job. ### Open Decisions _(none)_
Author
Owner

📋 Elicit (Requirements Engineer) — Brownfield Analysis

Observations

  • The issue is well-structured for an epic: clear context (Epic 3 of 5), explicit child task breakdown (TEST-1/2/3), measurable acceptance criteria, and a Definition of Done that ties back to a downstream deliverable. This is better specified than most epics.
  • Ambiguity in "at least 5 service tests per Tier-1 backend domain": The term "Tier-1" is not defined. Without a definition, the implementer has discretion to choose the easiest services rather than the highest-risk ones. This is a specification gap that could undermine the safety net goal.
  • "Mutation-tested and proven to detect breakage" is the key phrase in AC 1. This needs an operationalized form: what output counts as proof? A PIT mutation score ≥ X%? A written list of 5+ surviving mutants that were subsequently caught by new/fixed tests? Without specifying the artifact, the criterion is subjective at close time.
  • "Tautological tests found have been rewritten" in AC 2 is clear, but it assumes mutation testing will find them. It's worth adding: "and a list of rewritten tests is recorded in the TEST-2 child issue" as evidence.
  • The DoD references docs/LEGIBILITY-READINESS.md (AUDIT-6) — verify this file exists or is expected to be created before this epic closes, otherwise the DoD has an unresolved dependency.
  • TEST-3's AC ("each critical journey has at least one e2e") covers 6 named journeys. The tag journey (create tag → apply to document → search by tag → filtered results) appears to be a gap in existing e2e specs and should be explicit in TEST-3's scope.

Recommendations

  • Define Tier-1 services in the TEST-1 child issue body: at minimum DocumentService, PersonService, TagService, GeschichteService. This makes the AC unambiguous at review time.
  • Specify the mutation testing artifact: "Results summarized as a comment on TEST-1 with: total mutants, killed%, surviving mutants listed, action taken for each survivor." This is auditable and prevents subjective close decisions.
  • Add an explicit non-goal: "This epic does not introduce new production functionality." Stating this prevents scope creep during TEST-2 (if a tautology rewrite reveals a real bug, fixing the bug is out of scope — a separate issue is filed).
  • Clarify E2E-in-CI scope in the epic body or the TEST-3 child: is wiring e2e into the Gitea CI pipeline in scope? This is the most impactful open decision (Sara and Tobias also flagged it).

Open Decisions

  • Tier-1 service definition: needs to be recorded so the AC is objectively verifiable. Options: (A) list services explicitly in TEST-1; (B) define Tier-1 as "any service with ≥1 write endpoint that touches two or more entities." Option A is simpler and immediately actionable. (Raised by: Elicit)
## 📋 Elicit (Requirements Engineer) — Brownfield Analysis ### Observations - The issue is well-structured for an epic: clear context (Epic 3 of 5), explicit child task breakdown (TEST-1/2/3), measurable acceptance criteria, and a Definition of Done that ties back to a downstream deliverable. This is better specified than most epics. - **Ambiguity in "at least 5 service tests per Tier-1 backend domain"**: The term "Tier-1" is not defined. Without a definition, the implementer has discretion to choose the easiest services rather than the highest-risk ones. This is a specification gap that could undermine the safety net goal. - **"Mutation-tested and proven to detect breakage"** is the key phrase in AC 1. This needs an operationalized form: what output counts as proof? A PIT mutation score ≥ X%? A written list of 5+ surviving mutants that were subsequently caught by new/fixed tests? Without specifying the artifact, the criterion is subjective at close time. - **"Tautological tests found have been rewritten"** in AC 2 is clear, but it assumes mutation testing will find them. It's worth adding: "and a list of rewritten tests is recorded in the TEST-2 child issue" as evidence. - The DoD references `docs/LEGIBILITY-READINESS.md` (AUDIT-6) — verify this file exists or is expected to be created before this epic closes, otherwise the DoD has an unresolved dependency. - TEST-3's AC ("each critical journey has at least one e2e") covers 6 named journeys. **The tag journey** (create tag → apply to document → search by tag → filtered results) appears to be a gap in existing e2e specs and should be explicit in TEST-3's scope. ### Recommendations - **Define Tier-1 services in the TEST-1 child issue body**: at minimum `DocumentService`, `PersonService`, `TagService`, `GeschichteService`. This makes the AC unambiguous at review time. - **Specify the mutation testing artifact**: "Results summarized as a comment on TEST-1 with: total mutants, killed%, surviving mutants listed, action taken for each survivor." This is auditable and prevents subjective close decisions. - **Add an explicit non-goal**: "This epic does not introduce new production functionality." Stating this prevents scope creep during TEST-2 (if a tautology rewrite reveals a real bug, fixing the bug is out of scope — a separate issue is filed). - **Clarify E2E-in-CI scope** in the epic body or the TEST-3 child: is wiring e2e into the Gitea CI pipeline in scope? This is the most impactful open decision (Sara and Tobias also flagged it). ### Open Decisions - **Tier-1 service definition**: needs to be recorded so the AC is objectively verifiable. Options: (A) list services explicitly in TEST-1; (B) define Tier-1 as "any service with ≥1 write endpoint that touches two or more entities." Option A is simpler and immediately actionable. _(Raised by: Elicit)_
Author
Owner

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Observations

No direct UX or accessibility concerns from this epic — it is purely a backend/test quality initiative with no frontend visual changes.

I checked what's covered: geschichten.spec.ts includes an axe scan (AxeBuilder.withTags(['wcag2a', 'wcag2aa'])), and transcription.spec.ts includes transcription panel passes axe accessibility check. The blocking-severity filter in geschichten.spec.ts (filtering to serious and critical violations only) is a pragmatic gate. These pass today and TEST-3 should verify they still pass after any test suite changes — but since this epic doesn't touch UI code, there's no regression risk here.

One note for the record: the e2e suite's accessibility.spec.ts runs Axe on key pages. If TEST-3 adds a new tag-filter journey, it should include an axe scan on the tag-filtered document list page if that path hasn't been scanned before.

No open decisions from my angle.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist ### Observations No direct UX or accessibility concerns from this epic — it is purely a backend/test quality initiative with no frontend visual changes. I checked what's covered: `geschichten.spec.ts` includes an axe scan (`AxeBuilder.withTags(['wcag2a', 'wcag2aa'])`), and `transcription.spec.ts` includes `transcription panel passes axe accessibility check`. The blocking-severity filter in `geschichten.spec.ts` (filtering to `serious` and `critical` violations only) is a pragmatic gate. These pass today and TEST-3 should verify they still pass after any test suite changes — but since this epic doesn't touch UI code, there's no regression risk here. One note for the record: the e2e suite's `accessibility.spec.ts` runs Axe on key pages. If TEST-3 adds a new tag-filter journey, it should include an axe scan on the tag-filtered document list page if that path hasn't been scanned before. No open decisions from my angle.
Author
Owner

🗳️ Decision Queue — Action Required

3 decisions need your input before implementation starts.

Test Strategy

  • Mutation testing tooling: one-off vs CI gate — PIT (org.pitest:pitest-maven) is the standard Java choice. The question is whether to run it once during TEST-1 and report results as a comment, or wire it as a permanent CI gate. One-off is appropriate for this epic's purpose (verifying pre-refactor safety). A permanent gate is a separate investment with real CI time cost. Pick one and record it in the TEST-1 child issue. (Raised by: Markus)

  • E2E in CI: in scope or deferred? — The DoD says "refactor can run with full confidence," but the e2e suite is not wired into CI (e2e/CLAUDE.md confirms this). A passing e2e suite that only runs locally is a documentation artifact, not an enforced safety net. Options: (A) add an e2e-tests job to ci.yml using the NAS runner's existing Docker access — infrastructure cost is low since Testcontainers already uses it; (B) explicitly document this as a known gap that doesn't block Epic 4, with a follow-up issue. Either choice is defensible — the risk is leaving it unstated. (Raised by: Sara, Tobias)

Requirements Clarity

  • "Tier-1 backend domain" definition — The AC requires "at least 5 service tests per Tier-1 backend domain" but Tier-1 is not defined. Without a definition, the implementer has discretion to choose the easiest services. Recommended fix: list the services explicitly in the TEST-1 child issue body (DocumentService, PersonService, TagService, GeschichteService are the obvious candidates). Option B: define Tier-1 as "any service with ≥1 write endpoint that touches two or more entities." Option A is simpler and immediately actionable. (Raised by: Elicit, Sara)
## 🗳️ Decision Queue — Action Required _3 decisions need your input before implementation starts._ ### Test Strategy - **Mutation testing tooling: one-off vs CI gate** — PIT (`org.pitest:pitest-maven`) is the standard Java choice. The question is whether to run it once during TEST-1 and report results as a comment, or wire it as a permanent CI gate. One-off is appropriate for this epic's purpose (verifying pre-refactor safety). A permanent gate is a separate investment with real CI time cost. Pick one and record it in the TEST-1 child issue. _(Raised by: Markus)_ - **E2E in CI: in scope or deferred?** — The DoD says "refactor can run with full confidence," but the e2e suite is not wired into CI (`e2e/CLAUDE.md` confirms this). A passing e2e suite that only runs locally is a documentation artifact, not an enforced safety net. Options: (A) add an `e2e-tests` job to `ci.yml` using the NAS runner's existing Docker access — infrastructure cost is low since Testcontainers already uses it; (B) explicitly document this as a known gap that doesn't block Epic 4, with a follow-up issue. Either choice is defensible — the risk is leaving it unstated. _(Raised by: Sara, Tobias)_ ### Requirements Clarity - **"Tier-1 backend domain" definition** — The AC requires "at least 5 service tests per Tier-1 backend domain" but Tier-1 is not defined. Without a definition, the implementer has discretion to choose the easiest services. Recommended fix: list the services explicitly in the TEST-1 child issue body (`DocumentService`, `PersonService`, `TagService`, `GeschichteService` are the obvious candidates). Option B: define Tier-1 as "any service with ≥1 write endpoint that touches two or more entities." Option A is simpler and immediately actionable. _(Raised by: Elicit, Sara)_
Author
Owner

Epic complete

All three child issues resolved:

Issue Result
#403 TEST-1 — Mutation audit 35/35 mutations detected — suite is trustworthy
#404 TEST-2 — Rewrite tautological tests No work needed — zero tautological tests found
#405 TEST-3 — E2E coverage All 12 journeys covered — 8 new tests added for 6 gaps

Audit artifacts committed

  • docs/audits/test-mutation-report.md — full mutation-by-mutation results
  • docs/audits/e2e-coverage-report.md — 12-journey coverage map

PR: #430
Backend: 1503 tests, 0 failures

## Epic complete ✅ All three child issues resolved: | Issue | Result | |-------|--------| | #403 TEST-1 — Mutation audit | **35/35 mutations detected** — suite is trustworthy | | #404 TEST-2 — Rewrite tautological tests | **No work needed** — zero tautological tests found | | #405 TEST-3 — E2E coverage | **All 12 journeys covered** — 8 new tests added for 6 gaps | ### Audit artifacts committed - `docs/audits/test-mutation-report.md` — full mutation-by-mutation results - `docs/audits/e2e-coverage-report.md` — 12-journey coverage map PR: #430 Backend: 1503 tests, 0 failures
Sign in to join this conversation.
No Label epic legibility test
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#402