test(legibility): mutation-test critical backend service tests; flag tautologies #403

Closed
opened 2026-05-04 16:10:55 +02:00 by marcel · 9 comments
Owner

Context

Part of Epic #402 — Pre-flight. This is TEST-1: verify that critical-path service tests are real (would fail if the service were broken) before the big-bang refactor moves anything.

Per the Legibility Rubric, this addresses C8.1 (Critical).

Goal

Spot-check at least 5 service tests per Tier-1 backend domain (document, person, tag, user, geschichte, notification, ocr — 7 domains × 5 tests = 35 tests minimum). For each, perform a small mutation in the service under test and confirm the test fails.

Method

Two viable approaches — pick one:

Approach A — Manual targeted mutation (faster for solo + LLM)

For each selected test:

  1. Identify the service method it claims to test
  2. Make a "dumb" mutation in that method (e.g., negate a condition, return null, swap two args, comment out a save call)
  3. Run only that test
  4. Expected: test FAILS
  5. Revert the mutation
  6. Record the outcome in the audit log (mutation detected: yes/no)
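As an illustration of steps 2–4, here is a toy sketch of a guard-negation mutation and the kind of assertion that catches it (self-contained; every name is hypothetical, nothing here is taken from the real codebase):

```java
// Toy sketch of a guard-negation mutation (step 2) and why a real
// assertion detects it. All names are hypothetical illustrations.
public class MutationDemo {

    // Stand-in service method: returns the document if the caller may read it.
    static String getDocument(boolean callerMayRead, boolean mutated) {
        // The mutation negates the guard, as in `if (x)` -> `if (!x)`.
        boolean allowed = mutated ? !callerMayRead : callerMayRead;
        if (!allowed) throw new IllegalStateException("forbidden");
        return "doc-42";
    }

    public static void main(String[] args) {
        // A real test asserts the returned value, so flipping the guard flips the outcome.
        boolean detected;
        try {
            detected = !getDocument(true, true).equals("doc-42"); // run against the mutant
        } catch (IllegalStateException e) {
            detected = true; // happy path now throws: the test fails, mutation detected
        }
        System.out.println(detected ? "DETECTED" : "NOT DETECTED");
    }
}
```

A tautological test (one that never looks at the return value or the thrown exception) keeps passing no matter which way the guard points; that is exactly what step 4 screens for.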

Approach B — PIT (mutation testing tool for Java/Maven)

  1. Add PIT plugin to backend pom.xml (dev profile only, not committed permanently)
  2. Run ./mvnw org.pitest:pitest-maven:mutationCoverage
  3. Read the report — focus on Tier-1 domain services
  4. Record per-class mutation score
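For step 1, if Approach B is ever used, a minimal PIT declaration might look like the sketch below (the version number and package pattern are assumptions to verify, not taken from the project's pom.xml):

```xml
<!-- Sketch only: add under a dev profile and do not commit permanently. -->
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <!-- assumed version; check the current PIT release before use -->
  <version>1.15.8</version>
  <configuration>
    <!-- illustrative package pattern restricting mutation to Tier-1 services -->
    <targetClasses>
      <param>org.raddatz.familienarchiv.service.*</param>
    </targetClasses>
  </configuration>
</plugin>
```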

Approach A is recommended for this issue — it gives concrete, per-test answers without requiring tooling setup.

Selection criteria for the 35 tests

For each of the 7 Tier-1 domains, sample tests in this order until 5 are picked:

  1. Tests for the most complex service method (longest, most branches)
  2. Tests for the most-called method (search controller usage)
  3. Tests for security-sensitive methods (anything tied to permission checks)
  4. Tests for state-transition methods (DocumentStatus changes, transcription review, etc.)
  5. Tests for cross-domain orchestration methods (e.g., MassImportService, RelationshipInferenceService)

Required output

Add docs/audits/test-mutation-report.md containing:

| Domain | Test class | Test method | Mutation made | Result | Action needed |
|---|---|---|---|---|---|

Where Result is DETECTED / NOT DETECTED (tautological).

Summary at the bottom: per-domain pass-rate and a list of all NOT DETECTED tests, which feed into TEST-2.

Acceptance criteria

  • At least 5 tests per Tier-1 backend domain (≥35 total) have been mutation-tested
  • docs/audits/test-mutation-report.md exists with per-test results
  • Tautological tests (NOT DETECTED) are listed for TEST-2 to fix
  • Pass-rate per domain is reported
  • No tests were left mutated; all reverts verified by running the full backend test suite green
  • PR opened (adding the report) and merged before this issue is closed

Definition of Done

Report committed; closing comment summarizes the per-domain pass-rate and lists how many follow-up tests TEST-2 needs to write.

Dispatch

Best done by an agent with full file-edit access (not a read-only Explore agent). Memory: source SDKMAN before running mvn.

marcel added this to the (deleted) milestone 2026-05-04 16:10:55 +02:00
marcel added the P1-high, legibility, test labels 2026-05-04 16:11:56 +02:00
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

  • The existing test suite is well-structured: all 7 Tier-1 domains have dedicated *ServiceTest.java files under backend/src/test/java/org/raddatz/familienarchiv/service/. Test counts per domain: DocumentServiceTest (147 tests), PersonServiceTest (49), UserServiceTest (51), TagServiceTest (40), NotificationServiceTest (27), GeschichteServiceTest (22), OcrServiceTest (8). The OCR domain stands out as thin.
  • All files use @ExtendWith(MockitoExtension.class) + @InjectMocks — correct pattern for fast unit tests. Mutations will be quick to run and revert since no Spring context boots.
  • The issue recommends Approach A (manual mutation) and that's the right call for this codebase. All tests mock their dependencies, so mutations in the service method are immediately visible to the test: no integration layer noise.

Recommendations

  • Start with OcrServiceTest (8 tests). With the fewest tests and a domain that touches async state transitions, it's the highest-risk place for tautologies. Five mutations there will tell you whether the tests actually guard the job state machine.
  • For DocumentServiceTest, focus on the search/filter methods first — these tend to be complex (Specification-based), and a negated condition or dropped predicate is the classic tautology pattern in Mockito tests where the repo is fully mocked.
  • Reverts: use git stash after each mutation run rather than manual undo. One stash per mutation keeps reverts clean and eliminates "did I revert that?" doubt. Check that git diff is empty before moving on to the next domain.
  • Name the mutations in the audit log precisely. "Negated condition on line 74" is not enough — write the before/after one-liner so the report is actionable for TEST-2.
  • The GeschichteServiceTest uses SecurityContextHolder manipulation in @BeforeEach/@AfterEach. When mutating GeschichteService permission checks, run the full GeschichteServiceTest class, not just one test — those tests share security context state and a mutation in one guard can cascade silently.

Open Decisions

  • OCR domain boundary: OcrServiceTest only tests OcrService (8 tests). There are also OcrBatchServiceTest (separate file) and OcrProgressServiceTest. Does "ocr domain — 5 tests" mean 5 from OcrServiceTest alone, or 5 from the combined ocr/* service tests? Worth clarifying before starting so the report's domain column is consistent.
Author
Owner

🏛️ Markus Keller — Application Architect

Observations

  • This issue is doing the right thing structurally: mutation testing is the only reliable way to distinguish coverage theatre from coverage substance. JaCoCo reports 88% branch coverage (gate in pom.xml), but branch coverage is a necessary, not sufficient, condition for test quality. A test that stubs the repo and never asserts the result passes JaCoCo green while catching nothing.
  • The issue limits scope to 7 Tier-1 service classes. That's the correct architectural focus — services are where business logic lives, per the layering rules. Controllers are thin; repositories are interfaces. The value-at-risk is in services.
  • docs/audits/ does not exist yet. The issue requires creating it and committing test-mutation-report.md. This is a documentation artifact, not code — its existence in the repo is the acceptance criterion. That's appropriate: audit findings have the same lifecycle as ADRs and belong in version control.

Recommendations

  • Keep the docs/audits/ directory alongside docs/adr/ — consistent with the existing documentation structure (docs/adr/001-ocr-python-microservice.md through 006-...). Don't create it under backend/ or anywhere else.
  • The report format proposed in the issue is sufficient. Do not over-engineer it. A Markdown table with Domain / Test class / Test method / Mutation made / Result / Action needed captures everything TEST-2 needs to act on.
  • If tautological tests are found, do NOT fix them in this issue. The issue's own scope boundary is clear: detect and report. Fixing belongs to TEST-2. Mixing detect + fix in one PR makes the audit trail unreadable.
  • No PIT plugin should be committed to pom.xml, even in a dev profile. The issue correctly marks Approach B as "not preferred." A committed plugin adds noise to the build descriptor and is easy to forget to remove. Approach A produces the same actionable data with less infrastructure overhead.

Open Decisions (none — Approach A is clearly the right choice here)

Author
Owner

🧪 Sara Holt — QA Engineer & Test Strategist

Observations

  • Current test counts per domain are healthy for most domains but thin for OCR (8 tests in OcrServiceTest). The issue's 5-per-domain minimum means OCR will need 5 of those 8 tested — that's 62.5% of the entire OCR unit test suite, which gives a meaningful signal but also means any tautologies found there have high impact.
  • The issue is correctly scoped to service-layer unit tests (Mockito-based, no Spring context). This is the right layer for mutation testing because: fast to run, mutations are isolated to one service method, and reverts have no side effects on DB or file state.
  • The acceptance criteria are clear and measurable. The table format with DETECTED / NOT DETECTED columns maps directly to TEST-2's input, which is good issue chaining.
  • The "revert and run full backend test suite green" criterion is the most important safety check. It prevents a half-reverted mutation from corrupting the baseline.

Recommendations

  • Prioritize tests for state-transition methods first (as the issue suggests for priority 4). DocumentStatus transitions and GeschichteStatus transitions are the highest-value mutation targets — a dropped status = REVIEWED assignment fails silently in a mocked unit test if the test only asserts that save() was called and not what was saved. Use ArgumentCaptor to verify the status field, not just that save was invoked.
  • For the GeschichteServiceTest: check tests with SecurityContextHolder manipulation carefully. A mutation that removes a permission guard will only be detected if the test actually asserts the exception — asserting getById returns a non-null result when the user has no BLOG_WRITE would be a tautology even with the mock in place. This is the category most likely to surface NOT DETECTED cases.
  • Record the full test class name in the audit log, not just the method. TEST-2 will need to locate the file, and GeschichteServiceTest.getById_throws_NOT_FOUND_for_draft_when_user_lacks_BLOG_WRITE is unambiguous in a way that getById_throws... alone is not.
  • After the full audit, run ./mvnw verify (not just test) to confirm the JaCoCo 88% gate still passes with all mutations reverted. This is the "reverts verified" acceptance criterion.
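The ArgumentCaptor point above can be shown with a hand-rolled stand-in (no Mockito; all names below are hypothetical sketch code, not the project's classes):

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled sketch of "assert WHAT was saved, not just that save() was
// called". All names are hypothetical; a real test would use Mockito's
// ArgumentCaptor instead of this fake repository.
public class CaptorDemo {
    enum Status { NEW, REVIEWED }
    static class Document { Status status = Status.NEW; }

    // Fake repository that records every saved entity, like a captor would.
    static class FakeRepo {
        final List<Document> saved = new ArrayList<>();
        void save(Document d) { saved.add(d); }
    }

    // Service under test; the mutation drops the status assignment.
    static void review(Document d, FakeRepo repo, boolean mutated) {
        if (!mutated) d.status = Status.REVIEWED;
        repo.save(d);
    }

    public static void main(String[] args) {
        FakeRepo repo = new FakeRepo();
        review(new Document(), repo, true); // run against the mutated service

        // Tautological check: save() was invoked. Passes even under the mutation.
        System.out.println("save called: " + (repo.saved.size() == 1));
        // Captor-style check: inspect the saved status. Fails under the mutation.
        System.out.println("status reviewed: " + (repo.saved.get(0).status == Status.REVIEWED));
    }
}
```

The first check stays green whether or not the assignment was mutated away; only the second one turns the mutation into a DETECTED result.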
Author
Owner

🔒 Nora "NullX" Steiner — Application Security Engineer

Observations

  • The issue correctly flags security-sensitive methods as priority 3 in the selection criteria. This is where tautological tests cause the most damage — a test that "verifies" a permission check but doesn't actually detect its removal gives false confidence in the security posture.
  • From scanning the existing tests: GeschichteServiceTest directly tests permission enforcement by manipulating SecurityContextHolder. These are the most security-relevant unit tests in the service layer. UserServiceTest tests auth-adjacent flows (password change, email uniqueness). These two domains should be mutation-tested with extra care on the assertThatThrownBy patterns.
  • The NotificationServiceTest (27 tests) covers cross-user notification flows. A mutation that drops the recipient filter (e.g., sends a notification to all users instead of the target user) would be a data-exposure bug. Tests that only verify mailSender.send() was called but don't verify who it was sent to are tautological from a security perspective.

Recommendations

  • For security-gated methods, always mutate the condition itself, not the consequence. For example, in GeschichteService.getById(), don't just comment out the DomainException.forbidden() throw — negate the permission check condition (e.g., remove the !). This tests whether the guard is what triggers the assertion, not just whether the exception path is reachable.
  • In NotificationServiceTest, look for ArgumentCaptor usage. Tests that verify save(notification) was called but use any() as the argument matcher are security-blind — they can't detect mutations that send notifications to the wrong recipient. Flag these as NOT DETECTED risk even before mutating, as a pre-screening step.
  • The audit report's "Action needed" column should explicitly call out security implications for any NOT DETECTED case involving auth/permission logic. These cases are not interchangeable with ordinary NOT DETECTED rows: they need TEST-2 to write a test with a specific security assertion, not just a coverage improvement.
  • UserServiceTest changePassword tests: verify that the mutation test checks both "wrong user cannot change another user's password" AND "correct user can change their own." If only the happy path is covered, a mutation removing the ownership check won't be detected.
Author
Owner

🚀 Tobias Wendt — DevOps & Platform Engineer

Observations

  • This issue is entirely a local dev-time activity — no CI pipeline changes, no Docker changes, no infrastructure touch. From a DevOps perspective, that's correct scoping. Mutation testing by Approach A is a manual, agent-executed task.
  • One infrastructure concern: the issue requires source SDKMAN before running mvn (noted in project memory). The agent executing this will need to source ~/.sdkman/bin/sdkman-init.sh before each ./mvnw test -Dtest=ClassName invocation, otherwise Maven won't resolve Java 21 and test runs will fail with a confusing error rather than a mutation signal.
  • The docs/audits/ directory doesn't exist yet. Creating it and committing the report file is part of the acceptance criteria. Trivial, but the PR should be just the audit report — no production code changes. Keeping that PR surgically small makes review and revert easy.

Recommendations

  • Shell invocation for each mutation test run: source ~/.sdkman/bin/sdkman-init.sh && cd backend && ./mvnw test -Dtest=<ClassName>#<methodName> — run the single test method, not the whole class, to keep turnaround fast. Switch to the full class only when the GeschichteServiceTest SecurityContext sharing is relevant.
  • Verify revert with a full test run before committing the report. Run ./mvnw clean verify (not just test) at the end to confirm: (a) all mutations are reverted, (b) JaCoCo 88% gate still passes. The verify phase runs the JaCoCo check goal; test alone does not.
  • The report file belongs at docs/audits/test-mutation-report.md — consistent with the existing docs/adr/ convention. Create docs/audits/ as a new directory with just this file. No README.md needed.
  • No concerns about CI impact — this is a documentation-only PR. Build time is unaffected.
Author
Owner

📋 Elicit — Requirements Engineer

Observations

  • The issue is well-structured for its purpose: clear goal, explicit method, concrete selection criteria, defined output format, and measurable acceptance criteria. It would pass a Definition of Ready check with one minor flag (see below).
  • The relationship to Epic #402 and TEST-2 is explicit, which is good issue chaining. The output of this issue (the list of NOT DETECTED tautologies) is directly the input to TEST-2. That traceability is exactly what a preflight audit should have.
  • The "Definition of Done" is unambiguous: report committed, closing comment summarizes per-domain pass-rate, TEST-2 input list produced. No hidden completion criteria.

Recommendations

  • Flag for Definition of Ready — one missing element: The issue says "at least 5 tests per Tier-1 backend domain (≥35 total)" but doesn't specify what happens if a domain has fewer than 5 tests after applying the selection criteria. OcrServiceTest currently has 8 tests — all 5 will come from there. But what if a future domain had only 3 tests? The issue should clarify: "if fewer than 5 tests exist, test all available and note the domain as under-sampled." This avoids ambiguity during execution.
  • The "notification" domain boundary is ambiguous. Is "notification" domain covered by NotificationServiceTest alone, or does it include SseEmitterRegistry tests if any exist? The issue should canonically define which test class maps to each domain name in the 7-domain list. A one-line mapping table would close this.
  • The output table lacks a "Reverted" confirmation column. Adding a boolean Reverted? column to the audit table makes the "all reverts verified" acceptance criterion machine-checkable from the report itself, rather than requiring the reader to trust the closing comment.
  • These are minor refinements — the issue is implementable as written. The above would only matter if TEST-2 needs to consume the report programmatically or if the audit is revisited months later.
Author
Owner

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Observations

  • This issue is entirely backend/testing infrastructure with no UI component. From a UX perspective, there is nothing to review in terms of interaction patterns, accessibility, or responsive design.

What I checked

  • Verified the issue has no frontend scope, no new UI surfaces, no design decisions, and no user-facing output. The only artifact produced is docs/audits/test-mutation-report.md, which is an internal developer document.

No concerns from my angle.


🗳️ Decision Queue — Action Required

2 decisions need your input before implementation starts.

Scope / Domain Boundaries

  • OCR domain boundary — OcrServiceTest has 8 tests, but there are also OcrBatchServiceTest and OcrProgressServiceTest in the same test directory. Options: (A) "ocr domain" = OcrServiceTest only, keeping domain names 1:1 with service classes, or (B) "ocr domain" = all three OCR test files treated as one domain. Option A is simpler and consistent with the other 6 domains (each maps to one service class). Option B gives a more complete signal but makes the domain column in the report ambiguous. (Raised by: Felix)

  • Under-sampled domain policy — The issue requires ≥5 tests per domain, but if a domain has fewer than 5 tests total (e.g., a future domain with only 3), the issue is silent on what to do. Options: (A) "test all available tests and mark the domain as under-sampled in the report," or (B) "treat this as a blocker — TEST-2 must first add tests before mutation testing can proceed." Option A is pragmatic for a pre-flight audit; Option B is stricter but creates a dependency between TEST-2 and this issue. (Raised by: Elicit)

Report Format

  • "Reverted?" column in the audit table — The acceptance criteria require "all reverts verified by running the full backend test suite green" but the proposed table format has no column for this. Options: (A) add a Reverted? boolean column to every row, making revert confirmation traceable per-test, or (B) keep the table as-is and confirm revert once in the closing comment. Option A costs nothing in effort but makes the report self-contained and auditable. Option B is simpler and the closing comment is part of the Definition of Done anyway. (Raised by: Elicit)
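
For illustration, a report row under Option A might look like this (the domain, test, and mutation names are hypothetical, not taken from the repo):

```
| Domain   | Test                                 | Mutation applied     | Detected? | Reverted? |
|----------|--------------------------------------|----------------------|-----------|-----------|
| document | DocumentServiceTest#updateStatus_ok  | negated status check | yes       | yes       |
```

With this shape, "all reverts verified" can be checked by scanning the `Reverted?` column alone.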

TEST-1 Complete

All 35 mutations detected (100%).

Applied 5 targeted mutations to each of the 7 Tier-1 service domains (document, person, tag, user, geschichte, notification, ocr) and ran the paired test in isolation for each.

No tautological tests found. Every mutation caused a clear, specific assertion failure.
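
For context, a minimal sketch of what one Approach-A check does. All names here are illustrative (there is no `isVisible` method or `PUBLISHED` status claimed in the repo): the original logic and its negated-condition mutant are compared, and a non-tautological test is exactly one whose verdict flips between the two.

```java
// Sketch of Approach A with hypothetical names: a "dumb" mutation (negated
// condition) must flip a real test's verdict from pass to fail.
public class MutationCheckSketch {

    // Illustrative original service logic.
    static boolean isVisible(String status) {
        return "PUBLISHED".equals(status);
    }

    // The same logic with the condition negated -- the applied mutation.
    static boolean isVisibleMutated(String status) {
        return !"PUBLISHED".equals(status);
    }

    public static void main(String[] args) {
        // A meaningful assertion passes on the original...
        boolean original = isVisible("PUBLISHED");   // true
        // ...and the identical assertion fails against the mutant.
        boolean mutated = isVisibleMutated("PUBLISHED"); // false
        System.out.println(original && !mutated
                ? "MUTATION DETECTED"
                : "NOT DETECTED");
    }
}
```

In practice the paired test is run in isolation via Surefire (e.g. `./mvnw test -Dtest=SomeServiceTest`, class name hypothetical), and the mutation is reverted afterwards, mirroring steps 3–5 of Approach A.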

Full results: docs/audits/test-mutation-report.md
PR: #430


Reference: marcel/familienarchiv#403