Bug: PersonService.findOrCreateByAlias kann NonUniqueResultException werfen bei case-kollidierenden Aliassen (Schwester von #730) #731

Closed
opened 2026-06-06 10:54:46 +02:00 by marcel · 1 comment
Owner

Summary

Same structural lookup bug as #730, in the Person domain — and it lives on two lookup surfaces, not one:

  1. Alias pathPersonService.findOrCreateByAlias(rawName) resolves via personRepository.findByAliasIgnoreCase(alias)Optional<Person>. If two persons have aliases that collide case-insensitively (e.g. müller and Müller), the case-insensitive derived query matches two rows and Hibernate throws NonUniqueResultExceptionIncorrectResultSizeDataAccessException → generic 500. Latent — only triggers once two case-colliding rows coexist.

  2. Name pathPersonService.findByName(firstName, lastName)personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase(...)Optional<Person>, called by DocumentService.storeDocument (DocumentService.java:222) to resolve the upload sender from the parsed filename (.orElse(null)). Two real people sharing a first+last name (Hans Müller / hans müller) is routine, so this path can throw the same 500 on upload. The original draft asked only to "confirm its caller can't hit the throw" — review established it can, so it is in scope here, not a follow-up.

Found while fixing #730 (the tag-domain instance). Both are the same root cause; fix both in this one PR (near-identical diff).

Depends on / branch after #730. This work mirrors #730. The #731 branch MUST be cut off a base that already has #730 merged — otherwise TagService.findOrCreate is still the old throwing Optional<…>IgnoreCase form and there is nothing to "mirror" (and no tag code to share a helper with). Do not implement #731 on a #730-less base.

Root cause

Optional<…> findBy…IgnoreCase(…) is a leaky identity that only works while the queried column is globally unique case-insensitively. There is no such constraint on persons.alias nor on first/last name, and the importer can legitimately create colliding rows. Like #730 this is a lookup problem, not a data problem — do not add a unique(lower(alias)) constraint and do not merge/dedupe persons. source_ref is the stable identity (ADR-025 / upsertBySourceRef); alias/name are human-editable labels.

Definition — "ambiguous": 2+ persons match the queried label case-insensitively with no exact-case match. An exact-case match among case-folding siblings is not ambiguous — the exact row always wins.

Affected code

  • backend/.../person/PersonService.java
    • findOrCreateByAlias (~line 125), via personRepository.findByAliasIgnoreCase. Note: the create branch (:130-155) is non-trivial — SKIP early-return, INSTITUTION/GROUP vs. name-split + maiden-name alias (PersonNameParser.split + aliasRepository). It must be preserved verbatim; do not flatten it to #730's tag one-liner. The .orElseGet(...) lambda simply moves below the new exact/CI guards.
    • findByName (~line 114), via personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase. The service signature stays Optional<Person> — only its internals change.
  • backend/.../person/PersonRepository.java
    • Optional<Person> findByAliasIgnoreCase(String alias) (line 33)
    • Optional<Person> findByFirstNameIgnoreCaseAndLastNameIgnoreCase(String firstName, String lastName) (line 39)
  • backend/.../document/DocumentService.java:222 — the name-path caller (upload sender resolution). Layering is clean: it already resolves through personService.findByName(...), not the repository, so the fix stays entirely inside PersonService + PersonRepository. Do not lift resolution logic up into DocumentService.

Fix shape

Alias path — mirror #730 (exact-case-first, non-throwing)

String alias = rawName.trim();
// ... SKIP early-return stays ...
Optional<Person> exact = personRepository.findByAlias(alias);
if (exact.isPresent()) return exact.get();                          // exact-case wins
List<Person> ci = personRepository.findAllByAliasIgnoreCase(alias);
if (!ci.isEmpty()) return ci.stream().min(Comparator.comparing(Person::getId)).orElseThrow(); // deterministic lowest id — NEVER throw
// ... existing create lambda (INSTITUTION/GROUP, split, maiden alias) stays ...

Delete the Optional<…> findByAliasIgnoreCase variant so the throwing call can't be reintroduced. Keep findOrCreateByAlias as the single resolution point. Use .orElseThrow() (not bare .get()) on the min() so the "list is non-empty here" invariant is explicit and matches the no-bare-.get() rule. (The #730 final form 80f6468d uses .orElseThrow() — mirror that, not the original .get().)

Name path — non-throwing, bail to null on ambiguity (decision below)

Add List<Person> findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(...) + an exact findByFirstNameAndLastName(...); delete the Optional<…>IgnoreCase variant. findByName returns:

  • exact-case match if present;
  • else, if exactly one case-insensitive match → that one;
  • else (two or more CI matches) → empty (the upload leaves the sender unset for a human to assign — never an arbitrary guess).

Fail-closed requirement on null first name. Parsed filenames can have a null first name; today the derived query folds that to a non-match. The new plural method must keep this: confirm Spring Data generates first_name = ? (which never matches a null arg), not first_name IS NULL. A first_name IS NULL predicate would silently widen the result and pull a last-name-only / institution row in as a "sender" — a quiet provenance-integrity defect. A null first name must resolve to no match. Write that failing test first (repo layer, real Postgres).

Resolved decision — name-collision sender resolution

When a filename resolves to a name shared (case-insensitively) by 2+ persons:

  • Alias path keeps deterministic lowest-id (A) — no human in the importer loop, and it creates-on-absent anyway.
  • Name/sender path bails to null (B) — the archive's value is correct provenance; a confidently-wrong pre-filled Hans Müller is worse than an empty field (senior reviewers won't re-check a pre-filled value). Leave the sender unset and let the human assign it.

The two divergent strategies are deliberate. The load-bearing comment at the name path must explain why it differs from the alias path (provenance correctness > pre-filled guess), not just that it differs — otherwise a future "consistency cleanup" reintroduces the confidently-wrong-sender bug.

Acceptance criteria

Scenario: findOrCreateByAlias exact-case match wins over a case-insensitive sibling
  Given persons with aliases "Müller" and "müller" both exist
  When findOrCreateByAlias is called with "müller"
  Then the person whose alias is exactly "müller" is returned

Scenario: findOrCreateByAlias exact-case match wins even when multiple siblings collide
  Given an exact-case alias "Müller" exists alongside one or more case-colliding siblings
  When findOrCreateByAlias is called with the exact name "Müller"
  Then the exact-case row is returned and the siblings are ignored

Scenario: findOrCreateByAlias never throws on case-insensitive duplicates
  Given two persons collide case-insensitively on alias and neither equals the queried name exactly
  When findOrCreateByAlias is called with that name
  Then it returns the same one deterministically (lowest id) on every call, without throwing

Scenario: findOrCreateByAlias still creates when absent (with maiden-name alias)
  Given no person matches the alias "Clara Cram geb. Bauer"
  When findOrCreateByAlias is called
  Then a new person is created AND a MAIDEN_NAME alias "Bauer" is written

Scenario: findOrCreateByAlias returns null and creates nothing for a SKIP name
  Given a raw name that classifies as PersonType.SKIP
  When findOrCreateByAlias is called
  Then it returns null and no person is created

Scenario: Upload with a unique name resolves the sender
  Given exactly one person is named "Hans Müller"
  When a document is uploaded whose filename resolves to "Hans Müller"
  Then the document's sender is set to that person and the save succeeds

Scenario: Upload with a single case-insensitive name match resolves the sender
  Given exactly one person matches "hans müller" case-insensitively (stored as "Hans Müller")
  When a document is uploaded whose filename resolves to "hans müller"
  Then the document's sender is set to that person and the save succeeds

Scenario: Upload with an ambiguous name leaves the sender unset (no 500)
  Given two persons share the name "Hans Müller" / "hans müller"
  When a document is uploaded whose filename resolves to that name
  Then the save succeeds (no 500) AND the sender is left null (not an arbitrary person)

Scenario: Upload with a last-name-only filename leaves the sender unset
  Given a filename resolves to a last name only (no first name)
  When the sender is resolved
  Then the sender is left null (the null first name never widens the match)

Test strategy

Unit (PersonServiceTest, mocked repo) — each branch a separate test:

  • exact-case wins · exact-case wins among multiple siblings · single CI match → use it · multiple CI (no exact) → lowest id + determinism asserted by calling twice and asserting the same id · SKIP → null · create-when-absent still creates and writes the maiden-name alias · " Clara Cram " trims then resolves · null first name → folds to non-match.
  • Update the existing mocksfindByAliasIgnoreCase is mocked in 8 PersonServiceTest cases (lines 378, 390, 403, 425, 442, 460, 472, 476); they return Optional and won't compile after the method change.
  • DocumentServiceTest is unaffected — leave it alone. Its mocks at DocumentServiceTest.java:1072/1089/1104 stub personService.findByName(...) returning Optional; since the service signature stays Optional<Person>, they still compile. Do not edit them.

Repository (PersonRepositoryTest, @DataJpaTest) — MUST be migrated, not just deleted:

  • PersonRepositoryTest.java:123-154 calls findByAliasIgnoreCase (:126, :137) and findByFirstNameIgnoreCaseAndLastNameIgnoreCase (:146/:149) directly. Deleting those two repo variants breaks this test's compilation. Rewrite the three @Test methods against the new methods (findByAlias, findAllByAliasIgnoreCase, findByFirstNameAndLastName, findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase).
  • Add an umlaut row at the repository layer too (e.g. findAllByAliasIgnoreCase("müller") returns both Müller/müller) — proves the derived query itself folds case in real Postgres, independent of the service test.
  • Add the null-first-name repo test: findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(null, "Müller") returns empty (no widening). This NULL-semantics bug cannot be caught by a mocked unit test — assert it against real Postgres.

Integration (Testcontainers postgres:16, in PersonServiceIntegrationTest):

  • Umlaut pair is mandatory — seed Müller/müller; only the real DB proves Postgres LOWER() folds ü. A plain-ASCII test stays green while umlaut aliases regress.
  • Alias resolution: colliding pair → deterministic lowest id, no throw (split into two tests: one asserts the chosen id, one asserts the second call returns the same id).
  • Name path: seed Hans Müller / hans müller, drive the upload/findByName path → 200, sender null (assert it's null, not an arbitrary person). "No 500" alone is too weak — assert the surviving value. Add the happy single-match case (unique Hans Müller → sender set) and the null-first-name case.
  • PersonServiceIntegrationTest is already @Transactional (rollback) — keep it so seeded umlaut rows don't leak across tests.

Coverage: backend gate is 88% branch (JaCoCo). New branches (exact-present / single-CI / multi-CI / create / SKIP / null-firstName) each need a hitting test.

Verification / observability

  • Opaque error path — already satisfied today, pin it with a test. GlobalExceptionHandler has no dedicated handler for IncorrectResultSizeDataAccessException; it falls through to the generic @ExceptionHandler(Exception.class) (~line 100) → Sentry.captureException + log.error + a wire body of INTERNAL_ERROR / "An unexpected error occurred". No Hibernate class name, SQL, or row count reaches the client. After the fix this path no longer throws, but keep/add a regression test asserting the body carries no Hibernate class name, no SQL, no "2 results were returned" — it guards the handler against being weakened later.
  • Pre-deploy detection (read-only — run against prod with the app read role and attach counts to the PR so we know latent vs. already-active):
    SELECT lower(alias) AS k, count(*) FROM persons
     WHERE alias IS NOT NULL GROUP BY lower(alias) HAVING count(*) > 1;
    
    SELECT lower(first_name), lower(last_name), count(*) FROM persons
     GROUP BY lower(first_name), lower(last_name) HAVING count(*) > 1;
    
    If the alias query returns rows the import path can already 500; if the name query returns rows the upload path can. Both are read-only (GROUP BY … HAVING).
  • Post-deploy: search GlitchTip for IncorrectResultSizeDataAccessException / NonUniqueResultException filtered to person stack frames, record the group id in the PR, and confirm zero new occurrences in the 24h after deploy.

UX note

Option B leaves the sender unset on an ambiguous filename. Verified the empty-sender state already degrades cleanly and is tested (DocumentRow.svelte "renders the unknown placeholder when sender is null"; DocumentMetadataDrawer no-persons placeholder), and metadataComplete=false already routes the doc into the "needs completion" state. So this is adequately covered here — no UI work in this PR, no blocker.

Optional follow-up (file as a separate low-priority issue, not this PR): an inline hint on the edit form so the transcriber learns why the sender is empty. A11y spec for whoever picks it up: redundant cue (info icon + text, not color-only), German copy "Absender konnte aus dem Dateinamen nicht eindeutig zugeordnet werden", min 16px, text-ink on bg-surface (do not use brand-mint as text — ~2.8:1 fails AA), and aria-describedby tying the hint to the sender field. Keep the ambiguous-copy ("nicht eindeutig zuordenbar") distinct from a genuinely-unknown sender ("kein Absender").

Notes

  • NO DB migration. Code-only, fully reversible (roll back the JAR). Resist a unique(lower(...)) index — it's semantically wrong (collisions are valid) and would fail to apply against existing colliding rows, turning a clean deploy into a failed Flyway migration that blocks startup.
  • Implement as two atomic commits within the one PR (alias-path fix + tests; name-path fix + tests) so each path is independently revertible. Each commit starts with its failing test (red → green).
  • ADR — resolved (deliverable, not optional): extend ADR-032. docs/adr/032-tag-name-resolution-tolerates-case-collisions.md (created for #730, on main) already names this Person path as the tracked sibling ("The sibling Person path is unfixed but tracked … deferred to #731"). This fix is the same root-cause pattern, so add a "Person extension" section to ADR-032 rather than a new ADR — keep the case-collision resolution story in one document. The section records: alias path = deterministic lowest-id (mirrors tag); name/sender path = bail-to-null with the provenance rationale (the one genuinely new wrinkle vs. the tag decision); and the same "no unique(lower(...)) constraint" stance. Flip the load-bearing code comment to point at ADR-032 (not just "#730"). (Aside, separate housekeeping: the feat/issue-684-person-delete-db-fk branch independently created a different docs/adr/032-person-delete-db-level-integrity.md — an ADR-number collision to resolve when that branch merges; it does not affect this issue, which extends the main ADR-032.)
  • Skip the shared resolveExactThenCi(...) helper. The two paths have different fallbacks (lowest-id vs. null-on-ambiguity), and on a correctly-based branch the tag code (#730) is the only other user — extracting a 3-way-shared helper is premature. Inline the exact→CI→fallback logic in each path with the load-bearing comment; that's the KISS choice. (If a future refactor does extract it, the fallback must be a parameter, never buried.)
  • Doc currency: update the person/README.md method table (lines 23–24) for the changed findByName / findOrCreateByAlias signatures. Also fix line 23's descriptionfindByName is currently mislabelled "Typeahead search"; its sole caller is filename-based sender resolution in storeDocument. No DB-diagram update (no schema change). No C4 update (no new package/route/service).
  • Leave a load-bearing comment at both repo methods and both resolution points: case-colliding alias/name across persons is valid; do NOT add a uniqueness constraint.
## Summary Same structural lookup bug as #730, in the Person domain — and it lives on **two** lookup surfaces, not one: 1. **Alias path** — `PersonService.findOrCreateByAlias(rawName)` resolves via `personRepository.findByAliasIgnoreCase(alias)` → `Optional<Person>`. If two persons have aliases that collide **case-insensitively** (e.g. `müller` and `Müller`), the case-insensitive derived query matches two rows and Hibernate throws `NonUniqueResultException` → `IncorrectResultSizeDataAccessException` → generic 500. **Latent** — only triggers once two case-colliding rows coexist. 2. **Name path** — `PersonService.findByName(firstName, lastName)` → `personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase(...)` → `Optional<Person>`, called by `DocumentService.storeDocument` (`DocumentService.java:222`) to resolve the upload **sender** from the parsed filename (`.orElse(null)`). Two real people sharing a first+last name (`Hans Müller` / `hans müller`) is **routine**, so this path can throw the same 500 on upload. The original draft asked only to "confirm its caller can't hit the throw" — review established it **can**, so it is in scope here, not a follow-up. Found while fixing #730 (the tag-domain instance). Both are the same root cause; fix both in this one PR (near-identical diff). > **Depends on / branch after #730.** This work mirrors #730. The #731 branch MUST be cut off a base that already has #730 merged — otherwise `TagService.findOrCreate` is still the old throwing `Optional<…>IgnoreCase` form and there is nothing to "mirror" (and no tag code to share a helper with). Do not implement #731 on a #730-less base. ## Root cause `Optional<…> findBy…IgnoreCase(…)` is a leaky identity that only works while the queried column is globally unique case-insensitively. There is no such constraint on `persons.alias` nor on first/last name, and the importer can legitimately create colliding rows. Like #730 this is a **lookup** problem, not a data problem — do **not** add a `unique(lower(alias))` constraint and do **not** merge/dedupe persons. `source_ref` is the stable identity (ADR-025 / `upsertBySourceRef`); `alias`/name are human-editable labels. **Definition — "ambiguous":** 2+ persons match the queried label **case-insensitively** with **no exact-case match**. An exact-case match among case-folding siblings is *not* ambiguous — the exact row always wins. ## Affected code - `backend/.../person/PersonService.java` - `findOrCreateByAlias` (~line 125), via `personRepository.findByAliasIgnoreCase`. **Note:** the create branch (`:130-155`) is non-trivial — SKIP early-return, INSTITUTION/GROUP vs. name-split + maiden-name alias (`PersonNameParser.split` + `aliasRepository`). It must be preserved **verbatim**; do not flatten it to #730's tag one-liner. The `.orElseGet(...)` lambda simply moves below the new exact/CI guards. - `findByName` (~line 114), via `personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase`. The **service** signature stays `Optional<Person>` — only its internals change. - `backend/.../person/PersonRepository.java` - `Optional<Person> findByAliasIgnoreCase(String alias)` (line 33) - `Optional<Person> findByFirstNameIgnoreCaseAndLastNameIgnoreCase(String firstName, String lastName)` (line 39) - `backend/.../document/DocumentService.java:222` — the name-path caller (upload sender resolution). Layering is clean: it already resolves through `personService.findByName(...)`, not the repository, so the fix stays entirely inside `PersonService` + `PersonRepository`. Do **not** lift resolution logic up into `DocumentService`. ## Fix shape ### Alias path — mirror #730 (exact-case-first, non-throwing) ```java String alias = rawName.trim(); // ... SKIP early-return stays ... Optional<Person> exact = personRepository.findByAlias(alias); if (exact.isPresent()) return exact.get(); // exact-case wins List<Person> ci = personRepository.findAllByAliasIgnoreCase(alias); if (!ci.isEmpty()) return ci.stream().min(Comparator.comparing(Person::getId)).orElseThrow(); // deterministic lowest id — NEVER throw // ... existing create lambda (INSTITUTION/GROUP, split, maiden alias) stays ... ``` Delete the `Optional<…> findByAliasIgnoreCase` variant so the throwing call can't be reintroduced. Keep `findOrCreateByAlias` as the single resolution point. Use `.orElseThrow()` (not bare `.get()`) on the `min()` so the "list is non-empty here" invariant is explicit and matches the no-bare-`.get()` rule. (The #730 final form `80f6468d` uses `.orElseThrow()` — mirror that, not the original `.get()`.) ### Name path — non-throwing, **bail to null on ambiguity** (decision below) Add `List<Person> findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(...)` + an exact `findByFirstNameAndLastName(...)`; delete the `Optional<…>IgnoreCase` variant. `findByName` returns: - exact-case match if present; - else, if **exactly one** case-insensitive match → that one; - else (**two or more** CI matches) → **empty** (the upload leaves the sender unset for a human to assign — never an arbitrary guess). **Fail-closed requirement on null first name.** Parsed filenames can have a null first name; today the derived query folds that to a non-match. The new plural method must keep this: confirm Spring Data generates `first_name = ?` (which never matches a null arg), **not** `first_name IS NULL`. A `first_name IS NULL` predicate would silently widen the result and pull a last-name-only / institution row in as a "sender" — a quiet provenance-integrity defect. A null first name must resolve to *no match*. Write that failing test first (repo layer, real Postgres). ### Resolved decision — name-collision sender resolution When a filename resolves to a name shared (case-insensitively) by 2+ persons: - **Alias path keeps deterministic lowest-id (A)** — no human in the importer loop, and it creates-on-absent anyway. - **Name/sender path bails to null (B)** — the archive's value is correct provenance; a confidently-wrong pre-filled `Hans Müller` is worse than an empty field (senior reviewers won't re-check a pre-filled value). Leave the sender unset and let the human assign it. The two divergent strategies are deliberate. The load-bearing comment at the name path must explain **why** it differs from the alias path (provenance correctness > pre-filled guess), not just *that* it differs — otherwise a future "consistency cleanup" reintroduces the confidently-wrong-sender bug. ## Acceptance criteria ```gherkin Scenario: findOrCreateByAlias exact-case match wins over a case-insensitive sibling Given persons with aliases "Müller" and "müller" both exist When findOrCreateByAlias is called with "müller" Then the person whose alias is exactly "müller" is returned Scenario: findOrCreateByAlias exact-case match wins even when multiple siblings collide Given an exact-case alias "Müller" exists alongside one or more case-colliding siblings When findOrCreateByAlias is called with the exact name "Müller" Then the exact-case row is returned and the siblings are ignored Scenario: findOrCreateByAlias never throws on case-insensitive duplicates Given two persons collide case-insensitively on alias and neither equals the queried name exactly When findOrCreateByAlias is called with that name Then it returns the same one deterministically (lowest id) on every call, without throwing Scenario: findOrCreateByAlias still creates when absent (with maiden-name alias) Given no person matches the alias "Clara Cram geb. Bauer" When findOrCreateByAlias is called Then a new person is created AND a MAIDEN_NAME alias "Bauer" is written Scenario: findOrCreateByAlias returns null and creates nothing for a SKIP name Given a raw name that classifies as PersonType.SKIP When findOrCreateByAlias is called Then it returns null and no person is created Scenario: Upload with a unique name resolves the sender Given exactly one person is named "Hans Müller" When a document is uploaded whose filename resolves to "Hans Müller" Then the document's sender is set to that person and the save succeeds Scenario: Upload with a single case-insensitive name match resolves the sender Given exactly one person matches "hans müller" case-insensitively (stored as "Hans Müller") When a document is uploaded whose filename resolves to "hans müller" Then the document's sender is set to that person and the save succeeds Scenario: Upload with an ambiguous name leaves the sender unset (no 500) Given two persons share the name "Hans Müller" / "hans müller" When a document is uploaded whose filename resolves to that name Then the save succeeds (no 500) AND the sender is left null (not an arbitrary person) Scenario: Upload with a last-name-only filename leaves the sender unset Given a filename resolves to a last name only (no first name) When the sender is resolved Then the sender is left null (the null first name never widens the match) ``` ## Test strategy **Unit (`PersonServiceTest`, mocked repo) — each branch a separate test:** - exact-case wins · exact-case wins among multiple siblings · single CI match → use it · multiple CI (no exact) → lowest id + **determinism asserted by calling twice and asserting the same id** · SKIP → null · create-when-absent still creates **and** writes the maiden-name alias · `" Clara Cram "` trims then resolves · **null first name → folds to non-match**. - **Update the existing mocks** — `findByAliasIgnoreCase` is mocked in 8 `PersonServiceTest` cases (lines 378, 390, 403, 425, 442, 460, 472, 476); they return `Optional` and won't compile after the method change. - **`DocumentServiceTest` is unaffected — leave it alone.** Its mocks at `DocumentServiceTest.java:1072/1089/1104` stub `personService.findByName(...)` returning `Optional`; since the service signature stays `Optional<Person>`, they still compile. Do not edit them. **Repository (`PersonRepositoryTest`, `@DataJpaTest`) — MUST be migrated, not just deleted:** - `PersonRepositoryTest.java:123-154` calls `findByAliasIgnoreCase` (`:126`, `:137`) and `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` (`:146`/`:149`) **directly**. Deleting those two repo variants breaks this test's **compilation**. Rewrite the three @Test methods against the new methods (`findByAlias`, `findAllByAliasIgnoreCase`, `findByFirstNameAndLastName`, `findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase`). - Add an **umlaut row at the repository layer too** (e.g. `findAllByAliasIgnoreCase("müller")` returns both `Müller`/`müller`) — proves the derived query itself folds case in real Postgres, independent of the service test. - Add the **null-first-name repo test**: `findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(null, "Müller")` returns **empty** (no widening). This NULL-semantics bug cannot be caught by a mocked unit test — assert it against real Postgres. **Integration (Testcontainers `postgres:16`, in `PersonServiceIntegrationTest`):** - **Umlaut pair is mandatory** — seed `Müller`/`müller`; only the real DB proves Postgres `LOWER()` folds `ü`. A plain-ASCII test stays green while umlaut aliases regress. - Alias resolution: colliding pair → deterministic lowest id, no throw (split into two tests: one asserts the chosen id, one asserts the second call returns the same id). - Name path: seed `Hans Müller` / `hans müller`, drive the upload/`findByName` path → **200, sender null** (assert it's null, not an arbitrary person). "No 500" alone is too weak — assert the surviving value. Add the happy single-match case (unique `Hans Müller` → sender set) and the null-first-name case. - `PersonServiceIntegrationTest` is already `@Transactional` (rollback) — keep it so seeded umlaut rows don't leak across tests. **Coverage:** backend gate is **88% branch** (JaCoCo). New branches (exact-present / single-CI / multi-CI / create / SKIP / null-firstName) each need a hitting test. ## Verification / observability - **Opaque error path — already satisfied today, pin it with a test.** `GlobalExceptionHandler` has no dedicated handler for `IncorrectResultSizeDataAccessException`; it falls through to the generic `@ExceptionHandler(Exception.class)` (~line 100) → `Sentry.captureException` + `log.error` + a wire body of `INTERNAL_ERROR` / "An unexpected error occurred". No Hibernate class name, SQL, or row count reaches the client. After the fix this path no longer throws, but keep/add a regression test asserting the body carries no Hibernate class name, no SQL, no "2 results were returned" — it guards the handler against being weakened later. - **Pre-deploy detection (read-only — run against prod with the app read role and attach counts to the PR so we know latent vs. already-active):** ```sql SELECT lower(alias) AS k, count(*) FROM persons WHERE alias IS NOT NULL GROUP BY lower(alias) HAVING count(*) > 1; SELECT lower(first_name), lower(last_name), count(*) FROM persons GROUP BY lower(first_name), lower(last_name) HAVING count(*) > 1; ``` If the alias query returns rows the import path can already 500; if the name query returns rows the upload path can. Both are read-only (`GROUP BY … HAVING`). - Post-deploy: search GlitchTip for `IncorrectResultSizeDataAccessException` / `NonUniqueResultException` filtered to `person` stack frames, record the group id in the PR, and confirm zero new occurrences in the 24h after deploy. ## UX note Option B leaves the sender **unset** on an ambiguous filename. Verified the empty-sender state already degrades cleanly and is tested (`DocumentRow.svelte` "renders the unknown placeholder when sender is null"; `DocumentMetadataDrawer` no-persons placeholder), and `metadataComplete=false` already routes the doc into the "needs completion" state. So this is adequately covered here — **no UI work in this PR, no blocker.** Optional follow-up (file as a **separate** low-priority issue, not this PR): an inline hint on the edit form so the transcriber learns *why* the sender is empty. A11y spec for whoever picks it up: redundant cue (info icon + text, not color-only), German copy "Absender konnte aus dem Dateinamen nicht eindeutig zugeordnet werden", min 16px, `text-ink` on `bg-surface` (do **not** use `brand-mint` as text — ~2.8:1 fails AA), and `aria-describedby` tying the hint to the sender field. Keep the ambiguous-copy ("nicht eindeutig zuordenbar") distinct from a genuinely-unknown sender ("kein Absender"). ## Notes - **NO DB migration.** Code-only, fully reversible (roll back the JAR). Resist a `unique(lower(...))` index — it's semantically wrong (collisions are valid) **and** would fail to apply against existing colliding rows, turning a clean deploy into a failed Flyway migration that blocks startup. - **Implement as two atomic commits within the one PR** (alias-path fix + tests; name-path fix + tests) so each path is independently revertible. Each commit starts with its failing test (red → green). - **ADR — resolved (deliverable, not optional): extend ADR-032.** `docs/adr/032-tag-name-resolution-tolerates-case-collisions.md` (created for #730, on `main`) already names this Person path as the tracked sibling ("The sibling Person path is unfixed but tracked … deferred to #731"). This fix is the same root-cause pattern, so **add a "Person extension" section to ADR-032** rather than a new ADR — keep the case-collision resolution story in one document. The section records: alias path = deterministic lowest-id (mirrors tag); name/sender path = bail-to-null with the provenance rationale (the one genuinely new wrinkle vs. the tag decision); and the same "no `unique(lower(...))` constraint" stance. Flip the load-bearing code comment to point at ADR-032 (not just "#730"). _(Aside, separate housekeeping: the `feat/issue-684-person-delete-db-fk` branch independently created a different `docs/adr/032-person-delete-db-level-integrity.md` — an ADR-number collision to resolve when that branch merges; it does not affect this issue, which extends the `main` ADR-032.)_ - **Skip the shared `resolveExactThenCi(...)` helper.** The two paths have *different* fallbacks (lowest-id vs. null-on-ambiguity), and on a correctly-based branch the tag code (#730) is the only other user — extracting a 3-way-shared helper is premature. Inline the exact→CI→fallback logic in each path with the load-bearing comment; that's the KISS choice. (If a future refactor does extract it, the fallback must be a parameter, never buried.) - Doc currency: update the `person/README.md` method table (lines 23–24) for the changed `findByName` / `findOrCreateByAlias` signatures. **Also fix line 23's description** — `findByName` is currently mislabelled "Typeahead search"; its sole caller is filename-based **sender resolution** in `storeDocument`. No DB-diagram update (no schema change). No C4 update (no new package/route/service). - Leave a load-bearing comment at both repo methods and both resolution points: case-colliding alias/name across persons is valid; do **NOT** add a uniqueness constraint.
marcel added the P2-mediumbugperson labels 2026-06-06 10:55:01 +02:00
Author
Owner

Implemented in PR #745 (fix/issue-731-person-case-collision, branched off main with #730 merged).

Two atomic commits, each red→green TDD:

  • 20cfe41f — alias path: findOrCreateByAlias resolves exact-case → lowest-id case-insensitive sibling → existing create branch (INSTITUTION/GROUP + maiden-name alias preserved verbatim); throwing findByAliasIgnoreCase deleted.
  • ddf378aa — name/sender path: findByName resolves exact-case → single CI → else empty (ambiguous sender left unset, not guessed). Both new name queries use explicit HQL equality so a null first name binds = NULL (no match), not the derived first_name IS NULL widening. Throwing findByFirstNameIgnoreCaseAndLastNameIgnoreCase deleted. Opaque-error path pinned. ADR-032 extended with the Person section; person/README.md findByName mislabel fixed.

Acceptance criteria — all covered by tests:

  • exact-case wins (incl. among multiple siblings) · never throws on CI duplicates, deterministic lowest id · still creates with maiden-name alias · SKIP → null · unique name → sender set · single-CI name → sender set · ambiguous name → sender null, no 500 · last-name-only / null-first-name → no sender
  • Tests: PersonServiceTest (64), PersonRepositoryTest (52, Testcontainers, incl. umlaut folding + null-first-name on real Postgres), PersonServiceIntegrationTest (19, upload path via storeDocument), GlobalExceptionHandlerTest (4). TagCaseCollisionIntegrationTest (3) and DocumentServiceTest (187) re-run green. clean package BUILD SUCCESS.

No DB migration — code-only, reversible; no unique(lower(...)) constraint; no merge/dedupe; no shared helper.

Deferred to you (needs prod / GlitchTip access): the two read-only pre-deploy detection queries (latent vs. already-active counts), and the post-deploy GlitchTip check for IncorrectResultSizeDataAccessException / NonUniqueResultException in person frames.

Next: multi-persona review on PR #745.

Implemented in **PR #745** (`fix/issue-731-person-case-collision`, branched off `main` with #730 merged). **Two atomic commits, each red→green TDD:** - `20cfe41f` — alias path: `findOrCreateByAlias` resolves exact-case → lowest-id case-insensitive sibling → existing create branch (INSTITUTION/GROUP + maiden-name alias preserved verbatim); throwing `findByAliasIgnoreCase` deleted. - `ddf378aa` — name/sender path: `findByName` resolves exact-case → single CI → else **empty** (ambiguous sender left unset, not guessed). Both new name queries use explicit HQL equality so a **null first name binds `= NULL` (no match)**, not the derived `first_name IS NULL` widening. Throwing `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` deleted. Opaque-error path pinned. ADR-032 extended with the Person section; `person/README.md` `findByName` mislabel fixed. **Acceptance criteria — all covered by tests:** - exact-case wins (incl. among multiple siblings) ✅ · never throws on CI duplicates, deterministic lowest id ✅ · still creates with maiden-name alias ✅ · SKIP → null ✅ · unique name → sender set ✅ · single-CI name → sender set ✅ · ambiguous name → sender null, no 500 ✅ · last-name-only / null-first-name → no sender ✅ - Tests: `PersonServiceTest` (64), `PersonRepositoryTest` (52, Testcontainers, incl. umlaut folding + null-first-name on real Postgres), `PersonServiceIntegrationTest` (19, upload path via `storeDocument`), `GlobalExceptionHandlerTest` (4). `TagCaseCollisionIntegrationTest` (3) and `DocumentServiceTest` (187) re-run green. `clean package` BUILD SUCCESS. **No DB migration** — code-only, reversible; no `unique(lower(...))` constraint; no merge/dedupe; no shared helper. **Deferred to you (needs prod / GlitchTip access):** the two read-only pre-deploy detection queries (latent vs. already-active counts), and the post-deploy GlitchTip check for `IncorrectResultSizeDataAccessException` / `NonUniqueResultException` in person frames. Next: multi-persona review on PR #745.
Sign in to join this conversation.
No Label P2-medium bug person
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#731