fix(person): resolve ambiguous sender names to null on upload (#731)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m18s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m38s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m18s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m38s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
findByName resolved via Optional<Person> findByFirstNameIgnoreCaseAndLastNameIgnoreCase, which threw NonUniqueResultException once two people shared a first+last name case- insensitively (hans müller / Hans Müller) — a 500 on the routine upload path (DocumentService.storeDocument sender resolution). findByName now resolves exact-case → single case-insensitive match → else empty. The sender path deliberately diverges from the alias path: an ambiguous name leaves the sender UNSET rather than guessing the lowest id, because correct provenance beats a confidently-wrong pre-fill a reviewer won't re-check. The two new name queries use explicit HQL equality so a null first name binds as `= NULL` (no match) instead of the derived-query fold to `first_name IS NULL`, which would widen a last-name-only row in as a sender. Pins the opaque error path (IncorrectResultSizeDataAccessException stays INTERNAL_ERROR with no Hibernate/SQL/row-count leak) and extends ADR-032 with the Person section. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -82,15 +82,52 @@ added later.
|
||||
`IncorrectResultSizeDataAccessException`, and `GlobalExceptionHandler`'s generic handler maps
|
||||
any stray one to `INTERNAL_ERROR` with no Hibernate/SQL leak — so no dedicated handler was
|
||||
added.
|
||||
- **The sibling Person path is unfixed but tracked.** `PersonService.findOrCreateByAlias`
|
||||
(`findByAliasIgnoreCase`) and `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` carry the same
|
||||
latent `Optional`-non-unique throw on user-influenced names; deferred to #731 rather than
|
||||
widened into this fix.
|
||||
- **The sibling Person path is fixed the same way — see the Person extension below (#731).**
|
||||
- Postgres `LOWER()` folding of umlauts (`ü`/`ä`) is the actual correctness hinge of the
|
||||
fallback and cannot be proven by a mocked repo, so it is pinned by a Testcontainers
|
||||
`postgres:16-alpine` test on a `Glückwünsche`/`glückwünsche` pair; a plain-ASCII test would
|
||||
stay green while the bug reappeared for umlaut tags.
|
||||
|
||||
## Person extension (#731)
|
||||
|
||||
The Person domain carried the same latent throw on **two** user-influenced lookup surfaces, and
|
||||
is fixed with the same exact-case-first, non-throwing pattern — but with a deliberately
|
||||
**different fallback per surface**, because the two paths have different consequences.
|
||||
|
||||
- **Alias path — `PersonService.findOrCreateByAlias` — deterministic lowest-id (mirrors tag).**
|
||||
`findByAliasIgnoreCase` (`Optional`) is replaced by `findByAlias` (exact) → `findAllByAliasIgnoreCase`
|
||||
(plural, lowest id) → the existing create-when-absent branch (INSTITUTION/GROUP and the
|
||||
maiden-name alias are preserved verbatim). There is no human in the importer loop and the path
|
||||
creates-on-absent anyway, so a deterministic guess is the right behaviour — exactly like tags.
|
||||
|
||||
- **Name/sender path — `PersonService.findByName` — bail to null on ambiguity (the new wrinkle).**
|
||||
Used only by `DocumentService.storeDocument` to resolve the upload **sender** from the parsed
|
||||
filename. `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` (`Optional`) is replaced by
|
||||
`findByFirstNameAndLastName` (exact) → `findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase`
|
||||
(plural). Resolution returns the exact-case match, else the single case-insensitive match, else
|
||||
— on **two or more** matches — **empty**. The sender is left unset rather than guessing.
|
||||
|
||||
**Why this diverges from the alias (and tag) decision:** the archive's value is correct
|
||||
provenance. A confidently-wrong pre-filled `Hans Müller` is worse than an empty field, because a
|
||||
senior reviewer will not re-check a value that is already filled in, whereas an empty sender
|
||||
routes the document into the "needs completion" state (`metadataComplete=false`) for a human to
|
||||
assign. The load-bearing comment at `findByName` records this so a future "consistency cleanup"
|
||||
does not reintroduce the confidently-wrong-sender bug by switching it to lowest-id.
|
||||
|
||||
- **Fail-closed on a null first name.** A parsed filename can lack a first name. The two new name
|
||||
methods use explicit HQL equality (`= :firstName`) rather than a derived
|
||||
`…IgnoreCase` query, because Spring Data folds a null derived-query argument to `first_name IS
|
||||
NULL` — which would silently widen the match and pull a last-name-only / institution row in as a
|
||||
"sender" (a quiet provenance-integrity defect). With HQL equality a null binds as `= NULL`,
|
||||
which never matches, so a null first name resolves to **no sender**. This is pinned by a
|
||||
real-Postgres repository test.
|
||||
|
||||
- **Same stance as tags otherwise:** no `unique(lower(alias))` / `unique(lower(name))` constraint
|
||||
(collisions are valid human labels; `source_ref` is the stable identity per ADR-025), no
|
||||
merge/dedupe, code-only and reversible, and no shared `resolveExactThenCi(...)` helper — the
|
||||
two Person paths have different fallbacks, so the exact→CI→fallback logic is inlined at each
|
||||
with its load-bearing comment (KISS).
|
||||
|
||||
## Alternatives considered
|
||||
|
||||
- **A `unique(lower(name))` index** — rejected: the collisions are valid canonical nodes, and
|
||||
|
||||
Reference in New Issue
Block a user