fix(person): resolve ambiguous sender names to null on upload (#731)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m18s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m38s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s

findByName resolved via Optional<Person>
findByFirstNameIgnoreCaseAndLastNameIgnoreCase, which threw
NonUniqueResultException once two people shared a first+last name case-
insensitively (hans müller / Hans Müller) — a 500 on the routine upload path
(DocumentService.storeDocument sender resolution).

findByName now resolves exact-case → single case-insensitive match → else
empty. The sender path deliberately diverges from the alias path: an
ambiguous name leaves the sender UNSET rather than guessing the lowest id,
because correct provenance beats a confidently-wrong pre-fill a reviewer
won't re-check. The two new name queries use explicit HQL equality so a null
first name binds as `= NULL` (no match) instead of the derived-query fold to
`first_name IS NULL`, which would widen a last-name-only row in as a sender.

Pins the opaque error path (IncorrectResultSizeDataAccessException stays
INTERNAL_ERROR with no Hibernate/SQL/row-count leak) and extends ADR-032 with
the Person section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Marcel
2026-06-06 13:03:04 +02:00
parent 20cfe41f21
commit ddf378aaac
8 changed files with 223 additions and 17 deletions

View File

@@ -43,8 +43,22 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
// Lookup by the normalizer person_id, used for idempotent canonical re-import (Phase 3). // Lookup by the normalizer person_id, used for idempotent canonical re-import (Phase 3).
Optional<Person> findBySourceRef(String sourceRef); Optional<Person> findBySourceRef(String sourceRef);
// Exact first+last name match, used for filename-based sender lookup // Exact-case first+last name match — the first step of filename-based sender resolution.
Optional<Person> findByFirstNameIgnoreCaseAndLastNameIgnoreCase(String firstName, String lastName); // Explicit `=` (HQL, not a derived query) so a null firstName binds as `first_name = NULL`
// — never a match — instead of the derived-query fold to `first_name IS NULL`, which would
// pull a last-name-only row in as a sender (a provenance defect). See ADR-032.
@Query("SELECT p FROM Person p WHERE p.firstName = :firstName AND p.lastName = :lastName")
Optional<Person> findByFirstNameAndLastName(@Param("firstName") String firstName,
@Param("lastName") String lastName);
// Plural case-insensitive first+last name match — lets findByName bail to empty on 2+ matches
// instead of letting a derived Optional<…>IgnoreCase throw NonUniqueResultException. Same
// null fail-closed guarantee as above: LOWER(:firstName) is NULL for a null arg, so a null
// first name resolves to no match (not first_name IS NULL widening). See ADR-032.
@Query("SELECT p FROM Person p WHERE LOWER(p.firstName) = LOWER(:firstName) "
+ "AND LOWER(p.lastName) = LOWER(:lastName)")
List<Person> findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(@Param("firstName") String firstName,
@Param("lastName") String lastName);
// --- PersonSummaryDTO with document count --- // --- PersonSummaryDTO with document count ---

View File

@@ -111,7 +111,16 @@ public class PersonService {
} }
public Optional<Person> findByName(String firstName, String lastName) { public Optional<Person> findByName(String firstName, String lastName) {
return personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase(firstName, lastName); Optional<Person> exact = personRepository.findByFirstNameAndLastName(firstName, lastName);
if (exact.isPresent()) return exact;
List<Person> caseInsensitive =
personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(firstName, lastName);
// Deliberate divergence from findOrCreateByAlias: an ambiguous filename leaves the sender
// UNSET rather than picking the lowest id. The archive's value is correct provenance — a
// confidently-wrong pre-filled "Hans Müller" is worse than an empty field, because a
// reviewer won't re-check a pre-filled value. Do NOT "consistency-clean" this into the
// lowest-id fallback. See ADR-032.
return caseInsensitive.size() == 1 ? Optional.of(caseInsensitive.get(0)) : Optional.empty();
} }
/** Lookup by the normalizer person_id — used by the canonical importer for register-first matching. */ /** Lookup by the normalizer person_id — used by the canonical importer for register-first matching. */

View File

@@ -20,8 +20,8 @@ Features: person CRUD, name alias management, person merge (deduplication), fami
| `getById(UUID)` | document, geschichte, ocr | Fetch one person by ID | | `getById(UUID)` | document, geschichte, ocr | Fetch one person by ID |
| `getAllById(List<UUID>)` | document | Bulk fetch for sender/receiver resolution | | `getAllById(List<UUID>)` | document | Bulk fetch for sender/receiver resolution |
| `findAll(String q)` | document, dashboard | List all persons | | `findAll(String q)` | document, dashboard | List all persons |
| `findByName(String firstName, String lastName)` | document | Typeahead search | | `findByName(String firstName, String lastName)` | document | Filename-based **sender resolution** in `storeDocument`: exact-case match → single case-insensitive match → else **empty** (ambiguous names leave the sender unset; a null first name never matches). See ADR-032. |
| `findOrCreateByAlias(String rawName)` | importing | Idempotent create during mass import; type classification happens internally | | `findOrCreateByAlias(String rawName)` | importing | Idempotent create during mass import; type classification happens internally. Resolves exact-case → lowest-id case-insensitive sibling → create — never throws on case-colliding aliases. See ADR-032. |
| `findAllFamilyMembers()` | dashboard | Family member list for stats | | `findAllFamilyMembers()` | dashboard | Family member list for stats |
| `findCorrespondents()` | document | Correspondent list for conversation filter | | `findCorrespondents()` | document | Correspondent list for conversation filter |
| `count()` | dashboard | Total person count for stats | | `count()` | dashboard | Total person count for stats |

View File

@@ -12,6 +12,7 @@ import org.mockito.MockedStatic;
import org.mockito.junit.jupiter.MockitoExtension; import org.mockito.junit.jupiter.MockitoExtension;
import org.slf4j.LoggerFactory; import org.slf4j.LoggerFactory;
import org.springframework.dao.DataIntegrityViolationException; import org.springframework.dao.DataIntegrityViolationException;
import org.springframework.dao.IncorrectResultSizeDataAccessException;
import org.springframework.http.ResponseEntity; import org.springframework.http.ResponseEntity;
import static org.assertj.core.api.Assertions.assertThat; import static org.assertj.core.api.Assertions.assertThat;
@@ -37,6 +38,30 @@ class GlobalExceptionHandlerTest {
} }
} }
@Test
void handleGeneric_incorrectResultSize_staysOpaque_noHibernateOrRowCountLeak() {
// #731: before the fix, a case-colliding alias/name made Hibernate throw
// NonUniqueResultException → IncorrectResultSizeDataAccessException, which has no
// dedicated handler and falls through to handleGeneric. The fix removes the throw, but
// this pins the handler: a stray one must stay opaque — no Hibernate class name, no SQL,
// no "2 results were returned" row count reaching the client (CWE-209).
IncorrectResultSizeDataAccessException ex = new IncorrectResultSizeDataAccessException(
"query did not return a unique result: 2 results were returned", 1, 2);
try (MockedStatic<Sentry> sentryMock = mockStatic(Sentry.class)) {
ResponseEntity<GlobalExceptionHandler.ErrorResponse> response = handler.handleGeneric(ex);
assertThat(response.getStatusCode().value()).isEqualTo(500);
assertThat(response.getBody()).isNotNull();
assertThat(response.getBody().code()).isEqualTo(ErrorCode.INTERNAL_ERROR);
assertThat(response.getBody().message())
.isEqualTo("An unexpected error occurred")
.doesNotContain("results were returned")
.doesNotContain("NonUnique")
.doesNotContain("IncorrectResultSize");
}
}
@Test @Test
void handleDataIntegrityViolation_returns400_withoutLeakingConstraint_orSentry() { void handleDataIntegrityViolation_returns400_withoutLeakingConstraint_orSentry() {
// A DataIntegrityViolationException carries the constraint name + SQL in its message; // A DataIntegrityViolationException carries the constraint name + SQL in its message;

View File

@@ -147,19 +147,34 @@ class PersonRepositoryTest {
assertThat(personRepository.findAllByAliasIgnoreCase("MÜLLER")).hasSize(2); assertThat(personRepository.findAllByAliasIgnoreCase("MÜLLER")).hasSize(2);
} }
// ─── findByFirstNameIgnoreCaseAndLastNameIgnoreCase ─────────────────────── // ─── findByFirstNameAndLastName (exact) / findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase ───
// ─── findByFirstNameIgnoreCaseAndLastNameIgnoreCase ───────────────────────
@Test @Test
void findByFirstNameIgnoreCaseAndLastNameIgnoreCase_returnsMatch() { void findByFirstNameAndLastName_returnsExactCaseMatchOnly() {
personRepository.save(Person.builder().firstName("Maria").lastName("Raddatz").build()); personRepository.save(Person.builder().firstName("Maria").lastName("Raddatz").build());
Optional<Person> found = personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase( assertThat(personRepository.findByFirstNameAndLastName("Maria", "Raddatz")).isPresent();
"maria", "raddatz"); assertThat(personRepository.findByFirstNameAndLastName("maria", "raddatz")).isEmpty(); // exact-case only
}
assertThat(found).isPresent(); @Test
assertThat(found.get().getFirstName()).isEqualTo("Maria"); void findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase_foldsUmlautCase_inRealPostgres() {
personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
personRepository.save(Person.builder().firstName("hans").lastName("müller").build());
assertThat(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("HANS", "MÜLLER"))
.hasSize(2);
}
@Test
void findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase_nullFirstName_foldsToNoMatch() {
// Fail-closed: a last-name-only filename (null first name) must NOT widen to first_name IS
// NULL and pull in the institution/last-name-only row as a "sender". Proven on real
// Postgres because a mocked unit test cannot catch the IS NULL vs `= NULL` semantics.
personRepository.save(Person.builder().lastName("Müller").build()); // first_name NULL
assertThat(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(null, "Müller"))
.isEmpty();
} }
// ─── findCorrespondents ─────────────────────────────────────────────────── // ─── findCorrespondents ───────────────────────────────────────────────────

View File

@@ -4,6 +4,7 @@ import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.PostgresContainerConfig; import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.raddatz.familienarchiv.document.Document; import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentRepository; import org.raddatz.familienarchiv.document.DocumentRepository;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentStatus; import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.person.Person; import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonType; import org.raddatz.familienarchiv.person.PersonType;
@@ -16,6 +17,8 @@ import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.transaction.annotation.Transactional; import org.springframework.transaction.annotation.Transactional;
import software.amazon.awssdk.services.s3.S3Client; import software.amazon.awssdk.services.s3.S3Client;
import org.springframework.mock.web.MockMultipartFile;
import jakarta.persistence.EntityManager; import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext; import jakarta.persistence.PersistenceContext;
@@ -34,6 +37,7 @@ class PersonServiceIntegrationTest {
@Autowired PersonService personService; @Autowired PersonService personService;
@Autowired PersonRepository personRepository; @Autowired PersonRepository personRepository;
@Autowired DocumentRepository documentRepository; @Autowired DocumentRepository documentRepository;
@Autowired DocumentService documentService;
@PersistenceContext EntityManager entityManager; @PersistenceContext EntityManager entityManager;
@@ -104,6 +108,65 @@ class PersonServiceIntegrationTest {
assertThat(second.getId()).isEqualTo(first.getId()); assertThat(second.getId()).isEqualTo(first.getId());
} }
// ─── #731: filename-based sender resolution against real Postgres ──────────
@Test
void storeDocument_resolvesSender_whenFilenameNameIsUnique() throws Exception {
Person hans = personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
Document doc = uploadNamed("1965-03-12_Müller_Hans.pdf").document();
assertThat(doc.getSender()).isNotNull();
assertThat(doc.getSender().getId()).isEqualTo(hans.getId());
}
@Test
void storeDocument_resolvesSender_onSingleCaseInsensitiveMatch() throws Exception {
Person hans = personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
// Filename folds to "hans müller"; the only stored person is "Hans Müller".
Document doc = uploadNamed("1965-03-12_müller_hans.pdf").document();
assertThat(doc.getSender()).isNotNull();
assertThat(doc.getSender().getId()).isEqualTo(hans.getId());
}
@Test
void storeDocument_leavesSenderUnset_whenFilenameNameIsAmbiguous() throws Exception {
// Two persons collide case-insensitively; the filename casing ("HANS"/"MÜLLER") matches
// neither exactly → no exact-case winner → bail to null (never an arbitrary guess), no 500.
personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
personRepository.save(Person.builder().firstName("hans").lastName("müller").build());
Document doc = uploadNamed("1965-03-12_MÜLLER_HANS.pdf").document();
assertThat(doc.getSender()).isNull();
}
@Test
void storeDocument_leavesSenderUnset_whenFilenameHasNoFirstName() throws Exception {
// A last-name-only filename never resolves to a sender (the parser yields no parsed name).
personRepository.save(Person.builder().lastName("Müller").build());
Document doc = uploadNamed("1965-03-12_Müller.pdf").document();
assertThat(doc.getSender()).isNull();
}
@Test
void findByName_nullFirstName_resolvesToEmpty_inRealPostgres() {
// Fail-closed against the real DB: a null first name must NOT widen to first_name IS NULL
// and pick up the last-name-only row.
personRepository.save(Person.builder().lastName("Müller").build()); // first_name NULL
assertThat(personService.findByName(null, "Müller")).isEmpty();
}
private DocumentService.StoreResult uploadNamed(String filename) throws Exception {
MockMultipartFile file = new MockMultipartFile("file", filename, "application/pdf", new byte[]{1, 2, 3});
return documentService.storeDocument(file, null);
}
// ─── #667: confirm round-trip + reader-default semantics ────────────────── // ─── #667: confirm round-trip + reader-default semantics ──────────────────
@Test @Test

View File

@@ -527,6 +527,49 @@ class PersonServiceTest {
verify(personRepository).findByAlias("Clara Cram"); verify(personRepository).findByAlias("Clara Cram");
} }
// ─── findByName (filename-based sender resolution) ────────────────────────
@Test
void findByName_returnsExactCaseMatch_overCaseInsensitiveSibling() {
Person exact = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
when(personRepository.findByFirstNameAndLastName("Hans", "Müller")).thenReturn(Optional.of(exact));
assertThat(personService.findByName("Hans", "Müller")).contains(exact);
verify(personRepository, never()).findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(any(), any());
}
@Test
void findByName_usesSingleCaseInsensitiveMatch_whenNoExactCase() {
Person only = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
when(personRepository.findByFirstNameAndLastName("hans", "müller")).thenReturn(Optional.empty());
when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("hans", "müller"))
.thenReturn(List.of(only));
assertThat(personService.findByName("hans", "müller")).contains(only);
}
@Test
void findByName_bailsToEmpty_whenTwoOrMoreCaseInsensitiveMatches() {
Person a = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
Person b = Person.builder().id(UUID.randomUUID()).firstName("hans").lastName("müller").build();
when(personRepository.findByFirstNameAndLastName("hans", "müller")).thenReturn(Optional.empty());
when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("hans", "müller"))
.thenReturn(List.of(a, b));
// Ambiguous sender → unset, never an arbitrary guess (provenance correctness over a
// confidently-wrong pre-fill). This is the deliberate divergence from the alias path.
assertThat(personService.findByName("hans", "müller")).isEmpty();
}
@Test
void findByName_returnsEmpty_whenFirstNameNullFoldsToNoMatch() {
when(personRepository.findByFirstNameAndLastName(null, "Müller")).thenReturn(Optional.empty());
when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(null, "Müller"))
.thenReturn(List.of());
assertThat(personService.findByName(null, "Müller")).isEmpty();
}
// ─── updatePerson (notes) ──────────────────────────────────────────────── // ─── updatePerson (notes) ────────────────────────────────────────────────
@Test @Test

View File

@@ -82,15 +82,52 @@ added later.
`IncorrectResultSizeDataAccessException`, and `GlobalExceptionHandler`'s generic handler maps `IncorrectResultSizeDataAccessException`, and `GlobalExceptionHandler`'s generic handler maps
any stray one to `INTERNAL_ERROR` with no Hibernate/SQL leak — so no dedicated handler was any stray one to `INTERNAL_ERROR` with no Hibernate/SQL leak — so no dedicated handler was
added. added.
- **The sibling Person path is unfixed but tracked.** `PersonService.findOrCreateByAlias` - **The sibling Person path is fixed the same way — see the Person extension below (#731).**
(`findByAliasIgnoreCase`) and `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` carry the same
latent `Optional`-non-unique throw on user-influenced names; deferred to #731 rather than
widened into this fix.
- Postgres `LOWER()` folding of umlauts (`ü`/`ä`) is the actual correctness hinge of the - Postgres `LOWER()` folding of umlauts (`ü`/`ä`) is the actual correctness hinge of the
fallback and cannot be proven by a mocked repo, so it is pinned by a Testcontainers fallback and cannot be proven by a mocked repo, so it is pinned by a Testcontainers
`postgres:16-alpine` test on a `Glückwünsche`/`glückwünsche` pair; a plain-ASCII test would `postgres:16-alpine` test on a `Glückwünsche`/`glückwünsche` pair; a plain-ASCII test would
stay green while the bug reappeared for umlaut tags. stay green while the bug reappeared for umlaut tags.
## Person extension (#731)
The Person domain carried the same latent throw on **two** user-influenced lookup surfaces, and
is fixed with the same exact-case-first, non-throwing pattern — but with a deliberately
**different fallback per surface**, because the two paths have different consequences.
- **Alias path — `PersonService.findOrCreateByAlias` — deterministic lowest-id (mirrors tag).**
`findByAliasIgnoreCase` (`Optional`) is replaced by `findByAlias` (exact) → `findAllByAliasIgnoreCase`
(plural, lowest id) → the existing create-when-absent branch (INSTITUTION/GROUP and the
maiden-name alias are preserved verbatim). There is no human in the importer loop and the path
creates-on-absent anyway, so a deterministic guess is the right behaviour — exactly like tags.
- **Name/sender path — `PersonService.findByName` — bail to null on ambiguity (the new wrinkle).**
Used only by `DocumentService.storeDocument` to resolve the upload **sender** from the parsed
filename. `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` (`Optional`) is replaced by
`findByFirstNameAndLastName` (exact) → `findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase`
(plural). Resolution returns the exact-case match, else the single case-insensitive match, else
— on **two or more** matches — **empty**. The sender is left unset rather than guessing.
**Why this diverges from the alias (and tag) decision:** the archive's value is correct
provenance. A confidently-wrong pre-filled `Hans Müller` is worse than an empty field, because a
senior reviewer will not re-check a value that is already filled in, whereas an empty sender
routes the document into the "needs completion" state (`metadataComplete=false`) for a human to
assign. The load-bearing comment at `findByName` records this so a future "consistency cleanup"
does not reintroduce the confidently-wrong-sender bug by switching it to lowest-id.
- **Fail-closed on a null first name.** A parsed filename can lack a first name. The two new name
methods use explicit HQL equality (`= :firstName`) rather than a derived
`…IgnoreCase` query, because Spring Data folds a null derived-query argument to `first_name IS
NULL` — which would silently widen the match and pull a last-name-only / institution row in as a
"sender" (a quiet provenance-integrity defect). With HQL equality a null binds as `= NULL`,
which never matches, so a null first name resolves to **no sender**. This is pinned by a
real-Postgres repository test.
- **Same stance as tags otherwise:** no `unique(lower(alias))` / `unique(lower(name))` constraint
(collisions are valid human labels; `source_ref` is the stable identity per ADR-025), no
merge/dedupe, code-only and reversible, and no shared `resolveExactThenCi(...)` helper — the
two Person paths have different fallbacks, so the exact→CI→fallback logic is inlined at each
with its load-bearing comment (KISS).
## Alternatives considered ## Alternatives considered
- **A `unique(lower(name))` index** — rejected: the collisions are valid canonical nodes, and - **A `unique(lower(name))` index** — rejected: the collisions are valid canonical nodes, and