As a searcher I want a single direct person match auto-selected in smart search so I'm not forced to disambiguate when there's only one real match #763

Closed
opened 2026-06-06 20:20:16 +02:00 by marcel · 1 comment
Owner

Problem

In natural-language ("smart") search, a query like "Briefe von Clara Cram" shows the disambiguation picker even when exactly one person is actually named Clara Cram.

Root cause: NlQueryParserService.resolveNames() resolves person names via personService.findByDisplayNameContaining(name) — a substring match — and then treats any result count > 1 as ambiguous:

```java
List<Person> candidates = personService.findByDisplayNameContaining(name);
// ...
if (capped.isEmpty()) {
    noMatchFragments.add(name);
} else if (capped.size() == 1) {
    // auto-resolve
} else {
    // ALL candidates -> ambiguous -> picker
}
```

So "Clara Cram" returns the picker whenever a Clara Cram coexists with any other substring hit such as Clara Cramer or Clara Crammond, even though there is exactly one direct match.

backend/src/main/java/org/raddatz/familienarchiv/search/NlQueryParserService.java:89-123
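The over-match is easy to see with a case-insensitive substring test of the kind LIKE '%query%' performs. The sketch below is a hypothetical helper (SubstringDemo), not project code, and it assumes the fetch behaves like String.contains on lowercased names:

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: case-insensitive substring filter, roughly what LIKE '%query%' does. */
final class SubstringDemo {
    private SubstringDemo() {}

    static List<String> substringHits(String query, List<String> displayNames) {
        String q = query.toLowerCase(Locale.ROOT);
        return displayNames.stream()
                .filter(n -> n.toLowerCase(Locale.ROOT).contains(q)) // "clara cramer" contains "clara cram"
                .toList();
    }
}
```

Given Clara Cram, Clara Cramer and Clara Crammond, substringHits("Clara Cram", ...) returns all three names. The old size() > 1 branch therefore opens the picker.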

Desired behaviour

  • Exactly one token-direct match → auto-select it (ignore looser partial substring matches).
  • Multiple token-direct matches (genuinely "multiple Clara Crams") → picker showing the direct matches.
  • Zero direct, but partial/substring matches exist → picker showing the partials as suggestions, with a "Meintest du …?" panel heading (see Frontend). The cap-after-classify rule below is load-bearing here, not a micro-optimization: a common surname can produce one direct match plus ≥10 partials for the same name.
  • No candidates at all → fold the name into full-text search (today's no-match behaviour).

Definition of a "direct" match (alias / maiden-name aware)

Token / word-boundary match (order-independent, tolerates middle names), evaluated against all of a person's name components, not just the formatted display name.

  1. Tokenize the extracted name. The tokenizer shall lowercase the input and split on whitespace, hyphen, and apostrophe, dropping empty tokens. The same tokenizer is applied symmetrically to the query and to every candidate name component. This symmetry guarantees that "Anna-Maria" (query) matches "Anna Maria" (person) and vice versa. Worked output: Anna-Maria → {anna, maria}; D'Angelo → {d, angelo}; Müller → {müller}; "Clara  Cram" (double space) → {clara, cram}.
  2. Build the candidate's name-token union from every structured name field: firstName, lastName, alias, each PersonNameAlias's firstName + lastName (covers MAIDEN_NAME, BIRTH, WIDOWED, DIVORCED, OTHER), and title (see note). Tokenize each the same way and union into one set → personTokens.
  3. Direct iff every queryToken is present as a whole token in personTokens (set containment).
    • "Clara Cram" → direct for "Clara Cram" and "Clara Maria Cram" (middle name); reordered "Cram Clara" also matches.
    • "Clara Cram" → direct for a person whose displayName is "Clara Müller" but who carries a maiden-name alias "Cram": clara comes from firstName and cram from the maiden alias's lastName.
    • "Clara Cram" → not direct for "Clara Cramer" (cram ≠ cramer).
  4. Partial = a candidate that matched the substring fetch but is not direct (e.g. typing "Cram" → "Clara Cramer").
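The tokenizer contract in step 1 can be sketched as follows. This is a minimal sketch that assumes a regex split on whitespace, hyphen and the ASCII apostrophe; typographic apostrophes such as ’ are not covered here. NameTokenizer is a hypothetical name, and the shipped tokenizer may be structured differently:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper: lowercase, split on whitespace/hyphen/apostrophe, drop empty tokens. */
final class NameTokenizer {
    // One or more whitespace, hyphen or ASCII apostrophe characters act as a single separator.
    private static final Pattern SEPARATORS = Pattern.compile("[\\s\\-']+");

    private NameTokenizer() {}

    /** Distinct tokens in first-seen order; null or separator-only input yields an empty set. */
    static Set<String> tokenize(String name) {
        if (name == null) {
            return Collections.emptySet();
        }
        Set<String> tokens = new LinkedHashSet<>();
        for (String token : SEPARATORS.split(name.toLowerCase(Locale.ROOT))) {
            if (!token.isEmpty()) { // a leading separator yields an empty first element
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

A LinkedHashSet keeps first-seen order. Deduplication and a later "first N distinct tokens" cap are then deterministic.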

Token-set containment is chosen over regex \b word boundaries to avoid umlaut/diacritic edge cases and for natural order-independence. The union spans separate name components (so firstName + maiden lastName together satisfy "Clara Cram"), which is intended — a query token need only match some component, not a single name form.

title is classification-only (superset tolerance). A title token like "Dr" never appears in an LLM-extracted person name and is never the sole basis for a match; including title in the union just keeps "Dr Clara Cram" from being rejected. It is deliberately not added to the candidate-fetch query — a title-only match is meaningless. Every other component participates in both fetch and classification (see resolved decision #3).
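Steps 2 and 3 and the title note can be sketched together. AliasName and PersonNames are hypothetical, simplified stand-ins for PersonNameAlias and Person (the real entities carry more fields), and DirectMatch is an illustrative name. On its own, isDirect is vacuously true for an empty query. The empty-token guard in resolveByName is what rules that case out:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical, simplified stand-ins for PersonNameAlias and Person (not the real entities). */
record AliasName(String firstName, String lastName) {}

record PersonNames(String title, String firstName, String lastName, String alias, List<AliasName> aliases) {}

final class DirectMatch {
    private static final Pattern SEPARATORS = Pattern.compile("[\\s\\-']+");

    private DirectMatch() {}

    static Set<String> tokenize(String s) {
        Set<String> out = new LinkedHashSet<>();
        if (s == null) {
            return out;
        }
        for (String t : SEPARATORS.split(s.toLowerCase(Locale.ROOT))) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Union of tokens over every structured name field, every alias, and the title. */
    static Set<String> personTokens(PersonNames p) {
        Set<String> tokens = new LinkedHashSet<>();
        tokens.addAll(tokenize(p.title()));   // classification-only: never part of the fetch
        tokens.addAll(tokenize(p.firstName()));
        tokens.addAll(tokenize(p.lastName()));
        tokens.addAll(tokenize(p.alias()));
        for (AliasName a : p.aliases()) {     // MAIDEN_NAME, BIRTH, WIDOWED, DIVORCED, OTHER
            tokens.addAll(tokenize(a.firstName()));
            tokens.addAll(tokenize(a.lastName()));
        }
        return tokens;
    }

    /** Direct iff every query token is a whole token of the person (set containment). */
    static boolean isDirect(Set<String> queryTokens, PersonNames p) {
        return personTokens(p).containsAll(queryTokens);
    }
}
```

For the query "Clara Cram", the following are direct:
- a person with firstName Clara and a maiden alias lastName Cram;
- Clara Maria Cram (middle name).

Clara Cramer is not direct. The query "Dr Clara Cram" is direct only for a person whose title is Dr.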

Resolved decisions (from multi-persona review)

  1. Maiden-name / alias matches count as DIRECT. Married women in a 1899–1950 letter archive routinely appear under both married and maiden names; a unique née match must auto-select, not force a click. Hence the alias-aware union in step 2.
  2. The matching rule lives in the Person domain. Implement as PersonService.resolveByName(String name) returning a Person-domain result record record NameMatches(List<Person> direct, List<Person> partial) — fields named direct/partial (name-match strength), not resolved/ambiguous (search-layer vocabulary). Do not reuse the private NameResolution record currently inside NlQueryParserService across the boundary. NlQueryParserService maps the NameMatches result into its own resolved/ambiguous/noMatch buckets. Keep findByDisplayNameContaining as the per-token fetch primitive, called from inside resolveByName (it is not removed — the person-search typeahead still uses it).
  3. Close the fetch-vs-classify gap by extending the fetch. Today PersonRepository.searchByName matches firstName lastName / lastName firstName concatenations, p.alias, and a.lastName — but not a.firstName. The classifier (step 2) accepts alias firstName tokens, so without this fix a candidate matchable only via an alias first name is never fetched and can never be classified direct. Add OR LOWER(a.firstName) LIKE LOWER(CONCAT('%', :query, '%')) to searchByName (reuses the same bound :query parameter — stays injection-proof). The person_name_aliases.first_name column already exists, so no Flyway migration / no DB-diagram update. (Benign side effect: the typeahead now also matches alias first names.) title stays out of the fetch by design.
  4. Differentiate the picker copy — IN SCOPE. A single partial still shows a 1-item picker, but the copy must match the case (see Frontend).
  5. The "Meintest du …?" prompt lives in the non-truncated panel heading, not the trigger. The trigger button keeps showing the bare name(s) (its name span is truncated at max-w-[8rem]/max-w-[12rem], which would clip "Meintest du Clara Cra…" at 320px and hide the suggestion from the senior audience). Putting the prompt in the <ul> panel heading guarantees the full suggested name renders, at body size, whenever the picker opens. Also drop the standalone (auswählen…) cue (search_disambiguation_cue) in the 1-partial branch — "Meintest du …?" already implies the action; keep the cue for the ≥2 case.
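The fetch-vs-classify gap from decision 3 can be modelled in memory. The sketch below approximates the searchByName WHERE clause as case-insensitive substring tests over:
- the two name concatenations;
- p.alias;
- a.lastName;
- the new a.firstName clause, behind a flag.

It is an illustration only. The real query is JPQL, the record and class names are hypothetical, and the concatenations are assumed to skip NULLs the way Postgres CONCAT does:

```java
import java.util.List;
import java.util.Locale;
import java.util.Objects;

/** Hypothetical, simplified rows; the real entities are Person and PersonNameAlias. */
record AliasRow(String firstName, String lastName) {}

record PersonRow(String firstName, String lastName, String alias, List<AliasRow> aliases) {}

/** In-memory approximation of the searchByName WHERE clause; the real query is JPQL. */
final class SearchByNameModel {
    private SearchByNameModel() {}

    // LOWER(column) LIKE LOWER(CONCAT('%', :query, '%')); a NULL column never matches.
    static boolean like(String column, String query) {
        return column != null
                && column.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT));
    }

    // Assumes CONCAT semantics that skip NULL arguments, as Postgres CONCAT does.
    static String concat(String a, String b) {
        return Objects.toString(a, "") + " " + Objects.toString(b, "");
    }

    static boolean matches(PersonRow p, String query, boolean includeAliasFirstName) {
        if (like(concat(p.firstName(), p.lastName()), query)
                || like(concat(p.lastName(), p.firstName()), query)
                || like(p.alias(), query)) {
            return true;
        }
        for (AliasRow a : p.aliases()) {
            if (like(a.lastName(), query)) {
                return true;
            }
            if (includeAliasFirstName && like(a.firstName(), query)) {
                return true; // the OR LOWER(a.firstName) LIKE ... clause added by decision 3
            }
        }
        return false;
    }
}
```

Take a person named Clara Müller whose only alias has firstName Carla. The query "Carla" misses this person under the old clause set and finds them once the alias first name is included.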

Implementation notes

In PersonService.resolveByName(String name):

  1. Tokenize once (per the contract above), then dedupe and cap to MAX_TOKENS = 8 distinct tokens before the fetch loop. This cap is a DoS control (each token is one unindexed leading-wildcard LIKE scan), not just perf — make it a tested invariant. Keep the existing MAX_NAME_LENGTH = 200 input guard in NlQueryParserService as the first line of defense. Co-locate MAX_TOKENS/MAX_CANDIDATES in PersonService (the layer that owns the loop); leave MAX_NAME_LENGTH in NlQueryParserService. Don't duplicate constants across both classes.
  2. Empty-token guard (guard clause at the top): if queryTokens is empty (name was all punctuation/whitespace), return empty direct + empty partial. Set.containsAll(emptySet) returns true, so without this guard every candidate would be marked direct. The search layer then folds the name to full-text search.
  3. Build a candidate pool = union of findByDisplayNameContaining(token) over each capped token, deduped by id.
  4. Classify each candidate as direct or partial via the alias-aware token-union rule.
  5. Cap to MAX_CANDIDATES (10) AFTER classification, never before. searchByName orders results by lastName, firstName, so a direct match can sort past position 10. Capping the raw list first can therefore discard the one true direct match.
  6. Observability: add a single log.debug recording the outcome bucket per name — direct=1 / direct>=2 / partial-only / no-match — plus the token count (not the raw name, to keep PII out). Makes "is auto-select firing?" / "why slow?" answerable from Loki without a redeploy.

Keep resolveByName a short orchestrator — tokenize → cap → fetchPool → classify → capAfterClassify — each a private helper under 20 lines, empty-token guard first.
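The orchestrator shape above (guard, token cap, pool fetch, classify, cap after classify) can be sketched under a few assumptions:
- Candidate stands in for Person, with the name-token union from step 2 precomputed.
- The per-token fetch primitive is passed in as a Function so the sketch runs without a repository.
- java.lang.System.Logger replaces the project's log.debug.

All class names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Pattern;

/** Hypothetical stand-in for Person: an id plus its precomputed name-token union (step 2). */
record Candidate(long id, Set<String> nameTokens) {}

/** Same shape as the NameMatches record in decision 2, over the stand-in type. */
record NameMatches(List<Candidate> direct, List<Candidate> partial) {}

/** Illustrative orchestrator; not the shipped PersonService code. */
final class NameResolver {
    static final int MAX_TOKENS = 8;
    static final int MAX_CANDIDATES = 10;
    private static final Pattern SEPARATORS = Pattern.compile("[\\s\\-']+");
    private static final System.Logger LOG = System.getLogger(NameResolver.class.getName());

    private final Function<String, List<Candidate>> fetch; // assumed per-token fetch primitive

    NameResolver(Function<String, List<Candidate>> fetch) {
        this.fetch = fetch;
    }

    NameMatches resolveByName(String name) {
        Set<String> queryTokens = capTokens(tokenize(name));
        if (queryTokens.isEmpty()) {
            // Guard first: containsAll(emptySet) is true, so every candidate would look direct.
            return logged(new NameMatches(List.of(), List.of()), 0);
        }
        NameMatches classified = classify(queryTokens, fetchPool(queryTokens));
        return logged(capAfterClassify(classified), queryTokens.size());
    }

    private static Set<String> tokenize(String name) {
        Set<String> tokens = new LinkedHashSet<>();
        if (name != null) {
            for (String t : SEPARATORS.split(name.toLowerCase(Locale.ROOT))) {
                if (!t.isEmpty()) tokens.add(t);
            }
        }
        return tokens;
    }

    // DoS control: each kept token costs one leading-wildcard LIKE scan.
    private static Set<String> capTokens(Set<String> tokens) {
        Set<String> capped = new LinkedHashSet<>();
        for (String t : tokens) {
            if (capped.size() == MAX_TOKENS) break;
            capped.add(t);
        }
        return capped;
    }

    // Union of per-token fetches, deduplicated by id, keeping first-seen (repository) order.
    private List<Candidate> fetchPool(Set<String> queryTokens) {
        Map<Long, Candidate> byId = new LinkedHashMap<>();
        for (String token : queryTokens) {
            for (Candidate c : fetch.apply(token)) byId.putIfAbsent(c.id(), c);
        }
        return new ArrayList<>(byId.values());
    }

    private static NameMatches classify(Set<String> queryTokens, List<Candidate> pool) {
        List<Candidate> direct = new ArrayList<>();
        List<Candidate> partial = new ArrayList<>();
        for (Candidate c : pool) {
            (c.nameTokens().containsAll(queryTokens) ? direct : partial).add(c);
        }
        return new NameMatches(direct, partial);
    }

    // Capping only now means a direct match sorted behind many partials is never discarded.
    private static NameMatches capAfterClassify(NameMatches m) {
        return new NameMatches(first(m.direct()), first(m.partial()));
    }

    private static List<Candidate> first(List<Candidate> list) {
        return List.copyOf(list.subList(0, Math.min(list.size(), MAX_CANDIDATES)));
    }

    // PII-free: logs the outcome bucket and token count, never the raw name.
    private static NameMatches logged(NameMatches m, int tokenCount) {
        String bucket = m.direct().size() == 1 ? "direct=1"
                : m.direct().size() >= 2 ? "direct>=2"
                : m.partial().isEmpty() ? "no-match" : "partial-only";
        LOG.log(System.Logger.Level.DEBUG, "name resolution bucket={0} tokens={1}", bucket, tokenCount);
        return m;
    }
}
```

Because the cap runs after classify, a direct match placed behind a dozen partials is still returned. A name with more than eight distinct tokens triggers exactly MAX_TOKENS fetches.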

NlQueryParserService.resolveNames() then maps NameMatches per name:

  • 1 direct → resolved
  • ≥2 direct → ambiguous (direct matches only)
  • 0 direct, ≥1 partial → ambiguous (partials, as suggestions — 1-item picker allowed)
  • 0 candidates → noMatchFragments
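The four-way mapping reads naturally as an ordered series of checks. This is a hypothetical sketch: Bucket and Matches are illustrative names, with Matches mirroring NameMatches generically. The real NlQueryParserService writes into its resolved, ambiguous and noMatchFragments collections rather than returning an enum:

```java
import java.util.List;

/** Hypothetical search-layer outcome for one extracted name. */
enum Bucket { RESOLVED, AMBIGUOUS_DIRECT, AMBIGUOUS_PARTIAL, NO_MATCH }

/** Generic stand-in for NameMatches so the sketch does not need the Person entity. */
record Matches<P>(List<P> direct, List<P> partial) {}

final class BucketMapper {
    private BucketMapper() {}

    /** Ordered checks: a single direct match wins even when partials exist. */
    static <P> Bucket bucketOf(Matches<P> m) {
        if (m.direct().size() == 1) {
            return Bucket.RESOLVED;          // auto-select; partials are ignored
        }
        if (m.direct().size() >= 2) {
            return Bucket.AMBIGUOUS_DIRECT;  // picker lists the direct matches only
        }
        if (!m.partial().isEmpty()) {
            return Bucket.AMBIGUOUS_PARTIAL; // picker lists partials as suggestions (1 item allowed)
        }
        return Bucket.NO_MATCH;              // caller folds the name into full-text search
    }

    /** The persons the picker should offer; empty for RESOLVED and NO_MATCH. */
    static <P> List<P> pickerOptions(Matches<P> m) {
        return switch (bucketOf(m)) {
            case AMBIGUOUS_DIRECT -> m.direct();
            case AMBIGUOUS_PARTIAL -> m.partial();
            default -> List.of();
        };
    }
}
```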

Preserve existing behaviour: the up-to-2 resolved cap with overflow → extraFragments; the early return when any name is ambiguous; sender/receiver role mapping (buildSender/buildReceiver, isAnyRole) including the two-distinct-names case (Walter AND Emma → sender=person[0], receiver=person[1]).

Docs (do before opening the PR)

  • docs/GLOSSARY.md — entry for NameMatches (direct vs partial name-match strength).
  • frontend/Person domain person/README.md — add PersonService.resolveByName to the public-surface list.
  • No ADR (localized matching rule, choices justified inline).

Frontend changes

The picker copy must reflect which case it is. Showing "Mehrere Personen gefunden" ("multiple persons found") on a 1-item picker violates Nielsen's heuristic #2 (match between system and the real world), which matters for the 60+ audience.

  • Trigger button stays as-is — shows the bare name(s) with the existing search_disambiguation_trigger_label. Do not put the "Meintest du" framing here (it truncates at 320px).
  • Panel heading carries the prompt. In frontend/src/routes/documents/+page.svelte (the nlIsAmbiguous branch, :58/:438), derive the <ul> heading from ambiguousPersons.length: length === 1 → "Meintest du …?" / "Did you mean …?"; length ≥ 2 → keep the existing "Person auswählen" / "Mehrere Personen gefunden" framing.
  • Pass the chosen heading into DisambiguationPicker.svelte as a prop (it currently calls m.search_disambiguation_heading() internally at :69, and m.search_disambiguation_trigger_label() at :57) — keep the component dumb, let the page decide.
  • Drop the m.search_disambiguation_cue() ("(auswählen…)", :62) in the 1-partial branch; keep it for ≥2.
  • Add the new Paraglide keys (e.g. search_disambiguation_did_you_mean) in all three locales messages/{de,en,es}.json, following the existing search_disambiguation_* pattern (verified present in all three).
  • The per-option aria-label ({name} auswählen) reads correctly for both cases — no change.
  • DisambiguationPicker already meets the a11y baseline (44px targets, aria-haspopup/expanded/controls, focus-visible rings, Esc-to-close + focus return) — no new a11y work.

Acceptance criteria

  • "Clara Cram" with one Clara Cram + a Clara Cramer in the DB → auto-selects Clara Cram (resolved chip, no picker).
  • Two persons literally named "Clara Cram" → picker lists both.
  • "Clara Cram" matches "Clara Maria Cram" (middle name) as direct.
  • "Clara Cram" matches a person whose displayName is "Clara Müller" with a maiden-name alias "Cram" → direct (auto-select).
  • A person matchable only via an alias first name is fetched and classifiable (fetch query extended with a.firstName).
  • Tokenizer output is pinned: Anna-Maria → {anna, maria}, D'Angelo → {d, angelo}, Müller → {müller}; tokens are lowercased and empties dropped.
  • "Cram" with only "Clara Cramer" present → 1-item picker, panel heading reads "Meintest du …?" (not "Mehrere Personen gefunden"), and the (auswählen…) cue is suppressed.
  • Partial-only with ≥2 candidates → picker keeps the "Mehrere Personen gefunden" framing.
  • A name with no substring candidates at all → folded into full-text search.
  • An all-punctuation / empty-after-tokenizing name → folded into full-text search (no false "direct" matches).
  • A direct match that sorts beyond MAX_CANDIDATES among partials is still auto-selected (cap applied after classification).
  • A name with >8 distinct tokens issues at most 8 findByDisplayNameContaining/searchByName invocations (token cap is a tested invariant).
  • Two distinct names, each with exactly one direct match → sender/receiver mapping preserved, no picker.
  • Matching is case-insensitive and order-independent.

Tests

TDD, red/green per criterion. Backend coverage gate is 88% branch (JaCoCo) — ensure the empty-token-guard and direct-sorts-beyond-cap branches are both hit, plus the direct/partial and cap-before/after branches.

  • PersonServiceTest (@ExtendWith(MockitoExtension.class), mock PersonRepository + PersonNameAliasRepository, no Spring context) — resolveByName classification, one assertion per test:
    • tokenizer pins: Anna-Maria, D'Angelo, Müller, double-space, empty/all-spaces, null-safety (assert exact token output).
    • singleDirectMatch_classifiesAsDirect, maidenAliasToken_classifiesAsDirect, aliasFirstNameToken_isFetchedAndClassified, middleName_stillDirect, reorderedTokens_stillDirect, cramVsCramer_notDirect, emptyAfterTokenizing_returnsNoCandidates, directSortsBeyondCap_stillReturnedAsDirect.
    • over8Tokens_issuesAtMost8Fetches — assert invocation count via verify(personRepository, atMost(8)).searchByName(any()), not list size (the DoS guarantee is about fetches issued).
    • findByDisplayNameContaining_delegatesToSearchByName (existing, line 903) stays valid — the primitive is not removed.
  • Extend the existing PersonRepositoryTest (already a @DataJpaTest + Testcontainers postgres:16-alpine slice with searchByName fixtures, including an alias-"Cram" test); do not author a new slice or a second container. Add searchByName_findsByAliasFirstName and searchByName_ordersByLastNameThenFirstName, reusing the existing builder fixtures. H2 differs from Postgres on case folding and on CONCAT over nulls, so these tests must run against real Postgres.
  • NlQueryParserServiceTest — re-point the 23 mocks from findByDisplayNameContaining to resolveByName returning NameMatches. Provide a makeNameMatches(direct, partial) factory with both args defaulting to empty list so no-match stubs (tests #4/#8) stay one-liners. Keep assertions for tests #2/#3 (ambiguous size 2, search skipped); verify test #20 (search_elevenCandidates_capsAtTen) exercises the cap after classification, not the old pre-cap path. Add: partialOnly_oneCandidate_populatesAmbiguous, partialOnly_twoCandidates_populatesAmbiguous (producer for both copy branches), oneDirect_executesSearch, two-names → sender/receiver.
  • Frontend — a documents/+page.svelte (or DisambiguationPicker) test asserting the panel heading copy switches on ambiguousPersons.length (1 → "Meintest du", ≥2 → "Mehrere Personen gefunden") and that the cue is absent in the 1-item case.
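Java has no default arguments, so one way to give makeNameMatches empty-list defaults is a set of overloads. Person and NameMatches below are minimal stand-ins so the sketch compiles, and NameMatchesFixtures is a hypothetical holder class:

```java
import java.util.List;

/** Minimal stand-ins so the sketch compiles; the real types live in the Person domain. */
record Person(String displayName) {}

record NameMatches(List<Person> direct, List<Person> partial) {}

/** Hypothetical test fixture: overloads supply the empty lists that defaults would. */
final class NameMatchesFixtures {
    private NameMatchesFixtures() {}

    /** No candidates at all: keeps no-match stubs to one line. */
    static NameMatches makeNameMatches() {
        return makeNameMatches(List.of(), List.of());
    }

    /** Direct matches only, no partials. */
    static NameMatches makeNameMatches(List<Person> direct) {
        return makeNameMatches(direct, List.of());
    }

    static NameMatches makeNameMatches(List<Person> direct, List<Person> partial) {
        return new NameMatches(List.copyOf(direct), List.copyOf(partial));
    }
}
```

A no-match stub then stays a one-liner, e.g. when(personService.resolveByName("Xyz")).thenReturn(makeNameMatches()) with Mockito's when/thenReturn. Partial-only stubs pass an explicit empty direct list.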

All backend unit tests <10s; the integration assertions run in the existing backend test stage (one container). Keep permutations out of E2E — one NL-search happy-path E2E already exists.

marcel added the P2-medium and feature labels 2026-06-06 20:20:21 +02:00
Author
Owner

Implemented — PR #769

Shipped via TDD on feat/issue-763-nl-search-direct-match (based on the unmerged #743 branch; PR targets main).

Commits

  • 543013b9 feat(person): name-match tokenizer (lowercase, split on whitespace/hyphen/apostrophe, drop empties)
  • ef4f7eda feat(person): match alias first names in searchByName (closes the fetch-vs-classify gap; no migration)
  • ef75f7f0 feat(person): resolveByName → NameMatches(direct, partial): tokenize → cap(8) → fetch pool → classify → cap(10) after classification; empty-token guard; PII-free debug log; read-only tx for lazy aliases
  • a79a31c3 feat(search): map NameMatches into resolve buckets (1 direct → auto-select, ≥2 direct → ambiguous, partial-only → suggestions, 0 → full-text)
  • c20f0351 feat(search): case-appropriate picker copy — "Meintest du …?" visible panel heading + cue suppression for the 1-item case
  • c0edbd43 docs(search): GLOSSARY NameMatches + person/README public surface
  • Review fixes: 3f0d37bf (trigger aria-label derives from match count — a11y), 751ac1c4 (real-Postgres end-to-end alias resolution tests), 79bcd1b3 (fetchPool dedup test)

Acceptance criteria

All 14 criteria are covered by code + tests — including auto-select with a coexisting Clara Cramer, maiden-alias direct match, alias-first-name fetch, pinned tokenizer output, the 1-item "Meintest du …?" heading with suppressed cue, ≥2 "Mehrere Personen gefunden" framing, all-punctuation/empty folding, direct-sorts-beyond-cap, the ≤8-fetch DoS invariant, and two-distinct-names sender/receiver mapping.

Verification

  • Backend: PersonServiceTest (81), PersonRepositoryTest (56, real Postgres), NlQueryParserServiceTest (38), NlSearchControllerTest (7) — all green; mvnw package succeeds.
  • Frontend: DisambiguationPicker.svelte.spec.ts 10/10; npm run check adds no new errors over baseline.

Review

Multi-persona review on PR #769: cycle 1 → zero blockers, three concerns (a11y trigger label + alias test seam + dedup test); all fixed. Cycle 2 → all 7 personas Approved, zero blockers, zero concerns.

Reference: marcel/familienarchiv#763