As a searcher I want a single direct person match auto-selected in smart search so I'm not forced to disambiguate when there's only one real match #763
Problem
In natural-language ("smart") search, a query like "Briefe von Clara Cram" shows the disambiguation picker even when exactly one person is actually named Clara Cram.
Root cause:
`NlQueryParserService.resolveNames()` resolves person names via `personService.findByDisplayNameContaining(name)`, a substring match, and then treats any result count > 1 as ambiguous:

`backend/src/main/java/org/raddatz/familienarchiv/search/NlQueryParserService.java:89-123`

So "Clara Cram" returns the picker whenever a Clara Cram coexists with any other substring hit such as Clara Cramer or Clara Crammond, even though there is exactly one direct match.

Desired behaviour
Definition of a "direct" match (alias / maiden-name aware)
Token / word-boundary match (order-independent, tolerates middle names), evaluated against all of a person's name components, not just the formatted display name.
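A minimal sketch of this token-set matching, assuming the tokenization rule this issue describes elsewhere (lowercase, split on whitespace, hyphen, and apostrophe, drop empties). The class and method names (`NameTokens`, `tokenize`, `personTokens`, `isDirect`) are hypothetical, and plain strings stand in for a person's name components; this is not the project's actual code.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not the project's actual class. */
final class NameTokens {

    private NameTokens() {}

    /** Lowercases, splits on whitespace, hyphen and apostrophe, drops empties. */
    static Set<String> tokenize(String name) {
        Set<String> tokens = new LinkedHashSet<>();
        if (name == null) {
            return tokens; // null-safe: no tokens
        }
        for (String part : name.toLowerCase(Locale.ROOT).split("[\\s\\-']+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Unions the tokens of every name component (first, last, alias, ...). */
    static Set<String> personTokens(List<String> components) {
        Set<String> union = new LinkedHashSet<>();
        for (String component : components) {
            union.addAll(tokenize(component));
        }
        return union;
    }

    /**
     * Direct means every query token is a whole token of the person.
     * An empty query is never direct (containsAll of an empty set is true).
     */
    static boolean isDirect(Set<String> queryTokens, Set<String> personTokens) {
        return !queryTokens.isEmpty() && personTokens.containsAll(queryTokens);
    }
}
```

With these helpers, "Clara Cram" is direct for a person with first name Clara, last name Müller, and a maiden alias Cram, but not for Clara Cramer, because `cram` and `cramer` are different whole tokens.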
1. Tokenize the query name: lowercase, split on whitespace, hyphen, and apostrophe, and drop empty tokens. `Anna-Maria` → {anna, maria}; `D'Angelo` → {d, angelo}; `Müller` → {müller}; `"Clara  Cram"` (double space) → {clara, cram}.
2. For each candidate, collect `firstName`, `lastName`, `alias`, each `PersonNameAlias`'s `firstName` + `lastName` (covers `MAIDEN_NAME`, `BIRTH`, `WIDOWED`, `DIVORCED`, `OTHER`), and `title` (see note). Tokenize each the same way and union into one set → `personTokens`.
3. A candidate is a direct match if every `queryToken` is present as a whole token in `personTokens` (set containment).

- "Clara Cram" is direct for a person whose `displayName` is "Clara Müller" but who carries a maiden-name alias "Cram": `clara` from `firstName`, `cram` from the maiden alias's `lastName`.
- A substring-only hit such as Clara Cramer is not direct (`cram` ≠ `cramer`).

Token-set containment is chosen over regex `\b` word boundaries to avoid umlaut/diacritic edge cases and for natural order-independence. The union spans separate name components (so `firstName` + maiden `lastName` together satisfy "Clara Cram"), which is intended: a query token need only match some component, not a single name form.

Note: `title` is classification-only (superset tolerance). A title token like "Dr" never appears in an LLM-extracted person name and is never the sole basis for a match; including `title` in the union just keeps "Dr Clara Cram" from being rejected. It is deliberately not added to the candidate-fetch query, because a title-only match is meaningless. Every other component participates in both fetch and classification (see resolved decision #3).

Resolved decisions (from multi-persona review)
1. Add `PersonService.resolveByName(String name)`, returning a Person-domain result record `record NameMatches(List<Person> direct, List<Person> partial)`. The fields are named `direct`/`partial` (name-match strength), not `resolved`/`ambiguous` (search-layer vocabulary). Do not reuse the private `NameResolution` record currently inside `NlQueryParserService` across the boundary.
2. `NlQueryParserService` maps the `NameMatches` result into its own resolved/ambiguous/noMatch buckets. Keep `findByDisplayNameContaining` as the per-token fetch primitive, called from inside `resolveByName` (it is not removed; the person-search typeahead still uses it).
3. `PersonRepository.searchByName` matches `firstName lastName` / `lastName firstName` concatenations, `p.alias`, and `a.lastName`, but not `a.firstName`. The classifier (step 2) accepts alias `firstName` tokens, so without this fix a candidate matchable only via an alias first name is never fetched and can never be classified direct. Add `OR LOWER(a.firstName) LIKE LOWER(CONCAT('%', :query, '%'))` to `searchByName` (it reuses the same bound `:query` parameter, so it stays injection-proof). The `person_name_aliases.first_name` column already exists, so no Flyway migration and no DB-diagram update are needed. (Benign side effect: the typeahead now also matches alias first names.) `title` stays out of the fetch by design.
4. The "Meintest du …?" prompt goes in the panel heading, not in the trigger label (the trigger is truncated at `max-w-[8rem]`/`max-w-[12rem]`, which would clip "Meintest du Clara Cra…" at 320px and hide the suggestion from the senior audience). Putting the prompt in the `<ul>` panel heading guarantees the full suggested name renders, at body size, whenever the picker opens. Also drop the standalone `(auswählen…)` cue (`search_disambiguation_cue`) in the 1-partial branch, since "Meintest du …?" already implies the action; keep the cue for the ≥2 case.

Implementation notes
In `PersonService.resolveByName(String name)`:

- Cap the query at `MAX_TOKENS = 8` distinct tokens before the fetch loop. This cap is a DoS control (each token is one unindexed leading-wildcard `LIKE` scan), not just a performance measure, so make it a tested invariant. Keep the existing `MAX_NAME_LENGTH = 200` input guard in `NlQueryParserService` as the first line of defense. Co-locate `MAX_TOKENS`/`MAX_CANDIDATES` in `PersonService` (the layer that owns the loop); leave `MAX_NAME_LENGTH` in `NlQueryParserService`. Don't duplicate constants across both classes.
- If `queryTokens` is empty (the name was all punctuation/whitespace), return empty `direct` + empty `partial`. `Set.containsAll(emptySet)` returns `true`, so without this guard every candidate would be marked direct. The search layer then folds the name to full-text search.
- Build the candidate pool by calling `findByDisplayNameContaining(token)` over each capped token, deduped by id.
- Apply `MAX_CANDIDATES` (10) AFTER classification, never before. `searchByName` returns `ORDER BY lastName, firstName`, so a direct match can sort past position 10; capping the raw list first can discard the one true direct match.
- Add a `log.debug` recording the outcome bucket per name (`direct=1` / `direct>=2` / `partial-only` / `no-match`) plus the token count (not the raw name, to keep PII out). This makes "is auto-select firing?" / "why slow?" answerable from Loki without a redeploy.

Keep `resolveByName` a short orchestrator (`tokenize → cap → fetchPool → classify → capAfterClassify`), each step a private helper under 20 lines, with the empty-token guard first.

`NlQueryParserService.resolveNames()` then maps `NameMatches` per name:

- exactly 1 direct → `resolved` (auto-select)
- ≥2 direct → `ambiguous` (direct matches only)
- partial-only → `ambiguous` (partials, as suggestions; a 1-item picker is allowed)
- no match → `noMatchFragments` (full-text search)

Preserve existing behaviour: the up-to-2 `resolved` cap with overflow → `extraFragments`; the early return when any name is ambiguous; sender/receiver role mapping (`buildSender`/`buildReceiver`, `isAnyRole`) including the two-distinct-names case (Walter AND Emma → sender=person[0], receiver=person[1]).

Docs (do before opening the PR)
- `docs/GLOSSARY.md`: entry for `NameMatches` (direct vs partial name-match strength).
- `frontend/` Person domain `person/README.md`: add `PersonService.resolveByName` to the public-surface list.

Frontend changes
The picker copy must reflect which case it is — "Mehrere Personen gefunden" on a 1-item picker fails Nielsen heuristic #2 (system matches reality), which matters for the 60+ audience.
- Keep the trigger label on `search_disambiguation_trigger_label`. Do not put the "Meintest du" framing here (it truncates at 320px).
- In `frontend/src/routes/documents/+page.svelte` (the `nlIsAmbiguous` branch, `:58`/`:438`), derive the `<ul>` heading from `ambiguousPersons.length`: length === 1 → "Meintest du …?" / "Did you mean …?"; length ≥ 2 → keep the existing "Person auswählen" / "Mehrere Personen gefunden" framing.
- Pass `heading` into `DisambiguationPicker.svelte` as a prop (it currently calls `m.search_disambiguation_heading()` internally at `:69`, and `m.search_disambiguation_trigger_label()` at `:57`). Keep the component dumb and let the page decide.
- Suppress `m.search_disambiguation_cue()` ("(auswählen…)", `:62`) in the 1-partial branch; keep it for ≥2.
- Add the new message key (`search_disambiguation_did_you_mean`) in all three locales `messages/{de,en,es}.json`, following the existing `search_disambiguation_*` pattern (verified present in all three).
- The existing `aria-label` (`{name} auswählen`) reads correctly for both cases; no change.
- `DisambiguationPicker` already meets the a11y baseline (44px targets, `aria-haspopup`/`expanded`/`controls`, focus-visible rings, Esc-to-close + focus return); no new a11y work.

Acceptance criteria
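- A candidate matchable only via an alias first name is fetched and can be classified direct (`a.firstName`).
- Tokenizer output is pinned: `Anna-Maria` → {anna, maria}, `D'Angelo` → {d, angelo}, `Müller` → {müller}, lowercased, empties dropped.
- In the 1-item case the panel heading reads "Meintest du …?" and the `(auswählen…)` cue is suppressed.
- A direct match that sorts beyond `MAX_CANDIDATES` among partials is still auto-selected (cap applied after classification).
- A name with more than 8 tokens issues at most 8 `findByDisplayNameContaining`/`searchByName` invocations (token cap is a tested invariant).

Tests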
TDD, red/green per criterion. Backend coverage gate is 88% branch (JaCoCo) — ensure the empty-token-guard and direct-sorts-beyond-cap branches are both hit, plus the direct/partial and cap-before/after branches.
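To make the two named branches concrete, here is a sketch of a classify-then-cap step. It is illustrative only: `ClassifyThenCap`, `Candidate`, and the nested `NameMatches` are hypothetical stand-ins (display names instead of `Person` entities), and capping each result list separately is an assumption, not something the issue specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; names stand in for Person entities. */
final class ClassifyThenCap {

    static final int MAX_CANDIDATES = 10; // value quoted in the issue

    /** A fetched candidate: display name plus that person's token set. */
    record Candidate(String name, Set<String> tokens) {}

    /** Stand-in for the Person-domain result record. */
    record NameMatches(List<String> direct, List<String> partial) {}

    private ClassifyThenCap() {}

    static NameMatches classify(Set<String> queryTokens, List<Candidate> pool) {
        // Branch 1: containsAll(emptySet) is always true, so an empty query
        // would otherwise mark every candidate as direct.
        if (queryTokens.isEmpty()) {
            return new NameMatches(List.of(), List.of());
        }
        List<String> direct = new ArrayList<>();
        List<String> partial = new ArrayList<>();
        for (Candidate c : pool) {
            if (c.tokens().containsAll(queryTokens)) {
                direct.add(c.name());
            } else {
                partial.add(c.name());
            }
        }
        // Branch 2: cap only now, so a direct match that sorted late in the
        // pool survives. Capping each list separately is an assumption.
        return new NameMatches(cap(direct), cap(partial));
    }

    private static List<String> cap(List<String> names) {
        return List.copyOf(names.subList(0, Math.min(names.size(), MAX_CANDIDATES)));
    }
}
```

A test for the second branch would place the single direct candidate after more than `MAX_CANDIDATES` partial ones in the pool and assert that it is still returned as direct.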
- `PersonServiceTest` (`@ExtendWith(MockitoExtension.class)`, mock `PersonRepository` + `PersonNameAliasRepository`, no Spring context): `resolveByName` classification, one assertion per test:
  - Tokenizer: `Anna-Maria`, `D'Angelo`, `Müller`, double-space, empty/all-spaces, null-safety (assert exact token output).
  - Classification: `singleDirectMatch_classifiesAsDirect`, `maidenAliasToken_classifiesAsDirect`, `aliasFirstNameToken_isFetchedAndClassified`, `middleName_stillDirect`, `reorderedTokens_stillDirect`, `cramVsCramer_notDirect`, `emptyAfterTokenizing_returnsNoCandidates`, `directSortsBeyondCap_stillReturnedAsDirect`.
  - `over8Tokens_issuesAtMost8Fetches`: assert the invocation count via `verify(personRepository, atMost(8)).searchByName(any())`, not list size (the DoS guarantee is about fetches issued).
  - `findByDisplayNameContaining_delegatesToSearchByName` (existing, line 903) stays valid; the primitive is not removed.
- `PersonRepositoryTest` (already a `@DataJpaTest` + Testcontainers `postgres:16-alpine` slice with `searchByName` fixtures incl. an alias-"Cram" test): do not author a new slice or a second container. Add `searchByName_findsByAliasFirstName` and `searchByName_ordersByLastNameThenFirstName`, reusing the existing builder fixtures. H2 differs on case folding and `CONCAT` over nulls, so this must be real Postgres.
- `NlQueryParserServiceTest`: re-point the 23 mocks from `findByDisplayNameContaining` to `resolveByName` returning `NameMatches`. Provide a `makeNameMatches(direct, partial)` factory with both args defaulting to an empty list so no-match stubs (tests #4/#8) stay one-liners. Keep assertions for tests #2/#3 (ambiguous size 2, search skipped); verify test #20 (`search_elevenCandidates_capsAtTen`) exercises the cap after classification, not the old pre-cap path. Add: `partialOnly_oneCandidate_populatesAmbiguous`, `partialOnly_twoCandidates_populatesAmbiguous` (producer for both copy branches), `oneDirect_executesSearch`, two-names → sender/receiver.
- Frontend: a `documents/+page.svelte` (or `DisambiguationPicker`) test asserting the panel heading copy switches on `ambiguousPersons.length` (1 → "Meintest du", ≥2 → "Mehrere Personen gefunden") and that the cue is absent in the 1-item case.

All backend unit tests <10s; the integration assertions run in the existing backend test stage (one container). Keep permutations out of E2E; one NL-search happy-path E2E already exists.
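The bucket mapping that the `NlQueryParserServiceTest` additions exercise can be sketched as below. `NameBuckets`, the `Bucket` enum, and `pickerEntries` are hypothetical names introduced for illustration; the real service works on `Person` objects and its own resolved/ambiguous/noMatch structures.

```java
import java.util.List;

/** Hypothetical sketch; names stand in for Person entities. */
final class NameBuckets {

    enum Bucket { RESOLVED, AMBIGUOUS, NO_MATCH }

    /** Stand-in for the Person-domain result record. */
    record NameMatches(List<String> direct, List<String> partial) {}

    private NameBuckets() {}

    /** Maps one name's matches to the search-layer bucket. */
    static Bucket bucketFor(NameMatches m) {
        if (m.direct().size() == 1) {
            return Bucket.RESOLVED; // auto-select, even if partials coexist
        }
        if (m.direct().size() >= 2 || !m.partial().isEmpty()) {
            return Bucket.AMBIGUOUS; // picker opens
        }
        return Bucket.NO_MATCH; // name falls back to full-text search
    }

    /** People offered in the picker; empty unless the name is ambiguous. */
    static List<String> pickerEntries(NameMatches m) {
        if (bucketFor(m) != Bucket.AMBIGUOUS) {
            return List.of();
        }
        // With two or more direct matches, only those are offered;
        // otherwise the partials serve as suggestions (possibly just one).
        return m.direct().size() >= 2 ? m.direct() : m.partial();
    }
}
```

The first branch is the behaviour this issue asks for: one direct Clara Cram resolves immediately even when Clara Cramer is also in the partial list.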
Implemented — PR #769 ✅
Shipped via TDD on `feat/issue-763-nl-search-direct-match` (based on the unmerged #743 branch; PR targets `main`).

Commits

- `543013b9` feat(person): name-match tokenizer (lowercase, split on whitespace/hyphen/apostrophe, drop empties)
- `ef4f7eda` feat(person): match alias first names in `searchByName` (closes the fetch-vs-classify gap; no migration)
- `ef75f7f0` feat(person): `resolveByName` → `NameMatches(direct, partial)`: tokenize → cap(8) → fetch pool → classify → cap(10) after classification; empty-token guard; PII-free debug log; read-only tx for lazy aliases
- `a79a31c3` feat(search): map `NameMatches` into resolve buckets (1 direct → auto-select, ≥2 direct → ambiguous, partial-only → suggestions, 0 → full-text)
- `c20f0351` feat(search): case-appropriate picker copy: "Meintest du …?" visible panel heading + cue suppression for the 1-item case
- `c0edbd43` docs(search): GLOSSARY `NameMatches` + person/README public surface
- Review fixes: `3f0d37bf` (trigger aria-label derives from match count, a11y), `751ac1c4` (real-Postgres end-to-end alias resolution tests), `79bcd1b3` (fetchPool dedup test)

Acceptance criteria
All 14 criteria are covered by code + tests — including auto-select with a coexisting Clara Cramer, maiden-alias direct match, alias-first-name fetch, pinned tokenizer output, the 1-item "Meintest du …?" heading with suppressed cue, ≥2 "Mehrere Personen gefunden" framing, all-punctuation/empty folding, direct-sorts-beyond-cap, the ≤8-fetch DoS invariant, and two-distinct-names sender/receiver mapping.
Verification
- Backend: `PersonServiceTest` (81), `PersonRepositoryTest` (56, real Postgres), `NlQueryParserServiceTest` (38), `NlSearchControllerTest` (7), all green; `mvnw package` succeeds.
- Frontend: `DisambiguationPicker.svelte.spec.ts` 10/10; `npm run check` adds no new errors over baseline.

Review
Multi-persona review on PR #769: cycle 1 → zero blockers, three concerns (a11y trigger label + alias test seam + dedup test); all fixed. Cycle 2 → all 7 personas ✅ Approved, zero blockers, zero concerns.