Removes SmartModeToggle, SmartSearchStatus, InterpretationChipRow,
DisambiguationPicker, chip-types utilities, and theme-chip-removal
utilities as part of NLP feature removal.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove SMART_SEARCH_UNAVAILABLE and SMART_SEARCH_RATE_LIMITED error codes
from ErrorCode enum; remove nlp and nl-search configuration blocks from
application.yaml; remove nlp config block from application-dev.yaml.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes the ollama and ollama-model-init services (and ollama-models
volume) from the production/staging compose file. Adds the nlp-service
in their place — mirroring the dev compose — and wires the backend
dependency and APP_NLP_BASE_URL env var so staging can reach the new
service.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Paraglide 2.5 runtime exports getLocale(), not languageTag(). The
8bed0cc6 commit introduced the wrong import when threading lang through
the NL search path.
Also updates two test assertions that still expected the old 'KI' button
label after 0b31a51e renamed it to 'Smart-Suche'.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The source class RestClientOllamaClient was removed in 864f44a4 but the
corresponding test file was not staged at the time. Removes the leftover
file; coverage is provided by RestClientNlpClientTest.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes 'dateparser 1.2' from the stack section (dependency was dropped
in favour of the rule-based date regex pipeline). Rewrites the Notes
section to reflect that docker-compose integration and Java-side wiring
were both delivered in this PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Changes the backend's depends_on condition on nlp-service from
service_started to service_healthy, so the backend container does not start
until FastAPI has bound its port and loaded person names from the database.
This eliminates the startup race in which the first NL search could return
503 while nlp-service was still bootstrapping.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The rule-based NLP service answers in under 100 ms versus Ollama's ~15 s,
making the old rate limit too restrictive for normal interactive use.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- NlSearchRequest gains @NotBlank @Pattern(regexp="de|en|es") lang field
- NlSearchController passes request.lang() to service
- NlQueryParserService.search signature becomes (String, String, Pageable); renames ollamaClient → nlpClient; removes the redundant length guard (Bean Validation is the enforcement point)
- application.yaml: replaces app.ollama.* with app.nlp.base-url; application-dev.yaml: points to localhost:8001
- frontend/documents/+page.svelte: sends lang: languageTag() in POST body
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Wire _EXTRA_SPAN_STOPS into _extract_persons_and_role so German function
words (im, seine, ihre, dem, …) terminate name spans — fixes "Clara im"
and "seine Kinder" leaking into personNames
- Add _NON_NAME_TOKENS filter in PersonMatcher.load() to skip DB records
whose first_name contains prepositions or possessives; this filters out 290
bad records (annotations like "an seine Eltern", "Eltern in", place references
like "Enkel Cram aus Mexiko") that were producing spurious exact matches in Pass 2
- Remove spaCy model downloads from Dockerfile (no longer needed after the
DB-backed matcher rewrite)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
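Both fixes above can be sketched in Python, assuming a simple trigger-word span extractor (the nlp-service's actual _extract_persons_and_role logic is not reproduced in this log). ROLE_TRIGGERS, the word subsets, and the function names are hypothetical stand-ins; only the ideas that a stop word ends a name span and that a non-name token disqualifies a DB first_name come from the commit.

```python
# Hypothetical subsets: the real _EXTRA_SPAN_STOPS and _NON_NAME_TOKENS
# sets in the nlp-service are not reproduced here.
ROLE_TRIGGERS = {"von": "sender", "an": "receiver"}  # hypothetical trigger words
EXTRA_SPAN_STOPS = {"im", "seine", "ihre", "dem"}
NON_NAME_TOKENS = {"an", "in", "aus", "seine", "ihre"}


def extract_persons_and_role(text):
    """Return (name_span, role) pairs; a stop word ends the current span."""
    tokens = text.split()
    results = []
    i = 0
    while i < len(tokens):
        role = ROLE_TRIGGERS.get(tokens[i].lower())
        if role is None:
            i += 1
            continue
        span = []
        j = i + 1
        # Collect tokens after the trigger until a stop word or another trigger.
        while j < len(tokens):
            low = tokens[j].lower()
            if low in EXTRA_SPAN_STOPS or low in ROLE_TRIGGERS:
                break
            span.append(tokens[j])
            j += 1
        if span:
            results.append((" ".join(span), role))
        i = j
    return results


def is_loadable_first_name(first_name):
    """Reject DB first_name values containing a preposition or possessive."""
    return not any(tok in NON_NAME_TOKENS for tok in first_name.lower().split())
```

With "im" as a stop, "Briefe von Clara im Sommer" yields only "Clara", and "Briefe an seine Kinder" yields no person at all.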
Rule-based pipeline: persons matched via rapidfuzz against all known
names loaded from DB at startup. Fixes first-name-only extraction
(Eugenie, Herbert), merged-span bug (Herbert + Eugenie de Gruyter),
false positives on compound nouns, and EN/ES model failures.
Date extraction unchanged (regex). No spaCy models required.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
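The matching idea above can be sketched as follows. The service uses rapidfuzz; this sketch swaps in the standard library's difflib.SequenceMatcher so it runs without dependencies, and KNOWN_NAMES, the 0.85 cutoff, and the full/first/last variant scheme are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical sample; the service loads all known names from the DB at startup.
KNOWN_NAMES = ["Eugenie de Gruyter", "Herbert de Gruyter", "Clara Cram"]


def match_persons(candidate, known_names, cutoff=0.85):
    """Return every known full name whose full, first, or last name fuzzily matches."""
    cand = candidate.lower()
    hits = []
    for full in known_names:
        parts = full.split()
        # Matching against components lets a first name alone ("Eugenie") resolve.
        variants = {full.lower(), parts[0].lower(), parts[-1].lower()}
        score = max(SequenceMatcher(None, cand, v).ratio() for v in variants)
        if score >= cutoff:
            hits.append(full)
    return hits
```

Because each known person is scored independently, "Herbert" and "Eugenie" never merge into one span, and a shared surname simply returns several people.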
Also adds a regex year fallback in extract_dates() for the de/es spaCy small
models, which don't tag bare 4-digit years as DATE entities, and widens the
direction-token window to 2 tokens back to handle Spanish "antes de".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
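The year fallback and the widened direction window can be sketched like this. YEAR_RE, DIRECTION_WORDS, and extract_years are hypothetical names, the direction vocabulary is illustrative, and the real extract_dates() also consumed spaCy DATE entities, which this sketch leaves out.

```python
import re

YEAR_RE = re.compile(r"\d{4}")
# Hypothetical direction vocabulary; the real extract_dates() lists are not shown.
DIRECTION_WORDS = {
    "vor": "before", "nach": "after",
    "before": "before", "after": "after",
    "antes": "before", "después": "after",
}
DIRECTION_WINDOW = 2  # look up to two tokens back, e.g. "antes de 1920"


def extract_years(text):
    """Return (year, direction) pairs for bare 4-digit years in text."""
    tokens = text.split()
    found = []
    for i, tok in enumerate(tokens):
        bare = tok.strip(".,;:")
        if not YEAR_RE.fullmatch(bare):
            continue
        direction = None
        for back in range(1, DIRECTION_WINDOW + 1):
            if i - back < 0:
                break
            direction = DIRECTION_WORDS.get(tokens[i - back].lower())
            if direction:
                break
        found.append((int(bare), direction))
    return found
```

With a one-token window, "antes de 1920" would only see "de" and lose the direction; two tokens back reaches "antes".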
Task 1: Create a standalone FastAPI service scaffold with models, a test
framework, and documentation. Includes ParseRequest and ParseResponse Pydantic
models matching the OllamaExtraction contract, plus three passing tests
covering model validation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Drop unused MAX_CANDIDATES constant (not referenced in service)
- Keep detached-entity safety comment in resolveTags()
- Add 3 new partial-name match tests (23a/b/c) from #763
- Use resolveByName() API in test 28 (replaces findByDisplayNameContaining)
- Add NameMatches glossary entry from #763
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Assert that when the same person id is returned by two different token
fetches, the person appears exactly once in the result -- pinning
fetchPool's putIfAbsent dedup so a future refactor can't silently
double-classify a candidate.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
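The fetchPool dedup pinned by this test can be sketched in Python, assuming each per-token fetch returns records carrying an id field. The function shape and record layout are hypothetical; the production code is Java and uses Map.putIfAbsent.

```python
def fetch_pool(tokens, fetch):
    """Merge per-token fetch results, keeping each person id once (first fetch wins)."""
    pool = {}
    for token in tokens:
        for person in fetch(token):
            # dict.setdefault plays the role of Java's Map.putIfAbsent here.
            pool.setdefault(person["id"], person)
    return list(pool.values())
```

Since the pool is keyed by id, a person returned by both the "anna" and the "meyer" fetch reaches the classifier exactly once.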
AC#4 (maiden alias -> direct) and AC#5 (alias first name -> fetchable +
classifiable) were each split across PersonRepositoryTest (the fetch) and
PersonServiceTest (the classifier with stubs) -- nothing walked
searchByName -> resolveByName end-to-end on real Postgres. Add two tests
in the existing @DataJpaTest slice that build a real PersonService over
the autowired repositories, persist a person with a MAIDEN_NAME alias and
one with an alias firstName, and assert both classify as direct.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The trigger hardcoded the multiple-people label for every count, so a
single-suggestion did-you-mean picker announced "Mehrere Personen gefunden" to
screen readers while sighted users saw one name and a "Meintest du …?"
heading. Derive the trigger's accessible name from persons.length: a
single suggestion reuses the heading prop, two or more keep the
multiple-people label. Visible truncated name span unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
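The accessible-name rule described above, sketched language-neutrally in Python (the actual component is Svelte). The function name and the constant are hypothetical; the real label text comes from the i18n messages.

```python
# Hypothetical constant; the real label comes from the i18n messages.
MULTIPLE_PEOPLE_LABEL = "Mehrere Personen gefunden"


def trigger_accessible_name(persons, heading):
    """One suggestion reuses the heading prop; two or more keep the plural label."""
    if len(persons) == 1:
        return heading
    return MULTIPLE_PEOPLE_LABEL
```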
Adds a GLOSSARY entry for NameMatches (direct vs partial name-match strength
and how the search layer maps it); person/README adds resolveByName to the
public surface. No ADR: the matching rule is localized and justified inline.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A 1-item picker now reads "Meintest du …?" (a single direct match auto-selects
and never reaches the picker), while ≥2 keeps the "Person auswählen" framing.
The prompt lives in a visible, non-truncated panel heading (the trigger span
clips at 320px), and the "(auswählen…)" cue is dropped for the 1-item case.
DisambiguationPicker takes heading + showCue props; the page derives both from
ambiguousPersons.length. New search_disambiguation_did_you_mean key in de/en/es.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
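A minimal sketch of how the page might derive both props from ambiguousPersons.length, written in Python for brevity (the real page is Svelte). search_disambiguation_did_you_mean is the key named above; SELECT_PERSON_KEY is a hypothetical name for the existing "Person auswählen" key.

```python
DID_YOU_MEAN_KEY = "search_disambiguation_did_you_mean"  # key added in this commit
SELECT_PERSON_KEY = "select_person"  # hypothetical name for the "Person auswählen" key


def picker_props(ambiguous_count):
    """Derive the DisambiguationPicker heading key and showCue flag from the count."""
    if ambiguous_count < 1:
        raise ValueError("the picker is only shown for at least one person")
    if ambiguous_count == 1:
        # A lone suggestion is a partial match (a single direct match auto-selects),
        # so frame it as "Meintest du …?" and drop the "(auswählen…)" cue.
        return DID_YOU_MEAN_KEY, False
    return SELECT_PERSON_KEY, True
```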
resolveNames now delegates to PersonService.resolveByName and maps by match
strength: 1 direct → resolved (auto-select), ≥2 direct → ambiguous, 0 direct
with partials → ambiguous suggestions, 0 candidates → folded into full-text.
A single direct match no longer forces the picker when looser substring hits
coexist. The MAX_CANDIDATES cap moved into PersonService (after classification);
the MAX_NAME_LENGTH guard, resolved-cap overflow, and sender/receiver mapping
are preserved.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
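The four outcomes above can be expressed as a small mapping function. This is a Python sketch of the Java logic; the NameMatches field names and the bucket strings are assumptions, since the log only names the direct/partial split.

```python
from dataclasses import dataclass, field


@dataclass
class NameMatches:
    # Field names are assumptions; the commit only names the direct/partial split.
    direct: list = field(default_factory=list)
    partial: list = field(default_factory=list)


def map_name_matches(matches):
    """Map match strength to a resolveNames outcome bucket."""
    if len(matches.direct) == 1:
        return "resolved", matches.direct   # auto-select, even if partials coexist
    if len(matches.direct) >= 2:
        return "ambiguous", matches.direct  # picker over the direct matches
    if matches.partial:
        return "ambiguous", matches.partial  # picker shows partial suggestions
    return "full_text", []                  # no candidates: fold into full-text search
```

Checking the direct count first is what stops a single direct match from being pushed into the picker by coexisting substring hits.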
Token-set containment over all of a person's name components (firstName,
lastName, alias, each PersonNameAlias first+last, title) decides direct vs
partial. Orchestrates tokenize → cap(8) → fetch pool → classify → cap(10)
after classification, with an empty-token guard and a PII-free debug log of
the outcome bucket. MAX_TOKENS is a DoS control. Capping after classification,
not before it, keeps a direct match that would otherwise sort past position 10
among partials and be cut. The read-only transaction keeps lazy nameAliases
reachable during classification (ADR-022).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
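The classification step can be sketched in Python under stated assumptions: candidates arrive already fetched (the fetch pool is out of scope here), persons are plain dicts with hypothetical keys, "direct" means every query token occurs among the person's name tokens, and the post-classification cap fills with direct matches first. The production code is Java in PersonService.

```python
import re

MAX_TOKENS = 8       # DoS control on query size
MAX_CANDIDATES = 10  # applied after classification


def tokenize(text):
    return [t for t in re.split(r"[\s\-']+", text.lower()) if t]


def name_tokens(person):
    """Union of tokens over all name components, including PersonNameAlias first+last."""
    components = [person.get(k) for k in ("first_name", "last_name", "alias", "title")]
    for alias in person.get("name_aliases", []):
        components += [alias.get("first_name"), alias.get("last_name")]
    tokens = set()
    for comp in components:
        if comp:
            tokens.update(tokenize(comp))
    return tokens


def resolve_by_name(query, candidates):
    """Split pre-fetched candidates into (direct, partial), capping after classification."""
    query_tokens = set(tokenize(query)[:MAX_TOKENS])
    if not query_tokens:
        return [], []  # empty-token guard
    direct, partial = [], []
    for person in candidates:
        # Assumed rule: direct when every query token occurs among the person's name tokens.
        (direct if query_tokens <= name_tokens(person) else partial).append(person)
    direct = direct[:MAX_CANDIDATES]
    partial = partial[:MAX_CANDIDATES - len(direct)]
    return direct, partial
```

A maiden name stored as an alias last name still yields a direct match, and a direct hit listed after a dozen partials survives the cap.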
The direct-match classifier accepts alias firstName tokens, so the fetch must
surface candidates matchable only via an alias first name. Add a.firstName to
the searchByName LIKE clause (it reuses the bound :query parameter, so it stays injection-proof). The
person_name_aliases.first_name column already exists; no migration.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
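An SQL analogue of the change, run against an in-memory SQLite database. The real searchByName is a JPQL query not shown here; the persons table, its columns, and the other LIKE clauses are assumptions, and only person_name_aliases.first_name comes from the commit. The point is that the new alias first-name predicate reuses the same bound :query parameter.

```python
import sqlite3


def demo_connection():
    """In-memory stand-in schema; only person_name_aliases.first_name is from the commit."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE persons (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
        CREATE TABLE person_name_aliases (person_id INTEGER, first_name TEXT, last_name TEXT);
        INSERT INTO persons VALUES (1, 'Johanna', 'Schmidt'), (2, 'Karl', 'Meyer');
        INSERT INTO person_name_aliases VALUES (1, 'Hanni', NULL);
        """
    )
    return conn


def search_by_name(conn, query):
    """Return ids of persons whose own or alias names contain the query."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.id
        FROM persons p
        LEFT JOIN person_name_aliases a ON a.person_id = p.id
        WHERE lower(p.first_name) LIKE :query
           OR lower(p.last_name) LIKE :query
           OR lower(a.last_name) LIKE :query
           OR lower(a.first_name) LIKE :query
        ORDER BY p.id
        """,
        {"query": f"%{query.lower()}%"},  # one bound parameter reused by every clause
    ).fetchall()
    return [row[0] for row in rows]
```

Without the alias first-name clause, "hanni" would find nobody, so the classifier would never see the candidate it is able to match.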
Lowercase, split on whitespace/hyphen/apostrophe, drop empties. Applied
symmetrically to query and candidate name components so "Anna-Maria" and
"Anna Maria" tokenize alike. Foundation for resolveByName direct matching.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
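A minimal Python rendering of the tokenizer rules above. The production tokenizer is Java; the exact separator set here (whitespace, ASCII hyphen, ASCII apostrophe) is an assumption, and typographic apostrophes such as ’ are not covered.

```python
import re

# Assumed separator set: whitespace, ASCII hyphen, ASCII apostrophe.
_SEPARATORS = re.compile(r"[\s\-']+")


def tokenize(text):
    """Lowercase, split on whitespace/hyphen/apostrophe, and drop empty pieces."""
    return [tok for tok in _SEPARATORS.split(text.lower()) if tok]
```

Applying the same function to the query and to every candidate name component is what makes "Anna-Maria" and "Anna Maria" compare equal.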