Commit Graph

3411 Commits

Author SHA1 Message Date
Marcel
35642ce6c4 refactor(search): remove NLP error codes and application config
Remove SMART_SEARCH_UNAVAILABLE and SMART_SEARCH_RATE_LIMITED error codes
from ErrorCode enum; remove nlp and nl-search configuration blocks from
application.yaml; remove nlp config block from application-dev.yaml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
63926f440f refactor(search): delete backend NLP search package
Remove entire backend search domain including:
- NlSearchController, NlQueryParserService, NlpClient implementations
- Rate limiting, properties, DTOs (NlSearchRequest/Response/NlQueryInterpretation)
- All domain logic and tests (5 test files deleted)

Backend compiles successfully post-deletion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
e485626471 fix(infra): replace Ollama with nlp-service in docker-compose.prod.yml
Removes the ollama and ollama-model-init services (and ollama-models
volume) from the production/staging compose file. Adds the nlp-service
in their place — mirroring the dev compose — and wires the backend
dependency and APP_NLP_BASE_URL env var so staging can reach the new
service.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
345b3eca87 fix(search): replace languageTag() with getLocale(); sync KI→Smart in tests
Paraglide 2.5 runtime exports getLocale(), not languageTag(). The
8bed0cc6 commit introduced the wrong import when threading lang through
the NL search path.

Also updates two test assertions that still expected the old 'KI' button
label after 0b31a51e renamed it to 'Smart-Suche'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
53fdc8e3f9 refactor(search): delete orphaned RestClientOllamaClientTest
The source class RestClientOllamaClient was removed in 864f44a4 but the
corresponding test file was not staged at the time. Removes the leftover
file; coverage is provided by RestClientNlpClientTest.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
884c1156bd docs(deployment): replace Ollama with nlp-service in DEPLOYMENT.md
- §1: update memory table (nlp-service ~256 MB vs Ollama ~8 GB);
  update memory budget note; add nlp-service to topology diagram
- §2: replace 'Ollama (NL search) service' env var table with
  'NLP service' table (APP_NLP_BASE_URL, NLP_FUZZY_THRESHOLD);
  add credential-rotation restart note
- §3.4: replace Ollama model-pull first-deploy warning with
  nlp-service startup note (no download, --wait safe)
- §6: replace Ollama operational section (model pull, ollama list,
  upgrade guide) with nlp-service health check and tuning guide

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
960f1c171a docs(nlp-service): update CLAUDE.md — remove stale dateparser entry and prototype note
Removes 'dateparser 1.2' from the stack section (dependency was dropped
in favour of the rule-based date regex pipeline). Rewrites the Notes
section to reflect that docker-compose integration and Java-side wiring
were both delivered in this PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
644b7c2683 fix(infra): wait for nlp-service healthy before starting backend
Changes condition: service_started → service_healthy so the backend
container does not start until FastAPI has bound its port and loaded
person names from the database. Eliminates the startup race where a
first NL search would return 503 during nlp-service bootstrap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
3d2b907fb4 docs(adr): ADR-035 — replace Ollama with rule-based nlp-service
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
cc7132d11d docs(c4): replace Ollama with nlp-service in L2 container diagram
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
34ff3dbdfd feat(infra): replace Ollama with nlp-service in docker-compose
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
ab10daf325 chore(i18n): remove AI/KI/IA and timing refs from smart search strings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
4a23dfff8e test(search): assert lang field sent in E2E NL search request
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
08c11e567c feat(search): raise NL search rate limit from 5 to 20 req/min
The rule-based NLP service is <100ms vs Ollama's ~15s, making the old
limit too restrictive for normal interactive use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
503ed6adef test(search): replace OllamaClient test suite with NlpClient equivalents
- Delete RestClientOllamaClientTest, add RestClientNlpClientTest:
  WireMock targets POST /parse; adds isHealthy_returnsFalse_whenPersonsLoadedIsZero
- NlQueryParserServiceTest: @Mock NlpClient; all stubs updated to parse(String,String);
  NlpExtraction throughout; service.search(..., "de", PAGE); adds verify(nlpClient).parse(eq,eq)
- NlSearchControllerTest: add lang:"de" to all request bodies; stubs use anyString×3;
  rename search_returns503_whenOllamaUnavailable → search_returns503_whenNlpServiceUnavailable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
28201e363a refactor(search): delete Ollama* classes replaced by Nlp* equivalents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
9f78f25b0a feat(search): thread lang through NlSearchRequest → controller → NlQueryParserService → NlpClient
- NlSearchRequest gains @NotBlank @Pattern(regexp="de|en|es") lang field
- NlSearchController passes request.lang() to service
- NlQueryParserService.search signature: (String, String, Pageable); renames ollamaClient→nlpClient; removes redundant length guard (Bean Validation is enforcement point)
- application.yaml: replaces app.ollama.* with app.nlp.base-url; application-dev.yaml: points to localhost:8001
- frontend/documents/+page.svelte: sends lang: languageTag() in POST body

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
1fcadfcd8f feat(search): add RestClientNlpClient — POST /parse, GET /health with persons_loaded check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
381bd1d943 test(search): NlpPropertiesTest — validates baseUrl required and defaults
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
c8543726ec feat(search): add NlpProperties config and @ConfigurationPropertiesScan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
4cbe1dc2d3 feat(search): add NlpExtraction record, NlpClient and NlpHealthClient interfaces
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
324a76d6d2 test(nlp-service): guard global matcher state in try/finally
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
f4e8632e0d chore(nlp-service): add .dockerignore to exclude dev artifacts from image
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
829194f345 chore(nlp-service): remove unused dateparser dependency
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
98ee6cf587 feat(nlp-service): wire NLP_FUZZY_THRESHOLD env var with 0-100 validation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
778382cd61 feat(nlp-service): cap /parse query at 500 chars via Field(max_length=500)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
8d4f30019b feat(nlp-service): log WARNING when DATABASE_URL absent, ERROR on DB failure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
d65879d273 fix(nlp-service): return generic 500 detail to prevent credential leakage
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
bda7855cad fix(nlp-service): eliminate false-positive person matches from dirty DB records
- Wire _EXTRA_SPAN_STOPS into _extract_persons_and_role so German function
  words (im, seine, ihre, dem, …) terminate name spans — fixes "Clara im"
  and "seine Kinder" leaking into personNames
- Add _NON_NAME_TOKENS filter in PersonMatcher.load() to skip DB records
  whose first_name contains prepositions or possessives — filters 290 bad
  records (annotations like "an seine Eltern", "Eltern in", place references
  like "Enkel Cram aus Mexiko") that were causing exact Pass-2 matches
- Remove spaCy model downloads from Dockerfile (no longer needed after the
  DB-backed matcher rewrite)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
03d7d44e57 feat(nlp-service): replace spaCy NER with DB-backed PersonMatcher
Rule-based pipeline: persons matched via rapidfuzz against all known
names loaded from DB at startup. Fixes first-name-only extraction
(Eugenie, Herbert), merged-span bug (Herbert + Eugenie de Gruyter),
false positives on compound nouns, and EN/ES model failures.
Date extraction unchanged (regex). No spaCy models required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
aa200bf3c5 feat(nlp-service): Dockerfile — python:3.11-slim, models baked in 2026-06-08 10:56:32 +02:00
Marcel
7404284130 feat(nlp-service): FastAPI app with /parse and /health endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
3ddb2b278b feat(nlp-service): full extract() pipeline — assembles all steps
Also adds regex year-fallback in extract_dates() for de/es spaCy small
models that don't tag bare 4-digit years as DATE entities, and widens
the direction-token window to 2 tokens back to handle Spanish "antes de".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
Marcel
702a72d575 feat(nlp-service): keyword extraction (POS-filtered, deduped lemmas) 2026-06-08 10:56:32 +02:00
Marcel
3f74deda8c feat(nlp-service): date range extraction with direction detection 2026-06-08 10:56:32 +02:00
Marcel
8ed2a6d95b feat(nlp-service): role detection (sender/receiver/any) 2026-06-08 10:56:32 +02:00
Marcel
deea34c797 feat(nlp-service): NER person name extraction 2026-06-08 10:56:32 +02:00
Marcel
482a1c2863 feat(nlp-service): spaCy model loading with get_nlp/load_all_models 2026-06-08 10:56:32 +02:00
Marcel
8e63867ad8 docs(specs): UI specs for Lesereisen reader and Journey editor
All checks were successful
CI / Unit & Component Tests (push) Successful in 3m23s
CI / OCR Service Tests (push) Successful in 24s
CI / Backend Unit Tests (push) Successful in 4m2s
CI / fail2ban Regex (push) Successful in 46s
CI / Semgrep Security Scan (push) Successful in 21s
CI / Compose Bucket Idempotency (push) Successful in 1m8s
nightly / deploy-staging (push) Successful in 2m44s
lesereisen-reader-spec.html — Issue #752
  LR-0 type selector on /geschichten/new
  LR-1 REISE badge on the list
  LR-2 Journey reader (ordered cards, interlude asides, no position numbers)

lesereisen-editor-spec.html — Issue #753
  LE-1 empty JourneyEditor layout
  LE-2 editor with mixed items (documents + interludes, drag handles)
  LE-3 inline note-editing state
  LE-4 mobile layout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 19:07:34 +02:00
Marcel
6b0a06e8b1 feat(nlp-service): scaffold — models, requirements, CLAUDE.md
Task 1: Create standalone FastAPI service scaffold with models, test framework,
and documentation. Includes ParseRequest, ParseResponse Pydantic models matching
OllamaExtraction contract, plus three passing tests validating model validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 10:11:34 +02:00
Marcel
7c1eef710c docs(nlp): add spaCy NLP service implementation plan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:52:07 +02:00
Marcel
03e22a2f26 docs(nlp): add spaCy NLP service prototype design spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:40:00 +02:00
Marcel
6878419156 merge: resolve conflicts with origin/main (#763 person name-match integration)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m31s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m48s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
CI / Unit & Component Tests (push) Successful in 3m20s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 3m48s
CI / fail2ban Regex (push) Successful in 46s
CI / Semgrep Security Scan (push) Successful in 23s
CI / Compose Bucket Idempotency (push) Successful in 1m8s
- Drop unused MAX_CANDIDATES constant (not referenced in service)
- Keep detached-entity safety comment in resolveTags()
- Add 3 new partial-name match tests (23a/b/c) from #763
- Use resolveByName() API in test 28 (replaces findByDisplayNameContaining)
- Add NameMatches glossary entry from #763

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:50:48 +02:00
Marcel
09b77e9b36 test(person): pin fetchPool dedup when one person matches two tokens (#763 review)
All checks were successful
CI / Unit & Component Tests (push) Successful in 3m20s
CI / OCR Service Tests (push) Successful in 24s
CI / Backend Unit Tests (push) Successful in 3m53s
CI / fail2ban Regex (push) Successful in 44s
CI / Semgrep Security Scan (push) Successful in 21s
CI / Compose Bucket Idempotency (push) Successful in 1m5s
Assert that when the same person id is returned by two different token
fetches, the person appears exactly once in the result -- pinning
fetchPool's putIfAbsent dedup so a future refactor can't silently
double-classify a candidate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 08:47:47 +02:00
Marcel
9d202b042b test(person): close fetch-to-classify seam for alias matches on real Postgres (#763 review)
AC#4 (maiden alias -> direct) and AC#5 (alias first name -> fetchable +
classifiable) were each split across PersonRepositoryTest (the fetch) and
PersonServiceTest (the classifier with stubs) -- nothing walked
searchByName -> resolveByName end-to-end on real Postgres. Add two tests
in the existing @DataJpaTest slice that build a real PersonService over
the autowired repositories, persist a person with a MAIDEN_NAME alias and
one with an alias firstName, and assert both classify as direct.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 08:47:47 +02:00
Marcel
8429b1e9f8 fix(search): derive disambiguation trigger aria-label from match count (#763 review)
The trigger hardcoded the multiple-people label for every count, so a
single did-you-mean picker announced "Mehrere Personen gefunden" to
screen readers while sighted users saw one name and a "Meintest du …?"
heading. Derive the trigger's accessible name from persons.length: a
single suggestion reuses the heading prop, two or more keep the
multiple-people label. Visible truncated name span unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 08:47:47 +02:00
Marcel
6959651b36 docs(search): document NameMatches and resolveByName (#763)
GLOSSARY entry for NameMatches (direct vs partial name-match strength and how
the search layer maps it); person/README adds resolveByName to the public
surface. No ADR — the matching rule is localized and justified inline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 08:47:47 +02:00
Marcel
0ef4f4f07c feat(search): case-appropriate disambiguation picker copy (#763)
A 1-item picker now reads "Meintest du …?" (a single direct match auto-selects
and never reaches the picker), while ≥2 keeps the "Person auswählen" framing.
The prompt lives in a visible, non-truncated panel heading (the trigger span
clips at 320px), and the "(auswählen…)" cue is dropped for the 1-item case.
DisambiguationPicker takes heading + showCue props; the page derives both from
ambiguousPersons.length. New search_disambiguation_did_you_mean key in de/en/es.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 08:47:47 +02:00
Marcel
f1bb9d3a69 feat(search): map direct/partial NameMatches into resolve buckets (#763)
resolveNames now delegates to PersonService.resolveByName and maps by match
strength: 1 direct → resolved (auto-select), ≥2 direct → ambiguous, 0 direct
with partials → ambiguous suggestions, 0 candidates → folded into full-text.
A single direct match no longer forces the picker when looser substring hits
coexist. The MAX_CANDIDATES cap moved into PersonService (after classification);
the MAX_NAME_LENGTH guard, resolved-cap overflow, and sender/receiver mapping
are preserved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 08:47:47 +02:00
Marcel
ca52145556 feat(person): add resolveByName for direct/partial name matching (#763)
Token-set containment over all of a person's name components (firstName,
lastName, alias, each PersonNameAlias first+last, title) decides direct vs
partial. Orchestrates tokenize → cap(8) → fetch pool → classify → cap(10)
after classification, with an empty-token guard and a PII-free debug log of
the outcome bucket. MAX_TOKENS is a DoS control; the after-classify cap keeps a
direct match that sorts past position 10 among partials. Read-only transaction
keeps lazy nameAliases reachable during classification (ADR-022).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 08:47:47 +02:00