Commit Graph

3417 Commits

Author SHA1 Message Date
Marcel
3ad1a69195 docs(claude): remove NLP search references from CLAUDE.md files
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m20s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m47s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 23s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m7s
2026-06-07 19:04:52 +02:00
Marcel
f20521b6fb refactor(search): delete nlp-service microservice and Ollama ADRs 2026-06-07 19:04:00 +02:00
Marcel
2231744e6a refactor(infra): remove Ollama/NLP observability config 2026-06-07 19:02:56 +02:00
Marcel
00b7c86b6a refactor(infra): remove nlp-service from docker-compose files 2026-06-07 19:02:17 +02:00
Marcel
fd27dfacc8 refactor(search): remove smart search i18n keys from all language files 2026-06-07 19:01:17 +02:00
Marcel
62bc92a75c refactor(search): remove smart search error codes from frontend 2026-06-07 18:59:47 +02:00
Marcel
2d6ab85709 refactor(search): remove NLP smart search from documents page
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:58:36 +02:00
Marcel
0cf4916c8b refactor(search): remove smart mode from SearchFilterBar
Removes SmartModeToggle component import and all smart-mode conditional logic from SearchFilterBar, including mode-specific input handling, max-length constraints, and CSS class toggling. Removes associated smart-mode tests that verified chip lifecycle callbacks (onModeToggle, onSmartSearch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:55:42 +02:00
Marcel
1e1e96b86f refactor(search): delete frontend NLP search components and utilities
Removes SmartModeToggle, SmartSearchStatus, InterpretationChipRow,
DisambiguationPicker, chip-types utilities, and theme-chip-removal
utilities as part of NLP feature removal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:43:34 +02:00
Marcel
30aba010f4 refactor(search): remove NLP error codes and application config
Remove SMART_SEARCH_UNAVAILABLE and SMART_SEARCH_RATE_LIMITED error codes
from ErrorCode enum; remove nlp and nl-search configuration blocks from
application.yaml; remove nlp config block from application-dev.yaml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:42:48 +02:00
Marcel
be7ad1d1fa refactor(search): delete backend NLP search package
Remove entire backend search domain including:
- NlSearchController, NlQueryParserService, NlpClient implementations
- Rate limiting, properties, DTOs (NlSearchRequest/Response/NlQueryInterpretation)
- All domain logic and tests (5 test files deleted)

Backend compiles successfully post-deletion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:41:36 +02:00
Marcel
4232941b99 fix(infra): replace Ollama with nlp-service in docker-compose.prod.yml
Some checks failed
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Semgrep Security Scan (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Unit & Component Tests (pull_request) Has been cancelled
Removes the ollama and ollama-model-init services (and ollama-models
volume) from the production/staging compose file. Adds the nlp-service
in their place — mirroring the dev compose — and wires the backend
dependency and APP_NLP_BASE_URL env var so staging can reach the new
service.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:16:45 +02:00
Marcel
f41acfb29e fix(search): replace languageTag() with getLocale(); sync KI→Smart in tests
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m24s
CI / OCR Service Tests (pull_request) Successful in 26s
CI / Backend Unit Tests (pull_request) Successful in 3m57s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 23s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m7s
Paraglide 2.5 runtime exports getLocale(), not languageTag(). The
8bed0cc6 commit introduced the wrong import when threading lang through
the NL search path.

Also updates two test assertions that still expected the old 'KI' button
label after 0b31a51e renamed it to 'Smart-Suche'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 17:44:38 +02:00
Marcel
15dff2a7b9 refactor(search): delete orphaned RestClientOllamaClientTest
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m49s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 4m3s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 24s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
The source class RestClientOllamaClient was removed in 864f44a4 but the
corresponding test file was not staged at the time. Removes the leftover
file; coverage is provided by RestClientNlpClientTest.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:42:20 +02:00
Marcel
081e9c3163 docs(deployment): replace Ollama with nlp-service in DEPLOYMENT.md
- §1: update memory table (nlp-service ~256 MB vs Ollama ~8 GB);
  update memory budget note; add nlp-service to topology diagram
- §2: replace 'Ollama (NL search) service' env var table with
  'NLP service' table (APP_NLP_BASE_URL, NLP_FUZZY_THRESHOLD);
  add credential-rotation restart note
- §3.4: replace Ollama model-pull first-deploy warning with
  nlp-service startup note (no download, --wait safe)
- §6: replace Ollama operational section (model pull, ollama list,
  upgrade guide) with nlp-service health check and tuning guide

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:41:46 +02:00
Marcel
0c8d516eed docs(nlp-service): update CLAUDE.md — remove stale dateparser entry and prototype note
Removes 'dateparser 1.2' from the stack section (dependency was dropped
in favour of the rule-based date regex pipeline). Rewrites the Notes
section to reflect that docker-compose integration and Java-side wiring
were both delivered in this PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:40:01 +02:00
Marcel
6fdbc6240a fix(infra): wait for nlp-service healthy before starting backend
Changes condition: service_started → service_healthy so the backend
container does not start until FastAPI has bound its port and loaded
person names from the database. Eliminates the startup race where a
first NL search would return 503 during nlp-service bootstrap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:39:25 +02:00
Marcel
6e997c7474 docs(adr): ADR-035 — replace Ollama with rule-based nlp-service
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m56s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m8s
CI / Backend Unit Tests (pull_request) Failing after 39s
CI / fail2ban Regex (pull_request) Successful in 53s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:13:58 +02:00
Marcel
2559260ee8 docs(c4): replace Ollama with nlp-service in L2 container diagram
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:12:59 +02:00
Marcel
2b8fb602e3 feat(infra): replace Ollama with nlp-service in docker-compose
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:12:13 +02:00
Marcel
0b31a51ed9 chore(i18n): remove AI/KI/IA and timing refs from smart search strings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:10:32 +02:00
Marcel
7ebfaf7933 test(search): assert lang field sent in E2E NL search request
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:08:57 +02:00
Marcel
a4e0d1685c feat(search): raise NL search rate limit from 5 to 20 req/min
The rule-based NLP service is <100ms vs Ollama's ~15s, making the old
limit too restrictive for normal interactive use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:06:04 +02:00
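Note on a4e0d1685c: the log does not show how the backend enforces the 20 req/min limit. The sketch below is only an illustration of one common approach, a sliding-window limiter, written in Python rather than the backend's Java. The class name, the injectable clock, and the one-limiter-per-client layout are all assumptions.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window_seconds` (hypothetical helper)."""

    def __init__(self, limit=20, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so tests can control time
        self._hits = deque()        # timestamps of accepted calls, oldest first

    def allow(self):
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._hits and now - self._hits[0] >= self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False            # over the limit: caller would answer 429
        self._hits.append(now)
        return True
```

With sub-100ms responses, a user refining a query several times in a row stays well inside 20 calls per minute, whereas a limit of 5 would be exhausted after a handful of edits.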
Marcel
ac21f4fe38 test(search): replace OllamaClient test suite with NlpClient equivalents
- Delete RestClientOllamaClientTest, add RestClientNlpClientTest:
  WireMock targets POST /parse; adds isHealthy_returnsFalse_whenPersonsLoadedIsZero
- NlQueryParserServiceTest: @Mock NlpClient; all stubs updated to parse(String,String);
  NlpExtraction throughout; service.search(..., "de", PAGE); adds verify(nlpClient).parse(eq,eq)
- NlSearchControllerTest: add lang:"de" to all request bodies; stubs use anyString×3;
  rename search_returns503_whenOllamaUnavailable → search_returns503_whenNlpServiceUnavailable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:04:50 +02:00
Marcel
864f44a4be refactor(search): delete Ollama* classes replaced by Nlp* equivalents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:59:20 +02:00
Marcel
8bed0cc6e2 feat(search): thread lang through NlSearchRequest → controller → NlQueryParserService → NlpClient
- NlSearchRequest gains @NotBlank @Pattern(regexp="de|en|es") lang field
- NlSearchController passes request.lang() to service
- NlQueryParserService.search signature: (String, String, Pageable); renames ollamaClient→nlpClient; removes redundant length guard (Bean Validation is enforcement point)
- application.yaml: replaces app.ollama.* with app.nlp.base-url; application-dev.yaml: points to localhost:8001
- frontend/documents/+page.svelte: sends lang: languageTag() in POST body

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:58:48 +02:00
Marcel
34387f2d59 feat(search): add RestClientNlpClient — POST /parse, GET /health with persons_loaded check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:55:50 +02:00
Marcel
8d1ff1efe7 test(search): NlpPropertiesTest — validates baseUrl required and defaults
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:54:39 +02:00
Marcel
492a064735 feat(search): add NlpProperties config and @ConfigurationPropertiesScan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:52:12 +02:00
Marcel
e1ec1c0dfe feat(search): add NlpExtraction record, NlpClient and NlpHealthClient interfaces
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:51:26 +02:00
Marcel
00b2d46424 test(nlp-service): guard global matcher state in try/finally
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:50:32 +02:00
Marcel
d3da3b6cd1 chore(nlp-service): add .dockerignore to exclude dev artifacts from image
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:50:01 +02:00
Marcel
24e5ac9c22 chore(nlp-service): remove unused dateparser dependency
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:49:37 +02:00
Marcel
2eb5572d7a feat(nlp-service): wire NLP_FUZZY_THRESHOLD env var with 0-100 validation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:48:57 +02:00
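Note on 2eb5572d7a: a minimal sketch of what reading NLP_FUZZY_THRESHOLD with 0-100 validation might look like. Only the variable name and the range come from the commit title. The function name, the default of 85, and the choice to raise ValueError are assumptions.

```python
import os

DEFAULT_FUZZY_THRESHOLD = 85  # hypothetical default, not taken from the repo


def read_fuzzy_threshold(env=None):
    """Return the fuzzy-match threshold as an int in [0, 100]."""
    env = os.environ if env is None else env
    raw = env.get("NLP_FUZZY_THRESHOLD")
    if raw is None or raw.strip() == "":
        return DEFAULT_FUZZY_THRESHOLD
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"NLP_FUZZY_THRESHOLD must be an integer, got {raw!r}"
        ) from None
    if not 0 <= value <= 100:
        raise ValueError(
            f"NLP_FUZZY_THRESHOLD must be between 0 and 100, got {value}"
        )
    return value
```

Failing at startup on a bad value is usually preferable to silently clamping it, since a mistyped threshold would otherwise change match quality without any visible error.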
Marcel
99d6a9a428 feat(nlp-service): cap /parse query at 500 chars via Field(max_length=500)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:47:40 +02:00
Marcel
4697f5fbb3 feat(nlp-service): log WARNING when DATABASE_URL absent, ERROR on DB failure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:47:03 +02:00
Marcel
5d8ec38474 fix(nlp-service): return generic 500 detail to prevent credential leakage
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:46:24 +02:00
Marcel
824f048640 fix(nlp-service): eliminate false-positive person matches from dirty DB records
- Wire _EXTRA_SPAN_STOPS into _extract_persons_and_role so German function
  words (im, seine, ihre, dem, …) terminate name spans — fixes "Clara im"
  and "seine Kinder" leaking into personNames
- Add _NON_NAME_TOKENS filter in PersonMatcher.load() to skip DB records
  whose first_name contains prepositions or possessives — filters 290 bad
  records (annotations like "an seine Eltern", "Eltern in", place references
  like "Enkel Cram aus Mexiko") that were causing exact Pass-2 matches
- Remove spaCy model downloads from Dockerfile (no longer needed after the
  DB-backed matcher rewrite)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 11:09:35 +02:00
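Note on 824f048640: the two filters described in this commit can be sketched roughly as follows. The token sets are illustrative guesses built from the examples in the message; the real contents of _EXTRA_SPAN_STOPS and _NON_NAME_TOKENS are not shown in the log, and both function names are hypothetical.

```python
# Hypothetical token sets; the real ones in the repo may differ.
EXTRA_SPAN_STOPS = {"im", "seine", "ihre", "dem"}
NON_NAME_TOKENS = {"an", "in", "aus", "seine", "ihre"}


def is_plausible_first_name(first_name):
    """Reject DB records whose first_name contains a preposition or possessive."""
    tokens = first_name.lower().split()
    return bool(tokens) and not any(t in NON_NAME_TOKENS for t in tokens)


def trim_name_span(tokens):
    """Cut a candidate name span at the first German function word."""
    span = []
    for token in tokens:
        if token.lower() in EXTRA_SPAN_STOPS:
            break
        span.append(token)
    return span
```

Under these assumed sets, "an seine Eltern", "Eltern in" and "Enkel Cram aus Mexiko" are all skipped at load time, while a span such as "Clara im Garten" is cut back to "Clara".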
Marcel
6c5cf8ec9b feat(nlp-service): replace spaCy NER with DB-backed PersonMatcher
Rule-based pipeline: persons matched via rapidfuzz against all known
names loaded from DB at startup. Fixes first-name-only extraction
(Eugenie, Herbert), merged-span bug (Herbert + Eugenie de Gruyter),
false positives on compound nouns, and EN/ES model failures.
Date extraction unchanged (regex). No spaCy models required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 11:00:03 +02:00
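Note on 6c5cf8ec9b: the idea of matching query tokens against a list of known names can be sketched as below. The service uses rapidfuzz; this sketch swaps in the standard library's difflib.SequenceMatcher so that it runs without extra dependencies, and its scores are not identical to rapidfuzz scores. The function name and the threshold of 85 are assumptions.

```python
from difflib import SequenceMatcher


def best_match(token, known_names, threshold=85):
    """Return the known name most similar to `token`, or None below threshold.

    Similarity is SequenceMatcher.ratio() scaled to 0-100, a stand-in for
    the rapidfuzz scorer used by the real service.
    """
    best_name, best_score = None, 0.0
    for name in known_names:
        score = 100 * SequenceMatcher(None, token.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Because every token is compared with names that actually exist in the database, a first name on its own (Eugenie, Herbert) can match, and an ordinary noun that resembles no known name is rejected.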
Marcel
9472d8c25e feat(nlp-service): Dockerfile — python:3.11-slim, models baked in 2026-06-07 10:31:18 +02:00
Marcel
8521e6f173 feat(nlp-service): FastAPI app with /parse and /health endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 10:29:32 +02:00
Marcel
cc4c81e218 feat(nlp-service): full extract() pipeline — assembles all steps
Also adds regex year-fallback in extract_dates() for de/es spaCy small
models that don't tag bare 4-digit years as DATE entities, and widens
the direction-token window to 2 tokens back to handle Spanish "antes de".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 10:28:40 +02:00
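Note on cc4c81e218: a rough sketch of the two ideas in this message, a regex fallback for bare 4-digit years and a direction lookup that scans up to two tokens back. The accepted year range (1500-2099), the direction word lists, and the function names are assumptions, not the repo's code.

```python
import re

# Assumed range: bare years from 1500 to 2099.
_YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")

# Hypothetical direction words for de/en/es.
BEFORE = {"vor", "before", "antes"}
AFTER = {"nach", "after", "después"}


def fallback_years(text):
    """Find bare 4-digit years that an NER model did not tag as dates."""
    return [int(m.group(1)) for m in _YEAR_RE.finditer(text)]


def direction_for(tokens, year_index, window=2):
    """Look up to `window` tokens back from the year for a direction word."""
    start = max(0, year_index - window)
    for tok in reversed(tokens[start:year_index]):
        word = tok.lower()
        if word in BEFORE:
            return "before"
        if word in AFTER:
            return "after"
    return None
```

A window of one token would miss Spanish "antes de 1890", because "de" sits between the direction word and the year; two tokens back reaches "antes".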
Marcel
55f419d20f feat(nlp-service): keyword extraction (POS-filtered, deduped lemmas) 2026-06-07 10:24:35 +02:00
Marcel
53f6dcbfed feat(nlp-service): date range extraction with direction detection 2026-06-07 10:23:33 +02:00
Marcel
0ab2e2a743 feat(nlp-service): role detection (sender/receiver/any) 2026-06-07 10:22:14 +02:00
Marcel
bff16f6f1f feat(nlp-service): NER person name extraction 2026-06-07 10:21:16 +02:00
Marcel
18f028e2dd feat(nlp-service): spaCy model loading with get_nlp/load_all_models 2026-06-07 10:17:07 +02:00
Marcel
e3b8e57746 feat(nlp-service): scaffold — models, requirements, CLAUDE.md
Task 1: Create standalone FastAPI service scaffold with models, test framework,
and documentation. Includes ParseRequest, ParseResponse Pydantic models matching
OllamaExtraction contract, plus three passing tests validating model validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 10:13:08 +02:00
Marcel
6878419156 merge: resolve conflicts with origin/main (#763 person name-match integration)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m31s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m48s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
CI / Unit & Component Tests (push) Successful in 3m20s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 3m48s
CI / fail2ban Regex (push) Successful in 46s
CI / Semgrep Security Scan (push) Successful in 23s
CI / Compose Bucket Idempotency (push) Successful in 1m8s
- Drop unused MAX_CANDIDATES constant (not referenced in service)
- Keep detached-entity safety comment in resolveTags()
- Add 3 new partial-name match tests (23a/b/c) from #763
- Use resolveByName() API in test 28 (replaces findByDisplayNameContaining)
- Add NameMatches glossary entry from #763

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:50:48 +02:00
Marcel
09b77e9b36 test(person): pin fetchPool dedup when one person matches two tokens (#763 review)
All checks were successful
CI / Unit & Component Tests (push) Successful in 3m20s
CI / OCR Service Tests (push) Successful in 24s
CI / Backend Unit Tests (push) Successful in 3m53s
CI / fail2ban Regex (push) Successful in 44s
CI / Semgrep Security Scan (push) Successful in 21s
CI / Compose Bucket Idempotency (push) Successful in 1m5s
Assert that when the same person id is returned by two different token
fetches, the person appears exactly once in the result -- pinning
fetchPool's putIfAbsent dedup so a future refactor can't silently
double-classify a candidate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 08:47:47 +02:00
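Note on 09b77e9b36: the behaviour this test pins down can be illustrated with a small Python analogue. The backend is Java and uses putIfAbsent; here dict.setdefault plays the same role. The function name fetch_pool, the dict-based person records, and the list-of-lists input shape are assumptions made for the sketch.

```python
def fetch_pool(token_results):
    """Merge per-token candidate lists, keeping each person id only once.

    dict.setdefault mirrors Java's Map.putIfAbsent: the first record seen
    for an id wins and later duplicates are ignored.
    """
    pool = {}
    for candidates in token_results:
        for person in candidates:
            pool.setdefault(person["id"], person)
    return list(pool.values())


# Usage: the same person is returned by two different token fetches.
herbert = {"id": 7, "name": "Herbert de Gruyter"}
eugenie = {"id": 9, "name": "Eugenie de Gruyter"}
result = fetch_pool([[herbert, eugenie], [herbert]])
assert [p["id"] for p in result] == [7, 9]
```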