3421 Commits

Author SHA1 Message Date
Marcel
1b9fb5a359 refactor(search): strip dead NL types from generated api.ts
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m25s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 4m6s
CI / fail2ban Regex (pull_request) Successful in 47s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
Remove the /api/search/nl path and the NlSearchRequest,
NlQueryInterpretation, NlSearchResponse, PersonHint, and TagHint
schemas left over from the NLP/smart-search removal. These were
unused (nothing in frontend/src imported them); the manual strip
matches what `npm run generate:api` produces against the now
NL-free backend. Closes the last deferred review item on PR #772.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:42:23 +02:00
Marcel
784a7759f5 fix(review): resolve all review blockers and concerns
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m51s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m46s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 24s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m8s
- Delete frontend/e2e/nl-search.spec.ts (was left alive; would have
  crashed CI when Playwright couldn't find the deleted SmartModeToggle)
- Fix docs/DEPLOYMENT.md: remove NLP service arrow + key-facts bullet
  that were accidentally added instead of removed in the prior commit
- Clean docs/GLOSSARY.md: remove keyword→tag resolution, PersonHint,
  TagHint, theme chip entries; trim NameMatches to drop the
  NlQueryParserService reference
- Remove @ConfigurationPropertiesScan from FamilienarchivApplication
  (all remaining @ConfigurationProperties beans carry @Component)
- Remove 12 orphaned i18n keys from de/en/es message files
  (search_loading_nl, search_chip_*, search_disambiguation_*, etc.)
- Fix SearchFilterBar.svelte input padding: pr-20 → pr-4 (SmartModeToggle
  that justified the right padding is gone)
- Delete docs/superpowers/plans/2026-06-07-remove-nlp-search.md
  (scaffolding artefact; plan files belong in Gitea issues, not the repo)
- Add docs/adr/034-remove-nl-search.md documenting the removal decision
  (supersedes deleted ADR-028 ×2, ADR-034-ollama, ADR-035)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 19:50:48 +02:00
Marcel
fbaf180136 docs(c4): remove NLP service from L2 container diagram; delete NL search L3 diagram
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m55s
CI / OCR Service Tests (pull_request) Successful in 31s
CI / Backend Unit Tests (pull_request) Successful in 4m42s
CI / fail2ban Regex (pull_request) Successful in 1m8s
CI / Semgrep Security Scan (pull_request) Successful in 28s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m18s
2026-06-07 19:37:17 +02:00
Marcel
02abb374cc docs: remove nlp-service and NL search references from DEPLOYMENT.md and GLOSSARY.md
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m22s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m52s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 23s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m8s
2026-06-07 19:12:20 +02:00
Marcel
3ad1a69195 docs(claude): remove NLP search references from CLAUDE.md files
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m20s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m47s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 23s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m7s
2026-06-07 19:04:52 +02:00
Marcel
f20521b6fb refactor(search): delete nlp-service microservice and Ollama ADRs
2026-06-07 19:04:00 +02:00
Marcel
2231744e6a refactor(infra): remove Ollama/NLP observability config
2026-06-07 19:02:56 +02:00
Marcel
00b7c86b6a refactor(infra): remove nlp-service from docker-compose files
2026-06-07 19:02:17 +02:00
Marcel
fd27dfacc8 refactor(search): remove smart search i18n keys from all language files
2026-06-07 19:01:17 +02:00
Marcel
62bc92a75c refactor(search): remove smart search error codes from frontend
2026-06-07 18:59:47 +02:00
Marcel
2d6ab85709 refactor(search): remove NLP smart search from documents page
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:58:36 +02:00
Marcel
0cf4916c8b refactor(search): remove smart mode from SearchFilterBar
Removes SmartModeToggle component import and all smart-mode conditional
logic from SearchFilterBar, including mode-specific input handling,
max-length constraints, and CSS class toggling. Removes associated
smart-mode tests that verified chip lifecycle callbacks (onModeToggle,
onSmartSearch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:55:42 +02:00
Marcel
1e1e96b86f refactor(search): delete frontend NLP search components and utilities
Removes SmartModeToggle, SmartSearchStatus, InterpretationChipRow,
DisambiguationPicker, chip-types utilities, and theme-chip-removal
utilities as part of NLP feature removal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:43:34 +02:00
Marcel
30aba010f4 refactor(search): remove NLP error codes and application config
Remove SMART_SEARCH_UNAVAILABLE and SMART_SEARCH_RATE_LIMITED error codes
from ErrorCode enum; remove nlp and nl-search configuration blocks from
application.yaml; remove nlp config block from application-dev.yaml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:42:48 +02:00
Marcel
be7ad1d1fa refactor(search): delete backend NLP search package
Remove the entire backend NLP search package, including:
- NlSearchController, NlQueryParserService, NlpClient implementations
- Rate limiting, properties, DTOs (NlSearchRequest/Response/NlQueryInterpretation)
- All domain logic and tests (5 test files deleted)

Backend compiles successfully post-deletion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:41:36 +02:00
Marcel
4232941b99 fix(infra): replace Ollama with nlp-service in docker-compose.prod.yml
Some checks failed
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Semgrep Security Scan (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Unit & Component Tests (pull_request) Has been cancelled
Removes the ollama and ollama-model-init services (and ollama-models
volume) from the production/staging compose file. Adds the nlp-service
in their place — mirroring the dev compose — and wires the backend
dependency and APP_NLP_BASE_URL env var so staging can reach the new
service.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 18:16:45 +02:00
Marcel
f41acfb29e fix(search): replace languageTag() with getLocale(); sync KI→Smart in tests
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m24s
CI / OCR Service Tests (pull_request) Successful in 26s
CI / Backend Unit Tests (pull_request) Successful in 3m57s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 23s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m7s
Paraglide 2.5 runtime exports getLocale(), not languageTag(). The
8bed0cc6 commit introduced the wrong import when threading lang through
the NL search path.

Also updates two test assertions that still expected the old 'KI' button
label after 0b31a51e renamed it to 'Smart-Suche'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 17:44:38 +02:00
Marcel
15dff2a7b9 refactor(search): delete orphaned RestClientOllamaClientTest
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m49s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 4m3s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 24s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
The source class RestClientOllamaClient was removed in 864f44a4 but the
corresponding test file was not staged at the time. Removes the leftover
file; coverage is provided by RestClientNlpClientTest.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:42:20 +02:00
Marcel
081e9c3163 docs(deployment): replace Ollama with nlp-service in DEPLOYMENT.md
- §1: update memory table (nlp-service ~256 MB vs Ollama ~8 GB);
  update memory budget note; add nlp-service to topology diagram
- §2: replace 'Ollama (NL search) service' env var table with
  'NLP service' table (APP_NLP_BASE_URL, NLP_FUZZY_THRESHOLD);
  add credential-rotation restart note
- §3.4: replace Ollama model-pull first-deploy warning with
  nlp-service startup note (no download, --wait safe)
- §6: replace Ollama operational section (model pull, ollama list,
  upgrade guide) with nlp-service health check and tuning guide

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:41:46 +02:00
Marcel
0c8d516eed docs(nlp-service): update CLAUDE.md — remove stale dateparser entry and prototype note
Removes 'dateparser 1.2' from the stack section (dependency was dropped
in favour of the rule-based date regex pipeline). Rewrites the Notes
section to reflect that docker-compose integration and Java-side wiring
were both delivered in this PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:40:01 +02:00
Marcel
6fdbc6240a fix(infra): wait for nlp-service healthy before starting backend
Changes condition: service_started → service_healthy so the backend
container does not start until FastAPI has bound its port and loaded
person names from the database. Eliminates the startup race where a
first NL search would return 503 during nlp-service bootstrap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:39:25 +02:00
Marcel
6e997c7474 docs(adr): ADR-035 — replace Ollama with rule-based nlp-service
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m56s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m8s
CI / Backend Unit Tests (pull_request) Failing after 39s
CI / fail2ban Regex (pull_request) Successful in 53s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:13:58 +02:00
Marcel
2559260ee8 docs(c4): replace Ollama with nlp-service in L2 container diagram
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:12:59 +02:00
Marcel
2b8fb602e3 feat(infra): replace Ollama with nlp-service in docker-compose
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:12:13 +02:00
Marcel
0b31a51ed9 chore(i18n): remove AI/KI/IA and timing refs from smart search strings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:10:32 +02:00
Marcel
7ebfaf7933 test(search): assert lang field sent in E2E NL search request
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:08:57 +02:00
Marcel
a4e0d1685c feat(search): raise NL search rate limit from 5 to 20 req/min
The rule-based NLP service responds in <100 ms versus ~15 s for Ollama,
which made the old limit too restrictive for normal interactive use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:06:04 +02:00
Marcel
ac21f4fe38 test(search): replace OllamaClient test suite with NlpClient equivalents
- Delete RestClientOllamaClientTest, add RestClientNlpClientTest:
  WireMock targets POST /parse; adds isHealthy_returnsFalse_whenPersonsLoadedIsZero
- NlQueryParserServiceTest: @Mock NlpClient; all stubs updated to parse(String,String);
  NlpExtraction throughout; service.search(..., "de", PAGE); adds verify(nlpClient).parse(eq,eq)
- NlSearchControllerTest: add lang:"de" to all request bodies; stubs use anyString×3;
  rename search_returns503_whenOllamaUnavailable → search_returns503_whenNlpServiceUnavailable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:04:50 +02:00
Marcel
864f44a4be refactor(search): delete Ollama* classes replaced by Nlp* equivalents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:59:20 +02:00
Marcel
8bed0cc6e2 feat(search): thread lang through NlSearchRequest → controller → NlQueryParserService → NlpClient
- NlSearchRequest gains @NotBlank @Pattern(regexp="de|en|es") lang field
- NlSearchController passes request.lang() to service
- NlQueryParserService.search signature: (String, String, Pageable); renames ollamaClient→nlpClient; removes redundant length guard (Bean Validation is enforcement point)
- application.yaml: replaces app.ollama.* with app.nlp.base-url; application-dev.yaml: points to localhost:8001
- frontend/documents/+page.svelte: sends lang: languageTag() in POST body

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:58:48 +02:00
Marcel
34387f2d59 feat(search): add RestClientNlpClient — POST /parse, GET /health with persons_loaded check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:55:50 +02:00
Marcel
8d1ff1efe7 test(search): NlpPropertiesTest — validates baseUrl required and defaults
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:54:39 +02:00
Marcel
492a064735 feat(search): add NlpProperties config and @ConfigurationPropertiesScan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:52:12 +02:00
Marcel
e1ec1c0dfe feat(search): add NlpExtraction record, NlpClient and NlpHealthClient interfaces
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:51:26 +02:00
Marcel
00b2d46424 test(nlp-service): guard global matcher state in try/finally
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:50:32 +02:00
Marcel
d3da3b6cd1 chore(nlp-service): add .dockerignore to exclude dev artifacts from image
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:50:01 +02:00
Marcel
24e5ac9c22 chore(nlp-service): remove unused dateparser dependency
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:49:37 +02:00
Marcel
2eb5572d7a feat(nlp-service): wire NLP_FUZZY_THRESHOLD env var with 0-100 validation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:48:57 +02:00
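A minimal sketch of what the threshold wiring in 2eb5572d7a might look like. Only the variable name NLP_FUZZY_THRESHOLD and the 0-100 range come from the commit subject; the function name, the default of 85, and the choice to raise ValueError are assumptions made for illustration.

```python
import os

# Assumed default; the commit does not state one.
DEFAULT_FUZZY_THRESHOLD = 85


def read_fuzzy_threshold(env=None):
    """Return NLP_FUZZY_THRESHOLD as an int in [0, 100], or raise ValueError."""
    env = os.environ if env is None else env
    raw = env.get("NLP_FUZZY_THRESHOLD")
    if raw is None or raw.strip() == "":
        # Variable unset or empty: fall back to the assumed default.
        return DEFAULT_FUZZY_THRESHOLD
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"NLP_FUZZY_THRESHOLD must be an integer, got {raw!r}"
        ) from None
    if not 0 <= value <= 100:
        raise ValueError(
            f"NLP_FUZZY_THRESHOLD must be between 0 and 100, got {value}"
        )
    return value
```

Failing at startup on a bad value, rather than silently clamping it, keeps a misconfigured deployment from matching names more loosely or more strictly than intended.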
Marcel
99d6a9a428 feat(nlp-service): cap /parse query at 500 chars via Field(max_length=500)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:47:40 +02:00
Marcel
4697f5fbb3 feat(nlp-service): log WARNING when DATABASE_URL absent, ERROR on DB failure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:47:03 +02:00
Marcel
5d8ec38474 fix(nlp-service): return generic 500 detail to prevent credential leakage
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 15:46:24 +02:00
Marcel
824f048640 fix(nlp-service): eliminate false-positive person matches from dirty DB records
- Wire _EXTRA_SPAN_STOPS into _extract_persons_and_role so German function
  words (im, seine, ihre, dem, …) terminate name spans — fixes "Clara im"
  and "seine Kinder" leaking into personNames
- Add _NON_NAME_TOKENS filter in PersonMatcher.load() to skip DB records
  whose first_name contains prepositions or possessives — filters 290 bad
  records (annotations like "an seine Eltern", "Eltern in", place references
  like "Enkel Cram aus Mexiko") that were causing exact Pass-2 matches
- Remove spaCy model downloads from Dockerfile (no longer needed after the
  DB-backed matcher rewrite)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 11:09:35 +02:00
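The two filters described in 824f048640 can be sketched as follows. This is an illustration, not the service's code: the set contents beyond the words quoted in the commit message, and the helper names trim_name_span, is_plausible_first_name, and load_names, are hypothetical stand-ins for _EXTRA_SPAN_STOPS, _NON_NAME_TOKENS, and PersonMatcher.load().

```python
# Function words quoted in the commit message; the real set is longer.
EXTRA_SPAN_STOPS = {"im", "seine", "ihre", "dem"}

# Assumed contents, chosen to cover the bad records quoted in the commit.
NON_NAME_TOKENS = {"an", "in", "aus", "seine", "ihre"}


def trim_name_span(tokens):
    """Cut a candidate name span at the first function word."""
    kept = []
    for token in tokens:
        if token.lower() in EXTRA_SPAN_STOPS:
            break
        kept.append(token)
    return kept


def is_plausible_first_name(first_name):
    """Reject DB values that contain a preposition or possessive token."""
    return not any(t.lower() in NON_NAME_TOKENS for t in first_name.split())


def load_names(first_names):
    """Keep only the first_name values that pass the non-name filter."""
    return [name for name in first_names if is_plausible_first_name(name)]
```

With these definitions, a span such as "Clara im" is cut back to "Clara", and records like "an seine Eltern" or "Enkel Cram aus Mexiko" never reach the matcher.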
Marcel
6c5cf8ec9b feat(nlp-service): replace spaCy NER with DB-backed PersonMatcher
Rule-based pipeline: persons matched via rapidfuzz against all known
names loaded from DB at startup. Fixes first-name-only extraction
(Eugenie, Herbert), merged-span bug (Herbert + Eugenie de Gruyter),
false positives on compound nouns, and EN/ES model failures.
Date extraction unchanged (regex). No spaCy models required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 11:00:03 +02:00
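A rough sketch of the DB-backed matching idea from 6c5cf8ec9b. The commit uses rapidfuzz; to stay dependency-free, this sketch swaps in the standard library's difflib.SequenceMatcher, so the scores are not the ones rapidfuzz would produce. The function names, the threshold of 85, and the rule that skips one- and two-letter name parts are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale (difflib stand-in)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_persons(query, known_names, threshold=85):
    """Return the known names that some query token matches closely enough."""
    tokens = query.split()
    matches = []
    for name in known_names:
        # Ignore short particles such as "de" so they cannot trigger a match.
        parts = [p for p in name.split() if len(p) > 2]
        if any(
            similarity(token, part) >= threshold
            for token in tokens
            for part in parts
        ):
            matches.append(name)
    return matches
```

Because each query token is compared against each part of every known name, a first name on its own is enough to find the person, and two people mentioned in one query are reported as separate matches rather than one merged span.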
Marcel
9472d8c25e feat(nlp-service): Dockerfile — python:3.11-slim, models baked in
2026-06-07 10:31:18 +02:00
Marcel
8521e6f173 feat(nlp-service): FastAPI app with /parse and /health endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 10:29:32 +02:00
Marcel
cc4c81e218 feat(nlp-service): full extract() pipeline — assembles all steps
Also adds regex year-fallback in extract_dates() for de/es spaCy small
models that don't tag bare 4-digit years as DATE entities, and widens
the direction-token window to 2 tokens back to handle Spanish "antes de".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 10:28:40 +02:00
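The year fallback and the widened direction window from cc4c81e218 can be illustrated with a small regex-only sketch. The year range, the direction vocabularies, the return shape, and the name extract_year_directions are assumptions; only the bare 4-digit year fallback, the 2-token look-back, and the "antes de" example come from the commit message.

```python
import re

# Assumed plausible year range for archive material.
YEAR_RE = re.compile(r"(1[5-9]\d{2}|20\d{2})")

# Assumed vocabularies; only Spanish "antes de" is named in the commit.
BEFORE_TOKENS = {"vor", "before", "antes"}
AFTER_TOKENS = {"nach", "after", "después", "despues"}


def extract_year_directions(query, window=2):
    """Return (year, direction) pairs for bare 4-digit years in the query."""
    tokens = query.split()
    results = []
    for i, token in enumerate(tokens):
        match = YEAR_RE.fullmatch(token.strip(".,;:!?"))
        if not match:
            continue
        direction = "exact"
        # Look back up to `window` tokens for a direction word.
        for prev in tokens[max(0, i - window):i]:
            word = prev.lower()
            if word in BEFORE_TOKENS:
                direction = "before"
            elif word in AFTER_TOKENS:
                direction = "after"
        results.append((int(match.group(1)), direction))
    return results
```

With a window of one token, "antes de 1923" would only see "de" and miss the direction word; looking two tokens back reaches "antes".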
Marcel
55f419d20f feat(nlp-service): keyword extraction (POS-filtered, deduped lemmas)
2026-06-07 10:24:35 +02:00
Marcel
53f6dcbfed feat(nlp-service): date range extraction with direction detection
2026-06-07 10:23:33 +02:00
Marcel
0ab2e2a743 feat(nlp-service): role detection (sender/receiver/any)
2026-06-07 10:22:14 +02:00
Marcel
bff16f6f1f feat(nlp-service): NER person name extraction
2026-06-07 10:21:16 +02:00