Commit Graph

8 Commits

Author SHA1 Message Date
Marcel
03d7d44e57 feat(nlp-service): replace spaCy NER with DB-backed PersonMatcher
Rule-based pipeline: persons matched via rapidfuzz against all known
names loaded from DB at startup. Fixes first-name-only extraction
(Eugenie, Herbert), merged-span bug (Herbert + Eugenie de Gruyter),
false positives on compound nouns, and EN/ES model failures.
Date extraction unchanged (regex). No spaCy models required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
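
The matching idea in the commit above (known names loaded once, then fuzzy-matched against the text, including first-name-only mentions) can be sketched roughly as follows. This is not the service's code: it swaps in the standard library's difflib for rapidfuzz so that it runs without dependencies, and KNOWN_NAMES, similarity, match_persons, and the 0.85 cutoff are all hypothetical names and values chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical sample data; per the commit, the real list is loaded from the DB at startup.
KNOWN_NAMES = ["Eugenie de Gruyter", "Herbert de Gruyter"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (difflib stand-in for a rapidfuzz scorer)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_persons(text: str, names: list = KNOWN_NAMES, cutoff: float = 0.85) -> list:
    """Return known names whose full form or first name fuzzily matches a token window."""
    tokens = [t.strip(".,;:!?()\"'") for t in text.split()]
    tokens = [t for t in tokens if t]
    found = []
    for name in names:
        # Try the full name first, then the first name alone.
        for form in (name, name.split()[0]):
            width = len(form.split())
            windows = (
                " ".join(tokens[i:i + width])
                for i in range(len(tokens) - width + 1)
            )
            if any(similarity(w, form) >= cutoff for w in windows):
                found.append(name)
                break  # one hit per known person is enough
    return found


print(match_persons("Brief von Herbert an Eugenie"))
```

Because each known person is matched independently, two names in one sentence come back as two separate results instead of one merged span. A first name shared by several known persons would match all of them, so a real pipeline would need some disambiguation step.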
Marcel
3ddb2b278b feat(nlp-service): full extract() pipeline — assembles all steps
Also adds regex year-fallback in extract_dates() for de/es spaCy small
models that don't tag bare 4-digit years as DATE entities, and widens
the direction-token window to 2 tokens back to handle Spanish "antes de".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 10:56:32 +02:00
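
The year fallback and the widened direction window described in the commit above might look something like the sketch below. It is an illustration under assumptions, not the actual extract_dates(): YEAR_RE, DIRECTION_WORDS, find_years_with_direction, the accepted year range, and the direction vocabulary are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Bare 4-digit years from 1000 to 2099 (assumed range, not taken from the commit).
YEAR_RE = re.compile(r"1[0-9]{3}|20[0-9]{2}")

# Hypothetical direction vocabulary; the service's real word lists are not shown in the log.
DIRECTION_WORDS = {
    "before": "before", "after": "after",
    "vor": "before", "nach": "after",
    "antes": "before", "después": "after",
}

PUNCT = ".,;:!?()\"'"


def find_years_with_direction(text: str, window: int = 2) -> List[Tuple[int, Optional[str]]]:
    """Find bare years and the nearest direction word up to `window` tokens back."""
    tokens = [t.strip(PUNCT) for t in text.split()]
    results = []
    for i, token in enumerate(tokens):
        if not YEAR_RE.fullmatch(token):
            continue
        direction = None
        # Walk backwards from the token just before the year.
        for j in range(i - 1, max(i - window, 0) - 1, -1):
            word = tokens[j].lower()
            if word in DIRECTION_WORDS:
                direction = DIRECTION_WORDS[word]
                break
        results.append((int(token), direction))
    return results


print(find_years_with_direction("cartas antes de 1950"))
print(find_years_with_direction("cartas antes de 1950", window=1))
```

With a one-token window, the "de" in "antes de 1950" hides the direction word and the year is returned without a direction. Looking two tokens back recovers "antes".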
Marcel
702a72d575 feat(nlp-service): keyword extraction (POS-filtered, deduped lemmas)
2026-06-08 10:56:32 +02:00
Marcel
3f74deda8c feat(nlp-service): date range extraction with direction detection
2026-06-08 10:56:32 +02:00
Marcel
8ed2a6d95b feat(nlp-service): role detection (sender/receiver/any)
2026-06-08 10:56:32 +02:00
Marcel
deea34c797 feat(nlp-service): NER person name extraction
2026-06-08 10:56:32 +02:00
Marcel
482a1c2863 feat(nlp-service): spaCy model loading with get_nlp/load_all_models
2026-06-08 10:56:32 +02:00
Marcel
6b0a06e8b1 feat(nlp-service): scaffold — models, requirements, CLAUDE.md
Task 1: Create a standalone FastAPI service scaffold with models, a test framework,
and documentation. Includes ParseRequest and ParseResponse Pydantic models matching
the OllamaExtraction contract, plus three passing tests that exercise model validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 10:11:34 +02:00