feat(nlp-service): replace spaCy NER with DB-backed PersonMatcher

Rule-based pipeline: persons matched via rapidfuzz against all known
names loaded from DB at startup. Fixes first-name-only extraction
(Eugenie, Herbert), merged-span bug (Herbert + Eugenie de Gruyter),
false positives on compound nouns, and EN/ES model failures.
Date extraction unchanged (regex). No spaCy models required.
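
A minimal sketch of the matching idea described above, not the service's actual PersonMatcher. Assumptions: the names KNOWN_NAMES, similarity, and match_persons are hypothetical, the name list is hard-coded instead of loaded from the DB, the 85 cutoff is arbitrary, and the standard library's difflib.SequenceMatcher stands in for rapidfuzz so the sketch has no third-party dependency.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stand-in for the names the service loads from the DB at startup.
KNOWN_NAMES = ["Eugenie de Gruyter", "Herbert de Gruyter"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (difflib, not rapidfuzz)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def match_persons(text: str, known_names: list[str], cutoff: float = 85.0) -> list[str]:
    """Return the full names of known persons mentioned in text, in order."""
    # Each person is reachable by full name and by first name alone.
    # Assumption: first names are unique; a real matcher must handle clashes.
    aliases: dict[str, str] = {}
    for full in known_names:
        aliases[full] = full
        aliases[full.split()[0]] = full

    # Whole-word tokens, so a name embedded in a compound noun is one token.
    tokens = re.findall(r"[^\W\d_]+", text)
    max_len = max(len(alias.split()) for alias in aliases)

    found: list[str] = []
    i = 0
    while i < len(tokens):
        best = None  # (full name, score, number of tokens consumed)
        # Try the longest window first so a full name wins over its first name.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = " ".join(tokens[i:i + n])
            for alias, full in aliases.items():
                if len(alias.split()) != n:
                    continue
                score = similarity(window, alias)
                if score >= cutoff and (best is None or score > best[1]):
                    best = (full, score, n)
            if best is not None:
                break
        if best is not None:
            found.append(best[0])
            i += best[2]  # skip the tokens covered by this match
        else:
            i += 1
    return found


if __name__ == "__main__":
    print(match_persons("Herbert and Eugenie de Gruyter visited Berlin", KNOWN_NAMES))
```

Because candidates are compared only against known names, two adjacent persons come back as two separate matches, and a lone first name resolves to the full name of the person it belongs to.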

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Marcel
2026-06-07 11:00:03 +02:00
committed by marcel
parent aa200bf3c5
commit 03d7d44e57
8 changed files with 939 additions and 551 deletions

@@ -1,6 +1,7 @@
fastapi[standard]==0.115.6
uvicorn[standard]==0.34.0
spacy>=3.8,<4.0
dateparser>=1.2,<2.0
rapidfuzz>=3.0,<4.0
psycopg2-binary>=2.9,<3.0
pytest>=8.0,<9.0
httpx>=0.28,<1.0