feat(nlp-service): replace spaCy NER with DB-backed PersonMatcher
Rule-based pipeline: persons matched via rapidfuzz against all known names
loaded from DB at startup.

Fixes first-name-only extraction (Eugenie, Herbert), merged-span bug
(Herbert + Eugenie de Gruyter), false positives on compound nouns, and
EN/ES model failures.

Date extraction unchanged (regex). No spaCy models required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
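The matching idea described above can be sketched roughly as follows. This is a minimal illustration, not the PersonMatcher from this commit: `build_index`, `match_persons`, the `cutoff` of 85, and the per-token strategy are all assumptions, and the known names are passed in as a plain list instead of being loaded from the DB. It uses `rapidfuzz.fuzz.ratio` when available and falls back to the standard library's `difflib` so the sketch runs without the dependency.

```python
import re

try:
    # rapidfuzz is the matcher named in the commit message.
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        return fuzz.ratio(a, b)  # score in the range 0..100
except ImportError:
    # Standard-library fallback so the sketch runs without the dependency.
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        return 100.0 * SequenceMatcher(None, a, b).ratio()


def build_index(known_names: list[str]) -> dict[str, set[str]]:
    """Map each lowercased name part to the full names that contain it."""
    index: dict[str, set[str]] = {}
    for full_name in known_names:
        for part in full_name.split():
            if len(part) < 3:  # skip particles such as "de" (assumption)
                continue
            index.setdefault(part.lower(), set()).add(full_name)
    return index


def match_persons(text: str, known_names: list[str], cutoff: float = 85.0):
    """Return (token, candidate full names) for each capitalized token
    whose best fuzzy match against a known name part reaches the cutoff."""
    index = build_index(known_names)
    matches = []
    for token in re.findall(r"[^\W\d_]+", text):  # runs of letters only
        if not token[0].isupper():
            continue
        best_part, best_score = None, 0.0
        for part in index:
            score = similarity(token.lower(), part)
            if score > best_score:
                best_part, best_score = part, score
        if best_part is not None and best_score >= cutoff:
            matches.append((token, sorted(index[best_part])))
    return matches


# Hypothetical stand-in for the names loaded from the DB at startup.
KNOWN_NAMES = ["Herbert de Gruyter", "Eugenie de Gruyter"]
print(match_persons("Herbert und Eugenie reisten nach Berlin.", KNOWN_NAMES))
```

Matching each token separately is one way to keep "Herbert" and "Eugenie" as two persons instead of one merged span, and a first name alone is enough to reach a candidate. A surname shared by several people returns every candidate, so a real matcher would still need a disambiguation step.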
@@ -1,6 +1,7 @@
 fastapi[standard]==0.115.6
 uvicorn[standard]==0.34.0
-spacy>=3.8,<4.0
 dateparser>=1.2,<2.0
+rapidfuzz>=3.0,<4.0
+psycopg2-binary>=2.9,<3.0
 pytest>=8.0,<9.0
 httpx>=0.28,<1.0