familienarchiv

Author	SHA1	Message	Date
Marcel	324a76d6d2	test(nlp-service): guard global matcher state in try/finally Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 10:56:32 +02:00
Marcel	f4e8632e0d	chore(nlp-service): add .dockerignore to exclude dev artifacts from image Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 10:56:32 +02:00
Marcel	829194f345	chore(nlp-service): remove unused dateparser dependency Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 10:56:32 +02:00
Marcel	98ee6cf587	feat(nlp-service): wire NLP_FUZZY_THRESHOLD env var with 0-100 validation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 10:56:32 +02:00
Marcel	778382cd61	feat(nlp-service): cap /parse query at 500 chars via Field(max_length=500) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 10:56:32 +02:00
Marcel	8d4f30019b	feat(nlp-service): log WARNING when DATABASE_URL absent, ERROR on DB failure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 10:56:32 +02:00
Marcel	d65879d273	fix(nlp-service): return generic 500 detail to prevent credential leakage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 10:56:32 +02:00
Marcel	bda7855cad	fix(nlp-service): eliminate false-positive person matches from dirty DB records - Wire _EXTRA_SPAN_STOPS into _extract_persons_and_role so German function words (im, seine, ihre, dem, …) terminate name spans — fixes "Clara im" and "seine Kinder" leaking into personNames - Add _NON_NAME_TOKENS filter in PersonMatcher.load() to skip DB records whose first_name contains prepositions or possessives — filters 290 bad records (annotations like "an seine Eltern", "Eltern in", place references like "Enkel Cram aus Mexiko") that were causing exact Pass-2 matches - Remove spaCy model downloads from Dockerfile (no longer needed after the DB-backed matcher rewrite) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 10:56:32 +02:00
Marcel	03d7d44e57	feat(nlp-service): replace spaCy NER with DB-backed PersonMatcher Rule-based pipeline: persons matched via rapidfuzz against all known names loaded from DB at startup. Fixes first-name-only extraction (Eugenie, Herbert), merged-span bug (Herbert + Eugenie de Gruyter), false positives on compound nouns, and EN/ES model failures. Date extraction unchanged (regex). No spaCy models required. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 10:56:32 +02:00
Marcel	aa200bf3c5	feat(nlp-service): Dockerfile — python:3.11-slim, models baked in	2026-06-08 10:56:32 +02:00
Marcel	7404284130	feat(nlp-service): FastAPI app with /parse and /health endpoints Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 10:56:32 +02:00
Marcel	3ddb2b278b	feat(nlp-service): full extract() pipeline — assembles all steps Also adds regex year-fallback in extract_dates() for de/es spaCy small models that don't tag bare 4-digit years as DATE entities, and widens the direction-token window to 2 tokens back to handle Spanish "antes de". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 10:56:32 +02:00
Marcel	702a72d575	feat(nlp-service): keyword extraction (POS-filtered, deduped lemmas)	2026-06-08 10:56:32 +02:00
Marcel	3f74deda8c	feat(nlp-service): date range extraction with direction detection	2026-06-08 10:56:32 +02:00
Marcel	8ed2a6d95b	feat(nlp-service): role detection (sender/receiver/any)	2026-06-08 10:56:32 +02:00
Marcel	deea34c797	feat(nlp-service): NER person name extraction	2026-06-08 10:56:32 +02:00
Marcel	482a1c2863	feat(nlp-service): spaCy model loading with get_nlp/load_all_models	2026-06-08 10:56:32 +02:00
Marcel	6b0a06e8b1	feat(nlp-service): scaffold — models, requirements, CLAUDE.md Task 1: Create standalone FastAPI service scaffold with models, test framework, and documentation. Includes ParseRequest, ParseResponse Pydantic models matching OllamaExtraction contract, plus three passing tests validating model validation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 10:11:34 +02:00

18 Commits
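
Commit 98ee6cf587 describes wiring an NLP_FUZZY_THRESHOLD environment variable with 0-100 validation. The log does not show the implementation, so the following is only a minimal sketch of what such validation might look like. The function name `read_fuzzy_threshold`, the default value, and the choice to raise `ValueError` on bad input (rather than fall back to a default) are all assumptions for illustration, not taken from the repository.

```python
import os

# Hypothetical default; the commit log does not state the real one.
DEFAULT_FUZZY_THRESHOLD = 85


def read_fuzzy_threshold(env=None):
    """Return NLP_FUZZY_THRESHOLD as an int in [0, 100].

    Falls back to DEFAULT_FUZZY_THRESHOLD when the variable is unset or blank.
    Raises ValueError when the value is not an integer or is out of range.
    """
    env = os.environ if env is None else env
    raw = env.get("NLP_FUZZY_THRESHOLD")
    if raw is None or raw.strip() == "":
        return DEFAULT_FUZZY_THRESHOLD
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"NLP_FUZZY_THRESHOLD must be an integer, got {raw!r}"
        ) from None
    if not 0 <= value <= 100:
        raise ValueError(
            f"NLP_FUZZY_THRESHOLD must be between 0 and 100, got {value}"
        )
    return value
```

Passing a plain dict as `env` keeps the function testable without mutating the process environment. Calling `read_fuzzy_threshold()` with no argument reads from `os.environ`.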