NLP Service
Lightweight FastAPI service that parses free-text search queries into structured extractions, replacing Ollama for the Familienarchiv NL search feature.
Stack
- Python 3.11, FastAPI 0.115, rapidfuzz 3.x, psycopg2-binary
No ML models — persons are matched against the live DB via fuzzy lookup.
Endpoints
- POST /parse: parse a free-text query and return an extraction matching the NlpExtraction contract
- GET /health: returns {"status": "ok", "persons_loaded": N}
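A client for these two endpoints might look like the sketch below, which uses only the standard library. The helper names build_parse_request and is_healthy are hypothetical. The body fields query and lang mirror the curl example in this README, and the port is the one used for local runs.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # assumed: port from the local-run instructions


def build_parse_request(query: str, lang: str = "de") -> urllib.request.Request:
    """Build (but do not send) a POST /parse request with a JSON body."""
    body = json.dumps({"query": query, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/parse",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_healthy(health_payload: dict) -> bool:
    """Interpret a decoded GET /health response body."""
    return health_payload.get("status") == "ok"


# Sending the request requires a running service, for example:
#   with urllib.request.urlopen(build_parse_request("Briefe vor 1920")) as resp:
#       extraction = json.load(resp)
```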
Running locally
pip install -r requirements.txt
# Without DB (empty person matcher — dates and keywords still work):
uvicorn main:app --reload --port 8001
# With DB (full person matching):
DATABASE_URL=postgresql://archive_user:secret@localhost:5432/family_archive_db \
uvicorn main:app --reload --port 8001
# Example request (the service must already be running):
curl -X POST http://localhost:8001/parse \
  -H "Content-Type: application/json" \
  -d '{"query": "Briefe von Clara Cram an Walter de Gruyter vor 1920", "lang": "de"}'
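The two start-up modes above differ only in whether DATABASE_URL is set. A minimal sketch of that decision follows. load_persons and the injected fetch_names callable are hypothetical, and the real service may structure this differently.

```python
from typing import Callable, Mapping


def load_persons(
    env: Mapping[str, str],
    fetch_names: Callable[[str], list[str]],
) -> list[str]:
    """Return person names from the DB, or an empty list when no DB is configured."""
    database_url = env.get("DATABASE_URL")
    if not database_url:
        # No DB configured: the matcher starts empty; dates and keywords still parse.
        return []
    return fetch_names(database_url)
```

In a real application env would be os.environ and fetch_names would run the actual query. Passing both in keeps the sketch runnable without a database.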
Testing
pytest -v
No DB required for tests — fixture pre-seeds the PersonMatcher with a small test corpus.
Architecture
- person_matcher.py: DB-backed name lookup; loads all persons at startup and fuzzy-matches query tokens that follow person prepositions
- extractor.py: pipeline of persons → role → dates (regex) → keywords (stopword filter)
- main.py: FastAPI app; reads the DATABASE_URL env var at startup
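The pipeline order listed for extractor.py can be sketched roughly as follows. This is an illustration under stated assumptions, not the service's implementation: it matches names exactly rather than fuzzily, and the preposition-to-role mapping, the date pattern, the stopword list, and the output field names are all hypothetical (they are not the NlpExtraction contract).

```python
import re

# Assumed German prepositions that introduce a person, and the role each implies.
ROLE_BY_PREPOSITION = {"von": "sender", "an": "recipient"}
# Assumed date pattern: an optional qualifier followed by a four-digit year.
DATE_RE = re.compile(r"\b(?:(vor|nach|um)\s+)?(\d{4})\b", re.IGNORECASE)
# Assumed stopword list, kept tiny for the example.
STOPWORDS = {"von", "an", "vor", "nach", "um", "der", "die", "das", "und"}


def extract(query: str, known_persons: list[str]) -> dict:
    """Run persons -> role -> dates -> keywords, removing matched text at each stage."""
    remaining = query
    persons = []
    # Stages 1 and 2: find known names after a person preposition and derive the role.
    for name in known_persons:
        pattern = re.compile(rf"\b(von|an)\s+{re.escape(name)}\b", re.IGNORECASE)
        match = pattern.search(remaining)
        if match:
            role = ROLE_BY_PREPOSITION[match.group(1).lower()]
            persons.append({"name": name, "role": role})
            remaining = remaining[:match.start()] + " " + remaining[match.end():]
    # Stage 3: pull out year expressions with their optional qualifier.
    dates = []
    for m in DATE_RE.finditer(remaining):
        qualifier = m.group(1).lower() if m.group(1) else None
        dates.append({"qualifier": qualifier, "year": int(m.group(2))})
    remaining = DATE_RE.sub(" ", remaining)
    # Stage 4: whatever is left, minus stopwords, becomes keywords.
    keywords = [t for t in re.findall(r"\w+", remaining) if t.lower() not in STOPWORDS]
    return {"persons": persons, "dates": dates, "keywords": keywords}
```

Removing matched text before the next stage keeps a name or a year from reappearing as a keyword, which is one plausible reason for running the stages in this order.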
Design spec
See docs/superpowers/specs/2026-06-07-spacy-nlp-service-design.md.
Notes
This service is fully wired into docker-compose.yml (container archive-nlp, port 8001
internal-only) and the Java search path (RestClientNlpClient → NlQueryParserService →
NlSearchController). The extraction contract matches NlpExtraction in
backend/src/main/java/org/raddatz/familienarchiv/search/.
Test sentences for manual evaluation are in test_sentences.md.