# NLP Service

Lightweight FastAPI service that parses free-text search queries into structured extractions, replacing Ollama for the Familienarchiv NL search feature.

## Stack

- Python 3.11, FastAPI 0.115, rapidfuzz 3.x, dateparser 1.2, psycopg2-binary

No ML models: persons are matched against the live DB via fuzzy lookup.

## Endpoints

- `POST /parse`: parse a free-text query, return extraction matching `OllamaExtraction` contract
- `GET /health`: returns `{"status": "ok", "persons_loaded": N}`

## Running locally

```bash
pip install -r requirements.txt

# Without DB (empty person matcher; dates and keywords still work):
uvicorn main:app --reload --port 8001

# With DB (full person matching):
DATABASE_URL=postgresql://archive_user:secret@localhost:5432/family_archive_db \
  uvicorn main:app --reload --port 8001

curl -X POST http://localhost:8001/parse \
  -H "Content-Type: application/json" \
  -d '{"query": "Briefe von Clara Cram an Walter de Gruyter vor 1920", "lang": "de"}'
```

## Testing

```bash
pytest -v
```

No DB required for tests: a fixture pre-seeds the PersonMatcher with a small test corpus.

## Architecture

- `person_matcher.py`: DB-backed name lookup; loads all persons at startup, fuzzy-matches query tokens after person prepositions
- `extractor.py`: pipeline of persons → role → dates (regex) → keywords (stopword filter)
- `main.py`: FastAPI app; reads `DATABASE_URL` env var at startup

## Design spec

See `docs/superpowers/specs/2026-06-07-spacy-nlp-service-design.md`.

## Notes

This is a **prototype** for extraction quality evaluation. No docker-compose integration or Java-side changes in this iteration.

The extraction contract matches `OllamaExtraction` in `backend/src/main/java/org/raddatz/familienarchiv/search/`.

Test sentences for manual evaluation are in `test_sentences.md`.
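
## Pipeline sketch

To make the pipeline order from the Architecture section (persons → role → dates → keywords) concrete, here is a minimal, dependency-free sketch. It is not the code in `extractor.py` or `person_matcher.py`. Everything in it is an assumption for illustration: the output field names, the role labels `sender` and `recipient`, the preposition and stopword lists, the 0.8 threshold, and the in-memory `PERSONS` list. It also swaps in `difflib.SequenceMatcher` from the standard library for rapidfuzz so it runs without installing anything.

```python
import re
from difflib import SequenceMatcher

# Hypothetical corpus; the real service loads persons from the DB at startup.
PERSONS = ["Clara Cram", "Walter de Gruyter"]
# Assumed preposition-to-role mapping, date words, and stopwords (illustration only).
PREPOSITION_ROLES = {"von": "sender", "an": "recipient"}
DATE_WORDS = {"vor": "date_before", "nach": "date_after"}
STOPWORDS = {"der", "die", "das", "und", "von", "an", "vor", "nach"}
YEAR = re.compile(r"^\d{4}$")


def match_person(tokens, persons, threshold=0.8, max_width=3):
    """Fuzzy-match the first 1..max_width tokens against the known names.

    Returns (name, width) for the best-scoring window, or (None, 0)
    if no window reaches the threshold.
    """
    best = (0.0, None, 0)
    for width in range(1, min(max_width, len(tokens)) + 1):
        candidate = " ".join(tokens[:width]).lower()
        for name in persons:
            score = SequenceMatcher(None, candidate, name.lower()).ratio()
            if score > best[0]:
                best = (score, name, width)
    score, name, width = best
    return (name, width) if score >= threshold else (None, 0)


def extract(query, persons=PERSONS):
    tokens = query.split()
    result = {"persons": [], "date_before": None, "date_after": None, "keywords": []}
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        # Persons and role: a person preposition introduces a name and fixes its role.
        if word in PREPOSITION_ROLES:
            name, width = match_person(tokens[i + 1:], persons)
            if name:
                result["persons"].append({"name": name, "role": PREPOSITION_ROLES[word]})
                i += 1 + width
                continue
        # Dates: "vor"/"nach" followed by a four-digit year.
        if word in DATE_WORDS and i + 1 < len(tokens) and YEAR.match(tokens[i + 1]):
            result[DATE_WORDS[word]] = int(tokens[i + 1])
            i += 2
            continue
        # Keywords: any remaining token that is not a stopword.
        if word not in STOPWORDS:
            result["keywords"].append(tokens[i])
        i += 1
    return result


if __name__ == "__main__":
    print(extract("Briefe von Clara Cram an Walter de Gruyter vor 1920"))
```

Scoring every window of one to three tokens and keeping the best one lets multi-word names such as "Walter de Gruyter" match without a separate tokenizer, and a misspelling like "Klara Cram" still resolves to the stored name. A rapidfuzz-based matcher would likely use a different scorer and threshold, so the numbers here should not be read as the service's tuning.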