docs(adr): ADR-035 — replace Ollama with rule-based nlp-service
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 16:13:58 +02:00
# ADR-035: Replace Ollama with a rule-based NLP service for smart search
**Date:** 2026-06-07
**Status:** Accepted
**Deciders:** Marcel Raddatz
**Supersedes:** ADR-028 (Ollama for NL search), ADR-034 (Ollama production deployment)
**Relates to:** #771 (implementation)
---
## Context
ADR-028 introduced Ollama + qwen2.5-7B to parse free-text search queries into structured
extractions (person names, date ranges, person role, keywords). After deploying to
staging (ADR-034) the approach showed three problems:
1. **Cold-start latency:** even with `OLLAMA_KEEP_ALIVE=-1` a Qwen inference on CPU takes
~18 s. This blows the UX budget for a search feature and requires a 60 s timeout.
2. **Resource cost:** 8 GB resident RAM + 4 vCPU cap for an LLM whose only job is
regex-level entity extraction from short (< 500 char) German family-history queries.
3. **Fragility:** model-weight downloads, version pinning, and init-container orchestration
add operational surface area with no quality benefit over a deterministic parser.
The query set is narrow and well-understood: person names are all in the PostgreSQL
`persons` table; date patterns are a fixed repertoire of German/English/Spanish formats;
person role (sender vs. receiver) is reliably signalled by a handful of prepositions
("von", "an", "von … an"); keywords are nouns/proper nouns not consumed by the other
extractors.
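As a rough illustration of how little machinery the role signal needs, the sketch below detects the German prepositions named above with two regular expressions. The function name `detect_role` and the return values (`"sender"`, `"receiver"`, `"both"`) are assumptions made for this example, not the service's actual vocabulary; in particular, mapping "von … an" to `"both"` is a guess.

```python
import re
from typing import Optional

# Hypothetical German preposition patterns; the real extractor's rules may differ.
_SENDER = re.compile(r"\bvon\b", re.IGNORECASE)
_RECEIVER = re.compile(r"\ban\b", re.IGNORECASE)


def detect_role(query: str) -> Optional[str]:
    """Return 'sender', 'receiver', 'both', or None for a German query."""
    has_sender = bool(_SENDER.search(query))
    has_receiver = bool(_RECEIVER.search(query))
    if has_sender and has_receiver:
        return "both"  # assumption: "von ... an" names both parties
    if has_sender:
        return "sender"
    if has_receiver:
        return "receiver"
    return None  # no role preposition found


if __name__ == "__main__":
    for q in ("Briefe von Anna", "Briefe an Karl", "Briefe von Anna an Karl"):
        print(q, "->", detect_role(q))
```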
---
## Decision
Replace Ollama with a lightweight, rule-based Python FastAPI service (`nlp-service`).
### Architecture
```
POST /api/search/nl (NlSearchController)
→ NlQueryParserService
→ RestClientNlpClient.parse(query, lang)
→ POST http://nlp-service:8001/parse
← { personNames, personRole, dateFrom, dateTo, keywords, rawQuery }
```
The response contract is identical to the old `OllamaExtraction`; only the transport
and implementation change. Java callers see `NlpExtraction` (renamed, same shape).
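To make the contract concrete, here is a minimal sketch of the response shape. The field names are taken from the response line in the diagram above; the types, the defaults, and the use of a plain dataclass (rather than the Pydantic model the service uses) are assumptions chosen to keep the example dependency-free.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ParseResponse:
    # Field names mirror the documented response; the types are assumptions.
    rawQuery: str
    personNames: List[str] = field(default_factory=list)
    personRole: Optional[str] = None
    dateFrom: Optional[str] = None  # assumed to be an ISO-8601 date string
    dateTo: Optional[str] = None
    keywords: List[str] = field(default_factory=list)


def to_json(response: ParseResponse) -> str:
    """Serialise the response the way a JSON endpoint would return it."""
    return json.dumps(asdict(response), ensure_ascii=False)


if __name__ == "__main__":
    example = ParseResponse(
        rawQuery="Briefe von Anna 1923",
        personNames=["Anna"],
        personRole="sender",
        keywords=["Briefe"],
    )
    print(to_json(example))
```

Because the field set is unchanged, a Java record or class with the same six properties should deserialise this payload under either name.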
### Implementation
- **`nlp-service/`**: standalone FastAPI app (Python 3.11.12-slim image, ~256 MB RAM)
  - `extractor.py`: pipeline of person extraction → role detection → date parsing → keywords
  - `person_matcher.py`: two-pass fuzzy lookup (rapidfuzz 3.x) against the `persons` DB table;
    loaded at startup, no live DB queries during extraction
  - `models.py`: Pydantic `ParseRequest` (max 500 chars), `ParseResponse`
  - `main.py`: lifespan loads persons from `DATABASE_URL`; `/health` reports `persons_loaded`
- **`backend/search/`**: `OllamaClient` / `OllamaExtraction` renamed to `NlpClient` /
  `NlpExtraction`; `NlpProperties` (`@ConfigurationProperties("app.nlp")`) replaces
  `OllamaProperties`; `lang` parameter added to `/parse` and threaded through the stack.
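The in-memory person lookup could look roughly like the sketch below. It is not the service's code: `difflib.SequenceMatcher` from the standard library stands in for rapidfuzz so the example runs without extra packages (its scores differ from rapidfuzz's), and reading "two-pass" as "exact match first, fuzzy match second" is an assumption. The sample names are invented.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


class PersonMatcher:
    """In-memory name index; names are loaded once, e.g. at service startup."""

    def __init__(self, names: Iterable[str], threshold: float = 80.0) -> None:
        self._names = list(names)
        self._by_lower = {name.lower(): name for name in self._names}
        self._threshold = threshold  # similarity floor on a 0 to 100 scale

    def match(self, candidate: str) -> List[str]:
        key = candidate.strip().lower()
        # Pass 1 (assumed): exact, case-insensitive hit.
        if key in self._by_lower:
            return [self._by_lower[key]]
        # Pass 2 (assumed): fuzzy similarity, best score first.
        scored = [
            (SequenceMatcher(None, key, name.lower()).ratio() * 100, name)
            for name in self._names
        ]
        return [
            name
            for score, name in sorted(scored, reverse=True)
            if score >= self._threshold
        ]


if __name__ == "__main__":
    matcher = PersonMatcher(["Anna Raddatz", "Karl Schmidt"])
    print(matcher.match("Ana Raddatz"))
```

Since the name list is held in memory, a lookup touches no database, which is consistent with the "no live DB queries during extraction" constraint above.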
### Tunable parameters
| Env var | Default | Effect |
|---|---|---|
| `DATABASE_URL` | — | PostgreSQL DSN; unset → person matching disabled |
| `NLP_FUZZY_THRESHOLD` | `80` | rapidfuzz similarity floor (0 to 100) |
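A minimal sketch of how these two variables might be read at startup follows. The `Settings` class, the `load_settings` helper, and the range check on the threshold are assumptions for illustration; only the variable names, the default of `80`, and the "unset disables person matching" rule come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: Optional[str]
    fuzzy_threshold: float

    @property
    def person_matching_enabled(self) -> bool:
        # Without a DSN there is nothing to load names from.
        return self.database_url is not None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    threshold = float(env.get("NLP_FUZZY_THRESHOLD", "80"))
    if not 0 <= threshold <= 100:  # assumed validation, not documented
        raise ValueError("NLP_FUZZY_THRESHOLD must be between 0 and 100")
    # Treat an empty string like an unset variable.
    return Settings(
        database_url=env.get("DATABASE_URL") or None,
        fuzzy_threshold=threshold,
    )


if __name__ == "__main__":
    print(load_settings({}))
```

Passing the environment as a mapping keeps the helper easy to exercise in tests without mutating `os.environ`.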
### Graceful degradation
The backend's `RestClientNlpClient` wraps all HTTP errors and timeouts in
`DomainException.serviceUnavailable(SMART_SEARCH_UNAVAILABLE)`, returning HTTP 503 to
the client, the same behaviour as the Ollama path. The rate limiter is relaxed from
5 to 20 requests/min, since rule-based extraction completes in < 50 ms vs. ~18 s for the LLM.
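The backend client is Java, so the following Python sketch only illustrates the pattern: every transport failure, timeout, or unparsable body is collapsed into one "service unavailable" error that the caller can map to HTTP 503. The exception class, the request field names `query` and `lang`, the 2-second timeout, and the injectable `opener` are all assumptions made for this example.

```python
import json
import urllib.request
from typing import Any, Callable, Dict


class SmartSearchUnavailable(Exception):
    """Stand-in for the backend's service-unavailable domain error."""

    status_code = 503


def parse_remote(
    query: str,
    lang: str,
    url: str = "http://nlp-service:8001/parse",
    timeout: float = 2.0,  # hypothetical value
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> Dict[str, Any]:
    body = json.dumps({"query": query, "lang": lang}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with opener(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers URLError, HTTPError, and socket timeouts;
        # ValueError covers a response body that is not valid JSON.
        raise SmartSearchUnavailable("SMART_SEARCH_UNAVAILABLE") from exc
```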
---
## Consequences
### Positive
- **Latency:** < 50 ms per extraction vs. ~18 s — smart search is now interactive.
- **Memory:** ~256 MB vs. 8 GB — frees 7.75 GB on the production host.
- **No model downloads:** the image ships no weights; startup is a single DB query.
- **Deterministic:** same query always produces the same result; no temperature/sampling.
- **Testable without infrastructure:** pytest with a seeded `PersonMatcher` fixture; no
WireMock stubs needed for most unit tests.
### Trade-offs
- **No semantic generalisation.** The LLM could handle novel phrasing; the rule-based
parser only handles the preposition patterns it was written for. Edge cases that fall
outside the pattern produce an empty extraction rather than a best-effort result.
- **Person matching depends on DB content.** A person not yet in the archive will never
match, even if the user types their exact name. The LLM could surface the name as a
raw string; this service surfaces nothing. This is acceptable for the current archive
size and query patterns.
- **Language support is fixed at de/en/es** (Paraglide locales). Adding a fourth locale
requires adding its stopword list and preposition table to `extractor.py`.
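The locale constraint can be pictured as a lookup table keyed by language code, as in the sketch below. The table contents are invented and heavily abbreviated, the real lists live in `extractor.py`, and raising `ValueError` for an unknown locale is an assumption about behaviour the ADR does not specify. The keyword step here is also simplified to plain stopword filtering.

```python
from typing import Dict, FrozenSet, List

# Hypothetical, abbreviated stopword tables, one entry per supported locale.
STOPWORDS: Dict[str, FrozenSet[str]] = {
    "de": frozenset({"der", "die", "das", "und", "von", "an"}),
    "en": frozenset({"the", "and", "from", "to"}),
    "es": frozenset({"el", "la", "y", "de", "a"}),
}


def extract_keywords(query: str, lang: str) -> List[str]:
    """Keep the tokens that are not stopwords in the given locale."""
    if lang not in STOPWORDS:
        # Assumed behaviour: reject locales that have no table.
        raise ValueError(f"unsupported locale: {lang!r}")
    stop = STOPWORDS[lang]
    return [token for token in query.split() if token.lower() not in stop]


if __name__ == "__main__":
    print(extract_keywords("die Hochzeit und das Haus", "de"))
```

Under this layout, supporting a fourth locale would mean adding one more key to each table rather than touching the extraction logic.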
### Superseded ADRs
ADR-028 and ADR-034 documented the Ollama topology, init recipe, keep-alive pin, and
memory budget. All of that is now moot. The `ollama` and `ollama-model-init` services and
the `ollama_models` volume are removed from `docker-compose.yml`.