feat(search): replace Ollama inference with lightweight DB-backed NLP service #770

New Issue

marcel · 2026-06-07T14:14:04+02:00

marcel commented

2026-06-07 14:14:04 +02:00

Problem

The current NL search uses Ollama (qwen2.5:7b-instruct-q4_K_M) to parse free-text queries into structured extractions. Inference takes 5–15 seconds per request, making the feature effectively unusable compared to filling in the filter form manually. The inference server also requires a GPU-capable host (or slow CPU fallback) and adds operational complexity.

A prototype (nlp-service/) has been built and evaluated against live DB data. This issue captures the full findings and specifies the production integration work.

Prototype: What Was Built

Branch: worktree-feat+nlp-service · Worktree: nlp-service/

A standalone FastAPI service (Python 3.11) that parses free-text queries into the same OllamaExtraction contract the Java backend already consumes. No ML inference — response time is <100ms.

Extraction pipeline

POST /parse { query, lang }  →  { personNames, personRole, dateFrom, dateTo, keywords, rawQuery }

Step 1 — Person extraction (DB-backed fuzzy matching)
Loads all (first_name, last_name) rows from the persons table at startup (1,728 records after filtering). At query time, scans for tokens following person-indicator prepositions and fuzzy-matches them against the loaded name index using rapidfuzz.token_sort_ratio (threshold 80, configurable via NLP_FUZZY_THRESHOLD env var).

Step 2 — Role detection
Single-person query: the anchoring preposition determines sender / receiver.
Multi-person query: always returns "any" — Java maps personNames[0] → sender, [1] → receiver by left-to-right order.

Step 3 — Date extraction
Pure regex: \b\d{4}\b finds year tokens. Direction words in the two tokens immediately before the year determine the range:

Direction tokens	de	en	es
Before / upper bound	`vor`	`before`	`antes`
After / lower bound	`nach`	`after`	`después` / `despues`
Range	`zwischen…und`	`between…and`	`entre…y`
Bare year	—	—	— → closed range Jan 1 – Dec 31

Step 4 — Keyword extraction
Removes matched person spans and year tokens from the query text, then applies a stopword filter (de/en/es) and emits content words ≥ 3 characters.

Approach Tried First: spaCy NER — Why It Was Abandoned

The original spec called for spaCy NER (de_core_news_sm, en_core_web_sm, es_core_news_sm). After full evaluation against real archive queries, spaCy was dropped. Failures documented:

1. Historical names not in training data

spaCy models are trained on modern news corpora. 1890–1950 German family names score below the PER threshold or are not tagged at all.

"Briefe von Eugenie" → personNames: [] (first-name-only, not recognised)
"Kriegsbriefe von Herbert" → keywords: ["herbert"] (classified as a noun, not a person)

2. Cross-language failure

The English model has no knowledge of German names. Running en_core_web_sm on a query with German names returns empty personNames.

"Letters from Clara Cram to Walter de Gruyter in 1920" → personNames: []

3. Span merging across prepositions

spaCy NER greedily merges multi-token spans across prepositions when it thinks they form a single entity.

"Briefe von Herbert an Eugenie de Gruyter nach 1914" → one PER span ["Herbert an Eugenie de Gruyter"] instead of two

4. False positive compound nouns

The German model tags long compound nouns as PER entities.

"Geburtstagsglückwünsche" → personNames: ["Geburtstagsglückwünsche"]

5. DATE span unreliability

"Letters between 1914 and 1918" in English did not reliably produce two DATE spans; dateparser fallback produced inconsistent results.

6. Image size

Three baked spaCy models add ~300 MB to the Docker image (~350 MB total). The DB-backed approach produces an image of ~50 MB.

The Winning Approach: PersonMatcher

PersonMatcher (in person_matcher.py) loads the persons table at startup, builds a lowercase name index (full name, first name, last name as separate variants), and at query time runs two passes:

Pass 1 — preposition-anchored: For each person-indicator preposition found in the token list, collect up to 3 consecutive non-stop tokens after it, fuzzy-match longest-first against the index. Stop tokens = prepositions + date-direction words + German function words (im, am, seine, ihre, dem, den, über, und, oder, …).

Pass 2 — full-name exact scan: Scan positions not consumed by Pass 1 for exact 2- or 3-word matches against the name index. This handles queries without prepositions (e.g. "Clara Cram Herbert Cram 1920").

DB data quality filter

290 of the 2,018 DB rows are annotations or descriptions stored as person records (e.g. first_name="an seine", first_name="Eltern in", first_name="Todesanzeige für Paul Kesten mit…"). These are skipped at load time: any record whose first_name tokens contain known prepositions or possessives (an, in, aus, für, von, sein, seine, ihr, ihre, …) is excluded via _NON_NAME_TOKENS in PersonMatcher.load(). This is a stopgap — see "Technical Debt" below.

Test Battery — Full Results

Evaluated against the live DB (1,728 persons loaded). Expected fields shown; personRole shown only when not "any".

German — full sentences

Query	personNames	role	dateFrom	dateTo
`Briefe von Clara Cram an Walter de Gruyter im Jahr 1920`	`["Clara Cram", "Walter de Gruyter"]`	—	1920-01-01	1920-12-31
`Briefe von Herbert an Eugenie de Gruyter nach 1914`	`["Herbert", "Eugenie de Gruyter"]`	—	1914-01-01	—
`Schreiben von Albert de Gruyter an seine Kinder vor 1900`	`["Albert de Gruyter"]`	sender	—	1900-12-31
`Briefe von Juan Cram an Marie zwischen 1915 und 1918`	`["Juan Cram", "Marie"]`	—	1915-01-01	1918-12-31
`Telegramm von Walter de Gruyter an Clara im Jahr 1930`	`["Walter de Gruyter", "Clara"]`	—	1930-01-01	1930-12-31
`Briefe von Else Bohrmann an Herbert Cram nach 1939`	`["Else Bohrmann", "Herbert Cram"]`	—	1939-01-01	—

German — medium

Query	personNames	role	dates
`Briefe von Clara Cram vor 1910`	`["Clara Cram"]`	sender	–1910-12-31
`Briefe an Herbert Cram nach dem Krieg`	`["Herbert Cram"]`	receiver	—
`Schriften von Eugenie de Gruyter im Jahr 1905`	`["Eugenie de Gruyter"]`	sender	1905
`Briefe an Walter de Gruyter`	`["Walter de Gruyter"]`	receiver	—

German — topic only (no persons)

Query	keywords
`Briefe aus dem Krieg`	`["briefe", "krieg"]`
`Kriegspost`	`["kriegspost"]`
`Geburtstagsglückwünsche`	`["geburtstagsglückwünsche"]`
`Kondolenzbriefe nach dem Tod von Eugenie`	`["kondolenzbriefe", "tod"]` + `personNames: ["Eugenie"]`

German — date ranges

Query	dateFrom	dateTo
`Dokumente zwischen 1914 und 1918`	1914-01-01	1918-12-31
`Briefe vor 1900`	—	1900-12-31
`Schriften nach 1920`	1920-01-01	—
`1918`	1918-01-01	1918-12-31

English & Spanish

Query	personNames	dates
`Letters from Clara Cram to Walter de Gruyter in 1920`	`["Clara Cram", "Walter de Gruyter"]`	1920
`Letters to Herbert Cram after 1939`	`["Herbert Cram"]` receiver	1939-01-01–
`Birthday greetings from Anita Wöhler`	`["Anita Wöhler"]` sender	—
`Letters between 1914 and 1918`	`[]`	1914–1918
`Cartas de Clara Cram a Walter de Gruyter en 1920`	`["Clara Cram", "Walter de Gruyter"]`	1920
`Cartas antes de 1900`	`[]`	–1900-12-31
`Cartas de Juan Cram a sus hijos entre 1915 y 1920`	`["Juan Cram"]` sender	1915–1920

Edge cases — typos and lazy queries

Query	Result
`Briefe von Eugenie`	`personNames: ["Eugenie"]` sender ✅
`Kriegsbriefe von Herbert`	`personNames: ["Herbert"]` sender ✅
`Briefe von Klara Kram an Herbert`	`personNames: ["Klara Kram", "Herbert"]` (fuzzy match) ✅
`briefe von clara cram an herbert 1920` (all lowercase)	`personNames: ["clara cram", "herbert"]` ✅
`von Clara`	`personNames: ["Clara"]` sender ✅
`an Walter`	`personNames: ["Walter"]` receiver ✅
`Wer hat an Herbert Cram 1918 geschrieben?`	`personNames: ["Herbert Cram"]` receiver, 1918 ✅
`Clara Cram Herbert Cram 1920` (no prepositions)	`personNames: ["Clara Cram", "Herbert Cram"]` via Pass 2 ✅
`Briefe von Herrbert Cram` (double-r typo)	`personNames: ["Herrbert Cram"]` (fuzzy match, ≥80) ✅
`Briefe von Clara nach Herbert`	`personNames: ["Clara", "Herbert"]` ✅
`Eugenie` (bare name, no preposition)	`personNames: []` ⚠️ by design — see caveats
`Herbert` (bare name, no preposition)	`personNames: []` ⚠️ by design — see caveats
`Reise nach Mexiko im Jahr 1925`	`personNames: []`, keywords: `["reise", "mexiko"]` ✅
`Schreiben von Albert de Gruyter an seine Kinder vor 1900`	`personNames: ["Albert de Gruyter"]` ("seine Kinder" correctly suppressed) ✅

Caveats and Known Limitations

Single bare names without prepositions

"Eugenie" and "Herbert" typed alone return personNames: []. Without a preposition anchor there is no reliable way to distinguish a person name from a keyword. The noMatchFragments path in NlQueryParserService.buildText() feeds the name into full-text search, so documents mentioning the name will still surface — but the interpretation panel will show no person chip. Known UX gap: users may not understand why their name search returned text-matched documents instead of person-filtered ones. A future "Did you mean?" hint (using the ambiguous-persons panel) is the intended fix.

"nach" dual-role ambiguity

nach is both a German receiver preposition ("an/nach Herbert") and the date-after direction word ("nach 1914"). Person extraction and date extraction are separate passes and both consume the token correctly. The ambiguity only becomes an issue if a person name immediately precedes a year: "Briefe nach Marie 1920" — the direction-before-year logic detects that the word immediately before the year is a person token and falls back to a bare year (correct behaviour, covered by tests).

Fuzzy threshold false positives at 80

The threshold is configurable via NLP_FUZZY_THRESHOLD env var (default 80). If false positives re-emerge after DB imports, raise to 85 and re-run the battery — no code change or redeploy required.

DB data quality dependency

290 dirty records (descriptions stored as first_name) are filtered at load time via _NON_NAME_TOKENS. A future DB cleanup would make this filter unnecessary. See "Technical Debt" below.

Single-language preposition sets

The preposition lists are fixed enumerations (12–15 tokens across 3 languages). Unconventional phrasing without a listed preposition returns personRole: "any" — same behaviour as Ollama. Java searches both sender and receiver positions in this case.

Person names echoed verbatim (not resolved)

personNames contains query text as matched, not the canonical DB name. Intentional — NlQueryParserService.resolveNames() already does its own fuzzy lookup. Duplicating this in Python would violate the single-responsibility principle.

No reload on DB changes

The name index is loaded once at startup. New persons added to the DB are invisible until restart. A /reload endpoint or scheduled reload can be added if needed.

Bare-name UX and the interpretation panel

When a query like "Eugenie" produces personNames: [], the interpretation panel shows no resolved person. The Java side sends "Eugenie" into full-text search via noMatchFragments, which may return relevant results — but silently. This is a product gap to address separately, not a blocker for this integration.

Service Spec (for Java integration)

Endpoint

POST http://nlp-service:8001/parse
Content-Type: application/json

{ "query": "...", "lang": "de" | "en" | "es" }

lang is required. The frontend passes the active Paraglide locale (de/en/es) from the session. NlSearchRequest must gain a lang field; NlParserClient.parse(String query, String lang) must accept both parameters.

Response (matches `NlpExtraction` contract)

{
  "personNames": ["Clara Cram", "Walter de Gruyter"],
  "personRole": "any",
  "dateFrom": "1920-01-01",
  "dateTo": "1920-12-31",
  "keywords": ["briefe"],
  "rawQuery": "Briefe von Clara Cram an Walter de Gruyter im Jahr 1920"
}

Health

GET http://nlp-service:8001/health
→ { "status": "ok", "persons_loaded": 1728 }

persons_loaded: 0 indicates the service started before the DB was ready (degraded: dates and keywords still work, person matching disabled). The compose HEALTHCHECK should check persons_loaded > 0.

Environment

Variable	Default	Notes
`DATABASE_URL`	—	Full PostgreSQL DSN. Constructed from compose env: `postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}`
`NLP_FUZZY_THRESHOLD`	`80`	rapidfuzz token_sort_ratio cutoff for person matching

Security note: nlp-service has no auth layer. It must NOT have a ports: mapping in docker-compose — it is reachable only at http://nlp-service:8001 from within the Docker network. Never log query content in production; log only aggregate metadata (persons_found=N, lang=de).

Production Integration Work

The following is needed to replace Ollama in production. None of this is in the current branch.

Python service changes

Add query: str = Field(..., min_length=3, max_length=500) to ParseRequest in models.py — defence-in-depth, independent of Java caller
Add NLP_FUZZY_THRESHOLD env var wiring in PersonMatcher (already noted as configurable — wire it up)

Java changes

Add lang to NlSearchRequest — String lang with default "de"; frontend passes Paraglide locale
Rename OllamaClient → NlParserClient and OllamaExtraction → NlpExtraction — do this in the same PR; stale names cause ongoing confusion. Not cosmetic.
Create RestClientNlpClient implementing NlParserClient — POSTs { query, lang } to http://nlp-service:8001/parse
Create NlpProperties (app.nlp.base-url, app.nlp.timeout-seconds = 5) — do not reuse OllamaProperties; the 30s Ollama timeout is wrong for a sub-second service
Wire NlParserHealthClient — RestClientNlpClient must also implement the health interface (currently OllamaHealthClient), calling GET /health on the NLP service
Update NlQueryParserService call site: nlParserClient.parse(request.query(), request.lang())
SMART_SEARCH_UNAVAILABLE error code: already fires on RestClientException; ensure the new client maps 5xx and connection refused to the same DomainException.serviceUnavailable
Rate limiter: raise default in NlSearchRateLimitProperties from 5 → 20 req/min (Ollama-era value is too restrictive for <100ms responses)
Java test: @WebMvcTest or RestClientTest that stubs the NLP service at the HTTP level and asserts the full NlSearchController → NlQueryParserService → RestClientNlpClient round-trip

docker-compose changes

Add nlp-service: mem_limit: 256m, depends_on: db with condition: service_healthy, no ports: mapping (internal network only), DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}, NLP_FUZZY_THRESHOLD: ${NLP_FUZZY_THRESHOLD:-80}
Update HEALTHCHECK: CMD curl -sf http://localhost:8001/health | grep -q '"persons_loaded":[^0]' || exit 1
Remove ollama and ollama-model-init services — these consume ~8 GB RAM and 4.7 GB model storage; removing them is the primary operational win of this migration
Remove APP_OLLAMA_BASE_URL, APP_OLLAMA_MODEL, APP_OLLAMA_API_KEY from the backend service environment block
Add APP_NLP_BASE_URL: http://nlp-service:8001 to the backend environment

Acceptance criteria

✅ POST /api/search/nl with { "query": "Briefe von Clara Cram vor 1910", "lang": "de" } returns interpretation.resolvedPersons containing Clara Cram, interpretation.dateFrom: null, interpretation.dateTo: "1910-12-31"
✅ When nlp-service is unreachable, search returns HTTP 503 with code: SMART_SEARCH_UNAVAILABLE
✅ Existing NlQueryParserService unit tests pass with a stubbed NlParserClient
✅ docker stats shows nlp-service under 256 MB RSS
✅ ollama and ollama-model-init containers are gone from docker-compose ps

Technical Debt Created

Dirty DB records (first_name containing prepositions/phrases): 290 records are filtered at NLP service load time. The correct fix is a DB migration to remove or re-structure these records. File a separate cleanup issue targeting the import pipeline that created them.
_NON_NAME_TOKENS in PersonMatcher: valid stopgap; can be removed once DB is clean.

Prototype Code Location

nlp-service/
├── main.py           FastAPI app — lifespan loads persons from DB
├── extractor.py      Full pipeline: PersonMatcher → role → dates (regex) → keywords
├── person_matcher.py DB-backed fuzzy name matcher (rapidfuzz)
├── models.py         Pydantic ParseRequest / ParseResponse
├── requirements.txt  fastapi, uvicorn, rapidfuzz, dateparser, psycopg2-binary, pytest, httpx
├── Dockerfile        python:3.11-slim, ~50 MB image (no ML models)
├── test_extractor.py 61 unit + integration tests — no DB required
├── test_main.py      FastAPI integration tests
├── test_sentences.md Full manual evaluation corpus
└── CLAUDE.md         Dev guide

Test suite: pytest -v — runs without a DB connection (fixture pre-seeds PersonMatcher).

Last commit on branch: fix(nlp-service): eliminate false-positive person matches from dirty DB records

## Problem The current NL search uses Ollama (`qwen2.5:7b-instruct-q4_K_M`) to parse free-text queries into structured extractions. Inference takes **5–15 seconds per request**, making the feature effectively unusable compared to filling in the filter form manually. The inference server also requires a GPU-capable host (or slow CPU fallback) and adds operational complexity. A prototype (`nlp-service/`) has been built and evaluated against live DB data. This issue captures the full findings and specifies the production integration work. --- ## Prototype: What Was Built Branch: `worktree-feat+nlp-service` · Worktree: `nlp-service/` A standalone FastAPI service (Python 3.11) that parses free-text queries into the same `OllamaExtraction` contract the Java backend already consumes. No ML inference — response time is <100ms. ### Extraction pipeline ``` POST /parse { query, lang } → { personNames, personRole, dateFrom, dateTo, keywords, rawQuery } ``` **Step 1 — Person extraction (DB-backed fuzzy matching)** Loads all `(first_name, last_name)` rows from the `persons` table at startup (1,728 records after filtering). At query time, scans for tokens following person-indicator prepositions and fuzzy-matches them against the loaded name index using `rapidfuzz.token_sort_ratio` (threshold 80, configurable via `NLP_FUZZY_THRESHOLD` env var). **Step 2 — Role detection** Single-person query: the anchoring preposition determines `sender` / `receiver`. Multi-person query: always returns `"any"` — Java maps `personNames[0]` → sender, `[1]` → receiver by left-to-right order. **Step 3 — Date extraction** Pure regex: `\b\d{4}\b` finds year tokens. Direction words in the two tokens immediately before the year determine the range: | Direction tokens | de | en | es | |---|---|---|---| | Before / upper bound | `vor` | `before` | `antes` | | After / lower bound | `nach` | `after` | `después` / `despues` | | Range | `zwischen…und` | `between…and` | `entre…y` | | Bare year | — | — | — → closed range Jan 1 – Dec 31 | **Step 4 — Keyword extraction** Removes matched person spans and year tokens from the query text, then applies a stopword filter (de/en/es) and emits content words ≥ 3 characters. --- ## Approach Tried First: spaCy NER — Why It Was Abandoned The original spec called for spaCy NER (`de_core_news_sm`, `en_core_web_sm`, `es_core_news_sm`). After full evaluation against real archive queries, spaCy was dropped. Failures documented: ### 1. Historical names not in training data spaCy models are trained on modern news corpora. 1890–1950 German family names score below the PER threshold or are not tagged at all. - `"Briefe von Eugenie"` → `personNames: []` (first-name-only, not recognised) - `"Kriegsbriefe von Herbert"` → `keywords: ["herbert"]` (classified as a noun, not a person) ### 2. Cross-language failure The English model has no knowledge of German names. Running `en_core_web_sm` on a query with German names returns empty `personNames`. - `"Letters from Clara Cram to Walter de Gruyter in 1920"` → `personNames: []` ### 3. Span merging across prepositions spaCy NER greedily merges multi-token spans across prepositions when it thinks they form a single entity. - `"Briefe von Herbert an Eugenie de Gruyter nach 1914"` → one PER span `["Herbert an Eugenie de Gruyter"]` instead of two ### 4. False positive compound nouns The German model tags long compound nouns as PER entities. - `"Geburtstagsglückwünsche"` → `personNames: ["Geburtstagsglückwünsche"]` ### 5. DATE span unreliability `"Letters between 1914 and 1918"` in English did not reliably produce two DATE spans; dateparser fallback produced inconsistent results. ### 6. Image size Three baked spaCy models add ~300 MB to the Docker image (~350 MB total). The DB-backed approach produces an image of ~50 MB. --- ## The Winning Approach: PersonMatcher `PersonMatcher` (in `person_matcher.py`) loads the persons table at startup, builds a lowercase name index (full name, first name, last name as separate variants), and at query time runs two passes: **Pass 1 — preposition-anchored:** For each person-indicator preposition found in the token list, collect up to 3 consecutive non-stop tokens after it, fuzzy-match longest-first against the index. Stop tokens = prepositions + date-direction words + German function words (`im`, `am`, `seine`, `ihre`, `dem`, `den`, `über`, `und`, `oder`, …). **Pass 2 — full-name exact scan:** Scan positions not consumed by Pass 1 for exact 2- or 3-word matches against the name index. This handles queries without prepositions (e.g. `"Clara Cram Herbert Cram 1920"`). ### DB data quality filter 290 of the 2,018 DB rows are annotations or descriptions stored as person records (e.g. `first_name="an seine"`, `first_name="Eltern in"`, `first_name="Todesanzeige für Paul Kesten mit…"`). These are skipped at load time: any record whose `first_name` tokens contain known prepositions or possessives (`an`, `in`, `aus`, `für`, `von`, `sein`, `seine`, `ihr`, `ihre`, …) is excluded via `_NON_NAME_TOKENS` in `PersonMatcher.load()`. This is a stopgap — see "Technical Debt" below. --- ## Test Battery — Full Results Evaluated against the live DB (1,728 persons loaded). Expected fields shown; `personRole` shown only when not `"any"`. ### German — full sentences | Query | personNames | role | dateFrom | dateTo | |---|---|---|---|---| | `Briefe von Clara Cram an Walter de Gruyter im Jahr 1920` | `["Clara Cram", "Walter de Gruyter"]` | — | 1920-01-01 | 1920-12-31 | | `Briefe von Herbert an Eugenie de Gruyter nach 1914` | `["Herbert", "Eugenie de Gruyter"]` | — | 1914-01-01 | — | | `Schreiben von Albert de Gruyter an seine Kinder vor 1900` | `["Albert de Gruyter"]` | sender | — | 1900-12-31 | | `Briefe von Juan Cram an Marie zwischen 1915 und 1918` | `["Juan Cram", "Marie"]` | — | 1915-01-01 | 1918-12-31 | | `Telegramm von Walter de Gruyter an Clara im Jahr 1930` | `["Walter de Gruyter", "Clara"]` | — | 1930-01-01 | 1930-12-31 | | `Briefe von Else Bohrmann an Herbert Cram nach 1939` | `["Else Bohrmann", "Herbert Cram"]` | — | 1939-01-01 | — | ### German — medium | Query | personNames | role | dates | |---|---|---|---| | `Briefe von Clara Cram vor 1910` | `["Clara Cram"]` | sender | –1910-12-31 | | `Briefe an Herbert Cram nach dem Krieg` | `["Herbert Cram"]` | receiver | — | | `Schriften von Eugenie de Gruyter im Jahr 1905` | `["Eugenie de Gruyter"]` | sender | 1905 | | `Briefe an Walter de Gruyter` | `["Walter de Gruyter"]` | receiver | — | ### German — topic only (no persons) | Query | keywords | |---|---| | `Briefe aus dem Krieg` | `["briefe", "krieg"]` | | `Kriegspost` | `["kriegspost"]` | | `Geburtstagsglückwünsche` | `["geburtstagsglückwünsche"]` | | `Kondolenzbriefe nach dem Tod von Eugenie` | `["kondolenzbriefe", "tod"]` + `personNames: ["Eugenie"]` | ### German — date ranges | Query | dateFrom | dateTo | |---|---|---| | `Dokumente zwischen 1914 und 1918` | 1914-01-01 | 1918-12-31 | | `Briefe vor 1900` | — | 1900-12-31 | | `Schriften nach 1920` | 1920-01-01 | — | | `1918` | 1918-01-01 | 1918-12-31 | ### English & Spanish | Query | personNames | dates | |---|---|---| | `Letters from Clara Cram to Walter de Gruyter in 1920` | `["Clara Cram", "Walter de Gruyter"]` | 1920 | | `Letters to Herbert Cram after 1939` | `["Herbert Cram"]` receiver | 1939-01-01– | | `Birthday greetings from Anita Wöhler` | `["Anita Wöhler"]` sender | — | | `Letters between 1914 and 1918` | `[]` | 1914–1918 | | `Cartas de Clara Cram a Walter de Gruyter en 1920` | `["Clara Cram", "Walter de Gruyter"]` | 1920 | | `Cartas antes de 1900` | `[]` | –1900-12-31 | | `Cartas de Juan Cram a sus hijos entre 1915 y 1920` | `["Juan Cram"]` sender | 1915–1920 | ### Edge cases — typos and lazy queries | Query | Result | |---|---| | `Briefe von Eugenie` | `personNames: ["Eugenie"]` sender ✅ | | `Kriegsbriefe von Herbert` | `personNames: ["Herbert"]` sender ✅ | | `Briefe von Klara Kram an Herbert` | `personNames: ["Klara Kram", "Herbert"]` (fuzzy match) ✅ | | `briefe von clara cram an herbert 1920` (all lowercase) | `personNames: ["clara cram", "herbert"]` ✅ | | `von Clara` | `personNames: ["Clara"]` sender ✅ | | `an Walter` | `personNames: ["Walter"]` receiver ✅ | | `Wer hat an Herbert Cram 1918 geschrieben?` | `personNames: ["Herbert Cram"]` receiver, 1918 ✅ | | `Clara Cram Herbert Cram 1920` (no prepositions) | `personNames: ["Clara Cram", "Herbert Cram"]` via Pass 2 ✅ | | `Briefe von Herrbert Cram` (double-r typo) | `personNames: ["Herrbert Cram"]` (fuzzy match, ≥80) ✅ | | `Briefe von Clara nach Herbert` | `personNames: ["Clara", "Herbert"]` ✅ | | `Eugenie` (bare name, no preposition) | `personNames: []` ⚠️ by design — see caveats | | `Herbert` (bare name, no preposition) | `personNames: []` ⚠️ by design — see caveats | | `Reise nach Mexiko im Jahr 1925` | `personNames: []`, keywords: `["reise", "mexiko"]` ✅ | | `Schreiben von Albert de Gruyter an seine Kinder vor 1900` | `personNames: ["Albert de Gruyter"]` ("seine Kinder" correctly suppressed) ✅ | --- ## Caveats and Known Limitations ### Single bare names without prepositions `"Eugenie"` and `"Herbert"` typed alone return `personNames: []`. Without a preposition anchor there is no reliable way to distinguish a person name from a keyword. The `noMatchFragments` path in `NlQueryParserService.buildText()` feeds the name into full-text search, so documents mentioning the name will still surface — but the interpretation panel will show no person chip. **Known UX gap:** users may not understand why their name search returned text-matched documents instead of person-filtered ones. A future "Did you mean?" hint (using the ambiguous-persons panel) is the intended fix. ### "nach" dual-role ambiguity `nach` is both a German receiver preposition ("an/nach Herbert") and the date-after direction word ("nach 1914"). Person extraction and date extraction are separate passes and both consume the token correctly. The ambiguity only becomes an issue if a person name immediately precedes a year: `"Briefe nach Marie 1920"` — the direction-before-year logic detects that the word immediately before the year is a person token and falls back to a bare year (correct behaviour, covered by tests). ### Fuzzy threshold false positives at 80 The threshold is configurable via `NLP_FUZZY_THRESHOLD` env var (default 80). If false positives re-emerge after DB imports, raise to 85 and re-run the battery — no code change or redeploy required. ### DB data quality dependency 290 dirty records (descriptions stored as first_name) are filtered at load time via `_NON_NAME_TOKENS`. A future DB cleanup would make this filter unnecessary. See "Technical Debt" below. ### Single-language preposition sets The preposition lists are fixed enumerations (12–15 tokens across 3 languages). Unconventional phrasing without a listed preposition returns `personRole: "any"` — same behaviour as Ollama. Java searches both sender and receiver positions in this case. ### Person names echoed verbatim (not resolved) `personNames` contains query text as matched, not the canonical DB name. Intentional — `NlQueryParserService.resolveNames()` already does its own fuzzy lookup. Duplicating this in Python would violate the single-responsibility principle. ### No reload on DB changes The name index is loaded once at startup. New persons added to the DB are invisible until restart. A `/reload` endpoint or scheduled reload can be added if needed. ### Bare-name UX and the interpretation panel When a query like `"Eugenie"` produces `personNames: []`, the interpretation panel shows no resolved person. The Java side sends "Eugenie" into full-text search via `noMatchFragments`, which may return relevant results — but silently. This is a product gap to address separately, not a blocker for this integration. --- ## Service Spec (for Java integration) ### Endpoint ``` POST http://nlp-service:8001/parse Content-Type: application/json { "query": "...", "lang": "de" | "en" | "es" } ``` `lang` is required. The frontend passes the active Paraglide locale (`de`/`en`/`es`) from the session. `NlSearchRequest` must gain a `lang` field; `NlParserClient.parse(String query, String lang)` must accept both parameters. ### Response (matches `NlpExtraction` contract) ```json { "personNames": ["Clara Cram", "Walter de Gruyter"], "personRole": "any", "dateFrom": "1920-01-01", "dateTo": "1920-12-31", "keywords": ["briefe"], "rawQuery": "Briefe von Clara Cram an Walter de Gruyter im Jahr 1920" } ``` ### Health ``` GET http://nlp-service:8001/health → { "status": "ok", "persons_loaded": 1728 } ``` `persons_loaded: 0` indicates the service started before the DB was ready (degraded: dates and keywords still work, person matching disabled). The compose `HEALTHCHECK` should check `persons_loaded > 0`. ### Environment | Variable | Default | Notes | |---|---|---| | `DATABASE_URL` | — | Full PostgreSQL DSN. Constructed from compose env: `postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}` | | `NLP_FUZZY_THRESHOLD` | `80` | rapidfuzz token_sort_ratio cutoff for person matching | **Security note:** `nlp-service` has no auth layer. It must NOT have a `ports:` mapping in docker-compose — it is reachable only at `http://nlp-service:8001` from within the Docker network. Never log `query` content in production; log only aggregate metadata (`persons_found=N`, `lang=de`). --- ## Production Integration Work The following is needed to replace Ollama in production. **None of this is in the current branch.** ### Python service changes - [ ] Add `query: str = Field(..., min_length=3, max_length=500)` to `ParseRequest` in `models.py` — defence-in-depth, independent of Java caller - [ ] Add `NLP_FUZZY_THRESHOLD` env var wiring in `PersonMatcher` (already noted as configurable — wire it up) ### Java changes - [ ] **Add `lang` to `NlSearchRequest`** — `String lang` with default `"de"`; frontend passes Paraglide locale - [ ] **Rename `OllamaClient` → `NlParserClient`** and **`OllamaExtraction` → `NlpExtraction`** — do this in the same PR; stale names cause ongoing confusion. Not cosmetic. - [ ] **Create `RestClientNlpClient`** implementing `NlParserClient` — POSTs `{ query, lang }` to `http://nlp-service:8001/parse` - [ ] **Create `NlpProperties`** (`app.nlp.base-url`, `app.nlp.timeout-seconds = 5`) — do not reuse `OllamaProperties`; the 30s Ollama timeout is wrong for a sub-second service - [ ] **Wire `NlParserHealthClient`** — `RestClientNlpClient` must also implement the health interface (currently `OllamaHealthClient`), calling `GET /health` on the NLP service - [ ] **Update `NlQueryParserService`** call site: `nlParserClient.parse(request.query(), request.lang())` - [ ] **`SMART_SEARCH_UNAVAILABLE` error code**: already fires on `RestClientException`; ensure the new client maps 5xx and connection refused to the same `DomainException.serviceUnavailable` - [ ] **Rate limiter**: raise default in `NlSearchRateLimitProperties` from 5 → 20 req/min (Ollama-era value is too restrictive for <100ms responses) - [ ] **Java test**: `@WebMvcTest` or `RestClientTest` that stubs the NLP service at the HTTP level and asserts the full `NlSearchController → NlQueryParserService → RestClientNlpClient` round-trip ### docker-compose changes - [ ] **Add `nlp-service`**: `mem_limit: 256m`, `depends_on: db` with `condition: service_healthy`, no `ports:` mapping (internal network only), `DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}`, `NLP_FUZZY_THRESHOLD: ${NLP_FUZZY_THRESHOLD:-80}` - [ ] **Update HEALTHCHECK**: `CMD curl -sf http://localhost:8001/health | grep -q '"persons_loaded":[^0]' || exit 1` - [ ] **Remove `ollama` and `ollama-model-init` services** — these consume ~8 GB RAM and 4.7 GB model storage; removing them is the primary operational win of this migration - [ ] **Remove `APP_OLLAMA_BASE_URL`, `APP_OLLAMA_MODEL`, `APP_OLLAMA_API_KEY`** from the backend service environment block - [ ] **Add `APP_NLP_BASE_URL: http://nlp-service:8001`** to the backend environment ### Acceptance criteria - ✅ `POST /api/search/nl` with `{ "query": "Briefe von Clara Cram vor 1910", "lang": "de" }` returns `interpretation.resolvedPersons` containing Clara Cram, `interpretation.dateFrom: null`, `interpretation.dateTo: "1910-12-31"` - ✅ When nlp-service is unreachable, search returns HTTP 503 with `code: SMART_SEARCH_UNAVAILABLE` - ✅ Existing `NlQueryParserService` unit tests pass with a stubbed `NlParserClient` - ✅ `docker stats` shows nlp-service under 256 MB RSS - ✅ `ollama` and `ollama-model-init` containers are gone from `docker-compose ps` --- ## Technical Debt Created - **Dirty DB records** (`first_name` containing prepositions/phrases): 290 records are filtered at NLP service load time. The correct fix is a DB migration to remove or re-structure these records. File a separate cleanup issue targeting the import pipeline that created them. - **`_NON_NAME_TOKENS` in `PersonMatcher`**: valid stopgap; can be removed once DB is clean. --- ## Prototype Code Location ``` nlp-service/ ├── main.py FastAPI app — lifespan loads persons from DB ├── extractor.py Full pipeline: PersonMatcher → role → dates (regex) → keywords ├── person_matcher.py DB-backed fuzzy name matcher (rapidfuzz) ├── models.py Pydantic ParseRequest / ParseResponse ├── requirements.txt fastapi, uvicorn, rapidfuzz, dateparser, psycopg2-binary, pytest, httpx ├── Dockerfile python:3.11-slim, ~50 MB image (no ML models) ├── test_extractor.py 61 unit + integration tests — no DB required ├── test_main.py FastAPI integration tests ├── test_sentences.md Full manual evaluation corpus └── CLAUDE.md Dev guide ``` Test suite: `pytest -v` — runs without a DB connection (fixture pre-seeds PersonMatcher). Last commit on branch: `fix(nlp-service): eliminate false-positive person matches from dirty DB records`

marcel added the P1-high feature labels 2026-06-07 14:14:08 +02:00

marcel commented

2026-06-07 14:22:01 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

lang parameter is missing from the Java integration spec. NlQueryParserService calls ollamaClient.parse(query) — the current interface is OllamaClient.parse(String query). The new service requires { "query": "...", "lang": "de|en|es" }. The interface must change to parse(String query, String lang) (or take a request object) but this is not mentioned anywhere in the integration checklist. This is the largest gap in the spec.
NlSearchRequest only has query — no lang field. Where does the language come from? This needs to be resolved before anyone writes RestClientNlpClient.
The production checklist says RestClientNlpClient but makes no mention of updating the OllamaClient interface signature or the NlQueryParserService call site. The actual scope of the Java work is underspecified.
OllamaHealthClient is a separate interface that RestClientOllamaClient also implements. The issue doesn't address whether RestClientNlpClient needs to implement health checking too. The health controller (if there is one) will break or silently skip health checks.
Renaming OllamaExtraction to NlpExtraction is listed as "cosmetic / optional." It isn't — every time someone reads the code the name sends the wrong mental model. Recommend doing it in the same PR.
The Java integration checklist has no test item. At minimum: one @SpringBootTest or RestClientTest that stubs the NLP service and asserts the full round-trip.

Recommendations

Add lang to NlSearchRequest (simplest path: String lang with default "de"). Update OllamaClient.parse(String query, String lang). This forces the call site to be explicit.
Check whether a NlpHealthClient interface is needed — look for any @ConditionalOnProperty or health-check endpoint that calls OllamaHealthClient.isHealthy() and decide now, not during implementation.
Do the rename (OllamaExtraction → NlpExtraction, OllamaClient → NlParserClient) in the same PR. It's a search-and-replace, not a rewrite.

Open Decisions

lang source: three options: (a) add to NlSearchRequest — explicit, user selects; (b) derive from the authenticated user's profile language — transparent, no API change; (c) hard-default to "de" (archive is primarily German) and add en/es later. Option (a) is cleanest but changes the API contract. Option (c) is the fastest path to production but silently ignores EN/ES queries. Needs a product call.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer ### Observations - **`lang` parameter is missing from the Java integration spec.** `NlQueryParserService` calls `ollamaClient.parse(query)` — the current interface is `OllamaClient.parse(String query)`. The new service requires `{ "query": "...", "lang": "de|en|es" }`. The interface must change to `parse(String query, String lang)` (or take a request object) but this is not mentioned anywhere in the integration checklist. This is the largest gap in the spec. - `NlSearchRequest` only has `query` — no `lang` field. Where does the language come from? This needs to be resolved before anyone writes `RestClientNlpClient`. - The production checklist says `RestClientNlpClient` but makes no mention of updating the `OllamaClient` interface signature or the `NlQueryParserService` call site. The actual scope of the Java work is underspecified. - `OllamaHealthClient` is a separate interface that `RestClientOllamaClient` also implements. The issue doesn't address whether `RestClientNlpClient` needs to implement health checking too. The health controller (if there is one) will break or silently skip health checks. - Renaming `OllamaExtraction` to `NlpExtraction` is listed as "cosmetic / optional." It isn't — every time someone reads the code the name sends the wrong mental model. Recommend doing it in the same PR. - The Java integration checklist has no test item. At minimum: one `@SpringBootTest` or `RestClientTest` that stubs the NLP service and asserts the full round-trip. ### Recommendations - Add `lang` to `NlSearchRequest` (simplest path: `String lang` with default `"de"`). Update `OllamaClient.parse(String query, String lang)`. This forces the call site to be explicit. - Check whether a `NlpHealthClient` interface is needed — look for any `@ConditionalOnProperty` or health-check endpoint that calls `OllamaHealthClient.isHealthy()` and decide now, not during implementation. - Do the rename (`OllamaExtraction` → `NlpExtraction`, `OllamaClient` → `NlParserClient`) in the same PR. It's a search-and-replace, not a rewrite. ### Open Decisions - **`lang` source:** three options: (a) add to `NlSearchRequest` — explicit, user selects; (b) derive from the authenticated user's profile language — transparent, no API change; (c) hard-default to `"de"` (archive is primarily German) and add `en`/`es` later. Option (a) is cleanest but changes the API contract. Option (c) is the fastest path to production but silently ignores EN/ES queries. Needs a product call.

marcel commented

2026-06-07 14:22:16 +02:00

🏛️ Markus Lehmann — Backend Architect

Observations

The OllamaClient interface is the right seam. NlQueryParserService depends on OllamaClient (interface), so dropping in RestClientNlpClient is clean — no business logic changes needed, just the adapter. This is the correct architectural pattern and is already in place.
Timeout mismatch. OllamaProperties.timeoutSeconds = 30. The existing RestClientNlpClient will inherit this config key or need a new one. A Python service responding in <100ms should have a 3–5 second timeout max, not 30. If not changed, a hung nlp-service will block a thread for 30s per request.
Startup ordering vs. degraded mode inconsistency. The issue says persons_loaded: 0 is valid (service starts without DB, person matching disabled). But then says "add depends_on: db with condition: service_healthy". These are contradictory: if the service is designed to start without DB, don't require DB health for startup — just log a warning and serve degraded. If you require DB health, then persons_loaded: 0 is never a normal production state.
NlQueryParserService has a hardcoded MAX_RESOLVED_PERSONS = 2 (implicit in resolveNames). PersonMatcher also returns at most 2 meaningful persons (first is sender, second is receiver). These need to stay in sync. Not a blocker but should be noted in the spec.
_NON_NAME_TOKENS filter is a service-layer data concern. It corrects for dirty data in the DB. The right long-term home for this logic is either the Person importer/validator or a DB migration, not a runtime filter in the NLP service. Noting this so it doesn't get left as permanent technical debt.

Recommendations

Create NlpProperties (app.nlp.*) alongside existing OllamaProperties — at minimum baseUrl and timeoutSeconds (default 5). Don't reuse OllamaProperties as that couples two distinct integrations.
Decide on startup behaviour explicitly: either (a) fail fast if DB is unavailable at startup (simpler, clearer) or (b) start with empty index and log a degraded-mode warning. Then make depends_on match the decision. Don't do both.
File a separate cleanup ticket for the dirty DB records (first_name containing prepositions). The _NON_NAME_TOKENS filter is correct as a stopgap but should not be permanent.

## 🏛️ Markus Lehmann — Backend Architect ### Observations - **The `OllamaClient` interface is the right seam.** `NlQueryParserService` depends on `OllamaClient` (interface), so dropping in `RestClientNlpClient` is clean — no business logic changes needed, just the adapter. This is the correct architectural pattern and is already in place. - **Timeout mismatch.** `OllamaProperties.timeoutSeconds = 30`. The existing `RestClientNlpClient` will inherit this config key or need a new one. A Python service responding in <100ms should have a 3–5 second timeout max, not 30. If not changed, a hung nlp-service will block a thread for 30s per request. - **Startup ordering vs. degraded mode inconsistency.** The issue says `persons_loaded: 0` is valid (service starts without DB, person matching disabled). But then says "add `depends_on: db` with `condition: service_healthy`". These are contradictory: if the service is designed to start without DB, don't require DB health for startup — just log a warning and serve degraded. If you require DB health, then `persons_loaded: 0` is never a normal production state. - **`NlQueryParserService` has a hardcoded `MAX_RESOLVED_PERSONS = 2` (implicit in `resolveNames`).** PersonMatcher also returns at most 2 meaningful persons (first is sender, second is receiver). These need to stay in sync. Not a blocker but should be noted in the spec. - **`_NON_NAME_TOKENS` filter is a service-layer data concern.** It corrects for dirty data in the DB. The right long-term home for this logic is either the `Person` importer/validator or a DB migration, not a runtime filter in the NLP service. Noting this so it doesn't get left as permanent technical debt. ### Recommendations - Create `NlpProperties` (`app.nlp.*`) alongside existing `OllamaProperties` — at minimum `baseUrl` and `timeoutSeconds` (default 5). Don't reuse `OllamaProperties` as that couples two distinct integrations. - Decide on startup behaviour explicitly: either (a) fail fast if DB is unavailable at startup (simpler, clearer) or (b) start with empty index and log a degraded-mode warning. Then make `depends_on` match the decision. Don't do both. - File a separate cleanup ticket for the dirty DB records (`first_name` containing prepositions). The `_NON_NAME_TOKENS` filter is correct as a stopgap but should not be permanent.

marcel commented

2026-06-07 14:22:30 +02:00

🔒 Nora Weiß — Security Engineer

Observations

No port exposure concern flagged in the spec. docker-compose.yml currently exposes OCR service only on the internal network. The issue must explicitly say: nlp-service gets NO ports: entry in docker-compose — it is reachable only at http://nlp-service:8001 from other containers. If a ports: mapping is added (even for dev convenience), it becomes an unauthenticated HTTP endpoint exposed on the host. The Python service has no auth layer whatsoever.
No query length validation in the Python service. models.py validates lang (Literal enum) and query is an unvalidated str. The Java layer enforces 3–500 chars, but if the service is ever called directly (container network, internal tooling, curl for debugging), arbitrarily large inputs go straight to the extraction pipeline. PersonMatcher.find_in_query splits on whitespace — a 500KB payload with no spaces would produce one giant token and run process.extractOne against 1,728 names. That's a real resource spike.
DATABASE_URL contains credentials. The issue says "set DATABASE_URL from the existing compose env." The existing .env / docker-compose.yml pattern stores these as env vars (${POSTGRES_PASSWORD} etc.). The spec should be explicit: reference ${DATABASE_URL} or construct from existing POSTGRES_* variables — do not hardcode credentials in compose.
Person name data in logs. If the service ever logs the query string (e.g. for debugging), real person names become visible in log output. The current main.py lifespan and extractor.py don't log queries, which is correct. The spec should note: never log query content in production — only log aggregate metadata (e.g. persons_found=2, lang=de).

Recommendations

Add query: str = Field(..., min_length=3, max_length=500) to ParseRequest in models.py. This is a one-liner and provides defence-in-depth independent of the Java caller.
Add an explicit note to the compose integration checklist: "no ports: mapping for nlp-service — internal network only."
Add DATABASE_URL: ${DATABASE_URL} (or constructed equivalent) to the compose spec — explicitly, not as prose.

## 🔒 Nora Weiß — Security Engineer ### Observations - **No port exposure concern flagged in the spec.** `docker-compose.yml` currently exposes OCR service only on the internal network. The issue must explicitly say: `nlp-service` gets NO `ports:` entry in docker-compose — it is reachable only at `http://nlp-service:8001` from other containers. If a `ports:` mapping is added (even for dev convenience), it becomes an unauthenticated HTTP endpoint exposed on the host. The Python service has no auth layer whatsoever. - **No query length validation in the Python service.** `models.py` validates `lang` (Literal enum) and `query` is an unvalidated `str`. The Java layer enforces 3–500 chars, but if the service is ever called directly (container network, internal tooling, curl for debugging), arbitrarily large inputs go straight to the extraction pipeline. `PersonMatcher.find_in_query` splits on whitespace — a 500KB payload with no spaces would produce one giant token and run `process.extractOne` against 1,728 names. That's a real resource spike. - **`DATABASE_URL` contains credentials.** The issue says "set DATABASE_URL from the existing compose env." The existing `.env` / `docker-compose.yml` pattern stores these as env vars (`${POSTGRES_PASSWORD}` etc.). The spec should be explicit: reference `${DATABASE_URL}` or construct from existing `POSTGRES_*` variables — do not hardcode credentials in compose. - **Person name data in logs.** If the service ever logs the query string (e.g. for debugging), real person names become visible in log output. The current `main.py` lifespan and `extractor.py` don't log queries, which is correct. The spec should note: never log `query` content in production — only log aggregate metadata (e.g. `persons_found=2`, `lang=de`). ### Recommendations - Add `query: str = Field(..., min_length=3, max_length=500)` to `ParseRequest` in `models.py`. This is a one-liner and provides defence-in-depth independent of the Java caller. - Add an explicit note to the compose integration checklist: "no `ports:` mapping for nlp-service — internal network only." - Add `DATABASE_URL: ${DATABASE_URL}` (or constructed equivalent) to the compose spec — explicitly, not as prose.

marcel commented

2026-06-07 14:22:46 +02:00

🧪 Sara Müller — QA Engineer

Observations

No acceptance criteria for the Java integration work. The production checklist has 6 tasks but no testable acceptance criteria. A developer ticking "Java RestClientNlpClient" as done has no definition of "done" to verify against.
Rate limiter inherited from Ollama is likely wrong. 5 requests/minute was set for a 5–15 second inference call. With sub-second NLP parsing, real users will naturally issue more queries (quick refinements, typo corrections). 5/min will generate SMART_SEARCH_RATE_LIMITED errors that didn't exist before, and users won't understand why. This isn't a caveat — it's a regression in usability that needs a deliberate decision.
Missing degraded-mode behaviour in the test battery. The battery covers the happy path and typos well. Missing scenarios:
- nlp-service returns 500 → Java shows SMART_SEARCH_UNAVAILABLE
- nlp-service is unreachable (compose restart) → same
- nlp-service returns persons_loaded: 0 (started before DB was ready) → what does the Java search return? Documents only (no person filter) — is this surfaced to the user?
fuzzy threshold is hardcoded at 80. The issue mentions "raise to 85 if false positives re-emerge" as prose. This needs to be a configurable env var (NLP_FUZZY_THRESHOLD), not a code change requiring a deploy.
"Bare name without preposition" behaviour gap. The issue says the suggestion mechanism handles this case. I looked at NlQueryParserService — noMatchFragments are passed into buildText() which feeds the full-text search. So "Eugenie" (no preposition match) becomes a full-text keyword, which may surface results. But the interpretation panel will show resolvedPersons: [] with no explanation. The user may not understand why their name search returned text-matched documents instead of person-filtered ones.

Recommendations

Add explicit ACs to the production checklist:
- ✅ POST /api/search/nl with query: "Briefe von Clara Cram vor 1910" returns interpretation.resolvedPersons containing Clara Cram and interpretation.dateFrom null, dateTo: 1910-12-31
- ✅ When nlp-service is down, search returns HTTP 503 with code: SMART_SEARCH_UNAVAILABLE
- ✅ Existing Java unit tests for NlQueryParserService still pass with a stubbed NlParserClient
Add NLP_FUZZY_THRESHOLD as a documented env var with default 80.
Explicitly call out: "Rate limit should be re-evaluated — propose 20/min for production."

## 🧪 Sara Müller — QA Engineer ### Observations - **No acceptance criteria for the Java integration work.** The production checklist has 6 tasks but no testable acceptance criteria. A developer ticking "Java RestClientNlpClient" as done has no definition of "done" to verify against. - **Rate limiter inherited from Ollama is likely wrong.** 5 requests/minute was set for a 5–15 second inference call. With sub-second NLP parsing, real users will naturally issue more queries (quick refinements, typo corrections). 5/min will generate `SMART_SEARCH_RATE_LIMITED` errors that didn't exist before, and users won't understand why. This isn't a caveat — it's a regression in usability that needs a deliberate decision. - **Missing degraded-mode behaviour in the test battery.** The battery covers the happy path and typos well. Missing scenarios: - nlp-service returns 500 → Java shows `SMART_SEARCH_UNAVAILABLE` - nlp-service is unreachable (compose restart) → same - nlp-service returns `persons_loaded: 0` (started before DB was ready) → what does the Java search return? Documents only (no person filter) — is this surfaced to the user? - **`fuzzy threshold` is hardcoded at 80.** The issue mentions "raise to 85 if false positives re-emerge" as prose. This needs to be a configurable env var (`NLP_FUZZY_THRESHOLD`), not a code change requiring a deploy. - **"Bare name without preposition" behaviour gap.** The issue says the suggestion mechanism handles this case. I looked at `NlQueryParserService` — `noMatchFragments` are passed into `buildText()` which feeds the full-text search. So "Eugenie" (no preposition match) becomes a full-text keyword, which may surface results. But the interpretation panel will show `resolvedPersons: []` with no explanation. The user may not understand why their name search returned text-matched documents instead of person-filtered ones. ### Recommendations - Add explicit ACs to the production checklist: - ✅ `POST /api/search/nl` with `query: "Briefe von Clara Cram vor 1910"` returns `interpretation.resolvedPersons` containing Clara Cram and `interpretation.dateFrom` null, `dateTo: 1910-12-31` - ✅ When nlp-service is down, search returns HTTP 503 with `code: SMART_SEARCH_UNAVAILABLE` - ✅ Existing Java unit tests for `NlQueryParserService` still pass with a stubbed `NlParserClient` - Add `NLP_FUZZY_THRESHOLD` as a documented env var with default 80. - Explicitly call out: "Rate limit should be re-evaluated — propose 20/min for production."

marcel commented

2026-06-07 14:23:01 +02:00

🎨 Leonie Fischer — UX Engineer

Observations

Sub-second response changes the interaction contract. With Ollama taking 5–15s, the search had a distinct "waiting for AI" phase. At <100ms, NL search now behaves like instant search. The loading state (spinner, skeleton) will flash and disappear. If the frontend hardcodes a minimum display duration for the loading indicator, it may look broken. This is a UX regression in disguise.
Rate limit UX. A SMART_SEARCH_RATE_LIMITED at 5/min will feel arbitrary and frustrating at this speed. A user who types a query, gets results, then refines it 5 times in 60 seconds hits a wall with no obvious explanation. The error message in the frontend should say something like "Bitte warte kurz" rather than a generic error.
Interpretation panel feedback loop. NlQueryInterpretation includes resolvedPersons, ambiguousPersons, resolvedTags, keywordsApplied, tagsApplied. With fast responses, users will actually read this. The "bare name" case (Eugenie → personNames: [], falls through to text search) will look like the search didn't understand them — no person chip in the interpretation panel. The existing suggestion mechanism (noMatchFragments) sends the name into full-text search silently. This should either surface a "Did you mean: Eugenie de Gruyter?" hint or be documented as known UX gap.
No lang feedback in the UI. If lang is derived automatically (session language or default "de"), users typing English queries into a German-configured archive get silent degraded results. The issue doesn't address whether the UI exposes language selection.

Recommendations

Check SearchBar / NL search component — if there's a minimum spinner duration or debounce delay tuned for Ollama latency, update it.
For the rate limit error: confirm the existing SMART_SEARCH_RATE_LIMITED error message in en.json / de.json / es.json reads as user-friendly, not as a technical error.
Document the bare-name fallback behaviour as a known UX gap in the issue, with a note that a future "Did you mean?" hint is desirable.

Open Decisions

Rate limit value. 5/min was calibrated for Ollama. For a sub-second service 20–30/min is more appropriate. This is a product/ops tradeoff: higher limits increase load on the nlp-service and DB. Recommend 20/min as the new default but this needs sign-off.

## 🎨 Leonie Fischer — UX Engineer ### Observations - **Sub-second response changes the interaction contract.** With Ollama taking 5–15s, the search had a distinct "waiting for AI" phase. At <100ms, NL search now behaves like instant search. The loading state (spinner, skeleton) will flash and disappear. If the frontend hardcodes a minimum display duration for the loading indicator, it may look broken. This is a UX regression in disguise. - **Rate limit UX.** A `SMART_SEARCH_RATE_LIMITED` at 5/min will feel arbitrary and frustrating at this speed. A user who types a query, gets results, then refines it 5 times in 60 seconds hits a wall with no obvious explanation. The error message in the frontend should say something like "Bitte warte kurz" rather than a generic error. - **Interpretation panel feedback loop.** `NlQueryInterpretation` includes `resolvedPersons`, `ambiguousPersons`, `resolvedTags`, `keywordsApplied`, `tagsApplied`. With fast responses, users will actually read this. The "bare name" case (`Eugenie` → `personNames: []`, falls through to text search) will look like the search didn't understand them — no person chip in the interpretation panel. The existing suggestion mechanism (`noMatchFragments`) sends the name into full-text search silently. This should either surface a "Did you mean: Eugenie de Gruyter?" hint or be documented as known UX gap. - **No `lang` feedback in the UI.** If `lang` is derived automatically (session language or default "de"), users typing English queries into a German-configured archive get silent degraded results. The issue doesn't address whether the UI exposes language selection. ### Recommendations - Check `SearchBar` / NL search component — if there's a minimum spinner duration or debounce delay tuned for Ollama latency, update it. - For the rate limit error: confirm the existing `SMART_SEARCH_RATE_LIMITED` error message in `en.json` / `de.json` / `es.json` reads as user-friendly, not as a technical error. - Document the bare-name fallback behaviour as a known UX gap in the issue, with a note that a future "Did you mean?" hint is desirable. ### Open Decisions - **Rate limit value.** 5/min was calibrated for Ollama. For a sub-second service 20–30/min is more appropriate. This is a product/ops tradeoff: higher limits increase load on the nlp-service and DB. Recommend 20/min as the new default but this needs sign-off.

marcel commented

2026-06-07 14:23:19 +02:00

🚀 Tobias Huber — DevOps / Infrastructure

Observations

Ollama removal is missing from the compose checklist. The issue says "add nlp-service" but Ollama currently consumes ~8 GB memory (mem_limit: 8g) and ollama-model-init downloads a 4.7 GB model. These two services must be removed (or commented out for rollback). Not removing them means running both Ollama and the nlp-service simultaneously with no benefit. The checklist should include: remove ollama, ollama-model-init, and the APP_OLLAMA_* env vars from the backend service.
Memory footprint for nlp-service is unspecified. OCR service has mem_limit: 12g (Surya/Kraken models). The NLP service carries only rapidfuzz + dateparser + the name index in memory — mem_limit: 256m should be sufficient. Without a limit, compose will assign no cap and the container can grow unbounded if there is a memory leak.
Timeout not updated. OllamaProperties.timeoutSeconds = 30 drives a 30-second read timeout on the inference HTTP client. RestClientNlpClient must use a new NlpProperties bean with timeoutSeconds = 5 (or even 3). This is a production concern: a hung nlp-service ties up a thread for 30 seconds per pending request under the current config.
Health check response is richer than compose uses. GET /health → { "status": "ok", "persons_loaded": 1728 }. The compose HEALTHCHECK should check persons_loaded > 0 to detect degraded startup (DB was unavailable). Currently a standard curl -f /health returns 200 even if persons_loaded: 0, which means the container reports healthy when person-matching is completely non-functional.
No Prometheus metrics. OCR service presumably has some observability. nlp-service has none — no request count, no latency histogram, no persons_loaded gauge. For a production service, at minimum expose a /metrics endpoint (FastAPI + prometheus-fastapi-instrumentator is a one-liner). Not a blocker for the first iteration but should be in the backlog.
DATABASE_URL format. The compose env file uses POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB separately. The nlp-service needs a full DSN. Document whether to construct it in compose (postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}) or add NLP_DATABASE_URL as a separate variable.

Recommendations

Add to compose checklist: remove ollama, ollama-model-init; remove APP_OLLAMA_BASE_URL and APP_OLLAMA_MODEL from backend env.
Add mem_limit: 256m to the nlp-service compose spec.
Use a new app.nlp.timeout-seconds = 5 property (not the Ollama 30s).
Change compose HEALTHCHECK to: CMD curl -sf http://localhost:8001/health | grep -q '"persons_loaded":[^0]' || exit 1 — fails if persons index is empty.

## 🚀 Tobias Huber — DevOps / Infrastructure ### Observations - **Ollama removal is missing from the compose checklist.** The issue says "add nlp-service" but Ollama currently consumes ~8 GB memory (`mem_limit: 8g`) and `ollama-model-init` downloads a 4.7 GB model. These two services must be removed (or commented out for rollback). Not removing them means running both Ollama and the nlp-service simultaneously with no benefit. The checklist should include: remove `ollama`, `ollama-model-init`, and the `APP_OLLAMA_*` env vars from the backend service. - **Memory footprint for nlp-service is unspecified.** OCR service has `mem_limit: 12g` (Surya/Kraken models). The NLP service carries only rapidfuzz + dateparser + the name index in memory — `mem_limit: 256m` should be sufficient. Without a limit, compose will assign no cap and the container can grow unbounded if there is a memory leak. - **Timeout not updated.** `OllamaProperties.timeoutSeconds = 30` drives a 30-second read timeout on the inference HTTP client. `RestClientNlpClient` must use a new `NlpProperties` bean with `timeoutSeconds = 5` (or even 3). This is a production concern: a hung nlp-service ties up a thread for 30 seconds per pending request under the current config. - **Health check response is richer than compose uses.** `GET /health → { "status": "ok", "persons_loaded": 1728 }`. The compose `HEALTHCHECK` should check `persons_loaded > 0` to detect degraded startup (DB was unavailable). Currently a standard `curl -f /health` returns 200 even if `persons_loaded: 0`, which means the container reports healthy when person-matching is completely non-functional. - **No Prometheus metrics.** OCR service presumably has some observability. nlp-service has none — no request count, no latency histogram, no `persons_loaded` gauge. For a production service, at minimum expose a `/metrics` endpoint (FastAPI + `prometheus-fastapi-instrumentator` is a one-liner). Not a blocker for the first iteration but should be in the backlog. - **`DATABASE_URL` format.** The compose env file uses `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB` separately. The nlp-service needs a full DSN. Document whether to construct it in compose (`postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}`) or add `NLP_DATABASE_URL` as a separate variable. ### Recommendations - Add to compose checklist: remove `ollama`, `ollama-model-init`; remove `APP_OLLAMA_BASE_URL` and `APP_OLLAMA_MODEL` from backend env. - Add `mem_limit: 256m` to the nlp-service compose spec. - Use a new `app.nlp.timeout-seconds = 5` property (not the Ollama 30s). - Change compose `HEALTHCHECK` to: `CMD curl -sf http://localhost:8001/health | grep -q '"persons_loaded":[^0]' || exit 1` — fails if persons index is empty.

marcel commented

2026-06-07 14:23:35 +02:00

🗳️ Decision Queue — Action Required

2 decisions need your input before implementation starts.

API Design

lang parameter source — The NLP service requires a lang field (de/en/es) but the current NlSearchRequest only has query, and OllamaClient.parse(String query) has no language parameter. Three options:
- (a) Add lang to NlSearchRequest — user or frontend selects language explicitly. Cleanest, most flexible. Changes the API contract and requires a frontend update.
- (b) Derive from authenticated user's session/profile language — transparent, no API change, but the user's UI language may not match the query language (a German-speaking user might search in English).
- (c) Default to "de" — fastest path to production, works for the primary use case. EN/ES support is latent but can be activated later with option (a). Silently returns degraded results for non-German queries.
Raised by: Felix

Infrastructure / Product

Rate limit: 5/min → ? — The current limit was sized for 5–15s Ollama inference. With sub-second NLP parsing, it will fire on normal user behaviour (type query, refine, refine again). Two options:
- Raise to 20/min — fits natural refinement patterns, still protects the service. Recommended.
- Keep at 5/min — zero work, known friction. Acceptable only if the rate limiter is made configurable via env var so it can be tuned without a redeploy.
Raised by: Sara, Leonie

## 🗳️ Decision Queue — Action Required _2 decisions need your input before implementation starts._ ### API Design - **`lang` parameter source** — The NLP service requires a `lang` field (`de`/`en`/`es`) but the current `NlSearchRequest` only has `query`, and `OllamaClient.parse(String query)` has no language parameter. Three options: - **(a) Add `lang` to `NlSearchRequest`** — user or frontend selects language explicitly. Cleanest, most flexible. Changes the API contract and requires a frontend update. - **(b) Derive from authenticated user's session/profile language** — transparent, no API change, but the user's UI language may not match the query language (a German-speaking user might search in English). - **(c) Default to `"de"`** — fastest path to production, works for the primary use case. EN/ES support is latent but can be activated later with option (a). Silently returns degraded results for non-German queries. _Raised by: Felix_ ### Infrastructure / Product - **Rate limit: 5/min → ?** — The current limit was sized for 5–15s Ollama inference. With sub-second NLP parsing, it will fire on normal user behaviour (type query, refine, refine again). Two options: - **Raise to 20/min** — fits natural refinement patterns, still protects the service. Recommended. - **Keep at 5/min** — zero work, known friction. Acceptable only if the rate limiter is made configurable via env var so it can be tuned without a redeploy. _Raised by: Sara, Leonie_

marcel commented

2026-06-07 14:27:22 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

NlSearchController.search() passes only request.query() to NlQueryParserService.search() — the lang field added to NlSearchRequest will be silently dropped unless the controller call site is updated: nlQueryParserService.search(request.query(), request.lang(), pageable). The checklist item "Update NlQueryParserService call site" exists on the Python/Java rename side but the controller-to-service wiring gap is not explicitly called out. Easy to miss during implementation.
NlQueryParserServiceTest stubs OllamaClient.parse(String query) with anyString() and constructs NlQueryParserService directly via constructor. When the interface is renamed to NlParserClient and the signature changes to parse(String query, String lang), every single when(ollamaClient.parse(anyString())) stub in the 35-test class breaks with PotentialStubbingProblem. This is a non-trivial mechanical refactor — not a one-liner — and should be called out explicitly in the checklist so it doesn't land as a surprise mid-PR.
RestClientOllamaClientTest will become RestClientNlpClientTest. The WireMock stubs point at /api/generate (Ollama-specific path). The new client posts to /parse. Three stubs need updating and the response shape changes from the Ollama envelope ({"response":"...","done":true}) to the flat NLP service JSON. The checklist mentions "Java test" as a single item but says nothing about migrating or replacing the existing WireMock test — this could cause confusion about whether the old test is kept, adapted, or deleted.
NlSearchControllerTest test #7 hardcodes the 5-req/min limit (for (int i = 0; i < 5; i++)). After the rate limit is raised to 20/min, this test continues to pass (it only does 5 iterations) but the assertion no longer validates the correct limit. The test should be updated to use the configured value or at least document why it only goes to 5.

Recommendations

Add a checklist item: "Update NlSearchController.search() to pass request.lang() to service."
Add a checklist item: "Update all NlQueryParserServiceTest stubs from parse(anyString()) to parse(anyString(), anyString())."
Add a checklist item: "Replace or adapt RestClientOllamaClientTest — WireMock stubs target /api/generate; new client uses /parse with a flat response shape."
When updating NlSearchControllerTest #7, either use props.getMaxRequestsPerMinute() as the loop bound or add a comment marking it as a known limit-tied assertion.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer ### Observations - **`NlSearchController.search()` passes only `request.query()` to `NlQueryParserService.search()`** — the `lang` field added to `NlSearchRequest` will be silently dropped unless the controller call site is updated: `nlQueryParserService.search(request.query(), request.lang(), pageable)`. The checklist item "Update `NlQueryParserService` call site" exists on the Python/Java rename side but the controller-to-service wiring gap is not explicitly called out. Easy to miss during implementation. - **`NlQueryParserServiceTest` stubs `OllamaClient.parse(String query)` with `anyString()` and constructs `NlQueryParserService` directly via constructor.** When the interface is renamed to `NlParserClient` and the signature changes to `parse(String query, String lang)`, every single `when(ollamaClient.parse(anyString()))` stub in the 35-test class breaks with `PotentialStubbingProblem`. This is a non-trivial mechanical refactor — not a one-liner — and should be called out explicitly in the checklist so it doesn't land as a surprise mid-PR. - **`RestClientOllamaClientTest` will become `RestClientNlpClientTest`.** The WireMock stubs point at `/api/generate` (Ollama-specific path). The new client posts to `/parse`. Three stubs need updating and the response shape changes from the Ollama envelope (`{"response":"...","done":true}`) to the flat NLP service JSON. The checklist mentions "Java test" as a single item but says nothing about migrating or replacing the existing WireMock test — this could cause confusion about whether the old test is kept, adapted, or deleted. - **`NlSearchControllerTest` test #7 hardcodes the 5-req/min limit** (`for (int i = 0; i < 5; i++)`). After the rate limit is raised to 20/min, this test continues to pass (it only does 5 iterations) but the assertion no longer validates the correct limit. The test should be updated to use the configured value or at least document why it only goes to 5. ### Recommendations - Add a checklist item: "Update `NlSearchController.search()` to pass `request.lang()` to service." - Add a checklist item: "Update all `NlQueryParserServiceTest` stubs from `parse(anyString())` to `parse(anyString(), anyString())`." - Add a checklist item: "Replace or adapt `RestClientOllamaClientTest` — WireMock stubs target `/api/generate`; new client uses `/parse` with a flat response shape." - When updating `NlSearchControllerTest` #7, either use `props.getMaxRequestsPerMinute()` as the loop bound or add a comment marking it as a known limit-tied assertion.

marcel commented

2026-06-07 14:27:35 +02:00

🏛️ Markus Lehmann — Backend Architect

Observations

NlQueryParserService has private final OllamaClient ollamaClient — when the interface is renamed to NlParserClient, the field name will also need updating. More importantly, NlQueryParserService.search() currently takes (String query, Pageable pageable). Adding lang as a third parameter is straightforward, but the service signature change propagates to NlSearchController (call site) and to all test stubs. The spec calls out the client interface change but does not explicitly say the service method signature changes too — this needs to be stated.
The existing OllamaProperties bean is still wired in RestClientOllamaClient's constructor. When RestClientOllamaClient is replaced by RestClientNlpClient using NlpProperties, the old OllamaProperties bean (@ConfigurationProperties("app.ollama")) must also be removed from the context or it will fail to bind at startup if app.ollama.* keys are absent from application.properties. The checklist covers removing compose env vars but does not mention removing the OllamaProperties Spring component registration itself.
OllamaExtraction is a package-private record (no public modifier). After renaming to NlpExtraction, verify visibility is consistent — if RestClientNlpClient lives in the same search package this is fine, but worth confirming so nothing breaks if packages are ever reorganised.
The healthCheckTimeoutSeconds field on OllamaProperties (currently 2s) will need an equivalent on NlpProperties. The issue spec lists baseUrl and timeoutSeconds for NlpProperties but omits healthCheckTimeoutSeconds. RestClientOllamaClient uses a separate RestClient instance with its own timeout for health checks. The new RestClientNlpClient will need the same split — otherwise a slow health check endpoint ties up 5s (the inference timeout) during startup probing.

Recommendations

State explicitly in the checklist: "Update NlQueryParserService.search(String query, Pageable) signature to search(String query, String lang, Pageable)."
Add to checklist: "Remove OllamaProperties @Component annotation / bean registration when RestClientOllamaClient is deleted — prevent startup binding failure."
Add healthCheckTimeoutSeconds (default 2) to the NlpProperties spec in the issue body.

## 🏛️ Markus Lehmann — Backend Architect ### Observations - **`NlQueryParserService` has `private final OllamaClient ollamaClient`** — when the interface is renamed to `NlParserClient`, the field name will also need updating. More importantly, `NlQueryParserService.search()` currently takes `(String query, Pageable pageable)`. Adding `lang` as a third parameter is straightforward, but the service signature change propagates to `NlSearchController` (call site) and to all test stubs. The spec calls out the client interface change but does not explicitly say the service method signature changes too — this needs to be stated. - **The existing `OllamaProperties` bean is still wired in `RestClientOllamaClient`'s constructor.** When `RestClientOllamaClient` is replaced by `RestClientNlpClient` using `NlpProperties`, the old `OllamaProperties` bean (`@ConfigurationProperties("app.ollama")`) must also be removed from the context or it will fail to bind at startup if `app.ollama.*` keys are absent from `application.properties`. The checklist covers removing compose env vars but does not mention removing the `OllamaProperties` Spring component registration itself. - **`OllamaExtraction` is a package-private record** (no `public` modifier). After renaming to `NlpExtraction`, verify visibility is consistent — if `RestClientNlpClient` lives in the same `search` package this is fine, but worth confirming so nothing breaks if packages are ever reorganised. - **The `healthCheckTimeoutSeconds` field on `OllamaProperties`** (currently 2s) will need an equivalent on `NlpProperties`. The issue spec lists `baseUrl` and `timeoutSeconds` for `NlpProperties` but omits `healthCheckTimeoutSeconds`. `RestClientOllamaClient` uses a *separate* `RestClient` instance with its own timeout for health checks. The new `RestClientNlpClient` will need the same split — otherwise a slow health check endpoint ties up 5s (the inference timeout) during startup probing. ### Recommendations - State explicitly in the checklist: "Update `NlQueryParserService.search(String query, Pageable)` signature to `search(String query, String lang, Pageable)`." - Add to checklist: "Remove `OllamaProperties` `@Component` annotation / bean registration when `RestClientOllamaClient` is deleted — prevent startup binding failure." - Add `healthCheckTimeoutSeconds` (default 2) to the `NlpProperties` spec in the issue body.

marcel commented

2026-06-07 14:27:52 +02:00

🔒 Nora Weiß — Security Engineer

Observations

main.py _load_persons_from_db() uses a blocking synchronous psycopg2 connection inside the FastAPI lifespan — this is fine for a startup load, but the DB credentials come from DATABASE_URL with no validation. If DATABASE_URL is malformed or points to the wrong host, psycopg2.connect() raises an exception that is not caught, which causes the lifespan to fail and the container to exit at startup. This is arguably the correct fail-fast behaviour, but it means a misconfigured DATABASE_URL (e.g. wrong password after a secret rotation) kills the NLP service entirely rather than starting in degraded mode. The issue says persons_loaded: 0 is a valid degraded state, but the code does not currently achieve this for DB connection errors — only for a missing DATABASE_URL env var. The lifespan should catch psycopg2.OperationalError and log a warning rather than crashing.
parse endpoint exception handler leaks internal detail: raise HTTPException(status_code=500, detail=str(exc)). In production, str(exc) from a Python exception can contain file paths, internal variable state, or DB connection strings if the exception propagates from psycopg2. The detail should be a generic string, not str(exc).
The Dockerfile COPY . . instruction copies everything including venv/, __pycache__/, .env files (if present), and test files into the image. A .dockerignore file is not present in the listed directory contents. Without it, local virtual environments, credentials files, and test corpora are bundled into the production image. This is a standard oversight but worth calling out explicitly — a minimal .dockerignore (venv/, __pycache__/, *.pyc, .env, test_*.py) should be added.

Recommendations

Wrap _load_persons_from_db() in a try/except psycopg2.OperationalError in the lifespan; on failure, log a warning and leave the matcher empty (degraded mode). This aligns the runtime behaviour with the documented persons_loaded: 0 degraded state.
Replace detail=str(exc) in the parse endpoint with detail="Internal extraction error" — never expose exception text in HTTP responses.
Add a .dockerignore to nlp-service/ before the image is built for production.

Open Decisions (omit if none)

Degraded-mode startup vs. fail-fast: if the decision is made (per Markus's round-1 comment) to require DB health via depends_on: db service_healthy, then the psycopg2.OperationalError catch-and-degrade is not needed — the container simply won't start until the DB is ready. The two approaches are mutually exclusive. Nora's recommendation above applies only if the degraded-mode path is kept.

## 🔒 Nora Weiß — Security Engineer ### Observations - **`main.py` `_load_persons_from_db()` uses a blocking synchronous `psycopg2` connection inside the FastAPI lifespan** — this is fine for a startup load, but the DB credentials come from `DATABASE_URL` with no validation. If `DATABASE_URL` is malformed or points to the wrong host, `psycopg2.connect()` raises an exception that is *not caught*, which causes the lifespan to fail and the container to exit at startup. This is arguably the correct fail-fast behaviour, but it means a misconfigured `DATABASE_URL` (e.g. wrong password after a secret rotation) kills the NLP service entirely rather than starting in degraded mode. The issue says `persons_loaded: 0` is a valid degraded state, but the code does not currently achieve this for DB *connection errors* — only for a missing `DATABASE_URL` env var. The lifespan should catch `psycopg2.OperationalError` and log a warning rather than crashing. - **`parse` endpoint exception handler leaks internal detail**: `raise HTTPException(status_code=500, detail=str(exc))`. In production, `str(exc)` from a Python exception can contain file paths, internal variable state, or DB connection strings if the exception propagates from psycopg2. The detail should be a generic string, not `str(exc)`. - **The Dockerfile `COPY . .` instruction copies everything including `venv/`, `__pycache__/`, `.env` files (if present), and test files into the image.** A `.dockerignore` file is not present in the listed directory contents. Without it, local virtual environments, credentials files, and test corpora are bundled into the production image. This is a standard oversight but worth calling out explicitly — a minimal `.dockerignore` (`venv/`, `__pycache__/`, `*.pyc`, `.env`, `test_*.py`) should be added. ### Recommendations - Wrap `_load_persons_from_db()` in a `try/except psycopg2.OperationalError` in the lifespan; on failure, log a warning and leave the matcher empty (degraded mode). This aligns the runtime behaviour with the documented `persons_loaded: 0` degraded state. - Replace `detail=str(exc)` in the parse endpoint with `detail="Internal extraction error"` — never expose exception text in HTTP responses. - Add a `.dockerignore` to `nlp-service/` before the image is built for production. ### Open Decisions _(omit if none)_ - **Degraded-mode startup vs. fail-fast**: if the decision is made (per Markus's round-1 comment) to require DB health via `depends_on: db service_healthy`, then the `psycopg2.OperationalError` catch-and-degrade is not needed — the container simply won't start until the DB is ready. The two approaches are mutually exclusive. Nora's recommendation above applies only if the degraded-mode path is kept.

marcel commented

2026-06-07 14:28:06 +02:00

🧪 Sara Müller — QA Engineer

Observations

test_main.py test_health asserts persons_loaded > 0 — this works when the test matcher is pre-seeded, but there is no test covering the persons_loaded: 0 case (DB not available at startup). The HEALTHCHECK in compose is specifically designed to detect this condition, yet the test suite never exercises the degraded /health response. A test test_health_degraded (setting _matcher to an empty PersonMatcher) would cover it.
test_main.py does not test the query length validation boundary. models.py currently has no min_length/max_length on ParseRequest.query (this is listed as a checklist item to add). Once added, the test file needs a corresponding test_parse_empty_query_returns_422 and test_parse_oversized_query_returns_422. There is no test for these cases today and no checklist item to add them.
The NlSearchControllerTest happy-path test (#1) currently sends {"query":"Briefe von Walter im Krieg"} with no lang field. After NlSearchRequest gains a lang field (even with a default), this test should be updated to explicitly include "lang":"de" in at least one case to verify the field is accepted and forwarded correctly. A test without lang only validates the default, not the pass-through.
No test covers the multi-year date range extraction edge case in test_extractor.py: the "zwischen…und" path that sorts and selects only the first two years. If a query contains three year tokens like "zwischen 1914 und 1918 und 1920", the current code returns 1914–1918 (correct) but this is not an explicit test case.

Recommendations

Add test_health_degraded to test_main.py: pre-seed an empty PersonMatcher, call GET /health, assert persons_loaded == 0 and status == "ok".
Add query length boundary tests to test_main.py once the Field(min_length=3, max_length=500) constraint is wired.
Update NlSearchControllerTest test #1 to include "lang":"de" in the request body, and add a variant that sends no lang field to verify default behaviour is correct.
Add the three-year "zwischen" edge case to test_extractor.py.

## 🧪 Sara Müller — QA Engineer ### Observations - **`test_main.py` `test_health` asserts `persons_loaded > 0`** — this works when the test matcher is pre-seeded, but there is no test covering the `persons_loaded: 0` case (DB not available at startup). The HEALTHCHECK in compose is specifically designed to detect this condition, yet the test suite never exercises the degraded `/health` response. A test `test_health_degraded` (setting `_matcher` to an empty `PersonMatcher`) would cover it. - **`test_main.py` does not test the query length validation boundary.** `models.py` currently has no `min_length`/`max_length` on `ParseRequest.query` (this is listed as a checklist item to add). Once added, the test file needs a corresponding `test_parse_empty_query_returns_422` and `test_parse_oversized_query_returns_422`. There is no test for these cases today and no checklist item to add them. - **The `NlSearchControllerTest` happy-path test (#1)** currently sends `{"query":"Briefe von Walter im Krieg"}` with no `lang` field. After `NlSearchRequest` gains a `lang` field (even with a default), this test should be updated to explicitly include `"lang":"de"` in at least one case to verify the field is accepted and forwarded correctly. A test without `lang` only validates the default, not the pass-through. - **No test covers the multi-year date range extraction edge case in `test_extractor.py`**: the "zwischen…und" path that sorts and selects only the first two years. If a query contains three year tokens like `"zwischen 1914 und 1918 und 1920"`, the current code returns 1914–1918 (correct) but this is not an explicit test case. ### Recommendations - Add `test_health_degraded` to `test_main.py`: pre-seed an empty `PersonMatcher`, call `GET /health`, assert `persons_loaded == 0` and `status == "ok"`. - Add query length boundary tests to `test_main.py` once the `Field(min_length=3, max_length=500)` constraint is wired. - Update `NlSearchControllerTest` test #1 to include `"lang":"de"` in the request body, and add a variant that sends no `lang` field to verify default behaviour is correct. - Add the three-year "zwischen" edge case to `test_extractor.py`.

marcel commented

2026-06-07 14:28:20 +02:00

🎨 Leonie Fischer — UX Engineer

Observations

The NlSearchController currently passes only request.query() to the service — the lang field is discarded at the controller boundary. If the frontend passes the Paraglide locale as lang, but the value is silently ignored (because the controller hasn't been updated), the user sees no error but gets results as if lang were the default. This is a silent degradation that will be invisible during manual testing unless you specifically verify with a non-German query.
The interpretation panel shows rawQuery back to the user. After the migration, rawQuery comes from the Python service's ParseResponse.rawQuery, which echoes request.query verbatim. This is unchanged from Ollama behaviour — but it's worth confirming the frontend doesn't display this differently. If rawQuery is used as a display string ("You searched for: …"), mixed-language queries or queries with unusual casing (all-lowercase, as in the test battery) will look exactly as typed. No issue here, but confirms there's no normalisation needed on the Java side before it reaches the frontend.
The NlSearchRateLimiter error includes the username in the message: "NL search rate limit exceeded for user: " + userKey. This string is only used server-side in logs, not returned to the frontend (the DomainException code field carries SMART_SEARCH_RATE_LIMITED). The frontend error message for SMART_SEARCH_RATE_LIMITED is what users see — the issue body doesn't mention whether the existing i18n messages for this code are user-friendly or technical. Worth checking messages/de.json and messages/en.json to confirm the copy reads naturally at 20/min (it was presumably written for a 5/min regime where hitting it was exceptional).

Recommendations

Confirm the controller-to-service lang pass-through is in the checklist (it's implicit but not stated — Felix also raised this).
Do a quick content review of the SMART_SEARCH_RATE_LIMITED message in all three locale files now, before the rate limit is raised and users actually start hitting it. A message like "Zu viele Suchanfragen — bitte warte einen Moment" is more helpful than a technical string.
No new open decisions — this is a verification item only.

## 🎨 Leonie Fischer — UX Engineer ### Observations - **The `NlSearchController` currently passes only `request.query()` to the service** — the `lang` field is discarded at the controller boundary. If the frontend passes the Paraglide locale as `lang`, but the value is silently ignored (because the controller hasn't been updated), the user sees no error but gets results as if `lang` were the default. This is a silent degradation that will be invisible during manual testing unless you specifically verify with a non-German query. - **The interpretation panel shows `rawQuery` back to the user.** After the migration, `rawQuery` comes from the Python service's `ParseResponse.rawQuery`, which echoes `request.query` verbatim. This is unchanged from Ollama behaviour — but it's worth confirming the frontend doesn't display this differently. If `rawQuery` is used as a display string ("You searched for: …"), mixed-language queries or queries with unusual casing (all-lowercase, as in the test battery) will look exactly as typed. No issue here, but confirms there's no normalisation needed on the Java side before it reaches the frontend. - **The `NlSearchRateLimiter` error includes the username in the message**: `"NL search rate limit exceeded for user: " + userKey`. This string is only used server-side in logs, not returned to the frontend (the `DomainException` `code` field carries `SMART_SEARCH_RATE_LIMITED`). The frontend error message for `SMART_SEARCH_RATE_LIMITED` is what users see — the issue body doesn't mention whether the existing i18n messages for this code are user-friendly or technical. Worth checking `messages/de.json` and `messages/en.json` to confirm the copy reads naturally at 20/min (it was presumably written for a 5/min regime where hitting it was exceptional). ### Recommendations - Confirm the controller-to-service `lang` pass-through is in the checklist (it's implicit but not stated — Felix also raised this). - Do a quick content review of the `SMART_SEARCH_RATE_LIMITED` message in all three locale files now, before the rate limit is raised and users actually start hitting it. A message like "Zu viele Suchanfragen — bitte warte einen Moment" is more helpful than a technical string. - No new open decisions — this is a verification item only.

marcel commented

2026-06-07 14:28:36 +02:00

🚀 Tobias Huber — DevOps / Infrastructure

Observations

The Dockerfile HEALTHCHECK uses a simple curl -f http://localhost:8001/health || exit 1 — this checks HTTP 200 but does NOT inspect the body. The issue's compose HEALTHCHECK is correctly specified as grep -q '"persons_loaded":[^0]', but the Dockerfile-level check is weaker. In production, if the image is run standalone (e.g. during a canary deploy or a bare docker run for debugging), the container will report healthy even when persons_loaded: 0. The Dockerfile HEALTHCHECK should match the compose spec.
requirements.txt pins dateparser>=1.2,<2.0 — dateparser has historically had significant behaviour changes across minor versions (locale handling, ambiguous date resolution). A range pin like >=1.2,<2.0 means a pip install after a minor release bump could silently change date parsing behaviour. For a production service the extraction correctness depends on dateparser's behaviour; pin to an exact version (e.g. dateparser==1.2.0) or at minimum >=1.2,<1.3.
No restart: unless-stopped policy is specified in the compose service spec. The issue body gives the environment variables and mem_limit but doesn't mention a restart policy. For a stateless sidecar that fails fast on DB unavailability (startup crash), restart: unless-stopped or restart: on-failure is needed so compose recovers the container automatically if it crashes at startup before the DB is ready (even with depends_on: service_healthy, there is a race window if the DB becomes unhealthy after the check passes).
No image: or build: context: path is specified in the compose spec in the issue body. The checklist says "Add nlp-service" but doesn't specify whether it uses a pre-built image from a registry or build: ./nlp-service. For a self-hosted family archive project with no CI/CD registry, the build: directive is almost certainly the right choice — this should be explicit in the spec so the implementer doesn't have to decide during the PR.

Recommendations

Update the Dockerfile HEALTHCHECK to match the compose spec: CMD curl -sf http://localhost:8001/health | python3 -c "import sys,json; d=json.load(sys.stdin); sys.exit(0 if d.get('persons_loaded',0)>0 else 1)" || exit 1. (Or keep curl | grep if curl+grep is preferred — the point is body inspection, not just HTTP 200.)
Pin dateparser to an exact version in requirements.txt.
Add restart: unless-stopped to the nlp-service compose block.
Add build: ./nlp-service to the nlp-service compose block in the issue spec.

## 🚀 Tobias Huber — DevOps / Infrastructure ### Observations - **The Dockerfile `HEALTHCHECK` uses a simple `curl -f http://localhost:8001/health || exit 1`** — this checks HTTP 200 but does NOT inspect the body. The issue's compose HEALTHCHECK is correctly specified as `grep -q '"persons_loaded":[^0]'`, but the Dockerfile-level check is weaker. In production, if the image is run standalone (e.g. during a canary deploy or a bare `docker run` for debugging), the container will report healthy even when `persons_loaded: 0`. The Dockerfile HEALTHCHECK should match the compose spec. - **`requirements.txt` pins `dateparser>=1.2,<2.0`** — `dateparser` has historically had significant behaviour changes across minor versions (locale handling, ambiguous date resolution). A range pin like `>=1.2,<2.0` means a `pip install` after a minor release bump could silently change date parsing behaviour. For a production service the extraction correctness depends on `dateparser`'s behaviour; pin to an exact version (e.g. `dateparser==1.2.0`) or at minimum `>=1.2,<1.3`. - **No `restart: unless-stopped` policy is specified in the compose service spec.** The issue body gives the environment variables and `mem_limit` but doesn't mention a restart policy. For a stateless sidecar that fails fast on DB unavailability (startup crash), `restart: unless-stopped` or `restart: on-failure` is needed so compose recovers the container automatically if it crashes at startup before the DB is ready (even with `depends_on: service_healthy`, there is a race window if the DB becomes unhealthy after the check passes). - **No `image:` or `build: context:` path is specified in the compose spec in the issue body.** The checklist says "Add `nlp-service`" but doesn't specify whether it uses a pre-built image from a registry or `build: ./nlp-service`. For a self-hosted family archive project with no CI/CD registry, the `build:` directive is almost certainly the right choice — this should be explicit in the spec so the implementer doesn't have to decide during the PR. ### Recommendations - Update the Dockerfile `HEALTHCHECK` to match the compose spec: `CMD curl -sf http://localhost:8001/health | python3 -c "import sys,json; d=json.load(sys.stdin); sys.exit(0 if d.get('persons_loaded',0)>0 else 1)" || exit 1`. (Or keep `curl | grep` if curl+grep is preferred — the point is body inspection, not just HTTP 200.) - Pin `dateparser` to an exact version in `requirements.txt`. - Add `restart: unless-stopped` to the nlp-service compose block. - Add `build: ./nlp-service` to the nlp-service compose block in the issue spec.

marcel commented

2026-06-07 14:28:54 +02:00

🗳️ Decision Queue — Round 2 Addendum

1 new decision raised in this review round.

Startup behaviour: fail-fast vs. degraded-mode

The round-1 review (Markus) flagged the contradiction between depends_on: db service_healthy and persons_loaded: 0 as a valid degraded state. Round 2 (Nora) identified a concrete code gap: main.py lifespan does NOT catch psycopg2.OperationalError, so a DB connection error at startup crashes the container rather than starting in degraded mode.

Two consistent options:

(a) Fail-fast (simpler): Keep depends_on: db service_healthy in compose. Remove any degraded-mode documentation. On DB error the container exits and compose restarts it. persons_loaded: 0 is then only a transient state during the startup window, not a steady state. No code change needed in main.py.

(b) Graceful degraded mode: Remove depends_on: db service_healthy (or keep it as a soft hint). Wrap psycopg2.connect() in try/except psycopg2.OperationalError and start with empty matcher. Service responds to queries with dates/keywords but no person matching. persons_loaded: 0 is a documented production state. Requires a code change in main.py lifespan.

These are mutually exclusive. The spec currently documents both without choosing.

Raised by: Nora (round 2)

## 🗳️ Decision Queue — Round 2 Addendum _1 new decision raised in this review round._ ### Startup behaviour: fail-fast vs. degraded-mode The round-1 review (Markus) flagged the contradiction between `depends_on: db service_healthy` and `persons_loaded: 0` as a valid degraded state. Round 2 (Nora) identified a concrete code gap: `main.py` lifespan does NOT catch `psycopg2.OperationalError`, so a DB connection error at startup crashes the container rather than starting in degraded mode. Two consistent options: **(a) Fail-fast (simpler):** Keep `depends_on: db service_healthy` in compose. Remove any degraded-mode documentation. On DB error the container exits and compose restarts it. `persons_loaded: 0` is then only a transient state during the startup window, not a steady state. No code change needed in `main.py`. **(b) Graceful degraded mode:** Remove `depends_on: db service_healthy` (or keep it as a soft hint). Wrap `psycopg2.connect()` in `try/except psycopg2.OperationalError` and start with empty matcher. Service responds to queries with dates/keywords but no person matching. `persons_loaded: 0` is a documented production state. Requires a code change in `main.py` lifespan. These are mutually exclusive. The spec currently documents both without choosing. _Raised by: Nora (round 2)_

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: marcel/familienarchiv#770

feat(search): replace Ollama inference with lightweight DB-backed NLP service #770

Problem

Prototype: What Was Built

Extraction pipeline

Approach Tried First: spaCy NER — Why It Was Abandoned

1. Historical names not in training data

2. Cross-language failure

3. Span merging across prepositions

4. False positive compound nouns

5. DATE span unreliability

6. Image size

The Winning Approach: PersonMatcher

DB data quality filter

Test Battery — Full Results

German — full sentences

German — medium

German — topic only (no persons)

German — date ranges

English & Spanish

Edge cases — typos and lazy queries

Caveats and Known Limitations

Single bare names without prepositions

"nach" dual-role ambiguity

Fuzzy threshold false positives at 80

DB data quality dependency

Single-language preposition sets

Person names echoed verbatim (not resolved)

No reload on DB changes

Bare-name UX and the interpretation panel

Service Spec (for Java integration)

Endpoint

Response (matches NlpExtraction contract)

Health

Environment

Production Integration Work

Python service changes

Java changes

docker-compose changes

Acceptance criteria

Technical Debt Created

Prototype Code Location

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

Recommendations

Open Decisions

🏛️ Markus Lehmann — Backend Architect

Observations

Recommendations

🔒 Nora Weiß — Security Engineer

Observations

Recommendations

🧪 Sara Müller — QA Engineer

Observations

Recommendations

🎨 Leonie Fischer — UX Engineer

Observations

Recommendations

Open Decisions

🚀 Tobias Huber — DevOps / Infrastructure

Observations

Recommendations

🗳️ Decision Queue — Action Required

API Design

Infrastructure / Product

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

Recommendations

🏛️ Markus Lehmann — Backend Architect

Observations

Recommendations

🔒 Nora Weiß — Security Engineer

Observations

Recommendations

Open Decisions (omit if none)

🧪 Sara Müller — QA Engineer

Observations

Recommendations

🎨 Leonie Fischer — UX Engineer

Observations

Recommendations

Response (matches `NlpExtraction` contract)