feat(search): replace Ollama inference with lightweight DB-backed NLP service #770
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Problem
The current NL search uses Ollama (
qwen2.5:7b-instruct-q4_K_M) to parse free-text queries into structured extractions. Inference takes 5–15 seconds per request, making the feature effectively unusable compared to filling in the filter form manually. The inference server also requires a GPU-capable host (or slow CPU fallback) and adds operational complexity.A prototype (
nlp-service/) has been built and evaluated against live DB data. This issue captures the full findings and specifies the production integration work.Prototype: What Was Built
Branch:
worktree-feat+nlp-service· Worktree:nlp-service/A standalone FastAPI service (Python 3.11) that parses free-text queries into the same
OllamaExtractioncontract the Java backend already consumes. No ML inference — response time is <100ms.Extraction pipeline
Step 1 — Person extraction (DB-backed fuzzy matching)
Loads all
(first_name, last_name)rows from thepersonstable at startup (1,728 records after filtering). At query time, scans for tokens following person-indicator prepositions and fuzzy-matches them against the loaded name index usingrapidfuzz.token_sort_ratio(threshold 80, configurable viaNLP_FUZZY_THRESHOLDenv var).Step 2 — Role detection
Single-person query: the anchoring preposition determines
sender/receiver.Multi-person query: always returns
"any"— Java mapspersonNames[0]→ sender,[1]→ receiver by left-to-right order.Step 3 — Date extraction
Pure regex:
\b\d{4}\bfinds year tokens. Direction words in the two tokens immediately before the year determine the range:vorbeforeantesnachafterdespués/despueszwischen…undbetween…andentre…yStep 4 — Keyword extraction
Removes matched person spans and year tokens from the query text, then applies a stopword filter (de/en/es) and emits content words ≥ 3 characters.
Approach Tried First: spaCy NER — Why It Was Abandoned
The original spec called for spaCy NER (
de_core_news_sm,en_core_web_sm,es_core_news_sm). After full evaluation against real archive queries, spaCy was dropped. Failures documented:1. Historical names not in training data
spaCy models are trained on modern news corpora. 1890–1950 German family names score below the PER threshold or are not tagged at all.
"Briefe von Eugenie"→personNames: [](first-name-only, not recognised)"Kriegsbriefe von Herbert"→keywords: ["herbert"](classified as a noun, not a person)2. Cross-language failure
The English model has no knowledge of German names. Running
en_core_web_smon a query with German names returns emptypersonNames."Letters from Clara Cram to Walter de Gruyter in 1920"→personNames: []3. Span merging across prepositions
spaCy NER greedily merges multi-token spans across prepositions when it thinks they form a single entity.
"Briefe von Herbert an Eugenie de Gruyter nach 1914"→ one PER span["Herbert an Eugenie de Gruyter"]instead of two4. False positive compound nouns
The German model tags long compound nouns as PER entities.
"Geburtstagsglückwünsche"→personNames: ["Geburtstagsglückwünsche"]5. DATE span unreliability
"Letters between 1914 and 1918"in English did not reliably produce two DATE spans; dateparser fallback produced inconsistent results.6. Image size
Three baked spaCy models add ~300 MB to the Docker image (~350 MB total). The DB-backed approach produces an image of ~50 MB.
The Winning Approach: PersonMatcher
PersonMatcher(inperson_matcher.py) loads the persons table at startup, builds a lowercase name index (full name, first name, last name as separate variants), and at query time runs two passes:Pass 1 — preposition-anchored: For each person-indicator preposition found in the token list, collect up to 3 consecutive non-stop tokens after it, fuzzy-match longest-first against the index. Stop tokens = prepositions + date-direction words + German function words (
im,am,seine,ihre,dem,den,über,und,oder, …).Pass 2 — full-name exact scan: Scan positions not consumed by Pass 1 for exact 2- or 3-word matches against the name index. This handles queries without prepositions (e.g.
"Clara Cram Herbert Cram 1920").DB data quality filter
290 of the 2,018 DB rows are annotations or descriptions stored as person records (e.g.
first_name="an seine",first_name="Eltern in",first_name="Todesanzeige für Paul Kesten mit…"). These are skipped at load time: any record whosefirst_nametokens contain known prepositions or possessives (an,in,aus,für,von,sein,seine,ihr,ihre, …) is excluded via_NON_NAME_TOKENSinPersonMatcher.load(). This is a stopgap — see "Technical Debt" below.Test Battery — Full Results
Evaluated against the live DB (1,728 persons loaded). Expected fields shown;
personRoleshown only when not"any".German — full sentences
Briefe von Clara Cram an Walter de Gruyter im Jahr 1920["Clara Cram", "Walter de Gruyter"]Briefe von Herbert an Eugenie de Gruyter nach 1914["Herbert", "Eugenie de Gruyter"]Schreiben von Albert de Gruyter an seine Kinder vor 1900["Albert de Gruyter"]Briefe von Juan Cram an Marie zwischen 1915 und 1918["Juan Cram", "Marie"]Telegramm von Walter de Gruyter an Clara im Jahr 1930["Walter de Gruyter", "Clara"]Briefe von Else Bohrmann an Herbert Cram nach 1939["Else Bohrmann", "Herbert Cram"]German — medium
Briefe von Clara Cram vor 1910["Clara Cram"]Briefe an Herbert Cram nach dem Krieg["Herbert Cram"]Schriften von Eugenie de Gruyter im Jahr 1905["Eugenie de Gruyter"]Briefe an Walter de Gruyter["Walter de Gruyter"]German — topic only (no persons)
Briefe aus dem Krieg["briefe", "krieg"]Kriegspost["kriegspost"]Geburtstagsglückwünsche["geburtstagsglückwünsche"]Kondolenzbriefe nach dem Tod von Eugenie["kondolenzbriefe", "tod"]+personNames: ["Eugenie"]German — date ranges
Dokumente zwischen 1914 und 1918Briefe vor 1900Schriften nach 19201918English & Spanish
Letters from Clara Cram to Walter de Gruyter in 1920["Clara Cram", "Walter de Gruyter"]Letters to Herbert Cram after 1939["Herbert Cram"]receiverBirthday greetings from Anita Wöhler["Anita Wöhler"]senderLetters between 1914 and 1918[]Cartas de Clara Cram a Walter de Gruyter en 1920["Clara Cram", "Walter de Gruyter"]Cartas antes de 1900[]Cartas de Juan Cram a sus hijos entre 1915 y 1920["Juan Cram"]senderEdge cases — typos and lazy queries
Briefe von EugeniepersonNames: ["Eugenie"]sender ✅Kriegsbriefe von HerbertpersonNames: ["Herbert"]sender ✅Briefe von Klara Kram an HerbertpersonNames: ["Klara Kram", "Herbert"](fuzzy match) ✅briefe von clara cram an herbert 1920(all lowercase)personNames: ["clara cram", "herbert"]✅von ClarapersonNames: ["Clara"]sender ✅an WalterpersonNames: ["Walter"]receiver ✅Wer hat an Herbert Cram 1918 geschrieben?personNames: ["Herbert Cram"]receiver, 1918 ✅Clara Cram Herbert Cram 1920(no prepositions)personNames: ["Clara Cram", "Herbert Cram"]via Pass 2 ✅Briefe von Herrbert Cram(double-r typo)personNames: ["Herrbert Cram"](fuzzy match, ≥80) ✅Briefe von Clara nach HerbertpersonNames: ["Clara", "Herbert"]✅Eugenie(bare name, no preposition)personNames: []⚠️ by design — see caveatsHerbert(bare name, no preposition)personNames: []⚠️ by design — see caveatsReise nach Mexiko im Jahr 1925personNames: [], keywords:["reise", "mexiko"]✅Schreiben von Albert de Gruyter an seine Kinder vor 1900personNames: ["Albert de Gruyter"]("seine Kinder" correctly suppressed) ✅Caveats and Known Limitations
Single bare names without prepositions
"Eugenie"and"Herbert"typed alone returnpersonNames: []. Without a preposition anchor there is no reliable way to distinguish a person name from a keyword. ThenoMatchFragmentspath inNlQueryParserService.buildText()feeds the name into full-text search, so documents mentioning the name will still surface — but the interpretation panel will show no person chip. Known UX gap: users may not understand why their name search returned text-matched documents instead of person-filtered ones. A future "Did you mean?" hint (using the ambiguous-persons panel) is the intended fix."nach" dual-role ambiguity
nachis both a German receiver preposition ("an/nach Herbert") and the date-after direction word ("nach 1914"). Person extraction and date extraction are separate passes and both consume the token correctly. The ambiguity only becomes an issue if a person name immediately precedes a year:"Briefe nach Marie 1920"— the direction-before-year logic detects that the word immediately before the year is a person token and falls back to a bare year (correct behaviour, covered by tests).Fuzzy threshold false positives at 80
The threshold is configurable via
NLP_FUZZY_THRESHOLDenv var (default 80). If false positives re-emerge after DB imports, raise to 85 and re-run the battery — no code change or redeploy required.DB data quality dependency
290 dirty records (descriptions stored as first_name) are filtered at load time via
_NON_NAME_TOKENS. A future DB cleanup would make this filter unnecessary. See "Technical Debt" below.Single-language preposition sets
The preposition lists are fixed enumerations (12–15 tokens across 3 languages). Unconventional phrasing without a listed preposition returns
personRole: "any"— same behaviour as Ollama. Java searches both sender and receiver positions in this case.Person names echoed verbatim (not resolved)
personNamescontains query text as matched, not the canonical DB name. Intentional —NlQueryParserService.resolveNames()already does its own fuzzy lookup. Duplicating this in Python would violate the single-responsibility principle.No reload on DB changes
The name index is loaded once at startup. New persons added to the DB are invisible until restart. A
/reloadendpoint or scheduled reload can be added if needed.Bare-name UX and the interpretation panel
When a query like
"Eugenie"producespersonNames: [], the interpretation panel shows no resolved person. The Java side sends "Eugenie" into full-text search vianoMatchFragments, which may return relevant results — but silently. This is a product gap to address separately, not a blocker for this integration.Service Spec (for Java integration)
Endpoint
langis required. The frontend passes the active Paraglide locale (de/en/es) from the session.NlSearchRequestmust gain alangfield;NlParserClient.parse(String query, String lang)must accept both parameters.Response (matches
NlpExtractioncontract)Health
persons_loaded: 0indicates the service started before the DB was ready (degraded: dates and keywords still work, person matching disabled). The composeHEALTHCHECKshould checkpersons_loaded > 0.Environment
DATABASE_URLpostgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}NLP_FUZZY_THRESHOLD80Security note:
nlp-servicehas no auth layer. It must NOT have aports:mapping in docker-compose — it is reachable only athttp://nlp-service:8001from within the Docker network. Never logquerycontent in production; log only aggregate metadata (persons_found=N,lang=de).Production Integration Work
The following is needed to replace Ollama in production. None of this is in the current branch.
Python service changes
query: str = Field(..., min_length=3, max_length=500)toParseRequestinmodels.py— defence-in-depth, independent of Java callerNLP_FUZZY_THRESHOLDenv var wiring inPersonMatcher(already noted as configurable — wire it up)Java changes
langtoNlSearchRequest—String langwith default"de"; frontend passes Paraglide localeOllamaClient→NlParserClientandOllamaExtraction→NlpExtraction— do this in the same PR; stale names cause ongoing confusion. Not cosmetic.RestClientNlpClientimplementingNlParserClient— POSTs{ query, lang }tohttp://nlp-service:8001/parseNlpProperties(app.nlp.base-url,app.nlp.timeout-seconds = 5) — do not reuseOllamaProperties; the 30s Ollama timeout is wrong for a sub-second serviceNlParserHealthClient—RestClientNlpClientmust also implement the health interface (currentlyOllamaHealthClient), callingGET /healthon the NLP serviceNlQueryParserServicecall site:nlParserClient.parse(request.query(), request.lang())SMART_SEARCH_UNAVAILABLEerror code: already fires onRestClientException; ensure the new client maps 5xx and connection refused to the sameDomainException.serviceUnavailableNlSearchRateLimitPropertiesfrom 5 → 20 req/min (Ollama-era value is too restrictive for <100ms responses)@WebMvcTestorRestClientTestthat stubs the NLP service at the HTTP level and asserts the fullNlSearchController → NlQueryParserService → RestClientNlpClientround-tripdocker-compose changes
nlp-service:mem_limit: 256m,depends_on: dbwithcondition: service_healthy, noports:mapping (internal network only),DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB},NLP_FUZZY_THRESHOLD: ${NLP_FUZZY_THRESHOLD:-80}CMD curl -sf http://localhost:8001/health | grep -q '"persons_loaded":[^0]' || exit 1ollamaandollama-model-initservices — these consume ~8 GB RAM and 4.7 GB model storage; removing them is the primary operational win of this migrationAPP_OLLAMA_BASE_URL,APP_OLLAMA_MODEL,APP_OLLAMA_API_KEYfrom the backend service environment blockAPP_NLP_BASE_URL: http://nlp-service:8001to the backend environmentAcceptance criteria
POST /api/search/nlwith{ "query": "Briefe von Clara Cram vor 1910", "lang": "de" }returnsinterpretation.resolvedPersonscontaining Clara Cram,interpretation.dateFrom: null,interpretation.dateTo: "1910-12-31"code: SMART_SEARCH_UNAVAILABLENlQueryParserServiceunit tests pass with a stubbedNlParserClientdocker statsshows nlp-service under 256 MB RSSollamaandollama-model-initcontainers are gone fromdocker-compose psTechnical Debt Created
first_namecontaining prepositions/phrases): 290 records are filtered at NLP service load time. The correct fix is a DB migration to remove or re-structure these records. File a separate cleanup issue targeting the import pipeline that created them._NON_NAME_TOKENSinPersonMatcher: valid stopgap; can be removed once DB is clean.Prototype Code Location
Test suite:
pytest -v— runs without a DB connection (fixture pre-seeds PersonMatcher).Last commit on branch:
fix(nlp-service): eliminate false-positive person matches from dirty DB records👨💻 Felix Brandt — Senior Fullstack Developer
Observations
langparameter is missing from the Java integration spec.NlQueryParserServicecallsollamaClient.parse(query)— the current interface isOllamaClient.parse(String query). The new service requires{ "query": "...", "lang": "de|en|es" }. The interface must change toparse(String query, String lang)(or take a request object) but this is not mentioned anywhere in the integration checklist. This is the largest gap in the spec.NlSearchRequestonly hasquery— nolangfield. Where does the language come from? This needs to be resolved before anyone writesRestClientNlpClient.RestClientNlpClientbut makes no mention of updating theOllamaClientinterface signature or theNlQueryParserServicecall site. The actual scope of the Java work is underspecified.OllamaHealthClientis a separate interface thatRestClientOllamaClientalso implements. The issue doesn't address whetherRestClientNlpClientneeds to implement health checking too. The health controller (if there is one) will break or silently skip health checks.OllamaExtractiontoNlpExtractionis listed as "cosmetic / optional." It isn't — every time someone reads the code the name sends the wrong mental model. Recommend doing it in the same PR.@SpringBootTestorRestClientTestthat stubs the NLP service and asserts the full round-trip.Recommendations
langtoNlSearchRequest(simplest path:String langwith default"de"). UpdateOllamaClient.parse(String query, String lang). This forces the call site to be explicit.NlpHealthClientinterface is needed — look for any@ConditionalOnPropertyor health-check endpoint that callsOllamaHealthClient.isHealthy()and decide now, not during implementation.OllamaExtraction→NlpExtraction,OllamaClient→NlParserClient) in the same PR. It's a search-and-replace, not a rewrite.Open Decisions
langsource: three options: (a) add toNlSearchRequest— explicit, user selects; (b) derive from the authenticated user's profile language — transparent, no API change; (c) hard-default to"de"(archive is primarily German) and adden/eslater. Option (a) is cleanest but changes the API contract. Option (c) is the fastest path to production but silently ignores EN/ES queries. Needs a product call.🏛️ Markus Lehmann — Backend Architect
Observations
OllamaClientinterface is the right seam.NlQueryParserServicedepends onOllamaClient(interface), so dropping inRestClientNlpClientis clean — no business logic changes needed, just the adapter. This is the correct architectural pattern and is already in place.OllamaProperties.timeoutSeconds = 30. The existingRestClientNlpClientwill inherit this config key or need a new one. A Python service responding in <100ms should have a 3–5 second timeout max, not 30. If not changed, a hung nlp-service will block a thread for 30s per request.persons_loaded: 0is valid (service starts without DB, person matching disabled). But then says "adddepends_on: dbwithcondition: service_healthy". These are contradictory: if the service is designed to start without DB, don't require DB health for startup — just log a warning and serve degraded. If you require DB health, thenpersons_loaded: 0is never a normal production state.NlQueryParserServicehas a hardcodedMAX_RESOLVED_PERSONS = 2(implicit inresolveNames). PersonMatcher also returns at most 2 meaningful persons (first is sender, second is receiver). These need to stay in sync. Not a blocker but should be noted in the spec._NON_NAME_TOKENSfilter is a service-layer data concern. It corrects for dirty data in the DB. The right long-term home for this logic is either thePersonimporter/validator or a DB migration, not a runtime filter in the NLP service. Noting this so it doesn't get left as permanent technical debt.Recommendations
NlpProperties(app.nlp.*) alongside existingOllamaProperties— at minimumbaseUrlandtimeoutSeconds(default 5). Don't reuseOllamaPropertiesas that couples two distinct integrations.depends_onmatch the decision. Don't do both.first_namecontaining prepositions). The_NON_NAME_TOKENSfilter is correct as a stopgap but should not be permanent.🔒 Nora Weiß — Security Engineer
Observations
docker-compose.ymlcurrently exposes OCR service only on the internal network. The issue must explicitly say:nlp-servicegets NOports:entry in docker-compose — it is reachable only athttp://nlp-service:8001from other containers. If aports:mapping is added (even for dev convenience), it becomes an unauthenticated HTTP endpoint exposed on the host. The Python service has no auth layer whatsoever.models.pyvalidateslang(Literal enum) andqueryis an unvalidatedstr. The Java layer enforces 3–500 chars, but if the service is ever called directly (container network, internal tooling, curl for debugging), arbitrarily large inputs go straight to the extraction pipeline.PersonMatcher.find_in_querysplits on whitespace — a 500KB payload with no spaces would produce one giant token and runprocess.extractOneagainst 1,728 names. That's a real resource spike.DATABASE_URLcontains credentials. The issue says "set DATABASE_URL from the existing compose env." The existing.env/docker-compose.ymlpattern stores these as env vars (${POSTGRES_PASSWORD}etc.). The spec should be explicit: reference${DATABASE_URL}or construct from existingPOSTGRES_*variables — do not hardcode credentials in compose.main.pylifespan andextractor.pydon't log queries, which is correct. The spec should note: never logquerycontent in production — only log aggregate metadata (e.g.persons_found=2,lang=de).Recommendations
query: str = Field(..., min_length=3, max_length=500)toParseRequestinmodels.py. This is a one-liner and provides defence-in-depth independent of the Java caller.ports:mapping for nlp-service — internal network only."DATABASE_URL: ${DATABASE_URL}(or constructed equivalent) to the compose spec — explicitly, not as prose.🧪 Sara Müller — QA Engineer
Observations
SMART_SEARCH_RATE_LIMITEDerrors that didn't exist before, and users won't understand why. This isn't a caveat — it's a regression in usability that needs a deliberate decision.SMART_SEARCH_UNAVAILABLEpersons_loaded: 0(started before DB was ready) → what does the Java search return? Documents only (no person filter) — is this surfaced to the user?fuzzy thresholdis hardcoded at 80. The issue mentions "raise to 85 if false positives re-emerge" as prose. This needs to be a configurable env var (NLP_FUZZY_THRESHOLD), not a code change requiring a deploy.NlQueryParserService—noMatchFragmentsare passed intobuildText()which feeds the full-text search. So "Eugenie" (no preposition match) becomes a full-text keyword, which may surface results. But the interpretation panel will showresolvedPersons: []with no explanation. The user may not understand why their name search returned text-matched documents instead of person-filtered ones.Recommendations
POST /api/search/nlwithquery: "Briefe von Clara Cram vor 1910"returnsinterpretation.resolvedPersonscontaining Clara Cram andinterpretation.dateFromnull,dateTo: 1910-12-31code: SMART_SEARCH_UNAVAILABLENlQueryParserServicestill pass with a stubbedNlParserClientNLP_FUZZY_THRESHOLDas a documented env var with default 80.🎨 Leonie Fischer — UX Engineer
Observations
SMART_SEARCH_RATE_LIMITEDat 5/min will feel arbitrary and frustrating at this speed. A user who types a query, gets results, then refines it 5 times in 60 seconds hits a wall with no obvious explanation. The error message in the frontend should say something like "Bitte warte kurz" rather than a generic error.NlQueryInterpretationincludesresolvedPersons,ambiguousPersons,resolvedTags,keywordsApplied,tagsApplied. With fast responses, users will actually read this. The "bare name" case (Eugenie→personNames: [], falls through to text search) will look like the search didn't understand them — no person chip in the interpretation panel. The existing suggestion mechanism (noMatchFragments) sends the name into full-text search silently. This should either surface a "Did you mean: Eugenie de Gruyter?" hint or be documented as known UX gap.langfeedback in the UI. Iflangis derived automatically (session language or default "de"), users typing English queries into a German-configured archive get silent degraded results. The issue doesn't address whether the UI exposes language selection.Recommendations
SearchBar/ NL search component — if there's a minimum spinner duration or debounce delay tuned for Ollama latency, update it.SMART_SEARCH_RATE_LIMITEDerror message inen.json/de.json/es.jsonreads as user-friendly, not as a technical error.Open Decisions
🚀 Tobias Huber — DevOps / Infrastructure
Observations
mem_limit: 8g) andollama-model-initdownloads a 4.7 GB model. These two services must be removed (or commented out for rollback). Not removing them means running both Ollama and the nlp-service simultaneously with no benefit. The checklist should include: removeollama,ollama-model-init, and theAPP_OLLAMA_*env vars from the backend service.mem_limit: 12g(Surya/Kraken models). The NLP service carries only rapidfuzz + dateparser + the name index in memory —mem_limit: 256mshould be sufficient. Without a limit, compose will assign no cap and the container can grow unbounded if there is a memory leak.OllamaProperties.timeoutSeconds = 30drives a 30-second read timeout on the inference HTTP client.RestClientNlpClientmust use a newNlpPropertiesbean withtimeoutSeconds = 5(or even 3). This is a production concern: a hung nlp-service ties up a thread for 30 seconds per pending request under the current config.GET /health → { "status": "ok", "persons_loaded": 1728 }. The composeHEALTHCHECKshould checkpersons_loaded > 0to detect degraded startup (DB was unavailable). Currently a standardcurl -f /healthreturns 200 even ifpersons_loaded: 0, which means the container reports healthy when person-matching is completely non-functional.persons_loadedgauge. For a production service, at minimum expose a/metricsendpoint (FastAPI +prometheus-fastapi-instrumentatoris a one-liner). Not a blocker for the first iteration but should be in the backlog.DATABASE_URLformat. The compose env file usesPOSTGRES_USER,POSTGRES_PASSWORD,POSTGRES_DBseparately. The nlp-service needs a full DSN. Document whether to construct it in compose (postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}) or addNLP_DATABASE_URLas a separate variable.Recommendations
ollama,ollama-model-init; removeAPP_OLLAMA_BASE_URLandAPP_OLLAMA_MODELfrom backend env.mem_limit: 256mto the nlp-service compose spec.app.nlp.timeout-seconds = 5property (not the Ollama 30s).HEALTHCHECKto:CMD curl -sf http://localhost:8001/health | grep -q '"persons_loaded":[^0]' || exit 1— fails if persons index is empty.🗳️ Decision Queue — Action Required
2 decisions need your input before implementation starts.
API Design
langparameter source — The NLP service requires alangfield (de/en/es) but the currentNlSearchRequestonly hasquery, andOllamaClient.parse(String query)has no language parameter. Three options:langtoNlSearchRequest— user or frontend selects language explicitly. Cleanest, most flexible. Changes the API contract and requires a frontend update."de"— fastest path to production, works for the primary use case. EN/ES support is latent but can be activated later with option (a). Silently returns degraded results for non-German queries.Raised by: Felix
Infrastructure / Product
Rate limit: 5/min → ? — The current limit was sized for 5–15s Ollama inference. With sub-second NLP parsing, it will fire on normal user behaviour (type query, refine, refine again). Two options:
Raised by: Sara, Leonie
👨💻 Felix Brandt — Senior Fullstack Developer
Observations
NlSearchController.search()passes onlyrequest.query()toNlQueryParserService.search()— thelangfield added toNlSearchRequestwill be silently dropped unless the controller call site is updated:nlQueryParserService.search(request.query(), request.lang(), pageable). The checklist item "UpdateNlQueryParserServicecall site" exists on the Python/Java rename side but the controller-to-service wiring gap is not explicitly called out. Easy to miss during implementation.NlQueryParserServiceTeststubsOllamaClient.parse(String query)withanyString()and constructsNlQueryParserServicedirectly via constructor. When the interface is renamed toNlParserClientand the signature changes toparse(String query, String lang), every singlewhen(ollamaClient.parse(anyString()))stub in the 35-test class breaks withPotentialStubbingProblem. This is a non-trivial mechanical refactor — not a one-liner — and should be called out explicitly in the checklist so it doesn't land as a surprise mid-PR.RestClientOllamaClientTestwill becomeRestClientNlpClientTest. The WireMock stubs point at/api/generate(Ollama-specific path). The new client posts to/parse. Three stubs need updating and the response shape changes from the Ollama envelope ({"response":"...","done":true}) to the flat NLP service JSON. The checklist mentions "Java test" as a single item but says nothing about migrating or replacing the existing WireMock test — this could cause confusion about whether the old test is kept, adapted, or deleted.NlSearchControllerTesttest #7 hardcodes the 5-req/min limit (for (int i = 0; i < 5; i++)). After the rate limit is raised to 20/min, this test continues to pass (it only does 5 iterations) but the assertion no longer validates the correct limit. The test should be updated to use the configured value or at least document why it only goes to 5.Recommendations
NlSearchController.search()to passrequest.lang()to service."NlQueryParserServiceTeststubs fromparse(anyString())toparse(anyString(), anyString())."RestClientOllamaClientTest— WireMock stubs target/api/generate; new client uses/parsewith a flat response shape."NlSearchControllerTest#7, either useprops.getMaxRequestsPerMinute()as the loop bound or add a comment marking it as a known limit-tied assertion.🏛️ Markus Lehmann — Backend Architect
Observations
NlQueryParserServicehasprivate final OllamaClient ollamaClient— when the interface is renamed toNlParserClient, the field name will also need updating. More importantly,NlQueryParserService.search()currently takes(String query, Pageable pageable). Addinglangas a third parameter is straightforward, but the service signature change propagates toNlSearchController(call site) and to all test stubs. The spec calls out the client interface change but does not explicitly say the service method signature changes too — this needs to be stated.The existing
OllamaPropertiesbean is still wired inRestClientOllamaClient's constructor. WhenRestClientOllamaClientis replaced byRestClientNlpClientusingNlpProperties, the oldOllamaPropertiesbean (@ConfigurationProperties("app.ollama")) must also be removed from the context or it will fail to bind at startup ifapp.ollama.*keys are absent fromapplication.properties. The checklist covers removing compose env vars but does not mention removing theOllamaPropertiesSpring component registration itself.OllamaExtractionis a package-private record (nopublicmodifier). After renaming toNlpExtraction, verify visibility is consistent — ifRestClientNlpClientlives in the samesearchpackage this is fine, but worth confirming so nothing breaks if packages are ever reorganised.The
healthCheckTimeoutSecondsfield onOllamaProperties(currently 2s) will need an equivalent onNlpProperties. The issue spec listsbaseUrlandtimeoutSecondsforNlpPropertiesbut omitshealthCheckTimeoutSeconds.RestClientOllamaClientuses a separateRestClientinstance with its own timeout for health checks. The newRestClientNlpClientwill need the same split — otherwise a slow health check endpoint ties up 5s (the inference timeout) during startup probing.Recommendations
NlQueryParserService.search(String query, Pageable)signature tosearch(String query, String lang, Pageable)."OllamaProperties@Componentannotation / bean registration whenRestClientOllamaClientis deleted — prevent startup binding failure."healthCheckTimeoutSeconds(default 2) to theNlpPropertiesspec in the issue body.🔒 Nora Weiß — Security Engineer
Observations
main.py_load_persons_from_db()uses a blocking synchronouspsycopg2connection inside the FastAPI lifespan — this is fine for a startup load, but the DB credentials come fromDATABASE_URLwith no validation. IfDATABASE_URLis malformed or points to the wrong host,psycopg2.connect()raises an exception that is not caught, which causes the lifespan to fail and the container to exit at startup. This is arguably the correct fail-fast behaviour, but it means a misconfiguredDATABASE_URL(e.g. wrong password after a secret rotation) kills the NLP service entirely rather than starting in degraded mode. The issue sayspersons_loaded: 0is a valid degraded state, but the code does not currently achieve this for DB connection errors — only for a missingDATABASE_URLenv var. The lifespan should catchpsycopg2.OperationalErrorand log a warning rather than crashing.parseendpoint exception handler leaks internal detail:raise HTTPException(status_code=500, detail=str(exc)). In production,str(exc)from a Python exception can contain file paths, internal variable state, or DB connection strings if the exception propagates from psycopg2. The detail should be a generic string, notstr(exc).The Dockerfile
COPY . .instruction copies everything includingvenv/,__pycache__/,.envfiles (if present), and test files into the image. A.dockerignorefile is not present in the listed directory contents. Without it, local virtual environments, credentials files, and test corpora are bundled into the production image. This is a standard oversight but worth calling out explicitly — a minimal.dockerignore(venv/,__pycache__/,*.pyc,.env,test_*.py) should be added.Recommendations
_load_persons_from_db()in atry/except psycopg2.OperationalErrorin the lifespan; on failure, log a warning and leave the matcher empty (degraded mode). This aligns the runtime behaviour with the documentedpersons_loaded: 0degraded state.detail=str(exc)in the parse endpoint withdetail="Internal extraction error"— never expose exception text in HTTP responses..dockerignoretonlp-service/before the image is built for production.Open Decisions (omit if none)
depends_on: db service_healthy, then thepsycopg2.OperationalErrorcatch-and-degrade is not needed — the container simply won't start until the DB is ready. The two approaches are mutually exclusive. Nora's recommendation above applies only if the degraded-mode path is kept.🧪 Sara Müller — QA Engineer
Observations
test_main.pytest_healthassertspersons_loaded > 0— this works when the test matcher is pre-seeded, but there is no test covering thepersons_loaded: 0case (DB not available at startup). The HEALTHCHECK in compose is specifically designed to detect this condition, yet the test suite never exercises the degraded/healthresponse. A testtest_health_degraded(setting_matcherto an emptyPersonMatcher) would cover it.test_main.pydoes not test the query length validation boundary.models.pycurrently has nomin_length/max_lengthonParseRequest.query(this is listed as a checklist item to add). Once added, the test file needs a correspondingtest_parse_empty_query_returns_422andtest_parse_oversized_query_returns_422. There is no test for these cases today and no checklist item to add them.The
NlSearchControllerTesthappy-path test (#1) currently sends{"query":"Briefe von Walter im Krieg"}with nolangfield. AfterNlSearchRequestgains alangfield (even with a default), this test should be updated to explicitly include"lang":"de"in at least one case to verify the field is accepted and forwarded correctly. A test withoutlangonly validates the default, not the pass-through.No test covers the multi-year date range extraction edge case in
test_extractor.py: the "zwischen…und" path that sorts and selects only the first two years. If a query contains three year tokens like"zwischen 1914 und 1918 und 1920", the current code returns 1914–1918 (correct) but this is not an explicit test case.Recommendations
test_health_degradedtotest_main.py: pre-seed an emptyPersonMatcher, callGET /health, assertpersons_loaded == 0andstatus == "ok".test_main.pyonce theField(min_length=3, max_length=500)constraint is wired.NlSearchControllerTesttest #1 to include"lang":"de"in the request body, and add a variant that sends nolangfield to verify default behaviour is correct.test_extractor.py.🎨 Leonie Fischer — UX Engineer
Observations
The
NlSearchControllercurrently passes onlyrequest.query()to the service — thelangfield is discarded at the controller boundary. If the frontend passes the Paraglide locale aslang, but the value is silently ignored (because the controller hasn't been updated), the user sees no error but gets results as iflangwere the default. This is a silent degradation that will be invisible during manual testing unless you specifically verify with a non-German query.The interpretation panel shows
rawQueryback to the user. After the migration,rawQuerycomes from the Python service'sParseResponse.rawQuery, which echoesrequest.queryverbatim. This is unchanged from Ollama behaviour — but it's worth confirming the frontend doesn't display this differently. IfrawQueryis used as a display string ("You searched for: …"), mixed-language queries or queries with unusual casing (all-lowercase, as in the test battery) will look exactly as typed. No issue here, but confirms there's no normalisation needed on the Java side before it reaches the frontend.The
NlSearchRateLimitererror includes the username in the message:"NL search rate limit exceeded for user: " + userKey. This string is only used server-side in logs, not returned to the frontend (theDomainExceptioncodefield carriesSMART_SEARCH_RATE_LIMITED). The frontend error message forSMART_SEARCH_RATE_LIMITEDis what users see — the issue body doesn't mention whether the existing i18n messages for this code are user-friendly or technical. Worth checkingmessages/de.jsonandmessages/en.jsonto confirm the copy reads naturally at 20/min (it was presumably written for a 5/min regime where hitting it was exceptional).Recommendations
langpass-through is in the checklist (it's implicit but not stated — Felix also raised this).SMART_SEARCH_RATE_LIMITEDmessage in all three locale files now, before the rate limit is raised and users actually start hitting it. A message like "Zu viele Suchanfragen — bitte warte einen Moment" is more helpful than a technical string.🚀 Tobias Huber — DevOps / Infrastructure
Observations
The Dockerfile
HEALTHCHECKuses a simplecurl -f http://localhost:8001/health || exit 1— this checks HTTP 200 but does NOT inspect the body. The issue's compose HEALTHCHECK is correctly specified asgrep -q '"persons_loaded":[^0]', but the Dockerfile-level check is weaker. In production, if the image is run standalone (e.g. during a canary deploy or a baredocker runfor debugging), the container will report healthy even whenpersons_loaded: 0. The Dockerfile HEALTHCHECK should match the compose spec.requirements.txtpinsdateparser>=1.2,<2.0—dateparserhas historically had significant behaviour changes across minor versions (locale handling, ambiguous date resolution). A range pin like>=1.2,<2.0means apip installafter a minor release bump could silently change date parsing behaviour. For a production service the extraction correctness depends ondateparser's behaviour; pin to an exact version (e.g.dateparser==1.2.0) or at minimum>=1.2,<1.3.No
restart: unless-stoppedpolicy is specified in the compose service spec. The issue body gives the environment variables andmem_limitbut doesn't mention a restart policy. For a stateless sidecar that fails fast on DB unavailability (startup crash),restart: unless-stoppedorrestart: on-failureis needed so compose recovers the container automatically if it crashes at startup before the DB is ready (even withdepends_on: service_healthy, there is a race window if the DB becomes unhealthy after the check passes).No
image:orbuild: context:path is specified in the compose spec in the issue body. The checklist says "Addnlp-service" but doesn't specify whether it uses a pre-built image from a registry orbuild: ./nlp-service. For a self-hosted family archive project with no CI/CD registry, thebuild:directive is almost certainly the right choice — this should be explicit in the spec so the implementer doesn't have to decide during the PR.Recommendations
HEALTHCHECKto match the compose spec:CMD curl -sf http://localhost:8001/health | python3 -c "import sys,json; d=json.load(sys.stdin); sys.exit(0 if d.get('persons_loaded',0)>0 else 1)" || exit 1. (Or keepcurl | grepif curl+grep is preferred — the point is body inspection, not just HTTP 200.)dateparserto an exact version inrequirements.txt.restart: unless-stoppedto the nlp-service compose block.build: ./nlp-serviceto the nlp-service compose block in the issue spec.🗳️ Decision Queue — Round 2 Addendum
1 new decision raised in this review round.
Startup behaviour: fail-fast vs. degraded-mode
The round-1 review (Markus) flagged the contradiction between
depends_on: db service_healthyandpersons_loaded: 0as a valid degraded state. Round 2 (Nora) identified a concrete code gap:main.pylifespan does NOT catchpsycopg2.OperationalError, so a DB connection error at startup crashes the container rather than starting in degraded mode.Two consistent options:
(a) Fail-fast (simpler): Keep
depends_on: db service_healthyin compose. Remove any degraded-mode documentation. On DB error the container exits and compose restarts it.persons_loaded: 0is then only a transient state during the startup window, not a steady state. No code change needed inmain.py.(b) Graceful degraded mode: Remove
depends_on: db service_healthy(or keep it as a soft hint). Wrappsycopg2.connect()intry/except psycopg2.OperationalErrorand start with empty matcher. Service responds to queries with dates/keywords but no person matching.persons_loaded: 0is a documented production state. Requires a code change inmain.pylifespan.These are mutually exclusive. The spec currently documents both without choosing.
Raised by: Nora (round 2)