From 5301b52e0f326ac6ae901c5d9122668df6b89b80 Mon Sep 17 00:00:00 2001
From: Marcel <marcel@familienarchiv>
Date: Sun, 7 Jun 2026 19:12:20 +0200
Subject: [PATCH] docs: remove nlp-service and NL search references from
 DEPLOYMENT.md and GLOSSARY.md

---
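Note: a minimal sketch for checking that no references to the removed
service remain in the docs tree once this patch is applied. The function
name `check_leftover_refs` is hypothetical, and the search patterns are an
assumption: they are simply the identifiers this patch deletes
(`nlp-service`, `APP_NLP_BASE_URL`, `NLP_FUZZY_THRESHOLD`). Extend the
pattern list if other names need to be covered.

```shell
# Hypothetical helper, not part of the repository.
# Prints every line under the given directory (default: docs) that still
# mentions the removed service or its environment variables.
# Returns 0 when the tree is clean, 1 when leftovers exist, 2 on a bad path.
check_leftover_refs() {
  local dir="${1:-docs}"

  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 2
  fi

  # grep exits 0 when at least one line matches, so a match means leftovers.
  if grep -rnE 'nlp-service|APP_NLP_BASE_URL|NLP_FUZZY_THRESHOLD' "$dir"; then
    echo "leftover references found in $dir" >&2
    return 1
  fi

  echo "no leftover references in $dir"
}
```

Assuming the function is sourced into the current shell, run
`check_leftover_refs docs` from the repository root after applying the patch.
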
 docs/DEPLOYMENT.md | 72 +++++-----------------------------------------
 docs/GLOSSARY.md   |  7 -----
 2 files changed, 7 insertions(+), 72 deletions(-)
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index e8bc6b67..4692964a 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -51,17 +51,15 @@ graph TD
 
 The OCR service requires significant RAM for model loading. The dev compose sets `mem_limit: 12g`.
 
-| Production target | RAM | Recommended OCR limit | NL Search | Notes |
-|---|---|---|---|---|
-| Current server (Hetzner Serverbörse, i7-6700) | 64 GB | 12 GB | Supported | Default `mem_limit: 12g` works comfortably; nlp-service adds only ~256 MB |
-| ≥ 16 GB RAM | 16+ GB | 12 GB | Supported | Default works |
-| 8 GB RAM | 8 GB | 6 GB | Supported | Set `OCR_MEM_LIMIT=6g`; accept reduced batch sizes; nlp-service is lightweight |
-| 4 GB RAM | 4 GB | — | Supported | Disable OCR service (`profiles: [ocr]`); run OCR on demand only; nlp-service still runs |
+| Production target | RAM | Recommended OCR limit | Notes |
+|---|---|---|---|
+| Current server (Hetzner Serverbörse, i7-6700) | 64 GB | 12 GB | Default `mem_limit: 12g` works comfortably |
+| ≥ 16 GB RAM | 16+ GB | 12 GB | Default works |
+| 8 GB RAM | 8 GB | 6 GB | Set `OCR_MEM_LIMIT=6g`; accept reduced batch sizes |
+| 4 GB RAM | 4 GB | — | Disable OCR service (`profiles: [ocr]`); run OCR on demand only |
 
 On servers with less than 16 GB RAM the default `mem_limit: 12g` cannot be honoured — set the `OCR_MEM_LIMIT` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow). The prod compose interpolates this var with a 12g default.
 
-> **Memory budget:** OCR (~6 GB active) + nlp-service (~256 MB) = ~6.25 GB. The previous Ollama LLM (~8 GB) has been replaced by the rule-based nlp-service — significant memory headroom freed on all server tiers.
-
 ### Dev vs production differences
 
 | Concern | Dev (`docker-compose.yml`) | Prod (`docker-compose.prod.yml`) |
@@ -148,19 +146,6 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
 | `XDG_CACHE_HOME` | XDG cache base dir — redirects Matplotlib and other XDG-aware libraries away from the read-only `HOME` (`/home/ocr`) to the writable cache volume | `/app/cache` | — | — |
 | `TORCH_HOME` | PyTorch model cache — redirects `~/.cache/torch` to the writable models volume | `/app/models/torch` | — | — |
 
-### NLP service (NL search)
-
-| Variable | Purpose | Default | Required? | Sensitive? |
-|---|---|---|---|---|
-| `APP_NLP_BASE_URL` | Internal URL of the nlp-service container. Wired automatically in compose via `http://nlp-service:8001`. | `http://nlp-service:8001` | YES | — |
-| `NLP_FUZZY_THRESHOLD` | Rapidfuzz similarity floor for person-name matching (0–100). Lower values match more aggressively; raise if false positives appear. | `80` | — | — |
-
-The nlp-service reads `DATABASE_URL` at startup (composed from `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB`). Any credential rotation that touches those three vars must be followed by a restart of **both** `backend` and `nlp-service`:
-
-```bash
-docker compose restart nlp-service backend
-```
-
 ### Observability stack (`docker-compose.observability.yml`)
 
 | Variable | Purpose | Default | Required? | Sensitive? |
@@ -281,14 +266,6 @@ git.raddatz.cloud      A   <server IP>
 
 ### 3.4 First deploy
 
-> **NL search startup:** `nlp-service` loads person names from the database at startup (single query, ~1–2 s). No model weights to download. The backend waits for `nlp-service` to pass its healthcheck (`/health` returns `{"status":"ok","persons_loaded":N}`) before starting, so `docker compose up -d --wait` is safe to use on first deploy.
->
-> **Verify NL search is active:**
-> ```bash
-> curl -s http://localhost:8001/health
-> # Returns {"status":"ok","persons_loaded":N} with N > 0 → person matching enabled
-> # Returns {"status":"ok","persons_loaded":0} → DB not reachable or persons table empty
-> ```
 
 ```bash
 # 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
@@ -328,7 +305,7 @@ docker compose logs --follow
 
 # Single snapshot
 docker compose logs --tail=200 <service>
-# services: frontend, backend, db, minio, ocr-service, nlp-service
+# services: frontend, backend, db, minio, ocr-service
 ```
 
 ### Log locations
@@ -585,41 +562,6 @@ bash scripts/download-kraken-models.sh
 
 > Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.
 
-### NLP service — natural-language search (NL Search)
-
-NL search uses the rule-based `nlp-service` FastAPI container for query parsing. It has no model weights — it loads person names from the database at startup and applies regex + fuzzy matching. See ADR-035.
-
-**Health check:**
-
-```bash
-curl -s http://localhost:8001/health
-# {"status":"ok","persons_loaded":1247}
-```
-
-`persons_loaded: 0` means the service started but could not reach the database (check `DATABASE_URL` and that `db` is healthy).
-
-If `POST /api/search/nl` returns HTTP 503 `SMART_SEARCH_UNAVAILABLE`, the backend cannot reach `nlp-service`. Check with:
-
-```bash
-docker compose logs nlp-service --tail=50
-docker compose ps nlp-service
-```
-
-**Configuration** (see `application.yaml` under `app.nlp`):
-
-| Property | Default | Description |
-|---|---|---|
-| `app.nlp.base-url` | `http://nlp-service:8001` | nlp-service URL; set via `APP_NLP_BASE_URL` env var |
-| `app.nl-search.rate-limit.max-requests-per-minute` | `20` | Per-user rate limit |
-
-**Tuning person matching:**
-
-Set `NLP_FUZZY_THRESHOLD` in `.env` (default: `80`, range: `0–100`). Lower values match more aggressively at the cost of false positives. Restart nlp-service after changing:
-
-```bash
-docker compose restart nlp-service
-```
-
 ### Trigger a canonical import
 
 The importer no longer parses the raw spreadsheet. It consumes the **canonical artifacts**
diff --git a/docs/GLOSSARY.md b/docs/GLOSSARY.md
index e30cf875..0af61be9 100644
--- a/docs/GLOSSARY.md
+++ b/docs/GLOSSARY.md
@@ -165,13 +165,6 @@ _See also [Chronik](#chronik-internal)._
 
 **Domain** — a Tier-1 bounded context with its own entities, controller, service, repository, and DTOs. Backend domains: `document`, `person`, `tag`, `user`, `geschichte`, `notification`, `ocr`, `audit`, `dashboard`. Frontend domains mirror this structure under `src/lib/`.
 
----
-
-## NL Search Terms
-
-**NlSearch** — the natural-language document search feature. Users type a plain-German query (e.g. "Was hat Walter im Krieg an Emma geschrieben?"); the backend parses it via Ollama, resolves person names to database UUIDs, and delegates to the standard `DocumentService.searchDocuments()` path. Endpoint: `POST /api/search/nl`.
-
-**NlQueryInterpretation** — the structured result of parsing a natural-language query. Contains: `resolvedPersons` (persons whose names unambiguously matched one DB record), `ambiguousPersons` (all candidates when a name matched more than one person), `keywords` (LLM-extracted search terms), `dateFrom`/`dateTo` (extracted date range), `rawQuery` (the original user input), `keywordsApplied` (whether keyword FTS was used), `resolvedTags` (tags matched by keyword→tag resolution), and `tagsApplied` (whether the OR-union tag filter was applied).
 
 **keyword→tag resolution** — the post-Ollama step in `NlQueryParserService` where each LLM-extracted keyword is substring-matched against the tag taxonomy via `TagService.findByNameContaining()`. Keywords that hit one or more tags are removed from the FTS text list and become an OR-union tag filter; keywords with no match remain as FTS text. Matching is case-insensitive and traverses the tag hierarchy via the recursive CTE `findDescendantIdsByName`. See ADR-033.