From 5301b52e0f326ac6ae901c5d9122668df6b89b80 Mon Sep 17 00:00:00 2001
From: Marcel
Date: Sun, 7 Jun 2026 19:12:20 +0200
Subject: [PATCH] docs: remove nlp-service and NL search references from DEPLOYMENT.md and GLOSSARY.md

---
 docs/DEPLOYMENT.md | 72 +++++-----------------------------------------
 docs/GLOSSARY.md   |  7 -----
 2 files changed, 7 insertions(+), 72 deletions(-)

diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index e8bc6b67..4692964a 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -51,17 +51,15 @@ graph TD
 
 The OCR service requires significant RAM for model loading. The dev compose sets `mem_limit: 12g`.
 
-| Production target | RAM | Recommended OCR limit | NL Search | Notes |
-|---|---|---|---|---|
-| Current server (Hetzner Serverbörse, i7-6700) | 64 GB | 12 GB | Supported | Default `mem_limit: 12g` works comfortably; nlp-service adds only ~256 MB |
-| ≥ 16 GB RAM | 16+ GB | 12 GB | Supported | Default works |
-| 8 GB RAM | 8 GB | 6 GB | Supported | Set `OCR_MEM_LIMIT=6g`; accept reduced batch sizes; nlp-service is lightweight |
-| 4 GB RAM | 4 GB | — | Supported | Disable OCR service (`profiles: [ocr]`); run OCR on demand only; nlp-service still runs |
+| Production target | RAM | Recommended OCR limit | Notes |
+|---|---|---|---|
+| Current server (Hetzner Serverbörse, i7-6700) | 64 GB | 12 GB | Default `mem_limit: 12g` works comfortably |
+| ≥ 16 GB RAM | 16+ GB | 12 GB | Default works |
+| 8 GB RAM | 8 GB | 6 GB | Set `OCR_MEM_LIMIT=6g`; accept reduced batch sizes |
+| 4 GB RAM | 4 GB | — | Disable OCR service (`profiles: [ocr]`); run OCR on demand only |
 
 On servers with less than 16 GB RAM the default `mem_limit: 12g` cannot be honoured — set the `OCR_MEM_LIMIT` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow). The prod compose interpolates this var with a 12g default.
 
-> **Memory budget:** OCR (~6 GB active) + nlp-service (~256 MB) = ~6.25 GB. The previous Ollama LLM (~8 GB) has been replaced by the rule-based nlp-service — significant memory headroom freed on all server tiers.
-
 ### Dev vs production differences
 
 | Concern | Dev (`docker-compose.yml`) | Prod (`docker-compose.prod.yml`) |
@@ -148,19 +146,6 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
 | `XDG_CACHE_HOME` | XDG cache base dir — redirects Matplotlib and other XDG-aware libraries away from the read-only `HOME` (`/home/ocr`) to the writable cache volume | `/app/cache` | — | — |
 | `TORCH_HOME` | PyTorch model cache — redirects `~/.cache/torch` to the writable models volume | `/app/models/torch` | — | — |
 
-### NLP service (NL search)
-
-| Variable | Purpose | Default | Required? | Sensitive? |
-|---|---|---|---|---|
-| `APP_NLP_BASE_URL` | Internal URL of the nlp-service container. Wired automatically in compose via `http://nlp-service:8001`. | `http://nlp-service:8001` | YES | — |
-| `NLP_FUZZY_THRESHOLD` | Rapidfuzz similarity floor for person-name matching (0–100). Lower values match more aggressively; raise if false positives appear. | `80` | — | — |
-
-The nlp-service reads `DATABASE_URL` at startup (composed from `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB`). Any credential rotation that touches those three vars must be followed by a restart of **both** `backend` and `nlp-service`:
-
-```bash
-docker compose restart nlp-service backend
-```
-
 ### Observability stack (`docker-compose.observability.yml`)
 
 | Variable | Purpose | Default | Required? | Sensitive? |
@@ -281,14 +266,6 @@ git.raddatz.cloud A
 
 ### 3.4 First deploy
 
-> **NL search startup:** `nlp-service` loads person names from the database at startup (single query, ~1–2 s). No model weights to download. The backend waits for `nlp-service` to pass its healthcheck (`/health` returns `{"status":"ok","persons_loaded":N}`) before starting, so `docker compose up -d --wait` is safe to use on first deploy.
->
-> **Verify NL search is active:**
-> ```bash
-> curl -s http://localhost:8001/health
-> # Returns {"status":"ok","persons_loaded":N} with N > 0 → person matching enabled
-> # Returns {"status":"ok","persons_loaded":0} → DB not reachable or persons table empty
-> ```
 ```bash
 # 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
 
@@ -328,7 +305,7 @@ docker compose logs --follow
 
 # Single snapshot
 docker compose logs --tail=200
-# services: frontend, backend, db, minio, ocr-service, nlp-service
+# services: frontend, backend, db, minio, ocr-service
 ```
 
 ### Log locations
@@ -585,41 +562,6 @@ bash scripts/download-kraken-models.sh
 
 > Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.
 
-### NLP service — natural-language search (NL Search)
-
-NL search uses the rule-based `nlp-service` FastAPI container for query parsing. It has no model weights — it loads person names from the database at startup and applies regex + fuzzy matching. See ADR-035.
-
-**Health check:**
-
-```bash
-curl -s http://localhost:8001/health
-# {"status":"ok","persons_loaded":1247}
-```
-
-`persons_loaded: 0` means the service started but could not reach the database (check `DATABASE_URL` and that `db` is healthy).
-
-If `POST /api/search/nl` returns HTTP 503 `SMART_SEARCH_UNAVAILABLE`, the backend cannot reach `nlp-service`. Check with:
-
-```bash
-docker compose logs nlp-service --tail=50
-docker compose ps nlp-service
-```
-
-**Configuration** (see `application.yaml` under `app.nlp`):
-
-| Property | Default | Description |
-|---|---|---|
-| `app.nlp.base-url` | `http://nlp-service:8001` | nlp-service URL; set via `APP_NLP_BASE_URL` env var |
-| `app.nl-search.rate-limit.max-requests-per-minute` | `20` | Per-user rate limit |
-
-**Tuning person matching:**
-
-Set `NLP_FUZZY_THRESHOLD` in `.env` (default: `80`, range: `0–100`). Lower values match more aggressively at the cost of false positives. Restart nlp-service after changing:
-
-```bash
-docker compose restart nlp-service
-```
-
 ### Trigger a canonical import
 
 The importer no longer parses the raw spreadsheet. It consumes the **canonical artifacts**
diff --git a/docs/GLOSSARY.md b/docs/GLOSSARY.md
index e30cf875..0af61be9 100644
--- a/docs/GLOSSARY.md
+++ b/docs/GLOSSARY.md
@@ -165,13 +165,6 @@ _See also [Chronik](#chronik-internal)._
 
 **Domain** — a Tier-1 bounded context with its own entities, controller, service, repository, and DTOs. Backend domains: `document`, `person`, `tag`, `user`, `geschichte`, `notification`, `ocr`, `audit`, `dashboard`. Frontend domains mirror this structure under `src/lib/`.
 
----
-
-## NL Search Terms
-
-**NlSearch** — the natural-language document search feature. Users type a plain-German query (e.g. "Was hat Walter im Krieg an Emma geschrieben?"); the backend parses it via Ollama, resolves person names to database UUIDs, and delegates to the standard `DocumentService.searchDocuments()` path. Endpoint: `POST /api/search/nl`.
-
-**NlQueryInterpretation** — the structured result of parsing a natural-language query. Contains: `resolvedPersons` (persons whose names unambiguously matched one DB record), `ambiguousPersons` (all candidates when a name matched more than one person), `keywords` (LLM-extracted search terms), `dateFrom`/`dateTo` (extracted date range), `rawQuery` (the original user input), `keywordsApplied` (whether keyword FTS was used), `resolvedTags` (tags matched by keyword→tag resolution), and `tagsApplied` (whether the OR-union tag filter was applied).
 
 **keyword→tag resolution** — the post-Ollama step in `NlQueryParserService` where each LLM-extracted keyword is substring-matched against the tag taxonomy via `TagService.findByNameContaining()`. Keywords that hit one or more tags are removed from the FTS text list and become an OR-union tag filter; keywords with no match remain as FTS text. Matching is case-insensitive and traverses the tag hierarchy via the recursive CTE `findDescendantIdsByName`. See ADR-033.
 