docs(search): update CLAUDE.md, GLOSSARY, DEPLOYMENT, and C4 diagrams

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 16:16:04 +02:00
parent 4634da9865
commit 44baff9c9c
5 changed files with 48 additions and 2 deletions
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -560,6 +560,37 @@ bash scripts/download-kraken-models.sh

 > Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.

+### Ollama — natural-language search (NL Search)
+
+NL search uses a local Ollama instance for query parsing. The `ollama` service is defined in `docker-compose.yml` alongside the main stack.
+
+**First-time model pull** (required before the feature works):
+
+```bash
+docker compose exec ollama ollama pull qwen2.5:7b-instruct-q4_K_M
+```
+
+This downloads ~4.4 GB. The model is stored in the `ollama_data` Docker volume and persists across container restarts.
+
+**Verify the model is available:**
+
+```bash
+docker compose exec ollama ollama list
+```
+
+Expected output includes `qwen2.5:7b-instruct-q4_K_M`.
+
+**Health check:** the backend calls `GET /api/tags` on Ollama at startup and before each inference call. If Ollama is unreachable, `POST /api/search/nl` returns HTTP 503 with `SMART_SEARCH_UNAVAILABLE`.
+
+**Configuration** (see `application.yaml` under `app.ollama` and `app.nl-search`):
+
+| Property | Default | Description |
+|---|---|---|
+| `app.ollama.base-url` | `http://ollama:11434` | Ollama service URL (dev: `http://localhost:11434`) |
+| `app.ollama.model` | `qwen2.5:7b-instruct-q4_K_M` | Model to use for inference |
+| `app.ollama.timeout-seconds` | `30` | Read timeout for inference calls |
+| `app.nl-search.rate-limit.max-requests-per-minute` | `5` | Per-user limit on NL search requests per minute |
+
 ### Trigger a canonical import

 The importer no longer parses the raw spreadsheet. It consumes the **canonical artifacts**