diff --git a/CLAUDE.md b/CLAUDE.md
index 8364399a..b0203020 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -92,6 +92,7 @@ backend/src/main/java/org/raddatz/familienarchiv/
 ├── ocr/                 OCR domain — OcrService, OcrBatchService, training
 ├── person/              Person domain
 │   └── relationship/    PersonRelationship sub-domain
+├── search/              NL search domain — NlSearchController, NlQueryParserService, RestClientOllamaClient, NlSearchRateLimiter
 ├── security/            SecurityConfig, Permission, @RequirePermission, PermissionAspect
 ├── tag/                 Tag domain
 └── user/                User domain — AppUser, UserGroup, UserService
@@ -160,7 +161,7 @@ Input DTOs live flat in the domain package. Response types are the model entitie
 
 → See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)
 
-**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded).
+**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded); `SMART_SEARCH_UNAVAILABLE` (HTTP 503 — Ollama inference service offline or timed out); `SMART_SEARCH_RATE_LIMITED` (HTTP 429 — user exceeded 5 NL search requests per minute).
 
 ### Security / Permissions
 
@@ -268,7 +269,7 @@ Back button pattern — use the shared `<BackButton>` component from `$lib/share
 
 → See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)
 
-**LLM reminder:** when adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded).
+**LLM reminder:** when adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded); `SMART_SEARCH_UNAVAILABLE` (HTTP 503 — Ollama inference service offline or timed out); `SMART_SEARCH_RATE_LIMITED` (HTTP 429 — user exceeded 5 NL search requests per minute).
 
 ---
 
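The per-user limit behind `SMART_SEARCH_RATE_LIMITED` (5 NL search requests per minute) can be pictured as a sliding window of request timestamps. The Java sketch below is illustrative only. It assumes a hypothetical `SlidingWindowRateLimiter` class keyed by a user identifier, with an injectable millisecond clock for testing. It is not the actual `NlSearchRateLimiter`, whose implementation this diff does not show.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical per-user sliding-window limiter (NOT the project's NlSearchRateLimiter):
// each user key may make at most maxRequests calls within any rolling window.
final class SlidingWindowRateLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final LongSupplier clockMillis;
    // Timestamps (ms) of accepted requests, one deque per user key.
    private final Map<String, Deque<Long>> hits = new ConcurrentHashMap<>();

    SlidingWindowRateLimiter(int maxRequests, Duration window, LongSupplier clockMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = window.toMillis();
        this.clockMillis = clockMillis;
    }

    // Returns true if the request may proceed; false is the HTTP 429 case.
    boolean tryAcquire(String userKey) {
        long now = clockMillis.getAsLong();
        Deque<Long> timestamps = hits.computeIfAbsent(userKey, k -> new ArrayDeque<>());
        synchronized (timestamps) {
            // Evict timestamps that have slid out of the window.
            while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMillis) {
                timestamps.pollFirst();
            }
            if (timestamps.size() >= maxRequests) {
                return false;
            }
            timestamps.addLast(now);
            return true;
        }
    }
}
```

In production one would construct it as `new SlidingWindowRateLimiter(5, Duration.ofMinutes(1), System::currentTimeMillis)`. A `false` from `tryAcquire` is where a controller would answer with HTTP 429 and `SMART_SEARCH_RATE_LIMITED`. This sketch never evicts per-user deques. That is tolerable for a small family user base but would need cleanup at larger scale.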
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3fddf929..f7b65825 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -560,6 +560,37 @@ bash scripts/download-kraken-models.sh
 
 > Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.
 
+### Ollama — natural-language search (NL Search)
+
+NL search uses a local Ollama instance for query parsing. The `ollama` service is defined in `docker-compose.yml` alongside the main stack.
+
+**First-time model pull** (required before the feature works):
+
+```bash
+docker compose exec ollama ollama pull qwen2.5:7b-instruct-q4_K_M
+```
+
+This downloads ~4.4 GB. The model is stored in the `ollama_data` Docker volume and persists across container restarts and `docker compose down` (unless run with `-v`).
+
+**Verify the model is available:**
+
+```bash
+docker compose exec ollama ollama list
+```
+
+Expected output includes `qwen2.5:7b-instruct-q4_K_M`.
+
+**Health check:** the backend calls Ollama's `GET /api/tags` endpoint at startup and before inference. If Ollama is unreachable or an inference call times out, `POST /api/search/nl` returns HTTP 503 with `SMART_SEARCH_UNAVAILABLE`.
+
+**Configuration** (see `application.yaml` under `app.ollama` and `app.nl-search`):
+
+| Property | Default | Description |
+|---|---|---|
+| `app.ollama.base-url` | `http://ollama:11434` | Ollama service URL (dev: `http://localhost:11434`) |
+| `app.ollama.model` | `qwen2.5:7b-instruct-q4_K_M` | Model to use for inference |
+| `app.ollama.timeout-seconds` | `30` | Read timeout for inference calls |
+| `app.nl-search.rate-limit.max-requests-per-minute` | `5` | Maximum NL search requests per user per minute |
+
 ### Trigger a canonical import
 
 The importer no longer parses the raw spreadsheet. It consumes the **canonical artifacts**
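The health check described in the deployment notes (`GET /api/tags`, HTTP 503 when Ollama is absent) could look roughly like this in plain Java. This is a sketch under assumptions. The class `OllamaHealthCheck`, the 5-second timeout, and the decision to also require the configured model to be listed are all hypothetical. The code relies only on the `models[].name` field of Ollama's `/api/tags` response, and the regex parsing is a shortcut that a real client would replace with a JSON library such as Jackson.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical health check against Ollama's GET /api/tags (a sketch, not the project's
// RestClientOllamaClient). Assumes installed models are listed as models[].name.
final class OllamaHealthCheck {
    // Regex shortcut for this sketch only; real code should parse the JSON properly.
    private static final Pattern NAME = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"");

    // Collects every "name" value found in a /api/tags response body.
    static Set<String> modelNames(String tagsJson) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = NAME.matcher(tagsJson);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // True only if Ollama answers with HTTP 200 and lists the configured model.
    static boolean isModelAvailable(HttpClient client, URI baseUrl, String model) {
        HttpRequest request = HttpRequest.newBuilder(baseUrl.resolve("/api/tags"))
                .timeout(Duration.ofSeconds(5)) // hypothetical health-check timeout
                .GET()
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200 && modelNames(response.body()).contains(model);
        } catch (IOException e) {
            // Covers connection refusal and HttpTimeoutException: treat as unavailable (HTTP 503).
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In dev this would be called as `OllamaHealthCheck.isModelAvailable(HttpClient.newHttpClient(), URI.create("http://localhost:11434"), "qwen2.5:7b-instruct-q4_K_M")`. `HttpTimeoutException` is a subclass of `IOException`, so a slow Ollama is treated the same as an absent one.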
diff --git a/docs/GLOSSARY.md b/docs/GLOSSARY.md
index 8bb508ab..1c6e8cb6 100644
--- a/docs/GLOSSARY.md
+++ b/docs/GLOSSARY.md
@@ -167,6 +167,16 @@ _See also [Chronik](#chronik-internal)._
 
 ---
 
+## NL Search Terms
+
+**NlSearch** — the natural-language document search feature. Users type a question in plain German (e.g. "Was hat Walter im Krieg an Emma geschrieben?", i.e. "What did Walter write to Emma during the war?"); the backend parses it via Ollama, resolves person names to database UUIDs, and delegates to the standard `DocumentService.searchDocuments()` path. Endpoint: `POST /api/search/nl`.
+
+**NlQueryInterpretation** — the structured result of parsing a natural-language query. Contains: `resolvedPersons` (persons whose name matched exactly one database record), `ambiguousPersons` (all candidates when a name matched more than one person), `keywords` (LLM-extracted search terms), `dateFrom`/`dateTo` (extracted date range), `rawQuery` (the original user input), and `keywordsApplied` (whether the extracted keywords were actually used in the full-text search).
+
+**PersonHint** — a lightweight `{id, displayName}` pair used in `NlQueryInterpretation` to describe a resolved or ambiguous person without exposing the full `Person` entity to the frontend.
+
+---
+
 ## Infrastructure Terms
 
 **archiv-app** — the bucket-scoped MinIO service account the backend uses to read and write the `familienarchiv` bucket. Distinct from the MinIO root account (`archiv`, used only by the bootstrap container for admin operations). Defined and provisioned in [`infra/minio/bootstrap.sh`](../infra/minio/bootstrap.sh) and consumed by the backend as `S3_ACCESS_KEY` in [`docker-compose.prod.yml`](../docker-compose.prod.yml). The attached `archiv-app-policy` grants `s3:GetObject/PutObject/DeleteObject` on `familienarchiv/*` and `s3:ListBucket/GetBucketLocation` on the bucket only — not the built-in `readwrite` policy which would grant `s3:*` on all buckets.
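The glossary entries for `NlQueryInterpretation` and `PersonHint` map naturally onto Java records. The field names come from the glossary. The Java types (`UUID`, `LocalDate`, `List`) and the helper `PersonNameResolution.interpret` are assumptions made for this sketch, not the backend's actual code. The helper sorts name matches into resolved and ambiguous persons.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// {id, displayName} pair from the glossary; the field types are assumed.
record PersonHint(UUID id, String displayName) {}

// Field names from the glossary; the Java types are an assumption of this sketch.
record NlQueryInterpretation(
        List<PersonHint> resolvedPersons,
        List<PersonHint> ambiguousPersons,
        List<String> keywords,
        LocalDate dateFrom,   // null when no date range was extracted
        LocalDate dateTo,
        String rawQuery,
        boolean keywordsApplied) {}

// Hypothetical helper, not the real NlQueryParserService.
final class PersonNameResolution {
    // candidatesByName maps each LLM-extracted name to the DB persons it matched.
    // Exactly one match -> resolved; several -> all candidates are ambiguous; none -> dropped.
    static NlQueryInterpretation interpret(String rawQuery,
                                           Map<String, List<PersonHint>> candidatesByName,
                                           List<String> keywords,
                                           LocalDate dateFrom,
                                           LocalDate dateTo,
                                           boolean keywordsApplied) {
        List<PersonHint> resolved = new ArrayList<>();
        List<PersonHint> ambiguous = new ArrayList<>();
        for (List<PersonHint> candidates : candidatesByName.values()) {
            if (candidates.size() == 1) {
                resolved.add(candidates.get(0));
            } else if (candidates.size() > 1) {
                ambiguous.addAll(candidates);
            }
        }
        return new NlQueryInterpretation(List.copyOf(resolved), List.copyOf(ambiguous),
                List.copyOf(keywords), dateFrom, dateTo, rawQuery, keywordsApplied);
    }
}
```

Names with no database match are simply dropped in this sketch. The diff does not say how the real service treats them.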
diff --git a/docs/architecture/c4/l1-context.puml b/docs/architecture/c4/l1-context.puml
index d31ae49f..e2ccef71 100644
--- a/docs/architecture/c4/l1-context.puml
+++ b/docs/architecture/c4/l1-context.puml
@@ -9,10 +9,12 @@ Person(member, "Family Member", "Access by administrator invite. Searches, brows
 System(familienarchiv, "Familienarchiv", "Web application for digitising, organising, and searching family documents")
 System_Ext(mail, "Email Service", "SMTP server. Delivers notification emails (mentions, replies) and password-reset links.")
 System_Ext(glitchtip, "GlitchTip", "Self-hosted error tracking (Sentry-compatible). Receives frontend and backend error events with stack traces.")
+System_Ext(ollama, "Ollama (self-hosted)", "Local LLM inference server (qwen2.5:7b). Parses natural-language search queries into structured filters. Runs in the same Docker Compose stack.")
 
 Rel(admin, familienarchiv, "Manages via browser", "HTTPS")
 Rel(member, familienarchiv, "Searches, reads, and transcribes via browser", "HTTPS")
 Rel(familienarchiv, mail, "Sends notification and password-reset emails (optional)", "SMTP")
 Rel(familienarchiv, glitchtip, "Sends error events with errorId and stack trace", "HTTPS")
+Rel(familienarchiv, ollama, "Sends natural-language search queries for parsing", "HTTP / REST (internal)")
 
 @enduml
diff --git a/docs/architecture/c4/l2-containers.puml b/docs/architecture/c4/l2-containers.puml
index 5bfd6799..74f42ab2 100644
--- a/docs/architecture/c4/l2-containers.puml
+++ b/docs/architecture/c4/l2-containers.puml
@@ -15,6 +15,7 @@ System_Boundary(archiv, "Familienarchiv (Docker Compose)") {
     ContainerDb(db, "Relational Database", "PostgreSQL 16", "Stores document metadata, persons, users, permission groups, tags, transcription blocks, audit log, and Spring Session data.")
     ContainerDb(storage, "Object Storage", "MinIO (S3-compatible)", "Stores the actual document files (PDFs, scans). Backend uses a bucket-scoped service account (archiv-app), not MinIO root.")
     Container(mc, "Bucket / Service-Account Init", "MinIO Client (mc)", "One-shot container on startup. Idempotent: creates the archive bucket, the archiv-app service account, and attaches the readwrite policy.")
+    Container(ollama, "Ollama", "Ollama / port 11434", "Local LLM inference server. Hosts qwen2.5:7b-instruct-q4_K_M for natural-language query parsing (NL Search). Runs on CPU; no GPU required.")
 }
 
 System_Boundary(observability, "Observability Stack (/opt/familienarchiv/docker-compose.observability.yml)") {
@@ -41,6 +42,7 @@ Rel(backend, ocr, "OCR job requests with presigned MinIO URL", "HTTP / REST / JS
 Rel(backend, mail, "Sends notification and password-reset emails (optional)", "SMTP")
 Rel(ocr, storage, "Fetches PDF via presigned URL", "HTTP / S3 presigned")
 Rel(mc, storage, "Bootstraps bucket + service account on startup", "MinIO Client CLI")
+Rel(backend, ollama, "NL query parsing (POST /api/generate) and health check (GET /api/tags)", "HTTP / REST / JSON")
 Rel(promtail, loki, "Pushes log streams", "HTTP/Loki push API")
 Rel(backend, tempo, "Sends distributed traces via OTLP", "HTTP / OTLP / port 4318 (archiv-net)")
 Rel(prometheus, backend, "Scrapes JVM + HTTP metrics", "HTTP 8081 /actuator/prometheus")
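The container diagram names `POST /api/generate` as the backend's inference call. The sketch below builds such a request with `java.net.http`. It assumes a non-streaming call in Ollama's JSON mode (`"stream": false`, `"format": "json"`) and reuses the `app.ollama.*` defaults from the deployment docs. `OllamaGenerateRequests` and its hand-rolled JSON escaping are hypothetical; production code would serialize the body with a JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical request builder for Ollama's POST /api/generate (a sketch, not the
// project's RestClientOllamaClient).
final class OllamaGenerateRequests {
    // Minimal JSON string escaping, sufficient for this sketch only.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control chars
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    // Non-streaming, JSON-mode body: Ollama returns one response object with a JSON answer.
    static String body(String model, String prompt) {
        return "{\"model\":" + jsonString(model)
                + ",\"prompt\":" + jsonString(prompt)
                + ",\"stream\":false,\"format\":\"json\"}";
    }

    static HttpRequest build(URI baseUrl, String model, String prompt, int timeoutSeconds) {
        return HttpRequest.newBuilder(baseUrl.resolve("/api/generate"))
                .timeout(Duration.ofSeconds(timeoutSeconds)) // cf. app.ollama.timeout-seconds
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(model, prompt)))
                .build();
    }
}
```

`HttpRequest.timeout` limits how long the client waits for the response. It therefore only approximates the read timeout that `app.ollama.timeout-seconds` describes.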