refactor(search): remove NLP/smart-search feature entirely (#772)
All checks were successful
CI / Unit & Component Tests (push) Successful in 3m23s
CI / OCR Service Tests (push) Successful in 24s
CI / Backend Unit Tests (push) Successful in 3m46s
CI / fail2ban Regex (push) Successful in 46s
CI / Semgrep Security Scan (push) Successful in 25s
CI / Compose Bucket Idempotency (push) Successful in 1m8s

## Summary

- Removes the NLP/smart-search feature completely — the feature was too unreliable and slow; users get better results with the regular search filters
- Deletes the entire backend `search/` package (NlSearchController, NlQueryParserService, NlpClient, NlSearchRateLimiter — 14 classes + 6 test classes)
- Deletes the `nlp-service/` Python microservice (FastAPI, rapidfuzz, DB-backed person matching)
- Removes all frontend NL search components: SmartModeToggle, SmartSearchStatus, InterpretationChipRow, DisambiguationPicker, chip-types, theme-chip-removal
- Strips smart-mode logic from SearchFilterBar and documents/+page.svelte
- Removes `SMART_SEARCH_UNAVAILABLE` / `SMART_SEARCH_RATE_LIMITED` error codes from backend, frontend types, and all three i18n files (de/en/es)
- Removes `nlp-service` container and `APP_NLP_BASE_URL` from both docker-compose files
- Removes Ollama/NLP Prometheus scrape job and Grafana dashboard
- Deletes ADRs 028 (×2), 034, 035

## Test plan

- [ ] Backend compiles: `cd backend && ./mvnw compile -q` → BUILD SUCCESS
- [ ] Frontend server tests pass: `cd frontend && npm run test -- --project=server`
- [ ] No NLP/smart-search references remain in source: `grep -r "SmartSearch\|NlSearch\|nlp-service\|SMART_SEARCH" backend/src frontend/src`
- [ ] `docker compose config` validates both compose files
- [ ] Search page loads, filter bar works, no smart-mode toggle visible

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Marcel <marcel@familienarchiv>
Reviewed-on: #772
This commit was merged in pull request #772.
This commit is contained in:
2026-06-08 10:57:00 +02:00
parent 8e63867ad8
commit d650b6c066
60 changed files with 126 additions and 4364 deletions

View File

@@ -50,17 +50,15 @@ graph TD
The OCR service requires significant RAM for model loading. The dev compose sets `mem_limit: 12g`.
| Production target | RAM | Recommended OCR limit | NL Search | Notes |
|---|---|---|---|---|
| Current server (Hetzner Serverbörse, i7-6700) | 64 GB | 12 GB | Supported | Default `mem_limit: 12g` works comfortably; plenty of headroom for Ollama |
| ≥ 16 GB RAM | 16+ GB | 12 GB | Supported | Default works |
| 8 GB RAM | 8 GB | 6 GB | Disabled — set `APP_OLLAMA_BASE_URL=` (empty) | Set `OCR_MEM_LIMIT=6g`; accept reduced batch sizes |
| 4 GB RAM | 4 GB | — | Unsupported | Disable OCR service (`profiles: [ocr]`); run OCR on demand only |
| Production target | RAM | Recommended OCR limit | Notes |
|---|---|---|---|
| Current server (Hetzner Serverbörse, i7-6700) | 64 GB | 12 GB | Default `mem_limit: 12g` works comfortably |
| ≥ 16 GB RAM | 16+ GB | 12 GB | Default works |
| 8 GB RAM | 8 GB | 6 GB | Set `OCR_MEM_LIMIT=6g`; accept reduced batch sizes |
| 4 GB RAM | 4 GB | — | Disable OCR service (`profiles: [ocr]`); run OCR on demand only |
On servers with less than 16 GB RAM the default `mem_limit: 12g` cannot be honoured — set the `OCR_MEM_LIMIT` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow). The prod compose interpolates this var with a 12g default.
> **Memory budget:** OCR (~6 GB active) + Ollama (~8 GB) = ~14 GB. On servers with less than 16 GB RAM, do not run `docker-compose.observability.yml` continuously alongside both OCR and Ollama.
### Dev vs production differences
| Concern | Dev (`docker-compose.yml`) | Prod (`docker-compose.prod.yml`) |
@@ -147,16 +145,6 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `XDG_CACHE_HOME` | XDG cache base dir — redirects Matplotlib and other XDG-aware libraries away from the read-only `HOME` (`/home/ocr`) to the writable cache volume | `/app/cache` | — | — |
| `TORCH_HOME` | PyTorch model cache — redirects `~/.cache/torch` to the writable models volume | `/app/models/torch` | — | — |
### Ollama (NL search) service
| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `APP_OLLAMA_BASE_URL` | Base URL for the Ollama service. Leave empty to disable NL search. | `http://ollama:11434` | — | — |
| `APP_OLLAMA_API_KEY` | API key passed as `Authorization: Bearer` to Ollama. Leave empty for unauthenticated access. Note: `OLLAMA_API_KEY` is not enforced in Ollama 0.6.5 or 0.30.6 (see ADR-028). | — | — | YES |
| `OLLAMA_CPU_LIMIT` | Docker CPU quota for the Ollama container. On CX42 (8 vCPUs) can be raised to `7.5`. | `4.0` | — | — |
| `OLLAMA_MEM_LIMIT` | Memory limit for the Ollama container. Requires CX42 (16 GB RAM). | `8g` | — | — |
| `OLLAMA_API_KEY` | API key set on the Ollama service itself. Same value as `APP_OLLAMA_API_KEY`. Leave empty for unauthenticated. | — | — | YES |
### Observability stack (`docker-compose.observability.yml`)
| Variable | Purpose | Default | Required? | Sensitive? |
@@ -277,18 +265,6 @@ git.raddatz.cloud A <server IP>
### 3.4 First deploy
> **First start — Ollama model pull:** On first `docker compose up -d`, the `ollama-model-init` container pulls `qwen2.5:7b-instruct-q4_K_M` (~4.7 GB). At 10 Mbps this takes approximately 6090 minutes; at 100 Mbps approximately 610 minutes. The pull is a one-time operation — subsequent restarts skip it (model already on the `ollama_models` volume). Monitor progress with `docker logs -f $(docker ps -q --filter name=ollama-model-init)`.
>
> **Do not use `--wait` on first deploy** — `docker compose up -d --wait` waits for all services to reach their health/completion target, including `ollama-model-init`. On first pull this blocks for 6090 minutes and will time out any CI/deploy script that uses `--wait`.
>
> **Re-deploy idempotency:** on subsequent `docker compose up -d` runs (including `--force-recreate`), `ollama-model-init` re-executes but exits in seconds — Ollama's CLI skips the download when the model digest already matches what is on the volume.
>
> **Verify NL search is active** after enabling Ollama (`APP_OLLAMA_BASE_URL=http://ollama:11434`):
> ```bash
> curl -s http://localhost:8080/api/nl-search?q=brief+von+grossmutter
> # Returns 200 with results → NL search is active
> # Returns 503 NL_SEARCH_UNAVAILABLE → Ollama is not reachable or APP_OLLAMA_BASE_URL is unset
> ```
```bash
# 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
@@ -585,55 +561,6 @@ bash scripts/download-kraken-models.sh
> Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.
### Ollama — natural-language search (NL Search)
NL search uses a local Ollama instance for query parsing. The `ollama` service is defined in `docker-compose.yml` alongside the main stack.
**First-time model pull** (required before the feature works):
```bash
docker compose exec ollama ollama pull qwen2.5:7b-instruct-q4_K_M
```
This downloads ~4.4 GB. The model is stored in the `ollama_data` Docker volume and persists across container restarts.
**Verify the model is available:**
```bash
docker compose exec ollama ollama list
```
Expected output includes `qwen2.5:7b-instruct-q4_K_M`.
**Health check** — the backend polls `GET /api/tags` on Ollama at startup and before inference. If Ollama is absent, `POST /api/search/nl` returns HTTP 503 with `SMART_SEARCH_UNAVAILABLE`.
**Configuration** (see `application.yaml` under `app.ollama`):
| Property | Default | Description |
|---|---|---|
| `app.ollama.base-url` | `http://ollama:11434` | Ollama service URL (dev: `http://localhost:11434`) |
| `app.ollama.model` | `qwen2.5:7b-instruct-q4_K_M` | Model to use for inference |
| `app.ollama.timeout-seconds` | `60` | Read timeout for inference calls (absorbs cold model load on the first query after an Ollama restart) |
| `app.nl-search.rate-limit.max-requests-per-minute` | `5` | Per-user rate limit |
### Upgrade the Ollama model
To switch to a newer model version (e.g. a future release of `qwen2.5`):
1. Update the model name in the `ollama-model-init` `command:` in `docker-compose.yml`.
2. Remove the existing model volume to free the old weights:
```bash
docker volume rm familienarchiv_ollama_models
```
(In production the volume name is prefixed with the compose project: `archiv-production_ollama-models`.)
3. Restart the stack:
```bash
docker compose up -d
```
The `ollama-model-init` container pulls the new model weights on first start (~48 GB download depending on the model). The `ollama` inference server will not start until the pull completes (`condition: service_completed_successfully`).
> **`ollama_models` volume:** holds model weights only — fully reproducible by re-pull, no backup needed.
### Trigger a canonical import
The importer no longer parses the raw spreadsheet. It consumes the **canonical artifacts**