docs(ollama): add model upgrade runbook + post-deploy smoke test to DEPLOYMENT.md

Addresses Elicit's and Sara's review concerns on PR #749:
- Expand §6 ollama_models section into a full model upgrade runbook (step-by-step
  docker volume rm + recreate, including production volume name prefix)
- Add re-deploy idempotency note to §3.4 (init container exits quickly when model
  already present on the volume)
- Add NL search smoke test to §3.4 (curl command distinguishing 200 from 503
  NL_SEARCH_UNAVAILABLE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Marcel
2026-06-06 14:54:58 +02:00
parent 6bc434ebb8
commit 27bef28c0e

@@ -279,6 +279,15 @@ git.raddatz.cloud A <server IP>
> **First start — Ollama model pull:** On first `docker compose up -d`, the `ollama-model-init` container pulls `qwen2.5:7b-instruct-q4_K_M` (~4.7 GB). At 10 Mbps this takes approximately 60–90 minutes; at 100 Mbps approximately 6–10 minutes. The pull is a one-time operation — subsequent restarts skip it (model already on the `ollama_models` volume). Monitor progress with `docker logs -f $(docker ps -q --filter name=ollama-model-init)`.
>
> **Do not use `--wait` on first deploy** — `docker compose up -d --wait` waits for all services to reach their health/completion target, including `ollama-model-init`. On first pull this blocks for 60–90 minutes and will time out any CI/deploy script that uses `--wait`.
+>
+> **Re-deploy idempotency:** on subsequent `docker compose up -d` runs (including `--force-recreate`), `ollama-model-init` re-executes but exits in seconds — Ollama's CLI skips the download when the model digest already matches what is on the volume.
+>
+> **Verify NL search is active** after enabling Ollama (`APP_OLLAMA_BASE_URL=http://ollama:11434`):
+> ```bash
+> curl -s http://localhost:8080/api/nl-search?q=brief+von+grossmutter
+> # Returns 200 with results → NL search is active
+> # Returns 503 NL_SEARCH_UNAVAILABLE → Ollama is not reachable or APP_OLLAMA_BASE_URL is unset
+> ```
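The smoke test above relies on reading the response by eye. A minimal sketch of a scripted variant follows, assuming only what the note states: a 200 means NL search is active, and a 503 carries the string `NL_SEARCH_UNAVAILABLE` somewhere in the body. The function names `classify_nl_search` and `nl_search_smoke_test` are hypothetical helpers and not part of the repository; the exact shape of the response body is not assumed.

```shell
# Classify an NL search response from its HTTP status code and body.
# Pure function: both values are passed in, nothing here touches the network.
classify_nl_search() {
  nl_status="$1"
  nl_body="$2"
  case "$nl_status" in
    200)
      echo "active" ;;
    503)
      # Only the documented error code counts as "Ollama unavailable".
      case "$nl_body" in
        *NL_SEARCH_UNAVAILABLE*) echo "unavailable" ;;
        *) echo "error" ;;
      esac ;;
    *)
      # Includes curl's "000" when the backend cannot be reached at all.
      echo "error" ;;
  esac
}

# Hypothetical wrapper: query the endpoint and classify the result.
nl_search_smoke_test() {
  nl_url="${1:-http://localhost:8080/api/nl-search?q=brief+von+grossmutter}"
  nl_body_file="$(mktemp)"
  # -o writes the body to a file; -w '%{http_code}' prints only the status code.
  nl_code="$(curl -s -o "$nl_body_file" -w '%{http_code}' "$nl_url")"
  classify_nl_search "$nl_code" "$(cat "$nl_body_file")"
  rm -f "$nl_body_file"
}
```

Calling `nl_search_smoke_test` after a deploy prints `active`, `unavailable`, or `error`. Quoting the URL inside the wrapper also keeps shells such as zsh from treating the `?` as a glob character.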
```bash
# 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
@@ -575,13 +584,23 @@ bash scripts/download-kraken-models.sh
> Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.
-### Manage the `ollama_models` volume
-> **`ollama_models` volume:** holds model weights only — fully reproducible by re-pull, no backup needed. If the volume fills after a model upgrade:
-> ```bash
-> docker volume rm ollama_models && docker compose up -d
-> ```
-> The init container re-pulls the model on next startup.
+### Upgrade the Ollama model
+To switch to a newer model version (e.g. a future release of `qwen2.5`):
+1. Update the model name in the `ollama-model-init` `command:` in `docker-compose.yml`.
+2. Remove the existing model volume to free the old weights:
+```bash
+docker volume rm familienarchiv_ollama_models
+```
+(In production the volume name is prefixed with the compose project: `archiv-production_ollama_models`.)
+3. Restart the stack:
+```bash
+docker compose up -d
+```
+The `ollama-model-init` container pulls the new model weights on first start (~4–8 GB download depending on the model). The `ollama` inference server will not start until the pull completes (`condition: service_completed_successfully`).
+> **`ollama_models` volume:** holds model weights only — fully reproducible by re-pull, no backup needed.
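Steps 2 and 3 of the upgrade runbook can be sketched as one helper, under stated assumptions. The function names are hypothetical, and the compose service names `ollama` and `ollama-model-init` are inferred from the text rather than confirmed against `docker-compose.yml`. One addition beyond the runbook: Docker refuses to remove a volume that a container still references, so the sketch removes those containers first.

```shell
# Compose prefixes named volumes with the project name: <project>_<volume>.
ollama_volume_name() {
  printf '%s_ollama_models\n' "$1"
}

# Hypothetical helper covering runbook steps 2 and 3 for one compose project.
# Step 1 (editing the model name in docker-compose.yml) stays manual.
# Run it from the directory that contains docker-compose.yml.
upgrade_ollama_model() {
  up_project="$1"
  up_volume="$(ollama_volume_name "$up_project")"
  # Stop and remove the services assumed to mount the volume,
  # then drop the old weights and bring the stack back up.
  docker compose -p "$up_project" rm -s -f ollama ollama-model-init &&
    docker volume rm "$up_volume" &&
    docker compose -p "$up_project" up -d
}
```

For example, `upgrade_ollama_model familienarchiv` targets `familienarchiv_ollama_models`, and `upgrade_ollama_model archiv-production` targets the production volume. The `&&` chain stops at the first failing step, so the stack is not restarted against a volume that could not be removed.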
### Trigger a canonical import