diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index ca2e25f8..e56ca77a 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -280,6 +280,15 @@ git.raddatz.cloud A
 > **First start — Ollama model pull:** On first `docker compose up -d`, the `ollama-model-init` container pulls `qwen2.5:7b-instruct-q4_K_M` (~4.7 GB). At 10 Mbps this takes approximately 60–90 minutes; at 100 Mbps approximately 6–10 minutes. The pull is a one-time operation — subsequent restarts skip it (model already on the `ollama_models` volume). Monitor progress with `docker logs -f $(docker ps -q --filter name=ollama-model-init)`.
 >
 > **Do not use `--wait` on first deploy** — `docker compose up -d --wait` waits for all services to reach their health/completion target, including `ollama-model-init`. On first pull this blocks for 60–90 minutes and will time out any CI/deploy script that uses `--wait`.
+>
+> **Re-deploy idempotency:** on subsequent `docker compose up -d` runs (including `--force-recreate`), `ollama-model-init` re-executes but exits in seconds — Ollama's CLI skips the download when the model digest already matches what is on the volume.
+>
+> **Verify NL search is active** after enabling Ollama (`APP_OLLAMA_BASE_URL=http://ollama:11434`):
+> ```bash
+> curl -s http://localhost:8080/api/nl-search?q=brief+von+grossmutter
+> # Returns 200 with results → NL search is active
+> # Returns 503 NL_SEARCH_UNAVAILABLE → Ollama is not reachable or APP_OLLAMA_BASE_URL is unset
+> ```
 
 ```bash
 # 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
@@ -576,13 +585,23 @@ bash scripts/download-kraken-models.sh
 
 > Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.
 
-### Manage the `ollama_models` volume
+### Upgrade the Ollama model
 
-> **`ollama_models` volume:** holds model weights only — fully reproducible by re-pull, no backup needed. If the volume fills after a model upgrade:
-> ```bash
-> docker volume rm ollama_models && docker compose up -d
-> ```
-> The init container re-pulls the model on next startup.
+To switch to a newer model version (e.g. a future release of `qwen2.5`):
+
+1. Update the model name in the `ollama-model-init` `command:` in `docker-compose.yml`.
+2. Remove the existing model volume to free the old weights:
+   ```bash
+   docker volume rm familienarchiv_ollama_models
+   ```
+   (In production the volume name is prefixed with the compose project: `archiv-production_ollama_models`.)
+3. Restart the stack:
+   ```bash
+   docker compose up -d
+   ```
+   The `ollama-model-init` container pulls the new model weights on first start (~4–8 GB download depending on the model). The `ollama` inference server will not start until the pull completes (`condition: service_completed_successfully`).
+
+> **`ollama_models` volume:** holds model weights only — fully reproducible by re-pull, no backup needed.
 
 ### Trigger a canonical import
 
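
Review note on the "Verify NL search is active" step in the first hunk: plain `curl -s` prints only the response body, so the 200 vs. 503 distinction is not visible unless the status code is requested explicitly. The sketch below is one possible way to surface it, not something the repo ships. The endpoint URL and the meaning of 200 and 503 are taken from the doc text above; the function names and the `NL_SEARCH_URL` variable are hypothetical. The URL is quoted so that shells which treat `?` as a glob character do not reject it.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and NL_SEARCH_URL are illustrative assumptions.
# The URL and the 200/503 semantics come from the DEPLOYMENT.md text in this diff.

NL_SEARCH_URL="${NL_SEARCH_URL:-http://localhost:8080/api/nl-search?q=brief+von+grossmutter}"

# Map an HTTP status code to a short verdict.
classify_nl_search_status() {
  case "$1" in
    200) echo "active" ;;        # results returned, NL search works
    503) echo "unavailable" ;;   # NL_SEARCH_UNAVAILABLE per the doc
    *)   echo "unexpected" ;;    # e.g. 000 when the backend is not reachable
  esac
}

# Fetch only the status code; the response body is discarded.
# curl reports 000 when no HTTP response was received.
nl_search_status() {
  curl -s -o /dev/null -w '%{http_code}' "$NL_SEARCH_URL" || true
}

# Print a one-line verdict; exit status is 0 only when NL search is active.
check_nl_search() {
  local code verdict
  code="$(nl_search_status)"
  verdict="$(classify_nl_search_status "$code")"
  echo "HTTP ${code}: NL search ${verdict}"
  [ "$verdict" = "active" ]
}
```

Sourcing the file and calling `check_nl_search` once the stack is up gives a deploy script an exit status to branch on, which the body-only `curl -s` form does not.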