fix(infra): Ollama missing from prod/staging compose + broken model-init recipe — NL search 503 on staging #758

Closed
opened 2026-06-06 19:18:31 +02:00 by marcel · 1 comment
Owner

Symptom

On staging, every natural-language search returns "Intelligente Suche nicht verfügbar" (HTTP 503 SMART_SEARCH_UNAVAILABLE). Backend logs:

WARN org.raddatz.familienarchiv.search.RestClientOllamaClient — Ollama inference failed: ResourceAccessException

Root cause

Two independent defects, both downstream of #737:

  1. Ollama was never added to docker-compose.prod.yml. #737 added the ollama + ollama-model-init services and the ollama_models volume to the dev docker-compose.yml only. Staging and production deploy from docker-compose.prod.yml (a self-contained file, not an overlay), which has no Ollama service. The backend defaults to app.ollama.base-url: http://ollama:11434 (application.yaml), so the conditional client bean is active and tries to connect to a host that does not exist → ResourceAccessException → 503. Confirmed on the host: no ollama container, no process, nothing on :11434.

  2. The model-init recipe merged in #737 is broken (so even the dev compose never actually worked):

    • The ollama/ollama image's ENTRYPOINT is ollama, so command: sh -c "..." is parsed as ollama sh -c "..."
      Error: unknown command "sh" for "ollama" → init exits 1 → ollama never starts (service_completed_successfully gate).
    • The image ships no curl, so both the init readiness loop (until curl -sf .../api/tags) and the
      service healthcheck (["CMD","curl","-f",...]) can never succeed.

Hotfix already applied to staging (2026-06-06)

To restore NL search immediately I patched the on-disk /opt/familienarchiv/docker-compose.prod.yml, pulled the model, and verified end-to-end. The corrected recipe (entrypoint override + ollama list for readiness/health, no curl) is now running:

  • ollama-model-init exits 0; model qwen2.5:7b-instruct-q4_K_M (4.7 GB) cached in the ollama-models volume
  • ollama container is healthy
  • docker exec archiv-staging-backend-1 wget -qO- http://ollama:11434/api/tags returns the model list
  • one-shot inference via ollama run succeeds within the 8 GB limit

⚠️ This on-disk patch will be overwritten by the next CI deploy (which checks out the repo docker-compose.prod.yml, currently without Ollama). The fix must land in the repo — this issue + PR.

Fix (repo)

  • docker-compose.prod.yml: add ollama-model-init + ollama services and the ollama-models volume, using the corrected recipe:
    • entrypoint: ["/bin/sh", "-c"] on the init container; command uses until ollama list >/dev/null 2>&1; do sleep 1; done for readiness (no curl)
    • service healthcheck: ["CMD", "ollama", "list"] (no curl)
    • no container_name (prod namespaces by compose project); ADR-019 hardening (read_only, cap_drop: [ALL], no-new-privileges, tmpfs /tmp)
  • docker-compose.yml (dev): fix the same broken model-init entrypoint/command and the curl healthcheck so the dev stack actually starts Ollama.

Out of scope / follow-ups

  • The merged RestClientOllamaClient does not send an Authorization: Bearer header (the OLLAMA_API_KEY plumbing described in #737 was not implemented). OLLAMA_API_KEY is therefore omitted from the prod service for now. Track separately if auth is still wanted.
  • docs/DEPLOYMENT.md NL-search hardware tier + env-var rows, Prometheus ollama scrape job, and the Grafana latency dashboard (all listed in #737) remain open.

Acceptance

  • Fresh docker compose -f docker-compose.prod.yml ... up -d starts ollama-model-init → exits 0 → ollama reaches healthy within start_period.
  • docker exec <backend> wget -qO- http://ollama:11434/api/tags returns the model list.
  • A real NL search on staging returns smart results instead of 503.
  • Dev docker compose up -d likewise brings Ollama to healthy.
## Symptom On staging, every natural-language search returns **"Intelligente Suche nicht verfügbar"** (HTTP 503 `SMART_SEARCH_UNAVAILABLE`). Backend logs: ``` WARN org.raddatz.familienarchiv.search.RestClientOllamaClient — Ollama inference failed: ResourceAccessException ``` ## Root cause Two independent defects, both downstream of #737: 1. **Ollama was never added to `docker-compose.prod.yml`.** #737 added the `ollama` + `ollama-model-init` services and the `ollama_models` volume to the **dev** `docker-compose.yml` only. Staging and production deploy from `docker-compose.prod.yml` (a self-contained file, *not* an overlay), which has no Ollama service. The backend defaults to `app.ollama.base-url: http://ollama:11434` (`application.yaml`), so the conditional client bean is active and tries to connect to a host that does not exist → `ResourceAccessException` → 503. Confirmed on the host: no `ollama` container, no process, nothing on `:11434`. 2. **The model-init recipe merged in #737 is broken** (so even the dev compose never actually worked): - The `ollama/ollama` image's `ENTRYPOINT` is `ollama`, so `command: sh -c "..."` is parsed as `ollama sh -c "..."` → `Error: unknown command "sh" for "ollama"` → init exits 1 → `ollama` never starts (`service_completed_successfully` gate). - The image ships **no `curl`**, so both the init readiness loop (`until curl -sf .../api/tags`) and the service healthcheck (`["CMD","curl","-f",...]`) can never succeed. ## Hotfix already applied to staging (2026-06-06) To restore NL search immediately I patched the on-disk `/opt/familienarchiv/docker-compose.prod.yml`, pulled the model, and verified end-to-end. The corrected recipe (entrypoint override + `ollama list` for readiness/health, no curl) is now running: - `ollama-model-init` exits **0**; model `qwen2.5:7b-instruct-q4_K_M` (4.7 GB) cached in the `ollama-models` volume - `ollama` container is **healthy** - `docker exec archiv-staging-backend-1 wget -qO- http://ollama:11434/api/tags` returns the model list - one-shot inference via `ollama run` succeeds within the 8 GB limit ⚠️ This on-disk patch will be **overwritten by the next CI deploy** (which checks out the repo `docker-compose.prod.yml`, currently without Ollama). The fix must land in the repo — this issue + PR. ## Fix (repo) - [ ] `docker-compose.prod.yml`: add `ollama-model-init` + `ollama` services and the `ollama-models` volume, using the **corrected** recipe: - `entrypoint: ["/bin/sh", "-c"]` on the init container; command uses `until ollama list >/dev/null 2>&1; do sleep 1; done` for readiness (no curl) - service `healthcheck: ["CMD", "ollama", "list"]` (no curl) - no `container_name` (prod namespaces by compose project); ADR-019 hardening (`read_only`, `cap_drop: [ALL]`, `no-new-privileges`, tmpfs `/tmp`) - [ ] `docker-compose.yml` (dev): fix the same broken `model-init` entrypoint/command and the curl healthcheck so the dev stack actually starts Ollama. ## Out of scope / follow-ups - The merged `RestClientOllamaClient` does **not** send an `Authorization: Bearer` header (the `OLLAMA_API_KEY` plumbing described in #737 was not implemented). `OLLAMA_API_KEY` is therefore omitted from the prod service for now. Track separately if auth is still wanted. - `docs/DEPLOYMENT.md` NL-search hardware tier + env-var rows, Prometheus `ollama` scrape job, and the Grafana latency dashboard (all listed in #737) remain open. ## Acceptance - Fresh `docker compose -f docker-compose.prod.yml ... up -d` starts `ollama-model-init` → exits 0 → `ollama` reaches healthy within `start_period`. - `docker exec <backend> wget -qO- http://ollama:11434/api/tags` returns the model list. - A real NL search on staging returns smart results instead of 503. - Dev `docker compose up -d` likewise brings Ollama to healthy.
marcel added this to the Archive Intelligence — NL Search milestone 2026-06-06 19:18:31 +02:00
marcel added the P1-highbugdevops labels 2026-06-06 19:18:36 +02:00
Author
Owner

Follow-up: second root cause — idle model unload (cold-load timeout)

After the deploy fix, NL search worked on the first query and then went 503 again within minutes. The Ollama container stayed healthy throughout (ollama list passes regardless of whether the model is in RAM), so this was a separate issue:

  • ollama ps showed UNTIL <N> minutes from now — Ollama unloads the model after its default ~5 min keep-alive.
  • Once unloaded, the next query cold-loads the 4.7 GB model, which exceeds the backend's 30 s read timeout → ResourceAccessExceptionSMART_SEARCH_UNAVAILABLE. Warm inference is ~18 s; the cold load after idle is what timed out. (The original 17:20 failures in the first report were the same cold-load timeout right after the container restart.)

Fix (added to #759):

  • OLLAMA_KEEP_ALIVE=-1 on the ollama service (both compose files) — model stays resident, no idle unload. Verified on staging: ollama ps now shows UNTIL Forever; host has 47 GB free so the pinned ~5 GB is comfortable.
  • app.ollama.timeout-seconds 30 → 60 in application.yaml — absorbs the one unavoidable cold load (first query after an Ollama restart, before the model is pinned).

Staging is live again (keep-alive applied to the on-disk compose + container recreated). The timeout bump takes effect for prod/staging on the next backend deploy when #759 merges.

## Follow-up: second root cause — idle model unload (cold-load timeout) After the deploy fix, NL search worked on the first query and then went 503 again within minutes. The Ollama container stayed **healthy** throughout (`ollama list` passes regardless of whether the model is in RAM), so this was a separate issue: - `ollama ps` showed `UNTIL <N> minutes from now` — Ollama unloads the model after its default **~5 min keep-alive**. - Once unloaded, the next query cold-loads the 4.7 GB model, which exceeds the backend's **30 s** read timeout → `ResourceAccessException` → `SMART_SEARCH_UNAVAILABLE`. Warm inference is ~18 s; the cold load after idle is what timed out. (The original 17:20 failures in the first report were the same cold-load timeout right after the container restart.) **Fix (added to #759):** - `OLLAMA_KEEP_ALIVE=-1` on the ollama service (both compose files) — model stays resident, no idle unload. Verified on staging: `ollama ps` now shows `UNTIL Forever`; host has 47 GB free so the pinned ~5 GB is comfortable. - `app.ollama.timeout-seconds` 30 → 60 in `application.yaml` — absorbs the one unavoidable cold load (first query after an Ollama restart, before the model is pinned). Staging is live again (keep-alive applied to the on-disk compose + container recreated). The timeout bump takes effect for prod/staging on the next backend deploy when #759 merges.
Sign in to join this conversation.
No Label P1-high bug devops
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#758