fix(infra): Ollama missing from prod/staging compose + broken model-init recipe — NL search 503 on staging #758
Symptom
On staging, every natural-language (NL) search returns "Intelligente Suche nicht verfügbar" ("Smart search unavailable"), HTTP 503 `SMART_SEARCH_UNAVAILABLE`. Backend logs:
Root cause
Two independent defects, both downstream of #737:
1. Ollama was never added to `docker-compose.prod.yml`. #737 added the `ollama` + `ollama-model-init` services and the `ollama_models` volume to the dev `docker-compose.yml` only. Staging and production deploy from `docker-compose.prod.yml` (a self-contained file, not an overlay), which has no Ollama service. The backend defaults to `app.ollama.base-url: http://ollama:11434` (`application.yaml`), so the conditional client bean is active and tries to connect to a host that does not exist → `ResourceAccessException` → 503. Confirmed on the host: no `ollama` container, no process, nothing on `:11434`.
2. The model-init recipe merged in #737 is broken (so even the dev compose never actually worked):
   - The `ollama/ollama` image's `ENTRYPOINT` is `ollama`, so `command: sh -c "..."` is parsed as `ollama sh -c "..."` → `Error: unknown command "sh" for "ollama"` → init exits 1 → `ollama` never starts (`service_completed_successfully` gate).
   - The image does not ship `curl`, so both the init readiness loop (`until curl -sf .../api/tags`) and the service healthcheck (`["CMD","curl","-f",...]`) can never succeed.
Hotfix already applied to staging (2026-06-06)
To restore NL search immediately I patched the on-disk `/opt/familienarchiv/docker-compose.prod.yml`, pulled the model, and verified end-to-end. The corrected recipe (entrypoint override + `ollama list` for readiness/health, no curl) is now running:
- `ollama-model-init` exits 0; model `qwen2.5:7b-instruct-q4_K_M` (4.7 GB) cached in the `ollama-models` volume
- `ollama` container is healthy
- `docker exec archiv-staging-backend-1 wget -qO- http://ollama:11434/api/tags` returns the model list
- `ollama run` succeeds within the 8 GB limit
⚠️ This on-disk patch will be overwritten by the next CI deploy (which checks out the repo's `docker-compose.prod.yml`, currently without Ollama). The fix must land in the repo: this issue + PR.
Fix (repo)
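For orientation, the changes listed in this section might look roughly like the fragment below. This is a sketch, not the merged file: the untagged image, the `/root/.ollama` mount path, the temporary `ollama serve &` inside the init container, and all healthcheck timings are assumptions made for illustration. Only the entrypoint override, the `ollama list` readiness loop and healthcheck, the model name, the volume name, and the hardening options come from this issue.

```yaml
# Sketch only: image tag, mount path, and healthcheck timings are assumptions.
services:
  ollama-model-init:
    image: ollama/ollama                 # tag not stated in this issue
    entrypoint: ["/bin/sh", "-c"]        # bypass the image's `ollama` ENTRYPOINT
    command:
      - |
        # Assumed flow: start a temporary server, wait for it, pull the model.
        ollama serve &
        until ollama list >/dev/null 2>&1; do sleep 1; done
        ollama pull qwen2.5:7b-instruct-q4_K_M
    volumes:
      - ollama-models:/root/.ollama      # assumed: the image's default model directory

  ollama:
    image: ollama/ollama
    depends_on:
      ollama-model-init:
        condition: service_completed_successfully
    volumes:
      - ollama-models:/root/.ollama
    healthcheck:
      test: ["CMD", "ollama", "list"]    # the image has no curl
      interval: 10s                      # illustrative timings
      timeout: 5s
      retries: 5
      start_period: 30s
    read_only: true                      # ADR-019 hardening
    cap_drop: [ALL]
    security_opt:
      - no-new-privileges:true
    tmpfs:
      - /tmp

volumes:
  ollama-models:
```

Neither service sets `container_name`, so the compose project name keeps prod and staging containers apart.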
- `docker-compose.prod.yml`: add `ollama-model-init` + `ollama` services and the `ollama-models` volume, using the corrected recipe:
  - `entrypoint: ["/bin/sh", "-c"]` on the init container; command uses `until ollama list >/dev/null 2>&1; do sleep 1; done` for readiness (no curl)
  - `healthcheck: ["CMD", "ollama", "list"]` (no curl)
  - no `container_name` (prod namespaces by compose project); ADR-019 hardening (`read_only`, `cap_drop: [ALL]`, `no-new-privileges`, tmpfs `/tmp`)
- `docker-compose.yml` (dev): fix the same broken `model-init` entrypoint/command and the curl healthcheck so the dev stack actually starts Ollama.
Out of scope / follow-ups
- `RestClientOllamaClient` does not send an `Authorization: Bearer` header (the `OLLAMA_API_KEY` plumbing described in #737 was not implemented). `OLLAMA_API_KEY` is therefore omitted from the prod service for now. Track separately if auth is still wanted.
- `docs/DEPLOYMENT.md` NL-search hardware tier + env-var rows, Prometheus `ollama` scrape job, and the Grafana latency dashboard (all listed in #737) remain open.
Acceptance
- `docker compose -f docker-compose.prod.yml ... up -d` starts `ollama-model-init` → exits 0 → `ollama` reaches healthy within `start_period`.
- `docker exec <backend> wget -qO- http://ollama:11434/api/tags` returns the model list.
- `docker compose up -d` likewise brings Ollama to healthy.
Follow-up: second root cause (idle model unload, cold-load timeout)
After the deploy fix, NL search worked on the first query and then went 503 again within minutes. The Ollama container stayed healthy throughout (`ollama list` passes regardless of whether the model is in RAM), so this was a separate issue:
- `ollama ps` showed `UNTIL <N> minutes from now`: Ollama unloads the model after its default ~5 min keep-alive.
- The first query after an unload triggers a cold load that exceeds the client timeout → `ResourceAccessException` → `SMART_SEARCH_UNAVAILABLE`. Warm inference is ~18 s; the cold load after idle is what timed out. (The original 17:20 failures in the first report were the same cold-load timeout right after the container restart.)
Fix (added to #759):
- `OLLAMA_KEEP_ALIVE=-1` on the `ollama` service (both compose files): model stays resident, no idle unload. Verified on staging: `ollama ps` now shows `UNTIL Forever`; host has 47 GB free so the pinned ~5 GB is comfortable.
- `app.ollama.timeout-seconds` 30 → 60 in `application.yaml`: absorbs the one unavoidable cold load (first query after an Ollama restart, before the model is pinned).
Staging is live again (keep-alive applied to the on-disk compose + container recreated). The timeout bump takes effect for prod/staging on the next backend deploy when #759 merges.
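The two settings above could be expressed roughly as follows. Both fragments are sketches that show only the keys named in this issue; the surrounding structure of the real files is assumed, not copied from the repo.

```yaml
# Compose fragment (both compose files): pin the model in memory.
services:
  ollama:
    environment:
      OLLAMA_KEEP_ALIVE: "-1"   # quoted so it is passed as a string; -1 disables idle unload
```

```yaml
# application.yaml fragment: client timeout raised from 30 to 60 seconds.
app:
  ollama:
    base-url: http://ollama:11434
    timeout-seconds: 60
```

With the model pinned, only the first query after an Ollama restart pays the cold-load cost, which is the case the longer timeout is meant to cover.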