fix(infra): deploy Ollama to prod/staging compose + fix broken model-init recipe

NL search returned 503 (SMART_SEARCH_UNAVAILABLE / "Intelligente Suche nicht verfügbar") on staging because Ollama was never reachable. Two defects, both downstream of #737: 1. Ollama was added only to the dev docker-compose.yml. Staging/prod deploy from the self-contained docker-compose.prod.yml, which had no ollama service — so the backend (defaulting to http://ollama:11434) hit a non-existent host (ResourceAccessException -> 503). 2. The merged model-init recipe never worked: the ollama/ollama image ENTRYPOINT is `ollama` (so `command: sh -c ...` ran as `ollama sh ...` -> "unknown command sh"), and the image ships no curl (so both the readiness loop and the healthcheck could never pass). - docker-compose.prod.yml: add ollama-model-init + ollama services and the ollama-models volume, with the corrected recipe (entrypoint override to /bin/sh -c, `ollama list` for readiness and healthcheck). - docker-compose.yml: fix the same broken entrypoint/command and the curl healthcheck so the dev stack actually starts Ollama. Verified on staging end-to-end: model-init exits 0, ollama healthy, backend reaches /api/tags, inference succeeds within the 8g limit. Refs #758 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 19:20:22 +02:00
parent 87af9ab446
commit b665e1132d
2 changed files with 67 additions and 3 deletions
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -161,8 +161,11 @@ services:
      - ALL
    security_opt:
      - no-new-privileges:true
-    command: >
-      sh -c "ollama serve & SERVE_PID=$$! && until curl -sf http://localhost:11434/api/tags; do sleep 1; done && ollama pull qwen2.5:7b-instruct-q4_K_M && kill $$SERVE_PID"
+    # The image ENTRYPOINT is `ollama`, so override it to a shell; the image has
+    # no curl, so readiness is probed with `ollama list` instead of a curl loop.
+    entrypoint: ["/bin/sh", "-c"]
+    command:
+      - "ollama serve & until ollama list >/dev/null 2>&1; do sleep 1; done && ollama pull qwen2.5:7b-instruct-q4_K_M"

  # --- Ollama: LLM inference server ---
  # Serves the pre-pulled model for NL search inference.
@@ -191,7 +194,9 @@ services:
    security_opt:
      - no-new-privileges:true
    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
+      # `ollama list` hits the local API and exits non-zero if the server is
+      # down — used instead of curl, which the image does not ship.
+      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 5