fix(search): pin Ollama model in memory + raise read timeout

NL search recovered after deploy but went 503 again after a few minutes:
Ollama unloads the model after its default ~5 min keep-alive, so the next
query cold-loads the 4.7 GB model and exceeds the backend's 30s read timeout
(ResourceAccessException -> SMART_SEARCH_UNAVAILABLE). Warm inference is
~18s; the cold load after idle is what timed out.

- docker-compose.{prod,yml}: set OLLAMA_KEEP_ALIVE=-1 on the ollama service
  so the model stays resident and never pays a cold-load penalty during
  normal operation (verified on staging: `ollama ps` -> UNTIL "Forever";
  host has 47 GB free).
- application.yaml: raise app.ollama.timeout-seconds 30 -> 60 so the one
  unavoidable cold load (first query after an Ollama restart, before the
  model is pinned) completes instead of timing out.

Refs #758

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
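
The timeout reasoning above can be sketched as a small budget check. This is an illustration only, not code from the repository: `fits_read_timeout` is a hypothetical helper, and the 25s cold-load figure is an assumed placeholder, since the message says only that cold load plus inference exceeded 30s. The ~18s warm inference time and the 30s and 60s timeouts come from the message.

```python
def fits_read_timeout(load_s: float, inference_s: float, timeout_s: float) -> bool:
    """Return True if model load plus inference finishes within the read timeout."""
    return load_s + inference_s <= timeout_s


WARM_INFERENCE_S = 18.0  # from the commit message (~18s warm inference)
COLD_LOAD_S = 25.0       # ASSUMPTION: placeholder, real cold-load time is not stated
OLD_TIMEOUT_S = 30.0     # previous app.ollama.timeout-seconds
NEW_TIMEOUT_S = 60.0     # new app.ollama.timeout-seconds

# Model already resident: no load cost, so the old timeout was enough.
warm_ok_old = fits_read_timeout(0.0, WARM_INFERENCE_S, OLD_TIMEOUT_S)

# Model evicted after idle: load + inference overruns the old timeout.
cold_ok_old = fits_read_timeout(COLD_LOAD_S, WARM_INFERENCE_S, OLD_TIMEOUT_S)

# Same cold request under the raised timeout.
cold_ok_new = fits_read_timeout(COLD_LOAD_S, WARM_INFERENCE_S, NEW_TIMEOUT_S)

print(warm_ok_old, cold_ok_old, cold_ok_new)
```

Under these assumed numbers, the warm request fits the old timeout, the cold request does not, and the cold request fits the new one. Pinning the model removes the load term during normal operation; the raised timeout covers the one remaining cold load after an Ollama restart.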
2026-06-06 19:27:02 +02:00
parent b665e1132d
commit 9e97687d0f
3 changed files with 11 additions and 1 deletions
--- a/docker-compose.prod.yml
+++ b/docker-compose.prod.yml
@@ -239,6 +239,11 @@ services:
      - archiv-net
    volumes:
      - ollama-models:/root/.ollama
+    environment:
+      # Pin the model in memory (no idle unload). Without this, Ollama evicts
+      # the model after ~5 min idle and the next query pays a cold-load penalty
+      # that exceeds the backend read timeout → NL search 503 after idle.
+      OLLAMA_KEEP_ALIVE: "-1"
    cpus: "${OLLAMA_CPU_LIMIT:-4.0}"
    mem_limit: "${OLLAMA_MEM_LIMIT:-8g}"
    memswap_limit: "${OLLAMA_MEM_LIMIT:-8g}"