familienarchiv/docker-compose.yml at a2f37f85a6bdbba520b5ad665e4e9ba7d6b64877

marcel/familienarchiv

Marcel 9e97687d0f

CI / Unit & Component Tests (pull_request) Successful in 3m18s

CI / OCR Service Tests (pull_request) Successful in 22s

CI / Backend Unit Tests (pull_request) Successful in 3m55s

CI / fail2ban Regex (pull_request) Successful in 51s

CI / Semgrep Security Scan (pull_request) Successful in 22s

CI / Compose Bucket Idempotency (pull_request) Successful in 1m8s

fix(search): pin Ollama model in memory + raise read timeout

NL search recovered after the deploy but started returning 503 again after
a few minutes: Ollama unloads the model after its default ~5 min keep-alive,
so the next query cold-loads the 4.7 GB model and exceeds the backend's 30s
read timeout (ResourceAccessException -> SMART_SEARCH_UNAVAILABLE). Warm
inference takes ~18s; only the cold load after an idle period timed out.

- docker-compose.{prod,yml}: set OLLAMA_KEEP_ALIVE=-1 on the ollama
  service so the model stays resident and never pays a cold-load penalty
  during normal operation (verified on staging: `ollama ps` -> UNTIL
  "Forever"; host has 47 GB free).
- application.yaml: raise app.ollama.timeout-seconds 30 -> 60 so the one
  unavoidable cold load (first query after an Ollama restart, before the
  model is pinned) completes instead of timing out.
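
A minimal sketch of what the two changes above might look like, assuming a
service named `ollama` and a nested YAML layout for the
`app.ollama.timeout-seconds` property. The image name and every key not
mentioned in this message are assumptions, not copied from the repository:

```yaml
# docker-compose.yml (fragment; image and surrounding keys are assumed)
services:
  ollama:
    image: ollama/ollama   # assumed image; keep whatever the service already uses
    environment:
      # A negative keep-alive keeps loaded models resident indefinitely,
      # so `ollama ps` reports UNTIL "Forever".
      OLLAMA_KEEP_ALIVE: "-1"
```

```yaml
# application.yaml (fragment; nesting inferred from app.ollama.timeout-seconds)
app:
  ollama:
    timeout-seconds: 60   # was 30; leaves headroom for one cold load
```

The value is quoted in the compose fragment so it reaches the container as
the string "-1" regardless of how the YAML parser types it.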

Refs #758

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-06 19:27:02 +02:00

11 KiB
