Capture the why behind deploying Ollama to prod/staging compose: the
corrected init recipe (supersedes ADR-028 §10's never-functional curl
loop), the OLLAMA_KEEP_ALIVE=-1 pin (so a future maintainer doesn't
optimize it away and reintroduce the post-idle cold-load 503), the
30->60s timeout NFR, and the memswap==mem hard-OOM trade-off.
Addresses #759 review (Markus #3, Nora #2).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The diagram declared Container(ollama, ...) twice — an alias collision that
renders a duplicate box. It also declared the backend->ollama relationship
twice. Keep the richer 'Ollama LLM Service' declaration and the more
specific 'NL query parsing (POST /api/generate)' relationship; drop the
duplicates.
Addresses #759 review (Markus #2).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mirror the prod hardening in the dev stack: guard the model pull with
`ollama list | grep -q <model>` so an already-cached model exits clean
without a registry round-trip. Keeps dev and prod on one recipe.
Addresses #759 review (Tobias #1).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The init command unconditionally ran `ollama pull`, which contacts the
registry to verify the manifest digest even when the model is already on
the volume. A host reboot during a registry/upstream-network blip would
then fail init non-zero, the `service_completed_successfully` gate would
never be met, and the ollama service (hence NL search) would stay down
until the registry was reachable again.
Guard the pull with `ollama list | grep -q <model>` so a cached model
exits clean without any registry round-trip.
Addresses #759 review (Tobias #1).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
docker-compose.prod.yml declares the volume as `ollama-models` (hyphen),
so the compose-project-prefixed name is `archiv-production_ollama-models`,
not the underscored `archiv-production_ollama_models` the model-upgrade
guide documented. The documented `docker volume rm` would not have matched
the real volume.
Addresses #759 review (Tobias #2).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
application.yaml sets app.ollama.timeout-seconds: 60 (raised from 30 to
absorb the cold model load on the first query after an Ollama restart),
but DEPLOYMENT.md still documented 30. A doc that contradicts the shipped
value is a traceability defect.
Addresses #759 review (Markus, Felix, Elicit).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
NL search recovered after deploy but went 503 again after a few minutes:
Ollama unloads the model after its default ~5 min keep-alive, so the next
query cold-loads the 4.7 GB model and exceeds the backend's 30s read
timeout (ResourceAccessException -> SMART_SEARCH_UNAVAILABLE). Warm
inference is ~18s; the cold load after idle is what timed out.
- docker-compose.{prod,yml}: set OLLAMA_KEEP_ALIVE=-1 on the ollama
service so the model stays resident and never pays a cold-load penalty
during normal operation (verified on staging: `ollama ps` -> UNTIL
"Forever"; host has 47 GB free).
- application.yaml: raise app.ollama.timeout-seconds 30 -> 60 so the one
unavoidable cold load (first query after an Ollama restart, before the
model is pinned) completes instead of timing out.
Refs #758
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
NL search returned 503 (SMART_SEARCH_UNAVAILABLE / "Intelligente Suche
nicht verfügbar") on staging because Ollama was never reachable.
Two defects, both downstream of #737:
1. Ollama was added only to the dev docker-compose.yml. Staging/prod
deploy from the self-contained docker-compose.prod.yml, which had no
ollama service — so the backend (defaulting to http://ollama:11434)
hit a non-existent host (ResourceAccessException -> 503).
2. The merged model-init recipe never worked: the ollama/ollama image
ENTRYPOINT is `ollama` (so `command: sh -c ...` ran as `ollama sh ...`
-> "unknown command sh"), and the image ships no curl (so both the
readiness loop and the healthcheck could never pass).
- docker-compose.prod.yml: add ollama-model-init + ollama services and
the ollama-models volume, with the corrected recipe (entrypoint
override to /bin/sh -c, `ollama list` for readiness and healthcheck).
- docker-compose.yml: fix the same broken entrypoint/command and the
curl healthcheck so the dev stack actually starts Ollama.
Verified on staging end-to-end: model-init exits 0, ollama healthy,
backend reaches /api/tags, inference succeeds within the 8g limit.
Refs #758
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>