fix(infra): deploy Ollama to prod/staging compose + fix broken model-init recipe #759

Merged
marcel merged 8 commits from fix/issue-758-ollama-prod-compose into main 2026-06-06 20:30:35 +02:00

8 Commits

Author SHA1 Message Date
Marcel
ed98729f75 docs(adr): record prod Ollama deployment + keep-alive decision (ADR-034)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m23s
CI / OCR Service Tests (pull_request) Successful in 24s
CI / Backend Unit Tests (pull_request) Successful in 3m52s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 25s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m7s
CI / Unit & Component Tests (push) Successful in 3m23s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 3m52s
CI / fail2ban Regex (push) Successful in 46s
CI / Semgrep Security Scan (push) Successful in 23s
CI / Compose Bucket Idempotency (push) Successful in 1m4s
nightly / deploy-staging (push) Successful in 2m44s
Capture the why behind deploying Ollama to prod/staging compose: the
corrected init recipe (supersedes ADR-028 §10's never-functional curl
loop), the OLLAMA_KEEP_ALIVE=-1 pin (so a future maintainer doesn't
optimize it away and reintroduce the post-idle cold-load 503), the
30->60s timeout NFR, and the memswap==mem hard-OOM trade-off.

Addresses #759 review (Markus #3, Nora #2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 20:16:03 +02:00
Marcel
db87a64cc0 docs(c4): de-duplicate Ollama container in l2-containers diagram
The diagram declared Container(ollama, ...) twice — an alias collision that
renders a duplicate box. It also declared the backend->ollama relationship
twice. Keep the richer 'Ollama LLM Service' declaration and the more
specific 'NL query parsing (POST /api/generate)' relationship; drop the
duplicates.

Addresses #759 review (Markus #2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 20:14:26 +02:00
Marcel
d7d6d0638c fix(infra): make dev Ollama model-init offline-safe
Mirror the prod hardening in the dev stack: guard the model pull with
`ollama list | grep -q <model>` so an already-cached model exits clean
without a registry round-trip. Keeps dev and prod on one recipe.

Addresses #759 review (Tobias #1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 20:13:19 +02:00
Marcel
a2f37f85a6 fix(infra): make prod Ollama model-init offline-safe
The init command unconditionally ran `ollama pull`, which contacts the
registry to verify the manifest digest even when the model is already on
the volume. A host reboot during a registry/upstream-network blip would
then fail init non-zero, the `service_completed_successfully` gate would
never be met, and the ollama service (hence NL search) would stay down
until the registry was reachable again.

Guard the pull with `ollama list | grep -q <model>` so a cached model
exits clean without any registry round-trip.

Addresses #759 review (Tobias #1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 20:12:21 +02:00
Marcel
f22a1a1cfa docs(deploy): fix prod Ollama volume name to match hyphenated compose volume
docker-compose.prod.yml declares the volume as `ollama-models` (hyphen),
so the compose-project-prefixed name is `archiv-production_ollama-models`,
not the underscored `archiv-production_ollama_models` the model-upgrade
guide documented. The documented `docker volume rm` would not have matched
the real volume.

Addresses #759 review (Tobias #2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 20:09:48 +02:00
Marcel
2a0863cf3e docs(deploy): correct Ollama read timeout default to 60s
application.yaml sets app.ollama.timeout-seconds: 60 (raised from 30 to
absorb the cold model load on the first query after an Ollama restart),
but DEPLOYMENT.md still documented 30. A doc that contradicts the shipped
value is a traceability defect.

Addresses #759 review (Markus, Felix, Elicit).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 20:08:55 +02:00
Marcel
9e97687d0f fix(search): pin Ollama model in memory + raise read timeout
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m18s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m55s
CI / fail2ban Regex (pull_request) Successful in 51s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m8s
NL search recovered after deploy but went 503 again after a few minutes:
Ollama unloads the model after its default ~5 min keep-alive, so the next
query cold-loads the 4.7 GB model and exceeds the backend's 30s read
timeout (ResourceAccessException -> SMART_SEARCH_UNAVAILABLE). Warm
inference is ~18s; the cold load after idle is what timed out.

- docker-compose.{prod,yml}: set OLLAMA_KEEP_ALIVE=-1 on the ollama
  service so the model stays resident and never pays a cold-load penalty
  during normal operation (verified on staging: `ollama ps` -> UNTIL
  "Forever"; host has 47 GB free).
- application.yaml: raise app.ollama.timeout-seconds 30 -> 60 so the one
  unavoidable cold load (first query after an Ollama restart, before the
  model is pinned) completes instead of timing out.

Refs #758

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 19:27:02 +02:00
Marcel
b665e1132d fix(infra): deploy Ollama to prod/staging compose + fix broken model-init recipe
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 4m0s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m56s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 23s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
NL search returned 503 (SMART_SEARCH_UNAVAILABLE / "Intelligente Suche
nicht verfügbar") on staging because Ollama was never reachable.

Two defects, both downstream of #737:

1. Ollama was added only to the dev docker-compose.yml. Staging/prod
   deploy from the self-contained docker-compose.prod.yml, which had no
   ollama service — so the backend (defaulting to http://ollama:11434)
   hit a non-existent host (ResourceAccessException -> 503).

2. The merged model-init recipe never worked: the ollama/ollama image
   ENTRYPOINT is `ollama` (so `command: sh -c ...` ran as `ollama sh ...`
   -> "unknown command sh"), and the image ships no curl (so both the
   readiness loop and the healthcheck could never pass).

- docker-compose.prod.yml: add ollama-model-init + ollama services and
  the ollama-models volume, with the corrected recipe (entrypoint
  override to /bin/sh -c, `ollama list` for readiness and healthcheck).
- docker-compose.yml: fix the same broken entrypoint/command and the
  curl healthcheck so the dev stack actually starts Ollama.

Verified on staging end-to-end: model-init exits 0, ollama healthy,
backend reaches /api/tags, inference succeeds within the 8g limit.

Refs #758

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 19:20:22 +02:00