marcel/familienarchiv

fix(infra): deploy Ollama to prod/staging compose + fix broken model-init recipe #759

Merged

marcel merged 8 commits from fix/issue-758-ollama-prod-compose into main

2026-06-06 20:30:35 +02:00

Author	SHA1	Message	Date
Marcel	ed98729f75	docs(adr): record prod Ollama deployment + keep-alive decision (ADR-034) Capture the why behind deploying Ollama to prod/staging compose: the corrected init recipe (supersedes ADR-028 §10's never-functional curl loop), the OLLAMA_KEEP_ALIVE=-1 pin (so a future maintainer doesn't optimize it away and reintroduce the post-idle cold-load 503), the 30->60s timeout NFR, and the memswap==mem hard-OOM trade-off. Addresses #759 review (Markus #3, Nora #2). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 20:16:03 +02:00
Marcel	db87a64cc0	docs(c4): de-duplicate Ollama container in l2-containers diagram The diagram declared Container(ollama, ...) twice — an alias collision that renders a duplicate box. It also declared the backend->ollama relationship twice. Keep the richer 'Ollama LLM Service' declaration and the more specific 'NL query parsing (POST /api/generate)' relationship; drop the duplicates. Addresses #759 review (Markus #2). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 20:14:26 +02:00
Marcel	d7d6d0638c	fix(infra): make dev Ollama model-init offline-safe Mirror the prod hardening in the dev stack: guard the model pull with `ollama list | grep -q <model>` so an already-cached model exits clean without a registry round-trip. Keeps dev and prod on one recipe. Addresses #759 review (Tobias #1). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 20:13:19 +02:00
Marcel	a2f37f85a6	fix(infra): make prod Ollama model-init offline-safe The init command unconditionally ran `ollama pull`, which contacts the registry to verify the manifest digest even when the model is already on the volume. A host reboot during a registry/upstream-network blip would then fail init non-zero, the `service_completed_successfully` gate would never be met, and the ollama service (hence NL search) would stay down until the registry was reachable again. Guard the pull with `ollama list | grep -q <model>` so a cached model exits clean without any registry round-trip. Addresses #759 review (Tobias #1). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 20:12:21 +02:00
Marcel	f22a1a1cfa	docs(deploy): fix prod Ollama volume name to match hyphenated compose volume docker-compose.prod.yml declares the volume as `ollama-models` (hyphen), so the compose-project-prefixed name is `archiv-production_ollama-models`, not the underscored `archiv-production_ollama_models` the model-upgrade guide documented. The documented `docker volume rm` would not have matched the real volume. Addresses #759 review (Tobias #2). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 20:09:48 +02:00
Marcel	2a0863cf3e	docs(deploy): correct Ollama read timeout default to 60s application.yaml sets app.ollama.timeout-seconds: 60 (raised from 30 to absorb the cold model load on the first query after an Ollama restart), but DEPLOYMENT.md still documented 30. A doc that contradicts the shipped value is a traceability defect. Addresses #759 review (Markus, Felix, Elicit). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 20:08:55 +02:00
Marcel	9e97687d0f	fix(search): pin Ollama model in memory + raise read timeout NL search recovered after deploy but went 503 again after a few minutes: Ollama unloads the model after its default ~5 min keep-alive, so the next query cold-loads the 4.7 GB model and exceeds the backend's 30s read timeout (ResourceAccessException -> SMART_SEARCH_UNAVAILABLE). Warm inference is ~18s; the cold load after idle is what timed out. - docker-compose.prod.yml and docker-compose.yml: set OLLAMA_KEEP_ALIVE=-1 on the ollama service so the model stays resident and never pays a cold-load penalty during normal operation (verified on staging: `ollama ps` -> UNTIL "Forever"; host has 47 GB free). - application.yaml: raise app.ollama.timeout-seconds 30 -> 60 so the one unavoidable cold load (first query after an Ollama restart, before the model is pinned) completes instead of timing out. Refs #758 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 19:27:02 +02:00
Marcel	b665e1132d	fix(infra): deploy Ollama to prod/staging compose + fix broken model-init recipe NL search returned 503 (SMART_SEARCH_UNAVAILABLE / "Intelligente Suche nicht verfügbar", i.e. "Smart search unavailable") on staging because Ollama was never reachable. Two defects, both downstream of #737: 1. Ollama was added only to the dev docker-compose.yml. Staging/prod deploy from the self-contained docker-compose.prod.yml, which had no ollama service, so the backend (defaulting to http://ollama:11434) hit a non-existent host (ResourceAccessException -> 503). 2. The merged model-init recipe never worked: the ollama/ollama image ENTRYPOINT is `ollama` (so `command: sh -c ...` ran as `ollama sh ...` -> "unknown command sh"), and the image ships no curl (so both the readiness loop and the healthcheck could never pass). - docker-compose.prod.yml: add ollama-model-init + ollama services and the ollama-models volume, with the corrected recipe (entrypoint override to /bin/sh -c, `ollama list` for readiness and healthcheck). - docker-compose.yml: fix the same broken entrypoint/command and the curl healthcheck so the dev stack actually starts Ollama. Verified on staging end-to-end: model-init exits 0, ollama healthy, backend reaches /api/tags, inference succeeds within the 8g limit. Refs #758 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 19:20:22 +02:00
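
For readers without the diff open, the recipe the commits above describe might look roughly like the following Compose fragment. This is a sketch pieced together from the commit messages, not the contents of docker-compose.prod.yml: the `MODEL` variable, the `/root/.ollama` mount path, the healthcheck timings, and the exact shape of the init script are assumptions, and `<model>` is left as the placeholder the commits use.

```yaml
services:
  ollama-model-init:
    image: ollama/ollama
    # The image's ENTRYPOINT is `ollama`, so a shell script needs an explicit override.
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        # Start a temporary server; `ollama list` only answers once it is up.
        ollama serve &
        until ollama list >/dev/null 2>&1; do sleep 1; done
        # Offline-safe guard: skip the registry round-trip if the model is cached.
        # `$$` escapes Compose interpolation so the container shell sees $MODEL.
        ollama list | grep -q "$$MODEL" || ollama pull "$$MODEL"
    environment:
      MODEL: "<model>"              # placeholder, model name not given in the commits
    volumes:
      - ollama-models:/root/.ollama # assumed model directory inside the image

  ollama:
    image: ollama/ollama
    depends_on:
      ollama-model-init:
        condition: service_completed_successfully
    environment:
      OLLAMA_KEEP_ALIVE: "-1"       # keep the model resident, no post-idle cold load
    volumes:
      - ollama-models:/root/.ollama
    healthcheck:
      # The image ships no curl, so probe with the CLI instead.
      test: ["CMD", "ollama", "list"]
      interval: 10s                 # assumed timings
      retries: 12
    mem_limit: 8g
    memswap_limit: 8g               # memswap == mem: no swap, hard OOM at the limit

volumes:
  ollama-models:
```

With a volume key of `ollama-models`, Compose prefixes the project name, which is why the deploy guide fix above refers to `archiv-production_ollama-models` rather than an underscored name.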