feat(infra): Ollama Docker Compose service for NL search (#737) #749

Merged
marcel merged 15 commits from feat/issue-737-ollama-docker-compose into main 2026-06-06 15:19:06 +02:00

15 Commits

Author SHA1 Message Date
Marcel
7679596c70 docs(ollama): add model upgrade runbook + post-deploy smoke test to DEPLOYMENT.md
Some checks failed
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Semgrep Security Scan (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Unit & Component Tests (push) Successful in 3m16s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 3m37s
CI / fail2ban Regex (push) Successful in 47s
CI / Semgrep Security Scan (push) Successful in 22s
CI / Compose Bucket Idempotency (push) Successful in 1m4s
Addresses Elicit's and Sara's review concerns on PR #749:
- Expand §6 ollama_models section into a full model upgrade runbook (step-by-step
  docker volume rm + recreate, including production volume name prefix)
- Add re-deploy idempotency note to §3.4 (init container exits quickly when model
  already present on the volume)
- Add NL search smoke test to §3.4 (curl command distinguishing 200 from 503
  NL_SEARCH_UNAVAILABLE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
3d5dcd1f18 docs(deployment): fix OLLAMA_API_KEY version ref and add --wait warning
Updated OLLAMA_API_KEY env vars table from 0.6.5 to 0.6.5 or 0.30.6 to
match both tested versions. Added an explicit warning in §3.4 that
docker compose up -d --wait blocks for 60–90 min on first deploy when the
model pull has not yet completed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
52fca38f0f docs(env): correct OLLAMA_API_KEY comment — tested on 0.6.5 and 0.30.6
Both versions were tested and neither enforces the key. Comment updated to
say "0.6.5 or 0.30.6" and surface archiv-net as the sole effective control.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
662a8f3e80 fix(infra): interpolate APP_OLLAMA_BASE_URL so .env empty-value disables Ollama
Hardcoded literal overrides any .env setting — setting APP_OLLAMA_BASE_URL=
in .env had no effect on the backend container. Now uses the same pattern
as APP_OCR_TRAINING_TOKEN with a safe default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
cbba95c3f8 docs(c4): fix Ollama container version 0.6.5 → 0.30.6 in l2-containers.puml
Diagram must match the pinned image version in docker-compose.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
3536ed884c docs(adr): fix ADR-028 §12 false API-key claim, stale TBD, and §7 title
§12 stated OLLAMA_API_KEY guards against lateral movement — contradicts
§7's empirical finding that it is not enforced. Replaced with an accurate
note referencing §7. Stale pre-merge placeholder in Consequences ("Three
TBD items must be resolved") removed; all three are resolved. §7 section
title updated from "0.6.5" to "0.6.5 and 0.30.6" to match the body text.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
5a939d9222 fix(infra): escape \$\$SERVE_PID in compose command to prevent interpolation (#737)
Docker Compose interpolates $VAR in command strings — use $$ to pass a
literal $ to the shell so SERVE_PID=$! and kill $SERVE_PID work correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
93e90424ab docs(adr): update ADR-028 with 0.30.6 verified findings for API key + read_only (#737)
- OLLAMA_API_KEY: non-enforcement confirmed on both 0.6.5 and 0.30.6
- read_only: true: confirmed working on both 0.6.5 and 0.30.6
- Peak RSS during pull: ~108 MiB (well under 2g limit)
- All TBD placeholders resolved

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
e8f3004c4f feat(infra): add Ollama env vars to .env.example (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
9637ebbca2 feat(infra): add Ollama Docker Compose services for NL search (#737)
- ollama-model-init: one-shot init container that pulls qwen2.5:7b-instruct-q4_K_M
  into the ollama_models volume on first start
- ollama: main inference service on archiv-net (expose: only, no public port)
- ollama_models named volume for persistent model storage
- APP_OLLAMA_BASE_URL + APP_OLLAMA_API_KEY added to backend env
- Both services: cap_drop ALL, no-new-privileges, read_only+tmpfs (ADR-019 + ADR-028)
- start_period: 60s — model pre-pulled by init container

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
df10a42069 docs(deploy): document Ollama hardware requirements, env vars, and ops notes (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
64120a30b5 docs(arch): add Ollama container to C4 level-2 container diagram (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00
Marcel
25252fc709 feat(observability): add Grafana Ollama inference latency dashboard (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00
Marcel
1f379a161d fix(observability): fix OCR target name + add Ollama scrape job (#737)
- prometheus.yml: ocr:8000 → ocr-service:8000 (Docker service name is
  ocr-service, not ocr — current scrape target has never resolved)
- Add Ollama scrape job on ollama:11434 /metrics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00
Marcel
c0d034c85d docs(adr): add ADR-028 — Ollama Docker Compose service for NL search (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00