Addresses Elicit's and Sara's review concerns on PR #749:
- Expand §6 ollama_models section into a full model upgrade runbook (step-by-step
docker volume rm + recreate, including production volume name prefix)
- Add re-deploy idempotency note to §3.4 (init container exits quickly when model
already present on the volume)
- Add NL search smoke test to §3.4 (curl command distinguishing 200 from 503
NL_SEARCH_UNAVAILABLE)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Updated OLLAMA_API_KEY env vars table from 0.6.5 to 0.6.5 or 0.30.6 to
match both tested versions. Added an explicit warning in §3.4 that
docker compose up -d --wait blocks for 60–90 min on first deploy when the
model pull has not yet completed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both versions were tested and neither enforces the key. Comment updated to
say "0.6.5 or 0.30.6" and surface archiv-net as the sole effective control.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hardcoded literal overrides any .env setting — setting APP_OLLAMA_BASE_URL=
in .env had no effect on the backend container. Now uses the same pattern
as APP_OCR_TRAINING_TOKEN with a safe default.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
§12 stated OLLAMA_API_KEY guards against lateral movement — contradicts
§7's empirical finding that it is not enforced. Replaced with an accurate
note referencing §7. Stale pre-merge placeholder in Consequences ("Three
TBD items must be resolved") removed; all three are resolved. §7 section
title updated from "0.6.5" to "0.6.5 and 0.30.6" to match the body text.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Docker Compose interpolates $VAR in command strings — use $$ to pass a
literal $ to the shell so SERVE_PID=$! and kill $SERVE_PID work correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- OLLAMA_API_KEY: non-enforcement confirmed on both 0.6.5 and 0.30.6
- read_only: true: confirmed working on both 0.6.5 and 0.30.6
- Peak RSS during pull: ~108 MiB (well under 2g limit)
- All TBD placeholders resolved
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ollama-model-init: one-shot init container that pulls qwen2.5:7b-instruct-q4_K_M
into the ollama_models volume on first start
- ollama: main inference service on archiv-net (expose: only, no public port)
- ollama_models named volume for persistent model storage
- APP_OLLAMA_BASE_URL + APP_OLLAMA_API_KEY added to backend env
- Both services: cap_drop ALL, no-new-privileges, read_only+tmpfs (ADR-019 + ADR-028)
- start_period: 60s — model pre-pulled by init container
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- prometheus.yml: ocr:8000 → ocr-service:8000 (Docker service name is
ocr-service, not ocr — current scrape target has never resolved)
- Add Ollama scrape job on ollama:11434 /metrics
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>