feat(infra): Ollama Docker Compose service for NL search (#737) #749
Reference in New Issue
Block a user
Delete Branch "feat/issue-737-ollama-docker-compose"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #737. Part of epic #735.
Summary
ollama-model-init(one-shot model pull) +ollama(inference server) indocker-compose.yml, pinned toollama/ollama:0.30.6. Both run withread_only: true,cap_drop: ALL,no-new-privileges(ADR-019 baseline).prometheus.ymlOCR scrape target corrected fromocr:8000→ocr-service:8000(the scrape has never resolved — Docker service DNS uses the service name).infra/observability/grafana/provisioning/dashboards/ollama.json) with inference latency p50/p95 and request-rate panels, auto-provisioned.docs/DEPLOYMENT.mdupdated (hardware tiers, env vars, ops notes), C4l2-containers.pumlupdated.Pre-PR investigations (results in ADR-028)
read_only: true0.6.5and0.30.6— all three ops (serve, pull, list) succeedmem_limit: 2gon init container is adequateOLLAMA_API_KEYenforcementarchiv-netisolation is the only effective controlTest Plan
docker compose up -dstarts the Ollama service alongside the existing stackdocker exec archive-backend curl -sf http://ollama:11434/api/tagsreturns JSONdocker compose up -dwith model already in volume:docker logson init container shows "already up to date", no re-downloadAPP_OLLAMA_BASE_URLfrom env:GET /actuator/healthreturns 200 (backend unaffected)ollama:11434shows asUp🤖 Generated with Claude Code
🏛️ Markus Keller (@mkeller) — Senior Application Architect
Verdict: 🚫 Changes requested
Strong foundation. The init container pattern, graceful-degradation contract, and
@ConditionalOnExpressionrationale are all well-reasoned. ADR-028 is genuinely thorough — this is what a good ADR looks like. Three blockers before merge.Blockers
1. Version mismatch in C4 diagram (
docs/architecture/c4/l2-containers.puml)The newly added Container line reads:
docker-compose.ymlpinsollama/ollama:0.30.6. The architecture diagram must match the deployed version. Fix:2. Stale TBD placeholder in ADR-028 Consequences section (
docs/adr/028-ollama-docker-compose-service.md)The Consequences section still says:
All three TBD items are resolved in §7, §8, §9. This placeholder predates the empirical work and was never removed. Replace with something like: "All three empirical TBD items were resolved — see §7, §8, §9." The stale text actively misleads a reviewer scanning the Consequences section.
3. ADR §12 contradicts §7 on OLLAMA_API_KEY security (
docs/adr/028-ollama-docker-compose-service.md)Section 12 (Security threat model) states:
Section 7 empirically proves the opposite:
These are contradictory claims within the same document. §12 was written before the empirical test in §7. Fix §12:
Doc Checklist (from my review protocol)
l2-containers.pumldocs/DEPLOYMENT.mdCLAUDE.mdpackage tableCLAUDE.mdroute tableSuggestions
@ConditionalOnExpressionrationale (§3) andOptional<OllamaClient>injection pattern (§5) are unusually well-documented. This is the quality bar the rest of the codebase should aim for.start_periodrationale (§11) is clearly argued. Appreciated.🔧 Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer
Verdict: ⚠️ Approved with concerns
Solid infrastructure work overall. Image version pinned, named volume correct, security hardening baseline applied, CI exclusion well-documented. One concern that might break the graceful-degradation story in dev, one documentation gap, and two things to verify before first deploy.
Concerns
1.
APP_OLLAMA_BASE_URLis hardcoded, not interpolated (docker-compose.yml, backendenvironment:)The
.env.examplesays: "Leave APP_OLLAMA_BASE_URL empty to disable NL search (safe default for CX32 / CI)." But because the Compose file sets a literal value — not${APP_OLLAMA_BASE_URL:-http://ollama:11434}— setting it empty in.envhas no effect on the backend container. The env var override path doesn't exist.Compare with how OCR does it:
If a developer on a CX32-class machine wants to disable Ollama locally, they currently have no way to do so short of editing
docker-compose.yml. Suggested fix:This preserves the current default behaviour while allowing a
.envoverride.2. No
depends_on: ollamain the backend service — confirming this is intentionalThe backend doesn't depend on
ollamabeing healthy. On firstdocker compose up -d, Ollama won't be ready until the model init container finishes (~60–90 min on slow connections). NL search will return 503 during that window. If the graceful-degradation design from ADR-028 §3 is the intent here, this is correct — just flagging it for explicit confirmation.Verification Needed
3. First
docker compose up -d --waitblocks for model pulldocker compose up -d(no--wait) detaches immediately — no problem. Butdocker compose up -d --waitwaits for all services to reach their health/completion target, which includesollama-model-initcompleting successfully. On first pull, that's 60–90 minutes.DEPLOYMENT.md §3.4 mentions the pull time but doesn't warn about
--waitblocking. If any CI smoke test or staging deploy script uses--wait, it will time out. Worth adding a note: "Do not use--waiton first deploy — theollama-model-initpull takes 60–90 minutes."4. Grafana dashboard auto-provisioning
ollama.jsonis dropped ininfra/observability/grafana/provisioning/dashboards/. Whether Grafana picks it up automatically depends on whether the existing provisioner YAML config scans that directory. No provisioner config was changed in this PR. Verify on first observability stack deploy withdocker logs obs-grafana | grep ollama.What's Working Well
0.30.6pinned ✅ — reproducible buildsollama_models:named volume ✅ — survives container rebuildscap_drop: ALL,no-new-privileges: true,read_only: true,tmpfs: /tmp:size=512m✅ — ADR-019 baseline applied consistentlyexpose:notports:✅ — Ollama stays off the host networkdocker-compose.ci.ymlservice selection (not profiles) ✅ — no--profiletax on devsocr:8000→ocr-service:8000) fixed as a drive-by ✅🔐 Nora "NullX" Steiner — Application Security Engineer
Verdict: 🚫 Changes requested
The infra hardening baseline is solid:
cap_drop: ALL,no-new-privileges: true,read_only: true, internal-only port exposure, named-volume isolation. One blocker — the ADR's security threat model makes a false claim that needs to be corrected before merge.Blockers
1. ADR §12 contradicts §7 — false security claim in the threat model
ADR-028, Section 12:
ADR-028, Section 7:
These statements are directly contradictory. Section 12 was written before the empirical test in §7. A developer reading §12 in isolation would believe the API key provides a security boundary — it does not.
Why this matters: If an operator reads the threat model and concludes "Ollama is protected by the API key as a secondary control," they may make other decisions based on that assumption (e.g., relaxing network rules, skipping firewall configs). False threat model documentation is a security risk.
Fix for §12:
Concerns
2. Version reference in
.env.examplecomment is staleThe deployed version is
0.30.6, not0.6.5. The empirical test covered both versions — the comment should say so:3. Empty API key guard (sibling issue linkage)
ADR-028 §6 documents the empty API key guard (
omit Authorization header when apiKey.isBlank()). This guard lives inRestClientOllamaClient, which is a sibling issue. The ADR is the right place to document it, but the PR description should link the sibling issue explicitly so reviewers can verify end-to-end coverage when that PR lands.What's Secure
expose:(notports:) ✅ — Ollama unreachable from host or internetcap_drop: ALLon bothollamaandollama-model-init✅no-new-privileges: true✅read_only: truewith tmpfs for/tmp✅ — container filesystem hardenedarchiv-netas the primary isolation boundary ✅ — correctly identified as sole controlNL_SEARCH_UNAVAILABLE) ✅ — documented in ADR §3; failure mode is explicit, not silenttrainingTokenguard inRestClientOcrClient.java:107✅🧪 Sara Holt (@saraholt) — Senior QA Engineer
Verdict: ⚠️ Approved with concerns
This is a pure infrastructure PR — Docker Compose, ADR, observability config, docs. No application code, so the TDD mandate and coverage gates don't apply here. The concerns are around unverified metric names and a provisioning gap that could produce a silent failure on first deploy.
Concerns
1. Grafana dashboard metric names not verified against Ollama 0.30.6
The dashboard JSON uses:
histogram_quantile(0.5, rate(ollama_request_duration_seconds_bucket[5m]))(p50)histogram_quantile(0.95, rate(ollama_request_duration_seconds_bucket[5m]))(p95)rate(ollama_requests_total[5m])(request rate)These metric names are asserted in the dashboard but not empirically verified against
GET http://localhost:11434/metricson a running Ollama 0.30.6 instance. If the actual metric names differ (e.g.,ollama_model_request_duration_secondsor similar), all three panels will display "No data" silently.Suggested verification before merge:
Update metric names in
ollama.jsonto match the actual exposed names.2. Grafana provisioning config not changed
ollama.jsonis dropped intoinfra/observability/grafana/provisioning/dashboards/but the provisioner config (presumably adashboards.ymlordashboards.yaml) was not modified. Whether Grafana auto-discovers this dashboard depends on whether the existing provisioner already scans that directory. If it doesn't, the dashboard will be silently absent on the first observability stack deploy.Recommend verifying with
docker exec obs-grafana grafana-cli ...or checking the Grafana provisioning logs after deploy.3. Acceptance criteria are manual-only
The issue specifies 6 acceptance criteria that all require a running stack. None can be validated in CI. This is acceptable for infrastructure — but it means the PR relies entirely on manual verification before merge. Document which of the 6 ACs were actually run as part of this PR in the PR description or in a comment.
What Passes Review
ocr:8000→ocr-service:8000) ✅ — correct Docker DNS nameollamaPrometheus job usesmetrics_path: /metrics✅healthcheckon mainollamaservice with realisticstart_period: 60s✅condition: service_completed_successfullyonollama-model-init✅ — correct dependency type for one-shot init containersmem_limit: 2gwith actual peak RSS data ✅ — this is exactly the kind of evidence QA wants to see in an ADR👨💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
Infrastructure-only PR — no Java, TypeScript, or Python. TDD doesn't apply. The YAML and ADR quality are high. A few documentation inconsistencies and one env-var pattern deviation worth fixing.
Concerns
1. Backend
APP_OLLAMA_BASE_URLis a literal, not an interpolated env var (docker-compose.yml)The OCR equivalent uses proper interpolation:
Following the same pattern:
The hardcoded value means
.env.example's instruction "Leave empty to disable NL search" doesn't work for the dev compose. On a resource-constrained machine where a developer wants to skip Ollama, there's no env-var lever. The graceful-degradation story only works end-to-end if the backend can actually receive a blankAPP_OLLAMA_BASE_URL.2. Version
0.6.5referenced in two places where0.30.6is the deployed version.env.examplecomment:ADR-028 §7 section title:
Both should say
0.30.6(or "0.6.5 and 0.30.6" since ADR §7 tested both). The deployed image is0.30.6.3. ADR-028 Consequences section has a stale pre-merge placeholder
These are resolved in §7, §8, §9 respectively. The placeholder was written before the empirical work and never updated. Remove it or replace with: "All three empirical TBD items from the original issue spec were resolved — see §7, §8, §9."
What Looks Good
$$SERVE_PID/$$!escaping in thecommand:YAML ✅ — correct escape for Compose interpolationcondition: service_completed_successfullyfor the init container dependency ✅restart: "no"onollama-model-init✅ — one-shot containers shouldn't restartmemswap_limitmatchingmem_limit✅ — prevents silent latency degradation from swapollama_modelsvolume correctly declared with no bind-mount ✅📋 Elicit — Requirements Engineer
Verdict: ⚠️ Approved with concerns
This is an infrastructure story with one documented requirement that the implementation doesn't fulfil, and one operator-experience NFR that lacks traceability.
Concerns
1. Requirement/implementation gap: "leave empty to disable" doesn't work via
.envThe issue spec,
.env.example, and DEPLOYMENT.md all say operators can disable NL search by leavingAPP_OLLAMA_BASE_URLblank. The intended behaviour:However,
docker-compose.ymlsetsAPP_OLLAMA_BASE_URL: http://ollama:11434as a literal — not${APP_OLLAMA_BASE_URL:-http://ollama:11434}. SettingAPP_OLLAMA_BASE_URL=in.envhas no effect on the backend container in the dev Compose.The requirement is stated clearly. The implementation breaks it for the dev environment. This is a traceability failure — the spec says one thing, the config does another.
2. Operator experience NFR gap: no acceptance criterion for "pull in progress" visibility
The DEPLOYMENT.md documents the model pull command:
But there is no acceptance criterion in the issue that asks: "How does the operator know the init container is still running vs. stuck?"
The 60–90 minute pull is a high-friction first-start experience. A user story for this would be:
The
docker logscommand addresses this but it's buried in §3.4. A first-start notice at the top of DEPLOYMENT.md §3.4 — "On first start, the model pull takes 60–90 min. Monitor:docker logs -f ..." — would be more discoverable.3. Sibling issue links missing from PR description
The ADR references sibling issues for the Spring Boot client and frontend graceful-degradation state. These sibling issues are not linked in the PR description. Linking them establishes end-to-end traceability from this infrastructure PR to the full feature.
What's Well-Specified
NL_SEARCH_UNAVAILABLE" — not vague ✅ollama_modelsvolume lifecycle (no backup needed, delete to re-pull) is documented ✅🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead
Verdict: ✅ Approved
Pure infrastructure PR — Docker Compose services, ADR, Prometheus/Grafana observability config, and deployment documentation. No frontend code, no Svelte components, no route changes, no CSS, no brand tokens, no user-facing UI.
Nothing in my review domain is affected by this PR. The Grafana dashboard panels (latency p50/p95, request rate) are internal operational tooling, not user-facing UI.
LGTM from the UI/UX perspective. No accessibility, responsive design, or brand compliance concerns.
Re-review response
All reviewer concerns addressed in 5 commits pushed to this branch.
Blockers resolved
B1 — ADR §12 false security claim (Markus + Nora)
Commit
fca367df— Replaced "Defense-in-depth:OLLAMA_API_KEYguards against lateral movement" with an accurate note referencing §7's empirical finding: not enforced in 0.6.5 or 0.30.6,archiv-netis the sole effective control.B2 — Stale TBD placeholder in Consequences (Markus + Felix)
Commit
fca367df— Replaced "Three TBD items must be resolved before the PR is merged" with "All three empirical TBD items from the original issue spec were resolved — see §7, §8, §9."B3 — C4 diagram wrong Ollama version (Markus)
Commit
8c45ad01—ollama/ollama:0.6.5→ollama/ollama:0.30.6inl2-containers.puml.Concerns addressed
C1 —
APP_OLLAMA_BASE_URLhardcoded literal (Tobias + Felix + Elicit)Commit
ea50fb9e— Changed to"${APP_OLLAMA_BASE_URL:-http://ollama:11434}", matching theAPP_OCR_TRAINING_TOKENpattern. SettingAPP_OLLAMA_BASE_URL=in.envnow correctly reaches the backend container.C2 —
.env.examplecomment says "Ollama 0.6.5" (Nora + Felix)Commit
dce57750— Updated to "Ollama 0.6.5 or 0.30.6". Also made thearchiv-netnote explicit in the comment.C3 — ADR §7 section title says "0.6.5" (Felix)
Commit
fca367df— Title updated to "OLLAMA_API_KEY behavior in Ollama 0.6.5 and 0.30.6".C4 — DEPLOYMENT.md version ref + missing
--waitwarning (Tobias)Commit
6bc434eb—OLLAMA_API_KEYrow updated from 0.6.5 to "0.6.5 or 0.30.6". Added an explicit blockquote to §3.4: "Do not use--waiton first deploy — the init container pull takes 60–90 minutes and will time out any deploy script that uses--wait."Intentionally unchanged
No
depends_on: ollamain backend (Tobias concern #2): Intentional per ADR §3 — the graceful-degradation design means the backend can start and serve 503NL_SEARCH_UNAVAILABLEwithout Ollama being ready.Deferred (require running infrastructure)
Grafana metric names (Sara concern #1): Verifying
ollama_request_duration_seconds_bucketandollama_requests_totalrequirescurl http://localhost:11434/metricsagainst a live Ollama 0.30.6 instance. Will verify and updateollama.jsonbefore the observability stack goes to production.Grafana provisioning auto-discovery (Sara concern #2 / Tobias concern #4): Requires running
docker compose -f docker-compose.observability.yml up -dand checkingdocker logs obs-grafana | grep ollama. Same gate — verify before first observability deploy.🏛️ Markus Keller (@mkeller) — Application Architect
Verdict: ✅ Approved
All three blockers from the previous review are resolved. Clean.
Blockers resolved
OLLAMA_API_KEYas defense-in-depth. The updated text accurately states archiv-net isolation is the sole effective control and explicitly says "do not rely on it for access control." ✅l2-containers.pumlnow readsollama/ollama:0.30.6. ✅Documentation compliance (doc-update table check)
l2-containers.pumldocs/DEPLOYMENT.mddocs/adr/No new backend packages, Spring Boot controllers, or SvelteKit routes — no further doc updates required.
Architecture quality
APP_OLLAMA_BASE_URL: "${APP_OLLAMA_BASE_URL:-http://ollama:11434}"— the interpolation fix is architecturally important. Before, setting the env var to empty in.envsilently had no effect; the feature was not truly opt-out. Now it is. ✅condition: service_completed_successfullydependency ensures the inference server only starts after model weights are in the volume. ✅depends_on: ollamain backend is the right call. Graceful degradation viaOptional<OllamaClient>is documented in ADR §3 and §5. ✅OLLAMA_API_KEYnot enforced).Minor observation (non-blocking)
The
command:onollama-model-inituses YAML block scalar (>) which folds newlines to spaces. The result is a single-line string passed tosh -c. Functionally correct, but a list form would be more readable:Not a blocker — just a style note for future maintenance.
🛠️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer
Verdict: ✅ Approved
Previous concerns are addressed. Looks solid for the infrastructure side.
Previous concerns resolved
APP_OLLAMA_BASE_URLinterpolation fixed:"${APP_OLLAMA_BASE_URL:-http://ollama:11434}". Disabling Ollama by setting the env var to empty in.envnow works correctly. ✅docs/DEPLOYMENT.mdOLLAMA_API_KEY row now reads0.6.5 or 0.30.6. ✅--waitblocking warning added to DEPLOYMENT.md §3.4. This is exactly the footgun documentation I wanted: "Do not use--waiton first deploy." ✅What looks good
ollama/ollama:0.30.6is pinned — not:latest. ✅ (Add to Renovate config when ready.)ollama_modelsis a named Docker volume — model weights survive container recreation without bind mounts. ✅docker volume rm+ recreate to free after model upgrades). ✅read_only: true,tmpfs: /tmp:size=512m,cap_drop: ALL,no-new-privileges: trueon both Ollama services — matches the OCR service hardening baseline (ADR-019). ✅expose:(notports:) on both Ollama services — port 11434 never reaches the host or internet. ✅prometheus.ymlOCR target fixed fromocr:8000→ocr-service:8000. This was a latent bug — the scrape job was silently failing. ✅ollama:11434added. ✅Position unchanged
I'm not adding
depends_on: ollamato the backend. This is the correct call — the graceful-degradation contract in ADR §3 means the backend must start without Ollama available. Network-level unavailability is handled at runtime, not at startup.One note on mem_limit defaults
.env.examplesetsOLLAMA_MEM_LIMIT=8gwhich is only viable on CX42. The comment in.env.exampleand the DEPLOYMENT.md hardware tier table both make this clear. Good. Operators on CX32 who copy.env.examplewithout reading docs will OOM — but the comments are right there.🔐 Nora "NullX" Steiner — Application Security Engineer
Verdict: ✅ Approved
Previous blockers resolved. The security documentation is now honest about the threat model.
Previous blockers resolved
OLLAMA_API_KEY"guards against lateral movement." This was a false security claim. The updated text correctly reads: "archiv-net network isolation is the sole effective security control. The env var is retained for forward compatibility only — do not rely on it for access control." ✅.env.examplecomment onOLLAMA_API_KEYnow covers both tested versions (0.6.5 or 0.30.6) and explicitly states "archiv-net network isolation is the only effective access control." ✅Security posture review
Network isolation (primary control):
expose:notports:. Port 11434 is only reachable withinarchiv-net. The Caddyfile would need explicit routing to expose it — and the ADR §12 says it must not route to Ollama. ✅Container hardening:
cap_drop: ALL— drops all Linux capabilities ✅no-new-privileges: true— prevents privilege escalation via setuid ✅read_only: true— read-only filesystem, writes limited totmpfsand the named volume ✅tmpfs: /tmp:size=512m— tmpfs is capped to prevent DoS via disk fill ✅memswap_limitmatchesmem_limit— prevents model weights from being swapped, which could leave sensitive model data on disk ✅The
APP_OLLAMA_BASE_URLinterpolation fix is also a security improvement:Before this fix, an operator who set
APP_OLLAMA_BASE_URL=to disable Ollama (e.g., on a lower-tier host) was silently ignored — the backend hardcodedhttp://ollama:11434. The backend would attempt Ollama connections regardless. Now the env var controls the feature correctly, and the@ConditionalOnExpressionblank-check ensures theRestClientbean is not registered when the URL is absent.No new findings.
🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist
Verdict: ⚠️ Approved with concerns
Infrastructure-only PR — unit/integration test pyramid doesn't apply. CI pipeline exclusion is correctly documented. But two open quality risks remain from the previous review.
Previous review concerns
The Grafana metric name verification and provisioning auto-discovery were explicitly deferred in the previous review. They remain unresolved. I'm noting them again so they don't silently evaporate post-merge.
Concerns (non-blocking for merge, but must be tracked)
1. Grafana metric names not verified against Ollama 0.30.6
infra/observability/grafana/provisioning/dashboards/ollama.jsonreferences:ollama_request_duration_seconds_bucketollama_requests_totalThese are assumed metric names. They have not been verified by running
curl http://localhost:11434/metricsagainst a live Ollama 0.30.6 instance. If these names are wrong, the dashboard silently shows "No data" — indistinguishable from "Ollama is healthy but idle."Verification command:
This should be a follow-up issue before the observability dashboard is declared functional.
2. Grafana provisioning auto-discovery not verified
The
ollama.jsonfile is placed ininfra/observability/grafana/provisioning/dashboards/. Whether the Grafana provisioning config (grafana/provisioning/dashboards/dashboard.ymlor equivalent) auto-discovers all JSON files in that directory — or requires explicit registration — has not been verified against the running observability stack.3. Init container idempotency on re-pull (minor, documentation gap)
When
docker compose up -d --force-recreateis run,ollama-model-initre-executesollama pull qwen2.5:7b-instruct-q4_K_M. The pull exits quickly if the model is already in the volume (Ollama's CLI skips re-download when the digest matches). This is correct behavior, but it's not documented in DEPLOYMENT.md. Worth adding a note so operators don't panic when they see the init container run again on redeploy.What works
docker-compose.ci.yml— correct approach, no Compose profiles friction. ✅ollamauses/api/tagswith appropriatestart_period: 60s. ✅ollama-model-initdepends_on: condition: service_completed_successfullyensures inference server only starts after model pull. ✅📋 Elicit — Requirements Engineer
Verdict: ⚠️ Approved with concerns
Reviewing for completeness, operational specification, and traceability.
Well-specified
--waitwarning in DEPLOYMENT.md §3.4 — documents a concrete deployment footgun with exact symptom (60–90 min block). ✅APP_OLLAMA_BASE_URLdisable pattern documented with the "Leave empty to disable" phrasing. ✅Gap 1: Model upgrade runbook missing from DEPLOYMENT.md (blocker suggestion)
ADR §14 explicitly states:
This documentation does not yet exist in DEPLOYMENT.md. An operator who upgrades from
qwen2.5:7b-instruct-q4_K_Mto a newer model will hit a disk-full scenario without guidance. Suggested addition to DEPLOYMENT.md §3.4 or a new §3.5:Gap 2: No post-deploy verification step in DEPLOYMENT.md
DEPLOYMENT.md documents how to enable Ollama but not how to verify the feature is operational after deploy. Operators need a smoke test. Suggested addition:
Non-blocking observation
.env.examplesays "Requires CX42 (16 GB RAM) to run alongside OCR." This duplicates the DEPLOYMENT.md hardware tier table. A cross-reference (see DEPLOYMENT.md §3 for hardware requirements) would prevent future drift between the two sources.🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead
Verdict: ✅ Approved
Pure infrastructure PR — Docker Compose services, ADR, Prometheus scrape config, Grafana dashboard JSON. No frontend components, Svelte files, or user-facing UI changes to review.
UX-adjacent observation
The only user-facing consequence of this PR is the HTTP 503
NL_SEARCH_UNAVAILABLEerror code that gets returned when Ollama is not configured. The graceful-degradation contract (ADR §3) is well-specified:From a UX perspective, this error needs to be:
getErrorMessage(NL_SEARCH_UNAVAILABLE)— not raw JSONThese are concerns for the frontend NL search PR (issue #735 epic), not this infra PR. I'll flag them there.
LGTM on infrastructure.
👨💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer
Verdict: ✅ Approved
No application code changed — this is infrastructure and documentation. Reviewing the embedded Java code patterns in the ADR and the Docker Compose correctness.
ADR code patterns (will be implemented in the NL search backend PR)
The patterns documented in ADR-028 §3–6 are clean:
@ConditionalOnExpressionblank-check (§3):Correct.
@ConditionalOnPropertywould register the bean when the property is present-but-blank, producing aRestClientwith an empty base URL that fails at runtime with an opaque error. The blank-check pattern is more correct for optional features with no safe default. ✅@ConfigurationPropertiesrecord (§4):Clean — immutable record, registered unconditionally as a value holder. Correct separation from the conditional
RestClientOllamaClientbean. ✅Optional<OllamaClient>injection (§5):Avoids null field with
@Autowired(required = false), which is noisy in@RequiredArgsConstructorclasses.orElseThrow(NL_SEARCH_UNAVAILABLE)is a clean guard. ✅Empty API key guard (§6):
Mirrors
trainingTokenguard inRestClientOcrClient.java:107. Consistent pattern. ✅Docker Compose correctness
Init container PID capture (ADR §10):
SERVE_PID=$!is the right approach —kill %1is job-control syntax and unreliable in non-interactivesh -ccontexts. ✅depends_on: condition: service_completed_successfullyonollama→ollama-model-initensures inference server starts only after pull completes. Correct. ✅No TDD concerns
This PR contains no production application code. Docker Compose, Prometheus YAML, and Grafana JSON have no unit test layer. The CI exclusion via explicit service selection in
docker-compose.ci.ymlis the right approach.One minor style note (non-blocking): the
>YAML block scalar forollama-model-initcommand folds newlines to spaces — functionally identical to a single-line string, but harder to read. Could be a list form in a future cleanup.docs/DEPLOYMENT.mdupdated — commit27bef28cAddressed the open concerns from Elicit and Sara's reviews:
Model upgrade runbook (Elicit Gap 1 / ADR §14 self-reference): expanded the brief
ollama_modelsnote in §6 into a full step-by-step procedure — update model name in Compose,docker volume rm,docker compose up -d. Includes production volume name prefix (archiv-production_ollama_models).Re-deploy idempotency note (Sara concern 3): added to §3.4 —
ollama-model-initre-runs on--force-recreatebut exits in seconds when model digest already matches the volume.Post-deploy NL search smoke test (Elicit Gap 2): added to §3.4 —
curlcommand distinguishing200(active) from503 NL_SEARCH_UNAVAILABLE(Ollama unreachable orAPP_OLLAMA_BASE_URLunset).§12 stated OLLAMA_API_KEY guards against lateral movement — contradicts §7's empirical finding that it is not enforced. Replaced with an accurate note referencing §7. Stale pre-merge placeholder in Consequences ("Three TBD items must be resolved") removed; all three are resolved. §7 section title updated from "0.6.5" to "0.6.5 and 0.30.6" to match the body text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>27bef28c0eto7679596c70