feat(infra): Ollama service in Docker Compose for NL search #737
Part of epic #735.
Goal
Add Ollama as a new Docker Compose service so the NL search backend can call Qwen 2.5 7B over the internal Docker network. The backend degrades gracefully (503) if Ollama is absent, so this service is optional for environments that don't need NL search (CI, staging without the feature).
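The graceful degradation described above can be sketched in plain Java, without Spring. This is a minimal illustration under stated assumptions, not the backend's actual code: the `search` method on `OllamaClient`, the `SearchResponse` record, the `clientFor` helper, and the stub client are all hypothetical names introduced here. Only the `Optional<OllamaClient>` injection, the "absent or blank base URL = disabled" rule, the 503 status, and the `NL_SEARCH_UNAVAILABLE` error code come from this issue.

```java
import java.util.List;
import java.util.Optional;

public class NlSearchSketch {

    /** Hypothetical client interface; the issue names OllamaClient but not its methods. */
    interface OllamaClient {
        List<String> search(String query);
    }

    /** Hypothetical stand-in for an HTTP response: status, error code (or null), results. */
    record SearchResponse(int status, String errorCode, List<String> results) {}

    /** Absent or blank base URL means "disabled": no client is created at all. */
    static Optional<OllamaClient> clientFor(String baseUrl) {
        if (baseUrl == null || baseUrl.isBlank()) {
            return Optional.empty();
        }
        // Stub client for illustration only; a real client would call Ollama over HTTP.
        return Optional.of(query -> List.of("stub result for: " + query));
    }

    static final class NlSearchService {
        private final Optional<OllamaClient> client;

        NlSearchService(Optional<OllamaClient> client) {
            this.client = client;
        }

        SearchResponse search(String query) {
            // Disabled case: answer 503 without making any network call.
            if (client.isEmpty()) {
                return new SearchResponse(503, "NL_SEARCH_UNAVAILABLE", List.of());
            }
            try {
                return new SearchResponse(200, null, client.get().search(query));
            } catch (RuntimeException e) {
                // Unreachable service or timeout maps to the same 503 error code.
                return new SearchResponse(503, "NL_SEARCH_UNAVAILABLE", List.of());
            }
        }
    }

    public static void main(String[] args) {
        NlSearchService disabled = new NlSearchService(clientFor(""));
        NlSearchService enabled = new NlSearchService(clientFor("http://ollama:11434"));
        System.out.println(disabled.search("letters from 1923").status()); // prints 503
        System.out.println(enabled.search("letters from 1923").status());  // prints 200
    }
}
```

Routing the disabled case and the runtime-failure case through the same response keeps the frontend contract to a single error code, which matches the requirement that all unavailability paths return `NL_SEARCH_UNAVAILABLE`.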
Hardware requirement: Running Ollama alongside OCR requires at least a CX42 (16 GB RAM, ~32 EUR/month). On a CX32 (8 GB), leave APP_OLLAMA_BASE_URL empty to disable NL search. The docs/DEPLOYMENT.md hardware table must be updated with an "NL Search" tier row.
Decisions
OLLAMA_CPU_LIMIT=4.0: safe default for both CX32 and CX42. Operators on CX42 (8 vCPUs) can raise it to 7.5 via the env var. The .env.example comment documents both values.
curl-based readiness loop with a captured PID: kill %1 is job-control syntax and is unreliable in non-interactive sh -c. The init container runs: ollama serve & SERVE_PID=$! && until curl -sf http://localhost:11434/api/tags; do sleep 1; done && ollama pull qwen2.5:7b-instruct-q4_K_M && kill $SERVE_PID. It uses the same endpoint as the main service healthcheck.
docker-compose.ci.yml already excludes Ollama via explicit service selection (up -d db minio create-buckets). Profiles would add unnecessary dev friction. Add a compose comment instead.
OLLAMA_API_KEY default: empty string, consistent with the "empty = disabled/unconfigured" pattern used by SENTRY_DSN and APP_OLLAMA_BASE_URL. Before shipping, empirically verify Ollama's behavior when OLLAMA_API_KEY= (empty string) AND when OLLAMA_API_KEY is fully unset, and document in ADR-028 whether both are treated as "no key" (unauthenticated) or "invalid key" (rejects all requests). If empty-string rejects requests, the .env.example comment "Leave empty to run unauthenticated" must be corrected.
@ConditionalOnProperty vs empty string: @ConditionalOnProperty(matchIfMissing = false) registers the bean when the property is present but blank (APP_OLLAMA_BASE_URL=), producing a RestClient with an empty base URL that fails at runtime. Use @ConditionalOnExpression("!'${app.ollama.base-url:}'.isBlank()") instead, which treats both absent and empty-string as disabled. (Explanation: when absent, the placeholder resolves to ''; .isBlank() returns true; the negation makes the condition false; the bean is not registered. Same for empty-string.) ADR-028 must document why @Value with a hardcoded default (the OCR pattern) is not appropriate here.
start_period for main service: 60s, not 300s. The model is pre-pulled by ollama-model-init before the main service starts (service_completed_successfully), so the main service only loads model weights from the named volume and binds port 11434. 300s would be appropriate only if the service pulled the model itself; with the init container pattern it overstates the actual cold-start time.
memswap_limit: Add memswap_limit: "${OLLAMA_MEM_LIMIT:-8g}" to the main service, matching the OCR service pattern (memswap_limit: 12g). Without this, Linux may swap Ollama model weights under memory pressure from OCR, causing inference to silently degrade to unacceptable latency.
Tasks
Docker Compose service
Add ollama_models: to the top-level volumes: block alongside ocr_models: and ocr_cache:.
Pre-PR prerequisite, read_only: true investigation: Before opening the PR, run:
Test all three operations (serve, pull, list) to find all write paths (not just startup). Ollama may write to /root/.config/ollama, /var/run, or /tmp/ollama*. If the test succeeds, add read_only: true and tmpfs: - /tmp:size=... to match the OCR service pattern. Document the result in ADR-028 either way.
Also capture docker stats peak RSS during the pull to verify mem_limit: 2g on the init container is sufficient. If peak exceeds 2 GB, bump to 4 GB.
Init container (model pull)
No container_name, consistent with ocr-volume-init. Network access is required for the model pull; archiv-net is correct. mem_limit: 2g prevents the one-time pull from starving the rest of the stack (verify peak RSS in the pre-PR investigation; bump to 4 GB if exceeded).
Environment variables
Add to docker-compose.yml backend environment block:
Both are required: APP_OLLAMA_BASE_URL enables the conditional bean; APP_OLLAMA_API_KEY is injected as Authorization: Bearer <key> on every request to Ollama (omitted entirely when blank; see Security).
Add to .env.example:
Security
OLLAMA_API_KEY env var on the Ollama service; require the backend to pass Authorization: Bearer ${OLLAMA_API_KEY} on every request when app.ollama.api-key is non-blank. When blank, the header must be omitted entirely: sending Authorization: Bearer (empty token) has undefined behavior and may lock the backend out. Pattern: if (!apiKey.isBlank()) { request.header("Authorization", "Bearer " + apiKey); }, which mirrors RestClientOcrClient.java:107's trainingToken guard.
ollama and ollama-model-init get ADR-019 hardening: cap_drop: [ALL], security_opt: [no-new-privileges:true]. Investigate read_only: true as above (pre-PR prerequisite).
ollama in the Caddyfile before merging: expose:-only is correct and must not gain a public route.
archiv-net isolation is the primary security control (Ollama is not internet-facing); OLLAMA_API_KEY is defense-in-depth against lateral movement from a compromised backend container.
Observability: prometheus.yml + Grafana provisioning
No change to docker-compose.observability.yml itself: Prometheus already joins archiv-net and can reach Ollama by service name.
infra/observability/prometheus/prometheus.yml: ocr:8000 → ocr-service:8000 (service name mismatch: the current OCR scrape does not resolve on archiv-net).
infra/observability/grafana/provisioning/dashboards/ollama.json so it's available automatically without manual import.
ADR
Write ADR-028 (next in sequence after ADR-027) before implementation starts. Must cover:
app.ollama.base-url absent or blank = disabled; the backend uses @ConditionalOnExpression("!'${app.ollama.base-url:}'.isBlank()") (not @ConditionalOnProperty alone, which activates on empty-string and creates a broken RestClient). SpEL expression explanation: absent → resolves to '' → .isBlank() true → negation false → bean not registered; same for empty-string. When Ollama is unavailable at runtime, return HTTP 503 with ErrorCode: NL_SEARCH_UNAVAILABLE. This applies to ALL unavailability paths (base-url unset, service unreachable, health check failed, request timeout).
@ConfigurationProperties("app.ollama") record OllamaProperties(String baseUrl, String apiKey), not two @Value injections. OllamaProperties is registered unconditionally (it is a simple value holder). The @ConditionalOnExpression belongs only on RestClientOllamaClient, not on the properties record. The ADR documents the deliberate divergence from the OCR @Value-with-default pattern: OCR is always-on with a safe default; Ollama is truly optional with no safe default URL.
Optional<OllamaClient> injection: the NL search service injects Optional<OllamaClient>. When empty (bean not registered), the service method returns 503 with NL_SEARCH_UNAVAILABLE without making a network call.
Omit the Authorization header entirely when apiKey.isBlank(). Document this alongside the empirical OLLAMA_API_KEY= behavior so both the backend guard and the Ollama server behavior are on record.
OLLAMA_API_KEY empty-string behavior: empirically verified result of whether Ollama 0.6.5 treats OLLAMA_API_KEY= (empty) as "no auth" or "invalid key"; document the exact finding for both the empty-string and fully-unset cases.
read_only: true feasibility finding for the Ollama container (result of the pre-PR investigation above); peak RSS of the init container during pull (to validate or revise mem_limit: 2g).
archiv-net isolation as primary control; OLLAMA_API_KEY as defense-in-depth for lateral movement.
If the ollama_models volume fills (e.g., after a model upgrade), run docker volume rm ollama_models and re-pull. The volume holds model weights only: fully reproducible, no backup needed.
Sibling issue (prerequisite to close this one)
Create or link the sibling issue for the Spring Boot Ollama client (graceful degradation, @ConditionalOnExpression bean, NL_SEARCH_UNAVAILABLE ErrorCode) before this issue is closed. The sibling issue spec must include explicit TDD requirements:
ErrorCode: NL_SEARCH_UNAVAILABLE (red before any implementation)
OLLAMA_API_KEY is passed as Authorization: Bearer <key> on every request, verified by asserting the header value matches the configured key, not just that a header exists (red before any implementation)
When app.ollama.api-key is blank, no Authorization header is present on the outgoing request; this mirrors the trainingToken guard in RestClientOcrClient.java:107 (red before any implementation)
APP_OLLAMA_BASE_URL= (empty string) and when unset, not just unit tests at the service layer
OllamaClient interface before RestClientOllamaClient implementation, mirroring the OcrClient/RestClientOcrClient pattern. Unit tests mock the interface.
Optional<OllamaClient>; the absent case returns 503 with NL_SEARCH_UNAVAILABLE without a network call. Use .orElseThrow(() -> DomainException.internal(ErrorCode.NL_SEARCH_UNAVAILABLE, "...")), not @Autowired(required = false) with a null check, which is noisier in a @RequiredArgsConstructor service.
The ErrorCode requires the standard four-step addition: ErrorCode.java → errors.ts → getErrorMessage() → i18n keys in de/en/es.
NL search UI (linked issue required)
The graceful-degradation frontend state must be tracked as a separate Gitea issue — not just this prose note — or explicitly added to the sibling issue scope. Create and link it before closing this issue.
When the frontend graceful-degradation state is implemented, the message must do three things for the 60+ audience: reassure (app is working), explain (smart search is temporarily off), and tell them what they got instead (keyword results).
Draft i18n messages for all three locales:
de: "Smart Suche momentan nicht verfügbar — Ergebnisse aus normaler Stichwortsuche."
en: "Smart search temporarily unavailable — showing keyword results."
es: "Búsqueda inteligente temporalmente no disponible — mostrando resultados de palabras clave."
Documentation
docs/architecture/c4/l2-containers.puml: add Ollama container + ollama_models volume (l1-context.puml unchanged: Ollama is internal, not an external system)
docs/DEPLOYMENT.md:
APP_OLLAMA_BASE_URL, APP_OLLAMA_API_KEY, OLLAMA_CPU_LIMIT, OLLAMA_MEM_LIMIT, OLLAMA_API_KEY
start_period of init container; one-time manual verification on first deploy; record measured time in PR description)
docker-compose.observability.yml continuously alongside both OCR and Ollama active
ollama_models volume: model weights only, fully reproducible by re-pull, no backup needed. If the volume fills after a model upgrade, run docker volume rm ollama_models and re-pull.
Acceptance Criteria
Automated checks:
docker-compose up -d starts the Ollama service alongside the existing stack
ollama_models volume), the Ollama service reaches service_healthy within 60 seconds
docker compose up -d when the model is already in the volume does not trigger a re-download, verified via docker logs on the init container showing "up to date" or equivalent (not "pulling manifest")
The ollama service does not start until ollama-model-init exits with code 0 (service_completed_successfully)
If ollama-model-init exits non-zero (e.g., network outage, disk full, bad model name), docker-compose up reports a service dependency failure for ollama
GET http://ollama:11434/api/tags is reachable from within the backend container (docker exec archive-backend curl -sf http://ollama:11434/api/tags)
Removing APP_OLLAMA_BASE_URL from the environment does not break any other service; the backend's /actuator/health returns 200 with APP_OLLAMA_BASE_URL unset
Manual verification gates (one-time, record result in PR description):
docker stats peak RSS during init container pull is within mem_limit: 2g; if exceeded, bump the limit and note the measured value
Observability stack required:
ollama:11434 appears as Up
Implementation complete: branch feat/issue-737-ollama-docker-compose
All tasks from the issue spec have been implemented and committed. Sibling issues created.
Commits (9)
d3d92931 docs(adr): add ADR-028 — Ollama Docker Compose service for NL search
741ddfa9 fix(observability): fix OCR target name + add Ollama scrape job
94d5b711 feat(observability): add Grafana Ollama inference latency dashboard
604e7883 docs(arch): add Ollama container to C4 level-2 container diagram
9949e8b9 docs(deploy): document Ollama hardware requirements, env vars, and ops notes
8b6e3888 feat(infra): add Ollama Docker Compose services for NL search
ce3f5de6 feat(infra): add Ollama env vars to .env.example
98df26e0 docs(adr): update ADR-028 with 0.30.6 verified findings for API key + read_only
ff588ba5 fix(infra): escape $$SERVE_PID in compose command to prevent interpolation
Pre-PR investigations completed (ADR-028)
read_only: true: ✅ Works on both 0.6.5 and 0.30.6. ollama serve, ollama pull qwen2.5:7b-instruct-q4_K_M, and ollama list all succeed. Applied to both ollama and ollama-model-init.
Peak RSS during pull: ~108 MiB. mem_limit: 2g on ollama-model-init is adequate because model weights stream to the volume, not RAM.
OLLAMA_API_KEY enforcement: ❌ Not enforced in either 0.6.5 or 0.30.6. All requests return 200 regardless of auth header or key value. The env var does not appear in Ollama's startup config dump. archiv-net isolation is the only effective security control. The variable is retained for forward compatibility. Documented in ADR-028 §7.
Image version
Pinned to ollama/ollama:0.30.6 (current stable as of 2026-06-06). Renovate will bump.
Bug fixed
The prometheus.yml ocr-service scrape target was ocr:8000, but the Docker service DNS name is ocr-service, so the scrape has never resolved. Fixed in 741ddfa9.
Sibling issues created
feat(search): Spring Boot Ollama client with graceful degradation (OllamaClient interface, RestClientOllamaClient, @ConditionalOnExpression, NL_SEARCH_UNAVAILABLE ErrorCode, Optional<OllamaClient> injection, 5 TDD requirements)
feat(search): graceful-degradation UI when NL search is unavailable (frontend message in de/en/es, three-part UX spec for 60+ audience)
Next step
Run /review-pr after opening the PR from feat/issue-737-ollama-docker-compose.