feat(infra): Ollama Docker Compose service for NL search (#737) #749

marcel · 2026-06-06T14:20:59+02:00

marcel commented

2026-06-06 14:20:59 +02:00

Closes #737. Part of epic #735.

Summary

New services: ollama-model-init (one-shot model pull) + ollama (inference server) in docker-compose.yml, pinned to ollama/ollama:0.30.6. Both run with read_only: true, cap_drop: ALL, no-new-privileges (ADR-019 baseline).
Bug fix: prometheus.yml OCR scrape target corrected from ocr:8000 → ocr-service:8000 (the scrape has never resolved — Docker service DNS uses the service name).
New observability: Grafana Ollama dashboard (infra/observability/grafana/provisioning/dashboards/ollama.json) with inference latency p50/p95 and request-rate panels, auto-provisioned.
Docs: ADR-028 (all decisions + three empirically-verified findings), docs/DEPLOYMENT.md updated (hardware tiers, env vars, ops notes), C4 l2-containers.puml updated.
Sibling issues created: #747 (Spring Boot Ollama client) and #748 (frontend graceful-degradation UI).

Pre-PR investigations (results in ADR-028)

Investigation	Result
`read_only: true`	✅ Works on `0.6.5` and `0.30.6` — all three ops (serve, pull, list) succeed
Peak RSS during model pull	~108 MiB — `mem_limit: 2g` on init container is adequate
`OLLAMA_API_KEY` enforcement	❌ Not enforced in either version — all requests return 200 regardless of header; `archiv-net` isolation is the only effective control

Test Plan

docker compose up -d starts the Ollama service alongside the existing stack
docker exec archive-backend curl -sf http://ollama:11434/api/tags returns JSON
Re-run docker compose up -d with model already in volume: docker logs on init container shows "already up to date", no re-download
Remove APP_OLLAMA_BASE_URL from env: GET /actuator/health returns 200 (backend unaffected)
After deploying observability stack: Prometheus target ollama:11434 shows as Up
Grafana Ollama dashboard loads automatically without manual import

🤖 Generated with Claude Code

Closes #737. Part of epic #735. ## Summary - **New services:** `ollama-model-init` (one-shot model pull) + `ollama` (inference server) in `docker-compose.yml`, pinned to `ollama/ollama:0.30.6`. Both run with `read_only: true`, `cap_drop: ALL`, `no-new-privileges` (ADR-019 baseline). - **Bug fix:** `prometheus.yml` OCR scrape target corrected from `ocr:8000` → `ocr-service:8000` (the scrape has never resolved — Docker service DNS uses the service name). - **New observability:** Grafana Ollama dashboard (`infra/observability/grafana/provisioning/dashboards/ollama.json`) with inference latency p50/p95 and request-rate panels, auto-provisioned. - **Docs:** ADR-028 (all decisions + three empirically-verified findings), `docs/DEPLOYMENT.md` updated (hardware tiers, env vars, ops notes), C4 `l2-containers.puml` updated. - **Sibling issues created:** #747 (Spring Boot Ollama client) and #748 (frontend graceful-degradation UI). ## Pre-PR investigations (results in ADR-028) | Investigation | Result | |---|---| | `read_only: true` | ✅ Works on `0.6.5` and `0.30.6` — all three ops (serve, pull, list) succeed | | Peak RSS during model pull | ~108 MiB — `mem_limit: 2g` on init container is adequate | | `OLLAMA_API_KEY` enforcement | ❌ Not enforced in either version — all requests return 200 regardless of header; `archiv-net` isolation is the only effective control | ## Test Plan - [ ] `docker compose up -d` starts the Ollama service alongside the existing stack - [ ] `docker exec archive-backend curl -sf http://ollama:11434/api/tags` returns JSON - [ ] Re-run `docker compose up -d` with model already in volume: `docker logs` on init container shows "already up to date", no re-download - [ ] Remove `APP_OLLAMA_BASE_URL` from env: `GET /actuator/health` returns 200 (backend unaffected) - [ ] After deploying observability stack: Prometheus target `ollama:11434` shows as `Up` - [ ] Grafana Ollama dashboard loads automatically without manual import 🤖 Generated with [Claude Code](https://claude.ai/claude-code)

marcel commented

2026-06-06 14:25:46 +02:00

🏛️ Markus Keller (@mkeller) — Senior Application Architect

Verdict: 🚫 Changes requested

Strong foundation. The init container pattern, graceful-degradation contract, and @ConditionalOnExpression rationale are all well-reasoned. ADR-028 is genuinely thorough — this is what a good ADR looks like. Three blockers before merge.

Blockers

1. Version mismatch in C4 diagram (docs/architecture/c4/l2-containers.puml)

The newly added Container line reads:

Container(ollama, "Ollama LLM Service", "ollama/ollama:0.6.5 / port 11434 (internal only)", "...")

docker-compose.yml pins ollama/ollama:0.30.6. The architecture diagram must match the deployed version. Fix:

Container(ollama, "Ollama LLM Service", "ollama/ollama:0.30.6 / port 11434 (internal only)", "...")

2. Stale TBD placeholder in ADR-028 Consequences section (docs/adr/028-ollama-docker-compose-service.md)

The Consequences section still says:

Three TBD items (OLLAMA_API_KEY empty-string behavior, read_only feasibility, init container peak RSS) must be resolved before the PR is merged. See Decisions §7, §8, §9.

All three TBD items are resolved in §7, §8, §9. This placeholder predates the empirical work and was never removed. Replace with something like: "All three empirical TBD items were resolved — see §7, §8, §9." The stale text actively misleads a reviewer scanning the Consequences section.

3. ADR §12 contradicts §7 on OLLAMA_API_KEY security (docs/adr/028-ollama-docker-compose-service.md)

Section 12 (Security threat model) states:

Defense-in-depth: OLLAMA_API_KEY guards against lateral movement from a compromised backend container.

Section 7 empirically proves the opposite:

OLLAMA_API_KEY provides no defense-in-depth in the tested versions.

These are contradictory claims within the same document. §12 was written before the empirical test in §7. Fix §12:

**Defense-in-depth (not effective in tested versions):** `OLLAMA_API_KEY` was intended to guard against lateral movement from a compromised backend container, but empirical testing (§7) confirmed it is not enforced in Ollama 0.30.6. `archiv-net` isolation remains the sole effective control. The env var is retained for forward compatibility only.

Doc Checklist (from my review protocol)

PR change	Required doc	Status
New Docker service	`l2-containers.puml`	✅ present, ⚠️ wrong version
New Docker service	`docs/DEPLOYMENT.md`	✅
Architectural decision with lasting consequences	New ADR	✅ ADR-028
No new backend package	`CLAUDE.md` package table	N/A
No new SvelteKit route	`CLAUDE.md` route table	N/A

Suggestions

The @ConditionalOnExpression rationale (§3) and Optional<OllamaClient> injection pattern (§5) are unusually well-documented. This is the quality bar the rest of the codebase should aim for.
The deliberate divergence from the OCR pattern (§4) is correct — Ollama is truly optional, OCR is not. Good judgment call.
60s start_period rationale (§11) is clearly argued. Appreciated.

## 🏛️ Markus Keller (@mkeller) — Senior Application Architect **Verdict: 🚫 Changes requested** Strong foundation. The init container pattern, graceful-degradation contract, and `@ConditionalOnExpression` rationale are all well-reasoned. ADR-028 is genuinely thorough — this is what a good ADR looks like. Three blockers before merge. --- ### Blockers **1. Version mismatch in C4 diagram** (`docs/architecture/c4/l2-containers.puml`) The newly added Container line reads: ``` Container(ollama, "Ollama LLM Service", "ollama/ollama:0.6.5 / port 11434 (internal only)", "...") ``` `docker-compose.yml` pins `ollama/ollama:0.30.6`. The architecture diagram must match the deployed version. Fix: ``` Container(ollama, "Ollama LLM Service", "ollama/ollama:0.30.6 / port 11434 (internal only)", "...") ``` **2. Stale TBD placeholder in ADR-028 Consequences section** (`docs/adr/028-ollama-docker-compose-service.md`) The Consequences section still says: > **Three TBD items** (OLLAMA_API_KEY empty-string behavior, `read_only` feasibility, init container peak RSS) must be resolved before the PR is merged. See Decisions §7, §8, §9. All three TBD items *are* resolved in §7, §8, §9. This placeholder predates the empirical work and was never removed. Replace with something like: "All three empirical TBD items were resolved — see §7, §8, §9." The stale text actively misleads a reviewer scanning the Consequences section. **3. ADR §12 contradicts §7 on OLLAMA_API_KEY security** (`docs/adr/028-ollama-docker-compose-service.md`) Section 12 (Security threat model) states: > **Defense-in-depth:** `OLLAMA_API_KEY` guards against lateral movement from a compromised backend container. Section 7 empirically proves the opposite: > `OLLAMA_API_KEY` provides no defense-in-depth in the tested versions. These are contradictory claims within the same document. §12 was written before the empirical test in §7. Fix §12: ``` **Defense-in-depth (not effective in tested versions):** `OLLAMA_API_KEY` was intended to guard against lateral movement from a compromised backend container, but empirical testing (§7) confirmed it is not enforced in Ollama 0.30.6. `archiv-net` isolation remains the sole effective control. The env var is retained for forward compatibility only. ``` --- ### Doc Checklist (from my review protocol) | PR change | Required doc | Status | |---|---|---| | New Docker service | `l2-containers.puml` | ✅ present, ⚠️ wrong version | | New Docker service | `docs/DEPLOYMENT.md` | ✅ | | Architectural decision with lasting consequences | New ADR | ✅ ADR-028 | | No new backend package | `CLAUDE.md` package table | N/A | | No new SvelteKit route | `CLAUDE.md` route table | N/A | --- ### Suggestions - The `@ConditionalOnExpression` rationale (§3) and `Optional<OllamaClient>` injection pattern (§5) are unusually well-documented. This is the quality bar the rest of the codebase should aim for. - The deliberate divergence from the OCR pattern (§4) is correct — Ollama is truly optional, OCR is not. Good judgment call. - 60s `start_period` rationale (§11) is clearly argued. Appreciated.

marcel commented

2026-06-06 14:26:06 +02:00

🔧 Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer

Verdict: ⚠️ Approved with concerns

Solid infrastructure work overall. Image version pinned, named volume correct, security hardening baseline applied, CI exclusion well-documented. One concern that might break the graceful-degradation story in dev, one documentation gap, and two things to verify before first deploy.

Concerns

1. APP_OLLAMA_BASE_URL is hardcoded, not interpolated (docker-compose.yml, backend environment:)

APP_OLLAMA_BASE_URL: http://ollama:11434   # ← literal value

The .env.example says: "Leave APP_OLLAMA_BASE_URL empty to disable NL search (safe default for CX32 / CI)." But because the Compose file sets a literal value — not ${APP_OLLAMA_BASE_URL:-http://ollama:11434} — setting it empty in .env has no effect on the backend container. The env var override path doesn't exist.

Compare with how OCR does it:

APP_OCR_TRAINING_TOKEN: "${OCR_TRAINING_TOKEN:-}"

If a developer on a CX32-class machine wants to disable Ollama locally, they currently have no way to do so short of editing docker-compose.yml. Suggested fix:

APP_OLLAMA_BASE_URL: ${APP_OLLAMA_BASE_URL:-http://ollama:11434}
APP_OLLAMA_API_KEY: "${OLLAMA_API_KEY}"

This preserves the current default behaviour while allowing a .env override.

2. No depends_on: ollama in the backend service — confirming this is intentional

The backend doesn't depend on ollama being healthy. On first docker compose up -d, Ollama won't be ready until the model init container finishes (~60–90 min on slow connections). NL search will return 503 during that window. If the graceful-degradation design from ADR-028 §3 is the intent here, this is correct — just flagging it for explicit confirmation.

Verification Needed

3. First docker compose up -d --wait blocks for model pull

docker compose up -d (no --wait) detaches immediately — no problem. But docker compose up -d --wait waits for all services to reach their health/completion target, which includes ollama-model-init completing successfully. On first pull, that's 60–90 minutes.

DEPLOYMENT.md §3.4 mentions the pull time but doesn't warn about --wait blocking. If any CI smoke test or staging deploy script uses --wait, it will time out. Worth adding a note: "Do not use --wait on first deploy — the ollama-model-init pull takes 60–90 minutes."

4. Grafana dashboard auto-provisioning

ollama.json is dropped in infra/observability/grafana/provisioning/dashboards/. Whether Grafana picks it up automatically depends on whether the existing provisioner YAML config scans that directory. No provisioner config was changed in this PR. Verify on first observability stack deploy with docker logs obs-grafana | grep ollama.

What's Working Well

0.30.6 pinned ✅ — reproducible builds
ollama_models: named volume ✅ — survives container rebuilds
cap_drop: ALL, no-new-privileges: true, read_only: true, tmpfs: /tmp:size=512m ✅ — ADR-019 baseline applied consistently
expose: not ports: ✅ — Ollama stays off the host network
CI exclusion via docker-compose.ci.yml service selection (not profiles) ✅ — no --profile tax on devs
Pre-existing Prometheus bug (ocr:8000 → ocr-service:8000) fixed as a drive-by ✅

## 🔧 Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer **Verdict: ⚠️ Approved with concerns** Solid infrastructure work overall. Image version pinned, named volume correct, security hardening baseline applied, CI exclusion well-documented. One concern that might break the graceful-degradation story in dev, one documentation gap, and two things to verify before first deploy. --- ### Concerns **1. `APP_OLLAMA_BASE_URL` is hardcoded, not interpolated** (`docker-compose.yml`, backend `environment:`) ```yaml APP_OLLAMA_BASE_URL: http://ollama:11434 # ← literal value ``` The `.env.example` says: *"Leave APP_OLLAMA_BASE_URL empty to disable NL search (safe default for CX32 / CI)."* But because the Compose file sets a literal value — not `${APP_OLLAMA_BASE_URL:-http://ollama:11434}` — setting it empty in `.env` has no effect on the backend container. The env var override path doesn't exist. Compare with how OCR does it: ```yaml APP_OCR_TRAINING_TOKEN: "${OCR_TRAINING_TOKEN:-}" ``` If a developer on a CX32-class machine wants to disable Ollama locally, they currently have no way to do so short of editing `docker-compose.yml`. Suggested fix: ```yaml APP_OLLAMA_BASE_URL: ${APP_OLLAMA_BASE_URL:-http://ollama:11434} APP_OLLAMA_API_KEY: "${OLLAMA_API_KEY}" ``` This preserves the current default behaviour while allowing a `.env` override. **2. No `depends_on: ollama` in the backend service** — confirming this is intentional The backend doesn't depend on `ollama` being healthy. On first `docker compose up -d`, Ollama won't be ready until the model init container finishes (~60–90 min on slow connections). NL search will return 503 during that window. If the graceful-degradation design from ADR-028 §3 is the intent here, this is correct — just flagging it for explicit confirmation. --- ### Verification Needed **3. First `docker compose up -d --wait` blocks for model pull** `docker compose up -d` (no `--wait`) detaches immediately — no problem. But `docker compose up -d --wait` waits for all services to reach their health/completion target, which includes `ollama-model-init` completing successfully. On first pull, that's 60–90 minutes. DEPLOYMENT.md §3.4 mentions the pull time but doesn't warn about `--wait` blocking. If any CI smoke test or staging deploy script uses `--wait`, it will time out. Worth adding a note: *"Do not use `--wait` on first deploy — the `ollama-model-init` pull takes 60–90 minutes."* **4. Grafana dashboard auto-provisioning** `ollama.json` is dropped in `infra/observability/grafana/provisioning/dashboards/`. Whether Grafana picks it up automatically depends on whether the existing provisioner YAML config scans that directory. No provisioner config was changed in this PR. Verify on first observability stack deploy with `docker logs obs-grafana | grep ollama`. --- ### What's Working Well - `0.30.6` pinned ✅ — reproducible builds - `ollama_models:` named volume ✅ — survives container rebuilds - `cap_drop: ALL`, `no-new-privileges: true`, `read_only: true`, `tmpfs: /tmp:size=512m` ✅ — ADR-019 baseline applied consistently - `expose:` not `ports:` ✅ — Ollama stays off the host network - CI exclusion via `docker-compose.ci.yml` service selection (not profiles) ✅ — no `--profile` tax on devs - Pre-existing Prometheus bug (`ocr:8000` → `ocr-service:8000`) fixed as a drive-by ✅

marcel commented

2026-06-06 14:26:24 +02:00

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: 🚫 Changes requested

The infra hardening baseline is solid: cap_drop: ALL, no-new-privileges: true, read_only: true, internal-only port exposure, named-volume isolation. One blocker — the ADR's security threat model makes a false claim that needs to be corrected before merge.

Blockers

1. ADR §12 contradicts §7 — false security claim in the threat model

ADR-028, Section 12:

Defense-in-depth: OLLAMA_API_KEY guards against lateral movement from a compromised backend container.

ADR-028, Section 7:

OLLAMA_API_KEY provides no defense-in-depth in the tested versions. All configurations accept all requests without authentication.

These statements are directly contradictory. Section 12 was written before the empirical test in §7. A developer reading §12 in isolation would believe the API key provides a security boundary — it does not.

Why this matters: If an operator reads the threat model and concludes "Ollama is protected by the API key as a secondary control," they may make other decisions based on that assumption (e.g., relaxing network rules, skipping firewall configs). False threat model documentation is a security risk.

Fix for §12:

**Primary control:** `archiv-net` network isolation. Ollama has no externally exposed port (`expose:` only, not `ports:`). The Caddyfile must not route any path to the Ollama service.

**Note on `OLLAMA_API_KEY`:** Per §7, `OLLAMA_API_KEY` is not enforced in Ollama 0.30.6 and provides no authentication barrier against a compromised backend container. `archiv-net` network isolation is the sole effective security control. The env var is retained for forward compatibility only — do not rely on it for access control.

Concerns

2. Version reference in .env.example comment is stale

# NOTE: Empirically verified that OLLAMA_API_KEY is NOT enforced in Ollama 0.6.5 (ADR-028 §7).

The deployed version is 0.30.6, not 0.6.5. The empirical test covered both versions — the comment should say so:

# NOTE: Empirically verified that OLLAMA_API_KEY is NOT enforced in Ollama 0.6.5 or 0.30.6 (ADR-028 §7).
# archiv-net network isolation is the only effective access control.

3. Empty API key guard (sibling issue linkage)

ADR-028 §6 documents the empty API key guard (omit Authorization header when apiKey.isBlank()). This guard lives in RestClientOllamaClient, which is a sibling issue. The ADR is the right place to document it, but the PR description should link the sibling issue explicitly so reviewers can verify end-to-end coverage when that PR lands.

What's Secure

expose: (not ports:) ✅ — Ollama unreachable from host or internet
cap_drop: ALL on both ollama and ollama-model-init ✅
no-new-privileges: true ✅
read_only: true with tmpfs for /tmp ✅ — container filesystem hardened
archiv-net as the primary isolation boundary ✅ — correctly identified as sole control
Backend graceful degradation (503 NL_SEARCH_UNAVAILABLE) ✅ — documented in ADR §3; failure mode is explicit, not silent
ADR §6 empty-key guard is the correct pattern — matches trainingToken guard in RestClientOcrClient.java:107 ✅

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: 🚫 Changes requested** The infra hardening baseline is solid: `cap_drop: ALL`, `no-new-privileges: true`, `read_only: true`, internal-only port exposure, named-volume isolation. One blocker — the ADR's security threat model makes a false claim that needs to be corrected before merge. --- ### Blockers **1. ADR §12 contradicts §7 — false security claim in the threat model** ADR-028, Section 12: > **Defense-in-depth:** `OLLAMA_API_KEY` guards against lateral movement from a compromised backend container. ADR-028, Section 7: > `OLLAMA_API_KEY` provides **no** defense-in-depth in the tested versions. All configurations accept all requests without authentication. These statements are directly contradictory. Section 12 was written before the empirical test in §7. A developer reading §12 in isolation would believe the API key provides a security boundary — it does not. **Why this matters**: If an operator reads the threat model and concludes "Ollama is protected by the API key as a secondary control," they may make other decisions based on that assumption (e.g., relaxing network rules, skipping firewall configs). False threat model documentation is a security risk. **Fix for §12:** ```markdown **Primary control:** `archiv-net` network isolation. Ollama has no externally exposed port (`expose:` only, not `ports:`). The Caddyfile must not route any path to the Ollama service. **Note on `OLLAMA_API_KEY`:** Per §7, `OLLAMA_API_KEY` is not enforced in Ollama 0.30.6 and provides no authentication barrier against a compromised backend container. `archiv-net` network isolation is the sole effective security control. The env var is retained for forward compatibility only — do not rely on it for access control. ``` --- ### Concerns **2. Version reference in `.env.example` comment is stale** ``` # NOTE: Empirically verified that OLLAMA_API_KEY is NOT enforced in Ollama 0.6.5 (ADR-028 §7). ``` The deployed version is `0.30.6`, not `0.6.5`. The empirical test covered both versions — the comment should say so: ``` # NOTE: Empirically verified that OLLAMA_API_KEY is NOT enforced in Ollama 0.6.5 or 0.30.6 (ADR-028 §7). # archiv-net network isolation is the only effective access control. ``` **3. Empty API key guard (sibling issue linkage)** ADR-028 §6 documents the empty API key guard (`omit Authorization header when apiKey.isBlank()`). This guard lives in `RestClientOllamaClient`, which is a sibling issue. The ADR is the right place to document it, but the PR description should link the sibling issue explicitly so reviewers can verify end-to-end coverage when that PR lands. --- ### What's Secure - `expose:` (not `ports:`) ✅ — Ollama unreachable from host or internet - `cap_drop: ALL` on both `ollama` and `ollama-model-init` ✅ - `no-new-privileges: true` ✅ - `read_only: true` with tmpfs for `/tmp` ✅ — container filesystem hardened - `archiv-net` as the primary isolation boundary ✅ — correctly identified as sole control - Backend graceful degradation (503 `NL_SEARCH_UNAVAILABLE`) ✅ — documented in ADR §3; failure mode is explicit, not silent - ADR §6 empty-key guard is the correct pattern — matches `trainingToken` guard in `RestClientOcrClient.java:107` ✅

marcel commented

2026-06-06 14:26:39 +02:00

🧪 Sara Holt (@saraholt) — Senior QA Engineer

Verdict: ⚠️ Approved with concerns

This is a pure infrastructure PR — Docker Compose, ADR, observability config, docs. No application code, so the TDD mandate and coverage gates don't apply here. The concerns are around unverified metric names and a provisioning gap that could produce a silent failure on first deploy.

Concerns

1. Grafana dashboard metric names not verified against Ollama 0.30.6

The dashboard JSON uses:

histogram_quantile(0.5, rate(ollama_request_duration_seconds_bucket[5m])) (p50)
histogram_quantile(0.95, rate(ollama_request_duration_seconds_bucket[5m])) (p95)
rate(ollama_requests_total[5m]) (request rate)

These metric names are asserted in the dashboard but not empirically verified against GET http://localhost:11434/metrics on a running Ollama 0.30.6 instance. If the actual metric names differ (e.g., ollama_model_request_duration_seconds or similar), all three panels will display "No data" silently.

Suggested verification before merge:

curl -s http://localhost:11434/metrics | grep -E "^(ollama_request|ollama_model)" | head -20

Update metric names in ollama.json to match the actual exposed names.

2. Grafana provisioning config not changed

ollama.json is dropped into infra/observability/grafana/provisioning/dashboards/ but the provisioner config (presumably a dashboards.yml or dashboards.yaml) was not modified. Whether Grafana auto-discovers this dashboard depends on whether the existing provisioner already scans that directory. If it doesn't, the dashboard will be silently absent on the first observability stack deploy.

Recommend verifying with docker exec obs-grafana grafana-cli ... or checking the Grafana provisioning logs after deploy.

3. Acceptance criteria are manual-only

The issue specifies 6 acceptance criteria that all require a running stack. None can be validated in CI. This is acceptable for infrastructure — but it means the PR relies entirely on manual verification before merge. Document which of the 6 ACs were actually run as part of this PR in the PR description or in a comment.

What Passes Review

Pre-existing Prometheus bug fix (ocr:8000 → ocr-service:8000) ✅ — correct Docker DNS name
ollama Prometheus job uses metrics_path: /metrics ✅
healthcheck on main ollama service with realistic start_period: 60s ✅
condition: service_completed_successfully on ollama-model-init ✅ — correct dependency type for one-shot init containers
CI exclusion via explicit service selection (not affecting CI stack) ✅
ADR-028 §9 empirically validates mem_limit: 2g with actual peak RSS data ✅ — this is exactly the kind of evidence QA wants to see in an ADR

## 🧪 Sara Holt (@saraholt) — Senior QA Engineer **Verdict: ⚠️ Approved with concerns** This is a pure infrastructure PR — Docker Compose, ADR, observability config, docs. No application code, so the TDD mandate and coverage gates don't apply here. The concerns are around unverified metric names and a provisioning gap that could produce a silent failure on first deploy. --- ### Concerns **1. Grafana dashboard metric names not verified against Ollama 0.30.6** The dashboard JSON uses: - `histogram_quantile(0.5, rate(ollama_request_duration_seconds_bucket[5m]))` (p50) - `histogram_quantile(0.95, rate(ollama_request_duration_seconds_bucket[5m]))` (p95) - `rate(ollama_requests_total[5m])` (request rate) These metric names are asserted in the dashboard but not empirically verified against `GET http://localhost:11434/metrics` on a running Ollama 0.30.6 instance. If the actual metric names differ (e.g., `ollama_model_request_duration_seconds` or similar), all three panels will display "No data" silently. **Suggested verification before merge:** ```bash curl -s http://localhost:11434/metrics | grep -E "^(ollama_request|ollama_model)" | head -20 ``` Update metric names in `ollama.json` to match the actual exposed names. **2. Grafana provisioning config not changed** `ollama.json` is dropped into `infra/observability/grafana/provisioning/dashboards/` but the provisioner config (presumably a `dashboards.yml` or `dashboards.yaml`) was not modified. Whether Grafana auto-discovers this dashboard depends on whether the existing provisioner already scans that directory. If it doesn't, the dashboard will be silently absent on the first observability stack deploy. Recommend verifying with `docker exec obs-grafana grafana-cli ...` or checking the Grafana provisioning logs after deploy. **3. Acceptance criteria are manual-only** The issue specifies 6 acceptance criteria that all require a running stack. None can be validated in CI. This is acceptable for infrastructure — but it means the PR relies entirely on manual verification before merge. Document which of the 6 ACs were actually run as part of this PR in the PR description or in a comment. --- ### What Passes Review - Pre-existing Prometheus bug fix (`ocr:8000` → `ocr-service:8000`) ✅ — correct Docker DNS name - `ollama` Prometheus job uses `metrics_path: /metrics` ✅ - `healthcheck` on main `ollama` service with realistic `start_period: 60s` ✅ - `condition: service_completed_successfully` on `ollama-model-init` ✅ — correct dependency type for one-shot init containers - CI exclusion via explicit service selection (not affecting CI stack) ✅ - ADR-028 §9 empirically validates `mem_limit: 2g` with actual peak RSS data ✅ — this is exactly the kind of evidence QA wants to see in an ADR

marcel commented

2026-06-06 14:26:53 +02:00

👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer

Verdict: ⚠️ Approved with concerns

Infrastructure-only PR — no Java, TypeScript, or Python. TDD doesn't apply. The YAML and ADR quality are high. A few documentation inconsistencies and one env-var pattern deviation worth fixing.

Concerns

1. Backend APP_OLLAMA_BASE_URL is a literal, not an interpolated env var (docker-compose.yml)

# Current — hardcoded literal
APP_OLLAMA_BASE_URL: http://ollama:11434
APP_OLLAMA_API_KEY: "${OLLAMA_API_KEY}"

The OCR equivalent uses proper interpolation:

APP_OCR_TRAINING_TOKEN: "${OCR_TRAINING_TOKEN:-}"

Following the same pattern:

APP_OLLAMA_BASE_URL: "${APP_OLLAMA_BASE_URL:-http://ollama:11434}"
APP_OLLAMA_API_KEY: "${OLLAMA_API_KEY}"

The hardcoded value means .env.example's instruction "Leave empty to disable NL search" doesn't work for the dev compose. On a resource-constrained machine where a developer wants to skip Ollama, there's no env-var lever. The graceful-degradation story only works end-to-end if the backend can actually receive a blank APP_OLLAMA_BASE_URL.

2. Version 0.6.5 referenced in two places where 0.30.6 is the deployed version

.env.example comment:

# NOTE: Empirically verified that OLLAMA_API_KEY is NOT enforced in Ollama 0.6.5 (ADR-028 §7).

ADR-028 §7 section title:

### 7. OLLAMA_API_KEY behavior in Ollama 0.6.5

Both should say 0.30.6 (or "0.6.5 and 0.30.6" since ADR §7 tested both). The deployed image is 0.30.6.

3. ADR-028 Consequences section has a stale pre-merge placeholder

- **Three TBD items** (OLLAMA_API_KEY empty-string behavior, `read_only` feasibility, init container peak RSS) must be resolved before the PR is merged. See Decisions §7, §8, §9.

These are resolved in §7, §8, §9 respectively. The placeholder was written before the empirical work and never updated. Remove it or replace with: "All three empirical TBD items from the original issue spec were resolved — see §7, §8, §9."

What Looks Good

$$SERVE_PID / $$! escaping in the command: YAML ✅ — correct escape for Compose interpolation
condition: service_completed_successfully for the init container dependency ✅
restart: "no" on ollama-model-init ✅ — one-shot containers shouldn't restart
memswap_limit matching mem_limit ✅ — prevents silent latency degradation from swap
The ollama_models volume correctly declared with no bind-mount ✅

## 👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer **Verdict: ⚠️ Approved with concerns** Infrastructure-only PR — no Java, TypeScript, or Python. TDD doesn't apply. The YAML and ADR quality are high. A few documentation inconsistencies and one env-var pattern deviation worth fixing. --- ### Concerns **1. Backend `APP_OLLAMA_BASE_URL` is a literal, not an interpolated env var** (`docker-compose.yml`) ```yaml # Current — hardcoded literal APP_OLLAMA_BASE_URL: http://ollama:11434 APP_OLLAMA_API_KEY: "${OLLAMA_API_KEY}" ``` The OCR equivalent uses proper interpolation: ```yaml APP_OCR_TRAINING_TOKEN: "${OCR_TRAINING_TOKEN:-}" ``` Following the same pattern: ```yaml APP_OLLAMA_BASE_URL: "${APP_OLLAMA_BASE_URL:-http://ollama:11434}" APP_OLLAMA_API_KEY: "${OLLAMA_API_KEY}" ``` The hardcoded value means `.env.example`'s instruction "Leave empty to disable NL search" doesn't work for the dev compose. On a resource-constrained machine where a developer wants to skip Ollama, there's no env-var lever. The graceful-degradation story only works end-to-end if the backend can actually receive a blank `APP_OLLAMA_BASE_URL`. **2. Version `0.6.5` referenced in two places where `0.30.6` is the deployed version** `.env.example` comment: ``` # NOTE: Empirically verified that OLLAMA_API_KEY is NOT enforced in Ollama 0.6.5 (ADR-028 §7). ``` ADR-028 §7 section title: ```markdown ### 7. OLLAMA_API_KEY behavior in Ollama 0.6.5 ``` Both should say `0.30.6` (or "0.6.5 and 0.30.6" since ADR §7 tested both). The deployed image is `0.30.6`. **3. ADR-028 Consequences section has a stale pre-merge placeholder** ```markdown - **Three TBD items** (OLLAMA_API_KEY empty-string behavior, `read_only` feasibility, init container peak RSS) must be resolved before the PR is merged. See Decisions §7, §8, §9. ``` These are resolved in §7, §8, §9 respectively. The placeholder was written before the empirical work and never updated. Remove it or replace with: "All three empirical TBD items from the original issue spec were resolved — see §7, §8, §9." --- ### What Looks Good - `$$SERVE_PID` / `$$!` escaping in the `command:` YAML ✅ — correct escape for Compose interpolation - `condition: service_completed_successfully` for the init container dependency ✅ - `restart: "no"` on `ollama-model-init` ✅ — one-shot containers shouldn't restart - `memswap_limit` matching `mem_limit` ✅ — prevents silent latency degradation from swap - The `ollama_models` volume correctly declared with no bind-mount ✅

marcel commented

2026-06-06 14:27:08 +02:00

📋 Elicit — Requirements Engineer

Verdict: ⚠️ Approved with concerns

This is an infrastructure story with one documented requirement that the implementation doesn't fulfil, and one operator-experience NFR that lacks traceability.

Concerns

1. Requirement/implementation gap: "leave empty to disable" doesn't work via .env

The issue spec, .env.example, and DEPLOYMENT.md all say operators can disable NL search by leaving APP_OLLAMA_BASE_URL blank. The intended behaviour:

Leave APP_OLLAMA_BASE_URL empty (or unset) → Ollama bean not registered → NL search returns 503

However, docker-compose.yml sets APP_OLLAMA_BASE_URL: http://ollama:11434 as a literal — not ${APP_OLLAMA_BASE_URL:-http://ollama:11434}. Setting APP_OLLAMA_BASE_URL= in .env has no effect on the backend container in the dev Compose.

The requirement is stated clearly. The implementation breaks it for the dev environment. This is a traceability failure — the spec says one thing, the config does another.

2. Operator experience NFR gap: no acceptance criterion for "pull in progress" visibility

The DEPLOYMENT.md documents the model pull command:

docker logs -f $(docker ps -q --filter name=ollama-model-init)

But there is no acceptance criterion in the issue that asks: "How does the operator know the init container is still running vs. stuck?"

The 60–90 minute pull is a high-friction first-start experience. A user story for this would be:

As a new operator, I want to see pull progress during first startup, so that I know the system is running and not stuck.

The docker logs command addresses this but it's buried in §3.4. A first-start notice at the top of DEPLOYMENT.md §3.4 — "On first start, the model pull takes 60–90 min. Monitor: docker logs -f ..." — would be more discoverable.

3. Sibling issue links missing from PR description

The ADR references sibling issues for the Spring Boot client and frontend graceful-degradation state. These sibling issues are not linked in the PR description. Linking them establishes end-to-end traceability from this infrastructure PR to the full feature.

What's Well-Specified

6 acceptance criteria from the original issue are behavioural and testable ✅
CX42/CX32/CX22 tier table maps hardware to feature availability — clear, unambiguous ✅
Memory budget (OCR ~6 GB + Ollama ~8 GB = ~14 GB) is stated as a measurable NFR ✅
Graceful-degradation contract is explicit: "503 NL_SEARCH_UNAVAILABLE" — not vague ✅
ollama_models volume lifecycle (no backup needed, delete to re-pull) is documented ✅

## 📋 Elicit — Requirements Engineer **Verdict: ⚠️ Approved with concerns** This is an infrastructure story with one documented requirement that the implementation doesn't fulfil, and one operator-experience NFR that lacks traceability. --- ### Concerns **1. Requirement/implementation gap: "leave empty to disable" doesn't work via `.env`** The issue spec, `.env.example`, and DEPLOYMENT.md all say operators can disable NL search by leaving `APP_OLLAMA_BASE_URL` blank. The intended behaviour: > Leave `APP_OLLAMA_BASE_URL` empty (or unset) → Ollama bean not registered → NL search returns 503 However, `docker-compose.yml` sets `APP_OLLAMA_BASE_URL: http://ollama:11434` as a literal — not `${APP_OLLAMA_BASE_URL:-http://ollama:11434}`. Setting `APP_OLLAMA_BASE_URL=` in `.env` has no effect on the backend container in the dev Compose. The requirement is stated clearly. The implementation breaks it for the dev environment. This is a traceability failure — the spec says one thing, the config does another. **2. Operator experience NFR gap: no acceptance criterion for "pull in progress" visibility** The DEPLOYMENT.md documents the model pull command: ```bash docker logs -f $(docker ps -q --filter name=ollama-model-init) ``` But there is no acceptance criterion in the issue that asks: *"How does the operator know the init container is still running vs. stuck?"* The 60–90 minute pull is a high-friction first-start experience. A user story for this would be: > **As a new operator**, I want to see pull progress during first startup, **so that** I know the system is running and not stuck. The `docker logs` command addresses this but it's buried in §3.4. A first-start notice at the top of DEPLOYMENT.md §3.4 — *"On first start, the model pull takes 60–90 min. Monitor: `docker logs -f ...`"* — would be more discoverable. **3. Sibling issue links missing from PR description** The ADR references sibling issues for the Spring Boot client and frontend graceful-degradation state. These sibling issues are not linked in the PR description. Linking them establishes end-to-end traceability from this infrastructure PR to the full feature. --- ### What's Well-Specified - 6 acceptance criteria from the original issue are behavioural and testable ✅ - CX42/CX32/CX22 tier table maps hardware to feature availability — clear, unambiguous ✅ - Memory budget (OCR ~6 GB + Ollama ~8 GB = ~14 GB) is stated as a measurable NFR ✅ - Graceful-degradation contract is explicit: "503 `NL_SEARCH_UNAVAILABLE`" — not vague ✅ - `ollama_models` volume lifecycle (no backup needed, delete to re-pull) is documented ✅

marcel commented

2026-06-06 14:27:12 +02:00

🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead

Verdict: ✅ Approved

Pure infrastructure PR — Docker Compose services, ADR, Prometheus/Grafana observability config, and deployment documentation. No frontend code, no Svelte components, no route changes, no CSS, no brand tokens, no user-facing UI.

Nothing in my review domain is affected by this PR. The Grafana dashboard panels (latency p50/p95, request rate) are internal operational tooling, not user-facing UI.

LGTM from the UI/UX perspective. No accessibility, responsive design, or brand compliance concerns.

## 🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead **Verdict: ✅ Approved** Pure infrastructure PR — Docker Compose services, ADR, Prometheus/Grafana observability config, and deployment documentation. No frontend code, no Svelte components, no route changes, no CSS, no brand tokens, no user-facing UI. Nothing in my review domain is affected by this PR. The Grafana dashboard panels (latency p50/p95, request rate) are internal operational tooling, not user-facing UI. LGTM from the UI/UX perspective. No accessibility, responsive design, or brand compliance concerns.

marcel commented

2026-06-06 14:41:23 +02:00

Re-review response

All reviewer concerns addressed in 5 commits pushed to this branch.

Blockers resolved

B1 — ADR §12 false security claim (Markus + Nora)
Commit fca367df — Replaced "Defense-in-depth: OLLAMA_API_KEY guards against lateral movement" with an accurate note referencing §7's empirical finding: not enforced in 0.6.5 or 0.30.6, archiv-net is the sole effective control.

B2 — Stale TBD placeholder in Consequences (Markus + Felix)
Commit fca367df — Replaced "Three TBD items must be resolved before the PR is merged" with "All three empirical TBD items from the original issue spec were resolved — see §7, §8, §9."

B3 — C4 diagram wrong Ollama version (Markus)
Commit 8c45ad01 — ollama/ollama:0.6.5 → ollama/ollama:0.30.6 in l2-containers.puml.

Concerns addressed

C1 — APP_OLLAMA_BASE_URL hardcoded literal (Tobias + Felix + Elicit)
Commit ea50fb9e — Changed to "${APP_OLLAMA_BASE_URL:-http://ollama:11434}", matching the APP_OCR_TRAINING_TOKEN pattern. Setting APP_OLLAMA_BASE_URL= in .env now correctly reaches the backend container.

C2 — .env.example comment says "Ollama 0.6.5" (Nora + Felix)
Commit dce57750 — Updated to "Ollama 0.6.5 or 0.30.6". Also made the archiv-net note explicit in the comment.

C3 — ADR §7 section title says "0.6.5" (Felix)
Commit fca367df — Title updated to "OLLAMA_API_KEY behavior in Ollama 0.6.5 and 0.30.6".

C4 — DEPLOYMENT.md version ref + missing --wait warning (Tobias)
Commit 6bc434eb — OLLAMA_API_KEY row updated from 0.6.5 to "0.6.5 or 0.30.6". Added an explicit blockquote to §3.4: "Do not use --wait on first deploy — the init container pull takes 60–90 minutes and will time out any deploy script that uses --wait."

Intentionally unchanged

No depends_on: ollama in backend (Tobias concern #2): Intentional per ADR §3 — the graceful-degradation design means the backend can start and serve 503 NL_SEARCH_UNAVAILABLE without Ollama being ready.

Deferred (require running infrastructure)

Grafana metric names (Sara concern #1): Verifying ollama_request_duration_seconds_bucket and ollama_requests_total requires curl http://localhost:11434/metrics against a live Ollama 0.30.6 instance. Will verify and update ollama.json before the observability stack goes to production.

Grafana provisioning auto-discovery (Sara concern #2 / Tobias concern #4): Requires running docker compose -f docker-compose.observability.yml up -d and checking docker logs obs-grafana | grep ollama. Same gate — verify before first observability deploy.

## Re-review response All reviewer concerns addressed in 5 commits pushed to this branch. --- ### Blockers resolved **B1 — ADR §12 false security claim** (Markus + Nora) Commit `fca367df` — Replaced "Defense-in-depth: `OLLAMA_API_KEY` guards against lateral movement" with an accurate note referencing §7's empirical finding: not enforced in 0.6.5 or 0.30.6, `archiv-net` is the sole effective control. **B2 — Stale TBD placeholder in Consequences** (Markus + Felix) Commit `fca367df` — Replaced "Three TBD items must be resolved before the PR is merged" with "All three empirical TBD items from the original issue spec were resolved — see §7, §8, §9." **B3 — C4 diagram wrong Ollama version** (Markus) Commit `8c45ad01` — `ollama/ollama:0.6.5` → `ollama/ollama:0.30.6` in `l2-containers.puml`. --- ### Concerns addressed **C1 — `APP_OLLAMA_BASE_URL` hardcoded literal** (Tobias + Felix + Elicit) Commit `ea50fb9e` — Changed to `"${APP_OLLAMA_BASE_URL:-http://ollama:11434}"`, matching the `APP_OCR_TRAINING_TOKEN` pattern. Setting `APP_OLLAMA_BASE_URL=` in `.env` now correctly reaches the backend container. **C2 — `.env.example` comment says "Ollama 0.6.5"** (Nora + Felix) Commit `dce57750` — Updated to "Ollama 0.6.5 or 0.30.6". Also made the `archiv-net` note explicit in the comment. **C3 — ADR §7 section title says "0.6.5"** (Felix) Commit `fca367df` — Title updated to "OLLAMA_API_KEY behavior in Ollama 0.6.5 and 0.30.6". **C4 — DEPLOYMENT.md version ref + missing `--wait` warning** (Tobias) Commit `6bc434eb` — `OLLAMA_API_KEY` row updated from 0.6.5 to "0.6.5 or 0.30.6". Added an explicit blockquote to §3.4: "Do not use `--wait` on first deploy — the init container pull takes 60–90 minutes and will time out any deploy script that uses `--wait`." --- ### Intentionally unchanged **No `depends_on: ollama` in backend** (Tobias concern #2): Intentional per ADR §3 — the graceful-degradation design means the backend can start and serve 503 `NL_SEARCH_UNAVAILABLE` without Ollama being ready. --- ### Deferred (require running infrastructure) **Grafana metric names** (Sara concern #1): Verifying `ollama_request_duration_seconds_bucket` and `ollama_requests_total` requires `curl http://localhost:11434/metrics` against a live Ollama 0.30.6 instance. Will verify and update `ollama.json` before the observability stack goes to production. **Grafana provisioning auto-discovery** (Sara concern #2 / Tobias concern #4): Requires running `docker compose -f docker-compose.observability.yml up -d` and checking `docker logs obs-grafana | grep ollama`. Same gate — verify before first observability deploy.

marcel commented

2026-06-06 14:46:40 +02:00

🏛️ Markus Keller (@mkeller) — Application Architect

Verdict: ✅ Approved

All three blockers from the previous review are resolved. Clean.

Blockers resolved

B1 — §12 no longer claims OLLAMA_API_KEY as defense-in-depth. The updated text accurately states archiv-net isolation is the sole effective control and explicitly says "do not rely on it for access control." ✅
B2 — Consequences no longer contains the stale "Three TBD items must be resolved before merge." It now names concrete resolutions (§7, §8, §9). ✅
B3 — l2-containers.puml now reads ollama/ollama:0.30.6. ✅

Documentation compliance (doc-update table check)

Trigger	Required doc	Status
New Docker service	`l2-containers.puml`	✅ Updated
New Docker service	`docs/DEPLOYMENT.md`	✅ Updated
Architectural decision	New ADR in `docs/adr/`	✅ ADR-028

No new backend packages, Spring Boot controllers, or SvelteKit routes — no further doc updates required.

Architecture quality

APP_OLLAMA_BASE_URL: "${APP_OLLAMA_BASE_URL:-http://ollama:11434}" — the interpolation fix is architecturally important. Before, setting the env var to empty in .env silently had no effect; the feature was not truly opt-out. Now it is. ✅
Init container + inference server split is correct separation. The condition: service_completed_successfully dependency ensures the inference server only starts after model weights are in the volume. ✅
No depends_on: ollama in backend is the right call. Graceful degradation via Optional<OllamaClient> is documented in ADR §3 and §5. ✅
ADR-028 is thorough. 14 sections with empirical evidence (§7, §8, §9) is exactly the depth I want in an ADR for infrastructure that has non-obvious behavior (OLLAMA_API_KEY not enforced).

Minor observation (non-blocking)

The command: on ollama-model-init uses YAML block scalar (>) which folds newlines to spaces. The result is a single-line string passed to sh -c. Functionally correct, but a list form would be more readable:

command:
  - sh
  - -c
  - "ollama serve & SERVE_PID=$! && ..."

Not a blocker — just a style note for future maintenance.

## 🏛️ Markus Keller (@mkeller) — Application Architect **Verdict: ✅ Approved** All three blockers from the previous review are resolved. Clean. ### Blockers resolved - **B1** — §12 no longer claims `OLLAMA_API_KEY` as defense-in-depth. The updated text accurately states archiv-net isolation is the sole effective control and explicitly says "do not rely on it for access control." ✅ - **B2** — Consequences no longer contains the stale "Three TBD items must be resolved before merge." It now names concrete resolutions (§7, §8, §9). ✅ - **B3** — `l2-containers.puml` now reads `ollama/ollama:0.30.6`. ✅ ### Documentation compliance (doc-update table check) | Trigger | Required doc | Status | |---|---|---| | New Docker service | `l2-containers.puml` | ✅ Updated | | New Docker service | `docs/DEPLOYMENT.md` | ✅ Updated | | Architectural decision | New ADR in `docs/adr/` | ✅ ADR-028 | No new backend packages, Spring Boot controllers, or SvelteKit routes — no further doc updates required. ### Architecture quality - `APP_OLLAMA_BASE_URL: "${APP_OLLAMA_BASE_URL:-http://ollama:11434}"` — the interpolation fix is architecturally important. Before, setting the env var to empty in `.env` silently had no effect; the feature was not truly opt-out. Now it is. ✅ - Init container + inference server split is correct separation. The `condition: service_completed_successfully` dependency ensures the inference server only starts after model weights are in the volume. ✅ - No `depends_on: ollama` in backend is the right call. Graceful degradation via `Optional<OllamaClient>` is documented in ADR §3 and §5. ✅ - ADR-028 is thorough. 14 sections with empirical evidence (§7, §8, §9) is exactly the depth I want in an ADR for infrastructure that has non-obvious behavior (`OLLAMA_API_KEY` not enforced). ### Minor observation (non-blocking) The `command:` on `ollama-model-init` uses YAML block scalar (`>`) which folds newlines to spaces. The result is a single-line string passed to `sh -c`. Functionally correct, but a list form would be more readable: ```yaml command: - sh - -c - "ollama serve & SERVE_PID=$! && ..." ``` Not a blocker — just a style note for future maintenance.

marcel commented

2026-06-06 14:46:54 +02:00

🛠️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer

Verdict: ✅ Approved

Previous concerns are addressed. Looks solid for the infrastructure side.

Previous concerns resolved

C1 — APP_OLLAMA_BASE_URL interpolation fixed: "${APP_OLLAMA_BASE_URL:-http://ollama:11434}". Disabling Ollama by setting the env var to empty in .env now works correctly. ✅
C4a — docs/DEPLOYMENT.md OLLAMA_API_KEY row now reads 0.6.5 or 0.30.6. ✅
C4b — --wait blocking warning added to DEPLOYMENT.md §3.4. This is exactly the footgun documentation I wanted: "Do not use --wait on first deploy." ✅

What looks good

ollama/ollama:0.30.6 is pinned — not :latest. ✅ (Add to Renovate config when ready.)
ollama_models is a named Docker volume — model weights survive container recreation without bind mounts. ✅
ADR §14 correctly documents that the volume holds fully reproducible content (no backup needed, docker volume rm + recreate to free after model upgrades). ✅
read_only: true, tmpfs: /tmp:size=512m, cap_drop: ALL, no-new-privileges: true on both Ollama services — matches the OCR service hardening baseline (ADR-019). ✅
expose: (not ports:) on both Ollama services — port 11434 never reaches the host or internet. ✅
prometheus.yml OCR target fixed from ocr:8000 → ocr-service:8000. This was a latent bug — the scrape job was silently failing. ✅
Ollama scrape job at ollama:11434 added. ✅

Position unchanged

I'm not adding depends_on: ollama to the backend. This is the correct call — the graceful-degradation contract in ADR §3 means the backend must start without Ollama available. Network-level unavailability is handled at runtime, not at startup.

One note on mem_limit defaults

.env.example sets OLLAMA_MEM_LIMIT=8g which is only viable on CX42. The comment in .env.example and the DEPLOYMENT.md hardware tier table both make this clear. Good. Operators on CX32 who copy .env.example without reading docs will OOM — but the comments are right there.

## 🛠️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer **Verdict: ✅ Approved** Previous concerns are addressed. Looks solid for the infrastructure side. ### Previous concerns resolved - **C1** — `APP_OLLAMA_BASE_URL` interpolation fixed: `"${APP_OLLAMA_BASE_URL:-http://ollama:11434}"`. Disabling Ollama by setting the env var to empty in `.env` now works correctly. ✅ - **C4a** — `docs/DEPLOYMENT.md` OLLAMA_API_KEY row now reads `0.6.5 or 0.30.6`. ✅ - **C4b** — `--wait` blocking warning added to DEPLOYMENT.md §3.4. This is exactly the footgun documentation I wanted: "Do not use `--wait` on first deploy." ✅ ### What looks good - `ollama/ollama:0.30.6` is pinned — not `:latest`. ✅ (Add to Renovate config when ready.) - `ollama_models` is a named Docker volume — model weights survive container recreation without bind mounts. ✅ - ADR §14 correctly documents that the volume holds fully reproducible content (no backup needed, `docker volume rm` + recreate to free after model upgrades). ✅ - `read_only: true`, `tmpfs: /tmp:size=512m`, `cap_drop: ALL`, `no-new-privileges: true` on both Ollama services — matches the OCR service hardening baseline (ADR-019). ✅ - `expose:` (not `ports:`) on both Ollama services — port 11434 never reaches the host or internet. ✅ - `prometheus.yml` OCR target fixed from `ocr:8000` → `ocr-service:8000`. This was a latent bug — the scrape job was silently failing. ✅ - Ollama scrape job at `ollama:11434` added. ✅ ### Position unchanged I'm not adding `depends_on: ollama` to the backend. This is the correct call — the graceful-degradation contract in ADR §3 means the backend must start without Ollama available. Network-level unavailability is handled at runtime, not at startup. ### One note on mem_limit defaults `.env.example` sets `OLLAMA_MEM_LIMIT=8g` which is only viable on CX42. The comment in `.env.example` and the DEPLOYMENT.md hardware tier table both make this clear. Good. Operators on CX32 who copy `.env.example` without reading docs will OOM — but the comments are right there.

marcel commented

2026-06-06 14:47:08 +02:00

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: ✅ Approved

Previous blockers resolved. The security documentation is now honest about the threat model.

Previous blockers resolved

B1 — §12 of ADR-028 previously claimed OLLAMA_API_KEY "guards against lateral movement." This was a false security claim. The updated text correctly reads: "archiv-net network isolation is the sole effective security control. The env var is retained for forward compatibility only — do not rely on it for access control." ✅
The .env.example comment on OLLAMA_API_KEY now covers both tested versions (0.6.5 or 0.30.6) and explicitly states "archiv-net network isolation is the only effective access control." ✅

Security posture review

Network isolation (primary control):

Ollama uses expose: not ports:. Port 11434 is only reachable within archiv-net. The Caddyfile would need explicit routing to expose it — and the ADR §12 says it must not route to Ollama. ✅
The ADR honestly documents that API key enforcement is not working in the tested versions. A future Ollama version may enforce it; the env var is wired up so that enforcement would automatically apply. ✅

Container hardening:

cap_drop: ALL — drops all Linux capabilities ✅
no-new-privileges: true — prevents privilege escalation via setuid ✅
read_only: true — read-only filesystem, writes limited to tmpfs and the named volume ✅
tmpfs: /tmp:size=512m — tmpfs is capped to prevent DoS via disk fill ✅
memswap_limit matches mem_limit — prevents model weights from being swapped, which could leave sensitive model data on disk ✅

The APP_OLLAMA_BASE_URL interpolation fix is also a security improvement:
Before this fix, an operator who set APP_OLLAMA_BASE_URL= to disable Ollama (e.g., on a lower-tier host) was silently ignored — the backend hardcoded http://ollama:11434. The backend would attempt Ollama connections regardless. Now the env var controls the feature correctly, and the @ConditionalOnExpression blank-check ensures the RestClient bean is not registered when the URL is absent.

No new findings.

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** Previous blockers resolved. The security documentation is now honest about the threat model. ### Previous blockers resolved - **B1** — §12 of ADR-028 previously claimed `OLLAMA_API_KEY` "guards against lateral movement." This was a false security claim. The updated text correctly reads: "archiv-net network isolation is the sole effective security control. The env var is retained for forward compatibility only — do not rely on it for access control." ✅ - The `.env.example` comment on `OLLAMA_API_KEY` now covers both tested versions (0.6.5 or 0.30.6) and explicitly states "archiv-net network isolation is the only effective access control." ✅ ### Security posture review **Network isolation (primary control):** - Ollama uses `expose:` not `ports:`. Port 11434 is only reachable within `archiv-net`. The Caddyfile would need explicit routing to expose it — and the ADR §12 says it must not route to Ollama. ✅ - The ADR honestly documents that API key enforcement is not working in the tested versions. A future Ollama version may enforce it; the env var is wired up so that enforcement would automatically apply. ✅ **Container hardening:** - `cap_drop: ALL` — drops all Linux capabilities ✅ - `no-new-privileges: true` — prevents privilege escalation via setuid ✅ - `read_only: true` — read-only filesystem, writes limited to `tmpfs` and the named volume ✅ - `tmpfs: /tmp:size=512m` — tmpfs is capped to prevent DoS via disk fill ✅ - `memswap_limit` matches `mem_limit` — prevents model weights from being swapped, which could leave sensitive model data on disk ✅ **The `APP_OLLAMA_BASE_URL` interpolation fix is also a security improvement:** Before this fix, an operator who set `APP_OLLAMA_BASE_URL=` to disable Ollama (e.g., on a lower-tier host) was silently ignored — the backend hardcoded `http://ollama:11434`. The backend would attempt Ollama connections regardless. Now the env var controls the feature correctly, and the `@ConditionalOnExpression` blank-check ensures the `RestClient` bean is not registered when the URL is absent. ### No new findings.

marcel commented

2026-06-06 14:47:22 +02:00

🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist

Verdict: ⚠️ Approved with concerns

Infrastructure-only PR — unit/integration test pyramid doesn't apply. CI pipeline exclusion is correctly documented. But two open quality risks remain from the previous review.

Previous review concerns

The Grafana metric name verification and provisioning auto-discovery were explicitly deferred in the previous review. They remain unresolved. I'm noting them again so they don't silently evaporate post-merge.

Concerns (non-blocking for merge, but must be tracked)

1. Grafana metric names not verified against Ollama 0.30.6

infra/observability/grafana/provisioning/dashboards/ollama.json references:

ollama_request_duration_seconds_bucket
ollama_requests_total

These are assumed metric names. They have not been verified by running curl http://localhost:11434/metrics against a live Ollama 0.30.6 instance. If these names are wrong, the dashboard silently shows "No data" — indistinguishable from "Ollama is healthy but idle."

Verification command:

curl -s http://localhost:11434/metrics | grep -E '^(ollama_|# HELP ollama_)'

This should be a follow-up issue before the observability dashboard is declared functional.

2. Grafana provisioning auto-discovery not verified

The ollama.json file is placed in infra/observability/grafana/provisioning/dashboards/. Whether the Grafana provisioning config (grafana/provisioning/dashboards/dashboard.yml or equivalent) auto-discovers all JSON files in that directory — or requires explicit registration — has not been verified against the running observability stack.

3. Init container idempotency on re-pull (minor, documentation gap)

When docker compose up -d --force-recreate is run, ollama-model-init re-executes ollama pull qwen2.5:7b-instruct-q4_K_M. The pull exits quickly if the model is already in the volume (Ollama's CLI skips re-download when the digest matches). This is correct behavior, but it's not documented in DEPLOYMENT.md. Worth adding a note so operators don't panic when they see the init container run again on redeploy.

What works

CI exclusion via explicit service selection in docker-compose.ci.yml — correct approach, no Compose profiles friction. ✅
Healthcheck on ollama uses /api/tags with appropriate start_period: 60s. ✅
ollama-model-init depends_on: condition: service_completed_successfully ensures inference server only starts after model pull. ✅

## 🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist **Verdict: ⚠️ Approved with concerns** Infrastructure-only PR — unit/integration test pyramid doesn't apply. CI pipeline exclusion is correctly documented. But two open quality risks remain from the previous review. ### Previous review concerns The Grafana metric name verification and provisioning auto-discovery were explicitly deferred in the previous review. They remain unresolved. I'm noting them again so they don't silently evaporate post-merge. ### Concerns (non-blocking for merge, but must be tracked) **1. Grafana metric names not verified against Ollama 0.30.6** `infra/observability/grafana/provisioning/dashboards/ollama.json` references: - `ollama_request_duration_seconds_bucket` - `ollama_requests_total` These are assumed metric names. They have not been verified by running `curl http://localhost:11434/metrics` against a live Ollama 0.30.6 instance. If these names are wrong, the dashboard silently shows "No data" — indistinguishable from "Ollama is healthy but idle." Verification command: ```bash curl -s http://localhost:11434/metrics | grep -E '^(ollama_|# HELP ollama_)' ``` This should be a follow-up issue before the observability dashboard is declared functional. **2. Grafana provisioning auto-discovery not verified** The `ollama.json` file is placed in `infra/observability/grafana/provisioning/dashboards/`. Whether the Grafana provisioning config (`grafana/provisioning/dashboards/dashboard.yml` or equivalent) auto-discovers all JSON files in that directory — or requires explicit registration — has not been verified against the running observability stack. **3. Init container idempotency on re-pull** (minor, documentation gap) When `docker compose up -d --force-recreate` is run, `ollama-model-init` re-executes `ollama pull qwen2.5:7b-instruct-q4_K_M`. The pull exits quickly if the model is already in the volume (Ollama's CLI skips re-download when the digest matches). This is correct behavior, but it's not documented in DEPLOYMENT.md. Worth adding a note so operators don't panic when they see the init container run again on redeploy. ### What works - CI exclusion via explicit service selection in `docker-compose.ci.yml` — correct approach, no Compose profiles friction. ✅ - Healthcheck on `ollama` uses `/api/tags` with appropriate `start_period: 60s`. ✅ - `ollama-model-init` `depends_on: condition: service_completed_successfully` ensures inference server only starts after model pull. ✅

marcel commented

2026-06-06 14:47:38 +02:00

📋 Elicit — Requirements Engineer

Verdict: ⚠️ Approved with concerns

Reviewing for completeness, operational specification, and traceability.

Well-specified

Hardware tier table in DEPLOYMENT.md with NL Search column — operators know immediately whether their instance supports the feature. ✅
Memory budget table in ADR §2 — clear per-component breakdown with operational implication ("do NOT run observability stack alongside OCR + Ollama on CX42"). ✅
--wait warning in DEPLOYMENT.md §3.4 — documents a concrete deployment footgun with exact symptom (60–90 min block). ✅
Env var table in DEPLOYMENT.md — APP_OLLAMA_BASE_URL disable pattern documented with the "Leave empty to disable" phrasing. ✅
CI exclusion strategy — explicit service selection, no profile friction, comment in Compose file. ✅
ADR §3 graceful degradation contract — single code path covers all unavailability scenarios. This is the right NFR for a family archive with variable hosting. ✅

Gap 1: Model upgrade runbook missing from DEPLOYMENT.md (blocker suggestion)

ADR §14 explicitly states:

"Model upgrades require a docker volume rm to free old weights before pulling the replacement. Document this in runbook/DEPLOYMENT.md."

This documentation does not yet exist in DEPLOYMENT.md. An operator who upgrades from qwen2.5:7b-instruct-q4_K_M to a newer model will hit a disk-full scenario without guidance. Suggested addition to DEPLOYMENT.md §3.4 or a new §3.5:

### Model upgrade
To upgrade the Ollama model (e.g. from qwen2.5:7b-instruct-q4_K_M to a newer version):
1. Update the model name in the `ollama-model-init` command in `docker-compose.yml`
2. Remove the existing model volume: `docker volume rm familienarchiv_ollama_models`
3. Restart: `docker compose up -d`
The init container will pull the new model weights on first start (~4–8 GB download).

Gap 2: No post-deploy verification step in DEPLOYMENT.md

DEPLOYMENT.md documents how to enable Ollama but not how to verify the feature is operational after deploy. Operators need a smoke test. Suggested addition:

### Verify NL search is active
curl http://localhost:8080/api/nl-search?q=brief+von+grossmutter
# Should return 200 with results, not 503 NL_SEARCH_UNAVAILABLE

Non-blocking observation

.env.example says "Requires CX42 (16 GB RAM) to run alongside OCR." This duplicates the DEPLOYMENT.md hardware tier table. A cross-reference (see DEPLOYMENT.md §3 for hardware requirements) would prevent future drift between the two sources.

## 📋 Elicit — Requirements Engineer **Verdict: ⚠️ Approved with concerns** Reviewing for completeness, operational specification, and traceability. ### Well-specified - Hardware tier table in DEPLOYMENT.md with NL Search column — operators know immediately whether their instance supports the feature. ✅ - Memory budget table in ADR §2 — clear per-component breakdown with operational implication ("do NOT run observability stack alongside OCR + Ollama on CX42"). ✅ - `--wait` warning in DEPLOYMENT.md §3.4 — documents a concrete deployment footgun with exact symptom (60–90 min block). ✅ - Env var table in DEPLOYMENT.md — `APP_OLLAMA_BASE_URL` disable pattern documented with the "Leave empty to disable" phrasing. ✅ - CI exclusion strategy — explicit service selection, no profile friction, comment in Compose file. ✅ - ADR §3 graceful degradation contract — single code path covers all unavailability scenarios. This is the right NFR for a family archive with variable hosting. ✅ ### Gap 1: Model upgrade runbook missing from DEPLOYMENT.md (blocker suggestion) ADR §14 explicitly states: > "Model upgrades require a `docker volume rm` to free old weights before pulling the replacement. Document this in runbook/DEPLOYMENT.md." This documentation does not yet exist in DEPLOYMENT.md. An operator who upgrades from `qwen2.5:7b-instruct-q4_K_M` to a newer model will hit a disk-full scenario without guidance. Suggested addition to DEPLOYMENT.md §3.4 or a new §3.5: ``` ### Model upgrade To upgrade the Ollama model (e.g. from qwen2.5:7b-instruct-q4_K_M to a newer version): 1. Update the model name in the `ollama-model-init` command in `docker-compose.yml` 2. Remove the existing model volume: `docker volume rm familienarchiv_ollama_models` 3. Restart: `docker compose up -d` The init container will pull the new model weights on first start (~4–8 GB download). ``` ### Gap 2: No post-deploy verification step in DEPLOYMENT.md DEPLOYMENT.md documents how to enable Ollama but not how to verify the feature is operational after deploy. Operators need a smoke test. Suggested addition: ``` ### Verify NL search is active curl http://localhost:8080/api/nl-search?q=brief+von+grossmutter # Should return 200 with results, not 503 NL_SEARCH_UNAVAILABLE ``` ### Non-blocking observation `.env.example` says "Requires CX42 (16 GB RAM) to run alongside OCR." This duplicates the DEPLOYMENT.md hardware tier table. A cross-reference (`see DEPLOYMENT.md §3 for hardware requirements`) would prevent future drift between the two sources.

marcel commented

2026-06-06 14:47:46 +02:00

🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead

Verdict: ✅ Approved

Pure infrastructure PR — Docker Compose services, ADR, Prometheus scrape config, Grafana dashboard JSON. No frontend components, Svelte files, or user-facing UI changes to review.

UX-adjacent observation

The only user-facing consequence of this PR is the HTTP 503 NL_SEARCH_UNAVAILABLE error code that gets returned when Ollama is not configured. The graceful-degradation contract (ADR §3) is well-specified:

app.ollama.base-url absent OR blank → NL search returns HTTP 503 with ErrorCode: NL_SEARCH_UNAVAILABLE

From a UX perspective, this error needs to be:

Caught in the frontend NL search component
Displayed as a user-friendly message via getErrorMessage(NL_SEARCH_UNAVAILABLE) — not raw JSON
Ideally: the NL search UI is hidden or disabled when the backend signals unavailability, rather than showing an error after the user already typed a query

These are concerns for the frontend NL search PR (issue #735 epic), not this infra PR. I'll flag them there.

LGTM on infrastructure.

## 🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead **Verdict: ✅ Approved** Pure infrastructure PR — Docker Compose services, ADR, Prometheus scrape config, Grafana dashboard JSON. No frontend components, Svelte files, or user-facing UI changes to review. ### UX-adjacent observation The only user-facing consequence of this PR is the HTTP 503 `NL_SEARCH_UNAVAILABLE` error code that gets returned when Ollama is not configured. The graceful-degradation contract (ADR §3) is well-specified: > `app.ollama.base-url` absent OR blank → NL search returns HTTP 503 with `ErrorCode: NL_SEARCH_UNAVAILABLE` From a UX perspective, this error needs to be: 1. Caught in the frontend NL search component 2. Displayed as a user-friendly message via `getErrorMessage(NL_SEARCH_UNAVAILABLE)` — not raw JSON 3. Ideally: the NL search UI is hidden or disabled when the backend signals unavailability, rather than showing an error after the user already typed a query These are concerns for the frontend NL search PR (issue #735 epic), not this infra PR. I'll flag them there. LGTM on infrastructure.

marcel commented

2026-06-06 14:48:00 +02:00

👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer

Verdict: ✅ Approved

No application code changed — this is infrastructure and documentation. Reviewing the embedded Java code patterns in the ADR and the Docker Compose correctness.

ADR code patterns (will be implemented in the NL search backend PR)

The patterns documented in ADR-028 §3–6 are clean:

@ConditionalOnExpression blank-check (§3):

@ConditionalOnExpression("!'${app.ollama.base-url:}'.isBlank()")

Correct. @ConditionalOnProperty would register the bean when the property is present-but-blank, producing a RestClient with an empty base URL that fails at runtime with an opaque error. The blank-check pattern is more correct for optional features with no safe default. ✅

@ConfigurationProperties record (§4):

@ConfigurationProperties("app.ollama")
record OllamaProperties(String baseUrl, String apiKey) {}

Clean — immutable record, registered unconditionally as a value holder. Correct separation from the conditional RestClientOllamaClient bean. ✅

Optional<OllamaClient> injection (§5):

private final Optional<OllamaClient> ollamaClient;

Avoids null field with @Autowired(required = false), which is noisy in @RequiredArgsConstructor classes. orElseThrow(NL_SEARCH_UNAVAILABLE) is a clean guard. ✅

Empty API key guard (§6):

if (!apiKey.isBlank()) {
    request.header("Authorization", "Bearer " + apiKey);
}

Mirrors trainingToken guard in RestClientOcrClient.java:107. Consistent pattern. ✅

Docker Compose correctness

Init container PID capture (ADR §10):

ollama serve & SERVE_PID=$!
until curl -sf http://localhost:11434/api/tags; do sleep 1; done
ollama pull qwen2.5:7b-instruct-q4_K_M
kill $SERVE_PID

SERVE_PID=$! is the right approach — kill %1 is job-control syntax and unreliable in non-interactive sh -c contexts. ✅

depends_on: condition: service_completed_successfully on ollama → ollama-model-init ensures inference server starts only after pull completes. Correct. ✅

No TDD concerns

This PR contains no production application code. Docker Compose, Prometheus YAML, and Grafana JSON have no unit test layer. The CI exclusion via explicit service selection in docker-compose.ci.yml is the right approach.

One minor style note (non-blocking): the > YAML block scalar for ollama-model-init command folds newlines to spaces — functionally identical to a single-line string, but harder to read. Could be a list form in a future cleanup.

## 👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer **Verdict: ✅ Approved** No application code changed — this is infrastructure and documentation. Reviewing the embedded Java code patterns in the ADR and the Docker Compose correctness. ### ADR code patterns (will be implemented in the NL search backend PR) The patterns documented in ADR-028 §3–6 are clean: **`@ConditionalOnExpression` blank-check (§3):** ```java @ConditionalOnExpression("!'${app.ollama.base-url:}'.isBlank()") ``` Correct. `@ConditionalOnProperty` would register the bean when the property is present-but-blank, producing a `RestClient` with an empty base URL that fails at runtime with an opaque error. The blank-check pattern is more correct for optional features with no safe default. ✅ **`@ConfigurationProperties` record (§4):** ```java @ConfigurationProperties("app.ollama") record OllamaProperties(String baseUrl, String apiKey) {} ``` Clean — immutable record, registered unconditionally as a value holder. Correct separation from the conditional `RestClientOllamaClient` bean. ✅ **`Optional<OllamaClient>` injection (§5):** ```java private final Optional<OllamaClient> ollamaClient; ``` Avoids null field with `@Autowired(required = false)`, which is noisy in `@RequiredArgsConstructor` classes. `orElseThrow(NL_SEARCH_UNAVAILABLE)` is a clean guard. ✅ **Empty API key guard (§6):** ```java if (!apiKey.isBlank()) { request.header("Authorization", "Bearer " + apiKey); } ``` Mirrors `trainingToken` guard in `RestClientOcrClient.java:107`. Consistent pattern. ✅ ### Docker Compose correctness **Init container PID capture (ADR §10):** ```sh ollama serve & SERVE_PID=$! until curl -sf http://localhost:11434/api/tags; do sleep 1; done ollama pull qwen2.5:7b-instruct-q4_K_M kill $SERVE_PID ``` `SERVE_PID=$!` is the right approach — `kill %1` is job-control syntax and unreliable in non-interactive `sh -c` contexts. ✅ **`depends_on: condition: service_completed_successfully`** on `ollama` → `ollama-model-init` ensures inference server starts only after pull completes. Correct. ✅ ### No TDD concerns This PR contains no production application code. Docker Compose, Prometheus YAML, and Grafana JSON have no unit test layer. The CI exclusion via explicit service selection in `docker-compose.ci.yml` is the right approach. One minor style note (non-blocking): the `>` YAML block scalar for `ollama-model-init` command folds newlines to spaces — functionally identical to a single-line string, but harder to read. Could be a list form in a future cleanup.

marcel referenced this issue from a commit

2026-06-06 14:55:22 +02:00

docs(ollama): add model upgrade runbook + post-deploy smoke test to DEPLOYMENT.md

marcel commented

2026-06-06 14:55:29 +02:00

docs/DEPLOYMENT.md updated — commit 27bef28c

Addressed the open concerns from Elicit and Sara's reviews:

Model upgrade runbook (Elicit Gap 1 / ADR §14 self-reference): expanded the brief ollama_models note in §6 into a full step-by-step procedure — update model name in Compose, docker volume rm, docker compose up -d. Includes production volume name prefix (archiv-production_ollama_models).
Re-deploy idempotency note (Sara concern 3): added to §3.4 — ollama-model-init re-runs on --force-recreate but exits in seconds when model digest already matches the volume.
Post-deploy NL search smoke test (Elicit Gap 2): added to §3.4 — curl command distinguishing 200 (active) from 503 NL_SEARCH_UNAVAILABLE (Ollama unreachable or APP_OLLAMA_BASE_URL unset).

**`docs/DEPLOYMENT.md` updated** — commit `27bef28c` Addressed the open concerns from Elicit and Sara's reviews: 1. **Model upgrade runbook** (Elicit Gap 1 / ADR §14 self-reference): expanded the brief `ollama_models` note in §6 into a full step-by-step procedure — update model name in Compose, `docker volume rm`, `docker compose up -d`. Includes production volume name prefix (`archiv-production_ollama_models`). 2. **Re-deploy idempotency note** (Sara concern 3): added to §3.4 — `ollama-model-init` re-runs on `--force-recreate` but exits in seconds when model digest already matches the volume. 3. **Post-deploy NL search smoke test** (Elicit Gap 2): added to §3.4 — `curl` command distinguishing `200` (active) from `503 NL_SEARCH_UNAVAILABLE` (Ollama unreachable or `APP_OLLAMA_BASE_URL` unset).

marcel added 15 commits 2026-06-06 14:59:43 +02:00

docs(adr): add ADR-028 — Ollama Docker Compose service for NL search (#737 ) c0d034c85d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(observability): fix OCR target name + add Ollama scrape job (#737 ) 1f379a161d

- prometheus.yml: ocr:8000 → ocr-service:8000 (Docker service name is
  ocr-service, not ocr — current scrape target has never resolved)
- Add Ollama scrape job on ollama:11434 /metrics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(observability): add Grafana Ollama inference latency dashboard (#737 ) 25252fc709

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(arch): add Ollama container to C4 level-2 container diagram (#737 ) 64120a30b5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(deploy): document Ollama hardware requirements, env vars, and ops notes (#737 ) df10a42069

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(infra): add Ollama Docker Compose services for NL search (#737 ) 9637ebbca2

- ollama-model-init: one-shot init container that pulls qwen2.5:7b-instruct-q4_K_M
  into the ollama_models volume on first start
- ollama: main inference service on archiv-net (expose: only, no public port)
- ollama_models named volume for persistent model storage
- APP_OLLAMA_BASE_URL + APP_OLLAMA_API_KEY added to backend env
- Both services: cap_drop ALL, no-new-privileges, read_only+tmpfs (ADR-019 + ADR-028)
- start_period: 60s — model pre-pulled by init container

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(infra): add Ollama env vars to .env.example (#737 ) e8f3004c4f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): update ADR-028 with 0.30.6 verified findings for API key + read_only (#737 ) 93e90424ab

- OLLAMA_API_KEY: non-enforcement confirmed on both 0.6.5 and 0.30.6
- read_only: true: confirmed working on both 0.6.5 and 0.30.6
- Peak RSS during pull: ~108 MiB (well under 2g limit)
- All TBD placeholders resolved

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(infra): escape \$\$SERVE_PID in compose command to prevent interpolation (#737 ) 5a939d9222

Docker Compose interpolates $VAR in command strings — use $$ to pass a
literal $ to the shell so SERVE_PID=$! and kill $SERVE_PID work correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(adr): fix ADR-028 §12 false API-key claim, stale TBD, and §7 title 3536ed884c

§12 stated OLLAMA_API_KEY guards against lateral movement — contradicts
§7's empirical finding that it is not enforced. Replaced with an accurate
note referencing §7. Stale pre-merge placeholder in Consequences ("Three
TBD items must be resolved") removed; all three are resolved. §7 section
title updated from "0.6.5" to "0.6.5 and 0.30.6" to match the body text.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(c4): fix Ollama container version 0.6.5 → 0.30.6 in l2-containers.puml cbba95c3f8

Diagram must match the pinned image version in docker-compose.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(infra): interpolate APP_OLLAMA_BASE_URL so .env empty-value disables Ollama 662a8f3e80

Hardcoded literal overrides any .env setting — setting APP_OLLAMA_BASE_URL=
in .env had no effect on the backend container. Now uses the same pattern
as APP_OCR_TRAINING_TOKEN with a safe default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(env): correct OLLAMA_API_KEY comment — tested on 0.6.5 and 0.30.6 52fca38f0f

Both versions were tested and neither enforces the key. Comment updated to
say "0.6.5 or 0.30.6" and surface archiv-net as the sole effective control.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(deployment): fix OLLAMA_API_KEY version ref and add --wait warning 3d5dcd1f18

Updated OLLAMA_API_KEY env vars table from 0.6.5 to 0.6.5 or 0.30.6 to
match both tested versions. Added an explicit warning in §3.4 that
docker compose up -d --wait blocks for 60–90 min on first deploy when the
model pull has not yet completed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(ollama): add model upgrade runbook + post-deploy smoke test to DEPLOYMENT.md

CI / Unit & Component Tests (pull_request) Has been cancelled

Details

CI / OCR Service Tests (pull_request) Has been cancelled

Details

CI / Backend Unit Tests (pull_request) Has been cancelled

Details

CI / fail2ban Regex (pull_request) Has been cancelled

Details

CI / Semgrep Security Scan (pull_request) Has been cancelled

Details

CI / Compose Bucket Idempotency (pull_request) Has been cancelled

Details

CI / Unit & Component Tests (push) Successful in 3m16s

Details

CI / OCR Service Tests (push) Successful in 23s

Details

CI / Backend Unit Tests (push) Successful in 3m37s

Details

CI / fail2ban Regex (push) Successful in 47s

Details

CI / Semgrep Security Scan (push) Successful in 22s

Details

CI / Compose Bucket Idempotency (push) Successful in 1m4s

Details

7679596c70

Addresses Elicit's and Sara's review concerns on PR #749:
- Expand §6 ollama_models section into a full model upgrade runbook (step-by-step
  docker volume rm + recreate, including production volume name prefix)
- Add re-deploy idempotency note to §3.4 (init container exits quickly when model
  already present on the volume)
- Add NL search smoke test to §3.4 (curl command distinguishing 200 from 503
  NL_SEARCH_UNAVAILABLE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel force-pushed feat/issue-737-ollama-docker-compose from 27bef28c0e to 7679596c70

2026-06-06 14:59:43 +02:00

Compare

marcel referenced this issue from a commit

2026-06-06 14:59:43 +02:00

docs(ollama): add model upgrade runbook + post-deploy smoke test to DEPLOYMENT.md

marcel merged commit 7679596c70 into main

2026-06-06 15:19:06 +02:00

marcel deleted branch feat/issue-737-ollama-docker-compose

2026-06-06 15:19:06 +02:00

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: marcel/familienarchiv#749