docs(deploy): document Ollama hardware requirements, env vars, and ops notes (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -50,14 +50,16 @@ graph TD
 
 The OCR service requires significant RAM for model loading. The dev compose sets `mem_limit: 12g`.
 
-| Production target | RAM | Recommended OCR limit | Notes |
-|---|---|---|---|
-| Hetzner CX42 | 16 GB | 12 GB | Recommended for OCR-enabled production |
-| Hetzner CX32 | 8 GB | 6 GB | Accept reduced batch sizes and slower throughput |
-| Hetzner CX22 | 4 GB | — | Disable the OCR service (`profiles: [ocr]`); run OCR on demand only |
+| Production target | RAM | Recommended OCR limit | NL Search | Notes |
+|---|---|---|---|---|
+| Hetzner CX42 | 16 GB | 12 GB | Supported (Ollama 8 GB + OCR 6 GB active ≈ 14 GB) | Recommended for OCR-enabled production |
+| Hetzner CX32 | 8 GB | 6 GB | Disabled — set `APP_OLLAMA_BASE_URL=` (empty) | Accept reduced batch sizes and slower throughput |
+| Hetzner CX22 | 4 GB | — | Unsupported | Disable the OCR service (`profiles: [ocr]`); run OCR on demand only |
 
 A CX32 cannot honour the default `mem_limit: 12g` — set the `OCR_MEM_LIMIT=6g` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow) before deploying on a CX32. The prod compose interpolates this var with a 12g default.
 
+> **Memory budget (CX42):** OCR (~6 GB active) + Ollama (~8 GB) = ~14 GB. Do not run `docker-compose.observability.yml` continuously alongside both services on a CX42.
+
 ### Dev vs production differences
 
 | Concern | Dev (`docker-compose.yml`) | Prod (`docker-compose.prod.yml`) |
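The memory budget stated in this hunk is simple addition, so it can be checked mechanically before choosing a server size. The following is a minimal sketch, assuming whole-gigabyte figures taken from the table; `fits_budget` is a hypothetical helper and not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch only: fits_budget is a hypothetical helper, not a repository script.
# Arguments are whole gigabytes: host RAM, active OCR usage, Ollama limit.
fits_budget() {
  local host_gb=$1 ocr_gb=$2 ollama_gb=$3
  local used=$(( ocr_gb + ollama_gb ))
  local free=$(( host_gb - used ))
  if (( free < 0 )); then
    echo "over budget by $(( -free )) GB"
    return 1
  fi
  echo "${used} GB used, ${free} GB headroom"
}

fits_budget 16 6 8          # CX42: OCR ~6 GB active + Ollama 8 GB
fits_budget 8 6 8 || true   # CX32 with Ollama left enabled
```

The second call shows why the table disables NL search on a CX32: the two services together exceed the 8 GB host. The roughly 2 GB of headroom on a CX42 is also the reason the note advises against running the observability stack continuously alongside both.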
@@ -144,6 +146,16 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
 | `XDG_CACHE_HOME` | XDG cache base dir — redirects Matplotlib and other XDG-aware libraries away from the read-only `HOME` (`/home/ocr`) to the writable cache volume | `/app/cache` | — | — |
 | `TORCH_HOME` | PyTorch model cache — redirects `~/.cache/torch` to the writable models volume | `/app/models/torch` | — | — |
 
+### Ollama (NL search) service
+
+| Variable | Purpose | Default | Required? | Sensitive? |
+|---|---|---|---|---|
+| `APP_OLLAMA_BASE_URL` | Base URL for the Ollama service. Leave empty to disable NL search. | `http://ollama:11434` | — | — |
+| `APP_OLLAMA_API_KEY` | API key passed as `Authorization: Bearer` to Ollama. Leave empty for unauthenticated access. Note: `OLLAMA_API_KEY` is not enforced in Ollama 0.6.5 (see ADR-028). | — | — | YES |
+| `OLLAMA_CPU_LIMIT` | Docker CPU quota for the Ollama container. On CX42 (8 vCPUs) can be raised to `7.5`. | `4.0` | — | — |
+| `OLLAMA_MEM_LIMIT` | Memory limit for the Ollama container. Requires CX42 (16 GB RAM). | `8g` | — | — |
+| `OLLAMA_API_KEY` | API key set on the Ollama service itself. Same value as `APP_OLLAMA_API_KEY`. Leave empty for unauthenticated. | — | — | YES |
+
 ### Observability stack (`docker-compose.observability.yml`)
 
 | Variable | Purpose | Default | Required? | Sensitive? |
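The table gives `APP_OLLAMA_BASE_URL` a default and also says an empty value disables NL search, so "unset" and "set but empty" have to be told apart. The sketch below illustrates that distinction in plain bash. How the repository's compose files or backend actually interpolate the variable is not shown in this diff, so `ollama_probe_cmd` is a hypothetical helper. The `/api/tags` path is Ollama's endpoint for listing local models.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the curl command that would probe Ollama,
# or "disabled" when APP_OLLAMA_BASE_URL is set but empty.
ollama_probe_cmd() {
  # ${VAR-default} falls back only when VAR is unset, so an explicitly
  # empty value survives. ${VAR:-default} would also replace empty values.
  local base_url=${APP_OLLAMA_BASE_URL-http://ollama:11434}
  if [ -z "$base_url" ]; then
    echo "disabled"
    return 0
  fi
  local cmd="curl -fsS"
  if [ -n "${APP_OLLAMA_API_KEY-}" ]; then
    # Print the variable name, not its value, to keep the secret out of logs.
    cmd="$cmd -H 'Authorization: Bearer \$APP_OLLAMA_API_KEY'"
  fi
  echo "$cmd $base_url/api/tags"
}

ollama_probe_cmd
```

If an interpolation in the deployment uses the `:-` form, an empty value would be replaced by the default and would not disable NL search, which is worth verifying before relying on the CX32 guidance.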
@@ -264,6 +276,8 @@ git.raddatz.cloud A <server IP>
 
 ### 3.4 First deploy
 
+> **First start — Ollama model pull:** On first `docker compose up -d`, the `ollama-model-init` container pulls `qwen2.5:7b-instruct-q4_K_M` (~4.7 GB). At 10 Mbps this takes approximately 60–90 minutes; at 100 Mbps approximately 6–10 minutes. The pull is a one-time operation — subsequent restarts skip it (model already on the `ollama_models` volume). Monitor progress with `docker logs -f $(docker ps -q --filter name=ollama-model-init)`.
+
 ```bash
 # 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
 # Expected: docker compose up -d --wait succeeds for archiv-staging, then
@@ -559,6 +573,14 @@ bash scripts/download-kraken-models.sh
 
 > Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.
 
+### Manage the `ollama_models` volume
+
+> **`ollama_models` volume:** holds model weights only — fully reproducible by re-pull, no backup needed. If the volume fills after a model upgrade:
+> ```bash
+> docker volume rm ollama_models && docker compose up -d
+> ```
+> The init container re-pulls the model on next startup.
+
 ### Trigger a canonical import
 
 The importer no longer parses the raw spreadsheet. It consumes the **canonical artifacts**
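Removing the `ollama_models` volume triggers a fresh pull of the ~4.7 GB model, so NL search stays unavailable for as long as the download takes. The sketch below reproduces the lower bound of the durations quoted in the first-start note. It assumes decimal units (1 GB = 8000 megabits) and an ideal link with no protocol overhead; `pull_minutes` is a hypothetical helper, not a repository script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ideal transfer time in whole minutes.
# Usage: pull_minutes SIZE_GB LINK_MBPS
pull_minutes() {
  # size in megabits / link speed = seconds; divide by 60 for minutes
  awk -v gb="$1" -v mbps="$2" \
    'BEGIN { printf "%.0f\n", gb * 8000 / mbps / 60 }'
}

pull_minutes 4.7 10    # prints 63
pull_minutes 4.7 100   # prints 6
```

Real pulls are slower than the ideal figure, which is consistent with the note's ranges of 60 to 90 minutes and 6 to 10 minutes.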