security(ocr): run OCR container as non-root user (CIS Docker §4.1) #611

Merged
marcel merged 12 commits from feat/issue-459-ocr-non-root into main 2026-05-17 19:06:47 +02:00
2 changed files with 20 additions and 3 deletions
Showing only changes of commit 7769dbc9f4 - Show all commits

View File

@@ -142,8 +142,11 @@ services:
memswap_limit: ${OCR_MEM_LIMIT:-12g}
volumes:
- ocr-models:/app/models
- ocr-cache:/root/.cache
- ocr-cache:/app/cache # HuggingFace / ketos cache — prevents re-downloads on recreate (HF_HOME)
environment:
HF_HOME: /app/cache
XDG_CACHE_HOME: /app/cache
TORCH_HOME: /app/models/torch
KRAKEN_MODEL_PATH: /app/models/german_kurrent.mlmodel
TRAINING_TOKEN: ${OCR_TRAINING_TOKEN}
OCR_CONFIDENCE_THRESHOLD: "0.3"
@@ -161,6 +164,13 @@ services:
timeout: 5s
retries: 12
start_period: 120s
read_only: true
tmpfs:
- /tmp:size=512m # training endpoints write ZIPs to /tmp; 512 MB covers typical batches (2050 images)
cap_drop:
- ALL
security_opt:
- no-new-privileges:true
backend:
image: familienarchiv/backend:${TAG:-nightly}

View File

@@ -566,12 +566,19 @@ Version-specific one-time steps that must be run before or after upgrading to a
### Upgrading to PR #611 — non-root OCR container
The OCR cache volume path changed from `/root/.cache` to `/app/cache` (PR #611 — CIS Docker §4.1 hardening). The existing `ocr_cache` volume was written as root and is inaccessible to the new non-root `ocr` user, causing a `PermissionError` on startup.
The OCR cache volume path changed from `/root/.cache` to `/app/cache` (PR #611 — CIS Docker §4.1 hardening). The existing volume was written as root and is inaccessible to the new non-root `ocr` user, causing a `PermissionError` on startup.
**Before starting the updated container stack**, drop the old root-owned volume:
**Before starting the updated container stack**, drop the old root-owned volume. The volume name depends on the compose project name:
```bash
# Dev (docker-compose.yml — project name: familienarchiv)
docker volume rm familienarchiv_ocr_cache
# Production (docker-compose.prod.yml -p archiv-production)
docker volume rm archiv-production_ocr-cache
# Staging (docker-compose.prod.yml -p archiv-staging)
docker volume rm archiv-staging_ocr-cache
```
The volume is recreated automatically on `docker compose up`. The OCR service will re-download its model cache on first startup (approximately 12 GB, one-time cost).