security(ocr): run OCR container as non-root user (CIS Docker §4.1) #611

Merged
marcel merged 12 commits from feat/issue-459-ocr-non-root into main 2026-05-17 19:06:47 +02:00

12 Commits

Author SHA1 Message Date
Marcel
f1e0b92f47 style(ocr): normalize cap_drop to block notation in docker-compose.yml
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m10s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
CI / Unit & Component Tests (push) Successful in 3m2s
CI / OCR Service Tests (push) Successful in 18s
CI / Backend Unit Tests (push) Successful in 3m0s
CI / fail2ban Regex (push) Successful in 42s
CI / Semgrep Security Scan (push) Successful in 18s
CI / Compose Bucket Idempotency (push) Successful in 1m1s
Aligns with the block sequence style used in docker-compose.prod.yml and
the rest of the compose file, removing the inline [ALL] inconsistency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 18:54:24 +02:00
Marcel
bead6f1811 fix(ocr): handle empty-string HTRMOPO_DIR env var with or-fallback
os.environ.get(key, default) returns "" when the key exists but is blank —
the default is only used when the key is absent. The or-fallback treats both
absence and blank values as "use the default".
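A minimal sketch of the difference. It uses the default path from commit 9db42d6cc1. The function names are hypothetical and are not the service's actual code.

```python
import os
from collections.abc import Mapping

DEFAULT_HTRMOPO_DIR = "/app/models/.htrmopo"


def resolve_with_get(env: Mapping[str, str]) -> str:
    # The default applies only when the key is absent; a blank value is returned as-is.
    return env.get("HTRMOPO_DIR", DEFAULT_HTRMOPO_DIR)


def resolve_with_or(env: Mapping[str, str]) -> str:
    # `or` falls through on any falsy value, so both an absent key and "" use the default.
    return env.get("HTRMOPO_DIR") or DEFAULT_HTRMOPO_DIR


if __name__ == "__main__":
    blank = {"HTRMOPO_DIR": ""}
    print(repr(resolve_with_get(blank)))  # ''
    print(repr(resolve_with_or(blank)))   # '/app/models/.htrmopo'
    print(resolve_with_or(os.environ))    # real environment, result depends on the host
```

The or-fallback does not catch whitespace-only values such as "   ", because a non-empty string is truthy. Adding a `.strip()` call would cover that case if it ever matters.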

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 18:53:26 +02:00
Marcel
7769dbc9f4 security(ocr): apply container hardening baseline to docker-compose.prod.yml
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m4s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 18s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
Mirror the CIS Docker §4.1/§4.6 hardening from docker-compose.yml to the
production/staging compose file, which is standalone (not an overlay).

- Fix cache volume mount path: ocr-cache:/root/.cache → /app/cache (matches
  the non-root user's HF_HOME/XDG_CACHE_HOME, avoids PermissionError)
- Add HF_HOME, XDG_CACHE_HOME, TORCH_HOME env vars so Hugging Face, ketos,
  and PyTorch all write to the declared writable volumes, not HOME
- Add read_only: true, tmpfs (/tmp:512m), cap_drop: [ALL],
  no-new-privileges:true — matching the dev baseline

Also extend DEPLOYMENT.md §8 upgrade notes to cover all three environments
(dev/production/staging), each with its correct project-namespaced volume name.
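A small check over the parsed service definition is one way to keep the dev and production compose files from drifting apart. This is a minimal sketch that assumes the `ocr` service has already been loaded into a dict, for example with PyYAML's `yaml.safe_load`. The function name and the exact rule set are hypothetical: they are distilled from the bullet list above, not taken from the repository.

```python
from typing import Any


def hardening_violations(service: dict[str, Any]) -> list[str]:
    """Return violations of the container hardening baseline for one compose service."""
    problems: list[str] = []
    if service.get("read_only") is not True:
        problems.append("read_only must be true")
    # Inline [ALL] and block notation both parse to the list ["ALL"].
    if "ALL" not in service.get("cap_drop", []):
        problems.append("cap_drop must include ALL")
    if "no-new-privileges:true" not in service.get("security_opt", []):
        problems.append("security_opt must include no-new-privileges:true")
    tmpfs = service.get("tmpfs", [])
    if isinstance(tmpfs, str):
        tmpfs = [tmpfs]  # compose also accepts a single string here
    # Entries may carry options after the path, e.g. "/tmp:size=512m".
    if not any(entry.split(":", 1)[0] == "/tmp" for entry in tmpfs):
        problems.append("tmpfs must mount /tmp")
    for volume in service.get("volumes", []):
        # Short syntax "source:target[:mode]"; long-syntax dicts are skipped in this sketch.
        parts = volume.split(":") if isinstance(volume, str) else []
        if len(parts) >= 2 and (parts[1] == "/root" or parts[1].startswith("/root/")):
            problems.append(f"volume {volume!r} targets /root, unwritable for the non-root user")
    return problems


ocr_service = {
    "read_only": True,
    "tmpfs": ["/tmp:size=512m"],
    "cap_drop": ["ALL"],
    "security_opt": ["no-new-privileges:true"],
    "volumes": ["ocr-cache:/app/cache"],
}
print(hardening_violations(ocr_service))  # []
```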

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:43:18 +02:00
Marcel
74ca5ee35f docs(adr): ADR-019 — container hardening baseline (non-root + read-only)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m2s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m11s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 17s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:33:06 +02:00
Marcel
38973a014e docs: add XDG_CACHE_HOME/TORCH_HOME to OCR env table and upgrade notes for PR #611
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:32:02 +02:00
Marcel
fc8b4b164b security(ocr): redirect XDG cache and Torch home away from read-only HOME
Prevents PyTorch/Matplotlib/Ketos from writing to /home/ocr, which is
on the read-only container filesystem. This resolves the blocker raised in
Nora's review. Also restores the explanatory comment on the ocr_cache volume mount.
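A startup or test-time check can confirm that each redirected cache variable points at a writable location. This is a minimal sketch. The variable names come from this PR, but the helper `unwritable_cache_dirs` is hypothetical. It relies on `os.access`, which reports a path on a read-only filesystem as not writable.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Cache variables named in this PR; each should point at a writable mount, not HOME.
CACHE_ENV_VARS = ("HF_HOME", "XDG_CACHE_HOME", "TORCH_HOME")


def unwritable_cache_dirs(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Map each unset or non-writable cache variable to a short reason."""
    env = os.environ if env is None else env
    problems: dict[str, str] = {}
    for name in CACHE_ENV_VARS:
        path = env.get(name)
        if not path:
            problems[name] = "unset; the library picks its own default location"
        elif not os.access(path, os.W_OK):
            # Also False when the path does not exist at all.
            problems[name] = f"{path} is not writable"
    return problems


if __name__ == "__main__":
    for var, reason in unwritable_cache_dirs().items():
        print(f"{var}: {reason}")
```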

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:30:39 +02:00
Marcel
eb63df2000 test(ocr): add startup root canary tests for main.py lifespan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:29:47 +02:00
Marcel
53bd574660 test(ocr): replace vacuous startswith assertion with equality check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:26:58 +02:00
Marcel
581ba01d8d security(ocr): log warning on startup when running as root
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m10s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
Adds a canary log line if os.getuid() == 0. Produces an observable
signal in container logs if the USER directive is ever removed from
the Dockerfile, without requiring an external audit tool.
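This is a minimal sketch of such a canary built on the standard `logging` module. The logger name `ocr`, the function name `warn_if_root` and the message text are hypothetical. According to the test commit, the real check runs during main.py's startup lifespan.

```python
import logging
import os

logger = logging.getLogger("ocr")


def warn_if_root(log: logging.Logger = logger) -> bool:
    """Log a warning and return True when the process runs as UID 0."""
    if os.getuid() == 0:
        log.warning("OCR service is running as root; check the Dockerfile USER directive")
        return True
    return False
```

The logger is passed in as a parameter. That lets a test supply a `unittest.mock.Mock` and patch `os.getuid` with `unittest.mock.patch`, so both branches can be exercised without actually running as root.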

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:51:00 +02:00
Marcel
9db42d6cc1 fix(ocr): resolve HTRMOPO_DIR from env var, not ~ expansion
With --no-create-home, os.path.expanduser("~") resolves to "/" causing
kraken get to write to /.local/share/htrmopo. Replace with
os.environ.get("HTRMOPO_DIR", "/app/models/.htrmopo") so the path is
explicit and override-friendly without a home directory.

Adds two tests: one verifies that the env var is honored, the other that the default path contains no ~.
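The two tests can be sketched with `unittest.mock.patch.dict`, which restores `os.environ` when the block exits. This is a minimal sketch of the resolver as described in this commit. The later commit bead6f1811 switches it to an or-fallback. The function and test names are hypothetical.

```python
import os
from unittest import mock


def htrmopo_dir() -> str:
    # Explicit, override-friendly path; no dependence on a home directory.
    return os.environ.get("HTRMOPO_DIR", "/app/models/.htrmopo")


def test_env_var_overrides_default() -> None:
    with mock.patch.dict(os.environ, {"HTRMOPO_DIR": "/data/htrmopo"}):
        assert htrmopo_dir() == "/data/htrmopo"


def test_default_has_no_tilde() -> None:
    # clear=True empties os.environ for the duration of the block.
    with mock.patch.dict(os.environ, clear=True):
        path = htrmopo_dir()
    assert path == "/app/models/.htrmopo"
    assert "~" not in path
```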

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:49:21 +02:00
Marcel
ab24786d2a security(ocr): harden compose — fix cache volume path, add read_only + cap_drop
Move ocr_cache mount from /root/.cache to /app/cache (correct path for
non-root user). Add HF_HOME so Hugging Face resolves to the same path.
Add runtime hardening: read_only, tmpfs /tmp (512 MB cap), cap_drop ALL,
no-new-privileges.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:47:18 +02:00
Marcel
1aca4c4a41 security(ocr): add non-root user and set HOME/HF_HOME in Dockerfile
CIS Docker §4.1: run uvicorn as UID 1000 (ocr) instead of root.
Creates /home/ocr and /app/cache with correct ownership so named
volumes inherit ocr:ocr on first Docker mount. Sets HOME and HF_HOME
so ~ expansion and Hugging Face caching resolve under /app, not /root.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 16:46:25 +02:00