security(ocr): run OCR container as non-root user (CIS Docker §4.1) #459
Context
ocr-service/Dockerfile lacks a USER directive, so uvicorn runs as UID 0 inside the container. Any RCE in the OCR pipeline (PyTorch, Surya, Kraken, opencv, libvips, all processing user-uploaded PDFs at scale) yields root inside the container. Container-escape exploits are sporadically published, and root-in-container remains a CIS Docker §4.1 violation. The OCR service is also the only service in the stack that ingests fully untrusted user content directly into native image-processing libraries, which is exactly the threat model that motivates the non-root rule.
This issue is OCR-only; #134 (backend image) and #135 (frontend image) cover the other two services.
Approach
Add a non-root user, transfer ownership, switch context. Also harden the runtime in docker-compose.yml.
ocr-service/Dockerfile
Verify Kraken/Surya can write to their model cache. The compose file mounts ocr_models:/app/models and ocr_cache:/root/.cache; the second mount path is wrong for a non-root user. Change it to a path the non-root user can write to (or set HF_HOME=/app/cache in the env and mount there, which keeps everything under /app).
docker-compose.yml: runtime hardening
read_only: true may break Surya/Kraken if they write to a non-volume path; if so, scope the override to specific writable mounts.
Critical files
- ocr-service/Dockerfile
- docker-compose.yml: ocr-service service block, volume path
- ocr-service/entrypoint.sh: verify it works as a non-root user
Verification
- docker compose build ocr-service && docker compose up -d ocr-service
- docker exec archive-ocr id → returns uid=1000(ocr), not uid=0(root).
- docker exec archive-ocr ls -la /app/models → ocr:ocr ownership.
- docker compose ps ocr-service → Up (healthy).
- docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v $(pwd):/root docker/docker-bench-security reports 0 CIS §4.1 violations on archive-ocr.
Acceptance criteria
- read_only: true and cap_drop: [ALL] set in compose.
Effort
S — half a day including the cache-path migration and verification.
Risk if not addressed
Container compromise = root in container. Combined with Docker socket access on the host, that's a path to host compromise on a multi-tenant or shared host.
Tracked in audit doc as F-06 (Critical).
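The verification steps above (non-root UID, writable model and cache directories) could also be checked from inside the container before uvicorn starts. The following is a minimal sketch under stated assumptions, not code from the repository: the function name find_runtime_problems is hypothetical, and the paths in the usage note assume the HF_HOME=/app/cache option is chosen.

```python
import os
from typing import Callable, Iterable, List, Optional


def find_runtime_problems(
    writable_dirs: Iterable[str],
    getuid: Optional[Callable[[], int]] = None,
) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration only. `getuid` is injectable so the
    root check can be exercised without actually running as root.
    """
    uid = (getuid or os.getuid)()  # os.getuid is available on Unix only
    problems: List[str] = []

    if uid == 0:
        problems.append("process runs as UID 0 (root)")

    for path in writable_dirs:
        if not os.path.isdir(path):
            problems.append(f"{path} does not exist or is not a directory")
        elif not os.access(path, os.W_OK | os.X_OK):
            # A root-owned volume would typically fail here for a non-root UID.
            problems.append(f"{path} is not writable by UID {uid}")

    return problems
```

Called as find_runtime_problems(["/app/models", "/app/cache"]) early in startup, any returned entries could be logged so that a root-owned volume shows up as an explicit message rather than a later download failure.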
🏗️ Markus Keller — Application Architect
Observations
- The ocr_cache:/root/.cache volume mount is the critical problem: once USER ocr is set, /root/.cache becomes inaccessible. The issue correctly identifies this. However, the two fix options (change mount path vs. set HF_HOME=/app/cache) have different operational implications worth spelling out.
- ensure_blla_model.py uses os.path.expanduser("~/.local/share/htrmopo") as HTRMOPO_DIR. After switching to USER ocr (with --no-create-home), ~ expands to /, not /home/ocr. The blla download path will break silently unless HTRMOPO_DIR or HOME is explicitly set. This is an additional wrinkle the issue does not mention.
- read_only: true in the compose block will conflict with tempfile.TemporaryDirectory() in the /train and /train-sender endpoints, both of which write to the system /tmp. The issue anticipates this: "if so, scope the override to specific writable mounts."
- docs/architecture/c4/l2-containers.puml describes the OCR service but does not document its security posture. This change does not add a new container, so no diagram update is strictly required per the doc matrix, but a comment noting the non-root runtime would be a worthwhile one-liner addition.
- The backend and frontend images also lack USER directives; those are tracked separately in #134 and #135. That decomposition is correct; do not scope-creep into this issue.
Recommendations
- Prefer HF_HOME=/app/cache over changing the mount path. It is more explicit (the variable name documents the intent), keeps all writable paths under /app (a single chown covers everything), and is immune to the ~ expansion problem with --no-create-home.
- Set HOME=/app in the Dockerfile (or at minimum set HTRMOPO_DIR explicitly in ensure_blla_model.py) to prevent the silent ~ resolution failure when kraken get downloads the blla model.
- Add tmpfs: [/tmp] alongside read_only: true rather than widening the read-only restriction. The training endpoints need /tmp; the tmpfs mount gives them a writable, non-persistent, memory-backed /tmp while keeping the rest of the filesystem read-only.
👨‍💻 Felix Brandt — Fullstack Developer
Observations
- ocr-service/Dockerfile currently has no USER directive. uvicorn runs as UID 0. The fix is mechanical: add useradd, chown, USER (three lines).
- entrypoint.sh is minimal (set -euo pipefail + python3 /app/ensure_blla_model.py + exec uvicorn). It will work unchanged as a non-root user provided all the paths it touches (/app/ensure_blla_model.py, /app/models/) are owned by the ocr user. The chmod +x /app/entrypoint.sh must happen before the USER directive; it already does in the proposed snippet, which is correct.
- ensure_blla_model.py has a latent bug under --no-create-home: HTRMOPO_DIR = os.path.expanduser("~/.local/share/htrmopo"). With no home directory, ~ becomes / and the path resolves to /.local/share/htrmopo. The kraken get download will either fail or write to the filesystem root. This must be fixed as part of this issue, not deferred.
- The training endpoints use tempfile.TemporaryDirectory(). tempfile defaults to the TMPDIR env var, then /tmp. With read_only: true and no tmpfs, training will fail with an OSError because no writable temporary directory exists. The tmpfs: [/tmp] entry in compose is required for training to continue working.
- main.py is clean Python: type hints on public functions, _ prefix on private helpers, Pydantic models, asyncio.to_thread() for CPU-bound work. No style issues introduced by this change.
- The test_training_auth.py suite has good coverage of token auth for /train, /train-sender, and /segtrain. No test currently verifies that the process runs as UID 1000; this is an infrastructure-layer test, not a Python unit test, and is correctly left to the verification steps in the issue rather than added to the test suite.
Recommendations
- Fix ensure_blla_model.py in the same PR. Add HTRMOPO_DIR = os.path.expanduser(os.environ.get("HTRMOPO_DIR", "~/.local/share/htrmopo")) and set HOME=/home/ocr in the Dockerfile (creating the directory with mkdir -p /home/ocr && chown ocr:ocr /home/ocr), or use HF_HOME=/app/cache and set HTRMOPO_DIR=/app/models/.htrmopo explicitly. The key point: don't leave it to ~ expansion with a no-home user.
- Order the Dockerfile steps as useradd → chown -R ocr:ocr /app → chmod +x /app/entrypoint.sh → USER ocr. Running chmod after chown as root is fine; running it after USER ocr would require the file to already be owned by ocr.
- The chown -R ocr:ocr /app layer will be large because it touches the PyTorch wheels and all installed packages. Accept this: it runs once per build and the security benefit justifies the layer cost. Do not try to merge it into an earlier layer; it must come after the COPY . . step.
- ocr_models volume ownership on first run. Docker named volumes are owned by root by default. The chown in the Dockerfile only affects image layers, not the mounted volume. The service will fail to write models on first startup unless either: (a) the entrypoint script runs a one-time chown as root before dropping privileges (not possible with USER ocr), or (b) the volume is initialized with correct ownership. The cleanest solution is to mkdir -p /app/models && chown ocr:ocr /app/models in the Dockerfile; Docker will copy that ownership to a new named volume on first mount.
🚀 Tobias Wendt — DevOps & Platform Engineer
Observations
- The ocr_cache:/root/.cache mount is broken for any process that drops root, but the service currently runs as root, which means it works now. After adding USER ocr, this mount will be readable but not writable (volume owned by root, no write permission for UID 1000). Every Hugging Face model download attempt will silently fail or raise a PermissionError. This is the highest-risk item in the change.
- The ocr-service block in docker-compose.yml has no read_only, cap_drop, or security_opt, consistent with the other services (backend, frontend), which also lack these. This is a dev Compose file, so the absence is expected, but the issue is right to add them here since the OCR service is uniquely exposed to untrusted content.
- minio:latest is still present. This is pre-existing and out of scope for this issue, but worth a note. Tracked separately.
- USER directive). The backend image uses a multi-stage build with eclipse-temurin:21.0.10_7-jre-noble; the JRE base image version is pinned, which is good. The ocr-service base image python:3.11.9-slim is also pinned, which is good.
- mem_limit: 12g and memswap_limit: 12g are set on the OCR service. These survive this change: the memory limits apply to the container regardless of the UID running inside.
- The healthcheck on ocr-service uses curl -f http://localhost:8000/health. After switching to non-root, curl must remain in the image. It is currently installed via the apt-get install curl layer in the Dockerfile. No change required there.
- docker-bench-security is referenced in the verification steps. This is a reasonable one-time check; it does not need to be added to CI for this project at current scale.
Recommendations
- Use HF_HOME=/app/cache as the cache redirect strategy and update the compose volume to ocr_cache:/app/cache. This avoids any path involving /root and keeps all service-writable paths under /app. Add HF_HOME to the environment: block in compose and to the ENV directives in the Dockerfile so both container and build-time pip installations agree on the path.
- Set read_only: true, tmpfs: [/tmp], cap_drop: [ALL], security_opt: [no-new-privileges:true]. The tmpfs entry is non-negotiable: /train and /train-sender both use tempfile.TemporaryDirectory(), which writes to /tmp.
- Existing ocr_cache volumes on dev machines will have root ownership after this change. Document this in the PR description with the one-liner workaround: docker volume rm familienarchiv_ocr_cache (or equivalent) before docker compose up. Users will see a permission error on first startup otherwise.
- Do not add read_only: true to the backend or frontend compose services in this PR. That is out of scope. This PR is OCR-only per the issue title. #134 and #135 cover the other services.
Open Decisions
- tmpfs size limit. The default tmpfs in Docker Compose has no size limit (it uses available memory). Training uploads can be large ZIP files (multi-MB). On a CX32 VPS with 8GB RAM and 12GB already reserved for OCR models, an uncapped tmpfs could cause OOM under concurrent training. Consider a tmpfs entry of /tmp:size=512m or similar. This needs a human judgment call on maximum expected training ZIP size.
📋 Elicit — Requirements Engineer
Observations
- /train, /train-sender, and /segtrain (present in test_training_auth.py). The AC should reference all three, or be written as "all training endpoints still work."
- docker compose stop ocr-service && docker compose up -d ocr-service and confirm healthcheck passes).
Recommendations
- /train, /train-sender, and /segtrain. This prevents inadvertently shipping a broken training endpoint that was not in the tester's mental model.
- docker exec archive-ocr ls -la /app/models/ shows models present and no re-download occurs in the logs.
🔐 Nora "NullX" Steiner — Security Engineer
Observations
- _check_training_token) and the tests in test_training_auth.py verify this correctly, including fail-closed behavior when TRAINING_TOKEN is empty. The non-root change does not affect this control.
- subprocess.run(["ketos", ...]) uses list form with no shell=True. This is correct and remains safe under non-root.
- _validate_url() implements SSRF protection via ALLOWED_PDF_HOSTS. This control is unaffected by the UID change.
- _validate_zip_entry() implements ZIP Slip protection. This control is unaffected.
- cap_drop: [ALL] is the highest-value hardening in the compose block. Dropping all Linux capabilities prevents privilege escalation attacks even from within the container. A container-escape exploit that requires CAP_SYS_ADMIN (common) is blocked by this. Combined with no-new-privileges:true, even a setuid binary in the image cannot escalate.
- read_only: true significantly reduces post-exploitation persistence options. An attacker who achieves RCE cannot write a backdoor to the filesystem; any written files disappear on container restart. The tmpfs /tmp is memory-only and also non-persistent.
- The ocr_cache volume changing from /root/.cache to /app/cache (or similar) should be done atomically: if the volume still mounts at /root/.cache after the UID switch, Hugging Face model downloads silently use a root-owned directory. The process has read access (world-readable cache files) but not write access, which means model downloads that succeed for root will silently fail for the ocr user. This is a security-adjacent reliability issue: a confused process that partially reads a root-owned cache could load stale or corrupted model state.
- The ensure_blla_model.py path issue (HTRMOPO_DIR = os.path.expanduser("~/.local/share/htrmopo") expanding to /.local/... under --no-create-home) is not a security vulnerability, but it is a startup reliability issue. If the blla model cannot be validated, the container starts in a degraded state that could mask other errors.
Recommendations
- Apply cap_drop: [ALL] first, then confirm OCR and training work. Some capability drops are surprising: CAP_SETUID and CAP_SETGID are already absent for non-root, but CAP_CHOWN may be needed during container init if any setup script calls chown. Confirm by testing, not by assumption.
- id output is not uid=0(root), or more practically, add a startup check to main.py: if os.getuid() == 0: logger.warning("Running as root — CIS Docker §4.1 violation"). This produces an observable signal in logs during verification and acts as a canary if the USER directive is accidentally removed in a future Dockerfile edit.
- Do not add CAP_NET_RAW back. The OCR service makes outbound HTTP requests (to MinIO) but uses httpx over standard TCP, so no raw socket operations are needed.
- The TRAINING_TOKEN empty-string check (_check_training_token) returns 503 when the token is not configured. This is fail-closed and correct. Verify that this behavior is preserved after the UID change (it will be, since it is application logic, not filesystem permission logic).
🧪 Sara Holt — QA Engineer
Observations
- test_training_auth.py (token auth), test_stream.py (streaming), test_confidence.py, test_engines.py, test_preprocessing.py. All use AsyncClient + ASGITransport for in-process API testing, which is the correct pattern.
- docker exec archive-ocr id) is the right approach.
- ensure_blla_model.py tries to write to a volume directory owned by root. This failure mode is silent in the test suite (tests mock the engine, not the filesystem). The issue's verification step 4 (healthcheck green) catches this, but only if the healthcheck's start_period: 120s is long enough to observe a permission-error crash and retry.
- test_training_auth.py tests the /train, /train-sender, and /segtrain endpoints for authentication. After this change, these same tests must continue to pass; they will, since they mock filesystem operations. A manual E2E training run (verification step 5) is still required.
- ensure_blla_model.py path when running as non-root. test_ensure_blla_model.py exists but mocks subprocess.run; it does not validate the ~ expansion or HTRMOPO_DIR resolution under a non-root user. This gap is acceptable to leave as a manual verification step for this issue, but it should be noted.
- retries: 12 with interval: 10s gives 120s of retry time. Given that model loading takes 30–50s on cold start, this is appropriate. After the UID change, a permission error on the model volume would cause uvicorn to start but _models_ready to remain False, returning 503 on /health. The healthcheck would eventually mark the container unhealthy. This is correct behavior: fail loudly.
Recommendations
- Add a test to test_ensure_blla_model.py that asserts HTRMOPO_DIR resolution is not root-dependent: mock os.path.expanduser to return a temp directory and verify that _download_blla resolves the path correctly. This guards against the --no-create-home regression being silently reintroduced.
- docker compose logs ocr-service | grep "blla model OK" appears and no PermissionError lines appear. This catches the volume ownership issue that unit tests cannot.
- ocr-service/testdata/ or similar, and document the exact command to trigger it in the verification steps. This makes the verification reproducible by anyone, not just the implementer.
- os.getuid() in a test is environment-specific and will break CI runs that do not honor the USER directive. Keep UID verification in the operational checklist, not the automated suite.
🎨 Leonie Voss — UX Design Lead
Observations
This is a pure infrastructure security change with no frontend UI impact. No Svelte components, no routes, no styling, and no user-facing behavior are affected. The OCR service runs as an internal microservice on the Docker network — users interact with it only indirectly through the Java backend's OCR trigger and streaming progress UI.
One indirect UX concern worth flagging: if the non-root change causes a startup permission error, the OCR service will fail its healthcheck, and the backend's depends_on: ocr-service: condition: service_started (note: service_started, not service_healthy) means the backend starts regardless. Users who trigger OCR while the OCR service is recovering from a permission error will see a failure state in the OCR progress UI. This is not a new failure mode (it exists today), but the non-root change introduces a new class of startup failure that did not exist before. The UI already handles this (503 from the OCR service surfaces as an error state in the progress indicator); no UI changes are needed.
Recommendations
📬 Decision Queue
One open decision was raised across the reviews — needs a human call before implementation:
Theme: tmpfs size cap for /tmp
Raised by: Tobias Wendt (@tobiwendt)
The tmpfs: [/tmp] mount required for training is uncapped by default. Docker will allow it to consume all available memory. On the CX32 VPS (8GB RAM, with 12GB already allocated to the OCR container via mem_limit), an uncapped /tmp under concurrent training could cause OOM and crash the container.
The tradeoff:
- Cap (e.g., size=512m): limits blast radius, but requires knowing the maximum expected training ZIP size. If the cap is too small, /train will fail mid-run with a no-space error.
What's needed: a human decision on the maximum expected training ZIP file size. Training ZIPs contain .png scan images and .gt.txt ground-truth pairs. A typical batch might be 20–50 images at 1–5MB each, so 100MB–250MB is a plausible upper bound. A 512MB cap with a 10–20% safety margin would cover typical usage.
Suggested resolution: cap at 512m with a comment explaining the rationale, and log a warning in _run_training() if the extracted ZIP contents exceed a soft threshold (e.g., 400MB). Revisit if training datasets grow larger.
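The soft-threshold warning in the suggested resolution could look roughly like the sketch below. It rests on assumptions: the function names and the logger name are hypothetical, the internals of _run_training() are not shown in this thread, and the size is read from the ZIP's own metadata, which a crafted archive can misreport.

```python
import logging
import zipfile

logger = logging.getLogger("ocr.training")  # assumed logger name

SOFT_LIMIT_BYTES = 400 * 1024 * 1024  # 400MB soft threshold from the suggestion


def declared_extracted_size(zip_path: str) -> int:
    """Sum the uncompressed sizes recorded in the ZIP's central directory."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def warn_if_extraction_is_large(
    zip_path: str, soft_limit: int = SOFT_LIMIT_BYTES
) -> bool:
    """Log a warning and return True when the declared size exceeds soft_limit."""
    total = declared_extracted_size(zip_path)
    if total > soft_limit:
        logger.warning(
            "Training ZIP expands to %d bytes, above the soft limit of %d bytes",
            total,
            soft_limit,
        )
        return True
    return False
```

Checking before extraction gives an early signal in the logs; the tmpfs size cap would remain the hard limit.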