# ADR-019 — Container hardening baseline: non-root user + read-only filesystem

**Status:** Accepted
**Date:** 2026-05-17
**PR:** #611

---

## Context

The OCR service ran as `root` inside its container by default. This violated CIS Docker Benchmark §4.1 and CIS §4.6, and meant that any exploit in the OCR pipeline (untrusted PDF content, model deserialization, ZIP handling) could write to or execute anything inside the container without restriction.

The following risks were present before this baseline:

- A path traversal in the ZIP-based training endpoint could overwrite arbitrary paths on the container filesystem (including Python source files and model files).
- A compromised dependency running at startup could persist itself to the container's writable layer or the model volumes.
- Misconfigured model downloads could overwrite `/etc/passwd` or similar files via path traversal, which is possible because root can write everywhere.

---

## Decision

All containers in this project that have no operational need for elevated privileges **must** apply the following hardening baseline:

### 1. Non-root user

Create a dedicated user with a fixed UID and no login shell:

```dockerfile
RUN useradd --no-create-home --shell /usr/sbin/nologin --uid 1000
```

Set `HOME` explicitly to a path owned by this user. Do not rely on `~` expansion for any path resolution in application code.

### 2. Read-only container filesystem

```yaml
read_only: true
```

All paths the application writes to at runtime must be explicitly declared as either a named volume or a `tmpfs` mount. This turns any unexpected write attempt into an immediate, visible error (`PermissionError`, or an `OSError` with `EROFS` on the read-only root filesystem) rather than a silent success.

### 3. Per-path write carve-outs

Declare only the paths that are actually written at runtime:

```yaml
volumes:
  - _models:/app/models   # persistent model storage
  - _cache:/app/cache     # HuggingFace / ketos download cache
tmpfs:
  - /tmp:size=512m        # transient scratch space (ZIP extraction etc.)
```

Do not mount the home directory as a volume unless necessary. Instead, use the `XDG_CACHE_HOME` and `TORCH_HOME` env vars to redirect library cache writes to the declared writable paths.

### 4. Dropped capabilities and privilege escalation prevention

```yaml
cap_drop: [ALL]
security_opt:
  - no-new-privileges:true
```

A Python/FastAPI service listening on port 8000+ requires no Linux capabilities, because binding to an unprivileged port does not need `CAP_NET_BIND_SERVICE`. Dropping all capabilities and setting `no-new-privileges` prevents the process from regaining any capability, even if a dependency ships a setuid binary.

### 5. Startup root canary

Log a warning during startup if the process is running as root. This catches misconfiguration (e.g., the `USER` directive accidentally removed in a future Dockerfile edit) before it becomes a silent vulnerability:

```python
import os

if os.getuid() == 0:
    logger.warning("Running as root — CIS Docker §4.1 violation")
```

---

## Consequences

**Positive:**

- Any exploit that achieves code execution inside the container is confined: it cannot write outside the declared volumes and tmpfs mounts, cannot acquire new capabilities, and cannot persist to the container's root filesystem.
- A `PermissionError` (or read-only-filesystem `OSError`) on startup is an explicit, diagnosable failure rather than a silent privilege misuse.
- The startup canary catches accidental regressions in the non-root setup.

**Negative / operational cost:**

- Every new feature that writes to a new path (e.g., a new model cache directory, a new scratch path) must add a volume or tmpfs mount. The `read_only: true` flag makes this a hard constraint, not a suggestion.
- Library dependencies that write to `HOME` without respecting `XDG_CACHE_HOME` must be identified and redirected explicitly (see `TORCH_HOME`, `XDG_CACHE_HOME`, `HF_HOME` in `docker-compose.yml`).
- Existing named volumes written by root (pre-baseline) must be dropped and recreated before upgrading. See [DEPLOYMENT.md §8](../DEPLOYMENT.md#8-upgrade-notes).

---

## Applicability

This baseline applies to the OCR service (PR #611).
It should be applied to any new container added to the project unless there is a documented, specific operational reason a capability or writable filesystem is required.
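
When applying the baseline to a new container, the root canary (§5) and the write carve-outs (§3) can be verified together at startup. The following is a minimal sketch, not the project's actual implementation: the function names `is_writable` and `check_hardening`, the logger name, and the explicit `uid` parameter (added only to make the check testable) are all assumptions introduced for illustration.

```python
from __future__ import annotations

import logging
import os
import tempfile
from pathlib import Path

logger = logging.getLogger("hardening")  # hypothetical logger name


def is_writable(directory: Path) -> bool:
    """Return True if a file can actually be created inside `directory`."""
    try:
        # Creating a real temp file is more reliable than os.access(),
        # which does not account for read-only mounts in every case.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers PermissionError (EACCES/EPERM), EROFS, and missing paths.
        return False


def check_hardening(
    writable: list[Path],
    read_only: list[Path],
    uid: int | None = None,
) -> list[str]:
    """Collect baseline violations; an empty list means the checks passed."""
    uid = os.getuid() if uid is None else uid
    problems: list[str] = []
    if uid == 0:
        problems.append("running as root (UID 0)")
    for path in writable:
        if not is_writable(path):
            problems.append(f"expected writable path is not writable: {path}")
    for path in read_only:
        if is_writable(path):
            problems.append(f"expected read-only path is writable: {path}")
    return problems


def log_hardening_problems(problems: list[str]) -> None:
    """Emit one warning per violation, mirroring the startup canary."""
    for problem in problems:
        logger.warning("Hardening baseline violation: %s", problem)
```

At startup, a service could call `check_hardening([Path("/app/models"), Path("/app/cache"), Path("/tmp")], [Path("/app")])` and pass the result to `log_hardening_problems`. Treating `/app` as the read-only application root is an assumption based on the mount paths in §3. Returning a list instead of raising keeps the behavior aligned with the warning-only canary. A stricter deployment could choose to abort when the list is non-empty.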