familienarchiv/docs/adr/019-container-hardening-baseline.md
Marcel 74ca5ee35f
docs(adr): ADR-019 — container hardening baseline (non-root + read-only)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:33:06 +02:00

ADR-019 — Container hardening baseline: non-root user + read-only filesystem

Status: Accepted
Date: 2026-05-17
PR: #611


Context

The OCR service ran as root inside its container by default. This violated CIS Docker Benchmark §4.1 and CIS §4.6, and meant that any exploit in the OCR pipeline (untrusted PDF content, model deserialization, ZIP handling) could write to or execute anything inside the container without restriction.

The following risks were present before this baseline:

  • A path-traversal in the ZIP-based training endpoint could overwrite arbitrary paths on the container filesystem (including Python source files and model files).
  • A compromised dependency running at startup could persist itself to the image layers or model volumes.
  • Misconfigured model downloads could overwrite /etc/passwd or similar via path-traversal — possible because root can write everywhere.
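
The first and third risks share one mechanism: an attacker-influenced relative name is joined onto a base directory and resolves outside it. The following is a minimal sketch of that resolution, not code from the OCR service; the helper name `escapes_base` and the `/tmp/extract` directory are hypothetical.

```python
import os


def escapes_base(base_dir: str, member_name: str) -> bool:
    """Return True if joining member_name onto base_dir lands outside base_dir."""
    base = os.path.realpath(base_dir)
    # An absolute member_name replaces base entirely in os.path.join.
    target = os.path.realpath(os.path.join(base, member_name))
    # If the shared prefix is not base itself, the target has escaped.
    return os.path.commonpath([base, target]) != base


if __name__ == "__main__":
    for name in ("pages/0001.png", "../../etc/passwd", "/etc/passwd"):
        print(name, escapes_base("/tmp/extract", name))
```

As root on a writable filesystem, a write to an escaped target simply succeeds. That is the failure mode this baseline is meant to turn into an error.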

Decision

All containers in this project that have no operational need for elevated privileges must apply the following hardening baseline:

1. Non-root user

Create a dedicated user with a fixed UID and no login shell, then switch to it with USER once all build steps that need root have run:

RUN useradd --no-create-home --shell /usr/sbin/nologin --uid 1000 <service>
USER <service>

Set HOME explicitly to a path owned by this user. Do not rely on ~ expansion for any path resolution in application code.
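
The reason `~` is fragile here: Python's `os.path.expanduser` uses HOME when it is set and otherwise falls back to the passwd entry, which for a user created with `--no-create-home` names a directory that was never created. One way to avoid `~` is to resolve every data directory from an explicit variable and reject anything that is not absolute. This is a sketch under assumptions: the helper name `resolve_data_dir` and the variable name `OCR_CACHE_DIR` are hypothetical, not taken from the service.

```python
from pathlib import Path
from typing import Mapping


def resolve_data_dir(env: Mapping[str, str], var: str, default: str) -> Path:
    """Resolve a directory from an explicit env var; reject '~' and relative paths."""
    raw = env.get(var, default)
    path = Path(raw)
    if raw.startswith("~") or not path.is_absolute():
        raise ValueError(f"{var} must be an absolute path, got {raw!r}")
    return path
```

Called as `resolve_data_dir(os.environ, "OCR_CACHE_DIR", "/app/cache")`, this fails loudly at startup if someone configures `~/cache`, instead of resolving to a home directory that may not exist.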

2. Read-only container filesystem

read_only: true

All paths the application writes to at runtime must be explicitly declared as either a named volume or a tmpfs mount. Any write to an undeclared path then fails immediately and visibly (in Python, an OSError: PermissionError for a path the user does not own, or errno EROFS on the read-only root filesystem) rather than silently succeeding.
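
A startup probe can confirm that each declared path is in fact writable before the first real request depends on it. The sketch below is an assumption about how such a probe could look, not the service's code; `probe_writable` is a hypothetical name. It reports the errno name, so a read-only filesystem (`EROFS`), an ownership problem (`EACCES`) and a missing mount (`ENOENT`) can be told apart.

```python
import errno
import tempfile
from typing import Dict, Iterable, Optional


def probe_writable(paths: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try to create a scratch file in each directory.

    Maps each path to None when the write succeeded, or to the errno name
    (for example 'EROFS', 'EACCES', 'ENOENT') when it failed.
    """
    results: Dict[str, Optional[str]] = {}
    for path in paths:
        try:
            # The scratch file is removed when the context manager exits.
            with tempfile.NamedTemporaryFile(dir=path):
                pass
            results[path] = None
        except OSError as exc:
            results[path] = errno.errorcode.get(exc.errno, str(exc.errno))
    return results
```

With the mounts used in this ADR, the call would be `probe_writable(["/app/models", "/app/cache", "/tmp"])`, and any non-None value is a reason to refuse to start.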

3. Per-path write carve-outs

Declare only the paths that are actually written at runtime:

volumes:
  - <service>_models:/app/models   # persistent model storage
  - <service>_cache:/app/cache     # HuggingFace / ketos download cache
tmpfs:
  - /tmp:size=512m                 # transient scratch space (ZIP extraction etc.)

Do not mount the home directory as a volume unless necessary — use XDG_CACHE_HOME and TORCH_HOME env vars to redirect library cache writes to the declared writable paths instead.
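
Whether the redirect variables really point into a writable path can be checked mechanically. This sketch assumes the variables named in this ADR (XDG_CACHE_HOME, TORCH_HOME, HF_HOME) are the relevant ones; a real service may need more, and the function name `misdirected_cache_vars` is hypothetical.

```python
import os
from typing import List, Mapping, Sequence

# Variables named in this ADR; extend as further libraries are identified.
CACHE_VARS = ("XDG_CACHE_HOME", "TORCH_HOME", "HF_HOME")


def misdirected_cache_vars(
    env: Mapping[str, str], writable_roots: Sequence[str]
) -> List[str]:
    """Return cache variables that are unset, relative, or outside writable_roots.

    writable_roots must be absolute paths (the declared volumes and tmpfs mounts).
    """
    roots = [os.path.normpath(root) for root in writable_roots]
    bad = []
    for var in CACHE_VARS:
        value = env.get(var)
        if not value or not os.path.isabs(value):
            bad.append(var)
            continue
        target = os.path.normpath(value)
        # commonpath compares whole components, so /app/cache2 is not under /app/cache.
        if not any(os.path.commonpath([root, target]) == root for root in roots):
            bad.append(var)
    return bad
```

Run against `os.environ` with `["/app/models", "/app/cache", "/tmp"]`, an empty result means every listed cache variable lands on a declared writable path.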

4. Dropped capabilities and privilege escalation prevention

cap_drop: [ALL]
security_opt:
  - no-new-privileges:true

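A Python/FastAPI service listening on port 8000 or above requires no Linux capabilities; binding to a port below 1024 is the usual reason a web service needs one (CAP_NET_BIND_SERVICE). Dropping all capabilities and setting no-new-privileges means that executing a setuid binary, even one shipped by a dependency, cannot regain privileges.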
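
From inside the container, the effect of both settings is visible in `/proc/self/status`, which on Linux exposes the capability sets as hexadecimal bitmasks (`CapEff`, `CapBnd`) and the no-new-privileges flag (`NoNewPrivs`). The following sketch parses that file; the function names are hypothetical and this check is an illustration, not part of the baseline itself.

```python
from typing import Dict, List


def parse_proc_status(text: str) -> Dict[str, str]:
    """Parse the 'Key: value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def privilege_findings(status_text: str) -> List[str]:
    """List deviations from 'no capabilities held or obtainable, no_new_privs set'."""
    fields = parse_proc_status(status_text)
    findings = []
    # CapEff: capabilities in effect now. CapBnd: upper bound on what can be gained.
    for name in ("CapEff", "CapBnd"):
        raw = fields.get(name)
        if raw is None or int(raw, 16) != 0:
            findings.append(f"{name} is not empty: {raw}")
    if fields.get("NoNewPrivs") != "1":
        findings.append("NoNewPrivs is not set")
    return findings
```

A caller would pass `open("/proc/self/status").read()`; an empty list corresponds to `cap_drop: [ALL]` plus `no-new-privileges:true` being in effect.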

5. Startup root canary

Log a warning during startup if the process is running as root. This catches misconfiguration (e.g., USER directive accidentally removed in a future Dockerfile edit) before it becomes a silent vulnerability:

import os  # logger below is the service's module-level logging.Logger

if os.getuid() == 0:
    logger.warning("Running as root — CIS Docker §4.1 violation")

Consequences

Positive:

  • Any exploit that achieves code execution inside the container is confined: it cannot write outside the declared volumes and tmpfs mounts, cannot acquire new capabilities, and cannot persist to the image filesystem.
  • A write to an undeclared path fails with an explicit, diagnosable error (typically at startup) instead of silently succeeding.
  • The startup canary catches accidental regressions in the non-root setup.

Negative / operational cost:

  • Every new feature that writes to a new path (e.g., a new model cache directory, a new scratch path) must add a volume or tmpfs mount. The read_only: true flag makes this a hard constraint, not a suggestion.
  • Library dependencies that write to HOME without respecting XDG_CACHE_HOME must be identified and redirected explicitly (see TORCH_HOME, XDG_CACHE_HOME, HF_HOME in docker-compose.yml).
  • Existing named volumes written by root (pre-baseline) must be dropped and recreated before upgrading. See DEPLOYMENT.md §8.
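
The last point can be verified before an upgrade by scanning a mounted volume for files the new user does not own. This is a sketch with a hypothetical helper name, `files_not_owned_by`; it only reports, and leaves the drop-and-recreate step to the documented procedure.

```python
import os
from typing import List


def files_not_owned_by(root: str, expected_uid: int) -> List[str]:
    """Walk root and return directories and files whose owner UID differs."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            # lstat inspects a symlink itself rather than its target.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched
```

For the baseline above, `files_not_owned_by("/app/models", 1000)` returning a non-empty list indicates a volume that was populated before the switch to the non-root user.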

Applicability

This baseline applies to the OCR service (PR #611). It should be applied to any new container added to the project unless there is a documented, specific operational reason a capability or writable filesystem is required.
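
For new containers, the Compose-level parts of the baseline can be checked against a service definition that has already been parsed into a dict (for example by a YAML loader). The sketch below is an assumption about how such a check might look; `baseline_gaps` is a hypothetical name, and it only recognises the exact spellings used in this ADR. The non-root user is set in the Dockerfile and is therefore not covered here.

```python
from typing import Any, List, Mapping


def baseline_gaps(service: Mapping[str, Any]) -> List[str]:
    """Return the baseline settings missing from one parsed Compose service."""
    gaps = []
    if service.get("read_only") is not True:
        gaps.append("read_only: true")
    if "ALL" not in (service.get("cap_drop") or []):
        gaps.append("cap_drop: [ALL]")
    if "no-new-privileges:true" not in (service.get("security_opt") or []):
        gaps.append("security_opt: no-new-privileges:true")
    return gaps
```

An empty list means the service declares all three settings; a documented exception would be recorded next to whichever entries remain.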