bug(ocr): /tmp tmpfs too small for Surya model download — guided OCR fails with ENOSPC on staging #614
Context
After the OCR security hardening landed (commits 1aca4c4a, ab24786d, fc8b4b16, f1e0b92f), the OCR container in staging (archiv-staging-ocr-service-1) cannot complete a guided OCR request.

Two distinct failures have surfaced. The first, PermissionError: '/app/cache/datalab', was fixed today by chowning the pre-existing archiv-staging_ocr-cache and archiv-staging_ocr-models Docker volumes from 0:0 to 1000:1000. The volumes pre-dated the non-root ocr user introduced in 1aca4c4a.

This issue tracks the second failure, which only became visible after the chown unblocked the first.
Symptom
The first guided OCR call after a clean restart makes Surya download text_recognition/2025_09_23/model.safetensors (1.34 GB). The download fails at ~510 MB. Surya retries it 3×, and each attempt fails with ENOSPC.

The host (raddatz.cloud, /dev/md2) has 1.8 TB free, and both /app/cache and /app/models live on that disk. The space being exhausted is somewhere else.

Root cause
surya/common/s3.py:95: download_directory stages every file in a tempfile.TemporaryDirectory() and then shutil.moves it to the cache. tempfile.TemporaryDirectory() honours $TMPDIR (then $TEMP and $TMP) and otherwise falls back to /tmp. The OCR container declares /tmp as a 512 MB tmpfs (tmpfs: /tmp:size=512m, commented as space for training endpoints that write ZIPs to /tmp), and TMPDIR is unset. The staging path therefore resolves to the 512 MB /tmp tmpfs, and the 1.34 GB safetensors file exhausts it at ~510 MB. The comment on that tmpfs line was written before Surya was introduced, and nobody re-evaluated the /tmp sizing for model downloads.
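The sketch below checks this root cause without involving Surya. It asks a fresh interpreter what tempfile.gettempdir() returns for a given TMPDIR. A fresh interpreter is used so that tempfile's per-process cache plays no part. The helper name resolved_tempdir is hypothetical.

The sketch also surfaces a CPython detail that matters for Approach A. If TMPDIR names a directory that does not exist or is not writable, tempfile does not raise. It silently moves on to the next candidate, typically /tmp. The staging directory therefore has to exist before the first download, or the ENOSPC simply comes back.

```python
import os
import subprocess
import sys
import tempfile
from typing import Optional


def resolved_tempdir(tmpdir: Optional[str]) -> str:
    """Hypothetical helper: the directory tempfile picks in a fresh interpreter.

    `tmpdir=None` means TMPDIR is unset.
    """
    env = dict(os.environ)
    # Clear every variable tempfile consults, so only TMPDIR is in play.
    for var in ("TMPDIR", "TEMP", "TMP"):
        env.pop(var, None)
    if tmpdir is not None:
        env["TMPDIR"] = tmpdir
    result = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    existing = tempfile.mkdtemp()
    missing = os.path.join(existing, "does-not-exist")
    print(resolved_tempdir(None))      # platform default, e.g. /tmp
    print(resolved_tempdir(existing))  # the existing directory is used
    print(resolved_tempdir(missing))   # silently falls back to the default
```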
Approaches under consideration
Pick exactly one. Trade-offs are real — please choose deliberately.
Approach A — Redirect TMPDIR to the disk-backed cache volume (recommended)

Add a new TMPDIR=/app/cache/.tmp env var to the ocr-service block, and ensure the directory exists with ocr:ocr ownership. The /tmp tmpfs stays at 512 MB and continues to absorb training ZIPs, which are small and benefit from being in RAM.

Pros
- shutil.move becomes a same-filesystem rename(2): atomic and near-free.
- Keeps the small, RAM-friendly work in /tmp (the original intent of the 512 MB sizing).

Cons
- Requires the .tmp subdirectory to exist with correct ownership before first use. Two ways to guarantee that:
  - Add a mkdir -p "$TMPDIR" && chown step to entrypoint.sh. However, the container runs as ocr with read_only: true. mkdir works on the writable volume mount, but chown would fail for anything ocr does not already own, because a non-root user without CAP_CHOWN cannot take ownership. Safer: just mkdir -p "$TMPDIR". The parent volume is already owned by ocr after today's chown, so the mkdir succeeds and the new directory is owned by ocr.
  - Bake it into the Dockerfile (RUN mkdir -p /app/cache/.tmp && chown ocr:ocr /app/cache/.tmp). However, the volume shadows /app/cache at runtime, so this only works on a fresh volume. Existing volumes need the entrypoint variant.
- If the container is killed (docker kill) mid-download, the TemporaryDirectory context manager never gets to clean up, and orphaned files are left in .tmp. Mitigation: add find "$TMPDIR" -mtime +1 -delete to the entrypoint, or accept it (the bytes are small compared to the 1.8 TB available).

Approach B — Enlarge /tmp tmpfs to 4 GB

One-line change to the ocr-service tmpfs entry: /tmp:size=512m becomes /tmp:size=4g.
Pros
Cons
- mem_limit: 12g is set on the cgroup, but tmpfs accounting depends on host config. In the worst case, a 1.34 GB download sits in host RAM. shutil.move to /app/cache then copies it (a cross-filesystem move is read + write + delete), briefly consuming ~2.7 GB of RAM in total.
- Scratch work in /tmp and model download staging share one budget. A future model 2× larger requires bumping it again.

Approach C — Apply both TMPDIR=/app/cache/.tmp and /tmp:size=1g

Belt-and-suspenders: any future code that ignores $TMPDIR and writes directly to /tmp still has 1 GB of headroom for typical work (training ZIPs, transient PDF buffers).

Pros
Cons
Acceptance criteria
1. Guided OCR works on staging: the Surya model lands in /app/cache/datalab/models/text_recognition/2025_09_23/ and inference runs.
2. Training (/train-sender, see main.py) still works: ZIPs of 20-50 page batches succeed.
3. The container keeps read_only: true and cap_drop: [ALL].
4. The container still runs as the non-root ocr user (uid 1000).
5. docker-compose.yml in the repo and the deployed compose at /opt/familienarchiv/docker-compose.yml on the server are kept in sync. CI/CD redeploy is the canonical sync path; there are no manual edits to the server file.
6. On a fresh ocr_cache volume (no pre-chown), the chosen approach still works. This is relevant for new environments and for the eventual repaving of the staging volume.

Test plan
1. Run docker compose up -d ocr-service, then trigger a guided OCR call against a document with no cached Surya model. Confirm that the download completes (~10 s on a home connection) and that OCR returns text.
2. Call /train-sender with a sample ZIP and confirm that the training endpoint still works.
3. On raddatz.cloud: run docker volume rm archiv-staging_ocr-cache (forces the fresh-volume path) and docker compose ... up -d ocr-service, then repeat #1 against staging.
4. Check /app/cache/.tmp ownership inside the running container: docker exec ... ls -ldn /app/cache/.tmp should show uid and gid 1000. ls -n prints numeric IDs, so ocr:ocr appears as 1000 1000.

Files to touch
- docker-compose.yml: ocr-service.environment (Approach A or C) and/or ocr-service.tmpfs (Approach B or C). Update the inline comment on the tmpfs line to reflect the new sizing rationale.
- ocr-service/entrypoint.sh (Approach A or C): prepend mkdir -p "${TMPDIR:-/tmp}" so that a fresh ocr_cache volume gets the directory on first start. Idempotent and safe on already-populated volumes.
- ocr-service/CLAUDE.md and/or README.md: a one-line note about the TMPDIR convention (Approach A or C only) so future contributors don't undo it.

Not touched:
- Dockerfile: /app/cache/.tmp cannot be baked in because the volume mount shadows it.
- docker-compose.prod.yml / staging overlays: the env var is already inherited from the base compose.

Out of scope (separate tickets)
- german_kurrent.mlmodel is missing from archiv-staging_ocr-models. This is a pre-existing gap. Surya works without it, but Kurrent-specific OCR is disabled until the model is seeded. File a separate issue.
- The manual chown -R 1000:1000 of archiv-staging_ocr-cache and archiv-staging_ocr-models worked but is not reproducible from version control. A separate hardening ticket should make volume ownership automatic, e.g. via a one-shot init container or a volume-bootstrap step in CI/CD.
- An ADR or commit comment for the TMPDIR convention if Approach A/C is chosen: author's call, not required for merge.

Related work / history
- 1aca4c4a security(ocr): add non-root user and set HOME/HF_HOME in Dockerfile
- ab24786d security(ocr): harden compose — fix cache volume path, add read_only + cap_drop
- fc8b4b16 security(ocr): redirect XDG cache and Torch home away from read-only HOME
- f1e0b92f style(ocr): normalize cap_drop to block notation in docker-compose.yml
- Manual fix on raddatz.cloud (2026-05-18): docker stop archiv-staging-ocr-service-1 && docker run --rm -v archiv-staging_ocr-cache:/c alpine chown -R 1000:1000 /c && docker run --rm -v archiv-staging_ocr-models:/m alpine chown -R 1000:1000 /m && docker start archiv-staging-ocr-service-1. The volumes are now correctly owned by uid 1000 and the service is healthy. This issue is what surfaced next.

🏛️ Markus Keller — Senior Application Architect
Observations
- Model staging is a different workload from what /tmp is for (training-ZIP unpacking, transient PDF buffers). Approach A correctly separates them; B collapses them back into one budget.
- text_recognition is one of several Surya models, and surya_engine is the project's default OCR engine (see the engines/__init__.py import in main.py:27). Future Surya releases will be in this size range or larger.
- Volume ownership currently rests on a one-off manual docker run --rm alpine chown.
- docker-compose.yml and docker-compose.prod.yml both carry the identical tmpfs: /tmp:size=512m comment about "training endpoints write ZIPs to /tmp". That is duplicated config, drift waiting to happen.

Recommendations
- Write an ADR for the TMPDIR=/app/cache/.tmp convention. The decision has the lifetime signature of an ADR:
  - It constrains future contributors: they cannot move /app/cache to bind-mount semantics without breaking this.
  - It cites a concrete failure mode (ENOSPC).
  - It survives across deployments.
  
  Per CLAUDE.md, the author note in the issue ("ADR or commit comment ... author's call") understates this. TMPDIR pointing inside a volume mount is a non-obvious decision, and someone will revert it in two years if there is no ADR.
- Solve the volume-ownership bootstrap in the same change. An init step that owns the volumes can also run mkdir -p /app/cache/.tmp, which makes Approach A's mkdir step redundant. Solving them together avoids a second entrypoint.sh modification later.
- Comment the TMPDIR env var with the threat model, not the mechanism. Something like # Stage GB-scale model downloads on the disk-backed cache volume, not the RAM tmpfs. See ADR-008. is the kind of comment Nora's persona principle ("explain why this is safe") calls for.

Open Decisions
👨💻 Felix Brandt — Senior Fullstack Developer
Observations
- There are three tempfile.TemporaryDirectory() call sites in ocr-service/main.py: lines 383, 480 and 557, for /train, /train-sender and /segtrain. All three currently rely on /tmp. Approach A's TMPDIR redirect moves all three at once without code changes.
- The download happens inside Surya (surya/common/s3.py:55), not in our code, so we cannot wrap or patch the call site. TMPDIR is the only knob the host can turn without forking Surya.
- entrypoint.sh is currently 9 lines and does exactly one thing: validate the blla model, then exec uvicorn. Adding an mkdir -p is a single line that fits its style.
- The Dockerfile already sets HF_HOME=/app/cache etc. as ENV. Putting TMPDIR only in compose means anyone who runs the image without the compose layer (e.g. docker run for diagnostics) loses the safety. But putting ENV TMPDIR=/app/cache/.tmp in the Dockerfile risks pointing at a path that doesn't exist on a fresh image (no compose-mounted volume). The entrypoint mkdir -p "$TMPDIR" resolves both: the Dockerfile sets the default, and the entrypoint guarantees the directory.

Recommendations
- Set TMPDIR=/app/cache/.tmp in docker-compose.yml AND docker-compose.prod.yml, both files in one atomic commit. They diverged silently once already with the 512 MB tmpfs comment, and that's the smell to avoid. Also add ENV TMPDIR=/app/cache/.tmp to the Dockerfile as a default so docker run archive-ocr doesn't blow up either.
- In entrypoint.sh, run mkdir -p "${TMPDIR:-/tmp}" before python3 /app/ensure_blla_model.py so the BLLA fallback download also benefits. The :-/tmp fallback keeps the script safe if TMPDIR is ever unset.
- TDD it:
  - First write a test in ocr-service/ that calls tempfile.gettempdir() in a subprocess started with TMPDIR=/some/path and asserts that it resolves to that path.
  - Then write an integration-style test that calls tempfile.TemporaryDirectory() and verifies that the path begins with os.environ["TMPDIR"].
  
  Both are <10-line tests. The entrypoint.sh mkdir line is harder to TDD. Assert it via shellcheck plus a CI smoke step that runs docker compose build && docker compose run --rm ocr-service ls -ld /app/cache/.tmp.
- Skip the orphan cleanup (find $TMPDIR -mtime +1 -delete). It is overengineering for a 1.8 TB disk with sub-GB orphan candidates. YAGNI: revisit if monitoring ever flags it.
- ocr-service/README.md: add TMPDIR to the env var table. It already has rows for HF_HOME, XDG_CACHE_HOME and TORCH_HOME, so one more matches the pattern.
- ocr-service/CLAUDE.md: a single LLM-reminder bullet ("TMPDIR points into the persistent cache volume; do not redirect to RAM tmpfs").

Open Decisions
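One detail matters for the in-process test Felix describes, and for the monkeypatch-based test Sara proposes further down. CPython's tempfile resolves its directory once and caches it in the module attribute tempfile.tempdir. Changing TMPDIR after the first tempfile call therefore has no effect until that cache is cleared.

Below is a minimal sketch, assuming the test runs in-process rather than in a subprocess. The helper name temporary_directory_path is hypothetical.

```python
import os
import tempfile


def temporary_directory_path(tmpdir: str) -> str:
    """Hypothetical helper: path of a TemporaryDirectory created with TMPDIR=tmpdir."""
    saved_env = os.environ.get("TMPDIR")
    saved_cache = tempfile.tempdir
    os.environ["TMPDIR"] = tmpdir
    tempfile.tempdir = None  # drop the cached value so TMPDIR is re-read
    try:
        with tempfile.TemporaryDirectory() as path:
            return path  # the directory itself is removed when the block exits
    finally:
        # Restore the environment and the cache so other tests are unaffected.
        if saved_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = saved_env
        tempfile.tempdir = saved_cache


if __name__ == "__main__":
    staging = tempfile.mkdtemp()
    path = temporary_directory_path(staging)
    print(os.path.dirname(path) == staging)  # True
```

With pytest, the same effect comes from monkeypatch.setattr(tempfile, "tempdir", None) next to the TMPDIR monkeypatch.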
🛠️ Tobias Wendt — DevOps & Platform Engineer
Observations
- docker-compose.yml:114 and docker-compose.prod.yml:169 are byte-identical on the tmpfs line. Whichever approach lands, the comment update has to be made in both files in the same commit. We learned this lesson with #526 already: compose config rendered output is the canonical check.
- ocr-service has mem_limit: 12g. tmpfs allocations do count against cgroup memory in modern Docker (kernel ≥ 4.x).
  - A 4 GB tmpfs + Surya's ~5 GB resident model + a 12g cgroup limit is fine on CX42 (16 GB host RAM).
  - On CX32 with OCR_MEM_LIMIT=6g (documented in docs/DEPLOYMENT.md:143), it will OOMKill on cold start.
  
  That alone disqualifies B for the staging-on-CX42 / future-prod-on-CX32 path.
- Bootstrapping a fresh volume via nightly.yml should not require a human SSH session with docker run --rm alpine chown. That's the issue we should be tracking even more urgently than this one.
- Approach A needs no new volume. The .tmp subdirectory sits inside the same volume, gets created on first start and survives restarts.

Recommendations
- Go with Approach A. shutil.move between same-filesystem paths becomes rename(2): atomic and ~free. Approach B trades correctness for diff size, and that's a poor trade in infra code.
- Update both compose files together and compare their compose config output side-by-side before merging to catch drift.
- File devops(ocr): automate ocr_cache + ocr_models volume ownership on first start.
  - My recommendation is a one-shot init container in docker-compose.prod.yml that runs before ocr-service. Wire it in with depends_on: ocr-volume-init: condition: service_completed_successfully on ocr-service (the same pattern as create-buckets in docker-compose.prod.yml:191).
  - This turns the manual chown into a one-time piece of history rather than a step someone has to repeat, which is exactly what infra-as-code is for.
  - Once that init container exists, the entrypoint mkdir -p "$TMPDIR" becomes redundant. Consider sequencing the two PRs so we only touch entrypoint.sh once.
- nightly.yml already does a docker compose config regression check for the import mount. Add an analogous assertion, grep -q 'TMPDIR: /app/cache/.tmp' /tmp/compose-rendered.yml, so that a future "cleanup" PR cannot accidentally drop the env var. It costs 2 lines and prevents a 1 AM redeploy.
- While touching docker-compose.prod.yml, note that mailpit:v1.29.7 and minio:RELEASE.2025-02-28T09-55-16Z are still hand-pinned. Separate ticket.

Open Decisions
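Tobias's rename(2) point depends on source and destination sharing a filesystem. shutil.move tries os.rename first and falls back to copy-then-delete when the rename fails. The sketch below checks that precondition from inside the container, assuming both paths exist.

Comparing st_dev is only a heuristic. Two mount points of the same filesystem share st_dev, yet rename(2) between them still fails with EXDEV. The helper name same_device is hypothetical.

```python
import os
import shutil
import tempfile


def same_device(src: str, dst: str) -> bool:
    """Hypothetical check: do `src` and the directory of `dst` share a device ID?

    If so, shutil.move can usually rename instead of copying.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    return os.stat(src).st_dev == os.stat(dst_dir).st_dev


if __name__ == "__main__":
    cache = tempfile.mkdtemp()             # stand-in for /app/cache
    staging = os.path.join(cache, ".tmp")  # stand-in for TMPDIR
    os.makedirs(staging)
    part = os.path.join(staging, "model.safetensors")
    with open(part, "wb") as fh:
        fh.write(b"\0" * 1024)
    target = os.path.join(cache, "model.safetensors")
    print(same_device(part, target))  # True: staging lives inside the cache dir
    shutil.move(part, target)         # same device, so this is a plain rename
```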
🔐 Nora "NullX" Steiner — Application Security Engineer
Observations
- The hardening from the four commits (1aca4c4a, ab24786d, fc8b4b16, f1e0b92f) is correctly enumerated in the AC: non-root user, read_only: true, cap_drop: [ALL]. None of the three approaches weakens those. Good.
- Approach A redirects TMPDIR to /app/cache/.tmp. As a result, user-uploaded ZIP extraction in /train, /train-sender and /segtrain (main.py:383, 480, 557) now writes to the persistent cache volume instead of the RAM tmpfs. That has two security implications worth naming:
  1. Cross-job leakage.
     - If the container is killed (docker kill) mid-extract, ZIP entries persist on the SSD across container lifecycles. The issue body mentions this in passing but downplays it as a disk-space issue.
     - From a security standpoint, the persistence itself is the concern. Partial ground-truth data from training run N could still be on disk during training run N+1.
     - Mitigation: the existing _validate_zip_entry() ZIP Slip checks (Felix's persona file documents them) still apply, so an attacker can't write outside .tmp. Within .tmp, however, orphans accumulate.
  2. Upload size is no longer capped by the 512 MB tmpfs. The TRAINING_TOKEN check (main.py, _check_training_token) gates the /train* endpoints, so this is a post-auth DoS, not an unauthenticated one. Lower severity, but worth naming. Approach A preserves the original 512 MB cap on the /tmp-using paths.
- mkdir -p "$TMPDIR" in entrypoint.sh runs as the ocr user (uid 1000), inside a read_only: true rootfs, with the /app/cache volume mounted RW. That's the only place this mkdir can succeed, and it cannot escalate.
- read_only: true is preserved in all three approaches (verified against docker-compose.yml:112 and docker-compose.prod.yml:167).

Recommendations
- Prefer Approach A: anything that still writes to /tmp directly stays capped at 512 MB. (/tmp is what tempfile falls back to only if TMPDIR is unset, which it now never is.) Approach B raises the attacker's playing field. Approach C does both for no security gain over A.
- Add an orphan cleanup to entrypoint.sh to address the cross-job leakage concern from Observation #1.
  - It is safe: -mtime +1 prevents nuking an in-progress download started seconds before a restart.
  - It is security-positive: no stale ground-truth XML floats across runs.
  - This counters Felix's "skip cleanup as YAGNI", which I disagree with. This is a multi-tenant-ish endpoint (/train accepts arbitrary uploads), and the cleanup costs nothing.
- Add a test to ocr-service/test_training_auth.py (it already exists) that asserts the ZIP Slip check still fires under the new TMPDIR. The threat model didn't change, but the working directory did: _validate_zip_entry() resolves against tmp_dir, which is now /app/cache/.tmp/tmpXXX/.... Confirm that os.path.realpath() still produces the correct anchoring.
- Comment the TMPDIR env var with its threat model (CLAUDE.md says comments explain the threat model, not the code).

Open Decisions
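For the ZIP Slip test Nora asks for, the property to pin down is the anchoring. A member path, once resolved with os.path.realpath(), must still land under the per-job directory now that it lives below /app/cache/.tmp.

The sketch below is a minimal anchoring check. is_within_staging is hypothetical and is not the project's _validate_zip_entry(), whose exact logic isn't shown in this thread.

```python
import os
import tempfile


def is_within_staging(extract_dir: str, member_name: str) -> bool:
    """Hypothetical check: does the resolved member stay inside extract_dir?

    realpath() also resolves symlinks, so a symlinked TMPDIR does not skew
    the comparison.
    """
    base = os.path.realpath(extract_dir)
    target = os.path.realpath(os.path.join(base, member_name))
    # commonpath compares whole path components, not string prefixes.
    return os.path.commonpath([base, target]) == base


if __name__ == "__main__":
    cache_tmp = tempfile.mkdtemp()             # stand-in for /app/cache/.tmp
    job_dir = tempfile.mkdtemp(dir=cache_tmp)  # stand-in for tmpXXX
    print(is_within_staging(job_dir, "page_001.png"))      # True
    print(is_within_staging(job_dir, "../../etc/passwd"))  # False
    print(is_within_staging(job_dir, "/etc/passwd"))       # False
```

Using commonpath instead of str.startswith avoids the prefix trap. With a startswith check, a sibling such as .../tmpAB-evil would pass as being "inside" .../tmpAB.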
🧪 Sara Holt — Senior QA Engineer
Observations
- The test plan is concrete. The regression risks (/train-sender still works) are named, and the AC explicitly call out read_only: true / cap_drop / non-root preservation. That's the right shape.
- Nothing automated guards the TMPDIR env var. Without a test, we will rediscover it in production again.
- ocr-service/test_main.py, test_training_auth.py, test_stream.py and test_ensure_blla_model.py all exist, so there is an established Python test surface to extend.
- AC #6 ("On a fresh ocr_cache volume, the chosen approach still works") is named, but the verification is left to a manual docker volume rm + restart. This is exactly the case that broke staging (pre-existing volumes with 0:0 ownership), and it deserves an automated check.

Recommendations
- Add three tests to ocr-service/test_main.py (or a new test_tmpdir.py), red/green TDD style:
  - def test_tempfile_uses_tmpdir_when_set(monkeypatch, tmp_path): monkeypatch TMPDIR to tmp_path, call tempfile.TemporaryDirectory(), and assert that the returned path begins with tmp_path. Proves Python honours the env var.
  - def test_entrypoint_creates_tmpdir(): shell out to bash entrypoint.sh (mock python3 and uvicorn) with TMPDIR=/tmp/test-tmpdir-xyz and assert that the directory exists afterwards. Proves the mkdir -p works.
  - def test_tmpdir_is_inside_persistent_volume_path(): assert that the TMPDIR configured in the environment lives under /app/cache, matching the compose contract. Catches drift if someone later writes TMPDIR=/tmp/something.
- Pair these with Tobias's compose config | grep TMPDIR assertion. That covers the YAML side; my tests cover the runtime side.
- tempfile.TemporaryDirectory() calls in threads.
- For test plan step 4, don't eyeball ls -ldn. Write a one-line bash assertion that exits non-zero on the wrong owner, so it works as a CI step too.

Open Decisions
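Below is a Python sketch of the two runtime contract checks Sara describes: TMPDIR sits under the cache volume, and the directory is owned by uid 1000. It is written so that it could run inside the container as a CI step. check_tmpdir_contract, main and their defaults are hypothetical; /app/cache and uid 1000 come from the issue.

```python
import os
import sys
from typing import List


def check_tmpdir_contract(tmpdir: str, cache_root: str = "/app/cache",
                          expected_uid: int = 1000) -> List[str]:
    """Hypothetical runtime check; returns violations (an empty list means OK)."""
    problems = []
    root = os.path.realpath(cache_root)
    path = os.path.realpath(tmpdir)
    # TMPDIR must sit strictly below the cache volume, not at its root.
    if path == root or os.path.commonpath([root, path]) != root:
        problems.append(f"TMPDIR {tmpdir!r} is not below {cache_root!r}")
    if not os.path.isdir(path):
        problems.append(f"TMPDIR {tmpdir!r} does not exist")
    else:
        owner = os.stat(path).st_uid
        if owner != expected_uid:
            problems.append(f"TMPDIR {tmpdir!r} is owned by uid {owner}, "
                            f"expected {expected_uid}")
    return problems


def main() -> int:
    """Exit code for a CI step: 0 when the contract holds, 1 otherwise."""
    tmpdir = os.environ.get("TMPDIR")
    problems = (["TMPDIR is not set"] if tmpdir is None
                else check_tmpdir_contract(tmpdir))
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

Run inside the container, a step would exit with main()'s return code (e.g. raise SystemExit(main())). Outside the container, the function can be exercised against temporary directories, as the unit tests do.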
📋 "Elicit" — Requirements Engineer
Observations
- The out-of-scope items (german_kurrent.mlmodel, and the volume-ownership automation) are mentioned in the AC's context (#6: "the chosen approach still works on a fresh ocr_cache volume").
  - The AC implicitly depends on the volume being writable by uid 1000, which is currently solved only by the manual chown.
  - Without the volume-bootstrap automation in scope, AC #6 is satisfiable only on environments where someone has already chowned. That is a hidden dependency.
- The project doc rules require an ADR for decisions of this kind, and TMPDIR redirected to a persistent volume mount is exactly that. Either the ADR is in scope or CLAUDE.md needs updating. The issue should not be the place that quietly waives the doc requirement.

Recommendations
- Bring the ADR for the TMPDIR convention into scope: required per project doc rules, not optional.
- State in the issue that we deliberately do not patch surya/common/s3.py, so a reviewer doesn't ask why we didn't patch upstream. The reason (we can't fork OCR libraries for hosting tweaks) is sound; just name it.
- Rewrite the AC as Given/When/Then scenarios:
  - Given an ocr_cache volume owned by uid 1000, when the ocr-service container starts and receives a guided-OCR request for a Surya model not previously cached, then the model downloads to /app/cache/datalab/models/text_recognition/2025_09_23/ and inference returns text within the existing healthcheck start_period.
  - Given an ocr_cache volume with stale 0:0 ownership (pre-#612 state), when... (covers AC #6).
  - Given a cached Surya model, when /train-sender is invoked with a 20-image ZIP, then training completes without ENOSPC and the cached Surya model is preserved.
- Log the resolved TMPDIR at startup. Today this failure was diagnosed by reading stack traces in container logs. A structured logger.info("Surya model staging to %s", tmpdir) would have spotted the pre-existing-volume issue weeks ago. Lightweight, costs nothing.

Open Decisions
- Is the volume-bootstrap automation in scope? Without it, AC #6 rests on a one-off manual docker volume rm rather than a permanent CI guarantee. (Raised by Elicit, also implicit in Markus's and Tobias's comments.)

🎨 Leonie Voss — UX & Accessibility Lead
No concerns from my angle. This is an infrastructure bug in the OCR Python service; the user-visible failure mode is "guided OCR returns an error in the UI." I checked: the transcribe panel surfaces backend errors via the existing getErrorMessage() mapping. The symptom therefore appears as a generic localized OCR-failed message from messages/{de,en,es}.json, and this fix requires no UI change.

One thing is worth naming for future reference. A 1.34 GB first-time download takes ~10–30 seconds on a typical connection (the issue notes "~10s on home connection"). If the transcribe panel today shows a generic spinner during that window, the user perceives "the app is stuck." Once this bug is fixed, please check whether the loading state on the first OCR-after-deploy still feels acceptable on a 60+ user's laptop. If it doesn't, that's a separate UX ticket about progressive loading messaging, not part of this fix.
🗳️ Decision Queue — Action Required
1 decision needs your input before implementation starts.
Infrastructure / Scope
AC #6 ("On a fresh ocr_cache volume the chosen approach still works") is only honestly satisfiable once the manual chown -R 1000:1000 from 2026-05-18 is replaced by an init container (or equivalent) in docker-compose.prod.yml. Three options:
- A, same PR: add the ocr-volume-init service (Tobias's snippet) to both compose files in this PR. ~1 extra day of work, but AC #6 becomes a permanent CI-verifiable guarantee instead of "someone chowned once."
- B, split: ship the TMPDIR fix now (resolves staging today), open devops(ocr): automate ocr_cache + ocr_models volume ownership as P1, and link both ways. Risk: AC #6 stays theoretical until issue 2 lands.

(Raised by: Elicit, with concurring observations from Markus and Tobias.)
All seven personas (Markus, Felix, Tobias, Nora, Sara, Elicit, Leonie) converged independently on Approach A for the core fix. That part of the design is settled — only this scoping question remains.
A
Implementation complete: branch feat/issue-614-tmpdir-persistent-volume

Approach A implemented. All acceptance criteria addressed. PR forthcoming.
Commits
- 09a04343 build(ocr): set ENV TMPDIR=/app/cache/.tmp so docker run uses SSD staging
- 240b373f fix(ocr): create TMPDIR on startup and clear day-old orphans
- 1f7b08b7 fix(ocr): add TMPDIR env var and ocr-volume-init service to compose files
- cfd49ff6 docs(ocr): document TMPDIR convention and add ADR-021

What was implemented
- ocr-service/Dockerfile: ENV TMPDIR=/app/cache/.tmp ensures that even a bare docker run (without compose) uses SSD staging. Without this, any tool reading TMPDIR before the compose env layer is applied would fall back to /tmp.
- ocr-service/entrypoint.sh: two lines added before ensure_blla_model.py:
  - mkdir -p "${TMPDIR:-/tmp}" creates /app/cache/.tmp on fresh volumes (idempotent on existing ones).
  - find "${TMPDIR:-/tmp}" -mindepth 1 -mtime +1 -delete clears orphaned fragments left by earlier docker kills during model downloads (Nora's cross-job leakage concern).
- docker-compose.yml and docker-compose.prod.yml: both files updated in the same commit to prevent the silent drift that already existed:
  - TMPDIR: /app/cache/.tmp, with a threat-model comment, added to ocr-service.environment.
  - The /tmp tmpfs comment updated to reflect its revised purpose (training ZIPs + PDF buffers only).
  - A one-shot ocr-volume-init service added. It runs chown -R 1000:1000 + mkdir -p /app/cache/.tmp before ocr-service starts, replacing the manual docker run --rm alpine chown from 2026-05-18 with a permanent infrastructure-as-code guarantee. This addresses AC #6 (fresh volume works) and the volume-bootstrap concern from Markus/Tobias/Elicit (Decision A: same PR).
  - ocr-service.depends_on added with condition: service_completed_successfully.
- docs/adr/021-tmpdir-persistent-volume-staging.md: new ADR documenting the decision, its consequences, and the rejected alternatives (Approach B/C), per Markus's recommendation.
- ocr-service/README.md: HF_HOME, XDG_CACHE_HOME, TORCH_HOME and TMPDIR rows added to the environment variables table.
- ocr-service/CLAUDE.md: LLM reminder added that TMPDIR must stay on the cache volume, referencing ADR-021.
- ocr-service/test_tmpdir.py: three tests:
  - test_tempfile_uses_tmpdir_when_set: proves Python honours TMPDIR. Monkeypatch-based, runs in CI.
  - test_entrypoint_creates_tmpdir: TDD regression test, RED before the entrypoint change and GREEN after. Runs entrypoint.sh with stub python3/uvicorn and asserts that the directory gets created.
  - test_tmpdir_is_inside_persistent_cache_volume: guards against accidental reversion. Skipped outside Docker; runs inside the container, where TMPDIR=/app/cache/.tmp is set.
- .gitea/workflows/ci.yml: test_tmpdir.py added to the OCR CI run (stdlib-only, no ML stack required).

AC verification
- Guided OCR downloads the Surya model (staged on /app/cache)
- Training (/train-sender, 20–50 image ZIPs) still works: /tmp (512 MB, RAM-fast); TMPDIR redirect only affects model staging
- read_only: true retained
- cap_drop: [ALL] retained
- ocr user (uid 1000) retained: ocr-volume-init sets ownership to 1000:1000 before the service starts
- Fresh ocr_cache volume works (AC #6): ocr-volume-init does chown + mkdir; entrypoint mkdir -p acts as a second layer
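The find "${TMPDIR:-/tmp}" -mindepth 1 -mtime +1 -delete line in entrypoint.sh has a subtlety worth knowing. As documented for GNU find, a file's age is truncated to whole days before the comparison. -mtime +1 therefore only matches entries last modified at least 48 hours ago, not 24; -mmin +1440 gives a cutoff of roughly one day. Separately, -mindepth 1 keeps the TMPDIR directory itself from being deleted.

The Python sketch below mirrors that cleanup with an explicit age threshold, e.g. for a unit test. prune_orphans and its parameters are hypothetical and not part of the implementation. It judges each direct child of TMPDIR (one per TemporaryDirectory) by that child's own mtime and removes it as a whole, which is coarser than find's per-entry evaluation.

```python
import os
import shutil
import time
from typing import List, Optional


def prune_orphans(tmpdir: str, max_age_seconds: float = 24 * 3600,
                  now: Optional[float] = None) -> List[str]:
    """Hypothetical mirror of the entrypoint's orphan cleanup.

    Removes direct children of `tmpdir` older than `max_age_seconds`.
    `tmpdir` itself is never removed, like -mindepth 1.
    Returns the names of the removed entries.
    """
    now = time.time() if now is None else now
    with os.scandir(tmpdir) as it:
        entries = list(it)  # snapshot before deleting anything
    removed = []
    for entry in entries:
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age <= max_age_seconds:
            continue  # recent: may belong to an in-progress download
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path, ignore_errors=True)
        else:
            os.unlink(entry.path)  # regular file or symlink
        removed.append(entry.name)
    return removed
```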