Commit Graph

4 Commits

Author SHA1 Message Date
Marcel
775b5c062e test(ocr): add orphan cleanup behavior tests for entrypoint.sh find -mtime
test_entrypoint_removes_day_old_orphans and test_entrypoint_preserves_fresh_files
verify the find -mtime +1 -delete logic using os.utime() to fabricate old mtimes
without mocking system time. Also extracts _run_entrypoint helper to remove
subprocess setup duplication.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 11:19:33 +02:00
Marcel
e31dac5c9c test(ocr): assert entrypoint.sh exit code in test_entrypoint_creates_tmpdir
A silent non-zero exit would previously cause the test to pass incorrectly
because only directory creation was checked. Exit code is now the first
assertion, catching regressions before the filesystem check runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 11:18:14 +02:00
Marcel
c2bd1b34f0 refactor(ocr): extract _validate_zip_entry to utils.py so ZIP Slip test runs in CI
_validate_zip_entry has no ML-stack dependency; importing it via main.py
pulled in surya/torch and caused the test to be skipped in CI. Moving it
to utils.py (fastapi only) and adding fastapi to the CI lightweight install
lets test_zipslip_still_anchors_under_custom_tmpdir run on every push.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 11:17:15 +02:00
Marcel
240b373f68 fix(ocr): create TMPDIR on startup and clear day-old orphans
On a fresh ocr_cache volume /app/cache/.tmp does not exist yet. The mkdir
ensures the first Surya model download can proceed without ENOSPC on the
512 MB /tmp tmpfs. The find cleanup removes fragments left by docker-kill
mid-download, preventing cross-job ground-truth leakage.

Fixes #614. See ADR-021.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 10:54:17 +02:00