marcel/familienarchiv

Marcel 902d423f3c

CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s

fix(ocr): reduce memory usage for 16GB dev machines

- Surya models lazy-load on first OCR request instead of at startup
  (saves ~3-4GB idle RAM — Kraken stays eager at ~16MB)
- Process one page at a time in Surya engine (limits peak memory)
- RECOGNITION_BATCH_SIZE=1, DETECTOR_BATCH_SIZE=1 (slower but fits in RAM)
- Revert mem_limit back to 6GB (sufficient with these optimizations)
- Render DPI stays at 200

Idle memory: ~2GB (Kraken only). Peak during OCR: ~5-6GB (Surya loaded).
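
The lazy-loading and page-at-a-time changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `LazyModel`, `ocr_pages`, and the `loader` callable are hypothetical names, and the real Surya loading and recognition calls are deliberately left out.

```python
import threading
from typing import Callable, Generic, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
P = TypeVar("P")
R = TypeVar("R")


class LazyModel(Generic[T]):
    """Defer an expensive load until the first call to get() (hypothetical helper)."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader          # assumed: a function that loads the heavy model
        self._value: Optional[T] = None
        self._loaded = False
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._loaded

    def get(self) -> T:
        # Double-checked locking: concurrent first requests trigger only one load.
        if not self._loaded:
            with self._lock:
                if not self._loaded:
                    self._value = self._loader()
                    self._loaded = True
        return self._value  # type: ignore[return-value]


def ocr_pages(pages: Iterable[P], model: LazyModel[Callable[[P], R]]) -> Iterator[R]:
    """Yield one result per page so only a single page is processed at a time."""
    for page in pages:
        recognize = model.get()  # loads on the first page, reuses the model afterwards
        yield recognize(page)
```

With this shape, nothing is loaded at startup; the first OCR request pays the load cost once, and peak memory is bounded by one page plus the loaded model rather than by the whole document.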

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-12 22:26:50 +02:00

engines

fix(ocr): reduce memory usage for 16GB dev machines

2026-04-12 22:26:50 +02:00

confidence.py

feat(ocr): per-script-type confidence thresholds

2026-04-12 20:50:59 +02:00

Dockerfile

fix(ocr): add pyvips for kraken PDF input support

2026-04-12 20:11:14 +02:00

main.py

fix(ocr): reduce memory usage for 16GB dev machines