test(ocr): add integration tests for spell-check routing in main.py #262
Background
Deferred during PR #260 review cycle 1.
Concern
The spell-check post-processing is wired into three code paths in `main.py`:

- `run_ocr`
- `generate()` inside `run_ocr_stream`
- `generate_guided()`

None of these wiring points currently have test coverage verifying that `correct_text()` is called when `scriptType` is `HANDWRITING_KURRENT` or `HANDWRITING_LATIN`, and that it is not called for `TYPEWRITER`.
Why deferred
The full ML stack (Kraken, Surya, model files) is not available in CI or locally without GPU provisioning. A minimal smoke test could mock out the OCR engines using `unittest.mock.patch` and `httpx.AsyncClient` with `ASGITransport`, but this was out of scope for the initial feature PR.
Suggested approach
Write three parametrized tests (block / stream / guided mode) using `ASGITransport` and patching:

- `main._download_and_convert_pdf` → returns a fake PIL image list
- `main.kraken_engine.extract_blocks` → returns a fake block with text
- `main.correct_text` → verify it is called once per block for handwriting types, zero times for typewriter

Reference
PR: http://heim-nas:3005/marcel/familienarchiv/pulls/260
Raised by: @saraholt in PR review
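
The suggested approach above can be sketched as follows. This is a minimal, stdlib-only sketch under explicit assumptions. The real `main.py` and the ML stack cannot be imported here, so a hypothetical `main` stand-in (a `SimpleNamespace`) takes its place. The three mode functions, their signatures, and the `{"text": ...}` block shape are all assumptions. Only the three patched names come from this issue. The sketch calls the mode functions directly instead of going through `ASGITransport`. It also uses `unittest.IsolatedAsyncioTestCase` with `subTest` in place of pytest parametrization, so it runs without extra dependencies.

```python
import inspect
import types
import unittest
from unittest import mock

# --- Hypothetical stand-in for main.py ---------------------------------------
# The real module needs Kraken/Surya and model files. Only the three patched
# names come from the issue; the mode functions, their signatures and return
# shapes are assumptions made so this sketch runs on its own.
main = types.SimpleNamespace(
    _download_and_convert_pdf=None,                            # patched in tests
    kraken_engine=types.SimpleNamespace(extract_blocks=None),  # patched in tests
    correct_text=None,                                         # patched in tests
)
HANDWRITING = {"HANDWRITING_KURRENT", "HANDWRITING_LATIN"}


def _block_texts(url, script_type):
    # The routing under test: only handwriting output is spell-checked.
    for image in main._download_and_convert_pdf(url):
        for block in main.kraken_engine.extract_blocks(image):
            text = block["text"]
            yield main.correct_text(text) if script_type in HANDWRITING else text


async def run_ocr(url, script_type):  # block mode (hypothetical signature)
    return list(_block_texts(url, script_type))


async def run_ocr_stream(url, script_type):  # stream mode (hypothetical signature)
    async def generate():
        for text in _block_texts(url, script_type):
            yield text
    # A real endpoint would typically wrap this in a streaming response.
    return generate()


async def generate_guided(url, script_type):  # guided mode (hypothetical signature)
    for text in _block_texts(url, script_type):
        yield text


# --- The test -----------------------------------------------------------------
async def _run(mode, url, script_type):
    if inspect.isasyncgenfunction(mode):
        result = mode(url, script_type)        # guided: the mode is a generator
    else:
        result = await mode(url, script_type)  # block: list, stream: generator
    if hasattr(result, "__aiter__"):
        # Drain streams: an unconsumed generator never calls correct_text(),
        # so a zero-call assertion would pass for the wrong reason.
        return [text async for text in result]
    return result


class SpellCheckRoutingTest(unittest.IsolatedAsyncioTestCase):
    CASES = [
        ("HANDWRITING_KURRENT", True),
        ("HANDWRITING_LATIN", True),
        ("TYPEWRITER", False),
    ]

    async def test_correct_text_routing(self):
        # Any object can stand in for the PIL page images, since
        # extract_blocks is mocked and never looks at them.
        fake_pages = ["fake-page"]
        fake_blocks = [{"text": "line one"}, {"text": "line two"}]
        for mode in (run_ocr, run_ocr_stream, generate_guided):
            for script_type, spell_checked in self.CASES:
                with self.subTest(mode=mode.__name__, script_type=script_type), \
                        mock.patch.object(main, "_download_and_convert_pdf",
                                          return_value=fake_pages), \
                        mock.patch.object(main.kraken_engine, "extract_blocks",
                                          return_value=fake_blocks), \
                        mock.patch.object(main, "correct_text",
                                          side_effect=lambda t: t) as correct:
                    texts = await _run(mode, "http://example.invalid/doc.pdf", script_type)
                    self.assertEqual(len(texts), len(fake_blocks))
                    # Once per block for handwriting, never for typewriter.
                    expected = len(fake_blocks) if spell_checked else 0
                    self.assertEqual(correct.call_count, expected)
```

Run it with `python -m unittest` or with pytest, which also collects `unittest.TestCase` classes. Against the real service, `_run` would presumably become a request through `httpx.AsyncClient(transport=httpx.ASGITransport(app=main.app), base_url="http://test")`. That assumes the ASGI app is exposed as `main.app`. The request must read the whole response body so that the streamed modes actually execute.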