fix(ocr): accept sender_model_path in Surya engine so non-Kurrent OCR works
main.py calls both engines through one uniform call pattern that always passes `sender_model_path` (None for non-Kurrent scripts). Surya's extract_region_text / extract_page_blocks accepted one fewer positional argument than Kraken's, so every guided-OCR run on a TYPEWRITER or HANDWRITING_LATIN document raised "takes 5 positional arguments but 6 were given" and the stream returned 0 blocks / 1 skipped page.

Add an ignored `sender_model_path` kwarg to both Surya functions so their signatures match Kraken's, and guard against the regression with two signature tests in test_engines.py that compare both engines' parameter lists.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
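The failure and the fix described above can be reproduced with plain functions. This is an illustrative sketch only. The stub names, the `config` parameter and the `run_ocr` dispatcher are hypothetical stand-ins, not the project's code. Only two details mirror this commit: the uniform six-argument positional call and the trailing, ignored `sender_model_path` on the Surya side.

```python
import inspect

# Hypothetical engine stubs: real Surya/Kraken parameter names are not shown in
# this commit; only the arity and the trailing sender_model_path are mirrored.


def kraken_extract_region_text(image, region, script, language, config,
                               sender_model_path):
    """Kraken stub: would use sender_model_path for Kurrent scripts."""
    return ("kraken", sender_model_path)


def surya_extract_region_text_old(image, region, script, language, config):
    """Surya stub before the fix: one positional parameter short of Kraken's."""
    return ("surya", None)


def surya_extract_region_text_new(image, region, script, language, config,
                                  sender_model_path=None):
    """Surya stub after the fix: accepts sender_model_path and ignores it."""
    del sender_model_path  # present only so the signature matches Kraken's
    return ("surya", None)


def run_ocr(engine_fn, image, region, script, language, config,
            sender_model_path=None):
    """Hypothetical uniform dispatcher: always passes six positional args."""
    return engine_fn(image, region, script, language, config, sender_model_path)


def param_names(fn):
    """Parameter names in declaration order, as the new tests compare them."""
    return list(inspect.signature(fn).parameters)


if __name__ == "__main__":
    try:
        run_ocr(surya_extract_region_text_old, None, None, "TYPEWRITER", "en", {})
    except TypeError as exc:
        print("before fix:", exc)
    print("after fix:",
          run_ocr(surya_extract_region_text_new, None, None, "TYPEWRITER", "en", {}))
    print("names match:",
          param_names(surya_extract_region_text_new)
          == param_names(kraken_extract_region_text))
```

Adding the parameter with a `None` default keeps any existing five-argument callers of the Surya functions working. The six-argument uniform call also succeeds.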
@@ -1,5 +1,6 @@
"""Tests for per-page block extraction in OCR engines."""

import inspect
from unittest.mock import MagicMock, patch
from PIL import Image

@@ -176,3 +177,24 @@ def test_kraken_extract_blocks_delegates_to_extract_page_blocks():
    assert len(blocks) == 2
    assert blocks[0]["pageNumber"] == 1
    assert blocks[1]["pageNumber"] == 2


# ─── Engine signatures must match ─────────────────────────────────────────────
#
# main.py resolves `engine = kraken_engine if use_kraken else surya_engine` and
# then invokes the chosen engine with a uniform call pattern that always
# includes `sender_model_path` (None for non-Kurrent scripts). A signature
# drift between the two engines therefore breaks OCR at runtime — which is
# exactly the regression these tests guard against.


def test_extract_region_text_signatures_match():
    surya_params = list(inspect.signature(surya.extract_region_text).parameters)
    kraken_params = list(inspect.signature(kraken.extract_region_text).parameters)
    assert surya_params == kraken_params


def test_extract_page_blocks_signatures_match():
    surya_params = list(inspect.signature(surya.extract_page_blocks).parameters)
    kraken_params = list(inspect.signature(kraken.extract_page_blocks).parameters)
    assert surya_params == kraken_params
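The tests above compare parameter names only. A name-for-name match would not catch `sender_model_path` being declared keyword-only in one engine, and that would still break a positional call. One possible tightening compares each parameter's `inspect.Parameter.kind` as well. The sketch below illustrates this with hypothetical stubs (`kraken_like`, `surya_keyword_only`, `signature_shape` and `names` are illustrative, not part of the codebase).

```python
import inspect


def names(fn):
    """Name-only comparison key, as used by the tests in this commit."""
    return list(inspect.signature(fn).parameters)


def signature_shape(fn):
    """(name, kind) pairs; kind separates positional from keyword-only params."""
    return [(p.name, p.kind) for p in inspect.signature(fn).parameters.values()]


# Hypothetical stubs: identical parameter names, but the second declares
# sender_model_path keyword-only, so a positional call passes one arg too many.
def kraken_like(image, region, sender_model_path):
    return sender_model_path


def surya_keyword_only(image, region, *, sender_model_path=None):
    return None


if __name__ == "__main__":
    print("names equal:", names(kraken_like) == names(surya_keyword_only))
    print("shapes equal:",
          signature_shape(kraken_like) == signature_shape(surya_keyword_only))
```

In the real tests, `signature_shape(surya.extract_region_text)` would replace the `list(...parameters)` expression. `surya` and `kraken` are assumed to be the engine modules already imported elsewhere in test_engines.py.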