fix(ocr): accept sender_model_path in Surya engine so non-Kurrent OCR works

main.py calls both engines through a single uniform call and always passes
`sender_model_path` (None for non-Kurrent scripts). Surya's
extract_region_text / extract_page_blocks accepted one fewer positional
argument than Kraken's, so every guided-OCR run on a TYPEWRITER or
HANDWRITING_LATIN document raised "takes 5 positional arguments but 6
were given" and the stream returned 0 blocks / 1 skipped page.
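A minimal reproduction of the failure, assuming main.py passes the arguments positionally (which the "positional arguments" wording of the error suggests). The engine functions, their parameter names other than `sender_model_path`, and the `run_region_ocr` helper are hypothetical stand-ins, not the project's real code:

```python
import types

# Hypothetical stand-ins for the two engine modules; the real parameter names
# (apart from sender_model_path) are not shown in this commit.
def kraken_extract_region_text(image, region, script, language, model_path, sender_model_path):
    return f"kraken text (sender model: {sender_model_path})"

def surya_extract_region_text_before_fix(image, region, script, language, model_path):
    # One parameter short of Kraken's: there is no sender_model_path slot.
    return "surya text"

kraken = types.SimpleNamespace(extract_region_text=kraken_extract_region_text)
surya = types.SimpleNamespace(extract_region_text=surya_extract_region_text_before_fix)

def run_region_ocr(use_kraken, image, region, script, language, model_path, sender_model_path=None):
    # Hypothetical dispatcher mirroring the uniform call pattern: pick an engine,
    # then always pass sender_model_path (None for non-Kurrent scripts).
    engine = kraken if use_kraken else surya
    return engine.extract_region_text(image, region, script, language, model_path, sender_model_path)

try:
    run_region_ocr(False, "img", (0, 0, 10, 10), "TYPEWRITER", "en", "model.bin")
except TypeError as exc:
    # surya_extract_region_text_before_fix() takes 5 positional arguments but 6 were given
    print(exc)
```

The Kraken branch succeeds because its signature has the sixth slot; only the Surya branch fails, which is why only non-Kurrent documents were affected.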

Add an ignored `sender_model_path` kwarg to both Surya functions so the
signatures match Kraken's, and guard the regression with two signature
tests in test_engines.py that compare both engines' parameter lists.
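A sketch of the fix pattern under the same assumptions: the Surya function gains a trailing `sender_model_path` parameter that it accepts and ignores. The `None` default and the other parameter names are illustrative guesses, and `param_names` is a hypothetical helper that performs the same name-list comparison as the new tests:

```python
import inspect

# Hypothetical parameter names; only sender_model_path comes from the commit.
def kraken_extract_region_text(image, region, script, language, model_path, sender_model_path):
    return "kraken text"

def surya_extract_region_text(image, region, script, language, model_path, sender_model_path=None):
    # sender_model_path is accepted only so the signature matches Kraken's;
    # Surya does not use it, so the value is discarded.
    del sender_model_path
    return "surya text"

def param_names(fn):
    # Parameter names in declaration order, as compared in test_engines.py.
    return list(inspect.signature(fn).parameters)

print(param_names(surya_extract_region_text) == param_names(kraken_extract_region_text))  # True
print(surya_extract_region_text("img", (0, 0, 10, 10), "TYPEWRITER", "en", "model.bin", None))  # surya text
```

Giving the ignored parameter a default (an assumption here) also keeps any existing five-argument callers of the Surya function working.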

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Marcel
2026-04-23 09:28:25 +02:00
parent 90f111fcb1
commit 1f7b712dd0
2 changed files with 40 additions and 3 deletions


@@ -1,5 +1,6 @@
"""Tests for per-page block extraction in OCR engines."""
import inspect
from unittest.mock import MagicMock, patch
from PIL import Image
@@ -176,3 +177,24 @@ def test_kraken_extract_blocks_delegates_to_extract_page_blocks():
    assert len(blocks) == 2
    assert blocks[0]["pageNumber"] == 1
    assert blocks[1]["pageNumber"] == 2


# ─── Engine signatures must match ─────────────────────────────────────────────
#
# main.py resolves `engine = kraken_engine if use_kraken else surya_engine` and
# then invokes the chosen engine with a uniform call pattern that always
# includes `sender_model_path` (None for non-Kurrent scripts). A signature
# drift between the two engines therefore breaks OCR at runtime — which is
# exactly the regression these tests guard against.


def test_extract_region_text_signatures_match():
    surya_params = list(inspect.signature(surya.extract_region_text).parameters)
    kraken_params = list(inspect.signature(kraken.extract_region_text).parameters)
    assert surya_params == kraken_params


def test_extract_page_blocks_signatures_match():
    surya_params = list(inspect.signature(surya.extract_page_blocks).parameters)
    kraken_params = list(inspect.signature(kraken.extract_page_blocks).parameters)
    assert surya_params == kraken_params
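The tests compare `list(inspect.signature(fn).parameters)`, i.e. parameter names in declaration order. A short probe with hypothetical functions (none of them from the project) shows what that comparison does and does not catch:

```python
import inspect

def param_names(fn):
    # Same comparison the signature tests make: names, in order.
    return list(inspect.signature(fn).parameters)

# Hypothetical functions used only to probe the comparison.
def reference(image, region, model_path, sender_model_path): ...
def reordered(image, region, sender_model_path, model_path): ...
def renamed(image, region, model_path, sender_model): ...
def different_default(image, region, model_path, sender_model_path=None): ...

print(param_names(reordered) == param_names(reference))          # False: order is checked
print(param_names(renamed) == param_names(reference))            # False: names are checked
print(param_names(different_default) == param_names(reference))  # True: defaults are ignored
# Comparing full Signature objects also checks defaults and parameter kinds.
print(inspect.signature(different_default) == inspect.signature(reference))  # False
```

Comparing names only tolerates Surya giving `sender_model_path` a default even if Kraken's parameter has none; comparing whole `inspect.Signature` objects would be stricter and would flag that difference too.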