fix(ocr): use 1-based page numbers to match frontend PDF viewer

The PDF viewer uses 1-based currentPage (starting at 1) but the OCR engines produced 0-based pageNumber from enumerate(). Annotations created by OCR were assigned to page 0, which doesn't exist in the viewer.

Change enumerate() to start=1 in both engines and the streaming endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
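
For illustration, a minimal sketch of the off-by-one described above. The helper number_pages and the placeholder page list are hypothetical and not part of the service; only the use of enumerate() with start=1 mirrors the change in this commit.

```python
def number_pages(images, start=1):
    """Pair each page image with a page number (1-based by default)."""
    # enumerate() counts from 0 unless start is given explicitly.
    return [(page_number, image) for page_number, image in enumerate(images, start=start)]


# Placeholder strings standing in for page images (assumption for this sketch).
pages = ["page-a", "page-b", "page-c"]

zero_based = number_pages(pages, start=0)  # old behaviour: first page is numbered 0
one_based = number_pages(pages)            # new behaviour: first page is numbered 1

print([n for n, _ in zero_based])
print([n for n, _ in one_based])
```

With a viewer whose currentPage starts at 1, only the second numbering lines up with a page that the viewer can actually display.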
2026-04-13 10:32:08 +02:00
parent bac67706b9
commit 97e5138934
5 changed files with 15 additions and 15 deletions
--- a/ocr-service/engines/kraken.py
+++ b/ocr-service/engines/kraken.py
@@ -88,7 +88,7 @@ def extract_blocks(images: list, language: str = "de") -> list[dict]:
    """
    all_blocks = []

-    for page_idx, image in enumerate(images):
+    for page_idx, image in enumerate(images, start=1):
        all_blocks.extend(extract_page_blocks(image, page_idx, language))

    return all_blocks
--- a/ocr-service/engines/surya.py
+++ b/ocr-service/engines/surya.py
@@ -90,7 +90,7 @@ def extract_blocks(images: list, language: str = "de") -> list[dict]:
    """
    all_blocks = []

-    for page_idx, image in enumerate(images):
+    for page_idx, image in enumerate(images, start=1):
        all_blocks.extend(extract_page_blocks(image, page_idx, language))
        del image