feat(ocr): add guided OCR mode using existing annotation regions

When a document has manually drawn annotation boxes, the user can now enable "Nur annotierte Bereiche" ("Annotated regions only") in the OCR trigger panel. The engine skips layout detection entirely and runs recognition only within the pre-drawn bounding boxes, preserving manual transcription blocks.

- Python: add OcrRegion model, extend OcrRequest/OcrBlock; guided branch in /ocr/stream groups regions by page and crops each region
- Engines: add extract_region_text() to both Kraken and Surya
- Java: add OcrBlockResult.annotationId, OcrClient.OcrRegion, TriggerOcrDTO.useExistingAnnotations; OcrAsyncRunner dispatches to upsertGuidedBlock when annotationId is present; OcrService threads the flag through to runSingleDocument
- TranscriptionService: add upsertGuidedBlock (creates new blocks, updates OCR blocks, or preserves MANUAL blocks)
- Frontend: guided OCR toggle in OcrTrigger, shown when blocks exist; skips the destructive-replace confirmation in guided mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 15:57:54 +02:00
parent 9b2f91ee59
commit ee58b63517
25 changed files with 380 additions and 55 deletions
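
The guided branch described in the commit message (group regions by page, then crop each one from normalized coordinates) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `Region` tuple and its field names, `to_pixel_box`, and `group_by_page` are hypothetical, and the zero-area check is an extra guard that the diff itself does not include.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """Hypothetical stand-in for the OcrRegion model; field names are assumed."""
    annotation_id: str
    page: int
    x: float  # normalized left edge, in [0, 1]
    y: float  # normalized top edge, in [0, 1]
    w: float  # normalized width
    h: float  # normalized height


def to_pixel_box(region: Region, page_width: int, page_height: int) -> Optional[tuple]:
    """Map a normalized region to a pixel box clamped to the page bounds.

    Returns (x1, y1, x2, y2), or None when the clamped box has zero area.
    """
    x1 = max(0, int(region.x * page_width))
    y1 = max(0, int(region.y * page_height))
    x2 = min(page_width, int((region.x + region.w) * page_width))
    y2 = min(page_height, int((region.y + region.h) * page_height))
    if x2 <= x1 or y2 <= y1:
        return None  # nothing to recognize in an empty crop
    return (x1, y1, x2, y2)


def group_by_page(regions: list) -> dict:
    """Bucket regions by page index so each page image is handled once."""
    grouped = defaultdict(list)
    for region in regions:
        grouped[region.page].append(region)
    return dict(grouped)
```

The resulting box has the same (left, upper, right, lower) shape as the tuple passed to `image.crop(...)` in the diff, so a caller could skip any region for which `to_pixel_box` returns None before invoking recognition.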
--- a/ocr-service/engines/surya.py
+++ b/ocr-service/engines/surya.py
@@ -81,6 +81,25 @@ def extract_page_blocks(image, page_idx: int, language: str = "de") -> list[dict
    return blocks


+def extract_region_text(image, x: float, y: float, w: float, h: float) -> str:
+    """Crop image to a normalized region and run Surya recognition on the crop.
+
+    Used for guided OCR — skips full-page layout detection and only processes
+    the given bounding box. Coordinates are normalized to [0, 1].
+    """
+    load_models()
+
+    pw, ph = image.size
+    x1 = max(0, int(x * pw))
+    y1 = max(0, int(y * ph))
+    x2 = min(pw, int((x + w) * pw))
+    y2 = min(ph, int((y + h) * ph))
+    crop = image.crop((x1, y1, x2, y2))
+
+    predictions = _recognition_predictor([crop], det_predictor=_detection_predictor)
+    return " ".join(line.text for line in predictions[0].text_lines)
+
+
 def extract_blocks(images: list, language: str = "de") -> list[dict]:
    """Run Surya OCR on a list of PIL images (one per page).