feat(ocr): per-script-type confidence thresholds

Kurrent OCR produces much lower confidence than typewriter/Latin. Separate
thresholds allow aggressive filtering for Kurrent (0.5) while keeping
typewriter lenient (0.3).

- OCR_CONFIDENCE_THRESHOLD: default for Surya paths (0.3)
- OCR_CONFIDENCE_THRESHOLD_KURRENT: Kraken Kurrent path (0.5)
- apply_confidence_markers() now accepts threshold parameter
- get_threshold(script_type) selects the right threshold

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@@ -85,6 +85,7 @@ services:
     environment:
       KRAKEN_MODEL_PATH: /app/models/german_kurrent.mlmodel
       OCR_CONFIDENCE_THRESHOLD: "0.3"
+      OCR_CONFIDENCE_THRESHOLD_KURRENT: "0.5"
     networks:
       - archive-net
     healthcheck:
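
The following is a minimal sketch, not the code from this commit, of how the two environment variables might feed the helpers named in the commit message. Only the names `get_threshold`, `apply_confidence_markers`, `OCR_CONFIDENCE_THRESHOLD` and `OCR_CONFIDENCE_THRESHOLD_KURRENT` come from the commit. The `env` parameter, the `"kurrent"` script-type label, the `(text, confidence)` word format, the `[?...?]` marker syntax and the choice to keep words whose confidence equals the threshold are all assumptions made for illustration.

```python
import os
from typing import Iterable, Mapping, Tuple


def get_threshold(script_type: str, env: Mapping[str, str] = os.environ) -> float:
    """Pick the confidence threshold for a script type.

    The "kurrent" label and the `env` parameter are hypothetical; the
    fallback values mirror the ones set in the compose file.
    """
    if script_type.strip().lower() == "kurrent":
        return float(env.get("OCR_CONFIDENCE_THRESHOLD_KURRENT", "0.5"))
    return float(env.get("OCR_CONFIDENCE_THRESHOLD", "0.3"))


def apply_confidence_markers(
    words: Iterable[Tuple[str, float]], threshold: float
) -> str:
    """Join words, wrapping those below `threshold` in an assumed marker."""
    marked = [
        text if confidence >= threshold else f"[?{text}?]"
        for text, confidence in words
    ]
    return " ".join(marked)


if __name__ == "__main__":
    words = [("Herrn", 0.82), ("Bürgermeister", 0.41)]
    # An empty env forces the fallback values (0.5 and 0.3).
    print(apply_confidence_markers(words, get_threshold("kurrent", env={})))
    # prints: Herrn [?Bürgermeister?]
    print(apply_confidence_markers(words, get_threshold("typewriter", env={})))
    # prints: Herrn Bürgermeister
```

With the same input, the word at confidence 0.41 is flagged on the Kurrent path but passes on the default path, which is the asymmetry the separate thresholds are meant to provide.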