fix(training): only count reviewed blocks as checked text for recognition

Previously, all MANUAL blocks counted as eligible training data, even ones whose text was filled in by guided OCR but never explicitly reviewed. As a result, the segmentation and recognition counts always matched.

Now only blocks with reviewed=true qualify for recognition training, so the counts properly reflect:
- segments = all drawn annotation boxes
- checked text = only boxes where the user has verified the transcription

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -22,7 +22,7 @@ public interface TranscriptionBlockRepository extends JpaRepository<Transcriptio
 SELECT b FROM TranscriptionBlock b
 JOIN DocumentAnnotation a ON a.id = b.annotationId
 JOIN Document d ON d.id = b.documentId
-WHERE (b.source = 'MANUAL' OR (b.source = 'OCR' AND b.reviewed = true))
+WHERE b.reviewed = true
 AND 'KURRENT_RECOGNITION' MEMBER OF d.trainingLabels
 """)
 List<TranscriptionBlock> findEligibleKurrentBlocks();
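
The effect of the changed WHERE clause can be illustrated with a small in-memory sketch. This is not the repository code: the `Block` record, the class name `EligibilitySketch`, and the helper methods are hypothetical stand-ins that model only the `source` and `reviewed` fields the query filters on, and the sketch assumes the segment count is simply the number of blocks.

```java
import java.util.List;
import java.util.function.Predicate;

public class EligibilitySketch {
    // Hypothetical stand-in for TranscriptionBlock: only the two filtered fields.
    record Block(String source, boolean reviewed) {}

    // Mirrors the removed condition: every MANUAL block, plus reviewed OCR blocks.
    static boolean eligibleBefore(Block b) {
        return b.source().equals("MANUAL")
                || (b.source().equals("OCR") && b.reviewed());
    }

    // Mirrors the added condition: only reviewed blocks, whatever their source.
    static boolean eligibleAfter(Block b) {
        return b.reviewed();
    }

    // Counts the blocks that satisfy the given eligibility rule.
    static long count(List<Block> blocks, Predicate<Block> rule) {
        return blocks.stream().filter(rule).count();
    }

    public static void main(String[] args) {
        // Three drawn boxes; only one transcription was verified by the user.
        List<Block> blocks = List.of(
                new Block("MANUAL", false),
                new Block("MANUAL", false),
                new Block("MANUAL", true));

        System.out.println("segments: " + blocks.size());
        System.out.println("checked text (old rule): "
                + count(blocks, EligibilitySketch::eligibleBefore));
        System.out.println("checked text (new rule): "
                + count(blocks, EligibilitySketch::eligibleAfter));
    }
}
```

Under the old rule every MANUAL block is counted, so the checked-text count equals the number of segments. Under the new rule only the reviewed block is counted.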