feat(training): add segmentation training pipeline and complete Part 6

- Add /segtrain endpoint to OCR service (ZIP upload, ketos.segtrain,
  backup rotation, in-process model reload)
- Add segtrainModel() to OcrClient and RestClientOcrClient (10-min timeout,
  X-Training-Token header)
- Add SegmentationTrainingExportService: PAGE XML export with polygon
  de-normalization and per-page PNG rendering via PDFBox
- Add GET /api/ocr/segmentation-training-data/export endpoint
- Make TranscriptionBlock.text nullable for segmentation-only blocks
  (V31 migration)
- Add Paraglide i18n translation keys for all training UI strings (de/en/es)
- Pass source prop from TranscriptionEditView to TranscriptionBlock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Marcel
2026-04-13 15:15:17 +02:00
parent 86e9c05aaf
commit 9b2f91ee59
13 changed files with 383 additions and 5 deletions

View File

@@ -152,6 +152,13 @@
<artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
<version>3.0.2</version>
</dependency>
<!-- PDF rendering for training data export -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>3.0.4</version>
</dependency>
</dependencies>