familienarchiv

Author	SHA1	Message	Date
Marcel	81da127381	refactor(ocr): rename findTop5 to findTop10 for headroom as frontend shows 3 by default Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	f206c0b9e9	test(ocr): add unit tests for triggerSegTraining() — conflict, threshold, happy path, failure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	15e532eb96	refactor(ocr): extract assertNoRunningTraining() to eliminate duplicate guard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	f241a71733	feat(frontend): limit training history to 3 runs with expand toggle Both training panels (OCR and segmentation) share TrainingHistory. Show only the 3 most recent runs by default; render a Mehr/Weniger anzeigen button when there are more. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	b83465020a	fix(backend): store error rate for segmentation training runs setCer() was called for recognition training but not for segmentation. The OCR service now returns cer = 1 - accuracy for segtrain; persist it so the admin panel can display Fehlerrate for both training types. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	f08897b801	fix(deploy): wire OCR training token to backend and raise container memory limit - Pass OCR_TRAINING_TOKEN through to the backend container as APP_OCR_TRAINING_TOKEN so RestClientOcrClient sends the X-Training-Token header when calling /train and /segtrain. - Raise mem_limit/memswap_limit from 8g to 12g to give segtrain headroom on hosts with more available RAM. - Uncomment OCR_TRAINING_TOKEN in .env.example — it is now required. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	a5979c4069	fix(ocr-service): fix ketos 7 segtrain compatibility and prevent OOM Three issues fixed: 1. --resize both was removed in ketos 7; replaced with --resize union which extends the model's class mapping to include training data classes. 2. ketos ignores -s when -i is present, so the 1800px blla model caused 7+ GB peak RAM and OOM-killed the host (no swap, 5 GB free). Now checks the loaded model's input height: only uses the base model when it was already fine-tuned at 800px; otherwise trains from scratch at 800px (~200 MB peak). After the first run the trained 800px model becomes the base for all subsequent fine-tuning runs. 3. segtrain now computes and returns cer = 1 - accuracy, matching the recognition training path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	e8375d6c72	fix(ocr-service): add entrypoint that validates blla model format on startup Adds ensure_blla_model.py which loads the blla segmentation model with ketos on every container start. If the model is missing or in the legacy PyTorch ZIP format (incompatible with ketos 7), it re-downloads the correct CoreML protobuf model from Zenodo (DOI 10.5281/zenodo.14602569). The Dockerfile now uses entrypoint.sh which runs this check before starting uvicorn. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	28ac90b529	fix(annotations): replace outline:none with focus-visible ring for keyboard accessibility [M7] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:42:01 +02:00
Marcel	76828a95e3	fix(annotations): add catch(err) binding to handlePointerUp error handler [M6] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:41:21 +02:00
Marcel	7125a0a8eb	fix(annotations): reset liveWidth/liveHeight in handleKeyDown error rollback [M1, M6] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:40:55 +02:00
Marcel	7097f991fe	feat(annotations): add keyboard accessibility to resize handles [B2] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:40:30 +02:00
Marcel	4d9145e49f	feat(annotations): wire SVG aria-label to Paraglide i18n [B3] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:39:35 +02:00
Marcel	060d1c0515	feat(i18n): add annotation_resize_area and annotation_resize_handle message keys [B2, B3] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:38:10 +02:00
Marcel	72700bd28f	test(annotations): add Testcontainers integration tests for V33 chk_annotation_bounds [B1] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:36:37 +02:00
Marcel	40c8f548db	docs(annotations): fix ANNOTATION_UPDATE_FAILED Javadoc to reflect 400 status [M3] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:34:55 +02:00
Marcel	a19faa3806	feat(annotations): add @Slf4j and DataIntegrityViolationException catch to updateAnnotation [M2] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:34:03 +02:00
Marcel	f00b470928	test(annotations): add failing test for DataIntegrityViolationException defense [M2 red] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:33:43 +02:00
Marcel	65d606d8bb	test(annotations): add missing height and x boundary validation tests [M4] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:31:07 +02:00
Marcel	4d3207fc27	test(annotations): verify save() is called in updateAnnotation test [M5] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:30:50 +02:00
Marcel	2350b4f845	fix(annotations): make resize overlay keyboard-interactive - Add tabindex="0" so the SVG can receive DOM focus - Auto-focus the SVG on mount so arrow keys work immediately after clicking an annotation to select it - Show preview rect during keyboard nudging (not just pointer drag) by checking hasLiveChanges instead of only checking dragState - Suppress default browser focus outline (outline: none) on the SVG Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 11:47:41 +02:00
Marcel	9fe5b32a69	feat(annotations): add N/S/E/W edge midpoint handles to resize overlay Extends the 4-corner L-bracket handles with 4 tick-mark edge handles (short lines along each edge), enabling single-axis resize from any edge. Updates applyHandleDrag to route each handle to the correct axis. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 11:40:39 +02:00
Marcel	fcc0efbf02	refactor(annotations): replace 8-square handles with 4 corner L-brackets - 4 corner-only handles (nw/ne/sw/se), no edge midpoints - Each handle renders as two short perpendicular lines meeting at the corner (10px arms, navy, square linecap) — no fill, no box - Thin dashed selection border added to SVG overlay to signal edit mode - Simplify applyHandleDrag to remove dead n/s/e/w branches Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 11:14:30 +02:00
Marcel	e7f88a4ea1	fix(annotations): use pixel-space viewBox so handles stay square on non-square annotations ResizeObserver binds actual SVG pixel dimensions; viewBox matches them so 16px handle squares and 44px hit areas are physically correct regardless of the annotation's aspect ratio. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 11:03:15 +02:00
Marcel	c610a3cc37	feat(annotations): wire updateAnnotation context and error display into PdfViewer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 11:00:50 +02:00
Marcel	3fb32ea285	feat(annotations): pass isResizable to AnnotationShape based on selection + transcribeMode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:57:13 +02:00
Marcel	3b756cd718	feat(annotations): add isResizable prop to AnnotationShape to render edit overlay Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:55:13 +02:00
Marcel	f5362a5850	feat(annotations): add AnnotationEditOverlay component with resize handles and drag Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:52:07 +02:00
Marcel	953cb2c910	feat(i18n): add ANNOTATION_UPDATE_FAILED error code and annotation_edit_mode_active translation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:43:10 +02:00
Marcel	ff231db671	feat(annotations): add PATCH endpoint for annotation resize/move Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:42:08 +02:00
Marcel	1558881c01	feat(annotations): add updateAnnotation service method with partial-update DTO Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:39:50 +02:00
Marcel	26c7181ba4	feat(annotations): add ANNOTATION_UPDATE_FAILED error code Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:38:33 +02:00
Marcel	f76a6c0ee5	migration(annotations): add chk_annotation_bounds CHECK constraint (V33) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:38:11 +02:00
Marcel	ca10e8a6a9	fix(test): update TranscriptionEditView empty-state assertion after text change Commit `5afdc37` changed the empty state from transcription_empty_cta ('Markiere einen Bereich…') to transcription_empty_draw_hint ('Zeichnen Sie Bereiche…') but left the spec asserting the old text. Updated the locator to match the current component output. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:11:57 +02:00
Marcel	22ee3dce68	fix(api): remove duplicate import and align patchTrainingLabel OpenAPI response to 204 Removed duplicate import of org.mockito.ArgumentMatchers.eq from DocumentControllerTest (lines 32+35). Added @ApiResponse(responseCode="204") to patchTrainingLabel so the generated OpenAPI spec matches the actual NoContent response the controller returns. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:07:41 +02:00
Marcel	99847980d2	fix(a11y): replace unicode glyphs with SVG icons in TrainingHistory status badges WCAG 1.4.1 (Use of Color) requires non-color redundant cues for status. The unicode ✓/✗ characters had inconsistent screen-reader support. Replaced with explicit aria-hidden SVG icons (checkmark / x-circle) alongside the translated status text labels. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:06:11 +02:00
Marcel	8f6e398af7	fix(i18n): replace hardcoded German training label chip strings with Paraglide keys TranscriptionEditView rendered 'Kurrent-Erkennung' and 'Segmentierung' as hardcoded German strings, breaking the en/es locales. Added training_chip_kurrent and training_chip_segmentation keys to all three message files and wired them up via m.training_chip_kurrent() / m.training_chip_segmentation(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:04:52 +02:00
Marcel	30a17c97e8	fix(ocr): fail closed when TRAINING_TOKEN is not configured _check_training_token previously skipped auth when TRAINING_TOKEN was empty, allowing unauthenticated requests to reach /train and /segtrain. Now returns 503 ("Training not configured on this node") when the token is absent, so missing configuration fails closed rather than open. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:02:13 +02:00
Marcel	dc283ba271	fix(training): remove @Transactional from triggerTraining to avoid holding DB connection during OCR HTTP call OcrTrainingService.triggerTraining() and triggerSegTraining() held a DB connection open for the entire ketos training run (potentially minutes), risking connection pool exhaustion. Replaced class-level @Transactional with TransactionTemplate for narrow DB writes: guard+create and result-record each run in their own short transaction; the HTTP call to the OCR service runs between them with no open connection. Also replaces blockRepository.findAll().size() with blockRepository.count() in getTrainingInfo() to avoid loading every block into heap on each poll. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 09:59:12 +02:00
Marcel	62be895b9e	fix(ocr): drop uvicorn workers from 2 to 1 Two workers × ~5 GB Surya model load = ~10 GB required, exceeding the 8 GB memory cap and causing OOM on the first /train call. Two OS processes also cause model-state divergence after training, contradicting the single-node constraint documented in ADR-001. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 09:55:55 +02:00
Marcel	7b79dc105b	test(migrations): add Testcontainers integration tests for V23 + V30 constraints V23 introduced a JSONB check constraint (chk_annotation_polygon_quad) requiring polygon arrays to have exactly 4 points. V30 introduced a partial unique index preventing two concurrent RUNNING training runs. These are DB-level invariants that unit tests cannot verify — five Testcontainers tests now assert they are correctly applied by Flyway and enforced by PostgreSQL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:07:17 +02:00
Marcel	e933aacc92	docs(infra): add .env.example with OCR_TRAINING_TOKEN Fresh cloners had no tracked reference for required env vars. .env is gitignored (contains real credentials). .env.example documents all variables including the new OCR_TRAINING_TOKEN for the Python OCR microservice training endpoints. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:03:10 +02:00
Marcel	fdba3211aa	fix(a11y): add aria-live to OcrProgress page counter Screen readers did not announce page-by-page OCR progress updates. Wrapping the counter text in a span with aria-live=polite ensures assistive technology announces each page completion without interrupting the user. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:02:25 +02:00
Marcel	287920a982	docs(ocr): document single-node constraint for OCR training Training reloads the Kraken model in-process on the Python service. The DB-level RUNNING constraint prevents concurrent API calls but cannot protect against multi-replica deployments. Added explicit comments in docker-compose.yml and OcrTrainingService to prevent accidental horizontal scaling. See ADR-001. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:01:45 +02:00
Marcel	2b355e748e	fix(ocr): increase presigned URL TTL from 15 min to 1 hour A 100-page document at ~10 s/page takes ~17 min on CPU-only hardware, which could cause the presigned URL to expire mid-OCR job. 1 hour gives ample headroom for any realistic document size in this archive. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:00:52 +02:00
Marcel	2181fe0b50	test(annotations): fix AnnotationServiceTest — add missing TranscriptionBlockRepository mock The cascade-delete commit (`5a5a8b6`) added blockRepository.deleteByAnnotationId() to AnnotationService.deleteAnnotation(), but the test class was not updated to mock TranscriptionBlockRepository. Mockito injected null, causing deleteAnnotation_succeeds_whenOwner to throw NPE. Adds the mock, verifies the cascade call, and adds an inOrder test asserting the block is deleted before the annotation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:00:09 +02:00
Marcel	5a5a8b6e5c	fix(annotations): cascade-delete transcription block when annotation is deleted The DELETE endpoint was returning 500 due to a FK constraint violation. `deleteAnnotation` now calls `blockRepository.deleteByAnnotationId()` before removing the annotation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 22:31:02 +02:00
Marcel	5afdc37653	feat(ui): manual-first OCR workflow — remove full-page auto-segmentation Drawing annotations is now the primary workflow. OCR only runs on manually drawn regions (guided mode always). Full-page layout detection and the useExistingAnnotations checkbox are removed entirely. - OcrTrigger: guided-only, disabled with hint when no annotations exist - TranscriptionEditView: empty state shows draw-regions instruction, OCR trigger moved out of collapsible and shown inline after block list - i18n: add ocr_trigger_no_annotations, ocr_section_heading, transcription_empty_draw_hint; remove ocr_use_existing_annotations keys Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 22:24:50 +02:00
Marcel	669f2f8b98	fix(training): output CoreML format and fix best-model finder ketos 7 defaults to safetensors output, but kraken's load_any() only handles CoreML (.mlmodel). Adding --weights-format coreml ensures the hot-swap after training produces a file that load_any() can parse. Also fixed _find_best_model to look for best_<score>.mlmodel (produced by --weights-format coreml) in addition to the previous checkpoint_* pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 21:57:42 +02:00
Marcel	49c9022285	fix(training): switch to PAGE XML format for kurrent recognition training Kraken 7 removed support for the legacy `path` format (image + .gt.txt pairs) in VGSLRecognitionDataModule despite the CLI still advertising it. Switching to PAGE XML (-f page) format which is the supported standard. - Java export now writes .xml alongside .png (PAGE XML with TextLine, Baseline at 75% height, and Unicode transcription) - XML special characters in transcription text are escaped (& < >) - Python trainer globs *.xml and passes -f page to ketos train - Regenerated frontend API types to include cer/loss/accuracy/epochs on OcrTrainingRun (were missing, causing empty CER column in history) - Updated and extended TrainingDataExportServiceTest Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 21:45:08 +02:00

1206 Commits
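The fail-closed rule from commit 30a17c97e8 can be sketched as a pure function. This is a hypothetical helper, not the service's actual `_check_training_token`: the 503 for an unconfigured node comes from the commit message, while the 401 for a missing or wrong `X-Training-Token` and the use of `hmac.compare_digest` for a constant-time comparison are assumptions.

```python
import hmac
from typing import Optional


def check_training_token(configured: Optional[str], provided: Optional[str]) -> int:
    """Return the HTTP status a /train or /segtrain request should get.

    Hypothetical helper:
      200 -> token configured and the request header matches it
      503 -> no token configured on this node, so fail closed (per 30a17c97e8)
      401 -> token configured but header missing or wrong (status is an assumption)
    """
    if not configured:
        # An empty or absent TRAINING_TOKEN must never open the endpoints.
        return 503
    if provided is None:
        return 401
    # Constant-time comparison so the token cannot be probed via response timing.
    if hmac.compare_digest(configured.encode(), provided.encode()):
        return 200
    return 401
```

Checking the "not configured" case first is what makes the guard fail closed: a deployment that forgets the variable rejects every request instead of accepting all of them.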
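The segtrain base-model decision and error-rate calculation from commit a5979c4069 can be sketched as follows. The names `plan_segtrain`, `SegtrainPlan`, and `cer_from_accuracy` are hypothetical; the sketch takes the loaded model's input height as a plain integer and deliberately does not show how it is read from a kraken model or how the ketos command line is assembled.

```python
from dataclasses import dataclass
from typing import Optional

TARGET_HEIGHT = 800  # training resolution named in a5979c4069


@dataclass(frozen=True)
class SegtrainPlan:
    use_base_model: bool  # fine-tune from the existing model (passed via -i)
    height: int           # resolution the run trains at


def plan_segtrain(base_model_height: Optional[int], target: int = TARGET_HEIGHT) -> SegtrainPlan:
    # ketos ignores -s when -i is present, so a fine-tuning run always uses the
    # base model's own input height. Only fine-tune when that is already the target.
    if base_model_height == target:
        return SegtrainPlan(use_base_model=True, height=target)
    # e.g. the stock 1800px blla model: train from scratch at the target height
    return SegtrainPlan(use_base_model=False, height=target)


def cer_from_accuracy(accuracy: float) -> float:
    """Error rate stored for segmentation runs: cer = 1 - accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError(f"accuracy must be in [0, 1], got {accuracy}")
    return 1.0 - accuracy
```

After the first from-scratch run, the resulting 800px model reports a base height of 800, so every later run takes the fine-tuning branch.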
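Commit 669f2f8b98 says `_find_best_model` now looks for `best_<score>.mlmodel` in addition to the older `checkpoint_*` pattern. One plausible reading is sketched below as a hypothetical `find_best_model`. It assumes a larger score is better and that the lexicographically last checkpoint is the newest; neither assumption is confirmed by the commit, and the real function may choose differently.

```python
import re
from pathlib import Path
from typing import Optional

# Matches names such as best_0.93.mlmodel (score format is an assumption).
BEST_RE = re.compile(r"^best_(\d+(?:\.\d+)?)\.mlmodel$")


def find_best_model(output_dir: Path) -> Optional[Path]:
    """Prefer the highest-scoring best_<score>.mlmodel, else fall back to checkpoint_*."""
    scored = []
    for path in output_dir.iterdir():
        match = BEST_RE.match(path.name)
        if match:
            scored.append((float(match.group(1)), path))
    if scored:
        # Assumption: a larger score (e.g. validation accuracy) is better.
        return max(scored, key=lambda item: item[0])[1]
    # Assumption: names sort chronologically. checkpoint_10 would sort before
    # checkpoint_9, so zero-padded names are needed for this to hold.
    checkpoints = sorted(output_dir.glob("checkpoint_*"))
    return checkpoints[-1] if checkpoints else None
```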
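Commit 49c9022285 describes the Java export writing one PAGE XML file per line image, with the baseline at 75% of the image height and `&`, `<`, `>` escaped in the transcription. The Python sketch below illustrates that shape only. `page_xml_for_line` and `baseline_y` are hypothetical, the integer rounding is an assumption, and the output is simplified (it omits the `Metadata` element the PAGE schema defines), so it is not a drop-in replacement for the Java exporter.

```python
from xml.sax.saxutils import escape, quoteattr

PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def baseline_y(height: int) -> int:
    # Baseline at 75% of the line image height; integer rounding is an assumption.
    return height * 3 // 4


def page_xml_for_line(image_name: str, width: int, height: int, text: str) -> str:
    """Build a simplified one-line PAGE XML document for a cropped line image."""
    y = baseline_y(height)
    coords = f"0,0 {width},0 {width},{height} 0,{height}"
    return "\n".join([
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<PcGts xmlns="{PAGE_NS}">',
        f'  <Page imageFilename={quoteattr(image_name)} imageWidth="{width}" imageHeight="{height}">',
        '    <TextRegion id="r1">',
        f'      <Coords points="{coords}"/>',
        '      <TextLine id="l1">',
        f'        <Coords points="{coords}"/>',
        f'        <Baseline points="0,{y} {width},{y}"/>',
        # escape() replaces &, < and > so a transcription cannot break the XML
        f'        <TextEquiv><Unicode>{escape(text)}</Unicode></TextEquiv>',
        '      </TextLine>',
        '    </TextRegion>',
        '  </Page>',
        '</PcGts>',
    ])
```

For a 400×48 line image the baseline lands at y = 36, and a transcription such as `Brief & <Gruß>` is emitted as `Brief &amp; &lt;Gruß&gt;`.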
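The TTL reasoning in commit 2b355e748e is simple arithmetic: 100 pages at about 10 s/page is 1000 s (roughly 17 min), which exceeds a 900 s (15 min) URL lifetime but fits comfortably in 3600 s (1 hour). The helper names below are hypothetical and the 10 s/page default is the commit's own CPU-only estimate.

```python
def ocr_job_seconds(pages: int, seconds_per_page: float = 10.0) -> float:
    """Estimated wall-clock time of a sequential CPU-only OCR job."""
    return pages * seconds_per_page


def url_outlives_job(pages: int, ttl_seconds: int, seconds_per_page: float = 10.0) -> bool:
    """True if a presigned URL with this TTL stays valid for the whole job."""
    return ocr_job_seconds(pages, seconds_per_page) < ttl_seconds
```

At the same rate, a 1-hour TTL covers documents of up to 359 pages; anything larger would need a longer TTL or re-signing mid-job.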