Commit Graph

1143 Commits

Author SHA1 Message Date
Marcel
c3939e0f13 refactor(ocr): move person-name enrichment from OcrController into OcrTrainingService
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
4f86011ffb test(ocr): verify triggerSenderTraining upserts SenderModel with correct path and cer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
3ecda655c5 fix(ocr): eliminate race window in runOrQueueSenderTraining by creating RUNNING row atomically
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
68ec66002a fix(ocr): correct trainSenderModel URI from /train to /train-sender
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
a296ad527e feat(ocr): wire SenderModelService into OcrAsyncRunner; stage missing foundational files
OcrAsyncRunner now passes the per-sender model path to streamBlocks for
HANDWRITING_KURRENT documents. processDocument replaced extractBlocks
with streamBlocks + AtomicReference, removing the unchecked raw-array
pattern.

Also stages all previously uncommitted foundational files for this
feature: SenderModel entity, SenderModelRepository, Flyway migrations
V40/V41, updated OcrClient/RestClientOcrClient streaming API,
TrainingDataExportService.exportForSender, TranscriptionService Kurrent
hook, application.yaml OCR config, and frontend i18n/test additions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
a8bd2606a0 refactor(ocr): move sender training methods from OcrTrainingService to SenderModelService
Eliminates cross-domain repository access: OcrTrainingService no longer
holds SenderModelRepository. SenderModelService now owns the full sender
training lifecycle (runOrQueueSenderTraining, triggerSenderTraining,
promoteNextQueuedRun), removing the circular dependency risk.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
607a3567e6 refactor(ocr): delete buildTrainingInfoMap() dead code
The controller now builds the map inline (with personNames support).
This method had zero callers.

Fixes reviewer concerns from @felixbrandt and @mkeller.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
7cc90b8a90 fix(ocr): log debug instead of silently swallowing person name resolution errors
Replaces catch(Exception ignored){} with log.debug() in getTrainingInfo().
Adds controller test documenting the graceful degradation behavior
(response stays 200 when personService.getById() throws).

Fixes reviewer concerns from @felixbrandt and @nullx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
cd31bf63c1 feat(frontend): wire personNames to TrainingHistory in OcrTrainingCard
Extends Run interface with personId and QUEUED status, TrainingInfo with
personNames map, and passes it through to TrainingHistory for per-sender
model column display.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
add799c57f chore: regenerate API types for per-sender model additions
OcrTrainingRun now includes personId (uuid, optional) and QUEUED status.
TrainingInfoResponse includes runs array with personId fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
a146a2ec3c feat(ocr): per-sender model registry and /train-sender endpoint
engines/kraken.py:
- Add _SenderModelRegistry with LRU eviction (max configurable via
  OCR_MAX_CACHED_MODELS env var), double-checked locking, invalidate(),
  and path whitelist (/app/models/ only)
- Add _load_sender_model() helper for testability
- extract_page_blocks() and extract_region_text() accept optional
  sender_model_path; route to sender registry when provided

models.py:
- OcrRequest gains senderModelPath: str | None = None field

main.py:
- /ocr and /ocr/stream pass request.senderModelPath to Kraken engine
- New /train-sender endpoint: validates output_model_path, runs ketos
  train with base model as starting point, invalidates sender cache

docker-compose.yml:
- Add OCR_MAX_CACHED_MODELS: "5" to ocr-service environment

test_sender_registry.py:
- 4 tests: cache hit, LRU eviction, invalidate, path traversal guard

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
548ad0fa68 test: add unit tests for SenderModelService, runOrQueueSenderTraining, and updateBlock hook
- SenderModelServiceTest: 6 tests covering activation threshold (99/100),
  retrain delta (149/150), runNow flag (queued vs triggered)
- OcrTrainingServiceTest: 3 tests for runOrQueueSenderTraining — idle returns
  true, running saves QUEUED, duplicate QUEUED coalesces
- TranscriptionServiceTest: 3 tests for updateBlock — sets source=MANUAL,
  triggers training for HANDWRITING_KURRENT with sender, skips when no sender

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
e3c8e1a067 test: fix broken tests after per-sender model integration
- OcrAsyncRunnerTest: switch from extractBlocks/4-arg streamBlocks stubs
  to 5-arg streamBlocks (senderModelPath param) via doAnswer
- TranscriptionServiceTest: stub documentService.getDocumentById in
  updateBlock tests so the new Kurrent training hook does not NPE
- OcrControllerTest: add @MockitoBean PersonService (now injected into
  OcrController for personNames assembly in getTrainingInfo)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:30:54 +02:00
Marcel
1279753ddb fix(ocr): clarify stat card labels to distinguish available vs total blocks
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m30s
CI / OCR Service Tests (push) Successful in 38s
CI / Backend Unit Tests (push) Failing after 2m50s
CI / Unit & Component Tests (pull_request) Failing after 2m52s
CI / OCR Service Tests (pull_request) Successful in 48s
CI / Backend Unit Tests (pull_request) Failing after 2m55s
'Trainingsblöcke' and 'Gesamt Blöcke' were indistinguishable.
Labels now read 'Bereit (OCR-Training)', 'Textblöcke gesamt',
'Trainingsdokumente', 'Bereit (Segm.-Training)'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 11:06:43 +02:00
Marcel
6c2e7078ba fix(ocr): widen OCR overview page to max-w-5xl for two-column training cards
Some checks failed
CI / Backend Unit Tests (pull_request) Failing after 2m51s
CI / Unit & Component Tests (push) Failing after 2m34s
CI / OCR Service Tests (push) Successful in 44s
CI / Backend Unit Tests (push) Failing after 2m50s
CI / Unit & Component Tests (pull_request) Failing after 2m38s
CI / OCR Service Tests (pull_request) Successful in 41s
max-w-4xl was too narrow for the side-by-side training card grid.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 10:54:09 +02:00
Marcel
cea1234400 feat(ocr): move training cards from system page to OCR overview page
OcrTrainingCard and SegmentationTrainingCard now live on the dedicated
OCR overview page. System page no longer fetches training info.
SegmentationTrainingCard updated to use shared TrainingRun type.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 10:52:23 +02:00
Marcel
9ff498a194 feat(training-history): hide person/type columns for segmentation context
Add showPersonColumns prop (default true) to TrainingHistory.
SegmentationTrainingCard passes false — segmentation is not person-specific.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 10:39:18 +02:00
Marcel
8128769feb style(admin/ocr): center content with max-w-4xl, wrap history tables in cards
Some checks failed
CI / OCR Service Tests (push) Successful in 39s
CI / Unit & Component Tests (push) Failing after 2m34s
CI / Backend Unit Tests (push) Failing after 2m45s
CI / Unit & Component Tests (pull_request) Failing after 2m33s
CI / OCR Service Tests (pull_request) Successful in 40s
CI / Backend Unit Tests (pull_request) Failing after 2m51s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 10:33:39 +02:00
Marcel
16bcd0f73c fix(ocr): replace IllegalStateException with DomainException in triggerSenderTraining
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m33s
CI / OCR Service Tests (push) Successful in 36s
CI / Backend Unit Tests (push) Failing after 2m46s
CI / Unit & Component Tests (pull_request) Failing after 2m37s
CI / OCR Service Tests (pull_request) Successful in 36s
CI / Backend Unit Tests (pull_request) Failing after 2m50s
Consistent with triggerManualSenderTraining — both defensive paths now use
DomainException.internal(OCR_TRAINING_CONFLICT) when the expected RUNNING row
is not found after creation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 09:55:22 +02:00
Marcel
fc892f0f59 fix(admin): pass personId through load fn instead of params prop; widen touch targets in table rows
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m33s
CI / OCR Service Tests (push) Successful in 35s
CI / Backend Unit Tests (push) Failing after 2m43s
CI / Unit & Component Tests (pull_request) Failing after 2m35s
CI / OCR Service Tests (pull_request) Successful in 41s
CI / Backend Unit Tests (pull_request) Failing after 2m44s
SvelteKit page components receive only data/form as props; accessing params
directly caused a TypeError and personName always fell back to 'Unknown'.
Also moves py-3 padding from <td> to <a> in OcrModelsTable to give
keyboard/touch users a full-height 44px target (WCAG 2.5.5).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 09:46:47 +02:00
Marcel
2466553216 fix(ocr): replace IllegalStateException with DomainException.internal in triggerManualSenderTraining
Ensures the unexpected-state path produces a structured JSON error response
instead of an unmapped 500 RuntimeException. Adds OCR_TRAINING_CONFLICT
ErrorCode and mirrors it in the frontend errors.ts. Adds coverage tests for
getAllSenderModels() and runSenderTraining().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 09:46:00 +02:00
Marcel
794000cbd1 fix(admin): locale-agnostic OcrHealthBar tests, focus rings on all OCR links
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m33s
CI / OCR Service Tests (push) Successful in 42s
CI / Backend Unit Tests (push) Failing after 2m45s
CI / Backend Unit Tests (pull_request) Failing after 2m44s
CI / Unit & Component Tests (pull_request) Failing after 2m43s
CI / OCR Service Tests (pull_request) Successful in 47s
OcrHealthBar spec used /online/i and /offline/i text matchers that would fail
in Spanish locale — replaced with CSS class assertions on role="img" dot.

Added focus-visible:ring-2/ring-brand-navy/rounded-sm to all links in OCR
admin pages (OcrModelsTable person+details, global history link, back-links
in global and personId detail pages) to satisfy WCAG 2.4.7.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 09:23:39 +02:00
Marcel
269894a47a refactor(ocr): move async training dispatch out of controller into SenderModelService
Controller was deciding when to fire runSenderTraining based on the returned run
status — a business rule that belongs in the service. Introduces @Lazy self-reference
to preserve @Async proxy dispatch without self-invocation bypassing Spring AOP.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 09:18:43 +02:00
Marcel
a00617194c fix(admin): i18n all hardcoded OCR strings, fix personName lookup, add empty state
Some checks failed
CI / Unit & Component Tests (push) Failing after 3m17s
CI / OCR Service Tests (push) Successful in 57s
CI / Backend Unit Tests (push) Failing after 2m52s
CI / Unit & Component Tests (pull_request) Failing after 2m47s
CI / OCR Service Tests (pull_request) Successful in 43s
CI / Backend Unit Tests (pull_request) Failing after 2m48s
- Replace hardcoded EN strings in OcrHealthBar/OcrStatCards/OcrModelsTable with
  Paraglide message keys (de/en/es translations added)
- Add role=img + aria-label to OcrHealthBar status dot
- Add {:else} empty-state row in OcrModelsTable
- Fix personName derivation in [personId]/+page.svelte to use params.personId key
  instead of Object.values()[0] (fragile when multiple persons present)
- Update OcrModelsTable spec to assert empty-state row structure (locale-agnostic)
- Add missing availableSegBlocks test to OcrStatCards spec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 08:59:49 +02:00
Marcel
b879d28761 fix(ocr): validate personId in TriggerSenderTrainingDTO — returns 400 not 500 on null
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 08:49:17 +02:00
Marcel
8acb830649 feat(admin): add OCR admin routes — overview, global history, sender detail
Some checks failed
CI / Backend Unit Tests (push) Failing after 2m45s
CI / Unit & Component Tests (pull_request) Failing after 2m32s
CI / OCR Service Tests (pull_request) Successful in 27s
CI / Backend Unit Tests (pull_request) Failing after 2m44s
CI / Unit & Component Tests (push) Failing after 2m35s
CI / OCR Service Tests (push) Successful in 30s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 01:05:08 +02:00
Marcel
0d8ac46639 feat(admin): add OcrHealthBar, OcrStatCards, OcrModelsTable components
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:30:24 +02:00
Marcel
5f4e60a14c feat(admin): add OCR entry to EntityNav sidebar and flyout
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:25:42 +02:00
Marcel
f533817c7b feat(api): regenerate TypeScript types with new OCR admin endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:20:46 +02:00
Marcel
99e7176eac test(ocr): add service-level tests for triggerManualSenderTraining
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:17:54 +02:00
Marcel
c3fa09d12e feat(ocr): add POST /api/ocr/train-sender endpoint for manual sender training
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:16:02 +02:00
Marcel
178afcd496 feat(ocr): add per-model history endpoints via path segments
Add findByPersonIdIsNullOrderByCreatedAtDesc + findByPersonIdOrderByCreatedAtDesc to
OcrTrainingRunRepository. Add dto/TrainingHistoryResponse. Expose
GET /api/ocr/training-info/global and GET /api/ocr/training-info/{personId} on
OcrController, both requiring ADMIN; getSenderTrainingHistory guards person existence
via PersonService and returns 404 for unknown personId.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:08:38 +02:00
Marcel
b1b7418404 feat(ocr): promote TrainingInfoResponse to dto, add senderModels field
Move TrainingInfoResponse from private nested record to dto/TrainingInfoResponse.java,
add senderModels field, inject SenderModelService into OcrTrainingService so personNames
covers all known senders rather than only recent-run participants.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:04:29 +02:00
Marcel
a52c8bf079 test(db): verify V42 partial unique index for QUEUED training runs per person
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m32s
CI / OCR Service Tests (push) Successful in 31s
CI / Backend Unit Tests (push) Failing after 2m41s
CI / Unit & Component Tests (pull_request) Failing after 2m33s
CI / OCR Service Tests (pull_request) Successful in 37s
CI / Backend Unit Tests (pull_request) Failing after 2m49s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:54:34 +02:00
Marcel
da0a7e9194 fix(frontend): increase dismiss button touch target to 44×44px (WCAG 2.5.5)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:50:05 +02:00
Marcel
bbfd234746 refactor(ocr): use stream .toList() instead of FQCN Collectors.toList()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:47:36 +02:00
Marcel
b396fccd52 fix(frontend): QUEUED badge test, touch target on dismiss button, focus ring on expand toggle
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m37s
CI / OCR Service Tests (push) Successful in 40s
CI / Backend Unit Tests (push) Failing after 2m50s
CI / Unit & Component Tests (pull_request) Failing after 2m31s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Failing after 2m38s
Add missing test coverage for the amber QUEUED status badge in TrainingHistory.
Fix WCAG 2.2 minimum touch target (24 × 24 px) on the success-message dismiss
button in OcrTrainingCard. Add focus-visible ring to the expand/collapse toggle
in TrainingHistory so keyboard users get a visible focus indicator.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:36:26 +02:00
Marcel
64a854aad6 refactor(ocr): mark _SenderModelRegistry.contains as private (_contains)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:26:46 +02:00
Marcel
92f3c04d54 fix(ocr): add partial unique index and align SenderModelServiceTest with suite style
Add V42 partial unique index on ocr_training_runs(person_id) WHERE status='QUEUED'
to enforce the per-person queued coalescing guarantee at the DB level. Also adds
@ExtendWith(MockitoExtension.class) to SenderModelServiceTest for consistency with
the rest of the service test suite, with lenient() on the shared txTemplate stub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:25:18 +02:00
Marcel
0d5f3f38d0 perf(ocr): resolve person names in single batch query in getTrainingInfo
Replace the per-run getById loop with a single getAllById call on distinct
person IDs, eliminating the N+1 query when training history contains multiple
sender model runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:21:12 +02:00
Marcel
4aa477555d refactor(ocr): return TrainingInfoResponse directly from getTrainingInfo endpoint
Remove the intermediate Map<String,Object> and return the typed record directly
so OpenAPI codegen produces a concrete TypeScript type. Fixes lastRun serializing
as {} (empty object) instead of null when no training run exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:18:27 +02:00
Marcel
84c09e41ef test(ocr): add /train-sender auth tests and run sender registry tests in CI
Add 503/403 auth tests for the /train-sender endpoint, matching the pattern
already used for /train and /segtrain. Also surface test_sender_registry.py
in CI (it needs no ML stack) and add pytest-asyncio to the install step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:14:27 +02:00
Marcel
e16dcdb7dc docs(ocr): document tail-recursive queue drain design in promoteNextQueuedRun
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m36s
CI / OCR Service Tests (push) Successful in 34s
CI / Backend Unit Tests (push) Failing after 2m43s
CI / Unit & Component Tests (pull_request) Failing after 2m38s
CI / OCR Service Tests (pull_request) Successful in 35s
CI / Backend Unit Tests (pull_request) Failing after 2m43s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:54:53 +02:00
Marcel
000079fd50 refactor(ocr): rename _contains to contains in SenderModelRegistry
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:53:16 +02:00
Marcel
a09a9e6043 fix(frontend): show person name inline in mobile status cell in TrainingHistory
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:52:08 +02:00
Marcel
1e289100a1 fix(frontend): show error on training start failure, add aria-live and dismiss to success message
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:46:06 +02:00
Marcel
0c2175aa07 refactor(frontend): extract shared TrainingRun type to $lib/types/training.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:42:06 +02:00
Marcel
f76a9cce1f test(ocr): add failure path and DONE status assertions to SenderModelServiceTest
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:38:43 +02:00
Marcel
e2081b57e7 refactor(ocr): extract exportSenderData helper in triggerSenderTraining
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m36s
CI / OCR Service Tests (push) Successful in 37s
CI / Backend Unit Tests (push) Failing after 2m51s
CI / Unit & Component Tests (pull_request) Failing after 2m42s
CI / OCR Service Tests (pull_request) Successful in 35s
CI / Backend Unit Tests (pull_request) Failing after 2m54s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:24:38 +02:00
Marcel
07035b9fa9 style(ocr): add Image type hints to extract_page_blocks and extract_region_text
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:22:34 +02:00