Commit Graph

1175 Commits

Author SHA1 Message Date
Marcel
16bcd0f73c fix(ocr): replace IllegalStateException with DomainException in triggerSenderTraining
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m33s
CI / OCR Service Tests (push) Successful in 36s
CI / Backend Unit Tests (push) Failing after 2m46s
CI / Unit & Component Tests (pull_request) Failing after 2m37s
CI / OCR Service Tests (pull_request) Successful in 36s
CI / Backend Unit Tests (pull_request) Failing after 2m50s
Consistent with triggerManualSenderTraining — both defensive paths now use
DomainException.internal(OCR_TRAINING_CONFLICT) when the expected RUNNING row
is not found after creation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 09:55:22 +02:00
Marcel
fc892f0f59 fix(admin): pass personId through load fn instead of params prop; widen touch targets in table rows
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m33s
CI / OCR Service Tests (push) Successful in 35s
CI / Backend Unit Tests (push) Failing after 2m43s
CI / Unit & Component Tests (pull_request) Failing after 2m35s
CI / OCR Service Tests (pull_request) Successful in 41s
CI / Backend Unit Tests (pull_request) Failing after 2m44s
SvelteKit page components receive only data/form as props; accessing params
directly caused a TypeError and personName always fell back to 'Unknown'.
Also moves py-3 padding from <td> to <a> in OcrModelsTable to give
keyboard/touch users a full-height 44px target (WCAG 2.5.5).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 09:46:47 +02:00
Marcel
2466553216 fix(ocr): replace IllegalStateException with DomainException.internal in triggerManualSenderTraining
Ensures the unexpected-state path produces a structured JSON error response
instead of an unmapped 500 RuntimeException. Adds OCR_TRAINING_CONFLICT
ErrorCode and mirrors it in the frontend errors.ts. Adds coverage tests for
getAllSenderModels() and runSenderTraining().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 09:46:00 +02:00
Marcel
794000cbd1 fix(admin): locale-agnostic OcrHealthBar tests, focus rings on all OCR links
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m33s
CI / OCR Service Tests (push) Successful in 42s
CI / Backend Unit Tests (push) Failing after 2m45s
CI / Backend Unit Tests (pull_request) Failing after 2m44s
CI / Unit & Component Tests (pull_request) Failing after 2m43s
CI / OCR Service Tests (pull_request) Successful in 47s
OcrHealthBar spec used /online/i and /offline/i text matchers that would fail
in Spanish locale — replaced with CSS class assertions on role="img" dot.

Added focus-visible:ring-2/ring-brand-navy/rounded-sm to all links in OCR
admin pages (OcrModelsTable person+details, global history link, back-links
in global and personId detail pages) to satisfy WCAG 2.4.7.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 09:23:39 +02:00
Marcel
269894a47a refactor(ocr): move async training dispatch out of controller into SenderModelService
Controller was deciding when to fire runSenderTraining based on the returned run
status — a business rule that belongs in the service. Introduces @Lazy self-reference
to preserve @Async proxy dispatch without self-invocation bypassing Spring AOP.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 09:18:43 +02:00
Marcel
a00617194c fix(admin): i18n all hardcoded OCR strings, fix personName lookup, add empty state
Some checks failed
CI / Unit & Component Tests (push) Failing after 3m17s
CI / OCR Service Tests (push) Successful in 57s
CI / Backend Unit Tests (push) Failing after 2m52s
CI / Unit & Component Tests (pull_request) Failing after 2m47s
CI / OCR Service Tests (pull_request) Successful in 43s
CI / Backend Unit Tests (pull_request) Failing after 2m48s
- Replace hardcoded EN strings in OcrHealthBar/OcrStatCards/OcrModelsTable with
  Paraglide message keys (de/en/es translations added)
- Add role=img + aria-label to OcrHealthBar status dot
- Add {:else} empty-state row in OcrModelsTable
- Fix personName derivation in [personId]/+page.svelte to use params.personId key
  instead of Object.values()[0] (fragile when multiple persons present)
- Update OcrModelsTable spec to assert empty-state row structure (locale-agnostic)
- Add missing availableSegBlocks test to OcrStatCards spec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 08:59:49 +02:00
Marcel
b879d28761 fix(ocr): validate personId in TriggerSenderTrainingDTO — returns 400 not 500 on null
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 08:49:17 +02:00
Marcel
8acb830649 feat(admin): add OCR admin routes — overview, global history, sender detail
Some checks failed
CI / Backend Unit Tests (push) Failing after 2m45s
CI / Unit & Component Tests (pull_request) Failing after 2m32s
CI / OCR Service Tests (pull_request) Successful in 27s
CI / Backend Unit Tests (pull_request) Failing after 2m44s
CI / Unit & Component Tests (push) Failing after 2m35s
CI / OCR Service Tests (push) Successful in 30s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 01:05:08 +02:00
Marcel
0d8ac46639 feat(admin): add OcrHealthBar, OcrStatCards, OcrModelsTable components
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:30:24 +02:00
Marcel
5f4e60a14c feat(admin): add OCR entry to EntityNav sidebar and flyout
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:25:42 +02:00
Marcel
f533817c7b feat(api): regenerate TypeScript types with new OCR admin endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:20:46 +02:00
Marcel
99e7176eac test(ocr): add service-level tests for triggerManualSenderTraining
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:17:54 +02:00
Marcel
c3fa09d12e feat(ocr): add POST /api/ocr/train-sender endpoint for manual sender training
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:16:02 +02:00
Marcel
178afcd496 feat(ocr): add per-model history endpoints via path segments
Add findByPersonIdIsNullOrderByCreatedAtDesc + findByPersonIdOrderByCreatedAtDesc to
OcrTrainingRunRepository. Add dto/TrainingHistoryResponse. Expose
GET /api/ocr/training-info/global and GET /api/ocr/training-info/{personId} on
OcrController, both requiring ADMIN; getSenderTrainingHistory guards person existence
via PersonService and returns 404 for unknown personId.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:08:38 +02:00
Marcel
b1b7418404 feat(ocr): promote TrainingInfoResponse to dto, add senderModels field
Move TrainingInfoResponse from private nested record to dto/TrainingInfoResponse.java,
add senderModels field, inject SenderModelService into OcrTrainingService so personNames
covers all known senders rather than only recent-run participants.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:04:29 +02:00
Marcel
a52c8bf079 test(db): verify V42 partial unique index for QUEUED training runs per person
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m32s
CI / OCR Service Tests (push) Successful in 31s
CI / Backend Unit Tests (push) Failing after 2m41s
CI / Unit & Component Tests (pull_request) Failing after 2m33s
CI / OCR Service Tests (pull_request) Successful in 37s
CI / Backend Unit Tests (pull_request) Failing after 2m49s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:54:34 +02:00
Marcel
da0a7e9194 fix(frontend): increase dismiss button touch target to 44×44px (WCAG 2.5.5)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:50:05 +02:00
Marcel
bbfd234746 refactor(ocr): use stream .toList() instead of FQCN Collectors.toList()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:47:36 +02:00
Marcel
b396fccd52 fix(frontend): QUEUED badge test, touch target on dismiss button, focus ring on expand toggle
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m37s
CI / OCR Service Tests (push) Successful in 40s
CI / Backend Unit Tests (push) Failing after 2m50s
CI / Unit & Component Tests (pull_request) Failing after 2m31s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Failing after 2m38s
Add missing test coverage for the amber QUEUED status badge in TrainingHistory.
Fix WCAG 2.2 minimum touch target (24 × 24 px) on the success-message dismiss
button in OcrTrainingCard. Add focus-visible ring to the expand/collapse toggle
in TrainingHistory so keyboard users get a visible focus indicator.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:36:26 +02:00
Marcel
64a854aad6 refactor(ocr): mark _SenderModelRegistry.contains as private (_contains)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:26:46 +02:00
Marcel
92f3c04d54 fix(ocr): add partial unique index and align SenderModelServiceTest with suite style
Add V42 partial unique index on ocr_training_runs(person_id) WHERE status='QUEUED'
to enforce the per-person queued coalescing guarantee at the DB level. Also adds
@ExtendWith(MockitoExtension.class) to SenderModelServiceTest for consistency with
the rest of the service test suite, with lenient() on the shared txTemplate stub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:25:18 +02:00
Marcel
0d5f3f38d0 perf(ocr): resolve person names in single batch query in getTrainingInfo
Replace the per-run getById loop with a single getAllById call on distinct
person IDs, eliminating the N+1 query when training history contains multiple
sender model runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:21:12 +02:00
Marcel
4aa477555d refactor(ocr): return TrainingInfoResponse directly from getTrainingInfo endpoint
Remove the intermediate Map<String,Object> and return the typed record directly
so OpenAPI codegen produces a concrete TypeScript type. Fixes lastRun serializing
as {} (empty object) instead of null when no training run exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:18:27 +02:00
Marcel
84c09e41ef test(ocr): add /train-sender auth tests and run sender registry tests in CI
Add 503/403 auth tests for the /train-sender endpoint, matching the pattern
already used for /train and /segtrain. Also surface test_sender_registry.py
in CI (it needs no ML stack) and add pytest-asyncio to the install step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:14:27 +02:00
Marcel
e16dcdb7dc docs(ocr): document tail-recursive queue drain design in promoteNextQueuedRun
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m36s
CI / OCR Service Tests (push) Successful in 34s
CI / Backend Unit Tests (push) Failing after 2m43s
CI / Unit & Component Tests (pull_request) Failing after 2m38s
CI / OCR Service Tests (pull_request) Successful in 35s
CI / Backend Unit Tests (pull_request) Failing after 2m43s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:54:53 +02:00
Marcel
000079fd50 refactor(ocr): rename _contains to contains in SenderModelRegistry
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:53:16 +02:00
Marcel
a09a9e6043 fix(frontend): show person name inline in mobile status cell in TrainingHistory
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:52:08 +02:00
Marcel
1e289100a1 fix(frontend): show error on training start failure, add aria-live and dismiss to success message
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:46:06 +02:00
Marcel
0c2175aa07 refactor(frontend): extract shared TrainingRun type to $lib/types/training.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:42:06 +02:00
Marcel
f76a9cce1f test(ocr): add failure path and DONE status assertions to SenderModelServiceTest
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:38:43 +02:00
Marcel
e2081b57e7 refactor(ocr): extract exportSenderData helper in triggerSenderTraining
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m36s
CI / OCR Service Tests (push) Successful in 37s
CI / Backend Unit Tests (push) Failing after 2m51s
CI / Unit & Component Tests (pull_request) Failing after 2m42s
CI / OCR Service Tests (pull_request) Successful in 35s
CI / Backend Unit Tests (pull_request) Failing after 2m54s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:24:38 +02:00
Marcel
07035b9fa9 style(ocr): add Image type hints to extract_page_blocks and extract_region_text
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:22:34 +02:00
Marcel
57ffb7d751 chore(ocr): lower OCR_MAX_CACHED_MODELS to 2 with memory budget comment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:20:53 +02:00
Marcel
eab37b9ac9 test(ocr): verify load failure does not cache broken entry in SenderModelRegistry
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:19:40 +02:00
Marcel
2459408930 refactor(ocr): move person-name enrichment from OcrController into OcrTrainingService
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:18:21 +02:00
Marcel
09f4601d15 test(ocr): verify triggerSenderTraining upserts SenderModel with correct path and cer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:13:21 +02:00
Marcel
1b34a36a77 fix(ocr): eliminate race window in runOrQueueSenderTraining by creating RUNNING row atomically
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:11:56 +02:00
Marcel
8d041a377d fix(ocr): correct trainSenderModel URI from /train to /train-sender
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:08:18 +02:00
Marcel
18cf839fac feat(ocr): wire SenderModelService into OcrAsyncRunner; stage missing foundational files
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m21s
CI / OCR Service Tests (push) Successful in 29s
CI / Backend Unit Tests (push) Failing after 2m38s
CI / Unit & Component Tests (pull_request) Failing after 2m26s
CI / OCR Service Tests (pull_request) Successful in 31s
CI / Backend Unit Tests (pull_request) Failing after 2m44s
OcrAsyncRunner now passes the per-sender model path to streamBlocks for
HANDWRITING_KURRENT documents. processDocument replaced extractBlocks
with streamBlocks + AtomicReference, removing the unchecked raw-array
pattern.

Also stages all previously uncommitted foundational files for this
feature: SenderModel entity, SenderModelRepository, Flyway migrations
V40/V41, updated OcrClient/RestClientOcrClient streaming API,
TrainingDataExportService.exportForSender, TranscriptionService Kurrent
hook, application.yaml OCR config, and frontend i18n/test additions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 19:27:02 +02:00
Marcel
78eca8e9a1 docs(ocr): add Admin OCR overview & model-detail UI spec
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m36s
CI / OCR Service Tests (push) Successful in 27s
CI / Backend Unit Tests (push) Failing after 1m22s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 19:11:07 +02:00
Marcel
386dc83958 refactor(ocr): move sender training methods from OcrTrainingService to SenderModelService
Eliminates cross-domain repository access: OcrTrainingService no longer
holds SenderModelRepository. SenderModelService now owns the full sender
training lifecycle (runOrQueueSenderTraining, triggerSenderTraining,
promoteNextQueuedRun), removing the circular dependency risk.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 19:08:10 +02:00
Marcel
60c1ec7b5f refactor(ocr): delete buildTrainingInfoMap() dead code
The controller now builds the map inline (with personNames support).
This method had zero callers.

Fixes reviewer concerns from @felixbrandt and @mkeller.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:52:51 +02:00
Marcel
10a4a4d94b fix(ocr): log debug instead of silently swallowing person name resolution errors
Replaces catch(Exception ignored){} with log.debug() in getTrainingInfo().
Adds controller test documenting the graceful degradation behavior
(response stays 200 when personService.getById() throws).

Fixes reviewer concerns from @felixbrandt and @nullx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:51:15 +02:00
Marcel
e0b7cfdada feat(frontend): wire personNames to TrainingHistory in OcrTrainingCard
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m57s
CI / OCR Service Tests (push) Successful in 25s
CI / Backend Unit Tests (push) Failing after 1m34s
CI / Unit & Component Tests (pull_request) Failing after 2m44s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Failing after 1m26s
Extends Run interface with personId and QUEUED status, TrainingInfo with
personNames map, and passes it through to TrainingHistory for per-sender
model column display.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:25:59 +02:00
Marcel
b5e1a8ac2f chore: regenerate API types for per-sender model additions
OcrTrainingRun now includes personId (uuid, optional) and QUEUED status.
TrainingInfoResponse includes runs array with personId fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:10:04 +02:00
Marcel
64d27d6d61 feat(ocr): per-sender model registry and /train-sender endpoint
engines/kraken.py:
- Add _SenderModelRegistry with LRU eviction (max configurable via
  OCR_MAX_CACHED_MODELS env var), double-checked locking, invalidate(),
  and path whitelist (/app/models/ only)
- Add _load_sender_model() helper for testability
- extract_page_blocks() and extract_region_text() accept optional
  sender_model_path; route to sender registry when provided

models.py:
- OcrRequest gains senderModelPath: str | None = None field

main.py:
- /ocr and /ocr/stream pass request.senderModelPath to Kraken engine
- New /train-sender endpoint: validates output_model_path, runs ketos
  train with base model as starting point, invalidates sender cache

docker-compose.yml:
- Add OCR_MAX_CACHED_MODELS: "5" to ocr-service environment

test_sender_registry.py:
- 4 tests: cache hit, LRU eviction, invalidate, path traversal guard

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:05:39 +02:00
Marcel
7a342a07cf test: add unit tests for SenderModelService, runOrQueueSenderTraining, and updateBlock hook
- SenderModelServiceTest: 6 tests covering activation threshold (99/100),
  retrain delta (149/150), runNow flag (queued vs triggered)
- OcrTrainingServiceTest: 3 tests for runOrQueueSenderTraining — idle returns
  true, running saves QUEUED, duplicate QUEUED coalesces
- TranscriptionServiceTest: 3 tests for updateBlock — sets source=MANUAL,
  triggers training for HANDWRITING_KURRENT with sender, skips when no sender

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:00:59 +02:00
Marcel
bd23a76330 test: fix broken tests after per-sender model integration
- OcrAsyncRunnerTest: switch from extractBlocks/4-arg streamBlocks stubs
  to 5-arg streamBlocks (senderModelPath param) via doAnswer
- TranscriptionServiceTest: stub documentService.getDocumentById in
  updateBlock tests so the new Kurrent training hook does not NPE
- OcrControllerTest: add @MockitoBean PersonService (now injected into
  OcrController for personNames assembly in getTrainingInfo)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 17:56:51 +02:00
Marcel
c5e6ed922b test(ocr): decouple correction tests from exact library dictionary state
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m35s
CI / OCR Service Tests (pull_request) Successful in 36s
CI / Backend Unit Tests (pull_request) Failing after 2m47s
CI / Unit & Component Tests (push) Failing after 2m33s
CI / OCR Service Tests (push) Successful in 34s
CI / Backend Unit Tests (push) Failing after 2m41s
Replace exact-string assertions in test_correctable_ocr_error_gets_corrected
and test_sentence_with_multiple_corrections with structural assertions that
verify behavior (correction attempted, marker present, expected stem) without
coupling to a specific pyspellchecker version's frequency weights.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 17:23:09 +02:00
Marcel
ec85f228c1 refactor(ocr): document > 50 frequency threshold rationale
Strict greater-than avoids non-determinism: if multiple candidates share
the minimum frequency value, pyspellchecker's ranking is undefined.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 17:21:37 +02:00