Commit Graph

591 Commits

Author SHA1 Message Date
Marcel
e302d3d689 feat(search): add group headers to DocumentList by sort field
Documents sorted by DATE show year dividers, SENDER/RECEIVER sort
shows person name dividers. Dividers only appear when there are 2+
distinct groups. Multi-receiver docs appear in each receiver group.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 07:59:02 +02:00
Marcel
a9aa1ec924 feat(search): add groupDocuments utility with unit tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 23:36:35 +02:00
Marcel
ce2bbf4230 refactor(conversations): use GroupDivider in ConversationTimeline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 23:35:09 +02:00
Marcel
69bcb3f8b2 feat(search): add GroupDivider shared component
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 23:24:48 +02:00
Marcel
34a97cbfa2 i18n: add docs_group_undated and docs_group_unknown translation keys
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 23:21:43 +02:00
Marcel
7562a400c0 test(frontend): add Vitest component tests for TrainingHistory expand/collapse
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
2073a4b64a fix(frontend): accessibility fixes for TrainingHistory expand/collapse and FAILED badge
- Add aria-expanded + aria-controls to expand button (WCAG 4.1.2)
- Add id="training-history-rows" to tbody for aria-controls target
- Replace title= tooltip on FAILED badge with details/summary for keyboard
  and touch accessibility; add training_error_detail_label i18n key
- Use motion-safe:animate-pulse on RUNNING badge for prefers-reduced-motion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
f241a71733 feat(frontend): limit training history to 3 runs with expand toggle
Both training panels (OCR and segmentation) share TrainingHistory.
Show only the 3 most recent runs by default; render a Mehr/Weniger
anzeigen button when there are more.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
28ac90b529 fix(annotations): replace outline:none with focus-visible ring for keyboard accessibility [M7]
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:42:01 +02:00
Marcel
76828a95e3 fix(annotations): add catch(err) binding to handlePointerUp error handler [M6]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:41:21 +02:00
Marcel
7125a0a8eb fix(annotations): reset liveWidth/liveHeight in handleKeyDown error rollback [M1, M6]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:40:55 +02:00
Marcel
7097f991fe feat(annotations): add keyboard accessibility to resize handles [B2]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:40:30 +02:00
Marcel
4d9145e49f feat(annotations): wire SVG aria-label to Paraglide i18n [B3]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:39:35 +02:00
Marcel
060d1c0515 feat(i18n): add annotation_resize_area and annotation_resize_handle message keys [B2, B3]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:38:10 +02:00
Marcel
2350b4f845 fix(annotations): make resize overlay keyboard-interactive
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 1s
- Add tabindex="0" so the SVG can receive DOM focus
- Auto-focus the SVG on mount so arrow keys work immediately after
  clicking an annotation to select it
- Show preview rect during keyboard nudging (not just pointer drag) by
  checking hasLiveChanges instead of only checking dragState
- Suppress default browser focus outline (outline: none) on the SVG

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 11:47:41 +02:00
Marcel
9fe5b32a69 feat(annotations): add N/S/E/W edge midpoint handles to resize overlay
Extends the 4-corner L-bracket handles with 4 tick-mark edge handles
(short lines along each edge), enabling single-axis resize from any edge.
Updates applyHandleDrag to route each handle to the correct axis.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 11:40:39 +02:00
Marcel
fcc0efbf02 refactor(annotations): replace 8-square handles with 4 corner L-brackets
- 4 corner-only handles (nw/ne/sw/se), no edge midpoints
- Each handle renders as two short perpendicular lines meeting at the corner
  (10px arms, navy, square linecap) — no fill, no box
- Thin dashed selection border added to SVG overlay to signal edit mode
- Simplify applyHandleDrag to remove dead n/s/e/w branches

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 11:14:30 +02:00
Marcel
e7f88a4ea1 fix(annotations): use pixel-space viewBox so handles stay square on non-square annotations
ResizeObserver binds actual SVG pixel dimensions; viewBox matches them so
16px handle squares and 44px hit areas are physically correct regardless of
the annotation's aspect ratio.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 11:03:15 +02:00
Marcel
c610a3cc37 feat(annotations): wire updateAnnotation context and error display into PdfViewer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 11:00:50 +02:00
Marcel
3fb32ea285 feat(annotations): pass isResizable to AnnotationShape based on selection + transcribeMode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:57:13 +02:00
Marcel
3b756cd718 feat(annotations): add isResizable prop to AnnotationShape to render edit overlay
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:55:13 +02:00
Marcel
f5362a5850 feat(annotations): add AnnotationEditOverlay component with resize handles and drag
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:52:07 +02:00
Marcel
953cb2c910 feat(i18n): add ANNOTATION_UPDATE_FAILED error code and annotation_edit_mode_active translation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:43:10 +02:00
Marcel
ca10e8a6a9 fix(test): update TranscriptionEditView empty-state assertion after text change
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 2s
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 2s
Commit 5afdc37 changed the empty state from transcription_empty_cta
('Markiere einen Bereich…') to transcription_empty_draw_hint
('Zeichnen Sie Bereiche…') but left the spec asserting the old text.
Updated the locator to match the current component output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:11:57 +02:00
Marcel
99847980d2 fix(a11y): replace unicode glyphs with SVG icons in TrainingHistory status badges
WCAG 1.4.1 (Use of Color) requires non-color redundant cues for status.
The unicode ✓/✗ characters had inconsistent screen-reader support.
Replaced with explicit aria-hidden SVG icons (checkmark / x-circle)
alongside the translated status text labels.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:06:11 +02:00
Marcel
8f6e398af7 fix(i18n): replace hardcoded German training label chip strings with Paraglide keys
TranscriptionEditView rendered 'Kurrent-Erkennung' and 'Segmentierung'
as hardcoded German strings, breaking the en/es locales. Added
training_chip_kurrent and training_chip_segmentation keys to all three
message files and wired them up via m.training_chip_kurrent() /
m.training_chip_segmentation().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:04:52 +02:00
Marcel
fdba3211aa fix(a11y): add aria-live to OcrProgress page counter
Screen readers did not announce page-by-page OCR progress updates.
Wrapping the counter text in a span with aria-live=polite ensures
assistive technology announces each page completion without
interrupting the user.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 23:02:25 +02:00
Marcel
5afdc37653 feat(ui): manual-first OCR workflow — remove full-page auto-segmentation
Drawing annotations is now the primary workflow. OCR only runs on
manually drawn regions (guided mode always). Full-page layout detection
and the useExistingAnnotations checkbox are removed entirely.

- OcrTrigger: guided-only, disabled with hint when no annotations exist
- TranscriptionEditView: empty state shows draw-regions instruction,
  OCR trigger moved out of collapsible and shown inline after block list
- i18n: add ocr_trigger_no_annotations, ocr_section_heading,
  transcription_empty_draw_hint; remove ocr_use_existing_annotations keys

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 22:24:50 +02:00
Marcel
49c9022285 fix(training): switch to PAGE XML format for kurrent recognition training
Kraken 7 removed support for the legacy `path` format (image + .gt.txt
pairs) in VGSLRecognitionDataModule despite the CLI still advertising it.
Switching to PAGE XML (-f page) format which is the supported standard.

- Java export now writes .xml alongside .png (PAGE XML with TextLine,
  Baseline at 75% height, and Unicode transcription)
- XML special characters in transcription text are escaped (&amp; &lt; &gt;)
- Python trainer globs *.xml and passes -f page to ketos train
- Regenerated frontend API types to include cer/loss/accuracy/epochs on
  OcrTrainingRun (were missing, causing empty CER column in history)
- Updated and extended TrainingDataExportServiceTest

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 21:45:08 +02:00
Marcel
e33164c4aa fix(training): use ketos CLI subprocess instead of missing Python API
kraken.ketos has no .train or .segtrain attributes in Kraken 7 — both are
only exposed as CLI commands. Rewrites both training functions to invoke
`ketos train` / `ketos segtrain` via subprocess and parse the best
val_metric from checkpoint filenames.

Also fixes the OcrTrainingCard history so it only shows non-blla runs
(recognition model), matching SegmentationTrainingCard which already
filtered to blla-only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 20:50:21 +02:00
Marcel
22954f348a feat(training): track and display CER per training run
After each training run, the Character Error Rate (CER = 1 - accuracy),
loss, accuracy, and epoch count are now stored on the OcrTrainingRun
record and shown in the training history table.

Also adds the missing POST /api/ocr/segtrain endpoint and the
triggerSegTraining service method so the segmentation training card
can actually trigger training.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 19:01:10 +02:00
Marcel
1fd5c31fd1 fix(training): pass trainingInfo directly to SegmentationTrainingCard
The parent was manually remapping availableSegBlocks → availableBlocks
before passing props, which broke after the card was updated to read
availableSegBlocks directly. Pass the full trainingInfo object instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 17:55:16 +02:00
Marcel
a514cbca18 fix(training): segmentation card reads availableSegBlocks not availableBlocks
Both cards were reading the same availableBlocks field, so the segmentation
box always showed the kurrent recognition count. Use the correct
availableSegBlocks field from the training info response.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 17:54:20 +02:00
Marcel
b6f74fd6fc refactor(annotations): remove overlap check to allow intersecting regions
Historical letter lines often intersect, so the system must support
overlapping annotation regions. Removed the overlap guard from
createAnnotation(), deleted ErrorCode.ANNOTATION_OVERLAP, and cleaned
up all tests and frontend error mappings that referenced it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 16:48:18 +02:00
Marcel
ee58b63517 feat(ocr): add guided OCR mode using existing annotation regions
When a document has manually drawn annotation boxes, the user can now
enable "Nur annotierte Bereiche" in the OCR trigger panel. The engine
skips layout detection entirely and runs recognition only within the
pre-drawn bounding boxes, preserving manual transcription blocks.

- Python: adds OcrRegion model, extend OcrRequest/OcrBlock; guided
  branch in /ocr/stream groups by page and crops each region
- Engines: add extract_region_text() to both Kraken and Surya
- Java: adds OcrBlockResult.annotationId, OcrClient.OcrRegion,
  TriggerOcrDTO.useExistingAnnotations; OcrAsyncRunner dispatches to
  upsertGuidedBlock when annotationId is present; OcrService threads
  the flag through to runSingleDocument
- TranscriptionService: adds upsertGuidedBlock (creates, updates OCR,
  or preserves MANUAL blocks)
- Frontend: guided OCR toggle in OcrTrigger shown when blocks exist;
  skips destructive-replace confirmation in guided mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 15:57:54 +02:00
Marcel
9b2f91ee59 feat(training): add segmentation training pipeline and complete Part 6
- Add /segtrain endpoint to OCR service (ZIP upload, ketos.segtrain,
  backup rotation, in-process model reload)
- Add segtrainModel() to OcrClient and RestClientOcrClient (10-min timeout,
  X-Training-Token header)
- Add SegmentationTrainingExportService: PAGE XML export with polygon
  de-normalization and per-page PNG rendering via PDFBox
- Add GET /api/ocr/segmentation-training-data/export endpoint
- Make TranscriptionBlock.text nullable for segmentation-only blocks
  (V31 migration)
- Add Paraglide i18n translation keys for all training UI strings (de/en/es)
- Pass source prop from TranscriptionEditView to TranscriptionBlock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 15:15:17 +02:00
Marcel
86e9c05aaf feat(training): add Paraglide i18n to training UI components and wire SegmentationTrainingCard
- Convert TrainingHistory, OcrTrainingCard, SegmentationTrainingCard, and
  TranscriptionBlock "Nur Segmentierung" badge to use Paraglide message keys
- Add availableSegBlocks to TrainingInfoResponse to expose segmentation
  block count in the training info endpoint
- Wire SegmentationTrainingCard into admin/system page below OCR training card
- Update api.ts with availableSegBlocks field

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 15:14:27 +02:00
Marcel
4e08d31e01 feat(admin): add OCR training card to admin/system page
- TrainingHistory.svelte: responsive table with status badges
  (green/red/animated pulse), keyed iteration, empty-state row
- OcrTrainingCard.svelte: shows available blocks/docs, disabled states
  (< 5 blocks, service down), in-flight "…" state, 5s success message,
  embeds TrainingHistory
- Wired into admin/system/+page.svelte via fetchTrainingInfo() in $effect
- Regenerated api.ts with OcrTrainingRun + TrainingInfoResponse types
- TRAINING_ALREADY_RUNNING error code in errors.ts + de/en/es translations
- 7 OcrTrainingCard Vitest tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 14:58:13 +02:00
Marcel
fdf1eb92ad feat(training): add document-level training enrollment
- V29 migration: document_training_labels join table
- TrainingLabel enum: KURRENT_RECOGNITION, KURRENT_SEGMENTATION
- Document.trainingLabels @ElementCollection
- DocumentService.addTrainingLabel / removeTrainingLabel
- PATCH /api/documents/{id}/training-labels (WRITE_ALL)
- Auto-enroll on Kurrent OCR trigger (OcrService.startOcr)
- TranscriptionEditView: enrollment chips in panel footer
- JPQL queries updated to use MEMBER OF trainingLabels

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 14:30:51 +02:00
Marcel
73229077be feat(transcription): add sticky review progress counter to TranscriptionEditView
Shows 'X / Y geprüft' with a brand-mint progress bar at the top of the
transcription panel. Derived from the blocks prop — no extra state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 13:59:35 +02:00
Marcel
b7fd4018c2 fix(frontend): normalize paraglide imports and improve accessibility
Changed OcrTrigger and ScriptTypeSelect from 'import * as m' to
'import { m }' to match the rest of the codebase. Increased
ScriptTypeSelect label to text-sm and annotation badge font to 12px
for better readability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:29:00 +02:00
Marcel
8c07779a91 fix(ocr): fix SSE retry to actually reconnect EventSource
The retry button set status='running' but didn't re-trigger the $effect
because jobId hadn't changed. Added retryCount state so the effect
re-runs and creates a fresh EventSource on retry. Also added aria-label
to the progress bar for accessibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:28:40 +02:00
Marcel
410ef88e1a refactor(ocr): delete unused OcrProgressBar component
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
The skipped-pages warning is inlined directly in +page.svelte.
The component and its tests are no longer needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:53:10 +02:00
Marcel
6b94882409 fix(ocr): remove redundant page counter from progress display
The progress message already says "Seite 3 von 7 wird analysiert…"
so the separate "3 / 7" counter was redundant. Remove the
OcrProgressBar from the page and inline only the skipped-pages
warning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:50:05 +02:00
Marcel
b868da07cd fix(ocr): remove progress bar, keep text-only page counter
The thin bar without a border looked broken at low progress values.
The text counter (e.g. "1 / 6") already communicates progress clearly
so the bar is unnecessary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:46:29 +02:00
Marcel
3fe6eedffb feat(ocr): allow re-running OCR when transcription blocks already exist
Add a collapsible OCR trigger below the block list in edit mode.
Uses a <details> element so it's unobtrusive — the primary workflow
is editing existing blocks, but users can expand to re-run OCR with
a confirmation dialog that warns about replacing existing blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:37:51 +02:00
Marcel
bac67706b9 feat(ocr): integrate progress bar and streaming progress into document page
Replace inline translateOcrProgress with the extracted module. Add
OcrProgressBar below the spinner during OCR. Parse page numbers from
ANALYZING_PAGE progress codes and feed them to the bar. On Done, fill
bar to 100% briefly before clearing the overlay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:15:55 +02:00
Marcel
035f9768bd feat(ocr): add OcrProgressBar component with page-based ARIA semantics
Progress bar shows brand-mint fill on brand-sand background with
smooth transition. Displays page counter with tabular-nums and
skipped-pages warning in amber when applicable. Only renders when
totalPages > 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:13:57 +02:00
Marcel
ddec64fc79 feat(ocr): extract translateOcrProgress with ANALYZING_PAGE and DONE:skipped support
Move translateOcrProgress from page.svelte to a testable module.
Return structured result with currentPage/totalPages/skippedPages
for the progress bar. Add ANALYZING_PAGE and DONE with skipped pages
parsing. Add i18n keys for de/en/es.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:09:29 +02:00
Marcel
d8dcba1a71 fix(ocr): unblock event loop during OCR and show errors in UI
OCR engines are CPU-bound and were blocking Uvicorn's single async
event loop, making /health unresponsive during processing. This caused
new OCR requests to fail silently (health check failure → no DB record
→ UI shows NONE). Wrap engine calls in asyncio.to_thread() to keep the
event loop free. Also surface OCR trigger errors in the frontend
instead of silently resetting the spinner.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:50:39 +02:00