Commit Graph

862 Commits

Author SHA1 Message Date
Marcel
3d3d4b8616 chore: add Claude personas, skills, memory, and project docs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 23:21:15 +02:00
Marcel
e4719b9487 fix(deploy): increase OCR healthcheck start_period, comment ocr_cache volume, add token hint
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
- start_period 60s → 120s: Zenodo download on cold start can exceed 60s on slow connections
- ocr_cache volume comment: documents what the cache stores for future operators
- .env.example: add token generation command to prevent weak placeholder in production

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
7562a400c0 test(frontend): add Vitest component tests for TrainingHistory expand/collapse
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
2073a4b64a fix(frontend): accessibility fixes for TrainingHistory expand/collapse and FAILED badge
- Add aria-expanded + aria-controls to expand button (WCAG 4.1.2)
- Add id="training-history-rows" to tbody for aria-controls target
- Replace title= tooltip on FAILED badge with details/summary for keyboard
  and touch accessibility; add training_error_detail_label i18n key
- Use motion-safe:animate-pulse on RUNNING badge for prefers-reduced-motion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
5c7efef307 fix(ocr): pin Dockerfile base image to python:3.11.9-slim for reproducible builds
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
74c9046745 fix(ocr): narrow exception handling and add unit tests for ensure_blla_model
- _model_is_loadable: narrow bare except to (RuntimeError, OSError, ValueError)
  with DEBUG-level fallback for unexpected exceptions — prevents silent masking
  of missing kraken install or AttributeError on vgsl
- _run_segtrain: replace bare except:pass with log.warning so height-check
  fallback is visible in container logs
- New test_ensure_blla_model.py: covers model-OK early return, incompatible
  model rename+replace, and missing model download paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
81da127381 refactor(ocr): rename findTop5 to findTop10 for headroom as frontend shows 3 by default
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
f206c0b9e9 test(ocr): add unit tests for triggerSegTraining() — conflict, threshold, happy path, failure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
15e532eb96 refactor(ocr): extract assertNoRunningTraining() to eliminate duplicate guard
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
f241a71733 feat(frontend): limit training history to 3 runs with expand toggle
Both training panels (OCR and segmentation) share TrainingHistory.
Show only the 3 most recent runs by default; render a Mehr/Weniger
anzeigen button when there are more.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
b83465020a fix(backend): store error rate for segmentation training runs
setCer() was called for recognition training but not for segmentation.
The OCR service now returns cer = 1 - accuracy for segtrain; persist it
so the admin panel can display Fehlerrate for both training types.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
f08897b801 fix(deploy): wire OCR training token to backend and raise container memory limit
- Pass OCR_TRAINING_TOKEN through to the backend container as
  APP_OCR_TRAINING_TOKEN so RestClientOcrClient sends the X-Training-Token
  header when calling /train and /segtrain.
- Raise mem_limit/memswap_limit from 8g to 12g to give segtrain headroom
  on hosts with more available RAM.
- Uncomment OCR_TRAINING_TOKEN in .env.example — it is now required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
a5979c4069 fix(ocr-service): fix ketos 7 segtrain compatibility and prevent OOM
Three issues fixed:

1. --resize both was removed in ketos 7; replaced with --resize union
   which extends the model's class mapping to include training data classes.

2. ketos ignores -s when -i is present, so the 1800px blla model caused
   7+ GB peak RAM and OOM-killed the host (no swap, 5 GB free).
   Now checks the loaded model's input height: only uses the base model
   when it was already fine-tuned at 800px; otherwise trains from scratch
   at 800px (~200 MB peak). After the first run the trained 800px model
   becomes the base for all subsequent fine-tuning runs.

3. segtrain now computes and returns cer = 1 - accuracy, matching the
   recognition training path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
e8375d6c72 fix(ocr-service): add entrypoint that validates blla model format on startup
Adds ensure_blla_model.py which loads the blla segmentation model with
ketos on every container start. If the model is missing or in the legacy
PyTorch ZIP format (incompatible with ketos 7), it re-downloads the
correct CoreML protobuf model from Zenodo (DOI 10.5281/zenodo.14602569).
The Dockerfile now uses entrypoint.sh which runs this check before
starting uvicorn.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
28ac90b529 fix(annotations): replace outline:none with focus-visible ring for keyboard accessibility [M7]
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:42:01 +02:00
Marcel
76828a95e3 fix(annotations): add catch(err) binding to handlePointerUp error handler [M6]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:41:21 +02:00
Marcel
7125a0a8eb fix(annotations): reset liveWidth/liveHeight in handleKeyDown error rollback [M1, M6]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:40:55 +02:00
Marcel
7097f991fe feat(annotations): add keyboard accessibility to resize handles [B2]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:40:30 +02:00
Marcel
4d9145e49f feat(annotations): wire SVG aria-label to Paraglide i18n [B3]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:39:35 +02:00
Marcel
060d1c0515 feat(i18n): add annotation_resize_area and annotation_resize_handle message keys [B2, B3]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:38:10 +02:00
Marcel
72700bd28f test(annotations): add Testcontainers integration tests for V33 chk_annotation_bounds [B1]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:36:37 +02:00
Marcel
40c8f548db docs(annotations): fix ANNOTATION_UPDATE_FAILED Javadoc to reflect 400 status [M3]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:34:55 +02:00
Marcel
a19faa3806 feat(annotations): add @Slf4j and DataIntegrityViolationException catch to updateAnnotation [M2]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:34:03 +02:00
Marcel
f00b470928 test(annotations): add failing test for DataIntegrityViolationException defense [M2 red]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:33:43 +02:00
Marcel
65d606d8bb test(annotations): add missing height and x boundary validation tests [M4]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:31:07 +02:00
Marcel
4d3207fc27 test(annotations): verify save() is called in updateAnnotation test [M5]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 14:30:50 +02:00
Marcel
2350b4f845 fix(annotations): make resize overlay keyboard-interactive
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 1s
- Add tabindex="0" so the SVG can receive DOM focus
- Auto-focus the SVG on mount so arrow keys work immediately after
  clicking an annotation to select it
- Show preview rect during keyboard nudging (not just pointer drag) by
  checking hasLiveChanges instead of only checking dragState
- Suppress default browser focus outline (outline: none) on the SVG

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 11:47:41 +02:00
Marcel
9fe5b32a69 feat(annotations): add N/S/E/W edge midpoint handles to resize overlay
Extends the 4-corner L-bracket handles with 4 tick-mark edge handles
(short lines along each edge), enabling single-axis resize from any edge.
Updates applyHandleDrag to route each handle to the correct axis.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 11:40:39 +02:00
Marcel
fcc0efbf02 refactor(annotations): replace 8-square handles with 4 corner L-brackets
- 4 corner-only handles (nw/ne/sw/se), no edge midpoints
- Each handle renders as two short perpendicular lines meeting at the corner
  (10px arms, navy, square linecap) — no fill, no box
- Thin dashed selection border added to SVG overlay to signal edit mode
- Simplify applyHandleDrag to remove dead n/s/e/w branches

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 11:14:30 +02:00
Marcel
e7f88a4ea1 fix(annotations): use pixel-space viewBox so handles stay square on non-square annotations
ResizeObserver binds actual SVG pixel dimensions; viewBox matches them so
16px handle squares and 44px hit areas are physically correct regardless of
the annotation's aspect ratio.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 11:03:15 +02:00
Marcel
c610a3cc37 feat(annotations): wire updateAnnotation context and error display into PdfViewer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 11:00:50 +02:00
Marcel
3fb32ea285 feat(annotations): pass isResizable to AnnotationShape based on selection + transcribeMode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:57:13 +02:00
Marcel
3b756cd718 feat(annotations): add isResizable prop to AnnotationShape to render edit overlay
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:55:13 +02:00
Marcel
f5362a5850 feat(annotations): add AnnotationEditOverlay component with resize handles and drag
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:52:07 +02:00
Marcel
953cb2c910 feat(i18n): add ANNOTATION_UPDATE_FAILED error code and annotation_edit_mode_active translation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:43:10 +02:00
Marcel
ff231db671 feat(annotations): add PATCH endpoint for annotation resize/move
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:42:08 +02:00
Marcel
1558881c01 feat(annotations): add updateAnnotation service method with partial-update DTO
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:39:50 +02:00
Marcel
26c7181ba4 feat(annotations): add ANNOTATION_UPDATE_FAILED error code
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:38:33 +02:00
Marcel
f76a6c0ee5 migration(annotations): add chk_annotation_bounds CHECK constraint (V33)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:38:11 +02:00
Marcel
ca10e8a6a9 fix(test): update TranscriptionEditView empty-state assertion after text change
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 2s
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 2s
Commit 5afdc37 changed the empty state from transcription_empty_cta
('Markiere einen Bereich…') to transcription_empty_draw_hint
('Zeichnen Sie Bereiche…') but left the spec asserting the old text.
Updated the locator to match the current component output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:11:57 +02:00
Marcel
22ee3dce68 fix(api): remove duplicate import and align patchTrainingLabel OpenAPI response to 204
Removed duplicate import of org.mockito.ArgumentMatchers.eq from
DocumentControllerTest (lines 32+35). Added @ApiResponse(responseCode="204")
to patchTrainingLabel so the generated OpenAPI spec matches the actual
NoContent response the controller returns.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:07:41 +02:00
Marcel
99847980d2 fix(a11y): replace unicode glyphs with SVG icons in TrainingHistory status badges
WCAG 1.4.1 (Use of Color) requires non-color redundant cues for status.
The unicode ✓/✗ characters had inconsistent screen-reader support.
Replaced with explicit aria-hidden SVG icons (checkmark / x-circle)
alongside the translated status text labels.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:06:11 +02:00
Marcel
8f6e398af7 fix(i18n): replace hardcoded German training label chip strings with Paraglide keys
TranscriptionEditView rendered 'Kurrent-Erkennung' and 'Segmentierung'
as hardcoded German strings, breaking the en/es locales. Added
training_chip_kurrent and training_chip_segmentation keys to all three
message files and wired them up via m.training_chip_kurrent() /
m.training_chip_segmentation().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:04:52 +02:00
Marcel
30a17c97e8 fix(ocr): fail closed when TRAINING_TOKEN is not configured
_check_training_token previously skipped auth when TRAINING_TOKEN was
empty, allowing unauthenticated requests to reach /train and /segtrain.
Now returns 503 ("Training not configured on this node") when the token
is absent, so missing configuration fails closed rather than open.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:02:13 +02:00
Marcel
dc283ba271 fix(training): remove @Transactional from triggerTraining to avoid holding DB connection during OCR HTTP call
OcrTrainingService.triggerTraining() and triggerSegTraining() held a DB
connection open for the entire ketos training run (potentially minutes),
risking connection pool exhaustion. Replaced class-level @Transactional
with TransactionTemplate for narrow DB writes: guard+create and
result-record each run in their own short transaction; the HTTP call to
the OCR service runs between them with no open connection.

Also replaces blockRepository.findAll().size() with blockRepository.count()
in getTrainingInfo() to avoid loading every block into heap on each poll.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 09:59:12 +02:00
Marcel
62be895b9e fix(ocr): drop uvicorn workers from 2 to 1
Two workers × ~5 GB Surya model load = ~10 GB required, exceeding the
8 GB memory cap and causing OOM on the first /train call. Two OS
processes also cause model-state divergence after training, contradicting
the single-node constraint documented in ADR-001.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 09:55:55 +02:00
Marcel
7b79dc105b test(migrations): add Testcontainers integration tests for V23 + V30 constraints
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 0s
V23 introduced a JSONB check constraint (chk_annotation_polygon_quad)
requiring polygon arrays to have exactly 4 points. V30 introduced a
partial unique index preventing two concurrent RUNNING training runs.
These are DB-level invariants that unit tests cannot verify — five
Testcontainers tests now assert they are correctly applied by Flyway
and enforced by PostgreSQL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 23:07:17 +02:00
Marcel
e933aacc92 docs(infra): add .env.example with OCR_TRAINING_TOKEN
Fresh cloners had no tracked reference for required env vars.
.env is gitignored (contains real credentials). .env.example
documents all variables including the new OCR_TRAINING_TOKEN
for the Python OCR microservice training endpoints.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 23:03:10 +02:00
Marcel
fdba3211aa fix(a11y): add aria-live to OcrProgress page counter
Screen readers did not announce page-by-page OCR progress updates.
Wrapping the counter text in a span with aria-live=polite ensures
assistive technology announces each page completion without
interrupting the user.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 23:02:25 +02:00
Marcel
287920a982 docs(ocr): document single-node constraint for OCR training
Training reloads the Kraken model in-process on the Python service.
The DB-level RUNNING constraint prevents concurrent API calls but
cannot protect against multi-replica deployments. Added explicit
comments in docker-compose.yml and OcrTrainingService to prevent
accidental horizontal scaling. See ADR-001.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 23:01:45 +02:00