- V34 migration: adds search_vector tsvector column with GIN index
- BEFORE INSERT/UPDATE trigger on documents rebuilds vector from title (A),
summary + transcription_blocks.text (B), sender/receiver names (C),
tag names + location (D) using german FTS config
- AFTER triggers on transcription_blocks, document_receivers, document_tags
touch the parent document row to re-fire the BEFORE UPDATE trigger
- DocumentRepository.findRankedIdsByFts() native query using websearch_to_tsquery
- DocumentFtsTest: 12 integration tests covering stemming, trigger sync,
ranking, stop words, malformed input, receiver and tag search
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
With a pre-built JAR, Spring Boot + Flyway starts in ~15 seconds.
The previous 60s was sized for runtime compilation (90+ seconds).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pin to eclipse-temurin:21.0.10_7-{jdk,jre}-noble for reproducible builds
- Switch -DskipTests to -Dmaven.test.skip=true, which skips test compilation
entirely, not just execution; this is faster and avoids build failures caused
by classes that are missing only from test code
- Add comment on COPY *.jar explaining why the glob is safe (Spring Boot renames
the pre-repackage artifact to .jar.original, leaving only one .jar in target/)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevents 111MB of compiled output from being sent to the BuildKit daemon
on cold builds. Only .mvn/, mvnw, pom.xml, and src/ are needed by the
three COPY instructions in the Dockerfile.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace runtime mvn spring-boot:run with a proper multi-stage build:
- Stage 1 (builder): compiles JAR with BuildKit cache mount for ~/.m2
- Stage 2 (runtime): eclipse-temurin:21-jre with only the JAR
Removes the backend source volume mount and maven_cache named volume.
Deploy with: docker compose up -d --build
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace hand-rolled enrichedDocuments year-divider logic with the shared
groupDocuments utility. Also fixes a timezone bug in documentYears: appends
'T12:00:00' to date strings so getFullYear() doesn't drift across a year
boundary when a date-only string is parsed as UTC midnight.
Apart from that fix, behavior is unchanged: year dividers render the same
way as before.
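
A minimal sketch of the timezone fix, assuming dates arrive as ISO
"YYYY-MM-DD" strings. The helper name yearOf is hypothetical and is not
taken from the codebase.

```typescript
// Hypothetical helper; the name yearOf is illustrative only.
// A date-only string such as "2024-01-01" is parsed as UTC midnight, so
// getFullYear() (which reads local time) can report 2023 west of UTC.
// A date-time string without an offset is parsed as local time instead,
// and noon leaves a wide margin on both sides of the date boundary.
function yearOf(isoDate: string): number {
  return new Date(`${isoDate}T12:00:00`).getFullYear();
}
```

With this form, yearOf("2024-01-01") yields 2024 in any local timezone.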
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirrors the existing sort allowlist pattern. Any value other than 'asc' or
'desc' silently falls back to 'desc', preventing arbitrary strings from
reaching the search API.
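
The allowlist fallback could look roughly like this. The type and
function names (SortDir, parseSortDir) are assumptions made for
illustration, not the names used in the code.

```typescript
// Hypothetical names; only the 'asc' / 'desc' / fallback-to-'desc'
// behavior comes from the commit message.
type SortDir = 'asc' | 'desc';

function parseSortDir(raw: string | null | undefined): SortDir {
  // Anything that is not exactly 'asc' or 'desc' falls back to 'desc'.
  return raw === 'asc' || raw === 'desc' ? raw : 'desc';
}
```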
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
text-xs text-ink/40 (~2.1:1) fails WCAG AA; text-sm bold at text-ink/60
(~3.7:1) passes the large-text 3:1 threshold. Also adds role="separator"
and aria-label so screen readers announce the group boundary.
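
For reference, a sketch of the WCAG 2.x contrast-ratio calculation behind
figures like these. The actual ink and background colors are not given
here, so the ratios above are taken as stated rather than recomputed; all
names in the snippet are illustrative.

```typescript
type Rgb = [number, number, number]; // sRGB channels, 0-255

// WCAG relative luminance of an opaque sRGB color.
function relativeLuminance([r, g, b]: Rgb): number {
  const lin = (c: number): number => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

// Contrast ratio between two opaque colors: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// AA thresholds: 4.5:1 for normal text, 3:1 for large text
// (WCAG defines large text as at least 18pt, or 14pt bold).
function meetsAA(ratio: number, largeText: boolean): boolean {
  return ratio >= (largeText ? 3 : 4.5);
}
```

A semi-transparent color such as text-ink/40 has to be blended over its
background before being passed to contrastRatio.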
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add failing test for DATE-sort + undated doc showing "Undatiert" fallback
label, then fix DocumentList by null-coalescing sort before comparison
((sort ?? 'DATE') === 'DATE'). Test uses one dated + one undated doc to
produce two groups and trigger GroupDivider rendering.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents sorted by DATE show year dividers, SENDER/RECEIVER sort
shows person name dividers. Dividers only appear when there are 2+
distinct groups. Multi-receiver docs appear in each receiver group.
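
A rough sketch of the grouping rules described above. The document shape,
the function names, and the 'Unbekannt' label for a missing person are
assumptions; only the per-sort grouping, the 2+ group threshold, the
multi-receiver behavior, and the "Undatiert" label come from the commits.

```typescript
type GroupSort = 'DATE' | 'SENDER' | 'RECEIVER';

// Hypothetical document shape; dates assumed to be "YYYY-MM-DD" or null.
interface DocSummary {
  id: string;
  date: string | null;
  sender: string | null;
  receivers: string[];
}

interface DocGroup {
  label: string;
  docs: DocSummary[];
}

// A document can belong to several groups (one per receiver).
function groupLabels(doc: DocSummary, sort: GroupSort): string[] {
  switch (sort) {
    case 'DATE':
      return [doc.date ? doc.date.slice(0, 4) : 'Undatiert'];
    case 'SENDER':
      return [doc.sender ?? 'Unbekannt']; // assumed fallback label
    case 'RECEIVER':
      return doc.receivers.length > 0 ? doc.receivers : ['Unbekannt'];
  }
}

// Groups keep the order in which their first document appears,
// so an already-sorted input yields sorted groups.
function groupBySort(docs: DocSummary[], sort: GroupSort): DocGroup[] {
  const buckets = new Map<string, DocSummary[]>();
  for (const doc of docs) {
    for (const label of groupLabels(doc, sort)) {
      const bucket = buckets.get(label);
      if (bucket) bucket.push(doc);
      else buckets.set(label, [doc]);
    }
  }
  return Array.from(buckets, ([label, members]) => ({ label, docs: members }));
}

// Dividers are only worth rendering when there are 2+ distinct groups.
function shouldShowDividers(groups: DocGroup[]): boolean {
  return groups.length >= 2;
}
```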
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- start_period 60s → 120s: Zenodo download on cold start can exceed 60s on slow connections
- ocr_cache volume comment: documents what the cache stores for future operators
- .env.example: add token generation command to prevent weak placeholder in production
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add aria-expanded + aria-controls to expand button (WCAG 4.1.2)
- Add id="training-history-rows" to tbody for aria-controls target
- Replace title= tooltip on FAILED badge with details/summary for keyboard
and touch accessibility; add training_error_detail_label i18n key
- Use motion-safe:animate-pulse on RUNNING badge for prefers-reduced-motion
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- _model_is_loadable: narrow bare except to (RuntimeError, OSError, ValueError)
with DEBUG-level fallback for unexpected exceptions — prevents silent masking
of missing kraken install or AttributeError on vgsl
- _run_segtrain: replace bare except:pass with log.warning so height-check
fallback is visible in container logs
- New test_ensure_blla_model.py: covers model-OK early return, incompatible
model rename+replace, and missing model download paths
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both training panels (OCR and segmentation) share TrainingHistory.
Show only the 3 most recent runs by default; render a Mehr/Weniger
anzeigen button when there are more.
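
The collapse logic can be sketched as below, assuming the runs are already
ordered newest first. The function and constant names are hypothetical.

```typescript
// Hypothetical names; the limit of 3 comes from the commit message.
const DEFAULT_VISIBLE_RUNS = 3;

// Returns the rows to render: all runs when expanded, else the first 3.
function visibleRuns<T>(runs: T[], expanded: boolean): T[] {
  return expanded ? runs : runs.slice(0, DEFAULT_VISIBLE_RUNS);
}

// The Mehr/Weniger anzeigen button is only needed when something is hidden.
function needsToggle(runs: unknown[]): boolean {
  return runs.length > DEFAULT_VISIBLE_RUNS;
}
```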
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
setCer() was called for recognition training but not for segmentation.
The OCR service now returns cer = 1 - accuracy for segtrain; persist it
so the admin panel can display Fehlerrate for both training types.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pass OCR_TRAINING_TOKEN through to the backend container as
APP_OCR_TRAINING_TOKEN so RestClientOcrClient sends the X-Training-Token
header when calling /train and /segtrain.
- Raise mem_limit/memswap_limit from 8g to 12g to give segtrain headroom
on hosts with more available RAM.
- Uncomment OCR_TRAINING_TOKEN in .env.example — it is now required.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three issues fixed:
1. --resize both was removed in ketos 7; replaced with --resize union
   which extends the model's class mapping to include training data classes.
2. ketos ignores -s when -i is present, so the 1800px blla model caused
   7+ GB peak RAM and OOM-killed the host (no swap, 5 GB free).
   Now checks the loaded model's input height: only uses the base model
   when it was already fine-tuned at 800px; otherwise trains from scratch
   at 800px (~200 MB peak). After the first run the trained 800px model
   becomes the base for all subsequent fine-tuning runs.
3. segtrain now computes and returns cer = 1 - accuracy, matching the
   recognition training path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds ensure_blla_model.py which loads the blla segmentation model with
ketos on every container start. If the model is missing or in the legacy
PyTorch ZIP format (incompatible with ketos 7), it re-downloads the
correct CoreML protobuf model from Zenodo (DOI 10.5281/zenodo.14602569).
The Dockerfile now uses entrypoint.sh which runs this check before
starting uvicorn.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add tabindex="0" so the SVG can receive DOM focus
- Auto-focus the SVG on mount so arrow keys work immediately after
clicking an annotation to select it
- Show preview rect during keyboard nudging (not just pointer drag) by
checking hasLiveChanges instead of only checking dragState
- Suppress default browser focus outline (outline: none) on the SVG
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extends the 4-corner L-bracket handles with 4 tick-mark edge handles
(short lines along each edge), enabling single-axis resize from any edge.
Updates applyHandleDrag to route each handle to the correct axis.
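
One way the per-handle axis routing might be expressed. This is a
stand-in sketch, not the real applyHandleDrag: its signature is not shown
here, so the function name, the Rect shape, and the dx/dy parameters are
assumptions, and minimum-size clamping is left out.

```typescript
// Corner handles move two edges; edge handles move exactly one.
type Handle = 'nw' | 'ne' | 'sw' | 'se' | 'n' | 's' | 'e' | 'w';

interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical stand-in for applyHandleDrag. Each letter in the handle
// name selects one edge, so 'e' changes only width while 'nw' moves the
// top-left corner and changes both dimensions.
function resizeByHandle(rect: Rect, handle: Handle, dx: number, dy: number): Rect {
  let { x, y, width, height } = rect;
  if (handle.includes('w')) { x += dx; width -= dx; }
  if (handle.includes('e')) { width += dx; }
  if (handle.includes('n')) { y += dy; height -= dy; }
  if (handle.includes('s')) { height += dy; }
  return { x, y, width, height };
}
```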
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- 4 corner-only handles (nw/ne/sw/se), no edge midpoints
- Each handle renders as two short perpendicular lines meeting at the corner
(10px arms, navy, square linecap) — no fill, no box
- Thin dashed selection border added to SVG overlay to signal edit mode
- Simplify applyHandleDrag to remove dead n/s/e/w branches
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A ResizeObserver tracks the SVG's actual pixel dimensions and the viewBox
is set to match them, so 16px handle squares and 44px hit areas render at
their true on-screen size regardless of the annotation's aspect ratio.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>