Commit Graph

1031 Commits

Author SHA1 Message Date
Marcel
b5e1a8ac2f chore: regenerate API types for per-sender model additions
OcrTrainingRun now includes personId (uuid, optional) and QUEUED status.
TrainingInfoResponse includes runs array with personId fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:10:04 +02:00
Marcel
64d27d6d61 feat(ocr): per-sender model registry and /train-sender endpoint
engines/kraken.py:
- Add _SenderModelRegistry with LRU eviction (max configurable via
  OCR_MAX_CACHED_MODELS env var), double-checked locking, invalidate(),
  and path whitelist (/app/models/ only)
- Add _load_sender_model() helper for testability
- extract_page_blocks() and extract_region_text() accept optional
  sender_model_path; route to sender registry when provided

models.py:
- OcrRequest gains senderModelPath: str | None = None field

main.py:
- /ocr and /ocr/stream pass request.senderModelPath to Kraken engine
- New /train-sender endpoint: validates output_model_path, runs ketos
  train with base model as starting point, invalidates sender cache

docker-compose.yml:
- Add OCR_MAX_CACHED_MODELS: "5" to ocr-service environment

test_sender_registry.py:
- 4 tests: cache hit, LRU eviction, invalidate, path traversal guard

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:05:39 +02:00
Marcel
7a342a07cf test: add unit tests for SenderModelService, runOrQueueSenderTraining, and updateBlock hook
- SenderModelServiceTest: 6 tests covering activation threshold (99/100),
  retrain delta (149/150), runNow flag (queued vs triggered)
- OcrTrainingServiceTest: 3 tests for runOrQueueSenderTraining — idle returns
  true, running saves QUEUED, duplicate QUEUED coalesces
- TranscriptionServiceTest: 3 tests for updateBlock — sets source=MANUAL,
  triggers training for HANDWRITING_KURRENT with sender, skips when no sender

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:00:59 +02:00
Marcel
bd23a76330 test: fix broken tests after per-sender model integration
- OcrAsyncRunnerTest: switch from extractBlocks/4-arg streamBlocks stubs
  to 5-arg streamBlocks (senderModelPath param) via doAnswer
- TranscriptionServiceTest: stub documentService.getDocumentById in
  updateBlock tests so the new Kurrent training hook does not NPE
- OcrControllerTest: add @MockitoBean PersonService (now injected into
  OcrController for personNames assembly in getTrainingInfo)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 17:56:51 +02:00
Marcel
c5e6ed922b test(ocr): decouple correction tests from exact library dictionary state
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m35s
CI / OCR Service Tests (pull_request) Successful in 36s
CI / Backend Unit Tests (pull_request) Failing after 2m47s
CI / Unit & Component Tests (push) Failing after 2m33s
CI / OCR Service Tests (push) Successful in 34s
CI / Backend Unit Tests (push) Failing after 2m41s
Replace exact-string assertions in test_correctable_ocr_error_gets_corrected
and test_sentence_with_multiple_corrections with structural assertions that
verify behavior (correction attempted, marker present, expected stem) without
coupling to a specific pyspellchecker version's frequency weights.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 17:23:09 +02:00
Marcel
ec85f228c1 refactor(ocr): document > 50 frequency threshold rationale
Strict greater-than avoids non-determinism: if multiple candidates share
the minimum frequency value, pyspellchecker's ranking is undefined.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 17:21:37 +02:00
Marcel
fea24aee25 refactor(ocr): make collapse_adjacent_markers a public function
Drop underscore prefix — the helper is part of confidence.py's effective
public API since spell_check.py imports and calls it directly.

Fixes reviewer concern: importing a _-prefixed name across module boundaries
contradicts Python's private-by-convention signal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 17:20:31 +02:00
Marcel
68b57918eb ci: add ocr-tests job for spell_check and confidence unit tests
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m48s
CI / OCR Service Tests (push) Successful in 1m59s
CI / Backend Unit Tests (push) Failing after 2m53s
CI / Unit & Component Tests (pull_request) Failing after 2m52s
CI / OCR Service Tests (pull_request) Successful in 33s
CI / Backend Unit Tests (pull_request) Failing after 2m54s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:55:07 +02:00
Marcel
77100ab1e6 feat(ocr): integrate spell-check post-processing for handwriting script types
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:54:17 +02:00
Marcel
092131930c feat(ocr): add spell_check module with German spellchecker and historical wordlist
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:52:50 +02:00
Marcel
47f9a0bf73 test(ocr): add failing tests for spell_check module
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:51:38 +02:00
Marcel
30a6cbeb7f feat(ocr): add DTA-derived historical German wordlist and generation script
153K words from dtak+dtae 1800-1899 corpora (min_freq=20),
covering pre-reform spellings common in Kurrent/Süterlin documents.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:48:26 +02:00
Marcel
6faaa3b7d6 feat(ocr): add pyspellchecker dependency
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:41:24 +02:00
Marcel
77747aa556 refactor(ocr): extract _collapse_adjacent_markers helper and add CORRECTION_MARKER
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:40:39 +02:00
Marcel
9a64c0698f fix(pdf): make isLoaded reactive so nav buttons are enabled after load
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m50s
CI / Backend Unit Tests (push) Failing after 2m47s
pdfDoc was a plain variable (not \$state), so renderer.isLoaded had no
reactive dependencies in Svelte 5. PdfControls received isLoaded=false
permanently, keeping the next-page button disabled while zoom buttons
(which have no disabled attribute) still worked.

Fix: derive isLoaded from totalPages (\$state) via totalPages > 0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 15:55:42 +02:00
Marcel
4cb7c975f5 test(ocr): add resilience tests for tiny image and unexpected exception propagation
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m27s
CI / Backend Unit Tests (pull_request) Failing after 2m37s
CI / Unit & Component Tests (push) Failing after 3m14s
CI / Backend Unit Tests (push) Has been cancelled
Add test for 1×1 image (sub-tile-size) resilience and narrow preprocess_page
fallback from except Exception to (cv2.error, ValueError, MemoryError) so
programming errors propagate instead of being silently swallowed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 15:16:17 +02:00
Marcel
97c94c91f8 test(ocr): guard translateOcrProgress fallback for PREPROCESSING_PAGE with missing colon parts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 15:13:52 +02:00
Marcel
eaefd4091e feat(ocr): add PREPROCESSING_PAGE progress translation and i18n strings
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m34s
CI / Backend Unit Tests (push) Failing after 2m57s
CI / Unit & Component Tests (pull_request) Failing after 2m36s
CI / Backend Unit Tests (pull_request) Failing after 2m43s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:27:42 +02:00
Marcel
ba36a88b65 feat(ocr): add Preprocessing NDJSON event to Java stream pipeline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:21:00 +02:00
Marcel
b310caaeeb feat(ocr): integrate preprocessing into stream and batch endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:16:47 +02:00
Marcel
615d404ba9 chore(ocr): add opencv-python-headless, libglib2.0-0, and CLAHE env vars
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:14:47 +02:00
Marcel
7183fc4428 feat(ocr): add image preprocessing module with CLAHE + grayscale + blur
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:13:42 +02:00
Marcel
bf010a23c3 docs(tag-input): add clarifying comments for non-obvious design decisions
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m26s
CI / Backend Unit Tests (pull_request) Failing after 2m44s
CI / Unit & Component Tests (push) Failing after 2m28s
CI / Backend Unit Tests (push) Failing after 25s
- SvelteMap satisfies svelte/prefer-svelte-reactivity; $derived.by() handles reactivity
- ‹›› prefix only on depth=0 context ancestors; indentation serves deeper nodes
- fetchedForQuery set after suggestions causes harmless double $derived evaluation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 12:17:22 +02:00
Marcel
b01761d800 test(tag-input): add regression guard for allowCreation=false + Enter on suggestion
Confirms that Enter on a suggestion item adds the tag even when allowCreation is
false — the activeIndex guard in handleKeydown runs before the allowCreation check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 12:14:35 +02:00
Marcel
6b7829d5c8 test(tag-input): rename waitForDebounce to waitForFetch and reduce to 50ms
fetchSuggestions has no debounce; the wait is purely for the async mock to
resolve. The old name implied semantics that don't exist and added ~4.5s to
the suite (13 uses × 350ms).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 12:11:54 +02:00
Marcel
1b617aa08b fix(tag-input): increase suggestion item padding to py-3 for 44px touch target
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 12:09:20 +02:00
Marcel
5120dd19a1 feat(tag-input): tree-aware DFS ordering, depth indentation, and direct-match styling
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m37s
CI / Backend Unit Tests (push) Failing after 2m39s
CI / Unit & Component Tests (pull_request) Failing after 2m28s
CI / Backend Unit Tests (pull_request) Failing after 2m41s
Rewrites orderedSuggestions to a recursive DFS with SuggestionEntry type,
adds role=listbox, depth indentation via inline style, font-medium for direct
matches, text-ink-3 for context nodes, and › prefix for root-level ancestors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 11:54:28 +02:00
Marcel
d075bf390a feat(tag-search): expand children and surface ancestor path in search results
Modifies TagService.search() to enrich name-matches with tree relatives:
root matches expand descendants, child matches prepend ancestors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 11:27:41 +02:00
Marcel
59b7f7cddf docs(specs): add tag-typeahead-tree-aware design spec
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m38s
CI / Backend Unit Tests (push) Failing after 2m42s
Visual spec for tree-aware tag typeahead: parent matches expand to
show children, child matches surface ancestor path for context.
Covers backend enrichment strategy (TagService.search enrichment via
existing recursive CTEs) and frontend DFS ordering + depth-indent
rendering in TagInput.svelte.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 11:09:38 +02:00
Marcel
3b1317af98 fix(admin-tags): name clears after save and wrong confirmation text
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m30s
CI / Backend Unit Tests (pull_request) Failing after 2m45s
CI / Unit & Component Tests (push) Failing after 2m35s
CI / Backend Unit Tests (push) Failing after 2m44s
SvelteKit's use:enhance resets the form after a successful action.
The name input used value={data.tag.name} without bind:, so Svelte 5's
fine-grained reactivity did not re-apply the unchanged value after the
reset — leaving the field empty. Passing reset: false to update() fixes
this.

Also corrected the confirmation message from "renamed" to "saved" in
all three locales, since the action updates name, parent, and color.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 10:22:54 +02:00
Marcel
4442b25a7a fix(#248): add confirmation dialog before tag delete
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m29s
CI / Backend Unit Tests (push) Failing after 2m43s
CI / Unit & Component Tests (pull_request) Failing after 2m36s
CI / Backend Unit Tests (pull_request) Failing after 2m34s
TagDeleteGuard now calls confirm() (admin_tag_delete_confirm) before
submitting — same pattern as document delete. Button changed to type=button
with an async handler; page.svelte.spec.ts updated to pass ConfirmService
context so TagDeleteGuard can initialise inside the page render.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 09:39:33 +02:00
Marcel
47d57b96c8 fix(#248): show merge success banner via PRG pattern (?merged=1 redirect)
After a successful merge, redirect 303 to /admin/tags/{targetId}?merged=1.
Load function detects the param and returns mergeSuccess:true; +page.svelte
renders the banner and cleans the URL with replaceState so refresh doesn't
re-show it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 09:15:29 +02:00
Marcel
902172e4e2 fix(#248): fix 3 merge zone bugs — stale state, wrong placeholder, missing success feedback
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m41s
CI / Backend Unit Tests (pull_request) Failing after 2m34s
CI / Unit & Component Tests (push) Failing after 2m21s
CI / Backend Unit Tests (push) Failing after 2m35s
- TagMergeZone: add $effect to reset targetId when tag prop changes (fixes stale form after navigation)
- TagMergeZone: pass merge-specific placeholder to TagParentPicker
- TagMergeZone: show success banner on form.mergeSuccess and goto() target tag
- +page.server.ts: merge action returns { mergeSuccess, mergeTargetId } instead of redirect

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 08:12:35 +02:00
Marcel
654575bf16 feat(#248): add admin_tag_merge_target_placeholder and admin_tag_merge_success i18n keys
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 08:05:50 +02:00
Marcel
b5ea04e47a feat(#248): add optional placeholder prop to TagParentPicker
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 08:04:16 +02:00
Marcel
4ec4062274 refactor(#248): simplify TagService.buildTree() to single-pass LinkedHashMap approach
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3m12s
CI / Backend Unit Tests (pull_request) Failing after 2m57s
CI / Unit & Component Tests (push) Failing after 2m41s
CI / Backend Unit Tests (push) Failing after 2m45s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 07:45:40 +02:00
Marcel
3cd6483042 fix(#248): replace focus:outline-none with focus-visible ring on TagParentPicker clear button
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 07:43:38 +02:00
Marcel
aff7afa7cb fix(#248): resolve parent UUID to name in TagParentPicker dropdown subtitle
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 07:42:13 +02:00
Marcel
be7009f9ed fix(#248): replace document.querySelectorAll with page.getByRole in TagDeleteGuard spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 07:39:06 +02:00
Marcel
e6497ebff4 fix(#248): add @Schema(REQUIRED) to TagTreeNodeDTO, improve mergeTags log, add comments
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m42s
CI / Backend Unit Tests (pull_request) Failing after 2m44s
CI / Unit & Component Tests (push) Failing after 2m35s
CI / Backend Unit Tests (push) Failing after 2m44s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 01:01:09 +02:00
Marcel
ba8758c085 fix(#248): mode-aware delete button text in TagDeleteGuard and fix document.querySelector in spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:58:50 +02:00
Marcel
61976e9479 fix(#248): increase tree-node indent from 12px to 16px for better scanability
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:43:44 +02:00
Marcel
901483ab73 fix(#248): complete ARIA combobox pattern in TagParentPicker — role="option", aria-activedescendant, keyboard nav
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:42:54 +02:00
Marcel
6f6ff8e9ed fix(#248): add console.error to typeahead catch block and expose setActiveIndex
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:37:18 +02:00
Marcel
7919ba3a57 fix(#248): address PR review concerns — i18n, aria-label, stable keys, test selectors
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m37s
CI / Backend Unit Tests (push) Failing after 2m48s
CI / Unit & Component Tests (pull_request) Failing after 2m35s
CI / Backend Unit Tests (pull_request) Failing after 2m49s
- Add filter_operator_and/or/and_label/or_label i18n keys to de/en/es locale files
- Add aria-label and aria-pressed to AND/OR toggle buttons in SearchFilterBar
- Add data-testid="operator-and/or" for unambiguous test targeting (fixes substring match on German "Schlagwort")
- Use stable keys (tag.id ?? tag.name) for TagInput chip and suggestion lists
- Remove aria-level from role="option" items in TagInput (invalid attribute for that role)
- Add aria-live="polite" role="status" to TagMergeZone step indicator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:24:53 +02:00
Marcel
d7a46de1cc refactor(#248): address PR review concerns — TagOperator enum, typed projection, bean validation
- Replace stringly-typed "AND"/"OR" tagOperator with TagOperator enum (DocumentService, DocumentController)
- Replace Object[] with TagCount projection interface in TagRepository.findDocumentCountsPerTag()
- Use @NotNull + @Valid on MergeTagDTO.targetId; remove manual null check from TagController
- Correct ALLOWED_TAG_COLORS to match actual frontend CSS tokens (sage/sienna/amber/slate/violet/rose/cobalt/moss/sand/coral)
- Add TOCTOU comment to validateNoAncestorCycle() with mitigation explanation
- Add test: deleteWithDescendants_skipsDocTagDeletion_whenDescendantIdsIsEmpty
- Update TagServiceTest to use mock TagRepository.TagCount projection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:24:04 +02:00
Marcel
172c5613ed feat(#248): overhaul tag edit page — TagParentPicker, new components, merge+subtree actions
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m51s
CI / Backend Unit Tests (push) Failing after 2m46s
CI / Unit & Component Tests (pull_request) Failing after 2m39s
CI / Backend Unit Tests (pull_request) Failing after 2m58s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 23:46:06 +02:00
Marcel
f1889ff20c feat(#248): add TagDeleteGuard component and brand-warning CSS token
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 23:34:29 +02:00
Marcel
4d670de156 feat(#248): add TagMergeZone component with 2-step merge flow
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 23:30:28 +02:00
Marcel
b6b1b142dc feat(#248): add TagAncestry and TagChildrenPreview components
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 23:26:02 +02:00