Add findByPersonIdIsNullOrderByCreatedAtDesc + findByPersonIdOrderByCreatedAtDesc to
OcrTrainingRunRepository. Add dto/TrainingHistoryResponse. Expose
GET /api/ocr/training-info/global and GET /api/ocr/training-info/{personId} on
OcrController, both requiring ADMIN; getSenderTrainingHistory guards person existence
via PersonService and returns 404 for unknown personId.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move TrainingInfoResponse from private nested record to dto/TrainingInfoResponse.java,
add senderModels field, inject SenderModelService into OcrTrainingService so personNames
covers all known senders rather than only recent-run participants.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add V42 partial unique index on ocr_training_runs(person_id) WHERE status='QUEUED'
to enforce the per-person queued coalescing guarantee at the DB level. Also adds
@ExtendWith(MockitoExtension.class) to SenderModelServiceTest for consistency with
the rest of the service test suite, with lenient() on the shared txTemplate stub.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the per-run getById loop with a single getAllById call on distinct
person IDs, eliminating the N+1 query when training history contains multiple
sender model runs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove the intermediate Map<String,Object> and return the typed record directly
so OpenAPI codegen produces a concrete TypeScript type. Fixes lastRun serializing
as {} (empty object) instead of null when no training run exists.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OcrAsyncRunner now passes the per-sender model path to streamBlocks for
HANDWRITING_KURRENT documents. processDocument replaced extractBlocks
with streamBlocks + AtomicReference, removing the unchecked raw-array
pattern.
Also stages all previously uncommitted foundational files for this
feature: SenderModel entity, SenderModelRepository, Flyway migrations
V40/V41, updated OcrClient/RestClientOcrClient streaming API,
TrainingDataExportService.exportForSender, TranscriptionService Kurrent
hook, application.yaml OCR config, and frontend i18n/test additions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eliminates cross-domain repository access: OcrTrainingService no longer
holds SenderModelRepository. SenderModelService now owns the full sender
training lifecycle (runOrQueueSenderTraining, triggerSenderTraining,
promoteNextQueuedRun), removing the circular dependency risk.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces catch(Exception ignored){} with log.debug() in getTrainingInfo().
Adds controller test documenting the graceful degradation behavior
(response stays 200 when personService.getById() throws).
Fixes reviewer concerns from @felixbrandt and @nullx.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- OcrAsyncRunnerTest: switch from extractBlocks/4-arg streamBlocks stubs
to 5-arg streamBlocks (senderModelPath param) via doAnswer
- TranscriptionServiceTest: stub documentService.getDocumentById in
updateBlock tests so the new Kurrent training hook does not NPE
- OcrControllerTest: add @MockitoBean PersonService (now injected into
OcrController for personNames assembly in getTrainingInfo)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace stringly-typed "AND"/"OR" tagOperator with TagOperator enum (DocumentService, DocumentController)
- Replace Object[] with TagCount projection interface in TagRepository.findDocumentCountsPerTag()
- Use @NotNull + @Valid on MergeTagDTO.targetId; remove manual null check from TagController
- Correct ALLOWED_TAG_COLORS to match actual frontend CSS tokens (sage/sienna/amber/slate/violet/rose/cobalt/moss/sand/coral)
- Add TOCTOU comment to validateNoAncestorCycle() with mitigation explanation
- Add test: deleteWithDescendants_skipsDocTagDeletion_whenDescendantIdsIsEmpty
- Update TagServiceTest to use mock TagRepository.TagCount projection
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Colors are stored only on root-level tags. DocumentService now calls
TagService.resolveEffectiveColors() before returning search results and
single-document responses, so child tags carry their parent's color when
serialised to JSON. Parent tags are batch-loaded in a single query.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace hasTags(List<String>) spec with hasTags(List<Set<UUID>>, useOr)
- AND mode: one EXISTS subquery per expanded tag ID set; empty set = disjunction
- OR mode: union of all expanded sets into a single EXISTS subquery
- DocumentService calls tagService.expandTagNamesToDescendantIdSets() before building spec
- DocumentController exposes ?tagOp=AND|OR query param (default AND)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds parent_id FK (ON DELETE SET NULL), self-reference check constraint,
parent_id index, and nullable color column to the tag table.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Lesefertig pulse was removed from the UI; drop the backend support
for it too — removes the subquery from findWeeklyStats(), the projection
getter, the DTO field, and updates all affected tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verifies 401/403/200 responses for all four endpoints. Matches
the @WebMvcTest + @RequirePermission pattern used across the project.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces TranscriptionQueueProjection and TranscriptionWeeklyStatsProjection
interfaces so column reordering in native SQL can never silently produce wrong
data. Removes the four type-coercion helpers (toUUID, toLocalDate, toInt, toLong)
from TranscriptionQueueService. Covered by TranscriptionQueueServiceTest (6 tests).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two backend tests passed a 6-element enrichment row but the rebase
added summary_snippet as column 7 — added null at index 6 to both
fixtures.
Two frontend page.server tests mocked only 4 dashboard API calls but
the page now makes 8 (3 Mission Control queues + weekly-stats added
on this branch) — added the 4 missing mock responses.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a summary_snippet column to findEnrichmentData using ts_headline on
documents.summary, only when the summary's tsvector matches the query.
Expose it via SearchMatchData.summarySnippet / summaryOffsets and render
a "Zusammenfassung" / "Summary" / "Resumen" labelled row in the document
list — identical treatment to the transcription snippet row.
Fixes the case where a document appeared in search results with no
visible match explanation (e.g. searching "frucht" found a document
whose summary mentioned "Früchte").
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace websearch_to_tsquery with a CROSS JOIN LATERAL subquery that
appends :* to each lexeme so prefix matches work (e.g. "furchtb" finds
"furchtbar"). websearch_to_tsquery still handles the safe tokenisation
of user input (stop words, special chars, operators); regexp_replace
then adds :* before to_tsquery re-parses the result.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- SearchMatchData gains a 6th field snippetOffsets: List<MatchOffset> so the frontend
can render highlighted terms inside the transcription snippet without {#html}.
- DocumentRepository.findEnrichmentData now calls ts_headline() with chr(1)/chr(2)
sentinels instead of returning raw block text; parseHighlight() strips the sentinels
and produces clean text + MatchOffset list in one pass.
- DocumentService exposes ParsedHighlight and parseHighlight() as public so they can be
called from cross-package integration tests.
- All related tests updated to the new 6-argument SearchMatchData constructor and
to call parseHighlight() for asserting the snippet clean text and offsets.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DocumentService.searchDocuments now returns DocumentSearchResult with matchData
populated from findEnrichmentData. Title highlights are parsed from chr(1)/chr(2)
delimiters into MatchOffset lists; transcription snippet and sender/receiver/tag
match flags are extracted from the same native SQL row.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- should_find_document_by_sender_name — symmetric with existing receiver test
- fts_combined_with_status_filter_excludes_non_matching_status — verifies
hasIds(rankedIds).and(hasStatus(...)) two-phase search works together
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a user explicitly selects DATE sort with a text query active, the
previous code treated it identically to RELEVANCE, silently discarding
the user's sort choice. Remove DATE from the useRankOrder condition so
that explicit DATE sort always goes through the standard JPA sort path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- V34 migration: adds search_vector tsvector column with GIN index
- BEFORE INSERT/UPDATE trigger on documents rebuilds vector from title (A),
summary + transcription_blocks.text (B), sender/receiver names (C),
tag names + location (D) using german FTS config
- AFTER triggers on transcription_blocks, document_receivers, document_tags
touch the parent document row to re-fire the BEFORE UPDATE trigger
- DocumentRepository.findRankedIdsByFts() native query using websearch_to_tsquery
- DocumentFtsTest: 12 integration tests covering stemming, trigger sync,
ranking, stop words, malformed input, receiver and tag search
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>