familienarchiv

Author	SHA1	Message	Date
Marcel	ca0cf4903c	refactor(#240): remove needsExpert feature completely Drops the needsExpert / needs_expert flag end-to-end: DB migration (V37, never applied), Document entity field, PATCH endpoint, service method, DTO field, all three queue queries, ExpertBadge component, i18n key, generated API types, and test fixture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 10:52:14 +02:00
Marcel	9404ec34ce	fix(#240): add missing V36 index migration and rename needs_expert to V37 V36 (add_index_transcription_blocks_document_id) was applied to the dev database during a previous local session but never committed to git. Flyway checksum mismatch prevented the backend from starting. - V36__add_index_transcription_blocks_document_id.sql: restored from the index that already exists in the database (idx_transcription_blocks_document_id) - V36__add_needs_expert_to_documents.sql → V37__add_needs_expert_to_documents.sql Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 10:42:18 +02:00
Marcel	2ea603a3bf	feat(#240): backend for Mission Control Strip: queue endpoints + expert flag Adds the server-side foundation for the dashboard transcription widget: - V36 migration: needs_expert BOOLEAN NOT NULL DEFAULT FALSE on documents - Document entity: needsExpert field (@Schema required) - DocumentRepository: 4 native queries: segmentation queue, transcription queue, ready-to-read queue (seeded weekly shuffle sort), weekly pulse stats - TranscriptionQueueService: maps Object[] rows to typed DTOs, handles PostgreSQL type variations (UUID/String, Date/LocalDate, Number/BigDecimal) - TranscriptionQueueController: GET /api/transcription/{segmentation-queue, transcription-queue, ready-to-read, weekly-stats}, all guarded by READ_ALL - DocumentService + DocumentController: PATCH /api/documents/{id}/needs-expert toggles the expert flag (WRITE_ALL required) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 10:41:55 +02:00
Marcel	d7b2357834	feat(search): surface summary snippet when summary matched the query Add a summary_snippet column to findEnrichmentData using ts_headline on documents.summary, only when the summary's tsvector matches the query. Expose it via SearchMatchData.summarySnippet / summaryOffsets and render a "Zusammenfassung" / "Summary" / "Resumen" labelled row in the document list, with identical treatment to the transcription snippet row. Fixes the case where a document appeared in search results with no visible match explanation (e.g. searching "frucht" found a document whose summary mentioned "Früchte"). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	091f7e5d25	feat(search): partial-word matching via to_tsquery prefix queries Replace websearch_to_tsquery with a CROSS JOIN LATERAL subquery that appends :* to each lexeme so prefix matches work (e.g. "furchtb" finds "furchtbar"). websearch_to_tsquery still handles the safe tokenisation of user input (stop words, special chars, operators); regexp_replace then adds :* before to_tsquery re-parses the result. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	32f151ff31	feat(search): add snippetOffsets to SearchMatchData and use ts_headline for highlighted snippets - SearchMatchData gains a 6th field snippetOffsets: List<MatchOffset> so the frontend can render highlighted terms inside the transcription snippet without {#html}. - DocumentRepository.findEnrichmentData now calls ts_headline() with chr(1)/chr(2) sentinels instead of returning raw block text; parseHighlight() strips the sentinels and produces clean text + MatchOffset list in one pass. - DocumentService exposes ParsedHighlight and parseHighlight() as public so they can be called from cross-package integration tests. - All related tests updated to the new 6-argument SearchMatchData constructor and to call parseHighlight() for asserting the snippet clean text and offsets. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	162397d4eb	fix(search): make ParsedHighlight and parseHighlight public for cross-package test access Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	bcb2898e5f	perf(search): add index on transcription_blocks.document_id for lateral join Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	2817410f94	test(search): assert matchData key and snippet in controller search response Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	63d1a2e1ff	fix(search): mark documents and total as required in OpenAPI schema Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	f723a83011	feat(search): enrich searchDocuments with per-document match data DocumentService.searchDocuments now returns DocumentSearchResult with matchData populated from findEnrichmentData. Title highlights are parsed from chr(1)/chr(2) delimiters into MatchOffset lists; transcription snippet and sender/receiver/tag match flags are extracted from the same native SQL row. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	c235151075	test(search): add DocumentSearchEnrichmentTest for findEnrichmentData native query Tests lateral join best-block selection, chr(1)/chr(2) headline delimiters, sender/receiver/tag match flags, and null cases for missing relations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	741eebc276	feat(search): add DocumentSearchResult.withMatchData() factory with match overlay map Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	8a5ca6868f	feat(search): add SearchMatchData record for per-document match signals Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	a15b5ebf17	feat(search): add MatchOffset record for character-level highlight positions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 09:10:10 +02:00
Marcel	305f95a572	test(search): add sender name FTS coverage and combined filter test - should_find_document_by_sender_name: symmetric with existing receiver test - fts_combined_with_status_filter_excludes_non_matching_status: verifies hasIds(rankedIds).and(hasStatus(...)) two-phase search works together Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:35:30 +02:00
Marcel	43595aeb8a	refactor(search): replace O(n²) indexOf with HashMap for rank ordering ids.indexOf() scans the full list for each document, giving O(n²) total. Build a Map<UUID, Integer> once at O(n) and use getOrDefault at O(1) per document. Behavior is identical; existing tests remain green. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:35:30 +02:00
Marcel	947d8aeb6c	fix(search): respect DATE sort when text is present — do not override with relevance When a user explicitly selects DATE sort with a text query active, the previous code treated it identically to RELEVANCE, silently discarding the user's sort choice. Remove DATE from the useRankOrder condition so that explicit DATE sort always goes through the standard JPA sort path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:35:30 +02:00
Marcel	7ec3e6170d	feat(fts): backfill search_vector for all existing documents (V35) Fires the BEFORE UPDATE trigger for every documents row, which recomputes the tsvector from all currently-linked metadata, blocks, receivers, and tags. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:35:30 +02:00
Marcel	7d456d8e8b	feat(fts): replace ILIKE hasText with FTS two-phase search and RELEVANCE sort - DocumentSort: add RELEVANCE enum value - DocumentSpecifications: remove hasText() ILIKE, add hasIds(List<UUID>) for FTS-pre-filtered ID sets - DocumentService.searchDocuments(): FTS two-phase path — findRankedIdsByFts() returns ranked UUIDs, hasIds() narrows subsequent Specification query, in-memory re-sort preserves rank order; RELEVANCE is the default when text is present and no explicit non-relevance sort is requested - DocumentSpecificationsTest: remove hasText() tests (Specification removed) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:35:30 +02:00
Marcel	24530cf85b	feat(fts): add search_vector column, GIN index, DB triggers, and FTS repository method (V34) - V34 migration: adds search_vector tsvector column with GIN index - BEFORE INSERT/UPDATE trigger on documents rebuilds vector from title (A), summary + transcription_blocks.text (B), sender/receiver names (C), tag names + location (D) using german FTS config - AFTER triggers on transcription_blocks, document_receivers, document_tags touch the parent document row to re-fire the BEFORE UPDATE trigger - DocumentRepository.findRankedIdsByFts() native query using websearch_to_tsquery - DocumentFtsTest: 12 integration tests covering stemming, trigger sync, ranking, stop words, malformed input, receiver and tag search Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:35:16 +02:00
Marcel	48223d5a3d	devops(backend): pin eclipse-temurin tags, skip test compilation, document jar glob - Pin to eclipse-temurin:21.0.10_7-{jdk,jre}-noble for reproducible builds - Switch -DskipTests to -Dmaven.test.skip=true: skips test compilation entirely, not just execution — faster and avoids build failures from test-only missing classes - Add comment on COPY *.jar explaining why the glob is safe (Spring Boot renames the pre-repackage artifact to .jar.original, leaving only one .jar in target/) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:33:03 +02:00
Marcel	04069c0286	devops(backend): add .dockerignore to exclude target/ from build context Prevents 111MB of compiled output from being sent to the BuildKit daemon on cold builds. Only .mvn/, mvnw, pom.xml, and src/ are needed by the three COPY instructions in the Dockerfile. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:33:03 +02:00
Marcel	3c46d820ad	devops(backend): switch to multi-stage Docker build Replace runtime mvn spring-boot:run with a proper multi-stage build: - Stage 1 (builder): compiles JAR with BuildKit cache mount for ~/.m2 - Stage 2 (runtime): eclipse-temurin:21-jre with only the JAR Removes the backend source volume mount and maven_cache named volume. Deploy with: docker compose up -d --build Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:33:03 +02:00
Marcel	81da127381	refactor(ocr): rename findTop5 to findTop10 for headroom as frontend shows 3 by default Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	f206c0b9e9	test(ocr): add unit tests for triggerSegTraining() — conflict, threshold, happy path, failure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	15e532eb96	refactor(ocr): extract assertNoRunningTraining() to eliminate duplicate guard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	b83465020a	fix(backend): store error rate for segmentation training runs setCer() was called for recognition training but not for segmentation. The OCR service now returns cer = 1 - accuracy for segtrain; persist it so the admin panel can display Fehlerrate for both training types. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	72700bd28f	test(annotations): add Testcontainers integration tests for V33 chk_annotation_bounds [B1] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:36:37 +02:00
Marcel	40c8f548db	docs(annotations): fix ANNOTATION_UPDATE_FAILED Javadoc to reflect 400 status [M3] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:34:55 +02:00
Marcel	a19faa3806	feat(annotations): add @Slf4j and DataIntegrityViolationException catch to updateAnnotation [M2] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:34:03 +02:00
Marcel	f00b470928	test(annotations): add failing test for DataIntegrityViolationException defense [M2 red] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:33:43 +02:00
Marcel	65d606d8bb	test(annotations): add missing height and x boundary validation tests [M4] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:31:07 +02:00
Marcel	4d3207fc27	test(annotations): verify save() is called in updateAnnotation test [M5] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:30:50 +02:00
Marcel	ff231db671	feat(annotations): add PATCH endpoint for annotation resize/move Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:42:08 +02:00
Marcel	1558881c01	feat(annotations): add updateAnnotation service method with partial-update DTO Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:39:50 +02:00
Marcel	26c7181ba4	feat(annotations): add ANNOTATION_UPDATE_FAILED error code Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:38:33 +02:00
Marcel	f76a6c0ee5	migration(annotations): add chk_annotation_bounds CHECK constraint (V33) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:38:11 +02:00
Marcel	22ee3dce68	fix(api): remove duplicate import and align patchTrainingLabel OpenAPI response to 204 Removed duplicate import of org.mockito.ArgumentMatchers.eq from DocumentControllerTest (lines 32+35). Added @ApiResponse(responseCode="204") to patchTrainingLabel so the generated OpenAPI spec matches the actual NoContent response the controller returns. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:07:41 +02:00
Marcel	dc283ba271	fix(training): remove @Transactional from triggerTraining to avoid holding DB connection during OCR HTTP call OcrTrainingService.triggerTraining() and triggerSegTraining() held a DB connection open for the entire ketos training run (potentially minutes), risking connection pool exhaustion. Replaced class-level @Transactional with TransactionTemplate for narrow DB writes: guard+create and result-record each run in their own short transaction; the HTTP call to the OCR service runs between them with no open connection. Also replaces blockRepository.findAll().size() with blockRepository.count() in getTrainingInfo() to avoid loading every block into heap on each poll. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 09:59:12 +02:00
Marcel	7b79dc105b	test(migrations): add Testcontainers integration tests for V23 + V30 constraints V23 introduced a JSONB check constraint (chk_annotation_polygon_quad) requiring polygon arrays to have exactly 4 points. V30 introduced a partial unique index preventing two concurrent RUNNING training runs. These are DB-level invariants that unit tests cannot verify; five Testcontainers tests now assert they are correctly applied by Flyway and enforced by PostgreSQL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:07:17 +02:00
Marcel	287920a982	docs(ocr): document single-node constraint for OCR training Training reloads the Kraken model in-process on the Python service. The DB-level RUNNING constraint prevents concurrent API calls but cannot protect against multi-replica deployments. Added explicit comments in docker-compose.yml and OcrTrainingService to prevent accidental horizontal scaling. See ADR-001. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:01:45 +02:00
Marcel	2b355e748e	fix(ocr): increase presigned URL TTL from 15 min to 1 hour A 100-page document at ~10 s/page takes ~17 min on CPU-only hardware, which could cause the presigned URL to expire mid-OCR job. 1 hour gives ample headroom for any realistic document size in this archive. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:00:52 +02:00
Marcel	2181fe0b50	test(annotations): fix AnnotationServiceTest — add missing TranscriptionBlockRepository mock The cascade-delete commit (`5a5a8b6`) added blockRepository.deleteByAnnotationId() to AnnotationService.deleteAnnotation(), but the test class was not updated to mock TranscriptionBlockRepository. Mockito injected null, causing deleteAnnotation_succeeds_whenOwner to throw NPE. Adds the mock, verifies the cascade call, and adds an inOrder test asserting the block is deleted before the annotation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:00:09 +02:00
Marcel	5a5a8b6e5c	fix(annotations): cascade-delete transcription block when annotation is deleted The DELETE endpoint was returning 500 due to a FK constraint violation. `deleteAnnotation` now calls `blockRepository.deleteByAnnotationId()` before removing the annotation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 22:31:02 +02:00
Marcel	49c9022285	fix(training): switch to PAGE XML format for kurrent recognition training Kraken 7 removed support for the legacy `path` format (image + .gt.txt pairs) in VGSLRecognitionDataModule despite the CLI still advertising it. Switching to PAGE XML (-f page) format which is the supported standard. - Java export now writes .xml alongside .png (PAGE XML with TextLine, Baseline at 75% height, and Unicode transcription) - XML special characters in transcription text are escaped (& < >) - Python trainer globs *.xml and passes -f page to ketos train - Regenerated frontend API types to include cer/loss/accuracy/epochs on OcrTrainingRun (were missing, causing empty CER column in history) - Updated and extended TrainingDataExportServiceTest Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 21:45:08 +02:00
Marcel	22954f348a	feat(training): track and display CER per training run After each training run, the Character Error Rate (CER = 1 - accuracy), loss, accuracy, and epoch count are now stored on the OcrTrainingRun record and shown in the training history table. Also adds the missing POST /api/ocr/segtrain endpoint and the triggerSegTraining service method so the segmentation training card can actually trigger training. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 19:01:10 +02:00
Marcel	a99afef319	fix(training): only count reviewed blocks as checked text for recognition Previously all MANUAL blocks counted as eligible training data, even ones where text was filled in by guided OCR but never explicitly reviewed. This caused segmentation and recognition counts to always match. Now only reviewed=true blocks qualify for recognition training, so the counts properly reflect: segments = all drawn annotation boxes, checked text = only boxes where the user has verified the transcription. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 18:00:59 +02:00
Marcel	063095f58c	fix(training): count segmentation blocks regardless of text content The findSegmentationBlocks query was filtering out blocks with non-empty text. Segmentation training only needs annotation geometry (polygon/bbox), not transcription text — so any MANUAL block on a KURRENT_SEGMENTATION document should count, regardless of whether it has text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 17:14:40 +02:00
Marcel	b6f74fd6fc	refactor(annotations): remove overlap check to allow intersecting regions Historical letter lines often intersect, so the system must support overlapping annotation regions. Removed the overlap guard from createAnnotation(), deleted ErrorCode.ANNOTATION_OVERLAP, and cleaned up all tests and frontend error mappings that referenced it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 16:48:18 +02:00
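
Note on 32f151ff31 (sentinel-delimited highlights): the commit describes parseHighlight() as stripping chr(1)/chr(2) sentinels from ts_headline() output and producing clean text plus a MatchOffset list in one pass. The repository's actual code is not shown in this log, so the following is only a minimal sketch of that idea. The class name HighlightParser and the record shapes (MatchOffset with start and length, ParsedHighlight with text and offsets) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class HighlightParser {
    // Hypothetical stand-ins for the records named in the log; field names are assumed.
    record MatchOffset(int start, int length) {}
    record ParsedHighlight(String text, List<MatchOffset> offsets) {}

    // chr(1) opens a highlighted term, chr(2) closes it.
    static final char START = '\u0001';
    static final char STOP = '\u0002';

    // Single pass: copy ordinary characters to the clean text and record
    // where each highlighted run starts and how long it is in that clean text.
    static ParsedHighlight parseHighlight(String raw) {
        StringBuilder clean = new StringBuilder();
        List<MatchOffset> offsets = new ArrayList<>();
        int openAt = -1; // position in the clean text where the current highlight began
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == START) {
                openAt = clean.length();
            } else if (c == STOP) {
                if (openAt >= 0) { // ignore a stray closing sentinel
                    offsets.add(new MatchOffset(openAt, clean.length() - openAt));
                    openAt = -1;
                }
            } else {
                clean.append(c);
            }
        }
        return new ParsedHighlight(clean.toString(), offsets);
    }

    public static void main(String[] args) {
        ParsedHighlight p = parseHighlight("Es war \u0001furchtbar\u0002 kalt");
        System.out.println(p.text());
        System.out.println(p.offsets());
    }
}
```

Offsets are expressed against the cleaned text, so a frontend could wrap the matched ranges itself without injecting raw HTML, which is the motivation the commit gives for avoiding {#html}.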


252 Commits
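
Note on 43595aeb8a (rank ordering): the commit replaces a per-document ids.indexOf() scan with a Map<UUID, Integer> built once and read with getOrDefault. The sketch below illustrates that change under stated assumptions; the class RankOrder, the Doc record, and the method name sortByRank are hypothetical and are not taken from the repository.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class RankOrder {
    // Hypothetical minimal document type, for illustration only.
    record Doc(UUID id, String title) {}

    // Reorders docs to follow the order of rankedIds (best match first).
    static List<Doc> sortByRank(List<UUID> rankedIds, List<Doc> docs) {
        // Build the id -> position map once. putIfAbsent keeps the first
        // position of a repeated id, which matches what indexOf would return.
        Map<UUID, Integer> rank = new HashMap<>();
        for (int i = 0; i < rankedIds.size(); i++) {
            rank.putIfAbsent(rankedIds.get(i), i);
        }
        List<Doc> sorted = new ArrayList<>(docs);
        // Each lookup is a constant-time map read; ids missing from the
        // ranking sort to the end.
        sorted.sort(Comparator.comparingInt(
                (Doc d) -> rank.getOrDefault(d.id(), Integer.MAX_VALUE)));
        return sorted;
    }

    public static void main(String[] args) {
        UUID a = UUID.randomUUID(), b = UUID.randomUUID(), c = UUID.randomUUID();
        List<Doc> docs = List.of(new Doc(c, "c"), new Doc(a, "a"), new Doc(b, "b"));
        for (Doc d : sortByRank(List.of(a, b, c), docs)) {
            System.out.println(d.title());
        }
    }
}
```

List.sort is stable, so documents that share a rank (for example, several ids absent from the ranking) keep their incoming relative order.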