familienarchiv

Author	SHA1	Message	Date
Marcel	305f95a572	test(search): add sender name FTS coverage and combined filter test - should_find_document_by_sender_name — symmetric with existing receiver test - fts_combined_with_status_filter_excludes_non_matching_status — verifies hasIds(rankedIds).and(hasStatus(...)) two-phase search works together Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:35:30 +02:00
Marcel	947d8aeb6c	fix(search): respect DATE sort when text is present — do not override with relevance When a user explicitly selects DATE sort with a text query active, the previous code treated it identically to RELEVANCE, silently discarding the user's sort choice. Remove DATE from the useRankOrder condition so that explicit DATE sort always goes through the standard JPA sort path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:35:30 +02:00
Marcel	7d456d8e8b	feat(fts): replace ILIKE hasText with FTS two-phase search and RELEVANCE sort - DocumentSort: add RELEVANCE enum value - DocumentSpecifications: remove hasText() ILIKE, add hasIds(List<UUID>) for FTS-pre-filtered ID sets - DocumentService.searchDocuments(): FTS two-phase path — findRankedIdsByFts() returns ranked UUIDs, hasIds() narrows subsequent Specification query, in-memory re-sort preserves rank order; RELEVANCE is the default when text is present and no explicit non-relevance sort is requested - DocumentSpecificationsTest: remove hasText() tests (Specification removed) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:35:30 +02:00
Marcel	24530cf85b	feat(fts): add search_vector column, GIN index, DB triggers, and FTS repository method (V34) - V34 migration: adds search_vector tsvector column with GIN index - BEFORE INSERT/UPDATE trigger on documents rebuilds vector from title (A), summary + transcription_blocks.text (B), sender/receiver names (C), tag names + location (D) using german FTS config - AFTER triggers on transcription_blocks, document_receivers, document_tags touch the parent document row to re-fire the BEFORE UPDATE trigger - DocumentRepository.findRankedIdsByFts() native query using websearch_to_tsquery - DocumentFtsTest: 12 integration tests covering stemming, trigger sync, ranking, stop words, malformed input, receiver and tag search Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:35:16 +02:00
Marcel	81da127381	refactor(ocr): rename findTop5 to findTop10 for headroom as frontend shows 3 by default Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	f206c0b9e9	test(ocr): add unit tests for triggerSegTraining() — conflict, threshold, happy path, failure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 21:17:53 +02:00
Marcel	72700bd28f	test(annotations): add Testcontainers integration tests for V33 chk_annotation_bounds [B1] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:36:37 +02:00
Marcel	f00b470928	test(annotations): add failing test for DataIntegrityViolationException defense [M2 red] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:33:43 +02:00
Marcel	65d606d8bb	test(annotations): add missing height and x boundary validation tests [M4] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:31:07 +02:00
Marcel	4d3207fc27	test(annotations): verify save() is called in updateAnnotation test [M5] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 14:30:50 +02:00
Marcel	ff231db671	feat(annotations): add PATCH endpoint for annotation resize/move Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:42:08 +02:00
Marcel	1558881c01	feat(annotations): add updateAnnotation service method with partial-update DTO Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:39:50 +02:00
Marcel	22ee3dce68	fix(api): remove duplicate import and align patchTrainingLabel OpenAPI response to 204 Removed duplicate import of org.mockito.ArgumentMatchers.eq from DocumentControllerTest (lines 32+35). Added @ApiResponse(responseCode="204") to patchTrainingLabel so the generated OpenAPI spec matches the actual NoContent response the controller returns. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 10:07:41 +02:00
Marcel	dc283ba271	fix(training): remove @Transactional from triggerTraining to avoid holding DB connection during OCR HTTP call OcrTrainingService.triggerTraining() and triggerSegTraining() held a DB connection open for the entire ketos training run (potentially minutes), risking connection pool exhaustion. Replaced class-level @Transactional with TransactionTemplate for narrow DB writes: guard+create and result-record each run in their own short transaction; the HTTP call to the OCR service runs between them with no open connection. Also replaces blockRepository.findAll().size() with blockRepository.count() in getTrainingInfo() to avoid loading every block into heap on each poll. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 09:59:12 +02:00
Marcel	7b79dc105b	test(migrations): add Testcontainers integration tests for V23 + V30 constraints V23 introduced a JSONB check constraint (chk_annotation_polygon_quad) requiring polygon arrays to have exactly 4 points. V30 introduced a partial unique index preventing two concurrent RUNNING training runs. These are DB-level invariants that unit tests cannot verify — five Testcontainers tests now assert they are correctly applied by Flyway and enforced by PostgreSQL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:07:17 +02:00
Marcel	2181fe0b50	test(annotations): fix AnnotationServiceTest — add missing TranscriptionBlockRepository mock The cascade-delete commit (`5a5a8b6`) added blockRepository.deleteByAnnotationId() to AnnotationService.deleteAnnotation(), but the test class was not updated to mock TranscriptionBlockRepository. Mockito injected null, causing deleteAnnotation_succeeds_whenOwner to throw NPE. Adds the mock, verifies the cascade call, and adds an inOrder test asserting the block is deleted before the annotation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 23:00:09 +02:00
Marcel	49c9022285	fix(training): switch to PAGE XML format for kurrent recognition training Kraken 7 removed support for the legacy `path` format (image + .gt.txt pairs) in VGSLRecognitionDataModule despite the CLI still advertising it. Switching to PAGE XML (-f page) format which is the supported standard. - Java export now writes .xml alongside .png (PAGE XML with TextLine, Baseline at 75% height, and Unicode transcription) - XML special characters in transcription text are escaped (& < >) - Python trainer globs *.xml and passes -f page to ketos train - Regenerated frontend API types to include cer/loss/accuracy/epochs on OcrTrainingRun (were missing, causing empty CER column in history) - Updated and extended TrainingDataExportServiceTest Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 21:45:08 +02:00
Marcel	22954f348a	feat(training): track and display CER per training run After each training run, the Character Error Rate (CER = 1 - accuracy), loss, accuracy, and epoch count are now stored on the OcrTrainingRun record and shown in the training history table. Also adds the missing POST /api/ocr/segtrain endpoint and the triggerSegTraining service method so the segmentation training card can actually trigger training. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 19:01:10 +02:00
Marcel	a99afef319	fix(training): only count reviewed blocks as checked text for recognition Previously all MANUAL blocks counted as eligible training data, even ones where text was filled in by guided OCR but never explicitly reviewed. This caused segmentation and recognition counts to always match. Now only reviewed=true blocks qualify for recognition training, so the counts properly reflect: segments = all drawn annotation boxes, checked text = only boxes where the user has verified the transcription. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 18:00:59 +02:00
Marcel	b6f74fd6fc	refactor(annotations): remove overlap check to allow intersecting regions Historical letter lines often intersect, so the system must support overlapping annotation regions. Removed the overlap guard from createAnnotation(), deleted ErrorCode.ANNOTATION_OVERLAP, and cleaned up all tests and frontend error mappings that referenced it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 16:48:18 +02:00
Marcel	8618e520b5	fix(ocr): fill empty MANUAL blocks in guided OCR mode When a user draws annotation boxes to mark OCR regions, the blocks are created with source=MANUAL and empty text. upsertGuidedBlock was protecting all MANUAL blocks unconditionally, so guided OCR silently produced no output for these drawn-but-empty blocks. Changed the guard to only protect non-empty MANUAL blocks — empty ones are treated like OCR blocks and get their text filled in. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 16:25:23 +02:00
Marcel	ee58b63517	feat(ocr): add guided OCR mode using existing annotation regions When a document has manually drawn annotation boxes, the user can now enable "Nur annotierte Bereiche" in the OCR trigger panel. The engine skips layout detection entirely and runs recognition only within the pre-drawn bounding boxes, preserving manual transcription blocks. - Python: adds OcrRegion model, extend OcrRequest/OcrBlock; guided branch in /ocr/stream groups by page and crops each region - Engines: add extract_region_text() to both Kraken and Surya - Java: adds OcrBlockResult.annotationId, OcrClient.OcrRegion, TriggerOcrDTO.useExistingAnnotations; OcrAsyncRunner dispatches to upsertGuidedBlock when annotationId is present; OcrService threads the flag through to runSingleDocument - TranscriptionService: adds upsertGuidedBlock (creates, updates OCR, or preserves MANUAL blocks) - Frontend: guided OCR toggle in OcrTrigger shown when blocks exist; skips destructive-replace confirmation in guided mode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 15:57:54 +02:00
Marcel	9b2f91ee59	feat(training): add segmentation training pipeline and complete Part 6 - Add /segtrain endpoint to OCR service (ZIP upload, ketos.segtrain, backup rotation, in-process model reload) - Add segtrainModel() to OcrClient and RestClientOcrClient (10-min timeout, X-Training-Token header) - Add SegmentationTrainingExportService: PAGE XML export with polygon de-normalization and per-page PNG rendering via PDFBox - Add GET /api/ocr/segmentation-training-data/export endpoint - Make TranscriptionBlock.text nullable for segmentation-only blocks (V31 migration) - Add Paraglide i18n translation keys for all training UI strings (de/en/es) - Pass source prop from TranscriptionEditView to TranscriptionBlock Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 15:15:17 +02:00
Marcel	86e9c05aaf	feat(training): add Paraglide i18n to training UI components and wire SegmentationTrainingCard - Convert TrainingHistory, OcrTrainingCard, SegmentationTrainingCard, and TranscriptionBlock "Nur Segmentierung" badge to use Paraglide message keys - Add availableSegBlocks to TrainingInfoResponse to expose segmentation block count in the training info endpoint - Wire SegmentationTrainingCard into admin/system page below OCR training card - Update api.ts with availableSegBlocks field Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 15:14:27 +02:00
Marcel	88e005eb49	feat(ocr): add training history + POST /train + GET /training-info endpoints - OcrTrainingRun entity + V30 migration (partial unique index prevents concurrent runs at DB level) - OcrTrainingService: concurrent-run guard, 5-block threshold, MDC log correlation, orphan recovery on ApplicationReadyEvent - POST /api/ocr/train (ADMIN) + GET /api/ocr/training-info (ADMIN) - TRAINING_ALREADY_RUNNING ErrorCode - 6 OcrTrainingServiceTest + 6 OcrControllerTest tests for the new endpoints Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 14:47:56 +02:00
Marcel	bc97a2dade	feat(ocr): add /train endpoint to OCR service and OcrClient.trainModel() - POST /train in ocr-service with ZIP Slip validation, TemporaryDirectory, ketos transfer learning, timestamped backups (keep last 3), in-process reload - X-Training-Token auth (no-op in dev when TRAINING_TOKEN env is empty) - trainModel() in OcrClient interface + RestClientOcrClient (10-min timeout, multipart upload, forwards X-Training-Token when configured) - TRAINING_TOKEN env var wired in docker-compose; --workers 2 in Dockerfile so /health stays responsive during synchronous training Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 14:40:53 +02:00
Marcel	cfa3c4df67	feat(training): add recognition training data export - TrainingDataExportService: PDFBox rendering at 300 DPI, crop by annotation coordinates, ZIP with <uuid>.png + <uuid>.gt.txt pairs - Skips documents with missing S3 files (logs WARN, continues) - GET /api/ocr/training-data/export (ADMIN); 204 when no enrolled blocks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 14:35:06 +02:00
Marcel	fdf1eb92ad	feat(training): add document-level training enrollment - V29 migration: document_training_labels join table - TrainingLabel enum: KURRENT_RECOGNITION, KURRENT_SEGMENTATION - Document.trainingLabels @ElementCollection - DocumentService.addTrainingLabel / removeTrainingLabel - PATCH /api/documents/{id}/training-labels (WRITE_ALL) - Auto-enroll on Kurrent OCR trigger (OcrService.startOcr) - TranscriptionEditView: enrollment chips in panel footer - JPQL queries updated to use MEMBER OF trainingLabels Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 14:30:51 +02:00
Marcel	9282e46a02	fix(ocr): handle unknown NDJSON fields with @JsonIgnoreProperties Added @JsonIgnoreProperties(ignoreUnknown = true) to OcrBlockResult so new fields from the Python OCR service don't crash the Java parser, while keeping FAIL_ON_UNKNOWN_PROPERTIES strict globally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:27:20 +02:00
Marcel	caae2ead81	refactor(ocr): route block lifecycle through TranscriptionService OcrAsyncRunner was bypassing TranscriptionService — building blocks directly and calling blockRepository.save(), skipping sanitizeText() and saveVersion(). Also replaced N individual deleteBlock() calls with a single bulk deleteAllBlocksByDocument() for OCR re-runs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:27:01 +02:00
Marcel	6a0fd25662	fix(ocr): persist scriptType override via DocumentService transaction OcrService.startOcr() was setting scriptType on a detached entity, silently losing the mutation. Added DocumentService.updateScriptType() with @Transactional to persist the change properly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:26:37 +02:00
Marcel	2d43f09172	refactor(ocr): move repository access from OcrController into OcrService OcrController was injecting OcrJobRepository and OcrJobDocumentRepository directly, violating the Controller → Service → Repository layering rule. Moved getJob() and getDocumentOcrStatus() logic into OcrService. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:26:14 +02:00
Marcel	292dc66f3c	feat(ocr): rewrite runSingleDocument to use streamBlocks with per-page progress Replace the single extractBlocks() call with streamBlocks() that processes pages incrementally. Each page's blocks are persisted immediately via createSingleBlock(). Progress updates use the ANALYZING_PAGE:current:total:blocks format. Per-page errors are logged at WARN level without failing the entire job. The batch path (processDocument) remains on the old extractBlocks() path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:07:06 +02:00
Marcel	93c3154b3c	feat(ocr): implement NDJSON streaming in RestClientOcrClient Add streamBlocks() that POSTs to /ocr/stream and parses the NDJSON response line by line with a dedicated ObjectMapper. Falls back to the old /ocr endpoint via the default method when /ocr/stream returns 404. Uses a separate HttpClient with 5-minute request timeout for streaming. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:03:12 +02:00
Marcel	641e91d5a3	feat(ocr): add default streamBlocks method to OcrClient interface The default method synthesizes Start/Page/Done events from extractBlocks() results, providing backward compatibility for implementations that don't support streaming natively. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:01:26 +02:00
Marcel	e21d01e10b	feat(ocr): add OcrStreamEvent sealed interface with Start/Page/Error/Done records Defines the event types for NDJSON streaming OCR. Uses Java 21 sealed interface with record subtypes for exhaustive pattern matching in the consumer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:00:02 +02:00
Marcel	c1befd3fa3	fix(ocr): resume polling on page reload + track single-doc job status Single-document OCR now creates an OcrJobDocument row so GET /api/documents/{id}/ocr-status can find running jobs. OcrAsyncRunner updates the job document status (RUNNING → DONE/FAILED). Frontend checks OCR status when entering transcription mode — if a job is running, resumes polling and shows the spinner. Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 23:16:59 +02:00
Marcel	dd175d09e2	refactor(ocr): make single-document OCR async, fix circular dependency OcrService → OcrAsyncRunner was circular. Fixed by moving all OCR processing logic (processDocument, clearExistingBlocks, createBlocks) into OcrAsyncRunner. OcrService is now a thin entry point that validates, creates the job, and dispatches to OcrAsyncRunner. Architecture: - OcrService: validates document, checks health, creates OcrJob, delegates - OcrAsyncRunner: @Async processDocument + runSingleDocument + runBatch - OcrBatchService: creates job + job documents, delegates to OcrAsyncRunner - No circular dependencies Single-document OCR is now async (returns jobId immediately). Frontend polls GET /api/ocr/jobs/{jobId} every 3s until DONE/FAILED. 816 backend tests pass, 687 frontend tests pass. Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 22:55:52 +02:00
Marcel	4500c99e40	fix(ocr): use presigned URLs for MinIO access from OCR service The OCR service was getting 403 Forbidden because it tried to download PDFs from MinIO using plain internal URLs without authentication. MinIO buckets are private. - Add S3Presigner bean to MinioConfig - FileService.generatePresignedUrl(): generates 15-min presigned URLs - OcrService uses presigned URLs instead of plain internal URLs - Remove unused s3InternalUrl / bucketName @Value fields from OcrService Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 22:16:52 +02:00
Marcel	3aaec01421	feat(transcription): add source/reviewed fields for training pipeline - BlockSource enum: MANUAL, OCR - V26 migration adds source + reviewed columns to transcription_blocks - OcrService sets source=OCR when creating blocks - TranscriptionService.reviewBlock() toggles the reviewed flag - PUT /api/documents/{id}/transcription-blocks/{blockId}/review endpoint - 5 new tests: reviewBlock toggle/untoggle/notfound, controller, OcrService source=OCR verification The reviewed flag enables the Kraken fine-tuning pipeline: only blocks marked as reviewed by a human are exported as training data. Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 21:44:51 +02:00
Marcel	aea46c5fd0	feat(ocr): add OcrService, OcrBatchService, OcrProgressService, OcrController - OcrService: single-document OCR (health check, block clearing, presigned URL, annotation + block creation) - OcrBatchService: batch processing with @Async, per-document status tracking, SKIPPED for PLACEHOLDER documents, failure isolation - OcrProgressService: SSE emitter registry per job ID with 5-min timeout - OcrController: POST /api/documents/{id}/ocr (WRITE_ALL), POST /api/ocr/batch (ADMIN), GET /api/ocr/jobs/{id} (READ_ALL), GET /api/ocr/jobs/{id}/progress (SSE), GET /api/documents/{id}/ocr-status 19 tests: 6 OcrService, 4 OcrBatchService, 3 OcrProgressService, 6 OcrController Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:24:15 +02:00
Marcel	c19c41f812	feat(annotations): add createOcrAnnotation that skips overlap check OCR creates many adjacent text line annotations that would fail the existing overlap check. createOcrAnnotation() accepts an optional polygon and bypasses overlap detection entirely. Refs #227 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:12:11 +02:00
Marcel	878a90a86d	feat(annotations): add polygon JSONB support for quadrilateral shapes - V23 migration adds polygon JSONB column with 4-point CHECK constraint - PolygonConverter: AttributeConverter for List<List<Double>> <-> JSONB - @UniquePoints custom validator rejects duplicate coordinates - CreateAnnotationDTO: validated optional polygon field - DocumentAnnotation entity: polygon field with converter Refs #227 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:10:35 +02:00
Marcel	e69aaa6a8c	fix: classify Steuerfinanzamt and Reichsfechtschule as institutions Add "amt" and "schule" suffixes to INSTITUTION_END in PersonTypeClassifier so German government offices and schools are auto-classified on import. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-08 20:59:17 +02:00
Marcel	5106d277f1	test(service): add integration test for findOrCreateByAlias classification Testcontainers test verifying: SKIP returns null with no DB record, INSTITUTION/GROUP store full name in lastName with null firstName and correct personType, PERSON splits name normally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:29:20 +02:00
Marcel	ac545ecdaa	refactor: address PR review concerns - Remove Architekt from WORD_PREFIXES (classifier handles it) - Use Objects.equals for null-safe firstName/lastName comparison - Remove unused trimmed variable in PersonTypeClassifier - Fix containsWord to loop through all occurrences (finds "Eltern" in "Nachbareltern Eltern") - Extract DisplayNameFormatter utility shared by Person and PersonSummaryDTO to eliminate display logic duplication Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:25:06 +02:00
Marcel	73640ef5b6	feat(parser): implement stripTitle for known prefixes Two-pass title stripping with loop for stacked titles: - Dot-prefixes (Dr., Prof.) matched without trailing space - Word-prefixes (Tante, Frau, Schwester, etc.) matched at word boundary - Stacked titles like "Prof. Dr. Muller" handled correctly - Single token after title strip goes to lastName (not firstName) Add 5 "von" last names to KNOWN_LAST_NAMES for correct splitting of entries like "Freifrau von Massenbach". 15 new test cases + updated 3 existing tests for title behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:15:18 +02:00
Marcel	a3da5731d0	feat(service): integrate PersonTypeClassifier into findOrCreateByAlias Classify raw name before processing. SKIP returns null (no Person created). INSTITUTION/GROUP skip split() and store full name in lastName with firstName=null and appropriate personType. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:06:49 +02:00
Marcel	68f0c4c4b9	feat(service): add PersonTypeClassifier with keyword heuristics Static classify() method uses position-aware keyword matching: - SKIP: Briefumschlag, Kondolenzbriefe, Hochzeitsgedicht (start) - INSTITUTION: Firma, Architekt (start), GmbH, Co (end) - GROUP: Familie, Comité, Comite, Geschwister, Gesellschafter, Garde, Mitarbeiter (start), Eltern, Kinder, Schwiegereltern (word boundary) - PERSON: default for all other inputs Case-insensitive. 25 parameterized test cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:03:53 +02:00
Marcel	e49ae5de29	fix(parser): preserve annotation parens for single-person inputs Move paren extraction in parseReceivers() after the multi-separator check so single-person entries like "Clara de Gruyter(*1871)" keep their parens intact for split()'s annotation extraction. Multi-person entries like "Hedi und Tutu (Gruber)" still use parens as shared last-name override. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:00:34 +02:00

144 Commits
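
Commit bc97a2dade mentions ZIP Slip validation for the training upload in the Python OCR service. The log does not show how that check is written. The following is a minimal sketch of the general technique, assuming only the standard-library `zipfile` module; the function name `safe_extract` and the error message are hypothetical and not taken from the repository.

```python
import os
import zipfile


def safe_extract(archive: zipfile.ZipFile, dest_dir: str) -> list[str]:
    """Extract `archive` into `dest_dir`, rejecting entries that would escape it.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    root = os.path.realpath(dest_dir)
    members = archive.infolist()

    # Validate every entry before writing anything to disk.
    for member in members:
        target = os.path.realpath(os.path.join(root, member.filename))
        # ZIP Slip check: the resolved target must stay inside the root.
        # "../x" and absolute paths both resolve outside of it.
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"unsafe path in archive: {member.filename!r}")

    archive.extractall(root)
    return [m.filename for m in members]
```

Validating all entries up front means a malicious archive is rejected as a whole instead of being partially extracted. In line with the `TemporaryDirectory` usage the commit mentions, `dest_dir` would typically be a directory created by `tempfile.TemporaryDirectory()`, so the extracted files are cleaned up when training finishes.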