familienarchiv

Author	SHA1	Message	Date
Marcel	88e005eb49	feat(ocr): add training history + POST /train + GET /training-info endpoints - OcrTrainingRun entity + V30 migration (partial unique index prevents concurrent runs at DB level) - OcrTrainingService: concurrent-run guard, 5-block threshold, MDC log correlation, orphan recovery on ApplicationReadyEvent - POST /api/ocr/train (ADMIN) + GET /api/ocr/training-info (ADMIN) - TRAINING_ALREADY_RUNNING ErrorCode - 6 OcrTrainingServiceTest + 6 OcrControllerTest tests for the new endpoints Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 14:47:56 +02:00
Marcel	bc97a2dade	feat(ocr): add /train endpoint to OCR service and OcrClient.trainModel() - POST /train in ocr-service with ZIP Slip validation, TemporaryDirectory, ketos transfer learning, timestamped backups (keep last 3), in-process reload - X-Training-Token auth (no-op in dev when TRAINING_TOKEN env is empty) - trainModel() in OcrClient interface + RestClientOcrClient (10-min timeout, multipart upload, forwards X-Training-Token when configured) - TRAINING_TOKEN env var wired in docker-compose; --workers 2 in Dockerfile so /health stays responsive during synchronous training Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 14:40:53 +02:00
Marcel	cfa3c4df67	feat(training): add recognition training data export - TrainingDataExportService: PDFBox rendering at 300 DPI, crop by annotation coordinates, ZIP with <uuid>.png + <uuid>.gt.txt pairs - Skips documents with missing S3 files (logs WARN, continues) - GET /api/ocr/training-data/export (ADMIN); 204 when no enrolled blocks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 14:35:06 +02:00
Marcel	fdf1eb92ad	feat(training): add document-level training enrollment - V29 migration: document_training_labels join table - TrainingLabel enum: KURRENT_RECOGNITION, KURRENT_SEGMENTATION - Document.trainingLabels @ElementCollection - DocumentService.addTrainingLabel / removeTrainingLabel - PATCH /api/documents/{id}/training-labels (WRITE_ALL) - Auto-enroll on Kurrent OCR trigger (OcrService.startOcr) - TranscriptionEditView: enrollment chips in panel footer - JPQL queries updated to use MEMBER OF trainingLabels Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 14:30:51 +02:00
Marcel	dd47a48d90	feat(ocr): add unique constraint on (job_id, document_id) Prevents the same document from being added to an OCR job twice. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:28:18 +02:00
Marcel	08b1cd5dac	fix(ocr): reduce async queue capacity from 100 to 10 Queue capacity of 100 is disproportionate for 2 worker threads — a backed-up queue would represent hours of unprocessed OCR jobs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:27:58 +02:00
Marcel	5a97316940	fix(ocr): log warning when user ID resolution fails The resolveUserId() catch block was silently swallowing exceptions, making auth failures invisible in logs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:27:39 +02:00
Marcel	9282e46a02	fix(ocr): handle unknown NDJSON fields with @JsonIgnoreProperties Added @JsonIgnoreProperties(ignoreUnknown = true) to OcrBlockResult so new fields from the Python OCR service don't crash the Java parser, while keeping FAIL_ON_UNKNOWN_PROPERTIES strict globally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:27:20 +02:00
Marcel	caae2ead81	refactor(ocr): route block lifecycle through TranscriptionService OcrAsyncRunner was bypassing TranscriptionService — building blocks directly and calling blockRepository.save(), skipping sanitizeText() and saveVersion(). Also replaced N individual deleteBlock() calls with a single bulk deleteAllBlocksByDocument() for OCR re-runs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:27:01 +02:00
Marcel	6a0fd25662	fix(ocr): persist scriptType override via DocumentService transaction OcrService.startOcr() was setting scriptType on a detached entity, silently losing the mutation. Added DocumentService.updateScriptType() with @Transactional to persist the change properly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:26:37 +02:00
Marcel	2d43f09172	refactor(ocr): move repository access from OcrController into OcrService OcrController was injecting OcrJobRepository and OcrJobDocumentRepository directly, violating the Controller → Service → Repository layering rule. Moved getJob() and getDocumentOcrStatus() logic into OcrService. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:26:14 +02:00
Marcel	84aca240ea	fix(ocr): remove misleading ANALYZING progress before streaming starts The ANALYZING message appeared while the Python service was still downloading the PDF and loading models. Remove it so the LOADING message ("Lade Modell und Dokument…") stays visible until the first ANALYZING_PAGE event arrives from the stream. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:40:54 +02:00
Marcel	292dc66f3c	feat(ocr): rewrite runSingleDocument to use streamBlocks with per-page progress Replace the single extractBlocks() call with streamBlocks() that processes pages incrementally. Each page's blocks are persisted immediately via createSingleBlock(). Progress updates use the ANALYZING_PAGE:current:total:blocks format. Per-page errors are logged at WARN level without failing the entire job. The batch path (processDocument) remains on the old extractBlocks() path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:07:06 +02:00
Marcel	6823973429	refactor(ocr): extract createSingleBlock from createTranscriptionBlocks Enable per-page block creation during streaming by extracting the loop body into a package-private createSingleBlock() method with an explicit sortOrder parameter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:04:02 +02:00
Marcel	93c3154b3c	feat(ocr): implement NDJSON streaming in RestClientOcrClient Add streamBlocks() that POSTs to /ocr/stream and parses the NDJSON response line by line with a dedicated ObjectMapper. Falls back to the old /ocr endpoint via the default method when /ocr/stream returns 404. Uses a separate HttpClient with 5-minute request timeout for streaming. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:03:12 +02:00
Marcel	641e91d5a3	feat(ocr): add default streamBlocks method to OcrClient interface The default method synthesizes Start/Page/Done events from extractBlocks() results, providing backward compatibility for implementations that don't support streaming natively. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:01:26 +02:00
Marcel	e21d01e10b	feat(ocr): add OcrStreamEvent sealed interface with Start/Page/Error/Done records Defines the event types for NDJSON streaming OCR. Uses Java 21 sealed interface with record subtypes for exhaustive pattern matching in the consumer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 10:00:02 +02:00
Marcel	971527a50e	feat(ocr): show translated progress messages during OCR processing Backend sends progress codes (PREPARING, LOADING, ANALYZING, CREATING_BLOCKS:N, DONE:N, ERROR) via OcrJob.progressMessage. Frontend translates them via Paraglide (de/en/es) and displays below the spinner. - V27 migration: adds progress_message column to ocr_jobs - OcrAsyncRunner updates progress at each phase - Poll interval reduced to 2s for snappier updates Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 23:31:23 +02:00
Marcel	c1befd3fa3	fix(ocr): resume polling on page reload + track single-doc job status Single-document OCR now creates an OcrJobDocument row so GET /api/documents/{id}/ocr-status can find running jobs. OcrAsyncRunner updates the job document status (RUNNING → DONE/FAILED). Frontend checks OCR status when entering transcription mode — if a job is running, resumes polling and shows the spinner. Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 23:16:59 +02:00
Marcel	2db1b73d5d	fix(ocr): force HTTP/1.1 on RestClient to OCR service JDK HttpClient defaults to HTTP/2 with upgrade negotiation. Uvicorn rejects the upgrade ('Unsupported upgrade request'), causing the request body to be lost and a 422 'Field required' from FastAPI. Force HTTP/1.1 since the OCR service is internal and doesn't need h2. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 23:08:11 +02:00
Marcel	9e01009e3d	fix(async): revert to AbortPolicy — CallerRunsPolicy blocks requests CallerRunsPolicy would cause the HTTP request to hang for minutes if the queue is full. AbortPolicy with queue=100 is safe — the queue will never realistically fill for a family archive. If it somehow does, a clear error is better than a silent multi-minute hang. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 23:02:58 +02:00
Marcel	0bfaa7540b	fix(async): queue 100 tasks + CallerRunsPolicy instead of abort Better to wait than to error. Queue capacity 100 holds plenty of OCR jobs. CallerRunsPolicy means if the queue is somehow full, the request blocks instead of getting rejected with an exception. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 23:01:37 +02:00
Marcel	b6d928e1c5	fix(async): increase thread pool to 2 threads + queue of 10 The old pool (1 thread, queue=1) meant OCR blocked all other async tasks (imports). Now 2 concurrent async tasks with a queue of 10 — enough for OCR + import to run in parallel. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 22:59:31 +02:00
Marcel	aa50951320	fix(ocr): set 10-minute read timeout on RestClientOcrClient Default RestClient timeout was 10 seconds — OCR on CPU takes minutes. Set connect timeout to 10s, read timeout to 10 minutes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 22:58:00 +02:00
Marcel	dd175d09e2	refactor(ocr): make single-document OCR async, fix circular dependency OcrService → OcrAsyncRunner was circular. Fixed by moving all OCR processing logic (processDocument, clearExistingBlocks, createBlocks) into OcrAsyncRunner. OcrService is now a thin entry point that validates, creates the job, and dispatches to OcrAsyncRunner. Architecture: - OcrService: validates document, checks health, creates OcrJob, delegates - OcrAsyncRunner: @Async processDocument + runSingleDocument + runBatch - OcrBatchService: creates job + job documents, delegates to OcrAsyncRunner - No circular dependencies Single-document OCR is now async (returns jobId immediately). Frontend polls GET /api/ocr/jobs/{jobId} every 3s until DONE/FAILED. 816 backend tests pass, 687 frontend tests pass. Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 22:55:52 +02:00
Marcel	4500c99e40	fix(ocr): use presigned URLs for MinIO access from OCR service The OCR service was getting 403 Forbidden because it tried to download PDFs from MinIO using plain internal URLs without authentication. MinIO buckets are private. - Add S3Presigner bean to MinioConfig - FileService.generatePresignedUrl(): generates 15-min presigned URLs - OcrService uses presigned URLs instead of plain internal URLs - Remove unused s3InternalUrl / bucketName @Value fields from OcrService Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 22:16:52 +02:00
Marcel	3aaec01421	feat(transcription): add source/reviewed fields for training pipeline - BlockSource enum: MANUAL, OCR - V26 migration adds source + reviewed columns to transcription_blocks - OcrService sets source=OCR when creating blocks - TranscriptionService.reviewBlock() toggles the reviewed flag - PUT /api/documents/{id}/transcription-blocks/{blockId}/review endpoint - 5 new tests: reviewBlock toggle/untoggle/notfound, controller, OcrService source=OCR verification The reviewed flag enables the Kraken fine-tuning pipeline: only blocks marked as reviewed by a human are exported as training data. Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 21:44:51 +02:00
Marcel	931fbc28e5	fix(annotations): use @JdbcTypeCode(JSON) for polygon JSONB column Replace @Convert(PolygonConverter) with Hibernate native @JdbcTypeCode(SqlTypes.JSON) to fix JDBC type mismatch — PostgreSQL requires jsonb type, not varchar. The PolygonConverter is retained as a standalone utility but no longer used on the entity. Hibernate 6 natively handles List<List<Double>> serialization to JSONB. Refs #227 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:39:54 +02:00
Marcel	6737bd6db5	feat(ocr): add Python OCR microservice, RestClientOcrClient, Docker Compose Python microservice (ocr-service/): - FastAPI app with /ocr and /health endpoints - Surya engine: transformer-based OCR for typewritten/modern handwriting - Kraken engine: historical HTR for Kurrent/Suetterlin with pure-Python polygon-to-quad approximation (gift wrapping + rotating calipers) - Eager model loading at startup via lifespan context manager - PDF download via httpx, page rendering via pypdfium2 at 300 DPI Java RestClientOcrClient: - Implements OcrClient + OcrHealthClient interfaces - Calls Python service via Spring RestClient - Health check with graceful fallback Docker Compose: - New ocr-service container (mem_limit 6g, no host ports) - Health check with start_period 60s for model loading - ocr_models volume for Kraken model files - Backend depends on ocr-service health Refs #226, #227 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:26:40 +02:00
Marcel	aea46c5fd0	feat(ocr): add OcrService, OcrBatchService, OcrProgressService, OcrController - OcrService: single-document OCR (health check, block clearing, presigned URL, annotation + block creation) - OcrBatchService: batch processing with @Async, per-document status tracking, SKIPPED for PLACEHOLDER documents, failure isolation - OcrProgressService: SSE emitter registry per job ID with 5-min timeout - OcrController: POST /api/documents/{id}/ocr (WRITE_ALL), POST /api/ocr/batch (ADMIN), GET /api/ocr/jobs/{id} (READ_ALL), GET /api/ocr/jobs/{id}/progress (SSE), GET /api/documents/{id}/ocr-status 19 tests: 6 OcrService, 4 OcrBatchService, 3 OcrProgressService, 6 OcrController Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:24:15 +02:00
Marcel	ff3990710e	feat(ocr): add OCR infrastructure (interfaces, entities, migrations, DTOs) - OcrClient + OcrHealthClient interfaces for testable OCR integration - OcrBlockResult record for OCR engine response mapping - OcrJob + OcrJobDocument entities with status enums - V25 migration creates ocr_jobs and ocr_job_documents tables - Repositories for job and job-document queries - TriggerOcrDTO, BatchOcrDTO (@Size max=500), OcrStatusDTO - ErrorCodes: OCR_SERVICE_UNAVAILABLE, OCR_JOB_NOT_FOUND, OCR_DOCUMENT_NOT_UPLOADED, OCR_PROCESSING_FAILED Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:15:16 +02:00
Marcel	d194b6b225	feat(documents): add ScriptType enum and script_type column - ScriptType enum: UNKNOWN, TYPEWRITER, HANDWRITING_LATIN, HANDWRITING_KURRENT - V24 migration adds script_type VARCHAR(30) NOT NULL DEFAULT 'UNKNOWN' - Document entity: scriptType field with @Builder.Default UNKNOWN - DocumentUpdateDTO: optional scriptType field - DocumentService: wires scriptType through update method Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:13:42 +02:00
Marcel	c19c41f812	feat(annotations): add createOcrAnnotation that skips overlap check OCR creates many adjacent text line annotations that would fail the existing overlap check. createOcrAnnotation() accepts an optional polygon and bypasses overlap detection entirely. Refs #227 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:12:11 +02:00
Marcel	878a90a86d	feat(annotations): add polygon JSONB support for quadrilateral shapes - V23 migration adds polygon JSONB column with 4-point CHECK constraint - PolygonConverter: AttributeConverter for List<List<Double>> <-> JSONB - @UniquePoints custom validator rejects duplicate coordinates - CreateAnnotationDTO: validated optional polygon field - DocumentAnnotation entity: polygon field with converter Refs #227 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:10:35 +02:00
Marcel	e69aaa6a8c	fix: classify Steuerfinanzamt and Reichsfechtschule as institutions Add "amt" and "schule" suffixes to INSTITUTION_END in PersonTypeClassifier so German government offices and schools are auto-classified on import. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-08 20:59:17 +02:00
Marcel	c34db997fa	feat(model): add title field to PersonUpdateDTO with @Size validation Add title to PersonUpdateDTO with @Size(max=50) constraint. PersonService.createPerson and updatePerson now handle the title field with blank-to-null normalization. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 18:38:33 +02:00
Marcel	5106d277f1	test(service): add integration test for findOrCreateByAlias classification Testcontainers test verifying: SKIP returns null with no DB record, INSTITUTION/GROUP store full name in lastName with null firstName and correct personType, PERSON splits name normally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:29:20 +02:00
Marcel	ac545ecdaa	refactor: address PR review concerns - Remove Architekt from WORD_PREFIXES (classifier handles it) - Use Objects.equals for null-safe firstName/lastName comparison - Remove unused trimmed variable in PersonTypeClassifier - Fix containsWord to loop through all occurrences (finds "Eltern" in "Nachbareltern Eltern") - Extract DisplayNameFormatter utility shared by Person and PersonSummaryDTO to eliminate display logic duplication Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:25:06 +02:00
Marcel	c0cf8d7952	fix(service): add @Nullable to findOrCreateByAlias and filter nulls in caller Add @Nullable annotation to findOrCreateByAlias() return type. Filter null results (from SKIP classification) in MassImportService receiver list to prevent null elements in the receivers collection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:22:33 +02:00
Marcel	73640ef5b6	feat(parser): implement stripTitle for known prefixes Two-pass title stripping with loop for stacked titles: - Dot-prefixes (Dr., Prof.) matched without trailing space - Word-prefixes (Tante, Frau, Schwester, etc.) matched at word boundary - Stacked titles like "Prof. Dr. Muller" handled correctly - Single token after title strip goes to lastName (not firstName) Add 5 "von" last names to KNOWN_LAST_NAMES for correct splitting of entries like "Freifrau von Massenbach". 15 new test cases + updated 3 existing tests for title behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:15:18 +02:00
Marcel	a3da5731d0	feat(service): integrate PersonTypeClassifier into findOrCreateByAlias Classify raw name before processing. SKIP returns null (no Person created). INSTITUTION/GROUP skip split() and store full name in lastName with firstName=null and appropriate personType. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:06:49 +02:00
Marcel	68f0c4c4b9	feat(service): add PersonTypeClassifier with keyword heuristics Static classify() method uses position-aware keyword matching: - SKIP: Briefumschlag, Kondolenzbriefe, Hochzeitsgedicht (start) - INSTITUTION: Firma, Architekt (start), GmbH, Co (end) - GROUP: Familie, Comité, Comite, Geschwister, Gesellschafter, Garde, Mitarbeiter (start), Eltern, Kinder, Schwiegereltern (word boundary) - PERSON: default for all other inputs Case-insensitive. 25 parameterized test cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:03:53 +02:00
Marcel	e49ae5de29	fix(parser): preserve annotation parens for single-person inputs Move paren extraction in parseReceivers() after the multi-separator check so single-person entries like "Clara de Gruyter(*1871)" keep their parens intact for split()'s annotation extraction. Multi-person entries like "Hedi und Tutu (Gruber)" still use parens as shared last-name override. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:00:34 +02:00
Marcel	e696e5056d	feat(parser): implement stripAnnotation for parenthesized content Extract trailing (...) content as annotation. Handles birth years (1871), nicknames (Tuttu), uncertainty markers (?), and uncertain names (Quast ?) where the name part is extracted back into the cleaned result. Uses [^)] regex to prevent ReDoS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:58:02 +02:00
Marcel	9f90cc1a5f	feat(service): create MAIDEN_NAME alias in findOrCreateByAlias When split() returns a non-null maidenName, PersonService now creates a PersonNameAlias with type MAIDEN_NAME. The maiden name is stored as lastName on the alias (no firstName). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:55:50 +02:00
Marcel	8421d45c71	test(parser): add parseReceivers tests for widened geb pattern Verify comma-prefix, no-dot, and multi-word maiden name variants are correctly stripped in parseReceivers(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:53:03 +02:00
Marcel	c49cb345ca	feat(parser): widen GEB_PATTERN and extract maiden name in stripMaidenName Widen pattern from `\s+geb\.\s+\S+` to `,?\s*geb\.?\s+(.+)$` to handle: optional comma, optional dot, multi-word maiden names. stripMaidenName() now captures the maiden name instead of discarding it. Handles all 5 input variants from the ODS data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:51:32 +02:00
Marcel	f11d8a38ed	feat(frontend): replace all name concatenation with displayName - Add displayName default method to PersonSummaryDTO - Update native SQL queries to include title, person_type columns - Add getInitials() utility to personFormat.ts - Update abbreviateName/abbreviateCompact for nullable firstName - Replace firstName+lastName concatenation with displayName in all person-displaying components and server load files - Regenerate API types with displayName on Person and PersonSummaryDTO Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:22:30 +02:00
Marcel	de2cc677a9	fix(search): handle null firstName in all search queries Use COALESCE to convert null firstName to empty string in: - PersonRepository.searchByName (JPQL) - PersonRepository.searchWithDocumentCount (native SQL) - PersonRepository.findCorrespondentsWithFilter (native SQL) - DocumentSpecifications.hasText (Criteria API, sender + receiver) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 11:59:41 +02:00
Marcel	92f1a112f5	feat(migration): V22 add title, person_type, nullable first_name - Add title VARCHAR(50) column - Add person_type VARCHAR(20) NOT NULL DEFAULT 'PERSON' with CHECK constraint (PERSON, INSTITUTION, GROUP, UNKNOWN — SKIP excluded) - Drop NOT NULL on first_name for non-person entities Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 11:55:04 +02:00
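
Note on commit c49cb345ca: the widened pattern `,?\s*geb\.?\s+(.+)$` quoted in that message can be tried out in isolation. The sketch below is an illustration only, ported to Python, and is not the repository's Java code. The function name `strip_maiden_name`, its return shape, and the sample names are assumptions; only the regular expression itself is taken from the commit message.

```python
import re

# Pattern text as quoted in commit c49cb345ca. Flags and surrounding logic
# in the real parser are unknown; this is a hypothetical stand-in.
GEB_PATTERN = re.compile(r",?\s*geb\.?\s+(.+)$")


def strip_maiden_name(raw):
    """Return (name_without_geb_clause, maiden_name_or_None).

    Hypothetical helper: captures the maiden name instead of discarding it,
    which is the behaviour the commit message describes.
    """
    match = GEB_PATTERN.search(raw)
    if match is None:
        return raw.strip(), None
    # Everything before the match is the cleaned name; group 1 is the
    # (possibly multi-word) maiden name.
    return raw[:match.start()].strip(), match.group(1).strip()


if __name__ == "__main__":
    for sample in (
        "Anna Meier geb. Schulz",           # original form
        "Anna Meier, geb. von Massenbach",  # optional comma, multi-word
        "Anna Meier geb Schulz",            # optional dot
        "Anna Meier",                       # no maiden name
    ):
        print(sample, "->", strip_maiden_name(sample))
```

The older pattern `\s+geb\.\s+\S+` required the dot and consumed only a single token after it, so the comma, no-dot, and multi-word variants listed in the commit would not have been stripped cleanly.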

198 Commits