familienarchiv

Author	SHA1	Message	Date
Marcel	3aaec01421	feat(transcription): add source/reviewed fields for training pipeline - BlockSource enum: MANUAL, OCR - V26 migration adds source + reviewed columns to transcription_blocks - OcrService sets source=OCR when creating blocks - TranscriptionService.reviewBlock() toggles the reviewed flag - PUT /api/documents/{id}/transcription-blocks/{blockId}/review endpoint - 5 new tests: reviewBlock toggle/untoggle/notfound, controller, OcrService source=OCR verification The reviewed flag enables the Kraken fine-tuning pipeline: only blocks marked as reviewed by a human are exported as training data. Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 21:44:51 +02:00
Marcel	931fbc28e5	fix(annotations): use @JdbcTypeCode(JSON) for polygon JSONB column Replace @Convert(PolygonConverter) with Hibernate native @JdbcTypeCode(SqlTypes.JSON) to fix JDBC type mismatch — PostgreSQL requires jsonb type, not varchar. The PolygonConverter is retained as a standalone utility but no longer used on the entity. Hibernate 6 natively handles List<List<Double>> serialization to JSONB. Refs #227 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:39:54 +02:00
Marcel	6737bd6db5	feat(ocr): add Python OCR microservice, RestClientOcrClient, Docker Compose Python microservice (ocr-service/): - FastAPI app with /ocr and /health endpoints - Surya engine: transformer-based OCR for typewritten/modern handwriting - Kraken engine: historical HTR for Kurrent/Suetterlin with pure-Python polygon-to-quad approximation (gift wrapping + rotating calipers) - Eager model loading at startup via lifespan context manager - PDF download via httpx, page rendering via pypdfium2 at 300 DPI Java RestClientOcrClient: - Implements OcrClient + OcrHealthClient interfaces - Calls Python service via Spring RestClient - Health check with graceful fallback Docker Compose: - New ocr-service container (mem_limit 6g, no host ports) - Health check with start_period 60s for model loading - ocr_models volume for Kraken model files - Backend depends on ocr-service health Refs #226, #227 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:26:40 +02:00
Marcel	aea46c5fd0	feat(ocr): add OcrService, OcrBatchService, OcrProgressService, OcrController - OcrService: single-document OCR (health check, block clearing, presigned URL, annotation + block creation) - OcrBatchService: batch processing with @Async, per-document status tracking, SKIPPED for PLACEHOLDER documents, failure isolation - OcrProgressService: SSE emitter registry per job ID with 5-min timeout - OcrController: POST /api/documents/{id}/ocr (WRITE_ALL), POST /api/ocr/batch (ADMIN), GET /api/ocr/jobs/{id} (READ_ALL), GET /api/ocr/jobs/{id}/progress (SSE), GET /api/documents/{id}/ocr-status 19 tests: 6 OcrService, 4 OcrBatchService, 3 OcrProgressService, 6 OcrController Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:24:15 +02:00
Marcel	ff3990710e	feat(ocr): add OCR infrastructure (interfaces, entities, migrations, DTOs) - OcrClient + OcrHealthClient interfaces for testable OCR integration - OcrBlockResult record for OCR engine response mapping - OcrJob + OcrJobDocument entities with status enums - V25 migration creates ocr_jobs and ocr_job_documents tables - Repositories for job and job-document queries - TriggerOcrDTO, BatchOcrDTO (@Size max=500), OcrStatusDTO - ErrorCodes: OCR_SERVICE_UNAVAILABLE, OCR_JOB_NOT_FOUND, OCR_DOCUMENT_NOT_UPLOADED, OCR_PROCESSING_FAILED Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:15:16 +02:00
Marcel	d194b6b225	feat(documents): add ScriptType enum and script_type column - ScriptType enum: UNKNOWN, TYPEWRITER, HANDWRITING_LATIN, HANDWRITING_KURRENT - V24 migration adds script_type VARCHAR(30) NOT NULL DEFAULT 'UNKNOWN' - Document entity: scriptType field with @Builder.Default UNKNOWN - DocumentUpdateDTO: optional scriptType field - DocumentService: wires scriptType through update method Refs #226 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:13:42 +02:00
Marcel	c19c41f812	feat(annotations): add createOcrAnnotation that skips overlap check OCR creates many adjacent text line annotations that would fail the existing overlap check. createOcrAnnotation() accepts an optional polygon and bypasses overlap detection entirely. Refs #227 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:12:11 +02:00
Marcel	878a90a86d	feat(annotations): add polygon JSONB support for quadrilateral shapes - V23 migration adds polygon JSONB column with 4-point CHECK constraint - PolygonConverter: AttributeConverter for List<List<Double>> <-> JSONB - @UniquePoints custom validator rejects duplicate coordinates - CreateAnnotationDTO: validated optional polygon field - DocumentAnnotation entity: polygon field with converter Refs #227 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:10:35 +02:00
Marcel	e69aaa6a8c	fix: classify Steuerfinanzamt and Reichsfechtschule as institutions Add "amt" and "schule" suffixes to INSTITUTION_END in PersonTypeClassifier so German government offices and schools are auto-classified on import. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-08 20:59:17 +02:00
Marcel	c34db997fa	feat(model): add title field to PersonUpdateDTO with @Size validation Add title to PersonUpdateDTO with @Size(max=50) constraint. PersonService.createPerson and updatePerson now handle the title field with blank-to-null normalization. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 18:38:33 +02:00
Marcel	5106d277f1	test(service): add integration test for findOrCreateByAlias classification Testcontainers test verifying: SKIP returns null with no DB record, INSTITUTION/GROUP store full name in lastName with null firstName and correct personType, PERSON splits name normally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:29:20 +02:00
Marcel	ac545ecdaa	refactor: address PR review concerns - Remove Architekt from WORD_PREFIXES (classifier handles it) - Use Objects.equals for null-safe firstName/lastName comparison - Remove unused trimmed variable in PersonTypeClassifier - Fix containsWord to loop through all occurrences (finds "Eltern" in "Nachbareltern Eltern") - Extract DisplayNameFormatter utility shared by Person and PersonSummaryDTO to eliminate display logic duplication Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:25:06 +02:00
Marcel	c0cf8d7952	fix(service): add @Nullable to findOrCreateByAlias and filter nulls in caller Add @Nullable annotation to findOrCreateByAlias() return type. Filter null results (from SKIP classification) in MassImportService receiver list to prevent null elements in the receivers collection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:22:33 +02:00
Marcel	73640ef5b6	feat(parser): implement stripTitle for known prefixes Two-pass title stripping with loop for stacked titles: - Dot-prefixes (Dr., Prof.) matched without trailing space - Word-prefixes (Tante, Frau, Schwester, etc.) matched at word boundary - Stacked titles like "Prof. Dr. Muller" handled correctly - Single token after title strip goes to lastName (not firstName) Add 5 "von" last names to KNOWN_LAST_NAMES for correct splitting of entries like "Freifrau von Massenbach". 15 new test cases + updated 3 existing tests for title behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:15:18 +02:00
Marcel	a3da5731d0	feat(service): integrate PersonTypeClassifier into findOrCreateByAlias Classify raw name before processing. SKIP returns null (no Person created). INSTITUTION/GROUP skip split() and store full name in lastName with firstName=null and appropriate personType. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:06:49 +02:00
Marcel	68f0c4c4b9	feat(service): add PersonTypeClassifier with keyword heuristics Static classify() method uses position-aware keyword matching: - SKIP: Briefumschlag, Kondolenzbriefe, Hochzeitsgedicht (start) - INSTITUTION: Firma, Architekt (start), GmbH, Co (end) - GROUP: Familie, Comité, Comite, Geschwister, Gesellschafter, Garde, Mitarbeiter (start), Eltern, Kinder, Schwiegereltern (word boundary) - PERSON: default for all other inputs Case-insensitive. 25 parameterized test cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:03:53 +02:00
Marcel	e49ae5de29	fix(parser): preserve annotation parens for single-person inputs Move paren extraction in parseReceivers() after the multi-separator check so single-person entries like "Clara de Gruyter(*1871)" keep their parens intact for split()'s annotation extraction. Multi-person entries like "Hedi und Tutu (Gruber)" still use parens as shared last-name override. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 13:00:34 +02:00
Marcel	e696e5056d	feat(parser): implement stripAnnotation for parenthesized content Extract trailing (...) content as annotation. Handles birth years (1871), nicknames (Tuttu), uncertainty markers (?), and uncertain names (Quast ?) where the name part is extracted back into the cleaned result. Uses [^)] regex to prevent ReDoS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:58:02 +02:00
Marcel	9f90cc1a5f	feat(service): create MAIDEN_NAME alias in findOrCreateByAlias When split() returns a non-null maidenName, PersonService now creates a PersonNameAlias with type MAIDEN_NAME. The maiden name is stored as lastName on the alias (no firstName). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:55:50 +02:00
Marcel	8421d45c71	test(parser): add parseReceivers tests for widened geb pattern Verify comma-prefix, no-dot, and multi-word maiden name variants are correctly stripped in parseReceivers(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:53:03 +02:00
Marcel	c49cb345ca	feat(parser): widen GEB_PATTERN and extract maiden name in stripMaidenName Widen pattern from `\s+geb\.\s+\S+` to `,?\s*geb\.?\s+(.+)$` to handle: optional comma, optional dot, multi-word maiden names. stripMaidenName() now captures the maiden name instead of discarding it. Handles all 5 input variants from the ODS data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:51:32 +02:00
Marcel	f11d8a38ed	feat(frontend): replace all name concatenation with displayName - Add displayName default method to PersonSummaryDTO - Update native SQL queries to include title, person_type columns - Add getInitials() utility to personFormat.ts - Update abbreviateName/abbreviateCompact for nullable firstName - Replace firstName+lastName concatenation with displayName in all person-displaying components and server load files - Regenerate API types with displayName on Person and PersonSummaryDTO Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 12:22:30 +02:00
Marcel	de2cc677a9	fix(search): handle null firstName in all search queries Use COALESCE to convert null firstName to empty string in: - PersonRepository.searchByName (JPQL) - PersonRepository.searchWithDocumentCount (native SQL) - PersonRepository.findCorrespondentsWithFilter (native SQL) - DocumentSpecifications.hasText (Criteria API, sender + receiver) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 11:59:41 +02:00
Marcel	92f1a112f5	feat(migration): V22 add title, person_type, nullable first_name - Add title VARCHAR(50) column - Add person_type VARCHAR(20) NOT NULL DEFAULT 'PERSON' with CHECK constraint (PERSON, INSTITUTION, GROUP, UNKNOWN — SKIP excluded) - Drop NOT NULL on first_name for non-person entities Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 11:55:04 +02:00
Marcel	9f14648dc3	feat(model): add title, personType, displayName to Person entity - Add title (nullable VARCHAR) and personType (enum, default PERSON) - Make firstName nullable for non-person entities - Add @Transient getDisplayName() as single source of truth for name display, exposed via @Schema(READ_ONLY, REQUIRED) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 11:53:07 +02:00
Marcel	8101ddb697	feat(model): add PersonType enum and MAIDEN_NAME alias type PersonType has 5 values: PERSON, INSTITUTION, GROUP, UNKNOWN, SKIP. SKIP is intentionally excluded from the DB CHECK constraint (added in migration) as defense-in-depth. MAIDEN_NAME added to PersonNameAliasType for #209. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 11:50:19 +02:00
Marcel	dea1635d75	refactor(parser): extract split() pipeline into named methods Extract stripMaidenName, normalizeDotCompressed, stripAnnotation, stripTitle, and splitByKnownLastNameOrFallback as individually testable pipeline steps. Each extraction method is a pass-through until its feature issue fills in the logic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 11:48:08 +02:00
Marcel	1e1921e0fa	refactor(parser): expand SplitName record to 5 fields Add title, maidenName, and annotation fields (all nullable) to SplitName. All existing call sites pass null for new fields. Test assertions updated to document the null-by-default contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 11:46:09 +02:00
Marcel	d6e74972eb	test(parser): add regression and cross-feature interaction tests Regression test confirms already-spaced dot names are not double-spaced. Interaction test confirms // separator works with dot-compressed names. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 17:35:30 +02:00
Marcel	0b57717586	feat(parser): normalize dot-compressed names in split() Inserts spaces after dots when the cleaned name has no spaces but contains dots, so the existing last-space fallback handles names like "E.Rockstroh" and "Dr.Fr.Zarncke" correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 17:34:56 +02:00
Marcel	59475efbcb	feat(parser): support // as multi-person separator in parseReceivers Pre-splits input on "//" before existing logic so each segment is processed independently through the full pipeline (und/u splitting, last-name distribution, etc.). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 17:33:55 +02:00
Marcel	f435f2441c	fix(model): add @JsonIgnore on PersonNameAlias.person to prevent LazyInitializationException Jackson tried to serialize the lazy Person proxy when returning alias list, causing a "no session" error. The back-reference is only needed for JPA navigation, not for API responses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 16:31:39 +02:00
Marcel	cfb3260e0e	fix(api): add input validation to PersonNameAliasDTO Adds @NotBlank @Size(max=255) on lastName, @NotNull on type, @Valid on controller parameter. Blank/null input now returns 400 instead of reaching the DB constraint. 2 new controller tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 13:40:43 +02:00
Marcel	90c9ac9357	feat(search): extend document text search to match alias last names Adds sender alias LEFT JOIN and receiver alias EXISTS subquery to DocumentSpecifications.hasText(). Uses entity-graph navigation via Person.nameAliases (@OneToMany) to avoid a separate DB roundtrip while respecting domain boundaries. 2 new integration tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 13:18:31 +02:00
Marcel	db61d6b77f	feat(search): extend person search to include alias last names Adds LEFT JOIN to person_name_aliases in both searchByName (JPQL) and searchWithDocumentCount (native SQL). Uses DISTINCT/GROUP BY to prevent duplicate results. 4 new integration tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 13:12:54 +02:00
Marcel	a1d63bbc42	feat(api): add GET/POST/DELETE /api/persons/{id}/aliases endpoints GET returns aliases (no permission required), POST requires WRITE_ALL, DELETE requires WRITE_ALL. 5 new controller tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 13:09:58 +02:00
Marcel	0fc568dd9f	feat(service): add alias CRUD methods to PersonService getAliases (sorted by sort_order), addAlias (auto-incrementing sort_order), removeAlias (with IDOR protection verifying alias belongs to the given person). All TDD with 7 new unit tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 13:07:14 +02:00
Marcel	765cbfbaaf	feat(model): add PersonNameAlias entity, type enum, repository, DTO Introduces the alias domain model: entity with @ManyToOne to Person, @OneToMany on Person for JPA graph navigation, repository with sort_order queries, input DTO, and ALIAS_NOT_FOUND error code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 13:04:38 +02:00
Marcel	22fe9600a1	feat(migration): V21 add person_name_aliases table with pg_trgm indexes Creates the alias table for historical name changes (marriage, widowhood, etc.) and adds GIN trigram indexes on both the new alias table and the existing persons table for substring search. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-07 13:02:51 +02:00
Marcel	56f7282a9d	test(search): add empty-receivers edge case for RECEIVER sort Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 16:45:01 +02:00
Marcel	110024245d	docs(search): document in-memory sort tradeoff and total=size() limitation Add TODO comment explaining why SENDER/RECEIVER sort is in-memory (JPA INNER JOIN drops null-sender docs) and note that pagination will require a DB COUNT query in DocumentSearchResult.of(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 16:41:17 +02:00
Marcel	972048d57d	fix(search): treat null sender.lastName as empty in sort key A sender with lastName=null produced sort key "null Bob" which sorted before names starting with lowercase letters (n < s, t, u, v...). Now returns "" for null lastName, which the comparator places at end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 16:39:30 +02:00
Marcel	1c1ab0c72a	feat(search): reject invalid dir parameter with 400 Previously any value other than ASC/DESC silently defaulted to DESC with no feedback. Now returns 400 Bad Request. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 16:34:38 +02:00
Marcel	6ac3f6b176	refactor(search): remove dead SENDER case from resolveSort switch SENDER and RECEIVER are handled by in-memory sort before resolveSort is called, making those switch cases unreachable. Removed and added a comment making the invariant explicit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 16:31:39 +02:00
Marcel	12023513b2	refactor(search): move DocumentSort from model/ to dto/ DocumentSort is a query parameter enum, not a JPA entity. Placing it in model/ violated the layer boundary — model/ should contain only domain entities. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 16:29:35 +02:00
Marcel	bc397048b7	fix(search): use in-memory sort for SENDER to include documents with null sender INNER JOIN from Sort.by("sender.lastName") was excluding docs without a sender. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 14:15:03 +02:00
Marcel	879435c8d9	feat(search): wrap search response in { documents, total } envelope Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 13:17:08 +02:00
Marcel	c2b5008c66	feat(search): add sort param (DATE/TITLE/SENDER/RECEIVER/UPLOAD_DATE) and tagQ filter - DocumentSort enum validated by Spring MVC (400 for unknown values) - SENDER sort uses Spring Data Sort on sender.lastName/firstName - RECEIVER sort uses in-memory sort by first receiver alphabetically - UPLOAD_DATE sort uses createdAt; default sort is DATE DESC - tagQ param wired to hasTagPartial specification Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 13:13:06 +02:00
Marcel	beca2d463a	feat(search): extend hasText to match sender/receiver/tag names, add hasTagPartial - hasText now JOINs sender (LEFT JOIN) and uses EXISTS subqueries for receivers and tags to avoid duplicate rows - hasTagPartial added for live debounced tag text filter (ILIKE partial match) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 13:07:39 +02:00
Marcel	e89d8a4ca9	test: increase coverage	2026-04-06 11:20:57 +02:00

172 Commits
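
Commit c49cb345ca widens GEB_PATTERN to `,?\s*geb\.?\s+(.+)$` and makes stripMaidenName() capture the maiden name instead of discarding it. A minimal sketch of that behaviour follows. The class name `MaidenNameSketch`, the `Result` record, and the case-insensitive flag are assumptions for illustration only; the repository's actual parser class and its SplitName record are not shown in this log.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch, not the repository's parser implementation. */
final class MaidenNameSketch {

    // Pattern text as quoted in commit c49cb345ca: optional comma, optional dot
    // after "geb", and a greedy capture so multi-word maiden names are kept.
    // CASE_INSENSITIVE is an assumption of this sketch.
    private static final Pattern GEB_PATTERN =
            Pattern.compile(",?\\s*geb\\.?\\s+(.+)$", Pattern.CASE_INSENSITIVE);

    /** Cleaned name plus the captured maiden name (null when none is present). */
    record Result(String name, String maidenName) {}

    static Result stripMaidenName(String raw) {
        Matcher m = GEB_PATTERN.matcher(raw);
        if (!m.find()) {
            // No "geb" suffix: pass the name through unchanged apart from trimming.
            return new Result(raw.trim(), null);
        }
        // Everything before the match is the cleaned name; group 1 is the maiden name.
        String name = raw.substring(0, m.start()).trim();
        String maidenName = m.group(1).trim();
        return new Result(name, maidenName);
    }
}
```

Under these assumptions, `stripMaidenName("Anna Schmidt, geb Meyer zu Hof")` yields the name `Anna Schmidt` and the maiden name `Meyer zu Hof`, which covers the comma-prefix, no-dot, and multi-word variants named in commit 8421d45c71. Because the pattern as quoted has no leading word boundary, a surname that ends in "geb" followed by a space would also match; whether the real parser guards against that is not stated in the log.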