Commit Graph

124 Commits

Author SHA1 Message Date
Marcel
8618e520b5 fix(ocr): fill empty MANUAL blocks in guided OCR mode
When a user draws annotation boxes to mark OCR regions, the blocks are
created with source=MANUAL and empty text. upsertGuidedBlock was
protecting all MANUAL blocks unconditionally, so guided OCR silently
produced no output for these drawn-but-empty blocks.

Changed the guard to only protect non-empty MANUAL blocks — empty ones
are treated like OCR blocks and get their text filled in.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 16:25:23 +02:00
Marcel
ee58b63517 feat(ocr): add guided OCR mode using existing annotation regions
When a document has manually drawn annotation boxes, the user can now
enable "Nur annotierte Bereiche" in the OCR trigger panel. The engine
skips layout detection entirely and runs recognition only within the
pre-drawn bounding boxes, preserving manual transcription blocks.

- Python: adds OcrRegion model, extend OcrRequest/OcrBlock; guided
  branch in /ocr/stream groups by page and crops each region
- Engines: add extract_region_text() to both Kraken and Surya
- Java: adds OcrBlockResult.annotationId, OcrClient.OcrRegion,
  TriggerOcrDTO.useExistingAnnotations; OcrAsyncRunner dispatches to
  upsertGuidedBlock when annotationId is present; OcrService threads
  the flag through to runSingleDocument
- TranscriptionService: adds upsertGuidedBlock (creates, updates OCR,
  or preserves MANUAL blocks)
- Frontend: guided OCR toggle in OcrTrigger shown when blocks exist;
  skips destructive-replace confirmation in guided mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 15:57:54 +02:00
Marcel
9b2f91ee59 feat(training): add segmentation training pipeline and complete Part 6
- Add /segtrain endpoint to OCR service (ZIP upload, ketos.segtrain,
  backup rotation, in-process model reload)
- Add segtrainModel() to OcrClient and RestClientOcrClient (10-min timeout,
  X-Training-Token header)
- Add SegmentationTrainingExportService: PAGE XML export with polygon
  de-normalization and per-page PNG rendering via PDFBox
- Add GET /api/ocr/segmentation-training-data/export endpoint
- Make TranscriptionBlock.text nullable for segmentation-only blocks
  (V31 migration)
- Add Paraglide i18n translation keys for all training UI strings (de/en/es)
- Pass source prop from TranscriptionEditView to TranscriptionBlock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 15:15:17 +02:00
Marcel
86e9c05aaf feat(training): add Paraglide i18n to training UI components and wire SegmentationTrainingCard
- Convert TrainingHistory, OcrTrainingCard, SegmentationTrainingCard, and
  TranscriptionBlock "Nur Segmentierung" badge to use Paraglide message keys
- Add availableSegBlocks to TrainingInfoResponse to expose segmentation
  block count in the training info endpoint
- Wire SegmentationTrainingCard into admin/system page below OCR training card
- Update api.ts with availableSegBlocks field

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 15:14:27 +02:00
Marcel
88e005eb49 feat(ocr): add training history + POST /train + GET /training-info endpoints
- OcrTrainingRun entity + V30 migration (partial unique index prevents
  concurrent runs at DB level)
- OcrTrainingService: concurrent-run guard, 5-block threshold, MDC log
  correlation, orphan recovery on ApplicationReadyEvent
- POST /api/ocr/train (ADMIN) + GET /api/ocr/training-info (ADMIN)
- TRAINING_ALREADY_RUNNING ErrorCode
- 6 OcrTrainingServiceTest + 6 OcrControllerTest tests for the new endpoints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 14:47:56 +02:00
Marcel
bc97a2dade feat(ocr): add /train endpoint to OCR service and OcrClient.trainModel()
- POST /train in ocr-service with ZIP Slip validation, TemporaryDirectory,
  ketos transfer learning, timestamped backups (keep last 3), in-process reload
- X-Training-Token auth (no-op in dev when TRAINING_TOKEN env is empty)
- trainModel() in OcrClient interface + RestClientOcrClient (10-min timeout,
  multipart upload, forwards X-Training-Token when configured)
- TRAINING_TOKEN env var wired in docker-compose; --workers 2 in Dockerfile
  so /health stays responsive during synchronous training

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 14:40:53 +02:00
Marcel
cfa3c4df67 feat(training): add recognition training data export
- TrainingDataExportService: PDFBox rendering at 300 DPI, crop by
  annotation coordinates, ZIP with <uuid>.png + <uuid>.gt.txt pairs
- Skips documents with missing S3 files (logs WARN, continues)
- GET /api/ocr/training-data/export (ADMIN); 204 when no enrolled blocks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 14:35:06 +02:00
Marcel
fdf1eb92ad feat(training): add document-level training enrollment
- V29 migration: document_training_labels join table
- TrainingLabel enum: KURRENT_RECOGNITION, KURRENT_SEGMENTATION
- Document.trainingLabels @ElementCollection
- DocumentService.addTrainingLabel / removeTrainingLabel
- PATCH /api/documents/{id}/training-labels (WRITE_ALL)
- Auto-enroll on Kurrent OCR trigger (OcrService.startOcr)
- TranscriptionEditView: enrollment chips in panel footer
- JPQL queries updated to use MEMBER OF trainingLabels

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 14:30:51 +02:00
Marcel
9282e46a02 fix(ocr): handle unknown NDJSON fields with @JsonIgnoreProperties
Added @JsonIgnoreProperties(ignoreUnknown = true) to OcrBlockResult so
new fields from the Python OCR service don't crash the Java parser,
while keeping FAIL_ON_UNKNOWN_PROPERTIES strict globally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:27:20 +02:00
Marcel
caae2ead81 refactor(ocr): route block lifecycle through TranscriptionService
OcrAsyncRunner was bypassing TranscriptionService — building blocks
directly and calling blockRepository.save(), skipping sanitizeText()
and saveVersion(). Also replaced N individual deleteBlock() calls with
a single bulk deleteAllBlocksByDocument() for OCR re-runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:27:01 +02:00
Marcel
6a0fd25662 fix(ocr): persist scriptType override via DocumentService transaction
OcrService.startOcr() was setting scriptType on a detached entity,
silently losing the mutation. Added DocumentService.updateScriptType()
with @Transactional to persist the change properly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:26:37 +02:00
Marcel
2d43f09172 refactor(ocr): move repository access from OcrController into OcrService
OcrController was injecting OcrJobRepository and OcrJobDocumentRepository
directly, violating the Controller → Service → Repository layering rule.
Moved getJob() and getDocumentOcrStatus() logic into OcrService.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:26:14 +02:00
Marcel
292dc66f3c feat(ocr): rewrite runSingleDocument to use streamBlocks with per-page progress
Replace the single extractBlocks() call with streamBlocks() that
processes pages incrementally. Each page's blocks are persisted
immediately via createSingleBlock(). Progress updates use the
ANALYZING_PAGE:current:total:blocks format. Per-page errors are
logged at WARN level without failing the entire job. The batch path
(processDocument) remains on the old extractBlocks() path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:07:06 +02:00
Marcel
93c3154b3c feat(ocr): implement NDJSON streaming in RestClientOcrClient
Add streamBlocks() that POSTs to /ocr/stream and parses the NDJSON
response line by line with a dedicated ObjectMapper. Falls back to
the old /ocr endpoint via the default method when /ocr/stream returns
404. Uses a separate HttpClient with 5-minute request timeout for
streaming.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:03:12 +02:00
Marcel
641e91d5a3 feat(ocr): add default streamBlocks method to OcrClient interface
The default method synthesizes Start/Page/Done events from
extractBlocks() results, providing backward compatibility for
implementations that don't support streaming natively.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:01:26 +02:00
Marcel
e21d01e10b feat(ocr): add OcrStreamEvent sealed interface with Start/Page/Error/Done records
Defines the event types for NDJSON streaming OCR. Uses Java 21 sealed
interface with record subtypes for exhaustive pattern matching in the
consumer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:00:02 +02:00
Marcel
c1befd3fa3 fix(ocr): resume polling on page reload + track single-doc job status
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 0s
CI / Backend Unit Tests (pull_request) Failing after 1s
Single-document OCR now creates an OcrJobDocument row so
GET /api/documents/{id}/ocr-status can find running jobs.
OcrAsyncRunner updates the job document status (RUNNING → DONE/FAILED).

Frontend checks OCR status when entering transcription mode —
if a job is running, resumes polling and shows the spinner.

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 23:16:59 +02:00
Marcel
dd175d09e2 refactor(ocr): make single-document OCR async, fix circular dependency
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
OcrService → OcrAsyncRunner was circular. Fixed by moving all OCR
processing logic (processDocument, clearExistingBlocks, createBlocks)
into OcrAsyncRunner. OcrService is now a thin entry point that
validates, creates the job, and dispatches to OcrAsyncRunner.

Architecture:
- OcrService: validates document, checks health, creates OcrJob, delegates
- OcrAsyncRunner: @Async processDocument + runSingleDocument + runBatch
- OcrBatchService: creates job + job documents, delegates to OcrAsyncRunner
- No circular dependencies

Single-document OCR is now async (returns jobId immediately).
Frontend polls GET /api/ocr/jobs/{jobId} every 3s until DONE/FAILED.

816 backend tests pass, 687 frontend tests pass.

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:55:52 +02:00
Marcel
4500c99e40 fix(ocr): use presigned URLs for MinIO access from OCR service
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 0s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
The OCR service was getting 403 Forbidden because it tried to
download PDFs from MinIO using plain internal URLs without
authentication. MinIO buckets are private.

- Add S3Presigner bean to MinioConfig
- FileService.generatePresignedUrl(): generates 15-min presigned URLs
- OcrService uses presigned URLs instead of plain internal URLs
- Remove unused s3InternalUrl / bucketName @Value fields from OcrService

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:16:52 +02:00
Marcel
3aaec01421 feat(transcription): add source/reviewed fields for training pipeline
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 0s
CI / Unit & Component Tests (pull_request) Failing after 0s
CI / Backend Unit Tests (pull_request) Failing after 1s
- BlockSource enum: MANUAL, OCR
- V26 migration adds source + reviewed columns to transcription_blocks
- OcrService sets source=OCR when creating blocks
- TranscriptionService.reviewBlock() toggles the reviewed flag
- PUT /api/documents/{id}/transcription-blocks/{blockId}/review endpoint
- 5 new tests: reviewBlock toggle/untoggle/notfound, controller,
  OcrService source=OCR verification

The reviewed flag enables the Kraken fine-tuning pipeline: only blocks
marked as reviewed by a human are exported as training data.

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 21:44:51 +02:00
Marcel
aea46c5fd0 feat(ocr): add OcrService, OcrBatchService, OcrProgressService, OcrController
- OcrService: single-document OCR (health check, block clearing,
  presigned URL, annotation + block creation)
- OcrBatchService: batch processing with @Async, per-document status
  tracking, SKIPPED for PLACEHOLDER documents, failure isolation
- OcrProgressService: SSE emitter registry per job ID with 5-min timeout
- OcrController: POST /api/documents/{id}/ocr (WRITE_ALL),
  POST /api/ocr/batch (ADMIN), GET /api/ocr/jobs/{id} (READ_ALL),
  GET /api/ocr/jobs/{id}/progress (SSE), GET /api/documents/{id}/ocr-status

19 tests: 6 OcrService, 4 OcrBatchService, 3 OcrProgressService, 6 OcrController

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:24:15 +02:00
Marcel
c19c41f812 feat(annotations): add createOcrAnnotation that skips overlap check
OCR creates many adjacent text line annotations that would fail the
existing overlap check. createOcrAnnotation() accepts an optional
polygon and bypasses overlap detection entirely.

Refs #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:12:11 +02:00
Marcel
878a90a86d feat(annotations): add polygon JSONB support for quadrilateral shapes
- V23 migration adds polygon JSONB column with 4-point CHECK constraint
- PolygonConverter: AttributeConverter for List<List<Double>> <-> JSONB
- @UniquePoints custom validator rejects duplicate coordinates
- CreateAnnotationDTO: validated optional polygon field
- DocumentAnnotation entity: polygon field with converter

Refs #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:10:35 +02:00
Marcel
e69aaa6a8c fix: classify Steuerfinanzamt and Reichsfechtschule as institutions
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
Add "amt" and "schule" suffixes to INSTITUTION_END in PersonTypeClassifier
so German government offices and schools are auto-classified on import.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 20:59:17 +02:00
Marcel
5106d277f1 test(service): add integration test for findOrCreateByAlias classification
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 2s
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 1s
Testcontainers test verifying: SKIP returns null with no DB record,
INSTITUTION/GROUP store full name in lastName with null firstName
and correct personType, PERSON splits name normally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:29:20 +02:00
Marcel
ac545ecdaa refactor: address PR review concerns
- Remove Architekt from WORD_PREFIXES (classifier handles it)
- Use Objects.equals for null-safe firstName/lastName comparison
- Remove unused trimmed variable in PersonTypeClassifier
- Fix containsWord to loop through all occurrences (finds
  "Eltern" in "Nachbareltern Eltern")
- Extract DisplayNameFormatter utility shared by Person and
  PersonSummaryDTO to eliminate display logic duplication

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:25:06 +02:00
Marcel
73640ef5b6 feat(parser): implement stripTitle for known prefixes
Some checks failed
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 1s
Two-pass title stripping with loop for stacked titles:
- Dot-prefixes (Dr., Prof.) matched without trailing space
- Word-prefixes (Tante, Frau, Schwester, etc.) matched at
  word boundary
- Stacked titles like "Prof. Dr. Muller" handled correctly
- Single token after title strip goes to lastName (not firstName)

Add 5 "von" last names to KNOWN_LAST_NAMES for correct splitting
of entries like "Freifrau von Massenbach".

15 new test cases + updated 3 existing tests for title behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:15:18 +02:00
Marcel
a3da5731d0 feat(service): integrate PersonTypeClassifier into findOrCreateByAlias
Classify raw name before processing. SKIP returns null (no Person
created). INSTITUTION/GROUP skip split() and store full name in
lastName with firstName=null and appropriate personType.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:06:49 +02:00
Marcel
68f0c4c4b9 feat(service): add PersonTypeClassifier with keyword heuristics
Static classify() method uses position-aware keyword matching:
- SKIP: Briefumschlag, Kondolenzbriefe, Hochzeitsgedicht (start)
- INSTITUTION: Firma, Architekt (start), GmbH, Co (end)
- GROUP: Familie, Comité, Comite, Geschwister, Gesellschafter,
  Garde, Mitarbeiter (start), Eltern, Kinder,
  Schwiegereltern (word boundary)
- PERSON: default for all other inputs

Case-insensitive. 25 parameterized test cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:03:53 +02:00
Marcel
e49ae5de29 fix(parser): preserve annotation parens for single-person inputs
Move paren extraction in parseReceivers() after the multi-separator
check so single-person entries like "Clara de Gruyter(*1871)" keep
their parens intact for split()'s annotation extraction. Multi-person
entries like "Hedi und Tutu (Gruber)" still use parens as shared
last-name override.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:00:34 +02:00
Marcel
e696e5056d feat(parser): implement stripAnnotation for parenthesized content
Extract trailing (...) content as annotation. Handles birth years
(*1871), nicknames (Tuttu), uncertainty markers (?), and uncertain
names (Quast ?) where the name part is extracted back into the
cleaned result. Uses [^)]* regex to prevent ReDoS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 12:58:02 +02:00
Marcel
9f90cc1a5f feat(service): create MAIDEN_NAME alias in findOrCreateByAlias
When split() returns a non-null maidenName, PersonService now
creates a PersonNameAlias with type MAIDEN_NAME. The maiden name
is stored as lastName on the alias (no firstName).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 12:55:50 +02:00
Marcel
8421d45c71 test(parser): add parseReceivers tests for widened geb pattern
Verify comma-prefix, no-dot, and multi-word maiden name variants
are correctly stripped in parseReceivers().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 12:53:03 +02:00
Marcel
c49cb345ca feat(parser): widen GEB_PATTERN and extract maiden name in stripMaidenName
Widen pattern from `\s+geb\.\s+\S+` to `,?\s*geb\.?\s+(.+)$` to
handle: optional comma, optional dot, multi-word maiden names.
stripMaidenName() now captures the maiden name instead of discarding
it. Handles all 5 input variants from the ODS data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 12:51:32 +02:00
Marcel
f11d8a38ed feat(frontend): replace all name concatenation with displayName
- Add displayName default method to PersonSummaryDTO
- Update native SQL queries to include title, person_type columns
- Add getInitials() utility to personFormat.ts
- Update abbreviateName/abbreviateCompact for nullable firstName
- Replace firstName+lastName concatenation with displayName in all
  person-displaying components and server load files
- Regenerate API types with displayName on Person and PersonSummaryDTO

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 12:22:30 +02:00
Marcel
de2cc677a9 fix(search): handle null firstName in all search queries
Use COALESCE to convert null firstName to empty string in:
- PersonRepository.searchByName (JPQL)
- PersonRepository.searchWithDocumentCount (native SQL)
- PersonRepository.findCorrespondentsWithFilter (native SQL)
- DocumentSpecifications.hasText (Criteria API, sender + receiver)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 11:59:41 +02:00
Marcel
9f14648dc3 feat(model): add title, personType, displayName to Person entity
- Add title (nullable VARCHAR) and personType (enum, default PERSON)
- Make firstName nullable for non-person entities
- Add @Transient getDisplayName() as single source of truth for
  name display, exposed via @Schema(READ_ONLY, REQUIRED)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 11:53:07 +02:00
Marcel
8101ddb697 feat(model): add PersonType enum and MAIDEN_NAME alias type
PersonType has 5 values: PERSON, INSTITUTION, GROUP, UNKNOWN, SKIP.
SKIP is intentionally excluded from the DB CHECK constraint (added
in migration) as defense-in-depth. MAIDEN_NAME added to
PersonNameAliasType for #209.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 11:50:19 +02:00
Marcel
dea1635d75 refactor(parser): extract split() pipeline into named methods
Extract stripMaidenName, normalizeDotCompressed, stripAnnotation,
stripTitle, and splitByKnownLastNameOrFallback as individually
testable pipeline steps. Each extraction method is a pass-through
until its feature issue fills in the logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 11:48:08 +02:00
Marcel
1e1921e0fa refactor(parser): expand SplitName record to 5 fields
Add title, maidenName, and annotation fields (all nullable) to
SplitName. All existing call sites pass null for new fields. Test
assertions updated to document the null-by-default contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 11:46:09 +02:00
Marcel
d6e74972eb test(parser): add regression and cross-feature interaction tests
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3s
CI / Backend Unit Tests (pull_request) Failing after 2s
CI / Unit & Component Tests (push) Failing after 4s
CI / Backend Unit Tests (push) Failing after 3s
Regression test confirms already-spaced dot names are not double-spaced.
Interaction test confirms // separator works with dot-compressed names.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 17:35:30 +02:00
Marcel
0b57717586 feat(parser): normalize dot-compressed names in split()
Inserts spaces after dots when the cleaned name has no spaces but
contains dots, so the existing last-space fallback handles names
like "E.Rockstroh" and "Dr.Fr.Zarncke" correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 17:34:56 +02:00
Marcel
59475efbcb feat(parser): support // as multi-person separator in parseReceivers
Pre-splits input on "//" before existing logic so each segment is
processed independently through the full pipeline (und/u splitting,
last-name distribution, etc.).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 17:33:55 +02:00
Marcel
cfb3260e0e fix(api): add input validation to PersonNameAliasDTO
Adds @NotBlank @Size(max=255) on lastName, @NotNull on type,
@Valid on controller parameter. Blank/null input now returns
400 instead of reaching the DB constraint. 2 new controller tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 13:40:43 +02:00
Marcel
90c9ac9357 feat(search): extend document text search to match alias last names
Adds sender alias LEFT JOIN and receiver alias EXISTS subquery to
DocumentSpecifications.hasText(). Uses entity-graph navigation via
Person.nameAliases (@OneToMany) to avoid a separate DB roundtrip
while respecting domain boundaries. 2 new integration tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 13:18:31 +02:00
Marcel
db61d6b77f feat(search): extend person search to include alias last names
Adds LEFT JOIN to person_name_aliases in both searchByName (JPQL)
and searchWithDocumentCount (native SQL). Uses DISTINCT/GROUP BY
to prevent duplicate results. 4 new integration tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 13:12:54 +02:00
Marcel
a1d63bbc42 feat(api): add GET/POST/DELETE /api/persons/{id}/aliases endpoints
GET returns aliases (no permission required), POST requires
WRITE_ALL, DELETE requires WRITE_ALL. 5 new controller tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 13:09:58 +02:00
Marcel
0fc568dd9f feat(service): add alias CRUD methods to PersonService
getAliases (sorted by sort_order), addAlias (auto-incrementing
sort_order), removeAlias (with IDOR protection verifying alias
belongs to the given person). All TDD with 7 new unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 13:07:14 +02:00
Marcel
56f7282a9d test(search): add empty-receivers edge case for RECEIVER sort
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 16:45:01 +02:00
Marcel
972048d57d fix(search): treat null sender.lastName as empty in sort key
A sender with lastName=null produced sort key "null Bob" which sorted
before names starting with lowercase letters (n < s, t, u, v...).
Now returns "" for null lastName, which the comparator places at end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 16:39:30 +02:00