Commit Graph

733 Commits

Author SHA1 Message Date
Marcel
3aaec01421 feat(transcription): add source/reviewed fields for training pipeline
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 0s
CI / Unit & Component Tests (pull_request) Failing after 0s
CI / Backend Unit Tests (pull_request) Failing after 1s
- BlockSource enum: MANUAL, OCR
- V26 migration adds source + reviewed columns to transcription_blocks
- OcrService sets source=OCR when creating blocks
- TranscriptionService.reviewBlock() toggles the reviewed flag
- PUT /api/documents/{id}/transcription-blocks/{blockId}/review endpoint
- 5 new tests: reviewBlock toggle/untoggle/notfound, controller,
  OcrService source=OCR verification

The reviewed flag enables the Kraken fine-tuning pipeline: only blocks
marked as reviewed by a human are exported as training data.

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 21:44:51 +02:00
Marcel
f064b27439 feat(ocr): per-script-type confidence thresholds
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
Kurrent OCR produces much lower confidence than typewriter/Latin.
Separate thresholds allow aggressive filtering for Kurrent (0.5)
while keeping typewriter lenient (0.3).

- OCR_CONFIDENCE_THRESHOLD: default for Surya paths (0.3)
- OCR_CONFIDENCE_THRESHOLD_KURRENT: Kraken Kurrent path (0.5)
- apply_confidence_markers() now accepts threshold parameter
- get_threshold(script_type) selects the right threshold

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 20:50:59 +02:00
Marcel
dd078d50da fix(ocr): extract PDF pages as PNGs before running kraken OCR
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 0s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 1s
Kraken's -f pdf mode tries to write output next to the input file,
which fails on read-only mounts. Instead, extract pages as PNGs via
pypdfium2 (already installed), then run kraken on each image.
Both models run in a single container per PDF to avoid overhead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 20:37:29 +02:00
Marcel
31519af1a4 fix(ocr): add pyvips for kraken PDF input support
Some checks failed
CI / Unit & Component Tests (push) Failing after 0s
CI / Backend Unit Tests (push) Failing after 0s
CI / Unit & Component Tests (pull_request) Failing after 0s
CI / Backend Unit Tests (pull_request) Failing after 1s
Kraken 7 requires pyvips (optional dep) for -f pdf mode.
Added libvips42 system package and pyvips Python package.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 20:11:14 +02:00
Marcel
c0004f5e6f fix(ocr): parse kraken 'Model dir' output to locate downloaded model
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 0s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 0s
The previous approach used find across the htrmopo cache which failed
because -newer /tmp ran in a separate container. Now parses the
'Model dir: <path>' line from kraken get output directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 20:09:23 +02:00
Marcel
f12b41161e fix(ocr): update model script for kraken 7 DOI-based downloads
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
Kraken 7 uses DOIs (not short names) to identify models from Zenodo.
Updated to use actual DOIs:
- 10.5281/zenodo.7933463 — German handwriting HTR
- 10.5281/zenodo.13788177 — McCATMuS generic handwritten/printed/typed

Added -f pdf flag for PDF input, volume mounts for import dir,
and post-download copy from htrmopo cache to the models volume.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 20:05:29 +02:00
Marcel
37abc376ec fix(ocr): install torchvision from CPU index alongside torch
Some checks failed
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 1s
torchvision installed from PyPI expects CUDA torch operator
registrations. Installing from the CPU whl index ensures torchvision
matches the CPU-only torch build. Fixes 'torchvision::nms does not
exist' RuntimeError on startup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:46:37 +02:00
Marcel
0af4749677 feat(ocr): extend model script with automatic OCR evaluation
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 2s
Downloads both Kraken models, then runs each against 4 sample PDFs
from the import folder (Eu-0693, Eu-0692, W-0150, W-0575). Output
goes to ocr-model-evaluation/<model-name>/<doc>.txt for side-by-side
comparison.

Usage:
  ./scripts/download-kraken-models.sh           # download + evaluate
  ./scripts/download-kraken-models.sh --eval-only  # re-run evaluation
  ./scripts/download-kraken-models.sh --activate 1  # pick winner

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:41:59 +02:00
Marcel
6669fffead fix(ocr): pin transformers<5.0 and torch==2.7.1 in requirements.txt
Some checks failed
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
transformers 5.x breaks surya 0.17.1 — SuryaDecoderConfig is missing
pad_token_id. Pin to transformers>=4.56.1,<5.0.0.

Also add torch==2.7.1 to requirements.txt to prevent pip from upgrading
it past the CPU-only build installed in the Dockerfile layer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:34:03 +02:00
Marcel
41f9262238 feat(ocr): add Kraken model download and evaluation script
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 2s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 2s
Runbook script to download both HTR-United Kurrent model candidates
(german_kurrent_manu_9, kurrent-de) into the ocr_models Docker volume,
test them against sample documents, and activate the winner.

Usage:
  ./scripts/download-kraken-models.sh              # download both
  ./scripts/download-kraken-models.sh --activate 1  # pick model 1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:19:39 +02:00
Marcel
c74539b04b feat(ocr): auto-insert [unleserlich] markers for low-confidence words
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 2s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 1s
New confidence.py module with two functions:
- apply_confidence_markers(): replaces words below threshold with
  [unleserlich], collapses adjacent markers into one
- words_from_characters(): reconstructs word-level confidence from
  Kraken's character-level data

Surya 0.17 provides native word-level confidence via line.words.
Kraken 7.0 provides per-character confidences via record.confidences.
Both engines now pass word+confidence data through main.py, which
applies the marker post-processing before returning the API response.

Threshold configurable via OCR_CONFIDENCE_THRESHOLD env var (default 0.3).
Frontend already renders [unleserlich] markers via transcriptionMarkers.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:16:17 +02:00
Marcel
49975154d9 feat(ocr): bump to latest surya 0.17.1, kraken 7.0, torch 2.7.1
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
- surya-ocr 0.6.3 → 0.17.1: new predictor API (FoundationPredictor,
  RecognitionPredictor, DetectionPredictor), native polygon output
  on text lines (4-point clockwise)
- kraken 5.2.9 → 7.0: wider torch range (>=2.4,<=2.10), unpinned numpy
- torch 2.5.1 → 2.7.1: satisfies surya's >=2.7.0 requirement
- Rewrite engines/surya.py for the 0.17 predictor class API
- Surya now outputs polygons natively — no longer rectangle-only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 18:53:14 +02:00
Marcel
e29c865016 fix(ocr): upgrade kraken to 6.0.3 for torch>=2.4 compatibility
Some checks failed
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 2s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 3s
kraken 5.2.9 required torch~=2.1.0, incompatible with surya-ocr's
torch>=2.3.0. kraken 6.0.3 requires torch>=2.4.0,<=2.9 which
overlaps with surya and our pinned torch==2.5.1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 18:48:14 +02:00
Marcel
d49010cd7b fix(ocr): relax pillow version to match surya-ocr constraint
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 1s
surya-ocr 0.6.3 requires pillow<11.0.0,>=10.2.0. The previous
pin at 11.1.0 caused a dependency resolution failure during
Docker build.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 18:40:46 +02:00
Marcel
931fbc28e5 fix(annotations): use @JdbcTypeCode(JSON) for polygon JSONB column
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 2s
Replace @Convert(PolygonConverter) with Hibernate native @JdbcTypeCode(SqlTypes.JSON)
to fix JDBC type mismatch — PostgreSQL requires jsonb type, not varchar.

The PolygonConverter is retained as a standalone utility but no longer
used on the entity. Hibernate 6 natively handles List<List<Double>>
serialization to JSONB.

Refs #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:39:54 +02:00
Marcel
a4651aa317 feat(frontend): add OCR UI components and translations
- ScriptTypeSelect: native select for TYPEWRITER/HANDWRITING_LATIN/KURRENT
- OcrTrigger: wraps script type select + start button + confirmation dialog
- OcrProgress: SSE-based progress display with page counter and progress bar
- Paraglide translations for OCR (de/en/es): script types, trigger labels,
  confirmation dialog, progress messages, error messages
- ErrorCode type + getErrorMessage: OCR_SERVICE_UNAVAILABLE, OCR_JOB_NOT_FOUND,
  OCR_DOCUMENT_NOT_UPLOADED, OCR_PROCESSING_FAILED

All 687 frontend tests pass.

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:36:00 +02:00
Marcel
cf8dc3559f feat(frontend): extract AnnotationShape component with polygon support
- AnnotationShape.svelte: renders a single annotation as either a
  rectangle or a polygon-clipped div (via CSS clip-path: polygon())
- AnnotationLayer.svelte: refactored to delegate rendering to
  AnnotationShape, keeping draw logic and hover state management
- Annotation type: added optional polygon field ([number, number][] | null)
- Polygon coordinates are converted from page-normalized to
  bounding-box-relative percentages for clip-path

All 687 existing frontend tests pass.

Refs #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:30:27 +02:00
Marcel
6737bd6db5 feat(ocr): add Python OCR microservice, RestClientOcrClient, Docker Compose
Python microservice (ocr-service/):
- FastAPI app with /ocr and /health endpoints
- Surya engine: transformer-based OCR for typewritten/modern handwriting
- Kraken engine: historical HTR for Kurrent/Suetterlin with
  pure-Python polygon-to-quad approximation (gift wrapping + rotating calipers)
- Eager model loading at startup via lifespan context manager
- PDF download via httpx, page rendering via pypdfium2 at 300 DPI

Java RestClientOcrClient:
- Implements OcrClient + OcrHealthClient interfaces
- Calls Python service via Spring RestClient
- Health check with graceful fallback

Docker Compose:
- New ocr-service container (mem_limit 6g, no host ports)
- Health check with start_period 60s for model loading
- ocr_models volume for Kraken model files
- Backend depends on ocr-service health

Refs #226, #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:26:40 +02:00
Marcel
aea46c5fd0 feat(ocr): add OcrService, OcrBatchService, OcrProgressService, OcrController
- OcrService: single-document OCR (health check, block clearing,
  presigned URL, annotation + block creation)
- OcrBatchService: batch processing with @Async, per-document status
  tracking, SKIPPED for PLACEHOLDER documents, failure isolation
- OcrProgressService: SSE emitter registry per job ID with 5-min timeout
- OcrController: POST /api/documents/{id}/ocr (WRITE_ALL),
  POST /api/ocr/batch (ADMIN), GET /api/ocr/jobs/{id} (READ_ALL),
  GET /api/ocr/jobs/{id}/progress (SSE), GET /api/documents/{id}/ocr-status

19 tests: 6 OcrService, 4 OcrBatchService, 3 OcrProgressService, 6 OcrController

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:24:15 +02:00
Marcel
ff3990710e feat(ocr): add OCR infrastructure (interfaces, entities, migrations, DTOs)
- OcrClient + OcrHealthClient interfaces for testable OCR integration
- OcrBlockResult record for OCR engine response mapping
- OcrJob + OcrJobDocument entities with status enums
- V25 migration creates ocr_jobs and ocr_job_documents tables
- Repositories for job and job-document queries
- TriggerOcrDTO, BatchOcrDTO (@Size max=500), OcrStatusDTO
- ErrorCodes: OCR_SERVICE_UNAVAILABLE, OCR_JOB_NOT_FOUND,
  OCR_DOCUMENT_NOT_UPLOADED, OCR_PROCESSING_FAILED

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:15:16 +02:00
Marcel
d194b6b225 feat(documents): add ScriptType enum and script_type column
- ScriptType enum: UNKNOWN, TYPEWRITER, HANDWRITING_LATIN, HANDWRITING_KURRENT
- V24 migration adds script_type VARCHAR(30) NOT NULL DEFAULT 'UNKNOWN'
- Document entity: scriptType field with @Builder.Default UNKNOWN
- DocumentUpdateDTO: optional scriptType field
- DocumentService: wires scriptType through update method

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:13:42 +02:00
Marcel
c19c41f812 feat(annotations): add createOcrAnnotation that skips overlap check
OCR creates many adjacent text line annotations that would fail the
existing overlap check. createOcrAnnotation() accepts an optional
polygon and bypasses overlap detection entirely.

Refs #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:12:11 +02:00
Marcel
878a90a86d feat(annotations): add polygon JSONB support for quadrilateral shapes
- V23 migration adds polygon JSONB column with 4-point CHECK constraint
- PolygonConverter: AttributeConverter for List<List<Double>> <-> JSONB
- @UniquePoints custom validator rejects duplicate coordinates
- CreateAnnotationDTO: validated optional polygon field
- DocumentAnnotation entity: polygon field with converter

Refs #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:10:35 +02:00
Marcel
ec32d225b5 docs(adr): add ADR-001 (OCR microservice) and ADR-002 (polygon JSONB)
ADR-001 documents the decision to use a separate Python container for
OCR (Surya + Kraken), the interface contract, and why alternatives
like Tess4J were rejected.

ADR-002 documents the decision to store polygon annotations as JSONB
with a 4-point CHECK constraint, backed by an AttributeConverter.

Refs #226, #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:07:46 +02:00
Marcel
11a35f2952 fix(tests): resolve all 4 pre-existing test failures
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
- CommentThread: add missing empty-state paragraph using comment_empty_hint
  i18n key (key existed but was never rendered in the template)
- TranscriptionBlock: add selectedQuote hint using transcription_block_quote_hint
  i18n key (key existed but was never rendered); fix test to use native DOM
  el.focus()/setSelectionRange()/dispatchEvent instead of locator.selectText()
  which is not available in this vitest-browser version
- TranscriptionEditView: fix test to use native el.dispatchEvent(FocusEvent)
  instead of locator.blur() which is not available
- Conversations: fix test expected text from stale "Korrespondenz durchsuchen"
  to match current conv_empty_heading() = "Wessen Briefe möchten Sie lesen?"

All 687 tests now pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:55:34 +02:00
Marcel
d046c89631 test(confirm): add ConfirmDialog component spec (12 tests)
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3s
CI / Backend Unit Tests (pull_request) Failing after 1s
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 1s
Covers: title/body rendering, destructive vs primary button class,
custom labels, settle true/cancel, aria-labelledby, and hide-after-settle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:38:58 +02:00
Marcel
a2d078b8f9 refactor(persons): replace non-null assertion with null guard on removeFormEl
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:36:47 +02:00
Marcel
0b95c90e7a refactor(confirm): use import { m } instead of import * as m in ConfirmDialog
Consistent with every other component in the project.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:35:42 +02:00
Marcel
84378f11b4 refactor(confirm): use plain let for resolveRef instead of $state
resolveRef is never read reactively — it is only read synchronously
inside settle(). Using $state was misleading about the intent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:34:52 +02:00
Marcel
3a316bc382 fix(ui): center dialog, add backdrop, hover states, and cursor-pointer on buttons
- Add m-auto and w-full to ensure the native <dialog> is centred
- Add backdrop:bg-black/50 for dimmed overlay when modal is open
- Add hover:bg-danger/80 and hover:bg-primary/80 on confirm button
- Add cursor-pointer to both cancel and confirm buttons

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:33:33 +02:00
Marcel
1a519eedd6 refactor(persons): replace inline delete modal with ConfirmService in NameHistoryEditCard
Some checks failed
CI / Unit & Component Tests (push) Failing after 4s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 3s
CI / Backend Unit Tests (pull_request) Failing after 2s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:19:33 +02:00
Marcel
498679234a refactor(docs): replace inline confirmDelete toggle with ConfirmService in SaveBar
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:12:01 +02:00
Marcel
14fc5cbc54 refactor(admin): replace window.confirm with ConfirmService in admin group delete
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:06:54 +02:00
Marcel
0d1401ce4f refactor(admin): replace window.confirm with ConfirmService in admin user delete
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 14:04:09 +02:00
Marcel
d4ead08c17 refactor(transcription): replace window.confirm with ConfirmService in TranscriptionBlock
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:47:37 +02:00
Marcel
08bd27b5cd feat(layout): mount ConfirmDialog in root layout and provide confirm service
provideConfirmService() sets up context for the entire component tree.
ConfirmDialog is mounted once at the bottom of the layout shell.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:21:34 +02:00
Marcel
1942c2a5cb feat(confirm): add ConfirmService and ConfirmDialog with deferred-Promise pattern
- confirm.svelte.ts: context-based async service returning Promise<boolean>
- ConfirmDialog.svelte: native <dialog> element, reads service from context
- Concurrent calls return false immediately (guard at top of confirm())
- SSR-safe: confirm() returns Promise.resolve(false) on server
- getConfirmService() throws descriptive error outside provider tree
- 5 Vitest tests: confirm/cancel/Escape/concurrent/outside-provider all green

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:20:37 +02:00
Marcel
fb00de6690 feat(design-system): add --c-danger/--c-danger-fg token pair for destructive actions
Light: #c0392b (5.1:1 on white — WCAG AA), dark: #e55347 (4.7:1 on surface).
Exposed as bg-danger/text-danger-fg Tailwind utilities via @theme inline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:07:46 +02:00
Marcel
52dd72ae8d feat(i18n): add btn_confirm key to de/en/es message files
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:06:43 +02:00
Marcel
7a6b3d66fb docs(spec): add design spec for person title & type fields UI
Covers segmented type control, title input, conditional field
visibility, PersonCard title display, mobile layout, and a11y.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 21:42:17 +02:00
Marcel
e69aaa6a8c fix: classify Steuerfinanzamt and Reichsfechtschule as institutions
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
Add "amt" and "schule" suffixes to INSTITUTION_END in PersonTypeClassifier
so German government offices and schools are auto-classified on import.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 20:59:17 +02:00
Marcel
c34db997fa feat(model): add title field to PersonUpdateDTO with @Size validation
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3s
CI / Backend Unit Tests (pull_request) Failing after 2s
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 1s
Add title to PersonUpdateDTO with @Size(max=50) constraint.
PersonService.createPerson and updatePerson now handle the title
field with blank-to-null normalization.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 18:38:33 +02:00
Marcel
166f60f7d3 feat(ui): show type icon in avatar for non-person entities
Person list and detail page avatars now display a type-specific
icon (building, people group, question mark) instead of meaningless
initials for INSTITUTION, GROUP, and UNKNOWN person types.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 18:36:34 +02:00
Marcel
a1b21d6989 refactor(ui): use CSS custom properties for PersonTypeBadge colors
Replace hardcoded Tailwind utility colors with project CSS variables
(--c-badge-institution-*, --c-badge-group-*, --c-badge-unknown-*).
Dark mode variants defined in both @media and manual toggle blocks.
Extract shared badge classes and use $derived config object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 18:35:05 +02:00
Marcel
5106d277f1 test(service): add integration test for findOrCreateByAlias classification
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 2s
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 1s
Testcontainers test verifying: SKIP returns null with no DB record,
INSTITUTION/GROUP store full name in lastName with null firstName
and correct personType, PERSON splits name normally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:29:20 +02:00
Marcel
ac545ecdaa refactor: address PR review concerns
- Remove Architekt from WORD_PREFIXES (classifier handles it)
- Use Objects.equals for null-safe firstName/lastName comparison
- Remove unused trimmed variable in PersonTypeClassifier
- Fix containsWord to loop through all occurrences (finds
  "Eltern" in "Nachbareltern Eltern")
- Extract DisplayNameFormatter utility shared by Person and
  PersonSummaryDTO to eliminate display logic duplication

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:25:06 +02:00
Marcel
c0cf8d7952 fix(service): add @Nullable to findOrCreateByAlias and filter nulls in caller
Add @Nullable annotation to findOrCreateByAlias() return type.
Filter null results (from SKIP classification) in MassImportService
receiver list to prevent null elements in the receivers collection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:22:33 +02:00
Marcel
73640ef5b6 feat(parser): implement stripTitle for known prefixes
Some checks failed
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 1s
Two-pass title stripping with loop for stacked titles:
- Dot-prefixes (Dr., Prof.) matched without trailing space
- Word-prefixes (Tante, Frau, Schwester, etc.) matched at
  word boundary
- Stacked titles like "Prof. Dr. Muller" handled correctly
- Single token after title strip goes to lastName (not firstName)

Add 5 "von" last names to KNOWN_LAST_NAMES for correct splitting
of entries like "Freifrau von Massenbach".

15 new test cases + updated 3 existing tests for title behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:15:18 +02:00
Marcel
6ee1ef73c3 feat(ui): add PersonTypeBadge to person list and detail pages
Show colored badge for non-PERSON types per design spec:
- INSTITUTION: blue with building icon
- GROUP: purple with people icon
- UNKNOWN: amber with question mark icon
- PERSON: no badge (unmarked default)

Badge appears on person cards in list and on detail page.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:09:16 +02:00
Marcel
a3da5731d0 feat(service): integrate PersonTypeClassifier into findOrCreateByAlias
Classify raw name before processing. SKIP returns null (no Person
created). INSTITUTION/GROUP skip split() and store full name in
lastName with firstName=null and appropriate personType.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 13:06:49 +02:00