Commit Graph

82 Commits

Author SHA1 Message Date
Marcel
7ca44d7df1 fix(db): add indexes on documents.sender_id and document_comments.author_id
Some checks failed
CI / Unit & Component Tests (push) Failing after 4m26s
CI / OCR Service Tests (push) Successful in 32s
CI / Backend Unit Tests (push) Failing after 3m16s
CI / Unit & Component Tests (pull_request) Failing after 4m33s
CI / OCR Service Tests (pull_request) Successful in 39s
CI / Backend Unit Tests (pull_request) Failing after 3m16s
Flyway V62 adds idx_documents_sender_id and idx_comments_author_id to speed up
FK-driven queries on the persons page and briefwechsel view. Closes #470.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 16:31:30 +02:00
Marcel
3fcdfa85f1 fix(db): add PRIMARY KEY to group_permissions; promote tbmp UNIQUE to PK
V63 deduplicates any phantom (group_id, permission) rows accumulated since
the initial schema. V64 sets NOT NULL on permission and adds pk_group_permissions.
V65 renames uq_tbmp_block_person to pk_tbmp for naming-convention consistency.
Integration tests confirm each constraint via pg_catalog.pg_constraint. Closes #469 (partial).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 15:18:46 +02:00
Marcel
5146aeb568 feat(document): add DocumentSort.UPDATED_AT for reader dashboard feed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 15:56:47 +02:00
Marcel
360db1ae33 chore(documents): drop V61 timeline density index migration (#385)
The index was added in anticipation of a SQL GROUP BY aggregation,
but DocumentService.getDensity aggregates in memory via
findAll(spec).stream(). The index is never touched by the current
query plan. Per Markus's round-2 review: drop the unused migration
to avoid mismatched rationale-vs-implementation debt. Revisit when
the archive crosses 50k rows (TODO already in getDensity Javadoc).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 10:49:24 +02:00
Marcel
ce0c013f0f feat(documents): add document_date index for density aggregation (#385)
Issue #385 introduces GET /api/documents/density which aggregates documents
by month via date_trunc. Adding the index now keeps the query cheap as the
archive grows and removes a future-investigation tax.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 21:43:28 +02:00
Marcel
eedf5e3ac1 fix(backend): rename users table to app_users
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3m43s
CI / OCR Service Tests (pull_request) Successful in 39s
CI / Backend Unit Tests (pull_request) Failing after 3m15s
CI / Unit & Component Tests (push) Failing after 3m37s
CI / OCR Service Tests (push) Successful in 41s
CI / Backend Unit Tests (push) Failing after 3m2s
Aligns the auth-account table name with the AppUser entity. The historical
mismatch (table 'users' alongside table 'persons') misled schema-first readers
into assuming the two were related; renaming to 'app_users' makes the
deliberate split between auth accounts and historical persons explicit at the
schema layer.

Scope: the table itself, the users_groups join table, and the three FK columns
whose name was literally 'user_id'. Semantic FK columns (audit_log.actor_id,
notifications.recipient_id, document_versions.editor_id, etc.) keep their
names — the role they describe is the documentation, not the type.

Closes #418. Unblocks #407 (REFACTOR-1).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-04 21:44:21 +02:00
Marcel
18e5d18cc7 feat(geschichte): V59 grants BLOG_WRITE to existing WRITE_ALL groups
Without this, the Geschichten feature ships dark on prod day-one — no group
holds BLOG_WRITE, so the editor controls never render even for admins. The
mapping "anyone who can write documents can also author family stories" is
the safest default and admins can revoke afterwards via the new checkbox UI.

Closes Tobias's review S5 on PR #382.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 18:42:46 +02:00
Marcel
e5024fc804 test(geschichte): add Testcontainers integration test and fix V58 author FK
The end-to-end test creates a DRAFT, verifies it is hidden from a READ_ALL
reader (list and getById), publishes it, verifies the reader sees it, then
deletes it and confirms the join rows go with it but the linked Person
remains. Also corrects the V58 author FK to reference the actual users
table (not app_users).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 17:33:52 +02:00
Marcel
b944ae9510 feat(geschichte): add entity, status enum, and V58 schema migration
Geschichte holds family memory stories (issue #381). Body is unbounded TEXT
(Tiptap HTML, no length limit). Two join tables link a story to historical
Persons and Documents. A partial index speeds the public index query
(status='PUBLISHED' ORDER BY published_at DESC) and reverse-lookup indexes
support the ?personId and ?documentId filters.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 17:24:31 +02:00
Marcel
091f6c7592 migration(transcription): add unique constraint on (block_id, person_id) sidecar
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3m4s
CI / OCR Service Tests (pull_request) Successful in 35s
CI / Backend Unit Tests (pull_request) Failing after 2m59s
CI / Unit & Component Tests (push) Failing after 3m5s
CI / OCR Service Tests (push) Successful in 35s
CI / Backend Unit Tests (push) Failing after 2m59s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 23:42:05 +02:00
Marcel
e833d1f71a feat(transcription): V56 migration adds transcription_block_mentioned_persons sidecar
Child table for @-mentions inside transcription block text. Each row binds
one block to one person via personId + displayName; the literal "@DisplayName"
stays in block.text. No FK on person_id so deleted persons degrade gracefully
to plain unlinked text rather than cascade-deleting the block. Indexed on
person_id for the future "blocks mentioning person X" query and on block_id
for the @ElementCollection load.

Schema choice diverges from document_comments.comment_mentions (many-to-many
to AppUser): the latter cascades, this one degrades. Mirrors the established
UserGroup.permissions / group_permissions @ElementCollection pattern.

Refs #362

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 20:03:36 +02:00
Marcel
6babcc7f17 fix(stammbaum): V55 adds unique_spouse_pair index — symmetric SPOUSE_OF enforced at DB level
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 19:32:17 +02:00
Marcel
df6175ed2c feat(stammbaum): add V54 migration for family network
Adds persons.family_member flag and person_relationships table with
ON DELETE CASCADE on both FKs, no_self_rel check, unique_rel composite,
indexes on both person columns, and partial unique index for symmetric
SIBLING_OF pairs (LEAST/GREATEST trick).

Refs #358.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 19:32:17 +02:00
Marcel
43cf022f05 feat(documents): extend quick-upload with optional batch metadata part
- Add DocumentBatchMetadataDTO (titles, senderId, receiverIds, documentDate, location, tags, metadataComplete)
- Add BATCH_TOO_LARGE to ErrorCode
- Extend quickUpload to accept optional @RequestPart("metadata"); dispatches to storeDocumentWithBatchMetadata when present
- Cap batch at 50 files/request; reject 400 when titles.size > files.size
- Add DocumentService.storeDocumentWithBatchMetadata applying shared fields + index-based titles to both created and updated docs
- Raise max-request-size to 500MB (10-file chunk at max per-file size)
- Add structured SLF4J logging for every quickUpload call

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 12:24:22 +02:00
Marcel
55557047de feat(documents): V53 add thumbnail_aspect + page_count columns
Adds two nullable metadata columns to documents, populated by
ThumbnailService when it generates the JPEG preview. Both remain null
until the existing admin backfill endpoint reruns the service.

Refs #305

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-23 21:38:56 +02:00
Marcel
6cf0601590 feat(db): add thumbnail_key and thumbnail_generated_at to documents
Adds two nullable columns to the documents table and their JPA mappings
on the Document entity. Both are left out of the OpenAPI required-mode
schema so the generated TypeScript type exposes them as optional.

Refs #307

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 21:34:03 +02:00
Marcel
13732ab96b fix(db): V51 backfills annotation_id on block comments and notifications
Previously issued block-comment notifications were stored with
annotation_id=NULL because CommentService.postBlockComment did not
populate DocumentComment.annotationId. Now that the code fix is in
place, existing rows need to be filled in so legacy notifications
can also carry the query param that the frontend deep-link flow
expects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 13:13:13 +02:00
Marcel
edb4e54df2 fix(audit): backfill COMMENT_ADDED and MENTION_CREATED events
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m40s
CI / OCR Service Tests (push) Successful in 35s
CI / Backend Unit Tests (push) Failing after 2m54s
Comments created before audit logging was added in 428c63a2 have no
corresponding audit_log rows, so the Chronik activity feed (which
reads exclusively from audit_log) cannot surface them in "Alle" or
"Für dich", even though the fix from #295 is wired up correctly.
V50 inserts the missing events idempotently from document_comments
and comment_mentions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 12:08:45 +02:00
Marcel
101f5b2a6a feat(audit): add V49 rollup covering index + raise /api/dashboard/activity cap to 40
- V49__add_audit_log_rollup_index.sql: partial covering index on
  (actor_id, document_id, kind, happened_at DESC) filtered by the 6 rollup
  kinds. Matches the WHERE clause of findRolledUpActivityFeed exactly so the
  session-grouping window scan is index-backed.
- DashboardController: clamp limit to 40 (was 20). Chronik requests up to 40
  activity items per page; dashboard side-rail still passes 7.

Part of #285.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 20:38:10 +02:00
Marcel
71c02626f4 feat(migration): V48 add composite index on transcription_blocks(document_id, reviewed)
Speeds up the bulk completion percentage query added in previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 23:19:24 +02:00
Marcel
bae07c8171 fix(audit): submit afterCommit write to executor to avoid transaction sync conflict
AuditService.logAfterCommit() called writeLog() inline inside the afterCommit()
callback. At that point Spring's transaction synchronizations are still active on
the thread, so SimpleJpaRepository.save() throws IllegalStateException which the
catch block silently swallowed — leaving audit_log permanently empty.

Fix: submit writeLog() to auditExecutor so it runs on a fresh thread with no active
synchronization context. Also switch auditExecutor from CallerRunsPolicy to AbortPolicy
to prevent the bug from silently recurring when the queue fills under load.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:39:59 +02:00
Marcel
c678432d25 fix(migration): correct app_users → users table references in V46/V47
The AppUser entity is mapped to the 'users' table (not 'app_users').
V46 had a broken REFERENCES clause and hardcoded role in REVOKE; V47 and the
native query in AuditLogQueryRepository had the same wrong table name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 16:58:04 +02:00
Marcel
cb02dc84f6 feat(user): add deterministic avatar color to AppUser
Adds color field assigned from an 8-colour palette keyed on the user's UUID
hash (Math.abs(id.hashCode()) % 8). Fires via @PrePersist/@PreUpdate/@PostLoad
so both new and existing users get the correct colour at runtime.

V47 migration adds the column and fixes the V46 REVOKE bug that hardcoded
role name 'app_user' instead of CURRENT_USER.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 16:33:27 +02:00
Marcel
793b863096 feat(audit): add audit_log infrastructure and instrument AnnotationService
- V46 migration: audit_log table with indexes and append-only REVOKE
- audit/ package: AuditKind enum (with Javadoc payloads), AuditLog entity,
  AuditLogRepository, AuditService (@Async on dedicated auditExecutor)
- AsyncConfig: auditExecutor with CallerRunsPolicy and queueCapacity 50
- AnnotationService: ANNOTATION_CREATED on createAnnotation() only,
  deferred via afterCommit() when inside a transaction

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 13:17:54 +02:00
Marcel
88012a1193 fix(invite): address review cycle 2 feedback
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m32s
CI / Unit & Component Tests (pull_request) Failing after 2m31s
CI / OCR Service Tests (pull_request) Successful in 31s
CI / Backend Unit Tests (pull_request) Failing after 2m46s
CI / OCR Service Tests (push) Successful in 36s
CI / Backend Unit Tests (push) Failing after 2m43s
- Narrow isTrustedProxy to RFC 1918 172.16-31.x.x (was 172.x.x.x)
- Add @Valid/@NotBlank/@Email to RegisterRequest and @Valid to AuthController
- Add FK constraint on invite_token_group_ids.group_id → user_groups(id)
- Add back-to-login link and <main> landmark to register error state
- Add component test suite for register/+page.svelte (11 tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 09:30:57 +02:00
Marcel
61fa35df67 feat(invites): implement invite-based self-service registration backend
- V45 migration: invite_tokens + invite_token_group_ids tables
- InviteToken entity with @ElementCollection group IDs
- InviteService: code generation, validation, redemption (pessimistic lock prevents TOCTOU), revoke, list
- RateLimitInterceptor (Caffeine-backed, 10 req/min per IP) registered via WebMvcConfigurer
- AuthController: GET /api/auth/invite/{code} + POST /api/auth/register (both public)
- InviteController: GET/POST/DELETE /api/invites (ADMIN_USER permission)
- SecurityConfig: permitAll for new public auth endpoints
- ErrorCode: INVITE_NOT_FOUND, INVITE_EXHAUSTED, INVITE_REVOKED, INVITE_EXPIRED
- 36 new tests (InviteServiceTest, AuthControllerTest, InviteControllerTest)

Closes #269

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 00:42:43 +02:00
Marcel
5e01db1c74 feat(auth): remove username field, migrate identity to email
- AppUser entity: replace username with email (NOT NULL, UNIQUE,
  colon-pattern validated)
- AppUserRepository: remove findByUsername, rename search JPQL to
  searchByEmailOrName (searches email + firstName + lastName)
- CreateUserRequest: remove username, require email with colon guard
- UserService: rename findByUsername→findByEmail, createUserOrUpdate
  upserts by email, blank-email guard throws instead of setting null
- UserController + all other controllers: findByEmail(auth.getName())
- DataInitializer: email-based config and lookup, E2E users have email
- V44 migration: pre-check + email NOT NULL + drop username column
- All tests updated: .username() builders removed, mocks updated,
  NotificationRepositoryTest fixtures include email fields

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 23:36:55 +02:00
Marcel
92f3c04d54 fix(ocr): add partial unique index and align SenderModelServiceTest with suite style
Add V42 partial unique index on ocr_training_runs(person_id) WHERE status='QUEUED'
to enforce the per-person queued coalescing guarantee at the DB level. Also adds
@ExtendWith(MockitoExtension.class) to SenderModelServiceTest for consistency with
the rest of the service test suite, with lenient() on the shared txTemplate stub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:25:18 +02:00
Marcel
18cf839fac feat(ocr): wire SenderModelService into OcrAsyncRunner; stage missing foundational files
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m21s
CI / OCR Service Tests (push) Successful in 29s
CI / Backend Unit Tests (push) Failing after 2m38s
CI / Unit & Component Tests (pull_request) Failing after 2m26s
CI / OCR Service Tests (pull_request) Successful in 31s
CI / Backend Unit Tests (pull_request) Failing after 2m44s
OcrAsyncRunner now passes the per-sender model path to streamBlocks for
HANDWRITING_KURRENT documents. processDocument replaced extractBlocks
with streamBlocks + AtomicReference, removing the unchecked raw-array
pattern.

Also stages all previously uncommitted foundational files for this
feature: SenderModel entity, SenderModelRepository, Flyway migrations
V40/V41, updated OcrClient/RestClientOcrClient streaming API,
TrainingDataExportService.exportForSender, TranscriptionService Kurrent
hook, application.yaml OCR config, and frontend i18n/test additions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 19:27:02 +02:00
Marcel
f9ac963b9f feat(#221): add V39 migration for tag hierarchy and colors
Adds parent_id FK (ON DELETE SET NULL), self-reference check constraint,
parent_id index, and nullable color column to the tag table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 15:15:17 +02:00
Marcel
23410aa4b8 fix(#240): rename V37→V38 (V37 was already applied); regenerate api.ts
The original needsExpert V37 migration was applied to the dev DB before
the feature was removed. Renaming our new indexes migration to V38 avoids
the Flyway checksum conflict. Regenerated api.ts now reflects the
@Schema(requiredMode=REQUIRED) annotations — DTO fields are non-optional.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 12:23:14 +02:00
Marcel
adea7d498f fix(#240): add @Schema(requiredMode=REQUIRED) to both queue DTOs; add V37 indexes
All non-null DTO fields are now marked required so the generated api.ts
emits required (non-optional) types for callers. V37 migration adds
created_at/updated_at indexes on document_annotations and transcription_blocks
to avoid full table scans in the weekly stats correlated subqueries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 12:09:09 +02:00
Marcel
ca0cf4903c refactor(#240): remove needsExpert feature completely
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m23s
CI / Backend Unit Tests (pull_request) Failing after 2m43s
CI / Backend Unit Tests (push) Has been cancelled
CI / Unit & Component Tests (push) Has started running
Drops the needsExpert / needs_expert flag end-to-end: DB migration
(V37, never applied), Document entity field, PATCH endpoint, service
method, DTO field, all three queue queries, ExpertBadge component,
i18n key, generated API types, and test fixture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:52:14 +02:00
Marcel
9404ec34ce fix(#240): add missing V36 index migration and rename needs_expert to V37
V36 (add_index_transcription_blocks_document_id) was applied to the dev
database during a previous local session but never committed to git.
Flyway checksum mismatch prevented the backend from starting.

- V36__add_index_transcription_blocks_document_id.sql: restored from the
  index that already exists in the database (idx_transcription_blocks_document_id)
- V36__add_needs_expert_to_documents.sql → V37__add_needs_expert_to_documents.sql

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:42:18 +02:00
Marcel
2ea603a3bf feat(#240): backend for Mission Control Strip — queue endpoints + expert flag
Adds the server-side foundation for the dashboard transcription widget:

- V36 migration: needs_expert BOOLEAN NOT NULL DEFAULT FALSE on documents
- Document entity: needsExpert field (@Schema required)
- DocumentRepository: 4 native queries — segmentation queue, transcription
  queue, ready-to-read queue (seeded weekly shuffle sort), weekly pulse stats
- TranscriptionQueueService: maps Object[] rows to typed DTOs, handles
  PostgreSQL type variations (UUID/String, Date/LocalDate, Number/BigDecimal)
- TranscriptionQueueController: GET /api/transcription/{segmentation-queue,
  transcription-queue, ready-to-read, weekly-stats} — all guarded by READ_ALL
- DocumentService + DocumentController: PATCH /api/documents/{id}/needs-expert
  toggles the expert flag (WRITE_ALL required)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:41:55 +02:00
Marcel
bcb2898e5f perf(search): add index on transcription_blocks.document_id for lateral join
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
7ec3e6170d feat(fts): backfill search_vector for all existing documents (V35)
Fires the BEFORE UPDATE trigger for every documents row, which recomputes
the tsvector from all currently-linked metadata, blocks, receivers, and tags.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:30 +02:00
Marcel
24530cf85b feat(fts): add search_vector column, GIN index, DB triggers, and FTS repository method (V34)
- V34 migration: adds search_vector tsvector column with GIN index
- BEFORE INSERT/UPDATE trigger on documents rebuilds vector from title (A),
  summary + transcription_blocks.text (B), sender/receiver names (C),
  tag names + location (D) using german FTS config
- AFTER triggers on transcription_blocks, document_receivers, document_tags
  touch the parent document row to re-fire the BEFORE UPDATE trigger
- DocumentRepository.findRankedIdsByFts() native query using websearch_to_tsquery
- DocumentFtsTest: 12 integration tests covering stemming, trigger sync,
  ranking, stop words, malformed input, receiver and tag search

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:16 +02:00
Marcel
f76a6c0ee5 migration(annotations): add chk_annotation_bounds CHECK constraint (V33)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:38:11 +02:00
Marcel
22954f348a feat(training): track and display CER per training run
After each training run, the Character Error Rate (CER = 1 - accuracy),
loss, accuracy, and epoch count are now stored on the OcrTrainingRun
record and shown in the training history table.

Also adds the missing POST /api/ocr/segtrain endpoint and the
triggerSegTraining service method so the segmentation training card
can actually trigger training.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 19:01:10 +02:00
Marcel
9b2f91ee59 feat(training): add segmentation training pipeline and complete Part 6
- Add /segtrain endpoint to OCR service (ZIP upload, ketos.segtrain,
  backup rotation, in-process model reload)
- Add segtrainModel() to OcrClient and RestClientOcrClient (10-min timeout,
  X-Training-Token header)
- Add SegmentationTrainingExportService: PAGE XML export with polygon
  de-normalization and per-page PNG rendering via PDFBox
- Add GET /api/ocr/segmentation-training-data/export endpoint
- Make TranscriptionBlock.text nullable for segmentation-only blocks
  (V31 migration)
- Add Paraglide i18n translation keys for all training UI strings (de/en/es)
- Pass source prop from TranscriptionEditView to TranscriptionBlock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 15:15:17 +02:00
Marcel
88e005eb49 feat(ocr): add training history + POST /train + GET /training-info endpoints
- OcrTrainingRun entity + V30 migration (partial unique index prevents
  concurrent runs at DB level)
- OcrTrainingService: concurrent-run guard, 5-block threshold, MDC log
  correlation, orphan recovery on ApplicationReadyEvent
- POST /api/ocr/train (ADMIN) + GET /api/ocr/training-info (ADMIN)
- TRAINING_ALREADY_RUNNING ErrorCode
- 6 OcrTrainingServiceTest + 6 OcrControllerTest tests for the new endpoints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 14:47:56 +02:00
Marcel
fdf1eb92ad feat(training): add document-level training enrollment
- V29 migration: document_training_labels join table
- TrainingLabel enum: KURRENT_RECOGNITION, KURRENT_SEGMENTATION
- Document.trainingLabels @ElementCollection
- DocumentService.addTrainingLabel / removeTrainingLabel
- PATCH /api/documents/{id}/training-labels (WRITE_ALL)
- Auto-enroll on Kurrent OCR trigger (OcrService.startOcr)
- TranscriptionEditView: enrollment chips in panel footer
- JPQL queries updated to use MEMBER OF trainingLabels

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 14:30:51 +02:00
Marcel
dd47a48d90 feat(ocr): add unique constraint on (job_id, document_id)
Prevents the same document from being added to an OCR job twice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:28:18 +02:00
Marcel
971527a50e feat(ocr): show translated progress messages during OCR processing
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 1s
Backend sends progress codes (PREPARING, LOADING, ANALYZING,
CREATING_BLOCKS:N, DONE:N, ERROR) via OcrJob.progressMessage.
Frontend translates them via Paraglide (de/en/es) and displays
below the spinner.

- V27 migration: adds progress_message column to ocr_jobs
- OcrAsyncRunner updates progress at each phase
- Poll interval reduced to 2s for snappier updates

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 23:31:23 +02:00
Marcel
3aaec01421 feat(transcription): add source/reviewed fields for training pipeline
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 0s
CI / Unit & Component Tests (pull_request) Failing after 0s
CI / Backend Unit Tests (pull_request) Failing after 1s
- BlockSource enum: MANUAL, OCR
- V26 migration adds source + reviewed columns to transcription_blocks
- OcrService sets source=OCR when creating blocks
- TranscriptionService.reviewBlock() toggles the reviewed flag
- PUT /api/documents/{id}/transcription-blocks/{blockId}/review endpoint
- 5 new tests: reviewBlock toggle/untoggle/notfound, controller,
  OcrService source=OCR verification

The reviewed flag enables the Kraken fine-tuning pipeline: only blocks
marked as reviewed by a human are exported as training data.

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 21:44:51 +02:00
Marcel
ff3990710e feat(ocr): add OCR infrastructure (interfaces, entities, migrations, DTOs)
- OcrClient + OcrHealthClient interfaces for testable OCR integration
- OcrBlockResult record for OCR engine response mapping
- OcrJob + OcrJobDocument entities with status enums
- V25 migration creates ocr_jobs and ocr_job_documents tables
- Repositories for job and job-document queries
- TriggerOcrDTO, BatchOcrDTO (@Size max=500), OcrStatusDTO
- ErrorCodes: OCR_SERVICE_UNAVAILABLE, OCR_JOB_NOT_FOUND,
  OCR_DOCUMENT_NOT_UPLOADED, OCR_PROCESSING_FAILED

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:15:16 +02:00
Marcel
d194b6b225 feat(documents): add ScriptType enum and script_type column
- ScriptType enum: UNKNOWN, TYPEWRITER, HANDWRITING_LATIN, HANDWRITING_KURRENT
- V24 migration adds script_type VARCHAR(30) NOT NULL DEFAULT 'UNKNOWN'
- Document entity: scriptType field with @Builder.Default UNKNOWN
- DocumentUpdateDTO: optional scriptType field
- DocumentService: wires scriptType through update method

Refs #226

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:13:42 +02:00
Marcel
878a90a86d feat(annotations): add polygon JSONB support for quadrilateral shapes
- V23 migration adds polygon JSONB column with 4-point CHECK constraint
- PolygonConverter: AttributeConverter for List<List<Double>> <-> JSONB
- @UniquePoints custom validator rejects duplicate coordinates
- CreateAnnotationDTO: validated optional polygon field
- DocumentAnnotation entity: polygon field with converter

Refs #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:10:35 +02:00
Marcel
92f1a112f5 feat(migration): V22 add title, person_type, nullable first_name
- Add title VARCHAR(50) column
- Add person_type VARCHAR(20) NOT NULL DEFAULT 'PERSON' with CHECK
  constraint (PERSON, INSTITUTION, GROUP, UNKNOWN — SKIP excluded)
- Drop NOT NULL on first_name for non-person entities

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 11:55:04 +02:00