Commit Graph

269 Commits

Author SHA1 Message Date
Marcel
d075bf390a feat(tag-search): expand children and surface ancestor path in search results
Modifies TagService.search() to enrich name-matches with tree relatives:
root matches expand descendants, child matches prepend ancestors.
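
A minimal in-memory sketch of that enrichment rule, assuming a simple `Tag(id, name, parentId)` shape and case-insensitive substring matching (both are assumptions; the real `TagService.search()` works against `TagRepository`). A root match pulls in its whole subtree, while a child match is preceded by its ancestor chain, root first.

```java
import java.util.*;

class TagSearchSketch {
    // Hypothetical tag shape for illustration only.
    record Tag(UUID id, String name, UUID parentId) {}

    static List<Tag> search(List<Tag> all, String query) {
        Map<UUID, Tag> byId = new HashMap<>();
        Map<UUID, List<Tag>> childrenOf = new HashMap<>();
        for (Tag t : all) {
            byId.put(t.id(), t);
            if (t.parentId() != null) {
                childrenOf.computeIfAbsent(t.parentId(), k -> new ArrayList<>()).add(t);
            }
        }
        String q = query.toLowerCase(Locale.ROOT);
        LinkedHashSet<Tag> result = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        for (Tag t : all) {
            if (!t.name().toLowerCase(Locale.ROOT).contains(q)) continue;
            if (t.parentId() == null) {
                // Root match: the root itself plus every descendant.
                result.add(t);
                addDescendants(t, childrenOf, result);
            } else {
                // Child match: prepend the ancestor chain (assumes no dangling parents and no cycles).
                Deque<Tag> ancestors = new ArrayDeque<>();
                for (UUID p = t.parentId(); p != null; p = byId.get(p).parentId()) {
                    ancestors.addFirst(byId.get(p)); // root ends up first
                }
                result.addAll(ancestors);
                result.add(t);
            }
        }
        return new ArrayList<>(result);
    }

    private static void addDescendants(Tag parent, Map<UUID, List<Tag>> childrenOf, Set<Tag> out) {
        for (Tag child : childrenOf.getOrDefault(parent.id(), List.of())) {
            out.add(child);
            addDescendants(child, childrenOf, out);
        }
    }
}
```

Using a `LinkedHashSet` means a tag reached both as a descendant and as an ancestor appears only once, in the position where it was first added.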

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 11:27:41 +02:00
Marcel
4ec4062274 refactor(#248): simplify TagService.buildTree() to single-pass LinkedHashMap approach
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3m12s
CI / Backend Unit Tests (pull_request) Failing after 2m57s
CI / Unit & Component Tests (push) Failing after 2m41s
CI / Backend Unit Tests (push) Failing after 2m45s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 07:45:40 +02:00
Marcel
e6497ebff4 fix(#248): add @Schema(REQUIRED) to TagTreeNodeDTO, improve mergeTags log, add comments
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m42s
CI / Backend Unit Tests (pull_request) Failing after 2m44s
CI / Unit & Component Tests (push) Failing after 2m35s
CI / Backend Unit Tests (push) Failing after 2m44s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 01:01:09 +02:00
Marcel
d7a46de1cc refactor(#248): address PR review concerns — TagOperator enum, typed projection, bean validation
- Replace stringly-typed "AND"/"OR" tagOperator with TagOperator enum (DocumentService, DocumentController)
- Replace Object[] with TagCount projection interface in TagRepository.findDocumentCountsPerTag()
- Use @NotNull + @Valid on MergeTagDTO.targetId; remove manual null check from TagController
- Correct ALLOWED_TAG_COLORS to match actual frontend CSS tokens (sage/sienna/amber/slate/violet/rose/cobalt/moss/sand/coral)
- Add TOCTOU comment to validateNoAncestorCycle() with mitigation explanation
- Add test: deleteWithDescendants_skipsDocTagDeletion_whenDescendantIdsIsEmpty
- Update TagServiceTest to use mock TagRepository.TagCount projection
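
A rough sketch of the enum and color whitelist described above. The enum constants and color tokens come from the commit message; the `parseOperator()` helper and the rule that a `null` color is allowed are assumptions (Spring MVC normally converts a request parameter to an enum on its own, so the helper is only illustrative).

```java
import java.util.Locale;
import java.util.Set;

class TagOperatorSketch {
    // Typed replacement for the former "AND"/"OR" strings.
    enum TagOperator { AND, OR }

    // Tokens as listed in the commit; the real constant's type may differ.
    static final Set<String> ALLOWED_TAG_COLORS = Set.of(
            "sage", "sienna", "amber", "slate", "violet", "rose", "cobalt", "moss", "sand", "coral");

    // Hypothetical helper: missing value falls back to AND (the documented ?tagOp default).
    static TagOperator parseOperator(String raw) {
        if (raw == null || raw.isBlank()) return TagOperator.AND;
        // valueOf throws IllegalArgumentException for anything that is not AND or OR.
        return TagOperator.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    // Assumption: null means "no color" and is accepted (child tags inherit their root's color).
    static boolean isAllowedColor(String color) {
        return color == null || ALLOWED_TAG_COLORS.contains(color);
    }
}
```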

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:24:04 +02:00
Marcel
a669f6368d feat(#248): expose parentId in TagTreeNodeDTO OpenAPI schema and regenerate TypeScript types
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:33:12 +02:00
Marcel
5e5c249aba feat(#248): add POST /api/tags/{id}/merge and DELETE /api/tags/{id}/subtree endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:27:41 +02:00
Marcel
609d242f5d feat(#248): enrich TagTreeNodeDTO with parentId and populate documentCount via single aggregate query
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:24:50 +02:00
Marcel
c03c391879 test(#248): add deleteWithDescendants test coverage to TagServiceTest
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:20:19 +02:00
Marcel
f921284db6 feat(#248): add TagService.mergeTags() with validateNotSelf/validateNotDescendant/transferDocuments helpers
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:18:41 +02:00
Marcel
b9b572436a feat(#248): add merge/delete/count native queries to TagRepository
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:15:14 +02:00
Marcel
a05d9c22ae fix(#248): TagService.getById() throws DomainException(TAG_NOT_FOUND) instead of ResponseStatusException
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:13:45 +02:00
Marcel
de7c48117b feat(#248): add TAG_NOT_FOUND, TAG_MERGE_SELF, TAG_MERGE_INVALID_TARGET error codes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:10:52 +02:00
Marcel
06fd5ae2da fix(#221): resolve inherited color on child tags in document responses
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m51s
CI / Backend Unit Tests (push) Failing after 2m46s
Colors are stored only on root-level tags. DocumentService now calls
TagService.resolveEffectiveColors() before returning search results and
single-document responses, so child tags carry their parent's color when
serialised to JSON. Parent tags are batch-loaded in a single query.
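
One way to express the single batched lookup, sketched with a mutable `TagView` and a loader function standing in for the repository call (both hypothetical). Only direct parents are consulted, matching the description that colors live on root-level tags.

```java
import java.util.*;
import java.util.function.Function;

class EffectiveColorSketch {
    // Hypothetical mutable view; the real code works on the JPA Tag entity.
    static final class TagView {
        final UUID id;
        final UUID parentId;
        String color;
        TagView(UUID id, UUID parentId, String color) { this.id = id; this.parentId = parentId; this.color = color; }
    }

    // loadParentColors stands in for one batched query (e.g. findAllById over the parent ids).
    static void resolveEffectiveColors(Collection<TagView> tags,
                                       Function<Set<UUID>, Map<UUID, String>> loadParentColors) {
        Set<UUID> parentIds = new HashSet<>();
        for (TagView t : tags) {
            if (t.color == null && t.parentId != null) parentIds.add(t.parentId);
        }
        if (parentIds.isEmpty()) return; // nothing to inherit, so skip the query entirely
        Map<UUID, String> colors = loadParentColors.apply(parentIds); // exactly one load
        for (TagView t : tags) {
            if (t.color == null && t.parentId != null) t.color = colors.get(t.parentId);
        }
    }
}
```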

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 19:28:21 +02:00
Marcel
e8e54cc282 feat(#221): change TagInput binding to Tag[], add color dots and hierarchy grouping
Backend:
- TagRepository: add findDescendantIdsByName() recursive CTE query
- TagService: add expandTagNamesToDescendantIdSets() for document search

Frontend:
- TagInput: accept Tag[] (id, name, color, parentId) instead of string[]
- Chips show color dot via var(--c-tag-{color}) when tag has color
- Suggestions grouped hierarchically: children indented under their parents
- Update DescriptionSection, edit/new pages, SearchFilterBar, +page.svelte
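
An in-memory equivalent of what `findDescendantIdsByName()` and `expandTagNamesToDescendantIdSets()` are described as doing, sketched as a breadth-first walk instead of a recursive CTE. The case-insensitive name match and the `Tag` record are assumptions.

```java
import java.util.*;

class DescendantExpansionSketch {
    // Hypothetical tag shape for illustration only.
    record Tag(UUID id, String name, UUID parentId) {}

    // BFS stand-in for the recursive CTE: the named tag(s) plus every descendant.
    static Set<UUID> descendantIdsByName(List<Tag> all, String name) {
        Map<UUID, List<UUID>> children = new HashMap<>();
        for (Tag t : all) {
            if (t.parentId() != null) children.computeIfAbsent(t.parentId(), k -> new ArrayList<>()).add(t.id());
        }
        Set<UUID> result = new LinkedHashSet<>();
        Deque<UUID> queue = new ArrayDeque<>();
        for (Tag t : all) {
            if (t.name().equalsIgnoreCase(name)) queue.add(t.id()); // matching rule is an assumption
        }
        while (!queue.isEmpty()) {
            UUID id = queue.poll();
            // result.add() doubles as the visited check, so a bad cycle cannot loop forever.
            if (result.add(id)) queue.addAll(children.getOrDefault(id, List.of()));
        }
        return result;
    }

    // One set per requested name; an unknown name yields an empty set.
    static List<Set<UUID>> expandTagNamesToDescendantIdSets(List<Tag> all, List<String> names) {
        List<Set<UUID>> sets = new ArrayList<>();
        for (String n : names) sets.add(descendantIdsByName(all, n));
        return sets;
    }
}
```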

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 16:11:38 +02:00
Marcel
57dc72b51d feat(#221): add AND/OR tag filtering with hierarchy expansion in document search
- Replace hasTags(List<String>) spec with hasTags(List<Set<UUID>>, useOr)
- AND mode: one EXISTS subquery per expanded tag ID set; an empty set becomes a disjunction (always false), so an unknown tag name matches no documents
- OR mode: union of all expanded sets into a single EXISTS subquery
- DocumentService calls tagService.expandTagNamesToDescendantIdSets() before building spec
- DocumentController exposes ?tagOp=AND|OR query param (default AND)
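
The matching semantics of the two modes can be sketched in memory as follows. This is only an illustration of the predicate logic, not the project's JPA Specification; treating an empty filter list as "no filter" is an assumption.

```java
import java.util.*;

class TagFilterSketch {
    // docTagIds: the tags on one document; expandedSets: one set per requested tag (itself + descendants).
    static boolean matches(Set<UUID> docTagIds, List<Set<UUID>> expandedSets, boolean useOr) {
        if (expandedSets.isEmpty()) return true; // assumption: no tag filter requested
        if (useOr) {
            // OR: a single EXISTS over the union of all expanded sets.
            Set<UUID> union = new HashSet<>();
            expandedSets.forEach(union::addAll);
            return !Collections.disjoint(docTagIds, union);
        }
        // AND: one EXISTS per set; an empty set never matches (like a zero-term disjunction).
        for (Set<UUID> set : expandedSets) {
            if (Collections.disjoint(docTagIds, set)) return false;
        }
        return true;
    }
}
```

Because each set already contains the descendants, filtering by a parent tag also finds documents tagged only with one of its children.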

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 15:44:18 +02:00
Marcel
3fba740469 feat(#221): tag entity hierarchy fields, service, repository, controller
- Tag entity: add parentId (UUID FK) and color (String) fields
- TagUpdateDTO and TagTreeNodeDTO records
- ErrorCode: INVALID_TAG_COLOR, TAG_CYCLE_DETECTED
- TagRepository: findAncestorIds() recursive CTE query
- TagService: cycle detection, color validation, getTagTree()
- TagController: use TagUpdateDTO, add GET /api/tags/tree endpoint
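
The cycle check can be sketched as a walk up the ancestor chain of the proposed parent, using a plain `parentOf` map in place of the `findAncestorIds()` recursive CTE (the map and method name are hypothetical). Re-parenting a tag under one of its own descendants, or under itself, is rejected.

```java
import java.util.*;

class CycleCheckSketch {
    // Returns true when making newParentId the parent of tagId would close a loop.
    static boolean wouldCreateCycle(UUID tagId, UUID newParentId, Map<UUID, UUID> parentOf) {
        Set<UUID> seen = new HashSet<>();
        for (UUID cur = newParentId; cur != null; cur = parentOf.get(cur)) {
            if (cur.equals(tagId)) return true; // tag would become its own ancestor
            if (!seen.add(cur)) return true;    // stored data already loops; treat it as a cycle too
        }
        return false;
    }
}
```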

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 15:26:23 +02:00
Marcel
f9ac963b9f feat(#221): add V39 migration for tag hierarchy and colors
Adds parent_id FK (ON DELETE SET NULL), self-reference check constraint,
parent_id index, and nullable color column to the tag table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 15:15:17 +02:00
Marcel
da5c92fe39 fix(#240): remove readyCount from weekly stats DTO and SQL query
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m26s
CI / Backend Unit Tests (push) Failing after 2m46s
CI / Unit & Component Tests (pull_request) Failing after 2m32s
CI / Backend Unit Tests (pull_request) Failing after 2m30s
The Lesefertig (ready-to-read) pulse was removed from the UI; drop the backend support
for it too — removes the subquery from findWeeklyStats(), the projection
getter, the DTO field, and updates all affected tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 13:19:53 +02:00
Marcel
23410aa4b8 fix(#240): rename V37→V38 (V37 was already applied); regenerate api.ts
The original needsExpert V37 migration was applied to the dev DB before
the feature was removed. Renaming our new indexes migration to V38 avoids
the Flyway checksum conflict. Regenerated api.ts now reflects the
@Schema(requiredMode=REQUIRED) annotations — DTO fields are non-optional.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 12:23:14 +02:00
Marcel
e041c75793 test(#240): add Testcontainers integration tests for native SQL queue queries
6 new tests covering findSegmentationQueue (excludes PLACEHOLDER, excludes
annotated docs), findTranscriptionQueue (below-90%-reviewed docs, zero-block
case), findReadyToReadQueue (>=90% reviewed), and findWeeklyStats (zeros on
empty DB). Runs against real PostgreSQL 16 via Testcontainers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 12:15:21 +02:00
Marcel
adea7d498f fix(#240): add @Schema(requiredMode=REQUIRED) to both queue DTOs; add V37 indexes
All non-null DTO fields are now marked required so the generated api.ts
emits required (non-optional) types for callers. V37 migration adds
created_at/updated_at indexes on document_annotations and transcription_blocks
to avoid full table scans in the weekly stats correlated subqueries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 12:09:09 +02:00
Marcel
4cf01a0f1d test(#240): add TranscriptionQueueControllerTest
Verifies 401/403/200 responses for all four endpoints. Matches
the @WebMvcTest + @RequirePermission pattern used across the project.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 12:07:14 +02:00
Marcel
2e4d9a8375 refactor(#240): replace Object[] positional mapping with Spring Data projections
Introduces TranscriptionQueueProjection and TranscriptionWeeklyStatsProjection
interfaces so column reordering in native SQL can never silently produce wrong
data. Removes the four type-coercion helpers (toUUID, toLocalDate, toInt, toLong)
from TranscriptionQueueService. Covered by TranscriptionQueueServiceTest (6 tests).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 12:05:21 +02:00
Marcel
ff1606f63d fix(#240): update test fixtures broken by rebase changes
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m29s
CI / Backend Unit Tests (push) Failing after 2m38s
CI / Unit & Component Tests (pull_request) Failing after 2m31s
CI / Backend Unit Tests (pull_request) Failing after 2m42s
Two backend tests passed a 6-element enrichment row but the rebase
added summary_snippet as column 7 — added null at index 6 to both
fixtures.

Two frontend page.server tests mocked only 4 dashboard API calls but
the page now makes 8 (3 Mission Control queues + weekly-stats added
on this branch) — added the 4 missing mock responses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 11:50:49 +02:00
Marcel
8980d810d4 fix(#240): use annotationCount as denominator in queue thresholds
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m24s
CI / Backend Unit Tests (pull_request) Failing after 2m51s
CI / Unit & Component Tests (push) Failing after 2m24s
CI / Backend Unit Tests (push) Failing after 2m37s
The ready-to-read and transcription queue queries were dividing
reviewed blocks by textedBlockCount instead of annotationCount.
A document with 4/15 annotations typed — all 4 reviewed — scored
4/4 = 100 % and incorrectly appeared in the Lesefertig (ready-to-read) column.

Both queries now compute the ratio as:
  reviewed / annotationCount

so a document must have ≥ 90 % of all its drawn regions reviewed
before it graduates to Lesefertig.
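
The corrected threshold, sketched in Java with integer arithmetic so that exactly 90 % is not lost to floating-point rounding. Treating a document with zero annotations as not ready is an assumption; the SQL may handle that case differently.

```java
class QueueThresholdSketch {
    // reviewed / annotationCount >= 0.9, rewritten as reviewed * 10 >= annotationCount * 9.
    static boolean isReadyToRead(int reviewed, int annotationCount) {
        if (annotationCount == 0) return false; // assumption: nothing drawn means not ready
        return (long) reviewed * 10 >= (long) annotationCount * 9;
    }
}
```

With the old denominator the 4-of-15 example scored 4/4; with `annotationCount` it scores 4/15 and stays out of the ready-to-read queue.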

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 11:00:18 +02:00
Marcel
ca0cf4903c refactor(#240): remove needsExpert feature completely
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m23s
CI / Backend Unit Tests (pull_request) Failing after 2m43s
CI / Backend Unit Tests (push) Has been cancelled
CI / Unit & Component Tests (push) Has started running
Drops the needsExpert / needs_expert flag end-to-end: DB migration
(V37, never applied), Document entity field, PATCH endpoint, service
method, DTO field, all three queue queries, ExpertBadge component,
i18n key, generated API types, and test fixture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:52:14 +02:00
Marcel
9404ec34ce fix(#240): add missing V36 index migration and rename needs_expert to V37
V36 (add_index_transcription_blocks_document_id) was applied to the dev
database during a previous local session but never committed to git.
Flyway checksum mismatch prevented the backend from starting.

- V36__add_index_transcription_blocks_document_id.sql: restored from the
  index that already exists in the database (idx_transcription_blocks_document_id)
- V36__add_needs_expert_to_documents.sql → V37__add_needs_expert_to_documents.sql

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:42:18 +02:00
Marcel
2ea603a3bf feat(#240): backend for Mission Control Strip — queue endpoints + expert flag
Adds the server-side foundation for the dashboard transcription widget:

- V36 migration: needs_expert BOOLEAN NOT NULL DEFAULT FALSE on documents
- Document entity: needsExpert field (@Schema required)
- DocumentRepository: 4 native queries — segmentation queue, transcription
  queue, ready-to-read queue (seeded weekly shuffle sort), weekly pulse stats
- TranscriptionQueueService: maps Object[] rows to typed DTOs, handles
  PostgreSQL type variations (UUID/String, Date/LocalDate, Number/BigDecimal)
- TranscriptionQueueController: GET /api/transcription/{segmentation-queue,
  transcription-queue, ready-to-read, weekly-stats} — all guarded by READ_ALL
- DocumentService + DocumentController: PATCH /api/documents/{id}/needs-expert
  toggles the expert flag (WRITE_ALL required)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:41:55 +02:00
Marcel
d7b2357834 feat(search): surface summary snippet when summary matched the query
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m33s
CI / Backend Unit Tests (push) Failing after 2m44s
Add a summary_snippet column to findEnrichmentData using ts_headline on
documents.summary, only when the summary's tsvector matches the query.
Expose it via SearchMatchData.summarySnippet / summaryOffsets and render
a "Zusammenfassung" / "Summary" / "Resumen" labelled row in the document
list — identical treatment to the transcription snippet row.

Fixes the case where a document appeared in search results with no
visible match explanation (e.g. searching "frucht" found a document
whose summary mentioned "Früchte").

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
091f7e5d25 feat(search): partial-word matching via to_tsquery prefix queries
Replace websearch_to_tsquery with a CROSS JOIN LATERAL subquery that
appends :* to each lexeme so prefix matches work (e.g. "furchtb" finds
"furchtbar"). websearch_to_tsquery still handles the safe tokenisation
of user input (stop words, special chars, operators); regexp_replace
then adds :* before to_tsquery re-parses the result.
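
The string transformation performed between the two parsing steps can be mirrored in Java for illustration. The pattern below is an assumption, not the project's actual `regexp_replace` expression; it appends `:*` to every quoted lexeme in the text form of a tsquery, and it ignores lexemes containing doubled (escaped) quotes.

```java
import java.util.regex.Pattern;

class PrefixQuerySketch {
    // One quoted lexeme as it appears in tsquery text, e.g. 'furchtb'.
    private static final Pattern LEXEME = Pattern.compile("'([^']+)'");

    // Turns e.g. 'furchtb' & 'gross' into 'furchtb':* & 'gross':*, ready to be re-parsed by to_tsquery.
    static String toPrefixQuery(String tsqueryText) {
        return LEXEME.matcher(tsqueryText).replaceAll("'$1':*");
    }
}
```

Operators such as `&` and `!` are left untouched, so the structure produced by `websearch_to_tsquery` survives the rewrite.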

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
32f151ff31 feat(search): add snippetOffsets to SearchMatchData and use ts_headline for highlighted snippets
- SearchMatchData gains a 6th field snippetOffsets: List<MatchOffset> so the frontend
  can render highlighted terms inside the transcription snippet without {#html}.
- DocumentRepository.findEnrichmentData now calls ts_headline() with chr(1)/chr(2)
  sentinels instead of returning raw block text; parseHighlight() strips the sentinels
  and produces clean text + MatchOffset list in one pass.
- DocumentService exposes ParsedHighlight and parseHighlight() as public so they can be
  called from cross-package integration tests.
- All related tests updated to the new 6-argument SearchMatchData constructor and
  to call parseHighlight() for asserting the snippet clean text and offsets.
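
A one-pass parser of the kind described, sketched under assumptions: `MatchOffset` is modelled here as a hypothetical `(start, end)` pair with an exclusive end, and offsets refer to positions in the cleaned text after the sentinels are removed.

```java
import java.util.ArrayList;
import java.util.List;

class HighlightSketch {
    // Hypothetical shapes; the project's MatchOffset / ParsedHighlight fields may differ.
    record MatchOffset(int start, int end) {}
    record ParsedHighlight(String text, List<MatchOffset> offsets) {}

    static final char START = '\u0001'; // chr(1), used as ts_headline StartSel
    static final char STOP = '\u0002';  // chr(2), used as ts_headline StopSel

    static ParsedHighlight parseHighlight(String raw) {
        StringBuilder clean = new StringBuilder(raw.length());
        List<MatchOffset> offsets = new ArrayList<>();
        int openAt = -1;
        for (int i = 0; i < raw.length(); i++) {
            char ch = raw.charAt(i);
            if (ch == START) {
                openAt = clean.length(); // position in the CLEAN text, not the raw one
            } else if (ch == STOP) {
                if (openAt >= 0) offsets.add(new MatchOffset(openAt, clean.length()));
                openAt = -1;
            } else {
                clean.append(ch);
            }
        }
        return new ParsedHighlight(clean.toString(), offsets);
    }
}
```

Because the frontend receives plain text plus offsets, it can wrap the matched ranges in markup itself instead of injecting server-produced HTML.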

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
162397d4eb fix(search): make ParsedHighlight and parseHighlight public for cross-package test access
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
bcb2898e5f perf(search): add index on transcription_blocks.document_id for lateral join
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
2817410f94 test(search): assert matchData key and snippet in controller search response
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
63d1a2e1ff fix(search): mark documents and total as required in OpenAPI schema
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
f723a83011 feat(search): enrich searchDocuments with per-document match data
DocumentService.searchDocuments now returns DocumentSearchResult with matchData
populated from findEnrichmentData. Title highlights are parsed from chr(1)/chr(2)
delimiters into MatchOffset lists; transcription snippet and sender/receiver/tag
match flags are extracted from the same native SQL row.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
c235151075 test(search): add DocumentSearchEnrichmentTest for findEnrichmentData native query
Tests lateral join best-block selection, chr(1)/chr(2) headline delimiters,
sender/receiver/tag match flags, and null cases for missing relations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
741eebc276 feat(search): add DocumentSearchResult.withMatchData() factory with match overlay map
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
8a5ca6868f feat(search): add SearchMatchData record for per-document match signals
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
a15b5ebf17 feat(search): add MatchOffset record for character-level highlight positions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:10:10 +02:00
Marcel
305f95a572 test(search): add sender name FTS coverage and combined filter test
Some checks failed
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1m57s
CI / Backend Unit Tests (pull_request) Failing after 3m0s
- should_find_document_by_sender_name — symmetric with existing receiver test
- fts_combined_with_status_filter_excludes_non_matching_status — verifies
  hasIds(rankedIds).and(hasStatus(...)) two-phase search works together

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:30 +02:00
Marcel
43595aeb8a refactor(search): replace O(n²) indexOf with HashMap for rank ordering
ids.indexOf() scans the full list for each document, giving O(n²) total.
Build a Map<UUID, Integer> once at O(n) and use getOrDefault at O(1) per
document. Behavior is identical; existing tests remain green.
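
A sketch of the map-based re-sort, using a hypothetical `Doc` record in place of the `Document` entity. The rank lookup is O(1) per comparison, so the whole sort is O(n log n); sending unranked ids to the end via `Integer.MAX_VALUE` is an assumption.

```java
import java.util.*;

class RankOrderSketch {
    // Hypothetical stand-in for the Document entity.
    record Doc(UUID id, String title) {}

    static List<Doc> sortByRank(List<Doc> docs, List<UUID> rankedIds) {
        // Built once in O(n): id -> position in the FTS ranking.
        Map<UUID, Integer> rankById = new HashMap<>();
        for (int i = 0; i < rankedIds.size(); i++) rankById.put(rankedIds.get(i), i);
        List<Doc> sorted = new ArrayList<>(docs);
        // O(1) lookup per comparison instead of an O(n) indexOf scan.
        sorted.sort(Comparator.comparingInt((Doc d) -> rankById.getOrDefault(d.id(), Integer.MAX_VALUE)));
        return sorted;
    }
}
```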

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:30 +02:00
Marcel
947d8aeb6c fix(search): respect DATE sort when text is present — do not override with relevance
When a user explicitly selects DATE sort with a text query active, the
previous code treated it identically to RELEVANCE, silently discarding
the user's sort choice. Remove DATE from the useRankOrder condition so
that explicit DATE sort always goes through the standard JPA sort path.
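
The resulting decision can be sketched as a small predicate. Only `RELEVANCE` and `DATE` are named in these commits, so the enum below is trimmed to those two values; treating a blank query as "no text" is an assumption.

```java
class SortDecisionSketch {
    // Trimmed stand-in for DocumentSort; other values are omitted.
    enum DocumentSort { RELEVANCE, DATE }

    // Rank order applies only with text present and no explicit non-relevance sort.
    static boolean useRankOrder(String text, DocumentSort sort) {
        boolean hasText = text != null && !text.isBlank();
        return hasText && (sort == null || sort == DocumentSort.RELEVANCE);
    }
}
```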

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:30 +02:00
Marcel
7ec3e6170d feat(fts): backfill search_vector for all existing documents (V35)
Fires the BEFORE UPDATE trigger for every documents row, which recomputes
the tsvector from all currently-linked metadata, blocks, receivers, and tags.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:30 +02:00
Marcel
7d456d8e8b feat(fts): replace ILIKE hasText with FTS two-phase search and RELEVANCE sort
- DocumentSort: add RELEVANCE enum value
- DocumentSpecifications: remove hasText() ILIKE, add hasIds(List<UUID>)
  for FTS-pre-filtered ID sets
- DocumentService.searchDocuments(): FTS two-phase path — findRankedIdsByFts()
  returns ranked UUIDs, hasIds() narrows subsequent Specification query,
  in-memory re-sort preserves rank order; RELEVANCE is the default when
  text is present and no explicit non-relevance sort is requested
- DocumentSpecificationsTest: remove hasText() tests (Specification removed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:30 +02:00
Marcel
24530cf85b feat(fts): add search_vector column, GIN index, DB triggers, and FTS repository method (V34)
- V34 migration: adds search_vector tsvector column with GIN index
- BEFORE INSERT/UPDATE trigger on documents rebuilds vector from title (A),
  summary + transcription_blocks.text (B), sender/receiver names (C),
  tag names + location (D) using german FTS config
- AFTER triggers on transcription_blocks, document_receivers, document_tags
  touch the parent document row to re-fire the BEFORE UPDATE trigger
- DocumentRepository.findRankedIdsByFts() native query using websearch_to_tsquery
- DocumentFtsTest: 12 integration tests covering stemming, trigger sync,
  ranking, stop words, malformed input, receiver and tag search

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:16 +02:00
Marcel
81da127381 refactor(ocr): rename findTop5 to findTop10 to leave headroom; the frontend shows 3 by default
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
f206c0b9e9 test(ocr): add unit tests for triggerSegTraining() — conflict, threshold, happy path, failure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
15e532eb96 refactor(ocr): extract assertNoRunningTraining() to eliminate duplicate guard
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
b83465020a fix(backend): store error rate for segmentation training runs
setCer() was called for recognition training but not for segmentation.
The OCR service now returns cer = 1 - accuracy for segtrain; persist it
so the admin panel can display Fehlerrate (error rate) for both training types.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00