feat(search): show search result snippets with match highlighting (#219) #242

Merged
marcel merged 20 commits from feat/issue-219-search-snippets into main 2026-04-16 09:10:11 +02:00

20 Commits

Author SHA1 Message Date
Marcel
bca7822ab7 feat(search): surface summary snippet when summary matched the query
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m31s
CI / Backend Unit Tests (push) Failing after 2m39s
CI / Unit & Component Tests (pull_request) Failing after 2m27s
CI / Backend Unit Tests (pull_request) Failing after 4m45s
Add a summary_snippet column to findEnrichmentData using ts_headline on
documents.summary, only when the summary's tsvector matches the query.
Expose it via SearchMatchData.summarySnippet / summaryOffsets and render
a "Zusammenfassung" / "Summary" / "Resumen" labelled row in the document
list — identical treatment to the transcription snippet row.

Fixes the case where a document appeared in search results with no
visible match explanation (e.g. searching "frucht" found a document
whose summary mentioned "Früchte").

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 21:34:10 +02:00
Marcel
b87036dde0 feat(search): restyle highlights to navy underline and add snippet labels
Switch search match highlights from bordered mint chips to a plain navy
underline (decoration-brand-navy). Add visible "Inhalt" / "Content" /
"Contenido" label before the transcription snippet, matching the style
of the Von/An sender-receiver labels.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 21:33:23 +02:00
Marcel
ca7dbfeb99 feat(search): partial-word matching via to_tsquery prefix queries
Replace websearch_to_tsquery with a CROSS JOIN LATERAL subquery that
appends :* to each lexeme so prefix matches work (e.g. "furchtb" finds
"furchtbar"). websearch_to_tsquery still handles the safe tokenisation
of user input (stop words, special chars, operators); regexp_replace
then adds :* before to_tsquery re-parses the result.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 21:32:37 +02:00
Marcel
f2f55b05c5 feat(search): add snippetOffsets to SearchMatchData and use ts_headline for highlighted snippets
- SearchMatchData gains a 6th field snippetOffsets: List<MatchOffset> so the frontend
  can render highlighted terms inside the transcription snippet without {#html}.
- DocumentRepository.findEnrichmentData now calls ts_headline() with chr(1)/chr(2)
  sentinels instead of returning raw block text; parseHighlight() strips the sentinels
  and produces clean text + MatchOffset list in one pass.
- DocumentService exposes ParsedHighlight and parseHighlight() as public so they can be
  called from cross-package integration tests.
- All related tests updated to the new 6-argument SearchMatchData constructor and
  to call parseHighlight() for asserting the snippet clean text and offsets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 20:14:20 +02:00
Marcel
25f1402dd9 feat(search): highlight snippet terms and mark sender/receiver/tag matches in document list
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 20:03:04 +02:00
Marcel
c8cd236568 fix(search): make ParsedHighlight and parseHighlight public for cross-package test access
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 19:47:23 +02:00
Marcel
d14dd795a4 fix(pdf): merge setElements and render effects so canvas remount triggers re-render
The refactor made pdfDoc a plain variable so renderer.isLoaded was not
reactive. Svelte only tracked currentPage and scale — but when the canvas
reappeared after loading, neither changed, so the PDF stayed blank.

Fix: merge the two effects into one that reads canvasEl synchronously.
Svelte now tracks canvasEl as a dependency; when the canvas remounts
(loading spinner → false), the effect re-fires and renders the
already-loaded PDF document.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 19:26:54 +02:00
Marcel
211f531513 perf(search): add index on transcription_blocks.document_id for lateral join
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m29s
CI / Backend Unit Tests (push) Failing after 2m42s
CI / Unit & Component Tests (pull_request) Failing after 2m26s
CI / Backend Unit Tests (pull_request) Failing after 2m35s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 19:12:46 +02:00
Marcel
141511e973 style(search): improve mark hover contrast, remove no-op class, italicize snippet
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 19:11:54 +02:00
Marcel
89d2c47f8f test(search): add applyOffsets coverage for negative start offsets
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 19:08:30 +02:00
Marcel
2b8663415b test(search): assert matchData key and snippet in controller search response
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 19:06:47 +02:00
Marcel
a7c839aa94 fix(search): mark documents and total as required in OpenAPI schema
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 19:04:59 +02:00
Marcel
ddb87db6b7 feat(search): pass matchData from server load to DocumentList
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m46s
CI / Backend Unit Tests (push) Failing after 2m49s
CI / Unit & Component Tests (pull_request) Failing after 2m32s
CI / Backend Unit Tests (pull_request) Failing after 2m36s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:34:54 +02:00
Marcel
93c78433cf feat(search): render title highlights and transcription snippets in DocumentList
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:34:14 +02:00
Marcel
9673cefe44 feat(search): add applyOffsets utility and regenerate API types with MatchOffset/SearchMatchData
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:05:07 +02:00
Marcel
8c7ce147a0 feat(search): enrich searchDocuments with per-document match data
DocumentService.searchDocuments now returns DocumentSearchResult with matchData
populated from findEnrichmentData. Title highlights are parsed from chr(1)/chr(2)
delimiters into MatchOffset lists; transcription snippet and sender/receiver/tag
match flags are extracted from the same native SQL row.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 17:53:57 +02:00
Marcel
8526e6c0a1 test(search): add DocumentSearchEnrichmentTest for findEnrichmentData native query
Tests lateral join best-block selection, chr(1)/chr(2) headline delimiters,
sender/receiver/tag match flags, and null cases for missing relations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 17:40:47 +02:00
Marcel
003d68ed21 feat(search): add DocumentSearchResult.withMatchData() factory with match overlay map
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 15:34:00 +02:00
Marcel
8cbecd452b feat(search): add SearchMatchData record for per-document match signals
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 15:32:26 +02:00
Marcel
47da0fa216 feat(search): add MatchOffset record for character-level highlight positions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 15:30:53 +02:00