feat(search): show search result snippets with match highlighting (#219) #242
Summary
- Per-document match data: `titleOffsets` (char-level highlight positions from `ts_headline`), `transcriptionSnippet` (best matching block via lateral join), `senderMatched`, `matchedReceiverIds`, `matchedTagIds`
- `applyOffsets(text, offsets)` function converts flat offsets into typed `TextSegment[]` (sorted, merged, clamped) for safe rendering without `{@html}`
- `DocumentList` renders `<mark>` elements for highlighted title terms and shows a transcription snippet below each result card when a match exists

Implementation details

Backend pipeline
- `MatchOffset` record: `start: int, length: int` (Java `char` positions, compatible with JS indexing)
- `SearchMatchData` record: carries the snippet + title offsets + sender/receiver/tag match flags
- `DocumentRepository.findEnrichmentData()`: native SQL with a lateral join to select the highest-ranked transcription block; uses `ts_headline` with `chr(1)`/`chr(2)` delimiters (safe sentinel approach — no regex ambiguity)
- `DocumentService.enrichWithMatchData()`: parses delimited headlines into `List<MatchOffset>`, short-circuits when the query is blank or the doc list is empty
- `DocumentSearchResult` wraps the document list + match data map; the controller returns it directly

Frontend rendering
- `applyOffsets` merges overlapping/adjacent spans before splitting the string — prevents nested `<mark>` elements
- `{#each titleSegments}` with native `<mark>` elements (no `{@html}`)
- `data-testid="search-snippet"` for test targeting

Test plan
- Integration tests (`DocumentSearchEnrichmentTest`) — lateral join, stemming, sender/receiver/tag match flags, null-block handling
- `DocumentServiceTest` enrichment tests — title offsets parsing, empty matchData for non-text queries, snippet present
- `search.spec.ts` unit tests — `applyOffsets` edge cases: empty, start/middle/end, two terms, overlapping merge, adjacent merge, clamp, out-of-bounds, unsorted
- `DocumentList.svelte.spec.ts` tests — snippet shown/hidden, `<mark>` present/absent

Closes #219
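The pipeline above can be sketched as two pure functions. The names (`MatchOffset`, `TextSegment`, `applyOffsets`, `parseHighlight`) come from this PR; the bodies are an illustrative reconstruction under the stated behavior (sort, merge, clamp, sentinel stripping), not the committed implementation.

```typescript
interface MatchOffset { start: number; length: number; }
interface TextSegment { text: string; highlight: boolean; }

// Strip chr(1)/chr(2) sentinels from a ts_headline result, producing clean
// text plus char-level offsets in one pass.
function parseHighlight(headline: string): { text: string; offsets: MatchOffset[] } {
  let text = "";
  const offsets: MatchOffset[] = [];
  let openAt = -1;
  for (const ch of headline) {
    if (ch === "\u0001") openAt = text.length;   // highlight opens here
    else if (ch === "\u0002") {                  // highlight closes here
      if (openAt >= 0) offsets.push({ start: openAt, length: text.length - openAt });
      openAt = -1;
    } else text += ch;
  }
  return { text, offsets };
}

// Convert flat offsets into segments: clamp to the text bounds, sort, merge
// overlapping/adjacent spans (prevents nested <mark>s), then split the string.
function applyOffsets(text: string, offsets: MatchOffset[]): TextSegment[] {
  const spans = offsets
    .map(o => ({ start: Math.max(0, o.start), end: Math.min(text.length, o.start + o.length) }))
    .filter(s => s.end > s.start)
    .sort((a, b) => a.start - b.start);

  const merged: { start: number; end: number }[] = [];
  for (const s of spans) {
    const last = merged[merged.length - 1];
    if (last && s.start <= last.end) last.end = Math.max(last.end, s.end); // overlap/adjacent
    else merged.push({ ...s });
  }

  const segments: TextSegment[] = [];
  let pos = 0;
  for (const m of merged) {
    if (m.start > pos) segments.push({ text: text.slice(pos, m.start), highlight: false });
    segments.push({ text: text.slice(m.start, m.end), highlight: true });
    pos = m.end;
  }
  if (pos < text.length) segments.push({ text: text.slice(pos), highlight: false });
  return segments;
}
```

The component then renders each segment as plain text or a `<mark>`, which is why no `{@html}` is ever needed.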
🤖 Generated with Claude Code
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
Solid TDD all the way through. Tests written before production code, a clean pure function for `applyOffsets`, no `{@html}` — exactly the right approach.

Suggestions
1. `parseTitleOffsets` and `parseUUIDs` belong on the data types, not on `DocumentService`
`DocumentService` has grown two `private static` parsing helpers that have no dependency on service state. They're conceptually parsing helpers for `SearchMatchData`. Consider moving them there, or giving `MatchOffset` a `parseHeadline(String headline)` factory. Either would shrink `DocumentService` and give the logic a more natural home. Not a blocker — but `DocumentService` is already large.

2. `senderMatched` / `matchedReceiverIds` / `matchedTagIds` computed but not rendered
The backend computes three signals that the frontend does not yet use. That's fine if this is a planned follow-up, but it means the lateral join + two correlated subqueries are running for every text search with no visible user benefit beyond what `titleOffsets` and `transcriptionSnippet` alone would deliver.
If there's no immediate plan to render sender/receiver/tag highlights in this sprint, consider either (a) deferring these subqueries to a follow-up or (b) filing an issue so the intent is documented. Not a blocker, but unrendered data is dead weight.
3. Minor: `<mark>` template formatting is subtle but correct
The placement of `>` immediately before `{seg.text}` to avoid Svelte whitespace injection is correct and intentional — good catch. This trips up reviewers, but it's right.

4. `parseUUIDs` will throw `IllegalArgumentException` on a malformed UUID from the DB
Since the data is DB-sourced (UUIDs cast to text via `string_agg`), this should never fire. But it would produce a 500 instead of a graceful empty result if it ever does. Low risk — the DB is the source of truth — but a defensive filter would be safer. Or just leave it and trust the DB. Not a blocker.
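The real helper is Java, and its suggested defensive variant isn't shown in this thread. As an illustration of the pattern Felix describes (skip malformed entries instead of throwing), a TypeScript analog might look like this — the helper name and CSV input shape are taken from the review, the regex filter is the assumed defensive step:

```typescript
// Illustrative analog of a defensive parseUUIDs: drop malformed entries
// instead of throwing, so one bad row can't turn a search into a 500.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function parseUUIDs(csv: string | null): string[] {
  if (!csv) return [];
  return csv
    .split(",")
    .map((s) => s.trim())
    .filter((s) => UUID_RE.test(s)); // silently skip malformed values
}
```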
🏛️ Markus Keller — Application Architect
Verdict: ✅ Approved
Layer boundaries are clean. The controller is thin, the service owns enrichment logic, the repository owns SQL. The `DocumentSearchResult` DTO wraps the response correctly, and the `withMatchData()` factory is the right abstraction. Records for `MatchOffset` and `SearchMatchData` are idiomatic Java 21.

Suggestions
1. Missing `@Schema(requiredMode = REQUIRED)` on `documents` and `total` fields of `DocumentSearchResult`
`matchData` has the annotation, but `documents` and `total` do not. Per project convention (CLAUDE.md), all fields the backend always populates should be marked required to drive correct TypeScript type generation. Without it, the TypeScript types will model `documents` and `total` as optional, which misrepresents the contract. Low-risk now, but it will bite when the types are next regenerated.

2. `enrichWithMatchData` fires a second DB round-trip for every text search
The LATERAL JOIN + two correlated subqueries are correct SQL, but they add a second round-trip after the main search query. For the current volume (family archive = tens of thousands of documents max) this is fine. Worth noting: if this table ever grows large and `transcription_blocks.document_id` doesn't have an index, the lateral join degrades. The existing Flyway migrations should already have this index given the transcription block feature — but it's worth verifying with `EXPLAIN ANALYZE`.

3. `total` is `documents.size()` — the comment is correct, the debt is acknowledged
The comment inside `DocumentSearchResult.withMatchData()` correctly documents that `total` must come from a `COUNT` query when pagination is added. This is acceptable debt, well-signposted. One suggestion: create a GitHub issue for the pagination work so the comment can reference a ticket number rather than being prose-only.

4. `parseTitleOffsets` / `parseUUIDs` as private statics on `DocumentService`
(Echoing Felix.) These are pure functions with no service dependencies. They belong as static factories on the DTO records themselves or in a dedicated `SearchMatchDataParser` class. `DocumentService` is already doing a lot — parsing headline delimiters is not its domain. Not a blocker for this PR, but worth extracting before the service grows further.

🧪 Sara Holt — QA Engineer
Verdict: ✅ Approved
The test suite is well-structured and covers the right layers. Integration tests run against real Postgres via Testcontainers ✓.
`applyOffsets` has 10 unit tests including edge cases (clamping, unsorted, out-of-bounds) ✓. Component tests use `vitest-browser-svelte` against real DOM ✓. TDD discipline observed throughout.

Suggestions
1. No test for `matchData` field presence in the controller JSON response
`DocumentControllerTest` currently mocks `searchDocuments` to return `DocumentSearchResult.of(List.of())`, which is correct. But there is no assertion that the JSON response actually contains a `matchData` key. If the field were accidentally dropped from serialization (e.g. Jackson exclusion, wrong visibility), no test would catch it. Suggested addition: a controller test asserting the key is present in the serialized response.

2. No test for malformed UUID in `parseUUIDs`
The static helper `parseUUIDs(String csv)` calls `UUID::fromString`, which throws `IllegalArgumentException` on invalid input. The happy path and null/empty cases are covered indirectly, but there is no test asserting the behavior when the DB somehow returns a malformed UUID string. Since this is DB-sourced data, the risk is low — but a unit test with an invalid input string would make the behavior explicit (either "throws" or "gracefully skips").

3. The `transcriptionSnippet` contains unescaped user text — no XSS test needed (Svelte handles it), but worth a comment
The snippet is bound via `{snippet}` in Svelte (text content, not `innerHTML`), so XSS is structurally impossible. No test needed, but a brief comment near the `{snippet}` binding would help the next reviewer confirm it's intentional.

4. Missing test: `applyOffsets` with a negative start offset
Current edge cases cover: empty offsets, out-of-bounds (start > text length), clamping (end > text length), unsorted. A negative `start` value is not tested — `Math.max(0, start)` handles it correctly per the implementation, but the test case is absent. Minor coverage gap.

5. What's tested at each layer — summary (good news)
`search.spec.ts`, `DocumentSearchEnrichmentTest`, `DocumentServiceTest`, `DocumentList.svelte.spec.ts`, and `page.svelte.spec.ts` form a well-balanced pyramid. No E2E tests added — appropriate, since the existing E2E suite covers search as a critical journey and the new behavior is adequately covered by component tests.
🔒 Nora "NullX" Steiner — Security Engineer
Verdict: ✅ Approved
This PR has a strong security posture. The two biggest risks for a search-with-highlighting feature — XSS via rendered match data, and SQL injection via the FTS query parameter — are both handled correctly.
What I checked
FTS injection: clean
`webSearchToTsquery` is a PostgreSQL function that normalizes, tokenizes, and escapes input. It accepts free-form text and converts it to a tsquery — it cannot be injected through the query string. The named parameter `:query` means the value is passed as a bind variable, not interpolated into SQL. ✓

XSS via title highlighting: clean
`{seg.text}` in Svelte is a text-content binding — it becomes `textContent`, not `innerHTML`. An attacker-controlled document title containing `<script>alert(1)</script>` would be rendered as literal text, not executed. ✓

XSS via transcription snippet: clean
Same Svelte text-content binding. No `{@html}`. The `chr(1)`/`chr(2)` delimiters used in `ts_headline` are control characters (0x01/0x02) that can never appear in user-submitted document text, so there is no risk of delimiter injection. ✓

UUID parsing from DB: acceptable risk
`UUID::fromString` in `parseUUIDs` will throw `IllegalArgumentException` on malformed input. Since the source is a `string_agg` of UUID primary key columns, malformed UUIDs cannot be inserted (PostgreSQL enforces the UUID column type at storage). The exception path would surface as a 500 error, not a security event. ✓

Suggestions (non-blocking)
1. The `webSearchToTsquery` fallback on invalid queries
PostgreSQL's `websearch_to_tsquery` returns `NULL` when the query string produces an empty tsquery (e.g. all stopwords). The native query uses `WHERE d.id IN :ids`, and the tsquery is only used inside `ts_headline` / `to_tsvector @@ ...` subexpressions — so a NULL tsquery would silently produce NULL headlines and no subquery matches. This is correct behavior (graceful degradation), but there is no test covering the all-stopwords case. Not a security issue, but a correctness edge case.

2. Content Security Policy: `<mark>` injection scenario (informational)
If a CSP is in place (not seen in this PR, but worth noting for the future), `<mark>` elements created via DOM manipulation could be flagged. Since these are server-side rendered by Svelte's SSR, this is not a concern here. Just confirming: SSR renders clean HTML, no client-side DOM injection. ✓

Overall assessment: The sentinel delimiter approach (`chr(1)`/`chr(2)`) for `ts_headline` is a clean solution — using control characters that cannot appear in user-submitted text is safer than trying to HTML-escape around arbitrary delimiters. The decision to avoid `{@html}` and use structured `TextSegment[]` is the correct defensive choice.

🎨 Leonie Voss — UI/UX & Accessibility
Verdict: ⚠️ Approved with concerns
The highlighting approach is semantically correct (`<mark>` is the right HTML element for this), the snippet placement is well-considered, and the screen reader annotation (sr-only "Fundstelle:") is a lovely touch. Two things need follow-up.
None.
Suggestions
1. `bg-accent/20` on the `<mark>` — verify contrast in context
`accent` maps to `#A6DAD8` (brand-mint). At 20% opacity on white, the resulting background is approximately `rgba(166, 218, 216, 0.2)` — a very light teal. The text inside the mark inherits the `text-ink` color (`#002850` on this light teal background), which passes contrast easily.
However, when the list item is hovered (`hover:bg-muted/50`), the mark's background blends with the hover background — the highlight may become nearly invisible. Verify this state visually. A slight opacity bump to `bg-accent/30` on hover would maintain visual distinctiveness.

2. The snippet has no visual distinction from the metadata row above it
The snippet (`text-sm text-ink-2 font-sans`) uses the same typographic treatment as the date/location metadata row. A user scanning the card may not recognize it as a "found in transcription" result vs. document metadata.
Consider one of:
- an `accent` left border: `border-l-2 border-accent pl-2`
- `italic text-ink-2`
- `<span class="sr-only">Transkription:</span>` (already has that) + a visible label for sighted users

At minimum, the `sr-only` span "Fundstelle: " should probably read "In Transkription:" to be more precise about what matched.

3. `line-clamp-2` is correct, but test at 320px
On a narrow mobile viewport, `line-clamp-2` at `text-sm` (14px) renders approximately 2 × 20px = 40px of snippet. Combined with the title, date, sender/receiver rows, and tags, this is a dense card. Verify that the card doesn't become overwhelming at 320px — the snippet is the deepest item in the information hierarchy and should be visually de-emphasized accordingly.

4. `not-italic` on `<mark>` is unnecessary
The `<mark>` HTML element has no default italic styling. `not-italic` is a no-op here (equivalent to `font-style: normal`, which `<mark>` already inherits). Remove it to keep the class list clean.

What works well:
- native `<mark>` element — screen readers announce "highlight" around matched terms ✓
- `text-inherit` preserves the serif heading style inside the mark ✓
- `<span class="sr-only">Fundstelle: </span>` gives screen reader users context for the snippet ✓
- `line-clamp-2` prevents layout explosion on long transcriptions ✓
- `data-testid="search-snippet"` for test targeting ✓

🚀 Tobias Wendt — DevOps & Platform
Verdict: ✅ Approved
No infrastructure changes in this PR. No new services, no Docker Compose changes, no CI modifications. The process used (rebuild the Docker container to regenerate API types) is correct — the `api.ts` is committed, and the generation step is a dev-time operation, not a CI step.
Schema changes: none
No Flyway migrations added. This is purely a query + application code change. The feature works against the existing schema — the `transcription_blocks` and `documents` tables are unchanged. ✓

`generated/api.ts` committed: correct
The regenerated TypeScript types are committed to the repository rather than generated at build time. This is the correct approach for this stack — it avoids requiring the backend to be running during `npm run build` in CI. ✓
npm run buildin CI. ✓No new environment variables or configuration needed: correct ✓
One thing worth checking out-of-band
Index on `transcription_blocks.document_id`
The LATERAL JOIN in `findEnrichmentData` scans `transcription_blocks WHERE tb.document_id = d.id`. If this column is not indexed, every search with a text query will perform a full table scan of `transcription_blocks` for each document in the result set. If `document_id` doesn't have an index, add one in a new Flyway migration. This is a follow-up, not a blocker — at the current data volume (family archive scale) the query will be fast either way — but it's worth doing before the archive grows large.
Review feedback addressed ✅
All seven reviewer concerns from the multi-persona review have been implemented. Summary:
Backend (commits `a7c839a`, `2b86634`)
- `documents` and `total` fields not marked required in OpenAPI spec → added `@Schema(requiredMode = REQUIRED)` to both components in `DocumentSearchResult.java`; verified with reflection-based unit tests
- Missing assertion on the `matchData` key in the response body → added a `search_responseBodyContainsMatchDataKey` test to `DocumentControllerTest` — stubs a result with a document + match entry and asserts `$.matchData.<uuid>.transcriptionSnippet` in the JSON response
- Missing `applyOffsets` test for a negative start offset → added to `search.spec.ts`: negative start clamped to 0, and negative-end offset (completely ignored) — the implementation already handled both cases correctly

Frontend (commit `141511e`)
- `<mark>` highlight invisible on hover (list item darkens, mark stays light) → added `group-hover:bg-accent/30` to `<mark>` so it intensifies with the row hover
- Unnecessary `not-italic` class on `<mark>` → removed; added `italic` to the snippet `<p>` class

Database (commit `211f531`)
- Missing index on `transcription_blocks.document_id` for the LATERAL join → added `V36__add_index_transcription_blocks_document_id.sql`

Additional changes:
- `SearchMatchData` gains a 6th field `snippetOffsets: List<MatchOffset>` so the frontend can render highlighted terms inside the transcription snippet without `{@html}`.
- `DocumentRepository.findEnrichmentData` now calls `ts_headline()` with `chr(1)`/`chr(2)` sentinels instead of returning raw block text; `parseHighlight()` strips the sentinels and produces clean text + a `MatchOffset` list in one pass.
- `DocumentService` exposes `ParsedHighlight` and `parseHighlight()` as public so they can be called from cross-package integration tests.
- All related tests updated to the new 6-argument `SearchMatchData` constructor and to call `parseHighlight()` for asserting the snippet clean text and offsets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
Overall this is a well-thought-out feature — the sentinel approach is clever, `applyOffsets` is clean, and the test pyramid is genuinely good. But there's a real bug in the test mocks that needs to be fixed before merge.

Blockers
1. `ArrayIndexOutOfBoundsException` in `DocumentServiceTest` enrichment tests
`DocumentServiceTest.java` (the two new enrichment unit tests) mocks `findEnrichmentData` with 6-element `Object[]` arrays, but `enrichWithMatchData` reads 7 columns, including `row[6]` for `summaryHeadline`. These tests will throw `ArrayIndexOutOfBoundsException`. The fix is a one-liner — add a 7th `null` to each mock array. The PR checklist marks these as passing — please re-verify against the current code.
Suggestions
2. `ParsedHighlight` and `parseHighlight` exposed as `public` from a service
`ParsedHighlight` is a `public` nested record inside `DocumentService`, and `parseHighlight` is `public static`. They're only public because the integration test (`DocumentSearchEnrichmentTest`) calls them directly. This leaks implementation detail out of the service's public API.
The cleaner fix: move `ParsedHighlight` to the `dto/` package (alongside its sibling `MatchOffset`) and move `parseHighlight` to a package-private utility or to the `dto` package. The integration test can then import from `dto` without touching service internals.

3. `enrichWithMatchData` reads 7 positional `Object[]` columns — fragile
Positional access (`row[0]`…`row[6]`) is brittle: changing the column order in the SQL silently maps wrong values. A JPA projection interface would give named access and compile-time safety. This is a suggestion, not a blocker — but the `Object[]` pattern is already fragile with 7 columns.

4. Minor: `{#each titleSegments as seg, i (i)}` — index key
Keying by index `i` works correctly here since the segment list is immutable and derived, but Svelte's reconciler will do unnecessary DOM work on re-renders. Not a real problem at this scale, but `(seg.text + seg.highlight)` would be more precise.

🏛️ Markus Keller — Application Architect
Verdict: ⚠️ Approved with concerns
The overall feature architecture is sound: the backend produces typed offset structs, the frontend consumes them without `{@html}`. That pipeline is clean. A few structural issues are worth addressing.

Concerns
1. `ParsedHighlight` as a `public` record nested in `DocumentService`
A service class is the boundary between repository and controller. Exposing an internal parsing record as a `public` type from a service erodes that boundary and makes the service's public API surface wider than it needs to be. `ParsedHighlight` belongs in `dto/` alongside its sibling `MatchOffset`. `parseHighlight` can then be a static factory method on `ParsedHighlight` itself, or a package-private utility — not `public static` on a service. The current design forces `DocumentSearchEnrichmentTest` to import a service-internals type, which is an odd testing dependency.

2. `Object[]` with 7 positional columns is an architectural smell
`findEnrichmentData` returns `List<Object[]>` with 7 unnamed columns mapped by position. This pattern was acceptable with 2 columns in `findRankedIdsByFts`, but at 7 it becomes genuinely fragile. Changing the column order in the SQL — a refactor any developer might make — silently produces wrong data with no compiler warning. A Spring Data projection interface gives named access at zero cost: the interface is a structural type, and Spring Data generates the mapping. No boilerplate required.
3. Duplicated prefix-query construction in two native queries
The `regexp_replace(websearch_to_tsquery('german', :query)::text, '''([^'']+)''', '''\\1'':*', 'g')` pattern for prefix-matching appears in both `findRankedIdsByFts` and `findEnrichmentData`. If the prefix strategy changes, it must change in two places. Extracting it to a PostgreSQL function keeps this in one place; both queries can then call `prefix_tsquery('german', :query)`.

What's done well
The sentinel approach (`chr(1)`/`chr(2)`) for `ts_headline` delimiters is correct — it avoids regex ambiguity and is safe for all German BMP characters. The `V36` migration uses `CREATE INDEX IF NOT EXISTS` — idempotent and safe. The `applyOffsets` merge/clamp logic is the right place to handle edge cases rather than spreading defensive code across the component.

🧪 Sara Holt — QA Engineer
Verdict: 🚫 Changes requested
The test pyramid structure is genuinely good: `applyOffsets` has 10 well-named unit tests, `DocumentSearchEnrichmentTest` runs against real PostgreSQL with 13 cases, and the component tests cover the key rendering states. But there's a real defect in the mock setup that blocks merge.

Blockers
1. Mock arrays too short — AIOOBE in enrichment unit tests
Both new enrichment unit tests in `DocumentServiceTest` construct 6-element mock arrays, but `enrichWithMatchData` reads `row[6]` for `summaryHeadline`. With a 6-element array (indices 0–5), that's an `ArrayIndexOutOfBoundsException`. Both tests will fail. Fix: add a 7th `null` element to each mock. The PR checklist marks these as green — please re-run `./mvnw test -Dtest=DocumentServiceTest` against the current branch to confirm.

Suggestions
2. `summarySnippet` and `summaryOffsets` have no integration test coverage
`DocumentSearchEnrichmentTest` covers title headline, transcription snippet, sender, receiver, and tag matches thoroughly — but the summary field (column 6 in the SQL) has zero test cases. The summary path is real code that runs in production when a document has a summary matching the query.
Suggested additions to `DocumentSearchEnrichmentTest`:
- `summary_snippet_contains_delimiters_when_summary_matches_query`
- `summary_snippet_is_null_when_summary_does_not_match`
- `summary_snippet_is_null_when_summary_is_empty`

3. `MatchOffsetTest.should_hold_start_and_length` tests Java record boilerplate
This test verifies that a Java record's auto-generated `start()` and `length()` accessors return the constructor arguments. That's the JVM contract for records — it doesn't need a test. Remove it to keep the suite focused on behavior.

4. `@BeforeEach` cleanup could use `@Transactional` for rollback isolation
`DocumentSearchEnrichmentTest` uses explicit `deleteAll()` in `@BeforeEach`. That's correct but leaves cleanup dependent on execution order. Adding `@Transactional` on the test class rolls back after each test automatically and is the project standard for JPA integration tests (`@Transactional` on the class, not just the method).
The `applyOffsets` test suite in `search.spec.ts` is excellent: it covers empty offsets, start/middle/end positions, two non-overlapping terms, overlapping merge, adjacent merge, clamping, out-of-bounds, unsorted input, and negative start values. Every interesting edge case is named and tested independently. The `data-testid` attributes (`search-snippet`, `sender-match`, `receiver-match`, `tag-match`) make Playwright targeting clean for future E2E tests.
Verdict: ✅ Approved
No security vulnerabilities found. This is a well-implemented feature from a security standpoint.
What I checked
SQL injection — All native queries use named parameters (`:query`, `:ids`). The `websearch_to_tsquery` function sanitizes FTS input. The `regexp_replace` transform operates on the already-parameterized tsquery output, not on raw user input. ✅

XSS — The frontend rendering uses native Svelte text interpolation (`{seg.text}`) and `<mark>` elements throughout. There is no `{@html}` anywhere in this feature. The entire purpose of `applyOffsets` is to convert backend offset data into typed `TextSegment[]` objects that eliminate the need for HTML injection — this is the correct XSS defense for this pattern. ✅

Authorization — No new endpoints or permission boundaries. The `enrichWithMatchData` call is triggered only from the existing `GET /api/documents/search` endpoint, which sits behind the existing auth chain. ✅

Input sanitization in `parseUUIDs` — `UUID.fromString()` rejects malformed UUIDs with `IllegalArgumentException`. Since the input comes from `string_agg(r.id::text, ',')` in a trusted SQL subquery, injection is structurally impossible. ✅

`ParsedHighlight` exposure — Making it `public` is a code design issue, not a security concern.

One low-priority note
The sentinel characters used as `ts_headline` delimiters are `\u0001` (SOH) and `\u0002` (STX) — Unicode control characters. PostgreSQL's TEXT type accepts all Unicode, including these. If user-provided document titles or transcription text somehow contained literal `\u0001`/`\u0002` characters, `parseHighlight` would produce incorrect offsets (the control chars would be misinterpreted as highlight delimiters).
This is not a realistic attack vector (browsers and standard form inputs strip control characters), but if you have a bulk import path (the ODS importer) that reads raw file content, consider stripping C0 control characters (U+0001–U+001F) from title and text fields at ingestion time as a defensive measure.
CWE: CWE-116 (Improper Encoding or Escaping of Output) — low confidence, low impact.
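As an illustration of that ingestion-time defense, a sanitizer could look like the sketch below. The function name and exact character set are assumptions (it keeps tab, newline, and carriage return, which can legitimately appear in transcription text, while removing the rest of the C0 range so literal SOH/STX bytes can never collide with the `ts_headline` sentinels):

```typescript
// Hypothetical ingestion-time sanitizer for bulk-imported title/text fields.
// Strips C0 control characters except \t (U+0009), \n (U+000A), \r (U+000D).
function stripC0Controls(input: string): string {
  return input.replace(/[\u0001-\u0008\u000B\u000C\u000E-\u001F]/g, "");
}
```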
🎨 Leonie Voss — UI/UX & Accessibility
Verdict: ⚠️ Approved with concerns
Using native `<mark>` elements for search highlights is semantically correct, the brand-token usage on the underline decoration is on-point, and the `not-italic` override in snippet marks is a thoughtful detail. A few accessibility and readability issues need attention.

High Priority
1. Snippet and summary text at `text-sm` (14px) — below minimum for the senior audience
The snippet `<p>` and summary `<p>` elements both use `text-sm` (14px). For our dual audience (25–42 and 60+), 14px is below the 16px recommended minimum for body-adjacent text. Snippets are exactly the content seniors need to read to understand why a document matched — this isn't decoration. Consider `text-base` (16px). `line-clamp-2` still applies cleanly at 16px.

Medium Priority
2. `<mark>` elements have no screen reader label
The native `<mark>` element is announced by some screen readers as "highlight", but inconsistently across AT/browser combinations. For keyboard/screen reader users in our senior audience, the highlighted term blends in without context. A minimal improvement would be a visually hidden prefix before the title. At minimum, add a Paraglide key for this so it can be localized.
3. Verify `text-ink-3` contrast on snippet labels
The "Inhalt" / "Zusammenfassung" labels above snippets use `text-ink-3` at `text-xs` (12px). At 12px you need a 4.5:1 contrast ratio for WCAG AA. Verify that `text-ink-3` meets this against the white card background. If it's in the gray-400 range (~`#9CA3AF`), the contrast is ~2.9:1 — a WCAG fail. The section-title pattern in the codebase uses `text-gray-400`, which already sits at this risk threshold; adding more `text-xs` labels in the same light-gray family compounds it.

Low Priority
4. Duplicated `<mark>` class string
The class `"bg-transparent text-inherit underline decoration-brand-navy decoration-2 underline-offset-2"` appears 3–4 times across the template. Extract it to a Svelte `{@const}` at the top of the component or a CSS class in a `<style>` block so the styling lives in one place.

What's done well
`<mark>` is the semantically correct element for search highlights per HTML5. The `not-italic` override prevents the italic snippet style from bleeding into highlighted terms. The underline treatment (`decoration-brand-navy decoration-2`) is consistent with the existing highlight pattern established in earlier commits. Labels ("Inhalt", "Zusammenfassung") are properly i18n'd across all three languages.

⚙️ Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
No infrastructure changes in this PR. Everything looks clean from the platform side.
What I checked
- `docker-compose.yml` — no changes. ✅
- No new dependencies in `pom.xml` or `package.json`. ✅

V36 migration
Clean. `IF NOT EXISTS` makes it idempotent and safe to re-apply. The index on `transcription_blocks.document_id` is the right support for the lateral join in `findEnrichmentData` — without it, the lateral join would do a sequential scan per document in the result set.

One operational note
The new `findEnrichmentData` query runs for every search with a text query. It's a lateral join with a `ts_rank(...) ORDER BY ... LIMIT 1` inside — that's efficient when the `transcription_blocks.document_id` index is hit. At current archive volumes this is fine.
As the document count grows, keep an eye on search latency using `EXPLAIN ANALYZE` against a production-equivalent database. If `ts_rank` scans start to dominate at large block counts, a GIN index on `to_tsvector('german', tb.text)` on `transcription_blocks` would speed up the inner `WHERE ... @@ q.pq` filter. Not needed now, but worth having in a backlog item.