fix(fts): paginate FTS match-set in SQL instead of loading all matching IDs #488
Reference in New Issue
Block a user
Delete Branch "fix/issue-468-fts-sql-pagination"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #468
Summary
findFtsPageRaw— a CTE that pushesOFFSET/LIMITinto Postgres and returns the page's total match count viaCOUNT(*) OVER(), avoiding the N-IDs-into-memory patternfindAllMatchingIdsByFtsas beforeFtsHit/FtsPagevalue records as typed wrappers around the rawObject[]rowsTest plan
DocumentFtsPagedIntegrationTest— 5 Testcontainers repo-level tests (page size, window total, last-page remainder, no-match, stopword)DocumentServiceSortTest— verifies pure-text RELEVANCE routes tofindFtsPageRaw, filter-active path stays in-memoryDocumentServiceTest— enrichment tests updated for the new SQL path🤖 Generated with Claude Code
🏛️ Markus Keller — Senior Application Architect
Verdict: ⚠️ Approved with concerns
What's Done Well
docs(adr)preceding thefeat(fts)commit. That's the correct order. ✅FtsHitandFtsPageare package-private records in thedocumentpackage — proper boundary discipline, no leakage to other domains. ✅DocumentServicecallsDocumentRepository— layering rules respected. ✅:query,:offset,:limit) — injection-proof by construction. ✅Concerns
1.
resolvePersonNamebypasses the typed API client — suggestionIn
frontend/src/routes/documents/+page.server.ts, the new helper does:The project pattern is
createApiClient(fetch).GET('/api/persons/{id}', ...). The rawfetchcall bypasses type-checking and the OpenAPI contract. If the/api/persons/{id}response shape changes, this call silently breaks at runtime. Use the typed client:2.
pureTextRelevancecondition is 7 clauses — suggestionThis inline condition spans 3 lines and requires reading 7 distinct checks to understand the routing decision. Extract it to a named private method:
isPureTextRelevance(hasText, sort, from, to, sender, receiver, tags, tagQ, status). The name documents the intent; the method body documents the contract.Documentation Coverage Check
findFtsPageRawCTE + window functionDocumentation coverage: LGTM.
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
TDD Evidence
The commit log shows the correct order: ADR → records/rename → feat → test. The failing tests (
DocumentServiceSortTestbreaking after the service refactor) were caught and fixed — that's the red/green cycle working as intended. ✅What's Clean
toFtsPageis a well-extracted helper: named column constants (COL_ID,COL_RANK,COL_TOTAL) + defensive UUID cast + clear Javadoc. ✅FtsHitandFtsPageas Java records: correct choice for immutable value types. ✅relevanceSortedPageFromSqlis under 20 lines and does one thing. ✅$effect(() => { void resetKey; searchTerm = initialName; })with the comment explaining the WHY of thevoidpattern — exactly the right use of a comment. ✅UUID_RE.test(id)guard inresolvePersonNamebefore the fetch — validates at the boundary. ✅Suggestions
1. Extract
pureTextRelevanceinto a named method2. Silent
long → intcast inrelevanceSortedPageFromSqlAt page ~43M with size 50 this would silently wrap negative. Add a guard or document the constraint:
3.
ftsRows2variable naming inDocumentServiceTestDocumentServiceTestline ~1657 usesftsRows2as a local name to avoid shadowing the outerrowsvariable. This suggests a copy-paste refactor. ConsiderenrichmentFtsRowsor just restructuring to avoid the shadow:4.
resolvePersonNameswallows errors silentlyNo logging means debugging failed person lookups in production requires tracing the missing name in the UI. Add at minimum:
🔒 Nora "NullX" Steiner — Application Security Engineer
Verdict: ⚠️ Approved with concerns
What's Secure
1. Parameterized native query — no SQL injection
findFtsPageRawuses@Param("query"),@Param("offset"),@Param("limit")throughout. Spring Data JPA's native query parameterization is injection-proof here. The CTE structure doesn't open any new attack surface. ✅2. UUID validation before fetch — SSRF/path-traversal prevented
This guard in
resolvePersonNameprevents a craftedsenderId=../secretsfrom being forwarded to the backend. ✅3. Server-side fetch — auth context preserved
resolvePersonNameuses SvelteKit's server-sidefetch, which inherits the session cookie automatically. The person lookup runs with the caller's session — no auth bypass. ✅4. Defensive UUID cast in
toFtsPagePattern-matching instance check guards against JDBC driver variance.
UUID.fromString()will throwIllegalArgumentExceptionon a malformed value — not a security issue, but worth knowing that the exception propagates up. ✅Concern
long → intoffset overflow — security smell (CWE-190: Integer Overflow)pageable.getOffset()returnslong. Forpage=42950000, size=50, offset =2,147,500,000>Integer.MAX_VALUE, which wraps to a negative value. PostgreSQL would then seeOFFSET -someValueand throw an error — no data exposure, but a denial-of-service via crafted URL parameters.PoC:
GET /api/documents/search?page=42950000&size=50→ 500 error.Fix:
Or simply return an empty result set when the offset exceeds any plausible document count. The archive is unlikely to have 2 billion documents, so a reasonable cap (e.g.,
> 10_000_000L) would suffice.This is a suggestion given the archive's expected scale, but it's a clean and cheap fix.
🧪 Sara Holt — QA Engineer & Test Strategist
Verdict: ⚠️ Approved with concerns
What's Well Tested
DocumentFtsPagedIntegrationTest: 5 tests against real Postgres via Testcontainers, covering first page, window total, last-page remainder, no-match, and stopword-only query. All boundary conditions covered. ✅@DirtiesContext(AFTER_CLASS): correct choice — avoids rebuilding the Testcontainers context after every test while still cleaning up after the class. ✅DocumentServiceSortTestnow explicitly verifies the routing decision:findFtsPageRawis called for pure-text RELEVANCE,findAllMatchingIdsByFtsis called when filters are active. That's behavior-level coverage, not just line coverage. ✅page.server.spec.ts: 4 targeted tests forresolvePersonName— UUID validation guard, 404 fallback, happy path sender, happy path receiver. Thorough for a new boundary function. ✅Concerns
1. No service-level integration test for the
pureTextRelevancerouting pathDocumentFtsPagedIntegrationTestonly tests the repository (findFtsPageRaw). There's no test that callsDocumentService.searchDocuments(...)against a real Postgres container and verifies that the pure-text RELEVANCE path actually runs the SQL query end-to-end. The unit tests mock the repository, so they prove the routing logic but not that the SQL actually executes and returns the right result through the full stack. A service-level@SpringBootTesttest with Testcontainers would close this gap.2.
(int) pageable.getOffset()— missing edge case testThe
long → intcast inrelevanceSortedPageFromSqlsilently overflows for very large page numbers. A test at the unit level:This test would currently fail (500 from Postgres), revealing the overflow bug before it reaches production.
3.
ftsRows2variable name inDocumentServiceTestMinor readability issue at line ~1657. Two different local variables named
rowsandftsRows2in adjacent tests suggest the refactor was done by modifying stubs rather than restructuring the test. Not a defect, but makes the test harder to scan.4.
@BeforeEach deleteAll()performance noteThe integration test seeds 70 documents × 5 test runs = 350
saveAndFlushcalls per test class run. For CI this is acceptable, but if the seed count grows, consider using@Sqlscripts or a single@BeforeAllwith@Transactionaltest rollback instead.🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
No infrastructure changes in this PR. Checked the usual suspects:
docker-composechanges ✅DocumentFtsPagedIntegrationTestuses@DataJpaTest+ Testcontainers — this pattern works cleanly in CI without DinD or external Postgres; the Testcontainers library manages the container lifecycle ✅@DirtiesContext(AFTER_CLASS)keeps context rebuilds to one per test class instead of one per test — this is the right setting for Testcontainers-based tests in CI; reduces cold-start overhead ✅One observation: the new
resolvePersonNamefunction in+page.server.tsadds up to 2 extra backend calls per page load when sender/receiver filters are active. At the family archive's scale this is invisible, but if this ever needs to be profiled, Grafana's backend request rate dashboard is the right first look.LGTM from the platform side.
📋 Elicit — Requirements Engineer
Verdict: ⚠️ Approved with concerns
Issue #468 Coverage
The stated requirement — paginate the FTS match-set in SQL instead of loading all matching IDs into memory — is fully satisfied:
findFtsPageRawCTECOUNT(*) OVER ()window functionfindAllMatchingIdsByFtsretainedADR-008 captures the trade-offs clearly. The "acceptable while under ~10K documents" constraint is explicit and bounded. ✅
Scope Observation — PR bundles two issues
This PR's diff includes changes from Issue #482 (filter display sync) in addition to #468:
PersonTypeahead.svelte:resetKeypropDateInput.svelte: external value sync+page.server.ts:resolvePersonName+initialSenderName/initialReceiverNameSearchFilterBar.svelte:navKeypropThe PR description only says "Closes #468." If these changes were already merged via PR #487, they should not appear in this diff. If they were not, the PR description should also say "Closes #482" and the PR title should reflect both fixes.
Recommendation: Verify whether the #482 changes are already in
main. If yes, rebase this branch onto the currentmainto drop the duplicate commits. If no, add "Closes #482" to the PR description.NFR Coverage
The extra
resolvePersonNamecall per page load (when filters are active) is an implicit performance NFR change. No measurement has been added. For the current scale this is negligible, but noting it here so it can be tracked if latency ever becomes a concern.🎨 Leonie Voss — UX Designer & Accessibility Strategist
Verdict: ✅ Approved
The UI changes in this PR are plumbing, not visual design — they restore correct behaviour rather than introducing new UI elements.
What Was Checked
PersonTypeahead's user-visible shape is unchanged — theresetKeyprop is an internal mechanism, invisible to the user ✅DateInput's external sync fix ($effectwithuntrack) prevents stale date display after navigation — this is a direct improvement for users who navigate between filtered search states ✅resolvePersonNamerestores the display name in the sender/receiver typeahead when loading a bookmarked filtered URL — correct behaviour, previously broken ✅One Note
The
resetKeysmoke test correctly notes in a comment that thererender()path invitest-browser-svelteis a full re-mount, not an in-place prop update. This means the real navigation scenario (prop update on a live instance) is tested only at the browser/E2E level. If a Playwright E2E test covers the filter-bar navigation flow, this gap is acceptable; if not, consider adding one to catch regressions in the$effect(resetKey)path.LGTM on the UI/UX side.
Review concerns addressed (commit
abeece7e)@markus — Architecture / API client convention
resolvePersonNamenow acceptsapi: ReturnType<typeof createApiClient>instead of rawfetch. Usesapi.GET('/api/persons/{id}', { params: { path: { id } } })with properresult.response.okcheck and error logging. ✅@nora — Security / integer overflow
relevanceSortedPageFromSql: addedlong → intoverflow guard before the cast. Ifpageable.getOffset() > Integer.MAX_VALUE, returns an empty result page instead of silently wrapping. ✅@felix — Code clarity
isPureTextRelevance(...)private static method (8 params) to replace the 7-clause inline boolean. Call site readsif (isPureTextRelevance(...)) {. ✅Tests updated accordingly
DocumentServiceTest: replacedfindAllMatchingIdsByFts/findAll(Specification)stubs withfindFtsPageRaw/findAllByIdstubs in the two match-data tests.page.server.spec.ts: rewrote person-name resolution tests to mock via path-basedapi.GETdispatch (matching new call site), confirmed "not a UUID" guard fires before any person API call. All 1812 frontend tests green. ✅👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
Strong TDD evidence throughout.
DocumentFtsPagedIntegrationTesthas 5 repo-level tests covering the happy path, window-total accuracy, last-page remainder, no-match, and stopword-only edge case.DocumentServiceSortTestwas fully rewritten with explicit routing verification usingverify(documentRepository, never()).findAllMatchingIdsByFts(anyString())— this proves the new SQL path fires when expected and the old in-memory path fires when filters are active. That's exactly the red-before-green discipline I want to see.Clean code
isPureTextRelevance()extraction is right — the 8-param signature is long but all are domain concepts. ✅toFtsPage()uses named constants (COL_ID,COL_RANK,COL_TOTAL) with a Javadoc enumerating the column contract. Fragile (positionalObject[]), but well-documented. ✅relevanceSortedPageFromSqlis ~20 lines doing three things: guard + SQL call + JPA load + sort. Borderline — could split intoloadFtsPage(text, pageable)returning theFtsPageand a separate JPA hydration call. Not a blocker given it reads cleanly.void resetKeyin the$effectis standard Svelte 5 idiom for reactive subscriptions without consuming the value. Comment is accurate: writable$derivedis read-only, so$state+$effectis the right pattern here. ✅Suggestion
pageSlice()still does(int) pageable.getOffset()without an overflow guard — the same pattern that was fixed inrelevanceSortedPageFromSql. It's used by the SENDER/RECEIVER in-memory sort paths. Low practical risk (those paths need matching IDs in memory first, which caps the realistic page number), but it's inconsistent. Worth adding the same guard for defensive consistency.🏗️ Markus Keller — Application Architect
Verdict: ✅ Approved
ADR-008 is present, committed to
docs/adr/, and covers all four required sections: context (the pre-production audit finding F-31, row counts measured), decision (CTE withCOUNT(*) OVER()), alternatives considered (two-query, capped result set, full SQL rewrite), and consequences (what stays in-memory and why). This is exactly the format that prevents future developers from reversing a good decision out of ignorance.Architecture
The two-phase routing (
isPureTextRelevance→ SQL path; text+filters → JPA Specification) is clean and correctly documented in the ADR. The detection function is pure (no side effects, static) and the call site reads as a guard clause — good. Layer boundaries are respected:DocumentServicecalls its ownDocumentRepositoryexclusively.resolvePersonNamecorrectly crosses into the Person domain via the typed HTTP API client (api.GET('/api/persons/{id}', ...)) rather than injectingPersonRepositorydirectly — this is the correct boundary crossing for a monolith.FtsHitandFtsPageare package-private records in thedocumentpackage, used only within that domain. Correct scoping.One architectural observation (suggestion, not blocker)
findFtsPageRawreturnsList<Object[]>. Spring Data JPA supports typed projections:This would eliminate the positional column constants and the driver-variance cast in
toFtsPage. The Javadoc + constants are a reasonable mitigation for now, but worth refactoring if the query is extended (e.g., adding a snippet column).Doc checklist
🔐 Nora "NullX" Steiner — Security Engineer
Verdict: ✅ Approved
SQL injection analysis
findFtsPageRaw— all three bind parameters (query,offset,limit) are@Param-bound in the JPA native query. No string concatenation.The
regexp_replaceinside the SQL operates on the output ofwebsearch_to_tsquery('german', :query)— not on the raw user string.websearch_to_tsqueryis a Postgres function that parses and sanitizes the FTS query into atsquerytype before the regex sees it. The transformation ('\\1':*) adds prefix-match semantics to each lexeme. This is safe.findAllMatchingIdsByFts— same parameterized pattern as before. ✅resolvePersonName/^[0-9a-f]{8}-[0-9a-f]{4}-...$/i) prevents path traversal via malformed IDs before any HTTP call is made. ✅api.GET('/api/persons/{id}', { params: { path: { id } } })— the ID is injected into the URL template, not string-concatenated. ✅Overflow guard
relevanceSortedPageFromSql:long → intguard added (if (rawOffset > Integer.MAX_VALUE) return DocumentSearchResult.of(List.of())). CWE-190 fixed. ✅Remaining gap (suggestion, not blocker)
pageSlice()still does(int) pageable.getOffset()without an overflow guard. This is used by the SENDER/RECEIVER in-memory sort paths. In practice those paths requirefindAllMatchingIdsByFtsto return IDs first, which bounds the realistic match count and page number. But the cast is still technically CWE-190. A guard here (if (pageable.getOffset() > Integer.MAX_VALUE) return DocumentSearchResult.of(...)) would be consistent with the newrelevanceSortedPageFromSqlguard.Overall posture: no new injection vectors, UUID input validation added, overflow fixed on the new hot path. ✅
🧪 Sara Holt — QA Engineer
Verdict: ⚠️ Approved with concerns
What's well covered
DocumentFtsPagedIntegrationTest— 5 Testcontainers tests against real Postgres: first page, window total, last-page remainder, no-match, stopword-only. No H2. ✅DocumentServiceSortTest— routing verification withverify(documentRepository, never()).findAllMatchingIdsByFts(anyString())andverify(documentRepository).findFtsPageRaw(...). This is exactly the right approach: verifying which code path is taken, not just the output. ✅page.server.spec.ts— 4 tests for person name resolution: UUID lookup, receiver lookup, invalid-UUID short-circuit, 404 fallback. ThemockGetcall count assertion (toHaveBeenCalledTimes(1)for the invalid UUID case) verifies the guard fires correctly. ✅Missing coverage (blockers)
1. Overflow guard is untested.
relevanceSortedPageFromSqlreturns early whenpageable.getOffset() > Integer.MAX_VALUE. This guard was flagged as a security fix (CWE-190). There's no test proving it works. A unit test inDocumentServiceSortTest:Note:
findFtsPageRawshould NOT be called in this case — verify withnever().2.
toFtsPageUUID-as-String driver variance is untested.The code path
r[COL_ID] instanceof UUID u ? u : UUID.fromString(r[COL_ID].toString())handles JDBC drivers that return IDs as Strings. This is tested indirectly (Postgres JDBC returns UUID objects, so theinstanceofbranch fires in integration tests), but theStringfallback branch has zero coverage. Consider a direct unit test fortoFtsPagewith a String ID in the array.Suggestion (nice to have)
The comment in
PersonTypeahead.svelte.spec.tshonestly documents the rerender limitation — the$effect(resetKey)path during SvelteKit navigation (prop update on a live instance) can't be isolated at the component test level. The smoke test that exercises the rerender path is the right pragmatic choice. ✅🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist
Verdict: ✅ Approved
The frontend changes in this PR are behavioral correctness fixes, not visual changes. Checking from the user's perspective:
PersonTypeahead
resetKeypropWithout this fix, a user who typed a name into the sender/receiver filter without selecting a person (no API call, no
senderIdset) would see their typed text persist in the filter bar after navigating to a new search. This is a Nielsen Heuristic 3 violation (User control and freedom — users expect their filter context to reflect the actual search, not stale input). TheresetKeymechanism ensures the input field accurately reflects state. ✅The
$effect(() => { void resetKey; searchTerm = initialName; })pattern is correct Svelte 5. The comment explaining why$derivedcannot be used here ($derivedis read-only, we need local write + external sync) is accurate and should prevent future "optimization" that would break this. ✅DateInput external value re-derive
The guard
if (germanToIso(untrack(() => display)) !== externalIso)prevents the display from overwriting while the user is mid-typing a partial date — a critical UX protection. Withoutuntrack(), the guard itself would create an additional reactive dependency ondisplay, potentially causing a loop. This is done correctly. ✅Accessibility check
No new interactive elements introduced. No color or typography changes. No removal of focus rings. No ARIA label changes. Touch targets unchanged. ✅
The sender/receiver typeahead inputs already use
<label>elements via thelabelprop pattern inPersonTypeahead.svelte. The newresetKeyprop is invisible to assistive technology — it only affects internal state. ✅⚙️ Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
This PR doesn't touch CI workflows, Docker Compose, or infrastructure config. Checking what's relevant from a platform perspective:
Testcontainers strategy
DocumentFtsPagedIntegrationTestuses@DirtiesContext(classMode = AFTER_CLASS)— this keeps the Spring context alive for all 5 tests in the class and rebuilds it once at teardown, rather than resetting after each test. This is the correct approach: a rebuild-per-test would restart the Postgres container 5 times, adding ~30 seconds to CI. WithAFTER_CLASS, the container starts once and seeds/cleans in@BeforeEach. ✅The container image is
postgres:16-alpine— pinned and consistent with the project's production database version. ✅CI overhead estimate
+5 integration tests sharing one Testcontainers instance. The
@BeforeEachseeds 70 documents and clears between tests viadocumentRepository.deleteAll(). Estimated added CI time: ~5-10 seconds. Acceptable.Nothing else to flag
No new Docker services, no new environment variables, no new action versions, no new dependencies. The
scripts/clean-e2e-data.shfile shows as modified in the working tree but isn't part of this PR's diff. LGTM.📋 Elicit — Requirements Engineer
Verdict: ✅ Approved
Issue closure check
Issue #468 ("paginate FTS match-set in SQL") is closed cleanly by this PR. The pre-production audit finding F-31 (loading all matching IDs into memory, measured at 911 IDs/call for "walter") is directly addressed:
rows_per_callfor pure-text RELEVANCE queries drops from match-set size to ≤ page size (50).The PR correctly limits scope to the RELEVANCE/pure-text path. Filter-active queries (date range, sender, receiver, tags, status) continue on the existing in-memory path — this is documented in ADR-008 as an intentional tradeoff, not missing functionality.
Scope check
The included UI fixes (
resetKey,navKey,DateInputre-derive,resolvePersonName) are all related to the URL sync concern that issue #468 encompasses — filter state not correctly persisting or clearing on navigation. No scope creep.One observation (non-blocking)
The PR description's test plan only lists backend tests:
The frontend tests (
page.server.spec.ts— 4 new person name resolution tests;page.svelte.spec.ts— sender/receiver display test) are not mentioned. Consider updating the PR description's test plan checkbox list to include the frontend coverage. This is a documentation gap, not a functional concern.Sara's blockers addressed (commit
ed32de9f)Blocker 1 — Overflow guard untested
Added
searchDocuments_relevance_returns_empty_when_offset_exceeds_maxInttoDocumentServiceSortTest. UsesPageRequest.of(Integer.MAX_VALUE / 10 + 1, 10)which produces an offset > Integer.MAX_VALUE. Verifies: result is empty ANDfindFtsPageRawis never called.Blocker 2 —
toFtsPageUUID-as-String driver varianceAdded
searchDocuments_relevance_handles_string_uuid_from_jdbc_driver. MocksfindFtsPageRawto return a row with the ID column asString(notUUID) — simulating JDBC drivers that return UUID columns as Strings. Verifies the result contains the document with the correct UUID.All 7 tests in
DocumentServiceSortTestgreen. ✅👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
Blockers
"mergeable": false— needs rebase before merge.The PR cannot be merged as-is due to conflicts with the current
main. Some of the frontend changes in this PR (PersonTypeaheadresetKey, DateInput external value sync, SearchFilterBarnavKey,+page.server.tsperson name resolution) overlap with commits already landed onmain(0430383e,8a8205ad,a239c16c). Agit rebase mainonfix/issue-468-fts-sql-paginationis required to resolve conflicts and produce a clean diff showing only what this PR uniquely adds.Suggestions
@BeforeEach seed()is slow (DocumentFtsPagedIntegrationTest): 70 ×saveAndFlush()in a loop is 70 individual DB round-trips per test method. UsesaveAll(list)+ oneem.flush()instead. The tsvector trigger fires at flush time so the batch insert is safe here.console.errorinresolvePersonName(+page.server.ts): This only fires for genuine exceptions (network errors, not 404s — those return early from the!result.response.okbranch), which is correct. But the log message includes the full exception objectewhich may contain stack trace noise in normal server output. Considerconsole.error('[resolvePersonName] failed for', id)without the exception for cleaner prod logs, or wrap it as a structured log entry.void resetKeyin PersonTypeahead (PersonTypeahead.svelte): The pattern is correct — reading a reactive variable inside$effectcreates the dependency without using the value. The existing comment above the effect explains it well. No change needed.Positives
FtsHit/FtsPagerecords are appropriately narrow — they express the type contract ofObject[]without leaking repository internals into the service layer.isPureTextRelevanceas a named predicate is exactly right — it makes the routing condition testable in isolation and readable at the call site.toFtsPagecorrectly handles JDBC driver UUID-as-String variance with theinstanceof UUID u ?pattern match. Good defensive code.rows_per_callmetric framing makes the problem concrete and the decision verifiable post-deploy.🏗️ Markus Keller — Application Architect
Verdict: ✅ Approved
Observations
The two-phase approach (SQL page →
findAllById) is the correct architecture here. Phase 1 pushes OFFSET/LIMIT andCOUNT(*) OVER ()into Postgres — this is where ranking and counting belong. Phase 2 loads only the page's documents via a boundedINclause (≤ page size, typically ≤ 50 IDs). The result: the worst-caseINclause is always O(page size), not O(match set size). This directly addresses finding F-31 from the pre-production audit.isPureTextRelevanceis a textbook guard-clause pattern. A single named predicate routing to the fast path is far better than embedding the condition inline insidesearchDocuments. It is independently testable and readable at the call site.ADR-008 documents the split path correctly. The explicit statement that
findAllMatchingIdsByFtsis retained for (a) bulk-edit, (b) density chart, and (c) SENDER/RECEIVER in-memory sort paths is important — it prevents a future refactor from accidentally "cleaning up" the two-path design without understanding the consequences.FtsHit/FtsPageare package-private records in thedocumentpackage. This is the right scope — they are the repository-to-service contract, not a public API. They should not be promoted topublicunless a second consumer appears.Concern
relevanceSortedPageFromSql— document deleted between FTS query andfindAllById: If a document is deleted in the narrow window betweenfindFtsPageRawandfindAllById, the page will have fewer items than expected and thetotalfrom the window function will be one higher than the actual count. This is an inherent TOCTOU property of the two-phase approach, documented in Alternatives Considered. Acceptable for a family archive with low write frequency. Worth noting in the ADR's Consequences section if it isn't already.Minor
The comment
// In-memory sort on page slice (≤ page size rows) — acceptableon the RECEIVER path (DocumentService.java) says "page slice rows" but the RECEIVER path loads the entire filtered set (findAll(spec)) and THEN slices. The≤ page size rowsqualifier is wrong — it's ≤ match-set size rows. The in-memory sort is acceptable because the filtered set (with an active receiver filter) is typically small, not because slicing happens first. Consider:// In-memory sort on filtered set — acceptable while filtered match count stays under ~1 K rows.🔒 Nora "NullX" Steiner — Security Engineer
Verdict: ✅ Approved
What I Checked
findFtsPageRawSQL — injection surface: All three user-controlled values (query,offset,limit) are bound via named parameters (:query,:offset,:limit). Spring Data JPA's@Query(nativeQuery = true)with@Parambinding is parameterized — the driver prepares the statement and substitutes the values safely. No injection risk.regexp_replaceinside the SQL CTE: The pattern operates onwebsearch_to_tsquery('german', :query)::text, which is the output of Postgres's own FTS parser, not raw user input. The regex transforms'term'to'term':*for prefix matching. This is server-side string manipulation on already-sanitized, already-parsed tsquery text. Not a user-controlled pattern. No injection risk.UUID_REguard inresolvePersonName(+page.server.ts): The regex/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/ivalidates that the URL parameter is a proper UUID before making an API call. This prevents SSRF-adjacent abuse where a craftedsenderIdvalue could be used to probe internal paths. Good practice.console.errorlog inresolvePersonName: Logsid(a UUID from the URL, user-controlled but not sensitive) and the exception object. The UUID is already in the URL bar so logging it is not a data exposure concern. The exception object may contain stack trace information but not application secrets.No new input surfaces, no XSS vectors, no auth bypass: The frontend changes (PersonTypeahead, DateInput, SearchFilterBar) do not introduce new user-controlled data flows that could reach unescaped output paths. The
resolvePersonNameperson lookup uses the same authenticated API client as the rest of the page load.One Note
The security comment on
runner-config.yaml(from PR #476) correctly documents the Docker socket trust model. Unrelated to this PR, but worth confirming that thefix/issue-468-fts-sql-paginationbranch does not inadvertently include therunner-config.yamlchanges (they should be inmainalready). Verify the rebase doesn't duplicate that file.🧪 Sara Holt — QA Engineer
Verdict: ⚠️ Approved with concerns
Strengths
DocumentFtsPagedIntegrationTestcovers the five key repository-level scenarios — page size, window total, last-page remainder, no-match, and stopword-only. Using real Postgres via Testcontainers means the GIN index and tsvector trigger semantics are production-identical. These tests give high confidence in the SQL query correctness.DocumentServiceSortTestnew tests are thorough:searchDocuments_relevance_pureText_calls_findFtsPageRaw_not_findAllMatchingIds— routing guard verifiedsearchDocuments_relevance_returns_empty_when_offset_exceeds_maxInt— overflow guard verified (and thenever()verify correctly confirms the guard fires before the DB call)searchDocuments_relevance_handles_string_uuid_from_jdbc_driver— JDBC driver variance testedsearchDocuments_relevance_with_active_filter_uses_inMemory_path— fallback path verifiedpage.server.spec.tsperson name resolution tests are well-structured with a cleanmakeSearchMockfactory.Concerns
@BeforeEach seed()performance (DocumentFtsPagedIntegrationTest): 70 ×saveAndFlush()is 70 individual database round-trips before every test method. With 5 test methods, that's 350 inserts across the test run. ConsiderdocumentRepository.saveAll(docs)+em.flush()+em.clear()with all documents pre-built, which reduces this to 1 batch insert per test method. This is especially important since@DirtiesContext(AFTER_CLASS)means the container is shared — slow setup accumulates.Missing coverage:
ftsPage.total()propagation: The integration tests verify that the SQL window function returns the correct total count. The service tests verify routing. But there's no test that checks theDocumentSearchResult.totalItems(or equivalent) field after arelevanceSortedPageFromSqlcall. AsearchDocuments_relevance_pureText_totalItems_reflectsSqlWindowCounttest inDocumentServiceSortTestwould close this gap — it's the field the frontend uses to render pagination.PersonTypeahead
resetKeysmoke-test limitation: The comment in the spec is correct and appropriate — thererender()path tests re-mount, not in-place prop update. No action required, the note protects future contributors from confusion.Verdict Rationale
The core FTS tests are solid. The two gaps (seed performance + total propagation) are not blocking but the total propagation gap is worth closing before this lands, since it tests the primary observable behaviour of the fix from a user perspective.
🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
What I Checked
CI workflow: No changes to
.gitea/workflows/ci.yml. The existingunit-tests,backend-unit-tests, and related jobs will exerciseDocumentFtsPagedIntegrationTestvia./mvnw clean test. The Testcontainers runner infrastructure (Docker socket env vars,TESTCONTAINERS_RYUK_DISABLED=true) is already wired from PR #476. No new CI configuration required.Testcontainers test class:
DocumentFtsPagedIntegrationTestuses@Import({PostgresContainerConfig.class, FlywayConfig.class})— consistent with the existing pattern inDocumentFtsTest,DocumentServiceIntegrationTest, and others. No new test infrastructure to provision.@DirtiesContext(classMode = AFTER_CLASS): Correct choice. Rebuilds the Spring context once after all 5 tests in the class complete rather than after each test. Reduces context rebuild overhead significantly for a Testcontainers test class. Consistent with the pattern used in other integration tests in this repo.No new Docker images, no new services, no new volumes: This PR is purely application-layer. Nothing changes in
docker-compose.ymlor the platform stack.Branch isolation: The
fix/issue-468-fts-sql-paginationbranch diverged before some recent main commits landed. The"mergeable": falsestatus means CI won't run on a merge commit until the rebase is done. Once rebased, the full CI suite should pass.Operational Note
resolvePersonNamein+page.server.tsnow makes parallel/api/persons/{id}calls on every document page load that has a sender/receiver filter active. At the current archive scale (single family, ≤ 10 concurrent users), this is a non-event. Worth noting: if the person API is temporarily slow, these calls could delay the SSR response for the documents page. Thecatchhandler returns''gracefully, so a person API timeout won't cause a 500. Acceptable.🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist
Verdict: ✅ Approved
What I Checked
Person name pre-fill on filtered navigation (
+page.server.ts++page.svelte): This is a genuine UX improvement. Previously, if a user navigated to/documents?senderId=<uuid>(e.g. from a bookmark or shared link), the sender typeahead would show the UUID in its input (or empty) rather than the person's name. NowresolvePersonNamefetches the display name server-side and passes it asinitialSenderName/initialReceiverName. The PersonTypeahead renders with the human-readable name from the first paint — no flash of raw UUIDs. Good for the 60+ audience who need clear visual feedback on what filter is active.resetKey/navKeyfor PersonTypeahead (SearchFilterBar.svelte): The pattern correctly solves the case where a user types a partial name (without selecting a person), navigates away, and comes back — the typeahead now resets to reflect the current URL state rather than the stale typed text. Correct behavior for the search filter context.DateInputexternal value sync (DateInput.svelte): The$effectre-derives the display string when thevalueprop changes externally. Theuntrack(() => display)guard prevents the effect from re-running when the user is mid-typing a partial date. This is the correct implementation pattern — it won't overwrite"01.01"with""while the user is typing"01.01.1920".No Accessibility Regressions
The changes don't modify any ARIA attributes, focus management, or keyboard interaction patterns. The
PersonTypeaheadinput remains arole="combobox"with proper labelling.DateInputremains atype="text"with the existing validation feedback.Minor
The
void resetKeyreactive read pattern inPersonTypeahead.svelteis invisible to users but its absence would cause subtle UX bugs. The comment above the effect (// Sync display text when initialName changes OR when resetKey increments) is sufficient documentation for developers maintaining this component.📋 Elicit — Requirements Engineer
Verdict: ⚠️ Approved with concerns
Scope Observation
This PR bundles two distinct concerns:
FTS SQL pagination (the stated issue #468 fix):
findFtsPageRaw,FtsHit,FtsPage,isPureTextRelevance, ADR-008, and all backend test changes. This is a clean, well-scoped backend performance fix.Filter display state / person name resolution (not mentioned in issue #468):
resolvePersonNamein+page.server.ts,initialSenderName/initialReceiverNamethreading through+page.svelte,navKey/resetKeythroughSearchFilterBar→PersonTypeahead, andDateInputexternal value sync.Looking at the git history, commits
0430383e(DateInput external value),8a8205ad(PersonTypeahead resetKey), anda239c16c(filter display state sync) already landed onmainvia what appears to be issue #482's fix branch. This means the frontend filter-state changes in PR #488 are the source of the merge conflict — these files were independently modified on both paths.The PR needs to be rebased. After rebasing, the diff should contain only the FTS-specific changes (backend + tests + ADR-008) plus whatever frontend wiring was uniquely added in this branch that didn't land through #482.
Acceptance Criteria Coverage
Issue #468's ACs (per the issue body not shown here, inferred from PR description):
rows_per_calldrops from match-set size to page size for pure-text RELEVANCE queriesfindAllMatchingIdsByFts)DocumentSearchResult.totalItems— Sara's gap)Recommendation
Rebase onto main, verify the diff is clean (only FTS changes remain), then this is ready to merge. The architecture is sound and the ADR is thorough.
ed32de9f10to7e1f4f8b09