fix(fts): paginate FTS match-set in SQL instead of loading all matching IDs #488

Merged
marcel merged 6 commits from fix/issue-468-fts-sql-pagination into main 2026-05-09 16:35:15 +02:00

6 Commits

Author SHA1 Message Date
Marcel
7e1f4f8b09 test(fts): add overflow guard and UUID-as-String regression tests
Some checks failed
CI / Unit & Component Tests (push) Failing after 4m38s
CI / OCR Service Tests (push) Successful in 36s
CI / Backend Unit Tests (push) Failing after 3m26s
CI / Unit & Component Tests (pull_request) Failing after 4m29s
CI / OCR Service Tests (pull_request) Successful in 32s
CI / Backend Unit Tests (pull_request) Failing after 3m20s
- searchDocuments_relevance_returns_empty_when_offset_exceeds_maxInt:
  proves the long→int guard fires and findFtsPageRaw is never called
- searchDocuments_relevance_handles_string_uuid_from_jdbc_driver:
  exercises the toFtsPage String fallback branch for JDBC drivers that
  return UUID columns as String instead of java.util.UUID

Addresses Sara's review concerns on PR #488.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 16:31:52 +02:00
Marcel
ff2eb2ab91 refactor(fts): address PR #488 review concerns
- Extract isPureTextRelevance() private static method to replace the
  7-clause inline boolean in searchDocuments
- Guard long→int cast in relevanceSortedPageFromSql to prevent silent
  overflow at page ≥43M (CWE-190)
- resolvePersonName now uses the typed API client (createApiClient)
  instead of raw fetch, aligning with project conventions
- Update DocumentServiceTest stubs to match new FTS path (findFtsPageRaw
  + findAllById instead of findAllMatchingIdsByFts)
- Rewrite page.server.spec.ts person-name tests to mock via path-based
  API dispatch, matching the new api.GET call site

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 16:31:52 +02:00
Marcel
4a0a43b1cf test(fts): add integration tests and update unit tests for SQL-paginated relevance
- DocumentFtsPagedIntegrationTest: Testcontainers repo-level tests for
  findFtsPageRaw (page size, window total, last page, no matches, stopword)
- DocumentServiceSortTest: rewritten to stub findFtsPageRaw + findAllById
  for the pure-text RELEVANCE path; verifies filter-active path stays in-memory
- DocumentServiceTest: update two enrichment tests to use new SQL-path stubs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 16:31:52 +02:00
Marcel
a8e732ac39 feat(fts): push FTS pagination into SQL via CTE window function
Pure-text RELEVANCE queries now use findFtsPageRaw (CTE + COUNT(*) OVER())
instead of loading all matching IDs into memory and sorting in-process.
Non-text paths (filters active, DATE sort) still use the in-memory path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 16:31:52 +02:00
Marcel
ea136a8724 refactor(fts): add FtsHit/FtsPage records; rename findRankedIdsByFts -> findAllMatchingIdsByFts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 16:31:52 +02:00
Marcel
de1c55d18e docs(adr): ADR-008 SQL-level FTS pagination via window-function CTE
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 16:31:52 +02:00