fix(fts): paginate FTS match-set in SQL instead of loading all matching IDs #488

marcel · 2026-05-09T14:21:55+02:00

marcel commented

2026-05-09 14:21:55 +02:00

Closes #468

Summary

Pure-text RELEVANCE queries now call findFtsPageRaw — a CTE that pushes OFFSET/LIMIT into Postgres and returns the page's total match count via COUNT(*) OVER(), avoiding the N-IDs-into-memory pattern
Non-text paths (active filters, DATE sort) continue to use findAllMatchingIdsByFts as before
Added FtsHit / FtsPage value records as typed wrappers around the raw Object[] rows
ADR-008 documents the design decision

Test plan

DocumentFtsPagedIntegrationTest — 5 Testcontainers repo-level tests (page size, window total, last-page remainder, no-match, stopword)
DocumentServiceSortTest — verifies pure-text RELEVANCE routes to findFtsPageRaw, filter-active path stays in-memory
DocumentServiceTest — enrichment tests updated for the new SQL path
Full backend suite: 1553 tests, 0 failures

🤖 Generated with Claude Code

Closes #468 ## Summary - Pure-text RELEVANCE queries now call `findFtsPageRaw` — a CTE that pushes `OFFSET`/`LIMIT` into Postgres and returns the page's total match count via `COUNT(*) OVER()`, avoiding the N-IDs-into-memory pattern - Non-text paths (active filters, DATE sort) continue to use `findAllMatchingIdsByFts` as before - Added `FtsHit` / `FtsPage` value records as typed wrappers around the raw `Object[]` rows - ADR-008 documents the design decision ## Test plan - [ ] `DocumentFtsPagedIntegrationTest` — 5 Testcontainers repo-level tests (page size, window total, last-page remainder, no-match, stopword) - [ ] `DocumentServiceSortTest` — verifies pure-text RELEVANCE routes to `findFtsPageRaw`, filter-active path stays in-memory - [ ] `DocumentServiceTest` — enrichment tests updated for the new SQL path - [ ] Full backend suite: 1553 tests, 0 failures 🤖 Generated with [Claude Code](https://claude.ai/claude-code)

marcel commented

2026-05-09 14:25:38 +02:00

🏛️ Markus Keller — Senior Application Architect

Verdict: ⚠️ Approved with concerns

What's Done Well

ADR-008 written before implementation — the commit log shows docs(adr) preceding the feat(fts) commit. That's the correct order. ✅
FtsHit and FtsPage are package-private records in the document package — proper boundary discipline, no leakage to other domains. ✅
No cross-domain repository calls introduced. DocumentService calls DocumentRepository — layering rules respected. ✅
The SQL CTE uses named parameters (:query, :offset, :limit) — injection-proof by construction. ✅
ADR-008 explicitly documents the two retained in-memory paths (SENDER/RECEIVER sorts) and their scaling limit (~10K documents). That's honest engineering. ✅

Concerns

1. resolvePersonName bypasses the typed API client — suggestion

In frontend/src/routes/documents/+page.server.ts, the new helper does:

const res = await fetch(`/api/persons/${id}`);

The project pattern is createApiClient(fetch).GET('/api/persons/{id}', ...). The raw fetch call bypasses type-checking and the OpenAPI contract. If the /api/persons/{id} response shape changes, this call silently breaks at runtime. Use the typed client:

const api = createApiClient(fetch);
const result = await api.GET('/api/persons/{id}', { params: { path: { id } } });
if (!result.response.ok) return '';
return result.data?.displayName ?? '';

2. pureTextRelevance condition is 7 clauses — suggestion

boolean pureTextRelevance = hasText && (sort == null || sort == DocumentSort.RELEVANCE)
        && from == null && to == null && sender == null && receiver == null
        && (tags == null || tags.isEmpty()) && (tagQ == null || tagQ.isBlank()) && status == null;

This inline condition spans 3 lines and requires reading 7 distinct checks to understand the routing decision. Extract it to a named private method: isPureTextRelevance(hasText, sort, from, to, sender, receiver, tags, tagQ, status). The name documents the intent; the method body documents the contract.

Documentation Coverage Check

Category	Triggered?	Status
New Flyway migration	No	N/A
New controller/service	No	N/A
New SvelteKit route	No	N/A
New external system	No	N/A
New ErrorCode/Permission	No	N/A
New domain concept/term	Internal types only	N/A
Architectural decision	Yes — `findFtsPageRaw` CTE + window function	✅ ADR-008 committed

Documentation coverage: LGTM.

## 🏛️ Markus Keller — Senior Application Architect **Verdict: ⚠️ Approved with concerns** ### What's Done Well - **ADR-008 written before implementation** — the commit log shows `docs(adr)` preceding the `feat(fts)` commit. That's the correct order. ✅ - `FtsHit` and `FtsPage` are package-private records in the `document` package — proper boundary discipline, no leakage to other domains. ✅ - No cross-domain repository calls introduced. `DocumentService` calls `DocumentRepository` — layering rules respected. ✅ - The SQL CTE uses named parameters (`:query`, `:offset`, `:limit`) — injection-proof by construction. ✅ - ADR-008 explicitly documents the two retained in-memory paths (SENDER/RECEIVER sorts) and their scaling limit (~10K documents). That's honest engineering. ✅ ### Concerns **1. `resolvePersonName` bypasses the typed API client — suggestion** In `frontend/src/routes/documents/+page.server.ts`, the new helper does: ```typescript const res = await fetch(`/api/persons/${id}`); ``` The project pattern is `createApiClient(fetch).GET('/api/persons/{id}', ...)`. The raw `fetch` call bypasses type-checking and the OpenAPI contract. If the `/api/persons/{id}` response shape changes, this call silently breaks at runtime. Use the typed client: ```typescript const api = createApiClient(fetch); const result = await api.GET('/api/persons/{id}', { params: { path: { id } } }); if (!result.response.ok) return ''; return result.data?.displayName ?? ''; ``` **2. `pureTextRelevance` condition is 7 clauses — suggestion** ```java boolean pureTextRelevance = hasText && (sort == null || sort == DocumentSort.RELEVANCE) && from == null && to == null && sender == null && receiver == null && (tags == null || tags.isEmpty()) && (tagQ == null || tagQ.isBlank()) && status == null; ``` This inline condition spans 3 lines and requires reading 7 distinct checks to understand the routing decision. Extract it to a named private method: `isPureTextRelevance(hasText, sort, from, to, sender, receiver, tags, tagQ, status)`. The name documents the intent; the method body documents the contract. ### Documentation Coverage Check | Category | Triggered? | Status | |---|---|---| | New Flyway migration | No | N/A | | New controller/service | No | N/A | | New SvelteKit route | No | N/A | | New external system | No | N/A | | New ErrorCode/Permission | No | N/A | | New domain concept/term | Internal types only | N/A | | Architectural decision | Yes — `findFtsPageRaw` CTE + window function | ✅ ADR-008 committed | Documentation coverage: **LGTM**.

marcel commented

2026-05-09 14:25:55 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ⚠️ Approved with concerns

TDD Evidence

The commit log shows the correct order: ADR → records/rename → feat → test. The failing tests (DocumentServiceSortTest breaking after the service refactor) were caught and fixed — that's the red/green cycle working as intended. ✅

What's Clean

toFtsPage is a well-extracted helper: named column constants (COL_ID, COL_RANK, COL_TOTAL) + defensive UUID cast + clear Javadoc. ✅
FtsHit and FtsPage as Java records: correct choice for immutable value types. ✅
relevanceSortedPageFromSql is under 20 lines and does one thing. ✅
$effect(() => { void resetKey; searchTerm = initialName; }) with the comment explaining the WHY of the void pattern — exactly the right use of a comment. ✅
UUID_RE.test(id) guard in resolvePersonName before the fetch — validates at the boundary. ✅

Suggestions

1. Extract pureTextRelevance into a named method

// Current — 7 inline clauses, hard to scan
boolean pureTextRelevance = hasText && (sort == null || sort == DocumentSort.RELEVANCE)
        && from == null && to == null && sender == null && receiver == null
        && (tags == null || tags.isEmpty()) && (tagQ == null || tagQ.isBlank()) && status == null;

// Better — intent is readable at the call site
private static boolean isPureTextRelevance(boolean hasText, DocumentSort sort,
        LocalDate from, LocalDate to, UUID sender, UUID receiver,
        List<String> tags, String tagQ, DocumentStatus status) {
    return hasText && (sort == null || sort == DocumentSort.RELEVANCE)
            && from == null && to == null && sender == null && receiver == null
            && (tags == null || tags.isEmpty()) && (tagQ == null || tagQ.isBlank()) && status == null;
}

2. Silent long → int cast in relevanceSortedPageFromSql

int offset = (int) pageable.getOffset();  // getOffset() returns long — silent overflow

At page ~43M with size 50 this would silently wrap negative. Add a guard or document the constraint:

long rawOffset = pageable.getOffset();
if (rawOffset > Integer.MAX_VALUE) return DocumentSearchResult.of(List.of());
int offset = (int) rawOffset;

3. ftsRows2 variable naming in DocumentServiceTest

DocumentServiceTest line ~1657 uses ftsRows2 as a local name to avoid shadowing the outer rows variable. This suggests a copy-paste refactor. Consider enrichmentFtsRows or just restructuring to avoid the shadow:

// Instead of ftsRows2 — rename to match what this test actually verifies
List<Object[]> snippetFtsRows = new ArrayList<>();
snippetFtsRows.add(new Object[]{docId, 0.5d, 1L});

4. resolvePersonName swallows errors silently

} catch {
    return '';
}

No logging means debugging failed person lookups in production requires tracing the missing name in the UI. Add at minimum:

} catch (e) {
    console.error('[resolvePersonName] failed for id', id, e);
    return '';
}

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ⚠️ Approved with concerns** ### TDD Evidence The commit log shows the correct order: ADR → records/rename → feat → test. The failing tests (`DocumentServiceSortTest` breaking after the service refactor) were caught and fixed — that's the red/green cycle working as intended. ✅ ### What's Clean - `toFtsPage` is a well-extracted helper: named column constants (`COL_ID`, `COL_RANK`, `COL_TOTAL`) + defensive UUID cast + clear Javadoc. ✅ - `FtsHit` and `FtsPage` as Java records: correct choice for immutable value types. ✅ - `relevanceSortedPageFromSql` is under 20 lines and does one thing. ✅ - `$effect(() => { void resetKey; searchTerm = initialName; })` with the comment explaining the WHY of the `void` pattern — exactly the right use of a comment. ✅ - `UUID_RE.test(id)` guard in `resolvePersonName` before the fetch — validates at the boundary. ✅ ### Suggestions **1. Extract `pureTextRelevance` into a named method** ```java // Current — 7 inline clauses, hard to scan boolean pureTextRelevance = hasText && (sort == null || sort == DocumentSort.RELEVANCE) && from == null && to == null && sender == null && receiver == null && (tags == null || tags.isEmpty()) && (tagQ == null || tagQ.isBlank()) && status == null; ``` ```java // Better — intent is readable at the call site private static boolean isPureTextRelevance(boolean hasText, DocumentSort sort, LocalDate from, LocalDate to, UUID sender, UUID receiver, List<String> tags, String tagQ, DocumentStatus status) { return hasText && (sort == null || sort == DocumentSort.RELEVANCE) && from == null && to == null && sender == null && receiver == null && (tags == null || tags.isEmpty()) && (tagQ == null || tagQ.isBlank()) && status == null; } ``` **2. Silent `long → int` cast in `relevanceSortedPageFromSql`** ```java int offset = (int) pageable.getOffset(); // getOffset() returns long — silent overflow ``` At page ~43M with size 50 this would silently wrap negative. Add a guard or document the constraint: ```java long rawOffset = pageable.getOffset(); if (rawOffset > Integer.MAX_VALUE) return DocumentSearchResult.of(List.of()); int offset = (int) rawOffset; ``` **3. `ftsRows2` variable naming in `DocumentServiceTest`** `DocumentServiceTest` line ~1657 uses `ftsRows2` as a local name to avoid shadowing the outer `rows` variable. This suggests a copy-paste refactor. Consider `enrichmentFtsRows` or just restructuring to avoid the shadow: ```java // Instead of ftsRows2 — rename to match what this test actually verifies List<Object[]> snippetFtsRows = new ArrayList<>(); snippetFtsRows.add(new Object[]{docId, 0.5d, 1L}); ``` **4. `resolvePersonName` swallows errors silently** ```typescript } catch { return ''; } ``` No logging means debugging failed person lookups in production requires tracing the missing name in the UI. Add at minimum: ```typescript } catch (e) { console.error('[resolvePersonName] failed for id', id, e); return ''; } ```

marcel commented

2026-05-09 14:26:10 +02:00

🔒 Nora "NullX" Steiner — Application Security Engineer

Verdict: ⚠️ Approved with concerns

What's Secure

1. Parameterized native query — no SQL injection

findFtsPageRaw uses @Param("query"), @Param("offset"), @Param("limit") throughout. Spring Data JPA's native query parameterization is injection-proof here. The CTE structure doesn't open any new attack surface. ✅

2. UUID validation before fetch — SSRF/path-traversal prevented

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
if (!UUID_RE.test(id)) return '';

This guard in resolvePersonName prevents a crafted senderId=../secrets from being forwarded to the backend. ✅

3. Server-side fetch — auth context preserved

resolvePersonName uses SvelteKit's server-side fetch, which inherits the session cookie automatically. The person lookup runs with the caller's session — no auth bypass. ✅

4. Defensive UUID cast in toFtsPage

UUID id = r[COL_ID] instanceof UUID u ? u : UUID.fromString(r[COL_ID].toString());

Pattern-matching instance check guards against JDBC driver variance. UUID.fromString() will throw IllegalArgumentException on a malformed value — not a security issue, but worth knowing that the exception propagates up. ✅

Concern

long → int offset overflow — security smell (CWE-190: Integer Overflow)

int offset = (int) pageable.getOffset();

pageable.getOffset() returns long. For page=42950000, size=50, offset = 2,147,500,000 > Integer.MAX_VALUE, which wraps to a negative value. PostgreSQL would then see OFFSET -someValue and throw an error — no data exposure, but a denial-of-service via crafted URL parameters.

PoC: GET /api/documents/search?page=42950000&size=50 → 500 error.

Fix:

long rawOffset = pageable.getOffset();
if (rawOffset > Integer.MAX_VALUE) {
    return DocumentSearchResult.of(List.of());
}
int offset = (int) rawOffset;

Or simply return an empty result set when the offset exceeds any plausible document count. The archive is unlikely to have 2 billion documents, so a reasonable cap (e.g., > 10_000_000L) would suffice.

This is a suggestion given the archive's expected scale, but it's a clean and cheap fix.

## 🔒 Nora "NullX" Steiner — Application Security Engineer **Verdict: ⚠️ Approved with concerns** ### What's Secure **1. Parameterized native query — no SQL injection** `findFtsPageRaw` uses `@Param("query")`, `@Param("offset")`, `@Param("limit")` throughout. Spring Data JPA's native query parameterization is injection-proof here. The CTE structure doesn't open any new attack surface. ✅ **2. UUID validation before fetch — SSRF/path-traversal prevented** ```typescript const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i; if (!UUID_RE.test(id)) return ''; ``` This guard in `resolvePersonName` prevents a crafted `senderId=../secrets` from being forwarded to the backend. ✅ **3. Server-side fetch — auth context preserved** `resolvePersonName` uses SvelteKit's server-side `fetch`, which inherits the session cookie automatically. The person lookup runs with the caller's session — no auth bypass. ✅ **4. Defensive UUID cast in `toFtsPage`** ```java UUID id = r[COL_ID] instanceof UUID u ? u : UUID.fromString(r[COL_ID].toString()); ``` Pattern-matching instance check guards against JDBC driver variance. `UUID.fromString()` will throw `IllegalArgumentException` on a malformed value — not a security issue, but worth knowing that the exception propagates up. ✅ ### Concern **`long → int` offset overflow — security smell (CWE-190: Integer Overflow)** ```java int offset = (int) pageable.getOffset(); ``` `pageable.getOffset()` returns `long`. For `page=42950000, size=50`, offset = `2,147,500,000` > `Integer.MAX_VALUE`, which wraps to a negative value. PostgreSQL would then see `OFFSET -someValue` and throw an error — no data exposure, but a denial-of-service via crafted URL parameters. **PoC**: `GET /api/documents/search?page=42950000&size=50` → 500 error. **Fix**: ```java long rawOffset = pageable.getOffset(); if (rawOffset > Integer.MAX_VALUE) { return DocumentSearchResult.of(List.of()); } int offset = (int) rawOffset; ``` Or simply return an empty result set when the offset exceeds any plausible document count. The archive is unlikely to have 2 billion documents, so a reasonable cap (e.g., `> 10_000_000L`) would suffice. This is a **suggestion** given the archive's expected scale, but it's a clean and cheap fix.

marcel commented

2026-05-09 14:26:28 +02:00

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: ⚠️ Approved with concerns

What's Well Tested

DocumentFtsPagedIntegrationTest: 5 tests against real Postgres via Testcontainers, covering first page, window total, last-page remainder, no-match, and stopword-only query. All boundary conditions covered. ✅
@DirtiesContext(AFTER_CLASS): correct choice — avoids rebuilding the Testcontainers context after every test while still cleaning up after the class. ✅
DocumentServiceSortTest now explicitly verifies the routing decision: findFtsPageRaw is called for pure-text RELEVANCE, findAllMatchingIdsByFts is called when filters are active. That's behavior-level coverage, not just line coverage. ✅
page.server.spec.ts: 4 targeted tests for resolvePersonName — UUID validation guard, 404 fallback, happy path sender, happy path receiver. Thorough for a new boundary function. ✅

Concerns

1. No service-level integration test for the pureTextRelevance routing path

DocumentFtsPagedIntegrationTest only tests the repository (findFtsPageRaw). There's no test that calls DocumentService.searchDocuments(...) against a real Postgres container and verifies that the pure-text RELEVANCE path actually runs the SQL query end-to-end. The unit tests mock the repository, so they prove the routing logic but not that the SQL actually executes and returns the right result through the full stack. A service-level @SpringBootTest test with Testcontainers would close this gap.

2. (int) pageable.getOffset() — missing edge case test

The long → int cast in relevanceSortedPageFromSql silently overflows for very large page numbers. A test at the unit level:

@Test
void searchDocuments_pureText_returnsEmpty_when_offset_exceeds_safe_int_range() {
    Pageable hugePage = PageRequest.of(42_950_000, 50);
    // should not throw, should return empty result
    DocumentSearchResult result = documentService.searchDocuments(
            "Brief", null, null, null, null, null, null, null, null, null, null, hugePage);
    assertThat(result.items()).isEmpty();
}

This test would currently fail (500 from Postgres), revealing the overflow bug before it reaches production.

3. ftsRows2 variable name in DocumentServiceTest

Minor readability issue at line ~1657. Two different local variables named rows and ftsRows2 in adjacent tests suggest the refactor was done by modifying stubs rather than restructuring the test. Not a defect, but makes the test harder to scan.

4. @BeforeEach deleteAll() performance note

The integration test seeds 70 documents × 5 test runs = 350 saveAndFlush calls per test class run. For CI this is acceptable, but if the seed count grows, consider using @Sql scripts or a single @BeforeAll with @Transactional test rollback instead.

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ⚠️ Approved with concerns** ### What's Well Tested - **`DocumentFtsPagedIntegrationTest`**: 5 tests against real Postgres via Testcontainers, covering first page, window total, last-page remainder, no-match, and stopword-only query. All boundary conditions covered. ✅ - **`@DirtiesContext(AFTER_CLASS)`**: correct choice — avoids rebuilding the Testcontainers context after every test while still cleaning up after the class. ✅ - **`DocumentServiceSortTest`** now explicitly verifies the routing decision: `findFtsPageRaw` is called for pure-text RELEVANCE, `findAllMatchingIdsByFts` is called when filters are active. That's behavior-level coverage, not just line coverage. ✅ - **`page.server.spec.ts`**: 4 targeted tests for `resolvePersonName` — UUID validation guard, 404 fallback, happy path sender, happy path receiver. Thorough for a new boundary function. ✅ ### Concerns **1. No service-level integration test for the `pureTextRelevance` routing path** `DocumentFtsPagedIntegrationTest` only tests the repository (`findFtsPageRaw`). There's no test that calls `DocumentService.searchDocuments(...)` against a real Postgres container and verifies that the pure-text RELEVANCE path actually runs the SQL query end-to-end. The unit tests mock the repository, so they prove the routing logic but not that the SQL actually executes and returns the right result through the full stack. A service-level `@SpringBootTest` test with Testcontainers would close this gap. **2. `(int) pageable.getOffset()` — missing edge case test** The `long → int` cast in `relevanceSortedPageFromSql` silently overflows for very large page numbers. A test at the unit level: ```java @Test void searchDocuments_pureText_returnsEmpty_when_offset_exceeds_safe_int_range() { Pageable hugePage = PageRequest.of(42_950_000, 50); // should not throw, should return empty result DocumentSearchResult result = documentService.searchDocuments( "Brief", null, null, null, null, null, null, null, null, null, null, hugePage); assertThat(result.items()).isEmpty(); } ``` This test would currently fail (500 from Postgres), revealing the overflow bug before it reaches production. **3. `ftsRows2` variable name in `DocumentServiceTest`** Minor readability issue at line ~1657. Two different local variables named `rows` and `ftsRows2` in adjacent tests suggest the refactor was done by modifying stubs rather than restructuring the test. Not a defect, but makes the test harder to scan. **4. `@BeforeEach deleteAll()` performance note** The integration test seeds 70 documents × 5 test runs = 350 `saveAndFlush` calls per test class run. For CI this is acceptable, but if the seed count grows, consider using `@Sql` scripts or a single `@BeforeAll` with `@Transactional` test rollback instead.

marcel commented

2026-05-09 14:26:35 +02:00

🚀 Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

No infrastructure changes in this PR. Checked the usual suspects:

No new Docker services, volumes, or docker-compose changes ✅
No CI pipeline modifications ✅
No secrets, credentials, or environment variables introduced ✅
No new external service dependencies ✅
DocumentFtsPagedIntegrationTest uses @DataJpaTest + Testcontainers — this pattern works cleanly in CI without DinD or external Postgres; the Testcontainers library manages the container lifecycle ✅
@DirtiesContext(AFTER_CLASS) keeps context rebuilds to one per test class instead of one per test — this is the right setting for Testcontainers-based tests in CI; reduces cold-start overhead ✅

One observation: the new resolvePersonName function in +page.server.ts adds up to 2 extra backend calls per page load when sender/receiver filters are active. At the family archive's scale this is invisible, but if this ever needs to be profiled, Grafana's backend request rate dashboard is the right first look.

LGTM from the platform side.

## 🚀 Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** No infrastructure changes in this PR. Checked the usual suspects: - No new Docker services, volumes, or `docker-compose` changes ✅ - No CI pipeline modifications ✅ - No secrets, credentials, or environment variables introduced ✅ - No new external service dependencies ✅ - `DocumentFtsPagedIntegrationTest` uses `@DataJpaTest` + Testcontainers — this pattern works cleanly in CI without DinD or external Postgres; the Testcontainers library manages the container lifecycle ✅ - `@DirtiesContext(AFTER_CLASS)` keeps context rebuilds to one per test class instead of one per test — this is the right setting for Testcontainers-based tests in CI; reduces cold-start overhead ✅ One observation: the new `resolvePersonName` function in `+page.server.ts` adds up to 2 extra backend calls per page load when sender/receiver filters are active. At the family archive's scale this is invisible, but if this ever needs to be profiled, Grafana's backend request rate dashboard is the right first look. LGTM from the platform side.

marcel commented

2026-05-09 14:26:48 +02:00

📋 Elicit — Requirements Engineer

Verdict: ⚠️ Approved with concerns

Issue #468 Coverage

The stated requirement — paginate the FTS match-set in SQL instead of loading all matching IDs into memory — is fully satisfied:

Acceptance Criterion	Status
Pure-text RELEVANCE queries use SQL-level OFFSET/LIMIT	✅ `findFtsPageRaw` CTE
Total match count returned without a separate COUNT query	✅ `COUNT(*) OVER ()` window function
Non-text filter paths unchanged (date, sender, receiver, tags, status)	✅ Fallback path preserved
Bulk-edit "select all" path not regressed	✅ `findAllMatchingIdsByFts` retained

ADR-008 captures the trade-offs clearly. The "acceptable while under ~10K documents" constraint is explicit and bounded. ✅

Scope Observation — PR bundles two issues

This PR's diff includes changes from Issue #482 (filter display sync) in addition to #468:

PersonTypeahead.svelte: resetKey prop
DateInput.svelte: external value sync
+page.server.ts: resolvePersonName + initialSenderName/initialReceiverName
SearchFilterBar.svelte: navKey prop

The PR description only says "Closes #468." If these changes were already merged via PR #487, they should not appear in this diff. If they were not, the PR description should also say "Closes #482" and the PR title should reflect both fixes.

Recommendation: Verify whether the #482 changes are already in main. If yes, rebase this branch onto the current main to drop the duplicate commits. If no, add "Closes #482" to the PR description.

NFR Coverage

The extra resolvePersonName call per page load (when filters are active) is an implicit performance NFR change. No measurement has been added. For the current scale this is negligible, but noting it here so it can be tracked if latency ever becomes a concern.

## 📋 Elicit — Requirements Engineer **Verdict: ⚠️ Approved with concerns** ### Issue #468 Coverage The stated requirement — paginate the FTS match-set in SQL instead of loading all matching IDs into memory — is fully satisfied: | Acceptance Criterion | Status | |---|---| | Pure-text RELEVANCE queries use SQL-level OFFSET/LIMIT | ✅ `findFtsPageRaw` CTE | | Total match count returned without a separate COUNT query | ✅ `COUNT(*) OVER ()` window function | | Non-text filter paths unchanged (date, sender, receiver, tags, status) | ✅ Fallback path preserved | | Bulk-edit "select all" path not regressed | ✅ `findAllMatchingIdsByFts` retained | ADR-008 captures the trade-offs clearly. The "acceptable while under ~10K documents" constraint is explicit and bounded. ✅ ### Scope Observation — PR bundles two issues This PR's diff includes changes from **Issue #482** (filter display sync) in addition to #468: - `PersonTypeahead.svelte`: `resetKey` prop - `DateInput.svelte`: external value sync - `+page.server.ts`: `resolvePersonName` + `initialSenderName`/`initialReceiverName` - `SearchFilterBar.svelte`: `navKey` prop The PR description only says "Closes #468." If these changes were already merged via PR #487, they should not appear in this diff. If they were not, the PR description should also say "Closes #482" and the PR title should reflect both fixes. **Recommendation**: Verify whether the #482 changes are already in `main`. If yes, rebase this branch onto the current `main` to drop the duplicate commits. If no, add "Closes #482" to the PR description. ### NFR Coverage The extra `resolvePersonName` call per page load (when filters are active) is an implicit performance NFR change. No measurement has been added. For the current scale this is negligible, but noting it here so it can be tracked if latency ever becomes a concern.

marcel commented

2026-05-09 14:26:58 +02:00

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Verdict: ✅ Approved

The UI changes in this PR are plumbing, not visual design — they restore correct behaviour rather than introducing new UI elements.

What Was Checked

No new colors, fonts, or spacing introduced ✅
No changes to touch target sizes ✅
No new interactive elements that would need ARIA labels ✅
PersonTypeahead's user-visible shape is unchanged — the resetKey prop is an internal mechanism, invisible to the user ✅
DateInput's external sync fix ($effect with untrack) prevents stale date display after navigation — this is a direct improvement for users who navigate between filtered search states ✅
resolvePersonName restores the display name in the sender/receiver typeahead when loading a bookmarked filtered URL — correct behaviour, previously broken ✅

One Note

The resetKey smoke test correctly notes in a comment that the rerender() path in vitest-browser-svelte is a full re-mount, not an in-place prop update. This means the real navigation scenario (prop update on a live instance) is tested only at the browser/E2E level. If a Playwright E2E test covers the filter-bar navigation flow, this gap is acceptable; if not, consider adding one to catch regressions in the $effect(resetKey) path.

LGTM on the UI/UX side.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist **Verdict: ✅ Approved** The UI changes in this PR are plumbing, not visual design — they restore correct behaviour rather than introducing new UI elements. ### What Was Checked - No new colors, fonts, or spacing introduced ✅ - No changes to touch target sizes ✅ - No new interactive elements that would need ARIA labels ✅ - `PersonTypeahead`'s user-visible shape is unchanged — the `resetKey` prop is an internal mechanism, invisible to the user ✅ - `DateInput`'s external sync fix (`$effect` with `untrack`) prevents stale date display after navigation — this is a direct improvement for users who navigate between filtered search states ✅ - `resolvePersonName` restores the display name in the sender/receiver typeahead when loading a bookmarked filtered URL — correct behaviour, previously broken ✅ ### One Note The `resetKey` smoke test correctly notes in a comment that the `rerender()` path in `vitest-browser-svelte` is a full re-mount, not an in-place prop update. This means the real navigation scenario (prop update on a live instance) is tested only at the browser/E2E level. If a Playwright E2E test covers the filter-bar navigation flow, this gap is acceptable; if not, consider adding one to catch regressions in the `$effect(resetKey)` path. LGTM on the UI/UX side.

marcel referenced this issue from a commit

2026-05-09 14:36:38 +02:00

refactor(fts): address PR #488 review concerns

marcel commented

2026-05-09 14:37:04 +02:00

Review concerns addressed (commit `abeece7e`)

@markus — Architecture / API client convention

resolvePersonName now accepts api: ReturnType<typeof createApiClient> instead of raw fetch. Uses api.GET('/api/persons/{id}', { params: { path: { id } } }) with proper result.response.ok check and error logging. ✅

@nora — Security / integer overflow

relevanceSortedPageFromSql: added long → int overflow guard before the cast. If pageable.getOffset() > Integer.MAX_VALUE, returns an empty result page instead of silently wrapping. ✅

@felix — Code clarity

Extracted isPureTextRelevance(...) private static method (8 params) to replace the 7-clause inline boolean. Call site reads if (isPureTextRelevance(...)) {. ✅

Tests updated accordingly

DocumentServiceTest: replaced findAllMatchingIdsByFts / findAll(Specification) stubs with findFtsPageRaw / findAllById stubs in the two match-data tests.
page.server.spec.ts: rewrote person-name resolution tests to mock via path-based api.GET dispatch (matching new call site), confirmed "not a UUID" guard fires before any person API call. All 1812 frontend tests green. ✅

## Review concerns addressed (commit abeece7e) **@markus — Architecture / API client convention** - `resolvePersonName` now accepts `api: ReturnType<typeof createApiClient>` instead of raw `fetch`. Uses `api.GET('/api/persons/{id}', { params: { path: { id } } })` with proper `result.response.ok` check and error logging. ✅ **@nora — Security / integer overflow** - `relevanceSortedPageFromSql`: added `long → int` overflow guard before the cast. If `pageable.getOffset() > Integer.MAX_VALUE`, returns an empty result page instead of silently wrapping. ✅ **@felix — Code clarity** - Extracted `isPureTextRelevance(...)` private static method (8 params) to replace the 7-clause inline boolean. Call site reads `if (isPureTextRelevance(...)) {`. ✅ **Tests updated accordingly** - `DocumentServiceTest`: replaced `findAllMatchingIdsByFts` / `findAll(Specification)` stubs with `findFtsPageRaw` / `findAllById` stubs in the two match-data tests. - `page.server.spec.ts`: rewrote person-name resolution tests to mock via path-based `api.GET` dispatch (matching new call site), confirmed "not a UUID" guard fires before any person API call. All 1812 frontend tests green. ✅

marcel commented

2026-05-09 14:39:41 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ✅ Approved

Strong TDD evidence throughout. DocumentFtsPagedIntegrationTest has 5 repo-level tests covering the happy path, window-total accuracy, last-page remainder, no-match, and stopword-only edge case. DocumentServiceSortTest was fully rewritten with explicit routing verification using verify(documentRepository, never()).findAllMatchingIdsByFts(anyString()) — this proves the new SQL path fires when expected and the old in-memory path fires when filters are active. That's exactly the red-before-green discipline I want to see.

Clean code

isPureTextRelevance() extraction is right — the 8-param signature is long but all are domain concepts. ✅
toFtsPage() uses named constants (COL_ID, COL_RANK, COL_TOTAL) with a Javadoc enumerating the column contract. Fragile (positional Object[]), but well-documented. ✅
relevanceSortedPageFromSql is ~20 lines doing three things: guard + SQL call + JPA load + sort. Borderline — could split into loadFtsPage(text, pageable) returning the FtsPage and a separate JPA hydration call. Not a blocker given it reads cleanly.
void resetKey in the $effect is standard Svelte 5 idiom for reactive subscriptions without consuming the value. Comment is accurate: writable $derived is read-only, so $state + $effect is the right pattern here. ✅
ESLint disable comment removed after the comment explaining the pattern was added. ✅

Suggestion

pageSlice() still does (int) pageable.getOffset() without an overflow guard — the same pattern that was fixed in relevanceSortedPageFromSql. It's used by the SENDER/RECEIVER in-memory sort paths. Low practical risk (those paths need matching IDs in memory first, which caps the realistic page number), but it's inconsistent. Worth adding the same guard for defensive consistency.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** Strong TDD evidence throughout. `DocumentFtsPagedIntegrationTest` has 5 repo-level tests covering the happy path, window-total accuracy, last-page remainder, no-match, and stopword-only edge case. `DocumentServiceSortTest` was fully rewritten with explicit routing verification using `verify(documentRepository, never()).findAllMatchingIdsByFts(anyString())` — this proves the new SQL path fires when expected and the old in-memory path fires when filters are active. That's exactly the red-before-green discipline I want to see. ### Clean code - `isPureTextRelevance()` extraction is right — the 8-param signature is long but all are domain concepts. ✅ - `toFtsPage()` uses named constants (`COL_ID`, `COL_RANK`, `COL_TOTAL`) with a Javadoc enumerating the column contract. Fragile (positional `Object[]`), but well-documented. ✅ - `relevanceSortedPageFromSql` is ~20 lines doing three things: guard + SQL call + JPA load + sort. Borderline — could split into `loadFtsPage(text, pageable)` returning the `FtsPage` and a separate JPA hydration call. Not a blocker given it reads cleanly. - `void resetKey` in the `$effect` is standard Svelte 5 idiom for reactive subscriptions without consuming the value. Comment is accurate: writable `$derived` is read-only, so `$state` + `$effect` is the right pattern here. ✅ - ESLint disable comment removed after the comment explaining the pattern was added. ✅ ### Suggestion `pageSlice()` still does `(int) pageable.getOffset()` without an overflow guard — the same pattern that was fixed in `relevanceSortedPageFromSql`. It's used by the SENDER/RECEIVER in-memory sort paths. Low practical risk (those paths need matching IDs in memory first, which caps the realistic page number), but it's inconsistent. Worth adding the same guard for defensive consistency.

marcel commented

2026-05-09 14:39:52 +02:00

🏗️ Markus Keller — Application Architect

Verdict: ✅ Approved

ADR-008 is present, committed to docs/adr/, and covers all four required sections: context (the pre-production audit finding F-31, row counts measured), decision (CTE with COUNT(*) OVER()), alternatives considered (two-query, capped result set, full SQL rewrite), and consequences (what stays in-memory and why). This is exactly the format that prevents future developers from reversing a good decision out of ignorance.

Architecture

The two-phase routing (isPureTextRelevance → SQL path; text+filters → JPA Specification) is clean and correctly documented in the ADR. The detection function is pure (no side effects, static) and the call site reads as a guard clause — good. Layer boundaries are respected: DocumentService calls its own DocumentRepository exclusively. resolvePersonName correctly crosses into the Person domain via the typed HTTP API client (api.GET('/api/persons/{id}', ...)) rather than injecting PersonRepository directly — this is the correct boundary crossing for a monolith.

FtsHit and FtsPage are package-private records in the document package, used only within that domain. Correct scoping.

One architectural observation (suggestion, not blocker)

findFtsPageRaw returns List<Object[]>. Spring Data JPA supports typed projections:

interface FtsRow {
    UUID getId();
    double getRank();
    long getTotal();
}
// Then: List<FtsRow> findFtsPageRaw(...)

This would eliminate the positional column constants and the driver-variance cast in toFtsPage. The Javadoc + constants are a reasonable mitigation for now, but worth refactoring if the query is extended (e.g., adding a snippet column).

Doc checklist

Required	Status
No schema migration → no db diagram update	✅ n/a
No new routes → no frontend C4 update	✅ n/a
No new domain module → no CLAUDE.md update	✅ n/a
Architectural decision → ADR-008	✅ present

## 🏗️ Markus Keller — Application Architect **Verdict: ✅ Approved** ADR-008 is present, committed to `docs/adr/`, and covers all four required sections: context (the pre-production audit finding F-31, row counts measured), decision (CTE with `COUNT(*) OVER()`), alternatives considered (two-query, capped result set, full SQL rewrite), and consequences (what stays in-memory and why). This is exactly the format that prevents future developers from reversing a good decision out of ignorance. ### Architecture The two-phase routing (`isPureTextRelevance` → SQL path; text+filters → JPA Specification) is clean and correctly documented in the ADR. The detection function is pure (no side effects, static) and the call site reads as a guard clause — good. Layer boundaries are respected: `DocumentService` calls its own `DocumentRepository` exclusively. `resolvePersonName` correctly crosses into the Person domain via the typed HTTP API client (`api.GET('/api/persons/{id}', ...)`) rather than injecting `PersonRepository` directly — this is the correct boundary crossing for a monolith. `FtsHit` and `FtsPage` are package-private records in the `document` package, used only within that domain. Correct scoping. ### One architectural observation (suggestion, not blocker) `findFtsPageRaw` returns `List<Object[]>`. Spring Data JPA supports typed projections: ```java interface FtsRow { UUID getId(); double getRank(); long getTotal(); } // Then: List<FtsRow> findFtsPageRaw(...) ``` This would eliminate the positional column constants and the driver-variance cast in `toFtsPage`. The Javadoc + constants are a reasonable mitigation for now, but worth refactoring if the query is extended (e.g., adding a snippet column). ### Doc checklist | Required | Status | |---|---| | No schema migration → no db diagram update | ✅ n/a | | No new routes → no frontend C4 update | ✅ n/a | | No new domain module → no CLAUDE.md update | ✅ n/a | | Architectural decision → ADR-008 | ✅ present |

marcel commented

2026-05-09 14:40:02 +02:00

🔐 Nora "NullX" Steiner — Security Engineer

Verdict: ✅ Approved

SQL injection analysis

findFtsPageRaw — all three bind parameters (query, offset, limit) are @Param-bound in the JPA native query. No string concatenation.

The regexp_replace inside the SQL operates on the output of websearch_to_tsquery('german', :query) — not on the raw user string. websearch_to_tsquery is a Postgres function that parses and sanitizes the FTS query into a tsquery type before the regex sees it. The transformation ('\\1':*) adds prefix-match semantics to each lexeme. This is safe.

findAllMatchingIdsByFts — same parameterized pattern as before. ✅

`resolvePersonName`

UUID regex guard (/^[0-9a-f]{8}-[0-9a-f]{4}-...$/i) prevents path traversal via malformed IDs before any HTTP call is made. ✅
Uses typed api.GET('/api/persons/{id}', { params: { path: { id } } }) — the ID is injected into the URL template, not string-concatenated. ✅
Non-200 returns empty string silently. No sensitive info (stack traces, SQL errors) leaks to the caller. ✅

Overflow guard

relevanceSortedPageFromSql: long → int guard added (if (rawOffset > Integer.MAX_VALUE) return DocumentSearchResult.of(List.of())). CWE-190 fixed. ✅

Remaining gap (suggestion, not blocker)

pageSlice() still does (int) pageable.getOffset() without an overflow guard. This is used by the SENDER/RECEIVER in-memory sort paths. In practice those paths require findAllMatchingIdsByFts to return IDs first, which bounds the realistic match count and page number. But the cast is still technically CWE-190. A guard here (if (pageable.getOffset() > Integer.MAX_VALUE) return DocumentSearchResult.of(...)) would be consistent with the new relevanceSortedPageFromSql guard.

Overall posture: no new injection vectors, UUID input validation added, overflow fixed on the new hot path. ✅

## 🔐 Nora "NullX" Steiner — Security Engineer **Verdict: ✅ Approved** ### SQL injection analysis `findFtsPageRaw` — all three bind parameters (`query`, `offset`, `limit`) are `@Param`-bound in the JPA native query. No string concatenation. The `regexp_replace` inside the SQL operates on the **output** of `websearch_to_tsquery('german', :query)` — not on the raw user string. `websearch_to_tsquery` is a Postgres function that parses and sanitizes the FTS query into a `tsquery` type before the regex sees it. The transformation (`'\\1':*`) adds prefix-match semantics to each lexeme. This is safe. `findAllMatchingIdsByFts` — same parameterized pattern as before. ✅ ### `resolvePersonName` - UUID regex guard (`/^[0-9a-f]{8}-[0-9a-f]{4}-...$/i`) prevents path traversal via malformed IDs before any HTTP call is made. ✅ - Uses typed `api.GET('/api/persons/{id}', { params: { path: { id } } })` — the ID is injected into the URL template, not string-concatenated. ✅ - Non-200 returns empty string silently. No sensitive info (stack traces, SQL errors) leaks to the caller. ✅ ### Overflow guard `relevanceSortedPageFromSql`: `long → int` guard added (`if (rawOffset > Integer.MAX_VALUE) return DocumentSearchResult.of(List.of())`). CWE-190 fixed. ✅ ### Remaining gap (suggestion, not blocker) `pageSlice()` still does `(int) pageable.getOffset()` without an overflow guard. This is used by the SENDER/RECEIVER in-memory sort paths. In practice those paths require `findAllMatchingIdsByFts` to return IDs first, which bounds the realistic match count and page number. But the cast is still technically CWE-190. A guard here (`if (pageable.getOffset() > Integer.MAX_VALUE) return DocumentSearchResult.of(...)`) would be consistent with the new `relevanceSortedPageFromSql` guard. Overall posture: no new injection vectors, UUID input validation added, overflow fixed on the new hot path. ✅

marcel commented

2026-05-09 14:40:16 +02:00

🧪 Sara Holt — QA Engineer

Verdict: ⚠️ Approved with concerns

What's well covered

DocumentFtsPagedIntegrationTest — 5 Testcontainers tests against real Postgres: first page, window total, last-page remainder, no-match, stopword-only. No H2. ✅
DocumentServiceSortTest — routing verification with verify(documentRepository, never()).findAllMatchingIdsByFts(anyString()) and verify(documentRepository).findFtsPageRaw(...). This is exactly the right approach: verifying which code path is taken, not just the output. ✅
page.server.spec.ts — 4 tests for person name resolution: UUID lookup, receiver lookup, invalid-UUID short-circuit, 404 fallback. The mockGet call count assertion (toHaveBeenCalledTimes(1) for the invalid UUID case) verifies the guard fires correctly. ✅
Test names are descriptive sentences throughout. ✅

Missing coverage (blockers)

1. Overflow guard is untested.
relevanceSortedPageFromSql returns early when pageable.getOffset() > Integer.MAX_VALUE. This guard was flagged as a security fix (CWE-190). There's no test proving it works. A unit test in DocumentServiceSortTest:

@Test
void searchDocuments_relevance_returns_empty_when_offset_exceeds_maxInt() {
    Pageable hugePage = PageRequest.of(Integer.MAX_VALUE / 10 + 1, 10); // offset > MAX_INT
    DocumentSearchResult result = documentService.searchDocuments(
        "Brief", null, null, null, null, null, null, null,
        DocumentSort.RELEVANCE, null, null, hugePage);
    assertThat(result.items()).isEmpty();
}

Note: findFtsPageRaw should NOT be called in this case — verify with never().

2. toFtsPage UUID-as-String driver variance is untested.
The code path r[COL_ID] instanceof UUID u ? u : UUID.fromString(r[COL_ID].toString()) handles JDBC drivers that return IDs as Strings. This is tested indirectly (Postgres JDBC returns UUID objects, so the instanceof branch fires in integration tests), but the String fallback branch has zero coverage. Consider a direct unit test for toFtsPage with a String ID in the array.

Suggestion (nice to have)

The comment in PersonTypeahead.svelte.spec.ts honestly documents the rerender limitation — the $effect(resetKey) path during SvelteKit navigation (prop update on a live instance) can't be isolated at the component test level. The smoke test that exercises the rerender path is the right pragmatic choice. ✅

## 🧪 Sara Holt — QA Engineer **Verdict: ⚠️ Approved with concerns** ### What's well covered - `DocumentFtsPagedIntegrationTest` — 5 Testcontainers tests against real Postgres: first page, window total, last-page remainder, no-match, stopword-only. No H2. ✅ - `DocumentServiceSortTest` — routing verification with `verify(documentRepository, never()).findAllMatchingIdsByFts(anyString())` and `verify(documentRepository).findFtsPageRaw(...)`. This is exactly the right approach: verifying which code path is taken, not just the output. ✅ - `page.server.spec.ts` — 4 tests for person name resolution: UUID lookup, receiver lookup, invalid-UUID short-circuit, 404 fallback. The `mockGet` call count assertion (`toHaveBeenCalledTimes(1)` for the invalid UUID case) verifies the guard fires correctly. ✅ - Test names are descriptive sentences throughout. ✅ ### Missing coverage (blockers) **1. Overflow guard is untested.** `relevanceSortedPageFromSql` returns early when `pageable.getOffset() > Integer.MAX_VALUE`. This guard was flagged as a security fix (CWE-190). There's no test proving it works. A unit test in `DocumentServiceSortTest`: ```java @Test void searchDocuments_relevance_returns_empty_when_offset_exceeds_maxInt() { Pageable hugePage = PageRequest.of(Integer.MAX_VALUE / 10 + 1, 10); // offset > MAX_INT DocumentSearchResult result = documentService.searchDocuments( "Brief", null, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, hugePage); assertThat(result.items()).isEmpty(); } ``` Note: `findFtsPageRaw` should NOT be called in this case — verify with `never()`. **2. `toFtsPage` UUID-as-String driver variance is untested.** The code path `r[COL_ID] instanceof UUID u ? u : UUID.fromString(r[COL_ID].toString())` handles JDBC drivers that return IDs as Strings. This is tested indirectly (Postgres JDBC returns UUID objects, so the `instanceof` branch fires in integration tests), but the `String` fallback branch has zero coverage. Consider a direct unit test for `toFtsPage` with a String ID in the array. ### Suggestion (nice to have) The comment in `PersonTypeahead.svelte.spec.ts` honestly documents the rerender limitation — the `$effect(resetKey)` path during SvelteKit navigation (prop update on a live instance) can't be isolated at the component test level. The smoke test that exercises the rerender path is the right pragmatic choice. ✅

marcel commented

2026-05-09 14:40:26 +02:00

🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist

Verdict: ✅ Approved

The frontend changes in this PR are behavioral correctness fixes, not visual changes. Checking from the user's perspective:

PersonTypeahead `resetKey` prop

Without this fix, a user who typed a name into the sender/receiver filter without selecting a person (no API call, no senderId set) would see their typed text persist in the filter bar after navigating to a new search. This is a Nielsen Heuristic 3 violation (User control and freedom — users expect their filter context to reflect the actual search, not stale input). The resetKey mechanism ensures the input field accurately reflects state. ✅

The $effect(() => { void resetKey; searchTerm = initialName; }) pattern is correct Svelte 5. The comment explaining why $derived cannot be used here ($derived is read-only, we need local write + external sync) is accurate and should prevent future "optimization" that would break this. ✅

DateInput external value re-derive

The guard if (germanToIso(untrack(() => display)) !== externalIso) prevents the display from overwriting while the user is mid-typing a partial date — a critical UX protection. Without untrack(), the guard itself would create an additional reactive dependency on display, potentially causing a loop. This is done correctly. ✅

Accessibility check

No new interactive elements introduced. No color or typography changes. No removal of focus rings. No ARIA label changes. Touch targets unchanged. ✅

The sender/receiver typeahead inputs already use <label> elements via the label prop pattern in PersonTypeahead.svelte. The new resetKey prop is invisible to assistive technology — it only affects internal state. ✅

## 🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist **Verdict: ✅ Approved** The frontend changes in this PR are behavioral correctness fixes, not visual changes. Checking from the user's perspective: ### PersonTypeahead `resetKey` prop Without this fix, a user who typed a name into the sender/receiver filter without selecting a person (no API call, no `senderId` set) would see their typed text persist in the filter bar after navigating to a new search. This is a Nielsen Heuristic 3 violation (User control and freedom — users expect their filter context to reflect the actual search, not stale input). The `resetKey` mechanism ensures the input field accurately reflects state. ✅ The `$effect(() => { void resetKey; searchTerm = initialName; })` pattern is correct Svelte 5. The comment explaining why `$derived` cannot be used here (`$derived` is read-only, we need local write + external sync) is accurate and should prevent future "optimization" that would break this. ✅ ### DateInput external value re-derive The guard `if (germanToIso(untrack(() => display)) !== externalIso)` prevents the display from overwriting while the user is mid-typing a partial date — a critical UX protection. Without `untrack()`, the guard itself would create an additional reactive dependency on `display`, potentially causing a loop. This is done correctly. ✅ ### Accessibility check No new interactive elements introduced. No color or typography changes. No removal of focus rings. No ARIA label changes. Touch targets unchanged. ✅ The sender/receiver typeahead inputs already use `<label>` elements via the `label` prop pattern in `PersonTypeahead.svelte`. The new `resetKey` prop is invisible to assistive technology — it only affects internal state. ✅

marcel commented

2026-05-09 14:40:33 +02:00

⚙️ Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

This PR doesn't touch CI workflows, Docker Compose, or infrastructure config. Checking what's relevant from a platform perspective:

Testcontainers strategy

DocumentFtsPagedIntegrationTest uses @DirtiesContext(classMode = AFTER_CLASS) — this keeps the Spring context alive for all 5 tests in the class and rebuilds it once at teardown, rather than resetting after each test. This is the correct approach: a rebuild-per-test would restart the Postgres container 5 times, adding ~30 seconds to CI. With AFTER_CLASS, the container starts once and seeds/cleans in @BeforeEach. ✅

The container image is postgres:16-alpine — pinned and consistent with the project's production database version. ✅

CI overhead estimate

+5 integration tests sharing one Testcontainers instance. The @BeforeEach seeds 70 documents and clears between tests via documentRepository.deleteAll(). Estimated added CI time: ~5-10 seconds. Acceptable.

Nothing else to flag

No new Docker services, no new environment variables, no new action versions, no new dependencies. The scripts/clean-e2e-data.sh file shows as modified in the working tree but isn't part of this PR's diff. LGTM.

## ⚙️ Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** This PR doesn't touch CI workflows, Docker Compose, or infrastructure config. Checking what's relevant from a platform perspective: ### Testcontainers strategy `DocumentFtsPagedIntegrationTest` uses `@DirtiesContext(classMode = AFTER_CLASS)` — this keeps the Spring context alive for all 5 tests in the class and rebuilds it once at teardown, rather than resetting after each test. This is the correct approach: a rebuild-per-test would restart the Postgres container 5 times, adding ~30 seconds to CI. With `AFTER_CLASS`, the container starts once and seeds/cleans in `@BeforeEach`. ✅ The container image is `postgres:16-alpine` — pinned and consistent with the project's production database version. ✅ ### CI overhead estimate +5 integration tests sharing one Testcontainers instance. The `@BeforeEach` seeds 70 documents and clears between tests via `documentRepository.deleteAll()`. Estimated added CI time: ~5-10 seconds. Acceptable. ### Nothing else to flag No new Docker services, no new environment variables, no new action versions, no new dependencies. The `scripts/clean-e2e-data.sh` file shows as modified in the working tree but isn't part of this PR's diff. LGTM.

marcel commented

2026-05-09 14:40:41 +02:00

📋 Elicit — Requirements Engineer

Verdict: ✅ Approved

Issue closure check

Issue #468 ("paginate FTS match-set in SQL") is closed cleanly by this PR. The pre-production audit finding F-31 (loading all matching IDs into memory, measured at 911 IDs/call for "walter") is directly addressed: rows_per_call for pure-text RELEVANCE queries drops from match-set size to ≤ page size (50).

The PR correctly limits scope to the RELEVANCE/pure-text path. Filter-active queries (date range, sender, receiver, tags, status) continue on the existing in-memory path — this is documented in ADR-008 as an intentional tradeoff, not missing functionality.

Scope check

The included UI fixes (resetKey, navKey, DateInput re-derive, resolvePersonName) are all related to the URL sync concern that issue #468 encompasses — filter state not correctly persisting or clearing on navigation. No scope creep.

One observation (non-blocking)

The PR description's test plan only lists backend tests:

DocumentFtsPagedIntegrationTest — 5 Testcontainers tests

DocumentServiceSortTest — routing verification

DocumentServiceTest — enrichment tests updated

Full backend suite: 1553 tests, 0 failures

The frontend tests (page.server.spec.ts — 4 new person name resolution tests; page.svelte.spec.ts — sender/receiver display test) are not mentioned. Consider updating the PR description's test plan checkbox list to include the frontend coverage. This is a documentation gap, not a functional concern.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** ### Issue closure check Issue #468 ("paginate FTS match-set in SQL") is closed cleanly by this PR. The pre-production audit finding F-31 (loading all matching IDs into memory, measured at 911 IDs/call for "walter") is directly addressed: `rows_per_call` for pure-text RELEVANCE queries drops from match-set size to ≤ page size (50). The PR correctly limits scope to the RELEVANCE/pure-text path. Filter-active queries (date range, sender, receiver, tags, status) continue on the existing in-memory path — this is documented in ADR-008 as an intentional tradeoff, not missing functionality. ### Scope check The included UI fixes (`resetKey`, `navKey`, `DateInput` re-derive, `resolvePersonName`) are all related to the URL sync concern that issue #468 encompasses — filter state not correctly persisting or clearing on navigation. No scope creep. ### One observation (non-blocking) The PR description's test plan only lists backend tests: > - `DocumentFtsPagedIntegrationTest` — 5 Testcontainers tests > - `DocumentServiceSortTest` — routing verification > - `DocumentServiceTest` — enrichment tests updated > - Full backend suite: 1553 tests, 0 failures The frontend tests (`page.server.spec.ts` — 4 new person name resolution tests; `page.svelte.spec.ts` — sender/receiver display test) are not mentioned. Consider updating the PR description's test plan checkbox list to include the frontend coverage. This is a documentation gap, not a functional concern.

marcel referenced this issue from a commit

2026-05-09 14:43:05 +02:00

test(fts): add overflow guard and UUID-as-String regression tests

marcel commented

2026-05-09 14:43:14 +02:00

Sara's blockers addressed (commit `ed32de9f`)

Blocker 1 — Overflow guard untested
Added searchDocuments_relevance_returns_empty_when_offset_exceeds_maxInt to DocumentServiceSortTest. Uses PageRequest.of(Integer.MAX_VALUE / 10 + 1, 10) which produces an offset > Integer.MAX_VALUE. Verifies: result is empty AND findFtsPageRaw is never called.

Blocker 2 — toFtsPage UUID-as-String driver variance
Added searchDocuments_relevance_handles_string_uuid_from_jdbc_driver. Mocks findFtsPageRaw to return a row with the ID column as String (not UUID) — simulating JDBC drivers that return UUID columns as Strings. Verifies the result contains the document with the correct UUID.

All 7 tests in DocumentServiceSortTest green. ✅

## Sara's blockers addressed (commit ed32de9f) **Blocker 1 — Overflow guard untested** Added `searchDocuments_relevance_returns_empty_when_offset_exceeds_maxInt` to `DocumentServiceSortTest`. Uses `PageRequest.of(Integer.MAX_VALUE / 10 + 1, 10)` which produces an offset > Integer.MAX_VALUE. Verifies: result is empty AND `findFtsPageRaw` is never called. **Blocker 2 — `toFtsPage` UUID-as-String driver variance** Added `searchDocuments_relevance_handles_string_uuid_from_jdbc_driver`. Mocks `findFtsPageRaw` to return a row with the ID column as `String` (not `UUID`) — simulating JDBC drivers that return UUID columns as Strings. Verifies the result contains the document with the correct UUID. All 7 tests in `DocumentServiceSortTest` green. ✅

marcel commented

2026-05-09 16:24:40 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ⚠️ Approved with concerns

Blockers

"mergeable": false — needs rebase before merge.
The PR cannot be merged as-is due to conflicts with the current main. Some of the frontend changes in this PR (PersonTypeahead resetKey, DateInput external value sync, SearchFilterBar navKey, +page.server.ts person name resolution) overlap with commits already landed on main (0430383e, 8a8205ad, a239c16c). A git rebase main on fix/issue-468-fts-sql-pagination is required to resolve conflicts and produce a clean diff showing only what this PR uniquely adds.

Suggestions

@BeforeEach seed() is slow (DocumentFtsPagedIntegrationTest): 70 × saveAndFlush() in a loop is 70 individual DB round-trips per test method. Use saveAll(list) + one em.flush() instead. The tsvector trigger fires at flush time so the batch insert is safe here.

console.error in resolvePersonName (+page.server.ts): This only fires for genuine exceptions (network errors, not 404s — those return early from the !result.response.ok branch), which is correct. But the log message includes the full exception object e which may contain stack trace noise in normal server output. Consider console.error('[resolvePersonName] failed for', id) without the exception for cleaner prod logs, or wrap it as a structured log entry.

void resetKey in PersonTypeahead (PersonTypeahead.svelte): The pattern is correct — reading a reactive variable inside $effect creates the dependency without using the value. The existing comment above the effect explains it well. No change needed.

Positives

FtsHit / FtsPage records are appropriately narrow — they express the type contract of Object[] without leaking repository internals into the service layer.
isPureTextRelevance as a named predicate is exactly right — it makes the routing condition testable in isolation and readable at the call site.
toFtsPage correctly handles JDBC driver UUID-as-String variance with the instanceof UUID u ? pattern match. Good defensive code.
ADR-008 is thorough. The rows_per_call metric framing makes the problem concrete and the decision verifiable post-deploy.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ⚠️ Approved with concerns** ### Blockers **`"mergeable": false` — needs rebase before merge.** The PR cannot be merged as-is due to conflicts with the current `main`. Some of the frontend changes in this PR (PersonTypeahead `resetKey`, DateInput external value sync, SearchFilterBar `navKey`, `+page.server.ts` person name resolution) overlap with commits already landed on `main` (`0430383e`, `8a8205ad`, `a239c16c`). A `git rebase main` on `fix/issue-468-fts-sql-pagination` is required to resolve conflicts and produce a clean diff showing only what this PR uniquely adds. ### Suggestions **`@BeforeEach seed()` is slow** (`DocumentFtsPagedIntegrationTest`): 70 × `saveAndFlush()` in a loop is 70 individual DB round-trips per test method. Use `saveAll(list)` + one `em.flush()` instead. The tsvector trigger fires at flush time so the batch insert is safe here. **`console.error` in `resolvePersonName`** (`+page.server.ts`): This only fires for genuine exceptions (network errors, not 404s — those return early from the `!result.response.ok` branch), which is correct. But the log message includes the full exception object `e` which may contain stack trace noise in normal server output. Consider `console.error('[resolvePersonName] failed for', id)` without the exception for cleaner prod logs, or wrap it as a structured log entry. **`void resetKey` in PersonTypeahead** (`PersonTypeahead.svelte`): The pattern is correct — reading a reactive variable inside `$effect` creates the dependency without using the value. The existing comment above the effect explains it well. No change needed. ### Positives - `FtsHit` / `FtsPage` records are appropriately narrow — they express the type contract of `Object[]` without leaking repository internals into the service layer. - `isPureTextRelevance` as a named predicate is exactly right — it makes the routing condition testable in isolation and readable at the call site. - `toFtsPage` correctly handles JDBC driver UUID-as-String variance with the `instanceof UUID u ?` pattern match. Good defensive code. - ADR-008 is thorough. The `rows_per_call` metric framing makes the problem concrete and the decision verifiable post-deploy.

marcel commented

2026-05-09 16:24:55 +02:00

🏗️ Markus Keller — Application Architect

Verdict: ✅ Approved

Observations

The two-phase approach (SQL page → findAllById) is the correct architecture here. Phase 1 pushes OFFSET/LIMIT and COUNT(*) OVER () into Postgres — this is where ranking and counting belong. Phase 2 loads only the page's documents via a bounded IN clause (≤ page size, typically ≤ 50 IDs). The result: the worst-case IN clause is always O(page size), not O(match set size). This directly addresses finding F-31 from the pre-production audit.

isPureTextRelevance is a textbook guard-clause pattern. A single named predicate routing to the fast path is far better than embedding the condition inline inside searchDocuments. It is independently testable and readable at the call site.

ADR-008 documents the split path correctly. The explicit statement that findAllMatchingIdsByFts is retained for (a) bulk-edit, (b) density chart, and (c) SENDER/RECEIVER in-memory sort paths is important — it prevents a future refactor from accidentally "cleaning up" the two-path design without understanding the consequences.

FtsHit / FtsPage are package-private records in the document package. This is the right scope — they are the repository-to-service contract, not a public API. They should not be promoted to public unless a second consumer appears.

Concern

relevanceSortedPageFromSql — document deleted between FTS query and findAllById: If a document is deleted in the narrow window between findFtsPageRaw and findAllById, the page will have fewer items than expected and the total from the window function will be one higher than the actual count. This is an inherent TOCTOU property of the two-phase approach, documented in Alternatives Considered. Acceptable for a family archive with low write frequency. Worth noting in the ADR's Consequences section if it isn't already.

Minor

The comment // In-memory sort on page slice (≤ page size rows) — acceptable on the RECEIVER path (DocumentService.java) says "page slice rows" but the RECEIVER path loads the entire filtered set (findAll(spec)) and THEN slices. The ≤ page size rows qualifier is wrong — it's ≤ match-set size rows. The in-memory sort is acceptable because the filtered set (with an active receiver filter) is typically small, not because slicing happens first. Consider: // In-memory sort on filtered set — acceptable while filtered match count stays under ~1 K rows.

## 🏗️ Markus Keller — Application Architect **Verdict: ✅ Approved** ### Observations **The two-phase approach (SQL page → `findAllById`) is the correct architecture here.** Phase 1 pushes OFFSET/LIMIT and `COUNT(*) OVER ()` into Postgres — this is where ranking and counting belong. Phase 2 loads only the page's documents via a bounded `IN` clause (≤ page size, typically ≤ 50 IDs). The result: the worst-case `IN` clause is always O(page size), not O(match set size). This directly addresses finding F-31 from the pre-production audit. **`isPureTextRelevance` is a textbook guard-clause pattern.** A single named predicate routing to the fast path is far better than embedding the condition inline inside `searchDocuments`. It is independently testable and readable at the call site. **ADR-008 documents the split path correctly.** The explicit statement that `findAllMatchingIdsByFts` is retained for (a) bulk-edit, (b) density chart, and (c) SENDER/RECEIVER in-memory sort paths is important — it prevents a future refactor from accidentally "cleaning up" the two-path design without understanding the consequences. **`FtsHit` / `FtsPage` are package-private records in the `document` package.** This is the right scope — they are the repository-to-service contract, not a public API. They should not be promoted to `public` unless a second consumer appears. ### Concern **`relevanceSortedPageFromSql` — document deleted between FTS query and `findAllById`**: If a document is deleted in the narrow window between `findFtsPageRaw` and `findAllById`, the page will have fewer items than expected and the `total` from the window function will be one higher than the actual count. This is an inherent TOCTOU property of the two-phase approach, documented in Alternatives Considered. Acceptable for a family archive with low write frequency. Worth noting in the ADR's Consequences section if it isn't already. ### Minor The comment `// In-memory sort on page slice (≤ page size rows) — acceptable` on the RECEIVER path (`DocumentService.java`) says "page slice rows" but the RECEIVER path loads the entire filtered set (`findAll(spec)`) and THEN slices. The `≤ page size rows` qualifier is wrong — it's ≤ match-set size rows. The in-memory sort is acceptable because the *filtered* set (with an active receiver filter) is typically small, not because slicing happens first. Consider: `// In-memory sort on filtered set — acceptable while filtered match count stays under ~1 K rows`.

marcel commented

2026-05-09 16:25:06 +02:00

🔒 Nora "NullX" Steiner — Security Engineer

Verdict: ✅ Approved

What I Checked

findFtsPageRaw SQL — injection surface: All three user-controlled values (query, offset, limit) are bound via named parameters (:query, :offset, :limit). Spring Data JPA's @Query(nativeQuery = true) with @Param binding is parameterized — the driver prepares the statement and substitutes the values safely. No injection risk.

regexp_replace inside the SQL CTE: The pattern operates on websearch_to_tsquery('german', :query)::text, which is the output of Postgres's own FTS parser, not raw user input. The regex transforms 'term' to 'term':* for prefix matching. This is server-side string manipulation on already-sanitized, already-parsed tsquery text. Not a user-controlled pattern. No injection risk.

UUID_RE guard in resolvePersonName (+page.server.ts): The regex /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i validates that the URL parameter is a proper UUID before making an API call. This prevents SSRF-adjacent abuse where a crafted senderId value could be used to probe internal paths. Good practice.

console.error log in resolvePersonName: Logs id (a UUID from the URL, user-controlled but not sensitive) and the exception object. The UUID is already in the URL bar so logging it is not a data exposure concern. The exception object may contain stack trace information but not application secrets.

No new input surfaces, no XSS vectors, no auth bypass: The frontend changes (PersonTypeahead, DateInput, SearchFilterBar) do not introduce new user-controlled data flows that could reach unescaped output paths. The resolvePersonName person lookup uses the same authenticated API client as the rest of the page load.

One Note

The security comment on runner-config.yaml (from PR #476) correctly documents the Docker socket trust model. Unrelated to this PR, but worth confirming that the fix/issue-468-fts-sql-pagination branch does not inadvertently include the runner-config.yaml changes (they should be in main already). Verify the rebase doesn't duplicate that file.

## 🔒 Nora "NullX" Steiner — Security Engineer **Verdict: ✅ Approved** ### What I Checked **`findFtsPageRaw` SQL — injection surface**: All three user-controlled values (`query`, `offset`, `limit`) are bound via named parameters (`:query`, `:offset`, `:limit`). Spring Data JPA's `@Query(nativeQuery = true)` with `@Param` binding is parameterized — the driver prepares the statement and substitutes the values safely. No injection risk. **`regexp_replace` inside the SQL CTE**: The pattern operates on `websearch_to_tsquery('german', :query)::text`, which is the output of Postgres's own FTS parser, not raw user input. The regex transforms `'term'` to `'term':*` for prefix matching. This is server-side string manipulation on already-sanitized, already-parsed tsquery text. Not a user-controlled pattern. No injection risk. **`UUID_RE` guard in `resolvePersonName`** (`+page.server.ts`): The regex `/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i` validates that the URL parameter is a proper UUID before making an API call. This prevents SSRF-adjacent abuse where a crafted `senderId` value could be used to probe internal paths. Good practice. **`console.error` log in `resolvePersonName`**: Logs `id` (a UUID from the URL, user-controlled but not sensitive) and the exception object. The UUID is already in the URL bar so logging it is not a data exposure concern. The exception object may contain stack trace information but not application secrets. **No new input surfaces, no XSS vectors, no auth bypass**: The frontend changes (PersonTypeahead, DateInput, SearchFilterBar) do not introduce new user-controlled data flows that could reach unescaped output paths. The `resolvePersonName` person lookup uses the same authenticated API client as the rest of the page load. ### One Note The security comment on `runner-config.yaml` (from PR #476) correctly documents the Docker socket trust model. Unrelated to this PR, but worth confirming that the `fix/issue-468-fts-sql-pagination` branch does not inadvertently include the `runner-config.yaml` changes (they should be in `main` already). Verify the rebase doesn't duplicate that file.

marcel commented

2026-05-09 16:25:20 +02:00

🧪 Sara Holt — QA Engineer

Verdict: ⚠️ Approved with concerns

Strengths

DocumentFtsPagedIntegrationTest covers the five key repository-level scenarios — page size, window total, last-page remainder, no-match, and stopword-only. Using real Postgres via Testcontainers means the GIN index and tsvector trigger semantics are production-identical. These tests give high confidence in the SQL query correctness.

DocumentServiceSortTest new tests are thorough:

searchDocuments_relevance_pureText_calls_findFtsPageRaw_not_findAllMatchingIds — routing guard verified
searchDocuments_relevance_returns_empty_when_offset_exceeds_maxInt — overflow guard verified (and the never() verify correctly confirms the guard fires before the DB call)
searchDocuments_relevance_handles_string_uuid_from_jdbc_driver — JDBC driver variance tested
searchDocuments_relevance_with_active_filter_uses_inMemory_path — fallback path verified

page.server.spec.ts person name resolution tests are well-structured with a clean makeSearchMock factory.

Concerns

@BeforeEach seed() performance (DocumentFtsPagedIntegrationTest): 70 × saveAndFlush() is 70 individual database round-trips before every test method. With 5 test methods, that's 350 inserts across the test run. Consider documentRepository.saveAll(docs) + em.flush() + em.clear() with all documents pre-built, which reduces this to 1 batch insert per test method. This is especially important since @DirtiesContext(AFTER_CLASS) means the container is shared — slow setup accumulates.

Missing coverage: ftsPage.total() propagation: The integration tests verify that the SQL window function returns the correct total count. The service tests verify routing. But there's no test that checks the DocumentSearchResult.totalItems (or equivalent) field after a relevanceSortedPageFromSql call. A searchDocuments_relevance_pureText_totalItems_reflectsSqlWindowCount test in DocumentServiceSortTest would close this gap — it's the field the frontend uses to render pagination.

PersonTypeahead resetKey smoke-test limitation: The comment in the spec is correct and appropriate — the rerender() path tests re-mount, not in-place prop update. No action required, the note protects future contributors from confusion.

Verdict Rationale

The core FTS tests are solid. The two gaps (seed performance + total propagation) are not blocking but the total propagation gap is worth closing before this lands, since it tests the primary observable behaviour of the fix from a user perspective.

## 🧪 Sara Holt — QA Engineer **Verdict: ⚠️ Approved with concerns** ### Strengths **`DocumentFtsPagedIntegrationTest`** covers the five key repository-level scenarios — page size, window total, last-page remainder, no-match, and stopword-only. Using real Postgres via Testcontainers means the GIN index and tsvector trigger semantics are production-identical. These tests give high confidence in the SQL query correctness. **`DocumentServiceSortTest` new tests** are thorough: - `searchDocuments_relevance_pureText_calls_findFtsPageRaw_not_findAllMatchingIds` — routing guard verified - `searchDocuments_relevance_returns_empty_when_offset_exceeds_maxInt` — overflow guard verified (and the `never()` verify correctly confirms the guard fires before the DB call) - `searchDocuments_relevance_handles_string_uuid_from_jdbc_driver` — JDBC driver variance tested - `searchDocuments_relevance_with_active_filter_uses_inMemory_path` — fallback path verified **`page.server.spec.ts` person name resolution tests** are well-structured with a clean `makeSearchMock` factory. ### Concerns **`@BeforeEach seed()` performance** (`DocumentFtsPagedIntegrationTest`): 70 × `saveAndFlush()` is 70 individual database round-trips before every test method. With 5 test methods, that's 350 inserts across the test run. Consider `documentRepository.saveAll(docs)` + `em.flush()` + `em.clear()` with all documents pre-built, which reduces this to 1 batch insert per test method. This is especially important since `@DirtiesContext(AFTER_CLASS)` means the container is shared — slow setup accumulates. **Missing coverage: `ftsPage.total()` propagation**: The integration tests verify that the SQL window function returns the correct total count. The service tests verify routing. But there's no test that checks the `DocumentSearchResult.totalItems` (or equivalent) field after a `relevanceSortedPageFromSql` call. A `searchDocuments_relevance_pureText_totalItems_reflectsSqlWindowCount` test in `DocumentServiceSortTest` would close this gap — it's the field the frontend uses to render pagination. **PersonTypeahead `resetKey` smoke-test limitation**: The comment in the spec is correct and appropriate — the `rerender()` path tests re-mount, not in-place prop update. No action required, the note protects future contributors from confusion. ### Verdict Rationale The core FTS tests are solid. The two gaps (seed performance + total propagation) are not blocking but the total propagation gap is worth closing before this lands, since it tests the primary observable behaviour of the fix from a user perspective.

marcel commented

2026-05-09 16:25:33 +02:00

🚀 Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

What I Checked

CI workflow: No changes to .gitea/workflows/ci.yml. The existing unit-tests, backend-unit-tests, and related jobs will exercise DocumentFtsPagedIntegrationTest via ./mvnw clean test. The Testcontainers runner infrastructure (Docker socket env vars, TESTCONTAINERS_RYUK_DISABLED=true) is already wired from PR #476. No new CI configuration required.

Testcontainers test class: DocumentFtsPagedIntegrationTest uses @Import({PostgresContainerConfig.class, FlywayConfig.class}) — consistent with the existing pattern in DocumentFtsTest, DocumentServiceIntegrationTest, and others. No new test infrastructure to provision.

@DirtiesContext(classMode = AFTER_CLASS): Correct choice. Rebuilds the Spring context once after all 5 tests in the class complete rather than after each test. Reduces context rebuild overhead significantly for a Testcontainers test class. Consistent with the pattern used in other integration tests in this repo.

No new Docker images, no new services, no new volumes: This PR is purely application-layer. Nothing changes in docker-compose.yml or the platform stack.

Branch isolation: The fix/issue-468-fts-sql-pagination branch diverged before some recent main commits landed. The "mergeable": false status means CI won't run on a merge commit until the rebase is done. Once rebased, the full CI suite should pass.

Operational Note

resolvePersonName in +page.server.ts now makes parallel /api/persons/{id} calls on every document page load that has a sender/receiver filter active. At the current archive scale (single family, ≤ 10 concurrent users), this is a non-event. Worth noting: if the person API is temporarily slow, these calls could delay the SSR response for the documents page. The catch handler returns '' gracefully, so a person API timeout won't cause a 500. Acceptable.

## 🚀 Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** ### What I Checked **CI workflow**: No changes to `.gitea/workflows/ci.yml`. The existing `unit-tests`, `backend-unit-tests`, and related jobs will exercise `DocumentFtsPagedIntegrationTest` via `./mvnw clean test`. The Testcontainers runner infrastructure (Docker socket env vars, `TESTCONTAINERS_RYUK_DISABLED=true`) is already wired from PR #476. No new CI configuration required. **Testcontainers test class**: `DocumentFtsPagedIntegrationTest` uses `@Import({PostgresContainerConfig.class, FlywayConfig.class})` — consistent with the existing pattern in `DocumentFtsTest`, `DocumentServiceIntegrationTest`, and others. No new test infrastructure to provision. **`@DirtiesContext(classMode = AFTER_CLASS)`**: Correct choice. Rebuilds the Spring context once after all 5 tests in the class complete rather than after each test. Reduces context rebuild overhead significantly for a Testcontainers test class. Consistent with the pattern used in other integration tests in this repo. **No new Docker images, no new services, no new volumes**: This PR is purely application-layer. Nothing changes in `docker-compose.yml` or the platform stack. **Branch isolation**: The `fix/issue-468-fts-sql-pagination` branch diverged before some recent main commits landed. The `"mergeable": false` status means CI won't run on a merge commit until the rebase is done. Once rebased, the full CI suite should pass. ### Operational Note `resolvePersonName` in `+page.server.ts` now makes parallel `/api/persons/{id}` calls on every document page load that has a sender/receiver filter active. At the current archive scale (single family, ≤ 10 concurrent users), this is a non-event. Worth noting: if the person API is temporarily slow, these calls could delay the SSR response for the documents page. The `catch` handler returns `''` gracefully, so a person API timeout won't cause a 500. Acceptable.

marcel commented

2026-05-09 16:25:41 +02:00

🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist

Verdict: ✅ Approved

What I Checked

Person name pre-fill on filtered navigation (+page.server.ts + +page.svelte): This is a genuine UX improvement. Previously, if a user navigated to /documents?senderId=<uuid> (e.g. from a bookmark or shared link), the sender typeahead would show the UUID in its input (or empty) rather than the person's name. Now resolvePersonName fetches the display name server-side and passes it as initialSenderName/initialReceiverName. The PersonTypeahead renders with the human-readable name from the first paint — no flash of raw UUIDs. Good for the 60+ audience who need clear visual feedback on what filter is active.

resetKey / navKey for PersonTypeahead (SearchFilterBar.svelte): The pattern correctly solves the case where a user types a partial name (without selecting a person), navigates away, and comes back — the typeahead now resets to reflect the current URL state rather than the stale typed text. Correct behavior for the search filter context.

DateInput external value sync (DateInput.svelte): The $effect re-derives the display string when the value prop changes externally. The untrack(() => display) guard prevents the effect from re-running when the user is mid-typing a partial date. This is the correct implementation pattern — it won't overwrite "01.01" with "" while the user is typing "01.01.1920".

No Accessibility Regressions

The changes don't modify any ARIA attributes, focus management, or keyboard interaction patterns. The PersonTypeahead input remains a role="combobox" with proper labelling. DateInput remains a type="text" with the existing validation feedback.

Minor

The void resetKey reactive read pattern in PersonTypeahead.svelte is invisible to users but its absence would cause subtle UX bugs. The comment above the effect (// Sync display text when initialName changes OR when resetKey increments) is sufficient documentation for developers maintaining this component.

## 🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist **Verdict: ✅ Approved** ### What I Checked **Person name pre-fill on filtered navigation** (`+page.server.ts` + `+page.svelte`): This is a genuine UX improvement. Previously, if a user navigated to `/documents?senderId=<uuid>` (e.g. from a bookmark or shared link), the sender typeahead would show the UUID in its input (or empty) rather than the person's name. Now `resolvePersonName` fetches the display name server-side and passes it as `initialSenderName`/`initialReceiverName`. The PersonTypeahead renders with the human-readable name from the first paint — no flash of raw UUIDs. Good for the 60+ audience who need clear visual feedback on what filter is active. **`resetKey` / `navKey` for PersonTypeahead** (`SearchFilterBar.svelte`): The pattern correctly solves the case where a user types a partial name (without selecting a person), navigates away, and comes back — the typeahead now resets to reflect the current URL state rather than the stale typed text. Correct behavior for the search filter context. **`DateInput` external value sync** (`DateInput.svelte`): The `$effect` re-derives the display string when the `value` prop changes externally. The `untrack(() => display)` guard prevents the effect from re-running when the user is mid-typing a partial date. This is the correct implementation pattern — it won't overwrite `"01.01"` with `""` while the user is typing `"01.01.1920"`. ### No Accessibility Regressions The changes don't modify any ARIA attributes, focus management, or keyboard interaction patterns. The `PersonTypeahead` input remains a `role="combobox"` with proper labelling. `DateInput` remains a `type="text"` with the existing validation feedback. ### Minor The `void resetKey` reactive read pattern in `PersonTypeahead.svelte` is invisible to users but its absence would cause subtle UX bugs. The comment above the effect (`// Sync display text when initialName changes OR when resetKey increments`) is sufficient documentation for developers maintaining this component.

marcel commented

2026-05-09 16:25:53 +02:00

📋 Elicit — Requirements Engineer

Verdict: ⚠️ Approved with concerns

Scope Observation

This PR bundles two distinct concerns:

FTS SQL pagination (the stated issue #468 fix): findFtsPageRaw, FtsHit, FtsPage, isPureTextRelevance, ADR-008, and all backend test changes. This is a clean, well-scoped backend performance fix.
Filter display state / person name resolution (not mentioned in issue #468): resolvePersonName in +page.server.ts, initialSenderName/initialReceiverName threading through +page.svelte, navKey / resetKey through SearchFilterBar → PersonTypeahead, and DateInput external value sync.

Looking at the git history, commits 0430383e (DateInput external value), 8a8205ad (PersonTypeahead resetKey), and a239c16c (filter display state sync) already landed on main via what appears to be issue #482's fix branch. This means the frontend filter-state changes in PR #488 are the source of the merge conflict — these files were independently modified on both paths.

The PR needs to be rebased. After rebasing, the diff should contain only the FTS-specific changes (backend + tests + ADR-008) plus whatever frontend wiring was uniquely added in this branch that didn't land through #482.

Acceptance Criteria Coverage

Issue #468's ACs (per the issue body not shown here, inferred from PR description):

✅ rows_per_call drops from match-set size to page size for pure-text RELEVANCE queries
✅ Non-text filter paths preserved (filterActive → findAllMatchingIdsByFts)
✅ Testcontainers tests verify SQL correctness against real Postgres
⬜ Total element count propagated to pagination UI (no service-level test verifying DocumentSearchResult.totalItems — Sara's gap)

Recommendation

Rebase onto main, verify the diff is clean (only FTS changes remain), then this is ready to merge. The architecture is sound and the ADR is thorough.

## 📋 Elicit — Requirements Engineer **Verdict: ⚠️ Approved with concerns** ### Scope Observation **This PR bundles two distinct concerns:** 1. **FTS SQL pagination** (the stated issue #468 fix): `findFtsPageRaw`, `FtsHit`, `FtsPage`, `isPureTextRelevance`, ADR-008, and all backend test changes. This is a clean, well-scoped backend performance fix. 2. **Filter display state / person name resolution** (not mentioned in issue #468): `resolvePersonName` in `+page.server.ts`, `initialSenderName`/`initialReceiverName` threading through `+page.svelte`, `navKey` / `resetKey` through `SearchFilterBar` → `PersonTypeahead`, and `DateInput` external value sync. Looking at the git history, commits `0430383e` (DateInput external value), `8a8205ad` (PersonTypeahead resetKey), and `a239c16c` (filter display state sync) already landed on `main` via what appears to be issue #482's fix branch. This means the frontend filter-state changes in PR #488 are the source of the merge conflict — these files were independently modified on both paths. **The PR needs to be rebased.** After rebasing, the diff should contain only the FTS-specific changes (backend + tests + ADR-008) plus whatever frontend wiring was uniquely added in this branch that didn't land through #482. ### Acceptance Criteria Coverage Issue #468's ACs (per the issue body not shown here, inferred from PR description): - ✅ `rows_per_call` drops from match-set size to page size for pure-text RELEVANCE queries - ✅ Non-text filter paths preserved (filterActive → `findAllMatchingIdsByFts`) - ✅ Testcontainers tests verify SQL correctness against real Postgres - ⬜ Total element count propagated to pagination UI (no service-level test verifying `DocumentSearchResult.totalItems` — Sara's gap) ### Recommendation Rebase onto main, verify the diff is clean (only FTS changes remain), then this is ready to merge. The architecture is sound and the ADR is thorough.

marcel added 6 commits 2026-05-09 16:32:09 +02:00

docs(adr): ADR-008 SQL-level FTS pagination via window-function CTE de1c55d18e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(fts): add FtsHit/FtsPage records; rename findRankedIdsByFts -> findAllMatchingIdsByFts ea136a8724

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(fts): push FTS pagination into SQL via CTE window function a8e732ac39

Pure-text RELEVANCE queries now use findFtsPageRaw (CTE + COUNT(*) OVER())
instead of loading all matching IDs into memory and sorting in-process.
Non-text paths (filters active, DATE sort) still use the in-memory path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test(fts): add integration tests and update unit tests for SQL-paginated relevance 4a0a43b1cf

- DocumentFtsPagedIntegrationTest: Testcontainers repo-level tests for
  findFtsPageRaw (page size, window total, last page, no matches, stopword)
- DocumentServiceSortTest: rewritten to stub findFtsPageRaw + findAllById
  for the pure-text RELEVANCE path; verifies filter-active path stays in-memory
- DocumentServiceTest: update two enrichment tests to use new SQL-path stubs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(fts): address PR #488 review concerns ff2eb2ab91

- Extract isPureTextRelevance() private static method to replace the
  7-clause inline boolean in searchDocuments
- Guard long→int cast in relevanceSortedPageFromSql to prevent silent
  overflow at page ≥43M (CWE-190)
- resolvePersonName now uses the typed API client (createApiClient)
  instead of raw fetch, aligning with project conventions
- Update DocumentServiceTest stubs to match new FTS path (findFtsPageRaw
  + findAllById instead of findAllMatchingIdsByFts)
- Rewrite page.server.spec.ts person-name tests to mock via path-based
  API dispatch, matching the new api.GET call site

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test(fts): add overflow guard and UUID-as-String regression tests

CI / Unit & Component Tests (push) Failing after 4m38s

Details

CI / OCR Service Tests (push) Successful in 36s

Details

CI / Backend Unit Tests (push) Failing after 3m26s

Details

CI / Unit & Component Tests (pull_request) Failing after 4m29s

Details

CI / OCR Service Tests (pull_request) Successful in 32s

Details

CI / Backend Unit Tests (pull_request) Failing after 3m20s

Details

7e1f4f8b09

- searchDocuments_relevance_returns_empty_when_offset_exceeds_maxInt:
  proves the long→int guard fires and findFtsPageRaw is never called
- searchDocuments_relevance_handles_string_uuid_from_jdbc_driver:
  exercises the toFtsPage String fallback branch for JDBC drivers that
  return UUID columns as String instead of java.util.UUID

Addresses Sara's review concerns on PR #488.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel force-pushed fix/issue-468-fts-sql-pagination from ed32de9f10 to 7e1f4f8b09

2026-05-09 16:32:09 +02:00

Compare

marcel referenced this issue from a commit

2026-05-09 16:32:09 +02:00

refactor(fts): address PR #488 review concerns

marcel referenced this issue from a commit

2026-05-09 16:32:09 +02:00

test(fts): add overflow guard and UUID-as-String regression tests

marcel merged commit 915ad9f5c6 into main

2026-05-09 16:35:15 +02:00

marcel referenced this issue from a commit

2026-05-09 16:35:18 +02:00

refactor(fts): address PR #488 review concerns

marcel deleted branch fix/issue-468-fts-sql-pagination

2026-05-09 16:35:18 +02:00

marcel referenced this issue from a commit

2026-05-09 16:35:19 +02:00

test(fts): add overflow guard and UUID-as-String regression tests

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: marcel/familienarchiv#488

fix(fts): paginate FTS match-set in SQL instead of loading all matching IDs #488

Summary

Test plan

🏛️ Markus Keller — Senior Application Architect

What's Done Well

Concerns

Documentation Coverage Check

👨‍💻 Felix Brandt — Senior Fullstack Developer

TDD Evidence

What's Clean

Suggestions

🔒 Nora "NullX" Steiner — Application Security Engineer

What's Secure

Concern

🧪 Sara Holt — QA Engineer & Test Strategist

What's Well Tested

Concerns

🚀 Tobias Wendt — DevOps & Platform Engineer

📋 Elicit — Requirements Engineer

Issue #468 Coverage

Scope Observation — PR bundles two issues

NFR Coverage

🎨 Leonie Voss — UX Designer & Accessibility Strategist

What Was Checked

One Note

Review concerns addressed (commit abeece7e)

👨‍💻 Felix Brandt — Senior Fullstack Developer

Clean code

Suggestion

🏗️ Markus Keller — Application Architect

Architecture

One architectural observation (suggestion, not blocker)

Doc checklist

🔐 Nora "NullX" Steiner — Security Engineer

SQL injection analysis

resolvePersonName

Overflow guard

Remaining gap (suggestion, not blocker)

🧪 Sara Holt — QA Engineer

What's well covered

Missing coverage (blockers)

Suggestion (nice to have)

🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist

PersonTypeahead resetKey prop

DateInput external value re-derive

Accessibility check

⚙️ Tobias Wendt — DevOps & Platform Engineer

Testcontainers strategy

CI overhead estimate

Nothing else to flag

📋 Elicit — Requirements Engineer

Issue closure check

Scope check

One observation (non-blocking)

Sara's blockers addressed (commit ed32de9f)

👨‍💻 Felix Brandt — Senior Fullstack Developer

Blockers

Suggestions

Positives

🏗️ Markus Keller — Application Architect

Observations

Concern

Minor

🔒 Nora "NullX" Steiner — Security Engineer

What I Checked

One Note

🧪 Sara Holt — QA Engineer

Strengths

Concerns

Verdict Rationale

🚀 Tobias Wendt — DevOps & Platform Engineer

What I Checked

Operational Note

🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist

What I Checked

No Accessibility Regressions

Minor

📋 Elicit — Requirements Engineer

Scope Observation

Acceptance Criteria Coverage

Review concerns addressed (commit `abeece7e`)

`resolvePersonName`

PersonTypeahead `resetKey` prop

Sara's blockers addressed (commit `ed32de9f`)