feat(search): upgrade to PostgreSQL full-text search with German stemming #237

Merged
marcel merged 8 commits from feat/issue-222-fts-search into main 2026-04-15 12:40:21 +02:00

8 Commits

Author SHA1 Message Date
Marcel
4b8e0637ce fix(ci): pin DOCKER_API_VERSION=1.43 for Testcontainers on NAS runner
Some checks failed
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / Unit & Component Tests (push) Successful in 3m41s
CI / Backend Unit Tests (push) Failing after 2m41s
Testcontainers 2.0.2 (via Spring Boot 4.0) negotiates Docker API 1.44,
but the NAS runner has Docker Engine 24.x which caps at 1.43. Forcing
the client version down unblocks tests until Docker is upgraded on the NAS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 12:28:57 +02:00
Marcel
793e632889 fix(lint): exclude project.inlang/ from Prettier
Some checks failed
CI / Unit & Component Tests (push) Successful in 3m49s
CI / Backend Unit Tests (push) Failing after 2m42s
CI / Unit & Component Tests (pull_request) Successful in 3m46s
CI / Backend Unit Tests (pull_request) Failing after 2m42s
Inlang regenerates .meta.json and README.md on every compilation run.
The regenerated files fail Prettier in CI because the tool writes its
own formatting, not ours.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 12:16:16 +02:00
Marcel
305f95a572 test(search): add sender name FTS coverage and combined filter test
Some checks failed
CI / Unit & Component Tests (push) Failing after 3s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1m57s
CI / Backend Unit Tests (pull_request) Failing after 3m0s
- should_find_document_by_sender_name — symmetric with existing receiver test
- fts_combined_with_status_filter_excludes_non_matching_status — verifies
  hasIds(rankedIds).and(hasStatus(...)) two-phase search works together

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:30 +02:00
Marcel
43595aeb8a refactor(search): replace O(n²) indexOf with HashMap for rank ordering
ids.indexOf() scans the full list for each document, giving O(n²) total.
Build a Map<UUID, Integer> once at O(n) and use getOrDefault at O(1) per
document. Behavior is identical; existing tests remain green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:30 +02:00
Marcel
947d8aeb6c fix(search): respect DATE sort when text is present — do not override with relevance
When a user explicitly selects DATE sort with a text query active, the
previous code treated it identically to RELEVANCE, silently discarding
the user's sort choice. Remove DATE from the useRankOrder condition so
that explicit DATE sort always goes through the standard JPA sort path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:30 +02:00
Marcel
7ec3e6170d feat(fts): backfill search_vector for all existing documents (V35)
Fires the BEFORE UPDATE trigger for every documents row, which recomputes
the tsvector from all currently-linked metadata, blocks, receivers, and tags.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:30 +02:00
Marcel
7d456d8e8b feat(fts): replace ILIKE hasText with FTS two-phase search and RELEVANCE sort
- DocumentSort: add RELEVANCE enum value
- DocumentSpecifications: remove hasText() ILIKE, add hasIds(List<UUID>)
  for FTS-pre-filtered ID sets
- DocumentService.searchDocuments(): FTS two-phase path — findRankedIdsByFts()
  returns ranked UUIDs, hasIds() narrows subsequent Specification query,
  in-memory re-sort preserves rank order; RELEVANCE is the default when
  text is present and no explicit non-relevance sort is requested
- DocumentSpecificationsTest: remove hasText() tests (Specification removed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:30 +02:00
Marcel
24530cf85b feat(fts): add search_vector column, GIN index, DB triggers, and FTS repository method (V34)
- V34 migration: adds search_vector tsvector column with GIN index
- BEFORE INSERT/UPDATE trigger on documents rebuilds vector from title (A),
  summary + transcription_blocks.text (B), sender/receiver names (C),
  tag names + location (D) using german FTS config
- AFTER triggers on transcription_blocks, document_receivers, document_tags
  touch the parent document row to re-fire the BEFORE UPDATE trigger
- DocumentRepository.findRankedIdsByFts() native query using websearch_to_tsquery
- DocumentFtsTest: 12 integration tests covering stemming, trigger sync,
  ranking, stop words, malformed input, receiver and tag search

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:35:16 +02:00