fix(search): use to_tsquery('simple') for prefix transform to avoid German stop word collision
Some checks failed
CI / Unit & Component Tests (push) Failing after 3m51s
CI / OCR Service Tests (push) Successful in 56s
CI / Backend Unit Tests (push) Failing after 3m9s

Words like "Wille" stem to "will" via the German Snowball stemmer, which is
also a German stop word. The prefix-transform step (websearch_to_tsquery text →
regexp_replace → to_tsquery) was passing already-stemmed lexemes back through
the German dictionary, causing them to be silently dropped as stop words. Using
the 'simple' configuration skips stop-word processing entirely while the
tsvector @@ tsquery comparison still works because lexemes are matched by
string value, not by configuration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Marcel
2026-04-25 21:40:07 +02:00
parent 18cad798fc
commit d45739cb76
2 changed files with 18 additions and 2 deletions

View File

@@ -87,7 +87,7 @@ public interface DocumentRepository extends JpaRepository<Document, UUID>, JpaSp
SELECT d.id FROM documents d
CROSS JOIN LATERAL (
SELECT CASE WHEN websearch_to_tsquery('german', :query)::text <> ''
THEN to_tsquery('german', regexp_replace(
THEN to_tsquery('simple', regexp_replace(
websearch_to_tsquery('german', :query)::text,
'''([^'']+)''',
'''\\1'':*',
@@ -149,7 +149,7 @@ public interface DocumentRepository extends JpaRepository<Document, UUID>, JpaSp
FROM documents d
CROSS JOIN LATERAL (
SELECT CASE WHEN websearch_to_tsquery('german', :query)::text <> ''
THEN to_tsquery('german', regexp_replace(
THEN to_tsquery('simple', regexp_replace(
websearch_to_tsquery('german', :query)::text,
'''([^'']+)''',
'''\\1'':*',