fix(search): use to_tsquery('simple') for prefix transform to avoid German stop word collision

Words like "Wille" are stemmed to "will" by the German Snowball stemmer, and "will" is itself a German stop word. The prefix-transform step (websearch_to_tsquery text → regexp_replace → to_tsquery) passed these already-stemmed lexemes back through the German configuration, which silently dropped them as stop words.

Using the 'simple' configuration for the final to_tsquery call skips stop-word processing entirely, and the tsvector @@ tsquery comparison still works because lexemes are matched by string value, not by configuration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -87,7 +87,7 @@ public interface DocumentRepository extends JpaRepository<Document, UUID>, JpaSp
 SELECT d.id FROM documents d
 CROSS JOIN LATERAL (
   SELECT CASE WHEN websearch_to_tsquery('german', :query)::text <> ''
-    THEN to_tsquery('german', regexp_replace(
+    THEN to_tsquery('simple', regexp_replace(
       websearch_to_tsquery('german', :query)::text,
       '''([^'']+)''',
       '''\\1'':*',
@@ -149,7 +149,7 @@ public interface DocumentRepository extends JpaRepository<Document, UUID>, JpaSp
 FROM documents d
 CROSS JOIN LATERAL (
   SELECT CASE WHEN websearch_to_tsquery('german', :query)::text <> ''
-    THEN to_tsquery('german', regexp_replace(
+    THEN to_tsquery('simple', regexp_replace(
       websearch_to_tsquery('german', :query)::text,
       '''([^'']+)''',
       '''\\1'':*',
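
To make the prefix-transform step easier to follow, here is a minimal Java sketch of what the regexp_replace call appears to do to the text form of a tsquery: it wraps every single-quoted lexeme so that it carries a `:*` prefix marker. The class name `PrefixTransformSketch`, the method `toPrefixQuery`, and the sample inputs are hypothetical and not part of the repository. The sketch also assumes the replacement is applied to every match; the flags argument of the SQL call is cut off in the diff, so that is a guess.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; the real transform runs inside PostgreSQL.
final class PrefixTransformSketch {

    // Same shape as the SQL pattern '''([^'']+)''' once SQL quote-doubling
    // is undone: a single-quoted lexeme that contains no quote character.
    private static final Pattern QUOTED_LEXEME = Pattern.compile("'([^']+)'");

    // Rewrites each quoted lexeme 'x' to 'x':* so that to_tsquery treats it
    // as a prefix match. Assumes every match is replaced (global behaviour).
    static String toPrefixQuery(String tsqueryText) {
        return QUOTED_LEXEME.matcher(tsqueryText).replaceAll("'$1':*");
    }

    public static void main(String[] args) {
        // Hand-written stand-ins for websearch_to_tsquery(...)::text output.
        System.out.println(toPrefixQuery("'will'"));
        System.out.println(toPrefixQuery("'alt' & 'brief'"));
    }
}
```

The output of this step is the string that the commit now feeds to to_tsquery('simple', ...) instead of to_tsquery('german', ...), so a lexeme such as 'will' is no longer checked against the German stop-word list a second time.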