feat(search): NL search — resolve tag names in query parsing #743

New Issue

marcel · 2026-06-06T12:17:04+02:00

marcel commented

2026-06-06 12:17:04 +02:00

Part of epic #735. Post-v1 refinement (v1 #738 folds all unresolved terms into keywords; this issue resolves a subset of those terms against the tag taxonomy).

Self-contained (backend tag resolution + frontend theme chips). Depends on #739 merged — the interpretation-chip components (InterpretationChipRow.svelte) must exist before theme chips can be added.

Refinement note (req-eng pass, 2026-06-06): This issue was promoted from a parking-lot stub. Three design decisions were resolved with the product owner and three contradictions with the merged #738 backend were corrected (see Design decisions). The original stub proposed resolvedTags: List<Tag>, tag-ID filters, an ambiguousTags disambiguation flow, and a new themeNames LLM field — all four are dropped in favour of the design below.

Six-persona review pass (2026-06-06): all review findings and the four open questions have been folded into this body and resolved by the product owner: resolve as tags on the any path (D4); theme chips reuse the existing tag colour palette (D5); design review does not block this PR; the GET /api/documents/search param-name defect is corrected (see Frontend changes). OQ-1/OQ-2 are promoted to accepted defaults. Three additional correctness/edge tests were added (see Testing).

Second review pass (2026-06-06): one residual decision — theme-chip colour source — resolved by the product owner as per-tag colour (option a): TagHint carries a color field and the chip renders the tag's own --c-tag-{color}. Folded into Backend §1/§4, Frontend, and the acceptance criteria below.

Third review pass (2026-06-06): Six additional items folded in: ADR-028 reference updated to exact filename; resolveEffectiveColors mutation-guard note added; Thema: prefix placed outside truncate span; @DataJpaTest descendant-expansion test added; fresh-mock-per-test note added; branch coverage gate confirmed as 88%. One open UX decision remained.

Fourth review pass (2026-06-06): OQ-5 resolved as option B (prefix + tint + tinted-left-border); onRemoveChip value specified as tag.name; parent chip-removal filter-combination pattern added; TypeScript fixture audit added; isHealthy() clarified as "document over wire"; tagsApplied: false hint added to Out of Scope. No remaining open decisions.

Fifth review pass (2026-06-06): Three corrections from codebase reads folded in: (1) security note updated — inline style attribute IS the established --c-tag-* pattern (see TagTreeNode.svelte, TagInput.svelte, themen/+page.svelte); (2) pre-implementation DocumentService check resolved — hasTagPartial has its own null guard, tags list filter fires independently, no fix needed; (3) resolveEffectiveColors confirmed batched (findAllById single IN query). Two additions: paramsFromInterpretation() extension added as a named Frontend task; paramsFromInterpretation() unit tests + E2E "last chip + person survives" scenario added to Testing. Border-left treatment refined to 1px tinted-left-color (using inline style, consistent with existing tag components) rather than a separate 2px utility class.

Sixth review pass (2026-06-06): Three codebase-grounded findings folded in: (1) resolveEffectiveColors resolves only ONE level of color inheritance — documented as a known depth constraint, pinned with a 3-level test case (see Testing); (2) ChipType union is duplicated across InterpretationChipRow.svelte:7 and +page.svelte:272 — extract to search/chip-types.ts as a pre-implementation step; (3) C4 L3 backend diagram needs a search/ → tag/ dependency arrow (added to Documentation). Two implementation-coordination items: removeChip's if/else switch needs an explicit 'theme' case (added to Frontend changes); theme chips require a distinct {:else if chip.type === 'theme'} template branch to keep Thema: prefix outside the truncated span (added to Frontend changes). applyResolvedAndSearch signature extension approach clarified. NlQueryParserServiceTest 4th constructor-arg sync added to Testing. Call-site mutation-guard comment required for resolveEffectiveColors.

Goal

In v1, every term the LLM extracts as a keyword ("Hochzeit", "Krieg", "Kriegsende") is folded into PostgreSQL full-text search. In this iteration, NlQueryParserService additionally resolves each extracted keyword against the existing tag taxonomy. Keywords that match one or more tags become tag filters (OR-union, with the existing hierarchy expansion); keywords that match nothing remain as FTS keywords — exactly as today. No LLM schema or prompt change is required.

This makes a smart-search theme term behave the same as clicking a tag in the existing keyword search.

Design decisions (resolved)

#	Decision	Rationale
D1	No LLM change. Resolve the existing `keywords` list against tags in `NlQueryParserService`; matches → tag filters, misses → FTS keywords.	Mirrors the existing person no-match folding. The Ollama JSON schema in `RestClientOllamaClient` (`personNames`, `personRole`, `dateFrom`, `dateTo`, `keywords`) and the prompt are untouched — lowest risk, no new model failure mode. Drops the stub's `themeNames` field.
D2	OR-union, no disambiguation. When a keyword substring-matches several distinct tags ("krieg" → Weltkrieg, Kriegsende, Nachkriegszeit) all matches become one OR set (`TagOperator.OR`).	Tags are categories, not identities — "show me anything war-related" is the desired result, not an error. Best recall for the 60+ audience. Drops the stub's `ambiguousTags` concept entirely (no picker, no suppressed results). Consistent with ADR-033 resolving tag names deterministically.
D3	Self-contained scope, depends on #739.	One coherent vertical slice (parser → API → chips).
D4	`personRole:"any"` path resolves tags (does not skip). `resolvedTags` is populated on this path for an honest interpretation payload; `tagsApplied = false` keeps the chips hidden. (Resolves OQ-4.)	The 1–3 unused `ILIKE` queries are sub-millisecond inside a multi-second LLM call — trivial. A truthful interpretation object is worth more than the micro-saving of skipping the lookup.
D5	Theme chips use per-tag colour + tinted-left-border colour (OQ-5, option B). `TagHint` carries a `color` field; each chip renders (a) a tinted background from `--c-tag-{color}` and (b) a tinted left-border colour (`border-left-color: var(--c-tag-{color})`) as a chip-family visual marker — three redundant cues: `Thema:` prefix + tinted left border + tinted background. Both applied via inline `style` consistent with `TagTreeNode.svelte` / `TagInput.svelte` / `themen/+page.svelte`. Null-color fallback: omit the `style` attribute entirely; the `Thema:` prefix is the sole differentiator. (Resolves OQ-5.)	Three redundant cues for the 60+/colour-blind/bright-sunlight audience. Using inline `style` matches the established `--c-tag-*` pattern in the codebase.

Corrections to the original stub (grounded in merged #738 code)

❌ resolvedTags: List<Tag> → ✅ resolvedTags: List<TagHint>. The API never exposes JPA entities; persons surface as the lightweight PersonHint(id, displayName) record (search/PersonHint.java). Tags mirror this with a new TagHint(id, name, color).
❌ "passes resolved tag IDs as filters" → ✅ passes resolved tag names. DocumentService.searchDocuments consumes SearchFilters.tags: List<String> (names), then TagService.expandTagNamesToDescendantIdSets expands each name to its descendant-ID set. Passing names gives hierarchy expansion for free.
❌ ambiguousTags + disambiguation flow → ✅ removed (see D2).

Backend changes

1. New record — `search/TagHint.java`

public record TagHint(
        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
        UUID id,
        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
        String name,
        String color           // effective tag colour token (sage/sienna/…); nullable if no colour resolves
) {}

Plain record built from Tag entities by NlQueryParserService (not a JPA projection — same pattern as PersonHint). The color field (D5) is the tag's effective colour: tag colours are stored only on root tags and inherited by children, so the parser must call TagService.resolveEffectiveColors(matched) on the matched entities before mapping to TagHint — otherwise child tags surface a null colour. color is not @Schema(requiredMode = REQUIRED) (it can legitimately be null).

⚠️ resolveEffectiveColors depth limitation (known constraint): The method resolves exactly one level of color inheritance — it loads the direct parentId and stops. For a grandchild tag (grandchild → child (no own color) → root (color=sage)), the grandchild gets null rather than sage. The archive's tag taxonomy is typically shallow (root → child); if a matched keyword happens to be a grandchild, it surfaces null color. Document this at the call site and pin it with the 3-level test (see Testing). If the taxonomy grows deeper, fix the method to recurse.

2. Extend `search/NlQueryInterpretation.java`

Add two fields (current fields: resolvedPersons, ambiguousPersons, dateFrom, dateTo, keywords, rawQuery, keywordsApplied):

@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<TagHint> resolvedTags,     // tags the query was filtered by (after keyword→tag resolution)
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
boolean tagsApplied             // false when the active search path cannot apply tag filters

Place resolvedTags after keywords and tagsApplied after keywordsApplied (keeps person/keyword/tag groupings adjacent). NlSearchResponse is unchanged.

Note (Felix): record fields are positional, so the two new fields break all three NlQueryInterpretation construction call sites in search() (ambiguous early-return, any-path, main path). The compiler will catch them — write the tests first so the new positional args are pinned.

3. `TagService.findByNameContaining(String fragment): List<Tag>`

One-liner mirroring PersonService.findByDisplayNameContaining. Delegates to the existing TagRepository.findByNameContainingIgnoreCase(fragment) — no new JPQL. Read method → no @Transactional (consistent with PersonService read-method style).

Do not reuse TagService.search() — it enriches results with relatives (enrichWithRelatives()) for the tag-browse UI, which is unwanted noise for filter resolution.

4. `NlQueryParserService` — keyword→tag resolution

Currently (search/NlQueryParserService.java ~line 75-80) the SearchFilters is built with tags = List.of(), tagQ = null, tagOperator = TagOperator.AND. New flow, inserted after keyword extraction and before the personRole:"any" branch (so both keywordsApplied and tagsApplied are known on every path). Resolve once, then branch — do not duplicate the resolution logic in both branches:

resolveTags(keywords):
  resolvedTagEntities = new LinkedHashSet<Tag>()   // dedup, preserve order
  remainingKeywords   = new ArrayList<String>()
  for kw in keywords:
    if kw == null or kw.length() < MIN_TAG_TERM:        // guard short/noisy terms
        remainingKeywords.add(kw); continue
    matches = tagService.findByNameContaining(kw)        // substring ILIKE
    if matches.isEmpty():
        remainingKeywords.add(kw)                         // unchanged v1 behaviour
    else:
        add each match to resolvedTagEntities (stop adding once size == MAX_RESOLVED_TAGS)
  // resolveEffectiveColors mutates detached Tag instances in-place (sets color field);
  // must not be called inside a write @Transactional scope — mutations would be persisted.
  tagService.resolveEffectiveColors(resolvedTagEntities)  // D5 — see depth limitation note in §1
  return (resolvedTagEntities → List<TagHint(id, name, color)>, tag names, remainingKeywords)

FTS text is built from remainingKeywords only (plus person no-match/extra fragments) — resolved terms are removed from the text predicate so they are not double-applied (tag filter AND text match would over-narrow).
⚠️ buildText rawQuery fallback guard (correctness, not just coverage): buildText() currently falls back to rawQuery when the joined parts are blank. If every keyword resolves to a tag and there are no person/no-match fragments, the FTS text would become blank → buildText returns rawQuery, re-introducing the resolved terms as text AND applying them as tags — the exact double-apply this issue forbids. The resolved-term removal must produce an empty/null FTS text in this case, not a rawQuery fallback (e.g. only fall back to rawQuery when nothing was resolved — no tags AND no persons — or thread a boolean hadStructuredMatch into buildText). Pin this with an explicit test (see Testing).
Filter assembly (non-any path): build SearchFilters with tags = <resolved tag names>, tagOperator = TagOperator.OR, tagQ = null. Set tagsApplied = true. Person (sender/receiver) and date predicates remain AND-composed with the tag predicate; OR applies only within the tag set. (Confirmed: DocumentSpecifications.hasTagPartial has its own null guard and fires only when tagQ is non-blank; the tags list filter is a separate specification that fires independently. No guard in DocumentService suppresses it when tagQ = null.)
personRole:"any" single-person path (searchDocumentsByPersonId) supports neither keywords nor tags → set tagsApplied = false (mirrors the existing keywordsApplied = false). Per D4, resolvedTags is still populated for transparency; the frontend hides the chips when tagsApplied == false. (This path already early-returns before the SearchFilters build site ~line 63 — confirmed.)
Constants: MIN_TAG_TERM (3, accepted — resolves OQ-1) and MAX_RESOLVED_TAGS (10, mirrors person MAX_CANDIDATES = 10 — resolves OQ-2), declared as named constants in NlQueryParserService next to MAX_CANDIDATES. Both have value 10 but model different concepts (person resolution candidate cap vs. tag resolution cap) — keep them as separate named constants; they may diverge independently. When matches exceed the cap, take the first N and log.debug("Keyword matched {} tags; capping at {}", count, MAX_RESOLVED_TAGS) — count only, never the raw keyword at info/warn (parameterized SLF4J; LLM-derived input echoes private names lifted from the query; defends against log injection + PII leak per ADR-028).

Security / defense-in-depth

Keyword fragments are already length-capped to 100 chars (MAX_KEYWORD_LENGTH) by RestClientOllamaClient.toExtraction() before reaching the parser; no new validation needed before the TagRepository call. Queries are parameterized (LIKE LOWER(CONCAT('%', :q, '%'))). MIN_TAG_TERM >= 3 also blunts trivial 1–2-char enumeration of the tag taxonomy. TagHint.color is a closed enum value from TagService.ALLOWED_TAG_COLORS — use the established inline style attribute pattern from existing tag-color components (TagTreeNode.svelte, TagInput.svelte, themen/+page.svelte): style="background-color: var(--c-tag-{color}); border-left-color: var(--c-tag-{color})". TagHint.color is a closed DB-origin set and Svelte auto-escapes attribute values; no {@html} needed. TagHint.name originates from the database (Tag entity name resolved from the taxonomy), not from LLM output directly — a tighter XSS surface than keyword chips. The endpoint authz (@RequirePermission(Permission.READ_ALL) on POST /api/search/nl) is unchanged — no new endpoint, no new permission decision.

Documentation

Amend docs/adr/028-nl-search-ollama.md in this PR:

Record the keyword→tag resolution decision and the deliberate FTS-text removal of resolved terms (tag filter AND text match would over-narrow) — reference ADR-033 for the deterministic name-resolution lineage. This is the non-obvious "why" a future dev would otherwise reverse.
Correct isHealthy() wording — recommended resolution: document over wire. ADR-028 states isHealthy() is "called inline before each inference request." It is not — search() calls ollamaClient.parse() directly; isHealthy() has zero callers in search/. parse() already throws on connection failure (caught as SMART_SEARCH_UNAVAILABLE); a pre-call isHealthy() probe would add ~50–150 ms latency for a failure mode already covered. Update ADR-028 to state: isHealthy() is reserved for ops/health endpoint polling; the in-path degradation mechanism is parse()'s timeout/IOException handling. No code change — one ADR paragraph. (Pre-existing; flagged in review.)

Add the new term "keyword→tag resolution" / "theme chip" to docs/GLOSSARY.md.

C4 diagram check: the search/ package gains a new cross-domain dependency on tag/ (TagService). Check docs/architecture/c4/l3-backend-*.puml — if NlQueryParserService is diagrammed, add a dependency arrow to TagService.

Frontend changes (depends on #739)

Pre-implementation step: ChipType is currently defined in two files — InterpretationChipRow.svelte:7 and +page.svelte:272 (both 'sender' | 'directional' | 'date' | 'keyword'). Extract to frontend/src/routes/search/chip-types.ts and import from both files before the PR. When 'theme' is added, it becomes the single source of truth.

Extend InterpretationChipRow.svelte (introduced by #739) with theme chips, and extend the chip-removal machinery in the parent page:

Render one removable chip per interpretation.resolvedTags entry, only when interpretation.tagsApplied === true (same rule as keywordsApplied for keyword chips — do not show greyed/disabled chips).
Visual identity (D5 — per-tag colour via inline style): each theme chip renders (a) a tinted background and (b) a tinted left-border colour using style="background-color: var(--c-tag-{tag.color}); border-left-color: var(--c-tag-{tag.color})" — consistent with TagTreeNode.svelte, TagInput.svelte, and themen/+page.svelte. Three redundant cues: Thema: prefix + tinted left border + tinted background. When tag.color is null, omit the style attribute entirely — the chip renders in the standard neutral appearance; the Thema: prefix is the sole differentiator in this case.
Template branch: theme chips require a distinct {:else if chip.type === 'theme'} branch — do not reuse the generic {:else} branch. The generic branch places chip.label inside the nameSpan truncated element; for theme chips, the Thema: prefix and the × button must both remain outside the truncated span (only the tag name itself truncates). Mirror the structure of the directional branch. Add data-chip-type="theme" to the chip wrapper <span> (consistent with data-chip-type={chip.type} on other branches — required for test assertions).
Type-prefixed label for senior legibility: [Thema: Weltkrieg ×] (new i18n key search_chip_theme_prefix).
Removal re-run — exact GET /api/documents/search param names (corrected; resolves OQ-3): clicking × drops that tag and re-runs via GET /api/documents/search. Theme chips pass value = tag.name to onRemoveChip('theme', value). Extend the onRemoveChip callback union with 'theme' (one callback, switch on type — do not add a parallel callback). Route the re-run through the same withCsrf() wrapper as the other chips for consistency.
paramsFromInterpretation() extension and removeChip switch (parent page — +page.svelte):
- paramsFromInterpretation() at line 261 currently returns {senderId, receiverId, from, to, q}. The multi-value tag[] params cannot be expressed in this object shape. Choose one approach before implementing: (a) add tags: string[] and tagOp: string to the return type and extend applyResolvedAndSearch to accept them; or (b) build URLSearchParams directly in the 'theme' case of removeChip without changing the return type. Be consistent across removeChip and applyResolvedAndSearch.
- The removeChip if/else switch at line 274 also needs a 'theme' case — this is a separate change from paramsFromInterpretation(). Without it, clicking a theme chip's × calls applyResolvedAndSearch with the full interpretation params intact (no tag removed). The theme branch: drop the selected tag from resolvedTags, reconstruct remaining tag names, emit as tag params + tagOp=OR.
- When resolvedTags is non-empty after removing the tapped tag: params.append('tag', name) for each remaining name + params.set('tagOp', 'OR'); when the last theme chip is removed, omit both tag and tagOp. Any still-active person/date filters from the interpretation are preserved in the params as before. No re-parse via the NL endpoint — existing chip removal already clears NL state (smartMode=false, nlInterpretation=null) and switches to regular search; theme chip removal follows the same pattern.
{#each} key: "theme:" + tag.id. Use $derived for the chip list and the tagsApplied gate (not $effect).
a11y / touch (inherit from #739): ≥44px × target; focus-visible:ring-2 on the wrapper (not just the ×); truncate on the label span at max-w-[8rem] (mobile) with the Thema: prefix and the × button both outside the truncated span — the prefix is the sole non-colour differentiator for colour-blind users and must never be clipped at any viewport width; flex flex-wrap gap-2 container must absorb up to MAX_RESOLVED_TAGS=10 theme chips + person/date chips without pushing the result list off the first screen. aria-label via search_filter_remove_label (Filter entfernen: {label}), where {label} includes the "Thema" prefix ("Filter entfernen: Thema Weltkrieg"). No {@html} — tag names are LLM-influenced; rely on Svelte text-position auto-escaping.

New i18n keys (`messages/{de,en,es}.json`)

Key	de	en	es
`search_chip_theme_prefix`	"Thema"	"Topic"	"Tema"

OpenAPI / types

Run npm run generate:api in the same PR — TagHint (with id, name, color) and the two new NlQueryInterpretation fields must appear in src/lib/generated/.

Testing

TypeScript fixture audit (pre-implementation): before running npm run generate:api, run grep -rn 'NlQueryInterpretation' frontend/ to find all existing mock objects. The type regen adds two required fields (resolvedTags: TagHint[], tagsApplied: boolean) — every mock object missing them causes npm run check failures across spec files. Add resolvedTags: [] and tagsApplied: false as neutral defaults to each found site. Confirmed primary site: makeInterpretation() factory at InterpretationChipRow.svelte.spec.ts:14.

Backend — NlQueryParserServiceTest constructor sync (pre-implementation): The test at line 43 manually constructs the service: new NlQueryParserService(ollamaClient, personService, documentService). Adding TagService as the 4th constructor arg breaks all existing tests at compile time. Add @Mock TagService tagService and a default stub when(tagService.findByNameContaining(anyString())).thenReturn(List.of()) in setUp() alongside the existing service stubs.

Backend — NlQueryParserServiceTest (Mockito), additions (assert filters.tags() and filters.tagOperator() with exact matchers, never any() — a vacuous any() lets an accidental AND default pass; write tag(name) / tagHint(id,name,color) factory helpers first, following the existing lowercase private-method convention):

keyword resolves to a single tag → resolvedTags contains it, tagsApplied == true, removed from FTS text; assert searchDocuments called with filters.tags() == [name] and filters.tagOperator() == TagOperator.OR
keyword substring-matches multiple tags → all in resolvedTags; assert OR-union filter
keyword with no tag match → stays in FTS text, resolvedTags empty
mixed (one resolves, one doesn't) → resolved term becomes tag, unresolved stays in the space-joined text argument
single person + personRole:"any" + a resolvable keyword → searchDocumentsByPersonId path taken, searchDocuments never called (verify(..., never())), tagsApplied == false, resolvedTags still populated (D4)
cap: keyword matches > MAX_RESOLVED_TAGS tags → capped, debug-logged (count only)
short keyword (< MIN_TAG_TERM) → skipped from tag resolution, stays as keyword
dedup: the same tag matched via multiple keywords/substrings ("krieg" + "kriegsende" both hitting Kriegsende) → appears once in resolvedTags (LinkedHashSet)
all-resolve rawQuery guard (latent bug): every keyword resolves and there are no person/no-match fragments → assert filters.text() == null and filters.tags() populated (no double-apply)
flag independence: all keywords became tags → keywordsApplied == false and tagsApplied == true in the same interpretation (frontend contract: keyword chips hidden, theme chips shown)
colour (D5, one-level): a child tag with no own colour resolves to its parent's colour → the resulting TagHint.color equals the parent's (exercises resolveEffectiveColors); build the mock Tag entity fresh per test (not a shared field) — resolveEffectiveColors calls tag.setColor(...) in-place and bleeds into other tests that share the same entity instance
colour depth limitation (3-level): a grandchild tag (no own color) whose direct parent also has no color → assert TagHint.color == null. This pins the known one-level depth constraint of resolveEffectiveColors as a documented limitation rather than a silent production bug. If the archive's taxonomy grows deeper, fix the method to recurse and update this test.

Backend — TagServiceTest: findByNameContaining delegates to findByNameContainingIgnoreCase (red → green). Do not test the trivial TagHint record accessors.

Backend — descendant-expansion integration test (@DataJpaTest, postgres:16-alpine): create a parent tag + a child tag; a document tagged only on the child; assert an NL search with a keyword substring-matching the parent name returns that document. Verifies expandTagNamesToDescendantIdSets fires end-to-end — the acceptance criterion "filtered by the OR-union of every tag … and their descendants" is not provable via Mockito alone. Use postgres:16-alpine (project convention — never H2). Check for an existing Testcontainers base class (NlSearchControllerTest.java — inspect its imports) before declaring a new container.

Coverage gate: target is 88% branch coverage (backend/CLAUDE.md). The new branches — short-term skip, cap, dedup, all-resolve guard, flag independence — are the ones easiest to miss; verify each is hit.

Frontend — InterpretationChipRow.svelte.spec.ts (vitest-browser-svelte):

resolvedTags non-empty + tagsApplied: true → theme chips render with Thema: prefix
tagsApplied: false → assert DOM absence via querySelectorAll('[data-chip-type="theme"]') — not [data-chip-type] which would also catch other chip types
a theme chip with color: "sage" renders with inline style containing background-color: var(--c-tag-sage) and border-left-color: var(--c-tag-sage); a chip with color: null renders with no tint style (standard neutral chip appearance)
clicking × on a theme chip → onRemoveChip('theme', tag.name) called with the tag's name as value (assert callback receives the name string — the HTTP call is the parent component's responsibility, not the chip row's)
N resolved tags → exactly N theme chips

Frontend — paramsFromInterpretation() unit tests (vitest): the parent page's param-reconstruction function must be tested for the theme-chip branch covering: (a) N remaining tags → exactly N tag params + tagOp=OR; (b) last tag removed → no tag/tagOp in params; (c) last tag removed with a resolved person present → sender param intact, no tag/tagOp; (d) null-color tag → tag name is emitted correctly (color does not affect param construction).

No new Playwright test required — extend the existing #739 happy-path E2E fixture to include resolvedTags + tagsApplied: true. Locate the fixture file first: run find frontend/e2e -name "*.spec.ts" | xargs grep -l "smartMode". Also add a "last theme chip removed, person filter survives" scenario: NL result with one resolved person + one theme tag; remove the theme chip; assert no tag/tagOp in the resulting URL but the sender UUID param is present.

Acceptance criteria

POST /api/search/nl with {"query": "Briefe über den Krieg"} returns results filtered by the OR-union of every tag whose name substring-matches "Krieg" (and their descendants), with those tags in interpretation.resolvedTags and tagsApplied == true
A keyword with no matching tag is folded into the FTS keyword text and does not appear in resolvedTags (v1 behaviour preserved)
A keyword matching multiple tags includes all of them; a document tagged with any one of them appears in results (OR semantics)
A keyword matching the same tag via multiple substrings yields that tag once in resolvedTags (dedup)
Resolved tag terms are removed from the FTS text predicate (not applied twice); when every keyword resolves the FTS text is null and does not fall back to rawQuery
Each resolvedTags entry carries its effective color (own colour, or the inherited parent colour for a direct child tag with no own colour; null for grandchild+ tags or when neither the tag nor its parent has a colour)
A query with a single resolved person and personRole:"any" plus a theme term runs the sender-OR-receiver search; tagsApplied == false; resolvedTags is still populated; the frontend shows no theme chips
Frontend: theme chips render as [Thema: … ×] with the tag's own --c-tag-{color} applied via inline style (background-color + border-left-color); null-color falls back to standard neutral chip appearance (no style attribute); when tagsApplied == true; removing one re-runs GET /api/documents/search with tagOp=OR and the remaining tag values; removing the last theme chip omits both tag and tagOp while any still-active sender/receiver/date filters remain intact
GET /api/documents/search behaviour is otherwise unchanged
The Ollama JSON schema and prompt in RestClientOllamaClient are unchanged (no themeNames field)
npm run generate:api has been run; TagHint (id, name, color), resolvedTags, tagsApplied exist in generated types
npm run check passes with zero new errors after adding resolvedTags: [] and tagsApplied: false to all existing NlQueryInterpretation mock objects

Non-functional checklist

Performance: ≤ one indexed ILIKE per extracted keyword (typically 1–3), bounded by MAX_RESOLVED_TAGS; resolveEffectiveColors adds one batched parent-colour lookup (confirmed: tagRepository.findAllById(parentIdsNeeded) — single IN query). Adds < ~10ms to a 2–15s LLM call — negligible. The existing /api/search/nl Prometheus latency exclusion (#738) already covers it. Pre-merge check: verify #738's Prometheus exclusion/histogram actually landed in main — curl -s http://localhost:8080/actuator/prometheus | grep 'http_server_requests.*search/nl' should return a series; if absent, fix before merging. Document this check in the PR description as a merge checklist item. Known future lever (not this PR): LIKE LOWER(CONCAT('%', :q, '%')) is a leading-wildcard match → no B-tree index; trivial seq-scan on the small tag table today. If tag ever grows past a few thousand rows, add a pg_trgm GIN index on lower(name).
Security: keyword fragments pre-capped at 100 chars; parameterized queries; MIN_TAG_TERM >= 3 blunts taxonomy enumeration; TagHint.color is a closed DB-origin palette token applied via inline style (Svelte auto-escapes attribute values); TagHint.name is DB-origin (not LLM text); LLM-derived tag names never reach innerHTML (no {@html}) and are logged only as counts.
Observability: log resolved-tag count as metadata (log.debug) — never the raw query or tag names beyond debug level (PII policy from ADR-028). No new metric or dashboard panel (sub-ms inside a multi-second request).
i18n: new key trio added to de/en/es before any render.
Accessibility: theme chips inherit #739's ≥44px targets, aria-label (with "Thema" prefix), and focus-ring patterns. Run axe in both light and dark mode on the chip row — --c-tag-* tokens are remapped in dark mode and a token passing 4.5:1 in light can fail in dark. Perform a visual wrap stress test at 320px with 10 stub theme chips to confirm the flex flex-wrap gap-2 container absorbs them without pushing results off screen.
Compatibility: no DB migration, no infra change, no new container, no new env var, no new error code.

Out of scope

LLM themeNames field / system-prompt change (explicitly rejected — see D1)
Tag disambiguation UI / ambiguousTags (rejected — see D2)
Tag filtering on the personRole:"any" search path (returns tagsApplied = false; chips hidden — see D4)
No tagsApplied: false hint shown on the personRole:"any" path (by design — D4 transparency is API-level, not UI-level; no analogue to the keyword showKeywordsNotApplied hint)
nl-search-spec.html theme-chip mockup (nice-to-have follow-up; design review does not gate this PR — see D5)
pg_trgm index on tag.name (future lever, only if the table grows)
Recursive resolveEffectiveColors for grandchild+ color inheritance (future lever if taxonomy grows deeper than 2 levels)
Semantic/embedding-based theme matching — tracked separately under the epic parking lot

Part of epic #735. Post-v1 refinement (v1 #738 folds all unresolved terms into `keywords`; this issue resolves a subset of those terms against the tag taxonomy). **Self-contained** (backend tag resolution + frontend theme chips). **Depends on #739 merged** — the interpretation-chip components (`InterpretationChipRow.svelte`) must exist before theme chips can be added. > **Refinement note (req-eng pass, 2026-06-06):** This issue was promoted from a parking-lot stub. Three design decisions were resolved with the product owner and three contradictions with the merged #738 backend were corrected (see _Design decisions_). The original stub proposed `resolvedTags: List<Tag>`, tag-**ID** filters, an `ambiguousTags` disambiguation flow, and a new `themeNames` LLM field — **all four are dropped** in favour of the design below. > > **Six-persona review pass (2026-06-06):** all review findings and the four open questions have been folded into this body and resolved by the product owner: _resolve as tags on the `any` path (D4); theme chips reuse the existing tag colour palette (D5); design review does not block this PR; the `GET /api/documents/search` param-name defect is corrected (see Frontend changes)._ OQ-1/OQ-2 are promoted to accepted defaults. Three additional correctness/edge tests were added (see Testing). > > **Second review pass (2026-06-06):** one residual decision — theme-chip colour source — resolved by the product owner as **per-tag colour (option a)**: `TagHint` carries a `color` field and the chip renders the tag's own `--c-tag-{color}`. Folded into Backend §1/§4, Frontend, and the acceptance criteria below. > > **Third review pass (2026-06-06):** Six additional items folded in: ADR-028 reference updated to exact filename; `resolveEffectiveColors` mutation-guard note added; `Thema:` prefix placed outside `truncate` span; `@DataJpaTest` descendant-expansion test added; fresh-mock-per-test note added; branch coverage gate confirmed as 88%. One open UX decision remained. > > **Fourth review pass (2026-06-06):** OQ-5 resolved as option B (prefix + tint + tinted-left-border); `onRemoveChip` value specified as `tag.name`; parent chip-removal filter-combination pattern added; TypeScript fixture audit added; `isHealthy()` clarified as "document over wire"; `tagsApplied: false` hint added to Out of Scope. No remaining open decisions. > > **Fifth review pass (2026-06-06):** Three corrections from codebase reads folded in: (1) security note updated — inline `style` attribute IS the established `--c-tag-*` pattern (see `TagTreeNode.svelte`, `TagInput.svelte`, `themen/+page.svelte`); (2) pre-implementation `DocumentService` check resolved — `hasTagPartial` has its own null guard, `tags` list filter fires independently, no fix needed; (3) `resolveEffectiveColors` confirmed batched (`findAllById` single IN query). Two additions: `paramsFromInterpretation()` extension added as a named Frontend task; `paramsFromInterpretation()` unit tests + E2E "last chip + person survives" scenario added to Testing. Border-left treatment refined to 1px tinted-left-color (using inline `style`, consistent with existing tag components) rather than a separate 2px utility class. > > **Sixth review pass (2026-06-06):** Three codebase-grounded findings folded in: (1) `resolveEffectiveColors` resolves only ONE level of color inheritance — documented as a known depth constraint, pinned with a 3-level test case (see Testing); (2) `ChipType` union is duplicated across `InterpretationChipRow.svelte:7` and `+page.svelte:272` — extract to `search/chip-types.ts` as a pre-implementation step; (3) C4 L3 backend diagram needs a `search/` → `tag/` dependency arrow (added to Documentation). Two implementation-coordination items: `removeChip`'s `if/else` switch needs an explicit `'theme'` case (added to Frontend changes); theme chips require a distinct `{:else if chip.type === 'theme'}` template branch to keep `Thema:` prefix outside the truncated span (added to Frontend changes). `applyResolvedAndSearch` signature extension approach clarified. `NlQueryParserServiceTest` 4th constructor-arg sync added to Testing. Call-site mutation-guard comment required for `resolveEffectiveColors`. ## Goal In v1, every term the LLM extracts as a `keyword` ("Hochzeit", "Krieg", "Kriegsende") is folded into PostgreSQL full-text search. In this iteration, `NlQueryParserService` additionally **resolves each extracted keyword against the existing tag taxonomy**. Keywords that match one or more tags become **tag filters** (OR-union, with the existing hierarchy expansion); keywords that match nothing **remain as FTS keywords** — exactly as today. No LLM schema or prompt change is required. This makes a smart-search theme term behave the same as clicking a tag in the existing keyword search. ## Design decisions (resolved) | # | Decision | Rationale | |---|----------|-----------| | D1 | **No LLM change.** Resolve the existing `keywords` list against tags in `NlQueryParserService`; matches → tag filters, misses → FTS keywords. | Mirrors the existing person no-match folding. The Ollama JSON schema in `RestClientOllamaClient` (`personNames`, `personRole`, `dateFrom`, `dateTo`, `keywords`) and the prompt are **untouched** — lowest risk, no new model failure mode. Drops the stub's `themeNames` field. | | D2 | **OR-union, no disambiguation.** When a keyword substring-matches several distinct tags ("krieg" → Weltkrieg, Kriegsende, Nachkriegszeit) all matches become one OR set (`TagOperator.OR`). | Tags are categories, not identities — "show me anything war-related" is the desired result, not an error. Best recall for the 60+ audience. Drops the stub's `ambiguousTags` concept entirely (no picker, no suppressed results). Consistent with ADR-033 resolving tag names deterministically. | | D3 | **Self-contained scope**, depends on #739. | One coherent vertical slice (parser → API → chips). | | D4 | **`personRole:"any"` path resolves tags (does not skip).** `resolvedTags` is populated on this path for an honest interpretation payload; `tagsApplied = false` keeps the chips hidden. _(Resolves OQ-4.)_ | The 1–3 unused `ILIKE` queries are sub-millisecond inside a multi-second LLM call — trivial. A truthful interpretation object is worth more than the micro-saving of skipping the lookup. | | D5 | **Theme chips use per-tag colour + tinted-left-border colour (OQ-5, option B).** `TagHint` carries a `color` field; each chip renders (a) a tinted background from `--c-tag-{color}` and (b) a tinted left-border colour (`border-left-color: var(--c-tag-{color})`) as a chip-family visual marker — three redundant cues: `Thema:` prefix + tinted left border + tinted background. Both applied via inline `style` consistent with `TagTreeNode.svelte` / `TagInput.svelte` / `themen/+page.svelte`. Null-color fallback: omit the `style` attribute entirely; the `Thema:` prefix is the sole differentiator. _(Resolves OQ-5.)_ | Three redundant cues for the 60+/colour-blind/bright-sunlight audience. Using inline `style` matches the established `--c-tag-*` pattern in the codebase. | ### Corrections to the original stub (grounded in merged #738 code) - ❌ `resolvedTags: List<Tag>` → ✅ **`resolvedTags: List<TagHint>`**. The API never exposes JPA entities; persons surface as the lightweight `PersonHint(id, displayName)` record (`search/PersonHint.java`). Tags mirror this with a new `TagHint(id, name, color)`. - ❌ "passes resolved tag **IDs** as filters" → ✅ passes resolved tag **names**. `DocumentService.searchDocuments` consumes `SearchFilters.tags: List<String>` (names), then `TagService.expandTagNamesToDescendantIdSets` expands each name to its descendant-ID set. Passing names gives **hierarchy expansion for free**. - ❌ `ambiguousTags` + disambiguation flow → ✅ removed (see D2). ## Backend changes ### 1. New record — `search/TagHint.java` ```java public record TagHint( @Schema(requiredMode = Schema.RequiredMode.REQUIRED) UUID id, @Schema(requiredMode = Schema.RequiredMode.REQUIRED) String name, String color // effective tag colour token (sage/sienna/…); nullable if no colour resolves ) {} ``` Plain record built from `Tag` entities by `NlQueryParserService` (not a JPA projection — same pattern as `PersonHint`). The `color` field (D5) is the tag's **effective** colour: tag colours are stored only on root tags and inherited by children, so the parser must call `TagService.resolveEffectiveColors(matched)` on the matched entities **before** mapping to `TagHint` — otherwise child tags surface a `null` colour. `color` is **not** `@Schema(requiredMode = REQUIRED)` (it can legitimately be null). > ⚠️ **`resolveEffectiveColors` depth limitation (known constraint):** The method resolves exactly **one level** of color inheritance — it loads the direct `parentId` and stops. For a grandchild tag (grandchild → child (no own color) → root (color=sage)), the grandchild gets `null` rather than `sage`. The archive's tag taxonomy is typically shallow (root → child); if a matched keyword happens to be a grandchild, it surfaces `null` color. Document this at the call site and pin it with the 3-level test (see Testing). If the taxonomy grows deeper, fix the method to recurse. ### 2. Extend `search/NlQueryInterpretation.java` Add two fields (current fields: `resolvedPersons, ambiguousPersons, dateFrom, dateTo, keywords, rawQuery, keywordsApplied`): ```java @Schema(requiredMode = Schema.RequiredMode.REQUIRED) List<TagHint> resolvedTags, // tags the query was filtered by (after keyword→tag resolution) @Schema(requiredMode = Schema.RequiredMode.REQUIRED) boolean tagsApplied // false when the active search path cannot apply tag filters ``` Place `resolvedTags` after `keywords` and `tagsApplied` after `keywordsApplied` (keeps person/keyword/tag groupings adjacent). `NlSearchResponse` is unchanged. > **Note (Felix):** record fields are positional, so the two new fields break **all three** `NlQueryInterpretation` construction call sites in `search()` (ambiguous early-return, `any`-path, main path). The compiler will catch them — write the tests first so the new positional args are pinned. ### 3. `TagService.findByNameContaining(String fragment): List<Tag>` One-liner mirroring `PersonService.findByDisplayNameContaining`. Delegates to the **existing** `TagRepository.findByNameContainingIgnoreCase(fragment)` — no new JPQL. Read method → **no `@Transactional`** (consistent with `PersonService` read-method style). > Do **not** reuse `TagService.search()` — it enriches results with relatives (`enrichWithRelatives()`) for the tag-browse UI, which is unwanted noise for filter resolution. ### 4. `NlQueryParserService` — keyword→tag resolution Currently (`search/NlQueryParserService.java` ~line 75-80) the `SearchFilters` is built with `tags = List.of()`, `tagQ = null`, `tagOperator = TagOperator.AND`. New flow, inserted after keyword extraction and **before** the `personRole:"any"` branch (so both `keywordsApplied` and `tagsApplied` are known on every path). Resolve **once**, then branch — do not duplicate the resolution logic in both branches: ```text resolveTags(keywords): resolvedTagEntities = new LinkedHashSet<Tag>() // dedup, preserve order remainingKeywords = new ArrayList<String>() for kw in keywords: if kw == null or kw.length() < MIN_TAG_TERM: // guard short/noisy terms remainingKeywords.add(kw); continue matches = tagService.findByNameContaining(kw) // substring ILIKE if matches.isEmpty(): remainingKeywords.add(kw) // unchanged v1 behaviour else: add each match to resolvedTagEntities (stop adding once size == MAX_RESOLVED_TAGS) // resolveEffectiveColors mutates detached Tag instances in-place (sets color field); // must not be called inside a write @Transactional scope — mutations would be persisted. tagService.resolveEffectiveColors(resolvedTagEntities) // D5 — see depth limitation note in §1 return (resolvedTagEntities → List<TagHint(id, name, color)>, tag names, remainingKeywords) ``` - **FTS text** is built from `remainingKeywords` only (plus person no-match/extra fragments) — resolved terms are **removed from the text predicate** so they are not double-applied (tag filter AND text match would over-narrow). - **⚠️ `buildText` `rawQuery` fallback guard (correctness, not just coverage):** `buildText()` currently falls back to `rawQuery` when the joined parts are blank. If **every** keyword resolves to a tag and there are no person/no-match fragments, the FTS text would become blank → `buildText` returns `rawQuery`, re-introducing the resolved terms as text **AND** applying them as tags — the exact double-apply this issue forbids. The resolved-term removal must produce an **empty/null FTS text in this case, not a `rawQuery` fallback** (e.g. only fall back to `rawQuery` when nothing was resolved — no tags AND no persons — or thread a `boolean hadStructuredMatch` into `buildText`). Pin this with an explicit test (see Testing). - **Filter assembly (non-`any` path):** build `SearchFilters` with `tags = <resolved tag names>`, `tagOperator = TagOperator.OR`, `tagQ = null`. Set `tagsApplied = true`. Person (sender/receiver) and date predicates remain AND-composed with the tag predicate; OR applies only *within* the tag set. _(Confirmed: `DocumentSpecifications.hasTagPartial` has its own null guard and fires only when `tagQ` is non-blank; the `tags` list filter is a separate specification that fires independently. No guard in `DocumentService` suppresses it when `tagQ = null`.)_ - **`personRole:"any"` single-person path** (`searchDocumentsByPersonId`) supports neither keywords nor tags → set `tagsApplied = false` (mirrors the existing `keywordsApplied = false`). Per **D4**, `resolvedTags` is still populated for transparency; the frontend hides the chips when `tagsApplied == false`. (This path already early-returns before the `SearchFilters` build site ~line 63 — confirmed.) - **Constants:** `MIN_TAG_TERM` (**3**, accepted — resolves OQ-1) and `MAX_RESOLVED_TAGS` (**10**, mirrors person `MAX_CANDIDATES = 10` — resolves OQ-2), declared as named constants in `NlQueryParserService` next to `MAX_CANDIDATES`. Both have value `10` but model different concepts (person resolution candidate cap vs. tag resolution cap) — keep them as **separate named constants**; they may diverge independently. When matches exceed the cap, take the first N and `log.debug("Keyword matched {} tags; capping at {}", count, MAX_RESOLVED_TAGS)` — **count only, never the raw keyword at info/warn** (parameterized SLF4J; LLM-derived input echoes private names lifted from the query; defends against log injection + PII leak per ADR-028). ### Security / defense-in-depth Keyword fragments are already length-capped to 100 chars (`MAX_KEYWORD_LENGTH`) by `RestClientOllamaClient.toExtraction()` before reaching the parser; no new validation needed before the `TagRepository` call. Queries are parameterized (`LIKE LOWER(CONCAT('%', :q, '%'))`). `MIN_TAG_TERM >= 3` also blunts trivial 1–2-char enumeration of the tag taxonomy. `TagHint.color` is a closed enum value from `TagService.ALLOWED_TAG_COLORS` — use the established inline `style` attribute pattern from existing tag-color components (`TagTreeNode.svelte`, `TagInput.svelte`, `themen/+page.svelte`): `style="background-color: var(--c-tag-{color}); border-left-color: var(--c-tag-{color})"`. `TagHint.color` is a closed DB-origin set and Svelte auto-escapes attribute values; no `{@html}` needed. `TagHint.name` originates from the database (Tag entity name resolved from the taxonomy), not from LLM output directly — a tighter XSS surface than keyword chips. The endpoint authz (`@RequirePermission(Permission.READ_ALL)` on `POST /api/search/nl`) is unchanged — no new endpoint, no new permission decision. ### Documentation Amend **`docs/adr/028-nl-search-ollama.md`** in this PR: - Record the keyword→tag resolution decision and the **deliberate FTS-text removal of resolved terms** (tag filter AND text match would over-narrow) — reference ADR-033 for the deterministic name-resolution lineage. This is the non-obvious "why" a future dev would otherwise reverse. - **Correct `isHealthy()` wording — recommended resolution: document over wire.** ADR-028 states `isHealthy()` is "called inline before each inference request." It is not — `search()` calls `ollamaClient.parse()` directly; `isHealthy()` has zero callers in `search/`. `parse()` already throws on connection failure (caught as `SMART_SEARCH_UNAVAILABLE`); a pre-call `isHealthy()` probe would add ~50–150 ms latency for a failure mode already covered. Update ADR-028 to state: `isHealthy()` is reserved for ops/health endpoint polling; the in-path degradation mechanism is `parse()`'s timeout/IOException handling. No code change — one ADR paragraph. _(Pre-existing; flagged in review.)_ Add the new term **"keyword→tag resolution" / "theme chip"** to `docs/GLOSSARY.md`. **C4 diagram check:** the `search/` package gains a new cross-domain dependency on `tag/` (`TagService`). Check `docs/architecture/c4/l3-backend-*.puml` — if `NlQueryParserService` is diagrammed, add a dependency arrow to `TagService`. ## Frontend changes (depends on #739) **Pre-implementation step:** `ChipType` is currently defined in **two files** — `InterpretationChipRow.svelte:7` and `+page.svelte:272` (both `'sender' | 'directional' | 'date' | 'keyword'`). Extract to `frontend/src/routes/search/chip-types.ts` and import from both files before the PR. When `'theme'` is added, it becomes the single source of truth. Extend `InterpretationChipRow.svelte` (introduced by #739) with **theme chips**, and extend the chip-removal machinery in the parent page: - Render one removable chip per `interpretation.resolvedTags` entry, **only when `interpretation.tagsApplied === true`** (same rule as `keywordsApplied` for keyword chips — do not show greyed/disabled chips). - **Visual identity (D5 — per-tag colour via inline `style`):** each theme chip renders (a) a tinted background and (b) a tinted left-border colour using `style="background-color: var(--c-tag-{tag.color}); border-left-color: var(--c-tag-{tag.color})"` — consistent with `TagTreeNode.svelte`, `TagInput.svelte`, and `themen/+page.svelte`. Three redundant cues: `Thema:` prefix + tinted left border + tinted background. When `tag.color` is null, **omit the `style` attribute entirely** — the chip renders in the standard neutral appearance; the `Thema:` prefix is the sole differentiator in this case. - **Template branch:** theme chips require a **distinct `{:else if chip.type === 'theme'}` branch** — **do not reuse the generic `{:else}` branch**. The generic branch places `chip.label` inside the `nameSpan` truncated element; for theme chips, the `Thema:` prefix and the `×` button must both remain **outside** the truncated span (only the tag name itself truncates). Mirror the structure of the `directional` branch. Add `data-chip-type="theme"` to the chip wrapper `<span>` (consistent with `data-chip-type={chip.type}` on other branches — required for test assertions). - Type-prefixed label for senior legibility: **`[Thema: Weltkrieg ×]`** (new i18n key `search_chip_theme_prefix`). - **Removal re-run — exact `GET /api/documents/search` param names (corrected; resolves OQ-3):** clicking × drops that tag and re-runs via `GET /api/documents/search`. Theme chips pass **`value = tag.name`** to `onRemoveChip('theme', value)`. Extend the `onRemoveChip` callback union with `'theme'` (one callback, switch on type — do not add a parallel callback). Route the re-run through the same `withCsrf()` wrapper as the other chips for consistency. - **`paramsFromInterpretation()` extension and `removeChip` switch (parent page — `+page.svelte`):** - `paramsFromInterpretation()` at line 261 currently returns `{senderId, receiverId, from, to, q}`. The multi-value `tag[]` params cannot be expressed in this object shape. Choose **one approach** before implementing: (a) add `tags: string[]` and `tagOp: string` to the return type and extend `applyResolvedAndSearch` to accept them; or (b) build `URLSearchParams` directly in the `'theme'` case of `removeChip` without changing the return type. Be consistent across `removeChip` and `applyResolvedAndSearch`. - The `removeChip` `if/else` switch at line 274 **also needs a `'theme'` case** — this is a separate change from `paramsFromInterpretation()`. Without it, clicking a theme chip's × calls `applyResolvedAndSearch` with the full interpretation params intact (no tag removed). The theme branch: drop the selected tag from `resolvedTags`, reconstruct remaining tag names, emit as `tag` params + `tagOp=OR`. - When `resolvedTags` is non-empty after removing the tapped tag: `params.append('tag', name)` for each remaining name + `params.set('tagOp', 'OR')`; when the last theme chip is removed, omit both `tag` and `tagOp`. Any still-active person/date filters from the interpretation are preserved in the params as before. **No re-parse via the NL endpoint** — existing chip removal already clears NL state (`smartMode=false`, `nlInterpretation=null`) and switches to regular search; theme chip removal follows the same pattern. - `{#each}` key: `"theme:" + tag.id`. Use `$derived` for the chip list and the `tagsApplied` gate (not `$effect`). - a11y / touch (inherit from #739): ≥44px × target; `focus-visible:ring-2` on the **wrapper** (not just the ×); `truncate` on the label span at `max-w-[8rem]` (mobile) with the `Thema:` prefix **and** the × button both **outside** the truncated span — the prefix is the sole non-colour differentiator for colour-blind users and must never be clipped at any viewport width; `flex flex-wrap gap-2` container must absorb up to `MAX_RESOLVED_TAGS`=10 theme chips + person/date chips without pushing the result list off the first screen. `aria-label` via `search_filter_remove_label` (`Filter entfernen: {label}`), where `{label}` **includes the "Thema" prefix** ("Filter entfernen: Thema Weltkrieg"). **No `{@html}`** — tag names are LLM-influenced; rely on Svelte text-position auto-escaping. ### New i18n keys (`messages/{de,en,es}.json`) | Key | de | en | es | |-----|----|----|-----| | `search_chip_theme_prefix` | "Thema" | "Topic" | "Tema" | ## OpenAPI / types Run `npm run generate:api` in the same PR — `TagHint` (with `id`, `name`, `color`) and the two new `NlQueryInterpretation` fields must appear in `src/lib/generated/`. ## Testing **TypeScript fixture audit (pre-implementation):** before running `npm run generate:api`, run `grep -rn 'NlQueryInterpretation' frontend/` to find all existing mock objects. The type regen adds two required fields (`resolvedTags: TagHint[]`, `tagsApplied: boolean`) — every mock object missing them causes `npm run check` failures across spec files. Add `resolvedTags: []` and `tagsApplied: false` as neutral defaults to each found site. **Confirmed primary site:** `makeInterpretation()` factory at `InterpretationChipRow.svelte.spec.ts:14`. **Backend — `NlQueryParserServiceTest` constructor sync (pre-implementation):** The test at line 43 manually constructs the service: `new NlQueryParserService(ollamaClient, personService, documentService)`. Adding `TagService` as the 4th constructor arg breaks all existing tests at compile time. Add `@Mock TagService tagService` and a default stub `when(tagService.findByNameContaining(anyString())).thenReturn(List.of())` in `setUp()` alongside the existing service stubs. **Backend — `NlQueryParserServiceTest` (Mockito), additions** (assert `filters.tags()` and `filters.tagOperator()` with **exact matchers, never `any()`** — a vacuous `any()` lets an accidental `AND` default pass; write `tag(name)` / `tagHint(id,name,color)` factory helpers first, following the existing lowercase private-method convention): - keyword resolves to a single tag → `resolvedTags` contains it, `tagsApplied == true`, removed from FTS text; assert `searchDocuments` called with `filters.tags() == [name]` and `filters.tagOperator() == TagOperator.OR` - keyword substring-matches multiple tags → all in `resolvedTags`; assert OR-union filter - keyword with no tag match → stays in FTS text, `resolvedTags` empty - mixed (one resolves, one doesn't) → resolved term becomes tag, unresolved stays in the space-joined text argument - single person + `personRole:"any"` + a resolvable keyword → `searchDocumentsByPersonId` path taken, `searchDocuments` **never** called (`verify(..., never())`), `tagsApplied == false`, `resolvedTags` still populated (D4) - cap: keyword matches > `MAX_RESOLVED_TAGS` tags → capped, debug-logged (count only) - short keyword (< `MIN_TAG_TERM`) → skipped from tag resolution, stays as keyword - **dedup:** the same tag matched via multiple keywords/substrings ("krieg" + "kriegsende" both hitting *Kriegsende*) → appears **once** in `resolvedTags` (`LinkedHashSet`) - **all-resolve `rawQuery` guard (latent bug):** every keyword resolves and there are no person/no-match fragments → assert `filters.text() == null` **and** `filters.tags()` populated (no double-apply) - **flag independence:** all keywords became tags → `keywordsApplied == false` **and** `tagsApplied == true` in the same interpretation (frontend contract: keyword chips hidden, theme chips shown) - **colour (D5, one-level):** a child tag with no own colour resolves to its parent's colour → the resulting `TagHint.color` equals the parent's (exercises `resolveEffectiveColors`); build the mock `Tag` entity **fresh per test** (not a shared field) — `resolveEffectiveColors` calls `tag.setColor(...)` in-place and bleeds into other tests that share the same entity instance - **colour depth limitation (3-level):** a grandchild tag (no own color) whose direct parent also has no color → assert `TagHint.color == null`. This pins the known one-level depth constraint of `resolveEffectiveColors` as a documented limitation rather than a silent production bug. If the archive's taxonomy grows deeper, fix the method to recurse and update this test. **Backend — `TagServiceTest`:** `findByNameContaining` delegates to `findByNameContainingIgnoreCase` (red → green). Do **not** test the trivial `TagHint` record accessors. **Backend — descendant-expansion integration test (`@DataJpaTest`, `postgres:16-alpine`):** create a parent tag + a child tag; a document tagged only on the child; assert an NL search with a keyword substring-matching the parent name returns that document. Verifies `expandTagNamesToDescendantIdSets` fires end-to-end — the acceptance criterion "filtered by the OR-union of every tag … **and their descendants**" is not provable via Mockito alone. Use `postgres:16-alpine` (project convention — never H2). Check for an existing Testcontainers base class (`NlSearchControllerTest.java` — inspect its imports) before declaring a new container. **Coverage gate:** target is **88% branch coverage** (`backend/CLAUDE.md`). The new branches — short-term skip, cap, dedup, all-resolve guard, flag independence — are the ones easiest to miss; verify each is hit. **Frontend — `InterpretationChipRow.svelte.spec.ts` (vitest-browser-svelte):** - `resolvedTags` non-empty + `tagsApplied: true` → theme chips render with `Thema:` prefix - `tagsApplied: false` → assert DOM absence via `querySelectorAll('[data-chip-type="theme"]')` — **not** `[data-chip-type]` which would also catch other chip types - a theme chip with `color: "sage"` renders with inline `style` containing `background-color: var(--c-tag-sage)` and `border-left-color: var(--c-tag-sage)`; a chip with `color: null` renders with no tint style (standard neutral chip appearance) - clicking × on a theme chip → `onRemoveChip('theme', tag.name)` called with the tag's **name** as `value` (assert callback receives the name string — the HTTP call is the parent component's responsibility, not the chip row's) - N resolved tags → exactly N theme chips **Frontend — `paramsFromInterpretation()` unit tests (vitest):** the parent page's param-reconstruction function must be tested for the theme-chip branch covering: (a) N remaining tags → exactly N `tag` params + `tagOp=OR`; (b) last tag removed → no `tag`/`tagOp` in params; (c) last tag removed with a resolved person present → sender param intact, no `tag`/`tagOp`; (d) null-color tag → tag name is emitted correctly (color does not affect param construction). No new Playwright test required — extend the existing #739 happy-path E2E fixture to include `resolvedTags` + `tagsApplied: true`. **Locate the fixture file first:** run `find frontend/e2e -name "*.spec.ts" | xargs grep -l "smartMode"`. Also add a **"last theme chip removed, person filter survives"** scenario: NL result with one resolved person + one theme tag; remove the theme chip; assert no `tag`/`tagOp` in the resulting URL but the sender UUID param is present. ## Acceptance criteria - `POST /api/search/nl` with `{"query": "Briefe über den Krieg"}` returns results filtered by the OR-union of every tag whose name substring-matches "Krieg" (and their descendants), with those tags in `interpretation.resolvedTags` and `tagsApplied == true` - A keyword with no matching tag is folded into the FTS keyword text and does **not** appear in `resolvedTags` (v1 behaviour preserved) - A keyword matching multiple tags includes **all** of them; a document tagged with **any** one of them appears in results (OR semantics) - A keyword matching the same tag via multiple substrings yields that tag **once** in `resolvedTags` (dedup) - Resolved tag terms are removed from the FTS text predicate (not applied twice); when every keyword resolves the FTS text is null and does **not** fall back to `rawQuery` - Each `resolvedTags` entry carries its **effective** `color` (own colour, or the inherited parent colour for a direct child tag with no own colour; null for grandchild+ tags or when neither the tag nor its parent has a colour) - A query with a single resolved person and `personRole:"any"` plus a theme term runs the sender-OR-receiver search; `tagsApplied == false`; `resolvedTags` is still populated; the frontend shows **no** theme chips - Frontend: theme chips render as `[Thema: … ×]` with the tag's own `--c-tag-{color}` applied via inline `style` (`background-color` + `border-left-color`); null-color falls back to standard neutral chip appearance (no style attribute); when `tagsApplied == true`; removing one re-runs `GET /api/documents/search` with `tagOp=OR` and the remaining `tag` values; removing the last theme chip omits both `tag` and `tagOp` while any still-active sender/receiver/date filters remain intact - `GET /api/documents/search` behaviour is otherwise unchanged - The Ollama JSON schema and prompt in `RestClientOllamaClient` are unchanged (no `themeNames` field) - `npm run generate:api` has been run; `TagHint` (`id`, `name`, `color`), `resolvedTags`, `tagsApplied` exist in generated types - `npm run check` passes with zero new errors after adding `resolvedTags: []` and `tagsApplied: false` to all existing `NlQueryInterpretation` mock objects ## Non-functional checklist - **Performance:** ≤ one indexed `ILIKE` per extracted keyword (typically 1–3), bounded by `MAX_RESOLVED_TAGS`; `resolveEffectiveColors` adds one batched parent-colour lookup (confirmed: `tagRepository.findAllById(parentIdsNeeded)` — single IN query). Adds < ~10ms to a 2–15s LLM call — negligible. The existing `/api/search/nl` Prometheus latency exclusion (#738) already covers it. _Pre-merge check: verify #738's Prometheus exclusion/histogram actually landed in `main` — `curl -s http://localhost:8080/actuator/prometheus | grep 'http_server_requests.*search/nl'` should return a series; if absent, fix before merging. Document this check in the PR description as a merge checklist item._ _Known future lever (not this PR):_ `LIKE LOWER(CONCAT('%', :q, '%'))` is a leading-wildcard match → no B-tree index; trivial seq-scan on the small `tag` table today. If `tag` ever grows past a few thousand rows, add a `pg_trgm` GIN index on `lower(name)`. - **Security:** keyword fragments pre-capped at 100 chars; parameterized queries; `MIN_TAG_TERM >= 3` blunts taxonomy enumeration; `TagHint.color` is a closed DB-origin palette token applied via inline `style` (Svelte auto-escapes attribute values); `TagHint.name` is DB-origin (not LLM text); LLM-derived tag names never reach `innerHTML` (no `{@html}`) and are logged only as counts. - **Observability:** log resolved-tag **count** as metadata (`log.debug`) — never the raw query or tag names beyond debug level (PII policy from ADR-028). No new metric or dashboard panel (sub-ms inside a multi-second request). - **i18n:** new key trio added to de/en/es before any render. - **Accessibility:** theme chips inherit #739's ≥44px targets, `aria-label` (with "Thema" prefix), and focus-ring patterns. Run axe in both **light and dark mode** on the chip row — `--c-tag-*` tokens are remapped in dark mode and a token passing 4.5:1 in light can fail in dark. Perform a visual wrap stress test at 320px with 10 stub theme chips to confirm the `flex flex-wrap gap-2` container absorbs them without pushing results off screen. - **Compatibility:** no DB migration, no infra change, no new container, no new env var, no new error code. ## Out of scope - LLM `themeNames` field / system-prompt change (explicitly rejected — see D1) - Tag disambiguation UI / `ambiguousTags` (rejected — see D2) - Tag filtering on the `personRole:"any"` search path (returns `tagsApplied = false`; chips hidden — see D4) - No `tagsApplied: false` hint shown on the `personRole:"any"` path (by design — D4 transparency is API-level, not UI-level; no analogue to the keyword `showKeywordsNotApplied` hint) - `nl-search-spec.html` theme-chip mockup (nice-to-have follow-up; design review does **not** gate this PR — see D5) - `pg_trgm` index on `tag.name` (future lever, only if the table grows) - Recursive `resolveEffectiveColors` for grandchild+ color inheritance (future lever if taxonomy grows deeper than 2 levels) - Semantic/embedding-based theme matching — tracked separately under the epic parking lot

marcel added this to the Archive Intelligence — NL Search milestone 2026-06-06 12:17:04 +02:00

marcel added the P3-later feature labels 2026-06-06 12:19:06 +02:00

marcel commented

2026-06-06 23:44:05 +02:00

Implementation complete ✅

All 12 tasks implemented via Red→Green→Refactor TDD. Branch: feat/issue-743-nl-search-tag-resolution

Commits (12)

Commit	Description
`8bd83908`	`feat(search): add TagService.findByNameContaining for NL tag resolution`
`89051350`	`feat(search): add TagHint record for NL tag resolution API surface`
`7eee688c`	`feat(search): extend NlQueryInterpretation with resolvedTags + tagsApplied`
`e94414b8`	`refactor(search): extract ChipType to chip-types.ts; audit NL fixtures`
`fc557bd9`	`feat(search): implement keyword→tag resolution in NlQueryParserService`
`6cb10258`	`test(search): add 11 tag-resolution test cases to NlQueryParserServiceTest`
`86690fdb`	`test(search): DataJpaTest for descendant-expansion via TagRepository`
`573bca49`	`feat(i18n): add search_chip_theme_prefix to de/en/es message bundles`
`847874ab`	`feat(api): add TagHint schema and extend NlQueryInterpretation with resolvedTags/tagsApplied`
`5387bc92`	`feat(search): render removable theme chips in InterpretationChipRow`
`2a7e1337`	`feat(search): wire theme chip removal to URL navigation in +page.svelte`
`64b7b231`	`docs(search): ADR-028 fix + glossary + C4 diagram for tag resolution (#743)`

What was built

Backend:

TagService.findByNameContaining(fragment) — case-insensitive substring match via existing findByNameContainingIgnoreCase
TagHint record (id, name, color?) — lightweight representation for the API surface
NlQueryInterpretation extended with resolvedTags: List<TagHint> and tagsApplied: boolean
NlQueryParserService.resolveTags() — iterates LLM-extracted keywords; each keyword with length ≥ 3 is substring-matched against tags; matches go to resolvedTags (OR-union filter, deduplicated via LinkedHashSet, capped at 10); non-matching keywords stay as FTS text
buildText rawQuery fallback guarded by hadStructuredMatch — prevents raw query double-apply when all keywords resolve to tags
tagOperator = OR when resolved tags are non-empty; tagsApplied = false on the personRole:any single-person path (no tag filter slot there, but resolvedTags still populated for UI transparency)
TagService.resolveEffectiveColors() called on the resolved tag list (one-level color inheritance)
11 new NlQueryParserServiceTest cases covering all resolution scenarios
NlSearchTagResolutionIntegrationTest (@DataJpaTest + Testcontainers) verifying the recursive CTE findDescendantIdsByName expands parent → child

Frontend:

messages/{de,en,es}.json — search_chip_theme_prefix: "Thema" / "Topic" / "Tema"
api.ts (manually updated, generate:api run) — TagHint schema + resolvedTags/tagsApplied in NlQueryInterpretation
InterpretationChipRow.svelte — Chip union extended with theme variant; resolvedTags loop when tagsApplied; inline style from tagColorStyle() for coloured chips; remove() passes chip.tag.name as value; 6 new browser component tests
theme-chip-removal.ts — pure function buildThemeRemovalUrl(interp, removedTagName): string building the correct /documents?tag=…&tagOp=OR URL with person/date params preserved; 5 pure-function server tests
+page.svelte — removeChip 'theme' case calls buildThemeRemovalUrl, then resetNlState() + goto()
e2e/nl-search.spec.ts — Thema: Weltkrieg chip assertion in existing test; new "last theme chip removed keeps person params" scenario

Docs:

ADR-028: corrected isHealthy() paragraph (it has zero callers in search/, degradation is via parse() timeout); added keyword→tag resolution paragraph with reference to ADR-033
GLOSSARY.md: added keyword→tag resolution, TagHint, theme chip entries; updated NlQueryInterpretation entry
docs/architecture/c4/l3-backend-3h-search.puml: added TagService component + findByNameContaining / resolveEffectiveColors relation arrows

Next step

Open PR from feat/issue-743-nl-search-tag-resolution → main.

## Implementation complete ✅ All 12 tasks implemented via Red→Green→Refactor TDD. Branch: `feat/issue-743-nl-search-tag-resolution` ### Commits (12) | Commit | Description | |--------|-------------| | `8bd83908` | `feat(search): add TagService.findByNameContaining for NL tag resolution` | | `89051350` | `feat(search): add TagHint record for NL tag resolution API surface` | | `7eee688c` | `feat(search): extend NlQueryInterpretation with resolvedTags + tagsApplied` | | `e94414b8` | `refactor(search): extract ChipType to chip-types.ts; audit NL fixtures` | | `fc557bd9` | `feat(search): implement keyword→tag resolution in NlQueryParserService` | | `6cb10258` | `test(search): add 11 tag-resolution test cases to NlQueryParserServiceTest` | | `86690fdb` | `test(search): DataJpaTest for descendant-expansion via TagRepository` | | `573bca49` | `feat(i18n): add search_chip_theme_prefix to de/en/es message bundles` | | `847874ab` | `feat(api): add TagHint schema and extend NlQueryInterpretation with resolvedTags/tagsApplied` | | `5387bc92` | `feat(search): render removable theme chips in InterpretationChipRow` | | `2a7e1337` | `feat(search): wire theme chip removal to URL navigation in +page.svelte` | | `64b7b231` | `docs(search): ADR-028 fix + glossary + C4 diagram for tag resolution (#743)` | ### What was built **Backend:** - `TagService.findByNameContaining(fragment)` — case-insensitive substring match via existing `findByNameContainingIgnoreCase` - `TagHint` record (`id`, `name`, `color?`) — lightweight representation for the API surface - `NlQueryInterpretation` extended with `resolvedTags: List<TagHint>` and `tagsApplied: boolean` - `NlQueryParserService.resolveTags()` — iterates LLM-extracted keywords; each keyword with `length ≥ 3` is substring-matched against tags; matches go to `resolvedTags` (OR-union filter, deduplicated via `LinkedHashSet`, capped at 10); non-matching keywords stay as FTS text - `buildText` rawQuery fallback guarded by `hadStructuredMatch` — prevents raw query double-apply when all keywords resolve to tags - `tagOperator = OR` when resolved tags are non-empty; `tagsApplied = false` on the `personRole:any` single-person path (no tag filter slot there, but `resolvedTags` still populated for UI transparency) - `TagService.resolveEffectiveColors()` called on the resolved tag list (one-level color inheritance) - 11 new `NlQueryParserServiceTest` cases covering all resolution scenarios - `NlSearchTagResolutionIntegrationTest` (`@DataJpaTest` + Testcontainers) verifying the recursive CTE `findDescendantIdsByName` expands parent → child **Frontend:** - `messages/{de,en,es}.json` — `search_chip_theme_prefix`: "Thema" / "Topic" / "Tema" - `api.ts` (manually updated, `generate:api` run) — `TagHint` schema + `resolvedTags`/`tagsApplied` in `NlQueryInterpretation` - `InterpretationChipRow.svelte` — `Chip` union extended with `theme` variant; `resolvedTags` loop when `tagsApplied`; inline `style` from `tagColorStyle()` for coloured chips; `remove()` passes `chip.tag.name` as value; 6 new browser component tests - `theme-chip-removal.ts` — pure function `buildThemeRemovalUrl(interp, removedTagName): string` building the correct `/documents?tag=…&tagOp=OR` URL with person/date params preserved; 5 pure-function server tests - `+page.svelte` — `removeChip` 'theme' case calls `buildThemeRemovalUrl`, then `resetNlState() + goto()` - `e2e/nl-search.spec.ts` — `Thema: Weltkrieg` chip assertion in existing test; new "last theme chip removed keeps person params" scenario **Docs:** - ADR-028: corrected `isHealthy()` paragraph (it has zero callers in `search/`, degradation is via `parse()` timeout); added keyword→tag resolution paragraph with reference to ADR-033 - `GLOSSARY.md`: added `keyword→tag resolution`, `TagHint`, `theme chip` entries; updated `NlQueryInterpretation` entry - `docs/architecture/c4/l3-backend-3h-search.puml`: added `TagService` component + `findByNameContaining` / `resolveEffectiveColors` relation arrows ### Next step Open PR from `feat/issue-743-nl-search-tag-resolution` → `main`.

marcel referenced this issue from a commit

2026-06-07 00:42:00 +02:00

refactor(search): extract ChipType to chip-types.ts; audit NL fixtures

marcel referenced this issue from a commit

2026-06-07 00:42:00 +02:00

docs(search): ADR-028 fix + glossary + C4 diagram for tag resolution (#743)

marcel referenced a pull request that will close this issue

2026-06-07 00:42:15 +02:00

feat(search): NL search — resolve tag names in query parsing (#743) #765

marcel referenced this issue

2026-06-07 00:47:05 +02:00

feat(search): NL search — resolve tag names in query parsing (#743) #765