feat(search): NL search — resolve tag names in query parsing (#743) #765

Merged
marcel merged 15 commits from feat/issue-743-nl-search-tag-resolution into main 2026-06-07 09:33:26 +02:00
Owner

Summary

  • Extends NlQueryParserService to substring-match each LLM-extracted keyword against the tag taxonomy after Ollama parsing; matching keywords become OR-union tag filters, non-matching keywords remain as FTS text
  • Adds TagHint {id, name, color?} record and extends NlQueryInterpretation with resolvedTags: List<TagHint> + tagsApplied: boolean
  • Frontend renders resolved tags as removable "Thema: X" chips in InterpretationChipRow; chip removal navigates to /documents?tag=…&tagOp=OR preserving person/date params

Test plan

  • Backend unit: NlQueryParserServiceTest (+11 cases), TagServiceTest, NlSearchTagResolutionIntegrationTest (@DataJpaTest descendant-expansion) — 85 tests, all green
  • Frontend server: theme-chip-removal.spec.ts (5 pure-function tests) — all green
  • Frontend component: InterpretationChipRow.svelte.spec.ts (+6 browser tests) — all green
  • E2E: nl-search.spec.ts extended with Thema: Weltkrieg chip assertion + last-chip-removal scenario (requires full stack)

Closes #743

🤖 Generated with Claude Code

## Summary - Extends `NlQueryParserService` to substring-match each LLM-extracted keyword against the tag taxonomy after Ollama parsing; matching keywords become OR-union tag filters, non-matching keywords remain as FTS text - Adds `TagHint {id, name, color?}` record and extends `NlQueryInterpretation` with `resolvedTags: List<TagHint>` + `tagsApplied: boolean` - Frontend renders resolved tags as removable "Thema: X" chips in `InterpretationChipRow`; chip removal navigates to `/documents?tag=…&tagOp=OR` preserving person/date params ## Test plan - [ ] Backend unit: `NlQueryParserServiceTest` (+11 cases), `TagServiceTest`, `NlSearchTagResolutionIntegrationTest` (@DataJpaTest descendant-expansion) — 85 tests, all green - [ ] Frontend server: `theme-chip-removal.spec.ts` (5 pure-function tests) — all green - [ ] Frontend component: `InterpretationChipRow.svelte.spec.ts` (+6 browser tests) — all green - [ ] E2E: `nl-search.spec.ts` extended with `Thema: Weltkrieg` chip assertion + last-chip-removal scenario (requires full stack) Closes #743 🤖 Generated with [Claude Code](https://claude.ai/claude-code)
marcel added 12 commits 2026-06-07 00:42:15 +02:00
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Positional record fields added; all 3 construction sites updated with neutral
defaults; NlQueryParserService wired for TagService (4th constructor arg);
NlQueryParserServiceTest and NlSearchControllerTest synced.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pre-implementation step for #743: ChipType union extracted from
InterpretationChipRow and +page.svelte into shared chip-types.ts;
resolvedTags/tagsApplied neutral defaults added to test fixtures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keywords that substring-match the tag taxonomy become OR-union tag filters;
non-matching keywords stay as FTS text. Resolved tags surface in the
NlQueryInterpretation as TagHint objects with effective colours. The
rawQuery fallback is now guarded by hadStructuredMatch to prevent
double-apply when all keywords resolve.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers multi-tag match, no-match FTS fallback, mixed resolution, personRole
bypass, cap at 10, short-keyword skip, dedup, rawQuery suppression when all
keywords resolve, flag independence, colour propagation via resolveEffectiveColors,
and colour=null when depth constraint prevents resolution.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verifies the recursive CTE in findDescendantIdsByName expands a parent tag
to include all child IDs, and that findByNameContainingIgnoreCase matches
both parent and child names when the fragment appears in both.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Manual update since Docker compose backend runs old build; regenerate with
npm run generate:api once new backend is deployed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When tagsApplied is true, each resolvedTag renders as a 'Thema: Name'
chip with optional inline color style from the tag's resolved color.
Clicking × calls onRemoveChip('theme', tag.name).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs(search): ADR-028 fix + glossary + C4 diagram for tag resolution (#743)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m25s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 4m1s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 23s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m7s
64b7b2315d
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

🏛️ Markus Keller — Software Architect

Verdict: Approved

Layer boundaries and coupling

The cross-domain dependency is handled correctly throughout. NlQueryParserService calls TagService.findByNameContaining() (service layer), never TagRepository directly — the layering rule is intact. TagHint lives in search/ and acts as a DTO projected from tag/ domain entities inside the search/ package, keeping the tag domain's internal Tag entity from leaking into the search response contract.

The C4 L3 diagram (docs/architecture/c4/l3-backend-3h-search.puml) is updated with both new TagService relations: the resolution call and the resolveEffectiveColors() call. ADR-028 now correctly documents the search/tag/ dependency direction alongside the existing search/person/ and search/document/ dependencies.

ADR completeness

ADR-028 was amended rather than superseded — appropriate for an extension of the same feature. The cross-reference to ADR-033 from the keyword→tag resolution paragraph correctly externalises the case-collision and color-inheritance rules, keeping ADR-028 focused on the Ollama integration and its consequences. ADR-028's correction of the isHealthy() paragraph (it has no callers in search/; in-path degradation goes via parse() timeout/IOException) is accurate and removes a latent misunderstanding.

Minor observation — resolveEffectiveColors() on detached entities

resolveTags() calls tagService.resolveEffectiveColors(resolvedTagEntities) on Tag entities that were fetched in a non-@Transactional context. Mutating JPA entities outside a transaction is safe here (no dirty-check fires, no second-level cache involvement), but a reader arriving cold could wonder whether Hibernate might pick up the mutation. A one-line comment at the call site — something like // safe: entities are detached at this point; mutation is for DTO projection only — would remove the ambiguity for future reviewers. Not a blocker.

No Flyway migrations needed

Correct — TagHint is a projection record, not a DB entity. No schema change required.

## 🏛️ Markus Keller — Software Architect **Verdict: ✅ Approved** ### Layer boundaries and coupling The cross-domain dependency is handled correctly throughout. `NlQueryParserService` calls `TagService.findByNameContaining()` (service layer), never `TagRepository` directly — the layering rule is intact. `TagHint` lives in `search/` and acts as a DTO projected from `tag/` domain entities inside the `search/` package, keeping the tag domain's internal `Tag` entity from leaking into the search response contract. The C4 L3 diagram (`docs/architecture/c4/l3-backend-3h-search.puml`) is updated with both new `TagService` relations: the resolution call and the `resolveEffectiveColors()` call. ADR-028 now correctly documents the `search/` → `tag/` dependency direction alongside the existing `search/` → `person/` and `search/` → `document/` dependencies. ### ADR completeness ADR-028 was amended rather than superseded — appropriate for an extension of the same feature. The cross-reference to ADR-033 from the keyword→tag resolution paragraph correctly externalises the case-collision and color-inheritance rules, keeping ADR-028 focused on the Ollama integration and its consequences. ADR-028's correction of the `isHealthy()` paragraph (it has no callers in `search/`; in-path degradation goes via `parse()` timeout/IOException) is accurate and removes a latent misunderstanding. ### Minor observation — `resolveEffectiveColors()` on detached entities `resolveTags()` calls `tagService.resolveEffectiveColors(resolvedTagEntities)` on `Tag` entities that were fetched in a non-`@Transactional` context. Mutating JPA entities outside a transaction is safe here (no dirty-check fires, no second-level cache involvement), but a reader arriving cold could wonder whether Hibernate might pick up the mutation. A one-line comment at the call site — something like `// safe: entities are detached at this point; mutation is for DTO projection only` — would remove the ambiguity for future reviewers. Not a blocker. ### No Flyway migrations needed Correct — `TagHint` is a projection record, not a DB entity. No schema change required.
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: Approved

TDD adherence

The plan separates failing test steps (1, 7, 9, 13, 15) from production code steps (2, 8, 14, 16) — red-before-green order preserved throughout. The commit history confirms each test file was introduced before its corresponding implementation. No shortcuts here.

Standout design decision — buildThemeRemovalUrl extraction

Pulling theme chip removal into a pure function (frontend/src/routes/documents/theme-chip-removal.ts) instead of testing it as browser interaction is the right call. Five focused server-mode tests give precise feedback on URL construction without spinning up a Svelte runtime. This is exactly the kind of testability refactor that keeps the unit layer fast and the browser layer lean.

Backend code quality

  • TagResolution private record as the return type for resolveTags() gives the intermediate result a name — much better than a three-element tuple or parallel lists.
  • hadStructuredMatch boolean solves the rawQuery double-apply invariant cleanly. The name is unambiguous once you know the context, but given it's a non-obvious guard, the comment at the buildText call site is warranted (which you added — ).
  • MIN_TAG_TERM = 3 and MAX_RESOLVED_TAGS = 10 as package-private constants are the right call — named, changeable, and visible in tests.
  • LinkedHashSet<Tag> for dedup preserves insertion order → deterministic resolvedTags ordering in the response. Good.

Frontend code quality

  • chip-types.ts extraction resolves the duplication between InterpretationChipRow.svelte and +page.svelte. Clean.
  • The { key: string; type: 'theme'; tag: TagHint; label: string } discriminated union member is consistent with the existing chip union shape.
  • tagColorStyle() helper isolating the --c-tag-{color} CSS variable construction keeps the template readable.

Minor note

resolveEffectiveColors() mutates Tag entities in-place. The test note "build mock Tag fresh per test" is critical — if any test shares a Tag instance across cases, a color-related assertion in test N could be poisoned by test N-1's mutation. Confirm each color test constructs its own fresh Tag objects.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** ### TDD adherence The plan separates failing test steps (1, 7, 9, 13, 15) from production code steps (2, 8, 14, 16) — red-before-green order preserved throughout. The commit history confirms each test file was introduced before its corresponding implementation. No shortcuts here. ### Standout design decision — `buildThemeRemovalUrl` extraction Pulling theme chip removal into a pure function (`frontend/src/routes/documents/theme-chip-removal.ts`) instead of testing it as browser interaction is the right call. Five focused server-mode tests give precise feedback on URL construction without spinning up a Svelte runtime. This is exactly the kind of testability refactor that keeps the unit layer fast and the browser layer lean. ### Backend code quality - `TagResolution` private record as the return type for `resolveTags()` gives the intermediate result a name — much better than a three-element tuple or parallel lists. - `hadStructuredMatch` boolean solves the rawQuery double-apply invariant cleanly. The name is unambiguous once you know the context, but given it's a non-obvious guard, the comment at the `buildText` call site is warranted (which you added — ✅). - `MIN_TAG_TERM = 3` and `MAX_RESOLVED_TAGS = 10` as package-private constants are the right call — named, changeable, and visible in tests. - `LinkedHashSet<Tag>` for dedup preserves insertion order → deterministic `resolvedTags` ordering in the response. Good. ### Frontend code quality - `chip-types.ts` extraction resolves the duplication between `InterpretationChipRow.svelte` and `+page.svelte`. Clean. - The `{ key: string; type: 'theme'; tag: TagHint; label: string }` discriminated union member is consistent with the existing chip union shape. - `tagColorStyle()` helper isolating the `--c-tag-{color}` CSS variable construction keeps the template readable. ### Minor note `resolveEffectiveColors()` mutates `Tag` entities in-place. The test note "build mock Tag fresh per test" is critical — if any test shares a `Tag` instance across cases, a color-related assertion in test N could be poisoned by test N-1's mutation. Confirm each color test constructs its own fresh `Tag` objects.
Author
Owner

🚀 Tobias Wendt — DevOps / Infrastructure

Verdict: Approved

Infrastructure delta: zero

No new Compose services, no new config properties, no Flyway migrations, no changes to the rate limiter or its Bucket4j/Caffeine wiring. The TagService call is pure application-layer — no container or network topology change. Clean.

Performance note (non-blocking)

resolveTags() currently issues N sequential findByNameContaining() DB queries — one per keyword extracted by Ollama. For a query with five keywords that's five round-trips before the searchDocuments() call. In the current architecture this is invisible: Ollama's 2–15 s inference window dwarfs N × 1 ms Postgres round-trips by at least two orders of magnitude. Not a concern today.

If keyword counts ever grow substantially (say, a user sends a long comma-separated list that Ollama parses into 20+ keywords), a single IN-style batch query (WHERE LOWER(name) LIKE ANY(?)) would reduce this to one round-trip. Suggested as a future optimisation ticket rather than a blocker here.

Observability suggestion (non-blocking)

The existing log line records queryLength, resolvedPersonCount, and latencyMs. With this PR, tag resolution is now part of every search path. Adding resolvedTagCount and tagsApplied to that structured log entry would let you answer "Is tag resolution providing value in production?" from Grafana without touching the code again. One-liner addition, worth a follow-up issue.

## 🚀 Tobias Wendt — DevOps / Infrastructure **Verdict: ✅ Approved** ### Infrastructure delta: zero No new Compose services, no new config properties, no Flyway migrations, no changes to the rate limiter or its Bucket4j/Caffeine wiring. The `TagService` call is pure application-layer — no container or network topology change. Clean. ### Performance note (non-blocking) `resolveTags()` currently issues N sequential `findByNameContaining()` DB queries — one per keyword extracted by Ollama. For a query with five keywords that's five round-trips before the `searchDocuments()` call. In the current architecture this is invisible: Ollama's 2–15 s inference window dwarfs N × 1 ms Postgres round-trips by at least two orders of magnitude. Not a concern today. If keyword counts ever grow substantially (say, a user sends a long comma-separated list that Ollama parses into 20+ keywords), a single `IN`-style batch query (`WHERE LOWER(name) LIKE ANY(?)`) would reduce this to one round-trip. Suggested as a future optimisation ticket rather than a blocker here. ### Observability suggestion (non-blocking) The existing log line records `queryLength`, `resolvedPersonCount`, and `latencyMs`. With this PR, tag resolution is now part of every search path. Adding `resolvedTagCount` and `tagsApplied` to that structured log entry would let you answer "Is tag resolution providing value in production?" from Grafana without touching the code again. One-liner addition, worth a follow-up issue.
Author
Owner

📋 Elicit — Requirements Engineer

Verdict: Approved

Acceptance criteria coverage (against issue #743)

Requirement Status
Keywords matching tag names → OR-union tag filter
Unmatched keywords → remain as FTS text
Mixed (some resolve, some don't) → split correctly
resolvedTags in response → frontend theme chips
Theme chip × button removes tag filter, page reloads
personRole:any path: tagsApplied=false, resolvedTags still populated (transparency) documented in ADR-028
hadStructuredMatch guard prevents rawQuery double-apply
i18n: search_chip_theme_prefix in de / en / es
GLOSSARY updated: keyword→tag resolution, TagHint, theme chip

Traceability

ADR-028 captures the feature rationale and edge-case rules (OR semantics, personRole:any limitation, cap at MAX_RESOLVED_TAGS, MIN_TAG_TERM guard). ADR-033 is cross-referenced for the color-inheritance rules. The C4 L3 diagram reflects the new search/tag/ dependency. Requirements → implementation → docs chain is complete.

One open item to verify

ADR-028 references ADR-033 for "tag name resolution and case-collision rules." Confirm ADR-033 exists and was published as part of this PR or an earlier one — if it's still a draft, the cross-reference dangles. Non-blocking if ADR-033 was already merged with #739 or an earlier issue.

Discovery items not in original spec (implementation-discovered)

The hadStructuredMatch rawQuery fallback guard appears to have been discovered during implementation rather than specified in issue #743. This is a legitimate "emerge during TDD" finding. Worth folding a one-line note into the issue body (or the ADR) explaining the invariant — future maintainers won't know it came from a discovered edge case without that context.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** ### Acceptance criteria coverage (against issue #743) | Requirement | Status | |---|---| | Keywords matching tag names → OR-union tag filter | ✅ | | Unmatched keywords → remain as FTS text | ✅ | | Mixed (some resolve, some don't) → split correctly | ✅ | | `resolvedTags` in response → frontend theme chips | ✅ | | Theme chip × button removes tag filter, page reloads | ✅ | | `personRole:any` path: `tagsApplied=false`, `resolvedTags` still populated (transparency) | ✅ documented in ADR-028 | | `hadStructuredMatch` guard prevents rawQuery double-apply | ✅ | | i18n: `search_chip_theme_prefix` in de / en / es | ✅ | | GLOSSARY updated: keyword→tag resolution, TagHint, theme chip | ✅ | ### Traceability ADR-028 captures the feature rationale and edge-case rules (OR semantics, personRole:any limitation, cap at `MAX_RESOLVED_TAGS`, `MIN_TAG_TERM` guard). ADR-033 is cross-referenced for the color-inheritance rules. The C4 L3 diagram reflects the new `search/` → `tag/` dependency. Requirements → implementation → docs chain is complete. ### One open item to verify ADR-028 references ADR-033 for "tag name resolution and case-collision rules." Confirm ADR-033 exists and was published as part of this PR or an earlier one — if it's still a draft, the cross-reference dangles. Non-blocking if ADR-033 was already merged with #739 or an earlier issue. ### Discovery items not in original spec (implementation-discovered) The `hadStructuredMatch` rawQuery fallback guard appears to have been discovered during implementation rather than specified in issue #743. This is a legitimate "emerge during TDD" finding. Worth folding a one-line note into the issue body (or the ADR) explaining the invariant — future maintainers won't know it came from a discovered edge case without that context.
Author
Owner

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: Approved

Findings

[INFORMATIONAL] LIKE wildcard characters in keyword fragments not escaped

TagService.findByNameContaining(fragment) delegates to TagRepository.findByNameContainingIgnoreCase(fragment), which generates LOWER(name) LIKE LOWER('%' || :fragment || '%'). JPA parameterization prevents SQL injection . However, if fragment contains % or _, they are interpreted as LIKE wildcards by the database engine. A fragment of "%%k" would match any tag whose name contains k (effectively broadening the query to most of the tag table). A fragment of "__" would match any tag with exactly 2 characters.

In practice, Ollama's grammar-constrained JSON output with maxLength: 100 rarely produces these characters, and the MIN_TAG_TERM = 3 guard blocks 0–2 char fragments. The risk is low. Defense-in-depth recommendation for a follow-up:

// In TagService.findByNameContaining():
String escaped = fragment.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_");
return tagRepository.findByNameContainingIgnoreCase(escaped);
// and add escapeChar = '\\' to the @Query annotation

Not a blocker. Log as a low-priority security improvement ticket.

Confirmed safe

  • No SQL injection: JPA named parameters throughout.
  • Data exposure: TagHint contains tag id, name, color — organizational metadata, no PII.
  • Auth boundary: @RequirePermission(READ_ALL) on NlSearchController unchanged. Tag resolution operates entirely inside the existing auth perimeter.
  • PII in logs: Structured log.debug and log.warn calls use SLF4J {} placeholders, no raw query content logged.
  • Ollama output → DB input: Grammar-constrained schema + maxLength: 100 per keyword + JPA parameterization = two-layer defense.
  • Rate limiting: Unchanged — 5 NL search requests per user per minute still applies to the full pipeline including tag resolution.
## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** ### Findings **[INFORMATIONAL] LIKE wildcard characters in keyword fragments not escaped** `TagService.findByNameContaining(fragment)` delegates to `TagRepository.findByNameContainingIgnoreCase(fragment)`, which generates `LOWER(name) LIKE LOWER('%' || :fragment || '%')`. JPA parameterization prevents SQL injection ✅. However, if `fragment` contains `%` or `_`, they are interpreted as LIKE wildcards by the database engine. A fragment of `"%%k"` would match any tag whose name contains `k` (effectively broadening the query to most of the tag table). A fragment of `"__"` would match any tag with exactly 2 characters. In practice, Ollama's grammar-constrained JSON output with `maxLength: 100` rarely produces these characters, and the `MIN_TAG_TERM = 3` guard blocks 0–2 char fragments. The risk is low. Defense-in-depth recommendation for a follow-up: ```java // In TagService.findByNameContaining(): String escaped = fragment.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_"); return tagRepository.findByNameContainingIgnoreCase(escaped); // and add escapeChar = '\\' to the @Query annotation ``` Not a blocker. Log as a low-priority security improvement ticket. ### Confirmed safe - **No SQL injection**: JPA named parameters throughout. ✅ - **Data exposure**: `TagHint` contains tag `id`, `name`, `color` — organizational metadata, no PII. ✅ - **Auth boundary**: `@RequirePermission(READ_ALL)` on `NlSearchController` unchanged. Tag resolution operates entirely inside the existing auth perimeter. ✅ - **PII in logs**: Structured `log.debug` and `log.warn` calls use SLF4J `{}` placeholders, no raw query content logged. ✅ - **Ollama output → DB input**: Grammar-constrained schema + `maxLength: 100` per keyword + JPA parameterization = two-layer defense. ✅ - **Rate limiting**: Unchanged — 5 NL search requests per user per minute still applies to the full pipeline including tag resolution. ✅
Author
Owner

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: ⚠️ Approved with concerns

Test pyramid assessment

Backend unit — NlQueryParserServiceTest: Covers 12+ cases across the full decision matrix: single resolve, multi-resolve, no-match, mixed, personRole:any, cap at MAX_RESOLVED_TAGS, MIN_TAG_TERM guard, dedup, all-resolve rawQuery guard, flag independence (keywordsApplied=false AND tagsApplied=true), 1-level color inheritance, 3-level color depth (null propagation). Comprehensive.

Backend unit — TagServiceTest: Verifies findByNameContaining delegates correctly to the repository.

Backend integration — NlSearchTagResolutionIntegrationTest: @DataJpaTest + PostgresContainerConfig + Flyway — real Postgres, real schema. Covers the end-to-end descendant-expansion path.

Frontend unit — theme-chip-removal.spec.ts: 5 server-mode tests for pure function. N remaining tags → N tag params + tagOp=OR; last tag removed → no params; directional pair survives; null-color handled.

Frontend component — InterpretationChipRow.svelte.spec.ts: 6 browser tests: chips render, tagsApplied:false suppresses chips, color → inline style, null-color → no style, × click → callback with tag name, N tags → N chips.

E2E — nl-search.spec.ts: Theme chip visible assertion + removal scenario (URL asserts no tag=/tagOp=, sender UUID preserved).


Concerns

1. LIKE wildcard edge case not tested (Minor)

No test covers fragment values containing % or _. A test like:

@Test
void findByNameContaining_treats_percent_as_literal_when_escaped() {
    // if escape isn't implemented, this would match all tags
    assertThat(tagService.findByNameContaining("%%")).isEmpty();
}

...would document the current behavior and catch a future regression if escaping is added. Low risk in practice, but a gap in the boundary-condition coverage.

2. Browser test workaround not documented (Minor)

InterpretationChipRow.svelte.spec.ts runs from /tmp/fe743 due to the + character in the worktree path confusing vitest-browser's iframe URL routing. This workaround is invisible to any developer who checks out this branch on a fresh machine — they'll see test failures with no obvious cause. A // NOTE: run from /tmp/fe743 copy — see docs/... comment in the spec file, or a note in the PR description, would save the next person 30 minutes.

3. resolveEffectiveColors mutation isolation (Verify)

The plan notes "build mock Tag fresh per test (resolveEffectiveColors mutates in-place)." Confirm each color-related test case constructs its own Tag instances rather than sharing via @BeforeEach. If two tests share a Tag through a field, whichever runs second will see the first test's mutation. Not an issue if confirmed — just flag to verify in code review.

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ⚠️ Approved with concerns** ### Test pyramid assessment **Backend unit — `NlQueryParserServiceTest`**: Covers 12+ cases across the full decision matrix: single resolve, multi-resolve, no-match, mixed, `personRole:any`, cap at `MAX_RESOLVED_TAGS`, `MIN_TAG_TERM` guard, dedup, all-resolve rawQuery guard, flag independence (`keywordsApplied=false AND tagsApplied=true`), 1-level color inheritance, 3-level color depth (null propagation). Comprehensive. ✅ **Backend unit — `TagServiceTest`**: Verifies `findByNameContaining` delegates correctly to the repository. ✅ **Backend integration — `NlSearchTagResolutionIntegrationTest`**: `@DataJpaTest` + `PostgresContainerConfig` + Flyway — real Postgres, real schema. Covers the end-to-end descendant-expansion path. ✅ **Frontend unit — `theme-chip-removal.spec.ts`**: 5 server-mode tests for pure function. N remaining tags → N `tag` params + `tagOp=OR`; last tag removed → no params; directional pair survives; null-color handled. ✅ **Frontend component — `InterpretationChipRow.svelte.spec.ts`**: 6 browser tests: chips render, `tagsApplied:false` suppresses chips, color → inline style, null-color → no style, × click → callback with tag name, N tags → N chips. ✅ **E2E — `nl-search.spec.ts`**: Theme chip visible assertion + removal scenario (URL asserts no `tag=`/`tagOp=`, sender UUID preserved). ✅ --- ### Concerns **1. LIKE wildcard edge case not tested (Minor)** No test covers `fragment` values containing `%` or `_`. A test like: ```java @Test void findByNameContaining_treats_percent_as_literal_when_escaped() { // if escape isn't implemented, this would match all tags assertThat(tagService.findByNameContaining("%%")).isEmpty(); } ``` ...would document the current behavior and catch a future regression if escaping is added. Low risk in practice, but a gap in the boundary-condition coverage. **2. Browser test workaround not documented (Minor)** `InterpretationChipRow.svelte.spec.ts` runs from `/tmp/fe743` due to the `+` character in the worktree path confusing vitest-browser's iframe URL routing. This workaround is invisible to any developer who checks out this branch on a fresh machine — they'll see test failures with no obvious cause. A `// NOTE: run from /tmp/fe743 copy — see docs/...` comment in the spec file, or a note in the PR description, would save the next person 30 minutes. **3. `resolveEffectiveColors` mutation isolation (Verify)** The plan notes "build mock Tag fresh per test (resolveEffectiveColors mutates in-place)." Confirm each color-related test case constructs its own `Tag` instances rather than sharing via `@BeforeEach`. If two tests share a `Tag` through a field, whichever runs second will see the first test's mutation. Not an issue if confirmed — just flag to verify in code review.
Author
Owner

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Verdict: ⚠️ Approved with concerns

Brand and accessibility audit

Color system — Theme chips use var(--c-tag-{color}) CSS variables, consistent with the tag color tokens used in tag badges elsewhere. Null-color tags fall back to neutral chip appearance (no inline style attribute) — correct.

Chip structure — "Thema:" prefix is rendered outside the truncated name <span>, preserving screen-reader legibility ("Thema: Zweiter Weltkrieg", not "Thema: Zweiter W…").

ARIA labels — Remove button uses aria-label={m.search_filter_remove_label({ label: ... })} — localized, descriptive, screen-reader friendly.

i18n search_chip_theme_prefix present in de ("Thema"), en ("Topic"), es ("Tema").

data-chip-type="theme" attribute — Supports both accessibility tooling and test selectors.


Concerns

1. Silent tag recognition on personRole:any path (Minor — UX gap)

When a query resolves to a single person with role "any" (e.g. "Briefe von Walter über Krieg"), tagsApplied is false and no theme chips are rendered — even though "Krieg" was recognized as a known tag. From the user's perspective, the keyword disappeared silently. They typed it, Ollama understood it, but there's no visual feedback.

In a v1 context this is acceptable (documented in ADR-028), but the UX cost is real: senior users are more likely to re-type the query wondering if it was processed. A follow-up issue should explore a lightweight signal — even a muted "Thema erkannt, aber nicht angewendet" tooltip or a different chip state — for a future iteration. Non-blocking here.

2. Tag color contrast (Informational — systemic, not this PR)

--c-tag-{color} values are user-defined in the tag admin UI. If an admin defines a very light tag color (e.g. yellow/cream), the chip text may fall below WCAG AA 4.5:1. This isn't introduced by this PR — it's a pre-existing concern with the tag color system. Worth a separate audit issue for the tag admin color picker to constrain choices to accessible values.

3. Touch target on theme chip remove button (Verify)

Confirm the × dismiss button inside theme chips meets the 44×44 px minimum (WCAG 2.2, critical for the 60+ audience). If the existing keyword/date/sender chip remove buttons already have min-h-[44px] min-w-[44px] or equivalent padding, theme chips inherit this automatically — just verify consistency. if already compliant across all chip types.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist **Verdict: ⚠️ Approved with concerns** ### Brand and accessibility audit **Color system** ✅ — Theme chips use `var(--c-tag-{color})` CSS variables, consistent with the tag color tokens used in tag badges elsewhere. Null-color tags fall back to neutral chip appearance (no inline style attribute) — correct. **Chip structure** ✅ — "Thema:" prefix is rendered outside the truncated name `<span>`, preserving screen-reader legibility (`"Thema: Zweiter Weltkrieg"`, not `"Thema: Zweiter W…"`). **ARIA labels** ✅ — Remove button uses `aria-label={m.search_filter_remove_label({ label: ... })}` — localized, descriptive, screen-reader friendly. **i18n** ✅ — `search_chip_theme_prefix` present in de (`"Thema"`), en (`"Topic"`), es (`"Tema"`). **`data-chip-type="theme"` attribute** ✅ — Supports both accessibility tooling and test selectors. --- ### Concerns **1. Silent tag recognition on `personRole:any` path (Minor — UX gap)** When a query resolves to a single person with role `"any"` (e.g. "Briefe von Walter über Krieg"), `tagsApplied` is `false` and no theme chips are rendered — even though "Krieg" was recognized as a known tag. From the user's perspective, the keyword disappeared silently. They typed it, Ollama understood it, but there's no visual feedback. In a v1 context this is acceptable (documented in ADR-028), but the UX cost is real: senior users are more likely to re-type the query wondering if it was processed. A follow-up issue should explore a lightweight signal — even a muted "Thema erkannt, aber nicht angewendet" tooltip or a different chip state — for a future iteration. Non-blocking here. **2. Tag color contrast (Informational — systemic, not this PR)** `--c-tag-{color}` values are user-defined in the tag admin UI. If an admin defines a very light tag color (e.g. yellow/cream), the chip text may fall below WCAG AA 4.5:1. This isn't introduced by this PR — it's a pre-existing concern with the tag color system. Worth a separate audit issue for the tag admin color picker to constrain choices to accessible values. **3. Touch target on theme chip remove button (Verify)** Confirm the `×` dismiss button inside theme chips meets the 44×44 px minimum (WCAG 2.2, critical for the 60+ audience). If the existing keyword/date/sender chip remove buttons already have `min-h-[44px] min-w-[44px]` or equivalent padding, theme chips inherit this automatically — just verify consistency. ✅ if already compliant across all chip types.
marcel added 2 commits 2026-06-07 01:00:03 +02:00
Addresses @Markus review: tags fetched by findByNameContaining live outside
any transaction; Hibernate's dirty-check never fires on them. The comment
removes the ambiguity for cold readers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs(search): note vitest-browser workaround for + in path
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m22s
CI / OCR Service Tests (pull_request) Successful in 24s
CI / Backend Unit Tests (pull_request) Successful in 3m47s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 24s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
fb41affd4c
Addresses @Sara review: browser tests in this spec fail silently when
the project path contains '+' (common in git worktrees). The comment
tells developers to copy the frontend directory to a clean path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

Review follow-up — all concerns addressed

Changes committed

dc366ed4docs(search): add detached-entity safety comment in resolveTags

  • Adds // safe: entities are detached here; mutation is for DTO projection only, no dirty-check fires at the tagService.resolveEffectiveColors(capped) call site in NlQueryParserService.resolveTags()
  • Addresses @Markus: "a reader arriving cold could wonder whether Hibernate might pick up the mutation"

fb41affddocs(search): note vitest-browser workaround for + in path

  • Adds a 3-line NOTE comment at the top of InterpretationChipRow.svelte.spec.ts explaining that vitest-browser fails silently when the project path contains +, and how to work around it
  • Addresses @Sara concern 2: "invisible to any developer who checks out this branch on a fresh machine"

Verified OK — no code change needed

Concern Status
ADR-033 exists and cross-reference is valid 033-tag-name-resolution-tolerates-case-collisions.md confirmed
Touch target: × button on theme chips meets 44px height removeButton class has min-h-[44px] — consistent with all other chip types
resolveEffectiveColors mutation isolation in color tests Each test constructs its own Tag instance via the tag() builder factory

Follow-up issues filed

Issue Reviewer Priority
#766 feat(search): escape LIKE wildcards in TagService.findByNameContaining @Nora, @Sara P3-later / security
#767 feat(search): add resolvedTagCount and tagsApplied to NL search structured log @Tobias P3-later / feature
#768 feat(search): show recognized-but-unapplied theme chips on personRole:any path @Leonie P2-medium / ui
## Review follow-up — all concerns addressed ### Changes committed **`dc366ed4` — `docs(search): add detached-entity safety comment in resolveTags`** - Adds `// safe: entities are detached here; mutation is for DTO projection only, no dirty-check fires` at the `tagService.resolveEffectiveColors(capped)` call site in `NlQueryParserService.resolveTags()` - Addresses @Markus: "a reader arriving cold could wonder whether Hibernate might pick up the mutation" **`fb41affd` — `docs(search): note vitest-browser workaround for + in path`** - Adds a 3-line NOTE comment at the top of `InterpretationChipRow.svelte.spec.ts` explaining that vitest-browser fails silently when the project path contains `+`, and how to work around it - Addresses @Sara concern 2: "invisible to any developer who checks out this branch on a fresh machine" ### Verified OK — no code change needed | Concern | Status | |---|---| | ADR-033 exists and cross-reference is valid | ✅ `033-tag-name-resolution-tolerates-case-collisions.md` confirmed | | Touch target: `×` button on theme chips meets 44px height | ✅ `removeButton` class has `min-h-[44px]` — consistent with all other chip types | | `resolveEffectiveColors` mutation isolation in color tests | ✅ Each test constructs its own `Tag` instance via the `tag()` builder factory | ### Follow-up issues filed | Issue | Reviewer | Priority | |---|---|---| | [#766 feat(search): escape LIKE wildcards in TagService.findByNameContaining](https://git.raddatz.cloud/marcel/familienarchiv/issues/766) | @Nora, @Sara | P3-later / security | | [#767 feat(search): add resolvedTagCount and tagsApplied to NL search structured log](https://git.raddatz.cloud/marcel/familienarchiv/issues/767) | @Tobias | P3-later / feature | | [#768 feat(search): show recognized-but-unapplied theme chips on personRole:any path](https://git.raddatz.cloud/marcel/familienarchiv/issues/768) | @Leonie | P2-medium / ui |
marcel added 1 commit 2026-06-07 08:51:12 +02:00
merge: resolve conflicts with origin/main (#763 person name-match integration)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m31s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m48s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
CI / Unit & Component Tests (push) Successful in 3m20s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 3m48s
CI / fail2ban Regex (push) Successful in 46s
CI / Semgrep Security Scan (push) Successful in 23s
CI / Compose Bucket Idempotency (push) Successful in 1m8s
6878419156
- Drop unused MAX_CANDIDATES constant (not referenced in service)
- Keep detached-entity safety comment in resolveTags()
- Add 3 new partial-name match tests (23a/b/c) from #763
- Use resolveByName() API in test 28 (replaces findByDisplayNameContaining)
- Add NameMatches glossary entry from #763

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
marcel merged commit 6878419156 into main 2026-06-07 09:33:26 +02:00
marcel deleted branch feat/issue-743-nl-search-tag-resolution 2026-06-07 09:33:26 +02:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#765