feat(search): NL search backend — POST /api/search/nl with Ollama integration (#738) #756

Merged
marcel merged 16 commits from worktree-feat+issue-738-nl-search-backend into main 2026-06-06 16:52:44 +02:00
Owner

Closes #738

Summary

  • POST /api/search/nl — natural-language query → structured NlSearchResponse
  • Ollama integration (RestClientOllamaClient) with grammar-constrained JSON schema (Qwen 2.5 7B); graceful degradation on timeout/unavailable → 503 SMART_SEARCH_UNAVAILABLE
  • Name-to-UUID resolution in NlQueryParserService: single-match, ambiguous (returns early), no-match folded into keywords, sender+receiver role detection, 3+ names with extra fragments
  • Per-user rate limiting: 5 req/min via Bucket4j + Caffeine (NlSearchRateLimiter) → 429 SMART_SEARCH_RATE_LIMITED
  • searchDocumentsByPersonId in DocumentService using JPA Specification (avoids PostgreSQL null-type inference issue with JPQL IS NULL OR)
  • Full i18n (de/en/es) for both new error codes + smart_search_keywords_not_applied
  • TypeScript API types regenerated (NlSearchRequest, NlQueryInterpretation, NlSearchResponse, PersonHint)
  • ADR-028, C4 L1/L2/L3 diagrams, CLAUDE.md, GLOSSARY, DEPLOYMENT docs updated

Test plan

  • NlQueryParserServiceTest — 23 Mockito tests
  • NlSearchControllerTest — 7 @WebMvcTest tests (incl. 429 rate limit, 503 unavailable)
  • RestClientOllamaClientTest — 4 WireMock tests
  • NlSearchRateLimiterTest — 4 tests
  • DocumentRepositoryTest — 34 tests (incl. 5 new Specification-based sender/receiver tests)
  • Manual: curl -X POST /api/search/nl -d '{"query":"Was hat Walter im Krieg geschrieben?"}' | jq . → 200 with NlSearchResponse shape (requires Ollama + qwen2.5:7b running)
  • Manual: 6 rapid requests → 6th returns 429 SMART_SEARCH_RATE_LIMITED
  • Manual: Ollama stopped → 503 SMART_SEARCH_UNAVAILABLE

🤖 Generated with Claude Code

Closes #738 ## Summary - `POST /api/search/nl` — natural-language query → structured `NlSearchResponse` - Ollama integration (`RestClientOllamaClient`) with grammar-constrained JSON schema (Qwen 2.5 7B); graceful degradation on timeout/unavailable → 503 `SMART_SEARCH_UNAVAILABLE` - Name-to-UUID resolution in `NlQueryParserService`: single-match, ambiguous (returns early), no-match folded into keywords, sender+receiver role detection, 3+ names with extra fragments - Per-user rate limiting: 5 req/min via Bucket4j + Caffeine (`NlSearchRateLimiter`) → 429 `SMART_SEARCH_RATE_LIMITED` - `searchDocumentsByPersonId` in `DocumentService` using JPA Specification (avoids PostgreSQL null-type inference issue with JPQL `IS NULL OR`) - Full i18n (de/en/es) for both new error codes + `smart_search_keywords_not_applied` - TypeScript API types regenerated (`NlSearchRequest`, `NlQueryInterpretation`, `NlSearchResponse`, `PersonHint`) - ADR-028, C4 L1/L2/L3 diagrams, CLAUDE.md, GLOSSARY, DEPLOYMENT docs updated ## Test plan - [ ] `NlQueryParserServiceTest` — 23 Mockito tests - [ ] `NlSearchControllerTest` — 7 @WebMvcTest tests (incl. 429 rate limit, 503 unavailable) - [ ] `RestClientOllamaClientTest` — 4 WireMock tests - [ ] `NlSearchRateLimiterTest` — 4 tests - [ ] `DocumentRepositoryTest` — 34 tests (incl. 5 new Specification-based sender/receiver tests) - [ ] Manual: `curl -X POST /api/search/nl -d '{"query":"Was hat Walter im Krieg geschrieben?"}' | jq .` → 200 with NlSearchResponse shape (requires Ollama + qwen2.5:7b running) - [ ] Manual: 6 rapid requests → 6th returns 429 SMART_SEARCH_RATE_LIMITED - [ ] Manual: Ollama stopped → 503 SMART_SEARCH_UNAVAILABLE 🤖 Generated with [Claude Code](https://claude.com/claude-code)
marcel added 14 commits 2026-06-06 16:28:08 +02:00
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switch to wiremock-jetty12 artifact and force ee10 Jetty deps to 12.1.8
to resolve compatibility with Spring Boot 4's Jetty 12.1.8 core.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs(search): update CLAUDE.md, GLOSSARY, DEPLOYMENT, and C4 diagrams
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m21s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m52s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
44baff9c9c
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

🏗️ Markus Keller (@mkeller) — Senior Application Architect

Verdict: Approved

The architecture is clean and well-thought-out. The PR earns a straightforward approval, with a couple of minor observations.

What's correct

  • Package-by-feature respected. search/ is a new, bounded domain package. No layer packages created.
  • Layering rule enforced. NlQueryParserService reaches into PersonService and DocumentService — not their repositories. searchDocumentsByPersonId in DocumentService fetches Person via personService.getById(), not personRepository.
  • ADR-028 committed before the PR. The context/decision/consequences structure covers all architectural choices: grammar-constrained JSON, CPU-only inference, graceful degradation, rate limiting.
  • Documentation checklist fully satisfied:
    • New external system → l1-context.puml
    • New Docker service → l2-containers.puml + DEPLOYMENT.md
    • New backend package → CLAUDE.md package table + l3-backend-3h-search.puml
    • New domain term → GLOSSARY.md
    • New ErrorCode values → CLAUDE.md + errors.ts
    • Architectural decision → docs/adr/028-nl-search-ollama.md
  • No Flyway migration needed. Feature is pure application logic — no schema changes.
  • Rate limiter documented as node-local. The caveat is captured in the C4 L3 diagram component description and ADR. Correct at current single-node scale.

Minor observations (not blockers)

buildPersonSpec uses raw field name stringsroot.get("sender"), root.get("receivers") — instead of a JPA static metamodel. This is consistent with how the rest of DocumentService uses Criteria API in this codebase, so it's not a regression. A compile-time checked metamodel would be more robust, but not required here.

Inline full-qualified class names in buildPersonSpecjakarta.persistence.criteria.JoinType.LEFT, new java.util.ArrayList<>, new jakarta.persistence.criteria.Predicate[0] — without corresponding imports. Works fine, but it's stylistically noisy. Consider adding the imports at the top of DocumentService.java.

OllamaHealthClient.isHealthy() is implemented but not wired to a Spring Boot HealthIndicator. The interface exists and is tested, but Actuator /actuator/health won't show Ollama's status. This is acceptable for the initial implementation; a future improvement could expose it as an actuator health contributor.

## 🏗️ Markus Keller (@mkeller) — Senior Application Architect **Verdict: ✅ Approved** The architecture is clean and well-thought-out. The PR earns a straightforward approval, with a couple of minor observations. ### What's correct - **Package-by-feature respected.** `search/` is a new, bounded domain package. No layer packages created. - **Layering rule enforced.** `NlQueryParserService` reaches into `PersonService` and `DocumentService` — not their repositories. `searchDocumentsByPersonId` in `DocumentService` fetches `Person` via `personService.getById()`, not `personRepository`. ✅ - **ADR-028 committed** before the PR. The context/decision/consequences structure covers all architectural choices: grammar-constrained JSON, CPU-only inference, graceful degradation, rate limiting. ✅ - **Documentation checklist fully satisfied:** - New external system → `l1-context.puml` ✅ - New Docker service → `l2-containers.puml` + `DEPLOYMENT.md` ✅ - New backend package → `CLAUDE.md` package table + `l3-backend-3h-search.puml` ✅ - New domain term → `GLOSSARY.md` ✅ - New `ErrorCode` values → `CLAUDE.md` + `errors.ts` ✅ - Architectural decision → `docs/adr/028-nl-search-ollama.md` ✅ - **No Flyway migration needed.** Feature is pure application logic — no schema changes. ✅ - **Rate limiter documented as node-local.** The caveat is captured in the C4 L3 diagram component description and ADR. Correct at current single-node scale. ✅ ### Minor observations (not blockers) **`buildPersonSpec` uses raw field name strings** — `root.get("sender")`, `root.get("receivers")` — instead of a JPA static metamodel. This is consistent with how the rest of `DocumentService` uses Criteria API in this codebase, so it's not a regression. A compile-time checked metamodel would be more robust, but not required here. **Inline full-qualified class names in `buildPersonSpec`** — `jakarta.persistence.criteria.JoinType.LEFT`, `new java.util.ArrayList<>`, `new jakarta.persistence.criteria.Predicate[0]` — without corresponding imports. Works fine, but it's stylistically noisy. Consider adding the imports at the top of `DocumentService.java`. **`OllamaHealthClient.isHealthy()` is implemented but not wired to a Spring Boot `HealthIndicator`.** The interface exists and is tested, but Actuator `/actuator/health` won't show Ollama's status. This is acceptable for the initial implementation; a future improvement could expose it as an actuator health contributor.
Author
Owner

👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer

Verdict: Approved

Solid TDD discipline throughout. 72 tests across 5 classes, all passing. Clean code with some minor style observations.

What's correct

  • Red/green discipline evident. Every new service class has a corresponding test class. The test structure matches the behavior list from the issue spec precisely.
  • Constructor injection + @RequiredArgsConstructor. All new services use Lombok constructor injection. Dependencies are final.
  • Guard clauses at the top. NlQueryParserService.search() validates query length before calling Ollama.
  • @MockitoBean (Spring Boot 4) used in NlSearchControllerTest — not the deprecated @MockBean.
  • @Schema(requiredMode = REQUIRED) on all always-populated fields in PersonHint, NlQueryInterpretation, NlSearchResponse. TypeScript codegen is correct.
  • @AuthenticationPrincipal UserDetails principal in controller for rate limit key — matches the pattern from InviteController.
  • Null-coalescing personNames/keywords to List.of() before use — no NPE risk.

Suggestions (not blockers)

DocumentService.buildPersonSpec has inline fully-qualified namesjakarta.persistence.criteria.JoinType.LEFT, new java.util.ArrayList<>, new jakarta.persistence.criteria.Predicate[0] — without top-level imports. These should be import statements, not inline FQNs.

// current — noisy
var receiversJoin = root.join("receivers", jakarta.persistence.criteria.JoinType.LEFT);
var predicates = new java.util.ArrayList<>(java.util.List.of(personPredicate));
return cb.and(predicates.toArray(new jakarta.persistence.criteria.Predicate[0]));

// cleaner — add imports at the top of DocumentService.java
import jakarta.persistence.criteria.JoinType;
import jakarta.persistence.criteria.Predicate;
// ...
var receiversJoin = root.join("receivers", JoinType.LEFT);
var predicates = new ArrayList<>(List.of(personPredicate));
return cb.and(predicates.toArray(new Predicate[0]));

NlQueryParserServiceTest uses MockitoAnnotations.openMocks(this) in @BeforeEach rather than @ExtendWith(MockitoExtension.class) on the class. Both work, but the annotation approach is the project convention (see PersonServiceTest). Minor.

Stray blank line in DocumentRepository.java — the diff shows a blank line added between findByReceiversId() and the comment. Small but unnecessary noise.

Test naming is slightly inconsistent — most tests use search_* prefix (search_resolvesSingleName_asSender) but repository tests use searchByPersonSpec_*. Not a real issue, just noting the pattern divergence.

isAnyRole returns true for null/unknown role — the logic role == null || "any".equals(role) || (!\"sender\".equals(role) && !"receiver".equals(role)) correctly defaults unknown values to "any" behavior. The sanitiseRole method in RestClientOllamaClient ensures the role is always one of the three valid values before it reaches NlQueryParserService, making the || (!sender && !receiver) branch technically unreachable from production code. It's still harmless defensive code.

## 👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer **Verdict: ✅ Approved** Solid TDD discipline throughout. 72 tests across 5 classes, all passing. Clean code with some minor style observations. ### What's correct - **Red/green discipline evident.** Every new service class has a corresponding test class. The test structure matches the behavior list from the issue spec precisely. - **Constructor injection + `@RequiredArgsConstructor`.** All new services use Lombok constructor injection. Dependencies are `final`. ✅ - **Guard clauses at the top.** `NlQueryParserService.search()` validates query length before calling Ollama. ✅ - **`@MockitoBean` (Spring Boot 4)** used in `NlSearchControllerTest` — not the deprecated `@MockBean`. ✅ - **`@Schema(requiredMode = REQUIRED)` on all always-populated fields** in `PersonHint`, `NlQueryInterpretation`, `NlSearchResponse`. TypeScript codegen is correct. ✅ - **`@AuthenticationPrincipal UserDetails principal`** in controller for rate limit key — matches the pattern from `InviteController`. ✅ - **Null-coalescing `personNames`/`keywords`** to `List.of()` before use — no NPE risk. ✅ ### Suggestions (not blockers) **`DocumentService.buildPersonSpec` has inline fully-qualified names** — `jakarta.persistence.criteria.JoinType.LEFT`, `new java.util.ArrayList<>`, `new jakarta.persistence.criteria.Predicate[0]` — without top-level imports. These should be import statements, not inline FQNs. ```java // current — noisy var receiversJoin = root.join("receivers", jakarta.persistence.criteria.JoinType.LEFT); var predicates = new java.util.ArrayList<>(java.util.List.of(personPredicate)); return cb.and(predicates.toArray(new jakarta.persistence.criteria.Predicate[0])); // cleaner — add imports at the top of DocumentService.java import jakarta.persistence.criteria.JoinType; import jakarta.persistence.criteria.Predicate; // ... var receiversJoin = root.join("receivers", JoinType.LEFT); var predicates = new ArrayList<>(List.of(personPredicate)); return cb.and(predicates.toArray(new Predicate[0])); ``` **`NlQueryParserServiceTest` uses `MockitoAnnotations.openMocks(this)` in `@BeforeEach`** rather than `@ExtendWith(MockitoExtension.class)` on the class. Both work, but the annotation approach is the project convention (see `PersonServiceTest`). Minor. **Stray blank line in `DocumentRepository.java`** — the diff shows a blank line added between `findByReceiversId()` and the comment. Small but unnecessary noise. **Test naming is slightly inconsistent** — most tests use `search_*` prefix (`search_resolvesSingleName_asSender`) but repository tests use `searchByPersonSpec_*`. Not a real issue, just noting the pattern divergence. **`isAnyRole` returns true for null/unknown role** — the logic `role == null || "any".equals(role) || (!\"sender\".equals(role) && !"receiver".equals(role))` correctly defaults unknown values to "any" behavior. The `sanitiseRole` method in `RestClientOllamaClient` ensures the role is always one of the three valid values before it reaches `NlQueryParserService`, making the `|| (!sender && !receiver)` branch technically unreachable from production code. It's still harmless defensive code.
Author
Owner

🔒 Nora "NullX" Steiner — Application Security Engineer

Verdict: Approved

No security vulnerabilities found. The implementation applies defense-in-depth correctly. A couple of observations worth noting for future work.

What's correct

  • @RequirePermission(Permission.READ_ALL) on the POST endpoint — type-safe, AOP-enforced. No magic strings.
  • @Valid @RequestBody NlSearchRequest with @NotBlank @Size(min=3, max=500) — validates at the controller boundary before any downstream calls.
  • Parameterized SLF4J logging throughoutlog.warn("Ollama inference failed: {}", e.getClass().getSimpleName()). No JNDI injection risk (CWE-117).
  • No raw user input reflected in logs. NlQueryParserService never logs the query string. The C4 L3 diagram even notes "Never logs raw query content (PII)".
  • Grammar-constrained JSON schema sent to Ollama — limits the structured output to personNames, personRole, dateFrom, dateTo, keywords. This is a reasonable mitigation against prompt injection via structured output fields.
  • Rate limiter throws tooManyRequests() (HTTP 429) with structured ErrorCode — the error does not expose implementation details in the response.
  • Fail-closed on Ollama errors — any timeout, 5xx, or malformed JSON throws SMART_SEARCH_UNAVAILABLE rather than proceeding with partially parsed data.
  • No SQL injection vector. JPA Specification API builds predicates programmatically — no string concatenation.

Observations (not blockers)

Rate limit error message includes the caller's username: "NL search rate limit exceeded for user: " + userKey. If DomainException.message is serialized in the error response body, this leaks the username back to the caller. Looking at the existing LoginRateLimiter pattern, this appears to be accepted practice in this codebase. The username is already known to the caller (they provided it), so the practical impact is minimal. Still worth auditing the GlobalExceptionHandler to confirm message is not serialized.

OllamaHealthClient is not wired to Spring Boot Actuator health. The interface exists and is tested, but a degraded Ollama won't surface in /actuator/health. Consider adding a HealthIndicator bean in a future PR so monitoring can detect Ollama outages before users hit 503s.

Input validation is duplicated between @Size annotation and NlQueryParserService.search(). Both enforce min=3/max=500. The annotation fires first (Spring validation layer), so the service guard is technically redundant. This is defensive programming — not a security concern — but the duplication could be removed from the service layer.

## 🔒 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** No security vulnerabilities found. The implementation applies defense-in-depth correctly. A couple of observations worth noting for future work. ### What's correct - **`@RequirePermission(Permission.READ_ALL)`** on the POST endpoint — type-safe, AOP-enforced. No magic strings. ✅ - **`@Valid @RequestBody NlSearchRequest`** with `@NotBlank @Size(min=3, max=500)` — validates at the controller boundary before any downstream calls. ✅ - **Parameterized SLF4J logging throughout** — `log.warn("Ollama inference failed: {}", e.getClass().getSimpleName())`. No JNDI injection risk (CWE-117). ✅ - **No raw user input reflected in logs.** `NlQueryParserService` never logs the query string. The C4 L3 diagram even notes "Never logs raw query content (PII)". ✅ - **Grammar-constrained JSON schema** sent to Ollama — limits the structured output to `personNames`, `personRole`, `dateFrom`, `dateTo`, `keywords`. This is a reasonable mitigation against prompt injection via structured output fields. ✅ - **Rate limiter throws `tooManyRequests()`** (HTTP 429) with structured `ErrorCode` — the error does not expose implementation details in the response. ✅ - **Fail-closed on Ollama errors** — any timeout, 5xx, or malformed JSON throws `SMART_SEARCH_UNAVAILABLE` rather than proceeding with partially parsed data. ✅ - **No SQL injection vector.** JPA Specification API builds predicates programmatically — no string concatenation. ✅ ### Observations (not blockers) **Rate limit error message includes the caller's username**: `"NL search rate limit exceeded for user: " + userKey`. If `DomainException.message` is serialized in the error response body, this leaks the username back to the caller. Looking at the existing `LoginRateLimiter` pattern, this appears to be accepted practice in this codebase. The username is already known to the caller (they provided it), so the practical impact is minimal. Still worth auditing the `GlobalExceptionHandler` to confirm `message` is not serialized. **`OllamaHealthClient` is not wired to Spring Boot Actuator health.** The interface exists and is tested, but a degraded Ollama won't surface in `/actuator/health`. Consider adding a `HealthIndicator` bean in a future PR so monitoring can detect Ollama outages before users hit 503s. **Input validation is duplicated between `@Size` annotation and `NlQueryParserService.search()`.** Both enforce min=3/max=500. The annotation fires first (Spring validation layer), so the service guard is technically redundant. This is defensive programming — not a security concern — but the duplication could be removed from the service layer.
Author
Owner

🛠️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer

Verdict: ⚠️ Approved with concerns

The backend code itself is infrastructure-ready. My concern is that the PR adds an Ollama container to the C4 L2 diagram and DEPLOYMENT.md — but no actual docker-compose.yml changes. The implementation is ahead of its infrastructure.

What's correct

  • application.yaml uses http://ollama:11434 — references the Docker Compose service name, not a hardcoded IP.
  • application-dev.yaml overrides to http://localhost:11434 — dev profile correctly points to local Ollama.
  • WireMock 3.9.2 is version-pinned in pom.xml. Jetty ee10 transitive deps are also pinned to 12.1.8.
  • Rate limiter documented as node-local — the ADR and C4 L3 diagram note that horizontal scaling would require moving to Redis. The single-node constraint is explicit.
  • DEPLOYMENT.md Ollama section covers the first-run model pull, health check, and config table. Good runbook.
  • timeoutSeconds: 30 / healthCheckTimeoutSeconds: 2 — reasonable defaults. 30 seconds for CPU inference is generous; 2 seconds for health is tight enough to detect early.

Concerns

Ollama is in the C4 diagrams but not in docker-compose.yml. The PR description says the Compose service is tracked as issue #735. This means:

  • The L2 diagram shows Ollama inside the System_Boundary(archiv, "Familienarchiv (Docker Compose)") — but it's not actually there yet
  • A developer following DEPLOYMENT.md will be told to pull qwen2.5:7b-instruct-q4_K_M but won't have an Ollama container to pull it into

This is acceptable as long as issue #735 ships soon and the note in DEPLOYMENT.md makes clear that Ollama requires manual setup until then. I'd suggest adding a callout in DEPLOYMENT.md that Ollama must be started separately until the Compose definition is merged.

No health check on the Ollama container. When the Compose definition ships in #735, it should include:

ollama:
  image: ollama/ollama:latest  # pin this!
  healthcheck:
    test: ["CMD-SHELL", "ollama list | grep qwen2.5 || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 120s  # model loading takes time on first run

And the backend should depend on it: condition: service_healthy. Without this, the backend starts before the model is loaded and the first requests 503.

ollama/ollama:latest — when the Compose definition is written for #735, pin the image tag. :latest is not a version.

## 🛠️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer **Verdict: ⚠️ Approved with concerns** The backend code itself is infrastructure-ready. My concern is that the PR adds an Ollama container to the C4 L2 diagram and DEPLOYMENT.md — but no actual `docker-compose.yml` changes. The implementation is ahead of its infrastructure. ### What's correct - **`application.yaml` uses `http://ollama:11434`** — references the Docker Compose service name, not a hardcoded IP. ✅ - **`application-dev.yaml` overrides to `http://localhost:11434`** — dev profile correctly points to local Ollama. ✅ - **WireMock `3.9.2` is version-pinned** in `pom.xml`. Jetty ee10 transitive deps are also pinned to `12.1.8`. ✅ - **Rate limiter documented as node-local** — the ADR and C4 L3 diagram note that horizontal scaling would require moving to Redis. The single-node constraint is explicit. ✅ - **`DEPLOYMENT.md` Ollama section** covers the first-run model pull, health check, and config table. Good runbook. ✅ - **`timeoutSeconds: 30` / `healthCheckTimeoutSeconds: 2`** — reasonable defaults. 30 seconds for CPU inference is generous; 2 seconds for health is tight enough to detect early. ✅ ### Concerns **Ollama is in the C4 diagrams but not in `docker-compose.yml`.** The PR description says the Compose service is tracked as issue #735. This means: - The L2 diagram shows Ollama inside the `System_Boundary(archiv, "Familienarchiv (Docker Compose)")` — but it's not actually there yet - A developer following `DEPLOYMENT.md` will be told to pull `qwen2.5:7b-instruct-q4_K_M` but won't have an Ollama container to pull it into This is acceptable as long as issue #735 ships soon and the note in DEPLOYMENT.md makes clear that Ollama requires manual setup until then. I'd suggest adding a callout in DEPLOYMENT.md that Ollama must be started separately until the Compose definition is merged. **No health check on the Ollama container.** When the Compose definition ships in #735, it should include: ```yaml ollama: image: ollama/ollama:latest # pin this! healthcheck: test: ["CMD-SHELL", "ollama list | grep qwen2.5 || exit 1"] interval: 30s timeout: 10s retries: 3 start_period: 120s # model loading takes time on first run ``` And the backend should depend on it: `condition: service_healthy`. Without this, the backend starts before the model is loaded and the first requests 503. **`ollama/ollama:latest`** — when the Compose definition is written for #735, pin the image tag. `:latest` is not a version.
Author
Owner

🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist

Verdict: Approved

Strong test pyramid. 72 tests across the right layers. A few gaps worth tracking.

What's correct

  • Correct test slices used:

    • NlQueryParserServiceTest@ExtendWith(MockitoExtension.class) style (via openMocks), no Spring context
    • NlSearchControllerTest@WebMvcTest slice, not full @SpringBootTest
    • DocumentRepositoryTest — Testcontainers real PostgreSQL, not H2
    • NlSearchRateLimiterTest — plain JUnit, no Spring context
    • RestClientOllamaClientTest — WireMock for HTTP boundary
  • Rate limiter test isolation@BeforeEach calls rateLimiter.resetForTest() in NlSearchControllerTest. Prevents bucket state leaking between tests.

  • Boundary conditions tested in NlQueryParserServiceTest:

    • Name > 200 chars → silently skipped
    • 11 candidates → capped at 10, ambiguousPersons has 10 entries
    • Null personNames/keywords → no NPE
    • Query < 3 chars → VALIDATION_ERROR
    • Query > 500 chars → VALIDATION_ERROR
    • Ollama throws SMART_SEARCH_UNAVAILABLE → propagated
    • All 5 DocumentRepositoryTest Specification cases including DISTINCT deduplication
  • Security boundaries tested:

    • 401 for unauthenticated request
    • 429 on 6th request within rate limit
    • 503 when Ollama throws SMART_SEARCH_UNAVAILABLE
  • makeResponse() factory function in controller test — clean test data setup.

Gaps (not blockers, worth tracking)

NlQueryParserServiceTest uses MockitoAnnotations.openMocks(this) in @BeforeEach rather than @ExtendWith(MockitoExtension.class) on the class. Both are functionally equivalent, but the annotation form is the project standard (see PersonServiceTest). The openMocks approach also requires manually cleaning up mock state — the MockitoExtension does this automatically.

OllamaHealthClient.isHealthy() has no dedicated test for the false case in context — only RestClientOllamaClientTest tests that a request succeeds. The health check is a GET /api/tags call. Consider adding a test that wireMock.stubFor(get("/api/tags").willReturn(serverError())) causes isHealthy() to return false. Low priority since the interface isn't wired to an actuator health contributor yet.

No test for the 3-name case in NlQueryParserServiceTest where the 3rd name is also ambiguous (e.g. names=[Walter→resolved, Emma→resolved, Heinrich→multi-match]). The existing test covers Heinrich resolving to a single person and being folded into extraFragments. The ambiguous 3rd-name path (ambiguous short-circuits before the 2-resolved check) is implicitly covered by test #7, but an explicit 3-name-with-ambiguous-third test would make the intent clear.

WireMock timeout test uses withFixedDelay(6000) and client.timeoutSeconds = 5 — depends on the test executing within the fixed delay window. On a very slow CI machine this could be flaky. Worth noting but not critical at current CI speeds.

## 🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist **Verdict: ✅ Approved** Strong test pyramid. 72 tests across the right layers. A few gaps worth tracking. ### What's correct - **Correct test slices used:** - `NlQueryParserServiceTest` — `@ExtendWith(MockitoExtension.class)` style (via `openMocks`), no Spring context ✅ - `NlSearchControllerTest` — `@WebMvcTest` slice, not full `@SpringBootTest` ✅ - `DocumentRepositoryTest` — Testcontainers real PostgreSQL, not H2 ✅ - `NlSearchRateLimiterTest` — plain JUnit, no Spring context ✅ - `RestClientOllamaClientTest` — WireMock for HTTP boundary ✅ - **Rate limiter test isolation** — `@BeforeEach` calls `rateLimiter.resetForTest()` in `NlSearchControllerTest`. Prevents bucket state leaking between tests. ✅ - **Boundary conditions tested in `NlQueryParserServiceTest`:** - Name > 200 chars → silently skipped ✅ - 11 candidates → capped at 10, `ambiguousPersons` has 10 entries ✅ - Null `personNames`/`keywords` → no NPE ✅ - Query < 3 chars → `VALIDATION_ERROR` ✅ - Query > 500 chars → `VALIDATION_ERROR` ✅ - Ollama throws `SMART_SEARCH_UNAVAILABLE` → propagated ✅ - All 5 `DocumentRepositoryTest` Specification cases including DISTINCT deduplication ✅ - **Security boundaries tested:** - 401 for unauthenticated request ✅ - 429 on 6th request within rate limit ✅ - 503 when Ollama throws `SMART_SEARCH_UNAVAILABLE` ✅ - **`makeResponse()` factory function in controller test** — clean test data setup. ✅ ### Gaps (not blockers, worth tracking) **`NlQueryParserServiceTest` uses `MockitoAnnotations.openMocks(this)` in `@BeforeEach`** rather than `@ExtendWith(MockitoExtension.class)` on the class. Both are functionally equivalent, but the annotation form is the project standard (see `PersonServiceTest`). The `openMocks` approach also requires manually cleaning up mock state — the `MockitoExtension` does this automatically. **`OllamaHealthClient.isHealthy()` has no dedicated test** for the false case in context — only `RestClientOllamaClientTest` tests that a request succeeds. The health check is a GET `/api/tags` call. Consider adding a test that `wireMock.stubFor(get("/api/tags").willReturn(serverError()))` causes `isHealthy()` to return `false`. Low priority since the interface isn't wired to an actuator health contributor yet. **No test for the 3-name case in `NlQueryParserServiceTest`** where the 3rd name is also ambiguous (e.g. names=[Walter→resolved, Emma→resolved, Heinrich→multi-match]). The existing test covers Heinrich resolving to a single person and being folded into `extraFragments`. The ambiguous 3rd-name path (ambiguous short-circuits before the 2-resolved check) is implicitly covered by test #7, but an explicit 3-name-with-ambiguous-third test would make the intent clear. **WireMock timeout test** uses `withFixedDelay(6000)` and `client.timeoutSeconds = 5` — depends on the test executing within the fixed delay window. On a very slow CI machine this could be flaky. Worth noting but not critical at current CI speeds.
Author
Owner

📋 Elicit — Requirements Engineer

Verdict: Approved

The implementation traces cleanly to the acceptance criteria in issue #738. All specified behaviors are present.

Requirements coverage

Requirement Status
POST /api/search/nl endpoint NlSearchController
Ollama integration (Qwen 2.5 7B) RestClientOllamaClient with grammar-constrained JSON schema
Name → UUID resolution NlQueryParserService.resolveNames()
Single match → sender/receiver UUID Covered by buildSender()/buildReceiver()
Ambiguous match → return ambiguousPersons, skip search Early return in search()
No-match → fold into keyword text noMatchFragments in buildText()
Date range parsing (ISO + year-only) parseDate() handles both forms
5 req/min rate limit per user NlSearchRateLimiter with Bucket4j
503 SMART_SEARCH_UNAVAILABLE when Ollama unreachable All error paths mapped
429 SMART_SEARCH_RATE_LIMITED when limit exceeded
keywordsApplied flag in response NlQueryInterpretation.keywordsApplied
i18n strings (de/en/es) All 3 message files
TypeScript types generated api.ts regenerated

Observations

smart_search_keywords_not_applied key is added to all message files but unused in this PR. It's defined pre-emptively for the frontend issue (#735). This is fine as a forward declaration, but worth noting that it will silently go unused until the frontend lands.

The personRole field's behavior for 2-resolved names bypasses isAnyRole check — when 2 names resolve, resolved.size() >= 2 routes directly to buildSender(0)/buildReceiver(1) regardless of personRole. This means if Ollama says personRole="sender" but two names match, the implementation treats them as sender+receiver pair. The issue spec implies this is intentional ("3+ names → extra fragments; sender+receiver role detection"), but the behavior where personRole is overridden by the 2-name heuristic is worth documenting explicitly in the ADR or as a code comment. Not a defect — just an implicit design choice.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** The implementation traces cleanly to the acceptance criteria in issue #738. All specified behaviors are present. ### Requirements coverage | Requirement | Status | |---|---| | `POST /api/search/nl` endpoint | ✅ `NlSearchController` | | Ollama integration (Qwen 2.5 7B) | ✅ `RestClientOllamaClient` with grammar-constrained JSON schema | | Name → UUID resolution | ✅ `NlQueryParserService.resolveNames()` | | Single match → sender/receiver UUID | ✅ Covered by `buildSender()`/`buildReceiver()` | | Ambiguous match → return `ambiguousPersons`, skip search | ✅ Early return in `search()` | | No-match → fold into keyword text | ✅ `noMatchFragments` in `buildText()` | | Date range parsing (ISO + year-only) | ✅ `parseDate()` handles both forms | | 5 req/min rate limit per user | ✅ `NlSearchRateLimiter` with `Bucket4j` | | 503 `SMART_SEARCH_UNAVAILABLE` when Ollama unreachable | ✅ All error paths mapped | | 429 `SMART_SEARCH_RATE_LIMITED` when limit exceeded | ✅ | | `keywordsApplied` flag in response | ✅ `NlQueryInterpretation.keywordsApplied` | | i18n strings (de/en/es) | ✅ All 3 message files | | TypeScript types generated | ✅ `api.ts` regenerated | ### Observations **`smart_search_keywords_not_applied` key is added to all message files but unused in this PR.** It's defined pre-emptively for the frontend issue (#735). This is fine as a forward declaration, but worth noting that it will silently go unused until the frontend lands. **The `personRole` field's behavior for 2-resolved names bypasses `isAnyRole` check** — when 2 names resolve, `resolved.size() >= 2` routes directly to `buildSender(0)`/`buildReceiver(1)` regardless of `personRole`. This means if Ollama says `personRole="sender"` but two names match, the implementation treats them as sender+receiver pair. The issue spec implies this is intentional ("3+ names → extra fragments; sender+receiver role detection"), but the behavior where `personRole` is overridden by the 2-name heuristic is worth documenting explicitly in the ADR or as a code comment. Not a defect — just an implicit design choice.
Author
Owner

🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead

Verdict: ⚠️ Approved with concerns

This is a backend-only PR — no Svelte components, no layout changes. I focused on the i18n strings, as those directly affect what users see and hear.

What's correct

  • All three languages (de/en/es) covered. No missing translations.
  • Error messages are actionable. Each tells the user what happened and what to do next:
    • "Die intelligente Suche ist momentan nicht verfügbar. Bitte nutze die normale Suche." — gives a concrete next step
    • "Du hast die Suchfunktion zu häufig genutzt. Bitte warte eine Minute." — tells them when they can try again
  • smart_search_keywords_not_applied is clear and non-technical for users.

Blocker

German i18n strings use informal "du" form while the existing error messages use formal "Sie":

// EXISTING — formal "Sie"
"error_too_many_login_attempts": "Zu viele Anmeldeversuche. Bitte versuchen Sie es später erneut."
"error_forbidden": "Sie haben keine Berechtigung für diese Aktion."

// NEW — informal "du"
"error_smart_search_unavailable": "...Bitte nutze die normale Suche."
"error_smart_search_rate_limited": "Du hast die Suchfunktion zu häufig genutzt..."
"smart_search_keywords_not_applied": "..."  // neutral, no pronoun — fine

The Familienarchiv's primary senior audience (60+) expects formal address ("Sie") in error messages. Switching to "du" creates a tone inconsistency that stands out to older German speakers. The fix is two lines:

"error_smart_search_unavailable": "Die intelligente Suche ist momentan nicht verfügbar. Bitte nutzen Sie die normale Suche.",
"error_smart_search_rate_limited": "Sie haben die Suchfunktion zu häufig genutzt. Bitte warten Sie eine Minute."

This should be fixed before merge since it's the only UI-visible artifact in this PR. The strings are not yet referenced in any frontend component, so no other changes are needed.

## 🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead **Verdict: ⚠️ Approved with concerns** This is a backend-only PR — no Svelte components, no layout changes. I focused on the i18n strings, as those directly affect what users see and hear. ### What's correct - **All three languages (de/en/es) covered.** No missing translations. ✅ - **Error messages are actionable.** Each tells the user what happened and what to do next: - "Die intelligente Suche ist momentan nicht verfügbar. **Bitte nutze die normale Suche.**" — gives a concrete next step ✅ - "Du hast die Suchfunktion zu häufig genutzt. **Bitte warte eine Minute.**" — tells them when they can try again ✅ - **`smart_search_keywords_not_applied`** is clear and non-technical for users. ✅ ### Blocker **German i18n strings use informal "du" form while the existing error messages use formal "Sie":** ```json // EXISTING — formal "Sie" "error_too_many_login_attempts": "Zu viele Anmeldeversuche. Bitte versuchen Sie es später erneut." "error_forbidden": "Sie haben keine Berechtigung für diese Aktion." // NEW — informal "du" "error_smart_search_unavailable": "...Bitte nutze die normale Suche." "error_smart_search_rate_limited": "Du hast die Suchfunktion zu häufig genutzt..." "smart_search_keywords_not_applied": "..." // neutral, no pronoun — fine ``` The Familienarchiv's primary senior audience (60+) expects formal address ("Sie") in error messages. Switching to "du" creates a tone inconsistency that stands out to older German speakers. The fix is two lines: ```json "error_smart_search_unavailable": "Die intelligente Suche ist momentan nicht verfügbar. Bitte nutzen Sie die normale Suche.", "error_smart_search_rate_limited": "Sie haben die Suchfunktion zu häufig genutzt. Bitte warten Sie eine Minute." ``` This should be fixed before merge since it's the only UI-visible artifact in this PR. The strings are not yet referenced in any frontend component, so no other changes are needed.
marcel added 1 commit 2026-06-06 16:47:03 +02:00
fix(search): formal Sie form in German error strings; clean up DocumentService imports
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m19s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m57s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
4c620619d4
- error_smart_search_unavailable/rate_limited now use "Sie" (formal) to
  match the tone of all existing German error messages
- Replace inline FQNs in DocumentService.buildPersonSpec with proper
  JoinType + Predicate imports

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
marcel added 1 commit 2026-06-06 16:48:11 +02:00
merge(search): resolve DEPLOYMENT.md conflict — keep setup + upgrade sections
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m17s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m45s
CI / fail2ban Regex (pull_request) Successful in 48s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m4s
9a9e1c4c40
Both the first-time model pull runbook (from this branch) and the model
upgrade procedure (from main) belong in DEPLOYMENT.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
marcel merged commit 4a43962c98 into main 2026-06-06 16:52:44 +02:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#756