refactor(search): remove NLP/smart-search feature entirely (#772)

## Summary

- Removes the NLP/smart-search feature completely: the feature was too unreliable and slow, and users get better results with the regular search filters
- Deletes the entire backend `search/` package (NlSearchController, NlQueryParserService, NlpClient, NlSearchRateLimiter; 14 classes + 6 test classes)
- Deletes the `nlp-service/` Python microservice (FastAPI, rapidfuzz, DB-backed person matching)
- Removes all frontend NL search components: SmartModeToggle, SmartSearchStatus, InterpretationChipRow, DisambiguationPicker, chip-types, theme-chip-removal
- Strips smart-mode logic from SearchFilterBar and documents/+page.svelte
- Removes `SMART_SEARCH_UNAVAILABLE` / `SMART_SEARCH_RATE_LIMITED` error codes from backend, frontend types, and all three i18n files (de/en/es)
- Removes `nlp-service` container and `APP_NLP_BASE_URL` from both docker-compose files
- Removes Ollama/NLP Prometheus scrape job and Grafana dashboard
- Deletes ADRs 028 (×2), 034, 035

## Test plan

- [ ] Backend compiles: `cd backend && ./mvnw compile -q` → BUILD SUCCESS
- [ ] Frontend server tests pass: `cd frontend && npm run test -- --project=server`
- [ ] No NLP/smart-search references remain in source: `grep -r "SmartSearch\|NlSearch\|nlp-service\|SMART_SEARCH" backend/src frontend/src`
- [ ] `docker compose config` validates both compose files
- [ ] Search page loads, filter bar works, no smart-mode toggle visible

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Marcel <marcel@familienarchiv>
Reviewed-on: #772
2026-06-08 10:57:00 +02:00
parent 8e63867ad8
commit d650b6c066
60 changed files with 126 additions and 4364 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -92,7 +92,6 @@ backend/src/main/java/org/raddatz/familienarchiv/
 ├── ocr/                 OCR domain — OcrService, OcrBatchService, training
 ├── person/              Person domain
 │   └── relationship/    PersonRelationship sub-domain
-├── search/              NL search domain — NlSearchController, NlQueryParserService, RestClientOllamaClient, NlSearchRateLimiter
 ├── security/            SecurityConfig, Permission, @RequirePermission, PermissionAspect
 ├── tag/                 Tag domain
 └── user/                User domain — AppUser, UserGroup, UserService
@@ -161,7 +160,7 @@ Input DTOs live flat in the domain package. Response types are the model entitie

 → See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)

-**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded); `SMART_SEARCH_UNAVAILABLE` (HTTP 503 — Ollama inference service offline or timed out); `SMART_SEARCH_RATE_LIMITED` (HTTP 429 — user exceeded 5 NL search requests per minute).
+**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded).

 ### Security / Permissions

@@ -269,7 +268,7 @@ Back button pattern — use the shared `<BackButton>` component from `$lib/share

 → See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)

-**LLM reminder:** when adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded); `SMART_SEARCH_UNAVAILABLE` (HTTP 503 — Ollama inference service offline or timed out); `SMART_SEARCH_RATE_LIMITED` (HTTP 429 — user exceeded 5 NL search requests per minute).
+**LLM reminder:** when adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded).

 ---

--- a/backend/src/main/java/org/raddatz/familienarchiv/exception/ErrorCode.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/exception/ErrorCode.java
@@ -135,12 +135,6 @@ public enum ErrorCode {
    /** The merge target is a descendant of the source tag. 400 */
    TAG_MERGE_INVALID_TARGET,

-    // --- NL Search ---
-    /** Ollama is unreachable or timed out. 503 */
-    SMART_SEARCH_UNAVAILABLE,
-    /** NL search rate limit exceeded (5 requests per user per minute). 429 */
-    SMART_SEARCH_RATE_LIMITED,
-
    // --- Generic ---
    /** Request validation failed (missing or malformed fields). 400 */
    VALIDATION_ERROR,
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/NlQueryInterpretation.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/NlQueryInterpretation.java
@@ -1,26 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import io.swagger.v3.oas.annotations.media.Schema;
-
-import java.time.LocalDate;
-import java.util.List;
-
-public record NlQueryInterpretation(
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        List<PersonHint> resolvedPersons,
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        List<PersonHint> ambiguousPersons,
-        LocalDate dateFrom,
-        LocalDate dateTo,
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        List<String> keywords,
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        List<TagHint> resolvedTags,
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        String rawQuery,
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        boolean keywordsApplied,
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        boolean tagsApplied
-) {
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/NlQueryParserService.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/NlQueryParserService.java
@@ -1,216 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import lombok.RequiredArgsConstructor;
-import lombok.extern.slf4j.Slf4j;
-import org.raddatz.familienarchiv.document.DocumentSearchResult;
-import org.raddatz.familienarchiv.document.DocumentService;
-import org.raddatz.familienarchiv.document.DocumentSort;
-import org.raddatz.familienarchiv.document.SearchFilters;
-import org.raddatz.familienarchiv.exception.DomainException;
-import org.raddatz.familienarchiv.exception.ErrorCode;
-import org.raddatz.familienarchiv.person.NameMatches;
-import org.raddatz.familienarchiv.person.Person;
-import org.raddatz.familienarchiv.person.PersonService;
-import org.raddatz.familienarchiv.tag.Tag;
-import org.raddatz.familienarchiv.tag.TagOperator;
-import org.raddatz.familienarchiv.tag.TagService;
-import org.springframework.data.domain.Pageable;
-import org.springframework.stereotype.Service;
-
-import java.time.LocalDate;
-import java.util.ArrayList;
-import java.util.LinkedHashSet;
-import java.util.List;
-import java.util.UUID;
-
-@Service
-@RequiredArgsConstructor
-@Slf4j
-public class NlQueryParserService {
-
-    private static final int MIN_QUERY = 3;
-    private static final int MAX_QUERY = 500;
-    private static final int MAX_NAME_LENGTH = 200;
-    private static final int MIN_TAG_TERM = 3;
-    private static final int MAX_RESOLVED_TAGS = 10;
-
-    private final OllamaClient ollamaClient;
-    private final PersonService personService;
-    private final DocumentService documentService;
-    private final TagService tagService;
-
-    public NlSearchResponse search(String query, Pageable pageable) {
-        if (query == null || query.length() < MIN_QUERY || query.length() > MAX_QUERY) {
-            throw DomainException.badRequest(ErrorCode.VALIDATION_ERROR,
-                    "Query must be between " + MIN_QUERY + " and " + MAX_QUERY + " characters");
-        }
-
-        OllamaExtraction ext = ollamaClient.parse(query);
-
-        List<String> personNames = ext.personNames() != null ? ext.personNames() : List.of();
-        List<String> keywords = ext.keywords() != null ? ext.keywords() : List.of();
-
-        TagResolution tagResolution = resolveTags(keywords);
-        List<TagHint> resolvedTagHints = tagResolution.hints();
-        List<String> resolvedTagNames = tagResolution.tagNames();
-        List<String> remainingKeywords = tagResolution.remaining();
-
-        NameResolution resolution = resolveNames(personNames);
-
-        if (!resolution.ambiguous().isEmpty()) {
-            NlQueryInterpretation interpretation = new NlQueryInterpretation(
-                    List.of(), resolution.ambiguous(),
-                    ext.dateFrom(), ext.dateTo(),
-                    keywords, List.of(), ext.rawQuery(), false, false);
-            return new NlSearchResponse(DocumentSearchResult.of(List.of()), interpretation);
-        }
-
-        List<PersonHint> resolved = resolution.resolved();
-        List<String> noMatchFragments = resolution.noMatchFragments();
-        List<String> extraFragments = resolution.extraFragments();
-
-        boolean hadStructuredMatch = !resolvedTagHints.isEmpty() || !resolved.isEmpty();
-        String text = buildText(remainingKeywords, noMatchFragments, extraFragments, ext.rawQuery(), hadStructuredMatch);
-
-        if (resolved.size() == 1 && isAnyRole(ext.personRole())) {
-            UUID personId = resolved.get(0).id();
-            DocumentSearchResult docs = documentService.searchDocumentsByPersonId(
-                    personId, ext.dateFrom(), ext.dateTo(), pageable);
-            NlQueryInterpretation interpretation = new NlQueryInterpretation(
-                    resolved, List.of(), ext.dateFrom(), ext.dateTo(), keywords, resolvedTagHints, ext.rawQuery(), false, false);
-            return new NlSearchResponse(docs, interpretation);
-        }
-
-        UUID sender = buildSender(resolved, ext.personRole());
-        UUID receiver = buildReceiver(resolved, ext.personRole());
-
-        boolean tagsApplied = !resolvedTagHints.isEmpty();
-        TagOperator tagOperator = tagsApplied ? TagOperator.OR : TagOperator.AND;
-
-        SearchFilters filters = new SearchFilters(
-                text.isBlank() ? null : text,
-                ext.dateFrom(), ext.dateTo(),
-                sender, receiver,
-                resolvedTagNames, null,
-                null, tagOperator, false);
-
-        DocumentSearchResult docs = documentService.searchDocuments(filters, DocumentSort.DATE, "desc", pageable);
-        boolean keywordsApplied = !text.isBlank();
-        NlQueryInterpretation interpretation = new NlQueryInterpretation(
-                resolved, List.of(), ext.dateFrom(), ext.dateTo(), keywords, resolvedTagHints, ext.rawQuery(), keywordsApplied, tagsApplied);
-        return new NlSearchResponse(docs, interpretation);
-    }
-
-    private NameResolution resolveNames(List<String> personNames) {
-        List<PersonHint> resolved = new ArrayList<>();
-        List<PersonHint> ambiguous = new ArrayList<>();
-        List<String> noMatchFragments = new ArrayList<>();
-        List<String> extraFragments = new ArrayList<>();
-
-        int resolvedIndex = 0;
-        for (String name : personNames) {
-            if (name == null || name.length() > MAX_NAME_LENGTH) {
-                log.debug("Skipping name fragment (too long or null): length={}", name == null ? 0 : name.length());
-                continue;
-            }
-            NameMatches matches = personService.resolveByName(name);
-            List<Person> direct = matches.direct();
-            List<Person> partial = matches.partial();
-
-            if (direct.size() == 1) {
-                Person p = direct.get(0);
-                resolvedIndex++;
-                if (resolvedIndex <= 2) {
-                    resolved.add(new PersonHint(p.getId(), p.getDisplayName()));
-                } else {
-                    extraFragments.add(name);
-                }
-            } else if (direct.size() >= 2) {
-                direct.forEach(p -> ambiguous.add(new PersonHint(p.getId(), p.getDisplayName())));
-            } else if (!partial.isEmpty()) {
-                partial.forEach(p -> ambiguous.add(new PersonHint(p.getId(), p.getDisplayName())));
-            } else {
-                noMatchFragments.add(name);
-            }
-        }
-
-        return new NameResolution(resolved, ambiguous, noMatchFragments, extraFragments);
-    }
-
-    private TagResolution resolveTags(List<String> keywords) {
-        LinkedHashSet<Tag> seen = new LinkedHashSet<>();
-        List<String> remaining = new ArrayList<>();
-
-        for (String kw : keywords) {
-            if (kw == null || kw.length() < MIN_TAG_TERM) {
-                remaining.add(kw);
-                continue;
-            }
-            List<Tag> matches = tagService.findByNameContaining(kw);
-            if (matches.isEmpty()) {
-                remaining.add(kw);
-            } else {
-                seen.addAll(matches);
-            }
-        }
-
-        if (seen.size() > MAX_RESOLVED_TAGS) {
-            log.debug("Keyword matched {} tags; capping at {}", seen.size(), MAX_RESOLVED_TAGS);
-        }
-        List<Tag> capped = seen.size() > MAX_RESOLVED_TAGS
-                ? new ArrayList<>(seen).subList(0, MAX_RESOLVED_TAGS)
-                : new ArrayList<>(seen);
-
-        // safe: entities are detached here; mutation is for DTO projection only, no dirty-check fires
-        tagService.resolveEffectiveColors(capped);
-
-        List<TagHint> hints = capped.stream()
-                .map(t -> new TagHint(t.getId(), t.getName(), t.getColor()))
-                .toList();
-        List<String> tagNames = capped.stream().map(Tag::getName).toList();
-
-        return new TagResolution(hints, tagNames, remaining);
-    }
-
-    private String buildText(List<String> keywords, List<String> noMatchFragments,
-                             List<String> extraFragments, String rawQuery, boolean hadStructuredMatch) {
-        List<String> parts = new ArrayList<>();
-        parts.addAll(keywords);
-        parts.addAll(noMatchFragments);
-        parts.addAll(extraFragments);
-        String text = String.join(" ", parts).strip();
-        if (text.isBlank() && !hadStructuredMatch && rawQuery != null && !rawQuery.isBlank()) {
-            return rawQuery;
-        }
-        return text;
-    }
-
-    private boolean isAnyRole(String role) {
-        return role == null || "any".equals(role) || (!"sender".equals(role) && !"receiver".equals(role));
-    }
-
-    private UUID buildSender(List<PersonHint> resolved, String role) {
-        if (resolved.size() >= 2) return resolved.get(0).id();
-        if (resolved.size() == 1 && "sender".equals(role)) return resolved.get(0).id();
-        return null;
-    }
-
-    private UUID buildReceiver(List<PersonHint> resolved, String role) {
-        if (resolved.size() >= 2) return resolved.get(1).id();
-        if (resolved.size() == 1 && "receiver".equals(role)) return resolved.get(0).id();
-        return null;
-    }
-
-    private record NameResolution(
-            List<PersonHint> resolved,
-            List<PersonHint> ambiguous,
-            List<String> noMatchFragments,
-            List<String> extraFragments
-    ) {}
-
-    private record TagResolution(
-            List<TagHint> hints,
-            List<String> tagNames,
-            List<String> remaining
-    ) {}
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/NlSearchController.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/NlSearchController.java
@@ -1,28 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import jakarta.validation.Valid;
-import lombok.RequiredArgsConstructor;
-import org.raddatz.familienarchiv.security.Permission;
-import org.raddatz.familienarchiv.security.RequirePermission;
-import org.springframework.data.domain.Pageable;
-import org.springframework.security.core.annotation.AuthenticationPrincipal;
-import org.springframework.security.core.userdetails.UserDetails;
-import org.springframework.web.bind.annotation.*;
-
-@RestController
-@RequestMapping("/api/search/nl")
-@RequiredArgsConstructor
-public class NlSearchController {
-
-    private final NlQueryParserService nlQueryParserService;
-    private final NlSearchRateLimiter rateLimiter;
-
-    @PostMapping
-    @RequirePermission(Permission.READ_ALL)
-    public NlSearchResponse search(@Valid @RequestBody NlSearchRequest request,
-                                   Pageable pageable,
-                                   @AuthenticationPrincipal UserDetails principal) {
-        rateLimiter.checkAndConsume(principal.getUsername());
-        return nlQueryParserService.search(request.query(), pageable);
-    }
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/NlSearchRateLimitProperties.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/NlSearchRateLimitProperties.java
@@ -1,12 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import lombok.Data;
-import org.springframework.boot.context.properties.ConfigurationProperties;
-import org.springframework.stereotype.Component;
-
-@Component
-@ConfigurationProperties("app.nl-search.rate-limit")
-@Data
-public class NlSearchRateLimitProperties {
-    private int maxRequestsPerMinute = 5;
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/NlSearchRateLimiter.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/NlSearchRateLimiter.java
@@ -1,46 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import com.github.benmanes.caffeine.cache.Caffeine;
-import com.github.benmanes.caffeine.cache.LoadingCache;
-import io.github.bucket4j.Bandwidth;
-import io.github.bucket4j.Bucket;
-import org.raddatz.familienarchiv.exception.DomainException;
-import org.raddatz.familienarchiv.exception.ErrorCode;
-import org.springframework.stereotype.Service;
-
-import java.time.Duration;
-import java.util.concurrent.TimeUnit;
-
-@Service
-public class NlSearchRateLimiter {
-
-    private final LoadingCache<String, Bucket> byUser;
-    private final int maxRequestsPerMinute;
-
-    public NlSearchRateLimiter(NlSearchRateLimitProperties props) {
-        this.maxRequestsPerMinute = props.getMaxRequestsPerMinute();
-        this.byUser = Caffeine.newBuilder()
-                .expireAfterAccess(1, TimeUnit.MINUTES)
-                .build(key -> newBucket(maxRequestsPerMinute));
-    }
-
-    public void checkAndConsume(String userKey) {
-        if (!byUser.get(userKey).tryConsume(1)) {
-            throw DomainException.tooManyRequests(ErrorCode.SMART_SEARCH_RATE_LIMITED,
-                    "NL search rate limit exceeded for user: " + userKey, 60L);
-        }
-    }
-
-    void resetForTest() {
-        byUser.invalidateAll();
-    }
-
-    private static Bucket newBucket(int limit) {
-        return Bucket.builder()
-                .addLimit(Bandwidth.builder()
-                        .capacity(limit)
-                        .refillGreedy(limit, Duration.ofMinutes(1))
-                        .build())
-                .build();
-    }
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/NlSearchRequest.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/NlSearchRequest.java
@@ -1,11 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import jakarta.validation.constraints.NotBlank;
-import jakarta.validation.constraints.Size;
-
-public record NlSearchRequest(
-        @NotBlank
-        @Size(min = 3, max = 500)
-        String query
-) {
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/NlSearchResponse.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/NlSearchResponse.java
@@ -1,12 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import io.swagger.v3.oas.annotations.media.Schema;
-import org.raddatz.familienarchiv.document.DocumentSearchResult;
-
-public record NlSearchResponse(
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        DocumentSearchResult result,
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        NlQueryInterpretation interpretation
-) {
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/OllamaClient.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/OllamaClient.java
@@ -1,5 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-public interface OllamaClient {
-    OllamaExtraction parse(String query);
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/OllamaExtraction.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/OllamaExtraction.java
@@ -1,18 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import java.time.LocalDate;
-import java.util.List;
-
-/**
- * Raw structured output from Ollama after parsing and sanitising.
- * personRole is always one of "sender", "receiver", "any" — defensive parsing ensures this.
- */
-record OllamaExtraction(
-        List<String> personNames,
-        String personRole,
-        LocalDate dateFrom,
-        LocalDate dateTo,
-        List<String> keywords,
-        String rawQuery
-) {
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/OllamaHealthClient.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/OllamaHealthClient.java
@@ -1,5 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-public interface OllamaHealthClient {
-    boolean isHealthy();
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/OllamaProperties.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/OllamaProperties.java
@@ -1,15 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import lombok.Data;
-import org.springframework.boot.context.properties.ConfigurationProperties;
-import org.springframework.stereotype.Component;
-
-@Component
-@ConfigurationProperties("app.ollama")
-@Data
-public class OllamaProperties {
-    private String baseUrl;
-    private String model;
-    private int timeoutSeconds = 30;
-    private int healthCheckTimeoutSeconds = 2;
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/PersonHint.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/PersonHint.java
@@ -1,13 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import io.swagger.v3.oas.annotations.media.Schema;
-
-import java.util.UUID;
-
-public record PersonHint(
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        UUID id,
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        String displayName
-) {
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/RestClientOllamaClient.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/RestClientOllamaClient.java
@@ -1,184 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
-import com.fasterxml.jackson.annotation.JsonProperty;
-import com.fasterxml.jackson.databind.ObjectMapper;
-import lombok.extern.slf4j.Slf4j;
-import org.raddatz.familienarchiv.exception.DomainException;
-import org.raddatz.familienarchiv.exception.ErrorCode;
-import org.springframework.http.client.JdkClientHttpRequestFactory;
-import org.springframework.stereotype.Service;
-import org.springframework.web.client.RestClient;
-import org.springframework.web.client.RestClientException;
-
-import java.net.http.HttpClient;
-import java.time.Duration;
-import java.time.LocalDate;
-import java.time.Year;
-import java.time.format.DateTimeFormatter;
-import java.time.format.DateTimeParseException;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-
-@Service
-@Slf4j
-public class RestClientOllamaClient implements OllamaClient, OllamaHealthClient {
-
-    private static final ObjectMapper MAPPER = new ObjectMapper();
-    private static final Set<String> VALID_ROLES = Set.of("sender", "receiver", "any");
-    private static final int MAX_NAME_LENGTH = 200;
-    private static final int MAX_KEYWORD_LENGTH = 100;
-
-    private static final Map<String, Object> JSON_SCHEMA = Map.of(
-            "type", "object",
-            "required", List.of("personNames", "personRole", "keywords"),
-            "properties", Map.of(
-                    "personNames", Map.of("type", "array", "items", Map.of("type", "string", "maxLength", MAX_NAME_LENGTH)),
-                    "personRole", Map.of("type", "string", "enum", List.of("sender", "receiver", "any")),
-                    "dateFrom", Map.of("type", List.of("string", "null"), "maxLength", 20),
-                    "dateTo", Map.of("type", List.of("string", "null"), "maxLength", 20),
-                    "keywords", Map.of("type", "array", "items", Map.of("type", "string", "maxLength", MAX_KEYWORD_LENGTH))
-            )
-    );
-
-    private final RestClient inferenceClient;
-    private final RestClient healthClient;
-    private final OllamaProperties props;
-
-    public RestClientOllamaClient(OllamaProperties props) {
-        this.props = props;
-
-        HttpClient inferenceHttp = HttpClient.newBuilder()
-                .version(HttpClient.Version.HTTP_1_1)
-                .connectTimeout(Duration.ofSeconds(10))
-                .build();
-        JdkClientHttpRequestFactory inferenceFactory = new JdkClientHttpRequestFactory(inferenceHttp);
-        inferenceFactory.setReadTimeout(Duration.ofSeconds(props.getTimeoutSeconds()));
-        this.inferenceClient = RestClient.builder()
-                .baseUrl(props.getBaseUrl())
-                .requestFactory(inferenceFactory)
-                .build();
-
-        HttpClient healthHttp = HttpClient.newBuilder()
-                .version(HttpClient.Version.HTTP_1_1)
-                .connectTimeout(Duration.ofSeconds(props.getHealthCheckTimeoutSeconds()))
-                .build();
-        JdkClientHttpRequestFactory healthFactory = new JdkClientHttpRequestFactory(healthHttp);
-        healthFactory.setReadTimeout(Duration.ofSeconds(props.getHealthCheckTimeoutSeconds()));
-        this.healthClient = RestClient.builder()
-                .baseUrl(props.getBaseUrl())
-                .requestFactory(healthFactory)
-                .build();
-    }
-
-    @Override
-    public OllamaExtraction parse(String query) {
-        try {
-            OllamaGenerateRequest request = new OllamaGenerateRequest(
-                    props.getModel(), query, JSON_SCHEMA, false);
-            String responseBody = inferenceClient.post()
-                    .uri("/api/generate")
-                    .contentType(org.springframework.http.MediaType.APPLICATION_JSON)
-                    .body(request)
-                    .retrieve()
-                    .body(String.class);
-            return parseOllamaResponse(responseBody, query);
-        } catch (DomainException e) {
-            throw e;
-        } catch (Exception e) {
-            log.warn("Ollama inference failed: {}", e.getClass().getSimpleName());
-            throw DomainException.serviceUnavailable(ErrorCode.SMART_SEARCH_UNAVAILABLE,
-                    "Ollama unavailable: " + e.getClass().getSimpleName());
-        }
-    }
-
-    @Override
-    public boolean isHealthy() {
-        try {
-            healthClient.get().uri("/api/tags").retrieve().toBodilessEntity();
-            return true;
-        } catch (Exception e) {
-            return false;
-        }
-    }
-
-    private OllamaExtraction parseOllamaResponse(String responseBody, String rawQuery) {
-        try {
-            OllamaGenerateResponse response = MAPPER.readValue(responseBody, OllamaGenerateResponse.class);
-            String inner = response.response();
-            if (inner == null || inner.isBlank()) {
-                return fallbackExtraction(rawQuery);
-            }
-            RawOllamaOutput raw = MAPPER.readValue(inner, RawOllamaOutput.class);
-            return toExtraction(raw, rawQuery);
-        } catch (Exception e) {
-            log.warn("Failed to parse Ollama response: {}", e.getClass().getSimpleName());
-            throw DomainException.serviceUnavailable(ErrorCode.SMART_SEARCH_UNAVAILABLE,
-                    "Failed to parse Ollama response: " + e.getClass().getSimpleName());
-        }
-    }
-
-    private OllamaExtraction toExtraction(RawOllamaOutput raw, String rawQuery) {
-        List<String> names = raw.personNames() == null ? List.of() : raw.personNames().stream()
-                .filter(n -> n != null && n.length() <= MAX_NAME_LENGTH)
-                .toList();
-        List<String> keywords = raw.keywords() == null ? List.of() : raw.keywords().stream()
-                .filter(k -> k != null && k.length() <= MAX_KEYWORD_LENGTH)
-                .toList();
-        String role = sanitiseRole(raw.personRole());
-        LocalDate dateFrom = parseDate(raw.dateFrom(), true);
-        LocalDate dateTo = parseDate(raw.dateTo(), false);
-        return new OllamaExtraction(names, role, dateFrom, dateTo, keywords, rawQuery);
-    }
-
-    private OllamaExtraction fallbackExtraction(String rawQuery) {
-        return new OllamaExtraction(List.of(), "any", null, null, List.of(), rawQuery);
-    }
-
-    private String sanitiseRole(String role) {
-        if (role != null && VALID_ROLES.contains(role)) {
-            return role;
-        }
-        log.warn("Unexpected personRole from Ollama: {}", role);
-        return "any";
-    }
-
-    private LocalDate parseDate(String raw, boolean isFrom) {
-        if (raw == null || raw.isBlank()) return null;
-        try {
-            return LocalDate.parse(raw, DateTimeFormatter.ISO_LOCAL_DATE);
-        } catch (DateTimeParseException ignored) {
-        }
-        try {
-            int year = Integer.parseInt(raw.strip());
-            if (year > 1000 && year < 3000) {
-                return isFrom ? Year.of(year).atDay(1) : Year.of(year).atMonth(12).atEndOfMonth();
-            }
-        } catch (NumberFormatException ignored) {
-        }
-        return null;
-    }
-
-    @JsonIgnoreProperties(ignoreUnknown = true)
-    private record OllamaGenerateResponse(String response) {
-    }
-
-    @JsonIgnoreProperties(ignoreUnknown = true)
-    private record RawOllamaOutput(
-            @JsonProperty("personNames") List<String> personNames,
-            @JsonProperty("personRole") String personRole,
-            @JsonProperty("dateFrom") String dateFrom,
-            @JsonProperty("dateTo") String dateTo,
-            @JsonProperty("keywords") List<String> keywords
-    ) {
-    }
-
-    private record OllamaGenerateRequest(
-            String model,
-            String prompt,
-            Object format,
-            boolean stream
-    ) {
-    }
-}
--- a/backend/src/main/java/org/raddatz/familienarchiv/search/TagHint.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/search/TagHint.java
@@ -1,14 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import io.swagger.v3.oas.annotations.media.Schema;
-
-import java.util.UUID;
-
-public record TagHint(
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        UUID id,
-        @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
-        String name,
-        String color
-) {
-}
--- a/backend/src/main/resources/application-dev.yaml
+++ b/backend/src/main/resources/application-dev.yaml
@@ -12,6 +12,3 @@ springdoc:
    enabled: true
    path: /swagger-ui.html

-app:
-  ollama:
-    base-url: http://localhost:11434
--- a/backend/src/main/resources/application.yaml
+++ b/backend/src/main/resources/application.yaml
@@ -130,18 +130,6 @@ app:
    # The loader maps columns by header name — no positional indices (see ADR-025).
    dir: ${IMPORT_DIR:/import}

-  ollama:
-    base-url: http://ollama:11434
-    model: qwen2.5:7b-instruct-q4_K_M
-    # CPU inference: ~18s warm. Higher ceiling absorbs the cold model load on the
-    # first query after an Ollama (re)start before OLLAMA_KEEP_ALIVE pins it.
-    timeout-seconds: 60
-    health-check-timeout-seconds: 2
-
-  nl-search:
-    rate-limit:
-      max-requests-per-minute: 5
-
 ocr:
  sender-model:
    activation-threshold: 100
--- a/backend/src/test/java/org/raddatz/familienarchiv/search/NlQueryParserServiceTest.java
+++ b/backend/src/test/java/org/raddatz/familienarchiv/search/NlQueryParserServiceTest.java
@@ -1,739 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import org.junit.jupiter.api.BeforeEach;
-import org.junit.jupiter.api.Test;
-import org.mockito.ArgumentCaptor;
-import org.mockito.Mock;
-import org.mockito.MockitoAnnotations;
-import org.raddatz.familienarchiv.document.DocumentSearchResult;
-import org.raddatz.familienarchiv.document.DocumentService;
-import org.raddatz.familienarchiv.document.DocumentSort;
-import org.raddatz.familienarchiv.document.SearchFilters;
-import org.raddatz.familienarchiv.exception.DomainException;
-import org.raddatz.familienarchiv.exception.ErrorCode;
-import org.raddatz.familienarchiv.person.NameMatches;
-import org.raddatz.familienarchiv.person.Person;
-import org.raddatz.familienarchiv.person.PersonService;
-import org.raddatz.familienarchiv.tag.Tag;
-import org.raddatz.familienarchiv.tag.TagOperator;
-import org.raddatz.familienarchiv.tag.TagService;
-import org.springframework.data.domain.PageRequest;
-import org.springframework.data.domain.Pageable;
-
-import java.time.LocalDate;
-import java.util.ArrayList;
-import java.util.Collection;
-import java.util.List;
-import java.util.UUID;
-
-import static org.assertj.core.api.Assertions.assertThat;
-import static org.assertj.core.api.Assertions.assertThatThrownBy;
-import static org.mockito.ArgumentMatchers.*;
-import static org.mockito.Mockito.*;
-
-class NlQueryParserServiceTest {
-
-    @Mock OllamaClient ollamaClient;
-    @Mock PersonService personService;
-    @Mock DocumentService documentService;
-    @Mock TagService tagService;
-
-    NlQueryParserService service;
-
-    static final Pageable PAGE = PageRequest.of(0, 20);
-
-    @BeforeEach
-    void setUp() {
-        MockitoAnnotations.openMocks(this);
-        service = new NlQueryParserService(ollamaClient, personService, documentService, tagService);
-        when(documentService.searchDocuments(any(), any(), any(), any()))
-                .thenReturn(DocumentSearchResult.of(List.of()));
-        when(documentService.searchDocumentsByPersonId(any(), any(), any(), any()))
-                .thenReturn(DocumentSearchResult.of(List.of()));
-        when(tagService.findByNameContaining(anyString())).thenReturn(List.of());
-    }
-
-    // --- Factory helpers ---
-
-    private OllamaExtraction extraction(List<String> names, String role, LocalDate from, LocalDate to,
-                                         List<String> keywords) {
-        String raw = names.isEmpty() ? "test query" : String.join(" ", names);
-        return new OllamaExtraction(names, role, from, to, keywords, raw);
-    }
-
-    private Person person(UUID id, String firstName, String lastName) {
-        return Person.builder().id(id).firstName(firstName).lastName(lastName).build();
-    }
-
-    private NameMatches makeNameMatches() {
-        return new NameMatches(List.of(), List.of());
-    }
-
-    private NameMatches makeNameMatches(List<Person> direct) {
-        return new NameMatches(direct, List.of());
-    }
-
-    private NameMatches makeNameMatches(List<Person> direct, List<Person> partial) {
-        return new NameMatches(direct, partial);
-    }
-
-    private static final UUID P1 = UUID.fromString("00000000-0000-0000-0000-000000000001");
-    private static final UUID P2 = UUID.fromString("00000000-0000-0000-0000-000000000002");
-    private static final UUID P3 = UUID.fromString("00000000-0000-0000-0000-000000000003");
-
-    // --- 1. Single resolved name + personRole=sender ---
-
-    @Test
-    void search_resolvesSingleName_asSender() {
-        Person walter = person(P1, "Walter", "Raddatz");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Walter"), "sender", null, null, List.of()));
-        when(personService.resolveByName("Walter")).thenReturn(makeNameMatches(List.of(walter)));
-
-        NlSearchResponse resp = service.search("Was hat Walter geschrieben?", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), eq(DocumentSort.DATE), eq("desc"), eq(PAGE));
-        assertThat(cap.getValue().sender()).isEqualTo(P1);
-        assertThat(cap.getValue().receiver()).isNull();
-        assertThat(resp.interpretation().resolvedPersons()).hasSize(1);
-        assertThat(resp.interpretation().resolvedPersons().get(0).id()).isEqualTo(P1);
-        assertThat(resp.interpretation().ambiguousPersons()).isEmpty();
-    }
-
-    // --- 2. Multi-match name → ambiguous, search NOT executed ---
-
-    @Test
-    void search_multiMatchName_populatesAmbiguous_andSkipsSearch() {
-        Person a = person(UUID.randomUUID(), "Walter", "Braun");
-        Person b = person(UUID.randomUUID(), "Walter", "Schmidt");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Walter"), "sender", null, null, List.of()));
-        when(personService.resolveByName("Walter")).thenReturn(makeNameMatches(List.of(a, b)));
-
-        NlSearchResponse resp = service.search("Briefe von Walter", PAGE);
-
-        verify(documentService, never()).searchDocuments(any(), any(), any(), any());
-        verify(documentService, never()).searchDocumentsByPersonId(any(), any(), any(), any());
-        assertThat(resp.interpretation().ambiguousPersons()).hasSize(2);
-        assertThat(resp.interpretation().resolvedPersons()).isEmpty();
-    }
-
-    // --- 3. Multi-match + personRole=any → still ambiguous, search NOT executed ---
-
-    @Test
-    void search_multiMatchName_withPersonRoleAny_stillSkipsSearch() {
-        Person a = person(UUID.randomUUID(), "Emma", "Braun");
-        Person b = person(UUID.randomUUID(), "Emma", "Raddatz");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Emma"), "any", null, null, List.of()));
-        when(personService.resolveByName("Emma")).thenReturn(makeNameMatches(List.of(a, b)));
-
-        NlSearchResponse resp = service.search("Briefe an Emma", PAGE);
-
-        verify(documentService, never()).searchDocuments(any(), any(), any(), any());
-        verify(documentService, never()).searchDocumentsByPersonId(any(), any(), any(), any());
-        assertThat(resp.interpretation().ambiguousPersons()).hasSize(2);
-    }
-
-    // --- 4. No-match name → folded into text ---
-
-    @Test
-    void search_noMatchName_isFoldedIntoText() {
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Karl"), "any", null, null, List.of()));
-        when(personService.resolveByName("Karl")).thenReturn(makeNameMatches());
-
-        service.search("Briefe von Karl", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().text()).contains("Karl");
-        assertThat(cap.getValue().sender()).isNull();
-        assertThat(cap.getValue().receiver()).isNull();
-    }
-
-    // --- 5. personRole=any + 1 resolved → searchDocumentsByPersonId called ---
-
-    @Test
-    void search_personRoleAny_singleMatch_callsSearchDocumentsByPersonId() {
-        Person walter = person(P1, "Walter", "Raddatz");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Walter"), "any", null, null, List.of()));
-        when(personService.resolveByName("Walter")).thenReturn(makeNameMatches(List.of(walter)));
-
-        NlSearchResponse resp = service.search("Briefe von Walter", PAGE);
-
-        verify(documentService).searchDocumentsByPersonId(eq(P1), isNull(), isNull(), eq(PAGE));
-        verify(documentService, never()).searchDocuments(any(), any(), any(), any());
-        assertThat(resp.interpretation().keywordsApplied()).isFalse();
-    }
-
-    // --- 6. 2 names both resolve → sender=person1, receiver=person2 ---
-
-    @Test
-    void search_twoNamesResolve_assignsSenderAndReceiver() {
-        Person walter = person(P1, "Walter", "Raddatz");
-        Person emma = person(P2, "Emma", "Raddatz");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Walter", "Emma"), "any", null, null, List.of()));
-        when(personService.resolveByName("Walter")).thenReturn(makeNameMatches(List.of(walter)));
-        when(personService.resolveByName("Emma")).thenReturn(makeNameMatches(List.of(emma)));
-
-        NlSearchResponse resp = service.search("Briefe von Walter an Emma", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), eq(DocumentSort.DATE), eq("desc"), eq(PAGE));
-        assertThat(cap.getValue().sender()).isEqualTo(P1);
-        assertThat(cap.getValue().receiver()).isEqualTo(P2);
-        assertThat(resp.interpretation().resolvedPersons().get(0).id()).isEqualTo(P1);
-        assertThat(resp.interpretation().resolvedPersons().get(1).id()).isEqualTo(P2);
-    }
-
-    // --- 7. 2 names, first resolves, second ambiguous → search NOT executed ---
-
-    @Test
-    void search_twoNames_secondAmbiguous_skipsSearch() {
-        Person walter = person(P1, "Walter", "Raddatz");
-        Person emma1 = person(P2, "Emma", "Braun");
-        Person emma2 = person(P3, "Emma", "Schmidt");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Walter", "Emma"), "sender", null, null, List.of()));
-        when(personService.resolveByName("Walter")).thenReturn(makeNameMatches(List.of(walter)));
-        when(personService.resolveByName("Emma")).thenReturn(makeNameMatches(List.of(emma1, emma2)));
-
-        NlSearchResponse resp = service.search("Briefe von Walter an Emma", PAGE);
-
-        verify(documentService, never()).searchDocuments(any(), any(), any(), any());
-        assertThat(resp.interpretation().ambiguousPersons()).hasSize(2);
-    }
-
-    // --- 8. 2 names, first no match → folded into text, second used as single person ---
-
-    @Test
-    void search_twoNames_firstNoMatch_secondResolved_foldFirstIntoText() {
-        Person emma = person(P2, "Emma", "Raddatz");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Karl", "Emma"), "sender", null, null, List.of()));
-        when(personService.resolveByName("Karl")).thenReturn(makeNameMatches());
-        when(personService.resolveByName("Emma")).thenReturn(makeNameMatches(List.of(emma)));
-
-        service.search("Briefe von Karl an Emma", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().text()).contains("Karl");
-        assertThat(cap.getValue().sender()).isEqualTo(P2);
-    }
-
-    // --- 9. 3+ names all resolve → first two as sender/receiver, third folded into text ---
-
-    @Test
-    void search_threeNamesResolve_extraFoldedIntoText() {
-        Person walter = person(P1, "Walter", "Raddatz");
-        Person emma = person(P2, "Emma", "Raddatz");
-        Person heinrich = person(P3, "Heinrich", "Braun");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Walter", "Emma", "Heinrich"), "any", null, null, List.of()));
-        when(personService.resolveByName("Walter")).thenReturn(makeNameMatches(List.of(walter)));
-        when(personService.resolveByName("Emma")).thenReturn(makeNameMatches(List.of(emma)));
-        when(personService.resolveByName("Heinrich")).thenReturn(makeNameMatches(List.of(heinrich)));
-
-        service.search("Briefe von Walter an Emma über Heinrich", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().sender()).isEqualTo(P1);
-        assertThat(cap.getValue().receiver()).isEqualTo(P2);
-        assertThat(cap.getValue().text()).contains("Heinrich");
-    }
-
-    // --- 10. Keywords space-joined into text ---
-
-    @Test
-    void search_keywords_areJoinedIntoText() {
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Krieg", "Walter")));
-
-        service.search("Dokumente über den Krieg Walter", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().text()).isEqualTo("Krieg Walter");
-    }
-
-    // --- 11. Date range passed through ---
-
-    @Test
-    void search_dateRange_passedIntoSearchFilters() {
-        LocalDate from = LocalDate.of(1914, 1, 1);
-        LocalDate to = LocalDate.of(1914, 12, 31);
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", from, to, List.of()));
-
-        service.search("Briefe aus dem Jahr 1914", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().from()).isEqualTo(from);
-        assertThat(cap.getValue().to()).isEqualTo(to);
-    }
-
-    // --- 12. Null dates → null in SearchFilters (not an error) ---
-
-    @Test
-    void search_nullDates_passedAsNullIntoFilters() {
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Hochzeit")));
-
-        service.search("Hochzeitsbriefe", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().from()).isNull();
-        assertThat(cap.getValue().to()).isNull();
-    }
-
-    // --- 13. Query under 3 chars → VALIDATION_ERROR before Ollama call ---
-
-    @Test
-    void search_queryTooShort_throwsValidationError() {
-        assertThatThrownBy(() -> service.search("ab", PAGE))
-                .isInstanceOf(DomainException.class)
-                .extracting(e -> ((DomainException) e).getCode())
-                .isEqualTo(ErrorCode.VALIDATION_ERROR);
-
-        verify(ollamaClient, never()).parse(anyString());
-    }
-
-    // --- 14. Query over 500 chars → VALIDATION_ERROR ---
-
-    @Test
-    void search_queryTooLong_throwsValidationError() {
-        String longQuery = "a".repeat(501);
-        assertThatThrownBy(() -> service.search(longQuery, PAGE))
-                .isInstanceOf(DomainException.class)
-                .extracting(e -> ((DomainException) e).getCode())
-                .isEqualTo(ErrorCode.VALIDATION_ERROR);
-
-        verify(ollamaClient, never()).parse(anyString());
-    }
-
-    // --- 15. Ollama returns empty names/keywords → raw query used as keyword fallback ---
-
-    @Test
-    void search_ollamaReturnsEmpty_usesRawQueryAsTextFallback() {
-        String raw = "Briefe aus dem Krieg";
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(new OllamaExtraction(List.of(), "any", null, null, List.of(), raw));
-
-        service.search(raw, PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().text()).isEqualTo(raw);
-    }
-
-    // --- 16. Null personNames/keywords from Ollama → no NPE ---
-
-    @Test
-    void search_nullPersonNamesAndKeywords_handledWithoutNpe() {
-        OllamaExtraction ext = new OllamaExtraction(null, "any", null, null, null, "test query");
-        when(ollamaClient.parse(anyString())).thenReturn(ext);
-
-        NlSearchResponse resp = service.search("test query", PAGE);
-
-        assertThat(resp).isNotNull();
-        verify(documentService).searchDocuments(any(), any(), any(), any());
-    }
-
-    // --- 17. Unrecognized personRole → defaults to any-like behavior (no crash) ---
-
-    @Test
-    void search_unrecognizedPersonRole_treatedLikeAny_withSingleResolvedPerson() {
-        Person walter = person(P1, "Walter", "Raddatz");
-        // OllamaClient defensive parsing returns "any" for unknown roles,
-        // but NlQueryParserService must also be safe if something unexpected arrives.
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(new OllamaExtraction(List.of("Walter"), "unknown_role", null, null, List.of(), "query"));
-        when(personService.resolveByName("Walter")).thenReturn(makeNameMatches(List.of(walter)));
-
-        NlSearchResponse resp = service.search("Briefe von Walter", PAGE);
-
-        // Should not crash; "unknown_role" treated as fallback (neither sender nor receiver → any)
-        assertThat(resp).isNotNull();
-    }
-
-    // --- 18. Ollama throws SMART_SEARCH_UNAVAILABLE → propagates to caller ---
-
-    @Test
-    void search_ollamaThrowsUnavailable_propagates() {
-        when(ollamaClient.parse(anyString()))
-                .thenThrow(DomainException.tooManyRequests(ErrorCode.SMART_SEARCH_UNAVAILABLE, "offline"));
-
-        assertThatThrownBy(() -> service.search("Was hat Walter geschrieben?", PAGE))
-                .isInstanceOf(DomainException.class)
-                .extracting(e -> ((DomainException) e).getCode())
-                .isEqualTo(ErrorCode.SMART_SEARCH_UNAVAILABLE);
-    }
-
-    // --- 19. LLM-extracted name > 200 chars → skipped, PersonService never called ---
-
-    @Test
-    void search_nameLongerThan200Chars_isSkippedBeforePersonServiceCall() {
-        String longName = "A".repeat(201);
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(longName), "sender", null, null, List.of()));
-
-        service.search("Briefe von sehr langem Namen", PAGE);
-
-        verify(personService, never()).resolveByName(anyString());
-    }
-
-    // --- 20. Cap lives in resolveByName (after classification): a pre-capped 10-direct result
-    //         maps straight to ambiguousPersons; the search layer adds no second cap. ---
-
-    @Test
-    void search_tenDirectMatches_allShownAsAmbiguous() {
-        List<Person> ten = new ArrayList<>();
-        for (int i = 0; i < 10; i++) {
-            ten.add(person(UUID.randomUUID(), "Walter", "Person" + i));
-        }
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Walter"), "sender", null, null, List.of()));
-        when(personService.resolveByName("Walter")).thenReturn(makeNameMatches(ten));
-
-        NlSearchResponse resp = service.search("Briefe von Walter", PAGE);
-
-        assertThat(resp.interpretation().ambiguousPersons()).hasSize(10);
-        verify(documentService, never()).searchDocuments(any(), any(), any(), any());
-    }
-
-    // --- 21. SearchFilters defaults: tagOperator=AND, status=null, undated=false, tags=empty ---
-
-    @Test
-    void search_searchFiltersDefaults_areCorrect() {
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Krieg")));
-
-        service.search("Dokumente über den Krieg", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), eq(DocumentSort.DATE), eq("desc"), eq(PAGE));
-        SearchFilters f = cap.getValue();
-        assertThat(f.tagOperator()).isEqualTo(TagOperator.AND);
-        assertThat(f.status()).isNull();
-        assertThat(f.undated()).isFalse();
-        assertThat(f.tags()).isEmpty();
-        assertThat(f.tagQ()).isNull();
-    }
-
-    // --- 22. personRole=receiver + 1 resolved → receiver UUID set ---
-
-    @Test
-    void search_personRoleReceiver_singleMatch_setsReceiver() {
-        Person emma = person(P2, "Emma", "Raddatz");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Emma"), "receiver", null, null, List.of()));
-        when(personService.resolveByName("Emma")).thenReturn(makeNameMatches(List.of(emma)));
-
-        service.search("Briefe an Emma", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().receiver()).isEqualTo(P2);
-        assertThat(cap.getValue().sender()).isNull();
-    }
-
-    // --- 23. keywordsApplied=true when text is non-blank ---
-
-    @Test
-    void search_keywordsApplied_trueWhenTextNonBlank() {
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Feldpost")));
-
-        NlSearchResponse resp = service.search("Feldpost aus dem Krieg", PAGE);
-
-        assertThat(resp.interpretation().keywordsApplied()).isTrue();
-    }
-
-    // --- 23a. Partial-only, one candidate → ambiguous (1-item picker), search skipped ---
-
-    @Test
-    void search_partialOnly_oneCandidate_populatesAmbiguous() {
-        Person cramer = person(P1, "Clara", "Cramer");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Clara Cram"), "any", null, null, List.of()));
-        when(personService.resolveByName("Clara Cram")).thenReturn(makeNameMatches(List.of(), List.of(cramer)));
-
-        NlSearchResponse resp = service.search("Briefe von Clara Cram", PAGE);
-
-        assertThat(resp.interpretation().ambiguousPersons()).hasSize(1);
-        verify(documentService, never()).searchDocuments(any(), any(), any(), any());
-    }
-
-    // --- 23b. Partial-only, two candidates → ambiguous (multi-item picker) ---
-
-    @Test
-    void search_partialOnly_twoCandidates_populatesAmbiguous() {
-        Person cramer = person(P1, "Clara", "Cramer");
-        Person crammond = person(P2, "Clara", "Crammond");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Clara Cram"), "any", null, null, List.of()));
-        when(personService.resolveByName("Clara Cram"))
-                .thenReturn(makeNameMatches(List.of(), List.of(cramer, crammond)));
-
-        NlSearchResponse resp = service.search("Briefe von Clara Cram", PAGE);
-
-        assertThat(resp.interpretation().ambiguousPersons()).hasSize(2);
-    }
-
-    // --- 23c. Exactly one direct match → search executes, no picker ---
-
-    @Test
-    void search_oneDirect_executesSearch() {
-        Person clara = person(P1, "Clara", "Cram");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Clara Cram"), "any", null, null, List.of()));
-        when(personService.resolveByName("Clara Cram")).thenReturn(makeNameMatches(List.of(clara)));
-
-        NlSearchResponse resp = service.search("Briefe von Clara Cram", PAGE);
-
-        verify(documentService).searchDocumentsByPersonId(eq(P1), isNull(), isNull(), eq(PAGE));
-        assertThat(resp.interpretation().ambiguousPersons()).isEmpty();
-    }
-
-    // --- Tag resolution helpers ---
-
-    private Tag tag(UUID id, String name) {
-        return Tag.builder().id(id).name(name).build();
-    }
-
-    private Tag tag(UUID id, String name, String color) {
-        return Tag.builder().id(id).name(name).color(color).build();
-    }
-
-    private TagHint tagHint(UUID id, String name, String color) {
-        return new TagHint(id, name, color);
-    }
-
-    private static final UUID T1 = UUID.fromString("00000000-0000-0000-0001-000000000001");
-
-    // --- 24. Single keyword resolves to one tag → tag filter applied ---
-
-    @Test
-    void search_singleKeywordResolvesToTag_appliesTagFilter() {
-        Tag hochzeit = tag(T1, "Hochzeit");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Hochzeit")));
-        when(tagService.findByNameContaining("Hochzeit")).thenReturn(List.of(hochzeit));
-
-        NlSearchResponse resp = service.search("Briefe über Hochzeit", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().tags()).containsExactly("Hochzeit");
-        assertThat(cap.getValue().tagOperator()).isEqualTo(TagOperator.OR);
-        assertThat(resp.interpretation().resolvedTags()).hasSize(1);
-        assertThat(resp.interpretation().resolvedTags().get(0).name()).isEqualTo("Hochzeit");
-        assertThat(resp.interpretation().tagsApplied()).isTrue();
-        assertThat(cap.getValue().text()).isNull();
-    }
-
-    private static final UUID T2 = UUID.fromString("00000000-0000-0000-0001-000000000002");
-
-    // --- 25. Keyword matches multiple tags → all in resolvedTags, OR-union ---
-
-    @Test
-    void search_keywordMatchesMultipleTags_allIncluded() {
-        Tag hochzeit1 = tag(T1, "Hochzeit Raddatz");
-        Tag hochzeit2 = tag(T2, "Hochzeit Braun");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Hochzeit")));
-        when(tagService.findByNameContaining("Hochzeit")).thenReturn(List.of(hochzeit1, hochzeit2));
-
-        NlSearchResponse resp = service.search("Briefe über Hochzeit", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().tags()).containsExactlyInAnyOrder("Hochzeit Raddatz", "Hochzeit Braun");
-        assertThat(cap.getValue().tagOperator()).isEqualTo(TagOperator.OR);
-        assertThat(resp.interpretation().resolvedTags()).hasSize(2);
-    }
-
-    // --- 26. Keyword no tag match → stays as FTS text, resolvedTags empty ---
-
-    @Test
-    void search_keywordNoTagMatch_staysAsFtsText() {
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Feldpost")));
-
-        NlSearchResponse resp = service.search("Feldpost Briefe", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().text()).contains("Feldpost");
-        assertThat(cap.getValue().tags()).isEmpty();
-        assertThat(resp.interpretation().resolvedTags()).isEmpty();
-        assertThat(resp.interpretation().tagsApplied()).isFalse();
-    }
-
-    // --- 27. Mixed: one keyword resolves, one doesn't → tag filter + FTS text ---
-
-    @Test
-    void search_mixedKeywords_oneResolves_oneStaysAsText() {
-        Tag hochzeit = tag(T1, "Hochzeit");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Hochzeit", "Feldpost")));
-        when(tagService.findByNameContaining("Hochzeit")).thenReturn(List.of(hochzeit));
-
-        NlSearchResponse resp = service.search("Hochzeit und Feldpost", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().tags()).containsExactly("Hochzeit");
-        assertThat(cap.getValue().tagOperator()).isEqualTo(TagOperator.OR);
-        assertThat(cap.getValue().text()).contains("Feldpost");
-        assertThat(resp.interpretation().resolvedTags()).hasSize(1);
-        assertThat(resp.interpretation().tagsApplied()).isTrue();
-    }
-
-    // --- 28. personRole=any + 1 person + resolvable keyword → personId search, tagsApplied=false ---
-
-    @Test
-    void search_personRoleAny_singlePerson_resolvableKeyword_tagsAppliedFalse() {
-        Person walter = person(P1, "Walter", "Raddatz");
-        Tag hochzeit = tag(T1, "Hochzeit");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of("Walter"), "any", null, null, List.of("Hochzeit")));
-        when(personService.resolveByName("Walter")).thenReturn(makeNameMatches(List.of(walter)));
-        when(tagService.findByNameContaining("Hochzeit")).thenReturn(List.of(hochzeit));
-
-        NlSearchResponse resp = service.search("Briefe von Walter über Hochzeit", PAGE);
-
-        verify(documentService).searchDocumentsByPersonId(eq(P1), isNull(), isNull(), eq(PAGE));
-        verify(documentService, never()).searchDocuments(any(), any(), any(), any());
-        assertThat(resp.interpretation().tagsApplied()).isFalse();
-        assertThat(resp.interpretation().resolvedTags()).hasSize(1);
-        assertThat(resp.interpretation().resolvedTags().get(0).name()).isEqualTo("Hochzeit");
-    }
-
-    // --- 29. Cap: keyword matches > 10 tags → capped at 10 ---
-
-    @Test
-    void search_keywordMatchesMoreThanMaxTags_cappedAtTen() {
-        List<Tag> eleven = new ArrayList<>();
-        for (int i = 0; i < 11; i++) {
-            eleven.add(tag(UUID.randomUUID(), "Thema " + i));
-        }
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Thema")));
-        when(tagService.findByNameContaining("Thema")).thenReturn(eleven);
-
-        NlSearchResponse resp = service.search("Dokumente zum Thema", PAGE);
-
-        assertThat(resp.interpretation().resolvedTags()).hasSize(10);
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().tags()).hasSize(10);
-    }
-
-    // --- 30. Short keyword (< 3 chars) → skipped, not passed to TagService ---
-
-    @Test
-    void search_shortKeyword_skippedByTagResolution() {
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("ab", "Krieg")));
-
-        service.search("ab Krieg", PAGE);
-
-        verify(tagService, never()).findByNameContaining("ab");
-        verify(tagService).findByNameContaining("Krieg");
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().text()).contains("ab");
-    }
-
-    // --- 31. Dedup: same tag matched by two keywords → appears once ---
-
-    @Test
-    void search_sameTagMatchedByTwoKeywords_deduplicatedToOne() {
-        Tag hochzeit = tag(T1, "Hochzeit");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Hochzeit", "hoch")));
-        when(tagService.findByNameContaining("Hochzeit")).thenReturn(List.of(hochzeit));
-        when(tagService.findByNameContaining("hoch")).thenReturn(List.of(hochzeit));
-
-        NlSearchResponse resp = service.search("Hochzeit hoch", PAGE);
-
-        assertThat(resp.interpretation().resolvedTags()).hasSize(1);
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().tags()).hasSize(1);
-    }
-
-    // --- 32. All keywords resolve → rawQuery fallback suppressed, text=null ---
-
-    @Test
-    void search_allKeywordsResolveToTags_rawQueryFallbackSuppressed() {
-        Tag hochzeit = tag(T1, "Hochzeit");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(new OllamaExtraction(List.of(), "any", null, null, List.of("Hochzeit"), "raw query text"));
-        when(tagService.findByNameContaining("Hochzeit")).thenReturn(List.of(hochzeit));
-
-        NlSearchResponse resp = service.search("Hochzeit", PAGE);
-
-        ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
-        verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
-        assertThat(cap.getValue().text()).isNull();
-        assertThat(cap.getValue().tags()).containsExactly("Hochzeit");
-    }
-
-    // --- 33. Flag independence: keywordsApplied=false AND tagsApplied=true ---
-
-    @Test
-    void search_allKeywordsResolveToTags_keywordsAppliedFalse_tagsAppliedTrue() {
-        Tag hochzeit = tag(T1, "Hochzeit");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Hochzeit")));
-        when(tagService.findByNameContaining("Hochzeit")).thenReturn(List.of(hochzeit));
-
-        NlSearchResponse resp = service.search("Hochzeit Briefe", PAGE);
-
-        assertThat(resp.interpretation().keywordsApplied()).isFalse();
-        assertThat(resp.interpretation().tagsApplied()).isTrue();
-    }
-
-    // --- 34. Color carried through from resolveEffectiveColors ---
-
-    @Test
-    void search_tagHint_carriesColorSetByResolveEffectiveColors() {
-        Tag hochzeit = tag(T1, "Hochzeit");
-        doAnswer(invocation -> {
-            Collection<Tag> tags = invocation.getArgument(0);
-            tags.forEach(t -> t.setColor("sage"));
-            return null;
-        }).when(tagService).resolveEffectiveColors(any());
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Hochzeit")));
-        when(tagService.findByNameContaining("Hochzeit")).thenReturn(List.of(hochzeit));
-
-        NlSearchResponse resp = service.search("Hochzeit", PAGE);
-
-        assertThat(resp.interpretation().resolvedTags().get(0).color()).isEqualTo("sage");
-    }
-
-    // --- 35. Color stays null when resolveEffectiveColors leaves it unset ---
-
-    @Test
-    void search_tagHint_colorIsNull_whenNoColorResolved() {
-        Tag hochzeit = tag(T1, "Hochzeit");
-        when(ollamaClient.parse(anyString()))
-                .thenReturn(extraction(List.of(), "any", null, null, List.of("Hochzeit")));
-        when(tagService.findByNameContaining("Hochzeit")).thenReturn(List.of(hochzeit));
-
-        NlSearchResponse resp = service.search("Hochzeit", PAGE);
-
-        assertThat(resp.interpretation().resolvedTags().get(0).color()).isNull();
-    }
-}
--- a/backend/src/test/java/org/raddatz/familienarchiv/search/NlSearchControllerTest.java
+++ b/backend/src/test/java/org/raddatz/familienarchiv/search/NlSearchControllerTest.java
@@ -1,161 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import tools.jackson.databind.ObjectMapper;
-import org.junit.jupiter.api.BeforeEach;
-import org.junit.jupiter.api.Test;
-import org.raddatz.familienarchiv.document.DocumentSearchResult;
-import org.raddatz.familienarchiv.exception.DomainException;
-import org.raddatz.familienarchiv.exception.ErrorCode;
-import org.raddatz.familienarchiv.security.SecurityConfig;
-import org.raddatz.familienarchiv.security.PermissionAspect;
-import org.raddatz.familienarchiv.user.CustomUserDetailsService;
-import org.springframework.beans.factory.annotation.Autowired;
-import org.springframework.boot.autoconfigure.aop.AopAutoConfiguration;
-import org.springframework.boot.webmvc.test.autoconfigure.WebMvcTest;
-import org.springframework.context.annotation.Import;
-import org.springframework.http.MediaType;
-import org.springframework.security.test.context.support.WithMockUser;
-import org.springframework.test.context.bean.override.mockito.MockitoBean;
-import org.springframework.test.web.servlet.MockMvc;
-
-import java.util.List;
-import java.util.UUID;
-
-import static org.mockito.ArgumentMatchers.*;
-import static org.mockito.Mockito.when;
-import static org.springframework.security.test.web.servlet.request.SecurityMockMvcRequestPostProcessors.csrf;
-import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
-import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.*;
-
-@WebMvcTest(NlSearchController.class)
-@Import({SecurityConfig.class, PermissionAspect.class, AopAutoConfiguration.class,
-        NlSearchRateLimiter.class, NlSearchRateLimitProperties.class})
-class NlSearchControllerTest {
-
-    @Autowired MockMvc mockMvc;
-    private final ObjectMapper objectMapper = new ObjectMapper();
-
-    @MockitoBean NlQueryParserService nlQueryParserService;
-    @MockitoBean CustomUserDetailsService customUserDetailsService;
-    @Autowired NlSearchRateLimiter rateLimiter;
-
-    @BeforeEach
-    void resetRateLimiter() {
-        rateLimiter.resetForTest();
-    }
-
-    private NlSearchResponse makeResponse() {
-        PersonHint hint = new PersonHint(UUID.randomUUID(), "Walter Raddatz");
-        NlQueryInterpretation interp = new NlQueryInterpretation(
-                List.of(hint), List.of(), null, null,
-                List.of("Krieg"), List.of(), "Briefe von Walter im Krieg", true, false);
-        return new NlSearchResponse(DocumentSearchResult.of(List.of()), interp);
-    }
-
-    // --- 1. Happy path ---
-
-    @Test
-    @WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
-    void search_returns200_withNlSearchResponse() throws Exception {
-        when(nlQueryParserService.search(anyString(), any())).thenReturn(makeResponse());
-
-        mockMvc.perform(post("/api/search/nl").with(csrf())
-                        .contentType(MediaType.APPLICATION_JSON)
-                        .content("{\"query\":\"Briefe von Walter im Krieg\"}"))
-                .andExpect(status().isOk())
-                .andExpect(jsonPath("$.interpretation.rawQuery").value("Briefe von Walter im Krieg"))
-                .andExpect(jsonPath("$.interpretation.resolvedPersons[0].displayName").value("Walter Raddatz"))
-                .andExpect(jsonPath("$.interpretation.keywordsApplied").value(true));
-    }
-
-    // --- 2. ambiguousPersons in response shape ---
-
-    @Test
-    @WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
-    void search_returns200_withAmbiguousPersons() throws Exception {
-        PersonHint a = new PersonHint(UUID.randomUUID(), "Walter Braun");
-        PersonHint b = new PersonHint(UUID.randomUUID(), "Walter Schmidt");
-        NlQueryInterpretation interp = new NlQueryInterpretation(
-                List.of(), List.of(a, b), null, null,
-                List.of(), List.of(), "Briefe von Walter", false, false);
-        NlSearchResponse resp = new NlSearchResponse(DocumentSearchResult.of(List.of()), interp);
-        when(nlQueryParserService.search(anyString(), any())).thenReturn(resp);
-
-        mockMvc.perform(post("/api/search/nl").with(csrf())
-                        .contentType(MediaType.APPLICATION_JSON)
-                        .content("{\"query\":\"Briefe von Walter\"}"))
-                .andExpect(status().isOk())
-                .andExpect(jsonPath("$.interpretation.ambiguousPersons").isArray())
-                .andExpect(jsonPath("$.interpretation.ambiguousPersons[0].displayName").value("Walter Braun"))
-                .andExpect(jsonPath("$.interpretation.ambiguousPersons[1].id").isNotEmpty());
-    }
-
-    // --- 3. Unauthenticated → 401 ---
-
-    @Test
-    void search_returns401_whenUnauthenticated() throws Exception {
-        mockMvc.perform(post("/api/search/nl").with(csrf())
-                        .contentType(MediaType.APPLICATION_JSON)
-                        .content("{\"query\":\"Briefe von Walter\"}"))
-                .andExpect(status().isUnauthorized());
-    }
-
-    // --- 4. Query < 3 chars → 400 ---
-
-    @Test
-    @WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
-    void search_returns400_whenQueryTooShort() throws Exception {
-        mockMvc.perform(post("/api/search/nl").with(csrf())
-                        .contentType(MediaType.APPLICATION_JSON)
-                        .content("{\"query\":\"ab\"}"))
-                .andExpect(status().isBadRequest());
-    }
-
-    // --- 5. Query > 500 chars → 400 ---
-
-    @Test
-    @WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
-    void search_returns400_whenQueryTooLong() throws Exception {
-        String longQuery = "a".repeat(501);
-        mockMvc.perform(post("/api/search/nl").with(csrf())
-                        .contentType(MediaType.APPLICATION_JSON)
-                        .content("{\"query\":\"" + longQuery + "\"}"))
-                .andExpect(status().isBadRequest());
-    }
-
-    // --- 6. Ollama unavailable → 503 ---
-
-    @Test
-    @WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
-    void search_returns503_whenOllamaUnavailable() throws Exception {
-        when(nlQueryParserService.search(anyString(), any()))
-                .thenThrow(DomainException.serviceUnavailable(ErrorCode.SMART_SEARCH_UNAVAILABLE, "Ollama offline"));
-
-        mockMvc.perform(post("/api/search/nl").with(csrf())
-                        .contentType(MediaType.APPLICATION_JSON)
-                        .content("{\"query\":\"Briefe von Walter\"}"))
-                .andExpect(status().isServiceUnavailable())
-                .andExpect(jsonPath("$.code").value("SMART_SEARCH_UNAVAILABLE"));
-    }
-
-    // --- 7. 6th request in 1 minute → 429 ---
-
-    @Test
-    @WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
-    void search_returns429_onSixthRequestWithinRateLimit() throws Exception {
-        when(nlQueryParserService.search(anyString(), any())).thenReturn(makeResponse());
-
-        for (int i = 0; i < 5; i++) {
-            mockMvc.perform(post("/api/search/nl").with(csrf())
-                            .contentType(MediaType.APPLICATION_JSON)
-                            .content("{\"query\":\"Briefe von Walter\"}"))
-                    .andExpect(status().isOk());
-        }
-
-        mockMvc.perform(post("/api/search/nl").with(csrf())
-                        .contentType(MediaType.APPLICATION_JSON)
-                        .content("{\"query\":\"Briefe von Walter\"}"))
-                .andExpect(status().isTooManyRequests())
-                .andExpect(jsonPath("$.code").value("SMART_SEARCH_RATE_LIMITED"));
-    }
-}
--- a/backend/src/test/java/org/raddatz/familienarchiv/search/NlSearchRateLimiterTest.java
+++ b/backend/src/test/java/org/raddatz/familienarchiv/search/NlSearchRateLimiterTest.java
@@ -1,62 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import org.junit.jupiter.api.BeforeEach;
-import org.junit.jupiter.api.Test;
-import org.raddatz.familienarchiv.exception.DomainException;
-import org.raddatz.familienarchiv.exception.ErrorCode;
-
-import static org.assertj.core.api.Assertions.assertThatCode;
-import static org.assertj.core.api.Assertions.assertThatThrownBy;
-
-class NlSearchRateLimiterTest {
-
-    private NlSearchRateLimiter rateLimiter;
-
-    @BeforeEach
-    void setUp() {
-        NlSearchRateLimitProperties props = new NlSearchRateLimitProperties();
-        props.setMaxRequestsPerMinute(5);
-        rateLimiter = new NlSearchRateLimiter(props);
-    }
-
-    @Test
-    void checkAndConsume_allowsRequestsWithinLimit() {
-        for (int i = 0; i < 5; i++) {
-            assertThatCode(() -> rateLimiter.checkAndConsume("user@example.com"))
-                    .doesNotThrowAnyException();
-        }
-    }
-
-    @Test
-    void checkAndConsume_throwsRateLimited_onSixthRequest() {
-        for (int i = 0; i < 5; i++) {
-            rateLimiter.checkAndConsume("user@example.com");
-        }
-
-        assertThatThrownBy(() -> rateLimiter.checkAndConsume("user@example.com"))
-                .isInstanceOf(DomainException.class)
-                .extracting(e -> ((DomainException) e).getCode())
-                .isEqualTo(ErrorCode.SMART_SEARCH_RATE_LIMITED);
-    }
-
-    @Test
-    void checkAndConsume_limitsAreIndependentPerUser() {
-        for (int i = 0; i < 5; i++) {
-            rateLimiter.checkAndConsume("alice@example.com");
-        }
-        assertThatCode(() -> rateLimiter.checkAndConsume("bob@example.com"))
-                .doesNotThrowAnyException();
-    }
-
-    @Test
-    void resetForTest_clearsAllBuckets() {
-        for (int i = 0; i < 5; i++) {
-            rateLimiter.checkAndConsume("user@example.com");
-        }
-
-        rateLimiter.resetForTest();
-
-        assertThatCode(() -> rateLimiter.checkAndConsume("user@example.com"))
-                .doesNotThrowAnyException();
-    }
-}
--- a/backend/src/test/java/org/raddatz/familienarchiv/search/NlSearchTagResolutionIntegrationTest.java
+++ b/backend/src/test/java/org/raddatz/familienarchiv/search/NlSearchTagResolutionIntegrationTest.java
@@ -1,56 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import org.junit.jupiter.api.Test;
-import org.raddatz.familienarchiv.PostgresContainerConfig;
-import org.raddatz.familienarchiv.config.FlywayConfig;
-import org.raddatz.familienarchiv.tag.Tag;
-import org.raddatz.familienarchiv.tag.TagRepository;
-import org.springframework.beans.factory.annotation.Autowired;
-import org.springframework.boot.jdbc.test.autoconfigure.AutoConfigureTestDatabase;
-import org.springframework.boot.data.jpa.test.autoconfigure.DataJpaTest;
-import org.springframework.context.annotation.Import;
-
-import java.util.List;
-import java.util.UUID;
-
-import static org.assertj.core.api.Assertions.assertThat;
-
-@DataJpaTest
-@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
-@Import({PostgresContainerConfig.class, FlywayConfig.class})
-class NlSearchTagResolutionIntegrationTest {
-
-    @Autowired
-    private TagRepository tagRepository;
-
-    @Test
-    void findDescendantIdsByName_parentName_includesChildId() {
-        Tag parent = tagRepository.save(Tag.builder().name("Krieg").build());
-        Tag child = tagRepository.save(Tag.builder().name("Weltkrieg").parentId(parent.getId()).build());
-
-        List<UUID> ids = tagRepository.findDescendantIdsByName("Krieg");
-
-        assertThat(ids).containsExactlyInAnyOrder(parent.getId(), child.getId());
-    }
-
-    @Test
-    void findDescendantIdsByName_childName_returnsOnlyChild() {
-        Tag parent = tagRepository.save(Tag.builder().name("Krieg").build());
-        Tag child = tagRepository.save(Tag.builder().name("Weltkrieg").parentId(parent.getId()).build());
-
-        List<UUID> ids = tagRepository.findDescendantIdsByName("Weltkrieg");
-
-        assertThat(ids).containsExactly(child.getId());
-        assertThat(ids).doesNotContain(parent.getId());
-    }
-
-    @Test
-    void findByNameContainingIgnoreCase_parentSubstring_matchesParentOnly() {
-        Tag parent = tagRepository.save(Tag.builder().name("Krieg").build());
-        tagRepository.save(Tag.builder().name("Weltkrieg").parentId(parent.getId()).build());
-
-        List<Tag> found = tagRepository.findByNameContainingIgnoreCase("Krieg");
-
-        assertThat(found).extracting(Tag::getName).containsExactlyInAnyOrder("Krieg", "Weltkrieg");
-    }
-}
--- a/backend/src/test/java/org/raddatz/familienarchiv/search/RestClientOllamaClientTest.java
+++ b/backend/src/test/java/org/raddatz/familienarchiv/search/RestClientOllamaClientTest.java
@@ -1,113 +0,0 @@
-package org.raddatz.familienarchiv.search;
-
-import com.github.tomakehurst.wiremock.WireMockServer;
-import com.github.tomakehurst.wiremock.core.WireMockConfiguration;
-import org.junit.jupiter.api.AfterEach;
-import org.junit.jupiter.api.BeforeEach;
-import org.junit.jupiter.api.Test;
-import org.raddatz.familienarchiv.exception.DomainException;
-import org.raddatz.familienarchiv.exception.ErrorCode;
-
-import static com.github.tomakehurst.wiremock.client.WireMock.*;
-import static org.assertj.core.api.Assertions.assertThat;
-import static org.assertj.core.api.Assertions.assertThatThrownBy;
-
-class RestClientOllamaClientTest {
-
-    private WireMockServer wireMock;
-    private RestClientOllamaClient client;
-
-    @BeforeEach
-    void setUp() {
-        wireMock = new WireMockServer(WireMockConfiguration.wireMockConfig().dynamicPort());
-        wireMock.start();
-
-        OllamaProperties props = new OllamaProperties();
-        props.setBaseUrl("http://localhost:" + wireMock.port());
-        props.setModel("qwen2.5:7b-instruct-q4_K_M");
-        props.setTimeoutSeconds(5);
-        props.setHealthCheckTimeoutSeconds(2);
-
-        client = new RestClientOllamaClient(props);
-    }
-
-    @AfterEach
-    void tearDown() {
-        wireMock.stop();
-    }
-
-    // --- Factory helpers ---
-
-    private String makeOllamaResponseJson(String personNamesJson, String personRole,
-                                           String dateFrom, String dateTo, String keywordsJson) {
-        String inner = String.format(
-                "{\"personNames\":%s,\"personRole\":\"%s\",\"dateFrom\":%s,\"dateTo\":%s,\"keywords\":%s}",
-                personNamesJson, personRole,
-                dateFrom == null ? "null" : "\"" + dateFrom + "\"",
-                dateTo == null ? "null" : "\"" + dateTo + "\"",
-                keywordsJson
-        );
-        return String.format("{\"model\":\"qwen2.5:7b-instruct-q4_K_M\",\"response\":\"%s\",\"done\":true}",
-                inner.replace("\"", "\\\""));
-    }
-
-    // --- Test cases ---
-
-    @Test
-    void parse_returnsExtraction_whenOllamaReturnsValidJson() {
-        String body = makeOllamaResponseJson("[\"Walter\"]", "sender", "1914-01-01", "1914-12-31", "[\"Krieg\"]");
-        wireMock.stubFor(post(urlEqualTo("/api/generate"))
-                .willReturn(aResponse()
-                        .withStatus(200)
-                        .withHeader("Content-Type", "application/json")
-                        .withBody(body)));
-
-        OllamaExtraction result = client.parse("Was hat Walter im Krieg geschrieben?");
-
-        assertThat(result.personNames()).containsExactly("Walter");
-        assertThat(result.personRole()).isEqualTo("sender");
-        assertThat(result.keywords()).containsExactly("Krieg");
-        assertThat(result.dateFrom()).isNotNull();
-        assertThat(result.dateTo()).isNotNull();
-    }
-
-    @Test
-    void parse_throwsSmartSearchUnavailable_whenOllamaReturns500() {
-        wireMock.stubFor(post(urlEqualTo("/api/generate"))
-                .willReturn(aResponse().withStatus(500)));
-
-        assertThatThrownBy(() -> client.parse("some query"))
-                .isInstanceOf(DomainException.class)
-                .extracting(e -> ((DomainException) e).getCode())
-                .isEqualTo(ErrorCode.SMART_SEARCH_UNAVAILABLE);
-    }
-
-    @Test
-    void parse_throwsSmartSearchUnavailable_whenOllamaExceedsTimeout() {
-        wireMock.stubFor(post(urlEqualTo("/api/generate"))
-                .willReturn(aResponse()
-                        .withStatus(200)
-                        .withHeader("Content-Type", "application/json")
-                        .withFixedDelay(6000)
-                        .withBody("{\"response\":\"{}\",\"done\":true}")));
-
-        assertThatThrownBy(() -> client.parse("some query"))
-                .isInstanceOf(DomainException.class)
-                .extracting(e -> ((DomainException) e).getCode())
-                .isEqualTo(ErrorCode.SMART_SEARCH_UNAVAILABLE);
-    }
-
-    @Test
-    void parse_throwsSmartSearchUnavailable_whenOllamaReturnsMalformedJson() {
-        wireMock.stubFor(post(urlEqualTo("/api/generate"))
-                .willReturn(aResponse()
-                        .withStatus(200)
-                        .withHeader("Content-Type", "application/json")
-                        .withBody("{\"response\":\"not-json-at-all\",\"done\":true}")));
-
-        assertThatThrownBy(() -> client.parse("some query"))
-                .isInstanceOf(DomainException.class)
-                .extracting(e -> ((DomainException) e).getCode())
-                .isEqualTo(ErrorCode.SMART_SEARCH_UNAVAILABLE);
-    }
-}
--- a/docker-compose.prod.yml
+++ b/docker-compose.prod.yml
@@ -50,7 +50,6 @@ volumes:
  minio-data:
  ocr-models:
  ocr-cache:
-  ollama-models:

 services:
  db:
@@ -201,73 +200,6 @@ services:
    security_opt:
      - no-new-privileges:true

-  # --- Ollama: Model init (one-shot pull) ---
-  # Pulls qwen2.5:7b-instruct-q4_K_M (~4.7 GB) into the ollama-models volume on
-  # first start; exits quickly on subsequent starts (model already cached).
-  # The ollama/ollama image's ENTRYPOINT is `ollama` and the image ships WITHOUT
-  # curl, so the entrypoint is overridden to a shell and readiness is probed with
-  # `ollama list` (not curl). The pull is guarded by a `grep` on the cached model
-  # list so a model already on the volume exits clean WITHOUT a registry round-trip
-  # — a host reboot during a registry/network blip can no longer fail init (which
-  # would block the ollama service via service_completed_successfully).
-  # Backend degrades gracefully (503) if Ollama is absent.
-  ollama-model-init:
-    image: ollama/ollama:0.30.6
-    restart: "no"
-    entrypoint: ["/bin/sh", "-c"]
-    command:
-      - "ollama serve & until ollama list >/dev/null 2>&1; do sleep 1; done && (ollama list | grep -q 'qwen2.5:7b-instruct-q4_K_M' || ollama pull qwen2.5:7b-instruct-q4_K_M)"
-    networks:
-      - archiv-net
-    volumes:
-      - ollama-models:/root/.ollama
-    mem_limit: 2g
-    read_only: true
-    tmpfs:
-      - /tmp:size=512m
-    cap_drop:
-      - ALL
-    security_opt:
-      - no-new-privileges:true
-
-  # --- Ollama: LLM inference server ---
-  # Serves the pre-pulled model for NL search inference. Backend reaches it at
-  # http://ollama:11434 (application.yaml default; no env override required).
-  # Healthcheck uses `ollama list` because the image has no curl.
-  ollama:
-    image: ollama/ollama:0.30.6
-    restart: unless-stopped
-    expose:
-      - "11434"
-    networks:
-      - archiv-net
-    volumes:
-      - ollama-models:/root/.ollama
-    environment:
-      # Pin the model in memory (no idle unload). Without this, Ollama evicts
-      # the model after ~5 min idle and the next query pays a cold-load penalty
-      # that exceeds the backend read timeout → NL search 503 after idle.
-      OLLAMA_KEEP_ALIVE: "-1"
-    cpus: "${OLLAMA_CPU_LIMIT:-4.0}"
-    mem_limit: "${OLLAMA_MEM_LIMIT:-8g}"
-    memswap_limit: "${OLLAMA_MEM_LIMIT:-8g}"
-    read_only: true
-    tmpfs:
-      - /tmp:size=512m
-    cap_drop:
-      - ALL
-    security_opt:
-      - no-new-privileges:true
-    healthcheck:
-      test: ["CMD", "ollama", "list"]
-      interval: 30s
-      timeout: 10s
-      retries: 5
-      start_period: 60s
-    depends_on:
-      ollama-model-init:
-        condition: service_completed_successfully
-
  backend:
    image: familienarchiv/backend:${TAG:-nightly}
    build:
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -141,75 +141,6 @@ services:
    security_opt:
      - no-new-privileges:true

-  # --- Ollama: Model init (one-shot pull) ---
-  # Pulls qwen2.5:7b-instruct-q4_K_M (~4.7 GB) into the ollama_models volume on first start.
-  # On subsequent starts (model already in volume), exits quickly without re-downloading.
-  # Not started in CI — CI uses explicit service selection
-  # (docker-compose.ci.yml: db minio create-buckets)
-  ollama-model-init:
-    image: ollama/ollama:0.30.6
-    restart: "no"
-    networks:
-      - archiv-net
-    volumes:
-      - ollama_models:/root/.ollama
-    mem_limit: 2g
-    read_only: true
-    tmpfs:
-      - /tmp:size=512m
-    cap_drop:
-      - ALL
-    security_opt:
-      - no-new-privileges:true
-    # The image ENTRYPOINT is `ollama`, so override it to a shell; the image has
-    # no curl, so readiness is probed with `ollama list` instead of a curl loop.
-    # The pull is guarded by a `grep` on the cached model list so an already-cached
-    # model exits clean without a registry round-trip (offline-safe re-up).
-    entrypoint: ["/bin/sh", "-c"]
-    command:
-      - "ollama serve & until ollama list >/dev/null 2>&1; do sleep 1; done && (ollama list | grep -q 'qwen2.5:7b-instruct-q4_K_M' || ollama pull qwen2.5:7b-instruct-q4_K_M)"
-
-  # --- Ollama: LLM inference server ---
-  # Serves the pre-pulled model for NL search inference.
-  # Not started in CI — CI uses explicit service selection
-  # (docker-compose.ci.yml: db minio create-buckets)
-  ollama:
-    image: ollama/ollama:0.30.6
-    container_name: archive-ollama
-    restart: unless-stopped
-    expose:
-      - "11434"
-    networks:
-      - archiv-net
-    volumes:
-      - ollama_models:/root/.ollama
-    environment:
-      OLLAMA_API_KEY: "${OLLAMA_API_KEY}"
-      # Pin the model in memory (no idle unload) so queries never pay a cold-load
-      # penalty that exceeds the backend read timeout → NL search 503 after idle.
-      OLLAMA_KEEP_ALIVE: "-1"
-    cpus: "${OLLAMA_CPU_LIMIT:-4.0}"
-    mem_limit: "${OLLAMA_MEM_LIMIT:-8g}"
-    memswap_limit: "${OLLAMA_MEM_LIMIT:-8g}"
-    read_only: true
-    tmpfs:
-      - /tmp:size=512m
-    cap_drop:
-      - ALL
-    security_opt:
-      - no-new-privileges:true
-    healthcheck:
-      # `ollama list` hits the local API and exits non-zero if the server is
-      # down — used instead of curl, which the image does not ship.
-      test: ["CMD", "ollama", "list"]
-      interval: 30s
-      timeout: 10s
-      retries: 5
-      start_period: 60s  # model weights are pre-loaded by ollama-model-init; service only needs to bind port
-    depends_on:
-      ollama-model-init:
-        condition: service_completed_successfully
-
  # --- Backend: Spring Boot ---
  backend:
    build:
@@ -253,8 +184,6 @@ services:
      SPRING_MAIL_PROPERTIES_MAIL_SMTP_STARTTLS_ENABLE: ${MAIL_STARTTLS_ENABLE:-false}
      APP_OCR_BASE_URL: http://ocr-service:8000
      APP_OCR_TRAINING_TOKEN: "${OCR_TRAINING_TOKEN:-}"
-      APP_OLLAMA_BASE_URL: "${APP_OLLAMA_BASE_URL:-http://ollama:11434}"
-      APP_OLLAMA_API_KEY: "${OLLAMA_API_KEY}"
      SENTRY_DSN: ${SENTRY_DSN:-}
      SENTRY_TRACES_SAMPLE_RATE: ${SENTRY_TRACES_SAMPLE_RATE:-1.0}
      # Observability: send traces to Tempo inside archiv-net (OTLP gRPC port 4317)
@@ -318,4 +247,3 @@ volumes:
  frontend_node_modules:
  ocr_models:
  ocr_cache:
-  ollama_models:
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -50,17 +50,15 @@ graph TD

 The OCR service requires significant RAM for model loading. The dev compose sets `mem_limit: 12g`.

-| Production target | RAM | Recommended OCR limit | NL Search | Notes |
-|---|---|---|---|---|
-| Current server (Hetzner Serverbörse, i7-6700) | 64 GB | 12 GB | Supported | Default `mem_limit: 12g` works comfortably; plenty of headroom for Ollama |
-| ≥ 16 GB RAM | 16+ GB | 12 GB | Supported | Default works |
-| 8 GB RAM | 8 GB | 6 GB | Disabled — set `APP_OLLAMA_BASE_URL=` (empty) | Set `OCR_MEM_LIMIT=6g`; accept reduced batch sizes |
-| 4 GB RAM | 4 GB | — | Unsupported | Disable OCR service (`profiles: [ocr]`); run OCR on demand only |
+| Production target | RAM | Recommended OCR limit | Notes |
+|---|---|---|---|
+| Current server (Hetzner Serverbörse, i7-6700) | 64 GB | 12 GB | Default `mem_limit: 12g` works comfortably |
+| ≥ 16 GB RAM | 16+ GB | 12 GB | Default works |
+| 8 GB RAM | 8 GB | 6 GB | Set `OCR_MEM_LIMIT=6g`; accept reduced batch sizes |
+| 4 GB RAM | 4 GB | — | Disable OCR service (`profiles: [ocr]`); run OCR on demand only |

 On servers with less than 16 GB RAM the default `mem_limit: 12g` cannot be honoured — set the `OCR_MEM_LIMIT` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow). The prod compose interpolates this var with a 12g default.

-> **Memory budget:** OCR (~6 GB active) + Ollama (~8 GB) = ~14 GB. On servers with less than 16 GB RAM, do not run `docker-compose.observability.yml` continuously alongside both OCR and Ollama.
-
 ### Dev vs production differences

 | Concern | Dev (`docker-compose.yml`) | Prod (`docker-compose.prod.yml`) |
@@ -147,16 +145,6 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
 | `XDG_CACHE_HOME` | XDG cache base dir — redirects Matplotlib and other XDG-aware libraries away from the read-only `HOME` (`/home/ocr`) to the writable cache volume | `/app/cache` | — | — |
 | `TORCH_HOME` | PyTorch model cache — redirects `~/.cache/torch` to the writable models volume | `/app/models/torch` | — | — |

-### Ollama (NL search) service
-
-| Variable | Purpose | Default | Required? | Sensitive? |
-|---|---|---|---|---|
-| `APP_OLLAMA_BASE_URL` | Base URL for the Ollama service. Leave empty to disable NL search. | `http://ollama:11434` | — | — |
-| `APP_OLLAMA_API_KEY` | API key passed as `Authorization: Bearer` to Ollama. Leave empty for unauthenticated access. Note: `OLLAMA_API_KEY` is not enforced in Ollama 0.6.5 or 0.30.6 (see ADR-028). | — | — | YES |
-| `OLLAMA_CPU_LIMIT` | Docker CPU quota for the Ollama container. On CX42 (8 vCPUs) can be raised to `7.5`. | `4.0` | — | — |
-| `OLLAMA_MEM_LIMIT` | Memory limit for the Ollama container. Requires CX42 (16 GB RAM). | `8g` | — | — |
-| `OLLAMA_API_KEY` | API key set on the Ollama service itself. Same value as `APP_OLLAMA_API_KEY`. Leave empty for unauthenticated. | — | — | YES |
-
 ### Observability stack (`docker-compose.observability.yml`)

 | Variable | Purpose | Default | Required? | Sensitive? |
@@ -277,18 +265,6 @@ git.raddatz.cloud      A   <server IP>

 ### 3.4 First deploy

-> **First start — Ollama model pull:** On first `docker compose up -d`, the `ollama-model-init` container pulls `qwen2.5:7b-instruct-q4_K_M` (~4.7 GB). At 10 Mbps this takes approximately 60–90 minutes; at 100 Mbps approximately 6–10 minutes. The pull is a one-time operation — subsequent restarts skip it (model already on the `ollama_models` volume). Monitor progress with `docker logs -f $(docker ps -q --filter name=ollama-model-init)`.
->
-> **Do not use `--wait` on first deploy** — `docker compose up -d --wait` waits for all services to reach their health/completion target, including `ollama-model-init`. On first pull this blocks for 60–90 minutes and will time out any CI/deploy script that uses `--wait`.
->
-> **Re-deploy idempotency:** on subsequent `docker compose up -d` runs (including `--force-recreate`), `ollama-model-init` re-executes but exits in seconds — Ollama's CLI skips the download when the model digest already matches what is on the volume.
->
-> **Verify NL search is active** after enabling Ollama (`APP_OLLAMA_BASE_URL=http://ollama:11434`):
-> ```bash
-> curl -s http://localhost:8080/api/nl-search?q=brief+von+grossmutter
-> # Returns 200 with results → NL search is active
-> # Returns 503 NL_SEARCH_UNAVAILABLE → Ollama is not reachable or APP_OLLAMA_BASE_URL is unset
-> ```

 ```bash
 # 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
@@ -585,55 +561,6 @@ bash scripts/download-kraken-models.sh

 > Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.

-### Ollama — natural-language search (NL Search)
-
-NL search uses a local Ollama instance for query parsing. The `ollama` service is defined in `docker-compose.yml` alongside the main stack.
-
-**First-time model pull** (required before the feature works):
-
-```bash
-docker compose exec ollama ollama pull qwen2.5:7b-instruct-q4_K_M
-```
-
-This downloads ~4.4 GB. The model is stored in the `ollama_data` Docker volume and persists across container restarts.
-
-**Verify the model is available:**
-
-```bash
-docker compose exec ollama ollama list
-```
-
-Expected output includes `qwen2.5:7b-instruct-q4_K_M`.
-
-**Health check** — the backend polls `GET /api/tags` on Ollama at startup and before inference. If Ollama is absent, `POST /api/search/nl` returns HTTP 503 with `SMART_SEARCH_UNAVAILABLE`.
-
-**Configuration** (see `application.yaml` under `app.ollama`):
-
-| Property | Default | Description |
-|---|---|---|
-| `app.ollama.base-url` | `http://ollama:11434` | Ollama service URL (dev: `http://localhost:11434`) |
-| `app.ollama.model` | `qwen2.5:7b-instruct-q4_K_M` | Model to use for inference |
-| `app.ollama.timeout-seconds` | `60` | Read timeout for inference calls (absorbs cold model load on the first query after an Ollama restart) |
-| `app.nl-search.rate-limit.max-requests-per-minute` | `5` | Per-user rate limit |
-
-### Upgrade the Ollama model
-
-To switch to a newer model version (e.g. a future release of `qwen2.5`):
-
-1. Update the model name in the `ollama-model-init` `command:` in `docker-compose.yml`.
-2. Remove the existing model volume to free the old weights:
-   ```bash
-   docker volume rm familienarchiv_ollama_models
-   ```
-   (In production the volume name is prefixed with the compose project: `archiv-production_ollama-models`.)
-3. Restart the stack:
-   ```bash
-   docker compose up -d
-   ```
-   The `ollama-model-init` container pulls the new model weights on first start (~4–8 GB download depending on the model). The `ollama` inference server will not start until the pull completes (`condition: service_completed_successfully`).
-
-> **`ollama_models` volume:** holds model weights only — fully reproducible by re-pull, no backup needed.
-
 ### Trigger a canonical import

 The importer no longer parses the raw spreadsheet. It consumes the **canonical artifacts**
--- a/docs/GLOSSARY.md
+++ b/docs/GLOSSARY.md
@@ -165,23 +165,7 @@ _See also [Chronik](#chronik-internal)._

 **Domain** — a Tier-1 bounded context with its own entities, controller, service, repository, and DTOs. Backend domains: `document`, `person`, `tag`, `user`, `geschichte`, `notification`, `ocr`, `audit`, `dashboard`. Frontend domains mirror this structure under `src/lib/`.

---
-
-## NL Search Terms
-
-**NlSearch** — the natural-language document search feature. Users type a plain-German query (e.g. "Was hat Walter im Krieg an Emma geschrieben?"); the backend parses it via Ollama, resolves person names to database UUIDs, and delegates to the standard `DocumentService.searchDocuments()` path. Endpoint: `POST /api/search/nl`.
-
-**NlQueryInterpretation** — the structured result of parsing a natural-language query. Contains: `resolvedPersons` (persons whose names unambiguously matched one DB record), `ambiguousPersons` (all candidates when a name matched more than one person), `keywords` (LLM-extracted search terms), `dateFrom`/`dateTo` (extracted date range), `rawQuery` (the original user input), `keywordsApplied` (whether keyword FTS was used), `resolvedTags` (tags matched by keyword→tag resolution), and `tagsApplied` (whether the OR-union tag filter was applied).
-
-**keyword→tag resolution** — the post-Ollama step in `NlQueryParserService` where each LLM-extracted keyword is substring-matched against the tag taxonomy via `TagService.findByNameContaining()`. Keywords that hit one or more tags are removed from the FTS text list and become an OR-union tag filter; keywords with no match remain as FTS text. Matching is case-insensitive and traverses the tag hierarchy via the recursive CTE `findDescendantIdsByName`. See ADR-033.
-
-**PersonHint** — a lightweight `{id, displayName}` pair used in `NlQueryInterpretation` to describe a resolved or ambiguous person without exposing the full `Person` entity to the frontend.
-
-**NameMatches** — the Person-domain result of `PersonService.resolveByName(name)`: candidate persons split by name-match strength into `direct` and `partial`. A match is **direct** when every query token is a whole-token match (order-independent, alias/maiden-name aware) across all of a person's name components (`firstName`, `lastName`, `alias`, each `PersonNameAlias` first+last, `title`); a **partial** matched the substring fetch but is not direct (e.g. "Cram" → "Clara Cramer"). The vocabulary is deliberately match strength, not the search layer's resolved/ambiguous buckets — `NlQueryParserService` maps one direct → resolved (auto-select), ≥2 direct → ambiguous, partial-only → ambiguous suggestions ("Meintest du …?"), and no candidates → folded into full-text search.
-
-**TagHint** — a lightweight `{id, name, color?}` triple used in `NlQueryInterpretation.resolvedTags` to describe a tag matched by keyword→tag resolution. `color` is the tag's effective color (one-level inheritance from parent when the tag has no own color), or null if neither tag nor parent has a color.
-
-**theme chip** `[frontend]` — a removable chip rendered in `InterpretationChipRow` for each entry in `NlQueryInterpretation.resolvedTags` when `tagsApplied` is `true`. Displays "Thema: {tag.name}" (prefix varies by locale). Clicking × removes the tag from the OR-union filter and navigates to `/documents?tag=…&tagOp=OR` with remaining tag and person parameters preserved.
+**NameMatches** — the Person-domain result of `PersonService.resolveByName(name)`: candidate persons split by name-match strength into `direct` and `partial`. A match is **direct** when every query token is a whole-token match (order-independent, alias/maiden-name aware) across all of a person's name components (`firstName`, `lastName`, `alias`, each `PersonNameAlias` first+last, `title`); a **partial** matched the substring fetch but is not direct (e.g. "Cram" → "Clara Cramer").

 ---

--- a/docs/adr/028-nl-search-ollama.md
+++ b/docs/adr/028-nl-search-ollama.md
@@ -1,67 +0,0 @@
-# ADR-028 — Natural language search is powered by Ollama (Qwen 2.5 7B), not a cloud API
-
-**Date:** 2026-06-06
-**Status:** Accepted
-**Issue:** #738 (NL search backend); part of epic #735
-**Milestone:** Archive Intelligence — NL Search
-
---
-
-## Context
-
-Family members write their search intent in plain German ("Was hat Walter im Krieg an Emma geschrieben?"), not in structured filter forms. Issue #735 defines NL search as a core product goal. Three delivery options were evaluated:
-
-**Option A — extend the OCR service.** The OCR Python microservice already runs on the same host. Adding LLM inference there avoids a new container. Rejected: the OCR service is a single-purpose, CPU-bound pipeline optimised for Kraken; bundling a 4.5 GB LLM weight into the same image would bloat it, complicate model lifecycle management, and create an unrelated failure domain (OOM on large OCR batches vs. LLM load time). ADR-001 was explicit about keeping OCR single-purpose.
-
-**Option B — call an external API (OpenAI, Anthropic, etc.).** Cloud inference is instant and requires no local hardware. Rejected: the archive contains real person names and private family correspondence from 1899–1950 — sending query content to a third party violates the project's data-residency principle (family data stays on the family server). Additionally, API cost and availability are outside the operator's control; the system must work air-gapped.
-
-**Option C — local Ollama service (chosen).** Ollama is a purpose-built LLM runtime with a simple REST API, model lifecycle management (`ollama pull`), and support for grammar-constrained JSON output. It runs entirely on the existing server (i7-6700, 64 GB RAM) with no cloud dependency.
-
-**Model selection:** Qwen 2.5 7B Q4_K_M (`qwen2.5:7b-instruct-q4_K_M`) was chosen over larger models because:
- Quantised weight is ~4.5 GB — fits comfortably in 64 GB RAM alongside PostgreSQL and the JVM.
- Instruction-tuned variant follows the structured JSON schema reliably without fine-tuning.
- CPU-only inference at Q4_K_M takes 2–15 seconds per query, acceptable for a search that replaces a multi-step filter form.
-
-**Prompt injection mitigation:** The backend sends the raw user query to Ollama. To prevent the model from being prompted to return schema-breaking output, the API call uses Ollama's `format` parameter with a grammar-constrained JSON schema. Output length is further bounded by `maxLength` constraints in the schema (names ≤ 200 chars, keywords ≤ 100 chars). `NlQueryParserService` enforces these limits in code before any LLM-extracted fragment is passed to `PersonRepository.searchByName()` — defence in depth.
-
-**DB-blind name resolution:** The Ollama prompt stays small (the raw query only); person database records are never sent to the model. Name resolution happens as a cheap SQL query after the model returns. This keeps the prompt short, avoids data leakage, and means adding 1,000 new persons requires no prompt change.
-
-**Graceful degradation:** In-path Ollama failures surface via `OllamaClient.parse()` — any `IOException`, read timeout, or non-2xx response is caught by `RestClientOllamaClient` and re-thrown as `DomainException(SMART_SEARCH_UNAVAILABLE, HTTP 503)`. `isHealthy()` has no callers inside `search/`; it is reserved for the ops/health-endpoint polling path only (e.g. a future `/api/health/ollama` endpoint). The regular structured search (`GET /api/documents/search`) is unaffected — it never calls Ollama.
-
-**Expected inference latency:** 2–15 seconds on the current CPU-only hardware. The frontend issue must show a persistent "Suche läuft…" indicator for the full duration (see `aria-live="polite"` requirement in issue #738 frontend notes). The backend timeout is 30 seconds (`app.ollama.timeout-seconds=30`) — chosen as a safe upper bound for Q4_K_M on the i7-6700 with a realistic 500-character query under modest concurrent load.
-
-**NL query logging policy:** Only metadata is logged — query length, resolved person count, latency in milliseconds. The raw query is never written to the log file. Rationale: queries contain real family names (PII); log files persist to disk and may be shipped to Loki. Structured metadata is sufficient for debugging latency regressions.
-
-**Prompt-amplification abuse:** A malicious user could submit a long or crafted query to cause slow Ollama inference, consuming CPU. Mitigated by `NlSearchRateLimiter` (5 requests per user per minute, Bucket4j + Caffeine) and by `@Size(max=500)` on the request body. The rate limiter is node-local; in multi-replica deployments the effective limit multiplies by replica count — acceptable at the current single-node deployment scale.
-
-**Ollama model pre-pull requirement:** The Docker image contains only the Ollama binary, not the model weights. The operator must run `ollama pull qwen2.5:7b-instruct-q4_K_M` (≈4.5 GB download, 10–30 minutes) before the backend starts inference. If skipped, every NL search request returns 503 until the pull completes. The deployment runbook in `docs/DEPLOYMENT.md` covers this explicitly.
-
-**Startup dependency:** The `backend` Compose service declares `depends_on: ollama: condition: service_healthy`. The Ollama healthcheck polls `GET http://localhost:11434/api/tags`; `start_period: 120s` provides margin for weight loading (20–60 s on SSD). Note: `service_healthy` confirms the API is responding, not that the model is downloaded — if the pull was skipped, inference still returns 404.
-
-**Multi-name resolution heuristic:** For 2-name queries (e.g. "Was hat Walter an Emma geschrieben?"), the first extracted name is treated as sender and the second as receiver. Per-name role annotation (e.g. `{name: "Walter", role: "sender"}`) was rejected because it would require a combinatorially complex Ollama schema and the most natural German phrasing strongly implies sender→receiver order. For single-name queries, a `personRole` field (`sender`/`receiver`/`any`) is returned.
-
-**`personRole: "any"` keyword limitation:** When `personRole` is `"any"` and the name resolves to exactly one person, `DocumentService.searchDocumentsByPersonId()` is called (OR semantics: person as sender or receiver). Keyword filtering is not applied on this path — only person identity and date range. `keywordsApplied = false` is returned in the response. Rationale: the JPQL for OR-semantics person queries has no text predicate; adding FTS would require a native query or a separate pass, adding complexity for a case that is already well-narrowed by person identity.
-
-**`search/` → `person/` + `document/` dependency direction:** `NlQueryParserService` calls `PersonService.findByDisplayNameContaining()` and `DocumentService.searchDocuments()` — both are legitimate cross-domain service calls, not repository leaks. The `search/` package has no JPA entities of its own and never accesses `PersonRepository` or `DocumentRepository` directly.
-
-**Keyword→tag resolution** (issue #743): After Ollama extracts the `keywords` list, `NlQueryParserService` calls `TagService.findByNameContaining()` for each keyword. Keywords that match one or more tags are removed from the FTS text list and added as OR-union tag filters; keywords with no tag match remain as FTS text. Resolved tags are returned to the frontend as `TagHint` objects in `NlQueryInterpretation.resolvedTags` and rendered as removable "Thema: X" chips. The `tagsApplied` flag signals whether the OR-union filter was actually passed to `DocumentService.searchDocuments()` — it is `false` when the `personRole:any` single-person path is taken, because that path has no tag filter slot. See ADR-033 for the tag name resolution and case-collision rules that `TagService.findByNameContaining()` relies on.
-
-## Decision
-
-**Introduce a new `search/` domain package** with a local Ollama integration via `RestClientOllamaClient`. The Ollama service runs as a separate Docker container, reachable only on the internal Docker network (`expose: ["11434"]`, not `ports:`). The backend calls Ollama's `/api/generate` endpoint with grammar-constrained JSON output. Name resolution and document search are performed by existing services after the model returns.
-
-Key component structure:
- `OllamaClient` / `OllamaHealthClient` interfaces — mockable for tests, modelled on `OcrClient`/`OcrHealthClient`
- `RestClientOllamaClient` — two `RestClient` instances (30 s inference, 2 s health-check)
- `NlQueryParserService` — orchestrates Ollama → name resolution → document search
- `NlSearchRateLimiter` — Bucket4j + Caffeine, 5 req/min per user
- `NlSearchController` — `POST /api/search/nl`, `@RequirePermission(READ_ALL)`
-
-## Consequences
-
- Family members can query in natural German without learning filter UI. Expected search satisfaction improvement for the 60+ age cohort (primary transcription audience) is significant.
- NL search is unavailable when Ollama is down or the model pull is not complete. The regular search is unaffected. The 503 response includes a CTA directing users to the regular search.
- Operator responsibility: run `ollama pull` on first deploy and after model updates. The backup runbook must exclude `ollama_models` volume (model weights are re-downloadable, not user data).
- Inference takes 2–15 seconds. The frontend loading indicator is a hard requirement (see issue #738 frontend notes).
- The rate limiter is node-local. At the current single-node deployment scale this is correct. If the service is ever scaled horizontally, the rate limiter must be moved to Redis (same caveat as `LoginRateLimiter`).
- The `search/` package introduces a new cross-domain dependency direction (`search` → `person`, `search` → `document`). This is intentional and documented in `docs/architecture/c4/l3-backend-search.puml`.
--- a/docs/adr/028-ollama-docker-compose-service.md
+++ b/docs/adr/028-ollama-docker-compose-service.md
@@ -1,239 +0,0 @@
-# ADR-028: Ollama Docker Compose service for NL search
-
-**Date:** 2026-06-06
-**Status:** Accepted
-**Deciders:** Marcel Raddatz
-**Relates to:** #737 (infrastructure), #735 (NL search epic)
-
---
-
-## Context
-
-Issue #735 introduces natural-language document search, requiring a local LLM to generate embeddings and/or run inference at query time. The family archive stores personal family history — data privacy is non-negotiable, so cloud-based inference APIs are excluded. The production target is a Hetzner CX42 (16 GB RAM, 8 vCPUs, CPU-only, ~32 EUR/month).
-
-Alternatives considered:
-
-| Option | Reason rejected |
-|---|---|
-| **llama.cpp** | No HTTP API out of the box; requires custom wrapper; higher ops burden |
-| **vLLM** | GPU-first; significant overhead on CPU-only hardware; overkill for this scale |
-| **Cloud APIs** (OpenAI, Gemini, etc.) | Vendor lock-in; per-token cost at scale; data leaves the server — unacceptable for a private family archive |
-| **Ollama** | Self-contained Docker image; built-in HTTP REST API; actively maintained; CPU-compatible; zero egress |
-
-**Decision:** run Ollama as a Docker Compose service alongside the existing stack.
-
---
-
-## Decisions
-
-### 1. Hardware minimums and CPU-only constraint
-
-All inference runs on CPU. The target is the Hetzner CX42 (16 GB RAM, 8 vCPUs).
-
-| Tier | RAM | NL search |
-|---|---|---|
-| CX42 | 16 GB | Supported — full stack including Ollama |
-| CX32 | 8 GB | Disabled — set `APP_OLLAMA_BASE_URL=` (empty) to skip Ollama entirely |
-| CX22 | 4 GB | Unsupported for NL search |
-
-### 2. Memory budget on CX42
-
-| Component | `mem_limit` | Typical active RSS |
-|---|---|---|
-| OCR service | 12g (hard ceiling) | ~6 GB |
-| Ollama | 8g | ~8 GB |
-| **Total** | | **~14 GB active** |
-
-`memswap_limit` on the Ollama service is set to `8g` (matching `mem_limit`) to prevent Linux from swapping model weights into swap under OCR memory pressure. Swapping model weights does not crash the container but silently degrades inference latency. This mirrors the pattern already applied to the OCR service.
-
-**Operational constraint:** do NOT run `docker-compose.observability.yml` continuously alongside both OCR and Ollama on a CX42. The observability stack adds ~2 GB, which leaves no headroom.
-
-### 3. Graceful-degradation contract
-
-`app.ollama.base-url` absent OR blank → Ollama bean NOT registered → NL search returns HTTP 503 with `ErrorCode: NL_SEARCH_UNAVAILABLE`.
-
-This single code path covers all unavailability scenarios: base-url unset, service unreachable, health check failed, and request timeout.
-
-#### Why not `@ConditionalOnProperty`
-
-`@ConditionalOnProperty` registers the bean when the property is present but blank (`APP_OLLAMA_BASE_URL=`). This produces a `RestClient` with an empty base URL that fails at runtime with an opaque error rather than a clean 503.
-
-#### Correct condition expression
-
-```java
-@ConditionalOnExpression("!'${app.ollama.base-url:}'.isBlank()")
-```
-
-When the property is absent, the placeholder resolves to `''`; `.isBlank()` returns `true`; negation makes the condition `false`; the bean is not registered. Same result for an explicit empty string (`APP_OLLAMA_BASE_URL=`).
-
-### 4. Backend configuration pattern
-
-Use a `@ConfigurationProperties` record, not separate `@Value` injections:
-
-```java
-@ConfigurationProperties("app.ollama")
-record OllamaProperties(String baseUrl, String apiKey) {}
-```
-
-`OllamaProperties` is registered unconditionally — it is a plain value holder with no side effects.
-
-`@ConditionalOnExpression` belongs **only** on `RestClientOllamaClient` (the bean that creates a live network client).
-
-**Deliberate divergence from the OCR pattern:** the OCR service uses `@Value`-with-default because OCR is always-on and `http://ocr-service:8000` is a safe default. Ollama is truly optional — a missing URL means "feature disabled", not "use this default server". There is no safe default Ollama URL.
-
-### 5. Optional<OllamaClient> injection
-
-The NL search service uses constructor injection with `Optional<OllamaClient>`:
-
-```java
-private final Optional<OllamaClient> ollamaClient;
-```
-
-When empty (bean not registered), the service method returns 503 immediately:
-
-```java
-var client = ollamaClient.orElseThrow(
-    () -> DomainException.internal(ErrorCode.NL_SEARCH_UNAVAILABLE, "Ollama not configured"));
-```
-
-Prefer this over `@Autowired(required = false)` with a null check — the null-check pattern is noisy when the service already uses `@RequiredArgsConstructor`.
-
-### 6. Empty API key guard
-
-`RestClientOllamaClient` omits the `Authorization` header entirely when `apiKey` is blank:
-
-```java
-if (!apiKey.isBlank()) {
-    request.header("Authorization", "Bearer " + apiKey);
-}
-```
-
-Sending `Authorization: Bearer ` (empty token) has undefined or potentially broken behavior depending on the Ollama version. This mirrors the `trainingToken` guard in `RestClientOcrClient.java:107`.
-
-### 7. OLLAMA_API_KEY behavior in Ollama 0.6.5 and 0.30.6
-
-**Empirically verified (2026-06-06) on both `0.6.5` and `0.30.6`:** `OLLAMA_API_KEY` does **not** enforce request authentication in either version.
-
-Test matrix run against `/api/tags`:
-
-| Configuration | No auth header | `Authorization: Bearer ` (empty) | `Authorization: Bearer wrongkey` | `Authorization: Bearer correctkey` |
-|---|---|---|---|---|
-| `OLLAMA_API_KEY=` (empty) | 200 | 200 | — | — |
-| `OLLAMA_API_KEY` unset | 200 | — | — | — |
-| `OLLAMA_API_KEY=testkey99` | 200 | 200 | 200 | 200 |
-
-**Finding:** The `OLLAMA_API_KEY` environment variable is not listed in Ollama's startup config dump and does not gate any HTTP request in either tested version. All configurations — empty string, fully unset, and a real key — accept all requests without authentication.
-
-**Practical implication:** `OLLAMA_API_KEY` provides no defense-in-depth in the tested versions. `archiv-net` network isolation is the only effective security control. The env var is retained in the Compose definition and `.env.example` for forward compatibility if Ollama enables enforcement in a future version, but operators must not rely on it for access control.
-
-**Backend guard still valid:** the `RestClientOllamaClient` code-level guard (omit `Authorization` header when `apiKey.isBlank()`) remains correct behavior regardless — it prevents a malformed `Authorization: Bearer ` header from being sent.
-
-### 8. read_only: true feasibility
-
-**Empirically verified (2026-06-06) on both `0.6.5` and `0.30.6`:** `read_only: true` works with Ollama. All three operations — `ollama serve`, `ollama pull qwen2.5:7b-instruct-q4_K_M`, and `ollama list` — succeeded with exit code 0 in both versions.
-
-Test run:
-```bash
-docker run --rm --read-only \
-  -v ollama_models:/root/.ollama \
-  --tmpfs /tmp \
-  --entrypoint sh ollama/ollama:0.30.6 \
-  -c "ollama serve & sleep 5 && ollama pull qwen2.5:7b-instruct-q4_K_M && ollama list"
-```
-
-**Note:** the entrypoint must be overridden to `sh` for the test command — the container's default entrypoint is `/bin/ollama` and does not accept `sh` as a subcommand. This is a Docker invocation detail; the Compose service definition uses the image's default entrypoint and `command:` override for the init container, which works correctly.
-
-**Result:** `read_only: true` and `tmpfs: - /tmp:size=512m` are applied to both `ollama` and `ollama-model-init`. The `ollama_models` volume handles all persistent writes; no other paths require write access during normal operation.
-
-### 9. Peak RSS of init container during pull
-
-**Empirically verified (2026-06-06):** Peak RSS during `qwen2.5:7b-instruct-q4_K_M` pull was **~108 MiB**.
-
-`docker stats` samples during the pull (15-second intervals):
-
-| Sample | MEM |
-|---|---|
-| 1 | 54.89 MiB |
-| 2 | 66.3 MiB |
-| 5 | 97.25 MiB |
-| 9 | **107.8 MiB** (peak) |
-
-`mem_limit: 2g` is adequate — the model weights stream directly to the named volume; RSS is dominated by the Ollama server process alone (~100 MB), not the model data. No bump to 4 GB needed.
-
-### 10. Init container pull mechanism
-
-The `ollama-model-init` container uses a curl-based readiness loop with captured PID:
-
-```sh
-ollama serve & SERVE_PID=$!
-until curl -sf http://localhost:11434/api/tags; do sleep 1; done
-ollama pull qwen2.5:7b-instruct-q4_K_M
-kill $SERVE_PID
-```
-
-`kill %1` (job-control syntax) is unreliable in non-interactive `sh -c` contexts. Capturing the PID via `SERVE_PID=$!` is reliable.
-
-The same endpoint (`/api/tags`) is used for both the init container readiness loop and the main service `healthcheck`.
-
-### 11. start_period: 60s rationale
-
-The model is pre-pulled by `ollama-model-init` before the main service starts (via `condition: service_completed_successfully`). At main service startup, Ollama only loads model weights from the named volume and binds port 11434.
-
-60 seconds is appropriate for this cold-start profile. 300 seconds was considered — that would be appropriate if the service pulled the model itself — but overstates actual startup time when the model is already present on the volume.
-
-### 12. Security threat model
-
-**Primary control:** `archiv-net` network isolation. Ollama has no externally exposed port (`expose:` only, not `ports:`). The Caddyfile must not route any path to the Ollama service.
-
-**Note on `OLLAMA_API_KEY`:** Per §7, `OLLAMA_API_KEY` is not enforced in Ollama 0.6.5 or 0.30.6 and provides no authentication barrier against a compromised backend container. `archiv-net` network isolation is the sole effective security control. The env var is retained for forward compatibility only — do not rely on it for access control.
-
-Both `ollama` and `ollama-model-init` receive the ADR-019 hardening baseline:
-
-```yaml
-cap_drop: [ALL]
-security_opt: [no-new-privileges:true]
-```
-
-### 13. CI exclusion strategy
-
-Docker Compose profiles are not used — they would add developer friction (requiring `--profile ...` for all local dev commands).
-
-CI uses explicit service selection in `docker-compose.ci.yml`:
-```bash
-docker compose -f docker-compose.ci.yml up -d db minio create-buckets
-```
-
-Ollama is simply not listed and is never started in CI. A YAML comment on the `ollama` service block documents this:
-
-```yaml
-# Not started in CI — CI uses explicit service selection
-# (docker-compose.ci.yml: db minio create-buckets)
-```
-
-### 14. ollama_models volume operational note
-
-The `ollama_models` named volume holds model weights only — fully reproducible by re-pull. No backup is needed.
-
-If the volume fills after a model upgrade:
-```bash
-docker volume rm ollama_models && docker compose up -d
-```
-The init container re-pulls the model on next startup.
-
---
-
-## Consequences
-
-### Positive
-
- NL search runs entirely on-premises; no data leaves the server and no per-token cloud cost.
- Graceful degradation is a first-class concern: smaller or budget-constrained instances can run the app without Ollama with a single env var change.
- The init container pattern keeps model pull out of the critical startup path for the main service, giving accurate healthcheck timings.
- `@ConditionalOnExpression` with a blank-check is more correct than `@ConditionalOnProperty` for optional features with no safe default URL.
-
-### Risks and operational implications
-
- **Memory pressure:** OCR + Ollama together consume ~14 GB on a 16 GB host. Running the observability stack simultaneously risks OOM kills. Monitor with `docker stats`.
- **CPU inference latency:** `qwen2.5:7b-instruct-q4_K_M` is chosen for CPU viability, but inference on 8 vCPUs will be noticeably slower than GPU-accelerated alternatives. This is acceptable for the family archive use case (low concurrency, not real-time).
- All three empirical TBD items from the original issue spec were resolved — see §7 (OLLAMA_API_KEY not enforced), §8 (`read_only: true` works), §9 (peak RSS ~108 MiB).
- Model upgrades require a `docker volume rm` to free old weights before pulling the replacement. Document this in runbook/DEPLOYMENT.md.
--- a/docs/adr/034-ollama-production-deployment-and-keep-alive.md
+++ b/docs/adr/034-ollama-production-deployment-and-keep-alive.md
@@ -1,125 +0,0 @@
-# ADR-034: Ollama in production — deployment, keep-alive pinning, and corrected init recipe
-
-**Date:** 2026-06-06
-**Status:** Accepted
-**Deciders:** Marcel Raddatz
-**Relates to:** #758 (bug), #759 (fix), #737 (NL search infrastructure)
-**Corrects:** ADR-028 §10–§11 (init recipe and readiness probe)
-
---
-
-## Context
-
-ADR-028 introduced Ollama as a Docker Compose service for NL search and documented
-its topology, graceful-degradation contract, and memory budget. Two defects survived
-that work and only surfaced once NL search reached staging (#758):
-
-1. **Ollama was added only to the dev `docker-compose.yml`.** Staging and production
-   deploy from the self-contained `docker-compose.prod.yml`, which had no `ollama`
-   service. The backend defaults to `app.ollama.base-url: http://ollama:11434`, so its
-   client bean was active and resolved to a non-existent host → `ResourceAccessException`
-   → HTTP 503 on every NL search.
-2. **The init recipe documented in ADR-028 §10 never worked.** The `ollama/ollama` image
-   `ENTRYPOINT` is `ollama`, so a bare `command: sh -c "…"` ran as `ollama sh -c "…"`
-   (`unknown command "sh"`), and the image ships **no curl**, so the curl-based readiness
-   loop and the curl healthcheck could never pass.
-
-This ADR records the production deployment decision and the corrected operational
-contract. It is also the durable record of *why* `OLLAMA_KEEP_ALIVE=-1` is set, so a
-future maintainer does not "optimize" it away and reintroduce the cold-load 503.
-
---
-
-## Decisions
-
-### 1. Ollama is a first-class production service
-
-`docker-compose.prod.yml` now defines `ollama` + `ollama-model-init` + the
-`ollama-models` volume, mirroring the dev stack. The graceful-degradation contract from
-ADR-028 §3 is preserved: `backend` has **no** hard `depends_on` on `ollama`, so an absent
-or unhealthy Ollama still yields a clean 503 rather than blocking backend startup.
-
-### 2. Corrected init recipe (supersedes ADR-028 §10)
-
-The init container overrides the image entrypoint to a shell and probes readiness with
-`ollama list` (not curl, which the image lacks):
-
-```sh
-ollama serve & until ollama list >/dev/null 2>&1; do sleep 1; done && \
-  (ollama list | grep -q 'qwen2.5:7b-instruct-q4_K_M' || ollama pull qwen2.5:7b-instruct-q4_K_M)
-```
-
-```yaml
-entrypoint: ["/bin/sh", "-c"]
-```
-
-The pull is **guarded by a grep on the cached model list**. A model already on the volume
-exits clean without any registry round-trip. This makes re-up offline-safe: a host reboot
-during a registry/network blip can no longer fail init (which, via
-`condition: service_completed_successfully`, would otherwise block the `ollama` service
-and take NL search down until the registry was reachable again). The same recipe is used
-in dev and prod — one mental model.
-
-### 3. Healthcheck uses `ollama list` (supersedes ADR-028 §11 probe)
-
-```yaml
-healthcheck:
-  test: ["CMD", "ollama", "list"]
-```
-
-`ollama list` hits the local API and exits non-zero when the server is down — the correct
-probe for a curl-less image. The `start_period: 60s` rationale from ADR-028 §11 still holds.
-
-### 4. `OLLAMA_KEEP_ALIVE=-1` — pin the model in memory
-
-```yaml
-environment:
-  OLLAMA_KEEP_ALIVE: "-1"
-```
-
-By default Ollama evicts an idle model after ~5 minutes. The next query then pays a
-cold-load penalty that exceeds the backend read timeout, producing an NL search 503 after
-any idle period. Pinning the model (`-1` = never unload) keeps warm-path latency
-predictable (~18 s on CPU). **Do not remove this** without re-introducing the post-idle
-cold-load 503.
-
-### 5. Read timeout raised 30 → 60 s
-
-`app.ollama.timeout-seconds` is raised from 30 to 60 (`application.yaml`, mirrored in
-`DEPLOYMENT.md`). Warm CPU inference is ~18 s; the higher ceiling absorbs the one cold
-model load on the first query after an Ollama (re)start, before §4's pin takes hold.
-
-**Implicit NFR made explicit:** NL search shall return a result or a 503 within 60 s; the
-cold-start path immediately after an Ollama restart is the only path that approaches this
-ceiling.
-
-### 6. Hard-OOM trade-off (refines ADR-028 §2)
-
-`memswap_limit == mem_limit` (both `${OLLAMA_MEM_LIMIT:-8g}`) disables swap for the
-container. Combined with §4's pinned model, a memory-pressure event is a **hard OOM-kill,
-not graceful latency degradation**. This is deliberate — swap-thrashing an LLM is worse
-than a clean restart — but it means the 8 GB envelope is a real ceiling. `qwen2.5-7B-q4`
-plus its KV cache under load sits close enough to 8 GB that this needs a Prometheus
-memory alert on the `ollama` container before it bites in production (tracked as
-observability follow-up, not in this PR).
-
---
-
-## Consequences
-
-### Positive
-
- NL search works on staging/production, not just dev — the actual deploy artifact now
-  matches the documented architecture.
- Re-up is offline-safe: a cached model never depends on registry reachability.
- The keep-alive pin and timeout ceiling make NL search latency predictable on CPU.
-
-### Risks and operational implications
-
- **Hard OOM under memory pressure** (§6): a Prometheus alert on `ollama` container memory
-  is required before this is load-bearing in prod. Tracked as an observability follow-up.
- **Unauthenticated inference** relies entirely on `archiv-net` isolation (ADR-028 §7/§12,
-  unchanged). Sending an `Authorization` header from `RestClientOllamaClient` is a separate
-  durable hardening item, tracked outside this PR.
- ADR-028 §10–§11 describe a recipe that never functioned; this ADR is the authoritative
-  init/healthcheck contract going forward.
--- a/docs/adr/034-remove-nl-search.md
+++ b/docs/adr/034-remove-nl-search.md
@@ -0,0 +1,53 @@
+# ADR-034 — Remove NL/smart-search (supersedes ADR-028 ×2, ADR-034-ollama, ADR-035)
+
+**Date:** 2026-06-07
+**Status:** Accepted
+**Issue:** #772
+**Supersedes:** ADR-028 (nl-search-ollama), ADR-028 (ollama-docker-compose-service), ADR-034 (ollama-production-deployment-and-keep-alive), ADR-035 (rule-based-nlp-service)
+
+---
+
+## Context
+
+The natural-language search feature ("KI-Suche" / smart search) allowed users to enter
+free-form queries like *"Was hat Walter an Emma im Krieg geschrieben?"* and have them
+interpreted by an LLM into structured filters (persons, tags, date range, keywords).
+
+The feature went through two major iterations:
+1. **Ollama integration** (ADR-028): an `ollama` Docker service running a local LLM
+   (llama3.2/gemma3) parsed queries via a JSON-mode prompt.
+2. **Rule-based NLP service** (ADR-035): after Ollama proved too slow and unreliable on
+   CPU-only hardware, a Python FastAPI microservice (`nlp-service`, port 8001) replaced
+   it with deterministic regex + spaCy parsing plus a lightweight LLM call.
+
+Both approaches shared the same fundamental problem: inference on the production server
+(Hetzner Serverbörse, no GPU, 64 GB RAM, i7-6700) was too slow to be useful, with
+typical query latencies of 10–30 seconds. Users got better and faster results from
+the existing keyword search with date/person/tag filters.
+
+## Decision
+
+**Remove the NL search feature entirely.** The Python `nlp-service` microservice, the
+Spring Boot `search/` package (`NlSearchController`, `NlQueryParserService`,
+`RestClientNlpClient`, `NlSearchRateLimiter`, and all supporting classes), the frontend
+NL search components (`SmartModeToggle`, `SmartSearchStatus`, `InterpretationChipRow`,
+`DisambiguationPicker`), the related Docker Compose services, Prometheus scrape job,
+Grafana dashboard, and all i18n keys are removed.
+
+The existing structured search (FTS keyword + person/tag/date/directional filters) is
+sufficient for the archive's current audience and search workload.
+
+## Consequences
+
+- **Capability removed:** users can no longer enter free-form natural-language queries.
+  They must use the structured filter bar (keyword text box + person/tag/date/directional
+  dropdowns). For searches that these filters can already express, there is no regression.
+- **Operational simplification:** the Docker Compose stack loses the `nlp-service`
+  container (the `ollama`/`ollama-model-init` services from the first iteration are gone
+  as well). Host memory is freed, and no model weights need to be kept warm.
+- **Future reinstatement:** if a GPU-capable host becomes available, server-side LLM
+  inference could be re-implemented behind a single entry point, as `NlSearchController`
+  was; the removed code remains in git history. This ADR deliberately leaves no dead
+  infrastructure or stub code in place: start clean if and when that becomes viable.
+- **No data or schema change:** only query/endpoint code and Docker services are removed.
+  The `documents`, `persons`, and `tags` tables and their FTS indexes are untouched.
--- a/docs/architecture/c4/l2-containers.puml
+++ b/docs/architecture/c4/l2-containers.puml
@@ -12,15 +12,13 @@ System_Boundary(archiv, "Familienarchiv (Docker Compose)") {
    Container(frontend, "Web Frontend", "SvelteKit / Node adapter / port 3000", "Server-side rendered UI. Handles auth session cookies, document search and viewer, transcription editor, annotation layer, family tree (Stammbaum), stories (Geschichten), activity feed (Chronik), enrichment workflow, and admin panel.")
    Container(backend, "API Backend", "Spring Boot 4 / Java 21 / Jetty / port 8080", "REST API. Implements document management, search, user auth, file upload/download, transcription, OCR orchestration, and SSE notifications. Trusts X-Forwarded-* headers from Caddy.")
    Container(ocr, "OCR Service", "Python FastAPI / port 8000", "Handwritten text recognition (HTR) and OCR microservice. Single-node by design — see ADR-001. Reachable only on the internal Docker network; no external port exposed.")
-    Container(ollama, "Ollama LLM Service", "ollama/ollama:0.30.6 / port 11434 (internal only)", "Local LLM inference server for NL search. Runs qwen2.5:7b-instruct-q4_K_M on CPU. Reachable only on the internal Docker network; no external port exposed. Disabled when APP_OLLAMA_BASE_URL is unset or blank.")
-    ' Named volume: ollama_models — model weights, fully reproducible, no backup needed
    ContainerDb(db, "Relational Database", "PostgreSQL 16", "Stores document metadata, persons, users, permission groups, tags, transcription blocks, audit log, and Spring Session data.")
    ContainerDb(storage, "Object Storage", "MinIO (S3-compatible)", "Stores the actual document files (PDFs, scans). Backend uses a bucket-scoped service account (archiv-app), not MinIO root.")
    Container(mc, "Bucket / Service-Account Init", "MinIO Client (mc)", "One-shot container on startup. Idempotent: creates the archive bucket, the archiv-app service account, and attaches the readwrite policy.")
 }

 System_Boundary(observability, "Observability Stack (/opt/familienarchiv/docker-compose.observability.yml)") {
-    Container(prometheus, "Prometheus", "prom/prometheus:v3.4.0", "Scrapes metrics from backend (8081 /actuator/prometheus), OCR service (8000 /metrics), Ollama (11434 /metrics), node-exporter, and cAdvisor. Retention: 30 days.")
+    Container(prometheus, "Prometheus", "prom/prometheus:v3.4.0", "Scrapes metrics from backend (8081 /actuator/prometheus), OCR service (8000 /metrics), node-exporter, and cAdvisor. Retention: 30 days.")
    Container(node_exporter, "Node Exporter", "prom/node-exporter:v1.9.0", "Host-level CPU, memory, disk, and network metrics.")
    Container(cadvisor, "cAdvisor", "gcr.io/cadvisor/cadvisor:v0.52.1", "Per-container resource metrics.")
    Container(loki, "Loki", "grafana/loki:3.4.2", "Stores log streams from all containers.")
@@ -43,12 +41,11 @@ Rel(backend, ocr, "OCR job requests with presigned MinIO URL", "HTTP / REST / JS
 Rel(backend, mail, "Sends notification and password-reset emails (optional)", "SMTP")
 Rel(ocr, storage, "Fetches PDF via presigned URL", "HTTP / S3 presigned")
 Rel(mc, storage, "Bootstraps bucket + service account on startup", "MinIO Client CLI")
-Rel(backend, ollama, "NL query parsing (POST /api/generate)", "HTTP / REST / JSON")
 Rel(promtail, loki, "Pushes log streams", "HTTP/Loki push API")
 Rel(backend, tempo, "Sends distributed traces via OTLP", "HTTP / OTLP / port 4318 (archiv-net)")
 Rel(prometheus, backend, "Scrapes JVM + HTTP metrics", "HTTP 8081 /actuator/prometheus")
 Rel(prometheus, ocr, "Scrapes OCR + http_* metrics", "HTTP 8000 /metrics")
-Rel(prometheus, ollama, "Scrapes LLM request metrics", "HTTP 11434 /metrics")
+
 Rel(grafana, prometheus, "Queries metrics", "HTTP 9090")
 Rel(grafana, loki, "Queries logs", "HTTP 3100")
 Rel(grafana, tempo, "Queries traces", "HTTP 3200")
--- a/docs/architecture/c4/l3-backend-3h-search.puml
+++ b/docs/architecture/c4/l3-backend-3h-search.puml
@@ -1,37 +0,0 @@
-@startuml
-!include <C4/C4_Component>
-
-title Component Diagram: API Backend — NL Search
-
-Container(frontend, "Web Frontend", "SvelteKit")
-ContainerDb(db, "PostgreSQL", "PostgreSQL 16")
-Container(ollama, "Ollama", "ollama/ollama — port 11434 (internal only)")
-
-System_Boundary(backend, "API Backend (Spring Boot)") {
-    Component(nlCtrl, "NlSearchController", "Spring MVC — POST /api/search/nl", "REST entry point for natural language search. Enforces READ_ALL permission. Uses @AuthenticationPrincipal UserDetails to obtain the caller's email for rate limiting. Delegates to NlQueryParserService and returns NlSearchResponse.")
-    Component(rateLimiter, "NlSearchRateLimiter", "Spring Service", "Bucket4j + Caffeine LoadingCache keyed on user email. Allows 5 NL search requests per minute per user. Throws DomainException(SMART_SEARCH_RATE_LIMITED / HTTP 429) when the bucket is exhausted. Node-local — same caveat as LoginRateLimiter.")
-    Component(parserSvc, "NlQueryParserService", "Spring Service", "Orchestrates the full NL search pipeline: (1) validates query length, (2) calls OllamaClient.parse() to extract structured intent, (3) resolves keywords to tags via TagService.findByNameContaining(), (4) resolves each person name via PersonService.findByDisplayNameContaining(), (5) applies multi-name / personRole heuristics, (6) delegates to DocumentService.searchDocuments() or searchDocumentsByPersonId(). Returns NlSearchResponse. Never logs raw query content (PII).")
-    Component(ollamaClient, "RestClientOllamaClient", "Spring Service — implements OllamaClient + OllamaHealthClient", "HTTP client for the Ollama API. Uses two separate RestClient instances: inference client (30 s read timeout) and health-check client (2 s connect timeout). Calls POST /api/generate with grammar-constrained JSON schema (personNames, personRole, dateFrom, dateTo, keywords). isHealthy() polls GET /api/tags. Null-coalesces absent personNames/keywords to List.of(). Defaults unknown personRole to 'any' with a warning log. Maps timeout/5xx/parse errors to DomainException(SMART_SEARCH_UNAVAILABLE / HTTP 503).")
-    Component(ollamaProps, "OllamaProperties", "@ConfigurationProperties(\"app.ollama\")", "Config bean: baseUrl, model (qwen2.5:7b-instruct-q4_K_M), timeoutSeconds (default: 30), healthCheckTimeoutSeconds (default: 2).")
-    Component(rateLimitProps, "NlSearchRateLimitProperties", "@ConfigurationProperties(\"app.nl-search.rate-limit\")", "Config bean: maxRequestsPerMinute (default: 5).")
-}
-
-Component(personSvc, "PersonService", "Spring Service", "See diagram 3e. findByDisplayNameContaining(fragment) delegates to PersonRepository.searchByName() — covers first+last name, alias, and name aliases via LEFT JOIN.")
-Component(documentSvc, "DocumentService", "Spring Service", "See diagram 3b. searchDocuments() for keyword/sender/receiver/date queries. searchDocumentsByPersonId() for OR-semantics single-person queries (person as sender OR receiver, no keyword filter).")
-Component(tagSvc, "TagService", "Spring Service", "See diagram 3b. findByNameContaining(fragment) delegates to TagRepository.findByNameContainingIgnoreCase(). resolveEffectiveColors() applies one-level color inheritance in-place on a collection of Tag entities.")
-
-Rel(frontend, nlCtrl, "POST /api/search/nl with JSON query", "HTTP / JSON")
-Rel(nlCtrl, rateLimiter, "checkAndConsume(userEmail)")
-Rel(nlCtrl, parserSvc, "parse(query)")
-Rel(parserSvc, ollamaClient, "parse(rawQuery) — extracts intent", "HTTP / JSON")
-Rel(ollamaClient, ollama, "POST /api/generate (grammar-constrained JSON schema)", "HTTP / REST")
-Rel(ollamaClient, ollama, "GET /api/tags (health check)", "HTTP / REST")
-Rel(parserSvc, tagSvc, "findByNameContaining(keyword) — keyword→tag resolution")
-Rel(parserSvc, tagSvc, "resolveEffectiveColors(tags)")
-Rel(parserSvc, personSvc, "findByDisplayNameContaining(name) for each extracted name")
-Rel(parserSvc, documentSvc, "searchDocuments() or searchDocumentsByPersonId()")
-Rel(documentSvc, db, "JPA queries", "JDBC")
-Rel(personSvc, db, "JPA queries", "JDBC")
-Rel(tagSvc, db, "JPA queries", "JDBC")
-
-@enduml
--- a/frontend/CLAUDE.md
+++ b/frontend/CLAUDE.md
@@ -28,7 +28,6 @@ src/
 │   ├── +layout.server.ts # Loads current user, injects auth cookie
 │   ├── +page.svelte     # Home / document search dashboard
 │   ├── documents/       # Document CRUD, detail, edit, upload
-│   ├── search/          # Smart (NL) search sub-components — SmartModeToggle, InterpretationChipRow, SmartSearchStatus, DisambiguationPicker (no +page; consumed by documents/ and SearchFilterBar)
 │   ├── persons/         # Person directory (filtered, paginated), detail, edit, merge, review (triage)
 │   ├── aktivitaeten/    # Unified activity feed (Chronik)
 │   ├── admin/           # User, group, tag, OCR, system management
--- a/frontend/e2e/nl-search.spec.ts
+++ b/frontend/e2e/nl-search.spec.ts
@@ -1,113 +0,0 @@
-import AxeBuilder from '@axe-core/playwright';
-import { test, expect } from '@playwright/test';
-
-// NL search is mocked at the network boundary — Ollama is not required in CI.
-// CSRF enforcement is bypassed by page.route (the real request is never sent),
-// so it is only verified in manual full-stack runs (see issue #739 DevOps notes).
-const interpretation = {
-	resolvedPersons: [
-		{ id: '11111111-1111-1111-1111-111111111111', displayName: 'Walter Raddatz' },
-		{ id: '22222222-2222-2222-2222-222222222222', displayName: 'Emma Raddatz' }
-	],
-	ambiguousPersons: [],
-	dateFrom: '1914-01-01',
-	dateTo: '1918-12-31',
-	keywords: ['krieg'],
-	resolvedTags: [{ id: '33333333-3333-3333-3333-333333333333', name: 'Weltkrieg', color: 'sage' }],
-	rawQuery: 'Was hat Walter an Emma im Krieg geschrieben?',
-	keywordsApplied: true,
-	tagsApplied: true
-};
-
-const nlResponse = {
-	result: {
-		items: [],
-		totalElements: 0,
-		pageNumber: 0,
-		pageSize: 20,
-		totalPages: 0,
-		undatedCount: 0
-	},
-	interpretation
-};
-
-test.describe('NL (smart) search — happy path', () => {
-	test('toggle → loading → chips → remove chip re-runs keyword search; axe clean light + dark', async ({
-		page
-	}) => {
-		// Deliberate delay so the loading state is assertable before the response arrives.
-		await page.route('**/api/search/nl', async (route) => {
-			await new Promise((resolve) => setTimeout(resolve, 150));
-			await route.fulfill({
-				status: 200,
-				contentType: 'application/json',
-				body: JSON.stringify(nlResponse)
-			});
-		});
-
-		await page.goto('/documents');
-		await page.waitForSelector('[data-hydrated]');
-
-		// Switch to smart mode via the toggle pill (keyword label = "Text").
-		await page.getByRole('button', { name: /Text/ }).click();
-
-		const input = page.getByPlaceholder('Titel, Personen, Tags durchsuchen…');
-		await input.fill('Was hat Walter an Emma im Krieg geschrieben?');
-		await input.press('Enter');
-
-		// Loading panel announced to screen readers.
-		await expect(page.getByText(/Archiv wird befragt/)).toBeVisible();
-
-		// Directional chip (Walter → Emma) + keyword chip + theme chip render once the fixture resolves.
-		await expect(page.getByText('→')).toBeVisible();
-		await expect(page.getByText('Stichwort: krieg')).toBeVisible();
-		await expect(page.getByText(/Thema:.*Weltkrieg/)).toBeVisible();
-
-		// Accessibility — light mode.
-		const lightScan = await new AxeBuilder({ page })
-			.include('[data-testid="smart-search-results"]')
-			.analyze();
-		expect(lightScan.violations).toEqual([]);
-
-		// Accessibility — dark mode.
-		await page.evaluate(() => document.documentElement.setAttribute('data-theme', 'dark'));
-		const darkScan = await new AxeBuilder({ page })
-			.include('[data-testid="smart-search-results"]')
-			.analyze();
-		expect(darkScan.violations).toEqual([]);
-		await page.evaluate(() => document.documentElement.setAttribute('data-theme', 'light'));
-
-		// Removing the keyword chip re-runs a keyword GET with the remaining resolved
-		// params (sender + receiver from the directional pair).
-		await page.getByRole('button', { name: 'Filter entfernen: Stichwort: krieg' }).click();
-		await page.waitForURL(/senderId=11111111-1111-1111-1111-111111111111/);
-		await expect(page).toHaveURL(/receiverId=22222222-2222-2222-2222-222222222222/);
-	});
-
-	test('removing the last theme chip drops tag/tagOp but keeps person params', async ({ page }) => {
-		await page.route('**/api/search/nl', async (route) => {
-			await route.fulfill({
-				status: 200,
-				contentType: 'application/json',
-				body: JSON.stringify(nlResponse)
-			});
-		});
-
-		await page.goto('/documents');
-		await page.waitForSelector('[data-hydrated]');
-		await page.getByRole('button', { name: /Text/ }).click();
-
-		const input = page.getByPlaceholder('Titel, Personen, Tags durchsuchen…');
-		await input.fill('Was hat Walter an Emma im Krieg geschrieben?');
-		await input.press('Enter');
-
-		await expect(page.getByText(/Thema:.*Weltkrieg/)).toBeVisible();
-
-		// Remove the single theme chip — URL must carry sender UUID but no tag/tagOp.
-		await page.getByRole('button', { name: 'Filter entfernen: Thema: Weltkrieg' }).click();
-		await page.waitForURL(/senderId=11111111-1111-1111-1111-111111111111/);
-		const url = page.url();
-		expect(url).not.toMatch(/tag=/);
-		expect(url).not.toMatch(/tagOp=/);
-	});
-});
--- a/frontend/messages/de.json
+++ b/frontend/messages/de.json
@@ -22,33 +22,6 @@
 	"error_forbidden": "Sie haben keine Berechtigung für diese Aktion.",
 	"error_csrf_token_missing": "Sitzungsfehler. Bitte laden Sie die Seite neu.",
 	"error_too_many_login_attempts": "Zu viele Anmeldeversuche. Bitte versuchen Sie es später erneut.",
-	"error_smart_search_unavailable": "Die intelligente Suche ist momentan nicht verfügbar. Bitte nutzen Sie die normale Suche.",
-	"error_smart_search_rate_limited": "Sie haben die Suchfunktion zu häufig genutzt. Bitte warten Sie eine Minute.",
-	"smart_search_keywords_not_applied": "Schlüsselwörter konnten bei dieser Suche nicht berücksichtigt werden.",
-	"search_toggle_smart_label": "KI",
-	"search_toggle_smart_label_suffix": "-Suche",
-	"search_toggle_keyword_label": "Text",
-	"search_toggle_keyword_label_suffix": "suche",
-	"search_loading_nl": "Archiv wird befragt…",
-	"search_loading_nl_sub": "Die KI analysiert Ihre Anfrage. Das kann bis zu 15 Sekunden dauern.",
-	"search_error_unavailable": "Intelligente Suche nicht verfügbar",
-	"search_error_unavailable_body": "Die KI-Suche ist momentan nicht erreichbar. Sie können Ihre Anfrage als einfache Volltextsuche wiederholen.",
-	"search_switch_to_keyword": "Zur Volltextsuche wechseln",
-	"search_error_rate_limited": "Zu viele Anfragen",
-	"search_error_rate_limited_body": "Sie haben die intelligente Suche zu häufig genutzt. Bitte warten Sie eine Minute und versuchen Sie es erneut.",
-	"search_empty_nl": "Keine Ergebnisse",
-	"search_empty_retry_keyword": "Als Volltextsuche wiederholen",
-	"search_filter_remove_label": "Filter entfernen: {label}",
-	"search_chip_sender": "Absender",
-	"search_chip_date": "Zeitraum",
-	"search_chip_keyword": "Stichwort",
-	"search_chip_theme_prefix": "Thema",
-	"search_chip_directional_label": "Von {from} zu {to}, Filter entfernen",
-	"search_disambiguation_trigger_label": "Mehrere Personen gefunden — zum Auswählen klicken",
-	"search_disambiguation_cue": "(auswählen…)",
-	"search_disambiguation_heading": "Person auswählen",
-	"search_disambiguation_did_you_mean": "Meintest du {name}?",
-	"search_disambiguation_select_label": "{name} auswählen",
 	"error_validation_error": "Die Eingabe ist ungültig.",
 	"error_internal_error": "Ein unerwarteter Fehler ist aufgetreten.",
 	"nav_documents": "Dokumente",
--- a/frontend/messages/en.json
+++ b/frontend/messages/en.json
@@ -22,33 +22,6 @@
 	"error_forbidden": "You do not have permission for this action.",
 	"error_csrf_token_missing": "Session error. Please reload the page.",
 	"error_too_many_login_attempts": "Too many login attempts. Please try again later.",
-	"error_smart_search_unavailable": "The smart search is currently unavailable. Please use the regular search.",
-	"error_smart_search_rate_limited": "You have used the search function too frequently. Please wait a minute.",
-	"smart_search_keywords_not_applied": "Keywords could not be applied to this search.",
-	"search_toggle_smart_label": "AI",
-	"search_toggle_smart_label_suffix": " search",
-	"search_toggle_keyword_label": "Text",
-	"search_toggle_keyword_label_suffix": " search",
-	"search_loading_nl": "Querying the archive…",
-	"search_loading_nl_sub": "The AI is analysing your request. This can take up to 15 seconds.",
-	"search_error_unavailable": "Smart search unavailable",
-	"search_error_unavailable_body": "The AI search is currently unreachable. You can repeat your request as a plain full-text search.",
-	"search_switch_to_keyword": "Switch to full-text search",
-	"search_error_rate_limited": "Too many requests",
-	"search_error_rate_limited_body": "You have used the smart search too frequently. Please wait a minute and try again.",
-	"search_empty_nl": "No results",
-	"search_empty_retry_keyword": "Repeat as full-text search",
-	"search_filter_remove_label": "Remove filter: {label}",
-	"search_chip_sender": "Sender",
-	"search_chip_date": "Period",
-	"search_chip_keyword": "Keyword",
-	"search_chip_theme_prefix": "Topic",
-	"search_chip_directional_label": "From {from} to {to}, remove filter",
-	"search_disambiguation_trigger_label": "Several people found — click to choose",
-	"search_disambiguation_cue": "(choose…)",
-	"search_disambiguation_heading": "Choose a person",
-	"search_disambiguation_did_you_mean": "Did you mean {name}?",
-	"search_disambiguation_select_label": "Select {name}",
 	"error_validation_error": "The input is invalid.",
 	"error_internal_error": "An unexpected error occurred.",
 	"nav_documents": "Documents",
--- a/frontend/messages/es.json
+++ b/frontend/messages/es.json
@@ -22,33 +22,6 @@
 	"error_forbidden": "No tiene permiso para realizar esta acción.",
 	"error_csrf_token_missing": "Error de sesión. Recargue la página.",
 	"error_too_many_login_attempts": "Demasiados intentos. Por favor, inténtelo más tarde.",
-	"error_smart_search_unavailable": "La búsqueda inteligente no está disponible en este momento. Por favor, usa la búsqueda normal.",
-	"error_smart_search_rate_limited": "Has utilizado la función de búsqueda demasiadas veces. Por favor, espera un minuto.",
-	"smart_search_keywords_not_applied": "Las palabras clave no pudieron aplicarse a esta búsqueda.",
-	"search_toggle_smart_label": "IA",
-	"search_toggle_smart_label_suffix": " búsqueda",
-	"search_toggle_keyword_label": "Texto",
-	"search_toggle_keyword_label_suffix": " búsqueda",
-	"search_loading_nl": "Consultando el archivo…",
-	"search_loading_nl_sub": "La IA está analizando su solicitud. Esto puede tardar hasta 15 segundos.",
-	"search_error_unavailable": "Búsqueda inteligente no disponible",
-	"search_error_unavailable_body": "La búsqueda con IA no está disponible en este momento. Puede repetir su solicitud como una búsqueda de texto completo.",
-	"search_switch_to_keyword": "Cambiar a búsqueda de texto completo",
-	"search_error_rate_limited": "Demasiadas solicitudes",
-	"search_error_rate_limited_body": "Ha utilizado la búsqueda inteligente con demasiada frecuencia. Espere un minuto e inténtelo de nuevo.",
-	"search_empty_nl": "Sin resultados",
-	"search_empty_retry_keyword": "Repetir como búsqueda de texto completo",
-	"search_filter_remove_label": "Eliminar filtro: {label}",
-	"search_chip_sender": "Remitente",
-	"search_chip_date": "Período",
-	"search_chip_keyword": "Palabra clave",
-	"search_chip_theme_prefix": "Tema",
-	"search_chip_directional_label": "De {from} a {to}, eliminar filtro",
-	"search_disambiguation_trigger_label": "Se encontraron varias personas — haga clic para elegir",
-	"search_disambiguation_cue": "(elegir…)",
-	"search_disambiguation_heading": "Elegir una persona",
-	"search_disambiguation_did_you_mean": "¿Quería decir {name}?",
-	"search_disambiguation_select_label": "Seleccionar {name}",
 	"error_validation_error": "La entrada no es válida.",
 	"error_internal_error": "Se ha producido un error inesperado.",
 	"nav_documents": "Documentos",
--- a/frontend/src/lib/generated/api.ts
+++ b/frontend/src/lib/generated/api.ts
@@ -228,22 +228,6 @@ export interface paths {
        patch?: never;
        trace?: never;
    };
-    "/api/search/nl": {
-        parameters: {
-            query?: never;
-            header?: never;
-            path?: never;
-            cookie?: never;
-        };
-        get?: never;
-        put?: never;
-        post: operations["search"];
-        delete?: never;
-        options?: never;
-        head?: never;
-        patch?: never;
-        trace?: never;
-    };
    "/api/persons": {
        parameters: {
            query?: never;
@@ -1835,9 +1819,6 @@ export interface components {
            /** Format: uuid */
            targetId: string;
        };
-        NlSearchRequest: {
-            query: string;
-        };
        Pageable: {
            /** Format: int32 */
            page?: number;
@@ -1897,34 +1878,6 @@ export interface components {
            /** Format: int32 */
            length: number;
        };
-        NlQueryInterpretation: {
-            resolvedPersons: components["schemas"]["PersonHint"][];
-            ambiguousPersons: components["schemas"]["PersonHint"][];
-            /** Format: date */
-            dateFrom?: string;
-            /** Format: date */
-            dateTo?: string;
-            keywords: string[];
-            resolvedTags: components["schemas"]["TagHint"][];
-            rawQuery: string;
-            keywordsApplied: boolean;
-            tagsApplied: boolean;
-        };
-        NlSearchResponse: {
-            result: components["schemas"]["DocumentSearchResult"];
-            interpretation: components["schemas"]["NlQueryInterpretation"];
-        };
-        PersonHint: {
-            /** Format: uuid */
-            id: string;
-            displayName: string;
-        };
-        TagHint: {
-            /** Format: uuid */
-            id: string;
-            name: string;
-            color?: string;
-        };
        SearchMatchData: {
            transcriptionSnippet?: string;
            titleOffsets: components["schemas"]["MatchOffset"][];
@@ -3244,32 +3197,6 @@ export interface operations {
            };
        };
    };
-    search: {
-        parameters: {
-            query: {
-                pageable: components["schemas"]["Pageable"];
-            };
-            header?: never;
-            path?: never;
-            cookie?: never;
-        };
-        requestBody: {
-            content: {
-                "application/json": components["schemas"]["NlSearchRequest"];
-            };
-        };
-        responses: {
-            /** @description OK */
-            200: {
-                headers: {
-                    [name: string]: unknown;
-                };
-                content: {
-                    "*/*": components["schemas"]["NlSearchResponse"];
-                };
-            };
-        };
-    };
    getPersons: {
        parameters: {
            query?: {
--- a/frontend/src/lib/shared/errors.ts
+++ b/frontend/src/lib/shared/errors.ts
@@ -53,8 +53,6 @@ export type ErrorCode =
 	| 'FORBIDDEN'
 	| 'CSRF_TOKEN_MISSING'
 	| 'TOO_MANY_LOGIN_ATTEMPTS'
-	| 'SMART_SEARCH_UNAVAILABLE'
-	| 'SMART_SEARCH_RATE_LIMITED'
 	| 'VALIDATION_ERROR'
 	| 'BATCH_TOO_LARGE'
 	| 'BULK_EDIT_TOO_MANY_IDS'
@@ -180,10 +178,6 @@ export function getErrorMessage(code: ErrorCode | string | undefined): string {
 			return m.error_csrf_token_missing();
 		case 'TOO_MANY_LOGIN_ATTEMPTS':
 			return m.error_too_many_login_attempts();
-		case 'SMART_SEARCH_UNAVAILABLE':
-			return m.error_smart_search_unavailable();
-		case 'SMART_SEARCH_RATE_LIMITED':
-			return m.error_smart_search_rate_limited();
 		case 'VALIDATION_ERROR':
 			return m.error_validation_error();
 		case 'BATCH_TOO_LARGE':
--- a/frontend/src/routes/SearchFilterBar.svelte
+++ b/frontend/src/routes/SearchFilterBar.svelte
@@ -3,7 +3,6 @@ import PersonTypeahead from '$lib/person/PersonTypeahead.svelte';
 import TagInput from '$lib/tag/TagInput.svelte';
 import DateInput from '$lib/shared/primitives/DateInput.svelte';
 import SortDropdown from '$lib/shared/primitives/SortDropdown.svelte';
-import SmartModeToggle from './search/SmartModeToggle.svelte';
 import { slide } from 'svelte/transition';
 import { m } from '$lib/paraglide/messages.js';

@@ -21,15 +20,12 @@ let {
 	sort = $bindable('DATE'),
 	dir = $bindable('desc'),
 	showAdvanced = $bindable(false),
-	smartMode = $bindable(false),
 	initialSenderName = '',
 	initialReceiverName = '',
 	navKey = 0,
 	isLoading = false,
 	onSearch,
 	onSearchImmediate,
-	onSmartSearch,
-	onModeToggle,
 	onfocus,
 	onblur
 }: {
@@ -46,28 +42,16 @@ let {
 	sort?: string;
 	dir?: string;
 	showAdvanced?: boolean;
-	smartMode?: boolean;
 	initialSenderName?: string;
 	initialReceiverName?: string;
 	navKey?: number;
 	isLoading?: boolean;
 	onSearch: () => void;
 	onSearchImmediate?: () => void;
-	onSmartSearch?: () => void;
-	onModeToggle?: () => void;
 	onfocus?: () => void;
 	onblur?: () => void;
 } = $props();

-// In smart mode the keyword search must not fire on every keystroke — the NL
-// query is submitted only on Enter (or an explicit button click).
-function onSearchKeydown(event: KeyboardEvent) {
-	if (smartMode && event.key === 'Enter') {
-		event.preventDefault();
-		onSmartSearch?.();
-	}
-}
-
 // Plain (non-reactive) flag — not $state, so no reactive assignment inside $effect
 let sortDirMounted = false;

@@ -92,19 +76,13 @@ $effect(() => {
 			<input
 				type="text"
 				bind:value={q}
-				oninput={smartMode ? undefined : onSearch}
-				onkeydown={onSearchKeydown}
+				oninput={onSearch}
 				onfocus={onfocus}
 				onblur={onblur}
-				maxlength={smartMode ? 500 : undefined}
 				aria-label={m.docs_search_placeholder()}
 				placeholder={m.docs_search_placeholder()}
-				class="block w-full border-line py-2.5 pl-10 placeholder-ink-3 shadow-sm focus:outline-none focus-visible:ring-2 focus-visible:ring-focus-ring {smartMode
-					? 'pr-28'
-					: 'pr-20'}"
+				class="block w-full border-line py-2.5 pr-4 pl-10 placeholder-ink-3 shadow-sm focus:outline-none focus-visible:ring-2 focus-visible:ring-focus-ring"
 			/>
-			<!-- Decorative search icon / loading spinner — left slot keeps the right
-			     slot free for the always-visible smart-mode toggle pill. -->
 			<div class="pointer-events-none absolute inset-y-0 left-0 flex items-center pl-3">
 				{#if isLoading}
 					<svg
@@ -132,7 +110,6 @@ $effect(() => {
 					/>
 				{/if}
 			</div>
-			<SmartModeToggle bind:smartMode={smartMode} onToggle={onModeToggle} />
 		</div>

 		<!-- Sort Dropdown -->
--- a/frontend/src/routes/SearchFilterBar.svelte.spec.ts
+++ b/frontend/src/routes/SearchFilterBar.svelte.spec.ts
@@ -195,39 +195,3 @@ describe('SearchFilterBar – tagQ live filter', () => {
 		vi.unstubAllGlobals();
 	});
 });
-
-describe('SearchFilterBar – smart-mode chip lifecycle hooks', () => {
-	// The interpretation chips live in the result area (parent page). SearchFilterBar
-	// drives chip-clearing through callbacks: onModeToggle (mode switch) and
-	// onSmartSearch (new query). These tests pin that contract.
-	it('invokes onModeToggle when toggling back to keyword mode (parent clears chips)', async () => {
-		const onModeToggle = vi.fn();
-		render(SearchFilterBar, {
-			...defaultProps,
-			sort: 'DATE',
-			dir: 'desc',
-			smartMode: true,
-			onModeToggle
-		});
-		await page.getByRole('button', { name: /KI/ }).click();
-		expect(onModeToggle).toHaveBeenCalledOnce();
-	});
-
-	it('invokes onSmartSearch when a new query is submitted in smart mode (parent resets chips)', async () => {
-		const onSmartSearch = vi.fn();
-		render(SearchFilterBar, {
-			...defaultProps,
-			sort: 'DATE',
-			dir: 'desc',
-			smartMode: true,
-			onSmartSearch
-		});
-		const input = page.getByPlaceholder('Titel, Personen, Tags durchsuchen…');
-		await input.fill('Walter im Krieg');
-		await input.click();
-		(document.activeElement as HTMLElement).dispatchEvent(
-			new KeyboardEvent('keydown', { key: 'Enter', bubbles: true })
-		);
-		await vi.waitFor(() => expect(onSmartSearch).toHaveBeenCalled());
-	});
-});
--- a/frontend/src/routes/documents/+page.svelte
+++ b/frontend/src/routes/documents/+page.svelte
@@ -8,22 +8,9 @@ import DocumentList from '../DocumentList.svelte';
 import Pagination from '$lib/shared/primitives/Pagination.svelte';
 import BulkSelectionBar from '$lib/document/BulkSelectionBar.svelte';
 import TimelineDensityFilter from '$lib/document/TimelineDensityFilter.svelte';
-import SmartSearchStatus from '../search/SmartSearchStatus.svelte';
-import InterpretationChipRow from '../search/InterpretationChipRow.svelte';
-import type { ChipType } from '../search/chip-types.js';
-import { buildThemeRemovalUrl } from './theme-chip-removal.js';
-import DisambiguationPicker from '../search/DisambiguationPicker.svelte';
 import { bulkSelectionStore } from '$lib/document/bulkSelection.svelte';
 import { getErrorMessage, parseBackendError } from '$lib/shared/errors';
-import { csrfFetch } from '$lib/shared/cookies';
 import * as m from '$lib/paraglide/messages.js';
-import type { components } from '$lib/generated/api';
-
-type NlQueryInterpretation = components['schemas']['NlQueryInterpretation'];
-type NlSearchResponse = components['schemas']['NlSearchResponse'];
-type DocumentSearchResult = components['schemas']['DocumentSearchResult'];
-type PersonHint = components['schemas']['PersonHint'];
-type SmartSearchErrorCode = 'SMART_SEARCH_UNAVAILABLE' | 'SMART_SEARCH_RATE_LIMITED';

 let { data } = $props();

@@ -47,27 +34,6 @@ let tagQ = $state(untrack(() => data.tagQ || ''));
 let tagOperator = $state<'AND' | 'OR'>(untrack(() => (data.tagOp as 'AND' | 'OR') || 'AND'));
 let undated = $state(untrack(() => data.undated ?? false));

-// Smart (NL) search — UI-local state, resets on real page navigation (away + back).
-let smartMode = $state(false);
-let nlSubmitted = $state(false);
-let nlLoading = $state(false);
-let nlError = $state<SmartSearchErrorCode | null>(null);
-let nlInterpretation = $state<NlQueryInterpretation | null>(null);
-let nlResult = $state<DocumentSearchResult | null>(null);
-
-const showNlView = $derived(smartMode && nlSubmitted);
-const nlHasResults = $derived((nlResult?.items.length ?? 0) > 0);
-const ambiguousPersons = $derived(nlInterpretation?.ambiguousPersons ?? []);
-const nlIsAmbiguous = $derived(ambiguousPersons.length > 0);
-// A 1-item picker is always a "did you mean …?" suggestion (a single direct match auto-selects
-// and never reaches the picker); ≥2 keeps the "choose a person" framing and the action cue.
-const disambiguationHeading = $derived(
-	ambiguousPersons.length === 1
-		? m.search_disambiguation_did_you_mean({ name: ambiguousPersons[0].displayName })
-		: m.search_disambiguation_heading()
-);
-const showDisambiguationCue = $derived(ambiguousPersons.length >= 2);
-
 function hasAdvancedFilters() {
 	return (
 		(data.tags?.length ?? 0) > 0 ||
@@ -198,124 +164,6 @@ function handleImmediateSearch() {
 	triggerSearchKeepZoom();
 }

-function resetNlState() {
-	nlSubmitted = false;
-	nlLoading = false;
-	nlError = null;
-	nlInterpretation = null;
-	nlResult = null;
-}
-
-/** Toggling the mode (either direction) always clears any prior NL interpretation. */
-function onModeToggle() {
-	resetNlState();
-}
-
-/** Submit the natural-language query to the server-side parser. */
-async function runSmartSearch() {
-	const query = q.trim();
-	if (query.length < 3) return;
-	nlSubmitted = true;
-	nlLoading = true;
-	nlError = null;
-	nlInterpretation = null;
-	nlResult = null;
-	try {
-		const res = await csrfFetch('/api/search/nl', {
-			method: 'POST',
-			headers: { 'Content-Type': 'application/json' },
-			body: JSON.stringify({ query })
-		});
-		if (!res.ok) {
-			const backend = await parseBackendError(res);
-			nlError =
-				backend?.code === 'SMART_SEARCH_RATE_LIMITED'
-					? 'SMART_SEARCH_RATE_LIMITED'
-					: 'SMART_SEARCH_UNAVAILABLE';
-			return;
-		}
-		const body: NlSearchResponse = await res.json();
-		nlInterpretation = body.interpretation;
-		nlResult = body.result;
-	} catch {
-		nlError = 'SMART_SEARCH_UNAVAILABLE';
-	} finally {
-		nlLoading = false;
-	}
-}
-
-/** Option A empty/error fallback: drop NL mode, keep the raw query, run a keyword search. */
-function switchToKeywordMode() {
-	resetNlState();
-	smartMode = false;
-	handleImmediateSearch();
-}
-
-/** Applies a resolved param set to the keyword filters and re-runs via GET. */
-function applyResolvedAndSearch(p: {
-	senderId: string;
-	receiverId: string;
-	from: string;
-	to: string;
-	q: string;
-}) {
-	resetNlState();
-	smartMode = false;
-	senderId = p.senderId;
-	receiverId = p.receiverId;
-	from = p.from;
-	to = p.to;
-	q = p.q;
-	handleImmediateSearch();
-}
-
-function paramsFromInterpretation(interp: NlQueryInterpretation) {
-	const resolved = interp.resolvedPersons;
-	return {
-		senderId: resolved.length >= 1 ? resolved[0].id : '',
-		receiverId: resolved.length >= 2 ? resolved[1].id : '',
-		from: interp.dateFrom ?? '',
-		to: interp.dateTo ?? '',
-		q: interp.keywordsApplied ? interp.keywords.join(' ') : ''
-	};
-}
-
-function removeChip(type: ChipType, value?: string) {
-	if (!nlInterpretation) return;
-	const p = paramsFromInterpretation(nlInterpretation);
-	if (type === 'sender') {
-		p.senderId = '';
-	} else if (type === 'directional') {
-		p.senderId = '';
-		p.receiverId = '';
-	} else if (type === 'date') {
-		p.from = '';
-		p.to = '';
-	} else if (type === 'keyword' && value) {
-		const remaining = nlInterpretation.keywords.filter((keyword) => keyword !== value);
-		p.q = remaining.join(' ');
-	} else if (type === 'theme' && value) {
-		const url = buildThemeRemovalUrl(nlInterpretation, value);
-		resetNlState();
-		goto(url, { keepFocus: true, noScroll: true });
-		return;
-	}
-	applyResolvedAndSearch(p);
-}
-
-/** Single-select disambiguation: resolved person becomes sender, chosen becomes receiver. */
-function selectDisambiguated(person: PersonHint) {
-	if (!nlInterpretation) return;
-	const resolved = nlInterpretation.resolvedPersons;
-	applyResolvedAndSearch({
-		senderId: resolved.length >= 1 ? resolved[0].id : person.id,
-		receiverId: resolved.length >= 1 ? person.id : '',
-		from: nlInterpretation.dateFrom ?? '',
-		to: nlInterpretation.dateTo ?? '',
-		q: nlInterpretation.keywordsApplied ? nlInterpretation.keywords.join(' ') : ''
-	});
-}
-
 // Trigger search reactively when the tag list changes.
 let prevTagStr = untrack(() => tagNames.map((t) => t.name).join(','));
 $effect(() => {
@@ -420,7 +268,6 @@ $effect(() => {
 		bind:tagQ={tagQ}
 		bind:tagOperator={tagOperator}
 		bind:undated={undated}
-		bind:smartMode={smartMode}
 		undatedCount={data.undatedCount ?? 0}
 		initialSenderName={initialSenderName}
 		initialReceiverName={initialReceiverName}
@@ -428,71 +275,20 @@ $effect(() => {
 		isLoading={navigating.to !== null}
 		onSearch={handleTextSearch}
 		onSearchImmediate={handleImmediateSearch}
-		onSmartSearch={runSmartSearch}
-		onModeToggle={onModeToggle}
 		onfocus={() => (qFocused = true)}
 		onblur={() => (qFocused = false)}
 	/>

-	{#if showNlView}
-		<!-- Smart-search results area: loading / error / chips + results / empty / disambiguation. -->
-		<div data-testid="smart-search-results">
-			{#if nlLoading}
-				<SmartSearchStatus status="loading" />
-			{:else if nlError}
-				<SmartSearchStatus
-					status="error"
-					errorCode={nlError}
-					onSwitchToKeyword={switchToKeywordMode}
-				/>
-			{:else if nlInterpretation}
-				{#key nlInterpretation}
-					<div class="mb-4">
-						{#if nlIsAmbiguous}
-							<DisambiguationPicker
-								persons={nlInterpretation.ambiguousPersons}
-								heading={disambiguationHeading}
-								showCue={showDisambiguationCue}
-								onSelect={selectDisambiguated}
-							/>
-						{:else}
-							<InterpretationChipRow interpretation={nlInterpretation} onRemoveChip={removeChip} />
-						{/if}
-					</div>
-
-					{#if !nlIsAmbiguous}
-						{#if nlHasResults}
-							<p class="mb-3 font-sans text-base text-ink-2">
-								{m.docs_result_count({ count: nlResult?.totalElements ?? 0 })}
-							</p>
-							<DocumentList items={nlResult?.items ?? []} canWrite={data.canWrite} sort={sort} />
-						{:else}
-							<div class="flex flex-col items-center justify-center gap-3 py-16 text-center">
-								<p class="text-sm font-bold text-ink">{m.search_empty_nl()}</p>
-								<button
-									type="button"
-									onclick={switchToKeywordMode}
-									class="inline-flex min-h-[44px] items-center rounded px-3 py-2 text-sm font-bold text-primary underline underline-offset-4 outline-none hover:text-brand-mint focus-visible:ring-2 focus-visible:ring-brand-navy"
-								>
-									{m.search_empty_retry_keyword()}
-								</button>
-							</div>
-						{/if}
-					{/if}
-				{/key}
-			{/if}
-		</div>
-	{:else}
-		<div class="mt-3 mb-4 hidden lg:block">
-			<TimelineDensityFilter
-				density={data.density}
-				minDate={data.minDate}
-				maxDate={data.maxDate}
-				zoomFrom={data.zoomFrom}
-				zoomTo={data.zoomTo}
-				from={from}
-				to={to}
-				onchange={(event) => {
+	<div class="mt-3 mb-4 hidden lg:block">
+		<TimelineDensityFilter
+			density={data.density}
+			minDate={data.minDate}
+			maxDate={data.maxDate}
+			zoomFrom={data.zoomFrom}
+			zoomTo={data.zoomTo}
+			from={from}
+			to={to}
+			onchange={(event) => {
 					from = event.from;
 					to = event.to;
 					// Drag commits filter + zoom atomically (Graylog-style range selector).
@@ -503,70 +299,69 @@ $effect(() => {
 						triggerSearchKeepZoom();
 					}
 				}}
-				onzoomchange={(event) => {
+			onzoomchange={(event) => {
 					triggerSearchWithZoom(event?.zoomFrom ?? null, event?.zoomTo ?? null);
 				}}
-			/>
-		</div>
+		/>
+	</div>

-		<div class="mb-3 flex items-center justify-between gap-4">
-			<p class="font-sans text-base text-ink-2">
-				{#if data.totalElements > 0}{m.docs_result_count({ count: data.totalElements })}{/if}
-			</p>
-			{#if data.canWrite}
-				<div class="flex flex-col items-end gap-1">
-					<div class="flex items-center gap-4">
-						{#if data.totalElements > 0}
-							<button
-								type="button"
-								onclick={editAllMatching}
-								disabled={editingAll}
-								class="inline-flex cursor-pointer items-center gap-1 text-sm font-medium text-ink-2 transition-colors hover:text-ink disabled:opacity-50"
-								data-testid="bulk-edit-all-x"
-							>
-								<img
-									src="/degruyter-icons/Simple/Medium-24px/SVG/Action/Edit-Content-MD.svg"
-									alt=""
-									aria-hidden="true"
-									class="h-4 w-4"
-								/>
-								{m.bulk_edit_all_x({ count: data.totalElements })}
-							</button>
-						{/if}
-						<a
-							href="/documents/new"
-							class="inline-flex items-center gap-1 text-sm font-medium text-ink-2 transition-colors hover:text-ink"
+	<div class="mb-3 flex items-center justify-between gap-4">
+		<p class="font-sans text-base text-ink-2">
+			{#if data.totalElements > 0}{m.docs_result_count({ count: data.totalElements })}{/if}
+		</p>
+		{#if data.canWrite}
+			<div class="flex flex-col items-end gap-1">
+				<div class="flex items-center gap-4">
+					{#if data.totalElements > 0}
+						<button
+							type="button"
+							onclick={editAllMatching}
+							disabled={editingAll}
+							class="inline-flex cursor-pointer items-center gap-1 text-sm font-medium text-ink-2 transition-colors hover:text-ink disabled:opacity-50"
+							data-testid="bulk-edit-all-x"
 						>
 							<img
-								src="/degruyter-icons/Simple/Medium-24px/SVG/Action/Add/Add-General-MD.svg"
+								src="/degruyter-icons/Simple/Medium-24px/SVG/Action/Edit-Content-MD.svg"
 								alt=""
 								aria-hidden="true"
 								class="h-4 w-4"
 							/>
-							{m.docs_btn_new()}
-						</a>
-					</div>
-					{#if editAllError}
-						<p role="alert" class="text-xs text-danger" data-testid="bulk-edit-all-x-error">
-							{editAllError}
-						</p>
+							{m.bulk_edit_all_x({ count: data.totalElements })}
+						</button>
 					{/if}
+					<a
+						href="/documents/new"
+						class="inline-flex items-center gap-1 text-sm font-medium text-ink-2 transition-colors hover:text-ink"
+					>
+						<img
+							src="/degruyter-icons/Simple/Medium-24px/SVG/Action/Add/Add-General-MD.svg"
+							alt=""
+							aria-hidden="true"
+							class="h-4 w-4"
+						/>
+						{m.docs_btn_new()}
+					</a>
 				</div>
-			{/if}
-		</div>
+				{#if editAllError}
+					<p role="alert" class="text-xs text-danger" data-testid="bulk-edit-all-x-error">
+						{editAllError}
+					</p>
+				{/if}
+			</div>
+		{/if}
+	</div>

-		<DocumentList
-			items={data.items}
-			q={data.q}
-			canWrite={data.canWrite}
-			error={data.error}
-			sort={sort}
-			from={data.from}
-			to={data.to}
-		/>
+	<DocumentList
+		items={data.items}
+		q={data.q}
+		canWrite={data.canWrite}
+		error={data.error}
+		sort={sort}
+		from={data.from}
+		to={data.to}
+	/>

-		<Pagination page={data.pageNumber} totalPages={data.totalPages} makeHref={buildPageHref} />
-	{/if}
+	<Pagination page={data.pageNumber} totalPages={data.totalPages} makeHref={buildPageHref} />
 </main>

 <BulkSelectionBar canWrite={data.canWrite} />
--- a/frontend/src/routes/documents/theme-chip-removal.spec.ts
+++ b/frontend/src/routes/documents/theme-chip-removal.spec.ts
@@ -1,85 +0,0 @@
-import { describe, it, expect } from 'vitest';
-import { buildThemeRemovalUrl } from './theme-chip-removal.js';
-import type { components } from '$lib/generated/api';
-
-type NlQueryInterpretation = components['schemas']['NlQueryInterpretation'];
-
-function makeInterp(overrides: Partial<NlQueryInterpretation> = {}): NlQueryInterpretation {
-	return {
-		resolvedPersons: [],
-		ambiguousPersons: [],
-		keywords: [],
-		resolvedTags: [],
-		rawQuery: '',
-		keywordsApplied: false,
-		tagsApplied: true,
-		...overrides
-	};
-}
-
-function makeTag(id: string, name: string, color?: string) {
-	return color ? { id, name, color } : { id, name };
-}
-
-describe('buildThemeRemovalUrl', () => {
-	it('N remaining tags → N tag params + tagOp=OR', () => {
-		const interp = makeInterp({
-			resolvedTags: [
-				makeTag('aaa', 'Hochzeit'),
-				makeTag('bbb', 'Weltkrieg'),
-				makeTag('ccc', 'Familie')
-			]
-		});
-		const url = buildThemeRemovalUrl(interp, 'Hochzeit');
-		const params = new URL(url, 'http://x').searchParams;
-		expect(params.getAll('tag')).toEqual(['Weltkrieg', 'Familie']);
-		expect(params.get('tagOp')).toBe('OR');
-	});
-
-	it('last tag removed → no tag or tagOp params in URL', () => {
-		const interp = makeInterp({
-			resolvedTags: [makeTag('aaa', 'Hochzeit')]
-		});
-		const url = buildThemeRemovalUrl(interp, 'Hochzeit');
-		const params = new URL(url, 'http://x').searchParams;
-		expect(params.getAll('tag')).toEqual([]);
-		expect(params.get('tagOp')).toBeNull();
-	});
-
-	it('last tag removed with resolved sender person → sender param intact', () => {
-		const interp = makeInterp({
-			resolvedPersons: [{ id: '11111111-1111-1111-1111-111111111111', displayName: 'Walter' }],
-			resolvedTags: [makeTag('aaa', 'Hochzeit')]
-		});
-		const url = buildThemeRemovalUrl(interp, 'Hochzeit');
-		const params = new URL(url, 'http://x').searchParams;
-		expect(params.get('senderId')).toBe('11111111-1111-1111-1111-111111111111');
-		expect(params.getAll('tag')).toEqual([]);
-		expect(params.get('tagOp')).toBeNull();
-	});
-
-	it('null-color tag → tag name emitted correctly; color does not affect params', () => {
-		const interp = makeInterp({
-			resolvedTags: [makeTag('aaa', 'Erbschaft'), makeTag('bbb', 'Migration')]
-		});
-		const url = buildThemeRemovalUrl(interp, 'Erbschaft');
-		const params = new URL(url, 'http://x').searchParams;
-		expect(params.getAll('tag')).toEqual(['Migration']);
-		expect(params.get('tagOp')).toBe('OR');
-	});
-
-	it('directional pair → senderId and receiverId both emitted', () => {
-		const interp = makeInterp({
-			resolvedPersons: [
-				{ id: '11111111-1111-1111-1111-111111111111', displayName: 'Walter' },
-				{ id: '22222222-2222-2222-2222-222222222222', displayName: 'Emma' }
-			],
-			resolvedTags: [makeTag('aaa', 'Krieg'), makeTag('bbb', 'Heimat')]
-		});
-		const url = buildThemeRemovalUrl(interp, 'Krieg');
-		const params = new URL(url, 'http://x').searchParams;
-		expect(params.get('senderId')).toBe('11111111-1111-1111-1111-111111111111');
-		expect(params.get('receiverId')).toBe('22222222-2222-2222-2222-222222222222');
-		expect(params.getAll('tag')).toEqual(['Heimat']);
-	});
-});
--- a/frontend/src/routes/documents/theme-chip-removal.ts
+++ b/frontend/src/routes/documents/theme-chip-removal.ts
@@ -1,26 +0,0 @@
-import type { components } from '$lib/generated/api';
-
-type NlQueryInterpretation = components['schemas']['NlQueryInterpretation'];
-
-export function buildThemeRemovalUrl(
-	interp: NlQueryInterpretation,
-	removedTagName: string
-): string {
-	const remaining = interp.resolvedTags.filter((t) => t.name !== removedTagName);
-	const params = new URLSearchParams();
-
-	const resolved = interp.resolvedPersons;
-	if (resolved.length >= 1) params.set('senderId', resolved[0].id);
-	if (resolved.length >= 2) params.set('receiverId', resolved[1].id);
-	if (interp.dateFrom) params.set('from', interp.dateFrom);
-	if (interp.dateTo) params.set('to', interp.dateTo);
-	if (interp.keywordsApplied && interp.keywords.length > 0) {
-		params.set('q', interp.keywords.join(' '));
-	}
-
-	remaining.forEach((tag) => params.append('tag', tag.name));
-	if (remaining.length > 0) params.set('tagOp', 'OR');
-
-	const qs = params.toString();
-	return qs ? `/documents?${qs}` : '/documents';
-}
--- a/frontend/src/routes/search/DisambiguationPicker.svelte
+++ b/frontend/src/routes/search/DisambiguationPicker.svelte
@@ -1,102 +0,0 @@
-<script lang="ts">
-import { tick } from 'svelte';
-import { m } from '$lib/paraglide/messages.js';
-import { clickOutside } from '$lib/shared/actions/clickOutside';
-import type { components } from '$lib/generated/api';
-
-type PersonHint = components['schemas']['PersonHint'];
-
-let {
-	persons,
-	heading,
-	showCue,
-	onSelect
-}: {
-	persons: PersonHint[];
-	heading: string;
-	showCue: boolean;
-	onSelect: (person: PersonHint) => void;
-} = $props();
-
-let open = $state(false);
-let triggerEl = $state<HTMLButtonElement>();
-let listEl = $state<HTMLUListElement>();
-
-const panelId = 'disambiguation-panel';
-const headingId = 'disambiguation-heading';
-const names = $derived(persons.map((person) => person.displayName).join(', '));
-const triggerLabel = $derived(
-	persons.length === 1 ? heading : m.search_disambiguation_trigger_label()
-);
-
-async function openPicker() {
-	open = true;
-	await tick();
-	listEl?.querySelector<HTMLButtonElement>('button')?.focus();
-}
-
-function closePicker() {
-	open = false;
-	triggerEl?.focus();
-}
-
-function toggle() {
-	if (open) closePicker();
-	else openPicker();
-}
-
-function select(person: PersonHint) {
-	open = false;
-	onSelect(person);
-}
-
-function onKeydown(event: KeyboardEvent) {
-	if (event.key === 'Escape' && open) {
-		event.stopPropagation();
-		closePicker();
-	}
-}
-</script>
-
-<svelte:window onkeydown={onKeydown} />
-
-<div class="relative inline-block" use:clickOutside onclickoutside={() => open && closePicker()}>
-	<button
-		bind:this={triggerEl}
-		type="button"
-		aria-haspopup="true"
-		aria-expanded={open}
-		aria-controls={panelId}
-		aria-label={triggerLabel}
-		onclick={toggle}
-		class="inline-flex min-h-[44px] items-center gap-1.5 rounded-full border border-line bg-muted px-3 text-sm text-ink-2 outline-none focus-visible:ring-2 focus-visible:ring-brand-navy"
-	>
-		<span class="max-w-[8rem] truncate sm:max-w-[12rem]">{names}</span>
-		{#if showCue}
-			<span class="text-ink-3">{m.search_disambiguation_cue()}</span>
-		{/if}
-	</button>
-
-	{#if open}
-		<div
-			id={panelId}
-			class="absolute left-0 z-10 mt-1 min-w-[12rem] rounded-sm border border-line bg-surface py-1 shadow-md"
-		>
-			<p id={headingId} class="px-4 py-1.5 text-sm font-bold text-ink">{heading}</p>
-			<ul bind:this={listEl} aria-labelledby={headingId}>
-				{#each persons as person (person.id)}
-					<li>
-						<button
-							type="button"
-							aria-label={m.search_disambiguation_select_label({ name: person.displayName })}
-							onclick={() => select(person)}
-							class="flex min-h-[44px] w-full items-center px-4 text-left text-sm text-ink outline-none hover:bg-muted focus-visible:bg-muted focus-visible:ring-2 focus-visible:ring-brand-navy"
-						>
-							{person.displayName}
-						</button>
-					</li>
-				{/each}
-			</ul>
-		</div>
-	{/if}
-</div>
--- a/frontend/src/routes/search/DisambiguationPicker.svelte.spec.ts
+++ b/frontend/src/routes/search/DisambiguationPicker.svelte.spec.ts
@@ -1,118 +0,0 @@
-import { describe, expect, it, vi, afterEach } from 'vitest';
-import { cleanup, render } from 'vitest-browser-svelte';
-import { page } from 'vitest/browser';
-import DisambiguationPicker from './DisambiguationPicker.svelte';
-import type { components } from '$lib/generated/api';
-
-type PersonHint = components['schemas']['PersonHint'];
-
-afterEach(() => cleanup());
-
-const persons: PersonHint[] = [
-	{ id: 'w1', displayName: 'Walter Raddatz' },
-	{ id: 'w2', displayName: 'Walter Müller' }
-];
-
-const multiProps = { persons, heading: 'Person auswählen', showCue: true };
-
-function pressEscape() {
-	(document.activeElement as HTMLElement).dispatchEvent(
-		new KeyboardEvent('keydown', { key: 'Escape', bubbles: true })
-	);
-}
-
-describe('DisambiguationPicker', () => {
-	it('opens the picker and shows a select option per ambiguous person', async () => {
-		render(DisambiguationPicker, { ...multiProps, onSelect: vi.fn() });
-		await page.getByRole('button', { name: /Mehrere Personen gefunden/ }).click();
-		await expect
-			.element(page.getByRole('button', { name: 'Walter Raddatz auswählen' }))
-			.toBeInTheDocument();
-		await expect
-			.element(page.getByRole('button', { name: 'Walter Müller auswählen' }))
-			.toBeInTheDocument();
-	});
-
-	it('moves focus into the picker list on open', async () => {
-		render(DisambiguationPicker, { ...multiProps, onSelect: vi.fn() });
-		await page.getByRole('button', { name: /Mehrere Personen gefunden/ }).click();
-		await expect
-			.element(page.getByRole('button', { name: 'Walter Raddatz auswählen' }))
-			.toHaveFocus();
-	});
-
-	it('returns focus to the trigger when closed with Escape', async () => {
-		render(DisambiguationPicker, { ...multiProps, onSelect: vi.fn() });
-		const trigger = page.getByRole('button', { name: /Mehrere Personen gefunden/ });
-		await trigger.click();
-		await expect
-			.element(page.getByRole('button', { name: 'Walter Raddatz auswählen' }))
-			.toHaveFocus();
-		pressEscape();
-		await expect.element(trigger).toHaveFocus();
-	});
-
-	it('does not call onSelect when dismissed without choosing', async () => {
-		const onSelect = vi.fn();
-		render(DisambiguationPicker, { ...multiProps, onSelect });
-		await page.getByRole('button', { name: /Mehrere Personen gefunden/ }).click();
-		await expect
-			.element(page.getByRole('button', { name: 'Walter Raddatz auswählen' }))
-			.toHaveFocus();
-		pressEscape();
-		expect(onSelect).not.toHaveBeenCalled();
-	});
-
-	it('calls onSelect with the chosen person', async () => {
-		const onSelect = vi.fn();
-		render(DisambiguationPicker, { ...multiProps, onSelect });
-		await page.getByRole('button', { name: /Mehrere Personen gefunden/ }).click();
-		await page.getByRole('button', { name: 'Walter Müller auswählen' }).click();
-		expect(onSelect).toHaveBeenCalledWith(persons[1]);
-	});
-
-	it('renders the supplied heading as a visible panel heading', async () => {
-		render(DisambiguationPicker, {
-			persons: [{ id: 'c1', displayName: 'Clara Cramer' }],
-			heading: 'Meintest du Clara Cramer?',
-			showCue: false,
-			onSelect: vi.fn()
-		});
-		await page.getByRole('button', { name: 'Meintest du Clara Cramer?' }).click();
-		await expect.element(page.getByText('Meintest du Clara Cramer?')).toBeVisible();
-	});
-
-	it('suppresses the cue when showCue is false', async () => {
-		render(DisambiguationPicker, {
-			persons: [{ id: 'c1', displayName: 'Clara Cramer' }],
-			heading: 'Meintest du Clara Cramer?',
-			showCue: false,
-			onSelect: vi.fn()
-		});
-		await expect.element(page.getByText('(auswählen…)')).not.toBeInTheDocument();
-	});
-
-	it('shows the cue when showCue is true', async () => {
-		render(DisambiguationPicker, { ...multiProps, onSelect: vi.fn() });
-		await expect.element(page.getByText('(auswählen…)')).toBeVisible();
-	});
-
-	it('announces the did-you-mean heading as the trigger accessible name for a single suggestion', async () => {
-		render(DisambiguationPicker, {
-			persons: [{ id: 'c1', displayName: 'Clara Cramer' }],
-			heading: 'Meintest du Clara Cramer?',
-			showCue: false,
-			onSelect: vi.fn()
-		});
-		await expect
-			.element(page.getByRole('button', { name: 'Meintest du Clara Cramer?' }))
-			.toBeInTheDocument();
-	});
-
-	it('keeps the multiple-people trigger accessible name for two or more suggestions', async () => {
-		render(DisambiguationPicker, { ...multiProps, onSelect: vi.fn() });
-		await expect
-			.element(page.getByRole('button', { name: /Mehrere Personen gefunden/ }))
-			.toBeInTheDocument();
-	});
-});
--- a/frontend/src/routes/search/InterpretationChipRow.svelte
+++ b/frontend/src/routes/search/InterpretationChipRow.svelte
@@ -1,181 +0,0 @@
-<script lang="ts">
-import { SvelteSet } from 'svelte/reactivity';
-import { m } from '$lib/paraglide/messages.js';
-import type { components } from '$lib/generated/api';
-
-import type { ChipType } from './chip-types.js';
-
-type NlQueryInterpretation = components['schemas']['NlQueryInterpretation'];
-type TagHint = components['schemas']['TagHint'];
-
-let {
-	interpretation,
-	onRemoveChip
-}: {
-	interpretation: NlQueryInterpretation;
-	onRemoveChip: (type: ChipType, value?: string) => void;
-} = $props();
-
-type Chip =
-	| { key: string; type: 'sender'; label: string }
-	| { key: string; type: 'directional'; from: string; to: string }
-	| { key: string; type: 'date'; label: string }
-	| { key: string; type: 'keyword'; value: string; label: string }
-	| { key: string; type: 'theme'; tag: TagHint; label: string };
-
-// Locally removed chips. The parent remounts this component (via {#key}) on every
-// new NL search, so this set never needs an explicit reset.
-const removed = new SvelteSet<string>();
-
-function yearOf(iso: string | undefined): string | undefined {
-	return iso?.slice(0, 4);
-}
-
-function dateRangeLabel(from: string | undefined, to: string | undefined): string {
-	const fromYear = yearOf(from);
-	const toYear = yearOf(to);
-	if (fromYear && toYear) return fromYear === toYear ? fromYear : `${fromYear}–${toYear}`;
-	return fromYear ?? toYear ?? '';
-}
-
-function tagColorStyle(color: string | undefined): string | undefined {
-	if (!color) return undefined;
-	return `background-color: var(--c-tag-${color}); border-left-color: var(--c-tag-${color})`;
-}
-
-const chips = $derived.by(() => {
-	const list: Chip[] = [];
-	const {
-		resolvedPersons,
-		dateFrom,
-		dateTo,
-		keywords,
-		keywordsApplied,
-		resolvedTags,
-		tagsApplied
-	} = interpretation;
-
-	if (resolvedPersons.length >= 2) {
-		list.push({
-			key: 'directional',
-			type: 'directional',
-			from: resolvedPersons[0].displayName,
-			to: resolvedPersons[1].displayName
-		});
-	} else if (resolvedPersons.length === 1) {
-		list.push({
-			key: 'sender:' + resolvedPersons[0].id,
-			type: 'sender',
-			label: `${m.search_chip_sender()}: ${resolvedPersons[0].displayName}`
-		});
-	}
-
-	if (dateFrom || dateTo) {
-		list.push({
-			key: 'date',
-			type: 'date',
-			label: `${m.search_chip_date()}: ${dateRangeLabel(dateFrom, dateTo)}`
-		});
-	}
-
-	if (keywordsApplied) {
-		for (const keyword of keywords) {
-			list.push({
-				key: 'keyword:' + keyword,
-				type: 'keyword',
-				value: keyword,
-				label: `${m.search_chip_keyword()}: ${keyword}`
-			});
-		}
-	}
-
-	if (tagsApplied) {
-		for (const tag of resolvedTags) {
-			list.push({
-				key: 'theme:' + tag.id,
-				type: 'theme',
-				tag,
-				label: `${m.search_chip_theme_prefix()}: ${tag.name}`
-			});
-		}
-	}
-
-	return list.filter((chip) => !removed.has(chip.key));
-});
-
-const showKeywordsNotApplied = $derived(
-	!interpretation.keywordsApplied && interpretation.keywords.length > 0
-);
-
-function remove(chip: Chip) {
-	removed.add(chip.key);
-	if (chip.type === 'keyword') {
-		onRemoveChip(chip.type, chip.value);
-	} else if (chip.type === 'theme') {
-		onRemoveChip(chip.type, chip.tag.name);
-	} else {
-		onRemoveChip(chip.type, undefined);
-	}
-}
-
-const nameSpan = 'sm:max-w-[12rem] max-w-[8rem] truncate';
-const chipWrapper =
-	'inline-flex items-center gap-1.5 rounded-full border border-line bg-muted px-3 text-sm text-ink-2 focus-within:ring-2 focus-within:ring-brand-navy';
-const removeButton =
-	'flex min-h-[44px] w-6 shrink-0 items-center justify-center text-ink-3 outline-none hover:text-red-500 focus-visible:ring-2 focus-visible:ring-brand-navy';
-</script>
-
-<div class="flex flex-wrap gap-2">
-	{#each chips as chip (chip.key)}
-		{#if chip.type === 'directional'}
-			<span
-				data-chip-type="directional"
-				class={chipWrapper}
-				aria-label={m.search_chip_directional_label({ from: chip.from, to: chip.to })}
-			>
-				<span class={nameSpan}>{chip.from}</span>
-				<span aria-hidden="true">→</span>
-				<span class={nameSpan}>{chip.to}</span>
-				<button
-					type="button"
-					class={removeButton}
-					aria-label={m.search_filter_remove_label({ label: `${chip.from} → ${chip.to}` })}
-					onclick={() => remove(chip)}
-				>
-					<span aria-hidden="true">×</span>
-				</button>
-			</span>
-		{:else if chip.type === 'theme'}
-			<span data-chip-type="theme" class={chipWrapper} style={tagColorStyle(chip.tag.color)}>
-				<span>{m.search_chip_theme_prefix()}:</span>
-				<span class={nameSpan}>{chip.tag.name}</span>
-				<button
-					type="button"
-					class={removeButton}
-					aria-label={m.search_filter_remove_label({
-						label: `${m.search_chip_theme_prefix()}: ${chip.tag.name}`
-					})}
-					onclick={() => remove(chip)}
-				>
-					<span aria-hidden="true">×</span>
-				</button>
-			</span>
-		{:else}
-			<span data-chip-type={chip.type} class={chipWrapper}>
-				<span class={nameSpan}>{chip.label}</span>
-				<button
-					type="button"
-					class={removeButton}
-					aria-label={m.search_filter_remove_label({ label: chip.label })}
-					onclick={() => remove(chip)}
-				>
-					<span aria-hidden="true">×</span>
-				</button>
-			</span>
-		{/if}
-	{/each}
-</div>
-
-{#if showKeywordsNotApplied}
-	<p class="mt-2 text-xs text-ink-3">{m.smart_search_keywords_not_applied()}</p>
-{/if}
--- a/frontend/src/routes/search/InterpretationChipRow.svelte.spec.ts
+++ b/frontend/src/routes/search/InterpretationChipRow.svelte.spec.ts
@@ -1,214 +0,0 @@
-// NOTE: vitest-browser fails silently when the project path contains '+' (common in git worktrees
-// named 'feat+issue-NNN-slug'). If tests fail with iframe routing errors, copy the frontend
-// directory to a path without '+' (e.g. /tmp/fe-copy) and run the suite from there.
-import { describe, expect, it, vi, afterEach } from 'vitest';
-import { cleanup, render } from 'vitest-browser-svelte';
-import { page } from 'vitest/browser';
-import InterpretationChipRow from './InterpretationChipRow.svelte';
-import type { components } from '$lib/generated/api';
-
-type NlQueryInterpretation = components['schemas']['NlQueryInterpretation'];
-type PersonHint = components['schemas']['PersonHint'];
-type TagHint = components['schemas']['TagHint'];
-
-afterEach(() => cleanup());
-
-const makePerson = (id: string, displayName: string): PersonHint => ({ id, displayName });
-
-const makeInterpretation = (
-	overrides: Partial<NlQueryInterpretation> = {}
-): NlQueryInterpretation => ({
-	resolvedPersons: [],
-	ambiguousPersons: [],
-	keywords: [],
-	resolvedTags: [],
-	rawQuery: 'test',
-	keywordsApplied: true,
-	tagsApplied: false,
-	...overrides
-});
-
-describe('InterpretationChipRow', () => {
-	it('renders type-prefixed labels for sender, date and keyword chips', async () => {
-		render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedPersons: [makePerson('p1', 'Walter Raddatz')],
-				dateFrom: '1914-01-01',
-				dateTo: '1918-12-31',
-				keywords: ['krieg']
-			}),
-			onRemoveChip: vi.fn()
-		});
-		await expect.element(page.getByText('Absender: Walter Raddatz')).toBeInTheDocument();
-		await expect.element(page.getByText('Zeitraum: 1914–1918')).toBeInTheDocument();
-		await expect.element(page.getByText('Stichwort: krieg')).toBeInTheDocument();
-	});
-
-	it('calls onRemoveChip with "sender" when the sender chip × is clicked', async () => {
-		const onRemoveChip = vi.fn();
-		render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedPersons: [makePerson('p1', 'Walter Raddatz')]
-			}),
-			onRemoveChip
-		});
-		await page.getByRole('button', { name: /Absender: Walter Raddatz/ }).click();
-		expect(onRemoveChip).toHaveBeenCalledWith('sender', undefined);
-	});
-
-	it('removes a chip from the DOM but keeps the rest when one × is clicked', async () => {
-		const { container } = render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedPersons: [makePerson('p1', 'Walter Raddatz')],
-				dateFrom: '1914-01-01',
-				dateTo: '1918-12-31',
-				keywords: ['krieg']
-			}),
-			onRemoveChip: vi.fn()
-		});
-		expect(container.querySelectorAll('[data-chip-type]')).toHaveLength(3);
-		await page.getByRole('button', { name: /Absender/ }).click();
-		await vi.waitFor(() => expect(container.querySelectorAll('[data-chip-type]')).toHaveLength(2));
-	});
-
-	it('renders a single directional chip with an arrow for a 2-name query', async () => {
-		const { container } = render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedPersons: [makePerson('p1', 'Walter Raddatz'), makePerson('p2', 'Emma Raddatz')]
-			}),
-			onRemoveChip: vi.fn()
-		});
-		expect(container.querySelectorAll('[data-chip-type="directional"]')).toHaveLength(1);
-		await expect.element(page.getByText(/→/)).toBeInTheDocument();
-	});
-
-	it('calls onRemoveChip with "directional" when the directional chip × is clicked', async () => {
-		const onRemoveChip = vi.fn();
-		render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedPersons: [makePerson('p1', 'Walter Raddatz'), makePerson('p2', 'Emma Raddatz')]
-			}),
-			onRemoveChip
-		});
-		await page.getByRole('button', { name: /Walter Raddatz/ }).click();
-		expect(onRemoveChip).toHaveBeenCalledWith('directional', undefined);
-	});
-
-	it('does not render keyword chips when keywordsApplied is false', async () => {
-		const { container } = render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				keywordsApplied: false,
-				keywords: ['krieg', 'brief']
-			}),
-			onRemoveChip: vi.fn()
-		});
-		expect(container.querySelectorAll('[data-chip-type="keyword"]')).toHaveLength(0);
-	});
-
-	it('renders no keyword chips when keywords is empty', async () => {
-		const { container } = render(InterpretationChipRow, {
-			interpretation: makeInterpretation({ keywordsApplied: true, keywords: [] }),
-			onRemoveChip: vi.fn()
-		});
-		expect(container.querySelectorAll('[data-chip-type="keyword"]')).toHaveLength(0);
-	});
-
-	it('renders exactly one keyword chip per keyword', async () => {
-		const { container } = render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				keywordsApplied: true,
-				keywords: ['krieg', 'brief', 'front']
-			}),
-			onRemoveChip: vi.fn()
-		});
-		expect(container.querySelectorAll('[data-chip-type="keyword"]')).toHaveLength(3);
-	});
-
-	it('keeps the × button in the DOM when a display name is 100 characters', async () => {
-		const longName = 'W'.repeat(100);
-		render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedPersons: [makePerson('p1', longName)]
-			}),
-			onRemoveChip: vi.fn()
-		});
-		await expect
-			.element(page.getByRole('button', { name: new RegExp('Absender') }))
-			.toBeInTheDocument();
-	});
-
-	// ── theme chips ─────────────────────────────────────────────────────────────
-
-	const makeTag = (id: string, name: string, color?: string): TagHint => ({ id, name, color });
-
-	it('renders theme chips when tagsApplied is true', async () => {
-		const { container } = render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedTags: [makeTag('t1', 'Hochzeit')],
-				tagsApplied: true
-			}),
-			onRemoveChip: vi.fn()
-		});
-		expect(container.querySelectorAll('[data-chip-type="theme"]')).toHaveLength(1);
-		await expect.element(page.getByText(/Thema: Hochzeit/)).toBeInTheDocument();
-	});
-
-	it('renders no theme chips when tagsApplied is false', async () => {
-		const { container } = render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedTags: [makeTag('t1', 'Hochzeit')],
-				tagsApplied: false
-			}),
-			onRemoveChip: vi.fn()
-		});
-		expect(container.querySelectorAll('[data-chip-type="theme"]')).toHaveLength(0);
-	});
-
-	it('renders exactly N theme chips for N resolved tags', async () => {
-		const { container } = render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedTags: [makeTag('t1', 'Krieg'), makeTag('t2', 'Hochzeit'), makeTag('t3', 'Familie')],
-				tagsApplied: true
-			}),
-			onRemoveChip: vi.fn()
-		});
-		expect(container.querySelectorAll('[data-chip-type="theme"]')).toHaveLength(3);
-	});
-
-	it('calls onRemoveChip with "theme" and tag name when × is clicked', async () => {
-		const onRemoveChip = vi.fn();
-		render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedTags: [makeTag('t1', 'Hochzeit')],
-				tagsApplied: true
-			}),
-			onRemoveChip
-		});
-		await page.getByRole('button', { name: /Thema: Hochzeit/ }).click();
-		expect(onRemoveChip).toHaveBeenCalledWith('theme', 'Hochzeit');
-	});
-
-	it('applies inline color style for a tag with a color', async () => {
-		const { container } = render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedTags: [makeTag('t1', 'Hochzeit', 'sage')],
-				tagsApplied: true
-			}),
-			onRemoveChip: vi.fn()
-		});
-		const chip = container.querySelector('[data-chip-type="theme"]') as HTMLElement;
-		expect(chip.style.backgroundColor).toBeTruthy();
-	});
-
-	it('omits color style for a tag with no color', async () => {
-		const { container } = render(InterpretationChipRow, {
-			interpretation: makeInterpretation({
-				resolvedTags: [makeTag('t1', 'Hochzeit')],
-				tagsApplied: true
-			}),
-			onRemoveChip: vi.fn()
-		});
-		const chip = container.querySelector('[data-chip-type="theme"]') as HTMLElement;
-		expect(chip.getAttribute('style')).toBeFalsy();
-	});
-});
--- a/frontend/src/routes/search/SmartModeToggle.svelte
+++ b/frontend/src/routes/search/SmartModeToggle.svelte
@@ -1,38 +0,0 @@
-<script lang="ts">
-import { m } from '$lib/paraglide/messages.js';
-
-let { smartMode = $bindable(false), onToggle }: { smartMode?: boolean; onToggle?: () => void } =
-	$props();
-
-const label = $derived(smartMode ? m.search_toggle_smart_label() : m.search_toggle_keyword_label());
-const labelSuffix = $derived(
-	smartMode ? m.search_toggle_smart_label_suffix() : m.search_toggle_keyword_label_suffix()
-);
-
-function toggle() {
-	smartMode = !smartMode;
-	onToggle?.();
-}
-</script>
-
-<button
-	type="button"
-	aria-pressed={smartMode}
-	onclick={toggle}
-	class="pointer-events-auto absolute top-1/2 right-2 flex -translate-y-1/2 cursor-pointer items-center gap-1.5 rounded-full px-2.5 py-1 text-xs font-bold uppercase outline-none focus-visible:ring-2 focus-visible:ring-brand-navy {smartMode
-		? 'border border-primary bg-primary text-primary-fg'
-		: 'border border-line bg-muted text-ink-2'}"
->
-	<svg
-		aria-hidden="true"
-		viewBox="0 0 24 24"
-		fill="currentColor"
-		class="h-3.5 w-3.5"
-		xmlns="http://www.w3.org/2000/svg"
-	>
-		<path d="M12 2l2.09 6.26L20 10l-5.91 1.74L12 18l-2.09-6.26L4 10l5.91-1.74L12 2z" />
-	</svg>
-	<span>
-		{label}<span class="sm:hidden">{labelSuffix}</span>
-	</span>
-</button>
--- a/frontend/src/routes/search/SmartModeToggle.svelte.spec.ts
+++ b/frontend/src/routes/search/SmartModeToggle.svelte.spec.ts
@@ -1,81 +0,0 @@
-import { describe, expect, it, vi, afterEach } from 'vitest';
-import { cleanup, render } from 'vitest-browser-svelte';
-import { page } from 'vitest/browser';
-import SmartModeToggle from './SmartModeToggle.svelte';
-import SearchFilterBar from '../SearchFilterBar.svelte';
-
-afterEach(() => cleanup());
-
-const SEARCH_PLACEHOLDER = 'Titel, Personen, Tags durchsuchen…';
-
-describe('SmartModeToggle', () => {
-	it('renders aria-pressed="false" by default and toggles on click', async () => {
-		render(SmartModeToggle, { smartMode: false });
-		const btn = page.getByRole('button');
-		await expect.element(btn).toHaveAttribute('aria-pressed', 'false');
-		await btn.click();
-		await expect.element(btn).toHaveAttribute('aria-pressed', 'true');
-		await btn.click();
-		await expect.element(btn).toHaveAttribute('aria-pressed', 'false');
-	});
-
-	it('shows the smart label when smartMode is true', async () => {
-		render(SmartModeToggle, { smartMode: true });
-		const btn = page.getByRole('button');
-		await expect.element(btn).toHaveTextContent('KI');
-	});
-
-	it('shows the keyword label when smartMode is false', async () => {
-		render(SmartModeToggle, { smartMode: false });
-		const btn = page.getByRole('button');
-		await expect.element(btn).toHaveTextContent('Text');
-	});
-
-	it('applies the active pill style only in smart mode', async () => {
-		render(SmartModeToggle, { smartMode: true });
-		const btn = page.getByRole('button');
-		await expect.element(btn).toHaveClass(/bg-primary/);
-	});
-});
-
-describe('SmartModeToggle inside SearchFilterBar', () => {
-	it('adds maxlength="500" to the search input only in smart mode', async () => {
-		render(SearchFilterBar, { onSearch: vi.fn(), sort: 'DATE', dir: 'desc', smartMode: true });
-		await expect
-			.element(page.getByPlaceholder(SEARCH_PLACEHOLDER))
-			.toHaveAttribute('maxlength', '500');
-	});
-
-	it('omits maxlength from the search input in keyword mode', async () => {
-		render(SearchFilterBar, { onSearch: vi.fn(), sort: 'DATE', dir: 'desc', smartMode: false });
-		await expect
-			.element(page.getByPlaceholder(SEARCH_PLACEHOLDER))
-			.not.toHaveAttribute('maxlength');
-	});
-
-	it('does not fire the keyword search on input while in smart mode', async () => {
-		const onSearch = vi.fn();
-		render(SearchFilterBar, { onSearch, sort: 'DATE', dir: 'desc', smartMode: true });
-		await page.getByPlaceholder(SEARCH_PLACEHOLDER).fill('Walter im Krieg');
-		expect(onSearch).not.toHaveBeenCalled();
-	});
-
-	it('fires the smart search callback on Enter in smart mode', async () => {
-		const onSmartSearch = vi.fn();
-		render(SearchFilterBar, {
-			onSearch: vi.fn(),
-			onSmartSearch,
-			sort: 'DATE',
-			dir: 'desc',
-			smartMode: true
-		});
-		const input = page.getByPlaceholder(SEARCH_PLACEHOLDER);
-		await input.fill('Walter im Krieg');
-		await input.click();
-		// Enter submits the NL query in smart mode
-		(document.activeElement as HTMLElement).dispatchEvent(
-			new KeyboardEvent('keydown', { key: 'Enter', bubbles: true })
-		);
-		await vi.waitFor(() => expect(onSmartSearch).toHaveBeenCalled());
-	});
-});
--- a/frontend/src/routes/search/SmartSearchStatus.svelte
+++ b/frontend/src/routes/search/SmartSearchStatus.svelte
@@ -1,69 +0,0 @@
-<script lang="ts">
-import { m } from '$lib/paraglide/messages.js';
-
-type SmartSearchErrorCode = 'SMART_SEARCH_UNAVAILABLE' | 'SMART_SEARCH_RATE_LIMITED';
-
-let {
-	status,
-	errorCode,
-	onSwitchToKeyword
-}: {
-	status: 'loading' | 'error';
-	errorCode?: SmartSearchErrorCode;
-	onSwitchToKeyword?: () => void;
-} = $props();
-
-const isRateLimited = $derived(errorCode === 'SMART_SEARCH_RATE_LIMITED');
-const title = $derived(
-	isRateLimited ? m.search_error_rate_limited() : m.search_error_unavailable()
-);
-const body = $derived(
-	isRateLimited ? m.search_error_rate_limited_body() : m.search_error_unavailable_body()
-);
-</script>
-
-{#if status === 'loading'}
-	<div
-		role="status"
-		aria-live="polite"
-		class="flex flex-col items-center justify-center gap-3 py-16 text-center"
-	>
-		<div
-			aria-hidden="true"
-			class="h-9 w-9 rounded-full border-[3px] border-primary/12 border-t-primary motion-safe:animate-spin"
-		></div>
-		<p class="text-sm font-bold text-ink">{m.search_loading_nl()}</p>
-		<p class="max-w-xs text-xs text-ink-3 motion-safe:animate-pulse">
-			{m.search_loading_nl_sub()}
-		</p>
-	</div>
-{:else if status === 'error'}
-	<div role="alert" class="flex flex-col items-center justify-center gap-3 py-16 text-center">
-		<div
-			aria-hidden="true"
-			class="flex h-10 w-10 items-center justify-center rounded-full border-2 text-lg font-bold {isRateLimited
-				? 'border-amber-400 bg-amber-50 text-amber-600'
-				: 'border-red-400 bg-red-50 text-red-600'}"
-		>
-			{#if isRateLimited}
-				<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" class="h-5 w-5">
-					<circle cx="12" cy="12" r="9" />
-					<path d="M12 7v5l3 2" stroke-linecap="round" stroke-linejoin="round" />
-				</svg>
-			{:else}
-				<span>!</span>
-			{/if}
-		</div>
-		<p class="text-sm font-bold text-ink">{title}</p>
-		<p class="max-w-xs text-xs text-ink-3">{body}</p>
-		{#if !isRateLimited}
-			<button
-				type="button"
-				onclick={onSwitchToKeyword}
-				class="mt-2 inline-flex min-h-[44px] items-center rounded border border-primary bg-primary px-4 py-2 text-sm font-bold text-primary-fg outline-none focus-visible:ring-2 focus-visible:ring-brand-navy"
-			>
-				{m.search_switch_to_keyword()}
-			</button>
-		{/if}
-	</div>
-{/if}
--- a/frontend/src/routes/search/SmartSearchStatus.svelte.spec.ts
+++ b/frontend/src/routes/search/SmartSearchStatus.svelte.spec.ts
@@ -1,60 +0,0 @@
-import { describe, expect, it, vi, afterEach } from 'vitest';
-import { cleanup, render } from 'vitest-browser-svelte';
-import { page } from 'vitest/browser';
-import SmartSearchStatus from './SmartSearchStatus.svelte';
-
-afterEach(() => {
-	cleanup();
-	vi.restoreAllMocks();
-});
-
-describe('SmartSearchStatus', () => {
-	it('renders a role="status" loading panel with the loading title', async () => {
-		render(SmartSearchStatus, { status: 'loading' });
-		const status = page.getByRole('status');
-		await expect.element(status).toBeInTheDocument();
-		await expect.element(status).toHaveTextContent('Archiv wird befragt');
-	});
-
-	it('hides the loading panel once the status changes away from loading', async () => {
-		const { rerender } = render(SmartSearchStatus, { status: 'loading' });
-		await expect.element(page.getByRole('status')).toBeInTheDocument();
-		await rerender({ status: 'error', errorCode: 'SMART_SEARCH_UNAVAILABLE' });
-		await expect.element(page.getByRole('status')).not.toBeInTheDocument();
-	});
-
-	it('renders the 503 panel with title, body and a switch-to-keyword button', async () => {
-		render(SmartSearchStatus, {
-			status: 'error',
-			errorCode: 'SMART_SEARCH_UNAVAILABLE',
-			onSwitchToKeyword: vi.fn()
-		});
-		await expect.element(page.getByText('Intelligente Suche nicht verfügbar')).toBeInTheDocument();
-		await expect
-			.element(page.getByRole('button', { name: /Volltextsuche wechseln/ }))
-			.toBeInTheDocument();
-	});
-
-	it('invokes onSwitchToKeyword when the 503 fallback button is clicked', async () => {
-		const onSwitchToKeyword = vi.fn();
-		render(SmartSearchStatus, {
-			status: 'error',
-			errorCode: 'SMART_SEARCH_UNAVAILABLE',
-			onSwitchToKeyword
-		});
-		await page.getByRole('button', { name: /Volltextsuche wechseln/ }).click();
-		expect(onSwitchToKeyword).toHaveBeenCalledOnce();
-	});
-
-	it('renders the 429 panel with title and body but no switch-to-keyword button', async () => {
-		render(SmartSearchStatus, {
-			status: 'error',
-			errorCode: 'SMART_SEARCH_RATE_LIMITED',
-			onSwitchToKeyword: vi.fn()
-		});
-		await expect.element(page.getByText('Zu viele Anfragen')).toBeInTheDocument();
-		await expect
-			.element(page.getByRole('button', { name: /Volltextsuche wechseln/ }))
-			.not.toBeInTheDocument();
-	});
-});
--- a/frontend/src/routes/search/chip-types.ts
+++ b/frontend/src/routes/search/chip-types.ts
@@ -1 +0,0 @@
-export type ChipType = 'sender' | 'directional' | 'date' | 'keyword' | 'theme';
--- a/infra/observability/grafana/provisioning/dashboards/ollama.json
+++ b/infra/observability/grafana/provisioning/dashboards/ollama.json
@@ -1,218 +0,0 @@
-{
-  "id": null,
-  "uid": "ollama-dashboard",
-  "title": "Ollama",
-  "description": "Ollama inference latency and request rate",
-  "version": 1,
-  "schemaVersion": 39,
-  "tags": ["ollama", "inference"],
-  "timezone": "browser",
-  "editable": true,
-  "fiscalYearStartMonth": 0,
-  "graphTooltip": 1,
-  "links": [],
-  "liveNow": false,
-  "refresh": "30s",
-  "time": {
-    "from": "now-1h",
-    "to": "now"
-  },
-  "timepicker": {},
-  "weekStart": "",
-  "annotations": {
-    "list": [
-      {
-        "builtIn": 1,
-        "datasource": { "type": "datasource", "uid": "grafana" },
-        "enable": true,
-        "hide": true,
-        "iconColor": "rgba(0, 211, 255, 1)",
-        "name": "Annotations & Alerts",
-        "type": "dashboard"
-      }
-    ]
-  },
-  "panels": [
-    {
-      "id": 1,
-      "type": "timeseries",
-      "title": "Inference Latency p50",
-      "description": "50th percentile of Ollama request duration over a 5-minute window",
-      "gridPos": { "h": 8, "w": 8, "x": 0, "y": 0 },
-      "datasource": { "type": "prometheus", "uid": "prometheus" },
-      "fieldConfig": {
-        "defaults": {
-          "color": { "mode": "palette-classic" },
-          "custom": {
-            "axisBorderShow": false,
-            "axisCenteredZero": false,
-            "axisColorMode": "text",
-            "axisLabel": "",
-            "axisPlacement": "auto",
-            "barAlignment": 0,
-            "drawStyle": "line",
-            "fillOpacity": 10,
-            "gradientMode": "none",
-            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
-            "insertNulls": false,
-            "lineInterpolation": "linear",
-            "lineWidth": 2,
-            "pointSize": 5,
-            "scaleDistribution": { "type": "linear" },
-            "showPoints": "auto",
-            "spanNulls": false,
-            "stacking": { "group": "A", "mode": "none" },
-            "thresholdsStyle": { "mode": "off" }
-          },
-          "mappings": [],
-          "thresholds": {
-            "mode": "absolute",
-            "steps": [
-              { "color": "green", "value": null },
-              { "color": "red", "value": 80 }
-            ]
-          },
-          "unit": "s"
-        },
-        "overrides": []
-      },
-      "options": {
-        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
-        "tooltip": { "mode": "single", "sort": "none" }
-      },
-      "targets": [
-        {
-          "datasource": { "type": "prometheus", "uid": "prometheus" },
-          "editorMode": "code",
-          "expr": "histogram_quantile(0.5, rate(ollama_request_duration_seconds_bucket[5m]))",
-          "instant": false,
-          "legendFormat": "p50",
-          "range": true,
-          "refId": "A"
-        }
-      ]
-    },
-    {
-      "id": 2,
-      "type": "timeseries",
-      "title": "Inference Latency p95",
-      "description": "95th percentile of Ollama request duration over a 5-minute window",
-      "gridPos": { "h": 8, "w": 8, "x": 8, "y": 0 },
-      "datasource": { "type": "prometheus", "uid": "prometheus" },
-      "fieldConfig": {
-        "defaults": {
-          "color": { "mode": "palette-classic" },
-          "custom": {
-            "axisBorderShow": false,
-            "axisCenteredZero": false,
-            "axisColorMode": "text",
-            "axisLabel": "",
-            "axisPlacement": "auto",
-            "barAlignment": 0,
-            "drawStyle": "line",
-            "fillOpacity": 10,
-            "gradientMode": "none",
-            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
-            "insertNulls": false,
-            "lineInterpolation": "linear",
-            "lineWidth": 2,
-            "pointSize": 5,
-            "scaleDistribution": { "type": "linear" },
-            "showPoints": "auto",
-            "spanNulls": false,
-            "stacking": { "group": "A", "mode": "none" },
-            "thresholdsStyle": { "mode": "off" }
-          },
-          "mappings": [],
-          "thresholds": {
-            "mode": "absolute",
-            "steps": [
-              { "color": "green", "value": null },
-              { "color": "red", "value": 80 }
-            ]
-          },
-          "unit": "s"
-        },
-        "overrides": []
-      },
-      "options": {
-        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
-        "tooltip": { "mode": "single", "sort": "none" }
-      },
-      "targets": [
-        {
-          "datasource": { "type": "prometheus", "uid": "prometheus" },
-          "editorMode": "code",
-          "expr": "histogram_quantile(0.95, rate(ollama_request_duration_seconds_bucket[5m]))",
-          "instant": false,
-          "legendFormat": "p95",
-          "range": true,
-          "refId": "A"
-        }
-      ]
-    },
-    {
-      "id": 3,
-      "type": "timeseries",
-      "title": "Request Rate",
-      "description": "Ollama requests per second over a 5-minute window",
-      "gridPos": { "h": 8, "w": 8, "x": 16, "y": 0 },
-      "datasource": { "type": "prometheus", "uid": "prometheus" },
-      "fieldConfig": {
-        "defaults": {
-          "color": { "mode": "palette-classic" },
-          "custom": {
-            "axisBorderShow": false,
-            "axisCenteredZero": false,
-            "axisColorMode": "text",
-            "axisLabel": "",
-            "axisPlacement": "auto",
-            "barAlignment": 0,
-            "drawStyle": "line",
-            "fillOpacity": 10,
-            "gradientMode": "none",
-            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
-            "insertNulls": false,
-            "lineInterpolation": "linear",
-            "lineWidth": 2,
-            "pointSize": 5,
-            "scaleDistribution": { "type": "linear" },
-            "showPoints": "auto",
-            "spanNulls": false,
-            "stacking": { "group": "A", "mode": "none" },
-            "thresholdsStyle": { "mode": "off" }
-          },
-          "mappings": [],
-          "thresholds": {
-            "mode": "absolute",
-            "steps": [
-              { "color": "green", "value": null },
-              { "color": "red", "value": 80 }
-            ]
-          },
-          "unit": "reqps"
-        },
-        "overrides": []
-      },
-      "options": {
-        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
-        "tooltip": { "mode": "single", "sort": "none" }
-      },
-      "targets": [
-        {
-          "datasource": { "type": "prometheus", "uid": "prometheus" },
-          "editorMode": "code",
-          "expr": "rate(ollama_requests_total[5m])",
-          "instant": false,
-          "legendFormat": "req/s",
-          "range": true,
-          "refId": "A"
-        }
-      ]
-    }
-  ],
-  "preload": false,
-  "templating": {
-    "list": []
-  }
-}
--- a/infra/observability/prometheus/prometheus.yml
+++ b/infra/observability/prometheus/prometheus.yml
@@ -22,8 +22,3 @@ scrape_configs:
    static_configs:
      - targets: ['ocr-service:8000']

-  - job_name: ollama
-    metrics_path: /metrics
-    static_configs:
-      # Uses the Docker service name for reliable DNS resolution.
-      - targets: ['ollama:11434']
--- a/nlp-service/CLAUDE.md
+++ b/nlp-service/CLAUDE.md
@@ -1,41 +0,0 @@
-# NLP Service
-
-Lightweight FastAPI service that parses free-text search queries into structured extractions,
-replacing Ollama for the Familienarchiv NL search feature.
-
-## Stack
-
- Python 3.11, FastAPI 0.115, spaCy 3.8, dateparser 1.2
-
-## Endpoints
-
- `POST /parse` — parse a free-text query, return extraction matching `OllamaExtraction` contract
- `GET /health` — returns `{"status": "ok"}` when all models are loaded
-
-## Running locally
-
-```bash
-pip install -r requirements.txt
-python -m spacy download de_core_news_sm en_core_web_sm es_core_news_sm
-uvicorn main:app --reload --port 8001
-
-curl -X POST http://localhost:8001/parse \
-  -H "Content-Type: application/json" \
-  -d '{"query": "Briefe von Opa Hermann an Marie vor 1920", "lang": "de"}'
-```
-
-## Testing
-
-```bash
-pytest -v
-```
-
-## Design spec
-
-See `docs/superpowers/specs/2026-06-07-spacy-nlp-service-design.md`.
-
-## Notes
-
-This is a **prototype** for extraction quality evaluation. No docker-compose integration or
-Java-side changes in this iteration. The extraction contract matches `OllamaExtraction` in
-`backend/src/main/java/org/raddatz/familienarchiv/search/`.
--- a/nlp-service/models.py
+++ b/nlp-service/models.py
@@ -1,17 +0,0 @@
-from __future__ import annotations
-from typing import Literal
-from pydantic import BaseModel
-
-
-class ParseRequest(BaseModel):
-    query: str
-    lang: Literal["de", "en", "es"]
-
-
-class ParseResponse(BaseModel):
-    personNames: list[str]
-    personRole: Literal["sender", "receiver", "any"]
-    dateFrom: str | None
-    dateTo: str | None
-    keywords: list[str]
-    rawQuery: str
--- a/nlp-service/requirements.txt
+++ b/nlp-service/requirements.txt
@@ -1,6 +0,0 @@
-fastapi[standard]==0.115.6
-uvicorn[standard]==0.34.0
-spacy>=3.8,<4.0
-dateparser>=1.2,<2.0
-pytest>=8.0,<9.0
-httpx>=0.28,<1.0
--- a/nlp-service/test_extractor.py
+++ b/nlp-service/test_extractor.py
@@ -1,33 +0,0 @@
-import pytest
-from pydantic import ValidationError
-
-
-# ── Models ──────────────────────────────────────────────────────────────────
-
-def test_parse_request_valid():
-    from models import ParseRequest
-    req = ParseRequest(query="Briefe von Opa", lang="de")
-    assert req.query == "Briefe von Opa"
-    assert req.lang == "de"
-
-
-def test_parse_request_rejects_unknown_lang():
-    from models import ParseRequest
-    with pytest.raises(ValidationError):
-        ParseRequest(query="Letters from grandpa", lang="fr")
-
-
-def test_parse_response_serializes_nulls():
-    from models import ParseResponse
-    resp = ParseResponse(
-        personNames=["Opa"],
-        personRole="sender",
-        dateFrom=None,
-        dateTo="1920-12-31",
-        keywords=["brief"],
-        rawQuery="Briefe von Opa",
-    )
-    data = resp.model_dump()
-    assert data["dateFrom"] is None
-    assert data["dateTo"] == "1920-12-31"
-    assert data["personRole"] == "sender"