docs(ollama): add model upgrade runbook + post-deploy smoke test to DEPLOYMENT.md

Addresses Elicit's and Sara's review concerns on PR #749:

- Expand §6 ollama_models section into a full model upgrade runbook (step-by-step docker volume rm + recreate, including production volume name prefix)
- Add re-deploy idempotency note to §3.4 (init container exits quickly when model already present on the volume)
- Add NL search smoke test to §3.4 (curl command distinguishing 200 from 503 NL_SEARCH_UNAVAILABLE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs(deployment): fix OLLAMA_API_KEY version ref and add --wait warning
2026-06-06 14:59:35 +02:00
19 changed files with 1002 additions and 94 deletions
--- a/.claude/personas/devops.md
+++ b/.claude/personas/devops.md
@@ -154,9 +154,9 @@ Schedule monthly automated restore tests. If the restore fails, the backup is wo
 ```
 Every alert needs: description, severity, likely cause, resolution steps, escalation path.

-3. **Upgrading VPS tier before profiling**
+3. **Upgrading hardware before profiling**
 ```
-# "The app feels slow" → upgrade from CX32 to CX42
+# "The app feels slow" → order more RAM / a faster CPU
 # Actual cause: unindexed query scanning 100k rows
 ```
 Profile with Grafana dashboards first. Most perceived performance issues are application bugs, not resource constraints.
@@ -404,8 +404,8 @@ Hetzner Object Storage (S3-compatible, replaces MinIO in prod)
 Prometheus + Loki + Alertmanager
 ```

-### Monthly Cost: ~23 EUR
-CX32 VPS (4 vCPU, 8GB RAM): 17 EUR · Object Storage (~200GB): 5 EUR · SMTP relay: ~1 EUR
+### Monthly Cost: ~6 EUR (excl. server)
+Hetzner dedicated server (Serverbörse, i7-6700, 64 GB RAM): see invoice · Object Storage (~200GB): 5 EUR · SMTP relay: ~1 EUR

 ### Reference Documentation
 - Full CI workflow, Gitea vs GitHub differences: `docs/infrastructure/ci-gitea.md`
--- a/.env.example
+++ b/.env.example
@@ -72,6 +72,25 @@ VITE_SENTRY_DSN=
 # Sentry/GlitchTip auth token for source map upload at build time (optional)
 SENTRY_AUTH_TOKEN=

+# NL search — Ollama LLM inference
+# Leave APP_OLLAMA_BASE_URL empty to disable NL search (safe default for CX32 / CI).
+# Set to http://ollama:11434 to enable. Requires CX42 (16 GB RAM) to run alongside OCR.
+APP_OLLAMA_BASE_URL=http://ollama:11434
+
+# CPU limit: 4.0 is safe on both CX32 (4 vCPUs) and CX42 (8 vCPUs).
+# Raise to 7.5 on CX42 for full throughput.
+OLLAMA_CPU_LIMIT=4.0
+
+# Memory limit: requires CX42 (16 GB) to run alongside OCR.
+# Reduce or set APP_OLLAMA_BASE_URL= on smaller hosts.
+OLLAMA_MEM_LIMIT=8g
+
+# Ollama API key — set on the Ollama service to restrict inference API access on archiv-net.
+# Generate with: openssl rand -hex 32
+# NOTE: Empirically verified that OLLAMA_API_KEY is NOT enforced in Ollama 0.6.5 or 0.30.6 (ADR-028 §7).
+# archiv-net network isolation is the only effective access control. Retained for forward compatibility.
+OLLAMA_API_KEY=
+
 # Production SMTP — uncomment and fill in to send real emails instead of catching them
 # APP_BASE_URL=https://your-domain.example.com
 # MAIL_HOST=smtp.example.com
--- a/backend/src/main/java/org/raddatz/familienarchiv/person/PersonRepository.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/person/PersonRepository.java
@@ -29,14 +29,36 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
    // Stammbaum-Knoten: alle Personen mit family_member = true.
    List<Person> findByFamilyMemberTrueOrderByLastNameAscFirstNameAsc();

-    // Lookup by full alias string, used during ODS mass import
-    Optional<Person> findByAliasIgnoreCase(String alias);
+    // Exact-case alias lookup — the first resolution step in findOrCreateByAlias.
+    // Case-colliding aliases across persons (müller / Müller) are valid human labels, NOT
+    // duplicates: source_ref is the stable identity (ADR-025/033), alias is editable. Do NOT
+    // add a unique(lower(alias)) constraint — see ADR-033.
+    Optional<Person> findByAlias(String alias);
+
+    // Plural case-insensitive alias lookup — the fallback step. Returns ALL case-folding
+    // siblings so the service can pick a deterministic one (lowest id) instead of letting a
+    // derived Optional<…>IgnoreCase throw NonUniqueResultException. See ADR-033.
+    List<Person> findAllByAliasIgnoreCase(String alias);

    // Lookup by the normalizer person_id, used for idempotent canonical re-import (Phase 3).
    Optional<Person> findBySourceRef(String sourceRef);

-    // Exact first+last name match, used for filename-based sender lookup
-    Optional<Person> findByFirstNameIgnoreCaseAndLastNameIgnoreCase(String firstName, String lastName);
+    // Exact-case first+last name match — the first step of filename-based sender resolution.
+    // Explicit `=` (HQL, not a derived query) so a null firstName binds as `first_name = NULL`
+    // — never a match — instead of the derived-query fold to `first_name IS NULL`, which would
+    // pull a last-name-only row in as a sender (a provenance defect). See ADR-033.
+    @Query("SELECT p FROM Person p WHERE p.firstName = :firstName AND p.lastName = :lastName")
+    Optional<Person> findByFirstNameAndLastName(@Param("firstName") String firstName,
+                                                @Param("lastName") String lastName);
+
+    // Plural case-insensitive first+last name match — lets findByName bail to empty on 2+ matches
+    // instead of letting a derived Optional<…>IgnoreCase throw NonUniqueResultException. Same
+    // null fail-closed guarantee as above: LOWER(:firstName) is NULL for a null arg, so a null
+    // first name resolves to no match (not first_name IS NULL widening). See ADR-033.
+    @Query("SELECT p FROM Person p WHERE LOWER(p.firstName) = LOWER(:firstName) "
+         + "AND LOWER(p.lastName) = LOWER(:lastName)")
+    List<Person> findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(@Param("firstName") String firstName,
+                                                                   @Param("lastName") String lastName);

    // --- PersonSummaryDTO with document count ---

--- a/backend/src/main/java/org/raddatz/familienarchiv/person/PersonService.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/person/PersonService.java
@@ -1,5 +1,6 @@
 package org.raddatz.familienarchiv.person;

+import java.util.Comparator;
 import java.util.List;
 import java.util.Optional;
 import java.util.UUID;
@@ -110,7 +111,19 @@ public class PersonService {
    }

    public Optional<Person> findByName(String firstName, String lastName) {
-        return personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase(firstName, lastName);
+        // Same scope as findOrCreateByAlias (#731): a case-collision resolves without throwing;
+        // two byte-identical same-case persons are an out-of-scope data anomaly the exact
+        // Optional below would surface as the opaque INTERNAL_ERROR, not a wrong sender.
+        Optional<Person> exact = personRepository.findByFirstNameAndLastName(firstName, lastName);
+        if (exact.isPresent()) return exact;
+        List<Person> caseInsensitive =
+                personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(firstName, lastName);
+        // Deliberate divergence from findOrCreateByAlias: an ambiguous filename leaves the sender
+        // UNSET rather than picking the lowest id. The archive's value is correct provenance — a
+        // confidently-wrong pre-filled "Hans Müller" is worse than an empty field, because a
+        // reviewer won't re-check a pre-filled value. Do NOT "consistency-clean" this into the
+        // lowest-id fallback. See ADR-033.
+        return caseInsensitive.size() == 1 ? Optional.of(caseInsensitive.get(0)) : Optional.empty();
    }

    /** Lookup by the normalizer person_id — used by the canonical importer for register-first matching. */
@@ -125,32 +138,45 @@ public class PersonService {
        PersonType type = PersonTypeClassifier.classify(alias);
        if (type == PersonType.SKIP) return null;

-        return personRepository.findByAliasIgnoreCase(alias).orElseGet(() -> {
-            if (type == PersonType.INSTITUTION || type == PersonType.GROUP) {
-                return personRepository.save(Person.builder()
-                        .alias(alias)
-                        .lastName(alias)
-                        .personType(type)
-                        .build());
-            }
+        // Aliases differing only by case (müller / Müller) are valid distinct persons, not
+        // duplicates, so a CASE-COLLISION must not throw: exact-case first, then the lowest-id
+        // case-insensitive sibling, then create. Mirrors the tag path — see ADR-033.
+        // Scope (#731): "ambiguous" means case-insensitive. Two BYTE-IDENTICAL same-case aliases
+        // are a true data anomaly out of scope here; the exact Optional below would surface that
+        // as the opaque INTERNAL_ERROR (never a wrong row), not silently pick one.
+        Optional<Person> exact = personRepository.findByAlias(alias);
+        if (exact.isPresent()) return exact.get();              // exact-case wins
+        List<Person> caseInsensitive = personRepository.findAllByAliasIgnoreCase(alias);
+        if (!caseInsensitive.isEmpty()) {
+            return caseInsensitive.stream().min(Comparator.comparing(Person::getId)).orElseThrow(); // deterministic tie-break — list is non-empty, never throws
+        }

-            PersonNameParser.SplitName split = PersonNameParser.split(alias);
-            Person person = personRepository.save(Person.builder()
+        // Create-when-absent: institution/group keep the full label in lastName; a person name
+        // is split and a maiden name (geb. …) becomes a MAIDEN_NAME alias.
+        if (type == PersonType.INSTITUTION || type == PersonType.GROUP) {
+            return personRepository.save(Person.builder()
                    .alias(alias)
-                    .firstName(split.firstName())
-                    .lastName(split.lastName())
+                    .lastName(alias)
+                    .personType(type)
                    .build());
-            if (split.maidenName() != null) {
-                int nextSortOrder = aliasRepository.findMaxSortOrder(person.getId()) + 1;
-                aliasRepository.save(PersonNameAlias.builder()
-                        .person(person)
-                        .lastName(split.maidenName())
-                        .type(PersonNameAliasType.MAIDEN_NAME)
-                        .sortOrder(nextSortOrder)
-                        .build());
-            }
-            return person;
-        });
+        }
+
+        PersonNameParser.SplitName split = PersonNameParser.split(alias);
+        Person person = personRepository.save(Person.builder()
+                .alias(alias)
+                .firstName(split.firstName())
+                .lastName(split.lastName())
+                .build());
+        if (split.maidenName() != null) {
+            int nextSortOrder = aliasRepository.findMaxSortOrder(person.getId()) + 1;
+            aliasRepository.save(PersonNameAlias.builder()
+                    .person(person)
+                    .lastName(split.maidenName())
+                    .type(PersonNameAliasType.MAIDEN_NAME)
+                    .sortOrder(nextSortOrder)
+                    .build());
+        }
+        return person;
    }

    /**
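Reviewer note on the PersonService.java change above: the two resolution paths share their first step and differ only in how they treat an ambiguous case-insensitive result. The following is a minimal, database-free sketch of that difference, not the project's code. `ResolutionSketch`, `Row`, `resolveAlias` and `resolveSender` are hypothetical names, an in-memory list stands in for the repository, and ASCII labels are used because umlaut folding is a Postgres concern that this sketch does not model. Unlike the real exact-case `Optional` query, the exact step here simply takes the first hit instead of failing on byte-identical duplicates.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the familienarchiv code base.
class ResolutionSketch {

    // Stand-in for a Person row: an id plus one label (alias or full name).
    public static final class Row {
        public final int id;
        public final String label;

        public Row(int id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    // Alias path: exact-case match first, otherwise the lowest-id
    // case-insensitive sibling, otherwise empty (the caller would create).
    public static Optional<Row> resolveAlias(List<Row> rows, String alias) {
        Optional<Row> exact = rows.stream()
                .filter(r -> r.label.equals(alias))
                .findFirst();
        if (exact.isPresent()) return exact;
        return rows.stream()
                .filter(r -> r.label.equalsIgnoreCase(alias))
                .min(Comparator.comparingInt((Row r) -> r.id));
    }

    // Sender path: exact-case match first, otherwise a case-insensitive
    // match only if it is the single candidate; two or more yield empty.
    public static Optional<Row> resolveSender(List<Row> rows, String name) {
        Optional<Row> exact = rows.stream()
                .filter(r -> r.label.equals(name))
                .findFirst();
        if (exact.isPresent()) return exact;
        List<Row> folded = rows.stream()
                .filter(r -> r.label.equalsIgnoreCase(name))
                .collect(Collectors.toList());
        return folded.size() == 1 ? Optional.of(folded.get(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(2, "brandt"), new Row(1, "Brandt"));
        // Same ambiguous input, two different outcomes by design.
        System.out.println(resolveAlias(rows, "BRANDT").map(r -> r.id));
        System.out.println(resolveSender(rows, "BRANDT").map(r -> r.id));
    }
}
```

With the two case-colliding rows above, the alias path settles on the lower id while the sender path returns empty, which matches the stated intent in the diff comments: a deterministic pick is acceptable for import, but a pre-filled sender must never be a guess.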
--- a/backend/src/main/java/org/raddatz/familienarchiv/person/README.md
+++ b/backend/src/main/java/org/raddatz/familienarchiv/person/README.md
@@ -20,8 +20,8 @@ Features: person CRUD, name alias management, person merge (deduplication), fami
 | `getById(UUID)` | document, geschichte, ocr | Fetch one person by ID |
 | `getAllById(List<UUID>)` | document | Bulk fetch for sender/receiver resolution |
 | `findAll(String q)` | document, dashboard | List all persons |
-| `findByName(String firstName, String lastName)` | document | Typeahead search |
-| `findOrCreateByAlias(String rawName)` | importing | Idempotent create during mass import; type classification happens internally |
+| `findByName(String firstName, String lastName)` | document | Filename-based **sender resolution** in `storeDocument`: exact-case match → single case-insensitive match → else **empty** (ambiguous names leave the sender unset; a null first name never matches). See ADR-033. |
+| `findOrCreateByAlias(String rawName)` | importing | Idempotent create during mass import; type classification happens internally. Resolves exact-case → lowest-id case-insensitive sibling → create — never throws on case-colliding aliases. See ADR-033. |
 | `findAllFamilyMembers()` | dashboard | Family member list for stats |
 | `findCorrespondents()` | document | Correspondent list for conversation filter |
 | `count()` | dashboard | Total person count for stats |
--- a/backend/src/test/java/org/raddatz/familienarchiv/exception/GlobalExceptionHandlerTest.java
+++ b/backend/src/test/java/org/raddatz/familienarchiv/exception/GlobalExceptionHandlerTest.java
@@ -12,6 +12,7 @@ import org.mockito.MockedStatic;
 import org.mockito.junit.jupiter.MockitoExtension;
 import org.slf4j.LoggerFactory;
 import org.springframework.dao.DataIntegrityViolationException;
+import org.springframework.dao.IncorrectResultSizeDataAccessException;
 import org.springframework.http.ResponseEntity;

 import static org.assertj.core.api.Assertions.assertThat;
@@ -37,6 +38,30 @@ class GlobalExceptionHandlerTest {
        }
    }

+    @Test
+    void handleGeneric_incorrectResultSize_staysOpaque_noHibernateOrRowCountLeak() {
+        // #731: before the fix, a case-colliding alias/name made Hibernate throw
+        // NonUniqueResultException → IncorrectResultSizeDataAccessException, which has no
+        // dedicated handler and falls through to handleGeneric. The fix removes the throw, but
+        // this pins the handler: a stray one must stay opaque — no Hibernate class name, no SQL,
+        // no "2 results were returned" row count reaching the client (CWE-209).
+        IncorrectResultSizeDataAccessException ex = new IncorrectResultSizeDataAccessException(
+                "query did not return a unique result: 2 results were returned", 1, 2);
+
+        try (MockedStatic<Sentry> sentryMock = mockStatic(Sentry.class)) {
+            ResponseEntity<GlobalExceptionHandler.ErrorResponse> response = handler.handleGeneric(ex);
+
+            assertThat(response.getStatusCode().value()).isEqualTo(500);
+            assertThat(response.getBody()).isNotNull();
+            assertThat(response.getBody().code()).isEqualTo(ErrorCode.INTERNAL_ERROR);
+            assertThat(response.getBody().message())
+                    .isEqualTo("An unexpected error occurred")
+                    .doesNotContain("results were returned")
+                    .doesNotContain("NonUnique")
+                    .doesNotContain("IncorrectResultSize");
+        }
+    }
+
    @Test
    void handleDataIntegrityViolation_returns400_withoutLeakingConstraint_orSentry() {
        // A DataIntegrityViolationException carries the constraint name + SQL in its message;
--- a/backend/src/test/java/org/raddatz/familienarchiv/person/PersonRepositoryTest.java
+++ b/backend/src/test/java/org/raddatz/familienarchiv/person/PersonRepositoryTest.java
@@ -121,37 +121,60 @@ class PersonRepositoryTest {
                .containsExactly("Anna", "Clara");
    }

-    // ─── findByAliasIgnoreCase ────────────────────────────────────────────────
+    // ─── findByAlias (exact) / findAllByAliasIgnoreCase (case-folding siblings) ───

    @Test
-    void findByAliasIgnoreCase_returnsMatchingPerson() {
+    void findByAlias_returnsExactCaseMatchOnly() {
        personRepository.save(Person.builder()
                .firstName("Karl").lastName("Brandt").alias("Opa Karl").build());

-        Optional<Person> found = personRepository.findByAliasIgnoreCase("opa karl");
-
-        assertThat(found).isPresent();
-        assertThat(found.get().getFirstName()).isEqualTo("Karl");
+        assertThat(personRepository.findByAlias("Opa Karl")).isPresent();
+        assertThat(personRepository.findByAlias("opa karl")).isEmpty(); // exact-case: a folded form does NOT match
    }

    @Test
-    void findByAliasIgnoreCase_returnsEmpty_whenAliasDoesNotMatch() {
-        Optional<Person> found = personRepository.findByAliasIgnoreCase("nobody");
-
-        assertThat(found).isEmpty();
+    void findAllByAliasIgnoreCase_returnsEmpty_whenAliasDoesNotMatch() {
+        assertThat(personRepository.findAllByAliasIgnoreCase("nobody")).isEmpty();
    }

-    // ─── findByFirstNameIgnoreCaseAndLastNameIgnoreCase ───────────────────────
+    @Test
+    void findAllByAliasIgnoreCase_foldsUmlautCase_inRealPostgres() {
+        // Proves Postgres LOWER() folds ü the same way for both rows — a plain-ASCII probe would
+        // stay green even if umlaut folding regressed. Both case-colliding aliases must match.
+        personRepository.save(Person.builder().lastName("Müller").alias("Müller").build());
+        personRepository.save(Person.builder().lastName("müller").alias("müller").build());
+
+        assertThat(personRepository.findAllByAliasIgnoreCase("MÜLLER")).hasSize(2);
+    }
+
+    // ─── findByFirstNameAndLastName (exact) / findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase ───

    @Test
-    void findByFirstNameIgnoreCaseAndLastNameIgnoreCase_returnsMatch() {
+    void findByFirstNameAndLastName_returnsExactCaseMatchOnly() {
        personRepository.save(Person.builder().firstName("Maria").lastName("Raddatz").build());

-        Optional<Person> found = personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase(
-                "maria", "raddatz");
+        assertThat(personRepository.findByFirstNameAndLastName("Maria", "Raddatz")).isPresent();
+        assertThat(personRepository.findByFirstNameAndLastName("maria", "raddatz")).isEmpty(); // exact-case only
+    }

-        assertThat(found).isPresent();
-        assertThat(found.get().getFirstName()).isEqualTo("Maria");
+    @Test
+    void findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase_foldsUmlautCase_inRealPostgres() {
+        personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
+        personRepository.save(Person.builder().firstName("hans").lastName("müller").build());
+
+        assertThat(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("HANS", "MÜLLER"))
+                .hasSize(2);
+    }
+
+    @Test
+    void findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase_nullFirstName_foldsToNoMatch() {
+        // Fail-closed: a last-name-only filename (null first name) must NOT widen to first_name IS
+        // NULL and pull in the institution/last-name-only row as a "sender". Proven on real
+        // Postgres because a mocked unit test cannot catch the IS NULL vs `= NULL` semantics.
+        personRepository.save(Person.builder().lastName("Müller").build()); // first_name NULL
+
+        assertThat(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(null, "Müller"))
+                .isEmpty();
    }

    // ─── findCorrespondents ───────────────────────────────────────────────────
--- a/backend/src/test/java/org/raddatz/familienarchiv/person/PersonServiceIntegrationTest.java
+++ b/backend/src/test/java/org/raddatz/familienarchiv/person/PersonServiceIntegrationTest.java
@@ -4,6 +4,7 @@ import org.junit.jupiter.api.Test;
 import org.raddatz.familienarchiv.PostgresContainerConfig;
 import org.raddatz.familienarchiv.document.Document;
 import org.raddatz.familienarchiv.document.DocumentRepository;
+import org.raddatz.familienarchiv.document.DocumentService;
 import org.raddatz.familienarchiv.document.DocumentStatus;
 import org.raddatz.familienarchiv.person.Person;
 import org.raddatz.familienarchiv.person.PersonType;
@@ -16,10 +17,13 @@ import org.springframework.test.context.bean.override.mockito.MockitoBean;
 import org.springframework.transaction.annotation.Transactional;
 import software.amazon.awssdk.services.s3.S3Client;

+import org.springframework.mock.web.MockMultipartFile;
+
 import jakarta.persistence.EntityManager;
 import jakarta.persistence.PersistenceContext;

 import java.util.Set;
+import java.util.UUID;

 import static org.assertj.core.api.Assertions.assertThat;

@@ -33,6 +37,7 @@ class PersonServiceIntegrationTest {
    @Autowired PersonService personService;
    @Autowired PersonRepository personRepository;
    @Autowired DocumentRepository documentRepository;
+    @Autowired DocumentService documentService;

    @PersistenceContext EntityManager entityManager;

@@ -75,6 +80,93 @@ class PersonServiceIntegrationTest {
        assertThat(result.getLastName()).isEqualTo("Cram");
    }

+    // ─── #731: case-colliding alias resolution against real Postgres ───────────
+    // The umlaut pair is mandatory — only the real DB proves Postgres LOWER() folds ü; a
+    // plain-ASCII test would stay green while umlaut aliases regressed.
+
+    @Test
+    void findOrCreateByAlias_resolvesUmlautAliasCollision_toLowestId_withoutThrow() {
+        Person muller = personRepository.save(Person.builder().lastName("Müller").alias("Müller").build());
+        Person mullerLower = personRepository.save(Person.builder().lastName("müller").alias("müller").build());
+        UUID expected = muller.getId().compareTo(mullerLower.getId()) <= 0 ? muller.getId() : mullerLower.getId();
+
+        // No exact-case "MÜLLER" row → falls through to the case-insensitive branch with two
+        // candidates and must pick the lowest id, never throwing NonUniqueResultException.
+        Person resolved = personService.findOrCreateByAlias("MÜLLER");
+
+        assertThat(resolved.getId()).isEqualTo(expected);
+    }
+
+    @Test
+    void findOrCreateByAlias_umlautAliasCollision_isDeterministicAcrossCalls() {
+        personRepository.save(Person.builder().lastName("Müller").alias("Müller").build());
+        personRepository.save(Person.builder().lastName("müller").alias("müller").build());
+
+        Person first = personService.findOrCreateByAlias("MÜLLER");
+        Person second = personService.findOrCreateByAlias("MÜLLER");
+
+        assertThat(second.getId()).isEqualTo(first.getId());
+    }
+
+    // ─── #731: filename-based sender resolution against real Postgres ──────────
+
+    @Test
+    void storeDocument_resolvesSender_whenFilenameNameIsUnique() throws Exception {
+        Person hans = personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
+
+        Document doc = uploadNamed("1965-03-12_Müller_Hans.pdf").document();
+
+        assertThat(doc.getSender()).isNotNull();
+        assertThat(doc.getSender().getId()).isEqualTo(hans.getId());
+    }
+
+    @Test
+    void storeDocument_resolvesSender_onSingleCaseInsensitiveMatch() throws Exception {
+        Person hans = personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
+
+        // Filename folds to "hans müller"; the only stored person is "Hans Müller".
+        Document doc = uploadNamed("1965-03-12_müller_hans.pdf").document();
+
+        assertThat(doc.getSender()).isNotNull();
+        assertThat(doc.getSender().getId()).isEqualTo(hans.getId());
+    }
+
+    @Test
+    void storeDocument_leavesSenderUnset_whenFilenameNameIsAmbiguous() throws Exception {
+        // Two persons collide case-insensitively; the filename casing ("HANS"/"MÜLLER") matches
+        // neither exactly → no exact-case winner → bail to null (never an arbitrary guess), no 500.
+        personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
+        personRepository.save(Person.builder().firstName("hans").lastName("müller").build());
+
+        Document doc = uploadNamed("1965-03-12_MÜLLER_HANS.pdf").document();
+
+        assertThat(doc.getSender()).isNull();
+    }
+
+    @Test
+    void storeDocument_leavesSenderUnset_whenFilenameHasNoFirstName() throws Exception {
+        // A last-name-only filename never resolves to a sender (the parser yields no parsed name).
+        personRepository.save(Person.builder().lastName("Müller").build());
+
+        Document doc = uploadNamed("1965-03-12_Müller.pdf").document();
+
+        assertThat(doc.getSender()).isNull();
+    }
+
+    @Test
+    void findByName_nullFirstName_resolvesToEmpty_inRealPostgres() {
+        // Fail-closed against the real DB: a null first name must NOT widen to first_name IS NULL
+        // and pick up the last-name-only row.
+        personRepository.save(Person.builder().lastName("Müller").build()); // first_name NULL
+
+        assertThat(personService.findByName(null, "Müller")).isEmpty();
+    }
+
+    private DocumentService.StoreResult uploadNamed(String filename) throws Exception {
+        MockMultipartFile file = new MockMultipartFile("file", filename, "application/pdf", new byte[]{1, 2, 3});
+        return documentService.storeDocument(file, null);
+    }
+
    // ─── #667: confirm round-trip + reader-default semantics ──────────────────

    @Test
--- a/backend/src/test/java/org/raddatz/familienarchiv/person/PersonServiceTest.java
+++ b/backend/src/test/java/org/raddatz/familienarchiv/person/PersonServiceTest.java
@@ -375,14 +375,57 @@ class PersonServiceTest {
    // ─── findOrCreateByAlias ─────────────────────────────────────────────────

    @Test
-    void findOrCreateByAlias_returnsExisting_whenAliasFound() {
-        String alias = "Walter de Gruyter";
-        Person existing = Person.builder().id(UUID.randomUUID()).alias(alias).build();
-        when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.of(existing));
+    void findOrCreateByAlias_returnsExactCaseMatch_overCaseInsensitiveSibling() {
+        String alias = "müller";
+        Person exact = Person.builder().id(UUID.randomUUID()).alias("müller").build();
+        when(personRepository.findByAlias(alias)).thenReturn(Optional.of(exact));

        Person result = personService.findOrCreateByAlias(alias);

-        assertThat(result).isEqualTo(existing);
+        assertThat(result).isEqualTo(exact);
+        verify(personRepository, never()).findAllByAliasIgnoreCase(any());
+        verify(personRepository, never()).save(any());
+    }
+
+    @Test
+    void findOrCreateByAlias_returnsExactCaseMatch_evenWhenMultipleSiblingsCollide() {
+        String alias = "Müller";
+        Person exact = Person.builder().id(UUID.randomUUID()).alias("Müller").build();
+        when(personRepository.findByAlias(alias)).thenReturn(Optional.of(exact));
+
+        Person result = personService.findOrCreateByAlias(alias);
+
+        assertThat(result).isEqualTo(exact);
+        // exact-case short-circuits — the case-insensitive siblings are never consulted.
+        verify(personRepository, never()).findAllByAliasIgnoreCase(any());
+    }
+
+    @Test
+    void findOrCreateByAlias_usesSingleCaseInsensitiveMatch_whenNoExactCase() {
+        String alias = "müller";
+        Person only = Person.builder().id(UUID.randomUUID()).alias("Müller").build();
+        when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
+        when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of(only));
+
+        Person result = personService.findOrCreateByAlias(alias);
+
+        assertThat(result).isEqualTo(only);
+        verify(personRepository, never()).save(any());
+    }
+
+    @Test
+    void findOrCreateByAlias_returnsLowestIdDeterministically_whenMultipleCaseInsensitiveMatches() {
+        String alias = "müller";
+        Person lower = Person.builder().id(UUID.fromString("00000000-0000-0000-0000-000000000001")).alias("Müller").build();
+        Person higher = Person.builder().id(UUID.fromString("00000000-0000-0000-0000-000000000002")).alias("müller").build();
+        when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
+        when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of(higher, lower)); // unordered
+
+        Person first = personService.findOrCreateByAlias(alias);
+        Person second = personService.findOrCreateByAlias(alias);
+
+        assertThat(first.getId()).isEqualTo(lower.getId());      // lowest id wins
+        assertThat(second.getId()).isEqualTo(first.getId());     // same result every call — never throws
        verify(personRepository, never()).save(any());
    }

@@ -390,7 +433,8 @@ class PersonServiceTest {
    void findOrCreateByAlias_createsNew_whenAliasNotFound() {
        String alias = "Clara Cram";
        Person saved = Person.builder().id(UUID.randomUUID()).alias(alias).firstName("Clara").lastName("Cram").build();
-        when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
+        when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
+        when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
        when(personRepository.save(any())).thenReturn(saved);

        Person result = personService.findOrCreateByAlias(alias);
@@ -403,7 +447,8 @@ class PersonServiceTest {
    void findOrCreateByAlias_createsMaidenNameAlias_whenGebPresent() {
        String alias = "Clara Cram geb. de Gruyter";
        Person saved = Person.builder().id(UUID.randomUUID()).alias(alias).firstName("Clara").lastName("Cram").build();
-        when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
+        when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
+        when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
        when(personRepository.save(any())).thenReturn(saved);
        when(aliasRepository.findMaxSortOrder(saved.getId())).thenReturn(0);
        when(aliasRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
@@ -425,7 +470,8 @@ class PersonServiceTest {
    @Test
    void findOrCreateByAlias_setsInstitutionType_withFullNameInLastName() {
        String alias = "Arthur Collignon GmbH";
-        when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
+        when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
+        when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
        when(personRepository.save(any())).thenAnswer(inv -> {
            Person p = inv.getArgument(0);
            p.setId(UUID.randomUUID());
@@ -442,7 +488,8 @@ class PersonServiceTest {
    @Test
    void findOrCreateByAlias_setsGroupType_withFullNameInLastName() {
        String alias = "Geschwister de Gruyter";
-        when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
+        when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
+        when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
        when(personRepository.save(any())).thenAnswer(inv -> {
            Person p = inv.getArgument(0);
            p.setId(UUID.randomUUID());
@@ -460,7 +507,8 @@ class PersonServiceTest {
    void findOrCreateByAlias_noAlias_whenNoGeb() {
        String alias = "Clara Cram";
        Person saved = Person.builder().id(UUID.randomUUID()).alias(alias).firstName("Clara").lastName("Cram").build();
-        when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
+        when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
+        when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
        when(personRepository.save(any())).thenReturn(saved);

        personService.findOrCreateByAlias(alias);
@@ -472,11 +520,54 @@ class PersonServiceTest {
    void findOrCreateByAlias_trimsInput() {
        String alias = "  Clara Cram  ";
        Person saved = Person.builder().id(UUID.randomUUID()).alias("Clara Cram").build();
-        when(personRepository.findByAliasIgnoreCase("Clara Cram")).thenReturn(Optional.of(saved));
+        when(personRepository.findByAlias("Clara Cram")).thenReturn(Optional.of(saved));

        personService.findOrCreateByAlias(alias);

-        verify(personRepository).findByAliasIgnoreCase("Clara Cram");
+        verify(personRepository).findByAlias("Clara Cram");
+    }
+
+    // ─── findByName (filename-based sender resolution) ────────────────────────
+
+    @Test
+    void findByName_returnsExactCaseMatch_overCaseInsensitiveSibling() {
+        Person exact = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
+        when(personRepository.findByFirstNameAndLastName("Hans", "Müller")).thenReturn(Optional.of(exact));
+
+        assertThat(personService.findByName("Hans", "Müller")).contains(exact);
+        verify(personRepository, never()).findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(any(), any());
+    }
+
+    @Test
+    void findByName_usesSingleCaseInsensitiveMatch_whenNoExactCase() {
+        Person only = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
+        when(personRepository.findByFirstNameAndLastName("hans", "müller")).thenReturn(Optional.empty());
+        when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("hans", "müller"))
+                .thenReturn(List.of(only));
+
+        assertThat(personService.findByName("hans", "müller")).contains(only);
+    }
+
+    @Test
+    void findByName_bailsToEmpty_whenTwoOrMoreCaseInsensitiveMatches() {
+        Person a = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
+        Person b = Person.builder().id(UUID.randomUUID()).firstName("hans").lastName("müller").build();
+        when(personRepository.findByFirstNameAndLastName("hans", "müller")).thenReturn(Optional.empty());
+        when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("hans", "müller"))
+                .thenReturn(List.of(a, b));
+
+        // Ambiguous sender → unset, never an arbitrary guess (provenance correctness over a
+        // confidently-wrong pre-fill). This is the deliberate divergence from the alias path.
+        assertThat(personService.findByName("hans", "müller")).isEmpty();
+    }
+
+    @Test
+    void findByName_returnsEmpty_whenFirstNameNullFoldsToNoMatch() {
+        when(personRepository.findByFirstNameAndLastName(null, "Müller")).thenReturn(Optional.empty());
+        when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(null, "Müller"))
+                .thenReturn(List.of());
+
+        assertThat(personService.findByName(null, "Müller")).isEmpty();
    }

    // ─── updatePerson (notes) ────────────────────────────────────────────────
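Review note on the `findByName` tests above: the three-step resolution they pin (exact-case match, else a single case-insensitive match, else empty on ambiguity) can be sketched in plain Java. This is a minimal in-memory sketch, not the code in this PR: `SketchPerson`, `SenderResolutionSketch`, and the list standing in for `PersonRepository` are hypothetical names, and the real service delegates to `findByFirstNameAndLastName` and `findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase`.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the Person entity; only the fields the lookup needs.
record SketchPerson(long id, String firstName, String lastName) {}

final class SenderResolutionSketch {

    // Exact-case match first; else the single case-insensitive match; else empty.
    static Optional<SketchPerson> findByName(List<SketchPerson> people, String first, String last) {
        // Assumption: a null name part never matches (fail closed), as the null-first-name test expects.
        if (first == null || last == null) {
            return Optional.empty();
        }
        Optional<SketchPerson> exact = people.stream()
                .filter(p -> first.equals(p.firstName()) && last.equals(p.lastName()))
                .findFirst();
        if (exact.isPresent()) {
            return exact;
        }
        List<SketchPerson> caseInsensitive = people.stream()
                .filter(p -> first.equalsIgnoreCase(p.firstName()) && last.equalsIgnoreCase(p.lastName()))
                .toList();
        // Two or more candidates are ambiguous: leave the sender unset instead of guessing.
        return caseInsensitive.size() == 1 ? Optional.of(caseInsensitive.get(0)) : Optional.empty();
    }
}
```

With both `Clara Cram` and `clara cram` in the list, a lookup for `CLARA CRAM` returns an empty result, which is the behaviour the `bailsToEmpty` test asserts for the sender path.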
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -141,6 +141,65 @@ services:
    security_opt:
      - no-new-privileges:true

+  # --- Ollama: Model init (one-shot pull) ---
+  # Pulls qwen2.5:7b-instruct-q4_K_M (~4.7 GB) into the ollama_models volume on first start.
+  # On subsequent starts (model already in volume), exits quickly without re-downloading.
+  # Not started in CI — CI uses explicit service selection
+  # (docker-compose.ci.yml: db minio create-buckets)
+  ollama-model-init:
+    image: ollama/ollama:0.30.6
+    restart: "no"
+    networks:
+      - archiv-net
+    volumes:
+      - ollama_models:/root/.ollama
+    mem_limit: 2g
+    read_only: true
+    tmpfs:
+      - /tmp:size=512m
+    cap_drop:
+      - ALL
+    security_opt:
+      - no-new-privileges:true
+    command: >
+      sh -c "ollama serve & SERVE_PID=$$! && until curl -sf http://localhost:11434/api/tags; do sleep 1; done && ollama pull qwen2.5:7b-instruct-q4_K_M && kill $$SERVE_PID"
+
+  # --- Ollama: LLM inference server ---
+  # Serves the pre-pulled model for NL search inference.
+  # Not started in CI — CI uses explicit service selection
+  # (docker-compose.ci.yml: db minio create-buckets)
+  ollama:
+    image: ollama/ollama:0.30.6
+    container_name: archive-ollama
+    restart: unless-stopped
+    expose:
+      - "11434"
+    networks:
+      - archiv-net
+    volumes:
+      - ollama_models:/root/.ollama
+    environment:
+      OLLAMA_API_KEY: "${OLLAMA_API_KEY}"
+    cpus: "${OLLAMA_CPU_LIMIT:-4.0}"
+    mem_limit: "${OLLAMA_MEM_LIMIT:-8g}"
+    memswap_limit: "${OLLAMA_MEM_LIMIT:-8g}"
+    read_only: true
+    tmpfs:
+      - /tmp:size=512m
+    cap_drop:
+      - ALL
+    security_opt:
+      - no-new-privileges:true
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
+      interval: 30s
+      timeout: 10s
+      retries: 5
+      start_period: 60s  # model weights are pre-pulled by ollama-model-init; service only needs to bind port
+    depends_on:
+      ollama-model-init:
+        condition: service_completed_successfully
+
  # --- Backend: Spring Boot ---
  backend:
    build:
@@ -184,6 +243,8 @@ services:
      SPRING_MAIL_PROPERTIES_MAIL_SMTP_STARTTLS_ENABLE: ${MAIL_STARTTLS_ENABLE:-false}
      APP_OCR_BASE_URL: http://ocr-service:8000
      APP_OCR_TRAINING_TOKEN: "${OCR_TRAINING_TOKEN:-}"
+      APP_OLLAMA_BASE_URL: "${APP_OLLAMA_BASE_URL:-http://ollama:11434}"
+      APP_OLLAMA_API_KEY: "${OLLAMA_API_KEY}"
      SENTRY_DSN: ${SENTRY_DSN:-}
      SENTRY_TRACES_SAMPLE_RATE: ${SENTRY_TRACES_SAMPLE_RATE:-1.0}
      # Observability: send traces to Tempo inside archiv-net (OTLP gRPC port 4317)
@@ -247,3 +308,4 @@ volumes:
  frontend_node_modules:
  ocr_models:
  ocr_cache:
+  ollama_models:
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -50,13 +50,16 @@ graph TD

 The OCR service requires significant RAM for model loading. The dev compose sets `mem_limit: 12g`.

-| Production target | RAM | Recommended OCR limit | Notes |
-|---|---|---|---|
-| Hetzner CX42 | 16 GB | 12 GB | Recommended for OCR-enabled production |
-| Hetzner CX32 | 8 GB | 6 GB | Accept reduced batch sizes and slower throughput |
-| Hetzner CX22 | 4 GB | — | Disable the OCR service (`profiles: [ocr]`); run OCR on demand only |
+| Production target | RAM | Recommended OCR limit | NL Search | Notes |
+|---|---|---|---|---|
+| Current server (Hetzner Serverbörse, i7-6700) | 64 GB | 12 GB | Supported | Default `mem_limit: 12g` works comfortably; plenty of headroom for Ollama |
+| ≥ 16 GB RAM | 16+ GB | 12 GB | Supported | Default works |
+| 8 GB RAM | 8 GB | 6 GB | Disabled — set `APP_OLLAMA_BASE_URL=` (empty) | Set `OCR_MEM_LIMIT=6g`; accept reduced batch sizes |
+| 4 GB RAM | 4 GB | — | Unsupported | Disable OCR service (`profiles: [ocr]`); run OCR on demand only |

-A CX32 cannot honour the default `mem_limit: 12g` — set the `OCR_MEM_LIMIT=6g` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow) before deploying on a CX32. The prod compose interpolates this var with a 12g default.
+On servers with less than 16 GB RAM the default `mem_limit: 12g` cannot be honoured — set the `OCR_MEM_LIMIT` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow). The prod compose interpolates this var with a 12g default.
+
+> **Memory budget:** OCR (~6 GB active) + Ollama (~8 GB) = ~14 GB. On servers with less than 16 GB RAM, do not run `docker-compose.observability.yml` continuously alongside both OCR and Ollama.

 ### Dev vs production differences

@@ -140,10 +143,20 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
 | `ALLOWED_PDF_HOSTS` | SSRF protection — comma-separated list of allowed PDF source hosts. **Do not widen to `*`** | `minio,localhost,127.0.0.1` | YES | — |
 | `KRAKEN_MODEL_PATH` | Directory containing Kraken HTR models (populated by `download-kraken-models.sh`) | `/app/models/` | — | — |
 | `BLLA_MODEL_PATH` | Kraken baseline layout analysis model path | `/app/models/blla.mlmodel` | — | — |
-| `OCR_MEM_LIMIT` | Container memory cap for ocr-service in `docker-compose.prod.yml`. Set to `6g` on CX32 hosts; leave unset on CX42+ to use the 12g default | `12g` (prod compose default) | — | — |
+| `OCR_MEM_LIMIT` | Container memory cap for ocr-service in `docker-compose.prod.yml`. Set to `6g` on servers with 8 GB RAM; leave unset (12g default) on servers with ≥ 16 GB RAM | `12g` (prod compose default) | — | — |
 | `XDG_CACHE_HOME` | XDG cache base dir — redirects Matplotlib and other XDG-aware libraries away from the read-only `HOME` (`/home/ocr`) to the writable cache volume | `/app/cache` | — | — |
 | `TORCH_HOME` | PyTorch model cache — redirects `~/.cache/torch` to the writable models volume | `/app/models/torch` | — | — |

+### Ollama (NL search) service
+
+| Variable | Purpose | Default | Required? | Sensitive? |
+|---|---|---|---|---|
+| `APP_OLLAMA_BASE_URL` | Base URL for the Ollama service. Leave empty to disable NL search. | `http://ollama:11434` | — | — |
+| `APP_OLLAMA_API_KEY` | API key passed as `Authorization: Bearer` to Ollama. Leave empty for unauthenticated access. Note: `OLLAMA_API_KEY` is not enforced in Ollama 0.6.5 or 0.30.6 (see ADR-028). | — | — | YES |
+| `OLLAMA_CPU_LIMIT` | Docker CPU quota for the Ollama container. On hosts with 8 vCPUs it can be raised to `7.5`. | `4.0` | — | — |
+| `OLLAMA_MEM_LIMIT` | Memory limit for the Ollama container. Requires a host with at least 16 GB RAM. | `8g` | — | — |
+| `OLLAMA_API_KEY` | API key set on the Ollama service itself. Same value as `APP_OLLAMA_API_KEY`. Leave empty for unauthenticated. | — | — | YES |
+
 ### Observability stack (`docker-compose.observability.yml`)

 | Variable | Purpose | Default | Required? | Sensitive? |
@@ -264,6 +277,19 @@ git.raddatz.cloud      A   <server IP>

 ### 3.4 First deploy

+> **First start — Ollama model pull:** On first `docker compose up -d`, the `ollama-model-init` container pulls `qwen2.5:7b-instruct-q4_K_M` (~4.7 GB). At 10 Mbps this takes approximately 60–90 minutes; at 100 Mbps approximately 6–10 minutes. The pull is a one-time operation — subsequent restarts skip it (model already on the `ollama_models` volume). Monitor progress with `docker logs -f $(docker ps -q --filter name=ollama-model-init)`.
+>
+> **Do not use `--wait` on first deploy:** `docker compose up -d --wait` waits for all services to reach their health/completion target, including `ollama-model-init`. On first pull this blocks for the whole download (60–90 minutes at 10 Mbps) and can time out any CI/deploy script that uses `--wait`.
+>
+> **Re-deploy idempotency:** on subsequent `docker compose up -d` runs (including `--force-recreate`), `ollama-model-init` re-executes but exits in seconds — Ollama's CLI skips the download when the model digest already matches what is on the volume.
+>
+> **Verify NL search is active** after enabling Ollama (`APP_OLLAMA_BASE_URL=http://ollama:11434`):
+> ```bash
+> curl -si 'http://localhost:8080/api/nl-search?q=brief+von+grossmutter'
+> # Returns 200 with results → NL search is active
+> # Returns 503 NL_SEARCH_UNAVAILABLE → Ollama is not reachable or APP_OLLAMA_BASE_URL is unset
+> ```
+
 ```bash
 # 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
 #    Expected: docker compose up -d --wait succeeds for archiv-staging, then
@@ -559,6 +585,24 @@ bash scripts/download-kraken-models.sh

 > Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.

+### Upgrade the Ollama model
+
+To switch to a newer model version (e.g. a future release of `qwen2.5`):
+
+1. Update the model name in the `ollama-model-init` `command:` in `docker-compose.yml`.
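+2. Stop and remove the Ollama containers, then remove the existing model volume to free the old weights (Docker refuses to remove a volume that a container still references):
+   ```bash
+   docker compose rm -sf ollama ollama-model-init && docker volume rm familienarchiv_ollama_models
+   ```
+   (In production the volume name is prefixed with the compose project: `archiv-production_ollama_models`.)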
+3. Restart the stack:
+   ```bash
+   docker compose up -d
+   ```
+   The `ollama-model-init` container pulls the new model weights on first start (~4–8 GB download depending on the model). The `ollama` inference server will not start until the pull completes (`condition: service_completed_successfully`).
+
+> **`ollama_models` volume:** holds model weights only — fully reproducible by re-pull, no backup needed.
+
 ### Trigger a canonical import

 The importer no longer parses the raw spreadsheet. It consumes the **canonical artifacts**
--- a/docs/adr/004-pdfbox-thumbnails.md
+++ b/docs/adr/004-pdfbox-thumbnails.md
@@ -35,7 +35,7 @@ Render thumbnails in-process in Spring Boot using **Apache PDFBox 3.0.4** (alrea

 **Harder:**
 - PDFBox is a parser attack surface. Mitigated by a 30-second watchdog timeout in `ThumbnailAsyncRunner` and by the fire-and-forget contract (failures never break upload).
- Memory ceiling: the `thumbnailExecutor` is capped at 2 threads on the CX32 (8 GB). A busy backfill alongside OCR can approach the 3 GB heap — acceptable but not comfortable. Streaming via `FileService.downloadFileStream` keeps this bounded for PDFs up to 50 MB.
+- Memory ceiling: the `thumbnailExecutor` is capped at 2 threads on memory-constrained hosts. A busy backfill alongside OCR can approach the 3 GB heap on an 8 GB server — acceptable but not comfortable. The current production server (64 GB) has ample headroom. Streaming via `FileService.downloadFileStream` keeps this bounded for PDFs up to 50 MB.

 ### Operational caveats (intentional)

--- a/docs/adr/021-tmpdir-persistent-volume-staging.md
+++ b/docs/adr/021-tmpdir-persistent-volume-staging.md
@@ -62,7 +62,7 @@ The `/tmp` tmpfs remains at 512 MB and continues to serve training-ZIP extractio
 ## Alternatives considered

 **Approach B — Enlarge `/tmp` to 4 GB**  
-One-line change. Discarded because: (1) 4 GB tmpfs counts against the cgroup `mem_limit`; on CX32 hosts with `OCR_MEM_LIMIT=6g` the combined Surya resident set + tmpfs would trigger OOMKill on cold start; (2) staging GB-scale model files through RAM is using the wrong storage tier; (3) any future model larger than 4 GB requires another bump.
+One-line change. Discarded because: (1) 4 GB tmpfs counts against the cgroup `mem_limit`; on servers with `OCR_MEM_LIMIT=6g` the combined Surya resident set + tmpfs would trigger OOMKill on cold start; (2) staging GB-scale model files through RAM is using the wrong storage tier; (3) any future model larger than 4 GB requires another bump.

 **Approach C — Both TMPDIR redirect and enlarged /tmp**  
 Belt-and-suspenders: Approach A + 1 GB tmpfs. Discarded in favour of the cleaner Approach A. The defence-in-depth benefit does not outweigh the extra compose churn; the 512 MB cap on `/tmp` is intentional.
--- a/docs/adr/028-ollama-docker-compose-service.md
+++ b/docs/adr/028-ollama-docker-compose-service.md
@@ -0,0 +1,239 @@
+# ADR-028: Ollama Docker Compose service for NL search
+
+**Date:** 2026-06-06
+**Status:** Accepted
+**Deciders:** Marcel Raddatz
+**Relates to:** #737 (infrastructure), #735 (NL search epic)
+
+---
+
+## Context
+
+Issue #735 introduces natural-language document search, requiring a local LLM to generate embeddings and/or run inference at query time. The family archive stores personal family history — data privacy is non-negotiable, so cloud-based inference APIs are excluded. The production target is a Hetzner CX42 (16 GB RAM, 8 vCPUs, CPU-only, ~32 EUR/month).
+
+Alternatives considered:
+
+| Option | Reason rejected |
+|---|---|
+| **llama.cpp** | No HTTP API out of the box; requires custom wrapper; higher ops burden |
+| **vLLM** | GPU-first; significant overhead on CPU-only hardware; overkill for this scale |
+| **Cloud APIs** (OpenAI, Gemini, etc.) | Vendor lock-in; per-token cost at scale; data leaves the server — unacceptable for a private family archive |
+| **Ollama** | Self-contained Docker image; built-in HTTP REST API; actively maintained; CPU-compatible; zero egress |
+
+**Decision:** run Ollama as a Docker Compose service alongside the existing stack.
+
+---
+
+## Decisions
+
+### 1. Hardware minimums and CPU-only constraint
+
+All inference runs on CPU. The target is the Hetzner CX42 (16 GB RAM, 8 vCPUs).
+
+| Tier | RAM | NL search |
+|---|---|---|
+| CX42 | 16 GB | Supported — full stack including Ollama |
+| CX32 | 8 GB | Disabled — set `APP_OLLAMA_BASE_URL=` (empty) to skip Ollama entirely |
+| CX22 | 4 GB | Unsupported for NL search |
+
+### 2. Memory budget on CX42
+
+| Component | `mem_limit` | Typical active RSS |
+|---|---|---|
+| OCR service | 12g (hard ceiling) | ~6 GB |
+| Ollama | 8g | ~8 GB |
+| **Total** | | **~14 GB active** |
+
+`memswap_limit` on the Ollama service is set to `8g` (matching `mem_limit`) so the container gets no swap allowance and the kernel cannot page model weights out under OCR memory pressure. Swapping model weights does not crash the container but silently degrades inference latency. This mirrors the pattern already applied to the OCR service.
+
+**Operational constraint:** do NOT run `docker-compose.observability.yml` continuously alongside both OCR and Ollama on a CX42. The observability stack adds ~2 GB, which leaves no headroom.
+
+### 3. Graceful-degradation contract
+
+`app.ollama.base-url` absent OR blank → Ollama bean NOT registered → NL search returns HTTP 503 with `ErrorCode: NL_SEARCH_UNAVAILABLE`.
+
+This single code path covers all unavailability scenarios: base-url unset, service unreachable, health check failed, and request timeout.
+
+#### Why not `@ConditionalOnProperty`
+
+`@ConditionalOnProperty` registers the bean when the property is present but blank (`APP_OLLAMA_BASE_URL=`). This produces a `RestClient` with an empty base URL that fails at runtime with an opaque error rather than a clean 503.
+
+#### Correct condition expression
+
+```java
+@ConditionalOnExpression("!'${app.ollama.base-url:}'.isBlank()")
+```
+
+When the property is absent, the placeholder resolves to `''`; `.isBlank()` returns `true`; negation makes the condition `false`; the bean is not registered. Same result for an explicit empty string (`APP_OLLAMA_BASE_URL=`).
+
+### 4. Backend configuration pattern
+
+Use a `@ConfigurationProperties` record, not separate `@Value` injections:
+
+```java
+@ConfigurationProperties("app.ollama")
+record OllamaProperties(String baseUrl, String apiKey) {}
+```
+
+`OllamaProperties` is registered unconditionally — it is a plain value holder with no side effects.
+
+`@ConditionalOnExpression` belongs **only** on `RestClientOllamaClient` (the bean that creates a live network client).
+
+**Deliberate divergence from the OCR pattern:** the OCR service uses `@Value`-with-default because OCR is always-on and `http://ocr-service:8000` is a safe default. Ollama is truly optional — a missing URL means "feature disabled", not "use this default server". There is no safe default Ollama URL.
+
+### 5. Optional<OllamaClient> injection
+
+The NL search service uses constructor injection with `Optional<OllamaClient>`:
+
+```java
+private final Optional<OllamaClient> ollamaClient;
+```
+
+When empty (bean not registered), the service method returns 503 immediately:
+
+```java
+var client = ollamaClient.orElseThrow(
+    () -> DomainException.internal(ErrorCode.NL_SEARCH_UNAVAILABLE, "Ollama not configured"));
+```
+
+Prefer this over `@Autowired(required = false)` with a null check — the null-check pattern is noisy when the service already uses `@RequiredArgsConstructor`.
+
+### 6. Empty API key guard
+
+`RestClientOllamaClient` omits the `Authorization` header entirely when `apiKey` is blank:
+
+```java
+if (apiKey != null && !apiKey.isBlank()) {
+    request.header("Authorization", "Bearer " + apiKey);
+}
+```
+
+Sending `Authorization: Bearer ` (empty token) has undefined or potentially broken behavior depending on the Ollama version. This mirrors the `trainingToken` guard in `RestClientOcrClient.java:107`.
+
+### 7. OLLAMA_API_KEY behavior in Ollama 0.6.5 and 0.30.6
+
+**Empirically verified (2026-06-06) on both `0.6.5` and `0.30.6`:** `OLLAMA_API_KEY` does **not** enforce request authentication in either version.
+
+Test matrix run against `/api/tags`:
+
+| Configuration | No auth header | `Authorization: Bearer ` (empty) | `Authorization: Bearer wrongkey` | `Authorization: Bearer correctkey` |
+|---|---|---|---|---|
+| `OLLAMA_API_KEY=` (empty) | 200 | 200 | — | — |
+| `OLLAMA_API_KEY` unset | 200 | — | — | — |
+| `OLLAMA_API_KEY=testkey99` | 200 | 200 | 200 | 200 |
+
+**Finding:** The `OLLAMA_API_KEY` environment variable is not listed in Ollama's startup config dump and does not gate any HTTP request in either tested version. All configurations — empty string, fully unset, and a real key — accept all requests without authentication.
+
+**Practical implication:** `OLLAMA_API_KEY` provides no defense-in-depth in the tested versions. `archiv-net` network isolation is the only effective security control. The env var is retained in the Compose definition and `.env.example` for forward compatibility if Ollama enables enforcement in a future version, but operators must not rely on it for access control.
+
+**Backend guard still valid:** the `RestClientOllamaClient` code-level guard (omit `Authorization` header when `apiKey.isBlank()`) remains correct behavior regardless — it prevents a malformed `Authorization: Bearer ` header from being sent.
+
+### 8. read_only: true feasibility
+
+**Empirically verified (2026-06-06) on both `0.6.5` and `0.30.6`:** `read_only: true` works with Ollama. All three operations — `ollama serve`, `ollama pull qwen2.5:7b-instruct-q4_K_M`, and `ollama list` — succeeded with exit code 0 in both versions.
+
+Test run:
+```bash
+docker run --rm --read-only \
+  -v ollama_models:/root/.ollama \
+  --tmpfs /tmp \
+  --entrypoint sh ollama/ollama:0.30.6 \
+  -c "ollama serve & sleep 5 && ollama pull qwen2.5:7b-instruct-q4_K_M && ollama list"
+```
+
+**Note:** the entrypoint must be overridden to `sh` for the test command — the container's default entrypoint is `/bin/ollama` and does not accept `sh` as a subcommand. This is a Docker invocation detail; the Compose service definition uses the image's default entrypoint and `command:` override for the init container, which works correctly.
+
+**Result:** `read_only: true` and `tmpfs: - /tmp:size=512m` are applied to both `ollama` and `ollama-model-init`. The `ollama_models` volume handles all persistent writes; no other paths require write access during normal operation.
+
+### 9. Peak RSS of init container during pull
+
+**Empirically verified (2026-06-06):** Peak RSS during `qwen2.5:7b-instruct-q4_K_M` pull was **~108 MiB**.
+
+`docker stats` samples during the pull (15-second intervals):
+
+| Sample | MEM |
+|---|---|
+| 1 | 54.89 MiB |
+| 2 | 66.3 MiB |
+| 5 | 97.25 MiB |
+| 9 | **107.8 MiB** (peak) |
+
+`mem_limit: 2g` is adequate — the model weights stream directly to the named volume; RSS is dominated by the Ollama server process alone (~100 MB), not the model data. No bump to 4 GB needed.
+
+### 10. Init container pull mechanism
+
+The `ollama-model-init` container uses a curl-based readiness loop with captured PID:
+
+```sh
+ollama serve & SERVE_PID=$!
+until curl -sf http://localhost:11434/api/tags; do sleep 1; done
+ollama pull qwen2.5:7b-instruct-q4_K_M
+kill $SERVE_PID
+```
+
+`kill %1` (job-control syntax) is unreliable in non-interactive `sh -c` contexts. Capturing the PID via `SERVE_PID=$!` is reliable.
+
+The same endpoint (`/api/tags`) is used for both the init container readiness loop and the main service `healthcheck`.
+
+### 11. start_period: 60s rationale
+
+The model is pre-pulled by `ollama-model-init` before the main service starts (via `condition: service_completed_successfully`). At main service startup, Ollama only loads model weights from the named volume and binds port 11434.
+
+60 seconds is appropriate for this cold-start profile. 300 seconds was considered — that would be appropriate if the service pulled the model itself — but overstates actual startup time when the model is already present on the volume.
+
+### 12. Security threat model
+
+**Primary control:** `archiv-net` network isolation. Ollama has no externally exposed port (`expose:` only, not `ports:`). The Caddyfile must not route any path to the Ollama service.
+
+**Note on `OLLAMA_API_KEY`:** Per §7, `OLLAMA_API_KEY` is not enforced in Ollama 0.6.5 or 0.30.6 and provides no authentication barrier against a compromised backend container. `archiv-net` network isolation is the sole effective security control. The env var is retained for forward compatibility only — do not rely on it for access control.
+
+Both `ollama` and `ollama-model-init` receive the ADR-019 hardening baseline:
+
+```yaml
+cap_drop: [ALL]
+security_opt: [no-new-privileges:true]
+```
+
+### 13. CI exclusion strategy
+
+Docker Compose profiles are not used — they would add developer friction (requiring `--profile ...` for all local dev commands).
+
+CI uses explicit service selection in `docker-compose.ci.yml`:
+```bash
+docker compose -f docker-compose.ci.yml up -d db minio create-buckets
+```
+
+Ollama is simply not listed and is never started in CI. A YAML comment on the `ollama` service block documents this:
+
+```yaml
+# Not started in CI — CI uses explicit service selection
+# (docker-compose.ci.yml: db minio create-buckets)
+```
+
+### 14. ollama_models volume operational note
+
+The `ollama_models` named volume holds model weights only — fully reproducible by re-pull. No backup is needed.
+
+If the volume fills after a model upgrade:
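+```bash
+docker compose rm -sf ollama ollama-model-init && docker volume rm familienarchiv_ollama_models && docker compose up -d
+```
+The init container re-pulls the model on next startup. The volume name carries the compose project prefix (`archiv-production_ollama_models` in production).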
+
+---
+
+## Consequences
+
+### Positive
+
+- NL search runs entirely on-premises; no data leaves the server and no per-token cloud cost.
+- Graceful degradation is a first-class concern: smaller or budget-constrained instances can run the app without Ollama with a single env var change.
+- The init container pattern keeps model pull out of the critical startup path for the main service, giving accurate healthcheck timings.
+- `@ConditionalOnExpression` with a blank-check is more correct than `@ConditionalOnProperty` for optional features with no safe default URL.
+
+### Risks and operational implications
+
+- **Memory pressure:** OCR + Ollama together consume ~14 GB on a 16 GB host. Running the observability stack simultaneously risks OOM kills. Monitor with `docker stats`.
+- **CPU inference latency:** `qwen2.5:7b-instruct-q4_K_M` is chosen for CPU viability, but inference on 8 vCPUs will be noticeably slower than GPU-accelerated alternatives. This is acceptable for the family archive use case (low concurrency, not real-time).
+- All three empirical TBD items from the original issue spec were resolved — see §7 (OLLAMA_API_KEY not enforced), §8 (`read_only: true` works), §9 (peak RSS ~108 MiB).
+- Model upgrades require a `docker volume rm` to free old weights before pulling the replacement. Document this in runbook/DEPLOYMENT.md.
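Review note on §6 of ADR-028: the empty-key guard can be illustrated without Spring. The sketch below swaps in the JDK's `java.net.http.HttpRequest` for the `RestClient` used by `RestClientOllamaClient`, and adds a null check on top of the blank check as an assumption (a record component bound from an unset property would be null). `OllamaAuthHeaderSketch` and `tagsRequest` are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class OllamaAuthHeaderSketch {

    // Builds a GET request for /api/tags and attaches the bearer token only when a key is set.
    static HttpRequest tagsRequest(String baseUrl, String apiKey) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(baseUrl + "/api/tags")).GET();
        // Omit the header entirely for a null or blank key, so "Authorization: Bearer " is never sent.
        if (apiKey != null && !apiKey.isBlank()) {
            builder.header("Authorization", "Bearer " + apiKey);
        }
        return builder.build();
    }
}
```

Calling `tagsRequest("http://ollama:11434", "")` produces a request with no `Authorization` header, while a non-blank key such as the `testkey99` value from the §7 test matrix yields `Bearer testkey99`.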
--- a/docs/adr/033-tag-name-resolution-tolerates-case-collisions.md
+++ b/docs/adr/033-tag-name-resolution-tolerates-case-collisions.md
@@ -1,4 +1,4 @@
-# ADR-032 — Tag-name resolution tolerates case-collisions: exact-case first, then a deterministic lowest-id fallback, and never a `unique(lower(name))` constraint
+# ADR-033 — Tag-name resolution tolerates case-collisions: exact-case first, then a deterministic lowest-id fallback, and never a `unique(lower(name))` constraint

 **Date:** 2026-06-06
 **Status:** Accepted
@@ -82,15 +82,58 @@ added later.
  `IncorrectResultSizeDataAccessException`, and `GlobalExceptionHandler`'s generic handler maps
  any stray one to `INTERNAL_ERROR` with no Hibernate/SQL leak — so no dedicated handler was
  added.
- **The sibling Person path is unfixed but tracked.** `PersonService.findOrCreateByAlias`
-  (`findByAliasIgnoreCase`) and `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` carry the same
-  latent `Optional`-non-unique throw on user-influenced names; deferred to #731 rather than
-  widened into this fix.
+- **The sibling Person path is fixed the same way — see the Person extension below (#731).**
 - Postgres `LOWER()` folding of umlauts (`ü`/`ä`) is the actual correctness hinge of the
  fallback and cannot be proven by a mocked repo, so it is pinned by a Testcontainers
  `postgres:16-alpine` test on a `Glückwünsche`/`glückwünsche` pair; a plain-ASCII test would
  stay green while the bug reappeared for umlaut tags.

+## Person extension (#731)
+
+The Person domain carried the same latent throw on **two** user-influenced lookup surfaces, and
+is fixed with the same exact-case-first, non-throwing pattern — but with a deliberately
+**different fallback per surface**, because the two paths have different consequences.
+
+- **Alias path — `PersonService.findOrCreateByAlias` — deterministic lowest-id (mirrors tag).**
+  `findByAliasIgnoreCase` (`Optional`) is replaced by `findByAlias` (exact) → `findAllByAliasIgnoreCase`
+  (plural, lowest id) → the existing create-when-absent branch (INSTITUTION/GROUP and the
+  maiden-name alias are preserved verbatim). There is no human in the importer loop and the path
+  creates-on-absent anyway, so a deterministic guess is the right behaviour — exactly like tags.
+
+- **Name/sender path — `PersonService.findByName` — bail to empty on ambiguity (the new wrinkle).**
+  Used only by `DocumentService.storeDocument` to resolve the upload **sender** from the parsed
+  filename. `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` (`Optional`) is replaced by
+  `findByFirstNameAndLastName` (exact) → `findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase`
+  (plural). Resolution returns the exact-case match, else the single case-insensitive match, else
+  — on **two or more** matches — **empty**. The sender is left unset rather than guessing.
+
+  **Why this diverges from the alias (and tag) decision:** the archive's value is correct
+  provenance. A confidently-wrong pre-filled `Hans Müller` is worse than an empty field, because a
+  senior reviewer will not re-check a value that is already filled in, whereas an empty sender
+  routes the document into the "needs completion" state (`metadataComplete=false`) for a human to
+  assign. The load-bearing comment at `findByName` records this so a future "consistency cleanup"
+  does not reintroduce the confidently-wrong-sender bug by switching it to lowest-id.
+
+- **Fail-closed on a null first name.** A parsed filename can lack a first name. The two new name
+  methods use explicit HQL equality (`= :firstName`) rather than a derived
+  `…IgnoreCase` query, because Spring Data folds a null derived-query argument to `first_name IS
+  NULL` — which would silently widen the match and pull a last-name-only / institution row in as a
+  "sender" (a quiet provenance-integrity defect). With HQL equality a null binds as `= NULL`,
+  which never matches, so a null first name resolves to **no sender**. This is pinned by a
+  real-Postgres repository test.
+
+- **Scope — "ambiguous" is case-insensitive only.** Both exact-case lookups (`findByAlias`,
+  `findByFirstNameAndLastName`) return `Optional`, so two **byte-identical same-case** rows would
+  still throw `NonUniqueResultException`. That is a true data anomaly, deliberately out of scope
+  (it is not a case-collision), and it surfaces as the opaque `INTERNAL_ERROR` — never a silently
+  wrong row — so it is no worse than any other unexpected error and needs no extra handling here.
+
+- **Same stance as tags otherwise:** no `unique(lower(alias))` / `unique(lower(name))` constraint
+  (collisions are valid human labels; `source_ref` is the stable identity per ADR-025), no
+  merge/dedupe, code-only and reversible, and no shared `resolveExactThenCi(...)` helper — the
+  two Person paths have different fallbacks, so the exact→CI→fallback logic is inlined at each
+  with its load-bearing comment (KISS).
+
 ## Alternatives considered

 - **A `unique(lower(name))` index** — rejected: the collisions are valid canonical nodes, and
--- a/docs/architecture/c4/l2-containers.puml
+++ b/docs/architecture/c4/l2-containers.puml
@@ -12,13 +12,15 @@ System_Boundary(archiv, "Familienarchiv (Docker Compose)") {
    Container(frontend, "Web Frontend", "SvelteKit / Node adapter / port 3000", "Server-side rendered UI. Handles auth session cookies, document search and viewer, transcription editor, annotation layer, family tree (Stammbaum), stories (Geschichten), activity feed (Chronik), enrichment workflow, and admin panel.")
    Container(backend, "API Backend", "Spring Boot 4 / Java 21 / Jetty / port 8080", "REST API. Implements document management, search, user auth, file upload/download, transcription, OCR orchestration, and SSE notifications. Trusts X-Forwarded-* headers from Caddy.")
    Container(ocr, "OCR Service", "Python FastAPI / port 8000", "Handwritten text recognition (HTR) and OCR microservice. Single-node by design — see ADR-001. Reachable only on the internal Docker network; no external port exposed.")
+    Container(ollama, "Ollama LLM Service", "ollama/ollama:0.30.6 / port 11434 (internal only)", "Local LLM inference server for NL search. Runs qwen2.5:7b-instruct-q4_K_M on CPU. Reachable only on the internal Docker network; no external port exposed. Disabled when APP_OLLAMA_BASE_URL is unset or blank.")
+    ' Named volume: ollama_models — model weights, fully reproducible, no backup needed
    ContainerDb(db, "Relational Database", "PostgreSQL 16", "Stores document metadata, persons, users, permission groups, tags, transcription blocks, audit log, and Spring Session data.")
    ContainerDb(storage, "Object Storage", "MinIO (S3-compatible)", "Stores the actual document files (PDFs, scans). Backend uses a bucket-scoped service account (archiv-app), not MinIO root.")
    Container(mc, "Bucket / Service-Account Init", "MinIO Client (mc)", "One-shot container on startup. Idempotent: creates the archive bucket, the archiv-app service account, and attaches the readwrite policy.")
 }

 System_Boundary(observability, "Observability Stack (/opt/familienarchiv/docker-compose.observability.yml)") {
-    Container(prometheus, "Prometheus", "prom/prometheus:v3.4.0", "Scrapes metrics from backend management port 8081 (/actuator/prometheus), node-exporter, and cAdvisor. Retention: 30 days.")
+    Container(prometheus, "Prometheus", "prom/prometheus:v3.4.0", "Scrapes metrics from backend (8081 /actuator/prometheus), OCR service (8000 /metrics), Ollama (11434 /metrics), node-exporter, and cAdvisor. Retention: 30 days.")
    Container(node_exporter, "Node Exporter", "prom/node-exporter:v1.9.0", "Host-level CPU, memory, disk, and network metrics.")
    Container(cadvisor, "cAdvisor", "gcr.io/cadvisor/cadvisor:v0.52.1", "Per-container resource metrics.")
    Container(loki, "Loki", "grafana/loki:3.4.2", "Stores log streams from all containers.")
@@ -45,6 +47,8 @@ Rel(promtail, loki, "Pushes log streams", "HTTP/Loki push API")
 Rel(backend, tempo, "Sends distributed traces via OTLP", "HTTP / OTLP / port 4318 (archiv-net)")
 Rel(prometheus, backend, "Scrapes JVM + HTTP metrics", "HTTP 8081 /actuator/prometheus")
 Rel(prometheus, ocr, "Scrapes OCR + http_* metrics", "HTTP 8000 /metrics")
+Rel(backend, ollama, "NL search inference requests", "HTTP / REST / JSON")
+Rel(prometheus, ollama, "Scrapes LLM request metrics", "HTTP 11434 /metrics")
 Rel(grafana, prometheus, "Queries metrics", "HTTP 9090")
 Rel(grafana, loki, "Queries logs", "HTTP 3100")
 Rel(grafana, tempo, "Queries traces", "HTTP 3200")
--- a/docs/infrastructure/production-compose.md
+++ b/docs/infrastructure/production-compose.md
@@ -20,24 +20,19 @@ The observability stack (Prometheus, Loki, Grafana, Tempo, GlitchTip) ships as a

 ---

-## VPS Sizing Recommendations
+## Server Sizing

-### Recommended: Hetzner CX32
+### Current Production Server: Hetzner Dedicated (Serverbörse)

-**Specs**: 4 vCPU, 8 GB RAM, 80 GB SSD · **Cost**: 17 EUR/mo
+**Specs**: Intel Core i7-6700 (4C/8T, 3.4 GHz), 64 GB RAM · acquired via Hetzner server auction

-Sufficient for the application stack (Postgres, MinIO, OCR with `mem_limit: 12g`, backend, frontend, Caddy) on a CX32 today. Once the observability stack lands (Prometheus/Loki/Grafana/Alertmanager add ~2 GB) consider a CX42.
+The server runs the full application stack (Postgres, MinIO, OCR with `mem_limit: 12g`, backend, frontend, Caddy) plus the observability stack with headroom to spare. With 64 GB RAM, OCR, Ollama inference, and the observability stack can all run concurrently without memory pressure.

-### When to Upgrade: Hetzner CX42
+### When to Reconsider Hardware

-**Specs**: 8 vCPU, 16 GB RAM · **Cost**: 29 EUR/mo
-
-Upgrade when:
- Observability stack adds memory pressure (Loki + Grafana with >30 days retention)
- OCR throughput needs scaling beyond a single-node Surya/Kraken setup
- Real user load profiled in Grafana shows response-time degradation
-
-Never upgrade the VPS tier before profiling — most perceived performance issues are application bugs, not resource constraints.
+- CPU is Skylake (2015) — single-threaded performance is the likely bottleneck before RAM
+- Profile with Grafana dashboards before concluding hardware is the constraint
+- Most perceived performance issues are application bugs (unindexed queries, N+1 loads), not resource limits

 ---

@@ -45,12 +40,11 @@ Never upgrade the VPS tier before profiling — most perceived performance issue

 | Service | Cost |
 |---|---|
-| Hetzner CX32 VPS | 17.00 EUR |
+| Hetzner dedicated server (Serverbörse, i7-6700, 64 GB RAM) | see invoice |
 | Hetzner DNS | 0.00 EUR |
 | Hetzner SMTP relay | ~1.00 EUR |
-| **Total** | **~18 EUR/mo** |

-MinIO data lives on the VPS disk (no Object Storage line item yet). The Hetzner OBS migration would add ~5 EUR/mo at ~200 GB.
+MinIO data lives on the server disk (no Object Storage line item yet). The Hetzner OBS migration would add ~5 EUR/mo at ~200 GB.

 Equivalent SaaS stack: 200–300 EUR/mo.

--- a/infra/observability/grafana/provisioning/dashboards/ollama.json
+++ b/infra/observability/grafana/provisioning/dashboards/ollama.json
@@ -0,0 +1,218 @@
+{
+  "id": null,
+  "uid": "ollama-dashboard",
+  "title": "Ollama",
+  "description": "Ollama inference latency and request rate",
+  "version": 1,
+  "schemaVersion": 39,
+  "tags": ["ollama", "inference"],
+  "timezone": "browser",
+  "editable": true,
+  "fiscalYearStartMonth": 0,
+  "graphTooltip": 1,
+  "links": [],
+  "liveNow": false,
+  "refresh": "30s",
+  "time": {
+    "from": "now-1h",
+    "to": "now"
+  },
+  "timepicker": {},
+  "weekStart": "",
+  "annotations": {
+    "list": [
+      {
+        "builtIn": 1,
+        "datasource": { "type": "datasource", "uid": "grafana" },
+        "enable": true,
+        "hide": true,
+        "iconColor": "rgba(0, 211, 255, 1)",
+        "name": "Annotations & Alerts",
+        "type": "dashboard"
+      }
+    ]
+  },
+  "panels": [
+    {
+      "id": 1,
+      "type": "timeseries",
+      "title": "Inference Latency p50",
+      "description": "50th percentile of Ollama request duration over a 5-minute window",
+      "gridPos": { "h": 8, "w": 8, "x": 0, "y": 0 },
+      "datasource": { "type": "prometheus", "uid": "prometheus" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "palette-classic" },
+          "custom": {
+            "axisBorderShow": false,
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
+            "insertNulls": false,
+            "lineInterpolation": "linear",
+            "lineWidth": 2,
+            "pointSize": 5,
+            "scaleDistribution": { "type": "linear" },
+            "showPoints": "auto",
+            "spanNulls": false,
+            "stacking": { "group": "A", "mode": "none" },
+            "thresholdsStyle": { "mode": "off" }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              { "color": "green", "value": null },
+              { "color": "red", "value": 80 }
+            ]
+          },
+          "unit": "s"
+        },
+        "overrides": []
+      },
+      "options": {
+        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
+        "tooltip": { "mode": "single", "sort": "none" }
+      },
+      "targets": [
+        {
+          "datasource": { "type": "prometheus", "uid": "prometheus" },
+          "editorMode": "code",
+          "expr": "histogram_quantile(0.5, rate(ollama_request_duration_seconds_bucket[5m]))",
+          "instant": false,
+          "legendFormat": "p50",
+          "range": true,
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "id": 2,
+      "type": "timeseries",
+      "title": "Inference Latency p95",
+      "description": "95th percentile of Ollama request duration over a 5-minute window",
+      "gridPos": { "h": 8, "w": 8, "x": 8, "y": 0 },
+      "datasource": { "type": "prometheus", "uid": "prometheus" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "palette-classic" },
+          "custom": {
+            "axisBorderShow": false,
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
+            "insertNulls": false,
+            "lineInterpolation": "linear",
+            "lineWidth": 2,
+            "pointSize": 5,
+            "scaleDistribution": { "type": "linear" },
+            "showPoints": "auto",
+            "spanNulls": false,
+            "stacking": { "group": "A", "mode": "none" },
+            "thresholdsStyle": { "mode": "off" }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              { "color": "green", "value": null },
+              { "color": "red", "value": 80 }
+            ]
+          },
+          "unit": "s"
+        },
+        "overrides": []
+      },
+      "options": {
+        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
+        "tooltip": { "mode": "single", "sort": "none" }
+      },
+      "targets": [
+        {
+          "datasource": { "type": "prometheus", "uid": "prometheus" },
+          "editorMode": "code",
+          "expr": "histogram_quantile(0.95, rate(ollama_request_duration_seconds_bucket[5m]))",
+          "instant": false,
+          "legendFormat": "p95",
+          "range": true,
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "id": 3,
+      "type": "timeseries",
+      "title": "Request Rate",
+      "description": "Ollama requests per second over a 5-minute window",
+      "gridPos": { "h": 8, "w": 8, "x": 16, "y": 0 },
+      "datasource": { "type": "prometheus", "uid": "prometheus" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "palette-classic" },
+          "custom": {
+            "axisBorderShow": false,
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
+            "insertNulls": false,
+            "lineInterpolation": "linear",
+            "lineWidth": 2,
+            "pointSize": 5,
+            "scaleDistribution": { "type": "linear" },
+            "showPoints": "auto",
+            "spanNulls": false,
+            "stacking": { "group": "A", "mode": "none" },
+            "thresholdsStyle": { "mode": "off" }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              { "color": "green", "value": null },
+              { "color": "red", "value": 80 }
+            ]
+          },
+          "unit": "reqps"
+        },
+        "overrides": []
+      },
+      "options": {
+        "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
+        "tooltip": { "mode": "single", "sort": "none" }
+      },
+      "targets": [
+        {
+          "datasource": { "type": "prometheus", "uid": "prometheus" },
+          "editorMode": "code",
+          "expr": "rate(ollama_requests_total[5m])",
+          "instant": false,
+          "legendFormat": "req/s",
+          "range": true,
+          "refId": "A"
+        }
+      ]
+    }
+  ],
+  "preload": false,
+  "templating": {
+    "list": []
+  }
+}
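The p50 and p95 panels in the dashboard above rely on `histogram_quantile`, which estimates a quantile from cumulative histogram buckets by linear interpolation inside the bucket that contains the target rank. The following is a simplified sketch of that estimation for intuition only. It ignores native histograms and negative bucket bounds, and the bucket values in any call are made-up inputs, not measurements from this deployment.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    The bucket list must include the (+Inf, total) bucket, as Prometheus
    histograms do. In a real query the counts are per-second rates, but the
    interpolation works the same way.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total                     # observations at or below the quantile
    prev_bound, prev_count = 0.0, 0.0    # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies beyond the last finite bucket:
                # report the highest finite upper bound.
                return buckets[-2][0]
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

One practical consequence: the estimate can never exceed the highest finite bucket bound, so if the histogram's largest bucket is smaller than real CPU inference times, the p95 panel flattens at that bound.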
--- a/infra/observability/prometheus/prometheus.yml
+++ b/infra/observability/prometheus/prometheus.yml
@@ -20,4 +20,10 @@ scrape_configs:
  - job_name: ocr-service
    metrics_path: /metrics
    static_configs:
-      - targets: ['ocr:8000']
+      - targets: ['ocr-service:8000']
+
+  - job_name: ollama
+    metrics_path: /metrics
+    static_configs:
+      # Uses the Docker service name for reliable DNS resolution.
+      - targets: ['ollama:11434']
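Because the previous `ocr:8000` target never resolved, it may be worth confirming that a scrape target actually serves the series the dashboards query. The sketch below parses a Prometheus text-format payload and reports which required metric names are absent. The two required names are taken from the dashboard queries in this PR; the helper names `metric_names` and `missing_metrics` are hypothetical, and whether the pinned Ollama image really exposes these series at `/metrics` is an assumption to verify against a live scrape.

```python
from typing import Iterable, List, Set

# Series names used by the Grafana panels in this PR.
REQUIRED = [
    "ollama_request_duration_seconds_bucket",
    "ollama_requests_total",
]


def metric_names(exposition_text: str) -> Set[str]:
    """Collect sample names from a Prometheus text-format payload."""
    names: Set[str] = set()
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(exposition_text: str, required: Iterable[str]) -> List[str]:
    """Return the required metric names that the payload does not contain."""
    return sorted(set(required) - metric_names(exposition_text))
```

To use it, fetch the payload from inside the Docker network (for example with `urllib.request.urlopen("http://ollama:11434/metrics")`) and pass the decoded body to `missing_metrics(body, REQUIRED)`; an empty list means both dashboard series are present.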
Author	SHA1	Message	Date
Marcel	7679596c70	docs(ollama): add model upgrade runbook + post-deploy smoke test to DEPLOYMENT.md Addresses Elicit's and Sara's review concerns on PR #749: - Expand §6 ollama_models section into a full model upgrade runbook (step-by-step docker volume rm + recreate, including production volume name prefix) - Add re-deploy idempotency note to §3.4 (init container exits quickly when model already present on the volume) - Add NL search smoke test to §3.4 (curl command distinguishing 200 from 503 NL_SEARCH_UNAVAILABLE) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:59:35 +02:00
Marcel	3d5dcd1f18	docs(deployment): fix OLLAMA_API_KEY version ref and add --wait warning Updated OLLAMA_API_KEY env vars table from 0.6.5 to 0.6.5 or 0.30.6 to match both tested versions. Added an explicit warning in §3.4 that docker compose up -d --wait blocks for 60–90 min on first deploy when the model pull has not yet completed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:59:35 +02:00
Marcel	52fca38f0f	docs(env): correct OLLAMA_API_KEY comment — tested on 0.6.5 and 0.30.6 Both versions were tested and neither enforces the key. Comment updated to say "0.6.5 or 0.30.6" and surface archiv-net as the sole effective control. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:59:35 +02:00
Marcel	662a8f3e80	fix(infra): interpolate APP_OLLAMA_BASE_URL so .env empty-value disables Ollama Hardcoded literal overrides any .env setting — setting APP_OLLAMA_BASE_URL= in .env had no effect on the backend container. Now uses the same pattern as APP_OCR_TRAINING_TOKEN with a safe default. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:59:35 +02:00
Marcel	cbba95c3f8	docs(c4): fix Ollama container version 0.6.5 → 0.30.6 in l2-containers.puml Diagram must match the pinned image version in docker-compose.yml. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:59:35 +02:00
Marcel	3536ed884c	docs(adr): fix ADR-028 §12 false API-key claim, stale TBD, and §7 title §12 stated OLLAMA_API_KEY guards against lateral movement — contradicts §7's empirical finding that it is not enforced. Replaced with an accurate note referencing §7. Stale pre-merge placeholder in Consequences ("Three TBD items must be resolved") removed; all three are resolved. §7 section title updated from "0.6.5" to "0.6.5 and 0.30.6" to match the body text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:59:35 +02:00
Marcel	5a939d9222	fix(infra): escape \$\$SERVE_PID in compose command to prevent interpolation (#737) Docker Compose interpolates $VAR in command strings — use $$ to pass a literal $ to the shell so SERVE_PID=$! and kill $SERVE_PID work correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:59:35 +02:00
Marcel	93e90424ab	docs(adr): update ADR-028 with 0.30.6 verified findings for API key + read_only (#737) - OLLAMA_API_KEY: non-enforcement confirmed on both 0.6.5 and 0.30.6 - read_only: true: confirmed working on both 0.6.5 and 0.30.6 - Peak RSS during pull: ~108 MiB (well under 2g limit) - All TBD placeholders resolved Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:59:35 +02:00
Marcel	e8f3004c4f	feat(infra): add Ollama env vars to .env.example (#737) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:59:35 +02:00
Marcel	9637ebbca2	feat(infra): add Ollama Docker Compose services for NL search (#737) - ollama-model-init: one-shot init container that pulls qwen2.5:7b-instruct-q4_K_M into the ollama_models volume on first start - ollama: main inference service on archiv-net (expose: only, no public port) - ollama_models named volume for persistent model storage - APP_OLLAMA_BASE_URL + APP_OLLAMA_API_KEY added to backend env - Both services: cap_drop ALL, no-new-privileges, read_only+tmpfs (ADR-019 + ADR-028) - start_period: 60s — model pre-pulled by init container Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:59:35 +02:00
Marcel	df10a42069	docs(deploy): document Ollama hardware requirements, env vars, and ops notes (#737) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:59:35 +02:00
Marcel	64120a30b5	docs(arch): add Ollama container to C4 level-2 container diagram (#737) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:58:49 +02:00
Marcel	25252fc709	feat(observability): add Grafana Ollama inference latency dashboard (#737) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:58:49 +02:00
Marcel	1f379a161d	fix(observability): fix OCR target name + add Ollama scrape job (#737) - prometheus.yml: ocr:8000 → ocr-service:8000 (Docker service name is ocr-service, not ocr — current scrape target has never resolved) - Add Ollama scrape job on ollama:11434 /metrics Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:58:49 +02:00
Marcel	c0d034c85d	docs(adr): add ADR-028 — Ollama Docker Compose service for NL search (#737) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:58:49 +02:00
Marcel	ca93cde06e	docs(infra): correct server specs — Hetzner Serverbörse i7-6700 64 GB, not CX32 Replace all references to the CX32 VPS (8 GB RAM, Hetzner Cloud) with the actual production server: a Hetzner Serverbörse dedicated server with an Intel Core i7-6700 (4C/8T, 3.4 GHz) and 64 GB RAM. Affected files: - .claude/personas/devops.md — monthly cost line + upgrade example - docs/infrastructure/production-compose.md — sizing section + cost table - docs/DEPLOYMENT.md — OCR memory table + OCR_MEM_LIMIT env var description - docs/adr/004-pdfbox-thumbnails.md — thumbnailExecutor memory ceiling note - docs/adr/021-tmpdir-persistent-volume-staging.md — OOMKill rationale in alternatives Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-06 14:51:07 +02:00
Marcel	7629e35897	docs(adr): renumber tag case-collision ADR 032 → 033 to resolve number clash (#731) Both #730 (tag case-collision) and #684 (person-delete DB integrity) landed an ADR-032 on main. Renumber the tag/case-collision one to 033 — it is referenced only from this PR's person-domain comments and its own file, so the move is self-contained and touches no Flyway migration. The person-delete ADR-032 and the V71 migration comment that cites it are deliberately left untouched (editing an applied migration would drift its Flyway checksum). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 13:52:25 +02:00
Marcel	cd741b9f57	docs(person): clarify case-collision scope at the exact-case lookups (#731) Review noted the "never throws" claim was overstated: the exact-case Optional lookups still surface a NonUniqueResultException on two byte-identical same-case rows. That is a true data anomaly out of #731's scope (ambiguous = case-insensitive) and resolves to the opaque INTERNAL_ERROR, never a wrong row. Record that boundary at both resolution points and in ADR-032 so the gap is not silently assumed covered. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 13:36:22 +02:00
Marcel	ddf378aaac	fix(person): resolve ambiguous sender names to null on upload (#731) findByName resolved via Optional<Person> findByFirstNameIgnoreCaseAndLastNameIgnoreCase, which threw NonUniqueResultException once two people shared a first+last name case-insensitively (hans müller / Hans Müller) — a 500 on the routine upload path (DocumentService.storeDocument sender resolution). findByName now resolves exact-case → single case-insensitive match → else empty. The sender path deliberately diverges from the alias path: an ambiguous name leaves the sender UNSET rather than guessing the lowest id, because correct provenance beats a confidently-wrong pre-fill a reviewer won't re-check. The two new name queries use explicit HQL equality so a null first name binds as `= NULL` (no match) instead of the derived-query fold to `first_name IS NULL`, which would widen a last-name-only row in as a sender. Pins the opaque error path (IncorrectResultSizeDataAccessException stays INTERNAL_ERROR with no Hibernate/SQL/row-count leak) and extends ADR-032 with the Person section. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 13:03:04 +02:00
Marcel	20cfe41f21	fix(person): resolve case-colliding aliases without throwing (#731) findOrCreateByAlias resolved via Optional<Person> findByAliasIgnoreCase, which throws NonUniqueResultException once two aliases collide only by case (müller / Müller) — a generic 500 on the importer path. Mirror the #730 tag fix: resolve exact-case first, then the lowest-id case-insensitive sibling, then create-when-absent (institution/group and maiden-name alias preserved). The throwing Optional<…>IgnoreCase variant is deleted so it can't be reused. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 12:50:21 +02:00