Compare commits


20 Commits

Author SHA1 Message Date
Marcel
7679596c70 docs(ollama): add model upgrade runbook + post-deploy smoke test to DEPLOYMENT.md
Some checks failed
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Semgrep Security Scan (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Unit & Component Tests (push) Successful in 3m16s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 3m37s
CI / fail2ban Regex (push) Successful in 47s
CI / Semgrep Security Scan (push) Successful in 22s
CI / Compose Bucket Idempotency (push) Successful in 1m4s
Addresses Elicit's and Sara's review concerns on PR #749:
- Expand the §6 ollama_models section into a full model upgrade runbook (step-by-step
  docker volume rm + recreate, including the production volume name prefix)
- Add a re-deploy idempotency note to §3.4 (the init container exits quickly when the
  model is already present on the volume)
- Add an NL search smoke test to §3.4 (a curl command that distinguishes 200 from 503
  NL_SEARCH_UNAVAILABLE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
3d5dcd1f18 docs(deployment): fix OLLAMA_API_KEY version ref and add --wait warning
Updated the OLLAMA_API_KEY row of the env vars table from "0.6.5" to
"0.6.5 or 0.30.6" to match both tested versions. Added an explicit warning in
§3.4 that docker compose up -d --wait can block for 60–90 min on a first deploy
while the model pull is still in progress.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
52fca38f0f docs(env): correct OLLAMA_API_KEY comment — tested on 0.6.5 and 0.30.6
Both versions were tested and neither enforces the key. Comment updated to
say "0.6.5 or 0.30.6" and surface archiv-net as the sole effective control.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
662a8f3e80 fix(infra): interpolate APP_OLLAMA_BASE_URL so .env empty-value disables Ollama
The hardcoded literal overrode any .env setting: setting APP_OLLAMA_BASE_URL=
in .env had no effect on the backend container. It now uses the same
interpolation pattern as APP_OCR_TRAINING_TOKEN, with a safe default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
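Whether an empty .env value survives interpolation depends on the default-value form. Per the Docker Compose docs, ${VAR:-default} falls back when VAR is unset or empty, while ${VAR-default} falls back only when VAR is unset. The commit does not say which form was used; the Java sketch below is a hypothetical model of those two rules (not Compose's own code), and the default URL http://ollama:11434 is an assumption taken from the ollama:11434 scrape target.

```java
import java.util.Map;

/**
 * Hypothetical model of two Docker Compose default-value forms (not Compose's own code):
 *   ${NAME:-default}  default when NAME is unset OR set to an empty string
 *   ${NAME-default}   default only when NAME is unset
 */
class ComposeDefaultDemo {

    static String resolve(String expr, Map<String, String> env) {
        if (!expr.startsWith("${") || !expr.endsWith("}")) {
            throw new IllegalArgumentException("expected ${...}: " + expr);
        }
        String body = expr.substring(2, expr.length() - 1);
        int dash = body.indexOf('-');                  // variable names cannot contain '-'
        if (dash < 0) {
            return env.getOrDefault(body, "");         // plain ${NAME}, no default
        }
        boolean colon = dash > 0 && body.charAt(dash - 1) == ':';
        String name = body.substring(0, colon ? dash - 1 : dash);
        String fallback = body.substring(dash + 1);
        String value = env.get(name);                  // null means "unset"
        if (value == null || (colon && value.isEmpty())) {
            return fallback;
        }
        return value;
    }

    public static void main(String[] args) {
        // .env contains the line "APP_OLLAMA_BASE_URL=", i.e. the variable is set but empty
        Map<String, String> env = Map.of("APP_OLLAMA_BASE_URL", "");
        System.out.println(resolve("${APP_OLLAMA_BASE_URL:-http://ollama:11434}", env));           // http://ollama:11434
        System.out.println(resolve("${APP_OLLAMA_BASE_URL-http://ollama:11434}", env).isEmpty());  // true
    }
}
```

If an empty value is meant to disable Ollama, the colon-less form (or an empty default such as ${VAR:-}) is the one that lets the empty string reach the backend.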
Marcel
cbba95c3f8 docs(c4): fix Ollama container version 0.6.5 → 0.30.6 in l2-containers.puml
Diagram must match the pinned image version in docker-compose.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
3536ed884c docs(adr): fix ADR-028 §12 false API-key claim, stale TBD, and §7 title
§12 claimed that OLLAMA_API_KEY guards against lateral movement, which contradicts
§7's empirical finding that the key is not enforced. Replaced it with an accurate
note referencing §7. Removed the stale pre-merge placeholder in Consequences ("Three
TBD items must be resolved"); all three are resolved. Updated the §7 section
title from "0.6.5" to "0.6.5 and 0.30.6" to match the body text.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
5a939d9222 fix(infra): escape $$SERVE_PID in compose command to prevent interpolation (#737)
Docker Compose interpolates $VAR in command strings, so a bare $SERVE_PID was
substituted (with an empty value) before the shell ever saw it. Use $$ to pass a
literal $ to the shell so that SERVE_PID=$! and kill $SERVE_PID work correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
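The effect of the $$ escape can be shown with a deliberately simplified interpolator. This is a hypothetical Java model, not Compose's implementation: it handles only $$, $NAME and ${NAME}, substitutes an empty string for unset names, and leaves any other $ sequence untouched (real Compose may treat those differently). The "kill ..." strings are illustrative, not the actual compose command.

```java
import java.util.Map;

class ComposeDollarDemo {

    // Hypothetical, simplified model of Compose interpolation:
    //   "$$"      -> literal "$" (escape)
    //   "$NAME"   -> value of NAME, or "" when unset
    //   "${NAME}" -> same as $NAME
    static String interpolate(String s, Map<String, String> env) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '$' || i + 1 >= s.length()) {
                out.append(c);
                i++;
                continue;
            }
            char next = s.charAt(i + 1);
            if (next == '$') {                                      // escaped dollar
                out.append('$');
                i += 2;
            } else if (next == '{') {                               // ${NAME}
                int close = s.indexOf('}', i + 2);
                if (close < 0) throw new IllegalArgumentException("unclosed ${ in: " + s);
                out.append(env.getOrDefault(s.substring(i + 2, close), ""));
                i = close + 1;
            } else if (Character.isLetter(next) || next == '_') {   // $NAME
                int end = i + 1;
                while (end < s.length()
                        && (Character.isLetterOrDigit(s.charAt(end)) || s.charAt(end) == '_')) {
                    end++;
                }
                out.append(env.getOrDefault(s.substring(i + 1, end), ""));
                i = end;
            } else {                                                // other "$x": left as-is in this sketch
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of(); // SERVE_PID is a shell variable, never set in the Compose env
        System.out.println(interpolate("kill $SERVE_PID", env));   // "kill " (variable swallowed)
        System.out.println(interpolate("kill $$SERVE_PID", env));  // "kill $SERVE_PID" (reaches the shell)
    }
}
```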
Marcel
93e90424ab docs(adr): update ADR-028 with 0.30.6 verified findings for API key + read_only (#737)
- OLLAMA_API_KEY: non-enforcement confirmed on both 0.6.5 and 0.30.6
- read_only: true: confirmed working on both 0.6.5 and 0.30.6
- Peak RSS during pull: ~108 MiB (well under the 2g memory limit)
- All TBD placeholders resolved

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
e8f3004c4f feat(infra): add Ollama env vars to .env.example (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
9637ebbca2 feat(infra): add Ollama Docker Compose services for NL search (#737)
- ollama-model-init: one-shot init container that pulls qwen2.5:7b-instruct-q4_K_M
  into the ollama_models volume on first start
- ollama: main inference service on archiv-net (expose: only, no ports:, so no host port is published)
- ollama_models: named volume for persistent model storage
- APP_OLLAMA_BASE_URL + APP_OLLAMA_API_KEY added to the backend env
- Both services: cap_drop ALL, no-new-privileges, read_only + tmpfs (ADR-019 + ADR-028)
- start_period: 60s, since the model is pre-pulled by the init container

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
df10a42069 docs(deploy): document Ollama hardware requirements, env vars, and ops notes (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
64120a30b5 docs(arch): add Ollama container to C4 level-2 container diagram (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00
Marcel
25252fc709 feat(observability): add Grafana Ollama inference latency dashboard (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00
Marcel
1f379a161d fix(observability): fix OCR target name + add Ollama scrape job (#737)
- prometheus.yml: ocr:8000 → ocr-service:8000 (the Docker service name is
  ocr-service, not ocr, so the old scrape target never resolved)
- Add an Ollama scrape job on ollama:11434 /metrics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00
Marcel
c0d034c85d docs(adr): add ADR-028 — Ollama Docker Compose service for NL search (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00
Marcel
ca93cde06e docs(infra): correct server specs — Hetzner Serverbörse i7-6700 64 GB, not CX32
All checks were successful
CI / Unit & Component Tests (push) Successful in 3m18s
CI / OCR Service Tests (push) Successful in 21s
CI / Backend Unit Tests (push) Successful in 3m46s
CI / fail2ban Regex (push) Successful in 48s
CI / Semgrep Security Scan (push) Successful in 23s
CI / Compose Bucket Idempotency (push) Successful in 1m6s
Replace all references to the CX32 VPS (8 GB RAM, Hetzner Cloud) with the
actual production server: a Hetzner Serverbörse dedicated server with an
Intel Core i7-6700 (4C/8T, 3.4 GHz) and 64 GB RAM.

Affected files:
- .claude/personas/devops.md — monthly cost line + upgrade example
- docs/infrastructure/production-compose.md — sizing section + cost table
- docs/DEPLOYMENT.md — OCR memory table + OCR_MEM_LIMIT env var description
- docs/adr/004-pdfbox-thumbnails.md — thumbnailExecutor memory ceiling note
- docs/adr/021-tmpdir-persistent-volume-staging.md — OOMKill rationale in alternatives

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:51:07 +02:00
Marcel
7629e35897 docs(adr): renumber tag case-collision ADR 032 → 033 to resolve number clash (#731)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m15s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m40s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
CI / Unit & Component Tests (push) Successful in 3m13s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 3m40s
CI / fail2ban Regex (push) Successful in 46s
CI / Semgrep Security Scan (push) Successful in 21s
CI / Compose Bucket Idempotency (push) Successful in 1m7s
Both #730 (tag case-collision) and #684 (person-delete DB integrity) landed
an ADR-032 on main. Renumber the tag/case-collision one to 033 — it is
referenced only from this PR's person-domain comments and its own file, so the
move is self-contained and touches no Flyway migration. The person-delete
ADR-032 and the V71 migration comment that cites it are deliberately left
untouched (editing an applied migration would drift its Flyway checksum).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 13:52:25 +02:00
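Why an applied migration's comment is off-limits: Flyway stores a checksum for each applied migration (CRC32 for SQL scripts) and validation fails when the file on disk no longer matches it, so even a comment edit counts as drift. The sketch below is a simplified approximation using java.util.zip.CRC32 over the script's lines; it is not Flyway's exact algorithm, and the migration text is hypothetical (not the real V71).

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class MigrationChecksumDemo {

    // Simplified stand-in for a migration checksum: CRC32 over each line's UTF-8 bytes.
    // Splitting on \R makes this sketch ignore line-ending style; Flyway's details differ.
    static long checksum(String script) {
        CRC32 crc = new CRC32();
        for (String line : script.split("\\R")) {
            crc.update(line.getBytes(StandardCharsets.UTF_8));
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Hypothetical migration body; only the comment differs between the two versions.
        String applied = "-- see ADR-032\nALTER TABLE person ADD COLUMN note TEXT;\n";
        String edited  = "-- see ADR-033\nALTER TABLE person ADD COLUMN note TEXT;\n";
        System.out.println(checksum(applied) == checksum(edited)); // false: a comment edit is drift
    }
}
```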
Marcel
cd741b9f57 docs(person): clarify case-collision scope at the exact-case lookups (#731)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m15s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m42s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
Review noted the "never throws" claim was overstated: the exact-case Optional
lookups still surface a NonUniqueResultException on two byte-identical
same-case rows. That is a true data anomaly out of #731's scope (ambiguous =
case-insensitive) and resolves to the opaque INTERNAL_ERROR, never a wrong
row. Record that boundary at both resolution points and in ADR-032 so the gap
is not silently assumed covered.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 13:36:22 +02:00
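The "at most one row" contract behind an Optional-returning lookup is what makes byte-identical duplicates fail loudly: in the real stack Hibernate throws NonUniqueResultException and Spring surfaces IncorrectResultSizeDataAccessException. The plain-Java sketch below mimics that contract with a hypothetical singleResult helper and uses IllegalStateException as a stand-in for the Spring exception: zero rows give an empty Optional, one row a value, two or more an exception rather than an arbitrary pick.

```java
import java.util.List;
import java.util.Optional;

class SingleResultDemo {

    // Hypothetical stand-in for the at-most-one-row contract of an Optional-returning query.
    static <T> Optional<T> singleResult(List<T> rows) {
        if (rows.isEmpty()) return Optional.empty();
        if (rows.size() == 1) return Optional.of(rows.get(0));
        throw new IllegalStateException(
                "query did not return a unique result: " + rows.size() + " results were returned");
    }

    public static void main(String[] args) {
        System.out.println(singleResult(List.of()));               // Optional.empty
        System.out.println(singleResult(List.of("Müller")));       // Optional[Müller]
        try {
            singleResult(List.of("Müller", "Müller"));             // byte-identical duplicate
        } catch (IllegalStateException e) {
            System.out.println("throws: " + e.getMessage());
        }
    }
}
```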
Marcel
ddf378aaac fix(person): resolve ambiguous sender names to null on upload (#731)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m18s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m38s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
findByName resolved via Optional<Person>
findByFirstNameIgnoreCaseAndLastNameIgnoreCase, which threw
NonUniqueResultException once two people shared a first+last name
case-insensitively (hans müller / Hans Müller), causing a 500 on the routine
upload path (DocumentService.storeDocument sender resolution).

findByName now resolves exact-case → single case-insensitive match → else
empty. The sender path deliberately diverges from the alias path: an
ambiguous name leaves the sender UNSET rather than guessing the lowest id,
because correct provenance beats a confidently-wrong pre-fill a reviewer
won't re-check. The two new name queries use explicit HQL equality so a null
first name binds as `= NULL` (no match) instead of the derived-query fold to
`first_name IS NULL`, which would widen a last-name-only row in as a sender.

Pins the opaque error path (IncorrectResultSizeDataAccessException stays
INTERNAL_ERROR with no Hibernate/SQL/row-count leak) and extends ADR-032 with
the Person section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 13:03:04 +02:00
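The sender resolution order described above (exact-case, then a unique case-insensitive match, else empty) can be sketched without JPA. This is an in-memory approximation, not the repository code: SimplePerson, the people list and this findByName are hypothetical, toLowerCase(Locale.ROOT) stands in for Postgres LOWER(), and sqlEquals imitates SQL's "=" in that any comparison with NULL is never true, so a null first name matches nothing.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

class SenderResolutionSketch {

    record SimplePerson(int id, String firstName, String lastName) {}

    // SQL-style "=": a comparison involving NULL is never true, so a null argument never matches.
    static boolean sqlEquals(String a, String b) {
        return a != null && b != null && a.equals(b);
    }

    // Stand-in for LOWER(); keeps null as null, like SQL.
    static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    static Optional<SimplePerson> findByName(List<SimplePerson> people, String first, String last) {
        // Step 1: exact-case match.
        List<SimplePerson> exact = people.stream()
                .filter(p -> sqlEquals(p.firstName(), first) && sqlEquals(p.lastName(), last))
                .toList();
        if (exact.size() > 1) {
            // Byte-identical duplicates: surfaced as an error, never resolved to an arbitrary row.
            throw new IllegalStateException("non-unique exact-case match");
        }
        if (exact.size() == 1) return Optional.of(exact.get(0));

        // Step 2: case-insensitive match, accepted only when it is unique.
        List<SimplePerson> folded = people.stream()
                .filter(p -> sqlEquals(lower(p.firstName()), lower(first))
                        && sqlEquals(lower(p.lastName()), lower(last)))
                .toList();
        // Step 3: zero or 2+ matches leave the sender unset instead of guessing.
        return folded.size() == 1 ? Optional.of(folded.get(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<SimplePerson> people = List.of(
                new SimplePerson(1, "Hans", "Müller"),
                new SimplePerson(2, "hans", "müller"),
                new SimplePerson(3, null, "Müller"));
        System.out.println(findByName(people, "Hans", "Müller").map(SimplePerson::id)); // Optional[1]
        System.out.println(findByName(people, "HANS", "MÜLLER"));                      // Optional.empty (ambiguous)
        System.out.println(findByName(people, null, "Müller"));                        // Optional.empty (null never matches)
    }
}
```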
Marcel
20cfe41f21 fix(person): resolve case-colliding aliases without throwing (#731)
findOrCreateByAlias resolved via Optional<Person> findByAliasIgnoreCase,
which throws NonUniqueResultException once two aliases collide only by case
(müller / Müller) — a generic 500 on the importer path. Mirror the #730 tag
fix: resolve exact-case first, then the lowest-id case-insensitive sibling,
then create-when-absent (institution/group and maiden-name alias preserved).
The throwing Optional<…>IgnoreCase variant is deleted so it can't be reused.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:50:21 +02:00
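The alias path differs from sender resolution only in its fallback: with no exact-case hit it takes the lowest-id case-insensitive sibling, and it creates a new person only when no sibling exists. A minimal in-memory sketch, assuming integer ids and a bare alias field; AliasResolutionSketch, Entry and save are hypothetical, and the real service additionally uses UUID ids, classifies the alias type and splits person names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class AliasResolutionSketch {

    record Entry(int id, String alias) {}

    private final List<Entry> rows = new ArrayList<>();
    private int nextId = 1;

    Entry save(String alias) {
        Entry e = new Entry(nextId++, alias);
        rows.add(e);
        return e;
    }

    // Exact-case first, then the lowest-id case-insensitive sibling, then create.
    Entry findOrCreateByAlias(String alias) {
        for (Entry e : rows) {
            if (e.alias().equals(alias)) return e;               // exact-case wins
        }
        String folded = alias.toLowerCase(Locale.ROOT);         // stand-in for Postgres LOWER()
        return rows.stream()
                .filter(e -> e.alias().toLowerCase(Locale.ROOT).equals(folded))
                .min(Comparator.comparingInt(Entry::id))         // deterministic tie-break
                .orElseGet(() -> save(alias));                   // create-when-absent
    }

    public static void main(String[] args) {
        AliasResolutionSketch store = new AliasResolutionSketch();
        store.save("Müller");                                           // id 1
        store.save("müller");                                           // id 2
        System.out.println(store.findOrCreateByAlias("müller").id());   // 2: exact-case match
        System.out.println(store.findOrCreateByAlias("MÜLLER").id());   // 1: lowest-id sibling
        System.out.println(store.findOrCreateByAlias("Schmidt").id());  // 3: created
    }
}
```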
13 changed files with 408 additions and 91 deletions


@@ -154,9 +154,9 @@ Schedule monthly automated restore tests. If the restore fails, the backup is wo
```
Every alert needs: description, severity, likely cause, resolution steps, escalation path.
3. **Upgrading VPS tier before profiling**
3. **Upgrading hardware before profiling**
```
# "The app feels slow" → upgrade from CX32 to CX42
# "The app feels slow" → order more RAM / a faster CPU
# Actual cause: unindexed query scanning 100k rows
```
Profile with Grafana dashboards first. Most perceived performance issues are application bugs, not resource constraints.
@@ -404,8 +404,8 @@ Hetzner Object Storage (S3-compatible, replaces MinIO in prod)
Prometheus + Loki + Alertmanager
```
### Monthly Cost: ~23 EUR
CX32 VPS (4 vCPU, 8GB RAM): 17 EUR · Object Storage (~200GB): 5 EUR · SMTP relay: ~1 EUR
### Monthly Cost: ~6 EUR (excl. server)
Hetzner dedicated server (Serverbörse, i7-6700, 64 GB RAM): see invoice · Object Storage (~200GB): 5 EUR · SMTP relay: ~1 EUR
### Reference Documentation
- Full CI workflow, Gitea vs GitHub differences: `docs/infrastructure/ci-gitea.md`


@@ -29,14 +29,36 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
// Stammbaum-Knoten: alle Personen mit family_member = true.
List<Person> findByFamilyMemberTrueOrderByLastNameAscFirstNameAsc();
// Lookup by full alias string, used during ODS mass import
Optional<Person> findByAliasIgnoreCase(String alias);
// Exact-case alias lookup — the first resolution step in findOrCreateByAlias.
// Case-colliding aliases across persons (müller / Müller) are valid human labels, NOT
// duplicates: source_ref is the stable identity (ADR-025/033), alias is editable. Do NOT
// add a unique(lower(alias)) constraint — see ADR-033.
Optional<Person> findByAlias(String alias);
// Plural case-insensitive alias lookup — the fallback step. Returns ALL case-folding
// siblings so the service can pick a deterministic one (lowest id) instead of letting a
// derived Optional<…>IgnoreCase throw NonUniqueResultException. See ADR-033.
List<Person> findAllByAliasIgnoreCase(String alias);
// Lookup by the normalizer person_id, used for idempotent canonical re-import (Phase 3).
Optional<Person> findBySourceRef(String sourceRef);
// Exact first+last name match, used for filename-based sender lookup
Optional<Person> findByFirstNameIgnoreCaseAndLastNameIgnoreCase(String firstName, String lastName);
// Exact-case first+last name match — the first step of filename-based sender resolution.
// Explicit `=` (HQL, not a derived query) so a null firstName binds as `first_name = NULL`
// — never a match — instead of the derived-query fold to `first_name IS NULL`, which would
// pull a last-name-only row in as a sender (a provenance defect). See ADR-033.
@Query("SELECT p FROM Person p WHERE p.firstName = :firstName AND p.lastName = :lastName")
Optional<Person> findByFirstNameAndLastName(@Param("firstName") String firstName,
@Param("lastName") String lastName);
// Plural case-insensitive first+last name match — lets findByName bail to empty on 2+ matches
// instead of letting a derived Optional<…>IgnoreCase throw NonUniqueResultException. Same
// null fail-closed guarantee as above: LOWER(:firstName) is NULL for a null arg, so a null
// first name resolves to no match (not first_name IS NULL widening). See ADR-033.
@Query("SELECT p FROM Person p WHERE LOWER(p.firstName) = LOWER(:firstName) "
+ "AND LOWER(p.lastName) = LOWER(:lastName)")
List<Person> findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(@Param("firstName") String firstName,
@Param("lastName") String lastName);
// --- PersonSummaryDTO with document count ---


@@ -1,5 +1,6 @@
package org.raddatz.familienarchiv.person;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
@@ -110,7 +111,19 @@ public class PersonService {
}
public Optional<Person> findByName(String firstName, String lastName) {
return personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase(firstName, lastName);
// Same scope as findOrCreateByAlias (#731): a case-collision resolves without throwing;
// two byte-identical same-case persons are an out-of-scope data anomaly the exact
// Optional below would surface as the opaque INTERNAL_ERROR, not a wrong sender.
Optional<Person> exact = personRepository.findByFirstNameAndLastName(firstName, lastName);
if (exact.isPresent()) return exact;
List<Person> caseInsensitive =
personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(firstName, lastName);
// Deliberate divergence from findOrCreateByAlias: an ambiguous filename leaves the sender
// UNSET rather than picking the lowest id. The archive's value is correct provenance — a
// confidently-wrong pre-filled "Hans Müller" is worse than an empty field, because a
// reviewer won't re-check a pre-filled value. Do NOT "consistency-clean" this into the
// lowest-id fallback. See ADR-033.
return caseInsensitive.size() == 1 ? Optional.of(caseInsensitive.get(0)) : Optional.empty();
}
/** Lookup by the normalizer person_id — used by the canonical importer for register-first matching. */
@@ -125,32 +138,45 @@ public class PersonService {
PersonType type = PersonTypeClassifier.classify(alias);
if (type == PersonType.SKIP) return null;
return personRepository.findByAliasIgnoreCase(alias).orElseGet(() -> {
if (type == PersonType.INSTITUTION || type == PersonType.GROUP) {
return personRepository.save(Person.builder()
.alias(alias)
.lastName(alias)
.personType(type)
.build());
}
// Aliases differing only by case (müller / Müller) are valid distinct persons, not
// duplicates, so a CASE-COLLISION must not throw: exact-case first, then the lowest-id
// case-insensitive sibling, then create. Mirrors the tag path — see ADR-033.
// Scope (#731): "ambiguous" means case-insensitive. Two BYTE-IDENTICAL same-case aliases
// are a true data anomaly out of scope here; the exact Optional below would surface that
// as the opaque INTERNAL_ERROR (never a wrong row), not silently pick one.
Optional<Person> exact = personRepository.findByAlias(alias);
if (exact.isPresent()) return exact.get(); // exact-case wins
List<Person> caseInsensitive = personRepository.findAllByAliasIgnoreCase(alias);
if (!caseInsensitive.isEmpty()) {
return caseInsensitive.stream().min(Comparator.comparing(Person::getId)).orElseThrow(); // deterministic tie-break — list is non-empty, never throws
}
PersonNameParser.SplitName split = PersonNameParser.split(alias);
Person person = personRepository.save(Person.builder()
// Create-when-absent: institution/group keep the full label in lastName; a person name
// is split and a maiden name (geb. …) becomes a MAIDEN_NAME alias.
if (type == PersonType.INSTITUTION || type == PersonType.GROUP) {
return personRepository.save(Person.builder()
.alias(alias)
.firstName(split.firstName())
.lastName(split.lastName())
.lastName(alias)
.personType(type)
.build());
if (split.maidenName() != null) {
int nextSortOrder = aliasRepository.findMaxSortOrder(person.getId()) + 1;
aliasRepository.save(PersonNameAlias.builder()
.person(person)
.lastName(split.maidenName())
.type(PersonNameAliasType.MAIDEN_NAME)
.sortOrder(nextSortOrder)
.build());
}
return person;
});
}
PersonNameParser.SplitName split = PersonNameParser.split(alias);
Person person = personRepository.save(Person.builder()
.alias(alias)
.firstName(split.firstName())
.lastName(split.lastName())
.build());
if (split.maidenName() != null) {
int nextSortOrder = aliasRepository.findMaxSortOrder(person.getId()) + 1;
aliasRepository.save(PersonNameAlias.builder()
.person(person)
.lastName(split.maidenName())
.type(PersonNameAliasType.MAIDEN_NAME)
.sortOrder(nextSortOrder)
.build());
}
return person;
}
/**


@@ -20,8 +20,8 @@ Features: person CRUD, name alias management, person merge (deduplication), fami
| `getById(UUID)` | document, geschichte, ocr | Fetch one person by ID |
| `getAllById(List<UUID>)` | document | Bulk fetch for sender/receiver resolution |
| `findAll(String q)` | document, dashboard | List all persons |
| `findByName(String firstName, String lastName)` | document | Typeahead search |
| `findOrCreateByAlias(String rawName)` | importing | Idempotent create during mass import; type classification happens internally |
| `findByName(String firstName, String lastName)` | document | Filename-based **sender resolution** in `storeDocument`: exact-case match → single case-insensitive match → else **empty** (ambiguous names leave the sender unset; a null first name never matches). See ADR-033. |
| `findOrCreateByAlias(String rawName)` | importing | Idempotent create during mass import; type classification happens internally. Resolves exact-case → lowest-id case-insensitive sibling → create — never throws on case-colliding aliases. See ADR-033. |
| `findAllFamilyMembers()` | dashboard | Family member list for stats |
| `findCorrespondents()` | document | Correspondent list for conversation filter |
| `count()` | dashboard | Total person count for stats |


@@ -12,6 +12,7 @@ import org.mockito.MockedStatic;
import org.mockito.junit.jupiter.MockitoExtension;
import org.slf4j.LoggerFactory;
import org.springframework.dao.DataIntegrityViolationException;
import org.springframework.dao.IncorrectResultSizeDataAccessException;
import org.springframework.http.ResponseEntity;
import static org.assertj.core.api.Assertions.assertThat;
@@ -37,6 +38,30 @@ class GlobalExceptionHandlerTest {
}
}
@Test
void handleGeneric_incorrectResultSize_staysOpaque_noHibernateOrRowCountLeak() {
// #731: before the fix, a case-colliding alias/name made Hibernate throw
// NonUniqueResultException → IncorrectResultSizeDataAccessException, which has no
// dedicated handler and falls through to handleGeneric. The fix removes the throw, but
// this pins the handler: a stray one must stay opaque — no Hibernate class name, no SQL,
// no "2 results were returned" row count reaching the client (CWE-209).
IncorrectResultSizeDataAccessException ex = new IncorrectResultSizeDataAccessException(
"query did not return a unique result: 2 results were returned", 1, 2);
try (MockedStatic<Sentry> sentryMock = mockStatic(Sentry.class)) {
ResponseEntity<GlobalExceptionHandler.ErrorResponse> response = handler.handleGeneric(ex);
assertThat(response.getStatusCode().value()).isEqualTo(500);
assertThat(response.getBody()).isNotNull();
assertThat(response.getBody().code()).isEqualTo(ErrorCode.INTERNAL_ERROR);
assertThat(response.getBody().message())
.isEqualTo("An unexpected error occurred")
.doesNotContain("results were returned")
.doesNotContain("NonUnique")
.doesNotContain("IncorrectResultSize");
}
}
@Test
void handleDataIntegrityViolation_returns400_withoutLeakingConstraint_orSentry() {
// A DataIntegrityViolationException carries the constraint name + SQL in its message;


@@ -121,37 +121,60 @@ class PersonRepositoryTest {
.containsExactly("Anna", "Clara");
}
// ─── findByAliasIgnoreCase ────────────────────────────────────────────────
// ─── findByAlias (exact) / findAllByAliasIgnoreCase (case-folding siblings) ───
@Test
void findByAliasIgnoreCase_returnsMatchingPerson() {
void findByAlias_returnsExactCaseMatchOnly() {
personRepository.save(Person.builder()
.firstName("Karl").lastName("Brandt").alias("Opa Karl").build());
Optional<Person> found = personRepository.findByAliasIgnoreCase("opa karl");
assertThat(found).isPresent();
assertThat(found.get().getFirstName()).isEqualTo("Karl");
assertThat(personRepository.findByAlias("Opa Karl")).isPresent();
assertThat(personRepository.findByAlias("opa karl")).isEmpty(); // exact-case: a folded form does NOT match
}
@Test
void findByAliasIgnoreCase_returnsEmpty_whenAliasDoesNotMatch() {
Optional<Person> found = personRepository.findByAliasIgnoreCase("nobody");
assertThat(found).isEmpty();
void findAllByAliasIgnoreCase_returnsEmpty_whenAliasDoesNotMatch() {
assertThat(personRepository.findAllByAliasIgnoreCase("nobody")).isEmpty();
}
// ─── findByFirstNameIgnoreCaseAndLastNameIgnoreCase ───────────────────────
@Test
void findAllByAliasIgnoreCase_foldsUmlautCase_inRealPostgres() {
// Proves Postgres LOWER() folds ü the same way for both rows — a plain-ASCII probe would
// stay green even if umlaut folding regressed. Both case-colliding aliases must match.
personRepository.save(Person.builder().lastName("Müller").alias("Müller").build());
personRepository.save(Person.builder().lastName("müller").alias("müller").build());
assertThat(personRepository.findAllByAliasIgnoreCase("MÜLLER")).hasSize(2);
}
// ─── findByFirstNameAndLastName (exact) / findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase ───
@Test
void findByFirstNameIgnoreCaseAndLastNameIgnoreCase_returnsMatch() {
void findByFirstNameAndLastName_returnsExactCaseMatchOnly() {
personRepository.save(Person.builder().firstName("Maria").lastName("Raddatz").build());
Optional<Person> found = personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase(
"maria", "raddatz");
assertThat(personRepository.findByFirstNameAndLastName("Maria", "Raddatz")).isPresent();
assertThat(personRepository.findByFirstNameAndLastName("maria", "raddatz")).isEmpty(); // exact-case only
}
assertThat(found).isPresent();
assertThat(found.get().getFirstName()).isEqualTo("Maria");
@Test
void findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase_foldsUmlautCase_inRealPostgres() {
personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
personRepository.save(Person.builder().firstName("hans").lastName("müller").build());
assertThat(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("HANS", "MÜLLER"))
.hasSize(2);
}
@Test
void findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase_nullFirstName_foldsToNoMatch() {
// Fail-closed: a last-name-only filename (null first name) must NOT widen to first_name IS
// NULL and pull in the institution/last-name-only row as a "sender". Proven on real
// Postgres because a mocked unit test cannot catch the IS NULL vs `= NULL` semantics.
personRepository.save(Person.builder().lastName("Müller").build()); // first_name NULL
assertThat(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(null, "Müller"))
.isEmpty();
}
// ─── findCorrespondents ───────────────────────────────────────────────────
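The umlaut tests guard against a regression that an ASCII probe cannot see: case folding of non-ASCII letters depends on the implementation (PostgreSQL's lower() follows the database's locale rules; Java's toLowerCase depends on the Locale passed). The sketch below contrasts full Unicode folding with a hypothetical ASCII-only fold; asciiLower is illustrative, not code from this project.

```java
import java.util.Locale;

class UmlautFoldingDemo {

    // Hypothetical ASCII-only fold: lowercases A-Z and leaves every other character unchanged.
    static String asciiLower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            sb.append(c >= 'A' && c <= 'Z' ? (char) (c + ('a' - 'A')) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String stored = "müller";
        System.out.println("MÜLLER".toLowerCase(Locale.ROOT).equals(stored)); // true
        System.out.println(asciiLower("MÜLLER").equals(stored));             // false: Ü left unfolded
        System.out.println(asciiLower("MULLER").equals("muller"));           // true: an ASCII probe passes either way
    }
}
```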


@@ -4,6 +4,7 @@ import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentRepository;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonType;
@@ -16,10 +17,13 @@ import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.transaction.annotation.Transactional;
import software.amazon.awssdk.services.s3.S3Client;
import org.springframework.mock.web.MockMultipartFile;
import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;
import java.util.Set;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
@@ -33,6 +37,7 @@ class PersonServiceIntegrationTest {
@Autowired PersonService personService;
@Autowired PersonRepository personRepository;
@Autowired DocumentRepository documentRepository;
@Autowired DocumentService documentService;
@PersistenceContext EntityManager entityManager;
@@ -75,6 +80,93 @@ class PersonServiceIntegrationTest {
assertThat(result.getLastName()).isEqualTo("Cram");
}
// ─── #731: case-colliding alias resolution against real Postgres ───────────
// The umlaut pair is mandatory — only the real DB proves Postgres LOWER() folds ü; a
// plain-ASCII test would stay green while umlaut aliases regressed.
@Test
void findOrCreateByAlias_resolvesUmlautAliasCollision_toLowestId_withoutThrow() {
Person muller = personRepository.save(Person.builder().lastName("Müller").alias("Müller").build());
Person mullerLower = personRepository.save(Person.builder().lastName("müller").alias("müller").build());
UUID expected = muller.getId().compareTo(mullerLower.getId()) <= 0 ? muller.getId() : mullerLower.getId();
// No exact-case "MÜLLER" row → falls through to the case-insensitive branch with two
// candidates and must pick the lowest id, never throwing NonUniqueResultException.
Person resolved = personService.findOrCreateByAlias("MÜLLER");
assertThat(resolved.getId()).isEqualTo(expected);
}
@Test
void findOrCreateByAlias_umlautAliasCollision_isDeterministicAcrossCalls() {
personRepository.save(Person.builder().lastName("Müller").alias("Müller").build());
personRepository.save(Person.builder().lastName("müller").alias("müller").build());
Person first = personService.findOrCreateByAlias("MÜLLER");
Person second = personService.findOrCreateByAlias("MÜLLER");
assertThat(second.getId()).isEqualTo(first.getId());
}
// ─── #731: filename-based sender resolution against real Postgres ──────────
@Test
void storeDocument_resolvesSender_whenFilenameNameIsUnique() throws Exception {
Person hans = personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
Document doc = uploadNamed("1965-03-12_Müller_Hans.pdf").document();
assertThat(doc.getSender()).isNotNull();
assertThat(doc.getSender().getId()).isEqualTo(hans.getId());
}
@Test
void storeDocument_resolvesSender_onSingleCaseInsensitiveMatch() throws Exception {
Person hans = personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
// Filename folds to "hans müller"; the only stored person is "Hans Müller".
Document doc = uploadNamed("1965-03-12_müller_hans.pdf").document();
assertThat(doc.getSender()).isNotNull();
assertThat(doc.getSender().getId()).isEqualTo(hans.getId());
}
@Test
void storeDocument_leavesSenderUnset_whenFilenameNameIsAmbiguous() throws Exception {
// Two persons collide case-insensitively; the filename casing ("HANS"/"MÜLLER") matches
// neither exactly → no exact-case winner → bail to null (never an arbitrary guess), no 500.
personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
personRepository.save(Person.builder().firstName("hans").lastName("müller").build());
Document doc = uploadNamed("1965-03-12_MÜLLER_HANS.pdf").document();
assertThat(doc.getSender()).isNull();
}
@Test
void storeDocument_leavesSenderUnset_whenFilenameHasNoFirstName() throws Exception {
// A last-name-only filename never resolves to a sender (the parser yields no parsed name).
personRepository.save(Person.builder().lastName("Müller").build());
Document doc = uploadNamed("1965-03-12_Müller.pdf").document();
assertThat(doc.getSender()).isNull();
}
@Test
void findByName_nullFirstName_resolvesToEmpty_inRealPostgres() {
// Fail-closed against the real DB: a null first name must NOT widen to first_name IS NULL
// and pick up the last-name-only row.
personRepository.save(Person.builder().lastName("Müller").build()); // first_name NULL
assertThat(personService.findByName(null, "Müller")).isEmpty();
}
private DocumentService.StoreResult uploadNamed(String filename) throws Exception {
MockMultipartFile file = new MockMultipartFile("file", filename, "application/pdf", new byte[]{1, 2, 3});
return documentService.storeDocument(file, null);
}
// ─── #667: confirm round-trip + reader-default semantics ──────────────────
@Test


@@ -375,14 +375,57 @@ class PersonServiceTest {
// ─── findOrCreateByAlias ─────────────────────────────────────────────────
@Test
void findOrCreateByAlias_returnsExisting_whenAliasFound() {
String alias = "Walter de Gruyter";
Person existing = Person.builder().id(UUID.randomUUID()).alias(alias).build();
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.of(existing));
void findOrCreateByAlias_returnsExactCaseMatch_overCaseInsensitiveSibling() {
String alias = "müller";
Person exact = Person.builder().id(UUID.randomUUID()).alias("müller").build();
when(personRepository.findByAlias(alias)).thenReturn(Optional.of(exact));
Person result = personService.findOrCreateByAlias(alias);
assertThat(result).isEqualTo(existing);
assertThat(result).isEqualTo(exact);
verify(personRepository, never()).findAllByAliasIgnoreCase(any());
verify(personRepository, never()).save(any());
}
@Test
void findOrCreateByAlias_returnsExactCaseMatch_evenWhenMultipleSiblingsCollide() {
String alias = "Müller";
Person exact = Person.builder().id(UUID.randomUUID()).alias("Müller").build();
when(personRepository.findByAlias(alias)).thenReturn(Optional.of(exact));
Person result = personService.findOrCreateByAlias(alias);
assertThat(result).isEqualTo(exact);
// exact-case short-circuits — the case-insensitive siblings are never consulted.
verify(personRepository, never()).findAllByAliasIgnoreCase(any());
}
@Test
void findOrCreateByAlias_usesSingleCaseInsensitiveMatch_whenNoExactCase() {
String alias = "müller";
Person only = Person.builder().id(UUID.randomUUID()).alias("Müller").build();
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of(only));
Person result = personService.findOrCreateByAlias(alias);
assertThat(result).isEqualTo(only);
verify(personRepository, never()).save(any());
}
@Test
void findOrCreateByAlias_returnsLowestIdDeterministically_whenMultipleCaseInsensitiveMatches() {
String alias = "müller";
Person lower = Person.builder().id(UUID.fromString("00000000-0000-0000-0000-000000000001")).alias("Müller").build();
Person higher = Person.builder().id(UUID.fromString("00000000-0000-0000-0000-000000000002")).alias("müller").build();
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of(higher, lower)); // unordered
Person first = personService.findOrCreateByAlias(alias);
Person second = personService.findOrCreateByAlias(alias);
assertThat(first.getId()).isEqualTo(lower.getId()); // lowest id wins
assertThat(second.getId()).isEqualTo(first.getId()); // same result every call — never throws
verify(personRepository, never()).save(any());
}
@@ -390,7 +433,8 @@ class PersonServiceTest {
void findOrCreateByAlias_createsNew_whenAliasNotFound() {
String alias = "Clara Cram";
Person saved = Person.builder().id(UUID.randomUUID()).alias(alias).firstName("Clara").lastName("Cram").build();
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
when(personRepository.save(any())).thenReturn(saved);
Person result = personService.findOrCreateByAlias(alias);
@@ -403,7 +447,8 @@ class PersonServiceTest {
void findOrCreateByAlias_createsMaidenNameAlias_whenGebPresent() {
String alias = "Clara Cram geb. de Gruyter";
Person saved = Person.builder().id(UUID.randomUUID()).alias(alias).firstName("Clara").lastName("Cram").build();
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
when(personRepository.save(any())).thenReturn(saved);
when(aliasRepository.findMaxSortOrder(saved.getId())).thenReturn(0);
when(aliasRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
@@ -425,7 +470,8 @@ class PersonServiceTest {
@Test
void findOrCreateByAlias_setsInstitutionType_withFullNameInLastName() {
String alias = "Arthur Collignon GmbH";
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
when(personRepository.save(any())).thenAnswer(inv -> {
Person p = inv.getArgument(0);
p.setId(UUID.randomUUID());
@@ -442,7 +488,8 @@ class PersonServiceTest {
@Test
void findOrCreateByAlias_setsGroupType_withFullNameInLastName() {
String alias = "Geschwister de Gruyter";
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
when(personRepository.save(any())).thenAnswer(inv -> {
Person p = inv.getArgument(0);
p.setId(UUID.randomUUID());
@@ -460,7 +507,8 @@ class PersonServiceTest {
void findOrCreateByAlias_noAlias_whenNoGeb() {
String alias = "Clara Cram";
Person saved = Person.builder().id(UUID.randomUUID()).alias(alias).firstName("Clara").lastName("Cram").build();
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
when(personRepository.save(any())).thenReturn(saved);
personService.findOrCreateByAlias(alias);
@@ -472,11 +520,54 @@ class PersonServiceTest {
void findOrCreateByAlias_trimsInput() {
String alias = " Clara Cram ";
Person saved = Person.builder().id(UUID.randomUUID()).alias("Clara Cram").build();
when(personRepository.findByAliasIgnoreCase("Clara Cram")).thenReturn(Optional.of(saved));
when(personRepository.findByAlias("Clara Cram")).thenReturn(Optional.of(saved));
personService.findOrCreateByAlias(alias);
verify(personRepository).findByAliasIgnoreCase("Clara Cram");
verify(personRepository).findByAlias("Clara Cram");
}
// ─── findByName (filename-based sender resolution) ────────────────────────
@Test
void findByName_returnsExactCaseMatch_overCaseInsensitiveSibling() {
Person exact = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
when(personRepository.findByFirstNameAndLastName("Hans", "Müller")).thenReturn(Optional.of(exact));
assertThat(personService.findByName("Hans", "Müller")).contains(exact);
verify(personRepository, never()).findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(any(), any());
}
@Test
void findByName_usesSingleCaseInsensitiveMatch_whenNoExactCase() {
Person only = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
when(personRepository.findByFirstNameAndLastName("hans", "müller")).thenReturn(Optional.empty());
when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("hans", "müller"))
.thenReturn(List.of(only));
assertThat(personService.findByName("hans", "müller")).contains(only);
}
@Test
void findByName_bailsToEmpty_whenTwoOrMoreCaseInsensitiveMatches() {
Person a = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
Person b = Person.builder().id(UUID.randomUUID()).firstName("hans").lastName("müller").build();
when(personRepository.findByFirstNameAndLastName("hans", "müller")).thenReturn(Optional.empty());
when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("hans", "müller"))
.thenReturn(List.of(a, b));
// Ambiguous sender → unset, never an arbitrary guess (provenance correctness over a
// confidently-wrong pre-fill). This is the deliberate divergence from the alias path.
assertThat(personService.findByName("hans", "müller")).isEmpty();
}
@Test
void findByName_returnsEmpty_whenFirstNameNullFoldsToNoMatch() {
when(personRepository.findByFirstNameAndLastName(null, "Müller")).thenReturn(Optional.empty());
when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(null, "Müller"))
.thenReturn(List.of());
assertThat(personService.findByName(null, "Müller")).isEmpty();
}
// ─── updatePerson (notes) ────────────────────────────────────────────────

@@ -52,13 +52,14 @@ The OCR service requires significant RAM for model loading. The dev compose sets
| Production target | RAM | Recommended OCR limit | NL Search | Notes |
|---|---|---|---|---|
| Hetzner CX42 | 16 GB | 12 GB | Supported (Ollama 8 GB + OCR 6 GB active ≈ 14 GB) | Recommended for OCR-enabled production |
| Hetzner CX32 | 8 GB | 6 GB | Disabled — set `APP_OLLAMA_BASE_URL=` (empty) | Accept reduced batch sizes and slower throughput |
| Hetzner CX22 | 4 GB | — | Unsupported | Disable the OCR service (`profiles: [ocr]`); run OCR on demand only |
| Current server (Hetzner Serverbörse, i7-6700) | 64 GB | 12 GB | Supported | Default `mem_limit: 12g` works comfortably; plenty of headroom for Ollama |
| ≥ 16 GB RAM | 16+ GB | 12 GB | Supported | Default works |
| 8 GB RAM | 8 GB | 6 GB | Disabled — set `APP_OLLAMA_BASE_URL=` (empty) | Set `OCR_MEM_LIMIT=6g`; accept reduced batch sizes |
| 4 GB RAM | 4 GB | — | Unsupported | Disable OCR service (`profiles: [ocr]`); run OCR on demand only |
A CX32 cannot honour the default `mem_limit: 12g` — set the `OCR_MEM_LIMIT=6g` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow) before deploying on a CX32. The prod compose interpolates this var with a 12g default.
On servers with less than 16 GB RAM the default `mem_limit: 12g` cannot be honoured — set the `OCR_MEM_LIMIT` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow). The prod compose interpolates this var with a 12g default.
> **Memory budget (CX42):** OCR (~6 GB active) + Ollama (~8 GB) = ~14 GB. Do not run `docker-compose.observability.yml` continuously alongside both services on a CX42.
> **Memory budget:** OCR (~6 GB active) + Ollama (~8 GB) = ~14 GB. On servers with less than 16 GB RAM, do not run `docker-compose.observability.yml` continuously alongside both OCR and Ollama.
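The sizing table and the memory-budget note amount to a small decision rule. The sketch below is hypothetical: `OcrSizingSketch`, `ocrMemLimit`, `nlSearchSupported` and `headroomGb` are illustrative names, not project code. It only encodes the table rows, and it assumes that hosts falling between the listed sizes follow the next lower row.

```java
// Hypothetical helper encoding the sizing table; names are illustrative, not project code.
class OcrSizingSketch {

    // Approximate active footprints quoted in the memory-budget note.
    static final int OCR_ACTIVE_GB = 6;
    static final int OLLAMA_GB = 8;

    /** Value for OCR_MEM_LIMIT, or null when the OCR service should be disabled (profiles: [ocr]). */
    static String ocrMemLimit(int hostRamGb) {
        if (hostRamGb >= 16) return "12g"; // prod compose default; the var can stay unset
        if (hostRamGb >= 8) return "6g";   // set OCR_MEM_LIMIT=6g explicitly
        return null;                       // 4 GB hosts: run OCR on demand only
    }

    /** NL search (Ollama) is enabled only on hosts with at least 16 GB, per the table. */
    static boolean nlSearchSupported(int hostRamGb) {
        return hostRamGb >= 16;
    }

    /** RAM left for everything else while OCR and Ollama are both active. */
    static int headroomGb(int hostRamGb) {
        return hostRamGb - (OCR_ACTIVE_GB + OLLAMA_GB);
    }
}
```

On the current 64 GB server this yields the `12g` default with about 50 GB left over. On a 16 GB host, the same arithmetic leaves only about 2 GB for the rest of the stack.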
### Dev vs production differences
@@ -142,7 +143,7 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `ALLOWED_PDF_HOSTS` | SSRF protection — comma-separated list of allowed PDF source hosts. **Do not widen to `*`** | `minio,localhost,127.0.0.1` | YES | — |
| `KRAKEN_MODEL_PATH` | Directory containing Kraken HTR models (populated by `download-kraken-models.sh`) | `/app/models/` | — | — |
| `BLLA_MODEL_PATH` | Kraken baseline layout analysis model path | `/app/models/blla.mlmodel` | — | — |
| `OCR_MEM_LIMIT` | Container memory cap for ocr-service in `docker-compose.prod.yml`. Set to `6g` on CX32 hosts; leave unset on CX42+ to use the 12g default | `12g` (prod compose default) | — | — |
| `OCR_MEM_LIMIT` | Container memory cap for ocr-service in `docker-compose.prod.yml`. Set to `6g` on servers with 8 GB RAM; leave unset (12g default) on servers with ≥ 16 GB RAM | `12g` (prod compose default) | — | — |
| `XDG_CACHE_HOME` | XDG cache base dir — redirects Matplotlib and other XDG-aware libraries away from the read-only `HOME` (`/home/ocr`) to the writable cache volume | `/app/cache` | — | — |
| `TORCH_HOME` | PyTorch model cache — redirects `~/.cache/torch` to the writable models volume | `/app/models/torch` | — | — |

@@ -35,7 +35,7 @@ Render thumbnails in-process in Spring Boot using **Apache PDFBox 3.0.4** (alrea
**Harder:**
- PDFBox is a parser attack surface. Mitigated by a 30-second watchdog timeout in `ThumbnailAsyncRunner` and by the fire-and-forget contract (failures never break upload).
- Memory ceiling: the `thumbnailExecutor` is capped at 2 threads on the CX32 (8 GB). A busy backfill alongside OCR can approach the 3 GB heap — acceptable but not comfortable. Streaming via `FileService.downloadFileStream` keeps this bounded for PDFs up to 50 MB.
- Memory ceiling: the `thumbnailExecutor` is capped at 2 threads on memory-constrained hosts. A busy backfill alongside OCR can approach the 3 GB heap on an 8 GB server — acceptable but not comfortable. The current production server (64 GB) has ample headroom. Streaming via `FileService.downloadFileStream` keeps this bounded for PDFs up to 50 MB.
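The bounded executor, the watchdog timeout and the fire-and-forget contract can be sketched with plain `java.util.concurrent`. This is a hypothetical stand-in, not `ThumbnailAsyncRunner` or the project's `thumbnailExecutor`. The class and method names are illustrative, the render is an arbitrary `Callable<byte[]>` rather than a PDFBox call, and the timeout is passed to the constructor (30 seconds in the text above).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not the project's ThumbnailAsyncRunner.
class ThumbnailRunnerSketch implements AutoCloseable {
    // At most two renders run at once, mirroring the 2-thread cap.
    private final ExecutorService renderPool = Executors.newFixedThreadPool(2);
    // Separate pool for the tasks that wait on the watchdog, so callers never block.
    private final ExecutorService watchdogPool = Executors.newCachedThreadPool();
    private final long timeoutMillis;

    ThumbnailRunnerSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Runs one render under the watchdog; returns false on timeout or failure and never throws. */
    boolean renderWithWatchdog(Callable<byte[]> render) {
        Future<byte[]> job = renderPool.submit(render);
        try {
            job.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            job.cancel(true); // sends an interrupt; code that ignores interrupts keeps running
            return false;
        } catch (ExecutionException e) {
            return false;     // e.g. a malformed PDF: the thumbnail is skipped, the upload is unaffected
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Fire-and-forget entry point: the upload path may ignore the returned future entirely. */
    CompletableFuture<Boolean> fireAndForget(Callable<byte[]> render) {
        return CompletableFuture.supplyAsync(() -> renderWithWatchdog(render), watchdogPool);
    }

    @Override
    public void close() {
        renderPool.shutdownNow();
        watchdogPool.shutdownNow();
    }
}
```

An upload path would construct it once with `new ThumbnailRunnerSketch(30_000)` and call `fireAndForget(...)` without joining. Note that `Future.cancel(true)` only interrupts the worker thread. A parser stuck in a CPU-bound loop that never checks for interruption keeps its pool thread until it finishes.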
### Operational caveats (intentional)

@@ -62,7 +62,7 @@ The `/tmp` tmpfs remains at 512 MB and continues to serve training-ZIP extractio
## Alternatives considered
**Approach B — Enlarge `/tmp` to 4 GB**
One-line change. Discarded because: (1) 4 GB tmpfs counts against the cgroup `mem_limit`; on CX32 hosts with `OCR_MEM_LIMIT=6g` the combined Surya resident set + tmpfs would trigger OOMKill on cold start; (2) staging GB-scale model files through RAM is using the wrong storage tier; (3) any future model larger than 4 GB requires another bump.
One-line change. Discarded because: (1) 4 GB tmpfs counts against the cgroup `mem_limit`; on servers with `OCR_MEM_LIMIT=6g` the combined Surya resident set + tmpfs would trigger OOMKill on cold start; (2) staging GB-scale model files through RAM is using the wrong storage tier; (3) any future model larger than 4 GB requires another bump.
**Approach C — Both TMPDIR redirect and enlarged /tmp**
Belt-and-suspenders: Approach A + 1 GB tmpfs. Discarded in favour of the cleaner Approach A. The defence-in-depth benefit does not outweigh the extra compose churn; the 512 MB cap on `/tmp` is intentional.

@@ -1,4 +1,4 @@
# ADR-032 — Tag-name resolution tolerates case-collisions: exact-case first, then a deterministic lowest-id fallback, and never a `unique(lower(name))` constraint
# ADR-033 — Tag-name resolution tolerates case-collisions: exact-case first, then a deterministic lowest-id fallback, and never a `unique(lower(name))` constraint
**Date:** 2026-06-06
**Status:** Accepted
@@ -82,15 +82,58 @@ added later.
`IncorrectResultSizeDataAccessException`, and `GlobalExceptionHandler`'s generic handler maps
any stray one to `INTERNAL_ERROR` with no Hibernate/SQL leak — so no dedicated handler was
added.
- **The sibling Person path is unfixed but tracked.** `PersonService.findOrCreateByAlias`
(`findByAliasIgnoreCase`) and `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` carry the same
latent `Optional`-non-unique throw on user-influenced names; deferred to #731 rather than
widened into this fix.
- **The sibling Person path is fixed the same way — see the Person extension below (#731).**
- Postgres `LOWER()` folding of umlauts (`ü`/`ä`) is the actual correctness hinge of the
fallback and cannot be proven by a mocked repo, so it is pinned by a Testcontainers
`postgres:16-alpine` test on a `Glückwünsche`/`glückwünsche` pair; a plain-ASCII test would
stay green while the bug reappeared for umlaut tags.
## Person extension (#731)
The Person domain carried the same latent throw on **two** user-influenced lookup surfaces, and
is fixed with the same exact-case-first, non-throwing pattern — but with a deliberately
**different fallback per surface**, because the two paths have different consequences.
- **Alias path — `PersonService.findOrCreateByAlias` — deterministic lowest-id (mirrors tag).**
`findByAliasIgnoreCase` (`Optional`) is replaced by `findByAlias` (exact) → `findAllByAliasIgnoreCase`
(plural, lowest id) → the existing create-when-absent branch (INSTITUTION/GROUP and the
maiden-name alias are preserved verbatim). There is no human in the importer loop and the path
creates-on-absent anyway, so a deterministic guess is the right behaviour — exactly like tags.
- **Name/sender path — `PersonService.findByName` — bail to null on ambiguity (the new wrinkle).**
Used only by `DocumentService.storeDocument` to resolve the upload **sender** from the parsed
filename. `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` (`Optional`) is replaced by
`findByFirstNameAndLastName` (exact) → `findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase`
(plural). Resolution returns the exact-case match, else the single case-insensitive match, else
— on **two or more** matches — **empty**. The sender is left unset rather than guessing.
**Why this diverges from the alias (and tag) decision:** the archive's value is correct
provenance. A confidently-wrong pre-filled `Hans Müller` is worse than an empty field, because a
senior reviewer will not re-check a value that is already filled in, whereas an empty sender
routes the document into the "needs completion" state (`metadataComplete=false`) for a human to
assign. The load-bearing comment at `findByName` records this so a future "consistency cleanup"
does not reintroduce the confidently-wrong-sender bug by switching it to lowest-id.
- **Fail-closed on a null first name.** A parsed filename can lack a first name. The two new name
methods use explicit HQL equality (`= :firstName`) rather than a derived
`…IgnoreCase` query, because Spring Data folds a null derived-query argument to `first_name IS
NULL` — which would silently widen the match and pull a last-name-only / institution row in as a
"sender" (a quiet provenance-integrity defect). With HQL equality a null binds as `= NULL`,
which never matches, so a null first name resolves to **no sender**. This is pinned by a
real-Postgres repository test.
- **Scope — "ambiguous" is case-insensitive only.** Both exact-case lookups (`findByAlias`,
`findByFirstNameAndLastName`) return `Optional`, so two **byte-identical same-case** rows would
still throw `NonUniqueResultException`. That is a true data anomaly, deliberately out of scope
(it is not a case-collision), and it surfaces as the opaque `INTERNAL_ERROR` — never a silently
wrong row — so it is no worse than any other unexpected error and needs no extra handling here.
- **Same stance as tags otherwise:** no `unique(lower(alias))` / `unique(lower(name))` constraint
(collisions are valid human labels; `source_ref` is the stable identity per ADR-025), no
merge/dedupe, code-only and reversible, and no shared `resolveExactThenCi(...)` helper — the
two Person paths have different fallbacks, so the exact→CI→fallback logic is inlined at each
with its load-bearing comment (KISS).
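The two Person resolution strategies and the fail-closed null handling can be compared side by side in a small in-memory model. This is a hypothetical sketch, not `PersonService` or its repository. `PersonResolutionSketch`, `resolveAlias`, `resolveSender` and the SQL-style equality helpers are illustrative names. The create-when-absent branch of `findOrCreateByAlias` is omitted, and `toLowerCase(Locale.ROOT)` stands in for Postgres `LOWER()`. Java's `UUID.compareTo` compares signed halves, so "lowest id" here may not match database uuid ordering for every id; the sketch relies only on the choice being deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.UUID;

// Hypothetical in-memory model of the two lookup surfaces; not the real PersonService.
class PersonResolutionSketch {

    record Person(UUID id, String firstName, String lastName, String alias) {}

    private final List<Person> rows = new ArrayList<>();

    void save(Person p) { rows.add(p); }

    // SQL-style equality: a null on either side never matches (models "= :firstName" binding NULL).
    private static boolean sqlEquals(String column, String param) {
        return column != null && param != null && column.equals(param);
    }

    private static boolean sqlEqualsIgnoreCase(String column, String param) {
        return column != null && param != null
                && column.toLowerCase(Locale.ROOT).equals(param.toLowerCase(Locale.ROOT));
    }

    /** Alias path: exact case first, then the lowest id among case-insensitive matches. */
    Optional<Person> resolveAlias(String alias) {
        String trimmed = alias.trim();
        Optional<Person> exact = rows.stream().filter(p -> sqlEquals(p.alias(), trimmed)).findFirst();
        if (exact.isPresent()) return exact;
        return rows.stream()
                .filter(p -> sqlEqualsIgnoreCase(p.alias(), trimmed))
                .min(Comparator.comparing(Person::id)); // same answer regardless of row order
    }

    /** Sender path: exact case first, then a single case-insensitive match, else empty. */
    Optional<Person> resolveSender(String firstName, String lastName) {
        Optional<Person> exact = rows.stream()
                .filter(p -> sqlEquals(p.firstName(), firstName) && sqlEquals(p.lastName(), lastName))
                .findFirst();
        if (exact.isPresent()) return exact;
        List<Person> ci = rows.stream()
                .filter(p -> sqlEqualsIgnoreCase(p.firstName(), firstName)
                        && sqlEqualsIgnoreCase(p.lastName(), lastName))
                .toList();
        return ci.size() == 1 ? Optional.of(ci.get(0)) : Optional.empty(); // ambiguous: leave sender unset
    }
}
```

With two case-colliding rows, `resolveAlias` always picks the same one while `resolveSender` returns empty, and a null first name never matches a last-name-only row.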
## Alternatives considered
- **A `unique(lower(name))` index** — rejected: the collisions are valid canonical nodes, and

@@ -20,24 +20,19 @@ The observability stack (Prometheus, Loki, Grafana, Tempo, GlitchTip) ships as a
---
## VPS Sizing Recommendations
## Server Sizing
### Recommended: Hetzner CX32
### Current Production Server: Hetzner Dedicated (Serverbörse)
**Specs**: 4 vCPU, 8 GB RAM, 80 GB SSD · **Cost**: 17 EUR/mo
**Specs**: Intel Core i7-6700 (4C/8T, 3.4 GHz), 64 GB RAM · acquired via Hetzner server auction
Sufficient for the application stack (Postgres, MinIO, OCR with `mem_limit: 12g`, backend, frontend, Caddy) on a CX32 today. Once the observability stack lands (Prometheus/Loki/Grafana/Alertmanager add ~2 GB) consider a CX42.
Comfortably handles the full application stack (Postgres, MinIO, OCR with `mem_limit: 12g`, backend, frontend, Caddy, full observability stack) with headroom to spare. The 64 GB RAM means OCR, Ollama inference, and the observability stack can all run concurrently without memory pressure.
### When to Upgrade: Hetzner CX42
### When to Reconsider Hardware
**Specs**: 8 vCPU, 16 GB RAM · **Cost**: 29 EUR/mo
Upgrade when:
- Observability stack adds memory pressure (Loki + Grafana with >30 days retention)
- OCR throughput needs scaling beyond a single-node Surya/Kraken setup
- Real user load profiled in Grafana shows response-time degradation
Never upgrade the VPS tier before profiling — most perceived performance issues are application bugs, not resource constraints.
- CPU is Skylake (2015) — single-threaded performance is the likely bottleneck before RAM
- Profile with Grafana dashboards before concluding hardware is the constraint
- Most perceived performance issues are application bugs (unindexed queries, N+1 loads), not resource limits
---
@@ -45,12 +40,11 @@ Never upgrade the VPS tier before profiling — most perceived performance issue
| Service | Cost |
|---|---|
| Hetzner CX32 VPS | 17.00 EUR |
| Hetzner dedicated server (Serverbörse, i7-6700, 64 GB RAM) | see invoice |
| Hetzner DNS | 0.00 EUR |
| Hetzner SMTP relay | ~1.00 EUR |
| **Total** | **~18 EUR/mo** |
MinIO data lives on the VPS disk (no Object Storage line item yet). The Hetzner OBS migration would add ~5 EUR/mo at ~200 GB.
MinIO data lives on the server disk (no Object Storage line item yet). The Hetzner OBS migration would add ~5 EUR/mo at ~200 GB.
Equivalent SaaS stack: 200–300 EUR/mo.