Compare commits

...

73 Commits

Author SHA1 Message Date
Marcel
9a9e1c4c40 merge(search): resolve DEPLOYMENT.md conflict — keep setup + upgrade sections
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m17s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m45s
CI / fail2ban Regex (pull_request) Successful in 48s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m4s
Both the first-time model pull runbook (from this branch) and the model
upgrade procedure (from main) belong in DEPLOYMENT.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 16:47:49 +02:00
Marcel
4c620619d4 fix(search): formal Sie form in German error strings; clean up DocumentService imports
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m19s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m57s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
- error_smart_search_unavailable/rate_limited now use "Sie" (formal) to
  match the tone of all existing German error messages
- Replace inline FQNs in DocumentService.buildPersonSpec with proper
  JoinType + Predicate imports

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 16:46:40 +02:00
Marcel
44baff9c9c docs(search): update CLAUDE.md, GLOSSARY, DEPLOYMENT, and C4 diagrams
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m21s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m52s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 16:16:04 +02:00
Marcel
4634da9865 feat(search): add @Schema annotations and regenerate TypeScript API types
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 16:11:01 +02:00
Marcel
79e4a3f9db feat(search): add searchDocumentsByPersonId with Specification-based sender/receiver query
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 16:04:54 +02:00
Marcel
70e8a6e6ad feat(search): implement NlSearchController with @WebMvcTest tests (7 cases)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:58:35 +02:00
Marcel
3af1095d13 feat(search): implement NlQueryParserService with Mockito tests (23 cases)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:54:45 +02:00
Marcel
8c835e957a feat(search): implement RestClientOllamaClient with WireMock tests
Switch to the wiremock-jetty12 artifact and force the ee10 Jetty deps to 12.1.8
to resolve an incompatibility with Spring Boot 4's Jetty 12.1.8 core.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:43:49 +02:00
Marcel
fe8fcba7a7 feat(search): add NlSearchRateLimiter with Bucket4j/Caffeine
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:39:06 +02:00
Marcel
e0c80ac193 feat(search): add Ollama and rate-limit config properties
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:37:24 +02:00
Marcel
005265b5a8 feat(search): add NL search error codes and i18n strings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:36:13 +02:00
Marcel
684c6e63de feat(search): add NL search domain records and OllamaClient interfaces
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:33:56 +02:00
Marcel
e27d52b9ee docs(c4): add L3 backend search component diagram
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:32:40 +02:00
Marcel
6f5497c7bf docs(adr): ADR-028 — NL search via Ollama
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:31:53 +02:00
Marcel
e0fac783e8 feat(person): add findByDisplayNameContaining service method
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:30:30 +02:00
Marcel
202ea85a58 build(deps): add org.wiremock:wiremock 3.9.2 as test dependency
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 15:28:55 +02:00
Marcel
7679596c70 docs(ollama): add model upgrade runbook + post-deploy smoke test to DEPLOYMENT.md
Some checks failed
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Semgrep Security Scan (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Unit & Component Tests (push) Successful in 3m16s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 3m37s
CI / fail2ban Regex (push) Successful in 47s
CI / Semgrep Security Scan (push) Successful in 22s
CI / Compose Bucket Idempotency (push) Successful in 1m4s
Addresses Elicit's and Sara's review concerns on PR #749:
- Expand §6 ollama_models section into a full model upgrade runbook (step-by-step
  docker volume rm + recreate, including production volume name prefix)
- Add re-deploy idempotency note to §3.4 (init container exits quickly when model
  already present on the volume)
- Add NL search smoke test to §3.4 (curl command distinguishing 200 from 503
  NL_SEARCH_UNAVAILABLE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
3d5dcd1f18 docs(deployment): fix OLLAMA_API_KEY version ref and add --wait warning
Updated the OLLAMA_API_KEY row in the env vars table from "0.6.5" to "0.6.5 or
0.30.6" to match both tested versions. Added an explicit warning in §3.4 that
docker compose up -d --wait blocks for 60–90 min on first deploy when the
model pull has not yet completed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
52fca38f0f docs(env): correct OLLAMA_API_KEY comment — tested on 0.6.5 and 0.30.6
Both versions were tested and neither enforces the key. Comment updated to
say "0.6.5 or 0.30.6" and surface archiv-net as the sole effective control.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
662a8f3e80 fix(infra): interpolate APP_OLLAMA_BASE_URL so .env empty-value disables Ollama
The hardcoded literal overrode any .env setting: setting APP_OLLAMA_BASE_URL=
in .env had no effect on the backend container. It now uses the same pattern
as APP_OCR_TRAINING_TOKEN, with a safe default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
cbba95c3f8 docs(c4): fix Ollama container version 0.6.5 → 0.30.6 in l2-containers.puml
Diagram must match the pinned image version in docker-compose.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
3536ed884c docs(adr): fix ADR-028 §12 false API-key claim, stale TBD, and §7 title
§12 stated OLLAMA_API_KEY guards against lateral movement — contradicts
§7's empirical finding that it is not enforced. Replaced with an accurate
note referencing §7. Stale pre-merge placeholder in Consequences ("Three
TBD items must be resolved") removed; all three are resolved. §7 section
title updated from "0.6.5" to "0.6.5 and 0.30.6" to match the body text.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
5a939d9222 fix(infra): escape $$SERVE_PID in compose command to prevent interpolation (#737)
Docker Compose interpolates $VAR in command strings — use $$ to pass a
literal $ to the shell so SERVE_PID=$! and kill $SERVE_PID work correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
93e90424ab docs(adr): update ADR-028 with 0.30.6 verified findings for API key + read_only (#737)
- OLLAMA_API_KEY: non-enforcement confirmed on both 0.6.5 and 0.30.6
- read_only: true: confirmed working on both 0.6.5 and 0.30.6
- Peak RSS during pull: ~108 MiB (well under 2g limit)
- All TBD placeholders resolved

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
e8f3004c4f feat(infra): add Ollama env vars to .env.example (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
9637ebbca2 feat(infra): add Ollama Docker Compose services for NL search (#737)
- ollama-model-init: one-shot init container that pulls qwen2.5:7b-instruct-q4_K_M
  into the ollama_models volume on first start
- ollama: main inference service on archiv-net (expose: only, no public port)
- ollama_models named volume for persistent model storage
- APP_OLLAMA_BASE_URL + APP_OLLAMA_API_KEY added to backend env
- Both services: cap_drop ALL, no-new-privileges, read_only+tmpfs (ADR-019 + ADR-028)
- start_period: 60s — model pre-pulled by init container

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
df10a42069 docs(deploy): document Ollama hardware requirements, env vars, and ops notes (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:59:35 +02:00
Marcel
64120a30b5 docs(arch): add Ollama container to C4 level-2 container diagram (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00
Marcel
25252fc709 feat(observability): add Grafana Ollama inference latency dashboard (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00
Marcel
1f379a161d fix(observability): fix OCR target name + add Ollama scrape job (#737)
- prometheus.yml: ocr:8000 → ocr-service:8000 (the Docker service name is
  ocr-service, not ocr, so the old scrape target never resolved)
- Add Ollama scrape job on ollama:11434 /metrics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00
Marcel
c0d034c85d docs(adr): add ADR-028 — Ollama Docker Compose service for NL search (#737)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:58:49 +02:00
Marcel
ca93cde06e docs(infra): correct server specs — Hetzner Serverbörse i7-6700 64 GB, not CX32
All checks were successful
CI / Unit & Component Tests (push) Successful in 3m18s
CI / OCR Service Tests (push) Successful in 21s
CI / Backend Unit Tests (push) Successful in 3m46s
CI / fail2ban Regex (push) Successful in 48s
CI / Semgrep Security Scan (push) Successful in 23s
CI / Compose Bucket Idempotency (push) Successful in 1m6s
Replace all references to the CX32 VPS (8 GB RAM, Hetzner Cloud) with the
actual production server: a Hetzner Serverbörse dedicated server with an
Intel Core i7-6700 (4C/8T, 3.4 GHz) and 64 GB RAM.

Affected files:
- .claude/personas/devops.md — monthly cost line + upgrade example
- docs/infrastructure/production-compose.md — sizing section + cost table
- docs/DEPLOYMENT.md — OCR memory table + OCR_MEM_LIMIT env var description
- docs/adr/004-pdfbox-thumbnails.md — thumbnailExecutor memory ceiling note
- docs/adr/021-tmpdir-persistent-volume-staging.md — OOMKill rationale in alternatives

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 14:51:07 +02:00
Marcel
7629e35897 docs(adr): renumber tag case-collision ADR 032 → 033 to resolve number clash (#731)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m15s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m40s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
CI / Unit & Component Tests (push) Successful in 3m13s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 3m40s
CI / fail2ban Regex (push) Successful in 46s
CI / Semgrep Security Scan (push) Successful in 21s
CI / Compose Bucket Idempotency (push) Successful in 1m7s
Both #730 (tag case-collision) and #684 (person-delete DB integrity) landed
an ADR-032 on main. Renumber the tag/case-collision one to 033 — it is
referenced only from this PR's person-domain comments and its own file, so the
move is self-contained and touches no Flyway migration. The person-delete
ADR-032 and the V71 migration comment that cites it are deliberately left
untouched (editing an applied migration would drift its Flyway checksum).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 13:52:25 +02:00
Marcel
cd741b9f57 docs(person): clarify case-collision scope at the exact-case lookups (#731)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m15s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m42s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
Review noted the "never throws" claim was overstated: the exact-case Optional
lookups still surface a NonUniqueResultException on two byte-identical
same-case rows. That is a true data anomaly out of #731's scope (ambiguous =
case-insensitive) and resolves to the opaque INTERNAL_ERROR, never a wrong
row. Record that boundary at both resolution points and in ADR-032 so the gap
is not silently assumed covered.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 13:36:22 +02:00
Marcel
ddf378aaac fix(person): resolve ambiguous sender names to null on upload (#731)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m18s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m38s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
findByName resolved via Optional<Person>
findByFirstNameIgnoreCaseAndLastNameIgnoreCase, which threw
NonUniqueResultException once two people shared a first+last name case-
insensitively (hans müller / Hans Müller) — a 500 on the routine upload path
(DocumentService.storeDocument sender resolution).

findByName now resolves exact-case → single case-insensitive match → else
empty. The sender path deliberately diverges from the alias path: an
ambiguous name leaves the sender UNSET rather than guessing the lowest id,
because correct provenance beats a confidently-wrong pre-fill a reviewer
won't re-check. The two new name queries use explicit HQL equality so a null
first name binds as `= NULL` (no match) instead of the derived-query fold to
`first_name IS NULL`, which would widen a last-name-only row in as a sender.

Pins the opaque error path (IncorrectResultSizeDataAccessException stays
INTERNAL_ERROR with no Hibernate/SQL/row-count leak) and extends ADR-032 with
the Person section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 13:03:04 +02:00
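A minimal sketch of the findByName resolution order described above, assuming an in-memory list in place of the Spring Data repository. It is written in TypeScript for illustration only; the real code is Java with explicit HQL, and the `Person` shape, `findByName` signature, and `sqlEquals` helpers here are hypothetical. Null parameters never match (mirroring SQL `= NULL`), and two byte-identical exact-case rows throw, as the follow-up commit cd741b9f57 records.

```typescript
// Hypothetical in-memory model; the production lookup is a Java repository.
interface Person {
  id: number;
  firstName: string | null;
  lastName: string | null;
}

// SQL `col = :param`: NULL on either side is never a match (no IS NULL fold).
function sqlEquals(a: string | null, b: string | null): boolean {
  return a !== null && b !== null && a === b;
}

function sqlEqualsIgnoreCase(a: string | null, b: string | null): boolean {
  return a !== null && b !== null && a.toLowerCase() === b.toLowerCase();
}

// exact-case -> single case-insensitive match -> otherwise null (sender left unset).
function findByName(
  people: readonly Person[],
  firstName: string | null,
  lastName: string | null,
): Person | null {
  const exact = people.filter(
    (p) => sqlEquals(p.firstName, firstName) && sqlEquals(p.lastName, lastName),
  );
  if (exact.length > 1) {
    // Two byte-identical rows: a data anomaly, surfaced as an opaque error.
    throw new Error("non-unique exact-case match");
  }
  if (exact.length === 1) return exact[0];

  const folded = people.filter(
    (p) =>
      sqlEqualsIgnoreCase(p.firstName, firstName) &&
      sqlEqualsIgnoreCase(p.lastName, lastName),
  );
  // Ambiguous (two or more) resolves to null rather than guessing the lowest id.
  return folded.length === 1 ? folded[0] : null;
}
```

With `hans müller` and `Hans Müller` both on file, `findByName(people, "Hans", "Müller")` returns the exact-case row, while `"HANS", "MÜLLER"` matches both case-insensitively and yields null, leaving the sender unset.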
Marcel
20cfe41f21 fix(person): resolve case-colliding aliases without throwing (#731)
findOrCreateByAlias resolved via Optional<Person> findByAliasIgnoreCase,
which throws NonUniqueResultException once two aliases collide only by case
(müller / Müller) — a generic 500 on the importer path. Mirror the #730 tag
fix: resolve exact-case first, then the lowest-id case-insensitive sibling,
then create-when-absent (institution/group and maiden-name alias preserved).
The throwing Optional<…>IgnoreCase variant is deleted so it can't be reused.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:50:21 +02:00
Marcel
43601a3770 test(transcription): persist real persons for mention FK after V71 (#684)
All checks were successful
CI / Unit & Component Tests (push) Successful in 3m20s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 3m39s
CI / fail2ban Regex (push) Successful in 46s
CI / Semgrep Security Scan (push) Successful in 22s
CI / Compose Bucket Idempotency (push) Successful in 1m6s
V71 gives transcription_block_mentioned_persons.person_id a real FK, so two
TranscriptionBlockMentionsRepositoryTest cases that inserted mention rows with
random (non-existent) person ids now violate fk_tbmp_person. Persist real
Person rows and use their ids. Caught by CI's full suite.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:34:46 +02:00
Marcel
6603bc5333 test(person): address PR #736 review nits
- AC-3 cascade test: assert an innocent bystander's mention row survives the
  delete, proving the cascade is scoped to the deleted person (Nora).
- Fix integration-test comment: receivers is @ManyToMany(LAZY), not an EAGER
  @ElementCollection (Sara).
- ADR-032: note the @ prefix is kept in the degraded path, stripped in live
  mentions (Leonie).
- Add trailing newline to PersonRepository.java (Felix).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:34:46 +02:00
Marcel
6753d115f9 fix(db): leave V56 untouched to avoid Flyway checksum drift (#684)
Editing an already-applied migration changes its Flyway checksum and would
fail validateOnMigrate against prod (where V56 is applied). Revert the V56
comment edit; V71 now records that it reverses V56's no-FK choice and points
to ADR-032 as the authoritative record, so the V56 -> V71 trail stays
discoverable without touching the applied migration. (DevOps review, PR #736.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:34:46 +02:00
Marcel
73dd6c80fa docs(adr): record DB-level person-delete integrity decision (ADR-032) (#684)
Capture the reversal of V56's no-FK decision, the DB-layer-integrity
principle, and the cascade-boundary invariant (the cascade never reaches
documents rows). Numbered 032 — 028-031 are already taken on main; the
issue's '028 is next' was written before main moved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:34:46 +02:00
Marcel
9ade36dd3b docs(db): annotate person-delete ON DELETE behaviour in DB diagrams (#684)
Annotate SET NULL on documents.sender_id and CASCADE on
document_receivers.person_id, and add the new
transcription_block_mentioned_persons -> persons person_id FK (CASCADE)
to both db-relationships.puml and db-orm.puml.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:34:46 +02:00
Marcel
378da60ae8 test(mention): lock deleted-person graceful-degradation contract (#684)
Strengthen one renderTranscriptionBody case into the AC-6 contract: a
@DisplayName with an empty mentionedPersons array (the deleted-person case
V71 produces) must render as plain readable text with no <a>, person-mention
class, data-person-id, or href. Guards against a future renderer refactor
silently reintroducing the dead-link-on-deleted-person degradation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:34:46 +02:00
Marcel
6d267f2269 test(person): describe DB-cascade mechanism in delete service-path test (#684)
The deletePerson service-path guard (AC-4) is behaviourally unchanged, but its
comments described the removed reassignSenderToNull/deleteReceiverReferences
chain. Update them to the V71 ON DELETE cascade mechanism.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:34:46 +02:00
Marcel
ff76a3784f refactor(person): simplify mergePersons to lean on V71 cascade (#684)
Drop the explicit deleteReceiverReferences call from mergePersons — the
source's leftover receiver join rows now cascade-drop via V71's ON DELETE
CASCADE on deleteById. Remove the now-unused deleteReceiverReferences
repository method (and its repo test), and add clearAutomatically +
flushAutomatically to the remaining merge native queries so the L1 cache
cannot desync from the bulk updates. Rewrite the merge unit test with
verifyNoMoreInteractions and add an end-to-end merge regression test (AC-7).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:34:46 +02:00
Marcel
534665459f refactor(person): thin deletePerson to lean on V71 DB cascade (#684)
Drop the application-layer sender/receiver detach from deletePerson — the
V71 ON DELETE constraints now enforce it. Remove the now-unused
reassignSenderToNull repository method and rewrite the unit test to assert
only the existence check plus deleteById (verifyNoMoreInteractions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:34:46 +02:00
Marcel
fd792f6d78 feat(person): enforce person-delete integrity at the DB layer (V71) (#684)
Add ON DELETE behaviour to the two V1 FKs into persons (documents.sender_id
-> SET NULL, document_receivers.person_id -> CASCADE) and a real FK with
ON DELETE CASCADE on the transcription_block_mentioned_persons soft reference,
cleaning up pre-existing orphan mention rows first. The cascade stays strictly
at the join/reference layer and never reaches documents rows.

Proven by new Postgres-backed PersonRepositoryTest cascade tests (AC-1/2/3/8
plus the cascade-boundary document-survival guard). Rewrites the now-stale
V56 'no FK' comment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 12:34:46 +02:00
Marcel
bafbf609eb docs(adr): ADR-032 tag-name resolution tolerates case-collisions (#730)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m16s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m34s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
CI / Unit & Component Tests (push) Successful in 3m17s
CI / OCR Service Tests (push) Successful in 24s
CI / Backend Unit Tests (push) Successful in 3m36s
CI / fail2ban Regex (push) Successful in 46s
CI / Semgrep Security Scan (push) Successful in 22s
CI / Compose Bucket Idempotency (push) Successful in 1m6s
Records the lasting decision behind the #730 fix: exact-case-first
resolution, deterministic lowest-id case-insensitive fallback, and the
explicit refusal of a unique(lower(name)) constraint (collisions are
valid canonical nodes). Previously the rationale lived only in code
comments and the issue body. Raised as a blocker in the PR #733 review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 11:09:10 +02:00
Marcel
2710f2e233 test(tag): close review-flagged gaps in case-collision coverage (#730)
Two adversarial gaps from PR #733 review:

- Unit: exact-case must win even when its id is NOT the lowest, proving
  exact-case short-circuits before the lowest-id tie-break (a naive
  "lowest id across all CI matches" would pick the wrong row).
- Integration: assert findAllByNameIgnoreCase folds the UPPERCASE
  "GLÜCKWÜNSCHE" — the exact string findOrCreate passes — so the umlaut
  proof matches the resolution path under test, not a lowercase probe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 11:07:39 +02:00
Marcel
80f6468d52 refactor(tag): use orElseThrow over Optional.get in findOrCreate (#730)
The lowest-id tie-break stream is guarded non-empty, so .get() never
throws — but the project bans Optional.get(). Switch to .orElseThrow()
for the project idiom. No behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 11:05:45 +02:00
Marcel
a58378e8f0 test(tag): pin case-colliding tag resolution on real Postgres (#730)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m16s
CI / OCR Service Tests (pull_request) Successful in 21s
CI / Backend Unit Tests (pull_request) Successful in 3m35s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
Mocked TagServiceTest can't prove the two things that actually broke:
that findAllByNameIgnoreCase folds umlauts the way Postgres LOWER() does,
and that saving a document tagged with a case-colliding tag no longer
throws NonUniqueResultException. Testcontainers postgres:16-alpine:

- updateDocument on a doc tagged with the child "weihnachten" succeeds
  and keeps exactly the child tag (not the parent).
- findOrCreate("GLÜCKWÜNSCHE") resolves the Glückwünsche/glückwünsche
  umlaut pair deterministically (lowest id) without throwing — the
  regression catcher a plain-ASCII pair would miss.
- bulk-edit funnels through resolveTags → findOrCreate, guarding a
  future refactor that bypasses it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 10:53:04 +02:00
Marcel
d000170f52 fix(tag): resolve case-colliding tag names without throwing (#730)
findOrCreate used tagRepository.findByNameIgnoreCase, which returns
Optional<Tag> and threw NonUniqueResultException whenever two tags
collided case-insensitively (a canonical parent and its same-named
lowercase child). Every document carrying such a tag became un-editable:
any save re-resolved the whole tag set by name and blew up with a 500.

Replace the throwing lookup with exact-case-first resolution: findByName
(exact) → findAllByNameIgnoreCase (lowest-id, deterministic, never
throws) → create. Delete findByNameIgnoreCase so the throwing call can't
be reintroduced. Case collisions are valid tree nodes — no migration, no
unique(lower(name)) constraint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 10:49:02 +02:00
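The three-step lookup can be sketched as follows, assuming an in-memory array standing in for `tagRepository`. The function name mirrors `findOrCreate`, but this TypeScript version and its id allocation are hypothetical. Exact case short-circuits before the lowest-id tie-break, which is the property commit 2710f2e233 pins.

```typescript
// Hypothetical stand-in for the Tag entity; ids are assigned on create.
interface Tag {
  id: number;
  name: string;
}

// findByName (exact) -> lowest-id case-insensitive match -> create.
function findOrCreate(tags: Tag[], name: string): Tag {
  const exact = tags.find((t) => t.name === name);
  if (exact !== undefined) return exact;

  // toLowerCase() folds umlauts (Ü -> ü) in this sketch.
  const folded = name.toLowerCase();
  const matches = tags.filter((t) => t.name.toLowerCase() === folded);
  if (matches.length > 0) {
    // Deterministic tie-break: a case collision never throws.
    return matches.reduce((lowest, t) => (t.id < lowest.id ? t : lowest));
  }

  const created: Tag = { id: Math.max(0, ...tags.map((t) => t.id)) + 1, name };
  tags.push(created);
  return created;
}
```

Keeping both colliding rows and choosing deterministically at read time is what lets the fix avoid a migration or a unique(lower(name)) constraint.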
Marcel
d1ed9c022f test(stammbaum): fix #718 tab-order test for tidy-tree layout (#724)
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m17s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m39s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 23s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
CI / Unit & Component Tests (push) Successful in 3m19s
CI / OCR Service Tests (push) Successful in 23s
CI / fail2ban Regex (push) Has been cancelled
CI / Semgrep Security Scan (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
nightly / deploy-staging (push) Successful in 1m55s
The #718 keyboard-tab-order test hardcoded the visual order
['Eugenie','Walter','Clara','Hans'] on the assumption that buildLayout
sorts each generation alphabetically. #724 replaced that with the
tidy-tree layout, which orders a couple's run by structural ownership
(earliest birth year, then a deterministic id tie-break) — so Walter
(id …a1) now owns the run and Eugenie renders to his right.

Both PRs were green independently; the stale assertion only surfaced
once #718 and #724 landed together on main. Correct the expected reading
order to ['Walter','Eugenie','Clara','Hans'] and refresh the now-wrong
'alphabetical' comment. The companion self-validating test (DOM order ==
sorted by y,x) already guarded the real property, so only the hardcoded
assertion needed updating.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 18:00:59 +02:00
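As a rough illustration of the new ordering rule, here is a comparator that picks a couple's owner by earliest birth year and falls back to an id comparison. It assumes, hypothetically, that a missing birth year sorts last and that ids compare as strings; the ids and years in the test data are made up, and the real buildLayout may differ.

```typescript
// Hypothetical partner shape for one couple's run in the layout.
interface Partner {
  id: string;
  name: string;
  birthYear: number | null;
}

// Earliest birth year first; unknown years last; then a deterministic id tie-break.
function compareOwnership(a: Partner, b: Partner): number {
  const ay = a.birthYear ?? Number.POSITIVE_INFINITY;
  const by = b.birthYear ?? Number.POSITIVE_INFINITY;
  if (ay !== by) return ay < by ? -1 : 1;
  if (a.id === b.id) return 0;
  return a.id < b.id ? -1 : 1;
}

// The owner leads the run; the other partner renders to the owner's right.
function coupleRunOrder(partners: readonly Partner[]): string[] {
  return [...partners].sort(compareOwnership).map((p) => p.name);
}
```

Because the order no longer depends on names, a test that hardcodes an alphabetical reading order breaks as soon as the tie-break picks a different owner.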
Marcel
1e5e8e43e8 refactor(transcribe): extract t-mark + draw-cue policy into tested helpers (#327)
Some checks failed
CI / Unit & Component Tests (push) Failing after 2m33s
CI / OCR Service Tests (push) Successful in 24s
CI / Backend Unit Tests (push) Successful in 3m42s
CI / fail2ban Regex (push) Successful in 43s
CI / Semgrep Security Scan (push) Successful in 22s
CI / Compose Bucket Idempotency (push) Successful in 1m7s
Review follow-up (Sara, fast-follow): the t no-active-region guard and the
draw-cue arm/disarm rule lived inline in the page with no direct coverage.
Extracted to pure resolveTrainingMark() (no-op when no region is active, otherwise
flips the recognition enrollment) and canArmDraw()/shouldDisarmDraw(), each with unit tests
(10 cases total). The page now arms the draw cue only via canArmDraw and
disarms via shouldDisarmDraw, and routes t through resolveTrainingMark.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
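A sketch of what such pure helpers might look like, assuming `null` signals the no-op and a two-valued read/edit mode. The signatures are hypothetical; only the behaviour (no-op without an active region, flip otherwise; arm the draw cue only in edit mode, clear it outside edit mode) comes from this branch's commits.

```typescript
type PanelMode = "read" | "edit";

// t: returns the new enrollment state, or null when no region is active (no-op).
function resolveTrainingMark(activeRegionId: string | null, enrolled: boolean): boolean | null {
  if (activeRegionId === null) return null;
  return !enrolled;
}

// n: the draw cue may only be armed in edit mode.
function canArmDraw(mode: PanelMode): boolean {
  return mode === "edit";
}

// Leaving edit mode clears an armed cue.
function shouldDisarmDraw(mode: PanelMode, drawArmed: boolean): boolean {
  return drawArmed && mode !== "edit";
}
```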
Marcel
8c198f22be polish(transcribe): review nits — kbd size, focus ring, guard, action doc (#327)
Review follow-up (Leonie, Felix, Markus): bump cheatsheet key caps to text-sm
for the 60+ audience, add a focus-visible ring to the close button, simplify
the draw-hint guard to {#if drawArmed} (the $effect already clears it outside
edit mode), and document why the transcribeShortcuts action ignores its node
and binds to window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
Marcel
6fd05e08d8 test(transcribe): prove Delete fires once via real shape + action (#327)
Review follow-up (Sara): the prior single-owner evidence was two separate
unit facts against an inert DOM stub. This renders a real AnnotationShape,
attaches the live transcribeShortcuts action, focuses the region, and presses
Delete once — asserting deleteCurrentRegion fires exactly once. A genuine
integration guard against re-introducing a double-bind.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
Marcel
ab469b744c refactor(transcribe): extract region navigation into a tested pure helper (#327)
Review follow-up (Sara): j/k wrap-around and fresh-entry had no direct
coverage — the logic lived inline in the page where the action spec only
mocks the callbacks. Extracted to a pure stepRegion() with 9 unit tests
(empty list, forward/back, both wraps, fresh-entry null + unknown id,
length-1). Also replaces the inline nested ternary Felix flagged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
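The wrap-around and fresh-entry cases listed above can be sketched as a single pure function over the already sortOrder-sorted region ids. The name matches `stepRegion()`, but the parameter list and the fresh-entry rule (forward lands on the first region, backward on the last) are assumptions, not the branch's actual code.

```typescript
// +1 = next (j), -1 = previous (k).
type Direction = 1 | -1;

// Returns the id to activate, or null when there are no regions.
function stepRegion(
  ids: readonly string[],
  currentId: string | null,
  direction: Direction,
): string | null {
  if (ids.length === 0) return null;
  const index = currentId === null ? -1 : ids.indexOf(currentId);
  if (index === -1) {
    // Fresh entry: nothing active, or an id no longer in the list.
    return direction === 1 ? ids[0] : ids[ids.length - 1];
  }
  // Adding ids.length keeps the operand non-negative before the modulo.
  return ids[(index + direction + ids.length) % ids.length];
}
```

A single modulo expression covers both wraps and the length-1 case, which is what replaces a nested ternary.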
Marcel
f07527158c fix(transcribe): hide the "?" hint on touch-only devices (#327)
Review follow-up (Requirements Engineer, Leonie) — closes the unmet
acceptance row. The coach card's "press ?" tip rendered unconditionally, so
a touch-only tablet transcriber (no hardware keyboard) was told to press a
key they don't have. The hint is now hidden on coarse-pointer (touch-only)
devices via [@media(pointer:coarse)]:hidden; the cheatsheet itself only opens
via the "?" key, so it already never surfaces without a keyboard. Also bumps
the key cap from 11px to text-xs for the 60+ audience.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
Marcel
9f75de0350 fix(transcribe): localise Delete key cap + annotation label, clarify Esc row (#327)
Review follow-up (Leonie, Requirements Engineer): the Delete key cap was a
hardcoded German "Entf" shown to EN/ES users — now driven by key_cap_delete
(Entf/Del/Supr). The annotation read-only aria-label was a hardcoded German
"Block anzeigen" in all locales — now annotation_view_label. Renamed the Esc
row label from "Bereich schließen" to "Panel schließen" so it no longer
collides with "Bereich" (= region) used elsewhere in the cheatsheet.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
Marcel
8a9fbc6aef test(transcribe): e2e coverage for shortcuts + cheatsheet a11y (#327)
Seeds a two-block document via API (annotations.spec pattern) and drives the
keyboard: ? opens the cheatsheet, Esc closes it then a second Esc closes the
panel (Esc ladder), e toggles read/edit, and j/k walk the regions forward and
back. Adds an axe-core pass over the open dialog asserting no critical
violations and aria-modal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
Marcel
0336d07980 feat(transcribe): surface the "?" shortcut tip in the coach card (#327)
Adds a secondary keyboard hint to the existing coach footer row pointing
transcribers at the "?" cheatsheet, with a semantic <kbd>. Cross-references
the shortcuts introduced for the empty-state coach (#320).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
Marcel
61256942e1 feat(transcribe): wire keyboard shortcuts into the document panel (#327)
Attaches the transcribeShortcuts action to the document page and wires every
command to existing context setters: j/k walk the sortOrder-sorted regions
and set activeAnnotationId, e toggles read/edit, n arms a draw cue (edit
only), Delete routes to the existing confirm path, ? opens the cheatsheet,
and Esc is now owned solely by the action — the inline onMount Esc listener
is removed (decision B1). Renders ShortcutCheatsheet and a draw-armed hint.

"t" toggles the document-level KURRENT_RECOGNITION training enrollment (the
only training surface that exists; there is no per-region flag yet — see
#321) and no-ops unless a region is active. Also reconciles annotation
Delete: the shape no longer self-handles the key, with onfocus syncing the
active region so the action deletes exactly once.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
Marcel
6aaf8ddb9e feat(transcribe): add ShortcutCheatsheet dialog overlay (#327)
Native <dialog aria-modal> cheatsheet: showModal()/close() bridge, close
button focused on open, eight grouped <kbd> rows (nav/edit/utility), an
autosave footer line, and a reduced-motion-guarded fade. Closes on Esc,
backdrop click, and the close button; "?" while open is a no-op. Adds the
shortcut_close_panel i18n key. 8 component tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
Marcel
1b9707c6cd feat(transcribe): add transcribeShortcuts keyboard action (#327)
Single-owner window keydown action for the Transcribe panel: j/k region
nav, e mode toggle, n draw (edit only), t training mark, Delete, ? cheatsheet,
and the Esc precedence ladder (cheatsheet → editable no-op → close
panel). Pure input-to-callback translator with a focus guard that exempts
only "?"; removes its listener on destroy. 20 unit tests cover every key,
the panel/focus guards, the Esc matrix, and teardown.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
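A minimal sketch of the key-to-command translation these commits describe. It is written in Java only for consistency with the backend code in this diff; the real `transcribeShortcuts` action is a frontend action, so treat this as illustrative. `Command`, `PanelState` and `ShortcutTranslator` are hypothetical names. The rules modelled are the ones stated above: the panel guard, the Esc ladder (cheatsheet, then editable no-op, then close panel), "?" as the only key exempt from the focus guard and a no-op while the cheatsheet is open, "n" only in edit mode, and "t" only with an active region.

```java
import java.util.Optional;

// Hypothetical commands; names are illustrative, not the project's API.
enum Command {
    OPEN_CHEATSHEET, CLOSE_CHEATSHEET, CLOSE_PANEL,
    NEXT_REGION, PREV_REGION, TOGGLE_MODE, ARM_DRAW, TOGGLE_TRAINING, DELETE_REGION
}

// Snapshot of the UI state the translator reads (assumed fields).
record PanelState(boolean panelOpen, boolean cheatsheetOpen, boolean focusInEditable,
                  boolean editMode, boolean regionActive) {}

final class ShortcutTranslator {
    private ShortcutTranslator() {}

    // Pure translation: no side effects, so every key/state combination is unit-testable.
    static Optional<Command> translate(String key, PanelState s) {
        if (!s.panelOpen()) {
            return Optional.empty(); // panel guard: shortcuts only act while the panel is open
        }
        if ("Escape".equals(key)) {
            // Esc ladder: close the cheatsheet first, leave editables alone, otherwise close the panel.
            if (s.cheatsheetOpen()) return Optional.of(Command.CLOSE_CHEATSHEET);
            if (s.focusInEditable()) return Optional.empty();
            return Optional.of(Command.CLOSE_PANEL);
        }
        if ("?".equals(key)) {
            // "?" bypasses the focus guard; pressing it while the cheatsheet is open does nothing.
            return s.cheatsheetOpen() ? Optional.empty() : Optional.of(Command.OPEN_CHEATSHEET);
        }
        if (s.focusInEditable()) {
            return Optional.empty(); // focus guard: typing in a field never triggers j/k/e/n/t/Delete
        }
        return switch (key) {
            case "j" -> Optional.of(Command.NEXT_REGION);
            case "k" -> Optional.of(Command.PREV_REGION);
            case "e" -> Optional.of(Command.TOGGLE_MODE);
            case "n" -> s.editMode() ? Optional.of(Command.ARM_DRAW) : Optional.empty();
            case "t" -> s.regionActive() ? Optional.of(Command.TOGGLE_TRAINING) : Optional.empty();
            case "Delete" -> Optional.of(Command.DELETE_REGION);
            default -> Optional.empty();
        };
    }
}
```

Keeping the translator pure leaves the effects (setting `activeAnnotationId`, opening the dialog, closing the panel) in the page, which is the split the commit message describes as "pure input-to-callback translator".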
Marcel
8353e71eed feat(transcribe): add i18n keys for shortcut cheatsheet (#327)
Adds de/en/es Paraglide keys for the keyboard-shortcut cheatsheet,
coach hint, draw-armed hint, and the discoverable annotation Delete
aria-label.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 17:54:24 +02:00
Marcel
0693cfddd1 fix(document): enlarge auto-title helper to 14px and assert its localized text (#726)
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m35s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m33s
CI / fail2ban Regex (pull_request) Successful in 48s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
CI / Unit & Component Tests (push) Failing after 2m31s
CI / OCR Service Tests (push) Successful in 21s
CI / Backend Unit Tests (push) Successful in 3m38s
CI / fail2ban Regex (push) Successful in 44s
CI / Semgrep Security Scan (push) Successful in 22s
CI / Compose Bucket Idempotency (push) Successful in 1m6s
Bumps the title helper from text-xs (12px) to text-sm (14px) for the 60+ audience (FR-005
prefers a larger size than the field hints) and tightens the component test to assert the
actual localized string and the 14px class — addresses Leonie's and Sara's review notes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 17:15:46 +02:00
Marcel
f656f7c1ff test(document): close review-flagged coverage gaps for auto-title sync (#726)
- save-time: precision+raw carry-over when the DTO omits them (exercises the shared skip-null
  resolvers), and a RANGE label round-trip (Sara/Elicit)
- factory: a bare Document with a null index builds "" rather than NPE-ing (Felix)
- backfill matcher: negative near-misses — ASCII hyphen vs en dash, missing separator before
  trailing text, year-with-trailing-letters, index followed by text without a separator (Sara)
- backfill integration: tighten the count assertion to exactly 1 on the clean test DB (Sara)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 17:10:50 +02:00
Marcel
7316c51d4a refactor(document): share skip-null date-field resolution between save and projection (#726)
Extract effectivePrecision/effectiveMetaDateEnd/effectiveMetaDateRaw, used by both
applyDatePrecision (the real setters) and projectedState (the title projection), so the two
can no longer drift — addresses review feedback (Markus/Felix/Sara). Writing a stored value
back when the DTO omits a field is a harmless no-op, so behaviour is unchanged (185 existing
DocumentServiceTest cases stay green). Also documents the file-replace "treat as manual" path
inline at the reassignment site.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 17:08:51 +02:00
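The skip-null rule shared by the `effective*` resolvers can be sketched generically. `SkipNull`, `StoredDoc` and `DateEndUpdate` are hypothetical stand-ins, not code from this diff; the point is that the setter path and the projection path call the same function, so they cannot disagree.

```java
import java.time.LocalDate;

// Hypothetical generic form of effectivePrecision / effectiveMetaDateEnd / effectiveMetaDateRaw.
final class SkipNull {
    private SkipNull() {}

    // A null submission means "not submitted": keep whatever is stored (which may itself be null).
    static <T> T effective(T submitted, T stored) {
        return submitted != null ? submitted : stored;
    }
}

// Illustrative stand-in for the entity; only one date field is modelled.
final class StoredDoc {
    LocalDate metaDateEnd;
}

final class DateEndUpdate {
    private DateEndUpdate() {}

    // Setter path: writes the stored value back when the DTO omits the field (a harmless no-op).
    static void apply(StoredDoc doc, LocalDate submittedEnd) {
        doc.metaDateEnd = SkipNull.effective(submittedEnd, doc.metaDateEnd);
    }

    // Projection path: computes the same value without mutating the entity.
    static LocalDate project(StoredDoc doc, LocalDate submittedEnd) {
        return SkipNull.effective(submittedEnd, doc.metaDateEnd);
    }
}
```

A hand-written helper is used here rather than `Objects.requireNonNullElse`, because that JDK method throws a `NullPointerException` when both arguments are null, which would break rows whose stored value is genuinely unknown.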
Marcel
cf457cb96f docs(document): ADR-031 + glossary/c4/api_tests for auto-title sync (#726)
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m32s
CI / OCR Service Tests (pull_request) Successful in 26s
CI / Backend Unit Tests (pull_request) Successful in 3m35s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
ADR-031 records the shared document-package title factory, the exact-match save-time
regeneration, and the grammar-heuristic one-time backfill (with the ReDoS / no-version-spam
/ file-replace-is-manual decisions). Adds an "auto-generated title" glossary entry, extends
the document-management c4 diagram with DocumentTitleFactory / DocumentTitleBackfillMatcher
and the backfill flows, and documents POST /api/admin/backfill-titles in Admin-Auth.http as
a one-shot ADMIN call hitting port 8080 directly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:44:56 +02:00
Marcel
83e0afb466 feat(document): explain auto-generated title under the edit title field (#726)
Adds the FR-TITLE-005 helper line under the title input in DescriptionSection, shown only
on the single-document edit form via a new showTitleHelp prop (off for the new-document and
bulk-edit forms). It is wired to the input with aria-describedby and uses text-ink-3 (WCAG AA
on bg-surface). New Paraglide key form_helper_title_autogenerated in de/en/es. Adds a
component test for the helper + aria wiring and an end-to-end pass: create an auto-titled doc,
edit its date, and see the title follow on the detail page.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:41:52 +02:00
Marcel
12db7b3596 test(document): integration-test title backfill against real Postgres (#726)
Pins backfill behaviour on postgres:16-alpine (H2 unusable — title is NOT NULL): a stale
auto-title is rewritten, the sweep is idempotent (second run touches nothing), prose is
left alone, and the mechanical rename adds no document_versions rows. Permission (401/403)
stays in the faster @WebMvcTest slice.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:32:07 +02:00
Marcel
26b45f1c78 feat(document): one-time backfill endpoint for stale auto-titles (#726)
Adds POST /api/admin/backfill-titles (ADMIN-only, synchronous) which rebuilds every
machine-generated title from the row's current state. A grammar heuristic
(DocumentTitleBackfillMatcher) decides overwritability: index matched literally via
startsWith (originalFilename is user-controlled — no regex injection / ReDoS, CWE-1333),
date-label forms derived from the same Locale.GERMAN formatters as the factory so they
cannot drift, prose left untouched, fail-closed on any surprise. Saves via the repository
directly (no recordVersion — follows backfillFileHashes), so the mechanical rename never
version-spams document_versions. Idempotent: a second run rewrites nothing. Emits one
SLF4J-parameterized scanned/updated/skipped line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:29:57 +02:00
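Why the index is compared with `startsWith` instead of being spliced into a regex can be shown in a few lines. `PrefixCheck` is a hypothetical class; `String.startsWith`, `Pattern.quote` and `Pattern.compile` are standard JDK APIs. A filename such as `A.1` is enough to show the problem: unquoted, the `.` matches any character, and a filename containing `(` does not even compile as a pattern.

```java
import java.util.regex.Pattern;

final class PrefixCheck {
    private PrefixCheck() {}

    // Literal comparison: regex metacharacters in the index have no special meaning.
    static boolean literal(String title, String index) {
        return title.startsWith(index);
    }

    // Regex equivalent that is only safe because the user-controlled index is quoted.
    static boolean quotedRegex(String title, String index) {
        return Pattern.compile(Pattern.quote(index) + ".*").matcher(title).matches();
    }

    // UNSAFE: the filename itself is interpreted as a pattern (injection / ReDoS risk).
    static boolean unquotedRegex(String title, String index) {
        return Pattern.compile(index + ".*").matcher(title).matches();
    }
}
```

With index `A.1`, the title `AX1 1916` is rejected by both safe checks but accepted by the unquoted pattern, and an index of `A(` makes the unquoted variant throw a `PatternSyntaxException`.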
Marcel
e6ce00035e feat(document): regenerate auto-title on save when date/location change (#726)
updateDocument now captures the machine title from the persisted state before any
setter runs, and rebuilds it from the new state only when the submitted title still
equals that machine value — an exact comparison that relies on the edit form
round-tripping an untouched title verbatim. A hand-written or freshly-typed title is
kept; a blank submission falls back to the rebuilt auto-title (title is always present);
a file-replaced document no longer matches its import-time title and is treated as
manual. projectedState mirrors the setter asymmetry exactly (date/location overwrite
incl. null-clear; precision/end/raw skip-null from the entity).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:20:46 +02:00
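The save-time decision reduces to three cases. The sketch below is a simplified, hypothetical restatement of the `resolveTitle` logic in the `DocumentService` hunk of this diff, with the rebuild passed in as a `Supplier` so the title is only built when it is actually needed. The `A1 2028` / `A1 1928` strings echo the stale-year example from the backfill runbook and use a plain space as separator purely for illustration.

```java
import java.util.Objects;
import java.util.function.Supplier;

final class TitleResolution {
    private TitleResolution() {}

    // submitted:  title sent by the edit form
    // autoBefore: auto-title built from the persisted state BEFORE any setter ran
    // rebuild:    builds the auto-title from the new (projected) state
    static String resolve(String submitted, String autoBefore, Supplier<String> rebuild) {
        if (submitted == null || submitted.isBlank()) {
            return rebuild.get(); // never persist a blank title
        }
        if (!Objects.equals(submitted, autoBefore)) {
            return submitted; // hand-written or edited title: keep verbatim
        }
        return rebuild.get(); // untouched machine title: follow the new date/location
    }
}
```

Because the comparison is exact, a one-character edit to an otherwise machine-looking title is enough to opt that document out of regeneration.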
Marcel
b1f77bcfb6 refactor(document): extract title composition into shared DocumentTitleFactory (#726)
Move DocumentTitleFormatter from importing into the document package and
introduce DocumentTitleFactory there as the single source of truth for the
{index} – {dateLabel} – {location} formula. DocumentImporter now consumes the
factory instead of owning the composition; the document package owns the rule,
importing depends on it (not the reverse). No behavioral change — importer
title assertions and the #666 fixture parity test stay green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:15:00 +02:00
103 changed files with 5823 additions and 406 deletions

@@ -154,9 +154,9 @@ Schedule monthly automated restore tests. If the restore fails, the backup is wo
```
Every alert needs: description, severity, likely cause, resolution steps, escalation path.
3. **Upgrading VPS tier before profiling**
3. **Upgrading hardware before profiling**
```
# "The app feels slow" → upgrade from CX32 to CX42
# "The app feels slow" → order more RAM / a faster CPU
# Actual cause: unindexed query scanning 100k rows
```
Profile with Grafana dashboards first. Most perceived performance issues are application bugs, not resource constraints.
@@ -404,8 +404,8 @@ Hetzner Object Storage (S3-compatible, replaces MinIO in prod)
Prometheus + Loki + Alertmanager
```
### Monthly Cost: ~23 EUR
CX32 VPS (4 vCPU, 8GB RAM): 17 EUR · Object Storage (~200GB): 5 EUR · SMTP relay: ~1 EUR
### Monthly Cost: ~6 EUR (excl. server)
Hetzner dedicated server (Serverbörse, i7-6700, 64 GB RAM): see invoice · Object Storage (~200GB): 5 EUR · SMTP relay: ~1 EUR
### Reference Documentation
- Full CI workflow, Gitea vs GitHub differences: `docs/infrastructure/ci-gitea.md`

@@ -72,6 +72,25 @@ VITE_SENTRY_DSN=
# Sentry/GlitchTip auth token for source map upload at build time (optional)
SENTRY_AUTH_TOKEN=
# NL search — Ollama LLM inference
# Leave APP_OLLAMA_BASE_URL empty to disable NL search (safe default for CX32 / CI).
# Set to http://ollama:11434 to enable. Requires CX42 (16 GB RAM) to run alongside OCR.
APP_OLLAMA_BASE_URL=http://ollama:11434
# CPU limit: 4.0 is safe on both CX32 (4 vCPUs) and CX42 (8 vCPUs).
# Raise to 7.5 on CX42 for full throughput.
OLLAMA_CPU_LIMIT=4.0
# Memory limit: requires CX42 (16 GB) to run alongside OCR.
# Reduce or set APP_OLLAMA_BASE_URL= on smaller hosts.
OLLAMA_MEM_LIMIT=8g
# Ollama API key — set on the Ollama service to restrict inference API access on archiv-net.
# Generate with: openssl rand -hex 32
# NOTE: Empirically verified that OLLAMA_API_KEY is NOT enforced in Ollama 0.6.5 or 0.30.6 (ADR-028 §7).
# archiv-net network isolation is the only effective access control. Retained for forward compatibility.
OLLAMA_API_KEY=
# Production SMTP — uncomment and fill in to send real emails instead of catching them
# APP_BASE_URL=https://your-domain.example.com
# MAIL_HOST=smtp.example.com

@@ -92,6 +92,7 @@ backend/src/main/java/org/raddatz/familienarchiv/
├── ocr/ OCR domain — OcrService, OcrBatchService, training
├── person/ Person domain
│ └── relationship/ PersonRelationship sub-domain
├── search/ NL search domain — NlSearchController, NlQueryParserService, RestClientOllamaClient, NlSearchRateLimiter
├── security/ SecurityConfig, Permission, @RequirePermission, PermissionAspect
├── tag/ Tag domain
└── user/ User domain — AppUser, UserGroup, UserService
@@ -160,7 +161,7 @@ Input DTOs live flat in the domain package. Response types are the model entitie
→ See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)
**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded).
**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded); `SMART_SEARCH_UNAVAILABLE` (HTTP 503 — Ollama inference service offline or timed out); `SMART_SEARCH_RATE_LIMITED` (HTTP 429 — user exceeded 5 NL search requests per minute).
### Security / Permissions
@@ -268,7 +269,7 @@ Back button pattern — use the shared `<BackButton>` component from `$lib/share
→ See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)
**LLM reminder:** when adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded).
**LLM reminder:** when adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`. Valid error codes include: `TOO_MANY_LOGIN_ATTEMPTS` (returned by `LoginRateLimiter` as HTTP 429 when a brute-force threshold is exceeded); `SMART_SEARCH_UNAVAILABLE` (HTTP 503 — Ollama inference service offline or timed out); `SMART_SEARCH_RATE_LIMITED` (HTTP 429 — user exceeded 5 NL search requests per minute).
---
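The documented `SMART_SEARCH_RATE_LIMITED` limit (5 NL search requests per user per minute) could be enforced with a per-user sliding window. This is a hypothetical sketch, not the actual `NlSearchRateLimiter`, whose implementation is not shown in this diff; the returned wait time is the kind of value that would feed the `retryAfterSeconds` argument of `DomainException.tooManyRequests`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-user sliding-window limiter (e.g. 5 requests per 60 s).
final class SlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, Deque<Long>> hits = new ConcurrentHashMap<>();

    SlidingWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    // Returns 0 and records the request if allowed; otherwise the seconds until the oldest hit expires.
    long tryAcquire(String userId, long nowMillis) {
        Deque<Long> window = hits.computeIfAbsent(userId, id -> new ArrayDeque<>());
        synchronized (window) {
            // Drop timestamps that have slid out of the window.
            while (!window.isEmpty() && nowMillis - window.peekFirst() >= windowMillis) {
                window.pollFirst();
            }
            if (window.size() < maxRequests) {
                window.addLast(nowMillis);
                return 0;
            }
            long waitMillis = windowMillis - (nowMillis - window.peekFirst());
            return Math.max(1, (waitMillis + 999) / 1000); // round up to whole seconds
        }
    }
}
```

The map never evicts idle users; a production limiter would expire old entries (or use a cache) so memory does not grow with the number of users who ever searched.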

@@ -28,4 +28,18 @@ Authorization: Basic Gast_User gast
###Groups
#GET
GET http://localhost:8080/api/admin/tags
Authorization: Basic admin admin123
Authorization: Basic admin admin123
### One-time backfill: re-sync already-stale auto-titles (#726)
# RUNBOOK: a one-shot ADMIN maintenance call, NOT part of normal operation. Run it ONCE
# after deploying #726 to clean the existing backlog of stale titles (e.g. a title still
# showing "2028" after the date was corrected to "1928"). It is synchronous and idempotent
# — a second run returns {"count": 0} and writes nothing. Hit the backend DIRECTLY on
# port 8080 (NOT through the SvelteKit proxy) so the sweep can't trip the proxy timeout.
# Returns {"count": <documents rewritten>}.
POST http://localhost:8080/api/admin/backfill-titles
Authorization: Basic admin admin123
### NEGATIVE TEST: a non-admin must NOT be able to trigger the backfill -> 403 Forbidden
POST http://localhost:8080/api/admin/backfill-titles
Authorization: Basic Gast_User gast

@@ -41,6 +41,27 @@
<type>pom</type>
<scope>import</scope>
</dependency>
<!-- Force WireMock's ee10 Jetty transitive deps to match Spring Boot's 12.1.8 core -->
<dependency>
<groupId>org.eclipse.jetty.ee10</groupId>
<artifactId>jetty-ee10-servlet</artifactId>
<version>12.1.8</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty.ee10</groupId>
<artifactId>jetty-ee10-servlets</artifactId>
<version>12.1.8</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty.ee10</groupId>
<artifactId>jetty-ee10-webapp</artifactId>
<version>12.1.8</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-ee</artifactId>
<version>12.1.8</version>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
@@ -137,6 +158,12 @@
<artifactId>archunit-junit5</artifactId>
<version>1.3.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.wiremock</groupId>
<artifactId>wiremock-jetty12</artifactId>
<version>3.9.2</version>
<scope>test</scope>
</dependency>
<!-- Excel processing (Apache POI) -->
<dependency>

@@ -57,6 +57,7 @@ public interface DocumentRepository extends JpaRepository<Document, UUID>, JpaSp
@EntityGraph("Document.full")
List<Document> findByReceiversId(UUID receiverId);
// Callers access only doc.getTags() to mutate the set — receivers/sender not touched; no graph needed.
List<Document> findByTags_Id(UUID tagId);

@@ -32,6 +32,8 @@ import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import jakarta.persistence.criteria.JoinType;
import jakarta.persistence.criteria.Predicate;
import org.springframework.data.jpa.domain.Specification;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
@@ -68,6 +70,7 @@ import static org.raddatz.familienarchiv.document.DocumentSpecifications.*;
public class DocumentService {
private final DocumentRepository documentRepository;
private final DocumentTitleFactory documentTitleFactory;
private final PersonService personService;
private final FileService fileService;
private final TagService tagService;
@@ -379,8 +382,14 @@ public class DocumentService {
DocumentStatus statusBefore = doc.getStatus();
// Auto-title sync (#726): capture the machine title from the CURRENTLY-persisted state
// BEFORE any setter runs — the setters below overwrite date/location and applyDatePrecision
// skips nulls, so the old state must be read first. The submitted title is the catalog
// auto-title iff it equals this; only then does it follow date/location forward.
String autoTitleBefore = documentTitleFactory.build(doc);
// 1. Update simple fields
doc.setTitle(dto.getTitle());
doc.setTitle(resolveTitle(dto.getTitle(), autoTitleBefore, doc, dto));
doc.setDocumentDate(dto.getDocumentDate());
applyDatePrecision(doc, dto);
validateDateRange(doc); // guard before any save (updateDocumentTags below persists)
@@ -424,7 +433,11 @@ public class DocumentService {
doc.setScriptType(dto.getScriptType());
}
// 4. Replace the file (only if a new one was selected)
// 4. Replace the file (only if a new one was selected).
// NB (#726): this reassigns originalFilename to the uploaded file's name. The title's index
// segment is originalFilename, so after a replace the stored title no longer matches
// build(currentState) and the row is treated as manual — neither save-time nor backfill
// rewrites it. Accepted fail-safe (ADR-031), and autoTitleBefore was already captured above.
boolean fileReplaced = newFile != null && !newFile.isEmpty();
if (fileReplaced) {
FileService.UploadResult upload = fileService.uploadFile(newFile, newFile.getOriginalFilename());
@@ -453,22 +466,68 @@ public class DocumentService {
}
/**
* Applies the three date-precision fields only when the DTO carries them.
* A null field means "not submitted" — overwriting the stored value with null
* would fabricate a precision the user never chose, the exact dishonesty #666
* exists to prevent. A row with a genuinely-unknown precision must keep it when
* an unrelated edit (e.g. a location typo) is saved.
* Decides the title to persist on an edit (#726). The submitted title is the catalog
* auto-title only when it equals {@code autoBefore} (built from the stored state) — an exact
* comparison with no heuristic, relying on the edit form round-tripping the stored title
* verbatim when untouched. A machine title is rebuilt from the new state so a corrected
* date/location flows into it; a hand-written or freshly-typed title is kept verbatim. A blank
* submission is never persisted (title is always present) — it falls back to the rebuilt
* auto-title, which always carries at least the index.
*/
private String resolveTitle(String submitted, String autoBefore, Document doc, DocumentUpdateDTO dto) {
if (submitted == null || submitted.isBlank()) {
return documentTitleFactory.build(projectedState(doc, dto));
}
if (!Objects.equals(submitted, autoBefore)) {
return submitted;
}
return documentTitleFactory.build(projectedState(doc, dto));
}
/**
* The document state the regenerated title is built from. It is composed from the SAME
* resolvers the real setters use — {@code documentDate}/{@code location} overwritten from the
* DTO (a null value clears the field), precision/end/raw resolved skip-null via
* {@link #effectivePrecision}/{@link #effectiveMetaDateEnd}/{@link #effectiveMetaDateRaw} — so
* the projection cannot drift from {@link #updateDocument}. The index ({@code originalFilename})
* is never touched by a metadata edit.
*/
private Document projectedState(Document doc, DocumentUpdateDTO dto) {
return Document.builder()
.originalFilename(doc.getOriginalFilename())
.documentDate(dto.getDocumentDate())
.location(dto.getLocation())
.metaDatePrecision(effectivePrecision(doc, dto))
.metaDateEnd(effectiveMetaDateEnd(doc, dto))
.metaDateRaw(effectiveMetaDateRaw(doc, dto))
.build();
}
/**
* Applies the three date-precision fields skip-null: a null DTO field means "not submitted",
* so the stored value is kept rather than overwritten with null — which would fabricate a
* precision the user never chose, the exact dishonesty #666 exists to prevent. Expressed via
* the shared {@code effective*} resolvers so {@link #projectedState} stays lock-step (writing
* the stored value back when the DTO omits a field is a harmless no-op).
*/
private void applyDatePrecision(Document doc, DocumentUpdateDTO dto) {
if (dto.getMetaDatePrecision() != null) {
doc.setMetaDatePrecision(dto.getMetaDatePrecision());
}
if (dto.getMetaDateEnd() != null) {
doc.setMetaDateEnd(dto.getMetaDateEnd());
}
if (dto.getMetaDateRaw() != null) {
doc.setMetaDateRaw(dto.getMetaDateRaw());
}
doc.setMetaDatePrecision(effectivePrecision(doc, dto));
doc.setMetaDateEnd(effectiveMetaDateEnd(doc, dto));
doc.setMetaDateRaw(effectiveMetaDateRaw(doc, dto));
}
// Skip-null date-field resolution shared by applyDatePrecision (the real setters) and
// projectedState (the title projection) — the single rule keeps them from diverging (#726).
private static DatePrecision effectivePrecision(Document doc, DocumentUpdateDTO dto) {
return dto.getMetaDatePrecision() != null ? dto.getMetaDatePrecision() : doc.getMetaDatePrecision();
}
private static LocalDate effectiveMetaDateEnd(Document doc, DocumentUpdateDTO dto) {
return dto.getMetaDateEnd() != null ? dto.getMetaDateEnd() : doc.getMetaDateEnd();
}
private static String effectiveMetaDateRaw(Document doc, DocumentUpdateDTO dto) {
return dto.getMetaDateRaw() != null ? dto.getMetaDateRaw() : doc.getMetaDateRaw();
}
/**
@@ -976,6 +1035,28 @@ public class DocumentService {
return documentRepository.findByReceiversId(receiverId);
}
public DocumentSearchResult searchDocumentsByPersonId(UUID personId, LocalDate from, LocalDate to, Pageable pageable) {
Person person = personService.getById(personId);
Specification<Document> spec = buildPersonSpec(person, from, to);
Page<Document> page = documentRepository.findAll(spec, pageable);
List<DocumentListItem> items = enrichItems(page.getContent(), null);
return DocumentSearchResult.paged(items, pageable, page.getTotalElements());
}
private Specification<Document> buildPersonSpec(Person person, LocalDate from, LocalDate to) {
return (root, query, cb) -> {
if (query != null) query.distinct(true);
var receiversJoin = root.join("receivers", JoinType.LEFT);
var senderPredicate = cb.equal(root.get("sender"), person);
var receiverPredicate = cb.equal(receiversJoin, person);
var personPredicate = cb.or(senderPredicate, receiverPredicate);
var predicates = new ArrayList<>(List.of(personPredicate));
if (from != null) predicates.add(cb.greaterThanOrEqualTo(root.get("documentDate"), from));
if (to != null) predicates.add(cb.lessThanOrEqualTo(root.get("documentDate"), to));
return cb.and(predicates.toArray(new Predicate[0]));
};
}
public long getIncompleteCount() {
return documentRepository.countByMetadataCompleteFalse();
}
@@ -1010,6 +1091,43 @@ public class DocumentService {
tagService.delete(tagId);
}
/**
* One-time cleanup of already-stale auto-titles (#726, FR-003). For every document whose
* stored title passes the {@link DocumentTitleBackfillMatcher} overwrite heuristic, rebuilds
* the title from the row's current state and persists it only when it actually changed.
* Idempotent: a second run rebuilds the same value and saves nothing. Hand-written prose is
* left untouched.
*
* <p>Saves via {@code documentRepository.save} directly — it must NOT route through
* {@link #updateDocument} (which versions every write), following the {@link #backfillFileHashes}
* precedent: a mechanical rename must not snapshot the whole corpus into {@code document_versions}.
*
* @return the number of documents whose title was rewritten
*/
@Transactional
public int backfillTitles() {
List<Document> docs = documentRepository.findAll();
int updated = 0;
int skipped = 0;
for (Document doc : docs) {
if (!DocumentTitleBackfillMatcher.isOverwritable(
doc.getTitle(), doc.getOriginalFilename(), doc.getLocation())) {
skipped++;
continue;
}
String rebuilt = documentTitleFactory.build(doc);
if (rebuilt.equals(doc.getTitle())) {
skipped++; // already correct — keep idempotent, no write
continue;
}
doc.setTitle(rebuilt);
documentRepository.save(doc); // direct save, no recordVersion (mechanical rename)
updated++;
}
log.info("Title backfill complete: scanned={} updated={} skipped={}", docs.size(), updated, skipped);
return updated;
}
@Transactional
public int backfillFileHashes() {
List<Document> docs = documentRepository.findByFileHashIsNullAndFilePathIsNotNull();

@@ -0,0 +1,101 @@
package org.raddatz.familienarchiv.document;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
/**
* Heuristic overwrite test for the one-time title backfill (#726, FR-004): decides whether a
* STORED title is a machine-generated auto-title (and so may be rebuilt from the row's current
* state) versus hand-written prose (left untouched). Used ONLY by the backfill — save-time
* regeneration uses an exact old-vs-new comparison instead, with no heuristic.
*
* <p>A stored title is overwritable iff, after stripping the literal {@code index} prefix:
* <ol>
* <li>it is exactly {@code {index}}, or</li>
* <li>{@code {index} {dateLabel}} with an optional trailing {@code {location}} segment
* (any location — a present, valid date label is itself strong evidence of a machine
* title), or</li>
* <li>{@code {index} {location}} where the segment equals the document's current location
* (no date label, so the segment must match the known location to be distinguished from
* prose).</li>
* </ol>
*
* <p>Security: the {@code index} is compared <em>literally</em> via {@link String#startsWith}
* (never compiled into a regex) because {@code originalFilename} is user-controlled and may carry
* regex metacharacters — an unquoted pattern would be a ReDoS / regex-injection vector
* (CWE-1333 / CWE-625). The date-label sub-patterns use only bounded, non-nested quantifiers over
* short tokens, so there is no catastrophic backtracking. Fail-closed: any null/blank index or
* structural surprise returns {@code false}.
*/
final class DocumentTitleBackfillMatcher {
private static final String SEPARATOR = " ";
// German month tokens derived from the SAME Locale.GERMAN formatters DocumentTitleFormatter
// uses, so the matcher's accepted spellings cannot drift from what the factory emits (full
// names "Januar"…"Dezember"; abbreviations "Jan."…"Dez." — note May/June/July/März carry no
// period). Pattern.quote each so a "." in an abbreviation is literal, never a wildcard.
private static final String FULL_MONTH = monthAlternation("MMMM");
private static final String ABBR_MONTH = monthAlternation("MMM");
private static final String SEASON = "(?:Frühling|Sommer|Herbst|Winter)";
private static final String YEAR = "\\d{1,4}";
private static final String DAY_NUM = "\\d{1,2}";
// One complete date label, anchored, optionally followed by a free-form trailing location
// segment. Only bounded/non-nested quantifiers over short tokens plus a single trailing
// ".+" → linear, no catastrophic backtracking (FR-004 ReDoS guard).
private static final Pattern DATE_LABEL_WITH_OPTIONAL_LOCATION = Pattern.compile(
"^(?:" + String.join("|",
YEAR, // 1916
"ca\\. " + YEAR, // ca. 1920
FULL_MONTH + " " + YEAR, // Juni 1916
DAY_NUM + "\\. " + FULL_MONTH + " " + YEAR, // 24. Dezember 1943
SEASON + " " + YEAR, // Sommer 1916
"Datum unbekannt",
DAY_NUM + "\\." + DAY_NUM + "\\. " + ABBR_MONTH + " " + YEAR, // 10.11. Jan. 1917
DAY_NUM + "\\. " + ABBR_MONTH + " " + DAY_NUM + "\\. " + ABBR_MONTH + " " + YEAR, // 30. Jan. 2. Feb. 1917
DAY_NUM + "\\. " + ABBR_MONTH + " " + YEAR + " " + DAY_NUM + "\\. " + ABBR_MONTH + " " + YEAR, // 30. Dez. 1916 2. Jan. 1917
DAY_NUM + "\\. " + ABBR_MONTH + " " + YEAR, // 10. Jan. 1917 (range end == start)
"ab " + DAY_NUM + "\\. " + ABBR_MONTH + " " + YEAR) // ab 10. Jan. 1917
+ ")(?: .+)?$");
private DocumentTitleBackfillMatcher() {
}
static boolean isOverwritable(String title, String index, String location) {
if (title == null || index == null || index.isBlank()) {
return false; // fail closed
}
if (!title.startsWith(index)) {
return false; // index is matched LITERALLY, never as a regex
}
String tail = title.substring(index.length());
if (tail.isEmpty()) {
return true; // exactly {index}
}
if (!tail.startsWith(SEPARATOR)) {
return false;
}
String body = tail.substring(SEPARATOR.length());
if (DATE_LABEL_WITH_OPTIONAL_LOCATION.matcher(body).matches()) {
return true; // {dateLabel} (+ optional trailing location)
}
// No date label: the lone segment must equal the document's current location to be
// distinguished from hand-written prose.
return location != null && !location.isBlank() && body.equals(location);
}
private static String monthAlternation(String pattern) {
DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern, Locale.GERMAN);
Set<String> tokens = new LinkedHashSet<>();
for (int month = 1; month <= 12; month++) {
tokens.add(formatter.format(LocalDate.of(2000, month, 15)));
}
return tokens.stream().map(Pattern::quote).collect(Collectors.joining("|", "(?:", ")"));
}
}
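The exact month spellings come from the JDK's locale data (CLDR by default since JDK 9), so the abbreviations are worth checking on the deployed JDK rather than assumed. The sketch below reproduces the `monthAlternation` idea in a standalone, hypothetical `GermanMonthTokens` class and shows that quoting keeps an abbreviation's `.` literal.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class GermanMonthTokens {
    private GermanMonthTokens() {}

    // Collects the 12 month tokens a formatter pattern ("MMMM" full, "MMM" abbreviated) produces.
    static Set<String> tokens(String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern, Locale.GERMAN);
        Set<String> out = new LinkedHashSet<>();
        for (int month = 1; month <= 12; month++) {
            out.add(formatter.format(LocalDate.of(2000, month, 15)));
        }
        return out;
    }

    // "<month> <year>" matcher built from quoted tokens, so a "." in an abbreviation stays literal.
    static Pattern monthYear(String pattern) {
        String alternation = tokens(pattern).stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(?:", ")"));
        return Pattern.compile("^" + alternation + " \\d{1,4}$");
    }

    public static void main(String[] args) {
        System.out.println(tokens("MMMM"));
        System.out.println(tokens("MMM")); // output depends on the JDK's locale data
    }
}
```

Since the matcher derives its tokens from the same formatter the title code uses, a JDK upgrade that changes an abbreviation changes both sides together; the printout is only a way to see what that shared source currently emits.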

@@ -0,0 +1,39 @@
package org.raddatz.familienarchiv.document;
import org.springframework.stereotype.Component;
/**
* Single source of truth for the auto-generated document title
* {@code {index} {dateLabel} {location}}.
*
* <p>The {@code document} package owns this formula; {@code importing} consumes it
* (see ADR for issue #726). The leading {@code index} is the document's
* {@code originalFilename}; the date label is the honest German label produced by
* {@link DocumentTitleFormatter} (the Java half of the #666 date-label split); the
* trailing location is the {@code meta_location} verbatim, omitted when blank.
*/
@Component
public class DocumentTitleFactory {
static final String SEPARATOR = " ";
/**
* Composes the auto-title from the document's current state. The date segment is
* dropped for UNKNOWN precision or a null date (the honest "no date" case); the
* location segment is dropped when blank.
*/
public String build(Document doc) {
// originalFilename is NOT NULL in production; guard only so a synthetic/partial entity
// never trips StringBuilder(null) with an opaque NPE.
StringBuilder title = new StringBuilder(doc.getOriginalFilename() == null ? "" : doc.getOriginalFilename());
if (doc.getDocumentDate() != null && doc.getMetaDatePrecision() != DatePrecision.UNKNOWN) {
title.append(SEPARATOR).append(DocumentTitleFormatter.formatTitleDate(
doc.getDocumentDate(), doc.getMetaDatePrecision(),
doc.getMetaDateEnd(), doc.getMetaDateRaw()));
}
if (doc.getLocation() != null && !doc.getLocation().isBlank()) {
title.append(SEPARATOR).append(doc.getLocation());
}
return title.toString();
}
}
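
The composition rule in `DocumentTitleFactory.build` can be sketched without the JPA entity. The rule is: the index, then the date segment unless the date is null or the precision is UNKNOWN, then the location unless it is blank, all joined by a single space. The `TitleInput` record and the reduced `Precision` enum are hypothetical. `formatDate` is a numeric stand-in for `DocumentTitleFormatter.formatTitleDate`, whose real output is an honest German label at the stored precision.

```java
import java.time.LocalDate;

// Hypothetical sketch of the {index} {dateLabel} {location} formula.
public class TitleSketch {
    enum Precision { DAY, YEAR, UNKNOWN } // hypothetical subset of DatePrecision

    record TitleInput(String originalFilename, LocalDate date, Precision precision, String location) {}

    // Stand-in for DocumentTitleFormatter.formatTitleDate: never fabricates a day for YEAR precision.
    static String formatDate(LocalDate date, Precision precision) {
        return precision == Precision.YEAR
                ? String.valueOf(date.getYear())
                : date.getDayOfMonth() + "." + date.getMonthValue() + "." + date.getYear();
    }

    static String build(TitleInput in) {
        StringBuilder title = new StringBuilder(in.originalFilename() == null ? "" : in.originalFilename());
        if (in.date() != null && in.precision() != Precision.UNKNOWN) {
            title.append(' ').append(formatDate(in.date(), in.precision()));
        }
        if (in.location() != null && !in.location().isBlank()) {
            title.append(' ').append(in.location());
        }
        return title.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(new TitleInput("W-12", LocalDate.of(1917, 3, 3), Precision.DAY, "Berlin")));  // W-12 3.3.1917 Berlin
        System.out.println(build(new TitleInput("W-12", LocalDate.of(1917, 1, 1), Precision.YEAR, " ")));      // W-12 1917
        System.out.println(build(new TitleInput("W-12", LocalDate.of(1917, 1, 1), Precision.UNKNOWN, null)));  // W-12
    }
}
```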

@@ -1,6 +1,4 @@
package org.raddatz.familienarchiv.importing;
import org.raddatz.familienarchiv.document.DatePrecision;
package org.raddatz.familienarchiv.document;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

@@ -78,4 +78,8 @@ public class DomainException extends RuntimeException {
public static DomainException tooManyRequests(ErrorCode code, String message, long retryAfterSeconds) {
return new DomainException(code, HttpStatus.TOO_MANY_REQUESTS, message, retryAfterSeconds);
}
public static DomainException serviceUnavailable(ErrorCode code, String message) {
return new DomainException(code, HttpStatus.SERVICE_UNAVAILABLE, message);
}
}

@@ -135,6 +135,12 @@ public enum ErrorCode {
/** The merge target is a descendant of the source tag. 400 */
TAG_MERGE_INVALID_TARGET,
// --- NL Search ---
/** Ollama is unreachable or timed out. 503 */
SMART_SEARCH_UNAVAILABLE,
/** NL search rate limit exceeded (5 requests per user per minute). 429 */
SMART_SEARCH_RATE_LIMITED,
// --- Generic ---
/** Request validation failed (missing or malformed fields). 400 */
VALIDATION_ERROR,

@@ -5,6 +5,7 @@ import lombok.extern.slf4j.Slf4j;
import org.raddatz.familienarchiv.document.DatePrecision;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentTitleFactory;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.document.ThumbnailAsyncRunner;
import org.raddatz.familienarchiv.exception.DomainException;
@@ -74,6 +75,7 @@ public class DocumentImporter {
Pattern.compile("[A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF]{1,4}-+\\d+x?");
private final DocumentService documentService;
private final DocumentTitleFactory documentTitleFactory;
private final PersonService personService;
private final TagService tagService;
private final S3Client s3Client;
@@ -181,7 +183,7 @@ public class DocumentImporter {
applyAttribution(doc, row);
applyDates(doc, row);
applyAuthoritativeAssociations(doc, row);
applyFileMetadata(doc, s3Key, contentType, status, index);
applyFileMetadata(doc, s3Key, contentType, status);
applyComputedFlags(doc);
return doc;
}
@@ -217,14 +219,15 @@ public class DocumentImporter {
attachTag(doc, row.get("tags"));
}
// S3 key, content type, status, and the index-derived title.
// S3 key, content type, status, and the index-derived title. The title formula lives in
// the document package's DocumentTitleFactory (single source of truth, #726); by this point
// applyDates has populated the date/location and originalFilename carries the index.
private void applyFileMetadata(Document doc, String s3Key, String contentType,
DocumentStatus status, String index) {
DocumentStatus status) {
doc.setStatus(status);
doc.setFilePath(s3Key);
doc.setContentType(contentType);
doc.setTitle(buildTitle(index, doc.getDocumentDate(), doc.getMetaDatePrecision(),
doc.getMetaDateEnd(), doc.getMetaDateRaw(), doc.getLocation()));
doc.setTitle(documentTitleFactory.build(doc));
}
// metadataComplete: a document counts as fully described if any of the three "who/when"
@@ -235,20 +238,6 @@ public class DocumentImporter {
|| !doc.getReceivers().isEmpty());
}
// The title carries the date at the HONEST precision (never a fabricated day) via the
// shared DocumentTitleFormatter, plus the location — kept under 20 lines by delegating.
private static String buildTitle(String index, LocalDate date, DatePrecision precision,
LocalDate end, String raw, String location) {
StringBuilder title = new StringBuilder(index);
if (date != null && precision != DatePrecision.UNKNOWN) {
title.append(" ").append(DocumentTitleFormatter.formatTitleDate(date, precision, end, raw));
}
if (location != null && !location.isBlank()) {
title.append(" ").append(location);
}
return title.toString();
}
// ─── attribution routing — register-first, always retain raw ─────────────────────
private Person resolveSender(String slug, String rawName) {

@@ -29,14 +29,36 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
// Stammbaum-Knoten: alle Personen mit family_member = true.
List<Person> findByFamilyMemberTrueOrderByLastNameAscFirstNameAsc();
// Lookup by full alias string, used during ODS mass import
Optional<Person> findByAliasIgnoreCase(String alias);
// Exact-case alias lookup — the first resolution step in findOrCreateByAlias.
// Case-colliding aliases across persons (müller / Müller) are valid human labels, NOT
// duplicates: source_ref is the stable identity (ADR-025/033), alias is editable. Do NOT
// add a unique(lower(alias)) constraint — see ADR-033.
Optional<Person> findByAlias(String alias);
// Plural case-insensitive alias lookup — the fallback step. Returns ALL case-folding
// siblings so the service can pick a deterministic one (lowest id) instead of letting a
// derived Optional<…>IgnoreCase throw NonUniqueResultException. See ADR-033.
List<Person> findAllByAliasIgnoreCase(String alias);
// Lookup by the normalizer person_id, used for idempotent canonical re-import (Phase 3).
Optional<Person> findBySourceRef(String sourceRef);
// Exact first+last name match, used for filename-based sender lookup
Optional<Person> findByFirstNameIgnoreCaseAndLastNameIgnoreCase(String firstName, String lastName);
// Exact-case first+last name match — the first step of filename-based sender resolution.
// Explicit `=` (HQL, not a derived query) so a null firstName binds as `first_name = NULL`
// — never a match — instead of the derived-query fold to `first_name IS NULL`, which would
// pull a last-name-only row in as a sender (a provenance defect). See ADR-033.
@Query("SELECT p FROM Person p WHERE p.firstName = :firstName AND p.lastName = :lastName")
Optional<Person> findByFirstNameAndLastName(@Param("firstName") String firstName,
@Param("lastName") String lastName);
// Plural case-insensitive first+last name match — lets findByName bail to empty on 2+ matches
// instead of letting a derived Optional<…>IgnoreCase throw NonUniqueResultException. Same
// null fail-closed guarantee as above: LOWER(:firstName) is NULL for a null arg, so a null
// first name resolves to no match (not first_name IS NULL widening). See ADR-033.
@Query("SELECT p FROM Person p WHERE LOWER(p.firstName) = LOWER(:firstName) "
+ "AND LOWER(p.lastName) = LOWER(:lastName)")
List<Person> findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(@Param("firstName") String firstName,
@Param("lastName") String lastName);
// --- PersonSummaryDTO with document count ---
@@ -189,18 +211,15 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
List<Person> findCorrespondentsWithFilter(@Param("personId") UUID personId, @Param("q") String q);
// --- Merge helpers (native SQL to bypass JPA entity layer) ---
// clearAutomatically + flushAutomatically keep the L1 cache from desyncing: these bulk
// updates run beneath Hibernate, and mergePersons follows them with a deleteById whose
// ON DELETE CASCADE (V71) also fires beneath the session.
@Modifying
@Modifying(clearAutomatically = true, flushAutomatically = true)
@Query(value = "UPDATE documents SET sender_id = :target WHERE sender_id = :source", nativeQuery = true)
void reassignSender(@Param("source") UUID source, @Param("target") UUID target);
// Used by deletePerson: detach a deleted person from documents they sent, so the hard
// delete cannot orphan a documents.sender_id FK (the column is nullable).
@Modifying
@Query(value = "UPDATE documents SET sender_id = NULL WHERE sender_id = :source", nativeQuery = true)
void reassignSenderToNull(@Param("source") UUID source);
@Modifying
@Modifying(clearAutomatically = true, flushAutomatically = true)
@Query(value = """
INSERT INTO document_receivers (document_id, person_id)
SELECT document_id, :target FROM document_receivers
@@ -210,8 +229,4 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
)
""", nativeQuery = true)
void insertMissingReceiverReference(@Param("source") UUID source, @Param("target") UUID target);
@Modifying
@Query(value = "DELETE FROM document_receivers WHERE person_id = :source", nativeQuery = true)
void deleteReceiverReferences(@Param("source") UUID source);
}
}

@@ -1,5 +1,6 @@
package org.raddatz.familienarchiv.person;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
@@ -68,15 +69,13 @@ public class PersonService {
}
/**
* Hard-deletes a person used by triage. Detaches the person from any documents they
* sent (nulls sender_id) and from any received-document references first, so the delete
* cannot orphan an FK and fail with a 500.
* Hard-deletes a person used by triage. Referential integrity is enforced by the database
* (V71's {@code ON DELETE} constraints: sender_id {@code SET NULL}, receiver and @-mention
* rows {@code CASCADE}), so the service stays thin — it only verifies existence then deletes.
*/
@Transactional
public void deletePerson(UUID id) {
getById(id);
personRepository.reassignSenderToNull(id);
personRepository.deleteReceiverReferences(id);
personRepository.deleteById(id);
}
@@ -100,6 +99,10 @@ public class PersonService {
return personRepository.findAllById(ids);
}
public List<Person> findByDisplayNameContaining(String fragment) {
return personRepository.searchByName(fragment);
}
public List<Person> findAllFamilyMembers() {
return personRepository.findByFamilyMemberTrueOrderByLastNameAscFirstNameAsc();
}
@@ -112,7 +115,19 @@ public class PersonService {
}
public Optional<Person> findByName(String firstName, String lastName) {
return personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase(firstName, lastName);
// Same scope as findOrCreateByAlias (#731): a case-collision resolves without throwing;
// two byte-identical same-case persons are an out-of-scope data anomaly the exact
// Optional below would surface as the opaque INTERNAL_ERROR, not a wrong sender.
Optional<Person> exact = personRepository.findByFirstNameAndLastName(firstName, lastName);
if (exact.isPresent()) return exact;
List<Person> caseInsensitive =
personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(firstName, lastName);
// Deliberate divergence from findOrCreateByAlias: an ambiguous filename leaves the sender
// UNSET rather than picking the lowest id. The archive's value is correct provenance — a
// confidently-wrong pre-filled "Hans Müller" is worse than an empty field, because a
// reviewer won't re-check a pre-filled value. Do NOT "consistency-clean" this into the
// lowest-id fallback. See ADR-033.
return caseInsensitive.size() == 1 ? Optional.of(caseInsensitive.get(0)) : Optional.empty();
}
/** Lookup by the normalizer person_id — used by the canonical importer for register-first matching. */
@@ -127,32 +142,45 @@ public class PersonService {
PersonType type = PersonTypeClassifier.classify(alias);
if (type == PersonType.SKIP) return null;
return personRepository.findByAliasIgnoreCase(alias).orElseGet(() -> {
if (type == PersonType.INSTITUTION || type == PersonType.GROUP) {
return personRepository.save(Person.builder()
.alias(alias)
.lastName(alias)
.personType(type)
.build());
}
// Aliases differing only by case (müller / Müller) are valid distinct persons, not
// duplicates, so a CASE-COLLISION must not throw: exact-case first, then the lowest-id
// case-insensitive sibling, then create. Mirrors the tag path — see ADR-033.
// Scope (#731): "ambiguous" means case-insensitive. Two BYTE-IDENTICAL same-case aliases
// are a true data anomaly out of scope here; the exact Optional below would surface that
// as the opaque INTERNAL_ERROR (never a wrong row), not silently pick one.
Optional<Person> exact = personRepository.findByAlias(alias);
if (exact.isPresent()) return exact.get(); // exact-case wins
List<Person> caseInsensitive = personRepository.findAllByAliasIgnoreCase(alias);
if (!caseInsensitive.isEmpty()) {
return caseInsensitive.stream().min(Comparator.comparing(Person::getId)).orElseThrow(); // deterministic tie-break — list is non-empty, never throws
}
PersonNameParser.SplitName split = PersonNameParser.split(alias);
Person person = personRepository.save(Person.builder()
// Create-when-absent: institution/group keep the full label in lastName; a person name
// is split and a maiden name (geb. …) becomes a MAIDEN_NAME alias.
if (type == PersonType.INSTITUTION || type == PersonType.GROUP) {
return personRepository.save(Person.builder()
.alias(alias)
.firstName(split.firstName())
.lastName(split.lastName())
.lastName(alias)
.personType(type)
.build());
if (split.maidenName() != null) {
int nextSortOrder = aliasRepository.findMaxSortOrder(person.getId()) + 1;
aliasRepository.save(PersonNameAlias.builder()
.person(person)
.lastName(split.maidenName())
.type(PersonNameAliasType.MAIDEN_NAME)
.sortOrder(nextSortOrder)
.build());
}
return person;
});
}
PersonNameParser.SplitName split = PersonNameParser.split(alias);
Person person = personRepository.save(Person.builder()
.alias(alias)
.firstName(split.firstName())
.lastName(split.lastName())
.build());
if (split.maidenName() != null) {
int nextSortOrder = aliasRepository.findMaxSortOrder(person.getId()) + 1;
aliasRepository.save(PersonNameAlias.builder()
.person(person)
.lastName(split.maidenName())
.type(PersonNameAliasType.MAIDEN_NAME)
.sortOrder(nextSortOrder)
.build());
}
return person;
}
/**
@@ -295,6 +323,12 @@ public class PersonService {
return personRepository.save(person);
}
/**
* Merges the source person into the target, then deletes the source. Sender references move
* to the target; receiver references the target lacks are inserted. The source's leftover
* receiver join rows are not deleted explicitly — they cascade-drop via V71's
* {@code ON DELETE CASCADE} on {@code document_receivers.person_id} when the source is deleted.
*/
@Transactional
public void mergePersons(UUID sourceId, UUID targetId) {
if (sourceId.equals(targetId)) {
@@ -311,9 +345,7 @@ public class PersonService {
// Add target as receiver where source is receiver but target is not yet
personRepository.insertMissingReceiverReference(sourceId, targetId);
// Remove all remaining source receiver references (duplicates already handled)
personRepository.deleteReceiverReferences(sourceId);
// Source's remaining receiver rows cascade-drop via V71's ON DELETE CASCADE.
personRepository.deleteById(sourceId);
}

@@ -20,8 +20,8 @@ Features: person CRUD, name alias management, person merge (deduplication), fami
| `getById(UUID)` | document, geschichte, ocr | Fetch one person by ID |
| `getAllById(List<UUID>)` | document | Bulk fetch for sender/receiver resolution |
| `findAll(String q)` | document, dashboard | List all persons |
| `findByName(String firstName, String lastName)` | document | Typeahead search |
| `findOrCreateByAlias(String rawName)` | importing | Idempotent create during mass import; type classification happens internally |
| `findByName(String firstName, String lastName)` | document | Filename-based **sender resolution** in `storeDocument`: exact-case match → single case-insensitive match → else **empty** (ambiguous names leave the sender unset; a null first name never matches). See ADR-033. |
| `findOrCreateByAlias(String rawName)` | importing | Idempotent create during mass import; type classification happens internally. Resolves exact-case → lowest-id case-insensitive sibling → create — never throws on case-colliding aliases. See ADR-033. |
| `findAllFamilyMembers()` | dashboard | Family member list for stats |
| `findCorrespondents()` | document | Correspondent list for conversation filter |
| `count()` | dashboard | Total person count for stats |
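
The two resolution orders in the table differ mainly in what happens when the case-insensitive step returns more than one row. `findByName` gives up and returns empty, while the alias lookup picks the lowest id. Below is an in-memory sketch of both lookups. It assumes a hypothetical `P` record in place of `Person` and list scans in place of the repository queries. `sqlEq` models the SQL `=` semantics that ADR-033 relies on, under which a null argument never matches.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.UUID;

// Hypothetical in-memory model of the two lookup orders; P stands in for Person.
public class ResolutionSketch {
    record P(UUID id, String firstName, String lastName, String alias) {}

    // SQL-style "=": NULL never equals anything, so a null argument fails closed.
    static boolean sqlEq(String a, String b) {
        return a != null && b != null && a.equals(b);
    }

    // Approximates LOWER(a) = LOWER(b); Java and PostgreSQL lower-casing may differ on edge cases.
    static boolean sqlEqIgnoreCase(String a, String b) {
        return a != null && b != null && a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    // findByName: exact-case, then exactly one case-insensitive match, else empty.
    static Optional<P> findByName(List<P> all, String first, String last) {
        Optional<P> exact = all.stream()
                .filter(p -> sqlEq(p.firstName(), first) && sqlEq(p.lastName(), last))
                .findFirst();
        if (exact.isPresent()) return exact;
        List<P> ci = all.stream()
                .filter(p -> sqlEqIgnoreCase(p.firstName(), first) && sqlEqIgnoreCase(p.lastName(), last))
                .toList();
        return ci.size() == 1 ? Optional.of(ci.get(0)) : Optional.empty(); // ambiguous: leave sender unset
    }

    // Lookup half of findOrCreateByAlias: exact-case, then the lowest-id case-insensitive sibling.
    static Optional<P> findByAlias(List<P> all, String alias) {
        Optional<P> exact = all.stream().filter(p -> sqlEq(p.alias(), alias)).findFirst();
        if (exact.isPresent()) return exact;
        return all.stream()
                .filter(p -> sqlEqIgnoreCase(p.alias(), alias))
                .min(Comparator.comparing(P::id)); // deterministic tie-break via UUID.compareTo
    }
}
```

`UUID.compareTo` compares the two 64-bit halves as signed longs. The "lowest id" rule is therefore a stable tie-break, but for random UUIDs it says nothing about which person was created first.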

@@ -0,0 +1,22 @@
package org.raddatz.familienarchiv.search;
import io.swagger.v3.oas.annotations.media.Schema;
import java.time.LocalDate;
import java.util.List;
public record NlQueryInterpretation(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<PersonHint> resolvedPersons,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<PersonHint> ambiguousPersons,
LocalDate dateFrom,
LocalDate dateTo,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<String> keywords,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
String rawQuery,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
boolean keywordsApplied
) {
}

@@ -0,0 +1,160 @@
package org.raddatz.familienarchiv.search;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.raddatz.familienarchiv.document.DocumentSearchResult;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentSort;
import org.raddatz.familienarchiv.document.SearchFilters;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonService;
import org.raddatz.familienarchiv.tag.TagOperator;
import org.springframework.data.domain.Pageable;
import org.springframework.stereotype.Service;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
@Service
@RequiredArgsConstructor
@Slf4j
public class NlQueryParserService {
private static final int MIN_QUERY = 3;
private static final int MAX_QUERY = 500;
private static final int MAX_NAME_LENGTH = 200;
private static final int MAX_CANDIDATES = 10;
private final OllamaClient ollamaClient;
private final PersonService personService;
private final DocumentService documentService;
public NlSearchResponse search(String query, Pageable pageable) {
if (query == null || query.length() < MIN_QUERY || query.length() > MAX_QUERY) {
throw DomainException.badRequest(ErrorCode.VALIDATION_ERROR,
"Query must be between " + MIN_QUERY + " and " + MAX_QUERY + " characters");
}
OllamaExtraction ext = ollamaClient.parse(query);
List<String> personNames = ext.personNames() != null ? ext.personNames() : List.of();
List<String> keywords = ext.keywords() != null ? ext.keywords() : List.of();
NameResolution resolution = resolveNames(personNames);
if (!resolution.ambiguous().isEmpty()) {
NlQueryInterpretation interpretation = new NlQueryInterpretation(
List.of(), resolution.ambiguous(),
ext.dateFrom(), ext.dateTo(),
keywords, ext.rawQuery(), false);
return new NlSearchResponse(DocumentSearchResult.of(List.of()), interpretation);
}
List<PersonHint> resolved = resolution.resolved();
List<String> noMatchFragments = resolution.noMatchFragments();
List<String> extraFragments = resolution.extraFragments();
String text = buildText(keywords, noMatchFragments, extraFragments, ext.rawQuery());
if (resolved.size() == 1 && isAnyRole(ext.personRole())) {
UUID personId = resolved.get(0).id();
DocumentSearchResult docs = documentService.searchDocumentsByPersonId(
personId, ext.dateFrom(), ext.dateTo(), pageable);
NlQueryInterpretation interpretation = new NlQueryInterpretation(
resolved, List.of(), ext.dateFrom(), ext.dateTo(), keywords, ext.rawQuery(), false);
return new NlSearchResponse(docs, interpretation);
}
UUID sender = buildSender(resolved, ext.personRole());
UUID receiver = buildReceiver(resolved, ext.personRole());
SearchFilters filters = new SearchFilters(
text.isBlank() ? null : text,
ext.dateFrom(), ext.dateTo(),
sender, receiver,
List.of(), null,
null, TagOperator.AND, false);
DocumentSearchResult docs = documentService.searchDocuments(filters, DocumentSort.DATE, "desc", pageable);
boolean keywordsApplied = !text.isBlank();
NlQueryInterpretation interpretation = new NlQueryInterpretation(
resolved, List.of(), ext.dateFrom(), ext.dateTo(), keywords, ext.rawQuery(), keywordsApplied);
return new NlSearchResponse(docs, interpretation);
}
private NameResolution resolveNames(List<String> personNames) {
List<PersonHint> resolved = new ArrayList<>();
List<PersonHint> ambiguous = new ArrayList<>();
List<String> noMatchFragments = new ArrayList<>();
List<String> extraFragments = new ArrayList<>();
int resolvedIndex = 0;
for (String name : personNames) {
if (name == null || name.length() > MAX_NAME_LENGTH) {
log.debug("Skipping name fragment (too long or null): length={}", name == null ? 0 : name.length());
continue;
}
List<Person> candidates = personService.findByDisplayNameContaining(name);
List<Person> capped = candidates.size() > MAX_CANDIDATES
? candidates.subList(0, MAX_CANDIDATES)
: candidates;
if (capped.isEmpty()) {
noMatchFragments.add(name);
} else if (capped.size() == 1) {
Person p = capped.get(0);
PersonHint hint = new PersonHint(p.getId(), p.getDisplayName());
resolvedIndex++;
if (resolvedIndex <= 2) {
resolved.add(hint);
} else {
extraFragments.add(name);
}
} else {
capped.forEach(p -> ambiguous.add(new PersonHint(p.getId(), p.getDisplayName())));
}
}
return new NameResolution(resolved, ambiguous, noMatchFragments, extraFragments);
}
private String buildText(List<String> keywords, List<String> noMatchFragments,
List<String> extraFragments, String rawQuery) {
List<String> parts = new ArrayList<>();
parts.addAll(keywords);
parts.addAll(noMatchFragments);
parts.addAll(extraFragments);
String text = String.join(" ", parts).strip();
if (text.isBlank() && rawQuery != null && !rawQuery.isBlank()) {
return rawQuery;
}
return text;
}
private boolean isAnyRole(String role) {
return role == null || "any".equals(role) || (!"sender".equals(role) && !"receiver".equals(role));
}
private UUID buildSender(List<PersonHint> resolved, String role) {
if (resolved.size() >= 2) return resolved.get(0).id();
if (resolved.size() == 1 && "sender".equals(role)) return resolved.get(0).id();
return null;
}
private UUID buildReceiver(List<PersonHint> resolved, String role) {
if (resolved.size() >= 2) return resolved.get(1).id();
if (resolved.size() == 1 && "receiver".equals(role)) return resolved.get(0).id();
return null;
}
private record NameResolution(
List<PersonHint> resolved,
List<PersonHint> ambiguous,
List<String> noMatchFragments,
List<String> extraFragments
) {}
}
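
Once ambiguous names have short-circuited to an empty result, the routing in `search`, `buildSender` and `buildReceiver` reduces to a small decision table:

- One resolved person with an "any" role goes to the person-ID search.
- Two resolved persons become sender and receiver in that order, whatever the role.
- One person with an explicit role fills only that slot.
- No resolved person leaves both slots empty, which means a plain text search.

The sketch below covers that table only. The `Route` record and class name are hypothetical. The real service also carries dates, keywords and paging.

```java
import java.util.List;
import java.util.UUID;

// Hypothetical condensation of the person routing in NlQueryParserService.search.
public class RoutingSketch {
    record Route(boolean byPersonId, UUID sender, UUID receiver) {}

    // Equivalent to isAnyRole(): null, "any" or any unexpected value counts as "any".
    static boolean isAnyRole(String role) {
        return !"sender".equals(role) && !"receiver".equals(role);
    }

    static UUID sender(List<UUID> resolved, String role) {
        if (resolved.size() >= 2) return resolved.get(0); // first resolved name is the sender
        if (resolved.size() == 1 && "sender".equals(role)) return resolved.get(0);
        return null;
    }

    static UUID receiver(List<UUID> resolved, String role) {
        if (resolved.size() >= 2) return resolved.get(1); // second resolved name is the receiver
        if (resolved.size() == 1 && "receiver".equals(role)) return resolved.get(0);
        return null;
    }

    static Route route(List<UUID> resolved, String role) {
        if (resolved.size() == 1 && isAnyRole(role)) {
            return new Route(true, null, null); // person-ID search, person in any role
        }
        return new Route(false, sender(resolved, role), receiver(resolved, role));
    }
}
```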

@@ -0,0 +1,28 @@
package org.raddatz.familienarchiv.search;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import org.raddatz.familienarchiv.security.Permission;
import org.raddatz.familienarchiv.security.RequirePermission;
import org.springframework.data.domain.Pageable;
import org.springframework.security.core.annotation.AuthenticationPrincipal;
import org.springframework.security.core.userdetails.UserDetails;
import org.springframework.web.bind.annotation.*;
@RestController
@RequestMapping("/api/search/nl")
@RequiredArgsConstructor
public class NlSearchController {
private final NlQueryParserService nlQueryParserService;
private final NlSearchRateLimiter rateLimiter;
@PostMapping
@RequirePermission(Permission.READ_ALL)
public NlSearchResponse search(@Valid @RequestBody NlSearchRequest request,
Pageable pageable,
@AuthenticationPrincipal UserDetails principal) {
rateLimiter.checkAndConsume(principal.getUsername());
return nlQueryParserService.search(request.query(), pageable);
}
}

@@ -0,0 +1,12 @@
package org.raddatz.familienarchiv.search;
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;
@Component
@ConfigurationProperties("app.nl-search.rate-limit")
@Data
public class NlSearchRateLimitProperties {
private int maxRequestsPerMinute = 5;
}

@@ -0,0 +1,46 @@
package org.raddatz.familienarchiv.search;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import org.springframework.stereotype.Service;
import java.time.Duration;
import java.util.concurrent.TimeUnit;
@Service
public class NlSearchRateLimiter {
private final LoadingCache<String, Bucket> byUser;
private final int maxRequestsPerMinute;
public NlSearchRateLimiter(NlSearchRateLimitProperties props) {
this.maxRequestsPerMinute = props.getMaxRequestsPerMinute();
this.byUser = Caffeine.newBuilder()
.expireAfterAccess(1, TimeUnit.MINUTES)
.build(key -> newBucket(maxRequestsPerMinute));
}
public void checkAndConsume(String userKey) {
if (!byUser.get(userKey).tryConsume(1)) {
throw DomainException.tooManyRequests(ErrorCode.SMART_SEARCH_RATE_LIMITED,
"NL search rate limit exceeded for user: " + userKey, 60L);
}
}
void resetForTest() {
byUser.invalidateAll();
}
private static Bucket newBucket(int limit) {
return Bucket.builder()
.addLimit(Bandwidth.builder()
.capacity(limit)
.refillGreedy(limit, Duration.ofMinutes(1))
.build())
.build();
}
}
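
With `refillGreedy(limit, Duration.ofMinutes(1))`, Bucket4j adds tokens as soon as possible rather than all at once at the end of the minute. With the default limit of 5, that is one token every 12 seconds. Below is a dependency-free model of that behaviour. It assumes a caller-supplied nanosecond clock so it can be tested deterministically. `GreedyBucket` is hypothetical and is not how Bucket4j is implemented internally.

```java
import java.time.Duration;

// Hypothetical dependency-free model of a greedy-refill bandwidth.
public class GreedyBucket {
    private final long capacity;
    private final long periodNanos;
    private long scaledTokens; // tokens * periodNanos, so refill math stays integral
    private long lastNanos;

    public GreedyBucket(long capacity, Duration period, long nowNanos) {
        this.capacity = capacity;
        this.periodNanos = period.toNanos();
        this.scaledTokens = capacity * periodNanos; // a new bucket starts full
        this.lastNanos = nowNanos;
    }

    private void refill(long nowNanos) {
        // One full period refills an empty bucket completely, so cap the elapsed time there.
        long elapsed = Math.min(nowNanos - lastNanos, periodNanos);
        if (elapsed > 0) {
            scaledTokens = Math.min(capacity * periodNanos, scaledTokens + elapsed * capacity);
            lastNanos = nowNanos;
        }
    }

    public boolean tryConsume(long nowNanos) {
        refill(nowNanos);
        if (scaledTokens >= periodNanos) { // at least one whole token available
            scaledTokens -= periodNanos;
            return true;
        }
        return false;
    }
}
```

A related design point: `expireAfterAccess(1, TimeUnit.MINUTES)` evicts a user's bucket only after a full idle minute. By then a greedy bucket with a one-minute period would have refilled completely anyway, so the fresh full bucket that replaces it should not grant any extra requests.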

@@ -0,0 +1,11 @@
package org.raddatz.familienarchiv.search;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
public record NlSearchRequest(
@NotBlank
@Size(min = 3, max = 500)
String query
) {
}

@@ -0,0 +1,12 @@
package org.raddatz.familienarchiv.search;
import io.swagger.v3.oas.annotations.media.Schema;
import org.raddatz.familienarchiv.document.DocumentSearchResult;
public record NlSearchResponse(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
DocumentSearchResult result,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
NlQueryInterpretation interpretation
) {
}

@@ -0,0 +1,5 @@
package org.raddatz.familienarchiv.search;
public interface OllamaClient {
OllamaExtraction parse(String query);
}

@@ -0,0 +1,18 @@
package org.raddatz.familienarchiv.search;
import java.time.LocalDate;
import java.util.List;
/**
* Raw structured output from Ollama after parsing and sanitising.
* personRole is always one of "sender", "receiver", "any" — defensive parsing ensures this.
*/
record OllamaExtraction(
List<String> personNames,
String personRole,
LocalDate dateFrom,
LocalDate dateTo,
List<String> keywords,
String rawQuery
) {
}
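
The "always one of sender, receiver, any" guarantee in the `OllamaExtraction` javadoc comes from `sanitiseRole` in `RestClientOllamaClient`. The same class also normalises dates:

- A full ISO date passes through unchanged.
- A bare year strictly between 1000 and 3000 widens to 1 January for `dateFrom` or 31 December for `dateTo`.
- Anything else becomes null.

Below is a standalone sketch of those two helpers with logging omitted. The class name is hypothetical. The null check in `sanitiseRole` is needed because calling `contains(null)` on a `Set.of(...)` set would throw a `NullPointerException`.

```java
import java.time.LocalDate;
import java.time.Year;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Set;

// Hypothetical standalone copy of the role and date sanitising helpers.
public class ExtractionSanitiserSketch {
    static final Set<String> VALID_ROLES = Set.of("sender", "receiver", "any");

    static String sanitiseRole(String role) {
        return role != null && VALID_ROLES.contains(role) ? role : "any";
    }

    static LocalDate parseDate(String raw, boolean isFrom) {
        if (raw == null || raw.isBlank()) return null;
        try {
            return LocalDate.parse(raw, DateTimeFormatter.ISO_LOCAL_DATE); // e.g. "1917-03-03"
        } catch (DateTimeParseException ignored) {
            // not a full ISO date: try the year-only form
        }
        try {
            int year = Integer.parseInt(raw.strip());
            if (year > 1000 && year < 3000) {
                // Widen a bare year to its first or last day depending on which bound it is.
                return isFrom ? Year.of(year).atDay(1) : Year.of(year).atMonth(12).atEndOfMonth();
            }
        } catch (NumberFormatException ignored) {
            // neither form: drop the bound
        }
        return null;
    }
}
```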

@@ -0,0 +1,5 @@
package org.raddatz.familienarchiv.search;
public interface OllamaHealthClient {
boolean isHealthy();
}

@@ -0,0 +1,15 @@
package org.raddatz.familienarchiv.search;
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;
@Component
@ConfigurationProperties("app.ollama")
@Data
public class OllamaProperties {
private String baseUrl;
private String model;
private int timeoutSeconds = 30;
private int healthCheckTimeoutSeconds = 2;
}
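
`OllamaProperties` carries two timeouts because `RestClientOllamaClient` builds two clients. Inference may legitimately take up to `timeoutSeconds` (30 s by default), while the `/api/tags` health probe should give up after `healthCheckTimeoutSeconds` (2 s). Below is a JDK-only sketch of the same split using `java.net.http`. The per-request `HttpRequest` timeout stands in, roughly, for the read timeout the original sets via `JdkClientHttpRequestFactory.setReadTimeout`. The class name and the base URL `http://ollama:11434` are assumptions (11434 is Ollama's default port). The JSON body is left opaque, and no request is sent.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical JDK-only sketch of separate inference and health-check timeouts.
public class OllamaTimeoutsSketch {
    // Inference uses a 10 s connect timeout; the health client reuses its short probe timeout.
    static HttpClient client(Duration connectTimeout) {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .connectTimeout(connectTimeout)
                .build();
    }

    static HttpRequest generateRequest(String baseUrl, String jsonBody, int timeoutSeconds) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/generate"))
                .timeout(Duration.ofSeconds(timeoutSeconds)) // long: model inference
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    static HttpRequest healthRequest(String baseUrl, int healthCheckTimeoutSeconds) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/tags"))
                .timeout(Duration.ofSeconds(healthCheckTimeoutSeconds)) // short: fail fast
                .GET()
                .build();
    }
}
```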

@@ -0,0 +1,13 @@
package org.raddatz.familienarchiv.search;
import io.swagger.v3.oas.annotations.media.Schema;
import java.util.UUID;
public record PersonHint(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
UUID id,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
String displayName
) {
}

@@ -0,0 +1,184 @@
package org.raddatz.familienarchiv.search;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.extern.slf4j.Slf4j;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import org.springframework.http.client.JdkClientHttpRequestFactory;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestClient;
import org.springframework.web.client.RestClientException;
import java.net.http.HttpClient;
import java.time.Duration;
import java.time.LocalDate;
import java.time.Year;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Map;
import java.util.Set;
@Service
@Slf4j
public class RestClientOllamaClient implements OllamaClient, OllamaHealthClient {
private static final ObjectMapper MAPPER = new ObjectMapper();
private static final Set<String> VALID_ROLES = Set.of("sender", "receiver", "any");
private static final int MAX_NAME_LENGTH = 200;
private static final int MAX_KEYWORD_LENGTH = 100;
private static final Map<String, Object> JSON_SCHEMA = Map.of(
"type", "object",
"required", List.of("personNames", "personRole", "keywords"),
"properties", Map.of(
"personNames", Map.of("type", "array", "items", Map.of("type", "string", "maxLength", MAX_NAME_LENGTH)),
"personRole", Map.of("type", "string", "enum", List.of("sender", "receiver", "any")),
"dateFrom", Map.of("type", List.of("string", "null"), "maxLength", 20),
"dateTo", Map.of("type", List.of("string", "null"), "maxLength", 20),
"keywords", Map.of("type", "array", "items", Map.of("type", "string", "maxLength", MAX_KEYWORD_LENGTH))
)
);
private final RestClient inferenceClient;
private final RestClient healthClient;
private final OllamaProperties props;
public RestClientOllamaClient(OllamaProperties props) {
this.props = props;
HttpClient inferenceHttp = HttpClient.newBuilder()
.version(HttpClient.Version.HTTP_1_1)
.connectTimeout(Duration.ofSeconds(10))
.build();
JdkClientHttpRequestFactory inferenceFactory = new JdkClientHttpRequestFactory(inferenceHttp);
inferenceFactory.setReadTimeout(Duration.ofSeconds(props.getTimeoutSeconds()));
this.inferenceClient = RestClient.builder()
.baseUrl(props.getBaseUrl())
.requestFactory(inferenceFactory)
.build();
HttpClient healthHttp = HttpClient.newBuilder()
.version(HttpClient.Version.HTTP_1_1)
.connectTimeout(Duration.ofSeconds(props.getHealthCheckTimeoutSeconds()))
.build();
JdkClientHttpRequestFactory healthFactory = new JdkClientHttpRequestFactory(healthHttp);
healthFactory.setReadTimeout(Duration.ofSeconds(props.getHealthCheckTimeoutSeconds()));
this.healthClient = RestClient.builder()
.baseUrl(props.getBaseUrl())
.requestFactory(healthFactory)
.build();
}
@Override
public OllamaExtraction parse(String query) {
try {
OllamaGenerateRequest request = new OllamaGenerateRequest(
props.getModel(), query, JSON_SCHEMA, false);
String responseBody = inferenceClient.post()
.uri("/api/generate")
.contentType(org.springframework.http.MediaType.APPLICATION_JSON)
.body(request)
.retrieve()
.body(String.class);
return parseOllamaResponse(responseBody, query);
} catch (DomainException e) {
throw e;
} catch (Exception e) {
log.warn("Ollama inference failed: {}", e.getClass().getSimpleName());
throw DomainException.serviceUnavailable(ErrorCode.SMART_SEARCH_UNAVAILABLE,
"Ollama unavailable: " + e.getClass().getSimpleName());
}
}
@Override
public boolean isHealthy() {
try {
healthClient.get().uri("/api/tags").retrieve().toBodilessEntity();
return true;
} catch (Exception e) {
return false;
}
}
private OllamaExtraction parseOllamaResponse(String responseBody, String rawQuery) {
try {
OllamaGenerateResponse response = MAPPER.readValue(responseBody, OllamaGenerateResponse.class);
String inner = response.response();
if (inner == null || inner.isBlank()) {
return fallbackExtraction(rawQuery);
}
RawOllamaOutput raw = MAPPER.readValue(inner, RawOllamaOutput.class);
return toExtraction(raw, rawQuery);
} catch (Exception e) {
log.warn("Failed to parse Ollama response: {}", e.getClass().getSimpleName());
throw DomainException.serviceUnavailable(ErrorCode.SMART_SEARCH_UNAVAILABLE,
"Failed to parse Ollama response: " + e.getClass().getSimpleName());
}
}
private OllamaExtraction toExtraction(RawOllamaOutput raw, String rawQuery) {
List<String> names = raw.personNames() == null ? List.of() : raw.personNames().stream()
.filter(n -> n != null && n.length() <= MAX_NAME_LENGTH)
.toList();
List<String> keywords = raw.keywords() == null ? List.of() : raw.keywords().stream()
.filter(k -> k != null && k.length() <= MAX_KEYWORD_LENGTH)
.toList();
String role = sanitiseRole(raw.personRole());
LocalDate dateFrom = parseDate(raw.dateFrom(), true);
LocalDate dateTo = parseDate(raw.dateTo(), false);
return new OllamaExtraction(names, role, dateFrom, dateTo, keywords, rawQuery);
}
private OllamaExtraction fallbackExtraction(String rawQuery) {
return new OllamaExtraction(List.of(), "any", null, null, List.of(), rawQuery);
}
private String sanitiseRole(String role) {
if (role != null && VALID_ROLES.contains(role)) {
return role;
}
log.warn("Unexpected personRole from Ollama: {}", role);
return "any";
}
private LocalDate parseDate(String raw, boolean isFrom) {
if (raw == null || raw.isBlank()) return null;
try {
return LocalDate.parse(raw, DateTimeFormatter.ISO_LOCAL_DATE);
} catch (DateTimeParseException ignored) {
}
try {
int year = Integer.parseInt(raw.strip());
if (year > 1000 && year < 3000) {
return isFrom ? Year.of(year).atDay(1) : Year.of(year).atMonth(12).atEndOfMonth();
}
} catch (NumberFormatException ignored) {
}
return null;
}
@JsonIgnoreProperties(ignoreUnknown = true)
private record OllamaGenerateResponse(String response) {
}
@JsonIgnoreProperties(ignoreUnknown = true)
private record RawOllamaOutput(
@JsonProperty("personNames") List<String> personNames,
@JsonProperty("personRole") String personRole,
@JsonProperty("dateFrom") String dateFrom,
@JsonProperty("dateTo") String dateTo,
@JsonProperty("keywords") List<String> keywords
) {
}
private record OllamaGenerateRequest(
String model,
String prompt,
Object format,
boolean stream
) {
}
}
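The year fallback in `parseDate` widens a bare year into a whole-year bound. A lower bound becomes 1 January and an upper bound becomes 31 December, so a range ending in "1918" still includes letters written late that year. The repository test `searchByPersonSpec_excludesDocuments_outsideDateRange` exercises bounds of the same shape (1914-01-01 to 1918-12-31). The standalone sketch below restates that logic outside Spring so the widening can be run directly. `DateFallbackSketch` and `parseBound` are hypothetical names. The accepted year window (1001 to 2999) is copied from the method above.

```java
import java.time.LocalDate;
import java.time.Year;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class DateFallbackSketch {

    // Hypothetical standalone copy of the parseDate idea: ISO date first, then a bare year.
    static LocalDate parseBound(String raw, boolean isFrom) {
        if (raw == null || raw.isBlank()) return null;
        try {
            return LocalDate.parse(raw, DateTimeFormatter.ISO_LOCAL_DATE);
        } catch (DateTimeParseException ignored) {
            // not a full ISO date; fall through to the bare-year form
        }
        try {
            int year = Integer.parseInt(raw.strip());
            if (year > 1000 && year < 3000) {
                // lower bound -> 1 January, upper bound -> 31 December of that year
                return isFrom ? Year.of(year).atDay(1) : Year.of(year).atMonth(12).atEndOfMonth();
            }
        } catch (NumberFormatException ignored) {
            // neither form: no bound at all
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parseBound("1914", true));        // 1914-01-01
        System.out.println(parseBound("1918", false));       // 1918-12-31
        System.out.println(parseBound("1916-06-15", false)); // 1916-06-15
        System.out.println(parseBound("Sommer 1916", true)); // null
    }
}
```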

@@ -20,7 +20,14 @@ public interface TagRepository extends JpaRepository<Tag, UUID> {
}
Optional<Tag> findByNameIgnoreCase(String name);
// Tag-name resolution (see TagService.findOrCreate). Names that collide case-insensitively across
// the canonical tree are VALID — a parent and its same-named lowercase child (e.g. "Geburt" /
// "Geburt/geburt") are distinct nodes with their own source_ref and document attachments. So
// resolution must be exact-case first, then a non-throwing list for the case-insensitive fallback.
// Do NOT add a unique(lower(name)) constraint — it would reject these legitimate rows. See #730.
Optional<Tag> findByName(String name);
List<Tag> findAllByNameIgnoreCase(String name);
// Lookup by the canonical tag_path, used for idempotent canonical re-import (Phase 3).
Optional<Tag> findBySourceRef(String sourceRef);

@@ -2,6 +2,7 @@ package org.raddatz.familienarchiv.tag;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
@@ -55,10 +56,21 @@ public class TagService {
return tagRepository.findBySourceRef(sourceRef);
}
/**
* Resolves a tag name to a single tag, creating one when absent. Never throws on case-insensitive
* collisions: names that differ only by case are valid distinct nodes in the canonical tree (a
* parent and its same-named lowercase child), so resolution prefers an exact-case match, then
* falls back to the lowest-id case-insensitive match, then creates. See #730.
*/
public Tag findOrCreate(String name) {
String cleanName = name.trim();
return tagRepository.findByNameIgnoreCase(cleanName)
.orElseGet(() -> tagRepository.save(Tag.builder().name(cleanName).build()));
Optional<Tag> exact = tagRepository.findByName(cleanName);
if (exact.isPresent()) return exact.get(); // exact-case wins (edit round-trip replays the stored name)
List<Tag> caseInsensitive = tagRepository.findAllByNameIgnoreCase(cleanName);
if (!caseInsensitive.isEmpty()) {
return caseInsensitive.stream().min(Comparator.comparing(Tag::getId)).orElseThrow(); // deterministic tie-break by id — list is non-empty, never throws
}
return tagRepository.save(Tag.builder().name(cleanName).build()); // create-when-absent (orphan tag: null sourceRef/parentId)
}
/**

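The resolution order in `findOrCreate` is: exact case first, then the lowest-id case-insensitive match, then create. It can be exercised without a database through the in-memory sketch below. `TagResolutionSketch`, its `seed` helper and the long ids are hypothetical stand-ins for the JPA entity and repository. The in-memory `equalsIgnoreCase` only approximates the repository's case-insensitive query. In the real service the ids are UUIDs, so "lowest" follows `UUID.compareTo` ordering rather than insertion order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class TagResolutionSketch {

    // Hypothetical stand-in for the Tag entity; a long id replaces the real UUID for readability.
    record Tag(long id, String name) {}

    private final List<Tag> tags = new ArrayList<>();
    private long nextId = 1;

    // Stores a tag as-is, the way the canonical import would (no case folding).
    Tag seed(String name) {
        Tag t = new Tag(nextId++, name);
        tags.add(t);
        return t;
    }

    Tag findOrCreate(String name) {
        String clean = name.trim();
        // 1. An exact-case match wins (an edit round-trip replays the stored name verbatim).
        Optional<Tag> exact = tags.stream().filter(t -> t.name().equals(clean)).findFirst();
        if (exact.isPresent()) return exact.get();
        // 2. Otherwise the lowest-id case-insensitive match: deterministic, never a "non-unique result".
        Optional<Tag> folded = tags.stream()
                .filter(t -> t.name().equalsIgnoreCase(clean))
                .min(Comparator.comparingLong(Tag::id));
        if (folded.isPresent()) return folded.get();
        // 3. Nothing matched: create a new tag.
        return seed(clean);
    }

    public static void main(String[] args) {
        TagResolutionSketch repo = new TagResolutionSketch();
        repo.seed("Geburt"); // id 1, the parent
        repo.seed("geburt"); // id 2, its same-named lowercase child
        System.out.println(repo.findOrCreate("geburt").id());     // 2 (exact case)
        System.out.println(repo.findOrCreate("GEBURT").id());     // 1 (lowest id)
        System.out.println(repo.findOrCreate(" Hochzeit ").id()); // 3 (created)
    }
}
```

A unique index on `lower(name)` would make step 2 unnecessary, but it would also reject the legitimate parent/child pair. That is why the fallback tolerates several case-insensitive hits instead of throwing.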
@@ -51,6 +51,12 @@ public class AdminController {
return ResponseEntity.ok(new BackfillResult(count));
}
@PostMapping("/backfill-titles")
public ResponseEntity<BackfillResult> backfillTitles() {
int count = documentService.backfillTitles();
return ResponseEntity.ok(new BackfillResult(count));
}
@PostMapping("/generate-thumbnails")
public ResponseEntity<ThumbnailBackfillService.BackfillStatus> generateThumbnails() {
thumbnailBackfillService.runBackfillAsync();

@@ -11,3 +11,7 @@ springdoc:
swagger-ui:
enabled: true
path: /swagger-ui.html
app:
ollama:
base-url: http://localhost:11434

@@ -130,6 +130,16 @@ app:
# The loader maps columns by header name — no positional indices (see ADR-025).
dir: ${IMPORT_DIR:/import}
ollama:
base-url: http://ollama:11434
model: qwen2.5:7b-instruct-q4_K_M
timeout-seconds: 30
health-check-timeout-seconds: 2
nl-search:
rate-limit:
max-requests-per-minute: 5
ocr:
sender-model:
activation-threshold: 100

@@ -0,0 +1,53 @@
-- Move person-delete referential integrity from application code into the database (#684).
--
-- Before this migration, PersonService.deletePerson nulled documents.sender_id and removed
-- document_receivers rows in Java before deleting the person, because the two V1 FKs into
-- persons had no ON DELETE behaviour. Any other delete path (a future endpoint, a manual
-- psql, a batch job) could still orphan rows or 500. This migration makes the database the
-- single source of truth so a person delete is safe from every path.
--
-- Cascade boundary: the cascade stays STRICTLY at the join/reference layer and NEVER reaches
-- documents rows — a cascade into documents would destroy historical letters. sender_id is
-- SET NULL (documents.senderText preserves the raw textual attribution); the receiver join
-- row and the @-mention sidecar row are dropped.
--
-- No NOT VALID + VALIDATE two-step: these tables are small (thousands of rows → sub-second
-- ACCESS EXCLUSIVE lock). Do NOT copy this drop-and-recreate pattern onto a large table.
--
-- Not audit-logged: a DB ON DELETE cascade runs below AuditService — a known, accepted trade.
-- The person-delete action itself is still logged at the service layer.
-- documents.sender_id → ON DELETE SET NULL (deleted sender clears the link; the document survives).
ALTER TABLE public.documents
DROP CONSTRAINT fkl5xhww7es3b4um01vmly4y18m,
ADD CONSTRAINT fkl5xhww7es3b4um01vmly4y18m
FOREIGN KEY (sender_id) REFERENCES public.persons(id) ON DELETE SET NULL;
-- document_receivers.person_id → ON DELETE CASCADE (drop the join row), the symmetric
-- completion of V14, which added the same to the document_id side of this table.
ALTER TABLE public.document_receivers
DROP CONSTRAINT fkcg7r68qvosqricx1betgrlt7s,
ADD CONSTRAINT fkcg7r68qvosqricx1betgrlt7s
FOREIGN KEY (person_id) REFERENCES public.persons(id) ON DELETE CASCADE;
-- Soft reference fix: transcription_block_mentioned_persons.person_id was a UUID with no FK
-- (V56), so deleting a person left dangling mention rows. Give it a real FK with CASCADE.
-- This reverses V56's deliberate "no FK on person_id" choice — that comment is now historical
-- but is intentionally left untouched, because editing an already-applied migration changes its
-- Flyway checksum and would fail validateOnMigrate in prod. ADR-032 is the authoritative record.
-- Clean up pre-existing orphans first — production likely holds dangling rows because the old
-- deletePerson never cleaned mention rows, and the ADD CONSTRAINT validation scan fails on them.
-- A DO block with RAISE NOTICE surfaces the purge count: Flyway runs each statement via JDBC
-- and discards a trailing SELECT's result set, so a "SELECT count(*)" would log nothing.
DO $$
DECLARE removed int;
BEGIN
DELETE FROM transcription_block_mentioned_persons m
WHERE NOT EXISTS (SELECT 1 FROM persons p WHERE p.id = m.person_id);
GET DIAGNOSTICS removed = ROW_COUNT;
RAISE NOTICE 'V71 orphaned_mention_rows_removed=%', removed;
END $$;
ALTER TABLE public.transcription_block_mentioned_persons
ADD CONSTRAINT fk_tbmp_person
FOREIGN KEY (person_id) REFERENCES public.persons(id) ON DELETE CASCADE;

@@ -624,4 +624,88 @@ class DocumentRepositoryTest {
.reviewed(reviewed)
.build();
}
// ─── searchDocumentsByPersonId (via Specification) ───────────────────────
private Page<Document> searchByPerson(Person person, LocalDate from, LocalDate to) {
Specification<Document> spec = (root, query, cb) -> {
if (query != null) query.distinct(true);
var receiversJoin = root.join("receivers", jakarta.persistence.criteria.JoinType.LEFT);
var personPredicate = cb.or(
cb.equal(root.get("sender"), person),
cb.equal(receiversJoin, person));
var predicates = new java.util.ArrayList<>(java.util.List.of(personPredicate));
if (from != null) predicates.add(cb.greaterThanOrEqualTo(root.get("documentDate"), from));
if (to != null) predicates.add(cb.lessThanOrEqualTo(root.get("documentDate"), to));
return cb.and(predicates.toArray(new jakarta.persistence.criteria.Predicate[0]));
};
return documentRepository.findAll(spec, PageRequest.of(0, 10));
}
@Test
void searchByPersonSpec_returnsDocument_whenPersonIsSender() {
Person person = personRepository.save(Person.builder().lastName("Raddatz").build());
Document doc = documentRepository.save(Document.builder()
.title("Senderbrief").originalFilename("sender.pdf")
.status(DocumentStatus.UPLOADED).sender(person).build());
Page<Document> result = searchByPerson(person, null, null);
assertThat(result.getContent()).extracting(Document::getId).containsExactly(doc.getId());
}
@Test
void searchByPersonSpec_returnsDocument_whenPersonIsReceiver() {
Person person = personRepository.save(Person.builder().lastName("Raddatz").build());
Document doc = documentRepository.save(Document.builder()
.title("Empfängerbrief").originalFilename("receiver.pdf")
.status(DocumentStatus.UPLOADED)
.receivers(new java.util.HashSet<>(List.of(person))).build());
Page<Document> result = searchByPerson(person, null, null);
assertThat(result.getContent()).extracting(Document::getId).containsExactly(doc.getId());
}
@Test
void searchByPersonSpec_returnsDocumentOnce_whenPersonIsBothSenderAndReceiver() {
Person person = personRepository.save(Person.builder().lastName("Raddatz").build());
Document doc = documentRepository.save(Document.builder()
.title("SenderEmpfänger").originalFilename("both.pdf")
.status(DocumentStatus.UPLOADED).sender(person)
.receivers(new java.util.HashSet<>(List.of(person))).build());
Page<Document> result = searchByPerson(person, null, null);
assertThat(result.getContent()).hasSize(1);
assertThat(result.getContent().get(0).getId()).isEqualTo(doc.getId());
}
@Test
void searchByPersonSpec_excludesDocuments_outsideDateRange() {
Person person = personRepository.save(Person.builder().lastName("Raddatz").build());
Document inside = documentRepository.save(Document.builder()
.title("Innen").originalFilename("inside.pdf").status(DocumentStatus.UPLOADED)
.sender(person).documentDate(LocalDate.of(1918, 6, 15)).build());
documentRepository.save(Document.builder()
.title("Außen").originalFilename("outside.pdf").status(DocumentStatus.UPLOADED)
.sender(person).documentDate(LocalDate.of(1920, 1, 1)).build());
Page<Document> result = searchByPerson(person, LocalDate.of(1914, 1, 1), LocalDate.of(1918, 12, 31));
assertThat(result.getContent()).extracting(Document::getId).containsExactly(inside.getId());
}
@Test
void searchByPersonSpec_returnsEmpty_whenNoMatchingDocuments() {
Person person = personRepository.save(Person.builder().lastName("Raddatz").build());
Person other = personRepository.save(Person.builder().lastName("Braun").build());
documentRepository.save(Document.builder()
.title("Fremder Brief").originalFilename("other.pdf")
.status(DocumentStatus.UPLOADED).sender(other).build());
Page<Document> result = searchByPerson(person, null, null);
assertThat(result.getContent()).isEmpty();
}
}
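The `query.distinct(true)` call in `searchByPerson` matters because the LEFT join onto `receivers` yields one row per receiver. Without it, a document whose sender is the person and that has several receivers comes back once per receiver. The sketch below simulates that join in memory. The `Doc` and `JoinRow` records are hypothetical, and person names stand in for `Person` entities. It also mirrors the SQL rule that a NULL `documentDate` never satisfies `>=` or `<=`, so undated documents drop out as soon as a bound is set.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

public class PersonSpecSketch {

    // Hypothetical in-memory stand-ins: plain names replace Person entities.
    record Doc(String id, String sender, Set<String> receivers, LocalDate date) {}

    // One row of "documents LEFT JOIN receivers"; receiver is null when a document has none.
    record JoinRow(Doc doc, String receiver) {}

    static List<JoinRow> leftJoinReceivers(List<Doc> docs) {
        List<JoinRow> rows = new ArrayList<>();
        for (Doc d : docs) {
            if (d.receivers().isEmpty()) {
                rows.add(new JoinRow(d, null));
            } else {
                for (String r : d.receivers()) rows.add(new JoinRow(d, r));
            }
        }
        return rows;
    }

    static List<String> search(List<Doc> docs, String person, LocalDate from, LocalDate to, boolean distinct) {
        Stream<String> ids = leftJoinReceivers(docs).stream()
                // sender = person OR joined receiver = person
                .filter(row -> person.equals(row.doc().sender()) || person.equals(row.receiver()))
                // a NULL date never satisfies >= or <=, so undated documents drop out once a bound is set
                .filter(row -> from == null || (row.doc().date() != null && !row.doc().date().isBefore(from)))
                .filter(row -> to == null || (row.doc().date() != null && !row.doc().date().isAfter(to)))
                .map(row -> row.doc().id());
        return distinct ? ids.distinct().toList() : ids.toList();
    }

    public static void main(String[] args) {
        Doc letter = new Doc("both", "Raddatz", Set.of("Raddatz", "Braun"), LocalDate.of(1918, 6, 15));
        List<Doc> docs = List.of(letter);
        System.out.println(search(docs, "Raddatz", null, null, false)); // [both, both]
        System.out.println(search(docs, "Raddatz", null, null, true));  // [both]
    }
}
```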

@@ -5,6 +5,7 @@ import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.ArgumentCaptor;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.Spy;
import org.mockito.junit.jupiter.MockitoExtension;
import org.raddatz.familienarchiv.audit.AuditKind;
import org.raddatz.familienarchiv.audit.AuditLogQueryService;
@@ -74,6 +75,9 @@ class DocumentServiceTest {
@Mock AuditLogQueryService auditLogQueryService;
@Mock TranscriptionBlockQueryService transcriptionBlockQueryService;
@Mock ThumbnailAsyncRunner thumbnailAsyncRunner;
// Real factory (pure, dependency-free) so save-time title-regeneration tests exercise the
// shared composition rather than a stub — the #726 single source of truth.
@Spy DocumentTitleFactory documentTitleFactory = new DocumentTitleFactory();
@InjectMocks DocumentService documentService;
// ─── deleteDocument ───────────────────────────────────────────────────────
@@ -228,6 +232,216 @@ class DocumentServiceTest {
assertThat(doc.getMetaDateRaw()).isEqualTo("Juni 1916");
}
// ─── updateDocument save-time auto-title regeneration (#726) ──────────────
//
// Exact old-vs-new comparison: the title is the catalog auto-title iff the submitted
// title equals what the factory builds from the CURRENTLY-persisted state. The edit form
// round-trips the stored title verbatim when untouched, so an equal submission means the
// user did not type over it. makeStored() seeds index/date/precision/location and sets the
// stored title to the matching auto-title, mirroring a freshly-imported row.
private Document makeStored(String index, LocalDate date, DatePrecision precision, String location) {
Document doc = Document.builder()
.id(UUID.randomUUID())
.originalFilename(index)
.documentDate(date)
.metaDatePrecision(precision)
.location(location)
.receivers(new HashSet<>())
.tags(new HashSet<>())
.build();
doc.setTitle(documentTitleFactory.build(doc));
return doc;
}
/** Builds an update DTO with the given submitted title and the new date/precision/location. */
private static DocumentUpdateDTO editDto(String submittedTitle, LocalDate date,
DatePrecision precision, String location) {
DocumentUpdateDTO dto = new DocumentUpdateDTO();
dto.setTitle(submittedTitle);
dto.setDocumentDate(date);
dto.setMetaDatePrecision(precision);
dto.setLocation(location);
return dto;
}
private Document runUpdate(Document stored, DocumentUpdateDTO dto) throws Exception {
when(documentRepository.findById(stored.getId())).thenReturn(Optional.of(stored));
when(documentRepository.save(any())).thenReturn(stored);
documentService.updateDocument(stored.getId(), dto, null, null);
return stored;
}
@Test
void updateDocument_regeneratesAutoTitle_whenDateChanges() throws Exception {
Document stored = makeStored("C-0029", LocalDate.of(2028, 1, 1), DatePrecision.YEAR, "Berlin");
// title untouched ("C-0029 – 2028 – Berlin"), date corrected to 1928
DocumentUpdateDTO dto = editDto(stored.getTitle(), LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo("C-0029 – 1928 – Berlin");
}
@Test
void updateDocument_keepsHandWrittenTitle_whenDateChanges() throws Exception {
Document stored = makeStored("C-0029", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, null);
stored.setTitle("C-0029 Brief an Mutter"); // hand-written, ≠ auto-title
DocumentUpdateDTO dto = editDto("C-0029 Brief an Mutter", LocalDate.of(1930, 1, 1), DatePrecision.YEAR, null);
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo("C-0029 Brief an Mutter");
}
@Test
void updateDocument_freshlyTypedTitleWins_overRegeneration() throws Exception {
Document stored = makeStored("C-0029", LocalDate.of(2028, 1, 1), DatePrecision.YEAR, "Berlin");
// user changed the date AND typed a new title in the same save
DocumentUpdateDTO dto = editDto("Geburtsanzeige", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo("Geburtsanzeige");
}
@Test
void updateDocument_regeneratesWithNewDateAndLocation() throws Exception {
Document stored = makeStored("C-0029", LocalDate.of(2028, 1, 1), DatePrecision.YEAR, "Berlin");
DocumentUpdateDTO dto = editDto(stored.getTitle(), LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "München");
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo("C-0029 – 1928 – München");
}
@Test
void updateDocument_dropsTrailingLocationSegment_whenLocationCleared() throws Exception {
Document stored = makeStored("C-0029", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
// location cleared (null), title untouched
DocumentUpdateDTO dto = editDto(stored.getTitle(), LocalDate.of(1928, 1, 1), DatePrecision.YEAR, null);
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo("C-0029 – 1928");
}
@Test
void updateDocument_regeneratedTitle_doesNotContainOldDate() throws Exception {
Document stored = makeStored("C-0029", LocalDate.of(2028, 1, 1), DatePrecision.YEAR, "Berlin");
DocumentUpdateDTO dto = editDto(stored.getTitle(), LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
runUpdate(stored, dto);
assertThat(stored.getTitle()).doesNotContain("2028");
}
@Test
void updateDocument_relabelsOnPrecisionChange_yearToDay() throws Exception {
Document stored = makeStored("C-0029", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, null);
// stored auto-title "C-0029 – 1928"; set a full day at DAY precision
DocumentUpdateDTO dto = editDto(stored.getTitle(), LocalDate.of(1928, 1, 15), DatePrecision.DAY, null);
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo("C-0029 – 15. Januar 1928");
}
@Test
void updateDocument_populatesTitle_whenDateAddedToUnknownRow() throws Exception {
Document stored = makeStored("C-0029", null, DatePrecision.UNKNOWN, null);
// stored auto-title is just "C-0029"; add a 1928 YEAR date
DocumentUpdateDTO dto = editDto(stored.getTitle(), LocalDate.of(1928, 1, 1), DatePrecision.YEAR, null);
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo("C-0029 – 1928");
}
@Test
void updateDocument_roundTripsSeasonLabel() throws Exception {
Document stored = makeStored("C-0029", LocalDate.of(1943, 4, 1), DatePrecision.SEASON, null);
stored.setMetaDateRaw("Frühling 1943");
stored.setTitle(documentTitleFactory.build(stored)); // "C-0029 – Frühling 1943"
DocumentUpdateDTO dto = editDto(stored.getTitle(), LocalDate.of(1943, 4, 1), DatePrecision.SEASON, null);
dto.setMetaDateRaw("Frühling 1943");
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo("C-0029 – Frühling 1943");
}
@Test
void updateDocument_carriesStoredPrecisionAndRaw_whenDtoOmitsThem() throws Exception {
// Only the year changes; precision/end/raw are omitted from the DTO, so projectedState
// must carry them from the entity (exercises the skip-null effective* resolvers).
Document stored = makeStored("C-0029", LocalDate.of(1943, 4, 1), DatePrecision.SEASON, null);
stored.setMetaDateRaw("Frühling 1943");
stored.setTitle(documentTitleFactory.build(stored)); // "C-0029 – Frühling 1943"
DocumentUpdateDTO dto = editDto(stored.getTitle(), LocalDate.of(1944, 4, 1), null, null);
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo("C-0029 – Frühling 1944");
}
@Test
void updateDocument_roundTripsRangeLabel_atSaveTime() throws Exception {
Document stored = Document.builder()
.id(UUID.randomUUID())
.originalFilename("C-0029")
.documentDate(LocalDate.of(1917, 1, 10))
.metaDatePrecision(DatePrecision.RANGE)
.metaDateEnd(LocalDate.of(1917, 1, 11))
.receivers(new HashSet<>())
.tags(new HashSet<>())
.build();
stored.setTitle(documentTitleFactory.build(stored)); // "C-0029 – 10.–11. Jan. 1917"
DocumentUpdateDTO dto = new DocumentUpdateDTO();
dto.setTitle(stored.getTitle());
dto.setDocumentDate(LocalDate.of(1918, 1, 10));
dto.setMetaDatePrecision(DatePrecision.RANGE);
dto.setMetaDateEnd(LocalDate.of(1918, 1, 11));
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo("C-0029 – 10.–11. Jan. 1918");
}
@Test
void updateDocument_doesNotRegenerateToBlank_whenSubmittedTitleEmpty() throws Exception {
Document stored = makeStored("C-0029", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
DocumentUpdateDTO dto = editDto("", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
runUpdate(stored, dto);
assertThat(stored.getTitle()).isNotBlank();
}
@Test
void updateDocument_treatsFileReplacedDoc_asManual() throws Exception {
// originalFilename was reassigned by an earlier file-replace, so the stored title (built
// at import from the old index) no longer matches build(currentState) → treated as manual.
Document stored = makeStored("scan_2024.pdf", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
stored.setTitle("C-0029 – 1928 – Berlin"); // legacy import title, ≠ build("scan_2024.pdf"…)
DocumentUpdateDTO dto = editDto("C-0029 – 1928 – Berlin", LocalDate.of(1930, 1, 1), DatePrecision.YEAR, "Berlin");
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo("C-0029 – 1928 – Berlin");
}
@Test
void updateDocument_idempotent_whenNothingChanges() throws Exception {
Document stored = makeStored("C-0029", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
String before = stored.getTitle();
DocumentUpdateDTO dto = editDto(before, LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
runUpdate(stored, dto);
assertThat(stored.getTitle()).isEqualTo(before);
}
// ─── updateDocument date-range validation (#678) ──────────────────────────
/** Builds a stored doc ready for an updateDocument call (collections initialised). */
@@ -481,6 +695,59 @@ class DocumentServiceTest {
verify(documentVersionService).recordVersion(any(Document.class));
}
// ─── backfillTitles — one-time stale-title cleanup (#726, FR-003) ─────────
@Test
void backfillTitles_rewritesStaleAutoTitle_andCountsIt() {
Document stale = makeStored("C-0029", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
stale.setTitle("C-0029 – 2028 – Berlin"); // stale stored title (date typo never fixed)
when(documentRepository.findAll()).thenReturn(List.of(stale));
when(documentRepository.save(any())).thenReturn(stale);
int count = documentService.backfillTitles();
assertThat(count).isEqualTo(1);
assertThat(stale.getTitle()).isEqualTo("C-0029 – 1928 – Berlin");
verify(documentRepository).save(stale);
}
@Test
void backfillTitles_skipsProse() {
Document prose = makeStored("C-0030", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, null);
prose.setTitle("C-0030 Brief an Mutter");
when(documentRepository.findAll()).thenReturn(List.of(prose));
int count = documentService.backfillTitles();
assertThat(count).isZero();
assertThat(prose.getTitle()).isEqualTo("C-0030 Brief an Mutter");
verify(documentRepository, never()).save(any());
}
@Test
void backfillTitles_isIdempotent_forAlreadyCorrectTitle() {
Document fresh = makeStored("C-0031", LocalDate.of(1940, 1, 1), DatePrecision.YEAR, null);
// title already equals build(current state) → nothing to do
when(documentRepository.findAll()).thenReturn(List.of(fresh));
int count = documentService.backfillTitles();
assertThat(count).isZero();
verify(documentRepository, never()).save(any());
}
@Test
void backfillTitles_neverRecordsVersions() {
Document stale = makeStored("C-0029", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
stale.setTitle("C-0029 – 2028 – Berlin");
when(documentRepository.findAll()).thenReturn(List.of(stale));
when(documentRepository.save(any())).thenReturn(stale);
documentService.backfillTitles();
verify(documentVersionService, never()).recordVersion(any());
}
// ─── thumbnail dispatch ───────────────────────────────────────────────────
@Test

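The save-time rule described at the top of the #726 test block reduces to a small decision function: regenerate only when the submitted title equals the auto-title built from the persisted state, and otherwise keep what the user submitted. The sketch below is hypothetical (`State`, `build` and `resolveTitle` are illustrative names, not the service's API). It makes two assumptions. First, the factory joins non-blank segments with the en dash separator that `DocumentTitleBackfillMatcherTest` describes, and takes the date label as an already-formatted string. Second, a blank submission falls back to the regenerated title. The test above only pins that the result is not blank.

```java
import java.util.StringJoiner;

public class AutoTitleRuleSketch {

    // Hypothetical minimal state: index, an already-formatted date label, and a location.
    record State(String index, String dateLabel, String location) {}

    // Assumed composition {index}, {dateLabel}, {location} joined by " – ", skipping blank segments.
    static String build(State s) {
        StringJoiner joiner = new StringJoiner(" \u2013 ");
        joiner.add(s.index());
        if (s.dateLabel() != null && !s.dateLabel().isBlank()) joiner.add(s.dateLabel());
        if (s.location() != null && !s.location().isBlank()) joiner.add(s.location());
        return joiner.toString();
    }

    static String resolveTitle(String submitted, State persisted, State projected) {
        if (submitted == null || submitted.isBlank()) {
            return build(projected); // assumption: never persist a blank title
        }
        if (submitted.equals(build(persisted))) {
            return build(projected); // untouched auto-title: regenerate from the new state
        }
        return submitted;            // hand-written or freshly typed title: the user wins
    }

    public static void main(String[] args) {
        State before = new State("C-0029", "2028", "Berlin");
        State after = new State("C-0029", "1928", "Berlin");
        System.out.println(resolveTitle(build(before), before, after));    // regenerated with 1928
        System.out.println(resolveTitle("Geburtsanzeige", before, after)); // Geburtsanzeige
    }
}
```

Comparing against `build(persisted)` instead of pattern-matching the submitted title is what makes the file-replaced case safe. Once `originalFilename` has changed, the stored legacy title no longer equals the rebuilt one, so it is treated as manual.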
@@ -0,0 +1,90 @@
package org.raddatz.familienarchiv.document;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.annotation.Import;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.transaction.annotation.Transactional;
import software.amazon.awssdk.services.s3.S3Client;
import java.time.LocalDate;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
/**
* End-to-end backfill against a real Postgres (#726, FR-003). H2 is unusable here — the
* {@code title} column is NOT NULL and the title-sync semantics depend on that — so this pins the
* behaviour on {@code postgres:16-alpine}: a stale auto-title is rewritten, the sweep is
* idempotent, prose is left alone, and the mechanical rename writes no {@code document_versions}
* rows. Permission enforcement (401/403) is covered faster by the {@code @WebMvcTest} slice in
* {@code AdminControllerTest}.
*/
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.NONE)
@ActiveProfiles("test")
@Import(PostgresContainerConfig.class)
@Transactional
class DocumentTitleBackfillIntegrationTest {
@MockitoBean S3Client s3Client;
@Autowired DocumentService documentService;
@Autowired DocumentRepository documentRepository;
@Autowired DocumentVersionRepository documentVersionRepository;
private Document persist(String index, String title, LocalDate date, DatePrecision precision, String location) {
return documentRepository.save(Document.builder()
.originalFilename(index)
.title(title)
.documentDate(date)
.metaDatePrecision(precision)
.location(location)
.status(DocumentStatus.PLACEHOLDER)
.build());
}
@Test
void backfill_rewritesStaleAutoTitle() {
Document stale = persist("C-0029", "C-0029 – 2028 – Berlin",
LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
int count = documentService.backfillTitles();
assertThat(count).isEqualTo(1); // exactly the one stale row seeded (clean test DB)
assertThat(documentRepository.findById(stale.getId()).orElseThrow().getTitle())
.isEqualTo("C-0029 – 1928 – Berlin");
}
@Test
void backfill_isIdempotent_secondRunChangesNothing() {
persist("C-0029", "C-0029 – 2028 – Berlin", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
documentService.backfillTitles();
int secondRun = documentService.backfillTitles();
assertThat(secondRun).isZero();
}
@Test
void backfill_skipsProse() {
Document prose = persist("C-0030", "C-0030 Brief an Mutter",
LocalDate.of(1928, 1, 1), DatePrecision.YEAR, null);
documentService.backfillTitles();
assertThat(documentRepository.findById(prose.getId()).orElseThrow().getTitle())
.isEqualTo("C-0030 Brief an Mutter");
}
@Test
void backfill_addsNoDocumentVersionRows() {
persist("C-0029", "C-0029 – 2028 – Berlin", LocalDate.of(1928, 1, 1), DatePrecision.YEAR, "Berlin");
long versionsBefore = documentVersionRepository.count();
documentService.backfillTitles();
assertThat(documentVersionRepository.count()).isEqualTo(versionsBefore);
}
}

@@ -0,0 +1,175 @@
package org.raddatz.familienarchiv.document;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Timeout;
import java.util.concurrent.TimeUnit;
import static org.assertj.core.api.Assertions.assertThat;
/**
* The backfill overwrite heuristic (FR-004) in isolation — every emittable date-label form is
* recognised, prose is left alone, and a regex-metacharacter index is matched literally without
* hanging. The exact label spellings mirror {@code docs/date-label-fixtures.json}.
*/
class DocumentTitleBackfillMatcherTest {
private static boolean overwritable(String title, String location) {
return DocumentTitleBackfillMatcher.isOverwritable(title, "C-0029", location);
}
// ─── each date-label form (index + form) is overwritable ──────────────────
@Test
void year_form() {
assertThat(overwritable("C-0029 – 1916", null)).isTrue();
}
@Test
void approx_form() {
assertThat(overwritable("C-0029 – ca. 1920", null)).isTrue();
}
@Test
void month_form() {
assertThat(overwritable("C-0029 – Juni 1916", null)).isTrue();
}
@Test
void day_form() {
assertThat(overwritable("C-0029 – 24. Dezember 1943", null)).isTrue();
}
@Test
void season_form() {
assertThat(overwritable("C-0029 – Sommer 1916", null)).isTrue();
}
@Test
void unknown_label_form() {
assertThat(overwritable("C-0029 – Datum unbekannt", null)).isTrue();
}
@Test
void range_same_month_form() {
assertThat(overwritable("C-0029 – 10.–11. Jan. 1917", null)).isTrue();
}
@Test
void range_cross_month_form() {
assertThat(overwritable("C-0029 – 30. Jan. – 2. Feb. 1917", null)).isTrue();
}
@Test
void range_cross_year_form() {
assertThat(overwritable("C-0029 – 30. Dez. 1916 – 2. Jan. 1917", null)).isTrue();
}
@Test
void range_single_day_form() {
assertThat(overwritable("C-0029 – 10. Jan. 1917", null)).isTrue();
}
@Test
void range_open_form() {
assertThat(overwritable("C-0029 – ab 10. Jan. 1917", null)).isTrue();
}
// ─── date label + trailing location (any location) ────────────────────────
@Test
void date_form_with_trailing_location() {
assertThat(overwritable("C-0029 – 1916 – Berlin", null)).isTrue();
}
@Test
void range_with_internal_separator_plus_trailing_location() {
// The range label itself contains " – "; the trailing " – Berlin" must still be peeled.
assertThat(overwritable("C-0029 – 30. Jan. – 2. Feb. 1917 – Berlin", null)).isTrue();
}
// ─── index-only and index+location cases ──────────────────────────────────
@Test
void exactly_index() {
assertThat(overwritable("C-0029", null)).isTrue();
}
@Test
void index_plus_location_equal_to_current() {
assertThat(overwritable("C-0029 – Berlin", "Berlin")).isTrue();
}
// ─── prose is left untouched ──────────────────────────────────────────────
@Test
void prose_segment_not_matching_location_is_skipped() {
assertThat(overwritable("C-0029 – Brief an Mutter", "Berlin")).isFalse();
}
@Test
void location_only_segment_is_skipped_when_no_current_location() {
// No date label, and the doc has no location to compare against → cannot prove machine.
assertThat(overwritable("C-0029 – Berlin", null)).isFalse();
}
@Test
void title_not_starting_with_index_is_skipped() {
assertThat(overwritable("Ganz anderer Titel", null)).isFalse();
}
// ─── near-miss: shapes that look almost machine-built but are not ──────────
@Test
void ascii_hyphen_instead_of_en_dash_separator_is_skipped() {
// The separator is " – " (en dash); a plain " - " is not the machine separator.
assertThat(overwritable("C-0029 - 1916", null)).isFalse();
}
@Test
void date_label_without_separator_before_trailing_text_is_skipped() {
// "1916 Berlin" is not a date label and is not joined by " – "; prose, not machine.
assertThat(overwritable("C-0029 – 1916 Berlin", null)).isFalse();
}
@Test
void year_with_trailing_letters_is_not_a_year_label() {
assertThat(overwritable("C-0029 – 1916er Brief", null)).isFalse();
}
@Test
void index_immediately_followed_by_text_without_separator_is_skipped() {
assertThat(overwritable("C-0029x – 1916", null)).isFalse();
}
// ─── fail-closed guards ───────────────────────────────────────────────────
@Test
void null_title_is_not_overwritable() {
assertThat(DocumentTitleBackfillMatcher.isOverwritable(null, "C-0029", null)).isFalse();
}
@Test
void null_index_is_not_overwritable() {
assertThat(DocumentTitleBackfillMatcher.isOverwritable("C-0029 – 1916", null, null)).isFalse();
}
@Test
void blank_index_is_not_overwritable() {
assertThat(DocumentTitleBackfillMatcher.isOverwritable(" 1916", " ", null)).isFalse();
}
// ─── ReDoS / regex-metacharacter index is matched literally and terminates ─
@Test
@Timeout(value = 5, unit = TimeUnit.SECONDS)
void index_with_regex_metacharacters_is_matched_literally_and_terminates() {
String hostileIndex = "C-0029(.*).pdf";
// Literal prefix → matches; trailing date label → overwritable. Must not hang.
assertThat(DocumentTitleBackfillMatcher.isOverwritable(
hostileIndex + " – 1916", hostileIndex, null)).isTrue();
// A title that does NOT start with the literal hostile index is skipped, also fast.
assertThat(DocumentTitleBackfillMatcher.isOverwritable(
"C-0029 – 1916", hostileIndex, null)).isFalse();
}
}
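
The matcher tests above pin behaviour without showing the shape of a check that satisfies them. The following is a minimal, hypothetical sketch (`TitleBackfillSketch` is an illustrative name, not the project's `DocumentTitleBackfillMatcher`), assuming the machine separator is `" – "` and recognising only the year-label form with an optional trailing location. It illustrates why a literal `String.startsWith` prefix test is immune to the regex-metacharacter case: the index is never compiled into a pattern, so there is nothing to backtrack.

```java
/**
 * Hypothetical, simplified fail-closed check for "was this title machine-built?".
 * Not the project's DocumentTitleBackfillMatcher: it only accepts
 * "{index} – {yyyy}" and "{index} – {yyyy} – {location}".
 */
final class TitleBackfillSketch {
    /** Assumed machine separator: space, en dash (U+2013), space. */
    static final String SEP = " \u2013 ";

    private TitleBackfillSketch() {}

    static boolean isOverwritable(String title, String index, String location) {
        // Fail closed: without a title and a non-blank index nothing can be proven.
        if (title == null || index == null || index.isBlank()) {
            return false;
        }
        // Literal prefix test: startsWith never interprets regex metacharacters,
        // so an index like "C-0029(.*).pdf" is compared character by character.
        if (!title.startsWith(index)) {
            return false;
        }
        String rest = title.substring(index.length());
        if (!rest.startsWith(SEP)) {
            return false; // rejects "C-0029x – 1916" and the ASCII " - " separator
        }
        rest = rest.substring(SEP.length());
        if (rest.length() < 4 || !isAsciiDigits(rest.substring(0, 4))) {
            return false; // no year label directly after the separator
        }
        String tail = rest.substring(4);
        if (tail.isEmpty()) {
            return true; // exactly "{index} – {yyyy}"
        }
        if (location == null || location.isBlank()) {
            return false; // trailing text, but no stored location to compare against
        }
        return tail.equals(SEP + location); // "{index} – {yyyy} – {location}"
    }

    private static boolean isAsciiDigits(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }
}
```

Every rejection path returns `false`, so an unrecognised shape such as "1916er Brief" is left alone rather than overwritten.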

@@ -0,0 +1,89 @@
package org.raddatz.familienarchiv.document;
import org.junit.jupiter.api.Test;
import java.time.LocalDate;
import static org.assertj.core.api.Assertions.assertThat;
/**
* The auto-title composition {@code {index} – {dateLabel} – {location}} in isolation.
* The honest date-label forms themselves are pinned by {@link DocumentTitleFormatterTest}
* against the shared #666 fixture; here we assert only how the factory composes the
* three segments and which segments it omits.
*/
class DocumentTitleFactoryTest {
private final DocumentTitleFactory factory = new DocumentTitleFactory();
private static Document.DocumentBuilder doc(String index) {
return Document.builder()
.originalFilename(index)
.metaDatePrecision(DatePrecision.UNKNOWN);
}
@Test
void index_only_when_no_date_and_no_location() {
assertThat(factory.build(doc("C-0029").build())).isEqualTo("C-0029");
}
@Test
void index_and_year_date() {
Document d = doc("C-0029")
.documentDate(LocalDate.of(1928, 1, 15))
.metaDatePrecision(DatePrecision.YEAR)
.build();
assertThat(factory.build(d)).isEqualTo("C-0029 – 1928");
}
@Test
void index_date_and_location() {
Document d = doc("C-0029")
.documentDate(LocalDate.of(1928, 1, 15))
.metaDatePrecision(DatePrecision.YEAR)
.location("Berlin")
.build();
assertThat(factory.build(d)).isEqualTo("C-0029 – 1928 – Berlin");
}
@Test
void location_without_date_attaches_directly_to_index() {
Document d = doc("C-0029").location("Berlin").build();
assertThat(factory.build(d)).isEqualTo("C-0029 – Berlin");
}
@Test
void unknown_precision_omits_the_date_segment() {
Document d = doc("C-0029")
.documentDate(LocalDate.of(1928, 1, 15))
.metaDatePrecision(DatePrecision.UNKNOWN)
.build();
assertThat(factory.build(d)).isEqualTo("C-0029");
}
@Test
void blank_location_is_omitted() {
Document d = doc("C-0029")
.documentDate(LocalDate.of(1928, 1, 15))
.metaDatePrecision(DatePrecision.YEAR)
.location(" ")
.build();
assertThat(factory.build(d)).isEqualTo("C-0029 – 1928");
}
@Test
void bare_document_with_null_index_builds_empty_string_not_npe() {
// originalFilename is NOT NULL in production; the guard keeps a synthetic/partial entity
// from tripping StringBuilder(null) with an opaque NPE.
assertThat(factory.build(Document.builder().build())).isEqualTo("");
}
@Test
void day_precision_renders_the_full_german_label() {
Document d = doc("C-0029")
.documentDate(LocalDate.of(1928, 1, 15))
.metaDatePrecision(DatePrecision.DAY)
.build();
assertThat(factory.build(d)).isEqualTo("C-0029 – 15. Januar 1928");
}
}
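
The factory tests describe which segments are composed and which are omitted. A minimal sketch of that composition, assuming segments are joined with `" – "` and that null or blank segments are simply dropped; `TitleFactorySketch` and `PrecisionSketch` are hypothetical stand-ins for `DocumentTitleFactory` and `DatePrecision`. Only day and year precision are rendered here, and the German day label comes from `DateTimeFormatter` with `Locale.GERMAN`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical precision enum for this sketch only (month/decade forms omitted). */
enum PrecisionSketch { DAY, YEAR, UNKNOWN }

final class TitleFactorySketch {
    /** Assumed segment separator: space, en dash (U+2013), space. */
    static final String SEP = " \u2013 ";
    private static final DateTimeFormatter GERMAN_DAY =
            DateTimeFormatter.ofPattern("d. MMMM yyyy", Locale.GERMAN);

    private TitleFactorySketch() {}

    static String build(String index, LocalDate date, PrecisionSketch precision, String location) {
        // Drop null/blank segments first, so a missing date never leaves a dangling separator.
        return Stream.of(index, dateLabel(date, precision), location)
                .filter(Objects::nonNull)
                .filter(s -> !s.isBlank())
                .collect(Collectors.joining(SEP));
    }

    private static String dateLabel(LocalDate date, PrecisionSketch precision) {
        if (date == null || precision == null) {
            return null;
        }
        return switch (precision) {
            case DAY -> GERMAN_DAY.format(date);          // e.g. "15. Januar 1928"
            case YEAR -> String.valueOf(date.getYear());  // e.g. "1928"
            case UNKNOWN -> null;                         // no honest label: omit the segment
        };
    }
}
```

Filtering before joining is what lets a location attach directly to the index when there is no date, and it also makes an entirely empty input produce `""` instead of throwing.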

@@ -1,10 +1,9 @@
package org.raddatz.familienarchiv.importing;
package org.raddatz.familienarchiv.document;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.DynamicTest;
import org.junit.jupiter.api.TestFactory;
import org.raddatz.familienarchiv.document.DatePrecision;
import java.nio.file.Files;
import java.nio.file.Path;

@@ -0,0 +1,123 @@
package org.raddatz.familienarchiv.document;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.raddatz.familienarchiv.tag.Tag;
import org.raddatz.familienarchiv.tag.TagRepository;
import org.raddatz.familienarchiv.tag.TagService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.annotation.Import;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.transaction.annotation.Transactional;
import software.amazon.awssdk.services.s3.S3Client;
import java.time.LocalDate;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatCode;
/**
* #730 — tag-name resolution against a real Postgres. A mocked repo can't prove the two things that
* actually break: that {@code findAllByNameIgnoreCase} folds case the way Postgres {@code LOWER()}
* does (critical for umlauts like {@code ü}), and that saving a document tagged with a case-colliding
* tag no longer throws {@code NonUniqueResultException}. H2 folds case differently, so this pins the
* behaviour on {@code postgres:16-alpine}. The four-branch resolution logic itself is covered faster
* by the mocked {@code TagServiceTest}.
*/
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.NONE)
@ActiveProfiles("test")
@Import(PostgresContainerConfig.class)
@Transactional
class TagCaseCollisionIntegrationTest {
@MockitoBean S3Client s3Client;
@Autowired DocumentService documentService;
@Autowired DocumentRepository documentRepository;
@Autowired TagRepository tagRepository;
@Autowired TagService tagService;
private Tag persistTag(String name, String sourceRef, UUID parentId) {
return tagRepository.save(Tag.builder().name(name).sourceRef(sourceRef).parentId(parentId).build());
}
private Document persistDocTaggedWith(Tag tag) {
return documentRepository.save(Document.builder()
.originalFilename("C-7301")
.title("Weihnachtsbrief")
.documentDate(LocalDate.of(1928, 1, 1))
.metaDatePrecision(DatePrecision.YEAR)
.status(DocumentStatus.UPLOADED)
.tags(new HashSet<>(Set.of(tag)))
.build());
}
@Test
void updateDocument_succeedsAndKeepsExactChildTag_whenTaggedWithCaseCollidingChild() throws Exception {
Tag parent = persistTag("Weihnachten", "Weihnachten", null);
Tag child = persistTag("weihnachten", "Weihnachten/weihnachten", parent.getId());
Document doc = persistDocTaggedWith(child);
DocumentUpdateDTO dto = new DocumentUpdateDTO();
dto.setTitle("Weihnachtsbrief");
dto.setDocumentDate(LocalDate.of(1930, 1, 1)); // change the date — the field that 500'd on staging
dto.setMetaDatePrecision(DatePrecision.YEAR);
dto.setTags("weihnachten"); // the edit form round-trips the stored child name
assertThatCode(() -> documentService.updateDocument(doc.getId(), dto, null, null))
.doesNotThrowAnyException();
Set<Tag> tags = documentRepository.findById(doc.getId()).orElseThrow().getTags();
assertThat(tags).hasSize(1);
assertThat(tags.iterator().next().getId()).isEqualTo(child.getId()); // child kept, not the parent
}
@Test
void findOrCreate_resolvesUmlautCollisionDeterministically_withoutThrow() {
// The regression catcher: a plain-ASCII pair would stay green even if Postgres folded ü wrongly.
Tag parent = persistTag("Glückwünsche", "Glückwünsche", null);
Tag child = persistTag("glückwünsche", "Glückwünsche/glückwünsche", parent.getId());
// Proof that real Postgres LOWER() folds the umlaut so both rows match case-insensitively.
// Query with the UPPERCASE form findOrCreate actually passes — folding LOWER('GLÜCKWÜNSCHE')
// against LOWER(name) is the exact step under test; a lowercase probe wouldn't exercise it.
assertThat(tagRepository.findAllByNameIgnoreCase("GLÜCKWÜNSCHE")).hasSize(2);
// No exact-case "GLÜCKWÜNSCHE" row exists → resolution falls through to the case-insensitive
// branch with two candidates and must pick the lowest id deterministically, never throwing.
UUID expected = List.of(parent, child).stream().min(Comparator.comparing(Tag::getId)).orElseThrow().getId();
Tag first = tagService.findOrCreate("GLÜCKWÜNSCHE");
Tag second = tagService.findOrCreate("GLÜCKWÜNSCHE");
assertThat(first.getId()).isEqualTo(expected);
assertThat(second.getId()).isEqualTo(first.getId());
}
@Test
void bulkEdit_resolvesCaseCollidingTagThroughFindOrCreate_withoutThrow() {
// Bulk-edit shares resolveTags → findOrCreate; this guards a future refactor that bypasses it.
Tag parent = persistTag("Weihnachten", "Weihnachten", null);
Tag child = persistTag("weihnachten", "Weihnachten/weihnachten", parent.getId());
Document doc = documentRepository.save(Document.builder()
.originalFilename("C-7302")
.title("Brief")
.status(DocumentStatus.UPLOADED)
.build());
DocumentBulkEditDTO dto = new DocumentBulkEditDTO();
dto.setTagNames(List.of("weihnachten"));
assertThatCode(() -> documentService.applyBulkEditToDocument(doc.getId(), dto, null))
.doesNotThrowAnyException();
Set<Tag> tags = documentRepository.findById(doc.getId()).orElseThrow().getTags();
assertThat(tags).hasSize(1);
assertThat(tags.iterator().next().getId()).isEqualTo(child.getId());
}
}
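
The integration test relies on resolution being deterministic when several tags collide case-insensitively. A hypothetical, in-memory three-step sketch of that idea (exact-case match, then lowest id among case-folded matches, then create); it is not the project's `TagService`, whose javadoc mentions four branches that this does not reproduce. `TagRowSketch` and `TagResolverSketch` are illustrative names, and `toLowerCase(Locale.ROOT)` only approximates Postgres `LOWER()`, which depends on the database collation. That gap is exactly why the real test runs on Postgres.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.UUID;

/** One row of a hypothetical in-memory tag table. */
record TagRowSketch(UUID id, String name) {}

/** Hypothetical resolver; lowest id is decided by Java's UUID.compareTo ordering. */
final class TagResolverSketch {
    private final List<TagRowSketch> rows = new ArrayList<>();

    TagRowSketch add(UUID id, String name) {
        TagRowSketch row = new TagRowSketch(id, name);
        rows.add(row);
        return row;
    }

    TagRowSketch findOrCreate(String name) {
        // 1. An exact-case match wins outright, so "weihnachten" keeps the child row.
        Optional<TagRowSketch> exact = rows.stream()
                .filter(r -> r.name().equals(name))
                .findFirst();
        if (exact.isPresent()) {
            return exact.get();
        }
        // 2. Otherwise fold case and take the lowest id: colliding rows yield a stable
        //    answer instead of a "non-unique result" failure.
        String folded = name.toLowerCase(Locale.ROOT);
        Optional<TagRowSketch> lowest = rows.stream()
                .filter(r -> r.name().toLowerCase(Locale.ROOT).equals(folded))
                .min(Comparator.comparing(TagRowSketch::id));
        if (lowest.isPresent()) {
            return lowest.get();
        }
        // 3. No match at all: create a new row.
        return add(UUID.randomUUID(), name);
    }
}
```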

@@ -12,6 +12,8 @@ import org.raddatz.familienarchiv.document.annotation.DocumentAnnotation;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.document.transcription.PersonMention;
import org.raddatz.familienarchiv.document.transcription.TranscriptionBlock;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.jdbc.test.autoconfigure.AutoConfigureTestDatabase;
import org.springframework.boot.data.jpa.test.autoconfigure.DataJpaTest;
@@ -30,6 +32,7 @@ class TranscriptionBlockMentionsRepositoryTest {
@Autowired TranscriptionBlockRepository blockRepository;
@Autowired DocumentRepository documentRepository;
@Autowired AnnotationRepository annotationRepository;
@Autowired PersonRepository personRepository;
@Autowired EntityManager em;
private UUID documentId;
@@ -55,8 +58,9 @@ class TranscriptionBlockMentionsRepositoryTest {
@Test
void mentionedPersons_roundTripsTwoEntries() {
UUID auguste = UUID.randomUUID();
UUID hermann = UUID.randomUUID();
// person_id is a real FK since V71 — the mentioned persons must exist.
UUID auguste = personRepository.save(Person.builder().firstName("Auguste").lastName("Raddatz").build()).getId();
UUID hermann = personRepository.save(Person.builder().firstName("Hermann").lastName("Müller").build()).getId();
TranscriptionBlock saved = blockRepository.saveAndFlush(TranscriptionBlock.builder()
.annotationId(annotationId)
@@ -97,8 +101,9 @@ class TranscriptionBlockMentionsRepositoryTest {
@Test
void findByPersonIdWithMentionsFetched_returnsOnlyBlocksReferencingPerson_withMentionsLoaded() {
UUID augusteId = UUID.randomUUID();
UUID hermannId = UUID.randomUUID();
// person_id is a real FK since V71 — the mentioned persons must exist.
UUID augusteId = personRepository.save(Person.builder().firstName("Auguste").lastName("Raddatz").build()).getId();
UUID hermannId = personRepository.save(Person.builder().firstName("Hermann").lastName("Müller").build()).getId();
blockRepository.saveAndFlush(TranscriptionBlock.builder()
.annotationId(annotationId).documentId(documentId)

@@ -12,6 +12,7 @@ import org.mockito.MockedStatic;
import org.mockito.junit.jupiter.MockitoExtension;
import org.slf4j.LoggerFactory;
import org.springframework.dao.DataIntegrityViolationException;
import org.springframework.dao.IncorrectResultSizeDataAccessException;
import org.springframework.http.ResponseEntity;
import static org.assertj.core.api.Assertions.assertThat;
@@ -37,6 +38,30 @@ class GlobalExceptionHandlerTest {
}
}
@Test
void handleGeneric_incorrectResultSize_staysOpaque_noHibernateOrRowCountLeak() {
// #731: before the fix, a case-colliding alias/name made Hibernate throw
// NonUniqueResultException → IncorrectResultSizeDataAccessException, which has no
// dedicated handler and falls through to handleGeneric. The fix removes the throw, but
// this pins the handler: a stray one must stay opaque — no Hibernate class name, no SQL,
// no "2 results were returned" row count reaching the client (CWE-209).
IncorrectResultSizeDataAccessException ex = new IncorrectResultSizeDataAccessException(
"query did not return a unique result: 2 results were returned", 1, 2);
try (MockedStatic<Sentry> sentryMock = mockStatic(Sentry.class)) {
ResponseEntity<GlobalExceptionHandler.ErrorResponse> response = handler.handleGeneric(ex);
assertThat(response.getStatusCode().value()).isEqualTo(500);
assertThat(response.getBody()).isNotNull();
assertThat(response.getBody().code()).isEqualTo(ErrorCode.INTERNAL_ERROR);
assertThat(response.getBody().message())
.isEqualTo("An unexpected error occurred")
.doesNotContain("results were returned")
.doesNotContain("NonUnique")
.doesNotContain("IncorrectResultSize");
}
}
@Test
void handleDataIntegrityViolation_returns400_withoutLeakingConstraint_orSentry() {
// A DataIntegrityViolationException carries the constraint name + SQL in its message;
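
The `handleGeneric` test asserts that nothing from the exception message reaches the client (CWE-209). A minimal plain-Java sketch of that pattern, assuming hypothetical `ErrorBodySketch`/`ErrorCodeSketch` types in place of `GlobalExceptionHandler.ErrorResponse` and `ErrorCode`, and `java.util.logging` in place of the project's SLF4J/Sentry setup: the exception is logged server-side, and the response body is built only from constants.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical error code for this sketch only. */
enum ErrorCodeSketch { INTERNAL_ERROR }

/** Hypothetical response body; every field is set from constants, never from the exception. */
record ErrorBodySketch(int status, ErrorCodeSketch code, String message) {}

final class OpaqueHandlerSketch {
    private static final Logger LOG = Logger.getLogger(OpaqueHandlerSketch.class.getName());

    private OpaqueHandlerSketch() {}

    static ErrorBodySketch handleGeneric(Exception ex) {
        // Full detail (exception class, SQL, row counts) goes to the server log only.
        LOG.log(Level.SEVERE, "Unhandled exception", ex);
        // The client sees a fixed, opaque body.
        return new ErrorBodySketch(500, ErrorCodeSketch.INTERNAL_ERROR, "An unexpected error occurred");
    }
}
```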

@@ -11,6 +11,7 @@ import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentTitleFactory;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.document.ThumbnailAsyncRunner;
import org.raddatz.familienarchiv.person.Person;
@@ -59,8 +60,10 @@ class DocumentImporterTest {
// override this stub locally (load_skipsFile_whenMagicByteCheckThrowsIoException).
lenient().when(fileStreamOpener.open(any(File.class)))
.thenAnswer(inv -> new java.io.FileInputStream(inv.getArgument(0, File.class)));
importer = new DocumentImporter(documentService, personService, tagService, s3Client,
thumbnailAsyncRunner, fileStreamOpener);
// Real factory (pure, dependency-free) so the title-content assertions below exercise
// the shared composition rather than a stub — the #726 single source of truth.
importer = new DocumentImporter(documentService, new DocumentTitleFactory(), personService,
tagService, s3Client, thumbnailAsyncRunner, fileStreamOpener);
ReflectionTestUtils.setField(importer, "bucketName", "test-bucket");
}

@@ -21,6 +21,7 @@ import jakarta.persistence.PersistenceContext;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
@@ -120,37 +121,60 @@ class PersonRepositoryTest {
.containsExactly("Anna", "Clara");
}
// ─── findByAliasIgnoreCase ────────────────────────────────────────────────
// ─── findByAlias (exact) / findAllByAliasIgnoreCase (case-folding siblings) ───
@Test
void findByAliasIgnoreCase_returnsMatchingPerson() {
void findByAlias_returnsExactCaseMatchOnly() {
personRepository.save(Person.builder()
.firstName("Karl").lastName("Brandt").alias("Opa Karl").build());
Optional<Person> found = personRepository.findByAliasIgnoreCase("opa karl");
assertThat(found).isPresent();
assertThat(found.get().getFirstName()).isEqualTo("Karl");
assertThat(personRepository.findByAlias("Opa Karl")).isPresent();
assertThat(personRepository.findByAlias("opa karl")).isEmpty(); // exact-case: a folded form does NOT match
}
@Test
void findByAliasIgnoreCase_returnsEmpty_whenAliasDoesNotMatch() {
Optional<Person> found = personRepository.findByAliasIgnoreCase("nobody");
assertThat(found).isEmpty();
void findAllByAliasIgnoreCase_returnsEmpty_whenAliasDoesNotMatch() {
assertThat(personRepository.findAllByAliasIgnoreCase("nobody")).isEmpty();
}
// ─── findByFirstNameIgnoreCaseAndLastNameIgnoreCase ───────────────────────
@Test
void findAllByAliasIgnoreCase_foldsUmlautCase_inRealPostgres() {
// Proves Postgres LOWER() folds ü the same way for both rows — a plain-ASCII probe would
// stay green even if umlaut folding regressed. Both case-colliding aliases must match.
personRepository.save(Person.builder().lastName("Müller").alias("Müller").build());
personRepository.save(Person.builder().lastName("müller").alias("müller").build());
assertThat(personRepository.findAllByAliasIgnoreCase("MÜLLER")).hasSize(2);
}
// ─── findByFirstNameAndLastName (exact) / findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase ───
@Test
void findByFirstNameIgnoreCaseAndLastNameIgnoreCase_returnsMatch() {
void findByFirstNameAndLastName_returnsExactCaseMatchOnly() {
personRepository.save(Person.builder().firstName("Maria").lastName("Raddatz").build());
Optional<Person> found = personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase(
"maria", "raddatz");
assertThat(personRepository.findByFirstNameAndLastName("Maria", "Raddatz")).isPresent();
assertThat(personRepository.findByFirstNameAndLastName("maria", "raddatz")).isEmpty(); // exact-case only
}
assertThat(found).isPresent();
assertThat(found.get().getFirstName()).isEqualTo("Maria");
@Test
void findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase_foldsUmlautCase_inRealPostgres() {
personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
personRepository.save(Person.builder().firstName("hans").lastName("müller").build());
assertThat(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("HANS", "MÜLLER"))
.hasSize(2);
}
@Test
void findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase_nullFirstName_foldsToNoMatch() {
// Fail-closed: a last-name-only filename (null first name) must NOT widen to first_name IS
// NULL and pull in the institution/last-name-only row as a "sender". Proven on real
// Postgres because a mocked unit test cannot catch the IS NULL vs `= NULL` semantics.
personRepository.save(Person.builder().lastName("Müller").build()); // first_name NULL
assertThat(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(null, "Müller"))
.isEmpty();
}
// ─── findCorrespondents ───────────────────────────────────────────────────
@@ -366,30 +390,6 @@ class PersonRepositoryTest {
assertThat(result).hasSize(1);
}
// ─── deleteReceiverReferences ─────────────────────────────────────────────
@Test
void deleteReceiverReferences_removesPersonFromAllDocumentReceivers() {
Person toDelete = personRepository.save(Person.builder().firstName("Weg").lastName("Person").build());
Person sender = personRepository.save(Person.builder().firstName("Send").lastName("Er").build());
Document doc1 = documentRepository.save(Document.builder()
.title("Brief 1").originalFilename("b1.pdf")
.status(DocumentStatus.UPLOADED)
.sender(sender).receivers(Set.of(toDelete)).build());
Document doc2 = documentRepository.save(Document.builder()
.title("Brief 2").originalFilename("b2.pdf")
.status(DocumentStatus.UPLOADED)
.sender(sender).receivers(Set.of(toDelete)).build());
personRepository.deleteReceiverReferences(toDelete.getId());
entityManager.flush();
entityManager.clear();
assertThat(documentRepository.findById(doc1.getId()).orElseThrow().getReceivers()).isEmpty();
assertThat(documentRepository.findById(doc2.getId()).orElseThrow().getReceivers()).isEmpty();
}
// ─── searchByName with aliases ───────────────────────────────────────────
@Test
@@ -707,4 +707,146 @@ class PersonRepositoryTest {
assertThat(found).isPresent();
assertThat(found.get().getGeneration()).isNull();
}
// ─── #684: ON DELETE integrity enforced at the database layer ──────────────
// A raw deleteById (bypassing PersonService) must keep referential integrity:
// documents.sender_id → SET NULL, document_receivers.person_id → CASCADE, and the
// transcription_block_mentioned_persons soft reference → CASCADE. These run against
// real Postgres because the FK ON DELETE behaviour never fires on H2.
@Test
void deleteById_personSenderOfAReceiverOfB_nullsSender_dropsReceiverRow_bothDocumentsSurvive() {
Person target = personRepository.save(Person.builder().firstName("Weg").lastName("Person").build());
Person bystander = personRepository.save(Person.builder().firstName("Bleibt").lastName("Hier").build());
Document sent = documentRepository.save(Document.builder()
.title("Gesendet").originalFilename("sent.pdf")
.status(DocumentStatus.UPLOADED).sender(target).build());
Document received = documentRepository.save(Document.builder()
.title("Empfangen").originalFilename("received.pdf")
.status(DocumentStatus.UPLOADED).sender(bystander)
.receivers(Set.of(target)).build());
entityManager.flush();
entityManager.clear();
personRepository.deleteById(target.getId());
entityManager.flush();
entityManager.clear();
assertThat(personRepository.findById(target.getId())).isEmpty();
Document reloadedSent = documentRepository.findById(sent.getId()).orElseThrow();
assertThat(reloadedSent.getSender()).isNull(); // AC-1: SET NULL
Document reloadedReceived = documentRepository.findById(received.getId()).orElseThrow();
assertThat(reloadedReceived.getReceivers())
.noneMatch(p -> p.getId().equals(target.getId())); // AC-2: CASCADE drops the join row
// Cascade-boundary guard (Nora, non-negotiable): the cascade stops at the join/reference
// layer — both documents themselves survive. Guards against a future migration turning
// documents.sender_id SET NULL into CASCADE and destroying historical letters.
assertThat(documentRepository.findById(sent.getId())).isPresent();
assertThat(documentRepository.findById(received.getId())).isPresent();
}
@Test
void deleteById_receiverWithCoReceiver_dropsOnlyDeletedPersonsJoinRow() {
Person target = personRepository.save(Person.builder().firstName("Weg").lastName("Person").build());
Person coReceiver = personRepository.save(Person.builder().firstName("Mit").lastName("Empfänger").build());
Person sender = personRepository.save(Person.builder().firstName("Send").lastName("Er").build());
Document doc = documentRepository.save(Document.builder()
.title("Brief").originalFilename("brief.pdf")
.status(DocumentStatus.UPLOADED).sender(sender)
.receivers(Set.of(target, coReceiver)).build());
entityManager.flush();
entityManager.clear();
personRepository.deleteById(target.getId());
entityManager.flush();
entityManager.clear();
Document reloaded = documentRepository.findById(doc.getId()).orElseThrow();
assertThat(reloaded.getReceivers()).extracting(Person::getId)
.containsExactly(coReceiver.getId()); // co-receiver untouched
}
@Test
void deleteById_personIsSenderAndReceiverOfSameDocument_documentSurvives_senderNull_receiverDropped() {
// AC-8: the trickier same-document interaction the cross-document cases don't exercise.
Person target = personRepository.save(Person.builder().firstName("Beides").lastName("Person").build());
Person coReceiver = personRepository.save(Person.builder().firstName("Mit").lastName("Empfänger").build());
Document doc = documentRepository.save(Document.builder()
.title("Selbstbrief").originalFilename("self.pdf")
.status(DocumentStatus.UPLOADED).sender(target)
.receivers(Set.of(target, coReceiver)).build());
entityManager.flush();
entityManager.clear();
personRepository.deleteById(target.getId());
entityManager.flush();
entityManager.clear();
Document reloaded = documentRepository.findById(doc.getId()).orElseThrow();
assertThat(reloaded.getSender()).isNull();
assertThat(reloaded.getReceivers()).extracting(Person::getId)
.containsExactly(coReceiver.getId());
}
@Test
void deleteById_mentionedPerson_dropsMentionRow_blockTextSurvives() {
// AC-3: the @-mention sidecar is a CASCADE soft reference, but the literal "@Name" lives
// in transcription_blocks.text and must stay visible as plain text after the person goes.
Person mentioned = personRepository.save(Person.builder().firstName("Auguste").lastName("Raddatz").build());
Person survivor = personRepository.save(Person.builder().firstName("Clara").lastName("Cram").build());
Document doc = documentRepository.save(Document.builder()
.title("Brief").originalFilename("brief.pdf")
.status(DocumentStatus.UPLOADED).build());
entityManager.flush();
UUID annotationId = UUID.randomUUID();
UUID blockId = UUID.randomUUID();
entityManager.createNativeQuery(
"INSERT INTO document_annotations (id, document_id, page_number, x, y, width, height, color) "
+ "VALUES (?1, ?2, 1, 0.1, 0.2, 0.3, 0.1, '#fff')")
.setParameter(1, annotationId).setParameter(2, doc.getId()).executeUpdate();
entityManager.createNativeQuery(
"INSERT INTO transcription_blocks (id, annotation_id, document_id, text) VALUES (?1, ?2, ?3, ?4)")
.setParameter(1, blockId).setParameter(2, annotationId).setParameter(3, doc.getId())
.setParameter(4, "Brief an @Auguste Raddatz und @Clara Cram").executeUpdate();
// Two mention rows on the same block: the deleted person and an innocent bystander.
entityManager.createNativeQuery(
"INSERT INTO transcription_block_mentioned_persons (block_id, person_id, display_name) "
+ "VALUES (?1, ?2, ?3)")
.setParameter(1, blockId).setParameter(2, mentioned.getId())
.setParameter(3, "Auguste Raddatz").executeUpdate();
entityManager.createNativeQuery(
"INSERT INTO transcription_block_mentioned_persons (block_id, person_id, display_name) "
+ "VALUES (?1, ?2, ?3)")
.setParameter(1, blockId).setParameter(2, survivor.getId())
.setParameter(3, "Clara Cram").executeUpdate();
entityManager.flush();
entityManager.clear();
personRepository.deleteById(mentioned.getId());
entityManager.flush();
entityManager.clear();
Number mentionRows = (Number) entityManager.createNativeQuery(
"SELECT count(*) FROM transcription_block_mentioned_persons WHERE person_id = ?1")
.setParameter(1, mentioned.getId()).getSingleResult();
assertThat(mentionRows.longValue()).isZero();
// The cascade is scoped to the deleted person — the bystander's mention row is untouched.
Number survivorRows = (Number) entityManager.createNativeQuery(
"SELECT count(*) FROM transcription_block_mentioned_persons WHERE person_id = ?1")
.setParameter(1, survivor.getId()).getSingleResult();
assertThat(survivorRows.longValue()).isEqualTo(1);
String text = (String) entityManager.createNativeQuery(
"SELECT text FROM transcription_blocks WHERE id = ?1")
.setParameter(1, blockId).getSingleResult();
assertThat(text).isEqualTo("Brief an @Auguste Raddatz und @Clara Cram");
}
}
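
The repository and service tests in this change pin a fail-closed rule for filename-based sender resolution: a missing first name never widens the query, a single case-insensitive hit resolves, and a case collision resolves only to an exact-case match, otherwise the sender stays unset. A hypothetical in-memory sketch of that rule follows; `PersonRowSketch` and `SenderResolverSketch` are illustrative names, and the real code queries Postgres rather than a list.

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;

/** One row of a hypothetical in-memory person table; firstName may be null. */
record PersonRowSketch(UUID id, String firstName, String lastName) {}

final class SenderResolverSketch {
    private SenderResolverSketch() {}

    static Optional<PersonRowSketch> resolve(List<PersonRowSketch> people, String first, String last) {
        // Fail closed: a missing first name must never widen to "first_name IS NULL".
        if (first == null || last == null) {
            return Optional.empty();
        }
        // equalsIgnoreCase(null) is false, so rows with a null first name never match.
        List<PersonRowSketch> candidates = people.stream()
                .filter(p -> first.equalsIgnoreCase(p.firstName()) && last.equalsIgnoreCase(p.lastName()))
                .toList();
        if (candidates.size() == 1) {
            return Optional.of(candidates.get(0));
        }
        // Zero or several case-insensitive hits: only a unique exact-case match may win.
        List<PersonRowSketch> exact = candidates.stream()
                .filter(p -> first.equals(p.firstName()) && last.equals(p.lastName()))
                .toList();
        return exact.size() == 1 ? Optional.of(exact.get(0)) : Optional.empty();
    }
}
```

Returning `Optional.empty()` instead of picking an arbitrary candidate trades an unset sender for never attributing a letter to the wrong person.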

@@ -4,6 +4,7 @@ import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentRepository;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonType;
@@ -16,10 +17,13 @@ import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.transaction.annotation.Transactional;
import software.amazon.awssdk.services.s3.S3Client;
import org.springframework.mock.web.MockMultipartFile;
import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;
import java.util.Set;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
@@ -33,6 +37,7 @@ class PersonServiceIntegrationTest {
@Autowired PersonService personService;
@Autowired PersonRepository personRepository;
@Autowired DocumentRepository documentRepository;
@Autowired DocumentService documentService;
@PersistenceContext EntityManager entityManager;
@@ -75,6 +80,93 @@ class PersonServiceIntegrationTest {
assertThat(result.getLastName()).isEqualTo("Cram");
}
// ─── #731: case-colliding alias resolution against real Postgres ───────────
// The umlaut pair is mandatory — only the real DB proves Postgres LOWER() folds ü; a
// plain-ASCII test would stay green while umlaut aliases regressed.
@Test
void findOrCreateByAlias_resolvesUmlautAliasCollision_toLowestId_withoutThrow() {
Person muller = personRepository.save(Person.builder().lastName("Müller").alias("Müller").build());
Person mullerLower = personRepository.save(Person.builder().lastName("müller").alias("müller").build());
UUID expected = muller.getId().compareTo(mullerLower.getId()) <= 0 ? muller.getId() : mullerLower.getId();
// No exact-case "MÜLLER" row → falls through to the case-insensitive branch with two
// candidates and must pick the lowest id, never throwing NonUniqueResultException.
Person resolved = personService.findOrCreateByAlias("MÜLLER");
assertThat(resolved.getId()).isEqualTo(expected);
}
@Test
void findOrCreateByAlias_umlautAliasCollision_isDeterministicAcrossCalls() {
personRepository.save(Person.builder().lastName("Müller").alias("Müller").build());
personRepository.save(Person.builder().lastName("müller").alias("müller").build());
Person first = personService.findOrCreateByAlias("MÜLLER");
Person second = personService.findOrCreateByAlias("MÜLLER");
assertThat(second.getId()).isEqualTo(first.getId());
}
// ─── #731: filename-based sender resolution against real Postgres ──────────
@Test
void storeDocument_resolvesSender_whenFilenameNameIsUnique() throws Exception {
Person hans = personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
Document doc = uploadNamed("1965-03-12_Müller_Hans.pdf").document();
assertThat(doc.getSender()).isNotNull();
assertThat(doc.getSender().getId()).isEqualTo(hans.getId());
}
@Test
void storeDocument_resolvesSender_onSingleCaseInsensitiveMatch() throws Exception {
Person hans = personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
// Filename folds to "hans müller"; the only stored person is "Hans Müller".
Document doc = uploadNamed("1965-03-12_müller_hans.pdf").document();
assertThat(doc.getSender()).isNotNull();
assertThat(doc.getSender().getId()).isEqualTo(hans.getId());
}
@Test
void storeDocument_leavesSenderUnset_whenFilenameNameIsAmbiguous() throws Exception {
// Two persons collide case-insensitively; the filename casing ("HANS"/"MÜLLER") matches
// neither exactly → no exact-case winner → bail to null (never an arbitrary guess), no 500.
personRepository.save(Person.builder().firstName("Hans").lastName("Müller").build());
personRepository.save(Person.builder().firstName("hans").lastName("müller").build());
Document doc = uploadNamed("1965-03-12_MÜLLER_HANS.pdf").document();
assertThat(doc.getSender()).isNull();
}
@Test
void storeDocument_leavesSenderUnset_whenFilenameHasNoFirstName() throws Exception {
// A last-name-only filename never resolves to a sender (the parser yields no parsed name).
personRepository.save(Person.builder().lastName("Müller").build());
Document doc = uploadNamed("1965-03-12_Müller.pdf").document();
assertThat(doc.getSender()).isNull();
}
@Test
void findByName_nullFirstName_resolvesToEmpty_inRealPostgres() {
// Fail-closed against the real DB: a null first name must NOT widen to first_name IS NULL
// and pick up the last-name-only row.
personRepository.save(Person.builder().lastName("Müller").build()); // first_name NULL
assertThat(personService.findByName(null, "Müller")).isEmpty();
}
private DocumentService.StoreResult uploadNamed(String filename) throws Exception {
MockMultipartFile file = new MockMultipartFile("file", filename, "application/pdf", new byte[]{1, 2, 3});
return documentService.storeDocument(file, null);
}
// ─── #667: confirm round-trip + reader-default semantics ──────────────────
@Test
@@ -180,9 +272,9 @@ class PersonServiceIntegrationTest {
@Test
void deletePerson_detachesSentAndReceivedReferences_beforeDelete_noOrphan() {
// A person referenced as BOTH a document sender and a document receiver must delete
// cleanly: deletePerson nulls the sender_id FK and removes the receiver join row first
// (reassignSenderToNull → deleteReceiverReferences → deleteById), so no FK orphan and
// the documents themselves survive.
// cleanly via the service path: deletePerson just calls deleteById, and V71's ON DELETE
// constraints null the sender_id FK and drop the receiver join row, so there is no FK
// orphan and the documents themselves survive.
Person target = personRepository.save(Person.builder()
.firstName("Weg").lastName("Person").provisional(true).build());
Person bystander = personRepository.save(Person.builder()
@@ -196,16 +288,16 @@ class PersonServiceIntegrationTest {
.status(DocumentStatus.UPLOADED).sender(bystander)
.receivers(new java.util.HashSet<>(Set.of(target))).build());
// Persist the fixture and detach everything so the native @Modifying deletes operate on
// the database directly without the persistence context holding stale references that
// would re-flush a now-deleted person as a transient association.
// Persist the fixture and detach everything so the delete operates on the database
// directly without the persistence context holding stale references.
entityManager.flush();
entityManager.clear();
personService.deletePerson(target.getId());
// Native @Modifying queries bypass the persistence context — clear it so the asserting
// reads observe the post-delete database state, not stale managed entities.
// The ON DELETE cascade fires beneath Hibernate — flush the delete and clear the L1
// cache so the asserting reads observe the post-delete database state, not stale
// managed entities still holding the dropped sender/receiver associations.
entityManager.flush();
entityManager.clear();
@@ -220,4 +312,38 @@ class PersonServiceIntegrationTest {
// The other person and the documents themselves survive the delete.
assertThat(personRepository.findById(bystander.getId())).isPresent();
}
@Test
void mergePersons_targetInheritsReferences_sourceJoinRowCascadeDrops_noFkError() {
// AC-7: merging a source who is sender of A and receiver of B into a target leaves the
// target as sender of A and receiver of B, drops the source's leftover receiver row via
// V71's ON DELETE CASCADE (no explicit delete, no FK error), and leaves co-receivers intact.
Person source = personRepository.save(Person.builder().firstName("Anna").lastName("Alt").build());
Person target = personRepository.save(Person.builder().firstName("Anna").lastName("Neu").build());
Person coReceiver = personRepository.save(Person.builder().firstName("Mit").lastName("Empfänger").build());
Person sender = personRepository.save(Person.builder().firstName("Send").lastName("Er").build());
Document docA = documentRepository.save(Document.builder()
.title("Von Anna").originalFilename("a.pdf")
.status(DocumentStatus.UPLOADED).sender(source).build());
Document docB = documentRepository.save(Document.builder()
.title("An Anna").originalFilename("b.pdf")
.status(DocumentStatus.UPLOADED).sender(sender)
.receivers(new java.util.HashSet<>(Set.of(source, coReceiver))).build());
entityManager.flush();
entityManager.clear();
personService.mergePersons(source.getId(), target.getId());
entityManager.flush();
entityManager.clear();
assertThat(personRepository.findById(source.getId())).isEmpty();
Document reloadedA = documentRepository.findById(docA.getId()).orElseThrow();
assertThat(reloadedA.getSender().getId()).isEqualTo(target.getId());
Document reloadedB = documentRepository.findById(docB.getId()).orElseThrow();
assertThat(reloadedB.getReceivers()).extracting(Person::getId)
.containsExactlyInAnyOrder(target.getId(), coReceiver.getId());
}
}
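The AC-7 merge semantics asserted above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the project's code. It replaces the database with maps. The method names only mirror the repository calls visible in this diff (`reassignSender`, `insertMissingReceiverReference`, `deleteById`), and their bodies are assumptions. The V71 behaviour is modelled as the test comments describe it: ON DELETE SET NULL for the sender FK and ON DELETE CASCADE for the receiver join rows. `MergeCascadeSketch` and `fixture` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the merge path; not the project's repository code.
public class MergeCascadeSketch {
    final Set<String> persons = new HashSet<>();
    final Map<String, String> senderOf = new HashMap<>();         // document -> sender (null = unset)
    final Map<String, Set<String>> receiversOf = new HashMap<>(); // document -> receiver join rows

    // Like reassignSender(source, target): repoint every sender_id that names the source.
    void reassignSender(String source, String target) {
        senderOf.replaceAll((doc, sender) -> source.equals(sender) ? target : sender);
    }

    // Like insertMissingReceiverReference(source, target): add the target wherever the
    // source is a receiver; Set semantics skip documents that already list the target.
    void insertMissingReceiverReference(String source, String target) {
        receiversOf.values().forEach(rs -> { if (rs.contains(source)) rs.add(target); });
    }

    // Like deleteById plus the assumed V71 constraints: ON DELETE SET NULL on the
    // sender FK and ON DELETE CASCADE on the receiver join rows.
    void deleteById(String id) {
        persons.remove(id);
        senderOf.replaceAll((doc, sender) -> id.equals(sender) ? null : sender);
        receiversOf.values().forEach(rs -> rs.remove(id));
    }

    // Merge never deletes the source's receiver rows itself; the cascade drops them.
    void mergePersons(String source, String target) {
        reassignSender(source, target);
        insertMissingReceiverReference(source, target);
        deleteById(source);
    }

    // Mirrors the AC-7 fixture: source sends docA and co-receives docB with coReceiver.
    static MergeCascadeSketch fixture() {
        MergeCascadeSketch db = new MergeCascadeSketch();
        db.persons.addAll(List.of("source", "target", "coReceiver", "sender"));
        db.senderOf.put("docA", "source");
        db.senderOf.put("docB", "sender");
        db.receiversOf.put("docA", new HashSet<>());
        db.receiversOf.put("docB", new HashSet<>(Set.of("source", "coReceiver")));
        return db;
    }

    public static void main(String[] args) {
        MergeCascadeSketch db = fixture();
        db.mergePersons("source", "target");
        System.out.println(db.persons.contains("source"));             // false
        System.out.println(db.senderOf.get("docA"));                   // target
        System.out.println(new TreeSet<>(db.receiversOf.get("docB"))); // [coReceiver, target]
    }
}
```

The point of the model is the division of labour the updated tests verify. The service only repoints and inserts, and the database constraint removes the stale join row. That is why `verifyNoMoreInteractions` no longer expects `deleteReceiverReferences`.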

@@ -27,6 +27,7 @@ import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.argThat;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.verifyNoMoreInteractions;
import static org.mockito.Mockito.when;
@ExtendWith(MockitoExtension.class)
@@ -147,9 +148,11 @@ class PersonServiceTest {
personService.deletePerson(id);
verify(personRepository).reassignSenderToNull(id);
verify(personRepository).deleteReceiverReferences(id);
// Integrity is enforced by V71's ON DELETE constraints — the service only checks
// existence then deletes; it no longer detaches sender/receiver references itself.
verify(personRepository).findById(id);
verify(personRepository).deleteById(id);
verifyNoMoreInteractions(personRepository);
}
@Test
@@ -372,14 +375,57 @@ class PersonServiceTest {
// ─── findOrCreateByAlias ─────────────────────────────────────────────────
@Test
void findOrCreateByAlias_returnsExisting_whenAliasFound() {
String alias = "Walter de Gruyter";
Person existing = Person.builder().id(UUID.randomUUID()).alias(alias).build();
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.of(existing));
void findOrCreateByAlias_returnsExactCaseMatch_overCaseInsensitiveSibling() {
String alias = "müller";
Person exact = Person.builder().id(UUID.randomUUID()).alias("müller").build();
when(personRepository.findByAlias(alias)).thenReturn(Optional.of(exact));
Person result = personService.findOrCreateByAlias(alias);
assertThat(result).isEqualTo(existing);
assertThat(result).isEqualTo(exact);
verify(personRepository, never()).findAllByAliasIgnoreCase(any());
verify(personRepository, never()).save(any());
}
@Test
void findOrCreateByAlias_returnsExactCaseMatch_evenWhenMultipleSiblingsCollide() {
String alias = "Müller";
Person exact = Person.builder().id(UUID.randomUUID()).alias("Müller").build();
when(personRepository.findByAlias(alias)).thenReturn(Optional.of(exact));
Person result = personService.findOrCreateByAlias(alias);
assertThat(result).isEqualTo(exact);
// exact-case short-circuits — the case-insensitive siblings are never consulted.
verify(personRepository, never()).findAllByAliasIgnoreCase(any());
}
@Test
void findOrCreateByAlias_usesSingleCaseInsensitiveMatch_whenNoExactCase() {
String alias = "müller";
Person only = Person.builder().id(UUID.randomUUID()).alias("Müller").build();
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of(only));
Person result = personService.findOrCreateByAlias(alias);
assertThat(result).isEqualTo(only);
verify(personRepository, never()).save(any());
}
@Test
void findOrCreateByAlias_returnsLowestIdDeterministically_whenMultipleCaseInsensitiveMatches() {
String alias = "müller";
Person lower = Person.builder().id(UUID.fromString("00000000-0000-0000-0000-000000000001")).alias("Müller").build();
Person higher = Person.builder().id(UUID.fromString("00000000-0000-0000-0000-000000000002")).alias("müller").build();
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of(higher, lower)); // unordered
Person first = personService.findOrCreateByAlias(alias);
Person second = personService.findOrCreateByAlias(alias);
assertThat(first.getId()).isEqualTo(lower.getId()); // lowest id wins
assertThat(second.getId()).isEqualTo(first.getId()); // same result every call — never throws
verify(personRepository, never()).save(any());
}
@@ -387,7 +433,8 @@ class PersonServiceTest {
void findOrCreateByAlias_createsNew_whenAliasNotFound() {
String alias = "Clara Cram";
Person saved = Person.builder().id(UUID.randomUUID()).alias(alias).firstName("Clara").lastName("Cram").build();
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
when(personRepository.save(any())).thenReturn(saved);
Person result = personService.findOrCreateByAlias(alias);
@@ -400,7 +447,8 @@ class PersonServiceTest {
void findOrCreateByAlias_createsMaidenNameAlias_whenGebPresent() {
String alias = "Clara Cram geb. de Gruyter";
Person saved = Person.builder().id(UUID.randomUUID()).alias(alias).firstName("Clara").lastName("Cram").build();
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
when(personRepository.save(any())).thenReturn(saved);
when(aliasRepository.findMaxSortOrder(saved.getId())).thenReturn(0);
when(aliasRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
@@ -422,7 +470,8 @@ class PersonServiceTest {
@Test
void findOrCreateByAlias_setsInstitutionType_withFullNameInLastName() {
String alias = "Arthur Collignon GmbH";
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
when(personRepository.save(any())).thenAnswer(inv -> {
Person p = inv.getArgument(0);
p.setId(UUID.randomUUID());
@@ -439,7 +488,8 @@ class PersonServiceTest {
@Test
void findOrCreateByAlias_setsGroupType_withFullNameInLastName() {
String alias = "Geschwister de Gruyter";
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
when(personRepository.save(any())).thenAnswer(inv -> {
Person p = inv.getArgument(0);
p.setId(UUID.randomUUID());
@@ -457,7 +507,8 @@ class PersonServiceTest {
void findOrCreateByAlias_noAlias_whenNoGeb() {
String alias = "Clara Cram";
Person saved = Person.builder().id(UUID.randomUUID()).alias(alias).firstName("Clara").lastName("Cram").build();
when(personRepository.findByAliasIgnoreCase(alias)).thenReturn(Optional.empty());
when(personRepository.findByAlias(alias)).thenReturn(Optional.empty());
when(personRepository.findAllByAliasIgnoreCase(alias)).thenReturn(List.of());
when(personRepository.save(any())).thenReturn(saved);
personService.findOrCreateByAlias(alias);
@@ -469,11 +520,54 @@ class PersonServiceTest {
void findOrCreateByAlias_trimsInput() {
String alias = " Clara Cram ";
Person saved = Person.builder().id(UUID.randomUUID()).alias("Clara Cram").build();
when(personRepository.findByAliasIgnoreCase("Clara Cram")).thenReturn(Optional.of(saved));
when(personRepository.findByAlias("Clara Cram")).thenReturn(Optional.of(saved));
personService.findOrCreateByAlias(alias);
verify(personRepository).findByAliasIgnoreCase("Clara Cram");
verify(personRepository).findByAlias("Clara Cram");
}
// ─── findByName (filename-based sender resolution) ────────────────────────
@Test
void findByName_returnsExactCaseMatch_overCaseInsensitiveSibling() {
Person exact = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
when(personRepository.findByFirstNameAndLastName("Hans", "Müller")).thenReturn(Optional.of(exact));
assertThat(personService.findByName("Hans", "Müller")).contains(exact);
verify(personRepository, never()).findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(any(), any());
}
@Test
void findByName_usesSingleCaseInsensitiveMatch_whenNoExactCase() {
Person only = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
when(personRepository.findByFirstNameAndLastName("hans", "müller")).thenReturn(Optional.empty());
when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("hans", "müller"))
.thenReturn(List.of(only));
assertThat(personService.findByName("hans", "müller")).contains(only);
}
@Test
void findByName_bailsToEmpty_whenTwoOrMoreCaseInsensitiveMatches() {
Person a = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
Person b = Person.builder().id(UUID.randomUUID()).firstName("hans").lastName("müller").build();
when(personRepository.findByFirstNameAndLastName("hans", "müller")).thenReturn(Optional.empty());
when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase("hans", "müller"))
.thenReturn(List.of(a, b));
// Ambiguous sender → unset, never an arbitrary guess (provenance correctness over a
// confidently-wrong pre-fill). This is the deliberate divergence from the alias path.
assertThat(personService.findByName("hans", "müller")).isEmpty();
}
@Test
void findByName_returnsEmpty_whenFirstNameNullFoldsToNoMatch() {
when(personRepository.findByFirstNameAndLastName(null, "Müller")).thenReturn(Optional.empty());
when(personRepository.findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase(null, "Müller"))
.thenReturn(List.of());
assertThat(personService.findByName(null, "Müller")).isEmpty();
}
// ─── updatePerson (notes) ────────────────────────────────────────────────
@@ -700,10 +794,14 @@ class PersonServiceTest {
personService.mergePersons(sourceId, targetId);
verify(personRepository).findById(sourceId);
verify(personRepository).findById(targetId);
verify(personRepository).reassignSender(sourceId, targetId);
verify(personRepository).insertMissingReceiverReference(sourceId, targetId);
verify(personRepository).deleteReceiverReferences(sourceId);
verify(personRepository).deleteById(sourceId);
// The source's leftover receiver rows cascade-drop via V71's ON DELETE CASCADE on
// deleteById — merge no longer deletes them explicitly.
verifyNoMoreInteractions(personRepository);
}
// ─── getAliases ─────────────────────────────────────────────────────────
@@ -800,4 +898,15 @@ class PersonServiceTest {
.extracting(e -> ((DomainException) e).getStatus().value())
.isEqualTo(403);
}
@Test
void findByDisplayNameContaining_delegatesToSearchByName() {
Person walter = Person.builder().id(UUID.randomUUID()).firstName("Walter").lastName("Müller").build();
when(personRepository.searchByName("Walter")).thenReturn(List.of(walter));
List<Person> result = personService.findByDisplayNameContaining("Walter");
assertThat(result).containsExactly(walter);
verify(personRepository).searchByName("Walter");
}
}
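The PersonServiceTest changes encode two resolution policies that deliberately diverge when a lookup is ambiguous. Below is a minimal sketch that assumes a flat in-memory list instead of the repository queries. `NameResolutionSketch`, `resolveAlias`, `resolveName` and `fixture` are hypothetical names, and the create-on-miss branch of `findOrCreateByAlias` is omitted. "Lowest id" here means `UUID` natural order (`UUID.compareTo`, which compares the two 64-bit halves as signed values). Whether the real service orders ids the same way is an assumption.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch of the two lookup policies; not the project's PersonService.
public class NameResolutionSketch {
    record Person(UUID id, String firstName, String lastName, String alias) {}

    // A null query never matches, so a missing first name cannot widen to "IS NULL".
    static boolean exact(String query, String value) {
        return query != null && query.equals(value);
    }

    static boolean ignoreCase(String query, String value) {
        return query != null && query.equalsIgnoreCase(value);
    }

    // Alias path: exact case wins; otherwise the lowest id among case-insensitive
    // matches, so the same input resolves to the same person on every call.
    static Optional<Person> resolveAlias(List<Person> all, String alias) {
        String trimmed = alias.trim();
        Optional<Person> exactMatch = all.stream().filter(p -> exact(trimmed, p.alias())).findFirst();
        if (exactMatch.isPresent()) return exactMatch;
        return all.stream()
                .filter(p -> ignoreCase(trimmed, p.alias()))
                .min(Comparator.comparing(Person::id));
    }

    // Filename path: exact case wins; otherwise only an unambiguous case-insensitive
    // match. Two or more candidates leave the sender unset instead of guessing.
    static Optional<Person> resolveName(List<Person> all, String first, String last) {
        Optional<Person> exactMatch = all.stream()
                .filter(p -> exact(first, p.firstName()) && exact(last, p.lastName()))
                .findFirst();
        if (exactMatch.isPresent()) return exactMatch;
        List<Person> candidates = all.stream()
                .filter(p -> ignoreCase(first, p.firstName()) && ignoreCase(last, p.lastName()))
                .toList();
        return candidates.size() == 1 ? Optional.of(candidates.get(0)) : Optional.empty();
    }

    static List<Person> fixture() {
        return List.of(
                new Person(UUID.fromString("00000000-0000-0000-0000-000000000002"), "hans", "müller", "MÜLLER"),
                new Person(UUID.fromString("00000000-0000-0000-0000-000000000001"), "Hans", "Müller", "Müller"),
                new Person(UUID.fromString("00000000-0000-0000-0000-000000000003"), null, "Schmidt", null));
    }

    public static void main(String[] args) {
        List<Person> all = fixture();
        System.out.println(resolveAlias(all, " müller ").map(Person::id)); // Optional[00000000-0000-0000-0000-000000000001]
        System.out.println(resolveName(all, "HANS", "MÜLLER"));            // Optional.empty (ambiguous)
        System.out.println(resolveName(all, null, "Schmidt"));             // Optional.empty (fail closed)
    }
}
```

The last two calls show the design choice. The alias path always returns a result so repeated imports stay stable. The filename path prefers leaving the sender unset over a confident but wrong pre-fill.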

@@ -0,0 +1,440 @@
package org.raddatz.familienarchiv.search;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.mockito.ArgumentCaptor;
import org.mockito.Mock;
import org.mockito.MockitoAnnotations;
import org.raddatz.familienarchiv.document.DocumentSearchResult;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentSort;
import org.raddatz.familienarchiv.document.SearchFilters;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonService;
import org.raddatz.familienarchiv.tag.TagOperator;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
import static org.mockito.ArgumentMatchers.*;
import static org.mockito.Mockito.*;
class NlQueryParserServiceTest {
@Mock OllamaClient ollamaClient;
@Mock PersonService personService;
@Mock DocumentService documentService;
NlQueryParserService service;
static final Pageable PAGE = PageRequest.of(0, 20);
@BeforeEach
void setUp() {
MockitoAnnotations.openMocks(this);
service = new NlQueryParserService(ollamaClient, personService, documentService);
when(documentService.searchDocuments(any(), any(), any(), any()))
.thenReturn(DocumentSearchResult.of(List.of()));
when(documentService.searchDocumentsByPersonId(any(), any(), any(), any()))
.thenReturn(DocumentSearchResult.of(List.of()));
}
// --- Factory helpers ---
private OllamaExtraction extraction(List<String> names, String role, LocalDate from, LocalDate to,
List<String> keywords) {
String raw = names.isEmpty() ? "test query" : String.join(" ", names);
return new OllamaExtraction(names, role, from, to, keywords, raw);
}
private Person person(UUID id, String firstName, String lastName) {
return Person.builder().id(id).firstName(firstName).lastName(lastName).build();
}
private static final UUID P1 = UUID.fromString("00000000-0000-0000-0000-000000000001");
private static final UUID P2 = UUID.fromString("00000000-0000-0000-0000-000000000002");
private static final UUID P3 = UUID.fromString("00000000-0000-0000-0000-000000000003");
// --- 1. Single resolved name + personRole=sender ---
@Test
void search_resolvesSingleName_asSender() {
Person walter = person(P1, "Walter", "Raddatz");
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of("Walter"), "sender", null, null, List.of()));
when(personService.findByDisplayNameContaining("Walter")).thenReturn(List.of(walter));
NlSearchResponse resp = service.search("Was hat Walter geschrieben?", PAGE);
ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
verify(documentService).searchDocuments(cap.capture(), eq(DocumentSort.DATE), eq("desc"), eq(PAGE));
assertThat(cap.getValue().sender()).isEqualTo(P1);
assertThat(cap.getValue().receiver()).isNull();
assertThat(resp.interpretation().resolvedPersons()).hasSize(1);
assertThat(resp.interpretation().resolvedPersons().get(0).id()).isEqualTo(P1);
assertThat(resp.interpretation().ambiguousPersons()).isEmpty();
}
// --- 2. Multi-match name → ambiguous, search NOT executed ---
@Test
void search_multiMatchName_populatesAmbiguous_andSkipsSearch() {
Person a = person(UUID.randomUUID(), "Walter", "Braun");
Person b = person(UUID.randomUUID(), "Walter", "Schmidt");
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of("Walter"), "sender", null, null, List.of()));
when(personService.findByDisplayNameContaining("Walter")).thenReturn(List.of(a, b));
NlSearchResponse resp = service.search("Briefe von Walter", PAGE);
verify(documentService, never()).searchDocuments(any(), any(), any(), any());
verify(documentService, never()).searchDocumentsByPersonId(any(), any(), any(), any());
assertThat(resp.interpretation().ambiguousPersons()).hasSize(2);
assertThat(resp.interpretation().resolvedPersons()).isEmpty();
}
// --- 3. Multi-match + personRole=any → still ambiguous, search NOT executed ---
@Test
void search_multiMatchName_withPersonRoleAny_stillSkipsSearch() {
Person a = person(UUID.randomUUID(), "Emma", "Braun");
Person b = person(UUID.randomUUID(), "Emma", "Raddatz");
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of("Emma"), "any", null, null, List.of()));
when(personService.findByDisplayNameContaining("Emma")).thenReturn(List.of(a, b));
NlSearchResponse resp = service.search("Briefe an Emma", PAGE);
verify(documentService, never()).searchDocuments(any(), any(), any(), any());
verify(documentService, never()).searchDocumentsByPersonId(any(), any(), any(), any());
assertThat(resp.interpretation().ambiguousPersons()).hasSize(2);
}
// --- 4. No-match name → folded into text ---
@Test
void search_noMatchName_isFoldedIntoText() {
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of("Karl"), "any", null, null, List.of()));
when(personService.findByDisplayNameContaining("Karl")).thenReturn(List.of());
service.search("Briefe von Karl", PAGE);
ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
assertThat(cap.getValue().text()).contains("Karl");
assertThat(cap.getValue().sender()).isNull();
assertThat(cap.getValue().receiver()).isNull();
}
// --- 5. personRole=any + 1 resolved → searchDocumentsByPersonId called ---
@Test
void search_personRoleAny_singleMatch_callsSearchDocumentsByPersonId() {
Person walter = person(P1, "Walter", "Raddatz");
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of("Walter"), "any", null, null, List.of()));
when(personService.findByDisplayNameContaining("Walter")).thenReturn(List.of(walter));
NlSearchResponse resp = service.search("Briefe von Walter", PAGE);
verify(documentService).searchDocumentsByPersonId(eq(P1), isNull(), isNull(), eq(PAGE));
verify(documentService, never()).searchDocuments(any(), any(), any(), any());
assertThat(resp.interpretation().keywordsApplied()).isFalse();
}
// --- 6. 2 names both resolve → sender=person1, receiver=person2 ---
@Test
void search_twoNamesResolve_assignsSenderAndReceiver() {
Person walter = person(P1, "Walter", "Raddatz");
Person emma = person(P2, "Emma", "Raddatz");
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of("Walter", "Emma"), "any", null, null, List.of()));
when(personService.findByDisplayNameContaining("Walter")).thenReturn(List.of(walter));
when(personService.findByDisplayNameContaining("Emma")).thenReturn(List.of(emma));
NlSearchResponse resp = service.search("Briefe von Walter an Emma", PAGE);
ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
verify(documentService).searchDocuments(cap.capture(), eq(DocumentSort.DATE), eq("desc"), eq(PAGE));
assertThat(cap.getValue().sender()).isEqualTo(P1);
assertThat(cap.getValue().receiver()).isEqualTo(P2);
assertThat(resp.interpretation().resolvedPersons().get(0).id()).isEqualTo(P1);
assertThat(resp.interpretation().resolvedPersons().get(1).id()).isEqualTo(P2);
}
// --- 7. 2 names, first resolves, second ambiguous → search NOT executed ---
@Test
void search_twoNames_secondAmbiguous_skipsSearch() {
Person walter = person(P1, "Walter", "Raddatz");
Person emma1 = person(P2, "Emma", "Braun");
Person emma2 = person(P3, "Emma", "Schmidt");
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of("Walter", "Emma"), "sender", null, null, List.of()));
when(personService.findByDisplayNameContaining("Walter")).thenReturn(List.of(walter));
when(personService.findByDisplayNameContaining("Emma")).thenReturn(List.of(emma1, emma2));
NlSearchResponse resp = service.search("Briefe von Walter an Emma", PAGE);
verify(documentService, never()).searchDocuments(any(), any(), any(), any());
assertThat(resp.interpretation().ambiguousPersons()).hasSize(2);
}
// --- 8. 2 names, first no match → folded into text, second used as single person ---
@Test
void search_twoNames_firstNoMatch_secondResolved_foldFirstIntoText() {
Person emma = person(P2, "Emma", "Raddatz");
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of("Karl", "Emma"), "sender", null, null, List.of()));
when(personService.findByDisplayNameContaining("Karl")).thenReturn(List.of());
when(personService.findByDisplayNameContaining("Emma")).thenReturn(List.of(emma));
service.search("Briefe von Karl an Emma", PAGE);
ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
assertThat(cap.getValue().text()).contains("Karl");
assertThat(cap.getValue().sender()).isEqualTo(P2);
}
// --- 9. 3+ names all resolve → first two as sender/receiver, third folded into text ---
@Test
void search_threeNamesResolve_extraFoldedIntoText() {
Person walter = person(P1, "Walter", "Raddatz");
Person emma = person(P2, "Emma", "Raddatz");
Person heinrich = person(P3, "Heinrich", "Braun");
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of("Walter", "Emma", "Heinrich"), "any", null, null, List.of()));
when(personService.findByDisplayNameContaining("Walter")).thenReturn(List.of(walter));
when(personService.findByDisplayNameContaining("Emma")).thenReturn(List.of(emma));
when(personService.findByDisplayNameContaining("Heinrich")).thenReturn(List.of(heinrich));
service.search("Briefe von Walter an Emma über Heinrich", PAGE);
ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
assertThat(cap.getValue().sender()).isEqualTo(P1);
assertThat(cap.getValue().receiver()).isEqualTo(P2);
assertThat(cap.getValue().text()).contains("Heinrich");
}
// --- 10. Keywords space-joined into text ---
@Test
void search_keywords_areJoinedIntoText() {
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of(), "any", null, null, List.of("Krieg", "Walter")));
service.search("Dokumente über den Krieg Walter", PAGE);
ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
assertThat(cap.getValue().text()).isEqualTo("Krieg Walter");
}
// --- 11. Date range passed through ---
@Test
void search_dateRange_passedIntoSearchFilters() {
LocalDate from = LocalDate.of(1914, 1, 1);
LocalDate to = LocalDate.of(1914, 12, 31);
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of(), "any", from, to, List.of()));
service.search("Briefe aus dem Jahr 1914", PAGE);
ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
assertThat(cap.getValue().from()).isEqualTo(from);
assertThat(cap.getValue().to()).isEqualTo(to);
}
// --- 12. Null dates → null in SearchFilters (not an error) ---
@Test
void search_nullDates_passedAsNullIntoFilters() {
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of(), "any", null, null, List.of("Hochzeit")));
service.search("Hochzeitsbriefe", PAGE);
ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
assertThat(cap.getValue().from()).isNull();
assertThat(cap.getValue().to()).isNull();
}
// --- 13. Query under 3 chars → VALIDATION_ERROR before Ollama call ---
@Test
void search_queryTooShort_throwsValidationError() {
assertThatThrownBy(() -> service.search("ab", PAGE))
.isInstanceOf(DomainException.class)
.extracting(e -> ((DomainException) e).getCode())
.isEqualTo(ErrorCode.VALIDATION_ERROR);
verify(ollamaClient, never()).parse(anyString());
}
// --- 14. Query over 500 chars → VALIDATION_ERROR ---
@Test
void search_queryTooLong_throwsValidationError() {
String longQuery = "a".repeat(501);
assertThatThrownBy(() -> service.search(longQuery, PAGE))
.isInstanceOf(DomainException.class)
.extracting(e -> ((DomainException) e).getCode())
.isEqualTo(ErrorCode.VALIDATION_ERROR);
verify(ollamaClient, never()).parse(anyString());
}
// --- 15. Ollama returns empty names/keywords → raw query used as keyword fallback ---
@Test
void search_ollamaReturnsEmpty_usesRawQueryAsTextFallback() {
String raw = "Briefe aus dem Krieg";
when(ollamaClient.parse(anyString()))
.thenReturn(new OllamaExtraction(List.of(), "any", null, null, List.of(), raw));
service.search(raw, PAGE);
ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
assertThat(cap.getValue().text()).isEqualTo(raw);
}
// --- 16. Null personNames/keywords from Ollama → no NPE ---
@Test
void search_nullPersonNamesAndKeywords_handledWithoutNpe() {
OllamaExtraction ext = new OllamaExtraction(null, "any", null, null, null, "test query");
when(ollamaClient.parse(anyString())).thenReturn(ext);
NlSearchResponse resp = service.search("test query", PAGE);
assertThat(resp).isNotNull();
verify(documentService).searchDocuments(any(), any(), any(), any());
}
// --- 17. Unrecognized personRole → defaults to any-like behavior (no crash) ---
@Test
void search_unrecognizedPersonRole_treatedLikeAny_withSingleResolvedPerson() {
Person walter = person(P1, "Walter", "Raddatz");
// OllamaClient defensive parsing returns "any" for unknown roles,
// but NlQueryParserService must also be safe if something unexpected arrives.
when(ollamaClient.parse(anyString()))
.thenReturn(new OllamaExtraction(List.of("Walter"), "unknown_role", null, null, List.of(), "query"));
when(personService.findByDisplayNameContaining("Walter")).thenReturn(List.of(walter));
NlSearchResponse resp = service.search("Briefe von Walter", PAGE);
// Should not crash; "unknown_role" treated as fallback (neither sender nor receiver → any)
assertThat(resp).isNotNull();
}
// --- 18. Ollama throws SMART_SEARCH_UNAVAILABLE → propagates to caller ---
@Test
void search_ollamaThrowsUnavailable_propagates() {
when(ollamaClient.parse(anyString()))
.thenThrow(DomainException.tooManyRequests(ErrorCode.SMART_SEARCH_UNAVAILABLE, "offline"));
assertThatThrownBy(() -> service.search("Was hat Walter geschrieben?", PAGE))
.isInstanceOf(DomainException.class)
.extracting(e -> ((DomainException) e).getCode())
.isEqualTo(ErrorCode.SMART_SEARCH_UNAVAILABLE);
}
// --- 19. LLM-extracted name > 200 chars → skipped, PersonService never called ---
@Test
void search_nameLongerThan200Chars_isSkippedBeforePersonServiceCall() {
String longName = "A".repeat(201);
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of(longName), "sender", null, null, List.of()));
service.search("Briefe von sehr langem Namen", PAGE);
verify(personService, never()).findByDisplayNameContaining(anyString());
}
// --- 20. Max 10 candidates cap: 11 persons returned → only first 10 in ambiguousPersons ---
@Test
void search_elevenCandidates_capsAtTen() {
List<Person> eleven = new ArrayList<>();
for (int i = 0; i < 11; i++) {
eleven.add(person(UUID.randomUUID(), "Walter", "Person" + i));
}
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of("Walter"), "sender", null, null, List.of()));
when(personService.findByDisplayNameContaining("Walter")).thenReturn(eleven);
NlSearchResponse resp = service.search("Briefe von Walter", PAGE);
assertThat(resp.interpretation().ambiguousPersons()).hasSize(10);
verify(documentService, never()).searchDocuments(any(), any(), any(), any());
}
// --- 21. SearchFilters defaults: tagOperator=AND, status=null, undated=false, tags=empty ---
@Test
void search_searchFiltersDefaults_areCorrect() {
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of(), "any", null, null, List.of("Krieg")));
service.search("Dokumente über den Krieg", PAGE);
ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
verify(documentService).searchDocuments(cap.capture(), eq(DocumentSort.DATE), eq("desc"), eq(PAGE));
SearchFilters f = cap.getValue();
assertThat(f.tagOperator()).isEqualTo(TagOperator.AND);
assertThat(f.status()).isNull();
assertThat(f.undated()).isFalse();
assertThat(f.tags()).isEmpty();
assertThat(f.tagQ()).isNull();
}
// --- 22. personRole=receiver + 1 resolved → receiver UUID set ---
@Test
void search_personRoleReceiver_singleMatch_setsReceiver() {
Person emma = person(P2, "Emma", "Raddatz");
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of("Emma"), "receiver", null, null, List.of()));
when(personService.findByDisplayNameContaining("Emma")).thenReturn(List.of(emma));
service.search("Briefe an Emma", PAGE);
ArgumentCaptor<SearchFilters> cap = ArgumentCaptor.forClass(SearchFilters.class);
verify(documentService).searchDocuments(cap.capture(), any(), any(), any());
assertThat(cap.getValue().receiver()).isEqualTo(P2);
assertThat(cap.getValue().sender()).isNull();
}
// --- 23. keywordsApplied=true when text is non-blank ---
@Test
void search_keywordsApplied_trueWhenTextNonBlank() {
when(ollamaClient.parse(anyString()))
.thenReturn(extraction(List.of(), "any", null, null, List.of("Feldpost")));
NlSearchResponse resp = service.search("Feldpost aus dem Krieg", PAGE);
assertThat(resp.interpretation().keywordsApplied()).isTrue();
}
}
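The numbered cases in NlQueryParserServiceTest specify a decision procedure for turning an Ollama extraction into search filters. The sketch below is one reading of those rules, not `NlQueryParserService` itself. Several parts are assumptions: the `Plan` and `Person` records, the `lookup` function standing in for `PersonService.findByDisplayNameContaining`, and the ordering of keywords before folded names. The real service calls `DocumentService` instead of returning a plan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical sketch of the rules NlQueryParserServiceTest pins down; the records,
// constants and lookup function are illustrative, not the project's classes.
public class NlPlanSketch {
    record Person(UUID id, String displayName) {}

    // Non-empty ambiguous => search skipped; otherwise the filter fields apply.
    record Plan(UUID sender, UUID receiver, UUID anyPerson, String text, List<Person> ambiguous) {
        boolean searchSkipped() { return !ambiguous.isEmpty(); }
        boolean keywordsApplied() { return text != null && !text.isBlank(); }
    }

    static final int MIN_QUERY = 3, MAX_QUERY = 500, MAX_NAME = 200, MAX_CANDIDATES = 10;

    // Reject out-of-range queries before any LLM call is made.
    static void validate(String query) {
        if (query == null || query.length() < MIN_QUERY || query.length() > MAX_QUERY) {
            throw new IllegalArgumentException("query must be 3 to 500 characters");
        }
    }

    static Plan plan(List<String> names, String role, List<String> keywords, String rawQuery,
                     Function<String, List<Person>> lookup) {
        List<String> safeNames = names == null ? List.of() : names;
        List<String> safeKeywords = keywords == null ? List.of() : keywords;
        List<String> textParts = new ArrayList<>(safeKeywords);
        List<Person> resolved = new ArrayList<>();

        for (String name : safeNames) {
            if (name == null || name.length() > MAX_NAME) continue; // never reaches the lookup
            List<Person> matches = lookup.apply(name);
            if (matches.size() > 1) {
                // Ambiguous name: return at most 10 candidates and run no search.
                return new Plan(null, null, null, null,
                        List.copyOf(matches.subList(0, Math.min(MAX_CANDIDATES, matches.size()))));
            }
            if (matches.isEmpty() || resolved.size() == 2) {
                textParts.add(name); // unknown or surplus names become free text
            } else {
                resolved.add(matches.get(0));
            }
        }

        // Only when the LLM extracted nothing at all does the raw query become the text.
        boolean nothingExtracted = safeNames.isEmpty() && safeKeywords.isEmpty();
        String text = nothingExtracted ? rawQuery : String.join(" ", textParts);

        if (resolved.size() == 2) {
            return new Plan(resolved.get(0).id(), resolved.get(1).id(), null, text, List.of());
        }
        if (resolved.size() == 1) {
            UUID id = resolved.get(0).id();
            if ("sender".equals(role)) return new Plan(id, null, null, text, List.of());
            if ("receiver".equals(role)) return new Plan(null, id, null, text, List.of());
            return new Plan(null, null, id, text, List.of()); // "any" or an unrecognised role
        }
        return new Plan(null, null, null, text, List.of());
    }

    public static void main(String[] args) {
        Person walter = new Person(UUID.randomUUID(), "Walter Raddatz");
        Person emma = new Person(UUID.randomUUID(), "Emma Raddatz");
        Map<String, List<Person>> index = Map.of("Walter", List.of(walter), "Emma", List.of(emma));
        Function<String, List<Person>> lookup = name -> index.getOrDefault(name, List.of());
        String query = "Briefe von Karl und Walter an Emma über den Krieg";
        validate(query);
        Plan p = plan(List.of("Karl", "Walter", "Emma"), "any", List.of("Krieg"), query, lookup);
        System.out.println(walter.id().equals(p.sender()) + " " + emma.id().equals(p.receiver())); // true true
        System.out.println(p.text()); // Krieg Karl
    }
}
```

Returning early on the first ambiguous name reflects what cases 2, 3, 7 and 20 require: no search runs while the user still has to pick a person. Any partially built filters are discarded.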

@@ -0,0 +1,161 @@
package org.raddatz.familienarchiv.search;
import tools.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.document.DocumentSearchResult;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import org.raddatz.familienarchiv.security.SecurityConfig;
import org.raddatz.familienarchiv.security.PermissionAspect;
import org.raddatz.familienarchiv.user.CustomUserDetailsService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.autoconfigure.aop.AopAutoConfiguration;
import org.springframework.boot.webmvc.test.autoconfigure.WebMvcTest;
import org.springframework.context.annotation.Import;
import org.springframework.http.MediaType;
import org.springframework.security.test.context.support.WithMockUser;
import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.test.web.servlet.MockMvc;
import java.util.List;
import java.util.UUID;
import static org.mockito.ArgumentMatchers.*;
import static org.mockito.Mockito.when;
import static org.springframework.security.test.web.servlet.request.SecurityMockMvcRequestPostProcessors.csrf;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.*;
@WebMvcTest(NlSearchController.class)
@Import({SecurityConfig.class, PermissionAspect.class, AopAutoConfiguration.class,
NlSearchRateLimiter.class, NlSearchRateLimitProperties.class})
class NlSearchControllerTest {
@Autowired MockMvc mockMvc;
private final ObjectMapper objectMapper = new ObjectMapper();
@MockitoBean NlQueryParserService nlQueryParserService;
@MockitoBean CustomUserDetailsService customUserDetailsService;
@Autowired NlSearchRateLimiter rateLimiter;
@BeforeEach
void resetRateLimiter() {
rateLimiter.resetForTest();
}
private NlSearchResponse makeResponse() {
PersonHint hint = new PersonHint(UUID.randomUUID(), "Walter Raddatz");
NlQueryInterpretation interp = new NlQueryInterpretation(
List.of(hint), List.of(), null, null,
List.of("Krieg"), "Briefe von Walter im Krieg", true);
return new NlSearchResponse(DocumentSearchResult.of(List.of()), interp);
}
// --- 1. Happy path ---
@Test
@WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
void search_returns200_withNlSearchResponse() throws Exception {
when(nlQueryParserService.search(anyString(), any())).thenReturn(makeResponse());
mockMvc.perform(post("/api/search/nl").with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"query\":\"Briefe von Walter im Krieg\"}"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.interpretation.rawQuery").value("Briefe von Walter im Krieg"))
.andExpect(jsonPath("$.interpretation.resolvedPersons[0].displayName").value("Walter Raddatz"))
.andExpect(jsonPath("$.interpretation.keywordsApplied").value(true));
}
// --- 2. ambiguousPersons in response shape ---
@Test
@WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
void search_returns200_withAmbiguousPersons() throws Exception {
PersonHint a = new PersonHint(UUID.randomUUID(), "Walter Braun");
PersonHint b = new PersonHint(UUID.randomUUID(), "Walter Schmidt");
NlQueryInterpretation interp = new NlQueryInterpretation(
List.of(), List.of(a, b), null, null,
List.of(), "Briefe von Walter", false);
NlSearchResponse resp = new NlSearchResponse(DocumentSearchResult.of(List.of()), interp);
when(nlQueryParserService.search(anyString(), any())).thenReturn(resp);
mockMvc.perform(post("/api/search/nl").with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"query\":\"Briefe von Walter\"}"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.interpretation.ambiguousPersons").isArray())
.andExpect(jsonPath("$.interpretation.ambiguousPersons[0].displayName").value("Walter Braun"))
.andExpect(jsonPath("$.interpretation.ambiguousPersons[1].id").isNotEmpty());
}
// --- 3. Unauthenticated → 401 ---
@Test
void search_returns401_whenUnauthenticated() throws Exception {
mockMvc.perform(post("/api/search/nl").with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"query\":\"Briefe von Walter\"}"))
.andExpect(status().isUnauthorized());
}
// --- 4. Query < 3 chars → 400 ---
@Test
@WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
void search_returns400_whenQueryTooShort() throws Exception {
mockMvc.perform(post("/api/search/nl").with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"query\":\"ab\"}"))
.andExpect(status().isBadRequest());
}
// --- 5. Query > 500 chars → 400 ---
@Test
@WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
void search_returns400_whenQueryTooLong() throws Exception {
String longQuery = "a".repeat(501);
mockMvc.perform(post("/api/search/nl").with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"query\":\"" + longQuery + "\"}"))
.andExpect(status().isBadRequest());
}
// --- 6. Ollama unavailable → 503 ---
@Test
@WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
void search_returns503_whenOllamaUnavailable() throws Exception {
when(nlQueryParserService.search(anyString(), any()))
.thenThrow(DomainException.serviceUnavailable(ErrorCode.SMART_SEARCH_UNAVAILABLE, "Ollama offline"));
mockMvc.perform(post("/api/search/nl").with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"query\":\"Briefe von Walter\"}"))
.andExpect(status().isServiceUnavailable())
.andExpect(jsonPath("$.code").value("SMART_SEARCH_UNAVAILABLE"));
}
// --- 7. 6th request in 1 minute → 429 ---
@Test
@WithMockUser(username = "user@test.com", authorities = {"READ_ALL"})
void search_returns429_onSixthRequestWithinRateLimit() throws Exception {
when(nlQueryParserService.search(anyString(), any())).thenReturn(makeResponse());
for (int i = 0; i < 5; i++) {
mockMvc.perform(post("/api/search/nl").with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"query\":\"Briefe von Walter\"}"))
.andExpect(status().isOk());
}
mockMvc.perform(post("/api/search/nl").with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"query\":\"Briefe von Walter\"}"))
.andExpect(status().isTooManyRequests())
.andExpect(jsonPath("$.code").value("SMART_SEARCH_RATE_LIMITED"));
}
}

@@ -0,0 +1,62 @@
package org.raddatz.familienarchiv.search;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import static org.assertj.core.api.Assertions.assertThatCode;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
class NlSearchRateLimiterTest {
private NlSearchRateLimiter rateLimiter;
@BeforeEach
void setUp() {
NlSearchRateLimitProperties props = new NlSearchRateLimitProperties();
props.setMaxRequestsPerMinute(5);
rateLimiter = new NlSearchRateLimiter(props);
}
@Test
void checkAndConsume_allowsRequestsWithinLimit() {
for (int i = 0; i < 5; i++) {
assertThatCode(() -> rateLimiter.checkAndConsume("user@example.com"))
.doesNotThrowAnyException();
}
}
@Test
void checkAndConsume_throwsRateLimited_onSixthRequest() {
for (int i = 0; i < 5; i++) {
rateLimiter.checkAndConsume("user@example.com");
}
assertThatThrownBy(() -> rateLimiter.checkAndConsume("user@example.com"))
.isInstanceOf(DomainException.class)
.extracting(e -> ((DomainException) e).getCode())
.isEqualTo(ErrorCode.SMART_SEARCH_RATE_LIMITED);
}
@Test
void checkAndConsume_limitsAreIndependentPerUser() {
for (int i = 0; i < 5; i++) {
rateLimiter.checkAndConsume("alice@example.com");
}
assertThatCode(() -> rateLimiter.checkAndConsume("bob@example.com"))
.doesNotThrowAnyException();
}
@Test
void resetForTest_clearsAllBuckets() {
for (int i = 0; i < 5; i++) {
rateLimiter.checkAndConsume("user@example.com");
}
rateLimiter.resetForTest();
assertThatCode(() -> rateLimiter.checkAndConsume("user@example.com"))
.doesNotThrowAnyException();
}
}
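The three limiter tests pin down a contract: a budget of five requests per minute, independent buckets per username, and a `resetForTest()` hook. Below is a minimal sketch that satisfies that contract with a fixed one-minute window. It assumes an injectable millisecond clock and uses a plain `IllegalStateException` in place of `DomainException`/`SMART_SEARCH_RATE_LIMITED`. The class name is hypothetical, and the real `NlSearchRateLimiter` may use a different algorithm, such as a token bucket.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical fixed-window limiter; a sketch, not the project's NlSearchRateLimiter. */
final class FixedWindowRateLimiter {
    private static final long WINDOW_MILLIS = 60_000L;

    /** Per-user state: when the current window started and how many requests it has seen. */
    private record Window(long startMillis, int count) {}

    private final int maxRequestsPerMinute;
    private final LongSupplier clock; // injectable so tests can move time forward
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowRateLimiter(int maxRequestsPerMinute, LongSupplier clock) {
        this.maxRequestsPerMinute = maxRequestsPerMinute;
        this.clock = clock;
    }

    /** Counts one request for the user and throws once the per-minute budget is exceeded. */
    void checkAndConsume(String username) {
        long now = clock.getAsLong();
        // compute() is atomic per key, so two concurrent requests from one user cannot both slip through
        Window updated = windows.compute(username, (user, current) -> {
            if (current == null || now - current.startMillis() >= WINDOW_MILLIS) {
                return new Window(now, 1); // first request of a fresh window
            }
            return new Window(current.startMillis(), current.count() + 1);
        });
        if (updated.count() > maxRequestsPerMinute) {
            throw new IllegalStateException("SMART_SEARCH_RATE_LIMITED");
        }
    }

    /** Drops every per-user window, mirroring the resetForTest() hook used by the tests. */
    void resetForTest() {
        windows.clear();
    }
}
```

A fixed window is the simplest way to honour `max-requests-per-minute`, but a burst that straddles a window boundary can get up to twice the budget through. A sliding window or token bucket avoids that at the cost of a little more per-user state.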

@@ -0,0 +1,113 @@
package org.raddatz.familienarchiv.search;
import com.github.tomakehurst.wiremock.WireMockServer;
import com.github.tomakehurst.wiremock.core.WireMockConfiguration;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import static com.github.tomakehurst.wiremock.client.WireMock.*;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
class RestClientOllamaClientTest {
private WireMockServer wireMock;
private RestClientOllamaClient client;
@BeforeEach
void setUp() {
wireMock = new WireMockServer(WireMockConfiguration.wireMockConfig().dynamicPort());
wireMock.start();
OllamaProperties props = new OllamaProperties();
props.setBaseUrl("http://localhost:" + wireMock.port());
props.setModel("qwen2.5:7b-instruct-q4_K_M");
props.setTimeoutSeconds(5);
props.setHealthCheckTimeoutSeconds(2);
client = new RestClientOllamaClient(props);
}
@AfterEach
void tearDown() {
wireMock.stop();
}
// --- Factory helpers ---
private String makeOllamaResponseJson(String personNamesJson, String personRole,
String dateFrom, String dateTo, String keywordsJson) {
String inner = String.format(
"{\"personNames\":%s,\"personRole\":\"%s\",\"dateFrom\":%s,\"dateTo\":%s,\"keywords\":%s}",
personNamesJson, personRole,
dateFrom == null ? "null" : "\"" + dateFrom + "\"",
dateTo == null ? "null" : "\"" + dateTo + "\"",
keywordsJson
);
return String.format("{\"model\":\"qwen2.5:7b-instruct-q4_K_M\",\"response\":\"%s\",\"done\":true}",
inner.replace("\"", "\\\""));
}
// --- Test cases ---
@Test
void parse_returnsExtraction_whenOllamaReturnsValidJson() {
String body = makeOllamaResponseJson("[\"Walter\"]", "sender", "1914-01-01", "1914-12-31", "[\"Krieg\"]");
wireMock.stubFor(post(urlEqualTo("/api/generate"))
.willReturn(aResponse()
.withStatus(200)
.withHeader("Content-Type", "application/json")
.withBody(body)));
OllamaExtraction result = client.parse("Was hat Walter im Krieg geschrieben?");
assertThat(result.personNames()).containsExactly("Walter");
assertThat(result.personRole()).isEqualTo("sender");
assertThat(result.keywords()).containsExactly("Krieg");
assertThat(result.dateFrom()).isNotNull();
assertThat(result.dateTo()).isNotNull();
}
@Test
void parse_throwsSmartSearchUnavailable_whenOllamaReturns500() {
wireMock.stubFor(post(urlEqualTo("/api/generate"))
.willReturn(aResponse().withStatus(500)));
assertThatThrownBy(() -> client.parse("some query"))
.isInstanceOf(DomainException.class)
.extracting(e -> ((DomainException) e).getCode())
.isEqualTo(ErrorCode.SMART_SEARCH_UNAVAILABLE);
}
@Test
void parse_throwsSmartSearchUnavailable_whenOllamaExceedsTimeout() {
wireMock.stubFor(post(urlEqualTo("/api/generate"))
.willReturn(aResponse()
.withStatus(200)
.withHeader("Content-Type", "application/json")
.withFixedDelay(6000)
.withBody("{\"response\":\"{}\",\"done\":true}")));
assertThatThrownBy(() -> client.parse("some query"))
.isInstanceOf(DomainException.class)
.extracting(e -> ((DomainException) e).getCode())
.isEqualTo(ErrorCode.SMART_SEARCH_UNAVAILABLE);
}
@Test
void parse_throwsSmartSearchUnavailable_whenOllamaReturnsMalformedJson() {
wireMock.stubFor(post(urlEqualTo("/api/generate"))
.willReturn(aResponse()
.withStatus(200)
.withHeader("Content-Type", "application/json")
.withBody("{\"response\":\"not-json-at-all\",\"done\":true}")));
assertThatThrownBy(() -> client.parse("some query"))
.isInstanceOf(DomainException.class)
.extracting(e -> ((DomainException) e).getCode())
.isEqualTo(ErrorCode.SMART_SEARCH_UNAVAILABLE);
}
}

@@ -53,20 +53,68 @@ class TagServiceTest {
// ─── findOrCreate ─────────────────────────────────────────────────────────
@Test
void findOrCreate_returnsExisting_whenNameFound() {
Tag existing = Tag.builder().id(UUID.randomUUID()).name("Familie").build();
when(tagRepository.findByNameIgnoreCase("Familie")).thenReturn(Optional.of(existing));
void findOrCreate_exactCaseWins_overCaseInsensitiveSibling() {
// "Geburt" (parent) and "geburt" (child) both exist; the edit round-trip replays the stored
// name "geburt", which must bind to the exact-case row, not the parent.
Tag exact = Tag.builder().id(UUID.randomUUID()).name("geburt").build();
when(tagRepository.findByName("geburt")).thenReturn(Optional.of(exact));
Tag result = tagService.findOrCreate("Familie");
Tag result = tagService.findOrCreate("geburt");
assertThat(result).isEqualTo(existing);
assertThat(result).isEqualTo(exact);
verify(tagRepository, never()).save(any());
}
@Test
void findOrCreate_createsNew_whenNameNotFound() {
void findOrCreate_exactCaseWins_evenWhenItsIdIsNotTheLowest() {
// Adversarial guard: exact-case must short-circuit BEFORE the lowest-id rule. Here the exact row
// has the higher id, so a naive "always pick lowest id across all CI matches" would pick wrong.
Tag exactHigherId = Tag.builder().id(UUID.fromString("00000000-0000-0000-0000-000000000009")).name("geburt").build();
when(tagRepository.findByName("geburt")).thenReturn(Optional.of(exactHigherId));
Tag result = tagService.findOrCreate("geburt");
assertThat(result).isEqualTo(exactHigherId);
verify(tagRepository, never()).findAllByNameIgnoreCase(any()); // exact-case wins without consulting the CI list
verify(tagRepository, never()).save(any());
}
@Test
void findOrCreate_usesSingleCaseInsensitiveMatch_whenNoExactCase() {
// Stored name is "Weihnachten"; a save replays "weihnachten" (no exact-case row) → bind to the
// single case-insensitive match rather than creating a duplicate.
Tag stored = Tag.builder().id(UUID.randomUUID()).name("Weihnachten").build();
when(tagRepository.findByName("weihnachten")).thenReturn(Optional.empty());
when(tagRepository.findAllByNameIgnoreCase("weihnachten")).thenReturn(List.of(stored));
Tag result = tagService.findOrCreate("weihnachten");
assertThat(result).isEqualTo(stored);
verify(tagRepository, never()).save(any());
}
@Test
void findOrCreate_returnsLowestIdDeterministically_whenMultipleCaseInsensitiveMatches() {
// Two rows collide case-insensitively and neither equals the query exactly. Resolution must be
// deterministic (lowest id) and never throw — proven by calling twice and getting the same id.
Tag lowerId = Tag.builder().id(UUID.fromString("00000000-0000-0000-0000-000000000001")).name("Reisepläne").build();
Tag higherId = Tag.builder().id(UUID.fromString("00000000-0000-0000-0000-000000000002")).name("reisepläne").build();
when(tagRepository.findByName("REISEPLÄNE")).thenReturn(Optional.empty());
when(tagRepository.findAllByNameIgnoreCase("REISEPLÄNE")).thenReturn(List.of(higherId, lowerId));
Tag first = tagService.findOrCreate("REISEPLÄNE");
Tag second = tagService.findOrCreate("REISEPLÄNE");
assertThat(first.getId()).isEqualTo(lowerId.getId());
assertThat(second.getId()).isEqualTo(first.getId());
verify(tagRepository, never()).save(any());
}
@Test
void findOrCreate_createsOrphanTag_whenNameAbsent() {
Tag saved = Tag.builder().id(UUID.randomUUID()).name("Krieg").build();
when(tagRepository.findByNameIgnoreCase("Krieg")).thenReturn(Optional.empty());
when(tagRepository.findByName("Krieg")).thenReturn(Optional.empty());
when(tagRepository.findAllByNameIgnoreCase("Krieg")).thenReturn(List.of());
when(tagRepository.save(any())).thenReturn(saved);
Tag result = tagService.findOrCreate("Krieg");
@@ -76,13 +124,15 @@ class TagServiceTest {
}
@Test
void findOrCreate_trimsWhitespaceBeforeLookup() {
Tag existing = Tag.builder().id(UUID.randomUUID()).name("Urlaub").build();
when(tagRepository.findByNameIgnoreCase("Urlaub")).thenReturn(Optional.of(existing));
void findOrCreate_trimsWhitespace_thenLandsOnCaseInsensitiveChild() {
Tag child = Tag.builder().id(UUID.randomUUID()).name("weihnachten").build();
when(tagRepository.findByName("weihnachten")).thenReturn(Optional.empty());
when(tagRepository.findAllByNameIgnoreCase("weihnachten")).thenReturn(List.of(child));
tagService.findOrCreate(" Urlaub ");
Tag result = tagService.findOrCreate(" weihnachten ");
verify(tagRepository).findByNameIgnoreCase("Urlaub");
assertThat(result).isEqualTo(child);
verify(tagRepository).findByName("weihnachten");
}
// ─── update ───────────────────────────────────────────────────────────────
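Read together, the new `findOrCreate` tests encode a resolution order. The service trims the input and prefers an exact-case match. Failing that, it takes the lowest id among case-insensitive matches, with a single match as the trivial case. It creates a tag only when nothing matches. Below is a repository-free sketch of that order. It assumes an in-memory list, a hypothetical `TagRow` record in place of `Tag`, and `UUID`'s natural ordering standing in for "lowest id". The real `TagService` goes through `TagRepository.findByName` and `findAllByNameIgnoreCase`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import java.util.function.Function;

/** Sketch of the findOrCreate resolution order; not the project's TagService. */
final class TagResolver {
    /** Hypothetical stand-in for the Tag entity. */
    record TagRow(UUID id, String name) {}

    private final List<TagRow> tags;
    private final Function<String, TagRow> creator;

    TagResolver(List<TagRow> existing, Function<String, TagRow> creator) {
        this.tags = new ArrayList<>(existing);
        this.creator = creator;
    }

    TagRow findOrCreate(String rawName) {
        String name = rawName.trim();
        // 1. An exact-case match short-circuits before any case-insensitive rule.
        Optional<TagRow> exact = tags.stream().filter(t -> t.name().equals(name)).findFirst();
        if (exact.isPresent()) {
            return exact.get();
        }
        // 2 + 3. Lowest id among case-insensitive matches; a single match is trivially the lowest.
        Optional<TagRow> caseInsensitive = tags.stream()
                .filter(t -> t.name().equalsIgnoreCase(name))
                .min(Comparator.comparing(TagRow::id));
        if (caseInsensitive.isPresent()) {
            return caseInsensitive.get();
        }
        // 4. Nothing matched: create a new tag under the trimmed name.
        TagRow created = creator.apply(name);
        tags.add(created);
        return created;
    }
}
```

Because the exact-case check runs first, a higher-id exact row still beats a lower-id sibling. This is the adversarial case covered by `findOrCreate_exactCaseWins_evenWhenItsIdIsNotTheLowest`.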

@@ -132,6 +132,31 @@ class AdminControllerTest {
.andExpect(jsonPath("$.count").value(3));
}
// ─── POST /api/admin/backfill-titles (#726) ────────────────────────────────
@Test
void backfillTitles_returns401_whenUnauthenticated() throws Exception {
mockMvc.perform(post("/api/admin/backfill-titles").with(csrf()))
.andExpect(status().isUnauthorized());
}
@Test
@WithMockUser(roles = "USER")
void backfillTitles_returns403_whenNotAdmin() throws Exception {
mockMvc.perform(post("/api/admin/backfill-titles").with(csrf()))
.andExpect(status().isForbidden());
}
@Test
@WithMockUser(authorities = "ADMIN")
void backfillTitles_returns200_withCount_whenAdmin() throws Exception {
when(documentService.backfillTitles()).thenReturn(7);
mockMvc.perform(post("/api/admin/backfill-titles").with(csrf()))
.andExpect(status().isOk())
.andExpect(jsonPath("$.count").value(7));
}
// ─── POST /api/admin/generate-thumbnails ───────────────────────────────────
@Test

@@ -141,6 +141,65 @@ services:
security_opt:
- no-new-privileges:true
# --- Ollama: Model init (one-shot pull) ---
# Pulls qwen2.5:7b-instruct-q4_K_M (~4.7 GB) into the ollama_models volume on first start.
# On subsequent starts (model already in volume), exits quickly without re-downloading.
# Not started in CI — CI uses explicit service selection
# (docker-compose.ci.yml: db minio create-buckets)
ollama-model-init:
image: ollama/ollama:0.30.6
restart: "no"
networks:
- archiv-net
volumes:
- ollama_models:/root/.ollama
mem_limit: 2g
read_only: true
tmpfs:
- /tmp:size=512m
cap_drop:
- ALL
security_opt:
- no-new-privileges:true
command: >
sh -c "ollama serve & SERVE_PID=$$! && until curl -sf http://localhost:11434/api/tags; do sleep 1; done && ollama pull qwen2.5:7b-instruct-q4_K_M && kill $$SERVE_PID"
# --- Ollama: LLM inference server ---
# Serves the pre-pulled model for NL search inference.
# Not started in CI — CI uses explicit service selection
# (docker-compose.ci.yml: db minio create-buckets)
ollama:
image: ollama/ollama:0.30.6
container_name: archive-ollama
restart: unless-stopped
expose:
- "11434"
networks:
- archiv-net
volumes:
- ollama_models:/root/.ollama
environment:
OLLAMA_API_KEY: "${OLLAMA_API_KEY}"
cpus: "${OLLAMA_CPU_LIMIT:-4.0}"
mem_limit: "${OLLAMA_MEM_LIMIT:-8g}"
memswap_limit: "${OLLAMA_MEM_LIMIT:-8g}"
read_only: true
tmpfs:
- /tmp:size=512m
cap_drop:
- ALL
security_opt:
- no-new-privileges:true
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
interval: 30s
timeout: 10s
retries: 5
start_period: 60s # model weights are pre-loaded by ollama-model-init; service only needs to bind port
depends_on:
ollama-model-init:
condition: service_completed_successfully
# --- Backend: Spring Boot ---
backend:
build:
@@ -184,6 +243,8 @@ services:
SPRING_MAIL_PROPERTIES_MAIL_SMTP_STARTTLS_ENABLE: ${MAIL_STARTTLS_ENABLE:-false}
APP_OCR_BASE_URL: http://ocr-service:8000
APP_OCR_TRAINING_TOKEN: "${OCR_TRAINING_TOKEN:-}"
APP_OLLAMA_BASE_URL: "${APP_OLLAMA_BASE_URL:-http://ollama:11434}"
APP_OLLAMA_API_KEY: "${OLLAMA_API_KEY}"
SENTRY_DSN: ${SENTRY_DSN:-}
SENTRY_TRACES_SAMPLE_RATE: ${SENTRY_TRACES_SAMPLE_RATE:-1.0}
# Observability: send traces to Tempo inside archiv-net (OTLP gRPC port 4317)
@@ -247,3 +308,4 @@ volumes:
frontend_node_modules:
ocr_models:
ocr_cache:
ollama_models:

@@ -50,13 +50,16 @@ graph TD
The OCR service requires significant RAM for model loading. The dev compose sets `mem_limit: 12g`.
| Production target | RAM | Recommended OCR limit | Notes |
|---|---|---|---|
| Hetzner CX42 | 16 GB | 12 GB | Recommended for OCR-enabled production |
| Hetzner CX32 | 8 GB | 6 GB | Accept reduced batch sizes and slower throughput |
| Hetzner CX22 | 4 GB | | Disable the OCR service (`profiles: [ocr]`); run OCR on demand only |
| Production target | RAM | Recommended OCR limit | NL Search | Notes |
|---|---|---|---|---|
| Current server (Hetzner Serverbörse, i7-6700) | 64 GB | 12 GB | Supported | Default `mem_limit: 12g` works comfortably; plenty of headroom for Ollama |
| ≥ 16 GB RAM | 16+ GB | 12 GB | Supported | Default works |
| 8 GB RAM | 8 GB | 6 GB | Disabled — set `APP_OLLAMA_BASE_URL=` (empty) | Set `OCR_MEM_LIMIT=6g`; accept reduced batch sizes |
| 4 GB RAM | 4 GB | — | Unsupported | Disable OCR service (`profiles: [ocr]`); run OCR on demand only |
A CX32 cannot honour the default `mem_limit: 12g` — set the `OCR_MEM_LIMIT=6g` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow) before deploying on a CX32. The prod compose interpolates this var with a 12g default.
On servers with less than 16 GB RAM the default `mem_limit: 12g` cannot be honoured — set the `OCR_MEM_LIMIT` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow). The prod compose interpolates this var with a 12g default.
> **Memory budget:** OCR (~6 GB active) + Ollama (~8 GB) = ~14 GB. On servers with less than 16 GB RAM, do not run `docker-compose.observability.yml` continuously alongside both OCR and Ollama.
### Dev vs production differences
@@ -140,10 +143,20 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `ALLOWED_PDF_HOSTS` | SSRF protection — comma-separated list of allowed PDF source hosts. **Do not widen to `*`** | `minio,localhost,127.0.0.1` | YES | — |
| `KRAKEN_MODEL_PATH` | Directory containing Kraken HTR models (populated by `download-kraken-models.sh`) | `/app/models/` | — | — |
| `BLLA_MODEL_PATH` | Kraken baseline layout analysis model path | `/app/models/blla.mlmodel` | — | — |
| `OCR_MEM_LIMIT` | Container memory cap for ocr-service in `docker-compose.prod.yml`. Set to `6g` on CX32 hosts; leave unset on CX42+ to use the 12g default | `12g` (prod compose default) | — | — |
| `OCR_MEM_LIMIT` | Container memory cap for ocr-service in `docker-compose.prod.yml`. Set to `6g` on servers with 8 GB RAM; leave unset (12g default) on servers with ≥ 16 GB RAM | `12g` (prod compose default) | — | — |
| `XDG_CACHE_HOME` | XDG cache base dir — redirects Matplotlib and other XDG-aware libraries away from the read-only `HOME` (`/home/ocr`) to the writable cache volume | `/app/cache` | — | — |
| `TORCH_HOME` | PyTorch model cache — redirects `~/.cache/torch` to the writable models volume | `/app/models/torch` | — | — |
### Ollama (NL search) service
| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `APP_OLLAMA_BASE_URL` | Base URL for the Ollama service. Leave empty to disable NL search. | `http://ollama:11434` | — | — |
| `APP_OLLAMA_API_KEY` | API key passed as `Authorization: Bearer` to Ollama. Leave empty for unauthenticated access. Note: `OLLAMA_API_KEY` is not enforced in Ollama 0.6.5 or 0.30.6 (see ADR-028). | — | — | YES |
| `OLLAMA_CPU_LIMIT` | Docker CPU quota for the Ollama container. Can be raised to `7.5` on hosts with 8 vCPUs (e.g. CX42). | `4.0` | — | — |
| `OLLAMA_MEM_LIMIT` | Memory limit for the Ollama container. The `8g` default requires a host with at least 16 GB RAM (e.g. CX42). | `8g` | — | — |
| `OLLAMA_API_KEY` | API key set on the Ollama service itself. Use the same value as `APP_OLLAMA_API_KEY`; leave empty for unauthenticated access. | — | — | YES |
### Observability stack (`docker-compose.observability.yml`)
| Variable | Purpose | Default | Required? | Sensitive? |
@@ -264,6 +277,19 @@ git.raddatz.cloud A <server IP>
### 3.4 First deploy
> **First start — Ollama model pull:** On first `docker compose up -d`, the `ollama-model-init` container pulls `qwen2.5:7b-instruct-q4_K_M` (~4.7 GB). At 10 Mbps this takes approximately 60 to 90 minutes; at 100 Mbps approximately 6 to 10 minutes. The pull is a one-time operation — subsequent restarts skip it (model already on the `ollama_models` volume). Monitor progress with `docker logs -f $(docker ps -q --filter name=ollama-model-init)`.
>
> **Do not use `--wait` on first deploy** — `docker compose up -d --wait` waits for all services to reach their health/completion target, including `ollama-model-init`. On first pull this blocks for 60 to 90 minutes and will time out any CI/deploy script that uses `--wait`.
>
> **Re-deploy idempotency:** on subsequent `docker compose up -d` runs (including `--force-recreate`), `ollama-model-init` re-executes but exits in seconds — Ollama's CLI skips the download when the model digest already matches what is on the volume.
>
> **Verify NL search is active** after enabling Ollama (`APP_OLLAMA_BASE_URL=http://ollama:11434`):
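> ```bash
> # Requires an authenticated session; unauthenticated requests return 401.
> curl -s -X POST http://localhost:8080/api/search/nl \
>   -H 'Content-Type: application/json' \
>   -d '{"query":"Brief von Grossmutter"}'
> # Returns 200 with results → NL search is active
> # Returns 503 SMART_SEARCH_UNAVAILABLE → Ollama is not reachable or APP_OLLAMA_BASE_URL is unset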
> ```
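The pull-time figures follow from size divided by bandwidth. Below is a back-of-the-envelope helper that assumes decimal units (1 GB = 8,000 megabits) and a fully saturated link. The class name is hypothetical. Real pulls add registry, TLS and digest-verification overhead, so the quoted ranges extend above these ideal figures.

```java
/** Hypothetical helper: ideal transfer time for a model download. */
final class PullTimeEstimate {
    private PullTimeEstimate() {}

    /** Minutes needed to move sizeGb decimal gigabytes over a link of mbps megabits per second. */
    static double minutes(double sizeGb, double mbps) {
        double megabits = sizeGb * 8_000.0; // 1 GB = 1,000 MB = 8,000 Mb (decimal units)
        return megabits / mbps / 60.0;      // seconds on the wire, converted to minutes
    }
}
```

For the 4.7 GB model, `minutes(4.7, 10)` comes out at roughly 63 minutes and `minutes(4.7, 100)` at roughly 6 minutes.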
```bash
# 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
# Expected: docker compose up -d --wait succeeds for archiv-staging, then
@@ -559,6 +585,55 @@ bash scripts/download-kraken-models.sh
> Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.
### Ollama — natural-language search (NL Search)
NL search uses a local Ollama instance for query parsing. The `ollama` service is defined in `docker-compose.yml` alongside the main stack.
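**First-time model pull** (required before the feature works; `ollama-model-init` performs it automatically on first start, and the command below is the manual equivalent):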
```bash
docker compose exec ollama ollama pull qwen2.5:7b-instruct-q4_K_M
```
This downloads ~4.7 GB. The model is stored in the `ollama_models` Docker volume and persists across container restarts.
**Verify the model is available:**
```bash
docker compose exec ollama ollama list
```
Expected output includes `qwen2.5:7b-instruct-q4_K_M`.
**Health check** — the backend polls `GET /api/tags` on Ollama at startup and before inference. If Ollama is absent, `POST /api/search/nl` returns HTTP 503 with `SMART_SEARCH_UNAVAILABLE`.
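Below is a sketch of that availability check using the JDK's `java.net.http.HttpClient`. It sends a short-timeout `GET /api/tags` and treats any non-2xx status, timeout, or connection error as unavailable. An empty base URL means the feature is switched off. The class name is hypothetical. The project's `RestClientOllamaClient` uses Spring's `RestClient` and its own health-check timeout (`setHealthCheckTimeoutSeconds` in `RestClientOllamaClientTest`), and the mapping to HTTP 503 happens elsewhere.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Sketch of an Ollama liveness probe; not the project's implementation. */
final class OllamaHealthProbe {
    private final HttpClient http;
    private final URI tagsUri; // null when no base URL is configured
    private final Duration timeout;

    OllamaHealthProbe(String baseUrl, Duration timeout) {
        this.timeout = timeout;
        this.http = HttpClient.newBuilder().connectTimeout(timeout).build();
        this.tagsUri = (baseUrl == null || baseUrl.isBlank())
                ? null
                : URI.create(baseUrl.replaceAll("/+$", "") + "/api/tags"); // lists locally available models
    }

    /** True only if Ollama answers GET /api/tags with a 2xx status within the timeout. */
    boolean isAvailable() {
        if (tagsUri == null) {
            return false; // empty base URL: NL search is disabled
        }
        HttpRequest request = HttpRequest.newBuilder(tagsUri).timeout(timeout).GET().build();
        try {
            HttpResponse<Void> response = http.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() / 100 == 2;
        } catch (IOException e) {
            return false; // connection refused, DNS failure, or HttpTimeoutException (a subclass of IOException)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```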
**Configuration** (see `application.yaml` under `app.ollama`):
| Property | Default | Description |
|---|---|---|
| `app.ollama.base-url` | `http://ollama:11434` | Ollama service URL (dev: `http://localhost:11434`) |
| `app.ollama.model` | `qwen2.5:7b-instruct-q4_K_M` | Model to use for inference |
| `app.ollama.timeout-seconds` | `30` | Read timeout for inference calls |
| `app.nl-search.rate-limit.max-requests-per-minute` | `5` | Per-user rate limit |
### Upgrade the Ollama model
To switch to a newer model version (e.g. a future release of `qwen2.5`):
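1. Update the model name in the `ollama-model-init` `command:` in `docker-compose.yml`, and set `app.ollama.model` (see the configuration table above) to the same name so the backend requests the new model.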
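2. Stop and remove the Ollama containers (Docker refuses to remove a volume that a container still references), then remove the existing model volume to free the old weights:
```bash
docker compose rm --stop --force ollama ollama-model-init
docker volume rm familienarchiv_ollama_models
```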
(In production the volume name is prefixed with the compose project: `archiv-production_ollama_models`.)
3. Restart the stack:
```bash
docker compose up -d
```
The `ollama-model-init` container pulls the new model weights on first start (~4 to 8 GB download depending on the model). The `ollama` inference server will not start until the pull completes (`condition: service_completed_successfully`).
> **`ollama_models` volume:** holds model weights only — fully reproducible by re-pull, no backup needed.
### Trigger a canonical import
The importer no longer parses the raw spreadsheet. It consumes the **canonical artifacts**

@@ -45,6 +45,9 @@ _See also [TranscriptionBlock](#transcriptionblock-transcriptionblock)._
**raw attribution** (`Document.senderText`, `Document.receiverText`, `Document.metaDateRaw`) — the original spreadsheet cell text for a document's sender, receiver, and date, preserved verbatim even after a `Person` or normalized date is linked. It keeps provenance intact and enables an "as written in the original" view.
**auto-generated title** (`DocumentTitleFactory`) — a `Document` title composed by the formula `{index} {dateLabel} {location}` (index = `originalFilename`; date label honest at the row's precision; location omitted when blank). On edit, an unchanged auto-title follows a corrected date/location forward (exact old-vs-new match in `DocumentService.updateDocument`); a hand-written title is kept verbatim. `POST /api/admin/backfill-titles` rewrites already-stale ones in one sweep using a grammar heuristic (`DocumentTitleBackfillMatcher`).
_Not to be confused with a hand-written title_ — only a title that still equals what the factory builds is treated as machine-generated and rewritten; prose is left untouched.
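The title formula and the "follow forward only if unchanged" rule fit in two small functions. The sketch assumes the date label arrives already formatted at the row's precision. It also skips every blank part, although the glossary states this only for the location. `TitleSketch`, `buildTitle` and `titleAfterEdit` are hypothetical names; the real logic lives in `DocumentTitleFactory` and `DocumentService.updateDocument`.

```java
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the auto-title rules; not DocumentTitleFactory itself. */
final class TitleSketch {
    private TitleSketch() {}

    /** Builds "{index} {dateLabel} {location}", skipping null or blank parts (assumed for all three). */
    static String buildTitle(String index, String dateLabel, String location) {
        return Stream.of(index, dateLabel, location)
                .filter(Objects::nonNull)
                .map(String::strip)
                .filter(part -> !part.isEmpty())
                .collect(Collectors.joining(" "));
    }

    /** An unchanged auto-title follows the edit forward; anything else is treated as hand-written. */
    static String titleAfterEdit(String currentTitle, String oldAutoTitle, String newAutoTitle) {
        return currentTitle.equals(oldAutoTitle) ? newAutoTitle : currentTitle; // exact old-vs-new match
    }
}
```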
**DocumentVersion** (`DocumentVersion`) — an append-only snapshot of a `Document`'s metadata at a point in time. Append-only by convention; no consumer-facing create or update endpoint exists. The entity uses Lombok `@Data` (which generates setters), so immutability is enforced by application convention, not at the Java level.
**Tag** (`Tag`) — a hierarchical category that can be applied to `Document`s. Tags are self-referencing via a `parent_id` foreign key, forming a tree structure.
@@ -164,6 +167,16 @@ _See also [Chronik](#chronik-internal)._
---
## NL Search Terms
**NlSearch** — the natural-language document search feature. Users type a plain-German query (e.g. "Was hat Walter im Krieg an Emma geschrieben?"); the backend parses it via Ollama, resolves person names to database UUIDs, and delegates to the standard `DocumentService.searchDocuments()` path. Endpoint: `POST /api/search/nl`.
**NlQueryInterpretation** — the structured result of parsing a natural-language query. Contains: `resolvedPersons` (persons whose names unambiguously matched one DB record), `ambiguousPersons` (all candidates when a name matched more than one person), `keywords` (LLM-extracted search terms), `dateFrom`/`dateTo` (extracted date range), `rawQuery` (the original user input), and `keywordsApplied` (whether keyword FTS was used in the search).
**PersonHint** — a lightweight `{id, displayName}` pair used in `NlQueryInterpretation` to describe a resolved or ambiguous person without exposing the full `Person` entity to the frontend.
---
## Infrastructure Terms
**archiv-app** — the bucket-scoped MinIO service account the backend uses to read and write the `familienarchiv` bucket. Distinct from the MinIO root account (`archiv`, used only by the bootstrap container for admin operations). Defined and provisioned in [`infra/minio/bootstrap.sh`](../infra/minio/bootstrap.sh) and consumed by the backend as `S3_ACCESS_KEY` in [`docker-compose.prod.yml`](../docker-compose.prod.yml). The attached `archiv-app-policy` grants `s3:GetObject/PutObject/DeleteObject` on `familienarchiv/*` and `s3:ListBucket/GetBucketLocation` on the bucket only — not the built-in `readwrite` policy which would grant `s3:*` on all buckets.

@@ -35,7 +35,7 @@ Render thumbnails in-process in Spring Boot using **Apache PDFBox 3.0.4** (alrea
**Harder:**
- PDFBox is a parser attack surface. Mitigated by a 30-second watchdog timeout in `ThumbnailAsyncRunner` and by the fire-and-forget contract (failures never break upload).
- Memory ceiling: the `thumbnailExecutor` is capped at 2 threads on the CX32 (8 GB). A busy backfill alongside OCR can approach the 3 GB heap — acceptable but not comfortable. Streaming via `FileService.downloadFileStream` keeps this bounded for PDFs up to 50 MB.
- Memory ceiling: the `thumbnailExecutor` is capped at 2 threads on memory-constrained hosts. A busy backfill alongside OCR can approach the 3 GB heap on an 8 GB server — acceptable but not comfortable. The current production server (64 GB) has ample headroom. Streaming via `FileService.downloadFileStream` keeps this bounded for PDFs up to 50 MB.
### Operational caveats (intentional)

@@ -62,7 +62,7 @@ The `/tmp` tmpfs remains at 512 MB and continues to serve training-ZIP extractio
## Alternatives considered
**Approach B — Enlarge `/tmp` to 4 GB**
One-line change. Discarded because: (1) 4 GB tmpfs counts against the cgroup `mem_limit`; on CX32 hosts with `OCR_MEM_LIMIT=6g` the combined Surya resident set + tmpfs would trigger OOMKill on cold start; (2) staging GB-scale model files through RAM is using the wrong storage tier; (3) any future model larger than 4 GB requires another bump.
One-line change. Discarded because: (1) 4 GB tmpfs counts against the cgroup `mem_limit`; on servers with `OCR_MEM_LIMIT=6g` the combined Surya resident set + tmpfs would trigger OOMKill on cold start; (2) staging GB-scale model files through RAM is using the wrong storage tier; (3) any future model larger than 4 GB requires another bump.
**Approach C — Both TMPDIR redirect and enlarged /tmp**
Belt-and-suspenders: Approach A + 1 GB tmpfs. Discarded in favour of the cleaner Approach A. The defence-in-depth benefit does not outweigh the extra compose churn; the 512 MB cap on `/tmp` is intentional.

@@ -0,0 +1,65 @@
# ADR-028 — Natural language search is powered by Ollama (Qwen 2.5 7B), not a cloud API
**Date:** 2026-06-06
**Status:** Accepted
**Issue:** #738 (NL search backend); part of epic #735
**Milestone:** Archive Intelligence — NL Search
---
## Context
Family members write their search intent in plain German ("Was hat Walter im Krieg an Emma geschrieben?"), not in structured filter forms. Issue #735 defines NL search as a core product goal. Three delivery options were evaluated:
**Option A — extend the OCR service.** The OCR Python microservice already runs on the same host. Adding LLM inference there avoids a new container. Rejected: the OCR service is a single-purpose, CPU-bound pipeline optimised for Kraken; bundling ~4.5 GB of LLM weights into the same image would bloat it, complicate model lifecycle management, and create an unrelated failure domain (OOM on large OCR batches vs. LLM load time). ADR-001 was explicit about keeping OCR single-purpose.
**Option B — call an external API (OpenAI, Anthropic, etc.).** Cloud inference is instant and requires no local hardware. Rejected: the archive contains real person names and private family correspondence from 18991950 — sending query content to a third party violates the project's data-residency principle (family data stays on the family server). Additionally, API cost and availability are outside the operator's control; the system must work air-gapped.
**Option C — local Ollama service (chosen).** Ollama is a purpose-built LLM runtime with a simple REST API, model lifecycle management (`ollama pull`), and support for grammar-constrained JSON output. It runs entirely on the existing server (i7-6700, 64 GB RAM) with no cloud dependency.
**Model selection:** Qwen 2.5 7B Q4_K_M (`qwen2.5:7b-instruct-q4_K_M`) was chosen over larger models because:
- Quantised weight is ~4.5 GB — fits comfortably in 64 GB RAM alongside PostgreSQL and the JVM.
- Instruction-tuned variant follows the structured JSON schema reliably without fine-tuning.
- CPU-only inference at Q4_K_M takes 2 to 15 seconds per query. That is acceptable for a search that replaces a multi-step filter form.
**Prompt injection mitigation:** The backend sends the raw user query to Ollama. To prevent the model from being prompted to return schema-breaking output, the API call uses Ollama's `format` parameter with a grammar-constrained JSON schema. Output length is further bounded by `maxLength` constraints in the schema (names ≤ 200 chars, keywords ≤ 100 chars). `NlQueryParserService` enforces these limits in code before any LLM-extracted fragment is passed to `PersonRepository.searchByName()` — defence in depth.
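The code-level guard can be sketched as follows. The sketch assumes the model's JSON has already been deserialised into plain string lists. `FragmentLimits` and `enforce` are hypothetical names, not the real `NlQueryParserService` API. Silently dropping over-long fragments, rather than truncating them or rejecting the whole query, is also an assumption:

```java
import java.util.List;

final class FragmentLimits {
    // Limits mirror the maxLength values declared in the grammar-constrained schema.
    static final int MAX_NAME_LENGTH = 200;
    static final int MAX_KEYWORD_LENGTH = 100;

    private FragmentLimits() {}

    // Strips surrounding whitespace, then keeps only non-empty fragments within maxLength.
    // Dropping over-long fragments is an assumed policy for this sketch.
    static List<String> enforce(List<String> fragments, int maxLength) {
        return fragments.stream()
                .map(String::strip)
                .filter(s -> !s.isEmpty() && s.length() <= maxLength)
                .toList();
    }
}
```

Only the output of `enforce(names, FragmentLimits.MAX_NAME_LENGTH)` would then reach the name lookup. The schema limit and the code limit agree, so the code check only fires if the schema constraint is ever bypassed.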
**DB-blind name resolution:** The Ollama prompt stays small (the raw query only); person database records are never sent to the model. Name resolution happens as a cheap SQL query after the model returns. This keeps the prompt short, avoids data leakage, and means adding 1,000 new persons requires no prompt change.
**Graceful degradation:** `RestClientOllamaClient.isHealthy()` is called inline before each inference request (calls `GET /api/tags` on a 2-second connect-timeout client). If Ollama is absent or times out, `NlQueryParserService` throws `DomainException` with `SMART_SEARCH_UNAVAILABLE` (HTTP 503). The regular structured search (`GET /api/documents/search`) is unaffected — it never calls Ollama.
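The fail-fast order (probe first, then infer) can be sketched in plain Java. The interfaces and exception below are hypothetical simplifications of `OllamaHealthClient`, `OllamaClient` and `DomainException`. Treating a probe that throws the same as one that returns `false` is an assumption:

```java
// Hypothetical, simplified stand-ins for the real health/inference clients.
interface OllamaHealthCheck { boolean isHealthy(); }
interface OllamaInference { String generate(String prompt); }

// Stand-in for DomainException with SMART_SEARCH_UNAVAILABLE (HTTP 503).
final class SmartSearchUnavailableException extends RuntimeException {
    SmartSearchUnavailableException(String message) { super(message); }
}

final class NlParseSketch {
    private final OllamaHealthCheck health;
    private final OllamaInference inference;

    NlParseSketch(OllamaHealthCheck health, OllamaInference inference) {
        this.health = health;
        this.inference = inference;
    }

    String parse(String query) {
        // Inline probe before every inference call; a probe that throws
        // (e.g. connect timeout) is treated the same as "unhealthy".
        boolean healthy;
        try {
            healthy = health.isHealthy();
        } catch (RuntimeException e) {
            healthy = false;
        }
        if (!healthy) {
            throw new SmartSearchUnavailableException("SMART_SEARCH_UNAVAILABLE");
        }
        return inference.generate(query);
    }
}
```

Inference is never attempted when the probe fails. An absent Ollama therefore costs at most the 2-second connect timeout instead of the 30-second inference timeout.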
**Expected inference latency:** 2 to 15 seconds on the current CPU-only hardware. The frontend must show a persistent "Suche läuft…" indicator for the full duration (see the `aria-live="polite"` requirement in the issue #738 frontend notes). The backend timeout is 30 seconds (`app.ollama.timeout-seconds=30`). This was chosen as a safe upper bound for Q4_K_M on the i7-6700 with a realistic 500-character query under modest concurrent load.
**NL query logging policy:** Only metadata is logged — query length, resolved person count, latency in milliseconds. The raw query is never written to the log file. Rationale: queries contain real family names (PII); log files persist to disk and may be shipped to Loki. Structured metadata is sufficient for debugging latency regressions.
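The metadata-only rule can be illustrated with a hypothetical helper that formats the log message without ever embedding the query text. The real code would pass the same values to an SLF4J parameterized call. The message and field names here are assumptions:

```java
import java.util.Locale;

final class NlSearchLogSketch {
    private NlSearchLogSketch() {}

    // Only the query's length is recorded; the raw text (which may contain
    // family names) never becomes part of the log message.
    static String logLine(String rawQuery, int resolvedPersons, long latencyMillis) {
        return String.format(Locale.ROOT,
                "nl-search queryLength=%d resolvedPersons=%d latencyMs=%d",
                rawQuery.length(), resolvedPersons, latencyMillis);
    }
}
```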
**Prompt-amplification abuse:** A malicious user could submit a long or crafted query to cause slow Ollama inference, consuming CPU. Mitigated by `NlSearchRateLimiter` (5 requests per user per minute, Bucket4j + Caffeine) and by `@Size(max=500)` on the request body. The rate limiter is node-local; in multi-replica deployments the effective limit multiplies by replica count — acceptable at the current single-node deployment scale.
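The sketch below is a JDK-only stand-in to show the behaviour, not the project's Bucket4j + Caffeine limiter. It is a node-local token bucket per user, with continuous refill (5 tokens per minute) and an injectable clock. Unlike a Caffeine-backed cache, its map never evicts idle users, and whether the real Bucket4j configuration refills continuously or in intervals is an assumption. Because the map lives in a single JVM, every replica would keep its own buckets, which is the multiplication caveat described above:

```java
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// JDK-only illustration of a per-user, node-local token bucket.
final class PerUserRateLimiterSketch {
    private final int capacity;
    private final long refillPeriodNanos;
    private final LongSupplier clock; // nanosecond time source, injectable for tests
    private final ConcurrentHashMap<String, Bucket> buckets = new ConcurrentHashMap<>();

    private static final class Bucket {
        double tokens;
        long lastRefill;
        Bucket(double tokens, long now) { this.tokens = tokens; this.lastRefill = now; }
    }

    PerUserRateLimiterSketch(int capacity, Duration period, LongSupplier clock) {
        this.capacity = capacity;
        this.refillPeriodNanos = period.toNanos();
        this.clock = clock;
    }

    boolean tryConsume(String userId) {
        long now = clock.getAsLong();
        Bucket b = buckets.computeIfAbsent(userId, id -> new Bucket(capacity, now));
        synchronized (b) {
            // Continuous refill: `capacity` tokens per period, never above `capacity`.
            double refill = (double) (now - b.lastRefill) / refillPeriodNanos * capacity;
            b.tokens = Math.min(capacity, b.tokens + refill);
            b.lastRefill = now;
            if (b.tokens >= 1) {
                b.tokens -= 1;
                return true;
            }
            return false;
        }
    }
}
```

With `new PerUserRateLimiterSketch(5, Duration.ofMinutes(1), System::nanoTime)`, a sixth request inside the same minute is rejected. After about 12 seconds, one more request is allowed.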
**Ollama model pre-pull requirement:** The Docker image contains only the Ollama binary, not the model weights. The operator must run `ollama pull qwen2.5:7b-instruct-q4_K_M` (≈4.5 GB download, 10 to 30 minutes) before the backend can run inference. If this step is skipped, every NL search request returns 503 until the pull completes. The deployment runbook in `docs/DEPLOYMENT.md` covers this explicitly.
**Startup dependency:** The `backend` Compose service declares `depends_on: ollama: condition: service_healthy`. The Ollama healthcheck polls `GET http://localhost:11434/api/tags`; `start_period: 120s` provides margin for weight loading (20 to 60 s on SSD). Note: `service_healthy` confirms that the API is responding, not that the model is downloaded. If the pull was skipped, inference still returns 404.
**Multi-name resolution heuristic:** For 2-name queries (e.g. "Was hat Walter an Emma geschrieben?"), the first extracted name is treated as sender and the second as receiver. Per-name role annotation (e.g. `{name: "Walter", role: "sender"}`) was rejected because it would require a combinatorially complex Ollama schema and the most natural German phrasing strongly implies sender→receiver order. For single-name queries, a `personRole` field (`sender`/`receiver`/`any`) is returned.
**`personRole: "any"` keyword limitation:** When `personRole` is `"any"` and the name resolves to exactly one person, `DocumentService.searchDocumentsByPersonId()` is called (OR semantics: person as sender or receiver). Keyword filtering is not applied on this path — only person identity and date range. `keywordsApplied = false` is returned in the response. Rationale: the JPQL for OR-semantics person queries has no text predicate; adding FTS would require a native query or a separate pass, adding complexity for a case that is already well-narrowed by person identity.
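The two naming rules above reduce to small pure functions. This sketch uses hypothetical names (`NlRoutingSketch`, `SenderReceiver`, `PersonRole`). The text does not specify what happens with three or more extracted names, so the sketch only pairs exactly two:

```java
import java.util.List;
import java.util.Optional;

enum PersonRole { SENDER, RECEIVER, ANY }

record SenderReceiver(String senderName, String receiverName) {}

final class NlRoutingSketch {
    private NlRoutingSketch() {}

    // Two-name queries: the first extracted name is the sender, the second the receiver.
    static Optional<SenderReceiver> senderReceiver(List<String> names) {
        return names.size() == 2
                ? Optional.of(new SenderReceiver(names.get(0), names.get(1)))
                : Optional.empty();
    }

    // Single name, role ANY, exactly one matching person: the OR-semantics
    // searchDocumentsByPersonId() path. It applies no keyword filter, so the
    // response reports keywordsApplied = false.
    static boolean usesPersonIdPath(List<String> names, PersonRole role, int resolvedPersons) {
        return names.size() == 1 && role == PersonRole.ANY && resolvedPersons == 1;
    }
}
```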
**`search/``person/` + `document/` dependency direction:** `NlQueryParserService` calls `PersonService.findByDisplayNameContaining()` and `DocumentService.searchDocuments()` — both are legitimate cross-domain service calls, not repository leaks. The `search/` package has no JPA entities of its own and never accesses `PersonRepository` or `DocumentRepository` directly.
## Decision
**Introduce a new `search/` domain package** with a local Ollama integration via `RestClientOllamaClient`. The Ollama service runs as a separate Docker container, reachable only on the internal Docker network (`expose: ["11434"]`, not `ports:`). The backend calls Ollama's `/api/generate` endpoint with grammar-constrained JSON output. Name resolution and document search are performed by existing services after the model returns.
Key component structure:
- `OllamaClient` / `OllamaHealthClient` interfaces — mockable for tests, modelled on `OcrClient`/`OcrHealthClient`
- `RestClientOllamaClient` — two `RestClient` instances (30 s inference, 2 s health-check)
- `NlQueryParserService` — orchestrates Ollama → name resolution → document search
- `NlSearchRateLimiter` — Bucket4j + Caffeine, 5 req/min per user
- `NlSearchController``POST /api/search/nl`, `@RequirePermission(READ_ALL)`
## Consequences
- Family members can query in natural German without learning filter UI. Expected search satisfaction improvement for the 60+ age cohort (primary transcription audience) is significant.
- NL search is unavailable when Ollama is down or the model pull is not complete. The regular search is unaffected. The 503 response includes a CTA directing users to the regular search.
- Operator responsibility: run `ollama pull` on first deploy and after model updates. The backup runbook must exclude `ollama_models` volume (model weights are re-downloadable, not user data).
- Inference takes 2 to 15 seconds. The frontend loading indicator is a hard requirement (see issue #738 frontend notes).
- The rate limiter is node-local. At the current single-node deployment scale this is correct. If the service is ever scaled horizontally, the rate limiter must be moved to Redis (same caveat as `LoginRateLimiter`).
- The `search/` package introduces a new cross-domain dependency direction (`search``person`, `search``document`). This is intentional and documented in `docs/architecture/c4/l3-backend-search.puml`.

# ADR-028: Ollama Docker Compose service for NL search
**Date:** 2026-06-06
**Status:** Accepted
**Deciders:** Marcel Raddatz
**Relates to:** #737 (infrastructure), #735 (NL search epic)
---
## Context
Issue #735 introduces natural-language document search, requiring a local LLM to generate embeddings and/or run inference at query time. The family archive stores personal family history — data privacy is non-negotiable, so cloud-based inference APIs are excluded. The production target is a Hetzner CX42 (16 GB RAM, 8 vCPUs, CPU-only, ~32 EUR/month).
Alternatives considered:
| Option | Reason rejected |
|---|---|
| **llama.cpp** | No HTTP API out of the box; requires custom wrapper; higher ops burden |
| **vLLM** | GPU-first; significant overhead on CPU-only hardware; overkill for this scale |
| **Cloud APIs** (OpenAI, Gemini, etc.) | Vendor lock-in; per-token cost at scale; data leaves the server — unacceptable for a private family archive |
| **Ollama** | Self-contained Docker image; built-in HTTP REST API; actively maintained; CPU-compatible; zero egress |
**Decision:** run Ollama as a Docker Compose service alongside the existing stack.
---
## Decisions
### 1. Hardware minimums and CPU-only constraint
All inference runs on CPU. The target is the Hetzner CX42 (16 GB RAM, 8 vCPUs).
| Tier | RAM | NL search |
|---|---|---|
| CX42 | 16 GB | Supported — full stack including Ollama |
| CX32 | 8 GB | Disabled — set `APP_OLLAMA_BASE_URL=` (empty) to skip Ollama entirely |
| CX22 | 4 GB | Unsupported for NL search |
### 2. Memory budget on CX42
| Component | `mem_limit` | Typical active RSS |
|---|---|---|
| OCR service | 12g (hard ceiling) | ~6 GB |
| Ollama | 8g | ~8 GB |
| **Total** | | **~14 GB active** |
`memswap_limit` on the Ollama service is set to `8g` (matching `mem_limit`) to prevent Linux from swapping model weights into swap under OCR memory pressure. Swapping model weights does not crash the container but silently degrades inference latency. This mirrors the pattern already applied to the OCR service.
**Operational constraint:** do NOT run `docker-compose.observability.yml` continuously alongside both OCR and Ollama on a CX42. The observability stack adds ~2 GB, which leaves no headroom.
### 3. Graceful-degradation contract
`app.ollama.base-url` absent OR blank → Ollama bean NOT registered → NL search returns HTTP 503 with `ErrorCode: NL_SEARCH_UNAVAILABLE`.
This single code path covers all unavailability scenarios: base-url unset, service unreachable, health check failed, and request timeout.
#### Why not `@ConditionalOnProperty`
`@ConditionalOnProperty` registers the bean when the property is present but blank (`APP_OLLAMA_BASE_URL=`). This produces a `RestClient` with an empty base URL that fails at runtime with an opaque error rather than a clean 503.
#### Correct condition expression
```java
@ConditionalOnExpression("!'${app.ollama.base-url:}'.isBlank()")
```
When the property is absent, the placeholder resolves to `''`; `.isBlank()` returns `true`; negation makes the condition `false`; the bean is not registered. Same result for an explicit empty string (`APP_OLLAMA_BASE_URL=`).
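A plain-Java model of the expression makes the truth table explicit. This is a hypothetical class, not Spring code. It also covers a whitespace-only value, which `isBlank()` treats as disabled too:

```java
final class OllamaConditionSketch {
    private OllamaConditionSketch() {}

    // Mirrors "!'${app.ollama.base-url:}'.isBlank()": an absent property
    // resolves to the empty default before the expression is evaluated.
    static boolean registersClient(String propertyValueOrNull) {
        String resolved = propertyValueOrNull == null ? "" : propertyValueOrNull;
        return !resolved.isBlank();
    }
}
```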
### 4. Backend configuration pattern
Use a `@ConfigurationProperties` record, not separate `@Value` injections:
```java
@ConfigurationProperties("app.ollama")
record OllamaProperties(String baseUrl, String apiKey) {}
```
`OllamaProperties` is registered unconditionally — it is a plain value holder with no side effects.
`@ConditionalOnExpression` belongs **only** on `RestClientOllamaClient` (the bean that creates a live network client).
**Deliberate divergence from the OCR pattern:** the OCR service uses `@Value`-with-default because OCR is always-on and `http://ocr-service:8000` is a safe default. Ollama is truly optional — a missing URL means "feature disabled", not "use this default server". There is no safe default Ollama URL.
### 5. `Optional<OllamaClient>` injection
The NL search service uses constructor injection with `Optional<OllamaClient>`:
```java
private final Optional<OllamaClient> ollamaClient;
```
When empty (bean not registered), the service method returns 503 immediately:
```java
var client = ollamaClient.orElseThrow(
        () -> DomainException.internal(ErrorCode.NL_SEARCH_UNAVAILABLE, "Ollama not configured"));
```
Prefer this over `@Autowired(required = false)` with a null check — the null-check pattern is noisy when the service already uses `@RequiredArgsConstructor`.
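Here is a self-contained sketch of the pattern. It uses hypothetical stand-in types (`NlOllamaClient` and `NlSearchUnavailableException` in place of the real `OllamaClient` and `DomainException`). Spring supplies `Optional.empty()` for an `Optional<T>` constructor parameter when no bean of type `T` exists, so the service can be built by hand in a unit test:

```java
import java.util.Optional;

// Hypothetical stand-ins for OllamaClient and DomainException(NL_SEARCH_UNAVAILABLE).
interface NlOllamaClient { String generate(String prompt); }

final class NlSearchUnavailableException extends RuntimeException {
    NlSearchUnavailableException(String message) { super(message); }
}

final class NlSearchServiceSketch {
    private final Optional<NlOllamaClient> ollamaClient;

    // Constructor injection: empty when the client bean was not registered.
    NlSearchServiceSketch(Optional<NlOllamaClient> ollamaClient) {
        this.ollamaClient = ollamaClient;
    }

    String search(String query) {
        var client = ollamaClient.orElseThrow(
                () -> new NlSearchUnavailableException("Ollama not configured"));
        return client.generate(query);
    }
}
```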
### 6. Empty API key guard
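`RestClientOllamaClient` omits the `Authorization` header entirely when `apiKey` is null or blank. A record-bound `@ConfigurationProperties` component whose property is not set binds as `null`, so calling `isBlank()` alone would throw:
```java
if (apiKey != null && !apiKey.isBlank()) {
    request.header("Authorization", "Bearer " + apiKey);
}
```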
Sending `Authorization: Bearer ` (empty token) has undefined or potentially broken behavior depending on the Ollama version. This mirrors the `trainingToken` guard in `RestClientOcrClient.java:107`.
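The same guard can be expressed with the JDK's `java.net.http` client instead of Spring's `RestClient`. This substitution is for illustration only, and `tagsRequest` is a hypothetical helper. A `null` key is treated like a blank one:

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class AuthHeaderSketch {
    private AuthHeaderSketch() {}

    // Builds GET {baseUrl}/api/tags; the Bearer header is added only for a
    // non-null, non-blank key, so "Authorization: Bearer " is never sent.
    static HttpRequest tagsRequest(String baseUrl, String apiKey) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(baseUrl + "/api/tags")).GET();
        if (apiKey != null && !apiKey.isBlank()) {
            builder.header("Authorization", "Bearer " + apiKey);
        }
        return builder.build();
    }
}
```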
### 7. OLLAMA_API_KEY behavior in Ollama 0.6.5 and 0.30.6
**Empirically verified (2026-06-06) on both `0.6.5` and `0.30.6`:** `OLLAMA_API_KEY` does **not** enforce request authentication in either version.
Test matrix run against `/api/tags`:
| Configuration | No auth header | `Authorization: Bearer ` (empty) | `Authorization: Bearer wrongkey` | `Authorization: Bearer correctkey` |
|---|---|---|---|---|
| `OLLAMA_API_KEY=` (empty) | 200 | 200 | — | — |
| `OLLAMA_API_KEY` unset | 200 | — | — | — |
| `OLLAMA_API_KEY=testkey99` | 200 | 200 | 200 | 200 |
**Finding:** The `OLLAMA_API_KEY` environment variable is not listed in Ollama's startup config dump and does not gate any HTTP request in either tested version. All configurations — empty string, fully unset, and a real key — accept all requests without authentication.
**Practical implication:** `OLLAMA_API_KEY` provides no defense-in-depth in the tested versions. `archiv-net` network isolation is the only effective security control. The env var is retained in the Compose definition and `.env.example` for forward compatibility if Ollama enables enforcement in a future version, but operators must not rely on it for access control.
**Backend guard still valid:** the `RestClientOllamaClient` code-level guard (omit `Authorization` header when `apiKey.isBlank()`) remains correct behavior regardless — it prevents a malformed `Authorization: Bearer ` header from being sent.
### 8. read_only: true feasibility
**Empirically verified (2026-06-06) on both `0.6.5` and `0.30.6`:** `read_only: true` works with Ollama. All three operations — `ollama serve`, `ollama pull qwen2.5:7b-instruct-q4_K_M`, and `ollama list` — succeeded with exit code 0 in both versions.
Test run:
```bash
docker run --rm --read-only \
-v ollama_models:/root/.ollama \
--tmpfs /tmp \
--entrypoint sh ollama/ollama:0.30.6 \
-c "ollama serve & sleep 5 && ollama pull qwen2.5:7b-instruct-q4_K_M && ollama list"
```
**Note:** the entrypoint must be overridden to `sh` for the test command — the container's default entrypoint is `/bin/ollama` and does not accept `sh` as a subcommand. This is a Docker invocation detail; the Compose service definition uses the image's default entrypoint and `command:` override for the init container, which works correctly.
**Result:** `read_only: true` and `tmpfs: - /tmp:size=512m` are applied to both `ollama` and `ollama-model-init`. The `ollama_models` volume handles all persistent writes; no other paths require write access during normal operation.
### 9. Peak RSS of init container during pull
**Empirically verified (2026-06-06):** Peak RSS during `qwen2.5:7b-instruct-q4_K_M` pull was **~108 MiB**.
`docker stats` samples during the pull (15-second intervals):
| Sample | MEM |
|---|---|
| 1 | 54.89 MiB |
| 2 | 66.3 MiB |
| 5 | 97.25 MiB |
| 9 | **107.8 MiB** (peak) |
`mem_limit: 2g` is adequate — the model weights stream directly to the named volume; RSS is dominated by the Ollama server process alone (~100 MB), not the model data. No bump to 4 GB needed.
### 10. Init container pull mechanism
The `ollama-model-init` container uses a curl-based readiness loop with captured PID:
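```sh
ollama serve & SERVE_PID=$!
until curl -sf http://localhost:11434/api/tags > /dev/null; do sleep 1; done
ollama pull qwen2.5:7b-instruct-q4_K_M
PULL_STATUS=$?
kill $SERVE_PID
exit $PULL_STATUS
```
`kill %1` (job-control syntax) is unreliable in non-interactive `sh -c` contexts, but capturing the PID via `SERVE_PID=$!` is reliable. The script exits with the pull's own status. Without that, the exit code would be that of `kill`, and a failed pull would still satisfy `condition: service_completed_successfully`.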
The same endpoint (`/api/tags`) is used for both the init container readiness loop and the main service `healthcheck`.
### 11. start_period: 60s rationale
The model is pre-pulled by `ollama-model-init` before the main service starts (via `condition: service_completed_successfully`). At main service startup, Ollama only loads model weights from the named volume and binds port 11434.
60 seconds is appropriate for this cold-start profile. 300 seconds was considered — that would be appropriate if the service pulled the model itself — but overstates actual startup time when the model is already present on the volume.
### 12. Security threat model
**Primary control:** `archiv-net` network isolation. Ollama has no externally exposed port (`expose:` only, not `ports:`). The Caddyfile must not route any path to the Ollama service.
**Note on `OLLAMA_API_KEY`:** Per §7, `OLLAMA_API_KEY` is not enforced in Ollama 0.6.5 or 0.30.6 and provides no authentication barrier against a compromised backend container. `archiv-net` network isolation is the sole effective security control. The env var is retained for forward compatibility only — do not rely on it for access control.
Both `ollama` and `ollama-model-init` receive the ADR-019 hardening baseline:
```yaml
cap_drop: [ALL]
security_opt: [no-new-privileges:true]
```
### 13. CI exclusion strategy
Docker Compose profiles are not used — they would add developer friction (requiring `--profile ...` for all local dev commands).
CI uses explicit service selection in `docker-compose.ci.yml`:
```bash
docker compose -f docker-compose.ci.yml up -d db minio create-buckets
```
Ollama is simply not listed and is never started in CI. A YAML comment on the `ollama` service block documents this:
```yaml
# Not started in CI — CI uses explicit service selection
# (docker-compose.ci.yml: db minio create-buckets)
```
### 14. ollama_models volume operational note
The `ollama_models` named volume holds model weights only — fully reproducible by re-pull. No backup is needed.
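If the volume fills the disk after a model upgrade, first remove the containers that reference it. Docker refuses to remove a volume that is still in use by any container, running or stopped. Then remove the volume itself:
```bash
docker compose rm -sf ollama ollama-model-init && docker volume rm ollama_models
docker compose up -d
```
Compose prefixes named volumes with the project name unless the volume declares an explicit `name:`, so check the actual name with `docker volume ls` first. The init container re-pulls the model on next startup.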
---
## Consequences
### Positive
- NL search runs entirely on-premises; no data leaves the server and no per-token cloud cost.
- Graceful degradation is a first-class concern: smaller or budget-constrained instances can run the app without Ollama with a single env var change.
- The init container pattern keeps model pull out of the critical startup path for the main service, giving accurate healthcheck timings.
- `@ConditionalOnExpression` with a blank-check is more correct than `@ConditionalOnProperty` for optional features with no safe default URL.
### Risks and operational implications
- **Memory pressure:** OCR + Ollama together consume ~14 GB on a 16 GB host. Running the observability stack simultaneously risks OOM kills. Monitor with `docker stats`.
- **CPU inference latency:** `qwen2.5:7b-instruct-q4_K_M` is chosen for CPU viability, but inference on 8 vCPUs will be noticeably slower than GPU-accelerated alternatives. This is acceptable for the family archive use case (low concurrency, not real-time).
- All three empirical TBD items from the original issue spec were resolved — see §7 (OLLAMA_API_KEY not enforced), §8 (`read_only: true` works), §9 (peak RSS ~108 MiB).
- Model upgrades require a `docker volume rm` to free old weights before pulling the replacement. Document this in the deployment runbook (`docs/DEPLOYMENT.md`).

# ADR-031 — The document title is a shared `document`-package factory, re-synced by an exact match on save and a grammar heuristic on a one-time backfill
**Date:** 2026-06-04
**Status:** Accepted
**Issue:** #726 (auto-sync document titles with date/location: save-time + one-time backfill)
**Milestone:**
---
## Context
A document title was a string built **once**, at import time, by a private
`DocumentImporter.buildTitle()` composing `{index} {dateLabel} {location}` (index =
`originalFilename`, date label honest at the row's precision via `DocumentTitleFormatter`,
location verbatim). Nothing rebuilt it afterwards. When an archivist later corrected a date
or location in the edit form, the title kept its stale value (e.g. it still read `2028`
after the date was fixed to `1928`), because the edit form round-trips the stored title
verbatim and `updateDocument` simply re-persisted it.
Two distinct problems live here:
1. **Going forward**, an edit to date/location must flow into a title that was machine-built
— but must never overwrite a title a human wrote.
2. **The existing backlog** of already-stale titles must be cleaned once. For these rows the
pre-edit state is gone, so there is no exact value to compare against.
The composition formula also existed only inside `importing`, which is the wrong owner: a
title is a `document` concern, and three call sites (import, save-time, backfill) must share
one rule or they will drift.
## Decision
### 1. One formula, owned by the `document` package (`DocumentTitleFactory`)
Extract the composition into `DocumentTitleFactory` (a `@Component` in the `document`
package) with `build(Document)`. `DocumentImporter` (package `importing`) now consumes it.
`DocumentTitleFormatter` moves into `document` alongside the factory (it stays
package-private; `importing` reaches the formula only through the factory). The direction is
deliberate: `document` owns the rule, `importing` depends on it — not the reverse. The
German date *label* remains the deliberate Java/TS dual implementation pinned by
`docs/date-label-fixtures.json` (#666); this ADR touches the **composition** only and does
not collapse the frontend `formatDocumentDate`.
### 2. Save-time regeneration is an EXACT match, not a heuristic
In `DocumentService.updateDocument` only (bulk edit is out of scope), capture
`autoTitleBefore = titleFactory.build(doc)` from the **currently-persisted** state *before*
any setter runs. Then:
- if the **submitted** title equals `autoTitleBefore`, it was the machine value → rebuild
from the new state;
- otherwise keep the submitted title verbatim (hand-written or freshly typed).
This is an exact old-vs-new comparison — no false positives, no false negatives — relying on
the edit form round-tripping an untouched title verbatim. `projectedState` mirrors the
existing setter asymmetry exactly: `documentDate`/`location` overwrite unconditionally (a
null clears them), while precision/end/raw are taken from the DTO only when non-null and
otherwise kept from the entity. A blank submission is never persisted (the title is always
present) — it falls back to the rebuilt auto-title, which always carries at least the index.
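Below is a simplified sketch of the save-time rule. `TitleInputs`, `TitleSyncSketch` and the plain-string date label are hypothetical. The real `DocumentTitleFactory.build(Document)` reads the entity and formats the label through `DocumentTitleFormatter`. What the sketch shows is the ordering: the old auto-title is computed from the persisted state before anything changes, and only an exact match (or a blank submission) triggers a rebuild:

```java
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, simplified inputs to the title formula.
record TitleInputs(String index, String dateLabel, String location) {}

final class TitleSyncSketch {
    private TitleSyncSketch() {}

    // "{index} {dateLabel} {location}", skipping missing parts.
    static String build(TitleInputs in) {
        return Stream.of(in.index(), in.dateLabel(), in.location())
                .filter(Objects::nonNull)
                .filter(s -> !s.isBlank())
                .collect(Collectors.joining(" "));
    }

    static String resolve(String submittedTitle, TitleInputs persisted, TitleInputs projected) {
        String autoTitleBefore = build(persisted); // captured before any setter runs
        if (submittedTitle == null || submittedTitle.isBlank()
                || submittedTitle.equals(autoTitleBefore)) {
            return build(projected); // machine value or blank: rebuild from the new state
        }
        return submittedTitle; // hand-written: keep verbatim
    }
}
```

For example, a stored `A1.2 2028 Berlin`, submitted unchanged alongside a date corrected to 1928, becomes `A1.2 1928 Berlin`. `Brief an Emma` stays as typed.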
### 3. The one-time backlog cleanup is a grammar heuristic, behind an ADMIN endpoint
`POST /api/admin/backfill-titles` (synchronous, under `AdminController`'s class-level
`@RequirePermission(Permission.ADMIN)`) sweeps every document and, for each whose stored
title passes the overwrite test, rebuilds it via the factory. Because the pre-edit state is
gone, the test (`DocumentTitleBackfillMatcher`, used **only** here) is a grammar heuristic:
after stripping the **literal** index prefix, the remainder must be exactly the index, a
known date-label form (+ an optional trailing location), or a lone segment equal to the
document's current location. Prose is left untouched; anything malformed fails closed.
The backfill saves via `documentRepository.save` directly and **never** routes through
`updateDocument` — following the `backfillFileHashes` precedent — so a mechanical rename does
not snapshot the whole corpus into `document_versions`. It is idempotent (a second run
rewrites nothing) and logs one SLF4J-parameterized `scanned/updated/skipped` line; the
response is `BackfillResult(count)`.
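The following is a rough sketch of the overwrite test's shape, including the literal index comparison required by the security note under Consequences. The `DATE_LABEL` pattern is a hypothetical placeholder (a bare year or `d.m.yyyy`), not the real grammar of the German date labels. The sketch also assumes a date label is a single space-free token:

```java
import java.util.regex.Pattern;

final class BackfillMatcherSketch {
    // Hypothetical placeholder grammar: "1928" or "12.03.1928".
    // Bounded, non-nested quantifiers only.
    private static final Pattern DATE_LABEL =
            Pattern.compile("(?:\\d{1,2}\\.\\d{1,2}\\.)?\\d{4}");

    private BackfillMatcherSketch() {}

    static boolean isMachineBuilt(String title, String index, String currentLocation) {
        // Literal prefix check: the user-controlled index is never compiled as a regex.
        if (title == null || index == null || !title.startsWith(index)) {
            return false; // fail closed
        }
        String rest = title.substring(index.length());
        if (rest.isEmpty()) {
            return true; // title is exactly the index
        }
        if (!rest.startsWith(" ")) {
            return false; // e.g. index "A1.2" inside title "A1.20 ..."
        }
        rest = rest.substring(1);
        if (currentLocation != null && rest.equals(currentLocation)) {
            return true; // lone segment equal to the current location
        }
        int space = rest.indexOf(' ');
        String head = space < 0 ? rest : rest.substring(0, space);
        return DATE_LABEL.matcher(head).matches(); // date label + optional trailing text
    }
}
```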
### 4. Edit-form feedback (FR-005)
A localized helper line (de/en/es) under the title input explains that the title is built
from date/place and that a hand-edit is preserved, wired via `aria-describedby` and shown
only on the single-document edit form. A live preview was considered and declined.
## Consequences
- The three call sites can never diverge — there is exactly one formula
(`NFR-MAINT-001`). Save-time cost is a string build + compare; the backfill is one
synchronous transactional sweep over a low-thousands corpus.
- Security: the index is compared **literally** (`String.startsWith` / `Pattern.quote`)
because `originalFilename` is user-controlled and may carry regex metacharacters — an
unquoted pattern would be a ReDoS / regex-injection vector (CWE-1333 / CWE-625). The
date-label sub-patterns use only bounded, non-nested quantifiers.
- **File-replaced documents are treated as manual, by design.** The index is
`originalFilename`, which `updateDocument` reassigns to the uploaded file's name on a
file-replace. After a replace the stored title no longer matches `build(currentState)`, so
neither save-time nor backfill rewrites it. This is the accepted fail-safe of overloading
`originalFilename` rather than adding a dedicated `catalogIndex` column.
- The save-time false-positive risk is zero (exact match); the backfill heuristic can, by its
documented FR-004 rule, treat `{index} {valid date label} {anything}` as machine-built
and rewrite the trailing segment. This is the accepted trade for cleaning the backlog
without the lost pre-edit state.
## Alternatives considered
- **A dedicated `catalogIndex` column** instead of overloading `originalFilename` — rejected;
it adds a migration and a second source of truth for the index for no current benefit, and
the file-replace fail-safe is acceptable.
- **A heuristic at save-time too** (instead of the exact match) — rejected; the stored title
is available pre-edit, so an exact comparison is strictly better (no false positives).
- **A live title preview in the edit form** — rejected (FR-005); a static helper line is
calmer for the 60+ audience and avoids a second client-side mirror of the formula.
- **Collapsing the frontend `formatDocumentDate` into the backend** — out of scope; the
Java/TS date-label split is the deliberate #666 design, pinned by a shared fixture.

# ADR-032 — Person-delete referential integrity lives in the database, and the cascade never reaches `documents`
**Date:** 2026-06-06
**Status:** Accepted
**Issue:** #684 (move person-delete FK detach to database-level `ON DELETE`)
**Milestone:**
---
## Context
Deleting a `Person` had to detach the two FKs into `persons` that lacked any `ON DELETE`
behaviour: `documents.sender_id` and `document_receivers.person_id` (both from V1).
`PersonService.deletePerson` and `mergePersons` did this in Java — nulling the sender and
deleting receiver join rows before `deleteById` — so the integrity guarantee lived in
application code. Any other delete path (a future endpoint, a manual `psql`, a batch job)
could still orphan rows or fail with an FK-violation 500.
A related soft reference made it worse: `transcription_block_mentioned_persons.person_id`
was a UUID column with **no FK** (V56, a deliberate "no FK" choice), so a person delete left
dangling `@`-mention rows. The literal `@DisplayName` lives in `transcription_blocks.text`,
so only the *link* was ever at stake — not the visible name.
## Decision
Move person-delete integrity into the database (migration V71) and thin the service to a
plain `deleteById`:
- `documents.sender_id``ON DELETE SET NULL` (`documents.senderText` preserves the raw
textual attribution, so nulling the link loses no historical record).
- `document_receivers.person_id``ON DELETE CASCADE` (the symmetric completion of V14,
which gave the `document_id` side the same).
- `transcription_block_mentioned_persons.person_id` → a real FK with `ON DELETE CASCADE`,
reversing V56's "no FK" decision. The read renderer already degrades a `@DisplayName` with
no sidecar row to plain escaped text, so removing the link is invisible to the reader.
**Cascade-boundary invariant:** the cascade stays strictly at the join/reference layer and
**never reaches `documents` rows** — a cascade into `documents` would destroy historical
letters. This is pinned by a non-negotiable document-survival assertion in
`PersonRepositoryTest`.
## Consequences
- A person delete is safe from every path, not just `PersonService`. The service and merge
stay thin (`deleteById` + the cascade); `reassignSenderToNull` and `deleteReceiverReferences`
are deleted.
- This *fixes* the pre-existing dead-link-on-deleted-person case — it is not a purely
invisible refactor. Note the read renderer strips the `@` prefix when it emits a live
mention link, but the degraded (deleted-person) path leaves the literal `@Name` in the
block text as-is — the reader sees `@Auguste Raddatz` as plain text, never a dead link.
- DB cascades run below `AuditService`, so the row-level cleanup is intentionally not
audit-logged; the person-delete action itself is still logged at the service layer.
- The V71 FK validation requires cleaning pre-existing orphan mention rows first; the
migration does this in a `DO` block that logs the purge count via `RAISE NOTICE`.
## Alternatives considered
- **Keep integrity in Java** — rejected; it only protects the one code path and re-breaks the
moment a second delete path appears.
- **Cascade `documents.sender_id`** — rejected; it would delete historical letters when a
sender is removed. `SET NULL` keeps the letter and its `senderText`.
- **Leave the mention sidecar FK-less (honour V56)** — rejected; the "no FK" rationale was
stale, the name survives in the block text regardless, and the FK removes the orphan-row
class of bug.

# ADR-033 — Tag-name resolution tolerates case-collisions: exact-case first, then a deterministic lowest-id fallback, and never a `unique(lower(name))` constraint
**Date:** 2026-06-06
**Status:** Accepted
**Issue:** #730 (document with a case-colliding tag cannot be saved — `findByNameIgnoreCase` `NonUniqueResultException`)
**Milestone:**
---
## Context
`TagService.findOrCreate(name)` is the single point that turns a tag **name** into a `Tag`
row. The document edit form, bulk-edit, and the upload batch all round-trip tag **names**
(the edit form sends `tags.map(t => t.name).join(',')`) and re-resolve them on **every**
save through `resolveTags → findOrCreate`. The old implementation resolved with
`tagRepository.findByNameIgnoreCase(name)`, a derived query returning `Optional<Tag>`.
That signature encodes an invariant the data does **not** hold: that a name is globally
unique case-insensitively. The canonical tag tree legitimately contains names that differ
only by case — a parent container and its same-named lowercase **child** (`Geburt` /
`Geburt/geburt`, `Weihnachten` / `Weihnachten/weihnachten`, …), or two siblings
(`Reise/Reisepläne` / `Reise/reisepläne`). Each is a distinct node with its own
`source_ref` (the stable identity, per ADR-025) and its own document attachments — **not** an
accidental duplicate. When two rows matched case-insensitively, Hibernate threw
`NonUniqueResultException``IncorrectResultSizeDataAccessException` → a generic HTTP 500.
The effect was severe and opaque: every document carrying one of ~10 colliding tags (≈180
document-tag attachments on staging) became **un-editable** — any field change failed on save
because the whole tag set is re-resolved — and the user saw only "an unexpected error", with
no hint that a tag was the cause.
This is a **lookup** problem, not a data problem: the collisions are valid canonical nodes
and must be preserved.
## Decision
### 1. Resolution is exact-case first, then a non-throwing deterministic fallback
`findOrCreate` resolves in three ordered steps and never throws on a collision:
1. `findByName(cleanName)`**exact-case** derived query. If present, return it. The edit
round-trip replays the stored name verbatim, so the exact-case row is the right binding
(typing the bare child name `weihnachten` binds to the child; `Weihnachten` binds to the
parent container).
2. else `findAllByNameIgnoreCase(cleanName)` — the **plural** case-insensitive list. If
non-empty, return the element with the **lowest `id`** (`min(comparing(Tag::getId))`).
3. else create the tag (an orphan: null `sourceRef`/`parentId`).
The two repository methods are deliberately **two distinctly-named methods** — Spring Data
cannot disambiguate an `Optional<Tag>` from a `List<Tag>` derived query by return type alone.
The throwing `Optional<Tag> findByNameIgnoreCase` is **deleted** so the non-unique-throwing
call cannot be reintroduced; `findOrCreate` was its only production caller.
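The three ordered steps can be sketched in plain Java. This is a minimal in-memory model, not the production `TagService`: the class name `TagResolutionSketch`, the list-backed "repository", and the auto-incrementing `insert` are hypothetical, and `equalsIgnoreCase` only approximates Postgres `LOWER()`, whose folding depends on the database.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical in-memory stand-in for TagService.findOrCreate and its two repository
 * methods. Spring Data, JPA and Postgres are replaced by a plain list.
 */
final class TagResolutionSketch {

    /** Minimal stand-in for the Tag entity; only id and name matter for resolution. */
    record Tag(long id, String name) {}

    private final List<Tag> rows = new ArrayList<>();
    private long nextId = 1;

    /** Simulates an INSERT with an auto-incremented id. */
    Tag insert(String name) {
        Tag tag = new Tag(nextId++, name);
        rows.add(tag);
        return tag;
    }

    /** Stand-in for the exact-case derived query findByName. */
    Optional<Tag> findByName(String name) {
        return rows.stream().filter(t -> t.name().equals(name)).findFirst();
    }

    /** Stand-in for findAllByNameIgnoreCase: a plural result, so a collision never throws. */
    List<Tag> findAllByNameIgnoreCase(String name) {
        // equalsIgnoreCase approximates Postgres LOWER(); real folding is database-dependent.
        return rows.stream().filter(t -> t.name().equalsIgnoreCase(name)).toList();
    }

    /** Three ordered steps; assumes the caller has already cleaned the name. */
    Tag findOrCreate(String cleanName) {
        // 1. Exact-case hit: the edit round-trip replays the stored name verbatim.
        Optional<Tag> exact = findByName(cleanName);
        if (exact.isPresent()) {
            return exact.get();
        }
        // 2. Case-insensitive candidates: the lowest id wins, deterministically.
        Optional<Tag> lowestId = findAllByNameIgnoreCase(cleanName).stream()
                .min(Comparator.comparingLong(Tag::id));
        if (lowestId.isPresent()) {
            return lowestId.get();
        }
        // 3. No match at all: create a new tag (sourceRef/parentId are not modelled here).
        return insert(cleanName);
    }
}
```

With a `Weihnachten` parent (id 1) and a `weihnachten` child (id 2), each exact spelling binds to its own row, while a free-text `WEIHNACHTEN` falls through to step 2 and resolves to the parent as the lowest id.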
### 2. The tie-break is `id`, and it is load-bearing
`id` is a stable, always-present, unique column, so "lowest id" is a total, deterministic
order over the candidates: the same name resolves to the same row on every call, forever,
without throwing. This matters only in the free-text authoring path (step 2), where no
exact-case row exists yet two case-folding rows do.
### 3. No `unique(lower(name))` constraint — and a load-bearing comment says so
A global case-insensitive uniqueness constraint is **wrong**: it would reject the legitimate
parent/child canonical nodes. It would also **fail to apply** against the existing rows,
turning a code-only deploy into a failed Flyway migration that blocks startup. A comment at
both `findOrCreate` and the repository methods records this so the constraint is not "helpfully"
added later.
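To see why the constraint would fail to apply, a hypothetical pre-flight check can group the existing names by their lower-cased form: every group with two or more members is a set of rows that a `unique(lower(name))` index cannot hold at once. The class is illustrative only, and `toLowerCase(Locale.ROOT)` is an assumption standing in for Postgres `LOWER()`.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical pre-flight check; not part of the codebase. */
final class CaseFoldCollisionCheck {

    /**
     * Groups tag names by their lower-cased form and keeps only groups with two or more
     * members: exactly the rows a unique(lower(name)) index could not accept together.
     */
    static Map<String, List<String>> collisions(List<String> names) {
        Map<String, List<String>> byFolded = names.stream()
                .collect(Collectors.groupingBy(
                        name -> name.toLowerCase(Locale.ROOT), // approximates Postgres LOWER()
                        TreeMap::new,
                        Collectors.toList()));
        // Singleton groups are harmless to the index; drop them.
        byFolded.values().removeIf(group -> group.size() < 2);
        return byFolded;
    }
}
```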
## Consequences
- **Code-only, zero migration, fully reversible** (roll back the JAR). No tag data is touched;
the colliding rows stay exactly as the canonical importer produced them.
- One change fixes all three write paths — single-document edit, bulk-edit, and upload batch —
because they all funnel through `resolveTags → findOrCreate`, which stays the single source
of truth (resolution logic is **not** hoisted into `DocumentService`).
- **Free-text tag semantics under collision are accepted as-is** (issue #730, option A): the
bare word `weihnachten` binds to the deep child and `Weihnachten` to the parent container.
Correct for the edit round-trip and acceptable for authoring; making the typeahead show the
tree path so an author can tell a container from its same-named child is a separate
follow-up.
- The wire response stays opaque: after the fix this path no longer throws
`IncorrectResultSizeDataAccessException`, and `GlobalExceptionHandler`'s generic handler maps
any stray one to `INTERNAL_ERROR` with no Hibernate/SQL leak — so no dedicated handler was
added.
- **The sibling Person path is fixed the same way — see the Person extension below (#731).**
- Postgres `LOWER()` folding of umlauts (`ü`/`ä`) is the actual correctness hinge of the
fallback and cannot be proven by a mocked repo, so it is pinned by a Testcontainers
`postgres:16-alpine` test on a `Glückwünsche`/`glückwünsche` pair; a plain-ASCII test would
stay green while the bug reappeared for umlaut tags.
## Person extension (#731)
The Person domain carried the same latent throw on **two** user-influenced lookup surfaces, and
is fixed with the same exact-case-first, non-throwing pattern — but with a deliberately
**different fallback per surface**, because the two paths have different consequences.
- **Alias path — `PersonService.findOrCreateByAlias` — deterministic lowest-id (mirrors tag).**
`findByAliasIgnoreCase` (`Optional`) is replaced by `findByAlias` (exact) → `findAllByAliasIgnoreCase`
(plural, lowest id) → the existing create-when-absent branch (INSTITUTION/GROUP and the
maiden-name alias are preserved verbatim). There is no human in the importer loop and the path
creates-on-absent anyway, so a deterministic guess is the right behaviour — exactly like tags.
- **Name/sender path — `PersonService.findByName` — bail to null on ambiguity (the new wrinkle).**
Used only by `DocumentService.storeDocument` to resolve the upload **sender** from the parsed
filename. `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` (`Optional`) is replaced by
`findByFirstNameAndLastName` (exact) → `findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase`
(plural). Resolution returns the exact-case match, else the single case-insensitive match, else
— on **two or more** matches — **empty**. The sender is left unset rather than guessing.
**Why this diverges from the alias (and tag) decision:** the archive's value is correct
provenance. A confidently-wrong pre-filled `Hans Müller` is worse than an empty field, because a
senior reviewer will not re-check a value that is already filled in, whereas an empty sender
routes the document into the "needs completion" state (`metadataComplete=false`) for a human to
assign. The load-bearing comment at `findByName` records this so a future "consistency cleanup"
does not reintroduce the confidently-wrong-sender bug by switching it to lowest-id.
- **Fail-closed on a null first name.** A parsed filename can lack a first name. The two new name
methods use explicit HQL equality (`= :firstName`) rather than a derived
`…IgnoreCase` query, because Spring Data folds a null derived-query argument to `first_name IS
NULL` — which would silently widen the match and pull a last-name-only / institution row in as a
"sender" (a quiet provenance-integrity defect). With HQL equality a null binds as `= NULL`,
which never matches, so a null first name resolves to **no sender**. This is pinned by a
real-Postgres repository test.
- **Scope — "ambiguous" is case-insensitive only.** Both exact-case lookups (`findByAlias`,
`findByFirstNameAndLastName`) return `Optional`, so two **byte-identical same-case** rows would
still throw `NonUniqueResultException`. That is a true data anomaly, deliberately out of scope
(it is not a case-collision), and it surfaces as the opaque `INTERNAL_ERROR` — never a silently
wrong row — so it is no worse than any other unexpected error and needs no extra handling here.
- **Same stance as tags otherwise:** no `unique(lower(alias))` / `unique(lower(name))` constraint
(collisions are valid human labels; `source_ref` is the stable identity per ADR-025), no
merge/dedupe, code-only and reversible, and no shared `resolveExactThenCi(...)` helper — the
two Person paths have different fallbacks, so the exact→CI→fallback logic is inlined at each
with its load-bearing comment (KISS).
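A minimal in-memory sketch of the sender rule, under stated assumptions: `SenderResolutionSketch` and its helpers are hypothetical, `sqlEquals` models SQL equality (a NULL never matches, as with HQL `= :firstName`), and `derivedQueryMatches` models the `IS NULL` widening described above purely for contrast. It covers the exact-case hit, the single case-insensitive hit, and the bail-to-empty on ambiguity or a null first name.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical in-memory model of PersonService.findByName's sender resolution. */
final class SenderResolutionSketch {

    record Person(long id, String firstName, String lastName) {}

    /** SQL-style "=": a NULL on either side never matches (models HQL "= :firstName"). */
    static boolean sqlEquals(String column, String param) {
        return column != null && param != null && column.equals(param);
    }

    /** Case-insensitive variant with the same NULL semantics. */
    static boolean sqlEqualsIgnoreCase(String column, String param) {
        return column != null && param != null && column.equalsIgnoreCase(param);
    }

    /**
     * Models a derived ...IgnoreCase query receiving null: the predicate becomes IS NULL.
     * Not used by findByName; kept only to show the widening the decision avoids.
     */
    static boolean derivedQueryMatches(String column, String param) {
        return param == null ? column == null : column != null && column.equalsIgnoreCase(param);
    }

    static Optional<Person> findByName(List<Person> persons, String firstName, String lastName) {
        // 1. Exact case. Byte-identical duplicates throw, mirroring the out-of-scope anomaly.
        List<Person> exact = persons.stream()
                .filter(p -> sqlEquals(p.firstName(), firstName) && sqlEquals(p.lastName(), lastName))
                .toList();
        if (exact.size() > 1) {
            throw new IllegalStateException("byte-identical duplicate person");
        }
        if (exact.size() == 1) {
            return Optional.of(exact.get(0));
        }
        // 2. Case-insensitive: accept only an unambiguous single match.
        List<Person> ci = persons.stream()
                .filter(p -> sqlEqualsIgnoreCase(p.firstName(), firstName)
                        && sqlEqualsIgnoreCase(p.lastName(), lastName))
                .toList();
        // 3. Zero or two-plus matches: leave the sender unset rather than guess.
        return ci.size() == 1 ? Optional.of(ci.get(0)) : Optional.empty();
    }
}
```

Two stored `Hans Müller` spellings make a parsed `HANS MÜLLER` ambiguous, so no sender is set and the document stays in the "needs completion" state; a filename without a first name never picks up a last-name-only row.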
## Alternatives considered
- **A `unique(lower(name))` index** — rejected: the collisions are valid canonical nodes, and
the migration would fail against the existing data and block startup.
- **Merging/deduping the colliding tags** — rejected: each has a distinct `source_ref`, tree
position, and real document attachments; they are not duplicates.
- **Round-tripping tag IDs instead of names** so resolution can't be ambiguous at all — the
cleaner long-term shape (removes the name-as-key smell), but a larger change with frontend
surface; deferred to #732. The lookup fix here is the minimal correct unblock.
- **Hoisting resolution into `DocumentService.resolveTags`** — rejected: it would duplicate the
rule across the edit, bulk-edit, and import paths and let them drift; `findOrCreate` stays
the one owner.


@@ -9,10 +9,12 @@ Person(member, "Family Member", "Access by administrator invite. Searches, brows
System(familienarchiv, "Familienarchiv", "Web application for digitising, organising, and searching family documents")
System_Ext(mail, "Email Service", "SMTP server. Delivers notification emails (mentions, replies) and password-reset links.")
System_Ext(glitchtip, "GlitchTip", "Self-hosted error tracking (Sentry-compatible). Receives frontend and backend error events with stack traces.")
System_Ext(ollama, "Ollama (self-hosted)", "Local LLM inference server (qwen2.5:7b). Parses natural-language search queries into structured filters. Runs in the same Docker Compose stack.")
Rel(admin, familienarchiv, "Manages via browser", "HTTPS")
Rel(member, familienarchiv, "Searches, reads, and transcribes via browser", "HTTPS")
Rel(familienarchiv, mail, "Sends notification and password-reset emails (optional)", "SMTP")
Rel(familienarchiv, glitchtip, "Sends error events with errorId and stack trace", "HTTPS")
Rel(familienarchiv, ollama, "NL query parsing for natural-language search", "HTTP / REST (internal)")
@enduml


@@ -12,13 +12,16 @@ System_Boundary(archiv, "Familienarchiv (Docker Compose)") {
Container(frontend, "Web Frontend", "SvelteKit / Node adapter / port 3000", "Server-side rendered UI. Handles auth session cookies, document search and viewer, transcription editor, annotation layer, family tree (Stammbaum), stories (Geschichten), activity feed (Chronik), enrichment workflow, and admin panel.")
Container(backend, "API Backend", "Spring Boot 4 / Java 21 / Jetty / port 8080", "REST API. Implements document management, search, user auth, file upload/download, transcription, OCR orchestration, and SSE notifications. Trusts X-Forwarded-* headers from Caddy.")
Container(ocr, "OCR Service", "Python FastAPI / port 8000", "Handwritten text recognition (HTR) and OCR microservice. Single-node by design — see ADR-001. Reachable only on the internal Docker network; no external port exposed.")
Container(ollama, "Ollama LLM Service", "ollama/ollama:0.30.6 / port 11434 (internal only)", "Local LLM inference server for NL search. Runs qwen2.5:7b-instruct-q4_K_M on CPU. Reachable only on the internal Docker network; no external port exposed. Disabled when APP_OLLAMA_BASE_URL is unset or blank.")
' Named volume: ollama_models — model weights, fully reproducible, no backup needed
ContainerDb(db, "Relational Database", "PostgreSQL 16", "Stores document metadata, persons, users, permission groups, tags, transcription blocks, audit log, and Spring Session data.")
ContainerDb(storage, "Object Storage", "MinIO (S3-compatible)", "Stores the actual document files (PDFs, scans). Backend uses a bucket-scoped service account (archiv-app), not MinIO root.")
Container(mc, "Bucket / Service-Account Init", "MinIO Client (mc)", "One-shot container on startup. Idempotent: creates the archive bucket, the archiv-app service account, and attaches the readwrite policy.")
Container(ollama, "Ollama", "Ollama / port 11434", "Local LLM inference server. Hosts qwen2.5:7b-instruct-q4_K_M for natural-language query parsing (NL Search). CPU-only; GPU not required.")
}
System_Boundary(observability, "Observability Stack (/opt/familienarchiv/docker-compose.observability.yml)") {
Container(prometheus, "Prometheus", "prom/prometheus:v3.4.0", "Scrapes metrics from backend management port 8081 (/actuator/prometheus), node-exporter, and cAdvisor. Retention: 30 days.")
Container(prometheus, "Prometheus", "prom/prometheus:v3.4.0", "Scrapes metrics from backend (8081 /actuator/prometheus), OCR service (8000 /metrics), Ollama (11434 /metrics), node-exporter, and cAdvisor. Retention: 30 days.")
Container(node_exporter, "Node Exporter", "prom/node-exporter:v1.9.0", "Host-level CPU, memory, disk, and network metrics.")
Container(cadvisor, "cAdvisor", "gcr.io/cadvisor/cadvisor:v0.52.1", "Per-container resource metrics.")
Container(loki, "Loki", "grafana/loki:3.4.2", "Stores log streams from all containers.")
@@ -41,10 +44,13 @@ Rel(backend, ocr, "OCR job requests with presigned MinIO URL", "HTTP / REST / JS
Rel(backend, mail, "Sends notification and password-reset emails (optional)", "SMTP")
Rel(ocr, storage, "Fetches PDF via presigned URL", "HTTP / S3 presigned")
Rel(mc, storage, "Bootstraps bucket + service account on startup", "MinIO Client CLI")
Rel(backend, ollama, "NL query parsing (POST /api/generate)", "HTTP / REST / JSON")
Rel(promtail, loki, "Pushes log streams", "HTTP/Loki push API")
Rel(backend, tempo, "Sends distributed traces via OTLP", "HTTP / OTLP / port 4318 (archiv-net)")
Rel(prometheus, backend, "Scrapes JVM + HTTP metrics", "HTTP 8081 /actuator/prometheus")
Rel(prometheus, ocr, "Scrapes OCR + http_* metrics", "HTTP 8000 /metrics")
Rel(backend, ollama, "NL search inference requests", "HTTP / REST / JSON")
Rel(prometheus, ollama, "Scrapes LLM request metrics", "HTTP 11434 /metrics")
Rel(grafana, prometheus, "Queries metrics", "HTTP 9090")
Rel(grafana, loki, "Queries logs", "HTTP 3100")
Rel(grafana, tempo, "Queries traces", "HTTP 3200")


@@ -9,15 +9,17 @@ ContainerDb(minio, "Object Storage", "MinIO (S3-compatible)")
System_Boundary(backend, "API Backend (Spring Boot)") {
Component(docCtrl, "DocumentController", "Spring MVC — /api/documents", "CRUD for documents: search, get by ID, update metadata, upload/download file, batch metadata updates, and per-month density aggregation for the timeline filter widget.")
Component(adminCtrl, "AdminController", "Spring MVC — /api/admin", "Triggers the asynchronous canonical import (requires ADMIN permission). Reports import state (IDLE/RUNNING/DONE/FAILED).")
Component(docSvc, "DocumentService", "Spring Service", "Core document business logic: store, update, search. Resolves persons and tags, delegates file I/O to FileService, builds dynamic JPA Specifications, and integrates with audit logging.")
Component(adminCtrl, "AdminController", "Spring MVC — /api/admin", "Triggers the asynchronous canonical import (requires ADMIN permission). Reports import state (IDLE/RUNNING/DONE/FAILED). Hosts the one-shot maintenance backfills (versions, file-hashes, titles) — synchronous, ADMIN-only.")
Component(docSvc, "DocumentService", "Spring Service", "Core document business logic: store, update, search. On update, regenerates an unchanged auto-title from the new date/location (exact old-vs-new match, #726); exposes backfillTitles() to clean already-stale titles in one sweep. Resolves persons and tags, delegates file I/O to FileService, builds dynamic JPA Specifications, and integrates with audit logging.")
Component(fileSvc, "FileService", "Spring Service", "Wraps AWS SDK v2 S3Client. Uploads files with UUID-keyed paths, computes SHA-256 hash, downloads with content-type detection, and generates presigned URLs for OCR access.")
Component(importOrch, "CanonicalImportOrchestrator", "Spring Service — @Async", "Runs the four canonical loaders in an explicit dependency DAG (TagTree → PersonRegister → PersonTree → Document). Smoke-checks all four artifacts before starting, owns the IDLE/RUNNING/DONE/FAILED state machine, fails closed on a malformed artifact.")
Component(tagTreeLoader, "TagTreeImporter", "Spring Component", "Upserts the tag hierarchy from canonical-tag-tree.xlsx via TagService (by canonical tag_path).")
Component(personRegLoader, "PersonRegisterImporter", "Spring Component", "Upserts register persons from canonical-persons.xlsx via PersonService (by normalizer person_id).")
Component(personTreeLoader, "PersonTreeImporter", "Spring Component", "Upserts tree persons + relationships from canonical-persons-tree.json via PersonService and RelationshipService.")
Component(docLoader, "DocumentImporter", "Spring Component", "Loads canonical-documents.xlsx: routes attribution register-first (raw cell always retained in sender_text/receiver_text), parses clean dates, builds an honest precision-aware title via DocumentTitleFormatter, keeps the S3 upload + thumbnail plumbing, and resolves each PDF by index (importDir/<index>.pdf) guarded by strict index validation + canonical-path containment + %PDF magic-byte check (no recursive walk).")
Component(titleFmt, "DocumentTitleFormatter", "Pure helper", "Formats the date label baked into an import title at exactly the data's precision (MONTH -> 'Juni 1916', never a fabricated day). Mirrors the frontend formatDocumentDate; both are pinned to docs/date-label-fixtures.json (#666).")
Component(docLoader, "DocumentImporter", "Spring Component", "Loads canonical-documents.xlsx: routes attribution register-first (raw cell always retained in sender_text/receiver_text), parses clean dates, builds the title via DocumentTitleFactory, keeps the S3 upload + thumbnail plumbing, and resolves each PDF by index (importDir/<index>.pdf) guarded by strict index validation + canonical-path containment + %PDF magic-byte check (no recursive walk).")
Component(titleFactory, "DocumentTitleFactory", "Spring Component", "Single source of truth for the auto-title {index} {dateLabel} {location} (#726). The document package owns this formula; importer, save-time regeneration, and the backfill all build through it so they never diverge.")
Component(titleFmt, "DocumentTitleFormatter", "Pure helper (document pkg)", "Formats the date label at exactly the data's precision (MONTH -> 'Juni 1916', never a fabricated day). Mirrors the frontend formatDocumentDate; both are pinned to docs/date-label-fixtures.json (#666).")
Component(titleMatcher, "DocumentTitleBackfillMatcher", "Pure helper", "Backfill-only heuristic deciding whether a STORED title is machine-generated (overwritable) vs hand-written prose. Index matched literally (no regex injection / ReDoS); fail-closed.")
Component(sheetReader, "CanonicalSheetReader", "POI helper", "Maps a canonical .xlsx by header name (no positional indices), splits pipe-delimited list columns, fails closed (IMPORT_ARTIFACT_INVALID) on a missing required header.")
Component(minioConf, "MinioConfig", "Spring @Configuration", "Creates the S3Client and S3Presigner beans with path-style access for MinIO. Validates MinIO connectivity on startup.")
Component(docRepo, "DocumentRepository", "Spring Data JPA", "Queries documents with Specification-based dynamic search, full-text search with ranking and match highlighting, and transcription pipeline queue projections.")
@@ -44,7 +46,11 @@ Rel(importOrch, docLoader, "4. Loads documents")
Rel(tagTreeLoader, sheetReader, "Reads canonical .xlsx")
Rel(personRegLoader, sheetReader, "Reads canonical .xlsx")
Rel(docLoader, sheetReader, "Reads canonical .xlsx")
Rel(docLoader, titleFmt, "Builds honest title date")
Rel(docLoader, titleFactory, "Builds the auto-title")
Rel(docSvc, titleFactory, "Regenerates auto-title (save-time + backfill)")
Rel(docSvc, titleMatcher, "Backfill overwrite test")
Rel(titleFactory, titleFmt, "Formats the honest date label")
Rel(adminCtrl, docSvc, "backfillTitles() / backfillFileHashes()")
Rel(tagTreeLoader, tagSvc, "Upserts tags by source_ref")
Rel(personRegLoader, personSvc, "Upserts persons by source_ref")
Rel(personTreeLoader, personSvc, "Upserts persons by source_ref")


@@ -0,0 +1,33 @@
@startuml
!include <C4/C4_Component>
title Component Diagram: API Backend — NL Search
Container(frontend, "Web Frontend", "SvelteKit")
ContainerDb(db, "PostgreSQL", "PostgreSQL 16")
Container(ollama, "Ollama", "ollama/ollama — port 11434 (internal only)")
System_Boundary(backend, "API Backend (Spring Boot)") {
Component(nlCtrl, "NlSearchController", "Spring MVC — POST /api/search/nl", "REST entry point for natural language search. Enforces READ_ALL permission. Uses @AuthenticationPrincipal UserDetails to obtain the caller's email for rate limiting. Delegates to NlQueryParserService and returns NlSearchResponse.")
Component(rateLimiter, "NlSearchRateLimiter", "Spring Service", "Bucket4j + Caffeine LoadingCache keyed on user email. Allows 5 NL search requests per minute per user. Throws DomainException(SMART_SEARCH_RATE_LIMITED / HTTP 429) when the bucket is exhausted. Node-local — same caveat as LoginRateLimiter.")
Component(parserSvc, "NlQueryParserService", "Spring Service", "Orchestrates the full NL search pipeline: (1) validates query length, (2) calls OllamaClient.parse() to extract structured intent, (3) resolves each person name via PersonService.findByDisplayNameContaining(), (4) applies multi-name / personRole heuristics, (5) delegates to DocumentService.searchDocuments() or searchDocumentsByPersonId(). Returns NlSearchResponse. Never logs raw query content (PII).")
Component(ollamaClient, "RestClientOllamaClient", "Spring Service — implements OllamaClient + OllamaHealthClient", "HTTP client for the Ollama API. Uses two separate RestClient instances: inference client (30 s read timeout) and health-check client (2 s connect timeout). Calls POST /api/generate with grammar-constrained JSON schema (personNames, personRole, dateFrom, dateTo, keywords). isHealthy() polls GET /api/tags. Null-coalesces absent personNames/keywords to List.of(). Defaults unknown personRole to 'any' with a warning log. Maps timeout/5xx/parse errors to DomainException(SMART_SEARCH_UNAVAILABLE / HTTP 503).")
Component(ollamaProps, "OllamaProperties", "@ConfigurationProperties(\"app.ollama\")", "Config bean: baseUrl, model (qwen2.5:7b-instruct-q4_K_M), timeoutSeconds (default: 30), healthCheckTimeoutSeconds (default: 2).")
Component(rateLimitProps, "NlSearchRateLimitProperties", "@ConfigurationProperties(\"app.nl-search.rate-limit\")", "Config bean: maxRequestsPerMinute (default: 5).")
}
Component(personSvc, "PersonService", "Spring Service", "See diagram 3e. findByDisplayNameContaining(fragment) delegates to PersonRepository.searchByName() — covers first+last name, alias, and name aliases via LEFT JOIN.")
Component(documentSvc, "DocumentService", "Spring Service", "See diagram 3b. searchDocuments() for keyword/sender/receiver/date queries. searchDocumentsByPersonId() for OR-semantics single-person queries (person as sender OR receiver, no keyword filter).")
Rel(frontend, nlCtrl, "POST /api/search/nl with JSON query", "HTTP / JSON")
Rel(nlCtrl, rateLimiter, "checkAndConsume(userEmail)")
Rel(nlCtrl, parserSvc, "parse(query)")
Rel(parserSvc, ollamaClient, "parse(rawQuery) — extracts intent", "HTTP / JSON")
Rel(ollamaClient, ollama, "POST /api/generate (grammar-constrained JSON schema)", "HTTP / REST")
Rel(ollamaClient, ollama, "GET /api/tags (health check)", "HTTP / REST")
Rel(parserSvc, personSvc, "findByDisplayNameContaining(name) for each extracted name")
Rel(parserSvc, documentSvc, "searchDocuments() or searchDocumentsByPersonId()")
Rel(documentSvc, db, "JPA queries", "JDBC")
Rel(personSvc, db, "JPA queries", "JDBC")
@enduml


@@ -260,7 +260,7 @@ package "Transcription" {
entity transcription_block_mentioned_persons {
block_id : UUID <<FK>>
person_id : UUID NOT NULL
person_id : UUID NOT NULL <<FK>>
--
display_name : VARCHAR(200) NOT NULL
}
@@ -386,9 +386,9 @@ invite_token_group_ids }o--|| invite_tokens : invite_token_id
invite_token_group_ids }o--|| user_groups : group_id
' Document relationships
documents }o--o| persons : sender_id
documents }o--o| persons : sender_id (ON DELETE SET NULL)
document_receivers }o--|| documents : document_id
document_receivers }o--|| persons : person_id
document_receivers }o--|| persons : person_id (ON DELETE CASCADE)
document_tags }o--|| documents : document_id
document_tags }o--|| tag : tag_id
document_versions }o--|| documents : document_id
@@ -420,6 +420,7 @@ transcription_blocks }o--o| app_users : updated_by
transcription_block_versions }o--|| transcription_blocks : block_id
transcription_block_versions }o--o| app_users : changed_by
transcription_block_mentioned_persons }o--|| transcription_blocks : block_id
transcription_block_mentioned_persons }o--|| persons : person_id (ON DELETE CASCADE)
' OCR relationships
ocr_job_documents }o--|| ocr_jobs : job_id


@@ -79,9 +79,9 @@ invite_token_group_ids }o--|| invite_tokens : invite_token_id
invite_token_group_ids }o--|| user_groups : group_id
' Document relationships
documents }o--o| persons : sender_id
documents }o--o| persons : sender_id (ON DELETE SET NULL)
document_receivers }o--|| documents : document_id
document_receivers }o--|| persons : person_id
document_receivers }o--|| persons : person_id (ON DELETE CASCADE)
document_tags }o--|| documents : document_id
document_tags }o--|| tag : tag_id
document_versions }o--|| documents : document_id
@@ -113,6 +113,7 @@ transcription_blocks }o--o| app_users : updated_by
transcription_block_versions }o--|| transcription_blocks : block_id
transcription_block_versions }o--o| app_users : changed_by
transcription_block_mentioned_persons }o--|| transcription_blocks : block_id
transcription_block_mentioned_persons }o--|| persons : person_id (ON DELETE CASCADE)
' OCR relationships
ocr_job_documents }o--|| ocr_jobs : job_id


@@ -20,24 +20,19 @@ The observability stack (Prometheus, Loki, Grafana, Tempo, GlitchTip) ships as a
---
## VPS Sizing Recommendations
## Server Sizing
### Recommended: Hetzner CX32
### Current Production Server: Hetzner Dedicated (Serverbörse)
**Specs**: 4 vCPU, 8 GB RAM, 80 GB SSD · **Cost**: 17 EUR/mo
**Specs**: Intel Core i7-6700 (4C/8T, 3.4 GHz), 64 GB RAM · acquired via Hetzner server auction
Sufficient for the application stack (Postgres, MinIO, OCR with `mem_limit: 12g`, backend, frontend, Caddy) on a CX32 today. Once the observability stack lands (Prometheus/Loki/Grafana/Alertmanager add ~2 GB) consider a CX42.
Comfortably handles the full application stack (Postgres, MinIO, OCR with `mem_limit: 12g`, backend, frontend, Caddy, full observability stack) with headroom to spare. The 64 GB RAM means OCR, Ollama inference, and the observability stack can all run concurrently without memory pressure.
### When to Upgrade: Hetzner CX42
### When to Reconsider Hardware
**Specs**: 8 vCPU, 16 GB RAM · **Cost**: 29 EUR/mo
Upgrade when:
- Observability stack adds memory pressure (Loki + Grafana with >30 days retention)
- OCR throughput needs scaling beyond a single-node Surya/Kraken setup
- Real user load profiled in Grafana shows response-time degradation
Never upgrade the VPS tier before profiling — most perceived performance issues are application bugs, not resource constraints.
- CPU is Skylake (2015) — single-threaded performance is the likely bottleneck before RAM
- Profile with Grafana dashboards before concluding hardware is the constraint
- Most perceived performance issues are application bugs (unindexed queries, N+1 loads), not resource limits
---
@@ -45,12 +40,11 @@ Never upgrade the VPS tier before profiling — most perceived performance issue
| Service | Cost |
|---|---|
| Hetzner CX32 VPS | 17.00 EUR |
| Hetzner dedicated server (Serverbörse, i7-6700, 64 GB RAM) | see invoice |
| Hetzner DNS | 0.00 EUR |
| Hetzner SMTP relay | ~1.00 EUR |
| **Total** | **~18 EUR/mo** |
MinIO data lives on the VPS disk (no Object Storage line item yet). The Hetzner OBS migration would add ~5 EUR/mo at ~200 GB.
MinIO data lives on the server disk (no Object Storage line item yet). The Hetzner OBS migration would add ~5 EUR/mo at ~200 GB.
Equivalent SaaS stack: 200–300 EUR/mo.


@@ -0,0 +1,36 @@
import { expect, test } from '@playwright/test';
/**
* Auto-title sync, full-stack happy path (#726). A document whose stored title equals its
* machine-generated auto-title must follow a date correction forward on save; a hand-edit would
* be kept. The exhaustive permutations live in the backend unit/integration suites — this is the
* single end-to-end pass, and it also asserts the FR-005 helper line is present on the edit form.
*/
test.describe('Document auto-title sync (#726)', () => {
test('editing the date rebuilds the auto-title, and the edit form explains it', async ({
page
}) => {
// 1. Create a document with no date/location, so its stored title == its auto-title
// (originalFilename only). createDocument derives originalFilename from the title.
await page.goto('/documents/new');
await page.waitForSelector('[data-hydrated]');
await page.getByLabel('Titel').fill('E2E Auto-Titel Sync');
await page.getByRole('button', { name: 'Speichern', exact: true }).click();
await expect(page).toHaveURL(/\/documents\/[^/]+$/);
const detailUrl = page.url();
// 2. The edit form carries the FR-005 helper explaining the auto-generated title.
await page.goto(`${detailUrl}/edit`);
await page.waitForSelector('[data-hydrated]');
await expect(page.locator('#title-help')).toBeVisible();
// 3. Add a YEAR-precision date WITHOUT touching the title, then save.
await page.locator('#documentDate').fill('15.01.1928');
await page.locator('#metaDatePrecision').selectOption('YEAR');
await page.getByRole('button', { name: 'Speichern', exact: true }).click();
// 4. The detail page shows the regenerated title carrying the new year.
await expect(page).toHaveURL(/\/documents\/[^/]+$/);
await expect(page.getByRole('heading', { name: /E2E Auto-Titel Sync.*1928/ })).toBeVisible();
});
});


@@ -0,0 +1,163 @@
import { test, expect, type Page } from '@playwright/test';
import path from 'path';
import { fileURLToPath } from 'url';
import fs from 'fs';
import { AxeBuilder } from '@axe-core/playwright';
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const PDF_FIXTURE = path.resolve(__dirname, 'fixtures/minimal.pdf');
/**
* E2E tests for the transcribe keyboard shortcuts + cheatsheet overlay — #327.
*
* Strategy mirrors annotations.spec: seed a document with two transcription
* blocks via API in beforeAll (no OCR, no manual drawing), then drive the
* keyboard. j/k navigation is exercised in read mode so no editable can trap
* focus — the active region's resize overlay renders regardless of read/edit.
*/
const RESIZE_AREA_LABEL = 'Annotationsgröße und -position ändern';
let docHref: string;
let docId: string;
let annotAId: string;
let annotBId: string;
test.describe('Transcribe keyboard shortcuts', () => {
test.beforeAll(async ({ request }) => {
const baseURL = process.env.E2E_BASE_URL ?? 'http://localhost:3000';
const createRes = await request.post('/api/documents', {
multipart: { title: 'E2E Shortcuts Test', documentDate: '1945-05-08' }
});
if (!createRes.ok()) throw new Error(`Create document failed: ${createRes.status()}`);
const doc = await createRes.json();
docId = doc.id;
docHref = `${baseURL}/documents/${docId}`;
const uploadRes = await request.put(`/api/documents/${docId}`, {
multipart: {
title: doc.title,
documentDate: '1945-05-08',
file: {
name: 'minimal.pdf',
mimeType: 'application/pdf',
buffer: fs.readFileSync(PDF_FIXTURE)
}
}
});
if (!uploadRes.ok()) throw new Error(`Upload PDF failed: ${uploadRes.status()}`);
const blockARes = await request.post(`/api/documents/${docId}/transcription-blocks`, {
data: {
pageNumber: 1,
x: 0.1,
y: 0.1,
width: 0.3,
height: 0.1,
text: 'Erste Zeile.',
label: 'Anrede'
}
});
if (!blockARes.ok()) throw new Error(`Create block A failed: ${blockARes.status()}`);
annotAId = (await blockARes.json()).annotationId;
const blockBRes = await request.post(`/api/documents/${docId}/transcription-blocks`, {
data: {
pageNumber: 1,
x: 0.1,
y: 0.35,
width: 0.3,
height: 0.1,
text: 'Zweite Zeile.',
label: null
}
});
if (!blockBRes.ok()) throw new Error(`Create block B failed: ${blockBRes.status()}`);
annotBId = (await blockBRes.json()).annotationId;
});
async function openTranscribe(page: Page) {
await page.goto(docHref);
await page.waitForSelector('[data-hydrated]');
await page.getByRole('button', { name: 'Transkribieren' }).click();
await page.locator('.tabular-nums').waitFor({ timeout: 15_000 });
await page.locator(`[data-testid="annotation-${annotAId}"]`).waitFor({ timeout: 10_000 });
}
function activeRegionOverlay(page: Page, annotationId: string) {
return page.locator(`[data-testid="annotation-${annotationId}"]`).getByLabel(RESIZE_AREA_LABEL);
}
test('? opens the cheatsheet; Esc closes it, then a second Esc closes the panel', async ({
page
}) => {
test.setTimeout(30_000);
await openTranscribe(page);
await page.keyboard.press('?');
const dialog = page.getByRole('dialog');
await expect(dialog).toBeVisible();
await expect(dialog.getByRole('heading', { name: 'Tastaturkürzel' })).toBeVisible();
await page.keyboard.press('Escape');
await expect(dialog).not.toBeVisible();
// Panel still open after closing only the cheatsheet (Esc ladder rung 1).
await expect(page.getByRole('button', { name: 'Fertig' })).toBeVisible();
await page.keyboard.press('Escape');
await expect(page.getByRole('button', { name: 'Transkribieren' })).toBeVisible();
});
test('e toggles between read and edit mode', async ({ page }) => {
test.setTimeout(30_000);
await openTranscribe(page);
// The "mark for training" section renders only in the edit view.
const editMarker = page.getByText('Für Training vormerken');
// Default for a writer with existing blocks is read mode.
await expect(editMarker).toHaveCount(0);
await page.keyboard.press('e');
await expect(editMarker).toBeVisible();
await page.keyboard.press('e');
await expect(editMarker).toHaveCount(0);
});
test('j and k walk forward and back through the regions', async ({ page }) => {
test.setTimeout(30_000);
await openTranscribe(page);
await page.keyboard.press('j');
await expect(activeRegionOverlay(page, annotAId)).toBeVisible();
await page.keyboard.press('j');
await expect(activeRegionOverlay(page, annotBId)).toBeVisible();
await expect(activeRegionOverlay(page, annotAId)).toHaveCount(0);
await page.keyboard.press('k');
await expect(activeRegionOverlay(page, annotAId)).toBeVisible();
});
test('the open cheatsheet has no critical accessibility violations', async ({ page }) => {
test.setTimeout(30_000);
await openTranscribe(page);
await page.keyboard.press('?');
await expect(page.getByRole('dialog')).toBeVisible();
const results = await new AxeBuilder({ page })
.include('dialog')
.withTags(['wcag2a', 'wcag2aa'])
.analyze();
const critical = results.violations.filter((v) => v.impact === 'critical');
expect(critical).toEqual([]);
// The dialog is flagged as modal for assistive tech; its accessible name comes from the labelled heading.
const dialog = page.getByRole('dialog');
await expect(dialog).toHaveAttribute('aria-modal', 'true');
});
});
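The first test exercises the "Esc ladder": with the cheatsheet open, Esc closes only the cheatsheet, and a second Esc closes the panel. A minimal sketch of that precedence as a pure function. The names `EscapeContext` and `resolveEscape` are hypothetical and are not taken from the actual `transcribeShortcuts` action.

```typescript
// Hypothetical snapshot of the UI state read at keydown time.
type EscapeContext = {
  cheatsheetOpen: boolean;
  panelOpen: boolean;
};

type EscapeOutcome = 'close-cheatsheet' | 'close-panel' | 'ignore';

// Each Esc press peels off exactly one layer, innermost first.
function resolveEscape(ctx: EscapeContext): EscapeOutcome {
  if (ctx.cheatsheetOpen) return 'close-cheatsheet'; // rung 1: the modal dialog
  if (ctx.panelOpen) return 'close-panel'; // rung 2: the transcribe panel
  return 'ignore'; // nothing left to close
}

// Walk the ladder the way the e2e test does.
const state: EscapeContext = { cheatsheetOpen: true, panelOpen: true };
const first = resolveEscape(state);
state.cheatsheetOpen = false;
const second = resolveEscape(state);
state.panelOpen = false;
const third = resolveEscape(state);
console.log(first, second, third); // close-cheatsheet close-panel ignore
```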


@@ -22,6 +22,9 @@
"error_forbidden": "Sie haben keine Berechtigung für diese Aktion.",
"error_csrf_token_missing": "Sitzungsfehler. Bitte laden Sie die Seite neu.",
"error_too_many_login_attempts": "Zu viele Anmeldeversuche. Bitte versuchen Sie es später erneut.",
"error_smart_search_unavailable": "Die intelligente Suche ist momentan nicht verfügbar. Bitte nutzen Sie die normale Suche.",
"error_smart_search_rate_limited": "Sie haben die Suchfunktion zu häufig genutzt. Bitte warten Sie eine Minute.",
"smart_search_keywords_not_applied": "Schlüsselwörter konnten bei dieser Suche nicht berücksichtigt werden.",
"error_validation_error": "Die Eingabe ist ungültig.",
"error_internal_error": "Ein unerwarteter Fehler ist aufgetreten.",
"nav_documents": "Dokumente",
@@ -56,6 +59,7 @@
"form_label_sender": "Absender",
"form_label_receivers": "Empfänger",
"form_label_title": "Titel",
"form_helper_title_autogenerated": "Wird automatisch aus Datum und Ort gebildet. Sobald Sie den Titel ändern, bleibt Ihre Version erhalten.",
"form_label_tags": "Schlagworte",
"form_label_content": "Inhalt",
"form_placeholder_content": "Kurze Beschreibung des Inhalts…",
@@ -927,6 +931,23 @@
"transcribe_coach_step_3_title": "Speichert automatisch.",
"transcribe_coach_footer_kurrent": "Hilfe zu Kurrent ↗",
"transcribe_coach_footer_richtlinien": "Transkriptions-Richtlinien ↗",
"transcribe_coach_shortcut_hint_before": "Tipp: Drücken Sie",
"transcribe_coach_shortcut_hint_after": "für eine Übersicht aller Tastaturkürzel.",
"shortcut_next_region": "Nächster Bereich",
"shortcut_prev_region": "Vorheriger Bereich",
"shortcut_toggle_mode": "Lese-/Bearbeitungsmodus wechseln",
"shortcut_new_region": "Neuen Bereich zeichnen",
"shortcut_toggle_training": "Für Training markieren",
"shortcut_delete_region": "Aktuellen Bereich löschen",
"shortcut_close_panel": "Panel schließen",
"shortcut_help": "Tastaturkürzel anzeigen",
"shortcut_draw_hint": "Ziehen Sie mit der Maus einen Bereich auf.",
"key_cap_delete": "Entf",
"cheatsheet_title": "Tastaturkürzel",
"cheatsheet_close": "Kürzelübersicht schließen",
"cheatsheet_autosave_hint": "Änderungen werden automatisch gespeichert.",
"annotation_view_label": "Block anzeigen",
"annotation_label_with_delete": "Block anzeigen, Entf zum Löschen.",
"transcription_mode_help_label": "Lese- und Bearbeitungsmodus",
"transcription_mode_help_body": "Lesen zeigt die Transkription als fließenden Text. Bearbeiten öffnet die Textfelder für jede Passage.",
"richtlinien_title": "Transkriptions-Richtlinien",


@@ -22,6 +22,9 @@
"error_forbidden": "You do not have permission for this action.",
"error_csrf_token_missing": "Session error. Please reload the page.",
"error_too_many_login_attempts": "Too many login attempts. Please try again later.",
"error_smart_search_unavailable": "The smart search is currently unavailable. Please use the regular search.",
"error_smart_search_rate_limited": "You have used the search function too frequently. Please wait a minute.",
"smart_search_keywords_not_applied": "Keywords could not be applied to this search.",
"error_validation_error": "The input is invalid.",
"error_internal_error": "An unexpected error occurred.",
"nav_documents": "Documents",
@@ -56,6 +59,7 @@
"form_label_sender": "Sender",
"form_label_receivers": "Recipients",
"form_label_title": "Title",
"form_helper_title_autogenerated": "Generated automatically from the date and place — as soon as you edit the title, your version is kept.",
"form_label_tags": "Tags",
"form_label_content": "Content",
"form_placeholder_content": "Brief description of the content…",
@@ -927,6 +931,23 @@
"transcribe_coach_step_3_title": "Saves automatically.",
"transcribe_coach_footer_kurrent": "Kurrent help ↗",
"transcribe_coach_footer_richtlinien": "Transcription guidelines ↗",
"transcribe_coach_shortcut_hint_before": "Tip: press",
"transcribe_coach_shortcut_hint_after": "for an overview of all keyboard shortcuts.",
"shortcut_next_region": "Next region",
"shortcut_prev_region": "Previous region",
"shortcut_toggle_mode": "Toggle read/edit mode",
"shortcut_new_region": "Draw a new region",
"shortcut_toggle_training": "Mark for training",
"shortcut_delete_region": "Delete current region",
"shortcut_close_panel": "Close panel",
"shortcut_help": "Show keyboard shortcuts",
"shortcut_draw_hint": "Drag out a region with the mouse.",
"key_cap_delete": "Del",
"cheatsheet_title": "Keyboard shortcuts",
"cheatsheet_close": "Close shortcut overview",
"cheatsheet_autosave_hint": "Changes are saved automatically.",
"annotation_view_label": "View block",
"annotation_label_with_delete": "View block, press Delete to remove.",
"transcription_mode_help_label": "Read and edit mode",
"transcription_mode_help_body": "Read shows the transcription as flowing text. Edit opens the text fields for each passage.",
"richtlinien_title": "Transcription Guidelines",


@@ -22,6 +22,9 @@
"error_forbidden": "No tiene permiso para realizar esta acción.",
"error_csrf_token_missing": "Error de sesión. Recargue la página.",
"error_too_many_login_attempts": "Demasiados intentos. Por favor, inténtelo más tarde.",
"error_smart_search_unavailable": "La búsqueda inteligente no está disponible en este momento. Por favor, use la búsqueda normal.",
"error_smart_search_rate_limited": "Ha utilizado la función de búsqueda demasiadas veces. Por favor, espere un minuto.",
"smart_search_keywords_not_applied": "Las palabras clave no pudieron aplicarse a esta búsqueda.",
"error_validation_error": "La entrada no es válida.",
"error_internal_error": "Se ha producido un error inesperado.",
"nav_documents": "Documentos",
@@ -56,6 +59,7 @@
"form_label_sender": "Remitente",
"form_label_receivers": "Destinatarios",
"form_label_title": "Título",
"form_helper_title_autogenerated": "Se genera automáticamente a partir de la fecha y el lugar; en cuanto edite el título, se conservará su versión.",
"form_label_tags": "Etiquetas",
"form_label_content": "Contenido",
"form_placeholder_content": "Breve descripción del contenido…",
@@ -927,6 +931,23 @@
"transcribe_coach_step_3_title": "Se guarda automáticamente.",
"transcribe_coach_footer_kurrent": "Ayuda sobre Kurrent ↗",
"transcribe_coach_footer_richtlinien": "Normas de transcripción ↗",
"transcribe_coach_shortcut_hint_before": "Consejo: pulse",
"transcribe_coach_shortcut_hint_after": "para ver todos los atajos de teclado.",
"shortcut_next_region": "Región siguiente",
"shortcut_prev_region": "Región anterior",
"shortcut_toggle_mode": "Cambiar modo lectura/edición",
"shortcut_new_region": "Dibujar una nueva región",
"shortcut_toggle_training": "Marcar para entrenamiento",
"shortcut_delete_region": "Eliminar la región actual",
"shortcut_close_panel": "Cerrar panel",
"shortcut_help": "Mostrar atajos de teclado",
"shortcut_draw_hint": "Arrastre una región con el ratón.",
"key_cap_delete": "Supr",
"cheatsheet_title": "Atajos de teclado",
"cheatsheet_close": "Cerrar el resumen de atajos",
"cheatsheet_autosave_hint": "Los cambios se guardan automáticamente.",
"annotation_view_label": "Ver bloque",
"annotation_label_with_delete": "Ver bloque, pulse Supr para eliminar.",
"transcription_mode_help_label": "Modo lectura y edición",
"transcription_mode_help_body": "Lectura muestra la transcripción como texto continuo. Edición abre los campos de texto para cada pasaje.",
"richtlinien_title": "Normas de transcripción",
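All three locale files gain the same set of keys in this diff. A minimal parity check that flags keys defined in a base locale but missing from another, sketched under the assumption that the message files are flat JSON objects of string values. The `missingKeys` helper is hypothetical and not part of Paraglide; the excerpts below are illustrative, with one key deliberately dropped from the Spanish side.

```typescript
type Messages = Record<string, string>;

// Keys defined in `base` but absent from `other`, sorted for stable output.
function missingKeys(base: Messages, other: Messages): string[] {
  return Object.keys(base)
    .filter((key) => !Object.prototype.hasOwnProperty.call(other, key))
    .sort();
}

// Tiny excerpts; a real check would load the full de/en/es JSON files.
const de: Messages = {
  shortcut_help: 'Tastaturkürzel anzeigen',
  key_cap_delete: 'Entf',
  cheatsheet_title: 'Tastaturkürzel'
};
const es: Messages = {
  shortcut_help: 'Mostrar atajos de teclado',
  key_cap_delete: 'Supr'
};

console.log(missingKeys(de, es)); // [ 'cheatsheet_title' ]
```

Running the check in both directions also catches keys that exist only in a translation.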


@@ -17,6 +17,7 @@ let {
titleRequired = false,
suggestedTitle = '',
hideTitle = false,
showTitleHelp = false,
editMode = false
}: {
tags?: Tag[];
@@ -31,6 +32,7 @@ let {
titleRequired?: boolean;
suggestedTitle?: string;
hideTitle?: boolean;
showTitleHelp?: boolean;
editMode?: boolean;
} = $props();
@@ -72,8 +74,14 @@ const titleValue = $derived(titleDirty ? currentTitle : suggestedTitle || curren
titleDirty = true;
}}
required={titleRequired}
aria-describedby={showTitleHelp ? 'title-help' : undefined}
class="block w-full rounded border border-line p-2 text-sm shadow-sm focus:outline-none focus-visible:ring-2 focus-visible:ring-focus-ring"
/>
{#if showTitleHelp}
<p id="title-help" class="mt-1 text-sm text-ink-3">
{m.form_helper_title_autogenerated()}
</p>
{/if}
</div>
{/if}


@@ -1,6 +1,7 @@
import { afterEach, describe, expect, it } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import DescriptionSection from './DescriptionSection.svelte';
import { m } from '$lib/paraglide/messages.js';
afterEach(() => cleanup());
@@ -55,3 +56,28 @@ describe('DescriptionSection — onMount seeding (Felix B1/B2 fix regression fen
expect(input.value).toBe('Parent Value');
});
});
describe('DescriptionSection — auto-generated title helper (FR-TITLE-005)', () => {
it('shows the helper with the localized text and wires aria-describedby when showTitleHelp is set', async () => {
render(DescriptionSection, { showTitleHelp: true });
const help = document.querySelector('#title-help') as HTMLElement;
expect(help).not.toBeNull();
expect(help.textContent?.trim()).toBe(m.form_helper_title_autogenerated());
// ≥14px for the 60+ audience (FR-TITLE-005 prefers a larger size than the 12px field hints).
expect(help.classList.contains('text-sm')).toBe(true);
const titleInput = document.querySelector('input#title') as HTMLInputElement;
expect(titleInput.getAttribute('aria-describedby')).toBe('title-help');
});
it('omits the helper by default (e.g. the new-document form)', async () => {
render(DescriptionSection, {});
expect(document.querySelector('#title-help')).toBeNull();
const titleInput = document.querySelector('input#title') as HTMLInputElement;
expect(titleInput.getAttribute('aria-describedby')).toBeNull();
});
it('omits the helper when the title field is hidden (bulk edit)', async () => {
render(DescriptionSection, { showTitleHelp: true, hideTitle: true });
expect(document.querySelector('#title-help')).toBeNull();
});
});


@@ -221,6 +221,7 @@ async function handleReplaceFile(e: Event) {
initialArchiveFolder={doc.archiveFolder ?? ''}
initialSummary={doc.summary ?? ''}
titleRequired={true}
showTitleHelp={true}
/>
</form>


@@ -25,7 +25,7 @@ type Props = {
flashAnnotationId?: string | null;
onAnnotationClick: (id: string) => void;
onTranscriptionDraw?: (rect: DrawRect) => void;
onDeleteAnnotationRequest?: (annotationId: string) => void;
onAnnotationFocus?: (id: string) => void;
};
let {
@@ -42,7 +42,7 @@ let {
flashAnnotationId = null,
onAnnotationClick,
onTranscriptionDraw,
onDeleteAnnotationRequest
onAnnotationFocus
}: Props = $props();
</script>
@@ -104,7 +104,7 @@ let {
flashAnnotationId={flashAnnotationId}
onAnnotationClick={onAnnotationClick}
onTranscriptionDraw={onTranscriptionDraw}
onDeleteAnnotationRequest={onDeleteAnnotationRequest}
onAnnotationFocus={onAnnotationFocus}
documentFileHash={doc.fileHash ?? null}
/>
{:else if fileUrl}


@@ -19,7 +19,7 @@ let {
flashAnnotationId = null,
onDraw,
onAnnotationClick,
onDeleteRequest
onAnnotationFocus
}: {
annotations: Annotation[];
canDraw: boolean;
@@ -30,7 +30,7 @@ let {
flashAnnotationId?: string | null;
onDraw: (rect: DrawRect) => void;
onAnnotationClick?: (id: string) => void;
onDeleteRequest?: (annotationId: string) => void;
onAnnotationFocus?: (id: string) => void;
} = $props();
let drawStart = $state<{ x: number; y: number } | null>(null);
@@ -115,8 +115,8 @@ const containerStyle = $derived(
blockNumber={blockNumbers[annotation.id]}
isFlashing={flashAnnotationId === annotation.id}
showDelete={canDraw}
onDeleteRequest={() => onDeleteRequest?.(annotation.id)}
onclick={() => onAnnotationClick?.(annotation.id)}
onfocus={() => onAnnotationFocus?.(annotation.id)}
onpointerenter={() => (hoveredId = annotation.id)}
onpointerleave={() => (hoveredId = null)}
/>


@@ -1,5 +1,6 @@
<script lang="ts">
import type { Annotation } from '$lib/shared/types';
import { m } from '$lib/paraglide/messages.js';
import AnnotationEditOverlay from './AnnotationEditOverlay.svelte';
let {
@@ -12,8 +13,8 @@ let {
isFlashing = false,
isResizable = false,
showDelete = false,
onDeleteRequest,
onclick,
onfocus,
onpointerenter,
onpointerleave
}: {
@@ -26,12 +27,19 @@ let {
isFlashing?: boolean;
isResizable?: boolean;
showDelete?: boolean;
onDeleteRequest?: () => void;
onclick: () => void;
onfocus?: () => void;
onpointerenter: () => void;
onpointerleave: () => void;
} = $props();
// When deletion is available (transcribe mode), announce the otherwise-hidden
// Delete affordance to assistive tech (issue #327). The transcribeShortcuts
// action is the single owner of the key itself.
const ariaLabel = $derived(
showDelete ? m.annotation_label_with_delete() : m.annotation_view_label()
);
function hexToRgba(hex: string, alpha: number): string {
const r = parseInt(hex.slice(1, 3), 16);
const g = parseInt(hex.slice(3, 5), 16);
@@ -83,11 +91,12 @@ let shapeStyle = $derived(
class:annotation-flash={isFlashing}
role="button"
tabindex="0"
aria-label="Block anzeigen"
aria-label={ariaLabel}
aria-keyshortcuts={showDelete ? 'Delete' : undefined}
onclick={onclick}
onfocus={onfocus}
onkeydown={(e) => {
if (e.key === 'Enter' || e.key === ' ') onclick();
if (e.key === 'Delete' && showDelete) onDeleteRequest?.();
}}
onpointerenter={onpointerenter}
onpointerleave={onpointerleave}
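The shape derives both ARIA attributes from `showDelete`, while the Delete key itself is handled only by the `transcribeShortcuts` action. Extracted as a pure function, the rule looks like this. `shapeA11y` is a hypothetical helper, and the localized strings are passed in rather than read from `m.annotation_label_with_delete()` / `m.annotation_view_label()`.

```typescript
type ShapeLabels = { view: string; withDelete: string };

type ShapeA11y = {
  ariaLabel: string;
  ariaKeyshortcuts: string | undefined;
};

function shapeA11y(showDelete: boolean, labels: ShapeLabels): ShapeA11y {
  return {
    // Announce the Delete affordance only when deletion is actually available.
    ariaLabel: showDelete ? labels.withDelete : labels.view,
    // Svelte omits an attribute whose value is undefined, so no key hint is rendered.
    ariaKeyshortcuts: showDelete ? 'Delete' : undefined
  };
}

const deLabels: ShapeLabels = {
  view: 'Block anzeigen',
  withDelete: 'Block anzeigen, Entf zum Löschen.'
};
console.log(shapeA11y(true, deLabels)); // label with the Entf hint, aria-keyshortcuts "Delete"
console.log(shapeA11y(false, deLabels)); // plain label, no key hint
```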


@@ -2,9 +2,32 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
import AnnotationShape from './AnnotationShape.svelte';
import {
transcribeShortcuts,
type TranscribeShortcutOptions
} from '$lib/shared/actions/transcribeShortcuts';
afterEach(cleanup);
function noopShortcutOptions(
overrides: Partial<TranscribeShortcutOptions> = {}
): TranscribeShortcutOptions {
return {
isPanelOpen: () => true,
isCheatsheetOpen: () => false,
panelMode: () => 'edit',
goToNextRegion: () => {},
goToPrevRegion: () => {},
toggleMode: () => {},
closePanel: () => {},
startDrawMode: () => {},
toggleTrainingMark: () => {},
deleteCurrentRegion: () => {},
openCheatsheet: () => {},
...overrides
};
}
function makeAnnotation(id = 'ann-1') {
return {
id,
@@ -43,7 +66,6 @@ describe('AnnotationShape', () => {
isHovered: true,
isActive: true,
showDelete: true,
onDeleteRequest: vi.fn(),
onclick: () => {},
onpointerenter: () => {},
onpointerleave: () => {}
@@ -57,16 +79,17 @@ describe('AnnotationShape', () => {
expect(annotationEl.querySelectorAll('button').length).toBe(0);
});
it('calls onDeleteRequest when Delete key is pressed on the annotation', async () => {
const onDeleteRequest = vi.fn();
// Deletion is owned solely by the transcribeShortcuts action (issue #327,
// decision: action is the single Delete owner). The shape must NOT handle
// the Delete key itself, or the key would delete twice.
it('does not act on the Delete key itself (the action owns deletion)', async () => {
const onclick = vi.fn();
render(AnnotationShape, {
annotation: makeAnnotation(),
isHovered: false,
isActive: true,
showDelete: true,
onDeleteRequest,
onclick: () => {},
onclick,
onpointerenter: () => {},
onpointerleave: () => {}
});
@@ -74,26 +97,83 @@ describe('AnnotationShape', () => {
const annotationEl = page.getByTestId('annotation-ann-1').element() as HTMLElement;
annotationEl.dispatchEvent(new KeyboardEvent('keydown', { key: 'Delete', bubbles: true }));
expect(onDeleteRequest).toHaveBeenCalledOnce();
// No side effect from the shape; it stays in the document for the action to act on.
expect(onclick).not.toHaveBeenCalled();
await expect.element(page.getByTestId('annotation-ann-1')).toBeInTheDocument();
});
it('does not call onDeleteRequest on Delete key when showDelete is false', async () => {
const onDeleteRequest = vi.fn();
it('announces the Delete affordance via aria when deletion is available', async () => {
render(AnnotationShape, {
annotation: makeAnnotation(),
isHovered: false,
isActive: true,
showDelete: false,
onDeleteRequest,
showDelete: true,
onclick: () => {},
onpointerenter: () => {},
onpointerleave: () => {}
});
const annotationEl = page.getByTestId('annotation-ann-1').element() as HTMLElement;
expect(annotationEl.getAttribute('aria-keyshortcuts')).toBe('Delete');
expect(annotationEl.getAttribute('aria-label')).toContain('Entf');
});
it('keeps the plain label and no key hint when deletion is unavailable', async () => {
render(AnnotationShape, {
annotation: makeAnnotation(),
isHovered: false,
isActive: false,
showDelete: false,
onclick: () => {},
onpointerenter: () => {},
onpointerleave: () => {}
});
const annotationEl = page.getByTestId('annotation-ann-1').element() as HTMLElement;
expect(annotationEl.getAttribute('aria-keyshortcuts')).toBe(null);
expect(annotationEl.getAttribute('aria-label')).toBe('Block anzeigen');
});
it('calls onfocus when the annotation receives focus', async () => {
const onfocus = vi.fn();
render(AnnotationShape, {
annotation: makeAnnotation(),
isHovered: false,
isActive: false,
onfocus,
onclick: () => {},
onpointerenter: () => {},
onpointerleave: () => {}
});
const annotationEl = page.getByTestId('annotation-ann-1').element() as HTMLElement;
annotationEl.dispatchEvent(new FocusEvent('focus'));
expect(onfocus).toHaveBeenCalledOnce();
});
// Integration: a real rendered shape + the live transcribeShortcuts action.
// Pressing Delete on the focused region must delete exactly once — proving the
// action is the single owner and the shape contributes no competing handler.
it('with the transcribeShortcuts action active, Delete deletes the focused region exactly once', () => {
const deleteCurrentRegion = vi.fn();
render(AnnotationShape, {
annotation: makeAnnotation(),
isHovered: false,
isActive: true,
showDelete: true,
onclick: () => {},
onpointerenter: () => {},
onpointerleave: () => {}
});
const annotationEl = page.getByTestId('annotation-ann-1').element() as HTMLElement;
const action = transcribeShortcuts(annotationEl, noopShortcutOptions({ deleteCurrentRegion }));
annotationEl.focus();
annotationEl.dispatchEvent(new KeyboardEvent('keydown', { key: 'Delete', bubbles: true }));
expect(onDeleteRequest).not.toHaveBeenCalled();
expect(deleteCurrentRegion).toHaveBeenCalledTimes(1);
action.destroy();
});
});
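The integration test pins down the contract: with the action attached, one Delete press calls `deleteCurrentRegion` exactly once. A hypothetical sketch of how a single dispatcher could map keys onto the shortcut callbacks. The option names are copied from `noopShortcutOptions`; the guards and the key mapping are assumptions for illustration, not the action's actual code.

```typescript
type ShortcutOptions = {
  isPanelOpen: () => boolean;
  isCheatsheetOpen: () => boolean;
  panelMode: () => 'read' | 'edit';
  goToNextRegion: () => void;
  goToPrevRegion: () => void;
  toggleMode: () => void;
  closePanel: () => void;
  startDrawMode: () => void;
  toggleTrainingMark: () => void;
  deleteCurrentRegion: () => void;
  openCheatsheet: () => void;
};

// Hypothetical single owner: each key maps to at most one callback.
// Returns true when the key was consumed.
function dispatchShortcut(key: string, opts: ShortcutOptions): boolean {
  if (!opts.isPanelOpen()) return false;
  if (opts.isCheatsheetOpen()) return false; // assumption: the modal <dialog> owns its keys
  if (key === 'n') {
    if (opts.panelMode() !== 'edit') return false; // drawing is edit-only (see drawCue)
    opts.startDrawMode();
    return true;
  }
  const actions = new Map<string, () => void>([
    ['j', opts.goToNextRegion],
    ['k', opts.goToPrevRegion],
    ['e', opts.toggleMode],
    ['t', opts.toggleTrainingMark],
    ['Delete', opts.deleteCurrentRegion],
    ['Escape', opts.closePanel],
    ['?', opts.openCheatsheet]
  ]);
  const action = actions.get(key);
  if (!action) return false;
  action();
  return true;
}

const noop = () => {};
let deletes = 0;
const opts: ShortcutOptions = {
  isPanelOpen: () => true,
  isCheatsheetOpen: () => false,
  panelMode: () => 'read',
  goToNextRegion: noop,
  goToPrevRegion: noop,
  toggleMode: noop,
  closePanel: noop,
  startDrawMode: noop,
  toggleTrainingMark: noop,
  deleteCurrentRegion: () => {
    deletes += 1;
  },
  openCheatsheet: noop
};
console.log(dispatchShortcut('Delete', opts), deletes); // true 1
console.log(dispatchShortcut('n', opts)); // false (read mode)
```

Using a `Map` rather than an object literal keeps inherited property names such as `toString` from ever matching a key.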


@@ -0,0 +1,104 @@
<script lang="ts">
import { m } from '$lib/paraglide/messages.js';
let { open = false, onClose }: { open?: boolean; onClose: () => void } = $props();
let dialogEl = $state<HTMLDialogElement>();
let closeButton = $state<HTMLButtonElement>();
// Grouped navigation / editing / utility — whitespace dividers, no headers.
const groups = [
[
{ cap: 'j', label: m.shortcut_next_region() },
{ cap: 'k', label: m.shortcut_prev_region() }
],
[
{ cap: 'e', label: m.shortcut_toggle_mode() },
{ cap: 'n', label: m.shortcut_new_region() },
{ cap: 't', label: m.shortcut_toggle_training() },
{ cap: m.key_cap_delete(), label: m.shortcut_delete_region() }
],
[
{ cap: 'Esc', label: m.shortcut_close_panel() },
{ cap: '?', label: m.shortcut_help() }
]
];
$effect(() => {
const el = dialogEl;
if (!el) return;
if (open && !el.open) {
el.showModal();
closeButton?.focus();
} else if (!open && el.open) {
el.close();
}
});
function handleBackdropClick(event: MouseEvent) {
if (event.target === dialogEl) onClose();
}
</script>
<dialog
bind:this={dialogEl}
aria-modal="true"
aria-labelledby="cheatsheet-title"
class="w-[calc(100%-2rem)] max-w-md rounded-sm border border-line bg-surface p-6 shadow-lg backdrop:bg-black/40"
onclose={onClose}
onclick={handleBackdropClick}
>
<div class="mb-5 flex items-center justify-between">
<h2 id="cheatsheet-title" class="font-serif text-lg font-bold text-ink">
{m.cheatsheet_title()}
</h2>
<button
bind:this={closeButton}
type="button"
onclick={onClose}
aria-label={m.cheatsheet_close()}
class="flex h-11 w-11 items-center justify-center rounded-sm text-ink-2 hover:bg-muted focus-visible:ring-2 focus-visible:ring-brand-mint focus-visible:outline-none"
>
<svg class="h-5 w-5" viewBox="0 0 24 24" fill="none" stroke="currentColor" aria-hidden="true">
<path stroke-linecap="round" stroke-width="2" d="M6 6l12 12M18 6L6 18" />
</svg>
</button>
</div>
<div class="divide-y divide-line">
{#each groups as group, i (i)}
<div class="flex flex-col gap-2 py-3 first:pt-0 last:pb-0">
{#each group as shortcut (shortcut.cap)}
<div class="flex items-center justify-between gap-4">
<kbd
class="rounded border border-line bg-muted px-2 py-0.5 font-mono text-sm text-ink shadow-sm"
>{shortcut.cap}</kbd
>
<span class="flex-1 text-right font-serif text-sm text-ink">{shortcut.label}</span>
</div>
{/each}
</div>
{/each}
</div>
<p class="mt-5 border-t border-line pt-3.5 font-sans text-xs text-ink-3">
{m.cheatsheet_autosave_hint()}
</p>
</dialog>
<style>
@media (prefers-reduced-motion: no-preference) {
dialog[open] {
animation: fadeIn 150ms ease;
}
}
@keyframes fadeIn {
from {
opacity: 0;
}
to {
opacity: 1;
}
}
</style>
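The `$effect` above calls `showModal()` or `close()` only when the `open` prop and the element's own state disagree, and `handleBackdropClick` treats any click whose target is the `<dialog>` as a backdrop click. Both rules can be extracted into pure functions (hypothetical names `syncDialog` and `isBackdropClick`). The bounding-box check is an assumption beyond the original: because the component puts `p-6` on the `<dialog>` itself, a click in that padding also has the dialog as its target.

```typescript
type DialogCommand = 'showModal' | 'close' | 'none';

// Reconcile the `open` prop with the native <dialog>'s `open` state.
function syncDialog(openProp: boolean, elementOpen: boolean): DialogCommand {
  if (openProp && !elementOpen) return 'showModal'; // the component then focuses the close button
  if (!openProp && elementOpen) return 'close';
  return 'none'; // already in sync
}

type Box = { left: number; top: number; right: number; bottom: number };

// Stricter backdrop test: the click must target the <dialog> itself AND land
// outside its border box. Clicks on content or the close button target a child.
function isBackdropClick(targetIsDialog: boolean, x: number, y: number, box: Box): boolean {
  if (!targetIsDialog) return false;
  return x < box.left || x > box.right || y < box.top || y > box.bottom;
}

const box: Box = { left: 100, top: 100, right: 500, bottom: 400 };
console.log(syncDialog(true, false)); // showModal
console.log(isBackdropClick(true, 50, 50, box)); // true  (backdrop)
console.log(isBackdropClick(true, 110, 110, box)); // false (dialog padding)
```

In the component this could be called as `isBackdropClick(event.target === dialogEl, event.clientX, event.clientY, dialogEl.getBoundingClientRect())`, since a `DOMRect` has the same `left`/`top`/`right`/`bottom` fields.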


@@ -0,0 +1,65 @@
import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
import ShortcutCheatsheet from './ShortcutCheatsheet.svelte';
afterEach(cleanup);
describe('ShortcutCheatsheet', () => {
it('is not in the accessibility tree when closed', async () => {
render(ShortcutCheatsheet, { open: false, onClose: vi.fn() });
await expect.element(page.getByRole('dialog')).not.toBeInTheDocument();
});
it('opens as a modal dialog with a labelled heading when open', async () => {
render(ShortcutCheatsheet, { open: true, onClose: vi.fn() });
await expect.element(page.getByRole('dialog')).toBeInTheDocument();
await expect.element(page.getByRole('heading')).toBeInTheDocument();
});
it('lists all eight shortcut rows', async () => {
render(ShortcutCheatsheet, { open: true, onClose: vi.fn() });
const dialog = page.getByRole('dialog').element() as HTMLElement;
const keyCaps = dialog.querySelectorAll('kbd');
expect(keyCaps.length).toBe(8);
});
it('shows the autosave footer line', async () => {
render(ShortcutCheatsheet, { open: true, onClose: vi.fn() });
const dialog = page.getByRole('dialog').element() as HTMLElement;
expect(dialog.textContent).toContain('automatisch');
});
it('calls onClose when Escape is pressed', async () => {
const onClose = vi.fn();
render(ShortcutCheatsheet, { open: true, onClose });
const dialog = page.getByRole('dialog').element() as HTMLDialogElement;
dialog.dispatchEvent(new KeyboardEvent('keydown', { key: 'Escape', bubbles: true }));
// A synthetic keydown does not trigger the native Esc behaviour ('cancel' + 'close'), so dispatch 'close' directly to simulate it and assert onClose fires
dialog.dispatchEvent(new Event('close'));
expect(onClose).toHaveBeenCalled();
});
it('calls onClose when the backdrop is clicked', async () => {
const onClose = vi.fn();
render(ShortcutCheatsheet, { open: true, onClose });
const dialog = page.getByRole('dialog').element() as HTMLDialogElement;
// a click whose target is the dialog element itself is a backdrop click
dialog.dispatchEvent(new MouseEvent('click', { bubbles: true }));
expect(onClose).toHaveBeenCalled();
});
it('does not close on "?" while open (open-only, not a toggle)', async () => {
const onClose = vi.fn();
render(ShortcutCheatsheet, { open: true, onClose });
const dialog = page.getByRole('dialog').element() as HTMLDialogElement;
dialog.dispatchEvent(new KeyboardEvent('keydown', { key: '?', bubbles: true }));
expect(onClose).not.toHaveBeenCalled();
});
it('focuses the close button on open', async () => {
render(ShortcutCheatsheet, { open: true, onClose: vi.fn() });
const closeButton = page.getByRole('button', { name: /schließen/i }).element();
expect(document.activeElement).toBe(closeButton);
});
});


@@ -0,0 +1,14 @@
import { describe, it, expect } from 'vitest';
import { canArmDraw, shouldDisarmDraw } from './drawCue';
describe('draw cue policy', () => {
it('arms only in edit mode', () => {
expect(canArmDraw('edit')).toBe(true);
expect(canArmDraw('read')).toBe(false);
});
it('disarms in every mode except edit', () => {
expect(shouldDisarmDraw('read')).toBe(true);
expect(shouldDisarmDraw('edit')).toBe(false);
});
});


@@ -0,0 +1,20 @@
/**
* Policy for the "draw a new region" keyboard cue (the `n` shortcut) in the
* transcribe panel — issue #327.
*
* The cue is only valid while editing: `n` arms it in edit mode, and it must
* clear when a region is drawn or when the panel leaves edit mode. Pure so both
* rules are testable without mounting the page.
*/
type PanelMode = 'read' | 'edit';
/** The draw cue may only be armed while in edit mode. */
export function canArmDraw(panelMode: PanelMode): boolean {
return panelMode === 'edit';
}
/** Leaving edit mode must disarm the draw cue. */
export function shouldDisarmDraw(panelMode: PanelMode): boolean {
return !canArmDraw(panelMode);
}
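The module keeps the policy pure; the page still has to hold the "armed" flag and apply the policy at three moments: when `n` is pressed, when a region is drawn, and when the panel mode changes. A minimal sketch of that wiring, where the `DrawCue` class is hypothetical (the real page presumably keeps this state in Svelte runes) and the two policy functions are copied so the block runs standalone.

```typescript
type PanelMode = 'read' | 'edit';

function canArmDraw(panelMode: PanelMode): boolean {
  return panelMode === 'edit';
}

function shouldDisarmDraw(panelMode: PanelMode): boolean {
  return !canArmDraw(panelMode);
}

// Hypothetical holder for the armed flag.
class DrawCue {
  armed = false;

  // `n` pressed: arm only while editing; otherwise a silent no-op.
  pressN(mode: PanelMode): void {
    if (canArmDraw(mode)) this.armed = true;
  }

  // A region was drawn: the cue has served its purpose.
  regionDrawn(): void {
    this.armed = false;
  }

  // The panel switched modes: leaving edit mode disarms.
  modeChanged(mode: PanelMode): void {
    if (shouldDisarmDraw(mode)) this.armed = false;
  }
}

const cue = new DrawCue();
cue.pressN('read');
console.log(cue.armed); // false
cue.pressN('edit');
console.log(cue.armed); // true
cue.modeChanged('read');
console.log(cue.armed); // false
```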


@@ -0,0 +1,48 @@
import { describe, it, expect } from 'vitest';
import { stepRegion } from './regionNavigation';
describe('stepRegion', () => {
const ids = ['a', 'b', 'c'];
it('returns null for an empty list', () => {
expect(stepRegion([], null, 1)).toBe(null);
expect(stepRegion([], 'a', -1)).toBe(null);
});
it('steps forward from the middle', () => {
expect(stepRegion(ids, 'a', 1)).toBe('b');
expect(stepRegion(ids, 'b', 1)).toBe('c');
});
it('steps backward from the middle', () => {
expect(stepRegion(ids, 'c', -1)).toBe('b');
expect(stepRegion(ids, 'b', -1)).toBe('a');
});
it('wraps forward past the last region to the first', () => {
expect(stepRegion(ids, 'c', 1)).toBe('a');
});
it('wraps backward past the first region to the last', () => {
expect(stepRegion(ids, 'a', -1)).toBe('c');
});
it('lands on the first region when entering fresh (no active) going forward', () => {
expect(stepRegion(ids, null, 1)).toBe('a');
});
it('lands on the last region when entering fresh (no active) going backward', () => {
expect(stepRegion(ids, null, -1)).toBe('c');
});
it('treats an unknown active id as a fresh entry', () => {
expect(stepRegion(ids, 'zzz', 1)).toBe('a');
expect(stepRegion(ids, 'zzz', -1)).toBe('c');
});
it('returns the single region for both directions (wrap of length 1)', () => {
expect(stepRegion(['only'], 'only', 1)).toBe('only');
expect(stepRegion(['only'], 'only', -1)).toBe('only');
expect(stepRegion(['only'], null, 1)).toBe('only');
});
});


@@ -0,0 +1,33 @@
/**
* Region navigation for the transcribe keyboard shortcuts (j/k) — issue #327.
*
* Pure and side-effect free so the wrap-around / fresh-entry branches are
* unit-testable without mounting the page.
*/
/**
* Pick the annotation id one step from the active region, wrapping around the
* ends. Entering fresh (no active region, or an unknown id) lands on the first
* region going forward and the last going backward.
*
* @param orderedAnnotationIds region annotation ids in display order
* @param activeId the currently active region, or null
* @param delta +1 for next (j), -1 for previous (k)
* @returns the next annotation id, or null when there are no regions
*/
export function stepRegion(
orderedAnnotationIds: string[],
activeId: string | null,
delta: 1 | -1
): string | null {
const count = orderedAnnotationIds.length;
if (count === 0) return null;
const current = activeId === null ? -1 : orderedAnnotationIds.indexOf(activeId);
if (current === -1) {
return delta > 0 ? orderedAnnotationIds[0] : orderedAnnotationIds[count - 1];
}
const next = (current + delta + count) % count;
return orderedAnnotationIds[next];
}
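The `+ count` in `(current + delta + count) % count` matters because JavaScript's `%` takes the sign of the dividend: stepping back from index 0 without it yields `-1`, which is not a valid array index. A small comparison (both helper names are illustrative):

```typescript
// Naive wrap: breaks when stepping backward from index 0.
function naiveWrap(index: number, delta: 1 | -1, count: number): number {
  return (index + delta) % count;
}

// The wrap stepRegion uses: adding `count` keeps the dividend non-negative
// for delta = -1, so the result always lies in [0, count).
function safeWrap(index: number, delta: 1 | -1, count: number): number {
  return (index + delta + count) % count;
}

console.log(naiveWrap(0, -1, 3)); // -1 (invalid index)
console.log(safeWrap(0, -1, 3)); // 2 (the last region)
console.log(safeWrap(2, 1, 3)); // 0 (wraps to the first region)
```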


@@ -0,0 +1,30 @@
import { describe, it, expect } from 'vitest';
import { resolveTrainingMark, RECOGNITION_TRAINING_LABEL } from './trainingMark';
describe('resolveTrainingMark', () => {
it('is a no-op (null) when no region is active', () => {
expect(resolveTrainingMark(null, [])).toBe(null);
expect(resolveTrainingMark(null, [RECOGNITION_TRAINING_LABEL])).toBe(null);
});
it('enrols recognition training when a region is active and not yet enrolled', () => {
expect(resolveTrainingMark('ann-1', [])).toEqual({
label: RECOGNITION_TRAINING_LABEL,
enrolled: true
});
});
it('un-enrols when recognition training is already enrolled', () => {
expect(resolveTrainingMark('ann-1', [RECOGNITION_TRAINING_LABEL])).toEqual({
label: RECOGNITION_TRAINING_LABEL,
enrolled: false
});
});
it('ignores unrelated document training labels', () => {
expect(resolveTrainingMark('ann-1', ['KURRENT_SEGMENTATION'])).toEqual({
label: RECOGNITION_TRAINING_LABEL,
enrolled: true
});
});
});


@@ -0,0 +1,31 @@
/**
* "Mark for training" (the `t` shortcut) decision logic — issue #327.
*
* Training enrollment is document-level — two fixed script-type chips
* (KURRENT_RECOGNITION / KURRENT_SEGMENTATION); there is no per-region training
* flag yet (that arrives with #321). `t` toggles the primary recognition
* enrollment and is a silent no-op unless a region is active, so it reads as an
* action on the region the transcriber is working on.
*
* Pure so the no-op-when-no-region guard and the enrolled flip are testable
* without mounting the page.
*/
export const RECOGNITION_TRAINING_LABEL = 'KURRENT_RECOGNITION';
export type TrainingMarkToggle = { label: string; enrolled: boolean };
/**
* Decide the recognition-training toggle for the active region, or null when no
* region is active (the `t` shortcut is then a silent no-op).
*
* @param activeAnnotationId the currently active region, or null
* @param currentLabels the document's currently enrolled training labels
*/
export function resolveTrainingMark(
activeAnnotationId: string | null,
currentLabels: readonly string[]
): TrainingMarkToggle | null {
if (!activeAnnotationId) return null;
const enrolled = !currentLabels.includes(RECOGNITION_TRAINING_LABEL);
return { label: RECOGNITION_TRAINING_LABEL, enrolled };
}
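`resolveTrainingMark` only decides the toggle; something still has to apply it to the document's label list. A minimal sketch of that step, assuming labels are kept as a plain string array. `applyTrainingMark` is hypothetical (how the page persists the change is not shown in this diff), and the decision function is copied so the block runs standalone.

```typescript
const RECOGNITION_TRAINING_LABEL = 'KURRENT_RECOGNITION';
type TrainingMarkToggle = { label: string; enrolled: boolean };

// Same decision rule as resolveTrainingMark above.
function resolveTrainingMark(
  activeAnnotationId: string | null,
  currentLabels: readonly string[]
): TrainingMarkToggle | null {
  if (!activeAnnotationId) return null;
  const enrolled = !currentLabels.includes(RECOGNITION_TRAINING_LABEL);
  return { label: RECOGNITION_TRAINING_LABEL, enrolled };
}

// Hypothetical: apply a toggle without mutating the input; null leaves it unchanged.
function applyTrainingMark(
  currentLabels: readonly string[],
  toggle: TrainingMarkToggle | null
): string[] {
  if (toggle === null) return [...currentLabels];
  const without = currentLabels.filter((label) => label !== toggle.label);
  return toggle.enrolled ? [...without, toggle.label] : without;
}

let labels: string[] = ['KURRENT_SEGMENTATION'];
labels = applyTrainingMark(labels, resolveTrainingMark('ann-1', labels));
console.log(labels); // [ 'KURRENT_SEGMENTATION', 'KURRENT_RECOGNITION' ]
labels = applyTrainingMark(labels, resolveTrainingMark('ann-1', labels));
console.log(labels); // [ 'KURRENT_SEGMENTATION' ]
labels = applyTrainingMark(labels, resolveTrainingMark(null, labels));
console.log(labels); // [ 'KURRENT_SEGMENTATION' ]
```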


@@ -20,7 +20,7 @@ let {
activeAnnotationId = $bindable<string | null>(null),
onAnnotationClick,
onTranscriptionDraw,
onDeleteAnnotationRequest,
onAnnotationFocus,
documentFileHash,
annotationsDimmed = false,
flashAnnotationId = null,
@@ -35,7 +35,7 @@ let {
activeAnnotationId?: string | null;
onAnnotationClick?: (id: string) => void;
onTranscriptionDraw?: (rect: DrawRect) => void;
onDeleteAnnotationRequest?: (annotationId: string) => void;
onAnnotationFocus?: (id: string) => void;
documentFileHash?: string | null;
annotationsDimmed?: boolean;
flashAnnotationId?: string | null;
@@ -294,7 +294,7 @@ function handleAnnotationClick(id: string) {
flashAnnotationId={flashAnnotationId}
onDraw={handleDraw}
onAnnotationClick={handleAnnotationClick}
onDeleteRequest={onDeleteAnnotationRequest}
onAnnotationFocus={onAnnotationFocus}
/>
{/if}
</div>


@@ -84,22 +84,6 @@ export interface paths {
patch?: never;
trace?: never;
};
"/api/persons/{id}/confirm": {
parameters: {
query?: never;
header?: never;
path?: never;
cookie?: never;
};
get?: never;
put?: never;
post?: never;
delete?: never;
options?: never;
head?: never;
patch: operations["confirmPerson"];
trace?: never;
};
"/api/documents/{id}": {
parameters: {
query?: never;
@@ -244,6 +228,22 @@ export interface paths {
patch?: never;
trace?: never;
};
"/api/search/nl": {
parameters: {
query?: never;
header?: never;
path?: never;
cookie?: never;
};
get?: never;
put?: never;
post: operations["search"];
delete?: never;
options?: never;
head?: never;
patch?: never;
trace?: never;
};
"/api/persons": {
parameters: {
query?: never;
@@ -708,6 +708,22 @@ export interface paths {
patch?: never;
trace?: never;
};
"/api/admin/backfill-titles": {
parameters: {
query?: never;
header?: never;
path?: never;
cookie?: never;
};
get?: never;
put?: never;
post: operations["backfillTitles"];
delete?: never;
options?: never;
head?: never;
patch?: never;
trace?: never;
};
"/api/admin/backfill-file-hashes": {
parameters: {
query?: never;
@@ -740,6 +756,22 @@ export interface paths {
patch: operations["patchFamilyMember"];
trace?: never;
};
"/api/persons/{id}/confirm": {
parameters: {
query?: never;
header?: never;
path?: never;
cookie?: never;
};
get?: never;
put?: never;
post?: never;
delete?: never;
options?: never;
head?: never;
patch: operations["confirmPerson"];
trace?: never;
};
"/api/notifications/{id}/read": {
parameters: {
query?: never;
@@ -859,7 +891,7 @@ export interface paths {
path?: never;
cookie?: never;
};
get: operations["search"];
get: operations["search_1"];
put?: never;
post?: never;
delete?: never;
@@ -1323,7 +1355,7 @@ export interface paths {
path?: never;
cookie?: never;
};
get: operations["search_1"];
get: operations["search_2"];
put?: never;
post?: never;
delete?: never;
@@ -1651,7 +1683,7 @@ export interface components {
/** Format: int32 */
deathYear?: number;
/** Format: int32 */
generation?: number | null;
generation?: number;
};
Person: {
/** Format: uuid */
@@ -1668,7 +1700,7 @@ export interface components {
/** Format: int32 */
deathYear?: number;
/** Format: int32 */
generation?: number | null;
generation?: number;
familyMember: boolean;
sourceRef?: string;
provisional: boolean;
@@ -1803,6 +1835,98 @@ export interface components {
/** Format: uuid */
targetId: string;
};
NlSearchRequest: {
query: string;
};
Pageable: {
/** Format: int32 */
page?: number;
/** Format: int32 */
size?: number;
sort?: string[];
};
ActivityActorDTO: {
initials: string;
color: string;
name?: string;
};
DocumentListItem: {
/** Format: uuid */
id: string;
title: string;
originalFilename: string;
thumbnailUrl?: string;
/** Format: date */
documentDate?: string;
/** @enum {string} */
metaDatePrecision: "DAY" | "MONTH" | "SEASON" | "YEAR" | "RANGE" | "APPROX" | "UNKNOWN";
/** Format: date */
metaDateEnd?: string;
sender?: components["schemas"]["Person"];
receivers: components["schemas"]["Person"][];
tags: components["schemas"]["Tag"][];
archiveBox?: string;
archiveFolder?: string;
location?: string;
summary?: string;
/** Format: int32 */
completionPercentage: number;
contributors: components["schemas"]["ActivityActorDTO"][];
matchData: components["schemas"]["SearchMatchData"];
/** Format: date-time */
createdAt: string;
/** Format: date-time */
updatedAt: string;
};
DocumentSearchResult: {
items: components["schemas"]["DocumentListItem"][];
/** Format: int64 */
totalElements: number;
/** Format: int32 */
pageNumber: number;
/** Format: int32 */
pageSize: number;
/** Format: int32 */
totalPages: number;
/** Format: int64 */
undatedCount: number;
};
MatchOffset: {
/** Format: int32 */
start: number;
/** Format: int32 */
length: number;
};
NlQueryInterpretation: {
resolvedPersons: components["schemas"]["PersonHint"][];
ambiguousPersons: components["schemas"]["PersonHint"][];
/** Format: date */
dateFrom?: string;
/** Format: date */
dateTo?: string;
keywords: string[];
rawQuery: string;
keywordsApplied: boolean;
};
NlSearchResponse: {
result: components["schemas"]["DocumentSearchResult"];
interpretation: components["schemas"]["NlQueryInterpretation"];
};
PersonHint: {
/** Format: uuid */
id: string;
displayName: string;
};
SearchMatchData: {
transcriptionSnippet?: string;
titleOffsets: components["schemas"]["MatchOffset"][];
senderMatched: boolean;
matchedReceiverIds: string[];
matchedTagIds: string[];
snippetOffsets: components["schemas"]["MatchOffset"][];
summarySnippet?: string;
summaryOffsets: components["schemas"]["MatchOffset"][];
};
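A hedged sketch of consuming the new `NlQueryInterpretation` schema on the client. The types below are local copies of the generated shapes above. `interpretationChips` is a hypothetical helper, and it assumes that `keywordsApplied: false` means the keywords were extracted but not used as a filter, and that `ambiguousPersons` are not applied as filters.

```typescript
// Local copies of the generated schema shapes (subset).
type PersonHint = { id: string; displayName: string };
type NlQueryInterpretation = {
  resolvedPersons: PersonHint[];
  ambiguousPersons: PersonHint[];
  dateFrom?: string; // ISO date
  dateTo?: string; // ISO date
  keywords: string[];
  rawQuery: string;
  keywordsApplied: boolean;
};

// Hypothetical helper: turns the interpretation into short labels a results
// page could show, so the user sees how the natural-language query was read.
function interpretationChips(i: NlQueryInterpretation): string[] {
  const chips = i.resolvedPersons.map((p) => `person:${p.displayName}`);
  if (i.dateFrom || i.dateTo) {
    // An open end of the range is shown as '*'.
    chips.push(`date:${i.dateFrom ?? '*'}..${i.dateTo ?? '*'}`);
  }
  // Assumption: keywords only count as a filter when keywordsApplied is true.
  if (i.keywordsApplied) chips.push(...i.keywords.map((k) => `keyword:${k}`));
  return chips;
}
```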
CreateRelationshipRequest: {
/** Format: uuid */
relatedPersonId: string;
@@ -2188,11 +2312,6 @@ export interface components {
/** Format: int64 */
transcriptionCount: number;
};
ActivityActorDTO: {
initials: string;
color: string;
name?: string;
};
TranscriptionQueueItemDTO: {
/** Format: uuid */
id: string;
@@ -2235,25 +2354,6 @@ export interface components {
/** Format: int64 */
totalStories: number;
};
PersonSummaryDTO: {
title?: string;
/** Format: uuid */
id?: string;
displayName?: string;
firstName?: string;
lastName?: string;
/** Format: int64 */
documentCount?: number;
/** Format: int32 */
birthYear?: number;
/** Format: int32 */
deathYear?: number;
alias?: string;
notes?: string;
personType?: string;
familyMember?: boolean;
provisional?: boolean;
};
PersonSearchResult: {
items: components["schemas"]["PersonSummaryDTO"][];
/** Format: int64 */
@@ -2265,6 +2365,25 @@ export interface components {
/** Format: int32 */
totalPages: number;
};
PersonSummaryDTO: {
title?: string;
/** Format: uuid */
id?: string;
displayName?: string;
firstName?: string;
lastName?: string;
/** Format: int64 */
documentCount?: number;
notes?: string;
/** Format: int32 */
birthYear?: number;
/** Format: int32 */
deathYear?: number;
provisional?: boolean;
alias?: string;
personType?: string;
familyMember?: boolean;
};
InferredRelationshipWithPersonDTO: {
person: components["schemas"]["PersonNodeDTO"];
label: string;
@@ -2280,7 +2399,7 @@ export interface components {
/** Format: int32 */
deathYear?: number;
/** Format: int32 */
generation?: number | null;
generation?: number;
familyMember: boolean;
};
InferredRelationshipDTO: {
@@ -2433,63 +2552,6 @@ export interface components {
/** Format: int32 */
totalPages?: number;
};
DocumentListItem: {
/** Format: uuid */
id: string;
title: string;
originalFilename: string;
thumbnailUrl?: string;
/** Format: date */
documentDate?: string;
/** @enum {string} */
metaDatePrecision: "DAY" | "MONTH" | "SEASON" | "YEAR" | "RANGE" | "APPROX" | "UNKNOWN";
/** Format: date */
metaDateEnd?: string;
sender?: components["schemas"]["Person"];
receivers: components["schemas"]["Person"][];
tags: components["schemas"]["Tag"][];
archiveBox?: string;
archiveFolder?: string;
location?: string;
summary?: string;
/** Format: int32 */
completionPercentage: number;
contributors: components["schemas"]["ActivityActorDTO"][];
matchData: components["schemas"]["SearchMatchData"];
/** Format: date-time */
createdAt: string;
/** Format: date-time */
updatedAt: string;
};
DocumentSearchResult: {
items: components["schemas"]["DocumentListItem"][];
/** Format: int64 */
totalElements: number;
/** Format: int32 */
pageNumber: number;
/** Format: int32 */
pageSize: number;
/** Format: int32 */
totalPages: number;
/** Format: int64 */
undatedCount: number;
};
MatchOffset: {
/** Format: int32 */
start: number;
/** Format: int32 */
length: number;
};
SearchMatchData: {
transcriptionSnippet?: string;
titleOffsets: components["schemas"]["MatchOffset"][];
senderMatched: boolean;
matchedReceiverIds: string[];
matchedTagIds: string[];
snippetOffsets: components["schemas"]["MatchOffset"][];
summarySnippet?: string;
summaryOffsets: components["schemas"]["MatchOffset"][];
};
IncompleteDocumentDTO: {
/** Format: uuid */
id: string;
@@ -2828,6 +2890,26 @@ export interface operations {
};
};
};
deletePerson: {
parameters: {
query?: never;
header?: never;
path: {
id: string;
};
cookie?: never;
};
requestBody?: never;
responses: {
/** @description No Content */
204: {
headers: {
[name: string]: unknown;
};
content?: never;
};
};
};
getDocument: {
parameters: {
query?: never;
@@ -3154,6 +3236,32 @@ export interface operations {
};
};
};
search: {
parameters: {
query: {
pageable: components["schemas"]["Pageable"];
};
header?: never;
path?: never;
cookie?: never;
};
requestBody: {
content: {
"application/json": components["schemas"]["NlSearchRequest"];
};
};
responses: {
/** @description OK */
200: {
headers: {
[name: string]: unknown;
};
content: {
"*/*": components["schemas"]["NlSearchResponse"];
};
};
};
};
getPersons: {
parameters: {
query?: {
@@ -3184,48 +3292,6 @@ export interface operations {
};
};
};
confirmPerson: {
parameters: {
query?: never;
header?: never;
path: {
id: string;
};
cookie?: never;
};
requestBody?: never;
responses: {
/** @description OK */
200: {
headers: {
[name: string]: unknown;
};
content: {
"*/*": components["schemas"]["Person"];
};
};
};
};
deletePerson: {
parameters: {
query?: never;
header?: never;
path: {
id: string;
};
cookie?: never;
};
requestBody?: never;
responses: {
/** @description No Content */
204: {
headers: {
[name: string]: unknown;
};
content?: never;
};
};
};
createPerson: {
parameters: {
query?: never;
@@ -4117,6 +4183,26 @@ export interface operations {
};
};
};
backfillTitles: {
parameters: {
query?: never;
header?: never;
path?: never;
cookie?: never;
};
requestBody?: never;
responses: {
/** @description OK */
200: {
headers: {
[name: string]: unknown;
};
content: {
"*/*": components["schemas"]["BackfillResult"];
};
};
};
};
backfillFileHashes: {
parameters: {
query?: never;
@@ -4163,6 +4249,28 @@ export interface operations {
};
};
};
confirmPerson: {
parameters: {
query?: never;
header?: never;
path: {
id: string;
};
cookie?: never;
};
requestBody?: never;
responses: {
/** @description OK */
200: {
headers: {
[name: string]: unknown;
};
content: {
"*/*": components["schemas"]["Person"];
};
};
};
};
markOneRead: {
parameters: {
query?: never;
@@ -4443,7 +4551,7 @@ export interface operations {
};
};
};
search: {
search_1: {
parameters: {
query?: {
q?: string;
@@ -5067,7 +5175,7 @@ export interface operations {
};
};
};
search_1: {
search_2: {
parameters: {
query?: {
q?: string;

@@ -1099,9 +1099,11 @@ describe('StammbaumTree keyboard tab order (#718)', () => {
const CLARA = '00000000-0000-0000-0000-0000000000a3';
const HANS = '00000000-0000-0000-0000-0000000000a4';
// Walter ↔ Eugenie (gen 0); their children Clara + Hans (gen 1). buildLayout
// sorts each generation alphabetically, so the deterministic visual order is
// Eugenie, Walter (top row) then Clara, Hans (next row).
// Walter ↔ Eugenie (gen 0); their children Clara + Hans (gen 1). The tidy-tree
// layout (#724) orders a couple's run by structural ownership (earliest birth
// year, then a deterministic id tie-break), not alphabetically — with no birth
// years here Walter (id …a1) owns the run and Eugenie sits to his right. So the
// deterministic visual order is Walter, Eugenie (top row) then Clara, Hans.
const FAMILY_EDGES = [
{
id: 'sp',
@@ -1171,7 +1173,7 @@ describe('StammbaumTree keyboard tab order (#718)', () => {
});
// Top generation left-to-right, then next generation left-to-right.
expect(nodeLabelsInDomOrder()).toEqual(['Eugenie', 'Walter', 'Clara', 'Hans']);
expect(nodeLabelsInDomOrder()).toEqual(['Walter', 'Eugenie', 'Clara', 'Hans']);
});
it('orders tab stops by rendered position regardless of input order', () => {

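The updated test comment describes the tidy-tree ordering rule for a couple: the partner with the earliest birth year owns the run, with a deterministic id tie-break. A minimal sketch of that rule follows. `LayoutPerson` and `orderCouple` are hypothetical (the real #724 layout code is not part of this diff), treating a missing birth year as later than any known year is an assumption, and Eugenie's id `…a2` is assumed for the example.

```typescript
// Hypothetical shape; the real tidy-tree layout types are not shown here.
type LayoutPerson = { id: string; name: string; birthYear?: number };

// Earliest birth year leads the couple's run. Equal years, including
// "both unknown", fall back to a lexicographic id comparison.
// Assumption: an unknown birth year sorts after any known year.
function orderCouple(a: LayoutPerson, b: LayoutPerson): [LayoutPerson, LayoutPerson] {
  const ya = a.birthYear ?? Number.POSITIVE_INFINITY;
  const yb = b.birthYear ?? Number.POSITIVE_INFINITY;
  if (ya !== yb) return ya < yb ? [a, b] : [b, a];
  return a.id <= b.id ? [a, b] : [b, a];
}

const walter: LayoutPerson = { id: '00000000-0000-0000-0000-0000000000a1', name: 'Walter' };
const eugenie: LayoutPerson = { id: '00000000-0000-0000-0000-0000000000a2', name: 'Eugenie' };
// No birth years: the id tie-break puts Walter first regardless of input order.
const couple = orderCouple(eugenie, walter).map((p) => p.name);
```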
@@ -0,0 +1,254 @@
import { describe, it, expect, afterEach, vi } from 'vitest';
import type { TranscribeShortcutOptions } from './transcribeShortcuts';
const { transcribeShortcuts } = await import('./transcribeShortcuts');
function makeOptions(
overrides: Partial<TranscribeShortcutOptions> = {}
): TranscribeShortcutOptions {
return {
isPanelOpen: () => true,
isCheatsheetOpen: () => false,
panelMode: () => 'edit',
goToNextRegion: vi.fn(),
goToPrevRegion: vi.fn(),
toggleMode: vi.fn(),
closePanel: vi.fn(),
startDrawMode: vi.fn(),
toggleTrainingMark: vi.fn(),
deleteCurrentRegion: vi.fn(),
openCheatsheet: vi.fn(),
...overrides
};
}
describe('transcribeShortcuts action', () => {
const nodes: HTMLElement[] = [];
const teardowns: Array<() => void> = [];
function makeNode(): HTMLElement {
const node = document.createElement('div');
document.body.appendChild(node);
nodes.push(node);
return node;
}
function attach(options: TranscribeShortcutOptions) {
const node = makeNode();
const action = transcribeShortcuts(node, options);
teardowns.push(() => action.destroy());
return action;
}
function press(
key: string,
opts: { target?: EventTarget; ctrlKey?: boolean; altKey?: boolean; metaKey?: boolean } = {}
): KeyboardEvent {
const event = new KeyboardEvent('keydown', {
key,
bubbles: true,
cancelable: true,
ctrlKey: opts.ctrlKey ?? false,
altKey: opts.altKey ?? false,
metaKey: opts.metaKey ?? false
});
(opts.target ?? document).dispatchEvent(event);
return event;
}
function makeEditable(tag: 'input' | 'textarea' | 'div'): HTMLElement {
const el = document.createElement(tag);
if (tag === 'div') el.setAttribute('contenteditable', 'true');
document.body.appendChild(el);
nodes.push(el);
return el;
}
afterEach(() => {
teardowns.forEach((t) => t());
teardowns.length = 0;
nodes.forEach((n) => n.remove());
nodes.length = 0;
});
describe('navigation and mode keys', () => {
it('fires goToNextRegion on "j" and prevents default', () => {
const options = makeOptions();
attach(options);
const event = press('j');
expect(options.goToNextRegion).toHaveBeenCalledOnce();
expect(event.defaultPrevented).toBe(true);
});
it('fires goToPrevRegion on "k"', () => {
const options = makeOptions();
attach(options);
press('k');
expect(options.goToPrevRegion).toHaveBeenCalledOnce();
});
it('fires toggleMode on "e"', () => {
const options = makeOptions();
attach(options);
press('e');
expect(options.toggleMode).toHaveBeenCalledOnce();
});
});
describe('panel-open guard', () => {
it('does not fire when the panel is closed', () => {
const options = makeOptions({ isPanelOpen: () => false });
attach(options);
press('j');
press('e');
expect(options.goToNextRegion).not.toHaveBeenCalled();
expect(options.toggleMode).not.toHaveBeenCalled();
});
});
describe('focus guard', () => {
it('does not fire "j" when focus is inside an <input>', () => {
const options = makeOptions();
attach(options);
press('j', { target: makeEditable('input') });
expect(options.goToNextRegion).not.toHaveBeenCalled();
});
it('does not fire "j"/"e"/"n"/"t" when focus is inside the TipTap contenteditable', () => {
const options = makeOptions();
attach(options);
const editor = makeEditable('div');
press('j', { target: editor });
press('e', { target: editor });
press('n', { target: editor });
press('t', { target: editor });
expect(options.goToNextRegion).not.toHaveBeenCalled();
expect(options.toggleMode).not.toHaveBeenCalled();
expect(options.startDrawMode).not.toHaveBeenCalled();
expect(options.toggleTrainingMark).not.toHaveBeenCalled();
});
});
describe('"?" cheatsheet (focus-independent)', () => {
it('opens the cheatsheet on "?"', () => {
const options = makeOptions();
attach(options);
const event = press('?');
expect(options.openCheatsheet).toHaveBeenCalledOnce();
expect(event.defaultPrevented).toBe(true);
});
it('opens the cheatsheet even when focus is inside the editor', () => {
const options = makeOptions();
attach(options);
press('?', { target: makeEditable('div') });
expect(options.openCheatsheet).toHaveBeenCalledOnce();
});
it('does nothing when a Ctrl/Alt/Meta modifier is held', () => {
const options = makeOptions();
attach(options);
press('?', { ctrlKey: true });
press('?', { altKey: true });
press('?', { metaKey: true });
expect(options.openCheatsheet).not.toHaveBeenCalled();
});
it('is a no-op when the cheatsheet is already open (open-only)', () => {
const options = makeOptions({ isCheatsheetOpen: () => true });
attach(options);
press('?');
expect(options.openCheatsheet).not.toHaveBeenCalled();
});
it('does not open the cheatsheet when the panel is closed', () => {
const options = makeOptions({ isPanelOpen: () => false });
attach(options);
press('?');
expect(options.openCheatsheet).not.toHaveBeenCalled();
});
});
describe('"n" draw mode (edit only)', () => {
it('is a no-op in read mode', () => {
const options = makeOptions({ panelMode: () => 'read' });
attach(options);
press('n');
expect(options.startDrawMode).not.toHaveBeenCalled();
});
it('fires startDrawMode in edit mode', () => {
const options = makeOptions({ panelMode: () => 'edit' });
attach(options);
press('n');
expect(options.startDrawMode).toHaveBeenCalledOnce();
});
});
describe('"t" training mark', () => {
it('fires toggleTrainingMark', () => {
const options = makeOptions();
attach(options);
press('t');
expect(options.toggleTrainingMark).toHaveBeenCalledOnce();
});
});
describe('"Delete" current region — single owner', () => {
it('fires deleteCurrentRegion exactly once when a focused annotation is the target', () => {
const options = makeOptions();
attach(options);
const annotation = document.createElement('div');
annotation.setAttribute('data-annotation', '');
annotation.setAttribute('tabindex', '0');
document.body.appendChild(annotation);
nodes.push(annotation);
press('Delete', { target: annotation });
expect(options.deleteCurrentRegion).toHaveBeenCalledOnce();
});
});
describe('Esc precedence ladder (decision B1)', () => {
it('rung 1 — cheatsheet open: closePanel is NOT called (dialog handles Esc)', () => {
const options = makeOptions({ isCheatsheetOpen: () => true });
attach(options);
press('Escape');
expect(options.closePanel).not.toHaveBeenCalled();
});
it('rung 2 — focus inside editable: no callback fires', () => {
const options = makeOptions();
attach(options);
press('Escape', { target: makeEditable('div') });
expect(options.closePanel).not.toHaveBeenCalled();
});
it('rung 3 — otherwise: closePanel is called (pins panel-close)', () => {
const options = makeOptions();
attach(options);
const event = press('Escape');
expect(options.closePanel).toHaveBeenCalledOnce();
expect(event.defaultPrevented).toBe(true);
});
});
describe('lifecycle', () => {
it('removes the listener on destroy (no leak)', () => {
const options = makeOptions();
const action = attach(options);
action.destroy();
press('j');
expect(options.goToNextRegion).not.toHaveBeenCalled();
});
it('update() swaps callbacks so the listener never closes over stale state', () => {
const first = makeOptions();
const action = attach(first);
const second = makeOptions();
action.update(second);
press('j');
expect(first.goToNextRegion).not.toHaveBeenCalled();
expect(second.goToNextRegion).toHaveBeenCalledOnce();
});
});
});

@@ -0,0 +1,100 @@
/**
* Keyboard shortcuts for the Transcribe panel power path (issue #327).
*
* A pure input-to-callback translator: it owns no state and has no
* save/persistence responsibility. Panel state and every command are passed in
* as callbacks backed by the page's existing context setters, so the listener
* never closes over stale `$state`. It is the single owner of the panel's
* global `keydown` — including `Esc` (decision B1).
*/
export type TranscribeShortcutOptions = {
isPanelOpen: () => boolean; // reads transcribeMode ($state)
isCheatsheetOpen: () => boolean; // Esc ladder rung 1 + "?" open-only guard
panelMode: () => 'read' | 'edit'; // for the n-only-in-edit guard
goToNextRegion: () => void; // j
goToPrevRegion: () => void; // k
toggleMode: () => void; // e
closePanel: () => void; // Esc ladder rung 3
startDrawMode: () => void; // n (edit mode only)
toggleTrainingMark: () => void; // t (no-op when no active block)
deleteCurrentRegion: () => void; // Delete (confirm modal)
openCheatsheet: () => void; // ?
};
function isEditableTarget(target: EventTarget | null): boolean {
if (!(target instanceof HTMLElement)) return false;
const tag = target.tagName;
return tag === 'INPUT' || tag === 'TEXTAREA' || target.isContentEditable;
}
// `node` is unused: the listener is global (window) so a shortcut fires no
// matter where focus sits on the page. It is still authored as a Svelte action
// (`use:transcribeShortcuts`) so its lifecycle is tied to the host element's
// mount/unmount and `destroy()` reliably removes the listener.
export function transcribeShortcuts(_node: HTMLElement, initial: TranscribeShortcutOptions) {
let options = initial;
function handleKeydown(event: KeyboardEvent) {
// "?" is focus-independent: it fires regardless of the focus guard, but
// only with no Ctrl/Alt/Meta (Shift is allowed — "?" is Shift+ß on QWERTZ).
if (event.key === '?' && !event.ctrlKey && !event.altKey && !event.metaKey) {
if (!options.isPanelOpen()) return;
event.preventDefault();
if (!options.isCheatsheetOpen()) options.openCheatsheet();
return;
}
if (!options.isPanelOpen()) return;
// Esc precedence ladder (decision B1) — top rung wins.
if (event.key === 'Escape') {
if (options.isCheatsheetOpen()) return; // rung 1: the <dialog> closes itself
if (isEditableTarget(event.target)) return; // rung 2: let TipTap handle its Esc
event.preventDefault(); // rung 3: close the panel
options.closePanel();
return;
}
// Every remaining shortcut is inactive while focus is inside an editable.
if (isEditableTarget(event.target)) return;
switch (event.key) {
case 'j':
event.preventDefault();
options.goToNextRegion();
break;
case 'k':
event.preventDefault();
options.goToPrevRegion();
break;
case 'e':
event.preventDefault();
options.toggleMode();
break;
case 'n':
if (options.panelMode() !== 'edit') return; // silent no-op in read mode
event.preventDefault();
options.startDrawMode();
break;
case 't':
event.preventDefault();
options.toggleTrainingMark();
break;
case 'Delete':
event.preventDefault();
options.deleteCurrentRegion();
break;
}
}
window.addEventListener('keydown', handleKeydown);
return {
update(next: TranscribeShortcutOptions) {
options = next;
},
destroy() {
window.removeEventListener('keydown', handleKeydown);
}
};
}
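The dispatch rules in `handleKeydown` (the focus-independent `?`, the three-rung Esc ladder, the editable guard, and `n` only in edit mode) can be modelled as a pure function. This is a hypothetical refactor sketch, not code from the PR: `resolveShortcut`, `ShortcutState`, and `ShortcutAction` are illustrative names, and the sketch only decides which callback would fire; it does not model `preventDefault`.

```typescript
type ShortcutAction =
  | 'openCheatsheet' | 'closePanel' | 'goToNextRegion' | 'goToPrevRegion'
  | 'toggleMode' | 'startDrawMode' | 'toggleTrainingMark' | 'deleteCurrentRegion';

type ShortcutState = {
  panelOpen: boolean;
  cheatsheetOpen: boolean;
  mode: 'read' | 'edit';
  targetEditable: boolean; // focus is in an input, textarea, or contenteditable
  ctrl?: boolean;
  alt?: boolean;
  meta?: boolean;
};

// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const PLAIN_KEYS = new Map<string, ShortcutAction>([
  ['j', 'goToNextRegion'],
  ['k', 'goToPrevRegion'],
  ['e', 'toggleMode'],
  ['n', 'startDrawMode'],
  ['t', 'toggleTrainingMark'],
  ['Delete', 'deleteCurrentRegion']
]);

function resolveShortcut(key: string, s: ShortcutState): ShortcutAction | null {
  if (!s.panelOpen) return null;
  // "?" ignores the focus guard but not Ctrl/Alt/Meta; it only ever opens.
  if (key === '?' && !s.ctrl && !s.alt && !s.meta) {
    return s.cheatsheetOpen ? null : 'openCheatsheet';
  }
  if (key === 'Escape') {
    if (s.cheatsheetOpen) return null; // rung 1: the dialog closes itself
    if (s.targetEditable) return null; // rung 2: the editor handles Esc
    return 'closePanel'; // rung 3
  }
  if (s.targetEditable) return null;
  const action = PLAIN_KEYS.get(key) ?? null;
  if (action === 'startDrawMode' && s.mode !== 'edit') return null;
  return action;
}
```

A pure resolver like this could be unit-tested without a DOM, while the Svelte action would stay responsible only for listener lifecycle and `preventDefault`.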

@@ -307,9 +307,21 @@ describe('renderTranscriptionBody', () => {
expect(result).not.toMatch(/>O"Brien<\/a>/);
});
it('renders nothing when mentionedPersons is undefined-empty and no @ triggers', () => {
const result = renderTranscriptionBody('Plain old transcription text.', []);
expect(result).toBe('Plain old transcription text.');
it('renders a deleted-person @mention as plain text with no dead link (graceful degradation)', () => {
// AC-6 (#684): when a mentioned person is deleted, V71's ON DELETE CASCADE removes the
// sidecar row, so the displayName reaches the renderer with an empty mentionedPersons
// array. The reader must still see the name as plain text — never a dead <a>, a
// person-mention class, a data-person-id, or an href. This locks the degradation
// contract so a future renderer refactor cannot silently reintroduce a dead link.
const result = renderTranscriptionBody('Brief an @Auguste Raddatz vom Mai', []);
// (a) the reader still sees the readable name and the surrounding sentence verbatim
expect(result).toContain('Auguste Raddatz');
expect(result).toBe('Brief an @Auguste Raddatz vom Mai');
// (b) none of the anchor artifacts leak through
expect(result).not.toContain('<a');
expect(result).not.toContain('person-mention');
expect(result).not.toContain('data-person-id');
expect(result).not.toContain('href');
});
it('skips substitution when personId is not a UUID (defense in depth)', () => {

@@ -53,6 +53,8 @@ export type ErrorCode =
| 'FORBIDDEN'
| 'CSRF_TOKEN_MISSING'
| 'TOO_MANY_LOGIN_ATTEMPTS'
| 'SMART_SEARCH_UNAVAILABLE'
| 'SMART_SEARCH_RATE_LIMITED'
| 'VALIDATION_ERROR'
| 'BATCH_TOO_LARGE'
| 'BULK_EDIT_TOO_MANY_IDS'
@@ -178,6 +180,10 @@ export function getErrorMessage(code: ErrorCode | string | undefined): string {
return m.error_csrf_token_missing();
case 'TOO_MANY_LOGIN_ATTEMPTS':
return m.error_too_many_login_attempts();
case 'SMART_SEARCH_UNAVAILABLE':
return m.error_smart_search_unavailable();
case 'SMART_SEARCH_RATE_LIMITED':
return m.error_smart_search_rate_limited();
case 'VALIDATION_ERROR':
return m.error_validation_error();
case 'BATCH_TOO_LARGE':

@@ -72,5 +72,13 @@ import TranscribeDragDemo from './TranscribeDragDemo.svelte';
{m.transcribe_coach_footer_richtlinien()}
<span class="ml-1 text-[11px] text-ink-3">{m.common_opens_new_tab()}</span>
</a>
<p class="w-full text-ink-3 [@media(pointer:coarse)]:hidden">
{m.transcribe_coach_shortcut_hint_before()}
<kbd
class="rounded border border-line bg-muted px-1.5 py-0.5 font-mono text-xs text-ink shadow-sm"
>?</kbd
>
{m.transcribe_coach_shortcut_hint_after()}
</p>
</div>
</div>

@@ -14,6 +14,8 @@ vi.mock('$lib/paraglide/messages.js', () => ({
transcribe_coach_step_3_title: () => 'Speichert automatisch.',
transcribe_coach_footer_kurrent: () => 'Hilfe zu Kurrent ↗',
transcribe_coach_footer_richtlinien: () => 'Transkriptions-Richtlinien ↗',
transcribe_coach_shortcut_hint_before: () => 'Tipp: Drücken Sie',
transcribe_coach_shortcut_hint_after: () => 'für eine Übersicht aller Tastenkürzel.',
common_opens_new_tab: () => '(öffnet in neuem Tab)'
}
}));
@@ -63,6 +65,21 @@ describe('TranscribeCoachEmptyState', () => {
await expect.element(annotations.first()).toBeInTheDocument();
});
it('renders the keyboard-shortcut hint with a "?" key cap', async () => {
render(TranscribeCoachEmptyState);
await expect.element(page.getByText('Tastenkürzel', { exact: false })).toBeInTheDocument();
const kbd = document.querySelector('kbd');
expect(kbd?.textContent).toBe('?');
});
it('hides the keyboard hint on touch-only (coarse-pointer) devices', async () => {
render(TranscribeCoachEmptyState);
const hint = document.querySelector('kbd')?.closest('p');
// The hint is gated behind a fine-pointer media query so touch-only
// transcribers are never told to press a key they do not have (#327).
expect(hint?.className).toContain('pointer:coarse');
});
it('renders the drag demo animation region inside step 1', async () => {
render(TranscribeCoachEmptyState);
const demo = page.getByRole('img', { name: /Rahmen ziehen|Animation/i });

Some files were not shown because too many files have changed in this diff.