# ADR-033 — Tag-name resolution tolerates case-collisions: exact-case first, then a deterministic lowest-id fallback, and never a `unique(lower(name))` constraint

**Date:** 2026-06-06
**Status:** Accepted
**Issue:** #730 (document with a case-colliding tag cannot be saved — `findByNameIgnoreCase` `NonUniqueResultException`)
**Milestone:** —

---

## Context

`TagService.findOrCreate(name)` is the single point that turns a tag **name** into a `Tag`
row. The document edit form, bulk-edit, and the upload batch all round-trip tag **names**
(the edit form sends `tags.map(t => t.name).join(',')`) and re-resolve them on **every**
save through `resolveTags → findOrCreate`. The old implementation resolved with
`tagRepository.findByNameIgnoreCase(name)`, a derived query returning `Optional<Tag>`.

That signature encodes an invariant the data does **not** hold: that a name is globally
unique case-insensitively. The canonical tag tree legitimately contains names that differ
only by case — a parent container and its same-named lowercase **child** (`Geburt` /
`Geburt/geburt`, `Weihnachten` / `Weihnachten/weihnachten`, …), or two siblings
(`Reise/Reisepläne` / `Reise/reisepläne`). Each is a distinct node with its own
`source_ref` (the stable identity, per ADR-025) and its own document attachments — **not** an
accidental duplicate. When two rows matched case-insensitively, Hibernate threw
`NonUniqueResultException` → `IncorrectResultSizeDataAccessException` → a generic HTTP 500.

The effect was severe and opaque: every document carrying one of ~10 colliding tags (≈180
document-tag attachments on staging) became **un-editable** — any field change failed on save
because the whole tag set is re-resolved — and the user saw only "an unexpected error", with
no hint that a tag was the cause.

This is a **lookup** problem, not a data problem: the collisions are valid canonical nodes
and must be preserved.

## Decision

### 1. Resolution is exact-case first, then a non-throwing deterministic fallback

`findOrCreate` resolves in three ordered steps and never throws on a collision:

1. `findByName(cleanName)` — **exact-case** derived query. If present, return it. The edit
   round-trip replays the stored name verbatim, so the exact-case row is the right binding
   (typing the bare child name `weihnachten` binds to the child; `Weihnachten` binds to the
   parent container).
2. else `findAllByNameIgnoreCase(cleanName)` — the **plural** case-insensitive list. If
   non-empty, return the element with the **lowest `id`** (`min(comparing(Tag::getId))`).
3. else create the tag (an orphan: null `sourceRef`/`parentId`).

The two repository methods are deliberately **two distinctly-named methods** — Spring Data
cannot disambiguate an `Optional<Tag>` from a `List<Tag>` derived query by return type alone.
The throwing `Optional<Tag> findByNameIgnoreCase` is **deleted** so the non-unique-throwing
call cannot be reintroduced; `findOrCreate` was its only production caller.

### 2. The tie-break is `id`, and it is load-bearing

`id` is a stable, always-present, unique column, so "lowest id" is a total, deterministic
order over the candidates: the same name resolves to the same row on every call, forever,
without throwing. This matters only in the free-text authoring path (step 2), where no
exact-case row exists yet two case-folding rows do.

### 3. No `unique(lower(name))` constraint — and a load-bearing comment says so

A global case-insensitive uniqueness constraint is **wrong**: it would reject the legitimate
parent/child canonical nodes. It would also **fail to apply** against the existing rows,
turning a code-only deploy into a failed Flyway migration that blocks startup. A comment at
both `findOrCreate` and the repository methods records this so the constraint is not "helpfully"
added later.

## Consequences

- **Code-only, zero migration, fully reversible** (roll back the JAR). No tag data is touched;
  the colliding rows stay exactly as the canonical importer produced them.
- One change fixes all three write paths — single-document edit, bulk-edit, and upload batch —
  because they all funnel through `resolveTags → findOrCreate`, which stays the single source
  of truth (resolution logic is **not** hoisted into `DocumentService`).
- **Free-text tag semantics under collision are accepted as-is** (issue #730, option A): the
  bare word `weihnachten` binds to the deep child and `Weihnachten` to the parent container.
  Correct for the edit round-trip and acceptable for authoring; making the typeahead show the
  tree path so an author can tell a container from its same-named child is a separate
  follow-up.
- The wire response stays opaque: after the fix this path no longer throws
  `IncorrectResultSizeDataAccessException`, and `GlobalExceptionHandler`'s generic handler maps
  any stray one to `INTERNAL_ERROR` with no Hibernate/SQL leak — so no dedicated handler was
  added.
- **The sibling Person path is fixed the same way — see the Person extension below (#731).**
- Postgres `LOWER()` folding of umlauts (`ü`/`ä`) is the actual correctness hinge of the
  fallback and cannot be proven by a mocked repo, so it is pinned by a Testcontainers
  `postgres:16-alpine` test on a `Glückwünsche`/`glückwünsche` pair; a plain-ASCII test would
  stay green while the bug reappeared for umlaut tags.

## Person extension (#731)

The Person domain carried the same latent throw on **two** user-influenced lookup surfaces, and
is fixed with the same exact-case-first, non-throwing pattern — but with a deliberately
**different fallback per surface**, because the two paths have different consequences.

- **Alias path — `PersonService.findOrCreateByAlias` — deterministic lowest-id (mirrors tag).**
  `findByAliasIgnoreCase` (`Optional`) is replaced by `findByAlias` (exact) → `findAllByAliasIgnoreCase`
  (plural, lowest id) → the existing create-when-absent branch (INSTITUTION/GROUP and the
  maiden-name alias are preserved verbatim). There is no human in the importer loop and the path
  creates-on-absent anyway, so a deterministic guess is the right behaviour — exactly like tags.

- **Name/sender path — `PersonService.findByName` — bail to null on ambiguity (the new wrinkle).**
  Used only by `DocumentService.storeDocument` to resolve the upload **sender** from the parsed
  filename. `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` (`Optional`) is replaced by
  `findByFirstNameAndLastName` (exact) → `findAllByFirstNameIgnoreCaseAndLastNameIgnoreCase`
  (plural). Resolution returns the exact-case match, else the single case-insensitive match, else
  — on **two or more** matches — **empty**. The sender is left unset rather than guessing.

  **Why this diverges from the alias (and tag) decision:** the archive's value is correct
  provenance. A confidently-wrong pre-filled `Hans Müller` is worse than an empty field, because a
  senior reviewer will not re-check a value that is already filled in, whereas an empty sender
  routes the document into the "needs completion" state (`metadataComplete=false`) for a human to
  assign. The load-bearing comment at `findByName` records this so a future "consistency cleanup"
  does not reintroduce the confidently-wrong-sender bug by switching it to lowest-id.

- **Fail-closed on a null first name.** A parsed filename can lack a first name. The two new name
  methods use explicit HQL equality (`= :firstName`) rather than a derived
  `…IgnoreCase` query, because Spring Data folds a null derived-query argument to `first_name IS
  NULL` — which would silently widen the match and pull a last-name-only / institution row in as a
  "sender" (a quiet provenance-integrity defect). With HQL equality a null binds as `= NULL`,
  which never matches, so a null first name resolves to **no sender**. This is pinned by a
  real-Postgres repository test.

- **Scope — "ambiguous" is case-insensitive only.** Both exact-case lookups (`findByAlias`,
  `findByFirstNameAndLastName`) return `Optional`, so two **byte-identical same-case** rows would
  still throw `NonUniqueResultException`. That is a true data anomaly, deliberately out of scope
  (it is not a case-collision), and it surfaces as the opaque `INTERNAL_ERROR` — never a silently
  wrong row — so it is no worse than any other unexpected error and needs no extra handling here.

- **Same stance as tags otherwise:** no `unique(lower(alias))` / `unique(lower(name))` constraint
  (collisions are valid human labels; `source_ref` is the stable identity per ADR-025), no
  merge/dedupe, code-only and reversible, and no shared `resolveExactThenCi(...)` helper — the
  two Person paths have different fallbacks, so the exact→CI→fallback logic is inlined at each
  with its load-bearing comment (KISS).

## Alternatives considered

- **A `unique(lower(name))` index** — rejected: the collisions are valid canonical nodes, and
  the migration would fail against the existing data and block startup.
- **Merging/deduping the colliding tags** — rejected: each has a distinct `source_ref`, tree
  position, and real document attachments; they are not duplicates.
- **Round-tripping tag IDs instead of names** so resolution can't be ambiguous at all — the
  cleaner long-term shape (removes the name-as-key smell), but a larger change with frontend
  surface; deferred to #732. The lookup fix here is the minimal correct unblock.
- **Hoisting resolution into `DocumentService.resolveTags`** — rejected: it would duplicate the
  rule across the edit, bulk-edit, and import paths and let them drift; `findOrCreate` stays
  the one owner.