diff --git a/docs/adr/032-tag-name-resolution-tolerates-case-collisions.md b/docs/adr/032-tag-name-resolution-tolerates-case-collisions.md new file mode 100644 index 00000000..9a1fe61f --- /dev/null +++ b/docs/adr/032-tag-name-resolution-tolerates-case-collisions.md @@ -0,0 +1,105 @@ +# ADR-032 — Tag-name resolution tolerates case-collisions: exact-case first, then a deterministic lowest-id fallback, and never a `unique(lower(name))` constraint + +**Date:** 2026-06-06 +**Status:** Accepted +**Issue:** #730 (document with a case-colliding tag cannot be saved — `findByNameIgnoreCase` `NonUniqueResultException`) +**Milestone:** — + +--- + +## Context + +`TagService.findOrCreate(name)` is the single point that turns a tag **name** into a `Tag` +row. The document edit form, bulk-edit, and the upload batch all round-trip tag **names** +(the edit form sends `tags.map(t => t.name).join(',')`) and re-resolve them on **every** +save through `resolveTags → findOrCreate`. The old implementation resolved with +`tagRepository.findByNameIgnoreCase(name)`, a derived query returning `Optional`. + +That signature encodes an invariant the data does **not** hold: that a name is globally +unique case-insensitively. The canonical tag tree legitimately contains names that differ +only by case — a parent container and its same-named lowercase **child** (`Geburt` / +`Geburt/geburt`, `Weihnachten` / `Weihnachten/weihnachten`, …), or two siblings +(`Reise/Reisepläne` / `Reise/reisepläne`). Each is a distinct node with its own +`source_ref` (the stable identity, per ADR-025) and its own document attachments — **not** an +accidental duplicate. When two rows matched case-insensitively, Hibernate threw +`NonUniqueResultException` → `IncorrectResultSizeDataAccessException` → a generic HTTP 500. + +The effect was severe and opaque: every document carrying one of ~10 colliding tags (≈180 +document-tag attachments on staging) became **un-editable** — any field change failed on save +because the whole tag set is re-resolved — and the user saw only "an unexpected error", with +no hint that a tag was the cause. + +This is a **lookup** problem, not a data problem: the collisions are valid canonical nodes +and must be preserved. + +## Decision + +### 1. Resolution is exact-case first, then a non-throwing deterministic fallback + +`findOrCreate` resolves in three ordered steps and never throws on a collision: + +1. `findByName(cleanName)` — **exact-case** derived query. If present, return it. The edit + round-trip replays the stored name verbatim, so the exact-case row is the right binding + (typing the bare child name `weihnachten` binds to the child; `Weihnachten` binds to the + parent container). +2. else `findAllByNameIgnoreCase(cleanName)` — the **plural** case-insensitive list. If + non-empty, return the element with the **lowest `id`** (`min(comparing(Tag::getId))`). +3. else create the tag (an orphan: null `sourceRef`/`parentId`). + +The two repository methods are deliberately **two distinctly-named methods** — Spring Data +cannot disambiguate an `Optional` from a `List` derived query by return type alone. +The throwing `Optional findByNameIgnoreCase` is **deleted** so the non-unique-throwing +call cannot be reintroduced; `findOrCreate` was its only production caller. + +### 2. The tie-break is `id`, and it is load-bearing + +`id` is a stable, always-present, unique column, so "lowest id" is a total, deterministic +order over the candidates: the same name resolves to the same row on every call, forever, +without throwing. This matters only in the free-text authoring path (step 2), where no +exact-case row exists yet two case-folding rows do. + +### 3. No `unique(lower(name))` constraint — and a load-bearing comment says so + +A global case-insensitive uniqueness constraint is **wrong**: it would reject the legitimate +parent/child canonical nodes. It would also **fail to apply** against the existing rows, +turning a code-only deploy into a failed Flyway migration that blocks startup. A comment at +both `findOrCreate` and the repository methods records this so the constraint is not "helpfully" +added later. + +## Consequences + +- **Code-only, zero migration, fully reversible** (roll back the JAR). No tag data is touched; + the colliding rows stay exactly as the canonical importer produced them. +- One change fixes all three write paths — single-document edit, bulk-edit, and upload batch — + because they all funnel through `resolveTags → findOrCreate`, which stays the single source + of truth (resolution logic is **not** hoisted into `DocumentService`). +- **Free-text tag semantics under collision are accepted as-is** (issue #730, option A): the + bare word `weihnachten` binds to the deep child and `Weihnachten` to the parent container. + Correct for the edit round-trip and acceptable for authoring; making the typeahead show the + tree path so an author can tell a container from its same-named child is a separate + follow-up. +- The wire response stays opaque: after the fix this path no longer throws + `IncorrectResultSizeDataAccessException`, and `GlobalExceptionHandler`'s generic handler maps + any stray one to `INTERNAL_ERROR` with no Hibernate/SQL leak — so no dedicated handler was + added. +- **The sibling Person path is unfixed but tracked.** `PersonService.findOrCreateByAlias` + (`findByAliasIgnoreCase`) and `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` carry the same + latent `Optional`-non-unique throw on user-influenced names; deferred to #731 rather than + widened into this fix. +- Postgres `LOWER()` folding of umlauts (`ü`/`ä`) is the actual correctness hinge of the + fallback and cannot be proven by a mocked repo, so it is pinned by a Testcontainers + `postgres:16-alpine` test on a `Glückwünsche`/`glückwünsche` pair; a plain-ASCII test would + stay green while the bug reappeared for umlaut tags. + +## Alternatives considered + +- **A `unique(lower(name))` index** — rejected: the collisions are valid canonical nodes, and + the migration would fail against the existing data and block startup. +- **Merging/deduping the colliding tags** — rejected: each has a distinct `source_ref`, tree + position, and real document attachments; they are not duplicates. +- **Round-tripping tag IDs instead of names** so resolution can't be ambiguous at all — the + cleaner long-term shape (removes the name-as-key smell), but a larger change with frontend + surface; deferred to #732. The lookup fix here is the minimal correct unblock. +- **Hoisting resolution into `DocumentService.resolveTags`** — rejected: it would duplicate the + rule across the edit, bulk-edit, and import paths and let them drift; `findOrCreate` stays + the one owner.