fix(tag): resolve case-colliding tag names without throwing (#730) #733
105
docs/adr/032-tag-name-resolution-tolerates-case-collisions.md
Normal file
105
docs/adr/032-tag-name-resolution-tolerates-case-collisions.md
Normal file
@@ -0,0 +1,105 @@
|
||||
# ADR-032 — Tag-name resolution tolerates case-collisions: exact-case first, then a deterministic lowest-id fallback, and never a `unique(lower(name))` constraint
|
||||
|
||||
**Date:** 2026-06-06
|
||||
**Status:** Accepted
|
||||
**Issue:** #730 (document with a case-colliding tag cannot be saved — `findByNameIgnoreCase` `NonUniqueResultException`)
|
||||
**Milestone:** —
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
`TagService.findOrCreate(name)` is the single point that turns a tag **name** into a `Tag`
|
||||
row. The document edit form, bulk-edit, and the upload batch all round-trip tag **names**
|
||||
(the edit form sends `tags.map(t => t.name).join(',')`) and re-resolve them on **every**
|
||||
save through `resolveTags → findOrCreate`. The old implementation resolved with
|
||||
`tagRepository.findByNameIgnoreCase(name)`, a derived query returning `Optional<Tag>`.
|
||||
|
||||
That signature encodes an invariant the data does **not** hold: that a name is globally
|
||||
unique case-insensitively. The canonical tag tree legitimately contains names that differ
|
||||
only by case — a parent container and its same-named lowercase **child** (`Geburt` /
|
||||
`Geburt/geburt`, `Weihnachten` / `Weihnachten/weihnachten`, …), or two siblings
|
||||
(`Reise/Reisepläne` / `Reise/reisepläne`). Each is a distinct node with its own
|
||||
`source_ref` (the stable identity, per ADR-025) and its own document attachments — **not** an
|
||||
accidental duplicate. When two rows matched case-insensitively, Hibernate threw
|
||||
`NonUniqueResultException` → `IncorrectResultSizeDataAccessException` → a generic HTTP 500.
|
||||
|
||||
The effect was severe and opaque: every document carrying one of ~10 colliding tags (≈180
|
||||
document-tag attachments on staging) became **un-editable** — any field change failed on save
|
||||
because the whole tag set is re-resolved — and the user saw only "an unexpected error", with
|
||||
no hint that a tag was the cause.
|
||||
|
||||
This is a **lookup** problem, not a data problem: the collisions are valid canonical nodes
|
||||
and must be preserved.
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. Resolution is exact-case first, then a non-throwing deterministic fallback
|
||||
|
||||
`findOrCreate` resolves in three ordered steps and never throws on a collision:
|
||||
|
||||
1. `findByName(cleanName)` — **exact-case** derived query. If present, return it. The edit
|
||||
round-trip replays the stored name verbatim, so the exact-case row is the right binding
|
||||
(typing the bare child name `weihnachten` binds to the child; `Weihnachten` binds to the
|
||||
parent container).
|
||||
2. else `findAllByNameIgnoreCase(cleanName)` — the **plural** case-insensitive list. If
|
||||
non-empty, return the element with the **lowest `id`** (`min(comparing(Tag::getId))`).
|
||||
3. else create the tag (an orphan: null `sourceRef`/`parentId`).
|
||||
|
||||
The two repository methods are deliberately **two distinctly-named methods** — Spring Data
|
||||
cannot disambiguate an `Optional<Tag>` from a `List<Tag>` derived query by return type alone.
|
||||
The throwing `Optional<Tag> findByNameIgnoreCase` is **deleted** so the non-unique-throwing
|
||||
call cannot be reintroduced; `findOrCreate` was its only production caller.
|
||||
|
||||
### 2. The tie-break is `id`, and it is load-bearing
|
||||
|
||||
`id` is a stable, always-present, unique column, so "lowest id" is a total, deterministic
|
||||
order over the candidates: the same name resolves to the same row on every call, forever,
|
||||
without throwing. This matters only in the free-text authoring path (step 2), where no
|
||||
exact-case row exists yet two case-folding rows do.
|
||||
|
||||
### 3. No `unique(lower(name))` constraint — and a load-bearing comment says so
|
||||
|
||||
A global case-insensitive uniqueness constraint is **wrong**: it would reject the legitimate
|
||||
parent/child canonical nodes. It would also **fail to apply** against the existing rows,
|
||||
turning a code-only deploy into a failed Flyway migration that blocks startup. A comment at
|
||||
both `findOrCreate` and the repository methods records this so the constraint is not "helpfully"
|
||||
added later.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Code-only, zero migration, fully reversible** (roll back the JAR). No tag data is touched;
|
||||
the colliding rows stay exactly as the canonical importer produced them.
|
||||
- One change fixes all three write paths — single-document edit, bulk-edit, and upload batch —
|
||||
because they all funnel through `resolveTags → findOrCreate`, which stays the single source
|
||||
of truth (resolution logic is **not** hoisted into `DocumentService`).
|
||||
- **Free-text tag semantics under collision are accepted as-is** (issue #730, option A): the
|
||||
bare word `weihnachten` binds to the deep child and `Weihnachten` to the parent container.
|
||||
Correct for the edit round-trip and acceptable for authoring; making the typeahead show the
|
||||
tree path so an author can tell a container from its same-named child is a separate
|
||||
follow-up.
|
||||
- The wire response stays opaque: after the fix this path no longer throws
|
||||
`IncorrectResultSizeDataAccessException`, and `GlobalExceptionHandler`'s generic handler maps
|
||||
any stray one to `INTERNAL_ERROR` with no Hibernate/SQL leak — so no dedicated handler was
|
||||
added.
|
||||
- **The sibling Person path is unfixed but tracked.** `PersonService.findOrCreateByAlias`
|
||||
(`findByAliasIgnoreCase`) and `findByFirstNameIgnoreCaseAndLastNameIgnoreCase` carry the same
|
||||
latent `Optional`-non-unique throw on user-influenced names; deferred to #731 rather than
|
||||
widened into this fix.
|
||||
- Postgres `LOWER()` folding of umlauts (`ü`/`ä`) is the actual correctness hinge of the
|
||||
fallback and cannot be proven by a mocked repo, so it is pinned by a Testcontainers
|
||||
`postgres:16-alpine` test on a `Glückwünsche`/`glückwünsche` pair; a plain-ASCII test would
|
||||
stay green while the bug reappeared for umlaut tags.
|
||||
|
||||
## Alternatives considered
|
||||
|
||||
- **A `unique(lower(name))` index** — rejected: the collisions are valid canonical nodes, and
|
||||
the migration would fail against the existing data and block startup.
|
||||
- **Merging/deduping the colliding tags** — rejected: each has a distinct `source_ref`, tree
|
||||
position, and real document attachments; they are not duplicates.
|
||||
- **Round-tripping tag IDs instead of names** so resolution can't be ambiguous at all — the
|
||||
cleaner long-term shape (removes the name-as-key smell), but a larger change with frontend
|
||||
surface; deferred to #732. The lookup fix here is the minimal correct unblock.
|
||||
- **Hoisting resolution into `DocumentService.resolveTags`** — rejected: it would duplicate the
|
||||
rule across the edit, bulk-edit, and import paths and let them drift; `findOrCreate` stays
|
||||
the one owner.
|
||||
Reference in New Issue
Block a user