# ADR-031 — The document title is a shared `document`-package factory, re-synced by an exact match on save and a grammar heuristic on a one-time backfill **Date:** 2026-06-04 **Status:** Accepted **Issue:** #726 (auto-sync document titles with date/location: save-time + one-time backfill) **Milestone:** — --- ## Context A document title was a string built **once**, at import time, by a private `DocumentImporter.buildTitle()` composing `{index} – {dateLabel} – {location}` (index = `originalFilename`, date label honest at the row's precision via `DocumentTitleFormatter`, location verbatim). Nothing rebuilt it afterwards. When an archivist later corrected a date or location in the edit form, the title kept its stale value (e.g. it still read `2028` after the date was fixed to `1928`), because the edit form round-trips the stored title verbatim and `updateDocument` simply re-persisted it. Two distinct problems live here: 1. **Going forward**, an edit to date/location must flow into a title that was machine-built — but must never overwrite a title a human wrote. 2. **The existing backlog** of already-stale titles must be cleaned once. For these rows the pre-edit state is gone, so there is no exact value to compare against. The composition formula also existed only inside `importing`, which is the wrong owner: a title is a `document` concern, and three call sites (import, save-time, backfill) must share one rule or they will drift. ## Decision ### 1. One formula, owned by the `document` package (`DocumentTitleFactory`) Extract the composition into `DocumentTitleFactory` (a `@Component` in the `document` package) with `build(Document)`. `DocumentImporter` (package `importing`) now consumes it. `DocumentTitleFormatter` moves into `document` alongside the factory (it stays package-private; `importing` reaches the formula only through the factory). The direction is deliberate: `document` owns the rule, `importing` depends on it — not the reverse. The German date *label* remains the deliberate Java/TS dual implementation pinned by `docs/date-label-fixtures.json` (#666); this ADR touches the **composition** only and does not collapse the frontend `formatDocumentDate`. ### 2. Save-time regeneration is an EXACT match, not a heuristic In `DocumentService.updateDocument` only (bulk edit is out of scope), capture `autoTitleBefore = titleFactory.build(doc)` from the **currently-persisted** state *before* any setter runs. Then: - if the **submitted** title equals `autoTitleBefore`, it was the machine value → rebuild from the new state; - otherwise keep the submitted title verbatim (hand-written or freshly typed). This is an exact old-vs-new comparison — no false positives, no false negatives — relying on the edit form round-tripping an untouched title verbatim. `projectedState` mirrors the existing setter asymmetry exactly: `documentDate`/`location` overwrite unconditionally (a null clears them), while precision/end/raw are taken from the DTO only when non-null and otherwise kept from the entity. A blank submission is never persisted (the title is always present) — it falls back to the rebuilt auto-title, which always carries at least the index. ### 3. The one-time backlog cleanup is a grammar heuristic, behind an ADMIN endpoint `POST /api/admin/backfill-titles` (synchronous, under `AdminController`'s class-level `@RequirePermission(Permission.ADMIN)`) sweeps every document and, for each whose stored title passes the overwrite test, rebuilds it via the factory. Because the pre-edit state is gone, the test (`DocumentTitleBackfillMatcher`, used **only** here) is a grammar heuristic: after stripping the **literal** index prefix, the remainder must be exactly the index, a known date-label form (+ an optional trailing location), or a lone segment equal to the document's current location. Prose is left untouched; anything malformed fails closed. The backfill saves via `documentRepository.save` directly and **never** routes through `updateDocument` — following the `backfillFileHashes` precedent — so a mechanical rename does not snapshot the whole corpus into `document_versions`. It is idempotent (a second run rewrites nothing) and logs one SLF4J-parameterized `scanned/updated/skipped` line; the response is `BackfillResult(count)`. ### 4. Edit-form feedback (FR-005) A localized helper line (de/en/es) under the title input explains that the title is built from date/place and that a hand-edit is preserved, wired via `aria-describedby` and shown only on the single-document edit form. A live preview was considered and declined. ## Consequences - The three call sites can never diverge — there is exactly one formula (`NFR-MAINT-001`). Save-time cost is a string build + compare; the backfill is one synchronous transactional sweep over a low-thousands corpus. - Security: the index is compared **literally** (`String.startsWith` / `Pattern.quote`) because `originalFilename` is user-controlled and may carry regex metacharacters — an unquoted pattern would be a ReDoS / regex-injection vector (CWE-1333 / CWE-625). The date-label sub-patterns use only bounded, non-nested quantifiers. - **File-replaced documents are treated as manual, by design.** The index is `originalFilename`, which `updateDocument` reassigns to the uploaded file's name on a file-replace. After a replace the stored title no longer matches `build(currentState)`, so neither save-time nor backfill rewrites it. This is the accepted fail-safe of overloading `originalFilename` rather than adding a dedicated `catalogIndex` column. - The save-time heuristic risk is zero (exact match); the backfill heuristic can, by its documented FR-004 rule, treat `{index} – {valid date label} – {anything}` as machine-built and rewrite the trailing segment. This is the accepted trade for cleaning the backlog without the lost pre-edit state. ## Alternatives considered - **A dedicated `catalogIndex` column** instead of overloading `originalFilename` — rejected; it adds a migration and a second source of truth for the index for no current benefit, and the file-replace fail-safe is acceptable. - **A heuristic at save-time too** (instead of the exact match) — rejected; the stored title is available pre-edit, so an exact comparison is strictly better (no false positives). - **A live title preview in the edit form** — rejected (FR-005); a static helper line is calmer for the 60+ audience and avoids a second client-side mirror of the formula. - **Collapsing the frontend `formatDocumentDate` into the backend** — out of scope; the Java/TS date-label split is the deliberate #666 design, pinned by a shared fixture.