Automatically sync document titles with date/location (save-time + one-time backfill) #726

Closed
opened 2026-06-04 15:23:20 +02:00 by marcel · 1 comment
Owner

Problem

Document titles are stored strings built once at import ({index} – {dateLabel} – {location}). When a date or location is later corrected in the edit UI, the title does not follow — it leaves a stale title behind (e.g. the title still shows 2028 after the date is fixed to 1928). The title input is resubmitted unchanged on save, so the corrected date never reaches the title.

Goal

  • Titles stay correct automatically going forward when date/location change.
  • The existing backlog of already-stale titles is cleaned up once.
  • A hand-written title (e.g. C-0029 – Brief an Mutter) is never overwritten.

Actor: Admin / archivist.

Non-goals

  • No change to how date/location are edited.
  • No UI for the one-time cleanup.
  • Bulk edit is out of scope. DocumentBulkEditDTO carries no documentDate, metaDatePrecision, or location (it touches documentLocation = meta_document_location, a different field). Bulk edit cannot make a title stale, so there is nothing to do there.
  • File-replaced documents are not auto-synced. The leading index is the document's originalFilename, and DocumentService.updateDocument reassigns originalFilename to the uploaded file's name on file-replace (doc.setOriginalFilename(newFile.getOriginalFilename())). After a replace it is no longer the catalog index (and no longer matches the import INDEX_PATTERN), so the title no longer matches the formula and is treated as manual — neither save-time nor backfill will rewrite it. This is accepted (decision: keep overloading originalFilename, no dedicated catalogIndex column). Fail-safe by design.

How titles are built today (reference)

DocumentImporter.buildTitle() (private static) composes the title from the following segments, in order, joined by " – ":

{index} – {dateLabel} – {location}
   │          │             │
   │          │             └─ meta_location, verbatim (omitted if blank)
   │          └─ DocumentTitleFormatter, HONEST precision:
   │               DAY "15. Januar 1942" · MONTH "Januar 1942" · YEAR "1942"
   │               SEASON "Frühling 1943" · APPROX "ca. 1943" · RANGE "15.–20. Jan 1942"
   │               UNKNOWN → date segment omitted
   └─ originalFilename, e.g. "C-0029" (always present)

documentDate, metaDatePrecision, metaDateEnd, metaDateRaw, location all live on the Document entity, so a title is rebuildable later without the import spreadsheet. The edit form already submits metaDatePrecision/metaDateEnd/metaDateRaw as hidden inputs (WhoWhenSection.svelte), so the regenerated label is honest at the new precision.
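The composition described above can be sketched as follows. This is only an illustration, not the project's code: the class name TitleCompositionSketch, the Precision enum and both method signatures are hypothetical, and only the DAY, MONTH, YEAR, APPROX and UNKNOWN labels are covered (SEASON and RANGE are left out).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class TitleCompositionSketch {

    // En dash with surrounding spaces, written as an escape so the source stays ASCII.
    static final String SEPARATOR = " \u2013 ";

    // Hypothetical subset of the precision values; SEASON and RANGE are not modelled.
    enum Precision { DAY, MONTH, YEAR, APPROX, UNKNOWN }

    private static final DateTimeFormatter DAY_FORMAT =
            DateTimeFormatter.ofPattern("d. MMMM yyyy", Locale.GERMAN);
    private static final DateTimeFormatter MONTH_FORMAT =
            DateTimeFormatter.ofPattern("MMMM yyyy", Locale.GERMAN);

    /** Returns the date label, or null when the date segment should be omitted. */
    static String dateLabel(LocalDate date, Precision precision) {
        if (date == null || precision == null || precision == Precision.UNKNOWN) {
            return null;
        }
        switch (precision) {
            case DAY:    return DAY_FORMAT.format(date);
            case MONTH:  return MONTH_FORMAT.format(date);
            case YEAR:   return String.valueOf(date.getYear());
            case APPROX: return "ca. " + date.getYear();
            default:     return null;
        }
    }

    /** Joins the non-blank segments in the fixed order index, date label, location. */
    static String compose(String index, String dateLabel, String location) {
        List<String> segments = new ArrayList<>();
        segments.add(index); // the index is always present
        if (dateLabel != null && !dateLabel.isBlank()) {
            segments.add(dateLabel);
        }
        if (location != null && !location.isBlank()) {
            segments.add(location);
        }
        return String.join(SEPARATOR, segments);
    }
}
```

Because blank segments are dropped before joining, a document without a date or location never ends up with a dangling separator.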


A. Single source of truth (FR-TITLE-001)

Extract the title composition ({index} – {dateLabel} – {location}) out of DocumentImporter.buildTitle() into one shared component in the document package — e.g. DocumentTitleFactory under org.raddatz.familienarchiv.document. DocumentImporter (package importing) then calls it. Direction matters: document owns the formula, importing consumes it — do not make importing internals public to satisfy document.

Move the formatter too. DocumentTitleFormatter is currently package-private in importing. Since the new factory lives in document and depends on the date-label formatting, move DocumentTitleFormatter into the document package alongside the factory. Do not expose it from importing (that would invert the dependency the wrong way).

Scope note: the German date label is already a deliberate Java/TS dual implementation locked by docs/date-label-fixtures.json (#666). FR-001 is about the title composition only — do not try to collapse the frontend formatDocumentDate.

B. Save-time regeneration — exact, no heuristic (FR-TITLE-002, primary mechanism)

In DocumentService.updateDocument only (bulk edit is out of scope — see non-goals):

  1. Before any setter runs, compute autoTitleBefore = titleFactory.build(doc) from the document's currently-persisted index/date/precision/location. This must happen at the top of the method — the current code overwrites title/date/location at lines 379–382, and applyDatePrecision skips null fields, so the old state must be captured first.
  2. Decide the title:
    • If the submitted title equals autoTitleBefore → it was the machine value → set title = build(...) from the new state.
    • Else → keep the submitted title verbatim (hand-written or freshly typed).
  3. Apply the date/location/precision changes as today.

This is an exact old-vs-new comparison — no false positives, no false negatives. It relies on the edit form round-tripping the stored title when untouched (DescriptionSection.svelte, value={titleValue} — confirmed). Guard: a blank submitted title must not be regenerated to blank (title is NOT NULL).

projectedState must mirror the existing setter asymmetry exactly. In updateDocument, setDocumentDate(dto.getDocumentDate()) and setLocation(dto.getLocation()) overwrite unconditionally (a null DTO value clears the field), but applyDatePrecision skips null fields (keeps the stored precision/end/raw). The state the regenerated title is built from must reflect both rules — date/location taken from the DTO (incl. null→cleared), precision/end/raw taken from the DTO only when non-null, else from the entity. A mismatch here silently produces a wrong label.

Reference shape:

Document doc = documentRepository.findById(id).orElseThrow(...);
String autoTitleBefore = titleFactory.build(doc);            // BEFORE any setter
doc.setTitle(resolveTitle(dto.getTitle(), autoTitleBefore, doc, dto));
doc.setDocumentDate(dto.getDocumentDate());
applyDatePrecision(doc, dto);
doc.setLocation(dto.getLocation());
// ... rest unchanged

private String resolveTitle(String submitted, String autoBefore, Document doc, DocumentUpdateDTO dto) {
    if (!Objects.equals(submitted, autoBefore)) return submitted;   // manual or freshly typed
    return titleFactory.build(projectedState(doc, dto));            // regenerate from new state
}
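The reference shape leaves projectedState undefined and does not show the blank-title guard. A minimal sketch of both follows, under stated assumptions: TitleState and UpdateRequest are hypothetical stand-ins for Document and DocumentUpdateDTO, the title builder is passed in as a function, the index is taken from the stored state (file-replace is ignored here), and a blank submission is treated like an untouched auto-title. That last choice is only one possible reading of the guard.

```java
import java.time.LocalDate;
import java.util.function.Function;

final class SaveTimeTitleSketch {

    /** Hypothetical snapshot of the fields the title is built from. */
    record TitleState(String index, LocalDate date, String precision,
                      LocalDate dateEnd, String dateRaw, String location) {}

    /** Hypothetical stand-in for the relevant part of DocumentUpdateDTO. */
    record UpdateRequest(String title, LocalDate date, String precision,
                         LocalDate dateEnd, String dateRaw, String location) {}

    /** Mirrors the setter asymmetry: date/location overwrite, precision/end/raw skip null. */
    static TitleState projectedState(TitleState stored, UpdateRequest dto) {
        return new TitleState(
                stored.index(),
                dto.date(),                                            // null clears
                dto.precision() != null ? dto.precision() : stored.precision(),
                dto.dateEnd() != null ? dto.dateEnd() : stored.dateEnd(),
                dto.dateRaw() != null ? dto.dateRaw() : stored.dateRaw(),
                dto.location());                                       // null clears
    }

    /** Exact old-vs-new comparison, with the blank guard added as an assumption. */
    static String resolveTitle(TitleState stored, UpdateRequest dto,
                               Function<TitleState, String> build) {
        String autoBefore = build.apply(stored);   // computed from the persisted state
        String submitted = dto.title();
        boolean blank = submitted == null || submitted.isBlank();
        if (!blank && !submitted.equals(autoBefore)) {
            return submitted;                      // manual or freshly typed
        }
        return build.apply(projectedState(stored, dto));
    }
}
```

Keeping both helpers free of repository and entity types makes them easy to cover with plain unit tests, in line with the "keep it off Spring" note in the test strategy.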

C. One-time backlog cleanup — heuristic (FR-TITLE-003 / FR-TITLE-004)

The pre-edit state is gone for already-stale rows, so the cleanup uses a grammar heuristic.

  • FR-TITLE-003: POST /api/admin/backfill-titles, under AdminController's class-level @RequirePermission(Permission.ADMIN), synchronous (the sweep is microseconds/row; matches backfill-versions/backfill-file-hashes), returns BackfillResult(count) where count = number updated. Iterates all documents; for each whose stored title passes the overwrite test, rebuilds it via titleFactory.build. Idempotent. No frontend UI: invoked once via curl / backend/api_tests/, called against the backend directly (port 8080) to bypass the SvelteKit proxy timeout.

    • Must NOT record a document_versions row per document. Save via documentRepository.save directly — follow the DocumentService.backfillFileHashes precedent, which does not call recordVersion. Never route the backfill through updateDocument (that path versions every write and would snapshot the whole corpus for a mechanical rename).
    • Emit one structured scanned/updated/skipped log line at completion (NFR-OBS-001), via SLF4J parameterized logging — never string-concatenate titles.
  • FR-TITLE-004 — Overwrite test (heuristic, used only by the cleanup). Prefer literal structural parsing: split the stored title on " – ", then test each segment. A title is overwritable iff it matches:

    1. {index} + a date segment matching the known formatter grammar, optional trailing location segment, or
    2. {index} + an optional location segment equal to the document's current location, or
    3. exactly {index}.

    Otherwise skipped. The {index} comparison must be literal — if any regex is used, Pattern.quote(index) it and anchor ^…$; originalFilename is user-controlled (set from an uploaded filename on file-replace, and no longer constrained by the import INDEX_PATTERN), so an unquoted pattern is a ReDoS / regex-injection vector (CWE-1333 / CWE-625). Avoid (...)*/(...)+ over title text. Fail closed: any malformed index/state → skip the document, never overwrite.

    {DATE_LABEL} = the formatter's emittable forms: YYYY, ca. YYYY, MMMM YYYY (German month), d. MMMM YYYY, {Frühling|Sommer|Herbst|Winter} YYYY, Datum unbekannt, and the range forms (ab d. MMM YYYY, d.–d. MMM YYYY, d. MMM – d. MMM YYYY, d. MMM YYYY – d. MMM YYYY).
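The overwrite test can be sketched roughly as below. This is a hedged illustration, not the shipped matcher: the class name BackfillMatcherSketch and the method isOverwritable are hypothetical, and the date-label pattern covers only a subset of the grammar (the range forms are omitted; a range label that itself contains " – " would need extra handling before splitting). The index is compared with plain string operations, so no regex is ever built from user-controlled text.

```java
import java.util.regex.Pattern;

final class BackfillMatcherSketch {

    // En dash with surrounding spaces, written as an escape so the source stays ASCII.
    static final String SEPARATOR = " \u2013 ";

    private static final String MONTH =
            "(?:Januar|Februar|M\u00e4rz|April|Mai|Juni|Juli|August|"
            + "September|Oktober|November|Dezember)";

    // Simplified subset of the date-label grammar; range forms are not covered.
    private static final Pattern DATE_LABEL = Pattern.compile(
            "\\d{4}"
            + "|ca\\. \\d{4}"
            + "|" + MONTH + " \\d{4}"
            + "|\\d{1,2}\\. " + MONTH + " \\d{4}"
            + "|(?:Fr\u00fchling|Sommer|Herbst|Winter) \\d{4}");

    static boolean isOverwritable(String title, String index, String currentLocation) {
        if (title == null || index == null || index.isBlank()) {
            return false;                               // fail closed on malformed state
        }
        if (title.equals(index)) {
            return true;                                // rule 3: exactly the index
        }
        if (!title.startsWith(index + SEPARATOR)) {
            return false;                               // literal prefix check, no regex
        }
        String rest = title.substring(index.length() + SEPARATOR.length());
        String[] segments = rest.split(Pattern.quote(SEPARATOR), -1);
        if (segments.length == 1) {
            return DATE_LABEL.matcher(segments[0]).matches()   // rule 1, no location
                    || segments[0].equals(currentLocation);    // rule 2
        }
        // Rule 1 with a trailing location segment; anything longer is skipped.
        return segments.length == 2 && DATE_LABEL.matcher(segments[0]).matches();
    }
}
```

Because matches() anchors the pattern to the whole segment and the alternatives contain no nested quantifiers, matching time stays linear in the segment length.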

D. Edit-form feedback (FR-TITLE-005)

The title input shows the old string while the user changes the date, then changes on its own after save — a surprise, especially for the 60+ audience. Add a helper line under the title input (DescriptionSection.svelte), associated via aria-describedby (the helper's id referenced from the <input>), localized de/en/es via Paraglide, e.g. „Wird automatisch aus Datum und Ort gebildet — sobald du den Titel änderst, bleibt deine Version erhalten.“ Keep it ≥12px (prefer 16px), not color-only, and verify the chosen token (e.g. text-ink-3) meets WCAG AA contrast (≥4.5:1) on bg-surface. (Live preview was considered and declined.)


Acceptance criteria

# Save-time (exact)
Scenario: Changing the date corrects the auto-title on save
  Given a stored document with title "C-0029 – 2028 – Berlin"
  When I change its date to 1928 in the edit form and save (title field untouched)
  Then the saved title is "C-0029 – 1928 – Berlin"

Scenario: A hand-written title survives a date change
  Given a stored document with title "C-0029 – Brief an Mutter"
  When I change its date and save
  Then the saved title is still "C-0029 – Brief an Mutter"

Scenario: Typing a new title in the same save wins
  Given a stored auto-title "C-0029 – 2028 – Berlin"
  When I change the date AND type "Geburtsanzeige" and save
  Then the saved title is "Geburtsanzeige"

Scenario: Date and location both changed
  Given a stored auto-title "C-0029 – 2028 – Berlin"
  When I change date to 1928 and location to "München" and save
  Then the saved title is "C-0029 – 1928 – München"

Scenario: Clearing the location drops the trailing segment
  Given a stored auto-title "C-0029 – 1928 – Berlin"
  When I clear the location and save (title field untouched)
  Then the saved title is "C-0029 – 1928"

Scenario: Regeneration uses the new date, not the old one
  Given a stored auto-title "C-0029 – 2028 – Berlin"
  When I change the date to 1928 and save
  Then the saved title does not contain "2028"

Scenario: Precision change relabels (YEAR to DAY)
  Given a stored auto-title "C-0029 – 1928" with YEAR precision
  When I set a full day date with DAY precision and save
  Then the saved title is "C-0029 – 15. Januar 1928"

Scenario: Adding a date to an UNKNOWN row populates the title
  Given a stored auto-title "C-0029" with UNKNOWN precision
  When I set a 1928 YEAR date and save
  Then the saved title is "C-0029 – 1928"

Scenario: SEASON and RANGE labels round-trip
  Given a stored auto-title built at SEASON/RANGE precision
  When I change the date keeping that precision and save
  Then the saved title carries the honest SEASON/RANGE label

Scenario: Blank submitted title is not regenerated to blank
  Given a stored auto-title and an empty submitted title
  When I save
  Then the title is not set to blank

Scenario: File-replaced document is treated as manual
  Given a document whose originalFilename was changed by a file-replace
  When I change its date and save
  Then its title is left unchanged

Scenario: Save-time idempotency
  Given a document saved with no change to date or location
  Then a manual title is left untouched and an auto title is unchanged

# One-time cleanup (heuristic)
Scenario: Backfill fixes an already-stale title
  Given a document already in the DB with title "C-0029 – 2028 – Berlin" and date 1928
  When an admin POSTs /api/admin/backfill-titles
  Then that title becomes "C-0029 – 1928 – Berlin" and count includes it

Scenario: Backfill skips prose
  Given a document with title "C-0029 – Brief an Mutter"
  When the backfill runs
  Then the title is unchanged

Scenario: Backfill is idempotent and does not version-spam
  Given the backfill has just run
  When it runs again immediately
  Then count == 0 and no document_versions rows were added by the backfill

Scenario: Index with regex metacharacters is matched literally
  Given a document whose originalFilename is "C-0029(.*).pdf"
  When the backfill runs
  Then matching terminates and the title is handled literally (no hang)

Scenario: Backfill permission
  Given a caller who is unauthenticated, or authenticated with only READ_ALL/WRITE_ALL
  When they POST /api/admin/backfill-titles
  Then it is rejected (401 unauthenticated / 403 non-admin)

Non-functional requirements

  • NFR-MAINT-001 — one shared title builder in the document package (FR-001); importer + save-time + backfill never diverge. Date-label Java/TS split (#666) stays as-is.
  • NFR-PERF-001 — save-time adds only a string build + compare; backfill is a single synchronous transactional sweep over a low-thousands corpus.
  • NFR-OBS-001 — backfill logs scanned/updated/skipped (SLF4J parameterized); count (= updated) is the response.
  • NFR-SEC-001 — backfill requires ADMIN; ordering/ReDoS guards per FR-004.
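The sweep and its counters might look like the sketch below. It is an in-memory illustration under assumptions: Doc, SweepStats and sweep are hypothetical names, the overwrite test and the title builder are passed in as functions, and a title that is overwritable but already correct counts as neither updated nor skipped. The real sweep would persist through documentRepository.save and log the counters via SLF4J.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class BackfillSweepSketch {

    /** Hypothetical mutable stand-in for the Document entity. */
    static final class Doc {
        final String index;
        String title;

        Doc(String index, String title) {
            this.index = index;
            this.title = title;
        }
    }

    /** Counters for the completion log line; updated doubles as the response count. */
    record SweepStats(int scanned, int updated, int skipped) {}

    static SweepStats sweep(List<Doc> docs, Predicate<Doc> overwritable,
                            Function<Doc, String> rebuild) {
        int updated = 0;
        int skipped = 0;
        for (Doc doc : docs) {
            if (!overwritable.test(doc)) {
                skipped++;                 // manual title: never touched
                continue;
            }
            String fresh = rebuild.apply(doc);
            if (!fresh.equals(doc.title)) {
                doc.title = fresh;         // the real sweep would save the entity here
                updated++;
            }
        }
        return new SweepStats(docs.size(), updated, skipped);
    }
}
```

Counting only titles that actually change is what makes a second run report zero updates.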

Test strategy

  • Unit (@ExtendWith(MockitoExtension.class), mocked repo): all save-time scenarios + the heuristic matcher in isolation (each DATE_LABEL form matches; prose skipped; regex-metacharacter index matched literally and terminates). The logic is pure — keep it off Spring.
  • Integration (Testcontainers postgres:16-alpine, never H2 — title is NOT NULL): POST /api/admin/backfill-titles fixes a stale row; idempotent (second run count == 0); skips prose; asserts no document_versions row added per doc; returns 401/403 for non-admin.
  • E2E: one Playwright pass (edit date → save → title updated on detail page). Keep permutations at the unit/integration layer.
  • i18n: assert the FR-005 helper key exists in all three locales (messages/{de,en,es}.json) — no missing-translation fallback.
  • Use factory builders makeDocument(title, date, precision, location).

Documentation

  • ADR in docs/adr/ for "document title is a shared document-package service + save-time exact-match regeneration".
  • docs/GLOSSARY.md entry for auto-generated title.
  • New backend service/endpoint → update the matching docs/architecture/c4/l3-backend-*.puml.
  • Add a backend/api_tests/ entry for POST /api/admin/backfill-titles hitting the backend directly on port 8080 (with the runbook note that it is a one-shot admin call).

Implementation notes

  • Move DocumentTitleFormatter into the document package; do not duplicate the German formatting or collapse the TS mirror.
  • Compute autoTitleBefore from the persisted entity before mutating it with the DTO; projectedState mirrors the date/location-overwrite vs precision-skip-null asymmetry.
  • Backfill follows the BackfillResult(count) shape and the backfillFileHashes save pattern (documentRepository.save, no recordVersion); never route it through updateDocument.
marcel added the P2-medium and feature labels 2026-06-04 15:23:25 +02:00
Author
Owner

Implemented on feat/issue-726-auto-title-sync

All acceptance criteria covered with red/green TDD; backend tests green (293 across the touched classes incl. a Testcontainers postgres:16 integration test), clean package builds, frontend lint clean.

What landed

A — Single source of truth (FR-TITLE-001)

  • New DocumentTitleFactory (@Component, document package) owns the {index} – {dateLabel} – {location} formula; DocumentImporter now consumes it.
  • DocumentTitleFormatter moved into document (package-private); its #666 fixture-parity test moved with it.

B — Save-time regeneration, exact match (FR-TITLE-002)

  • updateDocument captures autoTitleBefore from the persisted state before any setter, then resolveTitle + projectedState rebuild only when the submitted title still equals it. Hand-written/freshly-typed titles kept; blank never persisted; file-replaced docs fall through as manual (by design). projectedState mirrors the date/location-overwrite vs precision-skip-null asymmetry. All save-time gherkin scenarios are unit tests.

C — One-time backlog cleanup (FR-TITLE-003/004)

  • POST /api/admin/backfill-titles (synchronous, ADMIN-only) → BackfillResult(count).
  • DocumentTitleBackfillMatcher heuristic: literal startsWith index (no ReDoS/regex injection, CWE-1333), date-label forms derived from the same Locale.GERMAN formatters as the factory (no drift), prose left untouched, fail-closed. Each emittable date-label form unit-tested incl. the regex-metacharacter index case.
  • backfillTitles() saves via the repository directly (no recordVersion — no version-spam), idempotent, logs scanned/updated/skipped.
  • Testcontainers integration test: stale row fixed · idempotent (2nd run count==0) · prose skipped · no document_versions rows added. Permission 401/403 covered by the @WebMvcTest slice.

D — Edit-form feedback (FR-TITLE-005)

  • Helper line under the title input in DescriptionSection.svelte (single-edit only, via showTitleHelp), aria-describedby-wired, text-ink-3 (AA on bg-surface). New Paraglide key form_helper_title_autogenerated in de/en/es. Component test + one Playwright E2E (create auto-titled doc → edit date → title follows on detail page).

Docs — ADR-031, GLOSSARY.md auto-generated title, l3-backend-3b-document-management.puml, and a backend/api_tests/ runbook entry hitting port 8080 directly.

Commits

  • b1f77bc refactor(document): extract title composition into shared DocumentTitleFactory
  • e6ce000 feat(document): regenerate auto-title on save when date/location change
  • 26b45f1 feat(document): one-time backfill endpoint for stale auto-titles
  • 12db7b3 test(document): integration-test title backfill against real Postgres
  • 83e0afb feat(document): explain auto-generated title under the edit title field
  • cf457cb docs(document): ADR-031 + glossary/c4/api_tests

Operator note

After deploy, run the cleanup once: POST http://<backend>:8080/api/admin/backfill-titles (ADMIN, direct to 8080 to bypass the proxy timeout). Idempotent — a second run returns {"count": 0}.


Reference: marcel/familienarchiv#726