Dokumenttitel automatisch mit Datum/Ort synchronisieren (Save-time + Backfill) (#726) #727
Reference in New Issue
Block a user
Delete Branch "feat/issue-726-auto-title-sync"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #726.
Document titles were strings built once at import (
{index} – {dateLabel} – {location}) and never rebuilt, so correcting a date/location left a stale title behind. This makes titles follow date/location forward automatically, cleans the existing backlog once, and never overwrites a hand-written title.What changed
A — Single source of truth (FR-TITLE-001)
DocumentTitleFactory(@Component,documentpackage) owns the{index} – {dateLabel} – {location}formula;DocumentImporterconsumes it.DocumentTitleFormattermoved intodocument(package-private) with its #666 fixture-parity test.B — Save-time regeneration, exact match (FR-TITLE-002)
updateDocumentcapturesautoTitleBeforefrom the persisted state before any setter;resolveTitle+projectedStaterebuild only when the submitted title still equals it. Hand-written/freshly-typed titles kept, blank never persisted, file-replaced docs treated as manual.projectedStatemirrors the date/location-overwrite vs precision-skip-null asymmetry.C — One-time backlog cleanup (FR-TITLE-003/004)
POST /api/admin/backfill-titles(synchronous, ADMIN-only) →BackfillResult(count).DocumentTitleBackfillMatcher: literalstartsWithindex (no ReDoS / regex injection, CWE-1333), date-label forms derived from the sameLocale.GERMANformatters as the factory (no drift), prose left untouched, fail-closed.recordVersion→ no version-spam), idempotent, logsscanned/updated/skipped.D — Edit-form feedback (FR-TITLE-005)
DescriptionSection.svelte(single-edit only, viashowTitleHelp),aria-describedby-wired,text-ink-3(AA onbg-surface). New Paraglide keyform_helper_title_autogeneratedin de/en/es.Docs — ADR-031,
GLOSSARY.mdauto-generated title,l3-backend-3b-document-management.puml,backend/api_tests/runbook entry.Tests
postgres:16): stale row fixed · idempotent (2nd runcount==0) · prose skipped · nodocument_versionsrows added.DescriptionSectionhelper component test + one Playwright E2E (edit date → title follows on detail page).293 tests green across the touched classes;
clean packagebuilds; frontend lint clean.Operator note
Run the cleanup once after deploy:
POST http://<backend>:8080/api/admin/backfill-titles(ADMIN, direct to port 8080 to bypass the SvelteKit proxy timeout). Idempotent — a second run returns{"count": 0}.🤖 Generated with Claude Code
Move DocumentTitleFormatter from importing into the document package and introduce DocumentTitleFactory there as the single source of truth for the {index} – {dateLabel} – {location} formula. DocumentImporter now consumes the factory instead of owning the composition; the document package owns the rule, importing depends on it (not the reverse). No behavioral change — importer title assertions and the #666 fixture parity test stay green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>⚙️ Tobias Wendt — DevOps & Platform Engineer
Verdict: ⚠️ Approved with concerns
The operational story is genuinely good for a one-shot maintenance endpoint: it reuses the established
backfill-*precedent, is ADMIN-gated, idempotent, fail-closed, logsscanned/updated/skipped, and ships a runbook entry. The ReDoS guard (literalstartsWith,Pattern.quoted month tokens, bounded quantifiers) is exactly the kind of thing that bites you in prod, and it's handled. My concerns are about how the full-corpus sweep behaves under a transaction on the request thread, not about correctness.Blockers
None. This is shippable as a one-time operator action.
Suggestions
Whole-corpus load inside one transaction — watch the heap and the lock window.
DocumentService.backfillTitles()doesdocumentRepository.findAll()(line ~1072) and iterates with per-rowsave(), all inside a single@Transactional. For the current archive (low thousands of rows) this is fine, but it means: the entiredocumentstable is materialised into the persistence context at once, and a single long-lived transaction holds for the duration of the sweep. Two pragmatic asks: (a) confirm the realistic corpus size keeps the heap comfortable on the CX32 (8GB) target, and (b) if it grows, switch to a paged/streamed sweep or chunked transactions. Not a blocker today — just don't letfindAll()become the default pattern as the corpus grows. ThebackfillFileHashesprecedent at least filters (findByFileHashIsNull...); this one scans everything by design, so it's the heaviest of the three.Synchronous on the request thread + the port-8080 note. The runbook correctly says "hit port 8080 directly, NOT through the SvelteKit proxy, so the sweep can't trip the proxy timeout." Good catch — but that's a smell worth flagging: it means the endpoint's runtime can exceed the proxy timeout, and there's no upstream limit if the corpus grows. The async pattern already exists in this very controller (
runImportAsync,runBackfillAsync+ a status endpoint). For a one-shot today, synchronous is acceptable and simpler (KISS), and I'd rather not over-engineer it — but note the inconsistency: the two heaviest admin operations inAdminControllerare async with status polling, and the unbounded-by-design one is synchronous. If a real archive ever makes this run for minutes, it should move to the async+status pattern, not stay reliant on operators remembering the port-8080 workaround.Runbook lives in a REST Client
.httpfile — fine, but verify it reaches the operator. Thebackend/api_tests/Admin-Auth.httpentry is clear and well-commented (one-shot, idempotent, returns{count}, direct port 8080). My only worry is discoverability: a.httptest file isn't where someone reads a deploy procedure. Confirm this one-time step is also referenced in the actual deploy/release notes for #726 so it isn't missed post-deploy. An auto-title sync that's coded but never triggered leaves the backlog stale.No migration — correct, and worth stating. This is intentionally a data backfill via endpoint rather than a Flyway migration, which is the right call (the rebuild depends on application-layer formatters, not pure SQL). No action needed; noting it so the "where's the migration?" question is answered: there deliberately isn't one, and the idempotency makes a re-run safe if an operator is unsure whether it already ran.
Nice work on the fail-closed matcher and the version-spam avoidance — those are the two things that would have caused an incident, and both are handled.
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
I read the full diff (24 files, +1037/-33) and the touched files on disk. This is the cleanest PR I've reviewed in a while — the TDD evidence is right there in the structure, the naming carries the intent, and the security framing is honest rather than decorative.
TDD evidence
updateDocument_regeneratesAutoTitle_whenDateChanges,backfillTitles_neverRecordsVersions,index_with_regex_metacharacters_is_matched_literally_and_terminates). The save-time block covers the full matrix the persona asks for: happy path, hand-written kept, freshly-typed wins, location-cleared, precision change, blank-never-persisted, file-replaced-as-manual, idempotent. The empty/edge states for the matcher (null title, null/blank index, prose, location-only) are all present and fail-closed.@Spy DocumentTitleFactory = new DocumentTitleFactory()in the service test and the real factory inDocumentImporterTestis the right call — the single source of truth is exercised, not stubbed away, so a drift in the formula breaks the tests. Good.@SpringBootTest+PostgresContainerConfig) is correctly scoped to what H2 can't prove (NOT NULL title + version-row count), and the comment explicitly defers permission enforcement to the faster@WebMvcTestslice (AdminControllerTest: 401/403/200). That's the test-slice discipline I look for.@Timeout(5s)on the ReDoS test is exactly how you prove the literal-prefix claim instead of asserting it in prose.Clean code / naming / function size
DocumentTitleFactory.build,resolveTitle,projectedState,DocumentTitleBackfillMatcher.isOverwritable— every name answers "what does this represent?" without tracing usage.resolveTitleis a guard-clause ladder with the happy path last, under 10 lines.backfillTitlesorchestrates and stays readable.importinginto thedocumentpackage, package-privateDocumentTitleFormatter, factory as the only seam — the dependency direction (documentowns,importingconsumes) is correct and the dead privatebuildTitlewas deleted, not commented out.Objects.equalsimport is present (verified on disk); noOptional.get(), no raw exceptions,@Transactionalonly on the two write sweeps.backfillTitlessaving viadocumentRepository.savedirectly (norecordVersion) following thebackfillFileHashesprecedent is the right reliability call and is pinned bybackfillTitles_neverRecordsVersions+ the integrationaddsNoDocumentVersionRows.Svelte 5
DescriptionSectionuses$props()with a typed shape and$derived(titleValue) — no$effect-to-compute, no plain Map/Set, no unkeyed{#each}introduced. The helper is gated onshowTitleHelpand wired viaaria-describedby(conditionalundefinedwhen off). Verified bulk-edit passeshideTitleand nevershowTitleHelp(defaults false), and the component test pins that hiding the title suppresses the helper — so the helper is single-edit-only by construction, not by accident. i18n key added to all three locales.Blockers
None.
Suggestions (non-blocking, take or leave)
build(Document)takes the whole entity where the formula needs four fields. My persona's own rule is "pass specific props, not the entire object". Here the entity is the natural domain aggregate andprojectedStatebuilds a partialDocumentprecisely to feed it, so I'd keep it — but it does mean a synthetic/partialDocumentis now a valid input the factory must tolerate (hence theoriginalFilename == nullguard). That guard is load-bearing; a one-line test assertingbuild()on a bareDocument.builder().build()returns""rather than NPEs would lock that contract down. Minor.(?: – .+)?$trailing-location peel is greedy over a range label that itself contains–.range_with_internal_separator_plus_trailing_locationproves it works for the cross-month case, but the accepted FR-004 trade ({index} – {valid date label} – {anything}→ rewrite the trailing segment) means a hand-written suffix on a dated title can be dropped on backfill. The ADR documents this as accepted and it's a one-time sweep, so I'm not blocking — just flagging that this is the single place where the heuristic can lose human text. The operator runbook note (run once, idempotent, checkcount) is the right mitigation.projectedStatemirrorsupdateDocument's setter asymmetry by hand. The Javadoc says it's kept "lock-step" and a mismatch "silently produces a wrong label". That's a real coupling risk ifapplyDatePrecisionever changes. The season round-trip and precision-change tests cover the current behaviour; if this asymmetry grows, consider extracting the projection so the real setter path andprojectedStateshare one mapping. Today it's fine.Docs are complete (ADR-031, GLOSSARY, the L3 PUML, the api_tests runbook). Ship it.
🎨 Leonie Voss — UI/UX Design Lead & Accessibility Advocate
Verdict: ✅ Approved
I reviewed the FR-TITLE-005 helper line through the UX/accessibility lens: tone for the 60+ audience,
aria-describedbywiring, contrast, font size, i18n completeness/quality, and that it only surfaces where it should. The implementation is clean and considerate. No blockers. A few suggestions to make it a touch better for the senior researcher on a small phone in bright sunlight.Blockers
None.
Suggestions
1. Helper font size is at the absolute floor (12px) — bump to 14px for the senior audience.
frontend/src/lib/document/DescriptionSection.svelte:81renders the helper astext-xs(12px). That clears my hard minimum, but my body-text guidance for the 60+ cohort is 16px (14px acceptable for secondary/helper text). 12px helper copy on a phone in daylight is hard for this audience — and this is exactly the explanatory line that calms a user worried their title will be overwritten. I'd lift it totext-sm(14px). Note the sametext-xs text-ink-3pattern is already used for the Karton/Mappe helpers at lines 137 and 153, so changing only the title helper would create inconsistency — ideally bump all three together, or at minimum this one.2. Contrast — verified, passes, no change needed (logged for the record).
text-ink-3resolves to--c-ink-3: #6b7280(gray-500) on--c-surface: #ffffff, documented atfrontend/src/routes/layout.css:100as ~4.8:1 — WCAG AA for normal text ✓. Dark mode remaps to#8b97a5on#011526= 7.1:1 — AAA ✓ (layout.css:191/267). Both themes verified. The helper is plain prose, not a color-only status cue, so the "never color alone" rule is satisfied. Good.3.
aria-describedbywiring is correct — nicely done.DescriptionSection.svelte:77setsaria-describedby={showTitleHelp ? 'title-help' : undefined}and the<p id="title-help">(line 81) only renders under the sameshowTitleHelpguard. So the id reference and its target appear/disappear together — no danglingaria-describedbypointing at a missing node when the helper is off. Screen readers will announce the explanation right after the field label. Exactly the pattern I want.4. "Only shows where it should" — mostly correct, one place worth a conscious decision.
documents/[id]/edit) → shows. ✓ Intended.BulkDocumentEditLayout.svelte:445/496) → passeshideTitle, no title field at all, no helper. ✓documents/new/+page.svelte) → usesBulkDocumentEditLayout,hideTitle, no helper. ✓ Correct — there's no auto-title to explain before the doc exists.enrich/[id]/+page.svelte:16) → usesDocumentEditLayout, which hard-codesshowTitleHelp={true}(DocumentEditLayout.svelte:224). So the helper also appears in the enrich flow.That last one is not a bug — enrich is single-doc editing of an auto-titled document, so the message is true and helpful there too. I just want it to be a deliberate choice rather than incidental. If you ever want enrich to differ,
showTitleHelpis already a prop onDescriptionSection, but it's pinnedtrueinsideDocumentEditLayout, so enrich can't opt out without threading a new prop. Fine as-is; flagging so it's intentional.5. i18n — complete across all three locales, and the translations are good.
form_helper_title_autogeneratedis present in de/en/es (messages/{de,en,es}.json:59). Tone check:edites) matches the German register. ✓All three lead with what happens and the reassurance that a hand-written title survives — that's the right emphasis to prevent the "did my title just change?" anxiety, which is the whole UX point of FR-005. Well-judged copy.
6. No surprise title changes — UX confirmed.
The helper text explicitly promises the user's edit is preserved, and that matches the save-time logic (regenerate only while the title still equals the auto value). The interface is honest about the state change, which is exactly what I ask for. Nice alignment between copy and behavior.
Ship it. If you only do one thing, do suggestion #1 (14px helper text) — it's the one that most directly serves the people we most often forget.
🏛️ Markus Keller — Application Architect
Verdict: ✅ Approved
I reviewed this strictly through the architecture lens: layer boundaries, the
document↔importingdependency direction, where the title formula/factory/matcher live, single-source-of-truth, abstraction quality, and documentation currency.The headline structural decision is the right one. The title formula now lives in
DocumentTitleFactory(@Component,documentpackage) andimportingconsumes it — the dependency points from the leaf importer into the owning domain, never the reverse.DocumentImporterinjectsDocumentTitleFactoryand callsbuild(doc)inapplyFileMetadata(DocumentImporter.java:78, 230) rather than carrying its own privatebuildTitle().DocumentTitleFormattermoved alongside it as package-private (document/DocumentTitleFormatter.java:17). That is single-source-of-truth done properly: one formula, one date-label implementation, both halves of the #666 fixture-parity contract co-located in the domain that owns the concept. ADR-031 records the why, and it supersedes the old import-time-only behaviour clearly.What I checked and found sound
importing → documentonly. No new edge fromdocumentback intoimporting. The formatter relocation eliminates the previous split-brain where the formula lived in the wrong module.backfillTitles()andupdateDocument()stay insideDocumentServiceagainst its owndocumentRepository; person/tag access still routes through the respective services. No boundary leak introduced.resolveTitle/projectedState(DocumentService.java:471–501) for the edit path; a separate grammar heuristic (DocumentTitleBackfillMatcher) for the one-time sweep. Keeping the fragile heuristic out of the hot save path and confined to the backfill is the correct containment of complexity.projectedStateexplicitly mirrorsapplyDatePrecision's skip-null asymmetry, with a comment naming the drift risk.backfillTitles()saves viadocumentRepository.savedirectly and skipsrecordVersion, consistent with thebackfillFileHashesprecedent (DocumentService.java:1074–1096). A mechanical rename should not snapshot the corpus intodocument_versions; this is the right call and is tested.POST, no broker, no async job for a one-shot cleanup. Appropriate for the volume. The operator note documents the once-after-deploy intent.BackfillResultreuse — thedocument-package record reused byAdminController, andAdminControllerliving inuserwhile importingdocumenttypes, are both pre-existing patterns (introduced inaf2c983f), not regressions from this PR.GLOSSARY.mdauto-generated title entry,l3-backend-3b-document-management.puml, and theapi_testsrunbook entry. No schema/Flyway change, so no DB-diagram obligation. Nothing outstanding.Suggestions (non-blocking)
projectedStateis a structural tripwire by design — keep it honest.DocumentService.java:489rebuilds a shadowDocumentwhose field handling must stay lock-step with the real setters inupdateDocument. Today they match. The day someone adds a fourth date field to the edit path, this duplicate will silently drift and emit a wrong label. The comment flags it, but consider extracting a singleapplyDateFields(target, dto, fallback)helper that both the live update and the projection call, so the two cannot diverge. Cheaper than a future "why is the title wrong" bug hunt.File-replace title behaviour is subtle.
resolveTitleruns atDocumentService.java:390, before the file-replace block sets a neworiginalFilenameat:440. So a file-replaced document's title is still computed against the old index and not recomputed against the new one. The PR body calls this "treated as manual," which is a defensible product choice — just worth a one-line code comment at the resolve site so the ordering is intentional-by-record, not incidental.Neither blocks merge. Structurally this is exactly how I'd want it: the formula migrated into its owning module, consumers depend inward, the risky heuristic is quarantined to the backfill, and the decision is captured in an ADR. Clean work.
— @mkeller
🧪 Sara Holt — QA Engineer & Test Strategist
Verdict: ⚠️ Approved with concerns
This is a genuinely well-built test suite — the pyramid is right (fast Mockito unit layer carries the permutations, one Testcontainers integration class pins the real-Postgres behaviour, a single E2E covers the journey), and the standout decisions all land: a
@SpyrealDocumentTitleFactoryso the save-time tests exercise the actual composition instead of a stub, an explicit@Timeout(5s)ReDoS regression fence on the regex-metacharacter index, the NOT-NULL-title rationale for choosing Postgres over H2 written into the Javadoc, and a dedicatedbackfill_addsNoDocumentVersionRowsassertion that proves the version-spam guarantee both at the unit layer (verify(documentVersionService, never())) and against the realdocument_versionstable. The acceptance scenarios from #726 (FR-001..005) each have a home. My concerns are coverage gaps, not defects in what exists.Blockers
None. Nothing here is wrong or flaky enough to block merge.
Suggestions
Save-time path never asserts the no-version / audit behaviour for an auto-title rewrite.
updateDocumentalways callsrecordVersion+ an audit event — correct for an interactive edit — but the title-sync tests (DocumentServiceTestlines 275-406) only assertstored.getTitle(). The backfill path has explicitverify(documentVersionService, never())coverage; the save path has no symmetric assertion that a title-only regeneration still emits exactly oneMETADATA_UPDATED(notSTATUS_CHANGED). Oneverify(auditService).logAfterCommit(eq(AuditKind.METADATA_UPDATED), ...)would lock that in.metaDateRaw/metaDateEndcarry-over asymmetry inprojectedStateis under-tested.projectedState(DocumentService.java:489) keeps storedmetaDateRaw/metaDateEndwhen the DTO omits them — this is load-bearing for SEASON and RANGE labels. OnlyupdateDocument_roundTripsSeasonLabelexercises it, and it passes raw explicitly in the DTO (line 367), so the "DTO omits raw, stored raw is reused" branch is never hit. Add a test: stored SEASON title, edit submitsmetaDateRaw=null, assert the season label survives. Same for a RANGE title where the DTO omitsmetaDateEnd. This is exactly the kind of setter-asymmetry the Javadoc warns "would silently produce a wrong label."Backfill matcher has no negative test for a near-miss date label.
DocumentTitleBackfillMatcherTestproves every valid form is overwritable and that prose is skipped, but there's no test that a plausible-but-invalid date-shaped segment is rejected — e.g.C-0029 – 13. Foo 1916orC-0029 – 1916er Jahre. Without it, a regression that loosens the alternation (matching arbitrary prose containing a year) would pass silently. OneisFalse()case closes that.Integration test asserts count
isGreaterThanOrEqualTo(1)instead of an exact value (DocumentTitleBackfillIntegrationTestline 55). Because it's@Transactional-isolated and persists exactly one stale row, the count is deterministically1.>=1would mask a bug where the sweep also rewrites an unrelated row. Tighten toisEqualTo(1).E2E asserts the heading matches
/E2E Auto-Titel Sync.*1928/but never asserts the old value is gone. A regex.*1928would still pass if the title became…Sync 2028 1928. A follow-upnot.toContainText-style check, or asserting the exact rebuilt string, makes the "title follows the date" guarantee airtight. Minor, since the unit layer already hasregeneratedTitle_doesNotContainOldDate.Frontend helper test asserts presence +
aria-describedbywiring but not that the text is the localised Paraglide string.DescriptionSection.svelte.spec.ts:64only checkstextContent.length > 0. Asserting it rendersm.form_helper_title_autogenerated()(or at least a stable substring) would catch an accidental rewire to the wrong message key — cheap, and you added the key in three locales.No test for the blank-index / null-
originalFilenamebackfill row at the integration layer. The matcher unit test coversnull/blank index fail-closed, but production rows have a NOT-NULLoriginalFilename; conversely a synthetic/legacy row with an empty filename would route throughbuild()→StringBuilder(""). Not a blocker (unit layer covers the matcher contract), just noting the integration suite only exercises happy rows.Net: the layering, infra choice, and idempotency/version-spam assertions are exactly what I'd want. Items 1-2 are the ones I'd most like addressed before merge because they guard the two save-time invariants (audit kind, raw/end carry-over) that currently have no direct assertion.
📋 Elicit — Requirements Engineer / Business Analyst
Verdict: ✅ Approved
I reviewed PR #727 strictly against issue #726: every FR, every NFR, the non-goals, and each Gherkin scenario traced to a test. The traceability is unusually clean — this is what a "spec-dense issue → implementation" round-trip should look like. No blockers. A handful of small traceability/wording notes that do not gate merge.
Requirements coverage
Functional requirements
DocumentTitleFactoryindocument, importer consumes it;DocumentTitleFormattermoved intodocumentdocument/DocumentTitleFactory.java(@Component),DocumentImporterconsumes it; the diff'srename from importing/… to document/DocumentTitleFormatter.javaconfirms the move (dependency direction correct —documentowns,importingconsumes).updateDocumentonlyautoTitleBeforecaptured before any setter;resolveTitle+projectedState; bulk edit untouched.POST /api/admin/backfill-titles, ADMIN, synchronous,BackfillResult(count), directrepository.save, no version-spam, structured logAdminController.backfillTitles,DocumentService.backfillTitles(direct save, parameterizedscanned/updated/skippedlog).DocumentTitleBackfillMatcher—String.startsWithfor the index (never compiled),Pattern.quoteon month tokens, bounded quantifiers, returnsfalseon any null/blank/structural surprise.aria-describedby, de/en/es, AA contrastDescriptionSection.svelte(showTitleHelp,#title-help,text-ink-3);form_helper_title_autogeneratedpresent in all three locale files; wired fromDocumentEditLayoutsingle-edit only.Non-goals — all respected
showTitleHelpdefaultsfalse, helper omitted whenhideTitle(testomits the helper when the title field is hidden (bulk edit)); save-time logic lives only inupdateDocument. ✅updateDocument_treatsFileReplacedDoc_asManualasserts the title is left unchanged. ✅api_tests/runbook hits port 8080 directly. ✅formatDocumentDate/ #666 date-label split not collapsed. ✅NFRs
DocumentTitleFactory; matcher derives month tokens from the sameLocale.GERMANformatters, so accepted spellings cannot drift from emitted ones.scanned/updated/skipped, never concatenates titles.@RequirePermission(ADMIN); 401/403 slice tests; literal index defuses CWE-1333/625.Gherkin → test traceability
All 12 save-time + 5 backfill scenarios trace to a test, with one wording caveat:
DocumentServiceTest.document_versionsrows (backfill_addsNoDocumentVersionRowsassertscountunchanged), regex-metacharacter index (@Timeout5s), permission 401/403/200 inAdminControllerTest.Suggestions (non-blocking)
Partial trace on the "SEASON and RANGE labels round-trip" save-time scenario. The Gherkin names both precisions, but the save-time suite only has
updateDocument_roundTripsSeasonLabel— there is no RANGE assertion through theprojectedStatepath. RANGE composition is covered at the factory/formatter and backfill-matcher layers, so the risk is low, but to fully discharge the scenario as written, add a save-time case that edits a RANGE-precision date and asserts the honest range label survives. This is the only scenario without a 1:1 save-time test.projectedStateis a hand-mirrored copy ofupdateDocument's setter asymmetry (date/location overwrite-with-null vs precision skip-null). The issue explicitly flagged this as the silent-wrong-label trap, and the code comments call it out well — but the two code paths can still drift independently over time. Consider a single guard test that pins the asymmetry (e.g. "null precision in the DTO keeps the stored precision in the regenerated title") so a future refactor ofapplyDatePrecisioncan't desyncprojectedStateunnoticed. (updateDocument_dropsTrailingLocationSegmentcovers the null-location half; the null-precision half is implied but not isolated.)Matcher branch 3 (
{index} – {location}with no date) requiresbody.equals(location)exactly. Correct and fail-safe as specified, but worth a one-line note in ADR-031: a stale location-only title (location was changed after the title was built, no date) will be skipped by the backfill rather than corrected, because the stored segment no longer equals the current location. Save-time still fixes it on the next edit; just make the accepted residue explicit so it isn't later mistaken for a bug.Helper-line copy is good and testable. The de string ("…sobald du den Titel änderst, bleibt deine Version erhalten") matches the issue's suggested wording and reads well for the 60+ audience. No change requested — noting that the component test asserts presence + non-empty +
aria-describedbywiring, which is the right level; the actual AA contrast oftext-ink-3onbg-surfaceis asserted by claim, not by an automated check (acceptable for this stack).Clean, traceable, non-goals honored. Ship it; fold suggestion (1) in if you want a fully discharged scenario set.
🔒 Nora "NullX" Steiner — Application Security Engineer
Verdict: ✅ Approved
I came at this PR with the adversarial assumption that the backfill matcher is the soft underbelly — it ingests two user-controlled strings (
titleandoriginalFilename, both of which an editor or the file-replace path can shape) and feeds the residue into a compiled regex. I tried to break it. I could not. The threat model named in the PR body is the right one, and the implementation actually defends against it. Below is what I verified, including the things I attempted to exploit and failed.What I verified (no blockers found)
1. Regex-injection / ReDoS on the user-controlled index — CWE-1333 / CWE-625: not exploitable.
DocumentTitleBackfillMatcher.isOverwritable(matcher lines 70–91) compares the index withString.startsWith(line 74) and the location withString.equals(line 90) — both literal. The index is never compiled into a pattern, so a filename likeC-0029(.*).pdforC-0029$|^.*is inert. The regression testindex_with_regex_metacharacters_is_matched_literally_and_terminates(matcher test 140–150,@Timeout(5s)) pins this. Good — this is exactly the test I would have asked for.2. The compiled date-label regex — no catastrophic backtracking.
DATE_LABEL_WITH_OPTIONAL_LOCATION(matcher lines 52–65) is the only attacker-reachable compiled pattern (thebodyit runs against is derived from a stored title). I extracted it and ran it standalone against inputs specifically engineered to exploit the overlap between the internal–alternatives and the greedy trailing(?: – .+)?, plus deeply-nested separator runs:"30. Jan." + " – 2. Feb.".repeat(8000) + " 1917 "(80 014 chars)("9. Jan. 1917 – ").repeat(8000) + "ZZZ""1917" + " – ".repeat(8000)"1".repeat(50000)All matched/failed in 0–1 ms, scaling linearly to 120 k chars. The reason it's safe: every prefix alternative is a fixed-structure, anchored token sequence; the quantifiers are bounded (
\d{1,4},\d{1,2}) and non-nested; and the single trailing.+has no ambiguous repetition wrapped around it. The PR's "linear, no catastrophic backtracking" claim is accurate, and I confirmed it empirically rather than on faith.3. Month-token derivation — no drift, no smuggled metacharacters.
monthAlternation(matcher 93–100) derivesJan.…Dez./Januar…Dezemberfrom the sameLocale.GERMANformatters the factory uses, then wraps each viaPattern.quote(\Q…\E). I dumped the actual tokens: the.in abbreviations is literal, never a wildcard. Sharing the formatter source between matcher andDocumentTitleFactoryis the correct way to keep an allowlist from rotting.4. Authorization on the new endpoint — fail-closed.
AdminControllercarries class-level@RequirePermission(Permission.ADMIN)(AdminController 21), which thePermissionAspectenforces overbackfillTitles()(54–58). Type-safe enum, not a magic string — a typo would not silently open the route.AdminControllerTestcovers all three boundaries: 401 unauthenticated (138–141), 403 for a non-ADMIN authenticated user (143–148), 200 for ADMIN (150–158). This is the 401-vs-403 distinction I always want to see tested separately, and it's here.5. CSRF — covered structurally.
SecurityConfig.securityFilterChain(97–99) appliesCookieCsrfTokenRepositoryto all state-changing requests, so the newPOST /api/admin/backfill-titlesrequires a matchingX-XSRF-TOKEN; a CSRF failure returns 403CSRF_TOKEN_MISSING(SecurityConfig 129–135). The tests exercise the route.with(csrf()), consistent with the existing backfill endpoints.6. No sensitive data in logs.
backfillTitleslogs only aggregate counters via parameterized SLF4J:log.info("Title backfill complete: scanned={} updated={} skipped={}", …)(DocumentService 1094). No title is string-concatenated into a log line, so there's no log-injection / Log4Shell-style sink and no PII leak of document titles into the log store.7. Fail-closed behavior.
The matcher returns
falseon any null/blank index or structural surprise (matcher 71–83), and the integration testbackfill_skipsProse(integration test 70–79) confirms prose is left untouched. Idempotency (backfill_isIdempotent_secondRunChangesNothing) and the "nodocument_versionsrows" guarantee (backfill_addsNoDocumentVersionRows) are both pinned against real Postgres — H2 would have hidden the NOT-NULLtitlesemantics, so using Testcontainers here was the right call.Suggestions (non-blocking)
backfillTitles()loads the whole corpus in one transaction.documentRepository.findAll()(DocumentService 1076) pulls everyDocumentinto memory inside a single@Transactional. At the documented ~1500-doc archive scale this is a non-issue, but it's an unbounded-allocation pattern: if the corpus ever grows (or an operator triggers it on a much larger future dataset), this is a memory-pressure / availability footgun. Consider paging/streaming the scan if the archive is expected to grow past ~50k. Worth a one-line code comment noting the bound is intentional so a future reader doesn't mistake it for an oversight.{index} – {valid German date label}will be treated as auto and rebuilt by the backfill. This is the documented FR-004 trade-off and the design accepts it (a valid date label is taken as strong evidence of a machine title), so I'm not flagging it as a defect — just calling it out explicitly so it's a known accepted risk rather than an emergent surprise. The save-time path correctly avoids this by using exact old-vs-new comparison instead of the heuristic (DocumentService 471–479), which is the safer design where it matters most.Nice work — the security-sensitive surface is small, literal where it must be, allowlisted where it can't be, and backed by the exact regression tests I'd have written. The threat-model comments in
DocumentTitleBackfillMatcher(matcher 27–33) explain whystartsWithis used instead of a pattern, which is precisely the kind of comment that survives a future "let me just refactor this into a regex" change. 🔒Review concerns addressed — pushed
7316c51…0693cfdThanks all. No blockers were raised; here's how the concerns landed.
Fixed
projectedStatedrift (Markus, Felix, Sara) — extractedeffectivePrecision/effectiveMetaDateEnd/effectiveMetaDateRaw, now the single skip-null resolution used by bothapplyDatePrecision(the real setters) andprojectedState. They can no longer diverge; behaviour unchanged (185 existingDocumentServiceTestcases stay green).7316c51originalFilenamereassignment site.7316c51""guard, four backfill-matcher negative near-misses (ASCII hyphen vs en dash, missing separator, year-with-trailing-letters, index-without-separator), and tightened the integration count assertion to exactly1.f656f7ctext-sm) for the 60+ audience and the component test now asserts the actual localized string + the size class.0693cfdTouched-class suite green (DocumentServiceTest 187, factory 8, matcher 26, integration 4); the
DescriptionSectionbrowser spec passes; lint clean.Deferred (with rationale)
findAll()in one transaction (Tobias, Nora) — kept as-is. The issue mandated a synchronous sweep matching the existingbackfill-versions/backfill-file-hashesprecedent, the corpus is bounded (~1.5k docs), and all three backfills share this pattern. Changing it now would diverge from spec + precedent for no current benefit; it's the right follow-up if the corpus ever approaches the #481 50k threshold.api_testsis the documented mitigation for the one-shot call.api_tests/Admin-Auth.http, ADR-031, and the issue completion comment.