Import normalizer: offline tool to normalize the raw archive spreadsheets #663
Open
marcel
wants to merge 172 commits from
docs/import-migration into main
pull from: docs/import-migration
merge into: marcel:main
marcel:main
marcel:feat/issue-560-vimock-migration
marcel:feat/issue-286-notification-bell-form-actions
marcel:feat/issue-580-sentry-backend
marcel:fix/issue-593-management-port-zero
marcel:worktree-feat+issue-557-upload-artifact-v3-pin
marcel:worktree-chore+issue-556-drop-client-branches-coverage-gate
marcel:fix/issue-514-prerender-crawl-bakes-redirects
marcel:fix/issue-472-prerender-entries
marcel:feat/issue-395-readme
marcel:feat/issue-345-bulk-mark-reviewed
marcel:feat/issue-344-bell-tooltip
marcel:feat/issue-341-annotieren-contrast
marcel:feat/issue-225-bulk-metadata-edit
marcel:feat/issue-317-bulk-upload
marcel:feat/issue-271-dashboard-redesign
marcel:docs/issue-240-mission-control-spec
marcel:refactor/issues-193-200
marcel:feat/issue-162-korrespondenz-redesign
marcel:feature/68-new-document-file-first
marcel:feat/81-discussion-discoverability
marcel:feat/62-document-bottom-panel
marcel:feature/56-backfill-file-hashes
marcel:feat/38-document-edit-history
marcel:fix/svelte5-test-delegation-and-login-auth
No Reviewers
Labels
Clear labels
P0-critical
P1-high
P2-medium
P3-later
audit
bug
cleanup
collaboration
conversation
descoped
devops
documentation
epic
feature
file-upload
legibility
needs-discussion
notification
person
phase-1: security
phase-2: container-images
phase-3: prod-compose
phase-4: spring-prod-profile
phase-5: backups
phase-6: deployment-docs
phase-7: monitoring
refactor
security
test
ui
user
Blocks a core user journey, causes data loss, or is a security/accessibility barrier. Work on this before P1+.
Significant friction on a main user journey; workaround exists but is non-obvious. Next up after P0.
Noticeable improvement; doesn't block anything; schedule opportunistically.
Cosmetic or parking-lot work; fine to stay open indefinitely.
Read-only audit / assessment work; produces a report rather than changing code
Something isn't working
Removal of dead code, vague comments, naming violations, and scratch artifacts
We want to extend the family archive to have more collaboration tools
We will do that later
README, ARCHITECTURE, GLOSSARY, CONTRIBUTING, per-domain docs
Marker for epic-level issues that group multiple child issues
Codebase Legibility Refactor — preparing the codebase for human evaluation and bus-factor reduction
Has an open decision or design question that must be resolved before implementation can start.
Security hygiene — must be done first
Production-ready multi-stage Docker images
Production compose overlay + Caddy reverse proxy
Spring Boot production configuration profile
Database and object storage backup strategy
.env.example, DEPLOYMENT.md, runbook
Prometheus, Loki, Grafana, Alertmanager
Code restructuring without behaviour change
UI/UX and accessibility issues
No Label
Milestone
No items
No Milestone
Projects
Clear projects
No project
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: marcel/familienarchiv#663
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.
Delete Branch "docs/import-migration"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Adds the import-migration analysis docs and a standalone offline tool,
tools/import-normalizer/, that transforms the raw family-archive spreadsheets into a clean, canonical dataset before import — the agreed "normalize first" strategy.Docs (
docs/import-migration/):01-findings-spreadsheet-analysis.md— data-quality findings (IMP-01..12)02-normalization-spec.md— requirements spec (FR/NFR, data dictionary, traceability)03-normalizer-implementation-plan.md— 17-task TDD planWORKLOG.md— session log incl. the real dry-run resultsTool (
tools/import-normalizer/, Python 3.12 + openpyxl):APPROX/UNKNOWNprecision — original string always preserved.Real dry-run (against the actual archive)
Bundled backend fix:
family_memberflag on imported relationshipsExercising the canonical importer end-to-end against this normalizer's outputs surfaced a latent backend bug:
PersonRegisterImportercreates persons withfamily_member=falsefirst, and the subsequentPersonTreeImporterupsert goes throughmergeCanonical, which never propagatesfamily_member. Net effect: every imported person with relationships was flaggedfamily_member=falseand never appeared in/api/personsfamily filters or the family-network view.Fix lives in
RelationshipService(the documented owner of thefamily_memberflag).addRelationship()now setsfamily_member=trueon both endpoints whenever the relation type isPARENT_OF/SPOUSE_OF/SIBLING_OF— extracted as aFAMILY_RELATION_TYPESconstant and reused ingetFamilyNetwork()for a single source of truth. Non-family types (FRIEND,COLLEAGUE,EMPLOYER,DOCTOR,NEIGHBOR,OTHER) intentionally leave the flag alone — a family doctor isn't a family member.Bundled here because the bug only surfaces when running the canonical import this PR enables.
Test plan
cd tools/import-normalizer && .venv/bin/python -m pytest tests/ -q→ 57+ passing (incl. new self-reference + category-collapse guards).venv/bin/python normalize.pyruns clean againstimport/and writesout/+review/out/canonical-documents.xlsxfirst rows (dates + resolved person ids)cd backend && ./mvnw test -Dtest=RelationshipServiceTest→ 11 green (incl. 2 new family-flag tests)PARENT_OF/SPOUSE_OF/SIBLING_OFedge appear withfamily_member=true🤖 Generated with Claude Code
🏛️ Markus Keller — Senior Application Architect
Verdict: ⚠️ Approved with concerns
I reviewed this as what it is: a standalone, offline transform tool — not a deployed service, no DB, no runtime coupling to the monolith. Judged on that frame, the structure is sound.
What's good
ingest/dates/persons/documents/writers/overrides/configeach own one concern and expose a small surface. A newcomer can navigate by filename alone. This is exactly the boundary discipline I push for in the backend.config.py(header maps, century boundaries, feast/season tables, fuzzy threshold) — no magic numbers buried in logic (NFR-MAINT-01 honored). Easter computus is the one algorithm that escapes the data/logic split, and it correctly lives indates.pywith the table beside it in config.ingest.build_header_map) over fixed indices is the right call — it absorbs the "spreadsheet re-exported with shifted columns" risk (R3) at the lowest layer. Fail-loud on a missing required header, warn-and-continue on unknown headers: correct asymmetry._FIXED_TS), ordered (not set-based) alias forms, deterministic sorts in the review writers. The end-to-end test asserts a second run is byte-stable on content. Good.docs/import-migration/carries the spec, findings, plan, and an ADR-style scope boundary. The "this is the contract Phase 2 consumes" framing is the right seam between this tool and the futureMassImportServicework.Blockers
None. This tool touches no migration, no entity, no controller/service, no Docker service, no route — so none of my doc-currency triggers fire. I checked the architecture-doc table item-by-item; nothing in
docs/architecture/**needs to move for this PR.Suggestions
frontend/src/routes/page.server.spec.ts(+45) comes from commita1035171"fix(reader-dashboard): recentDocs items were always undefined" — a reader-dashboard fix with nothing to do with the import normalizer. It is in this PR's diff because it is not yet onmain. A docs/tooling PR should not carry a frontend behavior fix; either split it out or call it out explicitly in the PR body so it isn't merged silently. This is the one structural concern I'd want resolved before merge.writers.DOC_COLUMNS/PERSON_COLUMNS. When Phase 2's importer is written, that ordering becomes a load-bearing contract. Consider a tiny "contract test" (assert the header row equals the spec's documented columns) so a future refactor of the writer can't silently break the downstream reader. Cheap insurance for a seam you've explicitly declared.config.KNOWN_LAST_NAMESis a hand-curated list embedded in config. Fine for a one-owner tool, but note it as a known limitation in the README — name-splitting quality for" und "cells degrades for any surname not on that list. It's a config table, so it's in the right place; just flag that it needs grooming as the long tail surfaces.Boring, well-bounded, single-purpose. Ship it once the stray frontend spec is dealt with.
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
I went through this Python the way I'd review any service: TDD evidence, naming, function size, guard clauses, dead code. It's clean, disciplined work — the commit history reads like a textbook red/green sequence (
feat→fix/test, one logical change per commit). 57 tests pass locally in 0.14s.What's good
fix(normalizer): require day-dot in English month-first matcher (structural anti-shadow)andfix: split_receivers returns [] for a geb.-only cellshow tests driving the edge-case discovery. The matcher-ordering bugs were caught by tests, not in prod.StrEnumforPrecision.parse_dateis pure and side-effect-free (REQ-DATE-05) — exactly what makes it unit-testable in isolation.extract_row's innerget(), the early returns insplit_receivers, the triage dispatch. Functions are small and each does one thing._MONTH_B_RE"dot after day is REQUIRED so this can't match 'Mai 1895'", and the_RANGE_DAY_REnote about not swallowing17/6. 1916. That's the good kind of comment.30.2.1888→ UNKNOWN (not clamped),29.2.1900→ UNKNOWN (1900 not a leap year),13.5.65→ UNKNOWN (ambiguous 2-digit), override-wins path all correct.Blockers
None in the tool code.
Suggestions
frontend/src/routes/page.server.spec.tsbelongs to the reader-dashboard fix (a1035171), not this tool. Mixing a behavior fix into a docs/tooling PR breaks the "one logical change per PR" discipline and makes the diff lie about its scope. Split it._joinis silently lossy if a name contains|.writers._join(['a|b','c'])→'a|b|c'— a downstream pipe-split can't recover the original field count. German correspondent names almost certainly never contain a pipe, so this is theoretical, but since the pipe is your field delimiter I'd add a one-line assert or a_csv_safe-style guard (or just document the delimiter assumption inwriters.py). Right now nothing prevents a future surname like"Müller | Schmidt"cell from corrupting the column.main()parses args it never uses.normalize.py:136-137builds anArgumentParserwith only a description and immediately discards the result. Either wire real flags (e.g.--out-dir,--dry-run—run()already takes them as kwargs, so the seam exists) or drop the parser. As written it's dead scaffolding that implies a CLI surface that doesn't exist.to_canonical, the localpd = _dates.parse_date(...)shadows the conventionalpandas as pdmental model for any Python reader.parsedorparsed_datereads cleaner. Trivial, take it or leave it.Solid, well-tested module. Address the scope leak and the unused argparser and I'm happy.
🔧 Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
Mostly out of my lane — there's no Compose service, no CI job, no Caddy rule, no port, no secret, no infrastructure component here. This is a developer's local tool that runs by hand on Marcel's laptop. I checked the few things that are mine.
What I checked
.gitignorehygiene — correct. Root.gitignoreadds**/.venv/,**/__pycache__/,*.pyc; the tool's own.gitignoreignores.venv/,out/,review/,__pycache__/,*.pyc. Outputs and the virtualenv stay out of git, so no 100MB of openpyxl bytecode or generated xlsx lands in history. The PR diff confirms zeroout//review/artifacts committed. Good.import/, which is already git-ignored at the root. The tool reads from there and the private archive never gets committed — exactly right for a family-document repo where the data must not leak.requirements.txtpinsopenpyxl==3.1.5andpytest==8.3.4— reproducible installs, no floating versions. That's the same discipline I want on image tags.ingest.read_sheetusesread_only=True, NFR-SAFETY-01) — the tool can't corrupt the irreplaceable originals. That's the operational safety property I care most about for a one-shot data migration.Blockers
None.
Suggestions
requirements.txt. Two pinned Python deps now exist outside the Maven/npm trees Renovate watches. Low urgency (offline tool, not deployed, openpyxl has a small attack surface against trusted local files), but if the repo's Renovate config uses an explicit file allowlist, addtools/import-normalizer/requirements.txtso these don't silently rot. If Renovate auto-discovers, ignore this.No infrastructure risk. Approved from my side.
📋 "Elicit" — Requirements Engineer
Verdict: ✅ Approved
I traced the implementation back against the spec I authored (
02-normalization-spec.md). This is a Brownfield artifact review: does the code satisfy the confirmed requirements, and are the goals measurable and met? Overwhelmingly yes — this is one of the cleanest spec→code traces I've seen on this project.Requirement traceability (spot-checked, all PASS)
blank-index-rows.csv, X_SUFFIX →skipped-x-suffix.csv, OK → canonical, plus duplicates flagged on every occurrence. The dry-run arithmetic in the PR body checks out: 7,582 + 225 + 93 + 42 = 7,942 rows read. No drop is silent.date_rawkeeps the verbatim string even when a trailing note is stripped for parsing;source_rowis carried on every canonical row (REQ-PROV-01). Verified"17.Nov 1887, 2. Brief"keeps the note inraw.Pfingsten 1922→ 1922-06-04 matches the worked example in §10." und ",//,(Gruber)shared-surname, and the refusal to auto-split"Ella Anita"into two — all present and tested.summary.txtreports rows-in, docs-out, dates-by-precision, names matched vs provisional, overrides applied, anomalies by type. The returning-agent persona is served.Blockers
None.
Suggestions (open-question hygiene, not gates)
config.MONTHSis German+English only.Requirements are testable, traceably implemented, and the goals are measurable. From a requirements standpoint, this is done.
🛡️ Nora "NullX" Steiner — Application Security Engineer
Verdict: ✅ Approved
Adversarial pass, scoped honestly to what this thing actually is: an offline, local CLI that reads two trusted spreadsheets and writes xlsx + CSV to disk. No network, no
eval/exec, nosubprocess, noshell=True, no deserialization of untrusted data, no auth surface, no web endpoint, no DB query. The OWASP web Top 10 mostly doesn't apply. I checked the things that genuinely do.What I checked — and it holds up
writers._csv_safe()prefixes any cell whose first char is= + - @ \t \r \nwith a leading', so when an archivist opensreview/*.csvin Excel/LibreOffice a malicious raw string like=cmd|'/C calc'!A0becomes inert text. Critically, the defense is applied to review files — the ones a human opens — and it covers the leading-control-char vectors (\t/\r/\n) that the lazier implementations miss. There's even a regression test (test_write_review_csv_defangs_formula_injection) and a commit (df00ea42) that specifically hardened the leading-LF case. This is exactly the "every fix has a permanent test" discipline I ask for. Well done.read_only=Trueiningest.read_sheet). The irreplaceable originals can't be mutated by a bug — good integrity posture.configconstants under the tool dir, never from spreadsheet content. TheDateicell is only string-compared (index_file_mismatch), never opened or used to build a filesystem path. So a hostile..\..\etc\passwdin aDateicell does nothing but get flagged for review.overrides/*.csvare read withcsv.DictReader(noeval, no formula evaluation on read) and treated as plain strings. Trusted, version-controlled input anyway.Blockers
None.
Suggestions
_write_xlsxwrites raw cell values via openpyxl. openpyxl does not auto-evaluate a leading-=string as a formula on write by default (it stores it as a string unless you set a data type), so this is low risk — butout/canonical-*.xlsxis also opened by a human. If you want belt-and-suspenders, run the same_csv_safe-style neutralization on the xlsx string cells, or note explicitly that openpyxl stores them as inert strings. Smell, not a vuln.openpyxlparses attacker-influenced-but-trusted xlsx. The source workbooks are Marcel's own, so the trust boundary is fine today. Just keepopenpyxl==3.1.5patched (see Tobias's Renovate note) — XML/zip parsers in office-format libs are the historical soft spot if this tool were ever pointed at an untrusted file.No exploitable issue. The CSV-injection handling is genuinely above-average for a "just an internal tool." Approved.
🧪 Sara Holt — Senior QA Engineer
Verdict: ⚠️ Approved with concerns
I ran the suite (
.venv/bin/python -m pytest tests/ -q) → 57 passed in 0.14s. For a pure-logic transform tool, this is the right shape of pyramid: lots of fast unit tests on the hard logic (dates, persons), a couple of writer/ingest tests, and one end-to-end orchestration test that also asserts determinism. Real-archive examples are used as fixtures (NFR-TEST-01), which is exactly how I want edge cases captured.What's strong
30.2.1888,31.4.1916) route to UNKNOWN instead of clamping; ambiguous 2-digit years (13.5.65); the range-vs-slash-date confusion (7./8. Sept.1923vs17/6. 1916); approx-marker upgrades; trailing-note stripping with raw preserved. These are the bugs that bite at 7,600 rows, and they're pinned.test_run_end_to_endruns the pipeline twice and asserts identical cell matrices + identical review-file text (NFR-IDEM-01). The pinned-timestamp test backs it up. This is the test most teams forget.test_provisional_id_never_collides_with_register) covers the nastiest data-integrity trap — a provisional person silently stealing a register id. Good instinct._ctx,_doc_wb,_person_wb) keep setup to one line. Clean.Blockers
None.
Suggestions (coverage gaps I'd close)
build_header_maphas a unit test for the raise, butnormalize.run()is never exercised against a workbook missing theIndexheader (US-MAP-01 AC3, "fail loud before partial output"). One test assertingrun()raises and writes nothing toout/would lock that AC._doc_wbfixture uses only known headers. Add a column like"Mystery"and assert it shows up insummary.txt'sunknown_headersand the run still emits docs.index_file_mismatchmulti-dot filename is a latent false-positive.index_file_mismatch("W-0001", "W-0001.final.pdf")returnsTruebecause the stem becomesW-0001.final. The code comment says "all corpus paths are*.pdf", so this may never fire — but there's no test asserting the assumption, and a mismatch only adds a review flag (not data loss), so it's low severity. Either add a test documenting the intended behavior for dotted names or note the single-extension assumption next to the test._csv_safe/pipe-delimiter interaction Felix flagged — a name containing|would corrupt a pipe-joined column with no test to catch it. If you decide the delimiter assumption holds, write the test that documents it holds.These are gaps, not defects — the existing tests are meaningful and the determinism coverage is exemplary. Close the two ingest-orchestration ACs (suggestions 1–2) and I'd flip to a clean approve.
🎨 Leonie Voss — UX & Accessibility
Verdict: ✅ Approved (mostly out of scope — LGTM)
There is no UI in this PR. No
.sveltecomponent, no route, no page, no rendered surface — so there's nothing for me to check on brand tokens, WCAG contrast, 44px touch targets, focus rings, or 320px responsiveness. I'm not going to invent findings where my discipline doesn't apply.The one place my lens does apply is the ergonomics of the human-facing artifacts — the review CSVs and the overrides files are what Marcel (the data steward, tech comfort 4/5) actually sits in front of during the iteration loop. So I checked those as a "human-in-the-loop interface."
What I checked — and it's thoughtful
unparsed-dates.csvis sorted most-frequent-first withexample_rowsand pre-blankedsuggested_iso/suggested_precisioncolumns so a corrected row can be pasted straight intooverrides/dates.csvin the same shape. That's "recognition rather than recall" — the user isn't asked to remember the override file format, the review file mirrors it.unmatched-names.csveven pre-fills a fuzzysuggested_id+ score. Good information scent.summary.txtis a real status surface, not a log dump. Grouped sections (INPUTS/DATES/NAMES/ANOMALIES/OVERRIDES) with the unknown-date rate shown against its target (<=5%). That's "visibility of system status" done right for a CLI — the user knows the health of the run at a glance.Blockers
None.
Suggestions (low priority, CLI-ergonomics only)
summary.txtshows the unknown-date rate but doesn't surface a single "are we done?" verdict. A one-line status likeSTATUS: 9.2% unknown — above 5% target, run the overrides loopwould give the semi-technical steward an unambiguous next-step cue instead of asking him to compare two numbers. Pairs nicely with Elicit's G1 note.unparsed-dates.csv, already frequency-sorted). A suggested order reduces the "where do I start?" friction of an 8-file review folder. Nice-to-have.Nothing in my domain blocks this. Approved.
Add Spanish month names (Mexican-branch letters) to config.MONTHS and let the month-first matcher accept a hyphen (not just a dot) before the year, so "Mayo 18-1929"/"Junio 7-904" parse without manual overrides. Also bound 4-digit years to 1700-2100 so gross typos ("23-9003") stay in review instead of producing a bogus year. Cuts unknown-date rate 9.2% -> 7.9%. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>Adds tags.py module implementing a three-outcome heuristic: - Individual-to-individual correspondence tags ("Clara an Herbert") → dropped - Group/collective correspondence ("Clara an Kinder", "Walter an Geschwister") → Briefwechsel/<value> - Semantic/event tags ("Brautbriefe", "Alltag", "zur Hochzeit") → Themen/<value> Three correspondence patterns detected: space-an-space, starts-with-"an ", and abbreviated-sender form ("Maria W.an Clara"). COLLECTIVE_TERMS in config.py extended with 17 plural/group relational terms (söhne, brüder, schwiegereltern, cousinen, etc.) confirmed against the full Excel. Also adds two-phase summary mining: every run emits review/tag-candidates.csv; subsequent runs apply keywords from overrides/approved-themes.csv as Themen tags. Outputs: canonical-documents.xlsx gets pipe-separated "Parent/Child" tag paths; canonical-tag-tree.xlsx provides the full tag hierarchy for backend pre-import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>CanonicalImportOrchestrator runs the four loaders in an explicit dependency DAG (TagTree -> PersonRegister -> PersonTree -> Document), owns the async runner + ImportStatus state machine the admin UI consumes, smoke-checks all four artifacts are present before starting (fail-fast IMPORT_FAILED_ARTIFACT rather than a half-run), and fails closed on a malformed artifact. AdminController now depends on the orchestrator; the {state, statusCode, processed, skippedFiles, skipped} response shape is unchanged so ImportStatusCard.svelte keeps working. Deletes the legacy MassImportService (positional @Value app.import.col.*, ISO-only parseDate, Java name classification) and the ODS/XXE XxeSafeXmlParser path now that the loaders cover them — the security guards were ported to DocumentImporter first (previous commit). Replaces the positional column config in application.yaml with the canonical artifact directory. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>Wires DocumentTitleFormatter into DocumentImporter.buildDocument: the title now reads "{index} – {honest date label} – {location}", so a MONTH-precision letter's title says "Juni 1916" instead of a fabricated "1. Juni 1916", and an UNKNOWN-date row keeps a bare index title. buildTitle stays under 20 lines by delegating to the shared formatter (single source of truth with the UI label). Restores the date+location title behavior that the old MassImportService had (it appended a full GERMAN_DATE day) but now at the honest precision. Refs #666 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>Wraps formatDocumentDate with the accessible presentation layer: a non-color UNKNOWN cue (decorative calendar-with-question icon, aria-hidden, since the visible "Datum unbekannt" text is the textual cue — WCAG 1.4.1), and the verbatim meta_date_raw shown as a VISIBLE secondary "Originaltext: …" line for UNKNOWN/SEASON/APPROX (WCAG 1.4.13, not tooltip-only). raw is rendered via Svelte default escaping, never {@html} (CWE-79); a component test asserts an angle-bracket raw value stays inert. Browser test is CI-only. Refs #666 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>Adds a grep guard (with self-test) that fails the build if any {@html ...} expression references metaDateRaw/documentDateRaw/rawDate. meta_date_raw is untrusted verbatim spreadsheet text and must render via Svelte default escaping (CWE-79). Addresses Nora's regression-guard request from #666 — a single component test cannot catch a future {@html} introduced elsewhere. Refs #666 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>Documents DocumentTitleFormatter in the document-management C4 diagram and adds an "honest precision display" row to the CONTRIBUTING date-handling table, pointing at formatDocumentDate / <DocumentDate>, the shared docs/date-label-fixtures.json drift guard, and the {@html} escaping rule. Closes #666 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>DAY precision routed through formatDate() which hard-coded de-DE, so an en/es reader saw the German month name ("24. Dezember 1943"). Route DAY through Intl.DateTimeFormat(locale, …) like the other branches, keeping the T12:00:00 UTC-safety convention. Add en/es DAY+MONTH parity cases to docs/date-label-fixtures.json (TS-only; the Java title formatter stays German by design) and assert them in the spec. Refs #666 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>Elicit asked that the "raw provenance shown on detail, not in list rows" choice be captured as a product decision rather than a payload accident. Add a code comment at the list-row DocumentDate render explaining showRaw={false} and the intentional metaDateRaw omission from DocumentListItem. Refs #666 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>rawpropDocumentDate.svelte passes the untrusted raw value via a prop named `raw`, but the guard only matched metaDateRaw/documentDateRaw/rawDate — so a future {@html raw} would slip past. Add `\braw\b` to the token list and a self-test asserting the guard catches {@html raw}. Code is currently safe ({raw}); this closes the defense-in-depth gap in the guard itself. Refs #666 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>GET /api/persons now returns PersonSearchResult with server-side filter params (type, familyOnly, hasDocuments, provisional) and page/size bounds (@Min/@Max -> 400). review=true drops the clean reader default. The legacy sort=documentCount top-N path is folded into the paged contract. Add PATCH /{id}/confirm and DELETE /{id}, both WRITE_ALL-guarded. Remove the now unreachable PersonService.findAll(String). BREAKING-CHANGE: GET /api/persons response shape changes from a bare list to PersonSearchResult { items, totalElements, pageNumber, pageSize, totalPages }. Refs #667 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>Hand-edited frontend/src/lib/generated/api.ts to match the backend: GET /api/persons now returns PersonSearchResult with the new filter/page/size query params; adds PATCH /api/persons/{id}/confirm and DELETE /api/persons/{id}. Generated offline (no dev backend running) — CI should re-run `npm run generate:api` against the live spec to confirm parity. Refs #667 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>GET /api/persons now returns PersonSearchResult { items, … } instead of a bare list. Update every caller: the dashboard top-persons path reads .items; the unused full-list fetches in documents/new and documents/[id]/edit are dropped (both pages use the self-fetching PersonTypeahead); the raw-fetch consumers (PersonTypeahead, PersonMultiSelect, PersonMentionEditor) read body.items and pass review=true so search still spans the whole directory. Specs updated to the new envelope shape. Refs #667 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>The triage rename form reused persons_filter_type_person ("Person") and persons_section_details ("Angaben zur Person") as the first/last-name field labels, so a screen reader announced the wrong name for each input. Add dedicated persons_field_first_name / persons_field_last_name keys (de/en/es) and use them. Refs #667 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>Parses ?undated strictly (=== 'true', mirroring the tagOp clamp), forwards it as undated || undefined so the absent case drops out of the query, and returns the flag in page data for the control to reflect. Adds the docs_filter_undated_only toggle label and the explanatory docs_range_excludes_undated empty-state copy in de/en/es. The badge reuses the existing date_precision_unknown ("Datum unbekannt") key from #677. OpenAPI types hand-edited for the new undated query param on /search and /ids — CI must run `npm run generate:api` to confirm parity with the spec. Refs #668DocumentRow rendered a bare em-dash for null-dated letters — a glyph a screen reader announces as nothing. Both breakpoints now render the single DocumentDate component unconditionally (no {#if}/—/{:else}), so the cue cannot drift; its unknown state is a neutral metadata chip ("Datum unbekannt", text-ink-3, ≥4.5:1 both themes) with a non-color calendar glyph, never red/amber. Present dates render at honest precision via formatDocumentDate ("Juni 1916", not a fabricated day). Refs #668The dated branch wrapped {label} in a flex span containing a single child span — redundant nesting. Render the label directly in one span. Refs #668 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>The desktop right-column kept a leftover {#if doc.documentDate}…{:else}—{/if} fallback that emitted a bare em-dash for undated documents, while the mobile block already always rendered <DocumentDate>. DocumentDate defensively maps a null date to the "Datum unbekannt" badge, so render it unconditionally — an undated document is an absence, not an error, and never shows a bare "—". Refs #668A screen reader announced the bare number ("Nur undatierte 42"). Add an aria-label ("42 undatierte Dokumente") via a new i18n key and hide the purely-visual digit with aria-hidden, so the toggle + count read sensibly. Refs #668 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>🏛️ Markus Keller — Senior Application Architect
Verdict: ⚠️ Approved with concerns
The architectural shape is excellent — schema-first via Flyway, header-mapped reader as a seam, declarative domain boundaries, an ADR for context. But the PR collapses too many architectural decisions into one merge unit.
Blockers
None at the architecture layer. The schema and module structure are sound.
Concerns
PR scope conflates four discrete architectural changes — (a) Python normalizer tool, (b) V69 schema foundation, (c) backend importer rewrite (MassImportService → CanonicalImportOrchestrator + 4 loaders), (d) persons directory pagination + review route, (e) RelationshipService family-flag fix. Each has its own ADR-worthy decision surface. 177 files at +19k/−1.8k makes structural review impossible to do well in one pass. Future migrations of this magnitude should be sequenced: schema → backend swap → frontend → tool.
PersonRepository.FILTER_WHEREis a string-concatenated SQL fragment shared between slice and count queries (PersonRepository.java:2305-2350). The intent — "WHERE clauses cannot drift" — is exactly right. The implementation is fragile: a future refactor can break the invariant silently. Consider promoting this into a dedicated@Repository-implemented method (or a Specification) so the compiler enforces the shared structure. The named-parameter use is good — no injection risk — but the architectural seam wants a stronger guard than aStringconstant.DocumentImporter.buildDocumentdoes 17 field assignments (DocumentImporter.java:913-950). One transaction, one method, no abstraction. That's fine here because the canonical row IS authoritative, but the method now mixes (a) reading row fields, (b) resolving cross-domain references, (c) clearing-then-repopulating join tables, (d) deriving the title. Felix's "do one thing" rule applies; suggest extractingapplyMetadata,applyAttribution,applyDates.canonical-persons-tree.json,canonical-persons.xlsx,canonical-documents.xlsx,canonical-tag-tree.xlsxare committed to the repo viatools/import-normalizer/.gitignorenegation rules. This is a load-bearing architectural decision — the canonical artifacts ARE the contract between the normalizer and the importer — and it deserves an explicit subsection in ADR-025 documenting why committing them is correct (idempotent re-imports, contract verification) and what the rotation/regeneration policy is. The PR description claims they're git-ignored; the gitignore explicitly negates that. Reconcile the description before merge.Things done well
chk_meta_date_precision,chk_meta_date_end_only_for_range,chk_meta_date_end_after_start, length caps mirroring V18, partial-uniqueidx_persons_source_ref). The defaulted backfill before NOT NULL is the textbook safe pattern.DocumentImporter→personService/tagService;RelationshipService→personService.setFamilyMember. No boundary leaks.DocumentTitleFormatter(Java) +documentDate.ts(TS) pinned todocs/date-label-fixtures.json— single source of truth across stacks, asserted to not drift. This is the architectural pattern more codebases should copy.PersonSearchResult.topNvs.paged— honest envelope shape rather than pretending top-N is a paged slice. Small thing, big clarity win.Suggestions
FILTER_WHEREinto a JPASpecification<Person>so the slice/count invariant is structurally enforced.DocumentImporter.buildDocumentinto 3 named methods.👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
Clean code, strong TDD evidence, good Svelte 5 / Spring Boot 4 idiom. The thing that keeps this from a clean approval is PR size — 177 files in one merge unit defeats atomic-commit discipline.
Blockers
None.
Concerns
PR is not atomic — five logically independent changes ride together. Each is well-built individually, but a future
git bisectagainst any of them will land on a 177-file commit. Per the project's atomic-commit convention, this should have been a stack of focused PRs (V69 schema → loader rewrite → persons directory → review route → normalizer tool → relationship fix).DocumentImporter.buildDocumentis doing 6 jobs (DocumentImporter.java:913-950): merge with existing, resolve sender, resolve receivers, parse date triplet, clear+repopulate collections, derive metadataComplete. Extract:DocumentImporter.resolveReceiversusesslugasrawName(DocumentImporter.java:973-978):So an unresolved receiver creates a provisional person whose
lastNameis the slug (smith-john) rather than a human-readable name. The sender path correctly passessenderName. Either parsereceiver_namesto a parallel list and pass the right name per slug, or document that the receiver row'sreceiver_namesfield is never used to populate the provisional person's lastName.Test discipline backslide in
DocumentImporter: the package-privateopenFileStreamis exposed purely for test injection (line 1058–1061 comment says so). That's a Singleton-style smell — prefer constructor-injecting aFileStreamOpenerinterface (default impl + test impl). Right now the test relies on Mockito spying on a non-final method, which is a discipline shortcut.MassImportService.javadeletion is full — the import legacy is gone. The remaining tests for it (MassImportServiceTest) need to be deleted in the same commit, not left behind. Verify they were removed (line 6294-area shows PersonRegisterImporterTest, etc., but check whether the old MassImportServiceTest file is still present somewhere —git statusshould not show it after merge).Things done well
DocumentImporterTestis 600 lines,CanonicalImportIntegrationTestis full-stack against testcontainers).RelationshipServiceTesthas two new tests that name the family-flag behavior precisely (addRelationship_marks_both_endpoints_as_family_member_when_type_is_family,addRelationship_does_not_flip_family_member_for_non_family_type). Test names read as sentences — Sara will be happy.FAMILY_RELATION_TYPESconstant extraction inRelationshipServiceis the textbook DRY fix: one source of truth consulted byaddRelationshipandgetFamilyNetwork. The idempotency comment ("the setter is a no-op when the person is already flagged") explains why re-imports stay clean.DocumentDate.svelteuses$derivedforeffectivePrecision,label,isUnknown,showRawLine— exactly the pattern the project's frontend code style mandates ($derivedover$state+$effect). The component is under 60 lines, has props discipline (nodataobject spread), and the regression test for<img src=x onerror>proves escaping holds.DocumentImporter.isValidImportIndexis six guard clauses + a regex. No nesting, no else branches, names reveal intent. The comment explaining why\dmust remain ASCII-only (NOTUNICODE_CHARACTER_CLASS) is exactly the kind of "why" comment the style guide wants.PersonService.upsertBySourceRef+mergeCanonical: clean monotonic-downward provisional flag handling,preferHumanas the single idiom for fill-blank. The cross-loader precedence comment is gold.formatDocumentDateindocumentDate.tsis small, total, has a default per branch, and is type-narrow. Thenoon()helper handles theT12:00:00convention from CLAUDE.md.Suggestions
DocumentImporter.buildDocument(see Concern 2).resolveReceiversrawName passing (Concern 3) — a provisional person named "smith-john" instead of "John Smith" is a real UX regression in the review queue.openFileStreampackage-private seam with an injectedFileStreamOpener(Concern 4).🛠️ Tobias Wendt — DevOps & Platform Engineer
Verdict: ⚠️ Approved with concerns
The runtime/infrastructure surface is tight. The concerns are about build-time artifacts ending up in git, a fragile CI guard, and a PR-description/reality drift.
Blockers
None.
Concerns
Real archive XLSX/JSON files committed to the repo (
tools/import-normalizer/out/canonical-*.xlsx+canonical-persons-tree.json, ~3,176 lines of real PII for 157 family members including birthplaces, death dates, kinship notes). The.gitignoredeliberately negates them:Two ops issues here:
out/,review/) and the venv are git-ignored." The actual gitignore says the opposite for the four canonical files. Fix the PR description before merge — anyone reading the description will believe the wrong thing.CI
{@html}guard is grep-regex-based and explicitly maintenance-hungry (.gitea/workflows/ci.yml:8-30). The comment honestly states:That works for now but it's fragile: a future component named
rawCell,verbatimDate,originalText, oruntrustedDatefalls through silently. Two strengthening options:Raw) and update the regex to\b\w+Raw\b.{@html ...}reading from any prop typed as a tainted source. ESLint plugineslint-plugin-sveltehas an analogoussvelte/no-at-html-tagsrule — adopt it project-wide and the per-variable guard becomes unnecessary.The self-test inside the step is excellent defensive engineering — keep it regardless of which strengthening path you pick.
docker-compose.prod.ymlusesIMPORT_HOST_DIR:?...with a clear error message. GOOD. But the mount is still:ro; the new importer reads canonical XLSX +<index>.pdffrom the same dir. Consider: the canonical artifacts are version-controlled in the repo, but the PDFs are not. So the host's/importdir needs to be a mix of "checked-in artifacts copied from repo" + "PDFs uploaded by ops". A deploy runbook update would be welcome..gitignoreadds**/.venv/,**/__pycache__/,*.pycat the root — GOOD. But there's noout/exclusion at root, leaving the tool-specific gitignore as the only place that policy is encoded. If the tool ever moves, the policy is silently lost. Add a comment in the root.gitignorereferencingtools/import-normalizer/.gitignoreas load-bearing.Things done well
postgres:16-alpinein testcontainers stays consistent with prod. GOOD.IMPORT_HOST_DIRis required (:?...) — no silent default that would let staging and prod share an import payload. GOOD.application.yamlcleanup: 10 positionalapp.import.col.*properties removed, replaced by a singleapp.import.dir. GOOD — fewer env-driven knobs, fewer ways to misconfigure prod..gitignoreat the repo root sensibly adds Python artifacts at**scope. Renovate (if you have it) won't trip on.venv.Suggestions
out/being git-ignored. Either update the description or change.gitignoreto actually ignore them — both is inconsistent.canonical-*.xlsx.svelte/no-at-html-tagsfor a project-wide guard; keep the regex CI step as a backstop.tools/import-normalizer/out/canonical-*.xlsxintoIMPORT_HOST_DIRalong with the PDFs."🔒 Nora "NullX" Steiner — Application Security Engineer
Verdict: 🚫 Changes requested
Application-layer security in this PR is genuinely good — among the best path-traversal defense I've reviewed in a hobby project. The blocker is not in the code itself but in the data committed to the repository alongside the code.
Blockers
tools/import-normalizer/out/canonical-persons-tree.json(3,176 lines) contains real names, maiden names, aliases, birth years, death years, birth places, death places, and freeform kinship notes ("Nichte von Herbert", "[Geburtsdatum: 4.3.1023]") for 157 family members.canonical-persons.xlsxandcanonical-documents.xlsx(binary) presumably carry the same dataset plus sender/receiver/title cells.192.168.178.71:3005, which limits current exposure, but git history is forever: any future migration to GitHub/GitLab/a backup pushed to a cloud service, any contributor cloning the repo, any leaked credential — the PII follows. Removing the files later does not remove them from history without a rewrite that breaks every fork.generation: 3and 1999 death years suggest some entries cover people only one generation back. Living relatives may still be in scope; their consent posture is unclear.Concerns
PersonReviewRow.svelteconfirm/delete forms have no client-side confirmation forconfirm(line 15815-15818). Thedeleteaction gets a modal viaconfirm()fromconfirm.svelte.js— good. Theconfirmaction (which flipsprovisional=falseand adds a person to the canonical reader-default view) is a one-click irreversible-in-practice state change. Add a confirm modal for symmetry, or document explicitly that "Bestätigen" is meant to be friction-free.+page.server.tsactions in/persons/reviewdon't re-checkcanWriteserver-side (+page.server.tsline 17823-17896). The backend@RequirePermission(Permission.WRITE_ALL)provides authoritative enforcement (good), but server-side load actions in SvelteKit are an additional trust boundary. Aif (!hasWriteAll(locals)) throw fail(403, ...)at the top ofconfirm/delete/merge/renamewould be defense-in-depth and would shave one round-trip on the deny path.Things done well
DocumentImporter.isValidImportIndexis textbook CWE-22 defense (DocumentImporter.java:1045-1056):Paths.isAbsolute()) before the regex.INDEX_PATTERN.matcher(index).matches()) using ASCII\dwith a comment explicitly forbiddingUNICODE_CHARACTER_CLASS.resolvePdfByIndex'scanonical().startsWith(baseDirCanonical + File.separator)symlink-escape check.%PDFmagic-byte check before S3 upload (isPdfMagicBytes, line 1063-1072). Rejects spoofed filenames whose content isn't actually a PDF. GOOD.DB-level length CHECK constraints on
meta_date_raw,sender_text,receiver_text(V69 migration line 2855-2857) match the V18 transcription pattern. Bounds user-influenced free-text at the lowest enforceable layer. GOOD.DocumentDate.svelteXSS posture is bulletproof:{@html}forraw, cites CWE-79, names the default-escaping mitigation..gitea/workflows/ci.yml:8-30enforces the rule across the codebase.DocumentDate.svelte.test.ts:14332-14338includes a regression test asserting<img src=x onerror="alert(1)">renders as literal text and no<img>element is created.confirmPersonendpoint is a dedicatedPATCH /api/persons/{id}/confirmverb, not a mass-assignable DTO field (PersonController.java:2185-2189). The comment explicitly cites CWE-915 (mass assignment) as the reason. EXCELLENT.PersonRepository.FILTER_WHEREuses only named parameters withCAST(:type AS text) IS NULLfor null-handling — no string concatenation of user input into JPQL/SQL. SQLi-safe.PersonService.deletePersondetaches FKs (reassignSenderToNull,deleteReceiverReferences) before the hard delete — prevents 500s from FK violations. Audit-friendly.Python
_strip_accents/slugifyare deterministic, no shell calls, nosubprocess— the normalizer's surface is workbook in / workbook out, no eval, no SSRF, no path-traversal opportunity.Suggestions
hasWriteAll(locals)guards to the four form actions in/persons/review/+page.server.ts.confirmaction inPersonReviewRow.sveltefor symmetry with delete.{@html}+ tainted-prop pattern as a more robust alternative to the regex CI step.🧪 Sara Holt — Senior QA Engineer
Verdict: ⚠️ Approved with concerns
This is one of the cleanest test-suite additions I've reviewed: 22 new/modified test files spanning unit (Mockito), integration (Testcontainers), and component (vitest-browser-svelte). The concerns are about a few coverage gaps in the new persons/review surface and one untestable seam.
Blockers
None.
Concerns
No tests visible for
/persons/review/+page.server.tsform actions (confirm,delete,merge,rename— line 17823-17896). The route is new, the actions are user-mutating, and they're the only path that surfaces the canonical importer's provisional persons to the operator. Patterns to mirror: the existingfrontend/src/routes/persons/page.server.spec.tstest file. Each action needs:{ success: true }fail(403, { error })fail(404, { error })mergewithouttargetPersonId,renamewithoutlastName) →fail(400, ...)DocumentImporter.openFileStreamis a package-private seam exposed for Mockito spying (line 1058-1061, comment says so). This is testable, but it's the brittle kind of testable: any futurefinalkeyword on the method silently breaks spy injection. Inject aFileStreamOpenerinterface and you can mock at the boundary, no spies needed. Felix flagged this too — it's both a clean-code smell and a test-fragility smell.Migration test surface:
MigrationIntegrationTest.javais modified — verify it now exercises V69. The migration adds 4 constraints, 2 unique indexes, and a NOT NULL transition on existing rows. A concrete integration test that:meta_date_precision='INVALID'and assertschk_meta_date_precisionfiresmeta_date_endset butmeta_date_precision != 'RANGE'and assertschk_meta_date_end_only_for_rangefiressource_refand assertsidx_persons_source_refrejects the second...would lock the migration's intent in CI. Without these, V69 is "passes Flyway" but no test asserts it actually enforces what it claims.
No axe-playwright run visible for
/persons/review. Leonie's persona requires axe checks on every critical page; the review page is a write-path UI surfaced to operators. Add it to the E2E sweep (light + dark mode both).canonical-persons-tree.jsonis essentially a fixture committed into the repo, but isn't named or located like a fixture. If it's serving double duty (real data AND test fixture), name it accordingly and document the contract. If it's not a fixture, see Nora's blocker.Things done well
Test names read as sentences across the board. Examples I checked:
addRelationship_marks_both_endpoints_as_family_member_when_type_is_familyaddRelationship_does_not_flip_family_member_for_non_family_typeshould_throw_notFound_when_document_does_not_existrenders a malicious raw value as inert escaped text (no element injected)When these fail in CI, a developer knows what broke without reading the body.
DocumentDate.svelte.test.tshas the security regression test embedded —<img src=x onerror="alert(1)">rendered, asserted as literal text, asserted no<img>in DOM. This is the pattern every dangerous-render component should have.Test pyramid layering is correct:
DocumentImporterTest,RelationshipServiceTest) — fast, isolated, mock the boundaryCanonicalImportIntegrationTest) — real PostgreSQL 16, real Flyway, real cross-table joinsNo one layer is doing another's job.
PersonImportUpsertTestandTagImportUpsertTestspecifically cover the idempotent re-import path — this is the regression risk on every canonical importer rerun and now there's a permanent test.One transaction wraps the whole document sheet (
@TransactionalonDocumentImporter.load) — the Hibernate-session comment explains why. Integration tests run inside this same shape, so behavior is identical between test and prod.Awaitility / no
Thread.sleep— I didn't see any timing-based test antipatterns in the new code.Suggestions
/persons/review/+page.server.ts(the existingpersons/page.server.spec.tsis the template).MigrationIntegrationTestor a new integration test class).openFileStreampackage-private hook with an injected interface to remove the spy fragility./persons/reviewin both light and dark mode.🎨 Leonie Voss — UI/UX Lead & Accessibility Advocate
Verdict: ⚠️ Approved with concerns
The new UI follows the brand system carefully — semantic tokens, 44px touch targets, focus-visible rings, redundant cues, default-escaped untrusted text. The concerns are about one inconsistent error-pill, an asymmetric confirm dialog, and a missing axe sweep.
Blockers
None.
Concerns
Inconsistent error-pill styling between
PersonReviewRowand/persons/review/+page.svelte(line 17936-17941):Raw Tailwind red palette. Meanwhile
PersonReviewRow.svelte:15773-15774uses semantic tokens:border-danger,text-danger,bg-danger/10. Pick one — the semantic-token form. Hardcodedred-50will not respect future dark-mode contrast remapping. Replace withborder-danger bg-danger/10 text-danger.PersonReviewRow.svelteconfirm action has no confirmation modal, but delete does (line 15815-15818 vs 15819-15834). Both are mutating writes; delete asksconfirm(), confirm just submits. From a UX consistency angle this is a small mismatch; from a senior-user-on-mobile-mistapping angle the asymmetry is real. Either:No axe-playwright run for the new
/persons/reviewpage. Two distinct surfaces here that both need a11y verification:/persons/review(read-only list with form-inline editors)mode === 'rename'or'merge'Both states should pass
wcag2a+wcag2aain light and dark mode. The rename form's<label class="flex flex-col gap-1 text-sm">with nested<span>+<input>is the right pattern, but only an axe sweep proves the labels are actually associated.PersonReviewRow.svelteicon-only buttons via<img alt="" aria-hidden="true">(avatar at line 15783-15788). The avatar image is purely decorative —aria-hidden="true"is right. Good. But verify the action buttons (Merge / Rename / Confirm / Löschen) all have either visible text oraria-label. Looking at line 15800-15835, every action button has visible text content ({m.persons_review_action_merge()}etc.). GOOD — confirming this is correct.The
DocumentDate.svelte"Datum unbekannt" chip istext-xs uppercase tracking-widest. 12px text on bg-surface withtext-ink-3— verify the contrast in dark mode. The component comment claims "≥4.5:1 in both themes" — trust but verify with the contrast tool.text-ink-3is--c-ink-3inlayout.css; the exact ratio depends on the variable's value.Things done well
DocumentDate.svelteis a textbook accessibility component:aria-hidden="true", text is the announcement.rawrendered via default Svelte interpolation, never{@html}.PersonReviewRow.sveltetouch targets (line 15771-15772):WCAG 2.5.5 satisfied. The senior audience (60+) lands the tap. GOOD.
focus-visible:ring-2 focus-visible:ring-brand-navy focus-visible:ring-offset-2 focus-visible:outline-noneon every interactive element. Keyboard navigation has a visible focus indicator without showing rings on mouse click. GOOD.Empty state on
/persons/reviewuses a dashed-border card with friendly text andfont-serif text-lg text-ink. Calm, non-alarming. The empty state IS a valid state. GOOD.Localization is complete across
m.persons_review_*— heading, intro, action labels, empty state, delete confirmation title/body/button. All three locales (de/en/es) updated. No hardcoded German strings left in the new component.formatDocumentDatelocalizes structured parts viaIntl.DateTimeFormat(locale, …)plus Paraglide for the words. Anen/esreader never sees a German month name. The default Locale is'de-DE'for the title formatter (Java) — correct because import titles are always German.getConfirmService()modal pattern for destructive delete inPersonReviewRow.svelte:15819-15834—destructive: trueflag, localized button label, async cancel viause:enhance. Clean implementation.Suggestions
red-200/red-50/red-600on the page-level error pill withborder-danger bg-danger/10 text-danger./persons/reviewto the axe-playwright E2E sweep (Sara will appreciate this too).text-ink-3on bg-surface contrast in the dark theme for the metadata chip.📋 Elicit — Senior Requirements Engineer (Brownfield Mode)
Verdict: ⚠️ Approved with concerns
The work is well-specified — there's an ADR, a multi-document spec under
docs/import-migration/, traceable issue references (#666, #667, #668, #670) embedded in code comments, and a Test plan in the PR description. The concerns are about PR-as-merge-unit hygiene and one factually-incorrect statement in the description.Blockers
None at the requirements level.
Concerns
PR description contradicts the actual change (Requirements-clarity failure). The description ends with:
But this PR:
MassImportService.javaentirely (-509 lines) and replaces it withCanonicalImportOrchestrator+ 4 loaders. So either this IS Phase 2 (and the note is wrong), or this is Phase 3 (and the note is wrong about Phase 2 being elsewhere).out/— the.gitignoreexplicitly negates!out/canonical-persons-tree.jsonand!out/*.xlsx, and those files (3,176 lines of real family PII) appear in the diff.Both points are reader-facing requirements claims that don't match the code. A future reader using the PR description as the spec source-of-truth will be misled. Fix the description before merge. (Note: per the project's "Issue body is the source of truth" convention, the GitHub-ish equivalent here is the PR body; same rule applies.)
PR scope bundles 5 distinct units of value that map to 5 distinct user-facing capabilities:
/persons/reviewroute)formatDocumentDate,DocumentDate.svelte,DocumentTitleFormatter)RelationshipServicefix, originally found via canonical-import dry-run)Plus a 6th: the offline Python normalizer tool itself (tooling, not a user-facing capability).
From a backlog-hygiene angle these should map to 5 (or 6) closed issues with their own merge units. As one PR it works, but the traceability ("did issue #X actually land?") becomes harder. The project memory pattern is exactly this: atomic commits, atomic PRs.
docs/import-migration/03-normalizer-implementation-plan.mdand04-unresolved-names-plan.mdare committed plans. Per project convention ("Specs go into Gitea issues, never commit a markdown spec file"), these would ideally have been Gitea issues. Since the work is already done, the plans now serve as worklog — that's acceptable retrospectively but consider whether to move them into the issue tracker for searchability.docs/superpowers/plans/2026-05-25-personendatei-importer.md+docs/superpowers/specs/2026-05-25-personendatei-importer-design.mdappear to be LLM workflow artifacts committed to the repo. If this is intentional (as a "what the assistant proposed before implementation" record), document it indocs/superpowers/README.md. If not, ignore them via.gitignore.Definition of Done for this PR is implicit. The Test plan section in the description lists 5 steps, which is the de-facto DoD checklist. Convert to actual
[ ]checkboxes (the description uses- [ ]but the items are written as "spot-check" style, not "I verified X"). Operator running the test plan should mark each item as they go.Things done well
see #666,see ADR-025,#668,#670 Gap 2). A future reader hits the comment and lands on the issue. This is the highest standard of in-code traceability I've reviewed.docs/date-label-fixtures.jsonis the contract between two implementations (Java + TS). Test-driven contract, the way it should be done.docs/import-migration/01-findings-spreadsheet-analysis.mddocuments data-quality findings IMP-01..12. This is the "what we learned from the messy source data" knowledge that historians and future maintainers will rely on. EXCELLENT.canonical artifact, presumablyprovisional person,source_refetc.). Domain vocabulary stays current.PersonFilterfactory methods (showAll(),cleanDefault()) are named after the user-facing modes ("transcriber view" / "reader view"). Domain language reaches into the code. GOOD.Suggestions
out/git-ignored status). This is the single thing that affects requirements clarity.- [ ]) and have the operator tick them off during merge prep.docs/superpowers/*should ship in the repo or stay in.gitignore.Promote svelte/no-at-html-tags to project-wide error so any new {@html} block fails lint locally and in CI — the primary XSS defense. The existing .gitea/workflows/ci.yml raw-date regex guard stays in place as layered defense (it covers the specific raw-date variable names that must NEVER be rendered via {@html}). Existing legitimate {@html} usages (renderBody mentions in CommentMessage.svelte, sanitized Markdown in geschichten/[id]) already carry justified inline `eslint-disable-next-line` comments. Lint stays green; verified by running npm run lint. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>Markus Keller — Senior Application Architect (round 3 re-review)
Verdict: ✅ Approved
The architectural shape was already sound in round 2; the round-2 concerns I flagged at the implementation seams have all been addressed in the nine follow-up commits without disturbing the load-bearing decisions (V69 schema, header-mapped reader, RelationshipService as flag owner, ADR-025). I have no blockers and no remaining concerns at the architecture layer.
Compared to the previous review
PersonRepository.FILTER_WHEREshared text block. Still a shared text block (PersonRepository.java:99-114,:124,:138). It is now documented (lines 91-98) with the "MUST keep an IDENTICAL WHERE clause" invariant and named-parameter discipline. For the present scope this is acceptable; promoting to a Specification stays on the table if a third query joins later.DocumentImporter.buildDocumentdoing 6 jobs. RESOLVED. Commite93b09f1extracted fiveapplyXhelpers (applyAttribution,applyDates,applyAuthoritativeAssociations,applyFileMetadata,applyComputedFlags).buildDocumentis now an 11-line orchestrator that reads as the recipe. Exactly the shape I suggested. Nice.CanonicalImportOrchestratorpartial output on loader failure. Not addressed in these nine commits; the orchestrator is unchanged. This was a defense-in-depth concern, not a blocker — leaving it for a follow-up.What's good about this round
applyAuthoritativeAssociationsclears-then-repopulates receivers and tags (ADR-025), whileapplyAttributionalways retains the rawsender_text/receiver_textcells. The boundary between "the canonical row owns this" and "human edits survive" is now explicit in method names, not buried.FAMILY_RELATION_TYPESas a constant inRelationshipService(line 37-38). Single source of truth for which relation types are part of the family graph; reused in bothaddRelationship(sets the flag) andgetFamilyNetwork(filters edges). That's exactly the "data drives logic" discipline I push for — no parallel hard-coded lists to drift.46d1f5c6) doesn't touch the schema or the contract. The architectural seam between Python normalizer and Java importer is "the header schema, not the file contents" — which is whatCanonicalSheetReader.REQUIRED_HEADERSenforces (DocumentImporter.java:62-64etc.). Removing the PII artifacts from git doesn't weaken anything because the contract was already in the code, not in the data.Documentation check
docs/architecture/db/db-orm.puml/db-relationships.pumlreflect the new columns andperson_relationships. ✓ (verified in the diff)docs/architecture/c4/l3-frontend-3c-people-stories.pumlupdated for the/persons/reviewroute. ✓backend/CLAUDE.mdpackage table updated to referenceCanonicalImportOrchestrator + 4 loaders. ✓MassImportService/ExcelServicereferences all dropped (commit0a3d12b9from prior round). ✓Architectural debt at this point is bounded and intentional. Ship it.
— Markus
Felix Brandt — Senior Fullstack Developer (round 3 re-review)
Verdict: ✅ Approved
This is the round I was hoping for. Every concrete clean-code concern I raised in round 2 is fixed with a commit that reads like a TDD entry: a test that pins the bug, the minimal fix, no behavior creep. The remaining big-ticket items are scope-of-PR shape, not code-shape — and we've already acknowledged them.
Compared to the previous review
DocumentImporter.buildDocumentdoing 6 jobs. RESOLVED. Commite93b09f1extractsapplyAttribution/applyDates/applyAuthoritativeAssociations/applyFileMetadata/applyComputedFlags.buildDocumentis now an 11-line orchestrator (DocumentImporter.java:177-187). Reads exactly as the suggested skeleton. The 43 existing tests still pass — confidence the extraction was pure refactor.resolveReceiversusing slug asrawName. RESOLVED. Commit53559437.resolveReceiversnow zips the parallelreceiver_person_idsandreceiver_namescolumns by position (DocumentImporter.java:263-273), with slug-as-name fallback when the names list is shorter. Two new tests inDocumentImporterTest: the happy path (load_provisionalReceiverUsesHumanNameFromReceiverNames_notSlug, line 434) and the explicit fallback (load_provisionalReceiverFallsBackToSlug_whenNamesListShorterThanSlugs, line 459). Red-then-green visible in the commit message.openFileStreamseam. RESOLVED. Commit4cc725d5. New@FunctionalInterface FileStreamOpener(FileStreamOpener.java, 33 lines) with a one-line@Component DefaultFileStreamOpenerdelegate; importer takes it via constructor (DocumentImporter.java:81); tests inject a mock that throws to drive the IO-error branch. No more package-private smell. The Javadoc onFileStreamOpenereven explains the threat-model-style "why" — exactly the kind of comment that earns its keep. Bonus: also more idiomatic Mockito (mock at the boundary, never spy the SUT).What's good
fix(importing): use receiver_names for provisional person display name(53559437) introduces two failing tests next to the fix — that's the red/green pair.refactor(importing): split DocumentImporter.buildDocument(e93b09f1) deliberately doesn't touch tests — pure refactor, contract preserved.refactor(importing): inject FileStreamOpener(4cc725d5) restructures tests at the same commit, because the test boundary changed. The hygiene reads correctly./persons/review/+page.server.tsserver-guard discipline (commita9cac08f). Each form action starts withif (!hasWriteAll(locals)) return fail(403, …)— guard clause at the top, identical pattern across all four actions (lines 35-36, 52-53, 69-70, 92-93). Mirrors the rest of the persons domain. This is "guard clauses withDomainException" in TypeScript form.PersonReviewRow.svelteconfirm dialog (commita670ba01). The destructive-vs-non-destructive asymmetry is now explicit:destructive: falsefor confirm,destructive: truefor delete. SamegetConfirmServiceAPI; consistent shape; both i18n-keyed. No fat-fingering on a touchscreen any more.22603a4b). 18 tests covering all four actions × {happy path, backend 403, backend 404, server-side guard, missing required field}. Each test reads as a sentence. Factory pattern (mockApi,runAction) keeps each test body to ~6 lines. Sara will be pleased.Suggestions (nice-to-haves, not gates)
DocumentImporter.attachTag(line 287-291) silently drops a non-matchingtag_path. Iftagsreferences a path that wasn't imported byTagTreeImporter, the receiver row gets persisted with an empty tag set and no log. The PLACEHOLDER asymmetry-surfacer at line 132 is a nice pattern — consider mirroring it here:tagService.findBySourceRef(...).ifPresentOrElse(doc.getTags()::add, () -> log.warn("Tag path {} not found — skipping for row {}", tagPath, index));. Small change, surfaces an ops issue I'd otherwise discover at re-import.resolveReceivers(line 263-273) treats a(slug, name)mismatch as a silent rare-path. The current contract is "the names list is shorter than the slugs list, rare — canonical data zips them 1:1." If the lists are different length in either direction, that's a normalizer contract violation worth alog.warn. Doesn't gate the merge; just makes future drift loud.Both are followup-issue territory, not blockers.
— Felix
Tobias Wendt — DevOps & Platform Engineer (round 3 re-review)
Verdict: ✅ Approved
Round-2 ops blockers are addressed. Runtime surface is unchanged (no new compose service, no new port, no new image, no new CI job beyond the
{@html}regex guard which I already passed in round 2). The handful of follow-ups I had remain follow-ups.Compared to the previous review
46d1f5c6:tools/import-normalizer/out/canonical-persons-tree.json(3,176 lines, real names) — removed from the indexcanonical-documents.xlsx,canonical-persons.xlsx,canonical-tag-tree.xlsx— removed from the index.gitignorenow reads# Canonical import artifacts live only on the ops host (PII). See tools/import-normalizer/.gitignore — load-bearing for that policy.tools/import-normalizer/.gitignoreis simplified toout/(no negate-entries any more) —git ls-files tools/import-normalizer/out/is empty, confirmed.docs/import-migration/02-normalization-spec.mdaligned (commit46d1f5c6updated both).main's reflog/object database. If the corpus has any active GDPR exposure concern, a history rewrite is the conservative move. For a self-hosted-Gitea-only repo with no public mirror, I think you can accept the residual and document it in the runbook — your call.{@html}regex guard is maintenance-hungry. Layered defense improved: commitf80dda74promotessvelte/no-at-html-tagstoerrorinfrontend/eslint.config.js:80, with a thoughtful inline comment explaining the regex guard's new role as a backstop. So now we have ESLint as the primary defense (catches every new{@html}site, not just raw-date variable names) plus the CI grep as a per-variable belt. The original concern shrinks to "watch the regex variable list" instead of "this regex is the only thing standing between us and arawCellslipping past." Acceptable..gitignoredrift. RESOLVED. Same commit46d1f5c6updated the description.What I checked this round
docker-compose.prod.ymldiff — only minor adjustments that ride with46d1f5c6; no new service, no new port, no new credential, no privileged container, no DinD. The runtime surface is the same shape it was before the PR.actions/cache@v4,actions/setup-java@v4, all pinned, no:latest.{@html}regex guard self-tests itself before scanning (.gitea/workflows/ci.yml:75-83) — three printf cases probe the guard for false negatives and a false positive before the real grep runs. That's the kind of "tests for your tests" discipline I like seeing in CI shell.requirements.txtfor the Python normalizer remains pinned (openpyxl==3.1.5,pytest==8.3.4). Out of Renovate's default scope but the tool is local-only, so I still consider this acceptable.Suggestions (follow-ups, not gates)
git filter-repothe four PII files from history, or accept the self-hosted-only residual." Either choice is defensible; the undocumented state is what bites later. Three lines in a runbook close it.tools/import-normalizer/out/**so a future accidentalgit add -Acan't re-introduce the PII files past the gitignore. Belt-and-braces — gitignore prevents new tracking, but a user can override withgit add -f.— Tobias
Nora "NullX" Steiner — Application Security Engineer (round 3 re-review)
Verdict: ✅ Approved
The single blocker from round 2 — real-identity PII committed to git — has been removed from the index in commit
46d1f5c6. The rest of the application-layer security in this PR is genuinely good. Approving with one residual-risk note and a defense-in-depth follow-up.Compared to the previous review
B1 — PII committed to git (CWE-359). RESOLVED IN THE WORKING TREE.
tools/import-normalizer/out/canonical-persons-tree.json(3,176 lines, 157 real family members),canonical-documents.xlsx,canonical-persons.xlsx,canonical-tag-tree.xlsx— all removed from the index. Verified withgit ls-files tools/import-normalizer/out/→ empty.tools/import-normalizer/.gitignoreis now an unambiguousout/(no!out/canonical-persons-tree.jsonnegate-entry)..gitignorecarries a comment naming the policy:# Canonical import artifacts live only on the ops host (PII). Future contributors see the why without reading the diff history.02-normalization-spec.mdare reconciled with the actual code state (the round-2 contradiction is gone).main's object database. For a self-hosted Gitea with no public mirror and the corpus consisting almost entirely of deceased family members, I think the residual is acceptable — but the decision should be documented, not implicit. Open a tracked issue, name the choice, close it. The undocumented state is what bites in a future audit.C2 —
PersonReviewRow.svelteconfirm action without confirmation dialog. RESOLVED. Commita670ba01.confirm()is now wrapped ingetConfirmService()withdestructive: false(PersonReviewRow.svelte:67-77);deleteusesdestructive: true(lines 81-93). Both writes now require a second deliberate action — fat-fingering on a touchscreen no longer mutates state.C3 — Missing server-side WRITE_ALL guard on
/persons/reviewform actions. RESOLVED. Commita9cac08f. Every action —confirm,delete,merge,rename— starts withif (!hasWriteAll(locals)) return fail(403, …)(+page.server.ts:35-36, 52-53, 69-70, 92-93). Layered with the existing backend@RequirePermissionon the API endpoints. Defense in depth done correctly: a hand-crafted POST from a reader is now caught at the SvelteKit action layer before the round-trip, not after.NEW LAYERED DEFENSE —
svelte/no-at-html-tagsenforced as ESLint error. Commitf80dda74promotes the rule toerrorproject-wide (frontend/eslint.config.js:80). This is now the primary XSS guard for the meta_date_raw class of vulnerabilities (#666); the CI regex stays as a backstop. Every new{@html}site fails lint with no opt-out unless an inline// eslint-disable-next-line svelte/no-at-html-tagswith an explicit justification is added. This is the right shape: a coarse static rule plus a regex with self-tests is much harder to silently bypass than the previous variable-name-only grep.What I re-checked
writers._csv_safe()still defangs= + - @ \t \r \n-prefixed cells; the regression test (test_write_review_csv_defangs_formula_injection) is intact. ✓DocumentImporter.isValidImportIndex— full slash-homoglyph blocklist + canonical-path containment inresolvePdfByIndex. The regex anchors are still strict;\dis documented as ASCII-only with the rationale (DocumentImporter.java:71-72). This is among the best path-traversal hardening I've reviewed on this project — the comments tell a future reviewer why each guard exists.PersonRepository.FILTER_WHERE— named-parameter only, no concatenation, no JPQL injection vector even though it's a shared text block. ✓/persons/review/+page.server.ts��� the four form actions all use the typed API client,extractErrorCodeon the structured error,getErrorMessagemapped through i18n. No raw backend messages leak to the user.22603a4b) — 18 tests, including explicit "unauthenticated user is blocked at the server" assertions (expect(apiCall).not.toHaveBeenCalled()after a reader-shaped user). This is exactly the test-permission-boundaries discipline I push for: prove the block fires, not just that the happy path works.Suggestions (follow-up, not blocking)
46d1f5c6— accept the self-hosted residual or rungit filter-repo." Either choice is defensible; the undocumented state is the problem.fail(403, …)-without-guard. As the persons domain grew, the round-2 missing-guard was a real risk surface. A static rule like "any SvelteKitactions = { … }whose body callscreateApiClientMUST be preceded by ahasWriteAll(locals)guard" would prevent the regression class. Out of scope for this PR; worth a follow-up.Application-layer security: in good shape. Data-at-rest exposure: contained in the working tree, with a small forensic residual the team should explicitly own.
— Nora
Sara Holt — Senior QA Engineer (round 3 re-review)
Verdict: ✅ Approved
The two coverage gaps I flagged in round 2 are closed with named-behavior tests, and the test-only seam I flagged has been promoted out of production code with a proper interface boundary. The migration test concern remains as a verify-on-merge item, not a blocker. Strong round.
Compared to the previous review
C1 — No tests for
/persons/review/+page.server.tsform actions. RESOLVED. Commit22603a4blandsfrontend/src/routes/persons/review/page.server.spec.ts(253 lines, 18 tests). Coverage matrix per action (confirm,delete,merge,rename):{success: true}) ✓fail(403, …)✓fail(404, …)✓fail(403), API call never made:expect(apiCall).not.toHaveBeenCalled()) ✓mergewithouttargetPersonId,renamewithoutlastName) ✓mockApi,runAction) keeps each test body to ~6 lines; one logical assertion per test. Reads as the spec.C2 —
DocumentImporter.openFileStreamtest-only seam. RESOLVED. Commit4cc725d5. New@FunctionalInterface FileStreamOpener(FileStreamOpener.java) with a one-line@Component DefaultFileStreamOpenerproduction delegate. Tests inject a mock that throws to drive the IO-error branch (DocumentImporterTest.java, lines 343, 358), no spies. The brittleness is gone: a futurefinalkeyword can't break injection because there's no class to spy on any more.C3 —
MigrationIntegrationTestexercising V69 constraints. Verified: the test file was modified in this PR (+191 lines, per the diff stat). I'd want one more pass that explicitly inserts a row violatingchk_meta_date_precision='INVALID'and asserts the CHECK fires, plus a row withmeta_date_endset andprecision != 'RANGE'againstchk_meta_date_end_only_for_range. If those exist in the +191 lines, great; if not, they're cheap follow-ups (one test per CHECK constraint, Testcontainers Postgres catches them in seconds). Leaving this as a verify-on-merge item, not a gate.What I re-checked this round
DocumentImporterTest(656 lines, ~33 tests). Two new red/green entries for the receiver-names fix:load_provisionalReceiverUsesHumanNameFromReceiverNames_notSlug(line 434) — pins the round-2 bug.load_provisionalReceiverFallsBackToSlug_whenNamesListShorterThanSlugs(line 459) — pins the fallback contract.Both follow the project's TDD convention: failing test in the commit alongside the fix, single-assertion shape.
RelationshipServiceTest(+46 lines for the family-flag fix) — pins thePARENT_OF/SPOUSE_OF/SIBLING_OF→family_member=truepropagation on both endpoints. Idempotent re-setter assertion would close the regression class fully.@SpringBootTestwhere@ExtendWith(MockitoExtension.class)would do.CanonicalImportIntegrationTest(229 lines) andMigrationIntegrationTestboth run against Testcontainers; no H2. ✓Concerns (none blocking)
MigrationIntegrationTestactually exercises the V69 CHECK constraints, not just runs Flyway to completion. Two concrete tests:DocumentImporterTest.load_provisionalReceiverFallsBackToSlug_whenNamesListShorterThanSlugs(line 459) covers names-shorter-than-slugs. A names-LONGER-than-slugs test would close the symmetry. Currently the extra names are silently dropped (loop bounded byslugList.size()). Fine if that's the intended contract, but a test would pin it.Suggestions, not gates. Approving.
— Sara
Leonie Voss — UI/UX Lead & Accessibility Advocate (round 3 re-review)
Verdict: ✅ Approved
Both UX concerns from round 2 are resolved with the right fix at the right layer. The error pill uses semantic tokens now, the confirm action has a confirmation dialog, and the i18n keys propagate to all three locales. The axe-sweep follow-up I asked for remains a follow-up.
Compared to the previous review
C1 — Inconsistent error-pill styling. RESOLVED. Commit
461a8b12./persons/review/+page.svelte:33-37now reads:Semantic tokens (
border-danger,bg-danger/10,text-danger) — matchesPersonReviewRow.svelte:24deleteBtn. Dark-mode contract is preserved because both surfaces remap through the samedangersemantic token. Therole="alert"is retained so assistive tech still announces the error live.C2 —
PersonReviewRow.svelteconfirm action without a confirmation dialog (asymmetric with delete). RESOLVED. Commita670ba01. Both confirm (PersonReviewRow.svelte:65-80) and delete (:82-96) now route throughgetConfirmService()with the same shape, distinguished by thedestructiveflag — confirm isdestructive: false, delete isdestructive: true. New i18n keys (persons_review_confirm_confirm_title/text/button) landed in all three locales (de/en/es), per the commit summary. UX consistency restored without making the confirm action feel as scary as the delete.C3 — No axe-playwright sweep on
/persons/reviewpage in light AND dark mode. UNCHANGED. Not addressed in these nine commits; the page's a11y posture is solid based on the markup (semantic landmarks, focus rings, labeled inputs, 44px touch targets) but the automated check at both color modes remains a follow-up. Not a blocker — the markup-level discipline is right.What I re-checked this round
PersonReviewRowusesborder-line,bg-surface,text-ink,text-ink-2,bg-muted,border-danger,bg-danger/10— all semantic. ✓min-h-[44px]in theactionBtnanddeleteBtnclass lists,PersonReviewRow.svelte:22-24). ✓focus-visible:ring-2 focus-visible:ring-brand-navy focus-visible:ring-offset-2 focus-visible:outline-noneon every interactive element. ✓ The keyboard-only user can always see where they are.firstNameandlastNameinputs in the rename form are wrapped by<label>with visible text (PersonReviewRow.svelte:113-128). No placeholder-as-label antipattern.persons_review_*keys for the new confirm dialog exist inde.json,en.json,es.jsonper the commit. ✓role="alert"on the page-level error pill — screen readers will announce the form action's failure. ✓svelte/no-at-html-tagspromotion to error (commitf80dda74). Belt-and-braces XSS guard at the component layer; this also implicitly protects against an icon-only label being injected into a future{@html}and breaking accessible-name semantics. A small UX win alongside the security win.Suggestions (low-priority, follow-up)
/persons/reviewin BOTH color modes. Two surfaces to verify:mode === 'rename'or'merge'expect((await new AxeBuilder({ page }).withTags(['wcag2a', 'wcag2aa']).analyze()).violations).toEqual([])per surface, per mode. Five lines per test, high regression value. Track this as a follow-up issue, not blocking the merge.firstNameinput has norequiredattribute (PersonReviewRow.svelte:117), onlylastNamedoes (:125). That's correct (some persons are mononymous in the historical corpus) — just flagging it for the record so a future refactor doesn't addrequiredonfirstName"for consistency" and break the institution/single-name case.disabled={!mergeTargetId}but doesn't communicate the disabled state to assistive tech beyond the visualdisabled:opacity-40. Consider addingaria-describedbypointing to a<span class="sr-only">Select a target person to enable merge</span>. Tiny improvement, follow-up territory.— Leonie
Elicit — Senior Requirements Engineer (Brownfield Mode, round 3 re-review)
Verdict: ✅ Approved
The factual contradiction I called out in round 2 — the PR description claimed
out/was git-ignored while four artifacts were tracked — has been reconciled by the same commit that removed the artifacts (46d1f5c6). The PR-scope-bundles-five-capabilities concern remains structurally true, but it's acknowledged and the team has chosen to ship as-is. No blocker from the requirements layer.Compared to the previous review
C1 — PR description contradicts the actual change. RESOLVED. Commit
46d1f5c6is a coordinated, atomic update:git ls-files tools/import-normalizer/out/is empty)tools/import-normalizer/.gitignoreto an unambiguousout/(no negate-entries)02-normalization-spec.mddocs/import-migration/README.mddocker-compose.prod.ymlto reflect the "artifacts arrive out-of-band on the import volume" runtime contractThe PR description now matches the code state. A future reader using the description as the spec source-of-truth will land on accurate information.
C2 — PR scope bundles 5 distinct units of value. STRUCTURALLY UNCHANGED. This was never a "the code is wrong" finding — it was a "this would have been five PRs in a healthier cadence" finding. The team has acknowledged it across multiple reviewer comments and chosen to ship as-is given the bundle's internal consistency. Marking this as accepted-as-merge-shape, with a recommendation that the next migration of this magnitude be sequenced (schema → backend swap → frontend → tool).
NEW — Spec → Implementation traceability remains strong. The follow-up commits each cite the requirement they address in the commit message body, not just the title:
4cc725d5— names the smell ("test-only seam leaked into production") and the principle ("constructor-injected interface")53559437— names the regression class ("regression of the existing senderName→Person contract")a9cac08f— names the threat model ("a reader with a hand-crafted POST could trigger writes")a670ba01— names the user need ("easy to fat-finger on a touchscreen and irreversible")This is the level of "why behind the what" I push for. A future bisect lands on commit messages that explain the requirement, not just the change.
What I re-checked this round
applyAttribution(DocumentImporter.java:191-198) —sender_text/receiver_textare always retained verbatim even when the person is linked. ADR-025 documents this as a load-bearing invariant. ✓resolveReceiversnow zips parallelreceiver_person_idsandreceiver_namesby position, with documented slug-fallback. The contract is explicit in the code comment (DocumentImporter.java:259-262) — a future spec reader can locate the implementation without grep-archaeology.applyAuthoritativeAssociations's clear-then-repopulate (DocumentImporter.java:213-218), bounded by ADR-025's authoritative-collections rule. ✓/persons/review:page.server.spec.ts)Open Questions / TBD register
docs/import-migration/README.mdortools/import-normalizer/README.md, plus a follow-up ticket if the decision is "rewrite later."Backlog hygiene observation
Issue references
#666,#667,#668,#670are embedded in code comments and ADR-025, which lets a future reader trace any line of code back to the requirement that justifies it. This is exactly the traceability matrix discipline I push for, applied directly in the source tree.— Elicit
userEvent.clear deletes per-keystroke, so intermediate values 'Au'/'A' transit through the bound searchQuery and each schedules a debounced fetch. When CI keystroke jitter exceeds SEARCH_DEBOUNCE_MS (150 ms), an intermediate timer fires before the input reaches '' and the count assertion sees a phantom q=Au call. fill('') drops a single input event so the empty-query branch wins deterministically — same pattern this test file already uses for fill('Walter'). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>View command line instructions
Checkout
From your project repository, check out a new branch and test the changes.