feat(massimport): support // separator and dot-compressed names in PersonNameParser #190
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Problem
The mass import spreadsheet contains two name patterns that
PersonNameParsercurrently handles incorrectly:1.
//as multi-person separatorparseReceivers()only splits onund/u. The//separator — used for independent full names in a single cell — causes both names to be imported as a single garbage person record.Charl.Blomquist//Tante LollyWalter de Gruyter//Eugenie de Gruyter2. Dot-compressed names (no spaces)
The
split()fallback only handles space-separated names, so dot-compressed names producelastName = "?":E.Rockstroh("E.Rockstroh", "?")("E.", "Rockstroh")E.M.("E.M.", "?")("E.", "M.")Dr.Fr.Zarncke("Dr.Fr.Zarncke", "?")("Dr. Fr.", "Zarncke")Dr.Zarnke("Dr.Zarnke", "?")("Dr.", "Zarnke")Solution
Part 1 —
//pre-split inparseReceivers()Pre-split the raw input on
//before any existing logic. Each segment is then piped through the current pipeline independently, and results are flattened.Unlike
und/u,//always separates fully independent, complete names — no shared last-name distribution is needed. The recursive approach handles mixed cases likeHerbert und Clara Cram//Eugenie de Gruyternaturally.Part 2 — Dot-normalization in
split()After the
geb.stripping step insplit(), add a dot-normalization step that applies only when the cleaned name has no spaces but contains dots:Then the existing known-last-name check and last-space fallback handle the rest:
E.RockstrohE. Rockstroh("E.", "Rockstroh")✓E.M.E. M.("E.", "M.")✓Dr.Fr.ZarnckeDr. Fr. Zarncke("Dr. Fr.", "Zarncke")✓Dr.ZarnkeDr. Zarnke("Dr.", "Zarnke")✓No changes needed to
parseReceivers()for Part 2 —split()is called downstream inPersonService.findOrCreateByAlias().Files
backend/src/main/java/.../service/PersonNameParser.javaSLASH_SEPARATORconstant + pre-split step inparseReceivers(); add dot-normalization step insplit()backend/src/test/java/.../service/PersonNameParserTest.javaNo schema, API, or i18n changes needed.
New Tests
Part 1 —
//separatorPart 2 — Dot-compressed names
Verification
Replaces #182 and #184.
👨💻 Felix Brandt — Senior Fullstack Developer
Questions & Observations
//-segment re-entersparseReceivers(), so all existing logic (und/u splitting, last-name distribution) is reused without duplication. The mixed caseHerbert und Clara Cram//Eugenie de Gruyterworks naturally because recursion handles theundsplit on the left segment independently.replace(".", ". ").trim()) is KISS at its best — it normalizes input so the existing last-space fallback does all the work. No new branching logic, no new code paths.!cleaned.contains(" ") && cleaned.contains(".")is tight — it only fires for genuinely compressed names, avoiding interference with already-spaced names likeDr. Zarncke.SLASH_SEPARATORconstant is well-named. SincePattern.compileis used, confirm it'sstatic finalso the regex is compiled once (as shown — just calling it out for the implementer).Edge Cases Worth Testing
//://Herbert CramorHerbert Cram//— the.filter(s -> !s.isBlank())handles this, but an explicit test documents the contract.M.— after dot-normalization becomesM.→ trimmed toM.. No space → falls through toSplitName("M.", "?"). Is that intentional, or should single-initial inputs be handled as a special case?E.Rockstroh//Dr.Fr.Zarncke— the//split produces two segments, each of which passes throughparseReceiversas a single token. The dot-normalization only happens later whensplit()is called inPersonService.findOrCreateByAlias(). This layering is correct, but a test confirming the full pipeline for this combined case would be valuable.Suggestions
//splitting is the very first operation inparseReceivers().geb.stripping") is equally critical — a name likegeb.Rockstrohshould stripgeb.first, leavingRockstroh(no dots left, normalization doesn't fire). Confirm this ordering.//and one for the combinedE.Rockstroh//Dr.Fr.Zarnckepipeline. These document the interaction between the two parts.🏗️ Markus Keller — Application Architect
Questions & Observations
//-segment re-entersparseReceivers().split()already handles, rather than adding a parallel code path. This keeps method branching complexity flat.PersonNameParseris a pure utility within the service layer;parseReceivers()is called fromMassImportService,split()is called fromPersonService.findOrCreateByAlias(). Neither change is visible to callers.Suggestions
/vs//: Does the spreadsheet data ever contain a single/as a separator? If so, the//pattern won't match and those cases will silently pass through as single names. Worth verifying against actual import data.PersonNameParseris accumulating normalization steps —geb.stripping, known last names, now//pre-split and dot-normalization. Not a problem yet, but if more normalization steps are added in the future, consider extracting each into a named method and composing them as a small pipeline. The current approach (sequential if-checks) is fine for 4 steps; it starts to get hard to reason about at 6+.🧪 Sara Holt — QA Engineer
Questions & Observations
//splitting (inparseReceivers) and 5 tests for dot-normalization (4 insplit()+ 1parseReceiverspassthrough). Good layered coverage.parseReceivers_dotCompressedName_passthroughtest is particularly valuable — it confirms layer separation by proving thatparseReceiverstreatsDr.Fr.Zarnckeas a single token and doesn't try to split on dots.slashSeparator_segmentContainsUndtest exercises the interaction between//splitting andundsplitting via recursion — this is the highest-value test in the set.Missing Edge Case Tests
Part 1 —
//separator:Herbert Cram//or//Herbert Cram— does the blank filter correctly produce one person?Herbert Cram////Eugenie— do empty segments get filtered cleanly?//still pass through unchanged (probably covered by existing tests, but worth noting).Part 2 — Dot-compressed names:
Dr. Fr. Zarncke— the guard!cleaned.contains(" ")should prevent normalization. A regression test confirming no double-spacing is important.Rockstroh.— after normalization becomesRockstroh.→ trimmed toRockstroh.. How does the last-space fallback handle this?E..Rockstroh— afterreplace(".", ". ")becomesE. . Rockstroh. Does the fallback handle the extra whitespace?Cross-part interaction:
E.Rockstroh//Dr.Fr.Zarncke— full pipeline test confirming both features work together end-to-end.Suggestions
PersonNameParserTestclass after implementation, not just the new tests — both changes touch shared code paths, and regressions in existing tests would catch guard condition edge cases.MassImportServiceTest(if it exists) to catch integration-level regressions in the import pipeline.🔒 Nora "NullX" Steiner — Security Engineer
Questions & Observations
parseReceivers(segment)is bounded by the number of//-segments in a single spreadsheet cell, which is naturally small. No risk of stack overflow from realistic input.String.replace(".", ". ")is a literal string replacement, not regex — no risk of ReDoS or regex injection.Suggestions
🎨 Leonie Voss — UI/UX Design Lead
Questions & Observations
E. Rockstroh,Dr. Fr. Zarncke) instead of garbage entries withlastName = "?". This improves data quality throughout the UI without any frontend code changes.//separator and dot-compressed patterns are conventions within the import spreadsheet, not something end users interact with through the web interface.Suggestions
🛠️ Tobias Wendt — DevOps Engineer
Questions & Observations
./mvnw test -Dtest=PersonNameParserTest) runs fast unit tests only — no Testcontainers or external services required. Zero CI time impact.Suggestions
Implementation Complete
All tasks done on branch
feat/issue-190-slash-separator-dot-compressed.Commits
59475effeat(parser): support // as multi-person separator in parseReceivers - Pre-splits on//before existing logic; each segment recurses through the full pipeline. 5 new tests.0b57717feat(parser): normalize dot-compressed names in split() - Inserts spaces after dots when no spaces present, soE.Rockstroh->("E.", "Rockstroh"). 5 new tests.d6e7497test(parser): add regression and cross-feature interaction tests - Confirms already-spaced names aren't double-spaced; confirms//+ dot-compressed work together. 2 new tests.Test Results
PersonNameParserTest(was 23) - all greenmvnw clean package -DskipTests)Files Changed
PersonNameParser.javaSLASH_SEPARATORconstant + pre-split inparseReceivers(); dot-normalization insplit()PersonNameParserTest.java