feat(parser): support // separator and dot-compressed names #208
Reference in New Issue
Block a user
Delete Branch "feat/issue-190-slash-separator-dot-compressed"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Closes #190. Two enhancements to
PersonNameParser://pre-split inparseReceivers()— splits on//before existing logic, so each segment recurses through the full pipeline independently. Handles mixed cases likeHerbert und Clara Cram//Eugenie de Gruyternaturally.split()— inserts spaces after dots when the name has no spaces but contains dots (E.Rockstroh->E. Rockstroh), so the existing last-space fallback handles the rest.Changes
PersonNameParser.javaSLASH_SEPARATORconstant + pre-split step; dot-normalization stepPersonNameParserTest.javaTest plan
PersonNameParserTesttests green (was 23)mvnw clean package -DskipTests)👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
TDD Evidence
slashSeparator_segmentContainsUnd,split_dotCompressed_titleFirstNameLastName)Clean Code Check
SLASH_SEPARATORis clear and follows the existing pattern (GEB_PATTERN,PAREN_LAST_NAME,MULTI_SEPARATOR). Good.parseReceivers()norsplit()grew beyond reasonable length. The pre-split adds 7 lines, the dot-normalization adds 3 lines. No splitting needed.slashParts.length > 1check is a clean early return. The!cleaned.contains(" ") && cleaned.contains(".")guard is tight and only fires for genuinely compressed names.Specific Observations
SLASH_SEPARATOR.split(raw, -1)— the-1limit is necessary to handle trailing//correctly (Java'sPattern.splitdrops trailing empty strings by default). Good catch during implementation.parseReceiverspipeline per segment — no logic duplication. This is the right call.cleaned.replace(".", ". ").trim()in Part 2 is KISS at its best — 3 lines that normalize input so the existing last-space fallback handles everything.parseReceivers_dotCompressedName_passthroughconfirms layer separation —parseReceiversdoesn't try to split on dots, leaving that tosplit()downstream.No Issues Found
Clean, minimal implementation. Both changes stay within
PersonNameParserwith no cross-domain impact.🏗️ Markus Keller — Application Architect
Verdict: ✅ Approved
Architecture Check
PersonNameParser.java, a pure static utility class in the service layer. No new classes, no schema changes, no API surface changes. This is the correct scope for a parser enhancement.parseReceivers()is called fromMassImportService,split()fromPersonService.findOrCreateByAlias(). Neither change is visible to callers — input/output contracts are unchanged.Design Observations
//-segment re-entersparseReceivers(), so all existing normalization (geb. stripping, und/u splitting, last-name distribution, parenthesized overrides) applies to each segment automatically. No logic duplication.split()already handles, rather than adding a parallel code path. Method branching complexity stays flat.static final Pattern SLASH_SEPARATORis compiled once — no runtime regex compilation overhead.Complexity Assessment
PersonNameParsernow has 4 normalization steps inparseReceivers()(null check,//pre-split, geb. stripping, und/u splitting) and 3 insplit()(null check, geb. stripping, dot-normalization). This is manageable. If more steps are added in the future, consider extracting each into a named method and composing them as a pipeline — but that refactor is not needed yet (KISS).🧪 Sara Holt — QA Engineer
Verdict: ✅ Approved
Test Coverage Analysis
PersonNameParserTestfrom 23 to 35 tests. All at the unit layer — correct placement for pure parsing logic.//splitting inparseReceivers()split()E.Rockstroh//Dr.Fr.Zarncke)Test Quality
slashSeparator_segmentContainsUnd,split_dotCompressed_titleFirstNameLastName. Good.//(blank filter works)//(trimming works)//withundsplitting (recursion works)parseReceiversdoesn't split on dots)Suggested Additional Coverage (non-blocking)
//://Herbert Cram— the blank filter should handle it, but an explicit test documents the contract.//:Herbert Cram////Eugenie— do empty segments get filtered cleanly?E..Rockstroh(consecutive dots) — afterreplace(".", ". ")becomesE. . Rockstroh. Does the fallback handle the extra whitespace? Likely yes due totrim()and last-space logic, but worth a test.These are nice-to-haves, not blockers. The current coverage is solid for the specified requirements.
🔒 Nora "NullX" Steiner — Security Engineer
Verdict: ✅ Approved
Attack Surface Analysis
Specific Checks
SLASH_SEPARATOR = Pattern.compile("//")is a literal pattern with no quantifiers — zero ReDoS risk. The existingMULTI_SEPARATORpattern (\\s+(?:und|u)\\s+) is also safe — bounded alternation with no nested quantifiers.parseReceivers(segment)is bounded by the number of//-segments in a single spreadsheet cell, which is naturally small. No stack overflow risk from realistic input. A pathological input with thousands of//separators would need to come from a malicious spreadsheet uploaded by an authenticated admin — extremely low risk.cleaned.replace(".", ". ")is a literal string replacement, not regex — no injection vector.java.util.Arrayswas added as an import.No Security Concerns
Both changes are confined to a trusted-input parser within the service layer. No new attack vectors introduced.
🎨 Leonie Voss — UI/UX Design Lead
Verdict: ✅ Approved
Assessment
This is a backend-only parsing change with no direct UI impact. No frontend components, routes, styles, or user-facing text are modified.
Indirect UX benefit: Once these names are correctly parsed, person lists and document metadata will show clean names (
E. Rockstroh,Dr. Fr. Zarncke) instead of garbage entries withlastName = "?". This improves data quality throughout the UI without any frontend code changes.The
//separator and dot-compressed patterns are conventions within the import spreadsheet, not something end users interact with through the web interface.No design concerns.
🛠️ Tobias Wendt — DevOps Engineer
Verdict: ✅ Approved
Infrastructure Impact
java.util.Arraysis JDK standard library)../mvnw test -Dtest=PersonNameParserTest) runs fast unit tests only — no Testcontainers or external services required. Zero CI time impact.No concerns from my angle.