feat(parser): support // separator and dot-compressed names #208

marcel · 2026-04-07T17:43:29+02:00

marcel commented

2026-04-07 17:43:29 +02:00

Summary

Closes #190. Two enhancements to PersonNameParser:

// pre-split in parseReceivers() — splits on // before existing logic, so each segment recurses through the full pipeline independently. Handles mixed cases like Herbert und Clara Cram//Eugenie de Gruyter naturally.
Dot-normalization in split() — inserts spaces after dots when the name has no spaces but contains dots (E.Rockstroh -> E. Rockstroh), so the existing last-space fallback handles the rest.

Changes

File	Change
`PersonNameParser.java`	`SLASH_SEPARATOR` constant + pre-split step; dot-normalization step
`PersonNameParserTest.java`	12 new tests (5 slash, 5 dot-compressed, 2 regression/interaction)

Test plan

All 35 PersonNameParserTest tests green (was 23)
Backend builds cleanly (mvnw clean package -DskipTests)
Run full backend test suite to confirm no regressions

## Summary Closes #190. Two enhancements to `PersonNameParser`: - **`//` pre-split in `parseReceivers()`** — splits on `//` before existing logic, so each segment recurses through the full pipeline independently. Handles mixed cases like `Herbert und Clara Cram//Eugenie de Gruyter` naturally. - **Dot-normalization in `split()`** — inserts spaces after dots when the name has no spaces but contains dots (`E.Rockstroh` -> `E. Rockstroh`), so the existing last-space fallback handles the rest. ## Changes | File | Change | |---|---| | `PersonNameParser.java` | `SLASH_SEPARATOR` constant + pre-split step; dot-normalization step | | `PersonNameParserTest.java` | 12 new tests (5 slash, 5 dot-compressed, 2 regression/interaction) | ## Test plan - [x] All 35 `PersonNameParserTest` tests green (was 23) - [x] Backend builds cleanly (`mvnw clean package -DskipTests`) - [ ] Run full backend test suite to confirm no regressions

marcel added 3 commits 2026-04-07 17:43:30 +02:00

feat(parser): support // as multi-person separator in parseReceivers 59475efbcb

Pre-splits input on "//" before existing logic so each segment is
processed independently through the full pipeline (und/u splitting,
last-name distribution, etc.).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(parser): normalize dot-compressed names in split() 0b57717586

Inserts spaces after dots when the cleaned name has no spaces but
contains dots, so the existing last-space fallback handles names
like "E.Rockstroh" and "Dr.Fr.Zarncke" correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test(parser): add regression and cross-feature interaction tests

CI / Unit & Component Tests (pull_request) Failing after 3s

Details

CI / Backend Unit Tests (pull_request) Failing after 2s

Details

CI / Unit & Component Tests (push) Failing after 4s

Details

CI / Backend Unit Tests (push) Failing after 3s

Details

d6e74972eb

Regression test confirms already-spaced dot names are not double-spaced.
Interaction test confirms // separator works with dot-compressed names.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel commented

2026-04-07 17:44:16 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ✅ Approved

TDD Evidence

Red/green/refactor cycle followed correctly across 3 commits:
1. Slash separator tests (RED) + implementation (GREEN) in one commit
2. Dot-compressed tests (RED) + implementation (GREEN) in one commit
3. Regression/interaction tests in a third commit
12 new tests total, all with descriptive sentence-style names (slashSeparator_segmentContainsUnd, split_dotCompressed_titleFirstNameLastName)
Tests fail for exactly one reason each — clean assertion failures, not NPEs

Clean Code Check

Naming: SLASH_SEPARATOR is clear and follows the existing pattern (GEB_PATTERN, PAREN_LAST_NAME, MULTI_SEPARATOR). Good.
Function size: Neither parseReceivers() nor split() grew beyond reasonable length. The pre-split adds 7 lines, the dot-normalization adds 3 lines. No splitting needed.
Guard clauses: The slashParts.length > 1 check is a clean early return. The !cleaned.contains(" ") && cleaned.contains(".") guard is tight and only fires for genuinely compressed names.
No dead code: Nothing commented out, no unused variables.
Comments: The two inline comments ("Pre-split on //" and "Normalize dot-compressed names") explain what and why concisely. Acceptable for non-obvious normalization logic.

Specific Observations

SLASH_SEPARATOR.split(raw, -1) — the -1 limit is necessary to handle trailing // correctly (Java's Pattern.split drops trailing empty strings by default). Good catch during implementation.
The recursive approach in Part 1 reuses the full parseReceivers pipeline per segment — no logic duplication. This is the right call.
The cleaned.replace(".", ". ").trim() in Part 2 is KISS at its best — 3 lines that normalize input so the existing last-space fallback handles everything.
parseReceivers_dotCompressedName_passthrough confirms layer separation — parseReceivers doesn't try to split on dots, leaving that to split() downstream.

No Issues Found

Clean, minimal implementation. Both changes stay within PersonNameParser with no cross-domain impact.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** ### TDD Evidence - Red/green/refactor cycle followed correctly across 3 commits: 1. Slash separator tests (RED) + implementation (GREEN) in one commit 2. Dot-compressed tests (RED) + implementation (GREEN) in one commit 3. Regression/interaction tests in a third commit - 12 new tests total, all with descriptive sentence-style names (`slashSeparator_segmentContainsUnd`, `split_dotCompressed_titleFirstNameLastName`) - Tests fail for exactly one reason each — clean assertion failures, not NPEs ### Clean Code Check - **Naming**: `SLASH_SEPARATOR` is clear and follows the existing pattern (`GEB_PATTERN`, `PAREN_LAST_NAME`, `MULTI_SEPARATOR`). Good. - **Function size**: Neither `parseReceivers()` nor `split()` grew beyond reasonable length. The pre-split adds 7 lines, the dot-normalization adds 3 lines. No splitting needed. - **Guard clauses**: The `slashParts.length > 1` check is a clean early return. The `!cleaned.contains(" ") && cleaned.contains(".")` guard is tight and only fires for genuinely compressed names. - **No dead code**: Nothing commented out, no unused variables. - **Comments**: The two inline comments ("Pre-split on //" and "Normalize dot-compressed names") explain *what* and *why* concisely. Acceptable for non-obvious normalization logic. ### Specific Observations - `SLASH_SEPARATOR.split(raw, -1)` — the `-1` limit is necessary to handle trailing `//` correctly (Java's `Pattern.split` drops trailing empty strings by default). Good catch during implementation. - The recursive approach in Part 1 reuses the full `parseReceivers` pipeline per segment — no logic duplication. This is the right call. - The `cleaned.replace(".", ". ").trim()` in Part 2 is KISS at its best — 3 lines that normalize input so the existing last-space fallback handles everything. - `parseReceivers_dotCompressedName_passthrough` confirms layer separation — `parseReceivers` doesn't try to split on dots, leaving that to `split()` downstream. ### No Issues Found Clean, minimal implementation. Both changes stay within `PersonNameParser` with no cross-domain impact.

marcel commented

2026-04-07 17:44:28 +02:00

🏗️ Markus Keller — Application Architect

Verdict: ✅ Approved

Architecture Check

Both changes are confined to PersonNameParser.java, a pure static utility class in the service layer. No new classes, no schema changes, no API surface changes. This is the correct scope for a parser enhancement.
No cross-domain impact: parseReceivers() is called from MassImportService, split() from PersonService.findOrCreateByAlias(). Neither change is visible to callers — input/output contracts are unchanged.
No layer violations: The parser stays in the service layer where it belongs. No repository access, no controller logic, no frontend changes needed.

Design Observations

Part 1 (// pre-split): The recursive approach is architecturally sound — each //-segment re-enters parseReceivers(), so all existing normalization (geb. stripping, und/u splitting, last-name distribution, parenthesized overrides) applies to each segment automatically. No logic duplication.
Part 2 (dot-normalization): Normalizing input to match what the existing split() already handles, rather than adding a parallel code path. Method branching complexity stays flat.
The static final Pattern SLASH_SEPARATOR is compiled once — no runtime regex compilation overhead.

Complexity Assessment

PersonNameParser now has 4 normalization steps in parseReceivers() (null check, // pre-split, geb. stripping, und/u splitting) and 3 in split() (null check, geb. stripping, dot-normalization). This is manageable. If more steps are added in the future, consider extracting each into a named method and composing them as a pipeline — but that refactor is not needed yet (KISS).

## 🏗️ Markus Keller — Application Architect **Verdict: ✅ Approved** ### Architecture Check - Both changes are confined to `PersonNameParser.java`, a pure static utility class in the service layer. No new classes, no schema changes, no API surface changes. This is the correct scope for a parser enhancement. - **No cross-domain impact**: `parseReceivers()` is called from `MassImportService`, `split()` from `PersonService.findOrCreateByAlias()`. Neither change is visible to callers — input/output contracts are unchanged. - **No layer violations**: The parser stays in the service layer where it belongs. No repository access, no controller logic, no frontend changes needed. ### Design Observations - **Part 1 (// pre-split)**: The recursive approach is architecturally sound — each `//`-segment re-enters `parseReceivers()`, so all existing normalization (geb. stripping, und/u splitting, last-name distribution, parenthesized overrides) applies to each segment automatically. No logic duplication. - **Part 2 (dot-normalization)**: Normalizing input to match what the existing `split()` already handles, rather than adding a parallel code path. Method branching complexity stays flat. - The `static final Pattern SLASH_SEPARATOR` is compiled once — no runtime regex compilation overhead. ### Complexity Assessment `PersonNameParser` now has 4 normalization steps in `parseReceivers()` (null check, `//` pre-split, geb. stripping, und/u splitting) and 3 in `split()` (null check, geb. stripping, dot-normalization). This is manageable. If more steps are added in the future, consider extracting each into a named method and composing them as a pipeline — but that refactor is not needed yet (KISS).

marcel commented

2026-04-07 17:44:39 +02:00

🧪 Sara Holt — QA Engineer

Verdict: ✅ Approved

Test Coverage Analysis

12 new tests added, bringing PersonNameParserTest from 23 to 35 tests. All at the unit layer — correct placement for pure parsing logic.
Tests are well-structured with clear separation:
- 5 tests for // splitting in parseReceivers()
- 5 tests for dot-normalization in split()
- 1 regression test confirming already-spaced names aren't double-spaced
- 1 cross-feature interaction test (E.Rockstroh//Dr.Fr.Zarncke)

Test Quality

Descriptive names: Every test reads as a sentence — slashSeparator_segmentContainsUnd, split_dotCompressed_titleFirstNameLastName. Good.
One assertion per logical behavior: Each test asserts one specific parsing outcome.
Edge cases covered:
- Trailing // (blank filter works)
- Spaces around // (trimming works)
- Mixed // with und splitting (recursion works)
- Already-spaced dot names (guard prevents double-spacing)
- Layer separation (parseReceivers doesn't split on dots)

Suggested Additional Coverage (non-blocking)

Leading //: //Herbert Cram — the blank filter should handle it, but an explicit test documents the contract.
Multiple consecutive //: Herbert Cram////Eugenie — do empty segments get filtered cleanly?
E..Rockstroh (consecutive dots) — after replace(".", ". ") becomes E. . Rockstroh. Does the fallback handle the extra whitespace? Likely yes due to trim() and last-space logic, but worth a test.

These are nice-to-haves, not blockers. The current coverage is solid for the specified requirements.

## 🧪 Sara Holt — QA Engineer **Verdict: ✅ Approved** ### Test Coverage Analysis - **12 new tests** added, bringing `PersonNameParserTest` from 23 to 35 tests. All at the unit layer — correct placement for pure parsing logic. - Tests are well-structured with clear separation: - 5 tests for `//` splitting in `parseReceivers()` - 5 tests for dot-normalization in `split()` - 1 regression test confirming already-spaced names aren't double-spaced - 1 cross-feature interaction test (`E.Rockstroh//Dr.Fr.Zarncke`) ### Test Quality - **Descriptive names**: Every test reads as a sentence — `slashSeparator_segmentContainsUnd`, `split_dotCompressed_titleFirstNameLastName`. Good. - **One assertion per logical behavior**: Each test asserts one specific parsing outcome. - **Edge cases covered**: - Trailing `//` (blank filter works) - Spaces around `//` (trimming works) - Mixed `//` with `und` splitting (recursion works) - Already-spaced dot names (guard prevents double-spacing) - Layer separation (`parseReceivers` doesn't split on dots) ### Suggested Additional Coverage (non-blocking) - **Leading `//`**: `//Herbert Cram` — the blank filter should handle it, but an explicit test documents the contract. - **Multiple consecutive `//`**: `Herbert Cram////Eugenie` — do empty segments get filtered cleanly? - **`E..Rockstroh`** (consecutive dots) — after `replace(".", ". ")` becomes `E. . Rockstroh`. Does the fallback handle the extra whitespace? Likely yes due to `trim()` and last-space logic, but worth a test. These are nice-to-haves, not blockers. The current coverage is solid for the specified requirements.

marcel commented

2026-04-07 17:44:48 +02:00

🔒 Nora "NullX" Steiner — Security Engineer

Verdict: ✅ Approved

Attack Surface Analysis

Both changes are pure parsing logic in a backend utility class that processes spreadsheet data from authenticated admin users (mass import). This is not processing untrusted external input in a web request context.
No new endpoints, no new user input vectors, no changes to authentication or authorization.

Specific Checks

ReDoS risk: SLASH_SEPARATOR = Pattern.compile("//") is a literal pattern with no quantifiers — zero ReDoS risk. The existing MULTI_SEPARATOR pattern (\\s+(?:und|u)\\s+) is also safe — bounded alternation with no nested quantifiers.
Recursion depth: The recursive call parseReceivers(segment) is bounded by the number of //-segments in a single spreadsheet cell, which is naturally small. No stack overflow risk from realistic input. A pathological input with thousands of // separators would need to come from a malicious spreadsheet uploaded by an authenticated admin — extremely low risk.
String replacement: cleaned.replace(".", ". ") is a literal string replacement, not regex — no injection vector.
No new dependencies: Only java.util.Arrays was added as an import.

No Security Concerns

Both changes are confined to a trusted-input parser within the service layer. No new attack vectors introduced.

## 🔒 Nora "NullX" Steiner — Security Engineer **Verdict: ✅ Approved** ### Attack Surface Analysis - Both changes are pure parsing logic in a backend utility class that processes spreadsheet data from authenticated admin users (mass import). This is **not** processing untrusted external input in a web request context. - No new endpoints, no new user input vectors, no changes to authentication or authorization. ### Specific Checks - **ReDoS risk**: `SLASH_SEPARATOR = Pattern.compile("//")` is a literal pattern with no quantifiers — zero ReDoS risk. The existing `MULTI_SEPARATOR` pattern (`\\s+(?:und|u)\\s+`) is also safe — bounded alternation with no nested quantifiers. - **Recursion depth**: The recursive call `parseReceivers(segment)` is bounded by the number of `//`-segments in a single spreadsheet cell, which is naturally small. No stack overflow risk from realistic input. A pathological input with thousands of `//` separators would need to come from a malicious spreadsheet uploaded by an authenticated admin — extremely low risk. - **String replacement**: `cleaned.replace(".", ". ")` is a literal string replacement, not regex — no injection vector. - **No new dependencies**: Only `java.util.Arrays` was added as an import. ### No Security Concerns Both changes are confined to a trusted-input parser within the service layer. No new attack vectors introduced.

marcel commented

2026-04-07 17:44:54 +02:00

🎨 Leonie Voss — UI/UX Design Lead

Verdict: ✅ Approved

Assessment

This is a backend-only parsing change with no direct UI impact. No frontend components, routes, styles, or user-facing text are modified.

Indirect UX benefit: Once these names are correctly parsed, person lists and document metadata will show clean names (E. Rockstroh, Dr. Fr. Zarncke) instead of garbage entries with lastName = "?". This improves data quality throughout the UI without any frontend code changes.

The // separator and dot-compressed patterns are conventions within the import spreadsheet, not something end users interact with through the web interface.

No design concerns.

## 🎨 Leonie Voss — UI/UX Design Lead **Verdict: ✅ Approved** ### Assessment This is a backend-only parsing change with no direct UI impact. No frontend components, routes, styles, or user-facing text are modified. **Indirect UX benefit**: Once these names are correctly parsed, person lists and document metadata will show clean names (`E. Rockstroh`, `Dr. Fr. Zarncke`) instead of garbage entries with `lastName = "?"`. This improves data quality throughout the UI without any frontend code changes. The `//` separator and dot-compressed patterns are conventions within the import spreadsheet, not something end users interact with through the web interface. No design concerns.

marcel commented

2026-04-07 17:44:59 +02:00

🛠️ Tobias Wendt — DevOps Engineer

Verdict: ✅ Approved

Infrastructure Impact

Pure Java code change — no new dependencies, no config changes, no Docker, CI, or deployment modifications needed.
No new Maven dependencies added (java.util.Arrays is JDK standard library).
The verification command (./mvnw test -Dtest=PersonNameParserTest) runs fast unit tests only — no Testcontainers or external services required. Zero CI time impact.
Both parts touch the same file and test file across 3 atomic commits — clean for review and bisect.

No concerns from my angle.

## 🛠️ Tobias Wendt — DevOps Engineer **Verdict: ✅ Approved** ### Infrastructure Impact - Pure Java code change — no new dependencies, no config changes, no Docker, CI, or deployment modifications needed. - No new Maven dependencies added (`java.util.Arrays` is JDK standard library). - The verification command (`./mvnw test -Dtest=PersonNameParserTest`) runs fast unit tests only — no Testcontainers or external services required. Zero CI time impact. - Both parts touch the same file and test file across 3 atomic commits — clean for review and bisect. No concerns from my angle.

marcel merged commit d6e74972eb into main

2026-04-08 18:46:53 +02:00

marcel deleted branch feat/issue-190-slash-separator-dot-compressed

2026-04-08 18:46:58 +02:00

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: marcel/familienarchiv#208