feat(massimport): support // as multi-person separator in PersonNameParser #182

Closed
opened 2026-04-06 13:52:40 +02:00 by marcel · 6 comments
Owner

Problem

The mass import spreadsheet uses // to separate independent full names in a single cell, but PersonNameParser.parseReceivers() only splits on und/u. This causes both names to be imported as a single garbage person record.

Examples currently broken:

  • Charl.Blomquist//Tante Lolly → one person instead of two
  • Walter de Gruyter//Eugenie de Gruyter → one person instead of two

Solution

Pre-split the raw input on // before any existing logic. Each //-segment is then piped through the current pipeline independently, and results are flattened.

Unlike und/u, // always separates fully independent, complete names — no shared last-name distribution is needed. The recursive approach handles mixed cases like Herbert und Clara Cram//Eugenie de Gruyter naturally.

Changes

PersonNameParser.java

Add a SLASH_SEPARATOR pattern and a pre-split step at the top of parseReceivers():

private static final Pattern SLASH_SEPARATOR = Pattern.compile("//");

// In parseReceivers(), after null/blank check:
String[] slashParts = SLASH_SEPARATOR.split(raw);
if (slashParts.length > 1) {
    return Arrays.stream(slashParts)
            .map(String::trim)
            .filter(s -> !s.isBlank())
            .flatMap(segment -> parseReceivers(segment).stream())
            .toList();
}

PersonNameParserTest.java

4 new tests (TDD — red first):

  • slashSeparator_twoIndependentFullNames
  • slashSeparator_abbreviatedFirstName
  • slashSeparator_withSpacesAroundSlashes
  • slashSeparator_segmentContainsUnd

Files

File Change
backend/src/main/java/.../service/PersonNameParser.java Add constant + pre-split step
backend/src/test/java/.../service/PersonNameParserTest.java 4 new tests

No schema, API, or i18n changes needed.

Verification

cd backend && ./mvnw test -Dtest=PersonNameParserTest
## Problem The mass import spreadsheet uses `//` to separate independent full names in a single cell, but `PersonNameParser.parseReceivers()` only splits on `und`/`u`. This causes both names to be imported as a single garbage person record. Examples currently broken: - `Charl.Blomquist//Tante Lolly` → one person instead of two - `Walter de Gruyter//Eugenie de Gruyter` → one person instead of two ## Solution Pre-split the raw input on `//` **before** any existing logic. Each `//`-segment is then piped through the current pipeline independently, and results are flattened. Unlike `und`/`u`, `//` always separates fully independent, complete names — no shared last-name distribution is needed. The recursive approach handles mixed cases like `Herbert und Clara Cram//Eugenie de Gruyter` naturally. ## Changes ### `PersonNameParser.java` Add a `SLASH_SEPARATOR` pattern and a pre-split step at the top of `parseReceivers()`: ```java private static final Pattern SLASH_SEPARATOR = Pattern.compile("//"); // In parseReceivers(), after null/blank check: String[] slashParts = SLASH_SEPARATOR.split(raw); if (slashParts.length > 1) { return Arrays.stream(slashParts) .map(String::trim) .filter(s -> !s.isBlank()) .flatMap(segment -> parseReceivers(segment).stream()) .toList(); } ``` ### `PersonNameParserTest.java` 4 new tests (TDD — red first): - `slashSeparator_twoIndependentFullNames` - `slashSeparator_abbreviatedFirstName` - `slashSeparator_withSpacesAroundSlashes` - `slashSeparator_segmentContainsUnd` ## Files | File | Change | |---|---| | `backend/src/main/java/.../service/PersonNameParser.java` | Add constant + pre-split step | | `backend/src/test/java/.../service/PersonNameParserTest.java` | 4 new tests | No schema, API, or i18n changes needed. ## Verification ```bash cd backend && ./mvnw test -Dtest=PersonNameParserTest ```
marcel added the feature label 2026-04-06 13:52:46 +02:00
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Questions & Observations

  • The recursive approach (flatMap(segment -> parseReceivers(segment).stream())) is elegant and handles the Herbert und Clara Cram//Eugenie de Gruyter mixed case naturally. Good design — the recursion lets each segment reuse all existing logic without duplication.
  • The 4 proposed test names are well-structured and follow the [feature]_[scenario] naming convention. Each covers a distinct dimension of the feature.
  • I notice the issue doesn't mention an edge case: what happens with trailing or leading //? E.g., //Herbert Cram or Herbert Cram//. The .filter(s -> !s.isBlank()) handles this, but an explicit test confirming the behavior would make the contract clear.
  • What about a single // with no names around it (i.e., //)? Again, the blank filter should handle it, but a test documents intent.

Suggestions

  • Consider adding a 5th test: slashSeparator_trailingOrLeadingSlash_ignored — to document that //Herbert Cram produces one person, not two. This makes the guard clause's behavior explicit and regression-proof.
  • The SLASH_SEPARATOR constant name is good — it reveals intent. Since Pattern.compile is used, confirm it's static final so the regex is compiled once (as shown in the issue — just calling it out for the implementer).
  • The pre-split step should be the very first thing after the null/blank check, before any other normalization. The issue says "after null/blank check" — just confirming this is the right place so that // splitting happens on raw input, not on input that's already been partially normalized.
## 👨‍💻 Felix Brandt — Senior Fullstack Developer ### Questions & Observations - The recursive approach (`flatMap(segment -> parseReceivers(segment).stream())`) is elegant and handles the `Herbert und Clara Cram//Eugenie de Gruyter` mixed case naturally. Good design — the recursion lets each segment reuse all existing logic without duplication. - The 4 proposed test names are well-structured and follow the `[feature]_[scenario]` naming convention. Each covers a distinct dimension of the feature. - I notice the issue doesn't mention an edge case: what happens with trailing or leading `//`? E.g., `//Herbert Cram` or `Herbert Cram//`. The `.filter(s -> !s.isBlank())` handles this, but an explicit test confirming the behavior would make the contract clear. - What about a single `//` with no names around it (i.e., `//`)? Again, the blank filter should handle it, but a test documents intent. ### Suggestions - Consider adding a 5th test: `slashSeparator_trailingOrLeadingSlash_ignored` — to document that `//Herbert Cram` produces one person, not two. This makes the guard clause's behavior explicit and regression-proof. - The `SLASH_SEPARATOR` constant name is good — it reveals intent. Since `Pattern.compile` is used, confirm it's `static final` so the regex is compiled once (as shown in the issue — just calling it out for the implementer). - The pre-split step should be the **very first thing** after the null/blank check, before any other normalization. The issue says "after null/blank check" — just confirming this is the right place so that `//` splitting happens on raw input, not on input that's already been partially normalized.
Author
Owner

🏗️ Markus Keller — Application Architect

Questions & Observations

  • The change is well-scoped: a single method gets a 6-line pre-split step, no new classes, no new interfaces, no schema changes. This is the right level of intervention for a parser enhancement.
  • The recursive design is architecturally sound — each //-segment re-enters parseReceivers(), so all existing logic (und/u splitting, last-name distribution) is reused without duplication. No new code paths to maintain.
  • No cross-domain impact: PersonNameParser is a pure utility within the service layer, no repository or controller changes needed.

Suggestions

  • One thing to confirm: does the spreadsheet data ever contain a single / as a separator (as opposed to //)? If so, the // pattern won't match, and those cases will silently pass through as single names. Worth checking the actual import data to be sure // is the only slash-based separator in use.
  • The Pattern.compile("//") is straightforward, but since // has no special regex meaning, String.split("//") would also work and avoids the compiled pattern overhead. That said, for a utility called at import time (not hot path), the difference is negligible — either approach is fine. The compiled pattern is arguably more readable since it sits as a named constant.
## 🏗️ Markus Keller — Application Architect ### Questions & Observations - The change is well-scoped: a single method gets a 6-line pre-split step, no new classes, no new interfaces, no schema changes. This is the right level of intervention for a parser enhancement. - The recursive design is architecturally sound — each `//`-segment re-enters `parseReceivers()`, so all existing logic (und/u splitting, last-name distribution) is reused without duplication. No new code paths to maintain. - No cross-domain impact: `PersonNameParser` is a pure utility within the service layer, no repository or controller changes needed. ### Suggestions - One thing to confirm: does the spreadsheet data ever contain a single `/` as a separator (as opposed to `//`)? If so, the `//` pattern won't match, and those cases will silently pass through as single names. Worth checking the actual import data to be sure `//` is the only slash-based separator in use. - The `Pattern.compile("//")` is straightforward, but since `//` has no special regex meaning, `String.split("//")` would also work and avoids the compiled pattern overhead. That said, for a utility called at import time (not hot path), the difference is negligible — either approach is fine. The compiled pattern is arguably more readable since it sits as a named constant.
Author
Owner

🧪 Sara Holt — QA Engineer

Questions & Observations

  • The 4 proposed tests cover the core scenarios well: two independent names, abbreviated first name, whitespace tolerance, and mixed // + und. Good coverage of the feature's interaction with existing logic.
  • Missing edge case tests I'd want to see:
    • Empty segment: Herbert Cram// or //Herbert Cram — does the blank filter correctly discard the empty segment?
    • Multiple consecutive separators: Herbert Cram////Eugenie de Gruyter — does this produce empty segments that get filtered, or does it break?
    • Single name with no separator: Regression test to confirm that inputs without // still pass through unchanged.

Suggestions

  • The test slashSeparator_segmentContainsUnd is the most valuable of the four — it tests the interaction between the new // split and the existing und split. I'd make this the first test written (red phase) since it exercises the full recursive pipeline.
  • Consider a regression test that runs all existing parseReceivers tests after the change to confirm no behavior change for inputs without //. This is probably already covered by running the full test class, but worth explicitly noting in the PR.
  • Verification command ./mvnw test -Dtest=PersonNameParserTest is correct. I'd also recommend running the full MassImportService test suite (if it exists) to catch any integration-level regressions.
## 🧪 Sara Holt — QA Engineer ### Questions & Observations - The 4 proposed tests cover the core scenarios well: two independent names, abbreviated first name, whitespace tolerance, and mixed `//` + `und`. Good coverage of the feature's interaction with existing logic. - Missing edge case tests I'd want to see: - **Empty segment**: `Herbert Cram//` or `//Herbert Cram` — does the blank filter correctly discard the empty segment? - **Multiple consecutive separators**: `Herbert Cram////Eugenie de Gruyter` — does this produce empty segments that get filtered, or does it break? - **Single name with no separator**: Regression test to confirm that inputs without `//` still pass through unchanged. ### Suggestions - The test `slashSeparator_segmentContainsUnd` is the most valuable of the four — it tests the interaction between the new `//` split and the existing `und` split. I'd make this the first test written (red phase) since it exercises the full recursive pipeline. - Consider a regression test that runs all existing `parseReceivers` tests after the change to confirm no behavior change for inputs without `//`. This is probably already covered by running the full test class, but worth explicitly noting in the PR. - Verification command `./mvnw test -Dtest=PersonNameParserTest` is correct. I'd also recommend running the full `MassImportService` test suite (if it exists) to catch any integration-level regressions.
Author
Owner

🔒 Nora "NullX" Steiner — Security Engineer

Questions & Observations

  • This is a pure parsing change in a backend utility class that processes spreadsheet data from authenticated admin users (mass import). The attack surface is minimal — this is not processing untrusted external input in a web request context.
  • No new endpoints, no new user input vectors, no changes to authentication or authorization.
  • The recursive call parseReceivers(segment) is bounded by the number of //-segments in a single spreadsheet cell, which is naturally small. No risk of stack overflow from malicious input at realistic cell sizes.

Suggestions

  • No security concerns from my angle. The change is confined to a trusted-input parser within the service layer, and the recursive depth is bounded by the input data (spreadsheet cell content). No new attack vectors introduced.
## 🔒 Nora "NullX" Steiner — Security Engineer ### Questions & Observations - This is a pure parsing change in a backend utility class that processes spreadsheet data from authenticated admin users (mass import). The attack surface is minimal — this is not processing untrusted external input in a web request context. - No new endpoints, no new user input vectors, no changes to authentication or authorization. - The recursive call `parseReceivers(segment)` is bounded by the number of `//`-segments in a single spreadsheet cell, which is naturally small. No risk of stack overflow from malicious input at realistic cell sizes. ### Suggestions - No security concerns from my angle. The change is confined to a trusted-input parser within the service layer, and the recursive depth is bounded by the input data (spreadsheet cell content). No new attack vectors introduced.
Author
Owner

🎨 Leonie Voss — UI/UX Design Lead

Questions & Observations

  • This is a backend-only parsing change with no UI impact. No frontend components, routes, or user-facing text are affected.
  • The // separator is a convention within the import spreadsheet, not something end users interact with through the web UI.

Suggestions

  • No concerns from my angle. I checked the scope — this change is entirely within PersonNameParser.java and its test file. No UI, accessibility, or design implications.
## 🎨 Leonie Voss — UI/UX Design Lead ### Questions & Observations - This is a backend-only parsing change with no UI impact. No frontend components, routes, or user-facing text are affected. - The `//` separator is a convention within the import spreadsheet, not something end users interact with through the web UI. ### Suggestions - No concerns from my angle. I checked the scope — this change is entirely within `PersonNameParser.java` and its test file. No UI, accessibility, or design implications.
Author
Owner

🛠️ Tobias Wendt — DevOps Engineer

Questions & Observations

  • Pure Java code change, no infrastructure impact. No new dependencies, no config changes, no Docker or CI modifications needed.
  • The verification command (./mvnw test -Dtest=PersonNameParserTest) is a fast unit test — no Testcontainers or external services required. CI time impact is negligible.

Suggestions

  • No concerns from my angle. This is a self-contained parser enhancement that doesn't touch infrastructure, dependencies, or deployment configuration.
## 🛠️ Tobias Wendt — DevOps Engineer ### Questions & Observations - Pure Java code change, no infrastructure impact. No new dependencies, no config changes, no Docker or CI modifications needed. - The verification command (`./mvnw test -Dtest=PersonNameParserTest`) is a fast unit test — no Testcontainers or external services required. CI time impact is negligible. ### Suggestions - No concerns from my angle. This is a self-contained parser enhancement that doesn't touch infrastructure, dependencies, or deployment configuration.
Sign in to join this conversation.
No Label feature
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#182