feat(parser): improve GEB_PATTERN and store maiden name as alias #209

Closed
opened 2026-04-07 18:27:47 +02:00 by marcel · 9 comments
Owner

Problem

GEB_PATTERN (\s+geb\.\s+\S+) is too strict in two ways:

  1. Requires dot after "geb" - fails on Ella Dieckmann, geb de Gruyter (comma, no dot)
  2. Strips only one word - fails on multi-word maiden names like geb de Gruyter (only strips "de", leaves "Gruyter" dangling)

Currently the maiden name is discarded entirely. With the person_name_aliases table from #181, we can preserve it.

Examples

| Input | Current result | Expected result |
|---|---|---|
| `Eugenie de Gruyter geb. Muller` | Stripped, maiden name lost | Strip, create alias `Muller` (type=MAIDEN_NAME) |
| `Ella Dieckmann, geb de Gruyter` | firstName="Ella Dieckmann, geb", lastName="de Gruyter" | Strip, create alias `de Gruyter` (type=MAIDEN_NAME) |
| `Ella Dieckmann geb Wolff` | firstName="Ella Dieckmann geb", lastName="Wolff" | Strip, create alias `Wolff` (type=MAIDEN_NAME) |

Solution

Part 1 - Widen GEB_PATTERN

Change from: \s+geb\.\s+\S+
To: ,?\s*geb\.?\s+.*$

This handles:

  • Optional comma before "geb"
  • Optional dot after "geb"
  • Multi-word maiden names (captures everything after "geb" to end of string)
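A minimal sketch of how the two patterns behave on the example inputs, assuming the parser removes the match with `replaceAll` and trims the remainder. The class and method names are illustrative and not taken from the codebase.

```java
import java.util.regex.Pattern;

// Illustrative only: compares the current and the proposed GEB_PATTERN.
final class GebPatternComparison {
    // Current pattern: requires "geb." with a dot and removes exactly one following token.
    static final Pattern OLD_GEB = Pattern.compile("\\s+geb\\.\\s+\\S+");
    // Proposed pattern: optional comma, optional dot, everything up to end of input.
    static final Pattern NEW_GEB = Pattern.compile(",?\\s*geb\\.?\\s+.*$");

    static String stripOld(String raw) {
        return OLD_GEB.matcher(raw).replaceAll("").trim();
    }

    static String stripNew(String raw) {
        return NEW_GEB.matcher(raw).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Eugenie de Gruyter geb. Muller",
            "Clara Cram geb. de Gruyter",
            "Ella Dieckmann, geb de Gruyter",
            "Ella Dieckmann geb Wolff"
        };
        for (String in : inputs) {
            System.out.println(in + " | old: " + stripOld(in) + " | new: " + stripNew(in));
        }
    }
}
```

With the old pattern, the multi-word case keeps a dangling `Gruyter` and the dot-less cases are left untouched. The new pattern reduces all four inputs to the married name.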

Part 2 - Preserve maiden name as alias

The matched maiden name should be extracted (not just stripped) and passed back to the caller so PersonService.findOrCreateByAlias() can create a PersonNameAlias with type MAIDEN_NAME.

This requires a small API change - split() and/or parseReceivers() need to return the extracted maiden name alongside the cleaned name. Required changes:

  • Return a richer object from split() that includes an optional maiden name
  • Add a PersonNameAliasType.MAIDEN_NAME enum value (see Part 3)

Part 3 - Add MAIDEN_NAME to PersonNameAliasType

Add MAIDEN_NAME to the existing PersonNameAliasType enum.

Files

| File | Change |
|---|---|
| `PersonNameParser.java` | Widen `GEB_PATTERN`, extract maiden name into return type |
| `PersonNameAliasType.java` | Add `MAIDEN_NAME` |
| `PersonService.java` | Create maiden name alias during `findOrCreateByAlias()` |
| `PersonNameParserTest.java` | New tests for widened pattern + maiden name extraction |

Depends on

  • #181 (person name aliases) - merged

Found in

ODS import file analysis for #190.

marcel added the feature and person labels 2026-04-07 18:28:40 +02:00
Author
Owner

Complete Input/Output Table

Every geb-containing entry from the ODS, showing how each flows through the parser.

Direct geb inputs (after // pre-split where applicable)

| # | Raw input | Column | Cleaned name | Maiden name (alias) | Notes |
|---|---|---|---|---|---|
| 1 | `Eugenie de Gruyter geb. Muller` | Von, An | `Eugenie de Gruyter` | `Muller` | Standard case: dot, single-word maiden |
| 2 | `Clara Cram geb. de Gruyter` | An | `Clara Cram` | `de Gruyter` | Multi-word maiden name - currently broken (only strips "de") |
| 3 | `Ella Dieckmann, geb de Gruyter` | An | `Ella Dieckmann` | `de Gruyter` | Comma prefix, no dot, multi-word - currently broken |
| 4 | `Elise Rockstroh geb Sintenis` | Von | `Elise Rockstroh` | `Sintenis` | No dot - currently broken |
| 5 | `Elisabeth geb Fernow` | Von | `Elisabeth` | `Fernow` | No dot, no married last name before geb - currently broken |

Entries that reach geb handling after // pre-split

These entries contain //. The // split (from #190) fires first, producing segments that individually contain geb:

| # | Raw input | Column | Segments after `//` split | Each segment's cleaned name | Maiden alias |
|---|---|---|---|---|---|
| 6 | `Eugenie de Gruyter geb. Muller//Walter de Gruyter` | Von | `Eugenie de Gruyter geb. Muller`, `Walter de Gruyter` | `Eugenie de Gruyter`, `Walter de Gruyter` | `Muller` (from segment 1) |
| 7 | `Clara Cram geb. de Gruyter//Ellen B-M` | An | `Clara Cram geb. de Gruyter`, `Ellen B-M` | `Clara Cram`, `Ellen B-M` | `de Gruyter` (from segment 1) |
| 8 | `Clara Cram geb. de Gruyter//Ellen de Gruyter` | An | `Clara Cram geb. de Gruyter`, `Ellen de Gruyter` | `Clara Cram`, `Ellen de Gruyter` | `de Gruyter` (from segment 1) |
| 9 | `Clara Cram geb. de Gruyter//Familie` | An | `Clara Cram geb. de Gruyter`, `Familie` | `Clara Cram` (Familie filtered) | `de Gruyter` (from segment 1) |
| 10 | `Clara Cram geb. de Gruyter//Hans de Gruyter` | An | `Clara Cram geb. de Gruyter`, `Hans de Gruyter` | `Clara Cram`, `Hans de Gruyter` | `de Gruyter` (from segment 1) |
| 11 | `Clara Cram geb. de Gruyter//Herbert Cram` | An | `Clara Cram geb. de Gruyter`, `Herbert Cram` | `Clara Cram`, `Herbert Cram` | `de Gruyter` (from segment 1) |
| 12 | `Walter de Gruyter//Eugenie de Gruyter geb. Muller` | An | `Walter de Gruyter`, `Eugenie de Gruyter geb. Muller` | `Walter de Gruyter`, `Eugenie de Gruyter` | `Muller` (from segment 2) |
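
The ordering in the table above matters because the widened pattern consumes everything up to the end of the string. A rough sketch, assuming a plain `String.split("//")` pre-split and omitting the `Familie` filter from entry 9. `PreSplitSketch` and `cleanedNames` are hypothetical names, not the real `parseReceivers()` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: "//" split first, geb stripping per segment afterwards.
final class PreSplitSketch {
    static final Pattern GEB = Pattern.compile(",?\\s*geb\\.?\\s+.*$");

    static List<String> cleanedNames(String raw) {
        List<String> result = new ArrayList<>();
        for (String segment : raw.split("//")) {
            // Each segment is stripped on its own, so ".*$" stops at the segment end.
            String cleaned = GEB.matcher(segment.trim()).replaceFirst("").trim();
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }

    // Wrong order, shown for contrast: stripping before the split loses later segments.
    static String stripWholeInput(String raw) {
        return GEB.matcher(raw).replaceFirst("").trim();
    }
}
```

Applied to entry 6, `cleanedNames` yields both persons, while `stripWholeInput` returns only `Eugenie de Gruyter` because `//Walter de Gruyter` is swallowed together with the maiden name.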

Pattern summary

| Pattern | Example | Current GEB_PATTERN | New pattern needed |
|---|---|---|---|
| `X geb. Y` | `... geb. Muller` | Works (strips) | Works (strips + captures) |
| `X geb. Y Z` | `... geb. de Gruyter` | Broken (only strips "de") | Must strip everything after `geb` |
| `X geb Y` | `... geb Sintenis` | Broken (dot required) | Dot must be optional |
| `X, geb Y Z` | `..., geb de Gruyter` | Broken (comma, no dot, multi-word) | Comma prefix must be optional |
| `X geb Y` (no married name) | `Elisabeth geb Fernow` | Broken (no dot) | Must work without preceding last name |
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Questions & Observations

  • The new pattern ,?\s*geb\.?\s+.*$ is greedy - .*$ captures everything to end of string. This is correct for the maiden name use case, but means geb must always be followed by the maiden name as the LAST thing in the string. If there were ever text after the maiden name (unlikely, but worth noting), it would be swallowed. The existing parseReceivers strips geb before other processing, so this should be safe.
  • The SplitName record currently has two fields (firstName, lastName). Adding an optional maidenName field changes the return type. This touches every call site of split() - how many are there? If it's just PersonService.findOrCreateByAlias(), the change is contained. If split() is called elsewhere, the ripple is wider.
  • Entry #5 (Elisabeth geb Fernow) is interesting: after stripping geb Fernow, only Elisabeth remains. The maiden name Fernow becomes an alias, but split("Elisabeth") will produce ("Elisabeth", "?"). Is that the intended outcome? The person's married last name is unknown from this input.

Suggestions

  • The SplitName record could gain an optional String maidenName field (null when absent) rather than a separate return type. This keeps the API change minimal: new SplitName(firstName, lastName) becomes new SplitName(firstName, lastName, null) at most existing call sites, and new SplitName(firstName, lastName, maidenName) only where geb was extracted.
  • Write the geb extraction test BEFORE the SplitName API change - the test should assert both the cleaned name AND the extracted maiden name, forcing the API to evolve.
  • The GEB_PATTERN currently lives in both parseReceivers() (step 1) and split(). After the widening, both uses need to be updated and tested independently.
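
One way to limit the call-site ripple discussed above is a secondary record constructor. This is a variation on the suggestion, not something the thread decided, and the field set is an assumption about the current two-field `SplitName`.

```java
// Hypothetical shape; the real SplitName in the codebase may differ.
final class SplitNameSketch {
    record SplitName(String firstName, String lastName, String maidenName) {
        // Secondary constructor: existing two-argument call sites keep compiling.
        SplitName(String firstName, String lastName) {
            this(firstName, lastName, null);
        }

        // Convenience check so callers do not repeat the null/blank test.
        boolean hasMaidenName() {
            return maidenName != null && !maidenName.isBlank();
        }
    }

    public static void main(String[] args) {
        SplitName plain = new SplitName("Walter", "de Gruyter");
        SplitName withMaiden = new SplitName("Clara", "Cram", "de Gruyter");
        System.out.println(plain.hasMaidenName() + " " + withMaiden.hasMaidenName());
    }
}
```

With this shape only the code path that actually extracts a maiden name needs the three-argument form.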
Author
Owner

🏗️ Markus Keller — Application Architect

Questions & Observations

  • API boundary change: SplitName is currently a simple value object. Adding maidenName makes it carry domain semantics (alias creation). Consider whether the parser should return the raw extracted maiden name string, and let PersonService decide what to do with it (create alias, log, discard). This keeps the parser pure and the alias logic in the service where it belongs.
  • Two layers strip geb: parseReceivers() strips it in step 1 (before multi-person splitting), and split() strips it again (before firstName/lastName extraction). After widening the pattern, are both still needed? If parseReceivers strips geb first, split() never sees it. If split() is called directly (Von column), it needs its own stripping. Clarify the data flow.
  • Cross-issue dependency: #210 (parenthesized annotations), #212 (title stripping) also modify the split() pipeline. The processing order matters: geb strip -> dot-norm -> paren strip -> title strip -> known-last-name / fallback. Document this ordering as a contract in the code (comment or method structure) so future changes don't accidentally reorder steps.

Suggestions

  • Consider extracting the geb-stripping into a named method (stripMaidenName(String cleaned) -> record MaidenNameResult(String cleaned, String maidenName)) so the logic is testable in isolation and the pipeline ordering is explicit.
  • The parseReceivers geb stripping discards the maiden name (it only cares about the cleaned name for multi-person splitting). The split() geb stripping is where the maiden name should be captured. Make this distinction explicit.
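
A sketch of the suggested named method, assuming the proposed pattern gains a capture group around the maiden name. The enclosing class `MaidenNameStep` is hypothetical. `stripMaidenName` and `MaidenNameResult` follow the names used in the suggestion.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MaidenNameStep {
    // Assumption: proposed pattern plus a capture group for the maiden name.
    static final Pattern GEB = Pattern.compile(",?\\s*geb\\.?\\s+(.+)$");

    // maidenName is null when the input contains no geb suffix.
    record MaidenNameResult(String cleaned, String maidenName) {}

    static MaidenNameResult stripMaidenName(String input) {
        Matcher m = GEB.matcher(input);
        if (!m.find()) {
            return new MaidenNameResult(input.trim(), null);
        }
        // Everything before the match is the cleaned name; group 1 is the maiden name.
        String cleaned = input.substring(0, m.start()).trim();
        return new MaidenNameResult(cleaned, m.group(1).trim());
    }
}
```

Returning both values from one method lets `split()` keep the maiden name while `parseReceivers()` reads only `cleaned()`.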
Author
Owner

🧪 Sara Holt — QA Engineer

Questions & Observations

  • The input/output table in the comment covers 12 entries - great coverage of real data. But some edge cases from the pattern widening need explicit tests:
    • geb as a substring of a word: does Ingeborgstr. match the pattern? The proposed ,?\s*geb\.?\s+.*$ requires whitespace after geb (optionally preceded by a dot), so the geb inside Ingeborg does not match. Ingeborg geb Muller does match, but only at the standalone geb, so the first name Ingeborg (a realistic German first name) is left intact. The pattern is fine.
    • What about geb. at the very start of the string? E.g. geb. Muller with no name before it. The ,?\s* prefix allows this. Result: empty cleaned name -> ("?", "?") with maiden alias Muller. Is that useful?
    • Case sensitivity: does Geb. or GEB. appear in the data? The current pattern is case-sensitive.

Suggestions

  • Add a regression test confirming that the // pre-split + geb stripping interaction works correctly end-to-end for entries like Clara Cram geb. de Gruyter//Ellen B-M (entry #7 in the table). This crosses the #190 and #209 boundaries.
  • Test the maiden name extraction separately from the stripping. A test should assert: given Clara Cram geb. de Gruyter, the extracted maiden name is de Gruyter AND the cleaned name is Clara Cram. Both outputs matter.
  • The table shows Muller without umlaut in some entries and Müller with umlaut in others. Ensure tests use the actual ODS values (with umlauts) to catch encoding issues.
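
The edge cases raised above can be probed directly against the proposed pattern. A small sketch, with hypothetical helper names. The `CASE_INSENSITIVE` variant is an assumption about how `Geb.` could be handled if it turns up in the data, and `Ingeb Muller` is a made-up input, not an ODS value.

```java
import java.util.regex.Pattern;

// Illustrative probes for the edge cases; not part of the real test suite.
final class GebEdgeCases {
    static final Pattern GEB = Pattern.compile(",?\\s*geb\\.?\\s+.*$");
    // Assumed variant: same pattern, but matching Geb./GEB. as well.
    static final Pattern GEB_IGNORE_CASE =
            Pattern.compile(",?\\s*geb\\.?\\s+.*$", Pattern.CASE_INSENSITIVE);

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).find();
    }

    static String strip(Pattern p, String s) {
        return p.matcher(s).replaceFirst("").trim();
    }

    public static void main(String[] args) {
        System.out.println(matches(GEB, "Ingeborgstr."));                       // geb inside a word
        System.out.println(strip(GEB, "Ingeborg geb Muller"));                  // first name survives
        System.out.println(strip(GEB, "geb. Muller").isEmpty());                // nothing left before geb
        System.out.println(matches(GEB, "Clara Cram Geb. de Gruyter"));         // case-sensitive
        System.out.println(matches(GEB_IGNORE_CASE, "Clara Cram Geb. de Gruyter"));
        System.out.println(strip(GEB, "Ingeb Muller"));                         // made-up token ending in geb
    }
}
```

One detail the probes surface: because `\s*` allows zero whitespace before `geb`, a token that merely ends in `geb` and is followed by whitespace also matches. A leading `\b` or `\s+` would rule that out if such tokens ever occur in the data.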
Author
Owner

🔒 Nora "NullX" Steiner — Security Engineer

Questions & Observations

  • Same attack surface as #190 - pure parsing logic, trusted admin input, no new endpoints or user input vectors.
  • The widened regex ,?\s*geb\.?\s+.*$ is safe: .*$ is anchored at end-of-string and cannot cause catastrophic backtracking. No nested quantifiers.
  • The maiden name alias creation in PersonService should use the same input validation as other alias creation. The existing PersonNameAliasDTO validation (#181) should cover this if the service reuses that code path.

Suggestions

  • No security concerns. The only note: when persisting the extracted maiden name as an alias, ensure it goes through the same sanitization/validation as user-supplied aliases from the edit UI. Don't create a separate unvalidated code path just because the input comes from the import pipeline.
Author
Owner

🎨 Leonie Voss — UI/UX Design Lead

Questions & Observations

  • Backend-only change with no direct UI impact. The maiden name aliases will appear in the NameHistoryCard on the person detail/edit pages, which already exists from #181.
  • The MAIDEN_NAME alias type will need a display label in the UI. The current NameHistoryCard shows alias types - what label should MAIDEN_NAME show? German: "Geburtsname" is the standard term.

Suggestions

  • Add the i18n translation key for MAIDEN_NAME to the files list: messages/de.json ("Geburtsname"), messages/en.json ("Maiden name"), messages/es.json ("Apellido de soltera"). This is a small addition but easy to forget since the issue focuses on backend parser logic.
Author
Owner

🛠️ Tobias Wendt — DevOps Engineer

Questions & Observations

  • Pure Java code change + one enum value addition. No new dependencies, no config changes, no migration needed (the person_name_aliases table and PersonNameAliasType enum already exist from #181).
  • The PersonNameAliasType enum is stored as a string in the database (@Enumerated(EnumType.STRING)). Adding MAIDEN_NAME requires no schema migration - just the Java enum change. Good.
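
Why no migration is needed can be illustrated without JPA: `EnumType.STRING` persists the constant's name, so rows written before the change still resolve after a new constant is added. The sketch below assumes the column is a plain text column without a CHECK constraint or native enum type. The constants `ALIAS_A` and `ALIAS_B` are placeholders, since the thread only names `MAIDEN_NAME`.

```java
// Plain-Java illustration of name-based enum storage; no JPA involved.
final class EnumStorageSketch {
    // Placeholder constants standing in for the existing alias types.
    enum AliasTypeBefore { ALIAS_A, ALIAS_B }
    enum AliasTypeAfter { ALIAS_A, ALIAS_B, MAIDEN_NAME }

    // What a STRING mapping writes to the column: the constant's name.
    static String toColumn(Enum<?> value) {
        return value.name();
    }

    // What it does when reading: look the constant up by name.
    static AliasTypeAfter fromColumn(String stored) {
        return AliasTypeAfter.valueOf(stored);
    }

    public static void main(String[] args) {
        String oldRow = toColumn(AliasTypeBefore.ALIAS_B);
        System.out.println(fromColumn(oldRow));         // old rows still resolve
        System.out.println(fromColumn("MAIDEN_NAME"));  // new value resolves by name
    }
}
```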

Suggestions

  • No concerns from my angle. Self-contained parser enhancement with zero infrastructure impact.
Author
Owner

🏗️ Markus Keller — Application Architect (Discussion Summary)

Interactive discussion with Marcel covering 5 open items from the architecture review. All resolved.

Resolved Items

  • SplitName target shape — Rather than incrementally bolting fields onto SplitName across four issues, we created #213 as a preparatory refactor. It defines the full record shape once (title, firstName, lastName, maidenName, annotation), extracts the split() pipeline into named methods, and lays all shared infrastructure (nullable firstName, PersonType enum, Person.getDisplayName(), frontend displayName migration across 17+ files). After #213 lands, this issue (#209) becomes a clean additive change: widen GEB_PATTERN and populate the maidenName field.

  • Two-layer geb stripping: parseReceivers() keeps its own geb strip and discards the maiden name, since it only needs the cleaned name for multi-person splitting. split() captures the maiden name via the new stripMaidenName() pipeline method. Both layers are needed because Von-column entries go directly to split() without passing through parseReceivers().

  • Pipeline ordering contract — Resolved by #213 Part 2: each normalization step is a named method composed in split() with explicit ordering: stripMaidenName -> normalizeDotCompressed -> stripAnnotation -> stripTitle -> splitByKnownLastNameOrFallback. The method structure IS the contract.

  • Parser purity vs domain semantics — Parser returns raw extracted strings in SplitName (maiden name as a String, not as an alias). PersonService.findOrCreateByAlias() decides what to do with it (create PersonNameAlias with type MAIDEN_NAME). Parser stays pure, domain logic stays in the service.

  • i18n for MAIDEN_NAME — Ships with #213 Part 10 alongside the enum value, so the label is available when #209 starts populating the field.
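
The "method structure is the contract" idea from the resolved items can be sketched as follows. Only `stripMaidenName` has a real body here. The other steps are identity placeholders because their rules belong to #210, #212 and #213, and the final step is reduced to a last-space fallback without the known-last-name lookup. The five-field `SplitName` follows the shape listed above, and everything else is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the ordered pipeline; not the actual PersonNameParser.
final class SplitPipelineSketch {
    record SplitName(String title, String firstName, String lastName,
                     String maidenName, String annotation) {}

    record MaidenNameResult(String cleaned, String maidenName) {}

    static final Pattern GEB = Pattern.compile(",?\\s*geb\\.?\\s+(.+)$");

    // The call order below is the ordering contract.
    static SplitName split(String raw) {
        MaidenNameResult step1 = stripMaidenName(raw.trim());
        String s = normalizeDotCompressed(step1.cleaned());
        s = stripAnnotation(s);
        s = stripTitle(s);
        return splitByFallback(s, step1.maidenName());
    }

    static MaidenNameResult stripMaidenName(String s) {
        Matcher m = GEB.matcher(s);
        if (!m.find()) {
            return new MaidenNameResult(s, null);
        }
        return new MaidenNameResult(s.substring(0, m.start()).trim(), m.group(1).trim());
    }

    // Placeholders: the real rules are defined in other issues.
    static String normalizeDotCompressed(String s) { return s; }
    static String stripAnnotation(String s) { return s; }
    static String stripTitle(String s) { return s; }

    // Simplified last step: last token is the last name, "?" when there is only one token.
    static SplitName splitByFallback(String s, String maidenName) {
        int lastSpace = s.lastIndexOf(' ');
        if (lastSpace < 0) {
            return new SplitName(null, s, "?", maidenName, null);
        }
        return new SplitName(null, s.substring(0, lastSpace),
                s.substring(lastSpace + 1), maidenName, null);
    }
}
```

Running `split("Elisabeth geb Fernow")` through this sketch gives first name `Elisabeth`, last name `?` and maiden name `Fernow`, which is the outcome discussed for entry #5.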

Dependency Update

This issue now depends on #213 (preparatory refactor) instead of directly depending on #181. Execution order: #190 (merged) -> #213 -> #209.

Author
Owner

Implementation Complete

All 3 parts implemented on branch feat/issues-209-213-person-parser-enhancements.

Commits

| Commit | Description |
|---|---|
| `c49cb34` | Widen GEB_PATTERN and extract maiden name in stripMaidenName() |
| `8421d45` | Add parseReceivers tests for widened geb pattern |
| `9f90cc1` | Create MAIDEN_NAME alias in findOrCreateByAlias() |

What changed

  • GEB_PATTERN widened from \s+geb\.\s+\S+ to ,?\s*geb\.?\s+(.+)$ — handles optional comma, optional dot, multi-word maiden names
  • stripMaidenName() now captures the maiden name and returns it in MaidenNameResult
  • PersonService.findOrCreateByAlias() creates a PersonNameAlias with type MAIDEN_NAME when a maiden name is extracted
  • All 5 input variants from the ODS data are covered by tests
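
For readers without access to the branch, the service-side step can be pictured roughly as below. This is an in-memory stand-in, not the actual `PersonService`. The record shapes, the method name `createMaidenAliasIfPresent` and the duplicate check are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory illustration of "parser returns a string, service creates the alias".
final class MaidenAliasSketch {
    enum PersonNameAliasType { MAIDEN_NAME }

    // Hypothetical shapes; the real entities carry ids and more fields.
    record SplitName(String firstName, String lastName, String maidenName) {}
    record PersonNameAlias(String personName, String aliasName, PersonNameAliasType type) {}

    private final List<PersonNameAlias> store = new ArrayList<>();

    void createMaidenAliasIfPresent(SplitName name) {
        String maiden = name.maidenName();
        if (maiden == null || maiden.isBlank()) {
            return; // nothing was extracted, so no alias is created
        }
        PersonNameAlias alias = new PersonNameAlias(
                name.firstName() + " " + name.lastName(),
                maiden.trim(),
                PersonNameAliasType.MAIDEN_NAME);
        // Assumed behaviour: importing the same row twice should not duplicate the alias.
        if (!store.contains(alias)) {
            store.add(alias);
        }
    }

    List<PersonNameAlias> aliases() {
        return List.copyOf(store);
    }
}
```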

Test results

  • Backend: 704 tests passing (2 new parser tests + 2 new service tests)
Reference: marcel/familienarchiv#209