feat(parser): improve GEB_PATTERN and store maiden name as alias #209

marcel · 2026-04-07T18:27:47+02:00

marcel commented

2026-04-07 18:27:47 +02:00

Problem

`GEB_PATTERN` (`\s+geb\.\s+\S+`) is too strict in two ways:

1. **Requires a dot after "geb"** - fails on `Ella Dieckmann, geb de Gruyter` (comma, no dot)
2. **Strips only one word** - fails on multi-word maiden names like `geb. de Gruyter` (only strips "de", leaves "Gruyter" dangling)

Currently the maiden name is discarded entirely. With the person_name_aliases table from #181, we can preserve it.

Examples

| Input | Current result | Expected result |
|---|---|---|
| `Eugenie de Gruyter geb. Muller` | Stripped, maiden name lost | Strip, create alias `Muller` (type=MAIDEN_NAME) |
| `Ella Dieckmann, geb de Gruyter` | firstName="Ella Dieckmann, geb", lastName="de Gruyter" | Strip, create alias `de Gruyter` (type=MAIDEN_NAME) |
| `Ella Dieckmann geb Wolff` | firstName="Ella Dieckmann geb", lastName="Wolff" | Strip, create alias `Wolff` (type=MAIDEN_NAME) |

Solution

Part 1 - Widen GEB_PATTERN

Change from: `\s+geb\.\s+\S+`

To: `,?\s*geb\.?\s+.*$`

This handles:

- Optional comma before "geb"
- Optional dot after "geb"
- Multi-word maiden names (captures everything after "geb" to end of string)
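
A minimal sketch of the difference between the two patterns, assuming the strip is applied with `Matcher.replaceFirst` followed by `trim()`. How `PersonNameParser` actually applies the pattern is not shown in this issue, and the class and method names below are hypothetical; only the two regex strings are taken from above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the two regex strings come from this issue.
final class GebPatternComparison {
    static final Pattern OLD_GEB = Pattern.compile("\\s+geb\\.\\s+\\S+");
    static final Pattern NEW_GEB = Pattern.compile(",?\\s*geb\\.?\\s+.*$");

    // Removes the first match of the given pattern and trims what is left.
    static String strip(Pattern pattern, String input) {
        return pattern.matcher(input).replaceFirst("").trim();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Eugenie de Gruyter geb. Muller",
            "Clara Cram geb. de Gruyter",
            "Ella Dieckmann, geb de Gruyter"
        };
        for (String input : inputs) {
            System.out.println(input
                + " | old: [" + strip(OLD_GEB, input) + "]"
                + " | new: [" + strip(NEW_GEB, input) + "]");
        }
    }
}
```

With the old pattern, `Clara Cram geb. de Gruyter` keeps a dangling `Gruyter` and the comma variant is left untouched; with the widened pattern all three inputs reduce to the married name.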

Part 2 - Preserve maiden name as alias

The matched maiden name should be extracted (not just stripped) and passed back to the caller so `PersonService.findOrCreateByAlias()` can create a `PersonNameAlias` with type `MAIDEN_NAME`.

This requires a small API change - `split()` and/or `parseReceivers()` need to return the extracted maiden name alongside the cleaned name. Options:

- Return a richer object from `split()` that includes optional maiden name
- Add a `PersonNameAliasType.MAIDEN_NAME` enum value

Part 3 - Add MAIDEN_NAME to PersonNameAliasType

Add MAIDEN_NAME to the existing PersonNameAliasType enum.

Files

| File | Change |
|---|---|
| `PersonNameParser.java` | Widen `GEB_PATTERN`, extract maiden name into return type |
| `PersonNameAliasType.java` | Add `MAIDEN_NAME` |
| `PersonService.java` | Create maiden name alias during `findOrCreateByAlias()` |
| `PersonNameParserTest.java` | New tests for widened pattern + maiden name extraction |

Depends on

#181 (person name aliases) - merged

Found in

ODS import file analysis for #190.


marcel added the feature person labels 2026-04-07 18:28:40 +02:00

marcel commented

2026-04-07 18:32:04 +02:00

Complete Input/Output Table

Every geb-containing entry from the ODS, showing how each flows through the parser.

Direct `geb` inputs (after `//` pre-split where applicable)

| # | Raw input | Column | Cleaned name | Maiden name (alias) | Notes |
|---|---|---|---|---|---|
| 1 | `Eugenie de Gruyter geb. Muller` | Von, An | `Eugenie de Gruyter` | `Muller` | Standard case: dot, single-word maiden |
| 2 | `Clara Cram geb. de Gruyter` | An | `Clara Cram` | `de Gruyter` | Multi-word maiden name - **currently broken** (only strips "de") |
| 3 | `Ella Dieckmann, geb de Gruyter` | An | `Ella Dieckmann` | `de Gruyter` | Comma prefix, no dot, multi-word - **currently broken** |
| 4 | `Elise Rockstroh geb Sintenis` | Von | `Elise Rockstroh` | `Sintenis` | No dot - **currently broken** |
| 5 | `Elisabeth geb Fernow` | Von | `Elisabeth` | `Fernow` | No dot, no married last name before geb - **currently broken** |

Entries that reach geb handling after `//` pre-split

These entries contain //. The // split (from #190) fires first, producing segments that individually contain geb:

| # | Raw input | Column | Segments after `//` split | Each segment's cleaned name | Maiden alias |
|---|---|---|---|---|---|
| 6 | `Eugenie de Gruyter geb. Muller//Walter de Gruyter` | Von | `Eugenie de Gruyter geb. Muller`, `Walter de Gruyter` | `Eugenie de Gruyter`, `Walter de Gruyter` | `Muller` (from segment 1) |
| 7 | `Clara Cram geb. de Gruyter//Ellen B-M` | An | `Clara Cram geb. de Gruyter`, `Ellen B-M` | `Clara Cram`, `Ellen B-M` | `de Gruyter` (from segment 1) |
| 8 | `Clara Cram geb. de Gruyter//Ellen de Gruyter` | An | `Clara Cram geb. de Gruyter`, `Ellen de Gruyter` | `Clara Cram`, `Ellen de Gruyter` | `de Gruyter` (from segment 1) |
| 9 | `Clara Cram geb. de Gruyter//Familie` | An | `Clara Cram geb. de Gruyter`, `Familie` | `Clara Cram` (Familie filtered) | `de Gruyter` (from segment 1) |
| 10 | `Clara Cram geb. de Gruyter//Hans de Gruyter` | An | `Clara Cram geb. de Gruyter`, `Hans de Gruyter` | `Clara Cram`, `Hans de Gruyter` | `de Gruyter` (from segment 1) |
| 11 | `Clara Cram geb. de Gruyter//Herbert Cram` | An | `Clara Cram geb. de Gruyter`, `Herbert Cram` | `Clara Cram`, `Herbert Cram` | `de Gruyter` (from segment 1) |
| 12 | `Walter de Gruyter//Eugenie de Gruyter geb. Muller` | An | `Walter de Gruyter`, `Eugenie de Gruyter geb. Muller` | `Walter de Gruyter`, `Eugenie de Gruyter` | `Muller` (from segment 2) |
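
The interaction between the `//` pre-split and the geb handling can be sketched as follows. This is an illustration under assumptions, not the parser's actual code: the class, record, and method names are hypothetical, the pattern is the widened one from this issue with a capturing group added around the maiden name, and the `Familie` filtering from entry 9 is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: split on "//" first, then extract the maiden name per segment.
final class PreSplitSketch {
    record Segment(String cleaned, String maidenName) {}

    // Widened pattern from this issue, with an assumed capturing group.
    private static final Pattern GEB = Pattern.compile(",?\\s*geb\\.?\\s+(.+)$");

    static Segment extract(String segment) {
        Matcher m = GEB.matcher(segment);
        if (!m.find()) {
            return new Segment(segment.trim(), null);
        }
        return new Segment(segment.substring(0, m.start()).trim(), m.group(1).trim());
    }

    static List<Segment> parse(String raw) {
        List<Segment> segments = new ArrayList<>();
        for (String part : raw.split("//")) {
            if (!part.isBlank()) {
                segments.add(extract(part));
            }
        }
        return segments;
    }
}
```

The order matters because the widened pattern consumes everything up to the end of the string: calling `extract` on the unsplit entry 7 would yield the maiden name `de Gruyter//Ellen B-M` and lose the second person.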

Pattern summary

| Pattern | Example | Current GEB_PATTERN | New pattern needed |
|---|---|---|---|
| `X geb. Y` | `... geb. Muller` | Works (strips) | Works (strips + captures) |
| `X geb. Y Z` | `... geb. de Gruyter` | **Broken** (only strips "de") | Must strip everything after `geb` |
| `X geb Y` | `... geb Sintenis` | **Broken** (dot required) | Dot must be optional |
| `X, geb Y Z` | `..., geb de Gruyter` | **Broken** (comma, no dot, multi-word) | Comma prefix must be optional |
| `X geb Y` (no married name) | `Elisabeth geb Fernow` | **Broken** (no dot) | Must work without preceding last name |


marcel commented

2026-04-07 18:41:14 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Questions & Observations

- The new pattern `,?\s*geb\.?\s+.*$` is greedy - `.*$` captures everything to end of string. This is correct for the maiden name use case, but means `geb` must always be followed by the maiden name as the LAST thing in the string. If there were ever text after the maiden name (unlikely, but worth noting), it would be swallowed. The existing `parseReceivers` strips `geb` before other processing, so this should be safe.
- The `SplitName` record currently has two fields (`firstName`, `lastName`). Adding an optional `maidenName` field changes the return type. This touches every call site of `split()` - how many are there? If it's just `PersonService.findOrCreateByAlias()`, the change is contained. If `split()` is called elsewhere, the ripple is wider.
- Entry #5 (`Elisabeth geb Fernow`) is interesting: after stripping `geb Fernow`, only `Elisabeth` remains. The maiden name `Fernow` becomes an alias, but `split("Elisabeth")` will produce `("Elisabeth", "?")`. Is that the intended outcome? The person's married last name is unknown from this input.

Suggestions

- The `SplitName` record could gain an optional `String maidenName` field (null when absent) rather than a separate return type. This keeps the API change minimal: `new SplitName(firstName, lastName)` becomes `new SplitName(firstName, lastName, null)` at most existing call sites, and `new SplitName(firstName, lastName, maidenName)` only where geb was extracted.
- Write the geb extraction test BEFORE the `SplitName` API change - the test should assert both the cleaned name AND the extracted maiden name, forcing the API to evolve.
- The `GEB_PATTERN` currently lives in both `parseReceivers()` (step 1) and `split()`. After the widening, both uses need to be updated and tested independently.
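
One way to limit the ripple from the first suggestion, sketched here as an assumption rather than what the codebase does: a Java record may declare an extra constructor that delegates to the canonical one, so existing two-argument call sites could stay unchanged. The record below is a stand-in with only the fields discussed in this comment.

```java
// Stand-in for the parser's SplitName record; maidenName is null when no "geb" part was found.
record SplitName(String firstName, String lastName, String maidenName) {

    // Convenience constructor so existing two-argument call sites keep compiling.
    SplitName(String firstName, String lastName) {
        this(firstName, lastName, null);
    }
}
```

A call such as `new SplitName("Ella", "Dieckmann")` then yields a `null` maiden name, while `new SplitName("Ella", "Dieckmann", "de Gruyter")` carries it through.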


marcel commented

2026-04-07 18:41:23 +02:00

🏗️ Markus Keller — Application Architect

Questions & Observations

- **API boundary change**: `SplitName` is currently a simple value object. Adding `maidenName` makes it carry domain semantics (alias creation). Consider whether the parser should return the raw extracted maiden name string, and let `PersonService` decide what to do with it (create alias, log, discard). This keeps the parser pure and the alias logic in the service where it belongs.
- **Two layers strip `geb`**: `parseReceivers()` strips it in step 1 (before multi-person splitting), and `split()` strips it again (before firstName/lastName extraction). After widening the pattern, are both still needed? If `parseReceivers` strips `geb` first, `split()` never sees it. If `split()` is called directly (Von column), it needs its own stripping. Clarify the data flow.
- **Cross-issue dependency**: #210 (parenthesized annotations), #212 (title stripping) also modify the `split()` pipeline. The processing order matters: geb strip -> dot-norm -> paren strip -> title strip -> known-last-name / fallback. Document this ordering as a contract in the code (comment or method structure) so future changes don't accidentally reorder steps.

Suggestions

- Consider extracting the geb-stripping into a named method (`stripMaidenName(String cleaned)` -> `record MaidenNameResult(String cleaned, String maidenName)`) so the logic is testable in isolation and the pipeline ordering is explicit.
- The `parseReceivers` geb stripping discards the maiden name (it only cares about the cleaned name for multi-person splitting). The `split()` geb stripping is where the maiden name should be captured. Make this distinction explicit.
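
A rough sketch of both suggestions together, assuming a capturing group is added to the proposed pattern. `stripMaidenName` and `MaidenNameResult` use the names suggested above; the wrapper class and `stripGebDiscardingMaidenName` are hypothetical names for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; the real method would live in PersonNameParser.
final class MaidenNameSketch {
    record MaidenNameResult(String cleaned, String maidenName) {}

    // Proposed pattern with an assumed capturing group around the maiden name.
    private static final Pattern GEB_PATTERN = Pattern.compile(",?\\s*geb\\.?\\s+(.+)$");

    // Capturing variant, as split() would need it: returns both outputs.
    static MaidenNameResult stripMaidenName(String input) {
        Matcher m = GEB_PATTERN.matcher(input);
        if (!m.find()) {
            return new MaidenNameResult(input.trim(), null);
        }
        String cleaned = input.substring(0, m.start()).trim();
        return new MaidenNameResult(cleaned, m.group(1).trim());
    }

    // Discarding variant, as parseReceivers() would need it: only the cleaned name survives.
    static String stripGebDiscardingMaidenName(String input) {
        return stripMaidenName(input).cleaned();
    }
}
```

Having the discarding variant delegate to the capturing one keeps a single pattern in play, so the two layers cannot drift apart when the regex changes.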


marcel commented

2026-04-07 18:41:34 +02:00

🧪 Sara Holt — QA Engineer

Questions & Observations

- The input/output table in the comment covers 12 entries - great coverage of real data. But some edge cases from the pattern widening need explicit tests:
  - `geb` as a substring of a word: does `Ingeborgstr.` match the pattern? The proposed `,?\s*geb\.?\s+.*$` requires whitespace after `geb`, so `Ingeborg` wouldn't match. But `Ingeborg geb Muller` would - is "Ingeborg" a realistic first name here? (Yes, it is a German name. The pattern is fine.)
  - What about `geb.` at the very start of the string? E.g. `geb. Muller` with no name before it. The `,?\s*` prefix allows this. Result: empty cleaned name -> `("?", "?")` with maiden alias `Muller`. Is that useful?
  - Case sensitivity: does `Geb.` or `GEB.` appear in the data? The current pattern is case-sensitive.

Suggestions

- Add a regression test confirming that the `//` pre-split + geb stripping interaction works correctly end-to-end for entries like `Clara Cram geb. de Gruyter//Ellen B-M` (entry #7 in the table). This crosses the #190 and #209 boundaries.
- Test the maiden name extraction separately from the stripping. A test should assert: given `Clara Cram geb. de Gruyter`, the extracted maiden name is `de Gruyter` AND the cleaned name is `Clara Cram`. Both outputs matter.
- The table shows `Muller` without umlaut in some entries and `Müller` with umlaut in others. Ensure tests use the actual ODS values (with umlauts) to catch encoding issues.
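
The edge cases raised above can be probed directly against the proposed regex. The helper below is a hypothetical sketch (class and method names are not from the codebase), and the `CASE_INSENSITIVE` variant is shown only for comparison, not as something this issue proposes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical probe for the proposed pattern's edge cases.
final class GebEdgeCases {
    static final Pattern GEB = Pattern.compile(",?\\s*geb\\.?\\s+.*$");
    // Comparison only: same regex with case-insensitive matching.
    static final Pattern GEB_IGNORE_CASE =
        Pattern.compile(",?\\s*geb\\.?\\s+.*$", Pattern.CASE_INSENSITIVE);

    // Returns the text before the first match, or the trimmed input when nothing matches.
    static String cleaned(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? input.substring(0, m.start()).trim() : input.trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + cleaned(GEB, "Ingeborg Schmidt") + "]");
        System.out.println("[" + cleaned(GEB, "Ingeborg geb Muller") + "]");
        System.out.println("[" + cleaned(GEB, "geb. Muller") + "]");
        System.out.println("[" + cleaned(GEB, "Clara Cram Geb. de Gruyter") + "]");
        System.out.println("[" + cleaned(GEB_IGNORE_CASE, "Clara Cram Geb. de Gruyter") + "]");
    }
}
```

One further case may be worth a test: because `\s*` permits zero whitespace before `geb`, a made-up input such as `Hans Vogeb Meier`, where a token ends in `geb`, would match mid-word and leave `Hans Vo`. Nothing in the ODS data listed in this issue has that shape; a `\b` before `geb` would be one way to rule it out if it ever appears.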


marcel commented

2026-04-07 18:41:41 +02:00

🔒 Nora "NullX" Steiner — Security Engineer

Questions & Observations

- Same attack surface as #190 - pure parsing logic, trusted admin input, no new endpoints or user input vectors.
- The widened regex `,?\s*geb\.?\s+.*$` is safe: `.*$` is anchored at end-of-string and cannot cause catastrophic backtracking. No nested quantifiers.
- The maiden name alias creation in `PersonService` should use the same input validation as other alias creation. The existing `PersonNameAliasDTO` validation (#181) should cover this if the service reuses that code path.

Suggestions

No security concerns. The only note: when persisting the extracted maiden name as an alias, ensure it goes through the same sanitization/validation as user-supplied aliases from the edit UI. Don't create a separate unvalidated code path just because the input comes from the import pipeline.


marcel commented

2026-04-07 18:41:47 +02:00

🎨 Leonie Voss — UI/UX Design Lead

Questions & Observations

- Backend-only change with no direct UI impact. The maiden name aliases will appear in the NameHistoryCard on the person detail/edit pages, which already exists from #181.
- The `MAIDEN_NAME` alias type will need a display label in the UI. The current NameHistoryCard shows alias types - what label should `MAIDEN_NAME` show? German: "Geburtsname" is the standard term.

Suggestions

- Add the i18n translation key for `MAIDEN_NAME` to the files list: `messages/de.json` ("Geburtsname"), `messages/en.json` ("Maiden name"), `messages/es.json` ("Apellido de soltera"). This is a small addition but easy to forget since the issue focuses on backend parser logic.


marcel commented

2026-04-07 18:41:53 +02:00

🛠️ Tobias Wendt — DevOps Engineer

Questions & Observations

- Pure Java code change + one enum value addition. No new dependencies, no config changes, no migration needed (the `person_name_aliases` table and `PersonNameAliasType` enum already exist from #181).
- The `PersonNameAliasType` enum is stored as a string in the database (`@Enumerated(EnumType.STRING)`). Adding `MAIDEN_NAME` requires no schema migration - just the Java enum change. Good.

Suggestions

No concerns from my angle. Self-contained parser enhancement with zero infrastructure impact.


marcel referenced this issue

2026-04-07 18:43:16 +02:00

feat(model): add title/salutation field to Person and make firstName optional #212

marcel referenced this issue

2026-04-07 18:50:13 +02:00

refactor: preparatory infrastructure for PersonNameParser enhancements (#209-#212) #213

marcel commented

2026-04-07 18:51:39 +02:00

🏗️ Markus Keller — Application Architect (Discussion Summary)

Interactive discussion with Marcel covering 5 open items from the architecture review. All resolved.

Resolved Items

- **SplitName target shape**: Rather than incrementally bolting fields onto `SplitName` across four issues, we created #213 as a preparatory refactor. It defines the full record shape once (`title`, `firstName`, `lastName`, `maidenName`, `annotation`), extracts the `split()` pipeline into named methods, and lays all shared infrastructure (nullable `firstName`, `PersonType` enum, `Person.getDisplayName()`, frontend `displayName` migration across 17+ files). After #213 lands, this issue (#209) becomes a clean additive change: widen `GEB_PATTERN` and populate the `maidenName` field.
- **Two-layer geb stripping**: `parseReceivers()` keeps its own geb strip (it discards the maiden name, since it only needs the cleaned name for multi-person splitting). `split()` captures the maiden name via the new `stripMaidenName()` pipeline method. Both layers are needed because Von-column entries go directly to `split()` without passing through `parseReceivers()`.
- **Pipeline ordering contract**: Resolved by #213 Part 2: each normalization step is a named method composed in `split()` with explicit ordering: `stripMaidenName` -> `normalizeDotCompressed` -> `stripAnnotation` -> `stripTitle` -> `splitByKnownLastNameOrFallback`. The method structure IS the contract.
- **Parser purity vs domain semantics**: Parser returns raw extracted strings in `SplitName` (maiden name as a `String`, not as an alias). `PersonService.findOrCreateByAlias()` decides what to do with it (create `PersonNameAlias` with type `MAIDEN_NAME`). Parser stays pure, domain logic stays in the service.
- **i18n for MAIDEN_NAME**: Ships with #213 Part 10 alongside the enum value, so the label is available when #209 starts populating the field.
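
To illustrate what "the method structure is the contract" could look like, here is a rough sketch. Only the method names, their order, and the five `SplitName` fields come from the summary above. Every method body is an assumption: the three middle steps are pass-through placeholders, the final step implements only a last-space fallback with no known-last-name lookup, and `"?"` stands in for an unknown last name as in the earlier discussion of entry #5.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: placeholder bodies, not the #213 implementation.
final class SplitPipelineSketch {
    record SplitName(String title, String firstName, String lastName,
                     String maidenName, String annotation) {}

    record MaidenNameResult(String cleaned, String maidenName) {}

    private static final Pattern GEB_PATTERN = Pattern.compile(",?\\s*geb\\.?\\s+(.+)$");

    // The order of calls in this method is the ordering contract.
    static SplitName split(String raw) {
        MaidenNameResult geb = stripMaidenName(raw);
        String name = normalizeDotCompressed(geb.cleaned());
        name = stripAnnotation(name);
        name = stripTitle(name);
        return splitByKnownLastNameOrFallback(name, geb.maidenName());
    }

    static MaidenNameResult stripMaidenName(String input) {
        Matcher m = GEB_PATTERN.matcher(input);
        if (!m.find()) {
            return new MaidenNameResult(input.trim(), null);
        }
        return new MaidenNameResult(input.substring(0, m.start()).trim(), m.group(1).trim());
    }

    // Placeholders: the real steps belong to the other issues and are not described here.
    static String normalizeDotCompressed(String s) { return s; }
    static String stripAnnotation(String s) { return s; }
    static String stripTitle(String s) { return s; }

    // Fallback only: split on the last space; title and annotation are left null.
    static SplitName splitByKnownLastNameOrFallback(String name, String maidenName) {
        int lastSpace = name.lastIndexOf(' ');
        if (lastSpace < 0) {
            return new SplitName(null, name, "?", maidenName, null);
        }
        return new SplitName(null, name.substring(0, lastSpace),
                             name.substring(lastSpace + 1), maidenName, null);
    }
}
```

Reordering the steps would then require editing `split()` itself, which makes an accidental reorder visible in review. The real `stripAnnotation` and `stripTitle` steps presumably also return what they removed; that is omitted here.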

Dependency Update

This issue now depends on #213 (preparatory refactor) instead of directly depending on #181. Execution order: #190 (merged) -> #213 -> #209.


marcel commented

2026-04-08 12:56:23 +02:00

Implementation Complete

All 3 parts implemented on branch feat/issues-209-213-person-parser-enhancements.

Commits

| Commit | Description |
|---|---|
| `c49cb34` | Widen GEB_PATTERN and extract maiden name in stripMaidenName() |
| `8421d45` | Add parseReceivers tests for widened geb pattern |
| `9f90cc1` | Create MAIDEN_NAME alias in findOrCreateByAlias() |

What changed

- **GEB_PATTERN** widened from `\s+geb\.\s+\S+` to `,?\s*geb\.?\s+(.+)$`: handles optional comma, optional dot, multi-word maiden names
- **stripMaidenName()** now captures the maiden name and returns it in `MaidenNameResult`
- **PersonService.findOrCreateByAlias()** creates a `PersonNameAlias` with type `MAIDEN_NAME` when a maiden name is extracted
- All 5 input variants from the ODS data are covered by tests

Test results

Backend: 704 tests passing (2 new parser tests + 2 new service tests)


marcel referenced a pull request that will close this issue

2026-04-08 13:16:37 +02:00

feat: PersonNameParser enhancements and Person model refactor (#209-#213) #215

marcel closed this issue

2026-04-08 18:48:01 +02:00

marcel referenced this issue from a commit

2026-04-08 18:48:03 +02:00

feat(model): add PersonType enum and MAIDEN_NAME alias type


Reference: marcel/familienarchiv#209
