feat(parser): improve GEB_PATTERN and store maiden name as alias #209
Problem
`GEB_PATTERN` (`\s+geb\.\s+\S+`) is too strict in two ways:
- `Ella Dieckmann, geb de Gruyter` (comma, no dot)
- `geb de Gruyter` (only strips "de", leaves "Gruyter" dangling)

Currently the maiden name is discarded entirely. With the `person_name_aliases` table from #181, we can preserve it.

Examples
- `Eugenie de Gruyter geb. Muller` -> `Muller` (type=MAIDEN_NAME)
- `Ella Dieckmann, geb de Gruyter` -> `de Gruyter` (type=MAIDEN_NAME)
- `Ella Dieckmann geb Wolff` -> `Wolff` (type=MAIDEN_NAME)

Solution
Part 1 - Widen GEB_PATTERN
Change from:
`\s+geb\.\s+\S+`
To:
`,?\s*geb\.?\s+.*$`
This handles an optional comma, an optional dot, and multi-word maiden names.
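To make the difference concrete, here is a minimal sketch that runs both patterns over three inputs from this issue. The class name `GebPatternDemo` and the `strip` helper are hypothetical and exist only for illustration; they are not the parser's actual code.

```java
import java.util.List;
import java.util.regex.Pattern;

final class GebPatternDemo {
    // Current pattern quoted in the issue: needs "geb." with a dot and removes a single token.
    static final Pattern OLD_GEB = Pattern.compile("\\s+geb\\.\\s+\\S+");
    // Widened pattern proposed in Part 1: optional comma, optional dot, rest of the string.
    static final Pattern NEW_GEB = Pattern.compile(",?\\s*geb\\.?\\s+.*$");

    // Hypothetical helper: removes whatever the pattern matches and trims the result.
    static String strip(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("").trim();
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(
                "Eugenie de Gruyter geb. Muller",
                "Clara Cram geb. de Gruyter",
                "Ella Dieckmann, geb de Gruyter");
        for (String input : inputs) {
            System.out.println(input
                    + " | old: " + strip(OLD_GEB, input)
                    + " | new: " + strip(NEW_GEB, input));
        }
    }
}
```

With the old pattern, `Clara Cram geb. de Gruyter` becomes `Clara Cram Gruyter` and the comma form is left untouched; the widened pattern reduces both to the married name only.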
Part 2 - Preserve maiden name as alias
The matched maiden name should be extracted (not just stripped) and passed back to the caller so `PersonService.findOrCreateByAlias()` can create a `PersonNameAlias` with type `MAIDEN_NAME`.

This requires a small API change: `split()` and/or `parseReceivers()` need to return the extracted maiden name alongside the cleaned name. Options:
- A return type for `split()` that includes the optional maiden name
- A new `PersonNameAliasType.MAIDEN_NAME` enum value

Part 3 - Add MAIDEN_NAME to PersonNameAliasType
Add `MAIDEN_NAME` to the existing `PersonNameAliasType` enum.

Files
- `PersonNameParser.java`: `GEB_PATTERN`, extract maiden name into return type
- `PersonNameAliasType.java`: `MAIDEN_NAME`
- `PersonService.java`: `findOrCreateByAlias()`
- `PersonNameParserTest.java`

Depends on
Found in
ODS import file analysis for #190.
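The service-side half of Parts 2 and 3 could be sketched roughly as follows. Everything here is a stand-in: `ParsedName`, the simplified `PersonNameAlias` record, and `aliasesFor` are hypothetical names, and the enum lists only the value this issue adds because the existing values are not shown in the issue.

```java
import java.util.ArrayList;
import java.util.List;

final class MaidenAliasSketch {
    // Only the value added by this issue; the enum's existing values are omitted (unknown here).
    enum PersonNameAliasType { MAIDEN_NAME }

    // Hypothetical parser result: maidenName is null when no "geb" part was found.
    record ParsedName(String cleanedName, String maidenName) {}

    // Simplified stand-in for the real alias entity.
    record PersonNameAlias(String name, PersonNameAliasType type) {}

    // The caller, not the parser, decides that a maiden name becomes an alias.
    static List<PersonNameAlias> aliasesFor(ParsedName parsed) {
        List<PersonNameAlias> aliases = new ArrayList<>();
        String maiden = parsed.maidenName();
        if (maiden != null && !maiden.isBlank()) {
            aliases.add(new PersonNameAlias(maiden.trim(), PersonNameAliasType.MAIDEN_NAME));
        }
        return aliases;
    }

    public static void main(String[] args) {
        System.out.println(aliasesFor(new ParsedName("Ella Dieckmann", "de Gruyter")));
        System.out.println(aliasesFor(new ParsedName("Walter de Gruyter", null)));
    }
}
```

A nullable field keeps entries without `geb` on the existing code path; only entries with an extracted maiden name produce an extra alias.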
Complete Input/Output Table
Every `geb`-containing entry from the ODS, showing how each flows through the parser.

Direct `geb` inputs (after `//` pre-split where applicable)

| # | Input | Cleaned name | Maiden name |
| --- | --- | --- | --- |
| 1 | Eugenie de Gruyter geb. Muller | Eugenie de Gruyter | Muller |
| 2 | Clara Cram geb. de Gruyter | Clara Cram | de Gruyter |
| 3 | Ella Dieckmann, geb de Gruyter | Ella Dieckmann | de Gruyter |
| 4 | Elise Rockstroh geb Sintenis | Elise Rockstroh | Sintenis |
| 5 | Elisabeth geb Fernow | Elisabeth | Fernow |

Entries that reach geb handling after `//` pre-split

These entries contain `//`. The `//` split (from #190) fires first, producing segments that individually contain `geb`:

| # | Input | After `//` split | Cleaned names | Maiden name |
| --- | --- | --- | --- | --- |
| 6 | Eugenie de Gruyter geb. Muller//Walter de Gruyter | Eugenie de Gruyter geb. Muller, Walter de Gruyter | Eugenie de Gruyter, Walter de Gruyter | Muller (from segment 1) |
| 7 | Clara Cram geb. de Gruyter//Ellen B-M | Clara Cram geb. de Gruyter, Ellen B-M | Clara Cram, Ellen B-M | de Gruyter (from segment 1) |
| 8 | Clara Cram geb. de Gruyter//Ellen de Gruyter | Clara Cram geb. de Gruyter, Ellen de Gruyter | Clara Cram, Ellen de Gruyter | de Gruyter (from segment 1) |
| 9 | Clara Cram geb. de Gruyter//Familie | Clara Cram geb. de Gruyter, Familie | Clara Cram (Familie filtered) | de Gruyter (from segment 1) |
| 10 | Clara Cram geb. de Gruyter//Hans de Gruyter | Clara Cram geb. de Gruyter, Hans de Gruyter | Clara Cram, Hans de Gruyter | de Gruyter (from segment 1) |
| 11 | Clara Cram geb. de Gruyter//Herbert Cram | Clara Cram geb. de Gruyter, Herbert Cram | Clara Cram, Herbert Cram | de Gruyter (from segment 1) |
| 12 | Walter de Gruyter//Eugenie de Gruyter geb. Muller | Walter de Gruyter, Eugenie de Gruyter geb. Muller | Walter de Gruyter, Eugenie de Gruyter | Muller (from segment 2) |

Pattern summary

| Pattern | Example |
| --- | --- |
| X geb. Y | ... geb. Muller |
| X geb. Y Z | ... geb. de Gruyter |
| X geb Y | ... geb Sintenis |
| X, geb Y Z | ..., geb de Gruyter |
| X geb Y (no married name) | Elisabeth geb Fernow |

👨💻 Felix Brandt — Senior Fullstack Developer
Questions & Observations
- The pattern `,?\s*geb\.?\s+.*$` is greedy: `.*$` captures everything to end of string. This is correct for the maiden name use case, but means `geb` must always be followed by the maiden name as the LAST thing in the string. If there were ever text after the maiden name (unlikely, but worth noting), it would be swallowed. The existing `parseReceivers` strips `geb` before other processing, so this should be safe.
- The `SplitName` record currently has two fields (`firstName`, `lastName`). Adding an optional `maidenName` field changes the return type. This touches every call site of `split()`. How many are there? If it's just `PersonService.findOrCreateByAlias()`, the change is contained. If `split()` is called elsewhere, the ripple is wider.
- The no-married-name case (`Elisabeth geb Fernow`) is interesting: after stripping `geb Fernow`, only `Elisabeth` remains. The maiden name `Fernow` becomes an alias, but `split("Elisabeth")` will produce `("Elisabeth", "?")`. Is that the intended outcome? The person's married last name is unknown from this input.

Suggestions
- The `SplitName` record could gain an optional `String maidenName` field (null when absent) rather than a separate return type. This keeps the API change minimal: `new SplitName(firstName, lastName)` becomes `new SplitName(firstName, lastName, null)` at most existing call sites, and `new SplitName(firstName, lastName, maidenName)` only where geb was extracted.
- Write the test before the `SplitName` API change: the test should assert both the cleaned name AND the extracted maiden name, forcing the API to evolve.
- `GEB_PATTERN` is used in both `parseReceivers()` (step 1) and `split()`. After the widening, both uses need to be updated and tested independently.

🏗️ Markus Keller — Application Architect
Questions & Observations
- `SplitName` is currently a simple value object. Adding `maidenName` makes it carry domain semantics (alias creation). Consider whether the parser should return the raw extracted maiden name string, and let `PersonService` decide what to do with it (create alias, log, discard). This keeps the parser pure and the alias logic in the service where it belongs.
- Two places strip `geb`: `parseReceivers()` strips it in step 1 (before multi-person splitting), and `split()` strips it again (before firstName/lastName extraction). After widening the pattern, are both still needed? If `parseReceivers` strips `geb` first, `split()` never sees it. If `split()` is called directly (Von column), it needs its own stripping. Clarify the data flow.
- Consider the `split()` pipeline. The processing order matters: geb strip -> dot-norm -> paren strip -> title strip -> known-last-name / fallback. Document this ordering as a contract in the code (comment or method structure) so future changes don't accidentally reorder steps.

Suggestions
- Extract the geb handling into its own method (`stripMaidenName(String cleaned)` -> `record MaidenNameResult(String cleaned, String maidenName)`) so the logic is testable in isolation and the pipeline ordering is explicit.
- The `parseReceivers` geb stripping discards the maiden name (it only cares about the cleaned name for multi-person splitting). The `split()` geb stripping is where the maiden name should be captured. Make this distinction explicit.

🧪 Sara Holt — QA Engineer
Questions & Observations
- `geb` as a substring of a word: does `Ingeborgstr.` match the pattern? The proposed `,?\s*geb\.?\s+.*$` requires whitespace after `geb`, so `Ingeborg` wouldn't match. But `Ingeborg geb Muller` would. Is "Ingeborg" a realistic first name here? (Yes, it is a German name. The pattern is fine.)
- What about `geb.` at the very start of the string? E.g. `geb. Muller` with no name before it. The `,?\s*` prefix allows this. Result: empty cleaned name -> `("?", "?")` with maiden alias `Muller`. Is that useful?
- Can `Geb.` or `GEB.` appear in the data? The current pattern is case-sensitive.

Suggestions
- Test that the `//` pre-split + geb stripping interaction works correctly end-to-end for entries like `Clara Cram geb. de Gruyter//Ellen B-M` (entry #7 in the table). This crosses the #190 and #209 boundaries.
- Assert both outputs: for `Clara Cram geb. de Gruyter`, the extracted maiden name is `de Gruyter` AND the cleaned name is `Clara Cram`. Both outputs matter.
- The examples use `Muller` without umlaut in some entries and `Müller` with umlaut in others. Ensure tests use the actual ODS values (with umlauts) to catch encoding issues.

🔒 Nora "NullX" Steiner — Security Engineer
Questions & Observations
- The regex `,?\s*geb\.?\s+.*$` is safe: `.*$` is anchored at end-of-string and cannot cause catastrophic backtracking. No nested quantifiers.
- Maiden-name alias creation in `PersonService` should use the same input validation as other alias creation. The existing `PersonNameAliasDTO` validation (#181) should cover this if the service reuses that code path.

Suggestions
🎨 Leonie Voss — UI/UX Design Lead
Questions & Observations
- The `MAIDEN_NAME` alias type will need a display label in the UI. The current NameHistoryCard shows alias types. What label should `MAIDEN_NAME` show? German: "Geburtsname" is the standard term.

Suggestions
- Add the i18n labels for `MAIDEN_NAME` to the files list: `messages/de.json` ("Geburtsname"), `messages/en.json` ("Maiden name"), `messages/es.json` ("Apellido de soltera"). This is a small addition but easy to forget since the issue focuses on backend parser logic.

🛠️ Tobias Wendt — DevOps Engineer
Questions & Observations
- No new infrastructure is needed (the `person_name_aliases` table and `PersonNameAliasType` enum already exist from #181).
- The `PersonNameAliasType` enum is stored as a string in the database (`@Enumerated(EnumType.STRING)`). Adding `MAIDEN_NAME` requires no schema migration, just the Java enum change. Good.

Suggestions
🏗️ Markus Keller — Application Architect (Discussion Summary)
Interactive discussion with Marcel covering 5 open items from the architecture review. All resolved.
Resolved Items
1. SplitName target shape: Rather than incrementally bolting fields onto `SplitName` across four issues, we created #213 as a preparatory refactor. It defines the full record shape once (`title`, `firstName`, `lastName`, `maidenName`, `annotation`), extracts the `split()` pipeline into named methods, and lays all shared infrastructure (nullable `firstName`, `PersonType` enum, `Person.getDisplayName()`, frontend `displayName` migration across 17+ files). After #213 lands, this issue (#209) becomes a clean additive change: widen `GEB_PATTERN` and populate the `maidenName` field.
2. Two-layer geb stripping: `parseReceivers()` keeps its own geb strip (it discards the maiden name because it only needs the cleaned name for multi-person splitting). `split()` captures the maiden name via the new `stripMaidenName()` pipeline method. Both layers are needed because Von-column entries go directly to `split()` without passing through `parseReceivers()`.
3. Pipeline ordering contract: Resolved by #213 Part 2. Each normalization step is a named method composed in `split()` with explicit ordering: `stripMaidenName` -> `normalizeDotCompressed` -> `stripAnnotation` -> `stripTitle` -> `splitByKnownLastNameOrFallback`. The method structure IS the contract.
4. Parser purity vs domain semantics: Parser returns raw extracted strings in `SplitName` (maiden name as a `String`, not as an alias). `PersonService.findOrCreateByAlias()` decides what to do with it (create `PersonNameAlias` with type `MAIDEN_NAME`). Parser stays pure, domain logic stays in the service.
5. i18n for MAIDEN_NAME: Ships with #213 Part 10 alongside the enum value, so the label is available when #209 starts populating the field.
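The two-layer stripping from item 2 could look roughly like the sketch below. It assumes the `MaidenNameResult` shape suggested in the review and the capturing pattern `,?\s*geb\.?\s+(.+)$`; the class name and the `stripGeb` helper are hypothetical, and the remaining pipeline steps from #213 are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StripMaidenNameSketch {
    // Shape suggested in the review; maidenName is null when the input has no "geb" part.
    record MaidenNameResult(String cleaned, String maidenName) {}

    // Capturing variant of the widened pattern: group 1 is the maiden name.
    private static final Pattern GEB_PATTERN = Pattern.compile(",?\\s*geb\\.?\\s+(.+)$");

    // split() layer: returns the cleaned name and keeps the maiden name.
    static MaidenNameResult stripMaidenName(String input) {
        Matcher matcher = GEB_PATTERN.matcher(input);
        if (!matcher.find()) {
            return new MaidenNameResult(input.trim(), null);
        }
        String cleaned = input.substring(0, matcher.start()).trim();
        return new MaidenNameResult(cleaned, matcher.group(1).trim());
    }

    // parseReceivers() layer (hypothetical helper): only the cleaned name is needed here.
    static String stripGeb(String input) {
        return stripMaidenName(input).cleaned();
    }

    public static void main(String[] args) {
        System.out.println(stripMaidenName("Ella Dieckmann, geb de Gruyter"));
        System.out.println(stripMaidenName("Elisabeth geb Fernow"));
        System.out.println(stripGeb("Clara Cram geb. de Gruyter"));
    }
}
```

Having both layers share one method keeps the pattern in a single place while still letting `parseReceivers()` ignore the maiden name.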
Dependency Update
This issue now depends on #213 (preparatory refactor) instead of directly depending on #181. Execution order: #190 (merged) -> #213 -> #209.
Implementation Complete
All 3 parts implemented on branch `feat/issues-209-213-person-parser-enhancements`.

Commits
- `c49cb34`
- `8421d45`
- `9f90cc1`

What changed
- `GEB_PATTERN` widened from `\s+geb\.\s+\S+` to `,?\s*geb\.?\s+(.+)$`: handles optional comma, optional dot, multi-word maiden names
- Maiden name extraction returns a `MaidenNameResult`
- A `PersonNameAlias` with type `MAIDEN_NAME` is created when a maiden name is extracted

Test results