feat(parser): support multi-person splitting for Von-column entries #214

Open
opened 2026-04-07 19:13:13 +02:00 by marcel · 7 comments
Owner

Problem

Von-column entries go directly to PersonService.findOrCreateByAlias() -> split(), bypassing parseReceivers(). This means und/u multi-person separators in Von entries are not handled — they produce a single Person with the separator embedded in the name.

Examples

| Raw Von input | Current result | Expected result |
|---|---|---|
| `Gerd u Brigitte de Gruyter` | 1 person: firstName="Gerd u Brigitte", lastName="de Gruyter" | 2 persons: Gerd de Gruyter, Brigitte de Gruyter |
| `Milly und Mutter Clara` | 1 person: firstName="Milly und Mutter", lastName="Clara" | 2 persons: Milly, Mutter Clara (relationship prefix handled by #212) |
| `Jappe Bakker u Frau` | 1 person: firstName="Jappe Bakker u", lastName="Frau" | 1 person: Jappe Bakker (+ spouse reference — may need special handling) |
| `Gisela u Eberhard` | 1 person: firstName="Gisela u", lastName="Eberhard" | 2 persons: Gisela, Eberhard |
| `Hans u Alma Cram` | 1 person: firstName="Hans u Alma", lastName="Cram" | 2 persons: Hans Cram, Alma Cram |
| `Herta u Felix Durr` | 1 person: firstName="Herta u Felix", lastName="Durr" | 2 persons: Herta Durr, Felix Durr |
| `Lili u Arnold Hardt` | 1 person: firstName="Lili u Arnold", lastName="Hardt" | 2 persons: Lili Hardt, Arnold Hardt |
| `Richard u Kathe Friedrichsen` | 1 person: firstName="Richard u Kathe", lastName="Friedrichsen" | 2 persons: Richard Friedrichsen, Kathe Friedrichsen |
| `M u M Mahling` | 1 person: firstName="M u M", lastName="Mahling" | 2 persons: M Mahling, M Mahling (or 1 if deduped) |

Entries that must NOT be split

| Raw Von input | Why not split |
|---|---|
| `Westermann u Co` | INSTITUTION — classified by #211 before splitting |
| `Architekt Korschelt u Renker` | INSTITUTION — classified by #211 before splitting ("Korschelt u Renker" is the architect business name) |
| `Hochzeitsgedicht fur Paul u Luise de Gruyter` | SKIP — classified by #211 before splitting |

Solution

Route Von-column entries through parseReceivers() (or a shared splitting step) so und/u separators are handled. The PersonTypeClassifier (#211) must run FIRST — only entries classified as PERSON or UNKNOWN should be split.

Processing order for Von entries

```
raw Von value
  -> PersonTypeClassifier.classify()        // #211
  -> if SKIP: store in Document notes, done
  -> if INSTITUTION/GROUP: store as non-person, done
  -> if PERSON/UNKNOWN: parseReceivers()    // THIS ISSUE
     -> for each name: findOrCreateByAlias() -> split()
```

This matches what An-column entries already do.
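The routing above can be sketched as follows. All class and method shapes here are assumptions mirroring the diagram — the real `PersonTypeClassifier` and `parseReceivers()` signatures may differ, and the `classify()` stub only recognizes the examples from the tables:

```java
import java.util.List;

public class VonRoutingSketch {

    enum PersonType { PERSON, UNKNOWN, INSTITUTION, GROUP, SKIP }

    // Stand-in for PersonTypeClassifier.classify() from #211 -- a real
    // classifier would use pattern rules, not these two hardcoded checks.
    static PersonType classify(String raw) {
        if (raw.startsWith("Hochzeitsgedicht")) return PersonType.SKIP;
        if (raw.endsWith("u Co")) return PersonType.INSTITUTION;
        return PersonType.PERSON;
    }

    // Stand-in for parseReceivers(): split on " u " / " und " separators.
    static List<String> parseReceivers(String raw) {
        return List.of(raw.split("\\s+(?:u|und)\\s+"));
    }

    // The routing this issue proposes: classify FIRST, split only
    // entries classified as PERSON or UNKNOWN.
    static List<String> processVon(String raw) {
        return switch (classify(raw)) {
            case SKIP -> List.of();                  // -> Document notes (separate concern)
            case INSTITUTION, GROUP -> List.of(raw); // stored as non-person, never split
            case PERSON, UNKNOWN -> parseReceivers(raw);
        };
    }

    public static void main(String[] args) {
        System.out.println(processVon("Hans u Alma Cram"));  // [Hans, Alma Cram]
        System.out.println(processVon("Westermann u Co"));   // [Westermann u Co]
    }
}
```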

Files

File Change
MassImportService.java Route Von entries through parseReceivers() after classification
MassImportServiceTest.java / PersonNameParserTest.java Tests for Von multi-person splitting

Depends on

  • #211 (PersonType classifier) — must run before splitting to avoid splitting institutions on u
  • #213 (preparatory refactor) — pipeline infrastructure

Found in

ODS import file analysis for #190, flagged during #211 discussion.

marcel added the feature and person labels 2026-04-07 19:13:20 +02:00
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Questions & Observations

  • parseReceivers() reuse scope — The issue says Von entries should be routed through parseReceivers(). Currently parseReceivers() is designed for the An-column. Does it already handle all the patterns listed here (e.g. Gerd u Brigitte de Gruyter → shared last name distribution), or does parseReceivers() itself need to be extended? The issue's examples imply last-name inheritance logic (Hans u Alma Cram → both get Cram) — is that already in parseReceivers() or is it new behavior?

  • Jappe Bakker u Frau edge case — The expected result says "1 person: Jappe Bakker (+ spouse reference — may need special handling)". This feels under-specified. Does "may need special handling" mean this issue should handle it, or is it deferred? If deferred, what should the current parser produce — one person Jappe Bakker and silently drop Frau, or one person with the raw string preserved? I'd want a concrete expected output before writing the test.

  • Milly und Mutter Clara interaction with #212 — The expected result references "relationship prefix handled by #212". If #212 isn't merged yet, should the test for this case assert the intermediate behavior (without prefix stripping) or the final behavior (with it)? This affects whether we can write the test now or need to wait.

  • Von-column entry point — The issue mentions MassImportService.java as the file to change. Is there a single method where Von entries enter the pipeline (like processSender() or similar), or are there multiple call sites? This determines whether the change is a one-liner routing change or requires touching multiple paths.

Suggestions

  • Test-first approach: The examples table is excellent test input. I'd write one parameterized test per row in PersonNameParserTest (or MassImportServiceTest) before touching production code. The "must NOT be split" table is equally important — those are regression tests ensuring the classifier gate works.

  • Clarify the Frau case: Either spec the expected output precisely or explicitly mark it as out-of-scope for this issue. A test with a vague expected result is worse than no test — it'll pass for the wrong reason.

  • Consider a shared splitMultiPerson() method: If both An and Von columns need the same und/u splitting logic routed through the same classifier gate, extracting a named method (splitMultiPersonEntry()) that both paths call would make the intent clear and avoid duplicating the classifier → parseReceivers → split chain.
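The last-name inheritance the examples imply could look like the following sketch. This is illustrative only — whether `parseReceivers()` already contains such a heuristic is exactly the open question above, and the relationship-prefix rows (e.g. `Milly und Mutter Clara`, which belongs to #212) deliberately fall outside it:

```java
import java.util.ArrayList;
import java.util.List;

public class LastNameInheritanceSketch {

    static List<String> splitWithSharedLastName(String raw) {
        String[] parts = raw.split("\\s+(?:u|und)\\s+");
        if (parts.length < 2) return List.of(raw);
        String last = parts[parts.length - 1];
        // Heuristic: everything after the first token of the final part is
        // treated as a shared surname ("Alma Cram" -> surname "Cram").
        String surname = last.contains(" ")
                ? last.substring(last.indexOf(' ') + 1) : "";
        List<String> result = new ArrayList<>();
        for (int i = 0; i < parts.length - 1; i++) {
            // Single-token given names inherit the surname; multi-token
            // parts ("Jappe Bakker") are assumed complete already.
            boolean singleToken = !parts[i].contains(" ");
            result.add(singleToken && !surname.isEmpty()
                    ? parts[i] + " " + surname : parts[i]);
        }
        result.add(last);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitWithSharedLastName("Hans u Alma Cram"));
        // [Hans Cram, Alma Cram]
        System.out.println(splitWithSharedLastName("Gisela u Eberhard"));
        // [Gisela, Eberhard]
    }
}
```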

Author
Owner

🏗️ Markus Keller — Application Architect

Questions & Observations

  • Pipeline ordering as architecture — The processing order diagram (raw → classify → if PERSON/UNKNOWN → parseReceivers → split) is effectively defining a pipeline architecture. With #211 (classifier), #212 (relationship prefixes), #213 (refactor), and now #214 all touching the same flow, the implicit question is: is there a single, explicit pipeline abstraction that enforces this ordering, or is each issue adding another if branch into MassImportService? If it's the latter, four issues from now this method will be unreadable.

  • Von vs An symmetry — The issue states "this matches what An-column entries already do." If both columns now go through the same processing pipeline, is there still a reason to have separate code paths for Von and An at all? Or should both converge on a single processPersonEntry(raw, column) method where the only difference is what role (sender vs receiver) the resulting Person gets assigned to on the Document?

  • Classifier as a gate vs. filter — The pipeline shows the classifier deciding whether to proceed to parseReceivers(). This makes the classifier a control-flow gate in the import pipeline. Is this responsibility documented in #211's contract, or is #214 adding an implicit dependency on classifier behavior that #211 doesn't know about? If the classifier's return type or semantics change, this pipeline breaks silently.

  • "Store in Document notes" for SKIP — Where does this behavior live? Is it already implemented in #211, or is this issue also responsible for implementing the SKIP → notes routing? If it's split across issues, the intermediate state (after #214 but before "notes" routing) needs to be defined.

Suggestions

  • Make the pipeline explicit: Whether it's a simple method chain or a lightweight ImportPipeline with steps, the ordering guarantees should be in code, not just in issue descriptions. This doesn't need to be a framework — even a well-named method like processPersonColumn(raw) that calls classify → route → parse → split in order would suffice.

  • Document the classifier contract: #214 depends on PersonTypeClassifier.classify() returning specific values (PERSON, UNKNOWN, SKIP, INSTITUTION, GROUP). That contract should be explicit in #211 so both issues can be implemented and tested independently.
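The contract could be pinned down with something as small as an enum plus a helper, roughly like this (the values come from the issue text; the method name and its placement are invented illustrations, not the actual #211 API):

```java
public enum PersonType {
    PERSON,       // split via parseReceivers(), create Person records
    UNKNOWN,      // treated like PERSON for splitting purposes
    INSTITUTION,  // stored as non-person, never split on "u"/"und"
    GROUP,        // stored as non-person, never split
    SKIP;         // routed elsewhere (Document notes, separate issue)

    /** True when #214's splitting step may run on this entry. */
    public boolean isSplittable() {
        return this == PERSON || this == UNKNOWN;
    }
}
```

Putting the gate on the enum itself means #214's pipeline code never enumerates classifier values directly, so a new value added in #211 fails loudly at the enum rather than silently falling through an `if` chain.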

Author
Owner

🧪 Sara Holt — QA Engineer & Test Strategist

Questions & Observations

  • Test data coverage — The examples table has 10 "should split" cases and 2 "must NOT split" cases. That's a good start, but the "must not split" category feels thin. Are there other patterns where u or und appears legitimately inside a name or institution that we need to protect? For instance, German place names like "Neustadt und Umgebung" or compound names — are those possible in the Von column?

  • Acceptance criteria missing — The issue describes expected results per example but doesn't state explicit acceptance criteria. I'd want:

    • "All 10 positive examples produce the expected person count and names"
    • "All negative examples are NOT split"
    • "Existing An-column splitting behavior is unchanged (regression)"
    • "PersonTypeClassifier gate is exercised — if classifier returns INSTITUTION, splitting does not occur"
  • Integration test boundary — The issue lists unit test files (PersonNameParserTest, MassImportServiceTest). Is there an integration test that exercises the full import pipeline (ODS file → MassImportService → database) with a Von entry containing u? A unit test on parseReceivers() proves the parser works; an integration test proves the routing in MassImportService actually calls it for Von entries.

  • M u M Mahling dedup question — The expected result says "2 persons: M Mahling, M Mahling (or 1 if deduped)". Which is it? findOrCreateByAlias() presumably deduplicates, so the test should assert the actual behavior — likely 1 person. If the answer is "depends on existing data", the test needs to cover both scenarios.

Suggestions

  • Parameterized tests: Both the positive and negative tables are perfect for @ParameterizedTest with @MethodSource or @CsvSource. One test method, N rows, clear failure messages per case.

  • Regression suite for An-column: Add a few existing An-column multi-person cases to the same test class to ensure the routing change doesn't break what already works.

  • Resolve ambiguous expected outputs before implementation: Jappe Bakker u Frau and M u M Mahling both have uncertain expected results. Pin them down now — ambiguous specs produce ambiguous tests.
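As a self-contained illustration of the table-driven idea (the real suite would be a JUnit `@ParameterizedTest` with `@CsvSource` against the actual parser; `splitVon()` here is a stand-in that exercises only the separator split, not last-name inheritance):

```java
import java.util.List;

public class VonSplitTableSketch {

    // Stand-in splitter: separators " u " / " und " only.
    static List<String> splitVon(String raw) {
        return List.of(raw.split("\\s+(?:u|und)\\s+"));
    }

    public static void main(String[] args) {
        // input | expected parts, pipe-separated -- rows from the issue table
        String[][] cases = {
            {"Gisela u Eberhard", "Gisela|Eberhard"},
            {"Herta u Felix Durr", "Herta|Felix Durr"},
            {"Richard u Kathe Friedrichsen", "Richard|Kathe Friedrichsen"},
        };
        for (String[] c : cases) {
            List<String> actual = splitVon(c[0]);
            List<String> expected = List.of(c[1].split("\\|"));
            if (!actual.equals(expected)) {
                throw new AssertionError(c[0] + ": got " + actual);
            }
        }
        System.out.println("all " + cases.length + " rows passed");
    }
}
```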

Author
Owner

🔒 Nora "NullX" Steiner — Application Security Engineer

Questions & Observations

  • Input trust boundary — The Von-column data comes from ODS/Excel import files uploaded by users. The issue focuses on parsing logic, but I want to confirm: is the raw Von string sanitized before it hits parseReceivers() and split()? If a malicious import file contains a Von entry like <script>alert(1)</script> or a string with SQL-significant characters, does the pipeline handle that safely? This isn't a new attack surface (it exists for An-column too), but routing Von through the same path means confirming the same protections apply.

  • Person creation from untrusted input — findOrCreateByAlias() creates Person records from import data. With Von entries now being split, a single Von field can create multiple Person records. Is there a limit on how many persons a single Von entry can produce? A crafted input like A u B u C u D u E u ... with hundreds of u separators could create a large number of Person records in one import row. This is a low-severity DoS vector but worth considering.

  • Classifier bypass — The pipeline depends on #211's classifier running FIRST to filter out non-person entries. If someone finds a way to bypass or confuse the classifier (e.g., an institution name that doesn't match the classifier's patterns), the splitting logic would run on it. The security impact is low (worst case: an institution gets split into multiple nonsense Person records), but it's worth noting that the classifier is now a security-relevant gate, not just a data quality filter.

Suggestions

  • No new security concerns introduced — This change is primarily a routing change within an existing trusted pipeline. As long as the existing input sanitization and JPA parameterized queries remain in place (which they do — findOrCreateByAlias uses Spring Data JPA), the security posture doesn't change. The main thing to verify is that the Von→parseReceivers path doesn't introduce any string concatenation into queries that the direct Von→split path didn't have.

  • Consider a sanity limit on split count: Cap parseReceivers() output at a reasonable maximum (e.g., 10 persons per field) to prevent resource exhaustion from malformed import data.
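A minimal sketch of such a cap — the limit value, exception type, and method names are all invented for illustration; the real guard would live wherever `parseReceivers()` is implemented:

```java
import java.util.List;

public class SplitCapSketch {

    static final int MAX_PERSONS_PER_FIELD = 10; // assumed limit

    static List<String> splitCapped(String raw) {
        List<String> parts = List.of(raw.split("\\s+(?:u|und)\\s+"));
        if (parts.size() > MAX_PERSONS_PER_FIELD) {
            // Reject rather than truncate: a 50-way "name" is almost
            // certainly malformed import data, not a real sender list.
            throw new IllegalArgumentException(
                "Von entry splits into " + parts.size()
                + " parts (limit " + MAX_PERSONS_PER_FIELD + "): " + raw);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitCapped("Hans u Alma Cram")); // within the cap
        try {
            splitCapped("A u B u C u D u E u F u G u H u I u J u K u L");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected oversized entry");
        }
    }
}
```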

Author
Owner

🎨 Leonie Voss — UI/UX Design Lead

Questions & Observations

  • No direct UI impact — This issue is purely backend parser logic within the ODS/Excel import pipeline. There are no frontend components, user-facing screens, or interaction patterns affected. The change happens silently during batch import — users see the resulting Person records but don't interact with the splitting logic itself.

  • Indirect UX improvement — From a user perspective, this is a data quality fix. Currently, users importing documents with Von entries like "Hans u Alma Cram" get a single garbage Person record with an embedded separator. After this change, they'll see two correctly formed persons. This reduces the need for manual cleanup in the person management UI — a meaningful UX win even though no UI code changes.

  • Person list display — If an import previously created one person "Gerd u Brigitte de Gruyter" and after this fix creates two persons "Gerd de Gruyter" and "Brigitte de Gruyter", is there a migration or cleanup step for existing bad data? If not, users will see both the old malformed record and the new clean ones. That could be confusing in search results.

Suggestions

  • No UI changes needed for this issue. The improvement is in data quality at import time, which is the right approach — fix the source, not the display.
Author
Owner

⚙️ Tobias Wendt — DevOps & Platform Engineer

Questions & Observations

  • No infrastructure impact — This is a pure backend parsing change within MassImportService. No new services, no config changes, no new environment variables, no migration files, no Docker changes. The existing CI pipeline (build → test → deploy) handles this without modification.

  • Dependency chain risk — The issue depends on #211 and #213. From a CI/merge perspective: are these going to land on main in order, or could they be developed in parallel on feature branches? If parallel, the test suite for #214 needs to either mock the classifier or include it as a test dependency. If sequential, the branch for #214 should be rebased onto #213's merge. Either way, merge conflicts in MassImportService.java are likely if all three issues touch the same method.

  • Import performance — Von entries now go through an additional processing step (classifier + parseReceivers) compared to the current direct path. For a large ODS import (thousands of rows), is the added overhead measurable? Probably not — string parsing is fast — but if the classifier in #211 does anything expensive (regex compilation per call, database lookups), it could add up. Worth a quick check after implementation.

Suggestions

  • No action needed from my side. This is a clean backend logic change. Just ensure the feature branch is rebased cleanly onto #213 before merging to avoid conflicts in the import pipeline code.
Author
Owner

🏗️ Markus Keller — Application Architect

Discussion summary from architecture review with Marcel.

Resolved

  • Multiple senders (data model) — Document.sender is @ManyToOne, but Von splitting via parseReceivers() can produce multiple persons. Agreed to change the model to @ManyToMany senders (symmetric with receivers) in a separate issue before #214. This includes UI changes. #214 becomes a follow-up that naturally works once the model supports it.
  • Von/An pipeline convergence — Both columns will go through the same flow. Agreed to extract a shared processPersonColumn() method in #214 rather than keeping separate code paths. The only difference is whether results land in senders or receivers.
  • Pipeline explicitness — Reviewed MassImportService after #211/#213 merge. The method is linear and readable — no nested if branches. The processPersonColumn() extraction will make it simpler. No pipeline abstraction needed.
  • SKIP → notes routing — The issue mentions storing SKIP entries in Document notes, but this isn't implemented yet. Agreed to handle in a separate issue, not in #214.
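The agreed model change might look roughly like this fragment. Field, accessor, and join-table names are placeholders, the annotation style is assumed to match the existing entities, and `Person` is the existing entity (omitted here):

```java
@Entity
public class Document {

    @Id
    @GeneratedValue
    private Long id;

    // Before: @ManyToOne private Person sender;
    // After: symmetric with receivers, so Von splitting can attach
    // several senders to one document.
    @ManyToMany
    @JoinTable(name = "document_senders")   // placeholder table name
    private Set<Person> senders = new HashSet<>();

    @ManyToMany
    @JoinTable(name = "document_receivers") // placeholder table name
    private Set<Person> receivers = new HashSet<>();
}
```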

New issues created

  • Multiple senders support — data model change (@ManyToMany) + UI updates. Prerequisite for #214.
  • SKIP → notes routing — store the skip reason (e.g. "Briefumschlag", "Hochzeitsgedicht") in the document's notes field instead of silently dropping it.

Overall read

The sequencing matters here: multiple senders first, then #214's Von splitting slots in cleanly without workarounds. The codebase is in good shape after the recent merges — no architectural concerns with the current import pipeline structure.


Reference: marcel/familienarchiv#214