feat(import): store SKIP entries in document notes #217
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Problem
When
PersonTypeClassifierclassifies a Von or An entry asSKIP(e.g. "Briefumschlag aus Java", "Kondolenzbriefe zum Tod von Walter de Gruyter", "Hochzeitsgedicht fur Paul u Luise de Gruyter"),PersonService.findOrCreateByAlias()returnsnulland the entry is silently dropped. The information is lost — there's no record that the original spreadsheet row contained this text.Solution
When a person entry is classified as
SKIP, store the raw text in the document'snotesfield instead of discarding it. This preserves the original spreadsheet content for manual review.Processing
Files
MassImportService.javafindOrCreatePerson()returns null for a SKIP entry, append the raw value todoc.notesMassImportServiceTest.javaAcceptance criteria
Found in
Architecture discussion on #214.
👨💻 Felix Brandt — Senior Fullstack Developer
Questions & Observations
findOrCreatePerson()return value is ambiguous — CurrentlyfindOrCreateByAlias()returnsnullfor SKIP entries. Butnullis also what you'd get if something went wrong. The caller inMassImportServicecan't distinguish "this was a SKIP" from "something unexpected returned null." To store the skip reason in notes, the caller needs to know why it got null. Options:findOrCreateByAlias()throw a specificSkipExceptionthat the caller catchesOptional<Person>or a result type that carries the skip reasonPersonTypeClassifier.classify()directly inMassImportServicebefore callingfindOrCreatePerson(), and handle the SKIP branch thereThe third option is cleanest — it matches the pipeline order described in #214 and avoids changing
PersonService's contract. The classifier is already a static utility, so calling it fromMassImportServicehas no coupling cost.Append vs overwrite on
doc.notes— The issue says "append raw text to Document.notes." If a document has both a SKIP sender and a SKIP receiver (unlikely but possible), both should appear. If the document already has notes from a previous import or manual entry, the append must not destroy existing content. Use a separator like\nbetween entries.Test shape — I'd write a parameterized test in
MassImportServiceTest:Suggestions
MassImportServicebeforefindOrCreatePerson(). This is the right place for the routing decision — it keepsPersonServicefocused on person CRUD.🏗️ Markus Keller — Application Architect
Questions & Observations
Responsibility placement — The issue says to handle this in
MassImportServiceafterfindOrCreatePerson()returns null. I'd push back slightly: the SKIP classification should happen before callingfindOrCreatePerson(), not after. The classifier is a gate in the pipeline — it decides whether to proceed to person creation. Checking the return value offindOrCreatePerson()for null conflates "SKIP entry" with "failed lookup," which is a leaky abstraction.The clean pipeline:
Notes field semantics —
Document.notesis currently a free-text field (nullabletextcolumn). Using it for structured skip-reason data means mixing human-authored notes with machine-generated import metadata. Is that acceptable? For this project's scale (family archive, small team), yes — but the format should be consistent enough to be parseable later if needed. I'd suggest a prefix format:[Import] Übersprungen (Von): Briefumschlag aus Java.No schema change needed — The
notesfield already exists as atextcolumn with no length limit. No Flyway migration required. This is purely an application logic change.Suggestions
MassImportServicebefore thefindOrCreatePerson()call. This makes the pipeline order explicit in code, matching the diagram in #214.🧪 Sara Holt — QA Engineer & Test Strategist
Questions & Observations
Test coverage for all SKIP keywords — The classifier currently has three SKIP keywords: "Briefumschlag", "Kondolenzbriefe", "Hochzeitsgedicht". Each should have a test case in
MassImportServiceTestverifying the notes are populated. A@ParameterizedTestwith@CsvSourcecontaining all three keywords would cover this efficiently.Acceptance criteria need a "does not store" negative case — The criteria say "Non-SKIP entries are unaffected." This should be an explicit test: a normal person name in the Von column should NOT produce any skip text in notes. Currently the criteria are all positive assertions — add a negative one.
Multi-SKIP document — What happens when both Von and An are SKIP entries? The document would have no sender, no receivers, but notes with two skip entries. Is the document still created? The
metadataCompleteheuristic (date != null || !senderRaw.isBlank() || !receiversRaw.isBlank()) would still be true because the raw strings are non-blank — they just got classified as SKIP. IsmetadataComplete = truecorrect for a document where all person fields were skipped? This is an edge case worth testing and deciding on.Re-import behavior — If a document is re-imported (exists with PLACEHOLDER status), the notes from the first import would be overwritten or duplicated. Should re-import clear previous import notes, or append? The current code does
doc.setNotes(...)which would overwrite.Suggestions
🔒 Nora "NullX" Steiner — Application Security Engineer
Questions & Observations
Untrusted input stored in notes — The raw Von/An text from the ODS file is being stored directly in
Document.notes. This text comes from an import file uploaded by an authenticated user, so the trust level is moderate. However, ifnotesis rendered anywhere in the frontend without escaping, a crafted import file could inject HTML/XSS payloads via the Von/An column. Verify that wherevernotesis displayed in the UI, it's rendered as text content, not raw HTML. Svelte's{variable}syntax auto-escapes, so this should be safe — but check for any{@html notes}usage.No new attack surface — The notes field already exists and is already user-writable (manual edits). Storing import metadata in it doesn't change the trust boundary. The only new thing is that import file content now reaches the notes field, which was previously only populated by direct user input through the UI.
Suggestions
{@html}rendering of the notes field in the frontend. If it's all standard{variable}interpolation, this is safe by default. Quick grep for@html.*noteswould confirm.🎨 Leonie Voss — UI/UX Design Lead
Questions & Observations
No direct UI impact — This is a backend import pipeline change. The notes field is already displayed in the document detail view. Import-generated skip notes will appear alongside any manually written notes — no new component or layout change needed.
Visual distinction for import notes — If import-generated notes (like "[Import] Übersprungen (Von): Briefumschlag aus Java") appear mixed with user-written notes, users might be confused about what they can edit or delete. A consistent prefix like
[Import]helps, but it's still plain text in the same field. For the current project scale this is fine — if it becomes confusing later, a separate "import log" field could be added.Suggestions
[Import]prefix convention is sufficient to distinguish machine-generated notes from manual ones at this scale.⚙️ Tobias Wendt — DevOps & Platform Engineer
Questions & Observations
No infrastructure impact — No new services, no config changes, no migrations, no Docker changes. Pure application logic change within
MassImportService. The existing CI pipeline handles this without modification.Observability — The current code already logs SKIP entries via the classifier path. After this change, the skip information will also be persisted in the database (in
notes). That's actually a minor observability improvement — you can querySELECT original_filename, notes FROM documents WHERE notes LIKE '%Übersprungen%'to find all documents with skipped person entries, without needing to search through application logs.Suggestions