feat(document): replace DocumentSearchItem with flat DocumentListItem DTO (#647) #660
Reference in New Issue
Block a user
Delete Branch "feat/issue-647-document-list-dto"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Closes #647
DocumentListItemJava record replaces both theDocumentSearchItemwrapper and the nestedDocumentreference — eliminates OWASP API3:2023 excessive data exposure (no moretranscription,filePath,fileHash,contentType,scriptTypeetc. sent to list views)Document.listentity graph pruned tosender + receivers + tags—trainingLabelsremoved sinceDocumentListItem.toListItem()never touches it, ending the maintenance trapDocumentSearchResultupdated toList<DocumentListItem>;DocumentService.enrichItems()now calls a newtoListItem()mapping methodnpm run generate:apiregenerated;DocumentRow,DocumentList,+page.server.ts,DocumentMultiSelectall migrated to flatitem.title/item.senderaccessDocumentListItemIntegrationTest— Testcontainers guard: search withtrainingLabelsdoes not throwLazyInitializationException; detail endpoint still returnstrainingLabelsDocumentControllerTest—@WebMvcTestshape assertion:$.items[0].idpresent,$.items[0].transcription/filePath/fileHashabsentTest plan
DocumentControllerTest.search_returns_flat_item_with_id_and_without_sensitive_fieldsgreenDocumentListItemIntegrationTest— all 3 tests green (noLazyInitializationException, detail still returnstrainingLabels)npm run check— zero errors in changed files🤖 Generated with Claude Code
- Use documentService.getDocumentById() in detail_stillReturnsTrainingLabels so the Document.full entity graph eager-loads trainingLabels - Flatten makeItem() factory in DocumentList.svelte.test.ts (nested document: {} overrides broke item.id / item.documentDate access) - Remove { document: {} } wrapper from DocumentMultiSelect.svelte.spec.ts mock responses — component now reads body.items directly as flat items - Flatten single nested item in page.svelte.test.ts document list test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>c92eda33b1to627fc44d99👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
Blockers
None.
Suggestions
1.
DocumentMultiSelect.svelte— unsafeas unknown as Document[]castfrontend/src/lib/document/DocumentMultiSelect.svelte, line ~49:This works at runtime because
GeschichteEditoronly readsid,title, anddocumentDatefromselectedDocuments. But the cast tells TypeScript that everyDocument-only field (status,metadataComplete,scriptType,createdAt,updatedAt) is present — any future consumer that accesses one of those will compile fine and fail silently at runtime.A minimal improvement: map explicitly to the fields the component actually needs, then cast:
The real fix is giving
selectedDocumentsa purpose-built picker type, but that touchesGeschichteEditorand is out of scope here. At minimum, the field list in the map makes the implicit contract explicit.2.
DocumentSearchResult.java— factory method JavaDoc silently removedof()andpaged()had load-bearing documentation explaining non-obvious semantics ("single-page convenience factory used by empty-result shortcuts and by tests that don't care about paging" / "paged factory used by the service when it has a real Pageable + full match count"). These were removed as part of the type rename but the semantics didn't change. A future developer reading the class won't know why two factories exist. Restore the comments.3.
DocumentService.toListItem()— redundant null guards for@Builder.DefaultcollectionsDocument.receiversandDocument.tagsboth have@Builder.Defaultwithnew HashSet<>()— they're null only if aDocumentis constructed without the builder (a bug, not a runtime scenario). The defensive check implies a uncertainty that doesn't exist and makes readers wonder "when would this be null?". JustList.copyOf(doc.getReceivers())is correct — if it NPEs in a test, that's the test's broken setup speaking, which is useful signal.All good
const doc = $derived(item)inDocumentRow.svelte— perfect alias, zero template churntoListItem()private toDocumentService— mapping owned by the domain, no leakage🔐 Nora "NullX" Steiner — Application Security Engineer
Verdict: ✅ Approved
This PR directly remediates OWASP API3:2023 — Excessive Data Exposure for the document list endpoint. I audited every field in the new
DocumentListItemrecord.Fields removed from list response (all correct)
transcriptionfilePathfileHashcontentTypescriptTypestatusmetadataCompletethumbnailKeythumbnailUrlcomputed property is exposed instead (correct)pageCountSecurity regression test: excellent
DocumentControllerTest.search_returns_flat_item_with_id_and_without_sensitive_fieldsasserts at the HTTP layer that$.items[0].transcription,$.items[0].filePath, and$.items[0].fileHashdo not exist in the response. This is exactly the right layer — it catches any accidental re-introduction even if the Java types change.Note for a follow-up issue (not a blocker)
Personis still serialized as a full entity insenderandreceivers. ThePersonschema exposesbirthDate,deathDate, and potentiallynotes— fields that are not rendered in the list view. This was equally true in the oldDocumentSearchItem.document.senderpath, so this PR is not a regression. But it's worth a dedicated issue to introduce aPersonSummaryDTOwith justidanddisplayNamefor list-view contexts.🏗️ Markus Keller — Application Architect
Verdict: ✅ Approved
Architecture is sound
Domain ownership:
DocumentListItemlives inorg.raddatz.familienarchiv.document— the right package. The oldDocumentSearchItemalso lived there, so no boundary shift. The mapping methodtoListItem()isprivatetoDocumentService— the list projection is fully owned by the document domain, nothing leaks.DocumentSearchItemfully deleted — no dead code, no deprecated wrapper left behind. Clean.Entity graph: Removing
trainingLabelsfromDocument.listis correct.toListItem()never accessestrainingLabels, so there's no lazy-loading risk on that path.Document.full(used bygetDocumentById) still loadstrainingLabelscorrectly. The two graphs now accurately reflect what each query path actually needs.Pre-existing concerns (not introduced here, but worth flagging)
1.
DocumentListItemexposes fullPersonandTagentitiessender: Personandreceivers: List<Person>carry the completePersonentity to list-view clients. The list row only needsdisplayName(andidfor link targets). APersonSummaryDTOwith{id, displayName}would close this surface. This was equally true throughDocumentSearchItem.document.sender— not a regression — but now it's explicit in the list DTO's contract and worth tracking.2.
DocumentMultiSelectimplicit couplingThe
as unknown as Document[]cast inDocumentMultiSelect.sveltemakes visible an implicit contract betweenDocumentListItemandDocument: the component treats search results as if they were full entities. This works today becauseGeschichteEditoronly readsid/title/documentDate. ADocumentPickerIteminterface would make the contract compile-time-checkable. Small, but as the component grows this cast will bite someone.🧪 Sara Holt — QA Engineer & Test Strategist
Verdict: ⚠️ Approved with concerns
What's solid
DocumentListItemIntegrationTest— Three Testcontainers tests with a real PostgreSQL instance:search_doesNotThrow_whenDocumentHasTrainingLabels— confirms theLazyInitializationExceptionregression is guardedsearch_returns_list_item_without_sensitive_fields_when_document_has_training_labels— confirms flat item shapedetail_stillReturnsTrainingLabels— confirmsDocument.fullgraph is unaffected@AfterEach cleanup()viadocumentRepository.deleteAll()properly isolates tests. Correct use ofdocumentService.getDocumentById()(notdocumentRepository.findById()) to exercise the entity graph.DocumentControllerTest.search_returns_flat_item_with_id_and_without_sensitive_fields— Verifies the HTTP response shape, not just Java types. This is the right place for a data-exposure regression guard — it would catch even a reflection-based serialization change that bypasses the type system.Test factory flattening — All
makeItem()/baseItem()factories acrossDocumentRow.svelte.spec.ts,DocumentRow.svelte.test.ts,DocumentList.svelte.spec.ts,DocumentList.svelte.test.ts,page.svelte.test.tsare correctly updated. That's thorough.Concern
DocumentMultiSelect.svelte.spec.ts—docFactoryproduces fields that don't exist onDocumentListItemThe factory:
These extra fields are passed directly to the mock response's
itemsarray. The component casts toDocument[]anyway so the tests pass — but the test data no longer matches the actual API shape. Any future test that relies on those fields beingundefined(becauseDocumentListItemdoesn't have them) will get stale factory data instead.Update
docFactoryto matchDocumentListItem's actual shape:id,title,documentDate,originalFilename,receivers,tags,completionPercentage,contributors,matchData.Suggestion
DocumentSearchResultTest.item()factory — 15 positional args with many nullsIf the
DocumentListItemrecord field order ever changes, this silently constructs wrong data — no compile error. Since this is test-only code, a small named factory with only the fields each test cares about would be more robust. Not a blocker, but worth a follow-up.🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved — LGTM
Checked: CI impact, dependency surface, generated file management, container requirements.
DocumentListItemIntegrationTestuses the existingPostgresContainerConfigTestcontainers setup — no new container images. CI already pullspostgres:16for integration tests.frontend/src/lib/generated/api.tsis fully managed bynpm run generate:api— the diff is reproducible from the current backend spec. The incidental changes toImportStatus(newskippedFilesfield,SkippedFileschema) andPagedResponsefield reordering came from a backend feature already onmainthat the regeneration picked up. Not a problem — the file is always derived from the spec.Nothing here that touches infrastructure, secrets, or deployment config.
📋 Elicit — Requirements Engineer
Verdict: ✅ Approved
Full traceability audit from issue #647 acceptance criteria to implementation and tests.
AC #1 — List response exposes only list-view fields; sensitive storage fields absent
transcriptionabsentDocumentListItemrecordDocumentControllerTest.search_returns_flat_item_with_id_and_without_sensitive_fields→jsonPath("$.items[0].transcription").doesNotExist()filePathabsentDocumentListItemrecordjsonPath("$.items[0].filePath").doesNotExist()fileHashabsentDocumentListItemrecordjsonPath("$.items[0].fileHash").doesNotExist()idpresent at top levelDocumentListItem.id()jsonPath("$.items[0].id").value(...)✅ Fully covered.
AC #2 — Search with a document that has
trainingLabelsdoes not throwLazyInitializationExceptiontrainingLabelsremoved fromDocument.listentity graphDocument.java—@NamedEntityGraph(name = "Document.list")no longer liststrainingLabelsDocumentListItemIntegrationTest.search_doesNotThrow_whenDocumentHasTrainingLabels✅ Fully covered.
AC #3 — Detail endpoint still returns
trainingLabelsafter the entity graph changeDocument.fullgraph unchangedDocument.java—@NamedEntityGraph(name = "Document.full")still includestrainingLabelsDocumentListItemIntegrationTest.detail_stillReturnsTrainingLabelsviadocumentService.getDocumentById()✅ Fully covered.
All three acceptance criteria are implemented and machine-verified at the appropriate layer (HTTP response shape, integration with real DB).
🎨 Leonie Voss — UX Designer & Accessibility Strategist
Verdict: ✅ Approved — LGTM
This is a backend DTO refactor and TypeScript type rename. No rendered HTML changes, no CSS changes, no ARIA attribute changes.
DocumentRow.svelteandDocumentList.sveltenow receive a flatDocumentListIteminstead of a nestedDocumentSearchItem, but the template reads the same fields (title,documentDate,sender,receivers,tags,thumbnailUrl,summary,archiveBox,archiveFolder,location) through theconst doc = $derived(item)alias. The visual output is byte-for-byte identical to before.thumbnailUrlwas previously a computed@JsonPropertyon theDocumententity; it's now an explicit field onDocumentListItem. The URL format is unchanged — the thumbnail renders identically.Nothing to flag from a design or accessibility perspective.