Optimize document list API: avoid loading associations not needed by the list view #647
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Context
The document list endpoint (
GET /api/documents/search) serializes the fullDocumententity — including all lazy associations — inside eachDocumentSearchItem. Withopen-in-view: false, every lazy association must be eagerly loaded via entity graphs or the response crashes withLazyInitializationExceptionwhen Jackson touches an uninitialized proxy.Currently
Document.listloads:sender— displayed ✅receivers— displayed ✅tags— displayed ✅trainingLabels— not displayed in the list UI but must be loaded to prevent a serialization crashRoot cause
The list API returns the same
Documententity type as the detail API. There is no list-specific projection, so the server is forced to load (and the client is forced to receive) fields only the detail view needs —trainingLabels,transcription,summary,archiveBox,archiveFolder,documentLocation,scriptType, etc.This creates ongoing maintenance debt: every new lazy association added to
Documentmust be added toDocument.listor it will crash in production once any document carries that data.Goal
Reduce the payload and query cost of the list API to only the fields the list view actually renders, without regressing the detail view.
Acceptance criteria
id,title,originalFilename,thumbnailUrl,documentDate,sender,receivers,tags,archiveBox,archiveFolder,location,summary,status,completionPercentage,contributors,matchData.trainingLabelsassigned, when the list API is called, then noLazyInitializationExceptionis thrown (regardless of whether training labels are included in the response).GET /api/documents/{id}), when called, then all existing fields (includingtrainingLabels) are still returned.npm run generate:apiis re-run, then the generated types reflect the narrower list type without breaking the detail-page types.Options to consider
Option A — List-specific DTO (
DocumentListItem): a Java record with only the fields the list view needs. The search endpoint returnsDocumentSearchResult<DocumentListItem>. Clean separation but requires a new DTO and OpenAPI type, andgenerate:apimust be re-run.Option B —
@JsonView: annotateDocumentfields with@JsonView(Views.List.class)/@JsonView(Views.Detail.class)and set the view on each controller method. No new DTO, but adds Jackson-view coupling to the entity.Option C — Keep as-is, just ensure all lazy collections are in
Document.list: lowest risk, no refactoring. Accepted as the fallback if A/B add too much complexity for the current project stage.Out of scope
🏗️ Markus Keller — Senior Application Architect
Observations
Document.listandDocument.fullare currently identical (Document.java:31–41) — both eager-loadsender,receivers,tags, andtrainingLabels. The issue correctly names this as the root cause. TheDocument.listgraph was created purely to avoidLazyInitializationExceptionontrainingLabels, not to serve a different shape of data.DocumentSearchItem(DocumentSearchItem.java) wraps the fullDocumententity — the JSON payload includes every scalar field:transcription,filePath,fileHash,contentType,thumbnailKey,thumbnailAspect,pageCount,metadataComplete,scriptType,documentLocation,createdAt,updatedAt. The frontend uses none of these in the list view.@JsonView(Option B) couples the entity to Jackson view annotations — serialization behavior becomes invisible to the type system and to OpenAPI codegen. Option C just defers the maintenance debt to the next developer who adds a lazy field.thumbnailUrlis not a direct field onDocument— the entity storesthumbnailKey(an internal S3 key). The acceptance criteria listthumbnailUrlas a required field, but the issue doesn't specify how it gets computed. This needs to be resolved before implementation starts (see Open Decisions).DocumentSearchItemwrapper already separates enrichment concerns (completionPercentage, contributors, matchData) from the entity. Option A should collapse this into one flat record —DocumentListItemreplaces both the innerDocumentand theDocumentSearchItemwrapper — simplifying the response structure.Recommendations
DocumentListItemas a Java record containing all fields from the acceptance criteria:id,title,originalFilename,thumbnailUrl,documentDate,sender,receivers,tags,archiveBox,archiveFolder,location,summary,status,completionPercentage,contributors,matchData. This is a flat record — no nestedDocumentinside.trainingLabelsfrom theDocument.listentity graph once the DTO is in place. Better still, once no endpoint returns the rawDocumententity from the search path, deleteDocument.listentirely — it's no longer needed and leaves a trap for future developers.DocumentService:739–753(the enrichment pipeline). The private mapping functiontoListItem(Document, SearchMatchData, int, List<ActivityActorDTO>)should live here. It readsthumbnailKeyand callsfileService.getThumbnailUrl()to produce the finalthumbnailUrlvalue.DocumentSearchResultis renamed or restructured, updatedocs/architecture/c4/l3-backend-document.puml.Open Decisions
thumbnailUrlcomputation strategy: two viable approaches, each with a cost.GET /api/documents/{id}/thumbnailthat redirects to a fresh presigned URL on demand. Extra round-trip per thumbnail, but no TTL problem. Works better with HTTP caching.thumbnailKeyis already used to generate presigned URLs for the detail view — and standardize on that pattern.👨💻 Felix Brandt — Senior Fullstack Developer
Observations
DocumentService:739–753. The enrichment pipeline already buildsDocumentSearchItemfromDocument+ enrichment data. Replacing the innerDocumentreference with a flatDocumentListItemrecord is a clean, bounded change.@Schema(requiredMode = REQUIRED)annotations are load-bearing for TypeScript codegen (AC #4). Every non-nullable field onDocumentListItemneeds@Schema(requiredMode = Schema.RequiredMode.REQUIRED)or the generated TypeScript types will have everything as optional — breaking the type safety the frontend relies on.trainingLabelscan be dropped from theDocument.listentity graph — that's the only graph change needed.sender,receivers, andtagsmust stay because the DTO mapping reads them.statusis listed in AC #1 even thoughDocumentRow.sveltedoesn't currently display it. Keep it in the DTO — placeholder documents need to be distinguished from uploaded ones, and it costs nothing in a flat record.SearchMatchDatashould stay as a nested record inDocumentListItem— it's already a clean, well-scoped type. Don't flatten its fields into the top level.Recommendations
DocumentListItem, notDocumentSummaryorDocumentSearchDTO. The name communicates exactly when this type is used: as a list item in the document search result.private DocumentListItem toListItem(Document doc, SearchMatchData match, int completionPct, List<ActivityActorDTO> contributors). Keep it under 20 lines; every field assignment is a one-liner. Testable with@ExtendWith(MockitoExtension.class).@WebMvcTestasserting the response shape ofGET /api/documents/search(expected fields present, excluded fields absent).DocumentListItemrecord with@Schemaannotations — test goes green on the type level.trainingLabels→ noLazyInitializationException).trainingLabels— test goes green.npm run generate:apiand fix any TypeScript type references.npm run generate:api: search for all usages of the oldDocumenttype in the frontend list path (DocumentRow.svelte,DocumentList.svelte,+page.svelte). The generated type name may change — update imports.svelte-checkpasses — TypeScript errors in.sveltefiles aren't caught bytscalone.🔒 Nora "NullX" Steiner — Application Security Engineer
Observations
This issue is primarily a data minimization improvement — which is both an OWASP API Security Top 10 finding (API3:2023 — Broken Object Property Level Authorization / Excessive Data Exposure) and a GDPR principle (Article 5(1)(c): data minimization). The current list endpoint sends fields to every READ_ALL-authorized client that should only be available in the detail view:
transcriptionfileHash(SHA-256)filePathcontentType,thumbnailKeymetadataComplete,scriptTypeNone of these require new attack surface — they're sent passively in the current list response to any authenticated user.
Recommendations
@JsonIgnore,@JsonView) — adding a new field to the entity does not automatically expose it in the list response.@WebMvcTestassertion that excluded fields are absent from the response:implements Serializableor a Jackson mixin could accidentally re-expose these fields.thumbnailUrldoes not expose the rawthumbnailKeyin the DTO. The computed URL (presigned or redirect) should be what the DTO contains — never the internal S3 key.@RequirePermission(READ_ALL)on the search endpoint is unchanged and correct.WRITE_ALL-only detail view is ever introduced for the fullDocumententity (includingtranscription,trainingLabels), that's when to revisit permission boundaries. For now, both list and detail require READ_ALL, so this change is purely about field exposure, not access control.🧪 Sara Holt — Senior QA Engineer
Observations
@WebMvcTestlayer — assert response shape with MockMvc + jsonPath. Fast, no database needed.LazyInitializationExceptionis a runtime JPA behavior that only manifests with a real database and a detached entity. A@WebMvcTestwith a mocked service will never reproduce it. The test must: (1) persist a document withtrainingLabelsto real PostgreSQL, (2) call the search endpoint end-to-end, (3) assert 200 OK.GET /api/documents/{id}that assertstrainingLabelsis present, create one now — before the change — so it can go red if the detail endpoint is accidentally broken.npm run generate:api). There's no CI gate asserting that the committed generated types match what the running backend produces. This is a latent drift risk.trainingLabelsAND matches a full-text search query? TheLazyInitializationExceptionrisk is highest when the entity is fetched via the FTS path (which goes throughfindFtsPageRaw) and then detached before enrichment. The test should exercise this specific path, not just a basic search.Recommendations
Test plan — all four ACs:
@WebMvcTestsearch_returns_only_list_fieldsjsonPath("$.items[0].id")present;jsonPath("$.items[0].transcription")absent@SpringBootTest+ Testcontainerssearch_does_not_throw_when_document_has_training_labels@SpringBootTest+ Testcontainersdetail_still_returns_training_labelsjsonPath("$.trainingLabels")present and non-nullapi-types-freshnpm run generate:api && git diff --exit-code frontend/src/lib/api/For AC #2 specifically, create the fixture like this — don't rely on an existing document in the test database:
On CI for AC #4: Add to the Gitea Actions workflow after the backend tests:
This fails the build if the generated types drift from what the backend currently produces.
🎨 Leonie Voss — UX Designer & Accessibility Strategist
Observations
DocumentRow.svelteaccessesdocument.id,title,originalFilename,documentDate,sender.displayName,receivers[].displayName,tags[].{id,name,color},summary,archiveBox,archiveFolder,location. All of these appear in the acceptance criteria for the new DTO, so no visual regressions are expected.thumbnailUrlvsthumbnailKey: The current entity exposesthumbnailKey(an internal S3 path). The DTO should expose a ready-to-usethumbnailUrlcomputed server-side. This is actually a UX improvement — the frontend no longer needs to know about S3 key formats or URL construction logic. If the thumbnail doesn't exist, returnnull; the existing fallback rendering in the component handles this already.statusis included in AC #1 and should be kept even thoughDocumentRow.sveltedoesn't currently display it. Placeholder documents (no file yet, imported via Excel) may need to be styled differently in a future iteration — havingstatusavailable in the list item without a follow-up endpoint call enables this without an API change.Document(used as the inner object inDocumentSearchItem) toDocumentListItem. All Svelte components importing and destructuring this type need to be updated afternpm run generate:api.Recommendations
senderandreceiversfields keep the same shape as what the frontend currently accesses (displayName,type). If the DTO usesPersonSummaryDTO(from the Person domain redesign), confirm it exposes at least{ displayName: string }—DocumentRow.svelteaccessessender.displayNameandreceivers[].displayName.tagsfield shape —DocumentRow.svelteaccessestags[].{id, name, color}. The DTO's tags field must preservecolor— if it uses a lighterTagSummarytype, ensurecoloris included.undefinedbubbling through to a rendered prop.⚙️ Tobias Wendt — DevOps & Platform Engineer
Observations
transcription(can be several KB of handwritten letter text per document) for every item in a 20-item page. On slow home connections (common for the 60+ audience), this is real latency. A 20-item search result page could be 50–200KB smaller per request.npm run generate:apito be run manually after the backend change. There is currently no CI step asserting that the committed generated types match what the backend produces. This is a drift risk that can be fixed with a two-line CI addition (see Sara's recommendation for the exact workflow step).Recommendations
/api/documents/searchp95 latency and response size metrics for a before/after comparison. The query itself won't change (entity graph still hits the same joins), but serialization time and bandwidth usage should drop noticeably for documents with long transcriptions.📋 Elicit — Requirements Engineer
Observations
thumbnailUrlappears in AC #1 but is not a field onDocument— the entity hasthumbnailKey(an S3 path). The issue doesn't specify: (1) what URL formatthumbnailUrlshould take, (2) how long a presigned URL is valid, or (3) what the frontend should do when the URL expires. This is ambiguous — "the list UI renders thumbnailUrl" doesn't describe behavior, only the name of a field that doesn't yet exist. This is the most critical spec gap before implementation starts.statusis listed in AC #1 as a required field but the issue body says "only the fields the list UI renders." Ifstatusis not currently rendered in the list view, there's a contradiction. Either the list view renders status (and the issue should say so explicitly), or status should be removed from AC #1. Given that PLACEHOLDER documents imported via Excel have no file and likely need visual differentiation, includingstatusis probably correct — but the spec should make this explicit.npm run checkpasses with zero errors afternpm run generate:apiis re-run."Recommendations
thumbnailUrlspec gap before coding starts. Add to the issue: how isthumbnailUrlconstructed? Is it a presigned MinIO URL, a path to a/api/documents/{id}/thumbnailendpoint, or something else? What happens whenthumbnailKeyis null?statusintent in the spec. Replace "only the fields the list UI renders" with "the fields the list UI renders plusstatusfor conditional display logic." One sentence removes the ambiguity.npm run generate:apiis run against the updated backend, whennpm run checkis executed infrontend/, then it exits with zero errors." This is unambiguous and automatable.🗳️ Decision Queue — Action Required
2 decisions need your input before implementation starts.
API Design
thumbnailUrlcomputation strategy — The acceptance criteria requirethumbnailUrlin the list DTO, butDocumentstoresthumbnailKey(an internal S3 path). Two options:GET /api/documents/{id}/thumbnail): extra round-trip per thumbnail, but no TTL problem and HTTP-cacheable. (Raised by: Markus — check how thumbnails are served on the detail page and standardize on that pattern.)Spec Gaps
statusfield intent — AC #1 listsstatusas a required list field, but the issue body says "only the fields the list UI renders" andDocumentRow.sveltedoesn't currently render status. Isstatusincluded for future conditional display (e.g., greying out PLACEHOLDER documents), or should it be removed from AC #1? (Raised by: Elicit, confirmed by Leonie and Felix.)✅ Decision Queue — Resolved
Both open items are closed. Implementation can start.
thumbnailUrlcomputation strategy → Resolved.thumbnailUrlis already a computed@JsonPropertyon theDocumententity (getThumbnailUrl()), returning a stable/api/documents/{id}/thumbnail?v={timestamp}URL. The DTO mapper just callsdocument.getThumbnailUrl()— no presigned URLs, no TTL risk.statusfield → Resolved.statusis not included inDocumentListItem. Remove it from AC #1.Implemented — PR #660
All acceptance criteria addressed. Branch:
feat/issue-647-document-list-dtoWhat was done
Backend
DocumentListItem.java— flat Java record with 15 fields (id, title, originalFilename, thumbnailUrl, documentDate, sender, receivers, tags, archiveBox, archiveFolder, location, summary, completionPercentage, contributors, matchData). Sensitive fields (transcription, filePath, fileHash, contentType, scriptType, etc.) are absent by design.DocumentSearchItem.java— fully replaced.DocumentSearchResult—List<DocumentSearchItem>→List<DocumentListItem>.toListItem()mapping method inDocumentService;enrichItems()returnsList<DocumentListItem>.Document.listentity graph: removedtrainingLabels—toListItem()never accesses it, so theLazyInitializationExceptionmaintenance trap is gone.Tests
DocumentControllerTest— new@WebMvcTestslice test asserts$.items[0].idpresent and$.items[0].transcription/filePath/fileHashabsent.DocumentListItemIntegrationTest(Testcontainers) — 3 tests: (1) search withtrainingLabelsdocument does not throw; (2) returned item has flat fields; (3) detail endpoint still returnstrainingLabelsviaDocument.fullgraph.Frontend
frontend/src/lib/generated/api.ts—DocumentSearchItemremoved,DocumentListItemadded with flat fields.DocumentRow.svelte,DocumentList.svelte,+page.server.ts,DocumentMultiSelect.svelte— allitem.document.xxx→item.xxx.DocumentRow.svelte.spec.ts,DocumentRow.svelte.test.ts,DocumentList.svelte.spec.ts—makeItem()factories refactored to flat shape.Commits
eb7edd13refactor(document): replace DocumentSearchItem with flat DocumentListItem DTO0f32332btest(document): add LazyInit guard + detail regression tests; prune Document.list graph4d0b8b15refactor(document): migrate frontend from DocumentSearchItem to flat DocumentListItem