Optimize document list API: avoid loading associations not needed by the list view #647

New Issue

marcel · 2026-05-20T19:55:08+02:00

marcel commented

2026-05-20 19:55:08 +02:00

Context

The document list endpoint (GET /api/documents/search) serializes the full Document entity — including all lazy associations — inside each DocumentSearchItem. With open-in-view: false, every lazy association must be eagerly loaded via entity graphs or the response crashes with LazyInitializationException when Jackson touches an uninitialized proxy.

Currently Document.list loads:

sender — displayed ✅
receivers — displayed ✅
tags — displayed ✅
trainingLabels — not displayed in the list UI but must be loaded to prevent a serialization crash

Root cause

The list API returns the same Document entity type as the detail API. There is no list-specific projection, so the server is forced to load (and the client is forced to receive) fields only the detail view needs — trainingLabels, transcription, summary, archiveBox, archiveFolder, documentLocation, scriptType, etc.

This creates ongoing maintenance debt: every new lazy association added to Document must be added to Document.list or it will crash in production once any document carries that data.

Goal

Reduce the payload and query cost of the list API to only the fields the list view actually renders, without regressing the detail view.

Acceptance criteria

Given a document search request, when the response is received, then each item contains only the fields the list UI renders: id, title, originalFilename, thumbnailUrl, documentDate, sender, receivers, tags, archiveBox, archiveFolder, location, summary, status, completionPercentage, contributors, matchData.
Given a document with trainingLabels assigned, when the list API is called, then no LazyInitializationException is thrown (regardless of whether training labels are included in the response).
Given the detail API (GET /api/documents/{id}), when called, then all existing fields (including trainingLabels) are still returned.
Given the TypeScript API types, when npm run generate:api is re-run, then the generated types reflect the narrower list type without breaking the detail-page types.

Options to consider

Option A — List-specific DTO (DocumentListItem): a Java record with only the fields the list view needs. The search endpoint returns DocumentSearchResult<DocumentListItem>. Clean separation but requires a new DTO and OpenAPI type, and generate:api must be re-run.

Option B — @JsonView: annotate Document fields with @JsonView(Views.List.class) / @JsonView(Views.Detail.class) and set the view on each controller method. No new DTO, but adds Jackson-view coupling to the entity.

Option C — Keep as-is, just ensure all lazy collections are in Document.list: lowest risk, no refactoring. Accepted as the fallback if A/B add too much complexity for the current project stage.

Out of scope

Pagination performance (separate issue)
GraphQL or field-selection query params

## Context The document list endpoint (`GET /api/documents/search`) serializes the full `Document` entity — including all lazy associations — inside each `DocumentSearchItem`. With `open-in-view: false`, every lazy association must be eagerly loaded via entity graphs or the response crashes with `LazyInitializationException` when Jackson touches an uninitialized proxy. Currently `Document.list` loads: - `sender` — displayed ✅ - `receivers` — displayed ✅ - `tags` — displayed ✅ - `trainingLabels` — **not displayed in the list UI** but must be loaded to prevent a serialization crash ### Root cause The list API returns the same `Document` entity type as the detail API. There is no list-specific projection, so the server is forced to load (and the client is forced to receive) fields only the detail view needs — `trainingLabels`, `transcription`, `summary`, `archiveBox`, `archiveFolder`, `documentLocation`, `scriptType`, etc. This creates ongoing maintenance debt: every new lazy association added to `Document` must be added to `Document.list` or it will crash in production once any document carries that data. ## Goal Reduce the payload and query cost of the list API to only the fields the list view actually renders, without regressing the detail view. ## Acceptance criteria 1. **Given** a document search request, **when** the response is received, **then** each item contains only the fields the list UI renders: `id`, `title`, `originalFilename`, `thumbnailUrl`, `documentDate`, `sender`, `receivers`, `tags`, `archiveBox`, `archiveFolder`, `location`, `summary`, `status`, `completionPercentage`, `contributors`, `matchData`. 2. **Given** a document with `trainingLabels` assigned, **when** the list API is called, **then** no `LazyInitializationException` is thrown (regardless of whether training labels are included in the response). 3. **Given** the detail API (`GET /api/documents/{id}`), **when** called, **then** all existing fields (including `trainingLabels`) are still returned. 4. **Given** the TypeScript API types, **when** `npm run generate:api` is re-run, **then** the generated types reflect the narrower list type without breaking the detail-page types. ## Options to consider **Option A — List-specific DTO** (`DocumentListItem`): a Java record with only the fields the list view needs. The search endpoint returns `DocumentSearchResult<DocumentListItem>`. Clean separation but requires a new DTO and OpenAPI type, and `generate:api` must be re-run. **Option B — `@JsonView`**: annotate `Document` fields with `@JsonView(Views.List.class)` / `@JsonView(Views.Detail.class)` and set the view on each controller method. No new DTO, but adds Jackson-view coupling to the entity. **Option C — Keep as-is, just ensure all lazy collections are in `Document.list`**: lowest risk, no refactoring. Accepted as the fallback if A/B add too much complexity for the current project stage. ## Out of scope - Pagination performance (separate issue) - GraphQL or field-selection query params

marcel added the P2-medium refactor labels 2026-05-20 19:55:13 +02:00

marcel commented

2026-05-22 17:03:11 +02:00

🏗️ Markus Keller — Senior Application Architect

Observations

Document.list and Document.full are currently identical (Document.java:31–41) — both eager-load sender, receivers, tags, and trainingLabels. The issue correctly names this as the root cause. The Document.list graph was created purely to avoid LazyInitializationException on trainingLabels, not to serve a different shape of data.
DocumentSearchItem (DocumentSearchItem.java) wraps the full Document entity — the JSON payload includes every scalar field: transcription, filePath, fileHash, contentType, thumbnailKey, thumbnailAspect, pageCount, metadataComplete, scriptType, documentLocation, createdAt, updatedAt. The frontend uses none of these in the list view.
Option A (DTO) is the right architectural choice. @JsonView (Option B) couples the entity to Jackson view annotations — serialization behavior becomes invisible to the type system and to OpenAPI codegen. Option C just defers the maintenance debt to the next developer who adds a lazy field.
thumbnailUrl is not a direct field on Document — the entity stores thumbnailKey (an internal S3 key). The acceptance criteria list thumbnailUrl as a required field, but the issue doesn't specify how it gets computed. This needs to be resolved before implementation starts (see Open Decisions).
The DocumentSearchItem wrapper already separates enrichment concerns (completionPercentage, contributors, matchData) from the entity. Option A should collapse this into one flat record — DocumentListItem replaces both the inner Document and the DocumentSearchItem wrapper — simplifying the response structure.

Recommendations

Go with Option A. Create DocumentListItem as a Java record containing all fields from the acceptance criteria: id, title, originalFilename, thumbnailUrl, documentDate, sender, receivers, tags, archiveBox, archiveFolder, location, summary, status, completionPercentage, contributors, matchData. This is a flat record — no nested Document inside.
Remove trainingLabels from the Document.list entity graph once the DTO is in place. Better still, once no endpoint returns the raw Document entity from the search path, delete Document.list entirely — it's no longer needed and leaves a trap for future developers.
The mapping point is DocumentService:739–753 (the enrichment pipeline). The private mapping function toListItem(Document, SearchMatchData, int, List<ActivityActorDTO>) should live here. It reads thumbnailKey and calls fileService.getThumbnailUrl() to produce the final thumbnailUrl value.
No architecture doc updates needed for a DTO addition in an existing domain package. But if DocumentSearchResult is renamed or restructured, update docs/architecture/c4/l3-backend-document.puml.

Open Decisions

thumbnailUrl computation strategy: two viable approaches, each with a cost.
- Pre-signed URL at query time — compute once per item in the mapping function. Simple, but the URL expires (MinIO default: 7 days for presigned GET). If a client caches the list response beyond the TTL, thumbnail images break silently.
- Stable redirect endpoint — expose GET /api/documents/{id}/thumbnail that redirects to a fresh presigned URL on demand. Extra round-trip per thumbnail, but no TTL problem. Works better with HTTP caching.
- The right choice depends on how thumbnails are currently served elsewhere in the app. Check whether thumbnailKey is already used to generate presigned URLs for the detail view — and standardize on that pattern.

## 🏗️ Markus Keller — Senior Application Architect ### Observations - **`Document.list` and `Document.full` are currently identical** (`Document.java:31–41`) — both eager-load `sender`, `receivers`, `tags`, and `trainingLabels`. The issue correctly names this as the root cause. The `Document.list` graph was created purely to avoid `LazyInitializationException` on `trainingLabels`, not to serve a different shape of data. - **`DocumentSearchItem` (`DocumentSearchItem.java`) wraps the full `Document` entity** — the JSON payload includes every scalar field: `transcription`, `filePath`, `fileHash`, `contentType`, `thumbnailKey`, `thumbnailAspect`, `pageCount`, `metadataComplete`, `scriptType`, `documentLocation`, `createdAt`, `updatedAt`. The frontend uses none of these in the list view. - **Option A (DTO) is the right architectural choice.** `@JsonView` (Option B) couples the entity to Jackson view annotations — serialization behavior becomes invisible to the type system and to OpenAPI codegen. Option C just defers the maintenance debt to the next developer who adds a lazy field. - **`thumbnailUrl` is not a direct field on `Document`** — the entity stores `thumbnailKey` (an internal S3 key). The acceptance criteria list `thumbnailUrl` as a required field, but the issue doesn't specify how it gets computed. This needs to be resolved before implementation starts (see Open Decisions). - **The `DocumentSearchItem` wrapper already separates enrichment concerns** (completionPercentage, contributors, matchData) from the entity. Option A should collapse this into one flat record — `DocumentListItem` replaces both the inner `Document` and the `DocumentSearchItem` wrapper — simplifying the response structure. ### Recommendations - **Go with Option A.** Create `DocumentListItem` as a Java record containing all fields from the acceptance criteria: `id`, `title`, `originalFilename`, `thumbnailUrl`, `documentDate`, `sender`, `receivers`, `tags`, `archiveBox`, `archiveFolder`, `location`, `summary`, `status`, `completionPercentage`, `contributors`, `matchData`. This is a flat record — no nested `Document` inside. - **Remove `trainingLabels` from the `Document.list` entity graph** once the DTO is in place. Better still, once no endpoint returns the raw `Document` entity from the search path, delete `Document.list` entirely — it's no longer needed and leaves a trap for future developers. - **The mapping point is `DocumentService:739–753`** (the enrichment pipeline). The private mapping function `toListItem(Document, SearchMatchData, int, List<ActivityActorDTO>)` should live here. It reads `thumbnailKey` and calls `fileService.getThumbnailUrl()` to produce the final `thumbnailUrl` value. - **No architecture doc updates needed** for a DTO addition in an existing domain package. But if `DocumentSearchResult` is renamed or restructured, update `docs/architecture/c4/l3-backend-document.puml`. ### Open Decisions - **`thumbnailUrl` computation strategy**: two viable approaches, each with a cost. - *Pre-signed URL at query time* — compute once per item in the mapping function. Simple, but the URL expires (MinIO default: 7 days for presigned GET). If a client caches the list response beyond the TTL, thumbnail images break silently. - *Stable redirect endpoint* — expose `GET /api/documents/{id}/thumbnail` that redirects to a fresh presigned URL on demand. Extra round-trip per thumbnail, but no TTL problem. Works better with HTTP caching. - The right choice depends on how thumbnails are currently served elsewhere in the app. Check whether `thumbnailKey` is already used to generate presigned URLs for the detail view — and standardize on that pattern.

marcel commented

2026-05-22 17:03:27 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

The mapping change is localized to one place: DocumentService:739–753. The enrichment pipeline already builds DocumentSearchItem from Document + enrichment data. Replacing the inner Document reference with a flat DocumentListItem record is a clean, bounded change.
@Schema(requiredMode = REQUIRED) annotations are load-bearing for TypeScript codegen (AC #4). Every non-nullable field on DocumentListItem needs @Schema(requiredMode = Schema.RequiredMode.REQUIRED) or the generated TypeScript types will have everything as optional — breaking the type safety the frontend relies on.
trainingLabels can be dropped from the Document.list entity graph — that's the only graph change needed. sender, receivers, and tags must stay because the DTO mapping reads them.
status is listed in AC #1 even though DocumentRow.svelte doesn't currently display it. Keep it in the DTO — placeholder documents need to be distinguished from uploaded ones, and it costs nothing in a flat record.
SearchMatchData should stay as a nested record in DocumentListItem — it's already a clean, well-scoped type. Don't flatten its fields into the top level.

Recommendations

Name the DTO DocumentListItem, not DocumentSummary or DocumentSearchDTO. The name communicates exactly when this type is used: as a list item in the document search result.
The mapping function signature: private DocumentListItem toListItem(Document doc, SearchMatchData match, int completionPct, List<ActivityActorDTO> contributors). Keep it under 20 lines; every field assignment is a one-liner. Testable with @ExtendWith(MockitoExtension.class).
TDD order:
1. Write a failing @WebMvcTest asserting the response shape of GET /api/documents/search (expected fields present, excluded fields absent).
2. Create the DocumentListItem record with @Schema annotations — test goes green on the type level.
3. Write a failing Testcontainers integration test for AC #2 (document with trainingLabels → no LazyInitializationException).
4. Update the entity graph to drop trainingLabels — test goes green.
5. Run npm run generate:api and fix any TypeScript type references.
After npm run generate:api: search for all usages of the old Document type in the frontend list path (DocumentRow.svelte, DocumentList.svelte, +page.svelte). The generated type name may change — update imports.
Don't merge until svelte-check passes — TypeScript errors in .svelte files aren't caught by tsc alone.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer ### Observations - **The mapping change is localized to one place**: `DocumentService:739–753`. The enrichment pipeline already builds `DocumentSearchItem` from `Document` + enrichment data. Replacing the inner `Document` reference with a flat `DocumentListItem` record is a clean, bounded change. - **`@Schema(requiredMode = REQUIRED)` annotations are load-bearing for TypeScript codegen** (AC #4). Every non-nullable field on `DocumentListItem` needs `@Schema(requiredMode = Schema.RequiredMode.REQUIRED)` or the generated TypeScript types will have everything as optional — breaking the type safety the frontend relies on. - **`trainingLabels` can be dropped from the `Document.list` entity graph** — that's the only graph change needed. `sender`, `receivers`, and `tags` must stay because the DTO mapping reads them. - **`status` is listed in AC #1** even though `DocumentRow.svelte` doesn't currently display it. Keep it in the DTO — placeholder documents need to be distinguished from uploaded ones, and it costs nothing in a flat record. - **`SearchMatchData` should stay as a nested record** in `DocumentListItem` — it's already a clean, well-scoped type. Don't flatten its fields into the top level. ### Recommendations - **Name the DTO `DocumentListItem`**, not `DocumentSummary` or `DocumentSearchDTO`. The name communicates exactly when this type is used: as a list item in the document search result. - **The mapping function** signature: `private DocumentListItem toListItem(Document doc, SearchMatchData match, int completionPct, List<ActivityActorDTO> contributors)`. Keep it under 20 lines; every field assignment is a one-liner. Testable with `@ExtendWith(MockitoExtension.class)`. - **TDD order**: 1. Write a failing `@WebMvcTest` asserting the response shape of `GET /api/documents/search` (expected fields present, excluded fields absent). 2. Create the `DocumentListItem` record with `@Schema` annotations — test goes green on the type level. 3. Write a failing Testcontainers integration test for AC #2 (document with `trainingLabels` → no `LazyInitializationException`). 4. Update the entity graph to drop `trainingLabels` — test goes green. 5. Run `npm run generate:api` and fix any TypeScript type references. - **After `npm run generate:api`**: search for all usages of the old `Document` type in the frontend list path (`DocumentRow.svelte`, `DocumentList.svelte`, `+page.svelte`). The generated type name may change — update imports. - **Don't merge until `svelte-check` passes** — TypeScript errors in `.svelte` files aren't caught by `tsc` alone.

marcel commented

2026-05-22 17:03:42 +02:00

🔒 Nora "NullX" Steiner — Application Security Engineer

Observations

This issue is primarily a data minimization improvement — which is both an OWASP API Security Top 10 finding (API3:2023 — Broken Object Property Level Authorization / Excessive Data Exposure) and a GDPR principle (Article 5(1)(c): data minimization). The current list endpoint sends fields to every READ_ALL-authorized client that should only be available in the detail view:

Field	Why it's a concern
`transcription`	Full text of historical personal correspondence — sent for every list item, not just a snippet
`fileHash` (SHA-256)	Allows a READ_ALL user to determine whether a specific known file exists in the archive without accessing it — a covert channel
`filePath`	Exposes internal S3/MinIO key structure — useful to an attacker probing storage misconfiguration
`contentType`, `thumbnailKey`	Internal storage metadata, unnecessary in list context
`metadataComplete`, `scriptType`	Internal workflow state, unnecessary for list rendering

None of these require new attack surface — they're sent passively in the current list response to any authenticated user.

Recommendations

The DTO fix is the correct security mitigation. Explicit field inclusion (DTO) is safer than explicit field exclusion (@JsonIgnore, @JsonView) — adding a new field to the entity does not automatically expose it in the list response.

Add a @WebMvcTest assertion that excluded fields are absent from the response:

mockMvc.perform(get("/api/documents/search").param("q", ""))
    .andExpect(jsonPath("$.items[0].transcription").doesNotExist())
    .andExpect(jsonPath("$.items[0].filePath").doesNotExist())
    .andExpect(jsonPath("$.items[0].fileHash").doesNotExist());

This is a permanent regression guard. Without it, someone adding implements Serializable or a Jackson mixin could accidentally re-expose these fields.

Confirm thumbnailUrl does not expose the raw thumbnailKey in the DTO. The computed URL (presigned or redirect) should be what the DTO contains — never the internal S3 key.
No authentication or authorization changes are needed — @RequirePermission(READ_ALL) on the search endpoint is unchanged and correct.
Note for future: if a WRITE_ALL-only detail view is ever introduced for the full Document entity (including transcription, trainingLabels), that's when to revisit permission boundaries. For now, both list and detail require READ_ALL, so this change is purely about field exposure, not access control.

## 🔒 Nora "NullX" Steiner — Application Security Engineer ### Observations This issue is primarily a **data minimization improvement** — which is both an OWASP API Security Top 10 finding (API3:2023 — Broken Object Property Level Authorization / Excessive Data Exposure) and a GDPR principle (Article 5(1)(c): data minimization). The current list endpoint sends fields to every READ_ALL-authorized client that should only be available in the detail view: | Field | Why it's a concern | |---|---| | `transcription` | Full text of historical personal correspondence — sent for every list item, not just a snippet | | `fileHash` (SHA-256) | Allows a READ_ALL user to determine whether a specific known file exists in the archive without accessing it — a covert channel | | `filePath` | Exposes internal S3/MinIO key structure — useful to an attacker probing storage misconfiguration | | `contentType`, `thumbnailKey` | Internal storage metadata, unnecessary in list context | | `metadataComplete`, `scriptType` | Internal workflow state, unnecessary for list rendering | None of these require new attack surface — they're sent passively in the current list response to any authenticated user. ### Recommendations - **The DTO fix is the correct security mitigation.** Explicit field inclusion (DTO) is safer than explicit field exclusion (`@JsonIgnore`, `@JsonView`) — adding a new field to the entity does not automatically expose it in the list response. - **Add a `@WebMvcTest` assertion that excluded fields are absent from the response**: ```java mockMvc.perform(get("/api/documents/search").param("q", "")) .andExpect(jsonPath("$.items[0].transcription").doesNotExist()) .andExpect(jsonPath("$.items[0].filePath").doesNotExist()) .andExpect(jsonPath("$.items[0].fileHash").doesNotExist()); ``` This is a permanent regression guard. Without it, someone adding `implements Serializable` or a Jackson mixin could accidentally re-expose these fields. - **Confirm `thumbnailUrl` does not expose the raw `thumbnailKey`** in the DTO. The computed URL (presigned or redirect) should be what the DTO contains — never the internal S3 key. - **No authentication or authorization changes are needed** — `@RequirePermission(READ_ALL)` on the search endpoint is unchanged and correct. - **Note for future**: if a `WRITE_ALL`-only detail view is ever introduced for the full `Document` entity (including `transcription`, `trainingLabels`), that's when to revisit permission boundaries. For now, both list and detail require READ_ALL, so this change is purely about field exposure, not access control.

marcel commented

2026-05-22 17:03:58 +02:00

🧪 Sara Holt — Senior QA Engineer

Observations

AC #1 is verifiable at the @WebMvcTest layer — assert response shape with MockMvc + jsonPath. Fast, no database needed.
AC #2 requires a Testcontainers integration test — LazyInitializationException is a runtime JPA behavior that only manifests with a real database and a detached entity. A @WebMvcTest with a mocked service will never reproduce it. The test must: (1) persist a document with trainingLabels to real PostgreSQL, (2) call the search endpoint end-to-end, (3) assert 200 OK.
AC #3 (detail API unchanged) needs a regression guard. If there's no existing integration test for GET /api/documents/{id} that asserts trainingLabels is present, create one now — before the change — so it can go red if the detail endpoint is accidentally broken.
AC #4 (TypeScript types) is currently a manual step (npm run generate:api). There's no CI gate asserting that the committed generated types match what the running backend produces. This is a latent drift risk.
Missing edge case in the AC: What if a document has trainingLabels AND matches a full-text search query? The LazyInitializationException risk is highest when the entity is fetched via the FTS path (which goes through findFtsPageRaw) and then detached before enrichment. The test should exercise this specific path, not just a basic search.

Recommendations

Test plan — all four ACs:

AC	Layer	Test name	Key assertion
#1	`@WebMvcTest`	`search_returns_only_list_fields`	`jsonPath("$.items[0].id")` present; `jsonPath("$.items[0].transcription")` absent
#2	`@SpringBootTest` + Testcontainers	`search_does_not_throw_when_document_has_training_labels`	Status 200; no exception in application logs
#3	`@SpringBootTest` + Testcontainers	`detail_still_returns_training_labels`	`jsonPath("$.trainingLabels")` present and non-null
#4	CI step	`api-types-fresh`	`npm run generate:api && git diff --exit-code frontend/src/lib/api/`

For AC #2 specifically, create the fixture like this — don't rely on an existing document in the test database:

Document doc = documentRepository.save(Document.builder()
    .title("Test")
    .originalFilename("test.pdf")
    .status(DocumentStatus.UPLOADED)
    .trainingLabels(Set.of(TrainingLabel.CORRECT))
    .build());
// now call GET /api/documents/search?q=Test and assert 200

On CI for AC #4: Add to the Gitea Actions workflow after the backend tests:

- name: Check API types are up to date
  run: |
    cd frontend && npm run generate:api
    git diff --exit-code src/lib/api/

This fails the build if the generated types drift from what the backend currently produces.

## 🧪 Sara Holt — Senior QA Engineer ### Observations - **AC #1 is verifiable at the `@WebMvcTest` layer** — assert response shape with MockMvc + jsonPath. Fast, no database needed. - **AC #2 requires a Testcontainers integration test** — `LazyInitializationException` is a runtime JPA behavior that only manifests with a real database and a detached entity. A `@WebMvcTest` with a mocked service will never reproduce it. The test must: (1) persist a document with `trainingLabels` to real PostgreSQL, (2) call the search endpoint end-to-end, (3) assert 200 OK. - **AC #3 (detail API unchanged)** needs a regression guard. If there's no existing integration test for `GET /api/documents/{id}` that asserts `trainingLabels` is present, create one now — before the change — so it can go red if the detail endpoint is accidentally broken. - **AC #4 (TypeScript types)** is currently a manual step (`npm run generate:api`). There's no CI gate asserting that the committed generated types match what the running backend produces. This is a latent drift risk. - **Missing edge case in the AC**: What if a document has `trainingLabels` AND matches a full-text search query? The `LazyInitializationException` risk is highest when the entity is fetched via the FTS path (which goes through `findFtsPageRaw`) and then detached before enrichment. The test should exercise this specific path, not just a basic search. ### Recommendations **Test plan — all four ACs:** | AC | Layer | Test name | Key assertion | |---|---|---|---| | #1 | `@WebMvcTest` | `search_returns_only_list_fields` | `jsonPath("$.items[0].id")` present; `jsonPath("$.items[0].transcription")` absent | | #2 | `@SpringBootTest` + Testcontainers | `search_does_not_throw_when_document_has_training_labels` | Status 200; no exception in application logs | | #3 | `@SpringBootTest` + Testcontainers | `detail_still_returns_training_labels` | `jsonPath("$.trainingLabels")` present and non-null | | #4 | CI step | `api-types-fresh` | `npm run generate:api && git diff --exit-code frontend/src/lib/api/` | **For AC #2 specifically**, create the fixture like this — don't rely on an existing document in the test database: ```java Document doc = documentRepository.save(Document.builder() .title("Test") .originalFilename("test.pdf") .status(DocumentStatus.UPLOADED) .trainingLabels(Set.of(TrainingLabel.CORRECT)) .build()); // now call GET /api/documents/search?q=Test and assert 200 ``` **On CI for AC #4**: Add to the Gitea Actions workflow after the backend tests: ```yaml - name: Check API types are up to date run: | cd frontend && npm run generate:api git diff --exit-code src/lib/api/ ``` This fails the build if the generated types drift from what the backend currently produces.

marcel commented

2026-05-22 17:04:12 +02:00

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Observations

This change is backend-only in terms of rendered output — DocumentRow.svelte accesses document.id, title, originalFilename, documentDate, sender.displayName, receivers[].displayName, tags[].{id,name,color}, summary, archiveBox, archiveFolder, location. All of these appear in the acceptance criteria for the new DTO, so no visual regressions are expected.
thumbnailUrl vs thumbnailKey: The current entity exposes thumbnailKey (an internal S3 path). The DTO should expose a ready-to-use thumbnailUrl computed server-side. This is actually a UX improvement — the frontend no longer needs to know about S3 key formats or URL construction logic. If the thumbnail doesn't exist, return null; the existing fallback rendering in the component handles this already.
status is included in AC #1 and should be kept even though DocumentRow.svelte doesn't currently display it. Placeholder documents (no file yet, imported via Excel) may need to be styled differently in a future iteration — having status available in the list item without a follow-up endpoint call enables this without an API change.
The generated TypeScript type name will change — from the current Document (used as the inner object in DocumentSearchItem) to DocumentListItem. All Svelte components importing and destructuring this type need to be updated after npm run generate:api.

Recommendations

Verify that the new DTO's sender and receivers fields keep the same shape as what the frontend currently accesses (displayName, type). If the DTO uses PersonSummaryDTO (from the Person domain redesign), confirm it exposes at least { displayName: string } — DocumentRow.svelte accesses sender.displayName and receivers[].displayName.
Verify tags field shape — DocumentRow.svelte accesses tags[].{id, name, color}. The DTO's tags field must preserve color — if it uses a lighter TagSummary type, ensure color is included.
After regenerating types, run a visual check at 320px on the document list page. The rendered HTML shouldn't change, but a quick Playwright screenshot at mobile viewport is cheap insurance against an unexpected undefined bubbling through to a rendered prop.
No accessibility or brand compliance concerns with this change — it's a data shape change, not a UI change.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist ### Observations - **This change is backend-only in terms of rendered output** — `DocumentRow.svelte` accesses `document.id`, `title`, `originalFilename`, `documentDate`, `sender.displayName`, `receivers[].displayName`, `tags[].{id,name,color}`, `summary`, `archiveBox`, `archiveFolder`, `location`. All of these appear in the acceptance criteria for the new DTO, so no visual regressions are expected. - **`thumbnailUrl` vs `thumbnailKey`**: The current entity exposes `thumbnailKey` (an internal S3 path). The DTO should expose a ready-to-use `thumbnailUrl` computed server-side. This is actually a UX improvement — the frontend no longer needs to know about S3 key formats or URL construction logic. If the thumbnail doesn't exist, return `null`; the existing fallback rendering in the component handles this already. - **`status` is included in AC #1** and should be kept even though `DocumentRow.svelte` doesn't currently display it. Placeholder documents (no file yet, imported via Excel) may need to be styled differently in a future iteration — having `status` available in the list item without a follow-up endpoint call enables this without an API change. - **The generated TypeScript type name will change** — from the current `Document` (used as the inner object in `DocumentSearchItem`) to `DocumentListItem`. All Svelte components importing and destructuring this type need to be updated after `npm run generate:api`. ### Recommendations - **Verify that the new DTO's `sender` and `receivers` fields keep the same shape** as what the frontend currently accesses (`displayName`, `type`). If the DTO uses `PersonSummaryDTO` (from the Person domain redesign), confirm it exposes at least `{ displayName: string }` — `DocumentRow.svelte` accesses `sender.displayName` and `receivers[].displayName`. - **Verify `tags` field shape** — `DocumentRow.svelte` accesses `tags[].{id, name, color}`. The DTO's tags field must preserve `color` — if it uses a lighter `TagSummary` type, ensure `color` is included. - **After regenerating types**, run a visual check at 320px on the document list page. The rendered HTML shouldn't change, but a quick Playwright screenshot at mobile viewport is cheap insurance against an unexpected `undefined` bubbling through to a rendered prop. - **No accessibility or brand compliance concerns** with this change — it's a data shape change, not a UI change.

marcel commented

2026-05-22 17:04:21 +02:00

⚙️ Tobias Wendt — DevOps & Platform Engineer

Observations

No infrastructure changes required. This is a pure application-code change — no new services, no Docker Compose modifications, no environment variables, no CI pipeline restructuring.
Potential for measurable payload reduction: the current list response sends transcription (can be several KB of handwritten letter text per document) for every item in a 20-item page. On slow home connections (common for the 60+ audience), this is real latency. A 20-item search result page could be 50–200KB smaller per request.
The one CI gap worth closing: AC #4 requires npm run generate:api to be run manually after the backend change. There is currently no CI step asserting that the committed generated types match what the backend produces. This is a drift risk that can be fixed with a two-line CI addition (see Sara's recommendation for the exact workflow step).

Recommendations

No infrastructure action items. The change is clean from a deployment perspective.
Optional: add a k6 smoke assertion for list response size after the change ships. Something like:
```
check(res, { 'list payload under 50KB': (r) => r.body.length < 50000 });
```
This converts the optimization into a permanent regression guard — if someone accidentally adds a large field back, the smoke test catches it.
Monitor Grafana after deploy: check the /api/documents/search p95 latency and response size metrics for a before/after comparison. The query itself won't change (entity graph still hits the same joins), but serialization time and bandwidth usage should drop noticeably for documents with long transcriptions.

## ⚙️ Tobias Wendt — DevOps & Platform Engineer ### Observations - **No infrastructure changes required.** This is a pure application-code change — no new services, no Docker Compose modifications, no environment variables, no CI pipeline restructuring. - **Potential for measurable payload reduction**: the current list response sends `transcription` (can be several KB of handwritten letter text per document) for every item in a 20-item page. On slow home connections (common for the 60+ audience), this is real latency. A 20-item search result page could be 50–200KB smaller per request. - **The one CI gap worth closing**: AC #4 requires `npm run generate:api` to be run manually after the backend change. There is currently no CI step asserting that the committed generated types match what the backend produces. This is a drift risk that can be fixed with a two-line CI addition (see Sara's recommendation for the exact workflow step). ### Recommendations - **No infrastructure action items.** The change is clean from a deployment perspective. - **Optional: add a k6 smoke assertion** for list response size after the change ships. Something like: ```javascript check(res, { 'list payload under 50KB': (r) => r.body.length < 50000 }); ``` This converts the optimization into a permanent regression guard — if someone accidentally adds a large field back, the smoke test catches it. - **Monitor Grafana after deploy**: check the `/api/documents/search` p95 latency and response size metrics for a before/after comparison. The query itself won't change (entity graph still hits the same joins), but serialization time and bandwidth usage should drop noticeably for documents with long transcriptions.

marcel commented

2026-05-22 17:04:39 +02:00

📋 Elicit — Requirements Engineer

Observations

The acceptance criteria are well-structured and largely testable — the Gherkin-style Given/When/Then format in AC #1–#4 is clear. Most criteria can be directly translated to test names without further elicitation.
thumbnailUrl appears in AC #1 but is not a field on Document — the entity has thumbnailKey (an S3 path). The issue doesn't specify: (1) what URL format thumbnailUrl should take, (2) how long a presigned URL is valid, or (3) what the frontend should do when the URL expires. This is ambiguous — "the list UI renders thumbnailUrl" doesn't describe behavior, only the name of a field that doesn't yet exist. This is the most critical spec gap before implementation starts.
status is listed in AC #1 as a required field but the issue body says "only the fields the list UI renders." If status is not currently rendered in the list view, there's a contradiction. Either the list view renders status (and the issue should say so explicitly), or status should be removed from AC #1. Given that PLACEHOLDER documents imported via Excel have no file and likely need visual differentiation, including status is probably correct — but the spec should make this explicit.
The options (A/B/C) are framed as choices but the AC don't commit to one. The AC are written in terms of observable behavior (fields in response, no exceptions, detail unchanged), which is good — they're option-agnostic. However, the implementation approach should be decided before work starts, not discovered during it.
AC #4 is stated as "TypeScript types reflect the narrower list type without breaking the detail-page types" — this is testable but the acceptance condition ("without breaking") is vague. A concrete criterion would be: "npm run check passes with zero errors after npm run generate:api is re-run."
Out of scope is explicit and appropriate — pagination performance and GraphQL are correctly excluded.

Recommendations

Resolve the thumbnailUrl spec gap before coding starts. Add to the issue: how is thumbnailUrl constructed? Is it a presigned MinIO URL, a path to a /api/documents/{id}/thumbnail endpoint, or something else? What happens when thumbnailKey is null?
Clarify status intent in the spec. Replace "only the fields the list UI renders" with "the fields the list UI renders plus status for conditional display logic." One sentence removes the ambiguity.
Tighten AC #4 to: "Given npm run generate:api is run against the updated backend, when npm run check is executed in frontend/, then it exits with zero errors." This is unambiguous and automatable.
Commit to Option A in the issue body. The three options create implementation uncertainty; the decision should be made in the spec, not in code review. Based on the analysis across all reviewers, Option A is the clear winner — make it explicit.

## 📋 Elicit — Requirements Engineer ### Observations - **The acceptance criteria are well-structured and largely testable** — the Gherkin-style Given/When/Then format in AC #1–#4 is clear. Most criteria can be directly translated to test names without further elicitation. - **`thumbnailUrl` appears in AC #1 but is not a field on `Document`** — the entity has `thumbnailKey` (an S3 path). The issue doesn't specify: (1) what URL format `thumbnailUrl` should take, (2) how long a presigned URL is valid, or (3) what the frontend should do when the URL expires. This is ambiguous — "the list UI renders thumbnailUrl" doesn't describe behavior, only the name of a field that doesn't yet exist. This is the most critical spec gap before implementation starts. - **`status` is listed in AC #1 as a required field** but the issue body says "only the fields the list UI renders." If `status` is not currently rendered in the list view, there's a contradiction. Either the list view renders status (and the issue should say so explicitly), or status should be removed from AC #1. Given that PLACEHOLDER documents imported via Excel have no file and likely need visual differentiation, including `status` is probably correct — but the spec should make this explicit. - **The options (A/B/C) are framed as choices** but the AC don't commit to one. The AC are written in terms of observable behavior (fields in response, no exceptions, detail unchanged), which is good — they're option-agnostic. However, the implementation approach should be decided before work starts, not discovered during it. - **AC #4 is stated as "TypeScript types reflect the narrower list type without breaking the detail-page types"** — this is testable but the acceptance condition ("without breaking") is vague. A concrete criterion would be: "`npm run check` passes with zero errors after `npm run generate:api` is re-run." - **Out of scope is explicit and appropriate** — pagination performance and GraphQL are correctly excluded. ### Recommendations - **Resolve the `thumbnailUrl` spec gap before coding starts.** Add to the issue: how is `thumbnailUrl` constructed? Is it a presigned MinIO URL, a path to a `/api/documents/{id}/thumbnail` endpoint, or something else? What happens when `thumbnailKey` is null? - **Clarify `status` intent in the spec.** Replace "only the fields the list UI renders" with "the fields the list UI renders plus `status` for conditional display logic." One sentence removes the ambiguity. - **Tighten AC #4** to: _"Given `npm run generate:api` is run against the updated backend, when `npm run check` is executed in `frontend/`, then it exits with zero errors."_ This is unambiguous and automatable. - **Commit to Option A** in the issue body. The three options create implementation uncertainty; the decision should be made in the spec, not in code review. Based on the analysis across all reviewers, Option A is the clear winner — make it explicit.

marcel commented

2026-05-22 17:04:51 +02:00

🗳️ Decision Queue — Action Required

2 decisions need your input before implementation starts.

API Design

thumbnailUrl computation strategy — The acceptance criteria require thumbnailUrl in the list DTO, but Document stores thumbnailKey (an internal S3 path). Two options:
- Pre-signed URL at query time: compute once per item in the mapping function. Simple, but URLs expire (MinIO default TTL: 7 days). Cached list responses with expired URLs show broken images silently.
- Stable redirect endpoint (GET /api/documents/{id}/thumbnail): extra round-trip per thumbnail, but no TTL problem and HTTP-cacheable. (Raised by: Markus — check how thumbnails are served on the detail page and standardize on that pattern.)

Spec Gaps

status field intent — AC #1 lists status as a required list field, but the issue body says "only the fields the list UI renders" and DocumentRow.svelte doesn't currently render status. Is status included for future conditional display (e.g., greying out PLACEHOLDER documents), or should it be removed from AC #1? (Raised by: Elicit, confirmed by Leonie and Felix.)

## 🗳️ Decision Queue — Action Required _2 decisions need your input before implementation starts._ ### API Design - **`thumbnailUrl` computation strategy** — The acceptance criteria require `thumbnailUrl` in the list DTO, but `Document` stores `thumbnailKey` (an internal S3 path). Two options: - *Pre-signed URL at query time*: compute once per item in the mapping function. Simple, but URLs expire (MinIO default TTL: 7 days). Cached list responses with expired URLs show broken images silently. - *Stable redirect endpoint* (`GET /api/documents/{id}/thumbnail`): extra round-trip per thumbnail, but no TTL problem and HTTP-cacheable. _(Raised by: Markus — check how thumbnails are served on the detail page and standardize on that pattern.)_ ### Spec Gaps - **`status` field intent** — AC #1 lists `status` as a required list field, but the issue body says "only the fields the list UI renders" and `DocumentRow.svelte` doesn't currently render status. Is `status` included for future conditional display (e.g., greying out PLACEHOLDER documents), or should it be removed from AC #1? _(Raised by: Elicit, confirmed by Leonie and Felix.)_

marcel commented

2026-05-22 17:09:18 +02:00

✅ Decision Queue — Resolved

Both open items are closed. Implementation can start.

thumbnailUrl computation strategy → Resolved. thumbnailUrl is already a computed @JsonProperty on the Document entity (getThumbnailUrl()), returning a stable /api/documents/{id}/thumbnail?v={timestamp} URL. The DTO mapper just calls document.getThumbnailUrl() — no presigned URLs, no TTL risk.

status field → Resolved. status is not included in DocumentListItem. Remove it from AC #1.

## ✅ Decision Queue — Resolved Both open items are closed. Implementation can start. **`thumbnailUrl` computation strategy** → Resolved. `thumbnailUrl` is already a computed `@JsonProperty` on the `Document` entity (`getThumbnailUrl()`), returning a stable `/api/documents/{id}/thumbnail?v={timestamp}` URL. The DTO mapper just calls `document.getThumbnailUrl()` — no presigned URLs, no TTL risk. **`status` field** → Resolved. `status` is **not** included in `DocumentListItem`. Remove it from AC #1.

marcel referenced a pull request that will close this issue

2026-05-22 18:39:15 +02:00

feat(document): replace DocumentSearchItem with flat DocumentListItem DTO (#647) #660

marcel commented

2026-05-22 18:39:30 +02:00

Implemented — PR #660

All acceptance criteria addressed. Branch: feat/issue-647-document-list-dto

What was done

Backend

Created DocumentListItem.java — flat Java record with 15 fields (id, title, originalFilename, thumbnailUrl, documentDate, sender, receivers, tags, archiveBox, archiveFolder, location, summary, completionPercentage, contributors, matchData). Sensitive fields (transcription, filePath, fileHash, contentType, scriptType, etc.) are absent by design.
Deleted DocumentSearchItem.java — fully replaced.
Updated DocumentSearchResult — List<DocumentSearchItem> → List<DocumentListItem>.
Added toListItem() mapping method in DocumentService; enrichItems() returns List<DocumentListItem>.
Pruned Document.list entity graph: removed trainingLabels — toListItem() never accesses it, so the LazyInitializationException maintenance trap is gone.

Tests

DocumentControllerTest — new @WebMvcTest slice test asserts $.items[0].id present and $.items[0].transcription / filePath / fileHash absent.
DocumentListItemIntegrationTest (Testcontainers) — 3 tests: (1) search with trainingLabels document does not throw; (2) returned item has flat fields; (3) detail endpoint still returns trainingLabels via Document.full graph.

Frontend

Regenerated frontend/src/lib/generated/api.ts — DocumentSearchItem removed, DocumentListItem added with flat fields.
Migrated DocumentRow.svelte, DocumentList.svelte, +page.server.ts, DocumentMultiSelect.svelte — all item.document.xxx → item.xxx.
Updated DocumentRow.svelte.spec.ts, DocumentRow.svelte.test.ts, DocumentList.svelte.spec.ts — makeItem() factories refactored to flat shape.

Commits

eb7edd13 refactor(document): replace DocumentSearchItem with flat DocumentListItem DTO
0f32332b test(document): add LazyInit guard + detail regression tests; prune Document.list graph
4d0b8b15 refactor(document): migrate frontend from DocumentSearchItem to flat DocumentListItem

## Implemented — PR #660 All acceptance criteria addressed. Branch: `feat/issue-647-document-list-dto` ### What was done **Backend** - Created `DocumentListItem.java` — flat Java record with 15 fields (id, title, originalFilename, thumbnailUrl, documentDate, sender, receivers, tags, archiveBox, archiveFolder, location, summary, completionPercentage, contributors, matchData). Sensitive fields (transcription, filePath, fileHash, contentType, scriptType, etc.) are absent by design. - Deleted `DocumentSearchItem.java` — fully replaced. - Updated `DocumentSearchResult` — `List<DocumentSearchItem>` → `List<DocumentListItem>`. - Added `toListItem()` mapping method in `DocumentService`; `enrichItems()` returns `List<DocumentListItem>`. - Pruned `Document.list` entity graph: removed `trainingLabels` — `toListItem()` never accesses it, so the `LazyInitializationException` maintenance trap is gone. **Tests** - `DocumentControllerTest` — new `@WebMvcTest` slice test asserts `$.items[0].id` present and `$.items[0].transcription` / `filePath` / `fileHash` absent. - `DocumentListItemIntegrationTest` (Testcontainers) — 3 tests: (1) search with `trainingLabels` document does not throw; (2) returned item has flat fields; (3) detail endpoint still returns `trainingLabels` via `Document.full` graph. **Frontend** - Regenerated `frontend/src/lib/generated/api.ts` — `DocumentSearchItem` removed, `DocumentListItem` added with flat fields. - Migrated `DocumentRow.svelte`, `DocumentList.svelte`, `+page.server.ts`, `DocumentMultiSelect.svelte` — all `item.document.xxx` → `item.xxx`. - Updated `DocumentRow.svelte.spec.ts`, `DocumentRow.svelte.test.ts`, `DocumentList.svelte.spec.ts` — `makeItem()` factories refactored to flat shape. ### Commits - `eb7edd13` refactor(document): replace DocumentSearchItem with flat DocumentListItem DTO - `0f32332b` test(document): add LazyInit guard + detail regression tests; prune Document.list graph - `4d0b8b15` refactor(document): migrate frontend from DocumentSearchItem to flat DocumentListItem

marcel referenced this issue

2026-05-22 19:23:31 +02:00

feat(document): replace DocumentSearchItem with flat DocumentListItem DTO (#647) #660

marcel closed this issue

2026-05-23 19:35:45 +02:00

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: marcel/familienarchiv#647