feature: PDF-Thumbnails für Dokumente (Upload + Admin-Backfill) #307

Closed
opened 2026-04-22 20:40:40 +02:00 by marcel · 8 comments
Owner

PDF Thumbnail Support

Context

Documents are shown as text-only rows across the app (home search, PersonDocumentList, ConversationTimeline, Chronik). At a glance there is no visual cue for "which document is this" — every row looks the same. For an archive that is fundamentally visual (handwritten letters, typed pages, photos), this is a real discoverability problem. Specs already exist for the target UI (docs/specs/briefwechsel-thumbnail-rows-spec.html).

This plan adds a small JPEG preview (first page for PDFs, scaled-down version for images) that is:

  1. generated on upload (fire-and-forget, does not block the upload response),
  2. backfilled for existing documents via a new admin card in /admin/system,
  3. stored next to the original in MinIO under a new prefix,
  4. surfaced on the Document entity so the frontend can render it wherever documents are listed.

Takes on approach

A few judgment calls worth flagging up front — these are the sensible defaults.

  • Same bucket, thumbnails/<docId>.jpg prefix. One bucket already exists; adding a prefix is simpler than a second bucket and keeps the deploy story identical. Deterministic key (no UUID suffix) so regenerating after a file replace overwrites cleanly.
  • JPEG, width 400px, quality 85. Works for both phone (~60×85 rendered) and retina list rows. JPEG is supported out of the box by javax.imageio — no new dependency. If WebP is desired later, it is a one-line change in the generator (swap the writer).
  • PDFBox 3.0.4 is already in pom.xml (used for training data export). PDFRenderer.renderImageWithDPI(0, 100) gives the first page. Zero new backend dependencies.
  • Fire-and-forget on upload, @Async on the reusable taskExecutor. Same pattern as OcrAsyncRunner — failure is logged, upload succeeds regardless. Backfill covers any doc the async task never got to.
  • Backfill job mirrors MassImportService exactly. In-memory ThumbnailBackfillStatus (IDLE/RUNNING/DONE/FAILED with processed/skipped/failed counters), polled every 2s by the admin UI. No new persistent job entity — overkill for a single-operator admin tool and inconsistent with existing admin backfills.
  • Entity fields, not a sidecar table. Add thumbnailKey + thumbnailGeneratedAt directly on Document. Two columns, no extra join, and the frontend already receives the whole Document on every endpoint. thumbnailGeneratedAt doubles as a cache-buster in the URL.
  • Thumbnail served by a new endpoint, not a presigned URL. GET /api/documents/{id}/thumbnail streams from S3 with Cache-Control: public, max-age=31536000, immutable (the URL changes when thumbnailGeneratedAt changes). Keeps auth consistent with /file and avoids presign complexity in <img> tags.

Backend changes

1. Data model (Flyway + entity)

backend/src/main/resources/db/migration/V39__add_document_thumbnails.sql (new)

ALTER TABLE documents
    ADD COLUMN thumbnail_key VARCHAR(255),
    ADD COLUMN thumbnail_generated_at TIMESTAMP;

backend/src/main/java/org/raddatz/familienarchiv/model/Document.java — add two nullable fields after fileHash:

@Column(name = "thumbnail_key")
private String thumbnailKey;

@Column(name = "thumbnail_generated_at")
private LocalDateTime thumbnailGeneratedAt;

Both nullable → no @Schema(requiredMode = REQUIRED), meaning they appear as string | undefined in the generated TS types.

2. New service: ThumbnailService

backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailService.java (new)

Responsibilities:

  • generate(Document doc) — downloads original via FileService.downloadFileBytes(), renders preview, uploads to thumbnails/{docId}.jpg via S3Client, sets thumbnailKey + thumbnailGeneratedAt on the document, saves.
  • Dispatches on doc.getContentType():
    • application/pdf → PDFBox Loader.loadPDF(bytes)PDFRenderer.renderImageWithDPI(0, 100, ImageType.RGB)
    • image/jpeg, image/png, image/tiffImageIO.read(new ByteArrayInputStream(bytes))
    • anything else → log warning, skip
  • Scales the BufferedImage to width=400 preserving aspect ratio (simple Image.getScaledInstance + drawImage, no extra library).
  • Writes to ByteArrayOutputStream via ImageIO.write(scaled, "jpg", out).
  • Uploads via existing S3Client bean with PutObjectRequest.contentType("image/jpeg").

Explicitly does not throw — logs and returns. Upload paths must not fail because of thumbnails.

3. Async runner: ThumbnailAsyncRunner

backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailAsyncRunner.java (new) — mirrors OcrAsyncRunner.

@Async
public void generateAsync(UUID documentId) {
    Document doc = documentService.getDocumentById(documentId);
    thumbnailService.generate(doc);
}

4. Hook into upload paths

In DocumentService — after each fileService.uploadFile(...) call and the subsequent save, trigger thumbnailAsyncRunner.generateAsync(saved.getId()). Four sites:

  • storeDocument
  • createDocument
  • updateDocument (only when newFile != null)
  • attachFile

In MassImportService.importSingleDocument — same hook after each file upload.

Inject ThumbnailAsyncRunner via constructor (@RequiredArgsConstructor handles it).

5. Backfill service

backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailBackfillService.java (new) — copy the shape of MassImportService:

public enum State { IDLE, RUNNING, DONE, FAILED }
public record BackfillStatus(State state, String message, int total, int processed, int skipped, int failed, LocalDateTime startedAt) {}

private volatile BackfillStatus currentStatus = new BackfillStatus(State.IDLE, "Bereit.", 0, 0, 0, 0, null);

@Async
public void runBackfillAsync() {
    if (currentStatus.state() == State.RUNNING) {
        throw DomainException.conflict(ErrorCode.THUMBNAIL_BACKFILL_ALREADY_RUNNING, "Thumbnail-Generierung läuft bereits");
    }
    // find docs with filePath != null and thumbnailKey == null
    // for each: call thumbnailService.generate(doc), update status counters
    // unsupported content types count as 'skipped', exceptions as 'failed'
}
public BackfillStatus getStatus() { return currentStatus; }

Add repository method on DocumentRepository:

List<Document> findByFilePathIsNotNullAndThumbnailKeyIsNull();

6. Error code

exception/ErrorCode.java — add THUMBNAIL_BACKFILL_ALREADY_RUNNING.
frontend/src/lib/errors.ts — mirror it, add a Paraglide key in messages/{de,en,es}.json.

7. Controller endpoints

AdminController (append):

@PostMapping("/generate-thumbnails")
public ThumbnailBackfillService.BackfillStatus triggerThumbnails() {
    thumbnailBackfillService.runBackfillAsync();
    return thumbnailBackfillService.getStatus();
}

@GetMapping("/thumbnail-status")
public ThumbnailBackfillService.BackfillStatus thumbnailStatus() {
    return thumbnailBackfillService.getStatus();
}

DocumentController (append, mirroring /file):

@GetMapping("/{id}/thumbnail")
public ResponseEntity<Resource> downloadThumbnail(@PathVariable UUID id) {
    Document doc = documentService.getDocumentById(id);
    if (doc.getThumbnailKey() == null) {
        throw DomainException.notFound(ErrorCode.FILE_NOT_FOUND, "No thumbnail for document: " + id);
    }
    FileService.S3FileDownload dl = fileService.downloadFile(doc.getThumbnailKey());
    return ResponseEntity.ok()
            .header(HttpHeaders.CONTENT_TYPE, "image/jpeg")
            .header(HttpHeaders.CACHE_CONTROL, "public, max-age=31536000, immutable")
            .body(dl.resource());
}

8. Tests (TDD — red → green)

  • ThumbnailServiceTest — unit test with a tiny fixture PDF (src/test/resources/fixtures/sample.pdf) and a fixture PNG. Assert S3 put is called with the correct key and bytes are a decodable JPEG of width 400.
  • ThumbnailBackfillServiceTest — mock ThumbnailService, verify it is called for each document without a thumbnailKey, that status transitions match the MassImportService pattern, and that unsupported content-types increment skipped.
  • AdminControllerTest — slice test for the new endpoints (permission required, status shape).
  • DocumentServiceTest — verify thumbnailAsyncRunner.generateAsync(id) is called after each of the four upload paths.

Frontend changes

1. Regenerate API types

After the backend is rebuilt with -DskipTests and running in dev:

cd frontend && npm run generate:api

This adds thumbnailKey?: string and thumbnailGeneratedAt?: string to the generated Document type.

2. Thumbnail URL helper

frontend/src/lib/thumbnails.ts (new) — one small function:

export function thumbnailUrl(doc: { id: string; thumbnailKey?: string; thumbnailGeneratedAt?: string }): string | null {
    if (!doc.thumbnailKey) return null;
    const bust = doc.thumbnailGeneratedAt ? `?v=${encodeURIComponent(doc.thumbnailGeneratedAt)}` : '';
    return `/api/documents/${doc.id}/thumbnail${bust}`;
}

The ?v= query param guarantees cache invalidation when a file is replaced.

3. Document list rows

frontend/src/lib/components/DocumentRow.svelte — insert a 60×84px thumbnail column before the title column. When thumbnailUrl(doc) returns null, keep the existing PDF icon as fallback. Use loading="lazy" on the <img>; border/radius matches the brand-sand card style.

frontend/src/routes/persons/[id]/PersonDocumentList.svelte — replace the small PDF icon tile with the thumbnail when available (same fallback pattern).

For ConversationTimeline.svelte and ChronikRow.svelte: out of scope for this iteration. The specs exist but each is its own layout problem — address in follow-ups once the core data plumbing is proven.

4. Admin system page — new card

frontend/src/routes/admin/system/+page.svelte — add a fourth card after "Mass Import" mirroring its structure exactly:

  • Button: "Thumbnails für fehlende Dokumente erzeugen"
  • Endpoint: POST /api/admin/generate-thumbnails
  • Polling: GET /api/admin/thumbnail-status every 2000 ms while state is RUNNING (reuse the fetchImportStatus pattern)
  • Display: processed / skipped / failed counters

5. E2E sanity check

Existing tests should keep passing. Add one Playwright test under frontend/e2e/ that uploads a PDF, waits briefly, and asserts the thumbnail image loads (HTTP 200 from /api/documents/{id}/thumbnail).

Critical files

New

  • backend/src/main/resources/db/migration/V39__add_document_thumbnails.sql
  • backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailService.java
  • backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailAsyncRunner.java
  • backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailBackfillService.java
  • backend/src/test/java/.../ThumbnailServiceTest.java
  • backend/src/test/java/.../ThumbnailBackfillServiceTest.java
  • frontend/src/lib/thumbnails.ts

Modified

  • backend/src/main/java/org/raddatz/familienarchiv/model/Document.java — two new columns
  • backend/src/main/java/org/raddatz/familienarchiv/service/DocumentService.java — inject runner, four hooks
  • backend/src/main/java/org/raddatz/familienarchiv/service/MassImportService.java — one hook in importSingleDocument
  • backend/src/main/java/org/raddatz/familienarchiv/repository/DocumentRepository.java — new finder
  • backend/src/main/java/org/raddatz/familienarchiv/controller/AdminController.java — two endpoints
  • backend/src/main/java/org/raddatz/familienarchiv/controller/DocumentController.java/thumbnail endpoint
  • backend/src/main/java/org/raddatz/familienarchiv/exception/ErrorCode.java — new code
  • frontend/src/lib/generated/api.ts — regenerated
  • frontend/src/lib/errors.ts + messages/{de,en,es}.json — new code translation
  • frontend/src/lib/components/DocumentRow.svelte — thumbnail column
  • frontend/src/routes/persons/[id]/PersonDocumentList.svelte — thumbnail tile
  • frontend/src/routes/admin/system/+page.svelte — fourth card

Reused (read-only)

  • FileService (uploadFile, downloadFileBytes, downloadFile) — no changes
  • AsyncConfig.taskExecutor() — same pool as OCR
  • OcrAsyncRunner — structural template
  • MassImportService — state-machine and status template

Verification

End-to-end checklist once implemented:

  1. ./mvnw test — all new and existing backend tests pass.
  2. docker-compose up -d then ./mvnw spring-boot:run -Dspring-boot.run.profiles=dev.
  3. Fresh upload: POST /api/documents with a multipart PDF. Response returns immediately; wait ~1 s; GET /api/documents/{id} shows thumbnailKey populated; GET /api/documents/{id}/thumbnail returns a JPEG with Cache-Control: immutable.
  4. Image upload: same with a PNG — thumbnail is a scaled PNG→JPEG.
  5. File replace (PUT /api/documents/{id} with new file): thumbnailGeneratedAt changes, cache-busting URL updates, new thumbnail visible.
  6. Admin backfill: pick a document, clear its thumbnail_key in DB manually; POST /api/admin/generate-thumbnails; poll /api/admin/thumbnail-status until DONE; verify the doc has a new thumbnail.
  7. Concurrent backfill is rejected: start two backfills in a row → second returns HTTP 409 with THUMBNAIL_BACKFILL_ALREADY_RUNNING.
  8. Unsupported content type (e.g. a legacy doc whose mime ended up as application/octet-stream): counted in skipped, no error, logged.
  9. Frontend: npm run generate:api && npm run dev. Check home search shows thumbnails in rows; person detail shows thumbnails in the sidebar list; admin system page shows the new card and polling works.
  10. Playwright e2e: npm run test:e2e -- thumbnail passes.
# PDF Thumbnail Support ## Context Documents are shown as text-only rows across the app (home search, `PersonDocumentList`, `ConversationTimeline`, Chronik). At a glance there is no visual cue for "which document is this" — every row looks the same. For an archive that is fundamentally visual (handwritten letters, typed pages, photos), this is a real discoverability problem. Specs already exist for the target UI (`docs/specs/briefwechsel-thumbnail-rows-spec.html`). This plan adds a small JPEG preview (first page for PDFs, scaled-down version for images) that is: 1. generated **on upload** (fire-and-forget, does not block the upload response), 2. backfilled for existing documents via a new admin card in `/admin/system`, 3. stored next to the original in MinIO under a new prefix, 4. surfaced on the `Document` entity so the frontend can render it wherever documents are listed. ## Takes on approach A few judgment calls worth flagging up front — these are the sensible defaults. - **Same bucket, `thumbnails/<docId>.jpg` prefix.** One bucket already exists; adding a prefix is simpler than a second bucket and keeps the deploy story identical. Deterministic key (no UUID suffix) so regenerating after a file replace overwrites cleanly. - **JPEG, width 400px, quality 85.** Works for both phone (~60×85 rendered) and retina list rows. JPEG is supported out of the box by `javax.imageio` — no new dependency. If WebP is desired later, it is a one-line change in the generator (swap the writer). - **PDFBox 3.0.4 is already in `pom.xml`** (used for training data export). `PDFRenderer.renderImageWithDPI(0, 100)` gives the first page. Zero new backend dependencies. - **Fire-and-forget on upload, `@Async` on the reusable `taskExecutor`.** Same pattern as `OcrAsyncRunner` — failure is logged, upload succeeds regardless. Backfill covers any doc the async task never got to. - **Backfill job mirrors `MassImportService` exactly.** In-memory `ThumbnailBackfillStatus` (IDLE/RUNNING/DONE/FAILED with processed/skipped/failed counters), polled every 2s by the admin UI. No new persistent job entity — overkill for a single-operator admin tool and inconsistent with existing admin backfills. - **Entity fields, not a sidecar table.** Add `thumbnailKey` + `thumbnailGeneratedAt` directly on `Document`. Two columns, no extra join, and the frontend already receives the whole Document on every endpoint. `thumbnailGeneratedAt` doubles as a cache-buster in the URL. - **Thumbnail served by a new endpoint, not a presigned URL.** `GET /api/documents/{id}/thumbnail` streams from S3 with `Cache-Control: public, max-age=31536000, immutable` (the URL changes when `thumbnailGeneratedAt` changes). Keeps auth consistent with `/file` and avoids presign complexity in `<img>` tags. ## Backend changes ### 1. Data model (Flyway + entity) **`backend/src/main/resources/db/migration/V39__add_document_thumbnails.sql`** (new) ```sql ALTER TABLE documents ADD COLUMN thumbnail_key VARCHAR(255), ADD COLUMN thumbnail_generated_at TIMESTAMP; ``` **`backend/src/main/java/org/raddatz/familienarchiv/model/Document.java`** — add two nullable fields after `fileHash`: ```java @Column(name = "thumbnail_key") private String thumbnailKey; @Column(name = "thumbnail_generated_at") private LocalDateTime thumbnailGeneratedAt; ``` Both nullable → no `@Schema(requiredMode = REQUIRED)`, meaning they appear as `string | undefined` in the generated TS types. ### 2. New service: `ThumbnailService` **`backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailService.java`** (new) Responsibilities: - `generate(Document doc)` — downloads original via `FileService.downloadFileBytes()`, renders preview, uploads to `thumbnails/{docId}.jpg` via `S3Client`, sets `thumbnailKey` + `thumbnailGeneratedAt` on the document, saves. - Dispatches on `doc.getContentType()`: - `application/pdf` → PDFBox `Loader.loadPDF(bytes)` → `PDFRenderer.renderImageWithDPI(0, 100, ImageType.RGB)` - `image/jpeg`, `image/png`, `image/tiff` → `ImageIO.read(new ByteArrayInputStream(bytes))` - anything else → log warning, skip - Scales the `BufferedImage` to width=400 preserving aspect ratio (simple `Image.getScaledInstance` + `drawImage`, no extra library). - Writes to `ByteArrayOutputStream` via `ImageIO.write(scaled, "jpg", out)`. - Uploads via existing `S3Client` bean with `PutObjectRequest.contentType("image/jpeg")`. Explicitly **does not** throw — logs and returns. Upload paths must not fail because of thumbnails. ### 3. Async runner: `ThumbnailAsyncRunner` **`backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailAsyncRunner.java`** (new) — mirrors `OcrAsyncRunner`. ```java @Async public void generateAsync(UUID documentId) { Document doc = documentService.getDocumentById(documentId); thumbnailService.generate(doc); } ``` ### 4. Hook into upload paths In **`DocumentService`** — after each `fileService.uploadFile(...)` call and the subsequent save, trigger `thumbnailAsyncRunner.generateAsync(saved.getId())`. Four sites: - `storeDocument` - `createDocument` - `updateDocument` (only when `newFile != null`) - `attachFile` In **`MassImportService.importSingleDocument`** — same hook after each file upload. Inject `ThumbnailAsyncRunner` via constructor (`@RequiredArgsConstructor` handles it). ### 5. Backfill service **`backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailBackfillService.java`** (new) — copy the shape of `MassImportService`: ```java public enum State { IDLE, RUNNING, DONE, FAILED } public record BackfillStatus(State state, String message, int total, int processed, int skipped, int failed, LocalDateTime startedAt) {} private volatile BackfillStatus currentStatus = new BackfillStatus(State.IDLE, "Bereit.", 0, 0, 0, 0, null); @Async public void runBackfillAsync() { if (currentStatus.state() == State.RUNNING) { throw DomainException.conflict(ErrorCode.THUMBNAIL_BACKFILL_ALREADY_RUNNING, "Thumbnail-Generierung läuft bereits"); } // find docs with filePath != null and thumbnailKey == null // for each: call thumbnailService.generate(doc), update status counters // unsupported content types count as 'skipped', exceptions as 'failed' } public BackfillStatus getStatus() { return currentStatus; } ``` Add repository method on `DocumentRepository`: ```java List<Document> findByFilePathIsNotNullAndThumbnailKeyIsNull(); ``` ### 6. Error code **`exception/ErrorCode.java`** — add `THUMBNAIL_BACKFILL_ALREADY_RUNNING`. **`frontend/src/lib/errors.ts`** — mirror it, add a Paraglide key in `messages/{de,en,es}.json`. ### 7. Controller endpoints **`AdminController`** (append): ```java @PostMapping("/generate-thumbnails") public ThumbnailBackfillService.BackfillStatus triggerThumbnails() { thumbnailBackfillService.runBackfillAsync(); return thumbnailBackfillService.getStatus(); } @GetMapping("/thumbnail-status") public ThumbnailBackfillService.BackfillStatus thumbnailStatus() { return thumbnailBackfillService.getStatus(); } ``` **`DocumentController`** (append, mirroring `/file`): ```java @GetMapping("/{id}/thumbnail") public ResponseEntity<Resource> downloadThumbnail(@PathVariable UUID id) { Document doc = documentService.getDocumentById(id); if (doc.getThumbnailKey() == null) { throw DomainException.notFound(ErrorCode.FILE_NOT_FOUND, "No thumbnail for document: " + id); } FileService.S3FileDownload dl = fileService.downloadFile(doc.getThumbnailKey()); return ResponseEntity.ok() .header(HttpHeaders.CONTENT_TYPE, "image/jpeg") .header(HttpHeaders.CACHE_CONTROL, "public, max-age=31536000, immutable") .body(dl.resource()); } ``` ### 8. Tests (TDD — red → green) - `ThumbnailServiceTest` — unit test with a tiny fixture PDF (`src/test/resources/fixtures/sample.pdf`) and a fixture PNG. Assert S3 put is called with the correct key and bytes are a decodable JPEG of width 400. - `ThumbnailBackfillServiceTest` — mock `ThumbnailService`, verify it is called for each document without a `thumbnailKey`, that status transitions match the `MassImportService` pattern, and that unsupported content-types increment `skipped`. - `AdminControllerTest` — slice test for the new endpoints (permission required, status shape). - `DocumentServiceTest` — verify `thumbnailAsyncRunner.generateAsync(id)` is called after each of the four upload paths. ## Frontend changes ### 1. Regenerate API types After the backend is rebuilt with `-DskipTests` and running in dev: ```bash cd frontend && npm run generate:api ``` This adds `thumbnailKey?: string` and `thumbnailGeneratedAt?: string` to the generated `Document` type. ### 2. Thumbnail URL helper **`frontend/src/lib/thumbnails.ts`** (new) — one small function: ```ts export function thumbnailUrl(doc: { id: string; thumbnailKey?: string; thumbnailGeneratedAt?: string }): string | null { if (!doc.thumbnailKey) return null; const bust = doc.thumbnailGeneratedAt ? `?v=${encodeURIComponent(doc.thumbnailGeneratedAt)}` : ''; return `/api/documents/${doc.id}/thumbnail${bust}`; } ``` The `?v=` query param guarantees cache invalidation when a file is replaced. ### 3. Document list rows **`frontend/src/lib/components/DocumentRow.svelte`** — insert a 60×84px thumbnail column before the title column. When `thumbnailUrl(doc)` returns null, keep the existing PDF icon as fallback. Use `loading="lazy"` on the `<img>`; border/radius matches the `brand-sand` card style. **`frontend/src/routes/persons/[id]/PersonDocumentList.svelte`** — replace the small PDF icon tile with the thumbnail when available (same fallback pattern). For `ConversationTimeline.svelte` and `ChronikRow.svelte`: **out of scope for this iteration**. The specs exist but each is its own layout problem — address in follow-ups once the core data plumbing is proven. ### 4. Admin system page — new card **`frontend/src/routes/admin/system/+page.svelte`** — add a fourth card after "Mass Import" mirroring its structure exactly: - Button: "Thumbnails für fehlende Dokumente erzeugen" - Endpoint: `POST /api/admin/generate-thumbnails` - Polling: `GET /api/admin/thumbnail-status` every 2000 ms while state is `RUNNING` (reuse the `fetchImportStatus` pattern) - Display: processed / skipped / failed counters ### 5. E2E sanity check Existing tests should keep passing. Add one Playwright test under `frontend/e2e/` that uploads a PDF, waits briefly, and asserts the thumbnail image loads (HTTP 200 from `/api/documents/{id}/thumbnail`). ## Critical files ### New - `backend/src/main/resources/db/migration/V39__add_document_thumbnails.sql` - `backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailService.java` - `backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailAsyncRunner.java` - `backend/src/main/java/org/raddatz/familienarchiv/service/ThumbnailBackfillService.java` - `backend/src/test/java/.../ThumbnailServiceTest.java` - `backend/src/test/java/.../ThumbnailBackfillServiceTest.java` - `frontend/src/lib/thumbnails.ts` ### Modified - `backend/src/main/java/org/raddatz/familienarchiv/model/Document.java` — two new columns - `backend/src/main/java/org/raddatz/familienarchiv/service/DocumentService.java` — inject runner, four hooks - `backend/src/main/java/org/raddatz/familienarchiv/service/MassImportService.java` — one hook in `importSingleDocument` - `backend/src/main/java/org/raddatz/familienarchiv/repository/DocumentRepository.java` — new finder - `backend/src/main/java/org/raddatz/familienarchiv/controller/AdminController.java` — two endpoints - `backend/src/main/java/org/raddatz/familienarchiv/controller/DocumentController.java` — `/thumbnail` endpoint - `backend/src/main/java/org/raddatz/familienarchiv/exception/ErrorCode.java` — new code - `frontend/src/lib/generated/api.ts` — regenerated - `frontend/src/lib/errors.ts` + `messages/{de,en,es}.json` — new code translation - `frontend/src/lib/components/DocumentRow.svelte` — thumbnail column - `frontend/src/routes/persons/[id]/PersonDocumentList.svelte` — thumbnail tile - `frontend/src/routes/admin/system/+page.svelte` — fourth card ### Reused (read-only) - `FileService` (`uploadFile`, `downloadFileBytes`, `downloadFile`) — no changes - `AsyncConfig.taskExecutor()` — same pool as OCR - `OcrAsyncRunner` — structural template - `MassImportService` — state-machine and status template ## Verification End-to-end checklist once implemented: 1. `./mvnw test` — all new and existing backend tests pass. 2. `docker-compose up -d` then `./mvnw spring-boot:run -Dspring-boot.run.profiles=dev`. 3. **Fresh upload**: `POST /api/documents` with a multipart PDF. Response returns immediately; wait ~1 s; `GET /api/documents/{id}` shows `thumbnailKey` populated; `GET /api/documents/{id}/thumbnail` returns a JPEG with `Cache-Control: immutable`. 4. **Image upload**: same with a PNG — thumbnail is a scaled PNG→JPEG. 5. **File replace** (`PUT /api/documents/{id}` with new file): `thumbnailGeneratedAt` changes, cache-busting URL updates, new thumbnail visible. 6. **Admin backfill**: pick a document, clear its `thumbnail_key` in DB manually; `POST /api/admin/generate-thumbnails`; poll `/api/admin/thumbnail-status` until `DONE`; verify the doc has a new thumbnail. 7. **Concurrent backfill is rejected**: start two backfills in a row → second returns HTTP 409 with `THUMBNAIL_BACKFILL_ALREADY_RUNNING`. 8. **Unsupported content type** (e.g. a legacy doc whose mime ended up as `application/octet-stream`): counted in `skipped`, no error, logged. 9. **Frontend**: `npm run generate:api && npm run dev`. Check home search shows thumbnails in rows; person detail shows thumbnails in the sidebar list; admin system page shows the new card and polling works. 10. Playwright e2e: `npm run test:e2e -- thumbnail` passes.
marcel added the featurefile-uploadui labels 2026-04-22 20:40:46 +02:00
Author
Owner

🏛️ Markus Keller — Senior Application Architect

Good plan overall — reuses what already exists (PDFBox, taskExecutor, MassImportService state-machine pattern) and introduces only a focused data-model change. A few architectural concerns before implementation:

Observations

  • The async dispatch happens inside a @Transactional method. DocumentService.storeDocument, createDocument, updateDocument, attachFile are all @Transactional. Calling thumbnailAsyncRunner.generateAsync(saved.getId()) from inside them means the async thread can start before the transaction commits — it will findById and either see nothing or stale data. The codebase already has the right tool for this: AuditService.logAfterCommit(...) uses TransactionSynchronizationManager.registerSynchronization with afterCommit(). That is the pattern to mirror.
  • Thread pool is tiny and fails fast. AsyncConfig.taskExecutor() is core=2, max=2, queue=10, AbortPolicy. The POST /api/documents/quick-upload endpoint is a batch upload — uploading 15 files at once overflows the queue and throws RejectedExecutionException. This plan adds thumbnail generation as a second async consumer of that same pool (today it's used by OcrAsyncRunner).
  • Backfill counters are not thread-safe. volatile BackfillStatus publishes the reference correctly but the record fields are int, and updating counters by creating a new record each iteration is a read-modify-write sequence. Single-threaded backfill is fine (the loop runs on one async thread), but make it explicit so it doesn't get "optimized" later to parallel processing.
  • In-memory state loses on restart. MassImportService has the same limitation — acceptable for a single-operator tool, but worth naming in an ADR so the next maintainer knows it's intentional.

Recommendations

  1. Register afterCommit() synchronization instead of calling the runner directly. Extract a small helper:

    TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
        @Override public void afterCommit() {
            thumbnailAsyncRunner.generateAsync(saved.getId());
        }
    });
    

    Follow the AuditService.logAfterCommit shape exactly — it already handles the "we're not in a transaction" edge case.

  2. Add a dedicated thumbnailExecutor bean. Don't share with taskExecutor. Configure e.g. core=1, max=2, queue=200, CallerRunsPolicy. Quick-upload of 15 files should not be the same contention domain as OCR streaming. CallerRunsPolicy applies back-pressure instead of dropping work. Put this next to the existing auditExecutor in AsyncConfig.java and annotate @Async("thumbnailExecutor").

  3. Backfill runs on the thumbnail executor with explicit sequential loop. One thumbnail job at a time, one doc at a time. PDFBox + downloadFileBytes is CPU + memory-heavy — concurrency here buys nothing but OOM risk.

  4. Write the ADR. "Why thumbnails are in-process with PDFBox, not delegated to ocr-service" is exactly the kind of decision that gets second-guessed in 18 months. Two paragraphs in docs/adr/: the constraint (keep backend self-sufficient, avoid cross-service latency on upload), the alternative (ocr-service with PyMuPDF — capable but adds network dependency to the critical upload path), the consequence (acceptable memory ceiling for typical family-archive PDF sizes, <50MB).

Open Decisions

  • Cache-Control on the thumbnail endpoint. public, immutable, max-age=1 year is wrong for a per-user authenticated resource — a shared proxy could serve one user's thumbnail to another. This is really Nora's call; flagging here because it's an architecture-level decision about caching semantics, not just a security fix.
## 🏛️ Markus Keller — Senior Application Architect Good plan overall — reuses what already exists (PDFBox, `taskExecutor`, `MassImportService` state-machine pattern) and introduces only a focused data-model change. A few architectural concerns before implementation: ### Observations - **The async dispatch happens inside a `@Transactional` method.** `DocumentService.storeDocument`, `createDocument`, `updateDocument`, `attachFile` are all `@Transactional`. Calling `thumbnailAsyncRunner.generateAsync(saved.getId())` from inside them means the async thread can start before the transaction commits — it will `findById` and either see nothing or stale data. The codebase already has the right tool for this: `AuditService.logAfterCommit(...)` uses `TransactionSynchronizationManager.registerSynchronization` with `afterCommit()`. That is the pattern to mirror. - **Thread pool is tiny and fails fast.** `AsyncConfig.taskExecutor()` is core=2, max=2, queue=10, `AbortPolicy`. The `POST /api/documents/quick-upload` endpoint is a batch upload — uploading 15 files at once overflows the queue and throws `RejectedExecutionException`. This plan adds thumbnail generation as a second async consumer of that same pool (today it's used by `OcrAsyncRunner`). - **Backfill counters are not thread-safe.** `volatile BackfillStatus` publishes the reference correctly but the record fields are `int`, and updating counters by creating a new record each iteration is a read-modify-write sequence. Single-threaded backfill is fine (the loop runs on one async thread), but make it explicit so it doesn't get "optimized" later to parallel processing. - **In-memory state loses on restart.** `MassImportService` has the same limitation — acceptable for a single-operator tool, but worth naming in an ADR so the next maintainer knows it's intentional. ### Recommendations 1. **Register `afterCommit()` synchronization instead of calling the runner directly.** Extract a small helper: ```java TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() { @Override public void afterCommit() { thumbnailAsyncRunner.generateAsync(saved.getId()); } }); ``` Follow the `AuditService.logAfterCommit` shape exactly — it already handles the "we're not in a transaction" edge case. 2. **Add a dedicated `thumbnailExecutor` bean.** Don't share with `taskExecutor`. Configure e.g. `core=1, max=2, queue=200, CallerRunsPolicy`. Quick-upload of 15 files should not be the same contention domain as OCR streaming. `CallerRunsPolicy` applies back-pressure instead of dropping work. Put this next to the existing `auditExecutor` in `AsyncConfig.java` and annotate `@Async("thumbnailExecutor")`. 3. **Backfill runs on the thumbnail executor with explicit sequential loop.** One thumbnail job at a time, one doc at a time. PDFBox + `downloadFileBytes` is CPU + memory-heavy — concurrency here buys nothing but OOM risk. 4. **Write the ADR.** "Why thumbnails are in-process with PDFBox, not delegated to ocr-service" is exactly the kind of decision that gets second-guessed in 18 months. Two paragraphs in `docs/adr/`: the constraint (keep backend self-sufficient, avoid cross-service latency on upload), the alternative (ocr-service with PyMuPDF — capable but adds network dependency to the critical upload path), the consequence (acceptable memory ceiling for typical family-archive PDF sizes, <50MB). ### Open Decisions - **Cache-Control on the thumbnail endpoint.** `public, immutable, max-age=1 year` is wrong for a per-user authenticated resource — a shared proxy could serve one user's thumbnail to another. This is really Nora's call; flagging here because it's an architecture-level decision about caching semantics, not just a security fix.
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Plan reads cleanly and the TDD sequence in §8 is the right shape. A few implementation concerns that a red test will force you to confront early:

Observations

  • fileService.downloadFileBytes() loads the full PDF into a byte[]. For a 50MB PDF (current max-file-size), that is 50MB per concurrent thumbnail task. Two threads × 50MB × an intermediate BufferedImage copy = real peaks. The method is annotated "callers are responsible for not calling this on large files unnecessarily" — that caveat applies here.
  • Dispatching async from inside @Transactional is the classic race (see Markus's note). Use the afterCommit pattern, mirrored from AuditService.logAfterCommit. I will fail-fast on this in review.
  • thumbnails/{docId}.jpg is a deterministic key — good. But PutObject without any content-disposition means a stray GET response on the thumbnail endpoint could default to application/octet-stream if the frontend ever probes it outside our controller. Set the content type on put and let downloadFile() echo it back.
  • The ?v=${thumbnailGeneratedAt} cache-buster serializes a LocalDateTime as 2026-04-22T20:41:15.123456 with colons and dots. encodeURIComponent will escape the colons to %3A. That works, but the key stays more readable if you use .getTime() on an Instant / epoch millis on the backend DTO side. Or simpler: expose thumbnailVersion: number (epoch millis) and leave thumbnailGeneratedAt as the human-readable field.
  • Quick-upload batch path is covered because DocumentService.storeDocument is one of the four sites. The plan already lists it — just wanted to confirm that POST /api/documents/quick-upload flows through there.

Recommendations

  1. Use TransactionSynchronizationManager — code sketch, extract into a helper:

    void dispatchThumbnailAfterCommit(UUID documentId) {
        if (!TransactionSynchronizationManager.isSynchronizationActive()) {
            thumbnailAsyncRunner.generateAsync(documentId);
            return;
        }
        TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
            @Override public void afterCommit() { thumbnailAsyncRunner.generateAsync(documentId); }
        });
    }
    

    Four callsites in DocumentService + one in MassImportService. Unit test: verify the runner is NOT called when the surrounding transaction rolls back (e.g. by throwing after save).

  2. Stream the PDF into PDFBox instead of buffering all bytes. Loader.loadPDF(RandomAccessReadBuffer) accepts an InputStream. Add a FileService.downloadFileStream(s3Key) that returns the S3 ResponseInputStream and use try-with-resources. PDFBox only reads the pages it needs; you don't need the whole file in heap. For images, ImageIO.read(InputStream) works too.

  3. Scale with Thumbnails.of(src).width(400).outputFormat("jpg") — or plain AffineTransformOp with TYPE_BILINEAR. Do not use Image.getScaledInstance. It uses deprecated multi-pass AWT scaling that's slower and produces worse output than a single drawImage onto a new BufferedImage. Standard Java, zero dependency:

    BufferedImage scaled = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
    Graphics2D g = scaled.createGraphics();
    g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR);
    g.drawImage(source, 0, 0, width, height, null);
    g.dispose();
    

    TIFF decoding needs twelvemonkeys-imageio-tiff — JDK's ImageIO only handles JPEG/PNG/BMP/GIF by default. Plan calls out TIFF support; add the dependency.

  4. Width 400 is twice what's needed. Display is 60×84 CSS pixels → 120×168 at 2× DPR. 200–240px wide is plenty and halves the bytes. Agree with Leonie on the final number.

  5. Red test first, always. Four tests that must be red before any code:

    • PDFBox rendering produces a BufferedImage (write against a 1-page fixture PDF committed to src/test/resources/fixtures/)
    • S3 putObject is called with key thumbnails/{docId}.jpg and content type image/jpeg
    • TransactionSynchronizationManager afterCommit path fires on successful commit, not on rollback
    • DocumentService.storeDocument dispatches exactly once per call (verify with Mockito verify(thumbnailAsyncRunner, times(1)).generateAsync(...))
  6. Frontend thumbnails.ts helper: keep it a pure function, no $derived — it's imported into multiple components (DocumentRow, PersonDocumentList). Colocate the test next to it.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer Plan reads cleanly and the TDD sequence in §8 is the right shape. A few implementation concerns that a red test will force you to confront early: ### Observations - **`fileService.downloadFileBytes()` loads the full PDF into a `byte[]`.** For a 50MB PDF (current `max-file-size`), that is 50MB per concurrent thumbnail task. Two threads × 50MB × an intermediate `BufferedImage` copy = real peaks. The method is annotated "callers are responsible for not calling this on large files unnecessarily" — that caveat applies here. - **Dispatching async from inside `@Transactional` is the classic race** (see Markus's note). Use the `afterCommit` pattern, mirrored from `AuditService.logAfterCommit`. I will fail-fast on this in review. - **`thumbnails/{docId}.jpg` is a deterministic key — good.** But `PutObject` without any content-disposition means a stray `GET` response on the thumbnail endpoint could default to `application/octet-stream` if the frontend ever probes it outside our controller. Set the content type on put and let `downloadFile()` echo it back. - **The `?v=${thumbnailGeneratedAt}` cache-buster serializes a `LocalDateTime` as `2026-04-22T20:41:15.123456` with colons and dots.** `encodeURIComponent` will escape the colons to `%3A`. That works, but the key stays more readable if you use `.getTime()` on an `Instant` / epoch millis on the backend DTO side. Or simpler: expose `thumbnailVersion: number` (epoch millis) and leave `thumbnailGeneratedAt` as the human-readable field. - **Quick-upload batch path is covered because `DocumentService.storeDocument` is one of the four sites.** The plan already lists it — just wanted to confirm that `POST /api/documents/quick-upload` flows through there. ### Recommendations 1. **Use `TransactionSynchronizationManager` — code sketch, extract into a helper:** ```java void dispatchThumbnailAfterCommit(UUID documentId) { if (!TransactionSynchronizationManager.isSynchronizationActive()) { thumbnailAsyncRunner.generateAsync(documentId); return; } TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() { @Override public void afterCommit() { thumbnailAsyncRunner.generateAsync(documentId); } }); } ``` Four callsites in `DocumentService` + one in `MassImportService`. Unit test: verify the runner is NOT called when the surrounding transaction rolls back (e.g. by throwing after save). 2. **Stream the PDF into PDFBox instead of buffering all bytes.** `Loader.loadPDF(RandomAccessReadBuffer)` accepts an `InputStream`. Add a `FileService.downloadFileStream(s3Key)` that returns the S3 `ResponseInputStream` and use try-with-resources. PDFBox only reads the pages it needs; you don't need the whole file in heap. For images, `ImageIO.read(InputStream)` works too. 3. **Scale with `Thumbnails.of(src).width(400).outputFormat("jpg")`** — or plain `AffineTransformOp` with `TYPE_BILINEAR`. Do **not** use `Image.getScaledInstance`. It uses deprecated multi-pass AWT scaling that's slower and produces worse output than a single `drawImage` onto a new `BufferedImage`. Standard Java, zero dependency: ```java BufferedImage scaled = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB); Graphics2D g = scaled.createGraphics(); g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR); g.drawImage(source, 0, 0, width, height, null); g.dispose(); ``` TIFF decoding needs `twelvemonkeys-imageio-tiff` — JDK's ImageIO only handles JPEG/PNG/BMP/GIF by default. Plan calls out TIFF support; add the dependency. 4. **Width 400 is twice what's needed.** Display is 60×84 CSS pixels → 120×168 at 2× DPR. 200–240px wide is plenty and halves the bytes. Agree with Leonie on the final number. 5. **Red test first, always.** Four tests that must be red before any code: - PDFBox rendering produces a BufferedImage (write against a 1-page fixture PDF committed to `src/test/resources/fixtures/`) - S3 `putObject` is called with key `thumbnails/{docId}.jpg` and content type `image/jpeg` - `TransactionSynchronizationManager` afterCommit path fires on successful commit, not on rollback - `DocumentService.storeDocument` dispatches exactly once per call (verify with Mockito `verify(thumbnailAsyncRunner, times(1)).generateAsync(...)`) 6. **Frontend `thumbnails.ts` helper: keep it a pure function, no `$derived`** — it's imported into multiple components (DocumentRow, PersonDocumentList). Colocate the test next to it.
Author
Owner

🔒 Nora "NullX" Steiner — Application Security Engineer

Solid plan with a security review needed on three specific items before merge. None are showstoppers; all have concrete fixes.

Observations

  1. Cache-Control: public, max-age=31536000, immutable on an authenticated endpoint is a CWE-525 cache leak. public instructs shared caches (corporate reverse proxies, ISP caches, CDN edge) to cache and serve the response to any requester. If anything between the user and your Caddy ever introduces caching with authentication stripped, User A's thumbnail (which may contain the first page of a private letter with PII) gets served to User B. For a household archive on a home LAN this is low exposure today, but the fix is one word:

    .header(HttpHeaders.CACHE_CONTROL, "private, max-age=31536000, immutable")
    

    Every authenticated resource in this codebase should use private. The existing /api/documents/{id}/file endpoint doesn't set Cache-Control at all (which is also imperfect — defaults to no caching, so no leak). Setting it to private, immutable on thumbnail is strictly better.

  2. GET /api/documents/{id}/thumbnail has no explicit @RequirePermission. I see the existing GET /api/documents/{id}/file also has no annotation — it falls back to Spring Security's .anyRequest().authenticated(). That is consistent but still means: any authenticated user (including a low-privilege READ_ALL role) can retrieve any document's thumbnail by UUID. For the current household scope this is acceptable. If you ever introduce per-document ACLs, both endpoints break. Flagging for future-proofing, not as a blocker — pair with the existing /file endpoint's behavior.

  3. PDFBox is a parser attack surface. Malicious PDFs (crafted by an uploader) can trigger: unbounded memory on embedded images, infinite loops on corrupt xrefs, historical CVEs. For a family archive where uploaders are trusted household members this is near-zero real risk. But the plan doesn't call out any resource limits. Low-cost mitigation:

    • Wrap thumbnailService.generate(doc) in a timeout (e.g. Executors.newSingleThreadExecutor().submit(...).get(30, TimeUnit.SECONDS))
    • Log + skip on timeout; the backfill counts it as "failed"
    • Do not propagate the exception into the upload path (plan already says "fire-and-forget, never throw" — good)
  4. The deterministic key thumbnails/{docId}.jpg is fine. UUIDs are unguessable; no risk of pre-populating an attacker-controlled key. No SSRF concerns — rendering happens against bytes we already own.

Recommendations

  1. Change publicprivate in the Cache-Control header. One-char fix, zero downside. Write the regression test:

    @Test
    void thumbnail_response_uses_private_cache_control() {
        mockMvc.perform(get("/api/documents/{id}/thumbnail", docId).with(user("viewer")))
            .andExpect(header().string(HttpHeaders.CACHE_CONTROL, containsString("private")))
            .andExpect(header().string(HttpHeaders.CACHE_CONTROL, not(containsString("public"))));
    }
    

    That test stays forever — it catches future regressions.

  2. Add a 30-second timeout around thumbnailService.generate() in ThumbnailAsyncRunner. Protection against a crafted PDF that makes PDFBox hang. Counts as "failed" in backfill status.

  3. Bound PDFBox memory via IOUtils.setByteArrayMaxOverride is a no-op in PDFBox 3 (API changed). Use Loader.loadPDF(RandomAccessReadBuffer) with a size-limited input stream wrapper. Cap at e.g. 100MB — already bounded by max-file-size: 50MB, so this is defense-in-depth.

  4. Permission test regardless of whether we add @RequirePermission:

    @Test void thumbnail_returns401_when_unauthenticated() { ... }
    @Test void thumbnail_returns404_when_doc_has_no_thumbnail() { ... }
    

    Both stay in the regression suite permanently.

## 🔒 Nora "NullX" Steiner — Application Security Engineer Solid plan with a security review needed on three specific items before merge. None are showstoppers; all have concrete fixes. ### Observations 1. **`Cache-Control: public, max-age=31536000, immutable` on an authenticated endpoint is a CWE-525 cache leak.** `public` instructs shared caches (corporate reverse proxies, ISP caches, CDN edge) to cache and serve the response to any requester. If anything between the user and your Caddy ever introduces caching with authentication stripped, User A's thumbnail (which may contain the first page of a private letter with PII) gets served to User B. For a household archive on a home LAN this is low exposure today, but the fix is one word: ```java .header(HttpHeaders.CACHE_CONTROL, "private, max-age=31536000, immutable") ``` Every authenticated resource in this codebase should use `private`. The existing `/api/documents/{id}/file` endpoint doesn't set Cache-Control at all (which is also imperfect — defaults to no caching, so no leak). Setting it to `private, immutable` on thumbnail is strictly better. 2. **`GET /api/documents/{id}/thumbnail` has no explicit `@RequirePermission`.** I see the existing `GET /api/documents/{id}/file` also has no annotation — it falls back to Spring Security's `.anyRequest().authenticated()`. That is consistent but still means: any authenticated user (including a low-privilege `READ_ALL` role) can retrieve any document's thumbnail by UUID. For the current household scope this is acceptable. If you ever introduce per-document ACLs, both endpoints break. Flagging for future-proofing, not as a blocker — pair with the existing `/file` endpoint's behavior. 3. **PDFBox is a parser attack surface.** Malicious PDFs (crafted by an uploader) can trigger: unbounded memory on embedded images, infinite loops on corrupt xrefs, historical CVEs. For a family archive where uploaders are trusted household members this is near-zero real risk. But the plan doesn't call out any resource limits. Low-cost mitigation: - Wrap `thumbnailService.generate(doc)` in a timeout (e.g. `Executors.newSingleThreadExecutor().submit(...).get(30, TimeUnit.SECONDS)`) - Log + skip on timeout; the backfill counts it as "failed" - Do not propagate the exception into the upload path (plan already says "fire-and-forget, never throw" — good) 4. **The deterministic key `thumbnails/{docId}.jpg` is fine.** UUIDs are unguessable; no risk of pre-populating an attacker-controlled key. No SSRF concerns — rendering happens against bytes we already own. ### Recommendations 1. **Change `public` → `private` in the Cache-Control header.** One-char fix, zero downside. Write the regression test: ```java @Test void thumbnail_response_uses_private_cache_control() { mockMvc.perform(get("/api/documents/{id}/thumbnail", docId).with(user("viewer"))) .andExpect(header().string(HttpHeaders.CACHE_CONTROL, containsString("private"))) .andExpect(header().string(HttpHeaders.CACHE_CONTROL, not(containsString("public")))); } ``` That test stays forever — it catches future regressions. 2. **Add a 30-second timeout around `thumbnailService.generate()` in `ThumbnailAsyncRunner`.** Protection against a crafted PDF that makes PDFBox hang. Counts as "failed" in backfill status. 3. **Bound PDFBox memory via `IOUtils.setByteArrayMaxOverride`** is a no-op in PDFBox 3 (API changed). Use `Loader.loadPDF(RandomAccessReadBuffer)` with a size-limited input stream wrapper. Cap at e.g. 100MB — already bounded by `max-file-size: 50MB`, so this is defense-in-depth. 4. **Permission test regardless of whether we add `@RequirePermission`:** ```java @Test void thumbnail_returns401_when_unauthenticated() { ... } @Test void thumbnail_returns404_when_doc_has_no_thumbnail() { ... } ``` Both stay in the regression suite permanently.
Author
Owner

🧪 Sara Holt — QA & Test Strategist

Test plan in §8 is in the right direction. Coverage gaps I want addressed before merge:

Observations

  • No test for the transaction-commit-ordering race. This is the single highest-risk bug in this plan (see Markus + Felix). Without a test that proves afterCommit semantics, it regresses the moment someone "simplifies" the dispatcher.
  • No test for file replacement regenerating the thumbnail. Verification §5 describes the scenario but there's no test. This is behavior a future refactor can silently break.
  • No test for the thumbnail endpoint cache headers. Security decisions that aren't codified get undone.
  • Backfill test only covers happy path + unsupported content type. Missing: S3 failure (transient network error on one doc shouldn't abort the whole job), PDFBox failure on a corrupt PDF (counts as failed, loop continues), concurrent start rejection (covered by the plan — good).
  • Fixture files. A 1-page PDF and a PNG in src/test/resources/fixtures/. The PDF must actually be renderable by PDFBox — generate it with PDFBox itself in a one-shot script and commit the output.
  • No integration test against real Postgres. Flyway migration V39 needs Testcontainers coverage — H2 accepts most DDL but won't catch column-type issues.
  • No Playwright test for the admin card. Lots of async state transitions (IDLE → RUNNING → DONE / FAILED) and polling logic. One happy-path E2E keeps the state machine honest.

Recommendations

  1. Add ThumbnailDispatchTransactionTest with these exact cases:

    @Test
    void thumbnail_dispatch_does_not_fire_when_transaction_rolls_back() {
        // Use @Transactional + assertThatThrownBy to force a rollback after save,
        // then verify(thumbnailAsyncRunner, never()).generateAsync(any())
    }
    
    @Test
    void thumbnail_dispatch_fires_after_transaction_commits() {
        // Use TransactionTemplate to run the save, assert the runner is called
        // exactly once after commit
    }
    

    These two tests defend the dispatcher contract.

  2. Add DocumentService_fileReplacement_regeneratesThumbnail test. Put a fixture doc with thumbnailKey set, call updateDocument with a new file, verify thumbnailAsyncRunner.generateAsync(docId) was called.

  3. Testcontainers for the migration and repository tests. The new findByFilePathIsNotNullAndThumbnailKeyIsNull() finder needs a real Postgres to validate the JPA derivation.

    @DataJpaTest
    @Testcontainers
    @AutoConfigureTestDatabase(replace = Replace.NONE)
    class DocumentRepositoryThumbnailTest {
        @Container static PostgreSQLContainer<?> pg = new PostgreSQLContainer<>("postgres:16-alpine");
        @DynamicPropertySource static void props(DynamicPropertyRegistry r) {
            r.add("spring.datasource.url", pg::getJdbcUrl);
            r.add("spring.datasource.username", pg::getUsername);
            r.add("spring.datasource.password", pg::getPassword);
        }
        // Test: seed 5 docs (3 with thumbnails, 2 without), assert finder returns the 2
    }
    
  4. Backfill failure scenarios:

    • S3 transient failure on one doc — use @Spy on ThumbnailService, throw StorageFileNotFoundException on the 2nd doc; assert failed counter = 1, processed = remaining docs, loop completed.
    • Unsupported content type — doc with content_type = application/mswordskipped counter increments, no exception.
    • Concurrent start — call runBackfillAsync() twice, expect the second to throw DomainException.conflict(THUMBNAIL_BACKFILL_ALREADY_RUNNING).
    • Poll during run — call getStatus() mid-loop (via CountDownLatch), assert state == RUNNING and processed < total.
  5. One Playwright E2E on the admin card — happy path only. Click button, assert status transitions to RUNNING then DONE within 10 seconds (await expect(page.getByText(/fertig/i)).toBeVisible({ timeout: 10_000 })). Use AxeBuilder on the admin page — new card content must not introduce a11y violations.

  6. Coverage gate — the full test plan must keep branch coverage ≥ 80%. PDFBox rendering branches (PDF vs. image vs. unsupported) need explicit tests; these are likely if/switch branches that coverage will track.

Open Decisions

  • Integration test for the thumbnail endpoint — should it hit a real MinIO via docker-compose.ci.yml, or mock at the FileService boundary? Real MinIO is more faithful; mocking is faster. I lean real-MinIO in one test (the endpoint serves actual bytes), mocked elsewhere. Tobias should weigh in on CI runtime cost.
## 🧪 Sara Holt — QA & Test Strategist Test plan in §8 is in the right direction. Coverage gaps I want addressed before merge: ### Observations - **No test for the transaction-commit-ordering race.** This is the single highest-risk bug in this plan (see Markus + Felix). Without a test that proves `afterCommit` semantics, it regresses the moment someone "simplifies" the dispatcher. - **No test for file replacement regenerating the thumbnail.** Verification §5 describes the scenario but there's no test. This is behavior a future refactor can silently break. - **No test for the thumbnail endpoint cache headers.** Security decisions that aren't codified get undone. - **Backfill test only covers happy path + unsupported content type.** Missing: S3 failure (transient network error on one doc shouldn't abort the whole job), PDFBox failure on a corrupt PDF (counts as `failed`, loop continues), concurrent start rejection (covered by the plan — good). - **Fixture files.** A 1-page PDF and a PNG in `src/test/resources/fixtures/`. The PDF must actually be renderable by PDFBox — generate it with PDFBox itself in a one-shot script and commit the output. - **No integration test against real Postgres.** Flyway migration V39 needs Testcontainers coverage — H2 accepts most DDL but won't catch column-type issues. - **No Playwright test for the admin card.** Lots of async state transitions (IDLE → RUNNING → DONE / FAILED) and polling logic. One happy-path E2E keeps the state machine honest. ### Recommendations 1. **Add `ThumbnailDispatchTransactionTest` with these exact cases:** ```java @Test void thumbnail_dispatch_does_not_fire_when_transaction_rolls_back() { // Use @Transactional + assertThatThrownBy to force a rollback after save, // then verify(thumbnailAsyncRunner, never()).generateAsync(any()) } @Test void thumbnail_dispatch_fires_after_transaction_commits() { // Use TransactionTemplate to run the save, assert the runner is called // exactly once after commit } ``` These two tests defend the dispatcher contract. 2. **Add `DocumentService_fileReplacement_regeneratesThumbnail` test.** Put a fixture doc with thumbnailKey set, call `updateDocument` with a new file, verify `thumbnailAsyncRunner.generateAsync(docId)` was called. 3. **Testcontainers for the migration and repository tests.** The new `findByFilePathIsNotNullAndThumbnailKeyIsNull()` finder needs a real Postgres to validate the JPA derivation. ```java @DataJpaTest @Testcontainers @AutoConfigureTestDatabase(replace = Replace.NONE) class DocumentRepositoryThumbnailTest { @Container static PostgreSQLContainer<?> pg = new PostgreSQLContainer<>("postgres:16-alpine"); @DynamicPropertySource static void props(DynamicPropertyRegistry r) { r.add("spring.datasource.url", pg::getJdbcUrl); r.add("spring.datasource.username", pg::getUsername); r.add("spring.datasource.password", pg::getPassword); } // Test: seed 5 docs (3 with thumbnails, 2 without), assert finder returns the 2 } ``` 4. **Backfill failure scenarios:** - **S3 transient failure on one doc** — use `@Spy` on `ThumbnailService`, throw `StorageFileNotFoundException` on the 2nd doc; assert `failed` counter = 1, `processed` = remaining docs, loop completed. - **Unsupported content type** — doc with `content_type = application/msword` → `skipped` counter increments, no exception. - **Concurrent start** — call `runBackfillAsync()` twice, expect the second to throw `DomainException.conflict(THUMBNAIL_BACKFILL_ALREADY_RUNNING)`. - **Poll during run** — call `getStatus()` mid-loop (via `CountDownLatch`), assert `state == RUNNING` and `processed < total`. 5. **One Playwright E2E on the admin card** — happy path only. Click button, assert status transitions to RUNNING then DONE within 10 seconds (`await expect(page.getByText(/fertig/i)).toBeVisible({ timeout: 10_000 })`). Use `AxeBuilder` on the admin page — new card content must not introduce a11y violations. 6. **Coverage gate — the full test plan must keep branch coverage ≥ 80%.** PDFBox rendering branches (PDF vs. image vs. unsupported) need explicit tests; these are likely `if`/`switch` branches that coverage will track. ### Open Decisions - **Integration test for the thumbnail endpoint** — should it hit a real MinIO via `docker-compose.ci.yml`, or mock at the `FileService` boundary? Real MinIO is more faithful; mocking is faster. I lean real-MinIO in one test (the endpoint serves actual bytes), mocked elsewhere. Tobias should weigh in on CI runtime cost.
Author
Owner

⚙️ Tobias Wendt — DevOps & Platform Engineer

Plan is reasonable for our single-VPS setup. Four operational concerns.

Observations

  • taskExecutor is core=2 / max=2 / queue=10 / AbortPolicy. Adding thumbnail generation as a second async consumer (alongside OcrAsyncRunner) on the same pool means quick-upload of 12+ files will throw RejectedExecutionException. The upload HTTP response succeeds, thumbnails silently don't generate, backfill eventually picks them up — but in production this surfaces as log spam and missing thumbnails users complain about. Not a blocker, but it shifts work onto the backfill path.
  • Memory ceiling is the real capacity constraint. On a CX32 (8GB RAM), Spring Boot at rest uses ~600MB. A 50MB PDF in byte[] + PDFBox PDDocument (~2–3× the file size during parse) + a BufferedImage at page resolution (~50MB for an A4 at 300dpi) is easily 300MB peak per concurrent thumbnail task. Two concurrent tasks on the shared pool = 600MB on top of baseline + OCR. Tight, not impossible.
  • No observability. Thumbnail failures today surface nowhere: no audit entry, no metric, no log beyond a log.warn. When a user says "no thumbnail appeared", the operator has to grep application logs by document UUID. Worth a one-line Metrics.counter("thumbnails.generated", "result", ...).increment() if we have Micrometer wired, or a structured log tag at minimum.
  • docker-compose.yml is unchanged by this plan. Confirmed — no new services, no new bucket, no new env vars. That's the right outcome. Production migration to Hetzner Object Storage (separate effort) works identically: the thumbnails/ prefix is a plain S3 key.

Recommendations

  1. Dedicated thumbnailExecutor bean with CallerRunsPolicy. Put it next to auditExecutor in AsyncConfig.java:

    @Bean("thumbnailExecutor")
    public Executor thumbnailExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(1);
        executor.setMaxPoolSize(2);
        executor.setQueueCapacity(200);
        executor.setThreadNamePrefix("Thumbnail-");
        // CallerRunsPolicy: if queue is full, the upload thread generates the thumbnail
        // synchronously. Slower upload, but no dropped work and no RejectedExecutionException.
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        return executor;
    }
    

    Annotate @Async("thumbnailExecutor"). Same pattern as @Async("auditExecutor"). Isolates PDF rendering from OCR.

  2. Stream PDF bytes instead of fully-buffering. FileService.downloadFileBytes is ~50MB for a full PDF; switch to downloadFileStream(s3Key) returning the ResponseInputStream. PDFBox can parse from InputStream, PDFRenderer renders one page — you never need all pages in memory. Halves the memory ceiling.

  3. Log failures structured, not just log.warn. For the backfill especially, an operator running it against 5000 docs wants a single summary line:

    log.info("Thumbnail backfill complete: total={} processed={} skipped={} failed={} durationMs={}",
        total, processed, skipped, failed, Duration.between(startedAt, now).toMillis());
    

    And for each failure: log.warn("Thumbnail generation failed for doc {}: {}", docId, e.getMessage()). The message goes to Loki once we set that up.

  4. CI test cost. The Playwright test Sara's requesting + the Testcontainers integration tests add ~20–40s to the pipeline. Acceptable — we're at ~4 min total today. Real MinIO in the integration test (vs. mock) is worth the extra 5–10s because PutObjectRequest signing differs subtly between mock and real S3 and has bitten us before.

Open Decisions

  • Thumbnail width: 200 or 400? Bandwidth matters for mobile (the family uses this from phones). At 400px every row is ~15–25KB; at 200px it's ~5–8KB. For a 30-row document list that's ~750KB vs ~240KB. Leonie should pick the final pixel size; I just want the smaller number unless she has a retina reason. (Also: JPEG quality 85 is a good default — don't go above 90, the size-to-quality cliff is there.)
## ⚙️ Tobias Wendt — DevOps & Platform Engineer Plan is reasonable for our single-VPS setup. Four operational concerns. ### Observations - **`taskExecutor` is core=2 / max=2 / queue=10 / AbortPolicy.** Adding thumbnail generation as a second async consumer (alongside `OcrAsyncRunner`) on the same pool means quick-upload of 12+ files will throw `RejectedExecutionException`. The upload HTTP response succeeds, thumbnails silently don't generate, backfill eventually picks them up — but in production this surfaces as log spam and missing thumbnails users complain about. Not a blocker, but it shifts work onto the backfill path. - **Memory ceiling is the real capacity constraint.** On a CX32 (8GB RAM), Spring Boot at rest uses ~600MB. A 50MB PDF in `byte[]` + PDFBox PDDocument (~2–3× the file size during parse) + a `BufferedImage` at page resolution (~50MB for an A4 at 300dpi) is easily 300MB peak per concurrent thumbnail task. Two concurrent tasks on the shared pool = 600MB on top of baseline + OCR. Tight, not impossible. - **No observability.** Thumbnail failures today surface nowhere: no audit entry, no metric, no log beyond a `log.warn`. When a user says "no thumbnail appeared", the operator has to grep application logs by document UUID. Worth a one-line `Metrics.counter("thumbnails.generated", "result", ...).increment()` if we have Micrometer wired, or a structured log tag at minimum. - **`docker-compose.yml` is unchanged by this plan.** Confirmed — no new services, no new bucket, no new env vars. That's the right outcome. Production migration to Hetzner Object Storage (separate effort) works identically: the `thumbnails/` prefix is a plain S3 key. ### Recommendations 1. **Dedicated `thumbnailExecutor` bean with `CallerRunsPolicy`.** Put it next to `auditExecutor` in `AsyncConfig.java`: ```java @Bean("thumbnailExecutor") public Executor thumbnailExecutor() { ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor(); executor.setCorePoolSize(1); executor.setMaxPoolSize(2); executor.setQueueCapacity(200); executor.setThreadNamePrefix("Thumbnail-"); // CallerRunsPolicy: if queue is full, the upload thread generates the thumbnail // synchronously. Slower upload, but no dropped work and no RejectedExecutionException. executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy()); return executor; } ``` Annotate `@Async("thumbnailExecutor")`. Same pattern as `@Async("auditExecutor")`. Isolates PDF rendering from OCR. 2. **Stream PDF bytes instead of fully-buffering.** `FileService.downloadFileBytes` is ~50MB for a full PDF; switch to `downloadFileStream(s3Key)` returning the `ResponseInputStream`. PDFBox can parse from `InputStream`, PDFRenderer renders one page — you never need all pages in memory. Halves the memory ceiling. 3. **Log failures structured, not just `log.warn`.** For the backfill especially, an operator running it against 5000 docs wants a single summary line: ```java log.info("Thumbnail backfill complete: total={} processed={} skipped={} failed={} durationMs={}", total, processed, skipped, failed, Duration.between(startedAt, now).toMillis()); ``` And for each failure: `log.warn("Thumbnail generation failed for doc {}: {}", docId, e.getMessage())`. The message goes to Loki once we set that up. 4. **CI test cost.** The Playwright test Sara's requesting + the Testcontainers integration tests add ~20–40s to the pipeline. Acceptable — we're at ~4 min total today. Real MinIO in the integration test (vs. mock) is worth the extra 5–10s because `PutObjectRequest` signing differs subtly between mock and real S3 and has bitten us before. ### Open Decisions - **Thumbnail width: 200 or 400?** Bandwidth matters for mobile (the family uses this from phones). At 400px every row is ~15–25KB; at 200px it's ~5–8KB. For a 30-row document list that's ~750KB vs ~240KB. Leonie should pick the final pixel size; I just want the smaller number unless she has a retina reason. (Also: JPEG quality 85 is a good default — don't go above 90, the size-to-quality cliff is there.)
Author
Owner

🎨 Leonie Voss — UX & Accessibility Lead

The spec at docs/specs/briefwechsel-thumbnail-rows-spec.html is the source of truth for the visual — this plan describes the plumbing, not the final presentation. A few things the plan should pin down before implementation.

Observations

  • Width 400px is more than the UI needs. The DocumentRow has a left-of-title slot at 60×84 CSS pixels; mobile is even smaller. At 2× device pixel ratio that's 120×168 actual — 400px is 3.3× oversampled. Bandwidth on a phone in the garden at 2 bars of LTE matters (half our audience is 60+ and often on patchy connections). 200–240px is plenty and cuts thumbnail weight in half. Quality 85 JPEG at 200px = typical ~6–8KB per thumbnail.
  • Variable aspect ratio on a grid. PDFs are portrait (A4 ~1:1.414). Images can be anything — landscape scans, square photos. Rendering "400px wide, height proportional" means row heights jump when an image thumbnail is mixed with PDF thumbnails. The spec needs to define a fixed tile size (e.g. 60×84 with object-fit: cover and centered focal point) so rows stay regular.
  • The plan mentions $lib/components/DocumentRow.svelte but not PersonDocumentList's current icon tile. The current PDF icon there has brand-navy background. Thumbnails will look very different next to it — inconsistent. Apply the same tile treatment everywhere thumbnails appear: white background, 1px border-line, 2px radius, fixed dimensions.
  • Fallback state when thumbnailKey is null. Three cases produce this: (a) pre-backfill existing docs, (b) unsupported content type (.doc, .eml), (c) transient generation failure. All three should render the current PDF icon as graceful degradation — users should never see a broken <img> or an empty rectangle.
  • Dark mode was not in the plan. Our admin page uses bg-surface / text-ink semantic tokens; the existing light-on-navy thumbnail placeholder works in both modes. In dark mode, JPEG thumbnails will show a bright white paper background against a dark page — distracting. This is a spec-level decision (borderize them? slight opacity reduction?) — docs/specs/briefwechsel-thumbnail-rows-spec.html should already cover it.
  • Admin card copy. Plan says button text: "Thumbnails für fehlende Dokumente erzeugen". This is fine semantically but bypasses Paraglide. Every other button in /admin/system goes through m.admin_system_...() keys. New keys needed in messages/{de,en,es}.json.
  • i18n missing. The "out of scope for this iteration" note drops ConversationTimeline + ChronikRow — I endorse that scoping, but please add a follow-up issue now so it doesn't get forgotten. The spec already covers the briefwechsel treatment.

Recommendations

  1. Generate at 240×auto, not 400×auto. Quality 85. Write as image/jpeg. Keep PNGs/TIFFs as JPEG output too — a 240×auto JPEG of a scanned letter looks identical to PNG at 2–3× the size.

  2. Fixed tile dimensions in the UI — not "height proportional":

    <div class="relative h-[84px] w-[60px] flex-shrink-0 overflow-hidden rounded-sm border border-line bg-white">
      {#if thumbUrl}
        <img src={thumbUrl} alt="" class="h-full w-full object-cover object-top" loading="lazy" decoding="async" />
      {:else}
        <!-- existing PDF icon centered -->
      {/if}
    </div>
    

    object-cover with object-top shows the top of the page (where letter salutations and titles live) in a consistent 60×84 frame. No layout shift between PDFs and images.

  3. Use alt="" on thumbnail <img>. The thumbnail is decorative; the document title next to it is the accessible name. Putting alt="Dokument-Vorschau" on every one adds announcement noise for screen readers. Empty alt + decorative is the right WCAG call here.

  4. loading="lazy" decoding="async" on every thumbnail. Avoids blocking the main thread on first paint.

  5. Focus-visible ring on the enclosing link. The existing DocumentRow <a> gets focus, not the img. Confirm it has a visible focus indicator in both light and dark mode — the row's existing <a> wrapper is the focus target. Run axe-core.

  6. Paraglide keys — add these to messages/{de,en,es}.json:

    • admin_system_thumbnails_heading → "Thumbnails erzeugen" / "Generate thumbnails" / "Generar miniaturas"
    • admin_system_thumbnails_description
    • admin_system_thumbnails_btn_start, _btn_retry, _status_running, _status_done, _status_failed
      Mirror the existing admin_system_import_* keys exactly.

Open Decisions

  • object-top vs object-center. Top gives you the salutation / masthead / title. Center is better for photos and landscape images. I lean object-top for letter-heavy archive, but I want to eyeball a few real examples from /import before locking it in.
  • Dark mode presentation of bright scans. The spec file may already answer this — I'll re-read before implementation. Worst case: add a mix-blend-mode: multiply or a 92% opacity on the img in dark mode.
## 🎨 Leonie Voss — UX & Accessibility Lead The spec at `docs/specs/briefwechsel-thumbnail-rows-spec.html` is the source of truth for the visual — this plan describes the plumbing, not the final presentation. A few things the plan should pin down before implementation. ### Observations - **Width 400px is more than the UI needs.** The DocumentRow has a left-of-title slot at 60×84 CSS pixels; mobile is even smaller. At 2× device pixel ratio that's 120×168 actual — 400px is 3.3× oversampled. Bandwidth on a phone in the garden at 2 bars of LTE matters (half our audience is 60+ and often on patchy connections). 200–240px is plenty and cuts thumbnail weight in half. Quality 85 JPEG at 200px = typical ~6–8KB per thumbnail. - **Variable aspect ratio on a grid.** PDFs are portrait (A4 ~1:1.414). Images can be anything — landscape scans, square photos. Rendering "400px wide, height proportional" means row heights jump when an image thumbnail is mixed with PDF thumbnails. The spec needs to define a **fixed tile size** (e.g. 60×84 with `object-fit: cover` and centered focal point) so rows stay regular. - **The plan mentions `$lib/components/DocumentRow.svelte` but not `PersonDocumentList`'s current icon tile.** The current PDF icon there has brand-navy background. Thumbnails will look very different next to it — inconsistent. Apply the same tile treatment everywhere thumbnails appear: white background, 1px `border-line`, 2px radius, fixed dimensions. - **Fallback state when `thumbnailKey` is null.** Three cases produce this: (a) pre-backfill existing docs, (b) unsupported content type (`.doc`, `.eml`), (c) transient generation failure. All three should render the current PDF icon as graceful degradation — users should never see a broken `<img>` or an empty rectangle. - **Dark mode was not in the plan.** Our admin page uses `bg-surface` / `text-ink` semantic tokens; the existing light-on-navy thumbnail placeholder works in both modes. In dark mode, JPEG thumbnails will show a bright white paper background against a dark page — distracting. This is a spec-level decision (borderize them? slight opacity reduction?) — `docs/specs/briefwechsel-thumbnail-rows-spec.html` should already cover it. - **Admin card copy.** Plan says button text: "Thumbnails für fehlende Dokumente erzeugen". This is fine semantically but bypasses Paraglide. Every other button in `/admin/system` goes through `m.admin_system_...()` keys. New keys needed in `messages/{de,en,es}.json`. - **i18n missing.** The "out of scope for this iteration" note drops ConversationTimeline + ChronikRow — I endorse that scoping, but please add a follow-up issue now so it doesn't get forgotten. The spec already covers the briefwechsel treatment. ### Recommendations 1. **Generate at 240×auto, not 400×auto.** Quality 85. Write as `image/jpeg`. Keep PNGs/TIFFs as JPEG output too — a 240×auto JPEG of a scanned letter looks identical to PNG at 2–3× the size. 2. **Fixed tile dimensions in the UI** — not "height proportional": ```svelte <div class="relative h-[84px] w-[60px] flex-shrink-0 overflow-hidden rounded-sm border border-line bg-white"> {#if thumbUrl} <img src={thumbUrl} alt="" class="h-full w-full object-cover object-top" loading="lazy" decoding="async" /> {:else} <!-- existing PDF icon centered --> {/if} </div> ``` `object-cover` with `object-top` shows the top of the page (where letter salutations and titles live) in a consistent 60×84 frame. No layout shift between PDFs and images. 3. **Use `alt=""` on thumbnail `<img>`.** The thumbnail is decorative; the document title next to it is the accessible name. Putting `alt="Dokument-Vorschau"` on every one adds announcement noise for screen readers. Empty alt + decorative is the right WCAG call here. 4. **`loading="lazy" decoding="async"`** on every thumbnail. Avoids blocking the main thread on first paint. 5. **Focus-visible ring on the enclosing link.** The existing DocumentRow `<a>` gets focus, not the img. Confirm it has a visible focus indicator in both light and dark mode — the row's existing `<a>` wrapper is the focus target. Run axe-core. 6. **Paraglide keys** — add these to `messages/{de,en,es}.json`: - `admin_system_thumbnails_heading` → "Thumbnails erzeugen" / "Generate thumbnails" / "Generar miniaturas" - `admin_system_thumbnails_description` - `admin_system_thumbnails_btn_start`, `_btn_retry`, `_status_running`, `_status_done`, `_status_failed` Mirror the existing `admin_system_import_*` keys exactly. ### Open Decisions - **`object-top` vs `object-center`.** Top gives you the salutation / masthead / title. Center is better for photos and landscape images. I lean `object-top` for letter-heavy archive, but I want to eyeball a few real examples from `/import` before locking it in. - **Dark mode presentation of bright scans.** The spec file may already answer this — I'll re-read before implementation. Worst case: add a `mix-blend-mode: multiply` or a 92% opacity on the img in dark mode.
Author
Owner

🗳️ Decision Queue — Action Required

4 decisions need your input before implementation starts. Deduplicated across personas.

🎨 UX & Presentation

  • Thumbnail generation width. Leonie recommends 240px (display is 60×84 CSS → 120×168 retina). Tobias backs the smaller number for bandwidth (240px ≈ 6–8KB/thumbnail vs 15–25KB at 400px; for a 30-row list that's ~240KB vs ~750KB on mobile). Plan currently says 400px.
    Options: 240px (default recommendation) · 320px (middle ground) · 400px (as planned).
    (Raised by: Leonie, Tobias, Felix)

  • object-top vs object-center focal point. A fixed 60×84 tile needs a cropping rule. Letter-heavy docs benefit from object-top (salutation + title visible). Photo-heavy content benefits from object-center. Leonie leans top but wants to eyeball real examples from /import first.
    Options: top (letter-first) · center (photo-first) · both, with content-type hint driving choice.
    (Raised by: Leonie)

  • Dark-mode thumbnail treatment. JPEG thumbnails of white letter scans will look very bright against a dark page. Leonie flagged but didn't lock in a choice.
    Options: no treatment (accept the brightness) · mix-blend-mode: multiply in dark mode · 92% opacity on <img> in dark mode · thin border-line frame to separate from page background.
    (Raised by: Leonie)

🏛️ Architecture

  • ADR for "PDFBox in backend, not ocr-service". Markus wants a short ADR (2 paragraphs in docs/adr/) capturing why thumbnails render in-process rather than being delegated to the existing Python OCR service. This is not an open technical choice (the plan is correct to keep it in the backend for now) — it's a deliverable to prevent a future reviewer from second-guessing the decision.
    Options: write the ADR as part of this PR · defer to a follow-up · skip.
    (Raised by: Markus)

Items NOT in the queue — convergent recommendations already aligned across personas

The following were flagged by multiple personas but have a single clear answer:

Item Resolution Raised by
Dispatch thumbnail task with afterCommit, not inline Mirror AuditService.logAfterCommit pattern Markus, Felix, Sara
Dedicated thumbnailExecutor bean, not shared taskExecutor core=1, max=2, queue=200, CallerRunsPolicy in AsyncConfig Markus, Tobias
Stream PDF from S3, don't buffer 50MB into heap Add FileService.downloadFileStream() Felix, Tobias
Cache-Control: private not public CWE-525 cache-leak fix Nora, Markus
30-second timeout around PDFBox rendering Defense against malicious PDFs Nora
TIFF support requires twelvemonkeys-imageio-tiff dependency JDK ImageIO doesn't handle TIFF Felix
Scale with Graphics2D.drawImage + bilinear, not Image.getScaledInstance Deprecated, slower, worse output Felix
alt="" on thumbnail <img> (decorative) Title nearby is the accessible name Leonie
loading="lazy" decoding="async" on every thumbnail Avoids blocking main thread Leonie
Paraglide keys for admin card copy Mirror admin_system_import_* Leonie
Real MinIO in one integration test, mocked elsewhere CI cost acceptable Sara, Tobias
Testcontainers (not H2) for migration + repository tests Catches Postgres-specific column behavior Sara
Tests that must exist: afterCommit ordering, file-replacement regen, cache-header regression, backfill failure scenarios All permanent regression coverage Sara, Nora, Felix
Fixed 60×84 tile with object-cover — not variable height Prevents row-height jumps Leonie
Follow-up issue for ConversationTimeline + ChronikRow thumbnails Scoped out of this PR by plan Leonie

If you disagree with any of the above, flag it in the walk-through and we'll reopen.

## 🗳️ Decision Queue — Action Required _4 decisions need your input before implementation starts. Deduplicated across personas._ ### 🎨 UX & Presentation - **Thumbnail generation width.** Leonie recommends **240px** (display is 60×84 CSS → 120×168 retina). Tobias backs the smaller number for bandwidth (240px ≈ 6–8KB/thumbnail vs 15–25KB at 400px; for a 30-row list that's ~240KB vs ~750KB on mobile). Plan currently says 400px. _Options: 240px (default recommendation) · 320px (middle ground) · 400px (as planned)._ _(Raised by: Leonie, Tobias, Felix)_ - **`object-top` vs `object-center` focal point.** A fixed 60×84 tile needs a cropping rule. Letter-heavy docs benefit from `object-top` (salutation + title visible). Photo-heavy content benefits from `object-center`. Leonie leans top but wants to eyeball real examples from `/import` first. _Options: top (letter-first) · center (photo-first) · both, with content-type hint driving choice._ _(Raised by: Leonie)_ - **Dark-mode thumbnail treatment.** JPEG thumbnails of white letter scans will look very bright against a dark page. Leonie flagged but didn't lock in a choice. _Options: no treatment (accept the brightness) · `mix-blend-mode: multiply` in dark mode · 92% opacity on `<img>` in dark mode · thin `border-line` frame to separate from page background._ _(Raised by: Leonie)_ ### 🏛️ Architecture - **ADR for "PDFBox in backend, not ocr-service".** Markus wants a short ADR (2 paragraphs in `docs/adr/`) capturing why thumbnails render in-process rather than being delegated to the existing Python OCR service. This is not an open technical choice (the plan is correct to keep it in the backend for now) — it's a deliverable to prevent a future reviewer from second-guessing the decision. _Options: write the ADR as part of this PR · defer to a follow-up · skip._ _(Raised by: Markus)_ --- ### ✅ Items NOT in the queue — convergent recommendations already aligned across personas The following were flagged by multiple personas but have a single clear answer: | Item | Resolution | Raised by | |---|---|---| | Dispatch thumbnail task with `afterCommit`, not inline | Mirror `AuditService.logAfterCommit` pattern | Markus, Felix, Sara | | Dedicated `thumbnailExecutor` bean, not shared `taskExecutor` | `core=1, max=2, queue=200, CallerRunsPolicy` in AsyncConfig | Markus, Tobias | | Stream PDF from S3, don't buffer 50MB into heap | Add `FileService.downloadFileStream()` | Felix, Tobias | | `Cache-Control: private` not `public` | CWE-525 cache-leak fix | Nora, Markus | | 30-second timeout around PDFBox rendering | Defense against malicious PDFs | Nora | | TIFF support requires `twelvemonkeys-imageio-tiff` dependency | JDK ImageIO doesn't handle TIFF | Felix | | Scale with `Graphics2D.drawImage` + bilinear, not `Image.getScaledInstance` | Deprecated, slower, worse output | Felix | | `alt=""` on thumbnail `<img>` (decorative) | Title nearby is the accessible name | Leonie | | `loading="lazy" decoding="async"` on every thumbnail | Avoids blocking main thread | Leonie | | Paraglide keys for admin card copy | Mirror `admin_system_import_*` | Leonie | | Real MinIO in one integration test, mocked elsewhere | CI cost acceptable | Sara, Tobias | | Testcontainers (not H2) for migration + repository tests | Catches Postgres-specific column behavior | Sara | | Tests that must exist: afterCommit ordering, file-replacement regen, cache-header regression, backfill failure scenarios | All permanent regression coverage | Sara, Nora, Felix | | Fixed 60×84 tile with `object-cover` — not variable height | Prevents row-height jumps | Leonie | | Follow-up issue for ConversationTimeline + ChronikRow thumbnails | Scoped out of this PR by plan | Leonie | If you disagree with any of the above, flag it in the walk-through and we'll reopen.
Author
Owner

🎯 Discussion Resolutions

After walking through every point raised by the six personas, here are the agreed decisions. These now act as the authoritative design for implementation — implement will read this comment alongside the original issue.

T1 — Async dispatch & transaction ordering

  • Decision: New public method ThumbnailAsyncRunner.dispatchAfterCommit(UUID docId) encapsulates TransactionSynchronizationManager.registerSynchronization(…afterCommit…). All callers (4 sites in DocumentService, 1 in MassImportService) simply call this method.
  • Rationale: Race condition if generateAsync fires before the commit. Mirror the existing AuditService.logAfterCommit pattern — transaction awareness belongs in the runner, not at every callsite.

T2 — Thread pool & backpressure

  • Decision: New @Bean("thumbnailExecutor") in AsyncConfig.java: core=1, max=2, queue=200, CallerRunsPolicy, prefix Thumbnail-. @Async("thumbnailExecutor") on runner and backfill service. Backfill loop strictly sequential.
  • Rationale: Quick-upload of 15+ files would overflow the shared taskExecutor (queue=10, AbortPolicy) and silently drop thumbnails. CallerRunsPolicy provides backpressure instead of dropping work. Dedicated pool isolates from OCR.

T3 — Memory ceiling & PDF handling

  • Decision (streaming): Add FileService.downloadFileStream(s3Key) returning ResponseInputStream<GetObjectResponse>. ThumbnailService consumes stream, never buffers full bytes.
  • Decision (timeout): 30-second timeout wrapping the full generate() call inside ThumbnailAsyncRunner. On timeout: warn log, increment failed counter, don't rethrow.
  • Decision (scaling): Graphics2D.drawImage with VALUE_INTERPOLATION_BILINEAR. Do not use Image.getScaledInstance.
  • Decision (TIFF): Add com.twelvemonkeys.imageio:imageio-tiff:3.12.0 to backend/pom.xml.
  • Rationale: CX32 memory ceiling, defense against corrupt/malicious PDFs, correct modern scaling API, TIFF is already in the upload allowlist so we must support it in thumbnails.

T4 — Thumbnail endpoint: caching, headers, permissions

  • Decision (cache): Cache-Control: private, max-age=31536000, immutable on /api/documents/{id}/thumbnail. Regression test asserts private present and public absent.
  • Decision (content-type): image/jpeg set on S3 putObject, echoed back via existing FileService.downloadFile().
  • Decision (auth): No explicit @RequirePermission — mirrors existing /file endpoint (authenticated-only). Future ACL work addresses both endpoints together.
  • Rationale: CWE-525 cache-leak on public; consistency with /file.

T5 — Output size & format

  • Decision: Width 240px, height proportional, JPEG quality 85, all inputs (PDF/JPEG/PNG/TIFF) rendered to JPEG output.
  • Rationale: Display is 60×84 CSS → 120×168 retina. 240px is 2× oversampled; 400px was 3.3×. Cuts thumbnail bandwidth roughly in half.

T6 — Tile rendering in UI

  • Decision (component): New shared $lib/components/DocumentThumbnail.svelte takes a single doc: Pick<Document, 'id' | 'thumbnailKey' | 'thumbnailGeneratedAt' | 'contentType'> prop. Used in DocumentRow.svelte and PersonDocumentList.svelte.
  • Decision (tile): Fixed 60×84 tile, object-fit: cover, object-position: top, white bg, border-line, rounded-sm, alt="", loading="lazy", decoding="async". Focus-visible ring stays on the enclosing <a>.
  • Decision (focal point): object-top globally — optimized for letter-heavy archive.
  • Decision (dark mode): dark:mix-blend-multiply on the <img>; border visible in both themes.
  • Decision (fallback): Existing PDF icon centered inside the tile when thumbnailKey is null.
  • Rationale: Centralized component enforces visual consistency; cropping optimized for letters; multiply blend-mode prevents bright white rectangles in dark theme.

T7 — Admin card i18n

  • Decision: Seven new Paraglide keys (admin_system_thumbnails_heading, _description, _btn_start, _btn_retry, _status_running, _status_done {count}, _status_failed {message}), mirroring admin_system_import_*. Translations drafted in de/en/es.

T8 — Cache-bust URL parameter

  • Decision: Keep thumbnailGeneratedAt (LocalDateTime) as-is. Accept the escaped-colons URL. No separate thumbnailVersion field.
  • Rationale: URL is a query param users never see; shaving one @Transient getter off the entity is worth the minor aesthetic hit.

T9 — Test coverage

  • Decision (matrix): Unit (Mockito) + integration (Testcontainers + real Postgres) + controller slice (@WebMvcTest) + frontend unit + component tests + one Playwright E2E on admin card + axe-core scan.
  • Decision (MinIO): One integration test hits real MinIO via docker-compose.ci.yml. Everything else mocks at FileService boundary.
  • Decision (fixtures): Committed src/test/resources/fixtures/sample.pdf (generated by one-shot PDFBox script) and sample.png.
  • Critical tests that must exist: afterCommit commit-vs-rollback dispatch, file-replacement regeneration, cache-header regression, backfill failure scenarios (S3 error, unsupported content type, concurrent start, poll during run).

T10 — Observability

  • Decision: Per-failure log.warn("Thumbnail generation failed for doc={} reason={}", ...). Backfill end log.info("Thumbnail backfill complete: total={} processed={} skipped={} failed={} durationMs={}", ...).
  • Decision: Micrometer counter deferred — not in phase-7 infra yet.

T11 — Documentation

  • Decision: Write docs/adr/ADR-001-pdfbox-for-thumbnails.md as part of this PR. Establishes the docs/adr/ convention.
  • Decision: No follow-up issue for ConversationTimeline / ChronikRow — explicitly out of scope for this iteration.

Open / Skipped

None — every point raised by the personas was walked through and resolved. Nothing deferred.

# 🎯 Discussion Resolutions After walking through every point raised by the six personas, here are the agreed decisions. These now act as the authoritative design for implementation — `implement` will read this comment alongside the original issue. ## T1 — Async dispatch & transaction ordering - **Decision**: New public method `ThumbnailAsyncRunner.dispatchAfterCommit(UUID docId)` encapsulates `TransactionSynchronizationManager.registerSynchronization(…afterCommit…)`. All callers (4 sites in `DocumentService`, 1 in `MassImportService`) simply call this method. - **Rationale**: Race condition if `generateAsync` fires before the commit. Mirror the existing `AuditService.logAfterCommit` pattern — transaction awareness belongs in the runner, not at every callsite. ## T2 — Thread pool & backpressure - **Decision**: New `@Bean("thumbnailExecutor")` in `AsyncConfig.java`: `core=1, max=2, queue=200, CallerRunsPolicy`, prefix `Thumbnail-`. `@Async("thumbnailExecutor")` on runner and backfill service. Backfill loop strictly sequential. - **Rationale**: Quick-upload of 15+ files would overflow the shared `taskExecutor` (queue=10, AbortPolicy) and silently drop thumbnails. `CallerRunsPolicy` provides backpressure instead of dropping work. Dedicated pool isolates from OCR. ## T3 — Memory ceiling & PDF handling - **Decision (streaming)**: Add `FileService.downloadFileStream(s3Key)` returning `ResponseInputStream<GetObjectResponse>`. `ThumbnailService` consumes stream, never buffers full bytes. - **Decision (timeout)**: 30-second timeout wrapping the full `generate()` call inside `ThumbnailAsyncRunner`. On timeout: warn log, increment `failed` counter, don't rethrow. - **Decision (scaling)**: `Graphics2D.drawImage` with `VALUE_INTERPOLATION_BILINEAR`. Do not use `Image.getScaledInstance`. - **Decision (TIFF)**: Add `com.twelvemonkeys.imageio:imageio-tiff:3.12.0` to `backend/pom.xml`. - **Rationale**: CX32 memory ceiling, defense against corrupt/malicious PDFs, correct modern scaling API, TIFF is already in the upload allowlist so we must support it in thumbnails. ## T4 — Thumbnail endpoint: caching, headers, permissions - **Decision (cache)**: `Cache-Control: private, max-age=31536000, immutable` on `/api/documents/{id}/thumbnail`. Regression test asserts `private` present and `public` absent. - **Decision (content-type)**: `image/jpeg` set on S3 `putObject`, echoed back via existing `FileService.downloadFile()`. - **Decision (auth)**: No explicit `@RequirePermission` — mirrors existing `/file` endpoint (authenticated-only). Future ACL work addresses both endpoints together. - **Rationale**: CWE-525 cache-leak on `public`; consistency with `/file`. ## T5 — Output size & format - **Decision**: Width **240px**, height proportional, JPEG quality **85**, all inputs (PDF/JPEG/PNG/TIFF) rendered to JPEG output. - **Rationale**: Display is 60×84 CSS → 120×168 retina. 240px is 2× oversampled; 400px was 3.3×. Cuts thumbnail bandwidth roughly in half. ## T6 — Tile rendering in UI - **Decision (component)**: New shared `$lib/components/DocumentThumbnail.svelte` takes a single `doc: Pick<Document, 'id' | 'thumbnailKey' | 'thumbnailGeneratedAt' | 'contentType'>` prop. Used in `DocumentRow.svelte` and `PersonDocumentList.svelte`. - **Decision (tile)**: Fixed 60×84 tile, `object-fit: cover`, `object-position: top`, white bg, `border-line`, `rounded-sm`, `alt=""`, `loading="lazy"`, `decoding="async"`. Focus-visible ring stays on the enclosing `<a>`. - **Decision (focal point)**: `object-top` globally — optimized for letter-heavy archive. - **Decision (dark mode)**: `dark:mix-blend-multiply` on the `<img>`; border visible in both themes. - **Decision (fallback)**: Existing PDF icon centered inside the tile when `thumbnailKey` is null. - **Rationale**: Centralized component enforces visual consistency; cropping optimized for letters; multiply blend-mode prevents bright white rectangles in dark theme. ## T7 — Admin card i18n - **Decision**: Seven new Paraglide keys (`admin_system_thumbnails_heading`, `_description`, `_btn_start`, `_btn_retry`, `_status_running`, `_status_done {count}`, `_status_failed {message}`), mirroring `admin_system_import_*`. Translations drafted in de/en/es. ## T8 — Cache-bust URL parameter - **Decision**: Keep `thumbnailGeneratedAt` (`LocalDateTime`) as-is. Accept the escaped-colons URL. No separate `thumbnailVersion` field. - **Rationale**: URL is a query param users never see; shaving one `@Transient` getter off the entity is worth the minor aesthetic hit. ## T9 — Test coverage - **Decision (matrix)**: Unit (Mockito) + integration (Testcontainers + real Postgres) + controller slice (`@WebMvcTest`) + frontend unit + component tests + one Playwright E2E on admin card + axe-core scan. - **Decision (MinIO)**: One integration test hits real MinIO via `docker-compose.ci.yml`. Everything else mocks at `FileService` boundary. - **Decision (fixtures)**: Committed `src/test/resources/fixtures/sample.pdf` (generated by one-shot PDFBox script) and `sample.png`. - **Critical tests that must exist**: afterCommit commit-vs-rollback dispatch, file-replacement regeneration, cache-header regression, backfill failure scenarios (S3 error, unsupported content type, concurrent start, poll during run). ## T10 — Observability - **Decision**: Per-failure `log.warn("Thumbnail generation failed for doc={} reason={}", ...)`. Backfill end `log.info("Thumbnail backfill complete: total={} processed={} skipped={} failed={} durationMs={}", ...)`. - **Decision**: Micrometer counter deferred — not in phase-7 infra yet. ## T11 — Documentation - **Decision**: Write `docs/adr/ADR-001-pdfbox-for-thumbnails.md` as part of this PR. Establishes the `docs/adr/` convention. - **Decision**: No follow-up issue for `ConversationTimeline` / `ChronikRow` — explicitly out of scope for this iteration. --- ### Open / Skipped None — every point raised by the personas was walked through and resolved. Nothing deferred.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#307