feature: PDF-Thumbnails für Dokumente (Upload + Admin-Backfill) #308
Reference in New Issue
Block a user
Delete Branch "feat/issue-307-pdf-thumbnails"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #307
Summary
Adds JPEG thumbnails (240px wide, quality 85) for documents across the archive. PDFs render the first page via PDFBox; JPEG/PNG/TIFF images are scaled with Graphics2D bilinear interpolation. Thumbnails are generated fire-and-forget on upload and via a new admin backfill card for legacy documents.
Implementation follows the consolidated Discussion Resolutions posted as comment #3978 on the issue — every decision from the multi-persona review is honored:
ThumbnailAsyncRunner.dispatchAfterCommit— mirrorsAuditService.logAfterCommit, guarantees rollback-safe semantics (T1)thumbnailExecutorbean withCallerRunsPolicy— isolates PDF rendering from OCR, applies back-pressure instead of dropping work (T2)FileService.downloadFileStream— never buffers 50 MB into heap; 30 s timeout defends against corrupt/malicious PDFs (T3)Cache-Control: private, immutableon/api/documents/{id}/thumbnail— avoids CWE-525 shared-cache leak (T4)DocumentThumbnail.svelte60×84 tile withobject-top,dark:mix-blend-multiply, lazy loading, empty alt — used in bothDocumentRowandPersonDocumentList(T5, T6)admin_system_import_*(T7)thumbnails.ts, Playwright E2E for the admin card + axe-core on/admin/system(T9)docs/adr/capturing why PDFBox stays in-process (T11)Files
Backend (new):
ThumbnailService,ThumbnailAsyncRunner,ThumbnailBackfillService, their tests,ThumbnailServiceIntegrationTest(real MinIO), migrationV52__add_document_thumbnails.sql.Backend (modified):
Document,DocumentRepository,ErrorCode,AsyncConfig,FileService,DocumentService,MassImportService,AdminController,DocumentController, and their tests.pom.xmlgainstwelvemonkeys-imageio-tiff.Frontend (new):
DocumentThumbnail.svelte,thumbnails.ts+ test.Frontend (modified): generated
api.ts,DocumentRow,PersonDocumentList,persons/[id]/+page.svelte,admin/system/+page.svelte,messages/{de,en,es}.json,e2e/admin.spec.ts,e2e/accessibility.spec.ts.Docs (new):
docs/adr/004-pdfbox-thumbnails.md.Test Plan
./mvnw test— 1251 backend tests, 0 failuresnpx vitest run thumbnails— 4 frontend unit tests passThumbnailServiceIntegrationTestspins up MinIO via Testcontainers GenericContainer, confirms real S3 round-tripadmin triggers thumbnail generation and sees DONE within 30s— requires full stack; run locally vianpm run test:e2ethumbnail_keyon a doc, trigger from/admin/system, watch status transition to DONE🤖 Generated with Claude Code
Renders a 240px-wide JPEG (quality 85) from either a PDF first page via PDFBox or a JPEG/PNG/TIFF scan via ImageIO, then uploads to S3 under thumbnails/{docId}.jpg and updates the Document entity. Scaling uses Graphics2D.drawImage with VALUE_INTERPOLATION_BILINEAR (not deprecated Image.getScaledInstance). Source is streamed via FileService.downloadFileStream to avoid buffering 50MB PDFs. Never throws — returns Outcome.SKIPPED for unsupported content types and Outcome.FAILED for rendering/upload errors so the backfill can tally them without aborting the run. Refs #307 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>Pure function returning /api/documents/{id}/thumbnail?v=<timestamp> or null when thumbnailKey is missing. The encoded timestamp changes whenever the backend regenerates a thumbnail (file replace), invalidating browser caches despite the immutable Cache-Control. Refs #307 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>🏛️ Markus Keller — Senior Application Architect
Verdict: ⚠️ Approved with concerns
I've reviewed the 37 changed files against the consolidated Discussion Resolutions. The shape matches what we agreed: dedicated executor, afterCommit dispatch, in-process PDFBox, sequential backfill loop, ADR-004 captured. Good execution overall — a few architectural concerns to raise.
Blockers
None.
Concerns
ThumbnailServiceinjectsDocumentRepositorydirectly (ThumbnailService.java:38). Per the project's layering rules, "services never reach into another domain's repository — call the other domain's service instead". The cleanest way around this is circular becauseDocumentServiceneedsThumbnailAsyncRunner, andThumbnailServiceneeds to persist the thumbnail fields ontoDocument. I'd treatThumbnailServiceas an aspect-owner of the thumbnail columns specifically — but the ADR should explicitly acknowledge this layering exception. Same applies toThumbnailBackfillService(ThumbnailBackfillService.java:37).The ADR-004 doesn't mention in-memory
BackfillStatus. It's a real operational trait (state lost on restart; single-node only) and every future maintainer will re-ask. Two sentences covers it.Documententity got two new columns without@Schema(requiredMode = REQUIRED)— correct (they're nullable), but the convention on new fields in this entity is to add the annotation when the field is always populated. Since these are nullable by design, the current state is correct. Just flagging that the review saw it and it's right.Suggestions
ThumbnailService.generate()does file fetch → render → scale → encode → upload → persist in one 40-line method. Splitting intoreadSource(doc)/renderJpeg(source)/uploadAndPersist(doc, jpeg)would make each stage individually testable without mocking S3. Not a blocker — you test at the right boundaries and the existing tests are strong.admin/system/+page.svelteis now duplicated between import and thumbnail status. Two instances doesn't justify extraction. Three would. Leave it for now.ThumbnailAsyncRunner.generateAsyncspawns a single-threadedExecutorServiceper call for the 30 s timeout. Correct but allocates a thread + shuts it down every invocation. At the expected volume this is fine; at 10k+ docs during backfill it's 10k thread allocations. Consider a small sharedScheduledExecutorServiceif the backfill grows. Follow-up, not blocker.👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
TDD discipline looks clean:
ThumbnailServiceTest(7),ThumbnailAsyncRunnerTest(5),ThumbnailBackfillServiceTest(7), controller slice tests, integration test, frontend unit tests forthumbnails.ts. Every test I'd want to defend the decisions is present. 1251 backend tests green. Two real concerns below, rest is polish.Concerns
ThumbnailService.generate()has twotryblocks with overlapping catches (ThumbnailService.java:68-103). The first wraps the source read, the second wraps scale/encode/upload. Both catchException→ log+returnFAILED. The second one swallows errors fromdocumentRepository.save()too — if Hibernate throws aConstraintViolationExceptionthe thumbnail was successfully uploaded to S3 but the entity update failed silently. We'll see orphanthumbnails/{docId}.jpgobjects in MinIO that no document points at. Not catastrophic (next backfill regenerates), but worth a TODO at minimum, and ideally a log line distinguishing "upload failed" from "save failed".updateDocumentdoesn't delete the previous thumbnail whennewFile != null(DocumentService.java:262-268). The S3 key is deterministic (thumbnails/{docId}.jpg), soPutObjectoverwrites it — so no actual orphan, and the regenerated thumbnail replaces the old one cleanly. Re-read the code twice to confirm this. Writing it out because future-me will wonder: yes, deterministic key +PutObject= replace, so we're safe. Maybe worth a one-line comment on the thumbnail key constant.Suggestions
ThumbnailService.generateis ~40 lines across two try blocks. Splitting source-read / transform / upload into private helpers makes each stage unit-testable without building a whole MinIO in-memory. Not a blocker.thumbnails.tstest hasencodeURIComponentin the expected value (thumbnails.test.ts:21). That's fine, but the test documents the URL shape future readers will see — add one sentence above the test:// %3A escape is expected — colons in LocalDateTime are URL-unsafe.DocumentThumbnail.svelteusesdoc={doc}in DocumentRow (formatter expanded it from{doc}). Both are valid Svelte 5; prettier's preference. Not worth changing.DocumentThumbnail.svelterenders for any unsupported / missing thumbnail — including images. An image file whose thumbnail generation fails gets a PDF icon, which is mildly misleading. Low priority, but a neutral "file" icon would be more honest.🔒 Nora "NullX" Steiner — Application Security Engineer
Verdict: ✅ Approved
All three items I flagged during the issue review have been addressed:
Cache-Control: private, max-age=31536000, immutableon/api/documents/{id}/thumbnail(DocumentController.java:109) with regression test assertingprivatepresent andpublicabsent (DocumentControllerTest.java:389-401)ThumbnailAsyncRunner.generateAsync(ThumbnailAsyncRunner.java:68), tested with a 5-second sleep vs. 1-second configurable timeout (ThumbnailAsyncRunnerTest.java:93-108)Loader.loadPDF(new RandomAccessReadBuffer(in))(ThumbnailService.java:115) with streamed input; the stream is closed in a try-with-resources block so memory and file-descriptor leaks are not an issue even on malformed PDFsSuggestions
minioadmin/minioadmin(ThumbnailServiceIntegrationTest.java:49-50). Consistent withMINIO_ROOT_USERdefaults used elsewhere in test setup; not a leak risk (the container only exists during the test). No action needed, just flagging for awareness..anyRequest().authenticated()rather than@RequirePermission(READ_ALL). That's consistent with the existing/fileendpoint and was the walkthrough decision. When the household eventually gets per-document ACLs, both endpoints need the same treatment — worth a TODO reminder inDocumentControllercomments near/fileand/thumbnail.Good security hygiene. Ship it.
🧪 Sara Holt — QA & Test Strategist
Verdict: ✅ Approved
The test matrix I asked for during the issue review is fully present. Going through the checklist:
ThumbnailServiceTest— PDF render, image types (JPEG/PNG), S3 put with correct key + content-type, unsupported content type → SKIPPED, source stream failure → FAILED, S3 put failure → FAILED (7 tests)ThumbnailAsyncRunnerTest—afterCommitpath, no-transaction direct dispatch, rollback path does NOT fire runner, timeout path (5 tests)ThumbnailBackfillServiceTest— initial IDLE status, happy path, skipped counting, failed-but-continue, unexpected exception continuation, concurrent start rejection,startedAtis set (7 tests)DocumentServiceTest— exactly-once dispatch on all four paths (storeDocument / createDocument / updateDocument / attachFile), no-dispatch when no file provided, file-replacement regeneration (5 new tests)AdminControllerTest— 401/403/202/200 on new endpoints (5 tests)DocumentControllerTest— 200 with private Cache-Control + immutable, 404 when no thumbnail, 404 when storage missing, 401 when unauthenticated (4 tests)ThumbnailServiceIntegrationTest— real MinIO + real Postgres round-trip; asserts the uploaded JPEG decodes at width 240 (Sara's "one integration test against real MinIO" target)thumbnails.test.ts— null case, no-generatedAt case, encoded generatedAt case, different timestamps produce different URLsadmin triggers thumbnail generation and sees DONE within 30saccessibility.spec.tsextended to cover/admin/systemin light + system-dark + manual-dark modesBackend suite: 1251 tests, 0 failures, 0 errors. Frontend
thumbnailssuite: 4/4. The numbers themselves don't prove correctness — I spot-checked the actual assertions and they verify the right invariants (not just that methods were called).Minor suggestions
ThumbnailServiceIntegrationTestbumps test suite time by ~45s on first run (container pull). Acceptable per the walkthrough. If CI perf becomes a concern later, consider reusing a single MinIO container across multiple integration tests via a@TestConfigurationbean.DocumentControllerThumbnailITwas called out in my issue review but the integration test lives in the service layer instead (ThumbnailServiceIntegrationTest). That's fine — the round-trip is what mattered; the controller slice tests cover HTTP shape separately.Test hygiene is solid. Quality gate (branch coverage ≥ 80%) should be satisfied — every new branch in ThumbnailService has at least one test hitting it.
⚙️ Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
Operational shape is right. Everything flagged during the issue review has a concrete response in the diff:
thumbnailExecutorbean (AsyncConfig.java:42-56) withcore=1, max=2, queue=200, CallerRunsPolicy. Isolated fromtaskExecutor. Comment explains whyCallerRunsPolicyis safe here whenauditExecutorusesAbortPolicy.FileService.downloadFileStream(FileService.java:115-127) replaces full-bytes buffering — memory ceiling on CX32 holds.log.warn("Thumbnail generation failed for doc={} reason={}", ...)and backfill completionlog.info("Thumbnail backfill complete: total={} processed={} skipped={} failed={} durationMs={}", ...). Parseable by Loki once we wire it up (phase-7).docker-compose.ymluntouched. No new services, no new env vars, same bucket. Production migration to Hetzner Object Storage works identically —thumbnails/is a prefix on the existing bucket.RELEASE.2024-06-13T22-53-53Z(ThumbnailServiceIntegrationTest.java:55). Renovate will pick up future bumps.twelvemonkeys-imageio-tiff:3.12.0also pinned in pom.xml.Operational notes for deploy day
V52__add_document_thumbnails.sqladds two nullable columns with no default backfill — instant ALTER on a non-trivialdocumentstable in Postgres 16. No table rewrite, no lock hold beyond metadata.${app.s3.bucket}./admin/systemshould complete in seconds to a couple of minutes. If the archive grows to thousands of docs, the single-threaded loop × ~2s per PDF = real time. Still acceptable as a one-shot admin action.Suggestions (none blocking)
docker-compose.ymlhas a MinIO MC helper container that creates thearchive-documentsbucket on startup. For Hetzner Object Storage migration we'll want the same "ensure thumbnails prefix permissions" logic — but since it's a prefix not a separate bucket, nothing to do today. Just noting for the migration checklist.Counter.builder("thumbnails.generated").tag("result", ...)on each outcome. Trivial one-liner, valuable at scale.Ship it.
🎨 Leonie Voss — UX & Accessibility Lead
Verdict: ⚠️ Approved with concerns
The shared
DocumentThumbnail.sveltecomponent is exactly what I asked for — centralized, single source of truth, used in bothDocumentRowandPersonDocumentList. Every a11y decision from the walkthrough is in the code:object-cover object-topalt=""(decorative — the title nearby is the accessible name)loading="lazy" decoding="async"dark:mix-blend-multiplyon the<img>for dark-mode glareborder border-linevisible in both themesrounded-sm+bg-whitebackgroundthumbnailKeyis nulladmin_system_import_*/admin/systemextended to the three-mode matrixConcerns
PersonDocumentList's row height grew significantly (old: 32px icon + padding ≈ 56px total; new: 84px tile + padding ≈ 108px total). Five rows now take ~540px vertical space instead of ~280px — nearly doubling the section. On a 1280×800 person detail page this might push the "more" button below the fold. I endorse the uniform 60×84 tile in principle but I want to eyeball this at the real archive before declaring it ideal. Not a blocker for shipping — just something to revisit after the backfill runs and we see real thumbnails in place of the icon fallback.
Fallback icon is always the PDF icon, even for image documents (
DocumentThumbnail.svelte:34-54). Felix flagged the same thing. Users with PNG family photos will see a PDF icon until the backfill runs. Minor visual inconsistency; low priority.Suggestions (post-merge iteration)
/with actual JPEG data to confirm thedark:mix-blend-multiplydoesn't introduce any contrast failures I didn't catch in the static preview.docs/specs/briefwechsel-thumbnail-rows-spec.html) is ready when we come back to it.Approve with the PersonDocumentList height concern noted. Doesn't block merge — it's a "check it looks right with real data" follow-up.
Review Cycle 1 — Changes
Addressed
BackfillStatus" + "treat this as a layering exception, acknowledge it explicitly" — expandeddocs/adr/004-pdfbox-thumbnails.mdwith an "Operational caveats (intentional)" section covering both. Commitf137aa79.ThumbnailService.generateinto four stages (readSourceImage/encodeThumbnail/uploadToStorage/persistThumbnailMetadata), each with a distinct log line. Persist failure specifically logs"orphaned in storage as {key}". Addedgenerate_returnsFailed_whenPersistThrows_butUploadSucceededtest. Also extractedTHUMBNAIL_KEY_PREFIX/SUFFIXconstants with a comment explaining the deterministic-overwrite contract. Commitf0f9753c.b6bfb914.Not addressed (follow-up, not blocker per reviewer)
ScheduledExecutorServiceinstead of per-callExecutors.newSingleThreadExecutor" — explicitly flagged as "follow-up, not blocker" at the expected volume. Deferring.ThumbnailService.generateinto helpers for individual testability" — addressed as part of the try-block split above; the four new private methods are now each testable via their public caller's surface.Deferred to new issues
None — every raised concern has either been fixed, explicitly deferred by the reviewer, or is dependent on post-merge data to evaluate.
Test suite: 1252 backend tests, 0 failures (+1 from the new persist-fails regression test).
Re-running review cycle 2.
🏛️ Markus Keller — Senior Application Architect
Verdict: ✅ Approved
Cycle 2 review — checking only the delta.
BackfillStatuscaveat (single-node, restart-lossy, idempotent replay viafindByFilePathIsNotNullAndThumbnailKeyIsNull).No further architectural concerns. The per-call
Executors.newSingleThreadExecutoris follow-up material — at the expected volume (hundreds to low thousands of docs) thread allocation is a non-issue.Ship it.
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
Cycle 2 delta check.
ThumbnailService.generatenow reads as four named stages — each try/catch wraps a single concern.persistThumbnailMetadatalogs the orphan explicitly:"Thumbnail persist failed for doc={} (orphaned in storage as {}): {}". That's the audit trail the operator needs.THUMBNAIL_KEY_PREFIX/SUFFIXconstants + comment on the deterministic-overwrite contract. The key pattern is in one place now, so changing it (e.g. adding a size suffix) is a one-line change.generate_returnsFailed_whenPersistThrows_butUploadSucceededverifies S3 put was called AND save was called — proves the FAILED outcome is from the persist, not the upload. 8/8 ThumbnailServiceTest green.The fallback icon swap is clean — single SVG,
h-8 w-8at 60×84 is ~37% of the tile width, visually substantial. No action.Approve.
🔒 Nora "NullX" Steiner — Application Security Engineer
Verdict: ✅ Approved
Re-review after cycle 2. No security-relevant changes in the delta — refactor of
ThumbnailService.generatepreserves every control I checked in cycle 1:private, max-age=31536000, immutable)RandomAccessReadBuffer(in)with streaming + try-with-resourcesThe new
persistThumbnailMetadatalog line improves auditability — distinct log per failure mode is exactly what you want during incident response.No findings. Ship.
🧪 Sara Holt — QA & Test Strategist
Verdict: ✅ Approved
Cycle 2 delta:
generate_returnsFailed_whenPersistThrows_butUploadSucceededcodifies Felix's orphan-thumbnail concern. Assertsverify(s3Client).putObject(...)ANDverify(documentRepository).save(...)both happened — proves the upload succeeded before the persist failure, which is exactly the "orphan" case we want to catch if it regresses.DocumentThumbnailbehavior (url-present → img, url-null → fallback) is unchanged.No further findings. Branch coverage should still satisfy the 80% gate — each of the four new private methods in
ThumbnailServicehas a path to an existing public-level test.Approve.
⚙️ Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
Cycle 2 changes are code-only — no infrastructure delta. Re-checking:
AsyncConfig,docker-compose.yml,pom.xml, or any migration."orphaned in storage as {key}"is operator-grade — I can grep Loki for this specifically once phase-7 is up, and count how often it fires. Good forward compatibility with monitoring rollout.BackfillStatus— aligns with the existingMassImportServicepattern that's already been running in production for a year. No new operational surprise.Nothing to add. Ship.
🎨 Leonie Voss — UX & Accessibility Lead
Verdict: ✅ Approved
Cycle 2 delta:
h-6 w-6(24 px) toh-8 w-8(32 px) — fills the 60×84 tile better; the empty state no longer looks sparse.aria-hiddeninside a decorative wrapper div,alt=""on the real<img>when thumbnails are present, focus ring on the enclosing<a>.My earlier concern about PersonDocumentList row height doubling is still valid but genuinely requires real thumbnail data to evaluate — post-merge follow-up if it feels wrong in practice. Not a blocker.
Ship it.