Tobias C2 — DocumentBulkEditDTO carries @Size guards on tagNames (max 200
entries × 200 chars), receiverIds (max 200), and the three location strings
(max 255 chars each). Controller now uses @Valid on @RequestBody so they
fire. The 500-cap on documentIds stays as a controller-level check (typed
BULK_EDIT_TOO_MANY_IDS code, not generic VALIDATION_ERROR).
Markus #7 — replace fully-qualified type names inside DocumentService with
imports (DocumentBatchSummary, DocumentBulkEditDTO).
Markus #8 — @Transactional(readOnly = true) on findIdsForFilter and
batchMetadata. Both are pure read paths; the marker lets Hibernate skip
dirty-checking on the loaded entities.
Record conversion of DocumentBulkEditDTO (Markus #6 / Felix #3) deferred
to a follow-up — keeping @Data avoids 10+ test bodies that mutate the DTO
via setters; the inconsistency is documented in the DTO's class-level
Javadoc.
Refs #225, PR #331
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses Markus B1+B2, Nora C1+C4+C5, Tobias #1, Sara B1+B2+C2, Elicit S2+C4
from the cycle 1 review on PR #331.
Audit / version trail
applyBulkEditToDocument now takes actorId, calls
documentVersionService.recordVersion(saved), and emits an
AuditKind.METADATA_UPDATED event tagged source=BULK_EDIT — restoring parity
with the single-doc updateDocument path.
Caps
/api/documents/batch-metadata: 500-ID cap (matches PATCH cap)
/api/documents/ids: 5000 result cap with BULK_EDIT_TOO_MANY_IDS on overflow
Permission tightening
/api/documents/ids re-gated WRITE_ALL — its only consumer is the bulk-edit
fast path (least-privilege per Elicit S2 + Nora's defence-in-depth).
Audit log
/ids and /batch-metadata now emit one log.info per call, mirroring the
quickUpload + bulkEdit format.
Robustness
Duplicates in PATCH documentIds are de-duplicated via LinkedHashSet so a
double-clicked "Alle X editieren" cannot inflate the updated count.
log.warn lines that interpolate Throwable.getMessage() now run through a
CRLF-strip helper (CWE-117).
Tests added
applyBulkEditToDocument_recordsVersion_andLogsAuditEvent_taggedSourceBulkEdit
patchBulk_acceptsExactly500Ids_atTheCap (off-by-one fence)
patchBulk_dedupesDuplicateDocumentIds_doesNotInflateUpdatedCount
getDocumentIds_returns403_forUserWithoutWriteAll
getDocumentIds_returns400_whenResultExceedsFilterCap
batchMetadata_returns403_forUserWithoutReadAll
batchMetadata_returns400_whenIdsExceedsCap
All 231 backend tests green.
Refs #225, PR #331
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
READ_ALL-gated endpoint returning all document UUIDs matching the same
filter parameters as /search, ignoring page/size. Powers the "Alle X
editieren" fast path so the bulk-edit page can replace the selection
with every match in one round-trip.
Refs #225
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
READ_ALL-gated batch endpoint returning lightweight summaries (id, title,
server PDF URL) for the bulk-edit page's left strip. Unknown IDs are silently
dropped — missing previews would be obvious to the user already.
Refs #225
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
WRITE_ALL-gated batch endpoint that applies a partial DTO to up to 500
documents per request. Per-document failures (DOCUMENT_NOT_FOUND, etc.)
are collected into the response's errors[] without aborting the batch.
Logs an audit line consistent with quickUpload.
Refs #225
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per-document atomic mutation method for the upcoming bulk PATCH endpoint.
Tags and receivers merge additively into existing sets; sender and the three
location fields replace only when the DTO field is non-blank. Wrapped in its
own @Transactional so a per-document failure cannot partially mutate other
documents in the outer batch loop.
Refs #225
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the request/response shapes for the upcoming PATCH /api/documents/bulk,
POST /api/documents/batch-metadata, and the new error code for the 500-ID cap.
Refs #225
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers four behaviours of applyBatchMetadata that had no coverage:
title applied by list index, sender resolved via PersonService,
tags applied via updateDocumentTags, and title left unchanged when
the fileIndex exceeds the titles list length.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
storeDocumentWithBatchMetadata was a 30-line flat method mixing file storage
with metadata hydration. The private helper makes each concern visible at a
glance.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces comma-delimited String with a proper JSON array field — callers no
longer need to pre-serialise. Service drops the split/trim/filter step and
passes tagNames directly to updateDocumentTags().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Validation guards (BATCH_TOO_LARGE, titles > files) are domain rules and
belong in the service where they can be unit-tested without the HTTP layer.
Controller now delegates to documentService.validateBatch().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add DocumentBatchMetadataDTO (titles, senderId, receiverIds, documentDate, location, tags, metadataComplete)
- Add BATCH_TOO_LARGE to ErrorCode
- Extend quickUpload to accept optional @RequestPart("metadata"); dispatches to storeDocumentWithBatchMetadata when present
- Cap batch at 50 files/request; reject 400 when titles.size > files.size
- Add DocumentService.storeDocumentWithBatchMetadata applying shared fields + index-based titles to both created and updated docs
- Raise max-request-size to 500MB (10-file chunk at max per-file size)
- Add structured SLF4J logging for every quickUpload call
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Seeds 120 UPLOADED docs with a deterministic date spread and runs
DocumentService.searchDocuments against a Testcontainers Postgres, not
a Mockito mock. Five cases:
1. First page returns exactly page_size items + correct totalElements
2. Last partial page returns the tail slice (offset 100 → 20 items)
3. Page beyond last returns empty content, totalElements still 120
4. SENDER sort path slices in-memory + reports correct total
5. Different pages return disjoint document id sets
Closes the integration-coverage gap between the Mockito unit tests and
the full Spec→Pageable→Page→DTO path that unit tests can't exercise.
Runs in ~87 s against the shared Testcontainers instance. (#316)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PageRequest.of(0, 10_000) was inlined at ~12 sites across DocumentServiceTest
and DocumentServiceSortTest as an "effectively unpaged" sentinel for tests
that don't care about paging. Extracted to a named constant on each class
so the intent is visible at each callsite and we don't risk copy-paste
drift of the magic number. No behaviour change. (#316)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds 5 dedicated controller cases — paging fields exposed on the JSON,
rejections for size>100 / size<1 / page<0 / page>100000, and a
captor assertion that the built PageRequest is forwarded to the service.
The size>100 case is the load-bearing guard on @Validated at
DocumentController — removing the annotation silently reopens the DoS
window this PR is meant to close.
Adds 5 service cases — fast path uses findAll(Spec, Pageable) (not Sort),
propagates page+size to the DB, carries totalElements/totalPages/
pageNumber/pageSize back on the result, and for SENDER sort slices in
memory and reports the pre-slice total. Page-beyond-last returns empty
content with a correct totalElements (JPA edge case). (#315)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fast path (DATE/TITLE/UPLOAD_DATE) pushes sort + paging into the DB via
findAll(Specification, PageRequest) and enriches only the returned slice
— 30× cheaper than enriching all 1500 matches when the user is only
going to see 50. In-memory sort paths (SENDER/RECEIVER/RELEVANCE) keep
their LEFT JOIN-friendly sort but now slice in-memory too, so enrichment
still runs against the page slice only.
Controller passes PageRequest.of(page, size) built from @RequestParam
values. Plan-level "add @Validated" prerequisite comes in the next commit.
All existing tests updated mechanically to pass a pageable argument
(PageRequest.of(0, 10_000) as an "effectively unpaged" sentinel). Stubs
that previously matched findAll(Specification, Sort) for the fast path
now match findAll(Specification, Pageable) with PageImpl<>.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rename `total` → `totalElements` for Spring-Page parity and add three new
required paging fields: pageNumber, pageSize, totalPages. Adds a `paged(
slice, pageable, totalElements)` factory alongside the existing single-page
`of(list)` shortcut. Enables offset pagination of /documents search (#315).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Prior coverage only exercised getThumbnailUrl() as a Java method call.
The new case serialises via ObjectMapper and asserts the resulting JSON
contains "thumbnailUrl":"..." so we catch silent breakages in the wire
contract (getter rename, @JsonIgnore, visibility drop) — not just
regressions in the method's return value (#309).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DashboardService now reads the URL from the Document's computed getter
instead of passing null, so the resume strip can display the real
thumbnail of whatever the user was last working on (#309).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@JsonProperty makes the computed getter part of every Document response
Jackson produces, so any DTO returning a Document automatically carries
the thumbnail URL without per-controller plumbing. The accompanying
comment warns future readers that the cache-buster is load-bearing
for the endpoint's `immutable` cache header (CWE-525) (#309).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Matches the shape the frontend previously built via
encodeURIComponent(thumbnailGeneratedAt), so the backend is now the
single source of truth for the thumbnail URL convention (#309).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The no-cache-buster branch covers documents whose thumbnail key is set
but whose thumbnailGeneratedAt is still null — which only happens in
the narrow window between the key being persisted and the async worker
stamping the timestamp (#309).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First TDD step for centralising the thumbnail URL convention on the
Document entity (#309). Adds a stub getter returning null and a test
that locks the "no key → no URL" branch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
persistThumbnailMetadata was a four-arg method signature that mixed
three conceptually related values. Wrapping them in a private
ThumbnailResult record drops the signature to (Document, result),
mirrors the existing SourcePreview record one step earlier in the
pipeline, and keeps generate() reading as a narrative of small
named outputs rather than positional arguments.
Refs #305
Fixes @felixbrandt suggestion 2 from PR review
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Groups the first-page BufferedImage and the source's total page count
into a SourcePreview record so both values travel through generate()
together. PDFs get pdf.getNumberOfPages(); image uploads always get 1
(a scan is one page from the user's perspective). The page badge on
the thumbnail row uses this value to show "1 / N" for multi-page
letters without a separate round-trip.
Refs #305
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Computes aspect at generate-time from the loaded BufferedImage: w/h
above 1.1 → LANDSCAPE, otherwise PORTRAIT. The threshold keeps
near-square A4 scans in the portrait tile (ratio ≈ 1.0) rather than
flipping to landscape on a rounding error. Also hardens the pipeline
with an explicit dimension guard so width=0 / height=0 edge cases fail
cleanly instead of dividing by zero when the aspect is computed.
Refs #305
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both cases already return FAILED via the existing catch-Exception blocks
in readSourceImage. Pinning the behavior with regression tests before
thumbnailAspect and pageCount computation is added, so a future
refactor that removes the safety net is caught at compile/test time.
Refs #305
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds ThumbnailAspect enum (PORTRAIT | LANDSCAPE) and maps the two
nullable columns from V53 as JPA fields so ThumbnailService can
populate them and the API can return them unchanged to the frontend.
Refs #305
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two nullable metadata columns to documents, populated by
ThumbnailService when it generates the JPEG preview. Both remain null
until the existing admin backfill endpoint reruns the service.
Refs #305
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses @felixbrandt — fix(backend): "the two try blocks in generate()
overlap — a save failure logs 'generation failed' even though the
thumbnail is already in S3 as an orphan".
generate() now orchestrates four stages, each in its own try+log:
readSourceImage / encodeThumbnail / uploadToStorage / persistThumbnailMetadata
persistThumbnailMetadata emits the distinct "orphaned in storage as <key>"
log line so an operator can see database-side failures after the upload
completed. The deterministic key ensures the next run overwrites cleanly,
so the orphan is self-healing.
Also extracts THUMBNAIL_KEY_PREFIX/SUFFIX constants with a comment
explaining the deterministic-overwrite contract.
Adds test: generate_returnsFailed_whenPersistThrows_butUploadSucceeded.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Spins up a MinIO container (Testcontainers GenericContainer) alongside
the existing PostgresContainerConfig, uploads a sample PDF, runs the
real ThumbnailService, and reads the resulting JPEG back from the
object store. Catches S3 signing / path-style access issues a mocked
S3Client wouldn't — justifies the CI cost (~45s) per walkthrough T9b.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Streams the JPEG thumbnail from S3 with Cache-Control: private,
max-age=31536000, immutable — `private` (not `public`) prevents
shared caches from leaking one user's thumbnail to another (CWE-525).
`immutable` is safe because the URL carries ?v=<thumbnailGeneratedAt>
as a cache-buster that changes whenever the file is replaced.
Authentication falls back to the global .anyRequest().authenticated()
rule, matching the existing /file endpoint's permission model.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- POST /api/admin/generate-thumbnails → triggers async backfill, 202
- GET /api/admin/thumbnail-status → returns current BackfillStatus
Both gated by the class-level @RequirePermission(Permission.ADMIN).
Shape and polling semantics mirror the mass-import endpoints.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sequentially processes all documents with a file but no thumbnail and
tallies processed / skipped / failed counts. Runs on thumbnailExecutor
so it shares back-pressure with live upload thumbnails but can never
saturate them (single-threaded loop).
Concurrent start rejected with THUMBNAIL_BACKFILL_ALREADY_RUNNING.
Emits a structured summary log line on completion for operator
visibility.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ODS/Excel imports that actually upload a file (file.isPresent()) now
trigger thumbnail generation alongside hash/metadata. Metadata-only
import rows produce no thumbnail — nothing to render.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
All four upload code paths (storeDocument, createDocument, updateDocument,
attachFile) now call thumbnailAsyncRunner.dispatchAfterCommit(id) after
the document save. createDocument and updateDocument only dispatch when a
file was actually provided/replaced.
The dispatch is afterCommit-safe: if the surrounding @Transactional
method rolls back, no thumbnail is generated for a document that never
reached the DB.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bridges @Transactional upload paths to the async thumbnail pipeline.
dispatchAfterCommit registers a TransactionSynchronization so the async
task only fires after the surrounding commit (and is silently skipped
on rollback) — mirrors the AuditService.logAfterCommit pattern.
generateAsync wraps the full ThumbnailService.generate call in a 30s
watchdog so a hung PDFBox render cannot occupy a thumbnailExecutor slot
indefinitely.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Renders a 240px-wide JPEG (quality 85) from either a PDF first page
via PDFBox or a JPEG/PNG/TIFF scan via ImageIO, then uploads to
S3 under thumbnails/{docId}.jpg and updates the Document entity.
Scaling uses Graphics2D.drawImage with VALUE_INTERPOLATION_BILINEAR
(not deprecated Image.getScaledInstance). Source is streamed via
FileService.downloadFileStream to avoid buffering 50MB PDFs.
Never throws — returns Outcome.SKIPPED for unsupported content types
and Outcome.FAILED for rendering/upload errors so the backfill can
tally them without aborting the run.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Dedicated thread pool (core=1, max=2, queue=200) with CallerRunsPolicy
for back-pressure. Keeps thumbnail rendering off the shared taskExecutor
used by OCR and out of the AbortPolicy queue that drops work on overflow.
Quick-upload batches (15+ files) now apply back-pressure instead of
silently dropping thumbnail jobs.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thumbnail generation will call this for PDFs up to 50 MB — loading the
full byte[] via downloadFileBytes would cause real memory pressure on
the single-VPS deploy. Stream-based reads let PDFBox parse the first
page without holding the whole file in heap.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
JDK ImageIO handles JPEG, PNG, BMP, GIF out of the box but not TIFF.
Since the document upload allowlist permits image/tiff, the thumbnail
generator must also decode it.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors the IMPORT_ALREADY_RUNNING pattern for the concurrent-start
guard in ThumbnailBackfillService.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds findByFilePathIsNotNullAndThumbnailKeyIsNull() used by the
upcoming ThumbnailBackfillService to locate documents that have a
file attached but no thumbnail yet.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two nullable columns to the documents table and their JPA mappings
on the Document entity. Both are left out of the OpenAPI required-mode
schema so the generated TypeScript type exposes them as optional.
Refs #307
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Replace any(Set.class) with any() to eliminate the raw-type unchecked
cast in DashboardControllerTest
- Derive ALL_ELIGIBLE_KINDS from AuditKind.ROLLUP_ELIGIBLE.stream() so
the integration test constant stays in sync with the production constant
automatically when new kinds are added
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses review concern: the test lived in the dashboard package but
tests the audit domain service. Package-by-feature convention requires
audit tests to live in the audit package.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Added @Parameter annotation so SpringDoc renders kinds as an
enum-array query param; regenerated TypeScript API types.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spring auto-converts ?kinds=FILE_UPLOADED,TEXT_SAVED to Set<AuditKind>.
Absent or empty kinds defaults to ROLLUP_ELIGIBLE. Unknown value → 400.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>