security(import): validate PDF magic bytes in MassImportService before S3 upload #529
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Type: Security hardening (input validation)
Priority: P1-high — prod-reachable file ingestion with no content-type check
Source: review of #526 by Nora "NullX" Steiner — comment #8648, finding 5
Parent PR: #526 (mass-import bind mount → first prod-reachable use)
Summary
Before uploading any file to S3,
MassImportServiceMUST read the first 4 bytes and reject anything that doesn't carry the PDF magic header (%PDF). Rejections MUST be counted and surfaced inImportStatus.Context
MassImportServicereads filenames from the ODS spreadsheet, walks/importfor matching files, and uploads whatever it finds to S3 viaS3Client.putObject(MassImportService.java:153,:381). Currently no content-type validation: a file namedLetter-1947.pdfcould be a renamed.zip(potentially exploiting downstream PDF parsers), a.html(XSS via signed S3 URL preview), or arbitrary bytes — and the OCR/thumbnail pipeline would then process the garbage.With #526 making this endpoint prod-reachable, the upload-anything behaviour is now an issue.
Required
s3Client.putObject(...)call inMassImportService, read the first 4 bytes of the file and reject anything not matching the PDF magic header (25 50 44 46=%PDF).originalFilenameand the actual first bytes (sanitized via SLF4J{}placeholder, per Nora's Log4Shell point).ImportStatus— counter of skipped files alongside the existingprocessedcounter, so the operator sees what happened.%PDF-) and one renamed text file (txt-as-pdf.pdfcontainingnot a pdf); assert one upload + one skip + status message lists the rejected filename.Out of scope
Acceptance criteria
ImportStatusexposes askippedcount and askippedFileslist (or similar){}placeholders, not string concatenationLinked NFRs
ImportStatus(count + filename list).Definition of Ready
%PDF) defined🤖 Generated with Claude Code during /implement on #526