feat(backend): add FileService.downloadFileStream for memory-efficient reads

Thumbnail generation will call this for PDFs up to 50 MB — loading the full byte[] via downloadFileBytes would cause real memory pressure on the single-VPS deploy. Stream-based reads let PDFBox parse the first page without holding the whole file in heap.

Refs #307

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -112,6 +112,27 @@ public class FileService {
        }
    }

    /**
     * Opens a streaming download from S3/MinIO. The caller is responsible for
     * closing the returned stream — typically via try-with-resources. Preferred
     * over {@link #downloadFileBytes(String)} for large files (multi-MB PDFs
     * during thumbnail generation) because it avoids loading the entire file
     * into heap memory.
     */
    public InputStream downloadFileStream(String s3Key) throws IOException {
        try {
            GetObjectRequest getObjectRequest = GetObjectRequest.builder()
                    .bucket(bucketName)
                    .key(s3Key)
                    .build();
            return s3Client.getObject(getObjectRequest);
        } catch (NoSuchKeyException e) {
            throw new StorageFileNotFoundException("File not found in storage: " + s3Key);
        } catch (S3Exception e) {
            throw new IOException("Failed to open stream from storage: " + e.getMessage(), e);
        }
    }

    /**
     * Generates a presigned URL for downloading an object from S3/MinIO.
     * Valid for 1 hour — covers multi-page documents on CPU-only OCR hardware
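
The Javadoc on downloadFileStream leaves closing the stream to the caller. A minimal sketch of that caller-side pattern follows, using only the JDK (Java 11+ for readNBytes). StreamSource, CloseTrackingStream, and readPrefix are hypothetical names introduced for illustration; they are not part of this commit, and the in-memory stream merely stands in for the S3/MinIO response. Reading a bounded prefix stands in for whatever the real consumer (PDFBox, per the commit message) would do with the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamUsageSketch {

    /** Hypothetical stand-in for FileService: opens a stream for a storage key. */
    public interface StreamSource {
        InputStream downloadFileStream(String s3Key) throws IOException;
    }

    /** Demo-only wrapper that records whether close() was called. */
    public static final class CloseTrackingStream extends FilterInputStream {
        public boolean closed = false;

        public CloseTrackingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /**
     * Reads at most {@code limit} bytes from the stream for {@code key}.
     * try-with-resources closes the stream on both the normal and the
     * exceptional path, which is the caller's job per the Javadoc.
     */
    public static byte[] readPrefix(StreamSource source, String key, int limit)
            throws IOException {
        try (InputStream in = source.downloadFileStream(key)) {
            return in.readNBytes(limit);
        }
    }

    public static void main(String[] args) throws IOException {
        // 1 KiB of zeros plays the role of a stored object.
        CloseTrackingStream tracked =
                new CloseTrackingStream(new ByteArrayInputStream(new byte[1024]));

        // The key is an illustrative placeholder, not a real object name.
        byte[] prefix = readPrefix(key -> tracked, "example-key", 16);

        System.out.println("read " + prefix.length + " bytes, closed=" + tracked.closed);
    }
}
```

Only the requested prefix is copied into a byte array here; the rest of the object is never materialized by the caller. With a real S3 client the same try-with-resources shape applies, and closing the stream is what hands the underlying connection back.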