security(uploads): integrate ClamAV scan before persisting documents to MinIO #464

New Issue

marcel · 2026-05-07T17:24:57+02:00

marcel commented

2026-05-07 17:24:57 +02:00

Context

backend/src/main/java/org/raddatz/familienarchiv/filestorage/FileService.java:46-66 writes uploaded multipart files straight to MinIO without antimalware scanning. Combined with #84 (no magic-byte MIME validation) and the BODY_SIZE_LIMIT bypass in @sveltejs/adapter-node (covered in the new security(deps): bump @sveltejs/kit + vite issue), the upload path is the largest untrusted-input surface in the system.

The OCR pipeline then loads each PDF into PyTorch / Surya / Kraken / opencv. Several CVE classes target image/PDF parsers specifically; a malicious PDF can ship an exploit payload regardless of MIME validation.

Note: this is not about email-style virus scanning of family content — it's about catching adversarial content before it reaches native image-processing code.

Approach

Add a ClamAV sidecar to the stack and call it from FileService.persistFile() before the MinIO PutObject. Reject infected uploads with a translated 400.

`docker-compose.yml`

clamav:
  image: clamav/clamav:1.4
  container_name: archive-clamav
  restart: unless-stopped
  expose: ["3310"]
  networks: [archive-net]
  healthcheck:
    test: ["CMD", "clamdcheck.sh"]
    interval: 60s
    start_period: 5m  # signature DB takes time on first boot
  volumes:
    - clamav_db:/var/lib/clamav

Backend depends_on: { clamav: { condition: service_healthy } }. Memory: ~2 GB resident (signature DB).

`backend/pom.xml`

Pick a maintained Java client. xenit-eu/spring-boot-starter-clamav or the lightweight de.malkusch.clamav.ClamAvClient. The latter is a single-file dep:

<dependency>
    <groupId>de.malkusch.clamav</groupId>
    <artifactId>clamav-client</artifactId>
    <version>2.x.x</version>
</dependency>

`FileService.persistFile()`

@Value("${app.clamav.enabled:true}")
private boolean clamavEnabled;

public StoredFile persistFile(MultipartFile file) {
    if (clamavEnabled) {
        ScanResult result = clamavClient.scan(file.getInputStream());
        if (result.isInfected()) {
            log.warn("Rejected infected upload: {} — signature: {}", 
                file.getOriginalFilename(), result.getSignature());
            throw DomainException.badRequest(ErrorCode.MALWARE_DETECTED,
                "Datei wurde durch Virusscan abgewiesen");
        }
    }
    // … existing PutObject code …
}

app.clamav.enabled: false for the dev profile (skip scan locally) and true for prod.

Streaming consideration

MultipartFile.getInputStream() is fine for ≤50 MB (the configured max). Don't read bytes into a buffer — pass the InputStream directly to clamavClient.scan(InputStream) so the file is streamed through clamd.

Critical files

docker-compose.yml — add clamav service + volume
backend/pom.xml — clamav-client dep
backend/src/main/java/org/raddatz/familienarchiv/filestorage/FileService.java
backend/src/main/java/org/raddatz/familienarchiv/filestorage/ClamAvConfig.java — new (host/port/timeout config)
backend/src/main/java/org/raddatz/familienarchiv/exception/ErrorCode.java — MALWARE_DETECTED
frontend/src/lib/shared/errors.ts — mirror
frontend/messages/{de,en,es}.json — i18n key
backend/src/main/resources/application.yaml — app.clamav.{host,port,enabled}
backend/src/test/java/.../filestorage/FileServiceClamAvTest.java — new, uses EICAR test string

Verification

EICAR test: upload a file containing the EICAR signature (X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*). Expect 400 + MALWARE_DETECTED.
Clean PDF: upload a real letter PDF. Expect 200 + the document persisted.
ClamAV down: stop the clamav service. Verify upload fails closed (rejected), not fail-open. Decision: fail closed is correct here — upload is a low-frequency operation, and silently disabling AV scan because the sidecar is down is unacceptable.
Signature update: ClamAV's freshclam runs daily by default. Verify in logs that DB version increases over a week of operation.

Acceptance criteria

EICAR test file rejected with 400 + MALWARE_DETECTED ErrorCode + translated message.
Real PDFs accepted unchanged.
When ClamAV is unavailable, uploads fail closed with a 503 (not a silent bypass).
app.clamav.enabled: false works for local dev (no sidecar required).
ClamAV sidecar's signature DB persists across container recreate (named volume clamav_db).
Stream-based scan path: no full file buffered into memory.

Effort

M — 1-2 days. Most of the time is the EICAR test fixture + the fail-closed integration test.

Risk if not addressed

Adversarial PDFs reach native image-processing code (libvips, OpenEXR, PyTorch image decoders) where image-parser CVEs (zero-days, n-days) are a recurring class.

Tracked in audit doc as F-12 (High).

## Context `backend/src/main/java/org/raddatz/familienarchiv/filestorage/FileService.java:46-66` writes uploaded multipart files straight to MinIO without antimalware scanning. Combined with #84 (no magic-byte MIME validation) and the `BODY_SIZE_LIMIT` bypass in `@sveltejs/adapter-node` (covered in the new `security(deps): bump @sveltejs/kit + vite` issue), the upload path is the largest untrusted-input surface in the system. The OCR pipeline then loads each PDF into PyTorch / Surya / Kraken / opencv. Several CVE classes target image/PDF parsers specifically; a malicious PDF can ship an exploit payload regardless of MIME validation. Note: this is **not** about email-style virus scanning of family content — it's about catching adversarial content before it reaches native image-processing code. ## Approach Add a ClamAV sidecar to the stack and call it from `FileService.persistFile()` before the MinIO `PutObject`. Reject infected uploads with a translated 400. ### `docker-compose.yml` ```yaml clamav: image: clamav/clamav:1.4 container_name: archive-clamav restart: unless-stopped expose: ["3310"] networks: [archive-net] healthcheck: test: ["CMD", "clamdcheck.sh"] interval: 60s start_period: 5m # signature DB takes time on first boot volumes: - clamav_db:/var/lib/clamav ``` Backend `depends_on: { clamav: { condition: service_healthy } }`. Memory: ~2 GB resident (signature DB). ### `backend/pom.xml` Pick a maintained Java client. `xenit-eu/spring-boot-starter-clamav` or the lightweight `de.malkusch.clamav.ClamAvClient`. The latter is a single-file dep: ```xml <dependency> <groupId>de.malkusch.clamav</groupId> <artifactId>clamav-client</artifactId> <version>2.x.x</version> </dependency> ``` ### `FileService.persistFile()` ```java @Value("${app.clamav.enabled:true}") private boolean clamavEnabled; public StoredFile persistFile(MultipartFile file) { if (clamavEnabled) { ScanResult result = clamavClient.scan(file.getInputStream()); if (result.isInfected()) { log.warn("Rejected infected upload: {} — signature: {}", file.getOriginalFilename(), result.getSignature()); throw DomainException.badRequest(ErrorCode.MALWARE_DETECTED, "Datei wurde durch Virusscan abgewiesen"); } } // … existing PutObject code … } ``` `app.clamav.enabled: false` for the dev profile (skip scan locally) and `true` for prod. ### Streaming consideration `MultipartFile.getInputStream()` is fine for ≤50 MB (the configured max). Don't read bytes into a buffer — pass the InputStream directly to `clamavClient.scan(InputStream)` so the file is streamed through clamd. ## Critical files - `docker-compose.yml` — add `clamav` service + volume - `backend/pom.xml` — clamav-client dep - `backend/src/main/java/org/raddatz/familienarchiv/filestorage/FileService.java` - `backend/src/main/java/org/raddatz/familienarchiv/filestorage/ClamAvConfig.java` — **new** (host/port/timeout config) - `backend/src/main/java/org/raddatz/familienarchiv/exception/ErrorCode.java` — `MALWARE_DETECTED` - `frontend/src/lib/shared/errors.ts` — mirror - `frontend/messages/{de,en,es}.json` — i18n key - `backend/src/main/resources/application.yaml` — `app.clamav.{host,port,enabled}` - `backend/src/test/java/.../filestorage/FileServiceClamAvTest.java` — **new**, uses EICAR test string ## Verification 1. **EICAR test**: upload a file containing the EICAR signature (`X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*`). Expect 400 + `MALWARE_DETECTED`. 2. **Clean PDF**: upload a real letter PDF. Expect 200 + the document persisted. 3. **ClamAV down**: stop the `clamav` service. Verify upload **fails closed** (rejected), not fail-open. Decision: **fail closed** is correct here — upload is a low-frequency operation, and silently disabling AV scan because the sidecar is down is unacceptable. 4. **Signature update**: ClamAV's `freshclam` runs daily by default. Verify in logs that DB version increases over a week of operation. ## Acceptance criteria - [ ] EICAR test file rejected with 400 + `MALWARE_DETECTED` ErrorCode + translated message. - [ ] Real PDFs accepted unchanged. - [ ] When ClamAV is unavailable, uploads fail closed with a 503 (not a silent bypass). - [ ] `app.clamav.enabled: false` works for local dev (no sidecar required). - [ ] ClamAV sidecar's signature DB persists across container recreate (named volume `clamav_db`). - [ ] Stream-based scan path: no full file buffered into memory. ## Effort M — 1-2 days. Most of the time is the EICAR test fixture + the fail-closed integration test. ## Risk if not addressed Adversarial PDFs reach native image-processing code (libvips, OpenEXR, PyTorch image decoders) where image-parser CVEs (zero-days, n-days) are a recurring class. Tracked in audit doc as **F-12** (High).

marcel added this to the Production v1 milestone 2026-05-07 17:24:57 +02:00

marcel added the P1-high file-upload security labels 2026-05-07 17:26:59 +02:00

marcel commented

2026-05-07 17:37:33 +02:00

🏗️ Markus Keller — Application Architect

Observations

ClamAV is a new external system — it belongs in docs/architecture/c4/l2-containers.puml and docs/DEPLOYMENT.md. The current l2-containers.puml has no antivirus layer; adding it without updating the diagram is a doc omission that I treat as a blocker.
docs/architecture/c4/seq-document-upload.puml must be updated. The current sequence diagram shows the upload flow as: Frontend → Backend → PermissionAspect → DocSvc → FileSvc → MinIO. ClamAV adds a new step between FileSvc and MinIO that the diagram must reflect, along with the rejection path.
An ADR is warranted. This issue introduces a second network dependency in FileService (currently S3 only), a 2 GB memory addition to the stack, a fail-closed availability constraint, and a decision that ClamAV is the right tool over alternatives (magic-byte-only validation, no AV, server-side VirusTotal API). ADRs 001–006 exist; ADR-007 should capture this. The "why ClamAV over X" reasoning in the issue body is exactly the kind of content that needs to be preserved.
ClamAvConfig placement. The issue proposes a new filestorage/ClamAvConfig.java. That's correct — it lives in the filestorage package, not in config/ (which is for infrastructure wiring like MinioConfig). Keep it there.
app.clamav.enabled as a boolean flag. The issue proposes using @Value("${app.clamav.enabled:true}") to skip scanning in dev. This is fine as a tactical measure, but it means the dev profile silently bypasses a security control. A comment on the field explaining why this is acceptable in dev (sidecar not running locally, low-risk dev data) keeps the intent visible for future readers.
No new domain package needed. ClamAV is an infrastructure concern, not a domain concept — filestorage/ is the right home. Do not create a top-level antivirus/ package.
The uploadFile method currently reads the entire file into a byte[] for SHA-256 hashing before the S3 put. The issue's streaming note says "pass InputStream directly to clamavClient.scan(InputStream)" — but the file has already been fully buffered. For this iteration, scanning the same byte[] as a ByteArrayInputStream is fine and avoids re-streaming concerns. The issue's streaming guidance is aspirationally correct but slightly misleading given the current implementation.

Recommendations

Write ADR-007 before the PR opens. Frame: context (upload surface + OCR parser risk), decision (ClamAV sidecar), alternatives (magic-byte-only, no AV, SaaS API), consequences (2 GB memory, fail-closed, freshclam dependency, startup delay).
Update docs/architecture/c4/l2-containers.puml to add the clamav container inside archiv boundary, with a relationship FileSvc → ClamAV: "Scan before PutObject".
Update docs/architecture/c4/seq-document-upload.puml to show the scan step and the 400 rejection path.
Update docs/DEPLOYMENT.md to document the 2 GB ClamAV memory requirement and the start_period: 5m for signature DB initialization.
Add MALWARE_DETECTED to CLAUDE.md's error handling section once the ErrorCode is added.

Open Decisions

Should ClamAV also scan files uploaded via the mass-import path (MassImportService)? The issue scopes to FileService.uploadFile(), which covers all upload paths that go through FileService. Verify that mass import does not write files to MinIO via a different code path — if it does, that path needs the same guard.
ADR-007 must document whether the app.clamav.enabled: false dev shortcut is acceptable long-term or if a Testcontainers-based ClamAV container in CI is planned to test the scan path end-to-end.

## 🏗️ Markus Keller — Application Architect ### Observations - **ClamAV is a new external system** — it belongs in `docs/architecture/c4/l2-containers.puml` and `docs/DEPLOYMENT.md`. The current `l2-containers.puml` has no antivirus layer; adding it without updating the diagram is a doc omission that I treat as a blocker. - **`docs/architecture/c4/seq-document-upload.puml` must be updated.** The current sequence diagram shows the upload flow as: `Frontend → Backend → PermissionAspect → DocSvc → FileSvc → MinIO`. ClamAV adds a new step between `FileSvc` and `MinIO` that the diagram must reflect, along with the rejection path. - **An ADR is warranted.** This issue introduces a second network dependency in `FileService` (currently S3 only), a 2 GB memory addition to the stack, a fail-closed availability constraint, and a decision that ClamAV is the right tool over alternatives (magic-byte-only validation, no AV, server-side VirusTotal API). ADRs 001–006 exist; ADR-007 should capture this. The "why ClamAV over X" reasoning in the issue body is exactly the kind of content that needs to be preserved. - **`ClamAvConfig` placement.** The issue proposes a new `filestorage/ClamAvConfig.java`. That's correct — it lives in the `filestorage` package, not in `config/` (which is for infrastructure wiring like `MinioConfig`). Keep it there. - **`app.clamav.enabled` as a boolean flag.** The issue proposes using `@Value("${app.clamav.enabled:true}")` to skip scanning in dev. This is fine as a tactical measure, but it means the dev profile silently bypasses a security control. A comment on the field explaining *why* this is acceptable in dev (sidecar not running locally, low-risk dev data) keeps the intent visible for future readers. - **No new domain package needed.** ClamAV is an infrastructure concern, not a domain concept — `filestorage/` is the right home. Do not create a top-level `antivirus/` package. - **The `uploadFile` method currently reads the entire file into a `byte[]`** for SHA-256 hashing before the S3 put. The issue's streaming note says "pass `InputStream` directly to `clamavClient.scan(InputStream)`" — but the file has already been fully buffered. For this iteration, scanning the same `byte[]` as a `ByteArrayInputStream` is fine and avoids re-streaming concerns. The issue's streaming guidance is aspirationally correct but slightly misleading given the current implementation. ### Recommendations - Write **ADR-007** before the PR opens. Frame: context (upload surface + OCR parser risk), decision (ClamAV sidecar), alternatives (magic-byte-only, no AV, SaaS API), consequences (2 GB memory, fail-closed, freshclam dependency, startup delay). - Update `docs/architecture/c4/l2-containers.puml` to add the `clamav` container inside `archiv` boundary, with a relationship `FileSvc → ClamAV: "Scan before PutObject"`. - Update `docs/architecture/c4/seq-document-upload.puml` to show the scan step and the 400 rejection path. - Update `docs/DEPLOYMENT.md` to document the 2 GB ClamAV memory requirement and the `start_period: 5m` for signature DB initialization. - Add `MALWARE_DETECTED` to `CLAUDE.md`'s error handling section once the `ErrorCode` is added. ### Open Decisions - **Should ClamAV also scan files uploaded via the mass-import path (`MassImportService`)?** The issue scopes to `FileService.uploadFile()`, which covers all upload paths that go through `FileService`. Verify that mass import does not write files to MinIO via a different code path — if it does, that path needs the same guard. - **ADR-007 must document** whether the `app.clamav.enabled: false` dev shortcut is acceptable long-term or if a Testcontainers-based ClamAV container in CI is planned to test the scan path end-to-end.

marcel commented

2026-05-07 17:37:51 +02:00

👨‍💻 Felix Brandt — Fullstack Developer

Observations

uploadFile currently reads the full file into byte[]. Line 47 of FileService.java: byte[] bytes = file.getBytes(). The SHA-256 hash is computed on that buffer, then the bytes are passed to RequestBody.fromBytes(bytes). The issue says "pass InputStream directly to clamavClient.scan(InputStream)" — but the file is already in memory. In practice, scanning a ByteArrayInputStream(bytes) is the right call here; wrapping with a new ByteArrayInputStream(bytes) before the scan keeps things clean without refactoring SHA-256.
ClamAvConfig should follow project conventions. The issue sketches @Value injection directly into FileService. Better: a @ConfigurationProperties(prefix = "app.clamav") record in a new ClamAvConfig.java — matching the existing MinioConfig pattern — and a @Bean ClamAvClient produced there. This keeps FileService clean and makes the connection pool (host, port, timeout) configurable and testable.
FileService is currently constructed with manual constructor injection (not @RequiredArgsConstructor) because it uses @Value for bucketName. Adding a second @Value for clamavEnabled is fine, but if you add a ClamAvClient bean it should go through the constructor, not field injection. Prefer: FileService(S3Client, S3Presigner, String bucketName, ClamAvClient clamavClient, boolean clamavEnabled).
The MALWARE_DETECTED error code path. The issue uses DomainException.badRequest(ErrorCode.MALWARE_DETECTED, ...). That's correct per project conventions. Make sure the GlobalExceptionHandler maps DomainException to 400 for badRequest — it does, based on the existing pattern.
log.warn on infected upload is correct. Use SLF4J parameterized form: log.warn("Rejected infected upload: {} — signature: {}", file.getOriginalFilename(), result.getSignature()) — already shown in the issue. Ensure the original filename is not logged unsanitized if it could contain log injection characters (strip newlines/CRLFs before logging).
No frontend changes needed for the scan step itself — the frontend already handles the MALWARE_DETECTED error code via getErrorMessage() once it's wired in errors.ts. The i18n keys need to land in messages/{de,en,es}.json.

Recommendations

Implement ClamAvConfig as a @ConfigurationProperties bean (not @Value scattered in FileService). This matches MinioConfig's pattern and makes the config class testable in isolation.
In FileService.uploadFile(), scan before hashing and before the S3 put — reject as early as possible. Order: scan → hash → put.
Sanitize file.getOriginalFilename() before logging: filename.replaceAll("[\r\n]", "_").
Add NOTIFICATION_NOT_FOUND to errors.ts while you're in there — it's in ErrorCode.java but absent from the TypeScript mirror (unrelated debt, but easy to fix in the same PR).
The i18n key pattern should follow existing convention: error_malware_detected in messages/de.json, messages/en.json, messages/es.json.

Open Decisions

None — the implementation path is clear.

## 👨‍💻 Felix Brandt — Fullstack Developer ### Observations - **`uploadFile` currently reads the full file into `byte[]`.** Line 47 of `FileService.java`: `byte[] bytes = file.getBytes()`. The SHA-256 hash is computed on that buffer, then the bytes are passed to `RequestBody.fromBytes(bytes)`. The issue says "pass `InputStream` directly to `clamavClient.scan(InputStream)`" — but the file is already in memory. In practice, scanning a `ByteArrayInputStream(bytes)` is the right call here; wrapping with a `new ByteArrayInputStream(bytes)` before the scan keeps things clean without refactoring SHA-256. - **`ClamAvConfig` should follow project conventions.** The issue sketches `@Value` injection directly into `FileService`. Better: a `@ConfigurationProperties(prefix = "app.clamav")` record in a new `ClamAvConfig.java` — matching the existing `MinioConfig` pattern — and a `@Bean ClamAvClient` produced there. This keeps `FileService` clean and makes the connection pool (host, port, timeout) configurable and testable. - **`FileService` is currently constructed with manual constructor injection** (not `@RequiredArgsConstructor`) because it uses `@Value` for `bucketName`. Adding a second `@Value` for `clamavEnabled` is fine, but if you add a `ClamAvClient` bean it should go through the constructor, not field injection. Prefer: `FileService(S3Client, S3Presigner, String bucketName, ClamAvClient clamavClient, boolean clamavEnabled)`. - **The `MALWARE_DETECTED` error code path.** The issue uses `DomainException.badRequest(ErrorCode.MALWARE_DETECTED, ...)`. That's correct per project conventions. Make sure the `GlobalExceptionHandler` maps `DomainException` to 400 for `badRequest` — it does, based on the existing pattern. - **`log.warn` on infected upload is correct.** Use SLF4J parameterized form: `log.warn("Rejected infected upload: {} — signature: {}", file.getOriginalFilename(), result.getSignature())` — already shown in the issue. Ensure the original filename is not logged unsanitized if it could contain log injection characters (strip newlines/CRLFs before logging). - **No frontend changes needed for the scan step itself** — the frontend already handles the `MALWARE_DETECTED` error code via `getErrorMessage()` once it's wired in `errors.ts`. The i18n keys need to land in `messages/{de,en,es}.json`. ### Recommendations - Implement `ClamAvConfig` as a `@ConfigurationProperties` bean (not `@Value` scattered in `FileService`). This matches `MinioConfig`'s pattern and makes the config class testable in isolation. - In `FileService.uploadFile()`, scan before hashing and before the S3 put — reject as early as possible. Order: `scan → hash → put`. - Sanitize `file.getOriginalFilename()` before logging: `filename.replaceAll("[\r\n]", "_")`. - Add `NOTIFICATION_NOT_FOUND` to `errors.ts` while you're in there — it's in `ErrorCode.java` but absent from the TypeScript mirror (unrelated debt, but easy to fix in the same PR). - The i18n key pattern should follow existing convention: `error_malware_detected` in `messages/de.json`, `messages/en.json`, `messages/es.json`. ### Open Decisions - None — the implementation path is clear.

marcel commented

2026-05-07 17:38:16 +02:00

🛠️ Tobias Wendt — DevOps & Platform Engineer

Observations

minio/minio:latest is already a problem in the current docker-compose.yml — unpinned image, non-reproducible builds. This issue adds another service (clamav/clamav:1.4) that the issue does pin. That's good. But the existing minio:latest and axllent/mailpit:latest should be pinned in the same PR as hygiene, since we're in the Compose file anyway.
clamav/clamav:1.4 is pinned to a minor version, not a patch-pinned digest. 1.4 will pull different images over time as clamav/clamav:1.4 is a mutable tag. Pin to clamav/clamav:1.4.2 (or whatever the current stable patch is) and add to Renovate config. The security team accepting an unpinned AV container is ironic.
start_period: 5m is appropriate — the signature database download on first boot genuinely takes 3–5 minutes. Document this in a comment next to the healthcheck so operators know why deploys appear slow on first run.
Named volume clamav_db is specified in the issue. The current Compose file uses bind mounts for Postgres (./data/postgres) and MinIO (./data/minio) — both of which are flagged in my standard review. The ClamAV volume correctly uses a named volume. Staying consistent: add clamav_db to the volumes: section at the bottom of the Compose file.
backend: depends_on: { clamav: { condition: service_healthy } } is fail-closed at startup — correct. But note that once the backend is running, if ClamAV crashes, the backend will not restart automatically. The issue correctly states uploads should fail closed (503) when ClamAV is down. Verify the Java client throws a catchable exception on connection refused that can be mapped to a 503.
Memory budget: ~2 GB resident. Current stack on CX32 (8 GB RAM): Postgres ~300 MB, MinIO ~200 MB, OCR service ~5 GB, backend ~512 MB, frontend ~256 MB. Adding 2 GB for ClamAV brings estimated total to ~8.3 GB — over the CX32 limit. The OCR service alone is why the VPS is already at the edge. This needs a sizing conversation before deploying to production.
expose: ["3310"] is correct — clamd listens on 3310, no external port needed. Do not add a ports: mapping.
freshclam runs daily inside the ClamAV container by default. No additional configuration needed; verify with docker logs archive-clamav after a week. The issue's verification step #4 is correct.
No CI integration test for the scan path is mentioned. If app.clamav.enabled: false in dev, the CI pipeline also skips the scan. That means the fail-closed behavior (503 when clamd is unreachable) is only manually tested. At minimum, a docker-compose.ci.yml overlay could start a real ClamAV container for the integration test suite.

Recommendations

Pin clamav/clamav:1.4 to a patch-pinned tag (e.g., clamav/clamav:1.4.2) and add it to the Renovate config so it gets automatic version bump PRs.
Add a comment in the Compose file above the healthcheck: # ClamAV downloads the virus signature DB on first boot — allow 5 minutes before marking healthy.
Add clamav_db: to the volumes: section at the bottom of docker-compose.yml.
Before deploying to production, upgrade the VPS from CX32 to CX42 (8 vCPU, 16 GB RAM, ~25 EUR/mo) or reduce OCR service memory to 8g from 12g to make room. At the current estimate, the stack exceeds available RAM with ClamAV added.
Add a CLAMAV_SERVICE_UNAVAILABLE runbook entry: "If uploads return 503, check docker logs archive-clamav. If freshclam failed to download signatures, restart the container."

Open Decisions

VPS sizing: CX32 may be insufficient with ClamAV's 2 GB footprint added to the OCR service's 5 GB. Needs a concrete memory profiling pass or a VPS upgrade decision before Production v1 launch.

## 🛠️ Tobias Wendt — DevOps & Platform Engineer ### Observations - **`minio/minio:latest` is already a problem** in the current `docker-compose.yml` — unpinned image, non-reproducible builds. This issue adds another service (`clamav/clamav:1.4`) that the issue *does* pin. That's good. But the existing `minio:latest` and `axllent/mailpit:latest` should be pinned in the same PR as hygiene, since we're in the Compose file anyway. - **`clamav/clamav:1.4` is pinned to a minor version**, not a patch-pinned digest. `1.4` will pull different images over time as `clamav/clamav:1.4` is a mutable tag. Pin to `clamav/clamav:1.4.2` (or whatever the current stable patch is) and add to Renovate config. The security team accepting an unpinned AV container is ironic. - **`start_period: 5m` is appropriate** — the signature database download on first boot genuinely takes 3–5 minutes. Document this in a comment next to the healthcheck so operators know why deploys appear slow on first run. - **Named volume `clamav_db` is specified in the issue.** The current Compose file uses bind mounts for Postgres (`./data/postgres`) and MinIO (`./data/minio`) — both of which are flagged in my standard review. The ClamAV volume correctly uses a named volume. Staying consistent: add `clamav_db` to the `volumes:` section at the bottom of the Compose file. - **`backend: depends_on: { clamav: { condition: service_healthy } }` is fail-closed at startup** — correct. But note that once the backend is running, if ClamAV crashes, the backend will not restart automatically. The issue correctly states uploads should fail closed (503) when ClamAV is down. Verify the Java client throws a catchable exception on connection refused that can be mapped to a 503. - **Memory budget: `~2 GB` resident.** Current stack on CX32 (8 GB RAM): Postgres ~300 MB, MinIO ~200 MB, OCR service ~5 GB, backend ~512 MB, frontend ~256 MB. Adding 2 GB for ClamAV brings estimated total to ~8.3 GB — **over the CX32 limit**. The OCR service alone is why the VPS is already at the edge. This needs a sizing conversation before deploying to production. - **`expose: ["3310"]` is correct** — clamd listens on 3310, no external port needed. Do not add a `ports:` mapping. - **`freshclam` runs daily inside the ClamAV container by default.** No additional configuration needed; verify with `docker logs archive-clamav` after a week. The issue's verification step #4 is correct. - **No CI integration test for the scan path** is mentioned. If `app.clamav.enabled: false` in dev, the CI pipeline also skips the scan. That means the fail-closed behavior (503 when clamd is unreachable) is only manually tested. At minimum, a `docker-compose.ci.yml` overlay could start a real ClamAV container for the integration test suite. ### Recommendations - Pin `clamav/clamav:1.4` to a patch-pinned tag (e.g., `clamav/clamav:1.4.2`) and add it to the Renovate config so it gets automatic version bump PRs. - Add a comment in the Compose file above the healthcheck: `# ClamAV downloads the virus signature DB on first boot — allow 5 minutes before marking healthy`. - Add `clamav_db:` to the `volumes:` section at the bottom of `docker-compose.yml`. - Before deploying to production, upgrade the VPS from CX32 to CX42 (8 vCPU, 16 GB RAM, ~25 EUR/mo) or reduce OCR service memory to `8g` from `12g` to make room. At the current estimate, the stack exceeds available RAM with ClamAV added. - Add a `CLAMAV_SERVICE_UNAVAILABLE` runbook entry: "If uploads return 503, check `docker logs archive-clamav`. If freshclam failed to download signatures, restart the container." ### Open Decisions - **VPS sizing**: CX32 may be insufficient with ClamAV's 2 GB footprint added to the OCR service's 5 GB. Needs a concrete memory profiling pass or a VPS upgrade decision before Production v1 launch.

marcel commented

2026-05-07 17:38:36 +02:00

📋 Elicit — Requirements Engineer

Observations

The acceptance criteria are well-formed and testable — EICAR rejection, clean PDF acceptance, fail-closed behavior, dev bypass, volume persistence, and stream-based scan are each independently verifiable. This is above average for security issues.
Two use cases share the same code path but have different user-observable outcomes — clean upload (200) and infected upload (400). Both are specified. A third case — ClamAV unavailable (503) — is also specified. All three are in the acceptance criteria, which is correct.
The i18n requirement is implicit (throw DomainException.badRequest(...) with a translated 400) but the user-facing string is not specified. "Datei wurde durch Virusscan abgewiesen" is a developer default. The actual translated message key error_malware_detected needs to be defined in all three message files (de/en/es). The German string in the issue is functional; the English and Spanish equivalents need to be written.
Missing NFR: upload latency impact. ClamAV scan adds round-trip time to every upload. For a 50 MB file on a local Docker network, clamd scan time is typically 0.5–2 seconds. For a 10-file batch upload (max 500 MB configured), that's a potential 5–20 seconds of added latency. No latency budget for the scan is stated. This is a legitimate NFR gap.
Missing NFR: signature freshness. The issue mentions freshclam runs daily. There is no requirement stating how stale a signature database is acceptable before uploads should be rejected. If freshclam fails for 7 days, the database is 7 days stale — is that acceptable? This is a deliberate gap the issue leaves open, but it should be a tracked decision.
"Fail closed" is correctly decided in the issue. Quoting: "upload is a low-frequency operation, and silently disabling AV scan because the sidecar is down is unacceptable." This reasoning is sound for a private family archive where adversarial uploads are the primary concern. No further debate needed.
The error message "Datei wurde durch Virusscan abgewiesen" reveals the control. An attacker learning a file was rejected by AV gets information about the defense layer. For a private family archive with known users, this is an acceptable tradeoff. It's worth noting in the ADR.

Recommendations

Add a latency NFR to the issue: "The ClamAV scan must not add more than 3 seconds to a 50 MB upload on the internal Docker network." This gives the implementer a testable threshold and allows regression detection.
Write the three i18n strings before closing the issue: error_malware_detected in de/en/es. Suggested text: de: "Die Datei wurde vom Virusscan abgewiesen.", en: "The file was rejected by the antivirus scan.", es: "El archivo fue rechazado por el análisis antivirus."
Add a "Definition of Done" check: all three message files contain the key before the PR merges.

Open Decisions

Signature staleness tolerance: how many days of stale signatures trigger an operational alert? The issue specifies daily freshclam but no alert threshold. This is a DevOps/Ops question, but the requirement boundary should be noted.

## 📋 Elicit — Requirements Engineer ### Observations - **The acceptance criteria are well-formed and testable** — EICAR rejection, clean PDF acceptance, fail-closed behavior, dev bypass, volume persistence, and stream-based scan are each independently verifiable. This is above average for security issues. - **Two use cases share the same code path but have different user-observable outcomes** — clean upload (200) and infected upload (400). Both are specified. A third case — ClamAV unavailable (503) — is also specified. All three are in the acceptance criteria, which is correct. - **The i18n requirement is implicit** (`throw DomainException.badRequest(...)` with a translated 400) but the user-facing string is not specified. "Datei wurde durch Virusscan abgewiesen" is a developer default. The actual translated message key `error_malware_detected` needs to be defined in all three message files (de/en/es). The German string in the issue is functional; the English and Spanish equivalents need to be written. - **Missing NFR: upload latency impact.** ClamAV scan adds round-trip time to every upload. For a 50 MB file on a local Docker network, clamd scan time is typically 0.5–2 seconds. For a 10-file batch upload (max 500 MB configured), that's a potential 5–20 seconds of added latency. No latency budget for the scan is stated. This is a legitimate NFR gap. - **Missing NFR: signature freshness.** The issue mentions `freshclam` runs daily. There is no requirement stating how stale a signature database is acceptable before uploads should be rejected. If freshclam fails for 7 days, the database is 7 days stale — is that acceptable? This is a deliberate gap the issue leaves open, but it should be a tracked decision. - **"Fail closed" is correctly decided** in the issue. Quoting: "upload is a low-frequency operation, and silently disabling AV scan because the sidecar is down is unacceptable." This reasoning is sound for a private family archive where adversarial uploads are the primary concern. No further debate needed. - **The error message "Datei wurde durch Virusscan abgewiesen" reveals the control.** An attacker learning a file was rejected by AV gets information about the defense layer. For a private family archive with known users, this is an acceptable tradeoff. It's worth noting in the ADR. ### Recommendations - Add a latency NFR to the issue: "The ClamAV scan must not add more than 3 seconds to a 50 MB upload on the internal Docker network." This gives the implementer a testable threshold and allows regression detection. - Write the three i18n strings before closing the issue: `error_malware_detected` in de/en/es. Suggested text: de: "Die Datei wurde vom Virusscan abgewiesen.", en: "The file was rejected by the antivirus scan.", es: "El archivo fue rechazado por el análisis antivirus." - Add a "Definition of Done" check: all three message files contain the key before the PR merges. ### Open Decisions - **Signature staleness tolerance**: how many days of stale signatures trigger an operational alert? The issue specifies daily `freshclam` but no alert threshold. This is a DevOps/Ops question, but the requirement boundary should be noted.

marcel commented

2026-05-07 17:39:02 +02:00

🔐 Nora "NullX" Steiner — Security Engineer

Observations

The threat model is correctly identified. The risk is not EICAR-test viruses in family photos — it is adversarial PDFs crafted to exploit memory-corruption vulnerabilities in native image-parsing code (libvips, pdfium, PyTorch's image decoders in the OCR pipeline). ClamAV's PDF parser detects known-malicious PDFs and some polyglot payloads before they reach those decoders. The issue's framing in the Context section is accurate.
MIME validation bypass (issue #84) is not resolved by this PR. The current MIME check in DocumentController (lines 179/188–190) checks file.getContentType(), which comes from the Content-Type multipart field — a client-controlled value. An attacker can set Content-Type: application/pdf on an ELF binary. Magic-byte validation (Apache Tika or similar) closes this gap. ClamAV and magic-byte validation are complementary, not substitutes. This PR should note that #84 remains open.
app.clamav.enabled: false in dev is a security control bypass — the code path that rejects malicious files is not exercised in the dev profile. This is architecturally acceptable for local development but means the integration tests (which also run with app.clamav.enabled: false if they inherit the dev profile) never verify the scan path. The EICAR test must run with ClamAV enabled — either via a Testcontainers ClamAV or a dedicated integration test profile.
Fail-closed on ClamAV unavailability must be verified. The issue mandates a 503 when ClamAV is down. The de.malkusch.clamav.ClamAvClient throws ClamAvException (or similar) on connection failure. FileService must catch that and throw a 503, not swallow it and allow the upload. This must be tested explicitly.
The scan → hash → put order matters. Currently uploadFile does: getBytes() → sha256Hex(bytes) → putObject. The scan should happen before SHA-256 — reject before doing any work. Correct order: getBytes() → scan(bytes) → sha256Hex(bytes) → putObject.
Log the rejection event at WARN with the signature name — the issue already shows this. Ensure the signature name (e.g., Eicar-Signature) is logged so it's auditable. Do not log the file content or the full original filename without sanitization (strip newlines).
The EICAR test string itself should be stored as a constant or loaded from a resource file in the test, not inline in test code — EICAR strings in source occasionally trigger overzealous CI scanners.
CWE reference: CWE-434 (Unrestricted Upload of File with Dangerous Type) — the audit entry F-12 is correct. This PR addresses the execution-context risk, not the storage risk (files remain in MinIO regardless of scan result on clean files).

Recommendations

Add a dedicated integration test profile (application-test-clamav.yaml) where app.clamav.enabled: true and a Testcontainers ClamAV container is started. The EICAR test runs only in this profile. @Tag("clamav-integration") to exclude from default test run.
Add a unit test for the fail-closed path: mock ClamAvClient to throw IOException on scan(), assert that FileService.uploadFile() throws a DomainException mapping to 503, not that it succeeds silently.
Add a security comment above the clamavEnabled field explaining the threat model: // When false (dev profile): ClamAV sidecar not required. Never set false in production — adversarial PDFs reach OCR image parsers without this check.
Track #84 (magic-byte MIME validation) as a follow-on — do not let this PR's merge create the impression that the upload surface is now fully hardened.

Open Decisions

Should the upload be rejected even if the scan result is indeterminate (clamd returns an error response that is not "clean" but also not "infected")? Some ClamAV client libraries distinguish three states: CLEAN, INFECTED, ERROR. The fail-closed recommendation means treat ERROR as rejected — confirm this is the intended behavior and test it.

## 🔐 Nora "NullX" Steiner — Security Engineer ### Observations - **The threat model is correctly identified.** The risk is not EICAR-test viruses in family photos — it is adversarial PDFs crafted to exploit memory-corruption vulnerabilities in native image-parsing code (libvips, pdfium, PyTorch's image decoders in the OCR pipeline). ClamAV's PDF parser detects known-malicious PDFs and some polyglot payloads before they reach those decoders. The issue's framing in the Context section is accurate. - **MIME validation bypass (issue #84) is not resolved by this PR.** The current MIME check in `DocumentController` (lines 179/188–190) checks `file.getContentType()`, which comes from the `Content-Type` multipart field — a client-controlled value. An attacker can set `Content-Type: application/pdf` on an ELF binary. Magic-byte validation (Apache Tika or similar) closes this gap. ClamAV and magic-byte validation are complementary, not substitutes. This PR should note that #84 remains open. - **`app.clamav.enabled: false` in dev is a security control bypass** — the code path that rejects malicious files is not exercised in the dev profile. This is architecturally acceptable for local development but means the integration tests (which also run with `app.clamav.enabled: false` if they inherit the dev profile) never verify the scan path. The EICAR test must run with ClamAV enabled — either via a Testcontainers ClamAV or a dedicated integration test profile. - **Fail-closed on ClamAV unavailability must be verified.** The issue mandates a 503 when ClamAV is down. The `de.malkusch.clamav.ClamAvClient` throws `ClamAvException` (or similar) on connection failure. `FileService` must catch that and throw a 503, not swallow it and allow the upload. This must be tested explicitly. - **The `scan → hash → put` order matters.** Currently `uploadFile` does: `getBytes() → sha256Hex(bytes) → putObject`. The scan should happen before SHA-256 — reject before doing any work. Correct order: `getBytes() → scan(bytes) → sha256Hex(bytes) → putObject`. - **Log the rejection event at WARN with the signature name** — the issue already shows this. Ensure the signature name (e.g., `Eicar-Signature`) is logged so it's auditable. Do not log the file content or the full original filename without sanitization (strip newlines). - **The EICAR test string itself should be stored as a constant or loaded from a resource file** in the test, not inline in test code — EICAR strings in source occasionally trigger overzealous CI scanners. - **CWE reference:** CWE-434 (Unrestricted Upload of File with Dangerous Type) — the audit entry F-12 is correct. This PR addresses the execution-context risk, not the storage risk (files remain in MinIO regardless of scan result on clean files). ### Recommendations - Add a dedicated integration test profile (`application-test-clamav.yaml`) where `app.clamav.enabled: true` and a Testcontainers ClamAV container is started. The EICAR test runs only in this profile. `@Tag("clamav-integration")` to exclude from default test run. - Add a unit test for the fail-closed path: mock `ClamAvClient` to throw `IOException` on `scan()`, assert that `FileService.uploadFile()` throws a `DomainException` mapping to 503, not that it succeeds silently. - Add a security comment above the `clamavEnabled` field explaining the threat model: `// When false (dev profile): ClamAV sidecar not required. Never set false in production — adversarial PDFs reach OCR image parsers without this check.` - Track #84 (magic-byte MIME validation) as a follow-on — do not let this PR's merge create the impression that the upload surface is now fully hardened. ### Open Decisions - **Should the upload be rejected even if the scan result is indeterminate** (clamd returns an error response that is not "clean" but also not "infected")? Some ClamAV client libraries distinguish three states: `CLEAN`, `INFECTED`, `ERROR`. The fail-closed recommendation means treat `ERROR` as rejected — confirm this is the intended behavior and test it.

marcel commented

2026-05-07 17:39:23 +02:00

🧪 Sara Holt — QA Engineer

Observations

The issue specifies FileServiceClamAvTest.java — a new test class for the EICAR fixture. This is the right layer: unit test with a mocked or real ClamAV client testing the FileService scan integration. The test class name is well-chosen.
Four test scenarios are implied by the acceptance criteria — and the existing FileServiceTest.java gives us the test structure to follow:
1. EICAR → 400 MALWARE_DETECTED
2. Clean PDF → upload proceeds normally
3. ClamAV unavailable → 503 (fail-closed)
4. clamavEnabled: false → scan skipped, upload proceeds
The current FileServiceTest.java uses manual mocking (mock(S3Client.class)) not @ExtendWith(MockitoExtension.class). The new ClamAV test should use @ExtendWith(MockitoExtension.class) for consistency with the project's test style guide.
The EICAR test string inline in source is a risk — some CI pipelines and Git hosting implementations flag repositories containing EICAR in source files. Store the test string in a src/test/resources/eicar.txt file and read it in the test: Files.readAllBytes(Path.of(getClass().getResource("/eicar.txt").toURI())).
The "ClamAV down" test needs a concrete mock strategy. Options: (a) mock ClamAvClient.scan() to throw IOException directly in the unit test; (b) start a real ClamAV container via Testcontainers and stop it mid-test (fragile). Option (a) is the correct unit-test approach; the Testcontainers approach belongs in an @Tag("clamav-integration") test.
No frontend test changes are needed — the MALWARE_DETECTED error code flows through the existing error-handling pipeline. Existing frontend error handling tests cover the getErrorMessage() dispatch table; just add the new case.
The test for clamavEnabled: false should use @TestPropertySource(properties = "app.clamav.enabled=false") or construct FileService with the flag set to false directly in the unit test constructor call. The latter is simpler and avoids Spring context loading.
Coverage impact: FileService.uploadFile() currently has test coverage via FileServiceTest. Adding the scan path increases branch count — the new branches (infected, error, disabled) all need coverage to maintain the 88% JaCoCo gate.

Recommendations

Store the EICAR string in src/test/resources/eicar.txt and load it via getClass().getResource(). Add eicar.txt to .gitignore if the Gitea instance flags it (unlikely for self-hosted, but worth checking).
Write FileServiceClamAvTest.java as a pure unit test (@ExtendWith(MockitoExtension.class), @Mock ClamAvClient clamavClient, @InjectMocks FileService fileService). Four tests: EICAR rejected, clean accepted, client throws IOException → 503 mapped, enabled=false skips scan.
Add a @Tag("clamav-integration") integration test class that starts a real ClamAV container via Testcontainers and runs the EICAR test end-to-end. This gives confidence that the Java client actually talks to clamd correctly — the unit test mocks the client, so real protocol behavior is only verified here.
Verify the JaCoCo coverage gate (currently 88% branch) does not regress after adding the new branches — new branches without test coverage drop the percentage.

Open Decisions

None — the test strategy is clear: unit test for all four branches, integration test with Testcontainers ClamAV for the real-protocol smoke.

## 🧪 Sara Holt — QA Engineer ### Observations - **The issue specifies `FileServiceClamAvTest.java` — a new test class** for the EICAR fixture. This is the right layer: unit test with a mocked or real ClamAV client testing the `FileService` scan integration. The test class name is well-chosen. - **Four test scenarios are implied by the acceptance criteria** — and the existing `FileServiceTest.java` gives us the test structure to follow: 1. EICAR → 400 `MALWARE_DETECTED` 2. Clean PDF → upload proceeds normally 3. ClamAV unavailable → 503 (fail-closed) 4. `clamavEnabled: false` → scan skipped, upload proceeds - **The current `FileServiceTest.java` uses manual mocking** (`mock(S3Client.class)`) not `@ExtendWith(MockitoExtension.class)`. The new ClamAV test should use `@ExtendWith(MockitoExtension.class)` for consistency with the project's test style guide. - **The EICAR test string inline in source is a risk** — some CI pipelines and Git hosting implementations flag repositories containing EICAR in source files. Store the test string in a `src/test/resources/eicar.txt` file and read it in the test: `Files.readAllBytes(Path.of(getClass().getResource("/eicar.txt").toURI()))`. - **The "ClamAV down" test needs a concrete mock strategy.** Options: (a) mock `ClamAvClient.scan()` to throw `IOException` directly in the unit test; (b) start a real ClamAV container via Testcontainers and stop it mid-test (fragile). Option (a) is the correct unit-test approach; the Testcontainers approach belongs in an `@Tag("clamav-integration")` test. - **No frontend test changes are needed** — the `MALWARE_DETECTED` error code flows through the existing error-handling pipeline. Existing frontend error handling tests cover the `getErrorMessage()` dispatch table; just add the new case. - **The test for `clamavEnabled: false`** should use `@TestPropertySource(properties = "app.clamav.enabled=false")` or construct `FileService` with the flag set to `false` directly in the unit test constructor call. The latter is simpler and avoids Spring context loading. - **Coverage impact**: `FileService.uploadFile()` currently has test coverage via `FileServiceTest`. Adding the scan path increases branch count — the new branches (infected, error, disabled) all need coverage to maintain the 88% JaCoCo gate. ### Recommendations - Store the EICAR string in `src/test/resources/eicar.txt` and load it via `getClass().getResource()`. Add `eicar.txt` to `.gitignore` if the Gitea instance flags it (unlikely for self-hosted, but worth checking). - Write `FileServiceClamAvTest.java` as a pure unit test (`@ExtendWith(MockitoExtension.class)`, `@Mock ClamAvClient clamavClient`, `@InjectMocks FileService fileService`). Four tests: EICAR rejected, clean accepted, client throws IOException → 503 mapped, enabled=false skips scan. - Add a `@Tag("clamav-integration")` integration test class that starts a real ClamAV container via Testcontainers and runs the EICAR test end-to-end. This gives confidence that the Java client actually talks to clamd correctly — the unit test mocks the client, so real protocol behavior is only verified here. - Verify the JaCoCo coverage gate (currently 88% branch) does not regress after adding the new branches — new branches without test coverage drop the percentage. ### Open Decisions - None — the test strategy is clear: unit test for all four branches, integration test with Testcontainers ClamAV for the real-protocol smoke.

marcel commented

2026-05-07 17:39:34 +02:00

🎨 Leonie Voss — UI/UX Design Lead

Observations

This is a backend-only change — no new UI surfaces, no new routes, no visual components. My review scope is narrow: the user-facing error message and any upload feedback changes.
The error message that reaches users matters. When the scan rejects a file, the user sees a 400 error. The current error display pattern maps backend error codes to i18n strings via getErrorMessage(). The MALWARE_DETECTED case must produce a message that:
- Explains what happened in plain language ("The file was rejected because it may be harmful.")
- Does not alarm unnecessarily — most users are family members uploading legitimate scans.
- Suggests a next action ("If you believe this is a mistake, please contact the administrator.")
The audience for this error is primarily transcribers (60+) — laptop/tablet users uploading physical letter scans. A clinical "Malware detected" message is jarring and confusing for this audience. Frame it as a safety check, not an accusation.
The upload flow currently shows no progress state for the file upload. Adding a ClamAV scan step (0.5–2 seconds additional latency) makes the existing absence of a progress indicator more noticeable. Not a blocker for this PR, but the upload UX issue becomes more acute.
No new UI components, no accessibility concerns specific to this change. The existing error display component handles the error code correctly once the i18n key is wired.

Recommendations

Use this i18n copy (all three languages must be present before the PR merges):
- de: "Die Datei konnte nicht hochgeladen werden, da sie möglicherweise schädliche Inhalte enthält. Bitte wende dich an den Administrator, falls du das für einen Fehler hältst."
- en: "The file could not be uploaded because it may contain harmful content. Please contact the administrator if you believe this is a mistake."
- es: "El archivo no pudo cargarse porque puede contener contenido dañino. Ponte en contacto con el administrador si crees que esto es un error."
File an issue (or add a note to the upload UX backlog) for a visible progress indicator during upload — the added scan latency will make the "stuck" perception worse.

Open Decisions

None — the message copy is a recommendation, not a design decision requiring sign-off.

## 🎨 Leonie Voss — UI/UX Design Lead ### Observations - **This is a backend-only change** — no new UI surfaces, no new routes, no visual components. My review scope is narrow: the user-facing error message and any upload feedback changes. - **The error message that reaches users matters.** When the scan rejects a file, the user sees a 400 error. The current error display pattern maps backend error codes to i18n strings via `getErrorMessage()`. The `MALWARE_DETECTED` case must produce a message that: - Explains what happened in plain language ("The file was rejected because it may be harmful.") - Does not alarm unnecessarily — most users are family members uploading legitimate scans. - Suggests a next action ("If you believe this is a mistake, please contact the administrator.") - **The audience for this error is primarily transcribers (60+)** — laptop/tablet users uploading physical letter scans. A clinical "Malware detected" message is jarring and confusing for this audience. Frame it as a safety check, not an accusation. - **The upload flow currently shows no progress state** for the file upload. Adding a ClamAV scan step (0.5–2 seconds additional latency) makes the existing absence of a progress indicator more noticeable. Not a blocker for this PR, but the upload UX issue becomes more acute. - **No new UI components, no accessibility concerns** specific to this change. The existing error display component handles the error code correctly once the i18n key is wired. ### Recommendations - Use this i18n copy (all three languages must be present before the PR merges): - **de**: `"Die Datei konnte nicht hochgeladen werden, da sie möglicherweise schädliche Inhalte enthält. Bitte wende dich an den Administrator, falls du das für einen Fehler hältst."` - **en**: `"The file could not be uploaded because it may contain harmful content. Please contact the administrator if you believe this is a mistake."` - **es**: `"El archivo no pudo cargarse porque puede contener contenido dañino. Ponte en contacto con el administrador si crees que esto es un error."` - File an issue (or add a note to the upload UX backlog) for a visible progress indicator during upload — the added scan latency will make the "stuck" perception worse. ### Open Decisions - None — the message copy is a recommendation, not a design decision requiring sign-off.

marcel commented

2026-05-07 17:40:08 +02:00

🗳️ Decision Queue — Open Items Requiring Human Judgment

Grouped by theme. All items below are genuine tradeoffs where a clear winner does not exist without product/ops input.

Infrastructure / VPS Sizing

Raised by: Tobias

The CX32 VPS (8 GB RAM) is already near capacity: OCR service alone is ~5 GB, backend ~512 MB, Postgres ~300 MB, MinIO ~200 MB, frontend ~256 MB. ClamAV adds ~2 GB. Estimated total: ~8.3 GB — over the hardware limit.

Decision needed: Before Production v1, choose one:

(A) Upgrade to CX42 (16 GB RAM, ~25 EUR/mo, +8 EUR/mo).
(B) Reduce OCR service mem_limit from 12g to 8g (acceptable if no concurrent large OCR jobs), freeing ~4 GB.
(C) Run ClamAV only on a separate network call to a self-hosted scanning service, outside the VPS stack (more complex, not recommended for this team size).

Testing Strategy: EICAR in CI vs. Dev-Only Bypass

Raised by: Nora, Sara, Markus

app.clamav.enabled: false in the dev profile means the scan path is never exercised by CI unless a dedicated profile is set up.

Decision needed: Choose one:

(A) Add a Testcontainers ClamAV container for a @Tag("clamav-integration") test suite that CI runs on PR. This verifies real protocol behavior but adds ~2–3 minutes to CI and requires Docker-in-Docker or equivalent.
(B) Unit-test only (mock ClamAvClient), accept that real clamd protocol behavior is validated only by manual EICAR verification on first deploy. Simpler, faster, but leaves a gap.

ClamAV Indeterminate Scan State

Raised by: Nora

The de.malkusch.clamav.ClamAvClient library (and most clamd clients) distinguishes three result states: CLEAN, INFECTED, ERROR. The issue only explicitly handles INFECTED (reject) and connection failure (fail-closed 503). An ERROR state (clamd responded but couldn't complete the scan — e.g., corrupted stream) is not specified.

Decision needed: For the ERROR state (clamd reachable but scan returned error):

(A) Treat as INFECTED — reject the upload (fully fail-closed). Safest.
(B) Treat as a 503 (service-level error) — same as "ClamAV unavailable". Honest.
(C) Treat as CLEAN — allow upload (fail-open). Not recommended; contradicts the fail-closed decision.

Recommendation: Option (A) is consistent with the issue's stated fail-closed intent.

Mass Import Upload Path Coverage

Raised by: Markus

The issue scopes the scan to FileService.uploadFile(). MassImportService also stores files, but does it route through FileService.uploadFile() or write directly to S3? The codebase shows MassImportService calls fileService.uploadFile() for file persistence — so the scan will cover that path automatically. Verify during implementation: if any code path calls s3Client.putObject() directly (bypassing FileService), it needs its own scan call.

Decision needed: Confirm during implementation that no direct s3Client calls exist outside FileService. If found, decide whether to add scan there or refactor to route through FileService.

Signature Staleness Alert Threshold

Raised by: Elicit

freshclam runs daily by default. No requirement specifies what happens if freshclam fails for an extended period (signatures become stale).

Decision needed: Define an acceptable staleness window:

(A) No operational alert — accept up to 7 days stale before manual intervention (low operational overhead, low risk for a private archive).
(B) Add a Prometheus/Grafana alert if the ClamAV container hasn't successfully updated signatures in >48 hours (requires parsing clamd logs or a freshclam webhook, non-trivial).

For a family archive, Option (A) is likely sufficient.

## 🗳️ Decision Queue — Open Items Requiring Human Judgment Grouped by theme. All items below are genuine tradeoffs where a clear winner does not exist without product/ops input. --- ### Infrastructure / VPS Sizing **Raised by: Tobias** The CX32 VPS (8 GB RAM) is already near capacity: OCR service alone is ~5 GB, backend ~512 MB, Postgres ~300 MB, MinIO ~200 MB, frontend ~256 MB. ClamAV adds ~2 GB. Estimated total: ~8.3 GB — over the hardware limit. **Decision needed:** Before Production v1, choose one: - (A) Upgrade to CX42 (16 GB RAM, ~25 EUR/mo, +8 EUR/mo). - (B) Reduce OCR service `mem_limit` from `12g` to `8g` (acceptable if no concurrent large OCR jobs), freeing ~4 GB. - (C) Run ClamAV only on a separate network call to a self-hosted scanning service, outside the VPS stack (more complex, not recommended for this team size). --- ### Testing Strategy: EICAR in CI vs. Dev-Only Bypass **Raised by: Nora, Sara, Markus** `app.clamav.enabled: false` in the dev profile means the scan path is never exercised by CI unless a dedicated profile is set up. **Decision needed:** Choose one: - (A) Add a Testcontainers ClamAV container for a `@Tag("clamav-integration")` test suite that CI runs on PR. This verifies real protocol behavior but adds ~2–3 minutes to CI and requires Docker-in-Docker or equivalent. - (B) Unit-test only (mock `ClamAvClient`), accept that real clamd protocol behavior is validated only by manual EICAR verification on first deploy. Simpler, faster, but leaves a gap. --- ### ClamAV Indeterminate Scan State **Raised by: Nora** The `de.malkusch.clamav.ClamAvClient` library (and most clamd clients) distinguishes three result states: `CLEAN`, `INFECTED`, `ERROR`. The issue only explicitly handles `INFECTED` (reject) and connection failure (fail-closed 503). An `ERROR` state (clamd responded but couldn't complete the scan — e.g., corrupted stream) is not specified. **Decision needed:** For the `ERROR` state (clamd reachable but scan returned error): - (A) Treat as `INFECTED` — reject the upload (fully fail-closed). Safest. - (B) Treat as a 503 (service-level error) — same as "ClamAV unavailable". Honest. - (C) Treat as `CLEAN` — allow upload (fail-open). Not recommended; contradicts the fail-closed decision. Recommendation: Option (A) is consistent with the issue's stated fail-closed intent. --- ### Mass Import Upload Path Coverage **Raised by: Markus** The issue scopes the scan to `FileService.uploadFile()`. `MassImportService` also stores files, but does it route through `FileService.uploadFile()` or write directly to S3? The codebase shows `MassImportService` calls `fileService.uploadFile()` for file persistence — so the scan will cover that path automatically. **Verify during implementation**: if any code path calls `s3Client.putObject()` directly (bypassing `FileService`), it needs its own scan call. **Decision needed:** Confirm during implementation that no direct `s3Client` calls exist outside `FileService`. If found, decide whether to add scan there or refactor to route through `FileService`. --- ### Signature Staleness Alert Threshold **Raised by: Elicit** `freshclam` runs daily by default. No requirement specifies what happens if freshclam fails for an extended period (signatures become stale). **Decision needed:** Define an acceptable staleness window: - (A) No operational alert — accept up to 7 days stale before manual intervention (low operational overhead, low risk for a private archive). - (B) Add a Prometheus/Grafana alert if the ClamAV container hasn't successfully updated signatures in >48 hours (requires parsing clamd logs or a freshclam webhook, non-trivial). For a family archive, Option (A) is likely sufficient.

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: marcel/familienarchiv#464