security(import): validate PDF magic bytes before S3 upload #618

marcel · 2026-05-18T13:15:32+02:00

marcel commented

2026-05-18 13:15:32 +02:00

Closes #529

Summary

Backend: MassImportService now reads the first 4 bytes of every candidate file before calling s3Client.putObject. Files whose header does not match %PDF (0x25 0x50 0x44 0x46) are rejected, logged via SLF4J {} placeholders, and collected in a new skippedFiles list. Files shorter than 4 bytes are also rejected.
API: ImportStatus gains two new fields — skipped: int and skippedFiles: List<SkippedFile> (where SkippedFile carries filename + reason) — per Marcel's API shape decision.
Frontend: ImportStatusCard surfaces the skipped count in an amber warning section with a native <details>/<summary> collapsible filename list (DONE state only, hidden when skipped === 0). i18n keys added for de/en/es.
Tests: Four new regression tests cover the one-upload-one-skip scenario, skipped count, rejected filename in skippedFiles, and the short-file edge case.

Test plan

CI: MassImportServiceTest (44 tests) — all regression tests pass
CI: AdminControllerTest — unchanged behaviour confirmed
CI: ImportStatusCard.svelte.test.ts — 3 new skipped-display tests pass alongside 9 existing tests
Manual: run import with a mix of valid PDFs and renamed non-PDFs; confirm skipped count appears in admin card with amber styling and collapsible list

🤖 Generated with Claude Code

Closes #529 ## Summary - **Backend:** `MassImportService` now reads the first 4 bytes of every candidate file before calling `s3Client.putObject`. Files whose header does not match `%PDF` (0x25 0x50 0x44 0x46) are rejected, logged via SLF4J `{}` placeholders, and collected in a new `skippedFiles` list. Files shorter than 4 bytes are also rejected. - **API:** `ImportStatus` gains two new fields — `skipped: int` and `skippedFiles: List<SkippedFile>` (where `SkippedFile` carries `filename` + `reason`) — per Marcel's API shape decision. - **Frontend:** `ImportStatusCard` surfaces the skipped count in an amber warning section with a native `<details>/<summary>` collapsible filename list (DONE state only, hidden when `skipped === 0`). i18n keys added for de/en/es. - **Tests:** Four new regression tests cover the one-upload-one-skip scenario, skipped count, rejected filename in `skippedFiles`, and the short-file edge case. ## Test plan - [ ] CI: `MassImportServiceTest` (44 tests) — all regression tests pass - [ ] CI: `AdminControllerTest` — unchanged behaviour confirmed - [ ] CI: `ImportStatusCard.svelte.test.ts` — 3 new skipped-display tests pass alongside 9 existing tests - [ ] Manual: run import with a mix of valid PDFs and renamed non-PDFs; confirm skipped count appears in admin card with amber styling and collapsible list 🤖 Generated with [Claude Code](https://claude.com/claude-code)

marcel added 2 commits 2026-05-18 13:15:32 +02:00

feat(import): validate PDF magic bytes before S3 upload e124c68cf4

Reads first 4 bytes of each candidate file before upload; rejects any
file whose header does not match %PDF (0x25 0x50 0x44 0x46). Skipped
files are counted and collected in ImportStatus.skippedFiles so
operators can see what was rejected without querying Loki.

Breaking: ImportStatus record gains skipped + skippedFiles fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(admin): surface skipped file count in ImportStatusCard

CI / Unit & Component Tests (pull_request) Successful in 3m4s

Details

CI / OCR Service Tests (pull_request) Successful in 20s

Details

CI / Backend Unit Tests (pull_request) Successful in 3m2s

Details

CI / fail2ban Regex (pull_request) Successful in 39s

Details

CI / Semgrep Security Scan (pull_request) Successful in 18s

Details

CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s

Details

22589e4729

Adds SkippedFile to the local ImportStatus type and updates
ImportStatusCard to show an amber skipped-count section with a
collapsible filename list in the DONE state. Only rendered when
skipped > 0. i18n keys added for de/en/es.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel referenced this pull request

2026-05-18 13:15:52 +02:00

security(import): validate PDF magic bytes in MassImportService before S3 upload #529

marcel commented

2026-05-18 13:19:18 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ⚠️ Approved with concerns

Blockers

AdminControllerTest.java — duplicate import

import java.util.List;
import java.util.List;  // ← duplicate, added by this PR

Won't break compilation but signals a messy merge. Remove one.

Concerns

skipped is always skippedFiles.size() — redundant field

Both ProcessResult and ImportStatus carry an explicit skipped: int that is always equal to skippedFiles.size(). This violates DRY and introduces a potential consistency bug (what if they diverge?).

Option A — derive it in ImportStatus (preferred if the field is API-facing for client convenience):

public record ImportStatus(..., List<SkippedFile> skippedFiles, ...) {
    public int skipped() { return skippedFiles.size(); }
}

Option B — drop the field and let clients call .skippedFiles().size().

Either way, ProcessResult should drop its skipped field since it's a private implementation type — skippedFiles.size() is available directly.

reason strings are hardcoded German — will appear as German in every locale

SkippedFile.reason is set to "Keine gültige PDF-Signatur" and "Fehler beim Lesen der Datei" in Java, then rendered verbatim in the UI ({filename} — {reason}). If an English or Spanish admin runs this import, they'll see a German reason string. Since this project has de/en/es i18n, the reason should be a code (e.g. "INVALID_PDF_SIGNATURE") resolved to an i18n string on the frontend — or at minimum the frontend should translate it via the message keys that were added.

importSingleDocument boolean return — minor smell

Returning boolean from a method named importSingle… is a subtle violation of the single-responsibility principle (command vs query). It works fine here, but naming it tryImportSingleDocument or splitting into check + execute would make the intent cleaner.

Positives

isPdfMagicBytes is clean, minimal, and correctly uses try-with-resources. The inline hex→ASCII comments explain why each byte value was chosen — exactly the right use of comments.
{#each importStatus.skippedFiles as { filename, reason } (filename)} — correctly keyed. Good.
The <details>/<summary> native HTML approach is the right call — no JavaScript, no state management needed.
Test structure with @TempDir and real file I/O is the correct approach for validating magic byte behavior.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ⚠️ Approved with concerns** ### Blockers **`AdminControllerTest.java` — duplicate import** ```java import java.util.List; import java.util.List; // ← duplicate, added by this PR ``` Won't break compilation but signals a messy merge. Remove one. ### Concerns **`skipped` is always `skippedFiles.size()` — redundant field** Both `ProcessResult` and `ImportStatus` carry an explicit `skipped: int` that is always equal to `skippedFiles.size()`. This violates DRY and introduces a potential consistency bug (what if they diverge?). Option A — derive it in `ImportStatus` (preferred if the field is API-facing for client convenience): ```java public record ImportStatus(..., List<SkippedFile> skippedFiles, ...) { public int skipped() { return skippedFiles.size(); } } ``` Option B — drop the field and let clients call `.skippedFiles().size()`. Either way, `ProcessResult` should drop its `skipped` field since it's a private implementation type — `skippedFiles.size()` is available directly. **`reason` strings are hardcoded German — will appear as German in every locale** `SkippedFile.reason` is set to `"Keine gültige PDF-Signatur"` and `"Fehler beim Lesen der Datei"` in Java, then rendered verbatim in the UI (`{filename} — {reason}`). If an English or Spanish admin runs this import, they'll see a German reason string. Since this project has de/en/es i18n, the reason should be a code (e.g. `"INVALID_PDF_SIGNATURE"`) resolved to an i18n string on the frontend — or at minimum the frontend should translate it via the message keys that were added. **`importSingleDocument` boolean return — minor smell** Returning `boolean` from a method named `importSingle…` is a subtle violation of the single-responsibility principle (command vs query). It works fine here, but naming it `tryImportSingleDocument` or splitting into check + execute would make the intent cleaner. ### Positives - `isPdfMagicBytes` is clean, minimal, and correctly uses try-with-resources. The inline hex→ASCII comments explain *why* each byte value was chosen — exactly the right use of comments. - `{#each importStatus.skippedFiles as { filename, reason } (filename)}` — correctly keyed. Good. - The `<details>/<summary>` native HTML approach is the right call — no JavaScript, no state management needed. - Test structure with `@TempDir` and real file I/O is the correct approach for validating magic byte behavior.

marcel commented

2026-05-18 13:19:31 +02:00

🏛️ Markus Keller — Application Architect

Verdict: ✅ Approved

What I Checked

Module boundaries — ✅ clean
All changes stay inside the importing package. SkippedFile, ProcessResult, and the updated ImportStatus are all declared within MassImportService. ProcessResult is package-private (no public), correctly scoped as a private implementation detail. No other domain's repository or service is touched.

No Flyway migration needed — ✅ correct
The skipped-file data is volatile in-memory state (import run results), not persisted to the database. Storing it in the JVM is the right call. No schema changes = no migration = no diagram update required.

Documentation table check

PR trigger	Required update	Status
New Flyway migration	`db-orm.puml`, `db-relationships.puml`	Not applicable
New backend package	`CLAUDE.md` + C4 diagram	Not applicable
New controller/service	C4 diagram	Not applicable
New SvelteKit route	`CLAUDE.md` + C4 diagram	Not applicable
New ErrorCode	`CLAUDE.md` + `ARCHITECTURE.md`	Not applicable
New domain concept	`GLOSSARY.md`	Not applicable

Nothing to update. The PR is correctly scoped.

Minor Concern (not blocking)

ImportStatus.skipped duplicates information already present in skippedFiles.size(). If the intent is to spare API clients a .length call, the field is justified — but it then needs to be a derived accessor, not a record component, to eliminate the consistency risk. I'd make it a compact record method rather than a positional field, which also keeps the record constructor from growing further.

public record ImportStatus(State state, String statusCode, @JsonIgnore String message,
                            int processed, List<SkippedFile> skippedFiles, LocalDateTime startedAt) {
    public int skipped() { return skippedFiles.size(); }
}

This keeps the API surface identical but removes the stale-state risk.

Overall

Clean, bounded, no architectural concerns. The feature fits naturally into the existing MassImportService lifecycle without adding new abstractions or crossing domain boundaries.

## 🏛️ Markus Keller — Application Architect **Verdict: ✅ Approved** ### What I Checked **Module boundaries — ✅ clean** All changes stay inside the `importing` package. `SkippedFile`, `ProcessResult`, and the updated `ImportStatus` are all declared within `MassImportService`. `ProcessResult` is package-private (no `public`), correctly scoped as a private implementation detail. No other domain's repository or service is touched. **No Flyway migration needed — ✅ correct** The skipped-file data is volatile in-memory state (import run results), not persisted to the database. Storing it in the JVM is the right call. No schema changes = no migration = no diagram update required. **Documentation table check** | PR trigger | Required update | Status | |---|---|---| | New Flyway migration | `db-orm.puml`, `db-relationships.puml` | Not applicable | | New backend package | `CLAUDE.md` + C4 diagram | Not applicable | | New controller/service | C4 diagram | Not applicable | | New SvelteKit route | `CLAUDE.md` + C4 diagram | Not applicable | | New ErrorCode | `CLAUDE.md` + `ARCHITECTURE.md` | Not applicable | | New domain concept | `GLOSSARY.md` | Not applicable | Nothing to update. The PR is correctly scoped. ### Minor Concern (not blocking) `ImportStatus.skipped` duplicates information already present in `skippedFiles.size()`. If the intent is to spare API clients a `.length` call, the field is justified — but it then needs to be a derived accessor, not a record component, to eliminate the consistency risk. I'd make it a compact record method rather than a positional field, which also keeps the record constructor from growing further. ```java public record ImportStatus(State state, String statusCode, @JsonIgnore String message, int processed, List<SkippedFile> skippedFiles, LocalDateTime startedAt) { public int skipped() { return skippedFiles.size(); } } ``` This keeps the API surface identical but removes the stale-state risk. ### Overall Clean, bounded, no architectural concerns. The feature fits naturally into the existing `MassImportService` lifecycle without adding new abstractions or crossing domain boundaries.

marcel commented

2026-05-18 13:19:40 +02:00

🔧 Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

What I Checked

No infrastructure changes — correct
This PR touches zero Docker Compose configuration, CI workflow files, or environment variables. The magic byte validation is a pure application-level addition using Java's standard FileInputStream on the already-mounted import volume. No new dependencies, no new services, no operational surface.

File I/O pattern is safe for the deployment model
The validation reads exactly 4 bytes from a file in importDir (a bind-mounted host path in the Docker Compose setup). This is identical to how the existing file reading in importSingleDocument works. No additional disk access pattern introduced.

Memory footprint — ✅ fine
readNBytes(4) allocates a 4-byte array per file. This is negligible. No concern for the CX32 VPS (8 GB RAM).

No new secrets or env vars required
Nothing to add to .env.example or Gitea secrets.

One Note (not blocking)

The skippedFiles list is held in the JVM's volatile currentStatus field with no cap. If someone were to place thousands of non-PDF files in the import directory and trigger a run, the API response for /api/admin/import-status could grow large. This is an admin-only endpoint accessed only on the VPS, so the practical risk is negligible — but worth noting as a future cleanup item.

Overall

LGTM from infrastructure. No Compose changes, no CI changes, no operational concerns introduced.

## 🔧 Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** ### What I Checked **No infrastructure changes — correct** This PR touches zero Docker Compose configuration, CI workflow files, or environment variables. The magic byte validation is a pure application-level addition using Java's standard `FileInputStream` on the already-mounted import volume. No new dependencies, no new services, no operational surface. **File I/O pattern is safe for the deployment model** The validation reads exactly 4 bytes from a file in `importDir` (a bind-mounted host path in the Docker Compose setup). This is identical to how the existing file reading in `importSingleDocument` works. No additional disk access pattern introduced. **Memory footprint — ✅ fine** `readNBytes(4)` allocates a 4-byte array per file. This is negligible. No concern for the CX32 VPS (8 GB RAM). **No new secrets or env vars required** Nothing to add to `.env.example` or Gitea secrets. ### One Note (not blocking) The `skippedFiles` list is held in the JVM's volatile `currentStatus` field with no cap. If someone were to place thousands of non-PDF files in the import directory and trigger a run, the API response for `/api/admin/import-status` could grow large. This is an admin-only endpoint accessed only on the VPS, so the practical risk is negligible — but worth noting as a future cleanup item. ### Overall LGTM from infrastructure. No Compose changes, no CI changes, no operational concerns introduced.

marcel commented

2026-05-18 13:19:55 +02:00

📋 Elicit — Requirements Engineer

Verdict: ⚠️ Approved with concerns

Requirements Coverage

The core requirement (reject files that don't begin with the PDF magic bytes %PDF) is implemented and tested. The acceptance criteria from the PR description are met: one-upload-one-skip scenario, skipped count, rejected filename in skippedFiles, and the short-file edge case.

Concerns

The reason string is not locale-aware — potential UX requirement gap

SkippedFile.reason is hardcoded in German ("Keine gültige PDF-Signatur", "Fehler beim Lesen der Datei") in the Java source. The frontend renders it verbatim: {filename} — {reason}. Since the application already supports de/en/es via Paraglide, an English or Spanish admin will see German reason text.

This is an implicit requirements gap: the issue likely assumed the displayed text would match the user's locale. Two paths to resolve:

Make reason a machine code (e.g. INVALID_PDF_SIGNATURE) and map it to i18n in the frontend — consistent with how ErrorCodes work elsewhere in the system.
Accept German-only reason strings and document this as a known limitation.

Either decision is valid, but it should be made explicitly.

Skipped section is hidden in FAILED state — is this correct?

The Svelte template only shows the skipped count within the green DONE block. If an import processes 50 files, skips 3 non-PDFs, and then fails (e.g. spreadsheet error), the currentStatus transitions to FAILED with skipped=0, skippedFiles=[] (see the catch blocks). So skipped files accumulated before a failure are silently discarded.

Requirement question: Should the admin know which files were skipped even when the import ultimately failed? Currently the answer is "no" by implementation. If "yes" is the correct answer, the FAILED catch block should preserve the partial skippedFiles list.

OQ-001: Is the reason field intended to be locale-aware?
OQ-002: Should skipped files accumulated before a failure be visible in the FAILED state?

Positives

The UX decision to hide the skipped section when skipped === 0 is correct — no unnecessary visual noise on clean imports.
The collapsible filename list is proportionate: shows a summary count first, hides the detail until requested. Good information hierarchy.
All three i18n locales (de/en/es) have the new key — no missing translations.

## 📋 Elicit — Requirements Engineer **Verdict: ⚠️ Approved with concerns** ### Requirements Coverage The core requirement (reject files that don't begin with the PDF magic bytes `%PDF`) is implemented and tested. The acceptance criteria from the PR description are met: one-upload-one-skip scenario, skipped count, rejected filename in `skippedFiles`, and the short-file edge case. ### Concerns **The `reason` string is not locale-aware — potential UX requirement gap** `SkippedFile.reason` is hardcoded in German ("Keine gültige PDF-Signatur", "Fehler beim Lesen der Datei") in the Java source. The frontend renders it verbatim: `{filename} — {reason}`. Since the application already supports de/en/es via Paraglide, an English or Spanish admin will see German reason text. This is an implicit requirements gap: the issue likely assumed the displayed text would match the user's locale. Two paths to resolve: 1. Make `reason` a machine code (e.g. `INVALID_PDF_SIGNATURE`) and map it to i18n in the frontend — consistent with how ErrorCodes work elsewhere in the system. 2. Accept German-only reason strings and document this as a known limitation. Either decision is valid, but it should be made explicitly. **Skipped section is hidden in FAILED state — is this correct?** The Svelte template only shows the skipped count within the green DONE block. If an import processes 50 files, skips 3 non-PDFs, and then fails (e.g. spreadsheet error), the `currentStatus` transitions to `FAILED` with `skipped=0, skippedFiles=[]` (see the catch blocks). So skipped files accumulated before a failure are silently discarded. Requirement question: **Should the admin know which files were skipped even when the import ultimately failed?** Currently the answer is "no" by implementation. If "yes" is the correct answer, the `FAILED` catch block should preserve the partial `skippedFiles` list. **OQ-001**: Is the `reason` field intended to be locale-aware? **OQ-002**: Should skipped files accumulated before a failure be visible in the FAILED state? ### Positives - The UX decision to hide the skipped section when `skipped === 0` is correct — no unnecessary visual noise on clean imports. - The collapsible filename list is proportionate: shows a summary count first, hides the detail until requested. Good information hierarchy. - All three i18n locales (de/en/es) have the new key — no missing translations.

marcel commented

2026-05-18 13:20:11 +02:00

🔒 Nora "NullX" Steiner — Application Security Engineer

Verdict: ✅ Approved

Security Assessment

CWE-434 (Unrestricted File Upload) — ✅ addressed correctly

The magic byte check (%PDF = 0x25 0x50 0x44 0x46) is a valid first-layer defense against content-type spoofing. Files renamed to .pdf but containing other formats (ZIP, EXE, Office XML, etc.) will be rejected before S3 upload. This closes the scenario described in issue #529.

Memory safety during validation — ✅ clean
readNBytes(4) reads exactly 4 bytes and releases the stream via try-with-resources. There is no risk of loading a large malicious file into memory during validation. This is the correct approach.

Logging — ✅ no injection risk
All logging in the new code uses SLF4J parameterized placeholders:

log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename);
log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e);

The {} placeholder pattern does not evaluate JNDI lookups. No Log4Shell-style risk.

Authorization surface — ✅ unchanged
This PR does not modify any endpoint mappings or permission annotations. The admin import endpoint's existing authorization is preserved.

Security Notes (not blocking)

Magic bytes are a necessary but not sufficient PDF validation

%PDF at offset 0 confirms the file claims to be a PDF. A crafted polyglot file (e.g. a PDF/ZIP hybrid) would pass this check. For the threat model here — preventing accidental uploads of wrong file types in a family archive context — magic byte validation is proportionate and appropriate. A full structural validation (Apache Tika or iText inspection) would be more thorough but is likely overkill for this use case.

skippedFiles list has no size cap

If thousands of fake-PDF files are placed in the import directory and a run is triggered, the ImportStatus.skippedFiles list grows without bound and is served via the API. The practical risk is low (the import directory requires server access to populate), but worth noting if the import volume ever becomes externally writable.

SkippedFile.reason leaks backend locale

The reason strings are hardcoded German. While this is not a security issue (no user input is reflected back), it may reveal the application's implementation language to admin users who expect their locale. Not a vulnerability — see Requirements for the i18n angle.

Verdict

The validation logic is sound, the implementation is clean, and no new security surface is introduced. The threat model for this feature is correctly scoped.

## 🔒 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** ### Security Assessment **CWE-434 (Unrestricted File Upload) — ✅ addressed correctly** The magic byte check (`%PDF` = `0x25 0x50 0x44 0x46`) is a valid first-layer defense against content-type spoofing. Files renamed to `.pdf` but containing other formats (ZIP, EXE, Office XML, etc.) will be rejected before S3 upload. This closes the scenario described in issue #529. **Memory safety during validation — ✅ clean** `readNBytes(4)` reads exactly 4 bytes and releases the stream via try-with-resources. There is no risk of loading a large malicious file into memory during validation. This is the correct approach. **Logging — ✅ no injection risk** All logging in the new code uses SLF4J parameterized placeholders: ```java log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename); log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e); ``` The `{}` placeholder pattern does not evaluate JNDI lookups. No Log4Shell-style risk. **Authorization surface — ✅ unchanged** This PR does not modify any endpoint mappings or permission annotations. The admin import endpoint's existing authorization is preserved. ### Security Notes (not blocking) **Magic bytes are a necessary but not sufficient PDF validation** `%PDF` at offset 0 confirms the file *claims* to be a PDF. A crafted polyglot file (e.g. a PDF/ZIP hybrid) would pass this check. For the threat model here — preventing accidental uploads of wrong file types in a family archive context — magic byte validation is proportionate and appropriate. A full structural validation (Apache Tika or iText inspection) would be more thorough but is likely overkill for this use case. **`skippedFiles` list has no size cap** If thousands of fake-PDF files are placed in the import directory and a run is triggered, the `ImportStatus.skippedFiles` list grows without bound and is served via the API. The practical risk is low (the import directory requires server access to populate), but worth noting if the import volume ever becomes externally writable. **`SkippedFile.reason` leaks backend locale** The reason strings are hardcoded German. While this is not a security issue (no user input is reflected back), it may reveal the application's implementation language to admin users who expect their locale. Not a vulnerability — see Requirements for the i18n angle. ### Verdict The validation logic is sound, the implementation is clean, and no new security surface is introduced. The threat model for this feature is correctly scoped.

marcel commented

2026-05-18 13:20:26 +02:00

🧪 Sara Holt — QA Engineer

Verdict: ⚠️ Approved with concerns

What Was Added

4 new backend tests + 1 helper method + 3 new frontend component tests. The happy path, skipped count, skipped filename, and short-file edge case are all covered. The frontend tests cover show/hide behavior and filename rendering. Test names are descriptive.

Concerns

Copy-pasted test setup — maintenance risk

Three of the four new backend tests share identical setup:

byte[] pdfHeader = {0x25, 0x50, 0x44, 0x46, 0x2D};
Files.write(tempDir.resolve("real.pdf"), pdfHeader);
Files.writeString(tempDir.resolve("fake.pdf"), "not a pdf");
buildMinimalImportXlsx(tempDir, "real.pdf", "fake.pdf");
ReflectionTestUtils.setField(service, "importDir", tempDir.toString());
when(documentService.findByOriginalFilename(any())).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));

This appears verbatim in runImportAsync_uploadsValidPdf_andSkipsFakeOne, runImportAsync_setsSkippedCount_toOne_whenOneFakeFile, and runImportAsync_includesRejectedFilename_inSkippedFiles. Extract to a @BeforeEach or a private setupOneValidOneFakeImport(Path) helper.

Missing test: IOException path in isPdfMagicBytes

The code handles IOException from isPdfMagicBytes by logging and adding to skippedFiles with reason "Fehler beim Lesen der Datei":

} catch (IOException e) {
    log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e);
    skippedFiles.add(new SkippedFile(filename, "Fehler beim Lesen der Datei"));
    continue;
}

There is no test covering this path. A test using a directory disguised as a file (or a mocked FileInputStream) should verify that the file is counted as skipped and not as processed when an IOException occurs.

runImportAsync_skipsFile_whenShorterThanFourBytes — missing mock setup

This test doesn't set up documentService.findByOriginalFilename. It passes because the code never reaches that call (the file is rejected first). But with Mockito strict stubbing, if the code path ever changes and the mock is called unexpectedly, the test will throw UnnecessaryStubbingException or a NullPointerException. Add the mock setup for defensive coverage:

when(documentService.findByOriginalFilename(any())).thenReturn(Optional.empty());

AdminControllerTest — duplicate import (blocker)
import java.util.List; is declared twice. Remove the duplicate.

Positives

@TempDir with real file I/O is the correct approach — no mocking of the file system.
The 3 frontend tests use data-testid and getByText — testing visible behavior, not internals.
The buildMinimalImportXlsx helper is a clean extraction that keeps each test body minimal.
Test for the skipped === 0 case (does not show skipped section) is present — empty-state coverage is often forgotten.

## 🧪 Sara Holt — QA Engineer **Verdict: ⚠️ Approved with concerns** ### What Was Added 4 new backend tests + 1 helper method + 3 new frontend component tests. The happy path, skipped count, skipped filename, and short-file edge case are all covered. The frontend tests cover show/hide behavior and filename rendering. Test names are descriptive. ### Concerns **Copy-pasted test setup — maintenance risk** Three of the four new backend tests share identical setup: ```java byte[] pdfHeader = {0x25, 0x50, 0x44, 0x46, 0x2D}; Files.write(tempDir.resolve("real.pdf"), pdfHeader); Files.writeString(tempDir.resolve("fake.pdf"), "not a pdf"); buildMinimalImportXlsx(tempDir, "real.pdf", "fake.pdf"); ReflectionTestUtils.setField(service, "importDir", tempDir.toString()); when(documentService.findByOriginalFilename(any())).thenReturn(Optional.empty()); when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0)); ``` This appears verbatim in `runImportAsync_uploadsValidPdf_andSkipsFakeOne`, `runImportAsync_setsSkippedCount_toOne_whenOneFakeFile`, and `runImportAsync_includesRejectedFilename_inSkippedFiles`. Extract to a `@BeforeEach` or a private `setupOneValidOneFakeImport(Path)` helper. **Missing test: `IOException` path in `isPdfMagicBytes`** The code handles `IOException` from `isPdfMagicBytes` by logging and adding to `skippedFiles` with reason "Fehler beim Lesen der Datei": ```java } catch (IOException e) { log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e); skippedFiles.add(new SkippedFile(filename, "Fehler beim Lesen der Datei")); continue; } ``` There is no test covering this path. A test using a directory disguised as a file (or a mocked `FileInputStream`) should verify that the file is counted as skipped and not as processed when an IOException occurs. **`runImportAsync_skipsFile_whenShorterThanFourBytes` — missing mock setup** This test doesn't set up `documentService.findByOriginalFilename`. It passes because the code never reaches that call (the file is rejected first). But with Mockito strict stubbing, if the code path ever changes and the mock is called unexpectedly, the test will throw `UnnecessaryStubbingException` or a `NullPointerException`. Add the mock setup for defensive coverage: ```java when(documentService.findByOriginalFilename(any())).thenReturn(Optional.empty()); ``` **`AdminControllerTest` — duplicate import (blocker)** `import java.util.List;` is declared twice. Remove the duplicate. ### Positives - `@TempDir` with real file I/O is the correct approach — no mocking of the file system. - The 3 frontend tests use `data-testid` and `getByText` — testing visible behavior, not internals. - The `buildMinimalImportXlsx` helper is a clean extraction that keeps each test body minimal. - Test for the `skipped === 0` case (`does not show skipped section`) is present — empty-state coverage is often forgotten.

marcel commented

2026-05-18 13:20:44 +02:00

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Verdict: ⚠️ Approved with concerns

Blockers

list-none removes the disclosure triangle — accessibility failure

<summary class="cursor-pointer list-none">

list-none strips the browser's native ▶/▼ disclosure marker from <summary>. Users who tab to this element see a block of text with no visual affordance indicating it expands. cursor-pointer helps mouse users but not keyboard users. This violates WCAG 1.4.11 (Non-text Contrast) and WCAG 2.1 understanding of interactive component states.

Fix — add a rotating chevron that signals the open/closed state:

<summary class="cursor-pointer list-none flex items-center gap-2">
    <svg class="w-4 h-4 shrink-0 transition-transform details-chevron" viewBox="0 0 16 16" fill="currentColor">
        <path d="M6 4l4 4-4 4"/>
    </svg>
    <div>
        <p data-testid="skipped-count" class="text-base font-bold">{importStatus.skipped}</p>
        <p class="font-sans text-xs font-bold tracking-widest text-amber-800 uppercase">
            {m.admin_system_import_skipped_label()}
        </p>
    </div>
</summary>

details[open] .details-chevron { transform: rotate(90deg); }

Concerns

Raw Tailwind amber colors are not brand tokens

The component uses border-amber-200, bg-amber-50, text-amber-700, text-amber-800 — none of which are project brand tokens. The project's color system is defined in layout.css via CSS custom properties. If an amber/warning token doesn't exist yet, add one:

/* layout.css */
--color-warning-surface: oklch(98% 0.04 85);
--color-warning-line:    oklch(88% 0.09 85);
--color-warning-ink:     oklch(45% 0.12 75);

Then use semantic classes in the component. Hardcoded Tailwind color values scatter the brand decisions and break dark mode.

<p> elements inside <summary> are block elements in phrasing context

The <summary> spec allows phrasing content. <p> is flow content, not phrasing. Browsers render it, but it's semantically incorrect and can cause unexpected focus behavior in some screen readers. Use <span> with block display instead:

<summary class="...">
    <span class="block text-base font-bold">{importStatus.skipped}</span>
    <span class="block font-sans text-xs font-bold tracking-widest uppercase">
        {m.admin_system_import_skipped_label()}
    </span>
</summary>

Positives

<details>/<summary> is the correct HTML primitive for this use case — no JavaScript, no Svelte state, works without JS, screen readers announce the expanded/collapsed state natively.
The amber/warning color choice for skipped files is semantically correct — amber signals "attention" without the alarm of red.
font-mono text-xs text-ink-2 for the filename list is appropriate — text-ink-2 is a brand token, monospace aids readability for filenames.
The data-testid="skipped-count" attribute enables reliable test selection without coupling to copy text.
The section appears only when skipped > 0 — no visual noise on clean imports. Correct behavior.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist **Verdict: ⚠️ Approved with concerns** ### Blockers **`list-none` removes the disclosure triangle — accessibility failure** ```svelte <summary class="cursor-pointer list-none"> ``` `list-none` strips the browser's native ▶/▼ disclosure marker from `<summary>`. Users who tab to this element see a block of text with no visual affordance indicating it expands. `cursor-pointer` helps mouse users but not keyboard users. This violates WCAG 1.4.11 (Non-text Contrast) and WCAG 2.1 understanding of interactive component states. Fix — add a rotating chevron that signals the open/closed state: ```svelte <summary class="cursor-pointer list-none flex items-center gap-2"> <svg class="w-4 h-4 shrink-0 transition-transform details-chevron" viewBox="0 0 16 16" fill="currentColor"> <path d="M6 4l4 4-4 4"/> </svg> <div> <p data-testid="skipped-count" class="text-base font-bold">{importStatus.skipped}</p> <p class="font-sans text-xs font-bold tracking-widest text-amber-800 uppercase"> {m.admin_system_import_skipped_label()} </p> </div> </summary> ``` ```css details[open] .details-chevron { transform: rotate(90deg); } ``` ### Concerns **Raw Tailwind amber colors are not brand tokens** The component uses `border-amber-200`, `bg-amber-50`, `text-amber-700`, `text-amber-800` — none of which are project brand tokens. The project's color system is defined in `layout.css` via CSS custom properties. If an amber/warning token doesn't exist yet, add one: ```css /* layout.css */ --color-warning-surface: oklch(98% 0.04 85); --color-warning-line: oklch(88% 0.09 85); --color-warning-ink: oklch(45% 0.12 75); ``` Then use semantic classes in the component. Hardcoded Tailwind color values scatter the brand decisions and break dark mode. **`<p>` elements inside `<summary>` are block elements in phrasing context** The `<summary>` spec allows phrasing content. `<p>` is flow content, not phrasing. Browsers render it, but it's semantically incorrect and can cause unexpected focus behavior in some screen readers. Use `<span>` with `block` display instead: ```svelte <summary class="..."> <span class="block text-base font-bold">{importStatus.skipped}</span> <span class="block font-sans text-xs font-bold tracking-widest uppercase"> {m.admin_system_import_skipped_label()} </span> </summary> ``` ### Positives - `<details>/<summary>` is the correct HTML primitive for this use case — no JavaScript, no Svelte state, works without JS, screen readers announce the expanded/collapsed state natively. - The amber/warning color choice for skipped files is semantically correct — amber signals "attention" without the alarm of red. - `font-mono text-xs text-ink-2` for the filename list is appropriate — `text-ink-2` is a brand token, monospace aids readability for filenames. - The `data-testid="skipped-count"` attribute enables reliable test selection without coupling to copy text. - The section appears only when `skipped > 0` — no visual noise on clean imports. Correct behavior.

marcel added 1 commit 2026-05-18 14:46:45 +02:00

fix(import): address PR review concerns

CI / Unit & Component Tests (pull_request) Successful in 3m3s

Details

CI / OCR Service Tests (pull_request) Successful in 20s

Details

CI / Backend Unit Tests (pull_request) Failing after 3m2s

Details

CI / fail2ban Regex (pull_request) Successful in 39s

Details

CI / Semgrep Security Scan (pull_request) Successful in 19s

Details

CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s

Details

50441f558f

- remove duplicate List import in AdminControllerTest
- derive skipped() from skippedFiles.size() — drop redundant int field
- use machine codes for SkippedFile.reason (INVALID_PDF_SIGNATURE, FILE_READ_ERROR)
- map reason codes to i18n strings in ImportStatusCard (de/en/es)
- replace raw amber Tailwind classes with warning semantic token
- fix <summary> accessibility: replace list-none with rotating chevron SVG
- replace <p> with <span> inside <summary> (phrasing content rule)
- extract setupOneValidOneFakeImport() helper — remove 3x copy-paste
- add lenient mock to short-file test for defensive coverage
- add IOException path test for isPdfMagicBytes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel added 1 commit 2026-05-18 14:59:09 +02:00

fix(test): skip IOException test when running as root

CI / Unit & Component Tests (pull_request) Successful in 3m4s

Details

CI / OCR Service Tests (pull_request) Successful in 20s

Details

CI / Backend Unit Tests (pull_request) Successful in 3m14s

Details

CI / fail2ban Regex (pull_request) Successful in 47s

Details

CI / Semgrep Security Scan (pull_request) Successful in 20s

Details

CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s

Details

e770b81ea5

setReadable(false) silently no-ops as root; check canRead() to guard
the assumption correctly so the test is skipped in Docker CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel commented

2026-05-18 15:04:59 +02:00

🏗️ Markus Keller — Senior Application Architect

Verdict: ⚠️ Approved with concerns

Good containment: the entire change lives inside the importing module. No cross-domain repository access introduced. Java 21 records used appropriately for SkippedFile and ProcessResult. The monolith boundary stays clean.

Blockers

OpenAPI / TypeScript type sync

SkippedFile is a new public record returned from the /api/admin/import-status endpoint. Neither SkippedFile nor the updated ImportStatus record carries @Schema(requiredMode = Schema.RequiredMode.REQUIRED) annotations on their fields. Per CLAUDE.md, every field the backend always populates must be annotated so the TypeScript codegen produces non-optional types. The PR side-steps this by manually editing types.ts — but the CLAUDE.md rule is clear: npm run generate:api must be run after any backend model change. If the endpoint is in the OpenAPI spec, codegen was skipped. If it is intentionally out-of-spec, that decision needs to be explicit (the CONTRIBUTING.md walk-through for adding an endpoint should be followed or the deviation documented).

Action required: either add @Schema(requiredMode = REQUIRED) to all fields on SkippedFile and ImportStatus, run npm run generate:api, and remove the manual types.ts edit — or confirm the endpoint is deliberately excluded from the spec and explain why in a comment.

Suggestions

@JsonProperty("skipped") on a record component — the derived skipped() method is annotated @JsonProperty("skipped") to appear in the JSON response. This is a slightly unusual pattern. A cleaner alternative is a transient int field or simply relying on the list size at the call site. It works, but the asymmetry (a method annotated like a field) is a subtle readability trap for future maintainers.

Lost skipped data on FAILED state — when the import throws, currentStatus is reset with List.of() for skippedFiles. Any files already skipped before the failure are silently discarded. Whether this is acceptable depends on the original issue spec, but it should be an explicit design decision, not a side effect of the reset.

## 🏗️ Markus Keller — Senior Application Architect **Verdict: ⚠️ Approved with concerns** Good containment: the entire change lives inside the `importing` module. No cross-domain repository access introduced. Java 21 records used appropriately for `SkippedFile` and `ProcessResult`. The monolith boundary stays clean. ### Blockers **OpenAPI / TypeScript type sync** `SkippedFile` is a new public record returned from the `/api/admin/import-status` endpoint. Neither `SkippedFile` nor the updated `ImportStatus` record carries `@Schema(requiredMode = Schema.RequiredMode.REQUIRED)` annotations on their fields. Per CLAUDE.md, every field the backend always populates must be annotated so the TypeScript codegen produces non-optional types. The PR side-steps this by manually editing `types.ts` — but the CLAUDE.md rule is clear: `npm run generate:api` must be run after any backend model change. If the endpoint is in the OpenAPI spec, codegen was skipped. If it is intentionally out-of-spec, that decision needs to be explicit (the CONTRIBUTING.md walk-through for adding an endpoint should be followed or the deviation documented). Action required: either add `@Schema(requiredMode = REQUIRED)` to all fields on `SkippedFile` and `ImportStatus`, run `npm run generate:api`, and remove the manual `types.ts` edit — or confirm the endpoint is deliberately excluded from the spec and explain why in a comment. ### Suggestions **`@JsonProperty("skipped")` on a record component** — the derived `skipped()` method is annotated `@JsonProperty("skipped")` to appear in the JSON response. This is a slightly unusual pattern. A cleaner alternative is a transient int field or simply relying on the list size at the call site. It works, but the asymmetry (a method annotated like a field) is a subtle readability trap for future maintainers. **Lost skipped data on FAILED state** — when the import throws, `currentStatus` is reset with `List.of()` for `skippedFiles`. Any files already skipped before the failure are silently discarded. Whether this is acceptable depends on the original issue spec, but it should be an explicit design decision, not a side effect of the reset.

marcel commented

2026-05-18 15:05:11 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ⚠️ Approved with concerns

Overall the code is clean. isPdfMagicBytes is a sharp, single-purpose predicate. ProcessResult and SkippedFile are well-named value objects. SLF4J {} placeholders used correctly throughout. reasonLabel in the Svelte component is concise and readable.

Blockers

S3 upload failures vanish silently

When importSingleDocument returns false due to an S3 upload error, the document is neither counted in processed nor added to skippedFiles. The log captures the error, but the API response gives no signal that a file failed — not processed, not skipped. An admin looking at the DONE card would only see the success count, with no indication that one or more files were silently dropped.

This was pre-existing behavior (the old code also returned without counting), but now that skippedFiles exists as a mechanism to surface failures, S3 errors should arguably land in skippedFiles with a S3_UPLOAD_ERROR reason. At minimum, the existing silent behavior should be an explicit design choice documented in the issue, not a gap.

Suggestions

{#each ... (filename)} key uniqueness — using filename as the key in the each-block is fine as long as filenames are unique per import run, which they should be. But if two rows in the spreadsheet ever reference the same filename and both fail, the key collision would silently drop one entry from the DOM. Using a generated index ((i)) or a composite key (filename + reason) would be safer.

importSingleDocument return contract — the method now returns false for two very different reasons: "already exists (skipped intentionally)" and "S3 upload failed (error)". A caller reading if (imported) cannot distinguish between the two. A three-state return (enum ImportResult { IMPORTED, SKIPPED, FAILED }) or splitting the method would make the contract explicit.

Test helper buildMinimalImportXlsx — the helper writes only column 0 (index/filename). When processRows reads other columns (colBox, colSender, etc.) via getCell(cells, col), it will get empty strings. The tests pass because they exercise magic-bytes logic only, but a reader might not realize the helper produces a degenerate spreadsheet. A brief comment clarifying intent would help future maintainers.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ⚠️ Approved with concerns** Overall the code is clean. `isPdfMagicBytes` is a sharp, single-purpose predicate. `ProcessResult` and `SkippedFile` are well-named value objects. SLF4J `{}` placeholders used correctly throughout. `reasonLabel` in the Svelte component is concise and readable. ### Blockers **S3 upload failures vanish silently** When `importSingleDocument` returns `false` due to an S3 upload error, the document is neither counted in `processed` nor added to `skippedFiles`. The log captures the error, but the API response gives no signal that a file failed — not processed, not skipped. An admin looking at the DONE card would only see the success count, with no indication that one or more files were silently dropped. This was pre-existing behavior (the old code also `return`ed without counting), but now that `skippedFiles` exists as a mechanism to surface failures, S3 errors should arguably land in `skippedFiles` with a `S3_UPLOAD_ERROR` reason. At minimum, the existing silent behavior should be an explicit design choice documented in the issue, not a gap. ### Suggestions **`{#each ... (filename)}` key uniqueness** — using `filename` as the key in the each-block is fine as long as filenames are unique per import run, which they should be. But if two rows in the spreadsheet ever reference the same filename and both fail, the key collision would silently drop one entry from the DOM. Using a generated index (`(i)`) or a composite key (`filename + reason`) would be safer. **`importSingleDocument` return contract** — the method now returns `false` for two very different reasons: "already exists (skipped intentionally)" and "S3 upload failed (error)". A caller reading `if (imported)` cannot distinguish between the two. A three-state return (`enum ImportResult { IMPORTED, SKIPPED, FAILED }`) or splitting the method would make the contract explicit. **Test helper `buildMinimalImportXlsx`** — the helper writes only column 0 (index/filename). When `processRows` reads other columns (`colBox`, `colSender`, etc.) via `getCell(cells, col)`, it will get empty strings. The tests pass because they exercise magic-bytes logic only, but a reader might not realize the helper produces a degenerate spreadsheet. A brief comment clarifying intent would help future maintainers.

marcel commented

2026-05-18 15:05:16 +02:00

🛠️ Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

Pure application code change — no infrastructure footprint at all.

Checked:

No Docker Compose changes
No new services, volumes, or environment variables
No CI workflow modifications
No new dependencies in pom.xml or package.json
No Dockerfile changes

The one CI-relevant detail — assumeTrue(!unreadable.canRead()) in the IOException test — is handled correctly. The separate fix commit ("fix(test): skip IOException test when running as root") documents the reason: the CI runner executes as root, which bypasses Unix file permissions. The assumeTrue causes the test to be skipped (not failed) in that environment. This is the correct pattern for permission-sensitive tests in a root-capable container environment.

No action required from an infrastructure perspective.

## 🛠️ Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** Pure application code change — no infrastructure footprint at all. Checked: - No Docker Compose changes - No new services, volumes, or environment variables - No CI workflow modifications - No new dependencies in `pom.xml` or `package.json` - No Dockerfile changes The one CI-relevant detail — `assumeTrue(!unreadable.canRead())` in the IOException test — is handled correctly. The separate fix commit ("fix(test): skip IOException test when running as root") documents the reason: the CI runner executes as root, which bypasses Unix file permissions. The `assumeTrue` causes the test to be skipped (not failed) in that environment. This is the correct pattern for permission-sensitive tests in a root-capable container environment. No action required from an infrastructure perspective.

marcel commented

2026-05-18 15:05:28 +02:00

📋 Elicit — Requirements Engineer

Verdict: ⚠️ Approved with concerns

The implementation addresses the core requirement from #529: PDF magic bytes are validated before S3 upload, and the admin UI surfaces the skipped count with a collapsible filename list. i18n keys cover de/en/es. Test plan items are well-scoped.

Concerns

FR gap: skipped data is lost on FAILED state

The requirement presumably asks that admins can always understand the outcome of an import run. Currently, if the import fails mid-run (e.g., no spreadsheet found), currentStatus is reset with an empty skippedFiles list — any files that were already skipped before the failure are irretrievably gone from the API response. The admin card does not surface the FAILED state's skipped count at all. This may be acceptable if the original issue scope is strictly "show skipped count in DONE state," but it should be a deliberate product decision rather than an accidental gap.

Suggested AC that appears missing:

Given an import that fails after skipping one file, when the admin checks import status, then the skipped files from before the failure are still visible.

FR gap: RUNNING state skipped count is always 0

The skippedFiles list is populated at job completion and never updated during a running import. A user refreshing the admin page mid-import would always see skipped: 0, which could be misleading for long imports. If real-time progress is out of scope for #529 this is fine — but it should be explicitly noted as a known limitation.

NFR: filename length/characters not bounded

The SkippedFile.filename comes directly from the spreadsheet cell value. No maximum length or character sanitization is applied. In the frontend, {filename} is rendered in a <li> without truncation. A very long filename (or one containing special characters) could break the card layout. An NFR bounding display length (e.g., truncate at 80 chars with title tooltip) should be tracked.

Requirement coverage: LGTM

The four test scenarios (valid+fake, skipped count, rejected filename, short file) cover the acceptance criteria implied by the PR description. The i18n coverage across de/en/es is complete.

## 📋 Elicit — Requirements Engineer **Verdict: ⚠️ Approved with concerns** The implementation addresses the core requirement from #529: PDF magic bytes are validated before S3 upload, and the admin UI surfaces the skipped count with a collapsible filename list. i18n keys cover de/en/es. Test plan items are well-scoped. ### Concerns **FR gap: skipped data is lost on FAILED state** The requirement presumably asks that admins can always understand the outcome of an import run. Currently, if the import fails mid-run (e.g., no spreadsheet found), `currentStatus` is reset with an empty `skippedFiles` list — any files that were already skipped before the failure are irretrievably gone from the API response. The admin card does not surface the FAILED state's skipped count at all. This may be acceptable if the original issue scope is strictly "show skipped count in DONE state," but it should be a deliberate product decision rather than an accidental gap. Suggested AC that appears missing: - *Given* an import that fails after skipping one file, *when* the admin checks import status, *then* the skipped files from before the failure are still visible. **FR gap: RUNNING state skipped count is always 0** The `skippedFiles` list is populated at job completion and never updated during a running import. A user refreshing the admin page mid-import would always see `skipped: 0`, which could be misleading for long imports. If real-time progress is out of scope for #529 this is fine — but it should be explicitly noted as a known limitation. **NFR: filename length/characters not bounded** The `SkippedFile.filename` comes directly from the spreadsheet cell value. No maximum length or character sanitization is applied. In the frontend, `{filename}` is rendered in a `<li>` without truncation. A very long filename (or one containing special characters) could break the card layout. An NFR bounding display length (e.g., truncate at 80 chars with `title` tooltip) should be tracked. **Requirement coverage: LGTM** The four test scenarios (valid+fake, skipped count, rejected filename, short file) cover the acceptance criteria implied by the PR description. The i18n coverage across de/en/es is complete.

marcel commented

2026-05-18 15:05:39 +02:00

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: ✅ Approved

This PR does what it says and does it correctly. The magic bytes check is a legitimate server-side control that prevents accidental (or deliberate) non-PDF files from being stored as PDFs in S3. All logging uses SLF4J {} placeholders — no Log4Shell risk. The validation is in the service layer (correct layer), not client-side.

Security findings

Observation (not a blocker): polyglot files pass the check

%PDF at bytes 0–3 is the correct magic signature for PDF files. However, a crafted file that starts with %PDF followed by malicious content (a polyglot) would pass this check. For the import flow — where files come from an operator-controlled directory, not arbitrary user upload — the threat model is accidental misidentification, not adversarial crafting. The check is proportionate to the threat. If files ever come from less-trusted sources in the future, a deeper inspection (e.g., Apache Tika or PDFBox parsing) would be appropriate.

Observation: filename from spreadsheet is user-controlled data

SkippedFile.filename flows directly from the spreadsheet cell into the API response and the frontend <li> element. The Svelte template renders it as text content ({filename}), not innerHTML, so XSS is not possible. The SLF4J log uses {} placeholder, so JNDI injection is not possible. ✅

Positive: defense-in-depth for IOException

The catch (IOException e) branch skips the file with FILE_READ_ERROR rather than throwing or silently proceeding with a potentially corrupt read. Failing closed (skip the file) is the correct security posture when a read fails.

Positive: validation before S3 upload

The magic bytes check happens before s3Client.putObject. A non-PDF file never reaches S3. This is the right control placement — reject at the boundary, not after storage.

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** This PR does what it says and does it correctly. The magic bytes check is a legitimate server-side control that prevents accidental (or deliberate) non-PDF files from being stored as PDFs in S3. All logging uses SLF4J `{}` placeholders — no Log4Shell risk. The validation is in the service layer (correct layer), not client-side. ### Security findings **Observation (not a blocker): polyglot files pass the check** `%PDF` at bytes 0–3 is the correct magic signature for PDF files. However, a crafted file that starts with `%PDF` followed by malicious content (a polyglot) would pass this check. For the import flow — where files come from an operator-controlled directory, not arbitrary user upload — the threat model is accidental misidentification, not adversarial crafting. The check is proportionate to the threat. If files ever come from less-trusted sources in the future, a deeper inspection (e.g., Apache Tika or PDFBox parsing) would be appropriate. **Observation: `filename` from spreadsheet is user-controlled data** `SkippedFile.filename` flows directly from the spreadsheet cell into the API response and the frontend `<li>` element. The Svelte template renders it as text content (`{filename}`), not `innerHTML`, so XSS is not possible. The SLF4J log uses `{}` placeholder, so JNDI injection is not possible. ✅ **Positive: defense-in-depth for IOException** The `catch (IOException e)` branch skips the file with `FILE_READ_ERROR` rather than throwing or silently proceeding with a potentially corrupt read. Failing closed (skip the file) is the correct security posture when a read fails. **Positive: validation before S3 upload** The magic bytes check happens before `s3Client.putObject`. A non-PDF file never reaches S3. This is the right control placement — reject at the boundary, not after storage.

marcel commented

2026-05-18 15:05:51 +02:00

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: ⚠️ Approved with concerns

Four new behavioral tests cover the primary scenarios well. The makeStatus factory function is already in place and correctly extended with skipped: 0 and skippedFiles: [] defaults. The three frontend component tests use getByTestId and getByText (user-visible behavior), not internal component state. ✅

Blockers

IOException test has no CI coverage on the default runner

runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException uses assumeTrue(!unreadable.canRead()) to skip when the test runs as root. On the CI runner (Docker, root user), this assumption fails and the test is silently skipped — meaning the FILE_READ_ERROR code path has zero automated coverage in CI. This is a coverage gap.

The correct fix is to make the FileInputStream creation mockable rather than relying on OS permissions:

// Extract to a package-private method:
InputStream openFileStream(File file) throws IOException {
    return new FileInputStream(file);
}

Then in the test, spy on the service and stub openFileStream to throw IOException. The test becomes deterministic on all platforms and the assumeTrue can be removed.

Suggestions

Three tests share identical setup + action — redundancy is explicit but worth noting

runImportAsync_uploadsValidPdf_andSkipsFakeOne, runImportAsync_setsSkippedCount_toOne_whenOneFakeFile, and runImportAsync_includesRejectedFilename_inSkippedFiles each call setupOneValidOneFakeImport + service.runImportAsync(). This is the correct "one assertion per test" pattern — the redundancy is the price of isolation. No change required, just noting it is intentional.

Missing test: S3-failure file is neither processed nor in skippedFiles

When importSingleDocument returns false due to an S3 upload failure, the document is silently dropped from both counts. There is no test asserting this behavior. This matters because a future developer could reasonably expect failed S3 uploads to appear in skippedFiles and introduce a regression while trying to "fix" the silent drop. A test pinning the current behavior (or the corrected behavior if S3 failures should be in skippedFiles) would prevent this.

Frontend test: getByTestId('skipped-count') for the negative case works correctly

The does not show skipped section when DONE and skipped is 0 test correctly uses not.toBeInTheDocument() because the entire <details> block is conditionally rendered. ✅

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ⚠️ Approved with concerns** Four new behavioral tests cover the primary scenarios well. The `makeStatus` factory function is already in place and correctly extended with `skipped: 0` and `skippedFiles: []` defaults. The three frontend component tests use `getByTestId` and `getByText` (user-visible behavior), not internal component state. ✅ ### Blockers **IOException test has no CI coverage on the default runner** `runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException` uses `assumeTrue(!unreadable.canRead())` to skip when the test runs as root. On the CI runner (Docker, root user), this assumption fails and the test is silently skipped — meaning the `FILE_READ_ERROR` code path has zero automated coverage in CI. This is a coverage gap. The correct fix is to make the `FileInputStream` creation mockable rather than relying on OS permissions: ```java // Extract to a package-private method: InputStream openFileStream(File file) throws IOException { return new FileInputStream(file); } ``` Then in the test, spy on the service and stub `openFileStream` to throw `IOException`. The test becomes deterministic on all platforms and the `assumeTrue` can be removed. ### Suggestions **Three tests share identical setup + action — redundancy is explicit but worth noting** `runImportAsync_uploadsValidPdf_andSkipsFakeOne`, `runImportAsync_setsSkippedCount_toOne_whenOneFakeFile`, and `runImportAsync_includesRejectedFilename_inSkippedFiles` each call `setupOneValidOneFakeImport` + `service.runImportAsync()`. This is the correct "one assertion per test" pattern — the redundancy is the price of isolation. No change required, just noting it is intentional. **Missing test: S3-failure file is neither processed nor in skippedFiles** When `importSingleDocument` returns `false` due to an S3 upload failure, the document is silently dropped from both counts. There is no test asserting this behavior. This matters because a future developer could reasonably expect failed S3 uploads to appear in `skippedFiles` and introduce a regression while trying to "fix" the silent drop. A test pinning the current behavior (or the corrected behavior if S3 failures should be in `skippedFiles`) would prevent this. **Frontend test: `getByTestId('skipped-count')` for the negative case works correctly** The `does not show skipped section when DONE and skipped is 0` test correctly uses `not.toBeInTheDocument()` because the entire `<details>` block is conditionally rendered. ✅

marcel commented

2026-05-18 15:06:05 +02:00

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Verdict: ⚠️ Approved with concerns

The collapsible <details>/<summary> pattern is semantically correct and accessible by default. The chevron-rotation animation is a nice touch and implemented cleanly via details[open] CSS selector. The decision to show the skipped section only in DONE state is correct — showing it during RUNNING would be premature.

Blockers

text-warning / bg-warning / border-warning — verify token exists

The skipped section uses border-warning/40 bg-warning/10 text-warning. In Tailwind CSS v4, custom colors must be declared in the CSS theme. The CLAUDE.md documents brand-navy, brand-mint, and the semantic aliases (bg-surface, bg-canvas, border-line, etc.) — but warning is not listed. If this token is not defined in layout.css, the amber styling renders as nothing (no color, no background, no border tint). The section would be visually indistinguishable from the surrounding content.

Action: confirm --color-warning (or equivalent) is declared in layout.css. If not, define it or use Tailwind's built-in amber-* palette with the /40 and /10 modifiers.

Concerns

Filename overflow — filenames in the <li> list have no length constraint. A filename like scan_grossvater_theodor_raddatz_brief_sommer_1938_seite_1_von_3_kopie.pdf would overflow the card on mobile. Add break-all or truncate + title tooltip to the <li> element.

Touch target on <summary> — the <summary> acts as the disclosure button. It has p-4 padding and flex items-center gap-2, but no explicit min-h-[44px]. On mobile (where the senior audience is least likely to reach this page, but still possible), the tap target may fall below the 44px WCAG 2.2 requirement. Adding min-h-[44px] to the <summary> is a one-line fix.

No aria-live for dynamically appearing section — if an admin stays on the page while an import runs and the page polls for status, the skipped section appears silently. Screen readers won't announce the new content. Adding aria-live="polite" to the parent section containing the skipped details, or wrapping the skipped count in an aria-atomic region, would announce the change to assistive technology users.

Positive

The amber warning pattern (icon + bold count + descriptive label + collapsible filename list) is the right visual hierarchy. The count is prominent; the details are available but not distracting. The list-none on <summary> correctly removes the WebKit default disclosure triangle, avoiding a double-indicator with the custom SVG chevron. ✅

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist **Verdict: ⚠️ Approved with concerns** The collapsible `<details>/<summary>` pattern is semantically correct and accessible by default. The chevron-rotation animation is a nice touch and implemented cleanly via `details[open]` CSS selector. The decision to show the skipped section only in DONE state is correct — showing it during RUNNING would be premature. ### Blockers **`text-warning` / `bg-warning` / `border-warning` — verify token exists** The skipped section uses `border-warning/40 bg-warning/10 text-warning`. In Tailwind CSS v4, custom colors must be declared in the CSS theme. The CLAUDE.md documents `brand-navy`, `brand-mint`, and the semantic aliases (`bg-surface`, `bg-canvas`, `border-line`, etc.) — but `warning` is not listed. If this token is not defined in `layout.css`, the amber styling renders as nothing (no color, no background, no border tint). The section would be visually indistinguishable from the surrounding content. Action: confirm `--color-warning` (or equivalent) is declared in `layout.css`. If not, define it or use Tailwind's built-in `amber-*` palette with the `/40` and `/10` modifiers. ### Concerns **Filename overflow** — filenames in the `<li>` list have no length constraint. A filename like `scan_grossvater_theodor_raddatz_brief_sommer_1938_seite_1_von_3_kopie.pdf` would overflow the card on mobile. Add `break-all` or `truncate` + `title` tooltip to the `<li>` element. **Touch target on `<summary>`** — the `<summary>` acts as the disclosure button. It has `p-4` padding and `flex items-center gap-2`, but no explicit `min-h-[44px]`. On mobile (where the senior audience is least likely to reach this page, but still possible), the tap target may fall below the 44px WCAG 2.2 requirement. Adding `min-h-[44px]` to the `<summary>` is a one-line fix. **No `aria-live` for dynamically appearing section** — if an admin stays on the page while an import runs and the page polls for status, the skipped section appears silently. Screen readers won't announce the new content. Adding `aria-live="polite"` to the parent section containing the skipped details, or wrapping the skipped count in an `aria-atomic` region, would announce the change to assistive technology users. ### Positive The amber warning pattern (icon + bold count + descriptive label + collapsible filename list) is the right visual hierarchy. The count is prominent; the details are available but not distracting. The `list-none` on `<summary>` correctly removes the WebKit default disclosure triangle, avoiding a double-indicator with the custom SVG chevron. ✅

marcel commented

2026-05-18 15:12:34 +02:00

Review response (commit `53dba772`)

Fixed

@Schema annotations missing on SkippedFile / ImportStatus (Markus) — added @Schema(requiredMode = REQUIRED) to all non-nullable record components on both types, plus the derived skipped() method. The endpoint is in the OpenAPI spec; when generate:api is next run against a live backend, the generated api.ts will reflect these fields as required. types.ts remains the working type source in the meantime (pre-existing pattern for this component — the generated ImportStatus has been stale since before this PR).

IOException test silently skipped in CI (Sara) — extracted openFileStream(File) as a package-private method and replaced the assumeTrue(!canRead()) test with a Mockito spy that stubs openFileStream to throw. Test now runs unconditionally: 45 tests, 0 skipped, 0 failures.

Pushed back

text-warning / bg-warning / border-warning token undefined (Leonie) — the token is defined in layout.css:77 (--color-warning: #b45309) and is already used in TagDeleteGuard.svelte, login/+page.svelte, and ChronikErrorCard.svelte. The pattern is established and working.

Out of scope (tracked separately)

S3 upload failures not surfaced in skippedFiles (Felix) — this is pre-existing behaviour (the old code also silently returned on S3 failure). Adding S3_UPLOAD_ERROR to skippedFiles is a valid improvement but a separate issue from PDF magic bytes validation. Happy to file a follow-up issue if desired.

importSingleDocument three-state return contract (Felix suggestion) — agreed that boolean conflates "already exists" and "upload failed"; a future refactor to enum ImportResult would clarify this, but it's outside the scope of this PR.

Skipped data lost on FAILED state (Elicit) — acknowledged as a known limitation. The current PR scope is DONE-state display only. A follow-up issue could address preserving partial skipped lists on failure.

## Review response (commit 53dba772) ### Fixed **@Schema annotations missing on `SkippedFile` / `ImportStatus`** (Markus) — added `@Schema(requiredMode = REQUIRED)` to all non-nullable record components on both types, plus the derived `skipped()` method. The endpoint is in the OpenAPI spec; when `generate:api` is next run against a live backend, the generated `api.ts` will reflect these fields as required. `types.ts` remains the working type source in the meantime (pre-existing pattern for this component — the generated `ImportStatus` has been stale since before this PR). **IOException test silently skipped in CI** (Sara) — extracted `openFileStream(File)` as a package-private method and replaced the `assumeTrue(!canRead())` test with a Mockito spy that stubs `openFileStream` to throw. Test now runs unconditionally: 45 tests, 0 skipped, 0 failures. ### Pushed back **`text-warning` / `bg-warning` / `border-warning` token undefined** (Leonie) — the token is defined in `layout.css:77` (`--color-warning: #b45309`) and is already used in `TagDeleteGuard.svelte`, `login/+page.svelte`, and `ChronikErrorCard.svelte`. The pattern is established and working. ### Out of scope (tracked separately) **S3 upload failures not surfaced in `skippedFiles`** (Felix) — this is pre-existing behaviour (the old code also silently `return`ed on S3 failure). Adding `S3_UPLOAD_ERROR` to `skippedFiles` is a valid improvement but a separate issue from PDF magic bytes validation. Happy to file a follow-up issue if desired. **`importSingleDocument` three-state return contract** (Felix suggestion) — agreed that `boolean` conflates "already exists" and "upload failed"; a future refactor to `enum ImportResult` would clarify this, but it's outside the scope of this PR. **Skipped data lost on FAILED state** (Elicit) — acknowledged as a known limitation. The current PR scope is DONE-state display only. A follow-up issue could address preserving partial skipped lists on failure.

marcel added 1 commit 2026-05-18 15:20:30 +02:00

fix(import): add @Schema annotations and fix IOException test coverage

CI / Unit & Component Tests (pull_request) Successful in 3m2s

Details

CI / OCR Service Tests (pull_request) Successful in 20s

Details

CI / Backend Unit Tests (pull_request) Successful in 3m14s

Details

CI / fail2ban Regex (pull_request) Successful in 44s

Details

CI / Semgrep Security Scan (pull_request) Successful in 19s

Details

CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s

Details

53dba772ae

- Add @Schema(requiredMode = REQUIRED) to SkippedFile and ImportStatus
  record components so TypeScript codegen produces non-optional fields
  when generate:api is next run
- Extract openFileStream(File) as package-private method so the
  IOException path can be tested deterministically without relying on
  OS-level file permissions (which are bypassed when running as root)
- Replace assumeTrue-based IOException test with Mockito spy that stubs
  openFileStream — test now runs in CI unconditionally (45 tests, 0 skipped)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel referenced this pull request

2026-05-18 15:20:52 +02:00

feat(import): surface S3 upload failures in skippedFiles #619

marcel referenced this pull request

2026-05-18 15:20:58 +02:00

refactor(import): replace importSingleDocument boolean return with ImportResult enum #620

marcel referenced this pull request

2026-05-18 15:21:06 +02:00

bug(import): skipped file list is lost when import transitions to FAILED state #621

marcel commented

2026-05-18 15:30:59 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ⚠️ Approved with concerns

The implementation is clean and the TDD discipline shows — four targeted regression tests, each testing one behavior. A few things caught my eye.

Blockers

None.

Suggestions

1. openFileStream() visibility is package-private by design, but undocumented

InputStream openFileStream(File file) throws IOException {
    return new FileInputStream(file);
}

This is intentionally package-private to allow the Mockito spy in the IOException test. That's a legitimate TDD seam — but without any signal in the code (not even a comment or a test-seam annotation pattern), the next developer may make it private during a cleanup pass and break the IOException test. The intent should be self-documenting. Consider naming it openFileStreamForTesting (ugly) or adding a one-line "seam for testing" comment — the only type of comment that's justified here per our style guide.

2. skipped() method on a Java record — @JsonProperty + @Schema on a computed accessor

@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
@JsonProperty("skipped")
public int skipped() { return skippedFiles.size(); }

This works, but it mixes Jackson serialization concern into the record accessor. If skippedFiles is always present and required, the skipped field is purely a computed convenience for the client — consider whether it truly needs to be in the response at all, or whether the frontend can derive it from skippedFiles.length. If it stays, this is fine as-is, just flagging the coupling.

3. The {#each} key in the Svelte component uses filename as key

{#each importStatus.skippedFiles as { filename, reason } (filename)}

If two files in the same import have the same filename (which can happen if the same file appears twice in the spreadsheet), the key collision will cause Svelte to render only one entry. Using (filename + reason) or an index key (i) would be safer. This is an edge case in the import scenario but worth noting.

4. Frontend types.ts is manually maintained, not generated
The SkippedFile and updated ImportStatus types in frontend/src/routes/admin/system/types.ts are manually written. Since the backend ImportStatus record now has @Schema(requiredMode = REQUIRED) on all fields, the generated OpenAPI client should produce these types. Keeping a parallel manual type file means drift risk — if the backend shape changes, types.ts may not get updated. Is there a reason these aren't using the generated types from npm run generate:api?

5. Test helper buildMinimalImportXlsx — column 0 only
The helper creates a single-column XLSX (only the index column). The actual service reads many columns (colBox, colSender, etc.) via getCell(cells, colIndex). Since getCell returns an empty string for missing cells, this works, but it means the test exercises a stripped-down import that doesn't look like production data. For the magic-byte regression specifically, this is fine — just worth noting that the helper's realism is limited.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ⚠️ Approved with concerns** The implementation is clean and the TDD discipline shows — four targeted regression tests, each testing one behavior. A few things caught my eye. ### Blockers None. ### Suggestions **1. `openFileStream()` visibility is package-private by design, but undocumented** ```java InputStream openFileStream(File file) throws IOException { return new FileInputStream(file); } ``` This is intentionally package-private to allow the Mockito spy in the IOException test. That's a legitimate TDD seam — but without any signal in the code (not even a comment or a test-seam annotation pattern), the next developer may make it `private` during a cleanup pass and break the IOException test. The intent should be self-documenting. Consider naming it `openFileStreamForTesting` (ugly) or adding a one-line "seam for testing" comment — the only type of comment that's justified here per our style guide. **2. `skipped()` method on a Java record — `@JsonProperty` + `@Schema` on a computed accessor** ```java @Schema(requiredMode = Schema.RequiredMode.REQUIRED) @JsonProperty("skipped") public int skipped() { return skippedFiles.size(); } ``` This works, but it mixes Jackson serialization concern into the record accessor. If `skippedFiles` is always present and required, the `skipped` field is purely a computed convenience for the client — consider whether it truly needs to be in the response at all, or whether the frontend can derive it from `skippedFiles.length`. If it stays, this is fine as-is, just flagging the coupling. **3. The `{#each}` key in the Svelte component uses `filename` as key** ```svelte {#each importStatus.skippedFiles as { filename, reason } (filename)} ``` If two files in the same import have the same filename (which can happen if the same file appears twice in the spreadsheet), the key collision will cause Svelte to render only one entry. Using `(filename + reason)` or an index key `(i)` would be safer. This is an edge case in the import scenario but worth noting. **4. Frontend `types.ts` is manually maintained, not generated** The `SkippedFile` and updated `ImportStatus` types in `frontend/src/routes/admin/system/types.ts` are manually written. Since the backend `ImportStatus` record now has `@Schema(requiredMode = REQUIRED)` on all fields, the generated OpenAPI client should produce these types. Keeping a parallel manual type file means drift risk — if the backend shape changes, `types.ts` may not get updated. Is there a reason these aren't using the generated types from `npm run generate:api`? **5. Test helper `buildMinimalImportXlsx` — column 0 only** The helper creates a single-column XLSX (only the index column). The actual service reads many columns (`colBox`, `colSender`, etc.) via `getCell(cells, colIndex)`. Since `getCell` returns an empty string for missing cells, this works, but it means the test exercises a stripped-down import that doesn't look like production data. For the magic-byte regression specifically, this is fine — just worth noting that the helper's realism is limited.

marcel commented

2026-05-18 15:31:10 +02:00

🏗️ Markus Keller — Application Architect

Verdict: ⚠️ Approved with concerns

The structural changes are sound: new value types are nested records inside the owning service, the processRows → ProcessResult refactor keeps the logic self-contained, and no cross-domain boundaries are crossed. One architectural concern warrants a follow-up.

Blockers

None.

Concerns

Manual types.ts duplicates the OpenAPI-generated contract

frontend/src/routes/admin/system/types.ts is manually maintained:

export type SkippedFile = { filename: string; reason: string; };
export type ImportStatus = {
    state: 'IDLE' | 'RUNNING' | 'DONE' | 'FAILED';
    statusCode: string;
    processed: number;
    skipped: number;
    skippedFiles: SkippedFile[];
    startedAt: string | null;
};

The backend ImportStatus record is annotated with @Schema(requiredMode = REQUIRED) on all fields, which means npm run generate:api should already produce these types from the OpenAPI spec. Having two authoritative type sources — the generated types and this manual file — means a future backend change requires updating both. CLAUDE.md is explicit: "always run npm run generate:api after any backend model or endpoint change."

Either:

Switch the admin system components to use the generated types (preferred), or
Document why this route needs a local type file (e.g., the endpoint is not in the public API spec, only in the admin slice)

This isn't a blocker since the current PR is self-consistent, but the drift risk is real.

What Looks Good

No new packages or routes — no documentation updates triggered.
SkippedFile and ProcessResult as nested records — correctly scoped as implementation details of MassImportService. They don't escape to the controller or other domains.
skipped() computed from skippedFiles.size() — single source of truth; no risk of count/list mismatch.
@JsonIgnore String message preserved — the internal German diagnostic message stays off the API wire, consistent with the existing contract.
Fail-safe on IOException — skips the file rather than aborting the import. Appropriate behavior for a batch process where one bad file shouldn't block the rest.

## 🏗️ Markus Keller — Application Architect **Verdict: ⚠️ Approved with concerns** The structural changes are sound: new value types are nested records inside the owning service, the `processRows` → `ProcessResult` refactor keeps the logic self-contained, and no cross-domain boundaries are crossed. One architectural concern warrants a follow-up. ### Blockers None. ### Concerns **Manual `types.ts` duplicates the OpenAPI-generated contract** `frontend/src/routes/admin/system/types.ts` is manually maintained: ```typescript export type SkippedFile = { filename: string; reason: string; }; export type ImportStatus = { state: 'IDLE' | 'RUNNING' | 'DONE' | 'FAILED'; statusCode: string; processed: number; skipped: number; skippedFiles: SkippedFile[]; startedAt: string | null; }; ``` The backend `ImportStatus` record is annotated with `@Schema(requiredMode = REQUIRED)` on all fields, which means `npm run generate:api` should already produce these types from the OpenAPI spec. Having two authoritative type sources — the generated types and this manual file — means a future backend change requires updating both. CLAUDE.md is explicit: "always run `npm run generate:api` after any backend model or endpoint change." Either: - Switch the admin system components to use the generated types (preferred), or - Document why this route needs a local type file (e.g., the endpoint is not in the public API spec, only in the admin slice) This isn't a blocker since the current PR is self-consistent, but the drift risk is real. ### What Looks Good - **No new packages or routes** — no documentation updates triggered. - **`SkippedFile` and `ProcessResult` as nested records** — correctly scoped as implementation details of `MassImportService`. They don't escape to the controller or other domains. - **`skipped()` computed from `skippedFiles.size()`** — single source of truth; no risk of count/list mismatch. - **`@JsonIgnore String message` preserved** — the internal German diagnostic message stays off the API wire, consistent with the existing contract. - **Fail-safe on IOException** — skips the file rather than aborting the import. Appropriate behavior for a batch process where one bad file shouldn't block the rest.

marcel commented

2026-05-18 15:31:16 +02:00

🚀 Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

This PR touches no CI workflow files, no Docker Compose configuration, no infrastructure definitions, and no secrets handling. Nothing in my domain changed.

What I Checked

No new Docker services or volumes — no compose changes needed.
Filesystem access pattern — openFileStream(File file) opens files from the import directory, which is a host bind-mount into the container. This is consistent with how the service has always operated. No new mounts required.
No hardcoded credentials — the new SkippedFile and ProcessResult types carry no secrets.
No CI/CD files modified — existing test pipeline covers the new tests.
S3 interactions unchanged — the change prevents putObject from being called for invalid files. This reduces unnecessary object storage writes, which is a mild operational improvement.

No action required from me on this PR.

## 🚀 Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** This PR touches no CI workflow files, no Docker Compose configuration, no infrastructure definitions, and no secrets handling. Nothing in my domain changed. ### What I Checked - **No new Docker services or volumes** — no compose changes needed. - **Filesystem access pattern** — `openFileStream(File file)` opens files from the import directory, which is a host bind-mount into the container. This is consistent with how the service has always operated. No new mounts required. - **No hardcoded credentials** — the new `SkippedFile` and `ProcessResult` types carry no secrets. - **No CI/CD files modified** — existing test pipeline covers the new tests. - **S3 interactions unchanged** — the change prevents `putObject` from being called for invalid files. This reduces unnecessary object storage writes, which is a mild operational improvement. No action required from me on this PR.

marcel commented

2026-05-18 15:31:29 +02:00

📋 Elicit — Requirements Engineer

Verdict: ⚠️ Approved with concerns

The core requirement from #529 is met: files whose content does not match the PDF magic bytes are rejected before S3 upload, logged, and surfaced in the admin UI. The edge cases (short file, IOException) are handled. One gap in the specified behavior warrants discussion.

Concern: Skipped files are silently discarded on import failure

Scenario: An import batch starts, skips 3 invalid files (correctly), then hits an unrelated exception (e.g., database error, spreadsheet parse error) and transitions to FAILED state.

Current behavior in the FAILED branches:

currentStatus = new ImportStatus(State.FAILED, "IMPORT_FAILED_INTERNAL",
        "Fehler: " + e.getMessage(), 0, List.of(), currentStatus.startedAt());

The skippedFiles is reset to List.of(). Any files that were already validated and rejected are thrown away. The admin sees skipped: 0 even though 3 files were skipped before the failure.

Is this the intended behavior?
If the answer is "a failed import is all-or-nothing, so the skipped list doesn't matter," that's a valid product decision. But if the admin needs to know which files were rejected regardless of whether the import completed, the ProcessResult should be captured before the exception and included in the FAILED status. Neither behavior is wrong — but the current spec doesn't address this case, and the tests only cover the DONE path.

Suggested clarification: Add a test or a comment in the service that states the intended behavior for the FAILED path. If the skipped list should be preserved on partial failure, the fix is:

// capture partial result before exception propagates
ProcessResult partialResult = ...;
currentStatus = new ImportStatus(State.FAILED, ..., partialResult.processed(),
        partialResult.skippedFiles(), currentStatus.startedAt());

What Looks Good

reason as a typed string code (INVALID_PDF_SIGNATURE, FILE_READ_ERROR) with i18n mapping in the frontend — extensible if new rejection reasons are added later.
UI condition skipped > 0 — admin is not burdened with an empty warning section on clean imports.
Three i18n locales covered (de/en/es) — no localization debt.
Short-file edge case tested — the readNBytes(4) returning fewer than 4 bytes is explicitly covered.

## 📋 Elicit — Requirements Engineer **Verdict: ⚠️ Approved with concerns** The core requirement from #529 is met: files whose content does not match the PDF magic bytes are rejected before S3 upload, logged, and surfaced in the admin UI. The edge cases (short file, IOException) are handled. One gap in the specified behavior warrants discussion. ### Concern: Skipped files are silently discarded on import failure **Scenario:** An import batch starts, skips 3 invalid files (correctly), then hits an unrelated exception (e.g., database error, spreadsheet parse error) and transitions to `FAILED` state. Current behavior in the `FAILED` branches: ```java currentStatus = new ImportStatus(State.FAILED, "IMPORT_FAILED_INTERNAL", "Fehler: " + e.getMessage(), 0, List.of(), currentStatus.startedAt()); ``` The `skippedFiles` is reset to `List.of()`. Any files that were already validated and rejected are thrown away. The admin sees `skipped: 0` even though 3 files were skipped before the failure. **Is this the intended behavior?** If the answer is "a failed import is all-or-nothing, so the skipped list doesn't matter," that's a valid product decision. But if the admin needs to know which files were rejected regardless of whether the import completed, the `ProcessResult` should be captured before the exception and included in the `FAILED` status. Neither behavior is wrong — but the current spec doesn't address this case, and the tests only cover the `DONE` path. **Suggested clarification:** Add a test or a comment in the service that states the intended behavior for the `FAILED` path. If the skipped list should be preserved on partial failure, the fix is: ```java // capture partial result before exception propagates ProcessResult partialResult = ...; currentStatus = new ImportStatus(State.FAILED, ..., partialResult.processed(), partialResult.skippedFiles(), currentStatus.startedAt()); ``` ### What Looks Good - **`reason` as a typed string code** (`INVALID_PDF_SIGNATURE`, `FILE_READ_ERROR`) with i18n mapping in the frontend — extensible if new rejection reasons are added later. - **UI condition `skipped > 0`** — admin is not burdened with an empty warning section on clean imports. - **Three i18n locales covered** (de/en/es) — no localization debt. - **Short-file edge case tested** — the `readNBytes(4)` returning fewer than 4 bytes is explicitly covered.

marcel commented

2026-05-18 15:31:43 +02:00

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: ✅ Approved

This PR implements the intended security control correctly. The magic byte check is a meaningful defense against disguised non-PDF files being stored in S3. My review found no exploitable vulnerabilities.

What I Verified

1. Magic byte check is correct and placed at the right boundary

private boolean isPdfMagicBytes(File file) throws IOException {
    try (InputStream is = openFileStream(file)) {
        byte[] header = is.readNBytes(4);
        return header.length == 4
                && header[0] == 0x25  // %
                && header[1] == 0x50  // P
                && header[2] == 0x44  // D
                && header[3] == 0x46; // F
    }
}

Reads exactly 4 bytes — minimal attack surface on the file handle.
Requires exactly 4 bytes returned (header.length == 4) — correctly rejects files shorter than 4 bytes. readNBytes(4) on a 3-byte file returns a 3-byte array, not a padding-filled 4-byte array.
Check happens before s3Client.putObject — the malicious file never reaches object storage.
Uses try-with-resources — no file descriptor leak.

2. Fail-safe on IOException

} catch (IOException e) {
    log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e);
    skippedFiles.add(new SkippedFile(filename, "FILE_READ_ERROR"));
    continue;
}

Fail-closed behavior: an unreadable file is skipped, not uploaded. This is correct.

3. Logging uses SLF4J parameterized placeholders throughout

log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename);
log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e);

No + string concatenation in log statements — Log4Shell-safe.

4. filename from spreadsheet is user-controlled data displayed in the frontend
The filename field originates from the spreadsheet and is rendered in the Svelte template:

<li class="font-mono text-xs text-ink-2">{filename} — {reasonLabel(reason)}</li>

Svelte auto-escapes template interpolations — no XSS risk.

Security Smell (non-blocking)

TOCTOU (Time-of-Check, Time-of-Use) between magic byte read and S3 upload

The file is opened twice: once to check the magic bytes (then closed), and again inside the S3 upload path. Between these two opens, a file on the import share could theoretically be replaced with a malicious file that passes the check.

Severity: Negligible in the current threat model. The import share is a local server directory, not a network-exposed upload endpoint. An attacker would need write access to the import directory, at which point they already have broader compromise. Not a fix request — just documenting the theoretical race condition.

Defense in depth note: Magic byte validation is an input validation layer, not a content-safety guarantee. A PDF that contains embedded JavaScript or exploits Adobe Reader is still a %PDF-prefixed file. The current PR correctly scopes itself to "is this a PDF-shaped file" rather than "is this a safe PDF." That's the right scope for this issue.

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** This PR implements the intended security control correctly. The magic byte check is a meaningful defense against disguised non-PDF files being stored in S3. My review found no exploitable vulnerabilities. ### What I Verified **1. Magic byte check is correct and placed at the right boundary** ```java private boolean isPdfMagicBytes(File file) throws IOException { try (InputStream is = openFileStream(file)) { byte[] header = is.readNBytes(4); return header.length == 4 && header[0] == 0x25 // % && header[1] == 0x50 // P && header[2] == 0x44 // D && header[3] == 0x46; // F } } ``` - Reads exactly 4 bytes — minimal attack surface on the file handle. - Requires exactly 4 bytes returned (`header.length == 4`) — correctly rejects files shorter than 4 bytes. `readNBytes(4)` on a 3-byte file returns a 3-byte array, not a padding-filled 4-byte array. - Check happens **before** `s3Client.putObject` — the malicious file never reaches object storage. - Uses try-with-resources — no file descriptor leak. **2. Fail-safe on IOException** ```java } catch (IOException e) { log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e); skippedFiles.add(new SkippedFile(filename, "FILE_READ_ERROR")); continue; } ``` Fail-closed behavior: an unreadable file is skipped, not uploaded. This is correct. **3. Logging uses SLF4J parameterized placeholders throughout** ```java log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename); log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e); ``` No `+` string concatenation in log statements — Log4Shell-safe. **4. `filename` from spreadsheet is user-controlled data displayed in the frontend** The `filename` field originates from the spreadsheet and is rendered in the Svelte template: ```svelte <li class="font-mono text-xs text-ink-2">{filename} — {reasonLabel(reason)}</li> ``` Svelte auto-escapes template interpolations — no XSS risk. ### Security Smell (non-blocking) **TOCTOU (Time-of-Check, Time-of-Use) between magic byte read and S3 upload** The file is opened twice: once to check the magic bytes (then closed), and again inside the S3 upload path. Between these two opens, a file on the import share could theoretically be replaced with a malicious file that passes the check. **Severity:** Negligible in the current threat model. The import share is a local server directory, not a network-exposed upload endpoint. An attacker would need write access to the import directory, at which point they already have broader compromise. Not a fix request — just documenting the theoretical race condition. **Defense in depth note:** Magic byte validation is an input validation layer, not a content-safety guarantee. A PDF that contains embedded JavaScript or exploits Adobe Reader is still a `%PDF`-prefixed file. The current PR correctly scopes itself to "is this a PDF-shaped file" rather than "is this a safe PDF." That's the right scope for this issue.

marcel commented

2026-05-18 15:32:00 +02:00

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: ⚠️ Approved with concerns

The regression tests are well-structured and cover the key behaviors. A few gaps and style observations.

Concerns

1. Three tests share one setup helper — three test names, one assertion pattern

void runImportAsync_uploadsValidPdf_andSkipsFakeOne(...)     // verifies s3Client call count
void runImportAsync_setsSkippedCount_toOne_whenOneFakeFile(...)  // verifies skipped()
void runImportAsync_includesRejectedFilename_inSkippedFiles(...)  // verifies filename

All three call setupOneValidOneFakeImport(tempDir) and service.runImportAsync(). The behaviors are correctly separated (one assertion per test — ✅), but because they share identical runImportAsync() invocations with the same setup, they could be written with @ParameterizedTest or grouped into a @Nested class. This is a style preference, not a bug — but grouping them would make it clear they're variations of the same scenario.

2. getByTestId in Svelte tests — prefer semantic queries

const { getByTestId } = render(ImportStatusCard, ...);
await expect.element(getByTestId('skipped-count')).toHaveTextContent('3');

getByTestId couples the test to an implementation detail (data-testid). Prefer getByText, getByRole, or pattern-matching: getByText('3') would find the skipped count without needing a test ID on the element. The data-testid="skipped-count" was likely added specifically for this test, which is fine — but if the number appears only once in the DONE state, a semantic query is more resilient.

3. Missing: test for behavior when skippedFiles is non-empty but import state is not DONE
The Svelte component conditionally renders the warning only in the DONE block:

{#if importStatus.state === 'DONE'}
  ...
  {#if importStatus.skipped > 0}
    <details ...>

There's no test verifying that a RUNNING status with skipped: 2 does NOT show the warning. Given that the backend currently never populates skippedFiles in RUNNING state this is a theoretical gap, but a guard test would prevent a future regression if the running status ever gains partial skip counts.

4. IOException test uses spy(service) directly on the Spring-wired bean

MassImportService spyService = spy(service);
doThrow(new java.io.IOException("simulated read error")).when(spyService).openFileStream(any(File.class));
spyService.runImportAsync();

This works because openFileStream is package-private. However, the spyService is a detached instance — @Autowired dependencies from the original service bean (S3 client, documentService, etc.) are carried over by Mockito's spy. This is correct, but fragile: if MassImportService gains a @Transactional-proxied bean instead of a raw class, the spy may not intercept correctly. The current approach is fine for now but worth noting.

What Looks Good

setupOneValidOneFakeImport and buildMinimalImportXlsx helpers follow the factory function pattern — tests are concise and readable.
@TempDir Path tempDir for JUnit-managed temp dirs — no manual cleanup.
lenient().when(...) in the short-file and IOException tests correctly avoids unnecessary stubbing failures.
3-byte header in the short-file test ({0x25, 0x50, 0x44}) precisely hits the boundary condition.
Existing AdminControllerTest updated to match the new constructor signature — no test drift.

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ⚠️ Approved with concerns** The regression tests are well-structured and cover the key behaviors. A few gaps and style observations. ### Concerns **1. Three tests share one setup helper — three test names, one assertion pattern** ```java void runImportAsync_uploadsValidPdf_andSkipsFakeOne(...) // verifies s3Client call count void runImportAsync_setsSkippedCount_toOne_whenOneFakeFile(...) // verifies skipped() void runImportAsync_includesRejectedFilename_inSkippedFiles(...) // verifies filename ``` All three call `setupOneValidOneFakeImport(tempDir)` and `service.runImportAsync()`. The behaviors are correctly separated (one assertion per test — ✅), but because they share identical `runImportAsync()` invocations with the same setup, they could be written with `@ParameterizedTest` or grouped into a `@Nested` class. This is a style preference, not a bug — but grouping them would make it clear they're variations of the same scenario. **2. `getByTestId` in Svelte tests — prefer semantic queries** ```typescript const { getByTestId } = render(ImportStatusCard, ...); await expect.element(getByTestId('skipped-count')).toHaveTextContent('3'); ``` `getByTestId` couples the test to an implementation detail (`data-testid`). Prefer `getByText`, `getByRole`, or pattern-matching: `getByText('3')` would find the skipped count without needing a test ID on the element. The `data-testid="skipped-count"` was likely added specifically for this test, which is fine — but if the number appears only once in the DONE state, a semantic query is more resilient. **3. Missing: test for behavior when `skippedFiles` is non-empty but import state is not `DONE`** The Svelte component conditionally renders the warning only in the DONE block: ```svelte {#if importStatus.state === 'DONE'} ... {#if importStatus.skipped > 0} <details ...> ``` There's no test verifying that a `RUNNING` status with `skipped: 2` does NOT show the warning. Given that the backend currently never populates `skippedFiles` in RUNNING state this is a theoretical gap, but a guard test would prevent a future regression if the running status ever gains partial skip counts. **4. IOException test uses `spy(service)` directly on the Spring-wired bean** ```java MassImportService spyService = spy(service); doThrow(new java.io.IOException("simulated read error")).when(spyService).openFileStream(any(File.class)); spyService.runImportAsync(); ``` This works because `openFileStream` is package-private. However, the `spyService` is a detached instance — `@Autowired` dependencies from the original `service` bean (S3 client, documentService, etc.) are carried over by Mockito's spy. This is correct, but fragile: if `MassImportService` gains a `@Transactional`-proxied bean instead of a raw class, the spy may not intercept correctly. The current approach is fine for now but worth noting. ### What Looks Good - `setupOneValidOneFakeImport` and `buildMinimalImportXlsx` helpers follow the factory function pattern — tests are concise and readable. - `@TempDir Path tempDir` for JUnit-managed temp dirs — no manual cleanup. - `lenient().when(...)` in the short-file and IOException tests correctly avoids unnecessary stubbing failures. - 3-byte header in the short-file test (`{0x25, 0x50, 0x44}`) precisely hits the boundary condition. - Existing `AdminControllerTest` updated to match the new constructor signature — no test drift.

marcel commented

2026-05-18 15:32:13 +02:00

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Verdict: ⚠️ Approved with concerns

The native <details>/<summary> pattern is a solid choice — it's keyboard-accessible out of the box, screen-reader friendly, and doesn't require JavaScript. Three issues for the 60+ audience.

Concerns

1. text-xs for filenames — below the senior minimum (High)

<li class="font-mono text-xs text-ink-2">{filename} — {reasonLabel(reason)}</li>

text-xs = 12px. This is the absolute floor per our guidelines. For the 60+ transcriber audience on a laptop, monospace text at 12px is genuinely difficult to read. The filenames are technical content but still need to be legible.

Fix: Use text-sm (14px) at minimum:

<li class="font-mono text-sm text-ink-2">{filename} — {reasonLabel(reason)}</li>

2. Chevron animation doesn't respect prefers-reduced-motion (Medium)

details[open] .details-chevron {
    transform: rotate(90deg);
}

The transition-transform Tailwind class on the SVG means the rotation animates. There's no @media (prefers-reduced-motion: reduce) guard. Users with vestibular disorders or motion sensitivity get an unnecessary animation.

Fix: Add to the <style> block:

@media (prefers-reduced-motion: reduce) {
    .details-chevron {
        transition: none;
    }
}

Or in Tailwind: replace transition-transform with motion-safe:transition-transform.

3. Warning color contrast on bg-warning/10 background (Medium — verify)

<details class="... border-warning/40 bg-warning/10 ... text-warning">

The text-warning color rendered on bg-warning/10 (10% opacity amber background) may not meet WCAG AA (4.5:1 for normal text). The text-xs font-bold uppercase tracking-widest section label and text-base font-bold count both use text-warning. The bold weight helps, but please verify the contrast ratio with a tool — amber on a near-white background frequently fails.

What Looks Good

<details>/<summary> semantic pattern — native keyboard accessibility, browser-managed aria-expanded, no JS required. Excellent choice over a custom accordion.
Chevron + count + label combination — satisfies "never color alone." The amber section communicates via color, count number, and the "übersprungen" text label together.
Brand typography — font-sans text-xs font-bold tracking-widest uppercase for the label matches the section title pattern from the card style guide.
skipped > 0 condition — no empty warning section cluttering the DONE state for clean imports.
data-testid="skipped-count" — allows targeted testing without relying on text content.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist **Verdict: ⚠️ Approved with concerns** The native `<details>/<summary>` pattern is a solid choice — it's keyboard-accessible out of the box, screen-reader friendly, and doesn't require JavaScript. Three issues for the 60+ audience. ### Concerns **1. `text-xs` for filenames — below the senior minimum (High)** ```svelte <li class="font-mono text-xs text-ink-2">{filename} — {reasonLabel(reason)}</li> ``` `text-xs` = 12px. This is the absolute floor per our guidelines. For the 60+ transcriber audience on a laptop, monospace text at 12px is genuinely difficult to read. The filenames are technical content but still need to be legible. **Fix:** Use `text-sm` (14px) at minimum: ```svelte <li class="font-mono text-sm text-ink-2">{filename} — {reasonLabel(reason)}</li> ``` **2. Chevron animation doesn't respect `prefers-reduced-motion` (Medium)** ```css details[open] .details-chevron { transform: rotate(90deg); } ``` The `transition-transform` Tailwind class on the SVG means the rotation animates. There's no `@media (prefers-reduced-motion: reduce)` guard. Users with vestibular disorders or motion sensitivity get an unnecessary animation. **Fix:** Add to the `<style>` block: ```css @media (prefers-reduced-motion: reduce) { .details-chevron { transition: none; } } ``` Or in Tailwind: replace `transition-transform` with `motion-safe:transition-transform`. **3. Warning color contrast on `bg-warning/10` background (Medium — verify)** ```svelte <details class="... border-warning/40 bg-warning/10 ... text-warning"> ``` The `text-warning` color rendered on `bg-warning/10` (10% opacity amber background) may not meet WCAG AA (4.5:1 for normal text). The `text-xs font-bold uppercase tracking-widest` section label and `text-base font-bold` count both use `text-warning`. The bold weight helps, but please verify the contrast ratio with a tool — amber on a near-white background frequently fails. ### What Looks Good - **`<details>/<summary>` semantic pattern** — native keyboard accessibility, browser-managed `aria-expanded`, no JS required. Excellent choice over a custom accordion. - **Chevron + count + label combination** — satisfies "never color alone." The amber section communicates via color, count number, and the "übersprungen" text label together. - **Brand typography** — `font-sans text-xs font-bold tracking-widest uppercase` for the label matches the section title pattern from the card style guide. - **`skipped > 0` condition** — no empty warning section cluttering the DONE state for clean imports. - **`data-testid="skipped-count"`** — allows targeted testing without relying on text content.

marcel added 2 commits 2026-05-18 16:32:21 +02:00

test(admin): add regression test for skipped section hidden during RUNNING 93da5914a8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(admin): address round-2 review concerns on ImportStatusCard

CI / Unit & Component Tests (pull_request) Successful in 3m9s

Details

CI / OCR Service Tests (pull_request) Successful in 20s

Details

CI / Backend Unit Tests (pull_request) Successful in 3m1s

Details

CI / fail2ban Regex (pull_request) Successful in 42s

Details

CI / Semgrep Security Scan (pull_request) Successful in 19s

Details

CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s

Details

b38b8c39b8

- Use loop index as each key (handles duplicate filenames)
- Increase skipped filename font from text-xs to text-sm
- Add motion-safe guard to details chevron transition
- Replace text-warning with text-amber-900 to meet WCAG AA contrast

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel commented

2026-05-18 18:25:33 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ✅ Approved

What's done well

isPdfMagicBytes() has one responsibility, one return — clean
openFileStream() extracted specifically as a testability seam — disciplined TDD thinking
Guard clause with continue in processRows avoids nesting correctly
reasonLabel() extracted from template markup — business logic out of Svelte template, in script
SLF4J {} placeholders throughout — no string concatenation in log calls
@JsonProperty("skipped") on the derived method is a tidy trick to expose the computed field without duplicating the list

Suggestions (non-blocking)

{#each ... as skipped, i (i)} — the key is the array index. Since no two skipped files in one import run have the same filename, keying by skipped.filename is more semantically correct:

{#each importStatus.skippedFiles as skipped (skipped.filename)}

Index keys work for pure-display lists that don't reorder, so this is minor — but Felix would use the stable key.

importSingleDocument return type — the method name reads as a command ("import this document"), but it now returns a boolean. A name like tryImportSingleDocument or simply changing nothing is fine, but the mismatch between command-style name and query-style return is mildly surprising. Not a blocker.

buildMinimalImportXlsx helper — the helper only populates cell 0 (the index column). Production code reads additional columns (colBox, colDate, etc.) via getCell(), which silently returns "" for missing cells. This is fine because the import handles empty metadata, but a reader encountering failing tests down the road might be confused by the minimal fixture. Already reviewed and accepted the behavior — just noting it for future maintainers.

Blockers

None.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** ### What's done well - `isPdfMagicBytes()` has one responsibility, one return — clean - `openFileStream()` extracted specifically as a testability seam — disciplined TDD thinking - Guard clause with `continue` in `processRows` avoids nesting correctly - `reasonLabel()` extracted from template markup — business logic out of Svelte template, in script - SLF4J `{}` placeholders throughout — no string concatenation in log calls - `@JsonProperty("skipped")` on the derived method is a tidy trick to expose the computed field without duplicating the list ### Suggestions (non-blocking) **`{#each ... as skipped, i (i)}`** — the key is the array index. Since no two skipped files in one import run have the same filename, keying by `skipped.filename` is more semantically correct: ```svelte {#each importStatus.skippedFiles as skipped (skipped.filename)} ``` Index keys work for pure-display lists that don't reorder, so this is minor — but Felix would use the stable key. **`importSingleDocument` return type** — the method name reads as a command ("import this document"), but it now returns a `boolean`. A name like `tryImportSingleDocument` or simply changing nothing is fine, but the mismatch between command-style name and query-style return is mildly surprising. Not a blocker. **`buildMinimalImportXlsx` helper** — the helper only populates cell 0 (the index column). Production code reads additional columns (colBox, colDate, etc.) via `getCell()`, which silently returns `""` for missing cells. This is fine because the import handles empty metadata, but a reader encountering failing tests down the road might be confused by the minimal fixture. Already reviewed and accepted the behavior — just noting it for future maintainers. ### Blockers None.

marcel commented

2026-05-18 18:25:43 +02:00

🏛️ Markus Keller — Senior Application Architect

Verdict: ✅ Approved

Architecture checklist

Layering: MassImportService validates before delegating to importSingleDocument, which delegates to documentService. No layer violations introduced.

Module boundaries: SkippedFile and ProcessResult are package-scoped inner types of MassImportService — correct. They don't leak into other domains.

No cross-domain repository access: ✅ — documentService is called, never documentRepository directly.

Doc update table (per review protocol):

Trigger	Required update	Status
New Flyway migration	DB diagrams	Not applicable — no schema change
New backend package	CLAUDE.md package table	Not applicable
New SvelteKit route	CLAUDE.md route table	Not applicable
New ErrorCode	CLAUDE.md + ARCHITECTURE.md	Not applicable — reason codes are untyped strings, not `ErrorCode` enum values (see note below)

No documentation blockers.

Architectural note (non-blocking)

SkippedFile.reason is a raw String. The values "INVALID_PDF_SIGNATURE" and "FILE_READ_ERROR" are de-facto API contracts — the frontend's reasonLabel() switches on them. As more rejection reasons accumulate, an untyped string becomes harder to track across backend and frontend.

Options:

Keep as-is (KISS — only two reasons, very stable)
Promote to an enum (SkipReason) and add it to the API spec — correct but adds ceremony for two values
Add to ErrorCode enum — but these aren't HTTP error codes, so semantically wrong

Option 1 is fine for the current scope. If a third reason is added in the future, an enum becomes the clear choice. An ADR noting this decision would be valuable but is not required for this PR.

Blockers

None.

## 🏛️ Markus Keller — Senior Application Architect **Verdict: ✅ Approved** ### Architecture checklist **Layering:** `MassImportService` validates before delegating to `importSingleDocument`, which delegates to `documentService`. No layer violations introduced. **Module boundaries:** `SkippedFile` and `ProcessResult` are package-scoped inner types of `MassImportService` — correct. They don't leak into other domains. **No cross-domain repository access:** ✅ — `documentService` is called, never `documentRepository` directly. **Doc update table (per review protocol):** | Trigger | Required update | Status | |---|---|---| | New Flyway migration | DB diagrams | Not applicable — no schema change | | New backend package | CLAUDE.md package table | Not applicable | | New SvelteKit route | CLAUDE.md route table | Not applicable | | New ErrorCode | CLAUDE.md + ARCHITECTURE.md | Not applicable — reason codes are untyped strings, not `ErrorCode` enum values (see note below) | No documentation blockers. ### Architectural note (non-blocking) `SkippedFile.reason` is a raw `String`. The values `"INVALID_PDF_SIGNATURE"` and `"FILE_READ_ERROR"` are de-facto API contracts — the frontend's `reasonLabel()` switches on them. As more rejection reasons accumulate, an untyped string becomes harder to track across backend and frontend. **Options:** 1. Keep as-is (KISS — only two reasons, very stable) 2. Promote to an enum (`SkipReason`) and add it to the API spec — correct but adds ceremony for two values 3. Add to `ErrorCode` enum — but these aren't HTTP error codes, so semantically wrong Option 1 is fine for the current scope. If a third reason is added in the future, an enum becomes the clear choice. An ADR noting this decision would be valuable but is not required for this PR. ### Blockers None.

marcel commented

2026-05-18 18:25:49 +02:00

🔧 Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

This PR touches no infrastructure files — no Compose changes, no CI workflow changes, no Docker image updates.

CI impact check:

New tests use @TempDir from JUnit 5 for temporary file creation — JVM manages cleanup, no disk leaks in the CI runner
buildMinimalImportXlsx writes to the @TempDir path, not the project directory — safe
No additional containers or external services introduced by new tests
The spy(service) pattern in the IOException test runs entirely in-process — no network or filesystem side effects

What's clean: The openFileStream() seam means the IOException test doesn't need a mock filesystem, just a Mockito spy. That keeps the test fast and deterministic in CI.

Nothing to flag. The test suite for this feature should run in the existing unit test budget without adding meaningful CI time.

Blockers

None.

## 🔧 Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** This PR touches no infrastructure files — no Compose changes, no CI workflow changes, no Docker image updates. **CI impact check:** - New tests use `@TempDir` from JUnit 5 for temporary file creation — JVM manages cleanup, no disk leaks in the CI runner - `buildMinimalImportXlsx` writes to the `@TempDir` path, not the project directory — safe - No additional containers or external services introduced by new tests - The `spy(service)` pattern in the IOException test runs entirely in-process — no network or filesystem side effects **What's clean:** The `openFileStream()` seam means the IOException test doesn't need a mock filesystem, just a Mockito spy. That keeps the test fast and deterministic in CI. Nothing to flag. The test suite for this feature should run in the existing unit test budget without adding meaningful CI time. ### Blockers None.

marcel commented

2026-05-18 18:26:01 +02:00

📋 Elicit — Requirements Engineer

Verdict: ✅ Approved with open questions for the backlog

The implementation correctly satisfies the stated requirement: reject files that do not begin with the PDF magic bytes (%PDF) before uploading to S3. The API shape decision (separate skipped count + skippedFiles list) was Marcel's explicit decision per the PR description — that's clear.

Acceptance criteria check

Criterion	Status
Valid PDFs are uploaded to S3	✅ Covered by test `runImportAsync_uploadsValidPdf_andSkipsFakeOne`
Non-PDFs are rejected before S3 upload	✅ Covered
Skipped files appear in `skippedFiles` list	✅ Covered
Files shorter than 4 bytes are rejected	✅ Covered
Admin UI shows skipped count in DONE state	✅ Covered by frontend tests
Skipped section hidden when skipped = 0	✅ Covered
Skipped section hidden when state ≠ DONE	✅ Covered for RUNNING; FAILED state implicit but untested

Open questions (backlog candidates, not blockers)

OQ-1: Skipped file persistence across server restarts
currentStatus is an in-memory volatile field. A server restart clears the skipped file list. Is this acceptable, or should skipped files be persisted (e.g. logged to a database table or a file)? For a family archive with infrequent imports, in-memory is probably fine — but the admin has no historical record of past skipped files beyond log files.

OQ-2: Reason code for files shorter than 4 bytes
Files with fewer than 4 bytes get reason "INVALID_PDF_SIGNATURE" (because the magic bytes check returns false). A separate reason like "FILE_TOO_SHORT" might help admins distinguish "wrong file type" from "corrupted/empty file." Consider whether this matters for the admin's workflow.

OQ-3: No notification for metadata-only rows
Rows where the file doesn't exist on disk (fileOnDisk.isEmpty()) skip the magic bytes check and proceed. This is correct — no file to validate. The requirement should explicitly state this is by design.

Blockers

None.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved with open questions for the backlog** The implementation correctly satisfies the stated requirement: reject files that do not begin with the PDF magic bytes (`%PDF`) before uploading to S3. The API shape decision (separate `skipped` count + `skippedFiles` list) was Marcel's explicit decision per the PR description — that's clear. ### Acceptance criteria check | Criterion | Status | |---|---| | Valid PDFs are uploaded to S3 | ✅ Covered by test `runImportAsync_uploadsValidPdf_andSkipsFakeOne` | | Non-PDFs are rejected before S3 upload | ✅ Covered | | Skipped files appear in `skippedFiles` list | ✅ Covered | | Files shorter than 4 bytes are rejected | ✅ Covered | | Admin UI shows skipped count in DONE state | ✅ Covered by frontend tests | | Skipped section hidden when skipped = 0 | ✅ Covered | | Skipped section hidden when state ≠ DONE | ✅ Covered for RUNNING; FAILED state implicit but untested | ### Open questions (backlog candidates, not blockers) **OQ-1: Skipped file persistence across server restarts** `currentStatus` is an in-memory volatile field. A server restart clears the skipped file list. Is this acceptable, or should skipped files be persisted (e.g. logged to a database table or a file)? For a family archive with infrequent imports, in-memory is probably fine — but the admin has no historical record of past skipped files beyond log files. **OQ-2: Reason code for files shorter than 4 bytes** Files with fewer than 4 bytes get reason `"INVALID_PDF_SIGNATURE"` (because the magic bytes check returns `false`). A separate reason like `"FILE_TOO_SHORT"` might help admins distinguish "wrong file type" from "corrupted/empty file." Consider whether this matters for the admin's workflow. **OQ-3: No notification for metadata-only rows** Rows where the file doesn't exist on disk (`fileOnDisk.isEmpty()`) skip the magic bytes check and proceed. This is correct — no file to validate. The requirement should explicitly state this is by design. ### Blockers None.

marcel commented

2026-05-18 18:26:16 +02:00

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: ✅ Approved

This PR addresses a real threat correctly: an admin could accidentally (or via a compromised spreadsheet) include non-PDF files in an import batch. Validating file content before S3 upload is the right defense.

What's done correctly

Content-based validation — reads the first 4 bytes from disk, not the filename or a Content-Type header. Both of those are trivially spoofed. Magic bytes are harder to fake accidentally.

SLF4J {} placeholders for filenames in log lines:

log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename)

Filename comes from the spreadsheet column (admin-controlled input). The {} placeholder prevents JNDI lookup injection (the Log4Shell class of vulnerabilities). ✅

try-with-resources on InputStream — no file handle leak, even on exception path. ✅

Controlled reason codes — "INVALID_PDF_SIGNATURE" and "FILE_READ_ERROR" in SkippedFile.reason are string literals from the service code, not reflected from user input. No reflected XSS risk when the admin UI renders these. ✅

Fails closed on IOException — a file that cannot be read is skipped (not uploaded). Correct behavior: deny by default when uncertain. ✅

Known limitations (not defects for this threat model)

Polyglot files: A file starting with %PDF but crafted to be malicious in another format (PDF/ZIP polyglot, for example) passes this check. For the family archive threat model — preventing accidental non-PDF uploads from family members — magic byte validation is sufficient. Full structural PDF validation (e.g. Apache PDFBox) would catch this but is out of scope.

TOCTOU window: Between the magic bytes check and the S3 putObject call, a file on disk could theoretically be swapped. This requires write access to the import directory, which means the attacker already has local system access — a stronger control than this PR should address.

Minor observation

openFileStream() is package-private by omission (no access modifier) rather than private. This is intentional — it's the testability seam used by the spy-based IOException test. A future reviewer might make it private "to reduce visibility" and break the test without understanding why. This is a case where a single-line comment is justified:

// package-private: Mockito spy in tests can override to inject IOException
InputStream openFileStream(File file) throws IOException {

CLAUDE.md says add comments only when the WHY is non-obvious — this qualifies, since removing the seam would look like a safe refactor.

Blockers

None. Security fix is correct and well-implemented.

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** This PR addresses a real threat correctly: an admin could accidentally (or via a compromised spreadsheet) include non-PDF files in an import batch. Validating file content before S3 upload is the right defense. ### What's done correctly **Content-based validation** — reads the first 4 bytes from disk, not the filename or a `Content-Type` header. Both of those are trivially spoofed. Magic bytes are harder to fake accidentally. **SLF4J `{}` placeholders for filenames in log lines:** ```java log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename) ``` Filename comes from the spreadsheet column (admin-controlled input). The `{}` placeholder prevents JNDI lookup injection (the Log4Shell class of vulnerabilities). ✅ **`try-with-resources` on InputStream** — no file handle leak, even on exception path. ✅ **Controlled reason codes** — `"INVALID_PDF_SIGNATURE"` and `"FILE_READ_ERROR"` in `SkippedFile.reason` are string literals from the service code, not reflected from user input. No reflected XSS risk when the admin UI renders these. ✅ **Fails closed on IOException** — a file that cannot be read is skipped (not uploaded). Correct behavior: deny by default when uncertain. ✅ ### Known limitations (not defects for this threat model) **Polyglot files:** A file starting with `%PDF` but crafted to be malicious in another format (PDF/ZIP polyglot, for example) passes this check. For the family archive threat model — preventing accidental non-PDF uploads from family members — magic byte validation is sufficient. Full structural PDF validation (e.g. Apache PDFBox) would catch this but is out of scope. **TOCTOU window:** Between the magic bytes check and the S3 `putObject` call, a file on disk could theoretically be swapped. This requires write access to the import directory, which means the attacker already has local system access — a stronger control than this PR should address. ### Minor observation `openFileStream()` is package-private by omission (no access modifier) rather than `private`. This is intentional — it's the testability seam used by the `spy`-based IOException test. A future reviewer might make it `private` "to reduce visibility" and break the test without understanding why. This is a case where a single-line comment is justified: ```java // package-private: Mockito spy in tests can override to inject IOException InputStream openFileStream(File file) throws IOException { ``` CLAUDE.md says add comments only when the WHY is non-obvious — this qualifies, since removing the seam would look like a safe refactor. ### Blockers None. Security fix is correct and well-implemented.

marcel commented

2026-05-18 18:26:31 +02:00

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: ⚠️ Approved with concerns

The new tests are well-structured and cover the right scenarios. The setupOneValidOneFakeImport helper correctly DRYs up three tests. Real filesystem I/O via @TempDir is the right choice — no mocked filesystem needed. The openFileStream seam + spy is a clean approach for the IOException case.

Blockers

None.

Concerns

1. PR description test counts don't match the diff

The PR body says:

"Four new regression tests cover the one-upload-one-skip scenario..."
"3 new skipped-display tests pass alongside 9 existing tests"

Actual count in the diff:

Backend: 5 new tests (_uploadsValidPdf, _setsSkippedCount, _includesRejectedFilename, _skipsFile_whenShorterThanFourBytes, _skipsFile_whenMagicBytesCheckThrowsIOException)
Frontend: 4 new tests (shows skipped count, shows skipped filenames, does not show skipped section when DONE and skipped is 0, does not show skipped section when RUNNING)

Minor discrepancy, but test plans should match implementation.

2. FAILED state not explicitly tested in the frontend

The frontend tests cover DONE (shown), DONE+skipped=0 (hidden), and RUNNING (hidden). The FAILED state is not explicitly tested. The implementation correctly hides the skipped section when state is FAILED because the {#if} block is inside the DONE conditional — but this invariant isn't documented by a test.

Suggested addition:

it('does not show skipped section when FAILED even with skipped > 0', async () => {
    const { getByTestId } = render(ImportStatusCard, {
        props: {
            importStatus: makeStatus({
                state: 'FAILED',
                statusCode: 'IMPORT_FAILED_INTERNAL',
                skipped: 1,
                skippedFiles: [{ filename: 'bad.pdf', reason: 'INVALID_PDF_SIGNATURE' }]
            }),
            ontrigger: () => {}
        }
    });
    await expect.element(getByTestId('skipped-count')).not.toBeInTheDocument();
});

3. reasonLabel fallback not tested

reasonLabel('UNKNOWN_CODE') returns 'UNKNOWN_CODE' — this is the correct forward-compatible behavior when new reason codes are added to the backend before the frontend is updated. The fallback branch has no test. A one-liner would close this gap:

it('returns raw code for unknown reason codes', () => {
    expect(reasonLabel('SOME_FUTURE_CODE')).toBe('SOME_FUTURE_CODE');
});

Note: reasonLabel is a module-level function in the <script> block, not exported — testing it directly from ImportStatusCard.svelte.test.ts would require either exporting it or testing via rendered output with a fixture using an unknown reason code.

What's strong

makeStatus factory updated correctly — new tests inherit sensible defaults
@TempDir cleanup is automatic — no test isolation risk in CI
The IOException test exercises the actual error path via Mockito spy, not a stub — more realistic than a pure mock

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ⚠️ Approved with concerns** The new tests are well-structured and cover the right scenarios. The `setupOneValidOneFakeImport` helper correctly DRYs up three tests. Real filesystem I/O via `@TempDir` is the right choice — no mocked filesystem needed. The `openFileStream` seam + `spy` is a clean approach for the IOException case. ### Blockers None. ### Concerns **1. PR description test counts don't match the diff** The PR body says: - "Four new regression tests cover the one-upload-one-skip scenario..." - "3 new skipped-display tests pass alongside 9 existing tests" Actual count in the diff: - Backend: **5** new tests (`_uploadsValidPdf`, `_setsSkippedCount`, `_includesRejectedFilename`, `_skipsFile_whenShorterThanFourBytes`, `_skipsFile_whenMagicBytesCheckThrowsIOException`) - Frontend: **4** new tests (`shows skipped count`, `shows skipped filenames`, `does not show skipped section when DONE and skipped is 0`, `does not show skipped section when RUNNING`) Minor discrepancy, but test plans should match implementation. **2. FAILED state not explicitly tested in the frontend** The frontend tests cover DONE (shown), DONE+skipped=0 (hidden), and RUNNING (hidden). The FAILED state is not explicitly tested. The implementation correctly hides the skipped section when state is FAILED because the `{#if}` block is inside the DONE conditional — but this invariant isn't documented by a test. Suggested addition: ```typescript it('does not show skipped section when FAILED even with skipped > 0', async () => { const { getByTestId } = render(ImportStatusCard, { props: { importStatus: makeStatus({ state: 'FAILED', statusCode: 'IMPORT_FAILED_INTERNAL', skipped: 1, skippedFiles: [{ filename: 'bad.pdf', reason: 'INVALID_PDF_SIGNATURE' }] }), ontrigger: () => {} } }); await expect.element(getByTestId('skipped-count')).not.toBeInTheDocument(); }); ``` **3. `reasonLabel` fallback not tested** `reasonLabel('UNKNOWN_CODE')` returns `'UNKNOWN_CODE'` — this is the correct forward-compatible behavior when new reason codes are added to the backend before the frontend is updated. The fallback branch has no test. A one-liner would close this gap: ```typescript it('returns raw code for unknown reason codes', () => { expect(reasonLabel('SOME_FUTURE_CODE')).toBe('SOME_FUTURE_CODE'); }); ``` Note: `reasonLabel` is a module-level function in the `<script>` block, not exported — testing it directly from `ImportStatusCard.svelte.test.ts` would require either exporting it or testing via rendered output with a fixture using an unknown reason code. ### What's strong - `makeStatus` factory updated correctly — new tests inherit sensible defaults - `@TempDir` cleanup is automatic — no test isolation risk in CI - The IOException test exercises the actual error path via Mockito spy, not a stub — more realistic than a pure mock

marcel commented

2026-05-18 18:26:46 +02:00

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Verdict: ⚠️ Approved with one concern to verify

The native <details>/<summary> approach is the right call — semantic HTML, zero JavaScript, keyboard-navigable, screen-reader compatible out of the box. The motion-safe:transition-transform on the chevron correctly respects prefers-reduced-motion. This is thoughtful implementation.

Concern (verify before merge)

warning color token may not be defined in the design system

The component uses border-warning/40 bg-warning/10 which assumes a warning CSS custom property is configured in Tailwind 4. The project's documented brand tokens in CLAUDE.md are: brand-navy, brand-mint, bg-canvas, bg-surface, border-line. No warning token is listed.

If --color-warning is not defined in layout.css or the Tailwind config, these utilities compile to nothing — the amber warning border and background are invisible to the user.

In the same block, text-amber-900 uses a raw Tailwind color, creating a mix of design-system tokens and raw values. Recommend using one approach consistently:

<!-- Option A: if warning token exists in layout.css -->
<details class="border border-warning/40 bg-warning/10 text-warning-900 ...">

<!-- Option B: if it doesn't — use raw Tailwind amber consistently -->
<details class="border border-amber-300 bg-amber-50 text-amber-900 ...">

→ Please confirm --color-warning is defined before merge.

Positive observations

Accessibility:

<details>/<summary> provides disclosure semantics automatically — no ARIA needed
list-none removes native browser triangle and replaces with custom chevron — fine, semantics are preserved
Summary text provides a meaningful accessible name ("3 übersprungen / skipped / omitidos")

Reduced motion:

motion-safe:transition-transform on the chevron — ✅ correctly scoped

Font choices:

font-mono for filenames is appropriate — technical paths in monospace aids readability
text-sm (14px) on filenames is acceptable for an admin-only view, though borderline for the 60+ audience

Touch target:

p-4 padding on the <summary> gives a generous tap target — ✅

Information hierarchy in the summary:

Large bold number (skipped count) + small uppercase label follows the project's card pattern — consistent with the processed count section above it

Blockers

Verify the warning color token exists. If it doesn't, use raw amber utilities consistently across border-*, bg-*, and text-*.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist **Verdict: ⚠️ Approved with one concern to verify** The native `<details>`/`<summary>` approach is the right call — semantic HTML, zero JavaScript, keyboard-navigable, screen-reader compatible out of the box. The `motion-safe:transition-transform` on the chevron correctly respects `prefers-reduced-motion`. This is thoughtful implementation. ### Concern (verify before merge) **`warning` color token may not be defined in the design system** The component uses `border-warning/40 bg-warning/10` which assumes a `warning` CSS custom property is configured in Tailwind 4. The project's documented brand tokens in CLAUDE.md are: `brand-navy`, `brand-mint`, `bg-canvas`, `bg-surface`, `border-line`. No `warning` token is listed. If `--color-warning` is not defined in `layout.css` or the Tailwind config, these utilities compile to nothing — the amber warning border and background are invisible to the user. In the same block, `text-amber-900` uses a raw Tailwind color, creating a mix of design-system tokens and raw values. Recommend using one approach consistently: ```svelte  <details class="border border-warning/40 bg-warning/10 text-warning-900 ...">  <details class="border border-amber-300 bg-amber-50 text-amber-900 ..."> ``` → **Please confirm `--color-warning` is defined before merge.** ### Positive observations **Accessibility:** - `<details>`/`<summary>` provides disclosure semantics automatically — no ARIA needed - `list-none` removes native browser triangle and replaces with custom chevron — fine, semantics are preserved - Summary text provides a meaningful accessible name ("3 übersprungen / skipped / omitidos") **Reduced motion:** - `motion-safe:transition-transform` on the chevron — ✅ correctly scoped **Font choices:** - `font-mono` for filenames is appropriate — technical paths in monospace aids readability - `text-sm` (14px) on filenames is acceptable for an admin-only view, though borderline for the 60+ audience **Touch target:** - `p-4` padding on the `<summary>` gives a generous tap target — ✅ **Information hierarchy in the summary:** - Large bold number (skipped count) + small uppercase label follows the project's card pattern — consistent with the `processed` count section above it ### Blockers Verify the `warning` color token exists. If it doesn't, use raw amber utilities consistently across `border-*`, `bg-*`, and `text-*`.

marcel added 1 commit 2026-05-18 19:05:00 +02:00

fix(import): address round-3 review concerns

CI / Unit & Component Tests (pull_request) Successful in 3m13s

Details

CI / OCR Service Tests (pull_request) Successful in 19s

Details

CI / Backend Unit Tests (pull_request) Successful in 3m8s

Details

CI / fail2ban Regex (pull_request) Successful in 41s

Details

CI / Semgrep Security Scan (pull_request) Successful in 19s

Details

CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s

Details

f144b025b0

- Add comment to openFileStream() explaining package-private visibility
  is intentional (Mockito spy seam for IOException test)
- Key {#each} skippedFiles by filename instead of array index
- Add test: skipped section hidden when state is FAILED
- Add test: reasonLabel returns raw code for unknown reason strings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel commented

2026-05-18 19:05:10 +02:00

Round-3 concerns addressed — `f144b025`

All open reviewer concerns from the round-2 review have been resolved in commit f144b025.

Concern	Reviewer	Resolution
`openFileStream()` package-private visibility is non-obvious	@NullX (Nora)	Added one-line comment explaining the seam is intentional for Mockito spy
`{#each}` keyed by array index instead of stable key	@felixbrandt (Felix)	Changed to `(skipped.filename)` — semantically correct, no duplicates in one run
FAILED state: skipped section hidden (invariant undocumented)	@saraholt (Sara)	New test: `does not show skipped section when FAILED even with skipped > 0`
`reasonLabel` fallback path has no test	@saraholt (Sara)	New test: `shows raw reason code for unknown skip reasons` — renders component with `'SOME_FUTURE_CODE'` and asserts it appears
`warning` color token verification	@leonievoss (Leonie)	No change needed — `--color-warning: #b45309` is defined in `layout.css:77`; `border-warning/40 bg-warning/10` is valid. `text-amber-900` is the accessible dark-text choice (no dark-warning token exists)

## Round-3 concerns addressed — f144b025 All open reviewer concerns from the round-2 review have been resolved in commit `f144b025`. | Concern | Reviewer | Resolution | |---|---|---| | `openFileStream()` package-private visibility is non-obvious | @NullX (Nora) | Added one-line comment explaining the seam is intentional for Mockito spy | | `{#each}` keyed by array index instead of stable key | @felixbrandt (Felix) | Changed to `(skipped.filename)` — semantically correct, no duplicates in one run | | FAILED state: skipped section hidden (invariant undocumented) | @saraholt (Sara) | New test: `does not show skipped section when FAILED even with skipped > 0` | | `reasonLabel` fallback path has no test | @saraholt (Sara) | New test: `shows raw reason code for unknown skip reasons` — renders component with `'SOME_FUTURE_CODE'` and asserts it appears | | `warning` color token verification | @leonievoss (Leonie) | **No change needed** — `--color-warning: #b45309` is defined in `layout.css:77`; `border-warning/40 bg-warning/10` is valid. `text-amber-900` is the accessible dark-text choice (no dark-warning token exists) |

marcel commented

2026-05-18 21:58:15 +02:00

🏛️ Markus Keller — Senior Application Architect

Verdict: ⚠️ Approved with concerns

Good, focused PR. Magic byte validation belongs where it is — before S3 upload, inside the service layer. No layer boundary violations.

Blockers

types.ts is manually maintained instead of auto-generated

frontend/src/routes/admin/system/types.ts defines ImportStatus and SkippedFile by hand. The backend now has @Schema(requiredMode = REQUIRED) annotations on all new fields, which means npm run generate:api should produce the canonical TypeScript types. The manually maintained file risks silent drift the next time a backend field changes — especially dangerous for non-required fields (like startedAt: string | null) that might quietly become wrong.

CLAUDE.md is explicit: "always run npm run generate:api in frontend/ after any backend model or endpoint change." Either migrate these types into the auto-generated client or add a comment explaining why this file exists alongside the generated types.

Concerns

openFileStream test seam is package-private production code

// package-private: Mockito spy in tests can override to inject IOException
InputStream openFileStream(File file) throws IOException {
    return new FileInputStream(file);
}

The comment is correct — this is the right "why" explanation. But package-private methods in a @Service class that exist solely as test hooks are a code smell. A cleaner alternative is extracting a FileReader collaborator (one method: openStream(File)) and injecting it. That makes the seam explicit without leaking the hook into the production class surface. Acceptable for now if the team is comfortable with the trade-off, but worth an ADR note or inline acknowledgment that this is a deliberate test seam.

@JsonProperty("skipped") on a record accessor method is unconventional

@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
@JsonProperty("skipped")
public int skipped() { return skippedFiles.size(); }

This works and is tested, but it's the only method in the codebase that combines @JsonProperty with a record accessor to expose a derived value. The alternative — adding skipped as a stored field initialized to skippedFiles.size() — avoids the Jackson annotation magic and is more readable for future maintainers. Not a blocker, but the pattern is worth documenting.

Doc check

Changed area	Required update	Status
No new Flyway migration	No DB diagram update	✅
No new backend package	No CLAUDE.md package table update	✅
No new SvelteKit route	No route table update	✅
`ImportStatus` API shape changed	`npm run generate:api` should have been run	⚠️ See blocker
No new Docker service	No DEPLOYMENT.md update	✅

The import flow change (magic byte pre-validation) does not affect seq-document-upload.puml since that covers user-facing document uploads, not the admin batch import.

## 🏛️ Markus Keller — Senior Application Architect **Verdict: ⚠️ Approved with concerns** Good, focused PR. Magic byte validation belongs where it is — before S3 upload, inside the service layer. No layer boundary violations. ### Blockers **`types.ts` is manually maintained instead of auto-generated** `frontend/src/routes/admin/system/types.ts` defines `ImportStatus` and `SkippedFile` by hand. The backend now has `@Schema(requiredMode = REQUIRED)` annotations on all new fields, which means `npm run generate:api` should produce the canonical TypeScript types. The manually maintained file risks silent drift the next time a backend field changes — especially dangerous for non-required fields (like `startedAt: string | null`) that might quietly become wrong. CLAUDE.md is explicit: "always run `npm run generate:api` in `frontend/` after any backend model or endpoint change." Either migrate these types into the auto-generated client or add a comment explaining why this file exists alongside the generated types. ### Concerns **`openFileStream` test seam is package-private production code** ```java // package-private: Mockito spy in tests can override to inject IOException InputStream openFileStream(File file) throws IOException { return new FileInputStream(file); } ``` The comment is correct — this is the right "why" explanation. But package-private methods in a `@Service` class that exist solely as test hooks are a code smell. A cleaner alternative is extracting a `FileReader` collaborator (one method: `openStream(File)`) and injecting it. That makes the seam explicit without leaking the hook into the production class surface. Acceptable for now if the team is comfortable with the trade-off, but worth an ADR note or inline acknowledgment that this is a deliberate test seam. **`@JsonProperty("skipped")` on a record accessor method is unconventional** ```java @Schema(requiredMode = Schema.RequiredMode.REQUIRED) @JsonProperty("skipped") public int skipped() { return skippedFiles.size(); } ``` This works and is tested, but it's the only method in the codebase that combines `@JsonProperty` with a record accessor to expose a derived value. The alternative — adding `skipped` as a stored field initialized to `skippedFiles.size()` — avoids the Jackson annotation magic and is more readable for future maintainers. Not a blocker, but the pattern is worth documenting. ### Doc check | Changed area | Required update | Status | |---|---|---| | No new Flyway migration | No DB diagram update | ✅ | | No new backend package | No CLAUDE.md package table update | ✅ | | No new SvelteKit route | No route table update | ✅ | | `ImportStatus` API shape changed | `npm run generate:api` should have been run | ⚠️ See blocker | | No new Docker service | No DEPLOYMENT.md update | ✅ | The import flow change (magic byte pre-validation) does not affect `seq-document-upload.puml` since that covers user-facing document uploads, not the admin batch import.

marcel commented

2026-05-18 21:58:34 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ⚠️ Approved with concerns

The core feature is clean. isPdfMagicBytes is well-named, properly isolated, and the try-with-resources is correct. Tests precede the implementation (I can verify from the PR history). The Svelte component change is appropriately scoped. One blocker and a few things worth naming.

Blockers

importSingleDocument returning false conflates two semantically different outcomes

The method now returns false in two distinct cases:

Document already exists and is not PLACEHOLDER → intentional skip ("I've seen this before")
S3 upload throws an exception → unexpected failure ("something went wrong")

In the original code, count++ happened unconditionally at the call site regardless of what the method did internally (early return or success). The new code only increments processed when the method returns true. This means:

Documents that already exist: previously counted as processed, now not counted ✓ (this is arguably a bug fix)
Documents with S3 upload failures: previously counted as processed, now not counted ✓ (also arguably a bug fix)

But both cases are now semantically unified under return false without distinction. An S3 failure is not the same as a deliberate skip — one belongs in skippedFiles, the other in an error log. Neither is surfaced to the admin UI. If this behavior change is intentional, it needs a test that specifically verifies the count semantics for both cases. Right now there's no test asserting "S3 upload failure → processed count stays at 0."

Consider using a tri-state result: IMPORTED, SKIPPED, ERROR — or at minimum, separate the existing-document skip from the S3-error path.

Concerns

reasonLabel should be a switch expression

function reasonLabel(code: string): string {
    if (code === 'INVALID_PDF_SIGNATURE') return m.import_reason_invalid_pdf_signature();
    if (code === 'FILE_READ_ERROR') return m.import_reason_file_read_error();
    return code;
}

Two cases, so it's fine now. When a third reason is added, the chain will grow. A switch (code) expression with a default: return code makes the exhaustion intent explicit. Not a blocker for two cases.

The {#each} is correctly keyed — good

{#each importStatus.skippedFiles as skipped (skipped.filename)}

(skipped.filename) is the right key assuming filenames are unique per import run. If the same filename appears twice in one import, Svelte will reuse the DOM node incorrectly. Given that the service deduplicates by filename (you can only import the same filename once before it's counted as existing), this key is safe in practice. ✅

$derived not used for failureMessage

Looking at the existing const failureMessage = $derived(...) in the component — good, this is already correct. The new reasonLabel is a plain function, not a $derived, which is appropriate since it takes an argument. ✅

Backend comment explains the test seam correctly

// package-private: Mockito spy in tests can override to inject IOException
InputStream openFileStream(File file) throws IOException {

This is a legitimate "why" comment. The test proves the seam works. ✅

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ⚠️ Approved with concerns** The core feature is clean. `isPdfMagicBytes` is well-named, properly isolated, and the try-with-resources is correct. Tests precede the implementation (I can verify from the PR history). The Svelte component change is appropriately scoped. One blocker and a few things worth naming. ### Blockers **`importSingleDocument` returning `false` conflates two semantically different outcomes** The method now returns `false` in two distinct cases: 1. Document already exists and is not `PLACEHOLDER` → intentional skip ("I've seen this before") 2. S3 upload throws an exception → unexpected failure ("something went wrong") In the original code, `count++` happened unconditionally at the call site regardless of what the method did internally (early return or success). The new code only increments `processed` when the method returns `true`. This means: - Documents that already exist: previously counted as processed, now not counted ✓ (this is arguably a bug fix) - Documents with S3 upload failures: previously counted as processed, now not counted ✓ (also arguably a bug fix) But both cases are now semantically unified under `return false` without distinction. An S3 failure is not the same as a deliberate skip — one belongs in `skippedFiles`, the other in an error log. Neither is surfaced to the admin UI. If this behavior change is intentional, it needs a test that specifically verifies the count semantics for both cases. Right now there's no test asserting "S3 upload failure → processed count stays at 0." Consider using a tri-state result: `IMPORTED`, `SKIPPED`, `ERROR` — or at minimum, separate the existing-document skip from the S3-error path. ### Concerns **`reasonLabel` should be a switch expression** ```typescript function reasonLabel(code: string): string { if (code === 'INVALID_PDF_SIGNATURE') return m.import_reason_invalid_pdf_signature(); if (code === 'FILE_READ_ERROR') return m.import_reason_file_read_error(); return code; } ``` Two cases, so it's fine now. When a third reason is added, the chain will grow. A `switch (code)` expression with a `default: return code` makes the exhaustion intent explicit. Not a blocker for two cases. **The `{#each}` is correctly keyed — good** ```svelte {#each importStatus.skippedFiles as skipped (skipped.filename)} ``` `(skipped.filename)` is the right key assuming filenames are unique per import run. If the same filename appears twice in one import, Svelte will reuse the DOM node incorrectly. Given that the service deduplicates by filename (you can only import the same filename once before it's counted as existing), this key is safe in practice. ✅ **`$derived` not used for `failureMessage`** Looking at the existing `const failureMessage = $derived(...)` in the component — good, this is already correct. The new `reasonLabel` is a plain function, not a `$derived`, which is appropriate since it takes an argument. ✅ **Backend comment explains the test seam correctly** ```java // package-private: Mockito spy in tests can override to inject IOException InputStream openFileStream(File file) throws IOException { ``` This is a legitimate "why" comment. The test proves the seam works. ✅

marcel commented

2026-05-18 21:58:43 +02:00

🔧 Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

This PR has no changes to infrastructure files, Docker Compose configuration, CI workflows, or environment variables. There is nothing for me to flag.

What I checked

Docker Compose: No new services, no port changes, no volume changes. The magic byte check reads from the local import directory that is already mounted — no new bind mounts introduced. ✅
CI pipeline: No changes to workflow files. The new tests run as part of the existing Maven test phase. ✅
Environment variables: No new env vars introduced. The validation logic is self-contained in the service. ✅
S3/MinIO interaction: The check happens before s3Client.putObject, so invalid files never reach MinIO. This is the correct placement — it reduces unnecessary object storage traffic. ✅
Logging: SLF4J {} placeholders are used correctly in all new log statements. No string concatenation in log calls. ✅

One observation (not blocking)

The importDir is a configured path on the host, and the magic byte check opens a FileInputStream on files within it. If the import directory is on a network mount (NFS, SMB) and the mount is temporarily unavailable, the IOException is caught and adds the file to skippedFiles with FILE_READ_ERROR. This is graceful degradation. The operator will see the skipped count in the admin UI and can investigate. No action needed — just noting the operational behavior is correct.

## 🔧 Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** This PR has no changes to infrastructure files, Docker Compose configuration, CI workflows, or environment variables. There is nothing for me to flag. ### What I checked - **Docker Compose**: No new services, no port changes, no volume changes. The magic byte check reads from the local import directory that is already mounted — no new bind mounts introduced. ✅ - **CI pipeline**: No changes to workflow files. The new tests run as part of the existing Maven test phase. ✅ - **Environment variables**: No new env vars introduced. The validation logic is self-contained in the service. ✅ - **S3/MinIO interaction**: The check happens *before* `s3Client.putObject`, so invalid files never reach MinIO. This is the correct placement — it reduces unnecessary object storage traffic. ✅ - **Logging**: SLF4J `{}` placeholders are used correctly in all new log statements. No string concatenation in log calls. ✅ ### One observation (not blocking) The `importDir` is a configured path on the host, and the magic byte check opens a `FileInputStream` on files within it. If the import directory is on a network mount (NFS, SMB) and the mount is temporarily unavailable, the `IOException` is caught and adds the file to `skippedFiles` with `FILE_READ_ERROR`. This is graceful degradation. The operator will see the skipped count in the admin UI and can investigate. No action needed — just noting the operational behavior is correct.

marcel commented

2026-05-18 21:58:58 +02:00

📋 Elicit — Requirements Engineer & Business Analyst

Verdict: ⚠️ Approved with concerns

The implementation satisfies the core requirement (validate PDF magic bytes, report skipped files). Two open questions from a requirements perspective that could affect users and operators.

Blockers

Metadata import behavior is undefined when the file is rejected

When isPdfMagicBytes returns false, the code does continue — skipping importSingleDocument entirely. This means:

The file is not uploaded to S3 ✅ (intended)
The document metadata row is also not imported ✗ (likely unintended)

In the current import flow, importSingleDocument handles both file upload AND metadata persistence (creating the document entity). If an admin has a spreadsheet row for a file called rechnung.pdf but that file turns out to be a JPEG renamed to .pdf, the entire row is silently skipped — no document entity, no PLACEHOLDER status, nothing in the archive.

The original issue #529 likely targeted the file security concern, not the metadata concern. The user's expectation may well be: "skip the file upload but still create the document record as a PLACEHOLDER so I know the row exists." The current implementation conflates two concerns.

Recommendation: Clarify with Marcel whether the intent is (a) skip everything for invalid files, or (b) create a PLACEHOLDER document entry but skip the S3 upload. If (b), the logic needs adjustment.

Concerns

The skippedFiles list is in-memory only — it is lost on application restart

currentStatus is a volatile field initialized to IDLE. After a restart, the admin sees IDLE with no record of the previous import's skipped files. For a family archive with infrequent imports, this may be acceptable — but it should be a conscious decision. If the admin restarts the backend mid-review, they lose the skipped file list permanently.

The reason field is a machine code exposed to end-users

The SkippedFile.reason field contains values like INVALID_PDF_SIGNATURE and FILE_READ_ERROR. The frontend translates these to human-readable strings for the two known codes, and falls back to the raw code for unknown ones. This is the right fallback, but it means future reason codes added on the backend require a synchronized frontend update. There is no exhaustive type-safety between the backend string constants and the frontend reasonLabel function — no compile-time guarantee that a new backend code gets a frontend translation.

## 📋 Elicit — Requirements Engineer & Business Analyst **Verdict: ⚠️ Approved with concerns** The implementation satisfies the core requirement (validate PDF magic bytes, report skipped files). Two open questions from a requirements perspective that could affect users and operators. ### Blockers **Metadata import behavior is undefined when the file is rejected** When `isPdfMagicBytes` returns `false`, the code does `continue` — skipping `importSingleDocument` entirely. This means: - The file is not uploaded to S3 ✅ (intended) - **The document metadata row is also not imported** ✗ (likely unintended) In the current import flow, `importSingleDocument` handles both file upload AND metadata persistence (creating the document entity). If an admin has a spreadsheet row for a file called `rechnung.pdf` but that file turns out to be a JPEG renamed to `.pdf`, the entire row is silently skipped — no document entity, no PLACEHOLDER status, nothing in the archive. The original issue #529 likely targeted the file security concern, not the metadata concern. The user's expectation may well be: "skip the file upload but still create the document record as a PLACEHOLDER so I know the row exists." The current implementation conflates two concerns. **Recommendation**: Clarify with Marcel whether the intent is (a) skip everything for invalid files, or (b) create a PLACEHOLDER document entry but skip the S3 upload. If (b), the logic needs adjustment. ### Concerns **The `skippedFiles` list is in-memory only — it is lost on application restart** `currentStatus` is a volatile field initialized to `IDLE`. After a restart, the admin sees `IDLE` with no record of the previous import's skipped files. For a family archive with infrequent imports, this may be acceptable — but it should be a conscious decision. If the admin restarts the backend mid-review, they lose the skipped file list permanently. **The `reason` field is a machine code exposed to end-users** The `SkippedFile.reason` field contains values like `INVALID_PDF_SIGNATURE` and `FILE_READ_ERROR`. The frontend translates these to human-readable strings for the two known codes, and falls back to the raw code for unknown ones. This is the right fallback, but it means future reason codes added on the backend require a synchronized frontend update. There is no exhaustive type-safety between the backend string constants and the frontend `reasonLabel` function — no compile-time guarantee that a new backend code gets a frontend translation.

marcel commented

2026-05-18 21:59:17 +02:00

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: ⚠️ Approved with concerns

This PR directly addresses a file upload security concern (CWE-434: Unrestricted Upload of File with Dangerous Type). The implementation is correct for what it does. My concerns are about what it doesn't do, and one structural observation.

Concerns (not blockers for this PR, but register them)

Magic bytes are a necessary but insufficient control — polyglot files bypass it

A file that begins with %PDF but contains malicious content (JavaScript payload, macro-enabled content embedded after the header) will pass this check. Known polyglot attack patterns:

%PDF-1.4\n%ÁÂÃ\n... followed by embedded HTML/JavaScript
ZIP-over-PDF polyglots (pass magic byte check, are valid ZIP archives)

For this project — an admin-only batch import of family documents from a trusted local directory — the risk is low. The threat model is "prevent accidentally uploading non-PDFs" rather than "prevent adversarial file uploads." Magic byte validation is proportionate to that threat model.

If the threat model expands (e.g., user-uploaded files via the web UI), the validation layer should be upgraded to Apache Tika or a content-aware validator that inspects the full file structure, not just the header.

The filename field in SkippedFile comes from the spreadsheet and is user-controlled input

skippedFiles.add(new SkippedFile(filename, "INVALID_PDF_SIGNATURE"));

filename originates from the spreadsheet cell. This value is returned in the API response and rendered in the Svelte template:

{skipped.filename} — {reasonLabel(skipped.reason)}

Svelte auto-escapes text content in {...} expressions, so stored XSS is not possible here. ✅ Log injection is mitigated by SLF4J parameterized logging. ✅ No concern here — documenting for audit completeness.

What's done well

try (InputStream is = openFileStream(file)) — correct resource management, no file handle leak under any code path ✅
header.length == 4 check — correctly handles files shorter than 4 bytes rather than throwing an ArrayIndexOutOfBoundsException ✅
IOException is caught and logged with the filename, then results in a FILE_READ_ERROR skip rather than crashing the import ✅
SLF4J {} placeholders throughout — immune to log injection ✅
Validation occurs before S3 upload — invalid files never touch the object store ✅
The openFileStream test seam does not introduce a security boundary — it's only callable from the same package ✅

Suggested future work (file as an issue, not this PR)

Consider adding Apache Tika validation as a second layer for user-uploaded documents via the web UI (if that path doesn't already validate). Magic bytes are the right first layer; content-aware validation is the right second layer.

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ⚠️ Approved with concerns** This PR directly addresses a file upload security concern (CWE-434: Unrestricted Upload of File with Dangerous Type). The implementation is correct for what it does. My concerns are about what it doesn't do, and one structural observation. ### Concerns (not blockers for this PR, but register them) **Magic bytes are a necessary but insufficient control — polyglot files bypass it** A file that begins with `%PDF` but contains malicious content (JavaScript payload, macro-enabled content embedded after the header) will pass this check. Known polyglot attack patterns: - `%PDF-1.4\n%ÁÂÃ\n...` followed by embedded HTML/JavaScript - ZIP-over-PDF polyglots (pass magic byte check, are valid ZIP archives) For this project — an admin-only batch import of family documents from a trusted local directory — the risk is low. The threat model is "prevent accidentally uploading non-PDFs" rather than "prevent adversarial file uploads." Magic byte validation is proportionate to that threat model. If the threat model expands (e.g., user-uploaded files via the web UI), the validation layer should be upgraded to Apache Tika or a content-aware validator that inspects the full file structure, not just the header. **The `filename` field in `SkippedFile` comes from the spreadsheet and is user-controlled input** ```java skippedFiles.add(new SkippedFile(filename, "INVALID_PDF_SIGNATURE")); ``` `filename` originates from the spreadsheet cell. This value is returned in the API response and rendered in the Svelte template: ```svelte {skipped.filename} — {reasonLabel(skipped.reason)} ``` Svelte auto-escapes text content in `{...}` expressions, so stored XSS is not possible here. ✅ Log injection is mitigated by SLF4J parameterized logging. ✅ No concern here — documenting for audit completeness. ### What's done well - `try (InputStream is = openFileStream(file))` — correct resource management, no file handle leak under any code path ✅ - `header.length == 4` check — correctly handles files shorter than 4 bytes rather than throwing an `ArrayIndexOutOfBoundsException` ✅ - `IOException` is caught and logged with the filename, then results in a `FILE_READ_ERROR` skip rather than crashing the import ✅ - SLF4J `{}` placeholders throughout — immune to log injection ✅ - Validation occurs **before** S3 upload — invalid files never touch the object store ✅ - The `openFileStream` test seam does not introduce a security boundary — it's only callable from the same package ✅ ### Suggested future work (file as an issue, not this PR) Consider adding Apache Tika validation as a second layer for user-uploaded documents via the web UI (if that path doesn't already validate). Magic bytes are the right first layer; content-aware validation is the right second layer.

marcel commented

2026-05-18 21:59:33 +02:00

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: ⚠️ Approved with concerns

Test coverage for the new feature is solid. I have one behavior-change concern that lacks a specific test, and a few gaps in the frontend test suite.

Concerns

The processed count semantics changed — no test covers the pre-change behavior

In the original code, count++ was unconditional:

importSingleDocument(cells, fileOnDisk, filename, index);
count++;  // counted regardless of whether the document was skipped internally

Now, processed++ only fires when importSingleDocument returns true. This silently changes what "processed" means:

Documents that already exist (not PLACEHOLDER): previously counted, now not counted
Documents with S3 upload errors: previously counted, now not counted

The new tests cover the PDF-rejection path, but there's no test asserting: "when a document already exists and is skipped, processed stays at 0." Without this, the behavior change is untested and a future refactor could accidentally revert it. Add:

void processRows_doesNotCountExistingDocuments_asProcessed()
void processRows_doesNotCountS3UploadFailures_asProcessed()

Frontend: no test for FILE_READ_ERROR reason label

it('shows raw reason code for unknown skip reasons', ...)  // ✅ tests SOME_FUTURE_CODE

But there's no test for FILE_READ_ERROR specifically. The reasonLabel function has two branches — only one is explicitly tested. Add:

it('translates FILE_READ_ERROR reason to human-readable label', ...)

Frontend: no test for the <details>/<summary> toggle interaction

The collapsible list is rendered via native <details>. There's no test asserting that clicking the summary reveals the filename list. Since vitest-browser-svelte runs against a real browser DOM, this is testable:

it('expands filename list when summary is clicked', async () => {
    // render with skipped files
    // click summary
    // assert filenames are visible
})

What's done well

setupOneValidOneFakeImport factory pattern — one helper, three tests share it ✅
buildMinimalImportXlsx is reusable across backend test cases ✅
The @TempDir annotation handles cleanup — no @AfterEach needed ✅
The spy(service) pattern for the IOException test is the correct approach given the package-private openFileStream seam ✅
Frontend makeStatus factory function extended correctly — existing tests unaffected ✅
The regression test for RUNNING-state hiding is a good guard even though the scenario is currently impossible to reach in production ✅
Five frontend tests + five backend tests for a feature this size is proportionate ✅

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ⚠️ Approved with concerns** Test coverage for the new feature is solid. I have one behavior-change concern that lacks a specific test, and a few gaps in the frontend test suite. ### Concerns **The `processed` count semantics changed — no test covers the pre-change behavior** In the original code, `count++` was unconditional: ```java importSingleDocument(cells, fileOnDisk, filename, index); count++; // counted regardless of whether the document was skipped internally ``` Now, `processed++` only fires when `importSingleDocument` returns `true`. This silently changes what "processed" means: - Documents that already exist (not PLACEHOLDER): previously counted, now not counted - Documents with S3 upload errors: previously counted, now not counted The new tests cover the PDF-rejection path, but there's no test asserting: "when a document already exists and is skipped, `processed` stays at 0." Without this, the behavior change is untested and a future refactor could accidentally revert it. Add: ```java void processRows_doesNotCountExistingDocuments_asProcessed() void processRows_doesNotCountS3UploadFailures_asProcessed() ``` **Frontend: no test for `FILE_READ_ERROR` reason label** ```typescript it('shows raw reason code for unknown skip reasons', ...) // ✅ tests SOME_FUTURE_CODE ``` But there's no test for `FILE_READ_ERROR` specifically. The `reasonLabel` function has two branches — only one is explicitly tested. Add: ```typescript it('translates FILE_READ_ERROR reason to human-readable label', ...) ``` **Frontend: no test for the `<details>/<summary>` toggle interaction** The collapsible list is rendered via native `<details>`. There's no test asserting that clicking the summary reveals the filename list. Since `vitest-browser-svelte` runs against a real browser DOM, this is testable: ```typescript it('expands filename list when summary is clicked', async () => { // render with skipped files // click summary // assert filenames are visible }) ``` ### What's done well - `setupOneValidOneFakeImport` factory pattern — one helper, three tests share it ✅ - `buildMinimalImportXlsx` is reusable across backend test cases ✅ - The `@TempDir` annotation handles cleanup — no `@AfterEach` needed ✅ - The `spy(service)` pattern for the `IOException` test is the correct approach given the package-private `openFileStream` seam ✅ - Frontend `makeStatus` factory function extended correctly — existing tests unaffected ✅ - The regression test for RUNNING-state hiding is a good guard even though the scenario is currently impossible to reach in production ✅ - Five frontend tests + five backend tests for a feature this size is proportionate ✅

marcel commented

2026-05-18 21:59:53 +02:00

🎨 Leonie Voss — UI/UX Design Lead & Accessibility Strategist

Verdict: ⚠️ Approved with concerns

The choice of native <details>/<summary> is excellent — built-in keyboard support (Space/Enter to toggle), no JavaScript needed, and screen readers handle it natively. The amber warning pattern makes sense for a "files were skipped" alert. A few things to tighten.

Concerns

text-amber-900 is a raw Tailwind color, not a semantic token

<details class="mb-4 rounded-sm border border-warning/40 bg-warning/10 p-4 text-amber-900">

The border-warning/40 and bg-warning/10 correctly use the project's semantic token. But text-amber-900 bypasses it. If the warning color ever changes (or if dark mode is added), the text color will be stranded. Replace with a semantic token or extend the warning palette with a text variant:

text-warning-content  <!-- or whatever the semantic token name is -->

If no semantic text variant exists for warning, it should be defined in layout.css alongside --palette-warning.

The <summary> touch target may fall below 44px on some viewports

The summary contains:

A 16px number (text-base font-bold)
A 12px label (text-xs font-bold tracking-widest)

Without explicit min-height, the total touch area depends on line heights and the p-4 padding of the parent <details>. On some browsers, the <summary> itself doesn't inherit the parent padding for its own hit area. Add min-h-[44px] directly to the <summary> element to guarantee the WCAG 2.2 touch target requirement:

<summary class="flex cursor-pointer list-none items-center gap-2 min-h-[44px]">

No warning icon alongside the amber section

The warning is communicated via amber color + "übersprungen/skipped" text. For our senior audience (60+, some with color vision deficiencies), adding a ⚠️ or a triangle SVG icon adjacent to the skipped count reinforces the signal without relying on color alone. This is the WCAG requirement: never convey meaning by color alone.

The chevron SVG is already present (disclosure control), but an additional warning icon in the summary header would strengthen the semantic.

What's done well

motion-safe:transition-transform on the chevron — correctly respects prefers-reduced-motion ✅
list-none on <summary> removes browser-native disclosure triangles — custom SVG takes over cleanly ✅
cursor-pointer makes the interactive surface obvious ✅
font-mono for filenames — appropriate for technical content ✅
text-sm text-ink-2 for filename list — 14px is above minimum, text-ink-2 is a semantic token ✅
data-testid="skipped-count" — good test hook placement ✅
The {#each ... (skipped.filename)} is properly keyed ✅
Scoped <style> block with details[open] .details-chevron — Svelte's scoping ensures no bleed ✅

## 🎨 Leonie Voss — UI/UX Design Lead & Accessibility Strategist **Verdict: ⚠️ Approved with concerns** The choice of native `<details>/<summary>` is excellent — built-in keyboard support (Space/Enter to toggle), no JavaScript needed, and screen readers handle it natively. The amber warning pattern makes sense for a "files were skipped" alert. A few things to tighten. ### Concerns **`text-amber-900` is a raw Tailwind color, not a semantic token** ```svelte <details class="mb-4 rounded-sm border border-warning/40 bg-warning/10 p-4 text-amber-900"> ``` The `border-warning/40` and `bg-warning/10` correctly use the project's semantic token. But `text-amber-900` bypasses it. If the warning color ever changes (or if dark mode is added), the text color will be stranded. Replace with a semantic token or extend the `warning` palette with a text variant: ```svelte text-warning-content  ``` If no semantic text variant exists for warning, it should be defined in `layout.css` alongside `--palette-warning`. **The `<summary>` touch target may fall below 44px on some viewports** The summary contains: - A 16px number (`text-base font-bold`) - A 12px label (`text-xs font-bold tracking-widest`) Without explicit min-height, the total touch area depends on line heights and the `p-4` padding of the parent `<details>`. On some browsers, the `<summary>` itself doesn't inherit the parent padding for its own hit area. Add `min-h-[44px]` directly to the `<summary>` element to guarantee the WCAG 2.2 touch target requirement: ```svelte <summary class="flex cursor-pointer list-none items-center gap-2 min-h-[44px]"> ``` **No warning icon alongside the amber section** The warning is communicated via amber color + "übersprungen/skipped" text. For our senior audience (60+, some with color vision deficiencies), adding a ⚠️ or a triangle SVG icon adjacent to the skipped count reinforces the signal without relying on color alone. This is the WCAG requirement: never convey meaning by color alone. The chevron SVG is already present (disclosure control), but an additional warning icon in the summary header would strengthen the semantic. ### What's done well - `motion-safe:transition-transform` on the chevron — correctly respects `prefers-reduced-motion` ✅ - `list-none` on `<summary>` removes browser-native disclosure triangles — custom SVG takes over cleanly ✅ - `cursor-pointer` makes the interactive surface obvious ✅ - `font-mono` for filenames — appropriate for technical content ✅ - `text-sm text-ink-2` for filename list — 14px is above minimum, `text-ink-2` is a semantic token ✅ - `data-testid="skipped-count"` — good test hook placement ✅ - The `{#each ... (skipped.filename)}` is properly keyed ✅ - Scoped `<style>` block with `details[open] .details-chevron` — Svelte's scoping ensures no bleed ✅

marcel commented

2026-05-19 07:15:06 +02:00

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: ⚠️ Approved with concerns

This PR directly addresses a real attack vector: an attacker crafting a disguised executable (e.g. evil.exe renamed to real.pdf) that gets silently uploaded to S3 and potentially served to other family members. The magic-byte check closes that hole. Let me audit the implementation for correctness and completeness.

Blockers

None. The core security fix is implemented correctly.

Suggestions

1. The validation happens before S3 upload but `contentType` is determined by `Files.probeContentType()` after the check

In importSingleDocument(), Files.probeContentType() is called on the file after the magic-byte check has already allowed it through. On Linux, probeContentType() can rely on the file extension (not content), meaning a .exe file renamed to .pdf would still reach S3 with contentType = "application/pdf" if the magic byte check passed. Since the magic-byte check happens in processRows() before importSingleDocument() is called, this is fine — the guard is in the right place. Just worth documenting that contentType is extension-based and the actual content gating happens earlier.

2. Magic-byte check is a good first layer, but PDF header version not validated

The check validates %PDF (4 bytes). A legitimate PDF also contains %PDF-1.x but that 5th byte is not checked. This is intentional (PDF/A-4 exists, future versions exist), and the current check is the correct baseline. Not a blocker — just documenting the threat model scope.

3. No test for `@RequirePermission` on the import endpoint

The AdminControllerTest tests the import-status endpoint, but I want to confirm: is there a test that verifies unauthorized users cannot trigger the import? That test should verify that someone without ADMIN permission gets a 403 when calling POST /api/admin/import. This is orthogonal to this PR, but since AdminControllerTest was touched, it's worth confirming coverage exists.

4. `openFileStream` is package-private for test seam

The comment // package-private: Mockito spy in tests can override to inject IOException is the correct pattern for this stack and is the right call given Spring 7's constructor injection cycle constraints. This is a documented, intentional design choice. Security smell only: the method is callable from within the same package — but MassImportService is the only class in that package that needs it, so the exposure is minimal.

5. Skipped files list is held in-memory in `volatile ImportStatus`

The skippedFiles list could grow arbitrarily large if an import batch contains thousands of invalid files. There's no cap on the size of skippedFiles. For this family archive use case (dozens to hundreds of files per import), this is not a practical risk. Flag it as a future concern if imports scale up.

6. SLF4J parameterized logging is used correctly — no Log4Shell risk

log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename) — correct. The {} placeholder does not evaluate JNDI expressions. Good hygiene.

What's Done Well

The security control is in the right layer (before S3 upload, not after)
Error handling fails closed: both INVALID_PDF_SIGNATURE and FILE_READ_ERROR result in skip, not silent pass-through
Regression tests prove the vulnerability existed and is now fixed (TDD-style: the test for the attack scenario is permanent)
Logging uses parameterized SLF4J throughout — no injection risk

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ⚠️ Approved with concerns** This PR directly addresses a real attack vector: an attacker crafting a disguised executable (e.g. `evil.exe` renamed to `real.pdf`) that gets silently uploaded to S3 and potentially served to other family members. The magic-byte check closes that hole. Let me audit the implementation for correctness and completeness. --- ### Blockers **None.** The core security fix is implemented correctly. --- ### Suggestions #### 1. The validation happens before S3 upload but `contentType` is determined by `Files.probeContentType()` after the check In `importSingleDocument()`, `Files.probeContentType()` is called on the file **after** the magic-byte check has already allowed it through. On Linux, `probeContentType()` can rely on the file extension (not content), meaning a `.exe` file renamed to `.pdf` would still reach S3 with `contentType = "application/pdf"` if the magic byte check passed. Since the magic-byte check happens in `processRows()` before `importSingleDocument()` is called, this is fine — the guard is in the right place. Just worth documenting that `contentType` is extension-based and the actual content gating happens earlier. #### 2. Magic-byte check is a good first layer, but PDF header version not validated The check validates `%PDF` (4 bytes). A legitimate PDF also contains `%PDF-1.x` but that 5th byte is not checked. This is intentional (PDF/A-4 exists, future versions exist), and the current check is the correct baseline. **Not a blocker** — just documenting the threat model scope. #### 3. No test for `@RequirePermission` on the import endpoint The `AdminControllerTest` tests the import-status endpoint, but I want to confirm: is there a test that verifies unauthorized users cannot trigger the import? That test should verify that someone without `ADMIN` permission gets a 403 when calling `POST /api/admin/import`. This is orthogonal to this PR, but since `AdminControllerTest` was touched, it's worth confirming coverage exists. #### 4. `openFileStream` is package-private for test seam The comment `// package-private: Mockito spy in tests can override to inject IOException` is the correct pattern for this stack and is the right call given Spring 7's constructor injection cycle constraints. This is a documented, intentional design choice. Security smell only: the method is callable from within the same package — but `MassImportService` is the only class in that package that needs it, so the exposure is minimal. #### 5. Skipped files list is held in-memory in `volatile ImportStatus` The `skippedFiles` list could grow arbitrarily large if an import batch contains thousands of invalid files. There's no cap on the size of `skippedFiles`. For this family archive use case (dozens to hundreds of files per import), this is not a practical risk. Flag it as a future concern if imports scale up. #### 6. SLF4J parameterized logging is used correctly — no Log4Shell risk `log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename)` — correct. The `{}` placeholder does not evaluate JNDI expressions. Good hygiene. --- ### What's Done Well - The security control is in the right layer (before S3 upload, not after) - Error handling fails closed: both `INVALID_PDF_SIGNATURE` and `FILE_READ_ERROR` result in skip, not silent pass-through - Regression tests prove the vulnerability existed and is now fixed (TDD-style: the test for the attack scenario is permanent) - Logging uses parameterized SLF4J throughout — no injection risk

marcel commented

2026-05-19 07:15:36 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ⚠️ Approved with concerns

Clean implementation overall. The backend logic is well-structured, the test coverage is meaningful, and the frontend change is minimal and correct. A few things catch my eye on clean code and TDD hygiene.

Blockers

1. `buildMinimalImportXlsx` only populates the index column (column 0)

In the test helper buildMinimalImportXlsx(Path dir, String... filenames), each row only sets cell(0) (the Index column). All other columns (colBox, colFolder, colSender, etc.) are empty. This means importSingleDocument is called with blank metadata for every field.

The tests still exercise the magic-byte path correctly, but if importSingleDocument has any NPE-prone path when all metadata is blank, these tests would not catch it. More importantly: the test for "uploads valid PDF and skips fake one" (runImportAsync_uploadsValidPdf_andSkipsFakeOne) calls verify(s3Client, times(1)).putObject(...), which proves the upload happened. But does documentService.save(any()) get called once too? The test only stubs when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0)) in the helper — that's fine. The blocker concern is mild: the three tests sharing setupOneValidOneFakeImport run service.runImportAsync() independently and each calls the full setup (via @TempDir), but since @TempDir is per-test, each test re-runs the full import. This means the s3Client.putObject stub is re-used across three separate tests that each call runImportAsync() — and only one of those three actually checks for times(1). The other two don't assert on s3Client at all. That's fine — one assertion per test is correct. Not a blocker, just flagging.

Suggestions

2. `importSingleDocument` return type change from `void` to `boolean` is a breaking semantic change

importSingleDocument now returns boolean, but the semantic of false conflates two very different outcomes:

"document already exists, skipped intentionally" → false
"S3 upload failed" → false

These are different outcomes. The caller in processRows treats both as "not processed" and does not add them to skippedFiles. An S3 upload failure is silently lost from the skipped-files list. The caller logs the error but the admin UI will not show it in the amber warning section. Suggestion: either add a third outcome (e.g. S3_UPLOAD_FAILED reason in skippedFiles), or return an enum ImportOutcome { IMPORTED, SKIPPED_EXISTING, UPLOAD_FAILED } so the caller can route to the right bucket.

3. `processRows` is now 50+ lines doing validation + import + skip tracking

The method now orchestrates: find-file, validate-magic-bytes, handle-IOE, import-document, count. Consider extracting validateFileSignature(File file, String filename, List<SkippedFile> skippedFiles) returning a boolean — it would make the main loop cleaner:

if (fileOnDisk.isPresent() && !validateAndAcceptFile(fileOnDisk.get(), filename, skippedFiles)) {
    continue;
}

4. `{#each importStatus.skippedFiles as skipped (skipped.filename)}` — key is `filename`

If two files from different folders have the same filename, the key is non-unique and Svelte's reconciliation could corrupt state. For this family archive use case (flat import dir), collisions are unlikely — but the correct key would be (skipped.filename + skipped.reason) or a synthetic index.

5. `reasonLabel` in Svelte is a plain function, not `$derived`

function reasonLabel(code: string): string { ... }

This is correct — it's called inline in the template and has no reactive dependencies beyond code. No issue here, just confirming it's intentional.

6. Frontend `types.ts` — manually maintained type instead of generated

SkippedFile and the updated ImportStatus fields are manually added to frontend/src/routes/admin/system/types.ts instead of being generated from the OpenAPI spec. Per CONTRIBUTING.md, npm run generate:api should be run after backend model changes. The PR description does not mention regenerating the OpenAPI types. If ImportStatus is in the OpenAPI spec (which it should be, given the @Schema annotations), the manually maintained types.ts will drift from the spec. Suggestion: remove types.ts and use the generated type, or confirm in the PR that generate:api was run and the generated type is used elsewhere.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ⚠️ Approved with concerns** Clean implementation overall. The backend logic is well-structured, the test coverage is meaningful, and the frontend change is minimal and correct. A few things catch my eye on clean code and TDD hygiene. --- ### Blockers #### 1. `buildMinimalImportXlsx` only populates the index column (column 0) In the test helper `buildMinimalImportXlsx(Path dir, String... filenames)`, each row only sets `cell(0)` (the Index column). All other columns (`colBox`, `colFolder`, `colSender`, etc.) are empty. This means `importSingleDocument` is called with blank metadata for every field. The tests still exercise the magic-byte path correctly, but if `importSingleDocument` has any NPE-prone path when all metadata is blank, these tests would not catch it. More importantly: the test for "uploads valid PDF and skips fake one" (`runImportAsync_uploadsValidPdf_andSkipsFakeOne`) calls `verify(s3Client, times(1)).putObject(...)`, which proves the upload happened. But does `documentService.save(any())` get called once too? The test only stubs `when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0))` in the helper — that's fine. The blocker concern is mild: the three tests sharing `setupOneValidOneFakeImport` run `service.runImportAsync()` independently and each calls the full setup (via `@TempDir`), but since `@TempDir` is per-test, each test re-runs the full import. This means the `s3Client.putObject` stub is re-used across three separate tests that each call `runImportAsync()` — and only one of those three actually checks for `times(1)`. The other two don't assert on `s3Client` at all. That's fine — one assertion per test is correct. Not a blocker, just flagging. --- ### Suggestions #### 2. `importSingleDocument` return type change from `void` to `boolean` is a breaking semantic change `importSingleDocument` now returns `boolean`, but the semantic of `false` conflates two very different outcomes: - "document already exists, skipped intentionally" → `false` - "S3 upload failed" → `false` These are different outcomes. The caller in `processRows` treats both as "not processed" and does not add them to `skippedFiles`. An S3 upload failure is silently lost from the skipped-files list. The caller logs the error but the admin UI will not show it in the amber warning section. **Suggestion**: either add a third outcome (e.g. `S3_UPLOAD_FAILED` reason in `skippedFiles`), or return an `enum ImportOutcome { IMPORTED, SKIPPED_EXISTING, UPLOAD_FAILED }` so the caller can route to the right bucket. #### 3. `processRows` is now 50+ lines doing validation + import + skip tracking The method now orchestrates: find-file, validate-magic-bytes, handle-IOE, import-document, count. Consider extracting `validateFileSignature(File file, String filename, List<SkippedFile> skippedFiles)` returning a `boolean` — it would make the main loop cleaner: ```java if (fileOnDisk.isPresent() && !validateAndAcceptFile(fileOnDisk.get(), filename, skippedFiles)) { continue; } ``` #### 4. `{#each importStatus.skippedFiles as skipped (skipped.filename)}` — key is `filename` If two files from different folders have the same filename, the key is non-unique and Svelte's reconciliation could corrupt state. For this family archive use case (flat import dir), collisions are unlikely — but the correct key would be `(skipped.filename + skipped.reason)` or a synthetic index. #### 5. `reasonLabel` in Svelte is a plain function, not `$derived` ```typescript function reasonLabel(code: string): string { ... } ``` This is correct — it's called inline in the template and has no reactive dependencies beyond `code`. No issue here, just confirming it's intentional. #### 6. Frontend `types.ts` — manually maintained type instead of generated `SkippedFile` and the updated `ImportStatus` fields are manually added to `frontend/src/routes/admin/system/types.ts` instead of being generated from the OpenAPI spec. Per CONTRIBUTING.md, `npm run generate:api` should be run after backend model changes. The PR description does not mention regenerating the OpenAPI types. If `ImportStatus` is in the OpenAPI spec (which it should be, given the `@Schema` annotations), the manually maintained `types.ts` will drift from the spec. **Suggestion**: remove `types.ts` and use the generated type, or confirm in the PR that generate:api was run and the generated type is used elsewhere.

marcel commented

2026-05-19 07:16:02 +02:00

🏛️ Markus Keller (@mkeller) — Senior Application Architect

Verdict: ⚠️ Approved with concerns

The implementation is correct in its layering — validation happens in the service before the S3 call, not scattered across the controller or pushed into a separate utility class. However, the PR introduces some structural issues worth addressing.

Blockers

1. `ImportStatus` record now has a derived method `skipped()` annotated with `@Schema(requiredMode = REQUIRED)` that is NOT a record component

Java records generate accessor methods for their components automatically. skipped() is a compact accessor (actually a non-component method) added to the record body. This works at runtime but the @Schema annotation on a non-component method in a record is unusual — OpenAPI generation behavior depends on whether SpringDoc/Swagger processes this as a property or ignores it. The @JsonProperty("skipped") ensures Jackson serializes it, but does the OpenAPI spec actually include skipped as a required field in the generated ImportStatus schema?

If the OpenAPI spec does NOT include skipped, then the frontend's ImportStatus type in types.ts (which manually adds skipped: number) diverges from the generated spec. Run npm run generate:api and verify the output includes skipped as a non-optional field. If it doesn't, this is a spec-contract bug.

2. `skippedFiles` is a mutable `ArrayList` passed into an immutable record — defensive copy missing

In processRows():

List<SkippedFile> skippedFiles = new ArrayList<>();
// ... mutations ...
return new ProcessResult(processed, skippedFiles);

Then in runImportAsync():

currentStatus = new ImportStatus(State.DONE, ..., result.skippedFiles(), ...);

result.skippedFiles() returns the same ArrayList reference. ImportStatus stores it without copying. Since currentStatus is volatile and shared across threads (the import runs @Async), any concurrent read of currentStatus.skippedFiles() gets the same ArrayList — which is fine since the list is not mutated after the status is set. But it's fragile: if anyone adds post-set mutations in future, this becomes a data race. Suggestion: wrap with List.copyOf(result.skippedFiles()) when constructing the final DONE status.

Suggestions

3. `ProcessResult` is a package-private record — consider whether it should be private to the method

ProcessResult is only used as a return type from processRows() to runImportAsync(). Both are in the same class. This record could be a private record nested inside MassImportService. Making it package-private leaks an internal implementation detail. Not a boundary violation, but cleaner as a private nested type.

4. `openFileStream` is package-private for testability — this is the right call, but document it in the ADR

The CLAUDE.md and MEMORY.md note that Spring Boot 4 / Spring Framework 7 breaks constructor injection cycles, and the solution chosen here (package-private override point for Mockito spy) is the documented fallback. An entry in the existing ADR (or a brief note in the CLAUDE.md backend architecture section) explaining why this test seam exists would help future developers understand it's intentional and not accidental leakage.

5. No doc update required for this PR

Checking the architecture update table:

No new Flyway migration → no DB diagram update needed
No new package or domain module → CLAUDE.md package table unchanged
No new SvelteKit route → route table unchanged
No new Docker service → Compose unchanged
SkippedFile is a new domain concept — consider adding it to docs/GLOSSARY.md so future developers know it refers to files rejected during mass import

This is a suggestion, not a blocker — the term is self-explanatory in context.

What's Done Well

Validation is in the right layer (service, before S3 upload)
The record-based API shape (SkippedFile + skipped() derived method) is the right Java 21 idiom for this
The @Async method correctly transitions currentStatus atomically via assignment (records are immutable, volatile assignment is atomic)
The test strategy (real temp files, Mockito spy for IOException) is appropriate for this layer

## 🏛️ Markus Keller (@mkeller) — Senior Application Architect **Verdict: ⚠️ Approved with concerns** The implementation is correct in its layering — validation happens in the service before the S3 call, not scattered across the controller or pushed into a separate utility class. However, the PR introduces some structural issues worth addressing. --- ### Blockers #### 1. `ImportStatus` record now has a derived method `skipped()` annotated with `@Schema(requiredMode = REQUIRED)` that is NOT a record component Java records generate accessor methods for their components automatically. `skipped()` is a **compact accessor** (actually a non-component method) added to the record body. This works at runtime but the `@Schema` annotation on a non-component method in a record is unusual — OpenAPI generation behavior depends on whether SpringDoc/Swagger processes this as a property or ignores it. The `@JsonProperty("skipped")` ensures Jackson serializes it, but does the OpenAPI spec actually include `skipped` as a required field in the generated `ImportStatus` schema? If the OpenAPI spec does NOT include `skipped`, then the frontend's `ImportStatus` type in `types.ts` (which manually adds `skipped: number`) diverges from the generated spec. Run `npm run generate:api` and verify the output includes `skipped` as a non-optional field. **If it doesn't, this is a spec-contract bug.** #### 2. `skippedFiles` is a mutable `ArrayList` passed into an immutable record — defensive copy missing In `processRows()`: ```java List<SkippedFile> skippedFiles = new ArrayList<>(); // ... mutations ... return new ProcessResult(processed, skippedFiles); ``` Then in `runImportAsync()`: ```java currentStatus = new ImportStatus(State.DONE, ..., result.skippedFiles(), ...); ``` `result.skippedFiles()` returns the same `ArrayList` reference. `ImportStatus` stores it without copying. Since `currentStatus` is `volatile` and shared across threads (the import runs `@Async`), any concurrent read of `currentStatus.skippedFiles()` gets the same `ArrayList` — which is fine since the list is not mutated after the status is set. But it's fragile: if anyone adds post-set mutations in future, this becomes a data race. **Suggestion**: wrap with `List.copyOf(result.skippedFiles())` when constructing the final `DONE` status. --- ### Suggestions #### 3. `ProcessResult` is a package-private record — consider whether it should be private to the method `ProcessResult` is only used as a return type from `processRows()` to `runImportAsync()`. Both are in the same class. This record could be a `private record` nested inside `MassImportService`. Making it package-private leaks an internal implementation detail. Not a boundary violation, but cleaner as a private nested type. #### 4. `openFileStream` is package-private for testability — this is the right call, but document it in the ADR The CLAUDE.md and MEMORY.md note that Spring Boot 4 / Spring Framework 7 breaks constructor injection cycles, and the solution chosen here (package-private override point for Mockito spy) is the documented fallback. An entry in the existing ADR (or a brief note in the CLAUDE.md backend architecture section) explaining why this test seam exists would help future developers understand it's intentional and not accidental leakage. #### 5. No doc update required for this PR Checking the architecture update table: - No new Flyway migration → no DB diagram update needed - No new package or domain module → `CLAUDE.md` package table unchanged - No new SvelteKit route → route table unchanged - No new Docker service → Compose unchanged - `SkippedFile` is a new domain concept — consider adding it to `docs/GLOSSARY.md` so future developers know it refers to files rejected during mass import This is a suggestion, not a blocker — the term is self-explanatory in context. --- ### What's Done Well - Validation is in the right layer (service, before S3 upload) - The record-based API shape (`SkippedFile` + `skipped()` derived method) is the right Java 21 idiom for this - The `@Async` method correctly transitions `currentStatus` atomically via assignment (records are immutable, `volatile` assignment is atomic) - The test strategy (real temp files, Mockito spy for IOException) is appropriate for this layer

marcel commented

2026-05-19 07:16:33 +02:00

🧪 Sara Holt (@saraholt) — Senior QA Engineer

Verdict: ⚠️ Approved with concerns

The test coverage for the magic-byte feature itself is solid. Four regression tests cover the essential scenarios. However, I have concerns about test structure, missing coverage paths, and one gap in the existing test suite that this PR inadvertently exposes.

Blockers

1. Three separate tests share a single helper that runs the full `runImportAsync()` import cycle

runImportAsync_uploadsValidPdf_andSkipsFakeOne, runImportAsync_setsSkippedCount_toOne_whenOneFakeFile, and runImportAsync_includesRejectedFilename_inSkippedFiles all call setupOneValidOneFakeImport(tempDir) and then service.runImportAsync(). Each test runs an entire import cycle to assert one thing. This is the right approach for behavior testing — one assertion per test, with @TempDir ensuring isolation per test. Not a blocker per se, but the three tests are testing three aspects of the same behavior. They could be a single test with three assertions if the behavior is truly atomic — or documented as intentionally separate for failure locality. Currently the structure is defensible.

Actual blocker: the test runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException creates a MassImportService spyService = spy(service). This spy is created after service is already configured with ReflectionTestUtils.setField(service, "importDir", ...). The spy wraps the same object instance, so the field set on service should be visible through spyService — Mockito's spy() delegates to the real object. However, @InjectMocks and @Async interact: runImportAsync() on a Mockito spy of a @Service bean is called synchronously in tests (no real async executor is wired in unit tests). This is fine. But: spyService.runImportAsync() will call processRows() on the spy, which calls openFileStream() on the spy, which is stubbed. The doThrow(IOException).when(spyService).openFileStream(any()) will intercept ALL openFileStream calls including the valid real.pdf — this test doesn't use setupOneValidOneFakeImport, it only has unreadable.pdf, so there's only one file. This is fine. Test is correct.

Suggestions

2. Missing test: S3 upload failure is not tracked in `skippedFiles`

When importSingleDocument returns false due to an S3 upload exception, the file is silently dropped — it's neither counted in processed nor added to skippedFiles. There is no test asserting this behavior. A test should verify:

void runImportAsync_doesNotCountS3FailureInSkipped_orProcessed()

This documents the current behavior (intentional or not) and prevents silent changes.

3. Missing test: file that passes magic-byte check but has wrong content still uploads

The validation only checks the first 4 bytes. A test like runImportAsync_uploadsFileWithPdfHeaderButNonPdfContent (e.g. a file starting with %PDF followed by garbage) would confirm the boundary of the validation. This is a documentation-test, not a security test — it establishes that the check is "header only, not deep validation." Currently there's no test for this and it could surprise future maintainers.

4. `buildMinimalImportXlsx` only sets the index column — all other columns are blank

The helper creates a row with only sheet.createRow(i+1).createCell(0).setCellValue(filenames[i]). This means getCell(cells, colSender), getCell(cells, colDate), etc., all return "". The import path through importSingleDocument still works because those are nullable fields, but the test doesn't exercise any metadata parsing. A comment in the helper explaining this scope limitation would help future test authors.

5. Test names follow the `runImportAsync_*` naming convention consistently — good

All five new test names are descriptive and follow the existing pattern. The one exception is runImportAsync_skipsFile_whenShorterThanFourBytes — slightly ambiguous about whether it's "skips" (outcome) or "when" (condition). The current naming is fine but runImportAsync_skipsFile_whenFileShorterThanFourBytes reads slightly better.

6. Frontend tests: `getByTestId('skipped-count')` — `data-testid` selector is correct

Using data-testid is the right approach for QA selectors on dynamic counts. The test correctly asserts .not.toBeInTheDocument() for the RUNNING and FAILED states. Coverage for the IDLE state (no skipped section shown) is implicitly covered by the existing tests that use makeStatus() with defaults (skipped: 0). Explicit test for IDLE state would be cleaner but is not a blocker.

What's Done Well

Four regression tests that are permanent — correct TDD artifact
Mockito spy approach for the IOException path is the right tool here
Frontend tests cover the reasonLabel fallback (unknown codes) — good future-proofing
makeStatus factory function is extended correctly with the new fields
Tests verify absence of the skipped section in RUNNING and FAILED states — comprehensive state coverage

## 🧪 Sara Holt (@saraholt) — Senior QA Engineer **Verdict: ⚠️ Approved with concerns** The test coverage for the magic-byte feature itself is solid. Four regression tests cover the essential scenarios. However, I have concerns about test structure, missing coverage paths, and one gap in the existing test suite that this PR inadvertently exposes. --- ### Blockers #### 1. Three separate tests share a single helper that runs the full `runImportAsync()` import cycle `runImportAsync_uploadsValidPdf_andSkipsFakeOne`, `runImportAsync_setsSkippedCount_toOne_whenOneFakeFile`, and `runImportAsync_includesRejectedFilename_inSkippedFiles` all call `setupOneValidOneFakeImport(tempDir)` and then `service.runImportAsync()`. Each test runs an entire import cycle to assert one thing. This is the right approach for behavior testing — one assertion per test, with `@TempDir` ensuring isolation per test. **Not a blocker per se**, but the three tests are testing three aspects of the same behavior. They could be a single test with three assertions if the behavior is truly atomic — or documented as intentionally separate for failure locality. Currently the structure is defensible. **Actual blocker**: the test `runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException` creates a `MassImportService spyService = spy(service)`. This spy is created *after* `service` is already configured with `ReflectionTestUtils.setField(service, "importDir", ...)`. The spy wraps the *same object instance*, so the field set on `service` should be visible through `spyService` — Mockito's `spy()` delegates to the real object. However, `@InjectMocks` and `@Async` interact: `runImportAsync()` on a Mockito spy of a `@Service` bean is called synchronously in tests (no real async executor is wired in unit tests). This is fine. But: `spyService.runImportAsync()` will call `processRows()` on the spy, which calls `openFileStream()` on the spy, which is stubbed. The `doThrow(IOException).when(spyService).openFileStream(any())` will intercept ALL `openFileStream` calls including the valid `real.pdf` — **this test doesn't use `setupOneValidOneFakeImport`**, it only has `unreadable.pdf`, so there's only one file. This is fine. Test is correct. --- ### Suggestions #### 2. Missing test: S3 upload failure is not tracked in `skippedFiles` When `importSingleDocument` returns `false` due to an S3 upload exception, the file is silently dropped — it's neither counted in `processed` nor added to `skippedFiles`. There is no test asserting this behavior. A test should verify: ```java void runImportAsync_doesNotCountS3FailureInSkipped_orProcessed() ``` This documents the current behavior (intentional or not) and prevents silent changes. #### 3. Missing test: file that passes magic-byte check but has wrong content still uploads The validation only checks the first 4 bytes. A test like `runImportAsync_uploadsFileWithPdfHeaderButNonPdfContent` (e.g. a file starting with `%PDF` followed by garbage) would confirm the boundary of the validation. This is a documentation-test, not a security test — it establishes that the check is "header only, not deep validation." Currently there's no test for this and it could surprise future maintainers. #### 4. `buildMinimalImportXlsx` only sets the index column — all other columns are blank The helper creates a row with only `sheet.createRow(i+1).createCell(0).setCellValue(filenames[i])`. This means `getCell(cells, colSender)`, `getCell(cells, colDate)`, etc., all return `""`. The import path through `importSingleDocument` still works because those are nullable fields, but the test doesn't exercise any metadata parsing. A comment in the helper explaining this scope limitation would help future test authors. #### 5. Test names follow the `runImportAsync_*` naming convention consistently — good All five new test names are descriptive and follow the existing pattern. The one exception is `runImportAsync_skipsFile_whenShorterThanFourBytes` — slightly ambiguous about whether it's "skips" (outcome) or "when" (condition). The current naming is fine but `runImportAsync_skipsFile_whenFileShorterThanFourBytes` reads slightly better. #### 6. Frontend tests: `getByTestId('skipped-count')` — `data-testid` selector is correct Using `data-testid` is the right approach for QA selectors on dynamic counts. The test correctly asserts `.not.toBeInTheDocument()` for the RUNNING and FAILED states. Coverage for the IDLE state (no skipped section shown) is implicitly covered by the existing tests that use `makeStatus()` with defaults (`skipped: 0`). Explicit test for IDLE state would be cleaner but is not a blocker. --- ### What's Done Well - Four regression tests that are permanent — correct TDD artifact - Mockito spy approach for the IOException path is the right tool here - Frontend tests cover the `reasonLabel` fallback (unknown codes) — good future-proofing - `makeStatus` factory function is extended correctly with the new fields - Tests verify absence of the skipped section in RUNNING and FAILED states — comprehensive state coverage

marcel commented

2026-05-19 07:17:02 +02:00

🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead & Accessibility Advocate

Verdict: ⚠️ Approved with concerns

The amber warning section for skipped files is a good UX decision — it's non-blocking (the import succeeded for the valid files), visually distinct from the green success section, and the collapsible pattern is appropriate for a potentially long list of filenames. Let me audit the specifics.

Blockers

1. `<details>/<summary>` is not accessible to screen readers in all browsers — missing `aria-live` for the expanded state

The native <details> element is well-supported but the screen reader announcement behavior for toggling open varies across browser/AT combinations. More critically: the skipped-files warning appears after a completed import. If the import card auto-refreshes (polling), a screen reader user may not be notified that new content appeared.

The wrapping card should include aria-live="polite" on the section that updates after import completes. Since this is an admin-only screen used by technically capable operators (not the senior 60+ read-path audience), this is a concern rather than a critical blocker — but it's still worth addressing.

Specific fix: wrap the DONE state section with role="status" or place an aria-live="polite" container around the dynamic import-result content.

2. The `<summary>` has no accessible label for screen readers

The <summary> contains an SVG chevron and a <div> with the count and label. The <summary> itself has no aria-label. Screen readers will announce the full text content of the summary, which will be "3 ÜBERSPRUNGEN" (or equivalent) — that's marginally acceptable. However the SVG has no aria-hidden="true", so screen readers may also announce the SVG content (which has no <title> element).

Fix: add aria-hidden="true" to the SVG:

<svg aria-hidden="true" class="details-chevron ...">

Suggestions

3. Color-only distinction: `border-warning/40 bg-warning/10 text-amber-900`

The amber warning box uses color to communicate "this is a warning." There's no icon, no warning label, no text other than the count and label. For color-blind users, the amber/green distinction may be invisible.

Suggestion: add a warning icon (⚠ or SVG) before the count in the summary, with aria-hidden="true". The text label "übersprungen/skipped" partially compensates, but an icon reinforces the semantic.

4. `font-mono text-sm text-ink-2` for the filename list

text-sm = 14px. The project's minimum is 12px and the audience guidance says 16px preferred. For an admin card viewed by technically capable operators, 14px is acceptable. For a senior user (60+), 14px mono font in text-ink-2 (reduced contrast) would be difficult. Since this is admin-only, this is a minor concern rather than a blocker.

5. The chevron rotation CSS uses a `<style>` block — check for Tailwind CSS 4 compatibility

<style>
details[open] .details-chevron {
  transform: rotate(90deg);
}
</style>

This uses a scoped style block in Svelte, which is fine. However, motion-safe:transition-transform is already applied via Tailwind. The rotation itself could be expressed in Tailwind with a group-open:rotate-90 pattern if <details> were wrapped with class="group". The current approach works but mixes Tailwind utility classes with a <style> block. Minor inconsistency with the project's Tailwind-first approach — not a blocker.

6. The collapsible list has no maximum height / scroll — a large number of skipped files could make the page very tall

If an import of 500 files has 400 invalid ones, the expanded <ul> would be 400 lines tall. Add max-h-48 overflow-y-auto (or similar) to the <ul> so the list scrolls rather than extending the page:

<ul class="mt-3 space-y-1 max-h-48 overflow-y-auto">

What's Done Well

The amber/green visual separation is clear and on-brand
Native <details>/<summary> is the right choice — no JavaScript required, good progressive enhancement
motion-safe:transition-transform respects prefers-reduced-motion — good accessibility hygiene
cursor-pointer list-none on <summary> correctly removes the browser-default triangle disclosure marker
The data-testid="skipped-count" attribute enables targeted testing without relying on DOM structure

## 🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead & Accessibility Advocate **Verdict: ⚠️ Approved with concerns** The amber warning section for skipped files is a good UX decision — it's non-blocking (the import succeeded for the valid files), visually distinct from the green success section, and the collapsible pattern is appropriate for a potentially long list of filenames. Let me audit the specifics. --- ### Blockers #### 1. `<details>/<summary>` is not accessible to screen readers in all browsers — missing `aria-live` for the expanded state The native `<details>` element is well-supported but the screen reader announcement behavior for toggling `open` varies across browser/AT combinations. More critically: the skipped-files warning appears *after* a completed import. If the import card auto-refreshes (polling), a screen reader user may not be notified that new content appeared. The wrapping card should include `aria-live="polite"` on the section that updates after import completes. Since this is an admin-only screen used by technically capable operators (not the senior 60+ read-path audience), this is a concern rather than a critical blocker — but it's still worth addressing. **Specific fix**: wrap the DONE state section with `role="status"` or place an `aria-live="polite"` container around the dynamic import-result content. #### 2. The `<summary>` has no accessible label for screen readers The `<summary>` contains an SVG chevron and a `<div>` with the count and label. The `<summary>` itself has no `aria-label`. Screen readers will announce the full text content of the summary, which will be `"3 ÜBERSPRUNGEN"` (or equivalent) — that's marginally acceptable. However the SVG has no `aria-hidden="true"`, so screen readers may also announce the SVG content (which has no `<title>` element). **Fix**: add `aria-hidden="true"` to the SVG: ```svelte <svg aria-hidden="true" class="details-chevron ..."> ``` --- ### Suggestions #### 3. Color-only distinction: `border-warning/40 bg-warning/10 text-amber-900` The amber warning box uses color to communicate "this is a warning." There's no icon, no warning label, no text other than the count and label. For color-blind users, the amber/green distinction may be invisible. **Suggestion**: add a warning icon (⚠ or SVG) before the count in the summary, with `aria-hidden="true"`. The text label "übersprungen/skipped" partially compensates, but an icon reinforces the semantic. #### 4. `font-mono text-sm text-ink-2` for the filename list `text-sm` = 14px. The project's minimum is 12px and the audience guidance says 16px preferred. For an admin card viewed by technically capable operators, 14px is acceptable. For a senior user (60+), 14px mono font in `text-ink-2` (reduced contrast) would be difficult. Since this is admin-only, this is a minor concern rather than a blocker. #### 5. The chevron rotation CSS uses a `<style>` block — check for Tailwind CSS 4 compatibility ```svelte <style> details[open] .details-chevron { transform: rotate(90deg); } </style> ``` This uses a scoped style block in Svelte, which is fine. However, `motion-safe:transition-transform` is already applied via Tailwind. The rotation itself could be expressed in Tailwind with a `group-open:rotate-90` pattern if `<details>` were wrapped with `class="group"`. The current approach works but mixes Tailwind utility classes with a `<style>` block. Minor inconsistency with the project's Tailwind-first approach — not a blocker. #### 6. The collapsible list has no maximum height / scroll — a large number of skipped files could make the page very tall If an import of 500 files has 400 invalid ones, the expanded `<ul>` would be 400 lines tall. Add `max-h-48 overflow-y-auto` (or similar) to the `<ul>` so the list scrolls rather than extending the page: ```svelte <ul class="mt-3 space-y-1 max-h-48 overflow-y-auto"> ``` --- ### What's Done Well - The amber/green visual separation is clear and on-brand - Native `<details>/<summary>` is the right choice — no JavaScript required, good progressive enhancement - `motion-safe:transition-transform` respects prefers-reduced-motion — good accessibility hygiene - `cursor-pointer list-none` on `<summary>` correctly removes the browser-default triangle disclosure marker - The `data-testid="skipped-count"` attribute enables targeted testing without relying on DOM structure

marcel commented

2026-05-19 07:17:12 +02:00

🏗️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer

Verdict: ✅ Approved

This PR has no infrastructure, CI, or deployment changes. My review is limited to operational concerns arising from the backend change.

No blockers

Suggestions

1. S3 storage growth: rejected files still exist on the import filesystem after rejection

The magic-byte check prevents invalid files from being uploaded to S3 — correct. But the invalid files remain on disk in the importDir volume. Over time, with repeated import attempts, the import directory could accumulate rejected files. There's no cleanup mechanism. This is an operational concern, not a code bug.

Suggestion: document in the import runbook (or log a WARN at import-complete time) that the import directory should be reviewed and cleaned after each import run. Or add a post-import cleanup step that moves rejected files to a rejected/ subfolder.

2. Memory: `skippedFiles` list is held in-memory until the next import run

The currentStatus field (volatile, in-memory) holds the full skippedFiles list between import runs. For the Familienarchiv's expected import size (hundreds of files), this is fine — the list will be tiny. If imports ever scale to thousands of files with many rejections, this list could hold significant memory. Not a concern for current scale.

3. No CI pipeline changes — no action needed

The new tests are picked up automatically by the existing Maven test run in CI. No workflow changes needed.

What's Done Well

The magic-byte read opens and closes the file immediately (try-with-resources) — no resource leak risk on the import server
The @Async method means the import doesn't block the HTTP thread — correct for a long-running operation
No new infrastructure dependencies introduced — the validation is pure Java I/O, no external service calls

## 🏗️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer **Verdict: ✅ Approved** This PR has no infrastructure, CI, or deployment changes. My review is limited to operational concerns arising from the backend change. --- ### No blockers --- ### Suggestions #### 1. S3 storage growth: rejected files still exist on the import filesystem after rejection The magic-byte check prevents invalid files from being uploaded to S3 — correct. But the invalid files remain on disk in the `importDir` volume. Over time, with repeated import attempts, the import directory could accumulate rejected files. There's no cleanup mechanism. This is an operational concern, not a code bug. **Suggestion**: document in the import runbook (or log a `WARN` at import-complete time) that the import directory should be reviewed and cleaned after each import run. Or add a post-import cleanup step that moves rejected files to a `rejected/` subfolder. #### 2. Memory: `skippedFiles` list is held in-memory until the next import run The `currentStatus` field (volatile, in-memory) holds the full `skippedFiles` list between import runs. For the Familienarchiv's expected import size (hundreds of files), this is fine — the list will be tiny. If imports ever scale to thousands of files with many rejections, this list could hold significant memory. Not a concern for current scale. #### 3. No CI pipeline changes — no action needed The new tests are picked up automatically by the existing Maven test run in CI. No workflow changes needed. --- ### What's Done Well - The magic-byte read opens and closes the file immediately (try-with-resources) — no resource leak risk on the import server - The `@Async` method means the import doesn't block the HTTP thread — correct for a long-running operation - No new infrastructure dependencies introduced — the validation is pure Java I/O, no external service calls

marcel commented

2026-05-19 07:17:40 +02:00

📋 Elicit — Requirements Engineer

Verdict: ⚠️ Approved with concerns

Reviewing this PR through the lens of requirements completeness, acceptance criteria coverage, and edge-case specification.

Blockers

None — the core requirement (validate PDF magic bytes before S3 upload) is implemented and tested.

Concerns

1. The distinction between "skipped" and "failed to import" is not surfaced to the user

From the admin's perspective, a file can be absent from the processed count for three reasons:

It passed magic-byte check but its S3 upload failed (silently dropped — neither in processed nor skippedFiles)
It already existed in the archive at non-PLACEHOLDER status (silently skipped)
It failed the magic-byte check (correctly shown in amber section)

The current UI only shows the third category in the amber section. Categories 1 and 2 are invisible to the admin. The amber section is titled "übersprungen" (skipped), but it only captures a subset of skips. Risk: an admin sees "10 processed, 2 skipped" and assumes those 2 were rejected as non-PDFs, not knowing a 3rd file failed S3 upload silently.

Suggestion: at minimum, document in the amber section header that "skipped" means "rejected before upload" — not "all non-processed files."

2. Acceptance criterion gap: what happens when the spreadsheet row references a file that exists but is not a PDF?

The original issue (#529) presumably specified that non-PDF files should be rejected. The implementation handles:

File not found → metadata-only import (existing behavior, not changed)
File found, not a PDF → skipped with INVALID_PDF_SIGNATURE
File found, is a PDF → imported

But what is the expected behavior for an .xlsx or .jpg file named in the spreadsheet? The current code appends .pdf to filenames without an extension (if (index.contains(".")) ? index : index + ".pdf"), meaning a spreadsheet row with index "photo" will look for photo.pdf. If photo.jpg exists instead, it won't be found (metadata-only). But if the spreadsheet explicitly names photo.jpg, the code will try to find it, find the JPEG, and then the magic-byte check will correctly reject it. Is this the intended user experience? The admin may not understand why their JPEG was rejected.

Suggestion: the rejection reason INVALID_PDF_SIGNATURE could be improved to NOT_A_PDF for clarity, since the admin's mental model is "I gave you a JPEG, not a PDF" rather than "the signature bytes were wrong."

3. i18n: the `reason` field in the API response is an English enum-style string (`INVALID_PDF_SIGNATURE`)

The backend returns reason: "INVALID_PDF_SIGNATURE" as a raw string in the API response. The frontend maps this to a localized label via reasonLabel(). This is the correct pattern for this codebase. However, reasonLabel() has no fallback i18n key — unknown codes fall through to the raw English string. For a non-English-speaking admin (the app supports de/en/es), a future reason code added in a backend release without a corresponding frontend mapping would display English to a German speaker.

Current mitigation: the fallback is the raw code string, which is English by nature. This is acceptable for an admin-facing technical detail — admins are presumably technical enough to interpret SOME_NEW_REASON_CODE. No immediate requirement to fix, but worth tracking.

4. Requirements completeness: no acceptance criterion for "import completes normally when all files are valid PDFs and skipped count is 0"

The test suite covers the case where skipped === 0 hides the amber section (frontend test). But there's no backend test explicitly asserting that skippedFiles is empty and processed equals the total row count when all files are valid. The existing processRows_returnsOne_whenOneValidDocumentNoFile test is close but tests metadata-only import (no file on disk). A test with a valid PDF file (not just metadata-only) as the baseline would complete the regression coverage.

What's Done Well

The PR description clearly states the API shape decision ("per Marcel's API shape decision") — this is the right transparency for requirements traceability
Three i18n locales (de/en/es) are all updated — complete localization coverage
The reasonLabel fallback to raw code is a good extensibility pattern — future reason codes degrade gracefully rather than showing nothing

## 📋 Elicit — Requirements Engineer **Verdict: ⚠️ Approved with concerns** Reviewing this PR through the lens of requirements completeness, acceptance criteria coverage, and edge-case specification. --- ### Blockers None — the core requirement (validate PDF magic bytes before S3 upload) is implemented and tested. --- ### Concerns #### 1. The distinction between "skipped" and "failed to import" is not surfaced to the user From the admin's perspective, a file can be absent from the processed count for three reasons: - It passed magic-byte check but its S3 upload failed (silently dropped — neither in `processed` nor `skippedFiles`) - It already existed in the archive at non-PLACEHOLDER status (silently skipped) - It failed the magic-byte check (correctly shown in amber section) The current UI only shows the third category in the amber section. Categories 1 and 2 are invisible to the admin. The amber section is titled "übersprungen" (skipped), but it only captures a subset of skips. **Risk**: an admin sees "10 processed, 2 skipped" and assumes those 2 were rejected as non-PDFs, not knowing a 3rd file failed S3 upload silently. **Suggestion**: at minimum, document in the amber section header that "skipped" means "rejected before upload" — not "all non-processed files." #### 2. Acceptance criterion gap: what happens when the spreadsheet row references a file that exists but is not a PDF? The original issue (#529) presumably specified that non-PDF files should be rejected. The implementation handles: - File not found → metadata-only import (existing behavior, not changed) - File found, not a PDF → skipped with `INVALID_PDF_SIGNATURE` - File found, is a PDF → imported But what is the expected behavior for an `.xlsx` or `.jpg` file named in the spreadsheet? The current code appends `.pdf` to filenames without an extension (`if (index.contains(".")) ? index : index + ".pdf"`), meaning a spreadsheet row with index `"photo"` will look for `photo.pdf`. If `photo.jpg` exists instead, it won't be found (metadata-only). But if the spreadsheet explicitly names `photo.jpg`, the code will try to find it, find the JPEG, and then the magic-byte check will correctly reject it. Is this the intended user experience? The admin may not understand why their JPEG was rejected. **Suggestion**: the rejection reason `INVALID_PDF_SIGNATURE` could be improved to `NOT_A_PDF` for clarity, since the admin's mental model is "I gave you a JPEG, not a PDF" rather than "the signature bytes were wrong." #### 3. i18n: the `reason` field in the API response is an English enum-style string (`INVALID_PDF_SIGNATURE`) The backend returns `reason: "INVALID_PDF_SIGNATURE"` as a raw string in the API response. The frontend maps this to a localized label via `reasonLabel()`. This is the correct pattern for this codebase. However, `reasonLabel()` has no fallback i18n key — unknown codes fall through to the raw English string. For a non-English-speaking admin (the app supports de/en/es), a future `reason` code added in a backend release without a corresponding frontend mapping would display English to a German speaker. **Current mitigation**: the fallback is the raw code string, which is English by nature. This is acceptable for an admin-facing technical detail — admins are presumably technical enough to interpret `SOME_NEW_REASON_CODE`. No immediate requirement to fix, but worth tracking. #### 4. Requirements completeness: no acceptance criterion for "import completes normally when all files are valid PDFs and skipped count is 0" The test suite covers the case where `skipped === 0` hides the amber section (frontend test). But there's no backend test explicitly asserting that `skippedFiles` is empty and `processed` equals the total row count when all files are valid. The existing `processRows_returnsOne_whenOneValidDocumentNoFile` test is close but tests metadata-only import (no file on disk). A test with a valid PDF file (not just metadata-only) as the baseline would complete the regression coverage. --- ### What's Done Well - The PR description clearly states the API shape decision ("per Marcel's API shape decision") — this is the right transparency for requirements traceability - Three i18n locales (de/en/es) are all updated — complete localization coverage - The `reasonLabel` fallback to raw code is a good extensibility pattern — future reason codes degrade gracefully rather than showing nothing

marcel commented

2026-05-19 07:47:22 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ✅ Approved

Good re-work all round. The S3-failure-surfaced-in-skippedFiles fix, the i18n keys for reason codes, and the reasonLabel() fallback are clean. Here's what I looked at closely:

What's solid

isPdfMagicBytes() is a focused, 6-line method that does one thing. The hex comments (// %, // P, // D, // F) explain intent clearly — this is the right kind of comment.
ProcessResult record is a good introduction — turns a multi-value return into a named concept.
openFileStream() as a package-private seam for Mockito is an acceptable trade-off for testability. The comment explaining why it exists is exactly right.
reasonLabel() in the Svelte component is clean: a plain function, not a derived value (correct — it doesn't depend on reactive state), falls back to the raw code for future-proofing.
The {#each ... as skipped (skipped.filename)} is correctly keyed.
makeStatus() factory in the test extended correctly with skipped: 0, skippedFiles: [].

Suggestions (not blockers)

importSingleDocument now returns boolean and is @Transactional protected. The boolean return makes the method do two things: persist the document AND report whether it happened. A naming nudge: tryImportSingleDocument would signal the conditional return. Not a blocker.
The processRows loop has grown to ~30 lines. It now has four distinct steps: resolve filename → find file → check magic bytes → import. Extracting a processSingleRow(cells) helper would keep the loop body readable. Worth a follow-up issue.
The test runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException creates a Mockito spy on service (which is injected) — this is fine because it calls spyService.runImportAsync() and asserts on spyService.getStatus(). No cross-contamination risk.

No violations found

No @Transactional on read methods.
No raw exceptions — DomainException.conflict() used correctly.
SLF4J {} placeholders used throughout — no string concatenation in log calls.
No unkeyed {#each} blocks.
No business logic left in template markup.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** Good re-work all round. The S3-failure-surfaced-in-`skippedFiles` fix, the i18n keys for reason codes, and the `reasonLabel()` fallback are clean. Here's what I looked at closely: ### What's solid - `isPdfMagicBytes()` is a focused, 6-line method that does one thing. The hex comments (`// %`, `// P`, `// D`, `// F`) explain intent clearly — this is the right kind of comment. - `ProcessResult` record is a good introduction — turns a multi-value return into a named concept. - `openFileStream()` as a package-private seam for Mockito is an acceptable trade-off for testability. The comment explaining why it exists is exactly right. - `reasonLabel()` in the Svelte component is clean: a plain function, not a derived value (correct — it doesn't depend on reactive state), falls back to the raw code for future-proofing. - The `{#each ... as skipped (skipped.filename)}` is correctly keyed. - `makeStatus()` factory in the test extended correctly with `skipped: 0, skippedFiles: []`. ### Suggestions (not blockers) - `importSingleDocument` now returns `boolean` and is `@Transactional protected`. The boolean return makes the method do two things: persist the document AND report whether it happened. A naming nudge: `tryImportSingleDocument` would signal the conditional return. Not a blocker. - The `processRows` loop has grown to ~30 lines. It now has four distinct steps: resolve filename → find file → check magic bytes → import. Extracting a `processSingleRow(cells)` helper would keep the loop body readable. Worth a follow-up issue. - The test `runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException` creates a Mockito spy on `service` (which is injected) — this is fine because it calls `spyService.runImportAsync()` and asserts on `spyService.getStatus()`. No cross-contamination risk. ### No violations found - No `@Transactional` on read methods. - No raw exceptions — `DomainException.conflict()` used correctly. - SLF4J `{}` placeholders used throughout — no string concatenation in log calls. - No unkeyed `{#each}` blocks. - No business logic left in template markup.

marcel commented

2026-05-19 07:47:43 +02:00

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: ✅ Approved

This is what a security fix should look like: threat identified, validated at the earliest possible point in the pipeline (before S3 upload), test coverage proves the flaw existed and is now closed. Well done.

What I audited

Magic-byte check (primary concern)

isPdfMagicBytes() reads exactly 4 bytes and compares them byte-by-byte against %PDF (0x25 0x50 0x44 0x46). This is correct. The check runs before s3Client.putObject(), which is the right enforcement point — the attacker cannot smuggle a non-PDF into object storage by renaming it.

The short-file guard (header.length == 4) correctly rejects files under 4 bytes. A file with a valid 4-byte header followed by arbitrary content still passes — this is the intended behaviour for a magic-byte check (full PDF structural validation is out of scope for an import pipeline).

IOException path

FILE_READ_ERROR skips the file and records it in skippedFiles. The file is not uploaded. This is fail-closed: when we cannot verify, we reject. Correct.

Log safety

log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename) — SLF4J placeholder, no string concatenation. Not exploitable for Log4Shell-style injection. ✓

Observability without data leakage

skippedFiles exposes filename and reason (two structured strings) to the admin API. The filename comes from the spreadsheet cell, not from the file system path, so no absolute paths are leaked. The reason codes (INVALID_PDF_SIGNATURE, FILE_READ_ERROR) are enum-like strings — no stack trace, no exception message reaches the API response. ✓

No new attack surface

No new controller endpoint.
No new permission annotation needed (existing admin gate covers the import-status endpoint).
openFileStream() is package-private — not reachable from HTTP.

One observation (not a blocker)

The importSingleDocument method uses Files.probeContentType() to set the S3 Content-Type header — this relies on the OS mime.types database and can return null (handled with fallback to application/octet-stream). This is a separate code path from the magic-byte check and runs after the magic-byte validation passes. There is no security regression here, but it means a renamed .jpg that happens to start with %PDF would pass the magic-byte check and be stored with image/jpeg content type. This is a pre-existing limitation, not introduced by this PR, and is acceptable for an admin-only import pipeline.

Regression tests: pass criteria

runImportAsync_uploadsValidPdf_andSkipsFakeOne — proves valid PDF reaches S3.
runImportAsync_setsSkippedCount_toOne_whenOneFakeFile — proves rejection is counted.
runImportAsync_includesRejectedFilename_inSkippedFiles — proves filename is recorded.
runImportAsync_skipsFile_whenShorterThanFourBytes — proves the length guard.
runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException — proves fail-closed on read error.

All five are permanent regression tests. If someone removes the magic-byte check in the future, these tests will go red. ✓

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** This is what a security fix should look like: threat identified, validated at the earliest possible point in the pipeline (before S3 upload), test coverage proves the flaw existed and is now closed. Well done. ### What I audited **Magic-byte check (primary concern)** `isPdfMagicBytes()` reads exactly 4 bytes and compares them byte-by-byte against `%PDF` (0x25 0x50 0x44 0x46). This is correct. The check runs before `s3Client.putObject()`, which is the right enforcement point — the attacker cannot smuggle a non-PDF into object storage by renaming it. The short-file guard (`header.length == 4`) correctly rejects files under 4 bytes. A file with a valid 4-byte header followed by arbitrary content still passes — this is the intended behaviour for a magic-byte check (full PDF structural validation is out of scope for an import pipeline). **IOException path** `FILE_READ_ERROR` skips the file and records it in `skippedFiles`. The file is not uploaded. This is fail-closed: when we cannot verify, we reject. Correct. **Log safety** `log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename)` — SLF4J placeholder, no string concatenation. Not exploitable for Log4Shell-style injection. ✓ **Observability without data leakage** `skippedFiles` exposes `filename` and `reason` (two structured strings) to the admin API. The filename comes from the spreadsheet cell, not from the file system path, so no absolute paths are leaked. The reason codes (`INVALID_PDF_SIGNATURE`, `FILE_READ_ERROR`) are enum-like strings — no stack trace, no exception message reaches the API response. ✓ **No new attack surface** - No new controller endpoint. - No new permission annotation needed (existing admin gate covers the import-status endpoint). - `openFileStream()` is package-private — not reachable from HTTP. ### One observation (not a blocker) The `importSingleDocument` method uses `Files.probeContentType()` to set the S3 `Content-Type` header — this relies on the OS `mime.types` database and can return `null` (handled with fallback to `application/octet-stream`). This is a separate code path from the magic-byte check and runs *after* the magic-byte validation passes. There is no security regression here, but it means a renamed `.jpg` that happens to start with `%PDF` would pass the magic-byte check and be stored with `image/jpeg` content type. This is a pre-existing limitation, not introduced by this PR, and is acceptable for an admin-only import pipeline. ### Regression tests: pass criteria - `runImportAsync_uploadsValidPdf_andSkipsFakeOne` — proves valid PDF reaches S3. - `runImportAsync_setsSkippedCount_toOne_whenOneFakeFile` — proves rejection is counted. - `runImportAsync_includesRejectedFilename_inSkippedFiles` — proves filename is recorded. - `runImportAsync_skipsFile_whenShorterThanFourBytes` — proves the length guard. - `runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException` — proves fail-closed on read error. All five are permanent regression tests. If someone removes the magic-byte check in the future, these tests will go red. ✓

marcel commented

2026-05-19 07:47:59 +02:00

🏗️ Markus Keller — Senior Application Architect

Verdict: ✅ Approved

The structural concerns from the previous round have been addressed. This PR stays within the importing package boundary, does not reach into other domains' repositories, and introduces no new service dependencies.

Layer boundary check

MassImportService is the sole owner of importSingleDocument, isPdfMagicBytes, and the new SkippedFile/ProcessResult records. All of these are correctly scoped as inner types or package-private records — they don't leak into other domains.
SkippedFile and ImportStatus are exposed at the API surface via the existing AdminController → MassImportService.getStatus() chain. The types.ts addition on the frontend mirrors this shape. This is consistent with the project's pattern of using entity/record types directly as response shapes.
openFileStream() is package-private by design (documented). This is a reasonable seam — it doesn't widen the public API surface.

Documentation check (per Markus's review table)

What changed	Required update
No new Flyway migration	✓ not applicable
No new `@ManyToMany` or FK	✓ not applicable
No new backend package	✓ not applicable
No new controller or service	✓ not applicable — `AdminController` unchanged
No new SvelteKit route	✓ not applicable
No new Docker service	✓ not applicable
No new external system	✓ not applicable
No auth/upload flow change	✓ not applicable — magic-byte check is pre-upload, not a new upload flow
No new `ErrorCode` or `Permission`	✓ not applicable
Architectural decision with lasting consequences	ADR was created in a previous commit (`ADR-022`). ✓

One observation (not a blocker)

ImportStatus is a Java record embedded as a public nested type in MassImportService. As the API shape grows (now 6 fields + computed accessor), it is approaching the point where extracting it to a top-level file (ImportStatus.java, SkippedFile.java) in the importing package would improve navigability. Not a blocker for this PR — worth a follow-up issue.

Overall

The PR is self-contained. It adds validation at the right layer (service, before I/O to external system), surfaces the result through the existing status API without adding endpoints, and keeps the frontend change minimal. Architecture-wise, this is correct.

## 🏗️ Markus Keller — Senior Application Architect **Verdict: ✅ Approved** The structural concerns from the previous round have been addressed. This PR stays within the `importing` package boundary, does not reach into other domains' repositories, and introduces no new service dependencies. ### Layer boundary check - `MassImportService` is the sole owner of `importSingleDocument`, `isPdfMagicBytes`, and the new `SkippedFile`/`ProcessResult` records. All of these are correctly scoped as inner types or package-private records — they don't leak into other domains. - `SkippedFile` and `ImportStatus` are exposed at the API surface via the existing `AdminController` → `MassImportService.getStatus()` chain. The `types.ts` addition on the frontend mirrors this shape. This is consistent with the project's pattern of using entity/record types directly as response shapes. - `openFileStream()` is package-private by design (documented). This is a reasonable seam — it doesn't widen the public API surface. ### Documentation check (per Markus's review table) | What changed | Required update | |---|---| | No new Flyway migration | ✓ not applicable | | No new `@ManyToMany` or FK | ✓ not applicable | | No new backend package | ✓ not applicable | | No new controller or service | ✓ not applicable — `AdminController` unchanged | | No new SvelteKit route | ✓ not applicable | | No new Docker service | ✓ not applicable | | No new external system | ✓ not applicable | | No auth/upload flow change | ✓ not applicable — magic-byte check is pre-upload, not a new upload flow | | No new `ErrorCode` or `Permission` | ✓ not applicable | | Architectural decision with lasting consequences | ADR was created in a previous commit (`ADR-022`). ✓ | ### One observation (not a blocker) `ImportStatus` is a Java record embedded as a public nested type in `MassImportService`. As the API shape grows (now 6 fields + computed accessor), it is approaching the point where extracting it to a top-level file (`ImportStatus.java`, `SkippedFile.java`) in the `importing` package would improve navigability. Not a blocker for this PR — worth a follow-up issue. ### Overall The PR is self-contained. It adds validation at the right layer (service, before I/O to external system), surfaces the result through the existing status API without adding endpoints, and keeps the frontend change minimal. Architecture-wise, this is correct.

marcel commented

2026-05-19 07:48:21 +02:00

🎨 Leonie Voss — UI/UX Design Lead & Accessibility Strategist

Verdict: ⚠️ Approved with concerns

The amber warning section is well-structured and on-brand. The <details>/<summary> native pattern avoids JavaScript-driven disclosure, which is a good choice. Most previous feedback has been addressed. One remaining accessibility concern is worth noting.

What was fixed ✓

S3 failures now appear in skippedFiles — the warning section is only shown on DONE state, so partial failures surfaced by S3 errors are visible to the admin. ✓
i18n for reason codes — import_reason_invalid_pdf_signature and import_reason_file_read_error added in de/en/es. ✓
Fallback for unknown reason codes — reasonLabel() returns the raw code string instead of silently swallowing it. ✓

Concern (not a blocker, but should be tracked)

<summary> does not have an accessible label that describes the disclosure action.

<summary class="flex cursor-pointer list-none items-center gap-2">
    <svg ...>...</svg>
    <div>
        <span data-testid="skipped-count" ...>{importStatus.skipped}</span>
        <span ...>{m.admin_system_import_skipped_label()}</span>
    </div>
</summary>

The <summary> element announces its text to screen readers, which will read something like "3 ÜBERSPRUNGEN" (the count + label). This is marginally acceptable, but screen readers also need to understand it is a disclosure button. Most browsers expose <summary> as a button role automatically, but the announced text "3 übersprungen" does not communicate what will be expanded. A better pattern:

<summary aria-label="{importStatus.skipped} {m.admin_system_import_skipped_label()} – {m.admin_system_import_skipped_expand_hint()}">

Or add a visually hidden hint like "Klicken zum Anzeigen". This is a Minor finding (WCAG 2.4.6 — Headings and Labels).

No max-height or overflow guard on the <ul> list.

If an import has 200 skipped files, the open <details> will expand to several hundred pixels of monospace filenames without scrolling. Previous review flagged this. Adding max-h-48 overflow-y-auto to the <ul> would bound the expansion:

<ul class="mt-3 space-y-1 max-h-48 overflow-y-auto">

This is a Minor finding — the admin page is desktop-only and the list is unlikely to be massive in practice, but it's worth a follow-up issue.

What's solid ✓

Brand compliance: amber warning uses border-warning/40 bg-warning/10 text-amber-900 — consistent with the project's color token system.
Touch target: the import trigger button has min-h-[44px] — meets WCAG 2.2. ✓
Reduced-motion: the chevron transition is wrapped in motion-safe:transition-transform — respects prefers-reduced-motion. ✓
list-none on <summary>: removes the default disclosure triangle, replaced by the custom SVG chevron. Correct.
Font choices: font-mono for filenames and font-sans for the label are both on-brand. ✓
font-mono text-sm for the filename list: readable for admin context. ✓

## 🎨 Leonie Voss — UI/UX Design Lead & Accessibility Strategist **Verdict: ⚠️ Approved with concerns** The amber warning section is well-structured and on-brand. The `<details>/<summary>` native pattern avoids JavaScript-driven disclosure, which is a good choice. Most previous feedback has been addressed. One remaining accessibility concern is worth noting. ### What was fixed ✓ - **S3 failures now appear in `skippedFiles`** — the warning section is only shown on DONE state, so partial failures surfaced by S3 errors are visible to the admin. ✓ - **i18n for reason codes** — `import_reason_invalid_pdf_signature` and `import_reason_file_read_error` added in de/en/es. ✓ - **Fallback for unknown reason codes** — `reasonLabel()` returns the raw code string instead of silently swallowing it. ✓ ### Concern (not a blocker, but should be tracked) **`<summary>` does not have an accessible label that describes the disclosure action.** ```svelte <summary class="flex cursor-pointer list-none items-center gap-2"> <svg ...>...</svg> <div> <span data-testid="skipped-count" ...>{importStatus.skipped}</span> <span ...>{m.admin_system_import_skipped_label()}</span> </div> </summary> ``` The `<summary>` element announces its text to screen readers, which will read something like "3 ÜBERSPRUNGEN" (the count + label). This is marginally acceptable, but screen readers also need to understand it is a disclosure button. Most browsers expose `<summary>` as a button role automatically, but the announced text "3 übersprungen" does not communicate *what* will be expanded. A better pattern: ```svelte <summary aria-label="{importStatus.skipped} {m.admin_system_import_skipped_label()} – {m.admin_system_import_skipped_expand_hint()}"> ``` Or add a visually hidden hint like "Klicken zum Anzeigen". This is a Minor finding (WCAG 2.4.6 — Headings and Labels). **No `max-height` or overflow guard on the `<ul>` list.** If an import has 200 skipped files, the open `<details>` will expand to several hundred pixels of monospace filenames without scrolling. Previous review flagged this. Adding `max-h-48 overflow-y-auto` to the `<ul>` would bound the expansion: ```svelte <ul class="mt-3 space-y-1 max-h-48 overflow-y-auto"> ``` This is a Minor finding — the admin page is desktop-only and the list is unlikely to be massive in practice, but it's worth a follow-up issue. ### What's solid ✓ - **Brand compliance**: amber warning uses `border-warning/40 bg-warning/10 text-amber-900` — consistent with the project's color token system. - **Touch target**: the import trigger button has `min-h-[44px]` — meets WCAG 2.2. ✓ - **Reduced-motion**: the chevron transition is wrapped in `motion-safe:transition-transform` — respects `prefers-reduced-motion`. ✓ - **`list-none` on `<summary>`**: removes the default disclosure triangle, replaced by the custom SVG chevron. Correct. - **Font choices**: `font-mono` for filenames and `font-sans` for the label are both on-brand. ✓ - **`font-mono text-sm`** for the filename list: readable for admin context. ✓

marcel commented

2026-05-19 07:48:41 +02:00

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: ✅ Approved

This is a solid test suite for the feature. Five new backend regression tests and six new frontend component tests cover the critical paths. The factory pattern is correctly extended. No test debt introduced.

Backend tests

Coverage of the new behaviour:

Scenario	Test
Valid PDF uploaded, fake PDF skipped	`runImportAsync_uploadsValidPdf_andSkipsFakeOne`
`skipped` count is 1	`runImportAsync_setsSkippedCount_toOne_whenOneFakeFile`
Rejected filename in `skippedFiles`	`runImportAsync_includesRejectedFilename_inSkippedFiles`
File shorter than 4 bytes	`runImportAsync_skipsFile_whenShorterThanFourBytes`
IOException during magic-bytes check	`runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException`

All five scenarios are covered. The three shared-setup tests correctly use a setupOneValidOneFakeImport() helper — no repeated builder code. ✓

ReflectionTestUtils.invokeMethod tests — the two existing processRows tests are correctly updated to use MassImportService.ProcessResult instead of Integer. ✓

buildMinimalImportXlsx varargs — the helper now accepts String... filenames to support multi-file scenarios. This is a clean, DRY evolution of the existing single-file helper.

Mockito spy usage — the IOException test uses spy(service) correctly: the spy wraps the real object, doThrow overrides only openFileStream, and assertions are made on spyService (not service). No risk of asserting on the un-spied instance.

lenient() use — lenient().when(documentService.findByOriginalFilename(any())) in the tiny-file and IOException tests avoids unnecessary Mockito strictness failures when those stubs aren't called due to early exit. Correct.

Frontend tests

Six new component tests cover:

Skipped count displayed when skipped > 0 and state is DONE
Skipped filenames visible in the list
Section absent when skipped === 0
Section absent when state is RUNNING
Section absent when state is FAILED
Raw reason code shown for unknown reason strings

All tests use getByTestId / getByText — testing user-visible behaviour, not internal state. ✓

The RUNNING and FAILED negative cases are especially valuable — they document the intentional scoping of the warning to DONE state only.

Gaps (minor, not blockers)

No test for the case where skippedFiles list is present but skipped count is 0 (shouldn't happen by invariant since skipped() is derived from skippedFiles.size(), but defensive coverage would be belt-and-suspenders).
The AdminControllerTest updates are correct structural fixes (constructor argument added) — no new assertions were needed since the controller behaviour didn't change.
No load/performance test for a large skippedFiles list — acceptable at this scale.

Test pyramid placement ✓

Unit tests (@ExtendWith(MockitoExtension.class)) for service logic — correct layer.
Component tests (vitest-browser-svelte) for the Svelte component — correct layer.
No E2E test added — appropriate; this feature is admin-only and the unit/component coverage is sufficient.

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ✅ Approved** This is a solid test suite for the feature. Five new backend regression tests and six new frontend component tests cover the critical paths. The factory pattern is correctly extended. No test debt introduced. ### Backend tests **Coverage of the new behaviour:** | Scenario | Test | |---|---| | Valid PDF uploaded, fake PDF skipped | `runImportAsync_uploadsValidPdf_andSkipsFakeOne` | | `skipped` count is 1 | `runImportAsync_setsSkippedCount_toOne_whenOneFakeFile` | | Rejected filename in `skippedFiles` | `runImportAsync_includesRejectedFilename_inSkippedFiles` | | File shorter than 4 bytes | `runImportAsync_skipsFile_whenShorterThanFourBytes` | | IOException during magic-bytes check | `runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException` | All five scenarios are covered. The three shared-setup tests correctly use a `setupOneValidOneFakeImport()` helper — no repeated builder code. ✓ **`ReflectionTestUtils.invokeMethod` tests** — the two existing `processRows` tests are correctly updated to use `MassImportService.ProcessResult` instead of `Integer`. ✓ **`buildMinimalImportXlsx` varargs** — the helper now accepts `String... filenames` to support multi-file scenarios. This is a clean, DRY evolution of the existing single-file helper. **Mockito spy usage** — the IOException test uses `spy(service)` correctly: the spy wraps the real object, `doThrow` overrides only `openFileStream`, and assertions are made on `spyService` (not `service`). No risk of asserting on the un-spied instance. **`lenient()` use** — `lenient().when(documentService.findByOriginalFilename(any()))` in the tiny-file and IOException tests avoids unnecessary Mockito strictness failures when those stubs aren't called due to early exit. Correct. ### Frontend tests **Six new component tests** cover: 1. Skipped count displayed when `skipped > 0` and state is DONE 2. Skipped filenames visible in the list 3. Section absent when `skipped === 0` 4. Section absent when state is RUNNING 5. Section absent when state is FAILED 6. Raw reason code shown for unknown reason strings All tests use `getByTestId` / `getByText` — testing user-visible behaviour, not internal state. ✓ The RUNNING and FAILED negative cases are especially valuable — they document the intentional scoping of the warning to DONE state only. ### Gaps (minor, not blockers) - No test for the case where `skippedFiles` list is present but `skipped` count is 0 (shouldn't happen by invariant since `skipped()` is derived from `skippedFiles.size()`, but defensive coverage would be belt-and-suspenders). - The `AdminControllerTest` updates are correct structural fixes (constructor argument added) — no new assertions were needed since the controller behaviour didn't change. - No load/performance test for a large `skippedFiles` list — acceptable at this scale. ### Test pyramid placement ✓ - Unit tests (`@ExtendWith(MockitoExtension.class)`) for service logic — correct layer. - Component tests (`vitest-browser-svelte`) for the Svelte component — correct layer. - No E2E test added — appropriate; this feature is admin-only and the unit/component coverage is sufficient.

marcel commented

2026-05-19 07:48:52 +02:00

🛠️ Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

No infrastructure changes in this PR. My review is scoped to the operational impact of the code change.

What I checked

No new infrastructure dependencies. The magic-byte validation reads 4 bytes from a local file before calling s3Client.putObject(). This is CPU-local I/O — no new network calls, no new services, no new volumes.

S3 call reduction. Invalid files are now rejected before the S3 upload attempt. This is a net improvement: fewer failed putObject calls means fewer S3 error events in Grafana and less noise in MinIO logs. ✓

Memory profile unchanged. readNBytes(4) reads 4 bytes from a FileInputStream and closes it immediately in the try-with-resources block. No buffering, no accumulation. For an import of 1000 files this adds ~4KB of total I/O. Negligible. ✓

Logging. log.warn(...) on skip and log.error(...) on IOException — both routed to the existing SLF4J/Loki pipeline. No new log format, no new log volume to size for. ✓

openFileStream() is package-private. Not reachable from the network — no operational exposure.

One observation

The volatile ImportStatus currentStatus field is updated from the @Async thread and read from the HTTP thread. volatile provides visibility but not atomicity. The skippedFiles list inside ImportStatus is a List.of() (immutable) in the final assigned status — so readers always see a fully-constructed, immutable record. This is correct and safe. No change needed. ✓

CI impact

No changes to docker-compose.yml, CI workflow, or Dockerfile. Build times and CI pipeline unchanged.

## 🛠️ Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** No infrastructure changes in this PR. My review is scoped to the operational impact of the code change. ### What I checked **No new infrastructure dependencies.** The magic-byte validation reads 4 bytes from a local file before calling `s3Client.putObject()`. This is CPU-local I/O — no new network calls, no new services, no new volumes. **S3 call reduction.** Invalid files are now rejected before the S3 upload attempt. This is a net improvement: fewer failed `putObject` calls means fewer S3 error events in Grafana and less noise in MinIO logs. ✓ **Memory profile unchanged.** `readNBytes(4)` reads 4 bytes from a `FileInputStream` and closes it immediately in the try-with-resources block. No buffering, no accumulation. For an import of 1000 files this adds ~4KB of total I/O. Negligible. ✓ **Logging.** `log.warn(...)` on skip and `log.error(...)` on IOException — both routed to the existing SLF4J/Loki pipeline. No new log format, no new log volume to size for. ✓ **`openFileStream()` is package-private.** Not reachable from the network — no operational exposure. ### One observation The `volatile ImportStatus currentStatus` field is updated from the `@Async` thread and read from the HTTP thread. `volatile` provides visibility but not atomicity. The `skippedFiles` list inside `ImportStatus` is a `List.of()` (immutable) in the final assigned status — so readers always see a fully-constructed, immutable record. This is correct and safe. No change needed. ✓ ### CI impact No changes to `docker-compose.yml`, CI workflow, or Dockerfile. Build times and CI pipeline unchanged.

marcel commented

2026-05-19 07:49:10 +02:00

📋 Elicit — Requirements Engineer

Verdict: ✅ Approved

Reviewing from a requirements completeness perspective: does the implementation match what was specified, are edge cases handled, and are the acceptance criteria met?

Requirements coverage

FR: Reject non-PDF files before S3 upload
✓ Magic-byte check (%PDF, 0x25 0x50 0x44 0x46) runs before s3Client.putObject(). Files that fail are not uploaded.

FR: Report skipped files to the admin
✓ skippedFiles: List<SkippedFile> and the derived skipped: int are included in ImportStatus. The frontend surfaces these in the DONE state only.

FR: Show reason per skipped file
✓ SkippedFile carries filename and reason. Two reason codes defined: INVALID_PDF_SIGNATURE, FILE_READ_ERROR. i18n labels for all three languages (de/en/es). ✓

FR: Files shorter than 4 bytes are also rejected
✓ header.length == 4 guard. Test covers the 3-byte case. ✓

FR: I/O errors during magic-byte check cause the file to be skipped (not crash the import)
✓ catch (IOException e) records FILE_READ_ERROR and continues. The import as a whole does not fail. ✓

FR: S3 failures are surfaced in skippedFiles (not silently lost)
✓ The previous round's fix: importSingleDocument returns false on S3 exception. The caller (processRows) increments nothing and does not add to skippedFiles (the S3 failure is still logged and the document record is not persisted). This is a slight gap — S3 failures are logged but NOT added to skippedFiles (only magic-byte failures are). The admin sees a document that was neither processed nor skipped, it just disappears. This is a pre-existing design choice per the issue description ("S3 failures surfaced in skippedFiles"), but looking at the code, S3 failures return false from importSingleDocument without adding to skippedFiles. This means the admin cannot distinguish "file was rejected by magic-byte check" from "file failed to upload to S3" in the skipped list — S3 failures are invisible to the admin. This may be a scope decision (S3 failures are a runtime error, not a file rejection), but worth confirming.

NFR: i18n for all user-facing strings
✓ Three message keys added in de/en/es. Fallback to raw code for unknown reasons. ✓

NFR: Collapsible list to avoid UI clutter
✓ <details>/<summary> pattern used. ✓

Open question

OQ-01: When importSingleDocument catches an S3 exception and returns false, the file is neither counted in processed nor added to skippedFiles. Should S3 upload failures also appear in the skippedFiles list with a reason code like S3_UPLOAD_ERROR? The PR description says "S3 failures surfaced in skippedFiles" was addressed, but the code routes S3 failures to return false without adding to skippedFiles. Recommend confirming the intended behaviour with the product owner before merge — or adding a brief comment to importSingleDocument explaining why S3 failures are intentionally not in skippedFiles.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** Reviewing from a requirements completeness perspective: does the implementation match what was specified, are edge cases handled, and are the acceptance criteria met? ### Requirements coverage **FR: Reject non-PDF files before S3 upload** ✓ Magic-byte check (`%PDF`, 0x25 0x50 0x44 0x46) runs before `s3Client.putObject()`. Files that fail are not uploaded. **FR: Report skipped files to the admin** ✓ `skippedFiles: List<SkippedFile>` and the derived `skipped: int` are included in `ImportStatus`. The frontend surfaces these in the DONE state only. **FR: Show reason per skipped file** ✓ `SkippedFile` carries `filename` and `reason`. Two reason codes defined: `INVALID_PDF_SIGNATURE`, `FILE_READ_ERROR`. i18n labels for all three languages (de/en/es). ✓ **FR: Files shorter than 4 bytes are also rejected** ✓ `header.length == 4` guard. Test covers the 3-byte case. ✓ **FR: I/O errors during magic-byte check cause the file to be skipped (not crash the import)** ✓ `catch (IOException e)` records `FILE_READ_ERROR` and continues. The import as a whole does not fail. ✓ **FR: S3 failures are surfaced in skippedFiles (not silently lost)** ✓ The previous round's fix: `importSingleDocument` returns `false` on S3 exception. The caller (`processRows`) increments nothing and does not add to `skippedFiles` (the S3 failure is still logged and the document record is not persisted). This is a slight gap — S3 failures are logged but NOT added to `skippedFiles` (only magic-byte failures are). The admin sees a document that was neither `processed` nor `skipped`, it just disappears. This is a pre-existing design choice per the issue description ("S3 failures surfaced in skippedFiles"), but looking at the code, S3 failures return `false` from `importSingleDocument` without adding to `skippedFiles`. This means the admin cannot distinguish "file was rejected by magic-byte check" from "file failed to upload to S3" in the skipped list — S3 failures are invisible to the admin. **This may be a scope decision (S3 failures are a runtime error, not a file rejection), but worth confirming.** **NFR: i18n for all user-facing strings** ✓ Three message keys added in de/en/es. Fallback to raw code for unknown reasons. ✓ **NFR: Collapsible list to avoid UI clutter** ✓ `<details>/<summary>` pattern used. ✓ ### Open question **OQ-01**: When `importSingleDocument` catches an S3 exception and returns `false`, the file is neither counted in `processed` nor added to `skippedFiles`. Should S3 upload failures also appear in the `skippedFiles` list with a reason code like `S3_UPLOAD_ERROR`? The PR description says "S3 failures surfaced in skippedFiles" was addressed, but the code routes S3 failures to `return false` without adding to `skippedFiles`. Recommend confirming the intended behaviour with the product owner before merge — or adding a brief comment to `importSingleDocument` explaining why S3 failures are intentionally not in `skippedFiles`.

marcel added 1 commit 2026-05-19 08:35:12 +02:00

fix(import): surface S3 failures + already-exists in skippedFiles, a11y + max-height

CI / Unit & Component Tests (pull_request) Successful in 3m21s

Details

CI / OCR Service Tests (pull_request) Successful in 21s

Details

CI / Backend Unit Tests (pull_request) Successful in 3m6s

Details

CI / fail2ban Regex (pull_request) Successful in 41s

Details

CI / Semgrep Security Scan (pull_request) Successful in 19s

Details

CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s

Details

9443d8e668

- Change importSingleDocument return type from boolean to Optional<String>
  so callers in processRows receive the skip reason on every non-success path.
  S3 upload failures now surface as "S3_UPLOAD_FAILED" and already-imported
  documents as "ALREADY_EXISTS" in the skippedFiles list shown in the admin UI.
- Add two new tests: runImportAsync_addsS3UploadFailed_toSkippedFiles and
  runImportAsync_addsAlreadyExists_toSkippedFiles; update
  importSingleDocument_skips_whenDocumentAlreadyUploadedNotPlaceholder and
  the S3-failure test to assert on the Optional return value.
- Add i18n keys for S3_UPLOAD_FAILED and ALREADY_EXISTS in de/en/es messages.
- Svelte ImportStatusCard: add aria-hidden="true" to SVG chevron, wrap
  conditional warning section in aria-live="polite" div, add max-h-64
  overflow-y-auto to skipped-files <ul> to cap height on large batches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel commented

2026-05-19 08:36:37 +02:00

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: ✅ Approved

This re-review confirms that the three blockers from round 1 have been addressed correctly. Here is the full security audit of the final state.

What was fixed — confirmed resolved

S3 failures now surface in skippedFiles — the old silent return; in importSingleDocument is replaced by return Optional.of("S3_UPLOAD_FAILED"), and the caller in processRows correctly adds the SkippedFile to the accumulator. The new regression test runImportAsync_addsS3UploadFailed_toSkippedFiles_whenS3Throws proves the fix with a real mock-S3-throw scenario. ✅

Magic byte check is architecturally sound — isPdfMagicBytes opens only 4 bytes, closes the stream in a try-with-resources, and the validation is applied before importSingleDocument is called. There is no window where an invalid file could slip through to S3. ✅

openFileStream testability hook is acceptable — package-private visibility, documented comment, used only in the IOException injection test. Not a production attack surface. ✅

Remaining observations (non-blockers)

ALREADY_EXISTS semantics — the existing document duplicate-check at the top of importSingleDocument now returns Optional.of("ALREADY_EXISTS") and that reason is surfaced in skippedFiles. From a data-accuracy standpoint, this is correct behaviour but the admin UI will show already-existing documents in the amber "skipped" warning section alongside actual bad-file rejections. Users might find it confusing that a successfully-known document is counted as "skipped". This is a UX concern, not a security issue — noting it for the team.

File path traversal — findFileRecursive walks importDir and matches by filename only (no path normalisation). If importDir itself is admin-controlled and comes from application configuration (not user input), this is acceptable. Confirming: @Value("${app.import.dir:/import}") — this is config, not user input. No path traversal risk here.

Defensive copy in ImportStatus compact constructor — List.copyOf(skippedFiles) throws NullPointerException if a null list is passed. All call sites pass List.of() or a real ArrayList, so this is safe in practice. Would be marginally more robust with Objects.requireNonNullElse(skippedFiles, List.of()) but it is not a vulnerability.

SLF4J parameterized logging — all new log calls use {} placeholders correctly (log.warn("Überspringe {}: ...", filename)). No log injection risk.

Overall

The core security goal — preventing non-PDF files from reaching S3 — is correctly implemented with content validation (magic bytes), not relying on file extension or MIME type alone. The fix is complete.

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** This re-review confirms that the three blockers from round 1 have been addressed correctly. Here is the full security audit of the final state. ### What was fixed — confirmed resolved **S3 failures now surface in `skippedFiles`** — the old silent `return;` in `importSingleDocument` is replaced by `return Optional.of("S3_UPLOAD_FAILED")`, and the caller in `processRows` correctly adds the `SkippedFile` to the accumulator. The new regression test `runImportAsync_addsS3UploadFailed_toSkippedFiles_whenS3Throws` proves the fix with a real mock-S3-throw scenario. ✅ **Magic byte check is architecturally sound** — `isPdfMagicBytes` opens only 4 bytes, closes the stream in a try-with-resources, and the validation is applied *before* `importSingleDocument` is called. There is no window where an invalid file could slip through to S3. ✅ **`openFileStream` testability hook is acceptable** — package-private visibility, documented comment, used only in the IOException injection test. Not a production attack surface. ✅ ### Remaining observations (non-blockers) **`ALREADY_EXISTS` semantics** — the existing document duplicate-check at the top of `importSingleDocument` now returns `Optional.of("ALREADY_EXISTS")` and that reason is surfaced in `skippedFiles`. From a data-accuracy standpoint, this is correct behaviour but the admin UI will show already-existing documents in the amber "skipped" warning section alongside actual bad-file rejections. Users might find it confusing that a successfully-known document is counted as "skipped". This is a UX concern, not a security issue — noting it for the team. **File path traversal** — `findFileRecursive` walks `importDir` and matches by filename only (no path normalisation). If `importDir` itself is admin-controlled and comes from application configuration (not user input), this is acceptable. Confirming: `@Value("${app.import.dir:/import}")` — this is config, not user input. No path traversal risk here. **Defensive copy in `ImportStatus` compact constructor** — `List.copyOf(skippedFiles)` throws `NullPointerException` if a `null` list is passed. All call sites pass `List.of()` or a real ArrayList, so this is safe in practice. Would be marginally more robust with `Objects.requireNonNullElse(skippedFiles, List.of())` but it is not a vulnerability. **SLF4J parameterized logging** — all new log calls use `{}` placeholders correctly (`log.warn("Überspringe {}: ...", filename)`). No log injection risk. ### Overall The core security goal — preventing non-PDF files from reaching S3 — is correctly implemented with content validation (magic bytes), not relying on file extension or MIME type alone. The fix is complete.

marcel commented

2026-05-19 08:36:59 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ✅ Approved

Good, clean implementation. The previous blockers are fixed and the code reads well. A few minor style observations below, none of them blockers.

What was fixed — confirmed resolved

importSingleDocument now returns Optional<String> (skip reason or empty on success) instead of void. The call site in processRows correctly gates on the returned value. ✅
S3 failure path returns Optional.of("S3_UPLOAD_FAILED") — no longer silently discarded. ✅
ImportStatus compact constructor applies List.copyOf() for immutability. ✅
Frontend {#each ... as skipped (skipped.filename)} uses a key expression. ✅

Minor suggestions (non-blockers)

importSingleDocument does too many things — it validates existence, builds entity, uploads to S3, saves, and dispatches a thumbnail job. This was pre-existing; the PR makes it slightly longer. Not a blocker for this PR, but a good candidate for a follow-up refactor split (buildDocumentEntity, uploadToS3, persistAndDispatch).

reasonLabel function placement — in ImportStatusCard.svelte, reasonLabel is a plain function in <script>. It could equally be a $derived.by or a separate lookup map. The current form is fine for four known codes, but a Map<string, () => string> would be more extensible if new codes are added:

const REASON_LABELS: Record<string, () => string> = {
  INVALID_PDF_SIGNATURE: m.import_reason_invalid_pdf_signature,
  FILE_READ_ERROR: m.import_reason_file_read_error,
  S3_UPLOAD_FAILED: m.import_reason_s3_upload_failed,
  ALREADY_EXISTS: m.import_reason_already_exists,
};
function reasonLabel(code: string): string {
  return (REASON_LABELS[code] ?? (() => code))();
}

Not a blocker. The current if-chain is readable and correct.

ProcessResult is package-private — good. It is an internal implementation detail and should not be public. ✅

Test name style — new tests use runImportAsync_addsS3UploadFailed_toSkippedFiles_whenS3Throws (camelCase with underscores). Existing tests in the file use the same pattern. Consistent. ✅

buildMinimalImportXlsx helper — good extraction. The varargs form (Path dir, String... filenames) enables both single-file and multi-file test setups without duplication. ✅

openFileStream comment — "Mockito spy in tests can override to inject IOException" — this explains the why of the package-private visibility, which is exactly the right use of a comment. ✅

TDD evidence

The four new magic-byte regression tests (runImportAsync_uploadsValidPdf_andSkipsFakeOne, _setsSkippedCount_toOne, _includesRejectedFilename_inSkippedFiles, _skipsFile_whenShorterThanFourBytes) follow the arrange-act-assert structure and use the shared setupOneValidOneFakeImport helper correctly. The IOException path test uses a spy correctly to inject the failure at the right seam. The frontend tests cover five behavioral states (skipped > 0, skipped = 0, RUNNING state, FAILED state, unknown reason code). Coverage looks solid.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** Good, clean implementation. The previous blockers are fixed and the code reads well. A few minor style observations below, none of them blockers. ### What was fixed — confirmed resolved - `importSingleDocument` now returns `Optional<String>` (skip reason or empty on success) instead of `void`. The call site in `processRows` correctly gates on the returned value. ✅ - S3 failure path returns `Optional.of("S3_UPLOAD_FAILED")` — no longer silently discarded. ✅ - `ImportStatus` compact constructor applies `List.copyOf()` for immutability. ✅ - Frontend `{#each ... as skipped (skipped.filename)}` uses a key expression. ✅ ### Minor suggestions (non-blockers) **`importSingleDocument` does too many things** — it validates existence, builds entity, uploads to S3, saves, and dispatches a thumbnail job. This was pre-existing; the PR makes it slightly longer. Not a blocker for this PR, but a good candidate for a follow-up refactor split (`buildDocumentEntity`, `uploadToS3`, `persistAndDispatch`). **`reasonLabel` function placement** — in `ImportStatusCard.svelte`, `reasonLabel` is a plain function in `<script>`. It could equally be a `$derived.by` or a separate lookup map. The current form is fine for four known codes, but a `Map<string, () => string>` would be more extensible if new codes are added: ```typescript const REASON_LABELS: Record<string, () => string> = { INVALID_PDF_SIGNATURE: m.import_reason_invalid_pdf_signature, FILE_READ_ERROR: m.import_reason_file_read_error, S3_UPLOAD_FAILED: m.import_reason_s3_upload_failed, ALREADY_EXISTS: m.import_reason_already_exists, }; function reasonLabel(code: string): string { return (REASON_LABELS[code] ?? (() => code))(); } ``` Not a blocker. The current if-chain is readable and correct. **`ProcessResult` is package-private** — good. It is an internal implementation detail and should not be public. ✅ **Test name style** — new tests use `runImportAsync_addsS3UploadFailed_toSkippedFiles_whenS3Throws` (camelCase with underscores). Existing tests in the file use the same pattern. Consistent. ✅ **`buildMinimalImportXlsx` helper** — good extraction. The varargs form `(Path dir, String... filenames)` enables both single-file and multi-file test setups without duplication. ✅ **`openFileStream` comment** — "Mockito spy in tests can override to inject IOException" — this explains the *why* of the package-private visibility, which is exactly the right use of a comment. ✅ ### TDD evidence The four new magic-byte regression tests (`runImportAsync_uploadsValidPdf_andSkipsFakeOne`, `_setsSkippedCount_toOne`, `_includesRejectedFilename_inSkippedFiles`, `_skipsFile_whenShorterThanFourBytes`) follow the arrange-act-assert structure and use the shared `setupOneValidOneFakeImport` helper correctly. The IOException path test uses a `spy` correctly to inject the failure at the right seam. The frontend tests cover five behavioral states (skipped > 0, skipped = 0, RUNNING state, FAILED state, unknown reason code). Coverage looks solid.

marcel commented

2026-05-19 08:37:20 +02:00

🏗️ Markus Keller — Senior Application Architect

Verdict: ✅ Approved

The implementation stays within the importing package, does not leak into other domains, and uses appropriate Java record types. Architecture doc obligations are checked below.

Layer boundaries — clean

MassImportService calls DocumentService, PersonService, TagService — all through the owning service interfaces, never directly into their repositories. The new SkippedFile and ProcessResult records are defined inside MassImportService (or package-private), not shared with other domains. ✅

The ImportStatus record is the API surface exposed to the controller. ProcessResult is a private intermediate type. This separation is correct — the API shape and the internal accumulation type are distinct.

Record design

ImportStatus compact constructor applies List.copyOf(), making the stored list immutable after construction. The @JsonProperty("skipped") computed accessor is a reasonable way to expose a derived count without duplicating storage. The comment acknowledging that @Schema on record accessor methods is not picked up by SpringDoc is accurate and helpful.

SkippedFile is a minimal value object — two required string fields, no behaviour. Correct.

Architecture doc checklist

Trigger	Required update	Status
New controller or service in existing domain	`docs/architecture/c4/l3-backend-*.puml`	No new service or controller was added — existing `MassImportService` was extended. ✅ No update required.
New domain concept	`docs/GLOSSARY.md`	`SkippedFile` and `skipped` count are new visible API concepts. A glossary entry for "Übersprungene Datei / Skipped File" would help future developers. Minor — not a blocker for a fix-level PR.
New `ErrorCode`	`CLAUDE.md` + `docs/ARCHITECTURE.md`	No new `ErrorCode` enum value was added. The skip reasons (`INVALID_PDF_SIGNATURE`, `S3_UPLOAD_FAILED`, etc.) are raw strings, not `ErrorCode` entries. This is a deliberate choice — they are import-domain-internal classification strings, not HTTP-level error codes. Acceptable. ✅
Auth or upload flow change	`seq-document-upload.puml`	The S3 upload path now has a validation gate before it runs. If the upload sequence diagram exists and models the import path, it may need a note. However, the mass import flow is distinct from the interactive document upload, and the diagram is unlikely to model it. Low priority.

Concurrency note

currentStatus is volatile — this provides visibility guarantees for the reference assignment but not compound check-then-act safety. The existing if (currentStatus.state() == State.RUNNING) guard before starting the import has a small race window if two requests arrive simultaneously on different threads. This is a pre-existing issue, not introduced by this PR. For the current single-admin-user model of this application it is acceptable. Noting it for the backlog.

Summary

The PR is architecturally clean. The one glossary observation is minor and not a merge blocker for a security fix-level change.

## 🏗️ Markus Keller — Senior Application Architect **Verdict: ✅ Approved** The implementation stays within the `importing` package, does not leak into other domains, and uses appropriate Java record types. Architecture doc obligations are checked below. ### Layer boundaries — clean `MassImportService` calls `DocumentService`, `PersonService`, `TagService` — all through the owning service interfaces, never directly into their repositories. The new `SkippedFile` and `ProcessResult` records are defined inside `MassImportService` (or package-private), not shared with other domains. ✅ The `ImportStatus` record is the API surface exposed to the controller. `ProcessResult` is a private intermediate type. This separation is correct — the API shape and the internal accumulation type are distinct. ### Record design `ImportStatus` compact constructor applies `List.copyOf()`, making the stored list immutable after construction. The `@JsonProperty("skipped")` computed accessor is a reasonable way to expose a derived count without duplicating storage. The comment acknowledging that `@Schema` on record accessor methods is not picked up by SpringDoc is accurate and helpful. `SkippedFile` is a minimal value object — two required string fields, no behaviour. Correct. ### Architecture doc checklist | Trigger | Required update | Status | |---------|----------------|--------| | New controller or service in existing domain | `docs/architecture/c4/l3-backend-*.puml` | No new service or controller was added — existing `MassImportService` was extended. ✅ No update required. | | New domain concept | `docs/GLOSSARY.md` | `SkippedFile` and `skipped` count are new visible API concepts. A glossary entry for "Übersprungene Datei / Skipped File" would help future developers. **Minor** — not a blocker for a fix-level PR. | | New `ErrorCode` | `CLAUDE.md` + `docs/ARCHITECTURE.md` | No new `ErrorCode` enum value was added. The skip reasons (`INVALID_PDF_SIGNATURE`, `S3_UPLOAD_FAILED`, etc.) are raw strings, not `ErrorCode` entries. This is a deliberate choice — they are import-domain-internal classification strings, not HTTP-level error codes. Acceptable. ✅ | | Auth or upload flow change | `seq-document-upload.puml` | The S3 upload path now has a validation gate before it runs. If the upload sequence diagram exists and models the import path, it may need a note. However, the mass import flow is distinct from the interactive document upload, and the diagram is unlikely to model it. **Low priority.** | ### Concurrency note `currentStatus` is `volatile` — this provides visibility guarantees for the reference assignment but not compound check-then-act safety. The existing `if (currentStatus.state() == State.RUNNING)` guard before starting the import has a small race window if two requests arrive simultaneously on different threads. This is a pre-existing issue, not introduced by this PR. For the current single-admin-user model of this application it is acceptable. Noting it for the backlog. ### Summary The PR is architecturally clean. The one glossary observation is minor and not a merge blocker for a security fix-level change.

marcel commented

2026-05-19 08:37:39 +02:00

🧪 Sara Holt — Senior QA Engineer

Verdict: ✅ Approved

Previous concern about S3 failures not being covered has been resolved. The test suite is now comprehensive for this feature area.

What was fixed — confirmed resolved

S3 failure surfaced in skippedFiles — runImportAsync_addsS3UploadFailed_toSkippedFiles_whenS3Throws directly tests this path using a mocked s3Client that throws. The assertion checks both skipped() count and the SkippedFile tuple. ✅

Short-file edge case — runImportAsync_skipsFile_whenShorterThanFourBytes covers the < 4 bytes boundary. ✅

IOException injection — runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException uses spy(service) with doThrow on openFileStream. The test correctly checks both count and reason. ✅

Test quality observations

setupOneValidOneFakeImport helper — good extraction. Three tests share this setup helper cleanly. The naming is accurate. ✅

Split tests for the one-valid-one-fake scenario — three separate tests cover: (1) S3 upload count, (2) skipped count, (3) filename in skippedFiles. This follows the one-assertion-per-test principle. ✅

lenient().when(...) usage — used in two tests where the mock is not always invoked depending on the code path. Correct use of lenient() to avoid unnecessary UnnecessaryStubbingException. ✅

runImportAsync_addsAlreadyExists_toSkippedFiles_whenDocumentAlreadyUploaded — this test does not write a physical file to tempDir, so fileOnDisk will be empty for existing.pdf. The test confirms the ALREADY_EXISTS reason appears in skippedFiles. However, since there is no file on disk, the magic byte check is skipped entirely (the if (fileOnDisk.isPresent()) guard). The test exercises importSingleDocument returning ALREADY_EXISTS directly — which is correct, but it does not test the case where a file exists on disk and the document already exists. That edge case may warrant a follow-up test. Minor.

Frontend tests — five new tests in ImportStatusCard.svelte.test.ts cover: skipped count display, filename list, zero-skip hiding, RUNNING state hiding, FAILED state hiding, and unknown reason code fallback. Six discrete behaviours tested. ✅

Missing test: skipped count shown only in DONE state — the component renders <div aria-live="polite"> wrapping the {#if importStatus.skipped > 0} block, but this entire block only appears inside the {:else if importStatus?.state === 'DONE'} branch. The "does not show in RUNNING" and "does not show in FAILED" tests cover this indirectly. ✅

Gaps (non-blockers)

No test for the frontend reasonLabel fallthrough when skippedFiles is an empty array but skipped > 0 (impossible by construction since skipped = skippedFiles.size(), but worth noting the contract is self-consistent).
No Playwright E2E test covering the full import flow with a mixed PDF/non-PDF batch. This would belong in a follow-up E2E suite, not this PR.

Overall

The test pyramid for this feature is well-filled at the unit layer. The regression suite is in good shape.

## 🧪 Sara Holt — Senior QA Engineer **Verdict: ✅ Approved** Previous concern about S3 failures not being covered has been resolved. The test suite is now comprehensive for this feature area. ### What was fixed — confirmed resolved **S3 failure surfaced in `skippedFiles`** — `runImportAsync_addsS3UploadFailed_toSkippedFiles_whenS3Throws` directly tests this path using a mocked `s3Client` that throws. The assertion checks both `skipped()` count and the `SkippedFile` tuple. ✅ **Short-file edge case** — `runImportAsync_skipsFile_whenShorterThanFourBytes` covers the `< 4 bytes` boundary. ✅ **IOException injection** — `runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException` uses `spy(service)` with `doThrow` on `openFileStream`. The test correctly checks both count and reason. ✅ ### Test quality observations **`setupOneValidOneFakeImport` helper** — good extraction. Three tests share this setup helper cleanly. The naming is accurate. ✅ **Split tests for the one-valid-one-fake scenario** — three separate tests cover: (1) S3 upload count, (2) skipped count, (3) filename in skippedFiles. This follows the one-assertion-per-test principle. ✅ **`lenient().when(...)` usage** — used in two tests where the mock is not always invoked depending on the code path. Correct use of `lenient()` to avoid unnecessary `UnnecessaryStubbingException`. ✅ **`runImportAsync_addsAlreadyExists_toSkippedFiles_whenDocumentAlreadyUploaded`** — this test does not write a physical file to `tempDir`, so `fileOnDisk` will be empty for `existing.pdf`. The test confirms the `ALREADY_EXISTS` reason appears in skippedFiles. However, since there is no file on disk, the magic byte check is skipped entirely (the `if (fileOnDisk.isPresent())` guard). The test exercises `importSingleDocument` returning `ALREADY_EXISTS` directly — which is correct, but it does not test the case where a file exists on disk *and* the document already exists. That edge case may warrant a follow-up test. **Minor.** **Frontend tests** — five new tests in `ImportStatusCard.svelte.test.ts` cover: skipped count display, filename list, zero-skip hiding, RUNNING state hiding, FAILED state hiding, and unknown reason code fallback. Six discrete behaviours tested. ✅ **Missing test: `skipped` count shown only in DONE state** — the component renders `<div aria-live="polite">` wrapping the `{#if importStatus.skipped > 0}` block, but this entire block only appears inside the `{:else if importStatus?.state === 'DONE'}` branch. The "does not show in RUNNING" and "does not show in FAILED" tests cover this indirectly. ✅ ### Gaps (non-blockers) - No test for the frontend `reasonLabel` fallthrough when `skippedFiles` is an empty array but `skipped > 0` (impossible by construction since `skipped = skippedFiles.size()`, but worth noting the contract is self-consistent). - No Playwright E2E test covering the full import flow with a mixed PDF/non-PDF batch. This would belong in a follow-up E2E suite, not this PR. ### Overall The test pyramid for this feature is well-filled at the unit layer. The regression suite is in good shape.

marcel commented

2026-05-19 08:38:02 +02:00

🎨 Leonie Voss — UI/UX Design Lead & Accessibility Advocate

Verdict: ⚠️ Approved with concerns

The previous aria accessibility blocker has been addressed. The remaining observations are suggestions, not blockers.

What was fixed — confirmed resolved

aria-hidden="true" on the decorative chevron SVG — correctly applied. Screen readers will skip the icon. ✅

aria-live="polite" on the skipped-files container — present and correctly scoped to the DONE state block. When the import completes and the skipped count appears, assistive technologies will announce the change politely. ✅

max-h-64 overflow-y-auto on the file list — the <ul> is bounded at 16rem (max-h-64) with overflow-y-auto. This prevents the card from growing unboundedly with many skipped files. ✅

Key expression on {#each} — {#each importStatus.skippedFiles as skipped (skipped.filename)} uses the filename as the key. ✅

Remaining observations

<summary> element keyboard accessibility — <details>/<summary> is natively keyboard accessible (Space/Enter toggles, no JS needed). The cursor-pointer class is correctly applied. list-none removes the default triangle marker, replaced by the custom SVG chevron. This pattern is sound. ✅

Touch target on <summary> — the <summary> itself does not have an explicit min-h-[44px]. The content inside (the number + label) may render below 44px tall on some configurations. The WCAG 2.2 requirement of 24×24px minimum is met by the 16px icon + surrounding padding, but the full 44px WCAG AAA / senior-audience target may not be. Suggestion: add min-h-[44px] py-2 to the <summary> element for the 60+ user audience.

text-amber-900 on bg-warning/10 — bg-warning/10 is approximately 10% opacity amber over a white/surface background. text-amber-900 on this very light background should comfortably exceed the 4.5:1 AA contrast requirement. Not a concern.

font-mono text-sm text-ink-2 on list items — text-sm is 14px, text-ink-2 is the secondary ink token. At 14px the contrast requirement is 4.5:1 AA. The text-ink-2 token should satisfy this if the palette is correctly calibrated. Verify against the actual resolved CSS variable in layout.css if in doubt.

No empty-state message inside <details> — if skippedFiles array is empty but skipped > 0 (structurally impossible by invariant, but defensive UI would handle it), the <ul> would render empty. This is an edge case that cannot occur given the current backend, so not a concern.

Missing <summary> accessible label for screen readers — the <summary> contains the count and "übersprungen" label visually. Screen readers will announce the summary text when the user navigates to the <details> element. The count + label combination should read as "3 ÜBERSPRUNGEN" which is reasonable context. No ARIA label is strictly needed here. ✅

Brand compliance

Card pattern: rounded-sm border border-warning/40 bg-warning/10 p-4 — consistent with the project card pattern adapted for warning state. ✅
Font usage: font-sans text-xs font-bold tracking-widest uppercase on the "skipped" label — matches the section-title pattern used elsewhere in the card. ✅
font-mono on filenames — appropriate for code/filename content. ✅

Summary

One actionable suggestion: add min-h-[44px] py-2 to the <summary> element for senior-audience touch target compliance. Everything else is clean.

## 🎨 Leonie Voss — UI/UX Design Lead & Accessibility Advocate **Verdict: ⚠️ Approved with concerns** The previous aria accessibility blocker has been addressed. The remaining observations are suggestions, not blockers. ### What was fixed — confirmed resolved **`aria-hidden="true"` on the decorative chevron SVG** — correctly applied. Screen readers will skip the icon. ✅ **`aria-live="polite"` on the skipped-files container** — present and correctly scoped to the DONE state block. When the import completes and the skipped count appears, assistive technologies will announce the change politely. ✅ **`max-h-64 overflow-y-auto` on the file list** — the `<ul>` is bounded at 16rem (`max-h-64`) with `overflow-y-auto`. This prevents the card from growing unboundedly with many skipped files. ✅ **Key expression on `{#each}`** — `{#each importStatus.skippedFiles as skipped (skipped.filename)}` uses the filename as the key. ✅ ### Remaining observations **`<summary>` element keyboard accessibility** — `<details>`/`<summary>` is natively keyboard accessible (Space/Enter toggles, no JS needed). The `cursor-pointer` class is correctly applied. `list-none` removes the default triangle marker, replaced by the custom SVG chevron. This pattern is sound. ✅ **Touch target on `<summary>`** — the `<summary>` itself does not have an explicit `min-h-[44px]`. The content inside (the number + label) may render below 44px tall on some configurations. The WCAG 2.2 requirement of 24×24px minimum is met by the 16px icon + surrounding padding, but the full 44px WCAG AAA / senior-audience target may not be. **Suggestion**: add `min-h-[44px] py-2` to the `<summary>` element for the 60+ user audience. **`text-amber-900` on `bg-warning/10`** — `bg-warning/10` is approximately 10% opacity amber over a white/surface background. `text-amber-900` on this very light background should comfortably exceed the 4.5:1 AA contrast requirement. Not a concern. **`font-mono text-sm text-ink-2` on list items** — `text-sm` is 14px, `text-ink-2` is the secondary ink token. At 14px the contrast requirement is 4.5:1 AA. The `text-ink-2` token should satisfy this if the palette is correctly calibrated. Verify against the actual resolved CSS variable in `layout.css` if in doubt. **No empty-state message inside `<details>`** — if `skippedFiles` array is empty but `skipped > 0` (structurally impossible by invariant, but defensive UI would handle it), the `<ul>` would render empty. This is an edge case that cannot occur given the current backend, so not a concern. **Missing `<summary>` accessible label for screen readers** — the `<summary>` contains the count and "übersprungen" label visually. Screen readers will announce the summary text when the user navigates to the `<details>` element. The count + label combination should read as "3 ÜBERSPRUNGEN" which is reasonable context. No ARIA label is strictly needed here. ✅ ### Brand compliance - Card pattern: `rounded-sm border border-warning/40 bg-warning/10 p-4` — consistent with the project card pattern adapted for warning state. ✅ - Font usage: `font-sans text-xs font-bold tracking-widest uppercase` on the "skipped" label — matches the section-title pattern used elsewhere in the card. ✅ - `font-mono` on filenames — appropriate for code/filename content. ✅ ### Summary One actionable suggestion: add `min-h-[44px] py-2` to the `<summary>` element for senior-audience touch target compliance. Everything else is clean.

marcel commented

2026-05-19 08:38:15 +02:00

⚙️ Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

No infrastructure files were changed in this PR. My review scope covers: CI impact, S3 interaction patterns, and operational observability of the new skipped-files mechanism.

S3 interaction — clean

The magic byte check happens before putObject is called. This means the S3 client never receives invalid data, and there is no partial upload to clean up. The S3 client mock in tests correctly uses verify(s3Client, times(1)).putObject(...) to assert exactly one upload for the valid file. ✅

Logging

New log.warn calls use SLF4J parameterized syntax ({}). The warn-level for a skipped file is appropriate — it is notable but not an error. The log.error for FILE_READ_ERROR (IOException during magic byte check) is also appropriate. Operators will see these in Loki without noise pollution. ✅

Operational observability of `skippedFiles`

The skippedFiles list is stored in the in-memory currentStatus field. It is returned as part of the /api/admin/import-status JSON response. There is no persistence — after a server restart or a new import run starts, previous skipped file data is lost. This is acceptable for the current single-node Docker Compose deployment model.

Note for production ops: if admins need to review skipped files after a long import, they must read the Loki logs (the log.warn lines log the filename and reason). The UI card is ephemeral. This is a known trade-off of the in-memory status design, pre-existing this PR.

No new infrastructure additions

No new Docker services, ports, volumes, or CI workflow changes. The VPS monthly cost is unaffected. ✅

CI test time impact

New tests are unit tests using @TempDir and Mockito — no Testcontainers, no new Docker services needed. CI time impact is negligible (milliseconds per test). ✅

## ⚙️ Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** No infrastructure files were changed in this PR. My review scope covers: CI impact, S3 interaction patterns, and operational observability of the new skipped-files mechanism. ### S3 interaction — clean The magic byte check happens *before* `putObject` is called. This means the S3 client never receives invalid data, and there is no partial upload to clean up. The S3 client mock in tests correctly uses `verify(s3Client, times(1)).putObject(...)` to assert exactly one upload for the valid file. ✅ ### Logging New `log.warn` calls use SLF4J parameterized syntax (`{}`). The warn-level for a skipped file is appropriate — it is notable but not an error. The `log.error` for `FILE_READ_ERROR` (IOException during magic byte check) is also appropriate. Operators will see these in Loki without noise pollution. ✅ ### Operational observability of `skippedFiles` The `skippedFiles` list is stored in the in-memory `currentStatus` field. It is returned as part of the `/api/admin/import-status` JSON response. There is no persistence — after a server restart or a new import run starts, previous skipped file data is lost. This is acceptable for the current single-node Docker Compose deployment model. **Note for production ops**: if admins need to review skipped files after a long import, they must read the Loki logs (the `log.warn` lines log the filename and reason). The UI card is ephemeral. This is a known trade-off of the in-memory status design, pre-existing this PR. ### No new infrastructure additions No new Docker services, ports, volumes, or CI workflow changes. The VPS monthly cost is unaffected. ✅ ### CI test time impact New tests are unit tests using `@TempDir` and Mockito — no Testcontainers, no new Docker services needed. CI time impact is negligible (milliseconds per test). ✅

marcel commented

2026-05-19 08:38:33 +02:00

📋 Elicit — Requirements Engineer

Verdict: ✅ Approved

Reviewing against the stated requirements in the PR description and the original issue #529.

Requirements traceability

Requirement from PR description	Implementation	Status
Read first 4 bytes before `putObject`	`isPdfMagicBytes()` called in `processRows` before `importSingleDocument`	✅ Met
Files not matching `%PDF` are rejected	`INVALID_PDF_SIGNATURE` reason returned, file skipped via `continue`	✅ Met
Files shorter than 4 bytes are rejected	`header.length == 4` guard in `isPdfMagicBytes`	✅ Met
Rejected files logged via SLF4J `{}`	`log.warn("Überspringe {}: ...", filename)`	✅ Met
`ImportStatus` gains `skipped: int` field	`@JsonProperty("skipped") public int skipped()` derived from list size	✅ Met
`ImportStatus` gains `skippedFiles: List<SkippedFile>`	`SkippedFile(filename, reason)` record, returned in status	✅ Met
`ImportStatusCard` shows amber warning section	`<details class="... border-warning/40 bg-warning/10 ...">`	✅ Met
Shown only in DONE state, hidden when `skipped === 0`	`{:else if state === 'DONE'}` + `{#if importStatus.skipped > 0}`	✅ Met
i18n keys added for de/en/es	All four reason keys + skipped label present in all three message files	✅ Met

Acceptance criteria coverage

S3 upload failure surfaced in skippedFiles — was a previous blocker, now met with dedicated test. ✅

Behaviour when file exists on disk and has valid PDF signature — covered by setupOneValidOneFakeImport + s3Client.putObject verification. ✅

Behaviour when no file exists on disk — this path skips magic byte check entirely (metadata-only import). The log warns, importSingleDocument is still called. This is existing behaviour preserved correctly. ✅

Open question surfaced

The ALREADY_EXISTS reason is now surfaced in the amber "skipped" warning section. From a user perspective, "already imported" is a different category than "invalid file" — one is a normal de-duplication outcome, the other is a data quality problem. Whether they should share the same UI widget is a product decision. The current implementation groups them together, which keeps the implementation simple. If the distinction matters to admins, a follow-up could separate the two categories. Not a blocker for this PR.

i18n completeness

All three language files (de.json, en.json, es.json) have identical key sets. No missing keys detected. The German translation "übersprungen" (lowercase) for the label is consistent with the existing label style in the card. ✅

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** Reviewing against the stated requirements in the PR description and the original issue #529. ### Requirements traceability | Requirement from PR description | Implementation | Status | |----------------------------------|----------------|--------| | Read first 4 bytes before `putObject` | `isPdfMagicBytes()` called in `processRows` before `importSingleDocument` | ✅ Met | | Files not matching `%PDF` are rejected | `INVALID_PDF_SIGNATURE` reason returned, file skipped via `continue` | ✅ Met | | Files shorter than 4 bytes are rejected | `header.length == 4` guard in `isPdfMagicBytes` | ✅ Met | | Rejected files logged via SLF4J `{}` | `log.warn("Überspringe {}: ...", filename)` | ✅ Met | | `ImportStatus` gains `skipped: int` field | `@JsonProperty("skipped") public int skipped()` derived from list size | ✅ Met | | `ImportStatus` gains `skippedFiles: List<SkippedFile>` | `SkippedFile(filename, reason)` record, returned in status | ✅ Met | | `ImportStatusCard` shows amber warning section | `<details class="... border-warning/40 bg-warning/10 ...">` | ✅ Met | | Shown only in DONE state, hidden when `skipped === 0` | `{:else if state === 'DONE'}` + `{#if importStatus.skipped > 0}` | ✅ Met | | i18n keys added for de/en/es | All four reason keys + skipped label present in all three message files | ✅ Met | ### Acceptance criteria coverage **S3 upload failure surfaced in skippedFiles** — was a previous blocker, now met with dedicated test. ✅ **Behaviour when file exists on disk and has valid PDF signature** — covered by `setupOneValidOneFakeImport` + `s3Client.putObject` verification. ✅ **Behaviour when no file exists on disk** — this path skips magic byte check entirely (metadata-only import). The log warns, `importSingleDocument` is still called. This is existing behaviour preserved correctly. ✅ ### Open question surfaced The `ALREADY_EXISTS` reason is now surfaced in the amber "skipped" warning section. From a user perspective, "already imported" is a different category than "invalid file" — one is a normal de-duplication outcome, the other is a data quality problem. Whether they should share the same UI widget is a product decision. The current implementation groups them together, which keeps the implementation simple. If the distinction matters to admins, a follow-up could separate the two categories. Not a blocker for this PR. ### i18n completeness All three language files (`de.json`, `en.json`, `es.json`) have identical key sets. No missing keys detected. The German translation `"übersprungen"` (lowercase) for the label is consistent with the existing label style in the card. ✅

marcel added 1 commit 2026-05-19 08:46:03 +02:00

fix(import): address non-blocking review feedback — touch target, glossary, edge-case test

CI / Unit & Component Tests (pull_request) Successful in 3m27s

Details

CI / OCR Service Tests (pull_request) Successful in 20s

Details

CI / Backend Unit Tests (pull_request) Successful in 3m9s

Details

CI / Semgrep Security Scan (pull_request) Successful in 19s

Details

CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s

Details

CI / fail2ban Regex (pull_request) Successful in 1m35s

Details

2f30e847a7

- Add min-h-[44px] py-2 to <summary> in ImportStatusCard for 44 px touch target
- Add SkippedFile and skipped count entries to docs/GLOSSARY.md
- Add MassImportServiceTest case: ALREADY_EXISTS fires before file I/O when doc is UPLOADED and file is present on disk

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

marcel commented

2026-05-19 09:11:10 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: ✅ Approved

This is clean, well-structured work. The refactor from int → ProcessResult, the defensive-copy compact constructor in ImportStatus, the openFileStream seam for testability — all of these are intentional design choices I appreciate. The frontend changes are tidy and follow Svelte 5 conventions throughout.

What I specifically like

isPdfMagicBytes() is a clean, focused function — exactly 10 lines, single responsibility, named precisely.
openFileStream package-private seam is a smart way to inject I/O failures without reflection hackery.
reasonLabel() in the Svelte component is a pure function, easy to test, falls through to the raw code gracefully for unknown values.
The {#each importStatus.skippedFiles as skipped (skipped.filename)} has a proper key expression — good.
The compact constructor public ImportStatus { skippedFiles = List.copyOf(skippedFiles); } ensures immutability downstream.

Suggestions (non-blocking)

importSingleDocument does too much. The method is now ~70 lines and validates, transforms, and persists. The S3 upload block, the thumbnail dispatch, and the metadata mapping are all interleaved. Felix's rule: split when a function validates AND transforms AND persists. Not a blocker for this PR since it's pre-existing complexity and this PR doesn't make it worse — but worth tracking as tech debt.

processRows comment block density. The new section in processRows (magic-byte check + IOException catch + importSingleDocument result branch) is logically correct but reads as three nested decisions on a single file path. A tiny extract:

// could become:
Optional<String> skipReason = validateAndImport(fileOnDisk, cells, filename, index);

…would push all the branching into one method. Again, not a blocker — the current code is readable.

SkippedFile.reason is String, not an enum. Reason codes like INVALID_PDF_SIGNATURE are scattered as string literals in tests, Java source, and the TypeScript switch. A Java enum with a toString() contract would make it impossible to introduce a typo. In a solo project this is a judgment call (KISS vs correctness) — mentioning it, not blocking on it.

One test names importSingleDocument_skipsWithAlreadyExists_whenDocumentUploadedAndFileIsPresent but the method name is very long. Sara's rule: test name is the documentation. This one reads fine semantically, just worth knowing it will be the longest entry in the CI report.

No blockers. This PR is well done.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** This is clean, well-structured work. The refactor from `int` → `ProcessResult`, the defensive-copy compact constructor in `ImportStatus`, the `openFileStream` seam for testability — all of these are intentional design choices I appreciate. The frontend changes are tidy and follow Svelte 5 conventions throughout. ### What I specifically like - `isPdfMagicBytes()` is a clean, focused function — exactly 10 lines, single responsibility, named precisely. - `openFileStream` package-private seam is a smart way to inject I/O failures without reflection hackery. - `reasonLabel()` in the Svelte component is a pure function, easy to test, falls through to the raw code gracefully for unknown values. - The `{#each importStatus.skippedFiles as skipped (skipped.filename)}` has a proper key expression — good. - The compact constructor `public ImportStatus { skippedFiles = List.copyOf(skippedFiles); }` ensures immutability downstream. ### Suggestions (non-blocking) **`importSingleDocument` does too much.** The method is now ~70 lines and validates, transforms, and persists. The S3 upload block, the thumbnail dispatch, and the metadata mapping are all interleaved. Felix's rule: split when a function validates AND transforms AND persists. Not a blocker for this PR since it's pre-existing complexity and this PR doesn't make it worse — but worth tracking as tech debt. **`processRows` comment block density.** The new section in `processRows` (magic-byte check + IOException catch + `importSingleDocument` result branch) is logically correct but reads as three nested decisions on a single file path. A tiny extract: ```java // could become: Optional<String> skipReason = validateAndImport(fileOnDisk, cells, filename, index); ``` …would push all the branching into one method. Again, not a blocker — the current code is readable. **`SkippedFile.reason` is `String`, not an enum.** Reason codes like `INVALID_PDF_SIGNATURE` are scattered as string literals in tests, Java source, and the TypeScript switch. A Java enum with a `toString()` contract would make it impossible to introduce a typo. In a solo project this is a judgment call (KISS vs correctness) — mentioning it, not blocking on it. **One test names `importSingleDocument_skipsWithAlreadyExists_whenDocumentUploadedAndFileIsPresent` but the method name is very long.** Sara's rule: test name is the documentation. This one reads fine semantically, just worth knowing it will be the longest entry in the CI report. No blockers. This PR is well done.

marcel commented

2026-05-19 09:11:25 +02:00

🏛️ Markus Keller — Application Architect

Verdict: ✅ Approved

The structural choices here are sound. No new packages, no new domain modules, no cross-domain coupling violations — this is a contained change inside importing/. The ADR referenced in the doc commits confirms the decision was documented. The doc update triggers table below resolves cleanly.

Checklist against required doc updates

PR contains	Required update	Status
New domain concept (`SkippedFile`, `skipped count`)	`docs/GLOSSARY.md`	✅ Added
No new DB migration	DB diagrams	✅ Not required
No new package	`CLAUDE.md` package table	✅ Not required
No new controller/service	C4 L3 diagrams	✅ Not required
No new route	`CLAUDE.md` route table	✅ Not required
No new Docker service	L2 diagram / DEPLOYMENT.md	✅ Not required

All required documentation updates are present.

Architecture observations

ImportStatus record now carries skippedFiles — acceptable coupling. The ImportStatus record lives inside MassImportService. Embedding SkippedFile there too keeps the domain model cohesive. If this were a separate "reporting" domain I'd push back, but for a single-service import status object this is the right call.

ProcessResult is package-private — correct. It's an internal data-transfer type between processRows and the caller. It doesn't need to leak into the API surface. Good boundary discipline.

openFileStream package-private seam. This is the test-seam pattern — acceptable in a monolith where reflection (ReflectionTestUtils) or spy-based testing is the norm. Slightly unusual to make a production method package-private purely for testability, but the comment explains the intent. If this pattern proliferates it becomes noise; for a single use it's fine.

No new infrastructure, no new external systems, no transport protocol changes. Nothing in the architectural risk category.

Solid, boring, correct. Approved.

## 🏛️ Markus Keller — Application Architect **Verdict: ✅ Approved** The structural choices here are sound. No new packages, no new domain modules, no cross-domain coupling violations — this is a contained change inside `importing/`. The ADR referenced in the doc commits confirms the decision was documented. The doc update triggers table below resolves cleanly. ### Checklist against required doc updates | PR contains | Required update | Status | |---|---|---| | New domain concept (`SkippedFile`, `skipped count`) | `docs/GLOSSARY.md` | ✅ Added | | No new DB migration | DB diagrams | ✅ Not required | | No new package | `CLAUDE.md` package table | ✅ Not required | | No new controller/service | C4 L3 diagrams | ✅ Not required | | No new route | `CLAUDE.md` route table | ✅ Not required | | No new Docker service | L2 diagram / DEPLOYMENT.md | ✅ Not required | All required documentation updates are present. ### Architecture observations **`ImportStatus` record now carries `skippedFiles` — acceptable coupling.** The `ImportStatus` record lives inside `MassImportService`. Embedding `SkippedFile` there too keeps the domain model cohesive. If this were a separate "reporting" domain I'd push back, but for a single-service import status object this is the right call. **`ProcessResult` is package-private — correct.** It's an internal data-transfer type between `processRows` and the caller. It doesn't need to leak into the API surface. Good boundary discipline. **`openFileStream` package-private seam.** This is the test-seam pattern — acceptable in a monolith where reflection (`ReflectionTestUtils`) or spy-based testing is the norm. Slightly unusual to make a production method package-private purely for testability, but the comment explains the intent. If this pattern proliferates it becomes noise; for a single use it's fine. **No new infrastructure, no new external systems, no transport protocol changes.** Nothing in the architectural risk category. Solid, boring, correct. Approved.

marcel commented

2026-05-19 09:11:47 +02:00

🔒 Nora "NullX" Steiner — Application Security Engineer

Verdict: ✅ Approved

This PR closes a real attack surface: previously, a user with access to the import directory could slip an arbitrary file (polyglot PDF, disguised script, malware) into the archive by naming it .pdf. Magic-byte validation at the read layer is the correct defence. Let me walk through the security properties of the implementation.

What's right

Magic-byte check uses readNBytes(4) — correct API. readNBytes(n) reads up to n bytes without throwing on short reads, returning whatever it got. The header.length == 4 guard handles the short-file case atomically. No partial-read race.

The stream is opened via openFileStream() and closed in try-with-resources. No file descriptor leak path.

Logging uses SLF4J {} placeholders throughout. No string concatenation in log calls — no Log4Shell-style interpolation risk.

log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename);
log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e);

Both calls are correct. The second form passes the exception as the third argument so SLF4J appends the stack trace.

The filename that reaches the log comes from the spreadsheet cell, not from the filesystem path. Worth noting: filename is getCell(cells, colFilename) — user-controlled input from the Excel sheet. SLF4J {} parameters do not evaluate JNDI expressions, so this is safe (CWE-117 Log Injection — not exploitable via SLF4J parameterized logging).

Rejected files are not deleted and not uploaded — they are silently skipped with a skip record. This is the right behaviour. Deleting files from an import directory that may be shared could cause data loss. Recording the skip reason gives the operator visibility.

Observations (not blockers)

Magic-byte validation only, no file extension validation. The PR validates bytes but does not check that the filename ends in .pdf. A file named script.sh with %PDF prepended would pass. This is a known limitation of magic-byte-only checks (polyglot files), but for this threat model — admin-only import from a trusted directory — it's acceptable. The import directory is not accessible by untrusted users.

TOCTOU (Time-of-Check/Time-of-Use) is inherent but low risk. The magic-byte check opens the file, then FileService opens it again for upload. A race-condition attacker who can swap the file between the two opens could bypass the check. In a family archive where the import directory is on a local filesystem accessible only to the server process, this is a theoretical risk, not a practical one.

SkippedFile.reason is an unconstrained String. From a security perspective, these reason codes eventually reach the frontend. They are not reflected into HTML without escaping (Svelte auto-escapes {text} interpolations), so XSS is not a concern. Noted for completeness.

The FILE_READ_ERROR path still accumulates a SkippedFile record with the filename. The filename is user-controlled (from spreadsheet), but it's already stored in the spreadsheet and in the database — no new information disclosure.

This PR tightens the file intake boundary in a pragmatic way that's appropriate for the threat model. No security objections to merge.

## 🔒 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** This PR closes a real attack surface: previously, a user with access to the import directory could slip an arbitrary file (polyglot PDF, disguised script, malware) into the archive by naming it `.pdf`. Magic-byte validation at the read layer is the correct defence. Let me walk through the security properties of the implementation. ### What's right **Magic-byte check uses `readNBytes(4)` — correct API.** `readNBytes(n)` reads up to `n` bytes without throwing on short reads, returning whatever it got. The `header.length == 4` guard handles the short-file case atomically. No partial-read race. **The stream is opened via `openFileStream()` and closed in try-with-resources.** No file descriptor leak path. **Logging uses SLF4J `{}` placeholders throughout.** No string concatenation in log calls — no Log4Shell-style interpolation risk. ```java log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename); log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e); ``` Both calls are correct. The second form passes the exception as the third argument so SLF4J appends the stack trace. **The `filename` that reaches the log comes from the spreadsheet cell, not from the filesystem path.** Worth noting: `filename` is `getCell(cells, colFilename)` — user-controlled input from the Excel sheet. SLF4J `{}` parameters do not evaluate JNDI expressions, so this is safe (CWE-117 Log Injection — not exploitable via SLF4J parameterized logging). **Rejected files are not deleted and not uploaded — they are silently skipped with a skip record.** This is the right behaviour. Deleting files from an import directory that may be shared could cause data loss. Recording the skip reason gives the operator visibility. ### Observations (not blockers) **Magic-byte validation only, no file extension validation.** The PR validates bytes but does not check that the filename ends in `.pdf`. A file named `script.sh` with `%PDF` prepended would pass. This is a known limitation of magic-byte-only checks (polyglot files), but for this threat model — admin-only import from a trusted directory — it's acceptable. The import directory is not accessible by untrusted users. **TOCTOU (Time-of-Check/Time-of-Use) is inherent but low risk.** The magic-byte check opens the file, then `FileService` opens it again for upload. A race-condition attacker who can swap the file between the two opens could bypass the check. In a family archive where the import directory is on a local filesystem accessible only to the server process, this is a theoretical risk, not a practical one. **`SkippedFile.reason` is an unconstrained `String`.** From a security perspective, these reason codes eventually reach the frontend. They are not reflected into HTML without escaping (Svelte auto-escapes `{text}` interpolations), so XSS is not a concern. Noted for completeness. **The `FILE_READ_ERROR` path still accumulates a `SkippedFile` record with the filename.** The filename is user-controlled (from spreadsheet), but it's already stored in the spreadsheet and in the database — no new information disclosure. This PR tightens the file intake boundary in a pragmatic way that's appropriate for the threat model. No security objections to merge.

marcel commented

2026-05-19 09:12:10 +02:00

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: ✅ Approved

The test suite for this PR is genuinely good. I went through every new test case and the overall test structure. Here's my full assessment.

What's working well

Test naming is sentence-level and descriptive. runImportAsync_uploadsValidPdf_andSkipsFakeOne, runImportAsync_skipsFile_whenShorterThanFourBytes, runImportAsync_addsS3UploadFailed_toSkippedFiles_whenS3Throws — each name tells you the scenario and the expected outcome without reading the body. This is the standard.

The three high-level regression tests are correctly split into three separate tests. Upload count, skipped count, and rejected filename in skippedFiles are each verified in isolation. When one fails in CI you know exactly which property broke. This follows the one-logical-assertion-per-test principle.

setupOneValidOneFakeImport factory method is clean. It bundles the five-line setup into a reusable helper, keeping each of the three regression test bodies to a single service.runImportAsync() call plus assertion. Good extraction.

buildMinimalImportXlsx(Path, String...) is varargs — smart. Supporting variable filenames makes it usable for single-file and multi-file scenarios without a second helper.

The IOException injection test uses spy(service) + doThrow on openFileStream. This is the correct way to inject I/O failures without reaching for file system mocking. The openFileStream seam was designed for exactly this test.

lenient() is used only where actually needed (the two tests where documentService.findByOriginalFilename is called but the result doesn't matter for the assertion). Not overused.

Suggestions (non-blocking)

No test for processRows with a mix of valid, skipped, and already-exists rows in the same batch. The three regression tests cover valid+invalid but don't verify that processed and skipped are both non-zero simultaneously in the returned ProcessResult. A runImportAsync_tracksProcessedAndSkippedCounts_independently would close this gap. Minor — the existing tests provide strong coverage individually.

The AdminControllerTest changes are minimal fixture updates. No new controller-layer tests were added to verify that skipped and skippedFiles appear in the JSON response body. A mockMvc.perform(get(...)).andExpect(jsonPath("$.skipped").value(0)).andExpect(jsonPath("$.skippedFiles").isArray()) would lock in the serialization contract. Not a blocker — the service-level tests cover the data model; the controller test just needs the shape.

Frontend: 6 new Vitest tests cover the key states. DONE with skipped > 0, DONE with skipped = 0, RUNNING with skipped > 0, FAILED with skipped > 0, filename in list, unknown reason code fallthrough. Complete state matrix. Approved.

No E2E test for the admin import flow. This is consistent with the existing test strategy (no E2E for admin pages in this codebase). Not a gap introduced by this PR.

Overall the test suite is thorough and correctly layered. Approved.

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ✅ Approved** The test suite for this PR is genuinely good. I went through every new test case and the overall test structure. Here's my full assessment. ### What's working well **Test naming is sentence-level and descriptive.** `runImportAsync_uploadsValidPdf_andSkipsFakeOne`, `runImportAsync_skipsFile_whenShorterThanFourBytes`, `runImportAsync_addsS3UploadFailed_toSkippedFiles_whenS3Throws` — each name tells you the scenario and the expected outcome without reading the body. This is the standard. **The three high-level regression tests are correctly split into three separate tests.** Upload count, skipped count, and rejected filename in `skippedFiles` are each verified in isolation. When one fails in CI you know exactly which property broke. This follows the one-logical-assertion-per-test principle. **`setupOneValidOneFakeImport` factory method is clean.** It bundles the five-line setup into a reusable helper, keeping each of the three regression test bodies to a single `service.runImportAsync()` call plus assertion. Good extraction. **`buildMinimalImportXlsx(Path, String...)` is varargs — smart.** Supporting variable filenames makes it usable for single-file and multi-file scenarios without a second helper. **The IOException injection test uses `spy(service)` + `doThrow` on `openFileStream`.** This is the correct way to inject I/O failures without reaching for file system mocking. The `openFileStream` seam was designed for exactly this test. **`lenient()` is used only where actually needed** (the two tests where `documentService.findByOriginalFilename` is called but the result doesn't matter for the assertion). Not overused. ### Suggestions (non-blocking) **No test for `processRows` with a mix of valid, skipped, and already-exists rows in the same batch.** The three regression tests cover valid+invalid but don't verify that `processed` and `skipped` are both non-zero simultaneously in the returned `ProcessResult`. A `runImportAsync_tracksProcessedAndSkippedCounts_independently` would close this gap. Minor — the existing tests provide strong coverage individually. **The `AdminControllerTest` changes are minimal fixture updates.** No new controller-layer tests were added to verify that `skipped` and `skippedFiles` appear in the JSON response body. A `mockMvc.perform(get(...)).andExpect(jsonPath("$.skipped").value(0)).andExpect(jsonPath("$.skippedFiles").isArray())` would lock in the serialization contract. Not a blocker — the service-level tests cover the data model; the controller test just needs the shape. **Frontend: 6 new Vitest tests cover the key states.** DONE with skipped > 0, DONE with skipped = 0, RUNNING with skipped > 0, FAILED with skipped > 0, filename in list, unknown reason code fallthrough. Complete state matrix. Approved. **No E2E test for the admin import flow.** This is consistent with the existing test strategy (no E2E for admin pages in this codebase). Not a gap introduced by this PR. Overall the test suite is thorough and correctly layered. Approved.

marcel commented

2026-05-19 09:12:35 +02:00

🎨 Leonie Voss — UI/UX Design Lead & Accessibility Strategist

Verdict: ✅ Approved

This is a well-considered admin UI addition. I reviewed the rendered Svelte markup against the brand standards and WCAG criteria. Previous concerns about touch targets and accessibility have been addressed. Here's my full pass.

Accessibility — WCAG 2.1/2.2

Touch target on <summary> — ✅ Resolved. The <summary> element has min-h-[44px] and py-2. The 44×44px minimum is met. This was flagged in a prior round and is now correct.

aria-live="polite" wraps the skipped section. Correct — when the import completes and the skipped section appears, screen readers will announce it after the current utterance finishes. polite is the right politeness level (not assertive, which would interrupt).

aria-hidden="true" on the decorative chevron SVG. Correct — the SVG is purely visual; screen readers skip it and read the count + label text inside the <summary> instead.

The <details>/<summary> native element is the right pattern here. It provides keyboard accessibility (Enter/Space to toggle), built-in open/closed state, and announces "disclosure triangle" to screen readers. No custom ARIA needed.

The filename list is a <ul> with <li> items. Semantic, correct. Screen readers announce "list, N items" giving the user a count before reading filenames.

Color is not used alone. The amber warning section uses border + background color + the "skipped" label text. Color-blind users see the structural distinction. Good.

Brand compliance

The <details> border color uses border-warning/40 and bg-warning/10. I note that warning is not part of the documented Familienarchiv token set in layout.css (which defines brand-navy, brand-mint, bg-surface, bg-canvas, border-line). This appears to be a Tailwind 4 semantic color. If warning maps to amber-500 by default in this project's config, the visual output is fine. Worth verifying the token is defined in layout.css or tailwind.config.* to ensure it survives a future Tailwind upgrade. Minor — not a blocker.

Section label uses font-sans text-xs font-bold tracking-widest uppercase — matches the section title pattern from the design system. ✅

The skipped count uses text-base font-bold — matches the metric display pattern used for processed elsewhere in the same card. Consistent. ✅

The filename list uses font-mono text-sm text-ink-2 — appropriate for filename display. Monospace makes filenames visually distinct from prose. ✅

Responsive / mobile

max-h-64 overflow-y-auto on the <ul>. This prevents a long filename list from pushing the button below the viewport. Correct. On 320px screens the collapsible nature of <details> means the list only appears on demand — good progressive disclosure.

The min-h-[44px] on <summary> and py-2 combination. On small screens the summary row may be narrower than 44px horizontally on very short counters, but the height minimum is the critical WCAG 2.2 criterion. Acceptable.

CSS in `<style>` block

details[open] .details-chevron {
    transform: rotate(90deg);
}

The motion-safe:transition-transform class on the chevron correctly gates the animation behind prefers-reduced-motion. Users who have motion sensitivity won't see the spinning animation. This respects WCAG 2.1 criterion 2.3.3. ✅

Summary

One minor note (the warning token provenance) — not a blocker. Everything else is clean, accessible, and on-brand. Approved.

## 🎨 Leonie Voss — UI/UX Design Lead & Accessibility Strategist **Verdict: ✅ Approved** This is a well-considered admin UI addition. I reviewed the rendered Svelte markup against the brand standards and WCAG criteria. Previous concerns about touch targets and accessibility have been addressed. Here's my full pass. ### Accessibility — WCAG 2.1/2.2 **Touch target on `<summary>` — ✅ Resolved.** The `<summary>` element has `min-h-[44px]` and `py-2`. The 44×44px minimum is met. This was flagged in a prior round and is now correct. **`aria-live="polite"` wraps the skipped section.** Correct — when the import completes and the skipped section appears, screen readers will announce it after the current utterance finishes. `polite` is the right politeness level (not `assertive`, which would interrupt). **`aria-hidden="true"` on the decorative chevron SVG.** Correct — the SVG is purely visual; screen readers skip it and read the count + label text inside the `<summary>` instead. **The `<details>`/`<summary>` native element is the right pattern here.** It provides keyboard accessibility (Enter/Space to toggle), built-in open/closed state, and announces "disclosure triangle" to screen readers. No custom ARIA needed. **The filename list is a `<ul>` with `<li>` items.** Semantic, correct. Screen readers announce "list, N items" giving the user a count before reading filenames. **Color is not used alone.** The amber warning section uses border + background color + the "skipped" label text. Color-blind users see the structural distinction. Good. ### Brand compliance **The `<details>` border color uses `border-warning/40` and `bg-warning/10`.** I note that `warning` is not part of the documented Familienarchiv token set in `layout.css` (which defines `brand-navy`, `brand-mint`, `bg-surface`, `bg-canvas`, `border-line`). This appears to be a Tailwind 4 semantic color. If `warning` maps to amber-500 by default in this project's config, the visual output is fine. Worth verifying the token is defined in `layout.css` or `tailwind.config.*` to ensure it survives a future Tailwind upgrade. Minor — not a blocker. **Section label uses `font-sans text-xs font-bold tracking-widest uppercase`** — matches the section title pattern from the design system. ✅ **The skipped count uses `text-base font-bold`** — matches the metric display pattern used for `processed` elsewhere in the same card. Consistent. ✅ **The filename list uses `font-mono text-sm text-ink-2`** — appropriate for filename display. Monospace makes filenames visually distinct from prose. ✅ ### Responsive / mobile **`max-h-64 overflow-y-auto` on the `<ul>`.** This prevents a long filename list from pushing the button below the viewport. Correct. On 320px screens the collapsible nature of `<details>` means the list only appears on demand — good progressive disclosure. **The `min-h-[44px]` on `<summary>` and `py-2` combination.** On small screens the summary row may be narrower than 44px horizontally on very short counters, but the height minimum is the critical WCAG 2.2 criterion. Acceptable. ### CSS in `<style>` block ```css details[open] .details-chevron { transform: rotate(90deg); } ``` The `motion-safe:transition-transform` class on the chevron correctly gates the animation behind `prefers-reduced-motion`. Users who have motion sensitivity won't see the spinning animation. This respects WCAG 2.1 criterion 2.3.3. ✅ ### Summary One minor note (the `warning` token provenance) — not a blocker. Everything else is clean, accessible, and on-brand. Approved.

marcel commented

2026-05-19 09:12:46 +02:00

⚙️ Tobias Wendt — DevOps & Platform Engineer

Verdict: ✅ Approved

This PR touches no infrastructure files — no Compose changes, no CI workflow changes, no Dockerfile modifications, no new environment variables, no new services. My review scope is limited to the operational implications of the code changes.

Operational considerations

No new environment variables or configuration required. The magic-byte check runs in-process from the local filesystem path (importDir). No new external dependencies, no new MinIO buckets, no new config flags. Deployment is a straight ./mvnw clean package + restart. ✅

S3 failure is now surfaced in skippedFiles. Previously, an S3 upload failure during import caused a silent return — the document placeholder was not created, the file was not uploaded, and there was no observable trace in the import status. Now S3_UPLOAD_FAILED appears in skippedFiles. This is operationally valuable: if MinIO is flaky during a batch import, the admin can see exactly which files need re-importing without digging through logs. ✅

No performance impact on the happy path. Reading 4 bytes from each file before upload adds negligible latency vs the S3 PUT operation. For a typical import of 100 files the overhead is unmeasurable. ✅

Log output is clean. log.warn("Überspringe {}: ...") at WARN level and log.error("Fehler beim Prüfen der Magic-Bytes für {}", ...) at ERROR level — correctly severity-coded. Operators monitoring log streams will see skipped files at WARN, I/O failures at ERROR. ✅

The skippedFiles list is bounded by the number of rows in the import spreadsheet. No unbounded memory growth vector introduced. The ImportStatus object is held in memory between imports but List.copyOf() ensures it's a compact, unresizable list. ✅

Nothing to flag from a platform perspective. Clean change.

## ⚙️ Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** This PR touches no infrastructure files — no Compose changes, no CI workflow changes, no Dockerfile modifications, no new environment variables, no new services. My review scope is limited to the operational implications of the code changes. ### Operational considerations **No new environment variables or configuration required.** The magic-byte check runs in-process from the local filesystem path (`importDir`). No new external dependencies, no new MinIO buckets, no new config flags. Deployment is a straight `./mvnw clean package` + restart. ✅ **S3 failure is now surfaced in `skippedFiles`.** Previously, an S3 upload failure during import caused a silent `return` — the document placeholder was not created, the file was not uploaded, and there was no observable trace in the import status. Now `S3_UPLOAD_FAILED` appears in `skippedFiles`. This is operationally valuable: if MinIO is flaky during a batch import, the admin can see exactly which files need re-importing without digging through logs. ✅ **No performance impact on the happy path.** Reading 4 bytes from each file before upload adds negligible latency vs the S3 PUT operation. For a typical import of 100 files the overhead is unmeasurable. ✅ **Log output is clean.** `log.warn("Überspringe {}: ...")` at WARN level and `log.error("Fehler beim Prüfen der Magic-Bytes für {}", ...)` at ERROR level — correctly severity-coded. Operators monitoring log streams will see skipped files at WARN, I/O failures at ERROR. ✅ **The `skippedFiles` list is bounded by the number of rows in the import spreadsheet.** No unbounded memory growth vector introduced. The `ImportStatus` object is held in memory between imports but `List.copyOf()` ensures it's a compact, unresizable list. ✅ Nothing to flag from a platform perspective. Clean change.

marcel commented

2026-05-19 09:13:05 +02:00

📋 Elicit — Requirements Engineer

Verdict: ✅ Approved

Reviewing from the requirements and acceptance-criteria perspective. I'm checking whether the delivered behaviour matches what the original issue (#529) would have required, and whether any edge cases, NFRs, or user-visible gaps remain.

Requirements coverage

The PR description states four deliverables:

Backend magic-byte validation — ✅ Delivered. Files shorter than 4 bytes are also rejected (the header.length == 4 guard).
API skipped + skippedFiles fields on ImportStatus — ✅ Delivered. Both fields are present in the record and serialised correctly.
Frontend amber warning section with collapsible list — ✅ Delivered. Visible only in DONE state, hidden when skipped === 0.
i18n keys for all three locales — ✅ Delivered. de, en, es all have the four new keys.

Glossary additions

The two new glossary entries (SkippedFile, skipped count) are precise and implementation-accurate. The four reason codes are all listed with their meanings. This satisfies the documentation requirement for new domain concepts.

Edge cases verified by tests

Scenario	Covered
Valid PDF uploaded	✅ `setupOneValidOneFakeImport`
Invalid magic bytes skipped	✅ same setup
File shorter than 4 bytes skipped	✅ `runImportAsync_skipsFile_whenShorterThanFourBytes`
IOException on read skipped	✅ `runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException`
S3 upload failure surfaces in skippedFiles	✅ `runImportAsync_addsS3UploadFailed_toSkippedFiles_whenS3Throws`
Already-exists document surfaces in skippedFiles	✅ `runImportAsync_addsAlreadyExists_toSkippedFiles_whenDocumentAlreadyUploaded`
Metadata-only rows (no file) still import	✅ pre-existing tests + the `fileOnDisk.isPresent()` guard

Remaining open question (minor, not a blocker)

What happens when the import contains ONLY skipped files (processed = 0)? The state still transitions to DONE. The admin sees the green "Import abgeschlossen" banner alongside an amber "N übersprungen" section. This is arguably confusing — a run that processed zero documents but shows a green complete banner. It's not a regression (previously the same run would silently complete with processed = 0), but it's a UX gap worth tracking as a follow-up issue.

Suggested follow-up: if processed === 0 && skipped > 0, consider showing a combined amber/warning state rather than green DONE. Not a blocker for this PR.

NFR check

Observability — ✅ WARN log per skipped file, ERROR log per I/O failure, admin UI surfaces the count.
Accessibility — ✅ covered by Leonie's review.
i18n — ✅ all three locales.
Performance — ✅ negligible overhead per Tobias's assessment.
Security — ✅ covered by Nora's review.

Requirements are fully met. The one edge case (all-skipped import showing green) is a cosmetic UX issue worth a follow-up issue, not a blocker.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** Reviewing from the requirements and acceptance-criteria perspective. I'm checking whether the delivered behaviour matches what the original issue (#529) would have required, and whether any edge cases, NFRs, or user-visible gaps remain. ### Requirements coverage The PR description states four deliverables: 1. **Backend magic-byte validation** — ✅ Delivered. Files shorter than 4 bytes are also rejected (the `header.length == 4` guard). 2. **API `skipped` + `skippedFiles` fields on `ImportStatus`** — ✅ Delivered. Both fields are present in the record and serialised correctly. 3. **Frontend amber warning section with collapsible list** — ✅ Delivered. Visible only in DONE state, hidden when `skipped === 0`. 4. **i18n keys for all three locales** — ✅ Delivered. `de`, `en`, `es` all have the four new keys. ### Glossary additions The two new glossary entries (`SkippedFile`, `skipped count`) are precise and implementation-accurate. The four reason codes are all listed with their meanings. This satisfies the documentation requirement for new domain concepts. ### Edge cases verified by tests | Scenario | Covered | |---|---| | Valid PDF uploaded | ✅ `setupOneValidOneFakeImport` | | Invalid magic bytes skipped | ✅ same setup | | File shorter than 4 bytes skipped | ✅ `runImportAsync_skipsFile_whenShorterThanFourBytes` | | IOException on read skipped | ✅ `runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException` | | S3 upload failure surfaces in skippedFiles | ✅ `runImportAsync_addsS3UploadFailed_toSkippedFiles_whenS3Throws` | | Already-exists document surfaces in skippedFiles | ✅ `runImportAsync_addsAlreadyExists_toSkippedFiles_whenDocumentAlreadyUploaded` | | Metadata-only rows (no file) still import | ✅ pre-existing tests + the `fileOnDisk.isPresent()` guard | ### Remaining open question (minor, not a blocker) **What happens when the import contains ONLY skipped files (processed = 0)?** The state still transitions to `DONE`. The admin sees the green "Import abgeschlossen" banner alongside an amber "N übersprungen" section. This is arguably confusing — a run that processed zero documents but shows a green complete banner. It's not a regression (previously the same run would silently complete with `processed = 0`), but it's a UX gap worth tracking as a follow-up issue. Suggested follow-up: if `processed === 0 && skipped > 0`, consider showing a combined amber/warning state rather than green DONE. Not a blocker for this PR. ### NFR check - **Observability** — ✅ WARN log per skipped file, ERROR log per I/O failure, admin UI surfaces the count. - **Accessibility** — ✅ covered by Leonie's review. - **i18n** — ✅ all three locales. - **Performance** — ✅ negligible overhead per Tobias's assessment. - **Security** — ✅ covered by Nora's review. Requirements are fully met. The one edge case (all-skipped import showing green) is a cosmetic UX issue worth a follow-up issue, not a blocker.

marcel merged commit 18e675a5b2 into main

2026-05-19 09:45:04 +02:00

marcel deleted branch worktree-feat+issue-529-pdf-magic-bytes

2026-05-19 09:45:05 +02:00

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: marcel/familienarchiv#618