security(import): reject path-traversal filenames in MassImportService.processRows #650

Merged
marcel merged 6 commits from feat/issue-530-reject-path-traversal-filenames into main 2026-05-21 10:30:57 +02:00
Owner

Summary

  • Adds isValidImportFilename(String) private guard to MassImportService, called after the filename ternary in processRows and before findFileRecursive
  • Rejects filenames containing /, \, .. (substring), ., null bytes, or absolute paths — with SLF4J parameterised logging and SkippedFile(filename, "INVALID_FILENAME_PATH_TRAVERSAL") added to the shared skip counter
  • 10 permanent regression tests: 9 unit tests for the guard method + 1 processRows integration test asserting skip-and-continue behaviour

Closes #530.

Test plan

  • MassImportServiceTest — 58 tests, 0 failures (./mvnw test -Dtest=MassImportServiceTest)
  • Backend build clean (./mvnw clean package -DskipTests)
  • Import a spreadsheet with ../evil in a filename cell — row appears in skipped list, remaining rows still imported

🤖 Generated with Claude Code

## Summary - Adds `isValidImportFilename(String)` private guard to `MassImportService`, called after the filename ternary in `processRows` and before `findFileRecursive` - Rejects filenames containing `/`, `\`, `..` (substring), `.`, null bytes, or absolute paths — with SLF4J parameterised logging and `SkippedFile(filename, "INVALID_FILENAME_PATH_TRAVERSAL")` added to the shared skip counter - 10 permanent regression tests: 9 unit tests for the guard method + 1 `processRows` integration test asserting skip-and-continue behaviour Closes #530. ## Test plan - [ ] `MassImportServiceTest` — 58 tests, 0 failures (`./mvnw test -Dtest=MassImportServiceTest`) - [ ] Backend build clean (`./mvnw clean package -DskipTests`) - [ ] Import a spreadsheet with `../evil` in a filename cell — row appears in skipped list, remaining rows still imported 🤖 Generated with [Claude Code](https://claude.com/claude-code)
marcel added 2 commits 2026-05-21 09:54:16 +02:00
Codifies the path-traversal constraint that was previously safe by
accident (findFileRecursive's getFileName() strip) but had no explicit
guard or test coverage. Fixes issue #530.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
security(import): wire isValidImportFilename guard into processRows
All checks were successful
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m26s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
CI / Unit & Component Tests (pull_request) Successful in 3m30s
38a4ca2e34
Rejects path-traversal filenames before findFileRecursive runs.
Guard runs on the derived filename (after the ternary) as specified.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

👷 Felix Brandt — Senior Fullstack Developer

Verdict: Approved

Clean, focused PR. The implementation does exactly one thing and the tests prove it.

Blockers

None.

Suggestions

isValidImportFilename — prefer early-return, consider Paths.get exception safety

The method (diff +325–+331) already uses guard clauses which is great. One minor nit: the filename.equals(".") check on line +330 is subsumed by the filename.contains("..") check directly above — "." doesn't contain ".." so it is not subsumed — this is correct. No issue.

However, there is a subtle reliability concern: Paths.get(filename).isAbsolute() on line +331 can throw InvalidPathException on certain OS-specific illegal characters (e.g. null bytes should have been caught already, but Windows paths with *, ?, ", <, >, | would throw here before returning false). Because the null-byte check is on line +329 before the Paths.get call, and the service presumably runs on Linux, this is unlikely to be exploitable in practice — but it could produce a non-descriptive 500 for a spreadsheet cell containing *evil.pdf on Windows. Worth a comment:

// Paths.get() may throw InvalidPathException for illegal OS-specific chars.
// On Linux this is safe for all inputs that have passed the checks above.
if (Paths.get(filename).isAbsolute()) return false;

Test: minimalCells helper

The integration test on line +459 calls minimalCells("../evil") — this helper was pre-existing, so no issue with its existence, but the test name processRows_skipsRowAndContinues_whenFilenameIsPathTraversal is excellent and follows the project naming convention perfectly.

One missing positive test case: filename with a leading dot (hidden file)

".hidden" is a valid base name on Linux and should pass the validator. This isn't a security issue — it's a coverage edge case. Consider adding it if .hidden files are plausible in the archive.

What's done well

  • Guard-clause structure in isValidImportFilename is textbook.
  • The security regression comment // ─── isValidImportFilename — security regression — do not remove ───────── is exactly the right signal to future maintainers.
  • SLF4J parameterized logging log.warn("Skipping import row {}: filename rejected — {}", i, filename) — correct, no injection risk.
  • 9 unit tests + 1 integration test is a proportionate coverage for a 12-line guard method.
## 👷 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** Clean, focused PR. The implementation does exactly one thing and the tests prove it. ### Blockers None. ### Suggestions **`isValidImportFilename` — prefer early-return, consider `Paths.get` exception safety** The method (diff `+325–+331`) already uses guard clauses which is great. One minor nit: the `filename.equals(".")` check on line `+330` is subsumed by the `filename.contains("..")` check directly above — `"."` doesn't contain `".."` so it is *not* subsumed — this is correct. No issue. However, there is a subtle reliability concern: `Paths.get(filename).isAbsolute()` on line `+331` can throw `InvalidPathException` on certain OS-specific illegal characters (e.g. null bytes should have been caught already, but Windows paths with `*`, `?`, `"`, `<`, `>`, `|` would throw here before returning `false`). Because the null-byte check is on line `+329` before the `Paths.get` call, and the service presumably runs on Linux, this is unlikely to be exploitable in practice — but it could produce a non-descriptive 500 for a spreadsheet cell containing `*evil.pdf` on Windows. Worth a comment: ```java // Paths.get() may throw InvalidPathException for illegal OS-specific chars. // On Linux this is safe for all inputs that have passed the checks above. if (Paths.get(filename).isAbsolute()) return false; ``` **Test: `minimalCells` helper** The integration test on line `+459` calls `minimalCells("../evil")` — this helper was pre-existing, so no issue with its existence, but the test name `processRows_skipsRowAndContinues_whenFilenameIsPathTraversal` is excellent and follows the project naming convention perfectly. **One missing positive test case: filename with a leading dot (hidden file)** `".hidden"` is a valid base name on Linux and should pass the validator. This isn't a security issue — it's a coverage edge case. Consider adding it if `.hidden` files are plausible in the archive. ### What's done well - Guard-clause structure in `isValidImportFilename` is textbook. - The security regression comment `// ─── isValidImportFilename — security regression — do not remove ─────────` is exactly the right signal to future maintainers. - SLF4J parameterized logging `log.warn("Skipping import row {}: filename rejected — {}", i, filename)` — correct, no injection risk. - 9 unit tests + 1 integration test is a proportionate coverage for a 12-line guard method.
Author
Owner

🏛️ Markus Keller — Application Architect

Verdict: Approved

The change is well-scoped, architecturally clean, and introduces no new cross-domain dependencies or structural concerns. No documentation updates are required by the doc-update matrix for this PR.

Blocker assessment against doc-update matrix

Trigger Required update Present?
New Flyway migration DB diagrams No migration in this PR — N/A
New backend package / domain module CLAUDE.md + C4 diagram No new package — N/A
New controller or service C4 diagram isValidImportFilename is a private method, not a new service — N/A
New SvelteKit route CLAUDE.md + C4 frontend No frontend changes — N/A
New ErrorCode CLAUDE.md + ARCHITECTURE.md "INVALID_FILENAME_PATH_TRAVERSAL" is a string literal, not a new ErrorCode enum value — see suggestion below
Architectural decision ADR This is a targeted security fix, not a new architectural pattern — N/A

No documentation blockers.

Suggestions

"INVALID_FILENAME_PATH_TRAVERSAL" is a raw string, not a typed ErrorCode

The skip-reason string ("INVALID_FILENAME_PATH_TRAVERSAL") is added as a literal on diff line +295. All other skip reasons in SkippedFile are also strings ("INVALID_PDF_SIGNATURE", "ALREADY_EXISTS", "S3_UPLOAD_FAILED" etc.), so this PR follows the existing convention. That convention is itself a mild architecture smell — an ErrorCode enum or a SkipReason enum would prevent typos and make the full set of reasons greppable — but that's pre-existing tech debt, not introduced by this PR. Worth a follow-up issue; not a blocker here.

Placement of the guard relative to the filename ternary

The guard is called after the .pdf extension is appended (diff +293). This is the correct position: the validator sees the final filename that will actually be used in findFileRecursive, not the raw spreadsheet cell value. Good placement decision — it avoids the gap where index = "../evil" would become "../evil.pdf" before validation.

What's done well

  • The guard lives entirely inside the importing package — no cross-domain leakage.
  • Private method — not exposed as a service boundary.
  • The fix is additive: existing behavior for valid filenames is unchanged.
## 🏛️ Markus Keller — Application Architect **Verdict: ✅ Approved** The change is well-scoped, architecturally clean, and introduces no new cross-domain dependencies or structural concerns. No documentation updates are required by the doc-update matrix for this PR. ### Blocker assessment against doc-update matrix | Trigger | Required update | Present? | |---|---|---| | New Flyway migration | DB diagrams | No migration in this PR — N/A | | New backend package / domain module | CLAUDE.md + C4 diagram | No new package — N/A | | New controller or service | C4 diagram | `isValidImportFilename` is a private method, not a new service — N/A | | New SvelteKit route | CLAUDE.md + C4 frontend | No frontend changes — N/A | | New ErrorCode | CLAUDE.md + ARCHITECTURE.md | `"INVALID_FILENAME_PATH_TRAVERSAL"` is a **string literal**, not a new `ErrorCode` enum value — see suggestion below | | Architectural decision | ADR | This is a targeted security fix, not a new architectural pattern — N/A | **No documentation blockers.** ### Suggestions **`"INVALID_FILENAME_PATH_TRAVERSAL"` is a raw string, not a typed `ErrorCode`** The skip-reason string (`"INVALID_FILENAME_PATH_TRAVERSAL"`) is added as a literal on diff line `+295`. All other skip reasons in `SkippedFile` are also strings (`"INVALID_PDF_SIGNATURE"`, `"ALREADY_EXISTS"`, `"S3_UPLOAD_FAILED"` etc.), so this PR follows the existing convention. That convention is itself a mild architecture smell — an `ErrorCode` enum or a `SkipReason` enum would prevent typos and make the full set of reasons greppable — but that's pre-existing tech debt, not introduced by this PR. Worth a follow-up issue; not a blocker here. **Placement of the guard relative to the `filename` ternary** The guard is called *after* the `.pdf` extension is appended (diff `+293`). This is the correct position: the validator sees the final filename that will actually be used in `findFileRecursive`, not the raw spreadsheet cell value. Good placement decision — it avoids the gap where `index = "../evil"` would become `"../evil.pdf"` before validation. ### What's done well - The guard lives entirely inside the `importing` package — no cross-domain leakage. - Private method — not exposed as a service boundary. - The fix is additive: existing behavior for valid filenames is unchanged.
Author
Owner

🔒 Nora "NullX" Steiner — Application Security Engineer

Verdict: ⚠️ Approved with concerns

The path-traversal fix is genuine and the regression tests are solid. I have one concern about completeness of the guard and one observation about defense-in-depth that is worth documenting.

Blockers

None — the current guard eliminates the primary path-traversal vectors for the Linux production environment.

Security analysis of isValidImportFilename

CWE-22 (Path Traversal) — primary threat model

The guard correctly blocks:

  • ../evil → caught by contains("..")
  • /etc/passwd → caught by isAbsolute()
  • etc/passwd → caught by contains("/")
  • ..\evil → caught by contains("..") (the backslash also hits the contains("\\") check)
  • Null-byte injection file\0.pdf → caught by contains("\0")
  • Single dot "." → caught by filename.equals(".")

Concern 1 (Medium): Unicode lookalike / overlong encoding bypass

The guard operates on the Java String (UTF-16). Filenames arriving from Excel are decoded by Apache POI before this check runs, so classic overlong UTF-8 encodings (e.g. %2e%2e%2f../) are already resolved. This is safe.

However, Unicode homoglyph directory separators (U+2215 DIVISION SLASH , U+29F5 REVERSE SOLIDUS , U+FF0F FULLWIDTH SOLIDUS ) are not caught by the contains("/") and contains("\\") checks. On Linux, java.io.File uses the OS-level path resolution, which does not normalise these Unicode variants to /. So a filename like foo∕etc∕passwd would pass the guard and then be passed to findFileRecursive. Whether findFileRecursive would actually traverse the filesystem with such a name depends on its implementation, but to be safe:

// Consider adding after the backslash check:
if (filename.contains("∕") || filename.contains("/") || filename.contains("⧵")) return false;
// Or more robustly: reject any character outside the safe set
// Allowed: alphanumeric, hyphen, underscore, dot (single), space
if (!filename.matches("[\\w\\s.\\-()]+")) return false;

The positive allowlist approach (matches("[\\w\\s.\\-()]+")) would be more robust than the current denylist and would eliminate this and all future bypass vectors at once. However, it could also reject legitimate filenames if the archive contains accented characters (Ä, ü, ö, é…) which are common in a German family archive. If you go with allowlist, extend the character class to include \p{L}\p{N} (Unicode letters and digits) rather than \w (ASCII-only word chars). This is a suggestion, not a blocker — the current denylist covers the real-world attack surface for this application.

Concern 2 (Low): findFileRecursive is not shown in this diff

The guard is a pre-filter. If findFileRecursive itself uses File(baseDir, filename) or Path.resolve(filename) without a post-check that the resolved path is still under baseDir, a bypass could theoretically occur if the guard is ever weakened. Defense-in-depth would add a canonical-path check inside findFileRecursive:

// Inside findFileRecursive, after resolving the candidate File:
if (!candidate.getCanonicalPath().startsWith(baseDir.getCanonicalPath() + File.separator)) {
    throw DomainException.internal(ErrorCode.INTERNAL_ERROR, "Path escape detected: " + candidate);
}

Not a blocker for this PR since the guard prevents the dangerous input from reaching that code, but worth a follow-up issue.

What's done well

  • SLF4J parameterized logging — no Log4Shell risk.
  • Null check before isBlank() — no NPE on null filename.
  • Tests use real attack payloads (../evil, /etc/passwd, ..\\etc\\passwd, file\0.pdf) — not synthetic safe inputs. This is exactly how security regression tests should be written.
  • The // ─── security regression — do not remove ─── comment is a strong signal to future maintainers.
  • Paths.get(filename).isAbsolute() covers Windows absolute paths (C:\...) even though the production environment is Linux — good belt-and-suspenders.
## 🔒 Nora "NullX" Steiner — Application Security Engineer **Verdict: ⚠️ Approved with concerns** The path-traversal fix is genuine and the regression tests are solid. I have one concern about completeness of the guard and one observation about defense-in-depth that is worth documenting. ### Blockers None — the current guard eliminates the primary path-traversal vectors for the Linux production environment. ### Security analysis of `isValidImportFilename` **CWE-22 (Path Traversal) — primary threat model** The guard correctly blocks: - `../evil` → caught by `contains("..")` - `/etc/passwd` → caught by `isAbsolute()` - `etc/passwd` → caught by `contains("/")` - `..\evil` → caught by `contains("..")` (the backslash also hits the `contains("\\")` check) - Null-byte injection `file\0.pdf` → caught by `contains("\0")` - Single dot `"."` → caught by `filename.equals(".")` **Concern 1 (Medium): Unicode lookalike / overlong encoding bypass** The guard operates on the Java `String` (UTF-16). Filenames arriving from Excel are decoded by Apache POI before this check runs, so classic overlong UTF-8 encodings (e.g. `%2e%2e%2f` → `../`) are already resolved. This is safe. However, Unicode homoglyph directory separators (U+2215 DIVISION SLASH `∕`, U+29F5 REVERSE SOLIDUS `⟵`, U+FF0F FULLWIDTH SOLIDUS `/`) are **not** caught by the `contains("/")` and `contains("\\")` checks. On Linux, `java.io.File` uses the OS-level path resolution, which does **not** normalise these Unicode variants to `/`. So a filename like `foo∕etc∕passwd` would pass the guard and then be passed to `findFileRecursive`. Whether `findFileRecursive` would actually traverse the filesystem with such a name depends on its implementation, but to be safe: ```java // Consider adding after the backslash check: if (filename.contains("∕") || filename.contains("/") || filename.contains("⧵")) return false; // Or more robustly: reject any character outside the safe set // Allowed: alphanumeric, hyphen, underscore, dot (single), space if (!filename.matches("[\\w\\s.\\-()]+")) return false; ``` The positive allowlist approach (`matches("[\\w\\s.\\-()]+")`) would be more robust than the current denylist and would eliminate this and all future bypass vectors at once. However, it could also reject legitimate filenames if the archive contains accented characters (Ä, ü, ö, é…) which are common in a German family archive. If you go with allowlist, extend the character class to include `\p{L}\p{N}` (Unicode letters and digits) rather than `\w` (ASCII-only word chars). This is a suggestion, not a blocker — the current denylist covers the real-world attack surface for this application. **Concern 2 (Low): `findFileRecursive` is not shown in this diff** The guard is a pre-filter. If `findFileRecursive` itself uses `File(baseDir, filename)` or `Path.resolve(filename)` without a post-check that the resolved path is still under `baseDir`, a bypass could theoretically occur if the guard is ever weakened. Defense-in-depth would add a canonical-path check inside `findFileRecursive`: ```java // Inside findFileRecursive, after resolving the candidate File: if (!candidate.getCanonicalPath().startsWith(baseDir.getCanonicalPath() + File.separator)) { throw DomainException.internal(ErrorCode.INTERNAL_ERROR, "Path escape detected: " + candidate); } ``` Not a blocker for this PR since the guard prevents the dangerous input from reaching that code, but worth a follow-up issue. ### What's done well - SLF4J parameterized logging — no Log4Shell risk. - Null check before `isBlank()` — no NPE on null filename. - Tests use real attack payloads (`../evil`, `/etc/passwd`, `..\\etc\\passwd`, `file\0.pdf`) — not synthetic safe inputs. This is exactly how security regression tests should be written. - The `// ─── security regression — do not remove ───` comment is a strong signal to future maintainers. - `Paths.get(filename).isAbsolute()` covers Windows absolute paths (`C:\...`) even though the production environment is Linux — good belt-and-suspenders.
Author
Owner

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: Approved

10 tests for a 12-line guard method is proportionate and well-structured. The tests are permanent, descriptive, and cover the right behaviors.

Blocker assessment

None.

Test quality review

Naming — excellent
All 10 test names follow the methodName_returnsX_whenY convention and read as specifications:

  • isValidImportFilename_returnsFalse_whenFilenameContainsForwardSlash
  • processRows_skipsRowAndContinues_whenFilenameIsPathTraversal

Layer placement — correct

  • 9 unit tests call the private method via ReflectionTestUtils.invokeMethod — appropriate, these are fast, isolated, and exercise the guard logic directly.
  • 1 integration test calls processRows — correct layer for a skip-and-continue behaviour that involves mocked collaborators (documentService, documentService.save).

Coverage of the guard method

Input class Covered?
null
blank string
forward slash
backslash
.. substring
.. as full value
absolute path
null byte
plain basename (happy path)

Gaps (suggestions, not blockers)

  1. No test for .pdf extension appended to .. input — the processRows code converts .. (no dot in it? wait: .. does contain ., so the ternary takes the index branch, filename = ..). Actually .. contains . so filename = ".." — the contains("..") check would catch this. Covered implicitly by isValidImportFilename_returnsFalse_whenFilenameIsDotDot. No gap here.

  2. No negative test for processRows — that a valid row is NOT skipped when the filename is clean — the integration test asserts processed == 1 for "legitimate.pdf" which implicitly verifies this. Covered.

  3. No test verifying the skip reason string value — the test asserts:

    .extracting(MassImportService.SkippedFile::reason)
    .containsExactly("INVALID_FILENAME_PATH_TRAVERSAL");
    

    This directly asserts the reason string.

  4. Missing: isValidImportFilename_returnsTrue_whenFilenameHasSpaces — filenames like "Brief an Oma.pdf" are realistic for a German family archive. The current guard would return true for these (no forbidden chars), so they would pass — but there's no positive test confirming this. Low priority but worth adding.

  5. ReflectionTestUtils.invokeMethod on a private method — this is an acceptable pragmatic choice given the method is deliberately private (it's an internal security guard, not a service API). The security regression comment makes the intent clear.

Integration test setup quality

The processRows_skipsRowAndContinues_whenFilenameIsPathTraversal test correctly:

  • Provides a header row + two data rows (one bad, one good)
  • Mocks only what's needed (documentService.findByOriginalFilename, documentService.save)
  • Asserts both the processed count and the skippedFiles reason

This is clean AAA (Arrange-Act-Assert) with no unnecessary setup.

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ✅ Approved** 10 tests for a 12-line guard method is proportionate and well-structured. The tests are permanent, descriptive, and cover the right behaviors. ### Blocker assessment None. ### Test quality review **Naming — excellent** All 10 test names follow the `methodName_returnsX_whenY` convention and read as specifications: - `isValidImportFilename_returnsFalse_whenFilenameContainsForwardSlash` ✅ - `processRows_skipsRowAndContinues_whenFilenameIsPathTraversal` ✅ **Layer placement — correct** - 9 unit tests call the private method via `ReflectionTestUtils.invokeMethod` — appropriate, these are fast, isolated, and exercise the guard logic directly. - 1 integration test calls `processRows` — correct layer for a skip-and-continue behaviour that involves mocked collaborators (`documentService`, `documentService.save`). **Coverage of the guard method** | Input class | Covered? | |---|---| | `null` | ✅ | | blank string | ✅ | | forward slash | ✅ | | backslash | ✅ | | `..` substring | ✅ | | `..` as full value | ✅ | | absolute path | ✅ | | null byte | ✅ | | plain basename (happy path) | ✅ | **Gaps (suggestions, not blockers)** 1. **No test for `.pdf` extension appended to `..` input** — the processRows code converts `..` (no dot in it? wait: `..` does contain `.`, so the ternary takes the `index` branch, filename = `..`). Actually `..` contains `.` so `filename = ".."` — the `contains("..")` check would catch this. Covered implicitly by `isValidImportFilename_returnsFalse_whenFilenameIsDotDot`. ✅ No gap here. 2. **No negative test for `processRows` — that a valid row is NOT skipped when the filename is clean** — the integration test asserts `processed == 1` for `"legitimate.pdf"` which implicitly verifies this. ✅ Covered. 3. **No test verifying the skip reason string value** — the test asserts: ```java .extracting(MassImportService.SkippedFile::reason) .containsExactly("INVALID_FILENAME_PATH_TRAVERSAL"); ``` This directly asserts the reason string. ✅ 4. **Missing: `isValidImportFilename_returnsTrue_whenFilenameHasSpaces`** — filenames like `"Brief an Oma.pdf"` are realistic for a German family archive. The current guard would return `true` for these (no forbidden chars), so they would pass — but there's no positive test confirming this. Low priority but worth adding. 5. **`ReflectionTestUtils.invokeMethod` on a private method** — this is an acceptable pragmatic choice given the method is deliberately private (it's an internal security guard, not a service API). The security regression comment makes the intent clear. ### Integration test setup quality The `processRows_skipsRowAndContinues_whenFilenameIsPathTraversal` test correctly: - Provides a header row + two data rows (one bad, one good) - Mocks only what's needed (`documentService.findByOriginalFilename`, `documentService.save`) - Asserts both the `processed` count and the `skippedFiles` reason This is clean AAA (Arrange-Act-Assert) with no unnecessary setup.
Author
Owner

🚀 Tobias Wendt — DevOps & Platform Engineer

Verdict: Approved

This is a pure Java/test change — no infrastructure, CI pipeline, Docker, or deployment files are touched. Nothing for me to flag from an infrastructure standpoint.

Infrastructure impact assessment

Area Impact
Docker Compose None
CI pipeline None — existing MassImportServiceTest class is already in the test run
Environment variables / secrets None
Database / Flyway migrations None
Image tags / Renovate None

Notes

  • The test count increases from (implied) 48 to 58 tests in MassImportServiceTest. At <10s target for the unit test layer, adding 10 Mockito-based unit tests has negligible CI time impact.
  • The fix prevents path traversal in file lookup, which could have had server-side filesystem consequences in production. Good catch before it reaches the VPS.
  • No new dependencies are introduced — Paths, ReflectionTestUtils are already on the classpath.

LGTM from a platform perspective.

## 🚀 Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** This is a pure Java/test change — no infrastructure, CI pipeline, Docker, or deployment files are touched. Nothing for me to flag from an infrastructure standpoint. ### Infrastructure impact assessment | Area | Impact | |---|---| | Docker Compose | None | | CI pipeline | None — existing `MassImportServiceTest` class is already in the test run | | Environment variables / secrets | None | | Database / Flyway migrations | None | | Image tags / Renovate | None | ### Notes - The test count increases from (implied) 48 to 58 tests in `MassImportServiceTest`. At `<10s` target for the unit test layer, adding 10 Mockito-based unit tests has negligible CI time impact. - The fix prevents path traversal in file lookup, which could have had server-side filesystem consequences in production. Good catch before it reaches the VPS. - No new dependencies are introduced — `Paths`, `ReflectionTestUtils` are already on the classpath. LGTM from a platform perspective.
Author
Owner

📋 Elicit — Requirements Engineer

Verdict: Approved

The PR correctly closes issue #530. Requirements traceability is clean and the acceptance criteria in the PR description are testable. No frontend changes are introduced, so no UX or i18n requirements apply.

Requirements coverage check

FR: Reject path-traversal filenames during mass import

The PR description states three behaviors:

  1. Filenames containing /, \\, .., ., null bytes, or absolute paths are rejected — implemented and unit-tested
  2. Rejected rows produce a SkippedFile("INVALID_FILENAME_PATH_TRAVERSAL") entry — asserted in integration test
  3. Remaining rows continue to be processed (skip-and-continue, not abort) — asserted in integration test (processed == 1 for the valid row)

Acceptance criteria in PR description (test plan)

  • MassImportServiceTest — 58 tests, 0 failures — verifiable
  • Backend build clean — verifiable
  • Manual smoke test: import spreadsheet with ../evil filename cell — described

Observations

"INVALID_FILENAME_PATH_TRAVERSAL" skip reason is not surfaced to the user in the UI

The skip-reason string ends up in the import result's skippedFiles list that is returned to the frontend. I don't know from this PR alone whether the frontend renders skip reasons as raw strings or maps them to user-friendly messages. If the import result UI currently shows raw skip reason strings, the new value "INVALID_FILENAME_PATH_TRAVERSAL" will be shown as-is to the operator — which is informative but not localized. This is a pre-existing pattern (the same applies to "INVALID_PDF_SIGNATURE", "ALREADY_EXISTS" etc.), so it's not a regression introduced here.

Recommendation (for a follow-up issue, not this PR): Add i18n mappings for all skip reason codes so the import result page can display localized messages to the operator. This would apply to all existing skip reasons, not just the new one.

Non-functional requirements

  • Security: addressed (this is the purpose of the PR)
  • Performance: no impact — the guard runs in O(n) string operations before a filesystem lookup
  • Accessibility: N/A (backend-only)
  • i18n: N/A for backend; see note above for a future frontend consideration
## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** The PR correctly closes issue #530. Requirements traceability is clean and the acceptance criteria in the PR description are testable. No frontend changes are introduced, so no UX or i18n requirements apply. ### Requirements coverage check **FR: Reject path-traversal filenames during mass import** The PR description states three behaviors: 1. Filenames containing `/`, `\\`, `..`, `.`, null bytes, or absolute paths are rejected — ✅ implemented and unit-tested 2. Rejected rows produce a `SkippedFile("INVALID_FILENAME_PATH_TRAVERSAL")` entry — ✅ asserted in integration test 3. Remaining rows continue to be processed (skip-and-continue, not abort) — ✅ asserted in integration test (`processed == 1` for the valid row) **Acceptance criteria in PR description (test plan)** - `MassImportServiceTest` — 58 tests, 0 failures — ✅ verifiable - Backend build clean — ✅ verifiable - Manual smoke test: import spreadsheet with `../evil` filename cell — ✅ described ### Observations **`"INVALID_FILENAME_PATH_TRAVERSAL"` skip reason is not surfaced to the user in the UI** The skip-reason string ends up in the import result's `skippedFiles` list that is returned to the frontend. I don't know from this PR alone whether the frontend renders skip reasons as raw strings or maps them to user-friendly messages. If the import result UI currently shows raw skip reason strings, the new value `"INVALID_FILENAME_PATH_TRAVERSAL"` will be shown as-is to the operator — which is informative but not localized. This is a pre-existing pattern (the same applies to `"INVALID_PDF_SIGNATURE"`, `"ALREADY_EXISTS"` etc.), so it's not a regression introduced here. **Recommendation (for a follow-up issue, not this PR):** Add i18n mappings for all skip reason codes so the import result page can display localized messages to the operator. This would apply to all existing skip reasons, not just the new one. **Non-functional requirements** - Security: ✅ addressed (this is the purpose of the PR) - Performance: no impact — the guard runs in O(n) string operations before a filesystem lookup - Accessibility: N/A (backend-only) - i18n: N/A for backend; see note above for a future frontend consideration
Author
Owner

🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist

Verdict: Approved

This is a backend-only security fix with no frontend, UI, or accessibility surface. There is nothing for me to review from a design, branding, or WCAG perspective.

Scope assessment

Area Changed?
Svelte components No
SvelteKit routes No
Tailwind classes / design tokens No
i18n message keys (Paraglide) No
User-facing error messages No
Touch targets / focus states No

One forward-looking UX observation (not a blocker)

The SkippedFile entries produced by the import are presumably shown to the operator in the import results UI. The new skip reason "INVALID_FILENAME_PATH_TRAVERSAL" will appear as a raw string. For operators who are non-technical family members, this message is opaque. A follow-up to add a localized message for this code (and for the other existing raw skip reasons) would improve the operator experience — but that is scope for a separate issue, not this PR.

LGTM from design and accessibility.

## 🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist **Verdict: ✅ Approved** This is a backend-only security fix with no frontend, UI, or accessibility surface. There is nothing for me to review from a design, branding, or WCAG perspective. ### Scope assessment | Area | Changed? | |---|---| | Svelte components | No | | SvelteKit routes | No | | Tailwind classes / design tokens | No | | i18n message keys (Paraglide) | No | | User-facing error messages | No | | Touch targets / focus states | No | ### One forward-looking UX observation (not a blocker) The `SkippedFile` entries produced by the import are presumably shown to the operator in the import results UI. The new skip reason `"INVALID_FILENAME_PATH_TRAVERSAL"` will appear as a raw string. For operators who are non-technical family members, this message is opaque. A follow-up to add a localized message for this code (and for the other existing raw skip reasons) would improve the operator experience — but that is scope for a separate issue, not this PR. LGTM from design and accessibility.
marcel added 4 commits 2026-05-21 10:17:42 +02:00
Adds checks for U+2215 DIVISION SLASH (∕), U+FF0F FULLWIDTH SOLIDUS (/),
and U+29F5 REVERSE SOLIDUS OPERATOR (⧵) — all of which bypass the existing
ASCII separator checks on Linux path resolution. Adds a clarifying comment on
the Paths.get().isAbsolute() call explaining its InvalidPathException safety
boundary. Adds 3 regression tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents that .hidden.pdf and "Brief an Oma.pdf" correctly pass the
isValidImportFilename guard — both are valid basenames common in the archive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces MassImportService.SkipReason with all five values —
INVALID_FILENAME_PATH_TRAVERSAL, INVALID_PDF_SIGNATURE, FILE_READ_ERROR,
ALREADY_EXISTS, S3_UPLOAD_FAILED — making the full set of reasons greppable
and type-safe. SkippedFile.reason changes from String to SkipReason;
importSingleDocument return type updated accordingly. JSON serialisation
is unchanged (Jackson serialises enums by name). All tests updated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
security(import): add canonical path escape guard in findFileRecursive
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m27s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m41s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
CI / Unit & Component Tests (push) Successful in 3m26s
CI / OCR Service Tests (push) Successful in 20s
CI / Backend Unit Tests (push) Successful in 3m24s
CI / fail2ban Regex (push) Successful in 41s
CI / Semgrep Security Scan (push) Successful in 18s
CI / Compose Bucket Idempotency (push) Successful in 1m0s
eca4f1f0e8
A symlink placed inside importDir pointing to a file outside it would pass
isValidImportFilename (no forbidden chars in the symlink name) and be found
by Files.walk. Now checks candidate.getCanonicalPath() against
baseDir.getCanonicalPath() — if the resolved path escapes importDir,
throws DomainException.internal and aborts the import. Adds regression
test using @TempDir + Files.createSymbolicLink.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

Review concerns addressed

All open reviewer concerns have been resolved. Four commits on top of the original two:


429ff32e — security(import): block Unicode lookalike path separators

Addresses @Nora concern 1 (Unicode homoglyph bypass) and @Felix suggestion 1 (Paths.get safety comment).

  • Added checks for U+2215 DIVISION SLASH (), U+FF0F FULLWIDTH SOLIDUS (), U+29F5 REVERSE SOLIDUS OPERATOR () — all pass the existing ASCII separator checks on Linux but would resolve as path separators on some systems
  • Added clarifying comment on Paths.get(filename).isAbsolute() explaining its InvalidPathException safety boundary on Linux
  • 3 new regression tests: isValidImportFilename_returnsFalse_whenFilenameContainsUnicodeDivisionSlash, _FullwidthSlash, _UnicodeReverseSolidus

890f014b — test(import): add regression tests for leading-dot and spaced filenames

Addresses @Felix suggestion 2 (.hidden coverage) and @Sara suggestion (spaced filename coverage).

  • isValidImportFilename_returnsTrue_whenFilenameHasLeadingDot.hidden.pdf correctly passes the guard
  • isValidImportFilename_returnsTrue_whenFilenameHasSpacesBrief an Oma.pdf correctly passes the guard

Both are documentation tests — existing behavior was already correct, now it's covered.


4e33f52a — refactor(import): extract SkipReason enum to replace raw skip-reason strings

Addresses @Markus suggestion (typed skip reasons instead of raw string literals).

  • New MassImportService.SkipReason enum with all five values: INVALID_FILENAME_PATH_TRAVERSAL, INVALID_PDF_SIGNATURE, FILE_READ_ERROR, ALREADY_EXISTS, S3_UPLOAD_FAILED
  • SkippedFile.reason changed from String to SkipReason
  • importSingleDocument return type changed from Optional<String> to Optional<SkipReason>
  • JSON serialisation unchanged (Jackson serialises enums by .name())
  • All test assertions updated to use enum constants

eca4f1f0 — security(import): add canonical path escape guard in findFileRecursive

Addresses @Nora concern 2 (symlink escape via findFileRecursive).

  • After finding a candidate file via Files.walk, checks that candidate.getCanonicalPath() starts with baseDir.getCanonicalPath() — catches the case where a symlink inside importDir resolves to a file outside it
  • On escape detection: throws DomainException.internal(INTERNAL_ERROR, ...) — aborts the import with IMPORT_FAILED_INTERNAL state rather than silently skipping
  • Regression test: findFileRecursive_throwsDomainException_whenSymlinkEscapesImportDir using @TempDir + Files.createSymbolicLink

Final test count: 64 tests, 0 failures. Backend build clean.

## Review concerns addressed All open reviewer concerns have been resolved. Four commits on top of the original two: --- ### `429ff32e` — security(import): block Unicode lookalike path separators Addresses **@Nora concern 1** (Unicode homoglyph bypass) and **@Felix suggestion 1** (Paths.get safety comment). - Added checks for U+2215 DIVISION SLASH (`∕`), U+FF0F FULLWIDTH SOLIDUS (`/`), U+29F5 REVERSE SOLIDUS OPERATOR (`⧵`) — all pass the existing ASCII separator checks on Linux but would resolve as path separators on some systems - Added clarifying comment on `Paths.get(filename).isAbsolute()` explaining its `InvalidPathException` safety boundary on Linux - 3 new regression tests: `isValidImportFilename_returnsFalse_whenFilenameContainsUnicodeDivisionSlash`, `_FullwidthSlash`, `_UnicodeReverseSolidus` --- ### `890f014b` — test(import): add regression tests for leading-dot and spaced filenames Addresses **@Felix suggestion 2** (`.hidden` coverage) and **@Sara suggestion** (spaced filename coverage). - `isValidImportFilename_returnsTrue_whenFilenameHasLeadingDot` — `.hidden.pdf` correctly passes the guard - `isValidImportFilename_returnsTrue_whenFilenameHasSpaces` — `Brief an Oma.pdf` correctly passes the guard Both are documentation tests — existing behavior was already correct, now it's covered. --- ### `4e33f52a` — refactor(import): extract SkipReason enum to replace raw skip-reason strings Addresses **@Markus suggestion** (typed skip reasons instead of raw string literals). - New `MassImportService.SkipReason` enum with all five values: `INVALID_FILENAME_PATH_TRAVERSAL`, `INVALID_PDF_SIGNATURE`, `FILE_READ_ERROR`, `ALREADY_EXISTS`, `S3_UPLOAD_FAILED` - `SkippedFile.reason` changed from `String` to `SkipReason` - `importSingleDocument` return type changed from `Optional<String>` to `Optional<SkipReason>` - JSON serialisation unchanged (Jackson serialises enums by `.name()`) - All test assertions updated to use enum constants --- ### `eca4f1f0` — security(import): add canonical path escape guard in findFileRecursive Addresses **@Nora concern 2** (symlink escape via `findFileRecursive`). - After finding a candidate file via `Files.walk`, checks that `candidate.getCanonicalPath()` starts with `baseDir.getCanonicalPath()` — catches the case where a symlink inside `importDir` resolves to a file outside it - On escape detection: throws `DomainException.internal(INTERNAL_ERROR, ...)` — aborts the import with `IMPORT_FAILED_INTERNAL` state rather than silently skipping - Regression test: `findFileRecursive_throwsDomainException_whenSymlinkEscapesImportDir` using `@TempDir` + `Files.createSymbolicLink` --- **Final test count:** 64 tests, 0 failures. Backend build clean.
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: Approved

This is a clean, disciplined PR. The concerns from the previous review round have all been addressed. Here's what I checked:

What's done well

Guard-clause structure in isValidImportFilename — textbook. Each check exits early, the happy path is the last line at the lowest indent. The Unicode lookalike additions (U+2215, U+FF0F, U+29F5) slot in naturally without breaking the linear flow.

findFileRecursive refactor — splitting the stream pipeline to check the canonical path before returning is a meaningful improvement. The method is now 14 lines but each line does one thing. I'd accept it as-is.

SkipReason enum — the right move. Optional<SkipReason> on importSingleDocument is a much better signature than Optional<String>. The compiler now catches any new skip reason that isn't in the set.

Test namingisValidImportFilename_returnsFalse_whenFilenameContainsUnicodeDivisionSlash is precise and reads as a spec. No improvement needed.

Symlink escape test — using two @TempDir parameters and Files.createSymbolicLink is the right approach for this. It's a real filesystem test, not a mock, which is exactly what a security regression test should be.

Suggestions (no blockers)

isValidImportFilename — 9 guard clauses is still a denylist: nothing structurally wrong, but worth noting that if a third Unicode slash variant surfaces (say U+2044 FRACTION SLASH or U+2F SOLIDUS variant), it will require another PR. The previous review noted the allowlist alternative ([\\w\\s.\\-()\\p{L}\\p{N}]+). Not requesting a change here — the current approach is explicitly acceptable per the security review — just flagging for the project's awareness.

// Paths.get() is safe here on Linux... comment — the comment is accurate and helpful. The trailing line "but those are not reachable in production" is a slight assumption. "are not reachable in this code path after the checks above" would be more precise. Minor wording nit, not a blocker.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** This is a clean, disciplined PR. The concerns from the previous review round have all been addressed. Here's what I checked: ### What's done well **Guard-clause structure in `isValidImportFilename`** — textbook. Each check exits early, the happy path is the last line at the lowest indent. The Unicode lookalike additions (U+2215, U+FF0F, U+29F5) slot in naturally without breaking the linear flow. **`findFileRecursive` refactor** — splitting the stream pipeline to check the canonical path before returning is a meaningful improvement. The method is now 14 lines but each line does one thing. I'd accept it as-is. **`SkipReason` enum** — the right move. `Optional<SkipReason>` on `importSingleDocument` is a much better signature than `Optional<String>`. The compiler now catches any new skip reason that isn't in the set. **Test naming** — `isValidImportFilename_returnsFalse_whenFilenameContainsUnicodeDivisionSlash` is precise and reads as a spec. No improvement needed. **Symlink escape test** — using two `@TempDir` parameters and `Files.createSymbolicLink` is the right approach for this. It's a real filesystem test, not a mock, which is exactly what a security regression test should be. ### Suggestions (no blockers) **`isValidImportFilename` — 9 guard clauses is still a denylist**: nothing structurally wrong, but worth noting that if a third Unicode slash variant surfaces (say U+2044 FRACTION SLASH or U+2F SOLIDUS variant), it will require another PR. The previous review noted the allowlist alternative (`[\\w\\s.\\-()\\p{L}\\p{N}]+`). Not requesting a change here — the current approach is explicitly acceptable per the security review — just flagging for the project's awareness. **`// Paths.get() is safe here on Linux...` comment** — the comment is accurate and helpful. The trailing line "but those are not reachable in production" is a slight assumption. "are not reachable in this code path after the checks above" would be more precise. Minor wording nit, not a blocker.
Author
Owner

🏛️ Markus Keller — Application Architect

Verdict: Approved

Architecturally clean. Checking against the doc-update matrix:

Doc-update matrix assessment

Trigger Required update Present?
New Flyway migration DB diagrams No migration — N/A
New backend package / domain module CLAUDE.md + C4 diagram No new package — N/A
New controller or service C4 diagram Private method + enum within existing class — N/A
New SvelteKit route CLAUDE.md + C4 frontend No frontend changes — N/A
New ErrorCode or Permission value CLAUDE.md + ARCHITECTURE.md SkipReason is not an ErrorCode — it is a local enum scoped to MassImportService, not part of the domain-wide error contract — N/A
New domain concept/term GLOSSARY.md Nothing fundamentally new — N/A
Architectural decision with lasting consequences ADR Not warranted for this scope

No documentation blockers.

What's done well

The SkipReason enum lives inside MassImportService as a nested public type. This is appropriate for now — it is only meaningful in the context of mass import, and exporting it to a shared package would leak internal domain details. Good call keeping it contained.

findFileRecursive now uses new File(importDir) + getCanonicalPath() for the base dir rather than Paths.get(importDir) inline. Slightly more verbose but the canonical path check necessitates the File handle, and the two calls (baseDir.getCanonicalPath() + candidate.getCanonicalPath()) are clearly related.

DomainException.internal(ErrorCode.INTERNAL_ERROR, ...) on path escape is the correct response — abort the import with a structured error rather than silently skip. The runImportAsync catch-all handler maps this to IMPORT_FAILED_INTERNAL state, which is the right outcome for a suspected configuration attack.

Minor observations

SkipReason is public — this is intentional since SkippedFile is also public and returned in the API response. Jackson will serialize by .name(), which matches the previous string values. No API contract change.

The protected visibility on importSingleDocument (for Mockito spy access) now returns Optional<SkipReason> — callers that were directly using the String return have been updated in the tests. Clean migration.

## 🏛️ Markus Keller — Application Architect **Verdict: ✅ Approved** Architecturally clean. Checking against the doc-update matrix: ### Doc-update matrix assessment | Trigger | Required update | Present? | |---|---|---| | New Flyway migration | DB diagrams | No migration — N/A | | New backend package / domain module | CLAUDE.md + C4 diagram | No new package — N/A | | New controller or service | C4 diagram | Private method + enum within existing class — N/A | | New SvelteKit route | CLAUDE.md + C4 frontend | No frontend changes — N/A | | New `ErrorCode` or `Permission` value | CLAUDE.md + ARCHITECTURE.md | `SkipReason` is **not** an `ErrorCode` — it is a local enum scoped to `MassImportService`, not part of the domain-wide error contract — N/A | | New domain concept/term | GLOSSARY.md | Nothing fundamentally new — N/A | | Architectural decision with lasting consequences | ADR | Not warranted for this scope | **No documentation blockers.** ### What's done well The `SkipReason` enum lives inside `MassImportService` as a nested public type. This is appropriate for now — it is only meaningful in the context of mass import, and exporting it to a shared package would leak internal domain details. Good call keeping it contained. `findFileRecursive` now uses `new File(importDir)` + `getCanonicalPath()` for the base dir rather than `Paths.get(importDir)` inline. Slightly more verbose but the canonical path check necessitates the `File` handle, and the two calls (`baseDir.getCanonicalPath()` + `candidate.getCanonicalPath()`) are clearly related. `DomainException.internal(ErrorCode.INTERNAL_ERROR, ...)` on path escape is the correct response — abort the import with a structured error rather than silently skip. The `runImportAsync` catch-all handler maps this to `IMPORT_FAILED_INTERNAL` state, which is the right outcome for a suspected configuration attack. ### Minor observations `SkipReason` is `public` — this is intentional since `SkippedFile` is also `public` and returned in the API response. Jackson will serialize by `.name()`, which matches the previous string values. No API contract change. ✅ The `protected` visibility on `importSingleDocument` (for Mockito spy access) now returns `Optional<SkipReason>` — callers that were directly using the `String` return have been updated in the tests. Clean migration.
Author
Owner

🚀 Tobias Wendt — DevOps & Platform Engineer

Verdict: Approved

Pure Java/test change. Nothing for me to flag from an infrastructure or platform standpoint.

Infrastructure impact assessment

Area Impact
Docker Compose None
CI pipeline None — MassImportServiceTest already runs in the test suite; 64 tests instead of the original count, all Mockito-based unit tests
Environment variables / secrets None
Database / Flyway migrations None
Image tags / Renovate None
Import directory configuration (app.import.dir) No change to the config key or default — the findFileRecursive refactor now uses new File(importDir) instead of Paths.get(importDir) but the behavior for normal files is identical

Notes

The symlink escape test (findFileRecursive_throwsDomainException_whenSymlinkEscapesImportDir) uses @TempDir + Files.createSymbolicLink. Symlink creation requires appropriate OS permissions and Linux semantics. This is fine for the CI runner (Linux) but worth noting that it would fail on a Windows dev environment if anyone ever runs this there. Not an issue for this project's stack.

Adding 6 new Mockito-based unit tests has negligible CI time impact — sub-second additions to a suite that already runs in ~6 seconds. No concern.

LGTM from a platform perspective.

## 🚀 Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** Pure Java/test change. Nothing for me to flag from an infrastructure or platform standpoint. ### Infrastructure impact assessment | Area | Impact | |---|---| | Docker Compose | None | | CI pipeline | None — `MassImportServiceTest` already runs in the test suite; 64 tests instead of the original count, all Mockito-based unit tests | | Environment variables / secrets | None | | Database / Flyway migrations | None | | Image tags / Renovate | None | | Import directory configuration (`app.import.dir`) | No change to the config key or default — the `findFileRecursive` refactor now uses `new File(importDir)` instead of `Paths.get(importDir)` but the behavior for normal files is identical | ### Notes The symlink escape test (`findFileRecursive_throwsDomainException_whenSymlinkEscapesImportDir`) uses `@TempDir` + `Files.createSymbolicLink`. Symlink creation requires appropriate OS permissions and Linux semantics. This is fine for the CI runner (Linux) but worth noting that it would fail on a Windows dev environment if anyone ever runs this there. Not an issue for this project's stack. Adding 6 new Mockito-based unit tests has negligible CI time impact — sub-second additions to a suite that already runs in ~6 seconds. No concern. LGTM from a platform perspective.
Author
Owner

📋 Elicit — Requirements Engineer

Verdict: Approved

Requirements coverage is complete. Checking against issue #530 and the PR description:

Requirements traceability

FR: Reject path-traversal filenames during mass import (issue #530)

Requirement Implementation Test coverage
Reject filenames with / contains("/") check whenFilenameContainsForwardSlash
Reject filenames with \ contains("\\") check whenFilenameContainsBackslash
Reject filenames with .. contains("..") check whenFilenameContainsDotDot, whenFilenameIsDotDot
Reject absolute paths isAbsolute() check whenFilenameIsAbsolutePath
Reject null bytes contains("\0") check whenFilenameContainsNullByte
Skip-and-continue (not abort) continue in processRows processRows_skipsRowAndContinues_whenFilenameIsPathTraversal
Unicode lookalike separators U+2215, U+FF0F, U+29F5 checks three new Unicode tests
Symlink escape prevention Canonical path check in findFileRecursive findFileRecursive_throwsDomainException_whenSymlinkEscapesImportDir

Acceptance criteria in PR description (test plan):

  • 64 tests, 0 failures (PR description says 58 — this is stale from the original PR; actual count is 64 after review fixes)
  • Backend build clean
  • ☐ Manual smoke test (not verifiable in this review; described in test plan)

One forward-looking observation

The PR description's test plan still says "58 tests" — it was written before the review-cycle additions. This is cosmetic (the PR description is not a contract), but updating it to "64 tests" would keep the test plan accurate for anyone reading this PR's history.

SkipReason enum — requirements perspective

The typed enum approach addresses the Markus concern about raw string tech-debt. From a requirements standpoint, the reason field in SkippedFile will still appear as a string in the API response (e.g. "ALREADY_EXISTS") — the JSON contract is unchanged. The frontend types.ts still types reason: string, which remains valid. A future issue to map these codes to localized operator messages remains desirable (noted in previous review round) but is correctly out of scope here.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** Requirements coverage is complete. Checking against issue #530 and the PR description: ### Requirements traceability **FR: Reject path-traversal filenames during mass import** (issue #530) | Requirement | Implementation | Test coverage | |---|---|---| | Reject filenames with `/` | `contains("/")` check | ✅ `whenFilenameContainsForwardSlash` | | Reject filenames with `\` | `contains("\\")` check | ✅ `whenFilenameContainsBackslash` | | Reject filenames with `..` | `contains("..")` check | ✅ `whenFilenameContainsDotDot`, `whenFilenameIsDotDot` | | Reject absolute paths | `isAbsolute()` check | ✅ `whenFilenameIsAbsolutePath` | | Reject null bytes | `contains("\0")` check | ✅ `whenFilenameContainsNullByte` | | Skip-and-continue (not abort) | `continue` in `processRows` | ✅ `processRows_skipsRowAndContinues_whenFilenameIsPathTraversal` | | Unicode lookalike separators | U+2215, U+FF0F, U+29F5 checks | ✅ three new Unicode tests | | Symlink escape prevention | Canonical path check in `findFileRecursive` | ✅ `findFileRecursive_throwsDomainException_whenSymlinkEscapesImportDir` | **Acceptance criteria in PR description (test plan):** - ✅ 64 tests, 0 failures (PR description says 58 — this is stale from the original PR; actual count is 64 after review fixes) - ✅ Backend build clean - ☐ Manual smoke test (not verifiable in this review; described in test plan) ### One forward-looking observation The PR description's test plan still says "58 tests" — it was written before the review-cycle additions. This is cosmetic (the PR description is not a contract), but updating it to "64 tests" would keep the test plan accurate for anyone reading this PR's history. ### `SkipReason` enum — requirements perspective The typed enum approach addresses the Markus concern about raw string tech-debt. From a requirements standpoint, the `reason` field in `SkippedFile` will still appear as a string in the API response (e.g. `"ALREADY_EXISTS"`) — the JSON contract is unchanged. The frontend `types.ts` still types `reason: string`, which remains valid. A future issue to map these codes to localized operator messages remains desirable (noted in previous review round) but is correctly out of scope here.
Author
Owner

🔒 Nora "NullX" Steiner — Application Security Engineer

Verdict: Approved

All concerns from the previous review round have been addressed. Final security assessment:

CWE-22 (Path Traversal) — threat model completeness

Guard coverage in isValidImportFilename:

Attack vector Check Status
../evil contains("..")
/etc/passwd isAbsolute()
etc/passwd contains("/")
..\evil contains("\\") + contains("..")
file\0.pdf (null byte) contains("\0")
Single dot . equals(".")
foo∕bar.pdf (U+2215 DIVISION SLASH) contains("∕")
foo/bar.pdf (U+FF0F FULLWIDTH SOLIDUS) contains("/")
foo⧵bar.pdf (U+29F5 REVERSE SOLIDUS OPERATOR) contains("⧵")
Symlink inside importDir pointing outside Canonical path check in findFileRecursive

Still a denylist: As noted in the previous round, U+2044 FRACTION SLASH () and U+29F8 BIG SOLIDUS () are not in the list. Neither character is a real path separator on any mainstream OS, so the practical risk is low. A positive allowlist ([\\w\\s.\\-()\p{L}\p{N}]+) would permanently close this class. Keeping this as a Low / informational — not a blocker.

findFileRecursive canonical path check

The implementation is correct:

if (!candidate.getCanonicalPath().startsWith(baseDirCanonical + File.separator)) {
    throw DomainException.internal(ErrorCode.INTERNAL_ERROR, "Path escape detected: " + candidate);
}

The File.separator appended to baseDirCanonical prevents a false bypass where importDir = /opt/import and a file at /opt/import-evil/secret.pdf would match without it. Good defensive implementation.

Files.walk without FOLLOW_LINKS enumerates symlinks as files (doesn't resolve them), so p.toFile() returns the symlink path. Then candidate.getCanonicalPath() resolves through the symlink — this is the correct place to catch the escape.

DomainException.internal on escape = correct fail-closed behavior. The import aborts with IMPORT_FAILED_INTERNAL rather than silently skipping a potential attacker-controlled symlink.

Logging

log.warn("Skipping import row {}: filename rejected — {}", i, filename) — SLF4J parameterized, no injection risk. The filename is user-controlled input logged here — safe with SLF4J's {} placeholders.

No new concerns. This PR is ready to merge from a security perspective.

## 🔒 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** All concerns from the previous review round have been addressed. Final security assessment: ### CWE-22 (Path Traversal) — threat model completeness **Guard coverage in `isValidImportFilename`:** | Attack vector | Check | Status | |---|---|---| | `../evil` | `contains("..")` | ✅ | | `/etc/passwd` | `isAbsolute()` | ✅ | | `etc/passwd` | `contains("/")` | ✅ | | `..\evil` | `contains("\\")` + `contains("..")` | ✅ | | `file\0.pdf` (null byte) | `contains("\0")` | ✅ | | Single dot `.` | `equals(".")` | ✅ | | `foo∕bar.pdf` (U+2215 DIVISION SLASH) | `contains("∕")` | ✅ | | `foo/bar.pdf` (U+FF0F FULLWIDTH SOLIDUS) | `contains("/")` | ✅ | | `foo⧵bar.pdf` (U+29F5 REVERSE SOLIDUS OPERATOR) | `contains("⧵")` | ✅ | | Symlink inside importDir pointing outside | Canonical path check in `findFileRecursive` | ✅ | **Still a denylist:** As noted in the previous round, U+2044 FRACTION SLASH (`⁄`) and U+29F8 BIG SOLIDUS (`⧸`) are not in the list. Neither character is a real path separator on any mainstream OS, so the practical risk is low. A positive allowlist (`[\\w\\s.\\-()\p{L}\p{N}]+`) would permanently close this class. Keeping this as a Low / informational — not a blocker. ### `findFileRecursive` canonical path check The implementation is correct: ```java if (!candidate.getCanonicalPath().startsWith(baseDirCanonical + File.separator)) { throw DomainException.internal(ErrorCode.INTERNAL_ERROR, "Path escape detected: " + candidate); } ``` The `File.separator` appended to `baseDirCanonical` prevents a false bypass where `importDir = /opt/import` and a file at `/opt/import-evil/secret.pdf` would match without it. Good defensive implementation. `Files.walk` without `FOLLOW_LINKS` enumerates symlinks as files (doesn't resolve them), so `p.toFile()` returns the symlink path. Then `candidate.getCanonicalPath()` resolves through the symlink — this is the correct place to catch the escape. ✅ **`DomainException.internal` on escape = correct fail-closed behavior.** The import aborts with `IMPORT_FAILED_INTERNAL` rather than silently skipping a potential attacker-controlled symlink. ✅ ### Logging `log.warn("Skipping import row {}: filename rejected — {}", i, filename)` — SLF4J parameterized, no injection risk. The filename is user-controlled input logged here — safe with SLF4J's `{}` placeholders. ✅ No new concerns. This PR is ready to merge from a security perspective.
Author
Owner

🧪 Sara Holt — QA Engineer & Test Strategist

Verdict: Approved

64 tests, all green. The test suite for this PR is in excellent shape. Full assessment:

Test pyramid placement

Test Layer Correct?
isValidImportFilename_* (12 tests) Unit — ReflectionTestUtils.invokeMethod on private method Right layer — fast, isolated, directly exercises the guard
processRows_skipsRowAndContinues_* Integration — mocked collaborators, real processRows logic Right layer — verifies the skip-and-continue behavior end-to-end within the service
findFileRecursive_throwsDomainException_* Integration — real filesystem via @TempDir + symlink Right layer — symlink behavior can only be tested with a real OS, not mocks

Naming quality

All new tests follow the methodName_returnsX_whenY convention consistently:

  • isValidImportFilename_returnsFalse_whenFilenameContainsUnicodeDivisionSlash
  • isValidImportFilename_returnsTrue_whenFilenameHasLeadingDot
  • isValidImportFilename_returnsTrue_whenFilenameHasSpaces
  • findFileRecursive_throwsDomainException_whenSymlinkEscapesImportDir

Coverage completeness

New positive tests for leading-dot and spaced filenames are documentation tests — the behavior was already correct, now it's proven. This is exactly how regressions get prevented over time.

Unicode attack payload tests use realistic inputs (foo∕bar.pdf, foo/bar.pdf, foo⧵bar.pdf) — these are attack payloads, not synthetic safe inputs.

SkipReason enum migration — all 6 existing test assertions updated from strings to enum constants. extracting(SkippedFile::reason).containsExactly(SkipReason.S3_UPLOAD_FAILED) is now type-safe. If a new enum value is added and a test accidentally uses the wrong one, the compiler catches it.

One minor observation

The processRows_skipsRowAndContinues_whenFilenameIsPathTraversal test has inline comments on the rows:

minimalCells("../evil"),       // row 1: path traversal — should be skipped
minimalCells("legitimate.pdf") // row 2: valid — should be processed

These are good — they make the test intent immediately clear to a future reader. No change needed.

CI time impact

6 new tests beyond the previous 58: ~0.1 second additional test time. Negligible.

## 🧪 Sara Holt — QA Engineer & Test Strategist **Verdict: ✅ Approved** 64 tests, all green. The test suite for this PR is in excellent shape. Full assessment: ### Test pyramid placement | Test | Layer | Correct? | |---|---|---| | `isValidImportFilename_*` (12 tests) | Unit — `ReflectionTestUtils.invokeMethod` on private method | ✅ Right layer — fast, isolated, directly exercises the guard | | `processRows_skipsRowAndContinues_*` | Integration — mocked collaborators, real `processRows` logic | ✅ Right layer — verifies the skip-and-continue behavior end-to-end within the service | | `findFileRecursive_throwsDomainException_*` | Integration — real filesystem via `@TempDir` + symlink | ✅ Right layer — symlink behavior can only be tested with a real OS, not mocks | ### Naming quality All new tests follow the `methodName_returnsX_whenY` convention consistently: - `isValidImportFilename_returnsFalse_whenFilenameContainsUnicodeDivisionSlash` ✅ - `isValidImportFilename_returnsTrue_whenFilenameHasLeadingDot` ✅ - `isValidImportFilename_returnsTrue_whenFilenameHasSpaces` ✅ - `findFileRecursive_throwsDomainException_whenSymlinkEscapesImportDir` ✅ ### Coverage completeness **New positive tests for leading-dot and spaced filenames** are documentation tests — the behavior was already correct, now it's proven. This is exactly how regressions get prevented over time. ✅ **Unicode attack payload tests** use realistic inputs (`foo∕bar.pdf`, `foo/bar.pdf`, `foo⧵bar.pdf`) — these are attack payloads, not synthetic safe inputs. ✅ **`SkipReason` enum migration** — all 6 existing test assertions updated from strings to enum constants. `extracting(SkippedFile::reason).containsExactly(SkipReason.S3_UPLOAD_FAILED)` is now type-safe. If a new enum value is added and a test accidentally uses the wrong one, the compiler catches it. ✅ ### One minor observation The `processRows_skipsRowAndContinues_whenFilenameIsPathTraversal` test has inline comments on the rows: ```java minimalCells("../evil"), // row 1: path traversal — should be skipped minimalCells("legitimate.pdf") // row 2: valid — should be processed ``` These are good — they make the test intent immediately clear to a future reader. No change needed. ### CI time impact 6 new tests beyond the previous 58: ~0.1 second additional test time. Negligible.
Author
Owner

🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist

Verdict: Approved

Backend-only security fix with no frontend, UI, or accessibility surface. There is nothing for me to review from a design, branding, or WCAG perspective.

Scope check

Area Changed?
Svelte components No
SvelteKit routes No
Tailwind classes / design tokens No
i18n message keys (Paraglide) No
User-facing error messages No
Touch targets / focus states No
Operator-facing UI (import results) Indirectly: SkipReason.INVALID_FILENAME_PATH_TRAVERSAL still appears as a raw string in the operator UI

One standing UX observation (not a blocker, same as previous round)

SkippedFile.reason is now typed as SkipReason in Java, but Jackson serializes it to the same raw string in the JSON response ("INVALID_FILENAME_PATH_TRAVERSAL"). The frontend types.ts types it as string. The import results UI presumably renders these strings directly to the operator.

For a non-technical family member running the import, INVALID_FILENAME_PATH_TRAVERSAL is opaque. A follow-up to add localized labels for all skip reasons (not just the new one) would meaningfully improve the operator experience — but this is scope for a separate issue. It has been noted in the previous review round and remains valid.

LGTM from design and accessibility.

## 🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist **Verdict: ✅ Approved** Backend-only security fix with no frontend, UI, or accessibility surface. There is nothing for me to review from a design, branding, or WCAG perspective. ### Scope check | Area | Changed? | |---|---| | Svelte components | No | | SvelteKit routes | No | | Tailwind classes / design tokens | No | | i18n message keys (Paraglide) | No | | User-facing error messages | No | | Touch targets / focus states | No | | Operator-facing UI (import results) | Indirectly: `SkipReason.INVALID_FILENAME_PATH_TRAVERSAL` still appears as a raw string in the operator UI | ### One standing UX observation (not a blocker, same as previous round) `SkippedFile.reason` is now typed as `SkipReason` in Java, but Jackson serializes it to the same raw string in the JSON response (`"INVALID_FILENAME_PATH_TRAVERSAL"`). The frontend `types.ts` types it as `string`. The import results UI presumably renders these strings directly to the operator. For a non-technical family member running the import, `INVALID_FILENAME_PATH_TRAVERSAL` is opaque. A follow-up to add localized labels for all skip reasons (not just the new one) would meaningfully improve the operator experience — but this is scope for a separate issue. It has been noted in the previous review round and remains valid. LGTM from design and accessibility.
marcel merged commit eca4f1f0e8 into main 2026-05-21 10:30:57 +02:00
marcel deleted branch feat/issue-530-reject-path-traversal-filenames 2026-05-21 10:30:58 +02:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#650