security(import): reject path-traversal filenames in MassImportService.processRows #650
Reference in New Issue
Block a user
Delete Branch "feat/issue-530-reject-path-traversal-filenames"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
isValidImportFilename(String)private guard toMassImportService, called after the filename ternary inprocessRowsand beforefindFileRecursive/,\,..(substring),., null bytes, or absolute paths — with SLF4J parameterised logging andSkippedFile(filename, "INVALID_FILENAME_PATH_TRAVERSAL")added to the shared skip counterprocessRowsintegration test asserting skip-and-continue behaviourCloses #530.
Test plan
MassImportServiceTest— 58 tests, 0 failures (./mvnw test -Dtest=MassImportServiceTest)./mvnw clean package -DskipTests)../evilin a filename cell — row appears in skipped list, remaining rows still imported🤖 Generated with Claude Code
👷 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
Clean, focused PR. The implementation does exactly one thing and the tests prove it.
Blockers
None.
Suggestions
isValidImportFilename— prefer early-return, considerPaths.getexception safetyThe method (diff
+325–+331) already uses guard clauses which is great. One minor nit: thefilename.equals(".")check on line+330is subsumed by thefilename.contains("..")check directly above —"."doesn't contain".."so it is not subsumed — this is correct. No issue.However, there is a subtle reliability concern:
Paths.get(filename).isAbsolute()on line+331can throwInvalidPathExceptionon certain OS-specific illegal characters (e.g. null bytes should have been caught already, but Windows paths with*,?,",<,>,|would throw here before returningfalse). Because the null-byte check is on line+329before thePaths.getcall, and the service presumably runs on Linux, this is unlikely to be exploitable in practice — but it could produce a non-descriptive 500 for a spreadsheet cell containing*evil.pdfon Windows. Worth a comment:Test:
minimalCellshelperThe integration test on line
+459callsminimalCells("../evil")— this helper was pre-existing, so no issue with its existence, but the test nameprocessRows_skipsRowAndContinues_whenFilenameIsPathTraversalis excellent and follows the project naming convention perfectly.One missing positive test case: filename with a leading dot (hidden file)
".hidden"is a valid base name on Linux and should pass the validator. This isn't a security issue — it's a coverage edge case. Consider adding it if.hiddenfiles are plausible in the archive.What's done well
isValidImportFilenameis textbook.// ─── isValidImportFilename — security regression — do not remove ─────────is exactly the right signal to future maintainers.log.warn("Skipping import row {}: filename rejected — {}", i, filename)— correct, no injection risk.🏛️ Markus Keller — Application Architect
Verdict: ✅ Approved
The change is well-scoped, architecturally clean, and introduces no new cross-domain dependencies or structural concerns. No documentation updates are required by the doc-update matrix for this PR.
Blocker assessment against doc-update matrix
isValidImportFilenameis a private method, not a new service — N/A"INVALID_FILENAME_PATH_TRAVERSAL"is a string literal, not a newErrorCodeenum value — see suggestion belowNo documentation blockers.
Suggestions
"INVALID_FILENAME_PATH_TRAVERSAL"is a raw string, not a typedErrorCodeThe skip-reason string (
"INVALID_FILENAME_PATH_TRAVERSAL") is added as a literal on diff line+295. All other skip reasons inSkippedFileare also strings ("INVALID_PDF_SIGNATURE","ALREADY_EXISTS","S3_UPLOAD_FAILED"etc.), so this PR follows the existing convention. That convention is itself a mild architecture smell — anErrorCodeenum or aSkipReasonenum would prevent typos and make the full set of reasons greppable — but that's pre-existing tech debt, not introduced by this PR. Worth a follow-up issue; not a blocker here.Placement of the guard relative to the
filenameternaryThe guard is called after the
.pdfextension is appended (diff+293). This is the correct position: the validator sees the final filename that will actually be used infindFileRecursive, not the raw spreadsheet cell value. Good placement decision — it avoids the gap whereindex = "../evil"would become"../evil.pdf"before validation.What's done well
importingpackage — no cross-domain leakage.🔒 Nora "NullX" Steiner — Application Security Engineer
Verdict: ⚠️ Approved with concerns
The path-traversal fix is genuine and the regression tests are solid. I have one concern about completeness of the guard and one observation about defense-in-depth that is worth documenting.
Blockers
None — the current guard eliminates the primary path-traversal vectors for the Linux production environment.
Security analysis of
isValidImportFilenameCWE-22 (Path Traversal) — primary threat model
The guard correctly blocks:
../evil→ caught bycontains("..")/etc/passwd→ caught byisAbsolute()etc/passwd→ caught bycontains("/")..\evil→ caught bycontains("..")(the backslash also hits thecontains("\\")check)file\0.pdf→ caught bycontains("\0")"."→ caught byfilename.equals(".")Concern 1 (Medium): Unicode lookalike / overlong encoding bypass
The guard operates on the Java
String(UTF-16). Filenames arriving from Excel are decoded by Apache POI before this check runs, so classic overlong UTF-8 encodings (e.g.%2e%2e%2f→../) are already resolved. This is safe.However, Unicode homoglyph directory separators (U+2215 DIVISION SLASH
∕, U+29F5 REVERSE SOLIDUS⟵, U+FF0F FULLWIDTH SOLIDUS/) are not caught by thecontains("/")andcontains("\\")checks. On Linux,java.io.Fileuses the OS-level path resolution, which does not normalise these Unicode variants to/. So a filename likefoo∕etc∕passwdwould pass the guard and then be passed tofindFileRecursive. WhetherfindFileRecursivewould actually traverse the filesystem with such a name depends on its implementation, but to be safe:The positive allowlist approach (
matches("[\\w\\s.\\-()]+")) would be more robust than the current denylist and would eliminate this and all future bypass vectors at once. However, it could also reject legitimate filenames if the archive contains accented characters (Ä, ü, ö, é…) which are common in a German family archive. If you go with allowlist, extend the character class to include\p{L}\p{N}(Unicode letters and digits) rather than\w(ASCII-only word chars). This is a suggestion, not a blocker — the current denylist covers the real-world attack surface for this application.Concern 2 (Low):
findFileRecursiveis not shown in this diffThe guard is a pre-filter. If
findFileRecursiveitself usesFile(baseDir, filename)orPath.resolve(filename)without a post-check that the resolved path is still underbaseDir, a bypass could theoretically occur if the guard is ever weakened. Defense-in-depth would add a canonical-path check insidefindFileRecursive:Not a blocker for this PR since the guard prevents the dangerous input from reaching that code, but worth a follow-up issue.
What's done well
isBlank()— no NPE on null filename.../evil,/etc/passwd,..\\etc\\passwd,file\0.pdf) — not synthetic safe inputs. This is exactly how security regression tests should be written.// ─── security regression — do not remove ───comment is a strong signal to future maintainers.Paths.get(filename).isAbsolute()covers Windows absolute paths (C:\...) even though the production environment is Linux — good belt-and-suspenders.🧪 Sara Holt — QA Engineer & Test Strategist
Verdict: ✅ Approved
10 tests for a 12-line guard method is proportionate and well-structured. The tests are permanent, descriptive, and cover the right behaviors.
Blocker assessment
None.
Test quality review
Naming — excellent
All 10 test names follow the
methodName_returnsX_whenYconvention and read as specifications:isValidImportFilename_returnsFalse_whenFilenameContainsForwardSlash✅processRows_skipsRowAndContinues_whenFilenameIsPathTraversal✅Layer placement — correct
ReflectionTestUtils.invokeMethod— appropriate, these are fast, isolated, and exercise the guard logic directly.processRows— correct layer for a skip-and-continue behaviour that involves mocked collaborators (documentService,documentService.save).Coverage of the guard method
null..substring..as full valueGaps (suggestions, not blockers)
No test for
.pdfextension appended to..input — the processRows code converts..(no dot in it? wait:..does contain., so the ternary takes theindexbranch, filename =..). Actually..contains.sofilename = ".."— thecontains("..")check would catch this. Covered implicitly byisValidImportFilename_returnsFalse_whenFilenameIsDotDot. ✅ No gap here.No negative test for
processRows— that a valid row is NOT skipped when the filename is clean — the integration test assertsprocessed == 1for"legitimate.pdf"which implicitly verifies this. ✅ Covered.No test verifying the skip reason string value — the test asserts:
This directly asserts the reason string. ✅
Missing:
isValidImportFilename_returnsTrue_whenFilenameHasSpaces— filenames like"Brief an Oma.pdf"are realistic for a German family archive. The current guard would returntruefor these (no forbidden chars), so they would pass — but there's no positive test confirming this. Low priority but worth adding.ReflectionTestUtils.invokeMethodon a private method — this is an acceptable pragmatic choice given the method is deliberately private (it's an internal security guard, not a service API). The security regression comment makes the intent clear.Integration test setup quality
The
processRows_skipsRowAndContinues_whenFilenameIsPathTraversaltest correctly:documentService.findByOriginalFilename,documentService.save)processedcount and theskippedFilesreasonThis is clean AAA (Arrange-Act-Assert) with no unnecessary setup.
🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
This is a pure Java/test change — no infrastructure, CI pipeline, Docker, or deployment files are touched. Nothing for me to flag from an infrastructure standpoint.
Infrastructure impact assessment
MassImportServiceTestclass is already in the test runNotes
MassImportServiceTest. At<10starget for the unit test layer, adding 10 Mockito-based unit tests has negligible CI time impact.Paths,ReflectionTestUtilsare already on the classpath.LGTM from a platform perspective.
📋 Elicit — Requirements Engineer
Verdict: ✅ Approved
The PR correctly closes issue #530. Requirements traceability is clean and the acceptance criteria in the PR description are testable. No frontend changes are introduced, so no UX or i18n requirements apply.
Requirements coverage check
FR: Reject path-traversal filenames during mass import
The PR description states three behaviors:
/,\\,..,., null bytes, or absolute paths are rejected — ✅ implemented and unit-testedSkippedFile("INVALID_FILENAME_PATH_TRAVERSAL")entry — ✅ asserted in integration testprocessed == 1for the valid row)Acceptance criteria in PR description (test plan)
MassImportServiceTest— 58 tests, 0 failures — ✅ verifiable../evilfilename cell — ✅ describedObservations
"INVALID_FILENAME_PATH_TRAVERSAL"skip reason is not surfaced to the user in the UIThe skip-reason string ends up in the import result's
skippedFileslist that is returned to the frontend. I don't know from this PR alone whether the frontend renders skip reasons as raw strings or maps them to user-friendly messages. If the import result UI currently shows raw skip reason strings, the new value"INVALID_FILENAME_PATH_TRAVERSAL"will be shown as-is to the operator — which is informative but not localized. This is a pre-existing pattern (the same applies to"INVALID_PDF_SIGNATURE","ALREADY_EXISTS"etc.), so it's not a regression introduced here.Recommendation (for a follow-up issue, not this PR): Add i18n mappings for all skip reason codes so the import result page can display localized messages to the operator. This would apply to all existing skip reasons, not just the new one.
Non-functional requirements
🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist
Verdict: ✅ Approved
This is a backend-only security fix with no frontend, UI, or accessibility surface. There is nothing for me to review from a design, branding, or WCAG perspective.
Scope assessment
One forward-looking UX observation (not a blocker)
The
SkippedFileentries produced by the import are presumably shown to the operator in the import results UI. The new skip reason"INVALID_FILENAME_PATH_TRAVERSAL"will appear as a raw string. For operators who are non-technical family members, this message is opaque. A follow-up to add a localized message for this code (and for the other existing raw skip reasons) would improve the operator experience — but that is scope for a separate issue, not this PR.LGTM from design and accessibility.
Review concerns addressed
All open reviewer concerns have been resolved. Four commits on top of the original two:
429ff32e— security(import): block Unicode lookalike path separatorsAddresses @Nora concern 1 (Unicode homoglyph bypass) and @Felix suggestion 1 (Paths.get safety comment).
∕), U+FF0F FULLWIDTH SOLIDUS (/), U+29F5 REVERSE SOLIDUS OPERATOR (⧵) — all pass the existing ASCII separator checks on Linux but would resolve as path separators on some systemsPaths.get(filename).isAbsolute()explaining itsInvalidPathExceptionsafety boundary on LinuxisValidImportFilename_returnsFalse_whenFilenameContainsUnicodeDivisionSlash,_FullwidthSlash,_UnicodeReverseSolidus890f014b— test(import): add regression tests for leading-dot and spaced filenamesAddresses @Felix suggestion 2 (
.hiddencoverage) and @Sara suggestion (spaced filename coverage).isValidImportFilename_returnsTrue_whenFilenameHasLeadingDot—.hidden.pdfcorrectly passes the guardisValidImportFilename_returnsTrue_whenFilenameHasSpaces—Brief an Oma.pdfcorrectly passes the guardBoth are documentation tests — existing behavior was already correct, now it's covered.
4e33f52a— refactor(import): extract SkipReason enum to replace raw skip-reason stringsAddresses @Markus suggestion (typed skip reasons instead of raw string literals).
MassImportService.SkipReasonenum with all five values:INVALID_FILENAME_PATH_TRAVERSAL,INVALID_PDF_SIGNATURE,FILE_READ_ERROR,ALREADY_EXISTS,S3_UPLOAD_FAILEDSkippedFile.reasonchanged fromStringtoSkipReasonimportSingleDocumentreturn type changed fromOptional<String>toOptional<SkipReason>.name())eca4f1f0— security(import): add canonical path escape guard in findFileRecursiveAddresses @Nora concern 2 (symlink escape via
findFileRecursive).Files.walk, checks thatcandidate.getCanonicalPath()starts withbaseDir.getCanonicalPath()— catches the case where a symlink insideimportDirresolves to a file outside itDomainException.internal(INTERNAL_ERROR, ...)— aborts the import withIMPORT_FAILED_INTERNALstate rather than silently skippingfindFileRecursive_throwsDomainException_whenSymlinkEscapesImportDirusing@TempDir+Files.createSymbolicLinkFinal test count: 64 tests, 0 failures. Backend build clean.
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
This is a clean, disciplined PR. The concerns from the previous review round have all been addressed. Here's what I checked:
What's done well
Guard-clause structure in
isValidImportFilename— textbook. Each check exits early, the happy path is the last line at the lowest indent. The Unicode lookalike additions (U+2215, U+FF0F, U+29F5) slot in naturally without breaking the linear flow.findFileRecursiverefactor — splitting the stream pipeline to check the canonical path before returning is a meaningful improvement. The method is now 14 lines but each line does one thing. I'd accept it as-is.SkipReasonenum — the right move.Optional<SkipReason>onimportSingleDocumentis a much better signature thanOptional<String>. The compiler now catches any new skip reason that isn't in the set.Test naming —
isValidImportFilename_returnsFalse_whenFilenameContainsUnicodeDivisionSlashis precise and reads as a spec. No improvement needed.Symlink escape test — using two
@TempDirparameters andFiles.createSymbolicLinkis the right approach for this. It's a real filesystem test, not a mock, which is exactly what a security regression test should be.Suggestions (no blockers)
isValidImportFilename— 9 guard clauses is still a denylist: nothing structurally wrong, but worth noting that if a third Unicode slash variant surfaces (say U+2044 FRACTION SLASH or U+2F SOLIDUS variant), it will require another PR. The previous review noted the allowlist alternative ([\\w\\s.\\-()\\p{L}\\p{N}]+). Not requesting a change here — the current approach is explicitly acceptable per the security review — just flagging for the project's awareness.// Paths.get() is safe here on Linux...comment — the comment is accurate and helpful. The trailing line "but those are not reachable in production" is a slight assumption. "are not reachable in this code path after the checks above" would be more precise. Minor wording nit, not a blocker.🏛️ Markus Keller — Application Architect
Verdict: ✅ Approved
Architecturally clean. Checking against the doc-update matrix:
Doc-update matrix assessment
ErrorCodeorPermissionvalueSkipReasonis not anErrorCode— it is a local enum scoped toMassImportService, not part of the domain-wide error contract — N/ANo documentation blockers.
What's done well
The
SkipReasonenum lives insideMassImportServiceas a nested public type. This is appropriate for now — it is only meaningful in the context of mass import, and exporting it to a shared package would leak internal domain details. Good call keeping it contained.findFileRecursivenow usesnew File(importDir)+getCanonicalPath()for the base dir rather thanPaths.get(importDir)inline. Slightly more verbose but the canonical path check necessitates theFilehandle, and the two calls (baseDir.getCanonicalPath()+candidate.getCanonicalPath()) are clearly related.DomainException.internal(ErrorCode.INTERNAL_ERROR, ...)on path escape is the correct response — abort the import with a structured error rather than silently skip. TherunImportAsynccatch-all handler maps this toIMPORT_FAILED_INTERNALstate, which is the right outcome for a suspected configuration attack.Minor observations
SkipReasonispublic— this is intentional sinceSkippedFileis alsopublicand returned in the API response. Jackson will serialize by.name(), which matches the previous string values. No API contract change. ✅The
protectedvisibility onimportSingleDocument(for Mockito spy access) now returnsOptional<SkipReason>— callers that were directly using theStringreturn have been updated in the tests. Clean migration.🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
Pure Java/test change. Nothing for me to flag from an infrastructure or platform standpoint.
Infrastructure impact assessment
MassImportServiceTestalready runs in the test suite; 64 tests instead of the original count, all Mockito-based unit testsapp.import.dir)findFileRecursiverefactor now usesnew File(importDir)instead ofPaths.get(importDir)but the behavior for normal files is identicalNotes
The symlink escape test (
findFileRecursive_throwsDomainException_whenSymlinkEscapesImportDir) uses@TempDir+Files.createSymbolicLink. Symlink creation requires appropriate OS permissions and Linux semantics. This is fine for the CI runner (Linux) but worth noting that it would fail on a Windows dev environment if anyone ever runs this there. Not an issue for this project's stack.Adding 6 new Mockito-based unit tests has negligible CI time impact — sub-second additions to a suite that already runs in ~6 seconds. No concern.
LGTM from a platform perspective.
📋 Elicit — Requirements Engineer
Verdict: ✅ Approved
Requirements coverage is complete. Checking against issue #530 and the PR description:
Requirements traceability
FR: Reject path-traversal filenames during mass import (issue #530)
/contains("/")checkwhenFilenameContainsForwardSlash\contains("\\")checkwhenFilenameContainsBackslash..contains("..")checkwhenFilenameContainsDotDot,whenFilenameIsDotDotisAbsolute()checkwhenFilenameIsAbsolutePathcontains("\0")checkwhenFilenameContainsNullBytecontinueinprocessRowsprocessRows_skipsRowAndContinues_whenFilenameIsPathTraversalfindFileRecursivefindFileRecursive_throwsDomainException_whenSymlinkEscapesImportDirAcceptance criteria in PR description (test plan):
One forward-looking observation
The PR description's test plan still says "58 tests" — it was written before the review-cycle additions. This is cosmetic (the PR description is not a contract), but updating it to "64 tests" would keep the test plan accurate for anyone reading this PR's history.
SkipReasonenum — requirements perspectiveThe typed enum approach addresses the Markus concern about raw string tech-debt. From a requirements standpoint, the
reasonfield inSkippedFilewill still appear as a string in the API response (e.g."ALREADY_EXISTS") — the JSON contract is unchanged. The frontendtypes.tsstill typesreason: string, which remains valid. A future issue to map these codes to localized operator messages remains desirable (noted in previous review round) but is correctly out of scope here.🔒 Nora "NullX" Steiner — Application Security Engineer
Verdict: ✅ Approved
All concerns from the previous review round have been addressed. Final security assessment:
CWE-22 (Path Traversal) — threat model completeness
Guard coverage in
isValidImportFilename:../evilcontains("..")/etc/passwdisAbsolute()etc/passwdcontains("/")..\evilcontains("\\")+contains("..")file\0.pdf(null byte)contains("\0").equals(".")foo∕bar.pdf(U+2215 DIVISION SLASH)contains("∕")foo/bar.pdf(U+FF0F FULLWIDTH SOLIDUS)contains("/")foo⧵bar.pdf(U+29F5 REVERSE SOLIDUS OPERATOR)contains("⧵")findFileRecursiveStill a denylist: As noted in the previous round, U+2044 FRACTION SLASH (
⁄) and U+29F8 BIG SOLIDUS (⧸) are not in the list. Neither character is a real path separator on any mainstream OS, so the practical risk is low. A positive allowlist ([\\w\\s.\\-()\p{L}\p{N}]+) would permanently close this class. Keeping this as a Low / informational — not a blocker.findFileRecursivecanonical path checkThe implementation is correct:
The
File.separatorappended tobaseDirCanonicalprevents a false bypass whereimportDir = /opt/importand a file at/opt/import-evil/secret.pdfwould match without it. Good defensive implementation.Files.walkwithoutFOLLOW_LINKSenumerates symlinks as files (doesn't resolve them), sop.toFile()returns the symlink path. Thencandidate.getCanonicalPath()resolves through the symlink — this is the correct place to catch the escape. ✅DomainException.internalon escape = correct fail-closed behavior. The import aborts withIMPORT_FAILED_INTERNALrather than silently skipping a potential attacker-controlled symlink. ✅Logging
log.warn("Skipping import row {}: filename rejected — {}", i, filename)— SLF4J parameterized, no injection risk. The filename is user-controlled input logged here — safe with SLF4J's{}placeholders. ✅No new concerns. This PR is ready to merge from a security perspective.
🧪 Sara Holt — QA Engineer & Test Strategist
Verdict: ✅ Approved
64 tests, all green. The test suite for this PR is in excellent shape. Full assessment:
Test pyramid placement
isValidImportFilename_*(12 tests)ReflectionTestUtils.invokeMethodon private methodprocessRows_skipsRowAndContinues_*processRowslogicfindFileRecursive_throwsDomainException_*@TempDir+ symlinkNaming quality
All new tests follow the
methodName_returnsX_whenYconvention consistently:isValidImportFilename_returnsFalse_whenFilenameContainsUnicodeDivisionSlash✅isValidImportFilename_returnsTrue_whenFilenameHasLeadingDot✅isValidImportFilename_returnsTrue_whenFilenameHasSpaces✅findFileRecursive_throwsDomainException_whenSymlinkEscapesImportDir✅Coverage completeness
New positive tests for leading-dot and spaced filenames are documentation tests — the behavior was already correct, now it's proven. This is exactly how regressions get prevented over time. ✅
Unicode attack payload tests use realistic inputs (
foo∕bar.pdf,foo/bar.pdf,foo⧵bar.pdf) — these are attack payloads, not synthetic safe inputs. ✅SkipReasonenum migration — all 6 existing test assertions updated from strings to enum constants.extracting(SkippedFile::reason).containsExactly(SkipReason.S3_UPLOAD_FAILED)is now type-safe. If a new enum value is added and a test accidentally uses the wrong one, the compiler catches it. ✅One minor observation
The
processRows_skipsRowAndContinues_whenFilenameIsPathTraversaltest has inline comments on the rows:These are good — they make the test intent immediately clear to a future reader. No change needed.
CI time impact
6 new tests beyond the previous 58: ~0.1 second additional test time. Negligible.
🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist
Verdict: ✅ Approved
Backend-only security fix with no frontend, UI, or accessibility surface. There is nothing for me to review from a design, branding, or WCAG perspective.
Scope check
SkipReason.INVALID_FILENAME_PATH_TRAVERSALstill appears as a raw string in the operator UIOne standing UX observation (not a blocker, same as previous round)
SkippedFile.reasonis now typed asSkipReasonin Java, but Jackson serializes it to the same raw string in the JSON response ("INVALID_FILENAME_PATH_TRAVERSAL"). The frontendtypes.tstypes it asstring. The import results UI presumably renders these strings directly to the operator.For a non-technical family member running the import,
INVALID_FILENAME_PATH_TRAVERSALis opaque. A follow-up to add localized labels for all skip reasons (not just the new one) would meaningfully improve the operator experience — but this is scope for a separate issue. It has been noted in the previous review round and remains valid.LGTM from design and accessibility.