A symlink placed inside importDir pointing to a file outside it would pass
isValidImportFilename (no forbidden chars in the symlink name) and be found
by Files.walk. Now checks candidate.getCanonicalPath() against
baseDir.getCanonicalPath() — if the resolved path escapes importDir,
throws DomainException.internal and aborts the import. Adds regression
test using @TempDir + Files.createSymbolicLink.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces MassImportService.SkipReason with all five values —
INVALID_FILENAME_PATH_TRAVERSAL, INVALID_PDF_SIGNATURE, FILE_READ_ERROR,
ALREADY_EXISTS, S3_UPLOAD_FAILED — making the full set of reasons greppable
and type-safe. SkippedFile.reason changes from String to SkipReason;
importSingleDocument return type updated accordingly. JSON serialisation
is unchanged (Jackson serialises enums by name). All tests updated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents that .hidden.pdf and "Brief an Oma.pdf" correctly pass the
isValidImportFilename guard — both are valid basenames common in the archive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds checks for U+2215 DIVISION SLASH (∕), U+FF0F FULLWIDTH SOLIDUS (/),
and U+29F5 REVERSE SOLIDUS OPERATOR (⧵) — all of which bypass the existing
ASCII separator checks on Linux path resolution. Adds a clarifying comment on
the Paths.get().isAbsolute() call explaining its InvalidPathException safety
boundary. Adds 3 regression tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rejects path-traversal filenames before findFileRecursive runs.
Guard runs on the derived filename (after the ternary) as specified.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codifies the path-traversal constraint that was previously safe by
accident (findFileRecursive's getFileName() strip) but had no explicit
guard or test coverage. Fixes issue #530.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>