feat(filename): support compound last names like de Gruyter

Replace the four fixed regexes with a split-based algorithm over the filename's segments:
- if the first segment is the date: last segment = firstName, remaining segments = lastName parts
- if the last segment is the date: second-to-last segment = firstName, remaining segments = lastName parts

18881025_de_Gruyter_Walter.pdf now correctly yields "Walter de Gruyter (25.10.1888)".
Simple two-segment names behave identically to before.
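
A minimal sketch of the split-based approach described above, for illustration only. The actual DocumentService implementation is not shown on this page, so the class name FilenameTitleSketch, the helpers isDate and formatDate, the underscore separator, and the 8-digit yyyyMMdd date check are all assumptions inferred from the example filenames and the expected titles in the tests.

```java
import java.util.Arrays;

public class FilenameTitleSketch {

    // Hypothetical helper: treats any 8-digit segment as a yyyyMMdd date (assumption).
    static boolean isDate(String segment) {
        return segment.matches("\\d{8}");
    }

    // Hypothetical helper: "18881025" -> "25.10.1888".
    static String formatDate(String d) {
        return d.substring(6, 8) + "." + d.substring(4, 6) + "." + d.substring(0, 4);
    }

    static String titleFromFilename(String filename) {
        // Strip the extension; this is also the fallback result.
        int dot = filename.lastIndexOf('.');
        String base = dot > 0 ? filename.substring(0, dot) : filename;

        // Assumes underscore-separated segments: date + at least lastName + firstName.
        String[] seg = base.split("_");
        if (seg.length < 3) {
            return base;
        }

        String date;
        String[] nameParts;
        if (isDate(seg[0])) {
            // Date first: everything after it is the name.
            date = seg[0];
            nameParts = Arrays.copyOfRange(seg, 1, seg.length);
        } else if (isDate(seg[seg.length - 1])) {
            // Date last: everything before it is the name.
            date = seg[seg.length - 1];
            nameParts = Arrays.copyOfRange(seg, 0, seg.length - 1);
        } else {
            return base;
        }

        // In both layouts the last name segment is the first name;
        // all preceding segments together form the (possibly compound) last name.
        String firstName = nameParts[nameParts.length - 1];
        String lastName = String.join(" ",
                Arrays.copyOfRange(nameParts, 0, nameParts.length - 1));
        return firstName + " " + lastName + " (" + formatDate(date) + ")";
    }
}
```

Because the first name is always taken from the end of the name segments, any number of last-name parts is handled without adding a further pattern per name length.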

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Marcel
2026-03-26 15:33:21 +01:00
parent a302f96560
commit f0940524e7
4 changed files with 157 additions and 87 deletions

@@ -545,6 +545,18 @@ class DocumentServiceTest {
                .isEqualTo("Hans Mueller (12.03.1965)");
    }

    @Test
    void titleFromFilename_compound_lastName_dateFirst() {
        assertThat(DocumentService.titleFromFilename("18881025_de_Gruyter_Walter.pdf"))
                .isEqualTo("Walter de Gruyter (25.10.1888)");
    }

    @Test
    void titleFromFilename_compound_lastName_dateLast() {
        assertThat(DocumentService.titleFromFilename("de_Gruyter_Walter_18881025.pdf"))
                .isEqualTo("Walter de Gruyter (25.10.1888)");
    }

    @Test
    void titleFromFilename_fallsBackToStripExtension() {
        assertThat(DocumentService.titleFromFilename("scan_001.pdf")).isEqualTo("scan_001");