Compare commits

..

11 Commits

Author SHA1 Message Date
Marcel
b5239f515f fix(notification): address review suggestions
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m23s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m17s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
- ChronikFuerDichBox: move update() inside the failure branch so success
  path skips it, matching NotificationDropdown's pattern
- NotificationDropdown test: add role=alert assertion for mark-all-read
  failure to match existing dismiss-failure coverage in ChronikFuerDichBox
- +page.server.ts: use getErrorMessage(undefined) instead of null so the
  missing-notificationId 400 goes through the same i18n pipeline as other errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 07:31:26 +02:00
Marcel
f2bb58e294 fix(chronik): surface action failures in ChronikFuerDichBox with accessible error banner
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m35s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m17s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Add $state errorMessage + role=alert banner to ChronikFuerDichBox. Both enhance callbacks
now inspect result.type and set the error message on 'failure' or 'error'; errorMessage
is cleared on each new submit attempt.

Upgrade both test files to the mockFormResult pattern (via vi.hoisted) so the result
callback is exercised. Add a failing-action test in each file that asserts role=alert
appears after a form submit with type='failure'.

Fix bare Function cast → explicit typed cast to satisfy @typescript-eslint/no-unsafe-function-type.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 07:06:58 +02:00
Marcel
2adb98895d fix(aktivitaeten): narrow File cast and use null payload for missing notificationId
Replace 'as string | null' cast (which silently accepts File values) with an explicit
typeof check. Use error: null instead of hardcoded German so the client falls through
to the generic i18n-keyed error banner.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 07:06:15 +02:00
Marcel
6049dcadd3 fix(notification-dropdown): handle error result type, add role=alert, fix update ordering
- Add role="alert" to error banner so screen-reader users hear failures
- Handle result.type === 'error' (network failure) alongside 'failure' in both enhance callbacks
- Clear errorMessage at the start of each submit so stale errors don't persist on retry
- On dismiss success: skip update() entirely since goto() navigates away from the page
- On dismiss failure: await update() then set error message
- On mark-all success: skip update() (optimistic state already applied)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 07:05:50 +02:00
Marcel
7fe8842b57 fix(notifications): surface action failures as an error banner
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m25s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m23s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Semgrep Security Scan (pull_request) Successful in 18s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
When dismiss-notification or mark-all-read returns a failure the dropdown
now shows a localised error message above the list. Added
notification_error_generic key (de/en/es) as the fallback when the
action response carries no explicit error string.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:56:33 +02:00
Marcel
f9340366d1 fix(notifications): move onClose/goto into enhance result callback
onClose() and goto() were firing before the server responded, making it
impossible for a fail() response to cancel navigation. Moved them inside
the result callback behind a result.type !== 'failure' guard.

Updated the $app/forms enhance mock to always invoke the returned async
callback with a configurable mockFormResult, and added three tests:
- success path calls onClose + goto with the correct deep-link URL
- failure path skips onClose and goto
- annotationId is appended to the URL when present

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:54:15 +02:00
Marcel
af84ffc379 fix(notifications): guard against null notificationId in dismiss action
Casting null to string caused PATCH to fire against /api/notifications/null/read
when the field was absent. Added an early-return fail(400) and a test that
submitting an empty form returns 400 without calling the API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:48:37 +02:00
Marcel
23439e581a refactor(chronik): replace callback props with form actions in ChronikFuerDichBox
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m20s
CI / OCR Service Tests (pull_request) Successful in 21s
CI / Backend Unit Tests (pull_request) Successful in 3m22s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
Dismiss (X) button and mark-all-read button now submit forms to
/aktivitaeten?/dismiss-notification and /aktivitaeten?/mark-all-read respectively.
Props renamed onMarkRead/onMarkAllRead → optimisticMarkRead/optimisticMarkAllRead.

aktivitaeten/+page.svelte drops the now-deleted onMarkRead/onMarkAllRead wrapper functions
and passes notificationStore.optimisticMarkRead/optimisticMarkAllRead directly to the box.

Tests: $app/forms enhance mock added to both spec files so dismiss and mark-all assertions
work synchronously against form-submit events.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:16:58 +02:00
Marcel
2c6b59d0c7 refactor(notification): replace callback props with form actions in Dropdown and Bell
NotificationDropdown now wraps each row in a <form action="/aktivitaeten?/dismiss-notification">
and the mark-all control in <form action="/aktivitaeten?/mark-all-read">, wired via use:enhance
for optimistic UI. Props renamed onMarkRead/onMarkAllRead → optimisticMarkRead/optimisticMarkAllRead
to match the simplified store API. NotificationBell passes the store helpers directly; handleMarkRead
is removed.

Test mocks updated: $app/forms enhance mock fires SubmitFunction synchronously on form submit so
callback assertions work without a real HTTP round-trip.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:15:56 +02:00
Marcel
c0a7408ef4 refactor(notification): rename markRead/markAllRead to optimistic helpers without fetch
Removes raw fetch() calls from the store. optimisticMarkRead(id) and
optimisticMarkAllRead() now only mutate local $state — the actual API
calls move to SvelteKit form actions on /aktivitaeten.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 22:59:01 +02:00
Marcel
9d283c4500 feat(notification): add dismiss-notification and mark-all-read form actions to aktivitaeten
Adds two SvelteKit form actions to /aktivitaeten/+page.server.ts so the
notification bell can POST there instead of calling the backend directly
from the browser.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 22:51:08 +02:00
157 changed files with 544 additions and 15669 deletions

View File

@@ -39,12 +39,6 @@ PORT_PROMETHEUS=9090
# Grafana admin password — change this before exposing Grafana beyond localhost
GRAFANA_ADMIN_PASSWORD=changeme
# Password for the read-only grafana_reader PostgreSQL role used by the PO
# Overview dashboard. Consumed by Flyway V68 (to set the role's password) and
# by Grafana's PostgreSQL datasource (to connect). REQUIRED in production —
# generate with: openssl rand -hex 32
GRAFANA_DB_PASSWORD=changeme-generate-with-openssl-rand-hex-32
# GlitchTip domain — production: use https://glitchtip.archiv.raddatz.cloud (must match Caddy vhost)
GLITCHTIP_DOMAIN=http://localhost:3002

View File

@@ -31,7 +31,6 @@ name: nightly
# STAGING_APP_ADMIN_USERNAME
# STAGING_APP_ADMIN_PASSWORD
# GRAFANA_ADMIN_PASSWORD
# GRAFANA_DB_PASSWORD (read-only grafana_reader DB role, issue #651)
# GLITCHTIP_SECRET_KEY
# SENTRY_DSN (set after GlitchTip first-run; empty = Sentry disabled)
@@ -80,8 +79,6 @@ jobs:
IMPORT_HOST_DIR=/srv/familienarchiv-staging/import
POSTGRES_USER=archiv
SENTRY_DSN=${{ secrets.SENTRY_DSN }}
VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
EOF
- name: Verify backend /import:ro mount is wired
@@ -145,7 +142,6 @@ jobs:
cp docker-compose.observability.yml /opt/familienarchiv/
cat > /opt/familienarchiv/obs-secrets.env <<'EOF'
GRAFANA_ADMIN_PASSWORD=${{ secrets.GRAFANA_ADMIN_PASSWORD }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
GLITCHTIP_SECRET_KEY=${{ secrets.GLITCHTIP_SECRET_KEY }}
POSTGRES_PASSWORD=${{ secrets.STAGING_POSTGRES_PASSWORD }}
POSTGRES_HOST=archiv-staging-db-1

View File

@@ -35,7 +35,6 @@ name: release
# MAIL_USERNAME
# MAIL_PASSWORD
# GRAFANA_ADMIN_PASSWORD
# GRAFANA_DB_PASSWORD (read-only grafana_reader DB role, issue #651)
# GLITCHTIP_SECRET_KEY
# SENTRY_DSN (set after GlitchTip first-run; empty = Sentry disabled)
@@ -78,7 +77,6 @@ jobs:
IMPORT_HOST_DIR=/srv/familienarchiv-production/import
POSTGRES_USER=archiv
SENTRY_DSN=${{ secrets.SENTRY_DSN }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
EOF
- name: Build images
@@ -112,7 +110,6 @@ jobs:
cp docker-compose.observability.yml /opt/familienarchiv/
cat > /opt/familienarchiv/obs-secrets.env <<'EOF'
GRAFANA_ADMIN_PASSWORD=${{ secrets.GRAFANA_ADMIN_PASSWORD }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
GLITCHTIP_SECRET_KEY=${{ secrets.GLITCHTIP_SECRET_KEY }}
POSTGRES_PASSWORD=${{ secrets.PROD_POSTGRES_PASSWORD }}
POSTGRES_HOST=archiv-production-db-1

4
.gitignore vendored
View File

@@ -26,7 +26,3 @@ node_modules/
# Repo uses npm; yarn.lock is ignored to avoid double-lockfile drift.
frontend/yarn.lock
**/.venv/
**/__pycache__/
*.pyc

View File

@@ -5,10 +5,8 @@ import lombok.extern.slf4j.Slf4j;
import org.flywaydb.core.Flyway;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.env.Environment;
import javax.sql.DataSource;
import java.util.Map;
@Configuration
@RequiredArgsConstructor
@@ -16,7 +14,6 @@ import java.util.Map;
public class FlywayConfig {
private final DataSource dataSource;
private final Environment environment;
@Bean(name = "flyway")
public Flyway flyway() {
@@ -24,7 +21,6 @@ public class FlywayConfig {
Flyway flyway = Flyway.configure()
.dataSource(dataSource)
.locations("classpath:db/migration")
.placeholders(Map.of("grafanaDbPassword", resolveGrafanaDbPassword()))
.baselineOnMigrate(true)
.baselineVersion("4")
.load();
@@ -32,22 +28,4 @@ public class FlywayConfig {
log.info("Flyway: {} migration(s) applied.", result.migrationsExecuted);
return flyway;
}
// Fail-closed: refuse to boot when GRAFANA_DB_PASSWORD is unset. The
// grafana_reader role's password is (re)set on every boot by
// R__grafana_reader_password.sql, so a missing env var means we'd either
// skip the rotation silently or — with a hardcoded fallback — publish a
// well-known credential for a role with SELECT on audit_log, documents,
// and transcription_blocks. Same shape as UserDataInitializer's refusal
// to seed default admin credentials outside dev/test/e2e.
String resolveGrafanaDbPassword() {
String value = environment.getProperty("GRAFANA_DB_PASSWORD");
if (value == null || value.isBlank()) {
throw new IllegalStateException(
"GRAFANA_DB_PASSWORD is required: it is consumed by "
+ "R__grafana_reader_password.sql to (re)set the grafana_reader "
+ "role's password on every boot. Generate with: openssl rand -hex 32");
}
return value;
}
}

View File

@@ -25,12 +25,10 @@ import java.util.UUID;
@NamedEntityGraph(name = "Document.full", attributeNodes = {
@NamedAttributeNode("sender"),
@NamedAttributeNode("receivers"),
@NamedAttributeNode("tags"),
@NamedAttributeNode("trainingLabels")
@NamedAttributeNode("tags")
})
@NamedEntityGraph(name = "Document.list", attributeNodes = {
@NamedAttributeNode("sender"),
@NamedAttributeNode("receivers"),
@NamedAttributeNode("tags")
})
@Entity

View File

@@ -1,36 +0,0 @@
package org.raddatz.familienarchiv.document;
import io.swagger.v3.oas.annotations.media.Schema;
import org.raddatz.familienarchiv.audit.ActivityActorDTO;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.tag.Tag;
import java.time.LocalDate;
import java.util.List;
import java.util.UUID;
public record DocumentListItem(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
UUID id,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
String title,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
String originalFilename,
String thumbnailUrl,
LocalDate documentDate,
Person sender,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<Person> receivers,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<Tag> tags,
String archiveBox,
String archiveFolder,
String location,
String summary,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
int completionPercentage,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<ActivityActorDTO> contributors,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
SearchMatchData matchData
) {}

View File

@@ -0,0 +1,18 @@
package org.raddatz.familienarchiv.document;
import io.swagger.v3.oas.annotations.media.Schema;
import org.raddatz.familienarchiv.audit.ActivityActorDTO;
import org.raddatz.familienarchiv.document.Document;
import java.util.List;
public record DocumentSearchItem(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
Document document,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
SearchMatchData matchData,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
int completionPercentage,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<ActivityActorDTO> contributors
) {}

View File

@@ -7,7 +7,7 @@ import java.util.List;
public record DocumentSearchResult(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<DocumentListItem> items,
List<DocumentSearchItem> items,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
long totalElements,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
@@ -21,16 +21,16 @@ public record DocumentSearchResult(
* Single-page convenience factory used by empty-result shortcuts and by tests that
* don't care about paging. Treats the whole list as page 0 of itself.
*/
public static DocumentSearchResult of(List<DocumentListItem> items) {
public static DocumentSearchResult of(List<DocumentSearchItem> items) {
int size = items.size();
return new DocumentSearchResult(items, size, 0, size, size == 0 ? 0 : 1);
}
/**
* Paged factory used by the service when it has a real Pageable + full match count
* (e.g. from Spring's Page&lt;T&gt; or from an in-memory sort-then-slice).
* (e.g. from Spring's Page<T> or from an in-memory sort-then-slice).
*/
public static DocumentSearchResult paged(List<DocumentListItem> slice, Pageable pageable, long totalElements) {
public static DocumentSearchResult paged(List<DocumentSearchItem> slice, Pageable pageable, long totalElements) {
int pageSize = pageable.getPageSize();
int totalPages = pageSize == 0 ? 0 : (int) ((totalElements + pageSize - 1) / pageSize);
return new DocumentSearchResult(slice, totalElements, pageable.getPageNumber(), pageSize, totalPages);

View File

@@ -10,6 +10,7 @@ import org.raddatz.familienarchiv.audit.AuditService;
import org.raddatz.familienarchiv.document.DocumentBatchMetadataDTO;
import org.raddatz.familienarchiv.document.DocumentBatchSummary;
import org.raddatz.familienarchiv.document.DocumentBulkEditDTO;
import org.raddatz.familienarchiv.document.DocumentSearchItem;
import org.raddatz.familienarchiv.document.DocumentSearchResult;
import org.raddatz.familienarchiv.document.DocumentSort;
import org.raddatz.familienarchiv.document.DocumentUpdateDTO;
@@ -735,7 +736,7 @@ public class DocumentService {
return DocumentSearchResult.paged(enrichItems(slice, text), pageable, totalElements);
}
private List<DocumentListItem> enrichItems(List<Document> documents, String text) {
private List<DocumentSearchItem> enrichItems(List<Document> documents, String text) {
List<Document> colorResolved = resolveDocumentTagColors(documents);
Map<UUID, SearchMatchData> matchData = enrichWithMatchData(colorResolved, text);
@@ -743,7 +744,7 @@ public class DocumentService {
Map<UUID, Integer> completionByDoc = fetchCompletionPercentages(docIds);
Map<UUID, List<ActivityActorDTO>> contributorsByDoc = auditLogQueryService.findRecentContributorsPerDocument(docIds);
return colorResolved.stream().map(doc -> toListItem(
return colorResolved.stream().map(doc -> new DocumentSearchItem(
doc,
matchData.getOrDefault(doc.getId(), SearchMatchData.empty()),
completionByDoc.getOrDefault(doc.getId(), 0),
@@ -751,26 +752,6 @@ public class DocumentService {
)).toList();
}
private DocumentListItem toListItem(Document doc, SearchMatchData match, int completionPct, List<ActivityActorDTO> contributors) {
return new DocumentListItem(
doc.getId(),
doc.getTitle(),
doc.getOriginalFilename(),
doc.getThumbnailUrl(),
doc.getDocumentDate(),
doc.getSender(),
List.copyOf(doc.getReceivers()),
List.copyOf(doc.getTags()),
doc.getArchiveBox(),
doc.getArchiveFolder(),
doc.getLocation(),
doc.getSummary(),
completionPct,
contributors,
match
);
}
private Map<UUID, Integer> fetchCompletionPercentages(List<UUID> docIds) {
return transcriptionBlockQueryService.getCompletionStats(docIds);
}

View File

@@ -43,7 +43,7 @@ public class TranscriptionBlockController {
@PostMapping
@ResponseStatus(HttpStatus.CREATED)
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public TranscriptionBlock createBlock(
@PathVariable UUID documentId,
@Valid @RequestBody CreateTranscriptionBlockDTO dto,
@@ -53,7 +53,7 @@ public class TranscriptionBlockController {
}
@PutMapping("/{blockId}")
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public TranscriptionBlock updateBlock(
@PathVariable UUID documentId,
@PathVariable UUID blockId,
@@ -65,7 +65,7 @@ public class TranscriptionBlockController {
@DeleteMapping("/{blockId}")
@ResponseStatus(HttpStatus.NO_CONTENT)
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public void deleteBlock(
@PathVariable UUID documentId,
@PathVariable UUID blockId) {
@@ -73,7 +73,7 @@ public class TranscriptionBlockController {
}
@PutMapping("/reorder")
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public List<TranscriptionBlock> reorderBlocks(
@PathVariable UUID documentId,
@RequestBody ReorderTranscriptionBlocksDTO dto) {
@@ -82,7 +82,7 @@ public class TranscriptionBlockController {
}
@PutMapping("/{blockId}/review")
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public TranscriptionBlock reviewBlock(
@PathVariable UUID documentId,
@PathVariable UUID blockId,
@@ -92,7 +92,7 @@ public class TranscriptionBlockController {
}
@PutMapping("/review-all")
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public List<TranscriptionBlock> markAllBlocksReviewed(
@PathVariable UUID documentId,
Authentication authentication) {

View File

@@ -56,17 +56,9 @@ public class MassImportService {
public enum State { IDLE, RUNNING, DONE, FAILED }
public enum SkipReason {
INVALID_FILENAME_PATH_TRAVERSAL,
INVALID_PDF_SIGNATURE,
FILE_READ_ERROR,
ALREADY_EXISTS,
S3_UPLOAD_FAILED
}
public record SkippedFile(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) String filename,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) SkipReason reason
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) String reason
) {}
public record ImportStatus(
@@ -299,11 +291,6 @@ public class MassImportService {
if (index.isBlank()) continue;
String filename = index.contains(".") ? index : index + ".pdf";
if (!isValidImportFilename(filename)) {
log.warn("Skipping import row {}: filename rejected — {}", i, filename);
skippedFiles.add(new SkippedFile(filename, SkipReason.INVALID_FILENAME_PATH_TRAVERSAL));
continue;
}
Optional<File> fileOnDisk = findFileRecursive(filename);
if (fileOnDisk.isEmpty()) {
log.warn("Datei nicht gefunden, importiere nur Metadaten: {}", filename);
@@ -313,17 +300,17 @@ public class MassImportService {
try {
if (!isPdfMagicBytes(fileOnDisk.get())) {
log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename);
skippedFiles.add(new SkippedFile(filename, SkipReason.INVALID_PDF_SIGNATURE));
skippedFiles.add(new SkippedFile(filename, "INVALID_PDF_SIGNATURE"));
continue;
}
} catch (IOException e) {
log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e);
skippedFiles.add(new SkippedFile(filename, SkipReason.FILE_READ_ERROR));
skippedFiles.add(new SkippedFile(filename, "FILE_READ_ERROR"));
continue;
}
}
Optional<SkipReason> skipReason = importSingleDocument(cells, fileOnDisk, filename, index);
Optional<String> skipReason = importSingleDocument(cells, fileOnDisk, filename, index);
if (skipReason.isPresent()) {
skippedFiles.add(new SkippedFile(filename, skipReason.get()));
} else {
@@ -333,23 +320,6 @@ public class MassImportService {
return new ProcessResult(processed, skippedFiles);
}
private boolean isValidImportFilename(String filename) {
if (filename == null || filename.isBlank()) return false;
if (filename.contains("/")) return false;
if (filename.contains("\\")) return false;
if (filename.contains("")) return false; // U+2215 DIVISION SLASH
if (filename.contains("")) return false; // U+FF0F FULLWIDTH SOLIDUS
if (filename.contains("")) return false; // U+29F5 REVERSE SOLIDUS OPERATOR
if (filename.contains("..")) return false;
if (filename.equals(".")) return false;
if (filename.contains("\0")) return false;
// Paths.get() is safe here on Linux for all inputs that passed the checks above;
// it may throw InvalidPathException for OS-specific illegal chars on Windows,
// but those are not reachable in production.
if (Paths.get(filename).isAbsolute()) return false;
return true;
}
// package-private: Mockito spy in tests can override to inject IOException
InputStream openFileStream(File file) throws IOException {
return new FileInputStream(file);
@@ -372,11 +342,11 @@ public class MassImportService {
* @return empty Optional on success; an Optional containing the skip reason on failure/skip.
*/
@Transactional
protected Optional<SkipReason> importSingleDocument(List<String> cells, Optional<File> file, String originalFilename, String index) {
protected Optional<String> importSingleDocument(List<String> cells, Optional<File> file, String originalFilename, String index) {
Optional<Document> existing = documentService.findByOriginalFilename(originalFilename);
if (existing.isPresent() && existing.get().getStatus() != DocumentStatus.PLACEHOLDER) {
log.info("Dokument {} existiert bereits, überspringe.", originalFilename);
return Optional.of(SkipReason.ALREADY_EXISTS);
return Optional.of("ALREADY_EXISTS");
}
String archiveBox = getCell(cells, colBox);
@@ -412,7 +382,7 @@ public class MassImportService {
status = DocumentStatus.UPLOADED;
} catch (Exception e) {
log.error("S3 Upload Fehler für {}", file.get().getName(), e);
return Optional.of(SkipReason.S3_UPLOAD_FAILED);
return Optional.of("S3_UPLOAD_FAILED");
}
}
@@ -490,18 +460,11 @@ public class MassImportService {
}
private Optional<File> findFileRecursive(String filename) {
File baseDir = new File(importDir);
try (Stream<Path> walk = Files.walk(baseDir.toPath())) {
Optional<Path> match = walk.filter(p -> !Files.isDirectory(p))
try (Stream<Path> walk = Files.walk(Paths.get(importDir))) {
return walk.filter(p -> !Files.isDirectory(p))
.filter(p -> p.getFileName().toString().equals(filename))
.map(Path::toFile)
.findFirst();
if (match.isEmpty()) return Optional.empty();
File candidate = match.get().toFile();
String baseDirCanonical = baseDir.getCanonicalPath();
if (!candidate.getCanonicalPath().startsWith(baseDirCanonical + File.separator)) {
throw DomainException.internal(ErrorCode.INTERNAL_ERROR, "Path escape detected: " + candidate);
}
return Optional.of(candidate);
} catch (IOException e) {
return Optional.empty();
}

View File

@@ -1,14 +0,0 @@
-- Repeatable migration: sets the grafana_reader role's password from the
-- ${grafanaDbPassword} placeholder (resolved by FlywayConfig from the
-- GRAFANA_DB_PASSWORD environment variable). Flyway computes the checksum on
-- the resolved migration content, so any change to GRAFANA_DB_PASSWORD changes
-- the checksum and re-applies this migration on the next boot. That makes
-- password rotation a "change env var + restart" operation — no manual psql.
--
-- V68 created the role itself (without a usable password). This file owns the
-- password lifecycle; nothing else writes it.
DO $$
BEGIN
EXECUTE format('ALTER ROLE grafana_reader WITH PASSWORD %L', '${grafanaDbPassword}');
END
$$;

View File

@@ -1,17 +0,0 @@
-- Read-only role used by the Grafana PostgreSQL datasource for the PO Overview
-- dashboard (issue #651). The role is created here without a usable password
-- (LOGIN-capable but no password set); R__grafana_reader_password.sql sets the
-- password from GRAFANA_DB_PASSWORD on every boot, so rotation is just "bump
-- the env var and restart the backend" — see docs/adr/024-* and the rotation
-- runbook in docs/DEPLOYMENT.md.
DO $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = 'grafana_reader') THEN
CREATE ROLE grafana_reader WITH LOGIN;
END IF;
END
$$;
GRANT CONNECT ON DATABASE ${flyway:database} TO grafana_reader;
GRANT USAGE ON SCHEMA public TO grafana_reader;
GRANT SELECT ON audit_log, documents, transcription_blocks TO grafana_reader;

View File

@@ -1,37 +0,0 @@
package org.raddatz.familienarchiv.config;
import org.junit.jupiter.api.Test;
import org.springframework.mock.env.MockEnvironment;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
class FlywayConfigTest {
@Test
void resolveGrafanaDbPassword_throws_when_env_unset() {
FlywayConfig config = new FlywayConfig(null, new MockEnvironment());
assertThatThrownBy(config::resolveGrafanaDbPassword)
.isInstanceOf(IllegalStateException.class)
.hasMessageContaining("GRAFANA_DB_PASSWORD is required");
}
@Test
void resolveGrafanaDbPassword_throws_when_env_blank() {
MockEnvironment env = new MockEnvironment().withProperty("GRAFANA_DB_PASSWORD", " ");
FlywayConfig config = new FlywayConfig(null, env);
assertThatThrownBy(config::resolveGrafanaDbPassword)
.isInstanceOf(IllegalStateException.class)
.hasMessageContaining("GRAFANA_DB_PASSWORD is required");
}
@Test
void resolveGrafanaDbPassword_returns_value_when_env_set() {
MockEnvironment env = new MockEnvironment().withProperty("GRAFANA_DB_PASSWORD", "abc");
FlywayConfig config = new FlywayConfig(null, env);
assertThat(config.resolveGrafanaDbPassword()).isEqualTo("abc");
}
}

View File

@@ -1,89 +0,0 @@
package org.raddatz.familienarchiv.config;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.data.jpa.test.autoconfigure.DataJpaTest;
import org.springframework.boot.jdbc.test.autoconfigure.AutoConfigureTestDatabase;
import org.springframework.context.annotation.Import;
import org.springframework.jdbc.core.JdbcTemplate;
import static org.assertj.core.api.Assertions.assertThat;
// GRAFANA_DB_PASSWORD is supplied via the global test default in
// src/test/resources/application.properties — FlywayConfig fails closed
// when it is unset, so all tests that load the migration path need it.
@DataJpaTest
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
@Import({PostgresContainerConfig.class, FlywayConfig.class})
class GrafanaReaderRoleIntegrationTest {
@Autowired JdbcTemplate jdbc;
// --- positive grants (SELECT on the three explicitly granted tables) ---
@Test
void grafana_reader_has_select_on_audit_log() {
assertThat(hasPrivilege("audit_log", "SELECT")).isTrue();
}
@Test
void grafana_reader_has_select_on_documents() {
assertThat(hasPrivilege("documents", "SELECT")).isTrue();
}
@Test
void grafana_reader_has_select_on_transcription_blocks() {
assertThat(hasPrivilege("transcription_blocks", "SELECT")).isTrue();
}
// --- write-deny on the granted tables: SELECT-only means SELECT-only.
// A future migration that GRANTs INSERT/UPDATE/DELETE on any of these
// would fail these tests, even though the original positive grants still
// pass. Locks the boundary in both directions.
@Test
void grafana_reader_has_no_INSERT_on_documents() {
assertThat(hasPrivilege("documents", "INSERT")).isFalse();
}
@Test
void grafana_reader_has_no_UPDATE_on_audit_log() {
assertThat(hasPrivilege("audit_log", "UPDATE")).isFalse();
}
@Test
void grafana_reader_has_no_DELETE_on_transcription_blocks() {
assertThat(hasPrivilege("transcription_blocks", "DELETE")).isFalse();
}
// --- negative grants: PII / sensitive tables MUST NOT be readable.
// The parameterized form catches the "someone widened the grant to
// ALL TABLES IN SCHEMA public" footgun — three specific positive grants
// would still pass while this sweep turns red.
@ParameterizedTest
@ValueSource(strings = {
"app_users",
"user_groups",
"persons",
"notifications",
"document_comments",
"document_annotations",
"geschichten"
})
void grafana_reader_has_no_SELECT_on_protected_table(String table) {
assertThat(hasPrivilege(table, "SELECT")).isFalse();
}
private boolean hasPrivilege(String table, String privilege) {
Boolean result = jdbc.queryForObject(
"SELECT has_table_privilege('grafana_reader', ?, ?)",
Boolean.class,
table,
privilege);
return Boolean.TRUE.equals(result);
}
}

View File

@@ -27,6 +27,7 @@ import org.springframework.security.test.context.support.WithMockUser;
import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.test.web.servlet.MockMvc;
import org.raddatz.familienarchiv.document.DocumentSearchItem;
import org.raddatz.familienarchiv.document.SearchMatchData;
import java.time.LocalDateTime;
@@ -129,13 +130,16 @@ class DocumentControllerTest {
@WithMockUser
void search_responseBodyItemsContainMatchData() throws Exception {
UUID docId = UUID.randomUUID();
Document doc = Document.builder()
.id(docId)
.title("Brief an Anna")
.originalFilename("brief.pdf")
.status(DocumentStatus.UPLOADED)
.build();
var matchData = new SearchMatchData(
"Er schrieb einen langen Brief", List.of(), false, List.of(), List.of(), List.of(), null, List.of());
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any()))
.thenReturn(DocumentSearchResult.of(List.of(new DocumentListItem(
docId, "Brief an Anna", "brief.pdf", null, null, null,
List.of(), List.of(), null, null, null, null,
0, List.of(), matchData))));
.thenReturn(DocumentSearchResult.of(List.of(new DocumentSearchItem(doc, matchData, 0, List.of()))));
mockMvc.perform(get("/api/documents/search").param("q", "Brief"))
.andExpect(status().isOk())
@@ -144,27 +148,6 @@ class DocumentControllerTest {
.value("Er schrieb einen langen Brief"));
}
@Test
@WithMockUser
void search_returns_flat_item_with_id_and_without_sensitive_fields() throws Exception {
UUID docId = UUID.randomUUID();
var matchData = new SearchMatchData(null, List.of(), false, List.of(), List.of(), List.of(), null, List.of());
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any()))
.thenReturn(DocumentSearchResult.of(List.of(new DocumentListItem(
docId, "Brief an Anna", "brief.pdf", null, null, null,
List.of(), List.of(), null, null, null, null,
0, List.of(), matchData))));
mockMvc.perform(get("/api/documents/search"))
.andExpect(status().isOk())
// flat id field present at top of item (not nested under $.items[0].document.id)
.andExpect(jsonPath("$.items[0].id").value(docId.toString()))
// sensitive storage fields must never appear in list response
.andExpect(jsonPath("$.items[0].transcription").doesNotExist())
.andExpect(jsonPath("$.items[0].filePath").doesNotExist())
.andExpect(jsonPath("$.items[0].fileHash").doesNotExist());
}
// ─── /api/documents/search pagination ─────────────────────────────────────
@Test

View File

@@ -127,7 +127,7 @@ class DocumentLazyLoadingTest {
PageRequest.of(0, 20));
assertThat(result.totalElements()).isGreaterThan(0);
assertThatCode(() ->
result.items().forEach(i -> { if (i.sender() != null) i.sender().getLastName(); }))
result.items().forEach(i -> i.document().getSender().getLastName()))
.doesNotThrowAnyException();
}

View File

@@ -1,98 +0,0 @@
package org.raddatz.familienarchiv.document;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.raddatz.familienarchiv.audit.AuditLogQueryService;
import org.raddatz.familienarchiv.ocr.TrainingLabel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.annotation.Import;
import org.springframework.data.domain.PageRequest;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.bean.override.mockito.MockitoBean;
import software.amazon.awssdk.services.s3.S3Client;
import java.util.HashSet;
import java.util.Set;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatCode;
/**
* AC #2: Document with trainingLabels does not cause LazyInitializationException in search.
* AC #3: Detail API still returns trainingLabels after the Document.list graph change.
*/
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.NONE)
@ActiveProfiles("test")
@Import(PostgresContainerConfig.class)
class DocumentListItemIntegrationTest {
@MockitoBean
S3Client s3Client;
@MockitoBean
AuditLogQueryService auditLogQueryService;
@Autowired
DocumentRepository documentRepository;
@Autowired
DocumentService documentService;
@AfterEach
void cleanup() {
documentRepository.deleteAll();
}
@Test
void search_doesNotThrow_whenDocumentHasTrainingLabels() {
documentRepository.save(Document.builder()
.title("Kurrent Brief")
.originalFilename("kurrent.pdf")
.status(DocumentStatus.UPLOADED)
.trainingLabels(new HashSet<>(Set.of(TrainingLabel.KURRENT_RECOGNITION)))
.build());
assertThatCode(() -> documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null,
PageRequest.of(0, 50)))
.doesNotThrowAnyException();
}
@Test
void search_returns_list_item_without_sensitive_fields_when_document_has_training_labels() {
documentRepository.save(Document.builder()
.title("Kurrent Brief")
.originalFilename("kurrent2.pdf")
.status(DocumentStatus.UPLOADED)
.trainingLabels(new HashSet<>(Set.of(TrainingLabel.KURRENT_RECOGNITION)))
.build());
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null,
PageRequest.of(0, 50));
assertThat(result.totalElements()).isGreaterThan(0);
DocumentListItem item = result.items().get(0);
assertThat(item.id()).isNotNull();
assertThat(item.title()).isEqualTo("Kurrent Brief");
}
@Test
void detail_stillReturnsTrainingLabels() {
Document saved = documentRepository.save(Document.builder()
.title("Detail Test")
.originalFilename("detail_test.pdf")
.status(DocumentStatus.UPLOADED)
.trainingLabels(new HashSet<>(Set.of(TrainingLabel.KURRENT_RECOGNITION)))
.build());
// Document.full entity graph (used by getDocumentById) must still load trainingLabels
Document loaded = documentService.getDocumentById(saved.getId());
assertThat(loaded.getTrainingLabels()).containsExactly(TrainingLabel.KURRENT_RECOGNITION);
}
}

View File

@@ -125,10 +125,10 @@ class DocumentSearchPagedIntegrationTest {
// No document id should appear on both pages — slicing must be exclusive.
var idsOnPage0 = page0.items().stream()
.map(item -> item.id())
.map(item -> item.document().getId())
.toList();
var idsOnPage1 = page1.items().stream()
.map(item -> item.id())
.map(item -> item.document().getId())
.toList();
for (UUID id : idsOnPage0) {
assertThat(idsOnPage1).doesNotContain(id);

View File

@@ -3,6 +3,8 @@ package org.raddatz.familienarchiv.document;
import io.swagger.v3.oas.annotations.media.Schema;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.audit.ActivityActorDTO;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.springframework.data.domain.PageRequest;
import java.util.List;
@@ -12,11 +14,14 @@ import static org.assertj.core.api.Assertions.assertThat;
class DocumentSearchResultTest {
private DocumentListItem item(UUID docId) {
return new DocumentListItem(
docId, "Test", "test.pdf", null, null, null,
List.of(), List.of(), null, null, null, null,
0, List.of(), SearchMatchData.empty());
private DocumentSearchItem item(UUID docId) {
Document doc = Document.builder()
.id(docId)
.title("Test")
.originalFilename("test.pdf")
.status(DocumentStatus.UPLOADED)
.build();
return new DocumentSearchItem(doc, SearchMatchData.empty(), 0, List.of());
}
@Test
@@ -40,7 +45,7 @@ class DocumentSearchResultTest {
@Test
void paged_factory_populates_paging_fields_from_pageable_and_total() {
List<DocumentListItem> slice = List.of(item(UUID.randomUUID()), item(UUID.randomUUID()));
List<DocumentSearchItem> slice = List.of(item(UUID.randomUUID()), item(UUID.randomUUID()));
DocumentSearchResult result = DocumentSearchResult.paged(slice, PageRequest.of(1, 50), 120L);
@@ -63,10 +68,9 @@ class DocumentSearchResultTest {
void of_exposes_items_with_completion_and_contributors() {
UUID id = UUID.randomUUID();
ActivityActorDTO actor = new ActivityActorDTO("AB", "#f00", "Anna Braun");
DocumentListItem item = new DocumentListItem(
id, "T", "t.pdf", null, null, null,
List.of(), List.of(), null, null, null, null,
75, List.of(actor), SearchMatchData.empty());
Document doc = Document.builder().id(id).title("T").originalFilename("t.pdf")
.status(DocumentStatus.UPLOADED).build();
DocumentSearchItem item = new DocumentSearchItem(doc, SearchMatchData.empty(), 75, List.of(actor));
DocumentSearchResult result = DocumentSearchResult.of(List.of(item));

View File

@@ -70,7 +70,7 @@ class DocumentServiceSortTest {
"Brief", null, null, null, null, null, null, null, DocumentSort.DATE, "DESC", null, PAGE);
assertThat(result.items()).hasSize(2);
assertThat(result.items().get(0).id()).isEqualTo(id2); // newer first
assertThat(result.items().get(0).document().getId()).isEqualTo(id2); // newer first
}
// ─── RELEVANCE sort — pure text (no filters) ──────────────────────────────
@@ -104,7 +104,7 @@ class DocumentServiceSortTest {
DocumentSearchResult result = documentService.searchDocuments(
"Brief", null, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, PAGE);
assertThat(result.items().get(0).id()).isEqualTo(id1);
assertThat(result.items().get(0).document().getId()).isEqualTo(id1);
}
@Test
@@ -121,7 +121,7 @@ class DocumentServiceSortTest {
DocumentSearchResult result = documentService.searchDocuments(
"Brief", null, null, null, null, null, null, null, null, null, null, PAGE);
assertThat(result.items().get(0).id()).isEqualTo(id1);
assertThat(result.items().get(0).document().getId()).isEqualTo(id1);
}
// ─── RELEVANCE sort — overflow guard ─────────────────────────────────────
@@ -156,7 +156,7 @@ class DocumentServiceSortTest {
DocumentSort.RELEVANCE, null, null, PAGE);
assertThat(result.items()).hasSize(1);
assertThat(result.items().get(0).id()).isEqualTo(uuidId);
assertThat(result.items().get(0).document().getId()).isEqualTo(uuidId);
}
// ─── RELEVANCE sort — text + active filter ────────────────────────────────

View File

@@ -11,7 +11,7 @@ import org.raddatz.familienarchiv.audit.AuditLogQueryService;
import org.raddatz.familienarchiv.audit.AuditService;
import org.raddatz.familienarchiv.document.annotation.AnnotationService;
import org.raddatz.familienarchiv.document.transcription.TranscriptionBlockQueryService;
import org.raddatz.familienarchiv.document.DocumentListItem;
import org.raddatz.familienarchiv.document.DocumentSearchItem;
import org.raddatz.familienarchiv.document.DocumentSearchResult;
import org.raddatz.familienarchiv.document.DocumentSort;
import org.raddatz.familienarchiv.document.DocumentUpdateDTO;
@@ -1444,7 +1444,7 @@ class DocumentServiceTest {
assertThat(result.totalPages()).isEqualTo(3);
assertThat(result.items()).hasSize(50);
// Page 1 (offset 50) under ascending sender sort should start at L050
assertThat(result.items().get(0).sender().getLastName()).isEqualTo("L050");
assertThat(result.items().get(0).document().getSender().getLastName()).isEqualTo("L050");
}
@Test
@@ -1565,7 +1565,7 @@ class DocumentServiceTest {
null, null, null, null, null, null, null, null, DocumentSort.SENDER, "asc", null, UNPAGED);
assertThat(result.items()).hasSize(2);
assertThat(result.items()).extracting(DocumentListItem::title).containsExactly("Has Sender", "No Sender");
assertThat(result.items()).extracting(item -> item.document().getTitle()).containsExactly("Has Sender", "No Sender");
}
// ─── searchDocuments — RECEIVER sort, empty receivers ───────────────────────
@@ -1584,7 +1584,7 @@ class DocumentServiceTest {
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null, DocumentSort.RECEIVER, "asc", null, UNPAGED);
assertThat(result.items()).extracting(DocumentListItem::title)
assertThat(result.items()).extracting(item -> item.document().getTitle())
.containsExactly("Has Receiver", "No Receivers");
}
@@ -1607,7 +1607,7 @@ class DocumentServiceTest {
null, null, null, null, null, null, null, null, DocumentSort.SENDER, "asc", null, UNPAGED);
// null lastName should sort to end (treated as empty), not before "smith" (as "null")
assertThat(result.items()).extracting(DocumentListItem::title)
assertThat(result.items()).extracting(item -> item.document().getTitle())
.containsExactly("smith doc", "Null lastname doc");
}

View File

@@ -154,10 +154,10 @@ class MassImportServiceTest {
.build();
when(documentService.findByOriginalFilename("doc001.pdf")).thenReturn(Optional.of(existing));
Optional<MassImportService.SkipReason> result = service.importSingleDocument(minimalCells("doc001.pdf"), Optional.empty(), "doc001.pdf", "doc001");
Optional<String> result = service.importSingleDocument(minimalCells("doc001.pdf"), Optional.empty(), "doc001.pdf", "doc001");
verify(documentService, never()).save(any());
assertThat(result).isPresent().contains(MassImportService.SkipReason.ALREADY_EXISTS);
assertThat(result).isPresent().contains("ALREADY_EXISTS");
}
// ─── importSingleDocument — already-exists guard fires before file I/O ─────
@@ -179,10 +179,10 @@ class MassImportServiceTest {
byte[] pdfHeader = {0x25, 0x50, 0x44, 0x46, 0x2D}; // %PDF-
Files.write(physicalFile, pdfHeader);
Optional<MassImportService.SkipReason> result = service.importSingleDocument(
Optional<String> result = service.importSingleDocument(
minimalCells("present.pdf"), Optional.of(physicalFile.toFile()), "present.pdf", "present");
assertThat(result).isPresent().contains(MassImportService.SkipReason.ALREADY_EXISTS);
assertThat(result).isPresent().contains("ALREADY_EXISTS");
verify(s3Client, never()).putObject(any(PutObjectRequest.class), any(RequestBody.class));
verify(documentService, never()).save(any());
}
@@ -204,7 +204,7 @@ class MassImportServiceTest {
assertThat(service.getStatus().skipped()).isEqualTo(1);
assertThat(service.getStatus().skippedFiles())
.extracting(MassImportService.SkippedFile::filename, MassImportService.SkippedFile::reason)
.containsExactly(org.assertj.core.groups.Tuple.tuple("upload_fail.pdf", MassImportService.SkipReason.S3_UPLOAD_FAILED));
.containsExactly(org.assertj.core.groups.Tuple.tuple("upload_fail.pdf", "S3_UPLOAD_FAILED"));
}
@Test
@@ -223,7 +223,7 @@ class MassImportServiceTest {
assertThat(service.getStatus().skipped()).isEqualTo(1);
assertThat(service.getStatus().skippedFiles())
.extracting(MassImportService.SkippedFile::reason)
.containsExactly(MassImportService.SkipReason.ALREADY_EXISTS);
.containsExactly("ALREADY_EXISTS");
}
// ─── importSingleDocument — create new document (metadata only) ───────────
@@ -283,11 +283,11 @@ class MassImportServiceTest {
doThrow(new RuntimeException("S3 error"))
.when(s3Client).putObject(any(PutObjectRequest.class), any(RequestBody.class));
Optional<MassImportService.SkipReason> result = service.importSingleDocument(
Optional<String> result = service.importSingleDocument(
minimalCells("fail.pdf"), Optional.of(tempFile.toFile()), "fail.pdf", "fail");
verify(documentService, never()).save(any());
assertThat(result).isPresent().contains(MassImportService.SkipReason.S3_UPLOAD_FAILED);
assertThat(result).isPresent().contains("S3_UPLOAD_FAILED");
}
// ─── importSingleDocument — sender handling ───────────────────────────────
@@ -438,110 +438,6 @@ class MassImportServiceTest {
verify(documentService).findByOriginalFilename("doc002.pdf");
}
// ─── isValidImportFilename — security regression — do not remove ─────────
@Test
void isValidImportFilename_returnsFalse_whenFilenameIsNull() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", (String) null);
assertThat(result).isFalse();
}
@Test
void isValidImportFilename_returnsFalse_whenFilenameIsBlank() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", " ");
assertThat(result).isFalse();
}
@Test
void isValidImportFilename_returnsFalse_whenFilenameContainsForwardSlash() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", "etc/passwd");
assertThat(result).isFalse();
}
@Test
void isValidImportFilename_returnsFalse_whenFilenameContainsBackslash() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", "..\\etc\\passwd");
assertThat(result).isFalse();
}
@Test
void isValidImportFilename_returnsFalse_whenFilenameContainsDotDot() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", "doc..evil.pdf");
assertThat(result).isFalse();
}
@Test
void isValidImportFilename_returnsFalse_whenFilenameIsDotDot() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", "..");
assertThat(result).isFalse();
}
@Test
void isValidImportFilename_returnsFalse_whenFilenameIsAbsolutePath() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", "/etc/passwd");
assertThat(result).isFalse();
}
@Test
void isValidImportFilename_returnsFalse_whenFilenameContainsNullByte() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", "file\0.pdf");
assertThat(result).isFalse();
}
@Test
void isValidImportFilename_returnsTrue_whenFilenameIsPlainBasename() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", "document.pdf");
assertThat(result).isTrue();
}
@Test
void isValidImportFilename_returnsFalse_whenFilenameContainsUnicodeDivisionSlash() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", "foobar.pdf");
assertThat(result).isFalse();
}
@Test
void isValidImportFilename_returnsFalse_whenFilenameContainsFullwidthSlash() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", "foobar.pdf");
assertThat(result).isFalse();
}
@Test
void isValidImportFilename_returnsFalse_whenFilenameContainsUnicodeReverseSolidus() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", "foobar.pdf");
assertThat(result).isFalse();
}
@Test
void isValidImportFilename_returnsTrue_whenFilenameHasLeadingDot() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", ".hidden.pdf");
assertThat(result).isTrue();
}
@Test
void isValidImportFilename_returnsTrue_whenFilenameHasSpaces() {
boolean result = ReflectionTestUtils.invokeMethod(service, "isValidImportFilename", "Brief an Oma.pdf");
assertThat(result).isTrue();
}
@Test
void processRows_skipsRowAndContinues_whenFilenameIsPathTraversal() {
when(documentService.findByOriginalFilename("legitimate.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
List<List<String>> rows = List.of(
List.of("header"),
minimalCells("../evil"), // row 1: path traversal — should be skipped
minimalCells("legitimate.pdf") // row 2: valid — should be processed
);
MassImportService.ProcessResult result = ReflectionTestUtils.invokeMethod(service, "processRows", rows);
assertThat(result.processed()).isEqualTo(1);
assertThat(result.skippedFiles())
.extracting(MassImportService.SkippedFile::reason)
.containsExactly(MassImportService.SkipReason.INVALID_FILENAME_PATH_TRAVERSAL);
}
// ─── importSingleDocument — non-blank optional fields ────────────────────
@Test
@@ -755,22 +651,7 @@ class MassImportServiceTest {
assertThat(spyService.getStatus().skipped()).isEqualTo(1);
assertThat(spyService.getStatus().skippedFiles())
.extracting(MassImportService.SkippedFile::reason)
.containsExactly(MassImportService.SkipReason.FILE_READ_ERROR);
}
// ─── findFileRecursive — symlink escape security regression — do not remove ─
@Test
void findFileRecursive_throwsDomainException_whenSymlinkEscapesImportDir(
@TempDir Path importDirPath, @TempDir Path outsideDir) throws Exception {
Path outsideFile = outsideDir.resolve("secret.pdf");
Files.writeString(outsideFile, "sensitive content");
Files.createSymbolicLink(importDirPath.resolve("secret.pdf"), outsideFile);
ReflectionTestUtils.setField(service, "importDir", importDirPath.toString());
assertThatThrownBy(() -> ReflectionTestUtils.invokeMethod(service, "findFileRecursive", "secret.pdf"))
.isInstanceOf(DomainException.class);
.containsExactly("FILE_READ_ERROR");
}
// ─── readOds — XXE security regression ───────────────────────────────────

View File

@@ -1,8 +1,2 @@
logging.level.root=WARN
logging.level.org.raddatz=INFO
# Default test value so FlywayConfig's fail-closed check passes without each
# test having to set GRAFANA_DB_PASSWORD explicitly. The actual value is
# irrelevant in tests — Flyway only uses it to set the grafana_reader role's
# password, which no test connects with.
GRAFANA_DB_PASSWORD=test-grafana-reader-password

View File

@@ -147,9 +147,6 @@ services:
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:-changeme}
GF_USERS_ALLOW_SIGN_UP: "false"
GF_SERVER_ROOT_URL: ${GF_SERVER_ROOT_URL:-http://localhost:3003}
# Read-only password for the grafana_reader PostgreSQL role; interpolated
# into the provisioned PostgreSQL datasource (see datasources.yml).
GRAFANA_DB_PASSWORD: ${GRAFANA_DB_PASSWORD}
volumes:
- grafana_data:/var/lib/grafana
- ./infra/observability/grafana/provisioning:/etc/grafana/provisioning:ro
@@ -168,7 +165,6 @@ services:
condition: service_healthy
networks:
- obs-net
- archiv-net # PO Overview dashboard queries archive-db via the grafana_reader role
# --- Error Tracking: GlitchTip ---

View File

@@ -227,9 +227,6 @@ services:
SPRING_DATASOURCE_URL: jdbc:postgresql://db:5432/archiv
SPRING_DATASOURCE_USERNAME: archiv
SPRING_DATASOURCE_PASSWORD: ${POSTGRES_PASSWORD}
# Consumed by Flyway V68 via the ${grafanaDbPassword} placeholder to set
# the read-only grafana_reader role's password.
GRAFANA_DB_PASSWORD: ${GRAFANA_DB_PASSWORD}
# Application uses the bucket-scoped service account, not MinIO root.
S3_ENDPOINT: http://minio:9000
S3_ACCESS_KEY: archiv-app
@@ -255,8 +252,6 @@ services:
OTEL_METRICS_EXPORTER: none
MANAGEMENT_METRICS_TAGS_APPLICATION: Familienarchiv
MANAGEMENT_TRACING_SAMPLING_PROBABILITY: ${MANAGEMENT_TRACING_SAMPLING_PROBABILITY:-0.1}
SENTRY_DSN: ${SENTRY_DSN:-}
LOGGING_STRUCTURED_FORMAT_CONSOLE: ecs
networks:
- archiv-net
healthcheck:
@@ -271,10 +266,6 @@ services:
build:
context: ./frontend
target: production
args:
# Vite build-time variable — baked into the JS bundle at build time.
# Empty default so deploys succeed before the secret is configured.
VITE_SENTRY_DSN: ${VITE_SENTRY_DSN:-}
restart: unless-stopped
depends_on:
backend:

View File

@@ -163,9 +163,6 @@ services:
SPRING_DATASOURCE_URL: jdbc:postgresql://db:5432/${POSTGRES_DB}
SPRING_DATASOURCE_USERNAME: ${POSTGRES_USER}
SPRING_DATASOURCE_PASSWORD: ${POSTGRES_PASSWORD}
# Consumed by Flyway V68 via the ${grafanaDbPassword} placeholder to set
# the read-only grafana_reader role's password.
GRAFANA_DB_PASSWORD: ${GRAFANA_DB_PASSWORD}
S3_ENDPOINT: http://minio:9000
S3_ACCESS_KEY: ${MINIO_ROOT_USER}
S3_SECRET_KEY: ${MINIO_ROOT_PASSWORD}

View File

@@ -152,7 +152,6 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `PORT_GRAFANA` | Host port for the Grafana UI (bound to `127.0.0.1` only) | `3003` | — | — |
| `POSTGRES_HOST` | PostgreSQL hostname for GlitchTip's db-init job and workers. Override when only the staging stack is running and `archive-db` is not resolvable by that name. | `archive-db` | — | — |
| `GRAFANA_ADMIN_PASSWORD` | Grafana `admin` user password | `changeme` | YES (prod) | YES |
| `GRAFANA_DB_PASSWORD` | Password for the read-only `grafana_reader` PostgreSQL role used by the PO Overview dashboard (issue #651). Consumed by Flyway V68 and the Grafana PostgreSQL datasource. Generate with `openssl rand -hex 32`. | — | YES (prod) | YES |
| `PORT_GLITCHTIP` | Host port for the GlitchTip UI (bound to `127.0.0.1` only) | `3002` | — | — |
| `GLITCHTIP_DOMAIN` | Public-facing base URL for GlitchTip (used in email links and CORS) | `http://localhost:3002` | YES (prod) | — |
| `GLITCHTIP_SECRET_KEY` | Django secret key for GlitchTip — generate with `python3 -c "import secrets; print(secrets.token_hex(32))"` | — | YES | YES |
@@ -257,7 +256,6 @@ git.raddatz.cloud A <server IP>
| `MAIL_USERNAME` | release.yml | SMTP user |
| `MAIL_PASSWORD` | release.yml | SMTP password |
| `GRAFANA_ADMIN_PASSWORD` | both | Grafana `admin` login — generate a strong password |
| `GRAFANA_DB_PASSWORD` | both | Read-only `grafana_reader` role password — `openssl rand -hex 32` |
| `GLITCHTIP_SECRET_KEY` | both | Django secret key — `openssl rand -hex 32` |
| `SENTRY_DSN` | both | GlitchTip project DSN — set after first-run (§4); leave empty to keep Sentry disabled |
| `VITE_SENTRY_DSN` | both | GlitchTip frontend project DSN — set after first-run (§4); leave empty to keep Sentry disabled |
@@ -359,7 +357,6 @@ Both files are passed explicitly via `--env-file` to the compose command, so the
| Gitea secret | Notes |
|---|---|
| `GRAFANA_ADMIN_PASSWORD` | Strong unique password; shared by nightly and release |
| `GRAFANA_DB_PASSWORD` | `openssl rand -hex 32`; shared by nightly and release — read-only DB role for the PO Overview dashboard |
| `GLITCHTIP_SECRET_KEY` | `openssl rand -hex 32`; shared by nightly and release |
| `STAGING_POSTGRES_PASSWORD` / `PROD_POSTGRES_PASSWORD` | Must match the running PostgreSQL container |
@@ -430,31 +427,6 @@ docker exec obs-loki wget -qO- \
Prometheus port `9090` and Grafana port `3003` (default; configurable via `PORT_GRAFANA`) are bound to `127.0.0.1` on the host. No other observability ports are host-bound.
##### Rotate the `grafana_reader` DB password
The PO Overview dashboard reads `audit_log`, `documents`, and `transcription_blocks` through the SELECT-only `grafana_reader` PostgreSQL role (issue #651, ADR-024). The role's password is owned by `R__grafana_reader_password.sql` — a Flyway *repeatable* migration that re-runs whenever the resolved `${grafanaDbPassword}` placeholder changes. That makes rotation a two-restart operation, no manual `psql` required.
```bash
# 1. Generate a new value
openssl rand -hex 32
# 2. Update both sides:
# - Gitea secret GRAFANA_DB_PASSWORD (nightly + release workflows pick it up)
# - Local .env on the server / dev machine
# 3. Restart the backend. Flyway sees that R__'s resolved checksum changed and
# re-applies it, issuing ALTER ROLE grafana_reader WITH PASSWORD '<new>'.
docker compose restart backend
# 4. Restart obs-grafana so the provisioned datasource picks up the new env value.
docker compose -f docker-compose.observability.yml restart obs-grafana
# 5. Verify the dashboard loads — PO Overview's Postgres panels should populate
# instead of "Data source error".
```
If `GRAFANA_DB_PASSWORD` is unset, the backend **refuses to start** (`IllegalStateException`). That is deliberate — see `FlywayConfig.resolveGrafanaDbPassword()` and the rationale in ADR-024.
#### GlitchTip
| Item | Value |

View File

@@ -80,14 +80,6 @@ _See also [DocumentStatus lifecycle](#documentstatus-lifecycle)._
**Sütterlin** — A specific standardized style of Kurrent taught in German schools from 1915 to 1941.
**Illegible word** — a word whose recognition confidence falls below the configured threshold; replaced with the literal token `[unleserlich]` in the rendered block text and counted in the `ocr_illegible_words_total` Prometheus counter.
**Models-ready gauge** — the `ocr_models_ready` Prometheus gauge, flipped from `0` to `1` once the FastAPI lifespan startup has finished loading the Kraken model and the spell-checker. Used both for the `/health` endpoint and as the supervised signal for the `ocr_models_ready < 1 for 2m` alert.
**Recognition model accuracy** — the accuracy reported by `ketos train` for the recognition (text-line) model, exposed as `ocr_model_accuracy{kind="recognition"}`. Sourced from `_parse_best_checkpoint` on the highest-scoring checkpoint after training.
**Segmentation model accuracy** — the accuracy reported by `ketos segtrain` for the baseline layout analysis (`blla`) model, exposed as `ocr_model_accuracy{kind="segmentation"}`. Distinct from recognition accuracy because the two models are trained and improved independently.
---
## Other Domain Terms

View File

@@ -118,14 +118,11 @@ To find a trace for a specific request in staging/production, either increase th
## Metrics (Prometheus → Grafana)
Prometheus scrapes two targets every 15 s:
Prometheus scrapes the backend management endpoint every 15 s:
```
Target: backend:8081/actuator/prometheus
Labels: job="spring-boot", application="Familienarchiv"
Target: ocr:8000/metrics
Labels: job="ocr-service"
```
All Spring Boot metrics carry the `application="Familienarchiv"` tag, which is how the Grafana Spring Boot Observability dashboard (ID 17175) filters to this service.
@@ -149,70 +146,6 @@ jvm_memory_used_bytes{area="heap", application="Familienarchiv"}
hikaricp_connections_active
```
### OCR-service custom metrics
Exposed at `ocr:8000/metrics` by `prometheus-fastapi-instrumentator`. The
`http_*` metrics describe the FastAPI request layer; the `ocr_*` series are
domain-specific. **Never label these with PII or document content** — labels
have unbounded cardinality risk and are visible to anyone with Grafana access.
| Metric | Type | Labels | Unit | What it tracks |
|---|---|---|---|---|
| `ocr_jobs_total` | Counter | `engine` (`surya`/`kraken`), `script_type` | jobs | OCR jobs that started after a successful PDF download |
| `ocr_pages_total` | Counter | `engine` | pages | Successfully OCR'd pages in the streaming generator |
| `ocr_skipped_pages_total` | Counter | — | pages | Pages skipped because the engine raised on them |
| `ocr_words_total` | Counter | — | words | Recognized words summed across every block |
| `ocr_illegible_words_total` | Counter | — | words | Words below the confidence threshold (rendered as `[unleserlich]`) |
| `ocr_processing_seconds` | Histogram | `engine` | seconds | Per-page (stream) or per-document (`/ocr`) engine time, excluding preprocessing |
| `ocr_training_runs_total` | Counter | `kind` (`recognition`/`segmentation`), `outcome` (`success`/`error`) | runs | Completed training runs |
| `ocr_model_accuracy` | Gauge | `kind` | ratio (01) | Latest accuracy reported by a successful training run |
| `ocr_models_ready` | Gauge | — | 0\|1 | 1 once the lifespan startup has finished loading models |
Canonical example queries (the same ones referenced in issue #652):
```promql
# OCR throughput by engine
sum by (engine) (rate(ocr_pages_total[5m]))
# Share of words rendered as [unleserlich]
sum(rate(ocr_illegible_words_total[5m]))
/ sum(rate(ocr_words_total[5m]))
# p95 page processing time per engine
histogram_quantile(0.95, sum by (engine, le) (
rate(ocr_processing_seconds_bucket[5m])
))
# Training error rate
sum(rate(ocr_training_runs_total{outcome="error"}[1h]))
/ sum(rate(ocr_training_runs_total[1h]))
# Latest recognition vs segmentation accuracy
ocr_model_accuracy
```
### Internal-only endpoints
`/metrics` is exposed by the OCR service over plain HTTP without
authentication. The container is reachable only on the internal Docker
network — Caddy never proxies to it directly. If the service is ever
exposed (e.g. a `ports:` mapping is added), block the endpoint at the
reverse proxy:
```caddy
ocr.example.com {
@internal_only path /metrics /health
respond @internal_only 404
reverse_proxy ocr:8000
}
```
The `MetricsPathFilter` in `ocr-service/main.py` suppresses uvicorn's
**stdout** access log lines for `/metrics` and `/health` so the container
console stays focused on real OCR traffic. Promtail/Loki still receive
access lines from any other source. Treat the filter as console
noise-control, not an audit-suppression mechanism.
## Errors (GlitchTip)
GlitchTip receives errors from both the backend (via Sentry Java SDK) and the frontend (via Sentry JavaScript SDK). It groups events by fingerprint, tracks first/last seen times, and links to the release that introduced the error.

View File

@@ -1,94 +0,0 @@
# ADR-023: Prometheus Instrumentator and Metrics Registry Injection
## Status
Accepted
## Context
Until issue #652 the OCR service exposed no `/metrics` endpoint. The
observability stack already scrapes the Spring Boot backend's actuator
endpoint, but it had nothing to scrape on the Python side. Without HTTP-
and domain-level metrics from `ocr-service` we cannot answer questions
like "what is the share of words rendered as `[unleserlich]`" or
"is the training error rate above its budget" from Grafana.
Two implementation requirements influenced the design:
1. **Counter / gauge isolation in tests.** `prometheus_client` collectors
are module-level singletons keyed by name on the global `REGISTRY`.
Re-importing or naively re-instantiating them raises a duplicated-
collector error and cross-test state leaks (a `.inc()` in test A is
still readable by test B). A test harness needs a way to swap the
active container for a fresh per-test instance.
2. **Minimal blast radius on the request path.** We did not want to
hand-instrument every endpoint with FastAPI middleware. The
`prometheus-fastapi-instrumentator` library already provides
`http_requests_total`, `http_request_duration_seconds`, and the
`/metrics` exposition route, all idiomatic Prometheus names.
## Decision
- Add `prometheus-fastapi-instrumentator==7.0.0` and pin its transitive
dependency `prometheus-client==0.25.0` explicitly in
`ocr-service/requirements.txt`.
- Mount the instrumentator once at module load:
`Instrumentator(excluded_handlers=["/health", "/metrics"]).instrument(app).expose(app)`.
This adds `/metrics` and an HTTP-level dashboard surface without
changing any endpoint code.
- Define every domain metric (`ocr_jobs_total`, `ocr_pages_total`,
`ocr_processing_seconds`, …) inside a `build_metrics(registry)`
factory in `ocr-service/metrics.py` that returns a frozen `OcrMetrics`
dataclass. Production code binds the container to the default
`REGISTRY` once: `metrics: OcrMetrics = build_metrics(REGISTRY)`.
- Tests use a `fresh_metrics` fixture that builds a new
`CollectorRegistry()` per test and monkeypatches `main.metrics` with
a container bound to it. The endpoint code keeps reading
`metrics.<name>` without knowing whether it is talking to the global
registry or a per-test one.
## Consequences
**Positive**
- One reusable factory captures the metric definitions; future metrics
go in one place.
- Tests run with full counter isolation. Cross-test state leakage is
impossible because each test sees its own dataclass instance.
- The instrumentator gives us `http_*` metrics for free, including a
Grafana-ready histogram that pairs with the Spring Boot one.
**Negative**
- One extra level of indirection: any test that asserts on metric
values must remember to monkeypatch `main.metrics`, not the registry
directly. Rebinding through the registry is harmless but useless —
the dataclass holds references to the original collectors.
- `prometheus-client` is now pinned. Upgrading it requires an explicit
bump and re-checking the instrumentator's compatibility range.
- `/metrics` is exposed unauthenticated and relies on the Docker
internal network for confidentiality. See
[docs/OBSERVABILITY.md §Internal-only endpoints](../OBSERVABILITY.md)
for the Caddy snippet that must be added if the service ever gets a
host-side port mapping.
## Alternatives considered
- **Hand-roll the `/metrics` endpoint.** Rejected: would have meant
duplicating what `prometheus-fastapi-instrumentator` ships, plus
middleware for the HTTP histograms.
- **Skip the factory; pass `registry` as a function argument
everywhere.** Rejected: clutters every endpoint signature and breaks
the symmetry with the Spring Boot side, which also relies on a
process-global Micrometer registry.
- **Use a `pytest` autouse fixture that resets `REGISTRY` between
tests.** Rejected: `prometheus_client` does not expose a clean
"unregister all" hook, and we would be relying on private APIs.
## References
- Issue: [#652](https://git.raddatz.cloud/marcel/familienarchiv/issues/652)
- Library: <https://github.com/trallnag/prometheus-fastapi-instrumentator>
- Code: `ocr-service/metrics.py`, `ocr-service/main.py`,
`ocr-service/test_metrics.py`

View File

@@ -1,123 +0,0 @@
# ADR-024: Grafana reads archive-db via a bridged network and a SELECT-only role
## Status
Accepted
## Context
Issue #651 (the PO Overview Grafana dashboard) needs aggregates over three
tables in the main application database — `audit_log`, `documents`, and
`transcription_blocks` — to answer the operator's four weekly questions: is
everything working, are people using it, is the archive making progress, is
OCR working well.
Until now, `obs-grafana` and the rest of the observability stack lived on
their own Docker network (`obs-net`) and never touched `archiv-net`, where
`archive-db` runs. The two were intentionally isolated: a compromise of any
observability container could not pivot to the application database.
The PO Overview's archive-progress and user-activity panels need rolling
7-day SQL aggregates that cannot be served by Prometheus or Loki. That
forces a connection from `obs-grafana` to `archive-db` for the first time.
Two implementation requirements shaped the design:
1. **Least privilege on the database side.** The Spring Boot application
role (`archiv`) has full read/write on every table. Letting Grafana
connect with that role would mean a Grafana compromise becomes an
application compromise. The dashboard only needs SELECT on three
tables; the role must reflect that and nothing more.
2. **Operational simplicity of secret rotation.** The role's password is
shared between the migration that sets it and the Grafana datasource
that uses it. A first version of this work put the password in a
versioned Flyway migration (V68), which Flyway only applies once —
leaving rotation as an out-of-band `psql ALTER ROLE` step that no
runbook documented. The shape must support rotation without manual
SQL.
## Decision
- Provision a dedicated PostgreSQL role `grafana_reader` with `LOGIN` plus
`GRANT SELECT` on `audit_log`, `documents`, `transcription_blocks` only.
No INSERT/UPDATE/DELETE on any table, no access to any other table —
enforced by the database, locked in by both positive and parameterized
negative tests in `GrafanaReaderRoleIntegrationTest`.
- Split the role's lifecycle across two migrations:
- `V68__add_grafana_reader_role.sql` — versioned, immutable, idempotent.
Creates the role and applies the grants. Runs exactly once per
database, like every other versioned migration.
- `R__grafana_reader_password.sql` — Flyway *repeatable* migration that
issues `ALTER ROLE grafana_reader WITH PASSWORD '${grafanaDbPassword}'`.
Flyway computes the checksum on the resolved content, so any change
to `GRAFANA_DB_PASSWORD` flips the checksum and re-applies the
migration on the next boot. Rotation becomes "bump env var, restart
backend, restart obs-grafana" — see the runbook in
`docs/DEPLOYMENT.md §4 → Rotate the grafana_reader DB password`.
- Resolve the password through Spring's `Environment` rather than a raw
`System.getenv()` call, so tests inject via `application.properties`
and the resolver is unit-testable with `MockEnvironment`. Fail closed
with `IllegalStateException` when the variable is unset — no fallback
string. Same shape as `UserDataInitializer`'s refusal to seed default
admin credentials outside dev/test/e2e.
- Join `obs-grafana` to `archiv-net` in addition to `obs-net`. Only the
Grafana container crosses the boundary; Loki, Tempo, Prometheus,
GlitchTip, and the worker containers remain `obs-net`-only.
## Consequences
**Positive**
- Database-level least privilege: a Grafana compromise gains SELECT on
three tables. Cannot write, cannot read PII tables like `app_users`,
`persons`, `notifications`, `document_comments`, `geschichten`. The
parameterized PII negative sweep in `GrafanaReaderRoleIntegrationTest`
is the regression gate; new sensitive tables get added to that list.
- Rotation is documented, idempotent, and survives operator turnover.
No "the password set on day 1 is the password forever" failure mode.
- Tests pin down both sides of the boundary: positive grants must hold,
write-deny must hold, and the PII negative list must stay empty.
**Negative / trade-offs**
- `obs-net` is no longer fully isolated from `archiv-net`. A Grafana RCE
(e.g. via a future Grafana CVE) gains a TCP path to `archive-db`
contained, but not impossible. The least-privilege role is the
mitigation; we accept that mitigation as sufficient for a single
bridged container.
- The backend must hold `GRAFANA_DB_PASSWORD` in its environment forever,
so Flyway can resolve the placeholder on every boot. A backend RCE
therefore also leaks the Grafana datasource password. Acceptable
because that password's blast radius is itself bounded by the
least-privilege grants on `grafana_reader`.
## Alternatives considered
- **Prometheus PostgreSQL exporter, no direct connection.** Loses ad-hoc
SQL aggregates — the dashboard would need every metric pre-defined as
an exporter query, with a redeploy to add a new one. The PO Overview
is the type of dashboard that grows panels over time; pre-defining
every aggregate is the wrong shape.
- **Read replica or logical-replication slot dedicated to Grafana.**
Real operational cost (extra Postgres instance, replication monitoring,
storage doubled) disproportionate to a weekly PO glance.
- **Versioned migration with `flyway repair` for rotation.** Rejected:
conflates schema lifecycle with credential lifecycle, requires manual
intervention to rotate, and the repair command's semantics are
surprising to operators unfamiliar with Flyway internals.
- **Hardcoded fallback password when env var is unset.** Rejected as a
security blocker: publishes a known credential for a role with read
access to user activity and full letter text. The fail-closed
behavior is the explicit defense.
## References
- Issue #651 — PO Overview Grafana dashboard
- `backend/src/main/resources/db/migration/V68__add_grafana_reader_role.sql`
- `backend/src/main/resources/db/migration/R__grafana_reader_password.sql`
- `backend/src/main/java/org/raddatz/familienarchiv/config/FlywayConfig.java`
- `backend/src/test/java/org/raddatz/familienarchiv/config/GrafanaReaderRoleIntegrationTest.java`
- `infra/observability/grafana/provisioning/datasources/datasources.yml`
- `docker-compose.observability.yml``archiv-net` bridge on `obs-grafana`
- `docs/DEPLOYMENT.md §4` — rotation runbook

View File

@@ -43,12 +43,9 @@ Rel(ocr, storage, "Fetches PDF via presigned URL", "HTTP / S3 presigned")
Rel(mc, storage, "Bootstraps bucket + service account on startup", "MinIO Client CLI")
Rel(promtail, loki, "Pushes log streams", "HTTP/Loki push API")
Rel(backend, tempo, "Sends distributed traces via OTLP", "HTTP / OTLP / port 4318 (archiv-net)")
Rel(prometheus, backend, "Scrapes JVM + HTTP metrics", "HTTP 8081 /actuator/prometheus")
Rel(prometheus, ocr, "Scrapes OCR + http_* metrics", "HTTP 8000 /metrics")
Rel(grafana, prometheus, "Queries metrics", "HTTP 9090")
Rel(grafana, loki, "Queries logs", "HTTP 3100")
Rel(grafana, tempo, "Queries traces", "HTTP 3200")
Rel(grafana, db, "Read-only dashboard queries via grafana_reader role", "PostgreSQL / archiv-net")
Rel(glitchtip, db, "Stores error events in glitchtip DB", "PostgreSQL / archiv-net")
Rel(obs_glitchtip_worker, obs_redis, "Processes Celery tasks", "Redis / obs-net")

View File

@@ -1,313 +0,0 @@
# Spreadsheet Analysis — Findings (2026-05-25)
Analysis of the **real raw archive** spreadsheets against the current `MassImportService`
(`backend/.../importing/MassImportService.java`). Goal: import ~7,600 letter rows + a
163-person register, with PDFs to follow.
Every issue has an ID (`IMP-NN`), severity, evidence, and a proposed approach.
---
## 0. Context: how the importer reads a row today
`MassImportService` reads **sheet index 0** and maps columns by configurable indices
(`app.import.col.*`, defaults in the source):
| Property | Default col | Meaning |
| --- | --- | --- |
| `colIndex` | 0 | Index (→ filename `<index>.pdf`) |
| `colBox` | 1 | Box |
| `colFolder` | 2 | Mappe |
| `colSender` | 3 | Sender (raw) |
| `colReceivers` | 5 | Receivers (raw) |
| `colDate` | 7 | Date |
| `colLocation` | 9 | Location |
| `colTags` | 10 | Tag (single) |
| `colSummary` | 11 | Summary |
| `colTranscription` | 13 | Transcription |
These defaults match the **ODS** file exactly (`Index, Box, Mappe, Von, BriefeschreiberIn,
An, EmpfängerIn, Datum, Datum Originalformat, Ort, Schlagwort, Inhalt, Zeitlicher Kontext,
Transkript` = 14 cols). The ODS was the development target. The new xlsx is a different beast.
Per-row pipeline: skip if Index blank → derive filename from Index → validate filename →
look for file on disk (recursive; metadata-only if absent) → check PDF magic bytes →
`importSingleDocument` (upsert by `originalFilename`, dedupe non-placeholders as
`ALREADY_EXISTS`). Date parsing is **ISO-only** (`LocalDate.parse`).
---
## IMP-01 — New xlsx column layout ≠ importer defaults 🔴 BLOCKER
The new `…aktuell…xlsx` (sheet `Familienarchiv`, 7,943 rows × 12 cols) has a **denser,
different** layout. There is an extra `Datei` column at index 1, and the normalized
`Von`/`An`/ISO-`Datum` columns from the ODS **do not exist**.
| col | New xlsx header | Importer default expects | Result with defaults |
| --- | --- | --- | --- |
| 0 | Index | Index | ✅ ok |
| 1 | **Datei** (path) | Box | ❌ Box ← `..\__scan\W-0001.pdf` |
| 2 | Box | Mappe | ❌ Mappe ← `V` |
| 3 | Mappe | Sender | ❌ Sender ← `1` |
| 4 | BriefeschreiberIn (sender) | — (unused) | ❌ sender ignored |
| 5 | EmpfängerIn (receiver) | Receivers | ✅ coincidentally ok |
| 6 | Datum des Briefes | — (unused) | ❌ date ignored |
| 7 | Ort (location) | Date | ❌ Date ← `Rotterdam` → null |
| 8 | Schlagwort (tag) | — (unused) | ❌ tag ignored |
| 9 | Inhalt (summary) | Location | ❌ Location ← summary text |
| 10 | — | Tag | ❌ empty |
| 11 | — | Summary | ❌ empty |
| 13 | — | Transcription | ❌ column doesn't exist |
**Impact:** importing as-is produces almost entirely garbage metadata.
**Proposed approach (decide with Marcel):**
- (a) Re-map via the existing `app.import.col.*` properties — fast, no code. New mapping:
`index=0, box=2, folder=3, sender=4, receivers=5, date=6, location=7, tags=8, summary=9`,
and there is **no** transcription column (point it past the end or add a "missing column"
convention). Caveat: tags land in `colTags` but the real per-letter keywords are in
`Inhalt` (col 9) — see IMP-08 note on tags vs summary.
- (b) Make the importer **header-driven** (map by header name, not index) so it survives
layout drift across files. More robust, needs a code change (→ Gitea issue).
Recommendation: (b) is the durable fix given we have ≥3 different layouts already.
---
## IMP-02 — 90% of dates are free-text the parser can't read 🔴 BLOCKER
The dates are written **as in the letter**. `parseDate()` only does `LocalDate.parse()`
(ISO `yyyy-MM-dd`), so anything non-ISO becomes `null`.
Of **7,319** rows with a date value (col 6):
| kind | count | parses today? |
| --- | --- | --- |
| Real Excel date cells (→ ISO via POI) | 748 | ✅ |
| Free-text date strings | 6,571 | ❌ → null |
**90% of dated rows lose their date.** (623 rows have no date at all.)
Observed free-text formats (counts approximate, from col 6):
| Format | Count | Examples |
| --- | --- | --- |
| `D.M.YY` | 1,338 | `11.10.08`, `13.5.09` |
| `D.RomanMonth.YY/YYYY` | ~1,527 | `22.III.18`, `19.XII.1954`, `1.III.27` |
| `D.Month YYYY` | 950 | `6.März 1888`, `9.März 1888` (note: **no space** after the dot) |
| `D.M.YYYY` | 358 | `15.2.1888`, `7.3.1888` |
| Approximate / unknown | 146 | `?`, `13.7.18?`, `17.Nov (?) 1887`, `13.Januar ? 1907` |
| `Month YYYY` / season / holiday | 41+27 | `Mai 1895`, `Herbst 1913`, `Pfingsten 1922`, `Ostern 1890` |
| `YYYY` only | 17 | `1905`, `1949` |
| `D.M.` no year | 10 | `8.9.`, `14.3.` |
| Ranges | 5+ | `8.1.1916 - 15.3.1916`, `1881/82`, `1945/46?` |
| Abbrev/English months, no space | many | `29.Sept.1891`, `10.Oct.95`, `9.December1889`, `18.Dez.1916` |
| Slash separator | ~315 | `2/2. 18`, `17/6. 1916`, `10/4. 1917` |
| English `Month D. YYYY` | several | `April 12. 1922`, `Oct.5. 1916`, `Mai 23. 1917` |
| Trailing notes | 5+ | `26.4.1888, 2. Brief`, `31.8.1888,2.Brief` |
| 3-digit year (typo) | 107 | `30.1.889` (→ 1889), `4.3.1023` (in person file → 1923) |
| Day-range within month | several | `7./8. Sept.1923` |
**Proposed approach:** build a tolerant German/historical date parser (→ Gitea issue, it's
a code change). Requirements:
- Numeric `D.M.YY[YY]` and `D/M. YY[YY]` (slash = dot).
- Roman-numeral months (`I``XII`).
- German + English month names, full + abbreviated, with/without separating space
(`März`, `Sept.`, `Dez`, `December`, `Oct.`).
- 2-digit and 3-digit year normalization (`08`→1908? needs a century rule; `889`→1889).
- Partial dates → store what's known. The schema only has a single `documentDate
LocalDate`; **decide** whether to (i) store first-of-month/year, (ii) add a
`datePrecision` enum + `dateOriginal` text column, or (iii) keep raw text in a new
`documentDateRaw` field and leave `documentate` null when imprecise. Recommendation:
preserve the **original string** always (new column) + best-effort parsed date +
precision flag, so nothing is lost and the UI can show "ca. 1916".
- Unparseable/approximate (`?`, `Herbst 1913`) → keep raw, leave parsed date null, **do
not drop the row**.
**Cross-check:** even after IMP-01 is fixed so the date column is read, IMP-02 still bites.
Both must be solved before a real import.
---
## IMP-03 — New xlsx has no normalized/ISO date or name columns 🔴 BLOCKER
The ODS had helper columns the importer relied on: `Von`/`An` (normalized names) and
`Datum` (ISO) alongside `Datum Originalformat`. The new xlsx has **only the raw**
`BriefeschreiberIn` / `EmpfängerIn` / `Datum des Briefes`. So:
- Names must be parsed from raw strings (PersonNameParser already does receivers; **sender
is taken raw, never split** — fine for senders, which are single, but no normalization).
- Dates must be parsed from raw (IMP-02).
This is the root reason IMP-01/02 exist: the new file is the *uncurated* source, not the
hand-normalized ODS. Tie any importer redesign to this reality — we will not get clean
helper columns in the 7k-row file.
---
## IMP-04 — Person register not imported at all 🟠 MAJOR
`Personendatei 2.xlsx` → sheet `Tabelle1`, **163 people**, columns:
`Generation, Familienname, Vorname, geb als (maiden), Geburtsdatum, Geburtsort,
Todesdatum, Sterbeort, verheiratet mit, Bemerkung`.
Today `MassImportService` has **no person-register import**. Persons are only
auto-created as bare aliases from the document sender/receiver strings
(`personService.findOrCreateByAlias`). All this rich genealogical data is unused:
- birth/death dates + places,
- maiden names (the key to dedup — see IMP-05),
- `verheiratet mit` (marriage links → `PersonRelationship` domain),
- `Bemerkung` relationship hints (`"Schwester v Marie Cram"`, `"Nichte von Herbert"`),
- `Generation` (G 1G 4),
- nicknames in quotes (`"Tante Lolly"`).
Data-quality notes in this file too: multi-value `Vorname` (`Charlotte,Meta,Jacobi`);
mixed Excel-date vs text dates; typos (`4.3.1023`); missing-day dates (`.12.1955`);
trailing spaces (`30.8.1862 `).
**Proposed approach:** a separate **Person import** (→ Gitea issue). Order matters: import
persons *first* so documents can link to real people instead of creating alias stubs.
Use `geb als` + `verheiratet mit` to pre-build the alias/relationship graph.
---
## IMP-05 — Name variations create duplicate Persons 🟠 MAJOR
The same person appears under several surface forms across the document sheet:
- `Eugenie Müller` (151) vs `Eugenie de Gruyter` (452) — maiden vs married.
- `Clara Cram` (sender 1,284) vs `Clara de Gruyter` (455) vs `Clara de Gruyter sen.` (66).
- `Walter de Gruyter` (589) vs bare `Walter` (78).
`findOrCreateByAlias` keys on the raw string, so each variant becomes (or matches) a
distinct alias and likely a **distinct Person**. Result: fragmented person records,
broken Briefwechsel pairing, wrong stats.
**Proposed approach:** drive dedup from the register's `geb als` column (IMP-04) —
`Eugenie de Gruyter geb Müller` tells us the two strings are one person. Build an alias
map (married ↔ maiden ↔ nickname) before/while importing documents. This is partly data
(an alias mapping table/sheet) and partly code (consume it). Likely a Gitea issue once the
mapping format is decided.
945 distinct sender strings / 274 distinct receiver strings — expect a long-tail of
variants to reconcile. Don't try to be perfect on the first pass; get the high-frequency
names right.
---
## IMP-06 — 93 data rows with blank Index are silently dropped 🟠 MAJOR
`processRows` does `if (index.isBlank()) continue;`. **93 rows** have a blank Index but
carry other data (sender/receiver/date/etc.). These are silently skipped — they don't even
appear in the `skippedFiles` report (that list only covers rows that *had* an index but
failed file checks).
**Proposed approach:** before import, triage these 93 rows — are they continuation rows,
section markers, or genuine letters missing an ID? At minimum, surface a count/warning so
nothing vanishes unnoticed. Possibly a small importer change to report blank-index skips.
---
## IMP-07 — 43 duplicate Index values 🟡 MINOR
43 Index values repeat (e.g. `W-0388`, `Eu-0332`, `C-0234`, `C-0235`, `C-0236`, `J-0175`).
Since the filename is derived from Index, the importer's upsert keys both rows on the same
`originalFilename`: the second occurrence is treated as `ALREADY_EXISTS` (if the first
isn't a placeholder) and **its metadata is lost**, or it overwrites a placeholder.
**Proposed approach:** list the 43 duplicates, check whether they're true duplicates or
two distinct letters that share an ID by mistake. Fix in the source data, or extend the ID
scheme. Data task first; software only if the ID scheme must change.
---
## IMP-08 — Section/title rows interleaved with data 🟡 MINOR
Row 2 of the sheet is a section header sitting only in the sender column
(`Brautbriefe von Walter der Gruyter an Eugenie Müller`) with a blank Index — caught by the
blank-Index skip (overlaps IMP-06). There may be more such banners scattered through 7,943
rows. Also relevant: the per-letter **keywords live in `Inhalt` (col 9)** as comma-joined
values (`Tilburg,Verwandschaft`, `poetisch,Reise nach Breda`), while `Schlagwort` (col 8)
holds a single broad tag (`Brautbriefe`). The importer only takes **one** tag column —
decide which column feeds tags vs summary, and whether to split comma-lists into multiple
tags.
**Proposed approach:** scan for rows where Index is blank but other cells are set (already
have the count: relates to the 93 in IMP-06). Confirm tag vs summary column choice with
Marcel.
---
## IMP-09 — Index ↔ Datei filename mismatches 🟡 MINOR
The `Datei` column (col 1) holds explicit relative paths (`..\__scan\W-0001.pdf`) but they
don't always agree with the Index. Example: row 20 has Index `W-0010x` but Datei
`..\__scan\W-0011x.pdf`. The importer derives the filename from **Index**, so it will look
for `W-0010x.pdf` and may miss the actual scan. (Note: the `Datei` paths themselves are
Windows-style with `\` and `..` and would be **rejected** by `isValidImportFilename` if anyone
tried to use that column directly — 7,623 rows use backslashes, 7,455 contain `..`.)
**Proposed approach:** when the PDFs arrive, reconcile Index-derived names against actual
filenames; produce a mismatch report. Keep deriving from Index (stable IDs) but flag
disagreements. Mostly a data/QA task.
---
## IMP-10 — `x`-suffix rows (letter backsides / enclosures) 🟡 MINOR
**42 rows** have an `x`-suffixed Index (`W-0001x`, `W-0002x`, …). They're sparse — typically
only Index + Datei + sender + receiver, no box/folder/date. They appear to be the reverse
side or an enclosure of the preceding letter. The importer treats each as an independent
Document, and the `metadataComplete` heuristic flags them complete as soon as a sender is
present (date/box/folder all missing).
**Proposed approach:** decide whether `x` rows should be (a) separate documents, (b) extra
pages/files attached to their parent, or (c) skipped. Affects both the data model and the
`metadataComplete` heuristic. Discuss with Marcel.
---
## IMP-11 — Multi-receiver separators include bare `u` / `u.` 🟡 MINOR
`PersonNameParser.parseReceivers` already handles ` und `, ` u `, `//`, `geb.`,
parenthesised shared surnames, and `Familie` filtering — good. But the real data also uses
the abbreviation in forms the top-receivers list shows are common:
`Eugenie u Walter de Gruyter` (230), `Herbert u Clara` (94), `Juan u Marie Cram` (75),
and space-joined pairs like `Ella Anita` (79) that may be two people.
Raw separator tally on receivers: ` und ` ×70, `,` ×11, `;` ×2, `/` ×1 — plus the many ` u `
cases above. Senders are **not** parsed at all (taken raw), which is fine unless a sender
cell ever holds two names.
**Proposed approach:** add `MassImportServiceTest` cases for the real-world strings above;
extend the parser only where it actually fails. `Ella Anita`-style space-joined pairs are
ambiguous — likely leave as one person unless the register says otherwise (ties to IMP-05).
---
## IMP-12 — Importer reads only the first sheet, no validation 🟡 MINOR
`readXlsx` does `workbook.getSheetAt(0)`. For the new xlsx that's `Familienarchiv` (✅), but
the file also contains `Inhaltsverzeichnis grob`, `Inhaltsverzeichnis WdG`, `Tabelle4`.
There is no header validation: if the wrong file/sheet is dropped in `/import`, the importer
will happily map columns positionally and import nonsense. Also `findSpreadsheetFile()` picks
the **first** spreadsheet found in `/import` — with three spreadsheets present there today,
which one wins is filesystem-order-dependent.
**Proposed approach:** (a) validate the header row against expected names before importing;
(b) make the target sheet/file explicit (config or header match) rather than "first found".
Ties into the header-driven mapping in IMP-01(b).
---
## Summary of recommended sequencing
1. **Decide the importer mapping strategy** (IMP-01): positional re-config vs header-driven.
Header-driven is the durable choice and unblocks IMP-03/12.
2. **Build the tolerant date parser** (IMP-02) with original-string preservation + precision.
3. **Import the Person register first** (IMP-04) and build the alias/marriage graph,
which feeds person dedup (IMP-05).
4. **Then import documents**, with reporting for blank-index (IMP-06), duplicates (IMP-07),
and section rows (IMP-08).
5. **Reconcile files** when the ~7,000 PDFs arrive (IMP-09), and decide `x`-row semantics
(IMP-10).
Code-change items (→ Gitea issues when we get there): IMP-01(b), IMP-02, IMP-04, IMP-05
(consume side), IMP-06 reporting, IMP-12. Pure-data items stay in this folder.

View File

@@ -1,386 +0,0 @@
# Spec — Import Normalizer
> Authored in the voice of **"Elicit"**, requirements engineer (see
> `.claude/personas/req_engineer.md`). This is a requirements artifact: it states
> *what* the normalizer must do and *how we'll know it's done*, in problem/behaviour
> language. Technology choices already made during brainstorming (Python, openpyxl,
> overrides-and-rerun) are recorded as **constraints**, not re-litigated here.
- **Status:** Draft for review
- **Date:** 2026-05-25
- **Related:** [`01-findings-spreadsheet-analysis.md`](./01-findings-spreadsheet-analysis.md) (issues `IMP-01..12`), [`README.md`](./README.md)
- **Scope boundary:** This spec covers the **offline normalizer** that turns the raw
spreadsheets into a clean, canonical dataset + review artifacts. Wiring the canonical
contract into the Java `MassImportService` and the `Document`/`Person` model is **Phase 2**
and gets its own spec. This spec only *defines the contract* Phase 2 must satisfy.
---
## 1. Project Brief
**Vision.** Turn the family's human-curated, free-form archive spreadsheets into a clean,
canonical dataset that imports deterministically — without hand-editing thousands of rows
and without losing the historical nuance of how things were originally written.
**Problem.** The real archive (`…aktuell…xlsx`, 7,943 rows) and the person register
(`Personendatei 2.xlsx`, 163 people) were authored for humans to read, not machines to
import. Dates are written as they appeared in each letter (≈90% unparseable by the current
importer), the column layout differs from what the importer expects, and the same person
appears under many names. Importing as-is produces garbage (see `IMP-01..12`).
**Goal (measurable).**
- G1 — After the automated pass, **≤ 5%** of dated rows remain `UNKNOWN`; after the
overrides-iteration loop, **≤ 0.5%**.
- G2 — **100%** of source rows are represented in the canonical output or in a review file —
*zero silent drops*.
- G3 — **100%** of original values (raw date string, raw name string, source row number)
are preserved.
- G4 — A full run over the current inputs completes in **< 60 s** on the dev laptop and is
**content-deterministic** when re-run with unchanged inputs+overrides: identical canonical
cell matrices and identical review-file contents. (Workbook metadata is pinned; literal xlsx
byte-identity is not guaranteed because the zip container stores entry metadata.)
**Primary actor.** Marcel — solo owner & data steward (tech comfort 4/5). Also: a future
agent re-running the pipeline; and the `MassImportService` as the downstream consumer.
**Non-Goals (explicitly out of scope).**
- NG1 — Changing `MassImportService` or the DB schema (that is Phase 2).
- NG2 — Uploading/attaching the ~7,000 PDFs (they arrive later; import matches by `index`).
- NG3 — A GUI. The interface is spreadsheets in, CSVs out, an overrides file hand-edited.
- NG4 — Perfect genealogical reconstruction. We resolve confidently-matchable people; the
long tail stays as provisional persons.
- NG5 — OCR/transcription content (the new xlsx has no transcription column).
**Key assumptions.** (A1) Sheet `Familienarchiv` is the document source of truth.
(A2) Archive date range is **18731957** (drives the 2-digit-year century rule).
(A3) `index` is the stable document key and the basis for future PDF matching.
(A4) `Schlagwort` is a broad tag; `Inhalt` is a short summary/topic.
**Risks.** (R1) 2-digit/partial dates are genuinely ambiguous → mitigated by precision flag
+ overrides. (R2) Name matching false-positives merge distinct people → mitigated by
conservative matching + review before merge. (R3) Source spreadsheet may be re-exported with
layout drift → mitigated by header-name-based mapping, not fixed indices.
---
## 2. Personas
**Marcel — Data Steward.** Role: solo owner of Familienarchiv. Context: holds the complete
raw archive; PDFs follow. Tech comfort: 4/5 (semi-technical, reads CSV/spreadsheets fluently,
not keen to hand-edit 7,600 rows). Primary goal: a clean, importable dataset he trusts.
Frustrations: dates in ~20 formats; one ancestor under 4 name variants. **JTBD:** *"When I
have raw, human-curated archive spreadsheets, I want to transform them into a clean importable
dataset without losing how things were originally written, so I can load the archive and keep
correcting edge cases as they surface."*
**The Returning Agent.** Role: a future assistant session resuming the work. Goal: re-run the
pipeline deterministically and understand exactly what still needs human input. **JTBD:**
*"When I pick this up cold, I want one command and a clear residue report, so I can continue
without re-deriving context."*
---
## 3. Constraints & Decisions Already Made
These were settled during brainstorming and are fixed inputs to the requirements below.
| # | Decision | Rationale |
| --- | --- | --- |
| C1 | **New canonical layout** with explicit headers (not the old positional ODS shape). | Fits the new data; importer becomes header-driven in Phase 2. |
| C2 | Dates stored as **parsed (nullable) + raw + precision**. | Historical archive; never lose the original; enable "ca. 1916". |
| C3 | **Include person resolution** (register + alias/marriage map → canonical persons) in this effort. | Maiden-name dedup needs the register. |
| C4 | **Overrides-file + re-run** loop for residue. | Deterministic, diffable, repeatable. |
| C5 | Implementation: **Python 3.12 + openpyxl**, standalone tool at `tools/import-normalizer/`. | Fast iteration; no Spring rebuild / coverage gate on transform code. |
| C6 | Century rule for archive **18731957**: 2-digit `0057``19YY`, `7399``18YY`, `5872`→**flag**; 3-digit `DDD``1DDD`; never 20xx. | Stated by Marcel. Boundaries live in config. |
| C7 | `Schlagwort`→tag, `Inhalt`→summary. | Matches importer's existing semantics. |
| C8 | Non-register correspondents become **provisional persons**. | ~945 distinct sender strings vs 163 register people. |
---
## 4. Functional Requirements
Each requirement has a stable ID. User stories use Connextra + Given-When-Then; system rules
use EARS. Traceability to findings in §8.
### 4.1 Ingest & layout (`FR-INGEST`, `FR-MAP`)
**US-MAP-01** — *As the data steward, I want each source column mapped to a named canonical
field regardless of its position, so a re-exported spreadsheet with shifted columns still
imports correctly.*
- AC1 — Given the `Familienarchiv` sheet, when the normalizer reads the header row, then it
maps columns by **header name** (not fixed index) to the canonical fields.
- AC2 — Given a header the normalizer does not recognise, when it runs, then it records the
unknown header in `review/summary.txt` and continues (does not crash).
- AC3 — Given a required source header is **absent**, when it runs, then it aborts with a
clear message naming the missing header (fail loud, before producing partial output).
- **REQ-INGEST-01** — The normalizer shall read only the `Familienarchiv` sheet of the
document workbook and the `Tabelle1` sheet of the person workbook.
- **REQ-MAP-01** — Header matching shall be case-insensitive and tolerant of internal
multiple spaces (e.g. `"Datum des Briefes"`).
### 4.2 Row triage (`FR-TRIAGE`) — resolves IMP-06, IMP-07, IMP-08
**US-TRIAGE-01** — *As the data steward, I want rows that have data but no index surfaced
rather than dropped, so I never lose a letter silently.*
- AC1 — Given a row whose `index` is blank but which has any other non-empty cell, when the
normalizer runs, then that row is written to `review/blank-index-rows.csv` with its source
row number and is **not** emitted as a canonical document.
- AC2 — Given a fully empty row, when it runs, then the row is skipped and counted (not
reported as an anomaly).
- **REQ-TRIAGE-01** — If two or more rows resolve to the same `index`, then the normalizer
shall emit all of them to `review/duplicate-index.csv` and mark each canonical row
`needs_review = duplicate_index` (it shall **not** silently drop either).
- **REQ-TRIAGE-02** — Where a row is identified as a section/banner row (blank index, text
only in a name column), the normalizer shall classify it as such in the blank-index report.
- **REQ-TRIAGE-03** — Rows whose `index` ends in `x` (a transcription/back-side of the base
letter, not yet independently mappable) shall be **skipped** — not emitted as a canonical
document — and written to `review/skipped-x-suffix.csv` with their source row and base index
(`index` minus the trailing `x`), so they can be linked in a later pass. (Resolves IMP-10.)
### 4.3 Date normalization (`FR-DATE`) — resolves IMP-02, IMP-03
**US-DATE-01** — *As the data steward, I want every date interpreted as precisely as the
source allows, with the original always kept, so I can sort the archive and still see what the
letter actually said.*
- AC1 — Given a parseable date, when normalized, then `date_iso` holds the best-effort ISO
date, `date_raw` holds the verbatim source string, and `date_precision`
`{DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN}`.
- AC2 — Given an unparseable date, when normalized, then `date_iso` is empty,
`date_precision = UNKNOWN`, `date_raw` is preserved, and the value appears in
`review/unparsed-dates.csv`.
- AC3 — Given the same `date_raw` appears in `overrides/dates.csv`, when normalized, then the
override's `(iso, precision)` wins over the automatic parse.
- **REQ-DATE-01** — The parser shall accept, at minimum, these forms (see §10 examples):
Excel/ISO; `D.M.YYYY`/`D.M.YY`; `D/M. YY[YY]` (slash treated as dot); Roman-numeral months
`IXII`; German + English month names, full and abbreviated, with or without a separating
space; `Month YYYY`; season/holiday + year; bare `YYYY`; and start-anchored ranges.
- **REQ-DATE-02** — Precision shall be assigned by what is known: full day → `DAY`; month+year
`MONTH` (day = 1); a **named feast/holiday + year** → resolved to its **actual calendar
date for that year** → `DAY`; a **season + year** → representative mid-season month (day = 1)
`SEASON`; year only → `YEAR` (month = Jan, day = 1); a range → start date + `RANGE`; a
value carrying an uncertainty marker (`?`, `um`, `ca`, `circa`) → `APPROX` with best-effort date.
- **REQ-DATE-03** — Two-digit and three-digit years shall be expanded per **C6**; a 2-digit
year in `5872` shall yield `UNKNOWN` + a review entry rather than a guess.
- **REQ-DATE-04** — Trailing editorial notes (e.g. `", 2. Brief"`) shall be stripped before
parsing and preserved (kept within `date_raw`; not invented into the date).
- **REQ-DATE-05** — The parser shall be pure and side-effect-free so it can be unit-tested in
isolation (see NFR-TEST-01).
- **REQ-DATE-06** — **Movable feasts are never mapped to a fixed month**; they shall be
computed per year from Easter (Gauss/Butcher computus): Karfreitag = Easter2, Ostern =
Easter Sunday, Himmelfahrt = Easter+39, Pfingst(sonntag) = Easter+49, Pfingstmontag =
Easter+50, Fronleichnam = Easter+60, 1.4. Advent = the 4th…1st Sunday before 25 Dec. Fixed
feasts use a lookup table (Neujahr=01-01, Heiligabend=12-24, Weihnachten=12-25,
Silvester=12-31, …). Seasons map to representative months: Frühling/Frühjahr=Apr, Sommer=Jul,
Herbst=Oct, Winter=Jan. The feast/season tables and Easter algorithm live in `config.py`
(NFR-MAINT-01).
### 4.4 Person resolution & dedup (`FR-PERS`, `FR-DEDUP`) — resolves IMP-04, IMP-05, IMP-11
**US-PERS-01** — *As the data steward, I want the genealogical register turned into canonical
people with all their known facts, so documents can link to real persons.*
- AC1 — Given a register row, when parsed, then a canonical person is produced with
`person_id`, name parts, `maiden_name`, birth/death (parsed + raw + place), spouse,
generation, nickname, notes — applying the same date rules as §4.3 to birth/death dates.
- AC2 — Given multi-value given names (`"Charlotte,Meta,Jacobi"`), when parsed, then the
primary given name is the first; the remainder are retained as additional names/aliases.
**US-PERS-02** — *As the data steward, I want each sender/receiver string matched to a
canonical person where possible and never dropped otherwise, so the correspondence graph is
complete.*
- AC1 — Given a sender/receiver string, when resolved, then it maps to a register
`person_id` via the alias index (exact → normalized/casefold → conservative fuzzy).
- AC2 — Given no confident match, when resolved, then a **provisional person** is created from
the cleaned string, linked, and listed in `review/unmatched-names.csv` (occurrence count +
example source rows).
- AC3 — Given the string appears in `overrides/names.csv`, when resolved, then it maps to the
specified `person_id` (override wins).
- AC4 — Given a multi-person receiver cell (`"Eugenie u Walter de Gruyter"`, `"Herbert u
Clara"`, `"…//…"`, `"Hedi und Tutu (Gruber)"`), when resolved, then it is split into
individual people, each resolved independently; ambiguous space-joined pairs
(`"Ella Anita"`) are emitted to `review/ambiguous-receivers.csv` rather than guessed.
- **REQ-DEDUP-01** — The alias index shall be derived from the register: canonical
"First Last", maiden form (`geb als`), spouse-surname married form, nickname, and
first-name-only **only when unambiguous** across the register.
- **REQ-DEDUP-02** — The normalizer shall not merge two distinct strings into one person on
fuzzy similarity alone above a configured threshold without the match being reported; merges
must be auditable.
- **REQ-PERS-01** — Sender cells shall be parsed for multi-person content using the same rules
as receiver cells (today the importer parses only receivers — IMP-11).
### 4.5 Overrides & idempotency (`FR-OVR`) — supports the iteration loop
- **REQ-OVR-01** — When the normalizer runs, then it shall load `overrides/dates.csv` and
`overrides/names.csv` if present and apply them; absence of either file shall not be an error.
- **REQ-OVR-02** — While overrides are unchanged and inputs are unchanged, re-running shall
produce **byte-identical** canonical outputs and review files (NFR-IDEM-01).
- **REQ-OVR-03** — Each override application shall be counted in `review/summary.txt` (how many
dates/names were resolved by override vs automatically).
### 4.6 Canonical output & provenance (`FR-OUT`, `FR-PROV`) — resolves IMP-01, IMP-09, IMP-12
- **REQ-OUT-01** — The normalizer shall write `out/canonical-documents.xlsx` and
`out/canonical-persons.xlsx` with the headered schemas in §6.
- **REQ-PROV-01** — Every canonical document row shall carry `source_row` (1-based row number
in the source sheet) so any value can be traced back to the original.
- **REQ-PROV-02** — Every canonical row shall carry a `needs_review` field listing zero or more
flags (`duplicate_index`, `unparsed_date`, `unmatched_sender`, `unmatched_receiver`,
`index_file_mismatch`, …) so the import and the UI can foreground uncertain data.
- **REQ-OUT-02** — Where the source `Datei` path disagrees with the index-derived filename
(IMP-09), the normalizer shall record the discrepancy in `review/index-file-mismatch.csv`
and flag the row; it shall **not** alter the `index` (the stable key).
---
## 5. Non-Functional Requirements
| ID | Category | Requirement (measurable) |
| --- | --- | --- |
| NFR-DATA-01 | Data integrity | 100% of source rows are accounted for in output **or** a review file; 100% of original date/name strings preserved verbatim. |
| NFR-IDEM-01 | Determinism | Identical inputs + overrides ⇒ identical *logical* output across runs/machines: identical canonical cell matrices and review-file contents. Workbook `created`/`modified` metadata is pinned to a constant; ordering of all generated rows/aliases is stable (no set-iteration leakage). xlsx byte-identity is explicitly not required — determinism is asserted on content. |
| NFR-PERF-01 | Performance | Full run over 7,943 doc rows + 163 person rows completes in < 60 s on the dev laptop. |
| NFR-ACCUR-01 | Date accuracy | After automated pass, `UNKNOWN` dates ≤ 5% of dated rows; after overrides iteration, ≤ 0.5%. |
| NFR-ACCUR-02 | Name coverage | Every sender/receiver occurrence yields a linked person (register or provisional); 0 dropped. |
| NFR-I18N-01 | Encoding | UTF-8 end-to-end; German diacritics and ß round-trip with no mojibake in any output. |
| NFR-TEST-01 | Testability | `dates.py` and `persons.py` have pytest tests covering every format/alias category in §10 with real examples from the archive. |
| NFR-MAINT-01 | Maintainability | Column-name map, century boundaries, season→month map, and fuzzy threshold live in `config.py`, not inline in logic. |
| NFR-OBSERV-01 | Observability | `review/summary.txt` reports per-run stats: rows in, documents out, dates by precision, names matched vs provisional, overrides applied, anomalies by type. |
| NFR-SAFETY-01 | Source safety | Source workbooks are opened read-only and never written. |
---
## 6. Data Dictionary (canonical contract)
This is the contract Phase 2 (the importer) must consume. Field-level, format-level — not a
DB schema.
### 6.1 `canonical-documents.xlsx`
| Field | Required | Format / values | Notes |
| --- | --- | --- | --- |
| `index` | yes | string | Stable key; basis for PDF matching. |
| `box` | no | string | from `Box`. |
| `folder` | no | string | from `Mappe`. |
| `sender_person_id` | no | person_id | resolved; empty if no sender. |
| `sender_name` | no | string | canonical display name (or cleaned raw if provisional). |
| `receiver_person_ids` | no | `id\|id\|…` | pipe-separated. |
| `receiver_names` | no | `name\|name\|…` | pipe-separated, aligned with ids. |
| `date_iso` | no | `YYYY-MM-DD` | best-effort; empty if `UNKNOWN`. |
| `date_raw` | no | string | verbatim source date. |
| `date_precision` | yes | enum | `DAY\|MONTH\|SEASON\|YEAR\|RANGE\|APPROX\|UNKNOWN`. |
| `location` | no | string | from `Ort`. |
| `tags` | no | `tag\|tag` | from `Schlagwort`. |
| `summary` | no | string | from `Inhalt`. |
| `source_row` | yes | int | provenance (NFR-DATA-01). |
| `needs_review` | yes | `flag\|flag` or empty | review flags (REQ-PROV-02). |
### 6.2 `canonical-persons.xlsx`
| Field | Required | Format | Notes |
| --- | --- | --- | --- |
| `person_id` | yes | slug | stable id (e.g. `de-gruyter-eugenie`); collisions suffixed. |
| `last_name` | yes | string | from `Familienname`. |
| `first_name` | no | string | primary given name. |
| `maiden_name` | no | string | from `geb als` — drives dedup. |
| `title` | no | string | e.g. honorifics if present. |
| `nickname` | no | string | from quoted `Bemerkung`/spouse field. |
| `birth_date` / `birth_date_raw` / `birth_place` | no | ISO / string / string | §4.3 rules. |
| `death_date` / `death_date_raw` / `death_place` | no | ISO / string / string | §4.3 rules. |
| `spouse` | no | person_id or name | from `verheiratet mit`. |
| `generation` | no | string | `G 1`..`G 4`. |
| `notes` | no | string | from `Bemerkung`. |
| `aliases` | no | `a\|b\|c` | every surface form that maps here. |
| `provisional` | yes | bool | true if created from a document string, not the register. |
---
## 7. Prioritized Backlog (MoSCoW)
| ID | Item | MoSCoW | Effort | Depends on |
| --- | --- | --- | --- | --- |
| B1 | Project scaffolding + read both workbooks (`FR-INGEST`, header map `FR-MAP`) | Must | S | — |
| B2 | Row triage + blank/duplicate/empty reports (`FR-TRIAGE`) | Must | S | B1 |
| B3 | Date parser + precision + century rule + Easter/feast computus + season map + tests (`FR-DATE`) | Must | L | B1 |
| B4 | Person register parser → canonical persons (`FR-PERS` US-PERS-01) | Must | M | B1 |
| B5 | Alias index + name resolution + multi-person split (`FR-DEDUP`, US-PERS-02) | Must | L | B4 |
| B6 | Overrides load + apply + idempotency (`FR-OVR`) | Must | S | B3,B5 |
| B7 | Canonical writers + provenance + review summary (`FR-OUT`, `FR-PROV`) | Must | M | B2,B3,B5 |
| B8 | Index↔Datei mismatch report (`REQ-OUT-02`) | Should | XS | B1 |
| B9 | Ambiguous-receiver review path (US-PERS-02 AC4) | Should | S | B5 |
| B10 | Comma-split `Inhalt` into extra tags | Could | XS | B7 |
| B11 | Phase-2 importer wiring (separate spec) | Won't (this spec) | — | B7 |
---
## 8. Traceability — Findings → Requirements
| Finding | Severity | Addressed by |
| --- | --- | --- |
| IMP-01 layout mismatch | blocker | C1, FR-MAP, REQ-OUT-01 |
| IMP-02 free-text dates | blocker | FR-DATE (all), C2, C6 |
| IMP-03 no ISO/normalized cols | blocker | FR-DATE, FR-PERS |
| IMP-04 register unimported | major | C3, US-PERS-01, §6.2 |
| IMP-05 name variants → dupes | major | C3, FR-DEDUP |
| IMP-06 blank-index dropped | major | US-TRIAGE-01 |
| IMP-07 duplicate indices | minor | REQ-TRIAGE-01 |
| IMP-08 section rows / tags vs summary | minor | REQ-TRIAGE-02, C7 |
| IMP-09 index↔file mismatch | minor | REQ-OUT-02, B8 |
| IMP-10 `x`-suffix rows | minor | REQ-TRIAGE-03 (skip + log this pass) |
| IMP-11 sender not split / ` u ` sep | minor | REQ-PERS-01, US-PERS-02 AC4 |
| IMP-12 first-sheet, no validation | minor | REQ-INGEST-01, FR-MAP AC2/AC3 |
---
## 9. Open Questions / TBD Register
| ID | Question | Why it matters | Ref | Resolution |
| --- | --- | --- | --- | --- |
| OQ-01 ✅ | Season/holiday → date. | Accuracy of ~70 SEASON/feast rows. | REQ-DATE-06 | **Resolved (2026-05-25):** movable feasts (Ostern, Pfingsten, Himmelfahrt, Advent, …) **computed per year from Easter — never a fixed month**; fixed feasts looked up (Weihnachten=12-25, Neujahr=01-01, …); seasons = mid-season month (Frühling=Apr, Sommer=Jul, Herbst=Oct, Winter=Jan). |
| OQ-02 ✅ | Date ranges: start only, or start+end? | Sorting/display of ~315 range values. | REQ-DATE-02 | **Confirmed:** store **start** in `date_iso`, precision `RANGE`, full text in `date_raw`. |
| OQ-03 ✅ | `person_id` format. | Stability across re-runs; diffability. | §6 | **Confirmed:** readable slug `lastname-firstname`, numeric suffix on collision. |
| OQ-04 ✅ | `x`-suffix row handling. | 42 rows. | REQ-TRIAGE-03 | **Resolved (2026-05-25):** `x` rows are transcriptions of the base letter but not yet mappable → **skip this pass**, log to `review/skipped-x-suffix.csv` for later linking. |
| OQ-05 ✅ | Importer output format. | Phase-2 reader. | B11 | **Confirmed:** `.xlsx` (openpyxl-native, headered). |
| OQ-06 ✅ | Fuzzy-match policy. | False-positive person merges (R2). | REQ-DEDUP-02 | **Confirmed:** conservative — report all fuzzy matches; no silent merge. |
*All open questions resolved as of 2026-05-25. New ambiguities discovered during build go here.*
---
## 10. Glossary & Worked Examples
**Precision** — how exactly a date is known (`DAY` … `UNKNOWN`). **Provisional person** — a
person created from a document name string with no register match. **Alias index** — map from
every known surface form of a name to a canonical `person_id`. **Override** — a
human-supplied correction applied deterministically on each run.
**Date examples → expected outcome:**
| `date_raw` | `date_iso` | `date_precision` |
| --- | --- | --- |
| `15.2.1888` | 1888-02-15 | DAY |
| `6.März 1888` | 1888-03-06 | DAY |
| `22.III.18` | 1918-03-22 | DAY |
| `13.5.09` | 1909-05-13 | DAY |
| `10.Oct.95` | 1895-10-10 | DAY |
| `17/6. 1916` | 1916-06-17 | DAY |
| `Mai 1895` | 1895-05-01 | MONTH |
| `Pfingsten 1922` | 1922-06-04 | DAY (computed: Easter 1922 = Apr 16, +49 days) |
| `Herbst 1913` | 1913-10-01 | SEASON |
| `1905` | 1905-01-01 | YEAR |
| `8.1.1916 - 15.3.1916` | 1916-01-08 | RANGE |
| `17.Nov (?) 1887` | 1887-11-17 | APPROX |
| `?` | *(empty)* | UNKNOWN |
**Name examples → expected outcome:**
| raw cell | resolves to |
| --- | --- |
| `Eugenie Müller` (+ register `geb Müller`) | `de-gruyter-eugenie` (matched via maiden alias) |
| `Eugenie de Gruyter` | `de-gruyter-eugenie` |
| `Herbert u Clara` | `cram-herbert` + `cram-clara` (split, surname distributed) |
| `Hedi und Tutu (Gruber)` | `gruber-hedi` + `gruber-tutu` |
| `Ella Anita` | → `review/ambiguous-receivers.csv` (not auto-split) |
| `Hans Wittkopf` (not in register) | provisional `wittkopf-hans` |

File diff suppressed because it is too large Load Diff

View File

@@ -1,502 +0,0 @@
# Unresolved-Name Classification Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add a focused `review/unresolved-names.csv` that isolates sender/receiver strings whose *name itself* is problematic (unknown/illegible, single-token, relational-only, collective/group, prose-in-name-column, or a genuine two-given-name pair), and fix the ambiguous-pair heuristic so a plain `First Surname` external person (e.g. `Mieze Schefold`) is no longer falsely flagged.
**Architecture:** A pure `classify_name(raw, given_names)` function in `persons.py` returns a `NameClass`. `ResolutionContext` classifies every *unmatched* name and records the non-`RESOLVABLE` ones in `self.unresolved`. A runtime-built given-name set (register first names + a small config supplement) lets the classifier distinguish a two-given-name pair (`Ella Anita` → two people) from a first+surname single person (`Mieze Schefold`). The orchestrator writes the aggregated report and per-category stats, replacing the noisy `ambiguous-receivers.csv`.
**Tech Stack:** Python 3.12, openpyxl, pytest — extends the existing `tools/import-normalizer/`.
**Context:** This builds on the completed normalizer (PR #663). Run all tests with CWD = the tool dir, e.g. `cd tools/import-normalizer && .venv/bin/python -m pytest tests/test_X.py -v`. Reuse the existing venv at `tools/import-normalizer/.venv` (do NOT recreate it). Commit on the current branch `docs/import-migration` (never main, never push). Each commit message ends with a trailing `Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>` line.
---
## File Structure
```
tools/import-normalizer/
├── config.py # + RELATIONAL_TERMS, COLLECTIVE_TERMS, UNKNOWN_NAME_MARKERS, PROSE_MAX_LEN, EXTRA_GIVEN_NAMES
├── persons.py # + NameClass, classify_name(), build_given_names(); ResolutionContext gains given_names + self.unresolved
├── normalize.py # writes unresolved-names.csv (replaces ambiguous-receivers.csv) + per-category stats
├── README.md # + unresolved-names.csv row in the review-file table
└── tests/
├── test_config.py # + name-table presence test
├── test_persons.py # + classify_name + build_given_names tests
├── test_documents.py # ambiguous test → unresolved test (+ resolvable-pair test)
└── test_normalize.py # integration asserts unresolved-names.csv
```
---
### Task 1: Config — name-classification tables
**Files:**
- Modify: `tools/import-normalizer/config.py`
- Modify: `tools/import-normalizer/tests/test_config.py`
- [ ] **Step 1: Add the failing test** to `tests/test_config.py`
```python
def test_name_classification_tables():
assert "tante" in config.RELATIONAL_TERMS
assert "familie" in config.COLLECTIVE_TERMS
assert "unbekannt" in config.UNKNOWN_NAME_MARKERS
assert config.PROSE_MAX_LEN >= 30
assert "anita" in config.EXTRA_GIVEN_NAMES
```
- [ ] **Step 2: Run to verify it fails**
Run: `cd tools/import-normalizer && .venv/bin/python -m pytest tests/test_config.py::test_name_classification_tables -v && cd -`
Expected: FAIL — `AttributeError: module 'config' has no attribute 'RELATIONAL_TERMS'`.
- [ ] **Step 3: Implement** — append to `config.py` (after the existing tables, before/after `KNOWN_LAST_NAMES` — anywhere at module level)
```python
# --- Name classification (unresolved-name review) ---
# Relational reference terms — a sender/receiver named by relation, not a proper name.
RELATIONAL_TERMS = {
"tante", "onkel", "mutter", "vater", "oma", "opa", "großmutter", "grossmutter",
"großvater", "grossvater", "schwester", "bruder", "cousin", "cousine", "kusine",
"neffe", "nichte", "tochter", "sohn", "schwager", "schwägerin", "schwiegermutter",
"schwiegervater", "enkel", "enkelin", "vetter", "base", "witwe", "witwer",
}
# Collective/group terms — not a single person. Matched against alpha-only word tokens
# (so "Fam.Cram" -> ["fam","cram"] matches "fam"), NOT as substrings/prefixes.
COLLECTIVE_TERMS = {
"familie", "fam", "kinder", "eltern", "geschwister", "großeltern",
"grosseltern", "alle", "diverse", "div", "gebrüder", "gebr",
}
# Markers of an unknown/illegible name (the literal "?" is handled separately in code).
# All long enough to be safe as SUBSTRING matches — do NOT add short tokens like "nn"
# (it occurs inside real names: Hanni, Johanna, Anna).
UNKNOWN_NAME_MARKERS = {"unbekannt", "unbek", "unleserlich", "unklar", "unsicher"}
# A name-column value longer than this (chars) is treated as prose/description, not a name.
PROSE_MAX_LEN = 40
# Common given names that may appear in two-given-name pairs (e.g. "Ella Anita") but are not
# in the family register. Only used to detect AMBIGUOUS_PAIR — extend as review surfaces more.
EXTRA_GIVEN_NAMES = {
"ella", "anita", "kurt", "georg", "hanni", "mieze", "ellen", "leni", "klara",
"margret", "gustava", "emmy", "minna", "sophie", "helga", "raymonde", "augusta",
}
```
- [ ] **Step 4: Run to verify it passes**
Run: `cd tools/import-normalizer && .venv/bin/python -m pytest tests/test_config.py -v && cd -`
Expected: PASS (all config tests).
- [ ] **Step 5: Commit**
```bash
git add tools/import-normalizer/config.py tools/import-normalizer/tests/test_config.py
git commit -m "feat(normalizer): config tables for name classification"
```
---
### Task 2: `classify_name` + `NameClass`
**Files:**
- Modify: `tools/import-normalizer/persons.py`
- Modify: `tools/import-normalizer/tests/test_persons.py`
- [ ] **Step 1: Add failing tests** to `tests/test_persons.py`
```python
from persons import NameClass
GIVEN = {"ella", "anita", "kurt", "georg", "clara", "eugenie"}
def test_classify_unknown():
assert persons.classify_name("?", GIVEN) is NameClass.UNKNOWN
assert persons.classify_name("A. Kredell?", GIVEN) is NameClass.UNKNOWN
assert persons.classify_name("unbekannt", GIVEN) is NameClass.UNKNOWN
def test_classify_prose():
assert persons.classify_name("Adressenliste v Clara Cram zur Kondolenz", GIVEN) is NameClass.PROSE
assert persons.classify_name("Clara de Gruyter(*1871)", GIVEN) is NameClass.PROSE # digit
assert persons.classify_name('"Cramiade" Gedicht', GIVEN) is NameClass.PROSE # quote
def test_classify_collective():
assert persons.classify_name("Familie", GIVEN) is NameClass.COLLECTIVE
assert persons.classify_name("Fam.Cram", GIVEN) is NameClass.COLLECTIVE
assert persons.classify_name("Eltern Cram", GIVEN) is NameClass.COLLECTIVE
assert persons.classify_name("seine Kinder", GIVEN) is NameClass.COLLECTIVE
def test_classify_relational():
assert persons.classify_name("Cousine Emmy Haniel", GIVEN) is NameClass.RELATIONAL
assert persons.classify_name("Schwester Hanni", GIVEN) is NameClass.RELATIONAL
def test_classify_single_token():
assert persons.classify_name("Agnes", GIVEN) is NameClass.SINGLE_TOKEN
assert persons.classify_name("A.B.", GIVEN) is NameClass.SINGLE_TOKEN
def test_classify_ambiguous_pair():
assert persons.classify_name("Ella Anita", GIVEN) is NameClass.AMBIGUOUS_PAIR
assert persons.classify_name("Kurt Georg", GIVEN) is NameClass.AMBIGUOUS_PAIR
def test_classify_resolvable_single_person():
# first + surname (surname not a given name) -> one real person, NOT ambiguous
assert persons.classify_name("Mieze Schefold", GIVEN) is NameClass.RESOLVABLE
assert persons.classify_name("Adolf Butenandt", GIVEN) is NameClass.RESOLVABLE
```
- [ ] **Step 2: Run to verify it fails**
Run: `cd tools/import-normalizer && .venv/bin/python -m pytest tests/test_persons.py -k classify -v && cd -`
Expected: FAIL — `NameClass` / `classify_name` not defined.
- [ ] **Step 3: Implement** — add to `persons.py`. Add `from enum import StrEnum` to the imports if not present, then add:
```python
class NameClass(StrEnum):
RESOLVABLE = "resolvable"
UNKNOWN = "unknown"
SINGLE_TOKEN = "single_token"
RELATIONAL = "relational"
COLLECTIVE = "collective"
PROSE = "prose"
AMBIGUOUS_PAIR = "ambiguous_pair"
_QUOTE_CHARS = "\"'“”„‚‘’"
def classify_name(raw: str, given_names: set[str]) -> NameClass:
"""Classify a (post-split) sender/receiver string by why it may be unresolvable.
Precedence (first match wins): UNKNOWN -> PROSE -> COLLECTIVE -> RELATIONAL ->
SINGLE_TOKEN -> AMBIGUOUS_PAIR -> RESOLVABLE.
"""
s = raw.strip()
if not s:
return NameClass.RESOLVABLE
low = s.lower()
tokens = s.split()
# alpha-only word tokens: "Fam.Cram" -> ["fam","cram"], so collective/relational terms
# are matched as whole words (no substring/prefix false positives like "Allerton").
alpha_words = re.findall(r"[a-zäöüß]+", low)
if "?" in s or any(m in low for m in config.UNKNOWN_NAME_MARKERS):
return NameClass.UNKNOWN
if (len(s) > config.PROSE_MAX_LEN or any(c.isdigit() for c in s)
or any(q in s for q in _QUOTE_CHARS) or len(tokens) > 3):
return NameClass.PROSE
if any(w in config.COLLECTIVE_TERMS for w in alpha_words):
return NameClass.COLLECTIVE
if any(w in config.RELATIONAL_TERMS for w in alpha_words):
return NameClass.RELATIONAL
if len(tokens) == 1:
return NameClass.SINGLE_TOKEN
if len(tokens) == 2 and all(_norm(t) in given_names for t in tokens):
return NameClass.AMBIGUOUS_PAIR
return NameClass.RESOLVABLE
# Known limitation: a 4+-token name with no digits/quotes (e.g. "Anna von der Heide") is
# classified PROSE. Such multi-particle names are rare here and usually resolve via the
# register; if they surface in review, lower-priority than the real prose entries.
```
> Note: `_norm` already exists in `persons.py` (added in the alias-index task) and strips accents + lowercases. `classify_name` uses it so given-name matching is accent-insensitive.
- [ ] **Step 4: Run to verify it passes**
Run: `cd tools/import-normalizer && .venv/bin/python -m pytest tests/test_persons.py -v && cd -`
Expected: PASS (all persons tests, including the 7 new classify tests).
- [ ] **Step 5: Commit**
```bash
git add tools/import-normalizer/persons.py tools/import-normalizer/tests/test_persons.py
git commit -m "feat(normalizer): classify_name + NameClass"
```
---
### Task 3: `build_given_names`
**Files:**
- Modify: `tools/import-normalizer/persons.py`
- Modify: `tools/import-normalizer/tests/test_persons.py`
- [ ] **Step 1: Add failing test** to `tests/test_persons.py`
```python
def test_build_given_names():
people = persons.parse_register([
{"last_name": "de Gruyter", "first_name": "Eugenie"},
{"last_name": "Cram", "first_name": "Charlotte,Meta"}, # comma -> primary + extra given
])
g = persons.build_given_names(people, {"Anita"})
assert "eugenie" in g
assert "charlotte" in g and "meta" in g # primary + extra given names
assert "anita" in g # from the extra set, normalized
assert "schefold" not in g
```
- [ ] **Step 2: Run to verify it fails**
Run: `cd tools/import-normalizer && .venv/bin/python -m pytest tests/test_persons.py::test_build_given_names -v && cd -`
Expected: FAIL — `build_given_names` not defined.
- [ ] **Step 3: Implement** — add to `persons.py`
```python
def build_given_names(register: list[Person], extra: set[str]) -> set[str]:
"""Set of normalized given names from the register (first + extra given) plus a supplement.
Used by classify_name to tell a two-given-name pair (two people) from a first+surname.
"""
names: set[str] = set()
for p in register:
if p.first_name:
names.add(_norm(p.first_name))
for g in p.extra_given_names:
names.add(_norm(g))
for e in extra:
names.add(_norm(e))
return names
```
- [ ] **Step 4: Run to verify it passes**
Run: `cd tools/import-normalizer && .venv/bin/python -m pytest tests/test_persons.py -v && cd -`
Expected: PASS.
- [ ] **Step 5: Commit**
```bash
git add tools/import-normalizer/persons.py tools/import-normalizer/tests/test_persons.py
git commit -m "feat(normalizer): build_given_names from register + supplement"
```
---
### Task 4: Integrate — ResolutionContext records unresolved; orchestrator writes the report
This task touches `persons.py`, `normalize.py`, and two test files together so the whole suite stays green in one commit (removing `ctx.ambiguous` requires updating its only consumer, `normalize.py`, in the same change).
**Files:**
- Modify: `tools/import-normalizer/persons.py` (ResolutionContext)
- Modify: `tools/import-normalizer/normalize.py`
- Modify: `tools/import-normalizer/tests/test_documents.py`
- Modify: `tools/import-normalizer/tests/test_normalize.py`
- [ ] **Step 1: Update the failing tests first**
In `tests/test_documents.py`, **replace** the existing `test_ambiguous_space_pair_flagged_not_split` function entirely with these two functions:
```python
def test_ambiguous_pair_recorded_in_unresolved():
people = persons.parse_register([{"last_name": "de Gruyter", "first_name": "Walter"}])
ctx = persons.ResolutionContext(persons.AliasIndex(people), name_overrides={},
given_names={"ella", "anita"})
raw = documents.RawRow(source_row=7, index="C-0200", sender="", receivers="Ella Anita")
doc = documents.to_canonical(raw, ctx, date_overrides={})
assert len(doc.receiver_person_ids) == 1 # not split — one provisional
assert any(name == "Ella Anita" and cat == "ambiguous_pair" for name, cat, _ in ctx.unresolved)
def test_resolvable_first_surname_pair_not_unresolved():
ctx = persons.ResolutionContext(persons.AliasIndex([]), name_overrides={},
given_names={"ella", "anita"})
ctx.resolve_one("Mieze Schefold", source_row=1) # surname is not a given name
assert ctx.unresolved == [] # RESOLVABLE -> not recorded
```
In `tests/test_normalize.py`, in the `_doc_wb` fixture, change the `C-0001` row's receiver from empty to `"?"` so the run produces an unresolved entry. Find the line that appends the `C-0001` row and set its `EmpfängerIn` cell to `"?"`. For example the row currently reads:
```python
ws.append(["C-0001", "", "", "", "Hans Wittkopf", "", "Freitag 1919", "", "", ""])
```
change the 6th cell (EmpfängerIn) from `""` to `"?"`:
```python
ws.append(["C-0001", "", "", "", "Hans Wittkopf", "?", "Freitag 1919", "", "", ""])
```
Then add these assertions inside `test_run_end_to_end`, right after the existing `assert (review_dir / "unparsed-dates.csv").exists()` line:
```python
assert (out_dir / "canonical-documents.xlsx").exists() # (keep existing asserts above)
assert (review_dir / "unresolved-names.csv").exists()
unresolved_text = (review_dir / "unresolved-names.csv").read_text(encoding="utf-8")
assert "unknown" in unresolved_text and "?" in unresolved_text # the "?" receiver
assert not (review_dir / "ambiguous-receivers.csv").exists() # replaced
```
- [ ] **Step 2: Run to verify they fail**
Run: `cd tools/import-normalizer && .venv/bin/python -m pytest tests/test_documents.py tests/test_normalize.py -v && cd -`
Expected: FAIL — `ResolutionContext` has no `given_names`/`unresolved`; `unresolved-names.csv` not written.
- [ ] **Step 3a: Implement — `ResolutionContext` in `persons.py`**
Replace the `ResolutionContext.__init__` body's two lines (`self.ambiguous` and add `given_names`) and the relevant methods. The new `__init__`:
```python
def __init__(self, alias_index: AliasIndex, name_overrides: dict[str, str],
given_names: set[str] | None = None):
self.index = alias_index
self.name_overrides = name_overrides
self.given_names = given_names or set()
self.provisional: dict[str, Person] = {}
self.unmatched: dict[str, list] = {}
self.unresolved: list[tuple] = [] # (raw_name, category, source_row) for non-RESOLVABLE names
self._raw_to_pid: dict[str, str] = {}
self.override_hits = 0
```
In `resolve_one`, the provisional branch must classify the name. Replace this existing block:
```python
# provisional person (unmatched) — never reuse a register id
self.unmatched.setdefault(name, []).append(source_row)
if name in self._raw_to_pid:
return self._raw_to_pid[name], name, False
```
with:
```python
# provisional person (unmatched) — never reuse a register id
self.unmatched.setdefault(name, []).append(source_row)
category = classify_name(name, self.given_names)
if category is not NameClass.RESOLVABLE:
self.unresolved.append((name, str(category), source_row))
if name in self._raw_to_pid:
return self._raw_to_pid[name], name, False
```
Replace the entire `resolve_receivers` method (the ambiguous detection now lives in `resolve_one` via `classify_name`):
```python
def resolve_receivers(self, raw: str, source_row: int):
return [self.resolve_one(part, source_row) for part in split_receivers(raw)]
```
- [ ] **Step 3b: Implement — `normalize.py`**
Find the line that builds the context:
```python
ctx = persons.ResolutionContext(alias_index, name_overrides)
```
replace it with (build the given-name set from the register + config supplement):
```python
given_names = persons.build_given_names(register, config.EXTRA_GIVEN_NAMES)
ctx = persons.ResolutionContext(alias_index, name_overrides, given_names=given_names)
```
Replace the `ambiguous-receivers.csv` write line:
```python
writers.write_review_csv(review_dir / "ambiguous-receivers.csv", ["raw", "part", "source_row"], ctx.ambiguous)
```
with an aggregated unresolved-names report:
```python
unresolved_agg: dict[tuple, list] = {}
for name, category, row in ctx.unresolved:
unresolved_agg.setdefault((category, name), []).append(row)
unresolved_rows = sorted(
([cat, name, len(rows), " ".join(map(str, sorted(rows)[:5]))]
for (cat, name), rows in unresolved_agg.items()),
key=lambda r: (r[0], -r[2], r[1]))
writers.write_review_csv(review_dir / "unresolved-names.csv",
["category", "raw", "count", "example_rows"], unresolved_rows)
```
In the `stats` dict, replace the `"ambiguous_receivers"` line:
```python
"ambiguous_receivers": len(ctx.ambiguous),
```
with a per-category breakdown:
```python
"unresolved_name_occurrences": len(ctx.unresolved),
"unresolved_unknown": sum(1 for _, c, _ in ctx.unresolved if c == "unknown"),
"unresolved_single_token": sum(1 for _, c, _ in ctx.unresolved if c == "single_token"),
"unresolved_relational": sum(1 for _, c, _ in ctx.unresolved if c == "relational"),
"unresolved_collective": sum(1 for _, c, _ in ctx.unresolved if c == "collective"),
"unresolved_prose": sum(1 for _, c, _ in ctx.unresolved if c == "prose"),
"unresolved_ambiguous_pair": sum(1 for _, c, _ in ctx.unresolved if c == "ambiguous_pair"),
```
- [ ] **Step 4: Run the whole suite to verify green**
Run: `cd tools/import-normalizer && .venv/bin/python -m pytest tests/ -q && cd -`
Expected: PASS (all tests, no `ambiguous` references remain).
Also grep to confirm no dangling references:
Run: `grep -rn "ctx.ambiguous\|ambiguous-receivers\|ambiguous_receivers\|self.ambiguous" tools/import-normalizer/*.py`
Expected: no matches.
- [ ] **Step 5: Commit**
```bash
git add tools/import-normalizer/persons.py tools/import-normalizer/normalize.py tools/import-normalizer/tests/test_documents.py tools/import-normalizer/tests/test_normalize.py
git commit -m "feat(normalizer): unresolved-names report + fix ambiguous-pair over-flagging"
```
---
### Task 5: README — document the new report
**Files:**
- Modify: `tools/import-normalizer/README.md`
- [ ] **Step 1: Update the review-file table** in `README.md`. Replace the `ambiguous-receivers.csv` row with an `unresolved-names.csv` row. Find the table row referencing `ambiguous-receivers.csv` and replace it with:
```markdown
| `unresolved-names.csv` | Names whose value is itself problematic, grouped by `category`: `unknown` (`?`/illegible), `single_token` (first OR last name only), `relational` (`Tante …`), `collective` (`Familie …`), `prose` (a description landed in a name column), `ambiguous_pair` (two given names → likely two people, not auto-split). Review highest-impact categories first; add decisions to `overrides/names.csv`. |
```
If the README has no such row (older version), add the row above to the review-file table.
- [ ] **Step 2: Add a note** to the iteration-loop section of `README.md` (after the table):
```markdown
> `unresolved-names.csv` is the focused "names that need a human" list — distinct from
> `unmatched-names.csv` (which is just non-family correspondents that got provisional persons).
> The given-name set that drives `ambiguous_pair` detection is the register's first names plus
> `config.EXTRA_GIVEN_NAMES` — add names there if a real two-person cell isn't being flagged.
```
- [ ] **Step 3: Verify the suite is still green** (README-only change, but confirm nothing references the old file)
Run: `cd tools/import-normalizer && .venv/bin/python -m pytest tests/ -q && cd -`
Expected: PASS.
- [ ] **Step 4: Commit**
```bash
git add tools/import-normalizer/README.md
git commit -m "docs(normalizer): document unresolved-names.csv review report"
```
---
## Self-Review
**Spec coverage** (against the agreed proposal):
- Focused report isolating problem name classes → Task 4 writes `review/unresolved-names.csv` with a `category` column; categories defined in Task 2 `classify_name`. ✓
- Fix ambiguous over-flagging of `First Surname` → Task 2 `AMBIGUOUS_PAIR` requires *both* tokens in the given-name set; `Mieze Schefold``RESOLVABLE` (tested). ✓
- Distinguish "not fully known" (unknown/single-token/relational/collective/prose) from "can't split cleanly" (ambiguous_pair) → all are `NameClass` values, each its own category column value. ✓
- Per-category counts in summary → Task 4 stats. ✓
- Senders covered too (not just receivers) → classification happens in `resolve_one`, which both `resolve_sender` and `resolve_receivers` call. ✓
**Placeholder scan:** No TBD/TODO; every code step has complete code. The README replacement gives the exact row text.
**Type consistency:** `NameClass` (StrEnum) defined Task 2; `classify_name(raw, given_names)` and `build_given_names(register, extra)` signatures used consistently in Task 4; `ResolutionContext(alias_index, name_overrides, given_names=…)` matches the new `__init__`; `self.unresolved` is `list[tuple]` of `(raw, category, source_row)` and read with that shape in both the report and the stats. `str(category)` yields the StrEnum value (e.g. `"ambiguous_pair"`), matching the stat comparisons and the test assertions.
**Cross-task green:** Task 4 deliberately bundles the `persons.py` + `normalize.py` + test changes into one commit because removing `ctx.ambiguous` breaks its consumer otherwise — no red commit is left behind (lesson from the prior build).
**Out of scope (future):** Spanish month names + `Mon DD-YYYY` date form (separate date-parser enhancement); promoting `unresolved` rows into a document-level `needs_review` flag; auto-splitting confirmed `ambiguous_pair` entries via overrides.

View File

@@ -1,62 +0,0 @@
# Import Migration — Working Folder
This folder tracks the iterative work of mass-importing the **real, raw family archive**
spreadsheets (≈7,600 letter rows + ~7,000 PDFs that arrive later) into Familienarchiv.
It is intentionally **local docs, not Gitea issues**. We only open a Gitea issue when a
finding requires a *software* change (e.g. a new date parser). Pure data observations and
the running plan live here so any agent can pick the work up cold.
## Source files (in `/import`)
| File | What it is | Importer support today |
| --- | --- | --- |
| `zzfamilienarchiv aktuell 2 - Kopie 2025-07-05.xlsx` | The **real raw archive** — 7,943 rows, sheet `Familienarchiv`. Human-readable, dates as written in the letters. | ❌ layout does **not** match importer defaults |
| `Personendatei 2.xlsx` | Genealogical **person register** — 163 people, sheet `Tabelle1` (maiden names, birth/death, marriages, relationships). | ❌ no importer at all |
| `zzfamilienarchiv Walter und Eugenie 2025-04-10.ods` | A small, **already-normalized** subset (Walter & Eugenie brautbriefe). 14 clean columns incl. ISO dates. | ✅ this is what `MassImportService` was built for |
The PDFs (~7,000) will follow later. The importer matches files by the **Index** column
(e.g. `W-0001``W-0001.pdf`), and already imports metadata-only when a file is missing —
so we can import all metadata now and the PDFs will attach on a re-run.
## How to inspect the spreadsheets
`openpyxl` is installed in the OCR service venv:
```bash
/home/marcel/Desktop/familienarchiv/ocr-service/.venv/bin/python3 -c "import openpyxl; print(openpyxl.__version__)"
```
## Documents in this folder
- [`01-findings-spreadsheet-analysis.md`](./01-findings-spreadsheet-analysis.md) — full analysis of every data-quality / importer issue found (2026-05-25). Each issue has an ID `IMP-NN`.
- [`02-normalization-spec.md`](./02-normalization-spec.md) — requirements spec for the offline **import normalizer** (the agreed strategy: normalize the raw sheets into a clean canonical dataset before import). Requirements `FR-*`/`NFR-*`, traceable to the `IMP-NN` findings.
- `WORKLOG.md` — running log of what each session did and what's next. **Start here when resuming.**
## Strategy (decided 2026-05-25)
Normalize **before** import. A standalone Python tool (`tools/import-normalizer/`, not yet
built) transforms the raw xlsx + person register into a clean canonical dataset
(`canonical-documents.xlsx`, `canonical-persons.xlsx`) plus review CSVs. Residual cases
(unparseable dates, unmatched names) are fixed via a version-controlled overrides file and
re-run. The Java importer is adjusted to consume the canonical contract in a later **Phase 2**.
See the spec for the full contract.
## Status board
| ID | Issue | Severity | Status |
| --- | --- | --- | --- |
| IMP-01 | New xlsx column layout ≠ importer defaults | 🔴 blocker | open |
| IMP-02 | 90% of dates are free-text the parser can't read | 🔴 blocker | open |
| IMP-03 | No ISO/normalized date column in the new xlsx | 🔴 blocker | open |
| IMP-04 | Person register (`Personendatei 2.xlsx`) not imported | 🟠 major | open |
| IMP-05 | Name variations = duplicate Persons (maiden vs married) | 🟠 major | open |
| IMP-06 | 93 data rows with blank Index are silently dropped | 🟠 major | open |
| IMP-07 | 43 duplicate Index values | 🟡 minor | open |
| IMP-08 | Section/title rows interleaved in data | 🟡 minor | open |
| IMP-09 | Index↔Datei filename mismatches | 🟡 minor | open |
| IMP-10 | `x`-suffix rows (letter backsides/enclosures) | 🟡 minor | open |
| IMP-11 | Multi-receiver separators incl. bare `u`/`u.` | 🟡 minor | open |
| IMP-12 | Importer reads only the first sheet, no validation | 🟡 minor | open |
See the findings doc for detail and proposed approach per issue.

View File

@@ -1,147 +0,0 @@
# Import Migration — Worklog
Running log of each working session. **Resume here.** Newest entry on top.
---
## 2026-05-25 (session 5) — Unresolved-name classification
**Did:** Implemented [`04-unresolved-names-plan.md`](./04-unresolved-names-plan.md) subagent-driven
(5 tasks, TDD, per-task spec + code-quality review; 67 tests pass). Added `classify_name` +
`NameClass` + `build_given_names` in `persons.py`; `ResolutionContext` now records non-RESOLVABLE
names in `self.unresolved`; orchestrator writes `review/unresolved-names.csv` (replaces the noisy
`ambiguous-receivers.csv`) with per-category stats.
**Why:** `unmatched-names.csv` mixes boring non-family correspondents (expected) with genuinely
unresolvable entries. The new report isolates the latter so review focuses on ~440 real cases.
**Real-run result:** unresolved-names.csv = single_token 191 / prose 103 / unknown 74 /
collective 46 / relational 21 / ambiguous_pair **5** (distinct). The ambiguous over-flagging fix
cut `ambiguous_pair` from 303 → 5 (genuine two-given-name pairs only; `Mieze Schefold` etc. now
correctly RESOLVABLE). given-name set = register first names `config.EXTRA_GIVEN_NAMES`.
**Next:** populate `overrides/names.csv` from unresolved-names.csv (highest-count first); extend
`EXTRA_GIVEN_NAMES` if a real pair isn't flagged; still-open date work (Spanish months, 5872 band).
---
## 2026-05-25 (session 4) — Built the normalizer (subagent-driven, all 17 tasks)
**Did:** Executed the plan subagent-driven (implementer + spec review + code-quality review per
task). The tool `tools/import-normalizer/` is **complete and passing (57 tests)**. Final
opus review: **READY** — determinism verified on the real corpus (two runs → identical cell
matrices + byte-identical review files), zero silent drops.
**Per-task code review caught & fixed real issues** (all in the committed code): leading
qualifiers `nach/vor/…` now → APPROX; English month-first matcher hardened to structurally
not shadow `Mai 1895`; person-id collision de-dup suffixes *all* members; `split_receivers`
returns `[]` for a `geb.`-only cell; boolean cells no longer coerced to `1/0`; duplicate-index
flags every occurrence; provisional ids never steal a register id; CSV-injection defanged.
**REAL DRY-RUN** (`python normalize.py` over the actual archive — outputs are gitignored):
- documents_emitted **7,582** (+225 empty +93 blank-index +42 x-suffix = 7,942 rows read, 0 dropped)
- register_persons **163**, provisional_persons **942**
- dates: DAY 6,509 / MONTH 36 / RANGE 36 / APPROX 28 / YEAR 17 / SEASON 1 / UNKNOWN 955
- **unknown_date_rate 9.2%** (of dated rows; target ≤5% pre-override, ≤0.5% after overrides)
- duplicate_index 85, index_file_mismatches 550, ambiguous_receivers 303
**⚠️ Concurrency incident:** a parallel Claude session committed reader-dashboard work to this
branch and hard-reset it mid-execution, deleting the Task 15 files and orphaning a commit.
Recovered via reflog (`reset --hard 366b4848` + `checkout 401160e3 -- <task15 files>`); no code
lost. Casualty: my *during-execution* edits to the plan/spec docs (02/03) for Tasks 514 were
discarded — **the committed code + tests are the source of truth**, not the plan doc, which now
reflects the pre-execution + persona-review version.
**Next steps (iterative refinement — the overrides loop, as designed):**
1. Shave the 9.2% UNKNOWN cheaply: add **Spanish month names** (Enero…Diciembre) and the
`Mon DD-YYYY` dash form to `config.MONTHS`/the parser (Mexican-branch correspondence);
revisit the 5872 two-digit-year band (real `…58/59/60` dates = 19581960, just past the
18731957 window — decide whether to extend the upper bound in `config`).
2. `?` (99×) is genuinely "date unknown" — leave UNKNOWN or add a convention.
3. Populate `overrides/dates.csv` + `overrides/names.csv` from the review CSVs and re-run.
4. README note: a leading `'`/`!` in a `review/*.csv` `raw` cell may be a CSV-defang artifact —
match against the true source value when writing overrides.
5. Phase 2 (separate spec): wire the canonical contract into the Java `MassImportService`.
---
## 2026-05-25 (session 3) — Implementation plan + persona review
**Did:**
- Wrote [`03-normalizer-implementation-plan.md`](./03-normalizer-implementation-plan.md): 17
bite-sized TDD tasks for `tools/import-normalizer/` (Python, openpyxl), bottom-up — date
parser w/ Easter computus first, then persons/alias, ingest, mapping, orchestrator, writers.
- Ran a 6-persona inline review (architect, developer, tester, req-engineer, security, devops;
ui-expert too) via parallel agents. Acted on all material findings.
**Key fixes from review (see plan §"Review feedback incorporated"):**
- Idempotency redefined byte-identical → **content-deterministic** (spec G4/NFR-IDEM-01);
pinned workbook timestamps + deterministic alias ordering + a real two-run equality test.
- Real bug: duplicate-index only reported repeats → now flags/reports every occurrence.
- Provisional `person_id` could overwrite a register id → now suffixed.
- Date parser gaps: invalid-calendar-date → UNKNOWN, intra-month day-range (`7./8. Sept.1923`).
- Multi-person sender now split + flagged (REQ-PERS-01); CSV-injection defanged in review files;
pinned deps + hardened root `.gitignore`.
**Next:**
- Marcel reviews the plan. Then execute it (subagent-driven or inline) — the date parser
(Task 3/8 + Easter computus) is the meatiest piece.
---
## 2026-05-25 (session 2) — Strategy + normalizer spec
**Did:**
- Decided strategy with Marcel: **normalize the raw sheets first**, then import (higher
leverage than making the Java importer tolerate every mess).
- Locked design decisions (see spec §3): new canonical layout; dates = parsed + raw +
precision; include person register + dedup in this effort; overrides-file + re-run loop;
Python tool at `tools/import-normalizer/`.
- Century rule fixed by Marcel: archive spans **18731957**; 2-digit `0057`→19YY,
`7399`→18YY, `5872`→flag; 3-digit→1DDD; never 20xx.
- Wrote [`02-normalization-spec.md`](./02-normalization-spec.md) in the requirements-engineer
persona (FR/NFR, Given-When-Then ACs, traceability to IMP-NN, TBD register).
**All 6 open questions resolved (spec §9):** OQ-01 — movable feasts (Ostern, Pfingsten, …)
**computed per year from Easter**, never a fixed month; seasons → mid-season month
(Sommer=Jul, Herbst=Oct). OQ-02 ranges → start+RANGE. OQ-03 slug ids. OQ-04 — `x`-suffix rows
**skipped + logged** this pass (they're transcriptions of the base letter, not yet mappable).
OQ-05 → `.xlsx`. OQ-06 → conservative, no silent merge.
**Git:** moved off the unrelated `feat/issue-356-…` branch; pulled `main`; created clean
branch **`docs/import-migration`** and committed these docs there. (The dirty `.venv`
pycache + `skills/implement/SKILL.md` in the tree are pre-existing/environmental noise — left
uncommitted, not ours.)
**Next:**
- Marcel reviews the spec.
- Then writing-plans → build the normalizer at `tools/import-normalizer/` (backlog B1B7 are
the Musts; B3 date parser incl. Easter computus is the big one).
---
## 2026-05-25 (session 1) — Initial analysis
**Did:**
- Got the real raw archive xlsx (7,943 rows) + person register (163 people). PDFs to follow.
- Compared the new xlsx layout against `MassImportService` defaults and the old ODS.
- Full statistical scan of all rows: dates, indices, senders/receivers, file column.
- Wrote [`01-findings-spreadsheet-analysis.md`](./01-findings-spreadsheet-analysis.md)
with 12 issues (IMP-01..IMP-12) + recommended sequencing.
- Installed `openpyxl` into the OCR service venv for inspection.
**Key facts established:**
- Importer defaults match the **ODS**, not the new xlsx → wrong column mapping (IMP-01).
- **90%** of dated rows (6,571 / 7,319) are free-text dates the ISO-only parser drops (IMP-02).
- Person register is rich but **unimported**; holds the maiden-name dedup key (IMP-04/05).
**Decisions pending from Marcel (blockers for any code work):**
1. IMP-01: positional re-config of `app.import.col.*` vs header-driven mapping rewrite?
2. IMP-02: how to store imprecise dates — new `dateOriginal` + `precision` columns, or lossy?
3. IMP-04/05: format for the person/alias mapping; import persons before documents?
4. IMP-10: are `x`-suffix rows separate documents, attachments, or skipped?
**Next:**
- Get Marcel's calls on the 4 decisions above.
- Then split the code-change items into Gitea issues (IMP-01b, IMP-02, IMP-04, IMP-06, IMP-12).
- Pure-data tasks (IMP-07 dup list, IMP-09 file reconcile) stay here.

File diff suppressed because it is too large Load Diff

View File

@@ -1,292 +0,0 @@
# Personendatei Importer — Design Spec
**Date:** 2026-05-25
**Source file:** `import/Personendatei 2.xlsx`
**Output:** `tools/import-normalizer/out/canonical-persons-tree.json`
**Tool location:** `tools/import-normalizer/persons_tree.py`
---
## 1. Purpose
Normalize the 163-person family register in `Personendatei 2.xlsx` into a machine-readable JSON file that a future backend importer can consume to seed the `persons` and `person_relationships` tables. The tool is offline (no backend required) and produces a reviewable artifact with an explicit `unresolved[]` list for manual follow-up.
---
## 2. Source Data — Column Map
Sheet: `Tabelle1` (rows 2164; row 1 is the header).
| Col | Header | Content | Notes |
|-----|--------|---------|-------|
| A | Generation | `G 0``G 5` | Generation relative to Herbert & Clara Cram (G 2). Inconsistent formatting: `"G3"`, `"G 0"`, `"G 2 de Gruyter"` — strip non-digit chars and parse the integer. |
| B | Familienname | Last name | Sometimes compound: `"de Gruyter"`, `"Cram Heydrich"`, `"Burkhard- Meier"` |
| C | Vorname | First name | Sometimes multiple: `"Charlotte,Meta,Jacobi"`, nicknames in parens: `"Otto (Herbert)"` |
| D | geb als | Maiden name | Used as a name alias for matching |
| E | Geburtsdatum | Birth date | **Mixed types** — see §4 |
| F | Geburtsort | Birth place | Free-text string, stored verbatim |
| G | Todesdatum | Death date | Same mixed types as col E |
| H | Sterbeort | Death place | Free-text string, stored verbatim |
| I | verheiratet mit | Spouse name | Partial name in either `"Firstname Lastname"` or `"Lastname Firstname"` order |
| J | Bemerkung | German relationship notes | `"Sohn v Clara u Herbert"`, `"Nichte v Herbert"`, free text |
---
## 3. Two-Pass Architecture
### Pass 1 — Parse & Normalize (rows → person records)
For each row:
1. Read all 10 columns.
2. Assign a stable `rowId`: `"row_{i:03d}"` where `i` is the 1-based row number (e.g. `row_002`).
3. Normalize fields per §4 and §5.
4. Build the **name-lookup index** (see §6).
5. Emit a person record.
### Pass 2 — Resolve Relationships
Walk every person record:
1. Resolve col I (spouse) → emit `SPOUSE_OF` edge or `unresolved` entry.
2. Parse col J (Bemerkung) for parent/child patterns → emit `PARENT_OF` edges or `unresolved` entries.
3. Append unmatched Bemerkung text to `person.notes`.
---
## 4. Date Parsing
Both col E (birth) and col G (death) arrive as either an Excel numeric serial or a string.
### Excel serial conversion
When the cell value is an integer (or a float with no string representation):
```
date = datetime(1899, 12, 30) + timedelta(days=int(value))
year = date.year
```
Excel's epoch is 1899-12-30 (accounts for the Lotus 1-2-3 leap-year bug).
### String fallback — reuse existing `dates.parse_date()`
Pass the raw string to the existing `tools/import-normalizer/dates.parse_date()`. It already handles:
- `DD.MM.YYYY` and `D.M.YY`
- Year-only (`1930`)
- Month + year (`August 1941`, `Sept. 1913`)
- Partial/approximate markers
Extract `.year` from the returned `ParsedDate.iso` if `iso` is not `None`.
### Unresolvable dates
If both paths yield `None` (e.g. `"2.9.196"`, `"4.3.1023"`, `".12.1955"`):
- Set `birthYear`/`deathYear` to `null`.
- Append the raw value to `person.notes` as `"[Geburtsdatum: <raw>]"` or `"[Todesdatum: <raw>]"` for human review.
---
## 5. Person Record Normalization
### Name fields
- **lastName** = col B, stripped.
- **firstName** = col C. Keep as-is (including multi-name strings and parenthetical nicknames) — the backend can split later.
- **maidenName** = col D, stripped. Stored in the JSON; the backend maps this to a `PersonNameAlias` of type `BIRTH_NAME`.
- **alias** = `null` (the tool does not invent aliases; maiden name is the alias).
### Generation
Extract the first digit sequence from col A:
```python
import re
m = re.search(r"\d+", raw_generation)
generation = int(m.group()) if m else None
```
Handles all observed variants: `"G 3"`, `"G3"`, `"G 0"`, `"G 2 de Gruyter"`, `"G 0"`.
Stored as `generation: int | null` in the JSON (informational; not mapped to a backend field directly).
### familyMember
Set `true` for all records. Every person in this register is part of the family network. The backend can refine this.
### notes
Constructed by concatenation:
1. Unmatched Bemerkung text (after relationship pattern is stripped).
2. Unresolvable date raw values (prefixed with field name).
---
## 6. Name Lookup Index
After pass 1, build a `dict[str, list[str]]` mapping normalized name keys → list of `rowId`s.
### Normalization function `_norm(s) -> str`
1. Lowercase.
2. Strip surrounding `"` and `'`.
3. Remove parenthetical substrings: `r"\([^)]*\)"`.
4. Collapse internal whitespace.
5. Strip geographic/honorific suffixes: `aachen`, `mex.`, `mexiko`, `sen`, `jun`, `jr`.
6. Strip trailing commas, dots.
### Keys indexed per person
For a person with firstName `F`, lastName `L`, maidenName `M`:
- `_norm(f"{F} {L}")` — canonical order
- `_norm(f"{L} {F}")` — reversed order (col I uses this heavily)
- `_norm(f"{F} {M}")` if maidenName is set — maiden-name reference
- `_norm(L)` alone — single-token fallback
### Match resolution
Given a raw name string from col I or col J:
1. `_norm(raw)` → look up in index.
2. **Exactly one hit** → match confirmed, use that `rowId`.
3. **Zero hits**`reason: "not_found"``unresolved[]`.
4. **Multiple hits**`reason: "ambiguous"``unresolved[]`.
---
## 7. Relationship Extraction
### 7.1 SPOUSE_OF (col I — `verheiratet mit`)
1. Normalize col I value.
2. Resolve via name index (§6).
3. If matched: emit one edge `{ personId, relatedPersonId, type: "SPOUSE_OF", source: "verheiratet_mit" }`.
- Skip if an identical edge (regardless of direction) already exists in the relationship list.
4. If unresolved: add to `unresolved[]`.
### 7.2 PARENT_OF (col J — `Bemerkung`)
Apply these regex patterns in order, case-insensitive, with optional whitespace:
| Pattern | Direction | Note |
|---------|-----------|------|
| `(Sohn\|Tochter)\s+v(?:on)?\s+(.+)` | Named person(s) → this person | "Sohn v Clara u Herbert" |
| `(Vater\|Mutter)\s+v(?:on)?\s+(.+)` | This person → named person(s) | "Vater v Herbert" |
**Multi-parent extraction:** The parent string may contain two parents joined by `\s+u(?:nd)?\s+`. Split on this pattern, resolve each part independently.
**Emit** one `PARENT_OF` edge per resolved parent:
```json
{
"personId": "<parent_rowId>",
"relatedPersonId": "<child_rowId>",
"type": "PARENT_OF",
"source": "bemerkung",
"rawBemerkung": "<original col J value>"
}
```
**Skip** (do not emit, do not add to `unresolved[]`, leave in notes):
- Patterns starting with `Neffe`, `Nichte`, `Enkel`, `Enkelin`, `Urenkel`, `Urenkelin` — too indirect.
- Patterns starting with `Bruder`, `Schwester` — SIBLING_OF is out of scope for this tool.
- Any other Bemerkung text that does not match the parent patterns.
**After extraction:** the matched portion of the Bemerkung is removed; the remainder goes into `person.notes`.
---
## 8. Output JSON Schema
File: `tools/import-normalizer/out/canonical-persons-tree.json`
```json
{
"generated_at": "<ISO-8601 timestamp>",
"source": "Personendatei 2.xlsx",
"stats": {
"persons": 163,
"relationships": 87,
"unresolved": 12
},
"persons": [
{
"rowId": "row_002",
"firstName": "Elsgard",
"lastName": "Allemeyer",
"maidenName": "Wöhler",
"alias": null,
"notes": "Nichte von Herbert",
"birthYear": 1920,
"deathYear": 1999,
"birthPlace": "Garz",
"deathPlace": "Espelkamp",
"generation": 3,
"familyMember": true
}
],
"relationships": [
{
"personId": "row_002",
"relatedPersonId": "row_003",
"type": "SPOUSE_OF",
"source": "verheiratet_mit"
},
{
"personId": "row_019",
"relatedPersonId": "row_021",
"type": "PARENT_OF",
"source": "bemerkung",
"rawBemerkung": "Tochter v Clara u Herbert"
}
],
"unresolved": [
{
"rowId": "row_007",
"field": "verheiratet_mit",
"raw": "\"Tante Lolly\"",
"reason": "not_found"
},
{
"rowId": "row_042",
"field": "bemerkung",
"raw": "Zwillingsbruder v Herbert",
"reason": "not_found"
}
]
}
```
---
## 9. CLI Interface
```
python3 persons_tree.py [--input PATH] [--output PATH] [--dry-run]
```
| Flag | Default | Description |
|------|---------|-------------|
| `--input` | `../../import/Personendatei 2.xlsx` | Source Excel file |
| `--output` | `out/canonical-persons-tree.json` | Output JSON file |
| `--dry-run` | off | Print stats + first 5 unresolved entries; do not write file |
On success, print:
```
✓ 163 persons parsed
✓ 87 relationships emitted (52 SPOUSE_OF, 35 PARENT_OF)
⚠ 12 unresolved (see unresolved[] in output)
→ out/canonical-persons-tree.json
```
---
## 10. Module Reuse
| Existing module | What we reuse |
|-----------------|---------------|
| `dates.parse_date()` | String date parsing — handles DD.MM.YYYY, year-only, month+year, approximate markers |
| `config.MONTHS` | Month name → integer mapping (German + Spanish month names already present) |
The Excel serial conversion is new logic added directly in `persons_tree.py` (3 lines).
---
## 11. What This Tool Does NOT Do
- Does not call the backend API or touch the database.
- Does not create `PersonNameAlias` records — it emits `maidenName` as a field; the future backend importer maps it.
- Does not infer SIBLING_OF edges (requires symmetric lookup across multiple rows — deferred).
- Does not deduplicate persons that appear in both this file and `canonical-persons.xlsx` — deduplication is the backend importer's responsibility.
- Does produce `birthPlace` / `deathPlace` as top-level fields in the JSON (see §8) — they are free-text strings and informational only. The `Person` entity has no corresponding columns; the future backend importer decides whether to add columns or fold the values into `notes`.
---
## 12. Resolved Decisions
| OQ | Question | Decision |
|----|----------|----------|
| OQ-01 | Duplicate rows (127/138 — Christa Schütz; 129/139 — Christoph Seils). | **Tool deduplicates.** On pass 1, after building the person list, detect rows with identical `(firstName, lastName, birthYear)` and keep only the first occurrence. Log skipped row ids to stdout. |
| OQ-02 | `birthPlace` / `deathPlace` absent from `Person` entity. | **Keep as separate top-level fields** in the JSON (`birthPlace`, `deathPlace`). The future backend importer may add columns to the `persons` table; the field is preserved here to avoid data loss. |
| OQ-03 | `firstName` = `"Charlotte,Meta,Jacobi"` (multi-name comma string). | **Store verbatim as `firstName`.** No splitting. |

View File

@@ -16,10 +16,6 @@ CMD ["npm", "run", "dev"]
# Compiles the SvelteKit Node-adapter output to /app/build.
FROM node:20.19.0-alpine3.21 AS build
WORKDIR /app
# VITE_SENTRY_DSN is a build-time variable — Vite bakes it into the bundle.
# Passed via docker-compose build.args; empty string disables the SDK.
ARG VITE_SENTRY_DSN
ENV VITE_SENTRY_DSN=$VITE_SENTRY_DSN
COPY package.json package-lock.json ./
RUN npm ci
COPY . .

View File

@@ -106,31 +106,6 @@ export default defineConfig(
]
}
},
{
// Forbid test fixtures (*.test-fixture.svelte) from being imported by
// production code. Tree-shaking keeps them out of the production bundle
// today (no route reaches them), but a lint rule makes the boundary
// explicit so an accidental autocomplete import in a route or component
// fails fast. Test files (*.spec.ts / *.test.ts) and the fixtures
// themselves are exempt — see the next block. Nora #2 on PR #629
// round 3.
files: ['**/*.svelte', '**/*.svelte.ts', '**/*.svelte.js', '**/*.ts'],
ignores: ['**/*.spec.ts', '**/*.test.ts', '**/*.test-fixture.svelte'],
rules: {
'no-restricted-imports': [
'error',
{
patterns: [
{
group: ['**/*.test-fixture.svelte'],
message:
'Test fixtures (*.test-fixture.svelte) are test-only — do not import from production code. Tracked by #637.'
}
]
}
]
}
},
{
plugins: { boundaries },
settings: {

View File

@@ -445,12 +445,8 @@
"person_mention_load_error": "Person konnte nicht geladen werden.",
"person_mention_loading": "Lade Person…",
"person_mention_popup_empty": "Keine Personen gefunden",
"person_mention_search_label": "Person suchen",
"person_mention_search_prompt": "Namen eingeben…",
"person_mention_btn_label": "Person verlinken",
"person_mention_create_new": "Neue Person anlegen",
"person_mention_results_count_singular": "1 Person gefunden",
"person_mention_results_count_plural": "{count} Personen gefunden",
"transcription_editor_aria_label": "Transkriptionstext",
"person_born_name_prefix": "geb.",
"page_title_home": "Archiv",
@@ -638,9 +634,6 @@
"transcription_block_review": "Als geprüft markieren",
"transcription_block_unreview": "Markierung aufheben",
"transcription_reviewed_count": "{reviewed} von {total} geprüft",
"transcription_mark_all_reviewed": "Alle als fertig markieren",
"transcription_mark_all_reviewed_disabled": "Alle Blöcke sind bereits als fertig markiert",
"transcription_mark_all_reviewed_error": "Markierung fehlgeschlagen. Bitte versuchen Sie es erneut.",
"training_ocr_heading": "Kurrent-Erkennung trainieren",
"training_ocr_description": "Starte ein neues Training mit den bisher geprüften OCR-Blöcken, um die Erkennungsgenauigkeit für Kurrentschrift zu verbessern.",
"training_ocr_blocks_ready": "{blocks} geprüfte Blöcke bereit / {docs} Dokumente",

View File

@@ -445,12 +445,8 @@
"person_mention_load_error": "Could not load person.",
"person_mention_loading": "Loading person…",
"person_mention_popup_empty": "No persons found",
"person_mention_search_label": "Search for a person",
"person_mention_search_prompt": "Enter a name…",
"person_mention_btn_label": "Link person",
"person_mention_create_new": "Create new person",
"person_mention_results_count_singular": "1 person found",
"person_mention_results_count_plural": "{count} persons found",
"transcription_editor_aria_label": "Transcription text",
"person_born_name_prefix": "née",
"page_title_home": "Archive",
@@ -638,9 +634,6 @@
"transcription_block_review": "Mark as reviewed",
"transcription_block_unreview": "Unmark as reviewed",
"transcription_reviewed_count": "{reviewed} of {total} reviewed",
"transcription_mark_all_reviewed": "Mark all as reviewed",
"transcription_mark_all_reviewed_disabled": "All blocks are already marked as reviewed",
"transcription_mark_all_reviewed_error": "Failed to mark all as reviewed. Please try again.",
"training_ocr_heading": "Train Kurrent recognition",
"training_ocr_description": "Start a new training run using the reviewed OCR blocks to improve recognition accuracy for Kurrent script.",
"training_ocr_blocks_ready": "{blocks} reviewed blocks ready / {docs} documents",

View File

@@ -445,12 +445,8 @@
"person_mention_load_error": "No se pudo cargar la persona.",
"person_mention_loading": "Cargando persona…",
"person_mention_popup_empty": "No se encontraron personas",
"person_mention_search_label": "Buscar persona",
"person_mention_search_prompt": "Escribe un nombre…",
"person_mention_btn_label": "Vincular persona",
"person_mention_create_new": "Crear nueva persona",
"person_mention_results_count_singular": "1 persona encontrada",
"person_mention_results_count_plural": "{count} personas encontradas",
"transcription_editor_aria_label": "Texto de transcripción",
"person_born_name_prefix": "n.",
"page_title_home": "Archivo",
@@ -638,9 +634,6 @@
"transcription_block_review": "Marcar como revisado",
"transcription_block_unreview": "Desmarcar como revisado",
"transcription_reviewed_count": "{reviewed} de {total} revisados",
"transcription_mark_all_reviewed": "Marcar todo como revisado",
"transcription_mark_all_reviewed_disabled": "Todos los bloques ya están marcados como revisados",
"transcription_mark_all_reviewed_error": "Error al marcar como revisado. Intente de nuevo.",
"training_ocr_heading": "Entrenar reconocimiento Kurrent",
"training_ocr_description": "Inicia un nuevo entrenamiento con los bloques OCR revisados para mejorar la precisión de reconocimiento del script Kurrent.",
"training_ocr_blocks_ready": "{blocks} bloques revisados listos / {docs} documentos",

View File

@@ -17,7 +17,6 @@ import PdfViewer from '$lib/document/viewer/PdfViewer.svelte';
import { bulkTitleFromFilename } from '$lib/document/filename';
import type { Tag } from '$lib/tag/TagInput.svelte';
import type { components } from '$lib/generated/api';
import { withCsrf } from '$lib/shared/cookies';
type Person = components['schemas']['Person'];
@@ -184,10 +183,7 @@ async function saveUpload() {
// FormData with per-chunk progress. Session cookie is sent automatically
// by the browser for same-origin requests.
try {
const res = await fetch(
'/api/documents/quick-upload',
withCsrf({ method: 'POST', body: formData })
);
const res = await fetch('/api/documents/quick-upload', { method: 'POST', body: formData });
const body = await res.json().catch(() => ({ errors: [] }));
const errorFilenames = new Set<string>(
(body.errors ?? []).map((err: { filename: string }) => err.filename)

View File

@@ -5,7 +5,7 @@ import { clickOutside } from '$lib/shared/actions/clickOutside';
import { formatDate } from '$lib/shared/utils/date';
type Document = components['schemas']['Document'];
type DocumentListItem = components['schemas']['DocumentListItem'];
type DocumentSearchItem = components['schemas']['DocumentSearchItem'];
interface Props {
selectedDocuments?: Document[];
@@ -45,12 +45,8 @@ function handleInput() {
try {
const res = await fetch(`/api/documents/search?q=${encodeURIComponent(searchTerm)}&size=10`);
if (res.ok) {
const body: { items: DocumentListItem[] } = await res.json();
const docs = body.items.map((it) => ({
id: it.id,
title: it.title,
documentDate: it.documentDate
})) as unknown as Document[];
const body: { items: DocumentSearchItem[] } = await res.json();
const docs = body.items.map((it) => it.document);
results = docs.filter((d) => !selectedDocuments.some((s) => s.id === d.id));
}
} catch {

View File

@@ -10,19 +10,7 @@ const docFactory = (id: string, title: string, date = '1880-01-01') => ({
title,
documentDate: date,
originalFilename: `${title}.pdf`,
receivers: [],
tags: [],
completionPercentage: 0,
contributors: [],
matchData: {
titleOffsets: [],
senderMatched: false,
matchedReceiverIds: [],
matchedTagIds: [],
snippetOffsets: [],
summaryOffsets: []
},
status: 'UPLOADED' as const,
status: 'UPLOADED',
metadataComplete: false,
scriptType: 'UNKNOWN' as const,
createdAt: '2024-01-01T00:00:00',
@@ -34,7 +22,7 @@ function mockSearchResponse(items: ReturnType<typeof docFactory>[]) {
'fetch',
vi.fn().mockResolvedValue({
ok: true,
json: vi.fn().mockResolvedValue({ items })
json: vi.fn().mockResolvedValue({ items: items.map((document) => ({ document })) })
})
);
}
@@ -103,7 +91,10 @@ describe('DocumentMultiSelect — search and select', () => {
const fetchMock = vi.fn().mockResolvedValue({
ok: true,
json: vi.fn().mockResolvedValue({
items: [docFactory('d1', 'Already attached'), docFactory('d2', 'Not attached')]
items: [
{ document: docFactory('d1', 'Already attached') },
{ document: docFactory('d2', 'Not attached') }
]
})
});
vi.stubGlobal('fetch', fetchMock);

View File

@@ -9,11 +9,11 @@ import ProgressRing from '$lib/shared/primitives/ProgressRing.svelte';
import ContributorStack from '$lib/shared/primitives/ContributorStack.svelte';
import DocumentThumbnail from './DocumentThumbnail.svelte';
type DocumentListItem = components['schemas']['DocumentListItem'];
type DocumentSearchItem = components['schemas']['DocumentSearchItem'];
let { item, canWrite = false }: { item: DocumentListItem; canWrite?: boolean } = $props();
let { item, canWrite = false }: { item: DocumentSearchItem; canWrite?: boolean } = $props();
const doc = $derived(item);
const doc = $derived(item.document);
const titleText = $derived(doc.title || doc.originalFilename);
const titleOffsets = $derived(item.matchData?.titleOffsets ?? []);
const titleSegments = $derived(applyOffsets(titleText, titleOffsets));

View File

@@ -14,17 +14,24 @@ afterEach(() => {
bulkSelectionStore.clear();
});
type DocumentListItem = components['schemas']['DocumentListItem'];
type DocumentSearchItem = components['schemas']['DocumentSearchItem'];
function makeItem(overrides: Partial<DocumentListItem> = {}): DocumentListItem {
function makeItem(overrides: Partial<DocumentSearchItem> = {}): DocumentSearchItem {
return {
id: '1',
title: 'Testbrief',
originalFilename: 'testbrief.pdf',
documentDate: '2024-03-15',
sender: undefined,
receivers: [],
tags: [],
document: {
id: '1',
title: 'Testbrief',
originalFilename: 'testbrief.pdf',
status: 'UPLOADED',
documentDate: '2024-03-15',
sender: null,
receivers: [],
tags: [],
createdAt: '2024-01-01T00:00:00Z',
updatedAt: '2024-01-01T00:00:00Z',
metadataComplete: false,
scriptType: 'UNKNOWN'
},
matchData: {
titleOffsets: [],
senderMatched: false,
@@ -48,14 +55,14 @@ describe('DocumentRow title', () => {
});
it('falls back to originalFilename when title is null', async () => {
const item = makeItem({ title: null as unknown as string });
const item = makeItem({ document: { ...makeItem().document, title: null } });
render(DocumentRow, { item });
await expect.element(page.getByRole('heading', { name: 'testbrief.pdf' })).toBeInTheDocument();
});
it('renders a mark element for highlighted title offsets', async () => {
const item = makeItem({
title: 'Brief an Anna',
document: { ...makeItem().document, title: 'Brief an Anna' },
matchData: {
titleOffsets: [{ start: 0, length: 5 }],
senderMatched: false,
@@ -102,12 +109,9 @@ describe('DocumentRow snippet', () => {
describe('DocumentRow sender', () => {
it('shows sender display name', async () => {
const item = makeItem({
sender: {
id: 's1',
lastName: 'Maria',
displayName: 'Großmutter Maria',
personType: 'PERSON',
familyMember: false
document: {
...makeItem().document,
sender: { id: 's1', displayName: 'Großmutter Maria' }
}
});
render(DocumentRow, { item });
@@ -122,12 +126,9 @@ describe('DocumentRow sender', () => {
it('highlights the sender when senderMatched is true', async () => {
const item = makeItem({
sender: {
id: 's1',
lastName: 'Maria',
displayName: 'Großmutter Maria',
personType: 'PERSON',
familyMember: false
document: {
...makeItem().document,
sender: { id: 's1', displayName: 'Großmutter Maria' }
},
matchData: {
...makeItem().matchData,
@@ -141,15 +142,10 @@ describe('DocumentRow sender', () => {
it('highlights a receiver when matchedReceiverIds includes its id', async () => {
const item = makeItem({
receivers: [
{
id: 'r1',
lastName: 'Karl',
displayName: 'Onkel Karl',
personType: 'PERSON',
familyMember: false
}
],
document: {
...makeItem().document,
receivers: [{ id: 'r1', displayName: 'Onkel Karl' }]
},
matchData: {
...makeItem().matchData,
matchedReceiverIds: ['r1']
@@ -166,7 +162,10 @@ describe('DocumentRow sender', () => {
describe('DocumentRow summary', () => {
it('renders the document summary when present', async () => {
const item = makeItem({
summary: 'Brief von Eugenie über die Heimreise aus dem Süden.'
document: {
...makeItem().document,
summary: 'Brief von Eugenie über die Heimreise aus dem Süden.'
}
});
render(DocumentRow, { item });
await expect
@@ -181,7 +180,7 @@ describe('DocumentRow summary', () => {
it('applies summary search-match highlight via summaryOffsets', async () => {
const item = makeItem({
summary: 'Brief über Menton',
document: { ...makeItem().document, summary: 'Brief über Menton' },
matchData: {
...makeItem().matchData,
summaryOffsets: [{ start: 11, length: 6 }]
@@ -197,19 +196,25 @@ describe('DocumentRow summary', () => {
describe('DocumentRow archive chips', () => {
it('renders the archive box chip when set', async () => {
const item = makeItem({ archiveBox: 'K3' });
const item = makeItem({
document: { ...makeItem().document, archiveBox: 'K3' }
});
render(DocumentRow, { item });
await expect.element(page.getByText('K3')).toBeInTheDocument();
});
it('renders the archive folder chip when set', async () => {
const item = makeItem({ archiveFolder: 'Mappe A' });
const item = makeItem({
document: { ...makeItem().document, archiveFolder: 'Mappe A' }
});
render(DocumentRow, { item });
await expect.element(page.getByText('Mappe A')).toBeInTheDocument();
});
it('renders the location chip when meta_location is set', async () => {
const item = makeItem({ location: 'Berlin' });
const item = makeItem({
document: { ...makeItem().document, location: 'Berlin' }
});
render(DocumentRow, { item });
await expect.element(page.getByText('Berlin')).toBeInTheDocument();
});
@@ -220,7 +225,10 @@ describe('DocumentRow archive chips', () => {
describe('DocumentRow tags', () => {
it('renders tag buttons', async () => {
const item = makeItem({
tags: [{ id: 't1', name: 'Familie' }]
document: {
...makeItem().document,
tags: [{ id: 't1', name: 'Familie', color: null, parentId: null }]
}
});
render(DocumentRow, { item });
await expect.element(page.getByRole('button', { name: 'Familie' })).toBeInTheDocument();
@@ -228,7 +236,10 @@ describe('DocumentRow tags', () => {
it('navigates to /documents?tag=… on tag click', async () => {
const item = makeItem({
tags: [{ id: 't1', name: 'Urlaub & Reise' }]
document: {
...makeItem().document,
tags: [{ id: 't1', name: 'Urlaub & Reise', color: null, parentId: null }]
}
});
render(DocumentRow, { item });
// Tailwind CSS isn't loaded in the vitest-browser client project, so the
@@ -244,7 +255,10 @@ describe('DocumentRow tags', () => {
it('tag click does not navigate to the document detail page', async () => {
const item = makeItem({
tags: [{ id: 't2', name: 'Familie' }]
document: {
...makeItem().document,
tags: [{ id: 't2', name: 'Familie', color: null, parentId: null }]
}
});
render(DocumentRow, { item });
const before = window.location.href;
@@ -267,7 +281,7 @@ describe('DocumentRow bulk selection checkbox', () => {
});
it('checkbox aria-label includes the document title', async () => {
const item = makeItem({ title: 'Brief an Anna' });
const item = makeItem({ document: { ...makeItem().document, title: 'Brief an Anna' } });
render(DocumentRow, { item, canWrite: true });
await expect
.element(page.getByRole('checkbox', { name: /Brief an Anna/i }))
@@ -275,7 +289,7 @@ describe('DocumentRow bulk selection checkbox', () => {
});
it('toggling the checkbox calls bulkSelectionStore.toggle', async () => {
const item = makeItem({ id: 'doc-42' });
const item = makeItem({ document: { ...makeItem().document, id: 'doc-42' } });
render(DocumentRow, { item, canWrite: true });
expect(bulkSelectionStore.has('doc-42')).toBe(false);
@@ -286,7 +300,7 @@ describe('DocumentRow bulk selection checkbox', () => {
it('checked state mirrors the store', async () => {
bulkSelectionStore.add('doc-99');
const item = makeItem({ id: 'doc-99' });
const item = makeItem({ document: { ...makeItem().document, id: 'doc-99' } });
render(DocumentRow, { item, canWrite: true });
await expect.element(page.getByRole('checkbox')).toBeChecked();
});

View File

@@ -20,31 +20,10 @@ const { default: DocumentRow } = await import('./DocumentRow.svelte');
afterEach(cleanup);
const sender = {
id: 's1',
lastName: 'Schmidt',
displayName: 'Anna Schmidt',
personType: 'PERSON' as const,
familyMember: false
};
const receiver = {
id: 'r1',
lastName: 'Meier',
displayName: 'Bert Meier',
personType: 'PERSON' as const,
familyMember: false
};
const sender = { id: 's1', displayName: 'Anna Schmidt' };
const receiver = { id: 'r1', displayName: 'Bert Meier' };
const emptyMatchData = {
titleOffsets: [],
senderMatched: false,
matchedReceiverIds: [],
matchedTagIds: [],
snippetOffsets: [],
summaryOffsets: []
};
const baseItem = (overrides: Record<string, unknown> = {}) => ({
const makeDoc = (overrides: Record<string, unknown> = {}) => ({
id: 'd1',
title: 'Brief 1923',
originalFilename: 'b.pdf',
@@ -52,16 +31,22 @@ const baseItem = (overrides: Record<string, unknown> = {}) => ({
sender,
receivers: [receiver],
tags: [],
summary: undefined,
archiveBox: undefined,
archiveFolder: undefined,
location: undefined,
matchData: emptyMatchData,
completionPercentage: 0,
contributors: [],
thumbnailUrl: null,
contentType: 'application/pdf',
summary: null,
archiveBox: null,
archiveFolder: null,
location: null,
...overrides
});
const baseItem = (docOverrides: Record<string, unknown> = {}) => ({
document: makeDoc(docOverrides),
matchData: null,
completionPercentage: 0,
contributors: []
});
describe('DocumentRow', () => {
it('renders the title', async () => {
render(DocumentRow, { props: { item: baseItem() } });
@@ -136,9 +121,12 @@ describe('DocumentRow', () => {
it('renders the snippet when matchData provides a transcriptionSnippet', async () => {
render(DocumentRow, {
props: {
item: baseItem({
matchData: { ...emptyMatchData, transcriptionSnippet: 'Hello world snippet' }
})
item: {
document: makeDoc(),
matchData: { transcriptionSnippet: 'Hello world snippet' },
completionPercentage: 50,
contributors: []
}
}
});

View File

@@ -1,7 +1,7 @@
import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
import TranscriptionBlockHost from './TranscriptionBlock.test-fixture.svelte';
import TranscriptionBlockHost from './TranscriptionBlock.test-host.svelte';
import type { ConfirmService } from '$lib/shared/services/confirm.svelte.js';
afterEach(cleanup);

View File

@@ -6,7 +6,6 @@ import TranscribeCoachEmptyState from '$lib/shared/help/TranscribeCoachEmptyStat
import type { PersonMention, TranscriptionBlockData } from '$lib/shared/types';
import { createBlockAutoSave } from '$lib/document/transcription/useBlockAutoSave.svelte';
import { createBlockDragDrop } from '$lib/document/transcription/useBlockDragDrop.svelte';
import { withCsrf } from '$lib/shared/cookies';
type Props = {
documentId: string;
@@ -50,7 +49,6 @@ let activeBlockId: string | null = $state(null);
let localLabels: string[] = $derived.by(() => [...trainingLabels]);
let listEl: HTMLElement | null = $state(null);
let markingAllReviewed = $state(false);
let markAllError = $state<string | null>(null);
const sortedBlocks = $derived([...blocks].sort((a, b) => a.sortOrder - b.sortOrder));
const hasBlocks = $derived(blocks.length > 0);
@@ -69,11 +67,8 @@ $effect(() => {
async function handleMarkAllReviewed() {
if (!onMarkAllReviewed) return;
markingAllReviewed = true;
markAllError = null;
try {
await onMarkAllReviewed();
} catch {
markAllError = m.transcription_mark_all_reviewed_error();
} finally {
markingAllReviewed = false;
}
@@ -114,14 +109,11 @@ function handleDelete(blockId: string) {
async function reorder(newOrder: string[]) {
try {
const res = await fetch(
`/api/documents/${documentId}/transcription-blocks/reorder`,
withCsrf({
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ blockIds: newOrder })
})
);
const res = await fetch(`/api/documents/${documentId}/transcription-blocks/reorder`, {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ blockIds: newOrder })
});
if (!res.ok) return;
const updated = await res.json();
for (const b of updated) {
@@ -177,7 +169,7 @@ async function handleLabelToggle(label: string) {
<button
onclick={handleMarkAllReviewed}
disabled={allReviewed || markingAllReviewed}
title={allReviewed ? m.transcription_mark_all_reviewed_disabled() : undefined}
title={allReviewed ? 'Alle Blöcke sind bereits als fertig markiert' : undefined}
class="flex min-h-[44px] items-center gap-1.5 rounded-sm px-3 font-sans text-xs font-medium text-brand-navy/80 transition-colors hover:text-brand-navy focus-visible:ring-2 focus-visible:ring-brand-navy disabled:opacity-40"
>
{#if markingAllReviewed}
@@ -215,7 +207,7 @@ async function handleLabelToggle(label: string) {
<path stroke-linecap="round" stroke-linejoin="round" d="M5 13l4 4L19 7" />
</svg>
{/if}
{m.transcription_mark_all_reviewed()}
Alle als fertig markieren
</button>
{/if}
</div>
@@ -225,31 +217,6 @@ async function handleLabelToggle(label: string) {
style="width: {reviewProgress}%"
></div>
</div>
{#if markAllError}
<div
role="alert"
class="mt-1.5 flex items-center gap-2 rounded-sm border border-red-200 bg-red-50 px-3 py-2 font-sans text-sm text-red-700"
>
<span class="flex-1">{markAllError}</span>
<button
onclick={() => (markAllError = null)}
aria-label={m.comp_dismiss()}
class="flex min-h-[44px] min-w-[44px] items-center justify-center rounded text-red-600 hover:text-red-700 focus-visible:ring-2 focus-visible:ring-red-500"
>
<svg
class="h-4 w-4"
fill="none"
stroke="currentColor"
stroke-width="2"
viewBox="0 0 24 24"
xmlns="http://www.w3.org/2000/svg"
aria-hidden="true"
>
<path stroke-linecap="round" stroke-linejoin="round" d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
</div>
{/if}
</div>
<div class="p-4">
<!-- svelte-ignore a11y_no_static_element_interactions -->

View File

@@ -3,7 +3,6 @@ import { cleanup, render } from 'vitest-browser-svelte';
import { page, userEvent } from 'vitest/browser';
import TranscriptionEditView from './TranscriptionEditView.svelte';
import { createConfirmService, CONFIRM_KEY } from '$lib/shared/services/confirm.svelte.js';
import { m } from '$lib/paraglide/messages.js';
afterEach(cleanup);
@@ -313,14 +312,14 @@ describe('TranscriptionEditView — mark all reviewed', () => {
onMarkAllReviewed: vi.fn().mockResolvedValue(undefined)
});
await expect
.element(page.getByRole('button', { name: m.transcription_mark_all_reviewed() }))
.element(page.getByRole('button', { name: /Alle als fertig markieren/ }))
.toBeInTheDocument();
});
it('does not show "Alle als fertig markieren" button when onMarkAllReviewed is not provided', async () => {
renderView({ blocks: [unreviewedBlock1, unreviewedBlock2] });
await expect
.element(page.getByRole('button', { name: m.transcription_mark_all_reviewed() }))
.element(page.getByRole('button', { name: /Alle als fertig markieren/ }))
.not.toBeInTheDocument();
});
@@ -330,7 +329,7 @@ describe('TranscriptionEditView — mark all reviewed', () => {
onMarkAllReviewed: vi.fn().mockResolvedValue(undefined)
});
await expect
.element(page.getByRole('button', { name: m.transcription_mark_all_reviewed() }))
.element(page.getByRole('button', { name: /Alle als fertig markieren/ }))
.toBeDisabled();
});
@@ -344,7 +343,7 @@ describe('TranscriptionEditView — mark all reviewed', () => {
// userEvent.click() via Playwright CDP doesn't reliably trigger Svelte 5 onclick
// handlers when a TipTap editor is mounted in the same component tree.
const btn = (await page
.getByRole('button', { name: m.transcription_mark_all_reviewed() })
.getByRole('button', { name: /Alle als fertig markieren/ })
.element()) as HTMLButtonElement;
btn.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true }));
await vi.waitFor(() => expect(onMarkAllReviewed).toHaveBeenCalledTimes(1));
@@ -362,83 +361,12 @@ describe('TranscriptionEditView — mark all reviewed', () => {
// Same CDP click workaround: dispatch from browser JS to reliably fire Svelte 5 onclick
const btnEl = (await page
.getByRole('button', { name: m.transcription_mark_all_reviewed() })
.getByRole('button', { name: /Alle als fertig markieren/ })
.element()) as HTMLButtonElement;
btnEl.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true }));
await expect
.element(page.getByRole('button', { name: m.transcription_mark_all_reviewed() }))
.element(page.getByRole('button', { name: /Alle als fertig markieren/ }))
.toBeDisabled();
resolveMarkAll();
});
it('shows error message when onMarkAllReviewed callback rejects', async () => {
const onMarkAllReviewed = vi.fn().mockRejectedValue(new Error('INTERNAL_ERROR'));
renderView({ blocks: [unreviewedBlock1, unreviewedBlock2], onMarkAllReviewed });
const btnEl = (await page
.getByRole('button', { name: m.transcription_mark_all_reviewed() })
.element()) as HTMLButtonElement;
btnEl.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true }));
await expect.element(page.getByRole('alert')).toBeInTheDocument();
await expect
.element(page.getByRole('alert'))
.toHaveTextContent(m.transcription_mark_all_reviewed_error());
});
it('clears error when dismiss button is clicked', async () => {
const onMarkAllReviewed = vi.fn().mockRejectedValue(new Error('INTERNAL_ERROR'));
renderView({ blocks: [unreviewedBlock1, unreviewedBlock2], onMarkAllReviewed });
const btnEl = (await page
.getByRole('button', { name: m.transcription_mark_all_reviewed() })
.element()) as HTMLButtonElement;
btnEl.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true }));
await expect.element(page.getByRole('alert')).toBeInTheDocument();
const dismissEl = (await page
.getByRole('button', { name: m.comp_dismiss() })
.element()) as HTMLButtonElement;
dismissEl.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true }));
await expect.element(page.getByRole('alert')).not.toBeInTheDocument();
});
it('clears error on next successful markAllReviewed call', async () => {
const onMarkAllReviewed = vi
.fn()
.mockRejectedValueOnce(new Error('INTERNAL_ERROR'))
.mockResolvedValue(undefined);
renderView({ blocks: [unreviewedBlock1, unreviewedBlock2], onMarkAllReviewed });
const btnEl = (await page
.getByRole('button', { name: m.transcription_mark_all_reviewed() })
.element()) as HTMLButtonElement;
btnEl.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true }));
await expect.element(page.getByRole('alert')).toBeInTheDocument();
// Wait for the button to be re-enabled before the second click — ensures the first
// async rejection has fully settled and Svelte has flushed state changes
await expect
.element(page.getByRole('button', { name: m.transcription_mark_all_reviewed() }))
.not.toBeDisabled();
btnEl.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true }));
await expect.element(page.getByRole('alert')).not.toBeInTheDocument();
});
it('re-enables button after markAllReviewed failure', async () => {
const onMarkAllReviewed = vi.fn().mockRejectedValue(new Error('INTERNAL_ERROR'));
renderView({ blocks: [unreviewedBlock1, unreviewedBlock2], onMarkAllReviewed });
const btnEl = (await page
.getByRole('button', { name: m.transcription_mark_all_reviewed() })
.element()) as HTMLButtonElement;
btnEl.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true }));
await expect.element(page.getByRole('alert')).toBeInTheDocument();
await expect
.element(page.getByRole('button', { name: m.transcription_mark_all_reviewed() }))
.not.toBeDisabled();
});
});

View File

@@ -1,6 +1,5 @@
import { SvelteMap } from 'svelte/reactivity';
import type { PersonMention } from '$lib/shared/types';
import { withCsrf } from '$lib/shared/cookies';
export type SaveState = 'idle' | 'saving' | 'saved' | 'fading' | 'error';
@@ -117,15 +116,12 @@ export function createBlockAutoSave({ saveFn, documentId }: Options) {
for (const [blockId, text] of pendingTexts) {
const mentions = pendingMentions.get(blockId) ?? [];
clearDebounce(blockId);
void fetch(
`/api/documents/${documentId}/transcription-blocks/${blockId}`,
withCsrf({
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ text, mentionedPersons: mentions }),
keepalive: true
})
);
void fetch(`/api/documents/${documentId}/transcription-blocks/${blockId}`, {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ text, mentionedPersons: mentions }),
keepalive: true
});
pendingTexts.delete(blockId);
pendingMentions.delete(blockId);
}

View File

@@ -259,15 +259,12 @@ describe('createTranscriptionBlocks.markAllReviewed', () => {
expect(ctrl.blocks.every((b) => b.reviewed)).toBe(true);
});
it('throws and leaves blocks unchanged when PUT returns non-OK', async () => {
it('is a no-op when PUT returns non-OK', async () => {
const fetchImpl = vi.fn(async (url: RequestInfo | URL, init?: RequestInit) => {
const u = url.toString();
const method = init?.method ?? 'GET';
if (u.includes('/review-all') && method === 'PUT') {
return new Response(JSON.stringify({ code: 'INTERNAL_ERROR' }), {
status: 500,
headers: { 'Content-Type': 'application/json' }
});
return new Response('', { status: 500 });
}
return new Response(JSON.stringify([baseBlock({ id: 'b-1', reviewed: false })]), {
status: 200,
@@ -277,26 +274,7 @@ describe('createTranscriptionBlocks.markAllReviewed', () => {
const ctrl = createTranscriptionBlocks({ documentId: () => 'doc-1', fetchImpl });
await ctrl.load();
await expect(ctrl.markAllReviewed()).rejects.toThrow('INTERNAL_ERROR');
expect(ctrl.blocks[0].reviewed).toBe(false);
});
it('throws INTERNAL_ERROR when PUT returns non-JSON body (e.g. nginx 502)', async () => {
const fetchImpl = vi.fn(async (url: RequestInfo | URL, init?: RequestInit) => {
const u = url.toString();
const method = init?.method ?? 'GET';
if (u.includes('/review-all') && method === 'PUT') {
return new Response('Bad Gateway', { status: 502 });
}
return new Response(JSON.stringify([baseBlock({ id: 'b-1', reviewed: false })]), {
status: 200,
headers: { 'Content-Type': 'application/json' }
});
});
const ctrl = createTranscriptionBlocks({ documentId: () => 'doc-1', fetchImpl });
await ctrl.load();
await expect(ctrl.markAllReviewed()).rejects.toThrow('INTERNAL_ERROR');
await ctrl.markAllReviewed();
expect(ctrl.blocks[0].reviewed).toBe(false);
});
});

View File

@@ -2,7 +2,6 @@
lastEditedAt's $derived are scope-local to one computation; they're never
stored on $state. */
import type { TranscriptionBlockData, PersonMention } from '$lib/shared/types';
import { makeCsrfFetch } from '$lib/shared/cookies';
import { saveBlockWithConflictRetry } from './saveBlockWithConflictRetry';
import { BlockConflictResolvedError } from './blockConflictMerge';
@@ -42,7 +41,7 @@ export function createTranscriptionBlocks(
options: TranscriptionBlocksOptions
): TranscriptionBlocksController {
const { documentId } = options;
const fetchImpl = makeCsrfFetch(options.fetchImpl ?? fetch);
const fetchImpl = options.fetchImpl ?? fetch;
let blocks = $state<TranscriptionBlockData[]>([]);
let annotationReloadKey = $state(0);
@@ -120,11 +119,7 @@ export function createTranscriptionBlocks(
const res = await fetchImpl(`/api/documents/${documentId()}/transcription-blocks/review-all`, {
method: 'PUT'
});
if (!res.ok) {
const body = await res.json().catch(() => ({}));
// Never render body.message — route through getErrorMessage() to prevent leaking backend internals
throw new Error((body as { code?: string })?.code ?? 'INTERNAL_ERROR');
}
if (!res.ok) return;
const updated = (await res.json()) as { id: string; reviewed: boolean }[];
for (const b of updated) {
const existing = blocks.find((x) => x.id === b.id);

View File

@@ -2068,20 +2068,12 @@ export interface components {
};
ImportStatus: {
/** @enum {string} */
state: "IDLE" | "RUNNING" | "DONE" | "FAILED";
statusCode: string;
state?: "IDLE" | "RUNNING" | "DONE" | "FAILED";
statusCode?: string;
/** Format: int32 */
processed: number;
skippedFiles: components["schemas"]["SkippedFile"][];
processed?: number;
/** Format: date-time */
startedAt?: string;
/** Format: int32 */
skipped?: number;
};
SkippedFile: {
filename: string;
/** @enum {string} */
reason: "INVALID_FILENAME_PATH_TRAVERSAL" | "INVALID_PDF_SIGNATURE" | "FILE_READ_ERROR" | "ALREADY_EXISTS" | "S3_UPLOAD_FAILED";
};
BackfillStatus: {
/** @enum {string} */
@@ -2205,10 +2197,10 @@ export interface components {
totalStories: number;
};
PersonSummaryDTO: {
title?: string;
/** Format: uuid */
id?: string;
displayName?: string;
title?: string;
firstName?: string;
lastName?: string;
/** Format: int64 */
@@ -2315,14 +2307,14 @@ export interface components {
/** Format: int32 */
totalPages?: number;
pageable?: components["schemas"]["PageableObject"];
first?: boolean;
last?: boolean;
/** Format: int32 */
size?: number;
content?: components["schemas"]["NotificationDTO"][];
/** Format: int32 */
number?: number;
sort?: components["schemas"]["SortObject"];
first?: boolean;
last?: boolean;
/** Format: int32 */
numberOfElements?: number;
empty?: boolean;
@@ -2388,28 +2380,15 @@ export interface components {
/** Format: int32 */
totalPages?: number;
};
DocumentListItem: {
/** Format: uuid */
id: string;
title: string;
originalFilename: string;
thumbnailUrl?: string;
/** Format: date */
documentDate?: string;
sender?: components["schemas"]["Person"];
receivers: components["schemas"]["Person"][];
tags: components["schemas"]["Tag"][];
archiveBox?: string;
archiveFolder?: string;
location?: string;
summary?: string;
DocumentSearchItem: {
document: components["schemas"]["Document"];
matchData: components["schemas"]["SearchMatchData"];
/** Format: int32 */
completionPercentage: number;
contributors: components["schemas"]["ActivityActorDTO"][];
matchData: components["schemas"]["SearchMatchData"];
};
DocumentSearchResult: {
items: components["schemas"]["DocumentListItem"][];
items: components["schemas"]["DocumentSearchItem"][];
/** Format: int64 */
totalElements: number;
/** Format: int32 */

View File

@@ -2,7 +2,6 @@
import TrainingHistory from './TrainingHistory.svelte';
import { m } from '$lib/paraglide/messages.js';
import type { TrainingRun } from '$lib/ocr/training.js';
import { withCsrf } from '$lib/shared/cookies';
interface TrainingInfo {
availableBlocks?: number;
@@ -34,7 +33,7 @@ async function startTraining() {
successMessage = null;
errorMessage = null;
try {
const res = await fetch('/api/ocr/train', withCsrf({ method: 'POST' }));
const res = await fetch('/api/ocr/train', { method: 'POST' });
if (res.ok) {
successMessage = m.training_success();
setTimeout(() => {

View File

@@ -2,7 +2,6 @@
import TrainingHistory from './TrainingHistory.svelte';
import { m } from '$lib/paraglide/messages.js';
import type { TrainingRun } from '$lib/ocr/training.js';
import { withCsrf } from '$lib/shared/cookies';
interface TrainingInfo {
availableSegBlocks?: number;
@@ -28,7 +27,7 @@ async function startTraining() {
training = true;
successMessage = null;
try {
const res = await fetch('/api/ocr/segtrain', withCsrf({ method: 'POST' }));
const res = await fetch('/api/ocr/segtrain', { method: 'POST' });
if (res.ok) {
successMessage = m.training_success();
setTimeout(() => {

View File

@@ -1,20 +0,0 @@
import { describe, it, expect } from 'vitest';
import { extractErrorCode } from './api.server';
describe('extractErrorCode', () => {
it('returns the code string when error has a code property', () => {
expect(extractErrorCode({ code: 'DOCUMENT_NOT_FOUND' })).toBe('DOCUMENT_NOT_FOUND');
});
it('returns undefined when error is undefined', () => {
expect(extractErrorCode(undefined)).toBeUndefined();
});
it('returns undefined when error is null', () => {
expect(extractErrorCode(null)).toBeUndefined();
});
it('returns undefined when error is a plain string', () => {
expect(extractErrorCode('oops')).toBeUndefined();
});
it('returns undefined when error object has no code property', () => {
expect(extractErrorCode({ message: 'fail' })).toBeUndefined();
});
});

View File

@@ -23,11 +23,3 @@ export function createApiClient(fetch: typeof globalThis.fetch) {
fetch
});
}
export interface ApiError {
code?: string;
}
export function extractErrorCode(error: unknown): string | undefined {
return (error as ApiError | undefined)?.code;
}

View File

@@ -1,46 +1,3 @@
/**
* Reads the XSRF-TOKEN cookie set by Spring Security's CookieCsrfTokenRepository.
* Returns null outside the browser or when the cookie is absent.
*/
export function getCsrfToken(): string | null {
if (typeof document === 'undefined') return null;
const match = document.cookie.match(/(?:^|;\s*)XSRF-TOKEN=([^;]+)/);
return match ? decodeURIComponent(match[1]) : null;
}
/**
* Merges the X-XSRF-TOKEN header into a RequestInit so Spring Security's
* CSRF filter accepts the request. Safe to call server-side (no-op when the
* cookie is absent).
*/
export function withCsrf(init?: RequestInit): RequestInit {
const token = getCsrfToken();
if (!token) return init ?? {};
const headers = new Headers(init?.headers);
headers.set('X-XSRF-TOKEN', token);
return { ...init, headers };
}
/**
* Wraps a fetch implementation so that every state-mutating call (POST, PUT,
* PATCH, DELETE) automatically includes the X-XSRF-TOKEN header. GET/HEAD
* requests pass through unchanged.
*
* Used to CSRF-protect client-side hooks that accept an injectable fetchImpl.
* In unit tests the injected mock is wrapped but getCsrfToken() returns null
* (no browser cookie), so no header is added and existing test expectations
* are unaffected.
*/
export function makeCsrfFetch(inner: typeof fetch): typeof fetch {
return (input: RequestInfo | URL, init?: RequestInit): Promise<Response> => {
const method = (init?.method ?? 'GET').toUpperCase();
if (['POST', 'PUT', 'PATCH', 'DELETE'].includes(method)) {
return inner(input, withCsrf(init));
}
return inner(input, init);
};
}
/**
* Extracts the fa_session cookie value from a list of Set-Cookie response headers.
*

View File

@@ -2,18 +2,7 @@
import type { components } from '$lib/generated/api';
// eslint-disable-next-line boundaries/dependencies -- mention dropdown needs person date formatting; extract to shared if it becomes reusable
import { formatLifeDateRange } from '$lib/person/personLifeDates';
import { untrack } from 'svelte';
import { m } from '$lib/paraglide/messages.js';
// Layered defence cap on the @mention search query length (CWE-400
// amplification). The <input maxlength> attribute below caps direct
// user edits, but the editor-mirror path (Tiptap contenteditable -> mirror
// $effect -> searchQuery) is not covered by `maxlength` since the
// contenteditable has no such enforcement. Clipping at the mirror keeps
// the cap honest from both paths. Tracked server-side separately.
// Nora #1 on PR #629. Hoisted to mentionConstants.ts so the host editor
// (PersonMentionEditor) can clip the inserted displayName to the same cap
// — see Felix #3 on PR #629.
import { MAX_QUERY_LENGTH } from './mentionConstants';
type Person = components['schemas']['Person'];
@@ -28,46 +17,7 @@ type DropdownState = {
clientRect: (() => DOMRect | null) | null;
};
let {
model,
editorQuery = '',
onSearch = () => {}
}: {
model: DropdownState;
/** Text typed after `@` in the host editor. Mirrors into the search input
* until the user takes manual ownership by typing into the input itself. */
editorQuery?: string;
onSearch?: (query: string) => void;
} = $props();
let searchQuery = $state(untrack(() => editorQuery.slice(0, MAX_QUERY_LENGTH)));
let userHasEdited = $state(false);
// Intent-revealing alias used by both the persistent aria-live announcer and
// the visible empty-state copy. Folding the duplicated rule into one $derived
// keeps the two branches in lockstep. Felix #3 on PR #629 round 4.
const isQueryEmpty = $derived(searchQuery.trim() === '');
// Mirror the editor's typed text until the user takes ownership.
//
// Why `$state + $effect` (not `$derived`): `searchQuery` is also written by
// `bind:value` on the <input> below, so it needs to be a mutable `$state`.
// A `$derived` would be read-only and would clobber direct user edits on
// every editor keystroke. The `userHasEdited` latch pins ownership once the
// user types into the input. Felix #1 on PR #629.
$effect(() => {
if (!userHasEdited) {
searchQuery = editorQuery.slice(0, MAX_QUERY_LENGTH);
}
});
// Fire onSearch whenever the effective query changes — covers both the
// editor mirror and direct input edits. This is the only place onSearch
// fires; when the dropdown is unmounted, the effect is disposed and no
// further fetches occur.
$effect(() => {
onSearch(searchQuery);
});
let { model }: { model: DropdownState } = $props();
// highlightedIndex must be both writable (keyboard handler mutates it) and
// reset when `items` changes (so it never points past the end of a new list).
@@ -162,70 +112,16 @@ function selectItem(item: Person) {
unauthenticated users.
-->
<div
class="fixed z-50 w-72 max-w-[calc(100vw-1rem)] overflow-hidden rounded-sm border border-line bg-surface shadow-lg"
class="fixed z-50 w-72 overflow-hidden rounded-sm border border-line bg-surface shadow-lg"
role="listbox"
aria-label={m.person_mention_btn_label()}
style:top={position.top}
style:bottom={position.bottom}
style:left={position.left}
>
<div class="border-b border-line px-3 py-2">
<label class="sr-only" for="mention-search">{m.person_mention_search_label()}</label>
<div class="flex items-center gap-2">
<svg
aria-hidden="true"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="2"
class="h-5 w-5 shrink-0 text-ink-2"
>
<circle cx="11" cy="11" r="7" />
<path d="m20 20-3.5-3.5" stroke-linecap="round" />
</svg>
<input
id="mention-search"
type="search"
data-test-search-input
maxlength={MAX_QUERY_LENGTH}
class="min-h-[44px] w-full bg-transparent font-sans text-base text-ink placeholder:text-ink-3 focus:outline-none focus-visible:ring-2 focus-visible:ring-brand-navy focus-visible:ring-inset"
placeholder={m.person_mention_search_prompt()}
bind:value={searchQuery}
oninput={() => {
userHasEdited = true;
}}
onmousedown={(e) => e.stopPropagation()}
/>
</div>
</div>
<!--
Persistent aria-live region — lives ABOVE the conditional branches so the
element never unmounts when items transition between empty and populated.
VoiceOver in particular swallows announcements from freshly-mounted live
regions, and the previous (conditional-inside) markup silently dropped
the "N persons found" announcement when results populated. Leonie #3 on
PR #629 round 3.
-->
<p class="sr-only" aria-live="polite">
{#if model.items.length === 0}
{isQueryEmpty ? m.person_mention_search_prompt() : m.person_mention_popup_empty()}
{:else if model.items.length === 1}
{m.person_mention_results_count_singular()}
{:else}
{m.person_mention_results_count_plural({ count: model.items.length })}
{/if}
</p>
{#if model.items.length === 0}
<!--
Visible empty-state copy — visual-only. The persistent sr-only <p>
above is the sole AT announcer; this one is hidden from screen readers
via aria-hidden="true" so VoiceOver does not double-announce
(NVDA de-dups, VoiceOver does not). Leonie S-2 on PR #629 round 4.
Do NOT add an aria-live attribute here — that would re-introduce
the duplicate announcement.
-->
<p aria-hidden="true" class="px-3 py-2.5 font-sans text-sm text-ink-3">
{isQueryEmpty ? m.person_mention_search_prompt() : m.person_mention_popup_empty()}
<p class="px-3 py-2.5 font-sans text-sm text-ink-3">
{m.person_mention_popup_empty()}
</p>
<!--
Empty-state escape hatch — without it the transcriber has to close
@@ -236,7 +132,7 @@ function selectItem(item: Person) {
<a
href="/persons/new"
target="_blank"
rel="noopener noreferrer"
rel="noopener"
class="flex min-h-[44px] items-center gap-2 border-t border-line px-3 py-2.5 font-sans text-sm font-medium text-brand-navy hover:bg-canvas focus:bg-canvas focus:outline-none"
onmousedown={(e) => e.preventDefault()}
>

View File

@@ -1,37 +1,22 @@
import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page, userEvent } from 'vitest/browser';
import { flushSync, mount, tick, unmount } from 'svelte';
import { page } from 'vitest/browser';
import MentionDropdown from './MentionDropdown.svelte';
import MentionDropdownFixture from './MentionDropdown.test-fixture.svelte';
import { m } from '$lib/paraglide/messages.js';
import type { components } from '$lib/generated/api';
type Person = components['schemas']['Person'];
afterEach(cleanup);
const makePerson = (id: string, name: string, overrides: Partial<Person> = {}): Person => {
const parts = name.split(' ');
return {
id,
firstName: parts[0],
lastName: parts.slice(1).join(' ') || name,
displayName: name,
personType: 'PERSON',
familyMember: false,
...overrides
};
};
const makePerson = (id: string, name: string, overrides: Record<string, unknown> = {}) => ({
id,
firstName: name.split(' ')[0] ?? null,
lastName: name.split(' ').slice(1).join(' ') || name,
displayName: name,
birthYear: null as number | null,
deathYear: null as number | null,
...overrides
});
type DropdownState = {
items: Person[];
command: (item: Person) => void;
clientRect: (() => DOMRect | null) | null;
};
const baseModel = (overrides: Partial<DropdownState> = {}): DropdownState => ({
items: [],
const baseModel = (overrides: Record<string, unknown> = {}) => ({
items: [] as ReturnType<typeof makePerson>[],
command: vi.fn(),
clientRect: () => new DOMRect(100, 100, 0, 24),
...overrides
@@ -44,32 +29,14 @@ describe('MentionDropdown', () => {
await expect.element(page.getByRole('listbox', { name: /person verlinken/i })).toBeVisible();
});
it('shows the "enter a name" prompt when the search field is empty', async () => {
it('renders the empty placeholder when items is empty', async () => {
render(MentionDropdown, { props: { model: baseModel() } });
// Scope to the visible empty-state <p> (text-ink-3) — the persistent
// sr-only aria-live region above also contains the same prompt copy.
const visibleEmptyP = document.querySelector(
'[role="listbox"] p.text-ink-3'
) as HTMLElement | null;
expect(visibleEmptyP).not.toBeNull();
expect(visibleEmptyP!.textContent ?? '').toContain(m.person_mention_search_prompt());
expect(visibleEmptyP!.textContent ?? '').not.toContain(m.person_mention_popup_empty());
});
it('shows "no persons found" when the search has a query but the list is empty', async () => {
render(MentionDropdown, { props: { model: baseModel(), editorQuery: 'WdG' } });
const visibleEmptyP = document.querySelector(
'[role="listbox"] p.text-ink-3'
) as HTMLElement | null;
expect(visibleEmptyP).not.toBeNull();
expect(visibleEmptyP!.textContent ?? '').toContain(m.person_mention_popup_empty());
expect(visibleEmptyP!.textContent ?? '').not.toContain(m.person_mention_search_prompt());
await expect.element(page.getByText('Keine Personen gefunden')).toBeVisible();
});
it('shows the create-new escape hatch link in the empty state', async () => {
render(MentionDropdown, { props: { model: baseModel(), editorQuery: 'unknown' } });
render(MentionDropdown, { props: { model: baseModel() } });
const link = (await page
.getByRole('link', { name: /neue person anlegen/i })
@@ -77,7 +44,6 @@ describe('MentionDropdown', () => {
expect(link.href).toContain('/persons/new');
expect(link.target).toBe('_blank');
expect(link.rel).toContain('noopener');
expect(link.rel).toContain('noreferrer');
});
it('renders one option per item when populated', async () => {
@@ -138,315 +104,3 @@ describe('MentionDropdown', () => {
expect(dropdown.style.left).toBe('123px');
});
});
// ─── Search input — Issue #380 ────────────────────────────────────────────────
describe('MentionDropdown — search input', () => {
it('renders a search input pre-filled with the editorQuery prop', async () => {
render(MentionDropdown, {
props: { model: baseModel(), editorQuery: 'WdG' }
});
await expect.element(page.getByRole('searchbox')).toHaveValue('WdG');
});
it('exposes a data-test-search-input attribute for E2E selectors', async () => {
render(MentionDropdown, { props: { model: baseModel() } });
const input = document.querySelector('[data-test-search-input]');
expect(input).not.toBeNull();
expect((input as HTMLInputElement).type).toBe('search');
});
it('search input wrapper meets the 44px touch target (WCAG 2.2 AA)', async () => {
render(MentionDropdown, { props: { model: baseModel() } });
const input = document.querySelector('[data-test-search-input]') as HTMLElement;
expect(input).not.toBeNull();
expect(input.className).toContain('min-h-[44px]');
});
it('renders a persistent aria-live="polite" region (does not remount on items transition; Leonie #3 on PR #629)', async () => {
render(MentionDropdown, { props: { model: baseModel() } });
const listbox = document.querySelector('[role="listbox"]');
expect(listbox).not.toBeNull();
const live = listbox!.querySelector('p[aria-live="polite"]');
expect(live).not.toBeNull();
// Empty + empty-query → "Namen eingeben…" prompt
expect(live!.textContent ?? '').toContain(m.person_mention_search_prompt());
});
it('announces the result count in the persistent live region when items populate (Leonie #3 on PR #629)', async () => {
render(MentionDropdown, {
props: {
model: baseModel({
items: [
makePerson('p1', 'Anna Schmidt'),
makePerson('p2', 'Bert Meier'),
makePerson('p3', 'Carl Vogel')
]
})
}
});
const listbox = document.querySelector('[role="listbox"]');
expect(listbox).not.toBeNull();
const live = listbox!.querySelector('p[aria-live="polite"]');
expect(live).not.toBeNull();
// Populated → "3 Personen gefunden" (plural)
expect(live!.textContent ?? '').toContain('3');
});
it('announces the singular form when exactly one item is present (Sara #4 on PR #629)', async () => {
render(MentionDropdown, {
props: {
model: baseModel({
items: [makePerson('p1', 'Anna Schmidt')]
})
}
});
const listbox = document.querySelector('[role="listbox"]');
expect(listbox).not.toBeNull();
const live = listbox!.querySelector('p[aria-live="polite"]');
expect(live).not.toBeNull();
// Singular branch — "1 Person gefunden" / "1 person found" / "1 persona encontrada"
// (locale-dependent; resolved via the Paraglide message helper).
expect(live!.textContent ?? '').toContain(m.person_mention_results_count_singular());
});
it('keeps the visible empty-state copy without its own aria-live and hides it from AT (Leonie #3 on PR #629 round 3; Leonie S-2 round 4)', async () => {
render(MentionDropdown, { props: { model: baseModel(), editorQuery: 'WdG' } });
// Visible empty-state <p> exists with the empty-result copy ...
const empty = document.querySelector('p.text-ink-3') as HTMLElement | null;
expect(empty).not.toBeNull();
expect(empty!.textContent ?? '').toContain(m.person_mention_popup_empty());
// ... but it must NOT carry its own aria-live (the persistent sr-only
// region above the conditional is the announcer now).
expect(empty!.hasAttribute('aria-live')).toBe(false);
// ... and it MUST be hidden from screen readers via aria-hidden="true"
// so VoiceOver does not double-announce (the persistent sr-only region
// is the sole AT source of truth). Leonie S-2 on PR #629 round 4.
expect(empty!.getAttribute('aria-hidden')).toBe('true');
});
it('renders the magnifier icon at h-5 w-5 with text-ink-2 (Leonie BLOCKER on PR #629)', async () => {
render(MentionDropdown, { props: { model: baseModel() } });
const icon = document.querySelector('[data-test-search-input]')
?.previousElementSibling as SVGElement | null;
expect(icon).not.toBeNull();
expect(icon!.tagName.toLowerCase()).toBe('svg');
expect(icon!.getAttribute('class') ?? '').toContain('h-5');
expect(icon!.getAttribute('class') ?? '').toContain('w-5');
expect(icon!.getAttribute('class') ?? '').toContain('text-ink-2');
});
it('caps the search input at maxlength=100 (CWE-400 amplification — Nora on PR #629)', async () => {
render(MentionDropdown, { props: { model: baseModel() } });
const input = document.querySelector('[data-test-search-input]') as HTMLInputElement;
expect(input).not.toBeNull();
expect(input.maxLength).toBe(100);
});
it('clips a long editorQuery mirror to 100 chars (CWE-400 layered — Nora #1 on PR #629)', async () => {
const longQuery = 'A'.repeat(200);
render(MentionDropdown, { props: { model: baseModel(), editorQuery: longQuery } });
const input = document.querySelector('[data-test-search-input]') as HTMLInputElement;
expect(input).not.toBeNull();
expect(input.value.length).toBe(100);
expect(input.value).toBe('A'.repeat(100));
});
it('caps the listbox width to the viewport (320 px reflow guard — Leonie FINDING-MENTION-005)', async () => {
render(MentionDropdown, { props: { model: baseModel() } });
const listbox = document.querySelector('[role="listbox"]') as HTMLElement;
expect(listbox).not.toBeNull();
expect(listbox.className).toContain('max-w-[calc(100vw-1rem)]');
});
it('renders the @mention search input at text-base (16 px senior-audience floor — Leonie FINDING-MENTION-006)', async () => {
render(MentionDropdown, { props: { model: baseModel() } });
const input = document.querySelector('[data-test-search-input]') as HTMLInputElement;
expect(input).not.toBeNull();
expect(input.className).toContain('text-base');
expect(input.className).not.toContain('text-sm');
});
it('invokes onSearch with the current value whenever the user types', async () => {
const onSearch = vi.fn();
render(MentionDropdown, { props: { model: baseModel(), onSearch } });
await userEvent.type(page.getByRole('searchbox'), 'Walter');
await vi.waitFor(() => {
expect(onSearch).toHaveBeenCalled();
expect(onSearch).toHaveBeenLastCalledWith('Walter');
});
});
it('keeps the user-edited search value when editorQuery changes after the takeover (Felix on PR #629)', async () => {
let setEditorQuery!: (q: string) => void;
render(MentionDropdownFixture, {
model: baseModel(),
initialEditorQuery: 'WdG',
onReady: (s: (q: string) => void) => {
setEditorQuery = s;
}
});
await expect.element(page.getByRole('searchbox')).toHaveValue('WdG');
await page.getByRole('searchbox').fill('Walter');
await expect.element(page.getByRole('searchbox')).toHaveValue('Walter');
setEditorQuery('WdGruyter');
// Flush pending Svelte reactivity so any (non-)update from the mirror
// $effect has landed before we assert. expect.element already polls, so
// no fixed-timeout fallback is needed. Sara on PR #629 round 3.
await tick();
await expect.element(page.getByRole('searchbox')).toHaveValue('Walter');
});
});
// ─── ArrowDown via exported onKeyDown (Sara #3 on PR #629) ──────────────────
//
// In production, Tiptap intercepts ArrowDown/ArrowUp/Enter at the editor level
// and forwards them to the dropdown via its exported onKeyDown(event) function
// — the dropdown itself has no DOM keydown listener. This test exercises the
// same export so a regression in highlightedIndex/selection logic is caught
// at the unit level. The full E2E focus-chain test is deferred to a separate
// issue (Playwright).
//
// These unit tests directly invoke the exported `onKeyDown` to pin its
// behaviour in isolation. They do NOT exercise the Tiptap forwarding
// chain (PersonMentionEditor.suggestion.render() returning { onKeyDown })
// — that integration is covered by the 'ArrowDown moves the highlight'
// test in PersonMentionEditor.svelte.spec.ts. Sara on PR #629 round 3.
describe('MentionDropdown — onKeyDown forwarding', () => {
// flushSync ensures Svelte reactivity propagation completes before
// asserting (uniform across all four key tests so the next reader
// doesn't have to figure out why some are wrapped and others aren't).
// Felix #1 suggestion on PR #629 round 3.
it('ArrowDown advances aria-selected to the next option in the listbox', async () => {
const container = document.createElement('div');
document.body.appendChild(container);
const instance = mount(MentionDropdown, {
target: container,
props: {
model: baseModel({
items: [makePerson('p1', 'Anna Schmidt'), makePerson('p2', 'Bert Meier')]
})
}
});
try {
const exports = instance as unknown as { onKeyDown: (e: KeyboardEvent) => boolean };
// First option starts highlighted.
const first = container.querySelector('[data-test-person-id="p1"]') as HTMLElement;
const second = container.querySelector('[data-test-person-id="p2"]') as HTMLElement;
expect(first.getAttribute('aria-selected')).toBe('true');
expect(second.getAttribute('aria-selected')).toBe('false');
let consumed = false;
flushSync(() => {
consumed = exports.onKeyDown(new KeyboardEvent('keydown', { key: 'ArrowDown' }));
});
expect(consumed).toBe(true);
expect(first.getAttribute('aria-selected')).toBe('false');
expect(second.getAttribute('aria-selected')).toBe('true');
} finally {
unmount(instance);
container.remove();
}
});
it('ArrowUp wraps from the first option to the last', async () => {
const container = document.createElement('div');
document.body.appendChild(container);
const instance = mount(MentionDropdown, {
target: container,
props: {
model: baseModel({
items: [makePerson('p1', 'Anna Schmidt'), makePerson('p2', 'Bert Meier')]
})
}
});
try {
const exports = instance as unknown as { onKeyDown: (e: KeyboardEvent) => boolean };
let consumed = false;
flushSync(() => {
consumed = exports.onKeyDown(new KeyboardEvent('keydown', { key: 'ArrowUp' }));
});
expect(consumed).toBe(true);
const second = container.querySelector('[data-test-person-id="p2"]') as HTMLElement;
expect(second.getAttribute('aria-selected')).toBe('true');
} finally {
unmount(instance);
container.remove();
}
});
it('Enter invokes model.command with the currently highlighted item', async () => {
const command = vi.fn();
const container = document.createElement('div');
document.body.appendChild(container);
const instance = mount(MentionDropdown, {
target: container,
props: {
model: baseModel({
items: [makePerson('p1', 'Anna Schmidt'), makePerson('p2', 'Bert Meier')],
command
})
}
});
try {
const exports = instance as unknown as { onKeyDown: (e: KeyboardEvent) => boolean };
let consumed = false;
flushSync(() => {
consumed = exports.onKeyDown(new KeyboardEvent('keydown', { key: 'Enter' }));
});
expect(consumed).toBe(true);
expect(command).toHaveBeenCalledTimes(1);
expect(command.mock.calls[0][0].id).toBe('p1');
} finally {
unmount(instance);
container.remove();
}
});
it('Escape returns false so the suggestion plugin can handle it', async () => {
const container = document.createElement('div');
document.body.appendChild(container);
const instance = mount(MentionDropdown, {
target: container,
props: {
model: baseModel({ items: [makePerson('p1', 'Anna Schmidt')] })
}
});
try {
const exports = instance as unknown as { onKeyDown: (e: KeyboardEvent) => boolean };
let consumed = true;
flushSync(() => {
consumed = exports.onKeyDown(new KeyboardEvent('keydown', { key: 'Escape' }));
});
expect(consumed).toBe(false);
} finally {
unmount(instance);
container.remove();
}
});
});

View File

@@ -1,32 +0,0 @@
<script lang="ts">
import { untrack } from 'svelte';
import MentionDropdown from './MentionDropdown.svelte';
import type { components } from '$lib/generated/api';
type Person = components['schemas']['Person'];
type DropdownState = {
items: Person[];
command: (item: Person) => void;
clientRect: (() => DOMRect | null) | null;
};
type Props = {
model: DropdownState;
initialEditorQuery: string;
/** Test hook: receives a setter for editorQuery so the test can mutate it. */
onReady?: (setEditorQuery: (q: string) => void) => void;
onSearch?: (q: string) => void;
};
let { model, initialEditorQuery, onReady, onSearch = () => {} }: Props = $props();
let editorQuery = $state(untrack(() => initialEditorQuery));
$effect(() => {
onReady?.((q) => {
editorQuery = q;
});
});
</script>
<MentionDropdown model={model} editorQuery={editorQuery} onSearch={onSearch} />

View File

@@ -7,9 +7,7 @@ import { m } from '$lib/paraglide/messages.js';
import type { components } from '$lib/generated/api';
import type { PersonMention } from '$lib/shared/types';
import { deserialize, serialize } from '$lib/shared/discussion/mentionSerializer';
import { debounce } from '$lib/shared/utils/debounce';
import MentionDropdown from './MentionDropdown.svelte';
import { MAX_QUERY_LENGTH, SEARCH_DEBOUNCE_MS, SEARCH_RESULT_LIMIT } from './mentionConstants';
type Person = components['schemas']['Person'];
@@ -35,13 +33,6 @@ let {
let editorEl: HTMLDivElement;
let editor: Editor | null = null;
// Hoisted so onDestroy can guarantee the imperatively-mounted dropdown is
// torn down even if Tiptap's suggestion plugin onExit didn't fire (e.g. when
// the host component is unmounted while the dropdown is still open).
let mountedDropdown: object | null = null;
// Hoisted so onDestroy can cancel any pending fetch — otherwise a trailing
// debounced search can fire after the editor is gone and pollute later tests.
let cancelPendingSearch: (() => void) | null = null;
// Single reactive state object shared with MentionDropdown. Mutating these
// fields propagates to the mounted dropdown via Svelte's $state proxy —
@@ -51,12 +42,10 @@ let dropdownState = $state<{
items: Person[];
command: (item: Person) => void;
clientRect: (() => DOMRect | null) | null;
editorQuery: string;
}>({
items: [],
command: () => {},
clientRect: null,
editorQuery: ''
clientRect: null
});
type DropdownExports = {
@@ -149,13 +138,16 @@ onMount(() => {
// Nora #5618 #3 — separate issue tracks the GET /api/persons
// response-shape audit (PersonSummaryDTO leaks `notes`).
// ─────────────────────────────────────────────────────────────
// Tiptap's suggestion plugin requires an `items()` callback to keep
// the dropdown alive, but the actual fetch is owned by `runSearch`
// below — routed through the dropdown's search input via the
// debounced `onSearch` channel. Returning `[]` here keeps Tiptap
// happy without firing a duplicate per-keystroke fetch.
// Markus #5616 / Felix / Nora / Sara on PR #629.
items: async () => [],
items: async ({ query }: { query: string }) => {
if (!query) return [];
try {
const res = await fetch(`/api/persons?q=${encodeURIComponent(query)}`);
if (!res.ok) return [];
return ((await res.json()) as Person[]).slice(0, 5);
} catch {
return [];
}
},
// AC-1 fix: insert the typed query as displayName, not person.displayName.
command({ editor: ed, range, props }) {
const p = props as unknown as { personId: string; displayName: string };
@@ -173,6 +165,7 @@ onMount(() => {
.run();
},
render() {
let component: object | null = null;
let exports: DropdownExports | null = null;
// Tiptap's SuggestionProps types `command` against the default
@@ -185,84 +178,25 @@ onMount(() => {
clientRect?: (() => DOMRect | null) | null;
};
// Request-token guard: every onSearch invocation bumps `requestId`;
// runSearch captures the id active when its fetch starts and discards
// the response if a newer onSearch has fired since. Without this, a
// late response can repopulate the dropdown after the user cleared
// the search input. Sara on PR #629.
let requestId = 0;
const runSearch = async (query: string) => {
const id = requestId;
try {
// Defensive client-side cap — server-side enforcement is tracked
// separately. Markus on PR #629.
const res = await fetch(
`/api/persons?q=${encodeURIComponent(query)}&limit=${SEARCH_RESULT_LIMIT}`
);
if (id !== requestId) return;
if (!res.ok) {
dropdownState.items = [];
return;
}
const data = (await res.json()) as Person[];
if (id !== requestId) return;
dropdownState.items = data.slice(0, SEARCH_RESULT_LIMIT);
} catch {
if (id !== requestId) return;
dropdownState.items = [];
}
};
const debouncedSearch = debounce(runSearch, SEARCH_DEBOUNCE_MS);
cancelPendingSearch = () => debouncedSearch.cancel();
const onSearch = (query: string) => {
requestId++;
if (query.trim() === '') {
debouncedSearch.cancel();
dropdownState.items = [];
return;
}
debouncedSearch(query);
};
const updateState = (renderProps: LooseRenderProps) => {
// Clip once here so both the inserted displayName and the
// dropdown's editor-mirror see the same value. The dropdown
// already clips the mirror (Nora #1 CWE-400), but without
// clipping at the command boundary an unclipped query would
// still flow through as the inserted displayName — visible
// UI divergence between "what I searched" and "what was
// inserted". Felix #3 on PR #629.
const clippedQuery = renderProps.query.slice(0, MAX_QUERY_LENGTH);
dropdownState.items = renderProps.items as Person[];
// AC-1: pass typed query as displayName, not person.displayName
dropdownState.command = (item: Person) =>
renderProps.command({
personId: item.id,
displayName: clippedQuery
displayName: renderProps.query
});
dropdownState.clientRect = renderProps.clientRect ?? null;
dropdownState.editorQuery = clippedQuery;
};
return {
onStart(renderProps) {
const loose = renderProps as unknown as LooseRenderProps;
updateState(loose);
// MentionDropdown reads `editorQuery` off the shared state
// proxy via its `editorQuery` prop binding below — this is
// the same pattern as `model.items`. We do not pass it as a
// separate prop because Svelte 5's mount() does not expose
// settable prop accessors, so we route through the proxy.
updateState(renderProps as unknown as LooseRenderProps);
const mounted = mount(MentionDropdown, {
target: document.body,
props: {
model: dropdownState,
get editorQuery() {
return dropdownState.editorQuery;
},
onSearch
}
props: { model: dropdownState }
});
mountedDropdown = mounted as object;
component = mounted as object;
exports = mounted as unknown as DropdownExports;
},
onUpdate(renderProps) {
@@ -274,16 +208,9 @@ onMount(() => {
return exports?.onKeyDown(event) ?? false;
},
onExit() {
// Cancel any pending debounce so a closed dropdown's trailing
// runSearch cannot fire against the *next* dropdown's state.
// The hoisted `cancelPendingSearch` would be overwritten by
// the next render()'s onStart before the trailing call fires,
// so we cancel locally via the closure-scoped debouncedSearch.
// Felix #1 on PR #629.
debouncedSearch.cancel();
if (mountedDropdown) {
unmount(mountedDropdown);
mountedDropdown = null;
if (component) {
unmount(component);
component = null;
exports = null;
}
}
@@ -326,15 +253,7 @@ onMount(() => {
});
onDestroy(() => {
cancelPendingSearch?.();
editor?.destroy();
// Tiptap suggestion onExit usually unmounts the dropdown, but if the host
// component is destroyed while a suggestion is active the dropdown can
// outlive the editor — clean it up explicitly.
if (mountedDropdown) {
unmount(mountedDropdown);
mountedDropdown = null;
}
});
// Keep the data-placeholder attribute in sync with actual emptiness so the

View File

@@ -8,45 +8,29 @@
import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page, userEvent } from 'vitest/browser';
import { tick } from 'svelte';
import PersonMentionEditorHost from './PersonMentionEditor.test-fixture.svelte';
import PersonMentionEditorHost from './PersonMentionEditor.test-host.svelte';
import type { components } from '$lib/generated/api';
import { m } from '$lib/paraglide/messages.js';
// Single source of truth for the debounce window — imported from the shared
// module so the test cannot drift from production. Sara on PR #629 round 3.
import { SEARCH_DEBOUNCE_MS } from './mentionConstants';
type Person = components['schemas']['Person'];
type PersonMention = components['schemas']['PersonMention'];
/**
* Headroom above SEARCH_DEBOUNCE_MS for the debounce-window wait
* assertions in this file. 350 ms is calibrated against CI-runner jitter
* we observed pre-#629; dropping it below ~200 ms reintroduces flake.
* See PR #629 round-2 review comment #10935 (Sara).
*/
const POST_DEBOUNCE_SLACK_MS = 350;
const AUGUSTE: Person = {
id: 'p-aug',
firstName: 'Auguste',
lastName: 'Raddatz',
displayName: 'Auguste Raddatz',
personType: 'PERSON',
familyMember: false,
birthYear: 1882,
deathYear: 1944
};
} as unknown as Person;
const ANNA: Person = {
id: 'p-anna',
firstName: 'Anna',
lastName: 'Schmidt',
displayName: 'Anna Schmidt',
personType: 'PERSON',
familyMember: false,
birthYear: 1860
};
} as unknown as Person;
function mockFetchWithPersons(persons: Person[] = [AUGUSTE, ANNA]) {
vi.stubGlobal(
@@ -141,20 +125,6 @@ describe('PersonMentionEditor — typeahead', () => {
});
});
it('appends &limit=5 to the fetch URL (defensive client-side cap, Markus on PR #629)', async () => {
const fetchMock = vi
.fn()
.mockResolvedValue({ ok: true, json: vi.fn().mockResolvedValue([AUGUSTE]) });
vi.stubGlobal('fetch', fetchMock);
renderHost();
await userEvent.type(page.getByRole('textbox'), '@Aug');
await vi.waitFor(() => {
expect(fetchMock).toHaveBeenCalledWith(expect.stringContaining('limit=5'));
});
});
it('shows life dates next to the name in the dropdown', async () => {
mockFetchWithPersons();
renderHost();
@@ -172,15 +142,8 @@ describe('PersonMentionEditor — typeahead', () => {
await userEvent.type(page.getByRole('textbox'), '@xyz');
// The visible empty-state <p> (text-ink-3) shows the copy. The persistent
// sr-only aria-live region also contains the same copy, so we scope to the
// visible element to avoid a multi-match resolution in expect.element.
await vi.waitFor(() => {
const visibleEmptyP = document.querySelector(
'[role="listbox"] p.text-ink-3'
) as HTMLElement | null;
expect(visibleEmptyP).not.toBeNull();
expect(visibleEmptyP!.textContent ?? '').toContain('Keine Personen gefunden');
await vi.waitFor(async () => {
await expect.element(page.getByText('Keine Personen gefunden')).toBeInTheDocument();
});
});
@@ -198,254 +161,6 @@ describe('PersonMentionEditor — typeahead', () => {
});
});
// ─── AC-2/3: search input drives the person fetch (debounced) ───────────────
describe('PersonMentionEditor — AC-2/3: search input drives fetch', () => {
it('editing the search input fires a debounced fetch with the new query', async () => {
const fetchMock = vi
.fn()
.mockResolvedValue({ ok: true, json: vi.fn().mockResolvedValue([AUGUSTE]) });
vi.stubGlobal('fetch', fetchMock);
renderHost();
// Open the dropdown so the search input is reachable.
await userEvent.type(page.getByRole('textbox'), '@');
await vi.waitFor(async () => {
await expect.element(page.getByRole('searchbox')).toBeVisible();
});
const fetchesBeforeSearch = fetchMock.mock.calls.length;
// `fill` simulates a single input event with the final value — sidesteps
// per-keystroke timing of userEvent.type so the test can deterministically
// assert that one input event collapses into one debounced fetch.
await page.getByRole('searchbox').fill('Walter');
await vi.waitFor(
() => {
expect(fetchMock).toHaveBeenCalledWith(expect.stringContaining('q=Walter'));
},
{ timeout: 1000 }
);
const fetchesAfterSearch = fetchMock.mock.calls.length - fetchesBeforeSearch;
expect(fetchesAfterSearch).toBe(1);
});
it('fires exactly one /api/persons fetch when the user searches for Walter (regression guard)', async () => {
// Regression guard: a previous version of PersonMentionEditor had a
// duplicated `items()` callback in the Tiptap suggestion config that
// fetched per-keystroke in addition to the debounced search-input fetch
// (Markus & Felix round-1). To catch that regression, we must NOT
// subtract any baseline — every fetch from render onwards counts.
// Sara on PR #629 round 3.
const fetchMock = vi
.fn()
.mockResolvedValue({ ok: true, json: vi.fn().mockResolvedValue([AUGUSTE]) });
vi.stubGlobal('fetch', fetchMock);
renderHost();
// Open the dropdown, then drive the search input via fill() — sidesteps
// per-keystroke timing of userEvent.type that Sara flagged round 2.
await userEvent.type(page.getByRole('textbox'), '@');
await vi.waitFor(async () => {
await expect.element(page.getByRole('searchbox')).toBeVisible();
});
await page.getByRole('searchbox').fill('Walter');
await new Promise((r) => setTimeout(r, SEARCH_DEBOUNCE_MS + POST_DEBOUNCE_SLACK_MS));
// No baseline subtraction — count ALL /api/persons fetches since render.
// If the legacy per-keystroke items() callback returns, typing `@` alone
// would already produce one fetch and `fill('Walter')` another, breaking
// this assertion.
const personsFetches = fetchMock.mock.calls.filter(
([url]) => typeof url === 'string' && url.startsWith('/api/persons')
);
expect(personsFetches.length).toBe(1);
});
it('clearing the search input clears the list without firing a fetch', async () => {
const fetchMock = vi
.fn()
.mockResolvedValue({ ok: true, json: vi.fn().mockResolvedValue([AUGUSTE]) });
vi.stubGlobal('fetch', fetchMock);
renderHost();
await userEvent.type(page.getByRole('textbox'), '@Aug');
await vi.waitFor(async () => {
await expect.element(page.getByText('Auguste Raddatz')).toBeInTheDocument();
});
const fetchesBeforeClear = fetchMock.mock.calls.length;
await userEvent.clear(page.getByRole('searchbox'));
// Negative assertion: wait past the debounce window to confirm no
// trailing fetch was scheduled. Removing this wait would mask a
// re-introduction of the keystroke-driven items() fetch.
await new Promise((r) => setTimeout(r, SEARCH_DEBOUNCE_MS + POST_DEBOUNCE_SLACK_MS));
expect(fetchMock.mock.calls.length).toBe(fetchesBeforeClear);
await expect.element(page.getByText('Auguste Raddatz')).not.toBeInTheDocument();
});
});
// ─── Whitespace-only query (Elicit AC-4 ambiguity on PR #629) ───────────────
describe('PersonMentionEditor — whitespace-only query', () => {
it('keeps the "Namen eingeben…" prompt and fires no fetch when @ is followed only by spaces', async () => {
const fetchMock = vi.fn().mockResolvedValue({ ok: true, json: vi.fn().mockResolvedValue([]) });
vi.stubGlobal('fetch', fetchMock);
renderHost();
await userEvent.type(page.getByRole('textbox'), '@ ');
await vi.waitFor(async () => {
await expect.element(page.getByRole('searchbox')).toBeVisible();
});
await new Promise((r) => setTimeout(r, SEARCH_DEBOUNCE_MS + POST_DEBOUNCE_SLACK_MS));
// Scope to the visible empty-state <p> (text-ink-3) — the persistent
// sr-only aria-live region above contains the same copy.
const visibleEmptyP = document.querySelector(
'[role="listbox"] p.text-ink-3'
) as HTMLElement | null;
expect(visibleEmptyP).not.toBeNull();
expect(visibleEmptyP!.textContent ?? '').toContain(m.person_mention_search_prompt());
expect(fetchMock).not.toHaveBeenCalled();
});
});
// ─── Stale-response race (Sara on PR #629) ───────────────────────────────────
describe('PersonMentionEditor — stale-response race', () => {
it('discards a stale response that resolves after the search has been cleared', async () => {
let resolveFetch!: (v: { ok: boolean; json: () => Promise<Person[]> }) => void;
const pendingResponse = new Promise<{ ok: boolean; json: () => Promise<Person[]> }>((r) => {
resolveFetch = r;
});
const fetchMock = vi.fn().mockReturnValue(pendingResponse);
vi.stubGlobal('fetch', fetchMock);
renderHost();
// Open the dropdown and let the debounce fire so a fetch is in flight.
await userEvent.type(page.getByRole('textbox'), '@Aug');
await vi.waitFor(() => {
expect(fetchMock).toHaveBeenCalledWith(expect.stringContaining('/api/persons?q=Aug'));
});
// Clear the search input *before* the fetch resolves.
await userEvent.clear(page.getByRole('searchbox'));
await expect.element(page.getByRole('searchbox')).toHaveValue('');
// The stale fetch now resolves with persons. The dropdown must stay empty.
resolveFetch({ ok: true, json: () => Promise.resolve([AUGUSTE]) });
// Flush pending Svelte reactivity so any (non-)update from the stale
// fetch resolution has landed before we assert. expect.element already
// polls, so no fixed-timeout fallback is needed. Sara on PR #629 round 4.
await tick();
await expect.element(page.getByText('Auguste Raddatz')).not.toBeInTheDocument();
});
});
// ─── Server failure characterization (Sara #2 on PR #629) ───────────────────
describe('PersonMentionEditor — server failure', () => {
it('on 500 response keeps the dropdown open with the empty-state copy (silent failure pinned; distinct error UX tracked separately)', async () => {
const fetchMock = vi
.fn()
.mockResolvedValue({ ok: false, status: 500, json: vi.fn().mockResolvedValue({}) });
vi.stubGlobal('fetch', fetchMock);
renderHost();
await userEvent.type(page.getByRole('textbox'), '@Aug');
await new Promise((r) => setTimeout(r, SEARCH_DEBOUNCE_MS + POST_DEBOUNCE_SLACK_MS));
// Pins current silent-failure behaviour. The day someone implements a
// distinct error UX (toast / "Suche fehlgeschlagen" copy), this test
// goes red and forces them to update the assertion. Scope to the
// visible <p> (text-ink-3) — the persistent sr-only live region
// above contains the same copy.
const visibleEmptyP = document.querySelector(
'[role="listbox"] p.text-ink-3'
) as HTMLElement | null;
expect(visibleEmptyP).not.toBeNull();
expect(visibleEmptyP!.textContent ?? '').toContain(m.person_mention_popup_empty());
});
it('on a fetch reject (network failure) keeps the dropdown open with the empty-state copy', async () => {
const fetchMock = vi.fn().mockRejectedValue(new TypeError('NetworkError'));
vi.stubGlobal('fetch', fetchMock);
renderHost();
await userEvent.type(page.getByRole('textbox'), '@Aug');
await new Promise((r) => setTimeout(r, SEARCH_DEBOUNCE_MS + POST_DEBOUNCE_SLACK_MS));
const visibleEmptyP = document.querySelector(
'[role="listbox"] p.text-ink-3'
) as HTMLElement | null;
expect(visibleEmptyP).not.toBeNull();
expect(visibleEmptyP!.textContent ?? '').toContain(m.person_mention_popup_empty());
});
});
// ─── onExit cancels pending debounce (Felix #1 on PR #629) ───────────────────
describe('PersonMentionEditor — onExit cancels pending debounce', () => {
it('cancels the pending debounced fetch when Escape closes the dropdown before the debounce fires', async () => {
const fetchMock = vi.fn().mockResolvedValue({ ok: true, json: vi.fn().mockResolvedValue([]) });
vi.stubGlobal('fetch', fetchMock);
renderHost();
// Open the dropdown by typing @ + a query in the editor.
await userEvent.type(page.getByRole('textbox'), '@A');
await vi.waitFor(async () => {
await expect.element(page.getByRole('searchbox')).toBeVisible();
});
// Wait for any in-flight fetch from opening the dropdown to settle.
await new Promise((r) => setTimeout(r, SEARCH_DEBOUNCE_MS + POST_DEBOUNCE_SLACK_MS));
const fetchesBeforeEscape = fetchMock.mock.calls.length;
// Trigger a new debounced search (queues runSearch after 150 ms), then
// immediately Escape *while focus is back in the editor* so Tiptap's
// suggestion-plugin Escape handler fires onExit before the debounce.
// Without onExit cancelling the pending debounce, runSearch executes
// against the now-unmounted dropdown's state.
await page.getByRole('searchbox').fill('Walter');
// Focus the editor so the Escape lands on Tiptap's suggestion handler.
(page.getByRole('textbox').element() as HTMLElement).focus();
await userEvent.keyboard('{Escape}');
// Wait past the debounce window. If onExit did not cancel the pending
// debounce, a fetch with q=Walter would still fire here.
await new Promise((r) => setTimeout(r, SEARCH_DEBOUNCE_MS + POST_DEBOUNCE_SLACK_MS));
const newFetches = fetchMock.mock.calls.slice(fetchesBeforeEscape);
const walterFetches = newFetches.filter(
([url]) => typeof url === 'string' && url.includes('q=Walter')
);
expect(walterFetches.length).toBe(0);
});
});
// ─── AC-1: search input prefilled with text typed after @ ───────────────────
describe('PersonMentionEditor — AC-1: search input prefill', () => {
it('prefills the dropdown search input with the text typed after @', async () => {
mockFetchEmpty();
renderHost();
await userEvent.type(page.getByRole('textbox'), '@WdG');
await vi.waitFor(async () => {
await expect.element(page.getByRole('searchbox')).toHaveValue('WdG');
});
});
});
// ─── AC-1: typed text becomes displayName, not DB name ───────────────────────
describe('PersonMentionEditor — AC-1: typed text as displayName', () => {
@@ -514,39 +229,6 @@ describe('PersonMentionEditor — AC-1: typed text as displayName', () => {
});
});
it('clips the inserted displayName to MAX_QUERY_LENGTH=100 chars (Felix #3 on PR #629)', async () => {
// CWE-400 amplification: the dropdown clips its search input + mirror at
// 100 chars (Nora #1), but the host editor was passing the unclipped
// renderProps.query straight through to displayName — so a 105-char
// @-suffix in the editor could insert a 105-char displayName into the
// sidecar even though the dropdown only searched the first 100.
mockFetchWithPersons();
const host = renderHost();
// Type @ + 105 'A' chars in the contenteditable. The renderProps.query
// fed into the command callback derives from the editor text after `@`,
// not the dropdown's searchbox — so we must drive the editor.
await userEvent.type(page.getByRole('textbox'), '@' + 'A'.repeat(105));
// The mocked /api/persons returns AUGUSTE for any query — wait for it.
await vi.waitFor(async () => {
await expect.element(page.getByRole('option', { name: /Auguste Raddatz/ })).toBeVisible();
});
const option = (await page
.getByRole('option', { name: /Auguste Raddatz/ })
.element()) as HTMLElement;
option.dispatchEvent(new MouseEvent('mousedown', { bubbles: true, cancelable: true }));
await vi.waitFor(() => {
expect(host.snapshot.mentionedPersons).toHaveLength(1);
// Tight assertion: input is 105 chars, cap is exactly 100. Using
// `toHaveLength(100)` discriminates "clip works" from "clip works
// AND nothing weakened it to e.g. 95". Sara on PR #629 round 4.
expect(host.snapshot.mentionedPersons[0].displayName).toHaveLength(100);
});
});
it('does not duplicate the sidecar entry when the same person is selected twice', async () => {
mockFetchWithPersons();
const host = renderHost({

View File

@@ -1,10 +0,0 @@
/** Shared knobs for the @mention typeahead. Single source of truth for
* the dropdown component and the host editor — keeps the layered length
* cap and the debounce window consistent across both files. */
export const MAX_QUERY_LENGTH = 100;
export const SEARCH_DEBOUNCE_MS = 150;
/** Defensive client-side cap on the result list. Single consumer today
* (PersonMentionEditor), kept here for symmetry with the other limit
* knobs so the @mention configuration lives in one place. Felix #1 on
* PR #629 round 4. */
export const SEARCH_RESULT_LIMIT = 5;

View File

@@ -1,7 +1,7 @@
import { describe, it, expect, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page, userEvent } from 'vitest/browser';
import TestHost from './confirm.test-fixture.svelte';
import TestHost from './confirm.test-host.svelte';
import type { ConfirmService } from './confirm.svelte.js';
afterEach(cleanup);

View File

@@ -1,25 +1,12 @@
/**
* Returns a debounced version of fn that delays invocation until after
* `delay` ms have elapsed since the last call. The returned function
* exposes a `cancel()` method that DROPS (does not flush) the pending
* trailing invocation — essential when the host context (a destroyed
* component, an unmounted editor) shouldn't fire the trailing call.
* `delay` ms have elapsed since the last call.
*/
// eslint-disable-next-line @typescript-eslint/no-explicit-any
export function debounce<T extends (...args: any[]) => void>(
fn: T,
delay: number
): T & { cancel: () => void } {
let timer: ReturnType<typeof setTimeout> | undefined;
const wrapped = ((...args: Parameters<T>) => {
if (timer !== undefined) clearTimeout(timer);
export function debounce<T extends (...args: any[]) => void>(fn: T, delay: number): T {
let timer: ReturnType<typeof setTimeout>;
return ((...args: Parameters<T>) => {
clearTimeout(timer);
timer = setTimeout(() => fn(...args), delay);
}) as T & { cancel: () => void };
wrapped.cancel = () => {
if (timer !== undefined) {
clearTimeout(timer);
timer = undefined;
}
};
return wrapped;
}) as T;
}

View File

@@ -5,7 +5,7 @@ import DocumentRow from '$lib/document/DocumentRow.svelte';
import { SvelteMap } from 'svelte/reactivity';
import type { components } from '$lib/generated/api';
type DocumentListItem = components['schemas']['DocumentListItem'];
type DocumentSearchItem = components['schemas']['DocumentSearchItem'];
type SortMode = 'DATE' | 'TITLE' | 'SENDER' | 'RECEIVER' | 'UPLOAD_DATE' | 'RELEVANCE';
@@ -17,7 +17,7 @@ let {
q = '',
sort = 'DATE'
}: {
items: DocumentListItem[];
items: DocumentSearchItem[];
canWrite: boolean;
error?: string | null;
total?: number;
@@ -31,10 +31,10 @@ const groups = $derived.by(() => {
return groupByYear(items);
});
function groupByYear(docItems: DocumentListItem[]) {
const map = new SvelteMap<string, DocumentListItem[]>();
function groupByYear(docItems: DocumentSearchItem[]) {
const map = new SvelteMap<string, DocumentSearchItem[]>();
for (const item of docItems) {
const label = item.documentDate?.substring(0, 4) ?? m.docs_group_undated();
const label = item.document.documentDate?.substring(0, 4) ?? m.docs_group_undated();
const bucket = map.get(label);
if (bucket) bucket.push(item);
else map.set(label, [item]);
@@ -42,10 +42,10 @@ function groupByYear(docItems: DocumentListItem[]) {
return Array.from(map.entries()).map(([label, groupItems]) => ({ label, items: groupItems }));
}
function groupBySender(docItems: DocumentListItem[]) {
const map = new SvelteMap<string, DocumentListItem[]>();
function groupBySender(docItems: DocumentSearchItem[]) {
const map = new SvelteMap<string, DocumentSearchItem[]>();
for (const item of docItems) {
const label = item.sender?.displayName ?? m.docs_group_unknown_sender();
const label = item.document.sender?.displayName ?? m.docs_group_unknown_sender();
const bucket = map.get(label);
if (bucket) bucket.push(item);
else map.set(label, [item]);
@@ -53,10 +53,10 @@ function groupBySender(docItems: DocumentListItem[]) {
return Array.from(map.entries()).map(([label, groupItems]) => ({ label, items: groupItems }));
}
function groupByReceiver(docItems: DocumentListItem[]) {
const map = new SvelteMap<string, DocumentListItem[]>();
function groupByReceiver(docItems: DocumentSearchItem[]) {
const map = new SvelteMap<string, DocumentSearchItem[]>();
for (const item of docItems) {
const receivers = item.receivers ?? [];
const receivers = item.document.receivers ?? [];
const labels =
receivers.length > 0
? receivers.map((r) => r.displayName)
@@ -99,7 +99,7 @@ function groupByReceiver(docItems: DocumentListItem[]) {
>
</div>
<ul class="divide-y divide-line">
{#each group.items as item (group.label + '-' + item.id)}
{#each group.items as item (group.label + '-' + item.document.id)}
<DocumentRow item={item} canWrite={canWrite} />
{/each}
</ul>

View File

@@ -8,17 +8,24 @@ vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
afterEach(() => cleanup());
type DocumentListItem = components['schemas']['DocumentListItem'];
type DocumentSearchItem = components['schemas']['DocumentSearchItem'];
function makeItem(overrides: Partial<DocumentListItem> = {}): DocumentListItem {
function makeItem(overrides: Partial<DocumentSearchItem> = {}): DocumentSearchItem {
return {
id: '1',
title: 'Testbrief',
originalFilename: 'testbrief.pdf',
documentDate: '2024-03-15',
sender: undefined,
receivers: [],
tags: [],
document: {
id: '1',
title: 'Testbrief',
originalFilename: 'testbrief.pdf',
status: 'UPLOADED',
documentDate: '2024-03-15',
sender: undefined,
receivers: [],
tags: [],
createdAt: '2024-01-01T00:00:00Z',
updatedAt: '2024-01-01T00:00:00Z',
metadataComplete: false,
scriptType: 'UNKNOWN'
},
matchData: {
titleOffsets: [],
senderMatched: false,
@@ -68,8 +75,8 @@ describe('DocumentList empty state', () => {
describe('DocumentList year grouping', () => {
it('groups documents by year into separate cards', async () => {
const items = [
makeItem({ id: '1', documentDate: '1923-04-12' }),
makeItem({ id: '2', documentDate: '1965-08-03' })
makeItem({ document: { ...makeItem().document, id: '1', documentDate: '1923-04-12' } }),
makeItem({ document: { ...makeItem().document, id: '2', documentDate: '1965-08-03' } })
];
render(DocumentList, { ...baseProps, items, total: 2 });
const groupCards = page.getByTestId('group-card');
@@ -78,15 +85,17 @@ describe('DocumentList year grouping', () => {
});
it('uses undated label for items with no documentDate', async () => {
const items = [makeItem({ id: '1', documentDate: undefined })];
const items = [
makeItem({ document: { ...makeItem().document, id: '1', documentDate: undefined } })
];
render(DocumentList, { ...baseProps, items, total: 1 });
await expect.element(page.getByText('Undatiert')).toBeInTheDocument();
});
it('single year renders one group-card', async () => {
const items = [
makeItem({ id: '1', documentDate: '1938-01-01' }),
makeItem({ id: '2', documentDate: '1938-06-15' })
makeItem({ document: { ...makeItem().document, id: '1', documentDate: '1938-01-01' } }),
makeItem({ document: { ...makeItem().document, id: '2', documentDate: '1938-06-15' } })
];
render(DocumentList, { ...baseProps, items, total: 2 });
const groupCards = page.getByTestId('group-card');
@@ -99,7 +108,9 @@ describe('DocumentList year grouping', () => {
describe('DocumentList sort fallback', () => {
it('falls back to year grouping when sort is not SENDER or RECEIVER', async () => {
const items = [makeItem({ id: '1', documentDate: '2024-03-15' })];
const items = [
makeItem({ document: { ...makeItem().document, id: '1', documentDate: '2024-03-15' } })
];
render(DocumentList, { ...baseProps, items, total: 1, sort: 'TITLE' });
await expect
.element(page.getByTestId('group-header').filter({ hasText: '2024' }))
@@ -113,23 +124,29 @@ describe('DocumentList sender grouping', () => {
it('groups by sender displayName when sort is SENDER', async () => {
const items = [
makeItem({
id: '1',
sender: {
id: 's1',
lastName: 'Mustermann',
displayName: 'Max Mustermann',
personType: 'PERSON',
familyMember: false
document: {
...makeItem().document,
id: '1',
sender: {
id: 's1',
lastName: 'Mustermann',
displayName: 'Max Mustermann',
personType: 'PERSON',
familyMember: false
}
}
}),
makeItem({
id: '2',
sender: {
id: 's2',
lastName: 'Musterfrau',
displayName: 'Anna Musterfrau',
personType: 'PERSON',
familyMember: false
document: {
...makeItem().document,
id: '2',
sender: {
id: 's2',
lastName: 'Musterfrau',
displayName: 'Anna Musterfrau',
personType: 'PERSON',
familyMember: false
}
}
})
];
@@ -150,7 +167,10 @@ describe('DocumentList sender grouping', () => {
personType: 'PERSON' as const,
familyMember: false
};
const items = [makeItem({ id: '1', sender }), makeItem({ id: '2', sender })];
const items = [
makeItem({ document: { ...makeItem().document, id: '1', sender } }),
makeItem({ document: { ...makeItem().document, id: '2', sender } })
];
render(DocumentList, { ...baseProps, items, total: 2, sort: 'SENDER' });
const cards = page.getByTestId('group-card');
await expect.element(cards.first()).toBeInTheDocument();
@@ -158,7 +178,7 @@ describe('DocumentList sender grouping', () => {
});
it('places items with no sender under fallback label', async () => {
const items = [makeItem({ id: '1', sender: undefined })];
const items = [makeItem({ document: { ...makeItem().document, id: '1', sender: undefined } })];
render(DocumentList, { ...baseProps, items, total: 1, sort: 'SENDER' });
await expect.element(page.getByText('Unbekannter Absender')).toBeInTheDocument();
});
@@ -170,16 +190,19 @@ describe('DocumentList receiver grouping', () => {
it('groups by receiver displayName when sort is RECEIVER', async () => {
const items = [
makeItem({
id: '1',
receivers: [
{
id: 'r1',
lastName: 'Brandt',
displayName: 'Felix Brandt',
personType: 'PERSON',
familyMember: false
}
]
document: {
...makeItem().document,
id: '1',
receivers: [
{
id: 'r1',
lastName: 'Brandt',
displayName: 'Felix Brandt',
personType: 'PERSON',
familyMember: false
}
]
}
})
];
render(DocumentList, { ...baseProps, items, total: 1, sort: 'RECEIVER' });
@@ -191,24 +214,27 @@ describe('DocumentList receiver grouping', () => {
it('duplicates a document into each receiver group', async () => {
const items = [
makeItem({
id: '1',
title: 'Rundbriefchen',
receivers: [
{
id: 'r1',
lastName: 'Brandt',
displayName: 'Felix Brandt',
personType: 'PERSON',
familyMember: false
},
{
id: 'r2',
lastName: 'Meier',
displayName: 'Hans Meier',
personType: 'PERSON',
familyMember: false
}
]
document: {
...makeItem().document,
id: '1',
title: 'Rundbriefchen',
receivers: [
{
id: 'r1',
lastName: 'Brandt',
displayName: 'Felix Brandt',
personType: 'PERSON',
familyMember: false
},
{
id: 'r2',
lastName: 'Meier',
displayName: 'Hans Meier',
personType: 'PERSON',
familyMember: false
}
]
}
})
];
render(DocumentList, { ...baseProps, items, total: 1, sort: 'RECEIVER' });
@@ -223,7 +249,7 @@ describe('DocumentList receiver grouping', () => {
});
it('places items with no receivers under fallback label', async () => {
const items = [makeItem({ id: '1', receivers: [] })];
const items = [makeItem({ document: { ...makeItem().document, id: '1', receivers: [] } })];
render(DocumentList, { ...baseProps, items, total: 1, sort: 'RECEIVER' });
await expect.element(page.getByText('Unbekannter Empfänger')).toBeInTheDocument();
});
@@ -235,7 +261,7 @@ describe('DocumentList DocumentRow delegation', () => {
it('shows transcription snippet when matchData has one', async () => {
const items = [
makeItem({
id: 'doc1',
document: { ...makeItem().document, id: 'doc1' },
matchData: {
transcriptionSnippet: 'Er schrieb einen langen Brief',
titleOffsets: [],
@@ -252,7 +278,7 @@ describe('DocumentList DocumentRow delegation', () => {
});
it('does not render snippet when matchData has no transcription snippet', async () => {
const items = [makeItem({ id: 'doc1' })];
const items = [makeItem({ document: { ...makeItem().document, id: 'doc1' } })];
render(DocumentList, { ...baseProps, items, total: 1 });
await expect.element(page.getByTestId('search-snippet')).not.toBeInTheDocument();
});
@@ -260,8 +286,7 @@ describe('DocumentList DocumentRow delegation', () => {
it('renders mark for title highlight when titleOffsets present', async () => {
const items = [
makeItem({
id: 'doc1',
title: 'Brief an Anna',
document: { ...makeItem().document, id: 'doc1', title: 'Brief an Anna' },
matchData: {
titleOffsets: [{ start: 0, length: 5 }], // "Brief"
senderMatched: false,

View File

@@ -20,46 +20,29 @@ const { default: DocumentList } = await import('./DocumentList.svelte');
afterEach(cleanup);
const sender = {
id: 's1',
lastName: 'Schmidt',
displayName: 'Anna Schmidt',
personType: 'PERSON' as const,
familyMember: false
};
const receiver = {
id: 'r1',
lastName: 'Meier',
displayName: 'Bert Meier',
personType: 'PERSON' as const,
familyMember: false
};
const emptyMatchData = {
titleOffsets: [],
senderMatched: false,
matchedReceiverIds: [],
matchedTagIds: [],
snippetOffsets: [],
summaryOffsets: []
};
const sender = { id: 's1', displayName: 'Anna Schmidt' };
const receiver = { id: 'r1', displayName: 'Bert Meier' };
const makeItem = (overrides: Record<string, unknown> = {}) => ({
id: 'd1',
title: 'Brief 1923',
originalFilename: 'b.pdf',
documentDate: '1923-04-15',
sender,
receivers: [receiver],
tags: [],
summary: undefined,
archiveBox: undefined,
archiveFolder: undefined,
location: undefined,
matchData: emptyMatchData,
document: {
id: 'd1',
title: 'Brief 1923',
originalFilename: 'b.pdf',
documentDate: '1923-04-15',
sender,
receivers: [receiver],
tags: [],
thumbnailUrl: null,
contentType: 'application/pdf',
summary: null,
archiveBox: null,
archiveFolder: null,
location: null,
...overrides
},
matchData: null,
completionPercentage: 0,
contributors: [],
...overrides
contributors: []
});
describe('DocumentList', () => {
@@ -104,26 +87,8 @@ describe('DocumentList', () => {
render(DocumentList, {
props: {
items: [
makeItem({
id: 'd1',
sender: {
id: 's1',
lastName: 'Schmidt',
displayName: 'Anna Schmidt',
personType: 'PERSON',
familyMember: false
}
}),
makeItem({
id: 'd2',
sender: {
id: 's2',
lastName: 'Meier',
displayName: 'Bert Meier',
personType: 'PERSON',
familyMember: false
}
})
makeItem({ id: 'd1', sender: { id: 's1', displayName: 'Anna Schmidt' } }),
makeItem({ id: 'd2', sender: { id: 's2', displayName: 'Bert Meier' } })
],
canWrite: false,
sort: 'SENDER' as const

View File

@@ -1,6 +1,6 @@
import { error } from '@sveltejs/kit';
import { env } from '$env/dynamic/private';
import { createApiClient, extractErrorCode } from '$lib/shared/api.server';
import { createApiClient } from '$lib/shared/api.server';
import { getErrorMessage } from '$lib/shared/errors';
import type { components } from '$lib/generated/api';
@@ -34,16 +34,16 @@ export async function load({ fetch, locals }) {
]);
if (!usersResult.response.ok) {
throw error(usersResult.response.status, getErrorMessage(extractErrorCode(usersResult.error)));
const code = (usersResult.error as unknown as { code?: string })?.code;
throw error(usersResult.response.status, getErrorMessage(code));
}
if (!groupsResult.response.ok) {
throw error(
groupsResult.response.status,
getErrorMessage(extractErrorCode(groupsResult.error))
);
const code = (groupsResult.error as unknown as { code?: string })?.code;
throw error(groupsResult.response.status, getErrorMessage(code));
}
if (!tagsResult.response.ok) {
throw error(tagsResult.response.status, getErrorMessage(extractErrorCode(tagsResult.error)));
const code = (tagsResult.error as unknown as { code?: string })?.code;
throw error(tagsResult.response.status, getErrorMessage(code));
}
let inviteCount = 0;

View File

@@ -1,6 +1,6 @@
import { error, fail, redirect } from '@sveltejs/kit';
import type { PageServerLoad, Actions } from './$types';
import { createApiClient, extractErrorCode } from '$lib/shared/api.server';
import { createApiClient } from '$lib/shared/api.server';
import { getErrorMessage } from '$lib/shared/errors';
export const load: PageServerLoad = async ({ params, parent }) => {
@@ -24,9 +24,8 @@ export const actions: Actions = {
});
if (!result.response.ok) {
return fail(result.response.status, {
error: getErrorMessage(extractErrorCode(result.error))
});
const code = (result.error as unknown as { code?: string })?.code;
return fail(result.response.status, { error: getErrorMessage(code) });
}
return { success: true };
@@ -39,9 +38,8 @@ export const actions: Actions = {
});
if (!result.response.ok) {
return fail(result.response.status, {
error: getErrorMessage(extractErrorCode(result.error))
});
const code = (result.error as unknown as { code?: string })?.code;
return fail(result.response.status, { error: getErrorMessage(code) });
}
throw redirect(303, '/admin/groups');

View File

@@ -7,8 +7,7 @@ const mockApi = {
};
vi.mock('$lib/shared/api.server', () => ({
createApiClient: () => mockApi,
extractErrorCode: (e: unknown) => (e as { code?: string } | undefined)?.code
createApiClient: () => mockApi
}));
beforeEach(() => vi.clearAllMocks());

View File

@@ -1,10 +1,7 @@
import { describe, expect, it, vi, beforeEach } from 'vitest';
import { load } from './+layout.server';
vi.mock('$lib/shared/api.server', () => ({
createApiClient: vi.fn(),
extractErrorCode: (e: unknown) => (e as { code?: string } | undefined)?.code
}));
vi.mock('$lib/shared/api.server', () => ({ createApiClient: vi.fn() }));
import { createApiClient } from '$lib/shared/api.server';

View File

@@ -1,6 +1,6 @@
import { fail, redirect } from '@sveltejs/kit';
import type { Actions } from './$types';
import { createApiClient, extractErrorCode } from '$lib/shared/api.server';
import { createApiClient } from '$lib/shared/api.server';
import { getErrorMessage } from '$lib/shared/errors';
export const actions: Actions = {
@@ -16,9 +16,8 @@ export const actions: Actions = {
});
if (!result.response.ok) {
return fail(result.response.status, {
error: getErrorMessage(extractErrorCode(result.error))
});
const code = (result.error as unknown as { code?: string })?.code;
return fail(result.response.status, { error: getErrorMessage(code) });
}
throw redirect(303, '/admin/groups');

View File

@@ -1,5 +1,5 @@
import { fail } from '@sveltejs/kit';
import { createApiClient, extractErrorCode } from '$lib/shared/api.server';
import { createApiClient } from '$lib/shared/api.server';
import { getErrorMessage } from '$lib/shared/errors';
import type { Actions, PageServerLoad } from './$types';
import type { components } from '$lib/generated/api';
@@ -25,7 +25,8 @@ export const load: PageServerLoad = async ({ url, fetch }) => {
let invites: InviteListItem[] = [];
let loadError: string | null = null;
if (!invitesResult.response.ok) {
loadError = extractErrorCode(invitesResult.error) ?? 'INTERNAL_ERROR';
const code = (invitesResult.error as unknown as { code?: string })?.code;
loadError = code ?? 'INTERNAL_ERROR';
} else {
invites = (invitesResult.data ?? []) as InviteListItem[];
}
@@ -33,7 +34,8 @@ export const load: PageServerLoad = async ({ url, fetch }) => {
let groups: UserGroup[] = [];
let groupsLoadError: string | null = null;
if (!groupsResult.response.ok) {
groupsLoadError = extractErrorCode(groupsResult.error) ?? 'INTERNAL_ERROR';
const code = (groupsResult.error as unknown as { code?: string })?.code;
groupsLoadError = code ?? 'INTERNAL_ERROR';
} else {
const raw = groupsResult.data ?? [];
groups = [...raw].sort((a, b) => a.name.localeCompare(b.name));
@@ -60,9 +62,8 @@ export const actions = {
});
if (!result.response.ok) {
return fail(result.response.status, {
createError: extractErrorCode(result.error) ?? 'INTERNAL_ERROR'
});
const code = (result.error as unknown as { code?: string })?.code;
return fail(result.response.status, { createError: code ?? 'INTERNAL_ERROR' });
}
return { created: result.data! as InviteListItem };
@@ -77,9 +78,8 @@ export const actions = {
const result = await api.DELETE('/api/invites/{id}', { params: { path: { id } } });
if (!result.response.ok) {
return fail(result.response.status, {
revokeError: extractErrorCode(result.error) ?? 'INTERNAL_ERROR'
});
const code = (result.error as unknown as { code?: string })?.code;
return fail(result.response.status, { revokeError: code ?? 'INTERNAL_ERROR' });
}
return { revoked: id };

View File

@@ -1,10 +1,7 @@
import { describe, expect, it, vi, beforeEach } from 'vitest';
import { load } from './+layout.server';
vi.mock('$lib/shared/api.server', () => ({
createApiClient: vi.fn(),
extractErrorCode: (e: unknown) => (e as { code?: string } | undefined)?.code
}));
vi.mock('$lib/shared/api.server', () => ({ createApiClient: vi.fn() }));
import { createApiClient } from '$lib/shared/api.server';

View File

@@ -1,6 +1,6 @@
import { error } from '@sveltejs/kit';
import type { PageServerLoad } from './$types';
import { createApiClient, extractErrorCode } from '$lib/shared/api.server';
import { createApiClient } from '$lib/shared/api.server';
import { getErrorMessage } from '$lib/shared/errors';
export const load: PageServerLoad = async ({ fetch }) => {
@@ -8,7 +8,8 @@ export const load: PageServerLoad = async ({ fetch }) => {
const result = await api.GET('/api/ocr/training-info');
if (!result.response.ok) {
throw error(result.response.status, getErrorMessage(extractErrorCode(result.error)));
const code = (result.error as unknown as { code?: string })?.code;
throw error(result.response.status, getErrorMessage(code));
}
return { trainingInfo: result.data! };

View File

@@ -1,6 +1,6 @@
import { error } from '@sveltejs/kit';
import type { PageServerLoad } from './$types';
import { createApiClient, extractErrorCode } from '$lib/shared/api.server';
import { createApiClient } from '$lib/shared/api.server';
import { getErrorMessage } from '$lib/shared/errors';
export const load: PageServerLoad = async ({ params, fetch }) => {
@@ -10,7 +10,8 @@ export const load: PageServerLoad = async ({ params, fetch }) => {
});
if (!result.response.ok) {
throw error(result.response.status, getErrorMessage(extractErrorCode(result.error)));
const code = (result.error as unknown as { code?: string })?.code;
throw error(result.response.status, getErrorMessage(code));
}
return { history: result.data!, personId: params.personId };

View File

@@ -3,10 +3,7 @@ import { load } from './+page.server';
const mockApi = { GET: vi.fn() };
vi.mock('$lib/shared/api.server', () => ({
createApiClient: () => mockApi,
extractErrorCode: (e: unknown) => (e as { code?: string } | undefined)?.code
}));
vi.mock('$lib/shared/api.server', () => ({ createApiClient: () => mockApi }));
beforeEach(() => vi.clearAllMocks());

View File

@@ -1,6 +1,6 @@
import { error } from '@sveltejs/kit';
import type { PageServerLoad } from './$types';
import { createApiClient, extractErrorCode } from '$lib/shared/api.server';
import { createApiClient } from '$lib/shared/api.server';
import { getErrorMessage } from '$lib/shared/errors';
export const load: PageServerLoad = async ({ fetch }) => {
@@ -8,7 +8,8 @@ export const load: PageServerLoad = async ({ fetch }) => {
const result = await api.GET('/api/ocr/training-info/global');
if (!result.response.ok) {
throw error(result.response.status, getErrorMessage(extractErrorCode(result.error)));
const code = (result.error as unknown as { code?: string })?.code;
throw error(result.response.status, getErrorMessage(code));
}
return { history: result.data! };

View File

@@ -3,10 +3,7 @@ import { load } from './+page.server';
const mockApi = { GET: vi.fn() };
vi.mock('$lib/shared/api.server', () => ({
createApiClient: () => mockApi,
extractErrorCode: (e: unknown) => (e as { code?: string } | undefined)?.code
}));
vi.mock('$lib/shared/api.server', () => ({ createApiClient: () => mockApi }));
beforeEach(() => vi.clearAllMocks());

View File

@@ -3,10 +3,7 @@ import { load } from './+page.server';
const mockApi = { GET: vi.fn() };
vi.mock('$lib/shared/api.server', () => ({
createApiClient: () => mockApi,
extractErrorCode: (e: unknown) => (e as { code?: string } | undefined)?.code
}));
vi.mock('$lib/shared/api.server', () => ({ createApiClient: () => mockApi }));
beforeEach(() => vi.clearAllMocks());

View File

@@ -1,6 +1,6 @@
import { error, fail, redirect } from '@sveltejs/kit';
import type { PageServerLoad, Actions } from './$types';
import { createApiClient, extractErrorCode } from '$lib/shared/api.server';
import { createApiClient } from '$lib/shared/api.server';
import { getErrorMessage } from '$lib/shared/errors';
export const load: PageServerLoad = async ({ params, parent, url }) => {
@@ -25,9 +25,8 @@ export const actions: Actions = {
});
if (!result.response.ok) {
return fail(result.response.status, {
error: getErrorMessage(extractErrorCode(result.error))
});
const code = (result.error as unknown as { code?: string })?.code;
return fail(result.response.status, { error: getErrorMessage(code) });
}
return { success: true };
@@ -44,9 +43,8 @@ export const actions: Actions = {
});
if (!result.response.ok) {
return fail(result.response.status, {
error: getErrorMessage(extractErrorCode(result.error))
});
const code = (result.error as unknown as { code?: string })?.code;
return fail(result.response.status, { error: getErrorMessage(code) });
}
throw redirect(303, `/admin/tags/${result.data!.id}?merged=1`);
@@ -67,9 +65,8 @@ export const actions: Actions = {
});
if (!result.response.ok) {
return fail(result.response.status, {
error: getErrorMessage(extractErrorCode(result.error))
});
const code = (result.error as unknown as { code?: string })?.code;
return fail(result.response.status, { error: getErrorMessage(code) });
}
throw redirect(303, '/admin/tags');

View File

@@ -8,8 +8,7 @@ const mockApi = {
};
vi.mock('$lib/shared/api.server', () => ({
createApiClient: () => mockApi,
extractErrorCode: (e: unknown) => (e as { code?: string } | undefined)?.code
createApiClient: () => mockApi
}));
beforeEach(() => vi.clearAllMocks());

View File

@@ -1,10 +1,7 @@
import { describe, expect, it, vi, beforeEach } from 'vitest';
import { load } from './+layout.server';
vi.mock('$lib/shared/api.server', () => ({
createApiClient: vi.fn(),
extractErrorCode: (e: unknown) => (e as { code?: string } | undefined)?.code
}));
vi.mock('$lib/shared/api.server', () => ({ createApiClient: vi.fn() }));
import { createApiClient } from '$lib/shared/api.server';

View File

@@ -1,6 +1,6 @@
import { error, fail, redirect } from '@sveltejs/kit';
import type { PageServerLoad, Actions } from './$types';
import { createApiClient, extractErrorCode } from '$lib/shared/api.server';
import { createApiClient } from '$lib/shared/api.server';
import { getErrorMessage } from '$lib/shared/errors';
import type { components } from '$lib/generated/api';
@@ -55,9 +55,8 @@ export const actions: Actions = {
});
if (!result.response.ok) {
return fail(result.response.status, {
error: getErrorMessage(extractErrorCode(result.error))
});
const code = (result.error as unknown as { code?: string })?.code;
return fail(result.response.status, { error: getErrorMessage(code) });
}
return { success: true };
@@ -70,9 +69,8 @@ export const actions: Actions = {
});
if (!result.response.ok) {
return fail(result.response.status, {
error: getErrorMessage(extractErrorCode(result.error))
});
const code = (result.error as unknown as { code?: string })?.code;
return fail(result.response.status, { error: getErrorMessage(code) });
}
throw redirect(303, '/admin/users');

View File

@@ -4,10 +4,7 @@ vi.mock('$env/dynamic/private', () => ({
env: { API_INTERNAL_URL: 'http://localhost:8080' }
}));
vi.mock('$lib/shared/api.server', () => ({
createApiClient: vi.fn(),
extractErrorCode: (e: unknown) => (e as { code?: string } | undefined)?.code
}));
vi.mock('$lib/shared/api.server', () => ({ createApiClient: vi.fn() }));
import { load, actions } from './+page.server';
import { createApiClient } from '$lib/shared/api.server';

View File

@@ -1,10 +1,7 @@
import { describe, expect, it, vi, beforeEach } from 'vitest';
import { load } from './+layout.server';
vi.mock('$lib/shared/api.server', () => ({
createApiClient: vi.fn(),
extractErrorCode: (e: unknown) => (e as { code?: string } | undefined)?.code
}));
vi.mock('$lib/shared/api.server', () => ({ createApiClient: vi.fn() }));
import { createApiClient } from '$lib/shared/api.server';

View File

@@ -1,6 +1,6 @@
import { error, fail, redirect } from '@sveltejs/kit';
import type { PageServerLoad, Actions } from './$types';
import { createApiClient, extractErrorCode } from '$lib/shared/api.server';
import { createApiClient } from '$lib/shared/api.server';
import { getErrorMessage } from '$lib/shared/errors';
export const load: PageServerLoad = async ({ fetch, locals }) => {
@@ -35,9 +35,8 @@ export const actions: Actions = {
});
if (!result.response.ok) {
return fail(result.response.status, {
error: getErrorMessage(extractErrorCode(result.error))
});
const code = (result.error as unknown as { code?: string })?.code;
return fail(result.response.status, { error: getErrorMessage(code) });
}
throw redirect(303, '/admin/users');

View File

@@ -8,8 +8,7 @@ const mockApi = {
};
vi.mock('$lib/shared/api.server', () => ({
createApiClient: () => mockApi,
extractErrorCode: (e: unknown) => (e as { code?: string } | undefined)?.code
createApiClient: () => mockApi
}));
function buildUrl(search = ''): URL {

Some files were not shown because too many files have changed in this diff Show More