Compare commits

11 Commits

Author SHA1 Message Date
Marcel
b5239f515f fix(notification): address review suggestions
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m23s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m17s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
- ChronikFuerDichBox: move update() inside the failure branch so success
  path skips it, matching NotificationDropdown's pattern
- NotificationDropdown test: add role=alert assertion for mark-all-read
  failure to match existing dismiss-failure coverage in ChronikFuerDichBox
- +page.server.ts: use getErrorMessage(undefined) instead of null so the
  missing-notificationId 400 goes through the same i18n pipeline as other errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 07:31:26 +02:00
Marcel
f2bb58e294 fix(chronik): surface action failures in ChronikFuerDichBox with accessible error banner
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m35s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m17s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Add $state errorMessage + role=alert banner to ChronikFuerDichBox. Both enhance callbacks
now inspect result.type and set the error message on 'failure' or 'error'; errorMessage
is cleared on each new submit attempt.

Upgrade both test files to the mockFormResult pattern (via vi.hoisted) so the result
callback is exercised. Add a failing-action test in each file that asserts role=alert
appears after a form submit with type='failure'.

Fix bare Function cast → explicit typed cast to satisfy @typescript-eslint/no-unsafe-function-type.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 07:06:58 +02:00
Marcel
2adb98895d fix(aktivitaeten): narrow File cast and use null payload for missing notificationId
Replace 'as string | null' cast (which silently accepts File values) with an explicit
typeof check. Use error: null instead of hardcoded German so the client falls through
to the generic i18n-keyed error banner.
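
A minimal sketch of the narrowing this commit describes, assuming a hypothetical helper named `parseNotificationId` (the name and return shape are illustrative and not taken from the repository). Rejecting an empty string is also an assumption; the commit only mentions missing and non-string values.

```typescript
// Hypothetical result shape, for illustration only.
type ParsedId =
  | { ok: true; notificationId: string }
  | { ok: false; status: 400; error: null };

// FormData.get() can yield a string, a File, or null, so the input is typed as unknown.
function parseNotificationId(value: unknown): ParsedId {
  // typeof narrows to string and rejects both File objects and a missing field.
  // Treating '' as invalid is an assumption made for this sketch.
  if (typeof value !== 'string' || value === '') {
    // error: null leaves the message to the client's generic, i18n-keyed banner.
    return { ok: false, status: 400, error: null };
  }
  return { ok: true, notificationId: value };
}
```

Unlike an `as string | null` cast, the `typeof` check is evaluated at runtime, so a File value cannot slip through as if it were an id.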

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 07:06:15 +02:00
Marcel
6049dcadd3 fix(notification-dropdown): handle error result type, add role=alert, fix update ordering
- Add role="alert" to error banner so screen-reader users hear failures
- Handle result.type === 'error' (network failure) alongside 'failure' in both enhance callbacks
- Clear errorMessage at the start of each submit so stale errors don't persist on retry
- On dismiss success: skip update() entirely since goto() navigates away from the page
- On dismiss failure: await update() then set error message
- On mark-all success: skip update() (optimistic state already applied)
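
The dismiss branching listed above could be sketched roughly as follows. `ActionResultLike` is a simplified stand-in for SvelteKit's `ActionResult` union, and `DismissDeps`, `handleDismissResult`, and `GENERIC_ERROR` are hypothetical names introduced only so the logic can run outside a component. Treating a `redirect` result like success is an assumption.

```typescript
// Simplified stand-in for SvelteKit's ActionResult; only the fields used here are modelled.
type ActionResultLike =
  | { type: 'success' }
  | { type: 'redirect'; location: string }
  | { type: 'failure'; data?: { error?: string | null } }
  | { type: 'error'; error: unknown };

// Hypothetical dependency bundle (the component would pass its real callbacks).
interface DismissDeps {
  update: () => Promise<void>;
  onClose: () => void;
  goto: (url: string) => Promise<void>;
  setError: (message: string) => void;
}

// Placeholder for the localised fallback text looked up via the i18n key.
const GENERIC_ERROR = 'notification_error_generic';

async function handleDismissResult(
  result: ActionResultLike,
  targetUrl: string,
  deps: DismissDeps
): Promise<void> {
  if (result.type === 'failure' || result.type === 'error') {
    // Failure path: re-sync form state first, then surface the message.
    await deps.update();
    const explicit = result.type === 'failure' ? result.data?.error : null;
    deps.setError(explicit ?? GENERIC_ERROR);
    return;
  }
  // Success path: no update(), because goto() leaves the page anyway.
  deps.onClose();
  await deps.goto(targetUrl);
}
```

Clearing the previous error at the start of each submit happens before this callback runs and is not modelled here.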

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 07:05:50 +02:00
Marcel
7fe8842b57 fix(notifications): surface action failures as an error banner
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m25s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m23s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Semgrep Security Scan (pull_request) Successful in 18s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
When dismiss-notification or mark-all-read returns a failure the dropdown
now shows a localised error message above the list. Added
notification_error_generic key (de/en/es) as the fallback when the
action response carries no explicit error string.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:56:33 +02:00
Marcel
f9340366d1 fix(notifications): move onClose/goto into enhance result callback
onClose() and goto() were firing before the server responded, making it
impossible for a fail() response to cancel navigation. Moved them inside
the result callback behind a result.type !== 'failure' guard.

Updated the $app/forms enhance mock to always invoke the returned async
callback with a configurable mockFormResult, and added three tests:
- success path calls onClose + goto with the correct deep-link URL
- failure path skips onClose and goto
- annotationId is appended to the URL when present

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:54:15 +02:00
Marcel
af84ffc379 fix(notifications): guard against null notificationId in dismiss action
Casting null to string caused PATCH to fire against /api/notifications/null/read
when the field was absent. Added an early-return fail(400) and a test that
submitting an empty form returns 400 without calling the API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:48:37 +02:00
Marcel
23439e581a refactor(chronik): replace callback props with form actions in ChronikFuerDichBox
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m20s
CI / OCR Service Tests (pull_request) Successful in 21s
CI / Backend Unit Tests (pull_request) Successful in 3m22s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 58s
Dismiss (X) button and mark-all-read button now submit forms to
/aktivitaeten?/dismiss-notification and /aktivitaeten?/mark-all-read respectively.
Props renamed onMarkRead/onMarkAllRead → optimisticMarkRead/optimisticMarkAllRead.

aktivitaeten/+page.svelte drops the now-deleted onMarkRead/onMarkAllRead wrapper functions
and passes notificationStore.optimisticMarkRead/optimisticMarkAllRead directly to the box.

Tests: $app/forms enhance mock added to both spec files so dismiss and mark-all assertions
work synchronously against form-submit events.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:16:58 +02:00
Marcel
2c6b59d0c7 refactor(notification): replace callback props with form actions in Dropdown and Bell
NotificationDropdown now wraps each row in a <form action="/aktivitaeten?/dismiss-notification">
and the mark-all control in <form action="/aktivitaeten?/mark-all-read">, wired via use:enhance
for optimistic UI. Props renamed onMarkRead/onMarkAllRead → optimisticMarkRead/optimisticMarkAllRead
to match the simplified store API. NotificationBell passes the store helpers directly; handleMarkRead
is removed.

Test mocks updated: $app/forms enhance mock fires SubmitFunction synchronously on form submit so
callback assertions work without a real HTTP round-trip.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 23:15:56 +02:00
Marcel
c0a7408ef4 refactor(notification): rename markRead/markAllRead to optimistic helpers without fetch
Removes raw fetch() calls from the store. optimisticMarkRead(id) and
optimisticMarkAllRead() now only mutate local $state — the actual API
calls move to SvelteKit form actions on /aktivitaeten.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 22:59:01 +02:00
Marcel
9d283c4500 feat(notification): add dismiss-notification and mark-all-read form actions to aktivitaeten
Adds two SvelteKit form actions to /aktivitaeten/+page.server.ts so the
notification bell can POST there instead of calling the backend directly
from the browser.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 22:51:08 +02:00
303 changed files with 2555 additions and 25484 deletions

@@ -39,12 +39,6 @@ PORT_PROMETHEUS=9090
# Grafana admin password — change this before exposing Grafana beyond localhost
GRAFANA_ADMIN_PASSWORD=changeme
# Password for the read-only grafana_reader PostgreSQL role used by the PO
# Overview dashboard. Consumed by Flyway V68 (to set the role's password) and
# by Grafana's PostgreSQL datasource (to connect). REQUIRED in production —
# generate with: openssl rand -hex 32
GRAFANA_DB_PASSWORD=changeme-generate-with-openssl-rand-hex-32
# GlitchTip domain — production: use https://glitchtip.archiv.raddatz.cloud (must match Caddy vhost)
GLITCHTIP_DOMAIN=http://localhost:3002

@@ -65,29 +65,6 @@ jobs:
exit 1
fi
- name: Assert no raw document date rendered via {@html} (CWE-79 — #666)
shell: bash
run: |
# meta_date_raw is untrusted verbatim spreadsheet text — it must render via
# Svelte default escaping, never {@html}. This guard flags any {@html ...}
# whose expression references a raw-date variable. A comment mentioning
# "{@html}" without a raw token inside the braces does NOT match.
# The token list MUST cover every variable that carries the raw value:
# DocumentDate.svelte exposes it via the `raw` prop, so `\braw\b` is included.
# Grow this list whenever a new raw-bearing variable name is introduced.
pattern='\{@html[^}]*(metaDateRaw|documentDateRaw|rawDate|\braw\b)'
# Self-test: the regex must catch the dangerous forms and ignore the comment form.
printf '{@html doc.metaDateRaw}\n' | grep -qP "$pattern" \
|| { echo "FAIL: guard self-test — regex missed the unsafe {@html metaDateRaw} form"; exit 1; }
printf '{@html raw}\n' | grep -qP "$pattern" \
|| { echo "FAIL: guard self-test — regex missed the unsafe {@html raw} form (DocumentDate prop)"; exit 1; }
printf 'never use {@html} for this\n' | grep -qvP "$pattern" \
|| { echo "FAIL: guard self-test — regex wrongly flagged a {@html} comment"; exit 1; }
if grep -rPln "$pattern" --include='*.svelte' frontend/src/; then
echo "FAIL: meta_date_raw rendered via {@html} — use default {…} escaping (CWE-79, #666)."
exit 1
fi
- name: Assert no (upload|download)-artifact past v3
shell: bash
run: |
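
For reference, the `{@html}` guard pattern from the workflow step in this hunk behaves as follows when transcribed to a JavaScript regular expression. This is only an illustrative sketch: the workflow itself runs the pattern through `grep -P`, and the names `rawDateHtmlPattern` and `rendersRawDateViaHtml` are hypothetical.

```typescript
// Same pattern as the workflow step, written as a JavaScript regex literal.
// No "g" flag, so repeated test() calls carry no state between inputs.
const rawDateHtmlPattern = /\{@html[^}]*(metaDateRaw|documentDateRaw|rawDate|\braw\b)/;

// Returns true when a line contains an {@html ...} expression that mentions
// one of the raw-date tokens before the closing brace.
function rendersRawDateViaHtml(line: string): boolean {
  return rawDateHtmlPattern.test(line);
}
```

The `[^}]*` part is what keeps a bare `{@html}` mention in a comment from matching: the token has to appear inside the braces.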

@@ -31,7 +31,6 @@ name: nightly
# STAGING_APP_ADMIN_USERNAME
# STAGING_APP_ADMIN_PASSWORD
# GRAFANA_ADMIN_PASSWORD
# GRAFANA_DB_PASSWORD (read-only grafana_reader DB role, issue #651)
# GLITCHTIP_SECRET_KEY
# SENTRY_DSN (set after GlitchTip first-run; empty = Sentry disabled)
@@ -80,8 +79,6 @@ jobs:
IMPORT_HOST_DIR=/srv/familienarchiv-staging/import
POSTGRES_USER=archiv
SENTRY_DSN=${{ secrets.SENTRY_DSN }}
VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
EOF
- name: Verify backend /import:ro mount is wired
@@ -145,7 +142,6 @@ jobs:
cp docker-compose.observability.yml /opt/familienarchiv/
cat > /opt/familienarchiv/obs-secrets.env <<'EOF'
GRAFANA_ADMIN_PASSWORD=${{ secrets.GRAFANA_ADMIN_PASSWORD }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
GLITCHTIP_SECRET_KEY=${{ secrets.GLITCHTIP_SECRET_KEY }}
POSTGRES_PASSWORD=${{ secrets.STAGING_POSTGRES_PASSWORD }}
POSTGRES_HOST=archiv-staging-db-1

@@ -35,7 +35,6 @@ name: release
# MAIL_USERNAME
# MAIL_PASSWORD
# GRAFANA_ADMIN_PASSWORD
# GRAFANA_DB_PASSWORD (read-only grafana_reader DB role, issue #651)
# GLITCHTIP_SECRET_KEY
# SENTRY_DSN (set after GlitchTip first-run; empty = Sentry disabled)
@@ -78,7 +77,6 @@ jobs:
IMPORT_HOST_DIR=/srv/familienarchiv-production/import
POSTGRES_USER=archiv
SENTRY_DSN=${{ secrets.SENTRY_DSN }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
EOF
- name: Build images
@@ -112,7 +110,6 @@ jobs:
cp docker-compose.observability.yml /opt/familienarchiv/
cat > /opt/familienarchiv/obs-secrets.env <<'EOF'
GRAFANA_ADMIN_PASSWORD=${{ secrets.GRAFANA_ADMIN_PASSWORD }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
GLITCHTIP_SECRET_KEY=${{ secrets.GLITCHTIP_SECRET_KEY }}
POSTGRES_PASSWORD=${{ secrets.PROD_POSTGRES_PASSWORD }}
POSTGRES_HOST=archiv-production-db-1

7
.gitignore vendored
@@ -26,10 +26,3 @@ node_modules/
# Repo uses npm; yarn.lock is ignored to avoid double-lockfile drift.
frontend/yarn.lock
**/.venv/
**/__pycache__/
*.pyc
# Canonical import artifacts live only on the ops host (PII).
# See tools/import-normalizer/.gitignore — load-bearing for that policy.

@@ -87,7 +87,7 @@ backend/src/main/java/org/raddatz/familienarchiv/
├── exception/ DomainException, ErrorCode, GlobalExceptionHandler
├── filestorage/ FileService (S3/MinIO)
├── geschichte/ Geschichte (story) domain
├── importing/ CanonicalImportOrchestrator + four loaders (TagTree/PersonRegister/PersonTree/Document) + CanonicalSheetReader
├── importing/ MassImportService
├── notification/ Notification domain + SseEmitterRegistry
├── ocr/ OCR domain — OcrService, OcrBatchService, training
├── person/ Person domain
@@ -192,13 +192,11 @@ frontend/src/routes/
├── persons/
│ ├── [id]/ Person detail
│ ├── [id]/edit/ Person edit form
│ ├── new/ Create person form
│ └── review/ Triage view — confirm/rename/merge/delete provisional persons
│ └── new/ Create person form
├── briefwechsel/ Bilateral conversation timeline (Briefwechsel)
├── aktivitaeten/ Unified activity feed (Chronik)
├── geschichten/ Stories — list, [id], [id]/edit, new
├── stammbaum/ Family tree (Stammbaum)
├── themen/ Topics directory — browsable tag index
├── enrich/ Enrichment workflow — [id], done
├── admin/ User, group, tag, OCR, system management
├── hilfe/transkription/ Transcription help page

@@ -272,7 +272,6 @@ For multipart/form-data (file uploads): bypass the typed client and use `event.f
| Form display | German `dd.mm.yyyy` with auto-dot insertion via `handleDateInput()` |
| Wire format | ISO 8601 via a hidden `<input type="hidden" name="documentDate" value={dateIso}>` |
| Display | `new Intl.DateTimeFormat('de-DE', …).format(new Date(val + 'T12:00:00'))` |
| Honest precision display | `formatDocumentDate(iso, precision, end?, raw?, locale?)` (`$lib/shared/utils/documentDate.ts`) or the `<DocumentDate>` component — renders a document date at exactly its `meta_date_precision` (MONTH → "Juni 1916", never a fabricated day). It mirrors the Java `DocumentTitleFormatter`; both are pinned to `docs/date-label-fixtures.json` so the title and UI labels can't drift. `meta_date_raw` is untrusted — render it via default escaping, never `{@html}` (a CI guard enforces this). |
### Security checklist (new endpoint)

@@ -34,7 +34,7 @@ src/main/java/org/raddatz/familienarchiv/
├── exception/ # DomainException, ErrorCode, GlobalExceptionHandler
├── filestorage/ # FileService (S3/MinIO)
├── geschichte/ # Geschichte (story) domain
├── importing/ # CanonicalImportOrchestrator + 4 loaders + CanonicalSheetReader
├── importing/ # MassImportService
├── notification/ # Notification domain + SseEmitterRegistry
├── ocr/ # OCR domain — OcrService, OcrBatchService, training
├── person/ # Person domain — Person, PersonService, PersonController

@@ -5,10 +5,8 @@ import lombok.extern.slf4j.Slf4j;
import org.flywaydb.core.Flyway;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.env.Environment;
import javax.sql.DataSource;
import java.util.Map;
@Configuration
@RequiredArgsConstructor
@@ -16,7 +14,6 @@ import java.util.Map;
public class FlywayConfig {
private final DataSource dataSource;
private final Environment environment;
@Bean(name = "flyway")
public Flyway flyway() {
@@ -24,7 +21,6 @@ public class FlywayConfig {
Flyway flyway = Flyway.configure()
.dataSource(dataSource)
.locations("classpath:db/migration")
.placeholders(Map.of("grafanaDbPassword", resolveGrafanaDbPassword()))
.baselineOnMigrate(true)
.baselineVersion("4")
.load();
@@ -32,22 +28,4 @@ public class FlywayConfig {
log.info("Flyway: {} migration(s) applied.", result.migrationsExecuted);
return flyway;
}
// Fail-closed: refuse to boot when GRAFANA_DB_PASSWORD is unset. The
// grafana_reader role's password is (re)set on every boot by
// R__grafana_reader_password.sql, so a missing env var means we'd either
// skip the rotation silently or — with a hardcoded fallback — publish a
// well-known credential for a role with SELECT on audit_log, documents,
// and transcription_blocks. Same shape as UserDataInitializer's refusal
// to seed default admin credentials outside dev/test/e2e.
String resolveGrafanaDbPassword() {
String value = environment.getProperty("GRAFANA_DB_PASSWORD");
if (value == null || value.isBlank()) {
throw new IllegalStateException(
"GRAFANA_DB_PASSWORD is required: it is consumed by "
+ "R__grafana_reader_password.sql to (re)set the grafana_reader "
+ "role's password on every boot. Generate with: openssl rand -hex 32");
}
return value;
}
}

@@ -1,17 +0,0 @@
package org.raddatz.familienarchiv.document;
/**
* Precision of a document's date. Verbatim mirror of the import normalizer's
* {@code Precision} enum (tools/import-normalizer/dates.py) — the canonical output is the
* contract, so there is no translation layer. Do not add, remove, or rename values without
* also changing the normalizer; a mismatch silently breaks import idempotency (see ADR-025).
*/
public enum DatePrecision {
DAY,
MONTH,
SEASON,
YEAR,
RANGE,
APPROX,
UNKNOWN
}

@@ -25,12 +25,10 @@ import java.util.UUID;
@NamedEntityGraph(name = "Document.full", attributeNodes = {
@NamedAttributeNode("sender"),
@NamedAttributeNode("receivers"),
@NamedAttributeNode("tags"),
@NamedAttributeNode("trainingLabels")
@NamedAttributeNode("tags")
})
@NamedEntityGraph(name = "Document.list", attributeNodes = {
@NamedAttributeNode("sender"),
@NamedAttributeNode("receivers"),
@NamedAttributeNode("tags")
})
@Entity
@@ -91,29 +89,6 @@ public class Document {
@Column(name = "meta_date")
private LocalDate documentDate; // Wann wurde der Brief geschrieben?
// Precision of documentDate — drives honest rendering ("ca. 1943", "Frühjahr 1943").
// Verbatim mirror of the normalizer's Precision enum (see ADR-025).
@Enumerated(EnumType.STRING)
@Column(name = "meta_date_precision", nullable = false, length = 16)
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
@Builder.Default
private DatePrecision metaDatePrecision = DatePrecision.UNKNOWN;
// Range end — only set when metaDatePrecision is RANGE (open-ended ranges allowed → may be null).
@Column(name = "meta_date_end")
private LocalDate metaDateEnd;
// Original date cell, verbatim, preserved for provenance and "as written" display.
@Column(name = "meta_date_raw", columnDefinition = "TEXT")
private String metaDateRaw;
// Raw attribution preserved even when a person is linked via sender/receivers.
@Column(name = "sender_text", columnDefinition = "TEXT")
private String senderText;
@Column(name = "receiver_text", columnDefinition = "TEXT")
private String receiverText;
@Column(name = "meta_location")
private String location;

@@ -12,8 +12,6 @@ public class DocumentBatchMetadataDTO {
private UUID senderId;
private List<UUID> receiverIds;
private LocalDate documentDate;
private DatePrecision metaDatePrecision;
private LocalDate metaDateEnd;
private String location;
private List<String> tagNames;
private Boolean metadataComplete;

@@ -313,10 +313,9 @@ public class DocumentController {
@RequestParam(required = false) String tagQ,
@RequestParam(required = false) DocumentStatus status,
@RequestParam(required = false) String tagOp,
@RequestParam(required = false) Boolean undated,
Authentication authentication) {
TagOperator operator = "OR".equalsIgnoreCase(tagOp) ? TagOperator.OR : TagOperator.AND;
List<UUID> ids = documentService.findIdsForFilter(q, from, to, senderId, receiverId, tags, tagQ, status, operator, Boolean.TRUE.equals(undated));
List<UUID> ids = documentService.findIdsForFilter(q, from, to, senderId, receiverId, tags, tagQ, status, operator);
if (ids.size() > BULK_EDIT_FILTER_MAX_IDS) {
throw DomainException.badRequest(ErrorCode.BULK_EDIT_TOO_MANY_IDS,
"Filter matches " + ids.size() + " documents — refine filter (max " + BULK_EDIT_FILTER_MAX_IDS + ")");
@@ -376,7 +375,6 @@ public class DocumentController {
@Parameter(description = "Sort field") @RequestParam(required = false) DocumentSort sort,
@Parameter(description = "Sort direction: ASC or DESC") @RequestParam(required = false, defaultValue = "DESC") String dir,
@Parameter(description = "Tag operator: AND (default) or OR") @RequestParam(required = false) String tagOp,
@Parameter(description = "Restrict to undated documents (meta_date IS NULL)") @RequestParam(required = false) Boolean undated,
// @Max on page guards against overflow when pageable.getOffset() is computed
// as page * size — Integer.MAX_VALUE * 50 would wrap to a negative long, which
// Hibernate cheerfully turns into an invalid SQL OFFSET.
@@ -389,7 +387,7 @@ public class DocumentController {
// defaults to AND, which matches the frontend default and keeps old clients working.
TagOperator operator = "OR".equalsIgnoreCase(tagOp) ? TagOperator.OR : TagOperator.AND;
Pageable pageable = PageRequest.of(page, size);
return ResponseEntity.ok(documentService.searchDocuments(q, from, to, senderId, receiverId, tags, tagQ, status, sort, dir, operator, Boolean.TRUE.equals(undated), pageable));
return ResponseEntity.ok(documentService.searchDocuments(q, from, to, senderId, receiverId, tags, tagQ, status, sort, dir, operator, pageable));
}
@GetMapping(value = "/density", produces = MediaType.APPLICATION_JSON_VALUE)

@@ -1,44 +0,0 @@
package org.raddatz.familienarchiv.document;
import io.swagger.v3.oas.annotations.media.Schema;
import org.raddatz.familienarchiv.audit.ActivityActorDTO;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.tag.Tag;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.List;
import java.util.UUID;
public record DocumentListItem(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
UUID id,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
String title,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
String originalFilename,
String thumbnailUrl,
LocalDate documentDate,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
DatePrecision metaDatePrecision,
LocalDate metaDateEnd,
Person sender,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<Person> receivers,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<Tag> tags,
String archiveBox,
String archiveFolder,
String location,
String summary,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
int completionPercentage,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<ActivityActorDTO> contributors,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
SearchMatchData matchData,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
LocalDateTime createdAt,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
LocalDateTime updatedAt
) {}

@@ -0,0 +1,18 @@
package org.raddatz.familienarchiv.document;
import io.swagger.v3.oas.annotations.media.Schema;
import org.raddatz.familienarchiv.audit.ActivityActorDTO;
import org.raddatz.familienarchiv.document.Document;
import java.util.List;
public record DocumentSearchItem(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
Document document,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
SearchMatchData matchData,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
int completionPercentage,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<ActivityActorDTO> contributors
) {}

@@ -7,7 +7,7 @@ import java.util.List;
public record DocumentSearchResult(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<DocumentListItem> items,
List<DocumentSearchItem> items,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
long totalElements,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
@@ -15,45 +15,24 @@ public record DocumentSearchResult(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
int pageSize,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
int totalPages,
/**
* Total number of undated documents (meta_date IS NULL) matching the current
* filter context (q/tags/sender/receiver/status) across ALL pages — not the
* undated rows on the current page. Computed independently of the "Nur
* undatierte" toggle so it never collapses to the page slice (issue #668).
*/
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
long undatedCount
int totalPages
) {
/**
* Single-page convenience factory used by empty-result shortcuts and by tests that
* don't care about paging. Treats the whole list as page 0 of itself. The undated
* count defaults to 0 — the service overlays the real global count via
* {@link #withUndatedCount(long)} before returning.
* don't care about paging. Treats the whole list as page 0 of itself.
*/
public static DocumentSearchResult of(List<DocumentListItem> items) {
public static DocumentSearchResult of(List<DocumentSearchItem> items) {
int size = items.size();
return new DocumentSearchResult(items, size, 0, size, size == 0 ? 0 : 1, 0L);
return new DocumentSearchResult(items, size, 0, size, size == 0 ? 0 : 1);
}
/**
* Paged factory used by the service when it has a real Pageable + full match count
* (e.g. from Spring's Page&lt;T&gt; or from an in-memory sort-then-slice). The undated
* count defaults to 0 — the service overlays the real global count via
* {@link #withUndatedCount(long)} before returning.
* (e.g. from Spring's Page<T> or from an in-memory sort-then-slice).
*/
public static DocumentSearchResult paged(List<DocumentListItem> slice, Pageable pageable, long totalElements) {
public static DocumentSearchResult paged(List<DocumentSearchItem> slice, Pageable pageable, long totalElements) {
int pageSize = pageable.getPageSize();
int totalPages = pageSize == 0 ? 0 : (int) ((totalElements + pageSize - 1) / pageSize);
return new DocumentSearchResult(slice, totalElements, pageable.getPageNumber(), pageSize, totalPages, 0L);
}
/**
* Returns a copy with the global undated count overlaid, leaving every other
* field untouched. Lets the service compute the count once and attach it to
* whichever result shape the search path produced.
*/
public DocumentSearchResult withUndatedCount(long undatedCount) {
return new DocumentSearchResult(items, totalElements, pageNumber, pageSize, totalPages, undatedCount);
return new DocumentSearchResult(slice, totalElements, pageable.getPageNumber(), pageSize, totalPages);
}
}
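
The page count in the `paged` factory is an integer ceiling division with a guard for a zero page size. A small sketch of the same arithmetic, using a hypothetical standalone function `totalPages` rather than the Java record:

```typescript
// Ceiling division: how many pages of pageSize are needed to hold totalElements.
// A pageSize of 0 yields 0 pages instead of dividing by zero.
function totalPages(totalElements: number, pageSize: number): number {
  if (pageSize === 0) return 0;
  // Adding pageSize - 1 before dividing rounds any partial page up.
  return Math.floor((totalElements + pageSize - 1) / pageSize);
}
```

For example, 51 matches at 50 per page need two pages, while exactly 50 matches need one.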

@@ -10,6 +10,7 @@ import org.raddatz.familienarchiv.audit.AuditService;
import org.raddatz.familienarchiv.document.DocumentBatchMetadataDTO;
import org.raddatz.familienarchiv.document.DocumentBatchSummary;
import org.raddatz.familienarchiv.document.DocumentBulkEditDTO;
import org.raddatz.familienarchiv.document.DocumentSearchItem;
import org.raddatz.familienarchiv.document.DocumentSearchResult;
import org.raddatz.familienarchiv.document.DocumentSort;
import org.raddatz.familienarchiv.document.DocumentUpdateDTO;
@@ -171,7 +172,7 @@ public class DocumentService {
hasFts, ftsIds, null, null,
filters.sender(), filters.receiver(),
filters.tags(), filters.tagQ(),
filters.status(), filters.tagOperator(), false);
filters.status(), filters.tagOperator());
return documentRepository.findAll(spec).stream()
.map(Document::getDocumentDate)
.filter(Objects::nonNull)
@@ -378,7 +379,6 @@ public class DocumentService {
// 1. Einfache Felder Update
doc.setTitle(dto.getTitle());
doc.setDocumentDate(dto.getDocumentDate());
applyDatePrecision(doc, dto);
doc.setLocation(dto.getLocation());
doc.setTranscription(dto.getTranscription());
doc.setSummary(dto.getSummary());
@@ -447,25 +447,6 @@ public class DocumentService {
return saved;
}
/**
* Applies the three date-precision fields only when the DTO carries them.
* A null field means "not submitted" — overwriting the stored value with null
* would fabricate a precision the user never chose, the exact dishonesty #666
* exists to prevent. A row with a genuinely-unknown precision must keep it when
* an unrelated edit (e.g. a location typo) is saved.
*/
private void applyDatePrecision(Document doc, DocumentUpdateDTO dto) {
if (dto.getMetaDatePrecision() != null) {
doc.setMetaDatePrecision(dto.getMetaDatePrecision());
}
if (dto.getMetaDateEnd() != null) {
doc.setMetaDateEnd(dto.getMetaDateEnd());
}
if (dto.getMetaDateRaw() != null) {
doc.setMetaDateRaw(dto.getMetaDateRaw());
}
}
@Transactional
public Document updateDocumentTags(UUID docId, List<String> tagNames) {
Document doc = documentRepository.findById(docId)
@@ -501,8 +482,7 @@ public class DocumentService {
*/
@Transactional(readOnly = true)
public List<UUID> findIdsForFilter(String text, LocalDate from, LocalDate to, UUID sender, UUID receiver,
List<String> tags, String tagQ, DocumentStatus status, TagOperator tagOperator,
boolean undated) {
List<String> tags, String tagQ, DocumentStatus status, TagOperator tagOperator) {
boolean hasText = StringUtils.hasText(text);
List<UUID> rankedIds = null;
if (hasText) {
@@ -511,7 +491,7 @@ public class DocumentService {
}
Specification<Document> spec = buildSearchSpec(
hasText, rankedIds, from, to, sender, receiver, tags, tagQ, status, tagOperator, undated);
hasText, rankedIds, from, to, sender, receiver, tags, tagQ, status, tagOperator);
return documentRepository.findAll(spec).stream().map(Document::getId).toList();
}
@@ -525,8 +505,7 @@ public class DocumentService {
LocalDate from, LocalDate to,
UUID sender, UUID receiver,
List<String> tags, String tagQ,
DocumentStatus status, TagOperator tagOperator,
boolean undated) {
DocumentStatus status, TagOperator tagOperator) {
boolean useOrLogic = tagOperator == TagOperator.OR;
List<Set<UUID>> expandedTagSets = tagService.expandTagNamesToDescendantIdSets(tags);
Specification<Document> textSpec = hasText ? hasIds(ftsIds) : (root, query, cb) -> null;
@@ -536,8 +515,7 @@ public class DocumentService {
.and(hasReceiver(receiver))
.and(hasTags(expandedTagSets, useOrLogic))
.and(hasTagPartial(tagQ))
.and(hasStatus(status))
.and(undatedOnly(undated));
.and(hasStatus(status));
}
/**
@@ -666,62 +644,22 @@ public class DocumentService {
}
// 1. Allgemeine Suche (für das Suchfeld im Frontend)
public DocumentSearchResult searchDocuments(String text, LocalDate from, LocalDate to, UUID sender, UUID receiver, List<String> tags, String tagQ, DocumentStatus status, DocumentSort sort, String dir, TagOperator tagOperator, boolean undated, Pageable pageable) {
public DocumentSearchResult searchDocuments(String text, LocalDate from, LocalDate to, UUID sender, UUID receiver, List<String> tags, String tagQ, DocumentStatus status, DocumentSort sort, String dir, TagOperator tagOperator, Pageable pageable) {
boolean hasText = StringUtils.hasText(text);
// Pure-text RELEVANCE: push pagination + ts_rank ordering into SQL — skip
// findAllMatchingIdsByFts entirely (ADR-008). This must run BEFORE any
// findAllMatchingIdsByFts call so the fast path is preserved. An active undated
// filter must NOT take this path: it bypasses buildSearchSpec, so the
// undatedOnly predicate would be silently dropped. By definition this path has
// no date/sender/receiver/tag/status filters, and undated documents are valid
// FTS hits already folded into the ranked page, so there is no separate undated
// count to report here.
if (!undated && isPureTextRelevance(hasText, sort, from, to, sender, receiver, tags, tagQ, status)) {
// Pure-text RELEVANCE: push pagination into SQL — skip findAllMatchingIdsByFts entirely (ADR-008).
if (isPureTextRelevance(hasText, sort, from, to, sender, receiver, tags, tagQ, status)) {
return relevanceSortedPageFromSql(text, pageable);
}
List<UUID> rankedIds = null;
if (hasText) {
rankedIds = documentRepository.findAllMatchingIdsByFts(text);
// FTS matched nothing → no results and, by definition, no undated matches either.
if (rankedIds.isEmpty()) return DocumentSearchResult.of(List.of());
}
// Global undated count for the current filter (q/tags/sender/receiver/status),
// forcing undatedOnly(true) and IGNORING the user's "Nur undatierte" toggle so
// it never collapses to the page slice and never double-counts (issue #668).
long undatedCount = countUndatedForFilter(hasText, rankedIds, from, to, sender, receiver, tags, tagQ, status, tagOperator);
return runSearch(text, hasText, rankedIds, from, to, sender, receiver, tags, tagQ, status, sort, dir, tagOperator, undated, pageable)
.withUndatedCount(undatedCount);
}
/**
* Counts every undated document (meta_date IS NULL) matching the active filter,
* across all pages, independent of the undated toggle. Reuses {@link #buildSearchSpec}
* with {@code undated=true} forced so the count tracks q/tags/sender/receiver/status.
* A {@code from}/{@code to} range excludes undated rows by the collision rule (#668),
* so the count is legitimately 0 inside a date range.
*/
private long countUndatedForFilter(boolean hasText, List<UUID> ftsIds,
LocalDate from, LocalDate to, UUID sender, UUID receiver,
List<String> tags, String tagQ, DocumentStatus status, TagOperator tagOperator) {
Specification<Document> undatedSpec = buildSearchSpec(
hasText, ftsIds, from, to, sender, receiver, tags, tagQ, status, tagOperator, true);
return documentRepository.count(undatedSpec);
}
/** The original search dispatch — produces the page slice + totals, sans undated count. */
private DocumentSearchResult runSearch(String text, boolean hasText, List<UUID> rankedIds,
LocalDate from, LocalDate to, UUID sender, UUID receiver,
List<String> tags, String tagQ, DocumentStatus status,
DocumentSort sort, String dir, TagOperator tagOperator,
boolean undated, Pageable pageable) {
// The pure-text RELEVANCE fast path is handled by the caller (searchDocuments)
// before findAllMatchingIdsByFts runs, so it never reaches here (ADR-008).
Specification<Document> spec = buildSearchSpec(
hasText, rankedIds, from, to, sender, receiver, tags, tagQ, status, tagOperator, undated);
hasText, rankedIds, from, to, sender, receiver, tags, tagQ, status, tagOperator);
// SENDER and RECEIVER sorts load the full match set and slice in-memory.
// JPA's Sort.by("sender.lastName") generates an INNER JOIN that silently drops
@@ -798,7 +736,7 @@ public class DocumentService {
return DocumentSearchResult.paged(enrichItems(slice, text), pageable, totalElements);
}
private List<DocumentListItem> enrichItems(List<Document> documents, String text) {
private List<DocumentSearchItem> enrichItems(List<Document> documents, String text) {
List<Document> colorResolved = resolveDocumentTagColors(documents);
Map<UUID, SearchMatchData> matchData = enrichWithMatchData(colorResolved, text);
@@ -806,7 +744,7 @@ public class DocumentService {
Map<UUID, Integer> completionByDoc = fetchCompletionPercentages(docIds);
Map<UUID, List<ActivityActorDTO>> contributorsByDoc = auditLogQueryService.findRecentContributorsPerDocument(docIds);
return colorResolved.stream().map(doc -> toListItem(
return colorResolved.stream().map(doc -> new DocumentSearchItem(
doc,
matchData.getOrDefault(doc.getId(), SearchMatchData.empty()),
completionByDoc.getOrDefault(doc.getId(), 0),
@@ -814,30 +752,6 @@ public class DocumentService {
)).toList();
}
private DocumentListItem toListItem(Document doc, SearchMatchData match, int completionPct, List<ActivityActorDTO> contributors) {
return new DocumentListItem(
doc.getId(),
doc.getTitle(),
doc.getOriginalFilename(),
doc.getThumbnailUrl(),
doc.getDocumentDate(),
doc.getMetaDatePrecision(),
doc.getMetaDateEnd(),
doc.getSender(),
List.copyOf(doc.getReceivers()),
List.copyOf(doc.getTags()),
doc.getArchiveBox(),
doc.getArchiveFolder(),
doc.getLocation(),
doc.getSummary(),
completionPct,
contributors,
match,
doc.getCreatedAt(),
doc.getUpdatedAt()
);
}
private Map<UUID, Integer> fetchCompletionPercentages(List<UUID> docIds) {
return transcriptionBlockQueryService.getCompletionStats(docIds);
}
@@ -845,15 +759,7 @@ public class DocumentService {
private Sort resolveSort(DocumentSort sort, String dir) {
Sort.Direction direction = "ASC".equalsIgnoreCase(dir) ? Sort.Direction.ASC : Sort.Direction.DESC;
if (sort == null || sort == DocumentSort.DATE || sort == DocumentSort.RELEVANCE) {
// Undated documents (null documentDate) must order last regardless of
// direction — Postgres puts NULLs FIRST on ASC by default, which would
// surface the undated pile at the top with no explanation (issue #668).
// The title tiebreaker gives a stable total order when every row is
// null-dated (the "Nur undatierte" filter), so pagination is deterministic.
// title is @Column(nullable=false), so it is always present.
return Sort.by(
new Sort.Order(direction, "documentDate").nullsLast(),
Sort.Order.asc("title"));
return Sort.by(direction, "documentDate");
}
// SENDER and RECEIVER are sorted in-memory before this method is called
return switch (sort) {
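
The comment in resolveSort describes undated rows ordering last in both directions, with title as a tiebreaker. A minimal in-memory sketch of that ordering rule follows, assuming a hypothetical `Doc` record in place of the real entity; it illustrates the rule with `java.util.Comparator` and is not the Spring Data `Sort` the service builds.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NullsLastSortSketch {
    // Hypothetical stand-in for the Document entity: only the two sort keys.
    record Doc(String title, LocalDate documentDate) {}

    // Dated rows follow the requested direction; undated rows (null date)
    // always sort last, and title (ascending) breaks ties deterministically.
    static Comparator<Doc> byDateNullsLast(boolean ascending) {
        Comparator<LocalDate> dateOrder = ascending
                ? Comparator.<LocalDate>naturalOrder()
                : Comparator.<LocalDate>reverseOrder();
        return Comparator.comparing(Doc::documentDate, Comparator.nullsLast(dateOrder))
                .thenComparing(Doc::title);
    }

    // Returns the titles in sorted order without mutating the input list.
    static List<String> sortedTitles(List<Doc> docs, boolean ascending) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(byDateNullsLast(ascending));
        return copy.stream().map(Doc::title).toList();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("b", null),
                new Doc("a", null),
                new Doc("old", LocalDate.of(1900, 1, 1)),
                new Doc("new", LocalDate.of(1950, 1, 1)));
        System.out.println(sortedTitles(docs, true));
        System.out.println(sortedTitles(docs, false));
    }
}
```

Wrapping the direction-specific comparator in `nullsLast` (rather than reversing a nulls-last comparator) is what keeps the undated pile at the end for both ASC and DESC.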

@@ -55,12 +55,6 @@ public class DocumentSpecifications {
return (root, query, cb) -> status == null ? null : cb.equal(root.get("status"), status);
}
// Filters for undated documents (meta_date IS NULL), used by the "Nur undatierte" triage.
// false → no predicate (no-op), true → documentDate IS NULL (issue #668).
public static Specification<Document> undatedOnly(boolean undated) {
return (root, query, cb) -> undated ? cb.isNull(root.get("documentDate")) : null;
}
/**
* Filters by pre-expanded tag ID sets with AND or OR logic.
*

@@ -11,11 +11,6 @@ import org.raddatz.familienarchiv.ocr.ScriptType;
public class DocumentUpdateDTO {
private String title;
private LocalDate documentDate;
private DatePrecision metaDatePrecision;
private LocalDate metaDateEnd;
private String metaDateRaw;
private String senderText;
private String receiverText;
private String location;
private String documentLocation;
private String archiveBox;

@@ -43,7 +43,7 @@ public class TranscriptionBlockController {
@PostMapping
@ResponseStatus(HttpStatus.CREATED)
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public TranscriptionBlock createBlock(
@PathVariable UUID documentId,
@Valid @RequestBody CreateTranscriptionBlockDTO dto,
@@ -53,7 +53,7 @@ public class TranscriptionBlockController {
}
@PutMapping("/{blockId}")
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public TranscriptionBlock updateBlock(
@PathVariable UUID documentId,
@PathVariable UUID blockId,
@@ -65,7 +65,7 @@ public class TranscriptionBlockController {
@DeleteMapping("/{blockId}")
@ResponseStatus(HttpStatus.NO_CONTENT)
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public void deleteBlock(
@PathVariable UUID documentId,
@PathVariable UUID blockId) {
@@ -73,7 +73,7 @@ public class TranscriptionBlockController {
}
@PutMapping("/reorder")
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public List<TranscriptionBlock> reorderBlocks(
@PathVariable UUID documentId,
@RequestBody ReorderTranscriptionBlocksDTO dto) {
@@ -82,7 +82,7 @@ public class TranscriptionBlockController {
}
@PutMapping("/{blockId}/review")
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public TranscriptionBlock reviewBlock(
@PathVariable UUID documentId,
@PathVariable UUID blockId,
@@ -92,7 +92,7 @@ public class TranscriptionBlockController {
}
@PutMapping("/review-all")
@RequirePermission({Permission.ANNOTATE_ALL, Permission.WRITE_ALL})
@RequirePermission(Permission.WRITE_ALL)
public List<TranscriptionBlock> markAllBlocksReviewed(
@PathVariable UUID documentId,
Authentication authentication) {

@@ -40,8 +40,6 @@ public enum ErrorCode {
// --- Import ---
/** A mass import is already in progress; only one can run at a time. 409 */
IMPORT_ALREADY_RUNNING,
/** A canonical import artifact is missing, unreadable, or missing a required header. 400 */
IMPORT_ARTIFACT_INVALID,
// --- Thumbnails ---
/** A thumbnail backfill is already in progress; only one can run at a time. 409 */

@@ -1,131 +0,0 @@
package org.raddatz.familienarchiv.importing;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import org.raddatz.familienarchiv.person.relationship.RelationType;
import org.raddatz.familienarchiv.person.relationship.RelationshipService;
import org.raddatz.familienarchiv.person.relationship.dto.NetworkDTO;
import org.raddatz.familienarchiv.person.relationship.dto.PersonNodeDTO;
import org.raddatz.familienarchiv.person.relationship.dto.RelationshipDTO;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import java.io.File;
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
/**
* Runs the four canonical loaders in their real dependency order — encoded explicitly
* here, not implied by call order — and owns the async runner plus the {@link ImportStatus}
* state machine the admin UI consumes. The orchestrator smoke-checks that all four
* artifacts are present before starting, failing fast rather than half-loading tags but no
* documents. A malformed artifact (a loader throwing) sets {@code FAILED}; an individual
* bad file is surfaced through the {@link ImportStatus.SkippedFile} mechanism instead.
*/
@Service
@RequiredArgsConstructor
@Slf4j
public class CanonicalImportOrchestrator {
private static final String TAG_TREE_ARTIFACT = "canonical-tag-tree.xlsx";
private static final String PERSONS_ARTIFACT = "canonical-persons.xlsx";
private static final String PERSONS_TREE_ARTIFACT = "canonical-persons-tree.json";
private static final String DOCUMENTS_ARTIFACT = "canonical-documents.xlsx";
private final TagTreeImporter tagTreeImporter;
private final PersonRegisterImporter personRegisterImporter;
private final PersonTreeImporter personTreeImporter;
private final DocumentImporter documentImporter;
private final RelationshipService relationshipService;
@Value("${app.import.dir:/import}")
private String canonicalDir;
private volatile ImportStatus currentStatus = new ImportStatus(
ImportStatus.State.IDLE, "IMPORT_IDLE", "Kein Import gestartet.", 0, List.of(), null);
public ImportStatus getStatus() {
return currentStatus;
}
@Async
public void runImportAsync() {
if (currentStatus.state() == ImportStatus.State.RUNNING) {
throw DomainException.conflict(ErrorCode.IMPORT_ALREADY_RUNNING, "A mass import is already in progress");
}
runImport();
}
/** Synchronous entry point — wrapped by {@link #runImportAsync()} and called directly in tests. */
void runImport() {
currentStatus = new ImportStatus(ImportStatus.State.RUNNING, "IMPORT_RUNNING",
"Import läuft...", 0, List.of(), LocalDateTime.now());
try {
File tagTree = requireArtifact(TAG_TREE_ARTIFACT);
File persons = requireArtifact(PERSONS_ARTIFACT);
File personsTree = requireArtifact(PERSONS_TREE_ARTIFACT);
File documents = requireArtifact(DOCUMENTS_ARTIFACT);
// Dependency DAG: documents need persons + tags; the tree needs persons.
tagTreeImporter.load(tagTree);
personRegisterImporter.load(persons);
personTreeImporter.load(personsTree);
warnOnGenerationMonotonicityViolations();
DocumentImporter.LoadResult result = documentImporter.load(documents);
currentStatus = new ImportStatus(ImportStatus.State.DONE, "IMPORT_DONE",
"Import abgeschlossen. " + result.processed() + " Dokumente verarbeitet.",
result.processed(), result.skippedFiles(), currentStatus.startedAt());
} catch (DomainException e) {
log.error("Canonical import failed: {}", e.getMessage());
currentStatus = new ImportStatus(ImportStatus.State.FAILED, "IMPORT_FAILED_ARTIFACT",
"Fehler: " + e.getMessage(), 0, List.of(), currentStatus.startedAt());
} catch (Exception e) {
log.error("Canonical import failed", e);
currentStatus = new ImportStatus(ImportStatus.State.FAILED, "IMPORT_FAILED_INTERNAL",
"Fehler: " + e.getMessage(), 0, List.of(), currentStatus.startedAt());
}
}
private File requireArtifact(String name) {
File artifact = new File(canonicalDir, name);
if (!artifact.isFile()) {
throw DomainException.badRequest(ErrorCode.IMPORT_ARTIFACT_INVALID,
"Missing canonical artifact: " + name);
}
return artifact;
}
/**
* Walks every PARENT_OF edge in the family graph and logs a WARN whenever a child's
* generation is not strictly deeper than its parent's. Soft check only — the import
* is never aborted; the warning is a forensic signal for the curator. Reads through
* {@link RelationshipService} so the orchestrator stays within the layering rule
* (no direct repository access).
*/
private void warnOnGenerationMonotonicityViolations() {
NetworkDTO network = relationshipService.getFamilyNetwork();
Map<UUID, PersonNodeDTO> byId = new HashMap<>(network.nodes().size());
for (PersonNodeDTO node : network.nodes()) {
byId.put(node.id(), node);
}
for (RelationshipDTO edge : network.edges()) {
if (edge.relationType() != RelationType.PARENT_OF) continue;
PersonNodeDTO parent = byId.get(edge.personId());
PersonNodeDTO child = byId.get(edge.relatedPersonId());
if (parent == null || child == null) continue;
Integer pg = parent.generation();
Integer cg = child.generation();
if (pg != null && cg != null && cg <= pg) {
log.warn("Generation monotonicity violation: parent {} (G{}) -> child {} (G{})",
parent.displayName(), pg, child.displayName(), cg);
}
}
}
}
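
The soft generation check in warnOnGenerationMonotonicityViolations can be exercised in isolation. The sketch below assumes hypothetical `Node` and `ParentEdge` records in place of PersonNodeDTO and RelationshipDTO, and collects the violation messages in a list instead of logging them; the comparison itself (`cg <= pg`, with nulls and unknown ids skipped) follows the method above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GenerationCheckSketch {
    // Hypothetical stand-ins for PersonNodeDTO and a PARENT_OF RelationshipDTO.
    record Node(String id, String name, Integer generation) {}
    record ParentEdge(String parentId, String childId) {}

    // Returns one message per edge whose child is not strictly deeper than its parent.
    // Edges with an unknown endpoint or a missing generation are skipped, not reported.
    static List<String> findViolations(List<Node> nodes, List<ParentEdge> edges) {
        Map<String, Node> byId = new HashMap<>();
        for (Node node : nodes) {
            byId.put(node.id(), node);
        }
        List<String> violations = new ArrayList<>();
        for (ParentEdge edge : edges) {
            Node parent = byId.get(edge.parentId());
            Node child = byId.get(edge.childId());
            if (parent == null || child == null) continue;
            Integer pg = parent.generation();
            Integer cg = child.generation();
            if (pg != null && cg != null && cg <= pg) {
                violations.add(parent.name() + " (G" + pg + ") -> " + child.name() + " (G" + cg + ")");
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("1", "A", 1), new Node("2", "B", 1));
        System.out.println(findViolations(nodes, List.of(new ParentEdge("1", "2"))));
    }
}
```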

@@ -1,133 +0,0 @@
package org.raddatz.familienarchiv.importing;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* Value-level POI helper for the canonical import artifacts. No Spring, no domain
* knowledge: it opens a workbook, maps the header row to column indices by name, and
* yields typed rows whose cells are looked up by header name — the seam that replaces
* the old positional {@code @Value app.import.col.*} indices. List columns are split on
* the pipe delimiter the normalizer emits.
*/
public final class CanonicalSheetReader {
private CanonicalSheetReader() {
}
/** A single data row, addressable by canonical header name (never by index). */
public static final class Row {
private final Map<String, Integer> headerIndex;
private final List<String> cells;
private Row(Map<String, Integer> headerIndex, List<String> cells) {
this.headerIndex = headerIndex;
this.cells = cells;
}
/** Trimmed cell value for the named header, or "" when absent/blank. */
public String get(String header) {
Integer index = headerIndex.get(header);
if (index == null || index >= cells.size()) return "";
String value = cells.get(index);
return value == null ? "" : value.trim();
}
}
/**
* Reads all data rows from the first sheet, validating that every required header is
* present. Throws a fail-closed {@link DomainException} on a missing header so a
* loader never silently maps the wrong column.
*/
public static List<Row> readRows(File file, List<String> requiredHeaders) {
try (FileInputStream fis = new FileInputStream(file);
Workbook workbook = WorkbookFactory.create(fis)) {
Sheet sheet = workbook.getSheetAt(0);
org.apache.poi.ss.usermodel.Row headerRow = sheet.getRow(sheet.getFirstRowNum());
Map<String, Integer> headerIndex = mapHeaders(headerRow);
requireHeaders(file, headerIndex, requiredHeaders);
List<Row> rows = new ArrayList<>();
for (int i = sheet.getFirstRowNum() + 1; i <= sheet.getLastRowNum(); i++) {
org.apache.poi.ss.usermodel.Row poiRow = sheet.getRow(i);
if (poiRow == null) continue;
rows.add(new Row(headerIndex, readCells(poiRow, headerIndex.size())));
}
return rows;
} catch (DomainException e) {
throw e;
} catch (Exception e) {
throw DomainException.badRequest(ErrorCode.IMPORT_ARTIFACT_INVALID,
"Unreadable canonical artifact: " + file.getName());
}
}
/** Splits a pipe-delimited list column into trimmed, non-empty segments. */
public static List<String> splitList(String raw) {
if (raw == null || raw.isBlank()) return List.of();
return Arrays.stream(raw.split("\\|"))
.map(String::trim)
.filter(s -> !s.isEmpty())
.toList();
}
private static Map<String, Integer> mapHeaders(org.apache.poi.ss.usermodel.Row headerRow) {
if (headerRow == null) {
return Map.of();
}
Map<String, Integer> headerIndex = new HashMap<>();
for (int c = 0; c < headerRow.getLastCellNum(); c++) {
String name = cellToString(headerRow.getCell(c)).trim();
if (!name.isEmpty()) headerIndex.putIfAbsent(name, c);
}
return headerIndex;
}
private static void requireHeaders(File file, Map<String, Integer> headerIndex, List<String> requiredHeaders) {
for (String header : requiredHeaders) {
if (!headerIndex.containsKey(header)) {
throw DomainException.badRequest(ErrorCode.IMPORT_ARTIFACT_INVALID,
"Missing required header '" + header + "' in artifact " + file.getName());
}
}
}
private static List<String> readCells(org.apache.poi.ss.usermodel.Row poiRow, int columnCount) {
int width = Math.max(columnCount, poiRow.getLastCellNum());
List<String> cells = new ArrayList<>(width);
for (int c = 0; c < width; c++) {
cells.add(cellToString(poiRow.getCell(c)));
}
return cells;
}
private static String cellToString(Cell cell) {
if (cell == null) return "";
return switch (cell.getCellType()) {
case STRING -> cell.getStringCellValue();
case NUMERIC -> {
if (DateUtil.isCellDateFormatted(cell)) {
yield cell.getLocalDateTimeCellValue().toLocalDate().toString();
}
yield String.valueOf((long) cell.getNumericCellValue());
}
case BOOLEAN -> String.valueOf(cell.getBooleanCellValue());
default -> "";
};
}
}
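
The header-name lookup and pipe splitting above do not depend on POI once the cells are strings. A minimal sketch under that assumption follows: the class name `HeaderLookupSketch` and the plain `List<String>` rows are hypothetical, while the rules (first occurrence of a header wins, absent or blank cells read as "", empty list segments are dropped) mirror mapHeaders, Row.get and splitList.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HeaderLookupSketch {
    // Maps each non-blank header name to its column index; the first occurrence wins.
    static Map<String, Integer> mapHeaders(List<String> headerCells) {
        Map<String, Integer> headerIndex = new HashMap<>();
        for (int c = 0; c < headerCells.size(); c++) {
            String raw = headerCells.get(c);
            String name = raw == null ? "" : raw.trim();
            if (!name.isEmpty()) headerIndex.putIfAbsent(name, c);
        }
        return headerIndex;
    }

    // Trimmed cell value for the named header, or "" when the header or cell is absent.
    static String get(Map<String, Integer> headerIndex, List<String> cells, String header) {
        Integer index = headerIndex.get(header);
        if (index == null || index >= cells.size()) return "";
        String value = cells.get(index);
        return value == null ? "" : value.trim();
    }

    // Splits a pipe-delimited list column into trimmed, non-empty segments.
    static List<String> splitList(String raw) {
        if (raw == null || raw.isBlank()) return List.of();
        return Arrays.stream(raw.split("\\|"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toList();
    }

    public static void main(String[] args) {
        Map<String, Integer> headers = mapHeaders(List.of("index", "receiver_names"));
        List<String> row = List.of("C-0029", "Anna | Bert");
        System.out.println(get(headers, row, "index"));
        System.out.println(splitList(get(headers, row, "receiver_names")));
    }
}
```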

@@ -1,391 +0,0 @@
package org.raddatz.familienarchiv.importing;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.raddatz.familienarchiv.document.DatePrecision;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.document.ThumbnailAsyncRunner;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonService;
import org.raddatz.familienarchiv.person.PersonType;
import org.raddatz.familienarchiv.person.PersonUpsertCommand;
import org.raddatz.familienarchiv.tag.Tag;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import org.raddatz.familienarchiv.tag.TagService;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import java.util.regex.Pattern;
/**
* Loads {@code canonical-documents.xlsx} into the document domain. Java performs no
* semantic transformation: the normalizer already resolved people to slugs and dates to
* ISO values. This loader maps columns by header name, routes each attribution
* register-first (always retaining the raw cell in {@code sender_text}/{@code receiver_text}),
* parses clean dates, and keeps the S3/thumbnail plumbing.
*
* <p>The import corpus is uniform — every PDF is named {@code <index>.pdf} flat in the import
* dir — so a document's PDF is resolved <em>directly by its index</em>:
* {@code importDir.resolve(index + ".pdf")}. The {@code index} is still hostile input
* regardless of upstream trust (CWE-22 does not care it came from our Python tool): it is
* validated against a strict catalog pattern with {@link #isValidImportIndex} (no path
* separators, no {@code .}/{@code ..}, no absolute path, no slash homoglyphs) and the
* resolved path is asserted to stay inside the import dir in {@link #resolvePdfByIndex} as
* defense-in-depth. The {@code %PDF} magic-byte check still gates upload.
*/
@Component
@RequiredArgsConstructor
@Slf4j
public class DocumentImporter {
static final List<String> REQUIRED_HEADERS = List.of(
"index", "sender_person_id", "sender_name",
"receiver_person_ids", "receiver_names", "date_iso", "date_raw", "date_precision");
// Catalog index shape: 1-4 letters (ASCII + Latin-1 letters, e.g. the German "ü" in
// "Mü-0001"), one or more hyphens (the corpus has a few "C--0029" data-entry artefacts),
// digits, and an optional trailing "x" the normalizer recognises. Anchored, with no
// separator / dot / slash characters in the class, so "<index>.pdf" can never traverse.
// NOTE: `\d` here is intentionally ASCII-only ([0-9]). Java's java.util.regex matches `\d`
// against [0-9] unless Pattern.UNICODE_CHARACTER_CLASS is set — do NOT add that flag, or
// Arabic-Indic / fullwidth digits would silently widen the accepted set.
private static final Pattern INDEX_PATTERN =
Pattern.compile("[A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF]{1,4}-+\\d+x?");
private final DocumentService documentService;
private final PersonService personService;
private final TagService tagService;
private final S3Client s3Client;
private final ThumbnailAsyncRunner thumbnailAsyncRunner;
private final FileStreamOpener fileStreamOpener;
@Value("${app.s3.bucket:familienarchiv}")
private String bucketName;
@Value("${app.import.dir:/import}")
private String importDir;
/** Outcome of loading the document sheet: processed count + per-file skips. */
public record LoadResult(int processed, List<ImportStatus.SkippedFile> skippedFiles) {}
// One transaction for the whole sheet keeps the Hibernate session open so an existing
// document's lazy receivers collection initialises during an idempotent re-import.
// Invoked cross-bean from the orchestrator, so the @Transactional proxy applies.
@Transactional
public LoadResult load(File artifact) {
List<CanonicalSheetReader.Row> rows = CanonicalSheetReader.readRows(artifact, REQUIRED_HEADERS);
int processed = 0;
List<ImportStatus.SkippedFile> skipped = new ArrayList<>();
// 1-based source row number for ops triage breadcrumbs (the spreadsheet header is row 1,
// so the first data row is row 2 — matches what an operator sees in the .xlsx).
int rowNumber = 1;
for (CanonicalSheetReader.Row row : rows) {
rowNumber++;
String index = row.get("index");
if (index.isBlank()) continue;
Optional<ImportStatus.SkipReason> skipReason = importRow(row, index, rowNumber);
if (skipReason.isPresent()) {
skipped.add(new ImportStatus.SkippedFile(index, skipReason.get()));
} else {
processed++;
}
}
log.info("Imported {} documents from {} ({} skipped)", processed, artifact.getName(), skipped.size());
return new LoadResult(processed, skipped);
}
private Optional<ImportStatus.SkipReason> importRow(CanonicalSheetReader.Row row, String index, int rowNumber) {
if (!isValidImportIndex(index)) {
// Breadcrumb is the source row number, NOT the raw (possibly-hostile) index — an
// operator triaging the import can find the offending row in the .xlsx without us
// echoing attacker-controlled input into the log.
log.warn("Skipping import row {}: index rejected (fails catalog-shape validation)", rowNumber);
return Optional.of(ImportStatus.SkipReason.INVALID_FILENAME_PATH_TRAVERSAL);
}
Optional<File> resolved = resolvePdfByIndex(index, rowNumber);
if (resolved.isEmpty()) {
// Distinct from the "index rejected" skip above: the index is VALID but no
// <index>.pdf is on disk, so the row becomes a normal PLACEHOLDER (not skipped). The
// index is a validated catalog id (no hostile content), so it is safe to log here —
// this surfaces a corpus that drifts from the "<index>.pdf" assumption (e.g. a file
// that arrived under a different name) rather than dropping it silently.
log.info("Import row {}: index {} is valid but {}.pdf is absent — creating PLACEHOLDER",
rowNumber, index, index);
} else {
try {
if (!isPdfMagicBytes(resolved.get())) {
return Optional.of(ImportStatus.SkipReason.INVALID_PDF_SIGNATURE);
}
} catch (IOException e) {
log.error("Magic-byte check failed for row {}", index, e);
return Optional.of(ImportStatus.SkipReason.FILE_READ_ERROR);
}
}
return persist(row, index, resolved);
}
private Optional<ImportStatus.SkipReason> persist(CanonicalSheetReader.Row row, String index, Optional<File> file) {
Document existing = documentService.findByOriginalFilename(index).orElse(null);
if (existing != null && existing.getStatus() != DocumentStatus.PLACEHOLDER) {
return Optional.of(ImportStatus.SkipReason.ALREADY_EXISTS);
}
String s3Key = null;
String contentType = null;
DocumentStatus status = DocumentStatus.PLACEHOLDER;
if (file.isPresent()) {
contentType = probeContentType(file.get());
s3Key = "documents/" + UUID.randomUUID() + "_" + file.get().getName();
try {
uploadToS3(file.get(), s3Key, contentType);
status = DocumentStatus.UPLOADED;
} catch (Exception e) {
log.error("S3 upload failed for {}", file.get().getName(), e);
return Optional.of(ImportStatus.SkipReason.S3_UPLOAD_FAILED);
}
}
Document doc = buildDocument(row, index, existing, s3Key, contentType, status);
Document saved = documentService.save(doc);
if (file.isPresent()) {
thumbnailAsyncRunner.dispatchAfterCommit(saved.getId());
}
return Optional.empty();
}
private Document buildDocument(CanonicalSheetReader.Row row, String index, Document existing,
String s3Key, String contentType, DocumentStatus status) {
Document doc = existing != null ? existing
: Document.builder().originalFilename(index).build();
applyAttribution(doc, row);
applyDates(doc, row);
applyAuthoritativeAssociations(doc, row);
applyFileMetadata(doc, s3Key, contentType, status, index);
applyComputedFlags(doc);
return doc;
}
// Sender + raw sender/receiver text. The raw cells are always retained verbatim, even
// when a person is linked — the load-bearing invariant behind the merge story (ADR-025).
private void applyAttribution(Document doc, CanonicalSheetReader.Row row) {
String senderName = row.get("sender_name");
String receiverNames = row.get("receiver_names");
Person sender = resolveSender(row.get("sender_person_id"), senderName);
doc.setSender(sender);
doc.setSenderText(blankToNull(senderName));
doc.setReceiverText(blankToNull(receiverNames));
}
// Date triplet + raw + location. Pure value parsing, no semantic logic.
private void applyDates(Document doc, CanonicalSheetReader.Row row) {
doc.setDocumentDate(parseIsoDate(row.get("date_iso")));
doc.setMetaDatePrecision(parsePrecision(row.get("date_precision")));
doc.setMetaDateEnd(parseIsoDate(row.get("date_end")));
doc.setMetaDateRaw(blankToNull(row.get("date_raw")));
doc.setLocation(blankToNull(row.get("location")));
doc.setSummary(blankToNull(row.get("summary")));
}
// Receivers and tags are owned by the canonical row (ADR-025): clear then re-populate so a
// shrunk set on re-import prunes stale links rather than accumulating them. The
// "preserve human edits" rule does NOT extend to these collections.
private void applyAuthoritativeAssociations(Document doc, CanonicalSheetReader.Row row) {
Set<Person> receivers = resolveReceivers(row.get("receiver_person_ids"), row.get("receiver_names"));
doc.getReceivers().clear();
doc.getReceivers().addAll(receivers);
attachTag(doc, row.get("tags"));
}
// S3 key, content type, status, and the index-derived title.
private void applyFileMetadata(Document doc, String s3Key, String contentType,
DocumentStatus status, String index) {
doc.setStatus(status);
doc.setFilePath(s3Key);
doc.setContentType(contentType);
doc.setTitle(buildTitle(index, doc.getDocumentDate(), doc.getMetaDatePrecision(),
doc.getMetaDateEnd(), doc.getMetaDateRaw(), doc.getLocation()));
}
// metadataComplete: a document counts as fully described if any of the three "who/when"
// pieces is filled. Called last so the upstream setters have already populated the doc.
private void applyComputedFlags(Document doc) {
doc.setMetadataComplete(doc.getDocumentDate() != null
|| doc.getSender() != null
|| !doc.getReceivers().isEmpty());
}
// The title carries the date at the HONEST precision (never a fabricated day) via the
// shared DocumentTitleFormatter, plus the location — kept under 20 lines by delegating.
private static String buildTitle(String index, LocalDate date, DatePrecision precision,
LocalDate end, String raw, String location) {
StringBuilder title = new StringBuilder(index);
if (date != null && precision != DatePrecision.UNKNOWN) {
title.append(" ").append(DocumentTitleFormatter.formatTitleDate(date, precision, end, raw));
}
if (location != null && !location.isBlank()) {
title.append(" ").append(location);
}
return title.toString();
}
// ─── attribution routing — register-first, always retain raw ─────────────────────
private Person resolveSender(String slug, String rawName) {
if (slug.isBlank()) return null;
return resolvePerson(slug, rawName);
}
// Zips the parallel `receiver_person_ids` and `receiver_names` columns by position so an
// unresolved receiver becomes a provisional Person whose lastName is the human name from
// `receiver_names`, not the slug. If the names list is shorter than the slugs list (rare —
// canonical data zips them 1:1), missing entries fall back to slug-as-name.
private Set<Person> resolveReceivers(String slugs, String names) {
List<String> slugList = CanonicalSheetReader.splitList(slugs);
List<String> nameList = CanonicalSheetReader.splitList(names);
Set<Person> receivers = new LinkedHashSet<>();
for (int i = 0; i < slugList.size(); i++) {
String slug = slugList.get(i);
String name = i < nameList.size() ? nameList.get(i) : slug;
receivers.add(resolvePerson(slug, name));
}
return receivers;
}
private Person resolvePerson(String slug, String rawName) {
return personService.findBySourceRef(slug)
.orElseGet(() -> personService.upsertBySourceRef(PersonUpsertCommand.builder()
.sourceRef(slug)
.lastName(blankToNull(rawName) == null ? slug : rawName)
.personType(PersonType.PERSON)
.provisional(true)
.build()));
}
// Authoritative: the canonical row defines the document's tags exactly. Clearing first
// means a tag removed from the row is pruned on re-import (ADR-025).
private void attachTag(Document doc, String tagPath) {
doc.getTags().clear();
if (tagPath.isBlank()) return;
tagService.findBySourceRef(tagPath).ifPresent(tag -> doc.getTags().add(tag));
}
// ─── clean-value parsing (no semantic logic) ─────────────────────────────────────
private static LocalDate parseIsoDate(String value) {
if (value == null || value.isBlank()) return null;
try {
return LocalDate.parse(value.trim());
} catch (DateTimeParseException e) {
return null;
}
}
private static DatePrecision parsePrecision(String value) {
if (value == null || value.isBlank()) return DatePrecision.UNKNOWN;
try {
return DatePrecision.valueOf(value.trim());
} catch (IllegalArgumentException e) {
return DatePrecision.UNKNOWN;
}
}
// ─── file handling + S3 (small ≤20-line methods) ─────────────────────────────────
private String probeContentType(File file) {
try {
String probed = Files.probeContentType(file.toPath());
return probed != null ? probed : "application/octet-stream";
} catch (IOException e) {
return "application/octet-stream";
}
}
private void uploadToS3(File file, String s3Key, String contentType) {
s3Client.putObject(PutObjectRequest.builder()
.bucket(bucketName)
.key(s3Key)
.contentType(contentType)
.build(),
RequestBody.fromFile(file));
}
// ─── index validation + containment — defense-in-depth, do not weaken ────────────
// The index is the only thing that drives the on-disk lookup, so it must never contain a
// path separator, traversal token, slash homoglyph, null byte, or absolute-path marker —
// each guard mirrors the filename guards ported from MassImportService — and it must match
// the strict catalog shape so anything unexpected is skipped loudly rather than read.
private boolean isValidImportIndex(String index) {
if (index == null || index.isBlank()) return false;
if (index.contains("/")) return false;
if (index.contains("\\")) return false;
if (index.contains("\u2215")) return false; // U+2215 DIVISION SLASH
if (index.contains("\uFF0F")) return false; // U+FF0F FULLWIDTH SOLIDUS
if (index.contains("\u29F5")) return false; // U+29F5 REVERSE SOLIDUS OPERATOR
if (index.contains(".")) return false; // no dots — "<index>.pdf" is the only extension
if (index.contains("\0")) return false;
if (Paths.get(index).isAbsolute()) return false;
return INDEX_PATTERN.matcher(index).matches();
}
private boolean isPdfMagicBytes(File file) throws IOException {
// FileStreamOpener is injected so tests can stub a throwing implementation for the
// IO-error branch without spying on the importer itself.
try (InputStream is = fileStreamOpener.open(file)) {
byte[] header = is.readNBytes(4);
return header.length == 4
&& header[0] == 0x25 // %
&& header[1] == 0x50 // P
&& header[2] == 0x44 // D
&& header[3] == 0x46; // F
}
}
// O(1) direct lookup: the PDF is exactly importDir/<index>.pdf. The caller has already
// validated the index shape; the canonical-path containment assertion below is
// defense-in-depth so even a symlinked <index>.pdf cannot read outside importDir.
private Optional<File> resolvePdfByIndex(String index, int rowNumber) {
File baseDir = new File(importDir);
File candidate = baseDir.toPath().resolve(index + ".pdf").toFile();
try {
if (!candidate.isFile()) return Optional.empty();
String baseDirCanonical = baseDir.getCanonicalPath();
if (!candidate.getCanonicalPath().startsWith(baseDirCanonical + File.separator)) {
throw DomainException.internal(ErrorCode.INTERNAL_ERROR, "Path escape detected: " + candidate);
}
return Optional.of(candidate);
} catch (IOException e) {
// Distinct from the deliberate symlink-escape abort above (which throws): canonical
// resolution itself failed (e.g. the OS rejected the path mid-resolution). We fail
// safe to a PLACEHOLDER, but never silently — log it so the asymmetry surfaces in ops.
log.warn("Canonical path resolution failed for import row {}: treating {}.pdf as absent",
rowNumber, index, e);
return Optional.empty();
}
}
private static String blankToNull(String s) {
return (s == null || s.isBlank()) ? null : s;
}
}
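Review note: the index guard and the canonical-path containment check above work as a pair, and a small standalone sketch may make the layering easier to follow. The class name `IndexGuardSketch`, the method `resolveInside`, and above all the regular expression are assumptions made for illustration; the real `INDEX_PATTERN` is not visible in this diff.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Pattern;

final class IndexGuardSketch {
    // Hypothetical catalog shape; the real INDEX_PATTERN is not shown in this diff.
    private static final Pattern INDEX_PATTERN = Pattern.compile("[A-Za-z0-9_-]{1,64}");

    // Layer 1: reject anything that could act as a separator or traversal token.
    static boolean isValidIndex(String index) {
        if (index == null || index.isBlank()) return false;
        String[] forbidden = {"/", "\\", ".", "\0", "\u2215", "\uFF0F", "\u29F5"};
        for (String token : forbidden) {
            if (index.contains(token)) return false;
        }
        return INDEX_PATTERN.matcher(index).matches();
    }

    // Layer 2: even a well-formed index must resolve to a file inside baseDir.
    static Optional<File> resolveInside(File baseDir, String index) throws IOException {
        if (!isValidIndex(index)) return Optional.empty();
        File candidate = new File(baseDir, index + ".pdf");
        if (!candidate.isFile()) return Optional.empty();
        String base = baseDir.getCanonicalPath() + File.separator;
        if (!candidate.getCanonicalPath().startsWith(base)) {
            // A symlinked <index>.pdf pointing outside baseDir ends up here.
            throw new IllegalStateException("Path escape detected: " + candidate);
        }
        return Optional.of(candidate);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("import-sketch");
        Files.write(dir.resolve("A001.pdf"), new byte[] {0x25, 0x50, 0x44, 0x46});
        System.out.println(resolveInside(dir.toFile(), "A001").isPresent());
        System.out.println(resolveInside(dir.toFile(), "../etc/passwd").isPresent());
    }
}
```

Appending `File.separator` before the `startsWith` comparison matters: without it, a sibling directory such as `/import-evil` would pass a prefix check against `/import`.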

@@ -1,112 +0,0 @@
package org.raddatz.familienarchiv.importing;
import org.raddatz.familienarchiv.document.DatePrecision;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
/**
* Produces the honest German date label baked into an import title — at exactly
* the precision the data claims, never finer. This is the Java half of the
* single source of truth shared with the frontend {@code formatDocumentDate}
* (TypeScript): both are asserted against {@code docs/date-label-fixtures.json}
* so the two implementations cannot drift (see #666).
*
* <p>Import titles are always German, so the labels here are the German
* canonical form (mirroring the {@code de} Paraglide messages used by the UI).
*/
final class DocumentTitleFormatter {
private static final DateTimeFormatter LONG = DateTimeFormatter.ofPattern("d. MMMM yyyy", Locale.GERMAN);
private static final DateTimeFormatter MONTH_YEAR = DateTimeFormatter.ofPattern("MMMM yyyy", Locale.GERMAN);
private static final DateTimeFormatter MEDIUM = DateTimeFormatter.ofPattern("d. MMM yyyy", Locale.GERMAN);
private static final DateTimeFormatter DAY_MONTH = DateTimeFormatter.ofPattern("d. MMM", Locale.GERMAN);
private static final String UNKNOWN = "Datum unbekannt";
private static final String APPROX_PREFIX = "ca.";
private static final String OPEN_RANGE_PREFIX = "ab";
private DocumentTitleFormatter() {
}
/**
* @param date the sort/filter anchor day; null for UNKNOWN rows
* @param precision descriptive precision metadata
* @param end the RANGE end day; null means an open-ended range
* @param raw the verbatim spreadsheet cell, used only to pick a season word
* @return the honest German label
*/
static String formatTitleDate(LocalDate date, DatePrecision precision, LocalDate end, String raw) {
if (precision == DatePrecision.UNKNOWN || date == null) {
return UNKNOWN;
}
return switch (precision) {
case DAY -> LONG.format(date);
case MONTH -> MONTH_YEAR.format(date);
case SEASON -> seasonLabel(date, raw);
case YEAR -> String.valueOf(date.getYear());
case APPROX -> APPROX_PREFIX + " " + date.getYear();
case RANGE -> rangeLabel(date, end);
case UNKNOWN -> UNKNOWN;
};
}
private static String seasonLabel(LocalDate date, String raw) {
Season season = seasonFromRaw(raw);
if (season == null) {
season = seasonOfMonth(date.getMonthValue());
}
return season.german + " " + date.getYear();
}
private static String rangeLabel(LocalDate start, LocalDate end) {
if (end == null) {
return OPEN_RANGE_PREFIX + " " + MEDIUM.format(start);
}
if (end.equals(start)) {
return MEDIUM.format(start);
}
if (start.getYear() != end.getYear()) {
return MEDIUM.format(start) + " " + MEDIUM.format(end);
}
if (start.getMonthValue() == end.getMonthValue()) {
return start.getDayOfMonth() + "." + MEDIUM.format(end);
}
return DAY_MONTH.format(start) + " " + MEDIUM.format(end);
}
// ─── season mapping — mirrors the normalizer's representative months ─────────────
private enum Season {
SPRING("Frühling"),
SUMMER("Sommer"),
AUTUMN("Herbst"),
WINTER("Winter");
private final String german;
Season(String german) {
this.german = german;
}
}
private static Season seasonOfMonth(int month) {
if (month >= 3 && month <= 5) return Season.SPRING;
if (month >= 6 && month <= 8) return Season.SUMMER;
if (month >= 9 && month <= 11) return Season.AUTUMN;
return Season.WINTER;
}
private static Season seasonFromRaw(String raw) {
if (raw == null || raw.isBlank()) return null;
String token = raw.trim().split("\\s+")[0].toLowerCase(Locale.GERMAN);
return switch (token) {
case "frühling", "frühjahr" -> Season.SPRING;
case "sommer" -> Season.SUMMER;
case "herbst" -> Season.AUTUMN;
case "winter" -> Season.WINTER;
default -> null;
};
}
}
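Review note: the "never finer than the data claims" rule is easiest to see by formatting one anchor date at several precisions. The sketch below is a reduced illustration, not the formatter itself: `TitleDateSketch` and its local `Precision` enum are stand-ins for `DocumentTitleFormatter` and the project's `DatePrecision`, and the SEASON and RANGE branches are left out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class TitleDateSketch {
    // Local stand-in for the project's DatePrecision enum (subset only).
    enum Precision { DAY, MONTH, YEAR, APPROX, UNKNOWN }

    private static final DateTimeFormatter LONG =
            DateTimeFormatter.ofPattern("d. MMMM yyyy", Locale.GERMAN);
    private static final DateTimeFormatter MONTH_YEAR =
            DateTimeFormatter.ofPattern("MMMM yyyy", Locale.GERMAN);

    // The anchor date always carries a full day; the precision decides how much is shown.
    static String label(LocalDate date, Precision precision) {
        if (date == null || precision == Precision.UNKNOWN) return "Datum unbekannt";
        return switch (precision) {
            case DAY -> LONG.format(date);
            case MONTH -> MONTH_YEAR.format(date);
            case YEAR -> String.valueOf(date.getYear());
            case APPROX -> "ca. " + date.getYear();
            case UNKNOWN -> "Datum unbekannt";
        };
    }

    public static void main(String[] args) {
        LocalDate anchor = LocalDate.of(1914, 8, 1);
        for (Precision p : Precision.values()) {
            System.out.println(p + " -> " + label(anchor, p));
        }
    }
}
```

With the anchor 1914-08-01 the same stored day renders as "1. August 1914", "August 1914", "1914", or "ca. 1914", depending only on the precision metadata.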

@@ -1,33 +0,0 @@
package org.raddatz.familienarchiv.importing;
import org.springframework.stereotype.Component;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
/**
* Test seam for opening a {@link File} as an {@link InputStream}. Extracted so the magic-byte
* check in {@link DocumentImporter} can be unit-tested for the IO-error branch by injecting a
* mock that throws, without needing a Mockito spy on the importer itself.
*
* <p>Production uses {@link DefaultFileStreamOpener}, a one-line delegate to
* {@code new FileInputStream(file)}.
*/
@FunctionalInterface
public interface FileStreamOpener {
/** Opens {@code file} for sequential reads. Caller closes the returned stream. */
InputStream open(File file) throws IOException;
/** Default production implementation: plain {@code FileInputStream}. */
@Component
final class DefaultFileStreamOpener implements FileStreamOpener {
@Override
public InputStream open(File file) throws IOException {
return new FileInputStream(file);
}
}
}
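Review note: because the seam is a functional interface, a test can presumably pass a lambda instead of a Mockito mock. The sketch below shows the three branches of the magic-byte check under that assumption; `StreamSeamSketch`, `Opener`, and `classify` are illustrative names, and the returned labels simply reuse the `SkipReason` names from this package.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamSeamSketch {
    // Same shape as FileStreamOpener: one method that may throw IOException.
    @FunctionalInterface
    interface Opener {
        InputStream open(File file) throws IOException;
    }

    static boolean isPdfMagicBytes(File file, Opener opener) throws IOException {
        try (InputStream is = opener.open(file)) {
            byte[] header = is.readNBytes(4);
            return header.length == 4
                    && header[0] == 0x25 && header[1] == 0x50   // % P
                    && header[2] == 0x44 && header[3] == 0x46;  // D F
        }
    }

    // Demo helper: maps each branch to a label instead of propagating the exception.
    static String classify(File file, Opener opener) {
        try {
            return isPdfMagicBytes(file, opener) ? "PDF" : "INVALID_PDF_SIGNATURE";
        } catch (IOException e) {
            return "FILE_READ_ERROR";
        }
    }

    public static void main(String[] args) {
        File ignored = new File("unused.pdf"); // never touched: the stubs supply the bytes
        Opener pdf = f -> new ByteArrayInputStream("%PDF-1.7".getBytes(StandardCharsets.US_ASCII));
        Opener text = f -> new ByteArrayInputStream("hello".getBytes(StandardCharsets.US_ASCII));
        Opener broken = f -> { throw new IOException("simulated read failure"); };
        System.out.println(classify(ignored, pdf));
        System.out.println(classify(ignored, text));
        System.out.println(classify(ignored, broken));
    }
}
```

None of the stubs touches the file system, which is the point of the seam: the IO-error branch becomes reachable without a real unreadable file.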

@@ -1,50 +0,0 @@
package org.raddatz.familienarchiv.importing;
import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.swagger.v3.oas.annotations.media.Schema;
import java.time.LocalDateTime;
import java.util.List;
/**
* Async import state surfaced to {@code admin/system/ImportStatusCard.svelte} via the
* generated types. The shape ({@code state, statusCode, processed, skippedFiles, skipped})
* is kept verbatim from the retired MassImportService so the admin UI keeps working.
*/
public record ImportStatus(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) State state,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) String statusCode,
@JsonIgnore String message,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) int processed,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) List<SkippedFile> skippedFiles,
LocalDateTime startedAt
) {
public enum State { IDLE, RUNNING, DONE, FAILED }
public enum SkipReason {
INVALID_FILENAME_PATH_TRAVERSAL,
INVALID_PDF_SIGNATURE,
FILE_READ_ERROR,
ALREADY_EXISTS,
S3_UPLOAD_FAILED
}
public record SkippedFile(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) String filename,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) SkipReason reason
) {}
// Note: @Schema on a record accessor method is not picked up by SpringDoc; the
// "skipped" count is a computed convenience field derived from skippedFiles.size().
@JsonProperty("skipped")
public int skipped() {
return skippedFiles.size();
}
/** Defensive-copy constructor — callers cannot mutate the stored list after construction. */
public ImportStatus {
skippedFiles = List.copyOf(skippedFiles);
}
}
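Review note: the compact constructor with `List.copyOf` is what makes a published status a true snapshot. A reduced sketch of that behaviour follows; `StatusSnapshotSketch`, `Status`, and `Skipped` are simplified stand-ins for the records above, with the Jackson and Swagger annotations left out.

```java
import java.util.ArrayList;
import java.util.List;

final class StatusSnapshotSketch {
    record Skipped(String filename, String reason) {}

    record Status(String state, int processed, List<Skipped> skippedFiles) {
        // Compact canonical constructor: reassigning the parameter replaces the stored value.
        Status {
            skippedFiles = List.copyOf(skippedFiles);
        }

        // Derived from the list, so the count cannot disagree with it.
        int skipped() {
            return skippedFiles.size();
        }
    }

    public static void main(String[] args) {
        List<Skipped> working = new ArrayList<>();
        working.add(new Skipped("A001.pdf", "INVALID_PDF_SIGNATURE"));
        Status snapshot = new Status("DONE", 3, working);

        // The importer keeps appending to its working list after publishing the snapshot.
        working.add(new Skipped("A002.pdf", "FILE_READ_ERROR"));

        System.out.println(snapshot.skipped());      // still 1: the snapshot holds its own copy
        try {
            snapshot.skippedFiles().add(new Skipped("A003.pdf", "ALREADY_EXISTS"));
        } catch (UnsupportedOperationException e) {
            System.out.println("snapshot list is unmodifiable");
        }
    }
}
```

`List.copyOf` also rejects null elements and a null list, so a status can never be constructed with a missing `skippedFiles`.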

@@ -0,0 +1,472 @@
package org.raddatz.familienarchiv.importing;
import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.swagger.v3.oas.annotations.media.Schema;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.poi.ss.usermodel.*;
import java.util.Objects;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.document.ThumbnailAsyncRunner;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.tag.Tag;
import org.raddatz.familienarchiv.person.PersonNameParser;
import org.raddatz.familienarchiv.person.PersonService;
import org.raddatz.familienarchiv.tag.TagService;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.UUID;
import java.util.stream.Stream;
import java.util.zip.ZipFile;
@Service
@RequiredArgsConstructor
@Slf4j
public class MassImportService {
public enum State { IDLE, RUNNING, DONE, FAILED }
public record SkippedFile(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) String filename,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) String reason
) {}
public record ImportStatus(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) State state,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) String statusCode,
@JsonIgnore String message,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) int processed,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) List<SkippedFile> skippedFiles,
LocalDateTime startedAt
) {
// Note: @Schema on a record accessor method is not picked up by SpringDoc; the
// "skipped" count is a computed convenience field derived from skippedFiles.size().
@JsonProperty("skipped")
public int skipped() { return skippedFiles.size(); }
/** Defensive-copy constructor — callers cannot mutate the stored list after construction. */
public ImportStatus {
skippedFiles = List.copyOf(skippedFiles);
}
}
record ProcessResult(int processed, List<SkippedFile> skippedFiles) {}
private volatile ImportStatus currentStatus = new ImportStatus(State.IDLE, "IMPORT_IDLE", "Kein Import gestartet.", 0, List.of(), null);
public ImportStatus getStatus() {
return currentStatus;
}
private final DocumentService documentService;
private final PersonService personService;
private final TagService tagService;
private final S3Client s3Client;
private final ThumbnailAsyncRunner thumbnailAsyncRunner;
@Value("${app.s3.bucket}")
private String bucketName;
@Value("${app.import.col.index:0}")
private int colIndex;
@Value("${app.import.col.box:1}")
private int colBox;
@Value("${app.import.col.folder:2}")
private int colFolder;
@Value("${app.import.col.sender:3}")
private int colSender;
@Value("${app.import.col.receivers:5}")
private int colReceivers;
@Value("${app.import.col.date:7}")
private int colDate;
@Value("${app.import.col.location:9}")
private int colLocation;
@Value("${app.import.col.tags:10}")
private int colTags;
@Value("${app.import.col.summary:11}")
private int colSummary;
@Value("${app.import.col.transcription:13}")
private int colTranscription;
@Value("${app.import.dir:/import}")
private String importDir;
private static final DateTimeFormatter GERMAN_DATE = DateTimeFormatter.ofPattern("d. MMMM yyyy", Locale.GERMAN);
// ODS XML namespaces
private static final String NS_TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
private static final String NS_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
// We only need up to this many columns; caps repeated-empty-cell expansion
private static final int MAX_COLS = 20;
@Async
public void runImportAsync() {
if (currentStatus.state() == State.RUNNING) {
throw DomainException.conflict(ErrorCode.IMPORT_ALREADY_RUNNING, "A mass import is already in progress");
}
currentStatus = new ImportStatus(State.RUNNING, "IMPORT_RUNNING", "Import läuft...", 0, List.of(), LocalDateTime.now());
try {
File spreadsheet = findSpreadsheetFile();
log.info("Starte Massenimport aus: {}", spreadsheet.getAbsolutePath());
ProcessResult result = processRows(readSpreadsheet(spreadsheet));
currentStatus = new ImportStatus(State.DONE, "IMPORT_DONE",
"Import abgeschlossen. " + result.processed() + " Dokumente verarbeitet.",
result.processed(), result.skippedFiles(), currentStatus.startedAt());
} catch (NoSpreadsheetException e) {
log.error("Massenimport fehlgeschlagen: keine Tabellendatei", e);
currentStatus = new ImportStatus(State.FAILED, "IMPORT_FAILED_NO_SPREADSHEET",
"Fehler: " + e.getMessage(), 0, List.of(), currentStatus.startedAt());
} catch (Exception e) {
log.error("Massenimport fehlgeschlagen", e);
currentStatus = new ImportStatus(State.FAILED, "IMPORT_FAILED_INTERNAL",
"Fehler: " + e.getMessage(), 0, List.of(), currentStatus.startedAt());
}
}
private static class NoSpreadsheetException extends RuntimeException {
NoSpreadsheetException(String message) { super(message); }
}
private File findSpreadsheetFile() throws IOException {
try (Stream<Path> files = Files.list(Paths.get(importDir))) {
return files
.filter(p -> {
String name = p.toString().toLowerCase();
return name.endsWith(".ods") || name.endsWith(".xlsx") || name.endsWith(".xls");
})
.findFirst()
.orElseThrow(() -> new NoSpreadsheetException(
"Keine Tabellendatei (.ods/.xlsx/.xls) in " + importDir + " gefunden!"))
.toFile();
}
}
// --- Spreadsheet reading (format-specific, produces neutral List<List<String>>) ---
private List<List<String>> readSpreadsheet(File file) throws Exception {
String name = file.getName().toLowerCase();
if (name.endsWith(".ods")) {
return readOds(file);
}
return readXlsx(file);
}
/**
* Reads an ODS file by parsing its content.xml directly (no extra library needed).
* ODS is a ZIP archive; content.xml holds the spreadsheet data as XML.
*/
List<List<String>> readOds(File file) throws Exception {
List<List<String>> result = new ArrayList<>();
try (ZipFile zip = new ZipFile(file)) {
var entry = zip.getEntry("content.xml");
if (entry == null) throw new RuntimeException("Ungültige ODS-Datei: content.xml fehlt");
var factory = XxeSafeXmlParser.hardenedFactory();
factory.setNamespaceAware(true);
var builder = factory.newDocumentBuilder();
var doc = builder.parse(zip.getInputStream(entry));
NodeList tables = doc.getElementsByTagNameNS(NS_TABLE, "table");
if (tables.getLength() == 0) return result;
var table = (Element) tables.item(0);
NodeList rows = table.getElementsByTagNameNS(NS_TABLE, "table-row");
for (int i = 0; i < rows.getLength(); i++) {
var row = (Element) rows.item(i);
List<String> rowData = new ArrayList<>();
NodeList cells = row.getElementsByTagNameNS(NS_TABLE, "table-cell");
for (int j = 0; j < cells.getLength() && rowData.size() < MAX_COLS; j++) {
var cell = (Element) cells.item(j);
// Read the display text (first <text:p>)
String value = "";
NodeList textNodes = cell.getElementsByTagNameNS(NS_TEXT, "p");
if (textNodes.getLength() > 0) {
value = textNodes.item(0).getTextContent().trim();
}
// Expand number-columns-repeated (capped at MAX_COLS)
String repeatAttr = cell.getAttributeNS(NS_TABLE, "number-columns-repeated");
int repeat = repeatAttr.isEmpty() ? 1 : Integer.parseInt(repeatAttr);
repeat = Math.min(repeat, MAX_COLS - rowData.size());
for (int r = 0; r < repeat; r++) {
rowData.add(value);
}
}
result.add(rowData);
}
}
return result;
}
/** Reads an XLSX/XLS file using Apache POI. Converts all cells to strings. */
private List<List<String>> readXlsx(File file) throws Exception {
List<List<String>> result = new ArrayList<>();
try (FileInputStream fis = new FileInputStream(file);
Workbook workbook = WorkbookFactory.create(fis)) {
Sheet sheet = workbook.getSheetAt(0);
for (int i = 0; i <= sheet.getLastRowNum(); i++) {
Row row = sheet.getRow(i);
List<String> rowData = new ArrayList<>();
if (row != null) {
for (int j = 0; j < MAX_COLS; j++) {
rowData.add(xlsxCellToString(row.getCell(j)));
}
}
result.add(rowData);
}
}
return result;
}
private String xlsxCellToString(Cell cell) {
if (cell == null) return "";
return switch (cell.getCellType()) {
case STRING -> cell.getStringCellValue();
case NUMERIC -> {
if (DateUtil.isCellDateFormatted(cell)) {
yield cell.getLocalDateTimeCellValue().toLocalDate().toString(); // ISO
}
yield String.valueOf((int) cell.getNumericCellValue());
}
case BOOLEAN -> String.valueOf(cell.getBooleanCellValue());
default -> "";
};
}
// --- Import logic (works on neutral List<String> rows) ---
private ProcessResult processRows(List<List<String>> rows) {
int processed = 0;
List<SkippedFile> skippedFiles = new ArrayList<>();
for (int i = 1; i < rows.size(); i++) { // skip header row
List<String> cells = rows.get(i);
String index = getCell(cells, colIndex);
if (index.isBlank()) continue;
String filename = index.contains(".") ? index : index + ".pdf";
Optional<File> fileOnDisk = findFileRecursive(filename);
if (fileOnDisk.isEmpty()) {
log.warn("Datei nicht gefunden, importiere nur Metadaten: {}", filename);
}
if (fileOnDisk.isPresent()) {
try {
if (!isPdfMagicBytes(fileOnDisk.get())) {
log.warn("Überspringe {}: Datei beginnt nicht mit %PDF-Signatur", filename);
skippedFiles.add(new SkippedFile(filename, "INVALID_PDF_SIGNATURE"));
continue;
}
} catch (IOException e) {
log.error("Fehler beim Prüfen der Magic-Bytes für {}", filename, e);
skippedFiles.add(new SkippedFile(filename, "FILE_READ_ERROR"));
continue;
}
}
Optional<String> skipReason = importSingleDocument(cells, fileOnDisk, filename, index);
if (skipReason.isPresent()) {
skippedFiles.add(new SkippedFile(filename, skipReason.get()));
} else {
processed++;
}
}
return new ProcessResult(processed, skippedFiles);
}
// package-private: Mockito spy in tests can override to inject IOException
InputStream openFileStream(File file) throws IOException {
return new FileInputStream(file);
}
private boolean isPdfMagicBytes(File file) throws IOException {
try (InputStream is = openFileStream(file)) {
byte[] header = is.readNBytes(4);
return header.length == 4
&& header[0] == 0x25 // %
&& header[1] == 0x50 // P
&& header[2] == 0x44 // D
&& header[3] == 0x46; // F
}
}
/**
* Imports a single document row.
*
* @return empty Optional on success; an Optional containing the skip reason on failure/skip.
*/
@Transactional
protected Optional<String> importSingleDocument(List<String> cells, Optional<File> file, String originalFilename, String index) {
Optional<Document> existing = documentService.findByOriginalFilename(originalFilename);
if (existing.isPresent() && existing.get().getStatus() != DocumentStatus.PLACEHOLDER) {
log.info("Dokument {} existiert bereits, überspringe.", originalFilename);
return Optional.of("ALREADY_EXISTS");
}
String archiveBox = getCell(cells, colBox);
String archiveFolder = getCell(cells, colFolder);
String senderRaw = getCell(cells, colSender);
String receiversRaw = getCell(cells, colReceivers);
LocalDate date = parseDate(getCell(cells, colDate));
String location = getCell(cells, colLocation);
String tagRaw = getCell(cells, colTags);
String summary = getCell(cells, colSummary);
String transcription = getCell(cells, colTranscription);
String s3Key = null;
String contentType = null;
DocumentStatus status = DocumentStatus.PLACEHOLDER;
if (file.isPresent()) {
try {
contentType = Files.probeContentType(file.get().toPath());
} catch (IOException e) {
contentType = null;
}
if (contentType == null) contentType = "application/octet-stream";
s3Key = "documents/" + UUID.randomUUID() + "_" + file.get().getName();
try {
s3Client.putObject(PutObjectRequest.builder()
.bucket(bucketName)
.key(s3Key)
.contentType(contentType)
.build(),
RequestBody.fromFile(file.get()));
status = DocumentStatus.UPLOADED;
} catch (Exception e) {
log.error("S3 Upload Fehler für {}", file.get().getName(), e);
return Optional.of("S3_UPLOAD_FAILED");
}
}
Person sender = senderRaw.isBlank() ? null : findOrCreatePerson(senderRaw);
List<Person> receivers = PersonNameParser.parseReceivers(receiversRaw).stream()
.map(this::findOrCreatePerson)
.filter(Objects::nonNull)
.toList();
Tag tag = null;
if (!tagRaw.isBlank()) {
tag = tagService.findOrCreate(tagRaw);
}
Document doc = existing.orElse(Document.builder()
.originalFilename(originalFilename)
.build());
// Heuristic: mark as complete if at least one key field is present in the spreadsheet row
boolean metadataComplete = date != null || !senderRaw.isBlank() || !receiversRaw.isBlank();
doc.setTitle(buildTitle(index, date, location));
doc.setFilePath(s3Key);
doc.setContentType(contentType);
doc.setStatus(status);
doc.setArchiveBox(archiveBox.isBlank() ? null : archiveBox);
doc.setArchiveFolder(archiveFolder.isBlank() ? null : archiveFolder);
doc.setDocumentDate(date);
doc.setLocation(location.isBlank() ? null : location);
doc.setSummary(summary.isBlank() ? null : summary);
doc.setTranscription(transcription.isBlank() ? null : transcription);
doc.setSender(sender);
doc.getReceivers().addAll(receivers);
if (tag != null) doc.getTags().add(tag);
doc.setMetadataComplete(metadataComplete);
Document saved = documentService.save(doc);
if (file.isPresent()) {
thumbnailAsyncRunner.dispatchAfterCommit(saved.getId());
}
log.info("Importiert{}: {}", file.isEmpty() ? " (nur Metadaten)" : "", originalFilename);
return Optional.empty();
}
// --- Helpers ---
private String getCell(List<String> cells, int col) {
if (col >= cells.size()) return "";
String val = cells.get(col);
return val == null ? "" : val.trim();
}
private LocalDate parseDate(String value) {
if (value == null || value.isBlank()) return null;
try {
return LocalDate.parse(value.trim());
} catch (DateTimeParseException e) {
return null;
}
}
private String buildTitle(String index, LocalDate date, String location) {
StringBuilder sb = new StringBuilder(index);
if (date != null) {
sb.append(" \u2013 ").append(date.format(GERMAN_DATE));
}
if (location != null && !location.isBlank()) {
sb.append(" \u2013 ").append(location);
}
return sb.toString();
}
private Person findOrCreatePerson(String rawName) {
return personService.findOrCreateByAlias(rawName);
}
private Optional<File> findFileRecursive(String filename) {
try (Stream<Path> walk = Files.walk(Paths.get(importDir))) {
return walk.filter(p -> !Files.isDirectory(p))
.filter(p -> p.getFileName().toString().equals(filename))
.map(Path::toFile)
.findFirst();
} catch (IOException e) {
return Optional.empty();
}
}
}
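Review note: the `number-columns-repeated` handling in `readOds` is the part most likely to surprise, since a single trailing cell can claim a thousand columns. The sketch below runs the same expansion on one hand-written row. `OdsRowSketch`, `readFirstRow`, and the sample XML are made up for illustration, and the `disallow-doctype-decl` feature is only a stand-in for the project's `XxeSafeXmlParser`, whose actual settings are not shown in this diff.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class OdsRowSketch {
    private static final String NS_TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    private static final String NS_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    private static final int MAX_COLS = 20;

    // One row: a value, two repeated blanks, a value, then 1000 repeated blanks.
    static final String SAMPLE =
            "<table:table-row xmlns:table=\"" + NS_TABLE + "\" xmlns:text=\"" + NS_TEXT + "\">"
            + "<table:table-cell><text:p>A001</text:p></table:table-cell>"
            + "<table:table-cell table:number-columns-repeated=\"2\"/>"
            + "<table:table-cell><text:p>1914-08-01</text:p></table:table-cell>"
            + "<table:table-cell table:number-columns-repeated=\"1000\"/>"
            + "</table:table-row>";

    static List<String> readFirstRow(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Assumption: stand-in hardening, not the project's XxeSafeXmlParser.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            var doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            var row = (Element) doc.getElementsByTagNameNS(NS_TABLE, "table-row").item(0);
            NodeList cells = row.getElementsByTagNameNS(NS_TABLE, "table-cell");
            List<String> rowData = new ArrayList<>();
            for (int j = 0; j < cells.getLength() && rowData.size() < MAX_COLS; j++) {
                var cell = (Element) cells.item(j);
                NodeList text = cell.getElementsByTagNameNS(NS_TEXT, "p");
                String value = text.getLength() > 0 ? text.item(0).getTextContent().trim() : "";
                String repeatAttr = cell.getAttributeNS(NS_TABLE, "number-columns-repeated");
                int repeat = repeatAttr.isEmpty() ? 1 : Integer.parseInt(repeatAttr);
                // Cap the expansion so a huge repeat count cannot blow up the row.
                repeat = Math.min(repeat, MAX_COLS - rowData.size());
                for (int r = 0; r < repeat; r++) rowData.add(value);
            }
            return rowData;
        } catch (Exception e) {
            throw new IllegalArgumentException("Unreadable row XML", e);
        }
    }

    public static void main(String[] args) {
        List<String> row = readFirstRow(SAMPLE);
        System.out.println(row.size() + " columns, first = " + row.get(0) + ", fourth = " + row.get(3));
    }
}
```

The repeated blanks keep later columns at their configured positions (the date lands in column 3 here), while the cap keeps the row at `MAX_COLS` entries no matter what the file claims.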

@@ -1,99 +0,0 @@
package org.raddatz.familienarchiv.importing;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.raddatz.familienarchiv.person.PersonGeneration;
import org.raddatz.familienarchiv.person.PersonService;
import org.raddatz.familienarchiv.person.PersonType;
import org.raddatz.familienarchiv.person.PersonUpsertCommand;
import org.springframework.stereotype.Component;
import java.io.File;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Loads {@code canonical-persons.xlsx} (the register) into the person domain via
* {@link PersonService}, upserting each person by the normalizer {@code person_id}
* (source_ref). Register persons are confident identities, so {@code provisional} is
* driven by the sheet's already-clean value (normally {@code False}).
*/
@Component
@RequiredArgsConstructor
@Slf4j
public class PersonRegisterImporter {
static final List<String> REQUIRED_HEADERS = List.of("person_id", "last_name", "first_name", "provisional");
// Matches a leading optional G then a signed integer. Anchored at the
// start so noise can't slip in before the number, but tolerant of trailing
// commentary cells (e.g. "G 2 de Gruyter") since curated rows sometimes
// carry an inline note. Out-of-range values are caught by the post-parse
// range guard, not by the regex.
private static final Pattern GENERATION_PATTERN = Pattern.compile("^\\s*G?\\s*(-?\\d+)");
private final PersonService personService;
public int load(File artifact) {
List<CanonicalSheetReader.Row> rows = CanonicalSheetReader.readRows(artifact, REQUIRED_HEADERS);
int processed = 0;
for (CanonicalSheetReader.Row row : rows) {
String personId = row.get("person_id");
if (personId.isBlank()) continue;
personService.upsertBySourceRef(toCommand(row, personId));
processed++;
}
log.info("Imported {} register persons from {}", processed, artifact.getName());
return processed;
}
private PersonUpsertCommand toCommand(CanonicalSheetReader.Row row, String personId) {
return PersonUpsertCommand.builder()
.sourceRef(personId)
.lastName(blankToNull(row.get("last_name")))
.firstName(blankToNull(row.get("first_name")))
.maidenName(blankToNull(row.get("maiden_name")))
.notes(blankToNull(row.get("notes")))
.birthYear(yearOf(row.get("birth_date")))
.deathYear(yearOf(row.get("death_date")))
.generation(parseGeneration(row.get("generation"), personId))
.personType(PersonType.PERSON)
.provisional(Boolean.parseBoolean(row.get("provisional")))
.build();
}
/**
* Parses an optional {@code G n} generation cell. Returns null for blanks,
* non-matching strings, and any value outside the {@link PersonGeneration}
* bounds (mirroring the V70 CHECK). Out-of-range values log a WARN but
* never abort the batch — REQ-IMP-001.
*/
static Integer parseGeneration(String raw, String personId) {
if (raw == null || raw.isBlank()) return null;
Matcher m = GENERATION_PATTERN.matcher(raw);
if (!m.find()) return null;
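int parsed;
try {
parsed = Integer.parseInt(m.group(1));
} catch (NumberFormatException e) {
// A digit run too long for an int must not abort the batch either.
log.warn("Skipping unparseable generation '{}' for person {}", raw, personId);
return null;
}
if (parsed < PersonGeneration.MIN_GENERATION || parsed > PersonGeneration.MAX_GENERATION) {
log.warn("Skipping out-of-range generation '{}' for person {}", raw, personId);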
return null;
}
log.debug("Parsed generation '{}' for person {}", raw, personId);
return parsed;
}
private static Integer yearOf(String isoDate) {
if (isoDate == null || isoDate.isBlank()) return null;
try {
return LocalDate.parse(isoDate.trim()).getYear();
} catch (DateTimeParseException e) {
return null;
}
}
private static String blankToNull(String s) {
return (s == null || s.isBlank()) ? null : s;
}
}
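Review note: the generation regex is anchored at the start but deliberately open at the end, which is worth seeing on a few inputs. The sketch below is an illustration under assumptions: `GenerationParseSketch` is a made-up name, and the bounds -5 and 10 are hypothetical, since the real limits live in `PersonGeneration` and are not part of this diff. It also guards `Integer.parseInt` against digit runs that overflow an `int`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GenerationParseSketch {
    // Same pattern as the importer: optional leading G, then a signed integer.
    private static final Pattern GENERATION_PATTERN = Pattern.compile("^\\s*G?\\s*(-?\\d+)");

    // Hypothetical bounds; the real values live in PersonGeneration.
    static final int MIN_GENERATION = -5;
    static final int MAX_GENERATION = 10;

    static Integer parseGeneration(String raw) {
        if (raw == null || raw.isBlank()) return null;
        Matcher m = GENERATION_PATTERN.matcher(raw);
        if (!m.find()) return null;
        int parsed;
        try {
            parsed = Integer.parseInt(m.group(1));
        } catch (NumberFormatException e) {
            return null; // digit run too long for an int
        }
        if (parsed < MIN_GENERATION || parsed > MAX_GENERATION) return null;
        return parsed;
    }

    public static void main(String[] args) {
        String[] samples = {"G 2 de Gruyter", "3", "-1", "de Gruyter", "G 99", "99999999999", ""};
        for (String sample : samples) {
            System.out.println("'" + sample + "' -> " + parseGeneration(sample));
        }
    }
}
```

Trailing commentary is tolerated ("G 2 de Gruyter" yields 2), while leading noise is not ("de Gruyter" yields null), which matches the intent described in the pattern's comment.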

@@ -1,153 +0,0 @@
package org.raddatz.familienarchiv.importing;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonGeneration;
import org.raddatz.familienarchiv.person.PersonService;
import org.raddatz.familienarchiv.person.PersonType;
import org.raddatz.familienarchiv.person.PersonUpsertCommand;
import org.raddatz.familienarchiv.person.relationship.RelationType;
import org.raddatz.familienarchiv.person.relationship.RelationshipService;
import org.raddatz.familienarchiv.person.relationship.dto.CreateRelationshipRequest;
import org.springframework.stereotype.Component;
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
/**
* Loads {@code canonical-persons-tree.json} into the person + relationship domains.
* Tree persons are upserted via {@link PersonService} keyed on the shared
* {@code personId} slug (which Phase 1 #670 now emits into the tree), so they reconcile
* with the register rather than duplicating it. Relationships reference persons by the
* tree's local {@code rowId}; each side is mapped to the upserted person's UUID and
* created through {@link RelationshipService} (never the relationship repository —
* layering rule). A duplicate relationship on re-import is swallowed for idempotency.
*/
@Component
@RequiredArgsConstructor
@Slf4j
public class PersonTreeImporter {
// The tree JSON is a local implementation detail, not a shared API payload, so the
// importer owns its own mapper rather than depending on the web ObjectMapper bean.
private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
private final PersonService personService;
private final RelationshipService relationshipService;
public int load(File artifact) {
JsonNode root = readTree(artifact);
Map<String, UUID> idByRowId = upsertPersons(root.path("persons"));
int relationships = createRelationships(root.path("relationships"), idByRowId);
log.info("Imported {} tree persons and {} relationships from {}",
idByRowId.size(), relationships, artifact.getName());
return idByRowId.size();
}
private JsonNode readTree(File artifact) {
try {
return OBJECT_MAPPER.readTree(artifact);
} catch (Exception e) {
throw DomainException.badRequest(ErrorCode.IMPORT_ARTIFACT_INVALID,
"Unreadable canonical artifact: " + artifact.getName());
}
}
private Map<String, UUID> upsertPersons(JsonNode persons) {
Map<String, UUID> idByRowId = new HashMap<>();
for (JsonNode node : persons) {
String personId = text(node, "personId");
if (personId.isBlank()) continue;
Person person = personService.upsertBySourceRef(toCommand(node, personId));
idByRowId.put(text(node, "rowId"), person.getId());
}
return idByRowId;
}
private PersonUpsertCommand toCommand(JsonNode node, String personId) {
return PersonUpsertCommand.builder()
.sourceRef(personId)
.lastName(blankToNull(text(node, "lastName")))
.firstName(blankToNull(text(node, "firstName")))
.maidenName(blankToNull(text(node, "maidenName")))
.notes(blankToNull(text(node, "notes")))
.birthYear(intOrNull(node, "birthYear"))
.deathYear(intOrNull(node, "deathYear"))
.generation(generationOrNull(node, personId))
.familyMember(node.path("familyMember").asBoolean(false))
.personType(PersonType.PERSON)
.provisional(false)
.build();
}
/**
* Returns the JSON {@code generation} value if present and within the
* {@link PersonGeneration} bounds; null otherwise. Out-of-range values
* log a WARN but never abort the batch — mirrors the register-importer
* skip-and-warn policy.
*/
private static Integer generationOrNull(JsonNode node, String personId) {
Integer raw = intOrNull(node, "generation");
if (raw == null) return null;
if (raw < PersonGeneration.MIN_GENERATION || raw > PersonGeneration.MAX_GENERATION) {
log.warn("Skipping out-of-range generation '{}' for person {}", raw, personId);
return null;
}
return raw;
}
private int createRelationships(JsonNode relationships, Map<String, UUID> idByRowId) {
int created = 0;
for (JsonNode node : relationships) {
// Trap: a relationship node's personId / relatedPersonId fields carry the tree's
// local rowId (e.g. "row_a"), NOT a person slug. They are resolved through
// idByRowId to the upserted person's UUID.
UUID person = idByRowId.get(text(node, "personId"));
UUID related = idByRowId.get(text(node, "relatedPersonId"));
if (person == null || related == null) {
log.warn("Skipping tree relationship with unresolved rowId: {} -> {}",
text(node, "personId"), text(node, "relatedPersonId"));
continue;
}
if (addRelationshipIdempotently(person, related, text(node, "type"))) {
created++;
}
}
return created;
}
private boolean addRelationshipIdempotently(UUID person, UUID related, String type) {
try {
relationshipService.addRelationship(person,
new CreateRelationshipRequest(related, RelationType.valueOf(type), null, null, null));
return true;
} catch (DomainException e) {
if (e.getCode() == ErrorCode.DUPLICATE_RELATIONSHIP
|| e.getCode() == ErrorCode.CIRCULAR_RELATIONSHIP) {
return false;
}
throw e;
}
}
private static String text(JsonNode node, String field) {
JsonNode value = node.get(field);
return value == null || value.isNull() ? "" : value.asText();
}
private static Integer intOrNull(JsonNode node, String field) {
JsonNode value = node.get(field);
return value == null || value.isNull() ? null : value.asInt();
}
private static String blankToNull(String s) {
return (s == null || s.isBlank()) ? null : s;
}
}
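Review note: the rowId trap described in `createRelationships` is easy to miss, so here is a minimal, self-contained sketch of the same two-pass idea. `RowIdResolutionSketch` and the `String[]` edge shape are hypothetical stand-ins for illustration, not types from this codebase; the only behaviour taken from the diff is "resolve through `idByRowId`, skip when either side is unresolved".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Illustrative only: this class and the String[] edge shape are assumptions,
// not part of the importer shown in the diff.
final class RowIdResolutionSketch {

    /**
     * Resolves {fromRowId, toRowId} pairs to UUID pairs through the map that
     * the person upsert pass produced. An edge whose rowId is unknown is
     * skipped instead of aborting the whole batch.
     */
    static List<UUID[]> resolve(List<String[]> edges, Map<String, UUID> idByRowId) {
        List<UUID[]> resolved = new ArrayList<>();
        for (String[] edge : edges) {
            UUID person = idByRowId.get(edge[0]);
            UUID related = idByRowId.get(edge[1]);
            if (person == null || related == null) {
                continue; // unresolved rowId: skip this edge, keep going
            }
            resolved.add(new UUID[] {person, related});
        }
        return resolved;
    }
}
```

The point of the sketch is that the relationship pass never looks at person slugs at all; everything goes through the map built in the first pass.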

@@ -1,54 +0,0 @@
package org.raddatz.familienarchiv.importing;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.raddatz.familienarchiv.tag.Tag;
import org.raddatz.familienarchiv.tag.TagService;
import org.springframework.stereotype.Component;
import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
/**
* Loads {@code canonical-tag-tree.xlsx} into the tag domain via {@link TagService},
* upserting each tag by its canonical {@code tag_path} (the source_ref). Parent links are
* resolved by the parent's path, which is the child path with its last {@code /segment}
* stripped. Rows are emitted parents-first by the normalizer, so a parent is always
* resolved before any child references it.
*/
@Component
@RequiredArgsConstructor
@Slf4j
public class TagTreeImporter {
static final List<String> REQUIRED_HEADERS = List.of("tag_path", "parent_name", "tag_name");
private static final String PATH_SEPARATOR = "/";
private final TagService tagService;
public int load(File artifact) {
List<CanonicalSheetReader.Row> rows = CanonicalSheetReader.readRows(artifact, REQUIRED_HEADERS);
Map<String, UUID> idByPath = new HashMap<>();
int processed = 0;
for (CanonicalSheetReader.Row row : rows) {
String path = row.get("tag_path");
if (path.isBlank()) continue;
UUID parentId = resolveParentId(path, idByPath);
Tag tag = tagService.upsertBySourceRef(path, row.get("tag_name"), parentId);
idByPath.put(path, tag.getId());
processed++;
}
log.info("Imported {} tags from {}", processed, artifact.getName());
return processed;
}
private UUID resolveParentId(String path, Map<String, UUID> idByPath) {
int lastSeparator = path.lastIndexOf(PATH_SEPARATOR);
if (lastSeparator < 0) return null;
String parentPath = path.substring(0, lastSeparator);
return idByPath.get(parentPath);
}
}
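Review note: `resolveParentId` relies on the normalizer emitting rows parents-first; if a child arrives before its parent, `idByPath.get(parentPath)` returns null and the tag is silently stored as a root. The sketch below restates the path-stripping rule and adds a small ordering check. `TagPathSketch` and `isParentsFirst` are hypothetical names introduced for illustration only.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: TagPathSketch is an assumption, not a class in the diff.
final class TagPathSketch {

    /** Returns the path with its last "/segment" stripped, or null for a root path. */
    static String parentPath(String path) {
        int lastSeparator = path.lastIndexOf('/');
        return lastSeparator < 0 ? null : path.substring(0, lastSeparator);
    }

    /** True when every non-root path appears after its parent path in the list. */
    static boolean isParentsFirst(List<String> paths) {
        Set<String> seen = new HashSet<>();
        for (String path : paths) {
            String parent = parentPath(path);
            if (parent != null && !seen.contains(parent)) {
                return false; // child listed before its parent
            }
            seen.add(path);
        }
        return true;
    }
}
```

A check of this kind could run before the upsert loop if the ordering guarantee of the artifact is ever in doubt.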

@@ -0,0 +1,20 @@
package org.raddatz.familienarchiv.importing;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
class XxeSafeXmlParser {
private XxeSafeXmlParser() {}
static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
var factory = DocumentBuilderFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);
return factory;
}
}
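Review note: a short sketch of what the hardened factory buys in practice, assuming the JDK's built-in JAXP parser. With `disallow-doctype-decl` enabled, a document that carries any DOCTYPE is rejected at parse time, which closes off entity-based XXE payloads before they are expanded. `XxeHardeningSketch` and `parses` are hypothetical helpers for illustration; only the feature URI and the two setters come from the diff.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Illustrative only: this helper is an assumption, not the class in the diff.
final class XxeHardeningSketch {

    /** Returns true if the XML parses with a hardened factory, false if the parser rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Same switch as in the diff: refuse any DOCTYPE declaration outright.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.setXIncludeAware(false);
            factory.setExpandEntityReferences(false);
            factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // rejected by the parser, e.g. because of a DOCTYPE
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A plain document such as `<root><a/></root>` should pass, while one that starts with `<!DOCTYPE root [...]>` should be refused.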

@@ -52,30 +52,11 @@ public class Person {
private Integer birthYear;
private Integer deathYear;
// Hand-curated generation index from canonical-persons.xlsx (G 0 = oldest).
// Nullable for persons outside the curated family graph. Drives the
// Stammbaum strict-rank seed (see #689) and re-import preserves human
// edits via PersonService.preferHuman (ADR-025).
@Column(name = "generation")
private Integer generation;
@Column(name = "family_member", nullable = false)
@Builder.Default
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
private boolean familyMember = false;
// The normalizer person_id — join key and re-import idempotency key. Null for manually
// created persons; unique among non-null values (see ADR-025).
@Column(name = "source_ref")
private String sourceRef;
// A provisional person is one the importer inferred but could not confidently identify.
// Distinct from familyMember (a genealogical fact); set true only by the importer (Phase 3).
@Column(name = "provisional", nullable = false)
@Builder.Default
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
private boolean provisional = false;
// Entity-graph navigation for JPA JOIN queries (e.g. DocumentSpecifications.hasText).
// Uses entity relationship rather than cross-domain repository access, avoiding a
// separate DB roundtrip while respecting domain boundaries.

@@ -22,15 +22,12 @@ import org.springframework.web.bind.annotation.*;
import org.springframework.web.server.ResponseStatusException;
import jakarta.validation.Valid;
import jakarta.validation.constraints.Max;
import jakarta.validation.constraints.Min;
import lombok.RequiredArgsConstructor;
@RestController
@RequestMapping("/api/persons")
@RequiredArgsConstructor
@Validated
public class PersonController {
private final PersonService personService;
@@ -38,37 +35,15 @@ public class PersonController {
@GetMapping
@RequirePermission(Permission.READ_ALL)
public ResponseEntity<PersonSearchResult> getPersons(
public ResponseEntity<List<PersonSummaryDTO>> getPersons(
@RequestParam(required = false) String q,
@RequestParam(required = false) PersonType type,
@RequestParam(required = false) Boolean familyOnly,
@RequestParam(required = false) Boolean hasDocuments,
@RequestParam(required = false) Boolean provisional,
// review=true reveals the import noise (transcriber view); absent/false keeps the
// clean reader default (familyMember OR documentCount > 0). The explicit filters AND
// within whichever base the review flag selects.
@RequestParam(required = false, defaultValue = "false") boolean review,
@RequestParam(required = false) String sort,
@RequestParam(defaultValue = "0") @Min(0) int page,
@RequestParam(defaultValue = "50") @Min(1) @Max(100) int size) {
// Legacy top-N-by-document-count path (reader dashboard): preserved, wrapped in the
// same envelope so /api/persons always returns one shape. It is explicitly NON-paged —
// the top-N query returns the complete result, so PersonSearchResult.topN reports an
// honest totalElements (= returned count) instead of pretending to be a page slice.
if ("documentCount".equals(sort) && q == null) {
@RequestParam(required = false, defaultValue = "0") int size,
@RequestParam(required = false) String sort) {
if ("documentCount".equals(sort) && size > 0 && q == null) {
int safeSize = Math.min(size, 50);
List<PersonSummaryDTO> top = personService.findTopByDocumentCount(safeSize);
return ResponseEntity.ok(PersonSearchResult.topN(top));
return ResponseEntity.ok(personService.findTopByDocumentCount(safeSize));
}
PersonFilter filter = PersonFilter.builder()
.type(type)
.familyOnly(familyOnly)
.hasDocuments(hasDocuments)
.provisional(provisional)
.readerDefault(!review)
.build();
return ResponseEntity.ok(personService.search(filter, page, size, q));
return ResponseEntity.ok(personService.findAll(q));
}
@GetMapping("/{id}")
@@ -135,21 +110,6 @@ public class PersonController {
personService.mergePersons(id, UUID.fromString(targetIdStr));
}
// Dedicated state transition that clears the provisional flag. A separate verb (not a
// mass-assignable DTO field) so provisional can never be smuggled in via create/update.
@PatchMapping("/{id}/confirm")
@RequirePermission(Permission.WRITE_ALL)
public ResponseEntity<Person> confirmPerson(@PathVariable UUID id) {
return ResponseEntity.ok(personService.confirmPerson(id));
}
@DeleteMapping("/{id}")
@ResponseStatus(HttpStatus.NO_CONTENT)
@RequirePermission(Permission.WRITE_ALL)
public void deletePerson(@PathVariable UUID id) {
personService.deletePerson(id);
}
// ─── Alias endpoints ────────────────────────────────────────────────────
@GetMapping("/{id}/aliases")

@@ -1,36 +0,0 @@
package org.raddatz.familienarchiv.person;
import lombok.Builder;
/**
* The reader/triage filter set for the persons directory, threaded as one value through
* {@code PersonController -> PersonService -> PersonRepository}. Each field is nullable:
* null means "do not constrain on this dimension".
*
* <ul>
* <li>{@code type} — restrict to a single {@link PersonType}.</li>
* <li>{@code familyOnly} — when true, only {@code familyMember} persons.</li>
* <li>{@code hasDocuments} — when true, only persons with documentCount &gt; 0.</li>
* <li>{@code provisional} — match the {@code Person.provisional} flag exactly.</li>
* <li>{@code readerDefault} — when true, restrict to {@code familyMember OR documentCount > 0}
* (the clean reader view). The explicit filters above AND with this restriction.</li>
* </ul>
*/
@Builder
public record PersonFilter(
PersonType type,
Boolean familyOnly,
Boolean hasDocuments,
Boolean provisional,
boolean readerDefault
) {
/** The unconstrained "show all" filter (transcriber view, no reader restriction). */
public static PersonFilter showAll() {
return PersonFilter.builder().readerDefault(false).build();
}
/** The clean reader default: familyMember OR documentCount &gt; 0, no other constraints. */
public static PersonFilter cleanDefault() {
return PersonFilter.builder().readerDefault(true).build();
}
}
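Review note: the filter semantics in the Javadoc above ("null means do not constrain", explicit filters AND with the reader default) can be restated as a plain in-memory predicate. This is a sketch for reasoning about the rules, not the repository query; `PersonFilterSketch.matches` is a hypothetical helper, and it deliberately leaves out the `type` and free-text dimensions.

```java
// Illustrative only: an in-memory mirror of the documented filter rules.
// The real filtering happens in the native SQL WHERE clause, not in Java.
final class PersonFilterSketch {

    static boolean matches(Boolean familyOnly, Boolean hasDocuments, Boolean provisional,
                           boolean readerDefault,
                           boolean personFamilyMember, boolean personProvisional,
                           long documentCount) {
        // null or false: no constraint; true: family members only
        if (Boolean.TRUE.equals(familyOnly) && !personFamilyMember) return false;
        // null or false: no constraint; true: at least one document
        if (Boolean.TRUE.equals(hasDocuments) && documentCount <= 0) return false;
        // null: no constraint; otherwise the flag must match exactly
        if (provisional != null && provisional.booleanValue() != personProvisional) return false;
        // clean reader view: family member OR has documents
        if (readerDefault && !(personFamilyMember || documentCount > 0)) return false;
        return true;
    }
}
```

Note that `familyOnly=false` and `hasDocuments=false` behave like null here, which matches the `:param = FALSE OR :param IS NULL OR ...` idiom in `FILTER_WHERE`, whereas `provisional=false` is a real constraint.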

@@ -1,16 +0,0 @@
package org.raddatz.familienarchiv.person;
/**
* Single source of truth for the {@code persons.generation} value range.
* The DB CHECK in V70, the {@code PersonUpdateDTO} Bean Validation annotations,
* and the canonical importers all reference these constants so a future widening
* (e.g. accepting {@code G 1} ancestors) happens in one place. Mirror this file
* by hand in the V70 migration comment when adjusting bounds.
*/
public final class PersonGeneration {
public static final int MIN_GENERATION = 0;
public static final int MAX_GENERATION = 10;
private PersonGeneration() {}
}

@@ -32,9 +32,6 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
// Lookup by full alias string, used during ODS mass import
Optional<Person> findByAliasIgnoreCase(String alias);
// Lookup by the normalizer person_id, used for idempotent canonical re-import (Phase 3).
Optional<Person> findBySourceRef(String sourceRef);
// Exact first+last name match, used for filename-based sender lookup
Optional<Person> findByFirstNameIgnoreCaseAndLastNameIgnoreCase(String firstName, String lastName);
@@ -44,7 +41,7 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
SELECT p.id, p.title, p.first_name AS firstName, p.last_name AS lastName,
p.person_type AS personType,
p.alias, p.birth_year AS birthYear, p.death_year AS deathYear, p.notes,
p.family_member AS familyMember, p.provisional AS provisional,
p.family_member AS familyMember,
(SELECT COUNT(*) FROM documents d WHERE d.sender_id = p.id)
+ (SELECT COUNT(*) FROM document_receivers dr WHERE dr.person_id = p.id) AS documentCount
FROM persons p
@@ -57,7 +54,7 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
SELECT p.id, p.title, p.first_name AS firstName, p.last_name AS lastName,
p.person_type AS personType,
p.alias, p.birth_year AS birthYear, p.death_year AS deathYear, p.notes,
p.family_member AS familyMember, p.provisional AS provisional,
p.family_member AS familyMember,
(SELECT COUNT(*) FROM documents d WHERE d.sender_id = p.id)
+ (SELECT COUNT(*) FROM document_receivers dr WHERE dr.person_id = p.id) AS documentCount
FROM persons p
@@ -66,7 +63,7 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
OR LOWER(CONCAT(p.last_name,' ',COALESCE(p.first_name,''))) LIKE LOWER(CONCAT('%',:query,'%'))
OR LOWER(p.alias) LIKE LOWER(CONCAT('%',:query,'%'))
OR LOWER(a.last_name) LIKE LOWER(CONCAT('%',:query,'%'))
GROUP BY p.id, p.title, p.first_name, p.last_name, p.person_type, p.alias, p.birth_year, p.death_year, p.notes, p.family_member, p.provisional
GROUP BY p.id, p.title, p.first_name, p.last_name, p.person_type, p.alias, p.birth_year, p.death_year, p.notes, p.family_member
ORDER BY p.last_name ASC, p.first_name ASC
""",
nativeQuery = true)
@@ -78,7 +75,7 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
SELECT p.id, p.title, p.first_name AS firstName, p.last_name AS lastName,
p.person_type AS personType,
p.alias, p.birth_year AS birthYear, p.death_year AS deathYear, p.notes,
p.family_member AS familyMember, p.provisional AS provisional,
p.family_member AS familyMember,
(SELECT COUNT(*) FROM documents d WHERE d.sender_id = p.id)
+ (SELECT COUNT(*) FROM document_receivers dr WHERE dr.person_id = p.id) AS documentCount
FROM persons p
@@ -88,61 +85,6 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
nativeQuery = true)
List<PersonSummaryDTO> findTopByDocumentCount(@Param("limit") int limit);
// --- #667: filter-aware paged directory ---
//
// The slice query and the count query below MUST keep an IDENTICAL WHERE clause so the
// rendered page and totalElements can never drift. Every filter is nullable: a null param
// disables that predicate via the `:param IS NULL OR …` idiom. `readerDefault` (a plain
// boolean) restricts to "familyMember OR has documents"; the explicit filters AND on top.
// documentCount is recomputed inline (not via the SELECT alias) because WHERE cannot
// reference a computed alias. All params are named — no string concatenation, no injection.
String FILTER_WHERE = """
WHERE (CAST(:type AS text) IS NULL OR p.person_type = CAST(:type AS text))
AND (:familyOnly = FALSE OR :familyOnly IS NULL OR p.family_member = TRUE)
AND (:hasDocuments = FALSE OR :hasDocuments IS NULL OR (
(SELECT COUNT(*) FROM documents d WHERE d.sender_id = p.id)
+ (SELECT COUNT(*) FROM document_receivers dr WHERE dr.person_id = p.id)) > 0)
AND (:provisional IS NULL OR p.provisional = :provisional)
AND (:readerDefault = FALSE OR (
p.family_member = TRUE OR (
(SELECT COUNT(*) FROM documents d WHERE d.sender_id = p.id)
+ (SELECT COUNT(*) FROM document_receivers dr WHERE dr.person_id = p.id)) > 0))
AND (CAST(:query AS text) IS NULL OR
LOWER(CONCAT(COALESCE(p.first_name,''),' ',p.last_name)) LIKE LOWER(CONCAT('%',CAST(:query AS text),'%'))
OR LOWER(CONCAT(p.last_name,' ',COALESCE(p.first_name,''))) LIKE LOWER(CONCAT('%',CAST(:query AS text),'%'))
OR LOWER(p.alias) LIKE LOWER(CONCAT('%',CAST(:query AS text),'%')))
""";
@Query(value = """
SELECT p.id, p.title, p.first_name AS firstName, p.last_name AS lastName,
p.person_type AS personType,
p.alias, p.birth_year AS birthYear, p.death_year AS deathYear, p.notes,
p.family_member AS familyMember, p.provisional AS provisional,
(SELECT COUNT(*) FROM documents d WHERE d.sender_id = p.id)
+ (SELECT COUNT(*) FROM document_receivers dr WHERE dr.person_id = p.id) AS documentCount
FROM persons p
""" + FILTER_WHERE + """
ORDER BY p.last_name ASC, p.first_name ASC
LIMIT :limit OFFSET :offset
""",
nativeQuery = true)
List<PersonSummaryDTO> findByFilter(@Param("type") String type,
@Param("familyOnly") Boolean familyOnly,
@Param("hasDocuments") Boolean hasDocuments,
@Param("provisional") Boolean provisional,
@Param("readerDefault") boolean readerDefault,
@Param("query") String query,
@Param("limit") int limit,
@Param("offset") int offset);
@Query(value = "SELECT COUNT(*) FROM persons p " + FILTER_WHERE, nativeQuery = true)
long countByFilter(@Param("type") String type,
@Param("familyOnly") Boolean familyOnly,
@Param("hasDocuments") Boolean hasDocuments,
@Param("provisional") Boolean provisional,
@Param("readerDefault") boolean readerDefault,
@Param("query") String query);
// --- Correspondent queries ---
@Query(value = """
@@ -194,12 +136,6 @@ public interface PersonRepository extends JpaRepository<Person, UUID> {
@Query(value = "UPDATE documents SET sender_id = :target WHERE sender_id = :source", nativeQuery = true)
void reassignSender(@Param("source") UUID source, @Param("target") UUID target);
// Used by deletePerson: detach a deleted person from documents they sent, so the hard
// delete cannot orphan a documents.sender_id FK (the column is nullable).
@Modifying
@Query(value = "UPDATE documents SET sender_id = NULL WHERE sender_id = :source", nativeQuery = true)
void reassignSenderToNull(@Param("source") UUID source);
@Modifying
@Query(value = """
INSERT INTO document_receivers (document_id, person_id)

@@ -1,50 +0,0 @@
package org.raddatz.familienarchiv.person;
import io.swagger.v3.oas.annotations.media.Schema;
import java.util.List;
/**
* Paged result for the /api/persons list endpoint.
*
* <p>Hand-written to mirror {@code document/DocumentSearchResult} field-for-field so the
* frontend sees one paged shape across the app. Deliberately NOT Spring {@code Page<T>}
* (unstable serialized shape across Spring versions, noisy in OpenAPI) and deliberately
* NOT a reuse of the document DTO (would couple two feature modules — duplication beats
* coupling here).
*/
public record PersonSearchResult(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
List<PersonSummaryDTO> items,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
long totalElements,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
int pageNumber,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
int pageSize,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED)
int totalPages
) {
/**
* Paged factory: derives {@code totalPages} from the full match count and the page size.
* A zero count yields zero pages so the frontend hides the pagination control.
*/
public static PersonSearchResult paged(List<PersonSummaryDTO> slice, int pageNumber, int pageSize, long totalElements) {
int totalPages = pageSize == 0 ? 0 : (int) ((totalElements + pageSize - 1) / pageSize);
return new PersonSearchResult(slice, totalElements, pageNumber, pageSize, totalPages);
}
/**
* Non-paged factory for the legacy {@code sort=documentCount} top-N dashboard path.
* That query returns the <em>complete</em> result in one shot — there is no further page
* to fetch — so the envelope reports reality rather than pretending to be a slice of a
* larger set: {@code totalElements} equals the number of rows actually returned,
* {@code pageSize} equals that same count, and {@code totalPages} is 1 (or 0 when empty).
* This avoids the earlier ambiguity where {@code totalElements} looked like a paged total.
*/
public static PersonSearchResult topN(List<PersonSummaryDTO> all) {
int count = all.size();
int totalPages = count == 0 ? 0 : 1;
return new PersonSearchResult(all, count, 0, count, totalPages);
}
}
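Review note: the paging arithmetic in `paged` is a ceiling division, and the matching offset in `PersonService.search` is `page * size`. A worked sketch with a few boundary values follows; `PageMathSketch` is a hypothetical name, and the two formulas are copied from the diff.

```java
// Illustrative only: isolates the two formulas so the boundaries are easy to check.
final class PageMathSketch {

    /** Ceiling of totalElements / pageSize; a zero page size yields zero pages. */
    static int totalPages(long totalElements, int pageSize) {
        return pageSize == 0 ? 0 : (int) ((totalElements + pageSize - 1) / pageSize);
    }

    /** Zero-based page index to row offset. */
    static int offset(int page, int size) {
        return page * size;
    }
}
```

With a page size of 50, 50 matches fit on one page, 51 need two, and 101 need three; page index 2 starts at row offset 100.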

@@ -31,55 +31,20 @@ public class PersonService {
private final PersonRepository personRepository;
private final PersonNameAliasRepository aliasRepository;
public List<PersonSummaryDTO> findAll(String q) {
if (q == null) {
return personRepository.findAllWithDocumentCount();
}
if (q.isBlank()) {
return List.of();
}
return personRepository.searchWithDocumentCount(q.trim());
}
public List<PersonSummaryDTO> findTopByDocumentCount(int limit) {
return personRepository.findTopByDocumentCount(limit);
}
/**
* Filtered, paginated directory query. The slice and the total are derived from one
* shared WHERE clause (see {@link PersonRepository#FILTER_WHERE}) so totalElements can
* never drift from the rendered page. {@code type} is passed as the enum name because the
* native query compares against the string column.
*/
public PersonSearchResult search(PersonFilter filter, int page, int size, String q) {
String type = filter.type() == null ? null : filter.type().name();
String query = (q == null || q.isBlank()) ? null : q.trim();
int offset = page * size;
List<PersonSummaryDTO> items = personRepository.findByFilter(
type, filter.familyOnly(), filter.hasDocuments(), filter.provisional(),
filter.readerDefault(), query, size, offset);
long total = personRepository.countByFilter(
type, filter.familyOnly(), filter.hasDocuments(), filter.provisional(),
filter.readerDefault(), query);
return PersonSearchResult.paged(items, page, size, total);
}
/**
* Clears the {@code provisional} flag — a deliberate state transition exposed as
* {@code PATCH /api/persons/{id}/confirm}, never as a mass-assignable DTO field (CWE-915).
*/
@Transactional
public Person confirmPerson(UUID id) {
Person person = getById(id);
person.setProvisional(false);
return personRepository.save(person);
}
/**
* Hard-deletes a person used by triage. Detaches the person from any documents they
* sent (nulls sender_id) and from any received-document references first, so the delete
* cannot orphan an FK and fail with a 500.
*/
@Transactional
public void deletePerson(UUID id) {
getById(id);
personRepository.reassignSenderToNull(id);
personRepository.deleteReceiverReferences(id);
personRepository.deleteById(id);
}
public Person getById(UUID id) {
return personRepository.findById(id)
.orElseThrow(() -> DomainException.notFound(ErrorCode.PERSON_NOT_FOUND, "Person not found: " + id));
@@ -115,11 +80,6 @@ public class PersonService {
return personRepository.findByFirstNameIgnoreCaseAndLastNameIgnoreCase(firstName, lastName);
}
/** Lookup by the normalizer person_id — used by the canonical importer for register-first matching. */
public Optional<Person> findBySourceRef(String sourceRef) {
return personRepository.findBySourceRef(sourceRef);
}
@Nullable
@Transactional
public Person findOrCreateByAlias(String rawName) {
@@ -155,82 +115,6 @@ public class PersonService {
});
}
/**
* Idempotent upsert keyed on {@code sourceRef} (the normalizer person_id) for the
* canonical importer (Phase 3, ADR-025). On first import the canonical fields are
* written verbatim. On re-import the human-edit-preserve precedence applies:
* a non-blank existing field is never overwritten, and {@code provisional} never
* flips back to true once a human has confirmed the person.
*/
@Transactional
public Person upsertBySourceRef(PersonUpsertCommand cmd) {
return personRepository.findBySourceRef(cmd.sourceRef())
.map(existing -> personRepository.save(mergeCanonical(existing, cmd)))
.orElseGet(() -> fromCanonical(cmd));
}
private Person fromCanonical(PersonUpsertCommand cmd) {
Person person = personRepository.save(Person.builder()
.sourceRef(cmd.sourceRef())
.firstName(blankToNull(cmd.firstName()))
.lastName(cmd.lastName())
.notes(blankToNull(cmd.notes()))
.birthYear(cmd.birthYear())
.deathYear(cmd.deathYear())
.generation(cmd.generation())
.familyMember(cmd.familyMember())
.personType(cmd.personType() == null ? PersonType.PERSON : cmd.personType())
.provisional(cmd.provisional())
.build());
String maiden = blankToNull(cmd.maidenName());
if (maiden != null) {
int nextSortOrder = aliasRepository.findMaxSortOrder(person.getId()) + 1;
aliasRepository.save(PersonNameAlias.builder()
.person(person)
.lastName(maiden)
.type(PersonNameAliasType.MAIDEN_NAME)
.sortOrder(nextSortOrder)
.build());
}
return person;
}
private Person mergeCanonical(Person existing, PersonUpsertCommand cmd) {
existing.setFirstName(preferHuman(existing.getFirstName(), cmd.firstName()));
existing.setLastName(preferHuman(existing.getLastName(), cmd.lastName()));
existing.setNotes(preferHuman(existing.getNotes(), cmd.notes()));
existing.setBirthYear(preferHuman(existing.getBirthYear(), cmd.birthYear()));
existing.setDeathYear(preferHuman(existing.getDeathYear(), cmd.deathYear()));
existing.setGeneration(preferHuman(existing.getGeneration(), cmd.generation()));
if (cmd.personType() != null && existing.getPersonType() == PersonType.PERSON) {
existing.setPersonType(cmd.personType());
}
// provisional is monotonic-downward: once it is false it never reverts to true.
// This also pins the cross-loader precedence (ADR-025): a register/tree person is
// loaded before documents and already false, so a later document row that references
// the same source_ref (provisional=true) can never flip it provisional — the guard
// below only fires while existing is still provisional. Order of document rows is
// therefore irrelevant.
if (existing.isProvisional()) {
existing.setProvisional(cmd.provisional());
}
return existing;
}
// preferHuman keeps an existing human-entered value and only falls back to the canonical
// value when the existing one is absent — the single idiom for every fill-blank field.
private static String preferHuman(String existing, String canonical) {
return (existing == null || existing.isBlank()) ? blankToNull(canonical) : existing;
}
private static Integer preferHuman(Integer existing, Integer canonical) {
return existing != null ? existing : canonical;
}
private static String blankToNull(String s) {
return (s == null || s.isBlank()) ? null : s.trim();
}
@Transactional
public Person createPerson(String firstName, String lastName, String alias) {
Person person = Person.builder()
@@ -256,7 +140,6 @@ public class PersonService {
.notes(dto.getNotes() == null || dto.getNotes().isBlank() ? null : dto.getNotes().trim())
.birthYear(dto.getBirthYear())
.deathYear(dto.getDeathYear())
.generation(dto.getGeneration())
.build();
return personRepository.save(person);
}
@@ -289,9 +172,6 @@ public class PersonService {
person.setNotes(dto.getNotes() == null || dto.getNotes().isBlank() ? null : dto.getNotes().trim());
person.setBirthYear(dto.getBirthYear());
person.setDeathYear(dto.getDeathYear());
// Form path: a human can clear generation back to null. Unlike the importer
// which routes through preferHuman, we write the DTO value verbatim.
person.setGeneration(dto.getGeneration());
return personRepository.save(person);
}

@@ -18,7 +18,6 @@ public interface PersonSummaryDTO {
Integer getDeathYear();
String getNotes();
boolean isFamilyMember();
boolean isProvisional();
long getDocumentCount();
default String getDisplayName() {

@@ -1,7 +1,5 @@
package org.raddatz.familienarchiv.person;
import jakarta.validation.constraints.Max;
import jakarta.validation.constraints.Min;
import jakarta.validation.constraints.NotNull;
import jakarta.validation.constraints.Size;
import lombok.Data;
@@ -23,9 +21,4 @@ public class PersonUpdateDTO {
private String notes;
private Integer birthYear;
private Integer deathYear;
// Mirror of the persons.generation CHECK constraint (V70). Bounds live in
// PersonGeneration so DB, DTO, and importer all read from one place.
@Min(PersonGeneration.MIN_GENERATION)
@Max(PersonGeneration.MAX_GENERATION)
private Integer generation;
}

@@ -1,25 +0,0 @@
package org.raddatz.familienarchiv.person;
import lombok.Builder;
/**
* Importer → {@link PersonService} command for an idempotent upsert keyed on
* {@code sourceRef} (the normalizer's stable person_id). Carries only the canonical
* fields the importer owns; the service applies the human-edit-preserve precedence
* (see ADR-025): non-blank existing fields are never overwritten, and {@code provisional}
* never flips back to true once a human has confirmed a person.
*/
@Builder
public record PersonUpsertCommand(
String sourceRef,
String firstName,
String lastName,
String maidenName,
String notes,
Integer birthYear,
Integer deathYear,
Integer generation,
boolean familyMember,
PersonType personType,
boolean provisional
) {}
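Review note: the human-edit-preserve precedence that this command relies on comes down to two small rules, restated here as a standalone sketch. `MergePrecedenceSketch` and `mergeProvisional` are hypothetical names; the logic follows `preferHuman` and the `provisional` guard in `PersonService.mergeCanonical` as shown in this diff.

```java
// Illustrative only: the merge rules in isolation, without JPA entities.
final class MergePrecedenceSketch {

    /** Keeps a non-blank existing value; otherwise falls back to the trimmed canonical value. */
    static String preferHuman(String existing, String canonical) {
        if (existing != null && !existing.isBlank()) {
            return existing;
        }
        return (canonical == null || canonical.isBlank()) ? null : canonical.trim();
    }

    /** Keeps a present existing value; otherwise takes the canonical one. */
    static Integer preferHuman(Integer existing, Integer canonical) {
        return existing != null ? existing : canonical;
    }

    /** provisional only ever moves from true to false, never back. */
    static boolean mergeProvisional(boolean existing, boolean incoming) {
        return existing ? incoming : false;
    }
}
```

The consequence for re-imports is that a field a human has filled in survives every later canonical load, and a confirmed person stays confirmed regardless of the order in which rows arrive.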

@@ -96,8 +96,7 @@ public class RelationshipInferenceService {
if (p == null) continue;
List<RelationToken> path = shortestPaths.get(id);
PersonNodeDTO node = new PersonNodeDTO(
p.getId(), p.getDisplayName(), p.getBirthYear(), p.getDeathYear(),
p.getGeneration(), p.isFamilyMember());
p.getId(), p.getDisplayName(), p.getBirthYear(), p.getDeathYear(), p.isFamilyMember());
out.add(new InferredRelationshipWithPersonDTO(node, labelFor(path), path.size()));
}
out.sort(Comparator.comparingInt(InferredRelationshipWithPersonDTO::hops)

@@ -31,12 +31,6 @@ import java.util.UUID;
@RequiredArgsConstructor
public class RelationshipService {
// Single source of truth for which relationship types are part of the family graph.
// Consulted by addRelationship (to set family_member on both endpoints) and by
// getFamilyNetwork (to filter the edges returned). FRIEND/COLLEAGUE/etc. are excluded.
private static final List<RelationType> FAMILY_RELATION_TYPES =
List.of(RelationType.PARENT_OF, RelationType.SPOUSE_OF, RelationType.SIBLING_OF);
private final PersonRelationshipRepository relationshipRepository;
private final PersonService personService;
private final RelationshipInferenceService inferenceService;
@@ -66,12 +60,11 @@ public class RelationshipService {
for (Person p : familyMembers) {
familyIds.add(p.getId());
nodes.add(new PersonNodeDTO(
p.getId(), p.getDisplayName(), p.getBirthYear(), p.getDeathYear(),
p.getGeneration(), true));
p.getId(), p.getDisplayName(), p.getBirthYear(), p.getDeathYear(), true));
}
List<PersonRelationship> familyEdges = relationshipRepository.findAllByRelationTypeIn(
FAMILY_RELATION_TYPES);
List.of(RelationType.PARENT_OF, RelationType.SPOUSE_OF, RelationType.SIBLING_OF));
List<RelationshipDTO> edges = new ArrayList<>();
for (PersonRelationship r : familyEdges) {
@@ -112,23 +105,15 @@ public class RelationshipService {
.notes(blankToNull(dto.notes()))
.build();
PersonRelationship saved;
try {
// saveAndFlush so the unique_rel constraint violates synchronously and is
// caught here, not at commit time outside the @Transactional boundary.
saved = relationshipRepository.saveAndFlush(rel);
return toDTO(relationshipRepository.saveAndFlush(rel));
} catch (DataIntegrityViolationException e) {
throw DomainException.conflict(
ErrorCode.DUPLICATE_RELATIONSHIP,
"Relationship already exists for (" + personId + ", " + relatedPerson.getId() + ", " + dto.relationType() + ")");
}
// Family-graph edges imply both endpoints are family members. Idempotent: the
// setter is a no-op when the person is already flagged, so re-imports stay clean.
if (FAMILY_RELATION_TYPES.contains(dto.relationType())) {
personService.setFamilyMember(person.getId(), true);
personService.setFamilyMember(relatedPerson.getId(), true);
}
return toDTO(saved);
}
@Transactional

@@ -10,6 +10,5 @@ public record PersonNodeDTO(
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) String displayName,
Integer birthYear,
Integer deathYear,
Integer generation,
@Schema(requiredMode = Schema.RequiredMode.REQUIRED) boolean familyMember
) {}

@@ -30,11 +30,4 @@ public class Tag {
/** Color token name (e.g. "sage"), only set on root-level tags. Null means no color. */
private String color;
/**
* Import identity key, keyed on the canonical tag_path. Null for manually created tags;
* unique among non-null values. The importer (Phase 3) uses it for idempotent re-import.
*/
@Column(name = "source_ref")
private String sourceRef;
}

@@ -22,9 +22,6 @@ public interface TagRepository extends JpaRepository<Tag, UUID> {
Optional<Tag> findByNameIgnoreCase(String name);
// Lookup by the canonical tag_path, used for idempotent canonical re-import (Phase 3).
Optional<Tag> findBySourceRef(String sourceRef);
List<Tag> findByNameContainingIgnoreCase(String name);
/**

@@ -7,7 +7,6 @@ import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import java.util.stream.Collectors;
@@ -50,37 +49,12 @@ public class TagService {
.orElseThrow(() -> DomainException.notFound(ErrorCode.TAG_NOT_FOUND, "Tag not found: " + id));
}
/** Lookup by the canonical tag_path — used by the canonical importer to attach a document's tag. */
public Optional<Tag> findBySourceRef(String sourceRef) {
return tagRepository.findBySourceRef(sourceRef);
}
public Tag findOrCreate(String name) {
String cleanName = name.trim();
return tagRepository.findByNameIgnoreCase(cleanName)
.orElseGet(() -> tagRepository.save(Tag.builder().name(cleanName).build()));
}
/**
* Idempotent upsert keyed on {@code sourceRef} (the canonical tag_path) for the
* Phase-3 importer (ADR-025). On first import the canonical name and parent are
* written; on re-import a human-renamed tag name is preserved (the source_ref is the
* stable identity, the name is a human-editable label).
*/
@Transactional
public Tag upsertBySourceRef(String sourceRef, String name, UUID parentId) {
return tagRepository.findBySourceRef(sourceRef)
.map(existing -> {
existing.setParentId(parentId);
return tagRepository.save(existing);
})
.orElseGet(() -> tagRepository.save(Tag.builder()
.sourceRef(sourceRef)
.name(name)
.parentId(parentId)
.build()));
}
@Transactional
public Tag update(UUID id, TagUpdateDTO dto) {
Tag tag = getById(id);

@@ -5,8 +5,7 @@ import org.raddatz.familienarchiv.security.Permission;
import org.raddatz.familienarchiv.security.RequirePermission;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentVersionService;
import org.raddatz.familienarchiv.importing.CanonicalImportOrchestrator;
import org.raddatz.familienarchiv.importing.ImportStatus;
import org.raddatz.familienarchiv.importing.MassImportService;
import org.raddatz.familienarchiv.document.ThumbnailBackfillService;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
@@ -22,20 +21,20 @@ import lombok.RequiredArgsConstructor;
@RequiredArgsConstructor
public class AdminController {
private final CanonicalImportOrchestrator importOrchestrator;
private final MassImportService massImportService;
private final DocumentService documentService;
private final DocumentVersionService documentVersionService;
private final ThumbnailBackfillService thumbnailBackfillService;
@PostMapping("/trigger-import")
public ResponseEntity<ImportStatus> triggerMassImport() {
importOrchestrator.runImportAsync();
return ResponseEntity.accepted().body(importOrchestrator.getStatus());
public ResponseEntity<MassImportService.ImportStatus> triggerMassImport() {
massImportService.runImportAsync();
return ResponseEntity.accepted().body(massImportService.getStatus());
}
@GetMapping("/import-status")
public ResponseEntity<ImportStatus> importStatus() {
return ResponseEntity.ok(importOrchestrator.getStatus());
public ResponseEntity<MassImportService.ImportStatus> importStatus() {
return ResponseEntity.ok(massImportService.getStatus());
}
@PostMapping("/backfill-versions")

@@ -125,10 +125,17 @@ app:
password: ${APP_ADMIN_PASSWORD:admin123}
import:
# Directory holding the normalizer's committed canonical artifacts
# (canonical-{documents,persons,tag-tree}.xlsx + canonical-persons-tree.json).
# The loader maps columns by header name — no positional indices (see ADR-025).
dir: ${IMPORT_DIR:/import}
col:
index: 0
box: 1
folder: 2
sender: 3
receivers: 5
date: 7
location: 9
tags: 10
summary: 11
transcription: 13
ocr:
sender-model:

@@ -1,14 +0,0 @@
-- Repeatable migration: sets the grafana_reader role's password from the
-- ${grafanaDbPassword} placeholder (resolved by FlywayConfig from the
-- GRAFANA_DB_PASSWORD environment variable). Flyway computes the checksum on
-- the resolved migration content, so any change to GRAFANA_DB_PASSWORD changes
-- the checksum and re-applies this migration on the next boot. That makes
-- password rotation a "change env var + restart" operation — no manual psql.
--
-- V68 created the role itself (without a usable password). This file owns the
-- password lifecycle; nothing else writes it.
DO $$
BEGIN
EXECUTE format('ALTER ROLE grafana_reader WITH PASSWORD %L', '${grafanaDbPassword}');
END
$$;
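
The rotation mechanism described in the comment above can be illustrated with a minimal sketch. It assumes only what the comment states: the checksum is taken over the migration text after the placeholder has been substituted. The class name `ResolvedChecksumSketch`, the simplified one-line template, and the use of CRC32 are illustrative assumptions, not Flyway's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration: a checksum over the *resolved* migration text
// changes whenever the substituted password changes.
final class ResolvedChecksumSketch {

    // Simplified stand-in for the migration body (assumption, not the real file).
    static final String TEMPLATE =
            "ALTER ROLE grafana_reader WITH PASSWORD '${grafanaDbPassword}'";

    // Substitute the placeholder the way a template engine would.
    static String resolve(String template, String password) {
        return template.replace("${grafanaDbPassword}", password);
    }

    // CRC32 over the UTF-8 bytes of the resolved text.
    static long checksum(String resolvedSql) {
        CRC32 crc = new CRC32();
        crc.update(resolvedSql.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

With an unchanged password the checksum is stable, so nothing is re-applied; a new password yields a different checksum, which is what would trigger the repeatable migration on the next boot.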

@@ -1,17 +0,0 @@
-- Read-only role used by the Grafana PostgreSQL datasource for the PO Overview
-- dashboard (issue #651). The role is created here without a usable password
-- (LOGIN-capable but no password set); R__grafana_reader_password.sql sets the
-- password from GRAFANA_DB_PASSWORD on every boot, so rotation is just "bump
-- the env var and restart the backend" — see docs/adr/024-* and the rotation
-- runbook in docs/DEPLOYMENT.md.
DO $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = 'grafana_reader') THEN
CREATE ROLE grafana_reader WITH LOGIN;
END IF;
END
$$;
GRANT CONNECT ON DATABASE ${flyway:database} TO grafana_reader;
GRANT USAGE ON SCHEMA public TO grafana_reader;
GRANT SELECT ON audit_log, documents, transcription_blocks TO grafana_reader;

@@ -1,67 +0,0 @@
-- Phase 2 of "Handling the Unknowns": the schema foundation.
-- Consolidates every new import/precision/attribution/identity column into ONE
-- migration with a single owner so downstream phases (importer, rendering, persons
-- directory) compile against a finished, collision-free schema. See ADR-025.
--
-- This file is forward-only and immutable once shipped (Flyway checksum model):
-- any fix goes in a later version, never an edit here.
-- ─── documents: date precision, range end, raw date, raw attribution ──────────
-- Range end is only set for RANGE precision (open-ended ranges allowed → end may be null).
ALTER TABLE documents ADD COLUMN meta_date_end date;
-- Original date cell, verbatim, for provenance and "as written" display (Phase 4).
ALTER TABLE documents ADD COLUMN meta_date_raw text;
-- Raw attribution preserved even when a person is linked.
ALTER TABLE documents ADD COLUMN sender_text text;
ALTER TABLE documents ADD COLUMN receiver_text text;
-- Bound user-influenced spreadsheet text at the DB layer (mirrors transcription_blocks
-- length cap in V18). Defense in depth against malformed/huge import cells.
ALTER TABLE documents ADD CONSTRAINT chk_meta_date_raw_length CHECK (length(meta_date_raw) <= 10000);
ALTER TABLE documents ADD CONSTRAINT chk_sender_text_length CHECK (length(sender_text) <= 10000);
ALTER TABLE documents ADD CONSTRAINT chk_receiver_text_length CHECK (length(receiver_text) <= 10000);
-- Precision enum — added with a DB default of 'UNKNOWN', backfilled, then made NOT NULL.
-- The DEFAULT serves two purposes: (1) existing rows get 'UNKNOWN' immediately, and
-- (2) raw-SQL inserts that omit the column (test fixtures, ad-hoc data loads) get a sane,
-- CHECK-valid value instead of violating the NOT NULL constraint. JPA saves still set it
-- explicitly via the entity's @Builder.Default = DatePrecision.UNKNOWN.
ALTER TABLE documents ADD COLUMN meta_date_precision varchar(16) DEFAULT 'UNKNOWN';
UPDATE documents
SET meta_date_precision = CASE WHEN meta_date IS NOT NULL THEN 'DAY' ELSE 'UNKNOWN' END;
ALTER TABLE documents ALTER COLUMN meta_date_precision SET NOT NULL;
-- Fail-closed allowlist of the seven precision values (verbatim mirror of the
-- normalizer's Precision enum). The DB enforces validity independent of the Java enum.
ALTER TABLE documents ADD CONSTRAINT chk_meta_date_precision
CHECK (meta_date_precision IN ('DAY', 'MONTH', 'SEASON', 'YEAR', 'RANGE', 'APPROX', 'UNKNOWN'));
-- A non-null range end is permitted only when precision = RANGE. A RANGE row MAY have a
-- null end (open-ended range), so the rule is one-directional, not biconditional.
ALTER TABLE documents ADD CONSTRAINT chk_meta_date_end_only_for_range
CHECK (meta_date_end IS NULL OR meta_date_precision = 'RANGE');
-- For ranges with both endpoints, the end must not precede the start.
ALTER TABLE documents ADD CONSTRAINT chk_meta_date_end_after_start
CHECK (meta_date_end IS NULL OR meta_date IS NULL OR meta_date_end >= meta_date);
-- ─── persons: source_ref (import identity) + provisional flag ─────────────────
-- The normalizer person_id: join key for documents → persons and idempotency key for
-- re-import. Nullable (manually created persons never have one); unique among non-nulls.
ALTER TABLE persons ADD COLUMN source_ref varchar(255);
CREATE UNIQUE INDEX idx_persons_source_ref ON persons (source_ref);
-- A provisional person is one the importer inferred but could not confidently identify.
-- Stays false until Phase 3 (importer) sets it; no code path writes true in this phase.
ALTER TABLE persons ADD COLUMN provisional boolean NOT NULL DEFAULT false;
-- ─── tag: source_ref (import identity, keyed on canonical tag_path) ───────────
ALTER TABLE tag ADD COLUMN source_ref varchar(255);
CREATE UNIQUE INDEX idx_tag_source_ref ON tag (source_ref);
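
The three date CHECK constraints in this migration interact, so it may help to read them as one predicate. The sketch below restates them in plain Java; `DatePrecisionRulesSketch` and `isValid` are hypothetical names introduced for illustration and are not part of the codebase. It only mirrors the rules as written in the SQL above.

```java
import java.time.LocalDate;
import java.util.Set;

// Hypothetical restatement of the V69 date constraints as a single predicate.
final class DatePrecisionRulesSketch {

    // Mirrors the allowlist in chk_meta_date_precision.
    static final Set<String> PRECISIONS =
            Set.of("DAY", "MONTH", "SEASON", "YEAR", "RANGE", "APPROX", "UNKNOWN");

    static boolean isValid(LocalDate start, LocalDate end, String precision) {
        // NOT NULL + chk_meta_date_precision: value must be one of the seven.
        if (precision == null || !PRECISIONS.contains(precision)) {
            return false;
        }
        // chk_meta_date_end_only_for_range: a non-null end requires RANGE
        // (one-directional: RANGE with a null end is still allowed).
        if (end != null && !precision.equals("RANGE")) {
            return false;
        }
        // chk_meta_date_end_after_start: only checked when both endpoints exist.
        if (end != null && start != null && end.isBefore(start)) {
            return false;
        }
        return true;
    }
}
```

Note that a RANGE row with both endpoints null passes every rule, which matches the open-ended semantics described in the migration comments.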

@@ -1,26 +0,0 @@
-- #689: persist the hand-curated "G 0…G 5" generation index from
-- canonical-persons.xlsx so the Stammbaum layout can use it as a strict
-- rank anchor (replacing the current iterative longest-path heuristic that
-- silently misplaces loose spouses with their own parents in the graph).
--
-- Nullable: pre-import rows and persons outside the curated family graph
-- legitimately have no generation. The canonical importer back-fills via
-- preferHuman on the next run; a human-edited value is never overwritten
-- (see ADR-025).
ALTER TABLE persons ADD COLUMN generation SMALLINT;
-- Allowlist of valid generation indices. The 0..10 bounds mirror
-- PersonGeneration.MIN_GENERATION / MAX_GENERATION in Java — keep the
-- two in sync (the DTO @Min/@Max and both importer range guards read from
-- those Java constants). Current data tops out at G 5, but a future G 6 →
-- G 10 widening needs no migration. A G -1 ancestor would require a
-- separate one-shot shift migration (out of scope here; the layout's
-- normalise step already handles negative seeds at render time).
ALTER TABLE persons ADD CONSTRAINT chk_generation_range
CHECK (generation IS NULL OR generation BETWEEN 0 AND 10);
-- Partial index: only the curated rows (≈ 163 of 1,105) ever get a value,
-- and the layout only ever queries for non-null rows.
CREATE INDEX idx_persons_generation ON persons (generation)
WHERE generation IS NOT NULL;
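
The comment above asks that the SQL bounds stay in sync with the Java constants. A minimal sketch of what such a shared guard could look like follows; the class `PersonGenerationSketch` and its `isValid` method are assumptions made for illustration, and only the 0..10 bounds and the nullable semantics come from the migration itself.

```java
// Hypothetical sketch of a single source of truth for the generation bounds.
final class PersonGenerationSketch {

    // Must match the BETWEEN 0 AND 10 bounds in chk_generation_range.
    static final int MIN_GENERATION = 0;
    static final int MAX_GENERATION = 10;

    // Null is valid: persons outside the curated graph have no generation.
    static boolean isValid(Integer generation) {
        return generation == null
                || (generation >= MIN_GENERATION && generation <= MAX_GENERATION);
    }
}
```

Reading both bounds from one place means a later widening touches the constants and the CHECK constraint, and nothing else.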

@@ -479,191 +479,6 @@ class MigrationIntegrationTest {
assertThat(count).isEqualTo(1);
}
// ─── V69: import/precision/attribution/identity schema foundation ────────
@Test
void v69_metaDatePrecisionColumn_isNotNull() {
Integer count = jdbc.queryForObject(
"""
SELECT COUNT(*) FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'documents'
AND column_name = 'meta_date_precision'
AND is_nullable = 'NO'
""",
Integer.class);
assertThat(count).isEqualTo(1);
}
@Test
void v69_backfillSql_setsDatedRowsToDayPrecision() {
// Re-run the migration's backfill UPDATE on a freshly dated row to prove the rule.
UUID docId = createDocumentWithDate("1943-05-12");
jdbc.update(V69_BACKFILL_PRECISION_SQL);
String precision = jdbc.queryForObject(
"SELECT meta_date_precision FROM documents WHERE id = ?", String.class, docId);
assertThat(precision).isEqualTo("DAY");
}
@Test
void v69_backfillSql_setsUndatedRowsToUnknownPrecision() {
UUID docId = createDocument(); // no meta_date
jdbc.update(V69_BACKFILL_PRECISION_SQL);
String precision = jdbc.queryForObject(
"SELECT meta_date_precision FROM documents WHERE id = ?", String.class, docId);
assertThat(precision).isEqualTo("UNKNOWN");
}
// Mirrors the backfill UPDATE shipped in V69; idempotent for verification.
private static final String V69_BACKFILL_PRECISION_SQL = """
UPDATE documents
SET meta_date_precision = CASE WHEN meta_date IS NOT NULL THEN 'DAY' ELSE 'UNKNOWN' END
""";
@Test
void v69_precisionCheck_rejectsValueOutsideEnum() {
UUID docId = createDocument();
assertThatThrownBy(() ->
jdbc.update("UPDATE documents SET meta_date_precision = 'BOGUS' WHERE id = ?", docId)
).isInstanceOf(DataIntegrityViolationException.class);
}
@Test
void v69_metaDateEndCheck_rejectsNonNullEndWhenPrecisionNotRange() {
UUID docId = createDocumentWithDate("1943-05-12"); // precision DAY
assertThatThrownBy(() ->
jdbc.update("UPDATE documents SET meta_date_end = '1943-06-01' WHERE id = ?", docId)
).isInstanceOf(DataIntegrityViolationException.class);
}
@Test
void v69_metaDateEndCheck_allowsNonNullEndWhenPrecisionRange() {
UUID docId = createDocumentWithDate("1943-05-12");
int rows = jdbc.update(
"UPDATE documents SET meta_date_precision = 'RANGE', meta_date_end = '1943-06-01' WHERE id = ?",
docId);
assertThat(rows).isEqualTo(1);
}
@Test
void v69_metaDateEndCheck_allowsRangeWithNullEnd() {
// Loose semantics: the normalizer may emit an open-ended RANGE (start only).
UUID docId = createDocumentWithDate("1943-05-12");
int rows = jdbc.update(
"UPDATE documents SET meta_date_precision = 'RANGE' WHERE id = ?", docId);
assertThat(rows).isEqualTo(1);
}
@Test
void v69_metaDateEndCheck_allowsRangeWithBothEndpointsNull() {
// Fully-open RANGE: neither start (meta_date) nor end (meta_date_end) is set.
// Both CHECKs hold (end IS NULL passes chk_meta_date_end_only_for_range; both-null
// passes chk_meta_date_end_after_start), so the row survives. This locks the actual
// DB behavior so a future tightening to a biconditional rule is a deliberate change.
UUID docId = createDocument(); // null meta_date
int rows = jdbc.update(
"UPDATE documents SET meta_date_precision = 'RANGE' WHERE id = ?", docId);
assertThat(rows).isEqualTo(1);
Object metaDate = jdbc.queryForObject("SELECT meta_date FROM documents WHERE id = ?", Object.class, docId);
Object metaDateEnd = jdbc.queryForObject(
"SELECT meta_date_end FROM documents WHERE id = ?", Object.class, docId);
assertThat(metaDate).isNull();
assertThat(metaDateEnd).isNull();
}
@Test
void v69_rangeOrderCheck_rejectsEndBeforeStart() {
UUID docId = createDocumentWithDate("1943-05-12");
assertThatThrownBy(() ->
jdbc.update(
"UPDATE documents SET meta_date_precision = 'RANGE', meta_date_end = '1943-01-01' WHERE id = ?",
docId)
).isInstanceOf(DataIntegrityViolationException.class);
}
@Test
void v69_metaDateRawCheck_rejectsOverlongText() {
UUID docId = createDocument();
String tooLong = "x".repeat(10001);
assertThatThrownBy(() ->
jdbc.update("UPDATE documents SET meta_date_raw = ? WHERE id = ?", tooLong, docId)
).isInstanceOf(DataIntegrityViolationException.class);
}
@Test
void v69_senderTextAndReceiverText_storeRawAttribution() {
UUID docId = createDocument();
int rows = jdbc.update(
"UPDATE documents SET sender_text = 'Oma Anna', receiver_text = 'Tante Grete' WHERE id = ?",
docId);
assertThat(rows).isEqualTo(1);
}
@Test
@Transactional(propagation = Propagation.NOT_SUPPORTED)
void v69_personsSourceRef_uniqueIndexRejectsDuplicate() {
jdbc.update(
"INSERT INTO persons (id, last_name, source_ref) VALUES (gen_random_uuid(), 'A', 'person:dup')");
try {
assertThatThrownBy(() ->
jdbc.update(
"INSERT INTO persons (id, last_name, source_ref) VALUES (gen_random_uuid(), 'B', 'person:dup')")
).isInstanceOf(DataIntegrityViolationException.class);
} finally {
jdbc.update("DELETE FROM persons WHERE source_ref = 'person:dup'");
}
}
@Test
@Transactional(propagation = Propagation.NOT_SUPPORTED)
void v69_personsSourceRef_allowsMultipleNulls() {
UUID a = createPerson("Null", "RefA");
UUID b = createPerson("Null", "RefB");
try {
String refA = jdbc.queryForObject("SELECT source_ref FROM persons WHERE id = ?", String.class, a);
String refB = jdbc.queryForObject("SELECT source_ref FROM persons WHERE id = ?", String.class, b);
assertThat(refA).isNull();
assertThat(refB).isNull();
} finally {
jdbc.update("DELETE FROM persons WHERE id IN (?, ?)", a, b);
}
}
@Test
void v69_personsProvisional_defaultsToFalse() {
UUID id = createPerson("Provisional", "Default");
Boolean provisional = jdbc.queryForObject(
"SELECT provisional FROM persons WHERE id = ?", Boolean.class, id);
assertThat(provisional).isFalse();
}
@Test
@Transactional(propagation = Propagation.NOT_SUPPORTED)
void v69_tagSourceRef_uniqueIndexRejectsDuplicate() {
jdbc.update("INSERT INTO tag (id, name, source_ref) VALUES (gen_random_uuid(), 'TagDupA', 'tag:dup')");
try {
assertThatThrownBy(() ->
jdbc.update("INSERT INTO tag (id, name, source_ref) VALUES (gen_random_uuid(), 'TagDupB', 'tag:dup')")
).isInstanceOf(DataIntegrityViolationException.class);
} finally {
jdbc.update("DELETE FROM tag WHERE source_ref = 'tag:dup'");
}
}
// ─── helpers ─────────────────────────────────────────────────────────────
private UUID createPerson(String firstName, String lastName) {
@@ -689,12 +504,6 @@ class MigrationIntegrationTest {
return doc.getId();
}
private UUID createDocumentWithDate(String isoDate) {
UUID id = createDocument();
jdbc.update("UPDATE documents SET meta_date = ?::date WHERE id = ?", isoDate, id);
return id;
}
private UUID insertAnnotation(UUID docId) {
UUID id = UUID.randomUUID();
jdbc.update("""

@@ -1,37 +0,0 @@
package org.raddatz.familienarchiv.config;
import org.junit.jupiter.api.Test;
import org.springframework.mock.env.MockEnvironment;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
class FlywayConfigTest {
@Test
void resolveGrafanaDbPassword_throws_when_env_unset() {
FlywayConfig config = new FlywayConfig(null, new MockEnvironment());
assertThatThrownBy(config::resolveGrafanaDbPassword)
.isInstanceOf(IllegalStateException.class)
.hasMessageContaining("GRAFANA_DB_PASSWORD is required");
}
@Test
void resolveGrafanaDbPassword_throws_when_env_blank() {
MockEnvironment env = new MockEnvironment().withProperty("GRAFANA_DB_PASSWORD", " ");
FlywayConfig config = new FlywayConfig(null, env);
assertThatThrownBy(config::resolveGrafanaDbPassword)
.isInstanceOf(IllegalStateException.class)
.hasMessageContaining("GRAFANA_DB_PASSWORD is required");
}
@Test
void resolveGrafanaDbPassword_returns_value_when_env_set() {
MockEnvironment env = new MockEnvironment().withProperty("GRAFANA_DB_PASSWORD", "abc");
FlywayConfig config = new FlywayConfig(null, env);
assertThat(config.resolveGrafanaDbPassword()).isEqualTo("abc");
}
}

@@ -1,89 +0,0 @@
package org.raddatz.familienarchiv.config;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.data.jpa.test.autoconfigure.DataJpaTest;
import org.springframework.boot.jdbc.test.autoconfigure.AutoConfigureTestDatabase;
import org.springframework.context.annotation.Import;
import org.springframework.jdbc.core.JdbcTemplate;
import static org.assertj.core.api.Assertions.assertThat;
// GRAFANA_DB_PASSWORD is supplied via the global test default in
// src/test/resources/application.properties — FlywayConfig fails closed
// when it is unset, so all tests that load the migration path need it.
@DataJpaTest
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
@Import({PostgresContainerConfig.class, FlywayConfig.class})
class GrafanaReaderRoleIntegrationTest {
@Autowired JdbcTemplate jdbc;
// --- positive grants (SELECT on the three explicitly granted tables) ---
@Test
void grafana_reader_has_select_on_audit_log() {
assertThat(hasPrivilege("audit_log", "SELECT")).isTrue();
}
@Test
void grafana_reader_has_select_on_documents() {
assertThat(hasPrivilege("documents", "SELECT")).isTrue();
}
@Test
void grafana_reader_has_select_on_transcription_blocks() {
assertThat(hasPrivilege("transcription_blocks", "SELECT")).isTrue();
}
// --- write-deny on the granted tables: SELECT-only means SELECT-only.
// A future migration that GRANTs INSERT/UPDATE/DELETE on any of these
// would fail these tests, even though the original positive grants still
// pass. Locks the boundary in both directions.
@Test
void grafana_reader_has_no_INSERT_on_documents() {
assertThat(hasPrivilege("documents", "INSERT")).isFalse();
}
@Test
void grafana_reader_has_no_UPDATE_on_audit_log() {
assertThat(hasPrivilege("audit_log", "UPDATE")).isFalse();
}
@Test
void grafana_reader_has_no_DELETE_on_transcription_blocks() {
assertThat(hasPrivilege("transcription_blocks", "DELETE")).isFalse();
}
// --- negative grants: PII / sensitive tables MUST NOT be readable.
// The parameterized form catches the "someone widened the grant to
// ALL TABLES IN SCHEMA public" footgun — three specific positive grants
// would still pass while this sweep turns red.
@ParameterizedTest
@ValueSource(strings = {
"app_users",
"user_groups",
"persons",
"notifications",
"document_comments",
"document_annotations",
"geschichten"
})
void grafana_reader_has_no_SELECT_on_protected_table(String table) {
assertThat(hasPrivilege(table, "SELECT")).isFalse();
}
private boolean hasPrivilege(String table, String privilege) {
Boolean result = jdbc.queryForObject(
"SELECT has_table_privilege('grafana_reader', ?, ?)",
Boolean.class,
table,
privilege);
return Boolean.TRUE.equals(result);
}
}

@@ -1,7 +1,6 @@
package org.raddatz.familienarchiv.document;
import org.junit.jupiter.api.Test;
import org.mockito.ArgumentCaptor;
import org.raddatz.familienarchiv.document.DocumentBatchMetadataDTO;
import org.raddatz.familienarchiv.document.DocumentSearchResult;
import org.raddatz.familienarchiv.document.DocumentVersionSummary;
@@ -28,6 +27,7 @@ import org.springframework.security.test.context.support.WithMockUser;
import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.test.web.servlet.MockMvc;
import org.raddatz.familienarchiv.document.DocumentSearchItem;
import org.raddatz.familienarchiv.document.SearchMatchData;
import java.time.LocalDateTime;
@@ -36,9 +36,7 @@ import java.util.List;
import java.util.Optional;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.anyBoolean;
import static org.mockito.ArgumentMatchers.anyInt;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.verify;
@@ -76,69 +74,23 @@ class DocumentControllerTest {
@Test
@WithMockUser
void search_returns200_whenAuthenticated() throws Exception {
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean(), any()))
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any()))
.thenReturn(DocumentSearchResult.of(List.of()));
mockMvc.perform(get("/api/documents/search"))
.andExpect(status().isOk());
}
@Test
@WithMockUser
void search_undatedTrue_isReachableByAuthenticatedUser() throws Exception {
// The read GET must stay reachable for READ_ALL users — guards against a
// future refactor accidentally write-guarding the undated triage path (#668).
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean(), any()))
.thenReturn(DocumentSearchResult.of(List.of()));
mockMvc.perform(get("/api/documents/search").param("undated", "true"))
.andExpect(status().isOk());
}
@Test
void search_undatedTrue_returns401_whenUnauthenticated() throws Exception {
mockMvc.perform(get("/api/documents/search").param("undated", "true"))
.andExpect(status().isUnauthorized());
}
@Test
@WithMockUser
void search_undatedTrue_isForwardedToServiceAsTrue() throws Exception {
ArgumentCaptor<Boolean> undatedCaptor = ArgumentCaptor.forClass(Boolean.class);
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean(), any()))
.thenReturn(DocumentSearchResult.of(List.of()));
mockMvc.perform(get("/api/documents/search").param("undated", "true"))
.andExpect(status().isOk());
verify(documentService).searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), undatedCaptor.capture(), any());
assertThat(undatedCaptor.getValue()).isTrue();
}
@Test
@WithMockUser
void search_withoutUndatedParam_forwardsFalseToService() throws Exception {
ArgumentCaptor<Boolean> undatedCaptor = ArgumentCaptor.forClass(Boolean.class);
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean(), any()))
.thenReturn(DocumentSearchResult.of(List.of()));
mockMvc.perform(get("/api/documents/search"))
.andExpect(status().isOk());
verify(documentService).searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), undatedCaptor.capture(), any());
assertThat(undatedCaptor.getValue()).isFalse();
}
@Test
@WithMockUser
void search_withStatusParam_passesItToService() throws Exception {
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), eq(DocumentStatus.REVIEWED), any(), any(), any(), anyBoolean(), any()))
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), eq(DocumentStatus.REVIEWED), any(), any(), any(), any()))
.thenReturn(DocumentSearchResult.of(List.of()));
mockMvc.perform(get("/api/documents/search").param("status", "REVIEWED"))
.andExpect(status().isOk());
verify(documentService).searchDocuments(any(), any(), any(), any(), any(), any(), any(), eq(DocumentStatus.REVIEWED), any(), any(), any(), anyBoolean(), any());
verify(documentService).searchDocuments(any(), any(), any(), any(), any(), any(), any(), eq(DocumentStatus.REVIEWED), any(), any(), any(), any());
}
@Test
@@ -165,7 +117,7 @@ class DocumentControllerTest {
@Test
@WithMockUser
void search_responseContainsTotalCount() throws Exception {
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean(), any()))
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any()))
.thenReturn(DocumentSearchResult.of(List.of()));
mockMvc.perform(get("/api/documents/search"))
@@ -178,15 +130,16 @@ class DocumentControllerTest {
@WithMockUser
void search_responseBodyItemsContainMatchData() throws Exception {
UUID docId = UUID.randomUUID();
Document doc = Document.builder()
.id(docId)
.title("Brief an Anna")
.originalFilename("brief.pdf")
.status(DocumentStatus.UPLOADED)
.build();
var matchData = new SearchMatchData(
"Er schrieb einen langen Brief", List.of(), false, List.of(), List.of(), List.of(), null, List.of());
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean(), any()))
.thenReturn(DocumentSearchResult.of(List.of(new DocumentListItem(
docId, "Brief an Anna", "brief.pdf", null, null,
DatePrecision.UNKNOWN, null, null,
List.of(), List.of(), null, null, null, null,
0, List.of(), matchData,
LocalDateTime.of(2026, 1, 15, 10, 0), LocalDateTime.of(2026, 1, 15, 10, 0)))));
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any()))
.thenReturn(DocumentSearchResult.of(List.of(new DocumentSearchItem(doc, matchData, 0, List.of()))));
mockMvc.perform(get("/api/documents/search").param("q", "Brief"))
.andExpect(status().isOk())
@@ -195,35 +148,12 @@ class DocumentControllerTest {
.value("Er schrieb einen langen Brief"));
}
@Test
@WithMockUser
void search_returns_flat_item_with_id_and_without_sensitive_fields() throws Exception {
UUID docId = UUID.randomUUID();
var matchData = new SearchMatchData(null, List.of(), false, List.of(), List.of(), List.of(), null, List.of());
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean(), any()))
.thenReturn(DocumentSearchResult.of(List.of(new DocumentListItem(
docId, "Brief an Anna", "brief.pdf", null, null,
DatePrecision.UNKNOWN, null, null,
List.of(), List.of(), null, null, null, null,
0, List.of(), matchData,
LocalDateTime.of(2026, 1, 15, 10, 0), LocalDateTime.of(2026, 1, 15, 10, 0)))));
mockMvc.perform(get("/api/documents/search"))
.andExpect(status().isOk())
// flat id field present at top of item (not nested under $.items[0].document.id)
.andExpect(jsonPath("$.items[0].id").value(docId.toString()))
// sensitive storage fields must never appear in list response
.andExpect(jsonPath("$.items[0].transcription").doesNotExist())
.andExpect(jsonPath("$.items[0].filePath").doesNotExist())
.andExpect(jsonPath("$.items[0].fileHash").doesNotExist());
}
// ─── /api/documents/search pagination ─────────────────────────────────────
@Test
@WithMockUser
void search_responseExposesPagingFields() throws Exception {
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean(), any()))
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any()))
.thenReturn(DocumentSearchResult.of(List.of()));
mockMvc.perform(get("/api/documents/search"))
@@ -268,7 +198,7 @@ class DocumentControllerTest {
@Test
@WithMockUser
void search_passesPageRequestToService() throws Exception {
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean(), any()))
when(documentService.searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any()))
.thenReturn(DocumentSearchResult.of(List.of()));
mockMvc.perform(get("/api/documents/search").param("page", "2").param("size", "25"))
@@ -276,7 +206,7 @@ class DocumentControllerTest {
org.mockito.ArgumentCaptor<org.springframework.data.domain.Pageable> captor =
org.mockito.ArgumentCaptor.forClass(org.springframework.data.domain.Pageable.class);
verify(documentService).searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean(), captor.capture());
verify(documentService).searchDocuments(any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), any(), captor.capture());
org.springframework.data.domain.Pageable pageable = captor.getValue();
org.assertj.core.api.Assertions.assertThat(pageable.getPageNumber()).isEqualTo(2);
org.assertj.core.api.Assertions.assertThat(pageable.getPageSize()).isEqualTo(25);
@@ -345,34 +275,6 @@ class DocumentControllerTest {
.andExpect(status().isOk());
}
@Test
@WithMockUser(authorities = "WRITE_ALL")
void updateDocument_bindsPrecisionFormFields_toDTO() throws Exception {
// Pins the wire contract: the edit form's metaDatePrecision / metaDateEnd /
// metaDateRaw multipart field names must bind to DocumentUpdateDTO. A rename
// on either side silently drops the precision edit; this captures the DTO.
UUID id = UUID.randomUUID();
Document doc = Document.builder().id(id).title("Brief").originalFilename("brief.pdf").build();
when(userService.findByEmail(any())).thenReturn(AppUser.builder().id(UUID.randomUUID()).build());
org.mockito.ArgumentCaptor<DocumentUpdateDTO> captor =
org.mockito.ArgumentCaptor.forClass(DocumentUpdateDTO.class);
when(documentService.updateDocument(eq(id), captor.capture(), any(), any())).thenReturn(doc);
mockMvc.perform(multipart("/api/documents/" + id)
.param("metaDatePrecision", "RANGE")
.param("metaDateEnd", "1917-01-11")
.param("metaDateRaw", "10.11. Januar 1917")
.with(req -> { req.setMethod("PUT"); return req; }).with(csrf()))
.andExpect(status().isOk());
DocumentUpdateDTO bound = captor.getValue();
org.assertj.core.api.Assertions.assertThat(bound.getMetaDatePrecision()).isEqualTo(DatePrecision.RANGE);
org.assertj.core.api.Assertions.assertThat(bound.getMetaDateEnd())
.isEqualTo(java.time.LocalDate.of(1917, 1, 11));
org.assertj.core.api.Assertions.assertThat(bound.getMetaDateRaw()).isEqualTo("10.11. Januar 1917");
}
// ─── DELETE /api/documents/{id} ──────────────────────────────────────────
@Test
@@ -1194,7 +1096,7 @@ class DocumentControllerTest {
void getDocumentIds_returns200_andDelegatesToService() throws Exception {
when(userService.findByEmail(any())).thenReturn(AppUser.builder().id(UUID.randomUUID()).build());
UUID id = UUID.randomUUID();
when(documentService.findIdsForFilter(any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean()))
when(documentService.findIdsForFilter(any(), any(), any(), any(), any(), any(), any(), any(), any()))
.thenReturn(List.of(id));
mockMvc.perform(get("/api/documents/ids"))
@@ -1207,13 +1109,13 @@ class DocumentControllerTest {
void getDocumentIds_passesSenderIdParamToService() throws Exception {
when(userService.findByEmail(any())).thenReturn(AppUser.builder().id(UUID.randomUUID()).build());
UUID senderId = UUID.randomUUID();
when(documentService.findIdsForFilter(any(), any(), any(), eq(senderId), any(), any(), any(), any(), any(), anyBoolean()))
when(documentService.findIdsForFilter(any(), any(), any(), eq(senderId), any(), any(), any(), any(), any()))
.thenReturn(List.of());
mockMvc.perform(get("/api/documents/ids").param("senderId", senderId.toString()))
.andExpect(status().isOk());
verify(documentService).findIdsForFilter(any(), any(), any(), eq(senderId), any(), any(), any(), any(), any(), anyBoolean());
verify(documentService).findIdsForFilter(any(), any(), any(), eq(senderId), any(), any(), any(), any(), any());
}
@Test
@@ -1223,7 +1125,7 @@ class DocumentControllerTest {
// Service returns 5001 IDs — one over BULK_EDIT_FILTER_MAX_IDS (5000).
java.util.List<UUID> tooMany = new java.util.ArrayList<>(5001);
for (int i = 0; i < 5001; i++) tooMany.add(UUID.randomUUID());
when(documentService.findIdsForFilter(any(), any(), any(), any(), any(), any(), any(), any(), any(), anyBoolean()))
when(documentService.findIdsForFilter(any(), any(), any(), any(), any(), any(), any(), any(), any()))
.thenReturn(tooMany);
mockMvc.perform(get("/api/documents/ids"))


@@ -123,10 +123,11 @@ class DocumentLazyLoadingTest {
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.RECEIVER, "asc", null, false, PageRequest.of(0, 20));
DocumentSort.RECEIVER, "asc", null,
PageRequest.of(0, 20));
assertThat(result.totalElements()).isGreaterThan(0);
assertThatCode(() ->
result.items().forEach(i -> { if (i.sender() != null) i.sender().getLastName(); }))
result.items().forEach(i -> i.document().getSender().getLastName()))
.doesNotThrowAnyException();
}
@@ -138,7 +139,8 @@ class DocumentLazyLoadingTest {
assertThatCode(() -> documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.SENDER, "asc", null, false, PageRequest.of(0, 20)))
DocumentSort.SENDER, "asc", null,
PageRequest.of(0, 20)))
.doesNotThrowAnyException();
}


@@ -1,117 +0,0 @@
package org.raddatz.familienarchiv.document;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.raddatz.familienarchiv.audit.AuditLogQueryService;
import org.raddatz.familienarchiv.ocr.TrainingLabel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.annotation.Import;
import org.springframework.data.domain.PageRequest;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.bean.override.mockito.MockitoBean;
import software.amazon.awssdk.services.s3.S3Client;
import java.util.HashSet;
import java.util.Set;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatCode;
/**
* AC #2: Document with trainingLabels does not cause LazyInitializationException in search.
* AC #3: Detail API still returns trainingLabels after the Document.list graph change.
*/
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.NONE)
@ActiveProfiles("test")
@Import(PostgresContainerConfig.class)
class DocumentListItemIntegrationTest {
@MockitoBean
S3Client s3Client;
@MockitoBean
AuditLogQueryService auditLogQueryService;
@Autowired
DocumentRepository documentRepository;
@Autowired
DocumentService documentService;
@AfterEach
void cleanup() {
documentRepository.deleteAll();
}
@Test
void search_doesNotThrow_whenDocumentHasTrainingLabels() {
documentRepository.save(Document.builder()
.title("Kurrent Brief")
.originalFilename("kurrent.pdf")
.status(DocumentStatus.UPLOADED)
.trainingLabels(new HashSet<>(Set.of(TrainingLabel.KURRENT_RECOGNITION)))
.build());
assertThatCode(() -> documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, PageRequest.of(0, 50)))
.doesNotThrowAnyException();
}
@Test
void search_returns_list_item_without_sensitive_fields_when_document_has_training_labels() {
documentRepository.save(Document.builder()
.title("Kurrent Brief")
.originalFilename("kurrent2.pdf")
.status(DocumentStatus.UPLOADED)
.trainingLabels(new HashSet<>(Set.of(TrainingLabel.KURRENT_RECOGNITION)))
.build());
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, PageRequest.of(0, 50));
assertThat(result.totalElements()).isGreaterThan(0);
DocumentListItem item = result.items().get(0);
assertThat(item.id()).isNotNull();
assertThat(item.title()).isEqualTo("Kurrent Brief");
}
@Test
void search_listItem_carriesMetaDatePrecisionAndEnd() {
documentRepository.save(Document.builder()
.title("Range Brief")
.originalFilename("range.pdf")
.status(DocumentStatus.UPLOADED)
.documentDate(java.time.LocalDate.of(1943, 1, 1))
.metaDatePrecision(DatePrecision.RANGE)
.metaDateEnd(java.time.LocalDate.of(1943, 12, 31))
.build());
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, PageRequest.of(0, 50));
DocumentListItem item = result.items().stream()
.filter(i -> i.title().equals("Range Brief")).findFirst().orElseThrow();
assertThat(item.metaDatePrecision()).isEqualTo(DatePrecision.RANGE);
assertThat(item.metaDateEnd()).isEqualTo(java.time.LocalDate.of(1943, 12, 31));
}
@Test
void detail_stillReturnsTrainingLabels() {
Document saved = documentRepository.save(Document.builder()
.title("Detail Test")
.originalFilename("detail_test.pdf")
.status(DocumentStatus.UPLOADED)
.trainingLabels(new HashSet<>(Set.of(TrainingLabel.KURRENT_RECOGNITION)))
.build());
// Document.full entity graph (used by getDocumentById) must still load trainingLabels
Document loaded = documentService.getDocumentById(saved.getId());
assertThat(loaded.getTrainingLabels()).containsExactly(TrainingLabel.KURRENT_RECOGNITION);
}
}


@@ -62,7 +62,8 @@ class DocumentSearchPagedIntegrationTest {
void search_firstPage_returnsExactlyPageSizeItems_andCorrectTotalElements() {
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, PageRequest.of(0, 50));
DocumentSort.DATE, "DESC", null,
PageRequest.of(0, 50));
assertThat(result.items()).hasSize(50);
assertThat(result.totalElements()).isEqualTo(FIXTURE_SIZE);
@@ -75,7 +76,8 @@ class DocumentSearchPagedIntegrationTest {
void search_lastPartialPage_returnsRemainingItems() {
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, PageRequest.of(2, 50));
DocumentSort.DATE, "DESC", null,
PageRequest.of(2, 50));
// Page 2 (offset 100) of 120 docs → exactly 20 items on the tail.
assertThat(result.items()).hasSize(20);
@@ -87,7 +89,8 @@ class DocumentSearchPagedIntegrationTest {
void search_pageBeyondLast_returnsEmptyContent_totalElementsStillCorrect() {
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, PageRequest.of(99, 50));
DocumentSort.DATE, "DESC", null,
PageRequest.of(99, 50));
assertThat(result.items()).isEmpty();
assertThat(result.totalElements()).isEqualTo(FIXTURE_SIZE);
@@ -100,7 +103,8 @@ class DocumentSearchPagedIntegrationTest {
// returns the correct total from a real repository fetch.
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.SENDER, "asc", null, false, PageRequest.of(1, 50));
DocumentSort.SENDER, "asc", null,
PageRequest.of(1, 50));
assertThat(result.items()).hasSize(50);
assertThat(result.totalElements()).isEqualTo(FIXTURE_SIZE);
@@ -108,98 +112,23 @@ class DocumentSearchPagedIntegrationTest {
assertThat(result.totalPages()).isEqualTo(3);
}
@Test
void search_undatedCount_isGlobalFilteredTotal_notPageSlice() {
// Seed 70 undated docs on top of the 120 dated ones. With a 50-per-page
// window the undated rows span multiple pages, so a page-local count could
// never exceed 50 — the global count must be the full 70 (issue #668).
int undatedTotal = 70;
for (int i = 0; i < undatedTotal; i++) {
documentRepository.save(Document.builder()
.title("Undatiert-" + String.format("%03d", i))
.originalFilename("undatiert-" + i + ".pdf")
.status(DocumentStatus.UPLOADED)
.metaDatePrecision(DatePrecision.UNKNOWN)
.documentDate(null)
.build());
}
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, PageRequest.of(0, 50));
// Global undated count is the full undated total, independent of page size.
assertThat(result.undatedCount()).isEqualTo(undatedTotal);
// Total matches both dated + undated (no undated-only filter applied).
assertThat(result.totalElements()).isEqualTo(FIXTURE_SIZE + undatedTotal);
// The first DATE-DESC page is all dated rows (nulls last), so a page-local
// tally would report 0 undated — proving the count is not page-derived.
assertThat(result.items()).allMatch(item -> item.documentDate() != null);
}
@Test
void search_undatedCount_ignoresUndatedOnlyToggle() {
// The "Nur undatierte" toggle must not skew the count: whether undated=true or
// false, the global undated count for the same filter is identical (issue #668).
int undatedTotal = 12;
for (int i = 0; i < undatedTotal; i++) {
documentRepository.save(Document.builder()
.title("U-" + i)
.originalFilename("u-" + i + ".pdf")
.status(DocumentStatus.UPLOADED)
.metaDatePrecision(DatePrecision.UNKNOWN)
.documentDate(null)
.build());
}
DocumentSearchResult unfiltered = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, PageRequest.of(0, 50));
DocumentSearchResult undatedOnly = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, true, PageRequest.of(0, 50));
assertThat(unfiltered.undatedCount()).isEqualTo(undatedTotal);
assertThat(undatedOnly.undatedCount()).isEqualTo(undatedTotal);
}
@Test
void search_undatedCount_isZero_insideDateRange() {
// A from/to range excludes undated rows by the collision rule (#668), so the
// global undated count inside a range is legitimately 0 even when undated docs exist.
for (int i = 0; i < 5; i++) {
documentRepository.save(Document.builder()
.title("U-range-" + i)
.originalFilename("u-range-" + i + ".pdf")
.status(DocumentStatus.UPLOADED)
.metaDatePrecision(DatePrecision.UNKNOWN)
.documentDate(null)
.build());
}
DocumentSearchResult result = documentService.searchDocuments(
null, LocalDate.of(1900, 1, 1), LocalDate.of(2000, 12, 31),
null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, PageRequest.of(0, 50));
assertThat(result.undatedCount()).isZero();
}
@Test
void search_differentPagesReturnDisjointSlices() {
DocumentSearchResult page0 = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, PageRequest.of(0, 50));
DocumentSort.DATE, "DESC", null,
PageRequest.of(0, 50));
DocumentSearchResult page1 = documentService.searchDocuments(
null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, PageRequest.of(1, 50));
DocumentSort.DATE, "DESC", null,
PageRequest.of(1, 50));
// No document id should appear on both pages — slicing must be exclusive.
var idsOnPage0 = page0.items().stream()
.map(item -> item.id())
.map(item -> item.document().getId())
.toList();
var idsOnPage1 = page1.items().stream()
.map(item -> item.id())
.map(item -> item.document().getId())
.toList();
for (UUID id : idsOnPage0) {
assertThat(idsOnPage1).doesNotContain(id);


@@ -3,9 +3,10 @@ package org.raddatz.familienarchiv.document;
import io.swagger.v3.oas.annotations.media.Schema;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.audit.ActivityActorDTO;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.springframework.data.domain.PageRequest;
import java.time.LocalDateTime;
import java.util.List;
import java.util.UUID;
@@ -13,13 +14,14 @@ import static org.assertj.core.api.Assertions.assertThat;
class DocumentSearchResultTest {
private DocumentListItem item(UUID docId) {
return new DocumentListItem(
docId, "Test", "test.pdf", null, null,
DatePrecision.UNKNOWN, null, null,
List.of(), List.of(), null, null, null, null,
0, List.of(), SearchMatchData.empty(),
LocalDateTime.of(2026, 1, 15, 10, 0), LocalDateTime.of(2026, 1, 15, 10, 0));
private DocumentSearchItem item(UUID docId) {
Document doc = Document.builder()
.id(docId)
.title("Test")
.originalFilename("test.pdf")
.status(DocumentStatus.UPLOADED)
.build();
return new DocumentSearchItem(doc, SearchMatchData.empty(), 0, List.of());
}
@Test
@@ -43,7 +45,7 @@ class DocumentSearchResultTest {
@Test
void paged_factory_populates_paging_fields_from_pageable_and_total() {
List<DocumentListItem> slice = List.of(item(UUID.randomUUID()), item(UUID.randomUUID()));
List<DocumentSearchItem> slice = List.of(item(UUID.randomUUID()), item(UUID.randomUUID()));
DocumentSearchResult result = DocumentSearchResult.paged(slice, PageRequest.of(1, 50), 120L);
@@ -66,12 +68,9 @@ class DocumentSearchResultTest {
void of_exposes_items_with_completion_and_contributors() {
UUID id = UUID.randomUUID();
ActivityActorDTO actor = new ActivityActorDTO("AB", "#f00", "Anna Braun");
DocumentListItem item = new DocumentListItem(
id, "T", "t.pdf", null, null,
DatePrecision.UNKNOWN, null, null,
List.of(), List.of(), null, null, null, null,
75, List.of(actor), SearchMatchData.empty(),
LocalDateTime.of(2026, 1, 15, 10, 0), LocalDateTime.of(2026, 1, 15, 10, 0));
Document doc = Document.builder().id(id).title("T").originalFilename("t.pdf")
.status(DocumentStatus.UPLOADED).build();
DocumentSearchItem item = new DocumentSearchItem(doc, SearchMatchData.empty(), 75, List.of(actor));
DocumentSearchResult result = DocumentSearchResult.of(List.of(item));
@@ -102,32 +101,4 @@ class DocumentSearchResultTest {
assertThat(schema.requiredMode()).isEqualTo(Schema.RequiredMode.REQUIRED);
}
}
@Test
void undatedCount_component_is_annotated_as_required_in_openapi_schema() throws NoSuchFieldException {
Schema schema = DocumentSearchResult.class.getDeclaredField("undatedCount").getAnnotation(Schema.class);
assertThat(schema).isNotNull();
assertThat(schema.requiredMode()).isEqualTo(Schema.RequiredMode.REQUIRED);
}
@Test
void factories_default_undatedCount_to_zero() {
assertThat(DocumentSearchResult.of(List.of()).undatedCount()).isZero();
assertThat(DocumentSearchResult.paged(List.of(), PageRequest.of(0, 50), 0L).undatedCount()).isZero();
}
@Test
void withUndatedCount_overlays_count_and_preserves_other_fields() {
DocumentSearchResult base = DocumentSearchResult.paged(
List.of(item(UUID.randomUUID())), PageRequest.of(1, 50), 120L);
DocumentSearchResult withCount = base.withUndatedCount(7L);
assertThat(withCount.undatedCount()).isEqualTo(7L);
assertThat(withCount.items()).isEqualTo(base.items());
assertThat(withCount.totalElements()).isEqualTo(120L);
assertThat(withCount.pageNumber()).isEqualTo(1);
assertThat(withCount.pageSize()).isEqualTo(50);
assertThat(withCount.totalPages()).isEqualTo(3);
}
}


@@ -67,10 +67,10 @@ class DocumentServiceSortTest {
.thenReturn(new PageImpl<>(List.of(newer, older)));
DocumentSearchResult result = documentService.searchDocuments(
"Brief", null, null, null, null, null, null, null, DocumentSort.DATE, "DESC", null, false, PAGE);
"Brief", null, null, null, null, null, null, null, DocumentSort.DATE, "DESC", null, PAGE);
assertThat(result.items()).hasSize(2);
assertThat(result.items().get(0).id()).isEqualTo(id2); // newer first
assertThat(result.items().get(0).document().getId()).isEqualTo(id2); // newer first
}
// ─── RELEVANCE sort — pure text (no filters) ──────────────────────────────
@@ -84,7 +84,7 @@ class DocumentServiceSortTest {
.thenReturn(List.of(doc(id1)));
documentService.searchDocuments(
"Brief", null, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, false, PAGE);
"Brief", null, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, PAGE);
verify(documentRepository).findFtsPageRaw(anyString(), anyInt(), anyInt());
verify(documentRepository, never()).findAllMatchingIdsByFts(anyString());
@@ -102,9 +102,9 @@ class DocumentServiceSortTest {
when(documentRepository.findAllById(any())).thenReturn(List.of(doc(id2), doc(id1))); // unordered from JPA
DocumentSearchResult result = documentService.searchDocuments(
"Brief", null, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, false, PAGE);
"Brief", null, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, PAGE);
assertThat(result.items().get(0).id()).isEqualTo(id1);
assertThat(result.items().get(0).document().getId()).isEqualTo(id1);
}
@Test
@@ -119,9 +119,9 @@ class DocumentServiceSortTest {
when(documentRepository.findAllById(any())).thenReturn(List.of(doc(id2), doc(id1)));
DocumentSearchResult result = documentService.searchDocuments(
"Brief", null, null, null, null, null, null, null, null, null, null, false, PAGE);
"Brief", null, null, null, null, null, null, null, null, null, null, PAGE);
assertThat(result.items().get(0).id()).isEqualTo(id1);
assertThat(result.items().get(0).document().getId()).isEqualTo(id1);
}
// ─── RELEVANCE sort — overflow guard ─────────────────────────────────────
@@ -133,7 +133,7 @@ class DocumentServiceSortTest {
DocumentSearchResult result = documentService.searchDocuments(
"Brief", null, null, null, null, null, null, null,
DocumentSort.RELEVANCE, null, null, false, hugePage);
DocumentSort.RELEVANCE, null, null, hugePage);
assertThat(result.items()).isEmpty();
verify(documentRepository, never()).findFtsPageRaw(anyString(), anyInt(), anyInt());
@@ -153,10 +153,10 @@ class DocumentServiceSortTest {
DocumentSearchResult result = documentService.searchDocuments(
"Brief", null, null, null, null, null, null, null,
DocumentSort.RELEVANCE, null, null, false, PAGE);
DocumentSort.RELEVANCE, null, null, PAGE);
assertThat(result.items()).hasSize(1);
assertThat(result.items().get(0).id()).isEqualTo(uuidId);
assertThat(result.items().get(0).document().getId()).isEqualTo(uuidId);
}
// ─── RELEVANCE sort — text + active filter ────────────────────────────────
@@ -173,7 +173,7 @@ class DocumentServiceSortTest {
// date filter (from) is active → triggers in-memory path, not findFtsPageRaw
LocalDate from = LocalDate.of(1900, 1, 1);
documentService.searchDocuments(
"Brief", from, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, false, PAGE);
"Brief", from, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, PAGE);
verify(documentRepository, never()).findFtsPageRaw(anyString(), anyInt(), anyInt());
verify(documentRepository).findAllMatchingIdsByFts("Brief");


@@ -11,7 +11,7 @@ import org.raddatz.familienarchiv.audit.AuditLogQueryService;
import org.raddatz.familienarchiv.audit.AuditService;
import org.raddatz.familienarchiv.document.annotation.AnnotationService;
import org.raddatz.familienarchiv.document.transcription.TranscriptionBlockQueryService;
import org.raddatz.familienarchiv.document.DocumentListItem;
import org.raddatz.familienarchiv.document.DocumentSearchItem;
import org.raddatz.familienarchiv.document.DocumentSearchResult;
import org.raddatz.familienarchiv.document.DocumentSort;
import org.raddatz.familienarchiv.document.DocumentUpdateDTO;
@@ -47,8 +47,6 @@ import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.anyInt;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.ArgumentMatchers.isNull;
import static org.mockito.Mockito.*;
@@ -146,53 +144,6 @@ class DocumentServiceTest {
assertThat(doc.getArchiveFolder()).isEqualTo("Mappe B");
}
@Test
void updateDocument_persistsDatePrecisionEndAndRaw() throws Exception {
UUID id = UUID.randomUUID();
Document doc = Document.builder().id(id).receivers(new HashSet<>()).tags(new HashSet<>()).build();
when(documentRepository.findById(id)).thenReturn(Optional.of(doc));
when(documentRepository.save(any())).thenReturn(doc);
DocumentUpdateDTO dto = new DocumentUpdateDTO();
dto.setDocumentDate(LocalDate.of(1917, 1, 10));
dto.setMetaDatePrecision(DatePrecision.RANGE);
dto.setMetaDateEnd(LocalDate.of(1917, 1, 11));
dto.setMetaDateRaw("10.11. Januar 1917");
documentService.updateDocument(id, dto, null, null);
assertThat(doc.getMetaDatePrecision()).isEqualTo(DatePrecision.RANGE);
assertThat(doc.getMetaDateEnd()).isEqualTo(LocalDate.of(1917, 1, 11));
assertThat(doc.getMetaDateRaw()).isEqualTo("10.11. Januar 1917");
}
@Test
void updateDocument_preservesStoredPrecision_whenDtoOmitsIt() throws Exception {
// Editing a doc (e.g. fixing a location typo) without touching the precision
// controls must NOT fabricate a precision. The form omits the three precision
// fields → they arrive null on the DTO → the stored values must be preserved.
UUID id = UUID.randomUUID();
Document doc = Document.builder()
.id(id)
.metaDatePrecision(DatePrecision.MONTH)
.metaDateEnd(LocalDate.of(1916, 6, 30))
.metaDateRaw("Juni 1916")
.receivers(new HashSet<>())
.tags(new HashSet<>())
.build();
when(documentRepository.findById(id)).thenReturn(Optional.of(doc));
when(documentRepository.save(any())).thenReturn(doc);
DocumentUpdateDTO dto = new DocumentUpdateDTO();
dto.setLocation("Berlin"); // unrelated edit; precision fields left null
documentService.updateDocument(id, dto, null, null);
assertThat(doc.getMetaDatePrecision()).isEqualTo(DatePrecision.MONTH);
assertThat(doc.getMetaDateEnd()).isEqualTo(LocalDate.of(1916, 6, 30));
assertThat(doc.getMetaDateRaw()).isEqualTo("Juni 1916");
}
// ─── deleteTagCascading ───────────────────────────────────────────────────
@Test
@@ -1411,7 +1362,8 @@ class DocumentServiceTest {
.thenReturn(new PageImpl<>(List.of()));
documentService.searchDocuments(null, null, null, null, null, null, null, null,
org.raddatz.familienarchiv.document.DocumentSort.DATE, "DESC", null, false, org.springframework.data.domain.PageRequest.of(1, 50));
org.raddatz.familienarchiv.document.DocumentSort.DATE, "DESC", null,
org.springframework.data.domain.PageRequest.of(1, 50));
verify(documentRepository).findAll(any(org.springframework.data.jpa.domain.Specification.class), any(Pageable.class));
verify(documentRepository, never()).findAll(any(org.springframework.data.jpa.domain.Specification.class), any(Sort.class));
@@ -1424,7 +1376,8 @@ class DocumentServiceTest {
.thenReturn(new PageImpl<>(List.of()));
documentService.searchDocuments(null, null, null, null, null, null, null, null,
org.raddatz.familienarchiv.document.DocumentSort.DATE, "DESC", null, false, org.springframework.data.domain.PageRequest.of(3, 25));
org.raddatz.familienarchiv.document.DocumentSort.DATE, "DESC", null,
org.springframework.data.domain.PageRequest.of(3, 25));
verify(documentRepository).findAll(any(org.springframework.data.jpa.domain.Specification.class), captor.capture());
assertThat(captor.getValue().getPageNumber()).isEqualTo(3);
@@ -1440,7 +1393,8 @@ class DocumentServiceTest {
.thenReturn(new PageImpl<>(List.of(d), org.springframework.data.domain.PageRequest.of(0, 50), 120L));
DocumentSearchResult result = documentService.searchDocuments(null, null, null, null, null, null, null, null,
org.raddatz.familienarchiv.document.DocumentSort.DATE, "DESC", null, false, org.springframework.data.domain.PageRequest.of(0, 50));
org.raddatz.familienarchiv.document.DocumentSort.DATE, "DESC", null,
org.springframework.data.domain.PageRequest.of(0, 50));
assertThat(result.totalElements()).isEqualTo(120L);
assertThat(result.pageNumber()).isZero();
@@ -1449,50 +1403,6 @@ class DocumentServiceTest {
assertThat(result.items()).hasSize(1); // only the slice is enriched
}
@Test
void searchDocuments_dateSort_DESC_ordersUndatedLast() {
ArgumentCaptor<Pageable> captor = ArgumentCaptor.forClass(Pageable.class);
when(documentRepository.findAll(any(org.springframework.data.jpa.domain.Specification.class), any(Pageable.class)))
.thenReturn(new PageImpl<>(List.of()));
documentService.searchDocuments(null, null, null, null, null, null, null, null,
DocumentSort.DATE, "DESC", null, false, org.springframework.data.domain.PageRequest.of(0, 5));
verify(documentRepository).findAll(any(org.springframework.data.jpa.domain.Specification.class), captor.capture());
Sort.Order dateOrder = captor.getValue().getSort().getOrderFor("documentDate");
assertThat(dateOrder).isNotNull();
assertThat(dateOrder.getDirection()).isEqualTo(Sort.Direction.DESC);
assertThat(dateOrder.getNullHandling()).isEqualTo(Sort.NullHandling.NULLS_LAST);
// Owner-decided tiebreaker (#668): title ASC, not createdAt.
Sort.Order tiebreak = captor.getValue().getSort().getOrderFor("title");
assertThat(tiebreak).isNotNull();
assertThat(tiebreak.getDirection()).isEqualTo(Sort.Direction.ASC);
assertThat(captor.getValue().getSort().getOrderFor("createdAt")).isNull();
}
@Test
void searchDocuments_dateSort_ASC_ordersUndatedLast() {
// The ASC bug: Postgres puts NULLs FIRST on ascending sort without explicit
// NULLS LAST, surfacing undated documents at the top. This is the red.
ArgumentCaptor<Pageable> captor = ArgumentCaptor.forClass(Pageable.class);
when(documentRepository.findAll(any(org.springframework.data.jpa.domain.Specification.class), any(Pageable.class)))
.thenReturn(new PageImpl<>(List.of()));
documentService.searchDocuments(null, null, null, null, null, null, null, null,
DocumentSort.DATE, "ASC", null, false, org.springframework.data.domain.PageRequest.of(0, 5));
verify(documentRepository).findAll(any(org.springframework.data.jpa.domain.Specification.class), captor.capture());
Sort.Order dateOrder = captor.getValue().getSort().getOrderFor("documentDate");
assertThat(dateOrder).isNotNull();
assertThat(dateOrder.getDirection()).isEqualTo(Sort.Direction.ASC);
assertThat(dateOrder.getNullHandling()).isEqualTo(Sort.NullHandling.NULLS_LAST);
// Owner-decided tiebreaker (#668): title ASC, not createdAt.
Sort.Order tiebreak = captor.getValue().getSort().getOrderFor("title");
assertThat(tiebreak).isNotNull();
assertThat(tiebreak.getDirection()).isEqualTo(Sort.Direction.ASC);
assertThat(captor.getValue().getSort().getOrderFor("createdAt")).isNull();
}
@Test
void searchDocuments_UPDATED_AT_sort_resolves_to_updatedAt_field() {
ArgumentCaptor<Pageable> captor = ArgumentCaptor.forClass(Pageable.class);
@@ -1500,7 +1410,8 @@ class DocumentServiceTest {
.thenReturn(new PageImpl<>(List.of()));
documentService.searchDocuments(null, null, null, null, null, null, null, null,
DocumentSort.UPDATED_AT, "DESC", null, false, org.springframework.data.domain.PageRequest.of(0, 5));
DocumentSort.UPDATED_AT, "DESC", null,
org.springframework.data.domain.PageRequest.of(0, 5));
verify(documentRepository).findAll(any(org.springframework.data.jpa.domain.Specification.class), captor.capture());
assertThat(captor.getValue().getSort())
@@ -1524,7 +1435,8 @@ class DocumentServiceTest {
.thenReturn(all);
DocumentSearchResult result = documentService.searchDocuments(null, null, null, null, null, null, null, null,
org.raddatz.familienarchiv.document.DocumentSort.SENDER, "asc", null, false, org.springframework.data.domain.PageRequest.of(1, 50));
org.raddatz.familienarchiv.document.DocumentSort.SENDER, "asc", null,
org.springframework.data.domain.PageRequest.of(1, 50));
assertThat(result.totalElements()).isEqualTo(120L);
assertThat(result.pageNumber()).isEqualTo(1);
@@ -1532,7 +1444,7 @@ class DocumentServiceTest {
assertThat(result.totalPages()).isEqualTo(3);
assertThat(result.items()).hasSize(50);
// Page 1 (offset 50) under ascending sender sort should start at L050
assertThat(result.items().get(0).sender().getLastName()).isEqualTo("L050");
assertThat(result.items().get(0).document().getSender().getLastName()).isEqualTo("L050");
}
@Test
@@ -1548,7 +1460,8 @@ class DocumentServiceTest {
.thenReturn(all);
DocumentSearchResult result = documentService.searchDocuments(null, null, null, null, null, null, null, null,
org.raddatz.familienarchiv.document.DocumentSort.SENDER, "asc", null, false, org.springframework.data.domain.PageRequest.of(10, 50));
org.raddatz.familienarchiv.document.DocumentSort.SENDER, "asc", null,
org.springframework.data.domain.PageRequest.of(10, 50));
assertThat(result.items()).isEmpty();
assertThat(result.totalElements()).isEqualTo(30L);
@@ -1561,7 +1474,7 @@ class DocumentServiceTest {
when(documentRepository.findAll(any(org.springframework.data.jpa.domain.Specification.class), any(Pageable.class)))
.thenReturn(new PageImpl<>(List.of()));
documentService.searchDocuments(null, null, null, null, null, null, null, DocumentStatus.REVIEWED, null, null, null, false, UNPAGED);
documentService.searchDocuments(null, null, null, null, null, null, null, DocumentStatus.REVIEWED, null, null, null, UNPAGED);
verify(documentRepository).findAll(any(org.springframework.data.jpa.domain.Specification.class), any(Pageable.class));
}
@@ -1571,7 +1484,7 @@ class DocumentServiceTest {
when(documentRepository.findAll(any(org.springframework.data.jpa.domain.Specification.class), any(Pageable.class)))
.thenReturn(new PageImpl<>(List.of()));
documentService.searchDocuments(null, null, null, null, null, null, null, null, null, null, null, false, UNPAGED);
documentService.searchDocuments(null, null, null, null, null, null, null, null, null, null, null, UNPAGED);
verify(documentRepository).findAll(any(org.springframework.data.jpa.domain.Specification.class), any(Pageable.class));
}
@@ -1649,10 +1562,10 @@ class DocumentServiceTest {
.thenReturn(List.of(withSender, noSender));
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null, DocumentSort.SENDER, "asc", null, false, UNPAGED);
null, null, null, null, null, null, null, null, DocumentSort.SENDER, "asc", null, UNPAGED);
assertThat(result.items()).hasSize(2);
assertThat(result.items()).extracting(DocumentListItem::title).containsExactly("Has Sender", "No Sender");
assertThat(result.items()).extracting(item -> item.document().getTitle()).containsExactly("Has Sender", "No Sender");
}
// ─── searchDocuments — RECEIVER sort, empty receivers ───────────────────────
@@ -1669,117 +1582,12 @@ class DocumentServiceTest {
.thenReturn(List.of(noReceivers, withReceiver));
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null, DocumentSort.RECEIVER, "asc", null, false, UNPAGED);
null, null, null, null, null, null, null, null, DocumentSort.RECEIVER, "asc", null, UNPAGED);
assertThat(result.items()).extracting(DocumentListItem::title)
assertThat(result.items()).extracting(item -> item.document().getTitle())
.containsExactly("Has Receiver", "No Receivers");
}
// ─── searchDocuments — undated docs stay in their person group (#668) ───────
@Test
void searchDocuments_senderSort_asc_keepsUndatedInsideSenderGroupNotAtHead() {
// Locking test (#668): the in-memory SENDER comparator orders by sender name,
// not by date, so an undated (null documentDate) letter must stay WITHIN its
// sender's group — it must NOT float to the head of a multi-sender page.
// Two senders, each with a dated + an undated doc. ASC by "lastName firstName":
// "Adler Bob" < "Ziegler Anna", so both of Bob's docs come before both of Anna's.
// The undated doc supplied FIRST in the input proves grouping (not date) wins:
// were it ordered by date, the two undated docs would clump together at one end.
Person bobAdler = Person.builder().id(UUID.randomUUID()).firstName("Bob").lastName("Adler").build();
Person annaZiegler = Person.builder().id(UUID.randomUUID()).firstName("Anna").lastName("Ziegler").build();
Document undatedBob = Document.builder().id(UUID.randomUUID()).title("Bob undated")
.sender(bobAdler).documentDate(null).build();
Document datedBob = Document.builder().id(UUID.randomUUID()).title("Bob dated")
.sender(bobAdler).documentDate(LocalDate.of(1916, 6, 15)).build();
Document undatedAnna = Document.builder().id(UUID.randomUUID()).title("Anna undated")
.sender(annaZiegler).documentDate(null).build();
Document datedAnna = Document.builder().id(UUID.randomUUID()).title("Anna dated")
.sender(annaZiegler).documentDate(LocalDate.of(1943, 12, 24)).build();
// Input order interleaves dated/undated so a date-based regression would reorder.
when(documentRepository.findAll(any(org.springframework.data.jpa.domain.Specification.class)))
.thenReturn(List.of(undatedBob, datedAnna, datedBob, undatedAnna));
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null, DocumentSort.SENDER, "asc", null, false, UNPAGED);
// Bob's group precedes Anna's group (ASC by sender). The sort is stable, so
// within each group the input order is preserved (undatedBob, datedBob for Bob;
// datedAnna, undatedAnna for Anna). The undated docs never jump to the head and
// each stays inside its sender group — a date-based comparator would instead
// clump the two undated docs together at one end.
assertThat(result.items()).extracting(DocumentListItem::title)
.containsExactly("Bob undated", "Bob dated", "Anna dated", "Anna undated");
}
@Test
void searchDocuments_senderSort_desc_keepsUndatedInsideSenderGroupNotAtHead() {
// DESC symmetry for the in-memory path: sender order reverses ("Ziegler Anna"
// before "Adler Bob"), but the undated doc still sorts by sender, never by date,
// so it stays within its group and does not surface at the page head.
Person bobAdler = Person.builder().id(UUID.randomUUID()).firstName("Bob").lastName("Adler").build();
Person annaZiegler = Person.builder().id(UUID.randomUUID()).firstName("Anna").lastName("Ziegler").build();
Document undatedBob = Document.builder().id(UUID.randomUUID()).title("Bob undated")
.sender(bobAdler).documentDate(null).build();
Document datedBob = Document.builder().id(UUID.randomUUID()).title("Bob dated")
.sender(bobAdler).documentDate(LocalDate.of(1916, 6, 15)).build();
Document undatedAnna = Document.builder().id(UUID.randomUUID()).title("Anna undated")
.sender(annaZiegler).documentDate(null).build();
Document datedAnna = Document.builder().id(UUID.randomUUID()).title("Anna dated")
.sender(annaZiegler).documentDate(LocalDate.of(1943, 12, 24)).build();
when(documentRepository.findAll(any(org.springframework.data.jpa.domain.Specification.class)))
.thenReturn(List.of(undatedBob, datedAnna, datedBob, undatedAnna));
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null, DocumentSort.SENDER, "desc", null, false, UNPAGED);
// Anna's group precedes Bob's (DESC by sender); undated stays inside its group.
assertThat(result.items()).extracting(DocumentListItem::title)
.containsExactly("Anna dated", "Anna undated", "Bob undated", "Bob dated");
}
@Test
void searchDocuments_undatedTrue_withSenderSort_appliesUndatedSpecification() {
// Reachable UI state: "Nur undatierte" toggled on while grouped by sender.
// The SENDER sort takes the in-memory path, but the undatedOnly predicate must
// still be composed into the Specification handed to the repository — proven by
// capturing the spec passed to findAll and confirming it filters to null dates.
Person alice = Person.builder().id(UUID.randomUUID()).firstName("Alice").lastName("Ziegler").build();
Document undatedFromAlice = Document.builder().id(UUID.randomUUID()).title("Undated")
.sender(alice).documentDate(null).build();
org.mockito.ArgumentCaptor<org.springframework.data.jpa.domain.Specification<Document>> specCaptor =
org.mockito.ArgumentCaptor.forClass(org.springframework.data.jpa.domain.Specification.class);
when(documentRepository.findAll(specCaptor.capture()))
.thenReturn(List.of(undatedFromAlice));
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null, DocumentSort.SENDER, "asc", null, true, UNPAGED);
// The in-memory path queried via a Specification (built by buildSearchSpec with
// undatedOnly(true)) rather than skipping straight to a sorted findAll.
assertThat(specCaptor.getValue()).isNotNull();
assertThat(result.items()).extracting(DocumentListItem::title).containsExactly("Undated");
}
@Test
void searchDocuments_undatedTrue_usesSpecificationPath_notPureTextRelevanceShortcut() {
// undated=true must bypass the pure-text RELEVANCE SQL shortcut, which
// skips buildSearchSpec and would silently drop the undatedOnly predicate.
when(documentRepository.findAllMatchingIdsByFts("brief")).thenReturn(List.of(UUID.randomUUID()));
when(documentRepository.findAll(any(org.springframework.data.jpa.domain.Specification.class)))
.thenReturn(List.of());
documentService.searchDocuments("brief", null, null, null, null, null, null, null,
DocumentSort.RELEVANCE, null, null, true, UNPAGED);
// The FTS-id path (buildSearchSpec) ran; the raw-page SQL shortcut did not.
verify(documentRepository).findAllMatchingIdsByFts("brief");
verify(documentRepository, never()).findFtsPageRaw(anyString(), anyInt(), anyInt());
}
@Test
void searchDocuments_senderSort_nullLastNameSortsToEnd() {
// Without fix: null lastName produces sort key "null Smith" which compares
@@ -1796,10 +1604,10 @@ class DocumentServiceTest {
.thenReturn(List.of(docNullName, docSmith));
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null, DocumentSort.SENDER, "asc", null, false, UNPAGED);
null, null, null, null, null, null, null, null, DocumentSort.SENDER, "asc", null, UNPAGED);
// null lastName should sort to end (treated as empty), not before "smith" (as "null")
assertThat(result.items()).extracting(DocumentListItem::title)
assertThat(result.items()).extracting(item -> item.document().getTitle())
.containsExactly("smith doc", "Null lastname doc");
}
@@ -1819,7 +1627,7 @@ class DocumentServiceTest {
when(documentRepository.findEnrichmentData(any(), eq("Brief"))).thenReturn(rows);
DocumentSearchResult result = documentService.searchDocuments(
"Brief", null, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, false, UNPAGED);
"Brief", null, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, UNPAGED);
assertThat(result.items()).hasSize(1);
SearchMatchData md = result.items().get(0).matchData();
@@ -1833,7 +1641,8 @@ class DocumentServiceTest {
.thenReturn(new PageImpl<>(List.of()));
DocumentSearchResult result = documentService.searchDocuments(
null, null, null, null, null, null, null, null, null, null, null, false, UNPAGED);
null, null, null, null, null, null, null, null, null, null, null,
UNPAGED);
assertThat(result.items()).isEmpty();
}
@@ -1853,7 +1662,7 @@ class DocumentServiceTest {
when(documentRepository.findEnrichmentData(any(), eq("Brief"))).thenReturn(rows);
DocumentSearchResult result = documentService.searchDocuments(
"Brief", null, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, false, UNPAGED);
"Brief", null, null, null, null, null, null, null, DocumentSort.RELEVANCE, null, null, UNPAGED);
SearchMatchData md = result.items().get(0).matchData();
assertThat(md.transcriptionSnippet()).isEqualTo("Hier ist der Brief aus Berlin");
@@ -2370,7 +2179,7 @@ class DocumentServiceTest {
.thenReturn(List.of(d1, d2));
List<UUID> result = documentService.findIdsForFilter(
null, null, null, null, null, null, null, null, null, false);
null, null, null, null, null, null, null, null, null);
assertThat(result).containsExactly(d1.getId(), d2.getId());
}
@@ -2385,7 +2194,7 @@ class DocumentServiceTest {
when(tagService.expandTagNamesToDescendantIdSets(any())).thenReturn(List.of());
documentService.findIdsForFilter(
null, null, null, null, null, List.of("Brief"), null, null, TagOperator.OR, false);
null, null, null, null, null, List.of("Brief"), null, null, TagOperator.OR);
// Spec built without throwing → OR branch was exercised. Coverage gain
// is in not-throwing on the OR-specific code path; the actual SQL is
@@ -2398,7 +2207,7 @@ class DocumentServiceTest {
when(documentRepository.findAllMatchingIdsByFts("xyz")).thenReturn(List.of());
List<UUID> result = documentService.findIdsForFilter(
"xyz", null, null, null, null, null, null, null, null, false);
"xyz", null, null, null, null, null, null, null, null);
assertThat(result).isEmpty();
verify(documentRepository, never()).findAll(any(org.springframework.data.jpa.domain.Specification.class));
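
The sender-sort tests above rely on a stable in-memory sort keyed on the sender name, so undated letters stay inside their sender's group in both directions. The following is a minimal sketch of that idea, not the production comparator: `Sender`, `Letter`, `senderKey` and `sortedTitles` are hypothetical stand-ins, and the "lastName firstName" key is taken from the test comments.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SenderGroupingSketch {

    // Hypothetical stand-ins for the project's Person and Document entities.
    record Sender(String firstName, String lastName) {}
    record Letter(String title, Sender sender, LocalDate documentDate) {}

    // Assumed sort key "lastName firstName", as described in the test comments.
    static String senderKey(Letter letter) {
        return letter.sender().lastName() + " " + letter.sender().firstName();
    }

    // List.sort is stable: letters with equal sender keys keep their input order,
    // so the document date never influences the result.
    static List<String> sortedTitles(List<Letter> input, boolean ascending) {
        Comparator<Letter> bySender = Comparator.comparing(SenderGroupingSketch::senderKey);
        List<Letter> copy = new ArrayList<>(input);
        copy.sort(ascending ? bySender : bySender.reversed());
        return copy.stream().map(Letter::title).toList();
    }

    // Same fixture shape as the tests: two senders, each with a dated and an undated letter.
    static List<Letter> fixture() {
        Sender bob = new Sender("Bob", "Adler");
        Sender anna = new Sender("Anna", "Ziegler");
        return List.of(
                new Letter("Bob undated", bob, null),
                new Letter("Anna dated", anna, LocalDate.of(1943, 12, 24)),
                new Letter("Bob dated", bob, LocalDate.of(1916, 6, 15)),
                new Letter("Anna undated", anna, null));
    }

    public static void main(String[] args) {
        System.out.println(sortedTitles(fixture(), true));
        System.out.println(sortedTitles(fixture(), false));
    }
}
```

With this fixture the ascending call yields Bob's two letters followed by Anna's, and the descending call yields Anna's followed by Bob's. Within each group the input order is preserved, which is the ordering the two `keepsUndatedInsideSenderGroupNotAtHead` tests assert.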

@@ -261,21 +261,4 @@ class DocumentSpecificationsTest {
assertThat(result).isEmpty();
}
// ─── undatedOnly ──────────────────────────────────────────────────────────
@Test
void undatedOnly_false_returnsAllDocuments() {
// false → no predicate (null), so the filter is a no-op (issue #668).
List<Document> result = documentRepository.findAll(Specification.where(undatedOnly(false)));
assertThat(result).hasSize(3);
}
@Test
void undatedOnly_true_returnsOnlyDocumentsWithoutADate() {
// Only the placeholder photo has a null documentDate in the fixture.
List<Document> result = documentRepository.findAll(Specification.where(undatedOnly(true)));
assertThat(result).extracting(Document::getTitle).containsExactly("Familienfoto");
assertThat(result).allMatch(d -> d.getDocumentDate() == null);
}
}

@@ -1,149 +0,0 @@
package org.raddatz.familienarchiv.document;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.raddatz.familienarchiv.config.FlywayConfig;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.data.jpa.test.autoconfigure.DataJpaTest;
import org.springframework.boot.jdbc.test.autoconfigure.AutoConfigureTestDatabase;
import org.springframework.context.annotation.Import;
import org.springframework.data.domain.Sort;
import org.springframework.data.jpa.domain.Specification;
import java.time.LocalDate;
import java.util.List;
import static org.assertj.core.api.Assertions.assertThat;
import static org.raddatz.familienarchiv.document.DocumentSpecifications.isBetween;
import static org.raddatz.familienarchiv.document.DocumentSpecifications.undatedOnly;
/**
* Real-Postgres assertions for issue #668. H2 disagrees with Postgres on
* {@code NULLS FIRST/LAST} defaults and on whether {@code BETWEEN} excludes
* NULL, so these guarantees MUST run against {@code postgres:16-alpine}, never
* an in-memory database.
*/
@DataJpaTest
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
@Import({PostgresContainerConfig.class, FlywayConfig.class})
class UndatedDocumentOrderingIntegrationTest {
@Autowired DocumentRepository documentRepository;
@BeforeEach
void setUp() {
documentRepository.deleteAll();
save("1916", LocalDate.of(1916, 6, 15));
save("1943", LocalDate.of(1943, 12, 24));
save("undated-a", null);
save("undated-b", null);
}
private void save(String title, LocalDate date) {
documentRepository.save(Document.builder()
.title(title)
.originalFilename(title + ".pdf")
.status(DocumentStatus.UPLOADED)
.metaDatePrecision(date == null ? DatePrecision.UNKNOWN : DatePrecision.DAY)
.documentDate(date)
.build());
}
@Test
void dateAscWithNullsLast_returnsDatedFirstUndatedLast() {
Sort sort = Sort.by(new Sort.Order(Sort.Direction.ASC, "documentDate").nullsLast());
List<Document> result = documentRepository.findAll(sort);
assertThat(result).hasSize(4);
assertThat(result.get(0).getDocumentDate()).isEqualTo(LocalDate.of(1916, 6, 15));
assertThat(result.get(1).getDocumentDate()).isEqualTo(LocalDate.of(1943, 12, 24));
assertThat(result.get(2).getDocumentDate()).isNull();
assertThat(result.get(3).getDocumentDate()).isNull();
}
@Test
void sameDate_tiebreaksByTitleAsc_notCreatedAt_forBothDirections() throws Exception {
// Owner decision (#668): equal-date rows tie-break by title ASC, NOT
// createdAt. Insert two same-date docs so that createdAt order (insertion
// order) is the OPPOSITE of title order: the first-saved doc gets the later
// title ("zzz-first"), the second-saved doc gets the earlier title
// ("aaa-second"). If the tiebreaker were still createdAt-asc the first-saved
// row would lead; because it is title-asc the "aaa-second" row must lead —
// and it must lead in BOTH ASC and DESC date directions, since the date is
// equal so only the title tiebreaker decides.
//
// The Sort under test is built by the PRODUCTION resolveSort(DATE, dir) (via
// reflection — it is private), not hand-rolled here, so this test proves the
// real Postgres ordering that production emits, on real same-date rows.
documentRepository.deleteAll();
LocalDate sameDate = LocalDate.of(1920, 3, 3);
save("zzz-first", sameDate); // saved first → earlier createdAt
save("aaa-second", sameDate); // saved second → later createdAt
List<Document> asc = documentRepository.findAll(resolveProductionSort("ASC"));
assertThat(asc).extracting(Document::getTitle)
.containsExactly("aaa-second", "zzz-first");
List<Document> desc = documentRepository.findAll(resolveProductionSort("DESC"));
assertThat(desc).extracting(Document::getTitle)
.containsExactly("aaa-second", "zzz-first");
}
/**
* Invokes the production {@link DocumentService#resolveSort(DocumentSort, String)}
* for the DATE sort so the integration assertions exercise the real tiebreaker
* choice rather than a sort hand-built in the test.
*/
private Sort resolveProductionSort(String dir) throws Exception {
// resolveSort is a pure function of its arguments (uses no instance state), so a
// bean instance with null collaborators is sufficient to exercise it.
var ctor = DocumentService.class.getDeclaredConstructors()[0];
ctor.setAccessible(true);
Object[] args = new Object[ctor.getParameterCount()];
DocumentService service = (DocumentService) ctor.newInstance(args);
var m = DocumentService.class.getDeclaredMethod("resolveSort", DocumentSort.class, String.class);
m.setAccessible(true);
return (Sort) m.invoke(service, DocumentSort.DATE, dir);
}
@Test
void undatedOnly_returnsExactlyTheNullDatedRows() {
List<Document> result = documentRepository.findAll(undatedOnly(true));
assertThat(result).hasSize(2);
assertThat(result).allMatch(d -> d.getDocumentDate() == null);
}
@Test
void undatedOnly_false_returnsAllRows() {
Specification<Document> spec = Specification.where(undatedOnly(false));
List<Document> result = documentRepository.findAll(spec);
assertThat(result).hasSize(4);
}
@Test
void dateRange_excludesUndatedRows() {
List<Document> result = documentRepository.findAll(isBetween(
LocalDate.of(1900, 1, 1), LocalDate.of(2000, 12, 31)));
assertThat(result).hasSize(2);
assertThat(result).allMatch(d -> d.getDocumentDate() != null);
}
@Test
void undatedOnly_combinedWithDateRange_returnsEmpty() {
// The collision rule (#668): a from/to range and undated=true are mutually
// exclusive — a row cannot both have a null date and fall inside a range.
Specification<Document> spec = Specification
.where(undatedOnly(true))
.and(isBetween(LocalDate.of(1900, 1, 1), LocalDate.of(2000, 12, 31)));
List<Document> result = documentRepository.findAll(spec);
assertThat(result).isEmpty();
}
}
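
The ordering rules this integration test pins down (dated rows first, undated rows last, equal dates tie-broken by title ascending in both directions) can be illustrated with a plain in-memory comparator. This is only a sketch of the intended rules: it does not reproduce Postgres behaviour, which is exactly why the test above runs against a real database. `Doc`, `byDate` and `sortedTitles` are hypothetical names, and keeping nulls last for the descending direction is an assumption.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DateOrderingSketch {

    // Hypothetical stand-in for the Document entity.
    record Doc(String title, LocalDate documentDate) {}

    // Date in the requested direction, nulls last (assumed for both directions),
    // then title ascending as the tiebreaker regardless of date direction.
    static Comparator<Doc> byDate(boolean ascending) {
        Comparator<LocalDate> dateOrder = ascending
                ? Comparator.<LocalDate>naturalOrder()
                : Comparator.<LocalDate>reverseOrder();
        Comparator<Doc> byDateOnly =
                Comparator.comparing(Doc::documentDate, Comparator.nullsLast(dateOrder));
        return byDateOnly.thenComparing(Doc::title);
    }

    static List<String> sortedTitles(List<Doc> docs, boolean ascending) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(byDate(ascending));
        return copy.stream().map(Doc::title).toList();
    }

    public static void main(String[] args) {
        LocalDate sameDate = LocalDate.of(1920, 3, 3);
        List<Doc> tie = List.of(new Doc("zzz-first", sameDate), new Doc("aaa-second", sameDate));
        System.out.println(sortedTitles(tie, true));
        System.out.println(sortedTitles(tie, false));
    }
}
```

Because the tiebreaker is attached after the date comparison and is not reversed with it, two documents sharing a date come out title-ascending whichever date direction is requested.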

@@ -1,229 +0,0 @@
package org.raddatz.familienarchiv.importing;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentRepository;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonRepository;
import org.raddatz.familienarchiv.tag.TagRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.annotation.Import;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.test.util.ReflectionTestUtils;
import software.amazon.awssdk.services.s3.S3Client;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import static org.assertj.core.api.Assertions.assertThat;
/**
* Real Postgres (Testcontainers) integration test for the canonical importer. The
* {@code UNIQUE(source_ref)} constraint and the upsert-on-conflict behaviour only exist
* in real Postgres (never H2), so idempotency is verified here. S3 is mocked — the
* synthetic document rows carry no on-disk files, so every document is a PLACEHOLDER and
* no upload is attempted.
*/
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.NONE)
@ActiveProfiles("test")
@Import(PostgresContainerConfig.class)
class CanonicalImportIntegrationTest {
@MockitoBean S3Client s3Client;
@Autowired CanonicalImportOrchestrator orchestrator;
@Autowired PersonRepository personRepository;
@Autowired TagRepository tagRepository;
@Autowired DocumentRepository documentRepository;
Path artifactDir;
@BeforeEach
void setUp() throws Exception {
documentRepository.deleteAll();
personRepository.deleteAll();
tagRepository.deleteAll();
artifactDir = Files.createTempDirectory("canonical-import-it");
writeArtifacts(artifactDir);
ReflectionTestUtils.setField(orchestrator, "canonicalDir", artifactDir.toString());
}
/**
* The import commits through its own transactions (the orchestrator is not transactional),
* so this test cannot rely on {@code @Transactional} rollback for isolation. Delete the
* committed rows after each test — otherwise the last test's documents (dated 1888-02) and
* persons/tags leak into the shared Testcontainers Postgres and pollute other integration
* tests that assume a known seed (e.g. DocumentDensityIntegrationTest,
* DocumentSearchPagedIntegrationTest). Mirrors the @AfterEach deleteAll convention used by
* DocumentListItemIntegrationTest.
*/
@AfterEach
void cleanup() {
documentRepository.deleteAll();
personRepository.deleteAll();
tagRepository.deleteAll();
}
@Test
void reimport_isIdempotent_noDuplicatePersonsTagsOrDocuments() {
orchestrator.runImport();
long personsAfterFirst = personRepository.count();
long tagsAfterFirst = tagRepository.count();
long documentsAfterFirst = documentRepository.count();
assertThat(orchestrator.getStatus().state()).isEqualTo(ImportStatus.State.DONE);
assertThat(personsAfterFirst).isPositive();
assertThat(tagsAfterFirst).isPositive();
assertThat(documentsAfterFirst).isPositive();
orchestrator.runImport();
assertThat(personRepository.count()).isEqualTo(personsAfterFirst);
assertThat(tagRepository.count()).isEqualTo(tagsAfterFirst);
assertThat(documentRepository.count()).isEqualTo(documentsAfterFirst);
}
@Test
void reimport_preservesHumanEditedPersonField() {
orchestrator.runImport();
Person walter = personRepository.findBySourceRef("de-gruyter-walter").orElseThrow();
walter.setNotes("Verified by archivist");
walter.setFirstName("Walther");
personRepository.save(walter);
orchestrator.runImport();
Person reimported = personRepository.findBySourceRef("de-gruyter-walter").orElseThrow();
assertThat(reimported.getNotes()).isEqualTo("Verified by archivist");
assertThat(reimported.getFirstName()).isEqualTo("Walther");
}
@Test
void import_linksDocumentSenderToRegisterPerson_andRetainsRawText() {
orchestrator.runImport();
Person walter = personRepository.findBySourceRef("de-gruyter-walter").orElseThrow();
Document doc = documentRepository.findByOriginalFilename("W-0001").orElseThrow();
assertThat(doc.getSender()).isNotNull();
assertThat(doc.getSender().getId()).isEqualTo(walter.getId());
assertThat(doc.getSenderText()).isEqualTo("Walter de Gruyter");
assertThat(doc.getStatus()).isEqualTo(DocumentStatus.PLACEHOLDER);
}
@Test
void import_provisionalFlag_trueForImporterCreated_falseForRegister() {
orchestrator.runImport();
Optional<Person> register = personRepository.findBySourceRef("de-gruyter-walter");
assertThat(register).get().extracting(Person::isProvisional).isEqualTo(false);
}
@Test
void reimport_prunesRemovedReceiverAndTag_whenCanonicalRowShrinks() throws Exception {
orchestrator.runImport();
// findById uses the Document.full entity graph so receivers/tags initialise eagerly.
Document before = documentRepository.findById(
documentRepository.findByOriginalFilename("W-0001").orElseThrow().getId()).orElseThrow();
assertThat(before.getReceivers()).isNotEmpty();
assertThat(before.getTags()).isNotEmpty();
// Re-stage the document sheet with W-0001's receiver and tag removed.
writeSheet(artifactDir.resolve("canonical-documents.xlsx"),
List.of("index", "sender_person_id", "sender_name", "receiver_person_ids",
"receiver_names", "date_iso", "date_raw", "date_precision", "date_end", "location", "tags", "summary"),
List.of(
List.of("W-0001", "de-gruyter-walter", "Walter de Gruyter",
"", "", "1888-02-15", "15.2.1888", "DAY", "", "Rotterdam", "", "Geschäftsreise"),
List.of("W-0002", "de-gruyter-eugenie", "Eugenie de Gruyter",
"de-gruyter-walter", "Walter de Gruyter", "1888-02-16", "16.2.1888", "DAY", "",
"Middelburg", "Themen/Brautbriefe", "Reisepläne")));
orchestrator.runImport();
Document after = documentRepository.findById(before.getId()).orElseThrow();
assertThat(after.getReceivers()).isEmpty();
assertThat(after.getTags()).isEmpty();
}
@Test
void import_neverFlipsRegisterPersonToProvisional_whenReferencedByDocumentRow() {
// de-gruyter-walter is a register person (provisional=false) AND the sender of W-0001.
// The orchestrator loads the register before documents, so the document loader's
// register-first match links the existing person and never mints a provisional one.
// A second run (documents reference the same person again) must not flip it true.
orchestrator.runImport();
orchestrator.runImport();
Person walter = personRepository.findBySourceRef("de-gruyter-walter").orElseThrow();
assertThat(walter.isProvisional()).isFalse();
Person eugenie = personRepository.findBySourceRef("de-gruyter-eugenie").orElseThrow();
assertThat(eugenie.isProvisional()).isFalse();
}
// ─── synthetic-but-real artifact set ─────────────────────────────────────────────
private void writeArtifacts(Path dir) throws Exception {
writeSheet(dir.resolve("canonical-tag-tree.xlsx"),
List.of("tag_path", "parent_name", "tag_name"),
List.of(
List.of("Themen", "", "Themen"),
List.of("Themen/Brautbriefe", "Themen", "Brautbriefe")));
writeSheet(dir.resolve("canonical-persons.xlsx"),
List.of("person_id", "last_name", "first_name", "maiden_name", "notes", "birth_date", "death_date", "provisional"),
List.of(
List.of("de-gruyter-walter", "de Gruyter", "Walter", "", "", "1865-01-01", "", "False"),
List.of("de-gruyter-eugenie", "de Gruyter", "Eugenie", "Wöhler", "", "", "", "False")));
Files.writeString(dir.resolve("canonical-persons-tree.json"), """
{"persons":[
{"rowId":"row_1","firstName":"Walter","lastName":"de Gruyter","familyMember":true,"personId":"de-gruyter-walter"},
{"rowId":"row_2","firstName":"Eugenie","lastName":"de Gruyter","maidenName":"Wöhler","familyMember":true,"personId":"de-gruyter-eugenie"}
],"relationships":[
{"personId":"row_1","relatedPersonId":"row_2","type":"SPOUSE_OF","source":"verheiratet_mit"}
]}
""");
writeSheet(dir.resolve("canonical-documents.xlsx"),
List.of("index", "sender_person_id", "sender_name", "receiver_person_ids",
"receiver_names", "date_iso", "date_raw", "date_precision", "date_end", "location", "tags", "summary"),
List.of(
List.of("W-0001", "de-gruyter-walter", "Walter de Gruyter",
"de-gruyter-eugenie", "Eugenie de Gruyter", "1888-02-15", "15.2.1888", "DAY", "",
"Rotterdam", "Themen/Brautbriefe", "Geschäftsreise"),
List.of("W-0002", "de-gruyter-eugenie", "Eugenie de Gruyter",
"de-gruyter-walter", "Walter de Gruyter", "1888-02-16", "16.2.1888", "DAY", "",
"Middelburg", "Themen/Brautbriefe", "Reisepläne")));
}
private void writeSheet(Path file, List<String> headers, List<List<String>> rows) throws Exception {
try (XSSFWorkbook wb = new XSSFWorkbook()) {
Sheet sheet = wb.createSheet("Sheet1");
Row header = sheet.createRow(0);
for (int i = 0; i < headers.size(); i++) {
header.createCell(i).setCellValue(headers.get(i));
}
for (int r = 0; r < rows.size(); r++) {
Row row = sheet.createRow(r + 1);
List<String> values = rows.get(r);
for (int c = 0; c < values.size(); c++) {
row.createCell(c).setCellValue(values.get(c));
}
}
try (OutputStream out = Files.newOutputStream(file)) {
wb.write(out);
}
}
}
}

@@ -1,183 +0,0 @@
package org.raddatz.familienarchiv.importing;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.junit.jupiter.api.io.TempDir;
import org.mockito.InOrder;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.person.relationship.RelationType;
import org.raddatz.familienarchiv.person.relationship.RelationshipService;
import org.raddatz.familienarchiv.person.relationship.dto.NetworkDTO;
import org.raddatz.familienarchiv.person.relationship.dto.PersonNodeDTO;
import org.raddatz.familienarchiv.person.relationship.dto.RelationshipDTO;
import org.springframework.test.util.ReflectionTestUtils;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.inOrder;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
@ExtendWith(MockitoExtension.class)
class CanonicalImportOrchestratorTest {
@Mock TagTreeImporter tagTreeImporter;
@Mock PersonRegisterImporter personRegisterImporter;
@Mock PersonTreeImporter personTreeImporter;
@Mock DocumentImporter documentImporter;
@Mock RelationshipService relationshipService;
private CanonicalImportOrchestrator orchestrator(Path dir) {
CanonicalImportOrchestrator o = new CanonicalImportOrchestrator(
tagTreeImporter, personRegisterImporter, personTreeImporter, documentImporter,
relationshipService);
ReflectionTestUtils.setField(o, "canonicalDir", dir.toString());
return o;
}
private void writeAllArtifacts(Path dir) throws Exception {
Files.writeString(dir.resolve("canonical-tag-tree.xlsx"), "x");
Files.writeString(dir.resolve("canonical-persons.xlsx"), "x");
Files.writeString(dir.resolve("canonical-persons-tree.json"), "x");
Files.writeString(dir.resolve("canonical-documents.xlsx"), "x");
}
@Test
void getStatus_isIdleByDefault(@TempDir Path dir) {
assertThat(orchestrator(dir).getStatus().state()).isEqualTo(ImportStatus.State.IDLE);
}
@Test
void runImport_loadsTagsAndPersonsBeforeDocuments(@TempDir Path dir) throws Exception {
writeAllArtifacts(dir);
when(documentImporter.load(any())).thenReturn(new DocumentImporter.LoadResult(0, List.of()));
when(relationshipService.getFamilyNetwork()).thenReturn(new NetworkDTO(List.of(), List.of()));
CanonicalImportOrchestrator o = orchestrator(dir);
o.runImport();
InOrder order = inOrder(tagTreeImporter, personRegisterImporter, personTreeImporter, documentImporter);
order.verify(tagTreeImporter).load(any());
order.verify(personRegisterImporter).load(any());
order.verify(personTreeImporter).load(any());
order.verify(documentImporter).load(any());
}
@Test
void runImport_setsStatusDone_onSuccess(@TempDir Path dir) throws Exception {
writeAllArtifacts(dir);
when(documentImporter.load(any())).thenReturn(new DocumentImporter.LoadResult(3, List.of()));
when(relationshipService.getFamilyNetwork()).thenReturn(new NetworkDTO(List.of(), List.of()));
CanonicalImportOrchestrator o = orchestrator(dir);
o.runImport();
assertThat(o.getStatus().state()).isEqualTo(ImportStatus.State.DONE);
assertThat(o.getStatus().processed()).isEqualTo(3);
}
@Test
void runImport_failsClosed_whenAnArtifactIsMissing(@TempDir Path dir) throws Exception {
Files.writeString(dir.resolve("canonical-tag-tree.xlsx"), "x");
// the other three artifacts are absent
CanonicalImportOrchestrator o = orchestrator(dir);
o.runImport();
assertThat(o.getStatus().state()).isEqualTo(ImportStatus.State.FAILED);
verify(tagTreeImporter, never()).load(any());
verify(documentImporter, never()).load(any());
}
@Test
void runImport_setsStatusFailed_whenLoaderThrows(@TempDir Path dir) throws Exception {
writeAllArtifacts(dir);
when(tagTreeImporter.load(any())).thenThrow(DomainException.badRequest(
org.raddatz.familienarchiv.exception.ErrorCode.IMPORT_ARTIFACT_INVALID, "bad"));
CanonicalImportOrchestrator o = orchestrator(dir);
o.runImport();
assertThat(o.getStatus().state()).isEqualTo(ImportStatus.State.FAILED);
verify(documentImporter, never()).load(any());
}
@Test
void runImportAsync_throwsConflict_whenAlreadyRunning(@TempDir Path dir) {
CanonicalImportOrchestrator o = orchestrator(dir);
ReflectionTestUtils.setField(o, "currentStatus", new ImportStatus(
ImportStatus.State.RUNNING, "IMPORT_RUNNING", "running", 0, List.of(), null));
assertThatThrownBy(o::runImportAsync)
.isInstanceOf(DomainException.class)
.hasMessageContaining("already in progress");
}
@Test
void runImport_aggregatesDocumentSkips(@TempDir Path dir) throws Exception {
writeAllArtifacts(dir);
when(documentImporter.load(any())).thenReturn(new DocumentImporter.LoadResult(1,
List.of(new ImportStatus.SkippedFile("fake.pdf", ImportStatus.SkipReason.INVALID_PDF_SIGNATURE))));
when(relationshipService.getFamilyNetwork()).thenReturn(new NetworkDTO(List.of(), List.of()));
CanonicalImportOrchestrator o = orchestrator(dir);
o.runImport();
assertThat(o.getStatus().skipped()).isEqualTo(1);
assertThat(o.getStatus().skippedFiles())
.extracting(ImportStatus.SkippedFile::filename)
.containsExactly("fake.pdf");
}
// ─── generation monotonicity soft-check (#689) ─────────────────────────────
@Test
void runImport_invokesGetFamilyNetwork_afterPersonLoaders_beforeDocuments(@TempDir Path dir) throws Exception {
writeAllArtifacts(dir);
when(documentImporter.load(any())).thenReturn(new DocumentImporter.LoadResult(0, List.of()));
when(relationshipService.getFamilyNetwork()).thenReturn(new NetworkDTO(List.of(), List.of()));
CanonicalImportOrchestrator o = orchestrator(dir);
o.runImport();
InOrder order = inOrder(personRegisterImporter, personTreeImporter, relationshipService, documentImporter);
order.verify(personRegisterImporter).load(any());
order.verify(personTreeImporter).load(any());
order.verify(relationshipService).getFamilyNetwork();
order.verify(documentImporter).load(any());
}
@Test
void runImport_completes_evenWhenMonotonicityViolatingEdgePresent(@TempDir Path dir) throws Exception {
// child.generation (2) <= parent.generation (3) — monotonicity violation.
// The orchestrator must WARN and continue; it must not abort or fail-closed.
writeAllArtifacts(dir);
UUID parentId = UUID.randomUUID();
UUID childId = UUID.randomUUID();
PersonNodeDTO parent = new PersonNodeDTO(parentId, "Parent", null, null, 3, true);
PersonNodeDTO child = new PersonNodeDTO(childId, "Child", null, null, 2, true);
RelationshipDTO edge = new RelationshipDTO(
UUID.randomUUID(), parentId, childId,
"Parent", null, null, "Child", null, null,
RelationType.PARENT_OF, null, null, null);
when(relationshipService.getFamilyNetwork())
.thenReturn(new NetworkDTO(List.of(parent, child), List.of(edge)));
when(documentImporter.load(any())).thenReturn(new DocumentImporter.LoadResult(0, List.of()));
CanonicalImportOrchestrator o = orchestrator(dir);
o.runImport();
assertThat(o.getStatus().state()).isEqualTo(ImportStatus.State.DONE);
verify(documentImporter).load(any());
}
}
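
Note on the monotonicity soft-check tested above: the orchestrator code is not part of this diff, so the following is only a minimal sketch of what such a check could look like. The class `MonotonicitySketch`, the `Edge` record and the use of plain string ids are hypothetical stand-ins for `PersonNodeDTO` and `RelationshipDTO`. It collects every PARENT_OF edge whose child generation is less than or equal to the parent generation, so the caller can log a warning and keep going.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MonotonicitySketch {
    // Hypothetical stand-in for a PARENT_OF relationship; not the project's RelationshipDTO.
    record Edge(String parentId, String childId) {}

    private MonotonicitySketch() {}

    // Returns the edges that violate "child.generation > parent.generation".
    // The caller is expected to WARN about them and continue, never to abort.
    static List<Edge> findViolations(Map<String, Integer> generationById, List<Edge> parentOfEdges) {
        List<Edge> violations = new ArrayList<>();
        for (Edge edge : parentOfEdges) {
            Integer parentGen = generationById.get(edge.parentId());
            Integer childGen = generationById.get(edge.childId());
            if (parentGen == null || childGen == null) {
                continue; // unknown generation: nothing to compare
            }
            if (childGen <= parentGen) {
                violations.add(edge);
            }
        }
        return violations;
    }
}
```

Returning the violations rather than throwing keeps the check "soft", which matches the test's expectation that the import still ends in state DONE.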

@@ -1,115 +0,0 @@
package org.raddatz.familienarchiv.importing;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;
import org.raddatz.familienarchiv.exception.DomainException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
class CanonicalSheetReaderTest {
@Test
void readRows_mapsCellsByHeaderName(@TempDir Path tempDir) throws Exception {
Path xlsx = write(tempDir, List.of("index", "file"), List.of(List.of("W-0001", "scan.pdf")));
List<CanonicalSheetReader.Row> rows = CanonicalSheetReader.readRows(xlsx.toFile(), List.of("index", "file"));
assertThat(rows).hasSize(1);
assertThat(rows.get(0).get("index")).isEqualTo("W-0001");
assertThat(rows.get(0).get("file")).isEqualTo("scan.pdf");
}
@Test
void readRows_throwsBadRequest_whenRequiredHeaderMissing(@TempDir Path tempDir) throws Exception {
Path xlsx = write(tempDir, List.of("index"), List.of(List.of("W-0001")));
assertThatThrownBy(() -> CanonicalSheetReader.readRows(xlsx.toFile(), List.of("index", "file")))
.isInstanceOf(DomainException.class)
.hasMessageContaining("file");
}
@Test
void get_returnsEmptyString_forBlankCell(@TempDir Path tempDir) throws Exception {
Path xlsx = write(tempDir, List.of("index", "file"), List.of(List.of("W-0001", "")));
List<CanonicalSheetReader.Row> rows = CanonicalSheetReader.readRows(xlsx.toFile(), List.of("index", "file"));
assertThat(rows.get(0).get("file")).isEmpty();
}
@Test
void get_returnsEmptyString_forUnknownColumn(@TempDir Path tempDir) throws Exception {
Path xlsx = write(tempDir, List.of("index"), List.of(List.of("W-0001")));
List<CanonicalSheetReader.Row> rows = CanonicalSheetReader.readRows(xlsx.toFile(), List.of("index"));
assertThat(rows.get(0).get("does_not_exist")).isEmpty();
}
@Test
void get_returnsEmptyString_forTrailingColumns_whenRowShorterThanHeader(@TempDir Path tempDir) throws Exception {
// POI omits trailing empty cells, so a real-world artifact row can be narrower than
// the header. The missing columns must read as "" rather than throwing.
Path xlsx = write(tempDir,
List.of("index", "file", "summary"),
List.of(List.of("W-0001")));
List<CanonicalSheetReader.Row> rows = CanonicalSheetReader.readRows(xlsx.toFile(), List.of("index", "file", "summary"));
assertThat(rows.get(0).get("index")).isEqualTo("W-0001");
assertThat(rows.get(0).get("file")).isEmpty();
assertThat(rows.get(0).get("summary")).isEmpty();
}
@Test
void splitList_splitsOnPipe() {
assertThat(CanonicalSheetReader.splitList("a|b|c")).containsExactly("a", "b", "c");
}
@Test
void splitList_returnsEmptyList_forBlank() {
assertThat(CanonicalSheetReader.splitList("")).isEmpty();
assertThat(CanonicalSheetReader.splitList(" ")).isEmpty();
}
@Test
void splitList_returnsSingleElement_whenNoPipe() {
assertThat(CanonicalSheetReader.splitList("solo")).containsExactly("solo");
}
@Test
void splitList_trimsAndDropsEmptySegments() {
assertThat(CanonicalSheetReader.splitList("a| |b")).containsExactly("a", "b");
}
private Path write(Path dir, List<String> headers, List<List<String>> dataRows) throws Exception {
Path xlsx = dir.resolve("sheet.xlsx");
try (XSSFWorkbook wb = new XSSFWorkbook()) {
Sheet sheet = wb.createSheet("Sheet1");
Row header = sheet.createRow(0);
for (int i = 0; i < headers.size(); i++) {
header.createCell(i).setCellValue(headers.get(i));
}
for (int r = 0; r < dataRows.size(); r++) {
Row row = sheet.createRow(r + 1);
List<String> values = dataRows.get(r);
for (int c = 0; c < values.size(); c++) {
row.createCell(c).setCellValue(values.get(c));
}
}
try (OutputStream out = Files.newOutputStream(xlsx)) {
wb.write(out);
}
}
return xlsx;
}
}
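
Note on `splitList`: the four `splitList_*` tests above pin down a small contract (split on the pipe, trim each segment, drop empty segments, blank input yields an empty list). The production `CanonicalSheetReader` is not shown in this diff, so the class below is a hypothetical sketch that satisfies the same contract, not the project's implementation. The null handling is an extra assumption that the tests do not cover.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitListSketch {
    private SplitListSketch() {}

    // Splits a pipe-separated cell value into trimmed, non-empty segments.
    static List<String> splitList(String raw) {
        List<String> out = new ArrayList<>();
        if (raw == null || raw.isBlank()) {
            return out; // blank cell: no entries (null is an assumption of this sketch)
        }
        // The pipe is a regex metacharacter, so it has to be escaped for String.split.
        for (String segment : raw.split("\\|")) {
            String trimmed = segment.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }
}
```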

@@ -1,656 +0,0 @@
package org.raddatz.familienarchiv.importing;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.junit.jupiter.api.io.TempDir;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.document.ThumbnailAsyncRunner;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonService;
import org.raddatz.familienarchiv.person.PersonUpsertCommand;
import org.raddatz.familienarchiv.tag.Tag;
import org.raddatz.familienarchiv.tag.TagService;
import org.springframework.test.util.ReflectionTestUtils;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import java.io.File;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.lenient;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
@ExtendWith(MockitoExtension.class)
class DocumentImporterTest {
@Mock DocumentService documentService;
@Mock PersonService personService;
@Mock TagService tagService;
@Mock S3Client s3Client;
@Mock ThumbnailAsyncRunner thumbnailAsyncRunner;
@Mock FileStreamOpener fileStreamOpener;
DocumentImporter importer;
@BeforeEach
void setUp() throws java.io.IOException {
// Default opener delegates to FileInputStream — tests that need to force an IOException
// override this stub locally (load_skipsFile_whenMagicByteCheckThrowsIoException).
lenient().when(fileStreamOpener.open(any(File.class)))
.thenAnswer(inv -> new java.io.FileInputStream(inv.getArgument(0, File.class)));
importer = new DocumentImporter(documentService, personService, tagService, s3Client,
thumbnailAsyncRunner, fileStreamOpener);
ReflectionTestUtils.setField(importer, "bucketName", "test-bucket");
}
// ─── index validation — a malicious/garbage index can never reach disk I/O ─────────
@Test
void isValidImportIndex_returnsFalse_whenNull() {
assertThat(validIndex(null)).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenBlank() {
assertThat(validIndex(" ")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenForwardSlash() {
assertThat(validIndex("etc/passwd")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenBackslash() {
assertThat(validIndex("..\\etc\\passwd")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenDotDot() {
assertThat(validIndex("W-..0001")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenIsDotDot() {
assertThat(validIndex("..")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenSingleDot() {
assertThat(validIndex(".")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenAbsolutePath() {
assertThat(validIndex("/etc/passwd")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenNullByte() {
assertThat(validIndex("W-0001\0")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenUnicodeDivisionSlash() {
assertThat(validIndex("W\u22150001")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenFullwidthSlash() {
assertThat(validIndex("W\uFF0F0001")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenReverseSolidusOperator() {
assertThat(validIndex("W\u29F50001")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenContainsDotPdfExtension() {
// The index is the bare catalog id; appending ".pdf" is the importer's job. A dot in
// the index would let "W-0001.pdf" become "W-0001.pdf.pdf" or smuggle an extension.
assertThat(validIndex("W-0001.pdf")).isFalse();
}
// ─── catalog-shape rejects — pass the char pre-checks but must fail INDEX_PATTERN ────
// These pin the regex branch itself: each string contains no separator, dot, slash
// homoglyph, null byte, or absolute marker, so it sails past every char guard and is
// rejected *only* because INDEX_PATTERN.matches() returns false. A weaker pattern would
// let them through — these tests would then go red.
@Test
void isValidImportIndex_returnsFalse_whenSpaceInIndex() {
// The real-world reject: "J 0070" is a space-typo with no PDF on disk.
assertThat(validIndex("J 0070")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenFiveLetterPrefix() {
// The catalog prefix is at most 4 letters; 5 must not match.
assertThat(validIndex("WXYZA-0001")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenNoLetterPrefix() {
// A digit-led id (no letter prefix) is not a catalog shape.
assertThat(validIndex("12-0001")).isFalse();
}
@Test
void isValidImportIndex_returnsFalse_whenUppercaseXSuffix() {
// Only a lowercase trailing "x" is allowed; an uppercase "X" suffix must fail.
assertThat(validIndex("W-0001X")).isFalse();
}
@Test
void isValidImportIndex_returnsTrue_whenPlainCatalogIndex() {
assertThat(validIndex("W-0124")).isTrue();
}
@Test
void isValidImportIndex_returnsTrue_whenTwoLetterPrefix() {
assertThat(validIndex("Al-0001")).isTrue();
}
@Test
void isValidImportIndex_returnsTrue_whenThreeLetterPrefix() {
assertThat(validIndex("CuH-0010")).isTrue();
}
@Test
void isValidImportIndex_returnsTrue_whenUmlautPrefix() {
// Real corpus indices carry a German umlaut, e.g. "Mü-0001.pdf" exists on disk.
assertThat(validIndex("Mü-0001")).isTrue();
}
@Test
void isValidImportIndex_returnsTrue_whenDoubleHyphen() {
// Real corpus: "C--0029" appears in the spreadsheet (a data-entry artefact, but a
// legitimate catalog shape that must still resolve, not crash).
assertThat(validIndex("C--0029")).isTrue();
}
@Test
void isValidImportIndex_returnsTrue_whenXSuffix() {
// The normalizer recognises an x-suffix catalog id; allow it defensively.
assertThat(validIndex("W-0001x")).isTrue();
}
// ─── a valid index resolves to exactly importDir/<index>.pdf within containment ─────
@Test
void load_resolvesPdfByIndex_uploadsToS3_andSetsStatusUploaded(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
byte[] pdf = {0x25, 0x50, 0x44, 0x46, 0x2D};
Files.write(tempDir.resolve("W-0124.pdf"), pdf);
when(documentService.findByOriginalFilename("W-0124")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
Path xlsx = writeDocs(tempDir, docRow("W-0124", "", "", "", "", "", "", "", ""));
importer.load(xlsx.toFile());
// exactly importDir/<index>.pdf was uploaded — the S3 key carries that basename
org.mockito.ArgumentCaptor<RequestBody> bodyCaptor = org.mockito.ArgumentCaptor.forClass(RequestBody.class);
verify(s3Client).putObject(any(PutObjectRequest.class), bodyCaptor.capture());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d ->
d.getStatus() == DocumentStatus.UPLOADED
&& d.getFilePath() != null
&& d.getFilePath().endsWith("_W-0124.pdf")));
}
@Test
void load_yieldsPlaceholder_whenIndexedPdfMissing(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
when(documentService.findByOriginalFilename("X-9999")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
Path xlsx = writeDocs(tempDir, docRow("X-9999", "", "", "", "", "", "", "", ""));
importer.load(xlsx.toFile());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d -> d.getStatus() == DocumentStatus.PLACEHOLDER));
verify(s3Client, never()).putObject(any(PutObjectRequest.class), any(RequestBody.class));
}
@Test
void load_rejectsMaliciousIndex_neverReadsOutsideImportDir(@TempDir Path tempDir) throws Exception {
// An index with a path separator must be skipped outright, never used for disk I/O.
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Path xlsx = writeDocs(tempDir, docRow("../../etc/cron.d/x", "", "", "", "", "", "", "", ""));
DocumentImporter.LoadResult result = importer.load(xlsx.toFile());
assertThat(result.skippedFiles())
.extracting(ImportStatus.SkippedFile::reason)
.containsExactly(ImportStatus.SkipReason.INVALID_FILENAME_PATH_TRAVERSAL);
verify(documentService, never()).save(any());
verify(s3Client, never()).putObject(any(PutObjectRequest.class), any(RequestBody.class));
}
@Test
void resolvePdfByIndex_throwsWhenResolvedPathEscapesImportDir_viaSymlink(
@TempDir Path importDirPath, @TempDir Path outsideDir) throws Exception {
// Containment defense-in-depth: even a syntactically valid index whose <index>.pdf is a
// symlink pointing outside importDir must be refused — the resolved canonical path is
// asserted to stay inside importDir.
Path outsideFile = outsideDir.resolve("secret.pdf");
Files.writeString(outsideFile, "sensitive");
Files.createSymbolicLink(importDirPath.resolve("W-0001.pdf"), outsideFile);
ReflectionTestUtils.setField(importer, "importDir", importDirPath.toString());
org.assertj.core.api.Assertions.assertThatThrownBy(
() -> ReflectionTestUtils.invokeMethod(importer, "resolvePdfByIndex", "W-0001", 2))
.isInstanceOf(org.raddatz.familienarchiv.exception.DomainException.class);
}
@Test
void resolvePdfByIndex_returnsExactlyImportDirIndexPdf_whenPresent(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Path expected = tempDir.resolve("Eu-0628.pdf");
Files.writeString(expected, "%PDF-1.4");
Optional<File> resolved = ReflectionTestUtils.invokeMethod(importer, "resolvePdfByIndex", "Eu-0628", 2);
assertThat(resolved).isPresent();
assertThat(resolved.get().getCanonicalFile()).isEqualTo(expected.toFile().getCanonicalFile());
}
// NOTE (Sara, PR #687): the IOException branch of resolvePdfByIndex — where
// File.getCanonicalPath() itself throws (an OS-level failure mid-resolution, not the
// symlink-escape DomainException) — is intentionally NOT covered by a test. Unlike
// isPdfMagicBytes, which reads through the injected FileStreamOpener that a test can stub to
// throw, getCanonicalPath() is called on a File built internally with no injection seam,
// and there is no portable, deterministic way to make it throw on a temp file (it does not
// throw for missing/symlinked paths — those are handled by isFile()/the containment check).
// Adding a seam purely to test this would be production code in service of a non-defect; the
// substantive fix is the log.warn() now emitted in that branch so the quiet skip surfaces in
// ops. Left uncovered by deliberate decision, documented here so the branch is not assumed
// tested.
// ─── PDF magic-byte guard — ported — do not remove ──────────────────────────────
@Test
void load_skipsFile_whenNotPdfMagicBytes(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Files.writeString(tempDir.resolve("W-0001.pdf"), "not a pdf");
lenient().when(documentService.findByOriginalFilename(any())).thenReturn(Optional.empty());
Path xlsx = writeDocs(tempDir, docRow("W-0001", "", "", "", "", "", "", "", ""));
DocumentImporter.LoadResult result = importer.load(xlsx.toFile());
assertThat(result.skippedFiles())
.extracting(ImportStatus.SkippedFile::reason)
.containsExactly(ImportStatus.SkipReason.INVALID_PDF_SIGNATURE);
verify(s3Client, never()).putObject(any(PutObjectRequest.class), any(RequestBody.class));
}
@Test
void load_skipsFile_whenMagicByteCheckThrowsIoException(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Files.writeString(tempDir.resolve("W-0001.pdf"), "content");
lenient().when(documentService.findByOriginalFilename(any())).thenReturn(Optional.empty());
Path xlsx = writeDocs(tempDir, docRow("W-0001", "", "", "", "", "", "", "", ""));
// FileStreamOpener is injected — stub it to throw, no spy on the importer needed.
when(fileStreamOpener.open(any(File.class)))
.thenThrow(new java.io.IOException("read error"));
DocumentImporter.LoadResult result = importer.load(xlsx.toFile());
assertThat(result.skippedFiles())
.extracting(ImportStatus.SkippedFile::reason)
.containsExactly(ImportStatus.SkipReason.FILE_READ_ERROR);
}
@Test
void load_skipsAlreadyExists_whenDocumentUploadedNotPlaceholder(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Document existing = Document.builder().id(UUID.randomUUID())
.originalFilename("W-0001").status(DocumentStatus.UPLOADED).build();
when(documentService.findByOriginalFilename("W-0001")).thenReturn(Optional.of(existing));
Path xlsx = writeDocs(tempDir, docRow("W-0001", "", "", "", "", "", "", "", ""));
DocumentImporter.LoadResult result = importer.load(xlsx.toFile());
assertThat(result.skippedFiles())
.extracting(ImportStatus.SkippedFile::reason)
.containsExactly(ImportStatus.SkipReason.ALREADY_EXISTS);
verify(documentService, never()).save(any());
}
// ─── presence of importDir/<index>.pdf drives status: present → UPLOADED, absent → PLACEHOLDER ─
@Test
void load_setsStatusPlaceholder_whenNoIndexedPdf(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
when(documentService.findByOriginalFilename("W-0099")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
Path xlsx = writeDocs(tempDir, docRow("W-0099", "", "", "", "", "", "", "", ""));
importer.load(xlsx.toFile());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d -> d.getStatus() == DocumentStatus.PLACEHOLDER));
verify(s3Client, never()).putObject(any(PutObjectRequest.class), any(RequestBody.class));
}
// ─── attribution routing — register-first + always retain raw ────────────────────
@Test
void load_linksRegisterSender_andRetainsRawSenderText(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Person walter = Person.builder().id(UUID.randomUUID()).sourceRef("de-gruyter-walter")
.firstName("Walter").lastName("de Gruyter").build();
when(documentService.findByOriginalFilename("W-0001")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
when(personService.findBySourceRef("de-gruyter-walter")).thenReturn(Optional.of(walter));
Path xlsx = writeDocs(tempDir, docRow("W-0001", "de-gruyter-walter", "Walter de Gruyter",
"", "", "", "", "", ""));
importer.load(xlsx.toFile());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d ->
d.getSender() == walter && "Walter de Gruyter".equals(d.getSenderText())));
}
@Test
void load_createsProvisionalSender_whenSlugUnmatchedInRegister(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Person provisional = Person.builder().id(UUID.randomUUID()).sourceRef("schwester-hanni")
.lastName("Schwester Hanni").provisional(true).build();
when(documentService.findByOriginalFilename("W-0002")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
when(personService.findBySourceRef("schwester-hanni")).thenReturn(Optional.empty());
when(personService.upsertBySourceRef(any())).thenReturn(provisional);
Path xlsx = writeDocs(tempDir, docRow("W-0002", "schwester-hanni", "Schwester Hanni",
"", "", "", "", "", ""));
importer.load(xlsx.toFile());
org.mockito.ArgumentCaptor<PersonUpsertCommand> captor =
org.mockito.ArgumentCaptor.forClass(PersonUpsertCommand.class);
verify(personService).upsertBySourceRef(captor.capture());
assertThat(captor.getValue().provisional()).isTrue();
assertThat(captor.getValue().lastName()).isEqualTo("Schwester Hanni");
}
@Test
void load_createsNoSenderPerson_whenSlugEmptyButRawPresent(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
when(documentService.findByOriginalFilename("W-0003")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
Path xlsx = writeDocs(tempDir, docRow("W-0003", "", "?",
"", "", "", "", "", ""));
importer.load(xlsx.toFile());
verify(personService, never()).findBySourceRef(any());
verify(personService, never()).upsertBySourceRef(any());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d ->
d.getSender() == null && "?".equals(d.getSenderText())));
}
@Test
void load_splitsMultipleReceivers_andRetainsRawReceiverText(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Person herbert = Person.builder().id(UUID.randomUUID()).sourceRef("cram-herbert").lastName("Cram").build();
Person clara = Person.builder().id(UUID.randomUUID()).sourceRef("clara").lastName("Clara").build();
when(documentService.findByOriginalFilename("W-0004")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
when(personService.findBySourceRef("cram-herbert")).thenReturn(Optional.of(herbert));
when(personService.findBySourceRef("clara")).thenReturn(Optional.of(clara));
Path xlsx = writeDocs(tempDir, docRow("W-0004", "", "",
"cram-herbert|clara", "Herbert Cram|Clara", "", "", "", ""));
importer.load(xlsx.toFile());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d ->
d.getReceivers().size() == 2
&& d.getReceivers().contains(herbert)
&& d.getReceivers().contains(clara)
&& "Herbert Cram|Clara".equals(d.getReceiverText())));
}
@Test
void load_provisionalReceiverUsesHumanNameFromReceiverNames_notSlug(@TempDir Path tempDir) throws Exception {
// Regression: resolveReceivers used to pass the slug as both `sourceRef` AND `lastName`,
// so an unresolved receiver "smith-john" became a provisional Person with
// lastName="smith-john". The fix consumes the parallel `receiver_names` column.
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Person provisional = Person.builder().id(UUID.randomUUID()).sourceRef("smith-john")
.lastName("John Smith").provisional(true).build();
when(documentService.findByOriginalFilename("W-0050")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
when(personService.findBySourceRef("smith-john")).thenReturn(Optional.empty());
when(personService.upsertBySourceRef(any())).thenReturn(provisional);
Path xlsx = writeDocs(tempDir, docRow("W-0050", "", "",
"smith-john", "John Smith", "", "", "", ""));
importer.load(xlsx.toFile());
org.mockito.ArgumentCaptor<PersonUpsertCommand> captor =
org.mockito.ArgumentCaptor.forClass(PersonUpsertCommand.class);
verify(personService).upsertBySourceRef(captor.capture());
assertThat(captor.getValue().sourceRef()).isEqualTo("smith-john");
assertThat(captor.getValue().lastName()).isEqualTo("John Smith");
assertThat(captor.getValue().provisional()).isTrue();
}
@Test
void load_provisionalReceiverFallsBackToSlug_whenNamesListShorterThanSlugs(@TempDir Path tempDir) throws Exception {
// Parallel-list zip: if the names list is shorter than the slugs list, slugs without a
// matching name fall back to slug as the display name. This is the "missing name" case
// (rare in canonical data but the contract must define it).
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Person alice = Person.builder().id(UUID.randomUUID()).sourceRef("alice-jones")
.lastName("Alice Jones").provisional(true).build();
Person bob = Person.builder().id(UUID.randomUUID()).sourceRef("bob-roe")
.lastName("bob-roe").provisional(true).build();
when(documentService.findByOriginalFilename("W-0051")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
when(personService.findBySourceRef("alice-jones")).thenReturn(Optional.empty());
when(personService.findBySourceRef("bob-roe")).thenReturn(Optional.empty());
when(personService.upsertBySourceRef(any())).thenReturn(alice).thenReturn(bob);
Path xlsx = writeDocs(tempDir, docRow("W-0051", "", "",
"alice-jones|bob-roe", "Alice Jones", "", "", "", ""));
importer.load(xlsx.toFile());
org.mockito.ArgumentCaptor<PersonUpsertCommand> captor =
org.mockito.ArgumentCaptor.forClass(PersonUpsertCommand.class);
verify(personService, org.mockito.Mockito.times(2)).upsertBySourceRef(captor.capture());
assertThat(captor.getAllValues()).extracting(PersonUpsertCommand::sourceRef)
.containsExactly("alice-jones", "bob-roe");
assertThat(captor.getAllValues()).extracting(PersonUpsertCommand::lastName)
.containsExactly("Alice Jones", "bob-roe");
}
// ─── clean date values parse without semantic logic ──────────────────────────────
@Test
void load_parsesCleanDateAndPrecision(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
when(documentService.findByOriginalFilename("W-0005")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
Path xlsx = writeDocs(tempDir, docRow("W-0005", "", "",
"", "", "1916-06-01", "1.6.1916", "MONTH", ""));
importer.load(xlsx.toFile());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d ->
LocalDate.of(1916, 6, 1).equals(d.getDocumentDate())
&& d.getMetaDatePrecision() == org.raddatz.familienarchiv.document.DatePrecision.MONTH
&& "1.6.1916".equals(d.getMetaDateRaw())));
}
@Test
void load_attachesTagBySourceRef(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Tag tag = Tag.builder().id(UUID.randomUUID()).name("Brautbriefe").sourceRef("Themen/Brautbriefe").build();
when(documentService.findByOriginalFilename("W-0006")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
when(tagService.findBySourceRef("Themen/Brautbriefe")).thenReturn(Optional.of(tag));
Path xlsx = writeDocs(tempDir, docRowWithTag("W-0006", "Themen/Brautbriefe"));
importer.load(xlsx.toFile());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d -> d.getTags().contains(tag)));
}
// ─── idempotency — update existing document in place by index ─────────────────────
@Test
void load_updatesExistingDocumentInPlace_whenIndexExists(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Document existing = Document.builder().id(UUID.randomUUID())
.originalFilename("W-0007").status(DocumentStatus.PLACEHOLDER).build();
when(documentService.findByOriginalFilename("W-0007")).thenReturn(Optional.of(existing));
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
Path xlsx = writeDocs(tempDir, docRow("W-0007", "", "", "", "", "", "", "", ""));
importer.load(xlsx.toFile());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d -> d.getId().equals(existing.getId())));
}
// ─── canonical collections are authoritative — re-import prunes removed links ──────
@Test
void load_prunesReceiversAndTags_whenCanonicalRowShrinks(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
Person staleReceiver = Person.builder().id(UUID.randomUUID()).sourceRef("stale-receiver").lastName("Stale").build();
Tag staleTag = Tag.builder().id(UUID.randomUUID()).name("Stale").sourceRef("Themen/Stale").build();
Document existing = Document.builder().id(UUID.randomUUID())
.originalFilename("W-0008").status(DocumentStatus.PLACEHOLDER).build();
existing.getReceivers().add(staleReceiver);
existing.getTags().add(staleTag);
when(documentService.findByOriginalFilename("W-0008")).thenReturn(Optional.of(existing));
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
// The canonical row now carries no receiver and no tag: both stale links must go.
Path xlsx = writeDocs(tempDir, docRow("W-0008", "", "", "", "", "", "", "", ""));
importer.load(xlsx.toFile());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d ->
d.getReceivers().isEmpty() && d.getTags().isEmpty()));
}
// ─── title carries the honest date label — never a precision the data lacks ───────
@Test
void load_buildsTitleWithMonthLabel_whenPrecisionIsMonth(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
when(documentService.findByOriginalFilename("W-0100")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
Path xlsx = writeDocs(tempDir, docRow("W-0100", "", "", "", "",
"1916-06-01", "Juni 1916", "MONTH", ""));
importer.load(xlsx.toFile());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d ->
d.getTitle().contains("Juni 1916") && !d.getTitle().contains("1. Juni")));
}
@Test
void load_buildsTitleWithFullDate_whenPrecisionIsDay(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
when(documentService.findByOriginalFilename("W-0101")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
Path xlsx = writeDocs(tempDir, docRow("W-0101", "", "", "", "",
"1943-12-24", "24.12.1943", "DAY", ""));
importer.load(xlsx.toFile());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d ->
d.getTitle().contains("24. Dezember 1943")));
}
@Test
void load_buildsTitleFromIndexOnly_whenDateUnknown(@TempDir Path tempDir) throws Exception {
ReflectionTestUtils.setField(importer, "importDir", tempDir.toString());
when(documentService.findByOriginalFilename("W-0102")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
Path xlsx = writeDocs(tempDir, docRow("W-0102", "", "", "", "",
"", "?", "UNKNOWN", ""));
importer.load(xlsx.toFile());
verify(documentService).save(org.mockito.ArgumentMatchers.argThat(d ->
d.getTitle().equals("W-0102")));
}
// ─── helpers ─────────────────────────────────────────────────────────────────────
private Boolean validIndex(String index) {
return ReflectionTestUtils.invokeMethod(importer, "isValidImportIndex", index);
}
private Map<String, String> docRow(String index, String senderId, String senderName,
String receiverIds, String receiverNames, String dateIso,
String dateRaw, String datePrecision, String dateEnd) {
Map<String, String> r = new LinkedHashMap<>();
r.put("index", index);
r.put("sender_person_id", senderId);
r.put("sender_name", senderName);
r.put("receiver_person_ids", receiverIds);
r.put("receiver_names", receiverNames);
r.put("date_iso", dateIso);
r.put("date_raw", dateRaw);
r.put("date_precision", datePrecision);
r.put("date_end", dateEnd);
r.put("location", "");
r.put("tags", "");
r.put("summary", "");
return r;
}
private Map<String, String> docRowWithTag(String index, String tagPath) {
Map<String, String> r = docRow(index, "", "", "", "", "", "", "", "");
r.put("tags", tagPath);
return r;
}
@SafeVarargs
private Path writeDocs(Path dir, Map<String, String>... rows) throws Exception {
Path xlsx = dir.resolve("canonical-documents.xlsx");
List<String> headers = List.of("index", "sender_person_id", "sender_name",
"receiver_person_ids", "receiver_names", "date_iso", "date_raw", "date_precision",
"date_end", "location", "tags", "summary");
try (XSSFWorkbook wb = new XSSFWorkbook()) {
Sheet sheet = wb.createSheet("Sheet1");
Row header = sheet.createRow(0);
for (int i = 0; i < headers.size(); i++) {
header.createCell(i).setCellValue(headers.get(i));
}
for (int r = 0; r < rows.length; r++) {
Row row = sheet.createRow(r + 1);
for (int c = 0; c < headers.size(); c++) {
row.createCell(c).setCellValue(rows[r].getOrDefault(headers.get(c), ""));
}
}
try (OutputStream out = Files.newOutputStream(xlsx)) {
wb.write(out);
}
}
return xlsx;
}
}
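
The three title tests above pin a label to each date precision: a full date for DAY, month and year for MONTH, and the bare index when the date is unknown. A minimal sketch of that mapping follows. The class `TitleDateSketch`, its `Precision` enum, and the `index (label)` layout are assumptions made for illustration and are not taken from the importer under test; only the German label shapes come from the assertions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical sketch; not the project's importer or DocumentTitleFormatter. */
final class TitleDateSketch {
    // Assumed subset of precisions; the real DatePrecision enum may have more values.
    enum Precision { DAY, MONTH, UNKNOWN }

    // German month names, matching the labels asserted in the tests.
    private static final DateTimeFormatter DAY_FMT =
            DateTimeFormatter.ofPattern("d. MMMM yyyy", Locale.GERMAN);
    private static final DateTimeFormatter MONTH_FMT =
            DateTimeFormatter.ofPattern("MMMM yyyy", Locale.GERMAN);

    /** Builds a title from the index and, when known, a precision-dependent date label. */
    static String title(String index, LocalDate date, Precision precision) {
        if (date == null || precision == Precision.UNKNOWN) {
            return index; // unknown date: the title is the index only
        }
        String label = precision == Precision.DAY
                ? DAY_FMT.format(date)
                : MONTH_FMT.format(date);
        // The parenthesised layout is an assumption; the tests only check contains().
        return index + " (" + label + ")";
    }
}
```

With MONTH precision the day of the anchor date is dropped from the label, which is what the `!d.getTitle().contains("1. Juni")` assertion guards against.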


@@ -1,49 +0,0 @@
package org.raddatz.familienarchiv.importing;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.DynamicTest;
import org.junit.jupiter.api.TestFactory;
import org.raddatz.familienarchiv.document.DatePrecision;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import static org.assertj.core.api.Assertions.assertThat;
/**
* Asserts the Java title label against the SAME shared fixture table the TS
* formatter spec uses ({@code docs/date-label-fixtures.json}). This is the
* drift guard requested in #666 review: the two label implementations cannot
* silently diverge (en-dash vs hyphen, "ca." vs "circa", season words, range
* collapse) because both are pinned to one committed rule set.
*/
class DocumentTitleFormatterTest {
@TestFactory
List<DynamicTest> matchesSharedFixtureTable() throws Exception {
// Maven runs tests from the backend/ module dir; the fixture lives at repo-root docs/.
Path fixture = Path.of("..", "docs", "date-label-fixtures.json");
JsonNode root = new ObjectMapper().readTree(Files.readString(fixture));
List<DynamicTest> tests = new ArrayList<>();
for (JsonNode c : root.get("cases")) {
String name = c.get("name").asText();
LocalDate anchor = parseDate(c.get("anchor"));
DatePrecision precision = DatePrecision.valueOf(c.get("precision").asText());
LocalDate end = parseDate(c.get("end"));
String raw = c.get("raw").isNull() ? null : c.get("raw").asText();
String expected = c.get("expected").asText();
tests.add(DynamicTest.dynamicTest(name, () ->
assertThat(DocumentTitleFormatter.formatTitleDate(anchor, precision, end, raw))
.isEqualTo(expected)));
}
return tests;
}
private static LocalDate parseDate(JsonNode node) {
return node == null || node.isNull() ? null : LocalDate.parse(node.asText());
}
}


@@ -0,0 +1,777 @@
package org.raddatz.familienarchiv.importing;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.junit.jupiter.api.io.TempDir;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.document.ThumbnailAsyncRunner;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.tag.Tag;
import org.raddatz.familienarchiv.tag.TagService;
import org.raddatz.familienarchiv.person.PersonService;
import org.springframework.test.util.ReflectionTestUtils;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.xml.sax.SAXParseException;
import java.io.File;
import java.io.OutputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.*;
@ExtendWith(MockitoExtension.class)
class MassImportServiceTest {
@Mock DocumentService documentService;
@Mock PersonService personService;
@Mock TagService tagService;
@Mock S3Client s3Client;
@Mock ThumbnailAsyncRunner thumbnailAsyncRunner;
MassImportService service;
@BeforeEach
void setUp() {
service = new MassImportService(documentService, personService, tagService, s3Client, thumbnailAsyncRunner);
ReflectionTestUtils.setField(service, "bucketName", "test-bucket");
ReflectionTestUtils.setField(service, "importDir", "/import");
ReflectionTestUtils.setField(service, "colIndex", 0);
ReflectionTestUtils.setField(service, "colBox", 1);
ReflectionTestUtils.setField(service, "colFolder", 2);
ReflectionTestUtils.setField(service, "colSender", 3);
ReflectionTestUtils.setField(service, "colReceivers", 5);
ReflectionTestUtils.setField(service, "colDate", 7);
ReflectionTestUtils.setField(service, "colLocation", 9);
ReflectionTestUtils.setField(service, "colTags", 10);
ReflectionTestUtils.setField(service, "colSummary", 11);
ReflectionTestUtils.setField(service, "colTranscription", 13);
}
// ─── getStatus ────────────────────────────────────────────────────────────
@Test
void getStatus_returnsIdleByDefault() {
assertThat(service.getStatus().state()).isEqualTo(MassImportService.State.IDLE);
}
@Test
void getStatus_hasStatusCode_IMPORT_IDLE_byDefault() {
assertThat(service.getStatus().statusCode()).isEqualTo("IMPORT_IDLE");
}
// ─── runImportAsync ───────────────────────────────────────────────────────
@Test
void runImportAsync_setsFailedStatus_whenImportDirectoryDoesNotExist() {
// /import directory doesn't exist in test environment → IOException → IMPORT_FAILED_INTERNAL
service.runImportAsync();
assertThat(service.getStatus().state()).isEqualTo(MassImportService.State.FAILED);
assertThat(service.getStatus().statusCode()).isEqualTo("IMPORT_FAILED_INTERNAL");
}
@Test
void runImportAsync_readsFromConfiguredImportDir(@TempDir Path tempDir) {
// Empty temp dir → findSpreadsheetFile throws "no spreadsheet" with the
// configured path in the message. Proves the field, not a constant,
// drives the lookup.
ReflectionTestUtils.setField(service, "importDir", tempDir.toString());
service.runImportAsync();
assertThat(service.getStatus().state()).isEqualTo(MassImportService.State.FAILED);
assertThat(service.getStatus().message()).contains(tempDir.toString());
}
@Test
void runImportAsync_setsStatusCode_IMPORT_FAILED_NO_SPREADSHEET_whenDirIsEmpty(@TempDir Path tempDir) {
ReflectionTestUtils.setField(service, "importDir", tempDir.toString());
service.runImportAsync();
assertThat(service.getStatus().statusCode()).isEqualTo("IMPORT_FAILED_NO_SPREADSHEET");
}
@Test
void runImportAsync_setsStatusCode_IMPORT_DONE_whenSpreadsheetHasNoDataRows(@TempDir Path tempDir) throws Exception {
Path xlsx = tempDir.resolve("import.xlsx");
try (XSSFWorkbook wb = new XSSFWorkbook()) {
wb.createSheet("Sheet1");
try (OutputStream out = Files.newOutputStream(xlsx)) {
wb.write(out);
}
}
ReflectionTestUtils.setField(service, "importDir", tempDir.toString());
service.runImportAsync();
assertThat(service.getStatus().statusCode()).isEqualTo("IMPORT_DONE");
}
@Test
void runImportAsync_throwsConflict_whenAlreadyRunning() {
MassImportService.ImportStatus running = new MassImportService.ImportStatus(
MassImportService.State.RUNNING, "IMPORT_RUNNING", "Running...", 0, List.of(), LocalDateTime.now());
ReflectionTestUtils.setField(service, "currentStatus", running);
assertThatThrownBy(() -> service.runImportAsync())
.isInstanceOf(DomainException.class)
.hasMessageContaining("already in progress");
}
// ─── importSingleDocument — skip already uploaded ─────────────────────────
@Test
void importSingleDocument_skips_whenDocumentAlreadyUploadedNotPlaceholder() {
Document existing = Document.builder()
.id(UUID.randomUUID())
.originalFilename("doc001.pdf")
.status(DocumentStatus.UPLOADED)
.build();
when(documentService.findByOriginalFilename("doc001.pdf")).thenReturn(Optional.of(existing));
Optional<String> result = service.importSingleDocument(minimalCells("doc001.pdf"), Optional.empty(), "doc001.pdf", "doc001");
verify(documentService, never()).save(any());
assertThat(result).isPresent().contains("ALREADY_EXISTS");
}
// ─── importSingleDocument — already-exists guard fires before file I/O ─────
@Test
void importSingleDocument_skipsWithAlreadyExists_whenDocumentUploadedAndFileIsPresent(@TempDir Path tempDir) throws Exception {
// Document already exists with status UPLOADED (not PLACEHOLDER).
// A physical PDF file is also present on disk (valid magic bytes).
// Expected: ALREADY_EXISTS is returned and no S3 upload is attempted —
// the guard fires before any file I/O, so no partial processing occurs.
Document existing = Document.builder()
.id(UUID.randomUUID())
.originalFilename("present.pdf")
.status(DocumentStatus.UPLOADED)
.build();
when(documentService.findByOriginalFilename("present.pdf")).thenReturn(Optional.of(existing));
Path physicalFile = tempDir.resolve("present.pdf");
byte[] pdfHeader = {0x25, 0x50, 0x44, 0x46, 0x2D}; // %PDF-
Files.write(physicalFile, pdfHeader);
Optional<String> result = service.importSingleDocument(
minimalCells("present.pdf"), Optional.of(physicalFile.toFile()), "present.pdf", "present");
assertThat(result).isPresent().contains("ALREADY_EXISTS");
verify(s3Client, never()).putObject(any(PutObjectRequest.class), any(RequestBody.class));
verify(documentService, never()).save(any());
}
// ─── runImportAsync — skip reasons surfaced in skippedFiles ───────────────
@Test
void runImportAsync_addsS3UploadFailed_toSkippedFiles_whenS3Throws(@TempDir Path tempDir) throws Exception {
byte[] pdfHeader = {0x25, 0x50, 0x44, 0x46, 0x2D}; // %PDF-
Files.write(tempDir.resolve("upload_fail.pdf"), pdfHeader);
buildMinimalImportXlsx(tempDir, "upload_fail.pdf");
ReflectionTestUtils.setField(service, "importDir", tempDir.toString());
when(documentService.findByOriginalFilename("upload_fail.pdf")).thenReturn(Optional.empty());
doThrow(new RuntimeException("S3 unavailable"))
.when(s3Client).putObject(any(PutObjectRequest.class), any(RequestBody.class));
service.runImportAsync();
assertThat(service.getStatus().skipped()).isEqualTo(1);
assertThat(service.getStatus().skippedFiles())
.extracting(MassImportService.SkippedFile::filename, MassImportService.SkippedFile::reason)
.containsExactly(org.assertj.core.groups.Tuple.tuple("upload_fail.pdf", "S3_UPLOAD_FAILED"));
}
@Test
void runImportAsync_addsAlreadyExists_toSkippedFiles_whenDocumentAlreadyUploaded(@TempDir Path tempDir) throws Exception {
buildMinimalImportXlsx(tempDir, "existing.pdf");
ReflectionTestUtils.setField(service, "importDir", tempDir.toString());
Document existing = Document.builder()
.id(UUID.randomUUID())
.originalFilename("existing.pdf")
.status(DocumentStatus.UPLOADED)
.build();
when(documentService.findByOriginalFilename("existing.pdf")).thenReturn(Optional.of(existing));
service.runImportAsync();
assertThat(service.getStatus().skipped()).isEqualTo(1);
assertThat(service.getStatus().skippedFiles())
.extracting(MassImportService.SkippedFile::reason)
.containsExactly("ALREADY_EXISTS");
}
// ─── importSingleDocument — create new document (metadata only) ───────────
@Test
void importSingleDocument_createsNewDocument_whenNotExists() {
when(documentService.findByOriginalFilename("doc002.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
service.importSingleDocument(minimalCells("doc002.pdf"), Optional.empty(), "doc002.pdf", "doc002");
verify(documentService).save(argThat(d ->
d.getOriginalFilename().equals("doc002.pdf")
&& d.getStatus() == DocumentStatus.PLACEHOLDER));
}
// ─── importSingleDocument — update existing placeholder ──────────────────
@Test
void importSingleDocument_updatesExistingPlaceholder() {
Document placeholder = Document.builder()
.id(UUID.randomUUID())
.originalFilename("existing.pdf")
.status(DocumentStatus.PLACEHOLDER)
.build();
when(documentService.findByOriginalFilename("existing.pdf")).thenReturn(Optional.of(placeholder));
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
service.importSingleDocument(minimalCells("existing.pdf"), Optional.empty(), "existing.pdf", "existing");
verify(documentService).save(same(placeholder));
}
// ─── importSingleDocument — with file (S3 upload) ─────────────────────────
@Test
void importSingleDocument_uploadsFileToS3_andSetsStatusUploaded(@TempDir Path tempDir) throws Exception {
Path tempFile = tempDir.resolve("doc003.pdf");
Files.write(tempFile, "PDF content".getBytes());
when(documentService.findByOriginalFilename("doc003.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
service.importSingleDocument(
minimalCells("doc003.pdf"), Optional.of(tempFile.toFile()), "doc003.pdf", "doc003");
verify(s3Client).putObject(any(PutObjectRequest.class), any(RequestBody.class));
verify(documentService).save(argThat(d -> d.getStatus() == DocumentStatus.UPLOADED));
}
@Test
void importSingleDocument_returnsS3UploadFailed_whenS3UploadFails(@TempDir Path tempDir) throws Exception {
Path tempFile = tempDir.resolve("fail.pdf");
Files.write(tempFile, "data".getBytes());
when(documentService.findByOriginalFilename("fail.pdf")).thenReturn(Optional.empty());
doThrow(new RuntimeException("S3 error"))
.when(s3Client).putObject(any(PutObjectRequest.class), any(RequestBody.class));
Optional<String> result = service.importSingleDocument(
minimalCells("fail.pdf"), Optional.of(tempFile.toFile()), "fail.pdf", "fail");
verify(documentService, never()).save(any());
assertThat(result).isPresent().contains("S3_UPLOAD_FAILED");
}
// ─── importSingleDocument — sender handling ───────────────────────────────
@Test
void importSingleDocument_setsNullSender_whenSenderCellIsBlank() {
when(documentService.findByOriginalFilename("nosender.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
List<String> cells = buildCells("nosender.pdf", "", "", "");
service.importSingleDocument(cells, Optional.empty(), "nosender.pdf", "nosender");
verify(documentService).save(argThat(d -> d.getSender() == null));
verify(personService, never()).findOrCreateByAlias(any());
}
@Test
void importSingleDocument_createsSender_whenSenderCellIsNonBlank() {
Person sender = Person.builder().id(UUID.randomUUID()).firstName("Walter").lastName("Müller").build();
when(documentService.findByOriginalFilename("withsender.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
when(personService.findOrCreateByAlias("Walter Müller")).thenReturn(sender);
List<String> cells = buildCells("withsender.pdf", "Walter Müller", "", "");
service.importSingleDocument(cells, Optional.empty(), "withsender.pdf", "withsender");
verify(personService).findOrCreateByAlias("Walter Müller");
verify(documentService).save(argThat(d -> d.getSender() == sender));
}
// ─── importSingleDocument — tag handling ─────────────────────────────────
@Test
void importSingleDocument_createsTag_whenTagCellIsNonBlank() {
Tag tag = Tag.builder().id(UUID.randomUUID()).name("Familie").build();
when(documentService.findByOriginalFilename("tagged.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
when(tagService.findOrCreate("Familie")).thenReturn(tag);
List<String> cells = buildCells("tagged.pdf", "", "", "Familie");
service.importSingleDocument(cells, Optional.empty(), "tagged.pdf", "tagged");
verify(tagService).findOrCreate("Familie");
}
@Test
void importSingleDocument_doesNotCreateTag_whenTagCellIsBlank() {
when(documentService.findByOriginalFilename("notag.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
List<String> cells = buildCells("notag.pdf", "", "", "");
service.importSingleDocument(cells, Optional.empty(), "notag.pdf", "notag");
verify(tagService, never()).findOrCreate(any());
}
// ─── importSingleDocument — metadataComplete heuristic ───────────────────
@Test
void importSingleDocument_metadataComplete_whenSenderPresent() {
Person sender = Person.builder().id(UUID.randomUUID()).firstName("A").lastName("B").build();
when(documentService.findByOriginalFilename("meta.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
when(personService.findOrCreateByAlias("A B")).thenReturn(sender);
List<String> cells = buildCells("meta.pdf", "A B", "", "");
service.importSingleDocument(cells, Optional.empty(), "meta.pdf", "meta");
verify(documentService).save(argThat(Document::isMetadataComplete));
}
@Test
void importSingleDocument_metadataIncomplete_whenNoKeyFieldsPresent() {
when(documentService.findByOriginalFilename("nometa.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
List<String> cells = buildCells("nometa.pdf", "", "", "");
service.importSingleDocument(cells, Optional.empty(), "nometa.pdf", "nometa");
verify(documentService).save(argThat(d -> !d.isMetadataComplete()));
}
// ─── importSingleDocument — blank fields set to null ─────────────────────
@Test
void importSingleDocument_setsBlankFieldsToNull() {
when(documentService.findByOriginalFilename("blank.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
List<String> cells = buildCells("blank.pdf", "", "", "");
service.importSingleDocument(cells, Optional.empty(), "blank.pdf", "blank");
verify(documentService).save(argThat(d ->
d.getLocation() == null &&
d.getSummary() == null &&
d.getTranscription() == null &&
d.getArchiveBox() == null &&
d.getArchiveFolder() == null));
}
// ─── processRows — via ReflectionTestUtils ────────────────────────────────
@Test
void processRows_returnsZero_whenOnlyHeaderRow() {
List<List<String>> rows = List.of(List.of("header", "col1"));
MassImportService.ProcessResult result = ReflectionTestUtils.invokeMethod(service, "processRows", rows);
assertThat(result.processed()).isEqualTo(0);
}
@Test
void processRows_skipsRowWithBlankIndex() {
List<List<String>> rows = List.of(
List.of("header"),
minimalCells("") // blank index
);
MassImportService.ProcessResult result = ReflectionTestUtils.invokeMethod(service, "processRows", rows);
assertThat(result.processed()).isEqualTo(0);
verify(documentService, never()).findByOriginalFilename(any());
}
@Test
void processRows_addsExtension_whenIndexHasNoDot() {
when(documentService.findByOriginalFilename("doc001.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
List<List<String>> rows = List.of(
List.of("header"),
minimalCells("doc001") // no dot → appends ".pdf"
);
MassImportService.ProcessResult result = ReflectionTestUtils.invokeMethod(service, "processRows", rows);
assertThat(result.processed()).isEqualTo(1);
verify(documentService).findByOriginalFilename("doc001.pdf");
}
@Test
void processRows_usesFilenameAsIs_whenIndexHasDot() {
when(documentService.findByOriginalFilename("doc002.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
List<List<String>> rows = List.of(
List.of("header"),
minimalCells("doc002.pdf") // has dot → used as-is
);
MassImportService.ProcessResult result = ReflectionTestUtils.invokeMethod(service, "processRows", rows);
assertThat(result.processed()).isEqualTo(1);
verify(documentService).findByOriginalFilename("doc002.pdf");
}
// ─── importSingleDocument — non-blank optional fields ────────────────────
@Test
void importSingleDocument_setsNonNullOptionalFields_whenPresent() {
when(documentService.findByOriginalFilename("rich.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
// box=1, folder=2, location=9, summary=11, transcription=13
List<String> cells = List.of(
"rich.pdf", // 0: index
"Box A", // 1: box
"Folder B", // 2: folder
"", // 3: sender
"", // 4: unused
"", // 5: receivers
"", // 6: unused
"", // 7: date
"", // 8: unused
"Hamburg", // 9: location
"", // 10: tags
"A summary", // 11: summary
"", // 12: unused
"A transcript" // 13: transcription
);
service.importSingleDocument(cells, Optional.empty(), "rich.pdf", "rich");
verify(documentService).save(argThat(d ->
"Box A".equals(d.getArchiveBox()) &&
"Folder B".equals(d.getArchiveFolder()) &&
"Hamburg".equals(d.getLocation()) &&
"A summary".equals(d.getSummary()) &&
"A transcript".equals(d.getTranscription())));
}
@Test
void importSingleDocument_setsMetadataComplete_whenReceiversArePresent() {
Person receiver = Person.builder().id(UUID.randomUUID()).firstName("Walter").lastName("Müller").build();
when(documentService.findByOriginalFilename("rcv.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
when(personService.findOrCreateByAlias("Walter Müller")).thenReturn(receiver);
List<String> cells = List.of(
"rcv.pdf", "", "", "", "", "Walter Müller", "", "", "", "", "", "", "", "");
service.importSingleDocument(cells, Optional.empty(), "rcv.pdf", "rcv");
verify(documentService).save(argThat(Document::isMetadataComplete));
}
@Test
void importSingleDocument_setsMetadataComplete_whenDateIsPresent() {
when(documentService.findByOriginalFilename("dated.pdf")).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
List<String> cells = List.of(
"dated.pdf", "", "", "", "", "", "", "2024-03-15", "", "", "", "", "", "");
service.importSingleDocument(cells, Optional.empty(), "dated.pdf", "dated");
verify(documentService).save(argThat(Document::isMetadataComplete));
}
// ─── buildTitle — null location ───────────────────────────────────────────
@Test
void buildTitle_withNullLocation_skipsLocationPart() {
String result = ReflectionTestUtils.invokeMethod(service, "buildTitle",
"doc005", LocalDate.of(1940, 5, 1), (String) null);
assertThat(result).contains("doc005").contains("1940");
assertThat(result).doesNotContain("Berlin");
}
// ─── parseDate — via ReflectionTestUtils ─────────────────────────────────
@Test
void parseDate_returnsNull_whenValueIsNull() {
LocalDate result = ReflectionTestUtils.invokeMethod(service, "parseDate", (String) null);
assertThat(result).isNull();
}
@Test
void parseDate_returnsNull_whenValueIsBlank() {
LocalDate result = ReflectionTestUtils.invokeMethod(service, "parseDate", " ");
assertThat(result).isNull();
}
@Test
void parseDate_returnsDate_whenValidIsoFormat() {
LocalDate result = ReflectionTestUtils.invokeMethod(service, "parseDate", "2024-03-15");
assertThat(result).isEqualTo(LocalDate.of(2024, 3, 15));
}
@Test
void parseDate_returnsNull_whenInvalidDateString() {
LocalDate result = ReflectionTestUtils.invokeMethod(service, "parseDate", "15.03.2024");
assertThat(result).isNull();
}
// ─── buildTitle — via ReflectionTestUtils ────────────────────────────────
@Test
void buildTitle_withDateAndLocation() {
String result = ReflectionTestUtils.invokeMethod(service, "buildTitle",
"doc001", LocalDate.of(1940, 5, 1), "Berlin");
assertThat(result).contains("doc001").contains("Berlin").contains("1940");
}
@Test
void buildTitle_withDateOnly() {
String result = ReflectionTestUtils.invokeMethod(service, "buildTitle",
"doc002", LocalDate.of(1960, 8, 15), "");
assertThat(result).contains("doc002").contains("1960");
assertThat(result).doesNotContain("Berlin");
}
@Test
void buildTitle_withIndexOnly_whenDateAndLocationAreNull() {
String result = ReflectionTestUtils.invokeMethod(service, "buildTitle",
"doc003", null, "");
assertThat(result).isEqualTo("doc003");
}
@Test
void buildTitle_withLocationOnly_whenDateIsNull() {
// date=null, location present → date part skipped, location appended
String result = ReflectionTestUtils.invokeMethod(service, "buildTitle",
"doc004", null, "Berlin");
assertThat(result).contains("doc004").contains("Berlin");
assertThat(result).doesNotContain("("); // no date part
}
// ─── getCell — via ReflectionTestUtils ───────────────────────────────────
@Test
void getCell_returnsEmptyString_whenColBeyondListSize() {
List<String> cells = List.of("a", "b");
String result = ReflectionTestUtils.invokeMethod(service, "getCell", cells, 5);
assertThat(result).isEmpty();
}
@Test
void getCell_returnsEmptyString_whenValueIsNull() {
List<String> cells = new ArrayList<>();
cells.add(null);
cells.add("b");
String result = ReflectionTestUtils.invokeMethod(service, "getCell", cells, 0);
assertThat(result).isEmpty();
}
@Test
void getCell_returnsTrimmedValue() {
List<String> cells = List.of(" hello ", "world");
String result = ReflectionTestUtils.invokeMethod(service, "getCell", cells, 0);
assertThat(result).isEqualTo("hello");
}
// ─── PDF magic byte validation regression ─────────────────────────────────
@Test
void runImportAsync_uploadsValidPdf_andSkipsFakeOne(@TempDir Path tempDir) throws Exception {
setupOneValidOneFakeImport(tempDir);
service.runImportAsync();
verify(s3Client, times(1)).putObject(any(PutObjectRequest.class), any(RequestBody.class));
}
@Test
void runImportAsync_setsSkippedCount_toOne_whenOneFakeFile(@TempDir Path tempDir) throws Exception {
setupOneValidOneFakeImport(tempDir);
service.runImportAsync();
assertThat(service.getStatus().skipped()).isEqualTo(1);
}
@Test
void runImportAsync_includesRejectedFilename_inSkippedFiles(@TempDir Path tempDir) throws Exception {
setupOneValidOneFakeImport(tempDir);
service.runImportAsync();
assertThat(service.getStatus().skippedFiles())
.extracting(MassImportService.SkippedFile::filename)
.contains("fake.pdf");
}
@Test
void runImportAsync_skipsFile_whenShorterThanFourBytes(@TempDir Path tempDir) throws Exception {
Files.write(tempDir.resolve("tiny.pdf"), new byte[]{0x25, 0x50, 0x44}); // only 3 bytes
buildMinimalImportXlsx(tempDir, "tiny.pdf");
ReflectionTestUtils.setField(service, "importDir", tempDir.toString());
lenient().when(documentService.findByOriginalFilename(any())).thenReturn(Optional.empty());
service.runImportAsync();
assertThat(service.getStatus().skipped()).isEqualTo(1);
}
@Test
void runImportAsync_skipsFile_whenMagicBytesCheckThrowsIOException(@TempDir Path tempDir) throws Exception {
Files.writeString(tempDir.resolve("unreadable.pdf"), "some content");
buildMinimalImportXlsx(tempDir, "unreadable.pdf");
ReflectionTestUtils.setField(service, "importDir", tempDir.toString());
lenient().when(documentService.findByOriginalFilename(any())).thenReturn(Optional.empty());
MassImportService spyService = spy(service);
doThrow(new java.io.IOException("simulated read error")).when(spyService).openFileStream(any(File.class));
spyService.runImportAsync();
assertThat(spyService.getStatus().skipped()).isEqualTo(1);
assertThat(spyService.getStatus().skippedFiles())
.extracting(MassImportService.SkippedFile::reason)
.containsExactly("FILE_READ_ERROR");
}
// ─── readOds — XXE security regression ───────────────────────────────────
// Security regression — do not remove.
@Test
void readOds_rejects_xxe_doctype_payload(@TempDir Path tempDir) throws Exception {
File malicious = buildXxeOds(tempDir, "file:///etc/hostname");
assertThatThrownBy(() -> service.readOds(malicious))
.isInstanceOf(SAXParseException.class)
.hasMessageContaining("DOCTYPE is disallowed");
}
@Test
void readOds_parses_valid_ods_correctly(@TempDir Path tempDir) throws Exception {
File valid = buildValidOds(tempDir, "Mustermann");
List<List<String>> rows = service.readOds(valid);
assertThat(rows).isNotEmpty();
assertThat(rows.get(0)).contains("Mustermann");
}
// ─── helpers ──────────────────────────────────────────────────────────────
/**
* Builds a minimal 14-element cell row with the given filename at index 0
* and blanks for all optional fields.
*/
private List<String> minimalCells(String filename) {
return buildCells(filename, "", "", "");
}
/**
* Builds a cell row with sender, receiver, and tag controls.
* Layout matches the default column indices set in setUp().
*/
private List<String> buildCells(String filename, String sender, String receivers, String tag) {
// 14 elements: index=0,box=1,folder=2,sender=3,[4],receivers=5,[6],date=7,[8],location=9,tag=10,summary=11,[12],transcription=13
return List.of(
filename, // 0: index
"", // 1: box
"", // 2: folder
sender, // 3: sender
"", // 4: (unused)
receivers, // 5: receivers
"", // 6: (unused)
"", // 7: date
"", // 8: (unused)
"", // 9: location
tag, // 10: tags
"", // 11: summary
"", // 12: (unused)
"" // 13: transcription
);
}
/** Creates a minimal ODS ZIP containing a content.xml with an XXE payload. */
private File buildXxeOds(Path dir, String entityTarget) throws Exception {
String xml = "<?xml version=\"1.0\"?>"
+ "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"" + entityTarget + "\">]>"
+ "<office:document-content"
+ " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
+ " xmlns:table=\"urn:oasis:names:tc:opendocument:xmlns:table:1.0\""
+ " xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">"
+ "<office:body><office:spreadsheet>"
+ "<table:table><table:table-row><table:table-cell>"
+ "<text:p>&xxe;</text:p>"
+ "</table:table-cell></table:table-row></table:table>"
+ "</office:spreadsheet></office:body>"
+ "</office:document-content>";
return writeOdsZip(dir.resolve("malicious.ods"), xml);
}
/** Creates a minimal valid ODS ZIP containing a content.xml with the given cell value.
* cellValue must not contain XML metacharacters ({@code < > &}). */
private File buildValidOds(Path dir, String cellValue) throws Exception {
String xml = "<?xml version=\"1.0\"?>"
+ "<office:document-content"
+ " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
+ " xmlns:table=\"urn:oasis:names:tc:opendocument:xmlns:table:1.0\""
+ " xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">"
+ "<office:body><office:spreadsheet>"
+ "<table:table><table:table-row><table:table-cell>"
+ "<text:p>" + cellValue + "</text:p>"
+ "</table:table-cell></table:table-row></table:table>"
+ "</office:spreadsheet></office:body>"
+ "</office:document-content>";
return writeOdsZip(dir.resolve("valid.ods"), xml);
}
private File writeOdsZip(Path destination, String contentXml) throws Exception {
try (OutputStream fos = Files.newOutputStream(destination);
ZipOutputStream zip = new ZipOutputStream(fos)) {
zip.putNextEntry(new ZipEntry("content.xml"));
zip.write(contentXml.getBytes(StandardCharsets.UTF_8));
zip.closeEntry();
}
return destination.toFile();
}
private void setupOneValidOneFakeImport(Path tempDir) throws Exception {
byte[] pdfHeader = {0x25, 0x50, 0x44, 0x46, 0x2D}; // %PDF-
Files.write(tempDir.resolve("real.pdf"), pdfHeader);
Files.writeString(tempDir.resolve("fake.pdf"), "not a pdf");
buildMinimalImportXlsx(tempDir, "real.pdf", "fake.pdf");
ReflectionTestUtils.setField(service, "importDir", tempDir.toString());
when(documentService.findByOriginalFilename(any())).thenReturn(Optional.empty());
when(documentService.save(any())).thenAnswer(inv -> inv.getArgument(0));
}
private void buildMinimalImportXlsx(Path dir, String... filenames) throws Exception {
Path xlsx = dir.resolve("import.xlsx");
try (XSSFWorkbook wb = new XSSFWorkbook()) {
org.apache.poi.ss.usermodel.Sheet sheet = wb.createSheet("Sheet1");
sheet.createRow(0).createCell(0).setCellValue("Index");
for (int i = 0; i < filenames.length; i++) {
sheet.createRow(i + 1).createCell(0).setCellValue(filenames[i]);
}
try (OutputStream out = Files.newOutputStream(xlsx)) {
wb.write(out);
}
}
}
}
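
The `setupOneValidOneFakeImport` fixture above pairs a file that starts with the bytes `%PDF-` with one that only carries the `.pdf` extension. The block below is a minimal sketch of the kind of magic-byte check such a fixture can exercise. `PdfMagic` and `looksLikePdf` are hypothetical names chosen for illustration; nothing here is taken from the service under test, whose actual check is not shown in this diff.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of the code under review. */
final class PdfMagic {
    // "%PDF-" in ASCII, the same five bytes the fixture writes to real.pdf.
    private static final byte[] PDF_HEADER = {0x25, 0x50, 0x44, 0x46, 0x2D};

    private PdfMagic() {}

    /** Returns true only if the given bytes start with the PDF header. */
    static boolean looksLikePdf(byte[] leadingBytes) {
        if (leadingBytes == null || leadingBytes.length < PDF_HEADER.length) {
            return false;
        }
        // Compare only the first five bytes; the rest of the file is ignored.
        return Arrays.equals(Arrays.copyOf(leadingBytes, PDF_HEADER.length), PDF_HEADER);
    }
}
```

A check like this only looks at content, so a file named `fake.pdf` that holds plain text is rejected regardless of its extension.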


@@ -1,208 +0,0 @@
package org.raddatz.familienarchiv.importing;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.junit.jupiter.api.io.TempDir;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;
import org.mockito.ArgumentCaptor;
import org.mockito.junit.jupiter.MockitoExtension;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonService;
import org.raddatz.familienarchiv.person.PersonUpsertCommand;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
@ExtendWith(MockitoExtension.class)
class PersonRegisterImporterTest {
@Test
void load_upsertsPersonBySourceRef_withProvisionalFalse(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
when(personService.upsertBySourceRef(any())).thenAnswer(inv -> personOf(inv.getArgument(0)));
Path xlsx = writePersons(tempDir, row(
"allemeyer-elsgard", "Allemeyer", "Elsgard", "Wöhler", "Nichte von Herbert", "False"));
new PersonRegisterImporter(personService).load(xlsx.toFile());
ArgumentCaptor<PersonUpsertCommand> captor = ArgumentCaptor.forClass(PersonUpsertCommand.class);
verify(personService).upsertBySourceRef(captor.capture());
PersonUpsertCommand cmd = captor.getValue();
assertThat(cmd.sourceRef()).isEqualTo("allemeyer-elsgard");
assertThat(cmd.lastName()).isEqualTo("Allemeyer");
assertThat(cmd.firstName()).isEqualTo("Elsgard");
assertThat(cmd.maidenName()).isEqualTo("Wöhler");
assertThat(cmd.notes()).isEqualTo("Nichte von Herbert");
assertThat(cmd.provisional()).isFalse();
}
@Test
void load_parsesCapitalisedPythonBool_True(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
when(personService.upsertBySourceRef(any())).thenAnswer(inv -> personOf(inv.getArgument(0)));
Path xlsx = writePersons(tempDir, row(
"noise-geschirr", "Geschirr", "", "", "", "True"));
new PersonRegisterImporter(personService).load(xlsx.toFile());
ArgumentCaptor<PersonUpsertCommand> captor = ArgumentCaptor.forClass(PersonUpsertCommand.class);
verify(personService).upsertBySourceRef(captor.capture());
assertThat(captor.getValue().provisional()).isTrue();
}
@Test
void load_skipsRowWithBlankPersonId(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
Path xlsx = writePersons(tempDir, row("", "NoId", "", "", "", "False"));
new PersonRegisterImporter(personService).load(xlsx.toFile());
verify(personService, times(0)).upsertBySourceRef(any());
}
@Test
void load_returnsCountOfProcessedRows(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
when(personService.upsertBySourceRef(any())).thenAnswer(inv -> personOf(inv.getArgument(0)));
Path xlsx = writePersons(tempDir,
row("a-one", "One", "A", "", "", "False"),
row("a-two", "Two", "B", "", "", "False"));
int processed = new PersonRegisterImporter(personService).load(xlsx.toFile());
assertThat(processed).isEqualTo(2);
}
// ─── generation parsing (#689) ─────────────────────────────────────────────
@ParameterizedTest
@CsvSource(value = {
"'G 3', 3",
"'G3', 3",
"'G 3', 3",
"'3', 3",
"' 3 ', 3",
"'G 2 de Gruyter', 2",
"'', null",
"'garbage', null",
"'G 99', null",
"'G -1', null"
}, nullValues = "null")
void load_parsesGeneration_perRegex(String raw, Integer expected, @TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
when(personService.upsertBySourceRef(any())).thenAnswer(inv -> personOf(inv.getArgument(0)));
Path xlsx = writePersonsWithGeneration(tempDir,
rowWithGeneration("herbert-cram", "Cram", "Herbert", "", "", "False", raw));
new PersonRegisterImporter(personService).load(xlsx.toFile());
ArgumentCaptor<PersonUpsertCommand> captor = ArgumentCaptor.forClass(PersonUpsertCommand.class);
verify(personService).upsertBySourceRef(captor.capture());
assertThat(captor.getValue().generation()).isEqualTo(expected);
}
@Test
void load_succeeds_andLeavesGenerationNull_whenArtifactHasNoGenerationColumn(@TempDir Path tempDir) throws Exception {
// REQ-IMP-001: older artifacts without the `generation` column must still
// import. REQUIRED_HEADERS is intentionally not extended.
PersonService personService = mock(PersonService.class);
when(personService.upsertBySourceRef(any())).thenAnswer(inv -> personOf(inv.getArgument(0)));
Path xlsx = writePersons(tempDir, row(
"old-artifact", "Mueller", "Hans", "", "", "False"));
new PersonRegisterImporter(personService).load(xlsx.toFile());
ArgumentCaptor<PersonUpsertCommand> captor = ArgumentCaptor.forClass(PersonUpsertCommand.class);
verify(personService).upsertBySourceRef(captor.capture());
assertThat(captor.getValue().generation()).isNull();
}
private static Person personOf(PersonUpsertCommand cmd) {
return Person.builder().id(UUID.randomUUID()).sourceRef(cmd.sourceRef())
.firstName(cmd.firstName()).lastName(cmd.lastName())
.provisional(cmd.provisional()).build();
}
private Map<String, String> row(String personId, String lastName, String firstName,
String maidenName, String notes, String provisional) {
Map<String, String> r = new LinkedHashMap<>();
r.put("person_id", personId);
r.put("last_name", lastName);
r.put("first_name", firstName);
r.put("maiden_name", maidenName);
r.put("notes", notes);
r.put("provisional", provisional);
return r;
}
@SafeVarargs
private Path writePersons(Path dir, Map<String, String>... rows) throws Exception {
Path xlsx = dir.resolve("canonical-persons.xlsx");
List<String> headers = List.of("person_id", "last_name", "first_name", "maiden_name", "notes", "provisional");
try (XSSFWorkbook wb = new XSSFWorkbook()) {
Sheet sheet = wb.createSheet("Sheet1");
Row header = sheet.createRow(0);
for (int i = 0; i < headers.size(); i++) {
header.createCell(i).setCellValue(headers.get(i));
}
for (int r = 0; r < rows.length; r++) {
Row row = sheet.createRow(r + 1);
for (int c = 0; c < headers.size(); c++) {
row.createCell(c).setCellValue(rows[r].getOrDefault(headers.get(c), ""));
}
}
try (OutputStream out = Files.newOutputStream(xlsx)) {
wb.write(out);
}
}
return xlsx;
}
private Map<String, String> rowWithGeneration(String personId, String lastName, String firstName,
String maidenName, String notes, String provisional,
String generation) {
Map<String, String> r = row(personId, lastName, firstName, maidenName, notes, provisional);
r.put("generation", generation);
return r;
}
@SafeVarargs
private Path writePersonsWithGeneration(Path dir, Map<String, String>... rows) throws Exception {
Path xlsx = dir.resolve("canonical-persons.xlsx");
List<String> headers = List.of(
"person_id", "last_name", "first_name", "maiden_name", "notes", "provisional", "generation");
try (XSSFWorkbook wb = new XSSFWorkbook()) {
Sheet sheet = wb.createSheet("Sheet1");
Row header = sheet.createRow(0);
for (int i = 0; i < headers.size(); i++) {
header.createCell(i).setCellValue(headers.get(i));
}
for (int r = 0; r < rows.length; r++) {
Row row = sheet.createRow(r + 1);
for (int c = 0; c < headers.size(); c++) {
row.createCell(c).setCellValue(rows[r].getOrDefault(headers.get(c), ""));
}
}
try (OutputStream out = Files.newOutputStream(xlsx)) {
wb.write(out);
}
}
return xlsx;
}
}
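
The `@CsvSource` table in `load_parsesGeneration_perRegex` pins down the accepted inputs: an optional `G` prefix, optional whitespace, a number, and arbitrary trailing text, with out-of-range or non-numeric values mapped to `null`. One way to satisfy that table is sketched below. `GenerationParser`, the exact pattern, and the 0 to 10 range are assumptions; the range is inferred only from the controller tests that reject 11 and -1, so the real lower bound may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for illustration; the importer's real regex is not shown in this diff. */
final class GenerationParser {
    // Optional "G", optional whitespace, then up to three digits not followed by another digit.
    private static final Pattern GENERATION =
            Pattern.compile("^\\s*(?:G\\s*)?(\\d{1,3})(?!\\d)");

    // Assumed bounds, inferred from the tests rather than from the production code.
    static final int MIN_GENERATION = 0;
    static final int MAX_GENERATION = 10;

    private GenerationParser() {}

    /** Returns the parsed generation, or null when the cell is blank, malformed, or out of range. */
    static Integer parse(String raw) {
        if (raw == null) {
            return null;
        }
        Matcher m = GENERATION.matcher(raw);
        if (!m.lookingAt()) {
            return null; // covers "", "garbage" and "G -1"
        }
        int value = Integer.parseInt(m.group(1));
        if (value < MIN_GENERATION || value > MAX_GENERATION) {
            return null; // covers "G 99"
        }
        return value;
    }
}
```

Returning `null` instead of throwing keeps a single bad cell from aborting the whole import, which matches the behaviour the tests above expect.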


@@ -1,222 +0,0 @@
package org.raddatz.familienarchiv.importing;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.junit.jupiter.api.io.TempDir;
import org.mockito.ArgumentCaptor;
import org.mockito.junit.jupiter.MockitoExtension;
import org.raddatz.familienarchiv.exception.DomainException;
import org.raddatz.familienarchiv.exception.ErrorCode;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonService;
import org.raddatz.familienarchiv.person.PersonUpsertCommand;
import org.raddatz.familienarchiv.person.relationship.RelationType;
import org.raddatz.familienarchiv.person.relationship.RelationshipService;
import org.raddatz.familienarchiv.person.relationship.dto.CreateRelationshipRequest;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.doThrow;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
@ExtendWith(MockitoExtension.class)
class PersonTreeImporterTest {
@Test
void load_upsertsTreePersonBySourceRef_withFamilyMemberFlag(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
RelationshipService relationshipService = mock(RelationshipService.class);
when(personService.upsertBySourceRef(any())).thenAnswer(inv -> personOf(inv.getArgument(0)));
Path json = write(tempDir, """
{"persons":[
{"rowId":"row_002","firstName":"Elsgard","lastName":"Allemeyer","maidenName":"Wöhler",
"notes":"Nichte","birthYear":1920,"deathYear":1999,"familyMember":true,"personId":"allemeyer-elsgard"}
],"relationships":[]}
""");
new PersonTreeImporter(personService, relationshipService)
.load(json.toFile());
ArgumentCaptor<PersonUpsertCommand> captor = ArgumentCaptor.forClass(PersonUpsertCommand.class);
verify(personService).upsertBySourceRef(captor.capture());
PersonUpsertCommand cmd = captor.getValue();
assertThat(cmd.sourceRef()).isEqualTo("allemeyer-elsgard");
assertThat(cmd.familyMember()).isTrue();
assertThat(cmd.provisional()).isFalse();
}
@Test
void load_createsRelationship_resolvingRowIdsToUpsertedPersons(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
RelationshipService relationshipService = mock(RelationshipService.class);
UUID idA = UUID.randomUUID();
UUID idB = UUID.randomUUID();
when(personService.upsertBySourceRef(any())).thenAnswer(inv -> {
PersonUpsertCommand c = inv.getArgument(0);
return Person.builder().id(c.sourceRef().equals("a") ? idA : idB)
.sourceRef(c.sourceRef()).lastName(c.lastName()).build();
});
Path json = write(tempDir, """
{"persons":[
{"rowId":"row_a","lastName":"A","familyMember":true,"personId":"a"},
{"rowId":"row_b","lastName":"B","familyMember":true,"personId":"b"}
],"relationships":[
{"personId":"row_a","relatedPersonId":"row_b","type":"SPOUSE_OF","source":"verheiratet_mit"}
]}
""");
new PersonTreeImporter(personService, relationshipService)
.load(json.toFile());
ArgumentCaptor<CreateRelationshipRequest> captor = ArgumentCaptor.forClass(CreateRelationshipRequest.class);
verify(relationshipService).addRelationship(eq(idA), captor.capture());
assertThat(captor.getValue().relatedPersonId()).isEqualTo(idB);
assertThat(captor.getValue().relationType()).isEqualTo(RelationType.SPOUSE_OF);
}
@Test
void load_swallowsDuplicateRelationship_forIdempotentReimport(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
RelationshipService relationshipService = mock(RelationshipService.class);
when(personService.upsertBySourceRef(any()))
.thenAnswer(inv -> personOf(inv.getArgument(0)));
doThrow(DomainException.conflict(ErrorCode.DUPLICATE_RELATIONSHIP, "exists"))
.when(relationshipService).addRelationship(any(), any());
Path json = write(tempDir, """
{"persons":[
{"rowId":"row_a","lastName":"A","familyMember":true,"personId":"a"},
{"rowId":"row_b","lastName":"B","familyMember":true,"personId":"b"}
],"relationships":[
{"personId":"row_a","relatedPersonId":"row_b","type":"SPOUSE_OF","source":"verheiratet_mit"}
]}
""");
PersonTreeImporter importer = new PersonTreeImporter(personService, relationshipService);
// Must not propagate the conflict — re-import is idempotent.
importer.load(json.toFile());
verify(relationshipService).addRelationship(any(), any());
}
@Test
void load_propagatesUnexpectedDomainException_fromAddRelationship(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
RelationshipService relationshipService = mock(RelationshipService.class);
when(personService.upsertBySourceRef(any()))
.thenAnswer(inv -> personOf(inv.getArgument(0)));
// An unexpected ErrorCode (not DUPLICATE/CIRCULAR) must NOT be swallowed.
doThrow(DomainException.internal(ErrorCode.INTERNAL_ERROR, "boom"))
.when(relationshipService).addRelationship(any(), any());
Path json = write(tempDir, """
{"persons":[
{"rowId":"row_a","lastName":"A","familyMember":true,"personId":"a"},
{"rowId":"row_b","lastName":"B","familyMember":true,"personId":"b"}
],"relationships":[
{"personId":"row_a","relatedPersonId":"row_b","type":"SPOUSE_OF","source":"verheiratet_mit"}
]}
""");
PersonTreeImporter importer = new PersonTreeImporter(personService, relationshipService);
assertThatThrownBy(() -> importer.load(json.toFile()))
.isInstanceOf(DomainException.class)
.extracting("code").isEqualTo(ErrorCode.INTERNAL_ERROR);
}
@Test
void load_skipsRelationship_whenRowIdUnresolved(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
RelationshipService relationshipService = mock(RelationshipService.class);
when(personService.upsertBySourceRef(any())).thenAnswer(inv -> personOf(inv.getArgument(0)));
Path json = write(tempDir, """
{"persons":[
{"rowId":"row_a","lastName":"A","familyMember":true,"personId":"a"}
],"relationships":[
{"personId":"row_a","relatedPersonId":"row_ghost","type":"SPOUSE_OF","source":"x"}
]}
""");
new PersonTreeImporter(personService, relationshipService)
.load(json.toFile());
verify(relationshipService, org.mockito.Mockito.never()).addRelationship(any(), any());
}
// ─── generation (#689) ─────────────────────────────────────────────────────
@Test
void load_passesGenerationFromJson(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
RelationshipService relationshipService = mock(RelationshipService.class);
when(personService.upsertBySourceRef(any())).thenAnswer(inv -> personOf(inv.getArgument(0)));
Path json = write(tempDir, """
{"persons":[
{"rowId":"row_a","lastName":"Cram","firstName":"Herbert","familyMember":true,
"personId":"herbert-cram","generation":3}
],"relationships":[]}
""");
new PersonTreeImporter(personService, relationshipService).load(json.toFile());
ArgumentCaptor<PersonUpsertCommand> captor = ArgumentCaptor.forClass(PersonUpsertCommand.class);
verify(personService).upsertBySourceRef(captor.capture());
assertThat(captor.getValue().generation()).isEqualTo(3);
}
@Test
void load_returnsNullGeneration_whenAbsentFromJson(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
RelationshipService relationshipService = mock(RelationshipService.class);
when(personService.upsertBySourceRef(any())).thenAnswer(inv -> personOf(inv.getArgument(0)));
Path json = write(tempDir, """
{"persons":[
{"rowId":"row_a","lastName":"Cram","firstName":"Herbert","familyMember":true,
"personId":"herbert-cram"}
],"relationships":[]}
""");
new PersonTreeImporter(personService, relationshipService).load(json.toFile());
ArgumentCaptor<PersonUpsertCommand> captor = ArgumentCaptor.forClass(PersonUpsertCommand.class);
verify(personService).upsertBySourceRef(captor.capture());
assertThat(captor.getValue().generation()).isNull();
}
@Test
void load_skipsOutOfRangeGeneration_logsWarn_neverAborts(@TempDir Path tempDir) throws Exception {
PersonService personService = mock(PersonService.class);
RelationshipService relationshipService = mock(RelationshipService.class);
when(personService.upsertBySourceRef(any())).thenAnswer(inv -> personOf(inv.getArgument(0)));
Path json = write(tempDir, """
{"persons":[
{"rowId":"row_a","lastName":"Cram","firstName":"Herbert","familyMember":true,
"personId":"herbert-cram","generation":99}
],"relationships":[]}
""");
new PersonTreeImporter(personService, relationshipService).load(json.toFile());
ArgumentCaptor<PersonUpsertCommand> captor = ArgumentCaptor.forClass(PersonUpsertCommand.class);
verify(personService).upsertBySourceRef(captor.capture());
assertThat(captor.getValue().generation()).isNull();
}
private static Person personOf(PersonUpsertCommand cmd) {
return Person.builder().id(UUID.randomUUID()).sourceRef(cmd.sourceRef()).lastName(cmd.lastName()).build();
}
private Path write(Path dir, String json) throws Exception {
Path file = dir.resolve("canonical-persons-tree.json");
Files.writeString(file, json);
return file;
}
}
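
The relationship tests above describe three rules: row ids are resolved to the ids of the upserted persons, a relationship with an unresolved row id is skipped, and a duplicate conflict is swallowed so that a re-import stays idempotent while any other failure propagates. The sketch below shows that control flow in plain Java. All names (`RelationshipLinker`, `RowRelationship`, `DuplicateRelationshipException`, `RelationshipSink`) are hypothetical stand-ins, not the types used by `PersonTreeImporter`.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch; none of these types exist in the code under review. */
final class RelationshipLinker {
    /** One relationship row as it appears in the JSON: both ends are row ids. */
    record RowRelationship(String fromRowId, String toRowId, String type) {}

    /** Stand-in for a "relationship already exists" conflict. */
    static final class DuplicateRelationshipException extends RuntimeException {}

    /** Stand-in for the service call that persists a relationship. */
    interface RelationshipSink {
        void add(UUID from, UUID to, String type);
    }

    private RelationshipLinker() {}

    /** Returns how many relationships were newly created. */
    static int link(Map<String, UUID> idsByRowId,
                    List<RowRelationship> relationships,
                    RelationshipSink sink) {
        int created = 0;
        for (RowRelationship r : relationships) {
            UUID from = idsByRowId.get(r.fromRowId());
            UUID to = idsByRowId.get(r.toRowId());
            if (from == null || to == null) {
                continue; // unresolved row id: skip instead of failing the import
            }
            try {
                sink.add(from, to, r.type());
                created++;
            } catch (DuplicateRelationshipException e) {
                // Already present from an earlier run; ignoring it keeps re-imports idempotent.
            }
            // Any other exception is deliberately not caught and reaches the caller.
        }
        return created;
    }
}
```

Catching only the narrow duplicate case is the design point: a broad `catch` would also hide the unexpected failure that `load_propagatesUnexpectedDomainException_fromAddRelationship` requires to surface.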


@@ -1,103 +0,0 @@
package org.raddatz.familienarchiv.importing;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.junit.jupiter.api.io.TempDir;
import org.mockito.junit.jupiter.MockitoExtension;
import org.raddatz.familienarchiv.tag.Tag;
import org.raddatz.familienarchiv.tag.TagService;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.ArgumentMatchers.isNull;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
@ExtendWith(MockitoExtension.class)
class TagTreeImporterTest {
@Test
void load_upsertsRootTagWithNullParent(@TempDir Path tempDir) throws Exception {
TagService tagService = mock(TagService.class);
when(tagService.upsertBySourceRef(any(), any(), any()))
.thenAnswer(inv -> tagOf(inv.getArgument(0), inv.getArgument(1), inv.getArgument(2)));
Path xlsx = writeTagTree(tempDir, List.<String[]>of(
new String[]{"Themen", "", "Themen"}));
new TagTreeImporter(tagService).load(xlsx.toFile());
verify(tagService).upsertBySourceRef("Themen", "Themen", null);
}
@Test
void load_resolvesParentByPath_forChildTag(@TempDir Path tempDir) throws Exception {
TagService tagService = mock(TagService.class);
UUID rootId = UUID.randomUUID();
when(tagService.upsertBySourceRef(eq("Themen"), eq("Themen"), isNull()))
.thenReturn(tagOf("Themen", "Themen", null, rootId));
when(tagService.upsertBySourceRef(eq("Themen/Brautbriefe"), eq("Brautbriefe"), eq(rootId)))
.thenReturn(tagOf("Themen/Brautbriefe", "Brautbriefe", rootId));
Path xlsx = writeTagTree(tempDir, List.<String[]>of(
new String[]{"Themen", "", "Themen"},
new String[]{"Themen/Brautbriefe", "Themen", "Brautbriefe"}));
new TagTreeImporter(tagService).load(xlsx.toFile());
verify(tagService).upsertBySourceRef("Themen/Brautbriefe", "Brautbriefe", rootId);
}
@Test
void load_returnsCountOfProcessedRows(@TempDir Path tempDir) throws Exception {
TagService tagService = mock(TagService.class);
when(tagService.upsertBySourceRef(any(), any(), any()))
.thenAnswer(inv -> tagOf(inv.getArgument(0), inv.getArgument(1), inv.getArgument(2)));
Path xlsx = writeTagTree(tempDir, List.<String[]>of(
new String[]{"Themen", "", "Themen"},
new String[]{"Themen/Brautbriefe", "Themen", "Brautbriefe"}));
int processed = new TagTreeImporter(tagService).load(xlsx.toFile());
assertThat(processed).isEqualTo(2);
}
private static Tag tagOf(String sourceRef, String name, UUID parentId) {
return tagOf(sourceRef, name, parentId, UUID.randomUUID());
}
private static Tag tagOf(String sourceRef, String name, UUID parentId, UUID id) {
return Tag.builder().id(id).sourceRef(sourceRef).name(name).parentId(parentId).build();
}
private Path writeTagTree(Path dir, List<String[]> rows) throws Exception {
Path xlsx = dir.resolve("canonical-tag-tree.xlsx");
try (XSSFWorkbook wb = new XSSFWorkbook()) {
Sheet sheet = wb.createSheet("Sheet1");
Row header = sheet.createRow(0);
header.createCell(0).setCellValue("tag_path");
header.createCell(1).setCellValue("parent_name");
header.createCell(2).setCellValue("tag_name");
for (int r = 0; r < rows.size(); r++) {
Row row = sheet.createRow(r + 1);
String[] values = rows.get(r);
for (int c = 0; c < values.length; c++) {
row.createCell(c).setCellValue(values[c]);
}
}
try (OutputStream out = Files.newOutputStream(xlsx)) {
wb.write(out);
}
}
return xlsx;
}
}
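
`load_resolvesParentByPath_forChildTag` expects the child `Themen/Brautbriefe` to be upserted with the id of the root `Themen`. A minimal sketch of path-based parent resolution follows. It assumes that `/` separates path segments and that parents are listed before their children, as in the test fixture; `TagPathResolver` is a hypothetical name, and the real importer may rely on the `parent_name` column instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch; the importer's actual lookup is not shown in this diff. */
final class TagPathResolver {
    private TagPathResolver() {}

    /** Returns the path of the parent tag, or null for a root tag. */
    static String parentPath(String tagPath) {
        int idx = tagPath.lastIndexOf('/');
        return idx < 0 ? null : tagPath.substring(0, idx);
    }

    /**
     * Assigns a fresh id to every path and returns path -> parent id (null for roots).
     * Assumes parents appear before children in the input order.
     */
    static Map<String, UUID> parentIds(List<String> tagPaths) {
        Map<String, UUID> idsByPath = new LinkedHashMap<>();
        Map<String, UUID> parentIdByPath = new LinkedHashMap<>();
        for (String path : tagPaths) {
            String parent = parentPath(path);
            UUID parentId = parent == null ? null : idsByPath.get(parent);
            if (parent != null && parentId == null) {
                throw new IllegalArgumentException("parent not seen yet: " + parent);
            }
            idsByPath.put(path, UUID.randomUUID());
            parentIdByPath.put(path, parentId);
        }
        return parentIdByPath;
    }
}
```

Keeping a path-to-id map during the walk means each row needs only one lookup, at the cost of requiring the sheet to be ordered parent-first.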


@@ -65,144 +65,44 @@ class PersonControllerTest {
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_returns200_withEmptyPagedResult() throws Exception {
when(personService.search(any(), eq(0), eq(50), eq(null)))
.thenReturn(PersonSearchResult.paged(Collections.emptyList(), 0, 50, 0));
void getPersons_returns200_withEmptyList() throws Exception {
when(personService.findAll(null)).thenReturn(Collections.emptyList());
mockMvc.perform(get("/api/persons"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.items").isArray())
.andExpect(jsonPath("$.totalElements").value(0));
.andExpect(status().isOk());
}
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_delegatesQueryParam_toService() throws Exception {
PersonSummaryDTO dto = mockPersonSummary("Hans", "Müller");
when(personService.search(any(), eq(0), eq(50), eq("Hans")))
.thenReturn(PersonSearchResult.paged(List.of(dto), 0, 50, 1));
when(personService.findAll("Hans")).thenReturn(List.of(dto));
mockMvc.perform(get("/api/persons").param("q", "Hans"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.items[0].firstName").value("Hans"));
.andExpect(jsonPath("$[0].firstName").value("Hans"));
}
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_passesFilterParams_toService() throws Exception {
ArgumentCaptor<PersonFilter> filterCaptor = ArgumentCaptor.forClass(PersonFilter.class);
when(personService.search(filterCaptor.capture(), eq(0), eq(50), eq(null)))
.thenReturn(PersonSearchResult.paged(Collections.emptyList(), 0, 50, 0));
mockMvc.perform(get("/api/persons")
.param("type", "INSTITUTION")
.param("familyOnly", "true")
.param("hasDocuments", "true")
.param("provisional", "false"))
.andExpect(status().isOk());
PersonFilter captured = filterCaptor.getValue();
assertThat(captured.type()).isEqualTo(PersonType.INSTITUTION);
assertThat(captured.familyOnly()).isTrue();
assertThat(captured.hasDocuments()).isTrue();
assertThat(captured.provisional()).isFalse();
}
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_defaultsToReaderDefault_whenNoReviewFlag() throws Exception {
ArgumentCaptor<PersonFilter> filterCaptor = ArgumentCaptor.forClass(PersonFilter.class);
when(personService.search(filterCaptor.capture(), eq(0), eq(50), eq(null)))
.thenReturn(PersonSearchResult.paged(Collections.emptyList(), 0, 50, 0));
mockMvc.perform(get("/api/persons")).andExpect(status().isOk());
assertThat(filterCaptor.getValue().readerDefault()).isTrue();
}
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_dropsReaderDefault_whenReviewFlagSet() throws Exception {
ArgumentCaptor<PersonFilter> filterCaptor = ArgumentCaptor.forClass(PersonFilter.class);
when(personService.search(filterCaptor.capture(), eq(0), eq(50), eq(null)))
.thenReturn(PersonSearchResult.paged(Collections.emptyList(), 0, 50, 0));
mockMvc.perform(get("/api/persons").param("review", "true")).andExpect(status().isOk());
assertThat(filterCaptor.getValue().readerDefault()).isFalse();
}
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_passesPageAndSize_toService() throws Exception {
when(personService.search(any(), eq(2), eq(25), eq(null)))
.thenReturn(PersonSearchResult.paged(Collections.emptyList(), 2, 25, 0));
mockMvc.perform(get("/api/persons").param("page", "2").param("size", "25"))
.andExpect(status().isOk());
verify(personService).search(any(), eq(2), eq(25), eq(null));
}
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_returns400_whenSizeIsZero() throws Exception {
mockMvc.perform(get("/api/persons").param("size", "0"))
.andExpect(status().isBadRequest());
}
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_returns400_whenSizeExceeds100() throws Exception {
mockMvc.perform(get("/api/persons").param("size", "101"))
.andExpect(status().isBadRequest());
}
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_returns400_whenPageIsNegative() throws Exception {
mockMvc.perform(get("/api/persons").param("page", "-1"))
.andExpect(status().isBadRequest());
}
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_delegatesTopByDocumentCount_whenSortGiven() throws Exception {
void getPersons_delegatesTopByDocumentCount_whenSortAndSizeGiven() throws Exception {
PersonSummaryDTO top = mockPersonSummary("Käthe", "Raddatz");
when(personService.findTopByDocumentCount(4)).thenReturn(List.of(top));
mockMvc.perform(get("/api/persons").param("sort", "documentCount").param("size", "4"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.items[0].firstName").value("Käthe"));
.andExpect(jsonPath("$[0].firstName").value("Käthe"));
}
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_topByDocumentCount_isNonPaged_totalElementsEqualsReturnedCount() throws Exception {
// The top-N dashboard path is deliberately NON-paged: it returns the complete result
// (no further page exists), so totalElements equals the number of rows returned and
// totalPages is 1. Pinned so nobody "fixes" it into a misleading paged total.
when(personService.findTopByDocumentCount(50))
.thenReturn(List.of(mockPersonSummary("Käthe", "Raddatz"),
mockPersonSummary("Hans", "Müller")));
void getPersons_capsTopByDocumentCount_atFifty() throws Exception {
ArgumentCaptor<Integer> sizeCaptor = ArgumentCaptor.forClass(Integer.class);
when(personService.findTopByDocumentCount(sizeCaptor.capture())).thenReturn(Collections.emptyList());
mockMvc.perform(get("/api/persons").param("sort", "documentCount"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.items.length()").value(2))
.andExpect(jsonPath("$.totalElements").value(2))
.andExpect(jsonPath("$.pageNumber").value(0))
.andExpect(jsonPath("$.pageSize").value(2))
.andExpect(jsonPath("$.totalPages").value(1));
}
mockMvc.perform(get("/api/persons").param("sort", "documentCount").param("size", "999"))
.andExpect(status().isOk());
@Test
@WithMockUser(authorities = "READ_ALL")
void getPersons_topByDocumentCount_emptyResult_reportsZeroPages() throws Exception {
when(personService.findTopByDocumentCount(50)).thenReturn(Collections.emptyList());
mockMvc.perform(get("/api/persons").param("sort", "documentCount"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.totalElements").value(0))
.andExpect(jsonPath("$.totalPages").value(0));
assertThat(sizeCaptor.getValue()).isEqualTo(50);
}
private PersonSummaryDTO mockPersonSummary(String firstName, String lastName) {
@@ -217,7 +117,6 @@ class PersonControllerTest {
public Integer getDeathYear() { return null; }
public String getNotes() { return null; }
public boolean isFamilyMember() { return false; }
public boolean isProvisional() { return false; }
public long getDocumentCount() { return 0; }
};
}
@@ -498,61 +397,6 @@ class PersonControllerTest {
.andExpect(status().isNoContent());
}
// ─── PATCH /api/persons/{id}/confirm ──────────────────────────────────────
@Test
void confirmPerson_returns401_whenUnauthenticated() throws Exception {
mockMvc.perform(patch("/api/persons/{id}/confirm", UUID.randomUUID()).with(csrf()))
.andExpect(status().isUnauthorized());
}
@Test
@WithMockUser(authorities = "READ_ALL")
void confirmPerson_returns403_whenUserHasOnlyReadPermission() throws Exception {
mockMvc.perform(patch("/api/persons/{id}/confirm", UUID.randomUUID()).with(csrf()))
.andExpect(status().isForbidden());
}
@Test
@WithMockUser(authorities = "WRITE_ALL")
void confirmPerson_returns200_andClearsProvisional() throws Exception {
UUID id = UUID.randomUUID();
Person confirmed = Person.builder().id(id).firstName("Bald").lastName("Bestaetigt").provisional(false).build();
when(personService.confirmPerson(id)).thenReturn(confirmed);
mockMvc.perform(patch("/api/persons/{id}/confirm", id).with(csrf()))
.andExpect(status().isOk())
.andExpect(jsonPath("$.provisional").value(false));
verify(personService).confirmPerson(id);
}
// ─── DELETE /api/persons/{id} ──────────────────────────────────────────────
@Test
void deletePerson_returns401_whenUnauthenticated() throws Exception {
mockMvc.perform(delete("/api/persons/{id}", UUID.randomUUID()).with(csrf()))
.andExpect(status().isUnauthorized());
}
@Test
@WithMockUser(authorities = "READ_ALL")
void deletePerson_returns403_whenUserHasOnlyReadPermission() throws Exception {
mockMvc.perform(delete("/api/persons/{id}", UUID.randomUUID()).with(csrf()))
.andExpect(status().isForbidden());
}
@Test
@WithMockUser(authorities = "WRITE_ALL")
void deletePerson_returns204_whenValid() throws Exception {
UUID id = UUID.randomUUID();
mockMvc.perform(delete("/api/persons/{id}", id).with(csrf()))
.andExpect(status().isNoContent());
verify(personService).deletePerson(id);
}
// ─── PUT /api/persons/{id} — lastName blank branch ────────────────────────
@Test
@@ -718,74 +562,4 @@ class PersonControllerTest {
.content("{\"lastName\":\"de Gruyter\"}"))
.andExpect(status().isBadRequest());
}
// ─── generation field validation (#689) ────────────────────────────────────
@Test
@WithMockUser(authorities = "WRITE_ALL")
void updatePerson_returns400_whenGenerationAboveRange() throws Exception {
mockMvc.perform(put("/api/persons/{id}", UUID.randomUUID()).with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"firstName\":\"Hans\",\"lastName\":\"Müller\","
+ "\"personType\":\"PERSON\",\"generation\":11}"))
.andExpect(status().isBadRequest())
.andExpect(jsonPath("$.code").value(ErrorCode.VALIDATION_ERROR.name()));
}
@Test
@WithMockUser(authorities = "WRITE_ALL")
void updatePerson_returns400_whenGenerationBelowRange() throws Exception {
mockMvc.perform(put("/api/persons/{id}", UUID.randomUUID()).with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"firstName\":\"Hans\",\"lastName\":\"Müller\","
+ "\"personType\":\"PERSON\",\"generation\":-1}"))
.andExpect(status().isBadRequest())
.andExpect(jsonPath("$.code").value(ErrorCode.VALIDATION_ERROR.name()));
}
@Test
@WithMockUser(authorities = "WRITE_ALL")
void updatePerson_returns200_whenGenerationNull() throws Exception {
// Symmetric body assertion: the response must echo generation as null (not
// absent), so the frontend re-hydrates the "(none)" option after a clear.
// Without this, the in-range test below would be the only end-to-end proof
// that the field flows through the controller.
Person saved = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").build();
when(personService.updatePerson(any(), any())).thenReturn(saved);
mockMvc.perform(put("/api/persons/{id}", UUID.randomUUID()).with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"firstName\":\"Hans\",\"lastName\":\"Müller\","
+ "\"personType\":\"PERSON\",\"generation\":null}"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.generation").value(org.hamcrest.Matchers.nullValue()));
}
@Test
@WithMockUser(authorities = "WRITE_ALL")
void updatePerson_returns200_whenGenerationInRange() throws Exception {
Person saved = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").generation(3).build();
when(personService.updatePerson(any(), any())).thenReturn(saved);
mockMvc.perform(put("/api/persons/{id}", UUID.randomUUID()).with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"firstName\":\"Hans\",\"lastName\":\"Müller\","
+ "\"personType\":\"PERSON\",\"generation\":3}"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.generation").value(3));
}
@Test
@WithMockUser(authorities = "WRITE_ALL")
void createPerson_returns200_whenGenerationInRange() throws Exception {
Person saved = Person.builder().id(UUID.randomUUID()).firstName("Hans").lastName("Müller").generation(3).build();
when(personService.createPerson(any(org.raddatz.familienarchiv.person.PersonUpdateDTO.class))).thenReturn(saved);
mockMvc.perform(post("/api/persons").with(csrf())
.contentType(MediaType.APPLICATION_JSON)
.content("{\"firstName\":\"Hans\",\"lastName\":\"Müller\","
+ "\"personType\":\"PERSON\",\"generation\":3}"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.generation").value(3));
}
}
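
The generation tests above pin a small validation contract: `null` and an in-range value pass, while `-1` and `11` are rejected with `VALIDATION_ERROR`. A minimal plain-Java sketch of such a check is given here. The class name, the method name, and the inclusive bounds 0..10 are assumptions; the tests only show that `null` and `3` are accepted and that `-1` and `11` are not, and the project may express the rule with validation annotations instead.

```java
final class GenerationRangeSketch {
    // Assumed inclusive bounds; the tests only pin that -1 and 11 are out of range.
    static final int MIN_GENERATION = 0;
    static final int MAX_GENERATION = 10;

    private GenerationRangeSketch() {}

    /** Returns true when the value is unset (null) or inside the assumed range. */
    static boolean isValidGeneration(Integer generation) {
        if (generation == null) {
            return true; // null means "no generation" and has to stay allowed
        }
        return generation >= MIN_GENERATION && generation <= MAX_GENERATION;
    }
}
```

Keeping `null` valid matters for the clear path: the form has to be able to send `generation: null` and get a 200 back.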


@@ -1,202 +0,0 @@
package org.raddatz.familienarchiv.person;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import java.util.Optional;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.argThat;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
@ExtendWith(MockitoExtension.class)
class PersonImportUpsertTest {
@Mock PersonRepository personRepository;
@Mock PersonNameAliasRepository aliasRepository;
@InjectMocks PersonService personService;
@Test
void upsertBySourceRef_insertsNewPerson_whenSourceRefUnknown() {
when(personRepository.findBySourceRef("clara-cram")).thenReturn(Optional.empty());
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpsertCommand cmd = PersonUpsertCommand.builder()
.sourceRef("clara-cram").firstName("Clara").lastName("Cram")
.personType(PersonType.PERSON).provisional(false).build();
Person result = personService.upsertBySourceRef(cmd);
assertThat(result.getSourceRef()).isEqualTo("clara-cram");
assertThat(result.getFirstName()).isEqualTo("Clara");
assertThat(result.getLastName()).isEqualTo("Cram");
assertThat(result.isProvisional()).isFalse();
}
@Test
void upsertBySourceRef_updatesInPlace_whenSourceRefExists() {
Person existing = Person.builder()
.id(UUID.randomUUID()).sourceRef("clara-cram")
.firstName("Clara").lastName("Cram").build();
when(personRepository.findBySourceRef("clara-cram")).thenReturn(Optional.of(existing));
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpsertCommand cmd = PersonUpsertCommand.builder()
.sourceRef("clara-cram").firstName("Clara").lastName("Cram")
.notes("Updated note").personType(PersonType.PERSON).provisional(false).build();
personService.upsertBySourceRef(cmd);
verify(personRepository).save(argThat(p -> p.getId().equals(existing.getId())));
verify(personRepository, never()).save(argThat(p -> p.getId() == null));
}
@Test
void upsertBySourceRef_preservesHumanEditedNonBlankFields() {
// A human renamed the maiden-name register person and added notes in-app.
Person humanEdited = Person.builder()
.id(UUID.randomUUID()).sourceRef("clara-cram")
.firstName("Klara").lastName("Cram-Müller").notes("Verified by Marcel").build();
when(personRepository.findBySourceRef("clara-cram")).thenReturn(Optional.of(humanEdited));
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpsertCommand cmd = PersonUpsertCommand.builder()
.sourceRef("clara-cram").firstName("Clara").lastName("Cram")
.notes("Auto note").personType(PersonType.PERSON).provisional(false).build();
Person result = personService.upsertBySourceRef(cmd);
// Human edits survive the re-import.
assertThat(result.getFirstName()).isEqualTo("Klara");
assertThat(result.getLastName()).isEqualTo("Cram-Müller");
assertThat(result.getNotes()).isEqualTo("Verified by Marcel");
}
@Test
void upsertBySourceRef_fillsOnlyBlankFields_onReimport() {
Person existing = Person.builder()
.id(UUID.randomUUID()).sourceRef("clara-cram")
.firstName("Clara").lastName("Cram").notes(null).build();
when(personRepository.findBySourceRef("clara-cram")).thenReturn(Optional.of(existing));
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpsertCommand cmd = PersonUpsertCommand.builder()
.sourceRef("clara-cram").firstName("Clara").lastName("Cram")
.notes("Nichte von Herbert").personType(PersonType.PERSON).provisional(false).build();
Person result = personService.upsertBySourceRef(cmd);
// Blank field gets filled by canonical value.
assertThat(result.getNotes()).isEqualTo("Nichte von Herbert");
}
@Test
void upsertBySourceRef_fillsBlankYears_butPreservesHumanEditedYears_onReimport() {
// Existing has a human-set birthYear and a blank deathYear.
Person existing = Person.builder()
.id(UUID.randomUUID()).sourceRef("clara-cram")
.lastName("Cram").birthYear(1890).deathYear(null).build();
when(personRepository.findBySourceRef("clara-cram")).thenReturn(Optional.of(existing));
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpsertCommand cmd = PersonUpsertCommand.builder()
.sourceRef("clara-cram").lastName("Cram")
.birthYear(1888).deathYear(1965)
.personType(PersonType.PERSON).provisional(false).build();
Person result = personService.upsertBySourceRef(cmd);
assertThat(result.getBirthYear()).isEqualTo(1890); // human value kept
assertThat(result.getDeathYear()).isEqualTo(1965); // blank filled from canonical
}
@Test
void upsertBySourceRef_neverFlipsProvisionalBackToTrue_onceHumanConfirmed() {
// A human confirmed this provisional importer-created person (provisional -> false).
Person confirmed = Person.builder()
.id(UUID.randomUUID()).sourceRef("schwester-hanni")
.firstName(null).lastName("Schwester Hanni").provisional(false).build();
when(personRepository.findBySourceRef("schwester-hanni")).thenReturn(Optional.of(confirmed));
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpsertCommand cmd = PersonUpsertCommand.builder()
.sourceRef("schwester-hanni").lastName("Schwester Hanni")
.personType(PersonType.PERSON).provisional(true).build();
Person result = personService.upsertBySourceRef(cmd);
assertThat(result.isProvisional()).isFalse();
}
@Test
void upsertBySourceRef_setsProvisionalTrue_forNewProvisionalPerson() {
when(personRepository.findBySourceRef("noise-geschirr")).thenReturn(Optional.empty());
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpsertCommand cmd = PersonUpsertCommand.builder()
.sourceRef("noise-geschirr").lastName("Tante Tüten")
.personType(PersonType.PERSON).provisional(true).build();
Person result = personService.upsertBySourceRef(cmd);
assertThat(result.isProvisional()).isTrue();
}
// ─── generation (#689) ─────────────────────────────────────────────────────
@Test
void upsertBySourceRef_writesGeneration_onFirstImport() {
when(personRepository.findBySourceRef("herbert-cram")).thenReturn(Optional.empty());
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpsertCommand cmd = PersonUpsertCommand.builder()
.sourceRef("herbert-cram").firstName("Herbert").lastName("Cram")
.generation(3).personType(PersonType.PERSON).provisional(false).build();
Person result = personService.upsertBySourceRef(cmd);
assertThat(result.getGeneration()).isEqualTo(3);
}
@Test
void upsertBySourceRef_preservesHumanEditedGeneration_onReimport() {
Person humanEdited = Person.builder()
.id(UUID.randomUUID()).sourceRef("herbert-cram")
.firstName("Herbert").lastName("Cram").generation(4).build();
when(personRepository.findBySourceRef("herbert-cram")).thenReturn(Optional.of(humanEdited));
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpsertCommand cmd = PersonUpsertCommand.builder()
.sourceRef("herbert-cram").firstName("Herbert").lastName("Cram")
.generation(2).personType(PersonType.PERSON).provisional(false).build();
Person result = personService.upsertBySourceRef(cmd);
assertThat(result.getGeneration()).isEqualTo(4);
}
@Test
void mergeCanonical_overwrites_human_null_with_canonical_value_documenting_known_limitation() {
// If preferHuman gains explicit-null-vs-unset semantics, delete this test (see issue #689).
Person existing = Person.builder()
.id(UUID.randomUUID()).sourceRef("herbert-cram")
.firstName("Herbert").lastName("Cram").generation(null).build();
when(personRepository.findBySourceRef("herbert-cram")).thenReturn(Optional.of(existing));
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpsertCommand cmd = PersonUpsertCommand.builder()
.sourceRef("herbert-cram").firstName("Herbert").lastName("Cram")
.generation(3).personType(PersonType.PERSON).provisional(false).build();
Person result = personService.upsertBySourceRef(cmd);
assertThat(result.getGeneration()).isEqualTo(3);
}
}
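
The upsert tests above describe a "human value wins, blanks get filled" merge, including the documented limitation that a `null` set by a human cannot be told apart from a field that was never set. The sketch below is one way to model that rule. Only the name `preferHuman` appears in the test comments; its signature, the blank-string variant, and the `mergeProvisional` helper are hypothetical and not taken from `PersonService`.

```java
final class PreferHumanSketch {
    private PreferHumanSketch() {}

    /** Keeps the existing value unless it is null; only then uses the canonical one. */
    static <T> T preferHuman(T existing, T canonical) {
        return existing != null ? existing : canonical;
    }

    /** Text variant (assumption): a blank string counts as unset, like null. */
    static String preferHumanText(String existing, String canonical) {
        return (existing != null && !existing.isBlank()) ? existing : canonical;
    }

    /**
     * Assumed rule for re-imports: the flag can stay true only if both sides
     * say true, so a human-confirmed person (false) never becomes provisional again.
     */
    static boolean mergeProvisional(boolean existing, boolean incoming) {
        return existing && incoming;
    }
}
```

With this shape, `preferHuman(4, 2)` keeps the human-edited generation, while `preferHuman(null, 3)` takes the canonical value whether or not the `null` was intentional, which is exactly the limitation the last test records.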


@@ -463,248 +463,4 @@ class PersonRepositoryTest {
assertThat(result).hasSize(1);
assertThat(result.get(0).getLastName()).isEqualTo("Gesellschafter des Verlages");
}
// ─── #671: provisional must be SELECTed in all three native projections ───
// Adding isProvisional() to the interface compiles even if a native query forgets
// to SELECT p.provisional — it then silently returns false. These tests are the only
// guard against that trap, so they must run against real Postgres.
@Test
void findAllWithDocumentCount_projectsProvisionalTrue() {
personRepository.save(Person.builder()
.firstName("Inferred").lastName("Person").provisional(true).build());
List<PersonSummaryDTO> result = personRepository.findAllWithDocumentCount();
assertThat(result).anyMatch(PersonSummaryDTO::isProvisional);
}
@Test
void searchWithDocumentCount_projectsProvisionalTrue() {
personRepository.save(Person.builder()
.firstName("Provisorisch").lastName("Müller").provisional(true).build());
List<PersonSummaryDTO> result = personRepository.searchWithDocumentCount("Provisorisch");
assertThat(result).hasSize(1);
assertThat(result.get(0).isProvisional()).isTrue();
}
@Test
void findTopByDocumentCount_projectsProvisionalTrue() {
Person provisional = personRepository.save(Person.builder()
.firstName("Top").lastName("Provisional").provisional(true).build());
documentRepository.save(Document.builder()
.title("Brief").originalFilename("b.pdf")
.status(DocumentStatus.UPLOADED)
.sender(provisional).build());
List<PersonSummaryDTO> result = personRepository.findTopByDocumentCount(10);
PersonSummaryDTO summary = result.stream()
.filter(p -> p.getId().equals(provisional.getId())).findFirst().orElseThrow();
assertThat(summary.isProvisional()).isTrue();
}
// ─── #667: filter-aware paged slice + paired COUNT (Postgres-only) ────────
// The slice query (findByFilter) and the count query (countByFilter) MUST share one
// WHERE clause so totalElements can never drift from the rendered page. These tests run
// against real Postgres because the slice ORDER BY uses a computed alias that fails on H2.
private void seedDirectoryFixture() {
// Register family member, no documents — visible by reader default (familyMember)
personRepository.save(Person.builder().firstName("Karl").lastName("Register").familyMember(true).build());
// Person with one document — visible by reader default (documentCount > 0)
Person hasDoc = personRepository.save(Person.builder().firstName("Doku").lastName("Person").build());
documentRepository.save(Document.builder().title("B").originalFilename("b.pdf")
.status(DocumentStatus.UPLOADED).sender(hasDoc).build());
// Provisional, zero-document, non-family — hidden by reader default
personRepository.save(Person.builder().firstName("Unbe").lastName("Staetigt").provisional(true).build());
// An institution with no documents, non-family, non-provisional
personRepository.save(Person.builder().lastName("Verlag GmbH").personType(PersonType.INSTITUTION).build());
}
@Test
void findByFilter_readerDefault_returnsOnlyFamilyOrWithDocuments() {
seedDirectoryFixture();
List<PersonSummaryDTO> slice = personRepository.findByFilter(
null, null, null, null, true, null, 50, 0);
assertThat(slice).extracting(PersonSummaryDTO::getLastName)
.containsExactlyInAnyOrder("Register", "Person");
}
@Test
void countByFilter_readerDefault_matchesSliceSize() {
seedDirectoryFixture();
long count = personRepository.countByFilter(null, null, null, null, true, null);
assertThat(count).isEqualTo(2);
}
@Test
void findByFilter_showAll_returnsEveryone() {
seedDirectoryFixture();
List<PersonSummaryDTO> slice = personRepository.findByFilter(
null, null, null, null, false, null, 50, 0);
assertThat(slice).hasSize(4);
}
@Test
void findByFilter_typeInstitution_returnsOnlyInstitutions() {
seedDirectoryFixture();
List<PersonSummaryDTO> slice = personRepository.findByFilter(
"INSTITUTION", null, null, null, false, null, 50, 0);
assertThat(slice).extracting(PersonSummaryDTO::getLastName).containsExactly("Verlag GmbH");
}
@Test
void findByFilter_familyOnly_returnsOnlyFamilyMembers() {
seedDirectoryFixture();
List<PersonSummaryDTO> slice = personRepository.findByFilter(
null, true, null, null, false, null, 50, 0);
assertThat(slice).extracting(PersonSummaryDTO::getLastName).containsExactly("Register");
}
@Test
void findByFilter_hasDocuments_returnsOnlyPersonsWithDocuments() {
seedDirectoryFixture();
List<PersonSummaryDTO> slice = personRepository.findByFilter(
null, null, true, null, false, null, 50, 0);
assertThat(slice).extracting(PersonSummaryDTO::getLastName).containsExactly("Person");
}
@Test
void findByFilter_provisionalTrue_returnsOnlyProvisional() {
seedDirectoryFixture();
List<PersonSummaryDTO> slice = personRepository.findByFilter(
null, null, null, true, false, null, 50, 0);
assertThat(slice).extracting(PersonSummaryDTO::getLastName).containsExactly("Staetigt");
}
@Test
void findByFilter_combinedFilters_andTogether() {
seedDirectoryFixture();
// family + has-documents → intersection is empty (Register has no docs, Doku is not family)
List<PersonSummaryDTO> slice = personRepository.findByFilter(
null, true, true, null, false, null, 50, 0);
assertThat(slice).isEmpty();
}
@Test
void findByFilter_query_combinesWithFilters() {
seedDirectoryFixture();
List<PersonSummaryDTO> slice = personRepository.findByFilter(
null, null, null, null, false, "Verlag", 50, 0);
assertThat(slice).extracting(PersonSummaryDTO::getLastName).containsExactly("Verlag GmbH");
}
@Test
void findByFilter_pageBeyondRange_returnsEmptySlice() {
seedDirectoryFixture();
List<PersonSummaryDTO> slice = personRepository.findByFilter(
null, null, null, null, false, null, 50, 999 * 50);
assertThat(slice).isEmpty();
}
@Test
void findByFilter_respectsPageSize() {
seedDirectoryFixture();
List<PersonSummaryDTO> firstPage = personRepository.findByFilter(
null, null, null, null, false, null, 2, 0);
List<PersonSummaryDTO> secondPage = personRepository.findByFilter(
null, null, null, null, false, null, 2, 2);
assertThat(firstPage).hasSize(2);
assertThat(secondPage).hasSize(2);
assertThat(firstPage).extracting(PersonSummaryDTO::getId)
.doesNotContainAnyElementsOf(secondPage.stream().map(PersonSummaryDTO::getId).toList());
}
@Test
void countByFilter_typeInstitution_matchesSlice() {
seedDirectoryFixture();
long count = personRepository.countByFilter("INSTITUTION", null, null, null, false, null);
assertThat(count).isEqualTo(1);
}
@Test
void countByFilter_query_matchesSliceSize() {
// The whole point of the shared FILTER_WHERE is that the slice and the count can never
// drift. Pin the query (LIKE) path explicitly: countByFilter must equal the slice size
// so a future edit to one query's LIKE clause is caught.
seedDirectoryFixture();
List<PersonSummaryDTO> slice = personRepository.findByFilter(
null, null, null, null, false, "Verlag", 50, 0);
long count = personRepository.countByFilter(null, null, null, null, false, "Verlag");
assertThat(count).isEqualTo(slice.size());
assertThat(count).isEqualTo(1);
}
@Test
void findByFilter_projectsDocumentCount() {
seedDirectoryFixture();
List<PersonSummaryDTO> slice = personRepository.findByFilter(
null, null, true, null, false, null, 50, 0);
assertThat(slice.get(0).getDocumentCount()).isEqualTo(1);
}
// ─── generation column (#689) ──────────────────────────────────────────────
@Test
void save_persistsGeneration_andFindByIdReturnsSameGeneration() {
Person person = Person.builder()
.firstName("Walter")
.lastName("Raddatz")
.generation(3)
.build();
Person saved = personRepository.save(person);
entityManager.flush();
entityManager.clear();
Optional<Person> found = personRepository.findById(saved.getId());
assertThat(found).isPresent();
assertThat(found.get().getGeneration()).isEqualTo(3);
}
@Test
void save_allowsNullGeneration_existingRowsRemainNull() {
Person person = Person.builder()
.firstName("Anonym")
.lastName("Person")
.build();
Person saved = personRepository.save(person);
entityManager.flush();
entityManager.clear();
Optional<Person> found = personRepository.findById(saved.getId());
assertThat(found).isPresent();
assertThat(found.get().getGeneration()).isNull();
}
}
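
The #667 comments above state that the slice query and the count query must share one WHERE clause (`FILTER_WHERE`) so `totalElements` cannot drift from the rendered page. A minimal sketch of that idea is to build both SQL strings from one constant. The table name, column names, and the reduced set of filter parameters below are hypothetical; the real queries are not shown in this diff.

```java
final class FilterQuerySketch {
    private FilterQuerySketch() {}

    // Single source of truth for the filter predicate (hypothetical columns).
    static final String FILTER_WHERE =
            " WHERE (CAST(:type AS text) IS NULL OR p.person_type = :type)"
          + " AND (CAST(:query AS text) IS NULL"
          + " OR LOWER(p.last_name) LIKE LOWER(CONCAT('%', :query, '%')))";

    // Paged slice: same predicate, plus ordering and LIMIT/OFFSET.
    static final String SLICE_SQL =
            "SELECT p.id, p.last_name FROM persons p"
          + FILTER_WHERE
          + " ORDER BY p.last_name LIMIT :limit OFFSET :offset";

    // Paired count: same predicate, nothing else.
    static final String COUNT_SQL =
            "SELECT COUNT(*) FROM persons p" + FILTER_WHERE;
}
```

Because both strings are concatenated from the same constant, an edit to the predicate changes the slice and the count together; the `countByFilter_*_matchesSlice` tests then guard against someone inlining a divergent copy.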


@@ -2,9 +2,6 @@ package org.raddatz.familienarchiv.person;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.raddatz.familienarchiv.document.Document;
import org.raddatz.familienarchiv.document.DocumentRepository;
import org.raddatz.familienarchiv.document.DocumentStatus;
import org.raddatz.familienarchiv.person.Person;
import org.raddatz.familienarchiv.person.PersonType;
import org.raddatz.familienarchiv.person.PersonRepository;
@@ -16,11 +13,6 @@ import org.springframework.test.context.bean.override.mockito.MockitoBean;
import org.springframework.transaction.annotation.Transactional;
import software.amazon.awssdk.services.s3.S3Client;
import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;
import java.util.Set;
import static org.assertj.core.api.Assertions.assertThat;
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.NONE)
@@ -32,9 +24,6 @@ class PersonServiceIntegrationTest {
@MockitoBean S3Client s3Client;
@Autowired PersonService personService;
@Autowired PersonRepository personRepository;
@Autowired DocumentRepository documentRepository;
@PersistenceContext EntityManager entityManager;
@Test
void findOrCreateByAlias_skipReturnsNull_noRecordCreated() {
@@ -74,150 +63,4 @@ class PersonServiceIntegrationTest {
assertThat(result.getFirstName()).isEqualTo("Clara");
assertThat(result.getLastName()).isEqualTo("Cram");
}
// ─── #667: confirm round-trip + reader-default semantics ──────────────────
@Test
void search_readerDefault_hidesProvisionalZeroDocumentPerson() {
personRepository.save(Person.builder()
.firstName("Unbe").lastName("Staetigt").provisional(true).build());
PersonSearchResult result = personService.search(PersonFilter.cleanDefault(), 0, 50, null);
assertThat(result.items()).noneMatch(p -> p.getLastName().equals("Staetigt"));
assertThat(result.totalElements()).isEqualTo(result.items().size());
}
@Test
void search_showAll_includesProvisionalZeroDocumentPerson() {
personRepository.save(Person.builder()
.firstName("Unbe").lastName("Staetigt").provisional(true).build());
PersonSearchResult result = personService.search(PersonFilter.showAll(), 0, 50, null);
assertThat(result.items()).anyMatch(p -> p.getLastName().equals("Staetigt"));
}
@Test
void confirmPerson_clearsProvisional_andShowAllTreatsItAsConfirmed() {
Person provisional = personRepository.save(Person.builder()
.firstName("Bald").lastName("Bestaetigt").provisional(true).build());
personService.confirmPerson(provisional.getId());
Person reloaded = personRepository.findById(provisional.getId()).orElseThrow();
assertThat(reloaded.isProvisional()).isFalse();
PersonSearchResult showAll = personService.search(PersonFilter.showAll(), 0, 50, null);
assertThat(showAll.items())
.filteredOn(p -> p.getId().equals(provisional.getId()))
.allMatch(p -> !p.isProvisional());
}
@Test
void deletePerson_removesPerson() {
Person target = personRepository.save(Person.builder()
.firstName("Weg").lastName("Person").provisional(true).build());
personService.deletePerson(target.getId());
assertThat(personRepository.findById(target.getId())).isEmpty();
}
// ─── generation full-stack round-trip (#689) ──────────────────────────────
@Test
void updatePerson_clearGenerationToNull_readsBackNullFromDb() {
// Sara's QA concern: pin the full PUT→DB→GET round-trip for the
// null-clear path. Without this we only have the WebMvcTest mocked
// boundary; nothing proved the JPA flush actually wrote SQL NULL.
Person seeded = personRepository.save(Person.builder()
.firstName("Hans").lastName("Raddatz")
.personType(PersonType.PERSON).generation(3).build());
entityManager.flush();
entityManager.clear();
PersonUpdateDTO dto = new PersonUpdateDTO();
dto.setPersonType(PersonType.PERSON);
dto.setFirstName("Hans");
dto.setLastName("Raddatz");
dto.setGeneration(null);
personService.updatePerson(seeded.getId(), dto);
entityManager.flush();
entityManager.clear();
Person reloaded = personRepository.findById(seeded.getId()).orElseThrow();
assertThat(reloaded.getGeneration()).isNull();
}
@Test
void updatePerson_setGenerationToZero_readsBackZeroFromDb() {
// Pin the G 0 case end-to-end. The form-action spec covers that 0
// doesn't get spread-dropped at the SvelteKit boundary; this test
// covers that the service + JPA chain preserves an explicit 0
// (not coerced to null somewhere along the way).
Person seeded = personRepository.save(Person.builder()
.firstName("Walter").lastName("Raddatz")
.personType(PersonType.PERSON).build());
entityManager.flush();
entityManager.clear();
PersonUpdateDTO dto = new PersonUpdateDTO();
dto.setPersonType(PersonType.PERSON);
dto.setFirstName("Walter");
dto.setLastName("Raddatz");
dto.setGeneration(0);
personService.updatePerson(seeded.getId(), dto);
entityManager.flush();
entityManager.clear();
Person reloaded = personRepository.findById(seeded.getId()).orElseThrow();
assertThat(reloaded.getGeneration()).isEqualTo(0);
}
@Test
void deletePerson_detachesSentAndReceivedReferences_beforeDelete_noOrphan() {
// A person referenced as BOTH a document sender and a document receiver must delete
// cleanly: deletePerson nulls the sender_id FK and removes the receiver join row first
// (reassignSenderToNull → deleteReceiverReferences → deleteById), so no FK orphan and
// the documents themselves survive.
Person target = personRepository.save(Person.builder()
.firstName("Weg").lastName("Person").provisional(true).build());
Person bystander = personRepository.save(Person.builder()
.firstName("Bleibt").lastName("Hier").build());
Document sent = documentRepository.save(Document.builder()
.title("Sent letter").originalFilename("sent.pdf")
.status(DocumentStatus.UPLOADED).sender(target).build());
Document received = documentRepository.save(Document.builder()
.title("Received letter").originalFilename("received.pdf")
.status(DocumentStatus.UPLOADED).sender(bystander)
.receivers(new java.util.HashSet<>(Set.of(target))).build());
// Persist the fixture and detach everything so the native @Modifying deletes operate on
// the database directly without the persistence context holding stale references that
// would re-flush a now-deleted person as a transient association.
entityManager.flush();
entityManager.clear();
personService.deletePerson(target.getId());
// Native @Modifying queries bypass the persistence context — clear it so the asserting
// reads observe the post-delete database state, not stale managed entities.
entityManager.flush();
entityManager.clear();
assertThat(personRepository.findById(target.getId())).isEmpty();
Document reloadedSent = documentRepository.findById(sent.getId()).orElseThrow();
assertThat(reloadedSent.getSender()).isNull();
Document reloadedReceived = documentRepository.findById(received.getId()).orElseThrow();
assertThat(reloadedReceived.getReceivers())
.noneMatch(p -> p.getId().equals(target.getId()));
// The other person and the documents themselves survive the delete.
assertThat(personRepository.findById(bystander.getId())).isPresent();
}
}
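
The last integration test above describes the delete order `reassignSenderToNull` → `deleteReceiverReferences` → `deleteById`. The in-memory model below illustrates why that order leaves no dangling reference and keeps the documents. It is only a sketch: the maps stand in for the `sender_id` column and the receiver join table, and the class and field names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

final class DeleteOrderSketch {
    final Set<String> persons = new HashSet<>();
    // document id -> sender person id (null once detached)
    final Map<String, String> senderByDocument = new HashMap<>();
    // document id -> receiver person ids
    final Map<String, Set<String>> receiversByDocument = new HashMap<>();

    void deletePerson(String personId) {
        if (!persons.contains(personId)) {
            throw new NoSuchElementException("person not found: " + personId);
        }
        // 1. reassignSenderToNull: documents survive, only the sender link is cleared
        senderByDocument.replaceAll((doc, sender) -> personId.equals(sender) ? null : sender);
        // 2. deleteReceiverReferences: drop the person from every receiver set
        receiversByDocument.values().forEach(receivers -> receivers.remove(personId));
        // 3. deleteById: nothing references the person any more
        persons.remove(personId);
    }
}
```

Running the first two steps before the delete is what lets a real database accept the final `deleteById` without a foreign-key violation.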


@@ -58,109 +58,33 @@ class PersonServiceTest {
assertThat(personService.getById(id)).isEqualTo(person);
}
- // ─── #667: search (filter + pagination) ──────────────────────────────────
+ // ─── findAll ─────────────────────────────────────────────────────────────
  @Test
- void search_returnsPagedResult_withTotalsFromCountQuery() {
- PersonFilter filter = PersonFilter.cleanDefault();
- when(personRepository.countByFilter(null, null, null, null, true, null)).thenReturn(120L);
- when(personRepository.findByFilter(null, null, null, null, true, null, 50, 0))
- .thenReturn(List.of());
+ void findAll_returnsAll_whenQueryIsNull() {
+ List<PersonSummaryDTO> expected = List.of();
+ when(personRepository.findAllWithDocumentCount()).thenReturn(expected);
- PersonSearchResult result = personService.search(filter, 0, 50, null);
- assertThat(result.totalElements()).isEqualTo(120L);
- assertThat(result.pageNumber()).isEqualTo(0);
- assertThat(result.pageSize()).isEqualTo(50);
- assertThat(result.totalPages()).isEqualTo(3); // ceil(120 / 50)
+ assertThat(personService.findAll(null)).isEqualTo(expected);
+ verify(personRepository).findAllWithDocumentCount();
+ verify(personRepository, never()).searchWithDocumentCount(any());
  }
  @Test
- void search_passesTypeAsEnumName_toRepository() {
- PersonFilter filter = PersonFilter.builder().type(PersonType.INSTITUTION).build();
- when(personRepository.countByFilter("INSTITUTION", null, null, null, false, null)).thenReturn(0L);
- when(personRepository.findByFilter("INSTITUTION", null, null, null, false, null, 50, 0))
- .thenReturn(List.of());
- personService.search(filter, 0, 50, null);
- verify(personRepository).findByFilter("INSTITUTION", null, null, null, false, null, 50, 0);
+ void findAll_returnsEmpty_whenQueryIsWhitespaceOnly() {
+ assertThat(personService.findAll(" ")).isEmpty();
+ verify(personRepository, never()).findAllWithDocumentCount();
+ verify(personRepository, never()).searchWithDocumentCount(any());
  }
  @Test
- void search_computesOffset_fromPageAndSize() {
- PersonFilter filter = PersonFilter.showAll();
- when(personRepository.countByFilter(null, null, null, null, false, null)).thenReturn(0L);
- when(personRepository.findByFilter(null, null, null, null, false, null, 20, 40))
- .thenReturn(List.of());
+ void findAll_searchesByName_whenQueryIsNonBlank() {
+ List<PersonSummaryDTO> expected = List.of();
+ when(personRepository.searchWithDocumentCount("Anna")).thenReturn(expected);
- personService.search(filter, 2, 20, null); // offset = page * size = 40
- verify(personRepository).findByFilter(null, null, null, null, false, null, 20, 40);
- }
- @Test
- void search_trimsBlankQueryToNull() {
- PersonFilter filter = PersonFilter.showAll();
- when(personRepository.countByFilter(null, null, null, null, false, null)).thenReturn(0L);
- when(personRepository.findByFilter(null, null, null, null, false, null, 50, 0))
- .thenReturn(List.of());
- personService.search(filter, 0, 50, " ");
- verify(personRepository).findByFilter(null, null, null, null, false, null, 50, 0);
- }
- // ─── #667: confirmPerson ──────────────────────────────────────────────────
- @Test
- void confirmPerson_clearsProvisionalFlag() {
- UUID id = UUID.randomUUID();
- Person provisional = Person.builder().id(id).firstName("Inferred").lastName("Person").provisional(true).build();
- when(personRepository.findById(id)).thenReturn(Optional.of(provisional));
- when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
- Person result = personService.confirmPerson(id);
- assertThat(result.isProvisional()).isFalse();
- verify(personRepository).save(argThat(p -> !p.isProvisional()));
- }
- @Test
- void confirmPerson_throwsNotFound_whenMissing() {
- UUID id = UUID.randomUUID();
- when(personRepository.findById(id)).thenReturn(Optional.empty());
- assertThatThrownBy(() -> personService.confirmPerson(id))
- .isInstanceOf(DomainException.class)
- .extracting(e -> ((DomainException) e).getStatus().value())
- .isEqualTo(404);
- }
- // ─── #667: deletePerson ───────────────────────────────────────────────────
- @Test
- void deletePerson_deletes_whenPersonExists() {
- UUID id = UUID.randomUUID();
- Person person = Person.builder().id(id).firstName("Weg").lastName("Person").build();
- when(personRepository.findById(id)).thenReturn(Optional.of(person));
- personService.deletePerson(id);
- verify(personRepository).reassignSenderToNull(id);
- verify(personRepository).deleteReceiverReferences(id);
- verify(personRepository).deleteById(id);
- }
- @Test
- void deletePerson_throwsNotFound_whenMissing() {
- UUID id = UUID.randomUUID();
- when(personRepository.findById(id)).thenReturn(Optional.empty());
- assertThatThrownBy(() -> personService.deletePerson(id))
- .isInstanceOf(DomainException.class)
- .extracting(e -> ((DomainException) e).getStatus().value())
- .isEqualTo(404);
+ assertThat(personService.findAll("Anna")).isEqualTo(expected);
+ verify(personRepository).searchWithDocumentCount("Anna");
+ verify(personRepository, never()).findAllWithDocumentCount();
  }
// ─── createPerson ─────────────────────────────────────────────────────────
@@ -261,54 +185,6 @@ class PersonServiceTest {
.isEqualTo(400);
}
@Test
void createPerson_dto_persistsGeneration() {
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpdateDTO dto = new PersonUpdateDTO();
dto.setFirstName("Hans"); dto.setLastName("Raddatz");
dto.setPersonType(PersonType.PERSON); dto.setGeneration(3);
Person result = personService.createPerson(dto);
assertThat(result.getGeneration()).isEqualTo(3);
}
@Test
void updatePerson_writesGeneration_includingExplicitNullClear() {
// The form path is the only place a human can clear generation back to null.
UUID id = UUID.randomUUID();
Person existing = Person.builder().id(id).firstName("Hans").lastName("Raddatz")
.personType(PersonType.PERSON).generation(3).build();
when(personRepository.findById(id)).thenReturn(Optional.of(existing));
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpdateDTO dto = new PersonUpdateDTO();
dto.setFirstName("Hans"); dto.setLastName("Raddatz");
dto.setPersonType(PersonType.PERSON); dto.setGeneration(null);
Person result = personService.updatePerson(id, dto);
assertThat(result.getGeneration()).isNull();
}
@Test
void updatePerson_writesGeneration_whenSet() {
UUID id = UUID.randomUUID();
Person existing = Person.builder().id(id).firstName("Hans").lastName("Raddatz")
.personType(PersonType.PERSON).build();
when(personRepository.findById(id)).thenReturn(Optional.of(existing));
when(personRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
PersonUpdateDTO dto = new PersonUpdateDTO();
dto.setFirstName("Hans"); dto.setLastName("Raddatz");
dto.setPersonType(PersonType.PERSON); dto.setGeneration(2);
Person result = personService.updatePerson(id, dto);
assertThat(result.getGeneration()).isEqualTo(2);
}
// ─── updatePerson (personType) ───────────────────────────────────────────
@Test
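
The `search` tests in this file's first hunk pin two pieces of paging arithmetic through their inline comments: `offset = page * size` and `totalPages = ceil(totalElements / size)`. A small sketch of both formulas follows; the class and method names are hypothetical, and a positive page size is assumed.

```java
final class PagingSketch {
    private PagingSketch() {}

    /** Zero-based page index to row offset, e.g. page 2 with size 20 starts at row 40. */
    static int offset(int page, int size) {
        return page * size;
    }

    /** Integer ceiling division; assumes size > 0. */
    static int totalPages(long totalElements, int size) {
        return (int) ((totalElements + size - 1) / size);
    }
}
```

Integer ceiling division avoids floating-point rounding and yields 0 pages for an empty result, which matches the case where the count query returns 0.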


@@ -93,7 +93,7 @@ class RelationshipControllerTest {
@Test
@WithMockUser(username = "testuser", authorities = {"READ_ALL"})
void getNetwork_returns200_with_NetworkDTO_for_authenticated_user() throws Exception {
- PersonNodeDTO node = new PersonNodeDTO(PERSON_ID, "Alice Müller", 1900, 1980, null, true);
+ PersonNodeDTO node = new PersonNodeDTO(PERSON_ID, "Alice Müller", 1900, 1980, true);
RelationshipDTO edge = new RelationshipDTO(
UUID.randomUUID(), PERSON_ID, OTHER_ID,
"Alice Müller", 1900, 1980,
@@ -111,7 +111,7 @@ class RelationshipControllerTest {
@Test
@WithMockUser(username = "testuser", authorities = {"READ_ALL"})
void getInferredRelationships_returns200_with_list_for_authenticated_user() throws Exception {
- PersonNodeDTO relative = new PersonNodeDTO(OTHER_ID, "Bob Müller", 1930, null, null, true);
+ PersonNodeDTO relative = new PersonNodeDTO(OTHER_ID, "Bob Müller", 1930, null, true);
InferredRelationshipWithPersonDTO inferred =
new InferredRelationshipWithPersonDTO(relative, "Großvater", 2);
when(relationshipService.getInferredRelationships(PERSON_ID))

View File

@@ -144,12 +144,10 @@ class RelationshipServiceIntegrationTest {
@Test
void setFamilyMember_true_makes_person_appear_in_network() {
// addRelationship side-effects family_member=true on both endpoints for family-graph
// edges (PARENT_OF/SPOUSE_OF/SIBLING_OF). Reset charlie so the explicit
// setFamilyMember(true) call below is the thing under test, not the auto-flip.
// charlie starts with familyMember = false. Add a PARENT_OF edge alice→charlie
// so the edge exists, then flip charlie's flag and verify he appears in nodes.
relationshipService.addRelationship(alice.getId(),
new CreateRelationshipRequest(charlie.getId(), RelationType.PARENT_OF, null, null, null));
relationshipService.setFamilyMember(charlie.getId(), false);
NetworkDTO before = relationshipService.getFamilyNetwork();
assertThat(before.nodes()).extracting("id").doesNotContain(charlie.getId());

View File

@@ -23,8 +23,6 @@ import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.anyBoolean;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
@@ -150,50 +148,6 @@ class RelationshipServiceTest {
assertThat(result.notes()).isEqualTo("first born");
}
@Test
void addRelationship_marks_both_endpoints_as_family_member_when_type_is_family() {
// Creating a family-graph edge (PARENT_OF / SPOUSE_OF / SIBLING_OF) must mark both
// endpoints as family members so they appear in findAllFamilyMembers and the network.
// This is what makes the canonical importer's relationships actually show up in the UI.
when(personService.getById(alice.getId())).thenReturn(alice);
when(personService.getById(bob.getId())).thenReturn(bob);
when(relationshipRepository.existsByPersonIdAndRelatedPersonIdAndRelationType(
bob.getId(), alice.getId(), RelationType.PARENT_OF)).thenReturn(false);
when(relationshipRepository.saveAndFlush(any())).thenAnswer(inv -> {
PersonRelationship r = inv.getArgument(0);
r.setId(UUID.randomUUID());
r.setCreatedAt(Instant.now());
return r;
});
var dto = new CreateRelationshipRequest(bob.getId(), RelationType.PARENT_OF, null, null, null);
service.addRelationship(alice.getId(), dto);
verify(personService).setFamilyMember(alice.getId(), true);
verify(personService).setFamilyMember(bob.getId(), true);
}
@Test
void addRelationship_does_not_flip_family_member_for_non_family_type() {
// FRIEND / COLLEAGUE / EMPLOYER / DOCTOR / NEIGHBOR / OTHER are NOT family-graph
// edges (see getFamilyNetwork's filter), so addRelationship must leave family_member
// alone — a doctor of the family is not a family member.
when(personService.getById(alice.getId())).thenReturn(alice);
when(personService.getById(bob.getId())).thenReturn(bob);
when(relationshipRepository.saveAndFlush(any())).thenAnswer(inv -> {
PersonRelationship r = inv.getArgument(0);
r.setId(UUID.randomUUID());
r.setCreatedAt(Instant.now());
return r;
});
var dto = new CreateRelationshipRequest(bob.getId(), RelationType.FRIEND, null, null, null);
service.addRelationship(alice.getId(), dto);
verify(personService, never()).setFamilyMember(eq(alice.getId()), anyBoolean());
verify(personService, never()).setFamilyMember(eq(bob.getId()), anyBoolean());
}
@Test
void deleteRelationship_succeeds_when_viewpoint_is_object() {
UUID relId = UUID.randomUUID();
@@ -237,22 +191,6 @@ class RelationshipServiceTest {
assertThat(result.edges().get(0).relatedPersonId()).isEqualTo(bob.getId());
}
@Test
void getFamilyNetwork_populates_generation_on_PersonNodeDTO() {
Person walter = Person.builder().id(UUID.randomUUID()).lastName("Raddatz")
.familyMember(true).generation(2).build();
Person clara = Person.builder().id(UUID.randomUUID()).lastName("Raddatz")
.familyMember(true).generation(3).build();
when(personService.findAllFamilyMembers()).thenReturn(List.of(walter, clara));
when(relationshipRepository.findAllByRelationTypeIn(any())).thenReturn(List.of());
NetworkDTO result = service.getFamilyNetwork();
assertThat(result.nodes()).hasSize(2);
assertThat(result.nodes().stream().map(n -> n.generation()).toList())
.containsExactlyInAnyOrder(2, 3);
}
// --- helpers ---
private static Person person(String name) {

View File

@@ -1,62 +0,0 @@
package org.raddatz.familienarchiv.tag;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import java.util.Optional;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.argThat;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
@ExtendWith(MockitoExtension.class)
class TagImportUpsertTest {
@Mock TagRepository tagRepository;
@InjectMocks TagService tagService;
@Test
void upsertBySourceRef_insertsNewTag_whenSourceRefUnknown() {
when(tagRepository.findBySourceRef("Themen/Brautbriefe")).thenReturn(Optional.empty());
when(tagRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
UUID parentId = UUID.randomUUID();
Tag result = tagService.upsertBySourceRef("Themen/Brautbriefe", "Brautbriefe", parentId);
assertThat(result.getSourceRef()).isEqualTo("Themen/Brautbriefe");
assertThat(result.getName()).isEqualTo("Brautbriefe");
assertThat(result.getParentId()).isEqualTo(parentId);
}
@Test
void upsertBySourceRef_updatesInPlace_whenSourceRefExists() {
Tag existing = Tag.builder().id(UUID.randomUUID()).name("Brautbriefe")
.sourceRef("Themen/Brautbriefe").build();
when(tagRepository.findBySourceRef("Themen/Brautbriefe")).thenReturn(Optional.of(existing));
when(tagRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
tagService.upsertBySourceRef("Themen/Brautbriefe", "Brautbriefe", null);
verify(tagRepository).save(argThat(t -> t.getId().equals(existing.getId())));
verify(tagRepository, never()).save(argThat(t -> t.getId() == null));
}
@Test
void upsertBySourceRef_preservesHumanRenamedTag_onReimport() {
Tag humanRenamed = Tag.builder().id(UUID.randomUUID()).name("Verlobungsbriefe")
.sourceRef("Themen/Brautbriefe").build();
when(tagRepository.findBySourceRef("Themen/Brautbriefe")).thenReturn(Optional.of(humanRenamed));
when(tagRepository.save(any())).thenAnswer(inv -> inv.getArgument(0));
Tag result = tagService.upsertBySourceRef("Themen/Brautbriefe", "Brautbriefe", null);
assertThat(result.getName()).isEqualTo("Verlobungsbriefe");
}
}

View File

@@ -7,8 +7,7 @@ import org.raddatz.familienarchiv.security.PermissionAspect;
import org.raddatz.familienarchiv.user.CustomUserDetailsService;
import org.raddatz.familienarchiv.document.DocumentService;
import org.raddatz.familienarchiv.document.DocumentVersionService;
import org.raddatz.familienarchiv.importing.CanonicalImportOrchestrator;
import org.raddatz.familienarchiv.importing.ImportStatus;
import org.raddatz.familienarchiv.importing.MassImportService;
import org.raddatz.familienarchiv.document.ThumbnailBackfillService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.autoconfigure.aop.AopAutoConfiguration;
@@ -36,7 +35,7 @@ class AdminControllerTest {
@Autowired MockMvc mockMvc;
@MockitoBean CanonicalImportOrchestrator importOrchestrator;
@MockitoBean MassImportService massImportService;
@MockitoBean DocumentService documentService;
@MockitoBean DocumentVersionService documentVersionService;
@MockitoBean ThumbnailBackfillService thumbnailBackfillService;
@@ -47,9 +46,9 @@ class AdminControllerTest {
@Test
@WithMockUser(authorities = "ADMIN")
void importStatus_returns200_withStatusCode_whenAdmin() throws Exception {
ImportStatus status = new ImportStatus(
ImportStatus.State.IDLE, "IMPORT_IDLE", "Kein Import gestartet.", 0, List.of(), null);
when(importOrchestrator.getStatus()).thenReturn(status);
MassImportService.ImportStatus status = new MassImportService.ImportStatus(
MassImportService.State.IDLE, "IMPORT_IDLE", "Kein Import gestartet.", 0, List.of(), null);
when(massImportService.getStatus()).thenReturn(status);
mockMvc.perform(get("/api/admin/import-status"))
.andExpect(status().isOk())
@@ -61,9 +60,9 @@ class AdminControllerTest {
@Test
@WithMockUser(authorities = "ADMIN")
void importStatus_messageField_notPresentInApiResponse() throws Exception {
ImportStatus status = new ImportStatus(
ImportStatus.State.IDLE, "IMPORT_IDLE", "Kein Import gestartet.", 0, List.of(), null);
when(importOrchestrator.getStatus()).thenReturn(status);
MassImportService.ImportStatus status = new MassImportService.ImportStatus(
MassImportService.State.IDLE, "IMPORT_IDLE", "Kein Import gestartet.", 0, List.of(), null);
when(massImportService.getStatus()).thenReturn(status);
mockMvc.perform(get("/api/admin/import-status"))
.andExpect(status().isOk())

View File

@@ -1,8 +1,2 @@
logging.level.root=WARN
logging.level.org.raddatz=INFO
# Default test value so FlywayConfig's fail-closed check passes without each
# test having to set GRAFANA_DB_PASSWORD explicitly. The actual value is
# irrelevant in tests — Flyway only uses it to set the grafana_reader role's
# password, which no test connects with.
GRAFANA_DB_PASSWORD=test-grafana-reader-password

View File

@@ -147,9 +147,6 @@ services:
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:-changeme}
GF_USERS_ALLOW_SIGN_UP: "false"
GF_SERVER_ROOT_URL: ${GF_SERVER_ROOT_URL:-http://localhost:3003}
# Read-only password for the grafana_reader PostgreSQL role; interpolated
# into the provisioned PostgreSQL datasource (see datasources.yml).
GRAFANA_DB_PASSWORD: ${GRAFANA_DB_PASSWORD}
volumes:
- grafana_data:/var/lib/grafana
- ./infra/observability/grafana/provisioning:/etc/grafana/provisioning:ro
@@ -168,7 +165,6 @@ services:
condition: service_healthy
networks:
- obs-net
- archiv-net # PO Overview dashboard queries archive-db via the grafana_reader role
# --- Error Tracking: GlitchTip ---

View File

@@ -26,19 +26,15 @@
# MAIL_HOST, MAIL_PORT, SMTP relay (production only; staging uses mailpit)
# MAIL_USERNAME, MAIL_PASSWORD
# APP_MAIL_FROM sender address (e.g. noreply@raddatz.cloud)
# IMPORT_HOST_DIR absolute host path holding the canonical
# import artifacts (canonical-*.xlsx +
# canonical-persons-tree.json) and the
# <index>.pdf files for /admin/system
# IMPORT_HOST_DIR absolute host path holding ONLY the ODS
# spreadsheet and PDFs for /admin/system mass
# import — mounted read-only at /import inside
# the backend. Compose refuses to start when
# this var is unset, so staging and prod cannot
# accidentally share an import source. Must be
# readable by the backend container's UID
# (currently root via the OpenJDK image — any
# world-readable directory works). Canonical
# artifacts are NOT in git (PII — ADR-025); ops
# syncs them in beside the PDFs out-of-band.
# world-readable directory works).
networks:
archiv-net:
@@ -221,24 +217,16 @@ services:
# Bound to localhost only — Caddy fronts external traffic.
ports:
- "127.0.0.1:${PORT_BACKEND}:8080"
# Host path holding the canonical import artifacts (canonical-*.xlsx +
# canonical-persons-tree.json) + <index>.pdf files for the import endpoint.
# Read-only; the canonical importer only reads them from /import.
# Host path holding the ODS spreadsheet + PDFs for the mass-import endpoint.
# Read-only; MassImportService only reads (Files.list / Files.walk on /import).
# Required — no default — so staging and prod cannot accidentally share an
# import source. CI workflows pin this per-env (see .gitea/workflows/).
# NOTE: the canonical artifacts are NOT version-controlled (they contain real
# family PII — see ADR-025). Ops must produce them locally from the Python
# normalizer (tools/import-normalizer/) and sync them into this host path
# alongside the <index>.pdf corpus before triggering an import.
volumes:
- ${IMPORT_HOST_DIR:?Set IMPORT_HOST_DIR to a host path holding the import payload (canonical artifacts + <index>.pdf files). See docs/DEPLOYMENT.md.}:/import:ro
- ${IMPORT_HOST_DIR:?Set IMPORT_HOST_DIR to a host path holding the mass-import payload (ODS + PDFs). See docs/DEPLOYMENT.md.}:/import:ro
environment:
SPRING_DATASOURCE_URL: jdbc:postgresql://db:5432/archiv
SPRING_DATASOURCE_USERNAME: archiv
SPRING_DATASOURCE_PASSWORD: ${POSTGRES_PASSWORD}
# Consumed by Flyway V68 via the ${grafanaDbPassword} placeholder to set
# the read-only grafana_reader role's password.
GRAFANA_DB_PASSWORD: ${GRAFANA_DB_PASSWORD}
# Application uses the bucket-scoped service account, not MinIO root.
S3_ENDPOINT: http://minio:9000
S3_ACCESS_KEY: archiv-app
@@ -264,8 +252,6 @@ services:
OTEL_METRICS_EXPORTER: none
MANAGEMENT_METRICS_TAGS_APPLICATION: Familienarchiv
MANAGEMENT_TRACING_SAMPLING_PROBABILITY: ${MANAGEMENT_TRACING_SAMPLING_PROBABILITY:-0.1}
SENTRY_DSN: ${SENTRY_DSN:-}
LOGGING_STRUCTURED_FORMAT_CONSOLE: ecs
networks:
- archiv-net
healthcheck:
@@ -280,10 +266,6 @@ services:
build:
context: ./frontend
target: production
args:
# Vite build-time variable — baked into the JS bundle at build time.
# Empty default so deploys succeed before the secret is configured.
VITE_SENTRY_DSN: ${VITE_SENTRY_DSN:-}
restart: unless-stopped
depends_on:
backend:

View File

@@ -163,9 +163,6 @@ services:
SPRING_DATASOURCE_URL: jdbc:postgresql://db:5432/${POSTGRES_DB}
SPRING_DATASOURCE_USERNAME: ${POSTGRES_USER}
SPRING_DATASOURCE_PASSWORD: ${POSTGRES_PASSWORD}
# Consumed by Flyway V68 via the ${grafanaDbPassword} placeholder to set
# the read-only grafana_reader role's password.
GRAFANA_DB_PASSWORD: ${GRAFANA_DB_PASSWORD}
S3_ENDPOINT: http://minio:9000
S3_ACCESS_KEY: ${MINIO_ROOT_USER}
S3_SECRET_KEY: ${MINIO_ROOT_PASSWORD}
@@ -201,7 +198,7 @@ services:
networks:
- archiv-net
healthcheck:
test: ["CMD-SHELL", "wget -qO- http://localhost:8081/actuator/health | grep -q UP || exit 1"]
test: ["CMD-SHELL", "wget -qO- http://localhost:8080/actuator/health | grep -q UP || exit 1"]
interval: 15s
timeout: 5s
retries: 10

View File

@@ -65,7 +65,7 @@ Members of the cross-cutting layer have no entity of their own, no user-facing C
| `dashboard` | Stats aggregation for the admin dashboard and Family Pulse widget | Aggregates from 3+ domains; no owned entities |
| `exception` | `DomainException`, `ErrorCode` enum, `GlobalExceptionHandler` | Framework infra; consumed by every controller and service. Adding a new `ErrorCode` requires matching updates in `frontend/src/lib/shared/errors.ts` and all three `messages/*.json` locale files. Current security-related codes: `CSRF_TOKEN_MISSING` (403 on mutating request without valid `X-XSRF-TOKEN` header), `TOO_MANY_LOGIN_ATTEMPTS` (429 when login rate limit exceeded). |
| `filestorage` | `FileService` — MinIO/S3 upload, download, presigned-URL generation | Generic service; consumed by `document` and `ocr` |
| `importing` | `CanonicalImportOrchestrator` — async canonical import running four idempotent loaders (`TagTreeImporter` → `PersonRegisterImporter` → `PersonTreeImporter` → `DocumentImporter`) over the normalizer's committed canonical artifacts (`canonical-*.xlsx` + `canonical-persons-tree.json`) | Orchestrates across `person`, `tag`, `document` |
| `importing` | `MassImportService` — async ODS/Excel batch import | Orchestrates across `person`, `tag`, `document` |
| `security` | `SecurityConfig`, `Permission` enum, `@RequirePermission` annotation, `PermissionAspect` (AOP) | Framework infra; enforced globally across all controllers |
**Frontend `shared/`** follows the same admission criteria. Key members: `api.server.ts` (typed openapi-fetch client factory), `errors.ts` (backend `ErrorCode` → i18n mapping), `shared/primitives/` (generic UI components used across ≥2 domains), `shared/discussion/` (comment/mention editor used by `document` and `geschichte`), `shared/utils/` (pure date/sort/debounce utilities).

View File

@@ -99,7 +99,7 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `APP_BASE_URL` | Public-facing URL for email links | `http://localhost:3000` | YES (prod) | — |
| `APP_OCR_BASE_URL` | Internal URL of the OCR service | — | YES | — |
| `APP_OCR_TRAINING_TOKEN` | Secret token for OCR training endpoints | — | YES (prod) | YES |
| `IMPORT_HOST_DIR` | Absolute host path holding the normalizer's canonical artifacts (`canonical-{documents,persons,tag-tree}.xlsx` + `canonical-persons-tree.json`) **plus the `<index>.pdf` files** for the `/admin/system` import. Mounted read-only at `/import` inside the backend (the canonical importer reads via `app.import.dir`). Compose refuses to start when unset, so staging and prod cannot accidentally share the source. Convention: `/srv/familienarchiv-staging/import` and `/srv/familienarchiv-production/import` | — | YES (prod compose) | — |
| `IMPORT_HOST_DIR` | Absolute host path holding the ODS spreadsheet + PDFs for the `/admin/system` mass-import card. Mounted read-only at `/import` inside the backend (compose-only — backend reads via `app.import.dir`). Compose refuses to start when unset, so staging and prod cannot accidentally share the source. Convention: `/srv/familienarchiv-staging/import` and `/srv/familienarchiv-production/import` | — | YES (prod compose) | — |
| `MAIL_HOST` | SMTP host | `mailpit` (dev) | YES (prod) | — |
| `MAIL_PORT` | SMTP port | `1025` (dev) | YES (prod) | — |
| `MAIL_USERNAME` | SMTP username | — | YES (prod) | YES |
@@ -152,7 +152,6 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `PORT_GRAFANA` | Host port for the Grafana UI (bound to `127.0.0.1` only) | `3003` | — | — |
| `POSTGRES_HOST` | PostgreSQL hostname for GlitchTip's db-init job and workers. Override when only the staging stack is running and `archive-db` is not resolvable by that name. | `archive-db` | — | — |
| `GRAFANA_ADMIN_PASSWORD` | Grafana `admin` user password | `changeme` | YES (prod) | YES |
| `GRAFANA_DB_PASSWORD` | Password for the read-only `grafana_reader` PostgreSQL role used by the PO Overview dashboard (issue #651). Consumed by Flyway V68 and the Grafana PostgreSQL datasource. Generate with `openssl rand -hex 32`. | — | YES (prod) | YES |
| `PORT_GLITCHTIP` | Host port for the GlitchTip UI (bound to `127.0.0.1` only) | `3002` | — | — |
| `GLITCHTIP_DOMAIN` | Public-facing base URL for GlitchTip (used in email links and CORS) | `http://localhost:3002` | YES (prod) | — |
| `GLITCHTIP_SECRET_KEY` | Django secret key for GlitchTip — generate with `python3 -c "import secrets; print(secrets.token_hex(32))"` | — | YES | YES |
@@ -257,7 +256,6 @@ git.raddatz.cloud A <server IP>
| `MAIL_USERNAME` | release.yml | SMTP user |
| `MAIL_PASSWORD` | release.yml | SMTP password |
| `GRAFANA_ADMIN_PASSWORD` | both | Grafana `admin` login — generate a strong password |
| `GRAFANA_DB_PASSWORD` | both | Read-only `grafana_reader` role password — `openssl rand -hex 32` |
| `GLITCHTIP_SECRET_KEY` | both | Django secret key — `openssl rand -hex 32` |
| `SENTRY_DSN` | both | GlitchTip project DSN — set after first-run (§4); leave empty to keep Sentry disabled |
| `VITE_SENTRY_DSN` | both | GlitchTip frontend project DSN — set after first-run (§4); leave empty to keep Sentry disabled |
@@ -359,7 +357,6 @@ Both files are passed explicitly via `--env-file` to the compose command, so the
| Gitea secret | Notes |
|---|---|
| `GRAFANA_ADMIN_PASSWORD` | Strong unique password; shared by nightly and release |
| `GRAFANA_DB_PASSWORD` | `openssl rand -hex 32`; shared by nightly and release — read-only DB role for the PO Overview dashboard |
| `GLITCHTIP_SECRET_KEY` | `openssl rand -hex 32`; shared by nightly and release |
| `STAGING_POSTGRES_PASSWORD` / `PROD_POSTGRES_PASSWORD` | Must match the running PostgreSQL container |
@@ -430,31 +427,6 @@ docker exec obs-loki wget -qO- \
Prometheus port `9090` and Grafana port `3003` (default; configurable via `PORT_GRAFANA`) are bound to `127.0.0.1` on the host. No other observability ports are host-bound.
##### Rotate the `grafana_reader` DB password
The PO Overview dashboard reads `audit_log`, `documents`, and `transcription_blocks` through the SELECT-only `grafana_reader` PostgreSQL role (issue #651, ADR-024). The role's password is owned by `R__grafana_reader_password.sql` — a Flyway *repeatable* migration that re-runs whenever the resolved `${grafanaDbPassword}` placeholder changes. That makes rotation a two-restart operation, no manual `psql` required.
```bash
# 1. Generate a new value
openssl rand -hex 32
# 2. Update both sides:
# - Gitea secret GRAFANA_DB_PASSWORD (nightly + release workflows pick it up)
# - Local .env on the server / dev machine
# 3. Restart the backend. Flyway sees that R__'s resolved checksum changed and
# re-applies it, issuing ALTER ROLE grafana_reader WITH PASSWORD '<new>'.
docker compose restart backend
# 4. Restart obs-grafana so the provisioned datasource picks up the new env value.
docker compose -f docker-compose.observability.yml restart obs-grafana
# 5. Verify the dashboard loads — PO Overview's Postgres panels should populate
# instead of "Data source error".
```
If `GRAFANA_DB_PASSWORD` is unset, the backend **refuses to start** (`IllegalStateException`). That is deliberate — see `FlywayConfig.resolveGrafanaDbPassword()` and the rationale in ADR-024.
#### GlitchTip
| Item | Value |
@@ -559,45 +531,20 @@ bash scripts/download-kraken-models.sh
> Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.
### Trigger a canonical import
### Trigger a mass import (Excel/ODS)
The importer no longer parses the raw spreadsheet. It consumes the **canonical artifacts**
produced by the normalizer (`tools/import-normalizer/`) — `canonical-tag-tree.xlsx`,
`canonical-persons.xlsx`, `canonical-persons-tree.json`, `canonical-documents.xlsx` — which
are committed under `tools/import-normalizer/out/`. The semantic transformation
(German-date parsing, name classification) lives entirely in the normalizer; the backend
maps the clean columns by header name. See [ADR-025](adr/025-canonical-import-and-single-migration-schema-foundation.md).
**Prerequisite — regenerate the artifacts when the source data changes:**
```bash
cd tools/import-normalizer
python3 -m venv .venv && .venv/bin/pip install -r requirements.txt # once, on a fresh clone
.venv/bin/python normalize.py
# writes the four canonical artifacts into ./out/
```
**Dev:** place all four canonical artifacts **plus** the PDFs into `./import/`
at the repo root (the dev compose bind-mounts it to `/import`, which is `app.import.dir`).
Each PDF must be named `<index>.pdf` (e.g. `W-0124.pdf`, `Mü-0001.pdf`) and live flat in the
import dir: since #686 the importer resolves a document's PDF directly by its index
(`importDir/<index>.pdf`), not via a `datei`/`file` column — the recursive directory walk and
its basename/homoglyph guards are gone, replaced by strict index validation plus a
canonical-path containment assertion (a document whose `<index>.pdf` is absent simply becomes a
`PLACEHOLDER`). The orchestrator smoke-checks that all four artifacts are present before
starting and fails closed (`IMPORT_ARTIFACT_INVALID`) if any is missing.
**Dev:** drop the ODS spreadsheet + PDFs into `./import/` at the repo root — the dev compose bind-mounts it to `/import` automatically.
**Staging/production:**
1. Pre-stage the four canonical artifacts + PDFs on the host. Convention:
`/srv/familienarchiv-staging/import/` or `/srv/familienarchiv-production/import/`.
1. Pre-stage the payload on the host. Convention: `/srv/familienarchiv-staging/import/` or `/srv/familienarchiv-production/import/`.
```bash
rsync -avh --progress ./import/ user@host:/srv/familienarchiv-staging/import/
```
2. Make sure `IMPORT_HOST_DIR=<host-path>` is set in `.env.staging` / `.env.production` (the nightly/release workflows already write this — see §3). Compose refuses to start without it.
3. Redeploy the stack so the bind mount takes effect; if the mount is already in place, skip to step 4.
4. Call `POST /api/admin/trigger-import` (requires `ADMIN` permission), or click the "Import starten" button on `/admin/system`.
5. The import runs asynchronously — poll `GET /api/admin/import-status`, watch `/admin/system`, or tail the backend logs. Re-running is safe and idempotent (upsert by `source_ref` / document `index`). Person and tag scalar fields you edited in the app are preserved on re-import; a document's sender/receivers/tags are **canonical-authoritative** — a re-import re-applies them to exactly match the export, so a link removed from the export is removed from the document (the raw sender/receiver cell text is always kept).
5. The import runs asynchronously — poll `GET /api/admin/import-status`, watch `/admin/system`, or tail the backend logs.
---

View File

@@ -25,11 +25,6 @@ _Not to be confused with [AppUser](#appuser-appuser)_ — `Person` is a historic
**UserGroup** (`UserGroup`) — a named permission bundle assigned to one or more `AppUser`s. A user's effective permissions are the union of all permissions across all groups they belong to.
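The union rule can be illustrated with a minimal sketch. It assumes a simplified `Group` record and a three-value `Permission` enum; `READ_ALL` and `ADMIN` appear as authorities in the tests above, while `WRITE_ALL` and the class and method names are hypothetical and not taken from the codebase.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class EffectivePermissionsSketch {
    // Hypothetical subset of the real Permission enum (WRITE_ALL is an assumption).
    enum Permission { READ_ALL, WRITE_ALL, ADMIN }

    // Hypothetical stand-in for the UserGroup entity.
    record Group(String name, Set<Permission> permissions) {}

    // Effective permissions = union of the permissions of every group the user belongs to.
    static Set<Permission> effectivePermissions(List<Group> groups) {
        EnumSet<Permission> result = EnumSet.noneOf(Permission.class);
        for (Group group : groups) {
            result.addAll(group.permissions());
        }
        return result;
    }
}
```

A user with no groups ends up with an empty set, and a permission granted by several groups appears only once.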
**source_ref** (`Person.sourceRef`, `Tag.sourceRef`) — the import normalizer's stable identity for a `Person` (its `person_id`) or `Tag` (its canonical `tag_path`). It is the join key linking normalized records to documents and the idempotency key for re-import; null for manually created records and unique among non-null values.
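As a rough illustration of `source_ref` as an idempotency key, the sketch below uses an in-memory map instead of a repository. The `Tag` record, the class name, and the two-argument `upsertBySourceRef` signature are simplifications assumed for this example; the behaviour shown (same id on re-import, human-edited name kept) mirrors what `TagImportUpsertTest` asserts, not the actual `TagService` code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class SourceRefUpsertSketch {
    // Hypothetical, simplified stand-in for the Tag entity.
    record Tag(UUID id, String name, String sourceRef) {}

    // In-memory replacement for a repository lookup by source_ref (assumption).
    private final Map<String, Tag> bySourceRef = new HashMap<>();

    Tag upsertBySourceRef(String sourceRef, String importedName) {
        Tag existing = bySourceRef.get(sourceRef);
        if (existing != null) {
            // Re-import: keep the existing row, including a name a human may have edited.
            return existing;
        }
        // First import: create a new row keyed by the normalizer's stable identity.
        Tag created = new Tag(UUID.randomUUID(), importedName, sourceRef);
        bySourceRef.put(sourceRef, created);
        return created;
    }
}
```

Calling the method twice with the same `sourceRef` returns a tag with the same id both times, which is the property that makes re-running an import safe.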
**provisional person** (`Person.provisional`) — a `Person` the importer inferred from raw attribution text but could not confidently match to a known individual. The flag lets the persons directory surface uncertainty honestly rather than fabricate a confident identity; it defaults to `false` and is set `true` only by the importer.
_Not to be confused with `family_member`_ — `provisional` expresses import confidence, while `family_member` is a genealogical fact about whether the person belongs to the family tree.
---
## Document-Related Terms
@@ -41,10 +36,6 @@ _See also [TranscriptionBlock](#transcriptionblock-transcriptionblock)._
**Document** (`Document`) — a single archival item (letter, postcard, photograph) with a file stored in MinIO/S3 and associated metadata (sender, receivers, date, tags, transcription blocks).
**date precision** (`Document.metaDatePrecision`, enum `DatePrecision`) — how exactly a document's date is known, one of `DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN`. A verbatim mirror of the import normalizer's `Precision` enum so honest dates can be rendered (`APPROX` → "ca.", `RANGE` uses `meta_date_end`) instead of fabricating a false `DAY`-level date. `UNKNOWN` is the explicit value for undated documents.
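A minimal sketch of precision-aware rendering follows. The enum values are the ones listed above; the `render` method, its parameters, and every output format (ISO strings, `/` for ranges, `"undated"`) are assumptions made for illustration and may differ from what the frontend or backend actually does.

```java
import java.time.LocalDate;
import java.time.YearMonth;

class DatePrecisionSketch {
    enum DatePrecision { DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN }

    // Renders only as much of the date as the precision claims to know.
    // 'end' is consulted for RANGE only (it stands in for meta_date_end).
    static String render(DatePrecision precision, LocalDate date, LocalDate end) {
        switch (precision) {
            case DAY:
                return date.toString();                 // full ISO date
            case MONTH:
                return YearMonth.from(date).toString(); // year and month only
            case SEASON:                                // season name omitted in this sketch
            case YEAR:
                return String.valueOf(date.getYear());
            case RANGE:
                return date + "/" + end;                // start/end interval
            case APPROX:
                return "ca. " + date.getYear();
            default:
                return "undated";                       // UNKNOWN
        }
    }
}
```

The point of the design is that a document known only to the year never gets displayed with a fabricated day and month.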
**raw attribution** (`Document.senderText`, `Document.receiverText`, `Document.metaDateRaw`) — the original spreadsheet cell text for a document's sender, receiver, and date, preserved verbatim even after a `Person` or normalized date is linked. It keeps provenance intact and enables an "as written in the original" view.
**DocumentVersion** (`DocumentVersion`) — an append-only snapshot of a `Document`'s metadata at a point in time. Append-only by convention; no consumer-facing create or update endpoint exists. The entity uses Lombok `@Data` (which generates setters), so immutability is enforced by application convention, not at the Java level.
**Tag** (`Tag`) — a hierarchical category that can be applied to `Document`s. Tags are self-referencing via a `parent_id` foreign key, forming a tree structure.
@@ -64,13 +55,9 @@ _See also [Annotation](#annotation-documentannotation)._
- `REVIEWED`: a reviewer has approved the transcription.
- `ARCHIVED`: the document is finalized and read-only.
**Canonical import** — an asynchronous batch process (`CanonicalImportOrchestrator`) that consumes the normalizer's committed canonical artifacts and creates `Tag`s, `Person`s (register + tree), family relationships, and `Document`s. Four idempotent loaders run in a fixed dependency order — `TagTreeImporter``PersonRegisterImporter``PersonTreeImporter``DocumentImporter` — each calling the owning domain's service. Re-running it never duplicates rows (upsert by `source_ref` / document `index`) and never overwrites a human-edited field. Only one import can run at a time (`IMPORT_ALREADY_RUNNING` error if attempted concurrently); a missing or malformed artifact fails closed (`IMPORT_ARTIFACT_INVALID`). Replaced the legacy raw-spreadsheet `MassImportService` (see ADR-025).
**Mass import** — an asynchronous batch process (`MassImportService`) that reads an Excel or ODS file and creates `Person`s, `Tag`s, and `PLACEHOLDER` `Document`s in one shot. Only one import can run at a time (`IMPORT_ALREADY_RUNNING` error if attempted concurrently).
**canonical artifact** — one of the four files the normalizer (`tools/import-normalizer/`) emits and commits to `tools/import-normalizer/out/`: `canonical-tag-tree.xlsx`, `canonical-persons.xlsx`, `canonical-persons-tree.json`, `canonical-documents.xlsx`. They are the contract the backend importer reads (mapped by header name); the semantic transformation (German-date parsing, name classification) lives only in the normalizer, never in Java.
**CanonicalSheetReader** — the value-level POI helper that opens a canonical `.xlsx`, maps the header row to column indices by name (replacing the brittle positional column config), splits pipe-delimited list columns, and throws `IMPORT_ARTIFACT_INVALID` on a missing required header rather than NPE-ing on a null index.
**SkippedFile** (`ImportStatus.SkippedFile`) — a file that was presented for import but not processed, recorded with a `filename` and a `reason` code. Possible reasons: `INVALID_FILENAME_PATH_TRAVERSAL` (the file-column basename failed the path-traversal guard), `INVALID_PDF_SIGNATURE` (magic-byte validation failed), `S3_UPLOAD_FAILED` (file upload to MinIO/S3 threw an exception), `FILE_READ_ERROR` (the file could not be opened for reading), or `ALREADY_EXISTS` (a document with the same `index` already exists in the archive with a status other than `PLACEHOLDER`).
**SkippedFile** (`MassImportService.SkippedFile`) — a file that was presented for import but not processed, recorded with a `filename` and a `reason` code. Possible reasons: `INVALID_PDF_SIGNATURE` (magic-byte validation failed), `S3_UPLOAD_FAILED` (file upload to MinIO/S3 threw an exception), `FILE_READ_ERROR` (the file could not be opened for reading), or `ALREADY_EXISTS` (a document with the same filename already exists in the archive with a status other than `PLACEHOLDER`).
**skipped count** — the total number of `SkippedFile` entries accumulated during a single import run (`ImportStatus.skipped()`). Shown in the amber warning section of the Import Status Card in the admin UI; a value of zero suppresses the section entirely.
@@ -93,38 +80,6 @@ _See also [DocumentStatus lifecycle](#documentstatus-lifecycle)._
**Sütterlin** — A specific standardized style of Kurrent taught in German schools from 1915 to 1941.
**Illegible word** — a word whose recognition confidence falls below the configured threshold; replaced with the literal token `[unleserlich]` in the rendered block text and counted in the `ocr_illegible_words_total` Prometheus counter.
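The thresholding described here can be sketched as follows. The real OCR service is not written in Java and its data model is not shown in this diff, so the `Word` record, the method names, and the example threshold are all hypothetical; only the `[unleserlich]` token and the below-threshold rule come from the definition above.

```java
import java.util.List;
import java.util.stream.Collectors;

class IllegibleWordsSketch {
    // Hypothetical shape of a recognised word with its confidence score.
    record Word(String text, double confidence) {}

    static final String ILLEGIBLE_TOKEN = "[unleserlich]";

    // Replaces every word whose confidence is below the threshold with the literal token.
    static String renderBlock(List<Word> words, double threshold) {
        return words.stream()
                .map(w -> w.confidence() < threshold ? ILLEGIBLE_TOKEN : w.text())
                .collect(Collectors.joining(" "));
    }

    // Number of replaced words; a value like this would feed a counter metric.
    static long countIllegible(List<Word> words, double threshold) {
        return words.stream().filter(w -> w.confidence() < threshold).count();
    }
}
```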
**Models-ready gauge** — the `ocr_models_ready` Prometheus gauge, flipped from `0` to `1` once the FastAPI lifespan startup has finished loading the Kraken model and the spell-checker. Used both for the `/health` endpoint and as the supervised signal for the `ocr_models_ready < 1 for 2m` alert.
**Recognition model accuracy** — the accuracy reported by `ketos train` for the recognition (text-line) model, exposed as `ocr_model_accuracy{kind="recognition"}`. Sourced from `_parse_best_checkpoint` on the highest-scoring checkpoint after training.
**Segmentation model accuracy** — the accuracy reported by `ketos segtrain` for the baseline layout analysis (`blla`) model, exposed as `ocr_model_accuracy{kind="segmentation"}`. Distinct from recognition accuracy because the two models are trained and improved independently.
---
## Stammbaum (Family-Tree Layout) Terms
**Stammbaum** `[user-facing]` — the genealogy / family-tree view of the archive, accessible at `/stammbaum`. Renders every `Person` as a node positioned by `PersonRelationship` edges (`PARENT_OF`, `SPOUSE_OF`) into rows that correspond to generations. The browser-side layout pipeline lives at `frontend/src/lib/person/genealogy/`.
_See also [PersonRelationship](#person-person)._
**seeded rank** (`Person.generation`) — the imported generation index on a `Person` (G 0 = founders, increasing downward), used as a strict row anchor in `buildLayout.ts`. The iterative fallback heuristic never overrides a seeded rank, and spouse-pulldown never pulls a seeded rank — only unseeded nodes (no `generation`) flow through the heuristic.
**sibling block** — a layout unit holding the children of a single parent-set at one generation, used inside `buildLayout.ts`. Each block has a center computed from the parents' midpoint; blocks are then packed left-to-right within a generation row. Two adjacent sibling blocks at the same rank can be merged if a `SPOUSE_OF` edge crosses them (intra-family marriage, AC2).
**loose spouse** — a person at a given generation who is a spouse of someone in a sibling block but is not themselves a parented child of anyone in the graph. Loose spouses are attached adjacent to their parented partner (right side per Leonie's UX rule) so the spouse line stays short.
_Not to be confused with [parented](#parented-layout)_ — loose is the absence of parent edges into the graph.
**parented** `[layout]` — a layout flag on a sibling-block member indicating that the person has at least one `PARENT_OF` edge incoming from a node already in the graph at the prior generation. Parented members are the layout anchors of their block (the block is centred so the average index of parented members sits under the parents' midpoint); non-parented members (loose spouses) ride along on the side.
**anchor index** — within a sibling block, the average position of `parented` member indices. The block is shifted horizontally so this index, multiplied by `NODE_W + COL_GAP`, lines up under the midpoint of the block's parents — keeping every parent-child connector orthogonal (90°).
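The anchor-index arithmetic can be worked through with a small sketch. It is written in Python for illustration only (the real code is `buildLayout.ts`); the `NODE_W` and `COL_GAP` values, the `block_left_x` helper, and the choice to measure x at the slot origin are all assumptions.

```python
from statistics import mean

# Assumed constants for illustration; the real values live in buildLayout.ts.
NODE_W = 160
COL_GAP = 40


def block_left_x(parents_mid_x: float, parented_indices: list[int]) -> float:
    """Return the x of slot 0 so the anchor index lands under the parents' midpoint."""
    anchor_index = mean(parented_indices)  # average position of parented members
    return parents_mid_x - anchor_index * (NODE_W + COL_GAP)


# A block of four slots: parented children at 0, 1, 2 and a loose spouse at 3.
left = block_left_x(parents_mid_x=1000.0, parented_indices=[0, 1, 2])
slot_xs = [left + i * (NODE_W + COL_GAP) for i in range(4)]
```

With these assumed numbers the anchor index is 1, so the middle child sits directly under the parents' midpoint and the loose spouse rides along at the right edge of the block.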
**intra-family marriage** — a `SPOUSE_OF` edge where both endpoints are parented members of *different* sibling blocks at the same rank (i.e. both have parents in the graph, but the parent sets differ). Layout merges the two blocks so the spouses sit adjacent at the join boundary; latent in current data (0 cases in the May-2026 canonical snapshot) but covered by a synthetic regression test in `buildLayout.test.ts`.
**marriage dot** — the SVG circle drawn at the midpoint of a `SPOUSE_OF` connector in the Stammbaum tree (`StammbaumTree.svelte`). Radius is `r=6` (12 px diameter) so the marker meets WCAG 1.4.11 (3:1 non-text contrast) when it stacks to disambiguate multiple marriages on the same focal person.
**canonical fixture** (Stammbaum) — `frontend/src/lib/person/genealogy/__fixtures__/stammbaum.json`, a pinned `/api/network` snapshot used by `buildLayout.test.ts` for structural-property assertions against real data. Captured locally via `frontend/scripts/capture-network-fixture.mjs` with explicit credentials and a localhost backend; never invoked from CI. Sanity-gated by `validateFixture.ts` (≥ 50 nodes / ≥ 5 generations / ≥ 1 SPOUSE_OF edge / ≥ 1 multi-spouse person).
---
## Other Domain Terms
@@ -169,3 +124,4 @@ _Terms flagged as potentially ambiguous that have not yet been formally defined
- Terms surfaced by Epic 1 audit findings (#388–#392) — review audit reports under `docs/audits/` when available and add any term flagged as ambiguous.
- `OcrBatchService` vs `OcrAsyncRunner` — both handle async OCR orchestration; their division of responsibility should be clarified here.
- `Stammbaum` — the genealogy tree view; relationship to `PersonRelationship` entity.

@@ -118,14 +118,11 @@ To find a trace for a specific request in staging/production, either increase th
## Metrics (Prometheus → Grafana)
Prometheus scrapes two targets every 15 s:
Prometheus scrapes the backend management endpoint every 15 s:
```
Target: backend:8081/actuator/prometheus
Labels: job="spring-boot", application="Familienarchiv"
Target: ocr:8000/metrics
Labels: job="ocr-service"
```
All Spring Boot metrics carry the `application="Familienarchiv"` tag, which is how the Grafana Spring Boot Observability dashboard (ID 17175) filters to this service.
@@ -149,70 +146,6 @@ jvm_memory_used_bytes{area="heap", application="Familienarchiv"}
hikaricp_connections_active
```
### OCR-service custom metrics
Exposed at `ocr:8000/metrics` by `prometheus-fastapi-instrumentator`. The
`http_*` metrics describe the FastAPI request layer; the `ocr_*` series are
domain-specific. **Never label these with PII or document content** — labels
have unbounded cardinality risk and are visible to anyone with Grafana access.
| Metric | Type | Labels | Unit | What it tracks |
|---|---|---|---|---|
| `ocr_jobs_total` | Counter | `engine` (`surya`/`kraken`), `script_type` | jobs | OCR jobs that started after a successful PDF download |
| `ocr_pages_total` | Counter | `engine` | pages | Successfully OCR'd pages in the streaming generator |
| `ocr_skipped_pages_total` | Counter | — | pages | Pages skipped because the engine raised on them |
| `ocr_words_total` | Counter | — | words | Recognized words summed across every block |
| `ocr_illegible_words_total` | Counter | — | words | Words below the confidence threshold (rendered as `[unleserlich]`) |
| `ocr_processing_seconds` | Histogram | `engine` | seconds | Per-page (stream) or per-document (`/ocr`) engine time, excluding preprocessing |
| `ocr_training_runs_total` | Counter | `kind` (`recognition`/`segmentation`), `outcome` (`success`/`error`) | runs | Completed training runs |
| `ocr_model_accuracy` | Gauge | `kind` | ratio (0–1) | Latest accuracy reported by a successful training run |
| `ocr_models_ready` | Gauge | — | 0\|1 | 1 once the lifespan startup has finished loading models |
Canonical example queries (the same ones referenced in issue #652):
```promql
# OCR throughput by engine
sum by (engine) (rate(ocr_pages_total[5m]))
# Share of words rendered as [unleserlich]
sum(rate(ocr_illegible_words_total[5m]))
/ sum(rate(ocr_words_total[5m]))
# p95 page processing time per engine
histogram_quantile(0.95, sum by (engine, le) (
rate(ocr_processing_seconds_bucket[5m])
))
# Training error rate
sum(rate(ocr_training_runs_total{outcome="error"}[1h]))
/ sum(rate(ocr_training_runs_total[1h]))
# Latest recognition vs segmentation accuracy
ocr_model_accuracy
```
### Internal-only endpoints
`/metrics` is exposed by the OCR service over plain HTTP without
authentication. The container is reachable only on the internal Docker
network — Caddy never proxies to it directly. If the service is ever
exposed (e.g. a `ports:` mapping is added), block the endpoint at the
reverse proxy:
```caddy
ocr.example.com {
@internal_only path /metrics /health
respond @internal_only 404
reverse_proxy ocr:8000
}
```
The `MetricsPathFilter` in `ocr-service/main.py` suppresses uvicorn's
**stdout** access log lines for `/metrics` and `/health` so the container
console stays focused on real OCR traffic. Promtail/Loki still receive
access lines from any other source. Treat the filter as console
noise-control, not an audit-suppression mechanism.
## Errors (GlitchTip)
GlitchTip receives errors from both the backend (via Sentry Java SDK) and the frontend (via Sentry JavaScript SDK). It groups events by fingerprint, tracks first/last seen times, and links to the release that introduced the error.

@@ -94,6 +94,17 @@ The schema includes `spring_session` and `spring_session_attributes` tables, but
---
### `MassImportService` provides no status or error feedback
**File:** `service/MassImportService.java`, `controller/AdminController.java`
`/api/admin/trigger-import` returns immediately (async), but there is no way for the admin to know whether the import succeeded, failed, or is still running. Errors during async execution are silently swallowed.
**Fix options:**
- Store import job status in a DB table (`import_jobs`) with state (`RUNNING`, `DONE`, `FAILED`) and expose a `GET /api/admin/import-status` endpoint
- Alternatively, make the endpoint synchronous since it already blocks on file I/O — only use async if you need true non-blocking behaviour
---
## Missing Capabilities
### No test coverage
@@ -103,7 +114,7 @@ The only test is a Spring context load test. No unit or integration tests exist
**Suggested starting points (highest value for effort):**
1. `DocumentSpecifications` — pure logic, easy to unit test with an in-memory H2 or Testcontainers PostgreSQL
2. Canonical import loaders (`CanonicalSheetReader`, `DocumentImporter`, etc.) — parsing/upsert logic, test with fixture canonical `.xlsx` files
2. `ExcelService` — parsing logic, test with fixture `.xlsx` files (one exists in `api_tests/`)
3. `PermissionAspect` — security logic should be tested; use `@WithMockUser` from Spring Security Test
---

@@ -1,94 +0,0 @@
# ADR-023: Prometheus Instrumentator and Metrics Registry Injection
## Status
Accepted
## Context
Until issue #652 the OCR service exposed no `/metrics` endpoint. The
observability stack already scrapes the Spring Boot backend's actuator
endpoint, but it had nothing to scrape on the Python side. Without HTTP-
and domain-level metrics from `ocr-service` we cannot answer questions
like "what is the share of words rendered as `[unleserlich]`" or
"is the training error rate above its budget" from Grafana.
Two implementation requirements influenced the design:
1. **Counter / gauge isolation in tests.** `prometheus_client` collectors
are module-level singletons keyed by name on the global `REGISTRY`.
Re-importing or naively re-instantiating them raises a duplicated-
collector error and cross-test state leaks (a `.inc()` in test A is
still readable by test B). A test harness needs a way to swap the
active container for a fresh per-test instance.
2. **Minimal blast radius on the request path.** We did not want to
hand-instrument every endpoint with FastAPI middleware. The
`prometheus-fastapi-instrumentator` library already provides
`http_requests_total`, `http_request_duration_seconds`, and the
`/metrics` exposition route, all idiomatic Prometheus names.
## Decision
- Add `prometheus-fastapi-instrumentator==7.0.0` and pin its transitive
dependency `prometheus-client==0.25.0` explicitly in
`ocr-service/requirements.txt`.
- Mount the instrumentator once at module load:
`Instrumentator(excluded_handlers=["/health", "/metrics"]).instrument(app).expose(app)`.
This adds `/metrics` and an HTTP-level dashboard surface without
changing any endpoint code.
- Define every domain metric (`ocr_jobs_total`, `ocr_pages_total`,
`ocr_processing_seconds`, …) inside a `build_metrics(registry)`
factory in `ocr-service/metrics.py` that returns a frozen `OcrMetrics`
dataclass. Production code binds the container to the default
`REGISTRY` once: `metrics: OcrMetrics = build_metrics(REGISTRY)`.
- Tests use a `fresh_metrics` fixture that builds a new
`CollectorRegistry()` per test and monkeypatches `main.metrics` with
a container bound to it. The endpoint code keeps reading
`metrics.<name>` without knowing whether it is talking to the global
registry or a per-test one.
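A minimal sketch of the factory pattern described above, assuming the `prometheus_client` package is installed. The field names on `OcrMetrics` and the help strings are illustrative guesses, not the contents of `ocr-service/metrics.py`; only a few of the domain metrics are shown.

```python
from dataclasses import dataclass

from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram


@dataclass(frozen=True)
class OcrMetrics:
    # Field names are assumptions; the metric names come from the ADR.
    pages_total: Counter
    illegible_words_total: Counter
    processing_seconds: Histogram
    models_ready: Gauge


def build_metrics(registry: CollectorRegistry) -> OcrMetrics:
    """Create every collector on the given registry and return them as one container."""
    return OcrMetrics(
        pages_total=Counter(
            "ocr_pages_total", "Successfully OCR'd pages", ["engine"], registry=registry
        ),
        illegible_words_total=Counter(
            "ocr_illegible_words_total", "Words below the confidence threshold", registry=registry
        ),
        processing_seconds=Histogram(
            "ocr_processing_seconds", "Engine time per page or document", ["engine"], registry=registry
        ),
        models_ready=Gauge(
            "ocr_models_ready", "1 once models are loaded", registry=registry
        ),
    )


# Per-test isolation: each container is bound to its own registry.
registry_a = CollectorRegistry()
metrics_a = build_metrics(registry_a)
metrics_a.pages_total.labels(engine="kraken").inc()

registry_b = CollectorRegistry()
metrics_b = build_metrics(registry_b)

count_a = registry_a.get_sample_value("ocr_pages_total", {"engine": "kraken"})
count_b = registry_b.get_sample_value("ocr_pages_total", {"engine": "kraken"})
```

The increment recorded through `metrics_a` is visible only in `registry_a`; the second registry has no sample for that label set. A `fresh_metrics` fixture would presumably do the equivalent of the `registry_b` lines and then monkeypatch `main.metrics` with the result.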
## Consequences
**Positive**
- One reusable factory captures the metric definitions; future metrics
go in one place.
- Tests run with full counter isolation. Cross-test state leakage is
impossible because each test sees its own dataclass instance.
- The instrumentator gives us `http_*` metrics for free, including a
Grafana-ready histogram that pairs with the Spring Boot one.
**Negative**
- One extra level of indirection: any test that asserts on metric
values must remember to monkeypatch `main.metrics`, not the registry
directly. Rebinding through the registry is harmless but useless —
the dataclass holds references to the original collectors.
- `prometheus-client` is now pinned. Upgrading it requires an explicit
bump and re-checking the instrumentator's compatibility range.
- `/metrics` is exposed unauthenticated and relies on the Docker
internal network for confidentiality. See
[docs/OBSERVABILITY.md §Internal-only endpoints](../OBSERVABILITY.md)
for the Caddy snippet that must be added if the service ever gets a
host-side port mapping.
## Alternatives considered
- **Hand-roll the `/metrics` endpoint.** Rejected: would have meant
duplicating what `prometheus-fastapi-instrumentator` ships, plus
middleware for the HTTP histograms.
- **Skip the factory; pass `registry` as a function argument
everywhere.** Rejected: clutters every endpoint signature and breaks
the symmetry with the Spring Boot side, which also relies on a
process-global Micrometer registry.
- **Use a `pytest` autouse fixture that resets `REGISTRY` between
tests.** Rejected: `prometheus_client` does not expose a clean
"unregister all" hook, and we would be relying on private APIs.
## References
- Issue: [#652](https://git.raddatz.cloud/marcel/familienarchiv/issues/652)
- Library: <https://github.com/trallnag/prometheus-fastapi-instrumentator>
- Code: `ocr-service/metrics.py`, `ocr-service/main.py`,
`ocr-service/test_metrics.py`

@@ -1,123 +0,0 @@
# ADR-024: Grafana reads archive-db via a bridged network and a SELECT-only role
## Status
Accepted
## Context
Issue #651 (the PO Overview Grafana dashboard) needs aggregates over three
tables in the main application database — `audit_log`, `documents`, and
`transcription_blocks` — to answer the operator's four weekly questions: is
everything working, are people using it, is the archive making progress, is
OCR working well.
Until now, `obs-grafana` and the rest of the observability stack lived on
their own Docker network (`obs-net`) and never touched `archiv-net`, where
`archive-db` runs. The two were intentionally isolated: a compromise of any
observability container could not pivot to the application database.
The PO Overview's archive-progress and user-activity panels need rolling
7-day SQL aggregates that cannot be served by Prometheus or Loki. That
forces a connection from `obs-grafana` to `archive-db` for the first time.
Two implementation requirements shaped the design:
1. **Least privilege on the database side.** The Spring Boot application
role (`archiv`) has full read/write on every table. Letting Grafana
connect with that role would mean a Grafana compromise becomes an
application compromise. The dashboard only needs SELECT on three
tables; the role must reflect that and nothing more.
2. **Operational simplicity of secret rotation.** The role's password is
shared between the migration that sets it and the Grafana datasource
that uses it. A first version of this work put the password in a
versioned Flyway migration (V68), which Flyway only applies once —
leaving rotation as an out-of-band `psql ALTER ROLE` step that no
runbook documented. The shape must support rotation without manual
SQL.
## Decision
- Provision a dedicated PostgreSQL role `grafana_reader` with `LOGIN` plus
`GRANT SELECT` on `audit_log`, `documents`, `transcription_blocks` only.
No INSERT/UPDATE/DELETE on any table, no access to any other table —
enforced by the database, locked in by both positive and parameterized
negative tests in `GrafanaReaderRoleIntegrationTest`.
- Split the role's lifecycle across two migrations:
- `V68__add_grafana_reader_role.sql` — versioned, immutable, idempotent.
Creates the role and applies the grants. Runs exactly once per
database, like every other versioned migration.
- `R__grafana_reader_password.sql` — Flyway *repeatable* migration that
issues `ALTER ROLE grafana_reader WITH PASSWORD '${grafanaDbPassword}'`.
Flyway computes the checksum on the resolved content, so any change
to `GRAFANA_DB_PASSWORD` flips the checksum and re-applies the
migration on the next boot. Rotation becomes "bump env var, restart
backend, restart obs-grafana" — see the runbook in
`docs/DEPLOYMENT.md §4 → Rotate the grafana_reader DB password`.
- Resolve the password through Spring's `Environment` rather than a raw
`System.getenv()` call, so tests inject via `application.properties`
and the resolver is unit-testable with `MockEnvironment`. Fail closed
with `IllegalStateException` when the variable is unset — no fallback
string. Same shape as `UserDataInitializer`'s refusal to seed default
admin credentials outside dev/test/e2e.
- Join `obs-grafana` to `archiv-net` in addition to `obs-net`. Only the
Grafana container crosses the boundary; Loki, Tempo, Prometheus,
GlitchTip, and the worker containers remain `obs-net`-only.
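
The rotation mechanism and the fail-closed resolver can be illustrated with a short sketch. It is Python rather than the backend's Java, CRC32 stands in for whatever checksum Flyway computes internally, and `resolve_password` / `resolved_checksum` are hypothetical helper names; only the role name, the placeholder, and the environment variable come from the ADR.

```python
import zlib
from collections.abc import Mapping

# The repeatable migration's text before placeholder substitution.
TEMPLATE = "ALTER ROLE grafana_reader WITH PASSWORD '${grafanaDbPassword}';"


def resolve_password(env: Mapping[str, str]) -> str:
    """Fail closed: there is no fallback string when the variable is unset."""
    password = env.get("GRAFANA_DB_PASSWORD", "")
    if not password:
        raise RuntimeError("GRAFANA_DB_PASSWORD must be set")
    return password


def resolved_checksum(env: Mapping[str, str]) -> int:
    """Checksum of the migration text after the placeholder is substituted."""
    sql = TEMPLATE.replace("${grafanaDbPassword}", resolve_password(env))
    return zlib.crc32(sql.encode("utf-8"))


old = resolved_checksum({"GRAFANA_DB_PASSWORD": "first-secret"})
new = resolved_checksum({"GRAFANA_DB_PASSWORD": "rotated-secret"})
needs_reapply = old != new  # a changed password changes the resolved content
```

Because the checksum covers the resolved text, changing only the environment variable is enough to make the migration look "modified" on the next boot, which is what turns rotation into a restart rather than a manual `ALTER ROLE`.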
## Consequences
**Positive**
- Database-level least privilege: a Grafana compromise gains SELECT on
three tables. Cannot write, cannot read PII tables like `app_users`,
`persons`, `notifications`, `document_comments`, `geschichten`. The
parameterized PII negative sweep in `GrafanaReaderRoleIntegrationTest`
is the regression gate; new sensitive tables get added to that list.
- Rotation is documented, idempotent, and survives operator turnover.
No "the password set on day 1 is the password forever" failure mode.
- Tests pin down both sides of the boundary: positive grants must hold,
write-deny must hold, and the PII negative list must stay empty.
**Negative / trade-offs**
- `obs-net` is no longer fully isolated from `archiv-net`. A Grafana RCE
(e.g. via a future Grafana CVE) gains a TCP path to `archive-db`; the
exposure is contained, but not impossible. The least-privilege role is the
mitigation; we accept that mitigation as sufficient for a single
bridged container.
- The backend must hold `GRAFANA_DB_PASSWORD` in its environment forever,
so Flyway can resolve the placeholder on every boot. A backend RCE
therefore also leaks the Grafana datasource password. Acceptable
because that password's blast radius is itself bounded by the
least-privilege grants on `grafana_reader`.
## Alternatives considered
- **Prometheus PostgreSQL exporter, no direct connection.** Loses ad-hoc
SQL aggregates — the dashboard would need every metric pre-defined as
an exporter query, with a redeploy to add a new one. The PO Overview
is the type of dashboard that grows panels over time; pre-defining
every aggregate is the wrong shape.
- **Read replica or logical-replication slot dedicated to Grafana.**
Real operational cost (extra Postgres instance, replication monitoring,
storage doubled) disproportionate to a weekly PO glance.
- **Versioned migration with `flyway repair` for rotation.** Rejected:
conflates schema lifecycle with credential lifecycle, requires manual
intervention to rotate, and the repair command's semantics are
surprising to operators unfamiliar with Flyway internals.
- **Hardcoded fallback password when env var is unset.** Rejected as a
security blocker: publishes a known credential for a role with read
access to user activity and full letter text. The fail-closed
behavior is the explicit defense.
## References
- Issue #651 — PO Overview Grafana dashboard
- `backend/src/main/resources/db/migration/V68__add_grafana_reader_role.sql`
- `backend/src/main/resources/db/migration/R__grafana_reader_password.sql`
- `backend/src/main/java/org/raddatz/familienarchiv/config/FlywayConfig.java`
- `backend/src/test/java/org/raddatz/familienarchiv/config/GrafanaReaderRoleIntegrationTest.java`
- `infra/observability/grafana/provisioning/datasources/datasources.yml`
- `docker-compose.observability.yml`: `archiv-net` bridge on `obs-grafana`
- `docs/DEPLOYMENT.md §4` — rotation runbook

@@ -1,201 +0,0 @@
# ADR-025 — Canonical Import Output as Contract & Single-Migration Schema Foundation
**Date:** 2026-05-27
**Status:** Accepted
**Issue:** #671 (schema, decisions 1–2); #669 (importer architecture, decision 3)
**Milestone:** Handling the Unknowns — honest uncertainty in dates & people
---
## Context
The "Handling the Unknowns" milestone introduces honest uncertainty into the archive:
documents whose dates are known only approximately or as a range, and people the importer
infers from raw attribution text but cannot confidently identify. Three sibling issues —
date precision (#666), name triage (#665), and the importer (#669) — each independently
planned a Flyway `V69` migration that altered `persons`. Three `V69`s is a boot failure
(Flyway versions must be unique), and `persons.provisional` was at risk of being defined
twice.
Two durable decisions had to be made before any application code in Phases 3–6 could
compile against the new schema.
---
## Decision
### 1. All import/precision/attribution/identity schema lives in ONE migration with a single owner
`V69__import_precision_attribution_identity_schema.sql` adds every new column for this
milestone in a single, atomic, forward-only migration:
- `documents`: `meta_date_precision` (backfilled `DAY` where dated / `UNKNOWN` where not,
then `NOT NULL`), `meta_date_end`, `meta_date_raw`, `sender_text`, `receiver_text`.
- `persons`: `source_ref` (unique index, nullable), `provisional` (`NOT NULL DEFAULT false`).
- `tag`: `source_ref` (unique index, nullable).
Integrity is pushed to the database as fail-closed `CHECK` constraints (the precedent is
`V22`'s `person_type` allowlist):
- `meta_date_precision` must be one of the seven enum values.
- `meta_date_end` may be non-null **only** when precision = `RANGE` (one-directional, not
biconditional — see Consequences).
- `meta_date_end >= meta_date` for ranges with both endpoints (a `CHECK`, not a trigger).
- `meta_date_raw`, `sender_text`, `receiver_text` are length-capped at 10 000 (mirrors the
`transcription_blocks` cap in `V18`).
No sibling issue adds another migration that alters `persons` or `documents` in this
milestone.
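
The date-related `CHECK` rules above can be restated as a small predicate, which makes the one-directional `RANGE` rule easier to see. This is a Python sketch of the rules as described in this ADR, not the SQL in `V69`; the `violations` function and its messages are hypothetical.

```python
from datetime import date
from typing import Optional

PRECISIONS = {"DAY", "MONTH", "SEASON", "YEAR", "RANGE", "APPROX", "UNKNOWN"}
MAX_TEXT_LEN = 10_000


def violations(
    precision: str,
    meta_date: Optional[date],
    meta_date_end: Optional[date],
    meta_date_raw: Optional[str] = None,
) -> list[str]:
    """Return the rules a row would break; an empty list means the row is accepted."""
    problems = []
    if precision not in PRECISIONS:
        problems.append("precision not in allowlist")
    # One-directional: an end date requires RANGE, but RANGE does not require an end date.
    if meta_date_end is not None and precision != "RANGE":
        problems.append("meta_date_end set without RANGE")
    if meta_date is not None and meta_date_end is not None and meta_date_end < meta_date:
        problems.append("range ends before it starts")
    if meta_date_raw is not None and len(meta_date_raw) > MAX_TEXT_LEN:
        problems.append("meta_date_raw too long")
    return problems


open_ended = violations("RANGE", date(1916, 3, 1), None)
day_with_end = violations("DAY", date(1916, 3, 1), date(1916, 3, 2))
reversed_range = violations("RANGE", date(1916, 3, 2), date(1916, 3, 1))
```

An open-ended `RANGE` passes, while a `DAY` row carrying an end date and a reversed range are both rejected.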
### 2. The backend `DatePrecision` enum is a verbatim mirror of the normalizer's `Precision`; the canonical output is the contract
The importer reads the Python normalizer's canonical output
(`tools/import-normalizer/`). The backend `DatePrecision` enum
(`DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN`) is a verbatim copy of the normalizer's
`Precision(StrEnum)` (`dates.py`). There is **no translation layer**: the normalizer's
output strings are persisted as-is. The same applies to `source_ref`, which carries the
normalizer's `person_id` / canonical `tag_path` unchanged as the re-import idempotency key.
### 3. The importer is four idempotent loaders over the canonical artifacts; Java no longer parses the raw spreadsheet (Phase 3, #669)
The legacy `MassImportService` read the *raw* original spreadsheet by positional column
index (`@Value app.import.col.*`) and re-derived everything in Java (ISO-only date parsing,
name classification via `findOrCreateByAlias`, an ODS/XXE XML path). It is **deleted**.
The rebuild is a `CanonicalImportOrchestrator` driving four single-responsibility loaders in
an explicit dependency DAG (`TagTreeImporter` → `PersonRegisterImporter` →
`PersonTreeImporter` → `DocumentImporter`) that **consume the canonical artifacts produced
by the offline Python normalizer** (`tools/import-normalizer/out/`, synced onto the ops host
alongside the PDFs — see "Canonical artifacts are produced locally, NOT version-controlled"
below). A shared `CanonicalSheetReader` maps columns **by header
name** (not by index) and fails closed (`IMPORT_ARTIFACT_INVALID`) on a missing header. Each
loader calls the **owning domain's service**, never a repository (layering rule); the tree
loader uses `RelationshipService`, never the relationship repository.
Settled sub-decisions:
- **Idempotency precedence is domain-specific.** Persons/tags upsert by `source_ref`,
documents by `index`. Two distinct rules apply:
- **Person/Tag scalar fields = preserve human edits.** On re-import a non-blank field a human
changed in-app is never overwritten (blank fields are filled from canonical via the single
`preferHuman` idiom), and `provisional` is monotonic-downward — once a human confirms a
person (`false`) it never reverts to `true`. Because the orchestrator loads the register and
tree *before* documents, a person already `false` can never be flipped provisional by a
later document row that references the same `source_ref`, regardless of document-row order.
- **Document sender/receivers/tags = canonical-authoritative.** A document's sender, receiver
set, and tag set are owned by the canonical row, not the archivist. On re-import of a
PLACEHOLDER document `DocumentImporter` clears and re-populates `receivers`/`tags` so a row
whose set *shrinks* prunes the removed links rather than accumulating stale ones. The
"preserve human edits" rule above does **not** extend to these collections. The raw
`sender_text`/`receiver_text` cells are always retained verbatim (a separate invariant).
Note non-PLACEHOLDER documents are skipped entirely (`ALREADY_EXISTS`), so once a document
has a file the importer never touches it again — this bounds the authoritative-overwrite
blast radius to placeholder rows.
Verified against real Postgres in `CanonicalImportIntegrationTest`
(`reimport_preservesHumanEditedPersonField`, `reimport_prunesRemovedReceiverAndTag…`,
`import_neverFlipsRegisterPersonToProvisional…`).
- **Name policy = Option A.** The normalizer resolved attribution upstream: the document sheet
carries the resolved slug in `sender_person_id` / `receiver_person_ids` and the raw cell in
`sender_name` / `receiver_names`. The importer routes register-first by `source_ref`
(provisional `Person` when a slug is unmatched), and **always retains the raw cell** in
`sender_text` / `receiver_text` even when a person is linked — the load-bearing invariant
behind the merge story. A row with no slug but raw text (prose / `?` / object-noise) links
no person and keeps only the raw text.
- **`provisional` is now populated.** Importer-minted persons are `provisional = true`;
register and tree persons stay `false`. This is the Phase-3 contract the schema (decision 1)
left at default-`false`.
- **PDFs resolve directly by index (`<index>.pdf`), not by a `file` column.** The corpus is
uniform — all PDFs are named `<index>.pdf` flat in the import dir (e.g. `W-0124.pdf`,
`Mü-0001.pdf`) — so `DocumentImporter` resolves a document's PDF with an O(1)
`importDir.resolve(index + ".pdf")` lookup. The redundant `file` column (carrying the
spreadsheet's messy `datei` value) and the recursive directory walk that resolved it were
removed (#686, which also closed #676 — the O(rows×tree) walk is gone). The normalizer no
longer emits `file` or the `index_file_mismatch` review flag.
- **Security guards are defense-in-depth, not upstream-trust.** The `index` is the only thing
that drives the on-disk lookup, so it is treated as hostile (CWE-22 does not care it came from
our tool): `isValidImportIndex` rejects slash/backslash, three Unicode slash homoglyphs, any
`.` (so `<index>.pdf` is the only extension and `..` can never appear), null byte, and
absolute paths, and requires a strict catalog shape (1–4 Latin letters incl. umlauts, one or
more hyphens, digits, optional trailing `x`). A bad index skips the row with a clear
`SkipReason` (`INVALID_FILENAME_PATH_TRAVERSAL`). The resolved canonical path is still asserted
to stay inside the import dir as a second line of defense (a symlinked `<index>.pdf` cannot
escape), and the `%PDF` magic-byte check still gates upload. These guards and their tests were
ported from the file-column resolution (originally from `MassImportService`).
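
The two person/tag merge rules from the idempotency sub-decision can be sketched as pure functions. This is Python for illustration, not the Java `preferHuman` idiom itself; it simplifies by treating any non-blank stored value as one to keep, and both function names are hypothetical.

```python
from typing import Optional


def prefer_human(existing: Optional[str], canonical: Optional[str]) -> Optional[str]:
    """Keep a non-blank stored value; fill a blank one from the canonical row."""
    if existing is not None and existing.strip():
        return existing
    return canonical


def next_provisional(existing: bool, incoming: bool) -> bool:
    """Monotonic-downward: once a person is confirmed (False), it never reverts to True."""
    return existing and incoming


kept = prefer_human("Edited by archivist", "From canonical sheet")
filled = prefer_human("  ", "From canonical sheet")
stays_confirmed = next_provisional(existing=False, incoming=True)
```

Re-running the import with the same canonical row leaves all three results unchanged, which is the property the re-import tests rely on.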
---
## Consequences
- **RANGE is one-directional, not biconditional.** A `RANGE` row may have a null
`meta_date_end` (an open-ended range with only a start), because the normalizer can emit
start-only ranges. A biconditional `RANGE ⟺ end IS NOT NULL` rule would reject valid
normalizer output, so it was rejected. Phase 4 rendering must handle a `RANGE` with no end
gracefully.
- **`provisional` stays `false` throughout this phase.** The column and flag exist, but no
code path sets it `true`; the importer (Phase 3) is the only writer. This is intentional,
not a half-built feature.
- **A future dev must not "improve" the enum.** Renaming or dropping a `DatePrecision` value
without changing the normalizer silently breaks import idempotency and date rendering. The
enum's Javadoc states this; the DB `CHECK` enforces validity independent of the Java enum.
- **`source_ref` is unique + nullable.** Manually created persons/tags have `source_ref =
NULL`; Postgres allows multiple NULLs under a plain unique index, so no backfill is needed.
- **Forward-only.** The migration is immutable once shipped (Flyway checksum model); any fix
goes in a later version. There is no down-migration — rollback means restoring from the
nightly `pg_dump`, the standard procedure.
- **`runImport()` is non-transactional — per-loader transactions only.** The orchestrator
does not wrap the four loaders in a single transaction; each loader (or the per-call
`upsertBySourceRef` / `DocumentImporter.load`) carries its own `@Transactional` boundary. A
partial failure mid-run (e.g. the document loader throws after tags + persons committed)
leaves the earlier loaders' data committed and the `ImportStatus` set to `FAILED`. This is
acceptable precisely because the import is idempotent: re-running is safe and converges to
the same state, so the operational recovery for a partial failure is simply to fix the
offending artifact and re-trigger the import — no manual cleanup of half-written data is
required. A future maintainer must not assume all-or-nothing semantics.
- **The index pattern is corpus-specific and must be revisited if the catalog scheme grows.**
`INDEX_PATTERN` accepts only the *current* corpus shape — at most four Latin-1 letters (incl.
umlauts) followed by one or more hyphens, ASCII digits, and an optional trailing `x`. This is a
conscious constraint, not a general filename validator: a future sub-collection catalogued with
a 5-letter prefix, a digit-led id, or a non-Latin-1 letter (e.g. `Č` or a Cyrillic id) would
fail `isValidImportIndex` and its rows would be **skipped** (`INVALID_FILENAME_PATH_TRAVERSAL`),
not imported. Likewise a real PDF that does not follow `<index>.pdf` produces a `PLACEHOLDER`
(the importer logs both cases distinctly — see #686). If the catalog scheme ever changes, the
pattern and its tests must be widened deliberately; do not loosen it casually, as it is the
allowlist that keeps the on-disk lookup safe. Note `\d` is intentionally ASCII-only — adding
`Pattern.UNICODE_CHARACTER_CLASS` would silently widen the accepted digit set.
- **A malicious/garbage index skips its row with a loud `SkipReason`, by design.** Since #686
the index is the only on-disk lookup key. An index that fails `isValidImportIndex`
(path separator, traversal token, slash homoglyph, null byte, absolute path, or a non-catalog
shape) is recorded as a `SkippedFile` with reason `INVALID_FILENAME_PATH_TRAVERSAL` and the
import continues with the remaining rows — nothing outside the import dir is ever read. A
symlinked `<index>.pdf` whose canonical path escapes the import dir is the one case that still
aborts the import (a `DomainException` from the containment assertion), because a syntactically
valid index resolving outside the dir is an environment-level attack signal, not a row typo.
- **`PersonSummaryDTO` coupling.** `provisional` was added to the `PersonSummaryDTO` native
interface projection; because the projection is backed by native SQL, the column had to be
added to all three native `SELECT`s (`findAllWithDocumentCount`, `searchWithDocumentCount`,
`findTopByDocumentCount`) or it would silently return `false`. Guarded by integration tests
against real Postgres.
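
The index allowlist and the containment check described in the bullets above can be sketched as follows. This is a Python approximation, not the Java `isValidImportIndex`: the exact character class is an assumption reconstructed from the prose (up to four Latin letters including umlauts, hyphens, ASCII digits, optional trailing `x`), and `resolve_pdf` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed character class; [0-9] keeps the digit set ASCII-only on purpose.
INDEX_PATTERN = re.compile(r"[A-Za-zÄÖÜäöüß]{1,4}-+[0-9]+x?")


def is_valid_import_index(index: str) -> bool:
    """Allowlist check: anything that is not the catalog shape is rejected."""
    return INDEX_PATTERN.fullmatch(index) is not None


def resolve_pdf(import_dir: Path, index: str) -> Optional[Path]:
    """Return the PDF path for an index, or None when the row should be skipped."""
    if not is_valid_import_index(index):
        return None  # the importer would record INVALID_FILENAME_PATH_TRAVERSAL
    root = import_dir.resolve()
    candidate = (root / f"{index}.pdf").resolve()
    # Second line of defense: a symlinked file must not escape the import dir.
    if not candidate.is_relative_to(root):
        raise RuntimeError("resolved path escapes the import directory")
    return candidate
```

Because the pattern is an allowlist, separators, dots, and traversal tokens never need to be enumerated here; they simply fail to match. The sketch needs Python 3.9 or later for `Path.is_relative_to`.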
---
## Canonical artifacts are produced locally, NOT version-controlled
The four files in `tools/import-normalizer/out/` —
`canonical-documents.xlsx`, `canonical-persons.xlsx`, `canonical-tag-tree.xlsx`,
`canonical-persons-tree.json` — contain real family PII (names, addresses, attribution
prose) and are **deliberately excluded from the git index** via
`tools/import-normalizer/.gitignore`. They are regenerated locally from the source
spreadsheet by running the Python normalizer, and synced into the ops host's
`IMPORT_HOST_DIR` out-of-band (alongside the `<index>.pdf` corpus) — the same mechanism
that delivers the PDFs.
The contract between normalizer and importer is the **header schema** (column names,
their types, the `Precision` enum strings, the slug shape) — not the file contents.
`CanonicalSheetReader` maps columns by header name and fails closed
(`IMPORT_ARTIFACT_INVALID`) on a missing header, which is what locks the contract; the
file-level golden fixtures stay outside the repo.
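The header-name contract described above can be illustrated with a short sketch. This is TypeScript written purely for illustration and is not the backend `CanonicalSheetReader`; the function names, the error class, and the example header names (`index`, `title`) are assumptions. It shows the two properties the paragraph relies on: columns are located by header name, so column order is irrelevant, and a missing header aborts the read instead of silently producing empty values.

```typescript
// Hypothetical illustration only; names and header strings are assumptions.
class ImportArtifactInvalidError extends Error {
  constructor(missing: string[]) {
    super('IMPORT_ARTIFACT_INVALID: missing header(s) ' + missing.join(', '));
    this.name = 'ImportArtifactInvalidError';
  }
}

// Resolve every required header name to its column position.
// Fails closed: any missing header throws before a single data row is read.
function mapHeaders(headerRow: string[], required: string[]): Map<string, number> {
  const positions = new Map<string, number>();
  headerRow.forEach((name, i) => positions.set(name.trim(), i));
  const missing = required.filter((name) => !positions.has(name));
  if (missing.length > 0) {
    throw new ImportArtifactInvalidError(missing);
  }
  return positions;
}

// Read one cell by header name rather than by column position.
function cell(row: string[], headers: Map<string, number>, name: string): string {
  const i = headers.get(name);
  if (i === undefined) {
    throw new ImportArtifactInvalidError([name]);
  }
  return row[i] ?? '';
}
```

Because lookups go through the header map, reordering columns in a regenerated artifact does not break the reader, while renaming or dropping a column does.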
A future maintainer must not "fix" CI by checking these artifacts back in: they are
PII, and committing them was the regression that prompted this rule. Tests use small synthetic fixtures
constructed in-process (`DocumentImporterTest`, `CanonicalImportIntegrationTest`) rather
than real-corpus snapshots.


@@ -1,162 +0,0 @@
# ADR-026 — In-House Stammbaum Layout, dagre Evaluated and Deferred
**Date:** 2026-05-28
**Status:** Accepted
**Issue:** #361
**Supersedes:** _none_
**Supersedes-on-trigger:** A future ADR-027 if any acceptance criterion below stops converging in-house.
---
## Context
After #689 shipped the seeded-rank invariant — `buildLayout.ts` treats imported
`persons.generation` as a strict row anchor and the iterative heuristic only
runs for unseeded nodes — the question "should we adopt
[@dagrejs/dagre](https://www.npmjs.com/package/@dagrejs/dagre) for Stammbaum
layout?" had to be re-evaluated.
dagre's headline value is **rank assignment** via `network-simplex` /
`longest-path`. That value is now mostly redundant: curated import data already
pins ranks for the family graph, and the residual heuristic only places
unseeded nodes (today: family members imported without a `generation` column,
spouses with no parents in the graph).
What remains are **position-within-rank** problems:
1. Multi-spouse persons (canonical case: Albert de Gruyter, 4 marriages) whose
secondary marriages were silently dropped by a `Map<string, string>` shape.
2. Intra-family marriages — two persons in different sibling blocks at the
same rank who marry each other (latent; zero cases in current data).
3. Unseeded loose spouses whose parents are also in the graph (latent; zero
cases — 0 of 942 unseeded persons match the predicate in the May-2026
snapshot).
Seven persona walkthroughs on #361 (Leonie/UX, Felix/Dev, Markus/Architect,
Nora/Security, Sara/QA, Tobias/DevOps, Elicit/Requirements) converged on the
same recommendation: try the in-house fix path first, against the canonical
dataset, with quantitative exit triggers — adopt dagre only if any acceptance
criterion fails to converge.
---
## Decision
**Keep Stammbaum layout in-house. Do not adopt dagre at this time.**
The fix path lands as six commits on #361:
1. Spec geometry reconcile (`NODE_W=160, NODE_H=56` matches `buildLayout.ts`)
and an explicit seeded-rank-invariant Layout-rules line.
2. Canonical `/api/network` fixture capture script + pinned snapshot for
structural assertions in `buildLayout.test.ts`.
3. `spousePairs: Map<string, string>` → `Map<string, Set<string>>`. Preserves
all marriages; closes Nora's robustness gap (edges referencing IDs outside
`allNodes` are guarded at ingestion).
4. Multi-spouse ordering: `(fromYear ASC NULLS LAST, displayName ASC)`,
inserted to the right of the parented focal — matches Leonie's UX rule.
5. Intra-family-marriage block merge across same-rank parented sibling blocks
(AC2) — adjacent placement at the join boundary.
6. Marriage-line midpoint dot enlarged from `r=4.5` to `r=6` (WCAG 1.4.11
informational contrast — the dot disambiguates stacked marriages and is
no longer decorative).
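The shape change in item 3 can be sketched as follows. This is a minimal illustration, not the code in `buildLayout.ts`: the `SpouseEdge` shape, the `buildSpousePairs` name, and the example IDs are assumptions. With a `Map<string, string>`, each new marriage for the same person overwrites the previous entry; with a `Map<string, Set<string>>`, every marriage is kept, and edges that reference an ID outside `allNodes` are skipped at ingestion.

```typescript
// Hypothetical edge shape; the real /api/network payload may differ.
interface SpouseEdge {
  personId: string;
  relatedPersonId: string;
}

// Build a symmetric person -> spouses adjacency that keeps every marriage.
function buildSpousePairs(
  allNodes: ReadonlySet<string>,
  edges: readonly SpouseEdge[]
): Map<string, Set<string>> {
  const spousePairs = new Map<string, Set<string>>();

  const link = (a: string, b: string): void => {
    let partners = spousePairs.get(a);
    if (partners === undefined) {
      partners = new Set<string>();
      spousePairs.set(a, partners);
    }
    partners.add(b);
  };

  for (const edge of edges) {
    // Guard at ingestion: ignore edges that point at IDs outside the node set.
    if (!allNodes.has(edge.personId) || !allNodes.has(edge.relatedPersonId)) {
      continue;
    }
    link(edge.personId, edge.relatedPersonId);
    link(edge.relatedPersonId, edge.personId);
  }
  return spousePairs;
}
```

Using a `Set` also makes duplicate edges harmless, since adding the same spouse twice has no effect.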
The block-packer + AC2 merge stays well under Markus's 80-LoC extraction
threshold, so `packBlocks.ts` is **not** yet warranted.
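
The ordering rule from item 4, `(fromYear ASC NULLS LAST, displayName ASC)`, could be expressed as a comparator along these lines. The `Marriage` shape and function names are assumptions, and the name comparison here is a plain code-unit comparison chosen for determinism; the actual implementation may compare names differently (for example locale-aware).

```typescript
// Hypothetical shape for one marriage of the focal person.
interface Marriage {
  spouseId: string;
  displayName: string;
  fromYear: number | null; // null when the marriage year is unknown
}

// (fromYear ASC NULLS LAST, displayName ASC)
function compareMarriages(a: Marriage, b: Marriage): number {
  if (a.fromYear !== b.fromYear) {
    if (a.fromYear === null) return 1; // unknown year sorts after known years
    if (b.fromYear === null) return -1;
    return a.fromYear - b.fromYear;
  }
  // Tie-break on display name so the order is stable across runs.
  if (a.displayName < b.displayName) return -1;
  if (a.displayName > b.displayName) return 1;
  return 0;
}

// Returns a new array; the input is left untouched.
function orderSpouses(marriages: readonly Marriage[]): Marriage[] {
  return [...marriages].sort(compareMarriages);
}
```

The sorted spouses would then be placed left to right, starting immediately to the right of the parented focal person.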
---
## Consequences
### Accepted today
- **AC1 (multi-spouse preservation)** is now a property of `buildLayout`, verified
by both synthetic and canonical fixture tests.
- **AC2 (intra-family marriage)** ships latent but covered by a synthetic
two-family regression test.
- **AC4 (seeded-rank invariant)** preserved end-to-end by every `buildLayout.test.ts`
case from #689.
- **AC5 (spec ↔ code geometry)** reconciled in commit 1.
### Deferred with revisit triggers
- **AC3 — Unseeded loose spouse with parents-in-graph.** Database verification:
0 of 942 unseeded persons match the predicate today. Structurally, every
realistic case maps to a **curation/import gap** (P's parents were imported
with `generation`, P themselves was not) and belongs in the canonical import
sheet rather than `buildLayout`. **Revisit trigger:** first canonical fixture
containing a parented unseeded spouse — at which point this ADR is updated
in place or superseded by an ADR-027. **Reproducible verification query**
(PostgreSQL — paste into a read-only psql session against
`familienarchiv_archive`):
```sql
-- AC3 reachability probe. Returns one row per unseeded person who has at
-- least one parent edge whose parent IS seeded. A non-zero count means the
-- AC3 layout branch becomes reachable for that person and ADR-026 should
-- be revisited. Last run May 2026: 0 rows.
SELECT p.id, p.display_name
FROM persons p
WHERE p.generation IS NULL
AND EXISTS (
SELECT 1
FROM person_relationships r
JOIN persons parent ON parent.id = r.person_id
WHERE r.relation_type = 'PARENT_OF'
AND r.related_person_id = p.id
AND parent.generation IS NOT NULL
);
```
The same predicate is encoded as a unit-testable JavaScript function — see
`findAc3Candidates()` in
`frontend/src/lib/person/genealogy/__fixtures__/findAc3Candidates.mjs`,
asserted against the committed canonical fixture by
`validateFixture.test.ts`, and emitted as a stderr soft-warn by
`frontend/scripts/capture-network-fixture.mjs` on every recapture. The SQL
is the source-of-truth probe against live data; the function is the
capture-time and fixture-time signal that the predicate's count crossed
zero.
- **AC6 — Bundle-impact gate (≤ 40 kB gzipped on `/stammbaum`).** Moot under
this ADR; reactivates only under ADR-027 (dagre adoption).
- **AC7 — Visual regression at 320 / 768 / 1440.** `toHaveScreenshot()`
permanently dropped (high running cost, speculative coverage). The axe-core
3:1 contrast check for the enlarged marriage dot is verified one-shot at
PR time, not committed; the permanent contrast/breakpoint gate lands with
#692 (mobile pan/zoom epic) alongside the breakpoint visual-regression
infrastructure.
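
For reference, the AC3 predicate from the SQL probe above (an unseeded person with at least one seeded parent) can be sketched over an in-memory fixture. This is not the committed `findAc3Candidates()`; the function name, the `PersonNode` and `RelationshipEdge` shapes, and the field names are assumptions that mirror the SQL columns.

```typescript
// Hypothetical fixture shapes mirroring persons / person_relationships.
interface PersonNode {
  id: string;
  displayName: string;
  generation: number | null; // null = unseeded
}

interface RelationshipEdge {
  personId: string; // the parent, for PARENT_OF edges
  relatedPersonId: string; // the child, for PARENT_OF edges
  relationType: string;
}

// Returns every unseeded person who has at least one seeded parent.
function findUnseededWithSeededParent(
  persons: readonly PersonNode[],
  edges: readonly RelationshipEdge[]
): PersonNode[] {
  const byId = new Map<string, PersonNode>();
  for (const p of persons) {
    byId.set(p.id, p);
  }

  const matches = new Set<string>();
  for (const edge of edges) {
    if (edge.relationType !== 'PARENT_OF') continue;
    const parent = byId.get(edge.personId);
    const child = byId.get(edge.relatedPersonId);
    if (parent === undefined || child === undefined) continue;
    if (child.generation === null && parent.generation !== null) {
      matches.add(child.id);
    }
  }
  // Preserve the input order of persons in the result.
  return persons.filter((p) => matches.has(p.id));
}
```

An empty result corresponds to the "0 rows" outcome of the SQL probe; any non-empty result is the signal that the AC3 branch has become reachable.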
### UX-signal-only stop trigger for dagre adoption
There is **no LoC cap** on the in-house path. The only divergence signal that
warrants reopening the dagre decision is a **UX failure against the canonical
fixture** — specifically, Albert de Gruyter's 4 marriages failing the read
test ("can a 67-year-old researcher unambiguously see all four spouses?").
If that ever happens, Felix posts a divergence-evidence comment on #361 (or
the equivalent successor issue), the team re-runs brainstorming with the
dagre option on the table, and adoption proceeds under the supply-chain
controls already documented in #361's body (`@dagrejs/dagre` exact-pinned, no
auto-merge, try/catch fallback with structured log, deterministic input sort).
### Operational
- **No CI, image, or compose changes.** Pure frontend layout work; standard
frontend rebuild covers the deploy.
- **No service topology changes.** No new env vars, ports, resource limits.
---
## Notes
- `frontend/scripts/capture-network-fixture.mjs` is a **local-only developer
utility**, never invoked from CI. Re-run intentionally; commit the resulting
JSON in one atomic commit when a new structural case appears (new edge type,
new marriage configuration, new generation range).
- The canonical fixture contains real family names. Repository is private;
scrubbing is a single-commit migration if it ever opens.
- Brand-mint enforcement on SVG strokes (Leonie's "all connectors render in
brand-navy, hierarchy comes from shape") stays a **code-review check at PR
time**. No CI grep, no custom ESLint rule.
- **Revisit cadence.** Re-evaluate dagre adoption on the first canonical
fixture refresh that hits AC3, OR by 2027-05-01 at the latest. Owner: Felix
Brandt.


@@ -93,7 +93,7 @@ C4Component
### 3b — Document Management & Import
Document management, file storage, and the canonical import.
Document management, file storage, and bulk Excel/ODS import.
```mermaid
C4Component
@@ -105,11 +105,12 @@ C4Component
System_Boundary(backend, "API Backend (Spring Boot)") {
Component(docCtrl, "DocumentController", "Spring MVC — /api/documents", "CRUD for documents: search, get by ID, update metadata, upload/download file, conversation thread, and batch metadata updates.")
Component(adminCtrl, "AdminController", "Spring MVC — /api/admin", "Triggers the asynchronous canonical import (requires ADMIN permission). Reports import state via GET /api/admin/import-status (IDLE/RUNNING/DONE/FAILED).")
Component(adminCtrl, "AdminController", "Spring MVC — /api/admin", "Triggers asynchronous Excel/ODS mass import (requires ADMIN permission). Reports import state (IDLE/RUNNING/DONE/FAILED).")
Component(docSvc, "DocumentService", "Spring Service", "Core document business logic: store, update, search. Resolves persons and tags, delegates file I/O to FileService, builds dynamic JPA Specifications, and integrates with audit logging.")
Component(fileSvc, "FileService", "Spring Service", "Wraps AWS SDK v2 S3Client. Uploads files with UUID-keyed paths, computes SHA-256 hash, downloads with content-type detection, and generates presigned URLs for OCR access.")
Component(importOrch, "CanonicalImportOrchestrator", "Spring Service — @Async", "Runs four idempotent loaders (TagTree → PersonRegister → PersonTree → Document) in a fixed DAG over the normalizer's committed canonical artifacts (canonical-*.xlsx + canonical-persons-tree.json) from /import — see diagram 3b. Owns the IDLE/RUNNING/DONE/FAILED state machine.")
Component(massImport, "MassImportService", "Spring Service — @Async", "Reads Excel/ODS files from /import mount. Tracks import state (IDLE/RUNNING/DONE/FAILED) and delegates to ExcelService. Returns immediately; processing runs asynchronously.")
Component(excelSvc, "ExcelService", "Spring Service", "Parses Excel/ODS workbooks (Apache POI). Column indices configurable via application.properties. Creates/updates document records per row.")
Component(minioConf, "MinioConfig", "Spring @Configuration", "Creates the S3Client and S3Presigner beans with path-style access for MinIO. Validates MinIO connectivity on startup.")
Component(docRepo, "DocumentRepository", "Spring Data JPA", "Queries documents with Specification-based dynamic search, bidirectional conversation thread queries, full-text search with ranking and match highlighting, and transcription pipeline queue projections.")
@@ -122,15 +123,14 @@ C4Component
Rel(frontend, docCtrl, "Document requests", "HTTP / JSON")
Rel(frontend, adminCtrl, "Trigger import", "HTTP / JSON")
Rel(docCtrl, docSvc, "Delegates to", "")
Rel(adminCtrl, importOrch, "Triggers", "")
Rel(adminCtrl, massImport, "Triggers", "")
Rel(docSvc, fileSvc, "Upload / download files", "")
Rel(docSvc, docRepo, "Reads / writes documents", "")
Rel(docSvc, docSpec, "Builds search predicates", "")
Rel(docSvc, personSvc, "Resolves sender / receivers", "")
Rel(docSvc, tagSvc, "Finds or creates tags", "")
Rel(importOrch, docSvc, "Upserts documents (PDF by index) — see 3b", "")
Rel(importOrch, personSvc, "Upserts persons + relationships", "")
Rel(importOrch, tagSvc, "Upserts tag hierarchy", "")
Rel(massImport, excelSvc, "Parses Excel/ODS file", "")
Rel(excelSvc, docSvc, "Creates / updates documents", "")
Rel(minioConf, fileSvc, "Provides S3Client and S3Presigner beans", "")
Rel(fileSvc, minio, "PUT / GET / presigned URL objects", "S3 API / HTTP")
Rel(docRepo, db, "SQL queries", "JDBC")
@@ -492,7 +492,7 @@ C4Component
Component(adminGroups, "/admin/groups, /admin/groups/[id], /admin/groups/new", "SvelteKit Routes", "Permission group management: create/edit groups and their permission sets.")
Component(adminTags, "/admin/tags and /admin/tags/[id]", "SvelteKit Routes", "Tag administration: edit tag hierarchy, merge tags, delete subtrees.")
Component(adminOcr, "/admin/ocr and /admin/ocr/[personId]", "SvelteKit Routes", "Global and per-person OCR configuration. Manages script types and triggers sender model training.")
Component(adminSystem, "/admin/system", "SvelteKit Route", "System status panel. Triggers the canonical import (POST /api/admin/trigger-import). Displays import state.")
Component(adminSystem, "/admin/system", "SvelteKit Route", "System status panel. Triggers Excel/ODS mass import (POST /api/admin/trigger-import). Displays import state.")
Component(hilfe, "/hilfe/transkription", "SvelteKit Route", "Static transcription style guide for Kurrent and Sütterlin character recognition. No backend calls.")
}


@@ -43,12 +43,9 @@ Rel(ocr, storage, "Fetches PDF via presigned URL", "HTTP / S3 presigned")
Rel(mc, storage, "Bootstraps bucket + service account on startup", "MinIO Client CLI")
Rel(promtail, loki, "Pushes log streams", "HTTP/Loki push API")
Rel(backend, tempo, "Sends distributed traces via OTLP", "HTTP / OTLP / port 4318 (archiv-net)")
Rel(prometheus, backend, "Scrapes JVM + HTTP metrics", "HTTP 8081 /actuator/prometheus")
Rel(prometheus, ocr, "Scrapes OCR + http_* metrics", "HTTP 8000 /metrics")
Rel(grafana, prometheus, "Queries metrics", "HTTP 9090")
Rel(grafana, loki, "Queries logs", "HTTP 3100")
Rel(grafana, tempo, "Queries traces", "HTTP 3200")
Rel(grafana, db, "Read-only dashboard queries via grafana_reader role", "PostgreSQL / archiv-net")
Rel(glitchtip, db, "Stores error events in glitchtip DB", "PostgreSQL / archiv-net")
Rel(obs_glitchtip_worker, obs_redis, "Processes Celery tasks", "Redis / obs-net")
