Commit Graph

775 Commits

Author SHA1 Message Date
Marcel
61ca5a6e40 test(person): tighten generation null-clear coverage (#689)
Sara's QA concerns:

1. PersonControllerTest.updatePerson_returns200_whenGenerationNull was
   asymmetric — only checked status 200, no body assertion. Now also
   asserts `$.generation` is null in the JSON response, mirroring the
   in-range test's body check.

2. New full-stack PUT→DB→GET round-trip in PersonServiceIntegrationTest
   (updatePerson_clearGenerationToNull_readsBackNullFromDb) seeds a
   person with generation=3, calls updatePerson with generation=null,
   flushes the persistence context, and asserts the column reads back
   null from the DB. Without this we only had the mocked WebMvcTest
   boundary; nothing proved JPA actually wrote SQL NULL.

3. Sibling test (updatePerson_setGenerationToZero_readsBackZeroFromDb)
   pins the G 0 end-to-end so a primitive zero can't silently coerce
   to null anywhere along controller → service → JPA.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 16:19:13 +02:00
Marcel
516a0a3814 refactor(person): single source of truth for generation bounds (#689)
Markus flagged the 0/10 range was duplicated across five sites (DB
CHECK, both importers, DTO @Min/@Max, dropdown range). New
PersonGeneration.MIN_GENERATION / MAX_GENERATION constants are now
the canonical Java source; the DTO annotations and both importer
guards reference them. The V70 SQL CHECK comment now points at the
Java constants so future widening updates one Java class plus one
SQL literal (Flyway forbids rewriting the migration in place).

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 16:16:26 +02:00
Marcel
8f163f9b77 feat(import): warn on generation monotonicity violations (#689)
Inject RelationshipService into CanonicalImportOrchestrator and walk
PARENT_OF edges in the family network after both person loaders finish
(before documents). For every edge where child.generation is set and
not strictly deeper than parent.generation, log a WARN — soft check,
never fails the batch.

Reads through getFamilyNetwork() per the layering rule (orchestrator
never touches PersonRelationshipRepository directly). Curators see the
warning in the import log; the rest of the pipeline is unaffected so
data with curatorial gaps still loads cleanly.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:39:29 +02:00
Marcel
40511535eb feat(relationship): add generation to PersonNodeDTO + update all sites (#689)
PersonNodeDTO is a positional record. The optional Integer generation
field is inserted between deathYear and familyMember so all four
construction sites stay readable without a builder.

- RelationshipService.getFamilyNetwork → populates with
  person.getGeneration() (the Stammbaum's strict-rank source on the
  frontend).
- RelationshipInferenceService.findAllFor → populates the same way;
  inference UI does not consume it but the field travels along for
  consistency.
- RelationshipControllerTest fixtures pass null.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:35:40 +02:00
Marcel
a68a822c13 feat(import): pass generation from JSON in PersonTreeImporter (#689)
Reads the optional `generation` integer from the canonical tree JSON and
routes it into PersonUpsertCommand. Out-of-range values are skip-and-
warned with the same policy as the register importer.

Tree imports run after register (per CanonicalImportOrchestrator); a
tree-confirmed integer overwrites a register-parsed value — both sides
are "canonical" in preferHuman terms (neither is a human edit).

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:32:27 +02:00
Marcel
df0037cba2 feat(import): parse generation column in PersonRegisterImporter (#689)
Reads the optional `generation` cell by header name (REQUIRED_HEADERS is
not extended — REQ-IMP-001 backward-compat for older artifacts), parses
it through GENERATION_PATTERN (^\s*G?\s*(-?\d+)), and routes it into
PersonUpsertCommand.generation.

Out-of-range values (G 99, G -1) are skip-and-warned, never abort the
batch; the post-parse range guard mirrors the V70 CHECK constraint so
the DB never sees a value Bean Validation wouldn't accept.

Pinned with a parametrised CsvSource covering every shape from the
acceptance criteria plus a backward-compat test (artifact without a
generation column still imports, all upserts get generation=null).

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:30:31 +02:00
Marcel
dcb5585c64 feat(person): route generation through service write paths (#689)
- fromCanonical writes the imported generation into a new Person row.
- mergeCanonical routes existing/canonical generation through the
  existing preferHuman(Integer, Integer) overload so a human-edited
  value is never overwritten on re-import (ADR-025).
- updatePerson writes generation verbatim from the form DTO so a human
  can clear it back to null — same shape as birthYear/deathYear.
- createPerson(PersonUpdateDTO) writes generation so /persons/new flow
  doesn't silently drop a selected G value on create.

Pinned with five tests covering the four write paths plus the
documenting test that captures preferHuman's known limitation
(explicit human null is overwritten by a non-null canonical value —
same as birthYear/deathYear, deferred to a future helper rework if it
ever produces a user-visible bug).

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:27:11 +02:00
Marcel
1e77d6d98c feat(person): generation on PersonUpsertCommand + PersonUpdateDTO (#689)
Adds the optional generation field to both DTOs:

- PersonUpsertCommand gains Integer generation in the canonical-import
  builder chain; service wiring lands in the next commit.
- PersonUpdateDTO gains @Min(0)@Max(10) Integer generation, the form-path
  surface. The constraints mirror the V70 CHECK so validation fails fast
  at the controller before reaching the DB.

PersonControllerTest pins the validation behaviour: -1 → 400, 11 → 400,
null → 200, 3 → 200 for both PUT (update) and POST (create) paths. The
GlobalExceptionHandler maps MethodArgumentNotValidException to
VALIDATION_ERROR so the frontend's extractErrorCode keeps working.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:23:38 +02:00
Marcel
f22508ca91 feat(person): add nullable generation column to persons (#689)
Flyway V70: SMALLINT generation column with CHECK(0..10) and partial
index over non-null rows. Person.generation field surfaces it through
the JPA model. Pre-import rows and persons outside the curated family
graph legitimately stay null; the canonical importer (next commits)
back-fills via preferHuman so a human-edited value is never lost.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:20:24 +02:00
Marcel
8f3c799b8f test(relationship): reset family_member flag in setFamilyMember network test
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3m2s
CI / OCR Service Tests (pull_request) Successful in 24s
CI / Backend Unit Tests (pull_request) Successful in 3m56s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m4s
addRelationship now auto-flips family_member=true on both endpoints for
PARENT_OF/SPOUSE_OF/SIBLING_OF (commit 07300aef). That side-effect breaks
the pre-condition assertion in setFamilyMember_true_makes_person_appear_in_network,
which expects charlie not to appear in the network before the explicit flip.
Reset charlie's flag after addRelationship so the test still exercises the
setFamilyMember(true) -> network presence path it was written for.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 11:38:48 +02:00
Marcel
4cc725d546 refactor(importing): inject FileStreamOpener to remove test-only seam
DocumentImporter exposed a package-private openFileStream(File) so a
Mockito spy could force the IO-error branch of isPdfMagicBytes. The
test-only seam leaked into production: the method existed for testing,
not for any production extensibility.

Replace with a constructor-injected FileStreamOpener interface (single
abstract method, @FunctionalInterface) and a one-line
@Component DefaultFileStreamOpener delegate. Tests now inject a mock
opener instead of spying on the importer itself, which is also a more
idiomatic Mockito usage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:29:41 +02:00
Marcel
535594378a fix(importing): use receiver_names for provisional person display name
resolveReceivers passed the slug as both `sourceRef` AND `lastName`, so
an unresolved receiver "smith-john" became a provisional Person with
lastName="smith-john" — a regression of the existing senderName→Person
contract.

Fix: zip the parallel `receiver_person_ids` and `receiver_names`
columns by position (the normalizer emits them 1:1 like
sender_person_id/sender_name). When the names list is shorter than the
slugs list, fall back to slug-as-name for the missing entries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:26:28 +02:00
Marcel
e93b09f1e2 refactor(importing): split DocumentImporter.buildDocument into named applyX helpers
buildDocument was a ~30-line method mixing attribution routing, date
parsing, authoritative collection management, file metadata, and
computed flags. Split into five named helpers — applyAttribution,
applyDates, applyAuthoritativeAssociations, applyFileMetadata,
applyComputedFlags — each doing one job. Pure refactor; all 43 existing
DocumentImporterTest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:23:24 +02:00
Marcel
07300aeff7 fix(person): flip family_member on both endpoints when a family-graph relationship is added
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m39s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Failing after 3m45s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m4s
The canonical importer creates persons via PersonRegisterImporter first (no family_member
set) and then upserts them via PersonTreeImporter, but mergeCanonical never propagates
family_member to existing persons — so persons with imported relationships ended up
flagged family_member=false and never appeared in /api/persons family filters or the
family-network view.

RelationshipService is documented as the owner of the family_member flag, so the fix
lives there: addRelationship now sets family_member=true on both endpoints whenever the
relation type is PARENT_OF / SPOUSE_OF / SIBLING_OF (the same set getFamilyNetwork
filters by). Non-family types (FRIEND/COLLEAGUE/EMPLOYER/DOCTOR/NEIGHBOR/OTHER) leave
the flag alone — a family doctor isn't a family member. Extracted the type list as a
FAMILY_RELATION_TYPES constant and reused it in getFamilyNetwork for a single source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 09:15:37 +02:00
Marcel
9d9cd644ec Merge remote-tracking branch 'origin/main' into HEAD
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m30s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m46s
CI / fail2ban Regex (pull_request) Failing after 46s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
# Conflicts:
#	frontend/src/lib/shared/dashboard/ReaderRecentDocs.svelte.spec.ts
#	frontend/src/routes/+page.server.ts
2026-05-27 22:16:26 +02:00
Marcel
f5e2241fe0 test(importing): pin regex reject-boundary + note untestable IO branch
Address PR #687 review concerns on DocumentImporterTest:
- Sara/Felix: add catalog-shape reject tests that pass every char
  pre-check but must fail INDEX_PATTERN — "J 0070" (space), "WXYZA-0001"
  (5 letters), "12-0001" (no letter prefix), "W-0001X" (uppercase X).
  Verified red against a weakened pattern, green against the real one,
  so the pattern branch (not the char guards) is now pinned.
- Felix: restore the import java.io.OutputStream line (was over-deleted
  and patched with a fully-qualified name).
- Sara: document why the resolvePdfByIndex getCanonicalPath IOException
  branch is intentionally left uncovered (no deterministic injection
  seam; the log.warn is the substantive fix).

Adjust the two reflective resolvePdfByIndex calls for the new rowNumber
parameter.

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
f96b9fbffc feat(importing): log import-row breadcrumbs and distinguish skip outcomes
Address PR #687 review concerns on DocumentImporter:
- Tobias: thread a 1-based source row number into importRow so the
  "index rejected" skip log carries a breadcrumb (the row number, never
  the raw hostile index) for post-import triage.
- Elicit: emit a distinct log when a valid index has no <index>.pdf on
  disk (normal PLACEHOLDER) so it is not conflated with a rejected index.
- Nora: add a log.warn in resolvePdfByIndex's getCanonicalPath IOException
  branch so the quiet fail-safe skip surfaces in ops, distinct from the
  deliberate symlink-escape abort.
- Felix: replace inline fully-qualified java.util.regex.Pattern with an
  import.
- Nora: document that \d is intentionally ASCII-only (do not add
  UNICODE_CHARACTER_CLASS).

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
f5eb227239 feat(importing): resolve import PDFs directly by index
The corpus is uniform — every PDF is <index>.pdf flat in the import
dir — so resolve a document's PDF with an O(1) importDir.resolve(index
+ ".pdf") lookup instead of a recursive directory walk over the file
column. The index is validated against a strict catalog pattern
(1–4 Latin letters incl. umlauts, hyphen(s), digits, optional x) plus
the ported separator/dot/dotdot/null/slash-homoglyph/absolute-path
guards, and the resolved canonical path is asserted to stay inside the
import dir as defense-in-depth. The %PDF magic-byte check still gates
upload; status UPLOADED/PLACEHOLDER and the index→originalFilename
upsert key are unchanged. The file column and findFileRecursive walk
are gone, and the security regression tests now assert a malicious or
garbage index is rejected and a valid index resolves to exactly
importDir/<index>.pdf within containment.

Closes #686
Closes #676

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
7183d15fe5 fix(document): restore pure-text-relevance FTS fast path past undated count
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m29s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m52s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
The global undated-count rework moved the pure-text-RELEVANCE shortcut
into runSearch, where it ran after the unconditional
findAllMatchingIdsByFts call. That routed pure-text relevance through the
in-memory id path and returned empty match data, breaking FTS rank order
and snippet/offset enrichment.

Hoist the shortcut back to the top of searchDocuments so it short-circuits
to findFtsPageRaw before findAllMatchingIdsByFts, while still computing the
global undatedCount for all non-fast-path searches.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 21:04:48 +02:00
Marcel
b52bf60913 fix(document): tie-break equal-date DATE sort by title asc, not createdAt
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3m2s
CI / OCR Service Tests (pull_request) Successful in 24s
CI / Backend Unit Tests (pull_request) Failing after 3m54s
CI / fail2ban Regex (pull_request) Successful in 47s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
Owner decision (#668): when two documents share a meta_date, order them by
title ascending instead of createdAt ascending. title is @Column(nullable=false)
so it is always present, giving a deterministic, human-meaningful total order.
Only the DATE-sort fast path changes; the in-memory SENDER/RECEIVER/RELEVANCE
comparators are untouched.

ORDER BY meta_date <dir> NULLS LAST, title ASC

Tests assert title-asc tiebreaking for same-date rows in BOTH directions, with a
fixture whose title order is the OPPOSITE of insertion (createdAt) order so the
test fails if the tiebreaker reverts to createdAt. The integration test drives
the production resolveSort against real Postgres.

Refs #668
2026-05-27 20:21:18 +02:00
Marcel
a3c3f14aea feat(documents): return global undated count in search response
The undated bucket count was page-local — derived from the year-grouping
of the current page's items, so it could never exceed the page size. The
owner's decision is for it to reflect ALL undated documents matching the
active filter across every page.

Add an undatedCount field to DocumentSearchResult, computed once per search
via a COUNT over the same filter spec with undatedOnly(true) forced —
independent of the "Nur undatierte" toggle so it never collapses to the
page slice or double-counts. A from/to range excludes undated rows by the
collision rule, so the count is legitimately 0 inside a date range.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:42:32 +02:00
Marcel
caec92e7de test(document): lock undated-stays-in-sender-group with ordered multi-sender assertions
Replace the single-sender containsExactlyInAnyOrder check with a two-sender
fixture and ordered containsExactly proving an undated doc stays within its
sender group and never floats to the page head. Add a DESC-direction case for
in-memory-path symmetry and an undated=true + sort=SENDER case capturing the
Specification to prove undatedOnly is still applied on the person-sort path.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:06:33 +02:00
Marcel
eacfd15f8e refactor(document): revert resolveSort to private
No test calls resolveSort directly — the sort tests assert through
searchDocuments + ArgumentCaptor<Pageable>, so the package-private widening
added no value. Narrow the API surface back to private.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:06:16 +02:00
Marcel
268c31a49b feat(document): thread an undated filter through search and the /ids path
Adds an optional `undated` query param to GET /api/documents/search and
/api/documents/ids, threaded through searchDocuments and findIdsForFilter
into the shared buildSearchSpec via undatedOnly(boolean). undated=true also
bypasses the pure-text RELEVANCE SQL shortcut, which skips buildSearchSpec
and would otherwise drop the predicate. The read GET stays unguarded
(WebMvc authz test pins 200 for an authenticated user, 401 unauthenticated).
A locking test proves the in-memory SENDER sort keeps undated letters under
their sender.

Refs #668
2026-05-27 18:42:17 +02:00
Marcel
39a462b2bb feat(document): add undatedOnly Specification for the undated-only filter
undatedOnly(false) is a no-op (null predicate); undatedOnly(true) returns
documentDate IS NULL, matching the existing hasStatus null-as-no-op pattern.
Real-Postgres tests pin the load-bearing guarantees H2 cannot prove: ASC
NULLS-LAST ordering, BETWEEN excludes null-dated rows, and that undated=true
combined with a from/to range returns empty (the collision rule).

Refs #668
2026-05-27 18:34:10 +02:00
Marcel
5f2ef823e1 fix(document): order undated documents last on the DATE sort fast path
resolveSort produced Sort.by(direction, "documentDate") with NATIVE null
handling, so Postgres surfaced undated (null meta_date) documents FIRST on
an ASC sort. Apply nullsLast() so undated rows order last for both ASC and
DESC, with a createdAt-asc tiebreaker for a stable total order when every
row is null-dated (the upcoming "Nur undatierte" filter).

Refs #668
2026-05-27 18:31:40 +02:00
Marcel
362672cdbf test(person): pin query count-parity and delete FK-detach ordering
Add countByFilter parity coverage for the query (LIKE) path so the shared
FILTER_WHERE slice and count can't drift, and an integration test proving
deletePerson detaches a person referenced as both sender and receiver before
delete — the documents survive (sender nulled, receiver link removed) with no
FK orphan.

Refs #667

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 14:19:06 +02:00
Marcel
1e3e420860 fix(person): report honest totals on the non-paged top-N persons path
The legacy sort=documentCount path wrapped its result with paged(top, 0,
safeSize, top.size()), so totalElements/pageSize looked like a paged slice of
a larger set when in fact the top-N query returns the complete result. Add a
dedicated PersonSearchResult.topN factory that reports reality — totalElements
= returned count, pageSize = that count, totalPages = 1 (0 when empty) — and
pin both the populated and empty semantics with controller tests.

Refs #667

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 14:19:00 +02:00
Marcel
529c92fcc3 feat(person): paginate GET /api/persons and add confirm/delete endpoints
GET /api/persons now returns PersonSearchResult with server-side filter params
(type, familyOnly, hasDocuments, provisional) and page/size bounds (@Min/@Max
-> 400). review=true drops the clean reader default. The legacy
sort=documentCount top-N path is folded into the paged contract. Add
PATCH /{id}/confirm and DELETE /{id}, both WRITE_ALL-guarded. Remove the now
unreachable PersonService.findAll(String).

BREAKING-CHANGE: GET /api/persons response shape changes from a bare list to
PersonSearchResult { items, totalElements, pageNumber, pageSize, totalPages }.

Refs #667

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 13:33:10 +02:00
Marcel
ec357ac13c feat(person): add paged search, confirm and delete to PersonService
PersonService.search maps a PersonFilter to the paired slice/count repository
queries and returns a PersonSearchResult with a server-side total. confirmPerson
clears the provisional flag (the state transition behind PATCH /confirm).
deletePerson detaches sender/receiver document references before the hard delete
so it cannot orphan an FK.

Refs #667

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 13:30:14 +02:00
Marcel
a24764e58a feat(person): add filter-aware paged repository queries
Add PersonSearchResult (mirrors DocumentSearchResult shape) and PersonFilter
records, plus paired findByFilter/countByFilter native queries sharing one
WHERE clause so the rendered page and totalElements can never drift. Filters
(type, familyOnly, hasDocuments, provisional, readerDefault, q) each disable
via a null/false param. Tested against real Postgres via Testcontainers.

Refs #667

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 13:27:39 +02:00
Marcel
f99673321c test(dates): pin edit-form precision field binding to DocumentUpdateDTO
@WebMvcTest multipart PUT asserting metaDatePrecision / metaDateEnd /
metaDateRaw form field names bind to the DTO. A rename on either side
silently drops the precision edit; the captured DTO catches it.

Refs #666
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 12:36:51 +02:00
Marcel
728078f1e5 fix(dates): preserve stored date precision when edit omits it
updateDocument unconditionally set metaDatePrecision/End/Raw from the DTO,
so saving an unrelated edit (a multipart PUT where the form omits the
precision controls) clobbered the stored precision with null — fabricating
a precision the user never chose. Apply each field only when the DTO carries
it, mirroring the existing metadataComplete/scriptType guards.

Refs #666
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 12:34:58 +02:00
Marcel
7245571ea8 feat(document): edit document date precision, end and raw
Adds the edit-form date-precision controls to WhoWhenSection: a labelled
precision <select> (min 48px touch target for senior authors), a conditionally
revealed end-date field (only for RANGE, announced via aria-live=polite), and
the verbatim raw cell as labelled read-only static text (not a disabled input).
Fields submit as metaDatePrecision/metaDateEnd/metaDateRaw and flow through the
existing PUT form action.

Backend: DocumentService.updateDocument now persists the three DTO fields (they
existed since #671 but were never applied), so the new controls are real, not
decorative — addresses Nora's "a client <select> constrains nothing" note for
the persistence half. Server-side enum/end>=start validation remains #671's
scope.

Refs #666

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 12:04:14 +02:00
Marcel
c816934391 feat(importing): build honest precision-aware document import titles
Wires DocumentTitleFormatter into DocumentImporter.buildDocument: the title
now reads "{index} – {honest date label} – {location}", so a MONTH-precision
letter's title says "Juni 1916" instead of a fabricated "1. Juni 1916", and an
UNKNOWN-date row keeps a bare index title. buildTitle stays under 20 lines by
delegating to the shared formatter (single source of truth with the UI label).

Restores the date+location title behavior that the old MassImportService had
(it appended a full GERMAN_DATE day) but now at the honest precision.

Refs #666

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 11:47:51 +02:00
Marcel
1caae38946 feat(importing): add precision-aware DocumentTitleFormatter
Adds the Java half of the honest date label — formatTitleDate(date,
precision, end, raw) — mirroring the frontend formatDocumentDate rules so an
import title never shows a precision the data lacks (MONTH → "Juni 1916", not
a fabricated day). Both implementations are pinned to the shared
docs/date-label-fixtures.json table, which this test asserts case-by-case, so
they cannot drift. Java's de CLDR renders the same "Jan."/"Dez." abbreviations
and en-dash the TS side produces.

Refs #666

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 11:45:57 +02:00
Marcel
151d6aa03f test(importing): clean up committed rows after CanonicalImportIntegrationTest
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m41s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 3m34s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
The canonical importer commits through its own transactions, so this test
cannot use @Transactional rollback for isolation. Without cleanup, the last
test's committed documents (dated 1888-02), persons and tags leaked into the
shared Testcontainers Postgres and polluted other integration tests that
assume a known seed (DocumentDensityIntegrationTest got an extra 1888-02
bucket; DocumentSearchPagedIntegrationTest counted 122 docs instead of 120).

Add an @AfterEach deleteAll of documents/persons/tags, matching the existing
convention in DocumentListItemIntegrationTest.

Refs #669
2026-05-27 11:09:21 +02:00
Marcel
e9ddaed76a refactor(person): unify fill-blank under preferHuman and clarify rowId trap
Unify birthYear/deathYear fill-blank logic under an Integer preferHuman overload so
every canonical field uses one self-documenting precedence idiom, and add a guard
test pinning year fill-blank vs human-edit preservation. Add a comment in
PersonTreeImporter.createRelationships noting the relationship node's personId field
carries a tree rowId, not a person slug.

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 11:03:56 +02:00
Marcel
5f53c3670f test(importing): verify re-import pruning and provisional precedence on real Postgres
Add a Testcontainers test that re-imports a document with a receiver and a tag
removed from the canonical row and asserts both links are pruned. Add a test that a
register person referenced by a document row is never flipped to provisional,
regardless of re-import, since the orchestrator loads the register/tree before
documents and the monotonic-downward guard prevents a flip. Pin that cross-loader
precedence in a mergeCanonical comment.

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 11:02:37 +02:00
Marcel
7ebf7acd72 test(importing): pin relationship error propagation and short-row reads
Add a negative test that an unexpected DomainException from
addRelationshipIdempotently propagates rather than being swallowed (only
DUPLICATE/CIRCULAR are caught for idempotency), guarding against a future
swallow-all refactor. Add a CanonicalSheetReader test for a row narrower than
the header (POI omits trailing empty cells) reading absent columns as "".

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:59:52 +02:00
Marcel
2f7ea37466 fix(importing): make document receivers/tags canonical-authoritative on re-import
The DocumentImporter accumulated receivers/tags via addAll without pruning, so a
shrunk canonical row left stale links on a re-imported PLACEHOLDER document. Clear
the collections before re-populating so the canonical row is authoritative: a removed
receiver/tag is now pruned. Raw sender_text/receiver_text retention is unchanged.

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:58:57 +02:00
Marcel
21c85ff081 docs(importing): document the canonical importer rebuild
- ADR-025: add decision 3 (four idempotent loaders over canonical artifacts;
  raw spreadsheet no longer parsed by Java) with the settled Option-A name
  policy, human-edit-preserve precedence, provisional contract, and ported
  security guards.
- l3-backend-3b diagram: replace MassImportService/ExcelService with the
  orchestrator, the four loaders, and CanonicalSheetReader, with the loader
  dependency edges.
- GLOSSARY: Canonical import / canonical artifact / CanonicalSheetReader terms;
  refresh SkippedFile (new INVALID_FILENAME_PATH_TRAVERSAL reason, index key).
- DEPLOYMENT §6: canonical-artifact prerequisite runbook (run normalizer →
  place four artifacts → trigger import); note idempotent re-run.
- CLAUDE.md (root + backend): importing/ package now lists the orchestrator +
  loaders + CanonicalSheetReader.

OpenAPI: no generate:api needed — the ImportStatus/SkippedFile generated
schemas already match the new types byte-for-byte (same fields + SkipReason
enum), so the API surface is unchanged.

Closes #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:44:45 +02:00
Marcel
9cc682cf72 test(importing): Testcontainers idempotency + human-edit-preserve IT
Full-stack integration test on real postgres:16-alpine (the UNIQUE(source_ref)
+ upsert-on-conflict only exist in real Postgres, never H2). Writes a
synthetic-but-real four-artifact set, runs the import twice, and asserts
person/tag/document counts are identical on re-import (no duplicates), plus
the Resolved-decision-#1 precedence: a person field edited in-app survives a
re-import. Also asserts register-first sender linkage with raw-text retention
and the provisional contract.

Fixes a re-import bug the IT surfaced: load() is now @Transactional so an
existing document's lazy receivers collection initialises within the session
(the previous self-invoked @Transactional on the per-row method never opened
a transaction). PersonTreeImporter owns its ObjectMapper rather than
depending on the web bean, which is absent in a NONE web environment.

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:41:08 +02:00
Marcel
459ba14207 feat(importing): add orchestrator, wire admin, retire raw-spreadsheet path
CanonicalImportOrchestrator runs the four loaders in an explicit dependency
DAG (TagTree -> PersonRegister -> PersonTree -> Document), owns the async
runner + ImportStatus state machine the admin UI consumes, smoke-checks all
four artifacts are present before starting (fail-fast IMPORT_FAILED_ARTIFACT
rather than a half-run), and fails closed on a malformed artifact.

AdminController now depends on the orchestrator; the {state, statusCode,
processed, skippedFiles, skipped} response shape is unchanged so
ImportStatusCard.svelte keeps working.

Deletes the legacy MassImportService (positional @Value app.import.col.*,
ISO-only parseDate, Java name classification) and the ODS/XXE
XxeSafeXmlParser path now that the loaders cover them — the security guards
were ported to DocumentImporter first (previous commit). Replaces the
positional column config in application.yaml with the canonical artifact
directory.

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:36:28 +02:00
Marcel
c56ba6219c feat(importing): add DocumentImporter loader with ported security guards
Fourth canonical loader. Maps canonical-documents.xlsx by header name,
routes each attribution register-first by source_ref (provisional person
when a slug is unmatched), ALWAYS retains the raw sender_name/receiver_names
in sender_text/receiver_text, splits pipe-delimited receivers, parses clean
date_iso/date_precision/date_end/date_raw with no semantic logic, attaches
the tag by canonical tag_path, and keeps the S3 upload + thumbnail plumbing
in small resolveFile/uploadToS3/buildDocument methods. Documents upsert by
index (originalFilename); UPLOADED when a file resolves on disk, PLACEHOLDER
otherwise.

Security guards ported intact from MassImportService BEFORE retiring it:
isValidImportFilename (forward/back slash, three Unicode slash homoglyphs,
.., null byte, absolute path), findFileRecursive canonical-path containment
(symlink-escape), and the %PDF magic-byte check + FILE_READ_ERROR path. The
file column is treated as hostile input (CWE-22): its basename is validated
then resolved only inside importDir, so a traversal value cannot escape.

Extracts the verbatim ImportStatus/SkipReason/SkippedFile shape into its own
class so the admin UI contract is unchanged.

Assumption: the committed canonical-documents.xlsx carries no
sender_category/receiver_category columns (the issue's described schema) —
the normalizer already resolved Option-A routing into slugs + raw names, so
the loader routes by slug presence rather than a category enum.

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:33:17 +02:00
Marcel
cbf1984430 feat(importing): add PersonTreeImporter loader
Third canonical loader. Reads canonical-persons-tree.json, upserts tree
persons via PersonService keyed on the shared personId slug (#670 now
emits it into the tree, so the tree reconciles with the register rather
than duplicating it). Relationships are resolved from local rowIds to the
upserted person UUIDs and created via RelationshipService (never the
repository). A duplicate/circular relationship on re-import is swallowed
for idempotency; unresolved rowIds are skipped with a warning.

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:28:33 +02:00
Marcel
f6bfb8f030 feat(importing): add PersonRegisterImporter loader
Second canonical loader. Reads canonical-persons.xlsx by header name and
upserts each register person via PersonService.upsertBySourceRef keyed on
the normalizer person_id. provisional is driven by the sheet's clean
value; Boolean.parseBoolean handles the capitalised Python "True"/"False".
ISO birth/death dates are reduced to the year the Person entity stores.

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:27:12 +02:00
Marcel
bcd928f12d feat(importing): add TagTreeImporter loader
First of four canonical loaders. Reads canonical-tag-tree.xlsx by header
name, upserts each tag via TagService.upsertBySourceRef (never the
repository — layering rule), and resolves parent links by stripping the
last /segment of the canonical tag_path. Idempotent by source_ref.

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:26:05 +02:00
Marcel
3501382ff5 feat(tag): add upsertBySourceRef keyed on canonical tag_path
Idempotent tag upsert for the Phase-3 importer (ADR-025). source_ref is
the stable identity (the canonical tag_path); on re-import a
human-renamed tag name is preserved while the parent link is refreshed.

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:24:30 +02:00
Marcel
05dd824283 feat(person): add upsertBySourceRef with human-edit-preserve precedence
Idempotent person upsert keyed on the normalizer person_id (source_ref),
for the Phase-3 canonical importer. Re-import precedence (Resolved
decision #1): a non-blank existing field is never overwritten, blank
fields are filled from canonical, and provisional is monotonic — once a
human confirms a person (false) it never reverts to true. New
importer-created persons carry provisional=true; register persons false.

Maiden name is stored as a MAIDEN_NAME PersonNameAlias, matching the
existing findOrCreateByAlias behaviour.

Refs #669

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:23:28 +02:00