Commit Graph

3013 Commits

Author SHA1 Message Date
Marcel
a68a822c13 feat(import): pass generation from JSON in PersonTreeImporter (#689)
Reads the optional `generation` integer from the canonical tree JSON and
routes it into PersonUpsertCommand. Out-of-range values are skip-and-
warned with the same policy as the register importer.

Tree imports run after register (per CanonicalImportOrchestrator); a
tree-confirmed integer overwrites a register-parsed value — both sides
are "canonical" in preferHuman terms (neither is a human edit).

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:32:27 +02:00
Marcel
df0037cba2 feat(import): parse generation column in PersonRegisterImporter (#689)
Reads the optional `generation` cell by header name (REQUIRED_HEADERS is
not extended — REQ-IMP-001 backward-compat for older artifacts), parses
it through GENERATION_PATTERN (^\s*G?\s*(-?\d+)), and routes it into
PersonUpsertCommand.generation.

Out-of-range values (G 99, G -1) are skip-and-warned, never abort the
batch; the post-parse range guard mirrors the V70 CHECK constraint so
the DB never sees a value Bean Validation wouldn't accept.

Pinned with a parametrised CsvSource covering every shape from the
acceptance criteria plus a backward-compat test (artifact without a
generation column still imports, all upserts get generation=null).

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:30:31 +02:00
Marcel
dcb5585c64 feat(person): route generation through service write paths (#689)
- fromCanonical writes the imported generation into a new Person row.
- mergeCanonical routes existing/canonical generation through the
  existing preferHuman(Integer, Integer) overload so a human-edited
  value is never overwritten on re-import (ADR-025).
- updatePerson writes generation verbatim from the form DTO so a human
  can clear it back to null — same shape as birthYear/deathYear.
- createPerson(PersonUpdateDTO) writes generation so /persons/new flow
  doesn't silently drop a selected G value on create.

Pinned with five tests covering the four write paths plus the
documenting test that captures preferHuman's known limitation
(explicit human null is overwritten by a non-null canonical value —
same as birthYear/deathYear, deferred to a future helper rework if it
ever produces a user-visible bug).

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:27:11 +02:00
Marcel
1e77d6d98c feat(person): generation on PersonUpsertCommand + PersonUpdateDTO (#689)
Adds the optional generation field to both DTOs:

- PersonUpsertCommand gains Integer generation in the canonical-import
  builder chain; service wiring lands in the next commit.
- PersonUpdateDTO gains @Min(0)@Max(10) Integer generation, the form-path
  surface. The constraints mirror the V70 CHECK so validation fails fast
  at the controller before reaching the DB.

PersonControllerTest pins the validation behaviour: -1 → 400, 11 → 400,
null → 200, 3 → 200 for both PUT (update) and POST (create) paths. The
GlobalExceptionHandler maps MethodArgumentNotValidException to
VALIDATION_ERROR so the frontend's extractErrorCode keeps working.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:23:38 +02:00
Marcel
f22508ca91 feat(person): add nullable generation column to persons (#689)
Flyway V70: SMALLINT generation column with CHECK(0..10) and partial
index over non-null rows. Person.generation field surfaces it through
the JPA model. Pre-import rows and persons outside the curated family
graph legitimately stay null; the canonical importer (next commits)
back-fills via preferHuman so a human-edited value is never lost.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:20:24 +02:00
Marcel
1cb05697cc refactor(stammbaum): extract buildLayout to pure module
Move the layout function out of StammbaumTree.svelte (lines 47-275) into a
new pure TypeScript module at frontend/src/lib/person/genealogy/layout/
buildLayout.ts so it can be exercised by direct unit tests. Drops the
eslint-disable svelte/prefer-svelte-reactivity blanket; switches the
remaining scope-local Maps/Sets in parentLinks to SvelteMap/SvelteSet to
satisfy the rule per-call-site. No behaviour change — existing
StammbaumTree tests must pass byte-for-byte.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:17:18 +02:00
ccf1661768 Merge branch 'main' into docs/import-migration
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m36s
CI / OCR Service Tests (pull_request) Successful in 21s
CI / Backend Unit Tests (pull_request) Successful in 3m55s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
CI / Unit & Component Tests (push) Successful in 4m3s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 3m37s
CI / fail2ban Regex (push) Successful in 43s
CI / Semgrep Security Scan (push) Successful in 23s
CI / Compose Bucket Idempotency (push) Successful in 1m8s
2026-05-28 13:00:36 +02:00
Marcel
74cc4c8722 fix(admin): drop processed count from RUNNING import card
Some checks failed
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Semgrep Security Scan (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
The whole document load commits in one transaction, so a live counter
sits at 0 for the entire run and only jumps to the final number on
completion. Showing "0" next to the spinner read as "nothing happening"
and prompted repeated retriggers. Render just the spinner + running
label until the DONE branch displays the final processed count.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 12:56:00 +02:00
Marcel
548bc60747 fix(admin): include CSRF token on admin trigger/backfill POSTs
The four admin actions (trigger-import, generate-thumbnails,
backfill-versions, backfill-file-hashes) were posting bare fetches, so
the backend's CSRF filter would reject them once the protection is on.
Wrap each init with withCsrf() so the X-XSRF-TOKEN header is attached
from the cookie — same pattern other admin actions use.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 12:55:34 +02:00
Marcel
4581fc0b1f test(discussion): atomically clear mention searchbox to kill CI flake
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m21s
CI / OCR Service Tests (pull_request) Successful in 21s
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Semgrep Security Scan (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
userEvent.clear deletes per-keystroke, so intermediate values 'Au'/'A'
transit through the bound searchQuery and each schedules a debounced
fetch. When CI keystroke jitter exceeds SEARCH_DEBOUNCE_MS (150 ms), an
intermediate timer fires before the input reaches '' and the count
assertion sees a phantom q=Au call. fill('') drops a single input event
so the empty-query branch wins deterministically — same pattern this
test file already uses for fill('Walter').

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 12:53:36 +02:00
Marcel
8f3c799b8f test(relationship): reset family_member flag in setFamilyMember network test
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3m2s
CI / OCR Service Tests (pull_request) Successful in 24s
CI / Backend Unit Tests (pull_request) Successful in 3m56s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m4s
addRelationship now auto-flips family_member=true on both endpoints for
PARENT_OF/SPOUSE_OF/SIBLING_OF (commit 07300aef). That side-effect breaks
the pre-condition assertion in setFamilyMember_true_makes_person_appear_in_network,
which expects charlie not to appear in the network before the explicit flip.
Reset charlie's flag after addRelationship so the test still exercises the
setFamilyMember(true) -> network presence path it was written for.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 11:38:48 +02:00
Marcel
f80dda74f0 chore(lint): enable svelte/no-at-html-tags as primary XSS guard
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m49s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Failing after 4m19s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m4s
Promote svelte/no-at-html-tags to project-wide error so any new
{@html} block fails lint locally and in CI — the primary XSS defense.
The existing .gitea/workflows/ci.yml raw-date regex guard stays in
place as layered defense (it covers the specific raw-date variable
names that must NEVER be rendered via {@html}).

Existing legitimate {@html} usages (renderBody mentions in
CommentMessage.svelte, sanitized Markdown in geschichten/[id]) already
carry justified inline `eslint-disable-next-line` comments. Lint stays
green; verified by running npm run lint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:45:10 +02:00
Marcel
22603a4b04 test(persons): cover review form actions in server spec
Extend the WRITE_ALL-guard spec to a full matrix for each of the four
form actions (confirm, delete, merge, rename): happy path (backend 200),
required-field validation where applicable (merge without
targetPersonId, rename without lastName), backend 403, backend 404,
and the unauthorized guard from the previous commit. Mirrors the
shape of frontend/src/routes/persons/page.server.spec.ts.

18 tests, all green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:39:43 +02:00
Marcel
461a8b125d fix(persons): use danger semantic tokens for review error pill
The page-level error pill on /persons/review used raw Tailwind colour
classes (border-red-200, bg-red-50, text-red-600) — bypassing the
project's danger semantic tokens and breaking dark-mode contract. Align
with the rest of the persons domain (and PersonReviewRow's own deleteBtn)
by switching to border-danger / bg-danger/10 / text-danger.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:37:39 +02:00
Marcel
a670ba014c feat(persons): add confirm dialog to provisional confirm action
Confirming a provisional person was a one-click write — easy to fat-finger
on a touchscreen and irreversible (the person disappears from the review
list, with no obvious undo path). Mirror the destructive-delete pattern
with a non-destructive confirm dialog (destructive: false) so the action
requires a second deliberate click.

New i18n keys (persons_review_confirm_confirm_title/text/button) added
to all three locales (de, en, es).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:36:38 +02:00
Marcel
a9cac08f3c fix(persons): guard review form actions with hasWriteAll server-side
The four form actions on /persons/review (confirm, delete, merge,
rename) had no server-side permission check — a reader with a hand-
crafted POST could trigger writes that the backend then rejected with
FORBIDDEN, but only after the round-trip. Add the existing hasWriteAll
guard at the top of each action and short-circuit with fail(403,
FORBIDDEN). Mirrors the guard pattern in the rest of the persons
domain (review-only writers must be gated client-side AND server-side).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:33:23 +02:00
Marcel
4cc725d546 refactor(importing): inject FileStreamOpener to remove test-only seam
DocumentImporter exposed a package-private openFileStream(File) so a
Mockito spy could force the IO-error branch of isPdfMagicBytes. The
test-only seam leaked into production: the method existed for testing,
not for any production extensibility.

Replace with a constructor-injected FileStreamOpener interface (single
abstract method, @FunctionalInterface) and a one-line
@Component DefaultFileStreamOpener delegate. Tests now inject a mock
opener instead of spying on the importer itself, which is also a more
idiomatic Mockito usage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:29:41 +02:00
Marcel
535594378a fix(importing): use receiver_names for provisional person display name
resolveReceivers passed the slug as both `sourceRef` AND `lastName`, so
an unresolved receiver "smith-john" became a provisional Person with
lastName="smith-john" — a regression of the existing senderName→Person
contract.

Fix: zip the parallel `receiver_person_ids` and `receiver_names`
columns by position (the normalizer emits them 1:1 like
sender_person_id/sender_name). When the names list is shorter than the
slugs list, fall back to slug-as-name for the missing entries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:26:28 +02:00
Marcel
e93b09f1e2 refactor(importing): split DocumentImporter.buildDocument into named applyX helpers
buildDocument was a ~30-line method mixing attribution routing, date
parsing, authoritative collection management, file metadata, and
computed flags. Split into five named helpers — applyAttribution,
applyDates, applyAuthoritativeAssociations, applyFileMetadata,
applyComputedFlags — each doing one job. Pure refactor; all 43 existing
DocumentImporterTest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:23:24 +02:00
Marcel
46d1f5c6d8 chore(import): stop tracking real family PII canonical artifacts
The four files in tools/import-normalizer/out/ contain real names,
addresses, and attribution prose for ~163 living/deceased family members
and were committed by mistake. They are now removed from the index
(kept on disk for local development) and gitignored.

The canonical artifacts are produced locally from the Python normalizer
and synced into IMPORT_HOST_DIR out-of-band alongside the PDFs. The
contract between normalizer and importer is the header schema, not the
file contents — CanonicalSheetReader fails closed on a missing header,
which is what locks the contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:20:38 +02:00
Marcel
07300aeff7 fix(person): flip family_member on both endpoints when a family-graph relationship is added
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m39s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Failing after 3m45s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m4s
The canonical importer creates persons via PersonRegisterImporter first (no family_member
set) and then upserts them via PersonTreeImporter, but mergeCanonical never propagates
family_member to existing persons — so persons with imported relationships ended up
flagged family_member=false and never appeared in /api/persons family filters or the
family-network view.

RelationshipService is documented as the owner of the family_member flag, so the fix
lives there: addRelationship now sets family_member=true on both endpoints whenever the
relation type is PARENT_OF / SPOUSE_OF / SIBLING_OF (the same set getFamilyNetwork
filters by). Non-family types (FRIEND/COLLEAGUE/EMPLOYER/DOCTOR/NEIGHBOR/OTHER) leave
the flag alone — a family doctor isn't a family member. Extracted the type list as a
FAMILY_RELATION_TYPES constant and reused it in getFamilyNetwork for a single source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 09:15:37 +02:00
Marcel
643d504c7a fix(docker): bump frontend image to Node 22 for pdfjs-dist engine requirement
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m51s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m41s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 23s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m4s
CI / Unit & Component Tests (push) Successful in 3m41s
CI / OCR Service Tests (push) Successful in 21s
CI / fail2ban Regex (push) Has been cancelled
CI / Semgrep Security Scan (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
pdfjs-dist resolves to 5.7.284, which requires Node >=22.13.0 || >=24.
With engine-strict=true in .npmrc, npm ci hard-fails on the Node 20 base
image, so the frontend dev server crash-loops (and a clean build fails).
CI runs the frontend on Node 22 (Playwright image), so the committed
lockfile already assumes 22. Bump all three Dockerfile stages to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:46:16 +02:00
Marcel
c9f5f6d665 fix(docker): point dev backend healthcheck at management port 8081
The observability work moved actuator to a separate management port
(management.server.port: 8081), but the dev compose healthcheck still
probed :8080/actuator/health, which 404s. The backend was reported
unhealthy and the frontend (depends_on: backend healthy) never started.
docker-compose.prod.yml already uses 8081; this aligns dev with it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:45:51 +02:00
Marcel
9d9cd644ec Merge remote-tracking branch 'origin/main' into HEAD
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m30s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m46s
CI / fail2ban Regex (pull_request) Failing after 46s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
# Conflicts:
#	frontend/src/lib/shared/dashboard/ReaderRecentDocs.svelte.spec.ts
#	frontend/src/routes/+page.server.ts
2026-05-27 22:16:26 +02:00
Marcel
0a3d12b9af docs: drop remaining stale MassImportService/ExcelService references
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m42s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m50s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
Replace the legacy raw-spreadsheet importer references left behind after
#674 with the canonical import architecture (CanonicalImportOrchestrator +
four loaders) and document #686 index-based PDF resolution.

- l3-backend-3b: DocumentImporter now resolves PDF by index (importDir/
  <index>.pdf) with index validation + canonical-path containment + %PDF
  magic-byte check (no recursive walk / homoglyph file-path guards)
- c4-diagrams.md: replace massImport/excelSvc components + their rels with
  an importOrch (CanonicalImportOrchestrator) component wired to doc/person/
  tag services; refresh adminCtrl and adminSystem descriptions
- ARCHITECTURE.md: importing package row now describes the orchestrator +
  four loaders consuming canonical artifacts
- TODO-backend.md: remove obsolete "MassImportService provides no status"
  item (service deleted; orchestrator already exposes import-status); update
  stale ExcelService test-coverage suggestion

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
34e0eec1ba docs(adr): record the index pattern as a corpus-specific constraint
Address PR #687 review concern (Elicit): add an ADR-025 Consequences
entry noting INDEX_PATTERN accepts only the current corpus shape (<=4
Latin-1 letters, hyphens, ASCII digits, optional x) and must be revisited
deliberately if the catalog scheme grows (5-letter prefix, digit-led id,
non-Latin letter), since such rows would otherwise be skipped, not
imported. Also records the ASCII-only \d intent.

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
f5e2241fe0 test(importing): pin regex reject-boundary + note untestable IO branch
Address PR #687 review concerns on DocumentImporterTest:
- Sara/Felix: add catalog-shape reject tests that pass every char
  pre-check but must fail INDEX_PATTERN — "J 0070" (space), "WXYZA-0001"
  (5 letters), "12-0001" (no letter prefix), "W-0001X" (uppercase X).
  Verified red against a weakened pattern, green against the real one,
  so the pattern branch (not the char guards) is now pinned.
- Felix: restore the import java.io.OutputStream line (was over-deleted
  and patched with a fully-qualified name).
- Sara: document why the resolvePdfByIndex getCanonicalPath IOException
  branch is intentionally left uncovered (no deterministic injection
  seam; the log.warn is the substantive fix).

Adjust the two reflective resolvePdfByIndex calls for the new rowNumber
parameter.

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
f96b9fbffc feat(importing): log import-row breadcrumbs and distinguish skip outcomes
Address PR #687 review concerns on DocumentImporter:
- Tobias: thread a 1-based source row number into importRow so the
  "index rejected" skip log carries a breadcrumb (the row number, never
  the raw hostile index) for post-import triage.
- Elicit: emit a distinct log when a valid index has no <index>.pdf on
  disk (normal PLACEHOLDER) so it is not conflated with a rejected index.
- Nora: add a log.warn in resolvePdfByIndex's getCanonicalPath IOException
  branch so the quiet fail-safe skip surfaces in ops, distinct from the
  deliberate symlink-escape abort.
- Felix: replace inline fully-qualified java.util.regex.Pattern with an
  import.
- Nora: document that \d is intentionally ASCII-only (do not add
  UNICODE_CHARACTER_CLASS).

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
a4c2b6289d docs: drop stale MassImportService/ODS references from import deploy docs
The mass-import card no longer parses an ODS spreadsheet and MassImportService
was deleted (#674); /import now holds the normalizer's canonical artifacts
(canonical-*.xlsx + canonical-persons-tree.json) plus <index>.pdf files, read
by the canonical importer. Fix the IMPORT_HOST_DIR descriptions in
DEPLOYMENT.md and docker-compose.prod.yml accordingly.

Refs #686
2026-05-27 22:08:45 +02:00
Marcel
658277e97c docs(import): document index-based PDF resolution in ADR-025 and DEPLOYMENT
File resolution is now by index (<index>.pdf), not the datei/file
column. Update the ADR-025 security sub-decision and consequence (the
recursive walk and file column are gone; a bad index skips its row with
a loud SkipReason, a symlink-escape still aborts via the containment
assertion) and DEPLOYMENT §6 (PDFs must be named <index>.pdf flat in
the import dir).

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
32d9a33550 chore(normalizer): regenerate canonical-documents.xlsx without file column
Regenerated from the source workbooks with the committed overrides; the
export schema now has 16 columns (no file). canonical-persons.xlsx and
canonical-tag-tree.xlsx were unchanged at the cell level (only openpyxl
zip-byte churn) and were left untouched to keep the diff minimal.

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
f5eb227239 feat(importing): resolve import PDFs directly by index
The corpus is uniform — every PDF is <index>.pdf flat in the import
dir — so resolve a document's PDF with an O(1) importDir.resolve(index
+ ".pdf") lookup instead of a recursive directory walk over the file
column. The index is validated against a strict catalog pattern
(1–4 Latin letters incl. umlauts, hyphen(s), digits, optional x) plus
the ported separator/dot/dotdot/null/slash-homoglyph/absolute-path
guards, and the resolved canonical path is asserted to stay inside the
import dir as defense-in-depth. The %PDF magic-byte check still gates
upload; status UPLOADED/PLACEHOLDER and the index→originalFilename
upsert key are unchanged. The file column and findFileRecursive walk
are gone, and the security regression tests now assert a malicious or
garbage index is rejected and a valid index resolves to exactly
importDir/<index>.pdf within containment.

Closes #686
Closes #676

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
227116fe2d refactor(normalizer): drop file column now PDFs resolve by index
The import corpus is uniform: every PDF is named <index>.pdf, so the
file column (the spreadsheet's datei value) is redundant. Remove file
from CanonicalDocument, RawRow, _FIELDS, to_canonical, and DOC_COLUMNS,
plus the now-moot index_file_mismatch review flag/CSV/stat and the
datei header mapping. date_end and the tree person_id are kept.

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
7183d15fe5 fix(document): restore pure-text-relevance FTS fast path past undated count
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m29s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m52s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
The global undated-count rework moved the pure-text-RELEVANCE shortcut
into runSearch, where it ran after the unconditional
findAllMatchingIdsByFts call. That routed pure-text relevance through the
in-memory id path and returned empty match data, breaking FTS rank order
and snippet/offset enrichment.

Hoist the shortcut back to the top of searchDocuments so it short-circuits
to findFtsPageRaw before findAllMatchingIdsByFts, while still computing the
global undatedCount for all non-fast-path searches.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 21:04:48 +02:00
Marcel
b52bf60913 fix(document): tie-break equal-date DATE sort by title asc, not createdAt
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3m2s
CI / OCR Service Tests (pull_request) Successful in 24s
CI / Backend Unit Tests (pull_request) Failing after 3m54s
CI / fail2ban Regex (pull_request) Successful in 47s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
Owner decision (#668): when two documents share a meta_date, order them by
title ascending instead of createdAt ascending. title is @Column(nullable=false)
so it is always present, giving a deterministic, human-meaningful total order.
Only the DATE-sort fast path changes; the in-memory SENDER/RECEIVER/RELEVANCE
comparators are untouched.

ORDER BY meta_date <dir> NULLS LAST, title ASC

Tests assert title-asc tiebreaking for same-date rows in BOTH directions, with a
fixture whose title order is the OPPOSITE of insertion (createdAt) order so the
test fails if the tiebreaker reverts to createdAt. The integration test drives
the production resolveSort against real Postgres.

Refs #668
2026-05-27 20:21:18 +02:00
Marcel
45e63307bb fix(documents): give the undated count chip a self-describing a11y name
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m42s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Failing after 3m46s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
A screen reader announced the bare number ("Nur undatierte 42"). Add an
aria-label ("42 undatierte Dokumente") via a new i18n key and hide the
purely-visual digit with aria-hidden, so the toggle + count read sensibly.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:54:48 +02:00
Marcel
995471082e test(documents): update obsolete em-dash assertion to undated badge
The "missing documentDate" test asserted the OLD bare em-dash; #668
replaced it with the "Datum unbekannt" badge via <DocumentDate>. Assert
the badge text and rename the misleading test title.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:54:24 +02:00
Marcel
c6137a26a2 feat(documents): show global undated count chip on the filter toggle
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m50s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Failing after 4m3s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
Surface the backend's global undatedCount on the "Nur undatierte" toggle as
a count chip — the total undated documents matching the current filter
across all pages, not the page slice. The loader forwards undatedCount
straight through (defaulting to 0); the chip hides at 0 and stays visible
regardless of the toggle state so it advertises the triage backlog size.

generate:api was hand-edited (undatedCount added to DocumentSearchResult) —
CI must re-run npm run generate:api to confirm parity.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:42:57 +02:00
Marcel
a3c3f14aea feat(documents): return global undated count in search response
The undated bucket count was page-local — derived from the year-grouping
of the current page's items, so it could never exceed the page size. The
owner's decision is for it to reflect ALL undated documents matching the
active filter across every page.

Add an undatedCount field to DocumentSearchResult, computed once per search
via a COUNT over the same filter spec with undatedOnly(true) forced —
independent of the "Nur undatierte" toggle so it never collapses to the
page slice or double-counts. A from/to range excludes undated rows by the
collision rule, so the count is legitimately 0 inside a date range.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:42:32 +02:00
Marcel
19cd17d9cd fix(documents): always render undated badge in DocumentRow desktop column
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m54s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m40s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s
The desktop right-column kept a leftover {#if doc.documentDate}…{:else}—{/if}
fallback that emitted a bare em-dash for undated documents, while the mobile
block already always rendered <DocumentDate>. DocumentDate defensively maps a
null date to the "Datum unbekannt" badge, so render it unconditionally — an
undated document is an absence, not an error, and never shows a bare "—".

Refs #668
2026-05-27 19:17:18 +02:00
Marcel
508575eccb refactor(documents): collapse redundant span nesting in DocumentDate else branch
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m51s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m43s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
The dated branch wrapped {label} in a flex span containing a single child
span — redundant nesting. Render the label directly in one span.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:09:07 +02:00
Marcel
85372e3669 fix(documents): enlarge undated badge text to text-xs for legibility
"Datum unbekannt" is a semantically meaningful date surface, not decorative
chrome, so the 10px chip text is too small for the senior reader audience.
Bump to text-xs (≥12px) per the WCAG min-legible-text guidance.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:08:41 +02:00
Marcel
caec92e7de test(document): lock undated-stays-in-sender-group with ordered multi-sender assertions
Replace the single-sender containsExactlyInAnyOrder check with a two-sender
fixture and ordered containsExactly proving an undated doc stays within its
sender group and never floats to the page head. Add a DESC-direction case for
in-memory-path symmetry and an undated=true + sort=SENDER case capturing the
Specification to prove undatedOnly is still applied on the person-sort path.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:06:33 +02:00
Marcel
eacfd15f8e refactor(document): revert resolveSort to private
No test calls resolveSort directly — the sort tests assert through
searchDocuments + ArgumentCaptor<Pageable>, so the package-private widening
added no value. Narrow the API surface back to private.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:06:16 +02:00
Marcel
a345bba74b test(activity): assert Chronik rows never fabricate a letter date
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m54s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m30s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
Negative guarantee for #668: ChronikRow renders the activity timestamp
(happenedAt), and ActivityFeedItemDTO carries no document-date surface, so
no undated badge or "Datum unbekannt" letter-date label may appear. Pins
this as a regression fixture so a future change can't quietly add a date
chip to the activity feed.

Refs #668
2026-05-27 18:54:35 +02:00
Marcel
098c2c9def feat(documents): add a "Nur undatierte" filter toggle wired to the URL
SearchFilterBar gains an aria-pressed "Nur undatierte" toggle in the
advanced row (min-h-[44px] touch target, labels the state not the colour).
The documents page threads `undated` through the filter snapshot so it is a
shareable URL param picked up by both filter-change nav and pagination, and
flows into the bulk-edit "select all" /ids request. Toggling resets to page
0 via the existing implicit page-drop.

Refs #668
2026-05-27 18:53:44 +02:00
Marcel
5d8bb70255 feat(documents): explain that a date range excludes undated documents
DocumentList gains from/to props; when a date range is active and yields no
results, the empty state shows the localized docs_range_excludes_undated
note instead of the generic copy, so the reader understands undated letters
aren't part of a range. Person-grouped modes keep undated letters under
their sender/receiver (badge-on-row, no synthetic sub-group).

Refs #668
2026-05-27 18:50:18 +02:00
Marcel
bca3f34cec feat(documents): badge undated rows instead of a bare em-dash
DocumentRow rendered a bare em-dash for null-dated letters — a glyph a
screen reader announces as nothing. Both breakpoints now render the single
DocumentDate component unconditionally (no {#if}/—/{:else}), so the cue
cannot drift; its unknown state is a neutral metadata chip ("Datum
unbekannt", text-ink-3, ≥4.5:1 both themes) with a non-color calendar glyph,
never red/amber. Present dates render at honest precision via
formatDocumentDate ("Juni 1916", not a fabricated day).

Refs #668
2026-05-27 18:48:45 +02:00
Marcel
f1fc3dc1ce feat(documents): thread undated filter through the search loader + i18n
Parses ?undated strictly (=== 'true', mirroring the tagOp clamp), forwards
it as undated || undefined so the absent case drops out of the query, and
returns the flag in page data for the control to reflect. Adds the
docs_filter_undated_only toggle label and the explanatory
docs_range_excludes_undated empty-state copy in de/en/es. The badge reuses
the existing date_precision_unknown ("Datum unbekannt") key from #677.

OpenAPI types hand-edited for the new undated query param on /search and
/ids — CI must run `npm run generate:api` to confirm parity with the spec.

Refs #668
2026-05-27 18:45:03 +02:00
Marcel
268c31a49b feat(document): thread an undated filter through search and the /ids path
Adds an optional `undated` query param to GET /api/documents/search and
/api/documents/ids, threaded through searchDocuments and findIdsForFilter
into the shared buildSearchSpec via undatedOnly(boolean). undated=true also
bypasses the pure-text RELEVANCE SQL shortcut, which skips buildSearchSpec
and would otherwise drop the predicate. The read GET stays unguarded
(WebMvc authz test pins 200 for an authenticated user, 401 unauthenticated).
A locking test proves the in-memory SENDER sort keeps undated letters under
their sender.

Refs #668
2026-05-27 18:42:17 +02:00