The whole document load commits in one transaction, so a live counter
sits at 0 for the entire run and only jumps to the final number on
completion. Showing "0" next to the spinner read as "nothing happening"
and prompted repeated retriggers. Render just the spinner + running
label until the DONE branch displays the final processed count.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The four admin actions (trigger-import, generate-thumbnails,
backfill-versions, backfill-file-hashes) were posting bare fetches, so
the backend's CSRF filter would reject them once the protection is on.
Wrap each init with withCsrf() so the X-XSRF-TOKEN header is attached
from the cookie — same pattern other admin actions use.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
userEvent.clear deletes per-keystroke, so intermediate values 'Au'/'A'
transit through the bound searchQuery and each schedules a debounced
fetch. When CI keystroke jitter exceeds SEARCH_DEBOUNCE_MS (150 ms), an
intermediate timer fires before the input reaches '' and the count
assertion sees a phantom q=Au call. fill('') fires a single input event
so the empty-query branch wins deterministically — same pattern this
test file already uses for fill('Walter').
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
addRelationship now auto-flips family_member=true on both endpoints for
PARENT_OF/SPOUSE_OF/SIBLING_OF (commit 07300aef). That side-effect breaks
the pre-condition assertion in setFamilyMember_true_makes_person_appear_in_network,
which expects charlie not to appear in the network before the explicit flip.
Reset charlie's flag after addRelationship so the test still exercises the
setFamilyMember(true) -> network presence path it was written for.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Promote svelte/no-at-html-tags to project-wide error so any new
{@html} block fails lint locally and in CI — the primary XSS defense.
The existing .gitea/workflows/ci.yml raw-date regex guard stays in
place as layered defense (it covers the specific raw-date variable
names that must NEVER be rendered via {@html}).
Existing legitimate {@html} usages (renderBody mentions in
CommentMessage.svelte, sanitized Markdown in geschichten/[id]) already
carry justified inline `eslint-disable-next-line` comments. Lint stays
green; verified by running npm run lint.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extend the WRITE_ALL-guard spec to a full matrix for each of the four
form actions (confirm, delete, merge, rename): happy path (backend 200),
required-field validation where applicable (merge without
targetPersonId, rename without lastName), backend 403, backend 404,
and the unauthorized guard from the previous commit. Mirrors the
shape of frontend/src/routes/persons/page.server.spec.ts.
18 tests, all green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The page-level error pill on /persons/review used raw Tailwind colour
classes (border-red-200, bg-red-50, text-red-600) — bypassing the
project's danger semantic tokens and breaking dark-mode contract. Align
with the rest of the persons domain (and PersonReviewRow's own deleteBtn)
by switching to border-danger / bg-danger/10 / text-danger.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Confirming a provisional person was a one-click write — easy to fat-finger
on a touchscreen and irreversible (the person disappears from the review
list, with no obvious undo path). Mirror the destructive-delete pattern
with a non-destructive confirm dialog (destructive: false) so the action
requires a second deliberate click.
New i18n keys (persons_review_confirm_confirm_title/text/button) added
to all three locales (de, en, es).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The four form actions on /persons/review (confirm, delete, merge,
rename) had no server-side permission check: a reader with a
hand-crafted POST could trigger writes that the backend then rejected with
FORBIDDEN, but only after the round-trip. Add the existing hasWriteAll
guard at the top of each action and short-circuit with fail(403,
FORBIDDEN). Mirrors the guard pattern in the rest of the persons
domain (review-only writers must be gated client-side AND server-side).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DocumentImporter exposed a package-private openFileStream(File) so a
Mockito spy could force the IO-error branch of isPdfMagicBytes. The
test-only seam leaked into production: the method existed for testing,
not for any production extensibility.
Replace with a constructor-injected FileStreamOpener interface (single
abstract method, @FunctionalInterface) and a one-line
@Component DefaultFileStreamOpener delegate. Tests now inject a mock
opener instead of spying on the importer itself, which is also a more
idiomatic Mockito usage.
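A minimal sketch of the seam described above, under stated assumptions: PdfMagicChecker is a hypothetical stand-in for the relevant part of DocumentImporter, the Spring @Component annotation is omitted so the snippet compiles on its own, and treating an IO error as "not a PDF" is an assumption about the real branch.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Seam for opening a file; one abstract method, so a test can pass a lambda. */
@FunctionalInterface
interface FileStreamOpener {
    InputStream open(File file) throws IOException;
}

/** Production delegate (would carry @Component in the real code base). */
final class DefaultFileStreamOpener implements FileStreamOpener {
    @Override
    public InputStream open(File file) throws IOException {
        return new FileInputStream(file);
    }
}

/** Hypothetical stand-in for the importer code that consumes the seam. */
final class PdfMagicChecker {
    private static final byte[] PDF_MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);
    private final FileStreamOpener opener;

    PdfMagicChecker(FileStreamOpener opener) {
        this.opener = opener;
    }

    /** True only if the first four bytes are "%PDF"; an IO error is treated as "not a PDF" (assumption). */
    boolean isPdfMagicBytes(File file) {
        try (InputStream in = opener.open(file)) {
            byte[] head = in.readNBytes(PDF_MAGIC.length);
            return Arrays.equals(head, PDF_MAGIC);
        } catch (IOException e) {
            return false;
        }
    }
}
```

A test can then force the IO-error branch with a throwing lambda instead of a spy on the importer.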
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
resolveReceivers passed the slug as both `sourceRef` AND `lastName`, so
an unresolved receiver "smith-john" became a provisional Person with
lastName="smith-john" — a regression of the existing senderName→Person
contract.
Fix: zip the parallel `receiver_person_ids` and `receiver_names`
columns by position (the normalizer emits them 1:1 like
sender_person_id/sender_name). When the names list is shorter than the
slugs list, fall back to slug-as-name for the missing entries.
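The positional zip with fallback can be sketched roughly as follows. ReceiverRef and ReceiverZip are hypothetical names for illustration, not the importer's actual types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pair of a receiver slug and the name to use for a provisional Person. */
record ReceiverRef(String sourceRef, String name) {}

final class ReceiverZip {
    /**
     * Pairs slugs with names by position. When the names list is shorter
     * than the slugs list, the slug itself is used as the name.
     */
    static List<ReceiverRef> zip(List<String> slugs, List<String> names) {
        List<ReceiverRef> out = new ArrayList<>(slugs.size());
        for (int i = 0; i < slugs.size(); i++) {
            String slug = slugs.get(i);
            String name = i < names.size() ? names.get(i) : slug;
            out.add(new ReceiverRef(slug, name));
        }
        return out;
    }
}
```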
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
buildDocument was a ~30-line method mixing attribution routing, date
parsing, authoritative collection management, file metadata, and
computed flags. Split into five named helpers — applyAttribution,
applyDates, applyAuthoritativeAssociations, applyFileMetadata,
applyComputedFlags — each doing one job. Pure refactor; all 43 existing
DocumentImporterTest cases still pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The four files in tools/import-normalizer/out/ contain real names,
addresses, and attribution prose for ~163 living/deceased family members
and were committed by mistake. They are now removed from the index
(kept on disk for local development) and gitignored.
The canonical artifacts are produced locally from the Python normalizer
and synced into IMPORT_HOST_DIR out-of-band alongside the PDFs. The
contract between normalizer and importer is the header schema, not the
file contents — CanonicalSheetReader fails closed on a missing header,
which is what locks the contract.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The canonical importer creates persons via PersonRegisterImporter first (no family_member
set) and then upserts them via PersonTreeImporter, but mergeCanonical never propagates
family_member to existing persons — so persons with imported relationships ended up
flagged family_member=false and never appeared in /api/persons family filters or the
family-network view.
RelationshipService is documented as the owner of the family_member flag, so the fix
lives there: addRelationship now sets family_member=true on both endpoints whenever the
relation type is PARENT_OF / SPOUSE_OF / SIBLING_OF (the same set getFamilyNetwork
filters by). Non-family types (FRIEND/COLLEAGUE/EMPLOYER/DOCTOR/NEIGHBOR/OTHER) leave
the flag alone — a family doctor isn't a family member. Extracted the type list as a
FAMILY_RELATION_TYPES constant and reused it in getFamilyNetwork for a single source of truth.
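A rough sketch of the rule, assuming a plain in-memory model: PersonStub and RelationshipSketch are hypothetical stand-ins, and only the enum values and the FAMILY_RELATION_TYPES name come from the message above.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

enum RelationType {
    PARENT_OF, SPOUSE_OF, SIBLING_OF,
    FRIEND, COLLEAGUE, EMPLOYER, DOCTOR, NEIGHBOR, OTHER
}

/** Hypothetical minimal person with just the flag under discussion. */
final class PersonStub {
    final String name;
    boolean familyMember;

    PersonStub(String name) {
        this.name = name;
    }
}

final class RelationshipSketch {
    /** Single source of truth for which relation types imply family membership. */
    static final Set<RelationType> FAMILY_RELATION_TYPES = Collections.unmodifiableSet(
            EnumSet.of(RelationType.PARENT_OF, RelationType.SPOUSE_OF, RelationType.SIBLING_OF));

    /** Flags both endpoints for family relation types; other types leave the flag untouched. */
    static void addRelationship(PersonStub from, PersonStub to, RelationType type) {
        if (FAMILY_RELATION_TYPES.contains(type)) {
            from.familyMember = true;
            to.familyMember = true;
        }
    }
}
```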
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pdfjs-dist resolves to 5.7.284, which requires Node >=22.13.0 || >=24.
With engine-strict=true in .npmrc, npm ci hard-fails on the Node 20 base
image, so the frontend dev server crash-loops (and a clean build fails).
CI runs the frontend on Node 22 (Playwright image), so the committed
lockfile already assumes 22. Bump all three Dockerfile stages to match.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The observability work moved actuator to a separate management port
(management.server.port: 8081), but the dev compose healthcheck still
probed :8080/actuator/health, which 404s. The backend was reported
unhealthy and the frontend (depends_on: backend healthy) never started.
docker-compose.prod.yml already uses 8081; this aligns dev with it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the legacy raw-spreadsheet importer references left behind after
#674 with the canonical import architecture (CanonicalImportOrchestrator +
four loaders) and document #686 index-based PDF resolution.
- l3-backend-3b: DocumentImporter now resolves PDF by index
  (importDir/<index>.pdf) with index validation + canonical-path
  containment + %PDF magic-byte check (no recursive walk / homoglyph
  file-path guards)
- c4-diagrams.md: replace massImport/excelSvc components + their rels with
  an importOrch (CanonicalImportOrchestrator) component wired to
  doc/person/tag services; refresh adminCtrl and adminSystem descriptions
- ARCHITECTURE.md: importing package row now describes the orchestrator +
  four loaders consuming canonical artifacts
- TODO-backend.md: remove obsolete "MassImportService provides no status"
  item (service deleted; orchestrator already exposes import-status); update
  stale ExcelService test-coverage suggestion
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address PR #687 review concern (Elicit): add an ADR-025 Consequences
entry noting INDEX_PATTERN accepts only the current corpus shape (<=4
Latin-1 letters, hyphens, ASCII digits, optional x) and must be revisited
deliberately if the catalog scheme grows (5-letter prefix, digit-led id,
non-Latin letter), since such rows would otherwise be skipped, not
imported. Also records the ASCII-only \d intent.
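For illustration only, a pattern with the shape described (1 to 4 Latin letters including German umlauts, one or more hyphens, ASCII digits, optional trailing lowercase x) could look like the sketch below. This is a guess at the shape, not the committed INDEX_PATTERN; the exact letter set and hyphen placement are assumptions.

```java
import java.util.regex.Pattern;

final class IndexPatternSketch {
    // Hypothetical reconstruction. Without Pattern.UNICODE_CHARACTER_CLASS,
    // \d in java.util.regex matches only the ASCII digits 0-9.
    static final Pattern INDEX_PATTERN = Pattern.compile(
            "[A-Za-z\\u00C4\\u00D6\\u00DC\\u00E4\\u00F6\\u00FC\\u00DF]{1,4}-+\\d+x?");

    /** True when the whole index matches the catalog shape. */
    static boolean isValidIndex(String index) {
        return index != null && INDEX_PATTERN.matcher(index).matches();
    }
}
```

Under this sketch a 5-letter prefix, a digit-led id, or a non-Latin letter fails the match, which is the skip behaviour the Consequences entry warns about.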
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address PR #687 review concerns on DocumentImporterTest:
- Sara/Felix: add catalog-shape reject tests that pass every char
  pre-check but must fail INDEX_PATTERN: "J 0070" (space), "WXYZA-0001"
  (5 letters), "12-0001" (no letter prefix), "W-0001X" (uppercase X).
  Verified red against a weakened pattern, green against the real one,
  so the pattern branch (not the char guards) is now pinned.
- Felix: restore the import java.io.OutputStream line (was over-deleted
  and patched with a fully-qualified name).
- Sara: document why the resolvePdfByIndex getCanonicalPath IOException
  branch is intentionally left uncovered (no deterministic injection
  seam; the log.warn is the substantive fix).
Adjust the two reflective resolvePdfByIndex calls for the new rowNumber
parameter.
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address PR #687 review concerns on DocumentImporter:
- Tobias: thread a 1-based source row number into importRow so the
  "index rejected" skip log carries a breadcrumb (the row number, never
  the raw hostile index) for post-import triage.
- Elicit: emit a distinct log when a valid index has no <index>.pdf on
  disk (normal PLACEHOLDER) so it is not conflated with a rejected index.
- Nora: add a log.warn in resolvePdfByIndex's getCanonicalPath IOException
  branch so the quiet fail-safe skip surfaces in ops, distinct from the
  deliberate symlink-escape abort.
- Felix: replace inline fully-qualified java.util.regex.Pattern with an
  import.
- Nora: document that \d is intentionally ASCII-only (do not add
  UNICODE_CHARACTER_CLASS).
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The mass-import card no longer parses an ODS spreadsheet and MassImportService
was deleted (#674); /import now holds the normalizer's canonical artifacts
(canonical-*.xlsx + canonical-persons-tree.json) plus <index>.pdf files, read
by the canonical importer. Fix the IMPORT_HOST_DIR descriptions in
DEPLOYMENT.md and docker-compose.prod.yml accordingly.
Refs #686
File resolution is now by index (<index>.pdf), not the datei/file
column. Update the ADR-025 security sub-decision and consequence (the
recursive walk and file column are gone; a bad index skips its row with
a loud SkipReason, a symlink-escape still aborts via the containment
assertion) and DEPLOYMENT §6 (PDFs must be named <index>.pdf flat in
the import dir).
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Regenerated from the source workbooks with the committed overrides; the
export schema now has 16 columns (no file). canonical-persons.xlsx and
canonical-tag-tree.xlsx were unchanged at the cell level (only openpyxl
zip-byte churn) and were left untouched to keep the diff minimal.
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The corpus is uniform — every PDF is <index>.pdf flat in the import
dir — so resolve a document's PDF with an O(1) importDir.resolve(index
+ ".pdf") lookup instead of a recursive directory walk over the file
column. The index is validated against a strict catalog pattern
(1–4 Latin letters incl. umlauts, hyphen(s), digits, optional x) plus
the ported separator/dot/dotdot/null/slash-homoglyph/absolute-path
guards, and the resolved canonical path is asserted to stay inside the
import dir as defense-in-depth. The %PDF magic-byte check still gates
upload; status UPLOADED/PLACEHOLDER and the index→originalFilename
upsert key are unchanged. The file column and findFileRecursive walk
are gone, and the security regression tests now assert a malicious or
garbage index is rejected and a valid index resolves to exactly
importDir/<index>.pdf within containment.
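The lookup plus containment assertion can be sketched as below. PdfPathSketch is a hypothetical helper, and it uses Path.normalize for a purely lexical check; the real code is described as asserting on the canonical path, which also resolves symlinks, so this is a simplification.

```java
import java.nio.file.Path;

final class PdfPathSketch {
    /**
     * Resolves importDir/<index>.pdf and asserts the result stays inside
     * importDir. Lexical check only: symlinks are NOT resolved here.
     */
    static Path resolveContained(Path importDir, String index) {
        Path root = importDir.toAbsolutePath().normalize();
        Path candidate = root.resolve(index + ".pdf").normalize();
        if (!candidate.startsWith(root)) {
            throw new IllegalStateException("resolved path escapes the import dir");
        }
        return candidate;
    }
}
```

The pattern check is expected to reject a hostile index long before this point; the containment assertion is only the last line of defense.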
Closes #686
Closes #676
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The import corpus is uniform: every PDF is named <index>.pdf, so the
file column (the spreadsheet's datei value) is redundant. Remove file
from CanonicalDocument, RawRow, _FIELDS, to_canonical, and DOC_COLUMNS,
plus the now-moot index_file_mismatch review flag/CSV/stat and the
datei header mapping. date_end and the tree person_id are kept.
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The global undated-count rework moved the pure-text-RELEVANCE shortcut
into runSearch, where it ran after the unconditional
findAllMatchingIdsByFts call. That routed pure-text relevance through the
in-memory id path and returned empty match data, breaking FTS rank order
and snippet/offset enrichment.
Hoist the shortcut back to the top of searchDocuments so it short-circuits
to findFtsPageRaw before findAllMatchingIdsByFts, while still computing the
global undatedCount for all non-fast-path searches.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Owner decision (#668): when two documents share a meta_date, order them by
title ascending instead of createdAt ascending. title is @Column(nullable=false)
so it is always present, giving a deterministic, human-meaningful total order.
Only the DATE-sort fast path changes; the in-memory SENDER/RECEIVER/RELEVANCE
comparators are untouched.
    ORDER BY meta_date <dir> NULLS LAST, title ASC
Tests assert title-asc tiebreaking for same-date rows in BOTH directions, with a
fixture whose title order is the OPPOSITE of insertion (createdAt) order so the
test fails if the tiebreaker reverts to createdAt. The integration test drives
the production resolveSort against real Postgres.
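As an in-memory analogue of that ordering (not the Spring Data Sort the production resolveSort builds), the same total order can be expressed with plain JDK comparators. DocRow and DateSortSketch are hypothetical names.

```java
import java.time.LocalDate;
import java.util.Comparator;

/** Hypothetical row with just the two columns the ordering uses. */
record DocRow(String title, LocalDate metaDate) {}

final class DateSortSketch {
    /** meta_date in the requested direction with nulls last, then title ascending. */
    static Comparator<DocRow> byDate(boolean ascending) {
        Comparator<LocalDate> dateOrder = ascending
                ? Comparator.<LocalDate>naturalOrder()
                : Comparator.<LocalDate>reverseOrder();
        Comparator<DocRow> byDateNullsLast =
                Comparator.comparing(DocRow::metaDate, Comparator.nullsLast(dateOrder));
        // Title is the tiebreaker in BOTH directions, always ascending.
        return byDateNullsLast.thenComparing(DocRow::title);
    }
}
```

Wrapping only the date comparator in nullsLast keeps undated rows at the end regardless of direction, which a blanket reversed() on the whole comparator would not.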
Refs #668
A screen reader announced the bare number ("Nur undatierte 42"). Add an
aria-label ("42 undatierte Dokumente") via a new i18n key and hide the
purely-visual digit with aria-hidden, so the toggle + count read sensibly.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "missing documentDate" test asserted the OLD bare em-dash; #668
replaced it with the "Datum unbekannt" badge via <DocumentDate>. Assert
the badge text and rename the misleading test title.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Surface the backend's global undatedCount on the "Nur undatierte" toggle as
a count chip — the total undated documents matching the current filter
across all pages, not the page slice. The loader forwards undatedCount
straight through (defaulting to 0); the chip hides at 0 and stays visible
regardless of the toggle state so it advertises the triage backlog size.
The generated API types were hand-edited (undatedCount added to
DocumentSearchResult); CI must re-run npm run generate:api to confirm parity.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The undated bucket count was page-local — derived from the year-grouping
of the current page's items, so it could never exceed the page size. The
owner's decision is for it to reflect ALL undated documents matching the
active filter across every page.
Add an undatedCount field to DocumentSearchResult, computed once per search
via a COUNT over the same filter spec with undatedOnly(true) forced —
independent of the "Nur undatierte" toggle so it never collapses to the
page slice or double-counts. A from/to range excludes undated rows by the
collision rule, so the count is legitimately 0 inside a date range.
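A toy in-memory illustration of the counting rule, assuming a simple predicate model rather than the real JPA Specification. LetterRow, UndatedCountSketch and inRange are hypothetical names.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical row: a sender plus a possibly-null document date. */
record LetterRow(String sender, LocalDate date) {}

final class UndatedCountSketch {
    /** Counts ALL rows that match the active filter and have no date, ignoring paging. */
    static long undatedCount(List<LetterRow> all, Predicate<LetterRow> filter) {
        return all.stream()
                .filter(filter)
                .filter(row -> row.date() == null)
                .count();
    }

    /** A from/to range never matches a null date, mirroring the collision rule. */
    static Predicate<LetterRow> inRange(LocalDate from, LocalDate to) {
        return row -> row.date() != null
                && !row.date().isBefore(from)
                && !row.date().isAfter(to);
    }
}
```

Because the range predicate rejects null dates, combining it with the filter drives the undated count to 0 without any special casing.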
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The desktop right-column kept a leftover {#if doc.documentDate}…{:else}—{/if}
fallback that emitted a bare em-dash for undated documents, while the mobile
block already always rendered <DocumentDate>. DocumentDate defensively maps a
null date to the "Datum unbekannt" badge, so render it unconditionally — an
undated document is an absence, not an error, and never shows a bare "—".
Refs #668
The dated branch wrapped {label} in a flex span containing a single child
span — redundant nesting. Render the label directly in one span.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
"Datum unbekannt" is a semantically meaningful date surface, not decorative
chrome, so the 10px chip text is too small for the senior reader audience.
Bump to text-xs (≥12px) per the WCAG min-legible-text guidance.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the single-sender containsExactlyInAnyOrder check with a two-sender
fixture and ordered containsExactly proving an undated doc stays within its
sender group and never floats to the page head. Add a DESC-direction case for
in-memory-path symmetry and an undated=true + sort=SENDER case capturing the
Specification to prove undatedOnly is still applied on the person-sort path.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
No test calls resolveSort directly — the sort tests assert through
searchDocuments + ArgumentCaptor<Pageable>, so the package-private widening
added no value. Narrow the API surface back to private.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Negative guarantee for #668: ChronikRow renders the activity timestamp
(happenedAt), and ActivityFeedItemDTO carries no document-date surface, so
no undated badge or "Datum unbekannt" letter-date label may appear. Pins
this as a regression fixture so a future change can't quietly add a date
chip to the activity feed.
Refs #668
SearchFilterBar gains an aria-pressed "Nur undatierte" toggle in the
advanced row (min-h-[44px] touch target, labels the state not the colour).
The documents page threads `undated` through the filter snapshot so it is a
shareable URL param picked up by both filter-change nav and pagination, and
flows into the bulk-edit "select all" /ids request. Toggling resets to page
0 via the existing implicit page-drop.
Refs #668
DocumentList gains from/to props; when a date range is active and yields no
results, the empty state shows the localized docs_range_excludes_undated
note instead of the generic copy, so the reader understands undated letters
aren't part of a range. Person-grouped modes keep undated letters under
their sender/receiver (badge-on-row, no synthetic sub-group).
Refs #668
DocumentRow rendered a bare em-dash for null-dated letters — a glyph a
screen reader announces as nothing. Both breakpoints now render the single
DocumentDate component unconditionally (no {#if}/—/{:else}), so the cue
cannot drift; its unknown state is a neutral metadata chip ("Datum
unbekannt", text-ink-3, ≥4.5:1 both themes) with a non-color calendar glyph,
never red/amber. Present dates render at honest precision via
formatDocumentDate ("Juni 1916", not a fabricated day).
Refs #668
Parses ?undated strictly (=== 'true', mirroring the tagOp clamp), forwards
it as undated || undefined so the absent case drops out of the query, and
returns the flag in page data for the control to reflect. Adds the
docs_filter_undated_only toggle label and the explanatory
docs_range_excludes_undated empty-state copy in de/en/es. The badge reuses
the existing date_precision_unknown ("Datum unbekannt") key from #677.
OpenAPI types hand-edited for the new undated query param on /search and
/ids — CI must run `npm run generate:api` to confirm parity with the spec.
Refs #668
Adds an optional `undated` query param to GET /api/documents/search and
/api/documents/ids, threaded through searchDocuments and findIdsForFilter
into the shared buildSearchSpec via undatedOnly(boolean). undated=true also
bypasses the pure-text RELEVANCE SQL shortcut, which skips buildSearchSpec
and would otherwise drop the predicate. The read GET stays unguarded
(WebMvc authz test pins 200 for an authenticated user, 401 unauthenticated).
A locking test proves the in-memory SENDER sort keeps undated letters under
their sender.
Refs #668
undatedOnly(false) is a no-op (null predicate); undatedOnly(true) returns
documentDate IS NULL, matching the existing hasStatus null-as-no-op pattern.
Real-Postgres tests pin the load-bearing guarantees H2 cannot prove: ASC
NULLS-LAST ordering, BETWEEN excludes null-dated rows, and that undated=true
combined with a from/to range returns empty (the collision rule).
Refs #668
resolveSort produced Sort.by(direction, "documentDate") with NATIVE null
handling, so Postgres surfaced undated (null meta_date) documents FIRST on
an ASC sort. Apply nullsLast() so undated rows order last for both ASC and
DESC, with a createdAt-asc tiebreaker for a stable total order when every
row is null-dated (the upcoming "Nur undatierte" filter).
Refs #668
Pure formatting (line wrap) so the file passes prettier --check; no behaviour
change.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add countByFilter parity coverage for the query (LIKE) path so the shared
FILTER_WHERE slice and count can't drift, and an integration test proving
deletePerson detaches a person referenced as both sender and receiver before
delete — the documents survive (sender nulled, receiver link removed) with no
FK orphan.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The legacy sort=documentCount path wrapped its result with paged(top, 0,
safeSize, top.size()), so totalElements/pageSize looked like a paged slice of
a larger set when in fact the top-N query returns the complete result. Add a
dedicated PersonSearchResult.topN factory that reports reality — totalElements
= returned count, pageSize = that count, totalPages = 1 (0 when empty) — and
pin both the populated and empty semantics with controller tests.
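The semantics of such a factory can be sketched like this. TopNResultSketch is a hypothetical stand-in; the field names and generic shape of the real PersonSearchResult are assumptions.

```java
import java.util.List;

/** Hypothetical stand-in for PersonSearchResult with only the paging fields. */
record TopNResultSketch<T>(List<T> items, long totalElements, int pageSize, int totalPages) {

    /**
     * A top-N query returns the complete result, so the metadata reports
     * exactly what came back: one page holding everything, or zero pages when empty.
     */
    static <T> TopNResultSketch<T> topN(List<T> top) {
        int count = top.size();
        return new TopNResultSketch<>(List.copyOf(top), count, count, count == 0 ? 0 : 1);
    }
}
```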
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The locals.user.groups.some(...WRITE_ALL) derivation was copy-pasted across
the persons directory, persons review and the two document loaders touched by
this PR. Extract a single tested hasWriteAll(locals) helper in
$lib/shared/server and reuse it, removing the ad-hoc casts.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>