Address PR #687 review concern (Elicit): add an ADR-025 Consequences
entry noting INDEX_PATTERN accepts only the current corpus shape (<=4
Latin-1 letters, hyphens, ASCII digits, optional x) and must be revisited
deliberately if the catalog scheme grows (5-letter prefix, digit-led id,
non-Latin letter), since such rows would otherwise be skipped, not
imported. Also records the ASCII-only \d intent.
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address PR #687 review concerns on DocumentImporterTest:
- Sara/Felix: add catalog-shape reject tests that pass every char
pre-check but must fail INDEX_PATTERN — "J 0070" (space), "WXYZA-0001"
(5 letters), "12-0001" (no letter prefix), "W-0001X" (uppercase X).
Verified red against a weakened pattern, green against the real one,
so the pattern branch (not the char guards) is now pinned.
- Felix: restore the import java.io.OutputStream line (was over-deleted
and patched with a fully-qualified name).
- Sara: document why the resolvePdfByIndex getCanonicalPath IOException
branch is intentionally left uncovered (no deterministic injection
seam; the log.warn is the substantive fix).
Adjust the two reflective resolvePdfByIndex calls for the new rowNumber
parameter.
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address PR #687 review concerns on DocumentImporter:
- Tobias: thread a 1-based source row number into importRow so the
"index rejected" skip log carries a breadcrumb (the row number, never
the raw hostile index) for post-import triage.
- Elicit: emit a distinct log when a valid index has no <index>.pdf on
disk (normal PLACEHOLDER) so it is not conflated with a rejected index.
- Nora: add a log.warn in resolvePdfByIndex's getCanonicalPath IOException
branch so the quiet fail-safe skip surfaces in ops, distinct from the
deliberate symlink-escape abort.
- Felix: replace inline fully-qualified java.util.regex.Pattern with an
import.
- Nora: document that \d is intentionally ASCII-only (do not add
UNICODE_CHARACTER_CLASS).
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The mass-import card no longer parses an ODS spreadsheet and MassImportService
was deleted (#674); /import now holds the normalizer's canonical artifacts
(canonical-*.xlsx + canonical-persons-tree.json) plus <index>.pdf files, read
by the canonical importer. Fix the IMPORT_HOST_DIR descriptions in
DEPLOYMENT.md and docker-compose.prod.yml accordingly.
Refs #686
File resolution is now by index (<index>.pdf), not the datei/file
column. Update the ADR-025 security sub-decision and consequence (the
recursive walk and file column are gone; a bad index skips its row with
a loud SkipReason, a symlink-escape still aborts via the containment
assertion) and DEPLOYMENT §6 (PDFs must be named <index>.pdf flat in
the import dir).
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Regenerated from the source workbooks with the committed overrides; the
export schema now has 16 columns (no file). canonical-persons.xlsx and
canonical-tag-tree.xlsx were unchanged at the cell level (only openpyxl
zip-byte churn) and were left untouched to keep the diff minimal.
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The corpus is uniform — every PDF is <index>.pdf flat in the import
dir — so resolve a document's PDF with an O(1) importDir.resolve(index
+ ".pdf") lookup instead of a recursive directory walk over the file
column. The index is validated against a strict catalog pattern
(1–4 Latin letters incl. umlauts, hyphen(s), digits, optional x) plus
the ported separator/dot/dotdot/null/slash-homoglyph/absolute-path
guards, and the resolved canonical path is asserted to stay inside the
import dir as defense-in-depth. The %PDF magic-byte check still gates
upload; status UPLOADED/PLACEHOLDER and the index→originalFilename
upsert key are unchanged. The file column and findFileRecursive walk
are gone, and the security regression tests now assert a malicious or
garbage index is rejected and a valid index resolves to exactly
importDir/<index>.pdf within containment.
Closes#686Closes#676
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The import corpus is uniform: every PDF is named <index>.pdf, so the
file column (the spreadsheet's datei value) is redundant. Remove file
from CanonicalDocument, RawRow, _FIELDS, to_canonical, and DOC_COLUMNS,
plus the now-moot index_file_mismatch review flag/CSV/stat and the
datei header mapping. date_end and the tree person_id are kept.
Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The global undated-count rework moved the pure-text-RELEVANCE shortcut
into runSearch, where it ran after the unconditional
findAllMatchingIdsByFts call. That routed pure-text relevance through the
in-memory id path and returned empty match data, breaking FTS rank order
and snippet/offset enrichment.
Hoist the shortcut back to the top of searchDocuments so it short-circuits
to findFtsPageRaw before findAllMatchingIdsByFts, while still computing the
global undatedCount for all non-fast-path searches.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Owner decision (#668): when two documents share a meta_date, order them by
title ascending instead of createdAt ascending. title is @Column(nullable=false)
so it is always present, giving a deterministic, human-meaningful total order.
Only the DATE-sort fast path changes; the in-memory SENDER/RECEIVER/RELEVANCE
comparators are untouched.
ORDER BY meta_date <dir> NULLS LAST, title ASC
Tests assert title-asc tiebreaking for same-date rows in BOTH directions, with a
fixture whose title order is the OPPOSITE of insertion (createdAt) order so the
test fails if the tiebreaker reverts to createdAt. The integration test drives
the production resolveSort against real Postgres.
Refs #668
A screen reader announced the bare number ("Nur undatierte 42"). Add an
aria-label ("42 undatierte Dokumente") via a new i18n key and hide the
purely-visual digit with aria-hidden, so the toggle + count read sensibly.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "missing documentDate" test asserted the OLD bare em-dash; #668
replaced it with the "Datum unbekannt" badge via <DocumentDate>. Assert
the badge text and rename the misleading test title.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Surface the backend's global undatedCount on the "Nur undatierte" toggle as
a count chip — the total undated documents matching the current filter
across all pages, not the page slice. The loader forwards undatedCount
straight through (defaulting to 0); the chip hides at 0 and stays visible
regardless of the toggle state so it advertises the triage backlog size.
generate:api was hand-edited (undatedCount added to DocumentSearchResult) —
CI must re-run npm run generate:api to confirm parity.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The undated bucket count was page-local — derived from the year-grouping
of the current page's items, so it could never exceed the page size. The
owner's decision is for it to reflect ALL undated documents matching the
active filter across every page.
Add an undatedCount field to DocumentSearchResult, computed once per search
via a COUNT over the same filter spec with undatedOnly(true) forced —
independent of the "Nur undatierte" toggle so it never collapses to the
page slice or double-counts. A from/to range excludes undated rows by the
collision rule, so the count is legitimately 0 inside a date range.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The desktop right-column kept a leftover {#if doc.documentDate}…{:else}—{/if}
fallback that emitted a bare em-dash for undated documents, while the mobile
block already always rendered <DocumentDate>. DocumentDate defensively maps a
null date to the "Datum unbekannt" badge, so render it unconditionally — an
undated document is an absence, not an error, and never shows a bare "—".
Refs #668
The dated branch wrapped {label} in a flex span containing a single child
span — redundant nesting. Render the label directly in one span.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
"Datum unbekannt" is a semantically meaningful date surface, not decorative
chrome, so the 10px chip text is too small for the senior reader audience.
Bump to text-xs (≥12px) per the WCAG min-legible-text guidance.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the single-sender containsExactlyInAnyOrder check with a two-sender
fixture and ordered containsExactly proving an undated doc stays within its
sender group and never floats to the page head. Add a DESC-direction case for
in-memory-path symmetry and an undated=true + sort=SENDER case capturing the
Specification to prove undatedOnly is still applied on the person-sort path.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
No test calls resolveSort directly — the sort tests assert through
searchDocuments + ArgumentCaptor<Pageable>, so the package-private widening
added no value. Narrow the API surface back to private.
Refs #668
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Negative guarantee for #668: ChronikRow renders the activity timestamp
(happenedAt), and ActivityFeedItemDTO carries no document-date surface, so
no undated badge or "Datum unbekannt" letter-date label may appear. Pins
this as a regression fixture so a future change can't quietly add a date
chip to the activity feed.
Refs #668
SearchFilterBar gains an aria-pressed "Nur undatierte" toggle in the
advanced row (min-h-[44px] touch target, labels the state not the colour).
The documents page threads `undated` through the filter snapshot so it is a
shareable URL param picked up by both filter-change nav and pagination, and
flows into the bulk-edit "select all" /ids request. Toggling resets to page
0 via the existing implicit page-drop.
Refs #668
DocumentList gains from/to props; when a date range is active and yields no
results, the empty state shows the localized docs_range_excludes_undated
note instead of the generic copy, so the reader understands undated letters
aren't part of a range. Person-grouped modes keep undated letters under
their sender/receiver (badge-on-row, no synthetic sub-group).
Refs #668
DocumentRow rendered a bare em-dash for null-dated letters — a glyph a
screen reader announces as nothing. Both breakpoints now render the single
DocumentDate component unconditionally (no {#if}/—/{:else}), so the cue
cannot drift; its unknown state is a neutral metadata chip ("Datum
unbekannt", text-ink-3, ≥4.5:1 both themes) with a non-color calendar glyph,
never red/amber. Present dates render at honest precision via
formatDocumentDate ("Juni 1916", not a fabricated day).
Refs #668
Parses ?undated strictly (=== 'true', mirroring the tagOp clamp), forwards
it as undated || undefined so the absent case drops out of the query, and
returns the flag in page data for the control to reflect. Adds the
docs_filter_undated_only toggle label and the explanatory
docs_range_excludes_undated empty-state copy in de/en/es. The badge reuses
the existing date_precision_unknown ("Datum unbekannt") key from #677.
OpenAPI types hand-edited for the new undated query param on /search and
/ids — CI must run `npm run generate:api` to confirm parity with the spec.
Refs #668
Adds an optional `undated` query param to GET /api/documents/search and
/api/documents/ids, threaded through searchDocuments and findIdsForFilter
into the shared buildSearchSpec via undatedOnly(boolean). undated=true also
bypasses the pure-text RELEVANCE SQL shortcut, which skips buildSearchSpec
and would otherwise drop the predicate. The read GET stays unguarded
(WebMvc authz test pins 200 for an authenticated user, 401 unauthenticated).
A locking test proves the in-memory SENDER sort keeps undated letters under
their sender.
Refs #668
undatedOnly(false) is a no-op (null predicate); undatedOnly(true) returns
documentDate IS NULL, matching the existing hasStatus null-as-no-op pattern.
Real-Postgres tests pin the load-bearing guarantees H2 cannot prove: ASC
NULLS-LAST ordering, BETWEEN excludes null-dated rows, and that undated=true
combined with a from/to range returns empty (the collision rule).
Refs #668
resolveSort produced Sort.by(direction, "documentDate") with NATIVE null
handling, so Postgres surfaced undated (null meta_date) documents FIRST on
an ASC sort. Apply nullsLast() so undated rows order last for both ASC and
DESC, with a createdAt-asc tiebreaker for a stable total order when every
row is null-dated (the upcoming "Nur undatierte" filter).
Refs #668
Pure formatting (line wrap) so the file passes prettier --check; no behaviour
change.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add countByFilter parity coverage for the query (LIKE) path so the shared
FILTER_WHERE slice and count can't drift, and an integration test proving
deletePerson detaches a person referenced as both sender and receiver before
delete — the documents survive (sender nulled, receiver link removed) with no
FK orphan.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The legacy sort=documentCount path wrapped its result with paged(top, 0,
safeSize, top.size()), so totalElements/pageSize looked like a paged slice of
a larger set when in fact the top-N query returns the complete result. Add a
dedicated PersonSearchResult.topN factory that reports reality — totalElements
= returned count, pageSize = that count, totalPages = 1 (0 when empty) — and
pin both the populated and empty semantics with controller tests.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The locals.user.groups.some(...WRITE_ALL) derivation was copy-pasted across
the persons directory, persons review and the two document loaders touched by
this PR. Extract a single tested hasWriteAll(locals) helper in
$lib/shared/server and reuse it, removing the ad-hoc casts.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The role="switch" toggle set a fixed aria-label of "Zu prüfen (N)" while its
visible text flips to "Alle anzeigen" when active — a visible-text /
accessible-name mismatch (WCAG 2.5.3 Label in Name). Drop the aria-label so
the visible text is the accessible name; aria-checked carries the state.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The triage rename form reused persons_filter_type_person ("Person") and
persons_section_details ("Angaben zur Person") as the first/last-name field
labels, so a screen reader announced the wrong name for each input. Add
dedicated persons_field_first_name / persons_field_last_name keys (de/en/es)
and use them.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Align PersonCard's "unbestätigt" badge with the authoritative provisional
flag so the badge, the "Zu prüfen (N)" count and the /persons/review triage
list can never disagree. Empty/"?" name handling is now a separate
crash-safety concern: it still routes to the neutral placeholder glyph
(never a "?" initial) but no longer implies a badge on its own.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add /persons/review to the CLAUDE.md route tables and reflect the paged,
filtered directory plus the confirm/delete endpoints in the frontend
people-stories and backend persons C4 diagrams.
Closes#667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GET /api/persons now returns PersonSearchResult { items, … } instead of a bare
list. Update every caller: the dashboard top-persons path reads .items; the
unused full-list fetches in documents/new and documents/[id]/edit are dropped
(both pages use the self-fetching PersonTypeahead); the raw-fetch consumers
(PersonTypeahead, PersonMultiSelect, PersonMentionEditor) read body.items and
pass review=true so search still spans the whole directory. Specs updated to
the new envelope shape.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New WRITE-gated triage route lists provisional persons (one PersonReviewRow
each) with Merge (reuses POST /merge), Umbenennen (PUT), Bestätigen
(PATCH /confirm) and Löschen (DELETE behind the focus-trapped, Escape-dismissible
ConfirmDialog service). Actions run as form actions via use:enhance so they work
without JS and stay server-side permission-guarded; the loader is READ_ALL.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rewrite /persons: server-side filter chips (type, family-only, has-documents)
that AND within the clean reader default (familyMember OR documentCount > 0),
a writer-only show-all/Zu-prüfen toggle, and reused Pagination. Extract
PersonCard (fixes the null-lastName render crash and never shows a "?" initial —
provisional/UNKNOWN/"?" entries get a neutral placeholder avatar + a text+icon
"unbestätigt" badge, WCAG 1.4.1) and PersonFilterBar (44px aria-pressed chips,
role=switch toggle with the count in its accessible name). The loader applies
the reader restriction unless review=1 and surfaces a cheap needsReviewCount.
i18n keys added for de/en/es.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hand-edited frontend/src/lib/generated/api.ts to match the backend:
GET /api/persons now returns PersonSearchResult with the new filter/page/size
query params; adds PATCH /api/persons/{id}/confirm and DELETE /api/persons/{id}.
Generated offline (no dev backend running) — CI should re-run
`npm run generate:api` against the live spec to confirm parity.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GET /api/persons now returns PersonSearchResult with server-side filter params
(type, familyOnly, hasDocuments, provisional) and page/size bounds (@Min/@Max
-> 400). review=true drops the clean reader default. The legacy
sort=documentCount top-N path is folded into the paged contract. Add
PATCH /{id}/confirm and DELETE /{id}, both WRITE_ALL-guarded. Remove the now
unreachable PersonService.findAll(String).
BREAKING-CHANGE: GET /api/persons response shape changes from a bare list to
PersonSearchResult { items, totalElements, pageNumber, pageSize, totalPages }.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PersonService.search maps a PersonFilter to the paired slice/count repository
queries and returns a PersonSearchResult with a server-side total. confirmPerson
clears the provisional flag (the state transition behind PATCH /confirm).
deletePerson detaches sender/receiver document references before the hard delete
so it cannot orphan an FK.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add PersonSearchResult (mirrors DocumentSearchResult shape) and PersonFilter
records, plus paired findByFilter/countByFilter native queries sharing one
WHERE clause so the rendered page and totalElements can never drift. Filters
(type, familyOnly, hasDocuments, provisional, readerDefault, q) each disable
via a null/false param. Tested against real Postgres via Testcontainers.
Refs #667
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The top bar now renders document dates through formatDocumentDate, so a
DAY-precision date like 1923-04-15 renders as "15. April 1923" (de) via
Intl.DateTimeFormat — no longer the old short "15.04.1923". These two
browser-project specs still asserted the old short form and were never
updated (CI-only, not run locally by prior agents).
Refs #666
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DocumentDate.svelte passes the untrusted raw value via a prop named `raw`,
but the guard only matched metaDateRaw/documentDateRaw/rawDate — so a future
{@html raw} would slip past. Add `\braw\b` to the token list and a self-test
asserting the guard catches {@html raw}. Code is currently safe ({raw}); this
closes the defense-in-depth gap in the guard itself.
Refs #666
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@WebMvcTest multipart PUT asserting metaDatePrecision / metaDateEnd /
metaDateRaw form field names bind to the DTO. A rename on either side
silently drops the precision edit; the captured DTO catches it.
Refs #666
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
updateDocument unconditionally set metaDatePrecision/End/Raw from the DTO,
so saving an unrelated edit (a multipart PUT where the form omits the
precision controls) clobbered the stored precision with null — fabricating
a precision the user never chose. Apply each field only when the DTO carries
it, mirroring the existing metadataComplete/scriptType guards.
Refs #666
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Elicit asked that the "raw provenance shown on detail, not in list rows"
choice be captured as a product decision rather than a payload accident.
Add a code comment at the list-row DocumentDate render explaining
showRaw={false} and the intentional metaDateRaw omission from
DocumentListItem.
Refs #666
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The search results were mapped to a partial object then forced with
`as unknown as Document[]`. DocumentListItem already carries every field
the picker reads (id, title, documentDate, metaDatePrecision REQUIRED,
metaDateEnd), so introduce a DocumentOption Pick type and drop the
double-cast — the mapped objects are now honestly typed.
Refs #666
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The end-date input used px-2 py-3 with no min-h while the sibling
precision select sets min-h-[48px]. Add min-h-[48px] so the RANGE form
is uniformly senior-friendly (WCAG 2.2 2.5.8, matches the select).
Refs #666
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DAY precision routed through formatDate() which hard-coded de-DE, so an
en/es reader saw the German month name ("24. Dezember 1943"). Route DAY
through Intl.DateTimeFormat(locale, …) like the other branches, keeping
the T12:00:00 UTC-safety convention. Add en/es DAY+MONTH parity cases to
docs/date-label-fixtures.json (TS-only; the Java title formatter stays
German by design) and assert them in the spec.
Refs #666
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>