Commit Graph

643 Commits

Author SHA1 Message Date
Marcel
80f6468d52 refactor(tag): use orElseThrow over Optional.get in findOrCreate (#730)
The lowest-id tie-break stream is guarded non-empty, so .get() never
throws — but the project bans Optional.get(). Switch to .orElseThrow()
for the project idiom. No behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 11:05:45 +02:00
Marcel
d000170f52 fix(tag): resolve case-colliding tag names without throwing (#730)
findOrCreate used tagRepository.findByNameIgnoreCase, which returns
Optional<Tag> and threw NonUniqueResultException whenever two tags
collided case-insensitively (a canonical parent and its same-named
lowercase child). Every document carrying such a tag became un-editable:
any save re-resolves the whole tag set by name and blew up with a 500.

Replace the throwing lookup with exact-case-first resolution: findByName
(exact) → findAllByNameIgnoreCase (lowest-id, deterministic, never
throws) → create. Delete findByNameIgnoreCase so the throwing call can't
be reintroduced. Case collisions are valid tree nodes — no migration, no
unique(lower(name)) constraint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 10:49:02 +02:00
Marcel
7316c51d4a refactor(document): share skip-null date-field resolution between save and projection (#726)
Extract effectivePrecision/effectiveMetaDateEnd/effectiveMetaDateRaw, used by both
applyDatePrecision (the real setters) and projectedState (the title projection), so the two
can no longer drift — addresses review feedback (Markus/Felix/Sara). Writing a stored value
back when the DTO omits a field is a harmless no-op, so behaviour is unchanged (185 existing
DocumentServiceTest cases stay green). Also documents the file-replace "treat as manual" path
inline at the reassignment site.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 17:08:51 +02:00
Marcel
26b45f1c78 feat(document): one-time backfill endpoint for stale auto-titles (#726)
Adds POST /api/admin/backfill-titles (ADMIN-only, synchronous) which rebuilds every
machine-generated title from the row's current state. A grammar heuristic
(DocumentTitleBackfillMatcher) decides overwritability: index matched literally via
startsWith (originalFilename is user-controlled — no regex injection / ReDoS, CWE-1333),
date-label forms derived from the same Locale.GERMAN formatters as the factory so they
cannot drift, prose left untouched, fail-closed on any surprise. Saves via the repository
directly (no recordVersion — follows backfillFileHashes), so the mechanical rename never
version-spams document_versions. Idempotent: a second run rewrites nothing. Emits one
SLF4J-parameterized scanned/updated/skipped line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:29:57 +02:00
Marcel
e6ce00035e feat(document): regenerate auto-title on save when date/location change (#726)
updateDocument now captures the machine title from the persisted state before any
setter runs, and rebuilds it from the new state only when the submitted title still
equals that machine value — an exact comparison that relies on the edit form
round-tripping an untouched title verbatim. A hand-written or freshly-typed title is
kept; a blank submission falls back to the rebuilt auto-title (title is always present);
a file-replaced document no longer matches its import-time title and is treated as
manual. projectedState mirrors the setter asymmetry exactly (date/location overwrite
incl. null-clear; precision/end/raw skip-null from the entity).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:20:46 +02:00
Marcel
b1f77bcfb6 refactor(document): extract title composition into shared DocumentTitleFactory (#726)
Move DocumentTitleFormatter from importing into the document package and
introduce DocumentTitleFactory there as the single source of truth for the
{index} – {dateLabel} – {location} formula. DocumentImporter now consumes the
factory instead of owning the composition; the document package owns the rule,
importing depends on it (not the reverse). No behavioral change — importer
title assertions and the #666 fixture parity test stay green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:15:00 +02:00
Marcel
4e68b81bf7 feat(document): remove conversation repository queries
Delete findConversation and findSinglePersonCorrespondence (no remaining
callers after the service methods were removed) and their integration
test section. Drops the now-unused LocalDate import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 10:26:54 +02:00
Marcel
985b31f71f feat(document): remove conversation service methods
Delete getConversationFiltered (the endpoint's only caller is gone) and
the dead 2-arg getConversation(personA, personB) which had zero callers,
along with both getConversationFiltered test blocks. The hasSender/
hasReceiver specifications stay — document search still uses them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 10:26:54 +02:00
Marcel
3fb312b1c6 feat(document): remove the conversation endpoint
Delete GET /api/documents/conversation and its controller handler — the
only client was the removed Briefwechsel view. Drops the now-unused Sort
import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 10:26:54 +02:00
Marcel
50f554680c refactor(document): drop the 5-minute Cache-Control TTL on /density (#709)
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m21s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Semgrep Security Scan (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / Unit & Component Tests (push) Successful in 3m21s
CI / OCR Service Tests (push) Successful in 19s
CI / Backend Unit Tests (push) Successful in 3m45s
CI / fail2ban Regex (push) Successful in 45s
CI / Semgrep Security Scan (push) Successful in 21s
CI / Compose Bucket Idempotency (push) Successful in 1m6s
The density chart is an interactive filter control; a 5-minute private
browser cache let it show stale month counts after an edit/upload/re-tag.
The in-memory aggregation is sub-200ms p95 over ~5k docs, so there is no
load reason to cache. Removing the explicit header lets Spring Security's
default no-store directive apply, so the response is always fresh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 19:56:50 +02:00
Marcel
ff7cfd4b1a fix(exception): log the violated constraint name at WARN (#678)
Addresses Tobias's review concern: the generic DataIntegrityViolation
backstop turned every integrity violation into a silent 400 with no
constraint name, no stack, no Sentry — an unanticipated write bug would
fail invisibly in production.

Now extract the constraint NAME from the cause chain (schema metadata, safe
for Loki) and log it parameterized at WARN, so the failure is debuggable.
Still never pass `ex`/`getMessage()` (SQL + values, CWE-209) and still no
Sentry — the response stays generic, so the response logic is not brittle.

New test proves the WARN names the constraint but never carries the SQL.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 11:03:04 +02:00
Marcel
3a4c2c6225 feat(exception): backstop DataIntegrityViolation as a clean 400 (#678)
Add @ExceptionHandler(DataIntegrityViolationException) returning 400
VALIDATION_ERROR with a fixed constant message, so any integrity violation
that slips past the upstream guards (a future constraint, or the import
path) becomes a clean 400 instead of a 500 + Sentry alert (AC9).

Deliberately generic — it does not inspect which constraint failed. Never
echoes ex.getMessage() (constraint name + SQL, CWE-209), logs at WARN
without passing the exception (would re-leak the SQL to Loki), and does not
call Sentry.captureException.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 09:20:22 +02:00
Marcel
73f614bc3a feat(document): reject end date without RANGE precision (#678)
Add the second validateDateRange predicate mirroring
chk_meta_date_end_only_for_range, so a direct API client that sets an end
date without RANGE precision gets a clean 400 INVALID_DATE_RANGE instead of
a 500 (AC6). Shares the code with the end-before-start branch.

Also fix updateDocument_preservesStoredPrecision_whenDtoOmitsIt: its stored
fixture (MONTH + end date) is a state the DB CHECK forbids, so the
carried-over-state guard correctly rejects it. Switched to RANGE + end —
the only DB-valid non-null-end combo — preserving the test's intent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 09:17:52 +02:00
Marcel
a574d96351 feat(document): reject RANGE with end before start (#678)
Add ErrorCode.INVALID_DATE_RANGE and a validateDateRange guard on
DocumentService.updateDocument, run right after applyDatePrecision so it
fires before any save (updateDocumentTags persists earlier in the method).
Mirrors the V69 chk_meta_date_end_after_start CHECK: end >= start with a
null start allowed, using isBefore so equal dates stay valid. Turns a user
date typo into a clean 400 instead of a 500 + Sentry alert.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 09:12:54 +02:00
Marcel
2e44cab614 docs(document): explain the DensityFilters->SearchFilters bridge (#683)
Clarify at loadFilteredDates why the density path constructs a SearchFilters:
the two filter records are kept separate (density has no date/undated fields),
so it adapts here to reuse buildSearchSpec. Raised in the #702 review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 15:54:56 +02:00
Marcel
dcb57ffacd refactor(document): thread SearchFilters through the search chain (#683)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m21s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m26s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
Replace the long positional filter lists on the document search chain
with the SearchFilters record. searchDocuments now takes
(SearchFilters, DocumentSort, String dir, Pageable) and findIdsForFilter
takes a single SearchFilters; the four private helpers (buildSearchSpec,
runSearch, countUndatedForFilter, isPureTextRelevance) no longer carry a
positional 10-field filter list. The controller builds the record after
its existing tagOp/undated coercions; the density path adapts its
DensityFilters into a SearchFilters at the shared buildSearchSpec call.

The forced-undated count path is preserved via filters.withUndated(true),
so countUndatedForFilter still ignores the user's toggle (#668) while
runSearch honours it. No behaviour change.

Controller binding tests swap their positional any()/eq() matchers for
ArgumentCaptor<SearchFilters>, asserting captured.undated()/.status()/
.sender() — strictly stronger than the previous any()-soup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 15:20:13 +02:00
Marcel
1c961619f1 refactor(document): introduce SearchFilters record (#683)
Filter-only value object bundling the ten search predicates so the long
positional argument lists on the document search chain can be replaced
with one named record — killing the sender/receiver and from/to swap-bug
class. Mirrors the existing DensityFilters; carries a withUndated copy
accessor for the forced-undated count path. Unused as of this commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 15:07:10 +02:00
Marcel
2cdb48f4a4 refactor(document): compute hasTranscription only on the detail path (#697)
Move the hasTranscription existence query out of the shared getDocumentById
into a dedicated getDocumentDetail used solely by GET /api/documents/{id}.
The flag is only consumed by the detail page, so the extra EXISTS query no
longer runs for the many internal getDocumentById callers (e.g. the
Geschichte resolve loop and the dashboard resume path). Behaviour of the
detail endpoint is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 13:28:37 +02:00
Marcel
fc69758a92 feat(document): add server-computed hasTranscription to detail payload (#697)
getDocumentById now populates a transient hasTranscription boolean so the
document detail page can gate the transcription entry control at first
paint (no client store, no full block fetch, no layout shift).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 13:28:37 +02:00
Marcel
f55efda0d2 feat(transcription): expose hasBlocks on TranscriptionBlockQueryService (#697)
Domain-service wrapper over existsByDocumentId so other domains can ask
"does this document have any transcription blocks?" without reaching into
the repository.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 13:28:37 +02:00
Marcel
77eddfc599 feat(transcription): add existsByDocumentId block query (#697)
Cheap EXISTS query backing a server-side "has a transcription" signal so
read-only users can be offered the read view at first paint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 13:28:37 +02:00
Marcel
5ea47d4ec7 docs(tag): document the dual document counts on the tag tree (#698)
Record that getTagTree returns both documentCount (direct, read by admin
surfaces) and subtreeDocumentCount (rollup, read by the reader surfaces),
matching the corrected getTagTree JavaDoc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 12:57:41 +02:00
Marcel
138bf446e4 feat(tag): add subtree document-count rollup to tag tree (#698)
Add subtreeDocumentCount to TagTreeNodeDTO, populated by a new recursive-CTE
aggregate query that builds a tag closure and counts distinct documents per
ancestor subtree. The direct documentCount is unchanged; getTagTree now maps
both counts onto each node from two aggregate queries (no N+1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 12:57:41 +02:00
Marcel
516a0a3814 refactor(person): single source of truth for generation bounds (#689)
Markus flagged the 0/10 range was duplicated across five sites (DB
CHECK, both importers, DTO @Min/@Max, dropdown range). New
PersonGeneration.MIN_GENERATION / MAX_GENERATION constants are now
the canonical Java source; the DTO annotations and both importer
guards reference them. The V70 SQL CHECK comment now points at the
Java constants so future widening updates one Java class plus one
SQL literal (Flyway forbids rewriting the migration in place).

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 16:16:26 +02:00
Marcel
8f163f9b77 feat(import): warn on generation monotonicity violations (#689)
Inject RelationshipService into CanonicalImportOrchestrator and walk
PARENT_OF edges in the family network after both person loaders finish
(before documents). For every edge where child.generation is set and
not strictly deeper than parent.generation, log a WARN — soft check,
never fails the batch.

Reads through getFamilyNetwork() per the layering rule (orchestrator
never touches PersonRelationshipRepository directly). Curators see the
warning in the import log; the rest of the pipeline is unaffected so
data with curatorial gaps still loads cleanly.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:39:29 +02:00
Marcel
40511535eb feat(relationship): add generation to PersonNodeDTO + update all sites (#689)
PersonNodeDTO is a positional record. The optional Integer generation
field is inserted between deathYear and familyMember so all four
construction sites stay readable without a builder.

- RelationshipService.getFamilyNetwork → populates with
  person.getGeneration() (the Stammbaum's strict-rank source on the
  frontend).
- RelationshipInferenceService.findAllFor → populates the same way;
  inference UI does not consume it but the field travels along for
  consistency.
- RelationshipControllerTest fixtures pass null.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:35:40 +02:00
Marcel
a68a822c13 feat(import): pass generation from JSON in PersonTreeImporter (#689)
Reads the optional `generation` integer from the canonical tree JSON and
routes it into PersonUpsertCommand. Out-of-range values are skip-and-
warned with the same policy as the register importer.

Tree imports run after register (per CanonicalImportOrchestrator); a
tree-confirmed integer overwrites a register-parsed value — both sides
are "canonical" in preferHuman terms (neither is a human edit).

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:32:27 +02:00
Marcel
df0037cba2 feat(import): parse generation column in PersonRegisterImporter (#689)
Reads the optional `generation` cell by header name (REQUIRED_HEADERS is
not extended — REQ-IMP-001 backward-compat for older artifacts), parses
it through GENERATION_PATTERN (^\s*G?\s*(-?\d+)), and routes it into
PersonUpsertCommand.generation.

Out-of-range values (G 99, G -1) are skip-and-warned, never abort the
batch; the post-parse range guard mirrors the V70 CHECK constraint so
the DB never sees a value Bean Validation wouldn't accept.

Pinned with a parametrised CsvSource covering every shape from the
acceptance criteria plus a backward-compat test (artifact without a
generation column still imports, all upserts get generation=null).

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:30:31 +02:00
Marcel
dcb5585c64 feat(person): route generation through service write paths (#689)
- fromCanonical writes the imported generation into a new Person row.
- mergeCanonical routes existing/canonical generation through the
  existing preferHuman(Integer, Integer) overload so a human-edited
  value is never overwritten on re-import (ADR-025).
- updatePerson writes generation verbatim from the form DTO so a human
  can clear it back to null — same shape as birthYear/deathYear.
- createPerson(PersonUpdateDTO) writes generation so /persons/new flow
  doesn't silently drop a selected G value on create.

Pinned with five tests covering the four write paths plus the
documenting test that captures preferHuman's known limitation
(explicit human null is overwritten by a non-null canonical value —
same as birthYear/deathYear, deferred to a future helper rework if it
ever produces a user-visible bug).

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:27:11 +02:00
Marcel
1e77d6d98c feat(person): generation on PersonUpsertCommand + PersonUpdateDTO (#689)
Adds the optional generation field to both DTOs:

- PersonUpsertCommand gains Integer generation in the canonical-import
  builder chain; service wiring lands in the next commit.
- PersonUpdateDTO gains @Min(0)@Max(10) Integer generation, the form-path
  surface. The constraints mirror the V70 CHECK so validation fails fast
  at the controller before reaching the DB.

PersonControllerTest pins the validation behaviour: -1 → 400, 11 → 400,
null → 200, 3 → 200 for both PUT (update) and POST (create) paths. The
GlobalExceptionHandler maps MethodArgumentNotValidException to
VALIDATION_ERROR so the frontend's extractErrorCode keeps working.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:23:38 +02:00
Marcel
f22508ca91 feat(person): add nullable generation column to persons (#689)
Flyway V70: SMALLINT generation column with CHECK(0..10) and partial
index over non-null rows. Person.generation field surfaces it through
the JPA model. Pre-import rows and persons outside the curated family
graph legitimately stay null; the canonical importer (next commits)
back-fills via preferHuman so a human-edited value is never lost.

Refs #689

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 15:20:24 +02:00
Marcel
4cc725d546 refactor(importing): inject FileStreamOpener to remove test-only seam
DocumentImporter exposed a package-private openFileStream(File) so a
Mockito spy could force the IO-error branch of isPdfMagicBytes. The
test-only seam leaked into production: the method existed for testing,
not for any production extensibility.

Replace with a constructor-injected FileStreamOpener interface (single
abstract method, @FunctionalInterface) and a one-line
@Component DefaultFileStreamOpener delegate. Tests now inject a mock
opener instead of spying on the importer itself, which is also a more
idiomatic Mockito usage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:29:41 +02:00
Marcel
535594378a fix(importing): use receiver_names for provisional person display name
resolveReceivers passed the slug as both `sourceRef` AND `lastName`, so
an unresolved receiver "smith-john" became a provisional Person with
lastName="smith-john" — a regression of the existing senderName→Person
contract.

Fix: zip the parallel `receiver_person_ids` and `receiver_names`
columns by position (the normalizer emits them 1:1 like
sender_person_id/sender_name). When the names list is shorter than the
slugs list, fall back to slug-as-name for the missing entries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:26:28 +02:00
Marcel
e93b09f1e2 refactor(importing): split DocumentImporter.buildDocument into named applyX helpers
buildDocument was a ~30-line method mixing attribution routing, date
parsing, authoritative collection management, file metadata, and
computed flags. Split into five named helpers — applyAttribution,
applyDates, applyAuthoritativeAssociations, applyFileMetadata,
applyComputedFlags — each doing one job. Pure refactor; all 43 existing
DocumentImporterTest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 10:23:24 +02:00
Marcel
07300aeff7 fix(person): flip family_member on both endpoints when a family-graph relationship is added
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m39s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Failing after 3m45s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m4s
The canonical importer creates persons via PersonRegisterImporter first (no family_member
set) and then upserts them via PersonTreeImporter, but mergeCanonical never propagates
family_member to existing persons — so persons with imported relationships ended up
flagged family_member=false and never appeared in /api/persons family filters or the
family-network view.

RelationshipService is documented as the owner of the family_member flag, so the fix
lives there: addRelationship now sets family_member=true on both endpoints whenever the
relation type is PARENT_OF / SPOUSE_OF / SIBLING_OF (the same set getFamilyNetwork
filters by). Non-family types (FRIEND/COLLEAGUE/EMPLOYER/DOCTOR/NEIGHBOR/OTHER) leave
the flag alone — a family doctor isn't a family member. Extracted the type list as a
FAMILY_RELATION_TYPES constant and reused it in getFamilyNetwork for a single source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 09:15:37 +02:00
Marcel
9d9cd644ec Merge remote-tracking branch 'origin/main' into HEAD
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m30s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m46s
CI / fail2ban Regex (pull_request) Failing after 46s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
# Conflicts:
#	frontend/src/lib/shared/dashboard/ReaderRecentDocs.svelte.spec.ts
#	frontend/src/routes/+page.server.ts
2026-05-27 22:16:26 +02:00
Marcel
f96b9fbffc feat(importing): log import-row breadcrumbs and distinguish skip outcomes
Address PR #687 review concerns on DocumentImporter:
- Tobias: thread a 1-based source row number into importRow so the
  "index rejected" skip log carries a breadcrumb (the row number, never
  the raw hostile index) for post-import triage.
- Elicit: emit a distinct log when a valid index has no <index>.pdf on
  disk (normal PLACEHOLDER) so it is not conflated with a rejected index.
- Nora: add a log.warn in resolvePdfByIndex's getCanonicalPath IOException
  branch so the quiet fail-safe skip surfaces in ops, distinct from the
  deliberate symlink-escape abort.
- Felix: replace inline fully-qualified java.util.regex.Pattern with an
  import.
- Nora: document that \d is intentionally ASCII-only (do not add
  UNICODE_CHARACTER_CLASS).

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
f5eb227239 feat(importing): resolve import PDFs directly by index
The corpus is uniform — every PDF is <index>.pdf flat in the import
dir — so resolve a document's PDF with an O(1) importDir.resolve(index
+ ".pdf") lookup instead of a recursive directory walk over the file
column. The index is validated against a strict catalog pattern
(1–4 Latin letters incl. umlauts, hyphen(s), digits, optional x) plus
the ported separator/dot/dotdot/null/slash-homoglyph/absolute-path
guards, and the resolved canonical path is asserted to stay inside the
import dir as defense-in-depth. The %PDF magic-byte check still gates
upload; status UPLOADED/PLACEHOLDER and the index→originalFilename
upsert key are unchanged. The file column and findFileRecursive walk
are gone, and the security regression tests now assert a malicious or
garbage index is rejected and a valid index resolves to exactly
importDir/<index>.pdf within containment.

Closes #686
Closes #676

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00
Marcel
7183d15fe5 fix(document): restore pure-text-relevance FTS fast path past undated count
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m29s
CI / OCR Service Tests (pull_request) Successful in 25s
CI / Backend Unit Tests (pull_request) Successful in 3m52s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
The global undated-count rework moved the pure-text-RELEVANCE shortcut
into runSearch, where it ran after the unconditional
findAllMatchingIdsByFts call. That routed pure-text relevance through the
in-memory id path and returned empty match data, breaking FTS rank order
and snippet/offset enrichment.

Hoist the shortcut back to the top of searchDocuments so it short-circuits
to findFtsPageRaw before findAllMatchingIdsByFts, while still computing the
global undatedCount for all non-fast-path searches.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 21:04:48 +02:00
Marcel
b52bf60913 fix(document): tie-break equal-date DATE sort by title asc, not createdAt
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 3m2s
CI / OCR Service Tests (pull_request) Successful in 24s
CI / Backend Unit Tests (pull_request) Failing after 3m54s
CI / fail2ban Regex (pull_request) Successful in 47s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
Owner decision (#668): when two documents share a meta_date, order them by
title ascending instead of createdAt ascending. title is @Column(nullable=false)
so it is always present, giving a deterministic, human-meaningful total order.
Only the DATE-sort fast path changes; the in-memory SENDER/RECEIVER/RELEVANCE
comparators are untouched.

ORDER BY meta_date <dir> NULLS LAST, title ASC

Tests assert title-asc tiebreaking for same-date rows in BOTH directions, with a
fixture whose title order is the OPPOSITE of insertion (createdAt) order so the
test fails if the tiebreaker reverts to createdAt. The integration test drives
the production resolveSort against real Postgres.

Refs #668
2026-05-27 20:21:18 +02:00
Marcel
a3c3f14aea feat(documents): return global undated count in search response
The undated bucket count was page-local — derived from the year-grouping
of the current page's items, so it could never exceed the page size. The
owner's decision is for it to reflect ALL undated documents matching the
active filter across every page.

Add an undatedCount field to DocumentSearchResult, computed once per search
via a COUNT over the same filter spec with undatedOnly(true) forced —
independent of the "Nur undatierte" toggle so it never collapses to the
page slice or double-counts. A from/to range excludes undated rows by the
collision rule, so the count is legitimately 0 inside a date range.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:42:32 +02:00
Marcel
eacfd15f8e refactor(document): revert resolveSort to private
No test calls resolveSort directly — the sort tests assert through
searchDocuments + ArgumentCaptor<Pageable>, so the package-private widening
added no value. Narrow the API surface back to private.

Refs #668

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:06:16 +02:00
Marcel
268c31a49b feat(document): thread an undated filter through search and the /ids path
Adds an optional `undated` query param to GET /api/documents/search and
/api/documents/ids, threaded through searchDocuments and findIdsForFilter
into the shared buildSearchSpec via undatedOnly(boolean). undated=true also
bypasses the pure-text RELEVANCE SQL shortcut, which skips buildSearchSpec
and would otherwise drop the predicate. The read GET stays unguarded
(WebMvc authz test pins 200 for an authenticated user, 401 unauthenticated).
A locking test proves the in-memory SENDER sort keeps undated letters under
their sender.

Refs #668
2026-05-27 18:42:17 +02:00
Marcel
39a462b2bb feat(document): add undatedOnly Specification for the undated-only filter
undatedOnly(false) is a no-op (null predicate); undatedOnly(true) returns
documentDate IS NULL, matching the existing hasStatus null-as-no-op pattern.
Real-Postgres tests pin the load-bearing guarantees H2 cannot prove: ASC
NULLS-LAST ordering, BETWEEN excludes null-dated rows, and that undated=true
combined with a from/to range returns empty (the collision rule).

Refs #668
2026-05-27 18:34:10 +02:00
Marcel
5f2ef823e1 fix(document): order undated documents last on the DATE sort fast path
resolveSort produced Sort.by(direction, "documentDate") with NATIVE null
handling, so Postgres surfaced undated (null meta_date) documents FIRST on
an ASC sort. Apply nullsLast() so undated rows order last for both ASC and
DESC, with a createdAt-asc tiebreaker for a stable total order when every
row is null-dated (the upcoming "Nur undatierte" filter).

Refs #668
2026-05-27 18:31:40 +02:00
Marcel
1e3e420860 fix(person): report honest totals on the non-paged top-N persons path
The legacy sort=documentCount path wrapped its result with paged(top, 0,
safeSize, top.size()), so totalElements/pageSize looked like a paged slice of
a larger set when in fact the top-N query returns the complete result. Add a
dedicated PersonSearchResult.topN factory that reports reality — totalElements
= returned count, pageSize = that count, totalPages = 1 (0 when empty) — and
pin both the populated and empty semantics with controller tests.

Refs #667

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 14:19:00 +02:00
Marcel
529c92fcc3 feat(person): paginate GET /api/persons and add confirm/delete endpoints
GET /api/persons now returns PersonSearchResult with server-side filter params
(type, familyOnly, hasDocuments, provisional) and page/size bounds (@Min/@Max
-> 400). review=true drops the clean reader default. The legacy
sort=documentCount top-N path is folded into the paged contract. Add
PATCH /{id}/confirm and DELETE /{id}, both WRITE_ALL-guarded. Remove the now
unreachable PersonService.findAll(String).

BREAKING-CHANGE: GET /api/persons response shape changes from a bare list to
PersonSearchResult { items, totalElements, pageNumber, pageSize, totalPages }.

Refs #667

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 13:33:10 +02:00
Marcel
ec357ac13c feat(person): add paged search, confirm and delete to PersonService
PersonService.search maps a PersonFilter to the paired slice/count repository
queries and returns a PersonSearchResult with a server-side total. confirmPerson
clears the provisional flag (the state transition behind PATCH /confirm).
deletePerson detaches sender/receiver document references before the hard delete
so it cannot orphan an FK.

Refs #667

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 13:30:14 +02:00
Marcel
a24764e58a feat(person): add filter-aware paged repository queries
Add PersonSearchResult (mirrors DocumentSearchResult shape) and PersonFilter
records, plus paired findByFilter/countByFilter native queries sharing one
WHERE clause so the rendered page and totalElements can never drift. Filters
(type, familyOnly, hasDocuments, provisional, readerDefault, q) each disable
via a null/false param. Tested against real Postgres via Testcontainers.

Refs #667

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 13:27:39 +02:00
Marcel
728078f1e5 fix(dates): preserve stored date precision when edit omits it
updateDocument unconditionally set metaDatePrecision/End/Raw from the DTO,
so saving an unrelated edit (a multipart PUT where the form omits the
precision controls) clobbered the stored precision with null — fabricating
a precision the user never chose. Apply each field only when the DTO carries
it, mirroring the existing metadataComplete/scriptType guards.

Refs #666
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 12:34:58 +02:00