Assert that when the same person id is returned by two different token
fetches, the person appears exactly once in the result -- pinning
fetchPool's putIfAbsent dedup so a future refactor can't silently
double-classify a candidate.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
AC#4 (maiden alias -> direct) and AC#5 (alias first name -> fetchable +
classifiable) were each split across PersonRepositoryTest (the fetch) and
PersonServiceTest (the classifier with stubs) -- nothing walked
searchByName -> resolveByName end-to-end on real Postgres. Add two tests
in the existing @DataJpaTest slice that build a real PersonService over
the autowired repositories, persist a person with a MAIDEN_NAME alias and
one with an alias firstName, and assert both classify as direct.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
resolveNames now delegates to PersonService.resolveByName and maps by match
strength: 1 direct → resolved (auto-select), ≥2 direct → ambiguous, 0 direct
with partials → ambiguous suggestions, 0 candidates → folded into full-text.
A single direct match no longer forces the picker when looser substring hits
coexist. The MAX_CANDIDATES cap moved into PersonService (after classification);
the MAX_NAME_LENGTH guard, resolved-cap overflow, and sender/receiver mapping
are preserved.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Token-set containment over all of a person's name components (firstName,
lastName, alias, each PersonNameAlias first+last, title) decides direct vs
partial. Orchestrates tokenize → cap(8) → fetch pool → classify → cap(10)
after classification, with an empty-token guard and a PII-free debug log of
the outcome bucket. MAX_TOKENS is a DoS control; the after-classify cap keeps a
direct match that sorts past position 10 among partials. Read-only transaction
keeps lazy nameAliases reachable during classification (ADR-022).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The direct-match classifier accepts alias firstName tokens, so the fetch must
surface candidates matchable only via an alias first name. Add a.firstName to
the searchByName LIKE clause (reuses the bound :query — injection-proof). The
person_name_aliases.first_name column already exists; no migration.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Lowercase, split on whitespace/hyphen/apostrophe, drop empties. Applied
symmetrically to query and candidate name components so "Anna-Maria" and
"Anna Maria" tokenize alike. Foundation for resolveByName direct matching.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Verifies the recursive CTE in findDescendantIdsByName expands a parent tag
to include all child IDs, and that findByNameContainingIgnoreCase matches
both parent and child names when the fragment appears in both.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers multi-tag match, no-match FTS fallback, mixed resolution, personRole
bypass, cap at 10, short-keyword skip, dedup, rawQuery suppression when all
keywords resolve, flag independence, colour propagation via resolveEffectiveColors,
and colour=null when depth constraint prevents resolution.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keywords that substring-match the tag taxonomy become OR-union tag filters;
non-matching keywords stay as FTS text. Resolved tags surface in the
NlQueryInterpretation as TagHint objects with effective colours. The
rawQuery fallback is now guarded by hadStructuredMatch to prevent
double-apply when all keywords resolve.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Positional record fields added; all 3 construction sites updated with neutral
defaults; NlQueryParserService wired for TagService (4th constructor arg);
NlQueryParserServiceTest and NlSearchControllerTest synced.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switch to wiremock-jetty12 artifact and force ee10 Jetty deps to 12.1.8
to resolve compatibility with Spring Boot 4's Jetty 12.1.8 core.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
findByName resolved via Optional<Person>
findByFirstNameIgnoreCaseAndLastNameIgnoreCase, which threw
NonUniqueResultException once two people shared a first+last name case-
insensitively (hans müller / Hans Müller) — a 500 on the routine upload path
(DocumentService.storeDocument sender resolution).
findByName now resolves exact-case → single case-insensitive match → else
empty. The sender path deliberately diverges from the alias path: an
ambiguous name leaves the sender UNSET rather than guessing the lowest id,
because correct provenance beats a confidently-wrong pre-fill a reviewer
won't re-check. The two new name queries use explicit HQL equality so a null
first name binds as `= NULL` (no match) instead of the derived-query fold to
`first_name IS NULL`, which would widen a last-name-only row in as a sender.
Pins the opaque error path (IncorrectResultSizeDataAccessException stays
INTERNAL_ERROR with no Hibernate/SQL/row-count leak) and extends ADR-032 with
the Person section.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
findOrCreateByAlias resolved via Optional<Person> findByAliasIgnoreCase,
which throws NonUniqueResultException once two aliases collide only by case
(müller / Müller) — a generic 500 on the importer path. Mirror the #730 tag
fix: resolve exact-case first, then the lowest-id case-insensitive sibling,
then create-when-absent (institution/group and maiden-name alias preserved).
The throwing Optional<…>IgnoreCase variant is deleted so it can't be reused.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
V71 gives transcription_block_mentioned_persons.person_id a real FK, so two
TranscriptionBlockMentionsRepositoryTest cases that inserted mention rows with
random (non-existent) person ids now violate fk_tbmp_person. Persist real
Person rows and use their ids. Caught by CI's full suite.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- AC-3 cascade test: assert an innocent bystander's mention row survives the
delete, proving the cascade is scoped to the deleted person (Nora).
- Fix integration-test comment: receivers is @ManyToMany(LAZY), not an EAGER
@ElementCollection (Sara).
- ADR-032: note the @ prefix is kept in the degraded path, stripped in live
mentions (Leonie).
- Add trailing newline to PersonRepository.java (Felix).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The deletePerson service-path guard (AC-4) is unchanged behaviourally, but its
comments described the removed reassignSenderToNull/deleteReceiverReferences
chain. Update them to the V71 ON DELETE cascade mechanism.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the explicit deleteReceiverReferences call from mergePersons — the
source's leftover receiver join rows now cascade-drop via V71's ON DELETE
CASCADE on deleteById. Remove the now-unused deleteReceiverReferences
repository method (and its repo test), and add clearAutomatically +
flushAutomatically to the remaining merge native queries so the L1 cache
cannot desync from the bulk updates. Rewrite the merge unit test with
verifyNoMoreInteractions and add an end-to-end merge regression test (AC-7).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the application-layer sender/receiver detach from deletePerson — the
V71 ON DELETE constraints now enforce it. Remove the now-unused
reassignSenderToNull repository method and rewrite the unit test to assert
only the existence check plus deleteById (verifyNoMoreInteractions).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add ON DELETE behaviour to the two V1 FKs into persons (documents.sender_id
-> SET NULL, document_receivers.person_id -> CASCADE) and a real FK with
ON DELETE CASCADE on the transcription_block_mentioned_persons soft reference,
cleaning up pre-existing orphan mention rows first. The cascade stays strictly
at the join/reference layer and never reaches documents rows.
Proven by new Postgres-backed PersonRepositoryTest cascade tests (AC-1/2/3/8
plus the cascade-boundary document-survival guard). Rewrites the now-stale
V56 'no FK' comment.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two adversarial gaps from PR #733 review:
- Unit: exact-case must win even when its id is NOT the lowest, proving
exact-case short-circuits before the lowest-id tie-break (a naive
"lowest id across all CI matches" would pick the wrong row).
- Integration: assert findAllByNameIgnoreCase folds the UPPERCASE
"GLÜCKWÜNSCHE" — the exact string findOrCreate passes — so the umlaut
proof matches the resolution path under test, not a lowercase probe.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mocked TagServiceTest can't prove the two things that actually broke:
that findAllByNameIgnoreCase folds umlauts the way Postgres LOWER() does,
and that saving a document tagged with a case-colliding tag no longer
throws NonUniqueResultException. Testcontainers postgres:16-alpine:
- updateDocument on a doc tagged with the child "weihnachten" succeeds
and keeps exactly the child tag (not the parent).
- findOrCreate("GLÜCKWÜNSCHE") resolves the Glückwünsche/glückwünsche
umlaut pair deterministically (lowest id) without throwing — the
regression catcher a plain-ASCII pair would miss.
- bulk-edit funnels through resolveTags → findOrCreate, guarding a
future refactor that bypasses it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
findOrCreate used tagRepository.findByNameIgnoreCase, which returns
Optional<Tag> and threw NonUniqueResultException whenever two tags
collided case-insensitively (a canonical parent and its same-named
lowercase child). Every document carrying such a tag became un-editable:
any save re-resolves the whole tag set by name and blew up with a 500.
Replace the throwing lookup with exact-case-first resolution: findByName
(exact) → findAllByNameIgnoreCase (lowest-id, deterministic, never
throws) → create. Delete findByNameIgnoreCase so the throwing call can't
be reintroduced. Case collisions are valid tree nodes — no migration, no
unique(lower(name)) constraint.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- save-time: precision+raw carry-over when the DTO omits them (exercises the shared skip-null
resolvers), and a RANGE label round-trip (Sara/Elicit)
- factory: a bare Document with a null index builds "" rather than NPE-ing (Felix)
- backfill matcher: negative near-misses — ASCII hyphen vs en dash, missing separator before
trailing text, year-with-trailing-letters, index followed by text without a separator (Sara)
- backfill integration: tighten the count assertion to exactly 1 on the clean test DB (Sara)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pins backfill behaviour on postgres:16-alpine (H2 unusable — title is NOT NULL): a stale
auto-title is rewritten, the sweep is idempotent (second run touches nothing), prose is
left alone, and the mechanical rename adds no document_versions rows. Permission (401/403)
stays in the faster @WebMvcTest slice.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds POST /api/admin/backfill-titles (ADMIN-only, synchronous) which rebuilds every
machine-generated title from the row's current state. A grammar heuristic
(DocumentTitleBackfillMatcher) decides overwritability: index matched literally via
startsWith (originalFilename is user-controlled — no regex injection / ReDoS, CWE-1333),
date-label forms derived from the same Locale.GERMAN formatters as the factory so they
cannot drift, prose left untouched, fail-closed on any surprise. Saves via the repository
directly (no recordVersion — follows backfillFileHashes), so the mechanical rename never
version-spams document_versions. Idempotent: a second run rewrites nothing. Emits one
SLF4J-parameterized scanned/updated/skipped line.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
updateDocument now captures the machine title from the persisted state before any
setter runs, and rebuilds it from the new state only when the submitted title still
equals that machine value — an exact comparison that relies on the edit form
round-tripping an untouched title verbatim. A hand-written or freshly-typed title is
kept; a blank submission falls back to the rebuilt auto-title (title is always present);
a file-replaced document no longer matches its import-time title and is treated as
manual. projectedState mirrors the setter asymmetry exactly (date/location overwrite
incl. null-clear; precision/end/raw skip-null from the entity).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move DocumentTitleFormatter from importing into the document package and
introduce DocumentTitleFactory there as the single source of truth for the
{index} – {dateLabel} – {location} formula. DocumentImporter now consumes the
factory instead of owning the composition; the document package owns the rule,
importing depends on it (not the reverse). No behavioral change — importer
title assertions and the #666 fixture parity test stay green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Delete findConversation and findSinglePersonCorrespondence (no remaining
callers after the service methods were removed) and their integration
test section. Drops the now-unused LocalDate import.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Delete getConversationFiltered (the endpoint's only caller is gone) and
the dead 2-arg getConversation(personA, personB) which had zero callers,
along with both getConversationFiltered test blocks. The hasSender/
hasReceiver specifications stay — document search still uses them.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The density chart is an interactive filter control; a 5-minute private
browser cache let it show stale month counts after an edit/upload/re-tag.
The in-memory aggregation is sub-200ms p95 over ~5k docs, so there is no
load reason to cache. Removing the explicit header lets Spring Security's
default no-store directive apply, so the response is always fresh.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses Sara's review concerns:
- Add a negative Testcontainers test: saveAndFlush of a RANGE with end < start
throws DataIntegrityViolationException, proving chk_meta_date_end_after_start
actually fires (H2 wouldn't) and exercising the backstop's trigger end-to-end.
Guards against silent app/DB drift if the service guard ever regresses.
- Tighten updateDocument_acceptsRange_whenEndAfterStart to assert the persisted
end value, not just that save was called.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses Tobias's review concern: the generic DataIntegrityViolation
backstop turned every integrity violation into a silent 400 with no
constraint name, no stack, no Sentry — an unanticipated write bug would
fail invisibly in production.
Now extract the constraint NAME from the cause chain (schema metadata, safe
for Loki) and log it parameterized at WARN, so the failure is debuggable.
Still never pass `ex`/`getMessage()` (SQL + values, CWE-209) and still no
Sentry — the response stays generic, so the response logic is not brittle.
New test proves the WARN names the constraint but never carries the SQL.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Testcontainers integration test persisting a RANGE doc with end == start
against real Postgres + Flyway, which (unlike H2) enforces the V69
chk_meta_date_end_after_start CHECK. Pins the app guard's isBefore
semantics to the actual >= constraint, guarding against app/DB drift (AC2).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add @ExceptionHandler(DataIntegrityViolationException) returning 400
VALIDATION_ERROR with a fixed constant message, so any integrity violation
that slips past the upstream guards (a future constraint, or the import
path) becomes a clean 400 instead of a 500 + Sentry alert (AC9).
Deliberately generic — it does not inspect which constraint failed. Never
echoes ex.getMessage() (constraint name + SQL, CWE-209), logs at WARN
without passing the exception (would re-leak the SQL to Loki), and does not
call Sentry.captureException.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the second validateDateRange predicate mirroring
chk_meta_date_end_only_for_range, so a direct API client that sets an end
date without RANGE precision gets a clean 400 INVALID_DATE_RANGE instead of
a 500 (AC6). Shares the code with the end-before-start branch.
Also fix updateDocument_preservesStoredPrecision_whenDtoOmitsIt: its stored
fixture (MONTH + end date) is a state the DB CHECK forbids, so the
carried-over-state guard correctly rejects it. Switched to RANGE + end —
the only DB-valid non-null-end combo — preserving the test's intent.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cover AC2 (end == start), AC3 (open-ended, end null) and AC4 (null start +
end set, which must not reject or NPE), plus end-after-start. Guards the
guard against future over-rejection that would diverge from the DB CHECK.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add ErrorCode.INVALID_DATE_RANGE and a validateDateRange guard on
DocumentService.updateDocument, run right after applyDatePrecision so it
fires before any save (updateDocumentTags persists earlier in the method).
Mirrors the V69 chk_meta_date_end_after_start CHECK: end >= start with a
null start allowed, using isBefore so equal dates stay valid. Turns a user
date typo into a clean 400 instead of a 500 + Sentry alert.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The /search path already pins the Boolean-undated->primitive coercion via
search_withoutUndatedParam_forwardsFalseToService; add the symmetric pin for
getDocumentIds so an absent param provably resolves to undated=false on the
record (never NPE). Raised in the #702 review.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the ~29 repeated `new SearchFilters(null, null, null, null, null,
null, null, null, null, false)` literals across the search test suites with
a shared SearchFiltersFixtures.noFilters() factory (and noFilters()
.withUndated(true) for the undated-only case). Tests that pin a specific
field keep their explicit `new SearchFilters(...)` so intent stays visible.
Pure test-ergonomics cleanup raised in the #702 review; no behaviour change.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the long positional filter lists on the document search chain
with the SearchFilters record. searchDocuments now takes
(SearchFilters, DocumentSort, String dir, Pageable) and findIdsForFilter
takes a single SearchFilters; the four private helpers (buildSearchSpec,
runSearch, countUndatedForFilter, isPureTextRelevance) no longer carry a
positional 10-field filter list. The controller builds the record after
its existing tagOp/undated coercions; the density path adapts its
DensityFilters into a SearchFilters at the shared buildSearchSpec call.
The forced-undated count path is preserved via filters.withUndated(true),
so countUndatedForFilter still ignores the user's toggle (#668) while
runSearch honours it. No behaviour change.
Controller binding tests swap their positional any()/eq() matchers for
ArgumentCaptor<SearchFilters>, asserting captured.undated()/.status()/
.sender() — strictly stronger than the previous any()-soup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Round out the "read-only users can't write anything" boundary: a READ_ALL
principal is forbidden from posting a block comment, replying, and editing a
comment (the prior tests only used a no-authority principal for create).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the hasTranscription existence query out of the shared getDocumentById
into a dedicated getDocumentDetail used solely by GET /api/documents/{id}.
The flag is only consumed by the detail page, so the extra EXISTS query no
longer runs for the many internal getDocumentById callers (e.g. the
Geschichte resolve loop and the dashboard resume path). Behaviour of the
detail endpoint is unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>