marcel/familienarchiv

Marcel be15de6cc6

fix(document): fix test assertion structure + add entity graph decision comments

- Refactor DocumentLazyLoadingTest: pull value assertions (assertThat) out
  of assertThatCode lambdas so failures surface as AssertionError rather
  than "unexpected exception: AssertionError" (review item 1)
- Add @EntityGraph("Document.full") to findBySenderId, findByReceiversId,
  findConversation, and findSinglePersonCorrespondence — all return full
  Documents to the controller for JSON serialization (review item 2)
- Add "// Callers access only ..." comments to un-graphed methods where no
  lazy associations are touched: findByTags_Id, findByStatus,
  findByMetadataCompleteFalse(Sort), findByMetadataCompleteFalse(Pageable)
- Remove "what" inline comments from @Transactional(readOnly=true)
  on getRecentActivity and getDocumentById — the why is in ADR-022 (item 4)
- Add named-graph coupling consequence to ADR-022: Document.java and
  DocumentRepository.java graph name strings must stay in sync (item 5)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-19 07:42:59 +02:00

ADR-022 — EAGER→LAZY Fetch Strategy for Document Collections

Date: 2026-05-18 · Status: Accepted · Issue: #467 · PR: #622

Context

A pre-production query audit of 24 HTTP requests to the document list and detail endpoints produced 2,733 SQL statements — primarily N+1 queries caused by FetchType.EAGER on Document.receivers, Document.tags, Document.trainingLabels, and Document.sender.

With EAGER fetch, every Document loaded by any repository method immediately triggers additional SELECT statements for each eager association, regardless of whether the caller needs it. For a list of 100 documents, this means up to 100 extra queries for receivers alone, and up to 400 across all four associations.
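The arithmetic behind those figures can be sketched in a few lines of Java. This models the worst case only: it assumes each eager association costs exactly one secondary SELECT per document and that no two documents share a sender (the persistence context deduplicates shared senders, so real counts can be lower). The class and method names are illustrative and not part of the codebase.

```java
/**
 * Worst-case statement count for loading a list of documents when every
 * association is FetchType.EAGER and is resolved by one secondary SELECT
 * per document (the N+1 pattern). Assumes no two documents share a sender.
 */
public class NPlusOneEstimate {

    /** One secondary SELECT per document for a single eager association. */
    static int extraPerAssociation(int documents) {
        return documents;
    }

    /** The list query itself plus one SELECT per document per eager association. */
    static int eagerStatements(int documents, int eagerAssociations) {
        return 1 + eagerAssociations * extraPerAssociation(documents);
    }

    public static void main(String[] args) {
        int documents = 100;
        int eagerAssociations = 4; // sender, receivers, tags, trainingLabels

        System.out.println("receivers alone:  " + extraPerAssociation(documents));                     // 100
        System.out.println("all four:         " + (eagerStatements(documents, eagerAssociations) - 1)); // 400
        System.out.println("total statements: " + eagerStatements(documents, eagerAssociations));       // 401
    }
}
```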

Decision

Switch all four associations to FetchType.LAZY and use a two-tier strategy to load exactly what each code path needs:

Tier 1 — Named entity graphs on Document + @EntityGraph overrides on DocumentRepository:

- Document.full: loads sender, receivers, and tags; used by findById (detail view)
- Document.list: loads sender and tags; used by findAll(Spec, Pageable), findAll(Spec), and findAll(Pageable) (list, search, and dashboard paths)

Each repository method that is called from a hot code path has an @EntityGraph override that declares exactly which associations to JOIN-fetch, collapsing N+1 into 1–2 queries.
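A sketch of what the Tier 1 definitions could look like. The graph names, attribute names, and overridden methods come from this ADR; the UUID id type, the Person and Tag element types (assumed to be the project's existing entities), and the omission of join-table mappings and trainingLabels are assumptions for illustration. Both types are shown in one listing for brevity; in the project they live in Document.java and DocumentRepository.java.

```java
import jakarta.persistence.Entity;
import jakarta.persistence.FetchType;
import jakarta.persistence.Id;
import jakarta.persistence.ManyToMany;
import jakarta.persistence.ManyToOne;
import jakarta.persistence.NamedAttributeNode;
import jakarta.persistence.NamedEntityGraph;
import jakarta.persistence.NamedEntityGraphs;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import org.hibernate.annotations.BatchSize;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.jpa.domain.Specification;
import org.springframework.data.jpa.repository.EntityGraph;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.JpaSpecificationExecutor;

// --- Document.java (excerpt) ---
@Entity
@NamedEntityGraphs({
    // Detail view: everything the controller serializes.
    @NamedEntityGraph(name = "Document.full", attributeNodes = {
        @NamedAttributeNode("sender"),
        @NamedAttributeNode("receivers"),
        @NamedAttributeNode("tags")
    }),
    // List, search and dashboard: receivers deliberately left out.
    @NamedEntityGraph(name = "Document.list", attributeNodes = {
        @NamedAttributeNode("sender"),
        @NamedAttributeNode("tags")
    })
})
class Document {

    @Id
    private UUID id; // assumed id type

    @ManyToOne(fetch = FetchType.LAZY)
    private Person sender;

    @ManyToMany(fetch = FetchType.LAZY)
    @BatchSize(size = 50) // Tier 2 fallback for paths without a graph
    private Set<Person> receivers = new HashSet<>();

    @ManyToMany(fetch = FetchType.LAZY)
    @BatchSize(size = 50)
    private Set<Tag> tags = new HashSet<>();

    // trainingLabels and remaining fields omitted
}

// --- DocumentRepository.java (excerpt) ---
interface DocumentRepository
        extends JpaRepository<Document, UUID>, JpaSpecificationExecutor<Document> {

    @Override
    @EntityGraph("Document.full")
    Optional<Document> findById(UUID id);

    @Override
    @EntityGraph("Document.list")
    Page<Document> findAll(Specification<Document> spec, Pageable pageable);

    @Override
    @EntityGraph("Document.list")
    List<Document> findAll(Specification<Document> spec);

    @Override
    @EntityGraph("Document.list")
    Page<Document> findAll(Pageable pageable);
}
```

Spring Data's @EntityGraph defaults to the FETCH graph type, so attributes listed in the graph are fetched eagerly for that query while everything else keeps its mapped (here LAZY) strategy.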

Tier 2 — @BatchSize(50) fallback on all four associations:

For any lazy access path not covered by an entity graph (e.g., a future ad-hoc query or an in-memory sort that touches trainingLabels), Hibernate batches the secondary SELECT to at most one statement per 50 entities instead of one per entity.
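A rough model of what the Tier 2 fallback buys. This is a sketch under stated assumptions, not Hibernate's algorithm: real batch fetching depends on which uninitialized proxies and collections are in the persistence context, and how IN-lists are sized or padded varies between Hibernate versions. It only illustrates the upper bound of ceil(n / 50) secondary statements per association; the class, method, and table names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Conceptual model of @BatchSize(size = 50): pending owner ids are grouped
 * into IN-lists of at most 50 ids instead of one SELECT per owner.
 */
public class BatchSizeModel {

    static final int BATCH_SIZE = 50;

    /** Splits ids into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    /** Secondary SELECTs needed for one association: ceil(owners / batchSize). */
    static int batchedStatements(int owners, int batchSize) {
        return (owners + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        List<Long> documentIds = new ArrayList<>();
        for (long id = 1; id <= 120; id++) {
            documentIds.add(id);
        }

        for (List<Long> batch : partition(documentIds, BATCH_SIZE)) {
            // Conceptually: SELECT ... FROM document_tags WHERE document_id IN (<batch>)
            // (document_tags is a hypothetical join-table name)
            System.out.println("IN-list of " + batch.size() + " ids starting at " + batch.get(0));
        }
        System.out.println("statements for 100 documents: " + batchedStatements(100, BATCH_SIZE)); // 2
    }
}
```

For 120 documents the model yields three IN-lists (50, 50, and 20 ids); for 100 documents it yields two statements, matching the figure quoted under Alternatives Considered.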

Session lifetime for post-return lazy access:

getDocumentById and getRecentActivity return entities to callers that may access lazy associations after the repository call returns. Both methods are annotated @Transactional(readOnly = true) to keep the Hibernate session open until the service method returns, making those post-return accesses safe.

This is an intentional exception to the project convention that read methods are not annotated (see CLAUDE.md §Services). The convention remains correct for all other read methods; this exception applies only to methods that serve lazy-initialized associations to their callers.
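In service code the pattern looks roughly like the sketch below. The constructor injection style, the UUID id type, and the NoSuchElementException are assumptions; getRecentActivity would carry the same annotation, but its signature is not given in this ADR.

```java
import java.util.NoSuchElementException;
import java.util.UUID;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class DocumentService {

    private final DocumentRepository documentRepository;

    public DocumentService(DocumentRepository documentRepository) {
        this.documentRepository = documentRepository;
    }

    // Exception to the "no annotation on reads" convention (ADR-022):
    // the Hibernate session stays open until this method returns, so lazy
    // associations touched after findById(...) can still be initialized.
    @Transactional(readOnly = true)
    public Document getDocumentById(UUID id) {
        return documentRepository.findById(id)
                .orElseThrow(() -> new NoSuchElementException("Document not found: " + id));
    }
}
```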

Alternatives Considered

`@BatchSize`-only (no entity graphs)

@BatchSize(50) on all associations would eliminate the worst N+1 cases (for 100 documents, 2 batch queries per association instead of 100 individual queries) without requiring repository overrides. It is also simpler to maintain: no named graph definitions and no per-method overrides.

Rejected because batch loading is best-effort: it depends on what Hibernate happens to find in the first-level cache and produces a variable number of statements. Entity graphs produce a deterministic, verifiable statement count that can be asserted in tests. The query-count test suite (DocumentRepositoryTest) validates the exact statement bounds on every CI run.

Single unified entity graph (`Document.full` everywhere)

Loading receivers on every list query is wasteful: the document list view only needs sender and tags. The receivers association is a @ManyToMany collection that, when JOIN-fetched together with tags, forces Hibernate to split into two queries anyway (to avoid a Cartesian product). Using a single graph on list paths would load data the UI does not display.

Rejected in favour of two graphs with distinct scopes: Document.list for list paths (sender + tags), Document.full for detail paths (sender + receivers + tags).
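Why fetching receivers and tags in the same JOIN is costly can be seen from row counts alone. The sketch below is pure arithmetic, assuming both collections are non-empty for every document; the per-document sizes (5 receivers, 8 tags) are made-up figures, not measurements from this project.

```java
/**
 * Per-page row counts when two @ManyToMany collections are fetched for the
 * same documents. Assumes both collections are non-empty for every document.
 */
public class CartesianRows {

    /** Both collections JOIN-fetched in one statement: one row per (receiver, tag) pair. */
    static long joinedRows(int documents, int receiversPerDoc, int tagsPerDoc) {
        return (long) documents * receiversPerDoc * tagsPerDoc;
    }

    /** Each collection loaded by its own statement: the row counts simply add up. */
    static long splitRows(int documents, int receiversPerDoc, int tagsPerDoc) {
        return (long) documents * (receiversPerDoc + tagsPerDoc);
    }

    public static void main(String[] args) {
        // Hypothetical page: 100 documents, 5 receivers and 8 tags each.
        System.out.println("single JOIN: " + joinedRows(100, 5, 8)); // 4000
        System.out.println("two queries: " + splitRows(100, 5, 8));  // 1300
    }
}
```

The product grows multiplicatively with collection sizes while the split grows additively, which is why receivers stays out of Document.list.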

`@Transactional` on the Spring Data repository methods

Spring Data allows @Transactional on repository interfaces directly. This would keep the session open for all calls to those methods without touching the service layer.

Rejected because the transaction boundary belongs at the service layer — repositories should not own transaction lifecycle. The service methods are the natural scope for "keep the session open long enough for the caller to use the result."

Consequences

- Query count reduced from ~2,733 to ≤10 statements per 24 HTTP requests, verified by the DocumentRepositoryTest query-count assertions and the DocumentLazyLoadingTest smoke tests.
- Read methods that return entities with lazily initialized associations must carry @Transactional(readOnly = true). Any future service method that loads a Document and returns it to a caller that accesses lazy associations must follow this pattern. Removing the annotation causes LazyInitializationException in production.
- New lazy code paths need an entity graph or @BatchSize review. Any new DocumentRepository method added to a hot code path should be assessed for N+1 risk and given an @EntityGraph override if warranted.
- @JsonIgnoreProperties({"hibernateLazyInitializer", "handler"}) required on serialized lazy-proxy entities. Person and Tag carry this annotation to prevent Jackson from attempting to serialize Hibernate proxy internals when the association is not initialized. Any new entity that is used as a lazy association and serialized directly (without a DTO) needs the same annotation.
- Named graph strings in Document.java and DocumentRepository.java must stay in sync. The @NamedEntityGraph(name = "Document.full") / @NamedEntityGraph(name = "Document.list") definitions on Document are referenced by string in every @EntityGraph(value = "...") on DocumentRepository. If the names diverge (e.g. a graph is renamed in one place but not the other), Spring Data throws at application startup. Always update both files together when renaming or restructuring a named graph.
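What the @JsonIgnoreProperties requirement looks like on an entity that is reached through a lazy association and serialized without a DTO. Tag is used as the example; its UUID id and name field are hypothetical placeholders.

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import java.util.UUID;

// Hibernate lazy proxies are runtime subclasses of the entity; these two
// properties belong to the proxy machinery, not to the domain model, so
// Jackson is told to skip them.
@Entity
@JsonIgnoreProperties({"hibernateLazyInitializer", "handler"})
public class Tag {

    @Id
    private UUID id; // assumed id type

    private String name; // hypothetical field

    // getters, setters and remaining fields omitted
}
```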
