Files
familienarchiv/docs/adr/022-eager-to-lazy-fetch-strategy.md
Marcel 7342c60952 fix(document): fix test assertion structure + add entity graph decision comments
- Refactor DocumentLazyLoadingTest: pull value assertions (assertThat) out
  of assertThatCode lambdas so failures surface as AssertionError rather
  than "unexpected exception: AssertionError" (review item 1)
- Add @EntityGraph("Document.full") to findBySenderId, findByReceiversId,
  findConversation, and findSinglePersonCorrespondence — all return full
  Documents to the controller for JSON serialization (review item 2)
- Add "// Callers access only ..." comments to un-graphed methods where no
  lazy associations are touched: findByTags_Id, findByStatus,
  findByMetadataCompleteFalse(Sort), findByMetadataCompleteFalse(Pageable)
- Remove "what" inline comments from @Transactional(readOnly=true)
  on getRecentActivity and getDocumentById — the why is in ADR-022 (item 4)
- Add named-graph coupling consequence to ADR-022: Document.java and
  DocumentRepository.java graph name strings must stay in sync (item 5)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 09:23:30 +02:00

111 lines
5.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ADR-022 — EAGER→LAZY Fetch Strategy for Document Collections
**Date:** 2026-05-18
**Status:** Accepted
**Issue:** #467
**PR:** #622
---
## Context
A pre-production query audit of 24 HTTP requests to the document list and detail endpoints
produced **2,733 SQL statements** — primarily N+1 queries caused by `FetchType.EAGER` on
`Document.receivers`, `Document.tags`, `Document.trainingLabels`, and `Document.sender`.
With EAGER fetch, every `Document` loaded by any repository method immediately triggers
additional `SELECT` statements for each associated collection, regardless of whether the
caller needs those associations. For a list of 100 documents, this means up to 400 extra
queries for `receivers` alone.
---
## Decision
Switch all four associations to `FetchType.LAZY` and use a two-tier strategy to load exactly
what each code path needs:
**Tier 1 — Named entity graphs on `Document` + `@EntityGraph` overrides on `DocumentRepository`:**
- `Document.full` — loads `sender`, `receivers`, `tags` — used by `findById` (detail view)
- `Document.list` — loads `sender`, `tags` — used by `findAll(Spec, Pageable)`,
`findAll(Spec)`, and `findAll(Pageable)` (list/search/dashboard paths)
Each repository method that is called from a hot code path has an `@EntityGraph` override
that declares exactly which associations to JOIN-fetch, collapsing N+1 into 12 queries.
**Tier 2 — `@BatchSize(50)` fallback on all four associations:**
For any lazy access path not covered by an entity graph (e.g., a future ad-hoc query or an
in-memory sort that touches `trainingLabels`), Hibernate batches the secondary `SELECT` to
at most one statement per 50 entities instead of one per entity.
**Session lifetime for post-return lazy access:**
`getDocumentById` and `getRecentActivity` return entities to callers that may access lazy
associations after the repository call returns. Both methods are annotated
`@Transactional(readOnly = true)` to keep the Hibernate session open until the service method
returns, making those post-return accesses safe.
This is an intentional exception to the project convention that read methods are not annotated
(see `CLAUDE.md §Services`). The convention remains correct for all other read methods; this
exception applies only to methods that serve lazy-initialized associations to their callers.
---
## Alternatives Considered
### `@BatchSize`-only (no entity graphs)
`@BatchSize(50)` on all associations would eliminate the worst N+1 cases (100 documents → 2
batch queries instead of 100 individual queries) without requiring repository overrides. Simpler
to maintain — no named graph definitions, no per-method overrides.
Rejected because batch loading is best-effort: it depends on what Hibernate happens to find in
the first-level cache and produces a variable number of statements. Entity graphs produce a
deterministic, verifiable statement count that can be asserted in tests. The query-count test
suite (`DocumentRepositoryTest`) validates the exact statement bounds on every CI run.
### Single unified entity graph (`Document.full` everywhere)
Loading `receivers` on every list query is wasteful — the document list view only needs
`sender` and `tags`. `receivers` is a `@ManyToMany` collection that, when JOIN-fetched together
with `tags`, forces Hibernate to split into two queries anyway (to avoid Cartesian product).
Using a single graph on list paths would load data the UI does not display.
Rejected in favour of two graphs with distinct scopes: `Document.list` for list paths
(sender + tags), `Document.full` for detail paths (sender + receivers + tags).
### `@Transactional` on the Spring Data repository methods
Spring Data allows `@Transactional` on repository interfaces directly. This would keep the
session open for all calls to those methods without touching the service layer.
Rejected because the transaction boundary belongs at the service layer — repositories should
not own transaction lifecycle. The service methods are the natural scope for "keep the session
open long enough for the caller to use the result."
---
## Consequences
- **Query count reduced from ~2,733 to ≤10 statements per 24 HTTP requests** — verified by
`DocumentRepositoryTest` query-count assertions and `DocumentLazyLoadingTest` smoke tests.
- **Read methods that return lazily-initialized entities must carry `@Transactional(readOnly = true)`.**
Any future service method that loads a `Document` and returns it to a caller that accesses
lazy associations must follow this pattern. Removing the annotation causes
`LazyInitializationException` in production.
- **New lazy code paths need an entity graph or `@BatchSize` review.** Any new
`DocumentRepository` method added to a hot code path should be assessed for N+1 risk and
given an `@EntityGraph` override if warranted.
- **`@JsonIgnoreProperties({"hibernateLazyInitializer", "handler"})` required on serialized lazy-proxy entities.**
`Person` and `Tag` carry this annotation to prevent Jackson from attempting to serialize
Hibernate proxy internals when the association is not initialized. Any new entity that is
used as a lazy association and serialized directly (without a DTO) needs the same annotation.
- **Named graph strings in `Document.java` and `DocumentRepository.java` must stay in sync.**
The `@NamedEntityGraph(name = "Document.full")` / `@NamedEntityGraph(name = "Document.list")`
definitions on `Document` are referenced by string in every `@EntityGraph(value = "...")` on
`DocumentRepository`. If the names diverge (e.g. a graph is renamed in one place but not the
other), Spring Data throws at application startup. Always update both files together when
renaming or restructuring a named graph.