- Refactor DocumentLazyLoadingTest: pull value assertions (assertThat) out
of assertThatCode lambdas so failures surface as AssertionError rather
than "unexpected exception: AssertionError" (review item 1)
- Add @EntityGraph("Document.full") to findBySenderId, findByReceiversId,
findConversation, and findSinglePersonCorrespondence — all return full
Documents to the controller for JSON serialization (review item 2)
- Add "// Callers access only ..." comments to un-graphed methods where no
lazy associations are touched: findByTags_Id, findByStatus,
findByMetadataCompleteFalse(Sort), findByMetadataCompleteFalse(Pageable)
- Remove "what" inline comments from @Transactional(readOnly=true)
on getRecentActivity and getDocumentById — the why is in ADR-022 (item 4)
- Add named-graph coupling consequence to ADR-022: Document.java and
DocumentRepository.java graph name strings must stay in sync (item 5)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5.7 KiB
ADR-022 — EAGER→LAZY Fetch Strategy for Document Collections
Date: 2026-05-18 Status: Accepted Issue: #467 PR: #622
Context
A pre-production query audit of 24 HTTP requests to the document list and detail endpoints
produced 2,733 SQL statements — primarily N+1 queries caused by FetchType.EAGER on
Document.receivers, Document.tags, Document.trainingLabels, and Document.sender.
With EAGER fetch, every Document loaded by any repository method immediately triggers
additional SELECT statements for each associated collection, regardless of whether the
caller needs those associations. For a list of 100 documents, this means up to 400 extra
queries for receivers alone.
Decision
Switch all four associations to FetchType.LAZY and use a two-tier strategy to load exactly
what each code path needs:
Tier 1 — Named entity graphs on Document + @EntityGraph overrides on DocumentRepository:
Document.full— loadssender,receivers,tags— used byfindById(detail view)Document.list— loadssender,tags— used byfindAll(Spec, Pageable),findAll(Spec), andfindAll(Pageable)(list/search/dashboard paths)
Each repository method that is called from a hot code path has an @EntityGraph override
that declares exactly which associations to JOIN-fetch, collapsing N+1 into 1–2 queries.
Tier 2 — @BatchSize(50) fallback on all four associations:
For any lazy access path not covered by an entity graph (e.g., a future ad-hoc query or an
in-memory sort that touches trainingLabels), Hibernate batches the secondary SELECT to
at most one statement per 50 entities instead of one per entity.
Session lifetime for post-return lazy access:
getDocumentById and getRecentActivity return entities to callers that may access lazy
associations after the repository call returns. Both methods are annotated
@Transactional(readOnly = true) to keep the Hibernate session open until the service method
returns, making those post-return accesses safe.
This is an intentional exception to the project convention that read methods are not annotated
(see CLAUDE.md §Services). The convention remains correct for all other read methods; this
exception applies only to methods that serve lazy-initialized associations to their callers.
Alternatives Considered
@BatchSize-only (no entity graphs)
@BatchSize(50) on all associations would eliminate the worst N+1 cases (100 documents → 2
batch queries instead of 100 individual queries) without requiring repository overrides. Simpler
to maintain — no named graph definitions, no per-method overrides.
Rejected because batch loading is best-effort: it depends on what Hibernate happens to find in
the first-level cache and produces a variable number of statements. Entity graphs produce a
deterministic, verifiable statement count that can be asserted in tests. The query-count test
suite (DocumentRepositoryTest) validates the exact statement bounds on every CI run.
Single unified entity graph (Document.full everywhere)
Loading receivers on every list query is wasteful — the document list view only needs
sender and tags. receivers is a @ManyToMany collection that, when JOIN-fetched together
with tags, forces Hibernate to split into two queries anyway (to avoid Cartesian product).
Using a single graph on list paths would load data the UI does not display.
Rejected in favour of two graphs with distinct scopes: Document.list for list paths
(sender + tags), Document.full for detail paths (sender + receivers + tags).
@Transactional on the Spring Data repository methods
Spring Data allows @Transactional on repository interfaces directly. This would keep the
session open for all calls to those methods without touching the service layer.
Rejected because the transaction boundary belongs at the service layer — repositories should not own transaction lifecycle. The service methods are the natural scope for "keep the session open long enough for the caller to use the result."
Consequences
- Query count reduced from ~2,733 to ≤10 statements per 24 HTTP requests — verified by
DocumentRepositoryTestquery-count assertions andDocumentLazyLoadingTestsmoke tests. - Read methods that return lazily-initialized entities must carry
@Transactional(readOnly = true). Any future service method that loads aDocumentand returns it to a caller that accesses lazy associations must follow this pattern. Removing the annotation causesLazyInitializationExceptionin production. - New lazy code paths need an entity graph or
@BatchSizereview. Any newDocumentRepositorymethod added to a hot code path should be assessed for N+1 risk and given an@EntityGraphoverride if warranted. @JsonIgnoreProperties({"hibernateLazyInitializer", "handler"})required on serialized lazy-proxy entities.PersonandTagcarry this annotation to prevent Jackson from attempting to serialize Hibernate proxy internals when the association is not initialized. Any new entity that is used as a lazy association and serialized directly (without a DTO) needs the same annotation.- Named graph strings in
Document.javaandDocumentRepository.javamust stay in sync. The@NamedEntityGraph(name = "Document.full")/@NamedEntityGraph(name = "Document.list")definitions onDocumentare referenced by string in every@EntityGraph(value = "...")onDocumentRepository. If the names diverge (e.g. a graph is renamed in one place but not the other), Spring Data throws at application startup. Always update both files together when renaming or restructuring a named graph.