diff --git a/docs/adr/022-eager-to-lazy-fetch-strategy.md b/docs/adr/022-eager-to-lazy-fetch-strategy.md new file mode 100644 index 00000000..88e6bc7c --- /dev/null +++ b/docs/adr/022-eager-to-lazy-fetch-strategy.md @@ -0,0 +1,104 @@ +# ADR-022 — EAGER→LAZY Fetch Strategy for Document Collections + +**Date:** 2026-05-18 +**Status:** Accepted +**Issue:** #467 +**PR:** #622 + +--- + +## Context + +A pre-production query audit of 24 HTTP requests to the document list and detail endpoints +produced **2,733 SQL statements** — primarily N+1 queries caused by `FetchType.EAGER` on +`Document.receivers`, `Document.tags`, `Document.trainingLabels`, and `Document.sender`. + +With EAGER fetch, every `Document` loaded by any repository method immediately triggers +additional `SELECT` statements for each associated collection, regardless of whether the +caller needs those associations. For a list of 100 documents, this means up to 400 extra +queries for `receivers` alone. + +--- + +## Decision + +Switch all four associations to `FetchType.LAZY` and use a two-tier strategy to load exactly +what each code path needs: + +**Tier 1 — Named entity graphs on `Document` + `@EntityGraph` overrides on `DocumentRepository`:** + +- `Document.full` — loads `sender`, `receivers`, `tags` — used by `findById` (detail view) +- `Document.list` — loads `sender`, `tags` — used by `findAll(Spec, Pageable)`, + `findAll(Spec)`, and `findAll(Pageable)` (list/search/dashboard paths) + +Each repository method that is called from a hot code path has an `@EntityGraph` override +that declares exactly which associations to JOIN-fetch, collapsing N+1 into 1–2 queries. + +**Tier 2 — `@BatchSize(50)` fallback on all four associations:** + +For any lazy access path not covered by an entity graph (e.g., a future ad-hoc query or an +in-memory sort that touches `trainingLabels`), Hibernate batches the secondary `SELECT` to +at most one statement per 50 entities instead of one per entity. + +**Session lifetime for post-return lazy access:** + +`getDocumentById` and `getRecentActivity` return entities to callers that may access lazy +associations after the repository call returns. Both methods are annotated +`@Transactional(readOnly = true)` to keep the Hibernate session open until the service method +returns, making those post-return accesses safe. + +This is an intentional exception to the project convention that read methods are not annotated +(see `CLAUDE.md §Services`). The convention remains correct for all other read methods; this +exception applies only to methods that serve lazy-initialized associations to their callers. + +--- + +## Alternatives Considered + +### `@BatchSize`-only (no entity graphs) + +`@BatchSize(50)` on all associations would eliminate the worst N+1 cases (100 documents → 2 +batch queries instead of 100 individual queries) without requiring repository overrides. Simpler +to maintain — no named graph definitions, no per-method overrides. + +Rejected because batch loading is best-effort: it depends on what Hibernate happens to find in +the first-level cache and produces a variable number of statements. Entity graphs produce a +deterministic, verifiable statement count that can be asserted in tests. The query-count test +suite (`DocumentRepositoryTest`) validates the exact statement bounds on every CI run. + +### Single unified entity graph (`Document.full` everywhere) + +Loading `receivers` on every list query is wasteful — the document list view only needs +`sender` and `tags`. `receivers` is a `@ManyToMany` collection that, when JOIN-fetched together +with `tags`, forces Hibernate to split into two queries anyway (to avoid Cartesian product). +Using a single graph on list paths would load data the UI does not display. + +Rejected in favour of two graphs with distinct scopes: `Document.list` for list paths +(sender + tags), `Document.full` for detail paths (sender + receivers + tags). + +### `@Transactional` on the Spring Data repository methods + +Spring Data allows `@Transactional` on repository interfaces directly. This would keep the +session open for all calls to those methods without touching the service layer. + +Rejected because the transaction boundary belongs at the service layer — repositories should +not own transaction lifecycle. The service methods are the natural scope for "keep the session +open long enough for the caller to use the result." + +--- + +## Consequences + +- **Query count reduced from ~2,733 to ≤10 statements per 24 HTTP requests** — verified by + `DocumentRepositoryTest` query-count assertions and `DocumentLazyLoadingTest` smoke tests. +- **Read methods that return lazily-initialized entities must carry `@Transactional(readOnly = true)`.** + Any future service method that loads a `Document` and returns it to a caller that accesses + lazy associations must follow this pattern. Removing the annotation causes + `LazyInitializationException` in production. +- **New lazy code paths need an entity graph or `@BatchSize` review.** Any new + `DocumentRepository` method added to a hot code path should be assessed for N+1 risk and + given an `@EntityGraph` override if warranted. +- **`@JsonIgnoreProperties({"hibernateLazyInitializer", "handler"})` required on serialized lazy-proxy entities.** + `Person` and `Tag` carry this annotation to prevent Jackson from attempting to serialize + Hibernate proxy internals when the association is not initialized. Any new entity that is + used as a lazy association and serialized directly (without a DTO) needs the same annotation.