refactor(document): switch Document.tags + receivers + trainingLabels to LAZY + @EntityGraph #467
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Context
backend/src/main/java/org/raddatz/familienarchiv/document/Document.javausesFetchType.EAGERon three associations:Pre-prod audit ran
pg_stat_statementsagainst the live stack and confirmed the predicted N+1:At 1,520 documents the latency is hidden (mean 0.07 ms each, total ~200 ms over 2,733 calls). At 10× data volume, every list/search response triples query count linearly.
The hot search path uses a hand-written native SQL with
LATERAL JOINso it doesn't suffer the N+1. The N+1 hits the list and detail-after-search paths — exactly the Reader Experience milestone's load profile.Approach
Switch to LAZY by default; declare
@EntityGraphs on read paths that need the associations.Document.javaNamed entity graphs
Repository
LazyInitializationException risk
spring.jpa.open-in-view: falseis already set (audit verified). That means lazy collections accessed outside the transactional service method will throw. This is good — it forces every access into a service method or a properly-fetched entity.The risk is in
*Mapper/DocumentServicemethods that read from collections after returning to the controller. Audit by:For every match, confirm it's either inside a
@Transactionalmethod or the call site uses an@EntityGraph-annotated repository method.Bonus:
@BatchSize(50)on collectionsFor the inevitable list-of-detail-views case where the graph isn't requested but multiple Documents' collections are accessed, add
@BatchSize(50)so Hibernate fetches in 50-row batches instead of one-by-one.Critical files
backend/src/main/java/org/raddatz/familienarchiv/document/Document.javabackend/src/main/java/org/raddatz/familienarchiv/document/DocumentRepository.java— add@EntityGraphannotationsbackend/src/main/java/org/raddatz/familienarchiv/document/DocumentService.java— confirm all collection accesses are in@Transactionalscopebackend/src/test/java/.../document/DocumentRepositoryTest.java— add a test that asserts query count forfindByIdandfindAllVerification
Hibernate.unproxy()+@SpringBootTestwith statistics enabled. After loading 50 documents viafindAll(...), query count should be ≤2 (the page query + 1 batch fetch via @BatchSize), not 50+.pg_stat_statements: enablepg_stat_statements, exercise the same 24-call sequence the audit used, confirmdocument_receiverscalls drop from 2,733 to ≤24 (1 per HTTP request that needs the collection).LazyInitializationExceptionanywhere in logs../mvnw testall green; coverage gate intact.Acceptance criteria
LAZY.findByIduses@EntityGraph("Document.full").findAll(Specification, Pageable)uses@EntityGraph("Document.list").@BatchSize(50)on receivers + tags.pg_stat_statementsshows ≤24 calls per 24 HTTP requests fordocument_receivers(1 per request, not 1 per row).LazyInitializationExceptionregressions in any existing test.Effort
M — 2 days. Most of the time is auditing every collection access for transactional scope.
Risk if not addressed
Linear query-count growth as data scales. At 10K documents and a 50-row search page, today's 60-query response becomes 600-query — likely the first 5xx-from-the-DB-pool incident in production.
Tracked in audit doc as F-17 (High, escalated from Medium after dynamic confirmation).
👨💻 Markus Keller — Application Architect
Observations
FetchType.EAGERon three collections (receivers,tags,trainingLabels) generates the confirmed N+1 pattern. The audit numbers are credible — 2,733 collection-loading queries for 24 HTTP requests is exactly what EAGER on a paged list produces.open-in-view: falseis already set (confirmed inapplication.yamlline 15), which is the right precondition for a safe LAZY migration. Good foundation.@NamedEntityGraphapproach is the correct JPA idiom.Document.full(receivers + tags + sender) for detail view,Document.list(sender only) for list — the boundary makes sense given that the list view renders tags as chips but not receivers inline.DocumentService.getRecentActivity()callsdocumentRepository.findAll(PageRequest...)without a specification, and the returned documents later have their tags resolved viaresolveDocumentTagColors(). This call path is NOT annotated@Transactional, so with LAZY collections it will throwLazyInitializationExceptionthe momentd.getTags().stream()is called inresolveDocumentTagColors. This is a critical path that the issue's grep audit would NOT catch because the method name isgetRecentActivity, notget*ById.DashboardService.buildCaption()accessesdoc.getReceivers()on a document fetched viadocumentService.getDocumentById().getDocumentById()is also not@Transactional, meaning the session is closed beforebuildCaption()is called. Currently safe because EAGER loads receivers into the entity. After LAZY, this becomes aLazyInitializationException. This is a cross-domain collection access that's invisible to the grep suggested in the issue.MassImportService(line 337-338) callsdoc.getReceivers().addAll(receivers)anddoc.getTags().add(tag)— but this method IS wrapped in@Transactional(line 263), so it's safe.DocumentVersionServiceaccessesgetReceivers()andgetTags()(lines 185-206). These are called from within@Transactionalwrite methods inDocumentService, so they're safe as long as the session is still open.@BatchSize(50)suggestion is a good belt-and-suspenders addition. It handles thefindSinglePersonCorrespondence/findConversation/getDocumentsBySenderpaths that returnList<Document>without going through an@EntityGraph-annotated repository method.Recommendations
@TransactionaltoDocumentService.getDocumentById()— it already callstagService.resolveEffectiveColors(doc.getTags())outside a transaction. Every caller that uses the returned document's collections (DashboardService, DocumentController detail route) depends on this being safe.@TransactionaltoDocumentService.getRecentActivity()or add an@EntityGraphhint to thefindAll(PageRequest)call inside it — both are valid; the@Transactionalis simpler.@NamedEntityGraphdefinitions belong onDocument.java(correct placement). Add@BatchSize(50)to the collection fields directly (Hibernate annotation), not as a separate repository config.@NamedEntityGraphover Spring Data projections or explicit JPQL JOIN FETCH queries has lasting consequences. Record it.DocumentRepositoryTestas the target test class. It currently uses@DataJpaTestwith real Postgres (good). The query-count assertion should use Hibernate'sStatisticsAPI viaSessionFactory— this is more deterministic thanpg_stat_statementsin a test context.Open Decisions
Document.listentity graph includetagsas well? The list view renders tags as colored chips on every card. Fetching sender-only infindAllsaves one join, but the tag color resolution still happens inresolveDocumentTagColors()— meaning tags will be lazy-loaded one document at a time there. IncludingtagsinDocument.listmay be the right call depending on how search results are rendered.👨💻 Felix Brandt — Fullstack Developer
Observations
DocumentService, collection accesses exist in:DashboardService.buildCaption()— accessesdoc.getReceivers()on a document returned fromdocumentService.getDocumentById(), which has no@Transactionalboundary. This will throwLazyInitializationExceptionpost-migration.DocumentService.getRecentActivity()(line 570-574) — not@Transactional, returnsDocumentlist and is followed immediately byresolveDocumentTagColors()which iteratesd.getTags().stream(). Session closed before the tag stream.DocumentService.applyBulkEditToDocument()(lines 491, 501) — inside@Transactional, safe.MassImportService(lines 337-338) — inside@Transactional, safe.DocumentVersionService(lines 185-206) — called from within write transactions, safe.DocumentService.updateDocumentTags()(line 381) is a public write method with@Transactionalmissing. It's called byapplyBatchMetadata(itself inside a transaction), so currently no problem, but it's worth noting as a smell — if someone calls it directly from a controller it would be non-transactional.@EntityGraphonfindByIdandfindAll(Specification, Pageable). The repository currently has nofindAll(Specification, Pageable)— it inheritsfindAll(Specification<T>, Pageable)fromJpaSpecificationExecutor. Overriding this in the interface with@EntityGraphis valid but requires care: it overrides the inherited method, which is fine in Spring Data JPA as long as the signature exactly matches.@NamedEntityGraphannotation must appear on theDocumententity class (not the repository). The placement in the issue spec is correct.@Builder.Defaulton the collections is already in place. After switching to LAZY,@Builder.Defaultwithnew HashSet<>()remains correct — new entities start with empty initialized sets.Recommendations
@TransactionaltogetDocumentById()(line 739). It's a read method, but the LAZY access totagsinside it (viatagService.resolveEffectiveColors(doc.getTags())) requires an open session. Per project convention this is one of the documented exceptions to "reads don't need@Transactional."getRecentActivity()in@Transactional(readOnly = true)or fetch tags eagerly for that specific call. The tags are accessed right after inresolveDocumentTagColors().@NamedEntityGraph+@EntityGraphchanges in this exact order: (a) add graphs toDocument.java, (b) add@EntityGraphto the two repository methods, (c) add@BatchSize(50)to the collection fields, (d) then run the full test suite. Do not switchFetchType.LAZYfirst and debug later.@EntityGraphoverride forfindAll(Specification, Pageable)should be tested with a query-count assertion inDocumentRepositoryTest— not just a smoke test. The test must assertqueryCount <= 2for a 50-document page.updateDocumentTags()should get@Transactionalannotation — it's a write method without one, which is a clean-code violation regardless of this issue.Open Decisions
@EntityGraph+@BatchSize(50)+ fix the two non-transactional read methods that access collections.👨💻 Tobias Wendt — DevOps & Platform Engineer
Observations
pg_stat_statementsagainst the "live stack." This implies running the verification against the local dev Docker Compose stack, not against a shared staging environment. That's fine —pg_stat_statementsworks the same way locally.pg_stat_statementsalready. Confirmpg_stat_statementsis enabled in the local dev Postgres config. The Docker Compose postgres image does not enable it by default. Checkdocker-compose.ymlforPOSTGRES_INITDB_ARGSor a mountedpostgresql.conf.CLAUDE.md) means adding a query-count test is not optional — it's needed to keep coverage from dropping on the newly modified repository methods.Recommendations
pg_stat_statementsverification, confirm the extension is loaded:docker-compose exec db psql -U postgres -c "SELECT * FROM pg_extension WHERE extname = 'pg_stat_statements';". If not present, addcommand: postgres -c shared_preload_libraries=pg_stat_statementsto the db service indocker-compose.ymlfor local dev.pg_stat_statementsdependency in tests). Reservepg_stat_statementsfor the manual live-stack verification.pg_stat_statementsnumbers. Document the result in the PR description so there's a paper trail for the claimed improvement.Open Decisions
👨💻 Elicit — Requirements Engineer
Observations
searchDocuments()has three code paths (RECEIVER sort, SENDER sort, RELEVANCE sort) that callfindAll(spec)— the no-pageable overload. These paths also need to be covered, or the acceptance criterion should explicitly list which paths are in scope.DashboardServiceand the non-transactionalgetDocumentById(), this estimate is slightly optimistic. S/M boundary is appropriate — L would be over-estimating.DashboardService.buildCaption()orDocumentService.getRecentActivity(). If those are not fixed, the "No LazyInitializationException regressions" criterion will fail in prod even if all listed ACs pass.@BatchSizeshould be a required AC, not a bonus — it's the safety net for any list path that doesn't go through an@EntityGraphmethod.Recommendations
DashboardServiceandDocumentService.getRecentActivity()are confirmed to run inside an open Hibernate session or are annotated@Transactional(readOnly = true)."@BatchSize(50)from "Bonus" to a required AC. It's cheap to add and eliminates the entire class of fallback lazy-load bugs.findAll(Specification, Pageable)AC: state explicitly which call sites it covers (the fast-path insearchDocumentsonly, or allfindAlloverloads).pg_stat_statementsis excellent — keep it as a post-merge manual check and document the result in the PR.Open Decisions
👨💻 Nora "NullX" Steiner — Security Engineer
Observations
@JsonIgnoreProperties({"hibernateLazyInitializer", "handler"})is not present onPersonorTag, Jackson will attempt to traverse the proxy and either trigger unexpected queries or throw a serialization error.Document.javadoes not currently annotatePersonorTagreferences with@JsonIgnoreProperties. CheckPerson.javaandTag.javafor this annotation.@EntityGraphdoes not affect authorization checks —@RequirePermissionis enforced at the controller layer by AOP regardless of fetch strategy.open-in-view: falsesetting is confirmed present. This is the correct security-aligned default — it prevents Hibernate sessions from being held open across the full HTTP request lifecycle, which would otherwise make lazy loading silently work outside transaction boundaries and mask real bugs.grep -rE "doc\.getReceivers|doc\.getTags|doc\.getTrainingLabels") is a good starting point but misses method-chained accesses likeDashboardService.buildCaption()which accessesdoc.getReceivers()— a liveLazyInitializationExceptionrisk after this change. ALazyInitializationExceptionsurfacing at the controller layer may produce an unstructured 500 response that leaks stack trace detail if the global exception handler doesn't catch it. ConfirmGlobalExceptionHandlercoversLazyInitializationException.@EntityGraphuses JPA's join mechanism, not string concatenation.Recommendations
Person.javaandTag.javafor@JsonIgnoreProperties({"hibernateLazyInitializer", "handler"}). If absent, add it before switching to LAZY — otherwise Jackson serialization of lazy proxies produces either unexpected extra queries orHttpMessageConversionExceptionin the response.GlobalExceptionHandlercoversLazyInitializationExceptionwith a structured JSON response (not a raw 500 with stack trace). If it doesn't, add a handler that maps it to a 500 with theINTERNAL_ERRORerror code. ALazyInitializationExceptionin prod is a bug, but it shouldn't leak implementation details to the client.Open Decisions
👨💻 Sara Holt — QA Engineer
Observations
DocumentRepositoryTestis a@DataJpaTestwith real Postgres via Testcontainers — the right foundation. The test file exists but currently has no query-count assertion. Adding one is required by the acceptance criteria.StatisticsAPI. The@DataJpaTestslice does NOT includeSessionFactoryby default — you'll need to autowireEntityManagerFactoryand cast toSessionFactoryImplto access statistics, OR configurespring.jpa.properties.hibernate.generate_statistics=trueinsrc/test/resources/application-test.yamland injectStatisticsService.pg_stat_statementslive check, smoke test, and full test suite. This is a solid 4-layer plan. The missing layer is integration test coverage for the cross-domain scenarios — specificallyDashboardService.getResume()which is not currently covered by any test that would catch aLazyInitializationException.searchDocuments()bypass the@EntityGraph-annotatedfindAll(Specification, Pageable)method and call the no-pageablefindAll(spec)instead (SENDER sort, RECEIVER sort, RELEVANCE sort). The query-count assertion test must cover at least the sorted/paged fast-path — the in-memory sort paths load all documents into memory anyway, so their N+1 behavior is different.DocumentServiceTestuses Mockito — it will not catchLazyInitializationExceptionissues because mock repositories return plain Java objects, not Hibernate proxies. The query-count test MUST be inDocumentRepositoryTest(real Postgres) or a@SpringBootTestintegration test.Recommendations
DocumentRepositoryTest:src/test/resources/application-test.yaml:spring.jpa.properties.hibernate.generate_statistics: trueDashboardService.getResume()in an@SpringBootTestintegration test with a document that has receivers — this is currently not covered and will catch theLazyInitializationExceptioninbuildCaption()if it's not fixed.findAll(spec)paths insearchDocuments()(SENDER, RECEIVER, RELEVANCE sorts) should be covered by at least one integration test asserting noLazyInitializationExceptionis thrown.Open Decisions
DocumentRepositoryTest(@DataJpaTest) or a newDocumentPerformanceTest(@SpringBootTest)? Recommendation:DocumentRepositoryTestfor the repository-level assertions; a new@SpringBootTestclass for the service-level smoke tests. This keeps the fast@DataJpaTestslice focused on repository contracts.👨💻 Leonie Voss — UI/UX Design Lead
Observations
@EntityGraphconfiguration is wrong and producesLazyInitializationExceptionerrors in production, the user-visible result is a blank page or an unhandled 500 — the worst possible outcome for the 60+ audience who are not comfortable with technical failures. The thorough testing approach proposed in the issue is the right insurance.Recommendations
Open Decisions
Decision Queue
One genuine tradeoff needs a call before implementation starts.
DQ-1 — Should
Document.listentity graph includetags? (raised by Markus)Context: The issue proposes
Document.listfetches onlysender. The list view (search results) renders tag chips on every document card. After switching to LAZY,resolveDocumentTagColors()iteratesd.getTags().stream()on the returned list — this would trigger N lazy loads (one per document) unless tags are included in the graph or the method runs inside a transaction with@BatchSizeas the fallback.Option A — Include
tagsinDocument.listOne JOIN FETCH covers both sender and tags. Zero lazy loads in
resolveDocumentTagColors(). Costs one additional join on every list query, but that join already happened implicitly under EAGER — so this is neutral vs. today.Option B — Keep
Document.listsender-only, rely on@BatchSize(50)+@TransactionalTags are batch-fetched in groups of 50 when
resolveDocumentTagColors()touches them inside a transactional boundary. RequiresgetRecentActivity()and the search service to be wrapped in@Transactional(readOnly = true). Slightly more complex; the batch fetch adds 1-2 extra queries per 50-document page, but avoids the full JOIN FETCH on list queries that don't render tags.Recommendation: Option A. The list view always renders tags, so the join is never wasted. It's the simpler invariant and matches what EAGER gave you today — one load, all data present, no session tricks required.
Owner: Marcel. Call needed before the
@NamedEntityGraphdefinitions are written intoDocument.java.👨💻 Felix Brandt — Fullstack Developer — Pre-implementation discussion summary
Walked through 5 open items before touching code. All resolved.
✅ DQ-1 —
Document.listentity graph includes{sender, tags}Decision: Option A. The list view always renders tag chips, so the join is never wasted. Matches what EAGER gave us today — one load, all data present, no session tricks required.
@BatchSizefallback is belt-and-suspenders, not the primary strategy.✅
findAll(spec)no-pageable sort paths also get@EntityGraphDecision: Option A. Override both
findAll(Specification<T>)(no-pageable) andfindAll(Specification<T>, Pageable)inDocumentRepositorywith@EntityGraph("Document.list"). The threesearchDocuments()sort paths (RECEIVER, SENDER, RELEVANCE) that call the no-pageable overload are covered explicitly — not left to@BatchSizefallback.✅
@JsonIgnorePropertiesadded toPerson.javaandTag.javaDecision: in scope for this PR. Confirmed neither class currently has
@JsonIgnoreProperties({"hibernateLazyInitializer", "handler"}). Without it, Jackson may traverse Hibernate proxy internals on serialization after switching to LAZY — either triggering unexpected queries or throwingHttpMessageConversionException. One-liner per class, added while we're here.✅
@Transactionaladded toupdateDocumentTags()Decision: in scope for this PR. It's a write method with no
@Transactional. Currently safe only because its sole caller is itself transactional — an invisible invariant. Fix it now rather than wait for someone to call it from a controller directly.✅ Test structure confirmed
DocumentRepositoryTest(@DataJpaTest) — query-count assertions forfindByIdandfindAll(Specification, Pageable). Hibernate statistics enabled viaspring.jpa.properties.hibernate.generate_statistics: trueinsrc/test/resources/application-test.yaml. Access statistics via autowiredStatisticsbean — noSessionFactoryImplcasting required.@SpringBootTestclass — service-level smoke tests forDashboardService.getResume()/buildCaption()and the threesearchDocuments()sort paths. Asserts noLazyInitializationExceptionis thrown.Implementation order (from my earlier review comment) stands: (a) add
@NamedEntityGraphtoDocument.java, (b) add@EntityGraphto both repository overloads, (c) add@BatchSize(50)to collection fields, (d) add@TransactionaltogetDocumentById(),getRecentActivity(),updateDocumentTags(), (e) switchFetchType.LAZYon all three associations, (f) add@JsonIgnorePropertiestoPerson+Tag, (g) run full test suite before writing the new tests.