fix(db): add indexes on documents.sender_id and document_comments.author_id #470
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Context
Pre-prod audit's schema inspection found two FK columns without supporting indexes:
These are queried on common UI paths:
documents.sender_id— every "show all letters from this person" query (used on/persons/[id]Korrespondenz-Überblick per #306, on/briefwechsel, and as part of mass-edit flows).document_comments.author_id— every "show all comments by this user" query (admin user detail, activity feed, future contributor profile per #335).Without an index, Postgres does a sequential scan. At 1,520 documents and a few hundred comments today the sequential scan is fast (< 1 ms). At 10× data volume and 100× comments, both queries become bottlenecks for the read-side experience the team is building toward (Reader Experience v1 / Persons Redesign v2 milestones).
Approach
Single Flyway migration adding both indexes. Indexes on FK columns rarely cause downsides — query planner can ignore them, and Postgres maintains them cheaply at the volumes this app sees.
V64__index_fk_columns.sql
CREATE INDEX CONCURRENTLYwould be production-correct for live deploys, but Flyway transactional DDL doesn't allow it; rely on the brief lock at deploy-time. At current data volume the index build is sub-second.Optional: cover audit_log / notifications
While we're here, consider whether
audit_log.actor_idornotifications.user_iddeserve the same treatment. Audit didn't flag them, but they're queried on similar patterns. Out of scope for this issue unless cheap.Critical files
backend/src/main/resources/db/migration/V64__index_fk_columns.sql— newVerification
pg_stat_statements: re-exercise the audit's 24-call sequence; total time for sender-filtered queries drops.pg_indexescheck.Acceptance criteria
\d documentsand\d document_commentsshow them.WHERE sender_id = ?andWHERE author_id = ?shows Index Scan.IF NOT EXISTS), so re-running on a partially-migrated DB doesn't fail.Effort
XS — 30 minutes.
Risk if not addressed
At 10K+ documents, sender-filtered queries on
/persons/[id]become user-visibly slow (>200 ms).Tracked in audit doc as F-33 (Medium) — new dynamic finding from
pg_attributejoin.👨💻 Markus Keller — Application Architect
Observations
V1__initial_schema.sqldefinesdocuments.sender_idas a plain FK column with noCREATE INDEXstatement.V12__add_document_comments.sqlcreates indexes ondocument_id,annotation_id, andparent_id— but explicitly omitsauthor_id. No subsequent migration (V2–V60) fills either gap.sender_idis used in at least six distinct query shapes acrossDocumentRepositoryandPersonRepository:findBySenderId, two briefwechsel JOIN queries, two Korrespondenz-Überblick correlated subqueries (one used inside aSELECT COUNT(*)per person in list views), and a bulk-merge UPDATE. EveryWHERE d.sender_id = :personIdhits a sequential scan today.audit_log.actor_idandnotifications.recipient_idare already indexed (V46 and V16 respectively). The issue's "optional" consideration is moot — those are already covered.IF NOT EXISTSguard is correct practice. This codebase's existing migrations (V28, V30, V36, V42) all use this guard. Consistency is good.CREATE INDEX CONCURRENTLYis incompatible with Flyway's transactional DDL. At the documented data volume, the brief table lock during index creation is sub-second. Accept the trade-off.docs/architecture/db/db-orm.pumlordocs/architecture/db/db-relationships.puml(they show entity relationships, not physical storage). No CLAUDE.md or ADR changes needed.Recommendations
IF NOT EXISTSguard matches the project's pattern. Add no scope creep.geschichten.author_idto this migration. That FK referencesapp_users, notpersons, and the table is new (V58). Its query patterns (story list, contributor profile) are not yet production-load bottlenecks. Create a separate issue if the audit eventually flags it.MigrationIntegrationTestwith two index-existence assertions. The pattern for verifying database-layer invariants is already established in that file. Apg_indexesquery assertingidx_documents_sender_idandidx_comments_author_idexist after migration gives the same enforcement guarantee as the constraint tests already there. See Sara's review for the exact test shape.git log --all --oneline -- backend/src/main/resources/db/migration/V6*.sqlbefore merging to check no in-flight branch has claimed V61–V63.👨💻 Felix Brandt — Fullstack Developer
Observations
CREATE INDEX IF NOT EXISTS, matching the style in V28, V36, V38, V48, V49, and V55. The names follow the existing convention:idx_<table>_<column>.sender_idis not just a simple FK — it participates in complex correlated subqueries inPersonRepository. Two queries each useWHERE d.sender_id = :personIdandWHERE dr.person_id = :personId AND d.sender_id IS NOT NULLinside UNION expressions. These scan thedocumentstable per-person during list rendering. Without the index, every person in a directory listing fires a sequential scan.document_comments.author_idis stored in the entity (@Column(name = "author_id")) butCommentServicecurrently queries only byblockId,parentId, anddocumentId. The index anticipates the future "comments by this user" admin view (#335 / contributor profile) — which is explicitly described in the issue. There is nofindByAuthorIdinCommentRepositorytoday, which means the index is speculative for the write path. It is still cheap and correct to add.whyof each index, referencing issue numbers. That's consistent with the project's comment style in migrations like V36 and V49.Recommendations
findByAuthorIdmethod toCommentRepositoryspeculatively — let the query consumer drive that when the feature exists.MigrationIntegrationTestassertions first (they fail because the indexes don't exist yet), then create the migration file (they go green). The test body is apg_indexesquery — two assertions, red then green. This upholds the red/green discipline for infrastructure changes.👨💻 Tobias Wendt — DevOps & Platform Engineer
Observations
CREATE INDEX CONCURRENTLYlimitation with Flyway's transactional DDL. The brief table lock at deploy time is the accepted trade-off. At sub-second index build time (confirmed by the issue at current data volumes), this is a non-issue for the deploy window../mvnw clean packagewhich triggersMigrationIntegrationTestvia Testcontainers against a realpostgres:16-alpinecontainer. The new migration will run through CI on every PR with no additional pipeline configuration needed.IF NOT EXISTSguard means re-running Flyway against a partially-migrated database (e.g., a failed mid-deploy recovery) will not error. This is correct for a self-hosted single-node deployment where rollback is apg_restore, not a migration rollback.Recommendations
IF NOT EXISTSguard makes it deploy-safe.CREATE INDEX CONCURRENTLYstep outside Flyway (via a maintenance script). That is a future concern, not a present one.👨💻 Elicit — Requirements Engineer
Observations
pg_stat_statements, idempotency test). This is production-grade issue writing./persons/[id]Korrespondenz-Überblick is a reader-facing path (confirmed by #306 linkage), which means this issue is directly on the Reader Experience v1 critical path.audit_log.actor_id(V46) andnotifications.recipient_id(V16) are already indexed. This optional section is noise and should be struck from the issue to avoid confusion during implementation. A note that both are already covered would be better than leaving the "consider whether" language.pg_indexesIF NOT EXISTSstructure in the SQLauthor_idmatters. These forward references are valuable — they connect the infrastructure investment to product milestones.WHERE sender_id = ?query against a table with ≥100 rows shows Bitmap Index Scan or Index Scan (not Seq Scan)." This is verifiable in the MigrationIntegrationTest if seed data is added.Recommendations
audit_log/notificationsparagraph and replace it with: "Audit log and notifications are already covered:" This prevents any implementer from adding unnecessary indexes.audit_loghasidx_audit_log_actor_id(V46) andnotificationshasidx_notifications_recipient(V16).IF NOT EXISTScriterion is already testable and the EXPLAIN criterion is verifiable during manual QA or via the MigrationIntegrationTest.👨💻 Nora "NullX" Steiner — Security Engineer
Observations
sender_idandauthor_iddoes not change any access control boundary, does not expose new query paths, and does not affect authentication or authorization logic.CREATE INDEXstatements — the SQL is not user-controlled and cannot be injected. This is a DDL-only change applied at deploy time by Flyway.IF NOT EXISTSguard is safe and correct: it prevents double-execution errors without bypassing any constraint.@RequirePermission+PermissionAspectauthorization model remains unchanged. There is no IDOR or privilege escalation risk from adding a covering index on a FK column.document_comments.author_idis nullable (REFERENCES users(id) ON DELETE SET NULL). Deleted-user comments retain their content with a nullauthor_id. An index on a nullable column is fine in PostgreSQL — null values are indexed by default in B-tree indexes, which is correct behavior here. No security concern.docs/audits/2026-05-07-pre-prod-architectural-review.md) flags several Critical findings (no CSP, auth-token-in-cookie brute-force, default admin passwordadmin123). Those are materially higher severity than this fix. This issue is correctly labeled P2-medium.Recommendations
/login, theadmin123default credential). This issue is on the correct priority tier; the Critical findings are not.👨💻 Sara Holt — QA Engineer
Observations
pg_indexescheck." This is partially correct — index existence is verifiable at the DB layer — but the existingMigrationIntegrationTestpattern is the right place to add that verification, not skip it entirely.MigrationIntegrationTestusing@DataJpaTest+PostgresContainerConfig+JdbcTemplate. Every other schema invariant in the project (CHECK constraints, partial unique indexes, nullable columns) has a corresponding test in that class. Leaving the index unverified breaks the project's own standard that schema-level guarantees are tested.JdbcTemplatequery againstpg_catalog.pg_indexes:pg_stat_statements) are good manual QA steps but are not substitutes for automated regression coverage. Once the index exists, a future migration that accidentally drops it would go undetected without thepg_indexesassertions.v64_*test methods first (they fail with 0 != 1), then createV64__index_fk_columns.sql(they go green). This follows the red/green discipline for infrastructure changes.Recommendations
pg_indexesassertions toMigrationIntegrationTestas part of this issue. The issue says "no new tests needed" — disagree. The project's standard is: every schema invariant has a test. Two 4-line test methods prevent silent regression.MigrationIntegrationTestconsistent with the existing style (// ─── V64: ...).pg_indexescheck is the right automated gate; the EXPLAIN verification is a one-time manual QA step.Open Decisions
MigrationIntegrationTestgain seed-data-based query plan tests? The issue's verification step 2 (EXPLAIN shows Index Scan) cannot be automated reliably against the Testcontainers database, which starts empty. If this project eventually wants automated EXPLAIN-based regression tests, it would require a seed fixture with enough rows to force the planner past the seq-scan threshold (~50+ rows for a selective FK). That is broader than this issue — worth a separate spike.👨💻 Leonie Voss — UI/UX Designer
Observations
/persons/[id]Korrespondenz-Überblick and/briefwechselare reader-facing paths used by the senior audience (60+). Latency on those views directly degrades usability for the primary transcriber audience — reducing query time from potentially 200ms+ to sub-millisecond supports the "no perceptible lag on navigation" experience goal for the reader experience milestone.Recommendations
Decision Queue
One open decision surfaced across the persona reviews. It is out of scope for this issue but worth deciding before the next migration-test cycle.
Theme: Automated EXPLAIN-based regression testing in
MigrationIntegrationTestRaised by: Sara Holt (@saraholt)
Question: Should
MigrationIntegrationTestgain seed-data-based query plan assertions that verifyEXPLAINshows an Index Scan (not Seq Scan) after FK-column index migrations?Why it matters: The issue's verification step 2 ("EXPLAIN shows Index Scan") is currently a manual QA step. The Testcontainers database starts empty, so the query planner legitimately chooses a sequential scan on an empty table regardless of index presence. Automating this requires pre-seeding ≥50 rows of test data before the EXPLAIN call, which adds complexity to
MigrationIntegrationTest.Options:
pg_indexesassertion confirms the index exists; EXPLAIN verification remains a one-time manual step. Simple, no test flakiness risk.EXPLAIN, and checks for "Index Scan" in the output string. More thorough, but brittle if the planner changes thresholds between Postgres minor versions.Recommendation from review: Option A is correct for this issue. Revisit Option B as a standalone spike if the team wants automated query-plan regression coverage more broadly.
Owner: Marcel — decide before or during implementation of this issue.