Add database ER and ORM diagrams (PlantUML) #451
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Goal
Add two PlantUML database diagrams to the architecture docs so that the schema is visible alongside the existing C4 diagrams:
db-relationships.puml— table names + FK arrows only (lightweight, one-page overview)db-orm.puml— every column with PostgreSQL type (full ORM reference)Both are single files (not split by domain). Both are referenced in
docs/architecture/c4-diagrams.mdunder a new "Database" section.File Locations
Domain Packages
Entities are grouped into seven
packageblocks matching the backend package structure:app_users,user_groups,app_users_groups,group_permissions,password_reset_tokens,invite_tokens,invite_token_group_idsdocuments,document_receivers,document_tags,document_versions,document_annotations,document_comments,document_training_labels,comment_mentionspersons,person_name_aliases,person_relationshipstagtranscription_blocks,transcription_block_versions,transcription_block_mentioned_personsocr_jobs,ocr_job_documents,ocr_training_runs,sender_modelsnotifications,audit_log,geschichten,geschichten_persons,geschichten_documentsCross-domain FK arrows drawn as inter-package lines labelled with the FK column name.
PlantUML Syntax
entity TableName {}(empty body), crow's foot cardinality (||--o{),left to right directioncolumn_name : TYPE, PK marked<<PK>>, FK columns marked<<FK>>, top-down directionhide circle, plain white background — matches the existing C4 fileshttp://heim-nas:8500Schema Source
Derived from Flyway migrations V1–V60. 30 tables total.
Key cross-domain relationships:
documents.sender_id → persons.iddocument_receivers.{document_id, person_id} → documents, personsdocument_tags.{document_id, tag_id} → documents, tagdocument_annotations.created_by → app_users.idtranscription_blocks.{annotation_id, document_id} → document_annotations, documentsaudit_log.{actor_id, document_id} → app_users, documentsnotifications.recipient_id → app_users.idgeschichten.author_id → app_users.idgeschichten_persons.person_id → persons.idgeschichten_documents.document_id → documents.idocr_training_runs.person_id → persons.idsender_models.person_id → persons.idtag.parent_id → tag.id(self-referencing)person_relationships.{person_id, related_person_id} → persons.id(self-referencing)Acceptance Criteria
docs/architecture/db/db-relationships.pumlrenders cleanly in VS Code with all 30 tables grouped into 7 packages and all FK arrows presentdocs/architecture/db/db-orm.pumlrenders cleanly with all columns and types for all 30 tablesdocs/architecture/c4-diagrams.mdupdated with a "Database" section linking both diagrams🏛️ Markus Keller — Application Architect
Observations
Migration gaps V37 and V43 are intentionally absent from the Flyway history. The issue body says "derived from V1–V60" but doesn't acknowledge this. The diagram must reflect the current schema — not the version count — so this is fine, but the diagram header or a code comment should note which migrations were the source (e.g.,
-- Generated from V1–V60, excl. V37, V43 (removed)).search_vector tsvectoris a trigger-maintained computed column (V34). It's populated byfn_documents_fts_update()— the application never writes to it directly. In the full ORM diagram, listing it assearch_vector : tsvectorwithout a marker is misleading. Recommend marking it<<computed>>so readers understand it's not a JPA-managed field.Self-referencing FKs need explicit labeling. Three tables have non-obvious recursive relationships:
tag.parent_id → tag.id,person_relationships.related_person_id → persons.id, anddocument_comments.parent_id → document_comments.id. In the relationship diagram these will produce arrows looping back to the same entity — PlantUML renders these awkwardly. Call out self-references with a note (e.g.,tag "0..1 parent" --> "0..* children" tag) so the shape isn't mistaken for a rendering artifact.Domain grouping aligns well with the backend package structure. The "Supporting" package (notifications, audit_log, geschichten, geschichten_persons, geschichten_documents) is the only one that bundles genuinely distinct concerns. This is acceptable at 5 tables — only worth splitting if the diagram becomes unreadable.
Recommendations
.pumlfiles:' Schema source: Flyway V1–V60 (excl. V37, V43)and' Last updated: <date>. This gives future maintainers a baseline to diff against.search_vectoras<<computed>>indb-orm.puml.tag,person_relationships, anddocument_comments.parent_idin both diagrams.docs/architecture/c4-diagrams.mdwith a "Database" section that matches the heading/description style of the existing C4 sections — not just a bare link.👨💻 Felix Brandt — Senior Fullstack Developer
Observations
The diagrams will drift. We're at V60 now. Based on the migration cadence, there will likely be V61, V62, … within weeks. Neither diagram has a stated update workflow — no hook, no reminder, no CI check. The full ORM diagram in particular lists every column and type; one
ALTER TABLE ADD COLUMNand it's already stale.search_vector tsvectorindocumentsis maintained by a DB trigger, not JPA — it won't appear in any entity Java file. If the implementor derives the column list from the Java entities rather than the migrations directly, this column gets dropped silently.The full ORM diagram source will be large. 30 tables × average ~8 columns = ~240 column lines, plus entity headers, FK arrows, and package wrappers. That's easily 400–500 lines of PlantUML. PlantUML renders this fine, but readability in the source file suffers. Recommend grouping each package block clearly with blank lines and section comments so future editors know where to add new columns.
document_comments.parent_idis a nullable self-reference — comments can be top-level or replies. In the relationship diagram, this self-arrow should be labeledparent (nullable)to distinguish it from the non-null FKs.Recommendations
.pumlfiles:' ⚠ Update this file when adding a Flyway migration that changes the schema.This is the cheapest drift-prevention mechanism available without tooling.search_vectorcolumn and the renamedapp_userstable (V60) are examples of things the entity layer abstracts away..pumlsource:' ── Auth ──,' ── Documents ──, etc., even though PlantUML package blocks already group them visually. Makes the source navigable.||--o{(one-to-many required) vs||--o{with a?label hint for nullable FKs likesender_idandparent_id.🧪 Sara Holt — QA Engineer
Observations
"Renders cleanly" is the only acceptance criterion for correctness — this tests that PlantUML doesn't error out, not that the diagram is accurate. An entity with a missing column or a wrong FK direction will still "render cleanly."
There is no criterion that ties the diagram to the migration history. If the implementor misses
search_vector,thumbnail_aspect,page_count, or any of the V52/V53 columns that were added asALTER TABLEstatements late in the migration chain, the diagram passes the current ACs and is still wrong.No criterion covers the c4-diagrams.md update specifically — AC #3 says "updated with a Database section linking both diagrams" but doesn't say what a passing update looks like (heading style, description text, link format).
No plan for ongoing accuracy. The diagram is a snapshot — the ACs don't establish any process for knowing when it becomes stale.
Recommendations
db-relationships.puml, manually verified against the Flyway migration history." This gives the reviewer something to check.db-orm.puml, including trigger-maintained columns such assearch_vector.".pumlfile listing the migration version it was generated from — future reviewers can rungit log -- backend/src/main/resources/db/migration/and immediately see if new migrations have been added since the diagram was last updated.docs/architecture/db/*.puml."🔒 Nora Steiner — Security Engineer
Observations
The full ORM diagram (
db-orm.puml) will document sensitive column names that are part of the auth surface:app_users.password— BCrypt hash storagepassword_reset_tokens.token— single-use password reset tokensinvite_tokens.code— invite codes that grant account creationThese columns are not secrets in the traditional sense — the column names are expected and their existence is implied by the features. However, the diagram will be committed to the repository. If the repository is private (as expected for a family archive), this is not a concern. If it were ever made public, the schema would aid an attacker in understanding the auth model.
I checked the codebase: there are no RLS policies or row-level database isolation in place — the app relies entirely on application-layer
@RequirePermissionAOP checks. The ORM diagram makes this architecture visible, which is fine for internal documentation but worth being aware of.No
audit_logcolumns leak PII beyondactor_id(a UUID reference) andpayload(JSONB — application-controlled). TheON DELETE SET NULLonactor_idis the GDPR right-to-erasure mechanism noted in V46; this is worth calling out in the diagram as a comment on that FK for any reader who might later question why it's nullable.Recommendations
db-orm.puml, add a brief comment onpassword_reset_tokens.tokenandinvite_tokens.code: these are hashed/opaque values — not plaintext secrets stored in the DB, but column names that inform auth architecture.🚀 Tobias Wendt — DevOps & Platform Engineer
Observations
The diagrams depend on a dev-time external service. The VS Code PlantUML extension is configured to render via
http://heim-nas:8500. If that server is down (restart, power, maintenance), diagrams are not renderable locally and the Gitea UI won't show them either. The.pumlfiles will exist in the repo as opaque source text only.No pre-rendered PNGs are committed. For someone reviewing a PR on mobile or on a machine without VS Code + the PlantUML extension, the diagrams are invisible. The Gitea web UI doesn't render
.pumlfiles — they show as raw text.docs/architecture/db/directory must be created — it doesn't exist yet. Not a blocker, but the implementation needs amkdir -por the directory needs to be present via a placeholder.Migration gaps (V37, V43) are intentionally removed migrations. This is a Flyway repair situation — the gap is intentional and correct. The diagram doesn't need to account for this, but the implementor should be aware when counting tables that these gaps don't represent missing schema.
Recommendations
Strongly recommend generating and committing PNG exports alongside the
.pumlfiles. The easiest approach: use the existing PlantUML server (http://heim-nas:8500) to export during implementation, and commit the PNGs todocs/architecture/db/. Add a note inc4-diagrams.mdthat both source (.puml) and rendered (.png) versions are in the directory.Alternatively, add a one-line CI step using the PlantUML jar (available as a Maven dependency or standalone download) to render PNGs on every commit to
docs/— this keeps the Gitea UI rendering correctly without the NAS server dependency.If committing PNGs: add
docs/architecture/db/*.pngto.gitattributeswithbinaryflag to avoid diff noise in PRs.No Compose file changes, no infrastructure changes, no CI pipeline changes strictly required — but the PNG suggestion significantly improves usability.
🎨 Leonie Voss — UX Designer
Observations
This is a documentation artifact rather than a UI feature, so UX concerns are minimal. The one user-facing surface is
docs/architecture/c4-diagrams.md— the page that links to both diagrams.The current
c4-diagrams.mdhas a consistent structure: each diagram section has a heading, a one-sentence description of what the diagram shows, and either a Mermaid embed or a reference. The new "Database" section should follow this pattern rather than being a bare file link.For readers of
c4-diagrams.mdwho encounter the new section: the two diagrams serve different audiences (overview vs. reference lookup) and this should be surfaced in the description so readers know which one to open.Recommendations
In the "Database" section of
c4-diagrams.md, add a one-line description per diagram, e.g.:db-relationships.puml— "Entity relationships: all tables and foreign-key connections, grouped by domain. Start here for an overview."db-orm.puml— "Full schema reference: all columns and types for all 30 tables. Use this when mapping Java entities to database columns."Follow the existing heading level pattern (use
##for "Database", matching## Level 1,## Level 2, etc.).No accessibility or visual design concerns for
.pumlsource files.📋 Elicit — Requirements Engineer
Observations
Acceptance criteria gap — correctness is untestable as written:
The three ACs all test presence or rendering, not accuracy. A diagram with wrong FK directions, missing columns, or a stale table list passes all three.
Maintenance lifecycle is undefined:
The issue treats this as a one-time deliverable, but the diagram will drift the moment V61 lands. There is no "living document" requirement and no "snapshot as of version X" declaration. Both interpretations are valid — but the implementor needs to pick one and document it.
"30 tables" is a derived claim, not a verified one:
The schema was counted by reading all 60 migrations. V37 and V43 are absent from the migration sequence, suggesting removed migrations. The implementor should verify the actual table count by running
\dtagainst the live database rather than counting migration files, since migration removal can leave behind or drop tables non-obviously.Non-functional requirement missing: diagram render time:
The full ORM diagram at 30 tables × 8+ columns will be a large PlantUML document. Some PlantUML servers time out on large diagrams. No acceptance criterion covers "renders within a reasonable time" or "doesn't hit the server's max diagram size."
Recommendations
Strengthen the ACs:
\d table_namein psql) for at least the 5 most recently modified tables."Add a decision to the issue: Is this a living document or a versioned snapshot?
' Schema as of: V60 (2026-05-06)comment header and explicitly notes it will not auto-update.Open Decisions
🗳️ Decision Queue — Action Required
1 decision needs your input before implementation starts.
Documentation Strategy
.pumlfiles be maintained alongside every future migration (living document: low friction per PR, requires discipline), or committed as a dated snapshot with' Schema as of: V60 (2026-05-06)in the header (honest about eventual staleness, zero maintenance overhead)? A living document needs a COLLABORATING.md reminder; a snapshot does not. (Raised by: Elicit, also flagged by Felix, Sara, Markus)