docs(legibility): write docs/GLOSSARY.md disambiguating overloaded terms #397
Context
Part of Epic #394 — Documentation. This is DOC-3: a glossary that prevents Anja and Tobias from forming wrong mental models when they hit overloaded terms.
Per the Legibility Rubric, this addresses C3.3 (Critical).
Required content
A single `docs/GLOSSARY.md` containing terms that are easily confused or domain-specific. At minimum:
Identity terms
Document-related terms
Workflow terms
PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED
OCR-specific terms
Other domain terms
Architectural terms
conversation, activity)
shared/ because it has no entity, no user-facing CRUD, AND ≥2 consumers OR is framework infra
Format
Each entry: Term — short definition (≤2 sentences). Where two terms are easily confused, add a "Not to be confused with" line.
Group by category (use `##` headings). Alphabetical within each category.
Acceptance criteria
- `docs/GLOSSARY.md` exists with all categories above
- Linked from `README.md` (DOC-1) and `docs/ARCHITECTURE.md` (DOC-2)
Definition of Done
`docs/GLOSSARY.md` committed on `main`. Closing comment links to it.
🏗️ Markus Keller — Senior Application Architect
Observations
- `Person` vs `AppUser` is the most load-bearing conceptual distinction in this codebase, and I've seen it conflated in comments and informal discussions. Getting it in writing is the right call.
- The `SenderModel` entry straddles two concerns: it's both an OCR term (the ML model concept) and a JPA entity (`sender_models` table). The glossary entry as written captures the ML concept but silently glosses over that it is a first-class persistent entity with its own controller. This could mislead someone into thinking it's a pure OCR-service concept.
- `Comment` in the issue says "attached to a Document, Annotation, or TranscriptionBlock." Looking at `DocumentComment.java`, the entity stores `documentId`, `annotationId`, and `blockId` columns. The definition is accurate but could be sharpened: it is always scoped to a Document and optionally further contextualized by an Annotation or Block.
- The `TranscriptionBlock` definition in the issue says it has a "bounding box on the page." The actual entity uses a polygon (JSONB column), not a rectangle. The `Annotation` entity also uses a polygon. The CLAUDE.md and `PolygonConverter.java` confirm this. "Bounding box" should be corrected to "polygon overlay" or at minimum "bounding region (stored as polygon)."
Recommendations
- `TranscriptionBlock` definition: replace "bounding box" with "polygon region" to match ADR-002 (polygon-jsonb-storage) and the actual `polygon` JSONB column.
- `SenderModel`: add a note that it is also a persistent entity linking a `Person` to a fine-tuned Kraken model file. The current definition reads as if it is a pure runtime concept.
- Add `OcrJob` to the glossary under a new "OCR workflow terms" sub-group or expand the existing OCR section. It is a first-class entity (`ocr_jobs`, `ocr_job_documents`) with its own lifecycle (`OcrJobStatus`), and it is meaningfully distinct from the concept of running OCR on a document.
- Link from `docs/architecture/c4-diagrams.md` in addition to the two files listed in the acceptance criteria: the C4 diagram is where architects land first.
- The acceptance criteria require links from `README.md` (DOC-1) and `docs/ARCHITECTURE.md` (DOC-2). Verify those files exist before the PR opens: `README.md` does not currently exist at the repo root, and `docs/ARCHITECTURE.md` is not the same as `docs/architecture/c4-diagrams.md`.
👨‍💻 Felix Brandt — Senior Fullstack Developer
Observations
- `TranscriptionBlock`: the issue says "bounding box on the page." The entity has a `polygon` JSONB column (`PolygonConverter.java`, ADR-002). It is a polygon, not an axis-aligned box.
- `Comment`: the entity class is `DocumentComment` (table: `document_comments`), not just `Comment`. The glossary will likely use `Comment` as the short term, but a note should clarify the class name to avoid confusion when people grep the codebase.
- `Transcription queue` says "three sub-queues (segmentation, transcription, ready-to-read)." The test file `TranscriptionQueueControllerTest.java` confirms these three endpoints (`/api/transcription/segmentation-queue`, `/api/transcription/ready-to-read`). The definition is accurate and should stay.
- `SenderModel` is listed under OCR-specific terms. It is also a JPA entity (`@Entity`, table `sender_models`), scoped to a `Person`. The current definition would have a developer look for it only in the OCR service, but it lives in `backend/src/main/java/.../model/SenderModel.java`.
Recommendations
- `DocumentComment`, `GeschichteStatus` (DRAFT/PUBLISHED enum), `TranscriptionBlock`, `DocumentAnnotation`, `DocumentVersion`.
- Note the Java class name in parentheses, e.g. "**Comment** (`DocumentComment`) — ..." This makes the glossary immediately useful when navigating the codebase.
- `Geschichte` says DRAFT → PUBLISHED. Checked `GeschichteStatus.java`: confirmed. The definition is correct.
- `DocumentVersion` says "immutable snapshot." Checked the entity: it uses `@Data` (which generates setters), so it is not technically immutable at the Java layer. The intent is immutable by convention (no update endpoint exists). Worth noting "append-only by convention" rather than "immutable" to avoid confusion.
- Add a placeholder section (a `## Terms to Add` section at the bottom) so contributors know how to extend it.
🔒 Nora "NullX" Steiner — Application Security Engineer
Observations
- `AppUser` vs `Person` distinction: the glossary's framing is correct and security-load-bearing. The note "NEVER has a login account" (for Person) is exactly the right emphasis. If a developer conflates these, they might expose Person-domain operations to identity checks that don't belong there, or worse, assume that Person records carry PII protections that only AppUser records should have.
- `Permission` entry: listing the enum values is good. I'd recommend adding a one-liner explaining the enforcement mechanism: "enforced via `@RequirePermission` AOP on controller methods, not via Spring Security's `@PreAuthorize`." This prevents contributors from accidentally duplicating the permission check using the wrong annotation.
- `Audit log`: the entry says "UPDATE/DELETE revoked at DB level." This is a strong and important claim. I checked: `AuditLog.java` exists as an entity, and `AuditLogQueryService.java` exists. However, the DB-level revocation claim should cite the specific Flyway migration that enforces it (a partial index or role-level REVOKE). If it is not enforced at the DB layer today, the glossary should say "append-only by application convention" rather than implying a DB constraint exists. False security documentation is worse than none.
- `Notification`: the entry notes "no email/SMS today." This is a useful constraint statement for contributors who might start wiring email senders. Good.
Recommendations
- Verify that `UPDATE/DELETE` is actually revoked at the DB level. If only application-layer protection exists today, adjust the wording to reflect the actual state. Security documentation that overstates guarantees creates false confidence.
- `Permission`: include "`@RequirePermission(Permission.WRITE_ALL)` on controller methods, checked by `PermissionAspect`." This prevents contributors from reaching for `@PreAuthorize` or a manual if-check.
- Place `AppUser` and `Person` as the very first two entries (as the acceptance criteria already require). Make the "Not to be confused with" cross-reference bidirectional: `Person` says "Not to be confused with AppUser" AND `AppUser` says "Not to be confused with Person."
🧪 Sara Holt — QA Engineer & Test Strategist
Observations
- The `Transcription queue` entry is particularly useful for testers: it explicitly states the queue is "NOT a persistent entity." This prevents me from writing `@DataJpaTest` integration tests for something that doesn't exist in the DB, a confusion I'd expect without this note.
- The `DocumentStatus` lifecycle entry matches what's in the codebase (PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED). This is a load-bearing definition for our E2E test seeding: test factories set `status = UPLOADED` routinely, and this confirms that's the correct entry point for file-based tests.
- `Geschichte` status (DRAFT → PUBLISHED) is confirmed in `GeschichteStatus.java`. Our existing E2E tests on `feat/issue-381-geschichten` rely on this lifecycle.
Recommendations
- Add a note to `DocumentVersion` clarifying that there is no "create" endpoint for consumers: versions are created automatically on document save. This prevents contributors from writing tests expecting a standalone `POST /api/document-versions` endpoint.
- Add a `## Pending Terms` section listing placeholders so reviewers can confirm the audit integration happened rather than assuming it.
- Link the glossary from `COLLABORATING.md` as well: it is the document contributors read before touching the codebase. A pointer like "For domain terminology, see `docs/GLOSSARY.md`" would surface it at exactly the right moment.
- `OcrJob` is conspicuously absent from the issue's term list. As a QA engineer, I write tests against `OcrJobStatus` transitions frequently. Its absence is a gap worth filling before closing the issue.
🎨 Leonie Voss — UI/UX Design Lead
Observations
- […] (`messages/de.json`, `en.json`, `es.json`). Several glossary terms appear in the UI with German names: Geschichte, Briefwechsel, Chronik. The glossary should clarify which are user-facing labels (appear in the UI) vs. internal code terms (used only in code and documentation).
- `Briefwechsel`: this appears as a route and UI label. Users see "Briefwechsel" in the navigation. The glossary definition is accurate ("bilateral conversation timeline between two Persons") but doesn't note it is user-visible. A contributor building a new feature might translate it differently in a new context.
- `Geschichte`: also user-visible (the Geschichten feature exists on `feat/issue-381-geschichten`). Same concern: it is both a code entity (`Geschichte.java`) and a UI concept with a published page.
- `Chronik / Aktivität`: these appear to be synonymous or overlapping in the glossary. If both terms appear in the UI, they need distinct definitions. If "Chronik" is the user-facing label and "Aktivität" is the internal code concept, that distinction should be explicit.
- The `AppUser` distinction (readers vs. transcribers) could be noted here as a design constraint.
Recommendations
- Tag each term `[user-facing]` or `[internal]` (or use a consistent notation). User-facing terms must stay consistent with their Paraglide translation keys: if the glossary says "Geschichte" and `de.json` says "Geschichten," that discrepancy needs to be flagged.
- `Chronik` vs `Aktivität`: if they are the same concept with different labels in different contexts, say so explicitly. If they are distinct, give each a separate entry.
Open Decisions
- `Chronik` vs `Aktivität` naming: Are these the same concept? If both appear in the UI at different points, one needs to be the canonical label and the other deprecated. The right answer depends on which term users already encounter and recognize.
⚙️ Tobias Wendt — DevOps & Platform Engineer
Observations
- The criterion to link from `README.md` (DOC-1) has an implicit dependency: `README.md` must already exist at the repo root. Checking the directory listing: no `README.md` exists at the repo root today. The glossary PR will fail this acceptance criterion unless DOC-1 (the README) is either already merged or co-delivered in the same PR.
- `docs/ARCHITECTURE.md` is referenced as DOC-2, but the current docs structure has `docs/architecture/c4-diagrams.md`, not a top-level `docs/ARCHITECTURE.md`. The glossary linking target doesn't exist yet.
Recommendations
- Check the status of DOC-1 (`README.md`) and DOC-2 (`docs/ARCHITECTURE.md`). If those don't exist yet, the acceptance criterion "linked from README.md and docs/ARCHITECTURE.md" cannot be met. Either create stubs in the same PR or relax the criterion to "linked from any existing top-level doc that references architecture."
- Documentation-only change under `docs/`. No build steps, no linting gates, no generated artifacts. Merge can proceed without touching CI.
No open decisions from my angle.
📋 Elicit — Requirements Engineer
Observations
- `Chronik / Aktivität` are grouped as a single entry with a "/" separator. This implies they are synonymous or two names for the same concept. If they surface separately in the codebase or UI, they should be separate entries. If they are truly synonymous, the canonical term should be stated ("use `Chronik`; `Aktivität` is deprecated / internal alias").
- `Briefwechsel` is defined as "bilateral conversation timeline between two Persons (derived from Document sender/receivers)." The "derived" note is important: it aligns with the architectural distinction between Tier-1 and Tier-2 domains. However, the glossary entry does not cross-reference the `Derived domain` definition that appears later. This is a missed internal link that would help newcomers.
- `OcrJob` and `OcrJobStatus` are active entities with controllers and services, but they are absent from the glossary. This is a gap, especially given OCR is one of the more complex workflows in the system.
- `PersonNameAlias` exists as a model class (`PersonNameAlias.java`, `PersonNameAliasType.java`) and is part of the Person domain's find-or-create logic. It is not in the glossary, but it is a term that would confuse a new contributor ("wait, is an alias the same as a Person?").
Recommendations
- Add `OcrJob` (with `OcrJobStatus` lifecycle) to the OCR section.
- Add `PersonNameAlias` to the Identity Terms section: "An alternate name or historical name form associated with a Person. Used to locate Person records during mass import."
- Confirm that `README.md` (DOC-1) and `docs/ARCHITECTURE.md` (DOC-2) exist before the PR is opened. Tobias flagged this too: neither file exists in the current repo structure. The "linked from" criteria will fail on merge if the link targets don't exist.
Open Decisions
- `Chronik` vs `Aktivität` canonical name: the glossary groups them as one entry. If both terms appear in the UI or codebase independently, one should be declared canonical. The right answer requires knowing which term users already see in the app and which one Paraglide uses as the translation key.
🗳️ Decision Queue — Action Required
2 decisions need your input before implementation starts.
Terminology
- `Chronik` vs `Aktivität` canonical name: the issue groups these as a single entry (`Chronik / Aktivität`). If both terms appear independently in the UI or codebase, one must be declared canonical and the other marked as an alias or deprecated. If they are the same concept, the entry should state which name wins. The right answer depends on what users already see in the nav/UI and what Paraglide translation key is in use. (Raised by: Leonie, Elicit)
Documentation Scope / Dependencies
- `README.md` (DOC-1) and `docs/ARCHITECTURE.md` (DOC-2) do not yet exist: the acceptance criteria for this issue require linking the glossary from both files, but neither exists in the current repo. Options: (a) deliver DOC-1 and DOC-2 stubs in the same PR as DOC-3, (b) relax the acceptance criterion to link from an existing file (e.g., `docs/architecture/c4-diagrams.md` or `COLLABORATING.md`), or (c) treat this issue as blocked by DOC-1 and DOC-2 completing first. Each option has a different merge sequencing cost. (Raised by: Markus, Tobias, Elicit)
✅ Decision Queue — Resolved
The 2 decisions raised in #397#issuecomment-6305:
1. `Chronik` vs `Aktivität` → separate entries, both real, in different layers
These are not the same concept at the same level:
- `Aktivität` / `Aktivitäten` is the user-facing label: the German UI label for the activity feed. The route is `/aktivitaeten` (verified in `frontend/src/routes/aktivitaeten/`).
- `Chronik` is the internal/technical name, used in code and architecture (e.g., ADR-003 is `003-chronik-unified-activity-feed.md`). It does not appear in user-facing nav.
Glossary treatment: two entries.
- `Aktivität` (Aktivitäten) `[user-facing]`: the family activity feed accessible at `/aktivitaeten`. Shows recent documents, transcriptions, comments, and Geschichten as a chronological timeline.
- `Chronik` `[internal]`: the conceptual name for the unified activity feed (per ADR-003). Used in code and architecture documents. The user-facing label is `Aktivität`.
Cross-reference: each entry says "See also `<other-term>`."
2. `README.md` (DOC-1) and `docs/ARCHITECTURE.md` (DOC-2) don't exist yet → deliver DOC-1 and DOC-3 in parallel; relax the linking AC
DOC-3 is independently writable: it doesn't depend on the README's content, only on a link existing somewhere. Three-pronged resolution:
- The DOC-3 PR writes `docs/GLOSSARY.md` and adds a temporary link from `docs/architecture/c4-diagrams.md` (which already exists) and `COLLABORATING.md` (per Sara's recommendation). This satisfies "linked from at least one architecture doc."
- The original criterion "linked from `README.md` and `docs/ARCHITECTURE.md`" is recorded but verifiable only after DOC-1/DOC-2 merge. Track it as a follow-up checkbox closed when DOC-1 and DOC-2 land.
This unblocks DOC-3 so it can ship without waiting for DOC-1/DOC-2.
📌 Additional persona feedback to fold into implementation
These adjust the term list and definitions; not separate decisions:
- Fix `TranscriptionBlock` definition: replace "bounding box" with "polygon region" (per ADR-002 polygon-jsonb-storage and the actual `polygon` JSONB column).
- Fix `DocumentVersion`: replace "immutable snapshot" with "append-only by convention; no consumer-facing create/update endpoint" (the entity has Lombok `@Data` with setters; immutability is convention, not enforced).
- Add `OcrJob` to the OCR section with its `OcrJobStatus` lifecycle. Distinct from "running OCR": it's a first-class entity (`ocr_jobs`, `ocr_job_documents`).
- Sharpen `SenderModel`: note it is also a persistent entity (`sender_models` table) linking a `Person` to a fine-tuned Kraken model file. It currently reads as if it's a pure runtime concept.
- Sharpen `Comment`: say "always scoped to a Document; optionally further contextualized by an Annotation or TranscriptionBlock." The Java class is `DocumentComment`; note the class name in parentheses per Felix.
- Use the form `**Comment** (DocumentComment) — …`. Helps grep. Apply to `Person`, `AppUser`, `Document`, `TranscriptionBlock`, `Annotation` (`DocumentAnnotation`), etc.
- Verify the `Audit log` claim before commit. The current entry says "UPDATE/DELETE revoked at DB level." Verify Flyway migrations actually enforce this (look for `REVOKE` or partial-index constructs). If only application-layer protection exists, change wording to "append-only by application convention." False security documentation is worse than none.
- `Permission` entry: "enforced via `@RequirePermission` AOP on controller methods (checked by `PermissionAspect`); not via Spring Security `@PreAuthorize`."
- Make the `Person` ↔ `AppUser` cross-reference bidirectional ("Not to be confused with X" on both).
- `DocumentVersion`: "no consumer-facing create endpoint; versions created automatically on document save."
- Add `PersonNameAlias` to Identity Terms: "an alternate name or historical name form associated with a Person; used to locate Person records during mass import."
- Add a `## Pending Terms` trailing section (per Sara) so contributors know how to extend the glossary as new audit findings surface.
- Link from `COLLABORATING.md`: it's read first by new contributors.
- Link from `docs/architecture/c4-diagrams.md`.
- Tag `[user-facing]` or `[internal]` for terms that exist in both layers (Geschichte, Briefwechsel, Aktivität, Chronik). Clarifies which terms must stay synchronised with Paraglide translation keys.
Status: Ready for implementation.
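To make the `Permission` enforcement note concrete, here is a minimal sketch of an annotation-driven check. It is not the project's code: the thread says the real check is done by `PermissionAspect` via AOP, whereas this example swaps in plain reflection so it runs without Spring. Every type with a `Sketch` suffix is hypothetical, and `READ_ALL` is invented for illustration (only `WRITE_ALL` is named in the discussion).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;

// Hypothetical stand-in for the project's Permission enum.
// WRITE_ALL is named in the thread; READ_ALL is an assumption for this example.
enum PermissionSketch { READ_ALL, WRITE_ALL }

// Hypothetical stand-in for @RequirePermission.
// RUNTIME retention is needed so the annotation is visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RequirePermissionSketch {
    PermissionSketch value();
}

// Hypothetical controller with one guarded and one unguarded method.
class DocumentControllerSketch {
    @RequirePermissionSketch(PermissionSketch.WRITE_ALL)
    public String updateDocument() { return "updated"; }

    public String listDocuments() { return "listed"; }
}

// Plain-reflection substitute for the decision an aspect would make before the call.
final class PermissionCheckSketch {
    private PermissionCheckSketch() {}

    static boolean isAllowed(Class<?> controller, String methodName,
                             Set<PermissionSketch> granted) {
        try {
            Method method = controller.getMethod(methodName);
            RequirePermissionSketch required =
                    method.getAnnotation(RequirePermissionSketch.class);
            // In this sketch, a method without the annotation needs no permission.
            return required == null || granted.contains(required.value());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("Unknown method: " + methodName, e);
        }
    }
}
```

With these definitions, a caller holding only `READ_ALL` is rejected for `updateDocument` but allowed for `listDocuments`. The point for the glossary entry is that the permission lives in one declarative place on the controller method, which is why a second `@PreAuthorize` or manual if-check would be redundant.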
Implementation complete — PR #439
`docs/GLOSSARY.md` is written and PR #439 is open.
Commits
- `53020751`: `docs/GLOSSARY.md` (113 lines, 7 sections)
- `9dae044e`: link from `docs/architecture/c4-diagrams.md`
- `a5f4b0df`: link from `COLLABORATING.md`
Acceptance criteria status
- `docs/GLOSSARY.md` exists with all categories
- Linked from `docs/architecture/c4-diagrams.md` and `COLLABORATING.md` (temporary; DOC-1/DOC-2 PRs own their own GLOSSARY links per the resolved Decision Queue)
- `## Pending Terms` section includes a note to review Epic 1 audit findings (#388–#392) before closing
- Link from `README.md`: deferred until DOC-1 PR (#395) lands
- Link from `docs/ARCHITECTURE.md`: deferred until DOC-2 PR (#396) lands
Key corrections applied (vs issue body)
- `TranscriptionBlock`: `PolygonConverter.java`
- `DocumentVersion`: `@Data` generates setters
- `Audit log`
Additions beyond the issue body
- `OcrJob` with `OcrJobStatus` lifecycle (PENDING → RUNNING → DONE / FAILED)
- `PersonNameAlias` in Identity Terms
- `SenderModel` sharpened: also a persistent entity (`sender_models` table)
- `Comment` sharpened: Java class is `DocumentComment`
- `Chronik` [internal] and `Aktivität` [user-facing] as separate entries
- `Permission` enforcement note (`@RequirePermission` AOP via `PermissionAspect`)
- `[user-facing]` / `[internal]` tags on Geschichte, Briefwechsel, Aktivität, Chronik
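The `OcrJobStatus` lifecycle listed above (PENDING → RUNNING → DONE / FAILED) can be read as a small state machine. The sketch below is one possible reading, not the project's enum: the type name `OcrJobStatusSketch` and the `canTransitionTo` helper are hypothetical, and it assumes that DONE and FAILED are terminal and that a job cannot fail before it starts running. Neither assumption is confirmed in this thread.

```java
// Hypothetical mirror of the lifecycle PENDING -> RUNNING -> DONE / FAILED.
enum OcrJobStatusSketch {
    PENDING, RUNNING, DONE, FAILED;

    // True if a job may move from this status to the given next status.
    boolean canTransitionTo(OcrJobStatusSketch next) {
        switch (this) {
            case PENDING:
                return next == RUNNING;
            case RUNNING:
                return next == DONE || next == FAILED;
            default:
                // Assumption: DONE and FAILED are terminal in this sketch.
                return false;
        }
    }

    // True for statuses with no outgoing transitions in this sketch.
    boolean isTerminal() {
        return this == DONE || this == FAILED;
    }
}
```

A table like this is also a convenient target for the status-transition tests Sara mentions: each allowed and each forbidden pair becomes one assertion.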