Familienarchiv — Glossary
Domain-specific and overloaded terms used in this codebase. Each entry: Term — definition (≤ 2 sentences). Where two terms are easily confused, a Not to be confused with note follows.
For architecture context see docs/architecture/c4-diagrams.md.
For domain package structure see docs/ARCHITECTURE.md (coming: DOC-2).
Identity Terms
AppUser (AppUser) — a real person who can log into the system (a family member or administrator). AppUser records carry login credentials, group memberships, and notification history.
Not to be confused with Person — an AppUser is never recorded as a document sender, receiver, or historical individual.
Permission — a discrete capability string assigned to a UserGroup (e.g. READ_ALL, WRITE_ALL, ADMIN, ADMIN_USER, ADMIN_TAG, ADMIN_PERMISSION). Enforced via the @RequirePermission AOP annotation on controller methods, checked at runtime by PermissionAspect; not via Spring Security's @PreAuthorize.
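The annotation-based check described above can be illustrated without Spring. The sketch below is only an approximation: it uses plain reflection instead of AOP, and `RequirePermissionSketch`, `TagControllerSketch`, and `PermissionCheckSketch` are hypothetical names, not the project's actual `@RequirePermission` or `PermissionAspect`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;

// Hypothetical stand-in for @RequirePermission; the real annotation's attributes may differ.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RequirePermissionSketch {
    String value();
}

// Hypothetical controller with one protected and one unprotected method.
final class TagControllerSketch {
    @RequirePermissionSketch("ADMIN_TAG")
    void deleteTag() {}

    void listTags() {}
}

final class PermissionCheckSketch {
    private PermissionCheckSketch() {}

    // Returns true if the method carries no annotation, or if the caller
    // holds the permission string named by the annotation.
    static boolean isAllowed(Class<?> controller, String methodName, Set<String> userPermissions) {
        try {
            Method method = controller.getDeclaredMethod(methodName);
            RequirePermissionSketch required = method.getAnnotation(RequirePermissionSketch.class);
            return required == null || userPermissions.contains(required.value());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }
}
```

In the real codebase the equivalent check presumably runs inside the aspect before the controller method executes; the reflection lookup here only stands in for that interception step.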
Person (Person) — a historical individual in the family archive (sender, receiver of letters, person mentioned in transcriptions). NEVER has a login account and NEVER appears as an AppUser.
Not to be confused with AppUser — Person is a historical record; AppUser is someone who can log in today.
PersonNameAlias (PersonNameAlias) — an alternate or historical name form associated with a Person (e.g. maiden name, nickname, abbreviated form). Used to locate Person records during mass import via PersonNameAliasType.
UserGroup (UserGroup) — a named permission bundle assigned to one or more AppUsers. A user's effective permissions are the union of all permissions across all groups they belong to.
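The union rule for effective permissions can be sketched as follows. `GroupSketch` and `EffectivePermissions` are hypothetical names chosen for illustration; the real UserGroup entity and its accessors are not shown in this glossary.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the UserGroup entity: a name plus its permission strings.
record GroupSketch(String name, Set<String> permissions) {}

final class EffectivePermissions {
    private EffectivePermissions() {}

    // Union of the permissions of every group the user belongs to.
    static Set<String> of(List<GroupSketch> groups) {
        Set<String> effective = new TreeSet<>();
        for (GroupSketch group : groups) {
            effective.addAll(group.permissions());
        }
        return effective;
    }

    // A user holds a permission if at least one of their groups grants it.
    static boolean has(List<GroupSketch> groups, String permission) {
        return of(groups).contains(permission);
    }
}
```

A user in both a READ_ALL group and a READ_ALL plus WRITE_ALL group ends up with exactly those two permissions; duplicates collapse because the result is a set.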
Document-Related Terms
Annotation (DocumentAnnotation) — a free-form polygon or shape drawn over a document page image to highlight a region of interest. Always scoped to a specific page of a Document; stored as a polygon (JSONB).
See also TranscriptionBlock.
Comment (DocumentComment, table document_comments) — a threaded discussion message attached to a Document. Always scoped to a Document; optionally further contextualized by a specific DocumentAnnotation or TranscriptionBlock.
Document (Document) — a single archival item (letter, postcard, photograph) with a file stored in MinIO/S3 and associated metadata (sender, receivers, date, tags, transcription blocks).
DocumentVersion (DocumentVersion) — an append-only snapshot of a Document's metadata at a point in time. Append-only by convention; no consumer-facing create or update endpoint exists. The entity uses Lombok @Data (which generates setters), so immutability is enforced by application convention, not at the Java level.
Tag (Tag) — a hierarchical category that can be applied to Documents. Tags are self-referencing via a parent_id foreign key, forming a tree structure.
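Because each Tag points to its parent, the full path of a tag can be recovered by walking the parent_id links upward. The following is a minimal sketch under assumptions: `TagRow` and `TagPaths` are hypothetical, and the tags are assumed to be already loaded into a map keyed by id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical flattened view of a Tag row: id, display name, nullable parent id.
record TagRow(long id, String name, Long parentId) {}

final class TagPaths {
    private TagPaths() {}

    // Walks parent_id links upward and returns the names from the root down to the given tag.
    static List<String> pathTo(long tagId, Map<Long, TagRow> tagsById) {
        List<String> names = new ArrayList<>();
        Set<Long> visited = new HashSet<>();
        Long current = tagId;
        while (current != null) {
            // Guard against corrupt data where a tag is its own ancestor.
            if (!visited.add(current)) {
                throw new IllegalStateException("Cycle in tag hierarchy at id " + current);
            }
            TagRow row = tagsById.get(current);
            if (row == null) {
                throw new IllegalArgumentException("Unknown tag id " + current);
            }
            names.add(row.name());
            current = row.parentId();
        }
        Collections.reverse(names);
        return names;
    }
}
```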
TranscriptionBlock (TranscriptionBlock) — a paragraph-level segment of a Document's transcribed text, with a polygon region (stored as JSONB) identifying its position on the page. One document can have many blocks across multiple pages.
See also Annotation.
Workflow Terms
DocumentStatus lifecycle — the ordered states a Document moves through:
PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED
- PLACEHOLDER: created during mass import; no file attached yet.
- UPLOADED: a file has been stored in MinIO/S3.
- TRANSCRIBED: all transcription blocks have been marked done.
- REVIEWED: a reviewer has approved the transcription.
- ARCHIVED: the document is finalized and read-only.
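The lifecycle above can be modelled as an ordered enum. This is a sketch, not the project's DocumentStatus type: `DocumentStatusSketch` is a hypothetical name, and the rule that a document only advances one step at a time is an assumption that the glossary does not state.

```java
// Illustrative enum; declaration order mirrors the lifecycle listed above.
enum DocumentStatusSketch {
    PLACEHOLDER, UPLOADED, TRANSCRIBED, REVIEWED, ARCHIVED;

    // Assumption: a document only ever advances to the immediately following state.
    boolean canAdvanceTo(DocumentStatusSketch next) {
        return next.ordinal() == this.ordinal() + 1;
    }

    // ARCHIVED is described as finalized and read-only.
    boolean isReadOnly() {
        return this == ARCHIVED;
    }
}
```

If the real workflow allows skipping or reverting states, `canAdvanceTo` would need an explicit transition table instead of the ordinal comparison.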
Mass import — an asynchronous batch process (MassImportService) that reads an Excel or ODS file and creates Persons, Tags, and PLACEHOLDER Documents in one shot. Only one import can run at a time (IMPORT_ALREADY_RUNNING error if attempted concurrently).
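One common way to enforce a single-run rule like this is an atomic flag. The sketch below is hypothetical: `ImportGuardSketch` is not part of the codebase, the real MassImportService may guard concurrency differently, and the job runs synchronously here to keep the example short.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard; the real MassImportService may enforce this differently.
final class ImportGuardSketch {
    private final AtomicBoolean running = new AtomicBoolean(false);

    // Runs the job unless another import is already in progress.
    // Returns "IMPORT_ALREADY_RUNNING" instead of starting a second run.
    String tryRun(Runnable importJob) {
        // compareAndSet flips the flag only if no other caller has claimed it.
        if (!running.compareAndSet(false, true)) {
            return "IMPORT_ALREADY_RUNNING";
        }
        try {
            importJob.run();
            return "OK";
        } finally {
            // Release the flag even if the job throws, so later imports can start.
            running.set(false);
        }
    }
}
```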
Transcription queue — the set of Documents and TranscriptionBlocks awaiting work, computed on-the-fly from Document/Block status. Three views: segmentation queue, transcription queue, ready-to-read queue. NOT a persistent entity — no transcription_queues table exists.
See also DocumentStatus lifecycle.
OCR-Specific Terms
HTR — Handwritten Text Recognition. Recognizes cursive and historical handwriting (contrasted with OCR for printed/typewritten text). The primary mode used for letters in this archive.
Kurrent — Old German cursive handwriting style, the primary historical script appearing in letters from the 1899–1950 period covered by this archive.
OCR — Optical Character Recognition. Recognizes printed or typewritten text. Used for typed documents; HTR is used for handwritten ones.
OcrJob (OcrJob, table ocr_jobs) — a first-class persistent entity tracking a batch OCR run across one or more documents (OcrJobDocument, table ocr_job_documents). Distinct from the concept of "running OCR on a single document." Lifecycle: PENDING → RUNNING → DONE / FAILED (see OcrJobStatus).
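The OcrJob lifecycle branches at the end, so it is easier to express as a transition table than as a simple ordering. A minimal sketch, assuming the arrows above are the only permitted transitions; `OcrJobStatusSketch` is a hypothetical name and not the project's OcrJobStatus.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative enum; the real OcrJobStatus may differ in details.
enum OcrJobStatusSketch {
    PENDING, RUNNING, DONE, FAILED;

    // Allowed next states, read off the lifecycle arrow notation.
    Set<OcrJobStatusSketch> next() {
        switch (this) {
            case PENDING:
                return EnumSet.of(RUNNING);
            case RUNNING:
                return EnumSet.of(DONE, FAILED);
            default:
                // DONE and FAILED have no outgoing transitions in this sketch.
                return EnumSet.noneOf(OcrJobStatusSketch.class);
        }
    }

    // A state is terminal when nothing can follow it.
    boolean isTerminal() {
        return next().isEmpty();
    }
}
```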
SenderModel (SenderModel, table sender_models) — a fine-tuned Kraken HTR model trained on a specific historical correspondent's handwriting. Both an OCR-service concept (the model weights) and a persistent entity linking a Person to the path of their trained model file.
Sütterlin — a specific standardized style of Kurrent taught in German schools from 1915 to 1941.
Other Domain Terms
Aktivität / Aktivitäten [user-facing] — the family activity feed accessible at /aktivitaeten. Shows recent documents, transcriptions, comments, and Geschichten as a chronological timeline.
See also Chronik.
Briefwechsel [user-facing] — the bilateral conversation timeline between two Persons, derived from Document sender/receiver relationships. Accessible at /briefwechsel. Not a persistent entity — data is computed from existing Document records.
See also Derived domain.
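Deriving a Briefwechsel from existing Document records amounts to a filter and a sort. The sketch below assumes a simplified projection of a Document; `LetterSketch` and `BriefwechselSketch` are hypothetical names, and every letter is assumed to have a non-null date.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical projection of a Document: id, sender Person id, receiver Person ids, date.
record LetterSketch(long id, long senderId, Set<Long> receiverIds, LocalDate date) {}

final class BriefwechselSketch {
    private BriefwechselSketch() {}

    // Keeps letters sent by one person to the other, in either direction,
    // and orders them chronologically. Nothing is persisted.
    static List<LetterSketch> between(long personA, long personB, List<LetterSketch> letters) {
        return letters.stream()
            .filter(l -> (l.senderId() == personA && l.receiverIds().contains(personB))
                      || (l.senderId() == personB && l.receiverIds().contains(personA)))
            .sorted(Comparator.comparing(LetterSketch::date))
            .collect(Collectors.toList());
    }
}
```

A letter with several receivers appears in the timeline as long as the other person is among them.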
Chronik [internal] — the conceptual and code-level name for the unified activity feed (per ADR-003 003-chronik-unified-activity-feed.md). Used in code, architecture documents, and ADRs. The user-facing label for the same concept is Aktivität.
Geschichte (Geschichte) [user-facing] — a narrative story or article published in the archive, linking Persons and Documents. Lifecycle: DRAFT → PUBLISHED (see GeschichteStatus). DRAFT stories are hidden from users without the BLOG_WRITE permission.
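The draft-visibility rule can be sketched as a filter over stories. `GeschichteStatusSketch`, `StorySketch`, and `StoryVisibility` are hypothetical names; where the real check lives (service, query, or controller) is not stated in this glossary.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative status enum mirroring DRAFT → PUBLISHED.
enum GeschichteStatusSketch { DRAFT, PUBLISHED }

// Hypothetical projection of a Geschichte: title and status only.
record StorySketch(String title, GeschichteStatusSketch status) {}

final class StoryVisibility {
    private StoryVisibility() {}

    // PUBLISHED stories are visible to everyone; DRAFT stories only to
    // users whose effective permissions include BLOG_WRITE.
    static List<StorySketch> visibleTo(Set<String> permissions, List<StorySketch> stories) {
        boolean canSeeDrafts = permissions.contains("BLOG_WRITE");
        return stories.stream()
            .filter(s -> s.status() == GeschichteStatusSketch.PUBLISHED || canSeeDrafts)
            .collect(Collectors.toList());
    }
}
```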
Notification (Notification) — an in-app message delivered to an AppUser. No email or SMS delivery exists today. Delivered via Server-Sent Events (SseEmitterRegistry) and persisted in the notifications table.
Audit log (AuditLog, table audit_log) — an append-only event store recording domain-level activity (document edits, user actions, etc.). Append-only by application convention; a REVOKE UPDATE, DELETE is attempted at the DB layer (see migrations V46, V47) but is a no-op if the application role is the table owner in PostgreSQL. Do not rely on DB-enforced immutability — the constraint is application-layer only.
Architectural Terms
Cross-cutting — code that lives in lib/shared/ (frontend) or cross-domain packages (backend) because it has no entity of its own, no user-facing CRUD, AND is used by two or more domains OR is framework infrastructure (error handling, API client, i18n utilities).
Derived domain — a Tier-2 frontend domain that has its own UI but no backend entities of its own. Data is computed from Tier-1 domain records. Current derived domains: conversation (from Document sender/receivers) and activity (from audit, notifications, document events).
See also Briefwechsel.
Domain — a Tier-1 bounded context with its own entities, controller, service, repository, and DTOs. Backend domains: document, person, tag, user, geschichte, notification, ocr, audit, dashboard. Frontend domains mirror this structure under src/lib/.
Pending Terms
Terms flagged as potentially ambiguous that have not yet been formally defined here. Add an entry above and remove it from this list when resolved.
- Terms surfaced by Epic 1 audit findings (#388–#392): review audit reports under docs/audits/ when available and add any term flagged as ambiguous.
- OcrBatchService vs OcrAsyncRunner: both handle async OCR orchestration; their division of responsibility should be clarified here.
- Stammbaum: the genealogy tree view; relationship to PersonRelationship entity.