familienarchiv/docs/GLOSSARY.md
Marcel 9e1754bbb0 docs: add Reader glossary entry + clarifying comments on specs and query
- GLOSSARY.md: defines "Reader" as the permission-derived role
  (isReader = !canWrite && !canAnnotate) — addresses @Markus blocker
- GeschichteSpecifications.hasAuthor: comment explains null = no restriction
  (PUBLISHED path) — addresses @Markus suggestion
- PersonRepository.findTopByDocumentCount: comment explains alias-in-ORDER-BY
  is intentional PostgreSQL behaviour — addresses @Markus suggestion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 15:56:47 +02:00


Familienarchiv — Glossary

Domain-specific and overloaded terms used in this codebase. Each entry: Term — definition (≤ 2 sentences). Where two terms are easily confused, a Not to be confused with note follows.

For architecture context see docs/architecture/c4-diagrams.md. For domain package structure see docs/ARCHITECTURE.md (coming: DOC-2).


Identity Terms

AppUser (AppUser) — a real person who can log into the system (a family member or administrator). AppUser records carry login credentials, group memberships, and notification history. Not to be confused with Person — an AppUser is never recorded as a document sender, receiver, or historical individual.

Reader — an AppUser whose effective permissions include READ_ALL but neither WRITE_ALL nor ANNOTATE_ALL. Readers see a dedicated dashboard (isReader = !canWrite && !canAnnotate) focused on browsing documents, persons, and stories rather than contribution tasks. A user who also holds BLOG_WRITE is still classified as a Reader and additionally sees a drafts module. Not to be confused with AppUser — Reader is a permission-derived role, not an entity.
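The Reader rule above can be sketched as a small function. This is a minimal illustration, assuming the effective permissions are available as a set of strings; the names `classifyDashboard` and `DashboardRole` are hypothetical and not taken from the codebase.

```typescript
// Hypothetical sketch: derive the Reader role from effective permissions.
// Permission names come from the glossary; the types and function are assumed.
interface DashboardRole {
  isReader: boolean;
  showDrafts: boolean; // drafts module for BLOG_WRITE holders
}

function classifyDashboard(effective: ReadonlySet<string>): DashboardRole {
  const canWrite = effective.has("WRITE_ALL");
  const canAnnotate = effective.has("ANNOTATE_ALL");
  return {
    // Mirrors the glossary rule: isReader = !canWrite && !canAnnotate
    isReader: !canWrite && !canAnnotate,
    // BLOG_WRITE does not change the Reader classification
    showDrafts: effective.has("BLOG_WRITE"),
  };
}
```

Note that the sketch follows the formula literally and does not check READ_ALL; whether the real code does is not stated here.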

Permission — a discrete capability string assigned to a UserGroup (e.g. READ_ALL, WRITE_ALL, ADMIN, ADMIN_USER, ADMIN_TAG, ADMIN_PERMISSION). Enforced via the @RequirePermission AOP annotation on controller methods, checked at runtime by PermissionAspect; not via Spring Security's @PreAuthorize.

Person (Person) — a historical individual in the family archive (a sender or receiver of letters, or a person mentioned in transcriptions). NEVER has a login account and NEVER appears as an AppUser. Not to be confused with AppUser: a Person is a historical record; an AppUser is someone who can log in today.

PersonNameAlias (PersonNameAlias) — an alternate or historical name form associated with a Person (e.g. maiden name, nickname, abbreviated form). Used to locate Person records during mass import via PersonNameAliasType.

UserGroup (UserGroup) — a named permission bundle assigned to one or more AppUsers. A user's effective permissions are the union of all permissions across all groups they belong to.
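The union rule can be illustrated with a short sketch. The `UserGroup` shape and the `effectivePermissions` helper below are assumptions for illustration only; only the union semantics come from the definition above.

```typescript
// Hypothetical shape; the real entity likely carries more fields.
interface UserGroup {
  name: string;
  permissions: string[];
}

// Effective permissions = union of the permissions of every group.
function effectivePermissions(groups: UserGroup[]): Set<string> {
  const result = new Set<string>();
  for (const group of groups) {
    for (const permission of group.permissions) {
      result.add(permission); // Set semantics drop duplicates
    }
  }
  return result;
}

// Example: two overlapping groups (group names are made up).
const perms = effectivePermissions([
  { name: "family", permissions: ["READ_ALL"] },
  { name: "editors", permissions: ["READ_ALL", "WRITE_ALL"] },
]);
```

Here `perms` contains READ_ALL and WRITE_ALL exactly once each.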


Document Terms

Annotation (DocumentAnnotation) — a free-form polygon or shape drawn over a document page image to highlight a region of interest. Always scoped to a specific page of a Document; stored as a polygon (JSONB). See also TranscriptionBlock.

Comment (DocumentComment, table document_comments) — a threaded discussion message attached to a Document. Always scoped to a Document; optionally further contextualized by a specific DocumentAnnotation or TranscriptionBlock.

Document (Document) — a single archival item (letter, postcard, photograph) with a file stored in MinIO/S3 and associated metadata (sender, receivers, date, tags, transcription blocks).

DocumentVersion (DocumentVersion) — an append-only snapshot of a Document's metadata at a point in time. Append-only by convention; no consumer-facing create or update endpoint exists. The entity uses Lombok @Data (which generates setters), so immutability is enforced by application convention, not at the Java level.

Tag (Tag) — a hierarchical category that can be applied to Documents. Tags are self-referencing via a parent_id foreign key, forming a tree structure.
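As an illustration of the self-referencing structure, the sketch below rebuilds a tree from flat rows. The `TagRow` shape (string ids, a `name` field) and the `buildTagTree` helper are hypothetical; only the `parent_id` self-reference comes from the definition above.

```typescript
// Hypothetical row shape: only parent_id is taken from the glossary.
interface TagRow {
  id: string;
  name: string;
  parent_id: string | null;
}

interface TagNode extends TagRow {
  children: TagNode[];
}

function buildTagTree(rows: TagRow[]): TagNode[] {
  // First pass: index every row by id.
  const byId = new Map<string, TagNode>();
  for (const row of rows) {
    byId.set(row.id, { ...row, children: [] });
  }
  // Second pass: attach each node to its parent, or collect it as a root.
  const roots: TagNode[] = [];
  for (const row of rows) {
    const node = byId.get(row.id)!;
    const parent = row.parent_id === null ? undefined : byId.get(row.parent_id);
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node); // no parent, or parent_id points at a missing row
    }
  }
  return roots;
}
```

Treating rows with a dangling `parent_id` as roots is a choice made for this sketch, not documented behaviour.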

TranscriptionBlock (TranscriptionBlock) — a paragraph-level segment of a Document's transcribed text, with a polygon region (stored as JSONB) identifying its position on the page. One document can have many blocks across multiple pages. See also Annotation.


Workflow Terms

DocumentStatus lifecycle — the ordered states a Document moves through: PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED

  • PLACEHOLDER: created during mass import; no file attached yet.
  • UPLOADED: a file has been stored in MinIO/S3.
  • TRANSCRIBED: all transcription blocks have been marked done.
  • REVIEWED: a reviewer has approved the transcription.
  • ARCHIVED: the document is finalized and read-only.
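The ordering can be expressed as data. The following is a minimal sketch that assumes a strictly linear lifecycle with no skipped or reversed steps, which the glossary does not confirm; `nextStatus` and `hasReached` are hypothetical helpers.

```typescript
// Status names and their order are from the glossary; the helpers are assumed.
const DOCUMENT_STATUS_ORDER = [
  "PLACEHOLDER",
  "UPLOADED",
  "TRANSCRIBED",
  "REVIEWED",
  "ARCHIVED",
] as const;

type DocumentStatus = (typeof DOCUMENT_STATUS_ORDER)[number];

// Returns the next state in the lifecycle, or null once ARCHIVED is reached.
function nextStatus(current: DocumentStatus): DocumentStatus | null {
  const index = DOCUMENT_STATUS_ORDER.indexOf(current);
  return index < DOCUMENT_STATUS_ORDER.length - 1
    ? DOCUMENT_STATUS_ORDER[index + 1]
    : null;
}

// True when `status` is at or beyond `milestone` in the lifecycle.
function hasReached(status: DocumentStatus, milestone: DocumentStatus): boolean {
  return (
    DOCUMENT_STATUS_ORDER.indexOf(status) >=
    DOCUMENT_STATUS_ORDER.indexOf(milestone)
  );
}
```

Keeping the order in a single array means comparisons such as "has this document been transcribed yet?" need no per-status branching.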

Mass import — an asynchronous batch process (MassImportService) that reads an Excel or ODS file and creates Persons, Tags, and PLACEHOLDER Documents in one shot. Only one import can run at a time (IMPORT_ALREADY_RUNNING error if attempted concurrently).

Transcription queue — the set of Documents and TranscriptionBlocks awaiting work, computed on the fly from Document/Block status. Three views: segmentation queue, transcription queue, ready-to-read queue. NOT a persistent entity — no transcription_queues table exists. See also DocumentStatus lifecycle.


OCR-Specific Terms

HTR — Handwritten Text Recognition. Recognizes cursive and historical handwriting (contrasted with OCR for printed/typewritten text). The primary mode used for letters in this archive.

Kurrent — Old German cursive handwriting style, the primary historical script appearing in letters from the 1899–1950 period covered by this archive.

OCR — Optical Character Recognition. Recognizes printed or typewritten text. Used for typed documents; HTR is used for handwritten ones.

OcrJob (OcrJob, table ocr_jobs) — a first-class persistent entity tracking a batch OCR run across one or more documents (OcrJobDocument, table ocr_job_documents). Distinct from the concept of "running OCR on a single document." Lifecycle: PENDING → RUNNING → DONE / FAILED (see OcrJobStatus).
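The OcrJob lifecycle can be sketched as a transition table. The status names come from the entry above; the `ALLOWED_TRANSITIONS` table and `canTransition` helper are hypothetical, and the sketch assumes a job can only fail once it is RUNNING, which the glossary does not state.

```typescript
type OcrJobStatus = "PENDING" | "RUNNING" | "DONE" | "FAILED";

// Assumed transition table: PENDING -> RUNNING -> DONE or FAILED.
const ALLOWED_TRANSITIONS: Record<OcrJobStatus, OcrJobStatus[]> = {
  PENDING: ["RUNNING"],
  RUNNING: ["DONE", "FAILED"],
  DONE: [], // terminal
  FAILED: [], // terminal
};

function canTransition(from: OcrJobStatus, to: OcrJobStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```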

SenderModel (SenderModel, table sender_models) — a fine-tuned Kraken HTR model trained on a specific historical correspondent's handwriting. Both an OCR-service concept (the model weights) and a persistent entity linking a Person to the path of their trained model file.

Sütterlin — A specific standardized style of Kurrent taught in German schools from 1915 to 1941.


Other Domain Terms

Aktivität / Aktivitäten [user-facing] — the family activity feed accessible at /aktivitaeten. Shows recent documents, transcriptions, comments, and Geschichten as a chronological timeline. See also Chronik.

Briefwechsel [user-facing] — the bilateral conversation timeline between two Persons, derived from Document sender/receiver relationships. Accessible at /briefwechsel. Not a persistent entity — data is computed from existing Document records. See also Derived domain.
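To illustrate what "derived from Document sender/receiver relationships" means, here is a minimal sketch. The `DocumentSummary` shape and the `briefwechsel` function are assumptions for illustration; the actual query and field names are not described in this glossary.

```typescript
// Hypothetical document shape; only the sender/receiver derivation is from the glossary.
interface DocumentSummary {
  id: string;
  senderId: string | null;
  receiverIds: string[];
  date: string | null; // ISO date such as "1923-04-01"; null when unknown
}

// Returns every document sent from a to b or from b to a, oldest first.
function briefwechsel(docs: DocumentSummary[], a: string, b: string): DocumentSummary[] {
  const between = docs.filter(
    (doc) =>
      (doc.senderId === a && doc.receiverIds.indexOf(b) !== -1) ||
      (doc.senderId === b && doc.receiverIds.indexOf(a) !== -1),
  );
  // ISO dates compare correctly as strings; undated documents sort last (an assumption).
  return between.sort((x, y) => {
    if (x.date === y.date) return 0;
    if (x.date === null) return 1;
    if (y.date === null) return -1;
    return x.date < y.date ? -1 : 1;
  });
}
```

Because `filter` returns a new array, sorting the result leaves the input list untouched.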

Chronik [internal] — the conceptual and code-level name for the unified activity feed (per ADR-003 003-chronik-unified-activity-feed.md). Used in code, architecture documents, and ADRs. The user-facing label for the same concept is Aktivität.

Geschichte (Geschichte) [user-facing] — a narrative story or article published in the archive, linking Persons and Documents. Lifecycle: DRAFT → PUBLISHED (see GeschichteStatus). DRAFT stories are hidden from users without the BLOG_WRITE permission.
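The visibility rule can be sketched as a filter. This models only the rule stated above; the `GeschichteSummary` shape and `visibleGeschichten` function are hypothetical, and the real backend may apply further restrictions (for example by author) that this glossary does not describe.

```typescript
type GeschichteStatus = "DRAFT" | "PUBLISHED";

// Hypothetical summary shape for illustration.
interface GeschichteSummary {
  title: string;
  status: GeschichteStatus;
}

function visibleGeschichten(
  all: GeschichteSummary[],
  permissions: ReadonlySet<string>,
): GeschichteSummary[] {
  const canSeeDrafts = permissions.has("BLOG_WRITE");
  // PUBLISHED stories are always visible; DRAFT only with BLOG_WRITE.
  return all.filter((g) => g.status === "PUBLISHED" || canSeeDrafts);
}
```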

Notification (Notification) — an in-app message delivered to an AppUser. No email or SMS delivery exists today. Delivered via Server-Sent Events (SseEmitterRegistry) and persisted in the notifications table.

Audit log (AuditLog, table audit_log) — an append-only event store recording domain-level activity (document edits, user actions, etc.). Append-only by application convention; a REVOKE UPDATE, DELETE is attempted at the DB layer (see migrations V46, V47) but is a no-op if the application role is the table owner in PostgreSQL. Do not rely on DB-enforced immutability — the constraint is application-layer only.


Architectural Terms

Cross-cutting — code that lives in lib/shared/ (frontend) or cross-domain packages (backend) because it has no entity of its own, no user-facing CRUD, AND is used by two or more domains OR is framework infrastructure (error handling, API client, i18n utilities).

Derived domain — a Tier-2 frontend domain that has its own UI but no backend entities of its own. Data is computed from Tier-1 domain records. Current derived domains: conversation (from Document sender/receivers) and activity (from audit, notifications, document events). See also Briefwechsel.

Domain — a Tier-1 bounded context with its own entities, controller, service, repository, and DTOs. Backend domains: document, person, tag, user, geschichte, notification, ocr, audit, dashboard. Frontend domains mirror this structure under src/lib/.


Pending Terms

Terms flagged as potentially ambiguous that have not yet been formally defined here. Add an entry above and remove it from this list when resolved.

  • Terms surfaced by Epic 1 audit findings (#388–#392) — review audit reports under docs/audits/ when available and add any term flagged as ambiguous.
  • OcrBatchService vs OcrAsyncRunner — both handle async OCR orchestration; their division of responsibility should be clarified here.
  • Stammbaum — the genealogy tree view; relationship to PersonRelationship entity.