Files
familienarchiv/docs/GLOSSARY.md
Marcel 5302075124 docs(legibility): write docs/GLOSSARY.md — DOC-3
Disambiguates all overloaded terms in the codebase: Person vs AppUser,
Chronik (internal) vs Aktivität (user-facing), TranscriptionBlock polygon
vs bounding box, DocumentVersion append-only convention, OcrJob lifecycle,
SenderModel as persistent entity, Audit log DB-layer caveat, and more.

Includes Pending Terms section for audit follow-ups (#388–#392).

Refs #397

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 22:27:13 +02:00

114 lines
8.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Familienarchiv — Glossary
Domain-specific and overloaded terms used in this codebase.
Each entry: **Term** — definition (≤ 2 sentences). Where two terms are easily confused, a _Not to be confused with_ note follows.
For architecture context see [`docs/architecture/c4-diagrams.md`](architecture/c4-diagrams.md).
For domain package structure see [`docs/ARCHITECTURE.md`](ARCHITECTURE.md) _(coming: DOC-2)_.
---
## Identity Terms
**AppUser** (`AppUser`) — a real person who can log into the system (a family member or administrator). `AppUser` records carry login credentials, group memberships, and notification history.
_Not to be confused with [Person](#person-person)_ — an AppUser is never recorded as a document sender, receiver, or historical individual.
**Permission** — a discrete capability string assigned to a `UserGroup` (e.g. `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`). Enforced via the `@RequirePermission` AOP annotation on controller methods, checked at runtime by `PermissionAspect`; not via Spring Security's `@PreAuthorize`.
**Person** (`Person`) — a historical individual in the family archive (sender, receiver of letters, person mentioned in transcriptions). NEVER has a login account and NEVER appears as an `AppUser`.
_Not to be confused with [AppUser](#appuser-appuser)_`Person` is a historical record; `AppUser` is someone who can log in today.
**PersonNameAlias** (`PersonNameAlias`) — an alternate or historical name form associated with a `Person` (e.g. maiden name, nickname, abbreviated form). Used to locate `Person` records during mass import via `PersonNameAliasType`.
**UserGroup** (`UserGroup`) — a named permission bundle assigned to one or more `AppUser`s. A user's effective permissions are the union of all permissions across all groups they belong to.
---
## Document-Related Terms
**Annotation** (`DocumentAnnotation`) — a free-form polygon or shape drawn over a document page image to highlight a region of interest. Always scoped to a specific page of a `Document`; stored as a polygon (JSONB).
_See also [TranscriptionBlock](#transcriptionblock-transcriptionblock)._
**Comment** (`DocumentComment`, table `document_comments`) — a threaded discussion message attached to a `Document`. Always scoped to a `Document`; optionally further contextualized by a specific `DocumentAnnotation` or `TranscriptionBlock`.
**Document** (`Document`) — a single archival item (letter, postcard, photograph) with a file stored in MinIO/S3 and associated metadata (sender, receivers, date, tags, transcription blocks).
**DocumentVersion** (`DocumentVersion`) — an append-only snapshot of a `Document`'s metadata at a point in time. Append-only by convention; no consumer-facing create or update endpoint exists. The entity uses Lombok `@Data` (which generates setters), so immutability is enforced by application convention, not at the Java level.
**Tag** (`Tag`) — a hierarchical category that can be applied to `Document`s. Tags are self-referencing via a `parent_id` foreign key, forming a tree structure.
**TranscriptionBlock** (`TranscriptionBlock`) — a paragraph-level segment of a `Document`'s transcribed text, with a polygon region (stored as JSONB) identifying its position on the page. One document can have many blocks across multiple pages.
_See also [Annotation](#annotation-documentannotation)._
---
## Workflow Terms
**DocumentStatus lifecycle** — the ordered states a `Document` moves through:
`PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`
- `PLACEHOLDER`: created during mass import; no file attached yet.
- `UPLOADED`: a file has been stored in MinIO/S3.
- `TRANSCRIBED`: all transcription blocks have been marked done.
- `REVIEWED`: a reviewer has approved the transcription.
- `ARCHIVED`: the document is finalized and read-only.
**Mass import** — an asynchronous batch process (`MassImportService`) that reads an Excel or ODS file and creates `Person`s, `Tag`s, and `PLACEHOLDER` `Document`s in one shot. Only one import can run at a time (`IMPORT_ALREADY_RUNNING` error if attempted concurrently).
**Transcription queue** — the set of `Document`s and `TranscriptionBlock`s awaiting work, computed on-the-fly from `Document`/`Block` status. Three views: segmentation queue, transcription queue, ready-to-read queue. NOT a persistent entity — no `transcription_queues` table exists.
_See also [DocumentStatus lifecycle](#documentstatus-lifecycle)._
---
## OCR-Specific Terms
**HTR** — Handwritten Text Recognition. Recognizes cursive and historical handwriting (contrasted with OCR for printed/typewritten text). The primary mode used for letters in this archive.
**Kurrent** — Old German cursive handwriting style, the primary historical script appearing in letters from the 18991950 period covered by this archive.
**OCR** — Optical Character Recognition. Recognizes printed or typewritten text. Used for typed documents; HTR is used for handwritten ones.
**OcrJob** (`OcrJob`, table `ocr_jobs`) — a first-class persistent entity tracking a batch OCR run across one or more documents (`OcrJobDocument`, table `ocr_job_documents`). Distinct from the concept of "running OCR on a single document." Lifecycle: `PENDING → RUNNING → DONE / FAILED` (see `OcrJobStatus`).
**SenderModel** (`SenderModel`, table `sender_models`) — a fine-tuned Kraken HTR model trained on a specific historical correspondent's handwriting. Both an OCR-service concept (the model weights) and a persistent entity linking a `Person` to the path of their trained model file.
**Sütterlin** — A specific standardized style of Kurrent taught in German schools from 1915 to 1941.
---
## Other Domain Terms
**Aktivität / Aktivitäten** `[user-facing]` — the family activity feed accessible at `/aktivitaeten`. Shows recent documents, transcriptions, comments, and Geschichten as a chronological timeline.
_See also [Chronik](#chronik-internal)._
**Briefwechsel** `[user-facing]` — the bilateral conversation timeline between two `Person`s, derived from `Document` sender/receiver relationships. Accessible at `/briefwechsel`. Not a persistent entity — data is computed from existing `Document` records.
_See also [Derived domain](#derived-domain)._
**Chronik** `[internal]` — the conceptual and code-level name for the unified activity feed (per ADR-003 `003-chronik-unified-activity-feed.md`). Used in code, architecture documents, and ADRs. The user-facing label for the same concept is [Aktivität](#aktivitat--aktivitaten-user-facing).
**Geschichte** (`Geschichte`) `[user-facing]` — a narrative story or article published in the archive, linking `Person`s and `Document`s. Lifecycle: `DRAFT → PUBLISHED` (see `GeschichteStatus`). DRAFT stories are hidden from users without the `BLOG_WRITE` permission.
**Notification** (`Notification`) — an in-app message delivered to an `AppUser`. No email or SMS delivery exists today. Delivered via Server-Sent Events (`SseEmitterRegistry`) and persisted in the `notifications` table.
**Audit log** (`AuditLog`, table `audit_log`) — an append-only event store recording domain-level activity (document edits, user actions, etc.). Append-only by application convention; a `REVOKE UPDATE, DELETE` is attempted at the DB layer (see migrations V46, V47) but is a no-op if the application role is the table owner in PostgreSQL. Do not rely on DB-enforced immutability — the constraint is application-layer only.
---
## Architectural Terms
**Cross-cutting** — code that lives in `lib/shared/` (frontend) or cross-domain packages (backend) because it has no entity of its own, no user-facing CRUD, AND is used by two or more domains OR is framework infrastructure (error handling, API client, i18n utilities).
**Derived domain** — a Tier-2 frontend domain that has its own UI but no backend entities of its own. Data is computed from Tier-1 domain records. Current derived domains: `conversation` (from `Document` sender/receivers) and `activity` (from audit, notifications, document events).
_See also [Briefwechsel](#briefwechsel-user-facing)._
**Domain** — a Tier-1 bounded context with its own entities, controller, service, repository, and DTOs. Backend domains: `document`, `person`, `tag`, `user`, `geschichte`, `notification`, `ocr`, `audit`, `dashboard`. Frontend domains mirror this structure under `src/lib/`.
---
## Pending Terms
_Terms flagged as potentially ambiguous that have not yet been formally defined here. Add an entry above and remove it from this list when resolved._
- Terms surfaced by Epic 1 audit findings (#388#392) — review audit reports under `docs/audits/` when available and add any term flagged as ambiguous.
- `OcrBatchService` vs `OcrAsyncRunner` — both handle async OCR orchestration; their division of responsibility should be clarified here.
- `Stammbaum` — the genealogy tree view; relationship to `PersonRelationship` entity.