diff --git a/docs/GLOSSARY.md b/docs/GLOSSARY.md new file mode 100644 index 00000000..2b7ebe42 --- /dev/null +++ b/docs/GLOSSARY.md @@ -0,0 +1,113 @@ +# Familienarchiv — Glossary + +Domain-specific and overloaded terms used in this codebase. +Each entry: **Term** — definition (≤ 2 sentences). Where two terms are easily confused, a _Not to be confused with_ note follows. + +For architecture context see [`docs/architecture/c4-diagrams.md`](architecture/c4-diagrams.md). +For domain package structure see [`docs/ARCHITECTURE.md`](ARCHITECTURE.md) _(coming: DOC-2)_. + +--- + +## Identity Terms + +**AppUser** (`AppUser`) — a real person who can log into the system (a family member or administrator). `AppUser` records carry login credentials, group memberships, and notification history. +_Not to be confused with [Person](#person-person)_ — an AppUser is never recorded as a document sender, receiver, or historical individual. + +**Permission** — a discrete capability string assigned to a `UserGroup` (e.g. `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`). Enforced via the `@RequirePermission` AOP annotation on controller methods, checked at runtime by `PermissionAspect`; not via Spring Security's `@PreAuthorize`. + +**Person** (`Person`) — a historical individual in the family archive (sender, receiver of letters, person mentioned in transcriptions). NEVER has a login account and NEVER appears as an `AppUser`. +_Not to be confused with [AppUser](#appuser-appuser)_ — `Person` is a historical record; `AppUser` is someone who can log in today. + +**PersonNameAlias** (`PersonNameAlias`) — an alternate or historical name form associated with a `Person` (e.g. maiden name, nickname, abbreviated form). Used to locate `Person` records during mass import via `PersonNameAliasType`. + +**UserGroup** (`UserGroup`) — a named permission bundle assigned to one or more `AppUser`s. A user's effective permissions are the union of all permissions across all groups they belong to. + +--- + +## Document-Related Terms + +**Annotation** (`DocumentAnnotation`) — a free-form polygon or shape drawn over a document page image to highlight a region of interest. Always scoped to a specific page of a `Document`; stored as a polygon (JSONB). +_See also [TranscriptionBlock](#transcriptionblock-transcriptionblock)._ + +**Comment** (`DocumentComment`, table `document_comments`) — a threaded discussion message attached to a `Document`. Always scoped to a `Document`; optionally further contextualized by a specific `DocumentAnnotation` or `TranscriptionBlock`. + +**Document** (`Document`) — a single archival item (letter, postcard, photograph) with a file stored in MinIO/S3 and associated metadata (sender, receivers, date, tags, transcription blocks). + +**DocumentVersion** (`DocumentVersion`) — an append-only snapshot of a `Document`'s metadata at a point in time. Append-only by convention; no consumer-facing create or update endpoint exists. The entity uses Lombok `@Data` (which generates setters), so immutability is enforced by application convention, not at the Java level. + +**Tag** (`Tag`) — a hierarchical category that can be applied to `Document`s. Tags are self-referencing via a `parent_id` foreign key, forming a tree structure. + +**TranscriptionBlock** (`TranscriptionBlock`) — a paragraph-level segment of a `Document`'s transcribed text, with a polygon region (stored as JSONB) identifying its position on the page. One document can have many blocks across multiple pages. +_See also [Annotation](#annotation-documentannotation)._ + +--- + +## Workflow Terms + +**DocumentStatus lifecycle** — the ordered states a `Document` moves through: +`PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED` +- `PLACEHOLDER`: created during mass import; no file attached yet. +- `UPLOADED`: a file has been stored in MinIO/S3. +- `TRANSCRIBED`: all transcription blocks have been marked done. +- `REVIEWED`: a reviewer has approved the transcription. +- `ARCHIVED`: the document is finalized and read-only. + +**Mass import** — an asynchronous batch process (`MassImportService`) that reads an Excel or ODS file and creates `Person`s, `Tag`s, and `PLACEHOLDER` `Document`s in one shot. Only one import can run at a time (`IMPORT_ALREADY_RUNNING` error if attempted concurrently). + +**Transcription queue** — the set of `Document`s and `TranscriptionBlock`s awaiting work, computed on-the-fly from `Document`/`Block` status. Three views: segmentation queue, transcription queue, ready-to-read queue. NOT a persistent entity — no `transcription_queues` table exists. +_See also [DocumentStatus lifecycle](#documentstatus-lifecycle)._ + +--- + +## OCR-Specific Terms + +**HTR** — Handwritten Text Recognition. Recognizes cursive and historical handwriting (contrasted with OCR for printed/typewritten text). The primary mode used for letters in this archive. + +**Kurrent** — Old German cursive handwriting style, the primary historical script appearing in letters from the 1899–1950 period covered by this archive. + +**OCR** — Optical Character Recognition. Recognizes printed or typewritten text. Used for typed documents; HTR is used for handwritten ones. + +**OcrJob** (`OcrJob`, table `ocr_jobs`) — a first-class persistent entity tracking a batch OCR run across one or more documents (`OcrJobDocument`, table `ocr_job_documents`). Distinct from the concept of "running OCR on a single document." Lifecycle: `PENDING → RUNNING → DONE / FAILED` (see `OcrJobStatus`). + +**SenderModel** (`SenderModel`, table `sender_models`) — a fine-tuned Kraken HTR model trained on a specific historical correspondent's handwriting. Both an OCR-service concept (the model weights) and a persistent entity linking a `Person` to the path of their trained model file. + +**Sütterlin** — A specific standardized style of Kurrent taught in German schools from 1915 to 1941. + +--- + +## Other Domain Terms + +**Aktivität / Aktivitäten** `[user-facing]` — the family activity feed accessible at `/aktivitaeten`. Shows recent documents, transcriptions, comments, and Geschichten as a chronological timeline. +_See also [Chronik](#chronik-internal)._ + +**Briefwechsel** `[user-facing]` — the bilateral conversation timeline between two `Person`s, derived from `Document` sender/receiver relationships. Accessible at `/briefwechsel`. Not a persistent entity — data is computed from existing `Document` records. +_See also [Derived domain](#derived-domain)._ + +**Chronik** `[internal]` — the conceptual and code-level name for the unified activity feed (per ADR-003 `003-chronik-unified-activity-feed.md`). Used in code, architecture documents, and ADRs. The user-facing label for the same concept is [Aktivität](#aktivitat--aktivitaten-user-facing). + +**Geschichte** (`Geschichte`) `[user-facing]` — a narrative story or article published in the archive, linking `Person`s and `Document`s. Lifecycle: `DRAFT → PUBLISHED` (see `GeschichteStatus`). DRAFT stories are hidden from users without the `BLOG_WRITE` permission. + +**Notification** (`Notification`) — an in-app message delivered to an `AppUser`. No email or SMS delivery exists today. Delivered via Server-Sent Events (`SseEmitterRegistry`) and persisted in the `notifications` table. + +**Audit log** (`AuditLog`, table `audit_log`) — an append-only event store recording domain-level activity (document edits, user actions, etc.). Append-only by application convention; a `REVOKE UPDATE, DELETE` is attempted at the DB layer (see migrations V46, V47) but is a no-op if the application role is the table owner in PostgreSQL. Do not rely on DB-enforced immutability — the constraint is application-layer only. + +--- + +## Architectural Terms + +**Cross-cutting** — code that lives in `lib/shared/` (frontend) or cross-domain packages (backend) because it has no entity of its own, no user-facing CRUD, AND is used by two or more domains OR is framework infrastructure (error handling, API client, i18n utilities). + +**Derived domain** — a Tier-2 frontend domain that has its own UI but no backend entities of its own. Data is computed from Tier-1 domain records. Current derived domains: `conversation` (from `Document` sender/receivers) and `activity` (from audit, notifications, document events). +_See also [Briefwechsel](#briefwechsel-user-facing)._ + +**Domain** — a Tier-1 bounded context with its own entities, controller, service, repository, and DTOs. Backend domains: `document`, `person`, `tag`, `user`, `geschichte`, `notification`, `ocr`, `audit`, `dashboard`. Frontend domains mirror this structure under `src/lib/`. + +--- + +## Pending Terms + +_Terms flagged as potentially ambiguous that have not yet been formally defined here. Add an entry above and remove it from this list when resolved._ + +- Terms surfaced by Epic 1 audit findings (#388–#392) — review audit reports under `docs/audits/` when available and add any term flagged as ambiguous. +- `OcrBatchService` vs `OcrAsyncRunner` — both handle async OCR orchestration; their division of responsibility should be clarified here. +- `Stammbaum` — the genealogy tree view; relationship to `PersonRelationship` entity.