Files
familienarchiv/docs/GLOSSARY.md
Marcel 4d4d5793bb
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m48s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 4m5s
CI / fail2ban Regex (pull_request) Successful in 37s
CI / Compose Bucket Idempotency (pull_request) Failing after 50s
CI / Unit & Component Tests (push) Failing after 2m46s
CI / OCR Service Tests (push) Successful in 15s
CI / Backend Unit Tests (push) Successful in 4m4s
CI / fail2ban Regex (push) Successful in 37s
CI / Compose Bucket Idempotency (push) Failing after 50s
docs(glossary): add archiv-app service account entry
`archiv-app` is the bucket-scoped MinIO service account introduced
in PR #499 alongside the production deploy pipeline. Until now the
term only appeared in `infra/minio/bootstrap.sh` and the prod compose
file; a reader encountering `S3_ACCESS_KEY: archiv-app` had no
single-page reference distinguishing it from the MinIO root account.

Adds a new "Infrastructure Terms" section to docs/GLOSSARY.md so the
distinction (root account vs. application service account) and the
attached `archiv-app-policy` scope live in the canonical glossary
location. Cross-links to ADR-010 for the MinIO-stays-self-hosted
rationale. Addresses @elicit's round-2 recommendation on PR #499.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 14:11:46 +02:00

124 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Familienarchiv — Glossary
Domain-specific and overloaded terms used in this codebase.
Each entry: **Term** — definition (≤ 2 sentences). Where two terms are easily confused, a _Not to be confused with_ note follows.
For architecture context see [`docs/architecture/c4-diagrams.md`](architecture/c4-diagrams.md).
For domain package structure see [`docs/ARCHITECTURE.md`](ARCHITECTURE.md) _(coming: DOC-2)_.
---
## Identity Terms
**AppUser** (`AppUser`) — a real person who can log into the system (a family member or administrator). `AppUser` records carry login credentials, group memberships, and notification history.
_Not to be confused with [Person](#person-person)_ — an AppUser is never recorded as a document sender, receiver, or historical individual.
**Reader** — an `AppUser` whose effective permissions include `READ_ALL` but neither `WRITE_ALL` nor `ANNOTATE_ALL`. Readers see a dedicated dashboard (`isReader = !canWrite && !canAnnotate`) focused on browsing documents, persons, and stories rather than contribution tasks. A user who also holds `BLOG_WRITE` is still classified as a Reader and additionally sees a drafts module.
_Not to be confused with [AppUser](#appuser-appuser)_ — Reader is a permission-derived role, not an entity.
**Permission** — a discrete capability string assigned to a `UserGroup` (e.g. `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`). Enforced via the `@RequirePermission` AOP annotation on controller methods, checked at runtime by `PermissionAspect`; not via Spring Security's `@PreAuthorize`.
**Person** (`Person`) — a historical individual in the family archive (sender, receiver of letters, person mentioned in transcriptions). NEVER has a login account and NEVER appears as an `AppUser`.
_Not to be confused with [AppUser](#appuser-appuser)_`Person` is a historical record; `AppUser` is someone who can log in today.
**PersonNameAlias** (`PersonNameAlias`) — an alternate or historical name form associated with a `Person` (e.g. maiden name, nickname, abbreviated form). Used to locate `Person` records during mass import via `PersonNameAliasType`.
**UserGroup** (`UserGroup`) — a named permission bundle assigned to one or more `AppUser`s. A user's effective permissions are the union of all permissions across all groups they belong to.
---
## Document-Related Terms
**Annotation** (`DocumentAnnotation`) — a free-form polygon or shape drawn over a document page image to highlight a region of interest. Always scoped to a specific page of a `Document`; stored as a polygon (JSONB).
_See also [TranscriptionBlock](#transcriptionblock-transcriptionblock)._
**Comment** (`DocumentComment`, table `document_comments`) — a threaded discussion message attached to a `Document`. Always scoped to a `Document`; optionally further contextualized by a specific `DocumentAnnotation` or `TranscriptionBlock`.
**Document** (`Document`) — a single archival item (letter, postcard, photograph) with a file stored in MinIO/S3 and associated metadata (sender, receivers, date, tags, transcription blocks).
**DocumentVersion** (`DocumentVersion`) — an append-only snapshot of a `Document`'s metadata at a point in time. Append-only by convention; no consumer-facing create or update endpoint exists. The entity uses Lombok `@Data` (which generates setters), so immutability is enforced by application convention, not at the Java level.
**Tag** (`Tag`) — a hierarchical category that can be applied to `Document`s. Tags are self-referencing via a `parent_id` foreign key, forming a tree structure.
**TranscriptionBlock** (`TranscriptionBlock`) — a paragraph-level segment of a `Document`'s transcribed text, with a polygon region (stored as JSONB) identifying its position on the page. One document can have many blocks across multiple pages.
_See also [Annotation](#annotation-documentannotation)._
---
## Workflow Terms
**DocumentStatus lifecycle** — the ordered states a `Document` moves through:
`PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`
- `PLACEHOLDER`: created during mass import; no file attached yet.
- `UPLOADED`: a file has been stored in MinIO/S3.
- `TRANSCRIBED`: all transcription blocks have been marked done.
- `REVIEWED`: a reviewer has approved the transcription.
- `ARCHIVED`: the document is finalized and read-only.
**Mass import** — an asynchronous batch process (`MassImportService`) that reads an Excel or ODS file and creates `Person`s, `Tag`s, and `PLACEHOLDER` `Document`s in one shot. Only one import can run at a time (`IMPORT_ALREADY_RUNNING` error if attempted concurrently).
**Transcription queue** — the set of `Document`s and `TranscriptionBlock`s awaiting work, computed on-the-fly from `Document`/`Block` status. Three views: segmentation queue, transcription queue, ready-to-read queue. NOT a persistent entity — no `transcription_queues` table exists.
_See also [DocumentStatus lifecycle](#documentstatus-lifecycle)._
---
## OCR-Specific Terms
**HTR** — Handwritten Text Recognition. Recognizes cursive and historical handwriting (contrasted with OCR for printed/typewritten text). The primary mode used for letters in this archive.
**Kurrent** — Old German cursive handwriting style, the primary historical script appearing in letters from the 18991950 period covered by this archive.
**OCR** — Optical Character Recognition. Recognizes printed or typewritten text. Used for typed documents; HTR is used for handwritten ones.
**OcrJob** (`OcrJob`, table `ocr_jobs`) — a first-class persistent entity tracking a batch OCR run across one or more documents (`OcrJobDocument`, table `ocr_job_documents`). Distinct from the concept of "running OCR on a single document." Lifecycle: `PENDING → RUNNING → DONE / FAILED` (see `OcrJobStatus`).
**SenderModel** (`SenderModel`, table `sender_models`) — a fine-tuned Kraken HTR model trained on a specific historical correspondent's handwriting. Both an OCR-service concept (the model weights) and a persistent entity linking a `Person` to the path of their trained model file.
**Sütterlin** — A specific standardized style of Kurrent taught in German schools from 1915 to 1941.
---
## Other Domain Terms
**Aktivität / Aktivitäten** `[user-facing]` — the family activity feed accessible at `/aktivitaeten`. Shows recent documents, transcriptions, comments, and Geschichten as a chronological timeline.
_See also [Chronik](#chronik-internal)._
**Briefwechsel** `[user-facing]` — the bilateral conversation timeline between two `Person`s, derived from `Document` sender/receiver relationships. Accessible at `/briefwechsel`. Not a persistent entity — data is computed from existing `Document` records.
_See also [Derived domain](#derived-domain)._
**Chronik** `[internal]` — the conceptual and code-level name for the unified activity feed (per ADR-003 `003-chronik-unified-activity-feed.md`). Used in code, architecture documents, and ADRs. The user-facing label for the same concept is [Aktivität](#aktivitat--aktivitaten-user-facing).
**Geschichte** (`Geschichte`) `[user-facing]` — a narrative story or article published in the archive, linking `Person`s and `Document`s. Lifecycle: `DRAFT → PUBLISHED` (see `GeschichteStatus`). DRAFT stories are hidden from users without the `BLOG_WRITE` permission.
**Notification** (`Notification`) — an in-app message delivered to an `AppUser`. No email or SMS delivery exists today. Delivered via Server-Sent Events (`SseEmitterRegistry`) and persisted in the `notifications` table.
**Audit log** (`AuditLog`, table `audit_log`) — an append-only event store recording domain-level activity (document edits, user actions, etc.). Append-only by application convention; a `REVOKE UPDATE, DELETE` is attempted at the DB layer (see migrations V46, V47) but is a no-op if the application role is the table owner in PostgreSQL. Do not rely on DB-enforced immutability — the constraint is application-layer only.
---
## Architectural Terms
**Cross-cutting** — code that lives in `lib/shared/` (frontend) or cross-domain packages (backend) because it has no entity of its own, no user-facing CRUD, AND is used by two or more domains OR is framework infrastructure (error handling, API client, i18n utilities).
**Derived domain** — a Tier-2 frontend domain that has its own UI but no backend entities of its own. Data is computed from Tier-1 domain records. Current derived domains: `conversation` (from `Document` sender/receivers) and `activity` (from audit, notifications, document events).
_See also [Briefwechsel](#briefwechsel-user-facing)._
**Domain** — a Tier-1 bounded context with its own entities, controller, service, repository, and DTOs. Backend domains: `document`, `person`, `tag`, `user`, `geschichte`, `notification`, `ocr`, `audit`, `dashboard`. Frontend domains mirror this structure under `src/lib/`.
---
## Infrastructure Terms
**archiv-app** — the bucket-scoped MinIO service account the backend uses to read and write the `familienarchiv` bucket. Distinct from the MinIO root account (`archiv`, used only by the bootstrap container for admin operations). Defined and provisioned in [`infra/minio/bootstrap.sh`](../infra/minio/bootstrap.sh) and consumed by the backend as `S3_ACCESS_KEY` in [`docker-compose.prod.yml`](../docker-compose.prod.yml). The attached `archiv-app-policy` grants `s3:GetObject/PutObject/DeleteObject` on `familienarchiv/*` and `s3:ListBucket/GetBucketLocation` on the bucket only — not the built-in `readwrite` policy which would grant `s3:*` on all buckets.
_See also [ADR-010 — MinIO stays self-hosted, not Hetzner OBS](./adr/010-minio-self-hosted-not-hetzner-obs.md)._
---
## Pending Terms
_Terms flagged as potentially ambiguous that have not yet been formally defined here. Add an entry above and remove it from this list when resolved._
- Terms surfaced by Epic 1 audit findings (#388#392) — review audit reports under `docs/audits/` when available and add any term flagged as ambiguous.
- `OcrBatchService` vs `OcrAsyncRunner` — both handle async OCR orchestration; their division of responsibility should be clarified here.
- `Stammbaum` — the genealogy tree view; relationship to `PersonRelationship` entity.