familienarchiv/docs/GLOSSARY.md
Marcel 4d4d5793bb
docs(glossary): add archiv-app service account entry
`archiv-app` is the bucket-scoped MinIO service account introduced
in PR #499 alongside the production deploy pipeline. Until now the
term only appeared in `infra/minio/bootstrap.sh` and the prod compose
file; a reader encountering `S3_ACCESS_KEY: archiv-app` had no
single-page reference distinguishing it from the MinIO root account.

Adds a new "Infrastructure Terms" section to docs/GLOSSARY.md so the
distinction (root account vs. application service account) and the
attached `archiv-app-policy` scope live in the canonical glossary
location. Cross-links to ADR-010 for the MinIO-stays-self-hosted
rationale. Addresses @elicit's round-2 recommendation on PR #499.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 14:11:46 +02:00


Familienarchiv — Glossary

Domain-specific and overloaded terms used in this codebase. Each entry has the form "Term — definition" (≤ 2 sentences). Where two terms are easily confused, a "Not to be confused with" note follows.

For architecture context see docs/architecture/c4-diagrams.md. For domain package structure see docs/ARCHITECTURE.md (coming: DOC-2).


Identity Terms

AppUser (AppUser) — a real person who can log into the system (a family member or administrator). AppUser records carry login credentials, group memberships, and notification history. Not to be confused with Person — an AppUser is never recorded as a document sender, receiver, or historical individual.

Reader — an AppUser whose effective permissions include READ_ALL but neither WRITE_ALL nor ANNOTATE_ALL. Readers see a dedicated dashboard (isReader = !canWrite && !canAnnotate) focused on browsing documents, persons, and stories rather than contribution tasks. A user who also holds BLOG_WRITE is still classified as a Reader and additionally sees a drafts module. Not to be confused with AppUser — Reader is a permission-derived role, not an entity.

Permission — a discrete capability string assigned to a UserGroup (e.g. READ_ALL, WRITE_ALL, ADMIN, ADMIN_USER, ADMIN_TAG, ADMIN_PERMISSION). Enforced via the @RequirePermission AOP annotation on controller methods, checked at runtime by PermissionAspect; not via Spring Security's @PreAuthorize.

Person (Person) — a historical individual in the family archive (sender, receiver of letters, person mentioned in transcriptions). NEVER has a login account and NEVER appears as an AppUser. Not to be confused with AppUser — Person is a historical record; AppUser is someone who can log in today.

PersonNameAlias (PersonNameAlias) — an alternate or historical name form associated with a Person (e.g. maiden name, nickname, abbreviated form). Used to locate Person records during mass import via PersonNameAliasType.

UserGroup (UserGroup) — a named permission bundle assigned to one or more AppUsers. A user's effective permissions are the union of all permissions across all groups they belong to.
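
The union rule for UserGroup and the Reader classification above can be sketched together. This is a minimal illustration in Python, not the backend's Java code: the `UserGroup` dataclass, the function names, and the group names `family`, `bloggers`, and `editors` are hypothetical; only the permission strings and the `!canWrite && !canAnnotate` rule come from the entries above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserGroup:
    """Hypothetical stand-in for the UserGroup entity."""
    name: str
    permissions: frozenset[str]


def effective_permissions(groups: list[UserGroup]) -> frozenset[str]:
    """Union of the permissions of every group the user belongs to."""
    result: set[str] = set()
    for group in groups:
        result |= group.permissions
    return frozenset(result)


def is_reader(permissions: frozenset[str]) -> bool:
    """READ_ALL without WRITE_ALL or ANNOTATE_ALL; BLOG_WRITE is ignored."""
    can_write = "WRITE_ALL" in permissions
    can_annotate = "ANNOTATE_ALL" in permissions
    return "READ_ALL" in permissions and not can_write and not can_annotate


# Hypothetical groups, for illustration only.
family = UserGroup("family", frozenset({"READ_ALL"}))
bloggers = UserGroup("bloggers", frozenset({"BLOG_WRITE"}))
editors = UserGroup("editors", frozenset({"READ_ALL", "WRITE_ALL"}))
```

A user in `family` and `bloggers` is still a Reader under this sketch, while adding `editors` removes the Reader classification because the union then contains WRITE_ALL.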


Document Terms

Annotation (DocumentAnnotation) — a free-form polygon or shape drawn over a document page image to highlight a region of interest. Always scoped to a specific page of a Document; stored as a polygon (JSONB). See also TranscriptionBlock.

Comment (DocumentComment, table document_comments) — a threaded discussion message attached to a Document. Always scoped to a Document; optionally further contextualized by a specific DocumentAnnotation or TranscriptionBlock.

Document (Document) — a single archival item (letter, postcard, photograph) with a file stored in MinIO/S3 and associated metadata (sender, receivers, date, tags, transcription blocks).

DocumentVersion (DocumentVersion) — an append-only snapshot of a Document's metadata at a point in time. Append-only by convention; no consumer-facing create or update endpoint exists. The entity uses Lombok @Data (which generates setters), so immutability is enforced by application convention, not at the Java level.

Tag (Tag) — a hierarchical category that can be applied to Documents. Tags are self-referencing via a parent_id foreign key, forming a tree structure.
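
A self-referencing parent_id column means a tag's full path is found by walking upward until a tag without a parent is reached. The sketch below assumes tags are already loaded into a dictionary keyed by id. The `Tag` dataclass fields other than `parent_id`, the `tag_path` helper, and the sample tags are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tag:
    """Hypothetical in-memory view of a tag row; only parent_id is from the glossary."""
    id: int
    name: str
    parent_id: Optional[int] = None


def tag_path(tag_id: int, tags_by_id: dict[int, Tag]) -> list[str]:
    """Return tag names from the root of the tree down to the given tag."""
    path: list[str] = []
    seen: set[int] = set()
    current: Optional[int] = tag_id
    while current is not None:
        # Guard against corrupt data where a tag is its own ancestor.
        if current in seen:
            raise ValueError(f"cycle in tag hierarchy at id {current}")
        seen.add(current)
        tag = tags_by_id[current]
        path.append(tag.name)
        current = tag.parent_id
    path.reverse()
    return path


# Sample data, for illustration only.
tags = {
    1: Tag(1, "Places"),
    2: Tag(2, "Germany", parent_id=1),
    3: Tag(3, "Berlin", parent_id=2),
}
```

With this sample data, `tag_path(3, tags)` walks Berlin, Germany, Places and returns the names in root-first order.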

TranscriptionBlock (TranscriptionBlock) — a paragraph-level segment of a Document's transcribed text, with a polygon region (stored as JSONB) identifying its position on the page. One document can have many blocks across multiple pages. See also Annotation.


Workflow Terms

DocumentStatus lifecycle — the ordered states a Document moves through: PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED

  • PLACEHOLDER: created during mass import; no file attached yet.
  • UPLOADED: a file has been stored in MinIO/S3.
  • TRANSCRIBED: all transcription blocks have been marked done.
  • REVIEWED: a reviewer has approved the transcription.
  • ARCHIVED: the document is finalized and read-only.
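
The ordering of the lifecycle can be made explicit with an ordered enum. This is a sketch only: the integer values are illustrative, and the rule that a document advances exactly one state at a time is an assumption, since the glossary does not say whether the backend permits skipping or moving backward.

```python
from enum import IntEnum
from typing import Optional


class DocumentStatus(IntEnum):
    """States in glossary order; the integer values are illustrative."""
    PLACEHOLDER = 0
    UPLOADED = 1
    TRANSCRIBED = 2
    REVIEWED = 3
    ARCHIVED = 4


def next_status(status: DocumentStatus) -> Optional[DocumentStatus]:
    """Return the following state, or None once the document is ARCHIVED."""
    if status is DocumentStatus.ARCHIVED:
        return None
    return DocumentStatus(status + 1)


def is_forward_step(current: DocumentStatus, target: DocumentStatus) -> bool:
    """True only for a move to the immediately following state (assumed rule)."""
    return target == current + 1
```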

Mass import — an asynchronous batch process (MassImportService) that reads an Excel or ODS file and creates Persons, Tags, and PLACEHOLDER Documents in one shot. Only one import can run at a time (IMPORT_ALREADY_RUNNING error if attempted concurrently).
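
One way to get the "only one import at a time" behaviour is a non-blocking lock that rejects a second caller instead of queueing it. The sketch below illustrates that idea and is not a description of MassImportService. `SingleImportGuard` and `ImportAlreadyRunningError` are hypothetical names, and an in-process lock would only cover a single backend instance.

```python
import threading


class ImportAlreadyRunningError(RuntimeError):
    """Illustrative exception carrying the glossary's error code."""
    code = "IMPORT_ALREADY_RUNNING"


class SingleImportGuard:
    """Allows at most one import at a time (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def __enter__(self) -> "SingleImportGuard":
        # Non-blocking acquire: fail fast rather than wait for the running import.
        if not self._lock.acquire(blocking=False):
            raise ImportAlreadyRunningError(ImportAlreadyRunningError.code)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Release on success and on failure so a crashed import does not block later runs.
        self._lock.release()


guard = SingleImportGuard()
```

Wrapping the import body in `with guard:` lets the first caller proceed and makes a concurrent caller fail immediately with the IMPORT_ALREADY_RUNNING code.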

Transcription queue — the set of Documents and TranscriptionBlocks awaiting work, computed on-the-fly from Document/Block status. Three views: segmentation queue, transcription queue, ready-to-read queue. NOT a persistent entity — no transcription_queues table exists. See also DocumentStatus lifecycle.


OCR-Specific Terms

HTR — Handwritten Text Recognition. Recognizes cursive and historical handwriting (contrasted with OCR for printed/typewritten text). The primary mode used for letters in this archive.

Kurrent — Old German cursive handwriting style, the primary historical script appearing in letters from the 1899–1950 period covered by this archive.

OCR — Optical Character Recognition. Recognizes printed or typewritten text. Used for typed documents; HTR is used for handwritten ones.

OcrJob (OcrJob, table ocr_jobs) — a first-class persistent entity tracking a batch OCR run across one or more documents (OcrJobDocument, table ocr_job_documents). Distinct from the concept of "running OCR on a single document." Lifecycle: PENDING → RUNNING → DONE / FAILED (see OcrJobStatus).
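
Unlike the linear DocumentStatus lifecycle, the OcrJob lifecycle branches at the end, so a transition table is a natural way to picture it. The table below is read directly off the arrows above. Treating DONE and FAILED as terminal (no retry) is an assumption, and the `transition` helper is hypothetical.

```python
from enum import Enum


class OcrJobStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"


# Allowed moves per state; DONE and FAILED are assumed to be terminal.
ALLOWED = {
    OcrJobStatus.PENDING: {OcrJobStatus.RUNNING},
    OcrJobStatus.RUNNING: {OcrJobStatus.DONE, OcrJobStatus.FAILED},
    OcrJobStatus.DONE: set(),
    OcrJobStatus.FAILED: set(),
}


def transition(current: OcrJobStatus, target: OcrJobStatus) -> OcrJobStatus:
    """Return the target state if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"illegal OCR job transition {current.name} -> {target.name}"
        )
    return target
```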

SenderModel (SenderModel, table sender_models) — a fine-tuned Kraken HTR model trained on a specific historical correspondent's handwriting. Both an OCR-service concept (the model weights) and a persistent entity linking a Person to the path of their trained model file.

Sütterlin — A specific standardized style of Kurrent taught in German schools from 1915 to 1941.


Other Domain Terms

Aktivität / Aktivitäten [user-facing] — the family activity feed accessible at /aktivitaeten. Shows recent documents, transcriptions, comments, and Geschichten as a chronological timeline. See also Chronik.

Briefwechsel [user-facing] — the bilateral conversation timeline between two Persons, derived from Document sender/receiver relationships. Accessible at /briefwechsel. Not a persistent entity — data is computed from existing Document records. See also Derived domain.
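
Because Briefwechsel is derived rather than stored, it amounts to a filter over existing Documents. The sketch below shows one plausible reading of "bilateral": letters sent by one person with the other among the receivers, in either direction, ordered by date. The `Letter` dataclass, its field names, and the sample data are hypothetical, and the real query presumably runs in the database rather than in memory.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Letter:
    """Hypothetical projection of a Document: sender, receivers, date."""
    title: str
    sender_id: int
    receiver_ids: frozenset[int]
    written_on: date


def briefwechsel(documents: list[Letter], a: int, b: int) -> list[Letter]:
    """Letters sent by a to b or by b to a, oldest first."""
    between = [
        d for d in documents
        if (d.sender_id == a and b in d.receiver_ids)
        or (d.sender_id == b and a in d.receiver_ids)
    ]
    return sorted(between, key=lambda d: d.written_on)


# Sample data, for illustration only.
docs = [
    Letter("Reply", 2, frozenset({1}), date(1921, 5, 3)),
    Letter("First letter", 1, frozenset({2, 3}), date(1921, 4, 12)),
    Letter("Unrelated", 3, frozenset({4}), date(1921, 4, 20)),
]
```

This sketch does not handle undated documents, which a real archive would need to place somewhere in the timeline.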

Chronik [internal] — the conceptual and code-level name for the unified activity feed (per ADR-003, 003-chronik-unified-activity-feed.md). Used in code, architecture documents, and ADRs. The user-facing label for the same concept is Aktivität.

Geschichte (Geschichte) [user-facing] — a narrative story or article published in the archive, linking Persons and Documents. Lifecycle: DRAFT → PUBLISHED (see GeschichteStatus). DRAFT stories are hidden from users without the BLOG_WRITE permission.
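
The draft-visibility rule reduces to a single filter. In this sketch the `Geschichte` dataclass and the `visible_stories` helper are hypothetical; only the DRAFT and PUBLISHED states and the BLOG_WRITE permission come from the entry above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geschichte:
    """Hypothetical minimal story record."""
    title: str
    status: str  # "DRAFT" or "PUBLISHED", per GeschichteStatus


def visible_stories(stories: list[Geschichte], permissions: set[str]) -> list[Geschichte]:
    """PUBLISHED stories for everyone; DRAFT stories only with BLOG_WRITE."""
    can_see_drafts = "BLOG_WRITE" in permissions
    return [s for s in stories if s.status == "PUBLISHED" or can_see_drafts]


# Sample data, for illustration only.
stories = [
    Geschichte("A summer in 1923", "PUBLISHED"),
    Geschichte("Work in progress", "DRAFT"),
]
```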

Notification (Notification) — an in-app message delivered to an AppUser. No email or SMS delivery exists today. Delivered via Server-Sent Events (SseEmitterRegistry) and persisted in the notifications table.

Audit log (AuditLog, table audit_log) — an append-only event store recording domain-level activity (document edits, user actions, etc.). Append-only by application convention; a REVOKE UPDATE, DELETE is attempted at the DB layer (see migrations V46, V47) but is a no-op if the application role is the table owner in PostgreSQL. Do not rely on DB-enforced immutability — the constraint is application-layer only.


Architectural Terms

Cross-cutting — code that lives in lib/shared/ (frontend) or cross-domain packages (backend) because it has no entity of its own, no user-facing CRUD, AND is used by two or more domains OR is framework infrastructure (error handling, API client, i18n utilities).

Derived domain — a Tier-2 frontend domain that has its own UI but no backend entities of its own. Data is computed from Tier-1 domain records. Current derived domains: conversation (from Document sender/receivers) and activity (from audit, notifications, document events). See also Briefwechsel.

Domain — a Tier-1 bounded context with its own entities, controller, service, repository, and DTOs. Backend domains: document, person, tag, user, geschichte, notification, ocr, audit, dashboard. Frontend domains mirror this structure under src/lib/.


Infrastructure Terms

archiv-app — the bucket-scoped MinIO service account the backend uses to read and write the familienarchiv bucket. Distinct from the MinIO root account (archiv, used only by the bootstrap container for admin operations). Defined and provisioned in infra/minio/bootstrap.sh and consumed by the backend as S3_ACCESS_KEY in docker-compose.prod.yml. The attached archiv-app-policy grants s3:GetObject/PutObject/DeleteObject on familienarchiv/* and s3:ListBucket/GetBucketLocation on the bucket only — not the built-in readwrite policy, which would grant s3:* on all buckets. See also ADR-010 — MinIO stays self-hosted, not Hetzner OBS.
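
The scope described for archiv-app-policy can be pictured as an S3-style policy document. The sketch below builds one in Python from the actions and resources named above. It is a reconstruction for illustration, not a copy of the policy in infra/minio/bootstrap.sh, which may be structured differently. The variable names are hypothetical.

```python
import json

BUCKET = "familienarchiv"

archiv_app_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Object-level actions, limited to keys inside the bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/*"],
        },
        {
            # Bucket-level actions, limited to this one bucket.
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
        },
    ],
}

# Serialized form, as it would be written to a policy file.
policy_json = json.dumps(archiv_app_policy, indent=2)
```

The point of the two statements is that object actions apply to `familienarchiv/*` while bucket actions apply to the bucket itself, and neither uses a wildcard action or a wildcard bucket.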


Pending Terms

Terms flagged as potentially ambiguous that have not yet been formally defined here. Add an entry above and remove it from this list when resolved.

  • Terms surfaced by Epic 1 audit findings (#388–#392) — review audit reports under docs/audits/ when available and add any term flagged as ambiguous.
  • OcrBatchService vs OcrAsyncRunner — both handle async OCR orchestration; their division of responsibility should be clarified here.
  • Stammbaum — the genealogy tree view; relationship to PersonRelationship entity.