Familienarchiv — Glossary
Domain-specific and overloaded terms used in this codebase. Each entry: Term — definition (≤ 2 sentences). Where two terms are easily confused, a Not to be confused with note follows.
For architecture context see docs/architecture/c4-diagrams.md.
For domain package structure see docs/ARCHITECTURE.md (coming: DOC-2).
Identity Terms
AppUser (AppUser) — a real person who can log into the system (a family member or administrator). AppUser records carry login credentials, group memberships, and notification history.
Not to be confused with Person — an AppUser is never recorded as a document sender, receiver, or historical individual.
Reader — an AppUser whose effective permissions include READ_ALL but neither WRITE_ALL nor ANNOTATE_ALL. Readers see a dedicated dashboard (isReader = !canWrite && !canAnnotate) focused on browsing documents, persons, and stories rather than contribution tasks. A user who also holds BLOG_WRITE is still classified as a Reader and additionally sees a drafts module.
Not to be confused with AppUser — Reader is a permission-derived role, not an entity.
Permission — a discrete capability string assigned to a UserGroup (e.g. READ_ALL, WRITE_ALL, ADMIN, ADMIN_USER, ADMIN_TAG, ADMIN_PERMISSION). Enforced via the @RequirePermission AOP annotation on controller methods, checked at runtime by PermissionAspect; not via Spring Security's @PreAuthorize.
Person (Person) — a historical individual in the family archive (sender, receiver of letters, person mentioned in transcriptions). NEVER has a login account and NEVER appears as an AppUser.
Not to be confused with AppUser — Person is a historical record; AppUser is someone who can log in today.
PersonNameAlias (PersonNameAlias) — an alternate or historical name form associated with a Person (e.g. maiden name, nickname, abbreviated form). Used to locate Person records during mass import via PersonNameAliasType.
UserGroup (UserGroup) — a named permission bundle assigned to one or more AppUsers. A user's effective permissions are the union of all permissions across all groups they belong to.
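The union rule for effective permissions and the Reader derivation can be sketched together. This is a minimal illustration, not the project's code: the `UserGroupSketch` shape, the group names, and both function names are assumptions; only the permission strings and the `isReader = !canWrite && !canAnnotate` rule come from the entries above.

```typescript
// Hypothetical shapes for illustration; the real entities carry more fields.
type PermissionCode = string;

interface UserGroupSketch {
  groupName: string;
  permissions: PermissionCode[];
}

// Effective permissions are the union across every group the user belongs to.
function effectivePermissions(groups: UserGroupSketch[]): Set<PermissionCode> {
  const union = new Set<PermissionCode>();
  for (const group of groups) {
    for (const permission of group.permissions) {
      union.add(permission);
    }
  }
  return union;
}

// Reader: READ_ALL present, WRITE_ALL and ANNOTATE_ALL both absent.
function isReader(effective: Set<PermissionCode>): boolean {
  const canWrite = effective.has("WRITE_ALL");
  const canAnnotate = effective.has("ANNOTATE_ALL");
  return effective.has("READ_ALL") && !canWrite && !canAnnotate;
}

// Group names below are made up for the example.
const familyGroup: UserGroupSketch = { groupName: "family", permissions: ["READ_ALL"] };
const bloggerGroup: UserGroupSketch = { groupName: "bloggers", permissions: ["BLOG_WRITE"] };
const editorGroup: UserGroupSketch = { groupName: "editors", permissions: ["READ_ALL", "WRITE_ALL"] };

const readerWithBlog = effectivePermissions([familyGroup, bloggerGroup]);
const editorPermissions = effectivePermissions([familyGroup, editorGroup]);
```

In this sketch `isReader(readerWithBlog)` is true even though the user also holds BLOG_WRITE, while `isReader(editorPermissions)` is false because WRITE_ALL is in the union.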
source_ref (Person.sourceRef, Tag.sourceRef) — the import normalizer's stable identity for a Person (its person_id) or Tag (its canonical tag_path). It is the join key linking normalized records to documents and the idempotency key for re-import; null for manually created records and unique among non-null values.
provisional person (Person.provisional) — a Person the importer inferred from raw attribution text but could not confidently match to a known individual. The flag lets the persons directory surface uncertainty honestly rather than fabricate a confident identity; it defaults to false and is set true only by the importer.
Not to be confused with family_member — provisional expresses import confidence, while family_member is a genealogical fact about whether the person belongs to the family tree.
Document-Related Terms
Annotation (DocumentAnnotation) — a free-form polygon or shape drawn over a document page image to highlight a region of interest. Always scoped to a specific page of a Document; stored as a polygon (JSONB).
See also TranscriptionBlock.
Comment (DocumentComment, table document_comments) — a threaded discussion message attached to a Document. Always scoped to a Document; optionally further contextualized by a specific DocumentAnnotation or TranscriptionBlock.
Document (Document) — a single archival item (letter, postcard, photograph) with a file stored in MinIO/S3 and associated metadata (sender, receivers, date, tags, transcription blocks).
date precision (Document.metaDatePrecision, enum DatePrecision) — how exactly a document's date is known, one of DAY, MONTH, SEASON, YEAR, RANGE, APPROX, UNKNOWN. A verbatim mirror of the import normalizer's Precision enum so honest dates can be rendered (APPROX → "ca.", RANGE uses meta_date_end) instead of fabricating a false DAY-level date. UNKNOWN is the explicit value for undated documents.
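A small formatter shows how the precision value can drive honest rendering. Only the enum values, the "ca." prefix for APPROX, and the use of the end date for RANGE come from the entry above; the field types, the ISO storage format, the output formats for the other precisions, and the name `renderHonestDate` are assumptions made for this sketch.

```typescript
type DatePrecisionSketch = "DAY" | "MONTH" | "SEASON" | "YEAR" | "RANGE" | "APPROX" | "UNKNOWN";

interface DatedDocumentSketch {
  metaDate: string | null;     // assumed to be an ISO "YYYY-MM-DD" string
  metaDateEnd: string | null;  // end of the span, only meaningful for RANGE
  metaDatePrecision: DatePrecisionSketch;
}

function renderHonestDate(doc: DatedDocumentSketch): string {
  const precision = doc.metaDatePrecision;
  const start = doc.metaDate;
  if (precision === "UNKNOWN" || start === null) {
    return "undated";
  }
  const parts = start.split("-");
  const year = parts[0];
  const month = parts[1];
  const day = parts[2];
  switch (precision) {
    case "DAY":
      return `${day}.${month}.${year}`;
    case "MONTH":
      return `${month}/${year}`;
    case "APPROX":
      return `ca. ${year}`;
    case "RANGE": {
      const endYear = doc.metaDateEnd === null ? "?" : doc.metaDateEnd.split("-")[0];
      return `${year}–${endYear}`;
    }
    default:
      // YEAR, and SEASON (season labels are not defined in this glossary).
      return year;
  }
}
```

The point of the sketch is that a stored day component is never shown unless the precision is DAY, so an approximate year cannot leak out as a false exact date.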
raw attribution (Document.senderText, Document.receiverText, Document.metaDateRaw) — the original spreadsheet cell text for a document's sender, receiver, and date, preserved verbatim even after a Person or normalized date is linked. It keeps provenance intact and enables an "as written in the original" view.
DocumentVersion (DocumentVersion) — an append-only snapshot of a Document's metadata at a point in time. Append-only by convention; no consumer-facing create or update endpoint exists. The entity uses Lombok @Data (which generates setters), so immutability is enforced by application convention, not at the Java level.
Tag (Tag) — a hierarchical category that can be applied to Documents. Tags are self-referencing via a parent_id foreign key, forming a tree structure.
TranscriptionBlock (TranscriptionBlock) — a paragraph-level segment of a Document's transcribed text, with a polygon region (stored as JSONB) identifying its position on the page. One document can have many blocks across multiple pages.
See also Annotation.
Workflow Terms
DocumentStatus lifecycle — the ordered states a Document moves through:
PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED
- PLACEHOLDER: created during mass import; no file attached yet.
- UPLOADED: a file has been stored in MinIO/S3.
- TRANSCRIBED: all transcription blocks have been marked done.
- REVIEWED: a reviewer has approved the transcription.
- ARCHIVED: the document is finalized and read-only.
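Because the lifecycle is ordered, "has this document reached state X" reduces to an index comparison. The sketch below assumes the states are only ever advanced one step at a time, which the glossary does not state; the helper names are hypothetical.

```typescript
const DOCUMENT_STATUS_ORDER = ["PLACEHOLDER", "UPLOADED", "TRANSCRIBED", "REVIEWED", "ARCHIVED"] as const;
type DocumentStatusSketch = (typeof DOCUMENT_STATUS_ORDER)[number];

// True when `current` is at or past `milestone` in the lifecycle.
function hasReached(current: DocumentStatusSketch, milestone: DocumentStatusSketch): boolean {
  return DOCUMENT_STATUS_ORDER.indexOf(current) >= DOCUMENT_STATUS_ORDER.indexOf(milestone);
}

// The following state, or null once the document is ARCHIVED.
function nextStatus(current: DocumentStatusSketch): DocumentStatusSketch | null {
  const position = DOCUMENT_STATUS_ORDER.indexOf(current);
  return position < DOCUMENT_STATUS_ORDER.length - 1 ? DOCUMENT_STATUS_ORDER[position + 1] : null;
}

// ARCHIVED documents are finalized and read-only.
function isReadOnly(current: DocumentStatusSketch): boolean {
  return current === "ARCHIVED";
}
```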
Canonical import — an asynchronous batch process (CanonicalImportOrchestrator) that consumes the normalizer's committed canonical artifacts and creates Tags, Persons (register + tree), family relationships, and Documents. Four idempotent loaders run in a fixed dependency order — TagTreeImporter → PersonRegisterImporter → PersonTreeImporter → DocumentImporter — each calling the owning domain's service. Re-running it never duplicates rows (upsert by source_ref / document index) and never overwrites a human-edited field. Only one import can run at a time (IMPORT_ALREADY_RUNNING error if attempted concurrently); a missing or malformed artifact fails closed (IMPORT_ARTIFACT_INVALID). Replaced the legacy raw-spreadsheet MassImportService (see ADR-025).
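The two re-import guarantees (no duplicate rows, no overwrite of human-edited fields) can be illustrated with an in-memory upsert keyed by source_ref. The real loaders are backend services; everything here, including the `humanEdited` bookkeeping set, the field names, and the `P-0001` key format, is a hypothetical stand-in, since the glossary does not say how human edits are tracked.

```typescript
interface ImportedPersonSketch {
  sourceRef: string;
  displayName: string;
  birthYear: number | null;
}

interface StoredPersonSketch extends ImportedPersonSketch {
  // Hypothetical bookkeeping: names of fields a human has edited since import.
  humanEdited: Set<string>;
}

function upsertBySourceRef(
  store: Map<string, StoredPersonSketch>,
  incoming: ImportedPersonSketch
): StoredPersonSketch {
  const existing = store.get(incoming.sourceRef);
  if (existing === undefined) {
    // First import of this source_ref: create the row.
    const created: StoredPersonSketch = { ...incoming, humanEdited: new Set<string>() };
    store.set(incoming.sourceRef, created);
    return created;
  }
  // Re-import: refresh only the fields no human has touched.
  if (!existing.humanEdited.has("displayName")) existing.displayName = incoming.displayName;
  if (!existing.humanEdited.has("birthYear")) existing.birthYear = incoming.birthYear;
  return existing;
}

// Usage: import, hand-edit the name, then re-import with a newly known birth year.
const personStore = new Map<string, StoredPersonSketch>();
const firstPass = upsertBySourceRef(personStore, { sourceRef: "P-0001", displayName: "A. Example", birthYear: null });
firstPass.displayName = "Anna Example";
firstPass.humanEdited.add("displayName");
const secondPass = upsertBySourceRef(personStore, { sourceRef: "P-0001", displayName: "A. Example", birthYear: 1901 });
```

After the second pass the store still holds one row, the hand-edited name survives, and the untouched birth year is filled in from the artifact.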
canonical artifact — one of the four files the normalizer (tools/import-normalizer/) emits and commits to tools/import-normalizer/out/: canonical-tag-tree.xlsx, canonical-persons.xlsx, canonical-persons-tree.json, canonical-documents.xlsx. They are the contract the backend importer reads (mapped by header name); the semantic transformation (German-date parsing, name classification) lives only in the normalizer, never in Java.
CanonicalSheetReader — the value-level POI helper that opens a canonical .xlsx, maps the header row to column indices by name (replacing the brittle positional column config), splits pipe-delimited list columns, and throws IMPORT_ARTIFACT_INVALID on a missing required header rather than NPE-ing on a null index.
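The header-by-name idea is independent of POI and can be shown on plain string arrays. This is a language-neutral illustration written in TypeScript, not the Java helper itself; the header names (`index`, `title`, `receivers`), the sample values, and the function names are invented for the example.

```typescript
type SheetRowSketch = string[];

// Map each header title to its column index and fail closed on a missing required header.
function mapHeaders(headerRow: SheetRowSketch, requiredHeaders: string[]): Map<string, number> {
  const columnByName = new Map<string, number>();
  headerRow.forEach((title, columnIndex) => {
    columnByName.set(title.trim(), columnIndex);
  });
  for (const header of requiredHeaders) {
    if (!columnByName.has(header)) {
      throw new Error(`IMPORT_ARTIFACT_INVALID: missing required header "${header}"`);
    }
  }
  return columnByName;
}

// Read a cell by header name; short rows yield an empty string.
function readCell(row: SheetRowSketch, columnByName: Map<string, number>, header: string): string {
  const columnIndex = columnByName.get(header);
  if (columnIndex === undefined) {
    throw new Error(`IMPORT_ARTIFACT_INVALID: unknown header "${header}"`);
  }
  return columnIndex < row.length ? row[columnIndex] : "";
}

// Split a pipe-delimited list column, dropping empty items.
function splitPipeList(cellValue: string): string[] {
  return cellValue
    .split("|")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// Column order can change between artifact versions without breaking the reader.
const sampleHeader: SheetRowSketch = ["receivers", "index", "title"];
const sampleRow: SheetRowSketch = ["P-0001 | P-0002", "D-17", "Sample letter"];
const sampleColumns = mapHeaders(sampleHeader, ["index", "receivers"]);
const sampleReceivers = splitPipeList(readCell(sampleRow, sampleColumns, "receivers"));
```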
SkippedFile (ImportStatus.SkippedFile) — a file that was presented for import but not processed, recorded with a filename and a reason code. Possible reasons: INVALID_FILENAME_PATH_TRAVERSAL (the file-column basename failed the path-traversal guard), INVALID_PDF_SIGNATURE (magic-byte validation failed), S3_UPLOAD_FAILED (file upload to MinIO/S3 threw an exception), FILE_READ_ERROR (the file could not be opened for reading), or ALREADY_EXISTS (a document with the same index already exists in the archive with a status other than PLACEHOLDER).
skipped count — the total number of SkippedFile entries accumulated during a single import run (ImportStatus.skipped()). Shown in the amber warning section of the Import Status Card in the admin UI; a value of zero suppresses the section entirely.
Transcription queue — the set of Documents and TranscriptionBlocks awaiting work, computed on-the-fly from Document/Block status. Three views: segmentation queue, transcription queue, ready-to-read queue. NOT a persistent entity — no transcription_queues table exists.
See also DocumentStatus lifecycle.
OCR-Specific Terms
HTR — Handwritten Text Recognition. Recognizes cursive and historical handwriting (contrasted with OCR for printed/typewritten text). The primary mode used for letters in this archive.
Kurrent — Old German cursive handwriting style, the primary historical script appearing in letters from the 1899–1950 period covered by this archive.
OCR — Optical Character Recognition. Recognizes printed or typewritten text. Used for typed documents; HTR is used for handwritten ones.
OcrJob (OcrJob, table ocr_jobs) — a first-class persistent entity tracking a batch OCR run across one or more documents (OcrJobDocument, table ocr_job_documents). Distinct from the concept of "running OCR on a single document." Lifecycle: PENDING → RUNNING → DONE / FAILED (see OcrJobStatus).
SenderModel (SenderModel, table sender_models) — a fine-tuned Kraken HTR model trained on a specific historical correspondent's handwriting. Both an OCR-service concept (the model weights) and a persistent entity linking a Person to the path of their trained model file.
Sütterlin — A specific standardized style of Kurrent taught in German schools from 1915 to 1941.
Illegible word — a word whose recognition confidence falls below the configured threshold; replaced with the literal token [unleserlich] in the rendered block text and counted in the ocr_illegible_words_total Prometheus counter.
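The substitution rule is simple enough to show directly. The OCR service itself is a Python/FastAPI application, so this TypeScript version is only an illustration of the rule; the `RecognizedWordSketch` shape, the threshold value, and the returned count (standing in for an increment of ocr_illegible_words_total) are assumptions.

```typescript
interface RecognizedWordSketch {
  text: string;
  confidence: number; // assumed to be in the range 0..1
}

const ILLEGIBLE_TOKEN = "[unleserlich]";

// Replace low-confidence words with the literal token and count the replacements.
function renderBlockText(
  words: RecognizedWordSketch[],
  threshold: number
): { text: string; illegibleCount: number } {
  let illegibleCount = 0;
  const rendered = words.map((word) => {
    if (word.confidence < threshold) {
      illegibleCount += 1;
      return ILLEGIBLE_TOKEN;
    }
    return word.text;
  });
  return { text: rendered.join(" "), illegibleCount };
}

// The threshold of 0.5 is a made-up value; the real one is configured in the service.
const renderedBlock = renderBlockText(
  [
    { text: "Liebe", confidence: 0.97 },
    { text: "Mutter", confidence: 0.91 },
    { text: "xq", confidence: 0.31 },
    { text: "heute", confidence: 0.88 },
  ],
  0.5
);
```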
Models-ready gauge — the ocr_models_ready Prometheus gauge, flipped from 0 to 1 once the FastAPI lifespan startup has finished loading the Kraken model and the spell-checker. Used both for the /health endpoint and as the supervised signal for the ocr_models_ready < 1 for 2m alert.
Recognition model accuracy — the accuracy reported by ketos train for the recognition (text-line) model, exposed as ocr_model_accuracy{kind="recognition"}. Sourced from _parse_best_checkpoint on the highest-scoring checkpoint after training.
Segmentation model accuracy — the accuracy reported by ketos segtrain for the baseline layout analysis (blla) model, exposed as ocr_model_accuracy{kind="segmentation"}. Distinct from recognition accuracy because the two models are trained and improved independently.
Stammbaum (Family-Tree Layout) Terms
Stammbaum [user-facing] — the genealogy / family-tree view of the archive, accessible at /stammbaum. Renders every Person as a node positioned by PersonRelationship edges (PARENT_OF, SPOUSE_OF) into rows that correspond to generations. The browser-side layout pipeline lives at frontend/src/lib/person/genealogy/.
See also PersonRelationship.
seeded rank (Person.generation) — the imported generation index on a Person (G 0 = founders, increasing downward), used as a strict row anchor in buildLayout.ts. The iterative fallback heuristic never overrides a seeded rank, and spouse-pulldown never pulls a seeded rank — only unseeded nodes (no generation) flow through the heuristic.
family forest — the model the Stammbaum horizontal layout reasons over (ADR-030, familyForest.ts): a forest of units rather than per-generation rows. Replaces the old per-generation "sibling block" packer. The canonical fixture is ~24 root units over 62 nodes.
unit [layout] — one bloodline carrier (the primary) plus the spouse(s) absorbed into its run, rendered as one adjacent row of cards. members[0] is the primary; the rest are spouses in marriage-year order (#361). A lone person is a unit of one. A unit's children are the units anchored by the couple's offspring. The unit — not the individual — is the node the tidy-tree packs.
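The member ordering of a unit can be sketched as a sort over the primary's marriages. The `MarriageSketch` shape, the placement of marriages with an unknown year at the end, and the tie-break on spouse id are assumptions for this example; only "primary first, then spouses in marriage-year order" comes from the entry above.

```typescript
interface MarriageSketch {
  spouseId: string;
  marriageYear: number | null;
}

interface UnitSketch {
  members: string[]; // members[0] is the primary
}

function buildUnit(primaryId: string, marriages: MarriageSketch[]): UnitSketch {
  const ordered = marriages.slice().sort((a, b) => {
    // Assumption: a marriage without a known year sorts after all dated ones.
    const yearA = a.marriageYear ?? Number.POSITIVE_INFINITY;
    const yearB = b.marriageYear ?? Number.POSITIVE_INFINITY;
    if (yearA !== yearB) return yearA < yearB ? -1 : 1;
    // Assumption: equal years fall back to the id so the order is deterministic.
    return a.spouseId < b.spouseId ? -1 : a.spouseId > b.spouseId ? 1 : 0;
  });
  return { members: [primaryId, ...ordered.map((marriage) => marriage.spouseId)] };
}

// Placeholder ids, not people from the archive.
const multiSpouseUnit = buildUnit("primary", [
  { spouseId: "second-spouse", marriageYear: 1931 },
  { spouseId: "first-spouse", marriageYear: 1920 },
  { spouseId: "undated-spouse", marriageYear: null },
]);
const loneUnit = buildUnit("loner", []);
```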
tidy tree — the bottom-up Reingold–Tilford contour packer (tidyTree.ts) that assigns each unit's horizontal x: lay out child subtrees first, pack them so their contours clear by COL_GAP at every level, then centre the unit over the span of its children. Contours are indexed by absolute generation level, so unrelated roots at different generations share x-columns. x comes from structure; y still comes from rank (assignRanks, #689).
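The bottom-up packing can be sketched in a simplified form. This is not the code in tidyTree.ts and not a full Reingold–Tilford implementation (no threads, one merged extent per level); it only shows the three steps named above under stated assumptions: `COL_GAP = 1`, units measured in abstract width units, and all type and function names are hypothetical.

```typescript
const COL_GAP = 1; // assumed value for the example

interface UnitNode {
  id: string;
  level: number; // absolute generation level, supplied by ranking
  width: number; // horizontal space taken by the unit's run of cards
  children: UnitNode[];
}

interface Extent { left: number; right: number; }

interface SubtreeLayout {
  x: Map<string, number>;       // left edge of every unit in the subtree
  contour: Map<number, Extent>; // occupied span per absolute level
}

// Place parts left to right; each part is shifted so it clears what is already
// placed by COL_GAP on every level the two share.
function packSideBySide(parts: SubtreeLayout[]): SubtreeLayout {
  const x = new Map<string, number>();
  const contour = new Map<number, Extent>();
  for (const part of parts) {
    let dx = Number.NEGATIVE_INFINITY;
    part.contour.forEach((extent, level) => {
      const placed = contour.get(level);
      if (placed !== undefined) dx = Math.max(dx, placed.right + COL_GAP - extent.left);
    });
    if (dx === Number.NEGATIVE_INFINITY) dx = 0; // no shared level, nothing to clear
    part.x.forEach((left, id) => x.set(id, left + dx));
    part.contour.forEach((extent, level) => {
      const moved = { left: extent.left + dx, right: extent.right + dx };
      const placed = contour.get(level);
      contour.set(level, placed === undefined ? moved : {
        left: Math.min(placed.left, moved.left),
        right: Math.max(placed.right, moved.right),
      });
    });
  }
  return { x, contour };
}

// Children first, then centre the unit over the span of its children.
function layoutSubtree(unit: UnitNode): SubtreeLayout {
  const packed = packSideBySide(unit.children.map((child) => layoutSubtree(child)));
  let left = 0;
  if (unit.children.length > 0) {
    const first = unit.children[0];
    const last = unit.children[unit.children.length - 1];
    const spanLeft = packed.x.get(first.id) ?? 0;
    const spanRight = (packed.x.get(last.id) ?? 0) + last.width;
    left = (spanLeft + spanRight) / 2 - unit.width / 2;
  }
  packed.x.set(unit.id, left);
  packed.contour.set(unit.level, { left, right: left + unit.width });
  return packed;
}

function layoutForest(roots: UnitNode[]): SubtreeLayout {
  return packSideBySide(roots.map((root) => layoutSubtree(root)));
}

const leafA: UnitNode = { id: "a", level: 1, width: 1, children: [] };
const leafB: UnitNode = { id: "b", level: 1, width: 1, children: [] };
const rootR: UnitNode = { id: "r", level: 0, width: 1, children: [leafA, leafB] };
const sameLevelRoot: UnitNode = { id: "q", level: 1, width: 1, children: [] };
const deepRoot: UnitNode = { id: "z", level: 5, width: 1, children: [] };
const forestLayout = layoutForest([rootR, sameLevelRoot, deepRoot]);
```

In this example the root `q` shares level 1 with the leaves and is pushed to the right of them, while `z` sits on a level nobody else uses and therefore stays at x = 0, sharing a column with `a`.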
structural owner — for a couple, the spouse that keeps the bloodline (hierarchy) position: lower birthYear, then stable id (pickStructuralOwner in familyForest.ts). The other spouse is absorbed into the owner's run. Reused by the cross-link, cycle, and intra-family paths so the rule is defined once.
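The ownership rule is a two-key comparison. The sketch below is not the `pickStructuralOwner` in familyForest.ts; its name, the `SpouseSketch` shape, and the treatment of a missing birth year (sorted after any known year) are assumptions.

```typescript
interface SpouseSketch {
  id: string;
  birthYear: number | null;
}

// Lower birthYear wins; ties (and two unknown years) fall back to the stable id.
function pickStructuralOwnerSketch(a: SpouseSketch, b: SpouseSketch): SpouseSketch {
  // Assumption: an unknown birth year counts as later than any known year.
  const yearA = a.birthYear ?? Number.POSITIVE_INFINITY;
  const yearB = b.birthYear ?? Number.POSITIVE_INFINITY;
  if (yearA !== yearB) return yearA < yearB ? a : b;
  return a.id <= b.id ? a : b;
}

// Placeholder ids, not people from the archive.
const olderSpouse: SpouseSketch = { id: "p-20", birthYear: 1890 };
const youngerSpouse: SpouseSketch = { id: "p-10", birthYear: 1894 };
const sameYearSpouse: SpouseSketch = { id: "p-05", birthYear: 1890 };
```

Because the comparison does not depend on argument order, every caller that asks the same question about the same couple gets the same owner, which is what lets the rule be defined once and reused.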
loose spouse — a person who marries into the graph with no PARENT_OF edges of their own. They are absorbed into their partner's unit run (no ancestor subtree), but any children of theirs still anchor through the couple unit.
bloodline — the set of people reachable from a root unit via structural-owner PARENT_OF edges; renders as one contiguous horizontal band with no foreign node interleaved (the contiguity invariant that fixed the smeared-bloodline bug, #724).
cross-link [layout] — a PARENT_OF edge whose child is positioned in a spouse's run elsewhere (a cross-level intra-family marriage). The connector draws it with a distinct 2 6 dash at reduced opacity — never the 4 4 ended-marriage cadence — with geometry still landing on the child (WCAG 1.4.1).
intra-family marriage — a SPOUSE_OF edge where both endpoints have parents in the graph. The couple is always exactly adjacent in the owner's run; when the two spouses' parents sit at the same structural level the displaced parent edge stays solid (the adjacency case), otherwise it renders as a cross-link. The canonical fixture has two such marriages (Walter⚭Eugenie, Clara⚭Herbert), covered in buildLayout.test.ts.
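Detecting intra-family marriages only needs the set of people who have a parent in the graph. The edge shape and the direction convention (PARENT_OF runs from parent to child) are assumptions for this sketch, and the ids are placeholders rather than fixture data.

```typescript
type EdgeKind = "PARENT_OF" | "SPOUSE_OF";

interface RelationshipEdgeSketch {
  kind: EdgeKind;
  from: string; // for PARENT_OF: the parent (assumed direction)
  to: string;   // for PARENT_OF: the child
}

// A SPOUSE_OF edge is intra-family when both endpoints have a parent in the graph.
function findIntraFamilyMarriages(edges: RelationshipEdgeSketch[]): RelationshipEdgeSketch[] {
  const hasParent = new Set<string>();
  for (const edge of edges) {
    if (edge.kind === "PARENT_OF") hasParent.add(edge.to);
  }
  return edges.filter(
    (edge) => edge.kind === "SPOUSE_OF" && hasParent.has(edge.from) && hasParent.has(edge.to)
  );
}

const sampleEdges: RelationshipEdgeSketch[] = [
  { kind: "PARENT_OF", from: "founder-1", to: "child-a" },
  { kind: "PARENT_OF", from: "founder-2", to: "child-b" },
  { kind: "SPOUSE_OF", from: "child-a", to: "child-b" },     // both have parents
  { kind: "SPOUSE_OF", from: "founder-1", to: "married-in" }, // neither has a parent here
];
const intraFamily = findIntraFamilyMarriages(sampleEdges);
```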
marriage dot — the SVG circle drawn at the midpoint of a SPOUSE_OF connector in the Stammbaum tree (StammbaumTree.svelte). Radius is r=6 (12 px diameter) so the marker meets WCAG 1.4.11 (3:1 non-text contrast) when it stacks to disambiguate multiple marriages on the same focal person.
canonical fixture (Stammbaum) — frontend/src/lib/person/genealogy/__fixtures__/stammbaum.json, a pinned /api/network snapshot used by buildLayout.test.ts for structural-property assertions against real data. Captured locally via frontend/scripts/capture-network-fixture.mjs with explicit credentials and a localhost backend; never invoked from CI. Sanity-gated by validateFixture.ts (≥ 50 nodes / ≥ 5 generations / ≥ 1 SPOUSE_OF edge / ≥ 1 multi-spouse person).
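The four sanity gates can be written as a function that returns a list of problems. This is a sketch of the idea, not validateFixture.ts: the node and edge shapes are assumed (the real /api/network payload is not described here), and counting two SPOUSE_OF edges on one person as "multi-spouse" is a simplification.

```typescript
interface FixtureNodeSketch { id: string; generation: number | null; }
interface FixtureEdgeSketch { kind: "PARENT_OF" | "SPOUSE_OF"; from: string; to: string; }
interface NetworkFixtureSketch { nodes: FixtureNodeSketch[]; edges: FixtureEdgeSketch[]; }

function fixtureProblems(fixture: NetworkFixtureSketch): string[] {
  const problems: string[] = [];
  if (fixture.nodes.length < 50) {
    problems.push(`only ${fixture.nodes.length} nodes, need at least 50`);
  }
  const generations = new Set<number>();
  for (const node of fixture.nodes) {
    if (node.generation !== null) generations.add(node.generation);
  }
  if (generations.size < 5) {
    problems.push(`only ${generations.size} generations, need at least 5`);
  }
  const spouseEdges = fixture.edges.filter((edge) => edge.kind === "SPOUSE_OF");
  if (spouseEdges.length < 1) {
    problems.push("no SPOUSE_OF edge");
  }
  const marriagesPerPerson = new Map<string, number>();
  for (const edge of spouseEdges) {
    for (const id of [edge.from, edge.to]) {
      marriagesPerPerson.set(id, (marriagesPerPerson.get(id) ?? 0) + 1);
    }
  }
  const hasMultiSpouse = Array.from(marriagesPerPerson.values()).some((count) => count >= 2);
  if (!hasMultiSpouse) {
    problems.push("no multi-spouse person");
  }
  return problems;
}

// Synthetic data that just passes every gate: 50 nodes, 5 generations, one person married twice.
const syntheticFixture: NetworkFixtureSketch = {
  nodes: Array.from({ length: 50 }, (_, i) => ({ id: `n${i}`, generation: i % 5 })),
  edges: [
    { kind: "SPOUSE_OF", from: "n0", to: "n1" },
    { kind: "SPOUSE_OF", from: "n0", to: "n2" },
  ],
};
```

Returning every failed gate at once, rather than throwing on the first, makes a bad capture easier to diagnose in one run.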
pan/zoom view state [#692] — the { x, y, z } triple (PanZoomState in frontend/src/lib/person/genealogy/panZoom.ts) describing the Stammbaum canvas position: x/y are pan offsets applied to the SVG viewBox centre, z is the zoom factor (clamped 0.25–10). Mirrored into the shareable URL as ?cx&cy&z and seeded server-side from those params. See ADR-027.
fit-to-screen [user-facing, #692] — the Stammbaum control (⤢) and initial state that frames the whole tree in the viewport. Because the base viewBox already encloses the layout at z=1, fit-to-screen is simply the default view {x:0, y:0, z:1}.
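The view state, its zoom clamp, and the URL mirroring can be sketched as follows. The `{ x, y, z }` shape, the 0.25–10 clamp, the `cx`/`cy`/`z` parameter names, and the fit-to-screen default come from the entries above; the function names and the choice to fall back to the default on a missing or non-numeric parameter are assumptions, not the behaviour of panZoom.ts.

```typescript
interface PanZoomState { x: number; y: number; z: number; }

const MIN_ZOOM = 0.25;
const MAX_ZOOM = 10;
const FIT_TO_SCREEN: PanZoomState = { x: 0, y: 0, z: 1 };

function clampZoom(z: number): number {
  return Math.min(MAX_ZOOM, Math.max(MIN_ZOOM, z));
}

// Parse one query value; anything missing, blank, or non-numeric yields the fallback.
function parseNumber(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Seed the view state from already-decoded query parameters.
function parseViewState(params: Record<string, string | undefined>): PanZoomState {
  return {
    x: parseNumber(params["cx"], FIT_TO_SCREEN.x),
    y: parseNumber(params["cy"], FIT_TO_SCREEN.y),
    z: clampZoom(parseNumber(params["z"], FIT_TO_SCREEN.z)),
  };
}

// Mirror the state back into a shareable query string.
function toQuery(state: PanZoomState): string {
  return `?cx=${state.x}&cy=${state.y}&z=${state.z}`;
}
```

With no parameters at all, `parseViewState({})` yields the fit-to-screen default, so a bare /stammbaum link and the ⤢ control land on the same view.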
lineage highlight [user-facing, #703] — the focus+dim layer bound to the Stammbaum side panel: while a person is selected, that person, their full pedigree upward, their full descendant tree downward, and the spouses of all those blood people render at full strength while everyone else is dimmed (opacity, not a hue swap). Connectors dim unless both joined people are active. Computed by the pure traversal in frontend/src/lib/person/genealogy/layout/highlightLineage.ts.
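The traversal behind the highlight can be sketched as two reachability walks plus a spouse pass. This is an illustration under assumptions, not highlightLineage.ts: the edge shape, the parent-to-child direction of PARENT_OF, and all names are hypothetical.

```typescript
interface RelationSketch {
  kind: "PARENT_OF" | "SPOUSE_OF";
  from: string; // for PARENT_OF: the parent (assumed direction)
  to: string;   // for PARENT_OF: the child
}

function addTo(index: Map<string, string[]>, key: string, value: string): void {
  const list = index.get(key);
  if (list === undefined) index.set(key, [value]);
  else list.push(value);
}

// Add everything reachable from `start` through `index` to `into`.
function reach(start: string, index: Map<string, string[]>, into: Set<string>): void {
  const stack = [start];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    for (const next of index.get(current) ?? []) {
      if (!into.has(next)) {
        into.add(next);
        stack.push(next);
      }
    }
  }
}

function activeLineage(selectedId: string, relations: RelationSketch[]): Set<string> {
  const parentsOf = new Map<string, string[]>();
  const childrenOf = new Map<string, string[]>();
  const spousesOf = new Map<string, string[]>();
  for (const relation of relations) {
    if (relation.kind === "PARENT_OF") {
      addTo(parentsOf, relation.to, relation.from);
      addTo(childrenOf, relation.from, relation.to);
    } else {
      addTo(spousesOf, relation.from, relation.to);
      addTo(spousesOf, relation.to, relation.from);
    }
  }
  const blood = new Set<string>([selectedId]);
  reach(selectedId, parentsOf, blood);  // full pedigree upward
  reach(selectedId, childrenOf, blood); // full descendant tree downward
  const active = new Set<string>();
  blood.forEach((id) => {
    active.add(id);
    (spousesOf.get(id) ?? []).forEach((spouse) => active.add(spouse));
  });
  return active;
}

// A connector stays at full strength only when both joined people are active.
function connectorIsActive(edge: RelationSketch, active: Set<string>): boolean {
  return active.has(edge.from) && active.has(edge.to);
}

const sampleRelations: RelationSketch[] = [
  { kind: "PARENT_OF", from: "grandparent", to: "parent" },
  { kind: "PARENT_OF", from: "parent", to: "selected" },
  { kind: "PARENT_OF", from: "parent", to: "sibling" },
  { kind: "PARENT_OF", from: "selected", to: "child" },
  { kind: "SPOUSE_OF", from: "parent", to: "parent-spouse" },
  { kind: "SPOUSE_OF", from: "child", to: "child-spouse" },
  { kind: "SPOUSE_OF", from: "sibling", to: "sibling-spouse" },
];
const activePeople = activeLineage("selected", sampleRelations);
```

In the sample graph the sibling and the sibling's spouse stay dimmed: the sibling is reachable only by going up to the parent and back down, which neither walk does.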
Other Domain Terms
Aktivität / Aktivitäten [user-facing] — the family activity feed accessible at /aktivitaeten. Shows recent documents, transcriptions, comments, and Geschichten as a chronological timeline.
See also Chronik.
Chronik [internal] — the conceptual and code-level name for the unified activity feed (per ADR-003 003-chronik-unified-activity-feed.md). Used in code, architecture documents, and ADRs. The user-facing label for the same concept is Aktivität.
Geschichte (Geschichte) [user-facing] — a narrative story or article published in the archive, linking Persons and Documents. Lifecycle: DRAFT → PUBLISHED (see GeschichteStatus). DRAFT stories are hidden from users without the BLOG_WRITE permission.
Notification (Notification) — an in-app message delivered to an AppUser. No email or SMS delivery exists today. Delivered via Server-Sent Events (SseEmitterRegistry) and persisted in the notifications table.
Audit log (AuditLog, table audit_log) — an append-only event store recording domain-level activity (document edits, user actions, etc.). Append-only by application convention; a REVOKE UPDATE, DELETE is attempted at the DB layer (see migrations V46, V47) but is a no-op if the application role is the table owner in PostgreSQL. Do not rely on DB-enforced immutability — the constraint is application-layer only.
Architectural Terms
Cross-cutting — code that lives in lib/shared/ (frontend) or cross-domain packages (backend) because it has no entity of its own, no user-facing CRUD, AND is used by two or more domains OR is framework infrastructure (error handling, API client, i18n utilities).
Derived domain — a Tier-2 frontend domain that has its own UI but no backend entities of its own. Data is computed from Tier-1 domain records. The current derived domain is activity (from audit, notifications, document events).
Domain — a Tier-1 bounded context with its own entities, controller, service, repository, and DTOs. Backend domains: document, person, tag, user, geschichte, notification, ocr, audit, dashboard. Frontend domains mirror this structure under src/lib/.
Infrastructure Terms
archiv-app — the bucket-scoped MinIO service account the backend uses to read and write the familienarchiv bucket. Distinct from the MinIO root account (archiv, used only by the bootstrap container for admin operations). Defined and provisioned in infra/minio/bootstrap.sh and consumed by the backend as S3_ACCESS_KEY in docker-compose.prod.yml. The attached archiv-app-policy grants s3:GetObject/PutObject/DeleteObject on familienarchiv/* and s3:ListBucket/GetBucketLocation on the bucket only — not the built-in readwrite policy which would grant s3:* on all buckets.
See also ADR-010 — MinIO stays self-hosted, not Hetzner OBS.
Pending Terms
Terms flagged as potentially ambiguous that have not yet been formally defined here. Add an entry above and remove it from this list when resolved.
- Terms surfaced by Epic 1 audit findings (#388–#392) — review audit reports under docs/audits/ when available and add any term flagged as ambiguous.
- OcrBatchService vs OcrAsyncRunner — both handle async OCR orchestration; their division of responsibility should be clarified here.