The four files in tools/import-normalizer/out/ contain real names, addresses, and attribution prose for ~163 living/deceased family members and were committed by mistake. They are now removed from the index (kept on disk for local development) and gitignored. The canonical artifacts are produced locally from the Python normalizer and synced into IMPORT_HOST_DIR out-of-band alongside the PDFs. The contract between normalizer and importer is the header schema, not the file contents — CanonicalSheetReader fails closed on a missing header, which is what locks the contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
docs/
Project documentation organised into four categories: architecture decision records (ADRs), system architecture diagrams, infrastructure runbooks, and detailed UI/UX specifications.
Folder structure
docs/
├── adr/ # Architecture Decision Records
├── architecture/ # C4 model diagrams and system architecture docs
├── infrastructure/ # Deployment, CI/CD, and ops guides
├── specs/ # UI/UX feature specifications (HTML)
├── ARCHITECTURE.md # Human-readable architecture overview (DOC-2)
├── DEPLOYMENT.md # Day-1 checklist and operational reference (DOC-5)
├── GLOSSARY.md # Domain terminology (DOC-3)
├── security-guide.md # Security policies and hardening guide
└── STYLEGUIDE.md # Coding and design style guide
ADR (adr/)
Architecture Decision Records capture major technical decisions and their rationale.
| ADR | Title | Status |
|---|---|---|
001-ocr-python-microservice.md |
OCR as a separate Python container | Accepted |
002-polygon-jsonb-storage.md |
Polygon coordinates in JSONB columns | Accepted |
003-chronik-unified-activity-feed.md |
Unified activity feed (Chronik) | Accepted |
When making a significant architectural change (new service, data model change, technology swap), write a new ADR:
- Status (Proposed / Accepted / Deprecated / Superseded)
- Context (forces at play)
- Decision (what we decided)
- Consequences (trade-offs)
- Alternatives Considered (table format)
ADRs are sequential (NNN-descriptive-name.md). Do not reuse numbers.
Architecture (architecture/)
Contains C4 model diagrams describing the system at different zoom levels:
- Context diagram — How Familienarchiv fits into the user and system ecosystem
- Container diagram — The high-level technology choices (Spring Boot, SvelteKit, PostgreSQL, MinIO, OCR service)
- Component diagram — Major structural components within the backend
Written in Markdown with embedded Mermaid diagrams (c4-diagrams.md). Gitea renders these automatically.
For the human-readable architecture narrative, see docs/ARCHITECTURE.md.
Infrastructure (infrastructure/)
Operational documentation for running Familienarchiv in production and CI.
| Document | Purpose |
|---|---|
ci-gitea.md |
Gitea CI/CD pipeline configuration |
production-compose.md |
Production Docker Compose setup and VPS provisioning |
s3-migration.md |
Migrating documents between S3 buckets |
self-hosted-catalogue.md |
Self-hosted software catalogue |
For the day-1 deployment checklist, see docs/DEPLOYMENT.md.
Specs (specs/)
High-fidelity UI/UX specifications written as standalone HTML files. These are design documents describing exact layout, interactions, and responsive behavior before implementation.
Each spec typically includes:
- Visual mockups with CSS-in-HTML styling
- Interaction flows and state transitions
- Responsive breakpoint behavior
- Accessibility requirements
Before implementing a feature, check specs/ for an existing specification.
Style Guide
docs/STYLEGUIDE.md covers:
- Code formatting and linting rules
- Component naming conventions
- Color palette and typography
- Accessibility standards (WCAG 2.1 AA)