Fourth canonical loader.

Maps canonical-documents.xlsx by header name, routes each attribution register-first by source_ref (provisional person when a slug is unmatched), ALWAYS retains the raw sender_name/receiver_names in sender_text/receiver_text, splits pipe-delimited receivers, parses clean date_iso/date_precision/date_end/date_raw with no semantic logic, attaches the tag by canonical tag_path, and keeps the S3 upload + thumbnail plumbing in small resolveFile/uploadToS3/buildDocument methods. Documents upsert by index (originalFilename); UPLOADED when a file resolves on disk, PLACEHOLDER otherwise.

Security guards ported intact from MassImportService BEFORE retiring it: isValidImportFilename (forward/back slash, three Unicode slash homoglyphs, .., null byte, absolute path), findFileRecursive canonical-path containment (symlink-escape), and the %PDF magic-byte check + FILE_READ_ERROR path. The file column is treated as hostile input (CWE-22): its basename is validated then resolved only inside importDir, so a traversal value cannot escape. Extracts the verbatim ImportStatus/SkipReason/SkippedFile shape into its own class so the admin UI contract is unchanged.

Assumption: the committed canonical-documents.xlsx carries no sender_category/receiver_category columns (the issue's described schema). The normalizer already resolved Option-A routing into slugs + raw names, so the loader routes by slug presence rather than a category enum.

Refs #669
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
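The three guards named in the commit message above (filename validation, canonical-path containment, and the %PDF magic-byte check) could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: the class name ImportFileGuards and the helpers isContainedIn and startsWithPdfMagic are hypothetical, and because the commit message does not name the three slash homoglyphs, U+2215, U+2044, and U+FF0F are assumed here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class; only the name isValidImportFilename comes from the commit message.
final class ImportFileGuards {

    // Forward slash, backslash, and three ASSUMED homoglyphs:
    // U+2215 DIVISION SLASH, U+2044 FRACTION SLASH, U+FF0F FULLWIDTH SOLIDUS.
    private static final char[] SLASH_LIKE = {'/', '\\', '\u2215', '\u2044', '\uFF0F'};

    // Accepts only a bare filename: no separators, no "..", no null byte, not absolute.
    static boolean isValidImportFilename(String name) {
        if (name == null || name.isBlank()) return false;
        if (name.indexOf('\0') >= 0) return false;
        if (name.contains("..")) return false;
        for (char c : SLASH_LIKE) {
            if (name.indexOf(c) >= 0) return false;
        }
        try {
            return !Path.of(name).isAbsolute();
        } catch (InvalidPathException e) {
            return false; // the platform cannot represent this name as a path
        }
    }

    // Containment check: toRealPath() resolves symlinks, so a link that points
    // outside importDir fails the startsWith test. Both paths must exist.
    static boolean isContainedIn(Path importDir, Path candidate) throws IOException {
        Path root = importDir.toRealPath();
        return candidate.toRealPath().startsWith(root);
    }

    // Magic-byte check on the first bytes of a file: a PDF begins with "%PDF".
    static boolean startsWithPdfMagic(byte[] head) {
        byte[] magic = "%PDF".getBytes(StandardCharsets.US_ASCII);
        return head != null
                && head.length >= magic.length
                && Arrays.equals(Arrays.copyOf(head, magic.length), magic);
    }
}
```

In this sketch the basename from the file column would be validated first, then located under importDir, and the resolved path passed to isContainedIn before its leading bytes are read and handed to startsWithPdfMagic.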
Familienarchiv
Familienarchiv is a private web application for digitising, organising, and searching a family document collection — letters, postcards, and photographs from 1899 to 1950. Family members upload scans, transcribe handwritten text (Kurrent/Sütterlin), and read the archive from any device.
Subsystems
- frontend/: SvelteKit 2 / Svelte 5 / TypeScript / Tailwind 4 web app (server-side rendered)
- backend/: Spring Boot 4 (Java 21) REST API; handles documents, persons, search, and user management
- ocr-service/: Python FastAPI microservice for OCR and handwritten text recognition (HTR); single-node by design, see ADR-001. Not part of the default dev stack (see Quick start below)
- infra/: Gitea Actions CI/CD config; future home for infrastructure-as-code
- scripts/: operational and data-pipeline helpers (reset-db.sh, clean-e2e-data.sh, import scripts)
Quick start
Prerequisites: Java 21, Node 24, Docker with the docker compose plugin (V2).
1. Configure environment
cp .env.example .env
# The defaults in .env.example work for local development without changes.
2. Start infrastructure
# Starts PostgreSQL, MinIO (object storage), and Mailpit (dev mail catcher)
docker compose up -d db minio mailpit
3. Start the backend
cd backend
./mvnw spring-boot:run
# Starts on http://localhost:8080
# API docs (dev profile, auto-enabled): http://localhost:8080/v3/api-docs
4. Start the frontend
cd frontend
npm install
npm run dev
# Starts on http://localhost:5173
Open http://localhost:5173 — you should see the Familienarchiv login screen.
Default development credentials:
# local dev only — change before any network-exposed deployment
Email: admin@familyarchive.local
Password: admin123
Development setup only. The default docker compose config exposes the database port and uses root MinIO credentials. Do not connect this to a network without first reading docs/DEPLOYMENT.md (coming: DOC-5, #399).
Running the full stack via Docker (optional)
To run everything including the backend and frontend in containers:
docker compose up -d
Note: the OCR service (ocr-service/) builds its Docker image locally and downloads ~6 GB of ML models on first start, so expect 30–60 minutes on a first run. The rest of the stack starts independently. The OCR service requires ≥ 12 GB RAM; on memory-constrained machines, exclude it by adding --scale ocr-service=0 to the command above.
Where to go next
| Resource | Purpose |
|---|---|
| docs/architecture/c4-diagrams.md | C4 container and component diagrams (current system view) |
| docs/ARCHITECTURE.md (coming: DOC-2, #396) | Full architecture guide with domain list |
| docs/GLOSSARY.md | Overloaded terms: Person vs AppUser, Chronik vs Aktivität, etc. |
| CONTRIBUTING.md (coming: DOC-4, #398) | How to add a domain, endpoint, or SvelteKit route |
| docs/DEPLOYMENT.md (coming: DOC-5, #399) | Production deployment checklist and secrets guide |
| docs/adr/ | Architecture Decision Records — the "why" behind key choices |
| Gitea issue tracker (internal — home network only) | Bug reports, feature requests, and project planning |
License
Private project — all rights reserved. Not licensed for redistribution.