familienarchiv/docs/DEPLOYMENT.md
Marcel a3c17750cd
fix(docs): correct DEPLOYMENT.md env var name and prod overlay note
- Security checklist: OCR_TRAINING_TOKEN → APP_OCR_TRAINING_TOKEN (backend)
  plus TRAINING_TOKEN (OCR service); both must share the same value
- Bootstrap: clarify docker-compose.prod.yml is not committed — must be
  created from docs/infrastructure/production-compose.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 07:35:23 +02:00


Familienarchiv — Deployment Reference

If the app is down right now → jump to §4 Logs.

This doc is the Day-1 checklist and operational reference. It links to the canonical infrastructure docs in docs/infrastructure/ rather than duplicating them.

Audience: operator bringing up a fresh instance, or Successor-X debugging a live incident.

Ownership: project owner. Update this file in any PR that changes the container topology, env vars, or backup procedure.

Table of Contents

  1. Deployment topology
  2. Environment variables
  3. Bootstrap from scratch
  4. Logs + observability
  5. Backup + recovery
  6. Common operational tasks
  7. Known limitations

1. Deployment topology

graph TD
    Browser -->|HTTPS| Caddy["Caddy (TLS termination)"]
    Caddy -->|HTTP :5173| Frontend["Web Frontend\nSvelteKit / Node.js"]
    Caddy -->|HTTP :8080| Backend["API Backend\nSpring Boot / Jetty :8080"]
    Backend -->|JDBC :5432| DB[(PostgreSQL 16)]
    Backend -->|S3 API :9000| MinIO[(MinIO / Hetzner OBS)]
    Backend -->|HTTP :8000 internal| OCR["OCR Service\nPython FastAPI"]
    OCR -->|presigned URL| MinIO
    Browser -->|SSE direct| Backend

Key facts:

  • Caddy terminates TLS and reverse-proxies to frontend and backend. See the Caddyfile in docs/infrastructure/production-compose.md.
  • The OCR service has no external port — reachable only on the internal Docker network from the backend.
  • SSE notifications go directly backend → browser (not via the SvelteKit SSR layer).
  • Management port 8081 (Spring Actuator / Prometheus scrape) is internal only — the Caddy config blocks /actuator/* externally.

OCR memory requirements

The OCR service requires significant RAM for model loading. The dev compose sets mem_limit: 12g.

| Production target | RAM | Recommended OCR limit | Notes |
|---|---|---|---|
| Hetzner CX42 | 16 GB | 12 GB | Recommended for OCR-enabled production |
| Hetzner CX32 | 8 GB | 6 GB | Accept reduced batch sizes and slower throughput |
| Hetzner CX22 | 4 GB |  | Disable the OCR service (profiles: [ocr]); run OCR on demand only |

A CX32 cannot honour a mem_limit: 12g — set it to 6g in the prod overlay or use CX42.
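The sizing table above can be expressed as a small lookup, which may be handy when generating the prod overlay. This is a minimal sketch: the function name is hypothetical, and treating in-between RAM sizes (for example 12 GB) as the next lower tier is an assumption, not something the table states.

```python
from typing import Optional

# Thresholds taken from the table above: (minimum host RAM in GB, OCR mem_limit).
OCR_LIMITS = [(16, "12g"), (8, "6g")]


def recommended_ocr_mem_limit(host_ram_gb: float) -> Optional[str]:
    """Return the suggested mem_limit for the OCR service, or None to disable it."""
    for min_ram_gb, limit in OCR_LIMITS:
        if host_ram_gb >= min_ram_gb:
            return limit
    # Below 8 GB: keep the OCR service behind profiles: [ocr] instead.
    return None


for ram in (16, 8, 4):
    print(ram, recommended_ocr_mem_limit(ram))
# prints:
# 16 12g
# 8 6g
# 4 None
```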

Dev vs production differences

| Concern | Dev compose | Prod overlay |
|---|---|---|
| MinIO image tag | minio/minio:latest (unpinned) | Pinned in prod overlay |
| Data persistence | Bind mounts ./data/postgres, ./data/minio | Named Docker volumes |
| Bucket creation | create-buckets helper container | Pre-created in Hetzner console |
| Spring profile | dev,e2e (enables OpenAPI + Swagger UI) | prod |
| Mail | Mailpit (local catcher) | Real SMTP |

Full prod overlay: docs/infrastructure/production-compose.md.


2. Environment variables

All vars are set in .env at the repo root (copy from .env.example). The backend resolves them via application.yaml; the Docker Compose file wires them into each container.

Every variable referenced in docker-compose.yml or application*.yaml must appear in the tables below. A missing entry is grounds for a blocking review comment on any PR that changes those files.
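The review rule above can be partly automated. The sketch below extracts upper-case `${NAME}` references (including Compose-style `${NAME:-default}` and Spring-style `${NAME:default}`) from a config file's text and reports any that are not in a documented set. The sample YAML and the variable `FEATURE_X_ENABLED` are hypothetical and exist only to show a finding. The `documented` set has to be maintained by hand from the tables in this section.

```python
import re

# Matches ${NAME}, ${NAME:-default}, ${NAME:default}, ${NAME?error}, etc.
VAR_REF = re.compile(r"\$\{([A-Z][A-Z0-9_]*)(?:[:?+-][^}]*)?\}")


def referenced_vars(config_text: str) -> set[str]:
    """Collect every upper-case variable name referenced as ${NAME...}."""
    return set(VAR_REF.findall(config_text))


def undocumented_vars(config_text: str, documented: set[str]) -> list[str]:
    """Return referenced variables that are missing from the documented set."""
    return sorted(referenced_vars(config_text) - documented)


# Hypothetical compose fragment, for illustration only.
sample = """
services:
  backend:
    environment:
      SPRING_DATASOURCE_URL: ${SPRING_DATASOURCE_URL}
      S3_REGION: ${S3_REGION:-us-east-1}
      FEATURE_X: ${FEATURE_X_ENABLED}
"""
documented = {"SPRING_DATASOURCE_URL", "S3_REGION"}
print(undocumented_vars(sample, documented))  # prints ['FEATURE_X_ENABLED']
```

Variables set as literal keys (without `${...}` interpolation) are not detected by this approach, so it complements manual review rather than replacing it.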

Backend

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| SPRING_DATASOURCE_URL | PostgreSQL JDBC URL |  | YES |  |
| SPRING_DATASOURCE_USERNAME | DB username |  | YES |  |
| SPRING_DATASOURCE_PASSWORD | DB password |  | YES | YES |
| S3_ENDPOINT | MinIO / OBS endpoint URL |  | YES |  |
| S3_ACCESS_KEY | MinIO access key (use service account, not root in prod) |  | YES | YES |
| S3_SECRET_KEY | MinIO secret key |  | YES | YES |
| S3_BUCKET_NAME | Target bucket name |  | YES |  |
| S3_REGION | S3 region string | us-east-1 | YES |  |
| APP_ADMIN_USERNAME | Bootstrap admin username (⚠ not in .env.example) | admin | YES |  |
| APP_ADMIN_PASSWORD | Bootstrap admin password (⚠ ships as admin123) | admin123 | YES | YES |
| APP_BASE_URL | Public-facing URL for email links | http://localhost:3000 | YES (prod) |  |
| APP_OCR_BASE_URL | Internal URL of the OCR service |  | YES |  |
| APP_OCR_TRAINING_TOKEN | Secret token for OCR training endpoints |  | YES (prod) | YES |
| MAIL_HOST | SMTP host | mailpit (dev) | YES (prod) |  |
| MAIL_PORT | SMTP port | 1025 (dev) | YES (prod) |  |
| MAIL_USERNAME | SMTP username |  | YES (prod) | YES |
| MAIL_PASSWORD | SMTP password |  | YES (prod) | YES |
| APP_MAIL_FROM | From address for outbound mail | noreply@familienarchiv.local | YES (prod) |  |
| MAIL_SMTP_AUTH | SMTP auth enabled | false (dev) | YES (prod) |  |
| MAIL_STARTTLS_ENABLE | STARTTLS enabled | false (dev) | YES (prod) |  |
| SPRING_PROFILES_ACTIVE | Spring profile | dev,e2e (compose) | YES |  |

PostgreSQL container

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| POSTGRES_USER | DB superuser | archive_user | YES |  |
| POSTGRES_PASSWORD | DB password | change-me | YES | YES |
| POSTGRES_DB | Database name | family_archive_db | YES |  |

MinIO container

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| MINIO_ROOT_USER | MinIO root username | minio_admin | YES |  |
| MINIO_ROOT_PASSWORD | MinIO root password | change-me | YES | YES |
| MINIO_DEFAULT_BUCKETS | Bucket name | archive-documents | YES |  |

OCR service

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| TRAINING_TOKEN | Guards /train and /segtrain endpoints (accepts file uploads) |  | YES (prod) | YES |
| ALLOWED_PDF_HOSTS | SSRF protection: comma-separated list of allowed PDF source hosts. Do not widen to * | minio,localhost,127.0.0.1 | YES |  |
| KRAKEN_MODEL_PATH | Directory containing Kraken HTR models (populated by download-kraken-models.sh) | /app/models/ |  |  |
| BLLA_MODEL_PATH | Kraken baseline layout analysis model path | /app/models/blla.mlmodel |  |  |
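To illustrate why ALLOWED_PDF_HOSTS matters, here is a minimal sketch of a host allow-list check of the kind that value typically feeds. It is not the OCR service's actual implementation; the function names are hypothetical and the exact matching rules in the service may differ.

```python
from urllib.parse import urlparse


def parse_allowed_hosts(raw: str) -> set[str]:
    """Split a comma-separated ALLOWED_PDF_HOSTS value into a normalised set."""
    return {h.strip().lower() for h in raw.split(",") if h.strip()}


def is_allowed_pdf_url(url: str, allowed_hosts: set[str]) -> bool:
    """Accept only http(s) URLs whose hostname is exactly on the allow-list."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    return host in allowed_hosts


allowed = parse_allowed_hosts("minio,localhost,127.0.0.1")
print(is_allowed_pdf_url("http://minio:9000/archive-documents/a.pdf", allowed))  # prints True
print(is_allowed_pdf_url("http://169.254.169.254/latest/meta-data", allowed))    # prints False
```

With exact matching, any host outside the list is rejected, including cloud metadata addresses. A wildcard entry would defeat that guarantee, which is why the table warns against it.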

3. Bootstrap from scratch

Full VPS provisioning steps are in docs/infrastructure/production-compose.md. This section covers the sequence and the security-critical steps.

Security checklist — complete before first boot

⚠️ These defaults ship in .env.example and application.yaml. Change them or you will have an insecure installation.

  • Set APP_ADMIN_PASSWORD (default: admin123 — change before starting the backend)
  • Set APP_ADMIN_USERNAME if you want a non-default admin login name (add to .env — not in .env.example)
  • Rotate POSTGRES_PASSWORD from change-me
  • Rotate MINIO_ROOT_PASSWORD from change-me
  • Set a strong APP_OCR_TRAINING_TOKEN (backend) and the matching TRAINING_TOKEN (OCR service) — both must be the same value (python3 -c "import secrets; print(secrets.token_hex(32))")
  • Confirm ALLOWED_PDF_HOSTS is locked to your MinIO/S3 hostname — widening to * opens SSRF
  • Set SPRING_PROFILES_ACTIVE=prod in the prod overlay (not dev,e2e — that exposes Swagger UI and /v3/api-docs)
  • Use a dedicated MinIO service account for S3_ACCESS_KEY / S3_SECRET_KEY, not the root credentials
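Most of the checklist above can be verified mechanically before first boot. The following is a minimal sketch, assuming a plain KEY=VALUE .env file. The function names are hypothetical, the parser ignores quoting subtleties, and SPRING_PROFILES_ACTIVE is only checked when it is present in .env because the checklist sets it in the prod overlay.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("'\"")
    return env


def checklist_findings(env: dict[str, str]) -> list[str]:
    """Return one message per security checklist item that still fails."""
    findings = []
    if env.get("APP_ADMIN_PASSWORD", "admin123") == "admin123":
        findings.append("APP_ADMIN_PASSWORD is unset or still admin123")
    for key in ("POSTGRES_PASSWORD", "MINIO_ROOT_PASSWORD"):
        if env.get(key, "change-me") == "change-me":
            findings.append(f"{key} is unset or still change-me")
    token = env.get("APP_OCR_TRAINING_TOKEN", "")
    if not token or token != env.get("TRAINING_TOKEN", ""):
        findings.append("APP_OCR_TRAINING_TOKEN and TRAINING_TOKEN must be set and identical")
    if "*" in env.get("ALLOWED_PDF_HOSTS", ""):
        findings.append("ALLOWED_PDF_HOSTS contains a wildcard")
    profiles = env.get("SPRING_PROFILES_ACTIVE", "").split(",")
    if any(p.strip() in ("dev", "e2e") for p in profiles):
        findings.append("SPRING_PROFILES_ACTIVE enables dev/e2e")
    if env.get("S3_ACCESS_KEY") == env.get("MINIO_ROOT_USER", "minio_admin"):
        findings.append("S3_ACCESS_KEY reuses the MinIO root user")
    return findings
```

A typical call would be `checklist_findings(parse_env(open(".env").read()))`; an empty list means none of the checked items failed. The check cannot tell whether a token is strong, only that it is set and matches on both sides.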

Bootstrap sequence

# 1. Copy and fill the env file
cp .env.example .env
# edit .env — complete the security checklist above first

# 2. (Production only) Create the MinIO / Hetzner OBS bucket in the console
#    The dev compose has a create-buckets helper; production does not.
#    Create the bucket named $MINIO_DEFAULT_BUCKETS with private access.

# 3. Start the stack (prod overlay — see docs/infrastructure/production-compose.md)
#    docker-compose.prod.yml is NOT committed — create it from the guide above
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

# 4. Flyway migrations run automatically on backend start.
#    Watch the backend log to confirm:
docker compose logs --follow --tail=100 backend

# 5. Verify the stack is healthy
curl http://localhost:8080/actuator/health
# Expected: {"status":"UP"}

# 6. Open the app and log in with the admin credentials from .env

Do not use docker-compose.ci.yml locally — it disables bind mounts that the dev workflow depends on.
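Step 5 of the sequence above is easy to script when the backend needs a while to run migrations. This is a minimal sketch using only the standard library; the function names, timeout and polling interval are assumptions, and the URL is the one from the curl command above.

```python
import json
import time
import urllib.error
import urllib.request


def is_up(body: bytes) -> bool:
    """True if an Actuator health payload reports status UP."""
    try:
        return json.loads(body).get("status") == "UP"
    except (ValueError, AttributeError):
        return False  # not JSON, or not a JSON object


def wait_for_health(url: str, timeout_s: float = 120, interval_s: float = 3) -> bool:
    """Poll the health endpoint until it reports UP or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_up(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # backend not listening yet, or answering with an error status
        time.sleep(interval_s)
    return False
```

Calling `wait_for_health("http://localhost:8080/actuator/health")` returns True once the backend reports UP and False if it does not within the timeout, at which point the backend log is the next place to look.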


4. Logs + observability

First-response commands

# Stream backend logs (most useful first)
docker compose logs --follow --tail=100 backend

# Stream all services
docker compose logs --follow

# Single snapshot
docker compose logs --tail=200 <service>
# services: frontend, backend, db, minio, ocr-service

Log locations

  • Backend application log: written to stdout and captured by Docker. Log files inside the container are under /app/logs/; reach them with docker exec.
  • Spring Actuator health: http://localhost:8080/actuator/health (internal only in prod — port 8081 for Prometheus scraping)
  • Prometheus scraping: management port 8081, path /actuator/prometheus. Internal only; Caddy blocks /actuator/* externally.

Future observability

Phase 7 of the Production v1 milestone adds Prometheus + Loki + Grafana. No monitoring infrastructure is in place yet.


5. Backup + recovery

Current state — no automated backup

No automated backup is configured. Manual procedure for a one-off snapshot backup:

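# PostgreSQL dump. POSTGRES_USER and POSTGRES_DB are expanded by the host shell,
# so export them first (for example: set -a; . ./.env; set +a)
docker exec archive-db pg_dump -U "${POSTGRES_USER}" "${POSTGRES_DB}" > "backup-$(date +%Y%m%d).sql"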

# MinIO data (bind-mounted in dev)
# Copy ./data/minio/ to external storage

Restoration:

# Restore Postgres (POSTGRES_USER and POSTGRES_DB must be set in the host shell)
docker exec -i archive-db psql -U "${POSTGRES_USER}" "${POSTGRES_DB}" < backup-YYYYMMDD.sql
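Until automated backups exist, the manual dump above can be wrapped in a small script and run from cron. The sketch below assumes the container name archive-db from the commands above; the function names are hypothetical, and the script does not cover the MinIO data.

```python
import datetime
import subprocess


def backup_filename(day: datetime.date) -> str:
    """Build the dated file name used by the manual procedure."""
    return f"backup-{day:%Y%m%d}.sql"


def pg_dump_command(user: str, database: str, container: str = "archive-db") -> list[str]:
    """Argument list equivalent to: docker exec <container> pg_dump -U <user> <database>."""
    return ["docker", "exec", container, "pg_dump", "-U", user, database]


def run_backup(user: str, database: str) -> str:
    """Run pg_dump in the container and write its output to a dated file."""
    path = backup_filename(datetime.date.today())
    with open(path, "wb") as out:
        # check=True raises CalledProcessError if pg_dump exits non-zero.
        subprocess.run(pg_dump_command(user, database), stdout=out, check=True)
    return path
```

If `run_backup` raises, the partially written file is left on disk and should not be treated as a valid backup.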

Planned — phase 5 of Production v1 milestone

Automated backup (PostgreSQL WAL archiving + MinIO bucket replication) is planned for phase 5 of the Production v1 milestone. Until that ships, manual backups are the only recovery option.


6. Common operational tasks

Reset dev database (truncates data, keeps schema)

bash scripts/reset-db.sh

Truncates all data but does not drop the schema or re-run Flyway. Use for E2E test resets, not full reinstalls. ⚠️ Script hardcodes DB_USER=archive_user and DB_NAME=family_archive_db — if you customised these in .env, edit the script accordingly.

Rebuild frontend container (clears node_modules volume)

bash scripts/rebuild-frontend.sh

Assumes the Docker Compose volume is named familienarchiv_frontend_node_modules. If your project directory is not named familienarchiv, edit line 16 of the script.
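The volume name in the note above follows from how Docker Compose names volumes: the project name (by default the project directory name, unless COMPOSE_PROJECT_NAME or -p overrides it), an underscore, then the volume key. The sketch below mirrors that rule to predict the name for a differently named checkout. The volume key frontend_node_modules is inferred from the name above, and the normalisation shown is an approximation of Compose's behaviour, so verify the result with docker volume ls.

```python
import re


def normalize_project_name(name: str) -> str:
    """Approximate Compose's normalisation: lower-case, keep only a-z, 0-9, - and _."""
    cleaned = re.sub(r"[^-_a-z0-9]", "", name.lower())
    return cleaned.lstrip("_-")


def compose_volume_name(project_name: str, volume_key: str = "frontend_node_modules") -> str:
    """Predict the full Docker volume name for a Compose project."""
    return f"{normalize_project_name(project_name)}_{volume_key}"


print(compose_volume_name("familienarchiv"))
# prints familienarchiv_frontend_node_modules
```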

Download Kraken OCR models

bash scripts/download-kraken-models.sh

Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.

Trigger a mass import (Excel/ODS)

  1. Place the import file in the import/ bind mount on the backend container.
  2. Call POST /api/admin/trigger-import (requires ADMIN permission).
  3. The import runs asynchronously — poll GET /api/admin/import-status or watch backend logs.
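Steps 2 and 3 above can be scripted. The sketch below is built on assumptions: it passes credentials as an Authorization header supplied by the caller (the application may use a session cookie or another scheme instead), it assumes the backend is reachable at the given base URL, and it prints the status body verbatim because the response format is not documented here. Only the two endpoint paths come from the steps above.

```python
import time
import urllib.request


def build_request(base_url: str, path: str, method: str, auth_header: str) -> urllib.request.Request:
    """Create a request for an admin endpoint; POST is sent with an empty body."""
    data = b"" if method == "POST" else None
    req = urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method)
    req.add_header("Authorization", auth_header)  # assumption: header-based auth
    return req


def trigger_and_poll(base_url: str, auth_header: str, polls: int = 10, interval_s: float = 5) -> None:
    """Trigger the import once, then print the import status a fixed number of times."""
    trigger = build_request(base_url, "/api/admin/trigger-import", "POST", auth_header)
    with urllib.request.urlopen(trigger) as resp:
        print("trigger:", resp.status)
    for _ in range(polls):
        time.sleep(interval_s)
        status = build_request(base_url, "/api/admin/import-status", "GET", auth_header)
        with urllib.request.urlopen(status) as resp:
            print("status:", resp.read().decode("utf-8", "replace"))
```

The loop polls a fixed number of times rather than stopping on completion, since detecting completion would require knowing the status payload.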

7. Known limitations

| Limitation | Reason | Reference |
|---|---|---|
| Single-node OCR service | The two required OCR engines (Surya + Kraken) exist only in the Python ecosystem; horizontal scaling would require a job queue not currently implemented | ADR-001 |
| No multi-tenancy | Designed as a single-family private archive; all authenticated users share the same document space | Deliberate scope decision (family-only product frame) |
| No multi-region | Single PostgreSQL + MinIO instance; no replication or failover | Deliberate scope decision |
| Max upload size | 50 MB per file (500 MB per request for multi-file) | Configurable in application.yaml (spring.servlet.multipart) |
| No automated backup | Phase 5 of Production v1 milestone is not yet implemented | See §5 above |