
Familienarchiv — Deployment Reference

If the app is down right now → jump to §4 Logs.

This doc is the Day-1 checklist and operational reference. It links to the canonical infrastructure docs in docs/infrastructure/ rather than duplicating them.

Audience: operator bringing up a fresh instance, or Successor-X debugging a live incident.

Ownership: project owner. Update this file in any PR that changes the container topology, env vars, or backup procedure.

Table of Contents

  1. Deployment topology
  2. Environment variables
  3. Bootstrap from scratch
  4. Logs + observability
  5. Backup + recovery
  6. Common operational tasks
  7. Known limitations

1. Deployment topology

graph TD
    Browser -->|HTTPS| Caddy["Caddy (TLS termination)"]
    Caddy -->|HTTP :3000| Frontend["Web Frontend\nSvelteKit Node adapter"]
    Caddy -->|HTTP :8080| Backend["API Backend\nSpring Boot / Jetty :8080"]
    Backend -->|JDBC :5432| DB[(PostgreSQL 16)]
    Backend -->|S3 API :9000| MinIO[(MinIO)]
    Backend -->|HTTP :8000 internal| OCR["OCR Service\nPython FastAPI"]
    OCR -->|presigned URL| MinIO
    Caddy -->|SSE proxy_pass| Backend

Key facts:

  • Caddy terminates TLS and reverse-proxies to frontend (:3000) and backend (:8080). The Caddyfile is committed at infra/caddy/Caddyfile and is installed on the host as /etc/caddy/Caddyfile (symlink).
  • The host binds all docker-published ports to 127.0.0.1 only; Caddy is the sole external entry point.
  • The OCR service has no published port — reachable only on the internal Docker network from the backend.
  • SSE notifications transit Caddy (browser → Caddy → backend); the backend is never reachable directly from the public internet. The SvelteKit SSR layer is bypassed for SSE, but Caddy is not.
  • The Caddyfile responds 404 on /actuator/* (defense in depth). Internal monitoring scrapes the backend on the docker network, not through Caddy.
  • Production and staging cohabit on the same host via docker compose project names: archiv-production (ports 8080/3000) and archiv-staging (ports 8081/3001).
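
Because both stacks share one host and one compose file, any manual docker compose invocation should pin the project name and env file explicitly. A minimal sketch (the staging env-file path is an assumption; the production path matches the rollback example in §5):

# Inspect each stack without touching the other:
docker compose -p archiv-production -f docker-compose.prod.yml \
  --env-file /opt/familienarchiv/.env.production ps
docker compose -p archiv-staging -f docker-compose.prod.yml \
  --env-file /opt/familienarchiv/.env.staging ps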

OCR memory requirements

The OCR service requires significant RAM for model loading. The dev compose sets mem_limit: 12g.

| Production target | RAM | Recommended OCR limit | Notes |
|---|---|---|---|
| Hetzner CX42 | 16 GB | 12 GB | Recommended for OCR-enabled production |
| Hetzner CX32 | 8 GB | 6 GB | Accept reduced batch sizes and slower throughput |
| Hetzner CX22 | 4 GB | n/a | Disable the OCR service (profiles: [ocr]); run OCR on demand only |

A CX32 cannot honour the default mem_limit: 12g — set OCR_MEM_LIMIT=6g (in .env.production / .env.staging, or as a Gitea secret consumed by the workflow) before deploying. The prod compose interpolates this variable with a 12g default.
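
To check what a given host will actually get, render the interpolated config before deploying. A quick sanity check, assuming the standard file layout from §5:

# Render the effective prod config and inspect the OCR memory cap:
OCR_MEM_LIMIT=6g docker compose -f docker-compose.prod.yml \
  --env-file /opt/familienarchiv/.env.production config | grep mem_limit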

Dev vs production differences

| Concern | Dev (docker-compose.yml) | Prod (docker-compose.prod.yml) |
|---|---|---|
| MinIO image tag | minio/minio:latest | Pinned minio/minio:RELEASE.… |
| Data persistence | Bind mounts ./data/postgres, ./data/minio | Named Docker volumes (postgres-data, minio-data) |
| MinIO credentials for backend | Root user/password | Service account archiv-app with bucket-scoped rights |
| Bucket creation | create-buckets helper | Same helper, plus service-account bootstrap on every up |
| Spring profile | dev,e2e (Swagger + e2e overrides) | unset — base application.yaml is production-ready |
| Mail | Mailpit (local catcher) | Real SMTP (production) / Mailpit via profiles: [staging] (staging) |
| Frontend image | Dev server, target: development, port 5173 | Node adapter, target: production, port 3000 |
| Host port binding | All published | Bound to 127.0.0.1 only; Caddy is the front door |
| Deploy method | docker compose up -d (manual) | Gitea Actions: nightly.yml (staging, cron) and release.yml (production, on v* tag) — both use up -d --wait |

Full prod compose: docker-compose.prod.yml. Workflow files: .gitea/workflows/nightly.yml, .gitea/workflows/release.yml.


2. Environment variables

All vars are set in .env at the repo root (copy from .env.example). The backend resolves them via application.yaml; the Docker Compose file wires them into each container.

Any var found in docker-compose.yml or application*.yaml that is not in this table is a blocking review comment on any PR that changes those files.

Backend

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| SPRING_DATASOURCE_URL | PostgreSQL JDBC URL | | YES | |
| SPRING_DATASOURCE_USERNAME | DB username | | YES | |
| SPRING_DATASOURCE_PASSWORD | DB password | | YES | YES |
| S3_ENDPOINT | MinIO / OBS endpoint URL | | YES | |
| S3_ACCESS_KEY | MinIO access key (use service account, not root in prod) | | YES | YES |
| S3_SECRET_KEY | MinIO secret key | | YES | YES |
| S3_BUCKET_NAME | Target bucket name | | YES | |
| S3_REGION | S3 region string | us-east-1 | YES | |
| APP_ADMIN_USERNAME | Bootstrap admin username (⚠ not in .env.example) | admin | YES | |
| APP_ADMIN_PASSWORD | Bootstrap admin password (⚠ ships as admin123) | admin123 | YES | YES |
| APP_BASE_URL | Public-facing URL for email links | http://localhost:3000 | YES (prod) | |
| APP_OCR_BASE_URL | Internal URL of the OCR service | | YES | |
| APP_OCR_TRAINING_TOKEN | Secret token for OCR training endpoints | | YES (prod) | YES |
| IMPORT_HOST_DIR | Absolute host path holding the ODS spreadsheet + PDFs for the /admin/system mass-import card. Mounted read-only at /import inside the backend (compose-only — backend reads via app.import.dir). Compose refuses to start when unset, so staging and prod cannot accidentally share the source. Convention: /srv/familienarchiv-staging/import and /srv/familienarchiv-production/import | | YES (prod compose) | |
| MAIL_HOST | SMTP host | mailpit (dev) | YES (prod) | |
| MAIL_PORT | SMTP port | 1025 (dev) | YES (prod) | |
| MAIL_USERNAME | SMTP username | | YES (prod) | YES |
| MAIL_PASSWORD | SMTP password | | YES (prod) | YES |
| APP_MAIL_FROM | From address for outbound mail | noreply@familienarchiv.local | YES (prod) | |
| MAIL_SMTP_AUTH | SMTP auth enabled | false (dev) | YES (prod) | |
| MAIL_STARTTLS_ENABLE | STARTTLS enabled | false (dev) | YES (prod) | |
| SPRING_PROFILES_ACTIVE | Spring profile | dev,e2e (compose) | YES | |

PostgreSQL container

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| POSTGRES_USER | DB superuser | archive_user | YES | |
| POSTGRES_PASSWORD | DB password | change-me | YES | YES |
| POSTGRES_DB | Database name | family_archive_db | YES | |

MinIO container

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| MINIO_ROOT_USER | MinIO root username (dev compose only — prod compose hardcodes archiv) | minio_admin | YES (dev) | |
| MINIO_ROOT_PASSWORD / MINIO_PASSWORD | MinIO root password. Used only by the mc admin bootstrap in prod, never by the backend. | change-me | YES | YES |
| MINIO_APP_PASSWORD | Password for the archiv-app service account that the backend uses. Bucket-scoped via readwrite policy on familienarchiv. Bootstrapped by create-buckets. | | YES (prod) | YES |
| MINIO_DEFAULT_BUCKETS | Bucket name (dev compose only — prod compose hardcodes familienarchiv) | archive-documents | YES (dev) | |
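
For orientation, the service-account bootstrap performed by create-buckets plausibly boils down to an mc sequence like the one below. This is a sketch, not the committed helper: the alias name is invented, and the builtin readwrite policy stands in for whatever bucket-scoped policy the helper actually applies.

# Sketch only — see the create-buckets helper for the real sequence:
mc alias set local http://minio:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
mc mb --ignore-existing local/familienarchiv
mc admin user add local archiv-app "$MINIO_APP_PASSWORD"
mc admin policy attach local readwrite --user archiv-app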

OCR service

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| TRAINING_TOKEN | Guards /train and /segtrain endpoints (accepts file uploads) | | YES (prod) | YES |
| ALLOWED_PDF_HOSTS | SSRF protection — comma-separated list of allowed PDF source hosts. Do not widen to * | minio,localhost,127.0.0.1 | YES | |
| KRAKEN_MODEL_PATH | Directory containing Kraken HTR models (populated by download-kraken-models.sh) | /app/models/ | | |
| BLLA_MODEL_PATH | Kraken baseline layout analysis model path | /app/models/blla.mlmodel | | |
| OCR_MEM_LIMIT | Container memory cap for ocr-service in docker-compose.prod.yml. Set to 6g on CX32 hosts; leave unset on CX42+ to use the 12g default | 12g (prod compose default) | | |

3. Bootstrap from scratch

Production and staging deploy via Gitea Actions (release.yml on v* tag, nightly.yml on cron). The server itself only needs to host Caddy, Docker, and the runner — the workflows handle the rest.

3.1 Server one-time setup

# Base hardening
ufw default deny incoming && ufw allow 22/tcp && ufw allow 80/tcp && ufw allow 443/tcp && ufw enable
# /etc/ssh/sshd_config: PasswordAuthentication no, PermitRootLogin no

# Install Caddy 2 (https://caddyserver.com/docs/install#debian-ubuntu-raspbian)
apt install caddy

# Use the Caddyfile from the repo (replace path with the runner's clone target)
# CI DEPENDENCY: the nightly and release workflows run `systemctl reload caddy` to
# pick up committed Caddyfile changes. They find the file via this symlink — if it
# is absent or points elsewhere, the reload succeeds but serves stale config.
ln -sf /opt/familienarchiv/infra/caddy/Caddyfile /etc/caddy/Caddyfile
systemctl reload caddy
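
# Optional sanity check: confirm the symlinked Caddyfile parses before the
# first workflow-driven reload (caddy validate ships with Caddy 2):
caddy validate --config /etc/caddy/Caddyfile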

# fail2ban — protect /api/auth/login from credential stuffing.
# Jail watches the Caddy JSON access log for 401 responses on
# /api/auth/login. The jail (maxretry=10 / findtime=10m / bantime=30m)
# and filter are committed under infra/fail2ban/ — symlink them in:
apt install fail2ban
ln -sf /opt/familienarchiv/infra/fail2ban/jail.d/familienarchiv.conf \
       /etc/fail2ban/jail.d/familienarchiv.conf
ln -sf /opt/familienarchiv/infra/fail2ban/filter.d/familienarchiv-auth.conf \
       /etc/fail2ban/filter.d/familienarchiv-auth.conf
systemctl reload fail2ban
# Verify after first deploy with:
#   fail2ban-client status familienarchiv-auth
#   fail2ban-regex /var/log/caddy/access.log familienarchiv-auth

# Tailscale — used by the backup pipeline to reach heim-nas (follow-up issue)
curl -fsSL https://tailscale.com/install.sh | sh && tailscale up

# Self-hosted Gitea runner — register against the repo with a runner token.
# This runner is assumed single-tenant: the deploy workflows write .env.*
# files to disk during execution (cleaned up unconditionally on completion).
# A multi-tenant runner would need to switch to stdin-piped env files.
# (See https://docs.gitea.com/usage/actions/quickstart for the register step.)

3.2 DNS records

archiv.raddatz.cloud   A   <server IP>
staging.raddatz.cloud  A   <server IP>
git.raddatz.cloud      A   <server IP>
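
Caddy can only obtain certificates once these records resolve. A quick pre-deploy check:

# All three should print the server IP:
dig +short archiv.raddatz.cloud staging.raddatz.cloud git.raddatz.cloud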

3.3 Gitea secrets (Repo → Settings → Actions → Secrets)

| Secret | Used by | Notes |
|---|---|---|
| PROD_POSTGRES_PASSWORD | release.yml | strong unique password |
| PROD_MINIO_PASSWORD | release.yml | MinIO root password; used only at bootstrap |
| PROD_MINIO_APP_PASSWORD | release.yml | application service-account password |
| PROD_OCR_TRAINING_TOKEN | release.yml | python3 -c "import secrets; print(secrets.token_hex(32))" |
| PROD_APP_ADMIN_USERNAME | release.yml | e.g. admin@archiv.raddatz.cloud |
| PROD_APP_ADMIN_PASSWORD | release.yml | ⚠ locked permanently on first deploy — see §3.5 |
| STAGING_POSTGRES_PASSWORD | nightly.yml | different from prod |
| STAGING_MINIO_PASSWORD | nightly.yml | different from prod |
| STAGING_MINIO_APP_PASSWORD | nightly.yml | different from prod |
| STAGING_OCR_TRAINING_TOKEN | nightly.yml | different from prod |
| STAGING_APP_ADMIN_USERNAME | nightly.yml | e.g. admin@staging.raddatz.cloud |
| STAGING_APP_ADMIN_PASSWORD | nightly.yml | locked on first staging deploy |
| MAIL_HOST | release.yml | SMTP relay hostname (prod only) |
| MAIL_PORT | release.yml | typically 587 |
| MAIL_USERNAME | release.yml | SMTP user |
| MAIL_PASSWORD | release.yml | SMTP password |
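
For the "strong unique password" entries, any CSPRNG output will do; two interchangeable one-liners:

openssl rand -base64 32
python3 -c "import secrets; print(secrets.token_urlsafe(32))"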

3.4 First deploy

# 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
#    Expected: docker compose up -d --wait succeeds for archiv-staging, then
#    the workflow's "Smoke test deployed environment" step asserts:
#      - https://staging.raddatz.cloud/login returns 200
#      - HSTS header is present
#      - /actuator/health returns 404 (defense-in-depth check)
# 2. (Optional) Re-verify manually
curl -I https://staging.raddatz.cloud/
#    Expected: 200 (login page) with HSTS + X-Content-Type-Options headers
# 3. When staging looks healthy, push a v* tag to trigger release.yml
git tag v1.0.0 && git push origin v1.0.0
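
If the workflow output is ambiguous, the smoke-test assertions from step 1 can be replayed by hand. A rough equivalent of what the step checks:

# 200 on the login page, HSTS present, actuator blocked:
curl -s -o /dev/null -w '%{http_code}\n' https://staging.raddatz.cloud/login
curl -sI https://staging.raddatz.cloud/login | grep -i strict-transport-security
curl -s -o /dev/null -w '%{http_code}\n' https://staging.raddatz.cloud/actuator/health   # expect 404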

3.5 ⚠ Admin password is locked on first deploy

UserDataInitializer creates the admin user only if the email does not exist. The first successful deploy persists the admin password to the database. Changing PROD_APP_ADMIN_PASSWORD in Gitea secrets after that point has no effect — the secret is only consulted when the row is missing.

Before the first deploy: rotate PROD_APP_ADMIN_PASSWORD to a strong value. After the first deploy: change the admin password via the in-app account settings, not via the Gitea secret.
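
When unsure whether the bootstrap has already run, inspect the database directly. The table and column names below are guesses, not the real schema; verify them against the Flyway migrations first:

# Hypothetical schema — confirm names against the migrations before relying on this:
docker exec archive-db psql -U "$POSTGRES_USER" "$POSTGRES_DB" \
  -c "SELECT id, email, created_at FROM users LIMIT 5;"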


4. Logs + observability

First-response commands

# Stream backend logs (most useful first)
docker compose logs --follow --tail=100 backend

# Stream all services
docker compose logs --follow

# Single snapshot
docker compose logs --tail=200 <service>
# services: frontend, backend, db, minio, ocr-service
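
Caddy runs as a host service rather than a container, so its logs are not in docker compose logs. The access-log path below is the one the fail2ban jail watches (§3.1):

# Caddy service log + JSON access log:
journalctl -u caddy --since "1 hour ago"
tail -f /var/log/caddy/access.log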

Log locations

  • Backend application log: stdout (captured by Docker). Access inside the container at /app/logs/ via docker exec.
  • Spring Actuator health: http://localhost:8080/actuator/health (internal only in prod — port 8081 for Prometheus scraping)
  • Prometheus scraping: management port 8081, path /actuator/prometheus. Internal only; Caddy blocks /actuator/* externally.
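
To hit the management port from inside the Docker network, something like the following works, with caveats: the container name is an assumption, and the image may ship wget instead of curl:

# Health via the internal management port, bypassing Caddy:
docker exec archive-backend curl -s http://localhost:8081/actuator/health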

Future observability

Phase 7 of the Production v1 milestone adds Prometheus + Loki + Grafana. No monitoring infrastructure is in place yet.


5. Backup + recovery

Current state — no automated backup

No automated backup is configured. Manual procedure for a point-in-time backup:

# PostgreSQL dump
docker exec archive-db pg_dump -U ${POSTGRES_USER} ${POSTGRES_DB} > backup-$(date +%Y%m%d).sql

# MinIO data (bind-mounted in dev)
# Copy ./data/minio/ to external storage
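
In production MinIO uses a named volume, so the bind-mount copy above does not apply. One hedged alternative, assuming the MinIO port is published on loopback as described in §1, using the backend's service-account credentials:

# Mirror the bucket to local disk with mc:
mc alias set archiv-local http://127.0.0.1:9000 "$S3_ACCESS_KEY" "$S3_SECRET_KEY"
mc mirror archiv-local/familienarchiv ./minio-backup-$(date +%Y%m%d)/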

Restoration:

# Restore Postgres
docker exec -i archive-db psql -U ${POSTGRES_USER} ${POSTGRES_DB} < backup-YYYYMMDD.sql

Planned — phase 5 of Production v1 milestone

Automated backup (nightly pg_dump + MinIO mc mirror over Tailscale to heim-nas) is a follow-up issue. Until that ships: manual backups are the only recovery option.

Rollback

Each release tag corresponds to a Docker image tag on the host daemon (built via DooD, i.e. Docker-outside-of-Docker; no registry is involved). Rolling back to a previous tag is one command:

TAG=v1.0.0 docker compose \
  -f docker-compose.prod.yml \
  -p archiv-production \
  --env-file /opt/familienarchiv/.env.production \
  up -d --wait --remove-orphans

If the rollback target image is no longer present on the host (host disk pruned, etc.), re-trigger release.yml for that tag from Gitea Actions UI — it rebuilds and redeploys.

Flyway migrations are not auto-rolled-back. If a release contained a destructive migration (drop column, rename table), a tag rollback brings the schema back to a previous app version but the data shape has already changed. For breaking schema changes, prefer a forward-only fix.


6. Common operational tasks

Reset dev database (truncates data, keeps schema)

bash scripts/reset-db.sh

Truncates all data but does not drop the schema or re-run Flyway. Use for E2E test resets, not full reinstalls. ⚠️ Script hardcodes DB_USER=archive_user and DB_NAME=family_archive_db — if you customised these in .env, edit the script accordingly.

Rebuild frontend container (clears node_modules volume)

bash scripts/rebuild-frontend.sh

Assumes the Docker Compose volume is named familienarchiv_frontend_node_modules. If your project directory is not named familienarchiv, edit line 16 of the script.
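
To confirm the actual volume name before editing the script:

docker volume ls | grep node_modules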

Download Kraken OCR models

bash scripts/download-kraken-models.sh

Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.

Trigger a mass import (Excel/ODS)

Dev: drop the ODS spreadsheet + PDFs into ./import/ at the repo root — the dev compose bind-mounts it to /import automatically.

Staging/production:

  1. Pre-stage the payload on the host. Convention: /srv/familienarchiv-staging/import/ or /srv/familienarchiv-production/import/.
    rsync -avh --progress ./import/ user@host:/srv/familienarchiv-staging/import/
    
  2. Make sure IMPORT_HOST_DIR=<host-path> is set in .env.staging / .env.production (the nightly/release workflows already write this — see §3). Compose refuses to start without it.
  3. Redeploy the stack so the bind mount takes effect — or, if the mount is already in place, skip to step 4.
  4. Call POST /api/admin/trigger-import (requires ADMIN permission), or click the "Import starten" button on /admin/system.
  5. The import runs asynchronously — poll GET /api/admin/import-status, watch /admin/system, or tail the backend logs.
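
A CLI sketch of steps 4 and 5, assuming cookie-based session auth (how $SESSION_COOKIE is obtained depends on the login flow; the endpoints are the ones named above):

# Trigger the import, then poll the status endpoint:
curl -X POST -H "Cookie: $SESSION_COOKIE" https://archiv.raddatz.cloud/api/admin/trigger-import
curl -s -H "Cookie: $SESSION_COOKIE" https://archiv.raddatz.cloud/api/admin/import-status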

7. Known limitations

| Limitation | Reason | Reference |
|---|---|---|
| Single-node OCR service | The two required OCR engines (Surya + Kraken) exist only in the Python ecosystem; horizontal scaling would require a job queue not currently implemented | ADR-001 |
| No multi-tenancy | Designed as a single-family private archive; all authenticated users share the same document space | Deliberate scope decision (family-only product frame) |
| No multi-region | Single PostgreSQL + MinIO instance; no replication or failover | Deliberate scope decision |
| Max upload size | 50 MB per file (500 MB per request for multi-file) | Configurable in application.yaml (spring.servlet.multipart) |
| No automated backup | Phase 5 of Production v1 milestone is not yet implemented | See §5 above |
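
The upload limits above map to standard Spring Boot multipart properties. As a pointer (verify the exact keys in this project's application.yaml):

# application.yaml keys (values per the table above):
#   spring.servlet.multipart.max-file-size: 50MB
#   spring.servlet.multipart.max-request-size: 500MB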