# Familienarchiv — Deployment Reference
**If the app is down right now → jump to §4 Logs + observability.**

This doc is the Day-1 checklist and operational reference. It links to the canonical infrastructure docs in `docs/infrastructure/` rather than duplicating them.

Audience: operator bringing up a fresh instance, or Successor-X debugging a live incident.

Ownership: project owner. Update this file in any PR that changes the container topology, env vars, or backup procedure.
## Table of Contents
- Deployment topology
- Environment variables
- Bootstrap from scratch
- Logs + observability
- Backup + recovery
- Common operational tasks
- Known limitations
## 1. Deployment topology

```mermaid
graph TD
    Browser -->|HTTPS| Caddy["Caddy (TLS termination)"]
    Caddy -->|HTTP :3000| Frontend["Web Frontend\nSvelteKit Node adapter"]
    Caddy -->|HTTP :8080| Backend["API Backend\nSpring Boot / Jetty :8080"]
    Backend -->|JDBC :5432| DB[(PostgreSQL 16)]
    Backend -->|S3 API :9000| MinIO[(MinIO)]
    Backend -->|HTTP :8000 internal| OCR["OCR Service\nPython FastAPI"]
    OCR -->|presigned URL| MinIO
    Caddy -->|SSE proxy_pass| Backend
```
Key facts:

- Caddy terminates TLS and reverse-proxies to frontend (`:3000`) and backend (`:8080`). The Caddyfile is committed at `infra/caddy/Caddyfile` and is installed on the host as `/etc/caddy/Caddyfile` (symlink).
- The host binds all docker-published ports to `127.0.0.1` only; Caddy is the sole external entry point.
- The OCR service has no published port — reachable only on the internal Docker network from the backend.
- SSE notifications transit Caddy (browser → Caddy → backend); the backend is never reachable directly from the public internet. The SvelteKit SSR layer is bypassed for SSE, but Caddy is not.
- The Caddyfile responds `404` on `/actuator/*` (defense in depth). Internal monitoring scrapes the backend on the docker network, not through Caddy.
- Production and staging cohabit on the same host via docker compose project names: `archiv-production` (ports 8080/3000) and `archiv-staging` (ports 8081/3001).
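Two quick checks that this topology actually holds on a host (a sketch; `ss` is part of iproute2, and the domain is the production one from §3.2):

```bash
# Published container ports must be bound to 127.0.0.1, never 0.0.0.0.
ss -tlnp | grep -E ':(3000|3001|8080|8081)\b'

# /actuator/* must 404 through Caddy (defense in depth).
curl -s -o /dev/null -w '%{http_code}\n' https://archiv.raddatz.cloud/actuator/health
# Expected: 404
```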
### OCR memory requirements

The OCR service requires significant RAM for model loading. The dev compose sets `mem_limit: 12g`.
| Production target | RAM | Recommended OCR limit | Notes |
|---|---|---|---|
| Hetzner CX42 | 16 GB | 12 GB | Recommended for OCR-enabled production |
| Hetzner CX32 | 8 GB | 6 GB | Accept reduced batch sizes and slower throughput |
| Hetzner CX22 | 4 GB | — | Disable the OCR service (`profiles: [ocr]`); run OCR on demand only |
A CX32 cannot honour the default `mem_limit: 12g` — set the `OCR_MEM_LIMIT=6g` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow) before deploying on a CX32. The prod compose interpolates this var with a `12g` default.
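A minimal sketch of that change plus a pre-deploy sanity check, assuming the `.env.production` path used by the workflows (see §3 and §5):

```bash
# Pin the OCR memory cap on a CX32 host.
echo 'OCR_MEM_LIMIT=6g' >> /opt/familienarchiv/.env.production

# Sanity-check the rendered value before deploying. Depending on the compose
# version the key appears as mem_limit or deploy.resources.limits.memory.
docker compose -f docker-compose.prod.yml -p archiv-production \
  --env-file /opt/familienarchiv/.env.production config | grep -i -A1 'mem'
```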
### Dev vs production differences

| Concern | Dev (`docker-compose.yml`) | Prod (`docker-compose.prod.yml`) |
|---|---|---|
| MinIO image tag | `minio/minio:latest` | Pinned `minio/minio:RELEASE.…` |
| Data persistence | Bind mounts `./data/postgres`, `./data/minio` | Named Docker volumes (`postgres-data`, `minio-data`) |
| MinIO credentials for backend | Root user/password | Service account `archiv-app` with bucket-scoped rights |
| Bucket creation | `create-buckets` helper | Same helper, plus service-account bootstrap on every `up` |
| Spring profile | `dev,e2e` (Swagger + e2e overrides) | unset — base `application.yaml` is production-ready |
| Mail | Mailpit (local catcher) | Real SMTP (production) / Mailpit via `profiles: [staging]` (staging) |
| Frontend image | Dev server, `target: development`, port 5173 | Node adapter, `target: production`, port 3000 |
| Host port binding | All published | Bound to `127.0.0.1` only; Caddy is the front door |
| Deploy method | `docker compose up -d` (manual) | Gitea Actions: `nightly.yml` (staging, cron) and `release.yml` (production, on `v*` tag) — both use `up -d --wait` |

Full prod compose: `docker-compose.prod.yml`. Workflow files: `.gitea/workflows/nightly.yml`, `.gitea/workflows/release.yml`.
## 2. Environment variables

All vars are set in `.env` at the repo root (copy from `.env.example`). The backend resolves them via `application.yaml`; the Docker Compose file wires them into each container.

Any var found in `docker-compose.yml` or `application*.yaml` that is not in this table is a blocking review comment on any PR that changes those files.
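A throwaway check for that review rule — list every `${VAR}` interpolation in the compose files so it can be eyeballed against the tables below (a sketch; assumes GNU grep):

```bash
# Print the sorted, unique variable names referenced by the compose files.
grep -hoE '\$\{[A-Z0-9_]+' docker-compose*.yml | tr -d '${' | sort -u
```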
### Backend

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `SPRING_DATASOURCE_URL` | PostgreSQL JDBC URL | — | YES | — |
| `SPRING_DATASOURCE_USERNAME` | DB username | — | YES | — |
| `SPRING_DATASOURCE_PASSWORD` | DB password | — | YES | YES |
| `S3_ENDPOINT` | MinIO / OBS endpoint URL | — | YES | — |
| `S3_ACCESS_KEY` | MinIO access key (use service account, not root, in prod) | — | YES | YES |
| `S3_SECRET_KEY` | MinIO secret key | — | YES | YES |
| `S3_BUCKET_NAME` | Target bucket name | — | YES | — |
| `S3_REGION` | S3 region string | `us-east-1` | YES | — |
| `APP_ADMIN_USERNAME` | Bootstrap admin username (⚠ not in `.env.example`) | `admin` | YES | — |
| `APP_ADMIN_PASSWORD` | Bootstrap admin password (⚠ ships as `admin123`) | `admin123` | YES | YES |
| `APP_BASE_URL` | Public-facing URL for email links | `http://localhost:3000` | YES (prod) | — |
| `APP_OCR_BASE_URL` | Internal URL of the OCR service | — | YES | — |
| `APP_OCR_TRAINING_TOKEN` | Secret token for OCR training endpoints | — | YES (prod) | YES |
| `MAIL_HOST` | SMTP host | `mailpit` (dev) | YES (prod) | — |
| `MAIL_PORT` | SMTP port | `1025` (dev) | YES (prod) | — |
| `MAIL_USERNAME` | SMTP username | — | YES (prod) | YES |
| `MAIL_PASSWORD` | SMTP password | — | YES (prod) | YES |
| `APP_MAIL_FROM` | From address for outbound mail | `noreply@familienarchiv.local` | YES (prod) | — |
| `MAIL_SMTP_AUTH` | SMTP auth enabled | `false` (dev) | YES (prod) | — |
| `MAIL_STARTTLS_ENABLE` | STARTTLS enabled | `false` (dev) | YES (prod) | — |
| `SPRING_PROFILES_ACTIVE` | Spring profile | `dev,e2e` (compose) | YES | — |
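For orientation, a sketch of the always-required backend block in `.env`. The hostnames are the compose service names listed in §4 (`db`, `minio`, `ocr-service`); every value is a placeholder to be replaced:

```bash
# Excerpt only — see .env.example for the full list. All values are placeholders.
SPRING_DATASOURCE_URL=jdbc:postgresql://db:5432/family_archive_db
SPRING_DATASOURCE_USERNAME=archive_user
SPRING_DATASOURCE_PASSWORD=change-me
S3_ENDPOINT=http://minio:9000
S3_ACCESS_KEY=archiv-app
S3_SECRET_KEY=change-me
S3_BUCKET_NAME=familienarchiv
S3_REGION=us-east-1
APP_OCR_BASE_URL=http://ocr-service:8000
```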
### PostgreSQL container

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `POSTGRES_USER` | DB superuser | `archive_user` | YES | — |
| `POSTGRES_PASSWORD` | DB password | `change-me` | YES | YES |
| `POSTGRES_DB` | Database name | `family_archive_db` | YES | — |
### MinIO container

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `MINIO_ROOT_USER` | MinIO root username (dev compose only — prod compose hardcodes `archiv`) | `minio_admin` | YES (dev) | — |
| `MINIO_ROOT_PASSWORD` / `MINIO_PASSWORD` | MinIO root password. Used only by the `mc admin` bootstrap in prod, never by the backend. | `change-me` | YES | YES |
| `MINIO_APP_PASSWORD` | Password for the `archiv-app` service account that the backend uses. Bucket-scoped via `readwrite` policy on `familienarchiv`. Bootstrapped by `create-buckets`. | — | YES (prod) | YES |
| `MINIO_DEFAULT_BUCKETS` | Bucket name (dev compose only — prod compose hardcodes `familienarchiv`) | `archive-documents` | YES (dev) | — |
### OCR service

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `TRAINING_TOKEN` | Guards the `/train` and `/segtrain` endpoints (they accept file uploads) | — | YES (prod) | YES |
| `ALLOWED_PDF_HOSTS` | SSRF protection — comma-separated list of allowed PDF source hosts. Do not widen to `*` | `minio,localhost,127.0.0.1` | YES | — |
| `KRAKEN_MODEL_PATH` | Directory containing Kraken HTR models (populated by `download-kraken-models.sh`) | `/app/models/` | — | — |
| `BLLA_MODEL_PATH` | Kraken baseline layout analysis model path | `/app/models/blla.mlmodel` | — | — |
| `OCR_MEM_LIMIT` | Container memory cap for `ocr-service` in `docker-compose.prod.yml`. Set to `6g` on CX32 hosts; leave unset on CX42+ to use the `12g` default | `12g` (prod compose default) | — | — |
## 3. Bootstrap from scratch

Production and staging deploy via Gitea Actions (`release.yml` on a `v*` tag, `nightly.yml` on cron). The server itself only needs to host Caddy, Docker, and the runner — the workflows handle the rest.
### 3.1 Server one-time setup

```bash
# Base hardening
ufw default deny incoming && ufw allow 22/tcp && ufw allow 80/tcp && ufw allow 443/tcp && ufw enable
# /etc/ssh/sshd_config: PasswordAuthentication no, PermitRootLogin no

# Install Caddy 2 (https://caddyserver.com/docs/install#debian-ubuntu-raspbian)
apt install caddy

# Use the Caddyfile from the repo (replace path with the runner's clone target).
# CI DEPENDENCY: the nightly and release workflows run `systemctl reload caddy` to
# pick up committed Caddyfile changes. They find the file via this symlink — if it
# is absent or points elsewhere, the reload succeeds but serves stale config.
ln -sf /opt/familienarchiv/infra/caddy/Caddyfile /etc/caddy/Caddyfile
systemctl reload caddy

# fail2ban — protect /api/auth/login from credential stuffing.
# The jail watches the Caddy JSON access log for 401 responses on
# /api/auth/login. The jail (maxretry=10 / findtime=10m / bantime=30m)
# and filter are committed under infra/fail2ban/ — symlink them in:
apt install fail2ban
ln -sf /opt/familienarchiv/infra/fail2ban/jail.d/familienarchiv.conf \
  /etc/fail2ban/jail.d/familienarchiv.conf
ln -sf /opt/familienarchiv/infra/fail2ban/filter.d/familienarchiv-auth.conf \
  /etc/fail2ban/filter.d/familienarchiv-auth.conf
systemctl reload fail2ban
# Verify after first deploy with:
#   fail2ban-client status familienarchiv-auth
#   fail2ban-regex /var/log/caddy/access.log familienarchiv-auth

# Tailscale — used by the backup pipeline to reach heim-nas (follow-up issue)
curl -fsSL https://tailscale.com/install.sh | sh && tailscale up

# Self-hosted Gitea runner — register against the repo with a runner token.
# This runner is assumed single-tenant: the deploy workflows write .env.*
# files to disk during execution (cleaned up unconditionally on completion).
# A multi-tenant runner would need to switch to stdin-piped env files.
# (See https://docs.gitea.com/usage/actions/quickstart for the register step.)
```
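A sketch of that register step with `act_runner`, per the quickstart linked above. The instance URL, runner name, and label are example values:

```bash
# Example values only — see the Gitea quickstart for the authoritative steps.
./act_runner register --no-interactive \
  --instance https://git.raddatz.cloud \
  --token "$RUNNER_TOKEN" \
  --name archiv-deploy-runner \
  --labels ubuntu-latest
./act_runner daemon   # or install as a systemd unit
```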
### 3.2 DNS records

```text
archiv.raddatz.cloud    A  <server IP>
staging.raddatz.cloud   A  <server IP>
git.raddatz.cloud       A  <server IP>
```
### 3.3 Gitea secrets (Repo → Settings → Actions → Secrets)

| Secret | Used by | Notes |
|---|---|---|
| `PROD_POSTGRES_PASSWORD` | `release.yml` | strong unique password |
| `PROD_MINIO_PASSWORD` | `release.yml` | MinIO root password; used only at bootstrap |
| `PROD_MINIO_APP_PASSWORD` | `release.yml` | application service-account password |
| `PROD_OCR_TRAINING_TOKEN` | `release.yml` | `python3 -c "import secrets; print(secrets.token_hex(32))"` |
| `PROD_APP_ADMIN_USERNAME` | `release.yml` | e.g. `admin@archiv.raddatz.cloud` |
| `PROD_APP_ADMIN_PASSWORD` | `release.yml` | ⚠ locked permanently on first deploy — see §3.5 |
| `STAGING_POSTGRES_PASSWORD` | `nightly.yml` | different from prod |
| `STAGING_MINIO_PASSWORD` | `nightly.yml` | different from prod |
| `STAGING_MINIO_APP_PASSWORD` | `nightly.yml` | different from prod |
| `STAGING_OCR_TRAINING_TOKEN` | `nightly.yml` | different from prod |
| `STAGING_APP_ADMIN_USERNAME` | `nightly.yml` | e.g. `admin@staging.raddatz.cloud` |
| `STAGING_APP_ADMIN_PASSWORD` | `nightly.yml` | locked on first staging deploy |
| `MAIL_HOST` | `release.yml` | SMTP relay hostname (prod only) |
| `MAIL_PORT` | `release.yml` | typically `587` |
| `MAIL_USERNAME` | `release.yml` | SMTP user |
| `MAIL_PASSWORD` | `release.yml` | SMTP password |
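A quick way to mint the secret values above before pasting them into Gitea (a sketch; any strong generator works):

```bash
# One strong value per secret — never reuse between prod and staging.
openssl rand -base64 32                                    # passwords
python3 -c "import secrets; print(secrets.token_hex(32))"  # OCR training tokens
```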
### 3.4 First deploy

```bash
# 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow").
#    Expected: docker compose up -d --wait succeeds for archiv-staging, then
#    the workflow's "Smoke test deployed environment" step asserts:
#      - https://staging.raddatz.cloud/login returns 200
#      - HSTS header is present
#      - /actuator/health returns 404 (defense-in-depth check)

# 2. (Optional) Re-verify manually
curl -I https://staging.raddatz.cloud/
# Expected: 200 (login page) with HSTS + X-Content-Type-Options headers

# 3. When staging looks healthy, push a v* tag to trigger release.yml
git tag v1.0.0 && git push origin v1.0.0
```
### 3.5 ⚠ Admin password is locked on first deploy

`UserDataInitializer` creates the admin user only if the email does not already exist. The first successful deploy persists the admin password to the database. Changing `PROD_APP_ADMIN_PASSWORD` in Gitea secrets after that point has no effect — the secret is only consulted when the row is missing.

Before the first deploy: rotate `PROD_APP_ADMIN_PASSWORD` to a strong value. After the first deploy: change the admin password via the in-app account settings, not via the Gitea secret.
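To check whether the admin row already exists (and therefore whether the secret is still consulted), a query along these lines works. The table and column names here are guesses; confirm them against the Flyway migrations first:

```bash
# Hypothetical table/column names — verify against the Flyway migrations.
docker exec archive-db psql -U "$POSTGRES_USER" "$POSTGRES_DB" \
  -c "SELECT email, created_at FROM users WHERE email = 'admin@archiv.raddatz.cloud';"
```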
## 4. Logs + observability

### First-response commands

```bash
# Stream backend logs (most useful first)
docker compose logs --follow --tail=100 backend

# Stream all services
docker compose logs --follow

# Single snapshot
docker compose logs --tail=200 <service>
# services: frontend, backend, db, minio, ocr-service
```
### Log locations

- Backend application log: stdout (captured by Docker). Access inside the container at `/app/logs/` via `docker exec`.
- Spring Actuator health: `http://localhost:8080/actuator/health` (internal only in prod — port 8081 for Prometheus scraping).
- Prometheus scraping: management port 8081, path `/actuator/prometheus`. Internal only; Caddy blocks `/actuator/*` externally.
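To hit the health endpoint from inside the compose network (a sketch; assumes `curl` exists in the backend image, otherwise substitute `wget -qO-`):

```bash
# Query the documented health URL from inside the backend container.
docker compose exec backend curl -fsS http://localhost:8080/actuator/health
# Expected: {"status":"UP",...}
```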
### Future observability

Phase 7 of the Production v1 milestone adds Prometheus + Loki + Grafana. No monitoring infrastructure is in place yet.
## 5. Backup + recovery

### Current state — no automated backup

No automated backup is configured. Manual procedure for a point-in-time backup:

```bash
# PostgreSQL dump
docker exec archive-db pg_dump -U ${POSTGRES_USER} ${POSTGRES_DB} > backup-$(date +%Y%m%d).sql

# MinIO data (bind-mounted in dev)
# Copy ./data/minio/ to external storage
```
Restoration:

```bash
# Restore Postgres
docker exec -i archive-db psql -U ${POSTGRES_USER} ${POSTGRES_DB} < backup-YYYYMMDD.sql
```
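In prod, MinIO data lives in a named volume rather than a bind mount (see the table in §1), so the copy step differs. A sketch, assuming the compose project name prefixes the volume as `archiv-production_minio-data` (confirm with `docker volume ls`):

```bash
# Tar the prod MinIO volume (read-only mount) into the current directory.
docker run --rm \
  -v archiv-production_minio-data:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/minio-$(date +%Y%m%d).tgz -C /data .
```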
### Planned — phase 5 of Production v1 milestone

Automated backup (nightly `pg_dump` + MinIO `mc mirror` over Tailscale to heim-nas) is a follow-up issue. Until that ships: manual backups are the only recovery option.
### Rollback

Each release tag corresponds to a docker image tag on the host daemon (built via DooD, Docker-outside-of-Docker; no registry). Rolling back to a previous tag is one command:

```bash
TAG=v1.0.0 docker compose \
  -f docker-compose.prod.yml \
  -p archiv-production \
  --env-file /opt/familienarchiv/.env.production \
  up -d --wait --remove-orphans
```

If the rollback target image is no longer present on the host (host disk pruned, etc.), re-trigger `release.yml` for that tag from the Gitea Actions UI — it rebuilds and redeploys.
Flyway migrations are not auto-rolled-back. If a release contained a destructive migration (drop column, rename table), a tag rollback reverts the app to a previous version, but the data shape has already changed underneath it. For breaking schema changes, prefer a forward-only fix.
## 6. Common operational tasks

### Reset dev database (truncates data, keeps schema)

```bash
bash scripts/reset-db.sh
```

Truncates all data but does not drop the schema or re-run Flyway. Use for E2E test resets, not full reinstalls. ⚠️ The script hardcodes `DB_USER=archive_user` and `DB_NAME=family_archive_db` — if you customised these in `.env`, edit the script accordingly.

### Rebuild frontend container (clears node_modules volume)

```bash
bash scripts/rebuild-frontend.sh
```

Assumes the Docker Compose volume is named `familienarchiv_frontend_node_modules`. If your project directory is not named `familienarchiv`, edit line 16 of the script.

### Download Kraken OCR models

```bash
bash scripts/download-kraken-models.sh
```

Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.

### Trigger a mass import (Excel/ODS)

1. Place the import file in the `import/` bind mount on the backend container.
2. Call `POST /api/admin/trigger-import` (requires `ADMIN` permission) — see the curl sketch below.
3. The import runs asynchronously — poll `GET /api/admin/import-status` or watch backend logs.
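A sketch of steps 2 and 3 from the list above. The cookie header is a placeholder, since how the session is carried (cookie vs. bearer token) depends on your client:

```bash
# Placeholder auth — substitute your real session cookie or token.
curl -X POST https://archiv.raddatz.cloud/api/admin/trigger-import \
  -H "Cookie: $ADMIN_SESSION_COOKIE"

# Poll until the import reports completion.
curl -s https://archiv.raddatz.cloud/api/admin/import-status \
  -H "Cookie: $ADMIN_SESSION_COOKIE"
```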
## 7. Known limitations
| Limitation | Reason | Reference |
|---|---|---|
| Single-node OCR service | The two required OCR engines (Surya + Kraken) exist only in the Python ecosystem; horizontal scaling would require a job queue not currently implemented | ADR-001 |
| No multi-tenancy | Designed as a single-family private archive; all authenticated users share the same document space | Deliberate scope decision (family-only product frame) |
| No multi-region | Single PostgreSQL + MinIO instance; no replication or failover | Deliberate scope decision |
| Max upload size | 50 MB per file (500 MB per request for multi-file) | Configurable in application.yaml (spring.servlet.multipart) |
| No automated backup | Phase 5 of Production v1 milestone is not yet implemented | See §5 above |