
Familienarchiv — Deployment Reference

If the app is down right now → jump to §4 Logs.

This doc is the Day-1 checklist and operational reference. It links to the canonical infrastructure docs in docs/infrastructure/ rather than duplicating them.

Audience: operator bringing up a fresh instance, or Successor-X debugging a live incident.

Ownership: project owner. Update this file in any PR that changes the container topology, env vars, or backup procedure.

Table of Contents

  1. Deployment topology
  2. Environment variables
  3. Bootstrap from scratch
  4. Logs + observability
  5. Backup + recovery
  6. Common operational tasks
  7. Known limitations

1. Deployment topology

```mermaid
graph TD
    Browser -->|HTTPS| Caddy["Caddy (TLS termination)"]
    Caddy -->|HTTP :3000| Frontend["Web Frontend\nSvelteKit Node adapter"]
    Caddy -->|HTTP :8080| Backend["API Backend\nSpring Boot / Jetty :8080"]
    Backend -->|JDBC :5432| DB[(PostgreSQL 16)]
    Backend -->|S3 API :9000| MinIO[(MinIO)]
    Backend -->|HTTP :8000 internal| OCR["OCR Service\nPython FastAPI"]
    OCR -->|presigned URL| MinIO
    Caddy -->|SSE proxy_pass| Backend
```

Key facts:

  • Caddy terminates TLS and reverse-proxies to frontend (:3000) and backend (:8080). The Caddyfile is committed at infra/caddy/Caddyfile and is installed on the host as /etc/caddy/Caddyfile (symlink).
  • The host binds all docker-published ports to 127.0.0.1 only; Caddy is the sole external entry point.
  • The OCR service has no published port — reachable only on the internal Docker network from the backend.
  • SSE notifications transit Caddy (browser → Caddy → backend); the backend is never reachable directly from the public internet. The SvelteKit SSR layer is bypassed for SSE, but Caddy is not.
  • The Caddyfile responds 404 on /actuator/* (defense in depth; see the Caddyfile sketch after this list). Internal monitoring scrapes the backend on the docker network, not through Caddy.
  • Production and staging cohabit on the same host via docker compose project names: archiv-production (ports 8080/3000) and archiv-staging (ports 8081/3001).
  • An optional observability stack (Prometheus, Node Exporter, cAdvisor, Loki, Tempo, Grafana, GlitchTip) runs as a separate compose file. Configuration lives under infra/observability/. In production and CI, the stack is managed from /opt/familienarchiv/ (CI copies it there on every nightly run) so bind mounts survive workspace wipes — see §4 for the ops procedure.
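For orientation, a Caddyfile with the properties listed above might take roughly the following shape. This is an illustrative sketch only (the /api path split and handler layout are assumptions); the committed infra/caddy/Caddyfile is authoritative.

archiv.raddatz.cloud {
    # Defense in depth: actuator endpoints are never served externally
    @actuator path /actuator/*
    respond @actuator 404

    # API and SSE traffic goes straight to the backend on loopback
    handle /api/* {
        reverse_proxy 127.0.0.1:8080
    }

    # Everything else is served by the SvelteKit frontend
    handle {
        reverse_proxy 127.0.0.1:3000
    }
}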

OCR memory requirements

The OCR service requires significant RAM for model loading. The dev compose sets mem_limit: 12g.

| Production target | RAM | Recommended OCR limit | Notes |
|---|---|---|---|
| Hetzner CX42 | 16 GB | 12 GB | Recommended for OCR-enabled production |
| Hetzner CX32 | 8 GB | 6 GB | Accept reduced batch sizes and slower throughput |
| Hetzner CX22 | 4 GB | n/a | Disable the OCR service (profiles: [ocr]); run OCR on demand only |

A CX32 cannot honour the default mem_limit: 12g — set the OCR_MEM_LIMIT=6g env var (in .env.production / .env.staging, or as a Gitea secret consumed by the workflow) before deploying on a CX32. The prod compose interpolates this var with a 12g default.
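The interpolation looks roughly like this in docker-compose.prod.yml (illustrative fragment; the committed compose file is authoritative):

services:
  ocr-service:
    mem_limit: ${OCR_MEM_LIMIT:-12g}   # resolves to 12g unless OCR_MEM_LIMIT is set (e.g. 6g on a CX32)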

Dev vs production differences

| Concern | Dev (docker-compose.yml) | Prod (docker-compose.prod.yml) |
|---|---|---|
| MinIO image tag | minio/minio:latest | Pinned minio/minio:RELEASE.… |
| Data persistence | Bind mounts ./data/postgres, ./data/minio | Named Docker volumes (postgres-data, minio-data) |
| MinIO credentials for backend | Root user/password | Service account archiv-app with bucket-scoped rights |
| Bucket creation | create-buckets helper | Same helper, plus service-account bootstrap on every up |
| Spring profile | dev,e2e (Swagger + e2e overrides) | unset — base application.yaml is production-ready |
| Mail | Mailpit (local catcher) | Real SMTP (production) / Mailpit via profiles: [staging] (staging) |
| Frontend image | Dev server, target: development, port 5173 | Node adapter, target: production, port 3000 |
| Host port binding | All published | Bound to 127.0.0.1 only; Caddy is the front door |
| Deploy method | docker compose up -d (manual) | Gitea Actions: nightly.yml (staging, cron) and release.yml (production, on v* tag) — both use up -d --wait |

Full prod compose: docker-compose.prod.yml. Workflow files: .gitea/workflows/nightly.yml, .gitea/workflows/release.yml.


2. Environment variables

All vars are set in .env at the repo root (copy from .env.example). The backend resolves them via application.yaml (see the sketch after the backend table below); the Docker Compose file wires them into each container.

Any var found in docker-compose.yml or application*.yaml that is not in this table is a blocking review comment on any PR that changes those files.

Backend

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| SPRING_DATASOURCE_URL | PostgreSQL JDBC URL | | YES | |
| SPRING_DATASOURCE_USERNAME | DB username | | YES | |
| SPRING_DATASOURCE_PASSWORD | DB password | | YES | YES |
| S3_ENDPOINT | MinIO / OBS endpoint URL | | YES | |
| S3_ACCESS_KEY | MinIO access key (use service account, not root, in prod) | | YES | YES |
| S3_SECRET_KEY | MinIO secret key | | YES | YES |
| S3_BUCKET_NAME | Target bucket name | | YES | |
| S3_REGION | S3 region string | us-east-1 | YES | |
| APP_ADMIN_USERNAME | Bootstrap admin username (⚠ not in .env.example) | admin | YES | |
| APP_ADMIN_PASSWORD | Bootstrap admin password (⚠ ships as admin123) | admin123 | YES | YES |
| APP_BASE_URL | Public-facing URL for email links | http://localhost:3000 | YES (prod) | |
| APP_OCR_BASE_URL | Internal URL of the OCR service | | YES | |
| APP_OCR_TRAINING_TOKEN | Secret token for OCR training endpoints | | YES (prod) | YES |
| IMPORT_HOST_DIR | Absolute host path holding the ODS spreadsheet + PDFs for the /admin/system mass-import card. Mounted read-only at /import inside the backend (compose-only — backend reads via app.import.dir). Compose refuses to start when unset, so staging and prod cannot accidentally share the source. Convention: /srv/familienarchiv-staging/import and /srv/familienarchiv-production/import | | YES (prod compose) | |
| MAIL_HOST | SMTP host | mailpit (dev) | YES (prod) | |
| MAIL_PORT | SMTP port | 1025 (dev) | YES (prod) | |
| MAIL_USERNAME | SMTP username | | YES (prod) | YES |
| MAIL_PASSWORD | SMTP password | | YES (prod) | YES |
| APP_MAIL_FROM | From address for outbound mail | noreply@familienarchiv.local | YES (prod) | |
| MAIL_SMTP_AUTH | SMTP auth enabled | false (dev) | YES (prod) | |
| MAIL_STARTTLS_ENABLE | STARTTLS enabled | false (dev) | YES (prod) | |
| SPRING_PROFILES_ACTIVE | Spring profile | dev,e2e (compose) | YES | |
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP gRPC endpoint for distributed traces (Tempo). Set to http://tempo:4317 via compose. | http://localhost:4317 | | |
| MANAGEMENT_TRACING_SAMPLING_PROBABILITY | Micrometer tracing sample rate; overridden to 0.0 in test profile. | 0.1 (compose) / 1.0 (dev) | | |
| SENTRY_DSN | GlitchTip / Sentry DSN for backend error reporting. Leave empty to disable the SDK. Set after GlitchTip first-run (§4). | | | YES |
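Two mechanisms resolve these. Spring Boot's relaxed binding maps names like SPRING_DATASOURCE_URL onto spring.datasource.url automatically; the app-specific vars are referenced as placeholders in application.yaml. A minimal sketch of the placeholder pattern (the exact property paths besides app.import.dir are assumptions):

app:
  base-url: ${APP_BASE_URL:http://localhost:3000}   # default mirrors the table above
  ocr:
    base-url: ${APP_OCR_BASE_URL}
  import:
    dir: /import   # fixed container path; IMPORT_HOST_DIR picks the host side of the mount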

PostgreSQL container

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| POSTGRES_USER | DB superuser | archive_user | YES | |
| POSTGRES_PASSWORD | DB password | change-me | YES | YES |
| POSTGRES_DB | Database name | family_archive_db | YES | |

MinIO container

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| MINIO_ROOT_USER | MinIO root username (dev compose only — prod compose hardcodes archiv) | minio_admin | YES (dev) | |
| MINIO_ROOT_PASSWORD / MINIO_PASSWORD | MinIO root password. Used only by the mc admin bootstrap in prod, never by the backend. | change-me | YES | YES |
| MINIO_APP_PASSWORD | Password for the archiv-app service account that the backend uses. Bucket-scoped via readwrite policy on familienarchiv. Bootstrapped by create-buckets. | | YES (prod) | YES |
| MINIO_DEFAULT_BUCKETS | Bucket name (dev compose only — prod compose hardcodes familienarchiv) | archive-documents | YES (dev) | |
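The create-buckets prod bootstrap follows the usual mc pattern. Roughly (an illustrative sketch; the committed helper script is authoritative, and it scopes the account's rights to the familienarchiv bucket rather than granting server-wide readwrite):

# Register the MinIO endpoint under an alias, using the hardcoded root user
mc alias set local http://minio:9000 archiv "$MINIO_ROOT_PASSWORD"
# Create the bucket if missing, then the service account the backend uses
mc mb --ignore-existing local/familienarchiv
mc admin user add local archiv-app "$MINIO_APP_PASSWORD"
mc admin policy attach local readwrite --user archiv-app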

OCR service

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| TRAINING_TOKEN | Guards /train and /segtrain endpoints (accepts file uploads) | | YES (prod) | YES |
| ALLOWED_PDF_HOSTS | SSRF protection — comma-separated list of allowed PDF source hosts. Do not widen to * | minio,localhost,127.0.0.1 | YES | |
| KRAKEN_MODEL_PATH | Directory containing Kraken HTR models (populated by download-kraken-models.sh) | /app/models/ | | |
| BLLA_MODEL_PATH | Kraken baseline layout analysis model path | /app/models/blla.mlmodel | | |
| OCR_MEM_LIMIT | Container memory cap for ocr-service in docker-compose.prod.yml. Set to 6g on CX32 hosts; leave unset on CX42+ to use the 12g default | 12g (prod compose default) | | |

Observability stack (docker-compose.observability.yml)

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| PORT_PROMETHEUS | Host port for the Prometheus UI (bound to 127.0.0.1 only) | 9090 | | |
| PORT_GRAFANA | Host port for the Grafana UI (bound to 127.0.0.1 only) | 3003 | | |
| POSTGRES_HOST | PostgreSQL hostname for GlitchTip's db-init job and workers. Override when only the staging stack is running and archive-db is not resolvable by that name. | archive-db | | |
| GRAFANA_ADMIN_PASSWORD | Grafana admin user password | changeme | YES (prod) | YES |
| PORT_GLITCHTIP | Host port for the GlitchTip UI (bound to 127.0.0.1 only) | 3002 | | |
| GLITCHTIP_DOMAIN | Public-facing base URL for GlitchTip (used in email links and CORS) | http://localhost:3002 | YES (prod) | |
| GLITCHTIP_SECRET_KEY | Django secret key for GlitchTip — generate with python3 -c "import secrets; print(secrets.token_hex(32))" | | YES | YES |

3. Bootstrap from scratch

Production and staging deploy via Gitea Actions (release.yml on v* tag, nightly.yml on cron). The server itself only needs to host Caddy, Docker, and the runner — the workflows handle the rest.

3.1 Server one-time setup

# Base hardening
ufw default deny incoming && ufw allow 22/tcp && ufw allow 80/tcp && ufw allow 443/tcp && ufw enable
# /etc/ssh/sshd_config: PasswordAuthentication no, PermitRootLogin no

# Install Caddy 2 (https://caddyserver.com/docs/install#debian-ubuntu-raspbian)
apt install caddy

# Use the Caddyfile from the repo (replace path with the runner's clone target)
# CI DEPENDENCY: the nightly and release workflows run `systemctl reload caddy` to
# pick up committed Caddyfile changes. They find the file via this symlink — if it
# is absent or points elsewhere, the reload succeeds but serves stale config.
ln -sf /opt/familienarchiv/infra/caddy/Caddyfile /etc/caddy/Caddyfile
systemctl reload caddy

# fail2ban — protect /api/auth/login from credential stuffing.
# Jail watches the Caddy JSON access log for 401 responses on
# /api/auth/login. The jail (maxretry=10 / findtime=10m / bantime=30m)
# and filter are committed under infra/fail2ban/ — symlink them in:
apt install fail2ban
ln -sf /opt/familienarchiv/infra/fail2ban/jail.d/familienarchiv.conf \
       /etc/fail2ban/jail.d/familienarchiv.conf
ln -sf /opt/familienarchiv/infra/fail2ban/filter.d/familienarchiv-auth.conf \
       /etc/fail2ban/filter.d/familienarchiv-auth.conf
systemctl reload fail2ban
# Verify after first deploy with:
#   fail2ban-client status familienarchiv-auth
#   fail2ban-regex /var/log/caddy/access.log familienarchiv-auth
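# For orientation, the committed jail has roughly this shape (illustrative
# sketch only; infra/fail2ban/jail.d/familienarchiv.conf is authoritative):
#   [familienarchiv-auth]
#   enabled   = true
#   port      = http,https
#   filter    = familienarchiv-auth
#   logpath   = /var/log/caddy/access.log
#   maxretry  = 10
#   findtime  = 10m
#   bantime   = 30m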

# Tailscale — used by the backup pipeline to reach heim-nas (follow-up issue)
curl -fsSL https://tailscale.com/install.sh | sh && tailscale up

# Self-hosted Gitea runner — register against the repo with a runner token.
# This runner is assumed single-tenant: the deploy workflows write .env.*
# files to disk during execution (cleaned up unconditionally on completion).
# A multi-tenant runner would need to switch to stdin-piped env files.
# (See https://docs.gitea.com/usage/actions/quickstart for the register step.)

# Runner workspace directory — required for DooD bind-mount resolution (ADR-015).
# act_runner stores job workspaces here so that docker compose bind mounts resolve
# to real host paths. The path must be identical on the host and inside job containers.
mkdir -p /srv/gitea-workspace
# Observability config permanent directory — the nightly CI job copies
# docker-compose.observability.yml and infra/observability/ here on every run.
# The obs stack is always started from this path, not from the workspace.
# See ADR-016 for why this directory is used instead of a server-pull approach.
mkdir -p /opt/familienarchiv/infra
# Both paths must also appear in the runner service volumes in ~/docker/gitea/compose.yaml:
#   volumes:
#     - /srv/gitea-workspace:/srv/gitea-workspace
# /opt/familienarchiv does NOT need to be in the runner container's volumes — job
# containers are spawned by the host daemon directly (DooD), so the host path is
# accessible to them as long as runner-config.yaml lists it in valid_volumes + options.
# See runner-config.yaml (workdir_parent + valid_volumes + options) and ADR-015/016.

# ⚠ IMPORTANT: after any change to runner-config.yaml (valid_volumes, options, workdir_parent),
# restart the Gitea Act runner for the new config to take effect:
#   docker restart gitea-runner
# Until restarted, job containers are spawned with the old config and any new bind mounts
# (e.g. /opt/familienarchiv) will not be available inside job steps.
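For reference, the relevant runner-config.yaml fragment has roughly this shape (an illustrative sketch under the assumptions above; the committed runner-config.yaml is authoritative):

container:
  workdir_parent: /srv/gitea-workspace      # job workspaces resolve to real host paths (ADR-015)
  valid_volumes:
    - /srv/gitea-workspace
    - /opt/familienarchiv
  options: "-v /opt/familienarchiv:/opt/familienarchiv"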

3.2 DNS records

archiv.raddatz.cloud   A   <server IP>
staging.raddatz.cloud  A   <server IP>
git.raddatz.cloud      A   <server IP>

3.3 Gitea secrets (Repo → Settings → Actions → Secrets)

| Secret | Used by | Notes |
|---|---|---|
| PROD_POSTGRES_PASSWORD | release.yml | strong unique password |
| PROD_MINIO_PASSWORD | release.yml | MinIO root password; used only at bootstrap |
| PROD_MINIO_APP_PASSWORD | release.yml | application service-account password |
| PROD_OCR_TRAINING_TOKEN | release.yml | python3 -c "import secrets; print(secrets.token_hex(32))" |
| PROD_APP_ADMIN_USERNAME | release.yml | e.g. admin@archiv.raddatz.cloud |
| PROD_APP_ADMIN_PASSWORD | release.yml | ⚠ locked permanently on first deploy — see §3.5 |
| STAGING_POSTGRES_PASSWORD | nightly.yml | different from prod |
| STAGING_MINIO_PASSWORD | nightly.yml | different from prod |
| STAGING_MINIO_APP_PASSWORD | nightly.yml | different from prod |
| STAGING_OCR_TRAINING_TOKEN | nightly.yml | different from prod |
| STAGING_APP_ADMIN_USERNAME | nightly.yml | e.g. admin@staging.raddatz.cloud |
| STAGING_APP_ADMIN_PASSWORD | nightly.yml | locked on first staging deploy |
| MAIL_HOST | release.yml | SMTP relay hostname (prod only) |
| MAIL_PORT | release.yml | typically 587 |
| MAIL_USERNAME | release.yml | SMTP user |
| MAIL_PASSWORD | release.yml | SMTP password |
| GRAFANA_ADMIN_PASSWORD | both | Grafana admin login — generate a strong password |
| GLITCHTIP_SECRET_KEY | both | Django secret key — openssl rand -hex 32 |
| SENTRY_DSN | both | GlitchTip project DSN — set after first-run (§4); leave empty to keep Sentry disabled |

3.4 First deploy

# 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
#    Expected: docker compose up -d --wait succeeds for archiv-staging, then
#    the workflow's "Smoke test deployed environment" step asserts:
#      - https://staging.raddatz.cloud/login returns 200
#      - HSTS header is present
#      - /actuator/health returns 404 (defense-in-depth check)
# 2. (Optional) Re-verify manually
curl -I https://staging.raddatz.cloud/
#    Expected: 200 (login page) with HSTS + X-Content-Type-Options headers
# 3. When staging looks healthy, push a v* tag to trigger release.yml
git tag v1.0.0 && git push origin v1.0.0

3.5 ⚠ Admin password is locked on first deploy

UserDataInitializer creates the admin user only if the email does not exist. The first successful deploy persists the admin password to the database. Changing PROD_APP_ADMIN_PASSWORD in Gitea secrets after that point has no effect — the secret is only consulted when the row is missing.

Before the first deploy: rotate PROD_APP_ADMIN_PASSWORD to a strong value. After the first deploy: change the admin password via the in-app account settings, not via the Gitea secret.


4. Logs + observability

First-response commands

# Stream backend logs (most useful first)
docker compose logs --follow --tail=100 backend

# Stream all services
docker compose logs --follow

# Single snapshot
docker compose logs --tail=200 <service>
# services: frontend, backend, db, minio, ocr-service

Log locations

  • Backend application log: stdout (captured by Docker). Access inside the container at /app/logs/ via docker exec.
  • Spring Actuator health: http://localhost:8080/actuator/health (internal only in prod — port 8081 for Prometheus scraping)
  • Prometheus scraping: management port 8081, path /actuator/prometheus. Internal only; Caddy blocks /actuator/* externally (see the check below).
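Because Caddy blocks /actuator/* externally, check these endpoints from inside the network, e.g. via the backend container. A sketch, assuming curl is available in the backend image (substitute wget otherwise, and add -p archiv-production in prod):

docker compose exec backend curl -s http://localhost:8080/actuator/health
# Prometheus-format metrics on the management port:
docker compose exec backend curl -s http://localhost:8081/actuator/prometheus | head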

Observability stack

An observability stack is available via docker-compose.observability.yml. Configuration lives under infra/observability/.

Dev — start from the workspace

docker compose up -d                                    # creates archiv-net
docker compose -f docker-compose.observability.yml up -d

Why the obs stack is managed differently from the main app stack

The main app stack (docker-compose.prod.yml) has no config-file bind mounts — its containers read config from env vars and image defaults. The workspace is wiped after each CI run but that does not affect running containers, because they hold no references to workspace paths.

The obs stack is different: prometheus.yml, tempo.yml, Loki config, Grafana provisioning files, and Promtail config are all bind-mounted from the host filesystem into their containers. If those source paths disappear (workspace wipe), the running containers keep working, even across restarts, until the next docker compose up; at that point Docker tries to re-resolve the bind-mount source and fails because the workspace path no longer exists.

The fix is to keep the obs compose file and config tree at a permanent path that CI copies to on every run but which survives between runs: /opt/familienarchiv/ (see ADR-016).

Production — managed from /opt/familienarchiv/

Every CI run (nightly + release) copies docker-compose.observability.yml and infra/observability/ to /opt/familienarchiv/ before starting the stack. Bind mounts then resolve to /opt/familienarchiv/infra/observability/… — a stable path that outlasts any workspace wipe.

Environment variables follow the same two-source model as the main stack:

| Source | What it contains | Managed by |
|---|---|---|
| infra/observability/obs.env | All non-secret config (ports, URLs, hostnames) | Git — reviewed in PRs |
| /opt/familienarchiv/obs-secrets.env | Passwords and secret keys only | CI — written fresh from Gitea secrets on every deploy |

Both files are passed explicitly via --env-file to the compose command, so there is no implicit auto-read .env and no operator-managed file to keep in sync.

Non-secret config (infra/observability/obs.env):

| Key | Value | Notes |
|---|---|---|
| PORT_GRAFANA | 3003 | Avoids collision with staging frontend on port 3001 |
| PORT_GLITCHTIP | 3002 | |
| PORT_PROMETHEUS | 9090 | |
| GF_SERVER_ROOT_URL | https://grafana.archiv.raddatz.cloud | Required for alert email links and OAuth redirects |
| GLITCHTIP_DOMAIN | https://glitchtip.archiv.raddatz.cloud | Must match the Caddy vhost |
| POSTGRES_HOST | archive-db | Override if only the staging stack is running |

Secret keys (set in Gitea secrets, injected by CI into obs-secrets.env):

| Gitea secret | Notes |
|---|---|
| GRAFANA_ADMIN_PASSWORD | Strong unique password; shared by nightly and release |
| GLITCHTIP_SECRET_KEY | openssl rand -hex 32; shared by nightly and release |
| STAGING_POSTGRES_PASSWORD / PROD_POSTGRES_PASSWORD | Must match the running PostgreSQL container |

To start or restart the obs stack manually on the server (after CI has run at least once):

docker compose \
  -f /opt/familienarchiv/docker-compose.observability.yml \
  --env-file /opt/familienarchiv/infra/observability/obs.env \
  --env-file /opt/familienarchiv/obs-secrets.env \
  up -d --wait --remove-orphans

Note (manual ops only): CI clears the destination with rm -rf before copying, so deleted files are removed automatically on the next run. If you copy manually with cp -r without first removing the directory, stale files from deleted configs will persist until cleaned up:

rm /opt/familienarchiv/infra/observability/<path-to-removed-file>

Current services:

| Service | Image | Purpose |
|---|---|---|
| obs-prometheus | prom/prometheus:v3.4.0 | Scrapes metrics from backend management port 8081 (/actuator/prometheus), node-exporter, and cAdvisor |
| obs-node-exporter | prom/node-exporter:v1.9.0 | Host-level CPU / memory / disk / network metrics |
| obs-cadvisor | gcr.io/cadvisor/cadvisor:v0.52.1 | Per-container resource metrics |
| obs-loki | grafana/loki:3.4.2 | Log aggregation — receives log streams from Promtail. Port 3100 is expose-only (not host-bound). |
| obs-promtail | grafana/promtail:3.4.2 | Log shipping agent — reads all Docker container logs via the Docker socket and forwards them to Loki with container_name, compose_service, and compose_project labels |
| obs-tempo | grafana/tempo:2.7.2 | Distributed trace storage — OTLP gRPC receiver on port 4317, OTLP HTTP on port 4318 (both archiv-net-internal). Grafana queries traces on port 3200 (obs-net-internal). All ports are expose-only (not host-bound). |
| obs-grafana | grafana/grafana-oss:11.6.1 | Unified observability UI — metrics dashboards, log exploration, trace viewer. Bound to 127.0.0.1:${PORT_GRAFANA:-3003} on the host. |
| obs-glitchtip | glitchtip/glitchtip:6.1.6 | Sentry-compatible error tracker. Receives frontend + backend error events, groups by fingerprint, provides issue UI with stack traces. Bound to 127.0.0.1:${PORT_GLITCHTIP:-3002}. |
| obs-glitchtip-worker | glitchtip/glitchtip:6.1.6 | Celery + beat worker — processes async GlitchTip tasks (event ingestion, notifications, cleanup). |
| obs-redis | redis:7-alpine | Celery task broker for GlitchTip. Internal to obs-net; no host port exposed. |
| obs-glitchtip-db-init | postgres:16-alpine | One-shot init container. Creates the glitchtip database on the existing archive-db PostgreSQL instance if it does not already exist. Runs at stack startup; exits cleanly once done. |

Grafana

| Item | Value |
|---|---|
| URL | http://localhost:3003 (or http://localhost:$PORT_GRAFANA) |
| Username | admin |
| Password | $GRAFANA_ADMIN_PASSWORD (default: changeme; change before exposing to a network) |

Datasources are auto-provisioned on first start (Prometheus, Loki, Tempo — no manual setup required). Three dashboards are pre-loaded:

| Dashboard | Grafana ID | Purpose |
|---|---|---|
| Node Exporter Full | 1860 | Host CPU, memory, disk, network |
| Spring Boot Observability | 17175 | JVM metrics, HTTP latency, error rate |
| Loki Logs | 13639 | Log exploration and filtering |

Tempo traces are accessible via Grafana Explore → Tempo datasource, and linked from Loki logs via the traceId derived field.

Loki quick checks (after ~60 s, run from inside the obs-loki container):

# Loki health
docker exec obs-loki wget -qO- http://localhost:3100/ready

# List labels
docker exec obs-loki wget -qO- 'http://localhost:3100/loki/api/v1/labels'

# Query logs by service (stable across dev and prod environments)
docker exec obs-loki wget -qO- \
  'http://localhost:3100/loki/api/v1/query_range?query=%7Bcompose_service%3D%22backend%22%7D&limit=5'

Prefer compose_service over container_name in LogQL queries: container_name differs between dev (archive-backend) and prod (archiv-production-backend-1), while compose_service is stable (backend, db, minio, etc.).

Prometheus port 9090 and Grafana port 3003 (default; configurable via PORT_GRAFANA) are bound to 127.0.0.1 on the host. No other observability ports are host-bound.

GlitchTip

Item Value
URL http://localhost:3002 (or http://localhost:$PORT_GLITCHTIP)

Required env vars — set in .env before first start:

GLITCHTIP_SECRET_KEY=$(python3 -c "import secrets; print(secrets.token_hex(32))")
GLITCHTIP_DOMAIN=http://localhost:3002   # change to your public URL in prod
PORT_GLITCHTIP=3002                      # optional, defaults to 3002

Database: GlitchTip shares the existing archive-db PostgreSQL instance. The obs-glitchtip-db-init one-shot container creates a dedicated glitchtip database on first stack start — no manual step required.

First-run steps (one-time, after docker compose -f docker-compose.observability.yml up -d):

# 1. Create the Django superuser (interactive)
docker exec -it obs-glitchtip ./manage.py createsuperuser

# 2. Open the GlitchTip UI and log in
open http://localhost:3002

# 3. Create an organisation (e.g. "Familienarchiv")
# 4. Create two projects:
#    - "familienarchiv-frontend"  (platform: JavaScript / SvelteKit)
#    - "familienarchiv-backend"   (platform: Java / Spring Boot)
# 5. Copy each project's DSN from Settings → Projects → <project> → Client Keys
# 6. Wire the DSNs into the app via env vars: the backend consumes SENTRY_DSN (§2);
#    frontend wiring is a separate issue
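On the backend side, the DSN typically flows from the SENTRY_DSN variable (§2) into the Sentry SDK's configuration. A plausible application.yaml shape, assuming the Sentry Spring Boot starter's conventional property names (the empty default keeps the SDK disabled):

sentry:
  dsn: ${SENTRY_DSN:}                            # empty DSN disables error reporting
  environment: ${SPRING_PROFILES_ACTIVE:dev}     # tags events with the active profile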

5. Backup + recovery

Current state — no automated backup

No automated backup is configured. Manual procedure for a point-in-time backup:

# PostgreSQL dump
docker exec archive-db pg_dump -U ${POSTGRES_USER} ${POSTGRES_DB} > backup-$(date +%Y%m%d).sql

# MinIO data (bind-mounted in dev)
# Copy ./data/minio/ to external storage
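A minimal way to capture the MinIO bind mount, assuming writes are quiesced first (stop the stack or pause uploads so the copy is consistent):

tar czf minio-backup-$(date +%Y%m%d).tar.gz -C ./data minio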

Restoration:

# Restore Postgres
docker exec -i archive-db psql -U ${POSTGRES_USER} ${POSTGRES_DB} < backup-YYYYMMDD.sql

Planned — phase 5 of Production v1 milestone

Automated backup (nightly pg_dump + MinIO mc mirror over Tailscale to heim-nas) is a follow-up issue. Until that ships: manual backups are the only recovery option.

Rollback

Each release tag corresponds to a docker image tag on the host daemon (built via DooD; no registry). Rolling back to a previous tag is one command:

TAG=v1.0.0 docker compose \
  -f docker-compose.prod.yml \
  -p archiv-production \
  --env-file /opt/familienarchiv/.env.production \
  up -d --wait --remove-orphans

If the rollback target image is no longer present on the host (host disk pruned, etc.), re-trigger release.yml for that tag from Gitea Actions UI — it rebuilds and redeploys.

Flyway migrations are not auto-rolled-back. If a release contained a destructive migration (drop column, rename table), a tag rollback brings the app back to a previous version, but the data shape has already changed. For breaking schema changes, prefer a forward-only fix.
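Before deciding, check which migrations the current schema already carries. Flyway records them in its default history table (a sketch, assuming the table name was not customised):

docker exec archive-db psql -U ${POSTGRES_USER} ${POSTGRES_DB} \
  -c "SELECT installed_rank, version, description, success
      FROM flyway_schema_history ORDER BY installed_rank DESC LIMIT 5;"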


6. Common operational tasks

Reset dev database (truncates data, keeps schema)

bash scripts/reset-db.sh

Truncates all data but does not drop the schema or re-run Flyway. Use for E2E test resets, not full reinstalls. ⚠️ Script hardcodes DB_USER=archive_user and DB_NAME=family_archive_db — if you customised these in .env, edit the script accordingly.

Rebuild frontend container (clears node_modules volume)

bash scripts/rebuild-frontend.sh

Assumes the Docker Compose volume is named familienarchiv_frontend_node_modules. If your project directory is not named familienarchiv, edit line 16 of the script.

Download Kraken OCR models

bash scripts/download-kraken-models.sh

Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.

Trigger a mass import (Excel/ODS)

Dev: drop the ODS spreadsheet + PDFs into ./import/ at the repo root — the dev compose bind-mounts it to /import automatically.

Staging/production:

  1. Pre-stage the payload on the host. Convention: /srv/familienarchiv-staging/import/ or /srv/familienarchiv-production/import/.
    rsync -avh --progress ./import/ user@host:/srv/familienarchiv-staging/import/
    
  2. Make sure IMPORT_HOST_DIR=<host-path> is set in .env.staging / .env.production (the nightly/release workflows already write this — see §3). Compose refuses to start without it.
  3. Redeploy the stack so the bind mount picks up — or, if the mount is already in place, skip to step 4.
  4. Call POST /api/admin/trigger-import (requires ADMIN permission), or click the "Import starten" button on /admin/system (curl sketch after this list).
  5. The import runs asynchronously — poll GET /api/admin/import-status, watch /admin/system, or tail the backend logs.
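A sketch of steps 4 and 5 from the command line. The endpoint paths come from the list above; the cookie-based auth shown is an assumption, so adjust to however your admin session is actually established:

# Trigger the import (step 4)
curl -i -X POST https://staging.raddatz.cloud/api/admin/trigger-import \
     -H "Cookie: SESSION=<admin-session-cookie>"
# Poll until done (step 5)
curl -s https://staging.raddatz.cloud/api/admin/import-status \
     -H "Cookie: SESSION=<admin-session-cookie>"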

7. Known limitations

| Limitation | Reason | Reference |
|---|---|---|
| Single-node OCR service | The two required OCR engines (Surya + Kraken) exist only in the Python ecosystem; horizontal scaling would require a job queue not currently implemented | ADR-001 |
| No multi-tenancy | Designed as a single-family private archive; all authenticated users share the same document space | Deliberate scope decision (family-only product frame) |
| No multi-region | Single PostgreSQL + MinIO instance; no replication or failover | Deliberate scope decision |
| Max upload size | 50 MB per file (500 MB per request for multi-file) | Configurable in application.yaml (spring.servlet.multipart) |
| No automated backup | Phase 5 of Production v1 milestone is not yet implemented | See §5 above |