<!-- Last reviewed: 2026-05-05 — reviewed at every milestone close -->
# Familienarchiv — Deployment Reference
> **If the app is down right now → jump to [§4 Logs](#4-logs--observability).**

This doc is the Day-1 checklist and operational reference. It links to the canonical infrastructure docs in `docs/infrastructure/` rather than duplicating them.

**Audience:** operator bringing up a fresh instance, or Successor-X debugging a live incident.

**Ownership:** project owner. Update this file in any PR that changes the container topology, env vars, or backup procedure.
## Table of Contents
1. [Deployment topology](#1-deployment-topology)
2. [Environment variables](#2-environment-variables)
3. [Bootstrap from scratch](#3-bootstrap-from-scratch)
4. [Logs + observability](#4-logs--observability)
5. [Backup + recovery](#5-backup--recovery)
6. [Common operational tasks](#6-common-operational-tasks)
7. [Known limitations](#7-known-limitations)

---

## 1. Deployment topology
```mermaid
graph TD
    Browser -->|HTTPS| Caddy["Caddy (TLS termination)"]
    Caddy -->|HTTP :3000| Frontend["Web Frontend\nSvelteKit Node adapter"]
    Caddy -->|HTTP :8080| Backend["API Backend\nSpring Boot / Jetty :8080"]
    Backend -->|JDBC :5432| DB[(PostgreSQL 16)]
    Backend -->|S3 API :9000| MinIO[(MinIO)]
    Backend -->|HTTP :8000 internal| OCR["OCR Service\nPython FastAPI"]
    OCR -->|presigned URL| MinIO
    Caddy -->|SSE proxy_pass| Backend
```

**Key facts:**

- Caddy terminates TLS and reverse-proxies to frontend (`:3000`) and backend (`:8080`). The Caddyfile is committed at [`infra/caddy/Caddyfile`](../infra/caddy/Caddyfile) and is installed on the host as `/etc/caddy/Caddyfile` (symlink).
- The host binds all docker-published ports to `127.0.0.1` only; Caddy is the sole external entry point.
- The OCR service has **no published port** — reachable only on the internal Docker network from the backend.
- SSE notifications transit Caddy (browser → Caddy → backend); the backend is never reachable directly from the public internet. The SvelteKit SSR layer is bypassed for SSE, but Caddy is not.
- The Caddyfile responds `404` on `/actuator/*` (defense in depth). Internal monitoring scrapes the backend on the docker network, not through Caddy.
- Production and staging cohabit on the same host via docker compose project names: `archiv-production` (ports 8080/3000) and `archiv-staging` (ports 8081/3001).
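The loopback-only publishing described above corresponds to compose `ports` entries of this shape (illustrative excerpt; service names follow the topology diagram, not a verbatim copy of `docker-compose.prod.yml`):

```yaml
# Illustrative sketch, not the committed prod compose file.
services:
  backend:
    ports:
      - "127.0.0.1:8080:8080"   # reachable by Caddy on the host, not externally
  frontend:
    ports:
      - "127.0.0.1:3000:3000"
  ocr-service: {}               # no ports: entry at all; internal network only
```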
### OCR memory requirements

The OCR service requires significant RAM for model loading. The dev compose sets `mem_limit: 12g`.

| Production target | RAM | Recommended OCR limit | Notes |
|---|---|---|---|
| Hetzner CX42 | 16 GB | 12 GB | Recommended for OCR-enabled production |
| Hetzner CX32 | 8 GB | 6 GB | Accept reduced batch sizes and slower throughput |
| Hetzner CX22 | 4 GB | — | Disable the OCR service (`profiles: [ocr]`); run OCR on demand only |
A CX32 cannot honour the default `mem_limit: 12g` — set the `OCR_MEM_LIMIT=6g` env var (in `.env.production` / `.env.staging`, or as a Gitea secret consumed by the workflow) before deploying on a CX32. The prod compose interpolates this var with a 12g default.
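The interpolation-with-default pattern looks like this in compose syntax (a sketch; the committed prod file may differ in detail):

```yaml
services:
  ocr-service:
    # Falls back to 12g when OCR_MEM_LIMIT is unset (CX42+);
    # .env.production on a CX32 sets OCR_MEM_LIMIT=6g.
    mem_limit: ${OCR_MEM_LIMIT:-12g}
```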
### Dev vs production differences

| Concern | Dev (`docker-compose.yml`) | Prod (`docker-compose.prod.yml`) |
|---|---|---|
| MinIO image tag | `minio/minio:latest` | Pinned `minio/minio:RELEASE.…` |
| Data persistence | Bind mounts `./data/postgres`, `./data/minio` | Named Docker volumes (`postgres-data`, `minio-data`) |
| MinIO credentials for backend | Root user/password | Service account `archiv-app` with bucket-scoped rights |
| Bucket creation | `create-buckets` helper | Same helper, plus service-account bootstrap on every up |
| Spring profile | `dev,e2e` (Swagger + e2e overrides) | unset — base `application.yaml` is production-ready |
| Mail | Mailpit (local catcher) | Real SMTP (production) / Mailpit via `profiles: [staging]` (staging) |
| Frontend image | Dev server, `target: development`, port 5173 | Node adapter, `target: production`, port 3000 |
| Host port binding | All published | Bound to `127.0.0.1` only; Caddy is the front door |
| Deploy method | `docker compose up -d` (manual) | Gitea Actions: `nightly.yml` (staging, cron) and `release.yml` (production, on `v*` tag) — both use `up -d --wait` |
Full prod compose: [`docker-compose.prod.yml`](../docker-compose.prod.yml). Workflow files: [`.gitea/workflows/nightly.yml`](../.gitea/workflows/nightly.yml), [`.gitea/workflows/release.yml`](../.gitea/workflows/release.yml).

---

## 2. Environment variables

All vars are set in `.env` at the repo root (copy from `.env.example`). The backend resolves them via `application.yaml`; the Docker Compose file wires them into each container.
**Any var found in `docker-compose.yml` or `application*.yaml` that is not in this table is a blocking review comment on any PR that changes those files.**
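The `.env` → container wiring is plain compose interpolation; a sketch of the shape (variable names from the tables below, service layout assumed):

```yaml
# Sketch of how compose passes .env values into a container;
# the real file may use env_file: or a longer environment: block.
services:
  backend:
    environment:
      SPRING_DATASOURCE_URL: ${SPRING_DATASOURCE_URL}
      S3_ACCESS_KEY: ${S3_ACCESS_KEY}   # resolved from .env at `docker compose up`
```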
### Backend

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `SPRING_DATASOURCE_URL` | PostgreSQL JDBC URL | — | YES | — |
| `SPRING_DATASOURCE_USERNAME` | DB username | — | YES | — |
| `SPRING_DATASOURCE_PASSWORD` | DB password | — | YES | YES |
| `S3_ENDPOINT` | MinIO / OBS endpoint URL | — | YES | — |
| `S3_ACCESS_KEY` | MinIO access key (use service account, not root in prod) | — | YES | YES |
| `S3_SECRET_KEY` | MinIO secret key | — | YES | YES |
| `S3_BUCKET_NAME` | Target bucket name | — | YES | — |
| `S3_REGION` | S3 region string | `us-east-1` | YES | — |
| `APP_ADMIN_USERNAME` | Bootstrap admin username (⚠ not in .env.example) | `admin` | YES | — |
| `APP_ADMIN_PASSWORD` | Bootstrap admin password (⚠ ships as `admin123`) | `admin123` | YES | YES |
| `APP_BASE_URL` | Public-facing URL for email links | `http://localhost:3000` | YES (prod) | — |
| `APP_OCR_BASE_URL` | Internal URL of the OCR service | — | YES | — |
| `APP_OCR_TRAINING_TOKEN` | Secret token for OCR training endpoints | — | YES (prod) | YES |
| `MAIL_HOST` | SMTP host | `mailpit` (dev) | YES (prod) | — |
| `MAIL_PORT` | SMTP port | `1025` (dev) | YES (prod) | — |
| `MAIL_USERNAME` | SMTP username | — | YES (prod) | YES |
| `MAIL_PASSWORD` | SMTP password | — | YES (prod) | YES |
| `APP_MAIL_FROM` | From address for outbound mail | `noreply@familienarchiv.local` | YES (prod) | — |
| `MAIL_SMTP_AUTH` | SMTP auth enabled | `false` (dev) | YES (prod) | — |
| `MAIL_STARTTLS_ENABLE` | STARTTLS enabled | `false` (dev) | YES (prod) | — |
| `SPRING_PROFILES_ACTIVE` | Spring profile | `dev,e2e` (compose) | YES | — |
### PostgreSQL container

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `POSTGRES_USER` | DB superuser | `archive_user` | YES | — |
| `POSTGRES_PASSWORD` | DB password | `change-me` | YES | YES |
| `POSTGRES_DB` | Database name | `family_archive_db` | YES | — |
### MinIO container

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `MINIO_ROOT_USER` | MinIO root username (dev compose only — prod compose hardcodes `archiv`) | `minio_admin` | YES (dev) | — |
| `MINIO_ROOT_PASSWORD` / `MINIO_PASSWORD` | MinIO root password. **Used only by the `mc admin` bootstrap in prod, never by the backend.** | `change-me` | YES | YES |
| `MINIO_APP_PASSWORD` | Password for the `archiv-app` service account that the backend uses. Bucket-scoped via `readwrite` policy on `familienarchiv`. Bootstrapped by `create-buckets`. | — | YES (prod) | YES |
| `MINIO_DEFAULT_BUCKETS` | Bucket name (dev compose only — prod compose hardcodes `familienarchiv`) | `archive-documents` | YES (dev) | — |
### OCR service

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|
| `TRAINING_TOKEN` | Guards `/train` and `/segtrain` endpoints (accepts file uploads) | — | YES (prod) | YES |
| `ALLOWED_PDF_HOSTS` | SSRF protection — comma-separated list of allowed PDF source hosts. **Do not widen to `*`** | `minio,localhost,127.0.0.1` | YES | — |
| `KRAKEN_MODEL_PATH` | Directory containing Kraken HTR models (populated by `download-kraken-models.sh`) | `/app/models/` | — | — |
| `BLLA_MODEL_PATH` | Kraken baseline layout analysis model path | `/app/models/blla.mlmodel` | — | — |
| `OCR_MEM_LIMIT` | Container memory cap for ocr-service in `docker-compose.prod.yml`. Set to `6g` on CX32 hosts; leave unset on CX42+ to use the 12g default | `12g` (prod compose default) | — | — |
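The `ALLOWED_PDF_HOSTS` check amounts to an exact-match hostname allow list. A minimal sketch of the idea (hypothetical `is_allowed_pdf_url` helper; this is not the actual ocr-service code):

```python
from urllib.parse import urlsplit

# Hypothetical helper: illustrates the ALLOWED_PDF_HOSTS semantics, not the
# real service implementation. Exact hostname match, no wildcards, ever.
ALLOWED_PDF_HOSTS = {"minio", "localhost", "127.0.0.1"}

def is_allowed_pdf_url(url: str) -> bool:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False                      # reject file://, gopher://, etc.
    return parts.hostname in ALLOWED_PDF_HOSTS

print(is_allowed_pdf_url("http://minio:9000/familienarchiv/doc.pdf"))  # True
print(is_allowed_pdf_url("http://169.254.169.254/latest/meta-data"))   # False
```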
---

## 3. Bootstrap from scratch

Production and staging deploy via Gitea Actions (`release.yml` on `v*` tag, `nightly.yml` on cron). The server itself only needs to host Caddy, Docker, and the runner — the workflows handle the rest.

### 3.1 Server one-time setup
```bash
# Base hardening
ufw default deny incoming && ufw allow 22/tcp && ufw allow 80/tcp && ufw allow 443/tcp && ufw enable
# /etc/ssh/sshd_config: PasswordAuthentication no, PermitRootLogin no

# Install Caddy 2 (https://caddyserver.com/docs/install#debian-ubuntu-raspbian)
apt install caddy

# Use the Caddyfile from the repo (replace path with the runner's clone target)
# CI DEPENDENCY: the nightly and release workflows run `systemctl reload caddy` to
# pick up committed Caddyfile changes. They find the file via this symlink — if it
# is absent or points elsewhere, the reload succeeds but serves stale config.
ln -sf /opt/familienarchiv/infra/caddy/Caddyfile /etc/caddy/Caddyfile
systemctl reload caddy

# fail2ban — protect /api/auth/login from credential stuffing.
# Jail watches the Caddy JSON access log for 401 responses on
# /api/auth/login. The jail (maxretry=10 / findtime=10m / bantime=30m)
# and filter are committed under infra/fail2ban/ — symlink them in:
apt install fail2ban
ln -sf /opt/familienarchiv/infra/fail2ban/jail.d/familienarchiv.conf \
   /etc/fail2ban/jail.d/familienarchiv.conf
ln -sf /opt/familienarchiv/infra/fail2ban/filter.d/familienarchiv-auth.conf \
   /etc/fail2ban/filter.d/familienarchiv-auth.conf
systemctl reload fail2ban
# Verify after first deploy with:
#   fail2ban-client status familienarchiv-auth
#   fail2ban-regex /var/log/caddy/access.log familienarchiv-auth

# Tailscale — used by the backup pipeline to reach heim-nas (follow-up issue)
curl -fsSL https://tailscale.com/install.sh | sh && tailscale up

# Self-hosted Gitea runner — register against the repo with a runner token.
# This runner is assumed single-tenant: the deploy workflows write .env.*
# files to disk during execution (cleaned up unconditionally on completion).
# A multi-tenant runner would need to switch to stdin-piped env files.
# (See https://docs.gitea.com/usage/actions/quickstart for the register step.)
```
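The fail2ban filter's job is to pull the client IP out of Caddy JSON log lines whose status is 401 on the login path. The committed filter under `infra/fail2ban/` is the source of truth; the regex below is only an illustration of the matching idea against a sample log line:

```python
import json
import re

# Sample Caddy JSON access-log line (compact encoding, as Caddy emits it;
# shape only, trimmed to the fields the filter cares about).
line = json.dumps({
    "request": {"remote_ip": "203.0.113.7", "uri": "/api/auth/login", "method": "POST"},
    "status": 401,
}, separators=(",", ":"))

# A fail2ban-style failregex: ban candidates are 401s on the login endpoint.
# Illustrative only; see infra/fail2ban/filter.d/familienarchiv-auth.conf.
failregex = re.compile(
    r'"remote_ip":"(?P<host>[^"]+)".*"uri":"/api/auth/login".*"status":401'
)

m = failregex.search(line)
print(m.group("host") if m else "no match")  # 203.0.113.7
```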
### 3.2 DNS records

```
archiv.raddatz.cloud    A  <server IP>
staging.raddatz.cloud   A  <server IP>
git.raddatz.cloud       A  <server IP>
```
### 3.3 Gitea secrets (Repo → Settings → Actions → Secrets)

| Secret | Used by | Notes |
|---|---|---|
| `PROD_POSTGRES_PASSWORD` | release.yml | strong unique password |
| `PROD_MINIO_PASSWORD` | release.yml | MinIO root password; used only at bootstrap |
| `PROD_MINIO_APP_PASSWORD` | release.yml | application service-account password |
| `PROD_OCR_TRAINING_TOKEN` | release.yml | `python3 -c "import secrets; print(secrets.token_hex(32))"` |
| `PROD_APP_ADMIN_USERNAME` | release.yml | e.g. `admin@archiv.raddatz.cloud` |
| `PROD_APP_ADMIN_PASSWORD` | release.yml | **⚠ locked permanently on first deploy** — see §3.5 |
| `STAGING_POSTGRES_PASSWORD` | nightly.yml | different from prod |
| `STAGING_MINIO_PASSWORD` | nightly.yml | different from prod |
| `STAGING_MINIO_APP_PASSWORD` | nightly.yml | different from prod |
| `STAGING_OCR_TRAINING_TOKEN` | nightly.yml | different from prod |
| `STAGING_APP_ADMIN_USERNAME` | nightly.yml | e.g. `admin@staging.raddatz.cloud` |
| `STAGING_APP_ADMIN_PASSWORD` | nightly.yml | locked on first staging deploy |
| `MAIL_HOST` | release.yml | SMTP relay hostname (prod only) |
| `MAIL_PORT` | release.yml | typically `587` |
| `MAIL_USERNAME` | release.yml | SMTP user |
| `MAIL_PASSWORD` | release.yml | SMTP password |
### 3.4 First deploy

```bash
# 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
#    Expected: docker compose up -d --wait succeeds for archiv-staging, then
#    the workflow's "Smoke test deployed environment" step asserts:
#      - https://staging.raddatz.cloud/login returns 200
#      - HSTS header is present
#      - /actuator/health returns 404 (defense-in-depth check)

# 2. (Optional) Re-verify manually
curl -I https://staging.raddatz.cloud/
# Expected: 200 (login page) with HSTS + X-Content-Type-Options headers

# 3. When staging looks healthy, push a v* tag to trigger release.yml
git tag v1.0.0 && git push origin v1.0.0
```
### 3.5 ⚠ Admin password is locked on first deploy

`UserDataInitializer` creates the admin user **only if the email does not exist**. The first successful deploy persists the admin password to the database. Changing `PROD_APP_ADMIN_PASSWORD` in Gitea secrets after that point has **no effect** — the secret is only consulted when the row is missing.
Before the first deploy: rotate `PROD_APP_ADMIN_PASSWORD` to a strong value. After the first deploy: change the admin password via the in-app account settings, not via the Gitea secret.
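The create-only-if-missing semantics can be sketched like this (a hypothetical in-memory stand-in for `UserDataInitializer`, not the actual Java code):

```python
# Stand-in for UserDataInitializer's behaviour: the configured password is
# consulted ONLY when the admin row does not exist yet.
users = {}  # email -> stored password (stands in for the users table)

def bootstrap_admin(users, email, configured_password):
    if email in users:            # row exists: the secret is ignored entirely
        return False
    users[email] = configured_password
    return True

# Hypothetical email for illustration.
bootstrap_admin(users, "admin@archiv.example", "first-secret")    # creates the row
bootstrap_admin(users, "admin@archiv.example", "rotated-secret")  # no effect
print(users["admin@archiv.example"])  # still "first-secret"
```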
---

## 4. Logs + observability

### First-response commands
```bash
# Stream backend logs (most useful first)
docker compose logs --follow --tail=100 backend

# Stream all services
docker compose logs --follow

# Single snapshot
docker compose logs --tail=200 <service>
# services: frontend, backend, db, minio, ocr-service
```
### Log locations

- **Backend application log**: stdout (captured by Docker). Access inside the container at `/app/logs/` via `docker exec`.
- **Spring Actuator health**: `http://localhost:8080/actuator/health` (internal only in prod — port 8081 for Prometheus scraping)
- **Prometheus scraping**: management port 8081, path `/actuator/prometheus`. Internal only; Caddy blocks `/actuator/*` externally.
### Future observability

Phase 7 of the Production v1 milestone adds Prometheus + Loki + Grafana. No monitoring infrastructure is in place yet.

---

## 5. Backup + recovery

### Current state — no automated backup

No automated backup is configured. Manual procedure for a point-in-time backup:
```bash
# PostgreSQL dump. ${POSTGRES_USER}/${POSTGRES_DB} are expanded by the HOST
# shell, not inside the container — source your .env first.
docker exec archive-db pg_dump -U ${POSTGRES_USER} ${POSTGRES_DB} > backup-$(date +%Y%m%d).sql

# MinIO data (bind-mounted in dev)
# Copy ./data/minio/ to external storage.
# In prod the data lives in the minio-data named volume instead.
```
Restoration:

```bash
# Restore Postgres
docker exec -i archive-db psql -U ${POSTGRES_USER} ${POSTGRES_DB} < backup-YYYYMMDD.sql
```

### Planned — phase 5 of Production v1 milestone
Automated backup (nightly `pg_dump` + MinIO `mc mirror` over Tailscale to `heim-nas`) is a follow-up issue. Until that ships: **manual backups are the only recovery option.**
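When the nightly pipeline ships, it will also need a retention policy on `heim-nas`. A sketch of date-stamped dump pruning (hypothetical helper; nothing like this is implemented yet), keeping the newest N files:

```python
# Hypothetical retention helper for backup-YYYYMMDD.sql dumps. Lexicographic
# sort equals chronological sort for the YYYYMMDD naming scheme above.
def dumps_to_delete(filenames, keep=7):
    dated = sorted(
        f for f in filenames if f.startswith("backup-") and f.endswith(".sql")
    )
    return dated[:-keep] if len(dated) > keep else []

names = [f"backup-202601{d:02d}.sql" for d in range(1, 11)]  # 10 nightly dumps
print(dumps_to_delete(names, keep=7))  # the three oldest dumps
```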
### Rollback

Each release tag corresponds to a docker image tag on the host daemon (built via DooD; no registry). Rolling back to a previous tag is one command:
```bash
TAG=v1.0.0 docker compose \
  -f docker-compose.prod.yml \
  -p archiv-production \
  --env-file /opt/familienarchiv/.env.production \
  up -d --wait --remove-orphans
```
If the rollback target image is no longer present on the host (host disk pruned, etc.), re-trigger `release.yml` for that tag from the Gitea Actions UI — it rebuilds and redeploys.

**Flyway migrations are not auto-rolled-back.** If a release contained a destructive migration (drop column, rename table), a tag rollback brings the app back to a previous version but the data shape has already changed. For breaking schema changes, prefer a forward-only fix.

---

## 6. Common operational tasks

### Reset dev database (truncates data, keeps schema)
```bash
bash scripts/reset-db.sh
```

> Truncates all data but does **not** drop the schema or re-run Flyway. Use for E2E test resets, not full reinstalls.
>
> ⚠️ Script hardcodes `DB_USER=archive_user` and `DB_NAME=family_archive_db` — if you customised these in `.env`, edit the script accordingly.
### Rebuild frontend container (clears node_modules volume)

```bash
bash scripts/rebuild-frontend.sh
```

> Assumes the Docker Compose volume is named `familienarchiv_frontend_node_modules`. If your project directory is not named `familienarchiv`, edit line 16 of the script.
### Download Kraken OCR models

```bash
bash scripts/download-kraken-models.sh
```

> Downloads the Kurrent/Sütterlin HTR models. Run once after a fresh clone or when models are updated.
### Trigger a mass import (Excel/ODS)

1. Place the import file in the `import/` bind mount on the backend container.
2. Call `POST /api/admin/trigger-import` (requires `ADMIN` permission).
3. The import runs asynchronously — poll `GET /api/admin/import-status` or watch backend logs.
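The poll-until-done step can be sketched like this (hypothetical client code: `fetch_status` stands in for an authenticated `GET /api/admin/import-status` request, and the status strings are assumed, not taken from the API):

```python
import time

def poll_import(fetch_status, interval=0.0, max_polls=30):
    """Poll until the import reports a terminal state.

    fetch_status is a stand-in for a real, authenticated call to
    GET /api/admin/import-status; the terminal states are assumed names.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(interval)
    raise TimeoutError("import did not finish within max_polls")

# Stubbed status source: RUNNING twice, then COMPLETED.
responses = iter(["RUNNING", "RUNNING", "COMPLETED"])
print(poll_import(lambda: next(responses)))  # COMPLETED
```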
---

## 7. Known limitations

| Limitation | Reason | Reference |
|---|---|---|
| **Single-node OCR service** | The two required OCR engines (Surya + Kraken) exist only in the Python ecosystem; horizontal scaling would require a job queue not currently implemented | [ADR-001](adr/001-ocr-python-microservice.md) |
| **No multi-tenancy** | Designed as a single-family private archive; all authenticated users share the same document space | Deliberate scope decision (family-only product frame) |
| **No multi-region** | Single PostgreSQL + MinIO instance; no replication or failover | Deliberate scope decision |
| **Max upload size** | 50 MB per file (500 MB per request for multi-file) | Configurable in `application.yaml` (`spring.servlet.multipart`) |
| **No automated backup** | Phase 5 of Production v1 milestone is not yet implemented | See §5 above |