diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 6e697c55..674bc15f 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -27,20 +27,22 @@ This doc is the Day-1 checklist and operational reference. It links to the canon
 ```mermaid
 graph TD
     Browser -->|HTTPS| Caddy["Caddy (TLS termination)"]
-    Caddy -->|HTTP :5173| Frontend["Web Frontend\nSvelteKit / Node.js"]
+    Caddy -->|HTTP :3000| Frontend["Web Frontend\nSvelteKit Node adapter"]
     Caddy -->|HTTP :8080| Backend["API Backend\nSpring Boot / Jetty :8080"]
     Backend -->|JDBC :5432| DB[(PostgreSQL 16)]
-    Backend -->|S3 API :9000| MinIO[(MinIO / Hetzner OBS)]
+    Backend -->|S3 API :9000| MinIO[(MinIO)]
     Backend -->|HTTP :8000 internal| OCR["OCR Service\nPython FastAPI"]
     OCR -->|presigned URL| MinIO
     Browser -->|SSE direct| Backend
 ```
 
 **Key facts:**
-- Caddy terminates TLS and reverse-proxies to frontend and backend. See the Caddyfile in [`docs/infrastructure/production-compose.md`](infrastructure/production-compose.md).
-- The OCR service has **no external port** — reachable only on the internal Docker network from the backend.
+- Caddy terminates TLS and reverse-proxies to the frontend (`:3000`) and backend (`:8080`). The Caddyfile is committed at [`infra/caddy/Caddyfile`](../infra/caddy/Caddyfile) and installed on the host as `/etc/caddy/Caddyfile` (symlink).
+- The host binds all docker-published ports to `127.0.0.1` only; Caddy is the sole external entry point (see the smoke test below).
+- The OCR service has **no published port** — it is reachable only from the backend, on the internal Docker network.
 - SSE notifications go directly backend → browser (not via the SvelteKit SSR layer).
-- Management port 8081 (Spring Actuator / Prometheus scrape) is internal only — the Caddy config blocks `/actuator/*` externally.
+- The Caddyfile responds `404` on `/actuator/*` (defense in depth). Internal monitoring scrapes the backend on the docker network, not through Caddy.
+- Production and staging share the same host, separated by docker compose project names: `archiv-production` (ports 8080/3000) and `archiv-staging` (ports 8081/3001).
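+
+A quick smoke test for the last two facts (loopback-only binding, two compose projects) can run on the host. A minimal sketch, assuming `ss` and Docker Compose v2 are installed:
+
+```bash
+# Published ports must be bound to loopback only; any 0.0.0.0 line here is a misconfiguration
+ss -tlnp | grep -E ':(3000|3001|8080|8081)\b'
+# Expected: every matching line shows 127.0.0.1:<port>
+
+# Production and staging should appear as two separate compose projects
+docker compose ls
+# Expected: archiv-production and archiv-staging, both "running"
+```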
 
 ### OCR memory requirements
@@ -56,15 +58,19 @@ A CX32 cannot honour a `mem_limit: 12g` — set it to `6g` in the prod overlay o
 ### Dev vs production differences
 
-| Concern | Dev compose | Prod overlay |
+| Concern | Dev (`docker-compose.yml`) | Prod (`docker-compose.prod.yml`) |
 |---|---|---|
-| MinIO image tag | `minio/minio:latest` (unpinned) | Pinned in prod overlay |
-| Data persistence | Bind mounts `./data/postgres`, `./data/minio` | Named Docker volumes |
-| Bucket creation | `create-buckets` helper container | Pre-created in Hetzner console |
-| Spring profile | `dev,e2e` (enables OpenAPI + Swagger UI) | `prod` |
-| Mail | Mailpit (local catcher) | Real SMTP |
+| MinIO image tag | `minio/minio:latest` | Pinned `minio/minio:RELEASE.…` |
+| Data persistence | Bind mounts `./data/postgres`, `./data/minio` | Named Docker volumes (`postgres-data`, `minio-data`) |
+| MinIO credentials for backend | Root user/password | Service account `archiv-app` with bucket-scoped rights |
+| Bucket creation | `create-buckets` helper | Same helper, plus service-account bootstrap on every `up` |
+| Spring profile | `dev,e2e` (Swagger + e2e overrides) | Unset — the base `application.yaml` is production-ready |
+| Mail | Mailpit (local catcher) | Real SMTP (production) / Mailpit via `profiles: [staging]` (staging) |
+| Frontend image | Dev server, `target: development`, port 5173 | Node adapter, `target: production`, port 3000 |
+| Host port binding | All ports published | Bound to `127.0.0.1` only; Caddy is the front door |
+| Deploy method | `docker compose up -d` (manual) | Gitea Actions: `nightly.yml` (staging, cron) and `release.yml` (production, on `v*` tag) — both use `up -d --wait` |
 
-Full prod overlay: [`docs/infrastructure/production-compose.md`](infrastructure/production-compose.md).
+Full prod compose: [`docker-compose.prod.yml`](../docker-compose.prod.yml). Workflow files: [`.gitea/workflows/nightly.yml`](../.gitea/workflows/nightly.yml), [`.gitea/workflows/release.yml`](../.gitea/workflows/release.yml).
 
 ---
 
@@ -112,9 +118,10 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
 | Variable | Purpose | Default | Required? | Sensitive? |
 |---|---|---|---|---|
-| `MINIO_ROOT_USER` | MinIO root username | `minio_admin` | YES | — |
-| `MINIO_ROOT_PASSWORD` | MinIO root password | `change-me` | YES | YES |
-| `MINIO_DEFAULT_BUCKETS` | Bucket name | `archive-documents` | YES | — |
+| `MINIO_ROOT_USER` | MinIO root username (dev compose only — prod compose hardcodes `archiv`) | `minio_admin` | YES (dev) | — |
+| `MINIO_ROOT_PASSWORD` / `MINIO_PASSWORD` | MinIO root password. **Used only by the `mc admin` bootstrap in prod, never by the backend.** | `change-me` | YES | YES |
+| `MINIO_APP_PASSWORD` | Password for the `archiv-app` service account the backend uses. Bucket-scoped via the `readwrite` policy on `familienarchiv`; bootstrapped by `create-buckets`. | — | YES (prod) | YES |
+| `MINIO_DEFAULT_BUCKETS` | Bucket name (dev compose only — prod compose hardcodes `familienarchiv`) | `archive-documents` | YES (dev) | — |
 
 ### OCR service
 
@@ -129,48 +136,81 @@
 ## 3. Bootstrap from scratch
 
-> Full VPS provisioning steps are in [`docs/infrastructure/production-compose.md`](infrastructure/production-compose.md). This section covers the sequence and the security-critical steps.
+Production and staging deploy via Gitea Actions (`release.yml` on a `v*` tag, `nightly.yml` on cron). The server itself only needs to host Caddy, Docker, and the runner — the workflows handle the rest.
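+
+Both workflows boil down to a compose invocation pinned to a project name and an env file. The sketch below shows the shape of the production step; it is illustrative only, the committed workflow files are authoritative, and the `--build` flag is an assumption (the workflow may build images in a separate step):
+
+```bash
+# Shape of the production deploy step (illustrative, not copied from release.yml).
+# TAG comes from the pushed v* tag; --build is an assumption about where the build happens.
+TAG=v1.2.3 docker compose \
+  -f docker-compose.prod.yml \
+  -p archiv-production \
+  --env-file /opt/familienarchiv/.env.production \
+  up -d --build --wait --remove-orphans
+```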
 
-### Security checklist — complete before first boot
-
-> ⚠️ **These defaults ship in `.env.example` and `application.yaml`. Change them or you will have an insecure installation.**
-
-- [ ] Set `APP_ADMIN_PASSWORD` (default: `admin123` — change before starting the backend)
-- [ ] Set `APP_ADMIN_USERNAME` if you want a non-default admin login name (add to `.env` — not in `.env.example`)
-- [ ] Rotate `POSTGRES_PASSWORD` from `change-me`
-- [ ] Rotate `MINIO_ROOT_PASSWORD` from `change-me`
-- [ ] Set a strong `APP_OCR_TRAINING_TOKEN` (backend) and the matching `TRAINING_TOKEN` (OCR service) — both must be the same value (`python3 -c "import secrets; print(secrets.token_hex(32))"`)
-- [ ] Confirm `ALLOWED_PDF_HOSTS` is locked to your MinIO/S3 hostname — widening to `*` opens SSRF
-- [ ] Set `SPRING_PROFILES_ACTIVE=prod` in the prod overlay (not `dev,e2e` — that exposes Swagger UI and `/v3/api-docs`)
-- [ ] Use a dedicated MinIO service account for `S3_ACCESS_KEY` / `S3_SECRET_KEY`, not the root credentials
-
-### Bootstrap sequence
+### 3.1 Server one-time setup
 
 ```bash
-# 1. Copy and fill the env file
-cp .env.example .env
-# edit .env — complete the security checklist above first
+# Base hardening
+ufw default deny incoming && ufw allow 22/tcp && ufw allow 80/tcp && ufw allow 443/tcp && ufw enable
+# /etc/ssh/sshd_config: PasswordAuthentication no, PermitRootLogin no
 
-# 2. (Production only) Create the MinIO / Hetzner OBS bucket in the console
-# The dev compose has a create-buckets helper; production does not.
-# Create the bucket named $MINIO_DEFAULT_BUCKETS with private access.
+# Install Caddy 2 (https://caddyserver.com/docs/install#debian-ubuntu-raspbian)
+apt install caddy
 
-# 3. Start the stack (prod overlay — see docs/infrastructure/production-compose.md)
-# docker-compose.prod.yml is NOT committed — create it from the guide above
-docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
+# Use the Caddyfile from the repo (replace the path with the runner's clone target)
+ln -sf /opt/familienarchiv/infra/caddy/Caddyfile /etc/caddy/Caddyfile
+systemctl reload caddy
 
-# 4. Flyway migrations run automatically on backend start.
-# Watch the backend log to confirm:
-docker compose logs --follow --tail=100 backend
+# fail2ban — protect /api/auth/login from credential stuffing.
+# The jail watches the Caddy access log for 401 responses on /api/auth/login
+# (maxretry=10, findtime=10m, bantime=30m).
+apt install fail2ban
+# Drop the jail definition under /etc/fail2ban/jail.d/familienarchiv.conf
 
-# 5. Verify the stack is healthy
-curl http://localhost:8080/actuator/health
-# Expected: {"status":"UP"}
+# Tailscale — used by the backup pipeline to reach heim-nas (follow-up issue)
+curl -fsSL https://tailscale.com/install.sh | sh && tailscale up
 
-# 6. Open the app and log in with the admin credentials from .env
+# Self-hosted Gitea runner — register against the repo with a runner token
+# (see https://docs.gitea.com/usage/actions/quickstart for the register step)
 ```
 
-> **Do not use `docker-compose.ci.yml` locally** — it disables bind mounts that the dev workflow depends on.
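+
+Before registering the runner, a short sanity check catches most setup mistakes. A sketch, assuming `dig` is installed (`dnsutils` on Debian/Ubuntu):
+
+```bash
+# All host services the workflows depend on should be active
+systemctl is-active caddy fail2ban tailscaled
+
+# 80/tcp and 443/tcp must be allowed, everything else denied by default
+ufw status
+
+# Each record from §3.2 must resolve to this server
+dig +short archiv.raddatz.cloud
+dig +short staging.raddatz.cloud
+dig +short git.raddatz.cloud
+```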
+
+### 3.2 DNS records
+
+```
+archiv.raddatz.cloud     A
+staging.raddatz.cloud    A
+git.raddatz.cloud        A
+```
+
+### 3.3 Gitea secrets (Repo → Settings → Actions → Secrets)
+
+| Secret | Used by | Notes |
+|---|---|---|
+| `PROD_POSTGRES_PASSWORD` | `release.yml` | Strong unique password |
+| `PROD_MINIO_PASSWORD` | `release.yml` | MinIO root password; used only at bootstrap |
+| `PROD_MINIO_APP_PASSWORD` | `release.yml` | Application service-account password |
+| `PROD_OCR_TRAINING_TOKEN` | `release.yml` | `python3 -c "import secrets; print(secrets.token_hex(32))"` |
+| `PROD_APP_ADMIN_USERNAME` | `release.yml` | e.g. `admin@archiv.raddatz.cloud` |
+| `PROD_APP_ADMIN_PASSWORD` | `release.yml` | **⚠ locked permanently on first deploy** — see §3.5 |
+| `STAGING_POSTGRES_PASSWORD` | `nightly.yml` | Different from prod |
+| `STAGING_MINIO_PASSWORD` | `nightly.yml` | Different from prod |
+| `STAGING_MINIO_APP_PASSWORD` | `nightly.yml` | Different from prod |
+| `STAGING_OCR_TRAINING_TOKEN` | `nightly.yml` | Different from prod |
+| `STAGING_APP_ADMIN_USERNAME` | `nightly.yml` | e.g. `admin@staging.raddatz.cloud` |
+| `STAGING_APP_ADMIN_PASSWORD` | `nightly.yml` | Locked on first staging deploy |
+| `MAIL_HOST` | `release.yml` | SMTP relay hostname (prod only) |
+| `MAIL_PORT` | `release.yml` | Typically `587` |
+| `MAIL_USERNAME` | `release.yml` | SMTP user |
+| `MAIL_PASSWORD` | `release.yml` | SMTP password |
+
+### 3.4 First deploy
+
+```bash
+# 1. Trigger nightly.yml manually (Repo → Actions → nightly → "Run workflow")
+#    Expected: docker compose up -d --wait succeeds for archiv-staging
+# 2. Verify TLS + reverse proxy
+curl -I https://staging.raddatz.cloud/
+# Expected: 200 (login page) with HSTS and X-Content-Type-Options headers
+# 3. When staging looks healthy, push a v* tag to trigger release.yml
+git tag v1.0.0 && git push origin v1.0.0
+```
+
+### 3.5 ⚠ Admin password is locked on first deploy
+
+`UserDataInitializer` creates the admin user **only if the email does not exist**. The first successful deploy persists the admin password to the database. Changing `PROD_APP_ADMIN_PASSWORD` in the Gitea secrets after that point has **no effect** — the secret is only consulted when the row is missing.
+
+Before the first deploy, set `PROD_APP_ADMIN_PASSWORD` to a strong value. After the first deploy, change the admin password via the in-app account settings, not via the Gitea secret.
 
 ---
 
@@ -224,7 +264,23 @@ docker exec -i archive-db psql -U ${POSTGRES_USER} ${POSTGRES_DB} < backup-YYYYM
 ### Planned — phase 5 of Production v1 milestone
 
-Automated backup (PostgreSQL WAL archiving + MinIO bucket replication) is planned in the Production v1 milestone phase 5. Until that ships: **manual backups are the only recovery option.**
+Automated backup (nightly `pg_dump` + MinIO `mc mirror` over Tailscale to `heim-nas`) is tracked in a follow-up issue. Until that ships: **manual backups are the only recovery option.**
+
+### Rollback
+
+Each release tag corresponds to a docker image tag on the host daemon (built via DooD, i.e. Docker-outside-of-Docker; there is no registry). Rolling back to a previous tag is one command:
+
+```bash
+TAG=v1.0.0 docker compose \
+  -f docker-compose.prod.yml \
+  -p archiv-production \
+  --env-file /opt/familienarchiv/.env.production \
+  up -d --wait --remove-orphans
+```
+
+If the rollback target image is no longer present on the host (e.g. after an image prune), re-trigger `release.yml` for that tag from the Gitea Actions UI — it rebuilds and redeploys.
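+
+Before rolling back, check two things: that the target image is still on the host, and what Flyway has already applied (see the migration caveat below). A sketch; the grep filter is illustrative, read the real image names from `docker-compose.prod.yml`:
+
+```bash
+# 1. Is the rollback target image still present? (repository/tag filter is an assumption)
+docker image ls --format '{{.Repository}}:{{.Tag}}' | grep ':v1\.0\.0$'
+
+# 2. What has Flyway applied so far? (flyway_schema_history is Flyway's default table)
+docker exec -i archive-db psql -U ${POSTGRES_USER} ${POSTGRES_DB} \
+  -c 'SELECT version, description, success FROM flyway_schema_history ORDER BY installed_rank DESC LIMIT 5;'
+```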
+
+**Flyway migrations are not auto-rolled-back.** If a release contained a destructive migration (drop column, rename table), a tag rollback returns the application to a previous version, but the data shape has already changed and redeploying old code does not undo it. For breaking schema changes, prefer a forward-only fix (a new migration that repairs the issue) over a tag rollback.
 
 ---