Add automated PostgreSQL backup script with offsite upload #138

Open
opened 2026-03-28 08:52:49 +01:00 by marcel · 1 comment
Owner

Why

There is currently no backup automation. PostgreSQL data lives in ./data/postgres/ (a host bind mount). If the VPS is lost, the disk fails, or someone runs docker compose down -v, all family archive data is gone permanently. For a family archive storing irreplaceable scanned documents and metadata, this is the highest-consequence gap in the entire production setup.

Object storage (Hetzner S3) provides built-in durability for the uploaded files — the database is the unprotected part.

What to do

1. Create scripts/backup-db.sh

Nightly logical backup using pg_dump, compressed, uploaded to Hetzner Object Storage via rclone.

#!/bin/bash
set -euo pipefail

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/opt/backups/postgres"
BACKUP_FILE="${BACKUP_DIR}/dump_${TIMESTAMP}.sql.gz"

mkdir -p "$BACKUP_DIR"

# Dump and compress in one pass — no uncompressed intermediate file
docker exec archive-db pg_dump \
  -U "$POSTGRES_USER" "$POSTGRES_DB" \
  --format=plain \
  --no-owner \
  | gzip > "$BACKUP_FILE"

echo "Backup created: $BACKUP_FILE ($(du -sh "$BACKUP_FILE" | cut -f1))"

# Upload to Hetzner Object Storage
rclone copy "$BACKUP_FILE" hetzner-s3:archive-db-backups/

# Remove local copies older than 7 days
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +7 -delete

echo "Backup complete."

2. Create scripts/restore-test.sh

Monthly smoke test — restores the latest backup into a throwaway container and confirms it starts cleanly. A backup you have never tested is not a backup.

#!/bin/bash
set -euo pipefail

LATEST=$(ls -t /opt/backups/postgres/*.sql.gz | head -1)
echo "Testing restore of: $LATEST"

docker run -d --name pg-restore-test \
  -e POSTGRES_PASSWORD=test \
  -e POSTGRES_USER=test \
  -e POSTGRES_DB=test \
  postgres:16-alpine

sleep 5
zcat "$LATEST" | docker exec -i pg-restore-test psql -U test -d test -q
echo "Restore test passed."
docker rm -f pg-restore-test

3. Schedule with cron on the VPS

# /etc/cron.d/familienarchiv-backup
# Nightly backup at 02:00
0 2 * * * root /opt/familienarchiv/scripts/backup-db.sh >> /var/log/familienarchiv-backup.log 2>&1

# Monthly restore test on the 1st at 03:00
0 3 1 * * root /opt/familienarchiv/scripts/restore-test.sh >> /var/log/familienarchiv-restore-test.log 2>&1

4. Configure rclone for Hetzner Object Storage

# On the VPS — one-time setup
rclone config create hetzner-s3 s3 \
  provider=Other \
  access_key_id=<hetzner-access-key> \
  secret_access_key=<hetzner-secret-key> \
  endpoint=https://fsn1.your-objectstorage.com \
  acl=private

Create a dedicated bucket archive-db-backups (separate from archive-documents used by the app).

What this gives us

  • 7-day rolling local backup history
  • Offsite copy on Hetzner Object Storage (geographically separate from the VPS)
  • Monthly proof that restores actually work
  • Full audit trail in /var/log/familienarchiv-backup.log

Out of scope for this issue

WAL-G continuous archiving (point-in-time recovery) is a further improvement but not required for v1. A nightly pg_dump with offsite upload is sufficient for a family archive with low write volume.

Acceptance criteria

  • scripts/backup-db.sh runs without error on the VPS and produces a .sql.gz in /opt/backups/postgres/.
  • The backup file is visible in the archive-db-backups bucket on Hetzner Object Storage.
  • scripts/restore-test.sh successfully restores the latest backup into a throwaway container.
  • Cron jobs are installed and visible in /etc/cron.d/familienarchiv-backup.
  • Backup log shows successful runs for at least 3 consecutive days.
## Why There is currently no backup automation. PostgreSQL data lives in `./data/postgres/` (a host bind mount). If the VPS is lost, the disk fails, or someone runs `docker compose down -v`, all family archive data is gone permanently. For a family archive storing irreplaceable scanned documents and metadata, this is the highest-consequence gap in the entire production setup. Object storage (Hetzner S3) provides built-in durability for the uploaded files — the database is the unprotected part. ## What to do ### 1. Create `scripts/backup-db.sh` Nightly logical backup using `pg_dump`, compressed, uploaded to Hetzner Object Storage via `rclone`. ```bash #!/bin/bash set -euo pipefail TIMESTAMP=$(date +%Y%m%d_%H%M%S) BACKUP_DIR="/opt/backups/postgres" BACKUP_FILE="${BACKUP_DIR}/dump_${TIMESTAMP}.sql.gz" mkdir -p "$BACKUP_DIR" # Dump and compress in one pass — no uncompressed intermediate file docker exec archive-db pg_dump \ -U "$POSTGRES_USER" "$POSTGRES_DB" \ --format=plain \ --no-owner \ | gzip > "$BACKUP_FILE" echo "Backup created: $BACKUP_FILE ($(du -sh "$BACKUP_FILE" | cut -f1))" # Upload to Hetzner Object Storage rclone copy "$BACKUP_FILE" hetzner-s3:archive-db-backups/ # Remove local copies older than 7 days find "$BACKUP_DIR" -name "*.sql.gz" -mtime +7 -delete echo "Backup complete." ``` ### 2. Create `scripts/restore-test.sh` Monthly smoke test — restores the latest backup into a throwaway container and confirms it starts cleanly. A backup you have never tested is not a backup. ```bash #!/bin/bash set -euo pipefail LATEST=$(ls -t /opt/backups/postgres/*.sql.gz | head -1) echo "Testing restore of: $LATEST" docker run -d --name pg-restore-test \ -e POSTGRES_PASSWORD=test \ -e POSTGRES_USER=test \ -e POSTGRES_DB=test \ postgres:16-alpine sleep 5 zcat "$LATEST" | docker exec -i pg-restore-test psql -U test -d test -q echo "Restore test passed." docker rm -f pg-restore-test ``` ### 3. Schedule with cron on the VPS ``` # /etc/cron.d/familienarchiv-backup # Nightly backup at 02:00 0 2 * * * root /opt/familienarchiv/scripts/backup-db.sh >> /var/log/familienarchiv-backup.log 2>&1 # Monthly restore test on the 1st at 03:00 0 3 1 * * root /opt/familienarchiv/scripts/restore-test.sh >> /var/log/familienarchiv-restore-test.log 2>&1 ``` ### 4. Configure rclone for Hetzner Object Storage ```bash # On the VPS — one-time setup rclone config create hetzner-s3 s3 \ provider=Other \ access_key_id=<hetzner-access-key> \ secret_access_key=<hetzner-secret-key> \ endpoint=https://fsn1.your-objectstorage.com \ acl=private ``` Create a dedicated bucket `archive-db-backups` (separate from `archive-documents` used by the app). ## What this gives us - 7-day rolling local backup history - Offsite copy on Hetzner Object Storage (geographically separate from the VPS) - Monthly proof that restores actually work - Full audit trail in `/var/log/familienarchiv-backup.log` ## Out of scope for this issue WAL-G continuous archiving (point-in-time recovery) is a further improvement but not required for v1. A nightly `pg_dump` with offsite upload is sufficient for a family archive with low write volume. ## Acceptance criteria - `scripts/backup-db.sh` runs without error on the VPS and produces a `.sql.gz` in `/opt/backups/postgres/`. - The backup file is visible in the `archive-db-backups` bucket on Hetzner Object Storage. - `scripts/restore-test.sh` successfully restores the latest backup into a throwaway container. - Cron jobs are installed and visible in `/etc/cron.d/familienarchiv-backup`. - Backup log shows successful runs for at least 3 consecutive days.
marcel added the devopsphase-5: backups labels 2026-03-28 10:46:44 +01:00
Author
Owner

Audit confirmation + scope expansion (2026-05-07)

Pre-prod audit confirms zero backup infrastructure: no backup sidecar in docker-compose.yml, no pg_dump/wal-g/restic, no MinIO bucket versioning or replication, no documented RTO/RPO.

# docker-compose.prod.yml
db-backup:
  image: prodrigestivill/postgres-backup-local:16
  environment:
    POSTGRES_HOST: db
    POSTGRES_DB: ${POSTGRES_DB}
    POSTGRES_USER: ${POSTGRES_USER}
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    SCHEDULE: "@daily"
    BACKUP_KEEP_DAYS: 30
    BACKUP_KEEP_WEEKS: 4
    BACKUP_KEEP_MONTHS: 12
    HEALTHCHECK_PORT: 8080
  volumes: [./data/backups/postgres:/backups]

MinIO — please add to scope

The audit found MinIO is currently a single-node bind-mount with no versioning. Letters/PDFs in MinIO are irreplaceable scanned originals — losing the bucket without recovery is total data loss for the archive's central asset.

  • mc version enable myminio/$BUCKET — bucket versioning on
  • mc replicate add to a second MinIO instance OR a real S3-compatible target
  • OR mc mirror --remove --watch to off-host storage (Hetzner Storage Box, B2)

Off-host upload

The current title says "offsite upload" — please make this concrete. Recommendation: restic + Hetzner Storage Box (you already have Hetzner per #141 Tailscale issue), encrypted with RESTIC_PASSWORD from a separate secret.

Critical AC — restore must be tested

  • Restore-test runbook in docs/RUNBOOK.md: spin up an empty stack, run the restore script, verify a known document loads end-to-end. The audit's harshest finding is "untested backups = no backups."
  • Restore-test scheduled monthly via cron, with a smoke email/notification on success.
  • RTO target documented (recommend 4h) and RPO target (recommend 24h, weekly off-host).

Tracked in audit doc as F-04 (Critical). See docs/audits/2026-05-07-pre-prod-architectural-review.md.

## Audit confirmation + scope expansion (2026-05-07) Pre-prod audit confirms zero backup infrastructure: no backup sidecar in `docker-compose.yml`, no `pg_dump`/`wal-g`/`restic`, no MinIO bucket versioning or replication, no documented RTO/RPO. ### Postgres — recommended approach ```yaml # docker-compose.prod.yml db-backup: image: prodrigestivill/postgres-backup-local:16 environment: POSTGRES_HOST: db POSTGRES_DB: ${POSTGRES_DB} POSTGRES_USER: ${POSTGRES_USER} POSTGRES_PASSWORD: ${POSTGRES_PASSWORD} SCHEDULE: "@daily" BACKUP_KEEP_DAYS: 30 BACKUP_KEEP_WEEKS: 4 BACKUP_KEEP_MONTHS: 12 HEALTHCHECK_PORT: 8080 volumes: [./data/backups/postgres:/backups] ``` ### MinIO — please add to scope The audit found MinIO is currently a single-node bind-mount with no versioning. Letters/PDFs in MinIO are **irreplaceable scanned originals** — losing the bucket without recovery is total data loss for the archive's central asset. - [ ] `mc version enable myminio/$BUCKET` — bucket versioning on - [ ] `mc replicate add` to a second MinIO instance OR a real S3-compatible target - [ ] OR `mc mirror --remove --watch` to off-host storage (Hetzner Storage Box, B2) ### Off-host upload The current title says "offsite upload" — please make this concrete. Recommendation: `restic` + Hetzner Storage Box (you already have Hetzner per #141 Tailscale issue), encrypted with `RESTIC_PASSWORD` from a separate secret. ### Critical AC — restore must be tested - [ ] **Restore-test runbook** in `docs/RUNBOOK.md`: spin up an empty stack, run the restore script, verify a known document loads end-to-end. The audit's harshest finding is "untested backups = no backups." - [ ] Restore-test scheduled monthly via cron, with a smoke email/notification on success. - [ ] RTO target documented (recommend 4h) and RPO target (recommend 24h, weekly off-host). Tracked in audit doc as **F-04** (Critical). See `docs/audits/2026-05-07-pre-prod-architectural-review.md`.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#138