docs(adr): ADR-010 MinIO stays self-hosted, Hetzner OBS deferred

Records the reversal of the earlier "migrate to Hetzner Object Storage"
direction in docs/infrastructure/production-compose.md. Documents the
cost/benefit (current 13 GB fits trivially on the VPS; OBS billing is
dominated by base fee at this size; migration is a three-env-var swap
plus `mc mirror`, no application rewrite cost).

Captures the four triggers that should re-open the decision (50 GB
threshold, healthcheck latency, VPS upgrade cost, backup runtime) so
the deferral does not become an indefinite punt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Marcel
2026-05-11 13:15:38 +02:00
parent 59bc81d353
commit b57afb9ad2

View File

@@ -0,0 +1,53 @@
# ADR-010: MinIO stays self-hosted on the production VPS
## Status
Accepted
## Context
`docs/infrastructure/production-compose.md` (pre-this-PR) sketched a production topology in which the application bucket migrates from in-cluster MinIO to Hetzner Object Storage (OBS, S3-compatible). The motivation was operational: one less service to back up, no MinIO RAM/disk pressure on the VPS, hand off durability to the hyperscaler.
Two facts revisited at pre-merge review (issue #497, comment #8331) changed the answer:
1. **Current data size is small.** The archive is ~13 GB of file uploads (Kurrent letters, scanned ODS files, attachment PDFs). Hetzner OBS billing on this size is dominated by the per-month base fee (~5 EUR/mo for the smallest unit), not capacity or egress. The break-even point against the VPS's existing disk is far above the current footprint.
2. **MinIO is already production-grade.** The dev stack uses MinIO; the backend already drives it via the AWS SDK v2 with a generic `S3_ENDPOINT`. Switching providers is a runtime env-var change (`S3_ENDPOINT`, `S3_ACCESS_KEY`, `S3_SECRET_KEY`) plus an `mc mirror` to copy objects. There is no application-level rewrite cost waiting.
If Hetzner OBS were a one-way-door (provider-specific SDK, complex IAM integration, multi-month migration), the decision would deserve a serious weighing. As reversible as the migration is, deferring it costs nothing.
## Decision
MinIO stays on the production VPS for the first launch. The application bucket is created and managed inside the docker-compose stack (`infra/minio/bootstrap.sh`). The backend uses a least-privilege service account (`archiv-app`) with a bucket-scoped IAM policy, not the MinIO root credentials.
Hetzner Object Storage is **explicitly deferred**, not rejected. The migration path is documented as a runbook in `docs/DEPLOYMENT.md` (when the trigger fires): provision an OBS bucket, run `mc mirror local-minio:/familienarchiv obs:/familienarchiv`, rotate the three env vars, restart the backend, decommission the MinIO service from `docker-compose.prod.yml`.
## Triggers to re-evaluate
Revisit the decision when **any** of the following holds:
- The `minio-data` volume exceeds 50 GB and is growing > 5 GB/month.
- MinIO healthcheck latency exceeds 200 ms p95 (signal of disk pressure on the host).
- The VPS upgrade required to keep MinIO healthy costs more per month than the equivalent OBS bucket + traffic.
- Backup of the MinIO volume to `heim-nas` over Tailscale (deferred follow-up) is implemented and consistently runs > 30 min nightly. At that point durability-as-a-service starts paying for itself.
The migration runbook in `docs/DEPLOYMENT.md` is the script for executing the swap when one of the triggers fires.
## Alternatives Considered
| Alternative | Why rejected (for now) |
|---|---|
| Migrate to Hetzner Object Storage in this PR | Premature. Adds an external dependency, locks the operator into the Hetzner ecosystem before the data has demonstrated it needs hyperscaler durability, blocks the PR on a migration that buys ~5 GB of headroom. |
| Migrate to S3 (AWS) for HA across regions | Way over-spec for a family archive. Egress cost would dwarf any benefit; durability concerns at this size are addressed by nightly off-site backup, not by multi-region replication. |
| Drop S3 abstraction entirely; store files directly on the VPS disk | Possible, but loses the bucket-policy IAM surface (least-privilege service account), loses presigned-URL flow (OCR service downloads files via short-lived URLs, not via shared filesystem), loses the migration path to OBS. The S3 indirection is cheap insurance. |
| Self-hosted on-VPS plus periodic `mc mirror` to Hetzner OBS for off-site backup | This is the **target** for the backup pipeline follow-up. Treated as backup, not primary — primary stays MinIO. |
## Consequences
- The production VPS sizing (Hetzner CX42, 16 GB RAM, 80 GB disk) must accommodate MinIO's working set. Current footprint leaves ample headroom.
- Backup of MinIO data is the operator's responsibility until the off-site `mc mirror` pipeline is implemented (deferred follow-up). The DEPLOYMENT.md rollback procedure explicitly flags this — manual backup is the only recovery option until the pipeline ships.
- The backend never sees the MinIO root password; it uses the `archiv-app` service account with a bucket-scoped IAM policy (see `infra/minio/bootstrap.sh`). A backend RCE/SSRF cannot escalate beyond the `familienarchiv` bucket.
- The migration to Hetzner OBS remains a small, well-understood runbook step rather than a major refactor. No application code, no SDK swap.
## Future Direction
When one of the triggers above fires, the migration is: provision OBS bucket → `mc mirror` → rotate three env vars → restart backend → remove MinIO service from compose. The bucket-scoped policy translates 1:1 to an OBS user policy (S3-compatible).