docs(legibility): write docs/DEPLOYMENT.md (production runtime + env vars) #399

Closed
opened 2026-05-04 16:08:47 +02:00 by marcel · 10 comments

Context

Part of Epic #394 — Documentation. This is DOC-5: the production-runtime reference. Successor-X reads this when something breaks in production or when they need to redeploy. Anja reads this to understand the operational shape of the system.

Per the Legibility Rubric, this addresses C9.1, C9.2, C9.4 (all Major or Minor).

Note: the existing Production v1 milestone (#1) covers actual deployment work in 7 phases. This issue is about documenting what exists and is planned, not doing the deployment itself.

Required content

A single docs/DEPLOYMENT.md containing:

1. Deployment topology

What runs where in production:

  • Reverse proxy (Caddy per phase-3 plans) → terminates TLS
  • Frontend container → static SvelteKit build
  • Backend container → Spring Boot Jetty
  • OCR service container → Python FastAPI (single-node, see ADR-001)
  • PostgreSQL 16
  • MinIO (S3 object store)
  • Mailpit (dev only — note that production needs a real SMTP)

ASCII or Mermaid diagram preferred.

2. Environment variables (per service)

For each container, a table:

| Variable | Purpose | Default | Required? | Sensitive? |
|---|---|---|---|---|

Cover at least: DB credentials, MinIO credentials, OCR service URL, Spring profile, frontend public API URL, mailer config.

3. Bootstrap from scratch

The exact sequence to bring a fresh production environment up. References:

  • docker-compose.yml (dev) and the eventual docker-compose.prod.yml (phase-3 of Production v1)
  • Initial admin user creation
  • Initial MinIO bucket creation (currently handled by the create-buckets helper container)
  • Initial Flyway migration run (auto on backend start)

4. Logs + observability

Where to look when things break:

  • Container logs: docker compose logs <service>
  • Backend application log location
  • Future: Prometheus / Loki / Grafana per phase-7 of Production v1

5. Backup + recovery

What gets backed up, how, how often, and how to restore. References the phase-5 work in Production v1.

6. Common operational tasks

  • Reset dev DB (scripts/reset-db.sh)
  • Rebuild frontend container (scripts/rebuild-frontend.sh)
  • Download Kraken models (scripts/download-kraken-models.sh)
  • Mass import (Excel/ODS) procedure

7. Known limitations

  • Single-node OCR service (ADR-001 — link to ARCHITECTURE.md)
  • No multi-tenancy
  • No multi-region

Acceptance criteria

  • docs/DEPLOYMENT.md exists with all 7 sections
  • Every env var actually used in any docker-compose file or application*.yml is listed
  • Linked from README.md (DOC-1) and from each phase-N issue in the Production v1 milestone
  • PR opened and merged

Dependency

Soft dependency on AUDIT-5 (#392) for findings about repo hygiene + infra/.

Definition of Done

docs/DEPLOYMENT.md committed on main. Closing comment links to it and notes any deployment phase that this doc reveals as still-undocumented (those become follow-up issues in Production v1).

marcel added this to the Codebase Legibility milestone 2026-05-04 16:08:47 +02:00
marcel added the P1-high, documentation, legibility labels 2026-05-04 16:09:53 +02:00

🏗️ Markus Keller — Senior Application Architect

Observations

  • The issue mandates documenting a topology that already exists in docs/infrastructure/production-compose.md (full docker-compose.prod.yml, Caddyfile, cost breakdown) and docs/infrastructure/s3-migration.md. This material must be referenced or synthesised, not duplicated. Two canonical sources of truth for the same topology will diverge.
  • The issue links to ARCHITECTURE.md for the single-node OCR limitation (ADR-001) — that file doesn't exist yet as a standalone doc, but docs/adr/001-ocr-python-microservice.md does. The link in Section 7 should resolve to the ADR, not to a non-existent ARCHITECTURE.md.
  • The "Known limitations" section is a solid forcing function for ADR hygiene. Each limitation deserves a back-reference to its ADR, not just a mention. ADR-001 exists. No ADR exists yet for "no multi-tenancy" or "no multi-region" — both are deliberate constraints. This doc is a good prompt to add them (or at minimum note they are accepted-by-default, not accidental).
  • The issue requests docker-compose.prod.yml bootstrap instructions, but this file doesn't yet exist as a committed file — it lives documented in docs/infrastructure/production-compose.md. Section 3 ("Bootstrap from scratch") should clarify whether readers are following the documented overlay approach or a not-yet-committed prod compose file.

Recommendations

  • Open DEPLOYMENT.md with a one-sentence architecture map, then link to docs/infrastructure/production-compose.md for the full Compose file and docs/adr/ for each design decision. Don't repeat what's already in those files.
  • Replace the "See ADR-001" reference in Section 7 with the exact relative path docs/adr/001-ocr-python-microservice.md so links resolve from docs/DEPLOYMENT.md.
  • For the OCR memory limit: the current docker-compose.yml sets mem_limit: 12g on the OCR service. Section 1 (topology) should note this — it is a direct constraint on VPS sizing (CX32 has 8 GB total RAM, which means OCR cannot actually be mem-limited to 12 GB there; that's a prod-sizing gap worth calling out explicitly).
  • Mermaid diagram is the right call over ASCII for this stack — it renders in Gitea natively and stays legible when service count grows.
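
The sizing gap in the third recommendation is plain arithmetic, and a small check makes it explicit. This is only a sketch: the function names are hypothetical, it handles whole-number sizes with a `b`/`k`/`m`/`g` suffix only, and it treats the CX32's "8 GB" as 8 GiB.

```python
import re

# Binary multipliers for Compose-style size suffixes.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_mem_limit(value: str) -> int:
    """Convert a size such as '12g' or '512m' to bytes (integers only)."""
    match = re.fullmatch(r"(\d+)([bkmg])b?", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def fits_on_host(mem_limit: str, host_ram_gib: int) -> bool:
    """True if the container limit does not exceed the host's total RAM."""
    return parse_mem_limit(mem_limit) <= host_ram_gib * 1024**3


# The dev compose value against the host size discussed above.
ocr_limit_fits_cx32 = fits_on_host("12g", 8)  # evaluates to False
```

A check like this only compares the limit with total RAM. It says nothing about how much memory the OCR models actually need, which still has to be measured.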

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

  • The issue asks for scripts/rebuild-frontend.sh in Section 6. That script exists at /scripts/rebuild-frontend.sh — good. But scripts/reset-db.sh hardcodes DB_USER=archive_user and DB_NAME=family_archive_db instead of reading from .env. If someone customises those in .env, the script silently operates on wrong values. Worth noting in the doc as a "gotcha" rather than leaving it to discovery.
  • Section 2 (env vars table) will be the highest-maintenance part of this doc. The env vars are defined in three places: .env.example (the canonical dev list), docker-compose.yml (the wiring), and application.yaml (backend resolution). APP_ADMIN_USERNAME and APP_ADMIN_PASSWORD are also referenced in application.yaml but don't appear in .env.example at all. This is a gap the doc should surface.
  • Section 3 ("Bootstrap from scratch") should cover the initial admin user. Currently application.yaml has app.admin.username: ${APP_ADMIN_USERNAME:admin} and app.admin.password: ${APP_ADMIN_PASSWORD:admin123} with insecure defaults. The bootstrap procedure must include explicitly overriding these, otherwise new deployments will ship with admin/admin123.
  • The SPRING_PROFILES_ACTIVE: dev,e2e in docker-compose.yml is developer-facing only. Section 3 should note that production uses prod profile (per production-compose.md). This is documented elsewhere but easy to miss.
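
To make the fallback behaviour behind the admin/admin123 concern concrete, here is a simplified emulation of the `${NAME:default}` placeholder syntax. It is not Spring's implementation (no nested placeholders, no property sources other than a plain dict); `resolve` is a hypothetical helper for illustration only.

```python
import re

# Matches ${NAME} and ${NAME:default}; the default part is optional.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.]+)(?::([^}]*))?\}")


def resolve(value: str, env: dict[str, str]) -> str:
    """Replace each placeholder with env[NAME], else its default."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"{name} is not set and has no default")

    return _PLACEHOLDER.sub(substitute, value)


# With nothing set, the insecure default wins silently.
unset_result = resolve("${APP_ADMIN_PASSWORD:admin123}", {})
# Setting the variable overrides it.
set_result = resolve("${APP_ADMIN_PASSWORD:admin123}",
                     {"APP_ADMIN_PASSWORD": "a-long-random-value"})
```

The point for the doc: nothing fails when the variable is missing, so the bootstrap procedure has to treat setting it as a mandatory step.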

Recommendations

  • For the env vars table: derive it from .env.example (already well-documented) and application.yaml rather than writing it from scratch. Any var in application.yaml not in .env.example is a documentation gap — APP_ADMIN_USERNAME / APP_ADMIN_PASSWORD is the main one I found.
  • Include a "change these before first boot" callout box in Section 3 listing the three vars that must not remain at defaults: POSTGRES_PASSWORD, MINIO_ROOT_PASSWORD, APP_ADMIN_PASSWORD. These are the three that ship with change-me / admin123.
  • scripts/reset-db.sh correctly warns before truncating — document its scope limitation (it truncates data but doesn't drop the schema or re-run Flyway; use it for E2E resets, not full reinstalls).

🛠️ Tobias Wendt — DevOps & Platform Engineer

Observations

  • The dev docker-compose.yml has two production-unfriendly patterns the doc must flag explicitly: minio/minio:latest (unpinned tag) and bind mounts for both PostgreSQL (./data/postgres) and MinIO (./data/minio). The production overlay in docs/infrastructure/production-compose.md correctly uses named volumes — the doc should make this contrast explicit so an operator doesn't accidentally ship the dev compose to prod.
  • The OCR service mem_limit: 12g in the dev compose exceeds the recommended CX32's 8 GB total RAM. This is fine in dev (your home NAS likely has more), but deploying to a CX32 without adjusting this limit will result in the container failing to start or being killed by OOM. Section 1 should flag the OCR service memory requirements explicitly.
  • The backend healthcheck in docker-compose.yml uses wget to hit /actuator/health. The actuator endpoint isn't exposed beyond the internal Docker network in production (per the overlay), but the doc should mention that Prometheus scrapes port 8081 (management port) internally, not 8080.
  • The create-buckets container (uses minio/mc) runs without a pinned image tag and isn't in the prod overlay (correctly excluded via profiles: ["dev"]). Section 3 should document what replaces it in production — per s3-migration.md, production uses Hetzner Object Storage, where the bucket is pre-created manually in the Hetzner console.
  • The issue mentions scripts/rebuild-frontend.sh by name but the script assumes the volume is named familienarchiv_frontend_node_modules (hardcoded on line 16). If the project directory isn't named familienarchiv, this silently fails. Worth a doc note.

Recommendations

  • Section 3 (bootstrap) should include the Hetzner Object Storage bucket pre-creation step for production, since there's no create-buckets helper in prod.
  • Explicitly list in Section 2 that S3_ACCESS_KEY/S3_SECRET_KEY should be a dedicated MinIO service account in dev and a scoped Hetzner credential in prod — not the root credentials. The current .env.example uses MINIO_ROOT_USER/MINIO_ROOT_PASSWORD wired through to the app, which is the root-credential antipattern (see docs/infrastructure/production-compose.md for the correct prod approach).
  • Section 4 (logs): add docker compose logs --follow --tail=100 backend as the first-response command. Also note that the backend log is at /app/logs/ inside the container — useful for docker exec forensics.

🔒 Nora "NullX" Steiner — Application Security Engineer

Observations

  • application.yaml contains app.admin.password: ${APP_ADMIN_PASSWORD:admin123}. The fallback default admin123 ships to any deployment that doesn't set APP_ADMIN_PASSWORD in .env. The doc must treat this as a security-critical bootstrap step, not just a configuration note. A new operator who skips Section 3 will have an admin/admin123 account in production.
  • The .env.example comment for OCR_TRAINING_TOKEN correctly says "Must not be empty in production." This deserves its own row in the Section 2 env vars table with Required? = YES (prod) and Sensitive? = YES — it controls model training endpoints that accept file uploads.
  • The ALLOWED_PDF_HOSTS env var on the OCR service (default: minio,localhost,127.0.0.1) is a critical SSRF control. It is not mentioned anywhere in .env.example or docker-compose.yml. The Section 2 env vars table must include it with a note explaining why it exists — an operator who doesn't understand it might widen it to * to unblock a bug.
  • The backend management port (8081 for Prometheus) should be called out in Section 1 as internal-only. The Caddy config in docs/infrastructure/production-compose.md blocks /actuator/* — this is important and should be referenced in the security context of Section 4 (observability).
  • The BLLA_MODEL_PATH env var in the OCR service (os.environ.get("BLLA_MODEL_PATH", "/app/models/blla.mlmodel")) is not documented anywhere. It controls which baseline layout analysis model Kraken uses — worth including in the env vars table.
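
For readers unfamiliar with why ALLOWED_PDF_HOSTS matters, a host allowlist of this kind typically looks like the sketch below. This is not the OCR service's actual code: the function names are hypothetical, and only the variable name and its default come from the observation above.

```python
import os
from urllib.parse import urlsplit


def allowed_hosts(env=os.environ) -> set[str]:
    """Read the comma-separated allowlist, falling back to the default."""
    raw = env.get("ALLOWED_PDF_HOSTS", "minio,localhost,127.0.0.1")
    return {host.strip().lower() for host in raw.split(",") if host.strip()}


def is_allowed_pdf_url(url: str, hosts: set[str]) -> bool:
    """Accept only http(s) URLs whose hostname is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    return parts.hostname is not None and parts.hostname in hosts


hosts = allowed_hosts({})  # empty mapping, so the default list applies
internal_ok = is_allowed_pdf_url("http://minio:9000/bucket/scan.pdf", hosts)
metadata_blocked = is_allowed_pdf_url("http://169.254.169.254/latest/", hosts)
```

Widening the list to a wildcard would let any URL through, including cloud metadata endpoints and other internal services, which is exactly the request-forgery path the control exists to close.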

Recommendations

  • Add a "Security checklist before first deployment" box at the top of Section 3, containing: (1) change APP_ADMIN_PASSWORD, (2) set a strong OCR_TRAINING_TOKEN, (3) rotate POSTGRES_PASSWORD and MINIO_ROOT_PASSWORD from .env.example defaults, (4) confirm ALLOWED_PDF_HOSTS is locked to your MinIO/S3 hostname only.
  • Section 2 must explicitly mark POSTGRES_PASSWORD, MINIO_ROOT_PASSWORD, S3_SECRET_KEY, APP_ADMIN_PASSWORD, OCR_TRAINING_TOKEN, and MAIL_PASSWORD as Sensitive? = YES in the table — this signals to operators to use secrets injection, not hardcode them in env files.
  • The SPRING_PROFILES_ACTIVE: dev,e2e in dev compose enables OpenAPI (/v3/api-docs) and Swagger UI. Section 3 should confirm this is replaced by prod profile in production, which disables both. An accidental dev profile in production exposes the full API schema publicly.
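
The checklist above could also be enforced by a small pre-flight script run before first boot. The sketch below is an assumption about how that might look, not an existing script in the repo: `preflight` is hypothetical, and it treats both `change-me` and `admin123` as weak for every variable because the thread does not say which default belongs to which.

```python
# Values the thread identifies as shipped defaults, plus the empty string.
KNOWN_WEAK_VALUES = {"", "change-me", "admin123"}

# Secrets that must be set to something other than a known default.
MUST_BE_SET = (
    "POSTGRES_PASSWORD",
    "MINIO_ROOT_PASSWORD",
    "APP_ADMIN_PASSWORD",
    "OCR_TRAINING_TOKEN",
)


def preflight(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means OK."""
    problems = []
    for name in MUST_BE_SET:
        if env.get(name, "") in KNOWN_WEAK_VALUES:
            problems.append(f"{name} is unset or still at a known default")
    if "*" in env.get("ALLOWED_PDF_HOSTS", ""):
        problems.append("ALLOWED_PDF_HOSTS contains a wildcard")
    profiles = {p.strip() for p in env.get("SPRING_PROFILES_ACTIVE", "").split(",")}
    if "prod" not in profiles or profiles & {"dev", "e2e"}:
        problems.append("SPRING_PROFILES_ACTIVE should be prod only")
    return problems
```

In practice this would be called with `dict(os.environ)` or a parsed env file, and a non-empty result would abort the bootstrap.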

🧪 Sara Holt — QA Engineer & Test Strategist

Observations

  • The acceptance criteria say "every env var actually used in any docker-compose file or application*.yml is listed." This is testable. I'd suggest a lightweight CI check: grep -ohE '\$\{[A-Z][A-Z0-9_]*' docker-compose.yml backend/src/main/resources/application*.yaml | sort -u lists every referenced variable, including ${VAR:default} forms and names with digits such as S3_ACCESS_KEY (each name is printed with its leading ${). Running this as part of a PR that touches compose or yaml files catches omissions. The doc should note this pattern so it can be used to audit completeness after the initial write.
  • The issue says the doc should be "linked from README.md (DOC-1)". README.md doesn't exist at the repo root — only in frontend/ and in .pytest_cache/. The acceptance criteria require a link from it, which means either DOC-1 must land first, or this issue's criteria should say "link from DOC-1 when merged." Marking this criterion as unverifiable until DOC-1 exists.
  • The acceptance criterion "linked from each phase-N issue in the Production v1 milestone" requires manual updates to those issues. There's no automated check, so this should be explicitly tracked as a follow-up task in the closing comment; otherwise it will be silently skipped.
  • Section 5 (backup + recovery) mentions "phase-5 work in Production v1" but that phase hasn't been implemented. The doc must clearly distinguish between "current state" (no backup configured) and "planned state" (phase-5). Otherwise a future operator reading it might assume backups are already running.
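
The env-var completeness check from the first observation can also be written as a short script, which is easier to extend than a shell one-liner. A minimal sketch, assuming variables are always referenced with `${NAME...}` placeholder syntax; the function names are hypothetical, and variables read directly in code (for example via `os.environ.get` in the OCR service) are not caught.

```python
import re
from pathlib import Path

# Captures NAME from ${NAME}, ${NAME:default} and ${NAME:-default}.
PLACEHOLDER = re.compile(r"\$\{([A-Z][A-Z0-9_]*)")


def referenced_vars(text: str) -> set[str]:
    """Return every variable name referenced via a ${...} placeholder."""
    return set(PLACEHOLDER.findall(text))


def collect(paths: list[str]) -> list[str]:
    """Union of referenced names across files, sorted for stable output."""
    names: set[str] = set()
    for path in paths:
        names |= referenced_vars(Path(path).read_text(encoding="utf-8"))
    return sorted(names)
```

Calling `collect` with the compose file and the `application*.yaml` paths gives the list the env-var table has to cover.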

Recommendations

  • Split Section 5 into two subsections: Current state (no automated backup; manual pg_dump procedure described) and Planned (phase-5 target). This prevents the doc from being misleading on a critical operational concern.
  • The closing comment requirement — "notes any deployment phase that this doc reveals as still-undocumented" — is the right forcing function. I'd strengthen the acceptance criteria: require the closing comment to explicitly name each undocumented phase as a follow-up issue title, not just a free-text note. Turns findings into actionable tickets.
  • For the bootstrap sequence in Section 3, add an explicit verification step: after docker compose up, run curl http://localhost:8080/actuator/health and confirm {"status":"UP"} before proceeding. This is the smoke test that confirms the stack came up correctly.
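
Because the backend takes a while to start, the smoke test is more reliable as a retry loop than as a single curl. A sketch using only the standard library; the function names and the retry defaults are assumptions, and the URL from the recommendation above may need adjusting if the actuator is served on a separate management port.

```python
import json
import time
import urllib.request


def fetch_health(url: str) -> str:
    """GET the health endpoint and return the response body as text."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def is_up(body: str) -> bool:
    """True only for a JSON object whose status field is exactly 'UP'."""
    try:
        return json.loads(body).get("status") == "UP"
    except (ValueError, AttributeError):
        return False


def wait_until_up(url, attempts=30, delay=2.0,
                  fetch=fetch_health, sleep=time.sleep) -> bool:
    """Poll until the service reports UP or the attempts run out."""
    for _ in range(attempts):
        try:
            if is_up(fetch(url)):
                return True
        except OSError:
            # Connection refused, timeouts and HTTP errors all land here.
            pass
        sleep(delay)
    return False
```

A bootstrap script would call `wait_until_up("http://localhost:8080/actuator/health")` and stop if it returns `False`.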

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Observations

  • This is a technical doc for operators and future contributors, not a user-facing UI. UX concerns are minimal here. That said: the audience split matters. The issue names two readers — "Successor-X" (someone debugging at 3am) and "Anja" (understanding operational shape). These are different reading modes with very different information-density needs.
  • The section ordering in the issue puts "Deployment topology" first, which is correct for Anja. But Successor-X arriving at 3am needs "Logs + observability" and "Common operational tasks" immediately visible — those are currently Sections 4 and 6. A Table of Contents at the top of the doc will let either reader jump directly.
  • Section 6 ("Common operational tasks") will be read repeatedly in high-stress moments. Short, copy-pasteable command blocks with zero prose between the heading and the command are critical here. The existing scripts (reset-db.sh, rebuild-frontend.sh) are well-structured — the doc should show the exact invocation command, not a description of what the script does.

Recommendations

  • Add a ToC with anchor links as the first element after the intro paragraph. Gitea renders Markdown ToC links natively.
  • Structure Section 6 as a command-first format: heading → one-line command block → one-sentence "what this does" → one-sentence "when to use it." No paragraphs. Scannable in low-light at 3am.
  • Add a "First things to check" section or callout box between the intro and Section 1 — something like "If the app is down right now, go here: [Section 4 — Logs]." This is the emergency entry point for Successor-X and costs 3 lines.
  • No concerns about the doc's visual complexity beyond the above — it's developer documentation, prose and tables are appropriate, no accessibility considerations apply to a Markdown file.
Author
Owner

📋 Elicit — Requirements Engineer

Observations

  • The issue is well-specified with 7 named sections and concrete acceptance criteria. The Definition of Done is precise (file committed on main, closing comment links to it). This is ready to implement — no major requirements ambiguity.
  • One verifiability gap: AC says "Every env var actually used in any docker-compose file or application*.yml is listed." This is a completeness claim that a reviewer cannot easily verify by reading the PR. The verifiable test is a diff command (see Sara's comment for the grep approach). Without it, this criterion is assessed by trust, not evidence.
  • One dependency gap: AC says "Linked from README.md (DOC-1)" but no README.md exists at the repo root. If DOC-1 isn't merged first, this acceptance criterion is unachievable. The issue should clarify: either (a) create a minimal docs/README.md stub as part of this issue, or (b) change the AC to "linked from DOC-1 when both are merged."
  • Section 5 scope ambiguity: The issue says "References the phase-5 work in Production v1." This could mean documenting what will be done (forward-reference) or documenting what exists. Given that backups aren't implemented yet, this section risks being entirely speculative. The requirement should say explicitly: "Section 5 documents the current state (no automation, manual procedure) and links to the phase-5 issue as the planned solution."
  • Missing non-functional requirement: The doc is P1-high and under the "Codebase Legibility" milestone, but there's no freshness criterion. Who is responsible for keeping it current when the topology changes? Without an ownership statement, it will drift.

Recommendations

  • Add one sentence to the Definition of Done: "Any env var found in docker-compose.yml or application*.yaml that is not in the table is a blocking review comment." This makes the completeness criterion actionable for the reviewer.
  • Resolve the README dependency explicitly in this issue. The simplest fix: change AC to "linked from README.md when DOC-1 is merged, tracked in closing comment."
  • Add one sentence to the intro of Section 5 distinguishing current state from planned state, avoiding the impression that backup procedures are already operational.
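The completeness criterion above could be checked mechanically rather than by trust. The following is a minimal sketch, not the project's tooling: the function names and the sample placeholders are hypothetical, and it assumes env vars appear as upper-case `${NAME}`, `${NAME:-default}` (Compose) or `${NAME:default}` (Spring) placeholders.

```python
import re

# Matches the name part of ${VAR}, ${VAR:-default} and ${VAR:default}.
# Lower-case Spring property placeholders such as ${spring.x} are skipped on purpose.
ENV_REF = re.compile(r"\$\{([A-Z][A-Z0-9_]*)")


def referenced_vars(text: str) -> set[str]:
    """Return every upper-case ${NAME...} placeholder name found in the text."""
    return set(ENV_REF.findall(text))


def undocumented(config_text: str, documented: set[str]) -> list[str]:
    """Variables used in the config text but missing from the docs table, sorted."""
    return sorted(referenced_vars(config_text) - documented)


if __name__ == "__main__":
    # Illustrative snippet only; the default values here are made up.
    sample = """
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    S3_SECRET_KEY: ${S3_SECRET_KEY:-changeme}
    allowed-hosts: ${ALLOWED_PDF_HOSTS:minio,localhost}
    """
    for name in undocumented(sample, documented={"POSTGRES_PASSWORD"}):
        print(f"not in env vars table: {name}")
```

A reviewer could feed it the concatenated contents of the compose and `application*.yaml` files plus the variable names copied from the table; any line it prints would then be a blocking review comment under the proposed Definition of Done.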
Author
Owner

🗳️ Decision Queue — Action Required

3 decisions need your input before implementation starts.

Architecture

  • Duplicate vs. reference docs/infrastructure/production-compose.md — The topology and docker-compose.prod.yml are already documented in detail in docs/infrastructure/production-compose.md. Section 1 and Section 3 of DEPLOYMENT.md will either (a) duplicate that content (two sources of truth that will diverge) or (b) summarise and link to it (leaner, stays in sync automatically). Option (b) is strongly preferred architecturally, but changes the scope of this issue: DEPLOYMENT.md becomes a nav/summary doc, not a self-contained reference. (Raised by: Markus)

  • OCR mem_limit: 12g vs CX32 target VPS (8 GB RAM) — The dev compose sets mem_limit: 12g on the OCR service. The recommended production VPS is CX32 (8 GB RAM total). These are incompatible: a CX32 cannot honour a 12 GB mem limit. This is a real sizing gap. Options: (a) lower mem_limit to e.g. 6 GB in prod overlay and accept reduced batch sizes, (b) recommend CX42 (16 GB) as the production target for deployments with OCR, (c) make OCR optional in the prod compose with a note. The DEPLOYMENT.md should document whichever is chosen — this decision needs to be made first. (Raised by: Tobias, Markus)

Requirements

  • README dependency: block or decouple — AC requires a link from README.md (DOC-1), but no root-level README.md exists yet. Options: (a) resolve by adding a stub README.md as part of this issue's scope (minimal: title + link to docs/DEPLOYMENT.md), (b) drop the README link from this issue's AC and add it as a task in the DOC-1 issue, (c) create README.md as a separate pre-req commit in the same PR. Without a decision, the PR will have one unverifiable acceptance criterion. (Raised by: Sara, Elicit)
Author
Owner

Decision Queue — Resolved

The 3 decisions raised in #399#issuecomment-6340:

1. Duplicate vs reference docs/infrastructure/production-compose.md → summarise and link (Option B)

This is the same answer as epic-level D1: DOC-5 is an entry-point Day-1 checklist, not a duplicate of docs/infrastructure/. Two canonical sources for the same topology will diverge; Markus's concern is right.

DOC-5 owns:

  • The "Day-1 checklist" sequencing (run order).
  • The Secrets checklist (per Nora — fail-fast vs dev-default classification).
  • The known-limitations summary (with ADR back-references).
  • The link tree to the canonical infrastructure docs.

docs/infrastructure/production-compose.md owns: the full Compose file, Caddyfile, and step-by-step VPS provisioning. DOC-5 links to it; does not copy it.

2. OCR mem_limit: 12g vs CX32 (8 GB) → document all three options, recommend CX42 for OCR-enabled prod (Option B), with a note that this is the operator's call

The 12g value in dev compose is sized for the home NAS, not a CX32. The doc must not gloss over this — Tobias's flag is real. Recommended treatment in DOC-5 Section 1:

OCR memory requirements. The OCR service requires significant RAM for model loading and concurrent requests. The dev compose sets mem_limit: 12g. A Hetzner CX32 (8 GB RAM total) cannot host the OCR service at this limit — choose one:

  • CX42 (16 GB, ~14 EUR/month) — recommended target for OCR-enabled production. Set mem_limit: 12g in the prod overlay.
  • CX32 with reduced batch size — set mem_limit: 6g in the prod overlay; expect smaller batch sizes and slower throughput.
  • CX22 without OCR — disable the OCR service via profiles: ["ocr"] and run OCR locally / on demand only.

This documents the tradeoff so the operator chooses with eyes open. The doc is descriptive; the actual sizing decision is for the project owner to make — flag it on the issue as a follow-up if a definitive recommendation is needed before DOC-5 ships.

Note (open): the project owner may have a definitive answer here (e.g. "we are on CX22 today; OCR runs locally"). If so, document the actual current state and remove the speculative options. If still TBD, leave the table.
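The sizing arithmetic behind these options can be sketched as a small check. This is a planning aid under stated assumptions, not something Docker enforces: the function names are hypothetical, and the 2 GB reserved for the other containers and the OS is an assumed figure, not a measured one.

```python
# Binary multipliers for Compose-style size suffixes.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_mem_limit(value: str) -> int:
    """Convert a Compose-style size such as '12g', '6gb' or '512m' to bytes."""
    value = value.strip().lower()
    if value.endswith("b"):  # accept '12gb' as well as '12g'
        value = value[:-1]
    if value and value[-1] in UNITS:
        return int(float(value[:-1]) * UNITS[value[-1]])
    return int(value)  # plain number of bytes


def fits(mem_limit: str, host_ram_gb: float, reserve_gb: float = 2.0) -> bool:
    """True if the limit leaves at least reserve_gb for everything else on the host.

    reserve_gb is an assumption (database, backend, proxy, OS), not a measurement.
    """
    available = (host_ram_gb - reserve_gb) * 1024**3
    return parse_mem_limit(mem_limit) <= available


if __name__ == "__main__":
    for label, limit, ram in [("CX32", "12g", 8), ("CX32", "6g", 8), ("CX42", "12g", 16)]:
        print(f"{label}: mem_limit={limit} on {ram} GB -> fits={fits(limit, ram)}")
```

With the assumed 2 GB reserve, `12g` does not fit on the 8 GB CX32, `6g` just fits, and `12g` fits on the 16 GB CX42, which matches the three options listed above.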

3. README dependency → relax AC; track link as follow-up when DOC-1 lands

Same shape as DOC-3's resolution. DOC-5 is independently writable. Resolution:

  • DOC-5 PR adds docs/DEPLOYMENT.md. Does not require an existing README.md link to merge.
  • The DOC-1 PR (#395) adds the link to DEPLOYMENT.md as part of its Section 4 ("See also"). DOC-1 owns that cross-reference.
  • The original AC "Linked from README.md" is recorded but verifiable only after DOC-1 merges. Track in closing comment as a follow-up checkbox.

The "linked from each phase-N issue in the Production v1 milestone" criterion (per Sara): treat as a closing-comment checklist, with explicit follow-up tickets for each phase issue that needs the back-link.


📌 Additional persona feedback to fold into implementation

  • Markus: Section 7 (Known limitations) — link to docs/adr/001-ocr-python-microservice.md (real path). Add ADRs for "no multi-tenancy" and "no multi-region" if they're deliberate constraints (which they are, per family-only project frame).
  • Felix: derive the Section 2 env vars table from .env.example + application.yaml, not from scratch. Any var in application.yaml not in .env.example is a doc gap. Confirmed gap: APP_ADMIN_USERNAME / APP_ADMIN_PASSWORD ship with admin/admin123 defaults — Section 3 must include "change these before first boot" callout listing POSTGRES_PASSWORD, MINIO_ROOT_PASSWORD, APP_ADMIN_PASSWORD.
  • Felix: Document scripts/reset-db.sh scope — truncates data, doesn't drop schema or re-run Flyway. Hardcoded DB_USER=archive_user, DB_NAME=family_archive_db — note as "gotcha" if those are customised in .env.
  • Tobias: Section 1 — explicit dev-vs-prod contrast: minio/minio:latest (dev, unpinned) vs pinned tag in prod overlay; bind mounts (./data/postgres, ./data/minio) in dev vs named volumes in prod.
  • Tobias: Section 3 — Hetzner Object Storage bucket pre-creation step (replaces dev's create-buckets MinIO MC helper).
  • Tobias: management port 8081 (Prometheus scraping) — internal only, not exposed via Caddy in prod (per production-compose.md's Caddy block on /actuator/*).
  • Tobias: scripts/rebuild-frontend.sh assumes volume name familienarchiv_frontend_node_modules — flag if directory is renamed.
  • Nora: add a "Security checklist before first deployment" box at the top of Section 3:
    1. Change APP_ADMIN_PASSWORD from admin123.
    2. Set a strong OCR_TRAINING_TOKEN (must not be empty in prod).
    3. Rotate POSTGRES_PASSWORD and MINIO_ROOT_PASSWORD from .env.example defaults.
    4. Confirm ALLOWED_PDF_HOSTS (default: minio,localhost,127.0.0.1) is locked to your MinIO/S3 hostname only; widening it to * opens an SSRF hole.
    5. Confirm SPRING_PROFILES_ACTIVE=prod (not dev,e2e) — dev exposes Swagger UI and /v3/api-docs.
  • Nora: Section 2 — mark Sensitive? = YES on POSTGRES_PASSWORD, MINIO_ROOT_PASSWORD, S3_SECRET_KEY, APP_ADMIN_PASSWORD, OCR_TRAINING_TOKEN, MAIL_PASSWORD. Recommend secrets injection (Docker secrets / Kubernetes secrets), not env files.
  • Nora: add ALLOWED_PDF_HOSTS and BLLA_MODEL_PATH to env vars table (currently undocumented).
  • Sara: Section 5 — split into Current state ("no automated backup; document manual pg_dump procedure") and Planned ("phase-5 of Production v1 milestone — link to issue"). Don't imply backups exist when they don't.
  • Sara: add CI grep check: grep -ohE '\$\{[A-Z][A-Z0-9_]*' docker-compose.yml backend/src/main/resources/application*.yaml | cut -c3- | sort -u produces the canonical env-var list, including names that contain digits (S3_SECRET_KEY) and placeholders that carry a default. Run as part of any PR touching compose/yaml. Document this pattern.
  • Sara: Section 3 bootstrap — verification step: curl http://localhost:8080/actuator/health should return {"status":"UP"} before continuing.
  • Sara: closing-comment requirement — name each undocumented deployment phase as a follow-up issue title, not free-text.
  • Leonie: add ToC at top with anchor links (Gitea renders natively).
  • Leonie: Section 6 — command-first format (heading → code block → 1-sentence "what" → 1-sentence "when"). 3 a.m. scannable.
  • Leonie: add "First things to check" callout box between intro and Section 1: "If the app is down right now → §4 Logs."
  • Elicit: AC strengthening — "Any env var found in docker-compose.yml or application*.yaml that is not in the table is a blocking review comment."
  • Elicit: add ownership statement — who maintains DOC-5 currency when topology changes? (Project owner; reviewed at every milestone close.)
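Nora's security checklist above lends itself to a pre-boot check. The sketch below is one possible shape for it, with a hypothetical function name; it only tests what the checklist states explicitly. Comparing passwords against the `.env.example` defaults is left out because those values are not shown in this thread, so the database and MinIO passwords are only checked for being set.

```python
import os


def preflight(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # 1. Admin password must not be the shipped default.
    if env.get("APP_ADMIN_PASSWORD", "admin123") == "admin123":
        problems.append("APP_ADMIN_PASSWORD is unset or still admin123")

    # 2. OCR training token must not be empty in prod.
    if not env.get("OCR_TRAINING_TOKEN", "").strip():
        problems.append("OCR_TRAINING_TOKEN is empty")

    # 3. Database and object-store passwords must at least be set.
    for name in ("POSTGRES_PASSWORD", "MINIO_ROOT_PASSWORD"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    # 4. The PDF host allowlist must not be widened to a wildcard.
    hosts = [h.strip() for h in env.get("ALLOWED_PDF_HOSTS", "").split(",") if h.strip()]
    if "*" in hosts:
        problems.append("ALLOWED_PDF_HOSTS contains a wildcard")

    # 5. Only the prod profile may be active.
    profiles = {p.strip() for p in env.get("SPRING_PROFILES_ACTIVE", "").split(",")}
    if "prod" not in profiles or profiles & {"dev", "e2e"}:
        problems.append("SPRING_PROFILES_ACTIVE must be prod without dev/e2e")

    return problems


if __name__ == "__main__":
    for problem in preflight(dict(os.environ)):
        print(f"FAIL: {problem}")
```

Run before the first `docker compose up`, it would print one line per unmet checklist item and nothing when the environment looks safe.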

Status: Ready for implementation. The OCR mem_limit / VPS sizing item flagged above (D2) is the only thing the project owner may want to nail down before DOC-5 ships.

Author
Owner

DOC-5 implemented — PR #443

docs/DEPLOYMENT.md committed with all 7 sections. Key points:

Undocumented gaps surfaced:

  • APP_ADMIN_USERNAME / APP_ADMIN_PASSWORD ship with admin/admin123 defaults and are not in .env.example — both are now in the env vars table with Sensitive? = YES
  • ALLOWED_PDF_HOSTS (SSRF guard) and BLLA_MODEL_PATH (Kraken model) were absent from compose and .env.example — both added to table
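To illustrate what an allowlist such as `ALLOWED_PDF_HOSTS` guards against, here is a minimal sketch of a host check. It is not the project's implementation (that code is not shown in this thread); the function names are hypothetical, and only the comma-separated format and the default value come from the discussion above.

```python
from urllib.parse import urlsplit


def parse_allowed_hosts(raw: str) -> frozenset[str]:
    """Turn 'minio,localhost,127.0.0.1' into a set of lower-case host names."""
    return frozenset(h.strip().lower() for h in raw.split(",") if h.strip())


def is_allowed_pdf_url(url: str, allowed: frozenset[str]) -> bool:
    """Accept only http(s) URLs whose host is exactly one of the allowed names."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lower-cased, without port or userinfo
    except ValueError:  # e.g. malformed IPv6 literal
        return False
    if parts.scheme not in ("http", "https"):
        return False
    return host is not None and host in allowed


if __name__ == "__main__":
    allowed = parse_allowed_hosts("minio,localhost,127.0.0.1")
    for url in (
        "http://minio:9000/archive/a.pdf",
        "http://minio@evil.example/a.pdf",
        "file:///etc/passwd",
    ):
        print(url, "->", is_allowed_pdf_url(url, allowed))
```

The check compares the parsed host exactly instead of matching substrings, so URLs like `http://minio@evil.example/` or `http://minio.evil.example/` are rejected. A wildcard entry would defeat the comparison entirely, which is why the checklist asks for the list to stay narrow.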

Security checklist (8 items): must complete before first boot — APP_ADMIN_PASSWORD, OCR_TRAINING_TOKEN, POSTGRES_PASSWORD, MINIO_ROOT_PASSWORD, ALLOWED_PDF_HOSTS, SPRING_PROFILES_ACTIVE=prod, dedicated S3 service account

Deployment phase follow-ups (still undocumented in Production v1):

  • Phase 7 (Prometheus / Loki / Grafana) — no monitoring infrastructure in place yet
  • Phase 5 (automated backup) — no automation; manual pg_dump is the only recovery option
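Since manual `pg_dump` is currently the only recovery path, a sketch of how that manual step might be wrapped may help. Assumptions are labeled: the Compose service name `db` is a guess, the function names are hypothetical, and the user and database names are the values reported as hardcoded in `scripts/reset-db.sh`; adjust all three if `.env` customises them.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def pg_dump_command(service: str = "db",  # assumed Compose service name
                    user: str = "archive_user",
                    database: str = "family_archive_db") -> list[str]:
    """Build the docker compose invocation that streams a custom-format dump to stdout."""
    return [
        "docker", "compose", "exec", "-T", service,  # -T: no pseudo-TTY, keeps binary output intact
        "pg_dump", "-U", user, "--format=custom", database,
    ]


def run_backup(target_dir: Path) -> Path:
    """Write a timestamped dump file into target_dir and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = target_dir / f"family_archive_db-{stamp}.dump"
    with target.open("wb") as out:
        # check=True raises CalledProcessError if pg_dump exits non-zero;
        # a partial file may remain in that case and should be discarded.
        subprocess.run(pg_dump_command(), stdout=out, check=True)
    return target


if __name__ == "__main__":
    print("would run:", " ".join(pg_dump_command()))
```

A custom-format dump is restored with `pg_restore`. This covers PostgreSQL only; the objects stored in MinIO need a separate copy.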

README link: tracked here — will be filled in when PR #440 (DOC-1) merges.

PR: http://heim-nas:3005/marcel/familienarchiv/pulls/443

Reference: marcel/familienarchiv#399