# ADR-024: Grafana reads archive-db via a bridged network and a SELECT-only role

## Status

Accepted

## Context

Issue #651 (the PO Overview Grafana dashboard) needs aggregates over three tables in the main application database (`audit_log`, `documents`, and `transcription_blocks`) to answer the operator's four weekly questions: is everything working, are people using it, is the archive making progress, is OCR working well.

Until now, `obs-grafana` and the rest of the observability stack lived on their own Docker network (`obs-net`) and never touched `archiv-net`, where `archive-db` runs. The two were intentionally isolated: a compromise of any observability container could not pivot to the application database.

The PO Overview's archive-progress and user-activity panels need rolling 7-day SQL aggregates that cannot be served by Prometheus or Loki. That forces a connection from `obs-grafana` to `archive-db` for the first time.

Two implementation requirements shaped the design:

1. **Least privilege on the database side.** The Spring Boot application role (`archiv`) has full read/write on every table. Letting Grafana connect with that role would mean a Grafana compromise becomes an application compromise. The dashboard only needs SELECT on three tables; the role must reflect that and nothing more.
2. **Operational simplicity of secret rotation.** The role's password is shared between the migration that sets it and the Grafana datasource that uses it. A first version of this work put the password in a versioned Flyway migration (V68), which Flyway only applies once, leaving rotation as an out-of-band `psql ALTER ROLE` step that no runbook documented. The shape must support rotation without manual SQL.

## Decision

- Provision a dedicated PostgreSQL role `grafana_reader` with `LOGIN` plus `GRANT SELECT` on `audit_log`, `documents`, `transcription_blocks` only.
  No INSERT/UPDATE/DELETE on any table, no access to any other table: enforced by the database, locked in by both positive and parameterized negative tests in `GrafanaReaderRoleIntegrationTest`.
- Split the role's lifecycle across two migrations:
  - `V68__add_grafana_reader_role.sql`: versioned, immutable, idempotent. Creates the role and applies the grants. Runs exactly once per database, like every other versioned migration.
  - `R__grafana_reader_password.sql`: Flyway *repeatable* migration that issues `ALTER ROLE grafana_reader WITH PASSWORD '${grafanaDbPassword}'`. Flyway computes the checksum on the resolved content, so any change to `GRAFANA_DB_PASSWORD` flips the checksum and re-applies the migration on the next boot. Rotation becomes "bump env var, restart backend, restart obs-grafana"; see the runbook in `docs/DEPLOYMENT.md §4 → Rotate the grafana_reader DB password`.
- Resolve the password through Spring's `Environment` rather than a raw `System.getenv()` call, so tests inject via `application.properties` and the resolver is unit-testable with `MockEnvironment`. Fail closed with `IllegalStateException` when the variable is unset, with no fallback string. Same shape as `UserDataInitializer`'s refusal to seed default admin credentials outside dev/test/e2e.
- Join `obs-grafana` to `archiv-net` in addition to `obs-net`. Only the Grafana container crosses the boundary; Loki, Tempo, Prometheus, GlitchTip, and the worker containers remain `obs-net`-only.

## Consequences

**Positive**

- Database-level least privilege: a Grafana compromise gains SELECT on three tables. Cannot write, cannot read PII tables like `app_users`, `persons`, `notifications`, `document_comments`, `geschichten`. The parameterized PII negative sweep in `GrafanaReaderRoleIntegrationTest` is the regression gate; new sensitive tables get added to that list.
- Rotation is documented, idempotent, and survives operator turnover. No "the password set on day 1 is the password forever" failure mode.
- Tests pin down both sides of the boundary: positive grants must hold, write-deny must hold, and every table in the PII negative sweep must stay unreadable.

**Negative / trade-offs**

- `obs-net` is no longer fully isolated from `archiv-net`. A Grafana RCE (e.g. via a future Grafana CVE) gains a TCP path to `archive-db`, so a pivot is contained but no longer impossible. The least-privilege role is the mitigation; we accept that mitigation as sufficient for a single bridged container.
- The backend must hold `GRAFANA_DB_PASSWORD` in its environment forever, so Flyway can resolve the placeholder on every boot. A backend RCE therefore also leaks the Grafana datasource password. Acceptable because that password's blast radius is itself bounded by the least-privilege grants on `grafana_reader`.

## Alternatives considered

- **Prometheus PostgreSQL exporter, no direct connection.** Loses ad-hoc SQL aggregates: the dashboard would need every metric pre-defined as an exporter query, with a redeploy to add a new one. The PO Overview is the type of dashboard that grows panels over time; pre-defining every aggregate is the wrong shape.
- **Read replica or logical-replication slot dedicated to Grafana.** Real operational cost (extra Postgres instance, replication monitoring, storage doubled) disproportionate to a weekly PO glance.
- **Versioned migration with `flyway repair` for rotation.** Rejected: conflates schema lifecycle with credential lifecycle, requires manual intervention to rotate, and the repair command's semantics are surprising to operators unfamiliar with Flyway internals.
- **Hardcoded fallback password when env var is unset.** Rejected as a security blocker: publishes a known credential for a role with read access to user activity and full letter text. The fail-closed behavior is the explicit defense.
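
For orientation, the two-migration split described under Decision could look roughly like the sketch below. This is an illustration under stated assumptions, not the contents of the actual files: the `DO` guard is one common way to make role creation idempotent, and the sketch assumes the three tables live in a schema the role can already use (for example the default `public` schema). Only the `ALTER ROLE` statement is quoted from this ADR.

```sql
-- Sketch of V68__add_grafana_reader_role.sql (assumed content, not copied from the repository).
-- PostgreSQL has no CREATE ROLE IF NOT EXISTS, so the creation is guarded
-- by a lookup in pg_roles to keep the script safe to run more than once.
DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'grafana_reader') THEN
        CREATE ROLE grafana_reader LOGIN;
    END IF;
END
$$;

-- SELECT on exactly the three dashboard tables; no write privileges, no other tables.
GRANT SELECT ON audit_log, documents, transcription_blocks TO grafana_reader;

-- Sketch of R__grafana_reader_password.sql (separate file).
-- Flyway substitutes the placeholder before executing the statement, so a new
-- GRAFANA_DB_PASSWORD value changes the resolved content and triggers a re-run.
ALTER ROLE grafana_reader WITH PASSWORD '${grafanaDbPassword}';
```

A quick manual check of the boundary, separate from the integration test, is PostgreSQL's built-in `has_table_privilege` function: `SELECT has_table_privilege('grafana_reader', 'documents', 'SELECT');` should return true once the grant is applied, and the same call with `'INSERT'`, or against a table such as `app_users`, should return false as long as nothing has been granted to `PUBLIC` on those tables.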
## References

- Issue #651: PO Overview Grafana dashboard
- `backend/src/main/resources/db/migration/V68__add_grafana_reader_role.sql`
- `backend/src/main/resources/db/migration/R__grafana_reader_password.sql`
- `backend/src/main/java/org/raddatz/familienarchiv/config/FlywayConfig.java`
- `backend/src/test/java/org/raddatz/familienarchiv/config/GrafanaReaderRoleIntegrationTest.java`
- `infra/observability/grafana/provisioning/datasources/datasources.yml`
- `docker-compose.observability.yml`: `archiv-net` bridge on `obs-grafana`
- `docs/DEPLOYMENT.md §4`: rotation runbook