feat(observability): PO Overview Grafana dashboard (#651) #659

Open
marcel wants to merge 9 commits from feat/issue-651-grafana-po-overview into main
Owner

Summary

Provisioned a single Grafana dashboard the product owner can open once a week to answer four questions at a glance — is everything working, are people using it, is the archive making progress, is OCR working well — without needing a developer to drag SQL out of audit_log. Closes #651.

Architecture

.env / CI ──GRAFANA_DB_PASSWORD──┬─► archive-backend ──Flyway V68 (${grafanaDbPassword})──► grafana_reader role
                                 │                                                          (SELECT-only on
                                 │                                                           audit_log, documents,
                                 │                                                           transcription_blocks)
                                 │
                                 └─► obs-grafana ──postgres datasource──► archive-db:5432 (joined to archiv-net)

Decisions taken (from the issue's decision queue)

  • Password injection: Flyway placeholder substitution via FlywayConfig.placeholders(Map.of(...)). Atomic, idempotent (CREATE on first run, ALTER on re-run), no separate bootstrap script.
  • Transcription coverage: two adjacent panels — a full-width bar gauge for % and a separate stat for Blocks Transcribed This Week. The TEXT_SAVED weekly count was already a dedicated panel in Row 3, so the "+N this week" badge collapses naturally into it.
  • Row 4 placeholder: none — #652 is closed, so the OCR panels use real Prometheus queries from day one.

What's in the PR

File Why
backend/.../db/migration/V68__add_grafana_reader_role.sql Idempotent grafana_reader with SELECT on audit_log, documents, transcription_blocks.
backend/.../config/FlywayConfig.java Adds .placeholders(Map.of("grafanaDbPassword", System.getenv(...))). Logs a warning + uses a non-secret fallback if the env var is missing, so tests/CI don't fail loudly when the secret is irrelevant.
backend/.../config/GrafanaReaderRoleIntegrationTest.java Testcontainers test: positive grants on the 3 tables, negative grant on app_users (NFR-SEC-01).
infra/observability/grafana/provisioning/datasources/datasources.yml Adds postgres datasource, editable: false, sslmode: disable (intra-network).
infra/observability/grafana/provisioning/dashboards/po-overview.json 4 rows × 16 panels, uid: po-overview, 7-day default range, manual refresh, panel titles per Leonie's recommendations.
docker-compose.observability.yml obs-grafana joins archiv-net and receives GRAFANA_DB_PASSWORD.
docker-compose.yml archive-backend receives GRAFANA_DB_PASSWORD (Flyway needs it).
.env.example, infra/observability/obs.env Declares the variable + openssl rand -hex 32 hint.
docs/DEPLOYMENT.md Adds the var to the observability env-var table, Gitea secrets table, and obs-secrets.env reference.
docs/architecture/c4/l2-containers.puml New Grafana → archive-db arrow.

Test plan

  • ./mvnw test -Dtest=GrafanaReaderRoleIntegrationTest — 4/4 green (positive + negative privilege assertions)
  • ./mvnw test -Dtest=AuditLogQueryRepositoryIntegrationTest — still 4/4 green (Flyway placeholder didn't break the existing migration suite)
  • ./mvnw compile — clean
  • Manual smoke (operator step, Sara's recommendation):
    export GRAFANA_DB_PASSWORD=$(openssl rand -hex 32)
    docker compose down -v && docker compose up -d
    docker compose -f docker-compose.observability.yml up -d
    docker exec archive-db psql -U archiv -d familienarchiv -c \
      "SELECT has_table_privilege('grafana_reader','audit_log','SELECT');"   # → t
    docker exec archive-db psql -U archiv -d familienarchiv -c \
      "SELECT has_table_privilege('grafana_reader','app_users','SELECT');"   # → f
    
    Open http://localhost:3003 → PO Overview → confirm every panel populates (not "No data") and the default range is "Last 7 days".

Out of scope

Alerting, per-user breakdown, per-sender OCR accuracy, mobile layout — all explicit in the issue.

🤖 Generated with Claude Code

## Summary Provisioned a single Grafana dashboard the product owner can open once a week to answer four questions at a glance — is everything working, are people using it, is the archive making progress, is OCR working well — without needing a developer to drag SQL out of `audit_log`. Closes #651. ## Architecture ``` .env / CI ──GRAFANA_DB_PASSWORD──┬─► archive-backend ──Flyway V68 (${grafanaDbPassword})──► grafana_reader role │ (SELECT-only on │ audit_log, documents, │ transcription_blocks) │ └─► obs-grafana ──postgres datasource──► archive-db:5432 (joined to archiv-net) ``` ## Decisions taken (from the issue's decision queue) - **Password injection**: Flyway placeholder substitution via `FlywayConfig.placeholders(Map.of(...))`. Atomic, idempotent (`CREATE` on first run, `ALTER` on re-run), no separate bootstrap script. - **Transcription coverage**: two adjacent panels — a full-width bar gauge for `%` and a separate stat for `Blocks Transcribed This Week`. The `TEXT_SAVED` weekly count was already a dedicated panel in Row 3, so the "+N this week" badge collapses naturally into it. - **Row 4 placeholder**: none — #652 is closed, so the OCR panels use real Prometheus queries from day one. ## What's in the PR | File | Why | |---|---| | `backend/.../db/migration/V68__add_grafana_reader_role.sql` | Idempotent `grafana_reader` with SELECT on `audit_log`, `documents`, `transcription_blocks`. | | `backend/.../config/FlywayConfig.java` | Adds `.placeholders(Map.of("grafanaDbPassword", System.getenv(...)))`. Logs a warning + uses a non-secret fallback if the env var is missing, so tests/CI don't fail loudly when the secret is irrelevant. | | `backend/.../config/GrafanaReaderRoleIntegrationTest.java` | Testcontainers test: positive grants on the 3 tables, negative grant on `app_users` (NFR-SEC-01). | | `infra/observability/grafana/provisioning/datasources/datasources.yml` | Adds `postgres` datasource, `editable: false`, `sslmode: disable` (intra-network). | | `infra/observability/grafana/provisioning/dashboards/po-overview.json` | 4 rows × 16 panels, `uid: po-overview`, 7-day default range, manual refresh, panel titles per Leonie's recommendations. | | `docker-compose.observability.yml` | `obs-grafana` joins `archiv-net` and receives `GRAFANA_DB_PASSWORD`. | | `docker-compose.yml` | `archive-backend` receives `GRAFANA_DB_PASSWORD` (Flyway needs it). | | `.env.example`, `infra/observability/obs.env` | Declares the variable + `openssl rand -hex 32` hint. | | `docs/DEPLOYMENT.md` | Adds the var to the observability env-var table, Gitea secrets table, and obs-secrets.env reference. | | `docs/architecture/c4/l2-containers.puml` | New Grafana → archive-db arrow. | ## Test plan - [x] `./mvnw test -Dtest=GrafanaReaderRoleIntegrationTest` — 4/4 green (positive + negative privilege assertions) - [x] `./mvnw test -Dtest=AuditLogQueryRepositoryIntegrationTest` — still 4/4 green (Flyway placeholder didn't break the existing migration suite) - [x] `./mvnw compile` — clean - [ ] **Manual smoke (operator step, Sara's recommendation)**: ```bash export GRAFANA_DB_PASSWORD=$(openssl rand -hex 32) docker compose down -v && docker compose up -d docker compose -f docker-compose.observability.yml up -d docker exec archive-db psql -U archiv -d familienarchiv -c \ "SELECT has_table_privilege('grafana_reader','audit_log','SELECT');" # → t docker exec archive-db psql -U archiv -d familienarchiv -c \ "SELECT has_table_privilege('grafana_reader','app_users','SELECT');" # → f ``` Open `http://localhost:3003` → PO Overview → confirm every panel populates (not "No data") and the default range is "Last 7 days". ## Out of scope Alerting, per-user breakdown, per-sender OCR accuracy, mobile layout — all explicit in the issue. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
marcel added 9 commits 2026-05-21 20:22:25 +02:00
Add Flyway V68 migration that provisions a read-only PostgreSQL role
scoped to audit_log, documents, and transcription_blocks. The role's
password is injected via the new ${grafanaDbPassword} Flyway placeholder,
which FlywayConfig reads from the GRAFANA_DB_PASSWORD env var. The
migration is idempotent: CREATE on first run, ALTER on re-run.

Adds a Testcontainers integration test asserting positive grants on the
three intended tables and a negative grant on app_users (NFR-SEC-01).

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
obs-grafana now joins archiv-net so it can resolve archive-db:5432 for the
PO Overview dashboard's PostgreSQL datasource, and receives GRAFANA_DB_PASSWORD
so provisioning can interpolate it into the datasource config.

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Flyway runs inside the backend container at startup; V68's
${grafanaDbPassword} placeholder is resolved from this env var.

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a read-only datasource pointing at archive-db using the grafana_reader
role (provisioned by Flyway V68). The password is interpolated from the
GRAFANA_DB_PASSWORD env var passed to obs-grafana, and the connection is
locked to editable: false so the credential cannot be inspected via the UI.

sslmode=disable is intentional: traffic stays inside archiv-net.

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
.env.example: declare GRAFANA_DB_PASSWORD with an openssl rand -hex 32 hint
so a missing value fails loudly (NFR-OPS-02). obs.env: add a comment
explaining that the real value comes from CI's obs-secrets.env, matching
the pattern used for other secrets in that file.

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Provisioned dashboard for the product owner's weekly check-in: system
health (Prometheus + Loki), user activity (PostgreSQL audit_log), archive
progress (PostgreSQL transcription_blocks + audit_log), and OCR quality
(Prometheus ocr-service metrics). Default range 7d, manual refresh,
thresholds per the issue spec.

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds GRAFANA_DB_PASSWORD to the observability-stack env-var table, the
Gitea secrets table, and the obs-secrets.env reference, so operators see
the variable wherever they look for related secrets.

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the new read-only connection from Grafana to archive-db (via the
grafana_reader role) introduced by the PO Overview dashboard.

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ci(observability): inject GRAFANA_DB_PASSWORD from Gitea secrets
All checks were successful
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s
CI / Unit & Component Tests (pull_request) Successful in 3m32s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m30s
bcba4dab80
Wires the new GRAFANA_DB_PASSWORD secret through the deploy pipeline:

- docker-compose.prod.yml: backend env now passes GRAFANA_DB_PASSWORD
  through so Flyway V68 can resolve the ${grafanaDbPassword} placeholder
  in production and staging (it already worked in local dev via
  docker-compose.yml).
- release.yml + nightly.yml: declare GRAFANA_DB_PASSWORD as a required
  Gitea secret, write it into .env.production / .env.staging (consumed
  by archive-backend), and into /opt/familienarchiv/obs-secrets.env
  (consumed by obs-grafana's PostgreSQL datasource).

Operator action before the next deploy: add a GRAFANA_DB_PASSWORD value
to the Gitea repo secrets (openssl rand -hex 32).

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
marcel force-pushed feat/issue-651-grafana-po-overview from 3a0290ab35 to bcba4dab80 2026-05-21 20:22:25 +02:00 Compare
All checks were successful
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s
CI / Unit & Component Tests (pull_request) Successful in 3m32s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m30s
This pull request can be merged automatically.
You are not authorized to merge this pull request.
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin feat/issue-651-grafana-po-overview:feat/issue-651-grafana-po-overview
git checkout feat/issue-651-grafana-po-overview
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#659