observability: add /api/dashboard/activity p95 latency panel to Grafana #291


marcel · 2026-04-20T18:13:33+02:00

marcel commented

2026-04-20 18:13:33 +02:00

Background

Deferred from PR #288 during review cycle 1 (Tobias Wendt).

The rollup query + partial covering index introduced in #285 makes /api/dashboard/activity a new hot path consumed by both /chronik and the dashboard side-rail. If the V49 index is missing on some environment, or a future query regression hits, the symptom is a slow feed — before anything else breaks.

Concern

No Grafana panel for /api/dashboard/activity request rate + latency. A silent regression would go unnoticed until users complain.

Scope

Grafana panel: request rate (req/s), p50 / p95 / p99 latency.
Alertmanager rule: p95 > 500 ms sustained 5 min.
Add to the existing "API performance" dashboard (or create one if none exists for app-level latency).

Reference

PR: http://heim-nas:3005/marcel/familienarchiv/pulls/288
Parent issue: feat: unify /notifications and dashboard activity feed into a /chronik page (#285)
Index migration: backend/src/main/resources/db/migration/V49__add_audit_log_rollup_index.sql


marcel referenced this issue

2026-04-20 18:23:59 +02:00

feat: unify /notifications and dashboard activity feed into a /chronik page #288

marcel referenced this issue

2026-04-20 18:24:51 +02:00

feat: unify /notifications and dashboard activity feed into a /chronik page #288

marcel referenced this issue

2026-04-20 18:25:33 +02:00

feat: unify /notifications and dashboard activity feed into a /chronik page #288

marcel referenced this issue

2026-04-20 19:06:30 +02:00

feat(chronik): add cursor/offset pagination to /api/dashboard/activity + wire "Mehr laden" #290

marcel commented

2026-04-20 19:08:13 +02:00

🏗️ Tobias Wendt — DevOps & Platform Engineer

Observations

Spring Boot Actuator is already on the classpath (spring-boot-starter-actuator at pom.xml:35). Prometheus scrape endpoint is NOT enabled yet — needs micrometer-registry-prometheus dependency + management.endpoints.web.exposure.include + management.endpoint.prometheus.enabled: true.
Prometheus + Grafana + Loki + Promtail stack is documented at docs/infrastructure/production-compose.md (lines 65–95) with pinned versions (prom/prometheus:v2.51.0, grafana/grafana:10.4.0, grafana/loki:2.9.0, grafana/promtail:2.9.0). The ./observability/ directory is referenced for provisioning config but does not exist in the repo yet — needs creation with prometheus.yml + Grafana provisioning YAML + dashboard JSON.
No existing observability/ folder, no dashboards in the repo. This is the first observability PR — setup work, not just a panel addition.
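Spring Boot 4 + Micrometer records the http.server.requests timer by default once the Prometheus registry is wired, but out of the box that only yields the _count, _sum and _max series. The _bucket series that histogram_quantile() needs are published only when percentile histograms are switched on for the meter, e.g. management.metrics.distribution.percentiles-histogram.http.server.requests: true (property name as known from Spring Boot 3; verify against the Boot 4 docs). Per-URI latency is available via the uri label (templated: /api/dashboard/activity, not /api/dashboard/activity?limit=40).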

Recommendations

Scope expansion — this issue is not "just a panel," it's the full observability pipeline's first increment. Split into two sub-tasks in the same PR:

Wire Prometheus metrics export: add micrometer-registry-prometheus to pom.xml, enable /actuator/prometheus behind the internal management port (8081, per architect guidance — never the public port). Add an application-prod.yaml override.

Create observability/ directory structure:

observability/
├── prometheus.yml              # scrape config: backend:8081/actuator/prometheus every 15s
├── grafana/
│   └── provisioning/
│       ├── datasources/
│       │   └── prometheus.yaml
│       └── dashboards/
│           ├── dashboards.yaml # provider config
│           └── api-performance.json  # the dashboard
└── alertmanager/
    └── config.yml              # webhook to ntfy or email

Dashboard panels (on the new "API performance" dashboard):
- Request rate by URI — sum(rate(http_server_requests_seconds_count{uri="/api/dashboard/activity"}[5m]))
- Latency p50/p95/p99 — one query per quantile (0.50, 0.95, 0.99), shown here for p95: histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket{uri="/api/dashboard/activity"}[5m])) by (le))
- Error rate — sum(rate(http_server_requests_seconds_count{uri="/api/dashboard/activity", status=~"5.."}[5m])) / sum(rate(http_server_requests_seconds_count{uri="/api/dashboard/activity"}[5m]))
- Add the same three panels for /api/dashboard/resume and /api/dashboard/pulse while you're there — cheap, and these all share the rollup/audit query path.
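
For anyone wondering how the latency panels arrive at a p95: histogram_quantile() never sees individual requests, only cumulative `le` bucket counts, and it interpolates linearly inside the bucket that contains the target rank. The Java sketch below is an illustrative re-implementation of that idea, not Prometheus code; the class name and the bucket values are made up for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch of histogram_quantile-style bucket interpolation (not Prometheus code). */
final class HistogramQuantileSketch {

    /**
     * @param q       quantile, 0 < q <= 1
     * @param buckets cumulative counts keyed by upper bound (le); must contain +Infinity
     * @return estimated latency in the same unit as the bucket bounds
     */
    static double quantile(double q, TreeMap<Double, Double> buckets) {
        double total = buckets.get(Double.POSITIVE_INFINITY);
        if (total == 0) {
            return Double.NaN; // no observations in the window
        }
        double rank = q * total;
        double lowerBound = 0.0;
        double lowerCount = 0.0;
        for (Map.Entry<Double, Double> bucket : buckets.entrySet()) {
            double upperBound = bucket.getKey();
            double count = bucket.getValue();
            if (count >= rank) {
                if (Double.isInfinite(upperBound)) {
                    // Rank falls into the +Inf bucket: the best estimate is the highest finite bound.
                    return lowerBound;
                }
                // Linear interpolation inside the bucket that contains the rank.
                return lowerBound + (upperBound - lowerBound) * (rank - lowerCount) / (count - lowerCount);
            }
            lowerBound = upperBound;
            lowerCount = count;
        }
        return Double.NaN;
    }

    public static void main(String[] args) {
        TreeMap<Double, Double> buckets = new TreeMap<>();
        buckets.put(0.5, 100.0);                      // 100 requests took <= 500 ms
        buckets.put(1.0, 200.0);                      // 200 requests took <= 1 s
        buckets.put(Double.POSITIVE_INFINITY, 200.0); // 200 requests in total
        System.out.println("p95 estimate (s): " + quantile(0.95, buckets));
    }
}
```

One consequence worth keeping in mind for the alert threshold: if 0.5 is the highest finite bucket, the estimate is capped at 0.5, so a `> 0.5` comparison can only become true when there is a finite bucket boundary above 500 ms.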

Prometheus alerting rule (Alertmanager only handles the routing):

- alert: DashboardActivityP95Slow
  expr: histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket{uri="/api/dashboard/activity"}[5m])) by (le)) > 0.5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "/api/dashboard/activity p95 > 500ms sustained 5min"
    runbook: "docs/runbooks/slow-activity-feed.md"

Write the runbook at the same time as the alert. Alerts without runbooks = 3 AM guessing.
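
To make the "sustained 5 min" part concrete: the `for: 5m` clause keeps the alert pending until the expression has been true on every evaluation for at least that long, and a single false evaluation resets it. A rough Java model of that behaviour follows, assuming a fixed evaluation interval; the class and enum names are hypothetical and this simplifies what Prometheus actually does.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Simplified model of an alerting rule's `for:` clause (hypothetical names, fixed interval). */
final class ForClauseSketch {

    enum State { INACTIVE, PENDING, FIRING }

    /**
     * @param exprTrue    result of the alert expression at each evaluation, in order
     * @param interval    time between two evaluations
     * @param forDuration the rule's `for:` value
     */
    static List<State> evaluate(boolean[] exprTrue, Duration interval, Duration forDuration) {
        List<State> states = new ArrayList<>();
        Duration activeFor = null; // null means the condition is currently not active
        for (boolean conditionHolds : exprTrue) {
            if (!conditionHolds) {
                activeFor = null; // one false evaluation resets the timer
                states.add(State.INACTIVE);
                continue;
            }
            activeFor = (activeFor == null) ? Duration.ZERO : activeFor.plus(interval);
            states.add(activeFor.compareTo(forDuration) >= 0 ? State.FIRING : State.PENDING);
        }
        return states;
    }

    public static void main(String[] args) {
        boolean[] slowForSevenMinutes = {true, true, true, true, true, true, true};
        System.out.println(evaluate(slowForSevenMinutes, Duration.ofMinutes(1), Duration.ofMinutes(5)));
    }
}
```

In this model a p95 that is slow from the first evaluation onward stays pending for five one-minute evaluations and fires on the sixth, which is the behaviour a runbook reader should expect between "feed feels slow" and "alert arrives".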

Block /actuator/prometheus at Caddy. Management port 8081 is internal-only (Prometheus scrapes it over the compose network). The public backend reverse proxy must not forward /actuator/* — this was already called out as a security requirement.
Metric cardinality guard: /api/dashboard/activity?beforeAt=...&beforeDocId=...&beforeKind=... (once #290 lands) — make sure Micrometer templates the URI as /api/dashboard/activity not the full querystring. Default behavior in Spring Boot 4 does this correctly; verify with a live scrape before merge.
No Loki/Promtail for this issue. Log-based alerting isn't needed when metrics already give you p95.
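
The "verify with a live scrape" step from the cardinality guard could be scripted instead of eyeballed. The following is a minimal sketch, assuming the scrape body has already been fetched as a string; the class name and the heuristics for what counts as "suspicious" (query strings, UUID-like segments, purely numeric segments) are assumptions for illustration, not a Micrometer or Prometheus feature.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Flags uri label values in a Prometheus scrape that look un-templated (heuristic sketch). */
final class UriLabelCheck {

    // Captures the value of the uri="..." label on a metric line.
    private static final Pattern URI_LABEL = Pattern.compile("\\buri=\"([^\"]*)\"");

    // Heuristics: a query string, the start of a UUID, or a purely numeric path segment.
    private static final Pattern SUSPICIOUS =
            Pattern.compile("\\?|/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-|/\\d+(/|$)");

    static Set<String> suspiciousUris(String scrapeBody) {
        Set<String> hits = new LinkedHashSet<>();
        for (String line : scrapeBody.split("\n")) {
            if (!line.startsWith("http_server_requests_seconds")) {
                continue; // skips comments (# HELP / # TYPE) and unrelated metrics
            }
            Matcher label = URI_LABEL.matcher(line);
            if (label.find() && SUSPICIOUS.matcher(label.group(1)).find()) {
                hits.add(label.group(1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String scrape =
                "http_server_requests_seconds_count{method=\"GET\",uri=\"/api/dashboard/activity\"} 12.0\n"
              + "http_server_requests_seconds_count{method=\"GET\",uri=\"/api/dashboard/activity?limit=40\"} 3.0\n";
        System.out.println("Suspicious uri labels: " + suspiciousUris(scrape));
    }
}
```

An empty result after exercising the paginated feed (once #290 lands) would be reasonable evidence that the URI is templated as expected; any hit is worth a look before merge.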

Open Decisions

Scope of first observability PR. Option A: minimum viable — wire Prometheus export, add one dashboard with the three panels for /api/dashboard/activity, and the alert. Option B: do it properly — same as A, plus extend to all dashboard endpoints, JVM metrics panel, and a log aggregation check. Option C: split into two issues — this one stays "just the panel," a separate issue sets up the Prometheus pipeline. Recommend C if you want a clean commit history; A if you want to ship faster. B is premature without a clear operational pain point.


marcel commented

2026-04-20 19:08:24 +02:00

🏛️ Markus Keller — Senior Application Architect

Observations

Metric scope at the HTTP layer (http_server_requests_seconds) is an application-framework concern. The rollup query itself emits no metrics today — Micrometer's Spring Data JPA integration is not wired.
The decision between "HTTP endpoint latency" and "SQL query latency" is not binary — they measure different failure modes. HTTP p95 catches slow queries, slow serialization, thread-pool saturation, and lock contention. Query-level timing isolates the DB round-trip.
Management port separation (8081 vs public 8080) is already documented in docs/infrastructure/production-compose.md. This issue should reuse that split, not negotiate it.

Recommendations

HTTP-level metric is the right layer for this issue. The question we want to answer is "is /chronik slow?" — that's an end-to-end question. SQL-level timing for the rollup query specifically is a follow-up if the HTTP panel ever fires a slow-alert and we need to narrow down.
No new abstraction for the metric plumbing. Micrometer is the contract. Don't introduce a MetricsService wrapper or project-specific timer interface. Inject MeterRegistry if a non-HTTP custom metric ever gets justified — until then, lean on Actuator defaults.
Runbook structure matches ADR structure. docs/runbooks/slow-activity-feed.md with: Symptom → Immediate check (is V49 present? \d+ idx_audit_log_rollup) → Common causes → Resolution → Escalation. Same rhythm as an ADR but operational.

Open Decisions

None.


marcel commented

2026-04-20 19:08:37 +02:00

🔒 Nora "NullX" Steiner — Application Security Engineer

Observations

application.yaml currently has only management.health.mail.enabled: false — no management.endpoints.web.exposure.include override. Spring Boot 4 default exposes only /actuator/health on the web — safe.
Enabling /actuator/prometheus broadens the attack surface. If reachable from the internet, it leaks:
- Complete URI topology of the application (every mapped endpoint as a metric label).
- JVM memory, thread counts, GC stats — reconnaissance material for targeted DoS.
- Request-count distribution — timing oracle for endpoint existence.

Recommendations

Management endpoints on port 8081, public endpoints on 8080. Already the documented pattern in docs/infrastructure/production-compose.md. Enforce it at the application level:

management:
  server:
    port: 8081                           # separate port = separate TCP listener
    address: 0.0.0.0                     # reachable within Docker network
  endpoints:
    web:
      exposure:
        include: health, prometheus
  endpoint:
    prometheus:
      enabled: true

Caddy must not proxy /actuator/* on the public domain. Add an explicit 404:

familienarchiv.example.com {
    @actuator path /actuator /actuator/*
    respond @actuator 404
    reverse_proxy backend:8080
}

The 404 (not 403) avoids confirming the endpoint exists.

Prometheus scraper auth. If Prometheus is on the same Docker network, no auth needed. If Prometheus is remote (unlikely for a self-hosted single-VPS setup), bind Basic Auth on the management port — never bare.
Alert webhook authentication. When the alert fires, the webhook target (ntfy, email, Slack) must use credentials from .env, never hardcoded in the Alertmanager config (alertmanager/config.yml in the proposed layout) committed to the repo.
No PII in metric labels. URI templating is the main defense here — {uri="/api/documents/{id}"} is safe; raw paths like /api/documents/550e8400-... would create high-cardinality metrics and leak document IDs. Spring Boot's default URI tagger templates correctly; verify after wiring.

Open Decisions

None — the guidance is load-bearing but straightforward.


marcel commented

2026-04-20 19:08:52 +02:00

🧪 Sara Holt — Senior QA Engineer

Observations

This issue is observability infrastructure — no new app code behavior, no domain test pyramid changes. Test strategy here is about verifying the metric pipeline itself, not the business logic.
spring-boot-starter-actuator-test is already in pom.xml:84 — @AutoConfigureObservability or direct MeterRegistry access in tests is supported.

Recommendations

Integration smoke test: /actuator/prometheus returns 200 with the expected metric name.

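// Needs @SpringBootTest(webEnvironment = RANDOM_PORT, properties = "management.server.port=0"):
// MockMvc never opens a socket, so it cannot distinguish the management port from the public one.
@LocalManagementPort int managementPort;
private final HttpClient http = HttpClient.newHttpClient(); // java.net.http

@Test
void actuator_prometheus_exposes_http_server_requests() throws Exception {
    var request = HttpRequest.newBuilder(URI.create("http://localhost:" + managementPort + "/actuator/prometheus")).build();
    var response = http.send(request, HttpResponse.BodyHandlers.ofString());
    assertThat(response.statusCode()).isEqualTo(200);
    assertThat(response.body()).contains("http_server_requests_seconds_count");
}

This catches the common failure mode: someone bumps Spring Boot and the metric name changes or the endpoint disables itself. Micrometer creates the timer on first use, so hit any application endpoint once before scraping; otherwise the metric may legitimately be absent.

Security regression: /actuator/prometheus is NOT reachable on the public port.

@LocalServerPort int serverPort; // same test class and HttpClient as above

@Test
void actuator_prometheus_returns_404_on_public_port() throws Exception {
    var request = HttpRequest.newBuilder(URI.create("http://localhost:" + serverPort + "/actuator/prometheus")).build();
    var response = http.send(request, HttpResponse.BodyHandlers.discarding());
    assertThat(response.statusCode()).isEqualTo(404);
}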

Lock the Nora-flagged boundary in CI.

Manual verification checklist for the dashboard JSON (no automated test for JSON content):
- Import api-performance.json via Grafana's Dashboard → Import UI; confirm no validation errors.
- Each panel renders data against the local Prometheus within 1 min of traffic.
- Hover-tooltips show URI templated (no raw paths/IDs).

Alert rule unit test — Prometheus itself has a rule test framework:

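# observability/alertmanager/rules_test.yml
rule_files: [alerts.yml]
tests:
  - interval: 1m
    input_series:
      # Counters must keep increasing; a flat series gives rate() = 0 and a NaN quantile.
      - series: 'http_server_requests_seconds_bucket{uri="/api/dashboard/activity", le="0.5"}'
        values: '0+50x15'
      # A finite bucket above the threshold is needed: with le="0.5" as the highest
      # finite bound, histogram_quantile() caps at 0.5 and "> 0.5" is never true.
      - series: 'http_server_requests_seconds_bucket{uri="/api/dashboard/activity", le="1.0"}'
        values: '0+100x15'
      - series: 'http_server_requests_seconds_bucket{uri="/api/dashboard/activity", le="+Inf"}'
        values: '0+100x15'  # 50% over 500ms → p95 ≈ 0.95s → fires
    alert_rule_test:
      - eval_time: 10m  # the expression must hold for the full `for: 5m` before the alert fires
        alertname: DashboardActivityP95Slow
        exp_alerts:
          - exp_labels: { severity: warning }
            exp_annotations:  # promtool compares annotations as well as labels
              summary: "/api/dashboard/activity p95 > 500ms sustained 5min"
              runbook: "docs/runbooks/slow-activity-feed.md"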

Run via promtool test rules rules_test.yml in CI. Catches the case where someone tweaks the expression and accidentally disables the alert.
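
A note on the input series for such a rule test: promtool accepts an expanding shorthand of the form `a+bxn` (start at a, add b for n further samples), and because the alert expression goes through rate(), the bucket counters have to increase over time for the quantile to be defined. The Java sketch below expands that shorthand and computes a deliberately simplified rate; the class and method names are hypothetical, and the real rate() additionally extrapolates and handles counter resets.

```java
/** Expands promtool-style 'a+bxn' series shorthand and computes a simplified rate (sketch). */
final class SeriesNotationSketch {

    /** Expands 'a+bxn' into n+1 samples: a, a+b, ..., a+n*b. Only this one form is handled. */
    static double[] expand(String notation) {
        String[] startAndRest = notation.split("\\+", 2);
        String[] stepAndCount = startAndRest[1].split("x", 2);
        double start = Double.parseDouble(startAndRest[0]);
        double step = Double.parseDouble(stepAndCount[0]);
        int count = Integer.parseInt(stepAndCount[1]);
        double[] samples = new double[count + 1];
        for (int i = 0; i <= count; i++) {
            samples[i] = start + i * step;
        }
        return samples;
    }

    /** Simplified per-second rate: (last - first) / elapsed time. */
    static double simpleRate(double[] samples, double intervalSeconds) {
        if (samples.length < 2) {
            return Double.NaN; // a rate needs at least two samples
        }
        double elapsed = (samples.length - 1) * intervalSeconds;
        return (samples[samples.length - 1] - samples[0]) / elapsed;
    }

    public static void main(String[] args) {
        double[] growing = expand("0+100x5");
        double[] flat = {100, 100, 100, 100, 100, 100};
        System.out.println("growing counter rate: " + simpleRate(growing, 60));
        System.out.println("flat counter rate:    " + simpleRate(flat, 60));
    }
}
```

A flat counter yields a rate of zero for every bucket, which leaves the quantile undefined and the alert silent, so a test built on flat values would pass or fail for the wrong reason.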

Open Decisions

None.


marcel commented

2026-04-20 19:08:57 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

No concerns from my angle — this is observability wiring, not application code. Checked: the backend changes needed are Spring Boot / Actuator configuration, which is Tobias's lane; the only "app code" touch is adding micrometer-registry-prometheus to pom.xml and an application-prod.yaml snippet, both of which are one-line mechanical changes with no business logic implications.

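Happy to write the actuator_prometheus_exposes_http_server_requests + actuator_prometheus_returns_404_on_public_port tests Sara outlined. One caveat: the @WebMvcTest slice pattern we use for DashboardControllerTest does not load Actuator and never binds the management port, so these two need a @SpringBootTest running on real ports. That is a small amount of extra test setup, not new test infrastructure.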


marcel commented

2026-04-20 19:09:01 +02:00

🎨 Leonie Voss — UX/Design Lead

No concerns from my angle — this issue is internal operator tooling, not a user-facing surface. Checked: no spec reference, no UI, no user workflow. Grafana has its own design system that the family archive team cannot and should not influence.

One operational UX note since Tobias mentioned a runbook: when that runbook gets written, keep the voice friendly and instructive rather than terse. The "user" of that runbook is the same Marcel who debugs at 23:00 on a Sunday — future-you deserves clear steps, not telegram-style bullet points.


marcel commented

2026-04-20 19:09:15 +02:00

🗳️ Decision Queue — Action Required

1 decision needs your input before implementation starts.

Infrastructure

Scope of this first observability PR.
- Option A: Minimum viable — wire Prometheus export, add one dashboard with three panels for /api/dashboard/activity, and the p95 alert. Ships fast.
- Option B: Do it properly — A plus extend to /api/dashboard/resume and /api/dashboard/pulse, JVM metrics panel, log aggregation check. 1–2 days more work.
- Option C (recommended): Split into two issues — this one stays "just the panel," a separate issue sets up the Prometheus pipeline first. Cleanest commit history, easier review, easier rollback.
- Tradeoff: A = fast ship, single PR touches config + dashboard + alert. C = two smaller PRs that each tell a clean story. B ships the most value but is over-scoped for a deferred follow-up.
- (Raised by: Tobias)


marcel referenced this issue

2026-04-20 19:11:04 +02:00

feat(dashboard): add `kinds` CSV query param to /api/dashboard/activity #293

marcel closed this issue

2026-04-21 19:06:33 +02:00


Reference: marcel/familienarchiv#291