observability: add /api/dashboard/activity p95 latency panel to Grafana #291

Closed
opened 2026-04-20 18:13:33 +02:00 by marcel · 7 comments

Background

Deferred from PR #288 during review cycle 1 (Tobias Wendt).

The rollup query + partial covering index introduced in #285 makes /api/dashboard/activity a new hot path consumed by both /chronik and the dashboard side-rail. If the V49 index is missing on some environment, or a future query regression hits, the symptom is a slow feed — before anything else breaks.

Concern

No Grafana panel for /api/dashboard/activity request rate + latency. A silent regression would go unnoticed until users complain.

Scope

  • Grafana panel: request rate (req/s), p50 / p95 / p99 latency.
  • Alertmanager rule: p95 > 500 ms sustained 5 min.
  • Add to the existing "API performance" dashboard (or create one if none exists for app-level latency).

Reference

  • PR: http://heim-nas:3005/marcel/familienarchiv/pulls/288
  • Parent issue: #285
  • Index migration: backend/src/main/resources/db/migration/V49__add_audit_log_rollup_index.sql

🏗️ Tobias Wendt — DevOps & Platform Engineer

Observations

  • Spring Boot Actuator is already on the classpath (spring-boot-starter-actuator at pom.xml:35). Prometheus scrape endpoint is NOT enabled yet — needs micrometer-registry-prometheus dependency + management.endpoints.web.exposure.include + management.endpoint.prometheus.enabled: true.
  • Prometheus + Grafana + Loki + Promtail stack is documented at docs/infrastructure/production-compose.md (lines 65–95) with pinned versions (prom/prometheus:v2.51.0, grafana/grafana:10.4.0, grafana/loki:2.9.0, grafana/promtail:2.9.0). The ./observability/ directory is referenced for provisioning config but does not exist in the repo yet — needs creation with prometheus.yml + Grafana provisioning YAML + dashboard JSON.
  • No existing observability/ folder, no dashboards in the repo. This is the first observability PR — setup work, not just a panel addition.
  • Spring Boot 4 + Micrometer emits http.server.requests timer metrics by default once the Prometheus registry is wired. Per-URI latency is available via the uri label (templated — /api/dashboard/activity, not /api/dashboard/activity?limit=40).
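
  One caveat worth a live-scrape check: histogram_quantile needs http_server_requests_seconds_bucket series, and Micrometer only publishes those buckets once percentile histograms are switched on for the timer. A minimal sketch, assuming current Spring Boot property conventions:

    management:
      metrics:
        distribution:
          percentiles-histogram:
            "[http.server.requests]": true   # emit _bucket series for histogram_quantile

  Without this, only _count / _sum / _max are exported and every p95 query below returns no data.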

Recommendations

  • Scope expansion — this issue is not "just a panel," it's the full observability pipeline's first increment. Split into two sub-tasks in the same PR:
    1. Wire Prometheus metrics export: add micrometer-registry-prometheus to pom.xml, enable /actuator/prometheus behind the internal management port (8081, per architect guidance — never the public port). Add an application-prod.yaml override.
    2. Create observability/ directory structure:
      observability/
      ├── prometheus.yml              # scrape config: backend:8081/actuator/prometheus every 15s
      ├── grafana/
      │   └── provisioning/
      │       ├── datasources/
      │       │   └── prometheus.yaml
      │       └── dashboards/
      │           ├── dashboards.yaml # provider config
      │           └── api-performance.json  # the dashboard
      └── alertmanager/
          └── config.yml              # webhook to ntfy or email
      
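    For the prometheus.yml above, a minimal sketch (the backend service name, port 8081, and 15 s interval come from the tree's comment; the job name is a placeholder):

      global:
        scrape_interval: 15s
      scrape_configs:
        - job_name: familienarchiv-backend    # placeholder name
          metrics_path: /actuator/prometheus
          static_configs:
            - targets: ['backend:8081']
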
  • Dashboard panels (on the new "API performance" dashboard):
    • Request rate by URI — sum(rate(http_server_requests_seconds_count{uri="/api/dashboard/activity"}[5m]))
    • Latency p50/p95/p99 — histogram_quantile(0.50 | 0.95 | 0.99, sum(rate(http_server_requests_seconds_bucket{uri="/api/dashboard/activity"}[5m])) by (le))
    • Error rate — sum(rate(http_server_requests_seconds_count{uri="/api/dashboard/activity", status=~"5.."}[5m])) / sum(rate(http_server_requests_seconds_count{uri="/api/dashboard/activity"}[5m]))
    • Add the same three panels for /api/dashboard/resume and /api/dashboard/pulse while you're there — cheap, and these all share the rollup/audit query path.
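    On the provisioning side, the grafana/provisioning/datasources/prometheus.yaml named in the tree stays tiny. A sketch, assuming Prometheus runs as the compose service prometheus on 9090:

      apiVersion: 1
      datasources:
        - name: Prometheus
          type: prometheus
          access: proxy
          url: http://prometheus:9090
          isDefault: true
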
  • Alertmanager rule:
    - alert: DashboardActivityP95Slow
      expr: histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket{uri="/api/dashboard/activity"}[5m])) by (le)) > 0.5
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "/api/dashboard/activity p95 > 500ms sustained 5min"
        runbook: "docs/runbooks/slow-activity-feed.md"
    
    Write the runbook at the same time as the alert. Alerts without runbooks = 3 AM guessing.
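    The alertmanager/config.yml from the tree also needs a route and receiver before any alert reaches a human. A rough sketch; the ntfy URL is a placeholder, and Alertmanager's generic webhook payload may need a small bridge before ntfy accepts it:

      route:
        receiver: ntfy
      receivers:
        - name: ntfy
          webhook_configs:
            - url: 'http://ntfy/familienarchiv-alerts'   # placeholder target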
  • Block /actuator/prometheus at Caddy. Management port 8081 is internal-only (Prometheus scrapes it over the compose network). The public backend reverse proxy must not forward /actuator/* — this was already called out as a security requirement.
  • Metric cardinality guard: /api/dashboard/activity?beforeAt=...&beforeDocId=...&beforeKind=... (once #290 lands) — make sure Micrometer templates the URI as /api/dashboard/activity not the full querystring. Default behavior in Spring Boot 4 does this correctly; verify with a live scrape before merge.
  • No Loki/Promtail for this issue. Log-based alerting isn't needed when metrics already give you p95.

Open Decisions

  • Scope of first observability PR. Option A: minimum viable — wire Prometheus export, add one dashboard with the three panels for /api/dashboard/activity, and the alert. Option B: do it properly — same as A, plus extend to all dashboard endpoints, JVM metrics panel, and a log aggregation check. Option C: split into two issues — this one stays "just the panel," a separate issue sets up the Prometheus pipeline. Recommend C if you want a clean commit history; A if you want to ship faster. B is premature without a clear operational pain point.

🏛️ Markus Keller — Senior Application Architect

Observations

  • Metric scope at the HTTP layer (http_server_requests_seconds) is an application-framework concern. The rollup query itself emits no metrics today — Micrometer's Spring Data JPA integration is not wired.
  • The decision between "HTTP endpoint latency" and "SQL query latency" is not binary — they measure different failure modes. HTTP p95 catches slow queries, slow serialization, thread-pool saturation, and lock contention. Query-level timing isolates the DB round-trip.
  • Management port separation (8081 vs public 8080) is already documented in docs/infrastructure/production-compose.md. This issue should reuse that split, not negotiate it.

Recommendations

  • HTTP-level metric is the right layer for this issue. The question we want to answer is "is /chronik slow?" — that's an end-to-end question. SQL-level timing for the rollup query specifically is a follow-up if the HTTP panel ever fires a slow-alert and we need to narrow down.
  • No new abstraction for the metric plumbing. Micrometer is the contract. Don't introduce a MetricsService wrapper or project-specific timer interface. Inject MeterRegistry if a non-HTTP custom metric ever gets justified — until then, lean on Actuator defaults.
  • Runbook structure matches ADR structure. docs/runbooks/slow-activity-feed.md with: Symptom → Immediate check (is V49 present? \d+ idx_audit_log_rollup) → Common causes → Resolution → Escalation. Same rhythm as an ADR but operational.

Open Decisions

  • None.

🔒 Nora "NullX" Steiner — Application Security Engineer

Observations

  • application.yaml currently has only management.health.mail.enabled: false — no management.endpoints.web.exposure.include override. Spring Boot 4 default exposes only /actuator/health on the web — safe.
  • Enabling /actuator/prometheus broadens the attack surface. If reachable from the internet, it leaks:
    • Complete URI topology of the application (every mapped endpoint as a metric label).
    • JVM memory, thread counts, GC stats — reconnaissance material for targeted DoS.
    • Request-count distribution — timing oracle for endpoint existence.

Recommendations

  • Management endpoints on port 8081, public endpoints on 8080. Already the documented pattern in docs/infrastructure/production-compose.md. Enforce it at the application level:
    management:
      server:
        port: 8081                           # separate port = separate TCP listener
        address: 0.0.0.0                     # reachable within Docker network
      endpoints:
        web:
          exposure:
            include: health, prometheus
      endpoint:
        prometheus:
          enabled: true
    
  • Caddy must not proxy /actuator/* on the public domain. Add an explicit 404:
    familienarchiv.example.com {
        @actuator path /actuator/*
        respond @actuator 404
        reverse_proxy backend:8080
    }
    
    The 404 (not 403) avoids confirming the endpoint exists.
  • Prometheus scraper auth. If Prometheus is on the same Docker network, no auth needed. If Prometheus is remote (unlikely for a self-hosted single-VPS setup), bind Basic Auth on the management port — never bare.
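    If the remote case ever materializes, Prometheus supports scrape credentials natively. An illustrative fragment for the scrape job (username and secret path are made up):

      basic_auth:
        username: prometheus                             # illustrative username
        password_file: /etc/prometheus/scrape-password   # hypothetical mounted secret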
  • Alert webhook authentication. When the Alertmanager rule fires, the webhook target (ntfy, email, Slack) must use credentials from .env, never hardcoded in alertmanager.yml committed to the repo.
  • No PII in metric labels. URI templating is the main defense here — {uri="/api/documents/{id}"} is safe; raw paths like /api/documents/550e8400-... would create high-cardinality metrics and leak document IDs. Spring Boot's default URI tagger templates correctly; verify after wiring.

Open Decisions

  • None — the guidance is load-bearing but straightforward.

🧪 Sara Holt — Senior QA Engineer

Observations

  • This issue is observability infrastructure — no new app code behavior, no domain test pyramid changes. Test strategy here is about verifying the metric pipeline itself, not the business logic.
  • spring-boot-starter-actuator-test is already in pom.xml:84 — @AutoConfigureObservability or direct MeterRegistry access in tests is supported.

Recommendations

  • Integration smoke test: /actuator/prometheus returns 200 with the expected metric name. MockMvc can't carry this test: it never binds a socket and routes by path alone, so the 8080/8081 split is invisible to it. A full-server test exercises the real listeners:
    @SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT,
            properties = "management.server.port=0")   // random ports for both listeners
    class ActuatorExposureTest {
        @LocalServerPort int publicPort;               // app listener (8080 in prod)
        @LocalManagementPort int managementPort;       // actuator listener (8081 in prod)
        @Autowired TestRestTemplate rest;

        @Test
        void actuator_prometheus_exposes_http_server_requests() {
            var res = rest.getForEntity(
                    "http://localhost:" + managementPort + "/actuator/prometheus", String.class);
            assertThat(res.getStatusCode()).isEqualTo(HttpStatus.OK);
            assertThat(res.getBody()).contains("http_server_requests_seconds_count");
        }

    This catches the common failure mode: someone bumps Spring Boot and the metric name changes or the endpoint disables itself.
  • Security regression: /actuator/prometheus is NOT reachable on the public port. Same test class:
        @Test
        void actuator_prometheus_returns_404_on_public_port() {
            var res = rest.getForEntity(
                    "http://localhost:" + publicPort + "/actuator/prometheus", String.class);
            assertThat(res.getStatusCode()).isEqualTo(HttpStatus.NOT_FOUND);
        }
    }

    Lock the Nora-flagged boundary in CI.
  • Manual verification checklist for the dashboard JSON (no automated test for JSON content):
    • Import api-performance.json via Grafana's Dashboard → Import UI; confirm no validation errors.
    • Each panel renders data against the local Prometheus within 1 min of traffic.
    • Hover-tooltips show URI templated (no raw paths/IDs).
  • Alert rule unit test — Prometheus itself has a rule test framework. Two details matter in the fixture: bucket counters must increase (flat values make rate() zero, so the alert can never fire), and a finite bucket above 0.5 is needed, since histogram_quantile tops out at the highest finite bucket bound once the quantile lands in +Inf:
    # observability/alertmanager/rules_test.yml
    rule_files: [alerts.yml]
    tests:
      - interval: 1m
        input_series:
          # 100 requests/min finish within 500ms...
          - series: 'http_server_requests_seconds_bucket{uri="/api/dashboard/activity", le="0.5"}'
            values: '0+100x8'
          # ...and another 100/min take between 0.5s and 1s → p95 ≈ 0.95s → fires
          - series: 'http_server_requests_seconds_bucket{uri="/api/dashboard/activity", le="1.0"}'
            values: '0+200x8'
          - series: 'http_server_requests_seconds_bucket{uri="/api/dashboard/activity", le="+Inf"}'
            values: '0+200x8'
        alert_rule_test:
          - eval_time: 7m
            alertname: DashboardActivityP95Slow
            exp_alerts:
              - exp_labels: { severity: warning }

    Run via promtool test rules rules_test.yml in CI. Catches the case where someone tweaks the expression and accidentally disables the alert.

Open Decisions

  • None.

👨‍💻 Felix Brandt — Senior Fullstack Developer

No concerns from my angle — this is observability wiring, not application code. Checked: the backend changes needed are Spring Boot / Actuator configuration, which is Tobias's lane; the only "app code" touch is adding micrometer-registry-prometheus to pom.xml and an application-prod.yaml snippet, both of which are one-line mechanical changes with no business logic implications.

Happy to write the actuator_prometheus_exposes_http_server_requests + actuator_prometheus_returns_404_on_public_port tests Sara outlined — note they need a full-server @SpringBootTest on random ports rather than the @WebMvcTest slice we use for DashboardControllerTest (MockMvc never binds a port, so it can't see the 8080/8081 split), but they require no new test infrastructure beyond that.


🎨 Leonie Voss — UX/Design Lead

No concerns from my angle — this issue is internal operator tooling, not a user-facing surface. Checked: no spec reference, no UI, no user workflow. Grafana has its own design system that the family archive team cannot and should not influence.

One operational UX note since Tobias mentioned a runbook: when that runbook gets written, keep the voice friendly and instructive rather than terse. The "user" of that runbook is the same Marcel who debugs at 23:00 on a Sunday — future-you deserves clear steps, not telegram-style bullet points.


🗳️ Decision Queue — Action Required

1 decision needs your input before implementation starts.

Infrastructure

  • Scope of this first observability PR.
    • Option A: Minimum viable — wire Prometheus export, add one dashboard with three panels for /api/dashboard/activity, and the p95 alert. Ships fast.
    • Option B: Do it properly — A plus extend to /api/dashboard/resume and /api/dashboard/pulse, JVM metrics panel, log aggregation check. 1–2 days more work.
    • Option C (recommended): Split into two issues — this one stays "just the panel," a separate issue sets up the Prometheus pipeline first. Cleanest commit history, easier review, easier rollback.
    • Tradeoff: A = fast ship, single PR touches config + dashboard + alert. C = two smaller PRs that each tell a clean story. B ships the most value but is over-scoped for a deferred follow-up.
    • (Raised by: Tobias)

Reference: marcel/familienarchiv#291