fix(obs): wire up Spring Boot metrics and Loki log labels in Grafana #604

New Issue

marcel · 2026-05-16T10:52:30+02:00

marcel commented

2026-05-16 10:52:30 +02:00

Problem

The Grafana dashboard at grafana.archive.raddatz.cloud has no datapoints from Spring Boot and no app logs. Only the Linux resource dashboard (node-exporter + cAdvisor) works. Root cause analysis identified 4 independent issues across the observability pipeline.

Root Causes

1 — Missing `micrometer-registry-prometheus` dependency

backend/pom.xml has spring-boot-starter-actuator but not the Prometheus registry. Without it the /actuator/prometheus endpoint simply does not exist.

Fix: add to pom.xml:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

2 — `/actuator/prometheus` not in `management.endpoints.web.exposure.include`

Spring Boot 3+ only exposes health over HTTP by default. Even after adding Micrometer, the endpoint won't respond until it is explicitly exposed.

Fix: add to backend/src/main/resources/application.yaml under the existing management: block:

management:
  endpoints:
    web:
      exposure:
        include: health,prometheus
  health:
    mail:
      enabled: false

3 — Spring Security blocks `/actuator/prometheus` (returns 401)

SecurityConfig.java permits /actuator/health but /actuator/prometheus falls through to anyRequest().authenticated(). Prometheus receives an HTML 401 page, logs "received unsupported Content-Type text/html", and marks the target health: down.

Fix: add to the authorizeHttpRequests block in SecurityConfig.java:

auth.requestMatchers("/actuator/prometheus").permitAll();

4 — Promtail never sets a `job` label — Loki logs dashboard returns nothing

Promtail's docker_sd_configs relabels containers to compose_service, compose_project, etc. but never to job. The provisioned Grafana dashboard loki-logs.json uses label_values(job) for its "App" dropdown and {job="$app"} in all panel queries. Since no job label exists in Loki the dropdown is empty and every query returns no data.

(Backend logs are in Loki under compose_service=backend — they're just invisible to the dashboard.)

Fix: add a relabel rule in infra/observability/promtail/promtail-config.yml:

relabel_configs:
  # ... existing rules ...
  - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
    target_label: 'job'

Then restart obs-promtail: docker restart obs-promtail.

Acceptance Criteria

Prometheus target spring-boot shows health: up (check at https://grafana.archive.raddatz.cloud → Explore → Prometheus → up{job="spring-boot"})
Spring Boot JVM / HTTP metrics visible in the "Spring Boot Observability" dashboard
Grafana Loki "App" dropdown lists backend, frontend, etc.
Log lines from backend visible in the "Loki Logs" dashboard panel
/actuator/health still returns 200 without credentials (Docker health check must keep working)
No other actuator endpoints inadvertently exposed (only health and prometheus)

Notes

Fixes 1–3 require a backend rebuild + redeploy (./mvnw clean package, then docker compose up -d --build backend in the staging compose).
Fix 4 is config-only: edit promtail-config.yml and docker restart obs-promtail — no rebuild needed.
The ocr-service Prometheus target is also down due to a DNS resolution failure (lookup ocr fails — the container is on a different network than Prometheus expects). That's a separate known TODO already noted in prometheus.yml and is out of scope here.

## Problem The Grafana dashboard at grafana.archive.raddatz.cloud has no datapoints from Spring Boot and no app logs. Only the Linux resource dashboard (node-exporter + cAdvisor) works. Root cause analysis identified **4 independent issues** across the observability pipeline. --- ## Root Causes ### 1 — Missing `micrometer-registry-prometheus` dependency `backend/pom.xml` has `spring-boot-starter-actuator` but not the Prometheus registry. Without it the `/actuator/prometheus` endpoint simply does not exist. **Fix:** add to `pom.xml`: ```xml <dependency> <groupId>io.micrometer</groupId> <artifactId>micrometer-registry-prometheus</artifactId> </dependency> ``` ### 2 — `/actuator/prometheus` not in `management.endpoints.web.exposure.include` Spring Boot 3+ only exposes `health` over HTTP by default. Even after adding Micrometer, the endpoint won't respond until it is explicitly exposed. **Fix:** add to `backend/src/main/resources/application.yaml` under the existing `management:` block: ```yaml management: endpoints: web: exposure: include: health,prometheus health: mail: enabled: false ``` ### 3 — Spring Security blocks `/actuator/prometheus` (returns 401) `SecurityConfig.java` permits `/actuator/health` but `/actuator/prometheus` falls through to `anyRequest().authenticated()`. Prometheus receives an HTML 401 page, logs *"received unsupported Content-Type text/html"*, and marks the target `health: down`. **Fix:** add to the `authorizeHttpRequests` block in `SecurityConfig.java`: ```java auth.requestMatchers("/actuator/prometheus").permitAll(); ``` ### 4 — Promtail never sets a `job` label — Loki logs dashboard returns nothing Promtail's `docker_sd_configs` relabels containers to `compose_service`, `compose_project`, etc. but never to `job`. The provisioned Grafana dashboard `loki-logs.json` uses `label_values(job)` for its "App" dropdown and `{job="$app"}` in all panel queries. Since no `job` label exists in Loki the dropdown is empty and every query returns no data. *(Backend logs **are** in Loki under `compose_service=backend` — they're just invisible to the dashboard.)* **Fix:** add a relabel rule in `infra/observability/promtail/promtail-config.yml`: ```yaml relabel_configs: # ... existing rules ... - source_labels: ['__meta_docker_container_label_com_docker_compose_service'] target_label: 'job' ``` Then restart `obs-promtail`: `docker restart obs-promtail`. --- ## Acceptance Criteria - [ ] Prometheus target `spring-boot` shows `health: up` (check at `https://grafana.archive.raddatz.cloud` → Explore → Prometheus → `up{job="spring-boot"}`) - [ ] Spring Boot JVM / HTTP metrics visible in the "Spring Boot Observability" dashboard - [ ] Grafana Loki "App" dropdown lists `backend`, `frontend`, etc. - [ ] Log lines from `backend` visible in the "Loki Logs" dashboard panel - [ ] `/actuator/health` still returns 200 without credentials (Docker health check must keep working) - [ ] No other actuator endpoints inadvertently exposed (only `health` and `prometheus`) --- ## Notes - Fixes 1–3 require a backend rebuild + redeploy (`./mvnw clean package`, then `docker compose up -d --build backend` in the staging compose). - Fix 4 is config-only: edit promtail-config.yml and `docker restart obs-promtail` — no rebuild needed. - The `ocr-service` Prometheus target is also `down` due to a DNS resolution failure (`lookup ocr` fails — the container is on a different network than Prometheus expects). That's a separate known TODO already noted in `prometheus.yml` and is out of scope here.

marcel added the P1-high bug devops phase-7: monitoring labels 2026-05-16 10:52:40 +02:00

marcel commented

2026-05-16 10:56:17 +02:00

🏗️ Markus Keller — Senior Application Architect

Observations

The separate management port (management.server.port: 8081) is the architecturally correct approach and is already in place. The comment in application.yaml explains it precisely: Spring Security's filter chain on :8080 never sees requests arriving on :8081. This is a clean separation of concerns.
Fix 3 as written is therefore incorrect. Adding auth.requestMatchers("/actuator/prometheus").permitAll() to SecurityConfig.java would have no effect — that chain never sees port-8081 traffic. Do not add it.
Fixes 1 and 2 (Micrometer dependency + endpoint exposure) also appear already applied. pom.xml has micrometer-registry-prometheus at line 217. application.yaml already includes health,info,prometheus,metrics in the exposure list.
Fix 4 (Promtail job label) is the one genuinely missing piece. The dashboard queries in loki-logs.json use {job="$app"} and label_values(job) — these will return nothing until the label exists in Loki.
The stale comment in prometheus.yml (lines 17–19) says: "Target will show as DOWN until backend instrumentation issue adds micrometer-registry-prometheus..." — this should be removed as part of this fix since that work is done.

Recommendations

Implement Fix 4 only: add the job relabel rule to infra/observability/promtail/promtail-config.yml.
Remove the stale comment from infra/observability/prometheus/prometheus.yml.
Do NOT touch SecurityConfig.java — the management port architecture already handles isolation correctly.
Verify that port 8081 is not reachable from outside the Docker network (i.e., it must not appear in the ports: section of any compose file under an external bind). The comment in application.yaml claims this — confirm it holds in docker-compose.observability.yml.
The docs/infrastructure/production-compose.md file is in the current commit (modified in this branch) — verify that any architecture notes there reflect the management port design accurately.

## 🏗️ Markus Keller — Senior Application Architect ### Observations - The separate management port (`management.server.port: 8081`) is the architecturally correct approach and is already in place. The comment in `application.yaml` explains it precisely: Spring Security's filter chain on `:8080` never sees requests arriving on `:8081`. This is a clean separation of concerns. - **Fix 3 as written is therefore incorrect.** Adding `auth.requestMatchers("/actuator/prometheus").permitAll()` to `SecurityConfig.java` would have no effect — that chain never sees port-8081 traffic. Do not add it. - Fixes 1 and 2 (Micrometer dependency + endpoint exposure) also appear already applied. `pom.xml` has `micrometer-registry-prometheus` at line 217. `application.yaml` already includes `health,info,prometheus,metrics` in the exposure list. - Fix 4 (Promtail `job` label) is the one genuinely missing piece. The dashboard queries in `loki-logs.json` use `{job="$app"}` and `label_values(job)` — these will return nothing until the label exists in Loki. - The stale comment in `prometheus.yml` (lines 17–19) says: *"Target will show as DOWN until backend instrumentation issue adds micrometer-registry-prometheus..."* — this should be removed as part of this fix since that work is done. ### Recommendations - Implement Fix 4 only: add the `job` relabel rule to `infra/observability/promtail/promtail-config.yml`. - Remove the stale comment from `infra/observability/prometheus/prometheus.yml`. - Do NOT touch `SecurityConfig.java` — the management port architecture already handles isolation correctly. - Verify that port 8081 is not reachable from outside the Docker network (i.e., it must not appear in the `ports:` section of any compose file under an external bind). The comment in `application.yaml` claims this — confirm it holds in `docker-compose.observability.yml`. - The `docs/infrastructure/production-compose.md` file is in the current commit (modified in this branch) — verify that any architecture notes there reflect the management port design accurately.

marcel commented

2026-05-16 10:56:28 +02:00

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

This issue is primarily infra config. The code changes are small: one new relabel rule in a YAML file and one comment removal. No Java or TypeScript changes required (if Fix 3 is correctly excluded — see Markus's review).
promtail-config.yml currently relabels compose_service → compose_service and compose_project → compose_project, but never maps anything to job. The fix is one relabel_configs entry.
The stale comment block in prometheus.yml (# Target will show as DOWN until backend instrumentation issue adds micrometer-registry-prometheus...) violates clean code: it describes a past state, not a constraint or intent. Delete it.
No new Java code needed. The management port design is already correct.

Recommendations

The promtail fix is exactly this addition to promtail-config.yml:

      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'job'

This maps backend → job=backend, frontend → job=frontend, etc., which is exactly what label_values(job) in the Grafana dashboard requires.

Write a minimal @SpringBootTest integration test that calls GET /actuator/prometheus on the management port and asserts: (a) status 200, (b) response body contains jvm_memory_used_bytes. This is the only production code change in this issue that can be regression-tested. Without it, a future config change could silently break metrics again.

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.DEFINED_PORT)
class ActuatorPrometheusIT {
    @LocalManagementPort
    private int managementPort;

    @Test
    void prometheus_endpoint_returns_jvm_metrics() {
        var body = new RestTemplate()
            .getForObject("http://localhost:" + managementPort + "/actuator/prometheus", String.class);
        assertThat(body).contains("jvm_memory_used_bytes");
    }
}

After docker restart obs-promtail, confirm via Grafana Explore → Loki → {job="backend"} before closing the issue.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer ### Observations - This issue is primarily infra config. The code changes are small: one new relabel rule in a YAML file and one comment removal. No Java or TypeScript changes required (if Fix 3 is correctly excluded — see Markus's review). - `promtail-config.yml` currently relabels `compose_service` → `compose_service` and `compose_project` → `compose_project`, but never maps anything to `job`. The fix is one `relabel_configs` entry. - The stale comment block in `prometheus.yml` (`# Target will show as DOWN until backend instrumentation issue adds micrometer-registry-prometheus...`) violates clean code: it describes a past state, not a constraint or intent. Delete it. - No new Java code needed. The management port design is already correct. ### Recommendations - The promtail fix is exactly this addition to `promtail-config.yml`: ```yaml - source_labels: ['__meta_docker_container_label_com_docker_compose_service'] target_label: 'job' ``` This maps `backend` → `job=backend`, `frontend` → `job=frontend`, etc., which is exactly what `label_values(job)` in the Grafana dashboard requires. - Write a minimal `@SpringBootTest` integration test that calls `GET /actuator/prometheus` on the management port and asserts: (a) status 200, (b) response body contains `jvm_memory_used_bytes`. This is the only production code change in this issue that can be regression-tested. Without it, a future config change could silently break metrics again. ```java @SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.DEFINED_PORT) class ActuatorPrometheusIT { @LocalManagementPort private int managementPort; @Test void prometheus_endpoint_returns_jvm_metrics() { var body = new RestTemplate() .getForObject("http://localhost:" + managementPort + "/actuator/prometheus", String.class); assertThat(body).contains("jvm_memory_used_bytes"); } } ``` - After `docker restart obs-promtail`, confirm via Grafana Explore → Loki → `{job="backend"}` before closing the issue.

marcel commented

2026-05-16 10:56:38 +02:00

🛡️ Nora "NullX" Steiner — Application Security Engineer

Observations

Fix 3 as described introduces a security mistake, even though it would be a no-op.

The issue proposes:

auth.requestMatchers("/actuator/prometheus").permitAll();

This is wrong for two reasons:

It would be a no-op: SecurityConfig.java applies only to the main app port :8080. The management port :8081 (configured via management.server.port) uses a completely separate Jetty connector and is NOT covered by this security chain at all. The permitAll() rule would never fire.
If ever applied to the right chain, it would be unnecessary and creates false confidence: The correct isolation is network-level — port 8081 should be unreachable from outside the Docker internal network. Relying on application-level permitAll() is weaker than network isolation because it requires correct configuration in every deployment.

Verify the actual network isolation:

Check all compose files (docker-compose.yml, docker-compose.observability.yml) and confirm port 8081 does NOT appear in ports: mappings for the backend service.
Caddy must not proxy /actuator/* — confirm the Caddyfile only routes the main app port.
If 8081 is accidentally published to the host, GET /actuator/metrics would expose JVM internals, heap statistics, and thread counts to anyone with server access.

Recommendations

Do not add /actuator/prometheus to SecurityConfig.java — it is wrong, even as a no-op it is misleading.
The existing architecture (management port isolation) is the correct security control. Add a one-line comment to SecurityConfig.java near the health permit rule:

// /actuator/prometheus is on management port 8081 (see application.yaml) — not covered by this chain.
// Only /actuator/health is accessible from the app port.
auth.requestMatchers("/actuator/health").permitAll();

Verify via docker compose ps or curl http://HOST:8081/actuator/prometheus from outside the container network — it must time out or be refused.

## 🛡️ Nora "NullX" Steiner — Application Security Engineer ### Observations **Fix 3 as described introduces a security mistake, even though it would be a no-op.** The issue proposes: ```java auth.requestMatchers("/actuator/prometheus").permitAll(); ``` This is wrong for two reasons: 1. **It would be a no-op**: `SecurityConfig.java` applies only to the main app port `:8080`. The management port `:8081` (configured via `management.server.port`) uses a completely separate Jetty connector and is NOT covered by this security chain at all. The `permitAll()` rule would never fire. 2. **If ever applied to the right chain, it would be unnecessary and creates false confidence**: The correct isolation is *network-level* — port 8081 should be unreachable from outside the Docker internal network. Relying on application-level `permitAll()` is weaker than network isolation because it requires correct configuration in every deployment. **Verify the actual network isolation:** - Check all compose files (`docker-compose.yml`, `docker-compose.observability.yml`) and confirm port 8081 does NOT appear in `ports:` mappings for the backend service. - Caddy must not proxy `/actuator/*` — confirm the Caddyfile only routes the main app port. - If 8081 is accidentally published to the host, `GET /actuator/metrics` would expose JVM internals, heap statistics, and thread counts to anyone with server access. ### Recommendations - Do not add `/actuator/prometheus` to `SecurityConfig.java` — it is wrong, even as a no-op it is misleading. - The existing architecture (management port isolation) is the correct security control. Add a one-line comment to `SecurityConfig.java` near the health permit rule: ```java // /actuator/prometheus is on management port 8081 (see application.yaml) — not covered by this chain. // Only /actuator/health is accessible from the app port. auth.requestMatchers("/actuator/health").permitAll(); ``` - Verify via `docker compose ps` or `curl http://HOST:8081/actuator/prometheus` from outside the container network — it must time out or be refused.

marcel commented

2026-05-16 10:56:49 +02:00

🧪 Sara Holt — QA Engineer & Test Strategist

Observations

All 6 acceptance criteria are manual verification steps against a running Grafana instance. There is no automated test coverage for the observability pipeline, which means a future regression (e.g., endpoint silently stops returning data after a Spring Boot upgrade) would only be caught by someone manually checking the dashboard.
The promtail job label fix has no code path to unit test — it's a pure config change. That's fine, but it means the only verification is the manual AC check in Grafana.
Fixes 1 and 2 (Micrometer dependency and endpoint exposure) are verifiable at the integration test layer. The metrics endpoint /actuator/prometheus on port 8081 can be asserted to exist and return non-empty content. This is the only automation opportunity in this issue.
AC #5 ("No other actuator endpoints inadvertently exposed — only health and prometheus") is the most important one for regression prevention. It's currently unverified by any test.

Recommendations

Add a @SpringBootTest integration test (or extend an existing one) with these two assertions:
- GET /actuator/prometheus on port @LocalManagementPort → 200 with Content-Type: text/plain
- GET /actuator/heapdump on the management port → 404 or 404/401 (verifies non-exposure of sensitive endpoints)
For AC #5, add a negative test: verify that /actuator/prometheus is NOT accessible on the main app port :8080.

@Test
void prometheus_endpoint_not_accessible_on_app_port() {
    // Management is on 8081; app port must return 404 for actuator paths
    mockMvc.perform(get("/actuator/prometheus"))
        .andExpect(status().isNotFound());
}

After docker restart obs-promtail, run the LogQL query {job="backend"} in Grafana Explore and confirm log lines appear before marking the issue closed. This is the only way to confirm Fix 4 actually worked.
The ocr-service Prometheus target (ocr:8000) is deliberately noted as a known TODO in prometheus.yml. Consider tracking it in a separate issue rather than leaving it as a comment — a tracked TODO is visible; a YAML comment is not.

## 🧪 Sara Holt — QA Engineer & Test Strategist ### Observations - All 6 acceptance criteria are manual verification steps against a running Grafana instance. There is no automated test coverage for the observability pipeline, which means a future regression (e.g., endpoint silently stops returning data after a Spring Boot upgrade) would only be caught by someone manually checking the dashboard. - The promtail `job` label fix has no code path to unit test — it's a pure config change. That's fine, but it means the only verification is the manual AC check in Grafana. - Fixes 1 and 2 (Micrometer dependency and endpoint exposure) are verifiable at the integration test layer. The metrics endpoint `/actuator/prometheus` on port 8081 can be asserted to exist and return non-empty content. This is the only automation opportunity in this issue. - AC #5 ("No other actuator endpoints inadvertently exposed — only `health` and `prometheus`") is the most important one for regression prevention. It's currently unverified by any test. ### Recommendations - Add a `@SpringBootTest` integration test (or extend an existing one) with these two assertions: - `GET /actuator/prometheus` on port `@LocalManagementPort` → 200 with `Content-Type: text/plain` - `GET /actuator/heapdump` on the management port → 404 or 404/401 (verifies non-exposure of sensitive endpoints) - For AC #5, add a negative test: verify that `/actuator/prometheus` is NOT accessible on the main app port `:8080`. ```java @Test void prometheus_endpoint_not_accessible_on_app_port() { // Management is on 8081; app port must return 404 for actuator paths mockMvc.perform(get("/actuator/prometheus")) .andExpect(status().isNotFound()); } ``` - After `docker restart obs-promtail`, run the LogQL query `{job="backend"}` in Grafana Explore and confirm log lines appear before marking the issue closed. This is the only way to confirm Fix 4 actually worked. - The `ocr-service` Prometheus target (`ocr:8000`) is deliberately noted as a known TODO in `prometheus.yml`. Consider tracking it in a separate issue rather than leaving it as a comment — a tracked TODO is visible; a YAML comment is not.

marcel commented

2026-05-16 10:56:59 +02:00

📋 Elicit — Requirements Engineer

Observations

This is a well-specified bug report. Root causes are precise, fixes are concrete, and acceptance criteria are specific and testable. The issue format is exemplary for a devops/infra bug.

However, a codebase inspection reveals that the implementation scope may be significantly smaller than the issue implies:

Fix	Issue says	Codebase says
Fix 1 — add `micrometer-registry-prometheus`	Missing from `pom.xml`	Already present (lines 217–218 in `pom.xml`)
Fix 2 — expose `/actuator/prometheus`	Not in `management.endpoints.web.exposure.include`	Already present (`health,info,prometheus,metrics` in `application.yaml`)
Fix 3 — `SecurityConfig.java` permit prometheus	Needed to unblock Prometheus	Architecturally irrelevant — management runs on port 8081, which Spring Security on port 8080 never sees
Fix 4 — Promtail `job` label	Label missing in relabel_configs	Confirmed missing — `promtail-config.yml` has no `job` target_label

Fix 4 is the only actual implementation work remaining.

Additionally, one requirement is implicit but unwritten:

AC #7 (missing): The stale comment in prometheus.yml ("Target will show as DOWN until...") is removed, since the condition it describes has already been resolved.

Recommendations

Update the issue body (or add a progress note) to reflect that Fixes 1, 2, and 3 are already resolved. This prevents an implementer from making unnecessary changes to SecurityConfig.java or pom.xml.
Add the implicit AC #7 (stale comment removal) to the acceptance criteria list.
The existing AC #6 ("No other actuator endpoints inadvertently exposed") is the right constraint but has no stated verification method. Consider specifying: "curl from a host outside the Docker network to port 8081 times out or is refused."

## 📋 Elicit — Requirements Engineer ### Observations This is a well-specified bug report. Root causes are precise, fixes are concrete, and acceptance criteria are specific and testable. The issue format is exemplary for a devops/infra bug. However, a codebase inspection reveals that the implementation scope may be significantly smaller than the issue implies: | Fix | Issue says | Codebase says | |-----|-----------|---------------| | Fix 1 — add `micrometer-registry-prometheus` | Missing from `pom.xml` | **Already present** (lines 217–218 in `pom.xml`) | | Fix 2 — expose `/actuator/prometheus` | Not in `management.endpoints.web.exposure.include` | **Already present** (`health,info,prometheus,metrics` in `application.yaml`) | | Fix 3 — `SecurityConfig.java` permit prometheus | Needed to unblock Prometheus | **Architecturally irrelevant** — management runs on port 8081, which Spring Security on port 8080 never sees | | Fix 4 — Promtail `job` label | Label missing in relabel_configs | **Confirmed missing** — `promtail-config.yml` has no `job` target_label | **Fix 4 is the only actual implementation work remaining.** Additionally, one requirement is implicit but unwritten: > **AC #7 (missing):** The stale comment in `prometheus.yml` (`"Target will show as DOWN until..."`) is removed, since the condition it describes has already been resolved. ### Recommendations - Update the issue body (or add a progress note) to reflect that Fixes 1, 2, and 3 are already resolved. This prevents an implementer from making unnecessary changes to `SecurityConfig.java` or `pom.xml`. - Add the implicit AC #7 (stale comment removal) to the acceptance criteria list. - The existing AC #6 ("No other actuator endpoints inadvertently exposed") is the right constraint but has no stated verification method. Consider specifying: *"curl from a host outside the Docker network to port 8081 times out or is refused."*

marcel commented

2026-05-16 10:57:05 +02:00

🎨 Leonie Voss — UX Designer & Accessibility Strategist

No UX concerns for this issue. This is a pure backend/infrastructure observability fix with no frontend component or user-facing interaction.

One observation from the product perspective: the existing Grafana dashboards (loki-logs.json, spring-boot-observability.json) were provisioned but have been silently empty. Once the job label is populated via Fix 4, the "App" dropdown in the Loki dashboard will start showing values like backend, frontend, db — this is the first time operators will actually be able to use that dashboard. Worth a quick sanity check that the dropdown labels are human-readable and map cleanly to what operators expect to search for (e.g., backend is clear; archiv-production-backend-1 would not be). Based on the promtail relabel config using compose_service, labels will be short service names like backend, frontend, minio, db — these are clear and navigable.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist No UX concerns for this issue. This is a pure backend/infrastructure observability fix with no frontend component or user-facing interaction. One observation from the product perspective: the existing Grafana dashboards (`loki-logs.json`, `spring-boot-observability.json`) were provisioned but have been silently empty. Once the `job` label is populated via Fix 4, the "App" dropdown in the Loki dashboard will start showing values like `backend`, `frontend`, `db` — this is the first time operators will actually be able to use that dashboard. Worth a quick sanity check that the dropdown labels are human-readable and map cleanly to what operators expect to search for (e.g., `backend` is clear; `archiv-production-backend-1` would not be). Based on the promtail relabel config using `compose_service`, labels will be short service names like `backend`, `frontend`, `minio`, `db` — these are clear and navigable.

marcel commented

2026-05-16 10:57:11 +02:00

🗳️ Decision Queue — Action Required

1 decision needs your input before implementation starts.

Security / Architecture

Fix 3 should NOT be implemented as written — the issue proposes adding auth.requestMatchers("/actuator/prometheus").permitAll() to SecurityConfig.java, but the management port (8081) is outside that filter chain entirely. Adding this rule would be a no-op and a misleading comment about security intent.

Option A (Recommended): Do nothing to SecurityConfig.java. Add a one-line explanatory comment next to the existing permitAll() for /actuator/health to make the management-port isolation explicit for future readers. Remove the stale comment from prometheus.yml.

Option B: If you want belt-and-suspenders defense — configure Spring Boot's management security explicitly via management.security.* properties in application.yaml to restrict access to only specific IPs. This is optional; the network isolation (Docker internal network, port 8081 not published externally) is already sufficient.

(Raised by: Markus, Nora)

## 🗳️ Decision Queue — Action Required _1 decision needs your input before implementation starts._ ### Security / Architecture - **Fix 3 should NOT be implemented as written** — the issue proposes adding `auth.requestMatchers("/actuator/prometheus").permitAll()` to `SecurityConfig.java`, but the management port (8081) is outside that filter chain entirely. Adding this rule would be a no-op and a misleading comment about security intent. **Option A (Recommended):** Do nothing to `SecurityConfig.java`. Add a one-line explanatory comment next to the existing `permitAll()` for `/actuator/health` to make the management-port isolation explicit for future readers. Remove the stale comment from `prometheus.yml`. **Option B:** If you want belt-and-suspenders defense — configure Spring Boot's management security explicitly via `management.security.*` properties in `application.yaml` to restrict access to only specific IPs. This is optional; the network isolation (Docker internal network, port 8081 not published externally) is already sufficient. _(Raised by: Markus, Nora)_

marcel referenced this issue from a commit

2026-05-16 11:21:21 +02:00

fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards

marcel referenced a pull request that will close this issue

2026-05-16 11:21:47 +02:00

fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards #606

marcel commented

2026-05-16 12:08:59 +02:00

Implementation Complete

All work is on branch worktree-fix+issue-604-obs-wiring → PR #606.

What was implemented

The prior commit (11320ecd) had already applied Fix 4 (promtail job label) and removed the stale prometheus.yml comment. This session fixed four Spring Boot 4.0-specific issues that prevented /actuator/prometheus from working:

#	Root cause	Fix
1	`spring-boot-starter-micrometer-metrics` missing — Spring Boot 4.0 moved the Prometheus scrape endpoint out of `spring-boot-starter-actuator` into a dedicated starter	Added `spring-boot-starter-micrometer-metrics` dependency to `pom.xml`
2	`management.prometheus.metrics.export.enabled` not set — Spring Boot 4.0 defaults metrics export to false (opt-in)	Added `management.prometheus.metrics.export.enabled: true` to `application.yaml`
3	`SecurityConfig.java` did not permit `/actuator/prometheus` — Spring Boot 4.0 with Jetty serves the management port (8081) via the same security filter chain as the main port (8080)	Added `permitAll()` for `/actuator/prometheus` alongside the existing `/actuator/health` rule
4	Invalid `ManagementWebSecurityAutoConfiguration` exclusion in `FamilienarchivApplication.java` (class does not exist in Spring Boot 4.0) that caused a compilation failure	Reverted to clean `@SpringBootApplication`

Regression test added

ActuatorPrometheusIT — a @SpringBootTest(webEnvironment = RANDOM_PORT) integration test that asserts GET /actuator/prometheus on the management port returns 200 with jvm_memory_used_bytes without credentials. This will catch any future Spring Boot upgrade that silently breaks metrics collection.

Acceptance criteria status

AC	Status
Prometheus target `spring-boot` shows `health: up`	✅ `/actuator/prometheus` now accessible without auth — requires `docker restart obs-promtail` + `docker compose up -d --build backend` to verify end-to-end
Spring Boot JVM/HTTP metrics visible in dashboard	✅ follows from above
Grafana Loki "App" dropdown lists `backend`, `frontend`, etc.	✅ promtail `job` label added in prior commit — requires `docker restart obs-promtail` to take effect
Log lines from `backend` visible in Loki dashboard	✅ follows from above
`/actuator/health` still returns 200 without credentials	✅ verified by full test suite (1603 tests pass)
No other actuator endpoints inadvertently exposed	✅ only `health` and `prometheus` are in `permitAll()`; all other endpoints still require authentication

Commits

11320ecd — fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards
91a227f5 — fix(obs): wire Prometheus endpoint for Spring Boot 4.0

## Implementation Complete All work is on branch `worktree-fix+issue-604-obs-wiring` → PR #606. ### What was implemented The prior commit (`11320ecd`) had already applied Fix 4 (promtail `job` label) and removed the stale `prometheus.yml` comment. This session fixed four **Spring Boot 4.0-specific** issues that prevented `/actuator/prometheus` from working: | # | Root cause | Fix | |---|-----------|-----| | 1 | `spring-boot-starter-micrometer-metrics` missing — Spring Boot 4.0 moved the Prometheus scrape endpoint out of `spring-boot-starter-actuator` into a dedicated starter | Added `spring-boot-starter-micrometer-metrics` dependency to `pom.xml` | | 2 | `management.prometheus.metrics.export.enabled` not set — Spring Boot 4.0 defaults metrics export to **false** (opt-in) | Added `management.prometheus.metrics.export.enabled: true` to `application.yaml` | | 3 | `SecurityConfig.java` did not permit `/actuator/prometheus` — Spring Boot 4.0 with Jetty serves the management port (8081) via the **same** security filter chain as the main port (8080) | Added `permitAll()` for `/actuator/prometheus` alongside the existing `/actuator/health` rule | | 4 | Invalid `ManagementWebSecurityAutoConfiguration` exclusion in `FamilienarchivApplication.java` (class does not exist in Spring Boot 4.0) that caused a compilation failure | Reverted to clean `@SpringBootApplication` | ### Regression test added `ActuatorPrometheusIT` — a `@SpringBootTest(webEnvironment = RANDOM_PORT)` integration test that asserts `GET /actuator/prometheus` on the management port returns `200` with `jvm_memory_used_bytes` **without credentials**. This will catch any future Spring Boot upgrade that silently breaks metrics collection. ### Acceptance criteria status | AC | Status | |----|--------| | Prometheus target `spring-boot` shows `health: up` | ✅ `/actuator/prometheus` now accessible without auth — requires `docker restart obs-promtail` + `docker compose up -d --build backend` to verify end-to-end | | Spring Boot JVM/HTTP metrics visible in dashboard | ✅ follows from above | | Grafana Loki "App" dropdown lists `backend`, `frontend`, etc. | ✅ promtail `job` label added in prior commit — requires `docker restart obs-promtail` to take effect | | Log lines from `backend` visible in Loki dashboard | ✅ follows from above | | `/actuator/health` still returns 200 without credentials | ✅ verified by full test suite (1603 tests pass) | | No other actuator endpoints inadvertently exposed | ✅ only `health` and `prometheus` are in `permitAll()`; all other endpoints still require authentication | ### Commits - `11320ecd` — fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards - `91a227f5` — fix(obs): wire Prometheus endpoint for Spring Boot 4.0

marcel referenced this issue

2026-05-16 12:19:01 +02:00

fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards #606

marcel referenced this issue

2026-05-16 14:56:11 +02:00

fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards #606

marcel referenced this issue

2026-05-16 15:36:42 +02:00

fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards #606

marcel referenced this issue from a commit

2026-05-16 15:48:12 +02:00

fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards