fix(obs): wire up Spring Boot metrics and Loki log labels in Grafana #604
Problem
The Grafana dashboard at grafana.archive.raddatz.cloud has no datapoints from Spring Boot and no app logs. Only the Linux resource dashboard (node-exporter + cAdvisor) works. Root cause analysis identified 4 independent issues across the observability pipeline.
Root Causes
1. Missing `micrometer-registry-prometheus` dependency
`backend/pom.xml` has `spring-boot-starter-actuator` but not the Prometheus registry. Without it, the `/actuator/prometheus` endpoint does not exist.
Fix: add the `micrometer-registry-prometheus` dependency to `pom.xml`.
2. `/actuator/prometheus` not in `management.endpoints.web.exposure.include`
Spring Boot 3+ only exposes `health` over HTTP by default. Even after adding Micrometer, the endpoint won't respond until it is explicitly exposed.
Fix: add `prometheus` to the exposure list in `backend/src/main/resources/application.yaml`, under the existing `management:` block.
3. Spring Security blocks `/actuator/prometheus` (returns 401)
`SecurityConfig.java` permits `/actuator/health`, but `/actuator/prometheus` falls through to `anyRequest().authenticated()`. Prometheus receives an HTML 401 page, logs "received unsupported Content-Type text/html", and marks the target `health: down`.
Fix: add `auth.requestMatchers("/actuator/prometheus").permitAll()` to the `authorizeHttpRequests` block in `SecurityConfig.java`.
4. Promtail never sets a `job` label, so the Loki logs dashboard returns nothing
Promtail's `docker_sd_configs` relabels containers to `compose_service`, `compose_project`, etc., but never to `job`. The provisioned Grafana dashboard `loki-logs.json` uses `label_values(job)` for its "App" dropdown and `{job="$app"}` in all panel queries. Since no `job` label exists in Loki, the dropdown is empty and every query returns no data. (Backend logs are in Loki under `compose_service=backend`; they're just invisible to the dashboard.)
Fix: add a relabel rule in `infra/observability/promtail/promtail-config.yml` that maps the Compose service name to `job`. Then restart `obs-promtail`: `docker restart obs-promtail`.
Acceptance Criteria
- `spring-boot` shows `health: up` (check at https://grafana.archive.raddatz.cloud → Explore → Prometheus → `up{job="spring-boot"}`)
- […] `backend`, `frontend`, etc.
- […] `backend` visible in the "Loki Logs" dashboard panel
- `/actuator/health` still returns 200 without credentials (Docker health check must keep working)
- […] `health` and `prometheus`)
Notes
- […] `./mvnw clean package`, then `docker compose up -d --build backend` in the staging compose).
- […] `docker restart obs-promtail`; no rebuild needed.
- The `ocr-service` Prometheus target is also `down` due to a DNS resolution failure (`lookup ocr` fails because the container is on a different network than Prometheus expects). That's a separate known TODO already noted in `prometheus.yml` and is out of scope here.
🏗️ Markus Keller, Senior Application Architect
Observations
- The dedicated management port (`management.server.port: 8081`) is the architecturally correct approach and is already in place. The comment in `application.yaml` explains it precisely: Spring Security's filter chain on `:8080` never sees requests arriving on `:8081`. This is a clean separation of concerns.
- Adding `auth.requestMatchers("/actuator/prometheus").permitAll()` to `SecurityConfig.java` would have no effect, because that chain never sees port-8081 traffic. Do not add it.
- `pom.xml` has `micrometer-registry-prometheus` at line 217.
- `application.yaml` already includes `health,info,prometheus,metrics` in the exposure list.
- Fix 4 (the Promtail `job` label) is the one genuinely missing piece. The dashboard queries in `loki-logs.json` use `{job="$app"}` and `label_values(job)`; these will return nothing until the label exists in Loki.
- The comment in `prometheus.yml` (lines 17–19) says: "Target will show as DOWN until backend instrumentation issue adds micrometer-registry-prometheus..." This should be removed as part of this fix since that work is done.
Recommendations
- Add the `job` relabel rule to `infra/observability/promtail/promtail-config.yml`.
- Remove the stale comment from `infra/observability/prometheus/prometheus.yml`.
- Do not modify `SecurityConfig.java`; the management port architecture already handles isolation correctly.
- Verify that port 8081 is not published externally (that is, it does not appear in the `ports:` section of any compose file under an external bind). The comment in `application.yaml` claims this; confirm it holds in `docker-compose.observability.yml`.
- The `docs/infrastructure/production-compose.md` file is in the current commit (modified in this branch); verify that any architecture notes there reflect the management port design accurately.
👨‍💻 Felix Brandt, Senior Fullstack Developer
Observations
- `promtail-config.yml` currently relabels `compose_service` → `compose_service` and `compose_project` → `compose_project`, but never maps anything to `job`. The fix is one `relabel_configs` entry.
- The comment in `prometheus.yml` (`# Target will show as DOWN until backend instrumentation issue adds micrometer-registry-prometheus...`) violates clean code: it describes a past state, not a constraint or intent. Delete it.
Recommendations
- Add a relabel entry to `promtail-config.yml` that maps `backend` → `job=backend`, `frontend` → `job=frontend`, etc., which is exactly what `label_values(job)` in the Grafana dashboard requires.
- Add a `@SpringBootTest` integration test that calls `GET /actuator/prometheus` on the management port and asserts: (a) status 200, (b) response body contains `jvm_memory_used_bytes`. This is the only production code change in this issue that can be regression-tested. Without it, a future config change could silently break metrics again.
- After `docker restart obs-promtail`, confirm via Grafana Explore → Loki → `{job="backend"}` before closing the issue.
🛡️ Nora "NullX" Steiner, Application Security Engineer
Observations
Fix 3 as described introduces a security mistake, even though it would be a no-op.
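The issue proposes adding `auth.requestMatchers("/actuator/prometheus").permitAll()` to the `authorizeHttpRequests` block in `SecurityConfig.java`.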
This is wrong for two reasons:
1. It would be a no-op: `SecurityConfig.java` applies only to the main app port `:8080`. The management port `:8081` (configured via `management.server.port`) uses a completely separate Jetty connector and is NOT covered by this security chain at all. The `permitAll()` rule would never fire.
2. If ever applied to the right chain, it would be unnecessary and would create false confidence: the correct isolation is network-level, meaning port 8081 should be unreachable from outside the Docker internal network. Relying on application-level `permitAll()` is weaker than network isolation because it requires correct configuration in every deployment.
Verify the actual network isolation:
- Check the compose files (`docker-compose.yml`, `docker-compose.observability.yml`) and confirm port 8081 does NOT appear in `ports:` mappings for the backend service.
- […] `/actuator/*`: confirm the Caddyfile only routes the main app port.
- […] `GET /actuator/metrics` would expose JVM internals, heap statistics, and thread counts to anyone with server access.
Recommendations
- Do not add `/actuator/prometheus` to `SecurityConfig.java`: it is wrong, and even as a no-op it is misleading.
- Add an explanatory comment in `SecurityConfig.java` near the health permit rule.
- Verify with `docker compose ps` or `curl http://HOST:8081/actuator/prometheus` from outside the container network; the request must time out or be refused.
🧪 Sara Holt, QA Engineer & Test Strategist
Observations
- The `job` label fix has no code path to unit test; it's a pure config change. That's fine, but it means the only verification is the manual AC check in Grafana.
- `/actuator/prometheus` on port 8081 can be asserted to exist and return non-empty content. This is the only automation opportunity in this issue.
- […] `health` and `prometheus`") is the most important one for regression prevention. It's currently unverified by any test.
Recommendations
- Add a `@SpringBootTest` integration test (or extend an existing one) with these two assertions:
  - `GET /actuator/prometheus` on port `@LocalManagementPort` → 200 with `Content-Type: text/plain`
  - `GET /actuator/heapdump` on the management port → 404 or 401 (verifies non-exposure of sensitive endpoints)
- […] `/actuator/prometheus` is NOT accessible on the main app port `:8080`.
- After `docker restart obs-promtail`, run the LogQL query `{job="backend"}` in Grafana Explore and confirm log lines appear before marking the issue closed. This is the only way to confirm Fix 4 actually worked.
- The `ocr-service` Prometheus target (`ocr:8000`) is deliberately noted as a known TODO in `prometheus.yml`. Consider tracking it in a separate issue rather than leaving it as a comment: a tracked TODO is visible; a YAML comment is not.
📋 Elicit, Requirements Engineer
Observations
This is a well-specified bug report. Root causes are precise, fixes are concrete, and acceptance criteria are specific and testable. The issue format is exemplary for a devops/infra bug.
However, a codebase inspection reveals that the implementation scope may be significantly smaller than the issue implies:
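| Fix | Item | Finding in the codebase |
| --- | --- | --- |
| 1 | `micrometer-registry-prometheus` in `pom.xml` | Already present in `pom.xml` |
| 2 | `/actuator/prometheus` in `management.endpoints.web.exposure.include` | Already exposed (`health,info,prometheus,metrics` in `application.yaml`) |
| 3 | `SecurityConfig.java` permit prometheus | […] |
| 4 | `job` label | `promtail-config.yml` has no `job` target_label |

Fix 4 is the only actual implementation work remaining.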
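For orientation, that remaining Fix 4 change could look roughly like the fragment below. This is a sketch, not the project's actual config: the scrape job name `docker`, the Docker socket path, and the choice of the Compose-generated container label `com.docker.compose.service` as the source are assumptions. Only the `job` target label and the file name come from this issue.

```yaml
# infra/observability/promtail/promtail-config.yml (illustrative fragment)
scrape_configs:
  - job_name: docker                        # assumed name, not taken from the repo
    docker_sd_configs:
      - host: unix:///var/run/docker.sock   # assumed Docker socket path
    relabel_configs:
      # Copy the Compose service name (e.g. "backend") into the `job` label,
      # which loki-logs.json queries via label_values(job) and {job="$app"}.
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: job
```

With a rule of this shape, a container started as Compose service `backend` would be queryable as `{job="backend"}` once Promtail has been restarted.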
Additionally, one requirement is implicit but unwritten:
Recommendations
- […] `SecurityConfig.java` or `pom.xml`.
🎨 Leonie Voss, UX Designer & Accessibility Strategist
No UX concerns for this issue. This is a pure backend/infrastructure observability fix with no frontend component or user-facing interaction.
One observation from the product perspective: the existing Grafana dashboards (`loki-logs.json`, `spring-boot-observability.json`) were provisioned but have been silently empty. Once the `job` label is populated via Fix 4, the "App" dropdown in the Loki dashboard will start showing values like `backend`, `frontend`, `db`. This is the first time operators will actually be able to use that dashboard. It is worth a quick sanity check that the dropdown labels are human-readable and map cleanly to what operators expect to search for (e.g., `backend` is clear; `archiv-production-backend-1` would not be). Because the Promtail relabel config uses `compose_service`, labels will be short service names like `backend`, `frontend`, `minio`, `db`, which are clear and navigable.
🗳️ Decision Queue: Action Required
1 decision needs your input before implementation starts.
Security / Architecture
Fix 3 should NOT be implemented as written. The issue proposes adding `auth.requestMatchers("/actuator/prometheus").permitAll()` to `SecurityConfig.java`, but the management port (8081) is outside that filter chain entirely. Adding this rule would be a no-op and would send a misleading signal about security intent.
Option A (Recommended): Do nothing to `SecurityConfig.java`. Add a one-line explanatory comment next to the existing `permitAll()` for `/actuator/health` to make the management-port isolation explicit for future readers. Remove the stale comment from `prometheus.yml`.
Option B: If you want belt-and-suspenders defense, configure Spring Boot's management security explicitly via `management.security.*` properties in `application.yaml` to restrict access to only specific IPs. This is optional; the network isolation (Docker internal network, port 8081 not published externally) is already sufficient.
(Raised by: Markus, Nora)
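The management-port design that this decision turns on can be pictured with an `application.yaml` fragment like the one below. It is assembled only from values quoted in this thread (`management.server.port: 8081` and the exposure list `health,info,prometheus,metrics`) and is a sketch; the real file may contain further settings.

```yaml
# backend/src/main/resources/application.yaml (illustrative fragment)
management:
  server:
    port: 8081              # actuator endpoints are served on a separate port
  endpoints:
    web:
      exposure:
        # Only these actuator endpoints are reachable over HTTP.
        include: health,info,prometheus,metrics
```

Under this layout, whether the endpoints are reachable from outside depends on whether port 8081 is published by the compose files, which is why the reviewers ask for that to be checked.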
Implementation Complete
All work is on branch `worktree-fix+issue-604-obs-wiring` → PR #606.
What was implemented
The prior commit (`11320ecd`) had already applied Fix 4 (Promtail `job` label) and removed the stale `prometheus.yml` comment. This session fixed four Spring Boot 4.0-specific issues that prevented `/actuator/prometheus` from working:

| Root cause | Fix |
| --- | --- |
| `spring-boot-starter-micrometer-metrics` missing: Spring Boot 4.0 moved the Prometheus scrape endpoint out of `spring-boot-starter-actuator` into a dedicated starter | Added `spring-boot-starter-micrometer-metrics` dependency to `pom.xml` |
| `management.prometheus.metrics.export.enabled` not set: Spring Boot 4.0 defaults metrics export to false (opt-in) | Added `management.prometheus.metrics.export.enabled: true` to `application.yaml` |
| `SecurityConfig.java` did not permit `/actuator/prometheus`: Spring Boot 4.0 with Jetty serves the management port (8081) via the same security filter chain as the main port (8080) | Added `permitAll()` for `/actuator/prometheus` alongside the existing `/actuator/health` rule |
| `ManagementWebSecurityAutoConfiguration` exclusion in `FamilienarchivApplication.java` (class does not exist in Spring Boot 4.0) that caused a compilation failure | Removed the exclusion from `@SpringBootApplication` |

Regression test added
`ActuatorPrometheusIT`: a `@SpringBootTest(webEnvironment = RANDOM_PORT)` integration test that asserts `GET /actuator/prometheus` on the management port returns `200` with `jvm_memory_used_bytes` without credentials. This will catch any future Spring Boot upgrade that silently breaks metrics collection.
Acceptance criteria status
- `spring-boot` shows `health: up`: `/actuator/prometheus` now accessible without auth; requires `docker restart obs-promtail` + `docker compose up -d --build backend` to verify end-to-end
- […] `backend`, `frontend`, etc.: […] `job` label added in prior commit; requires `docker restart obs-promtail` to take effect
- […] `backend` visible in Loki dashboard
- `/actuator/health` still returns 200 without credentials
- […] `health` and `prometheus` are in `permitAll()`; all other endpoints still require authentication
Commits
- `11320ecd`: fix(obs): wire Prometheus metrics and Loki job label for Grafana dashboards
- `91a227f5`: fix(obs): wire Prometheus endpoint for Spring Boot 4.0
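For the end-to-end check mentioned under the acceptance criteria, Prometheus needs a scrape job that points at the management port. A minimal sketch of such a job follows. The job name `spring-boot` and port 8081 come from this issue; the target host name `backend` (the Compose service name) is an assumption, and the project's real `prometheus.yml` may differ.

```yaml
# infra/observability/prometheus/prometheus.yml (illustrative fragment)
scrape_configs:
  - job_name: spring-boot                 # matches up{job="spring-boot"} in the acceptance criteria
    metrics_path: /actuator/prometheus    # Spring Boot's Prometheus scrape endpoint
    static_configs:
      - targets: ['backend:8081']         # assumed host name; 8081 is the management port
```

Once the backend has been rebuilt and restarted, `up{job="spring-boot"}` in Grafana Explore should report `1` for a target configured this way.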