devops(backend): expose Prometheus metrics endpoint + OTLP trace export from Spring Boot #576
Context
Spring Boot Actuator is already active (the Docker healthcheck calls `/actuator/health`). This issue adds two new capabilities to the backend:

- `/actuator/prometheus`, so Prometheus can pull JVM, HTTP, and database metrics
- OTLP trace export, so request traces land in Tempo

Both are wired through Micrometer, which is already the standard in Spring Boot 3+/4.
Depends on: Tempo issue (Tempo must exist in the compose network for traces to land somewhere, though OTLP failures are non-fatal)
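For orientation, the wiring described in Parts A and B below can be sketched as a single `application.yml` fragment. This is a sketch only — it assumes the standard Spring Boot Actuator / Micrometer tracing property names and the OTel Spring Boot starter's `otel.*` keys; the management port question is discussed in the review comments further down:

```yaml
# Sketch — assumes micrometer-registry-prometheus and the OTel starter are on the classpath.
management:
  endpoints:
    web:
      exposure:
        include: health, info, prometheus, metrics
  tracing:
    sampling:
      probability: 1.0   # dev default; override to 0.1 in the production compose
  # reviewers below recommend also setting management.server.port: 8081

otel:
  service:
    name: familienarchiv-backend
  exporter:
    otlp:
      # falls back to localhost so the app starts cleanly with no observability stack
      endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:http://localhost:4317}
```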
Part A — Prometheus metrics
- `backend/pom.xml` — add dependency
- `backend/src/main/resources/application.yml` — expose the endpoint

The `/actuator/prometheus` endpoint must not be reachable through the Caddy reverse proxy (Prometheus scrapes it directly from inside `archiv-net`). Verify that `application-prod.yml` still blocks actuator exposure via the public interface, or that Caddy continues to block `/actuator/*`.

Part B — OpenTelemetry tracing
- `backend/pom.xml` — add dependencies
- `backend/src/main/resources/application.yml`
- `docker-compose.yml` — add env var to backend service

The default `http://localhost:4317` ensures the app starts cleanly when no observability stack is running (e.g., in CI). OTLP export failures are non-fatal by default in OpenTelemetry.

Acceptance Criteria
- `curl -s http://localhost:8080/actuator/prometheus` returns HTTP 200 with Prometheus text format (lines beginning with `# HELP` and `# TYPE`)
- The `spring-boot` scrape target in the Prometheus UI (http://localhost:9090/targets) shows state `UP`
- `curl -s 'http://localhost:9090/api/v1/query?query=jvm_memory_used_bytes'` returns data
- `curl -s 'http://localhost:9090/api/v1/query?query=http_server_requests_seconds_count'` returns data after making any API call
- Query http://localhost:3200/api/search and confirm the `familienarchiv-backend` service is listed
- `./mvnw test` passes — no test regressions

Definition of Done
- `pom.xml` and `application.yml` changes committed
- `docker-compose.yml` updated with `OTEL_EXPORTER_OTLP_ENDPOINT`
- Merged to `main`

🏗️ Markus Keller — Application Architect
Observations
- Production blocks `/actuator/*` externally. The issue's acceptance criteria, however, describe scraping on `localhost:8080/actuator/prometheus` — that contradicts the documented production architecture.
- `management.server.port` is implicitly 8080 (same as the app port). The production docs say management should be on port 8081, so the main port handles only app traffic and the management port is never routed through Caddy. This split should be reflected in the YAML snippets.
- `opentelemetry-spring-boot-starter` is not version-managed by the Spring Boot 4 BOM — unlike `micrometer-tracing-bridge-otel`, which is. The issue omits the `<version>` tag for `opentelemetry-spring-boot-starter`. This will cause a build failure without an explicit version or a `<dependencyManagement>` entry. Check the OpenTelemetry Spring Boot starter release matrix for Spring Boot 4 compatibility before implementing.
- The `docs/architecture/c4/l2-containers.puml` diagram needs to be updated: Prometheus and Tempo are new containers in the topology. Per the architect's update table, new Docker services require both `l2-containers.puml` and `docs/DEPLOYMENT.md` updates.

Recommendations
- Set `management.server.port: 8081` in `application.yml`: the Caddy `respond @actuator 404` rule never needs to cover port 8081 (since it's not routed through Caddy at all).
- Verify `opentelemetry-spring-boot-starter` against the Spring Boot 4 / OTel instrumentation release matrix. As of early 2026, Spring Boot 4 requires OpenTelemetry instrumentation 2.x — pin the exact version explicitly in `pom.xml`.
- Update `docs/architecture/c4/l2-containers.puml` to add Prometheus and Tempo as containers. This is a doc obligation on the architecture update table, not optional.

👨💻 Felix Brandt — Senior Fullstack Developer
Observations
- The `opentelemetry-spring-boot-starter` dependency in the issue body is missing a `<version>` tag. The Spring Boot BOM manages `micrometer-tracing-bridge-otel`, but NOT the OTel Spring Boot starter (it lives in the `io.opentelemetry.instrumentation` group, outside the Spring BOM). A build will fail without an explicit version.
- `ApplicationContextTest` loads the full Spring context with `@SpringBootTest`. Adding the OTel autoconfiguration may cause this test to attempt an OTLP connection at startup (depending on how the starter wires its exporters). The issue's AC says "Application starts cleanly in CI — OTLP failure is logged as a warning, not a startup error." That behavior needs verification — it depends on whether the Spring Boot 4 + OTel starter combination fails fast or degrades gracefully when the OTLP endpoint is unreachable.
- `application-dev.yaml` currently only has `spring.jpa.show-sql: true` and springdoc overrides. Adding `otel.*` config to `application.yaml` with a `${OTEL_EXPORTER_OTLP_ENDPOINT:http://localhost:4317}` default is correct — that default handles the no-observability-stack case.
- Set the sampling probability to `1.0` in `application.yml` (base config) with a note to override to `0.1` in prod. This should live in `application.yml` as `1.0` for dev and be overridden via the environment variable `MANAGEMENT_TRACING_SAMPLING_PROBABILITY=0.1` in the production compose. Avoid a separate `application-prod.yml` profile — the team doesn't use that profile pattern (only `dev` is defined).

Recommendations
- Verify that `./mvnw test` passes end-to-end with the starter present and no OTLP endpoint reachable. The `ApplicationContextTest` uses `WebEnvironment.NONE`, so HTTP context is absent — confirm the OTel exporter initialization doesn't block startup in that mode.

🔐 Nora "NullX" Steiner — Application Security Engineer
Observations
Prometheus endpoint access control (CWE-200: Exposure of Sensitive Information)
The current `SecurityConfig` permits only `/actuator/health` without authentication. The new `/actuator/prometheus` endpoint will be protected by the standard "authenticated" catch-all — meaning it requires a valid session to scrape. Prometheus does not support session-based auth.

Two options exist: (1) add `/actuator/prometheus` to `permitAll()` in `SecurityConfig`, or (2) move management endpoints to a separate port (8081) and configure Spring Security to allow unauthenticated access on the management port only. Option 2 is strongly preferred — it exposes the scrape endpoint only on the internal Docker network port, never through the session-authenticated main API.
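Option 2 can be sketched as an `application.yml` fragment. A sketch under the assumption of standard Spring Boot Actuator property names — note that a matching Spring Security rule for unauthenticated access on the management port may still be required, as described above:

```yaml
# Sketch — serve actuator endpoints on a separate management port.
# Port 8081 is only exposed inside the Docker network (never mapped through Caddy),
# so the session-authenticated main API on 8080 never serves scrape traffic.
management:
  server:
    port: 8081
  endpoints:
    web:
      exposure:
        include: health, info, prometheus, metrics
```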
/actuator/prometheusendpoint must not be reachable through the Caddy reverse proxy" — but says nothing about how Spring Security will handle the scrape request. This is a gap.OTLP export — data exposure
OTLP traces will include HTTP request attributes: URL paths, HTTP method, status codes, and potentially query parameters. Spring Boot's OTel auto-instrumentation does not sanitize query parameters from span attributes by default. If any API endpoint includes sensitive data in query parameters (e.g., search terms, file names), those values will appear in Tempo traces.
Heap/environment exposure is not the concern here — the issue correctly blocks `/actuator/*` via Caddy and limits exposure to `health, info, prometheus, metrics`. `/actuator/heapdump` and `/actuator/env` are not included. This is good.

Recommendations
- Use the separate management port (8081) rather than adding `permitAll` in `SecurityConfig`.
- Keep `SecurityConfig` unchanged — the management port is unreachable from the public internet by network topology.
- Verify that `/actuator/prometheus` returns 401 or is unreachable without credentials on the main port (8080). This prevents a future `SecurityConfig` change from accidentally reopening it.

🧪 Sara Holt — QA Engineer & Test Strategist
Observations
- The AC "`./mvnw test` passes — no test regressions" is the only automated AC and it's the most important one. The risk is that the OTel Spring Boot starter autoconfigures an OTLP exporter that attempts a connection during `ApplicationContextTest`. If the exporter fails non-fatally (logs a warning), tests pass. If it throws during context initialization, they fail. This needs explicit validation before the PR is merged — it's the primary regression risk in this issue.
- CI runs `./mvnw clean test` without a `tempo` or `prometheus` service. The issue's CI acceptance criterion ("OTLP failure is logged as a warning, not a startup error") will be verified implicitly by the CI run — but only if that CI run actually passes. If it doesn't, the issue isn't done.
- The `opentelemetry-spring-boot-starter` initializes a TracerProvider at startup. On first run with no cached beans, this could add 2-5 seconds to `ApplicationContextTest`. Worth checking.

Recommendations
- Run `./mvnw test` locally (or observe CI) after adding the OTel starter to confirm `ApplicationContextTest` still completes within the existing time budget. If startup time increases by more than ~3 seconds, flag it.

⚙️ Tobias Wendt — DevOps & Platform Engineer
Observations
- `docker-compose.yml` already has known issues I'll call out: `minio/minio:latest` and `minio/mc` use `:latest` tags (not this issue, but flagged for awareness). The Tempo service from the companion issue needs a pinned tag when it lands — `grafana/tempo:2.4.1`, not `:latest`.
- The `OTEL_EXPORTER_OTLP_ENDPOINT` default of `http://localhost:4317` will fail silently in development when Tempo isn't running — which is intentional and correct. Confirmed: OTel exporter failures are non-fatal by default. Good.
- `docs/DEPLOYMENT.md` already states management port 8081 for Prometheus scraping. The Prometheus `scrape_configs` in the companion issue likely already targets port 8081. The YAML snippet in this issue points to port 8080 implicitly. This mismatch needs to be resolved.
- No `depends_on` update is needed for the backend service in `docker-compose.yml` — Tempo doesn't need to be healthy before the backend starts (OTLP failures are non-fatal, and the backend has no functional dependency on Tempo).
- The `./mvnw clean test` step in `.gitea/workflows/ci.yml` runs without Tempo or Prometheus services. The `ApplicationContextTest` will attempt OTel initialization. If the starter's default behavior is to log a warning when OTLP is unreachable (confirmed by OTel SDK design), CI will continue to pass. If not, CI will break on the first push.

Recommendations
- Make clear in `application.yml` that the `OTEL_EXPORTER_OTLP_ENDPOINT` env var is for traces (port 4317 on Tempo), not the Prometheus scrape — those are two separate concerns on two separate ports. Make this explicit in the issue and in code comments.
- Ensure the Prometheus scrape target is `backend:8081` (management port), not `backend:8080`. If the Prometheus config hasn't been written yet, note this dependency explicitly.
- Update `docs/infrastructure/production-compose.md` — the management port 8081 exposure and the new env vars need to be reflected there for the next person doing a production deployment.
- The `./data/postgres` bind mount in `docker-compose.yml` is a known issue (named volumes preferred for production), but out of scope for this issue. Flagged for awareness only.

📋 Elicit — Requirements Engineer
Observations
- One AC says "`spring-boot` scrape target in Prometheus UI shows state `UP`" — but the scrape target name `spring-boot` presupposes a specific Prometheus `scrape_configs` job name that hasn't been defined yet (it belongs to the companion Prometheus configuration issue). This AC is only verifiable once that issue is also implemented. The dependency on the Tempo issue is stated, but the Prometheus config dependency is not.
- Reference the Tempo issue by number (`#575` or whatever the Tempo issue is). Gitea will auto-link it.
- The files in scope are `pom.xml`, `application.yml`, and `docker-compose.yml` — this is correct and complete for the scope.
- Nothing is said about the `ApplicationContextTest` test profile (`@ActiveProfiles("test")`) — whether an `application-test.yaml` needs to disable OTLP export explicitly for the test suite. This is an unresolved edge case.

Recommendations
- State explicitly in the issue body what `management.server.port` is configured to.
- Add an explicit AC: "When running `./mvnw test` with no OTLP endpoint, the OTel exporter logs a warning (not ERROR) and the test suite completes successfully." This is currently implied but not explicitly stated in the ACs.
- Add an `application-test.yaml` that explicitly sets `management.tracing.sampling.probability: 0.0` for the test profile to prevent any trace export attempts during tests.

🎨 Leonie Voss — UX Designer & Accessibility Strategist
No UX or frontend concerns from my angle on this issue.
This is a pure backend observability change — Prometheus metrics endpoint and OTLP trace export. No user-visible interface, no new routes, no Svelte components, no design tokens, no interaction patterns. The work has zero impact on the frontend user experience.
The one downstream UX benefit worth noting: once Grafana dashboards are wired up (Phase 7 milestone), the observability data this issue provides will enable a future admin-facing metrics panel. When that feature is specced, I'll want to review the dashboard UI for the dual-audience design constraints (seniors + millennials). That's a future issue, not this one.
🗳️ Decision Queue — Action Required
2 decisions need your input before implementation starts.
Infrastructure
Management port — 8080 vs 8081. The production docs (`docs/DEPLOYMENT.md`) already state port 8081 for Prometheus scraping. The issue YAML snippets imply port 8080 (no `management.server.port` config). If you use port 8081: Caddy never needs to see management traffic, Spring Security doesn't need `permitAll` on `/actuator/prometheus`, and the Prometheus `scrape_configs` in the companion issue must target port 8081. If you stay on 8080: you must add `/actuator/prometheus` to `permitAll()` in `SecurityConfig` (auth would otherwise block the scrape). All four reviewers (Markus, Felix, Nora, Tobias) independently recommend port 8081. The only cost is adding `management.server.port: 8081` to `application.yml` and confirming the Prometheus scrape job targets the right port. (Raised by: Markus, Nora, Tobias, Felix)

Dependencies
`opentelemetry-spring-boot-starter` (`io.opentelemetry.instrumentation` group) is NOT version-managed by the Spring Boot BOM. The issue omits a `<version>` tag, which will cause a build failure. You need to look up the correct version from the OpenTelemetry Java Instrumentation releases that is compatible with Spring Boot 4 / Spring Framework 7 before writing the POM change. Once resolved, pin it explicitly in `pom.xml`. (Raised by: Markus, Felix)

Implementation complete — branch `feat/issue-576-backend-instrumentation`

What was done
Commit:
`b3e49a95` — devops(backend): expose Prometheus metrics endpoint + OTLP trace export from Spring Boot

Part A — Prometheus metrics (`backend/pom.xml` + `application.yaml`)

- Added `micrometer-registry-prometheus` (version managed by Spring Boot BOM)
- Exposed the `health, info, prometheus, metrics` endpoints
- No `permitAll` change to `SecurityConfig` needed
- Prometheus scrapes `backend:8081` directly inside `archiv-net` — matches the existing `infra/observability/prometheus/prometheus.yml` scrape target already pointing at `backend:8081`
- Added `expose: "8081"` to the backend service in `docker-compose.yml` for intra-network reachability

Part B — OpenTelemetry tracing (`backend/pom.xml` + `application.yaml` + `docker-compose.yml`)

- Added `micrometer-tracing-bridge-otel` (BOM-managed)
- Added `opentelemetry-spring-boot-starter` pinned to 2.27.0 — latest stable release with Spring Boot 4 / Spring Framework 7 support (not in Spring Boot BOM; explicit version required)
- Set `otel.service.name: familienarchiv-backend` and the OTLP endpoint with a `${OTEL_EXPORTER_OTLP_ENDPOINT:http://localhost:4317}` default (CI-safe fallback)
- Sampling probability `1.0` in base config; overridden to `0.1` in `docker-compose.yml` via the `MANAGEMENT_TRACING_SAMPLING_PROBABILITY` env var (no extra profile YAML needed)
- Added `OTEL_EXPORTER_OTLP_ENDPOINT: http://tempo:4317` to the `docker-compose.yml` backend environment, pointing at the Tempo service (future Tempo issue)

CI safety — `backend/src/test/resources/application-test.yaml`

- Set `management.tracing.sampling.probability: 0.0` in the test profile — prevents any OTLP connection attempts during `./mvnw test`, making CI behaviour deterministic regardless of the OTel SDK's graceful-failure behaviour
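The test-profile override described above is small; a sketch of `application-test.yaml`, using the property the summary names:

```yaml
# Sketch of backend/src/test/resources/application-test.yaml —
# zero sampling means no spans are exported, so tests never attempt an OTLP connection.
management:
  tracing:
    sampling:
      probability: 0.0
```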
`./mvnw clean package -DskipTests` → BUILD SUCCESS ✅

Files changed

- `backend/pom.xml` — 3 new dependencies
- `backend/src/main/resources/application.yaml` — management port, endpoint exposure, tracing config, OTel config
- `backend/src/test/resources/application-test.yaml` — tracing disabled for tests
- `docker-compose.yml` — OTLP endpoint, sampling probability, management port expose
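The three dependency additions can be sketched as a `pom.xml` fragment. Group IDs and the 2.27.0 pin follow what is stated in this thread; treat it as a sketch, not the committed diff:

```xml
<!-- Sketch of the dependency additions in backend/pom.xml -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <!-- version managed by the Spring Boot BOM -->
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
    <!-- version managed by the Spring Boot BOM -->
</dependency>
<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-spring-boot-starter</artifactId>
    <!-- NOT in the Spring Boot BOM — explicit version required -->
    <version>2.27.0</version>
</dependency>
```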