fix(staging): backend startup crash + healthcheck + OTel routing #597

Merged
marcel merged 2 commits from feat/issue-580-sentry-backend into main 2026-05-15 17:44:32 +02:00

Summary

Three staging regressions introduced by the observability stack, plus a related CI failure, fixed together:

  • OTel BOM version mismatch: opentelemetry-spring-boot-starter:2.27.0 requires opentelemetry-api:1.61.0, but Spring Boot 4.0.0 only manages 1.55.0, which lacks GlobalOpenTelemetry.getOrNoop(). The backend crashed at startup with NoSuchMethodError. Fixed by importing opentelemetry-bom:1.61.0 via <dependencyManagement> before the Spring Boot BOM applies, so the 1.61.0 versions take precedence.

  • Backend healthcheck on wrong port: the Docker probe hit localhost:8080/actuator/health (the app port), but the management server listens on 8081. Every probe got a 404, the container was permanently marked unhealthy, and dependent services refused to start. Fixed by pointing the probe at localhost:8081.

  • OTel exporter falling back to localhost: OTEL_EXPORTER_OTLP_ENDPOINT was not set in .env.staging, so the exporter used the CI-safe default http://localhost:4317 instead of the in-network http://tempo:4317. Fixed by setting the variable, first in the nightly env-file generation step and then, in fed427dc4a, directly in docker-compose.prod.yml so it covers both staging and production.

  • CI compose-idempotency network collision: adding name: archiv-net to the network definition made the idempotency test join any pre-existing archiv-net left behind by staging or a prior run. mc could then resolve minio to the staging container and fail with a signature mismatch. Fixed by making the network name configurable via ${COMPOSE_NETWORK_NAME:-archiv-net} and setting COMPOSE_NETWORK_NAME=test-idem-archiv-net in the test's stub env file.
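For reviewers, the two compose-level changes above might look roughly like the fragment below. This is a sketch, not the merged diff: the service name backend, the use of wget as the probe command, and the interval/timeout/retries/start_period values are assumptions. Only port 8081, the /actuator/health path, and the ${COMPOSE_NETWORK_NAME:-archiv-net} interpolation come from this PR.

```yaml
# Hypothetical excerpt: service name, probe tool and timings are assumptions.
services:
  backend:
    healthcheck:
      # Probe the management port (8081), not the app port (8080):
      # actuator is served on management.server.port.
      # wget exits non-zero on a 404, so the check fails as expected.
      test: ["CMD", "wget", "-q", "-O", "/dev/null", "http://localhost:8081/actuator/health"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 60s
    networks:
      - archiv-net

networks:
  archiv-net:
    # Falls back to archiv-net when COMPOSE_NETWORK_NAME is unset or empty;
    # the idempotency test overrides it so it never joins a network
    # left behind by staging or a previous run.
    name: ${COMPOSE_NETWORK_NAME:-archiv-net}
```

With this shape, the idempotency test's stub env file only needs the single line COMPOSE_NETWORK_NAME=test-idem-archiv-net, while staging and production keep the default name without any extra configuration.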

Test plan

  • [x] 1603 backend tests pass
  • [ ] Next staging nightly: backend starts and reaches healthy within start_period
  • [ ] Traces/metrics/logs appear in Tempo/Grafana (no localhost:4317 errors in backend logs)
  • [ ] CI compose-idempotency job passes

🤖 Generated with Claude Code

marcel added 1 commit 2026-05-15 17:42:15 +02:00
fix(staging): correct backend healthcheck port and OTel endpoint
Some checks failed
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Unit & Component Tests (pull_request) Has been cancelled
cf78ab2f8e
Two bugs introduced when the management port was split from the app port:

1. Backend healthcheck hit localhost:8080/actuator/health (app port) —
   actuator is on management.server.port=8081, so every probe got a 404
   from the main MVC dispatcher, marking the container permanently unhealthy.
   Fix: change the probe to localhost:8081.

2. OTEL_EXPORTER_OTLP_ENDPOINT was not set in .env.staging, so the exporter
   fell back to http://localhost:4317 (the CI-safe default) instead of
   http://tempo:4317 (the in-network Tempo service). Fix: inject the correct
   endpoint in the nightly env-file generation step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
marcel added 1 commit 2026-05-15 17:43:52 +02:00
fix(infra): set OTEL_EXPORTER_OTLP_ENDPOINT in docker-compose.prod.yml
Some checks failed
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
fed427dc4a
The endpoint belongs in the compose file (hardcoded to the in-network
Tempo service) rather than per-environment workflow files. This covers
both staging (nightly.yml) and production (release.yml) with a single
change and removes the duplicate from the nightly env-file block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
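A minimal sketch of what the endpoint entry in docker-compose.prod.yml could look like. The service name backend is an assumption; the variable name and the http://tempo:4317 value come from the commits above.

```yaml
# Hypothetical excerpt of docker-compose.prod.yml; the service name is assumed.
services:
  backend:
    environment:
      # Send OTLP traces/metrics/logs to the in-network Tempo service.
      # Without this entry the exporter uses its default http://localhost:4317,
      # which inside the backend container has no listener.
      OTEL_EXPORTER_OTLP_ENDPOINT: http://tempo:4317
```

Because staging (nightly.yml) and production (release.yml) both deploy from this compose file, one entry here replaces the per-workflow env-file line.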
marcel merged commit fed427dc4a into main 2026-05-15 17:44:32 +02:00
marcel deleted branch feat/issue-580-sentry-backend 2026-05-15 17:44:33 +02:00
Reference: marcel/familienarchiv#597