Three root causes confirmed via live server investigation (issue #604):
1. ManagementWebSecurityAutoConfiguration applied HTTP Basic auth to the
management port (8081), causing Prometheus to receive 401 HTML responses
instead of metrics. Excluded the auto-config — the Docker network
(archiv-net) provides the security boundary for this internal port.
2. promtail-config.yml had no `job` relabel rule. Grafana's Loki dashboards
query {job="$app"} which matched nothing; logs were in Loki under
compose_service but invisible to every dashboard panel.
3. prometheus.yml had a stale comment claiming the spring-boot target would
be DOWN until micrometer-registry-prometheus was added — it has been
present in pom.xml for some time.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Creates the skeleton observability stack (no running services yet) that all
subsequent Grafana LGTM + GlitchTip issues depend on:
- docker-compose.observability.yml: external archiv-net join, obs-net bridge,
named volumes for all five services, placeholder comments for each service
group (Metrics/Logs/Traces/Dashboards/Error Tracking), startup-order note
- infra/observability/{prometheus,loki,promtail,tempo,grafana/provisioning/{datasources,dashboards}}/.gitkeep
- .env.example: new # --- Observability --- section with PORT_GRAFANA,
PORT_GLITCHTIP, PORT_PROMETHEUS, GLITCHTIP_DOMAIN, GLITCHTIP_SECRET_KEY
(with generation hint), SENTRY_DSN, VITE_SENTRY_DSN
Verified: docker compose -f docker-compose.observability.yml config exits 0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>