Tests that /actuator/health is accessible without credentials and
/actuator/env requires authentication — permanent regression guards
against CVE-2026-40976-class Actuator filter chain bypass bugs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the decision to use the Sentry SDK with self-hosted GlitchTip,
sendDefaultPii:false rationale, errorId surfacing to users, and alternatives
considered (Sentry SaaS rejected for data-minimisation reasons).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Lower tracesSampleRate from 1.0 to 0.1 in both hooks (errors still captured
at 100%; trace volume reduced for self-hosted GlitchTip on shared VPS)
- Add comment explaining VITE_SENTRY_DSN is a write-only ingest key, safe in
client bundle — prevents accidental rotation as if it were a password
- Restore HTTP status code prominence: text-4xl font-bold (was text-xs text-ink-3)
- Add min-w-[44px] to copy button for WCAG 2.2 minimum touch target
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The handleError callback in hooks.server.ts is now gated by the 80% branch
coverage threshold along with the rest of the server-side logic.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two tests matching the existing hooks.server.test.ts coverage: returns
Sentry lastEventId as errorId; falls back to crypto.randomUUID when
lastEventId returns undefined.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds availability guard (navigator.clipboard may be undefined in non-HTTPS
contexts) and a rejection handler so clipboard-denied errors are silently
caught rather than becoming unhandled promise rejections. Tests cover the
success feedback and the silent-failure path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port 4317 is gRPC; the backend uses HttpExporter (HTTP/1.1) and sends
to port 4318. Update Container description and Rel label to match.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New docs/OBSERVABILITY.md: developer-facing guide with a "where to look
for what" table, common LogQL queries, trace exploration workflow,
log→trace correlation via traceId links, and a signal summary table
- Link from DEPLOYMENT.md §4 (ops section now points to dev guide) and
from CLAUDE.md Infrastructure section
- Fix stale DEPLOYMENT.md env var table: OTEL_EXPORTER_OTLP_ENDPOINT
now documents port 4318 (HTTP) not 4317 (gRPC); add the three new
env vars wired in this PR (OTEL_LOGS_EXPORTER, OTEL_METRICS_EXPORTER,
MANAGEMENT_METRICS_TAGS_APPLICATION) with their rationale
- Fix stale obs-tempo service description (port 4318, not 4317)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tempo only handles traces; sending metrics to /v1/metrics returns 404.
Prometheus already scrapes Spring Boot metrics via the pull-model at
/actuator/prometheus, so OTLP metric push is redundant and noisy.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Change OTEL default endpoint from port 4317 (gRPC) to 4318 (HTTP) to
match Spring Boot's HttpExporter; sending HTTP/1.1 to a gRPC listener
caused "Connection reset" errors
- Add otel.logs.exporter=none: Promtail captures Docker logs via the
logging driver; sending logs to Tempo's OTLP endpoint (which only
handles traces) produced 404 errors
- Add management.metrics.tags.application so every metric carries an
application tag; Grafana's Spring Boot Observability dashboard (ID 17175)
uses label_values() on that tag for its application template variable
- Add MANAGEMENT_METRICS_TAGS_APPLICATION and OTEL_LOGS_EXPORTER env
vars to docker-compose.prod.yml; production Tempo endpoint already
uses 4318
- Add MANAGEMENT_TRACING_SAMPLING_PROBABILITY to prod compose with
0.1 default to avoid 100% trace sampling in production
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The job label (derived from the Docker Compose service name) is what
powers {job="backend"} queries in Loki dashboards and populates the
Grafana "App" variable dropdown. Operators need to know this mapping
when writing custom Loki queries.
Addresses @markus's non-blocker suggestion from PR #606 review.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the architectural decision behind the dedicated management
SecurityFilterChain, the discovery that SB4+Jetty removed the isolated
management child-context security, and the consequences for actuator
endpoint exposure.
Addresses @markus's blocker from PR #606 review.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add @Order(1) managementFilterChain scoped to /actuator/** with explicit
401 entry point, blocking all non-public actuator paths without the
form-login redirect that the main chain uses for browser clients.
- Split single combined test into two focused assertions
(prometheus_endpoint_returns_200_without_credentials,
prometheus_endpoint_returns_jvm_metrics).
- Add negative regression test: actuator_metrics_requires_authentication
verifies that /actuator/metrics returns 401 without credentials.
Addresses reviewer concerns from @sara (missing negative test, split
assertions) and @nora (dedicated management security layer).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four Spring Boot 4.0-specific issues prevented /actuator/prometheus from working:
1. spring-boot-starter-micrometer-metrics missing — Spring Boot 4.0 splits
Micrometer metrics export (including the Prometheus scrape endpoint) out of
spring-boot-starter-actuator into its own starter. Added dependency.
2. management.prometheus.metrics.export.enabled not set — Spring Boot 4.0
defaults metrics export to false (opt-in). Added the property to
application.yaml.
3. SecurityConfig did not permit /actuator/prometheus — Spring Boot 4.0
with Jetty serves the management port (8081) via the same security filter
chain as the main port (8080). The previous commit's exclusion of
ManagementWebSecurityAutoConfiguration was a no-op (that class no longer
exists in Spring Boot 4.0); removed it and added the correct permitAll()
rule. Updated the architecture comment in application.yaml to reflect the
true filter-chain behaviour.
4. Reverted invalid FamilienarchivApplication.java change from the prior
commit (ManagementWebSecurityAutoConfiguration import compiled against a
class that does not exist in the Spring Boot 4.0 BOM).
Also adds ActuatorPrometheusIT — an integration test that asserts the
/actuator/prometheus endpoint returns 200 with jvm_memory_used_bytes without
credentials, serving as regression protection against future Spring Boot
upgrades silently breaking metrics collection.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three root causes confirmed via live server investigation (issue #604):
1. ManagementWebSecurityAutoConfiguration applied HTTP Basic auth to the
management port (8081), causing Prometheus to receive 401 HTML responses
instead of metrics. Excluded the auto-config — the Docker network
(archiv-net) provides the security boundary for this internal port.
2. promtail-config.yml had no `job` relabel rule. Grafana's Loki dashboards
query {job="$app"} which matched nothing; logs were in Loki under
compose_service but invisible to every dashboard panel.
3. prometheus.yml had a stale comment claiming the spring-boot target would
be DOWN until micrometer-registry-prometheus was added — it has been
present in pom.xml for some time.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace with a cross-reference to DEPLOYMENT.md §4 now that the obs
stack shipped as docker-compose.observability.yml.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- GlitchTip image corrected from glitchtip:v4 to glitchtip:6.1.6 in services table
- Grafana default port corrected from 3001 to 3003 in services table description
- SENTRY_DSN added to backend env vars table (wired in docker-compose.yml and application.yaml)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GlitchTip 6.x moved its internal listen port from 8080 to 8000.
The ports mapping was forwarding to the wrong port (host traffic
never reached the app), and the healthcheck was probing 8080 with
wget (not present in the image), causing the container to stay
permanently unhealthy.
Fix: map to port 8000, check with bash /dev/tcp (no external tools
needed, available in the Python base image).
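
A minimal sketch of the /dev/tcp technique, assuming bash built with
network redirections. The helper name port_open and the 127.0.0.1 host are
illustrative and not taken from the compose file; the actual healthcheck
command may be shaped differently:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
port_open() {
  local host=$1 port=$2
  # bash treats /dev/tcp/HOST/PORT as a socket when it appears in a redirection.
  # Running exec in a subshell means fd 3 is closed as soon as the check ends.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if port_open 127.0.0.1 8000; then
  echo "port 8000 reachable"
else
  echo "port 8000 not reachable"
fi
```

This only proves that something accepts connections on the port; it does
not check the HTTP response.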
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The live runner config was missing /opt/familienarchiv in valid_volumes
and options, so deploy steps wrote files into the ephemeral job
container rather than the host — silently discarded on exit.
Updated /root/docker/gitea/runner-config.yaml on the server and
restarted gitea-runner. Repo file now matches the server exactly,
including the network: gitea_gitea setting that was previously
only on the server.
DEPLOYMENT.md: clarifies that /opt/familienarchiv does not need to be
in the runner container's own volumes (DooD spawns job containers from
the host daemon directly); updates restart command from systemctl to
docker restart; narrows the cp -r stale-file note to manual ops only
(CI uses rm -rf before copying).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rsync is not present in the act_runner job container image. rm -rf
followed by cp -r yields the same end state (including removal of
deleted files) using only coreutils, which are always available.
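
A small sketch of why the delete-then-copy sequence purges stale files. It
runs in a throwaway temp directory, and the file names are made up for
illustration rather than taken from the real deploy step:

```shell
#!/usr/bin/env bash
# Illustrative layout: "src" stands in for the checked-out config directory,
# "dest" for the deployed copy that still holds a file removed from git.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
echo "current" > "$work/src/prometheus.yml"
echo "old"     > "$work/dest/prometheus.yml"
echo "removed from git" > "$work/dest/stale.yml"

# Delete the whole target first, then copy: nothing from the old tree survives.
rm -rf "$work/dest"
cp -r "$work/src" "$work/dest"

ls "$work/dest"
```

Unlike rsync -a --delete, the target directory is briefly absent between
the two commands.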
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move POSTGRES_USER to obs.env (non-secret, constant across envs)
- Replace cp -r with rsync -a --delete so removed config files are
purged from /opt/familienarchiv on next deploy instead of lingering
- Document --env-file ordering contract in validate + start steps:
obs.env first (defaults), obs-secrets.env second (wins on dupes)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CI uses 'cp -r' which does not remove deleted files. Documents the
manual cleanup step for config files removed from git.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Same fix as nightly.yml: prevents shell expansion of '$' in secret
values after Gitea renders them. Keep in sync with nightly.yml.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevents the shell from expanding '$' in Gitea-rendered secret values.
With an unquoted heredoc delimiter, a password like 'P@$s5w0rd' has
'$s5w0rd' silently expanded to '', writing a truncated value to
obs-secrets.env. Quoting the delimiter (<<'EOF') suppresses shell
expansion; Gitea's '${{ }}' template rendering has already run before
the shell sees the script.
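
The difference can be reproduced in isolation. In this sketch the key name
GRAFANA_PASSWORD is a placeholder, and the password is written literally
where CI would have the Gitea-rendered value:

```shell
#!/usr/bin/env bash
# Make sure the variable the shell would try to expand is really unset.
unset s5w0rd

# Unquoted delimiter: $s5w0rd is expanded (to nothing) before cat sees it.
unquoted=$(cat <<EOF
GRAFANA_PASSWORD=P@$s5w0rd
EOF
)

# Quoted delimiter: the heredoc body is passed through literally.
quoted=$(cat <<'EOF'
GRAFANA_PASSWORD=P@$s5w0rd
EOF
)

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```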
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The container names archiv-staging-db-1 and archiv-production-db-1 are
derived from the Compose project name + service name. A project rename
silently breaks the obs stack DB connection. Add a comment at the point
of definition so the dependency is obvious when someone changes it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The heredoc creates the file with default umask permissions (644 —
world-readable). Setting 600 immediately after creation prevents other
processes on the host from reading the Grafana, GlitchTip, and Postgres
credentials. Defence-in-depth for the single-tenant VPS.
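
A sketch of the create-then-restrict sequence. It assumes GNU coreutils
(stat -c) and a umask of 022, and it uses a temp path and a placeholder key
instead of the real obs-secrets.env contents:

```shell
#!/usr/bin/env bash
# Illustrative path; the real file lives wherever the deploy step writes it.
secrets_file="$(mktemp -d)/obs-secrets.env"

umask 022   # assumed default on the host
cat > "$secrets_file" <<'EOF'
EXAMPLE_SECRET=placeholder
EOF
mode_before=$(stat -c '%a' "$secrets_file")   # 644 under umask 022

# Restrict to owner read/write immediately after creation.
chmod 600 "$secrets_file"
mode_after=$(stat -c '%a' "$secrets_file")

echo "before: $mode_before, after: $mode_after"
```

The file is still world-readable for the short interval between creation
and chmod; creating it under umask 077 in a subshell would close that gap.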
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nightly.yml had two observability gates that release.yml lacked:
- "Validate observability compose config" (docker compose config --quiet)
catches missing env vars and YAML errors before any containers start
- "Assert observability stack health" checks obs-loki/prometheus/grafana/tempo
are healthy after up --wait, covering services without healthcheck directives
Mirrors the nightly.yml steps verbatim so the production deploy path is at
least as well-verified as the nightly staging path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Decision section described an operator-managed /opt/familienarchiv/.env
that CI does not touch. The actual implementation is a two-source model:
obs.env (git-tracked, non-secret config) + obs-secrets.env (CI-written
fresh from Gitea secrets on every deploy). Also updates the Consequences
bullet that incorrectly stated secrets are decoupled from CI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>