The observability stack's bind-mount sources pointed to workspace-relative
paths. When CI wiped the workspace between runs, containers kept running but
their config files disappeared — causing Docker to auto-create directories
at the missing paths and crash the services on next restart.
Fix: mount /opt/familienarchiv/ into CI job containers via runner-config.yaml,
then copy infra/observability/ and docker-compose.observability.yml there before
docker compose up. Compose runs from the permanent path, so bind mounts resolve
to stable host paths that survive workspace wipes.
Docker Compose reads /opt/familienarchiv/.env automatically (no --env-file flag),
which is managed on the server and persists between CI runs.
Closes#601
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POSTGRES_HOST variable (default: archive-db) lets the observability stack
connect to a different Postgres container — needed when only the staging
stack is running (container name: archiv-staging-db-1).
PORT_GRAFANA default changed from 3001 to 3003 to avoid collision with
the staging frontend which occupies 3001.
Closes#601
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tempo 2.7.2 removed `processors` from the top-level metrics_generator
config; the field is only valid under `overrides.defaults.metrics_generator`.
The setting was already present there, so this only removes the now-rejected
duplicate at the top level.
Closes part of #601
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5646e739 added svelte-kit sync before lint so .svelte-kit/tsconfig.json
always exists. This activated projectService: true for every run, which
builds the full TypeScript language service for all .svelte files and
caused CI lint to take 7+ minutes.
None of the rules in the Svelte-specific block need type information —
they are all AST-selector-based no-restricted-syntax checks. Removing
projectService restores the previous fast path without losing any lint
coverage.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add the /srv/gitea-workspace prerequisite step to DEPLOYMENT.md §3.1
and a new "Workspace bind-mount setup" subsection plus failure mode 4
to ci-gitea.md, covering the root cause, one-time host setup, disk
management, and troubleshooting for the bind-mount resolution fix
introduced in ADR-015.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the decision to use workdir_parent + identical host<->container
path instead of the overlay2 MergedDir sync that was in the initial fix.
Captures the alternatives (nsenter sync, image-baked configs, path mismatch)
and the operational consequences (prereq directory, out-of-band compose.yaml).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
runner-config.yaml: correct path to /srv/gitea-workspace (VPS, not Synology).
docker-compose.observability.yml: revert 5 bind mounts to plain relative paths;
OBS_CONFIG_DIR variable is no longer needed.
nightly.yml / release.yml: remove OBS_CONFIG_DIR env injection and the
"Sync observability configs to host" step from both workflows.
With workdir_parent=/srv/gitea-workspace and an identical host<->container
bind mount, $(pwd) inside job containers resolves to a real host path the
daemon can find — no privileged container, no overlay2 inspection, no nsenter.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Set workdir_parent to /volume1/gitea-workspace so act_runner stores job
workspaces at a real NAS path. Mounting that path at the same absolute
location in job containers means $(pwd) inside any job container resolves
to a host path the daemon can find — no overlay2 tricks needed.
Prerequisite (NAS): mkdir -p /volume1/gitea-workspace and add
- /volume1/gitea-workspace:/volume1/gitea-workspace
to the runner service volumes in gitea's docker-compose.yml, then restart
the runner.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DooD runner only shares /var/run/docker.sock — no workspace directory is
mapped to the host daemon. Relative bind mounts in
docker-compose.observability.yml resolved to paths that didn't exist on
the host; Docker auto-created directories in their place, causing
'not a directory' mount failures for all five config files.
Fix:
- docker-compose.observability.yml: replace hardcoded ./infra/observability/
prefix with ${OBS_CONFIG_DIR:-./infra/observability} so the path is
configurable while remaining backwards-compatible for local use.
- nightly.yml / release.yml: add a 'Sync observability configs to host'
step that finds the job container's overlay2 MergedDir (the container's
full filesystem as seen from the host mount namespace), then uses the
existing nsenter/alpine pattern to cp the config tree into a stable host
path (/srv/familienarchiv-{staging,production}/obs-configs).
OBS_CONFIG_DIR is injected into the env file so Compose picks it up.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
glitchtip/glitchtip:v4 is not a real tag — GlitchTip does not use a
v-prefix in its Docker image versioning. Latest stable release is 6.1.6.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The endpoint belongs in the compose file (hardcoded to the in-network
Tempo service) rather than per-environment workflow files. This covers
both staging (nightly.yml) and production (release.yml) with a single
change and removes the duplicate from the nightly env-file block.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs introduced when the management port was split from the app port:
1. Backend healthcheck hit localhost:8080/actuator/health (app port) —
actuator is on management.server.port=8081, so every probe got a 404
from the main MVC dispatcher, marking the container permanently unhealthy.
Fix: change the probe to localhost:8081.
2. OTEL_EXPORTER_OTLP_ENDPOINT was not set in .env.staging, so the exporter
fell back to http://localhost:4317 (the CI-safe default) instead of
http://tempo:4317 (the in-network Tempo service). Fix: inject the correct
endpoint in the nightly env-file generation step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The name: archiv-net declaration (needed so docker-compose.observability.yml
can join the network as external: true) caused the compose-idempotency CI job
to collide with any archiv-net left on the runner from staging or a previous
run. mc would resolve 'minio' to the wrong container and fail with a signature
mismatch.
Make the network name interpolable via COMPOSE_NETWORK_NAME (default: archiv-net
so production/staging behaviour is unchanged). Inject COMPOSE_NETWORK_NAME=
test-idem-archiv-net into the stub env file so the idempotency test always
gets a fully isolated network.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
opentelemetry-spring-boot-starter:2.27.0 was built against
opentelemetry-api:1.61.0. Spring Boot 4.0.0 only manages 1.55.0,
which is missing GlobalOpenTelemetry.getOrNoop(). The backend crashed
at startup with NoSuchMethodError on the first staging nightly.
Add a <dependencyManagement> import of opentelemetry-bom:1.61.0 before
the Spring Boot BOM applies, so all OTel core artifacts resolve to the
version the instrumentation starter actually requires.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both nightly and release workflows were missing --remove-orphans on the
observability compose up, while the main app deploy step already had it.
Without it, containers removed from docker-compose.observability.yml
linger as unnamed orphans until manually pruned.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The GlitchTip vhost only had a manual HSTS header; the rest of the
(security_headers) snippet (X-Content-Type-Options, Referrer-Policy,
Permissions-Policy, -Server removal) was missing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, and SENTRY_DSN were
referenced in the workflow env files but absent from the secrets table,
leaving the first-run operator without a complete checklist.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds SENTRY_DSN as an optional secret (empty by default) so it can be
set after GlitchTip first-run without requiring another code change.
Backend reads it via application.yaml; empty value keeps Sentry disabled.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prometheus, Loki, Tempo, and Grafana all define healthchecks in
docker-compose.observability.yml. Without --wait, the step exits 0
as soon as containers are created, masking startup failures silently.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Caddy does not set Strict-Transport-Security on GlitchTip because the
full security_headers snippet is intentionally omitted (Permissions-Policy
interferes with the Sentry SDK CORS). Adding HSTS alone guarantees
HTTPS enforcement at the Caddy layer without breaking SDK ingestion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
grafana.archiv.raddatz.cloud → 127.0.0.1:3003 (with security headers)
glitchtip.archiv.raddatz.cloud → 127.0.0.1:3002 (no security headers —
GlitchTip manages its own; the Sentry SDK also POSTs here)
Requires A records for both subdomains pointing at the server before
the next `systemctl reload caddy`.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docker-compose.prod.yml: add `name: archiv-net` so the network has a
stable Docker name regardless of compose project name (-p flag).
Both staging and production share the same host-level network, which
is correct since the observability stack is a single shared instance.
- nightly.yml / release.yml: add observability env vars (POSTGRES_USER,
PORT_GRAFANA=3003, PORT_GLITCHTIP=3002, PORT_PROMETHEUS=9090,
GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, GLITCHTIP_DOMAIN) to the
env file, then `docker compose -f docker-compose.observability.yml up -d`
after the app deploy step. PORT_GRAFANA=3003 avoids collision with
staging frontend on 3001.
Requires two new Gitea secrets: GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
4 integration test classes were restarting the full Spring context (and a new
Postgres Testcontainer, ~75s each) after every test method — 10 unnecessary
container startups adding ~12 minutes to CI. Fixed by:
- PersonServiceIntegrationTest, DocumentSearchPagedIntegrationTest,
GeschichteServiceIntegrationTest: swap to @Transactional so each test
rolls back instead of destroying the context.
- AuditServiceIntegrationTest: cannot use @Transactional (logAfterCommit
hooks into AFTER_COMMIT which requires a real commit); reset state with
@BeforeEach deleteAll() instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sentry-spring-boot-starter-jakarta 8.5.0 does not support Spring Boot 4.0 —
it logs an "Incompatible Spring Boot Version" warning and its SentryAutoConfiguration
crashes SF7 bean-name generation. sentry-spring-boot-4 (added in 8.21.0) is the
dedicated Spring Boot 4 module with a fixed auto-configuration class.
- Replace sentry-spring-boot-starter-jakarta:8.5.0 with sentry-spring-boot-4:8.41.0
- Delete SentryConfig.java — workaround no longer needed, auto-config handles init
- Remove spring.autoconfigure.exclude from application.yaml + application-test.yaml
- Delete SentryConfigTest.java — tested the deleted workaround class
- Update ApplicationContextTest: assert Sentry.isEnabled() is false when no DSN set
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SentryAutoConfiguration$HubConfiguration$SentrySpanRestClientConfiguration is a triply-
nested @Configuration class conditionally loaded when RestClient is on the classpath
(always true on Spring Framework 7). Spring Framework 7's bean name generator fails
on such deeply-nested @Import-ed classes, crashing every @SpringBootTest context.
Replace the broken auto-configuration with a minimal SentryConfig bean that calls
Sentry.init() with the same properties (DSN, environment, sample rate, PII guard,
DomainException filter). Unexpected 5xx exceptions are forwarded to Sentry via
Sentry.captureException() in GlobalExceptionHandler.handleGeneric().
Also add management.server.port=0 to application-test.yaml to eliminate TIME_WAIT
conflicts from @DirtiesContext restarts on the fixed management port 8081 (see #593).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port 8081 was fixed by #576. With four @DirtiesContext(AFTER_EACH_TEST_METHOD)
classes (22 context restarts total), the OS TIME_WAIT state holds port 8081
for ~45-60s per cycle — adding ~17 min overhead. All 1601 tests pass but
surefire's 10-min timeout fires before the suite finishes.
Fixes#593.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the sentrySvelteKit() Vite plugin as the first plugin in vite.config.ts.
When SENTRY_AUTH_TOKEN is set at build time, source maps are uploaded to
GlitchTip so error stack traces show original TypeScript source and line number.
When SENTRY_AUTH_TOKEN is absent (CI, dev builds), upload is disabled via
autoUploadSourceMaps: false — the build succeeds normally.
Resolves Felix's review blocker on PR #591.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds @sentry/sveltekit to hooks.client.ts and hooks.server.ts.
When VITE_SENTRY_DSN is unset (default), Sentry is fully disabled.
When set to a GlitchTip JavaScript project DSN, browser exceptions
and SSR handleError events are forwarded automatically.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds GlitchTip env vars to the observability env var table, extends the
services table, and adds a first-run section with superuser creation and
project setup steps. Updates the C4 L2 container diagram with GlitchTip
and Redis containers and their relationships.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>