Replace with a cross-reference to DEPLOYMENT.md §4 now that the obs
stack shipped as docker-compose.observability.yml.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- GlitchTip image corrected from glitchtip:v4 to glitchtip:6.1.6 in services table
- Grafana default port corrected from 3001 to 3003 in services table description
- SENTRY_DSN added to backend env vars table (wired in docker-compose.yml and application.yaml)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GlitchTip 6.x moved its internal listen port from 8080 to 8000.
The ports mapping was forwarding to the wrong port (host traffic
never reached the app), and the healthcheck was probing 8080 with
wget (not present in the image), causing the container to stay
permanently unhealthy.
Fix: map to port 8000, check with bash /dev/tcp (no external tools
needed, available in the Python base image).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The live runner config was missing /opt/familienarchiv in valid_volumes
and options, so deploy steps wrote files into the ephemeral job
container rather than the host — silently discarded on exit.
Updated /root/docker/gitea/runner-config.yaml on the server and
restarted gitea-runner. Repo file now matches the server exactly,
including the network: gitea_gitea setting that was previously
only on the server.
DEPLOYMENT.md: clarifies that /opt/familienarchiv does not need to be
in the runner container's own volumes (DooD spawns job containers from
the host daemon directly); updates restart command from systemctl to
docker restart; narrows the cp-r stale-file note to manual ops only
(CI uses rm -rf before copying).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rsync is not present in the act_runner job container image. rm -rf +
cp -r gives identical semantics (including removal of deleted files)
using only coreutils, which are always available.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move POSTGRES_USER to obs.env (non-secret, constant across envs)
- Replace cp -r with rsync -a --delete so removed config files are
purged from /opt/familienarchiv on next deploy instead of lingering
- Document --env-file ordering contract in validate + start steps:
obs.env first (defaults), obs-secrets.env second (wins on dupes)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CI uses 'cp -r' which does not remove deleted files. Documents the
manual cleanup step for config files removed from git.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Same fix as nightly.yml: prevents shell expansion of '$' in secret
values after Gitea renders them. Keep in sync with nightly.yml.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevents shell from expanding '$' in Gitea-rendered secret values.
Without the quote, a password like 'P@$s5w0rd' has '$s5w0rd' silently
expanded to '' — writing a truncated value to obs-secrets.env.
'<<'EOF'' suppresses shell expansion; Gitea's '${{ }}' template
rendering already ran before the shell sees the script.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The container names archiv-staging-db-1 and archiv-production-db-1 are
derived from the Compose project name + service name. A project rename
silently breaks the obs stack DB connection. Add a comment at the point
of definition so the dependency is obvious when someone changes it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The heredoc creates the file with default umask permissions (644 —
world-readable). Setting 600 immediately after creation prevents other
processes on the host from reading the Grafana, GlitchTip, and Postgres
credentials. Defence-in-depth for the single-tenant VPS.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nightly.yml had two observability gates that release.yml lacked:
- "Validate observability compose config" (docker compose config --quiet)
catches missing env vars and YAML errors before any containers start
- "Assert observability stack health" checks obs-loki/prometheus/grafana/tempo
are healthy after up --wait, covering services without healthcheck directives
Mirrors the nightly.yml steps verbatim so the production deploy path is at
least as well-verified as the nightly staging path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Decision section described an operator-managed /opt/familienarchiv/.env
that CI does not touch. The actual implementation is a two-source model:
obs.env (git-tracked, non-secret config) + obs-secrets.env (CI-written
fresh from Gitea secrets on every deploy). Also updates the Consequences
bullet that incorrectly stated secrets are decoupled from CI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The observability stack's bind-mount sources pointed to workspace-relative
paths. When CI wiped the workspace between runs, containers kept running but
their config files disappeared — causing Docker to auto-create directories
at the missing paths and crash the services on next restart.
Fix: mount /opt/familienarchiv/ into CI job containers via runner-config.yaml,
then copy infra/observability/ and docker-compose.observability.yml there before
docker compose up. Compose runs from the permanent path, so bind mounts resolve
to stable host paths that survive workspace wipes.
Docker Compose reads /opt/familienarchiv/.env automatically (no --env-file flag),
which is managed on the server and persists between CI runs.
Closes#601
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POSTGRES_HOST variable (default: archive-db) lets the observability stack
connect to a different Postgres container — needed when only the staging
stack is running (container name: archiv-staging-db-1).
PORT_GRAFANA default changed from 3001 to 3003 to avoid collision with
the staging frontend which occupies 3001.
Closes#601
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tempo 2.7.2 removed `processors` from the top-level metrics_generator
config; the field is only valid under `overrides.defaults.metrics_generator`.
The setting was already present there, so this only removes the now-rejected
duplicate at the top level.
Closes part of #601
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5646e739 added svelte-kit sync before lint so .svelte-kit/tsconfig.json
always exists. This activated projectService: true for every run, which
builds the full TypeScript language service for all .svelte files and
caused CI lint to take 7+ minutes.
None of the rules in the Svelte-specific block need type information —
they are all AST-selector-based no-restricted-syntax checks. Removing
projectService restores the previous fast path without losing any lint
coverage.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add the /srv/gitea-workspace prerequisite step to DEPLOYMENT.md §3.1
and a new "Workspace bind-mount setup" subsection plus failure mode 4
to ci-gitea.md, covering the root cause, one-time host setup, disk
management, and troubleshooting for the bind-mount resolution fix
introduced in ADR-015.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the decision to use workdir_parent + identical host<->container
path instead of the overlay2 MergedDir sync that was in the initial fix.
Captures the alternatives (nsenter sync, image-baked configs, path mismatch)
and the operational consequences (prereq directory, out-of-band compose.yaml).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
runner-config.yaml: correct path to /srv/gitea-workspace (VPS, not Synology).
docker-compose.observability.yml: revert 5 bind mounts to plain relative paths;
OBS_CONFIG_DIR variable is no longer needed.
nightly.yml / release.yml: remove OBS_CONFIG_DIR env injection and the
"Sync observability configs to host" step from both workflows.
With workdir_parent=/srv/gitea-workspace and an identical host<->container
bind mount, $(pwd) inside job containers resolves to a real host path the
daemon can find — no privileged container, no overlay2 inspection, no nsenter.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Set workdir_parent to /volume1/gitea-workspace so act_runner stores job
workspaces at a real NAS path. Mounting that path at the same absolute
location in job containers means $(pwd) inside any job container resolves
to a host path the daemon can find — no overlay2 tricks needed.
Prerequisite (NAS): mkdir -p /volume1/gitea-workspace and add
- /volume1/gitea-workspace:/volume1/gitea-workspace
to the runner service volumes in gitea's docker-compose.yml, then restart
the runner.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DooD runner only shares /var/run/docker.sock — no workspace directory is
mapped to the host daemon. Relative bind mounts in
docker-compose.observability.yml resolved to paths that didn't exist on
the host; Docker auto-created directories in their place, causing
'not a directory' mount failures for all five config files.
Fix:
- docker-compose.observability.yml: replace hardcoded ./infra/observability/
prefix with ${OBS_CONFIG_DIR:-./infra/observability} so the path is
configurable while remaining backwards-compatible for local use.
- nightly.yml / release.yml: add a 'Sync observability configs to host'
step that finds the job container's overlay2 MergedDir (the container's
full filesystem as seen from the host mount namespace), then uses the
existing nsenter/alpine pattern to cp the config tree into a stable host
path (/srv/familienarchiv-{staging,production}/obs-configs).
OBS_CONFIG_DIR is injected into the env file so Compose picks it up.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
glitchtip/glitchtip:v4 is not a real tag — GlitchTip does not use a
v-prefix in its Docker image versioning. Latest stable release is 6.1.6.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The endpoint belongs in the compose file (hardcoded to the in-network
Tempo service) rather than per-environment workflow files. This covers
both staging (nightly.yml) and production (release.yml) with a single
change and removes the duplicate from the nightly env-file block.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs introduced when the management port was split from the app port:
1. Backend healthcheck hit localhost:8080/actuator/health (app port) —
actuator is on management.server.port=8081, so every probe got a 404
from the main MVC dispatcher, marking the container permanently unhealthy.
Fix: change the probe to localhost:8081.
2. OTEL_EXPORTER_OTLP_ENDPOINT was not set in .env.staging, so the exporter
fell back to http://localhost:4317 (the CI-safe default) instead of
http://tempo:4317 (the in-network Tempo service). Fix: inject the correct
endpoint in the nightly env-file generation step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The name: archiv-net declaration (needed so docker-compose.observability.yml
can join the network as external: true) caused the compose-idempotency CI job
to collide with any archiv-net left on the runner from staging or a previous
run. mc would resolve 'minio' to the wrong container and fail with a signature
mismatch.
Make the network name interpolable via COMPOSE_NETWORK_NAME (default: archiv-net
so production/staging behaviour is unchanged). Inject COMPOSE_NETWORK_NAME=
test-idem-archiv-net into the stub env file so the idempotency test always
gets a fully isolated network.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
opentelemetry-spring-boot-starter:2.27.0 was built against
opentelemetry-api:1.61.0. Spring Boot 4.0.0 only manages 1.55.0,
which is missing GlobalOpenTelemetry.getOrNoop(). The backend crashed
at startup with NoSuchMethodError on the first staging nightly.
Add a <dependencyManagement> import of opentelemetry-bom:1.61.0 before
the Spring Boot BOM applies, so all OTel core artifacts resolve to the
version the instrumentation starter actually requires.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both nightly and release workflows were missing --remove-orphans on the
observability compose up, while the main app deploy step already had it.
Without it, containers removed from docker-compose.observability.yml
linger as unnamed orphans until manually pruned.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The GlitchTip vhost only had a manual HSTS header; the rest of the
(security_headers) snippet (X-Content-Type-Options, Referrer-Policy,
Permissions-Policy, -Server removal) was missing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, and SENTRY_DSN were
referenced in the workflow env files but absent from the secrets table,
leaving the first-run operator without a complete checklist.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds SENTRY_DSN as an optional secret (empty by default) so it can be
set after GlitchTip first-run without requiring another code change.
Backend reads it via application.yaml; empty value keeps Sentry disabled.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prometheus, Loki, Tempo, and Grafana all define healthchecks in
docker-compose.observability.yml. Without --wait, the step exits 0
as soon as containers are created, masking startup failures silently.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>