Commit Graph

2531 Commits

Author SHA1 Message Date
Marcel
c9a22945c8 docs(claude): add URL format example to GLITCHTIP_DOMAIN env var
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:25:01 +02:00
Marcel
9d84ebc4fe docs(deployment): add VITE_SENTRY_DSN to §3.3 Gitea secrets table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:24:37 +02:00
Marcel
58b9204395 docs(deployment): add VITE_SENTRY_DSN to §2 observability env vars table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:15:24 +02:00
Marcel
0d662f3a5e docs(c4): update GlitchTip image tag to 6.1.6 in L2 container diagram
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 11:14:17 +02:00
Marcel
2e864e5b81 docs(infra): remove stale 'observability not yet deployed' note
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m3s
CI / OCR Service Tests (pull_request) Successful in 16s
CI / Backend Unit Tests (pull_request) Successful in 2m42s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
Replace with a cross-reference to DEPLOYMENT.md §4 now that the obs
stack shipped as docker-compose.observability.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:54:04 +02:00
Marcel
40d9713b79 docs(deployment): fix stale GlitchTip image tags and add SENTRY_DSN to env vars table
- GlitchTip image corrected from glitchtip:v4 to glitchtip:6.1.6 in services table
- Grafana default port corrected from 3001 to 3003 in services table description
- SENTRY_DSN added to backend env vars table (wired in docker-compose.yml and application.yaml)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:53:31 +02:00
Marcel
68d07fe961 docs(claude): add observability service table and env var reference
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:52:36 +02:00
Marcel
6145a25fe2 fix(obs): correct GlitchTip port and healthcheck for v6.x
Some checks failed
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
GlitchTip 6.x moved its internal listen port from 8080 to 8000.
The ports mapping was forwarding to the wrong port (host traffic
never reached the app), and the healthcheck was probing 8080 with
wget (not present in the image), causing the container to stay
permanently unhealthy.

Fix: map to port 8000, check with bash /dev/tcp (no external tools
needed, available in the Python base image).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:31:07 +02:00
Marcel
c43f45a472 Merge branch 'fix/issue-601-obs-stack-permanent'
Some checks failed
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
2026-05-16 10:19:59 +02:00
Marcel
134f1e2ae0 chore(runner): mount /opt/familienarchiv into job containers
The live runner config was missing /opt/familienarchiv in valid_volumes
and options, so deploy steps wrote files into the ephemeral job
container rather than the host — silently discarded on exit.

Updated /root/docker/gitea/runner-config.yaml on the server and
restarted gitea-runner. Repo file now matches the server exactly,
including the network: gitea_gitea setting that was previously
only on the server.

DEPLOYMENT.md: clarifies that /opt/familienarchiv does not need to be
in the runner container's own volumes (DooD spawns job containers from
the host daemon directly); updates restart command from systemctl to
docker restart; narrows the cp-r stale-file note to manual ops only
(CI uses rm -rf before copying).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:19:09 +02:00
Marcel
55ccd5f3c0 ci(obs): replace rsync with rm+cp in deploy step
rsync is not present in the act_runner job container image. rm -rf +
cp -r gives identical semantics (including removal of deleted files)
using only coreutils, which are always available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 10:18:42 +02:00
3658733003 fix(obs): add GlitchTip healthcheck on /_health/ (port 8080)
Some checks failed
CI / Unit & Component Tests (push) Waiting to run
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / OCR Service Tests (push) Successful in 42s
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
2026-05-16 09:37:17 +02:00
0bb0a314ad ci(obs): add obs-glitchtip to health assertion loop (now has /_health/ healthcheck)
Some checks are pending
CI / Unit & Component Tests (pull_request) Waiting to run
CI / OCR Service Tests (pull_request) Waiting to run
CI / Backend Unit Tests (pull_request) Waiting to run
CI / fail2ban Regex (pull_request) Waiting to run
CI / Compose Bucket Idempotency (pull_request) Waiting to run
2026-05-16 09:36:37 +02:00
b194b565f6 ci(obs): reference #603 in keep-in-sync comments; add obs-glitchtip to health assertion
Some checks failed
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
2026-05-16 09:35:43 +02:00
Marcel
6720a5aeb2 chore(obs): improve deploy maintainability from review feedback
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 5m45s
CI / OCR Service Tests (pull_request) Successful in 47s
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
- Move POSTGRES_USER to obs.env (non-secret, constant across envs)
- Replace cp -r with rsync -a --delete so removed config files are
  purged from /opt/familienarchiv on next deploy instead of lingering
- Document --env-file ordering contract in validate + start steps:
  obs.env first (defaults), obs-secrets.env second (wins on dupes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:20:08 +02:00
Marcel
a7f60ebed8 docs(obs): add cp-r stale-file cleanup note to DEPLOYMENT.md
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 5m39s
CI / OCR Service Tests (pull_request) Successful in 46s
CI / Backend Unit Tests (pull_request) Failing after 9m24s
CI / fail2ban Regex (pull_request) Successful in 2m52s
CI / Compose Bucket Idempotency (pull_request) Successful in 2m24s
CI uses 'cp -r' which does not remove deleted files. Documents the
manual cleanup step for config files removed from git.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:04:41 +02:00
Marcel
25062be657 ci(obs): quote heredoc delimiter in release obs-secrets.env write
Same fix as nightly.yml: prevents shell expansion of '$' in secret
values after Gitea renders them. Keep in sync with nightly.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:04:12 +02:00
Marcel
9662ff5f8c ci(obs): quote heredoc delimiter in nightly obs-secrets.env write
Prevents shell from expanding '$' in Gitea-rendered secret values.
Without the quote, a password like 'P@$s5w0rd' has '$s5w0rd' silently
expanded to '' — writing a truncated value to obs-secrets.env.
'<<'EOF'' suppresses shell expansion; Gitea's '${{ }}' template
rendering already ran before the shell sees the script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:03:46 +02:00
Marcel
f5c7be932b ci(obs): document POSTGRES_HOST derivation from Compose project name
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 5m38s
CI / OCR Service Tests (pull_request) Successful in 45s
CI / Backend Unit Tests (pull_request) Failing after 10m48s
CI / fail2ban Regex (pull_request) Successful in 2m51s
CI / Compose Bucket Idempotency (pull_request) Successful in 2m16s
The container names archiv-staging-db-1 and archiv-production-db-1 are
derived from the Compose project name + service name. A project rename
silently breaks the obs stack DB connection. Add a comment at the point
of definition so the dependency is obvious when someone changes it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:54:17 +02:00
Marcel
dec0001bd1 ci(obs): chmod 600 obs-secrets.env after creation in both workflows
The heredoc creates the file with default umask permissions (644 —
world-readable). Setting 600 immediately after creation prevents other
processes on the host from reading the Grafana, GlitchTip, and Postgres
credentials. Defence-in-depth for the single-tenant VPS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:53:49 +02:00
Marcel
f628ab6435 ci(obs): add validate + health assertion steps to release.yml
nightly.yml had two observability gates that release.yml lacked:
- "Validate observability compose config" (docker compose config --quiet)
  catches missing env vars and YAML errors before any containers start
- "Assert observability stack health" checks obs-loki/prometheus/grafana/tempo
  are healthy after up --wait, covering services without healthcheck directives

Mirrors the nightly.yml steps verbatim so the production deploy path is at
least as well-verified as the nightly staging path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:53:18 +02:00
Marcel
4c5ee96e36 docs(adr): correct ADR-016 Decision section to match two-source env model
The Decision section described an operator-managed /opt/familienarchiv/.env
that CI does not touch. The actual implementation is a two-source model:
obs.env (git-tracked, non-secret config) + obs-secrets.env (CI-written
fresh from Gitea secrets on every deploy). Also updates the Consequences
bullet that incorrectly stated secrets are decoupled from CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:52:42 +02:00
Marcel
53cf1837b2 fix(obs): set POSTGRES_HOST per environment — staging/prod use compose auto-names not archive-db
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 2m58s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 2m39s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:21:53 +02:00
Marcel
d83ed7254d docs(obs): document obs vs main stack env model, obs.env + obs-secrets.env approach
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:20:21 +02:00
Marcel
1ae4bfe325 ci(obs): GitOps obs env split in release — deploy to /opt/familienarchiv/, secrets fresh from Gitea
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:19:12 +02:00
Marcel
c5139851b8 ci(obs): GitOps obs env split in nightly — obs.env in git, secrets fresh from Gitea
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:18:38 +02:00
Marcel
f9baf02b86 feat(obs): add GF_SERVER_ROOT_URL to Grafana service
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:17:47 +02:00
Marcel
b67bd201b2 feat(obs): add obs.env with non-secret config tracked in git
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:17:07 +02:00
Marcel
79735e23e0 ci(obs): assert obs-loki/prometheus/grafana/tempo are healthy after stack up
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 2m58s
CI / OCR Service Tests (pull_request) Successful in 17s
CI / Backend Unit Tests (pull_request) Successful in 2m36s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:01:48 +02:00
Marcel
df37113d38 ci(obs): add compose config dry-run before obs stack up to catch .env substitution errors
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:01:17 +02:00
Marcel
c7d2eeb3f0 docs(ci): harden runner-config.yaml security comment for /opt/familienarchiv/ write access
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:00:44 +02:00
Marcel
4e94d85d7e docs(adr): add ADR-016 for obs stack co-location and CI-push config sync
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:00:07 +02:00
Marcel
dec6b8139b docs(c4): update l2-containers obs boundary to show /opt/familienarchiv/ permanent path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:59:11 +02:00
Marcel
7b7d0c92a8 docs(obs): update DEPLOYMENT.md with /opt/familienarchiv/ ops section, env keys, runner restart
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:58:42 +02:00
Marcel
448c3cdcdb docs(obs): update .env.example for PORT_GRAFANA 3003, POSTGRES_HOST, $$ escaping
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:57:31 +02:00
Marcel
7e52494880 fix(ci): deploy obs configs to /opt/familienarchiv/ before starting stack
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m4s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m42s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
The observability stack's bind-mount sources pointed to workspace-relative
paths. When CI wiped the workspace between runs, containers kept running but
their config files disappeared — causing Docker to auto-create directories
at the missing paths and crash the services on next restart.

Fix: mount /opt/familienarchiv/ into CI job containers via runner-config.yaml,
then copy infra/observability/ and docker-compose.observability.yml there before
docker compose up. Compose runs from the permanent path, so bind mounts resolve
to stable host paths that survive workspace wipes.

Docker Compose reads /opt/familienarchiv/.env automatically (no --env-file flag),
which is managed on the server and persists between CI runs.

Closes #601

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 21:59:23 +02:00
Marcel
1181b97f94 fix(obs): make Postgres host configurable and fix PORT_GRAFANA default
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m6s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 2m43s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
POSTGRES_HOST variable (default: archive-db) lets the observability stack
connect to a different Postgres container — needed when only the staging
stack is running (container name: archiv-staging-db-1).

PORT_GRAFANA default changed from 3001 to 3003 to avoid collision with
the staging frontend which occupies 3001.

Closes #601

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 21:46:11 +02:00
Marcel
458968ded5 fix(obs): remove invalid processors block from tempo metrics_generator
Tempo 2.7.2 removed `processors` from the top-level metrics_generator
config; the field is only valid under `overrides.defaults.metrics_generator`.
The setting was already present there, so this only removes the now-rejected
duplicate at the top level.

Closes part of #601

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 21:45:49 +02:00
Marcel
23515b8542 fix(eslint): remove projectService from Svelte parser — restores fast lint
Some checks failed
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / Unit & Component Tests (push) Successful in 3m23s
CI / OCR Service Tests (push) Successful in 17s
CI / Backend Unit Tests (push) Successful in 2m37s
CI / fail2ban Regex (push) Successful in 44s
CI / Compose Bucket Idempotency (push) Successful in 1m1s
nightly / deploy-staging (push) Failing after 2m33s
5646e739 added svelte-kit sync before lint so .svelte-kit/tsconfig.json
always exists. This activated projectService: true for every run, which
builds the full TypeScript language service for all .svelte files and
caused CI lint to take 7+ minutes.

None of the rules in the Svelte-specific block need type information —
they are all AST-selector-based no-restricted-syntax checks. Removing
projectService restores the previous fast path without losing any lint
coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 20:08:52 +02:00
Marcel
e4ac5f08e7 docs(ci): document workspace bind-mount setup for DooD runners
Some checks failed
CI / OCR Service Tests (pull_request) Successful in 59s
CI / Unit & Component Tests (push) Has been cancelled
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Backend Unit Tests (pull_request) Successful in 5m22s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
CI / Unit & Component Tests (pull_request) Failing after 14m44s
Add the /srv/gitea-workspace prerequisite step to DEPLOYMENT.md §3.1
and a new "Workspace bind-mount setup" subsection plus failure mode 4
to ci-gitea.md, covering the root cause, one-time host setup, disk
management, and troubleshooting for the bind-mount resolution fix
introduced in ADR-015.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:46:54 +02:00
Marcel
15ef079eff docs(adr): add ADR-015 for DooD workspace bind-mount approach
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m36s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 3m7s
CI / fail2ban Regex (pull_request) Successful in 39s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
Documents the decision to use workdir_parent + identical host<->container
path instead of the overlay2 MergedDir sync that was in the initial fix.
Captures the alternatives (nsenter sync, image-baked configs, path mismatch)
and the operational consequences (prereq directory, out-of-band compose.yaml).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:38:18 +02:00
Marcel
56c3e51657 fix(ci): replace overlay2 sync with workspace bind mount for DooD
runner-config.yaml: correct path to /srv/gitea-workspace (VPS, not Synology).
docker-compose.observability.yml: revert 5 bind mounts to plain relative paths;
  OBS_CONFIG_DIR variable is no longer needed.
nightly.yml / release.yml: remove OBS_CONFIG_DIR env injection and the
  "Sync observability configs to host" step from both workflows.

With workdir_parent=/srv/gitea-workspace and an identical host<->container
bind mount, $(pwd) inside job containers resolves to a real host path the
daemon can find — no privileged container, no overlay2 inspection, no nsenter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:36:55 +02:00
Marcel
2cc8b1174b fix(ci): configure workspace bind mount for DooD bind-mount resolution
Set workdir_parent to /volume1/gitea-workspace so act_runner stores job
workspaces at a real NAS path. Mounting that path at the same absolute
location in job containers means $(pwd) inside any job container resolves
to a host path the daemon can find — no overlay2 tricks needed.

Prerequisite (NAS): mkdir -p /volume1/gitea-workspace and add
  - /volume1/gitea-workspace:/volume1/gitea-workspace
to the runner service volumes in gitea's docker-compose.yml, then restart
the runner.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:33:36 +02:00
Marcel
1fc47888d5 fix(ci): sync observability configs to host before docker compose up (#598)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m26s
CI / OCR Service Tests (pull_request) Successful in 18s
CI / Backend Unit Tests (pull_request) Successful in 2m40s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Compose Bucket Idempotency (pull_request) Successful in 57s
DooD runner only shares /var/run/docker.sock — no workspace directory is
mapped to the host daemon. Relative bind mounts in
docker-compose.observability.yml resolved to paths that didn't exist on
the host; Docker auto-created directories in their place, causing
'not a directory' mount failures for all five config files.

Fix:
- docker-compose.observability.yml: replace hardcoded ./infra/observability/
  prefix with ${OBS_CONFIG_DIR:-./infra/observability} so the path is
  configurable while remaining backwards-compatible for local use.
- nightly.yml / release.yml: add a 'Sync observability configs to host'
  step that finds the job container's overlay2 MergedDir (the container's
  full filesystem as seen from the host mount namespace), then uses the
  existing nsenter/alpine pattern to cp the config tree into a stable host
  path (/srv/familienarchiv-{staging,production}/obs-configs).
  OBS_CONFIG_DIR is injected into the env file so Compose picks it up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:02:53 +02:00
Marcel
d435b2b0e4 fix(infra): pin GlitchTip image to 6.1.6 (v4 tag never existed)
All checks were successful
CI / Unit & Component Tests (push) Successful in 3m30s
CI / OCR Service Tests (push) Successful in 16s
CI / Backend Unit Tests (push) Successful in 2m34s
CI / fail2ban Regex (push) Successful in 40s
CI / Compose Bucket Idempotency (push) Successful in 57s
glitchtip/glitchtip:v4 is not a real tag — GlitchTip does not use a
v-prefix in its Docker image versioning. Latest stable release is 6.1.6.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 18:07:32 +02:00
Marcel
fed427dc4a fix(infra): set OTEL_EXPORTER_OTLP_ENDPOINT in docker-compose.prod.yml
Some checks failed
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
The endpoint belongs in the compose file (hardcoded to the in-network
Tempo service) rather than per-environment workflow files. This covers
both staging (nightly.yml) and production (release.yml) with a single
change and removes the duplicate from the nightly env-file block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 17:43:23 +02:00
Marcel
cf78ab2f8e fix(staging): correct backend healthcheck port and OTel endpoint
Some checks failed
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
CI / Unit & Component Tests (pull_request) Has been cancelled
Two bugs introduced when the management port was split from the app port:

1. Backend healthcheck hit localhost:8080/actuator/health (app port) —
   actuator is on management.server.port=8081, so every probe got a 404
   from the main MVC dispatcher, marking the container permanently unhealthy.
   Fix: change the probe to localhost:8081.

2. OTEL_EXPORTER_OTLP_ENDPOINT was not set in .env.staging, so the exporter
   fell back to http://localhost:4317 (the CI-safe default) instead of
   http://tempo:4317 (the in-network Tempo service). Fix: inject the correct
   endpoint in the nightly env-file generation step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 17:37:15 +02:00
Marcel
c8883d0e40 fix(ci): isolate compose-idempotency network from archiv-net collisions
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 5m40s
CI / OCR Service Tests (pull_request) Successful in 34s
CI / Backend Unit Tests (pull_request) Successful in 7m8s
CI / fail2ban Regex (pull_request) Successful in 1m58s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m41s
CI / Unit & Component Tests (push) Successful in 5m37s
CI / OCR Service Tests (push) Successful in 28s
CI / Backend Unit Tests (push) Successful in 6m59s
CI / fail2ban Regex (push) Successful in 1m59s
CI / Compose Bucket Idempotency (push) Successful in 1m44s
The name: archiv-net declaration (needed so docker-compose.observability.yml
can join the network as external: true) caused the compose-idempotency CI job
to collide with any archiv-net left on the runner from staging or a previous
run. mc would resolve 'minio' to the wrong container and fail with a signature
mismatch.

Make the network name interpolable via COMPOSE_NETWORK_NAME (default: archiv-net
so production/staging behaviour is unchanged). Inject COMPOSE_NETWORK_NAME=
test-idem-archiv-net into the stub env file so the idempotency test always
gets a fully isolated network.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 16:33:07 +02:00
Marcel
7154092547 fix(deps): pin opentelemetry-bom to 1.61.0 to fix startup crash
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 5m34s
CI / OCR Service Tests (pull_request) Successful in 30s
CI / Backend Unit Tests (pull_request) Successful in 7m6s
CI / fail2ban Regex (pull_request) Successful in 1m49s
CI / Compose Bucket Idempotency (pull_request) Failing after 1m26s
opentelemetry-spring-boot-starter:2.27.0 was built against
opentelemetry-api:1.61.0. Spring Boot 4.0.0 only manages 1.55.0,
which is missing GlobalOpenTelemetry.getOrNoop(). The backend crashed
at startup with NoSuchMethodError on the first staging nightly.

Add a <dependencyManagement> import of opentelemetry-bom:1.61.0 before
the Spring Boot BOM applies, so all OTel core artifacts resolve to the
version the instrumentation starter actually requires.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 16:05:44 +02:00
Marcel
ada3a3ccaf devops(ci): add --remove-orphans to observability stack deploy steps
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 5m27s
CI / OCR Service Tests (pull_request) Successful in 34s
CI / Backend Unit Tests (pull_request) Successful in 7m13s
CI / fail2ban Regex (pull_request) Successful in 1m51s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m47s
CI / Unit & Component Tests (push) Successful in 5m45s
CI / OCR Service Tests (push) Successful in 36s
CI / Backend Unit Tests (push) Successful in 7m12s
CI / fail2ban Regex (push) Successful in 1m54s
CI / Compose Bucket Idempotency (push) Successful in 1m41s
Both nightly and release workflows were missing --remove-orphans on the
observability compose up, while the main app deploy step already had it.
Without it, containers removed from docker-compose.observability.yml
linger as unnamed orphans until manually pruned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 14:55:28 +02:00