Commit Graph

2520 Commits

Author SHA1 Message Date
3658733003 fix(obs): add GlitchTip healthcheck on /_health/ (port 8080)
2026-05-16 09:37:17 +02:00
0bb0a314ad ci(obs): add obs-glitchtip to health assertion loop (now has /_health/ healthcheck)
2026-05-16 09:36:37 +02:00
b194b565f6 ci(obs): reference #603 in keep-in-sync comments; add obs-glitchtip to health assertion
2026-05-16 09:35:43 +02:00
Marcel
6720a5aeb2 chore(obs): improve deploy maintainability from review feedback
- Move POSTGRES_USER to obs.env (non-secret, constant across envs)
- Replace cp -r with rsync -a --delete so removed config files are
  purged from /opt/familienarchiv on next deploy instead of lingering
- Document --env-file ordering contract in validate + start steps:
  obs.env first (defaults), obs-secrets.env second (wins on dupes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:20:08 +02:00
Marcel
a7f60ebed8 docs(obs): add cp-r stale-file cleanup note to DEPLOYMENT.md
CI uses 'cp -r', which copies new and changed files but never deletes
files that no longer exist in the source. Documents the manual cleanup
step for config files removed from git.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:04:41 +02:00
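The stale-file behaviour this commit documents can be reproduced locally. The following is a minimal sketch, assuming throwaway directories created with mktemp; the file names loki.yml and old-rule.yml are hypothetical and not taken from the repository.

```shell
#!/bin/sh
# Demonstrates that repeated 'cp -r' never deletes files from the destination.
# All paths and file names are hypothetical stand-ins.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
echo "a" > "$work/src/loki.yml"
echo "b" > "$work/src/old-rule.yml"

# First deploy: both files arrive in the destination.
cp -r "$work/src/." "$work/dest/"

# The file is removed from the source (as if deleted in git), then redeployed.
rm "$work/src/old-rule.yml"
cp -r "$work/src/." "$work/dest/"

# old-rule.yml is still present in the destination and must be removed by hand.
ls "$work/dest"
```

Commit 6720a5aeb2 later replaces cp -r with rsync -a --delete, which purges such leftovers on the next deploy.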
Marcel
25062be657 ci(obs): quote heredoc delimiter in release obs-secrets.env write
Same fix as nightly.yml: prevents shell expansion of '$' in secret
values after Gitea renders them. Keep in sync with nightly.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:04:12 +02:00
Marcel
9662ff5f8c ci(obs): quote heredoc delimiter in nightly obs-secrets.env write
Prevents the shell from expanding '$' in Gitea-rendered secret values.
Without the quotes, a password like 'P@$s5w0rd' has '$s5w0rd' silently
expanded to '', which writes a truncated value to obs-secrets.env.
`<<'EOF'` suppresses shell expansion; Gitea's '${{ }}' template
rendering has already run before the shell sees the script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 09:03:46 +02:00
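The difference between the two heredoc forms can be checked in isolation. This is a sketch only: it reuses the example password from the commit message, writes to hypothetical files in a temporary directory, and assumes the variable s5w0rd is unset, as it would be in a CI shell.

```shell
#!/bin/sh
# Compares an unquoted and a quoted heredoc delimiter.
# The password is the example from the commit message; file names are hypothetical.
out=$(mktemp -d)
unset s5w0rd   # make sure the variable the shell would look up is not set

# Unquoted delimiter: the shell expands $s5w0rd (unset, so empty) before writing.
cat > "$out/unquoted.env" <<EOF
PASSWORD=P@$s5w0rd
EOF

# Quoted delimiter: the body is written literally, '$' included.
cat > "$out/quoted.env" <<'EOF'
PASSWORD=P@$s5w0rd
EOF

cat "$out/unquoted.env"   # PASSWORD=P@
cat "$out/quoted.env"     # PASSWORD=P@$s5w0rd
```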
Marcel
f5c7be932b ci(obs): document POSTGRES_HOST derivation from Compose project name
The container names archiv-staging-db-1 and archiv-production-db-1 are
derived from the Compose project name + service name. A project rename
silently breaks the obs stack DB connection. Add a comment at the point
of definition so the dependency is obvious when someone changes it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:54:17 +02:00
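The naming dependency can be made explicit with a small helper. This is a sketch, assuming the Compose v2 default of <project>-<service>-<index> for containers without an explicit container_name, and assuming the database service is called db (inferred from the container names in the commit message). The function name is hypothetical.

```shell
#!/bin/sh
# Builds the default container name Compose v2 assigns when no container_name is set:
# <project>-<service>-<index>. The function name is hypothetical.
compose_container_name() {
    project=$1
    service=$2
    index=${3:-1}
    printf '%s-%s-%s\n' "$project" "$service" "$index"
}

compose_container_name archiv-staging db      # archiv-staging-db-1
compose_container_name archiv-production db   # archiv-production-db-1
```

Renaming the project changes the first component, which is why a POSTGRES_HOST value hardcoded to the old name would stop resolving.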
Marcel
dec0001bd1 ci(obs): chmod 600 obs-secrets.env after creation in both workflows
The heredoc creates the file with default umask permissions (644 —
world-readable). Setting 600 immediately after creation prevents other
processes on the host from reading the Grafana, GlitchTip, and Postgres
credentials. Defence-in-depth for the single-tenant VPS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:53:49 +02:00
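The permission change can be observed with a short script. This is a sketch under stated assumptions: a umask of 022 (a common default, set explicitly here), a temporary directory instead of the real deploy path, and a placeholder value rather than a real secret.

```shell
#!/bin/sh
# Writes a secrets file and tightens its mode right after creation.
# The directory, file content, and value are hypothetical placeholders.
umask 022                       # common default: new files get mode 644
dir=$(mktemp -d)
secrets_file="$dir/obs-secrets.env"

cat > "$secrets_file" <<'EOF'
GRAFANA_ADMIN_PASSWORD=placeholder
EOF
mode_before=$(ls -l "$secrets_file" | cut -c1-10)

chmod 600 "$secrets_file"       # owner read/write only
mode_after=$(ls -l "$secrets_file" | cut -c1-10)

echo "$mode_before -> $mode_after"   # -rw-r--r-- -> -rw-------
```

Setting umask 077 in a subshell before writing the file would also close the short window in which the file is world-readable; whether that matters depends on the host.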
Marcel
f628ab6435 ci(obs): add validate + health assertion steps to release.yml
nightly.yml had two observability gates that release.yml lacked:
- "Validate observability compose config" (docker compose config --quiet)
  catches missing env vars and YAML errors before any containers start
- "Assert observability stack health" checks obs-loki/prometheus/grafana/tempo
  are healthy after up --wait, covering services without healthcheck directives

Mirrors the nightly.yml steps verbatim so the production deploy path is at
least as well-verified as the nightly staging path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:53:18 +02:00
Marcel
4c5ee96e36 docs(adr): correct ADR-016 Decision section to match two-source env model
The Decision section described an operator-managed /opt/familienarchiv/.env
that CI does not touch. The actual implementation is a two-source model:
obs.env (git-tracked, non-secret config) + obs-secrets.env (CI-written
fresh from Gitea secrets on every deploy). Also updates the Consequences
bullet that incorrectly stated secrets are decoupled from CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 08:52:42 +02:00
Marcel
53cf1837b2 fix(obs): set POSTGRES_HOST per environment — staging/prod use compose auto-names not archive-db
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:21:53 +02:00
Marcel
d83ed7254d docs(obs): document obs vs main stack env model, obs.env + obs-secrets.env approach
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:20:21 +02:00
Marcel
1ae4bfe325 ci(obs): GitOps obs env split in release — deploy to /opt/familienarchiv/, secrets fresh from Gitea
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:19:12 +02:00
Marcel
c5139851b8 ci(obs): GitOps obs env split in nightly — obs.env in git, secrets fresh from Gitea
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:18:38 +02:00
Marcel
f9baf02b86 feat(obs): add GF_SERVER_ROOT_URL to Grafana service
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:17:47 +02:00
Marcel
b67bd201b2 feat(obs): add obs.env with non-secret config tracked in git
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:17:07 +02:00
Marcel
79735e23e0 ci(obs): assert obs-loki/prometheus/grafana/tempo are healthy after stack up
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:01:48 +02:00
Marcel
df37113d38 ci(obs): add compose config dry-run before obs stack up to catch .env substitution errors
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:01:17 +02:00
Marcel
c7d2eeb3f0 docs(ci): harden runner-config.yaml security comment for /opt/familienarchiv/ write access
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:00:44 +02:00
Marcel
4e94d85d7e docs(adr): add ADR-016 for obs stack co-location and CI-push config sync
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 00:00:07 +02:00
Marcel
dec6b8139b docs(c4): update l2-containers obs boundary to show /opt/familienarchiv/ permanent path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:59:11 +02:00
Marcel
7b7d0c92a8 docs(obs): update DEPLOYMENT.md with /opt/familienarchiv/ ops section, env keys, runner restart
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:58:42 +02:00
Marcel
448c3cdcdb docs(obs): update .env.example for PORT_GRAFANA 3003, POSTGRES_HOST, $$ escaping
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:57:31 +02:00
Marcel
7e52494880 fix(ci): deploy obs configs to /opt/familienarchiv/ before starting stack
The observability stack's bind-mount sources pointed to workspace-relative
paths. When CI wiped the workspace between runs, containers kept running but
their config files disappeared — causing Docker to auto-create directories
at the missing paths and crash the services on next restart.

Fix: mount /opt/familienarchiv/ into CI job containers via runner-config.yaml,
then copy infra/observability/ and docker-compose.observability.yml there before
docker compose up. Compose runs from the permanent path, so bind mounts resolve
to stable host paths that survive workspace wipes.

Docker Compose reads /opt/familienarchiv/.env automatically (no --env-file flag
needed). That file is managed on the server and persists between CI runs.

Closes #601

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 21:59:23 +02:00
Marcel
1181b97f94 fix(obs): make Postgres host configurable and fix PORT_GRAFANA default
The POSTGRES_HOST variable (default: archive-db) lets the observability stack
connect to a different Postgres container — needed when only the staging
stack is running (container name: archiv-staging-db-1).

PORT_GRAFANA default changed from 3001 to 3003 to avoid collision with
the staging frontend which occupies 3001.

Closes #601

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 21:46:11 +02:00
Marcel
458968ded5 fix(obs): remove invalid processors block from tempo metrics_generator
Tempo 2.7.2 removed `processors` from the top-level metrics_generator
config; the field is only valid under `overrides.defaults.metrics_generator`.
The setting was already present there, so this only removes the now-rejected
duplicate at the top level.

Closes part of #601

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 21:45:49 +02:00
Marcel
23515b8542 fix(eslint): remove projectService from Svelte parser — restores fast lint
5646e739 added svelte-kit sync before lint so .svelte-kit/tsconfig.json
always exists. This activated projectService: true for every run, which
builds the full TypeScript language service for all .svelte files and
caused CI lint to take 7+ minutes.

None of the rules in the Svelte-specific block need type information —
they are all AST-selector-based no-restricted-syntax checks. Removing
projectService restores the previous fast path without losing any lint
coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 20:08:52 +02:00
Marcel
e4ac5f08e7 docs(ci): document workspace bind-mount setup for DooD runners
Add the /srv/gitea-workspace prerequisite step to DEPLOYMENT.md §3.1
and a new "Workspace bind-mount setup" subsection plus failure mode 4
to ci-gitea.md, covering the root cause, one-time host setup, disk
management, and troubleshooting for the bind-mount resolution fix
introduced in ADR-015.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:46:54 +02:00
Marcel
15ef079eff docs(adr): add ADR-015 for DooD workspace bind-mount approach
Documents the decision to use workdir_parent + identical host<->container
path instead of the overlay2 MergedDir sync that was in the initial fix.
Captures the alternatives (nsenter sync, image-baked configs, path mismatch)
and the operational consequences (prereq directory, out-of-band compose.yaml).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:38:18 +02:00
Marcel
56c3e51657 fix(ci): replace overlay2 sync with workspace bind mount for DooD
runner-config.yaml: correct path to /srv/gitea-workspace (VPS, not Synology).
docker-compose.observability.yml: revert 5 bind mounts to plain relative paths;
  OBS_CONFIG_DIR variable is no longer needed.
nightly.yml / release.yml: remove OBS_CONFIG_DIR env injection and the
  "Sync observability configs to host" step from both workflows.

With workdir_parent=/srv/gitea-workspace and an identical host<->container
bind mount, $(pwd) inside job containers resolves to a real host path the
daemon can find — no privileged container, no overlay2 inspection, no nsenter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:36:55 +02:00
Marcel
2cc8b1174b fix(ci): configure workspace bind mount for DooD bind-mount resolution
Set workdir_parent to /volume1/gitea-workspace so act_runner stores job
workspaces at a real NAS path. Mounting that path at the same absolute
location in job containers means $(pwd) inside any job container resolves
to a host path the daemon can find — no overlay2 tricks needed.

Prerequisite (NAS): mkdir -p /volume1/gitea-workspace and add
  - /volume1/gitea-workspace:/volume1/gitea-workspace
to the runner service volumes in Gitea's docker-compose.yml, then restart
the runner.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:33:36 +02:00
Marcel
1fc47888d5 fix(ci): sync observability configs to host before docker compose up (#598)
DooD runner only shares /var/run/docker.sock — no workspace directory is
mapped to the host daemon. Relative bind mounts in
docker-compose.observability.yml resolved to paths that didn't exist on
the host; Docker auto-created directories in their place, causing
'not a directory' mount failures for all five config files.

Fix:
- docker-compose.observability.yml: replace hardcoded ./infra/observability/
  prefix with ${OBS_CONFIG_DIR:-./infra/observability} so the path is
  configurable while remaining backwards-compatible for local use.
- nightly.yml / release.yml: add a 'Sync observability configs to host'
  step that finds the job container's overlay2 MergedDir (the container's
  full filesystem as seen from the host mount namespace), then uses the
  existing nsenter/alpine pattern to cp the config tree into a stable host
  path (/srv/familienarchiv-{staging,production}/obs-configs).
  OBS_CONFIG_DIR is injected into the env file so Compose picks it up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:02:53 +02:00
Marcel
d435b2b0e4 fix(infra): pin GlitchTip image to 6.1.6 (v4 tag never existed)
glitchtip/glitchtip:v4 is not a real tag — GlitchTip does not use a
v-prefix in its Docker image versioning. Latest stable release is 6.1.6.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 18:07:32 +02:00
Marcel
fed427dc4a fix(infra): set OTEL_EXPORTER_OTLP_ENDPOINT in docker-compose.prod.yml
The endpoint belongs in the compose file (hardcoded to the in-network
Tempo service) rather than per-environment workflow files. This covers
both staging (nightly.yml) and production (release.yml) with a single
change and removes the duplicate from the nightly env-file block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 17:43:23 +02:00
Marcel
cf78ab2f8e fix(staging): correct backend healthcheck port and OTel endpoint
Two bugs introduced when the management port was split from the app port:

1. Backend healthcheck hit localhost:8080/actuator/health (app port) —
   actuator is on management.server.port=8081, so every probe got a 404
   from the main MVC dispatcher, marking the container permanently unhealthy.
   Fix: change the probe to localhost:8081.

2. OTEL_EXPORTER_OTLP_ENDPOINT was not set in .env.staging, so the exporter
   fell back to http://localhost:4317 (the CI-safe default) instead of
   http://tempo:4317 (the in-network Tempo service). Fix: inject the correct
   endpoint in the nightly env-file generation step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 17:37:15 +02:00
Marcel
c8883d0e40 fix(ci): isolate compose-idempotency network from archiv-net collisions
The name: archiv-net declaration (needed so docker-compose.observability.yml
can join the network as external: true) caused the compose-idempotency CI job
to collide with any archiv-net left on the runner from staging or a previous
run. mc would resolve 'minio' to the wrong container and fail with a signature
mismatch.

Make the network name interpolable via COMPOSE_NETWORK_NAME (default: archiv-net
so production/staging behaviour is unchanged). Inject COMPOSE_NETWORK_NAME=
test-idem-archiv-net into the stub env file so the idempotency test always
gets a fully isolated network.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 16:33:07 +02:00
Marcel
7154092547 fix(deps): pin opentelemetry-bom to 1.61.0 to fix startup crash
opentelemetry-spring-boot-starter:2.27.0 was built against
opentelemetry-api:1.61.0. Spring Boot 4.0.0 only manages 1.55.0,
which is missing GlobalOpenTelemetry.getOrNoop(). The backend crashed
at startup with NoSuchMethodError on the first staging nightly.

Add a <dependencyManagement> import of opentelemetry-bom:1.61.0 before
the Spring Boot BOM applies, so all OTel core artifacts resolve to the
version the instrumentation starter actually requires.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 16:05:44 +02:00
Marcel
ada3a3ccaf devops(ci): add --remove-orphans to observability stack deploy steps
Both nightly and release workflows were missing --remove-orphans on the
observability compose up, while the main app deploy step already had it.
Without it, containers removed from docker-compose.observability.yml
linger as orphans until manually pruned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 14:55:28 +02:00
Marcel
8cf3a2a726 devops(caddy): apply full security_headers snippet to GlitchTip vhost
The GlitchTip vhost only had a manual HSTS header; the rest of the
(security_headers) snippet (X-Content-Type-Options, Referrer-Policy,
Permissions-Policy, -Server removal) was missing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 14:54:54 +02:00
Marcel
553e2f8898 docs(deployment): add observability secrets to §3.3 Gitea secrets table
GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, and SENTRY_DSN were
referenced in the workflow env files but absent from the secrets table,
leaving the first-run operator without a complete checklist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 13:46:01 +02:00
Marcel
4a7349543a devops(ci): wire SENTRY_DSN into staging and production env files
Adds SENTRY_DSN as an optional secret (empty by default) so it can be
set after GlitchTip first-run without requiring another code change.
Backend reads it via application.yaml; empty value keeps Sentry disabled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 13:45:07 +02:00
Marcel
f15e004645 devops(ci): add --wait to observability stack startup
Prometheus, Loki, Tempo, and Grafana all define healthchecks in
docker-compose.observability.yml. Without --wait, the step exits 0
as soon as containers are created, masking startup failures silently.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 13:44:16 +02:00
Marcel
b137e3e72d devops(caddy): add HSTS to GlitchTip vhost
Caddy does not set Strict-Transport-Security on GlitchTip because the
full security_headers snippet is intentionally omitted (Permissions-Policy
interferes with the Sentry SDK CORS). Adding HSTS alone guarantees
HTTPS enforcement at the Caddy layer without breaking SDK ingestion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 13:43:35 +02:00
Marcel
4c8a23ff14 devops(caddy): add Grafana and GlitchTip vhosts
grafana.archiv.raddatz.cloud → 127.0.0.1:3003 (with security headers)
glitchtip.archiv.raddatz.cloud → 127.0.0.1:3002 (no security headers —
  GlitchTip manages its own; the Sentry SDK also POSTs here)

Requires A records for both subdomains pointing at the server before
the next `systemctl reload caddy`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 11:27:07 +02:00
Marcel
d7d225af77 devops(observability): wire observability stack into nightly and release deploys
- docker-compose.prod.yml: add `name: archiv-net` so the network has a
  stable Docker name regardless of compose project name (-p flag).
  Both staging and production share the same host-level network, which
  is correct since the observability stack is a single shared instance.

- nightly.yml / release.yml: add observability env vars (POSTGRES_USER,
  PORT_GRAFANA=3003, PORT_GLITCHTIP=3002, PORT_PROMETHEUS=9090,
  GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY, GLITCHTIP_DOMAIN) to the
  env file, then `docker compose -f docker-compose.observability.yml up -d`
  after the app deploy step. PORT_GRAFANA=3003 avoids collision with
  staging frontend on 3001.

  Requires two new Gitea secrets: GRAFANA_ADMIN_PASSWORD, GLITCHTIP_SECRET_KEY.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 11:22:37 +02:00
Marcel
4358997482 perf(test): replace DirtiesContext(AFTER_EACH_TEST_METHOD) with @Transactional
Four integration test classes were restarting the full Spring context (and a
new Postgres Testcontainer, ~75s each) after every test method, causing 10
unnecessary container startups and adding ~12 minutes to CI. Fixed by:

- PersonServiceIntegrationTest, DocumentSearchPagedIntegrationTest,
  GeschichteServiceIntegrationTest: swap to @Transactional so each test
  rolls back instead of destroying the context.
- AuditServiceIntegrationTest: cannot use @Transactional (logAfterCommit
  hooks into AFTER_COMMIT which requires a real commit); reset state with
  @BeforeEach deleteAll() instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 10:29:35 +02:00
Marcel
7c2e75facc fix(backend): switch to sentry-spring-boot-4:8.41.0 for Spring Boot 4/SF7 compatibility
sentry-spring-boot-starter-jakarta 8.5.0 does not support Spring Boot 4.0 —
it logs an "Incompatible Spring Boot Version" warning and its SentryAutoConfiguration
crashes SF7 bean-name generation. sentry-spring-boot-4 (added in 8.21.0) is the
dedicated Spring Boot 4 module with a fixed auto-configuration class.

- Replace sentry-spring-boot-starter-jakarta:8.5.0 with sentry-spring-boot-4:8.41.0
- Delete SentryConfig.java — workaround no longer needed, auto-config handles init
- Remove spring.autoconfigure.exclude from application.yaml + application-test.yaml
- Delete SentryConfigTest.java — tested the deleted workaround class
- Update ApplicationContextTest: assert Sentry.isEnabled() is false when no DSN set

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 09:51:53 +02:00
Marcel
7b05b9d5a0 test(context): assert SentryAutoConfiguration is excluded from Spring context
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 09:45:32 +02:00
Marcel
20edc0474c test(exception): verify handleGeneric captures exception in Sentry and returns 500
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 09:44:10 +02:00