Compare commits

...

2 Commits

Author SHA1 Message Date
Marcel
109202246e fix(deps): bump vite 7.3.3 -> 7.3.5 to clear the high-severity audit gate
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 7m30s
CI / OCR Service Tests (pull_request) Successful in 37s
CI / Backend Unit Tests (pull_request) Failing after 12m40s
CI / fail2ban Regex (pull_request) Successful in 1m46s
CI / Semgrep Security Scan (pull_request) Successful in 35s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m49s
SDD Gate / RTM Check (pull_request) Successful in 31s
SDD Gate / Contract Validate (pull_request) Successful in 41s
SDD Gate / Constitution Impact (pull_request) Successful in 29s
CI / Unit & Component Tests (push) Waiting to run
CI / OCR Service Tests (push) Waiting to run
CI / Backend Unit Tests (push) Waiting to run
CI / fail2ban Regex (push) Waiting to run
CI / Semgrep Security Scan (push) Waiting to run
CI / Compose Bucket Idempotency (push) Waiting to run
vite 7.3.3 carries two high-severity advisories (GHSA-v6wh-96g9-6wx3
NTLMv2 UNC disclosure, GHSA-fx2h-pf6j-xcff server.fs.deny bypass), both
flagged by the CI gate `npm audit --audit-level=high --omit=dev`. 7.3.5
is in-range of the existing `^7.3.3` constraint, so this is a
lockfile-only patch with no package.json change. Gate now exits 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 09:16:58 +02:00
273a97046a fix(ci): re-enable Testcontainers Ryuk to stop the backend fork shutdown hang (#848) (#849)
Some checks failed
CI / Unit & Component Tests (push) Failing after 39s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 5m57s
CI / fail2ban Regex (push) Successful in 43s
CI / Semgrep Security Scan (push) Successful in 24s
CI / Compose Bucket Idempotency (push) Successful in 1m8s
nightly / deploy-staging (push) Successful in 4m49s
nightly / npm-audit (push) Failing after 18s
Renovate / renovate (push) Failing after 23s
Fixes #848.

## Symptom

CI `Backend Unit Tests` goes red despite **all tests passing**: after the last test, the fork hangs at JVM shutdown and Surefire reports `There was a timeout in the fork` → `BUILD FAILURE`.

## Root cause (corrected after investigation)

My first theory (slow shutdown needs a bigger timeout) was **wrong** — raising `forkedProcessExitTimeoutInSeconds` 30→120 only delayed the kill by ~90s (total time 12:35 → 14:04), proving an *indefinite* hang, not slowness.

The real cause is **Testcontainers teardown with Ryuk disabled**:
- The job set `TESTCONTAINERS_RYUK_DISABLED: "true"` (carry-over from the old NAS runner).
- With Ryuk off, containers are reaped by the **in-JVM `JVMHookResourceReaper`** at shutdown. That reaper crashes (`NotFoundException`) and **leaks containers run-over-run**.
- The run boots ~30 per-context Spring contexts (`PostgresContainerConfig` is a per-context `@Bean`), so ~30 Postgres containers are torn down in-JVM at shutdown.
- As leaks accumulate on the runner, per-run teardown degrades until the fork hangs at shutdown → fork timeout. **The server had 21 orphaned `postgres:16-alpine`/`minio` containers up to 5 weeks old**; manually killing them is what restored CI before (a recurring pattern).

Environment confirmed via `ssh root@raddatz.cloud`: CI now runs on a root server with **Docker 29.4.3** (8 CPU, 62 GB, socket access) — so the original reason to disable Ryuk no longer applies, and Docker is *not* slow.

## Change

1. **Re-enable Ryuk** (remove `TESTCONTAINERS_RYUK_DISABLED`) — Ryuk reaps each run's containers out-of-process after the JVM exits, so they never accumulate. Automates the manual "kill all testcontainers."
2. Keep `forkedProcessExitTimeoutInSeconds=120` as a harmless backstop.
3. Drop the stale "NAS runner" comment on `DOCKER_API_VERSION`.

Operational: the 21 leaked containers were already removed from the server (by `org.testcontainers=true` label; real services untouched), giving immediate relief.

## Validation

Validated by this PR's CI run on the real runner (watching it). If Ryuk can't start in the runner's docker-outside-docker setup, the integration tests fail fast and I revert — fallback is a singleton Postgres container.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Marcel <marcel@familienarchiv>
Reviewed-on: #849
2026-06-15 20:53:58 +02:00
3 changed files with 16 additions and 5 deletions

View File

@@ -229,9 +229,14 @@ jobs:
name: Backend Unit Tests name: Backend Unit Tests
runs-on: ubuntu-latest runs-on: ubuntu-latest
env: env:
DOCKER_API_VERSION: "1.43" # NAS runner runs Docker 24.x (max API 1.43); Testcontainers 2.x defaults to 1.44 # CI runs against the root-server Docker daemon (29.x). This API pin is a harmless
# carry-over from the old NAS runner (Docker 24.x, max API 1.43); safe to drop later.
DOCKER_API_VERSION: "1.43"
DOCKER_HOST: unix:///var/run/docker.sock DOCKER_HOST: unix:///var/run/docker.sock
TESTCONTAINERS_RYUK_DISABLED: "true" # Ryuk (Testcontainers' out-of-process reaper) is intentionally LEFT ENABLED so it
# removes each run's containers after the JVM exits. Disabling it forced the in-JVM
# reaper, which hung at JVM shutdown and leaked Postgres containers run-over-run until
# the daemon degraded and the fork timed out at teardown — see #848.
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v4

View File

@@ -369,6 +369,12 @@
<artifactId>maven-surefire-plugin</artifactId> <artifactId>maven-surefire-plugin</artifactId>
<configuration> <configuration>
<forkedProcessTimeoutInSeconds>600</forkedProcessTimeoutInSeconds> <forkedProcessTimeoutInSeconds>600</forkedProcessTimeoutInSeconds>
<!-- Grace period after the test JVM calls System.exit(0). The 30s default is too
short: the single reused fork closes ~32 cached Spring contexts at shutdown,
each tearing down a Testcontainers Postgres + HikariCP pool, which overruns 30s
and makes Surefire kill the fork (BUILD FAILURE despite 0 test failures). This is
a different knob from forkedProcessTimeoutInSeconds above. See issue #848. -->
<forkedProcessExitTimeoutInSeconds>120</forkedProcessExitTimeoutInSeconds>
<systemPropertyVariables> <systemPropertyVariables>
<junit.jupiter.execution.timeout.default>90 s</junit.jupiter.execution.timeout.default> <junit.jupiter.execution.timeout.default>90 s</junit.jupiter.execution.timeout.default>
</systemPropertyVariables> </systemPropertyVariables>

View File

@@ -9515,9 +9515,9 @@
} }
}, },
"node_modules/vite": { "node_modules/vite": {
"version": "7.3.3", "version": "7.3.5",
"resolved": "https://registry.npmjs.org/vite/-/vite-7.3.3.tgz", "resolved": "https://registry.npmjs.org/vite/-/vite-7.3.5.tgz",
"integrity": "sha512-/4XH147Ui7OGTjg3HbdWe5arnZQSbfuRzdr9Ec7TQi5I7R+ir0Rlc9GIvD4v0XZurELqA035KVXJXpR61xhiTA==", "integrity": "sha512-KuOaNhcnGFN2zIPGA7wRmzF+lJA1sea7rHq17aiJ++9lzY1WWG6Jpwqwe1KNbRVPIqHmr8GLYx7jbrQcN/7/ww==",
"license": "MIT", "license": "MIT",
"dependencies": { "dependencies": {
"esbuild": "^0.27.0", "esbuild": "^0.27.0",