fix(ci): re-enable Testcontainers Ryuk to stop the shutdown hang
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 1m22s
CI / OCR Service Tests (pull_request) Successful in 54s
CI / Backend Unit Tests (pull_request) Successful in 10m55s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 24s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m9s
SDD Gate / RTM Check (pull_request) Successful in 16s
SDD Gate / Contract Validate (pull_request) Successful in 23s
SDD Gate / Constitution Impact (pull_request) Successful in 17s
The backend job set TESTCONTAINERS_RYUK_DISABLED=true, a carry-over from the old NAS runner. With Ryuk off, Testcontainers tears down containers via the in-JVM JVMHookResourceReaper at shutdown; that reaper crashes (NotFoundException) and leaks containers run-over-run. As leaked postgres:16-alpine containers pile up on the runner, the per-run teardown of ~30 per-context containers degrades until the fork hangs at JVM shutdown and Surefire reports "There was a timeout in the fork", even though all tests pass. (The server had 21 such leaks, up to 5 weeks old; manually killing them was what restored CI before.)

CI now runs on a root server with modern Docker (29.4.3, socket access), so the original reason to disable Ryuk no longer applies. Re-enabling it reaps each run's containers out-of-process after the JVM exits, so they never accumulate.

Also drops the stale "NAS runner" comment on DOCKER_API_VERSION.

Fixes #848.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
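The manual cleanup described above (killing leaked containers by hand) could also be expressed as a workflow step. The fragment below is only a sketch and is not part of this commit: the step name and its placement at the end of the job are assumptions. It relies on the `org.testcontainers=true` label that Testcontainers attaches to the containers it starts. With Ryuk enabled it should normally find nothing to remove.

```yaml
      # Hypothetical safety-net step, not part of this commit.
      # Assumes a GNU userland (xargs -r) and that no other job on the same
      # Docker daemon is running Testcontainers at the same time, since this
      # removes every container carrying the label, not just this run's.
      - name: Remove leftover Testcontainers containers
        if: always()
        run: |
          docker ps -aq --filter "label=org.testcontainers=true" \
            | xargs -r docker rm -f
```

On a runner shared by concurrent jobs this filter is too broad, which is one reason to prefer Ryuk's per-session cleanup over a blanket sweep.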
@@ -229,9 +229,14 @@ jobs:
     name: Backend Unit Tests
     runs-on: ubuntu-latest
     env:
-      DOCKER_API_VERSION: "1.43" # NAS runner runs Docker 24.x (max API 1.43); Testcontainers 2.x defaults to 1.44
+      # CI runs against the root-server Docker daemon (29.x). This API pin is a harmless
+      # carry-over from the old NAS runner (Docker 24.x, max API 1.43); safe to drop later.
+      DOCKER_API_VERSION: "1.43"
       DOCKER_HOST: unix:///var/run/docker.sock
-      TESTCONTAINERS_RYUK_DISABLED: "true"
+      # Ryuk (Testcontainers' out-of-process reaper) is intentionally LEFT ENABLED so it
+      # removes each run's containers after the JVM exits. Disabling it forced the in-JVM
+      # reaper, which hung at JVM shutdown and leaked Postgres containers run-over-run until
+      # the daemon degraded and the fork timed out at teardown — see #848.
     steps:
       - uses: actions/checkout@v4
 