The backend job set TESTCONTAINERS_RYUK_DISABLED=true, a carry-over from the
old NAS runner. With Ryuk off, Testcontainers tears down containers via the
in-JVM JVMHookResourceReaper at shutdown; that reaper crashes (NotFoundException)
and leaks containers run over run. As leaked postgres:16-alpine containers pile
up on the runner, the per-run teardown of ~30 per-context containers slows down
until the fork hangs at JVM shutdown and Surefire reports "There was a timeout
in the fork", even though all tests pass. (The server had 21 such leaked
containers, some up to 5 weeks old; manually killing them is what had restored
CI previously.)

CI now runs on a root server with a modern Docker (29.4.3) and socket access,
so the original reason to disable Ryuk no longer applies. With Ryuk re-enabled,
each run's containers are reaped out-of-process after the JVM exits, so they
never accumulate.

Also drops the stale "NAS runner" comment on DOCKER_API_VERSION.

Fixes #848.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

All 2327 tests pass, but the build went red: after the suite finishes,
Surefire calls System.exit(0), and the single reused fork then closes ~32
cached Spring contexts at once, each tearing down a Testcontainers Postgres
container and a HikariCP pool. That teardown overruns Surefire's default 30s
post-exit grace period, so Surefire force-kills the fork and reports a fork
timeout (BUILD FAILURE with 0 failures). The session-cleanup
InterruptedException and Testcontainers reaper NotFoundException in the log
are symptoms of that contended teardown.

Set the previously unset forkedProcessExitTimeoutInSeconds to 120s. This is a
different knob from forkedProcessTimeoutInSeconds (total/inactivity), which was
already 600s; that is why the earlier ceiling bumps never addressed this failure.

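For reference, the two settings side by side in pom.xml. This is a sketch that
assumes the plugin is configured directly in the project's build plugins; the
plugin version and any other configuration are omitted, and only the two
timeout values (600 and 120) come from this change.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Existing per-fork timeout; already 600s before this change. -->
    <forkedProcessTimeoutInSeconds>600</forkedProcessTimeoutInSeconds>
    <!-- New: how long Surefire waits for the fork to exit after the tests
         finish before force-killing it (Surefire's default is 30s). -->
    <forkedProcessExitTimeoutInSeconds>120</forkedProcessExitTimeoutInSeconds>
  </configuration>
</plugin>
```
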
This is Phase B of #848; the durable fix (a singleton Testcontainers Postgres
container plus disabling the Spring Session JDBC cleanup scheduler in tests)
follows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>