marcel 273a97046a
Some checks failed
CI / Unit & Component Tests (push) Failing after 39s
CI / OCR Service Tests (push) Successful in 23s
CI / Backend Unit Tests (push) Successful in 5m57s
CI / fail2ban Regex (push) Successful in 43s
CI / Semgrep Security Scan (push) Successful in 24s
CI / Compose Bucket Idempotency (push) Successful in 1m8s
nightly / deploy-staging (push) Successful in 4m49s
nightly / npm-audit (push) Failing after 18s
Renovate / renovate (push) Failing after 23s
fix(ci): re-enable Testcontainers Ryuk to stop the backend fork shutdown hang (#848) (#849)
Fixes #848.

## Symptom

CI `Backend Unit Tests` goes red despite **all tests passing**: after the last test, the fork hangs at JVM shutdown and Surefire reports `There was a timeout in the fork` → `BUILD FAILURE`.

## Root cause (corrected after investigation)

My first theory (slow shutdown needs a bigger timeout) was **wrong** — raising `forkedProcessExitTimeoutInSeconds` 30→120 only delayed the kill by ~90s (total time 12:35 → 14:04), proving an *indefinite* hang, not slowness.

The real cause is **Testcontainers teardown with Ryuk disabled**:
- The job set `TESTCONTAINERS_RYUK_DISABLED: "true"` (carry-over from the old NAS runner).
- With Ryuk off, containers are reaped by the **in-JVM `JVMHookResourceReaper`** at shutdown. That reaper crashes (`NotFoundException`) and **leaks containers run-over-run**.
- The run boots ~30 Spring contexts, each with its own Postgres container (`PostgresContainerConfig` is a per-context `@Bean`), so ~30 Postgres containers are torn down in-JVM at shutdown.
- As leaks accumulate on the runner, per-run teardown degrades until the fork hangs at shutdown → fork timeout. **The server had 21 orphaned `postgres:16-alpine`/`minio` containers up to 5 weeks old**; manually killing them is what restored CI before (a recurring pattern).

Environment confirmed via `ssh root@raddatz.cloud`: CI now runs on a root server with **Docker 29.4.3** (8 CPU, 62 GB, socket access) — so the original reason to disable Ryuk no longer applies, and Docker is *not* slow.

## Change

1. **Re-enable Ryuk** (remove `TESTCONTAINERS_RYUK_DISABLED`): Ryuk reaps each run's containers out-of-process after the JVM exits, so they never accumulate. This automates the manual "kill all testcontainers" cleanup.
2. Keep `forkedProcessExitTimeoutInSeconds=120` as a harmless backstop.
3. Drop the stale "NAS runner" comment on `DOCKER_API_VERSION`.

Operational: the 21 leaked containers were already removed from the server (by `org.testcontainers=true` label; real services untouched), giving immediate relief.
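
The label-based cleanup described above could look roughly like the following. This is a hedged sketch, not the exact command that was run on the server: the function name `reap_testcontainers` is hypothetical, and it assumes that only Testcontainers-managed containers carry the `org.testcontainers=true` label.

```shell
# Hypothetical helper: force-remove every container that carries the
# Testcontainers label. Assumes no real service uses this label.
reap_testcontainers() {
  local ids
  # -a also lists stopped containers, -q prints only the container IDs.
  ids="$(docker ps -aq --filter "label=org.testcontainers=true")"
  if [ -z "$ids" ]; then
    echo "no leaked testcontainers found"
    return 0
  fi
  # Word splitting on $ids is intended: one argument per container ID.
  # shellcheck disable=SC2086
  docker rm -f $ids
}
```

Running `docker ps -a --filter "label=org.testcontainers=true"` on its own first shows what would be removed, which is a cheap way to confirm that real services are not matched.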

## Validation

Validation is this PR's own CI run on the real runner, which I am watching. If Ryuk can't start in the runner's docker-outside-docker setup, the integration tests fail fast and I revert; the fallback is a singleton Postgres container.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Marcel <marcel@familienarchiv>
Reviewed-on: #849
2026-06-15 20:53:58 +02:00

Familienarchiv

Familienarchiv is a private web application for digitising, organising, and searching a family document collection — letters, postcards, and photographs from 1899 to 1950. Family members upload scans, transcribe handwritten text (Kurrent/Sütterlin), and read the archive from any device.


Subsystems

  • frontend/ — SvelteKit 2 / Svelte 5 / TypeScript / Tailwind 4 web app (server-side rendered)
  • backend/ — Spring Boot 4 (Java 21) REST API; handles documents, persons, search, and user management
  • ocr-service/ — Python FastAPI microservice for OCR and handwritten text recognition (HTR); single-node by design — see ADR-001. Not part of the default dev stack (see Quick start below)
  • infra/ — Gitea Actions CI/CD config; future home for infrastructure-as-code
  • scripts/ — operational and data-pipeline helpers (reset-db.sh, clean-e2e-data.sh, import scripts)

Quick start

Prerequisites: Java 21, Node 24, Docker with the docker compose plugin (V2).

1. Configure environment

cp .env.example .env
# The defaults in .env.example work for local development without changes.

2. Start infrastructure

# Starts PostgreSQL, MinIO (object storage), and Mailpit (dev mail catcher)
docker compose up -d db minio mailpit

3. Start the backend

cd backend
./mvnw spring-boot:run
# Starts on http://localhost:8080
# API docs (dev profile, auto-enabled): http://localhost:8080/v3/api-docs

4. Start the frontend

cd frontend
npm install
npm run dev
# Starts on http://localhost:5173

Open http://localhost:5173 — you should see the Familienarchiv login screen.
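
If you prefer to check from a terminal that both processes are up, a small polling helper can do it. This is a minimal sketch under stated assumptions: the function name `wait_for_url` and the retry count are hypothetical, and it only checks that the two URLs listed in the steps above answer without an HTTP error.

```shell
# Hypothetical helper: poll a URL until it answers with a non-error status.
# Usage: wait_for_url URL [ATTEMPTS]   (ATTEMPTS defaults to 30)
wait_for_url() {
  local url="$1" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -sS: quiet but still print errors,
    # -o /dev/null: discard the response body.
    if curl -fsS -o /dev/null --max-time 2 "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "not reachable: $url" >&2
  return 1
}
```

For example: `wait_for_url http://localhost:8080/v3/api-docs && wait_for_url http://localhost:5173`.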

Default development credentials:

# local dev only — change before any network-exposed deployment
Email:    admin@familyarchive.local
Password: admin123

Development setup only. The default docker compose config exposes the database port and uses root MinIO credentials. Do not connect this to a network without first reading docs/DEPLOYMENT.md (coming: DOC-5, #399).

Running the full stack via Docker (optional)

To run everything including the backend and frontend in containers:

docker compose up -d

Note: the OCR service (ocr-service/) builds its Docker image locally and downloads ~6 GB of ML models on first start. Expect 30 to 60 minutes on a first run. The rest of the stack starts independently. OCR requires ≥ 12 GB RAM and can be excluded with --scale ocr-service=0 on memory-constrained machines.
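
As an illustration, starting the full stack without OCR might look like this. It is a sketch only: the wrapper function `up_without_ocr` is hypothetical, and the service name `ocr-service` is taken from the note above.

```shell
# Hypothetical wrapper: start every compose service except OCR by
# scaling the ocr-service down to zero replicas.
up_without_ocr() {
  docker compose up -d --scale ocr-service=0
}
```

Calling `up_without_ocr` from the repository root is equivalent to typing the `docker compose` line directly; the wrapper only saves retyping the flag.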


Where to go next

| Resource | Purpose |
|---|---|
| docs/architecture/c4-diagrams.md | C4 container and component diagrams (current system view) |
| docs/ARCHITECTURE.md (coming: DOC-2, #396) | Full architecture guide with domain list |
| docs/GLOSSARY.md | Overloaded terms: Person vs AppUser, Chronik vs Aktivität, etc. |
| CONTRIBUTING.md (coming: DOC-4, #398) | How to add a domain, endpoint, or SvelteKit route |
| docs/DEPLOYMENT.md (coming: DOC-5, #399) | Production deployment checklist and secrets guide |
| docs/adr/ | Architecture Decision Records: the "why" behind key choices |
| Gitea issue tracker (internal, home network only) | Bug reports, feature requests, and project planning |

License

Private project — all rights reserved. Not licensed for redistribution.
