devops(backend): switch to multi-stage Docker build #238
Summary
- Replace spring-boot:run at container startup with a proper multi-stage build
- BuildKit cache mount for ~/.m2: subsequent builds skip dependency downloads
- Runtime image is eclipse-temurin:21-jre with only app.jar: smaller image, no JDK in production
- Remove the ./backend:/app source mount and the maven_cache named volume from docker-compose.yml
Motivation
The previous setup recompiled the entire project on every container start, causing:
target/
Deploy
Subsequent rebuilds are fast: only the src/ layer is invalidated when source changes; the dependency layer stays cached.
Test plan
- docker compose build backend succeeds
- docker compose up -d backend starts container cleanly
- Started FamilienarchivApplication with no errors
- /actuator/health returns UP
🤖 Generated with Claude Code
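As a rough illustration of the build described in the summary, a multi-stage Dockerfile along these lines would fit. This is a sketch, not the file from the PR: the /build and /app working directories, the -B flag, and the ENTRYPOINT are assumptions, while the base images, the ~/.m2 cache mount, the pom.xml → dependency:go-offline → src/ ordering, and package -DskipTests are taken from the discussion in this thread.

```dockerfile
# syntax=docker/dockerfile:1

# ---- Build stage: full JDK plus the Maven wrapper ----
FROM eclipse-temurin:21-jdk AS builder
WORKDIR /build

# Copy only what dependency resolution needs, so this layer stays cached
# until the wrapper or pom.xml changes.
COPY .mvn/ .mvn/
COPY mvnw pom.xml ./
RUN --mount=type=cache,target=/root/.m2 \
    ./mvnw -B dependency:go-offline

# Source changes invalidate only the layers from here on.
COPY src/ src/
RUN --mount=type=cache,target=/root/.m2 \
    ./mvnw -B clean package -DskipTests

# ---- Runtime stage: JRE only, no compiler or build cache ----
FROM eclipse-temurin:21-jre
WORKDIR /app
COPY --from=builder /build/target/*.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```

Because pom.xml is copied before src/, editing source files leaves the dependency layer cached. The cache mount exists only during the RUN steps, so nothing from ~/.m2 ends up in the runtime image.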
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
No production code touched, no business logic changed, no test files affected. This is infrastructure only and the fix is the right call — I was the one hitting the restart loops during development.
Suggestions
Use -Dmaven.test.skip=true instead of -DskipTests in the Dockerfile
-DskipTests (Surefire's skipTests property) still compiles test sources; -Dmaven.test.skip=true skips both compilation and execution. Since we never run tests in the image, skipping test compilation saves build time, although Maven may still resolve test-scoped dependencies. If we ever have a test-only compile error (like a missing class referenced in a test), -DskipTests fails the Docker build; -Dmaven.test.skip=true does not.
The *.jar glob is safe, but document why
The glob works because the Spring Boot Maven Plugin renames the pre-repackage artifact to .jar.original. A one-line comment prevents future confusion if someone wonders "what if there are two JARs?"
🏛️ Markus Keller — Application Architect
Verdict: ✅ Approved
This removes an antipattern — bind-mounting source code into a runtime container and compiling at startup conflates build concerns with runtime concerns. A container image should be an immutable artifact. This PR makes it one.
What's correct
- A BuildKit cache mount for ~/.m2 is better than a named Docker volume for build cache: it's managed by BuildKit, not Docker Compose, and doesn't leak into the running container.
- Removing maven_cache from the named volumes list is correct: it was a workaround for the old runtime-compilation approach and has no place in the new model.
Suggestions
Consider a Compose overlay for environment separation
The current single docker-compose.yml serves both dev and (eventually) production. A docker-compose.prod.yml overlay would allow environment-specific overrides without duplicating the base file. This isn't a blocker for this PR; it's the next natural step once the build pipeline is solid.
restart: unless-stopped behaviour changes with a pre-built JAR
Under the old setup, a bad startup (e.g., Flyway migration failure) resulted in a 90-second pause per retry because compilation was happening. With a pre-built JAR, startup is ~15 seconds, so the restart loop is much tighter. Not a problem today but something to be aware of if a bad migration ships: it will hammer the database faster. No change needed now.
🧪 Sara Holt — QA Engineer
Verdict: ✅ Approved
No test files changed. No test infrastructure affected. Testcontainers-based integration tests are entirely independent of the Docker Compose setup and are not impacted by this change.
Observations
Test execution is now decoupled from deployment; make this explicit
The old setup ran spring-boot:run, which (awkwardly) included test compilation as part of startup. With package -DskipTests, tests are explicitly not run during the image build. This is correct, but it means there must be a separate step for running tests, either locally (./mvnw test) or in CI.
If CI doesn't exist yet, the PR description should note the expected test command so the workflow is unambiguous.
The manual test checklist in the PR description is the right instinct
The four checked items (build, start, startup log, actuator health) represent a minimal smoke test. Once CI is configured, these should be automated assertions, not manual checks. A post-deploy smoke test against /actuator/health is straightforward to automate.
start_period: 60s is now very conservative
With compilation removed from startup, the backend starts in ~15 seconds (JVM init + Flyway). Health check failures are not counted against the container until the 60-second start period has elapsed. Not a correctness issue, but it means failed starts are detected about 45 seconds later than necessary. Consider reducing to 30s.
🔒 Nora "NullX" Steiner — Application Security Engineer
Verdict: ⚠️ Approved with concerns
The move to a runtime JRE image is a genuine security improvement: the JDK compiler, jshell, and jcmd tools are not present in eclipse-temurin:21-jre, reducing the attacker's toolkit after a container compromise. BuildKit cache mounts don't persist into the runtime image, so no build artifacts or Maven credentials are exposed in the final layer. No secrets appear in the Dockerfile.
One concern needs attention before production use.
Blockers
None that prevent merging for dev use. The one production concern is the image pinning suggestion that follows.
Suggestions
Pin image tags to specific digests (important for production)
Both base images use floating tags: eclipse-temurin:21-jdk is not a pinned version; it will silently update when Adoptium publishes 21.0.8. For production, pin to the image digest, or at minimum pin to the full tag including patch version (e.g. eclipse-temurin:21.0.7_6-jdk-jammy). Use Renovate to automate version bump PRs when new patches are released.
Pre-existing concerns (not introduced by this PR)
- S3_ACCESS_KEY: ${MINIO_ROOT_USER} uses the root MinIO credentials for application S3 access. Root can delete all buckets. Create a service account with bucket-scoped permissions before production deployment.
- "${PORT_DB}:5432" exposes the PostgreSQL port to the host machine. Use expose: ["5432"] in production so only the archive-net Docker network can reach the database.
🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist
Verdict: ✅ Approved
No UI changes. No Svelte components. No frontend routes. No CSS. No accessibility-relevant changes.
Checked: no frontend service definition changes in docker-compose.yml. The frontend service, its volumes, and its environment variables are untouched. This PR has zero user-visible impact and no design concerns to raise.
🛠️ Tobias Wendt — DevOps & Platform Engineer
Verdict: ⚠️ Approved with concerns
The approach is correct and solves a real problem. Multi-stage builds with BuildKit cache mounts are the right pattern for this stack. The dependency layer separation (pom.xml → dependency:go-offline → src/) means only source changes invalidate the compile step. Good.
Two things need addressing:
Blockers
Missing .dockerignore: target/ (111MB) is sent to the BuildKit daemon on every build
The backend's target/ directory is 111MB of compiled classes, JARs, test reports, and surefire output. Without a .dockerignore, Docker sends the entire build context to the daemon before the first instruction executes. On a fresh CI runner or new developer machine, that's 111MB of wasted transfer before a single layer runs. The cached build appeared fast locally because the daemon already had the context, but cold builds will be noticeably slower.
Create backend/.dockerignore. The three COPY instructions only need .mvn/, mvnw, pom.xml, and src/; everything else in the build context is noise.
Suggestions
eclipse-temurin:21-jdk and eclipse-temurin:21-jre are floating tags
Per the Tobias Wendt house rule: :latest is not a version, and neither is :21-jdk. When Adoptium releases 21.0.8, the tag moves, builds become non-reproducible, and rollback is impossible.
Pin to the full distribution tag.
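A minimal sketch of what such pinning could look like, using the tags the follow-up commit in this PR settled on (21.0.10_7 on noble). The digest form in the trailing comment is an assumption about how one might go further; `<digest>` is a placeholder, not a real value.

```dockerfile
# Build stage pinned to an exact Temurin release instead of the floating :21-jdk tag.
FROM eclipse-temurin:21.0.10_7-jdk-noble AS builder

# Runtime stage pinned to the matching JRE release.
FROM eclipse-temurin:21.0.10_7-jre-noble

# Stricter variant (placeholder digest; look it up for the image you actually pull):
# FROM eclipse-temurin:21.0.10_7-jre-noble@sha256:<digest>
```

A full version tag can still be republished upstream, whereas a digest identifies one exact image, which is why the digest form is the stricter of the two.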
Then add Renovate to automate version bump PRs. This is a one-time setup cost that pays dividends forever.
dependency:go-offline is an approximation
dependency:go-offline downloads declared POM dependencies but can miss Maven plugin dependencies that are resolved only at execution time (Surefire, Jacoco, Spring Boot Maven plugin). The first build with a cold cache downloads those plugin deps during mvnw clean package, which works: they get cached in the BuildKit cache mount for subsequent runs. But the dependency:go-offline step itself can be slow and only provides partial cache priming. Either replace it with a step that primes the cache more reliably, or simply drop the dependency:go-offline step and accept that the first cold build is slower; subsequent builds use the cache anyway.
start_period: 60s should be reduced now that compilation is gone
The old 60s was sized for a startup that included 90+ seconds of compilation. With a pre-built JAR, Spring Boot + Flyway starts in ~15 seconds. start_period: 30s is sufficient and means failed starts are caught 30 seconds sooner.
What is done well
- BuildKit cache mount for ~/.m2 is correct: avoids the permission and lifecycle issues of the old named volume.
- /import:/import volume retained: runtime data correctly kept as a bind mount
- maven_cache named volume cleanly removed from both the service definition and the global volumes block

- Pin to eclipse-temurin:21.0.10_7-{jdk,jre}-noble for reproducible builds
- Switch -DskipTests to -Dmaven.test.skip=true: skips test compilation entirely, not just execution; faster and avoids build failures from test-only missing classes
- Add comment on COPY *.jar explaining why the glob is safe (Spring Boot renames the pre-repackage artifact to .jar.original, leaving only one .jar in target/)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review concerns addressed
All actionable concerns from the six-persona review have been resolved.
Blocker fixed: missing .dockerignore (Tobias)
Commit: eee2d4f, devops(backend): add .dockerignore to exclude target/ from build context
Added backend/.dockerignore excluding target/, .git/, *.md, api_tests/. Build context transfer dropped from ~1MB (warm cache) to 97KB, and cold builds no longer send 111MB of compiled output to the daemon.
Fixed: -DskipTests → -Dmaven.test.skip=true (Felix)
Commit: 3865a9c, devops(backend): pin eclipse-temurin tags, skip test compilation, document jar glob
-Dmaven.test.skip=true skips test compilation entirely, not just execution. Faster build and immune to test-only compile errors (e.g. a missing class referenced only in test code).
Fixed: *.jar glob documented (Felix)
Commit: 3865a9c (same commit)
Added inline comment on the COPY --from=builder line explaining why the glob is safe (Spring Boot Maven Plugin renames the pre-repackage artifact to .jar.original).
Fixed: image tags pinned to 21.0.10_7 (Tobias + Nora)
Commit: 3865a9c (same commit)
Both base images pinned to eclipse-temurin:21.0.10_7-{jdk,jre}-noble. Builds are now much more reproducible: the tags no longer move when Adoptium publishes a new patch release.
Fixed: start_period: 60s → 30s (Tobias + Sara)
Commit: bbafbe6, devops(backend): reduce healthcheck start_period to 30s
JAR starts in ~15 seconds. The 60s was sized for runtime compilation. 30s gives 2x headroom with faster detection of failed starts.
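Taken together, the Dockerfile-side fixes listed above (pinned tags, skipped test compilation, documented glob) might look roughly like this. It is reconstructed from the commit descriptions rather than copied from the repository, so the stage layout, working directories, -B flag, and ENTRYPOINT are assumptions.

```dockerfile
# syntax=docker/dockerfile:1

FROM eclipse-temurin:21.0.10_7-jdk-noble AS builder
WORKDIR /build
COPY .mvn/ .mvn/
COPY mvnw pom.xml ./
RUN --mount=type=cache,target=/root/.m2 \
    ./mvnw -B dependency:go-offline
COPY src/ src/
# maven.test.skip=true skips test compilation as well as test execution.
RUN --mount=type=cache,target=/root/.m2 \
    ./mvnw -B clean package -Dmaven.test.skip=true

FROM eclipse-temurin:21.0.10_7-jre-noble
WORKDIR /app
# The glob matches exactly one file: the Spring Boot Maven Plugin renames the
# pre-repackage artifact to *.jar.original, leaving a single .jar in target/.
COPY --from=builder /build/target/*.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```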
Deferred
dependency:go-offline approximation: noted by Tobias, works correctly in practice, deferred