fix(ci): resolve smoke test host via bridge gateway, not 127.0.0.1 #540
Reference in New Issue
Block a user
Delete Branch "fix/nightly-caddy-reload"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
staging.raddatz.cloud/archiv.raddatz.cloudto127.0.0.1, but job containers run in bridge network mode —127.0.0.1inside the container is the container's own loopback, not the host's. Caddy on the host was unreachable, causing an immediateECONNREFUSED("after 0 ms").ip route show default) and using that as the--resolvetarget. Caddy binds0.0.0.0:443so it is reachable from the container via the bridge gateway.nightly.ymlandrelease.yml. Updated comments which incorrectly described127.0.0.1as "the runner's loopback".Test plan
workflow_dispatch) and confirm the Smoke test step passespinned to 172.x.x.x via bridge gateway(not127.0.0.1)🤖 Generated with Claude Code
e6893ae67eto3056311c24🏛️ Markus Keller (@mkeller) — Application Architect
Verdict: ✅ Approved
This is a targeted CI infrastructure fix — no backend packages, routes, DB migrations, domain entities, or external systems changed. My doc-update checklist has nothing to trigger here.
What I checked
@ManyToMany, Flyway migration, or DB schema change → DB diagrams unchanged: correct.docs/DEPLOYMENT.mdandl2-containers.pumlunchanged: correct.docs/GLOSSARY.mdunchanged: correct.Observations
The change correctly follows the "boring technology wins" principle — detecting the bridge gateway with
ip route show default | awk '/default/ {print $3}'is a standard POSIX approach available in every Linux runner image without installing anything extra. The cross-reference comment inrelease.ymlpointing to the full explanation innightly.ymlavoids duplication while keeping the workflow self-documenting.No structural concerns. LGTM from an architecture perspective.
👨💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer
Verdict: ✅ Approved
This is a CI workflow fix, not application code, so the TDD cycle doesn't directly apply. The shell scripting is correct and readable — let me walk through what I checked.
What I checked
Shell correctness
HOST_IP=$(ip route show default | awk '/default/ {print $3}')— theawkpattern/default/matches the line containing the default route and$3correctly extracts the gateway IP (field 3 inip routeoutput on standard Linux).set -eis already active, so any command failure propagates correctly.Variable quoting
RESOLVE="--resolve $HOST:443:$HOST_IP"— double-quoted; fine. The subsequentcurl -fsS $RESOLVEpasses$RESOLVEunquoted, which works here because the value contains no spaces or glob characters, but it's a mild style inconsistency with the surrounding code.curl -fsS "$RESOLVE"would be more defensive.Comments explain WHY, not WHAT
The updated comment block in
nightly.yml(lines 162–170 in the diff) correctly explains the network topology constraint — bridge mode, loopback scope,0.0.0.0binding — rather than restating the code. Therelease.ymlcross-reference is a clean way to avoid duplication.Consistency
The same fix is applied to both
nightly.ymlandrelease.ymlwith matching output messages (pinned to $HOST_IP via bridge gateway). Good.Suggestion (not a blocker)
curl -fsS "$RESOLVE"— quote the variable for defensive consistency, even though the value is safe here. Three-character fix.🔧 Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer
Verdict: ⚠️ Approved with concerns
The root cause diagnosis is exactly right: job containers in bridge mode have their own loopback (
127.0.0.1≠ host). Usingip route show default | awk '/default/ {print $3}'to get the bridge gateway is the standard, portable approach. Caddy binds0.0.0.0:443, so the gateway IP reaches it cleanly. Good fix.Blocker
Missing guard for empty
HOST_IPIf
ip route show defaultreturns no output (e.g. the runner image has a different routing setup, oripisn't installed),HOST_IPwill be an empty string.curlwill then receive--resolve staging.raddatz.cloud:443:which it treats as a malformed resolve spec and exits with error 6 ("couldn't resolve host") — a confusing failure mode that will take time to diagnose.Add a guard immediately after the assignment:
Same guard needed in
release.yml. Withset -ealready active, the guard makes the failure message actionable rather than cryptic.Suggestions (not blockers)
$RESOLVEin the curl call.curl -fsS $RESOLVEshould becurl -fsS "$RESOLVE". It's safe as-is today (no spaces in the value), but quoting is the correct default.nightly.ymlandrelease.yml. The cross-reference comment is acceptable for now. If a third workflow needs this pattern, consider extracting it into a composite action.What's correct
set -eis already present, so failures propagate correctly.echooutput change (pinned to $HOST_IP via bridge gateway) makes the CI log self-describing — good for debugging.🔒 Nora "NullX" Steiner — Application Security Engineer
Verdict: ✅ Approved
This change touches CI workflows only — no application code, no auth flows, no API surfaces. I reviewed it through a security lens anyway.
What I checked
Command injection risk: none
HOST_IP=$(ip route show default | awk '/default/ {print $3}')— the input toawkcomes entirely from the kernel routing table viaip route. There is no user-controlled input path here. The resulting IP is used incurl's--resolveflag, which is parsed by curl and not interpreted by a shell. No injection vector.Network target of the smoke test
The fix directs curl to the Docker bridge gateway IP, which is the host machine running Caddy. This is the intended target — the smoke test is verifying the deployed service, not an arbitrary network destination. The
--resolveflag only overrides DNS resolution for the named host; SNI still usesstaging.raddatz.cloud/archiv.raddatz.cloud, so TLS certificate validation remains correct.No secrets, no credentials
No secrets in the diff. The CI workflow correctly relies on Gitea secrets for any auth-related values elsewhere in the workflow.
No expanded attack surface
The bridge gateway IP is not logged in a way that exposes internal infrastructure topology beyond what's already visible in CI logs. The echo statement (
pinned to 172.x.x.x via bridge gateway) is informational and benign.LGTM from a security perspective.
🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist
Verdict: ✅ Approved
This is a CI infrastructure fix, not application code. Traditional test pyramid concerns (unit/integration/E2E) don't apply directly, but I reviewed the testability of the fix itself.
What I checked
Test plan quality
The PR description includes a concrete test plan:
nightlyviaworkflow_dispatch✅pinned to 172.x.x.x via bridge gateway✅These are specific, observable acceptance criteria. "Confirm the smoke test step passes" is the right gate.
Failure mode visibility
The echo statement before the curl invocation (
echo "Smoke test: $URL (pinned to $HOST_IP via bridge gateway)") means that when the job runs, CI logs will clearly show which IP was resolved. This makes future failures diagnosable from logs alone — good test hygiene applied to CI.Regression risk to existing test infrastructure
None. The change only affects the
--resolvetarget in the curl commands and the surrounding comments. The HSTS header assertions and actuator block checks that follow are unchanged.Observation (not a blocker)
There is no automated guard that verifies
HOST_IPis non-empty before it's used (noted also by Tobias). If the gateway detection fails silently, the curl command will fail with a confusing error code rather than a meaningful message. A[ -n "$HOST_IP" ]guard would make the failure self-diagnosing — which is what I'd want for any step in a smoke test suite.📋 Elicit — Requirements Engineer
Verdict: ✅ Approved
This PR is a corrective infrastructure fix, not a requirements change. My role here is to verify that the problem statement is clear, the acceptance criteria are testable, and no implicit requirements are being violated.
Problem statement
Clear and specific:
127.0.0.1inside a bridge-mode container is the container's own loopback, not the host's. Caddy was unreachable, causing immediateECONNREFUSED. The diagnosis is precise and matches the observable symptom ("after 0 ms").Acceptance criteria (from PR description)
pinned to 172.x.x.x via bridge gateway✅ (observable, specific)Both criteria meet the Definition of Ready: they are concrete, observable, and verifiable by a single person without ambiguity.
Implicit requirements not violated
The fix preserves all existing smoke test assertions (HSTS header pinning, actuator block verification). The scope is tight — two comment blocks and four lines of shell code across two workflow files. No feature scope creep.
Open question (low priority)
The test plan relies on manual
workflow_dispatchverification. If this fix is foundational for release confidence, it may be worth tracking a follow-up to add an automated gate that confirms the bridge detection works in the runner environment before the smoke test runs. Not a blocker for this PR — just something to consider for observability of the CI pipeline itself.🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead
Verdict: ✅ Approved
This PR modifies CI workflow YAML files only. There are no Svelte components, no HTML, no CSS, no Tailwind classes, no design tokens, no i18n strings, and no user-facing screens affected.
No UI/UX or accessibility concerns to raise. LGTM.
If `ip route show default` returns no output the old code passed an empty string to curl --resolve, producing a confusing error 6 ("couldn't resolve host") with no indication that gateway detection had failed. The new guard exits immediately with a clear message. Addresses review concern raised by Tobias Wendt. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>Review concerns addressed
Two commits pushed to
fix/nightly-caddy-reloadfollowing the persona review:✅ Tobias's blocker — empty
HOST_IPguard (commit0602d45c)Added to both
nightly.ymlandrelease.yml, immediately after theip routeassignment:If gateway detection fails, the job now exits with a clear actionable message instead of a cryptic curl error 6.
✅ Felix + Tobias suggestion — quoted
$RESOLVE(commitf12eac45)All 8 curl invocations across both files updated:
curl -fsS $RESOLVE→curl -fsS "$RESOLVE"andcurl -s $RESOLVE→curl -s "$RESOLVE".Shell syntax validated with
bash -non both extracted script blocks before each commit.