fix(ci): use Bash array for curl --resolve to fix smoke tests #616

Merged
marcel merged 1 commits from fix/ci-smoke-curl-resolve-quoting into main 2026-05-18 12:08:14 +02:00
Owner

Summary

  • Root cause: RESOLVE was a plain string and expanded as "$RESOLVE" (double-quoted), so curl received --resolve host:443:ip as one argument instead of two — the embedded space made the entire string an unknown option name (exit 2).
  • Fix: RESOLVE is now a Bash array (RESOLVE=(--resolve "$HOST:443:$HOST_IP")); "${RESOLVE[@]}" always expands to exactly two separate shell words.
  • Bonus (release.yml): Gateway detection switched from ip route (requires iproute2) to /proc/net/route — consistent with nightly.yml, no extra package needed in the job container.

Test Plan

  • Trigger nightly workflow manually (workflow_dispatch) and verify smoke test step passes
  • Verify release workflow smoke test step on the next v* tag push
## Summary - **Root cause:** `RESOLVE` was a plain string and expanded as `"$RESOLVE"` (double-quoted), so curl received `--resolve host:443:ip` as **one argument** instead of two — the embedded space made the entire string an unknown option name (exit 2). - **Fix:** `RESOLVE` is now a Bash array (`RESOLVE=(--resolve "$HOST:443:$HOST_IP")`); `"${RESOLVE[@]}"` always expands to exactly two separate shell words. - **Bonus (release.yml):** Gateway detection switched from `ip route` (requires iproute2) to `/proc/net/route` — consistent with nightly.yml, no extra package needed in the job container. ## Test Plan - [ ] Trigger nightly workflow manually (`workflow_dispatch`) and verify smoke test step passes - [ ] Verify release workflow smoke test step on the next `v*` tag push
marcel added 1 commit 2026-05-18 12:02:57 +02:00
fix(ci): use Bash array for curl --resolve to fix smoke tests
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m6s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m8s
CI / fail2ban Regex (pull_request) Successful in 40s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
CI / Unit & Component Tests (push) Successful in 3m3s
CI / OCR Service Tests (push) Successful in 19s
CI / Backend Unit Tests (push) Successful in 3m5s
CI / fail2ban Regex (push) Successful in 42s
CI / Semgrep Security Scan (push) Successful in 19s
CI / Compose Bucket Idempotency (push) Successful in 1m0s
nightly / deploy-staging (push) Successful in 2m8s
49c5324352
Quoting RESOLVE as a string and expanding with "$RESOLVE" passes the
flag and its value as a single token to curl; curl rejects the whole
string as an unknown option (exit 2). Switching to a Bash array and
"${RESOLVE[@]}" ensures the two words are always passed as separate
arguments regardless of quoting context.

Also aligns release.yml gateway detection with nightly.yml: replaces
`ip route` (requires iproute2) with /proc/net/route (always available
in the job container, no extra package needed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Verdict: Approved

This is a clean, targeted fix with no application code involved. Reviewing from a correctness and readability standpoint.

What's correct

  • Bash array semantics are right. RESOLVE=(--resolve "$HOST:443:$HOST_IP") followed by "${RESOLVE[@]}" is the canonical way to pass a multi-word option as a single logical unit in Bash. The old quoted-string approach ("$RESOLVE") caused curl to receive --resolve host:443:ip as one token — the embedded space made curl treat it as an unknown option name (exit 2). This is a real, non-obvious Bash pitfall that's easy to miss in review.

  • Comments explain the WHY, not the WHAT. The updated comment block in release.yml:

    # --resolve stored as a Bash array so "${RESOLVE[@]}" expands to two
    # separate arguments; a quoted string would pass the flag and its value
    # as one token and curl would reject it as an unknown option.
    

    This is exactly the right level of comment — a non-obvious constraint that a future developer could accidentally revert. Well done.

  • set -e is in place — failures propagate correctly. No silent error swallowing.

One minor note

The awk hex-decode for /proc/net/route is dense:

awk 'NR>1 && $2=="00000000"{h=$3;printf "%d.%d.%d.%d\n",strtonum("0x"substr(h,7,2)),...}'

It works, it's consistent with nightly.yml, and it avoids the iproute2 dependency — so no change needed. But if this ever needs debugging, the logic parses little-endian hex from column 3. Worth keeping the existing nightly.yml comment nearby as the canonical reference.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer **Verdict: ✅ Approved** This is a clean, targeted fix with no application code involved. Reviewing from a correctness and readability standpoint. ### What's correct - **Bash array semantics are right.** `RESOLVE=(--resolve "$HOST:443:$HOST_IP")` followed by `"${RESOLVE[@]}"` is the canonical way to pass a multi-word option as a single logical unit in Bash. The old quoted-string approach (`"$RESOLVE"`) caused curl to receive `--resolve host:443:ip` as one token — the embedded space made curl treat it as an unknown option name (exit 2). This is a real, non-obvious Bash pitfall that's easy to miss in review. - **Comments explain the WHY, not the WHAT.** The updated comment block in `release.yml`: ``` # --resolve stored as a Bash array so "${RESOLVE[@]}" expands to two # separate arguments; a quoted string would pass the flag and its value # as one token and curl would reject it as an unknown option. ``` This is exactly the right level of comment — a non-obvious constraint that a future developer could accidentally revert. Well done. - **`set -e` is in place** — failures propagate correctly. No silent error swallowing. ### One minor note The `awk` hex-decode for `/proc/net/route` is dense: ```bash awk 'NR>1 && $2=="00000000"{h=$3;printf "%d.%d.%d.%d\n",strtonum("0x"substr(h,7,2)),...}' ``` It works, it's consistent with `nightly.yml`, and it avoids the `iproute2` dependency — so no change needed. But if this ever needs debugging, the logic parses little-endian hex from column 3. Worth keeping the existing `nightly.yml` comment nearby as the canonical reference.
Author
Owner

🏛️ Markus Keller — Senior Application Architect

Verdict: Approved

This PR touches only CI workflow files. No architectural changes, no layer boundaries affected, no new infrastructure components introduced.

Doc update check

Per the architecture doc-update matrix:

Trigger Required update
New Flyway migration Not applicable
New backend package/route Not applicable
New Docker service Not applicable
Architectural decision with lasting consequences Not applicable

No documentation updates are required for this change. The fix is squarely in the CI platform layer.

Consistency improvement worth noting

The release.yml previously used ip route show default | awk '/default/ {print $3}' to detect the Docker bridge gateway IP, while nightly.yml already used the /proc/net/route parsing approach. This PR aligns them — one pattern, two files. That's the right call: divergent approaches to the same problem in parallel workflows are a future maintenance trap.

No concerns.

The infrastructure topology hasn't changed. The smoke test strategy (HSTS pin, Permissions-Policy pin, actuator block verification) remains intact and correct.

## 🏛️ Markus Keller — Senior Application Architect **Verdict: ✅ Approved** This PR touches only CI workflow files. No architectural changes, no layer boundaries affected, no new infrastructure components introduced. ### Doc update check Per the architecture doc-update matrix: | Trigger | Required update | |---|---| | New Flyway migration | Not applicable | | New backend package/route | Not applicable | | New Docker service | Not applicable | | Architectural decision with lasting consequences | Not applicable | No documentation updates are required for this change. The fix is squarely in the CI platform layer. ### Consistency improvement worth noting The `release.yml` previously used `ip route show default | awk '/default/ {print $3}'` to detect the Docker bridge gateway IP, while `nightly.yml` already used the `/proc/net/route` parsing approach. This PR aligns them — one pattern, two files. That's the right call: divergent approaches to the same problem in parallel workflows are a future maintenance trap. ### No concerns. The infrastructure topology hasn't changed. The smoke test strategy (HSTS pin, Permissions-Policy pin, actuator block verification) remains intact and correct.
Author
Owner

🔧 Tobias Wendt — DevOps & Platform Engineer

Verdict: Approved

This is a solid, minimal CI fix. Reviewing the infrastructure and pipeline changes.

What's correct

1. Bash array for --resolve — correct fix for the root cause.

The bug: curl "$RESOLVE" ... where RESOLVE="--resolve host:443:ip" passes the entire string as a single argument. curl sees --resolve host:443:ip (with a space inside) as an unknown option name and exits 2. The fix — RESOLVE=(--resolve "$HOST:443:$HOST_IP") with "${RESOLVE[@]}" — ensures curl receives exactly two words: the flag and its value. This is the correct Bash idiom.

2. /proc/net/route over ip route show default — better choice.

ip route show default requires the iproute2 package to be present in the runner container. /proc/net/route is a kernel virtual file — always present on Linux, no package dependency. The Gitea runner image may or may not have iproute2; this was likely the actual root cause of failures in the release job. Good catch.

3. nightly.yml and release.yml are now consistent.

Both use the same gateway detection logic and the same curl invocation pattern. Divergent approaches to identical problems in parallel jobs are a maintenance hazard. This PR closes that gap.

4. Error guard is in place.

[ -n "$HOST_IP" ] || { echo "ERROR: could not detect Docker bridge gateway via /proc/net/route"; exit 1; }

Explicit failure with a clear error message before any curl call. Clean.

Suggestions (non-blocking)

  • The awk hex-decode one-liner is opaque to anyone who hasn't worked with /proc/net/route's little-endian hex format before. The existing comment in nightly.yml explains it well; release.yml now references nightly.yml for the full explanation — that cross-reference is acceptable.
  • Test plan requires a manual workflow_dispatch and a v* tag push. That's the right approach for CI infrastructure — there's no cheaper way to verify it. Worth noting in the test plan that the nightly dispatch should be tested before merging if the release tag isn't imminent.
## 🔧 Tobias Wendt — DevOps & Platform Engineer **Verdict: ✅ Approved** This is a solid, minimal CI fix. Reviewing the infrastructure and pipeline changes. ### What's correct **1. Bash array for `--resolve` — correct fix for the root cause.** The bug: `curl "$RESOLVE" ...` where `RESOLVE="--resolve host:443:ip"` passes the entire string as a single argument. curl sees `--resolve host:443:ip` (with a space inside) as an unknown option name and exits 2. The fix — `RESOLVE=(--resolve "$HOST:443:$HOST_IP")` with `"${RESOLVE[@]}"` — ensures curl receives exactly two words: the flag and its value. This is the correct Bash idiom. **2. `/proc/net/route` over `ip route show default` — better choice.** `ip route show default` requires the `iproute2` package to be present in the runner container. `/proc/net/route` is a kernel virtual file — always present on Linux, no package dependency. The Gitea runner image may or may not have `iproute2`; this was likely the actual root cause of failures in the release job. Good catch. **3. `nightly.yml` and `release.yml` are now consistent.** Both use the same gateway detection logic and the same curl invocation pattern. Divergent approaches to identical problems in parallel jobs are a maintenance hazard. This PR closes that gap. **4. Error guard is in place.** ```bash [ -n "$HOST_IP" ] || { echo "ERROR: could not detect Docker bridge gateway via /proc/net/route"; exit 1; } ``` Explicit failure with a clear error message before any curl call. Clean. ### Suggestions (non-blocking) - The `awk` hex-decode one-liner is opaque to anyone who hasn't worked with `/proc/net/route`'s little-endian hex format before. The existing comment in `nightly.yml` explains it well; `release.yml` now references `nightly.yml` for the full explanation — that cross-reference is acceptable. - Test plan requires a manual `workflow_dispatch` and a `v*` tag push. That's the right approach for CI infrastructure — there's no cheaper way to verify it. Worth noting in the test plan that the nightly dispatch should be tested before merging if the release tag isn't imminent.
Author
Owner

🔐 Nora "NullX" Steiner — Application Security Engineer

Verdict: Approved

This PR modifies only CI smoke test scripts. Security surface is limited, and the existing controls are intact.

What I checked

1. No user-controlled input in shell variables.

HOST is hardcoded in both scripts (HOST="archiv.raddatz.cloud" / HOST="$FRONTEND_HOST"). HOST_IP is derived from /proc/net/route — a kernel file, no network input, no user-controlled data. No injection vector exists here.

2. Secrets are not introduced or referenced in a new way.

No credentials are added. The workflow files do not introduce new ${{ secrets.* }} references or hardcode any values that should be secret.

3. The smoke tests verify security-critical headers.

The suite actively checks:

  • HSTS with a pinned value (max-age=31536000; includeSubDomains; preload) — not just header presence, but the exact HSTS policy required for preload list eligibility. A degraded max-age=1 now fails this check.
  • Permissions-Policy with an explicit deny list (camera=(), microphone=(), geolocation=()) — a loosened or dropped header fails the check.
  • Actuator blocked/actuator/health returning 404 confirms the Caddy reverse proxy is blocking it. This is the right verification: if someone accidentally routes /actuator/* to the backend, this test catches it.

These are meaningful security regression gates. The fix ensures they actually run correctly.

4. /proc/net/route vs ip route show default.

No security difference between these two approaches. Both read system state to discover the Docker bridge IP. No external input, no SSRF vector.

No findings.

## 🔐 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** This PR modifies only CI smoke test scripts. Security surface is limited, and the existing controls are intact. ### What I checked **1. No user-controlled input in shell variables.** `HOST` is hardcoded in both scripts (`HOST="archiv.raddatz.cloud"` / `HOST="$FRONTEND_HOST"`). `HOST_IP` is derived from `/proc/net/route` — a kernel file, no network input, no user-controlled data. No injection vector exists here. **2. Secrets are not introduced or referenced in a new way.** No credentials are added. The workflow files do not introduce new `${{ secrets.* }}` references or hardcode any values that should be secret. **3. The smoke tests verify security-critical headers.** The suite actively checks: - **HSTS** with a pinned value (`max-age=31536000; includeSubDomains; preload`) — not just header presence, but the exact HSTS policy required for preload list eligibility. A degraded `max-age=1` now fails this check. - **Permissions-Policy** with an explicit deny list (`camera=(), microphone=(), geolocation=()`) — a loosened or dropped header fails the check. - **Actuator blocked** — `/actuator/health` returning 404 confirms the Caddy reverse proxy is blocking it. This is the right verification: if someone accidentally routes `/actuator/*` to the backend, this test catches it. These are meaningful security regression gates. The fix ensures they actually run correctly. **4. `/proc/net/route` vs `ip route show default`.** No security difference between these two approaches. Both read system state to discover the Docker bridge IP. No external input, no SSRF vector. ### No findings.
Author
Owner

🧪 Sara Holt — Senior QA Engineer

Verdict: Approved

This is a CI infrastructure fix, not application code — no test pyramid changes are required or expected. Reviewing the quality of the smoke test suite and the verification approach.

Smoke test coverage assessment

The smoke tests check 4 distinct behaviors:

  1. Login page reachablecurl -fsS ... "$URL/login" -o /dev/null (basic connectivity + TLS)
  2. HSTS header with exact policy — regex match for max-age=31536000.*includeSubDomains.*preload
  3. Permissions-Policy header with exact deny list — regex match for camera=\(\),.*microphone=\(\),.*geolocation=\(\)
  4. Actuator blocked at proxy — HTTP 404 on /actuator/health

Each check fails loudly on its own failure mode. This is a reasonable smoke suite for a deployment gate.

Fix verification

The test plan calls for:

  • Manual workflow_dispatch trigger for nightly
  • A v* tag push for release

That's the correct approach. Shell script correctness in a Gitea Actions context can't be unit-tested in isolation — the only meaningful test is running it in the actual runner environment. The plan is honest about this.

One observation (non-blocking)

The status=$(curl -s "${RESOLVE[@]}" ...) subshell captures the exit code from curl but the http_code from -w "%{http_code}". If curl itself fails (network error, TLS error), $status will be empty rather than an HTTP code, and the check [ "$status" = "404" ] will fail with a misleading message. This is pre-existing behavior, not introduced by this PR, and the failure mode is loud (exit 1) so it doesn't silently pass. No action needed.

## 🧪 Sara Holt — Senior QA Engineer **Verdict: ✅ Approved** This is a CI infrastructure fix, not application code — no test pyramid changes are required or expected. Reviewing the quality of the smoke test suite and the verification approach. ### Smoke test coverage assessment The smoke tests check 4 distinct behaviors: 1. **Login page reachable** — `curl -fsS ... "$URL/login" -o /dev/null` (basic connectivity + TLS) 2. **HSTS header with exact policy** — regex match for `max-age=31536000.*includeSubDomains.*preload` 3. **Permissions-Policy header with exact deny list** — regex match for `camera=\(\),.*microphone=\(\),.*geolocation=\(\)` 4. **Actuator blocked at proxy** — HTTP 404 on `/actuator/health` Each check fails loudly on its own failure mode. This is a reasonable smoke suite for a deployment gate. ### Fix verification The test plan calls for: - Manual `workflow_dispatch` trigger for nightly - A `v*` tag push for release That's the correct approach. Shell script correctness in a Gitea Actions context can't be unit-tested in isolation — the only meaningful test is running it in the actual runner environment. The plan is honest about this. ### One observation (non-blocking) The `status=$(curl -s "${RESOLVE[@]}" ...)` subshell captures the exit code from curl but the `http_code` from `-w "%{http_code}"`. If curl itself fails (network error, TLS error), `$status` will be empty rather than an HTTP code, and the check `[ "$status" = "404" ]` will fail with a misleading message. This is pre-existing behavior, not introduced by this PR, and the failure mode is loud (exit 1) so it doesn't silently pass. No action needed.
Author
Owner

📋 Elicit — Requirements Engineer

Verdict: Approved

This is a targeted bug fix for a CI infrastructure regression. Reviewing from a requirements completeness standpoint.

Requirements traceability

The PR description clearly documents the full requirement chain:

  • Root cause identified: "$RESOLVE" passes one shell token where curl expects two
  • Fix described in implementation terms: Bash array with "${RESOLVE[@]}"
  • Scope contained: only CI workflows, no user-facing behavior change
  • Bonus fix documented: release.yml gateway detection aligned with nightly.yml

The test plan correctly focuses on manual trigger verification — the only meaningful acceptance test for this class of fix.

No requirements gaps.

This PR doesn't introduce new user-facing behavior, modify any domain concept, or change system NFRs. No acceptance criteria artifacts are missing.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** This is a targeted bug fix for a CI infrastructure regression. Reviewing from a requirements completeness standpoint. ### Requirements traceability The PR description clearly documents the full requirement chain: - **Root cause** identified: `"$RESOLVE"` passes one shell token where curl expects two - **Fix described** in implementation terms: Bash array with `"${RESOLVE[@]}"` - **Scope contained**: only CI workflows, no user-facing behavior change - **Bonus fix documented**: `release.yml` gateway detection aligned with `nightly.yml` The test plan correctly focuses on manual trigger verification — the only meaningful acceptance test for this class of fix. ### No requirements gaps. This PR doesn't introduce new user-facing behavior, modify any domain concept, or change system NFRs. No acceptance criteria artifacts are missing.
Author
Owner

🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist

Verdict: Approved

This PR modifies only CI workflow shell scripts — no user interface, no Svelte components, no HTML, no CSS, no design tokens. Zero UI/UX impact.

Nothing to review from my domain. LGTM.

## 🎨 Leonie Voss — UI/UX Designer & Accessibility Strategist **Verdict: ✅ Approved** This PR modifies only CI workflow shell scripts — no user interface, no Svelte components, no HTML, no CSS, no design tokens. Zero UI/UX impact. Nothing to review from my domain. LGTM.
marcel merged commit 49c5324352 into main 2026-05-18 12:08:14 +02:00
marcel deleted branch fix/ci-smoke-curl-resolve-quoting 2026-05-18 12:08:14 +02:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#616