fix(ci): replace iproute2 ip with /proc/net/route for gateway detection #544

Merged
marcel merged 1 commit from fix/nightly-caddy-reload into main 2026-05-12 09:57:03 +02:00

Summary

  • The nightly smoke test was failing with ip: command not found (exit 127) because iproute2 is not installed in the Gitea runner container
  • Replaces ip route show default | awk '/default/ {print $3}' with a direct read of /proc/net/route — a kernel virtual file always present on Linux, requiring no installed package
  • awk decodes the little-endian 32-bit hex gateway field to dotted-decimal; behaviour is identical to the previous ip route approach

Test Plan

  • Trigger nightly workflow manually via workflow_dispatch and verify the smoke test step reaches the curl checks without exit 127
  • Confirm HOST_IP is logged as the expected Docker bridge gateway (e.g. 172.17.0.1)
marcel added 1 commit 2026-05-12 09:53:15 +02:00
fix(ci): replace iproute2 ip with /proc/net/route for gateway detection
Some checks failed
CI / Unit & Component Tests (push) Has been cancelled
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (pull_request) Has been cancelled
CI / OCR Service Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
CI / fail2ban Regex (pull_request) Has been cancelled
CI / Compose Bucket Idempotency (pull_request) Has been cancelled
9c26c00eee
`ip route` (iproute2) is not installed in the Gitea runner container,
causing the smoke test step to exit 127. /proc/net/route is a kernel
virtual file that is always present on Linux; awk decodes the
little-endian hex gateway field to dotted-decimal without any external
binary dependency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

🔧 Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer

Verdict: Approved

This is my turf, so I looked at it carefully.

What's correct

  • Root cause is real. iproute2 is not installed in the Gitea runner container image (the job steps run in a DooD container, not directly on the Ubuntu host). ip is a userspace binary from that package — it was never guaranteed to be present.
  • /proc/net/route is the right replacement. It's a kernel virtual file, always present on Linux, no package required. The legacy net-tools route command reads this same file; ip route obtains the same routing table from the kernel over netlink.
  • Hex decoding is mathematically correct. The gateway field in /proc/net/route is a little-endian 32-bit hex value (e.g. 0101A8C0 = 192.168.1.1). Reading the two-character hex pairs at positions 7–8, 5–6, 3–4, 1–2 and converting each pair with strtonum("0x...") produces the right dotted-decimal result.
  • The guard is preserved. [ -n "$HOST_IP" ] || { echo ... ; exit 1; } still catches the no-default-route case cleanly.
  • exit on first match handles the (unlikely but possible) case of multiple default routes.
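
The byte order described above can be checked by hand. The following is a minimal Python sketch of the same decoding, not the code in the workflow; the function name decode_gateway is made up for illustration, and the second sample value is simply 172.17.0.1 encoded the same way.

```python
def decode_gateway(hex_field: str) -> str:
    """Decode an 8-character little-endian hex field from /proc/net/route."""
    if len(hex_field) != 8:
        raise ValueError(f"expected 8 hex characters, got {hex_field!r}")
    # Take the character pairs at 1-based positions 7-8, 5-6, 3-4, 1-2,
    # i.e. walk the string from its last byte to its first.
    octets = [int(hex_field[i:i + 2], 16) for i in (6, 4, 2, 0)]
    return ".".join(str(octet) for octet in octets)


print(decode_gateway("0101A8C0"))  # 192.168.1.1
print(decode_gateway("010011AC"))  # 172.17.0.1
```

int(pair, 16) plays the role of gawk's strtonum("0x" pair) here, except that it raises on invalid hex instead of returning 0.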

One observation — not a blocker

strtonum is a gawk extension. An awk without it, such as mawk (the default awk in many minimal Debian and Ubuntu images) or BusyBox awk, treats the call as an undefined function and typically aborts with an error, so HOST_IP ends up empty and the guard fires. In the Gitea runner container awk resolves to gawk, so this works in practice. Worth knowing if the runner image ever changes.

A fully portable fallback if needed:

HOST_IP=$(python3 -c "
import struct, socket
for line in open('/proc/net/route'):
    f = line.split()
    if len(f) > 2 and f[1] == '00000000':
        print(socket.inet_ntoa(struct.pack('<I', int(f[2], 16))))
        break
")

But this is overkill for a gawk-equipped ubuntu runner. The awk form is fine.

What's good

The comment block on lines 169–172 is exactly the right level of detail — it explains the data format (little-endian 32-bit hex) and the field selection ($2=="00000000" = default route), so no future engineer needs to reverse-engineer this.

LGTM. Ship it.

👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer

Verdict: Approved

Pure CI YAML — no backend, no frontend, no test pyramid concerns. I'll focus on code quality of the shell snippet.

What's good

  • The comment explains the why, not the what. A remark like "little-endian 32-bit hex value which awk decodes to dotted-decimal" is the kind that saves someone 20 minutes with a hex dump. That's the correct comment-writing discipline.
  • The guard [ -n "$HOST_IP" ] || { ... exit 1 } is a proper early-exit guard clause. Consistent with how we handle missing values elsewhere.
  • Error message updated to match the new implementation (/proc/net/route instead of 'ip route'). A small thing, but patches get it wrong surprisingly often.

One observation — not a blocker

The one-liner on line 177 is dense:

HOST_IP=$(awk 'NR>1 && $2=="00000000"{h=$3;printf "%d.%d.%d.%d\n",strtonum("0x"substr(h,7,2)),strtonum("0x"substr(h,5,2)),strtonum("0x"substr(h,3,2)),strtonum("0x"substr(h,1,2));exit}' /proc/net/route)

This is technically correct and the comment above explains what it does, so a reader has the context they need. The alternative would be a multi-line awk block, which would be easier to scan at a glance but adds visual noise in YAML. Given the comment coverage, the one-liner is defensible.

No changes required from me.
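
For anyone who wants to check the one-liner's logic without a hex dump, here is a clause-by-clause Python translation, offered as a reading aid only. The function name default_gateway and the sample table are invented for illustration; a real /proc/net/route has the same column layout but different values.

```python
from typing import Optional

# Made-up sample in the /proc/net/route column layout (tab-separated).
SAMPLE_ROUTE_TABLE = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t010011AC\t0003\t0\t0\t0\t00000000\t0\t0\t0\n"
    "eth0\t000011AC\t00000000\t0001\t0\t0\t0\t0000FFFF\t0\t0\t0\n"
)


def default_gateway(route_table: str) -> Optional[str]:
    """Return the first default-route gateway as dotted-decimal, or None."""
    for line in route_table.splitlines()[1:]:        # NR>1: skip the header row
        fields = line.split()
        if len(fields) < 3 or fields[1] != "00000000":
            continue                                  # $2=="00000000": default route only
        h = fields[2]                                 # h=$3: little-endian hex gateway
        octets = [int(h[i:i + 2], 16) for i in (6, 4, 2, 0)]  # substr(h,7,2) ... substr(h,1,2)
        return ".".join(str(octet) for octet in octets)       # printf, then exit on first match
    return None                                       # no default route: empty output in awk


print(default_gateway(SAMPLE_ROUTE_TABLE))  # 172.17.0.1
```

Returning None corresponds to the awk command printing nothing, which is the case the [ -n "$HOST_IP" ] guard catches.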

🏛️ Markus Keller (@mkeller) — Application Architect

Verdict: Approved

This is a tactical CI fix — no architectural implications. My standard documentation checklist:

| PR trigger | Required doc update | Needed here? |
|---|---|---|
| New Docker service or infrastructure component | docs/architecture/c4/l2-containers.puml + docs/DEPLOYMENT.md | No new service |
| New external system | docs/architecture/c4/l1-context.puml | No new integration |
| Architectural decision with lasting consequences | New ADR in docs/adr/ | See below |

ADR consideration

The existing ADR-012 documents the nsenter pattern for host service management in CI. The change from ip route to /proc/net/route is an implementation detail within that pattern, not a new architectural decision. No new ADR is needed. If ADR-012 contains a code snippet referencing ip route, updating it would be appropriate — but it's not a blocker.

What's right

The comment block correctly captures why this approach was chosen (no iproute2 package dependency) and how the data is structured (little-endian hex). This is the information that belongs in an ADR, and it's in the right place (the code that uses it). The principle of "boring technology" applies here — /proc/net/route is as boring and stable as Linux gets.

No blockers.

🔒 Nora "NullX" Steiner — Application Security Engineer

Verdict: Approved

CI workflow only. I reviewed through an adversarial lens anyway.

Threat surface analysis

/proc/net/route as a data source:
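This is a Linux kernel virtual file, read-only from userspace, generated by the kernel from its routing table. Its contents change only when the routing table itself changes, which requires network-admin privileges (CAP_NET_ADMIN), so an unprivileged process cannot manipulate it. No injection risk from the data source itself.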

HOST_IP used in --resolve $HOST:443:$HOST_IP:
The value is passed as a curl --resolve flag argument. curl parses this as host:port:ip — it does not invoke a shell. Even if the awk expression produced unexpected output (e.g. due to a corrupt /proc/net/route, which is structurally impossible), the worst outcome is a curl failure, caught by set -e. There is no shell injection path here.

Context:
This step runs on a self-hosted runner. There is no external input to /proc/net/route. The only "attacker" who could influence this would already have host-level access, at which point the CI pipeline is the least of your problems.

No concerns

The guard [ -n "$HOST_IP" ] provides a fail-closed check. The awk command has no user-controlled input. The resulting value is consumed by curl in a non-shell-expanded context.
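
To illustrate the "one argument, no shell" point, here is a small Python sketch of how such a curl call looks as an argument vector. It is not the workflow's code: the function name build_smoke_test_argv and the host staging.example.test are hypothetical, and only curl's documented --resolve host:port:address form is assumed.

```python
from typing import List


def build_smoke_test_argv(host: str, host_ip: str, url: str) -> List[str]:
    """Build the curl argument vector; nothing here is parsed by a shell."""
    # --resolve takes a single host:port:address value as its argument.
    return ["curl", "--resolve", f"{host}:443:{host_ip}", url]


argv = build_smoke_test_argv(
    "staging.example.test",          # hypothetical host name
    "172.17.0.1",
    "https://staging.example.test/",
)
print(argv)

# Even a hostile-looking value stays one argv element; curl would simply
# reject it as a malformed --resolve entry.
hostile = build_smoke_test_argv("staging.example.test", "1.2.3.4; id", "https://staging.example.test/")
print(len(hostile))  # 4
```

Passing such a list to subprocess.run without shell=True would hand each element to curl unchanged.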

Clean from a security standpoint.

🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist

Verdict: Approved

CI infrastructure change — no unit test pyramid involvement. I'm reviewing from a verification and reliability perspective.

Verification coverage

The test plan lists:

  • Trigger via workflow_dispatch and confirm the smoke test step reaches the curl checks without exit 127
  • Confirm HOST_IP is logged as the expected bridge gateway

This is the correct verification strategy for a CI step — the only meaningful test is running it. The step itself echoes Smoke test: $URL (pinned to $HOST_IP via bridge gateway) which gives a direct observable signal in the logs.

Edge cases I checked

| Scenario | Handled? |
|---|---|
| /proc/net/route missing (impossible on Linux, but) | awk would fail, HOST_IP would be empty, guard fires |
| No default route row in table | awk produces no output, guard fires with clear error message |
| Multiple default routes (ECMP) | exit on first match, so the result is deterministic |
| strtonum produces 0 for invalid hex | Could yield 0.0.0.0, non-empty, guard passes but curl fails; the smoke test itself is the backstop |

The last scenario is theoretical only (the kernel writes valid hex to /proc/net/route), but if extra paranoia is wanted, a regex guard on the IP format could be added:

[[ "$HOST_IP" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]] || { echo "ERROR: invalid IP format: $HOST_IP"; exit 1; }

Not a blocker — I mention it as a potential improvement for a follow-up if this step continues to be fragile.
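
One caveat on the regex: it checks shape only, so it would still accept 0.0.0.0 (the scenario from the table) and out-of-range values such as 999.999.999.999. A stricter check could look like the sketch below, assuming Python is available on the runner; the function name is_usable_gateway is made up for illustration.

```python
import ipaddress


def is_usable_gateway(value: str) -> bool:
    """True only for a syntactically valid IPv4 address other than 0.0.0.0."""
    try:
        address = ipaddress.IPv4Address(value)
    except ValueError:  # AddressValueError is a ValueError subclass
        return False
    return not address.is_unspecified


for candidate in ("172.17.0.1", "0.0.0.0", "999.999.999.999", ""):
    print(repr(candidate), is_usable_gateway(candidate))
```

Only the first candidate passes; the other three are rejected.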

LGTM.

🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead

Verdict: Approved

This PR modifies a CI workflow file only — no frontend components, no Svelte templates, no Tailwind classes, no accessibility surface, no brand tokens. Completely outside my review scope.

LGTM from a UI/UX perspective. Nothing to check here.

📋 Elicit — Requirements Engineer

Verdict: Approved

This is a corrective infrastructure fix, not a feature. No requirements implications.

The PR description maps cleanly to the failure: ip: command not found → iproute2 not installed → replace with /proc/net/route. The "why" is stated, the fix is described, and the test plan is actionable and verifiable (workflow_dispatch trigger with specific log signals to check).

From a requirements standpoint, the nightly pipeline's stated responsibility — "deploy staging and smoke-test the public surface" — remains unchanged. The gateway detection is an implementation detail, not a functional requirement. This fix restores a broken prerequisite without changing scope.

No open questions, no scope concerns.

marcel merged commit 51e2d50dd0 into main 2026-05-12 09:57:03 +02:00
marcel deleted branch fix/nightly-caddy-reload 2026-05-12 09:57:03 +02:00
Reference: marcel/familienarchiv#544