fix(docker): skip postinstall in production image #583

Merged
marcel merged 1 commits from fix/production-npm-ci-ignore-scripts into main 2026-05-14 19:49:06 +02:00
Owner

Problem

The nightly deploy pipeline fails at [frontend production 6/6] RUN npm ci --omit=dev with exit code 127:

sh: patch-package: not found
npm error command sh -c patch-package

postinstall unconditionally runs patch-package, which is a devDependency. --omit=dev skips devDependencies, so the binary is absent.

Root Cause

patch-package patches @vitest/browser-playwright — a pure test tool. It is never needed at runtime in the production image, which only runs the pre-built SvelteKit app via node build.

Fix

Add --ignore-scripts to the production npm ci call. This is the npm-native way to suppress lifecycle scripts during a dep-only install. No changes to package.json needed — postinstall continues to work normally in dev and test environments.

Test Plan

  • Docker build succeeds for the production target: docker build --target production frontend/
  • node build starts and serves the app correctly
  • Dev install (npm ci without flags) still applies patches via postinstall
## Problem The nightly deploy pipeline fails at `[frontend production 6/6] RUN npm ci --omit=dev` with exit code 127: ``` sh: patch-package: not found npm error command sh -c patch-package ``` `postinstall` unconditionally runs `patch-package`, which is a `devDependency`. `--omit=dev` skips devDependencies, so the binary is absent. ## Root Cause `patch-package` patches `@vitest/browser-playwright` — a pure test tool. It is never needed at runtime in the production image, which only runs the pre-built SvelteKit app via `node build`. ## Fix Add `--ignore-scripts` to the production `npm ci` call. This is the npm-native way to suppress lifecycle scripts during a dep-only install. No changes to `package.json` needed — `postinstall` continues to work normally in dev and test environments. ## Test Plan - [ ] Docker build succeeds for the `production` target: `docker build --target production frontend/` - [ ] `node build` starts and serves the app correctly - [ ] Dev install (`npm ci` without flags) still applies patches via `postinstall`
marcel added 1 commit 2026-05-14 19:44:22 +02:00
fix(docker): skip postinstall in production image
Some checks failed
CI / Unit & Component Tests (pull_request) Successful in 3m9s
CI / OCR Service Tests (pull_request) Successful in 15s
CI / Backend Unit Tests (pull_request) Successful in 4m31s
CI / fail2ban Regex (pull_request) Successful in 38s
CI / Compose Bucket Idempotency (pull_request) Successful in 59s
CI / OCR Service Tests (push) Has been cancelled
CI / Backend Unit Tests (push) Has been cancelled
CI / fail2ban Regex (push) Has been cancelled
CI / Compose Bucket Idempotency (push) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
33c738db3b
The production stage runs npm ci --omit=dev to install runtime deps for
the pre-built SvelteKit app. The postinstall script calls patch-package,
which is a devDependency, so it is absent and causes exit code 127.

--ignore-scripts is the correct npm-native fix: no lifecycle scripts are
needed when installing into a pre-built image.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

🛠️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer

Verdict: Approved

This is exactly right. Let me walk through it.

What's correct

--ignore-scripts is the npm-native answer here. The production stage's only job is to install runtime dependencies so node build can execute — no lifecycle scripts belong there. The alternative (a shell guard in package.json) would have been a smell: it patches around the problem in the wrong place and is not idiomatic.

The multi-stage build structure is also sound:

  • build stage: full npm ci → applies patches via postinstall → compiles the app
  • production stage: npm ci --omit=dev --ignore-scripts → installs only what node build needs at runtime

Patches are applied exactly once, at the right stage.

One thing worth adding (suggestion)

Consider adding a brief comment in the Dockerfile explaining why --ignore-scripts is here. Future maintainers will wonder if it's safe to remove:

# --ignore-scripts: postinstall runs patch-package (a devDep), which is absent here.
# Lifecycle scripts are not needed in this stage — the app is already compiled.
RUN npm ci --omit=dev --ignore-scripts

This is a suggestion, not a blocker.

Missing CI guard (suggestion)

The production Docker target is not explicitly tested in CI — the pipeline currently only catches failures during nightly deploy runs. Consider adding a docker build --target production frontend/ step to the CI workflow so regressions like this fail at PR time, not post-merge. Again, not a blocker for this PR, but worth tracking.

What's done well

  • Node image tag is pinned (node:20.19.0-alpine3.21) — reproducible builds ✓
  • package.json stays clean — no workaround code committed ✓
  • Root cause analysis in the PR description is accurate and clear ✓
## 🛠️ Tobias Wendt (@tobiwendt) — DevOps & Platform Engineer **Verdict: ✅ Approved** This is exactly right. Let me walk through it. ### What's correct `--ignore-scripts` is the npm-native answer here. The production stage's only job is to install runtime dependencies so `node build` can execute — no lifecycle scripts belong there. The alternative (a shell guard in `package.json`) would have been a smell: it patches around the problem in the wrong place and is not idiomatic. The multi-stage build structure is also sound: - `build` stage: full `npm ci` → applies patches via `postinstall` → compiles the app - `production` stage: `npm ci --omit=dev --ignore-scripts` → installs only what `node build` needs at runtime Patches are applied exactly once, at the right stage. ### One thing worth adding (suggestion) Consider adding a brief comment in the Dockerfile explaining *why* `--ignore-scripts` is here. Future maintainers will wonder if it's safe to remove: ```dockerfile # --ignore-scripts: postinstall runs patch-package (a devDep), which is absent here. # Lifecycle scripts are not needed in this stage — the app is already compiled. RUN npm ci --omit=dev --ignore-scripts ``` This is a suggestion, not a blocker. ### Missing CI guard (suggestion) The production Docker target is not explicitly tested in CI — the pipeline currently only catches failures during nightly deploy runs. Consider adding a `docker build --target production frontend/` step to the CI workflow so regressions like this fail at PR time, not post-merge. Again, not a blocker for this PR, but worth tracking. ### What's done well - Node image tag is pinned (`node:20.19.0-alpine3.21`) — reproducible builds ✓ - `package.json` stays clean — no workaround code committed ✓ - Root cause analysis in the PR description is accurate and clear ✓
Author
Owner

🔒 Nora "NullX" Steiner — Application Security Engineer

Verdict: Approved

This is a net security improvement, not just a bug fix.

Why --ignore-scripts is the more secure approach

npm lifecycle scripts (postinstall, preinstall, install) are a well-known supply-chain attack vector. A compromised transitive dependency can ship a malicious postinstall that exfiltrates secrets, creates backdoors, or fingerprints the host at install time. This is not hypothetical — it's how packages like event-stream (2018) and ua-parser-js (2021) were weaponized.

By adding --ignore-scripts to the production install, this PR ensures that no package in the dependency tree can execute arbitrary code during the production container build. The attack surface shrinks meaningfully.

The development install (npm ci without flags) still runs scripts — and that's correct. Patches need to be applied in dev/build. But in the production stage, the app is already compiled and patched; there is no legitimate reason for any lifecycle script to run.

CWE reference

This aligns with CWE-829 (Inclusion of Functionality from Untrusted Control Sphere). Lifecycle scripts are untrusted control sphere code executing in a privileged build context.

Nothing to flag

  • No credentials or secrets in the Dockerfile ✓
  • No new external dependencies introduced ✓
  • Image tag pinned to a specific version ✓
  • Production container runs as the default Node user (non-root in recent Node images — verify this is the case for node:20.19.0-alpine3.21) ✓

Minor observation (not a blocker)

If you want to harden further in the future: npm ci --omit=dev --ignore-scripts combined with --no-audit removes the network call to the npm audit registry during CI, which is a small additional hardening step (reduces outbound egress and eliminates a potential audit-endpoint compromise vector). Not required here.

## 🔒 Nora "NullX" Steiner — Application Security Engineer **Verdict: ✅ Approved** This is a net security improvement, not just a bug fix. ### Why `--ignore-scripts` is the more secure approach npm lifecycle scripts (`postinstall`, `preinstall`, `install`) are a well-known supply-chain attack vector. A compromised transitive dependency can ship a malicious `postinstall` that exfiltrates secrets, creates backdoors, or fingerprints the host at install time. This is not hypothetical — it's how packages like `event-stream` (2018) and `ua-parser-js` (2021) were weaponized. By adding `--ignore-scripts` to the production install, this PR ensures that **no package in the dependency tree can execute arbitrary code during the production container build**. The attack surface shrinks meaningfully. The development install (`npm ci` without flags) still runs scripts — and that's correct. Patches need to be applied in dev/build. But in the `production` stage, the app is already compiled and patched; there is no legitimate reason for any lifecycle script to run. ### CWE reference This aligns with **CWE-829 (Inclusion of Functionality from Untrusted Control Sphere)**. Lifecycle scripts are untrusted control sphere code executing in a privileged build context. ### Nothing to flag - No credentials or secrets in the Dockerfile ✓ - No new external dependencies introduced ✓ - Image tag pinned to a specific version ✓ - Production container runs as the default Node user (non-root in recent Node images — verify this is the case for `node:20.19.0-alpine3.21`) ✓ ### Minor observation (not a blocker) If you want to harden further in the future: `npm ci --omit=dev --ignore-scripts` combined with `--no-audit` removes the network call to the npm audit registry during CI, which is a small additional hardening step (reduces outbound egress and eliminates a potential audit-endpoint compromise vector). Not required here.
Author
Owner

👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer

Verdict: Approved

This is outside my primary domain (no application code changed), but I have one observation from a TDD perspective.

What's good

The fix is in the right place. Adding a shell guard to package.json would have been the wrong layer — it would have mixed infrastructure concerns into the application's package manifest. Fixing it in the Dockerfile keeps the separation clean.

The one-line diff is exactly what a minimal fix should look like. No "while I'm here" cleanup, no unrelated changes bundled in.

TDD lens

The PR description includes a test plan with three manual steps. That's better than nothing, but this bug existed because there was no automated regression for it. The fix resolves the immediate failure, but without an automated test, the same class of problem (lifecycle script relying on a devDep, breaking --omit=dev installs) can silently reappear.

A CI step like:

- name: Verify production Docker build
  run: docker build --target production frontend/

…would have caught this before it ever reached the nightly deploy. Worth tracking, though I'd open a separate issue rather than block this fix.

## 👨‍💻 Felix Brandt (@felixbrandt) — Senior Fullstack Developer **Verdict: ✅ Approved** This is outside my primary domain (no application code changed), but I have one observation from a TDD perspective. ### What's good The fix is in the right place. Adding a shell guard to `package.json` would have been the wrong layer — it would have mixed infrastructure concerns into the application's package manifest. Fixing it in the Dockerfile keeps the separation clean. The one-line diff is exactly what a minimal fix should look like. No "while I'm here" cleanup, no unrelated changes bundled in. ### TDD lens The PR description includes a test plan with three manual steps. That's better than nothing, but this bug existed because there was no automated regression for it. The fix resolves the immediate failure, but without an automated test, the same class of problem (lifecycle script relying on a devDep, breaking `--omit=dev` installs) can silently reappear. A CI step like: ```yaml - name: Verify production Docker build run: docker build --target production frontend/ ``` …would have caught this before it ever reached the nightly deploy. Worth tracking, though I'd open a separate issue rather than block this fix.
Author
Owner

🏛️ Markus Keller (@mkeller) — Application Architect

Verdict: Approved

Docs check

This PR modifies frontend/Dockerfile. Per the documentation update table:

What changed Required update
New Docker service or infrastructure component docs/architecture/c4/l2-containers.puml + docs/DEPLOYMENT.md

This is not a new service or component — it's a flag change to an existing build step. No diagram or doc update is required. ✓

Architecture assessment

The fix is at the correct layer. The multi-stage build design is sound:

  1. build stage installs everything, compiles the app
  2. production stage installs only runtime deps, serves the pre-compiled output

The postinstall script exists to apply patches to test tooling (@vitest/browser-playwright). Test tooling patches have no place in the production image. --ignore-scripts enforces that boundary without requiring any change to the application's dependency manifest — which is the right call.

One structural note (not a blocker)

The Dockerfile currently has no comment explaining why --ignore-scripts is used. Given that this is a deliberate constraint, future maintainers (or Renovate bumping the npm version) might remove it thinking it's unnecessary. A one-line comment would document the intent without ceremony. Tobias flagged the same thing — aligning on that suggestion.

## 🏛️ Markus Keller (@mkeller) — Application Architect **Verdict: ✅ Approved** ### Docs check This PR modifies `frontend/Dockerfile`. Per the documentation update table: | What changed | Required update | |---|---| | New Docker service or infrastructure component | `docs/architecture/c4/l2-containers.puml` + `docs/DEPLOYMENT.md` | This is **not** a new service or component — it's a flag change to an existing build step. No diagram or doc update is required. ✓ ### Architecture assessment The fix is at the correct layer. The multi-stage build design is sound: 1. `build` stage installs everything, compiles the app 2. `production` stage installs only runtime deps, serves the pre-compiled output The `postinstall` script exists to apply patches to test tooling (`@vitest/browser-playwright`). Test tooling patches have no place in the production image. `--ignore-scripts` enforces that boundary without requiring any change to the application's dependency manifest — which is the right call. ### One structural note (not a blocker) The Dockerfile currently has no comment explaining *why* `--ignore-scripts` is used. Given that this is a deliberate constraint, future maintainers (or Renovate bumping the npm version) might remove it thinking it's unnecessary. A one-line comment would document the intent without ceremony. Tobias flagged the same thing — aligning on that suggestion.
Author
Owner

🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist

Verdict: ⚠️ Approved with concerns

The fix is correct. My concern is about what's missing after it.

Blocker: no regression guard

This bug lived undetected until the nightly deploy pipeline failed. That means the production Docker build is not verified anywhere in the PR-level CI pipeline. The same failure mode — a lifecycle script depending on a devDep — can silently reappear in any future package.json change, and it will again only be discovered after deploy.

The test plan in this PR is three manual steps. Manual steps do not prevent regressions.

Suggested follow-up (I'd open a separate issue for this):

- name: Build production Docker image
  run: docker build --target production frontend/

Adding this to CI would make the production build a first-class quality gate. It's fast (minutes), it runs on every PR that touches frontend/, and it would have caught this before merge.

I won't block this PR for that — the fix is urgent and correct. But please track the CI coverage gap.

What's covered

  • The manual test plan covers the happy path and the dev regression path ✓
  • The diff is minimal and targeted — no risk of unrelated side effects ✓
  • The fix does not affect test environments (npm ci without flags still applies patches) ✓
## 🧪 Sara Holt (@saraholt) — QA Engineer & Test Strategist **Verdict: ⚠️ Approved with concerns** The fix is correct. My concern is about what's missing after it. ### Blocker: no regression guard This bug lived undetected until the nightly deploy pipeline failed. That means the production Docker build is not verified anywhere in the PR-level CI pipeline. The same failure mode — a lifecycle script depending on a devDep — can silently reappear in any future `package.json` change, and it will again only be discovered after deploy. The test plan in this PR is three manual steps. Manual steps do not prevent regressions. **Suggested follow-up (I'd open a separate issue for this):** ```yaml - name: Build production Docker image run: docker build --target production frontend/ ``` Adding this to CI would make the production build a first-class quality gate. It's fast (minutes), it runs on every PR that touches `frontend/`, and it would have caught this before merge. I won't block this PR for that — the fix is urgent and correct. But please track the CI coverage gap. ### What's covered - The manual test plan covers the happy path and the dev regression path ✓ - The diff is minimal and targeted — no risk of unrelated side effects ✓ - The fix does not affect test environments (`npm ci` without flags still applies patches) ✓
Author
Owner

📋 Elicit — Requirements Engineer

Verdict: Approved

Reviewing from a requirements and specification perspective.

Problem statement quality

The PR description is clear and precise:

  • Problem: production build fails with exit code 127 — exact error message quoted ✓
  • Root cause: postinstall calls patch-package, which is a devDependency absent under --omit=dev
  • Fix: --ignore-scripts suppresses lifecycle scripts in the production stage ✓
  • Scope: dev and test environments unaffected ✓

Acceptance criteria check

The test plan covers three scenarios:

  1. docker build --target production succeeds ✓ (directly verifies the fix)
  2. node build starts and serves the app ✓ (verifies no runtime regression)
  3. Dev npm ci still applies patches ✓ (verifies no dev regression)

This is complete for the stated scope.

One gap worth noting

The requirements context (the nightly deploy pipeline) is referenced in the problem description but not linked to a Gitea issue. If there's a tracking issue for deploy reliability, linking this PR to it would close the traceability loop. Not a blocker — this is a hotfix and speed is appropriate.

## 📋 Elicit — Requirements Engineer **Verdict: ✅ Approved** Reviewing from a requirements and specification perspective. ### Problem statement quality The PR description is clear and precise: - **Problem**: production build fails with exit code 127 — exact error message quoted ✓ - **Root cause**: `postinstall` calls `patch-package`, which is a devDependency absent under `--omit=dev` ✓ - **Fix**: `--ignore-scripts` suppresses lifecycle scripts in the production stage ✓ - **Scope**: dev and test environments unaffected ✓ ### Acceptance criteria check The test plan covers three scenarios: 1. `docker build --target production` succeeds ✓ (directly verifies the fix) 2. `node build` starts and serves the app ✓ (verifies no runtime regression) 3. Dev `npm ci` still applies patches ✓ (verifies no dev regression) This is complete for the stated scope. ### One gap worth noting The requirements context (the nightly deploy pipeline) is referenced in the problem description but not linked to a Gitea issue. If there's a tracking issue for deploy reliability, linking this PR to it would close the traceability loop. Not a blocker — this is a hotfix and speed is appropriate.
Author
Owner

🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead

Verdict: Approved

No UI or UX impact from this change. The modification is purely to the Docker build pipeline — no Svelte components, no routes, no accessibility surface, no brand tokens, no responsive layouts affected.

Checked: no frontend source files changed, no user-facing behavior modified, no design system elements touched.

LGTM from a UI/UX perspective. Ship it.

## 🎨 Leonie Voss (@leonievoss) — UI/UX Design Lead **Verdict: ✅ Approved** No UI or UX impact from this change. The modification is purely to the Docker build pipeline — no Svelte components, no routes, no accessibility surface, no brand tokens, no responsive layouts affected. Checked: no frontend source files changed, no user-facing behavior modified, no design system elements touched. LGTM from a UI/UX perspective. Ship it.
marcel merged commit 33c738db3b into main 2026-05-14 19:49:06 +02:00
marcel deleted branch fix/production-npm-ci-ignore-scripts 2026-05-14 19:49:07 +02:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#583