familienarchiv/docs/adr/028-pdfjs-wasm-decoders-and-csp-constraint.md
Marcel 420c0e3e10
docs(adr): record pdf.js wasm same-origin serving + future-CSP constraint
Promote the future-CSP constraint from an inline Caddyfile comment to a
durable ADR-028: serve the pdf.js wasm decoders same-origin (never a
CDN), any future CSP must allow 'wasm-unsafe-eval' + worker-src 'self'
blob:, and the build-time guard keeps the wasm shipping. Caddyfile now
points at the ADR.

Addresses re-review: Markus (constraint should be an ADR, not a comment).

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:17:41 +02:00


ADR-028 — pdf.js wasm decoders are served same-origin; a future CSP must allow them

Date: 2026-06-01
Status: Accepted
Issue: #708 (scanned PDFs with CCITT/JBIG2 images render blank)
Milestone: Pre-prod read-path hardening


Context

pdf.js 5.x moved the JBIG2, CCITTFax, and JPEG2000 image decoders into WebAssembly. A single jbig2.wasm module decodes both JBIG2 and CCITTFax; openjpeg.wasm decodes JPEG2000. These modules live in node_modules/pdfjs-dist/wasm/ and are not on the web path by default, and getDocument will not load them unless it is given a wasmUrl. Without a wasmUrl, bi-level black-and-white scans (CCITT G4 fax, roughly 16% of the archive) rendered as a blank canvas in production, while JPEG scans rendered fine.
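The wasmUrl requirement can be sketched as a small options builder. This is a minimal illustration, not the project's actual viewer code: buildPdfLoadOptions, isSameOrigin, and the example origin are hypothetical names, and only the url and wasmUrl options of getDocument are shown. The trailing-slash check assumes pdf.js appends the decoder file names to wasmUrl. The cross-origin rejection anticipates Decision 1 below.

```typescript
// Path named in this ADR; everything else in this sketch is hypothetical.
const PDFJS_WASM_URL = "/pdfjs-wasm/";

interface PdfLoadOptions {
  url: string;
  wasmUrl: string;
}

// Relative paths resolve against appOrigin; absolute and protocol-relative
// URLs keep their own origin, so a CDN URL compares as different.
function isSameOrigin(candidate: string, appOrigin: string): boolean {
  return new URL(candidate, appOrigin).origin === new URL(appOrigin).origin;
}

// Builds the object that would be passed to pdf.js getDocument().
function buildPdfLoadOptions(
  pdfUrl: string,
  appOrigin: string,
  wasmUrl: string = PDFJS_WASM_URL,
): PdfLoadOptions {
  // Assumption: pdf.js concatenates file names onto wasmUrl.
  if (!wasmUrl.endsWith("/")) {
    throw new Error(`wasmUrl must end with "/": ${wasmUrl}`);
  }
  if (!isSameOrigin(wasmUrl, appOrigin)) {
    throw new Error(`wasmUrl must be same-origin, got: ${wasmUrl}`);
  }
  return { url: pdfUrl, wasmUrl };
}
```

A caller would pass the result straight through, for example getDocument(buildPdfLoadOptions(fileUrl, window.location.origin)).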

Two cross-cutting, long-lived constraints fall out of the fix and are not obvious from reading any single file — hence this record.

Decision

  1. Serve the pdf.js wasm from our own origin, at the unversioned path /pdfjs-wasm/, copied from node_modules/pdfjs-dist/wasm/ into build/client/ at build time by vite-plugin-static-copy (a devDependency; see frontend/vite.config.ts). getDocument is called with wasmUrl: '/pdfjs-wasm/'. Never point wasmUrl at a public CDN — a decoder on the core read path must not become a supply-chain RCE surface.

  2. Any future Content-Security-Policy MUST include script-src 'wasm-unsafe-eval' and worker-src 'self' blob:. pdf.js instantiates WebAssembly and runs its renderer in a worker created from a blob: URL. A CSP without these directives silently re-breaks PDF rendering for the exact class of documents #708 fixed. No CSP is set today (infra/caddy/Caddyfile (security_headers)); the Caddyfile carries a pointer to this ADR so the future CSP author cannot miss it.

  3. Shipping of the wasm is guarded at build time. frontend/postbuild (scripts/assert-pdfjs-wasm.mjs) fails the build loudly if jbig2.wasm or openjpeg.wasm is absent from build/client/pdfjs-wasm/, so a future pdfjs-dist bump that renames or relocates the wasm cannot regress to a blank canvas unnoticed. The guard runs in CI and in the Docker build stage.
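The core check behind the build-time guard in item 3 might look like the sketch below. It is not the contents of scripts/assert-pdfjs-wasm.mjs: the function names are hypothetical, and the directory listing is passed in as an array so the example stays self-contained. A real script would presumably read build/client/pdfjs-wasm/ from disk and exit non-zero on failure.

```typescript
// Decoder files named in this ADR.
const REQUIRED_WASM = ["jbig2.wasm", "openjpeg.wasm"] as const;

// Returns the required decoder files that are absent from the listing.
function findMissingDecoders(
  presentFiles: readonly string[],
  required: readonly string[] = REQUIRED_WASM,
): string[] {
  const present = new Set(presentFiles);
  return required.filter((name) => !present.has(name));
}

// Throws with a loud, specific message when any decoder is missing,
// so the build step fails instead of shipping a viewer that paints blank pages.
function assertDecodersShipped(presentFiles: readonly string[]): void {
  const missing = findMissingDecoders(presentFiles);
  if (missing.length > 0) {
    throw new Error(
      `pdf.js wasm missing from build/client/pdfjs-wasm/: ${missing.join(", ")}`,
    );
  }
}
```

Checking for exact file names is deliberate in this sketch: if a pdfjs-dist upgrade renames a decoder, the check fails rather than passing on a directory that merely exists.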

Consequences

  • The decoders load from the same origin as the app — no third-party trust, no SRI to manage, correct Content-Type: application/wasm served by adapter-node.
  • /pdfjs-wasm/ is not content-hashed, so it must not be served with an immutable Cache-Control policy; a revalidating cache avoids pairing a stale .wasm with a newer worker after a pdfjs upgrade.
  • The CSP constraint is a standing obligation on whoever introduces a CSP. If that work happens, this ADR and the Caddyfile note are the source of truth.
  • No new container or external system is introduced, so the C4 L1/L2 diagrams are unaffected; /pdfjs-wasm/ is a static asset served by the existing frontend container.
  • Render/decode failures are no longer silent: the viewer surfaces a localized message plus a working download link (see #708).
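For whoever introduces the CSP, the standing obligation from Decision 2 could be checked mechanically. The sketch below is a hypothetical helper (parseCsp and pdfjsCspProblems are not existing project code) that tests a policy string against the literal wording of this ADR. It deliberately does not model CSP fallback rules such as default-src, so a policy it flags may need a closer look rather than being automatically wrong.

```typescript
// Parses a CSP header value into a directive-name -> sources map.
function parseCsp(policy: string): Map<string, string[]> {
  const directives = new Map<string, string[]>();
  for (const part of policy.split(";")) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const [name, ...sources] = tokens;
    const key = name.toLowerCase();
    // Duplicate directives are ignored: the first occurrence wins.
    if (!directives.has(key)) directives.set(key, sources);
  }
  return directives;
}

// Lists the ADR-028 requirements that the given policy does not meet.
// An empty result means the policy names all three required sources.
function pdfjsCspProblems(policy: string): string[] {
  const directives = parseCsp(policy);
  const has = (directive: string, source: string): boolean =>
    (directives.get(directive) ?? []).some((s) => s.toLowerCase() === source);

  const problems: string[] = [];
  if (!has("script-src", "'wasm-unsafe-eval'")) {
    problems.push("script-src lacks 'wasm-unsafe-eval'");
  }
  if (!has("worker-src", "'self'")) {
    problems.push("worker-src lacks 'self'");
  }
  if (!has("worker-src", "blob:")) {
    problems.push("worker-src lacks blob:");
  }
  return problems;
}
```

Run as a test against the header value configured in the Caddyfile, a check like this would turn the pointer comment into a failing build if the directives are ever dropped.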