diff --git a/docs/adr/028-pdfjs-wasm-decoders-and-csp-constraint.md b/docs/adr/028-pdfjs-wasm-decoders-and-csp-constraint.md
new file mode 100644
index 00000000..fe058752
--- /dev/null
+++ b/docs/adr/028-pdfjs-wasm-decoders-and-csp-constraint.md
@@ -0,0 +1,60 @@
+# ADR-028 — pdf.js wasm decoders are served same-origin; a future CSP must allow them
+
+**Date:** 2026-06-01
+**Status:** Accepted
+**Issue:** #708 (scanned PDFs with CCITT/JBIG2 images render blank)
+**Milestone:** Pre-prod read-path hardening
+
+---
+
+## Context
+
+pdf.js 5.x moved the **JBIG2, CCITTFax, and JPEG2000 image decoders into
+WebAssembly**. A single `jbig2.wasm` module decodes both JBIG2 and CCITTFax;
+`openjpeg.wasm` decodes JPEG2000. These modules live in
+`node_modules/pdfjs-dist/wasm/` and are not on the web path by default, and
+`getDocument` will not load them unless it is given a `wasmUrl`. Without that,
+bi-level black-and-white scans (CCITT G4 fax — ~16% of the archive) painted a
+blank canvas in production while JPEG scans rendered fine.
+
+Two cross-cutting, long-lived constraints fall out of the fix and are not
+obvious from reading any single file — hence this record.
+
+## Decision
+
+1. **Serve the pdf.js wasm from our own origin**, at the unversioned path
+   `/pdfjs-wasm/`, copied from `node_modules/pdfjs-dist/wasm/` into
+   `build/client/` at build time by `vite-plugin-static-copy` (a devDependency;
+   see `frontend/vite.config.ts`). `getDocument` is called with
+   `wasmUrl: '/pdfjs-wasm/'`. **Never point `wasmUrl` at a public CDN** — a
+   decoder on the core read path must not become a supply-chain RCE surface.
+
+2. **Any future `Content-Security-Policy` MUST include
+   `script-src 'wasm-unsafe-eval'` and `worker-src 'self' blob:`.** pdf.js
+   instantiates WebAssembly and runs its renderer in a worker created from a
+   `blob:` URL. A CSP without these directives silently re-breaks PDF rendering
+   for the exact class of documents #708 fixed. No CSP is set today
+   (`infra/caddy/Caddyfile` `(security_headers)`); the Caddyfile carries a
+   pointer to this ADR so the future CSP author cannot miss it.
+
+3. **The wasm shipping is guarded at build time.** `frontend/postbuild`
+   (`scripts/assert-pdfjs-wasm.mjs`) fails the build loudly if `jbig2.wasm` or
+   `openjpeg.wasm` is absent from `build/client/pdfjs-wasm/` — so a future
+   `pdfjs-dist` bump that renames or relocates the wasm cannot regress to a
+   blank canvas unnoticed. This runs in CI and in the Docker build stage.
+
+## Consequences
+
+- The decoders load from the same origin as the app — no third-party trust, no
+  SRI to manage, correct `Content-Type: application/wasm` served by
+  adapter-node.
+- `/pdfjs-wasm/` is **not** content-hashed, so it must not be served
+  `immutable` — a revalidating cache avoids serving a stale `.wasm` against a
+  newer worker after a pdfjs upgrade.
+- The CSP constraint is a standing obligation on whoever introduces a CSP. If
+  that work happens, this ADR and the Caddyfile note are the source of truth.
+- No new container or external system is introduced, so the C4 L1/L2 diagrams
+  are unaffected; `/pdfjs-wasm/` is a static asset served by the existing
+  frontend container.
+- Render/decode failures are no longer silent: the viewer surfaces a localized
+  message plus a working download link (see #708).
diff --git a/infra/caddy/Caddyfile b/infra/caddy/Caddyfile
index 3b47d4e6..8c0642bb 100644
--- a/infra/caddy/Caddyfile
+++ b/infra/caddy/Caddyfile
@@ -25,7 +25,8 @@
 		# No Content-Security-Policy is set yet. When one is added, it MUST
 		# include `script-src 'wasm-unsafe-eval'` and `worker-src 'self' blob:`
 		# or the pdf.js WebAssembly image decoders (JBIG2/CCITTFax/JPEG2000)
-		# and worker will be blocked and scanned PDFs render blank. See #708.
+		# and worker will be blocked and scanned PDFs render blank.
+		# See #708 and docs/adr/028-pdfjs-wasm-decoders-and-csp-constraint.md.
 		-Server
 	}
 }
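
Reviewer note on Decision 3 of ADR-028: the guard script `scripts/assert-pdfjs-wasm.mjs` is referenced but not included in this diff. The following is only a minimal sketch of what such a check could look like, assuming it runs under Node after the client build. The helper names `missingWasm` and `assertPdfjsWasm` are hypothetical and are not taken from the repository; only the two file names and the `build/client/pdfjs-wasm/` directory come from the ADR text.

```typescript
import { readdirSync } from "node:fs";

// The two decoder modules ADR-028 says must ship with the client build.
const REQUIRED_WASM = ["jbig2.wasm", "openjpeg.wasm"];

// Hypothetical helper: given the file names present in the output directory,
// return the required decoder modules that are absent.
function missingWasm(present: readonly string[]): string[] {
  const have = new Set(present);
  return REQUIRED_WASM.filter((name) => !have.has(name));
}

// Hypothetical helper: throw if any decoder is missing, so that a postbuild
// step calling it exits non-zero. readdirSync itself throws if `dir` does not
// exist, which also fails the build.
function assertPdfjsWasm(dir: string): void {
  const missing = missingWasm(readdirSync(dir));
  if (missing.length > 0) {
    throw new Error(`pdf.js wasm missing from ${dir}: ${missing.join(", ")}`);
  }
}
```

A postbuild entry point would then call something like `assertPdfjsWasm("build/client/pdfjs-wasm")`. Keeping the comparison in a pure function separate from the filesystem read is a design choice of this sketch, made so the check can be unit-tested without a real build output.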