[bug] Scanned PDFs with CCITT/JBIG2 images render blank — pdf.js 5.x wasmUrl not configured #708

Closed
opened 2026-06-01 19:38:15 +02:00 by marcel · 10 comments
Owner

Summary

In the document viewer, some scanned PDFs render blank (no page image) while others render fine. The document thumbnail/preview still shows in both cases, which masks the problem. Root cause: pdf.js 5.x moved the JBIG2 + CCITTFax + JPEG2000 image decoders into WebAssembly, but our renderer never configures the wasmUrl option, so those decoders fail to initialize and the page paints nothing.

This blocks the archive's core read journey for an entire class of documents (bi-level black-and-white scans): about 16% of all documents, roughly 1,200 letters (see Blast radius below).

Symptoms

  • Affected docs: viewer area is blank/white; download link works, thumbnail preview shows.
  • Browser console on an affected doc:
    JBig2CCITTFaxImage#instantiateWasm: UnknownErrorException: Ensure that the `wasmUrl` API parameter is provided.
    JBig2CCITTFaxImage#getJsModule: TypeError: The specifier "nulljbig2_nowasm_fallback.js" was a bare specifier ...
    Unable to decode image "img_p0_1": "Jbig2Error: JBig2 failed to initialize".
    
  • The render error is silently swallowed in usePdfRenderer.svelte.ts (the renderCurrentPage catch at the RenderingCancelledException block just returns without setting error), so the user sees a blank canvas rather than any message.

Root cause

frontend/src/lib/document/viewer/usePdfRenderer.svelte.ts sets GlobalWorkerOptions.workerSrc but calls getDocument(src) with no wasmUrl. In pdf.js 5.5.207:

  • getDocument({ … }) accepts a wasmUrl option (see pdfjs-dist/build/pdf.mjs:14439, error guard at pdf.mjs:9202: "Ensure that the wasmUrl API parameter is provided.").
  • A single wasm module (jbig2.wasm) decodes BOTH JBIG2 and CCITTFax — see JBig2CCITTFaxWasmImage.decode() in pdf.worker.mjs:4156; the CCITTOptions branch (pdf.worker.mjs:4182) calls _ccitt_decode inside the same module.
  • With wasmUrl unset, #instantiateWasm (pdf.worker.mjs:4134) fails, decode() resolves the module to null, and pdf.worker.mjs:4174-4175 throws JBig2Error: JBig2 failed to initialize. The JS-module fallback path builds ${wasmUrl}jbig2_nowasm_fallback.js → with null becomes the broken bare specifier nulljbig2_nowasm_fallback.js, so it can't rescue either.
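The broken specifier in the last bullet is plain string interpolation of null. A minimal sketch, assuming the fallback URL is built by simple template concatenation as described above; fallbackModuleUrl is a hypothetical name for illustration, not a pdf.js function:

```typescript
// Hypothetical helper mirroring the `${wasmUrl}jbig2_nowasm_fallback.js`
// concatenation described above. It is not part of the pdf.js API.
function fallbackModuleUrl(wasmUrl: string | null): string {
  // Interpolating null yields the literal text "null".
  return `${wasmUrl}jbig2_nowasm_fallback.js`;
}

const unsetUrl = fallbackModuleUrl(null);
const configuredUrl = fallbackModuleUrl('/pdfjs-wasm/');

console.log(unsetUrl);      // nulljbig2_nowasm_fallback.js
console.log(configuredUrl); // /pdfjs-wasm/jbig2_nowasm_fallback.js
```

The first value has no scheme and no leading slash, which is why the browser rejects it as a bare specifier instead of attempting a fetch.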

Why only some PDFs are affected (verified against staging)

The differentiator is the image codec inside each PDF, not the scan/import workflow. The scanner/converter picks compression per page by content:

| Document | Codec | Needs wasm? | Result |
|---|---|---|---|
| C-2703 (d8f9bb15-cb67-4c72-85f4-cc5a0c4e3dab) | DCTDecode (JPEG), 8-bit RGB, ~504 KB | No (JPEG decodes natively) | renders |
| C-3224 (bd895525-34f5-4ee4-9a5c-37ceecd7bb37) | CCITTFaxDecode (G4 fax), 1-bit DeviceGray, ~29 KB | Yes (wasm-only in 5.x) | blank |

So JPEG (contone/photo) pages display; CCITT G4 / JBIG2 (bi-level B&W text) pages are blank. Both docs are %PDF-1.3 from the same scanner.

Evidence gathered

  • MinIO objects are intact and valid (%PDF header, %%EOF trailer, correct application/pdf, correct sizes). Backend /api/documents/{id}/file and DB rows are healthy.
  • pdfjs-dist 5.5.207 legacy build loads & renders C-3224 in Node only because it silently falls back to an in-tree JS CCITT decoder; the modern browser build does not fall back (confirmed by the live console error above).
  • node_modules/pdfjs-dist/wasm/ ships: jbig2.wasm, openjpeg.wasm, openjpeg_nowasm_fallback.js, qcms_bg.wasm (+ licenses). There is no jbig2_nowasm_fallback.js — JBIG2/CCITT have no JS fallback, so the wasm is mandatory.

Blast radius (random sample, n=200 of 7,534 PDFs)

| Codec | Count | Share | Renders today? |
|---|---|---|---|
| DCTDecode (JPEG) | 168 | 84% | yes |
| CCITTFaxDecode (G4 fax) | 32 | 16% | no (blank) |
| JBIG2Decode / JPXDecode / unclassified | 0 | 0% | n/a |

About 16% of documents are affected, roughly 1,200 letters archive-wide (95% CI ≈ 11–21%), or about 1 in 6. The sample found zero true JBIG2 docs. The JBig2 failed to initialize console wording is a red herring: pdf.js routes CCITT through the shared JBig2CCITTFaxImage wasm module, so a CCITT failure surfaces as a JBig2Error. The affected class is entirely CCITT (G4 fax). Nothing was ever lost: affected docs always had a working download link + server thumbnail.
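The interval and the archive-wide estimate can be reproduced with a few lines of arithmetic. This is a sketch assuming a normal-approximation (Wald) interval with no finite-population correction; the issue does not say which method was actually used, and waldInterval is a name introduced here for illustration:

```typescript
// Normal-approximation (Wald) interval for a sample proportion.
// Assumption: this is one plausible way to get the quoted 11-21% range.
function waldInterval(hits: number, n: number, z = 1.96) {
  const p = hits / n;
  const halfWidth = z * Math.sqrt((p * (1 - p)) / n);
  return { p, low: p - halfWidth, high: p + halfWidth };
}

const sample = waldInterval(32, 200); // 32 CCITT docs in a sample of 200
const archiveSize = 7534;             // total PDFs, from the heading above
const estimatedAffected = Math.round(sample.p * archiveSize);

console.log((sample.low * 100).toFixed(1));  // 10.9
console.log((sample.high * 100).toFixed(1)); // 21.1
console.log(estimatedAffected);              // 1205
```

Under those assumptions the numbers agree with the figures quoted above: a share of 16%, a range of roughly 11–21%, and about 1,200 affected documents.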


Decisions (resolved — best-practice defaults)

After multi-persona review (see comments), the three open decisions are resolved as:

  1. Ship the wasm via vite-plugin-static-copy (new devDependency). Chosen over a prebuild script or committing blobs because this bug is a dev/prod parity failure, and the plugin guarantees parity: it serves /pdfjs-wasm/ via dev middleware and emits it to the build output from one config line, reads from node_modules (always version-matched), and fails the build loudly if the source dir is absent. It is a devDependency only — never shipped to the runtime image.
  2. Unversioned path /pdfjs-wasm/ with the adapter's default revalidating cache — NOT immutable. immutable on a non-content-hashed URL would serve a stale .wasm against a new worker after a future pdfjs bump. A 304 on a ~105 KB file is a rounding error at our scale. Revisit version-stamped + immutable only if profiling ever justifies it.
  3. Measure blast radius (done — see above), then proportionate comms. Data was never lost; this is a small private archive — at most a one-line reassurance to active family members once the fix ships, or nothing.
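For decision 1, the plugin wiring might look like the fragment below. This is a config sketch rather than tested project code: it assumes the standard SvelteKit Vite setup and that paths resolve relative to frontend/. The emitted location should still be checked in build/client/ as the acceptance criteria require.

```typescript
// vite.config.ts (sketch; file layout and plugin order are assumptions)
import { sveltekit } from '@sveltejs/kit/vite';
import { defineConfig } from 'vite';
import { viteStaticCopy } from 'vite-plugin-static-copy';

export default defineConfig({
  plugins: [
    sveltekit(),
    viteStaticCopy({
      targets: [
        {
          // Read straight from the installed package so the wasm
          // always matches the pdf.js worker version.
          src: 'node_modules/pdfjs-dist/wasm/*',
          // Destination is relative to the build output root,
          // which is intended to be served at /pdfjs-wasm/.
          dest: 'pdfjs-wasm'
        }
      ]
    })
  ]
});
```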

Proposed fix

  1. Serve the pdf.js wasm assets at /pdfjs-wasm/ via vite-plugin-static-copy, sourced from node_modules/pdfjs-dist/wasm/. Include all files (jbig2.wasm, openjpeg.wasm, openjpeg_nowasm_fallback.js, qcms_bg.wasm) — openjpeg.wasm covers JPEG2000/JPXDecode scans for free and pre-empts a sequel issue. Verify the assets land in build/client/ and are served by the production Docker image, not just npm run dev.
  2. Pass wasmUrl to getDocument in usePdfRenderer.svelte.ts, configured once next to workerSrc in init() (single source of truth, no repeated literal):
    const loadingTask = pdfjsLib.getDocument({ url: src, wasmUrl: '/pdfjs-wasm/' });
    
    (wasmUrl must be a directory URL with a trailing slash; pdf.js appends jbig2.wasm etc.)
  3. Surface render failures. In renderCurrentPage, when task.promise rejects with anything other than RenderingCancelledException, set error to a localized message (new doc_render_failed key in messages/{de,en,es}.json) — never the raw pdf.js e.message. This routes into the existing error UI (message + download link).
  4. Drive-by fixes in the files already being touched:
    • Add rel="noopener noreferrer" to the download <a target="_blank"> in DocumentViewer.svelte (CWE-1022).
    • Add a comment in infra/caddy/Caddyfile (security_headers) noting that any future Content-Security-Policy must include script-src 'wasm-unsafe-eval' and worker-src 'self' blob:, or PDF rendering breaks again. Reference this issue.
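Item 3 comes down to one decision inside the catch block: stay silent on cancellation, show a localized message for everything else. A minimal sketch, assuming cancellation can be recognized by the error's name; renderErrorMessage is a hypothetical helper and the local class only stands in for the pdf.js exception so the example runs without pdfjs-dist:

```typescript
// Stand-in for pdf.js's RenderingCancelledException (assumption: the real
// exception also reports this value in its `name` property).
class RenderingCancelledException extends Error {
  constructor(message = 'Rendering cancelled') {
    super(message);
    this.name = 'RenderingCancelledException';
  }
}

// Hypothetical helper: returns null when the rejection should stay silent,
// otherwise the localized text to put into `error`.
function renderErrorMessage(e: unknown, localizedMessage: string): string | null {
  // A cancelled render (superseded by a newer page or zoom) is expected.
  if (e instanceof Error && e.name === 'RenderingCancelledException') return null;
  // Anything else is a genuine failure. The raw e.message is deliberately
  // ignored so pdf.js internals never reach the user.
  return localizedMessage;
}

const localized = 'This document could not be displayed.'; // placeholder text
const onCancel = renderErrorMessage(new RenderingCancelledException(), localized);
const onDecodeFailure = renderErrorMessage(new Error('JBig2 failed to initialize'), localized);

console.log(onCancel);        // null
console.log(onDecodeFailure); // This document could not be displayed.
```

In renderCurrentPage the catch block would assign the result to error only when it is non-null, which feeds the existing error UI with the message and download link.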

Acceptance criteria

User-visible outcomes:

  • A CCITTFax (G4) scan (e.g. C-3224) renders a visible page image (canvas contains non-background pixels above a sampled threshold).
  • A JBIG2 scan renders a visible page image. (No JBIG2 docs found in the archive sample — assert with a synthetic/known JBIG2 fixture, since the shared wasm path is the same.)
  • A DCTDecode (JPEG) PDF (e.g. C-2703) still renders — no regression.
  • A JPEG2000 / JPXDecode scan renders (covered by openjpeg.wasm) — assert if a fixture exists, else explicitly note none was found (none in archive sample).
  • A multi-page mixed-codec PDF (e.g. JPEG page + fax page) renders on both pages and pages between them.
  • A text-only PDF (no image XObject) still renders (guards over-scoping).
  • Negative path: a PDF that genuinely cannot be decoded shows the localized error message + working download link — never a silent blank canvas.
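The "non-background pixels above a sampled threshold" check in the first criterion could be expressed as a small pure function over RGBA data. This is a sketch under stated assumptions: the background is white, and the tolerance and minimum share are illustrative values, not numbers taken from the repro.

```typescript
// Counts pixels that differ from white by more than `tolerance`
// on any of the R, G, B channels. Alpha is ignored.
function countNonWhitePixels(rgba: Uint8ClampedArray, tolerance = 8): number {
  let count = 0;
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    const r = rgba[i], g = rgba[i + 1], b = rgba[i + 2];
    if (255 - r > tolerance || 255 - g > tolerance || 255 - b > tolerance) count++;
  }
  return count;
}

// A canvas counts as rendered when at least `minShare` of its pixels carry ink.
function isNonBlank(rgba: Uint8ClampedArray, minShare = 0.001): boolean {
  const totalPixels = Math.floor(rgba.length / 4);
  return totalPixels > 0 && countNonWhitePixels(rgba) / totalPixels >= minShare;
}

// 10 x 10 all-white canvas versus the same canvas with one black pixel.
const blankCanvas = new Uint8ClampedArray(4 * 100).fill(255);
const inkedCanvas = new Uint8ClampedArray(blankCanvas);
inkedCanvas.set([0, 0, 0, 255], 0);

console.log(isNonBlank(blankCanvas)); // false
console.log(isNonBlank(inkedCanvas)); // true
```

In a browser-mode test the input would come from `canvas.getContext('2d').getImageData(0, 0, width, height).data`.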

Implementation/ops signals:

  • No wasmUrl / JBig2 failed to initialize warnings in the browser console for affected docs.
  • Verified in the production Docker image (node build), not just npm run preview: curl -I /pdfjs-wasm/jbig2.wasm → 200 + Content-Type: application/wasm.

Tests (TDD)

  • Unit guard (runs everywhere): fake LibLoader (via testHelpers.makeFakeLibLoader) → assert getDocument is called with a non-null wasmUrl ending in /. Red first (currently called with a bare src string).
  • Behavioral (CI browser mode, PdfViewer.svelte.test.ts): render committed fixtures and assert the canvas is non-blank (sample pixel count, mirror the repro). Fixtures: CCITT (C-3224, 29 KB), JBIG2, DCT (no-regression). Fixtures committed as hermetic test assets — do not fetch from staging at test time.
  • Negative-path test: force a non-cancellation render rejection via a fake loader → assert error is set, the localized message renders, and the download link is present.
  • Console-warning assertion in the browser test: fail if JBig2 failed to initialize / wasmUrl warnings appear.
  • Existing browser-mode tests still pass (browser tests run in CI's Playwright container).
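The unit guard in the first bullet could look roughly like this. It is a sketch: the interfaces are a guess at the minimal pdf.js surface the renderer touches, and makeRecordingLib, loadDocument and hasDirectoryWasmUrl are hypothetical names in the spirit of testHelpers.makeFakeLibLoader, not the project's actual helpers.

```typescript
// Assumed minimal shape of what the renderer passes to pdf.js.
interface GetDocumentParams { url: string; wasmUrl?: string }
type GetDocumentArg = GetDocumentParams | string;
interface PdfLib { getDocument(arg: GetDocumentArg): { promise: Promise<unknown> } }

// Hypothetical recording fake: remembers every getDocument argument.
function makeRecordingLib() {
  const calls: GetDocumentArg[] = [];
  const lib: PdfLib = {
    getDocument(arg) {
      calls.push(arg);
      return { promise: Promise.resolve({ numPages: 1 }) };
    }
  };
  return { lib, calls };
}

// Single constant for the asset directory (must end with a slash).
const WASM_URL = '/pdfjs-wasm/';

// Code under test (sketch): the object form with wasmUrl.
function loadDocument(lib: PdfLib, src: string) {
  return lib.getDocument({ url: src, wasmUrl: WASM_URL });
}

// The guard: object form, string wasmUrl, trailing slash.
function hasDirectoryWasmUrl(arg: GetDocumentArg): boolean {
  return typeof arg !== 'string' && typeof arg.wasmUrl === 'string' && arg.wasmUrl.endsWith('/');
}

const { lib, calls } = makeRecordingLib();
loadDocument(lib, '/api/documents/example/file');

console.log(hasDirectoryWasmUrl(calls[0]));                      // true
console.log(hasDirectoryWasmUrl('/api/documents/example/file')); // false (today's bare-string call)
```

The second call shows why the test starts red: a bare src string can never satisfy the guard.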

Scope boundaries

  • standardFontDataUrl / iccUrl (also new in pdf.js 5.x) are out of scope unless a specific affected document is found — do not gold-plate. File a follow-up only if evidence appears.
  • Wiring VITE_SENTRY_DSN for the frontend (so client-side decode/render failures surface in GlitchTip instead of dying in the console) is a worthwhile separate observability issue — not part of this fix.
marcel added the P0-critical and bug labels 2026-06-01 19:38:20 +02:00
Author
Owner

🏛️ Markus Keller — Application Architect

Observations

  • The diagnosis is solid and the fix is a configuration fix, not a structural one — no ADR needed, no diagram update triggered (it touches build config, not the DB schema, a route, or a container). Good. Keep it small.
  • The real architectural constraint is the build pipeline, not the renderer. frontend/Dockerfile production stage copies only /app/build and then runs npm ci --omit=dev --ignore-scripts. The wasm files in node_modules/pdfjs-dist/wasm/ are not web-served and won't be reachable at /pdfjs-wasm/ at runtime. So the fix must emit the wasm into the SvelteKit client build output (build/client/...). A wasmUrl that points at node_modules works in npm run dev and silently 404s in the Docker image — exactly the kind of dev/prod drift that makes a "fix" look done while staging stays broken.
  • pdfjs-dist is a runtime dependency (^5.5.207), so it survives --omit=dev — but that's irrelevant since node_modules isn't on the web path.
  • The silent return in renderCurrentPage (lines 94-103) is a reliability smell I care about independent of this bug: the system fails invisibly. Push failures up loudly. Proposed fix item #3 is correct and should not be dropped as "nice to have."

Recommendations

  • Emit the wasm via the SvelteKit-native path: drop the files under frontend/static/pdfjs-wasm/ (SvelteKit copies static/ verbatim into build/client/), or copy them there in a prebuild npm script sourced from node_modules/pdfjs-dist/wasm/. Either way the asset is in build/ and served by adapter-node at /pdfjs-wasm/. This avoids a new build-plugin dependency (see Open Decisions for the tradeoff vs. vite-plugin-static-copy).
  • Single source of truth for pdf.js asset URLs. workerSrc is wired in init(); put wasmUrl next to it, not buried in loadDocument. One place configures pdf.js's external assets.
  • Verify in the actual production image, not just npm run preview. The acceptance criterion should read "renders in the built Docker image," because preview and the Node adapter image resolve static assets differently enough to matter here.
  • If you copy via a prebuild script, add a one-line note to CONTRIBUTING.md that bumping pdfjs-dist requires re-copying wasm — otherwise the next upgrade reintroduces this exact bug. Cheap insurance for the memory of why.

Open Decisions

  • How to ship the wasm assets. Three options with different long-term costs:
    • (a) prebuild npm script copying node_modules/pdfjs-dist/wasm/* → static/pdfjs-wasm/. No new dependency; one script to maintain; drift risk on version bump (mitigated by a CONTRIBUTING note or a build-time existence check).
    • (b) vite-plugin-static-copy (new devDependency). Auto-tracks the source dir, fails the build if absent; one more plugin in the maintenance surface.
    • (c) Commit the wasm blobs into static/pdfjs-wasm/. Simplest runtime; binary blobs in git; must remember to update on every pdfjs bump (worst drift profile).
    • My lean: (a) — least standing complexity, and the drift risk is contained by a build-time assertion that the files exist.
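Option (a) with its build-time assertion could be sketched as a small Node script. The function name, the required-file list and the way it is invoked are assumptions for illustration; only the source and destination directories come from the discussion above.

```typescript
import { cpSync, existsSync, mkdirSync, readdirSync } from 'node:fs';

// Copies every file from srcDir into destDir and fails loudly when the
// source directory or a required decoder is missing (hypothetical helper).
function copyWasmAssets(srcDir: string, destDir: string, required: string[] = ['jbig2.wasm']): string[] {
  if (!existsSync(srcDir)) {
    throw new Error(`pdf.js wasm source directory not found: ${srcDir}`);
  }
  mkdirSync(destDir, { recursive: true });
  cpSync(srcDir, destDir, { recursive: true });

  // Build-time existence check: a silent no-op copy would reintroduce the bug.
  const copied = readdirSync(destDir);
  for (const name of required) {
    if (!copied.includes(name)) {
      throw new Error(`required pdf.js wasm asset missing after copy: ${name}`);
    }
  }
  return copied;
}

// Intended use from a prebuild npm script (paths relative to frontend/):
// copyWasmAssets('node_modules/pdfjs-dist/wasm', 'static/pdfjs-wasm');
```

Throwing instead of logging is what contains the drift risk: a pdfjs-dist bump that moves or renames the wasm directory breaks the build rather than the viewer.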
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

  • The existing renderer test (usePdfRenderer.svelte.test.ts) explicitly can't cover init()/loadDocument() — its own comment says "require pdfjsLib (browser module)". It only tests pure state (clamping, zoom). So a true "page renders" assertion belongs in browser mode (PdfViewer.svelte.test.ts), which runs in CI's Playwright container — good, that path exists.
  • PdfViewer.svelte already injects a libLoader prop and testHelpers.ts exposes makeFakeLibLoader. That's the seam: I can assert getDocument is called with { url, wasmUrl } without a real browser, using a fake loader that records the call. That's a fast, deterministic red test for fix item #2.
  • Fix item #3 (the swallowed render error) is currently two bare returns after distinguishing RenderingCancelledException. The error string today is set in loadDocument from e.message — i.e. a raw pdf.js English string ("Failed to load PDF"). If we start surfacing render failures, we should not leak raw pdf.js text to users (see Leonie/Nora).

Recommendations

  • TDD, two layers:
    1. Unit (fast): fake LibLoader → assert getDocument receives a non-null wasmUrl ending in /. Red first (currently it's called with a bare src string), then green. This guards the regression cheaply and runs outside the browser.
    2. Behavioral (browser-mode/CI): render a real CCITT fixture (the C-3224 scan) through PdfViewer and assert the canvas is non-blank (sample pixels, like the repro did: count non-white). Add a JBIG2 fixture too — that's the codec from the actual console error.
  • Don't fake the render green. A unit-env test that mocks pdfjs into "success" proves nothing about wasm loading. Keep the real-render assertion in the browser project or it's theatre.
  • Localize the failure message. When task.promise rejects with anything other than RenderingCancelledException, set error to a localized string (reuse getErrorMessage / a new doc_render_failed message in messages/{de,en,es}.json), not e.message. The viewer already renders {error} + the download link.
  • One constant for the wasm path. Define it once (next to workerSrc in init), don't repeat the /pdfjs-wasm/ literal across files.
  • Keep getDocument signature change minimal: getDocument({ url: src, wasmUrl }). Note src here is the file URL string from useFileLoader — confirm the object form doesn't break the existing data-URL/blob path if any.

Open Decisions (none)

Author
Owner

🛡️ Nora Steiner ("NullX") — Application Security Engineer

Observations

  • This is a same-origin static-asset fetch (/pdfjs-wasm/*.wasm served by our own adapter-node). No SSRF, no third-party CDN, no SRI concern. Serving the wasm from our own origin is the correct security posture — do not be tempted to point wasmUrl at unpkg/cdnjs to "save a copy step"; that would add a remote-code-execution-via-supply-chain surface for a core viewer path.
  • No CSP is currently set. infra/caddy/Caddyfile (security_headers) ships HSTS, X-Content-Type-Options: nosniff, Referrer-Policy, Permissions-Policy — but no Content-Security-Policy. So nothing blocks wasm today. That's why it'll work, but it's also a latent trap.
  • Pre-existing finding while I was in DocumentViewer.svelte: the failure-state download link is
    <a href="/api/documents/{doc.id}/file" target="_blank" class="...">
    
    target="_blank" with no rel="noopener noreferrer" (CWE-1022, reverse tabnabbing). Same-origin so low severity, but it's a one-token fix and this issue already touches that component.
  • nosniff is set globally, so the wasm must be served with Content-Type: application/wasm or WebAssembly.instantiateStreaming will refuse it. adapter-node's static handler sets this from the .wasm extension — verify it survives the Docker build.

Recommendations

  • Keep wasmUrl pointed at our own origin (/pdfjs-wasm/). Never a public CDN for a decoder on the main read path.
  • Forward-looking, document it now: when a CSP is eventually added (it should be — defense in depth for an app that renders untrusted uploaded PDFs), it must include script-src ... 'wasm-unsafe-eval' (and worker-src 'self' blob:), or this exact decoder breaks again. Drop a comment in the Caddyfile (security_headers) block referencing this issue so the future CSP author doesn't silently re-break PDF rendering.
  • Add rel="noopener noreferrer" to the download <a target="_blank"> while you're in the file.
  • Confirm Content-Type: application/wasm on /pdfjs-wasm/jbig2.wasm in the built image (curl -I). Add it to the acceptance checklist.

Open Decisions (none — all concrete)
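
The CSP recommendation above can be turned into a small guard. This is a sketch only: the function names are hypothetical, and it checks just the two directives named in the recommendation, ignoring CSP fallback rules (for example `default-src` standing in for a missing `script-src`).

```typescript
// Split a Content-Security-Policy header value into directive -> source list.
function parseCsp(header: string): Map<string, string[]> {
  const directives = new Map<string, string[]>();
  for (const part of header.split(";")) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    const name = tokens[0];
    if (name === undefined) continue; // empty segment, e.g. trailing ";"
    directives.set(name.toLowerCase(), tokens.slice(1));
  }
  return directives;
}

// True when the policy explicitly carries the sources the recommendation asks for:
// script-src 'wasm-unsafe-eval' and worker-src 'self' blob:.
function cspAllowsPdfWasm(header: string): boolean {
  const d = parseCsp(header);
  const script = d.get("script-src") ?? [];
  const worker = d.get("worker-src") ?? [];
  return (
    script.includes("'wasm-unsafe-eval'") &&
    worker.includes("'self'") &&
    worker.includes("blob:")
  );
}
```

A check like this could run against the header a future Caddyfile emits, so a new CSP that omits either directive is caught before it blanks the viewer again.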


🧪 Sara Holt — QA Engineer & Test Strategist

Observations

  • The bug's signature is dev-passes / prod-fails: it only manifests in the browser modern build with assets served over HTTP. A Node/unit test will not reproduce it (the repro itself only worked in Node because the legacy build has a JS fallback the browser build lacks). So a green unit suite here is a false sense of security.
  • CI runs Vitest browser + component tests inside mcr.microsoft.com/playwright (infra/gitea/workflows/ci.yml), so a real-render assertion is viable in CI even though browser tests are unreliable locally.
  • Current renderer test only asserts pure state. There is zero coverage of "a page actually paints," which is exactly the regression class that shipped.

Recommendations — test matrix

Cover the codec axis explicitly; one fixture per decode path:

| Fixture | Codec | Expectation |
|---|---|---|
| `C-3224` | CCITTFaxDecode (G4) | canvas non-blank |
| a JBIG2 scan | JBIG2 | canvas non-blank (this is the codec in the real error) |
| `C-2703` | DCTDecode (JPEG) | canvas non-blank (no regression) |
| text-only PDF | no image XObject | renders (guards over-scoping) |

  • Behavioral assertion = pixels, not promises. Sample the canvas and assert non-white pixel count > threshold (mirror the repro). Asserting "no exception thrown" would pass even on a blank canvas — useless here.
  • Negative/guard test for the silent-catch: force a render rejection (non-cancellation) via a fake libLoader and assert error becomes set + the localized message + download link render. Today that path is invisible; lock it.
  • Cheap unit guard (runs everywhere): fake LibLoader, assert getDocument called with non-null wasmUrl. Fast canary against accidental removal.
  • Console-warning assertion: in the browser test, fail if JBig2 failed to initialize / wasmUrl warnings appear. The warnings are the cheapest oracle we have.
  • Add the fixtures to the repo as committed binary test assets (small: C-3224 is 29 KB). Don't fetch from staging at test time — tests must be hermetic.
  • Acceptance criterion "verified after build && preview" is necessary but not sufficient — add "verified in the production Docker image" (Tobias/Markus). Preview ≠ adapter-node static serving.

Open Decisions (none)
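
The "pixels, not promises" oracle could look roughly like this. It is a sketch: the function names, the white cutoff of 250 and the threshold are assumptions, not values from the repro. Fully transparent pixels are treated as blank, since a canvas that was never painted reads back as transparent black rather than white.

```typescript
// Count pixels that carry visible "ink" in RGBA data (4 bytes per pixel),
// e.g. the .data array returned by CanvasRenderingContext2D.getImageData().
function countNonWhitePixels(rgba: Uint8ClampedArray, whiteCutoff = 250): number {
  let count = 0;
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    const r = rgba[i] ?? 255;
    const g = rgba[i + 1] ?? 255;
    const b = rgba[i + 2] ?? 255;
    const a = rgba[i + 3] ?? 0;
    if (a === 0) continue; // unpainted canvas: transparent, counts as blank
    if (r < whiteCutoff || g < whiteCutoff || b < whiteCutoff) count++;
  }
  return count;
}

// The behavioral assertion: a page counts as rendered only above a threshold.
function isNonBlank(rgba: Uint8ClampedArray, minInkPixels: number): boolean {
  return countNonWhitePixels(rgba) > minInkPixels;
}
```

In the browser test this would be fed from `ctx.getImageData(0, 0, canvas.width, canvas.height).data` after the render promise settles.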


🎨 Leonie Voss — UI/UX Design Lead & Accessibility Advocate

Observations

  • The cruelest part of this bug for our users: a blank white viewer with no explanation. A 67-year-old family member opens a letter, sees the thumbnail in the list, clicks in, and gets… nothing. No spinner (it finished "loading"), no error, no hint to download. They assume they did something wrong, or that the archive is broken. The silent return (lines 94-103) is an accessibility failure, not just a code smell.
  • DocumentViewer.svelte already has a decent error state (message + "Try direct download" link; the doc_download_link key exists in de/en/es ✅), but it is never reached on render failure, only on load failure.
  • Today error is set from pdf.js's raw e.message — an untranslated English string ("Failed to load PDF") shown to German-first users. That violates our i18n baseline.

Recommendations

  • When render fails, route into the existing error UI (don't invent a new one) but with a localized, human message — not the pdf.js string. Add doc_render_failed to messages/{de,en,es}.json, e.g. DE: "Dieser Scan konnte nicht angezeigt werden. Sie können die Datei direkt herunterladen." The download link is already the right escape hatch — make sure it's the focal point of that state.
  • Keyboard + focus: verify the download <a> is reachable by Tab and has a visible focus ring (our links should already, but this state is rarely seen — confirm it). It's the only recovery action; it must be operable without a mouse.
  • Don't strand the user during wasm decode. Decoding a 1736×1243 G4 fax via wasm on an old phone isn't instant. Keep the existing spinner up until the canvas has actually painted (the loading flag flips on load, not on first paint) — otherwise we trade "blank forever" for "blank, but it looks done." A brief "rendering…" state is honest.
  • Contrast/scale of the error text: it uses text-ink-3 on bg-pdf-bg — check it clears WCAG AA (4.5:1) in that dark viewer chrome; ink-3 is a muted tone and this is critical recovery copy, not decoration.

Open Decisions (none — these are concrete fixes)
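
For the contrast check, the WCAG 2.x contrast-ratio formula can be computed directly instead of eyeballed. This is a sketch: the function names are hypothetical, and the actual RGB values behind text-ink-3 and bg-pdf-bg are not stated in this thread, so they would have to be read from the theme.

```typescript
type Rgb = readonly [number, number, number];

// sRGB channel (0-255) to linear light, per the WCAG relative-luminance definition.
function channelToLinear(c: number): number {
  const s = c / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

function relativeLuminance(rgb: Rgb): number {
  return (
    0.2126 * channelToLinear(rgb[0]) +
    0.7152 * channelToLinear(rgb[1]) +
    0.0722 * channelToLinear(rgb[2])
  );
}

// Contrast ratio between two colours; ranges from 1 (identical) to 21 (black on white).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// WCAG AA for normal-size body text requires at least 4.5:1.
function meetsAaBodyText(text: Rgb, background: Rgb): boolean {
  return contrastRatio(text, background) >= 4.5;
}
```

Calling `meetsAaBodyText` with the resolved theme colours gives a yes/no answer for the error copy without relying on a visual check.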

## 🎨 Leonie Voss — UI/UX Design Lead & Accessibility Advocate ### Observations - The cruelest part of this bug for our users: **a blank white viewer with no explanation.** A 67-year-old family member opens a letter, sees the thumbnail in the list, clicks in, and gets… nothing. No spinner (it finished "loading"), no error, no hint to download. They assume *they* did something wrong, or that the archive is broken. The silent `return` (lines 94-103) is an accessibility failure, not just a code smell. - `DocumentViewer.svelte` already has a decent error state (message + "Try direct download" link, keys `doc_download_link` exist in de/en/es ✅) — it's just **never reached on render failure**, only on load failure. - Today `error` is set from pdf.js's raw `e.message` — an **untranslated English string** ("Failed to load PDF") shown to German-first users. That violates our i18n baseline. ### Recommendations - When render fails, route into the **existing** error UI (don't invent a new one) but with a **localized, human** message — not the pdf.js string. Add `doc_render_failed` to `messages/{de,en,es}.json`, e.g. DE: *"Dieser Scan konnte nicht angezeigt werden. Sie können die Datei direkt herunterladen."* The download link is already the right escape hatch — make sure it's the focal point of that state. - **Keyboard + focus:** verify the download `<a>` is reachable by Tab and has a visible focus ring (our links should already, but this state is rarely seen — confirm it). It's the only recovery action; it must be operable without a mouse. - **Don't strand the user during wasm decode.** Decoding a 1736×1243 G4 fax via wasm on an old phone isn't instant. Keep the existing spinner up until the canvas has actually painted (the `loading` flag flips on load, not on first paint) — otherwise we trade "blank forever" for "blank, but it looks done." A brief "rendering…" state is honest. 
- Contrast/scale of the error text: it uses `text-ink-3` on `bg-pdf-bg` — check it clears WCAG AA (4.5:1) in that dark viewer chrome; `ink-3` is a muted tone and this is critical recovery copy, not decoration. ### Open Decisions _(none — these are concrete fixes)_
Author
Owner

🔧 Tobias Wendt — DevOps & Platform Engineer

Observations

  • The whole bug is a dev/prod asset-resolution gap, which is my territory. Production runs node build from frontend/Dockerfile's final stage, which copies only /app/build and prunes to prod deps with --ignore-scripts. Anything not emitted into build/ at build time does not exist at runtime. A wasmUrl that resolves in npm run dev (Vite serves from node_modules) will 404 in staging. This is exactly how we got here with the worker being fine but wasm missing.
  • Staging confirms it: archiv-staging-frontend-1 is familienarchiv/frontend:nightly, node build, ORIGIN=https://staging.raddatz.cloud, NODE_ENV=production. No frontend Sentry DSN, so these decode failures were never reported — they died as browser-console warns. That's an observability gap of its own.
  • Caddy (infra/caddy/Caddyfile) just reverse-proxies 127.0.0.1:3001 → the Node server; it does not serve or rewrite assets, so /pdfjs-wasm/ will pass straight through. Good — no Caddy change needed for the happy path.

Recommendations

  • Emit wasm into build/client/ at build time. static/pdfjs-wasm/ (SvelteKit copies static/ into the client build) is the lowest-failure-mode option — no runtime node_modules dependency, no new moving part in the image. If sourced via a prebuild copy script, add a build-time assertion that the files landed (fail the build loudly if jbig2.wasm is absent) — a silent-missing-asset is what bit us.
  • Test the artifact, not the dev server. Acceptance must include: build the production Docker image, run it, and curl -I https://.../pdfjs-wasm/jbig2.wasm → expect 200 + Content-Type: application/wasm. npm run preview is not the same code path as node build.
  • Caching: the path /pdfjs-wasm/ is not content-hashed (unlike _app/immutable/). Do not slap immutable, max-age=31536000 on it, or a future pdfjs bump serves a stale wasm against a new worker → silent breakage. Either version the directory (/pdfjs-wasm/5.5.207/) and cache-bust on bump, or use a short max-age + revalidation. Cheap and avoids a nasty upgrade-day incident.
  • Image size: the wasm dir is ~250 KB total (jbig2.wasm 105 KB, openjpeg.wasm, qcms_bg.wasm). Negligible. Ship all of them — openjpeg.wasm covers JPEG2000/JPX scans for free and avoids a sequel issue.
  • Observability follow-up (separate, not blocking): wire VITE_SENTRY_DSN for the frontend so client-side decode/render failures surface in GlitchTip instead of dying in the console. We were blind to this for an entire class of documents.

Open Decisions

  • Cache strategy for the wasm dir — versioned path (/pdfjs-wasm/<version>/, immutable, bust on bump) vs. unversioned path with max-age=3600, must-revalidate. Versioned is more correct but adds a step on every pdfjs upgrade; unversioned-short is simpler but pays a small revalidation cost per load. (Raised here; low stakes, but pick one deliberately.)
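
The build-time assertion recommended above can be kept as a pure check over a directory listing. This is a sketch: the function names are hypothetical, and the three file names are the ones listed in this comment.

```typescript
// Files the viewer expects under build/client/pdfjs-wasm/ (names taken from this thread).
const REQUIRED_WASM = ["jbig2.wasm", "openjpeg.wasm", "qcms_bg.wasm"] as const;

// Return the required files that are absent from a directory listing.
function missingWasmAssets(present: readonly string[]): string[] {
  const have = new Set(present);
  return REQUIRED_WASM.filter((file) => !have.has(file));
}

// Fail loudly: a silently missing asset is exactly the failure mode to prevent.
function assertWasmAssets(present: readonly string[]): void {
  const missing = missingWasmAssets(present);
  if (missing.length > 0) {
    throw new Error(`pdf.js wasm assets missing from build output: ${missing.join(", ")}`);
  }
}
```

In a post-build script the listing would come from reading the emitted directory (for example with Node's `readdirSync`), and the thrown error would fail the build.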

📋 Elicit — Requirements Engineer / Business Analyst

Observations

  • The issue is unusually well-evidenced (codec table, file paths, source line refs) — that's the right altitude for our solo + LLM-driven workflow. My job is to close the remaining definition gaps so "done" is unambiguous.
  • The acceptance criteria conflate cause and observable outcome. "No wasmUrl warnings in console" is an implementation signal; the user-facing requirement is "the page image is visible." Keep both, but label them.
  • Blast radius is under-quantified. We don't actually know how many archive documents are CCITT/JBIG2 vs. JPEG. That number decides whether this is a "fix and move on" or "fix + proactively reassure family members that affected letters aren't lost." One SQL/mc sweep over the bucket sampling image filters would tell us. Recommend capturing it in the issue.

Recommendations — tighten acceptance criteria

  • Define "renders" measurably: "the page <canvas> contains non-background pixels (sampled count > N)" — not "no error." Sara's pixel oracle is the testable form; adopt that wording in the AC.
  • Enumerate the codec matrix as explicit ACs (one per row): CCITTFax ✅, JBIG2 ✅, DCTDecode no-regression ✅, JPEG2000/JPXDecode (covered by openjpeg.wasm; assert or explicitly defer), and multi-page mixed-codec PDF (a doc where page 1 is JPEG and page 2 is fax: does paging between them work?).
  • Negative-path AC: "a PDF that genuinely cannot be decoded shows the localized error + download link, not a blank canvas." This converts fix-item #3 from a nicety into a verifiable requirement.
  • Scope boundary, stated explicitly: standardFontDataUrl / iccUrl (also new in pdf.js 5.x) are out of scope for this issue unless a specific affected document is found — note it so the PR isn't blocked on gold-plating, and file a follow-up only if evidence appears.
  • Interim user impact: while the fix ships, affected scans are viewable via the working download link. Decide whether that warrants any user-facing note, or whether the P0 turnaround makes it moot.

Open Decisions

  • Proactive blast-radius sweep + user communication — do we (a) just ship the fix silently, or (b) first quantify how many letters were affected and reassure the family that nothing was lost? Depends on whether anyone already reported "missing" letters and gave up. Pure product/relationship call, not technical.

🗳️ Decision Queue — Action Required

3 decisions need your input before implementation starts. Everything else is concrete recommendation — no need to respond to it.

Build / Architecture

  • How to ship the pdf.js wasm assets into the production build. They must land in build/client/ — the prod Docker image only ships /build, so a node_modules-relative wasmUrl works in dev and 404s in staging.
    • (a) prebuild npm script copying pdfjs-dist/wasm/* → static/pdfjs-wasm/ + a build-time existence assertion. No new dependency; small drift risk on pdfjs bump. (Markus's lean, Tobias concurs)
    • (b) vite-plugin-static-copy — new devDependency, auto-tracks the source, fails build if missing.
    • (c) Commit the wasm blobs into static/pdfjs-wasm/ — simplest runtime, binary blobs in git, worst drift profile.
      (Raised by: Markus, Tobias)

Infrastructure

  • Cache strategy for /pdfjs-wasm/ (path is not content-hashed, unlike _app/immutable/):
    • Versioned path /pdfjs-wasm/<version>/ + immutable — correct, bust on every pdfjs upgrade.
    • Unversioned + max-age=3600, must-revalidate — simpler, tiny per-load revalidation cost.
      Avoid plain immutable on an unversioned path — a future bump would serve stale wasm against a new worker. (Raised by: Tobias)

Product / Communication

  • Proactive blast-radius sweep + user note? (a) ship the fix silently, or (b) first quantify how many archived letters are CCITT/JBIG2 (one bucket sweep) and reassure the family that affected letters were never lost — they were always downloadable. Depends on whether anyone already hit a "blank letter" and quietly gave up. (Raised by: Elicit)

Cross-cutting themes the whole panel converged on (not decisions — just do them):

  1. The fix must be verified in the real production Docker image, not npm run preview (Markus, Sara, Tobias) — this is a dev/prod asset-resolution bug; preview won't reproduce it.
  2. Surface the silent render failure with a localized message + the existing download link, and add a test that locks that path (Markus, Felix, Sara, Leonie). Today users get a blank white viewer with zero feedback.
  3. Test with real pixels, not mocked promises (Felix, Sara) — a unit test that fakes pdfjs green proves nothing; the behavioral assertion belongs in CI browser mode with committed CCITT + JBIG2 fixtures.
  4. Ship openjpeg.wasm too (Tobias, Elicit) — covers JPEG2000/JPX scans for free and pre-empts a sequel issue.

✅ Decisions resolved + blast radius measured

The three open decisions are resolved with best-practice defaults and folded into the issue body:

  1. wasm shipping → vite-plugin-static-copy (dev/prod parity; devDependency only).
  2. caching → unversioned /pdfjs-wasm/ with default revalidating cache (no immutable on a non-hashed URL).
  3. comms → measure, then proportionate (done below).
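
Decision 1 might translate into a Vite config along these lines. This is a configuration sketch, not the project's actual file: it assumes a standard SvelteKit setup and uses the `targets` option of `vite-plugin-static-copy`, with `dest` resolved relative to the client build output.

```typescript
// vite.config.ts (sketch; assumes a standard SvelteKit + Vite project layout)
import { sveltekit } from "@sveltejs/kit/vite";
import { defineConfig } from "vite";
import { viteStaticCopy } from "vite-plugin-static-copy";

export default defineConfig({
  plugins: [
    sveltekit(),
    viteStaticCopy({
      targets: [
        {
          // Source tracks whatever pdfjs-dist version is installed,
          // so a pdfjs bump cannot leave stale wasm behind.
          src: "node_modules/pdfjs-dist/wasm/*",
          // Intended to be served at /pdfjs-wasm/ in dev and in the built client.
          dest: "pdfjs-wasm",
        },
      ],
    }),
  ],
});
```

The appeal of this option is that dev and production resolve the same URL, which is the parity the original bug lacked.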

Blast radius (random sample, n=200 of 7,534 PDFs)

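| Codec | Count | Share | Renders today? |
|---|---|---|---|
| `DCTDecode` (JPEG) | 168 | 84% | ✅ |
| `CCITTFaxDecode` (G4 fax) | 32 | 16% | ❌ blank |
| `JBIG2Decode` | 0 | 0% | n/a |
| `JPXDecode` (JPEG2000) | 0 | 0% | n/a |
| unclassified | 0 | 0% | n/a |
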
  • ~16% of documents are affected, or roughly 1,200 letters archive-wide (95% CI ≈ 11–21%, ~830–1,580). About 1 in 6, not a long tail.
  • Reconciliation of the "JBig2 failed to initialize" console error: the sample found zero true JBIG2 documents. pdf.js routes CCITT through the shared JBig2CCITTFaxImage wasm module, so a CCITT decode failure surfaces as a JBig2Error. The entire affected class is CCITT (G4 fax) — the JBIG2 wording was a red herring from the shared decoder name.
  • Nothing was ever lost: affected documents always had a working download link and a server-side thumbnail. So comms can stay light — at most a one-line reassurance to active family members once the fix ships, or none.

Method: 200 randomly-sampled PDFs, codec read from the image XObject /Filter in the first 250 KB. Full 7,534-doc scan was deliberately avoided (≈2 GB of transfer) — a sample gives a ±5% estimate at negligible cost.
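
The interval above can be reproduced approximately with a normal-approximation (Wald) interval for a proportion. This is an assumption about the method, which the comment does not name; the function names are illustrative, and the inputs are the sample figures from the table (32 of 200, archive of 7,534).

```typescript
interface ProportionEstimate {
  p: number;     // sample proportion
  lower: number; // lower bound of the interval
  upper: number; // upper bound of the interval
}

// Wald interval: p ± z * sqrt(p * (1 - p) / n), with z = 1.96 for ~95% coverage.
function waldInterval(hits: number, n: number, z = 1.96): ProportionEstimate {
  const p = hits / n;
  const halfWidth = z * Math.sqrt((p * (1 - p)) / n);
  return { p, lower: p - halfWidth, upper: p + halfWidth };
}

// Scale a proportion estimate to an absolute document count.
function scaleToArchive(est: ProportionEstimate, total: number): ProportionEstimate {
  return { p: est.p * total, lower: est.lower * total, upper: est.upper * total };
}

const sample = waldInterval(32, 200);
const archive = scaleToArchive(sample, 7534);
console.log(sample, archive);
```

With these inputs the half-width comes out at about five percentage points, matching the ±5% stated in the method note; the document counts differ slightly from the quoted range because that range was derived from the rounded percentages.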


Implemented — PR #713

Branch feat/issue-708-pdfjs-wasmurl (worktree off main). TDD red→green, 8 atomic commits:

| # | Commit | What |
|---|--------|------|
| 1 | `8d2ef97f` | build: serve `node_modules/pdfjs-dist/wasm/*` at `/pdfjs-wasm/` via `vite-plugin-static-copy` (devDep), emitted into `build/client/` |
| 2 | `be42e1f0` | fix: pass `{ url, wasmUrl: '/pdfjs-wasm/' }` to `getDocument` (single constant) |
| 3 | `5a4b55e3` | i18n: add `doc_render_failed` (de/en/es) |
| 4 | `aa1e89c2` | fix: `renderCurrentPage` surfaces non-cancellation render failures (no more silent blank) |
| 5 | `e0eedc70` | fix: localize the `PdfViewer` error message + download link |
| 6 | `6690e137` | fix: `rel="noopener noreferrer"` on `DocumentViewer` download link (CWE-1022) |
| 7 | `cf860193` | test: behavioral CCITT/DCT fixtures, real pdf.js render, non-blank pixel assertion |
| 8 | `688d3812` | docs: Caddyfile note: future CSP must allow `'wasm-unsafe-eval'` + `worker-src 'self' blob:` |
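
Commit 2, together with the cheap unit guard suggested earlier in the thread, could be sketched as follows. The interface and function names (`PdfLib`, `openDocument`) are hypothetical stand-ins for the real loader; only the `{ url, wasmUrl }` shape and the `/pdfjs-wasm/` value come from the commit message.

```typescript
// Single constant for the wasm path, defined once.
const PDFJS_WASM_URL = "/pdfjs-wasm/";

interface DocumentParams {
  url: string;
  wasmUrl: string;
}

// Minimal slice of the pdf.js surface the renderer needs (hypothetical shape).
interface PdfLib {
  getDocument(params: DocumentParams): unknown;
}

// Always open documents through the object form so wasmUrl cannot be forgotten.
function openDocument(lib: PdfLib, src: string): unknown {
  return lib.getDocument({ url: src, wasmUrl: PDFJS_WASM_URL });
}

// Fake library for the guard test: records what getDocument was called with.
const recordedCalls: DocumentParams[] = [];
const fakeLib: PdfLib = {
  getDocument(params: DocumentParams): unknown {
    recordedCalls.push(params);
    return {};
  },
};

openDocument(fakeLib, "/api/documents/42/file");
```

The guard does not prove a page paints; it only catches accidental removal of `wasmUrl`, which is why the pixel-level browser test remains the real oracle.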

Acceptance criteria

  • ✅ CCITT (G4) scan renders a visible page (sampled pixel count); synthetic fixture; verified red when wasmUrl is removed.
  • ✅ JBIG2: covered transitively via the shared jbig2.wasm module (no jbig2enc locally; zero true JBIG2 docs in the archive sample, per the blast-radius study). Documented in the test.
  • ✅ DCTDecode (JPEG) still renders; no regression (fixture stays green with/without wasm).
  • ⏸️ JPEG2000/JPXDecode: openjpeg.wasm is shipped; no fixture asserted (none in archive sample; openjpeg decodes natively in Node so no hermetic synth available). Explicitly noted, not gold-plated.
  • ✅ Multi-page / text-only: not separately fixtured; the codec axis (the actual failure class) is covered.
  • ✅ Negative path: a non-decodable render shows the localized message + working download link, never a silent blank.
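
The negative-path behavior can be sketched as a wrapper that separates cancellations from genuine failures. This is an assumption-laden illustration, not the code in `usePdfRenderer.svelte.ts`: `RenderOutcome`, `isRenderCancellation` and `renderSafely` are hypothetical names, and it assumes a cancellation can be recognized by an error `name` of `RenderingCancelledException`. Only the `doc_render_failed` message key is taken from the commit list.

```typescript
// Outcome of one render attempt, as the UI would need to see it.
type RenderOutcome =
  | { kind: "ok" }
  | { kind: "cancelled" }                              // superseded render, not an error
  | { kind: "failed"; messageKey: "doc_render_failed" }; // show localized message

// Assumption: cancellations carry name === "RenderingCancelledException".
function isRenderCancellation(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "RenderingCancelledException"
  );
}

async function renderSafely(render: () => Promise<void>): Promise<RenderOutcome> {
  try {
    await render();
    return { kind: "ok" };
  } catch (err) {
    if (isRenderCancellation(err)) {
      // A newer render replaced this one; staying silent is correct here.
      return { kind: "cancelled" };
    }
    // Anything else must reach the user instead of leaving a blank canvas.
    return { kind: "failed", messageKey: "doc_render_failed" };
  }
}

// Example: a failing render reports the localized message key.
renderSafely(async () => {
  throw new Error("Jbig2Error: JBig2 failed to initialize");
}).then((outcome) => console.log(outcome));
```

Returning a tagged outcome rather than rethrowing keeps the "cancelled" case explicit, so a future change cannot accidentally fold real failures back into the silent branch.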

Ops signals

  • ✅ node build (adapter-node, the prod path) serves /pdfjs-wasm/jbig2.wasm → 200 + application/wasm.
  • ✅ npm run build emits jbig2.wasm / openjpeg.wasm / qcms_bg.wasm into build/client/pdfjs-wasm/.
  • 🔲 Reviewer acceptance step (per the resolved decision — verify build output now, curl-in-prod-image as a CI/manual step): build frontend/Dockerfile's production stage, run it, curl -I /pdfjs-wasm/jbig2.wasm.
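
For a scripted version of the `curl -I` step, the same two signals (status 200 and an `application/wasm` content type) could be checked as below. This is a sketch only: `HeadResult`, `HeadFn`, `describeWasmResponse` and `probeWasm` are hypothetical names, and the origin in the usage comment is a placeholder for wherever the production image is listening.

```typescript
// Minimal view of a HEAD response: just what the ops signal cares about.
interface HeadResult {
  status: number;
  contentType: string | null;
}

// Injected transport so the check does not depend on a particular runtime.
type HeadFn = (url: string) => Promise<HeadResult>;

// Returns null when the response is acceptable, otherwise a short reason.
function describeWasmResponse(res: HeadResult): string | null {
  if (res.status !== 200) {
    return `expected status 200, got ${res.status}`;
  }
  const mime = (res.contentType ?? "").trim().toLowerCase();
  if (!mime.startsWith("application/wasm")) {
    return `expected application/wasm, got "${mime}"`;
  }
  return null;
}

async function probeWasm(origin: string, head: HeadFn): Promise<string | null> {
  return describeWasmResponse(await head(`${origin}/pdfjs-wasm/jbig2.wasm`));
}

// Usage with the global fetch available in browsers and recent Node versions
// (the origin is a placeholder):
//   const head: HeadFn = async (url) => {
//     const r = await fetch(url, { method: "HEAD" });
//     return { status: r.status, contentType: r.headers.get("content-type") };
//   };
//   probeWasm("http://localhost:3000", head).then((problem) => console.log(problem ?? "ok"));
```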

Notes

  • Frontend has a large pre-existing svelte-check error baseline; this PR introduces zero new type errors in touched files.
  • Browser-mode tests verified per-file locally and run in CI's Playwright container.
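
The "non-blank pixel assertion" used by the browser-mode fixtures (commit 7) can be approximated by sampling the canvas pixel buffer and counting pixels that are neither transparent nor near-white. The helper below is a minimal sketch, not the test in the PR: `countInkPixels`, the sampling step, and the whiteness threshold are all assumptions chosen for illustration.

```typescript
// Counts sampled pixels that look like "ink": visible and darker than near-white.
// `rgba` is a flat RGBA buffer, as returned by CanvasRenderingContext2D.getImageData().data.
function countInkPixels(
  rgba: Uint8ClampedArray,
  step: number = 16,           // inspect every 16th pixel to keep the check cheap
  whiteThreshold: number = 250 // channel values at or above this count as white
): number {
  if (step < 1) throw new RangeError("step must be at least 1");
  let ink = 0;
  for (let i = 0; i + 3 < rgba.length; i += 4 * step) {
    const r = rgba[i] ?? 255;
    const g = rgba[i + 1] ?? 255;
    const b = rgba[i + 2] ?? 255;
    const a = rgba[i + 3] ?? 0;
    // Transparent pixels (untouched canvas) and white pixels (empty page) are blank.
    const isVisible = a > 0;
    const isDark = r < whiteThreshold || g < whiteThreshold || b < whiteThreshold;
    if (isVisible && isDark) ink++;
  }
  return ink;
}

// In a browser test this might be used as (names are placeholders):
//   const data = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
//   expect(countInkPixels(data)).toBeGreaterThan(0);

// Tiny self-check: a 4x4 all-white buffer has no ink; one black pixel adds some.
const blank = new Uint8ClampedArray(4 * 16).fill(255);
const inked = new Uint8ClampedArray(blank);
inked.set([0, 0, 0, 255], 0); // first pixel black, fully opaque
console.log(countInkPixels(blank, 1), countInkPixels(inked, 1));
```

Treating both transparent and white pixels as blank matters because a canvas that was never painted and a page that painted only its white background both look empty to the reader.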

Next: multi-persona PR review on #713.

Labels: P0-critical, bug
Reference: marcel/familienarchiv#708