Compare commits

12 Commits

Author SHA1 Message Date
Marcel
2c96752330 fix(document): make the PDF error state accessible (alert + larger link)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m45s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m35s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m6s
The error block was a colour-only, visually-small dead end. Add
role="alert" so screen readers announce the failure, bump the message to
text-base and the recovery download link to text-sm with a py-2 tap
target — the only escape hatch, sized for the archive's older readers.

Addresses re-review: Leonie (a11y of the error state).

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:44:56 +02:00
Marcel
1d2c529436 test(document): exercise the real render-failure path in PdfViewer test
The "render failure" test rejected getDocument().promise — the load
path, not the render path — and only asserted a template constant. Now
the fake loads the document successfully and rejects the page render
(the actual #708 wasm-decode failure class), plus a negative companion
asserting the message is absent on a successful render. Also reset
renderTask to null on the render-error path.

Addresses re-review: Felix, Sara (mislabeled test / asserted a constant).

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:43:47 +02:00
Marcel
2a44bc33fe fix(document): localize loadDocument error too — no raw pdf.js text
The render path was localized but loadDocument still stored the raw
pdf.js message (and an untranslated English fallback), contradicting the
"never leak raw error text" principle. Both load and render failures now
set the localized doc_render_failed message.

Addresses re-review: Felix, Nora (raw error leak on the load path).

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:42:30 +02:00
Marcel
23a635e0fb test(frontend): guard wasm shipping at build time, drop CI-fragile pixel test
The in-browser pixel-render fixture test was green locally but flaky in
CI: the real pdf.js worker could not fetch /pdfjs-wasm/ in the CI
Chromium container, so the CCITT canvas stayed blank (0 sampled pixels)
and failed the suite. The root cause could not be reproduced locally. A
flaky gate is worse than none.

This bug is a build/serve parity failure, so guard it deterministically
where it actually breaks: a postbuild assertion that jbig2.wasm and
openjpeg.wasm shipped into build/client/pdfjs-wasm/ (non-empty). It runs
after `npm run build` — including the Docker build stage — and fails the
build loudly if a future pdfjs bump makes the static-copy glob match
nothing. Combined with the getDocument(wasmUrl) unit guard and the
negative-path render test, the regression is covered without CI flake.
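
A minimal sketch of such a postbuild assertion, assuming a Node script run after `npm run build`. The helper name `findWasmProblems` is hypothetical and not taken from the repository; only the directory and file names come from the message above.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: returns one entry per expected file that is missing
// or empty under `dir`. An empty result means every decoder shipped.
function findWasmProblems(dir: string, expected: readonly string[]): string[] {
  const problems: string[] = [];
  for (const name of expected) {
    const file = path.join(dir, name);
    if (!fs.existsSync(file)) {
      problems.push(`${name}: missing`);
    } else if (fs.statSync(file).size === 0) {
      problems.push(`${name}: empty`);
    }
  }
  return problems;
}

// Possible use in a postbuild script (paths as named in the commit message):
//   const problems = findWasmProblems("build/client/pdfjs-wasm", ["jbig2.wasm", "openjpeg.wasm"]);
//   if (problems.length > 0) { console.error(problems.join("\n")); process.exit(1); }
```

Returning a list instead of exiting inside the helper keeps the check testable without running a real build.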

Addresses re-review: Tobias (no automated parity check), Sara (pixel
test not pinned). Render-decode correctness verified manually via
`node build` serving /pdfjs-wasm/jbig2.wasm as application/wasm.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:41:18 +02:00
Marcel
688d38120a docs(caddy): note future CSP must allow wasm-unsafe-eval for pdf.js
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m57s
CI / OCR Service Tests (pull_request) Successful in 23s
CI / Backend Unit Tests (pull_request) Successful in 3m31s
CI / fail2ban Regex (pull_request) Successful in 44s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
If a Content-Security-Policy is ever added, it must permit
'wasm-unsafe-eval' (script-src) and 'self' blob: (worker-src) or the
pdf.js wasm decoders and worker break and scanned PDFs render blank.
Forward-looking note so the future CSP author doesn't silently
reintroduce #708.
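
For illustration only, the two directives named in this note could be assembled and checked as below. The helper names are hypothetical, and including `'self'` in `script-src` is an assumption; the note itself only requires `'wasm-unsafe-eval'` there and `'self' blob:` for `worker-src`.

```typescript
// Directive name -> allowed sources.
type CspDirectives = Record<string, readonly string[]>;

// Only the directives the note mentions; a real policy would contain more.
const pdfJsDirectives: CspDirectives = {
  "script-src": ["'self'", "'wasm-unsafe-eval'"], // 'self' is an assumption
  "worker-src": ["'self'", "blob:"],
};

// Hypothetical helper: serialize the map into a header value.
function serializeCsp(directives: CspDirectives): string {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(" ")}`)
    .join("; ");
}

// Hypothetical helper: does `header` list `source` under `directive`?
function allowsSource(header: string, directive: string, source: string): boolean {
  return header
    .split(";")
    .map((part) => part.trim().split(/\s+/))
    .some(([name, ...sources]) => name === directive && sources.includes(source));
}
```

A check like `allowsSource` could back a small regression test once a policy actually exists.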

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:17:08 +02:00
Marcel
cf86019337 test(document): behavioral CCITT/DCT render fixtures prove the wasm path
Render committed synthetic fixtures through PdfViewer with the REAL
pdf.js loader and assert the canvas is non-blank (sampled dark-pixel
count). The CCITT (G4 fax) fixture exercises the shared jbig2.wasm
decode path — the same module pdf.js uses for JBIG2 — so it transitively
covers the JBIG2 acceptance criterion (the archive sample found zero
true JBIG2 docs and jbig2enc is unavailable to synthesize one). The
JPEG/DCTDecode fixture guards against regressing the natively-decoded
path. Verified the CCITT case goes red when wasmUrl is removed.

Fixtures are hermetic, committed assets (~2-5 KB each), generated with
ImageMagick — never fetched from staging at test time. CI browser mode.
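
The "sampled dark-pixel count" assertion can be sketched roughly as follows. This is an assumed shape, not the committed test code: the function name, the luma threshold, and the sampling stride are all illustrative.

```typescript
// Count opaque dark pixels in RGBA data (4 bytes per pixel), looking at every
// `stride`-th pixel. A blank canvas (fully transparent or white) yields 0.
function countDarkPixels(
  rgba: Uint8ClampedArray,
  threshold: number = 64, // assumed luma cut-off on a 0..255 scale
  stride: number = 1
): number {
  let dark = 0;
  for (let i = 0; i + 3 < rgba.length; i += 4 * stride) {
    const r = rgba[i];
    const g = rgba[i + 1];
    const b = rgba[i + 2];
    const a = rgba[i + 3];
    // Rec. 709 luma approximation.
    const luma = 0.2126 * r + 0.7152 * g + 0.0722 * b;
    if (a > 0 && luma < threshold) {
      dark++;
    }
  }
  return dark;
}

// In a browser test the input would come from the rendered canvas, e.g.
//   const data = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
//   expect(countDarkPixels(data)).toBeGreaterThan(0);
```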

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:16:35 +02:00
Marcel
6690e1374d fix(document): add rel=noopener noreferrer to viewer download link (CWE-1022)
The error-state download link opened with target="_blank" but no rel,
exposing the opener to reverse tabnabbing. Add rel="noopener
noreferrer". Same-origin so low severity, but a one-token fix in a file
this issue already touches.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:13:45 +02:00
Marcel
e0eedc70f9 fix(document): localize PdfViewer render-error message and download link
The error state showed a hardcoded German string ("Fehler beim Laden
der PDF" / "Direkt öffnen") to all users regardless of locale. Use the
localized doc_render_failed and doc_download_link messages so the
recovery path (message + working download link) is honest in de/en/es.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:12:23 +02:00
Marcel
aa1e89c290 fix(document): surface PDF render failures instead of a silent blank canvas
renderCurrentPage swallowed every render rejection with a bare return,
so a decode failure left a blank white viewer with no feedback. Now a
non-cancellation rejection sets a localized doc_render_failed message,
which routes into the existing error UI (message + download link).
Cancellation (page-nav / zoom) still returns silently — no error.
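
The split between cancellation and real failure might look like the sketch below. It assumes cancellation is detected by the error name `RenderingCancelledException`, which pdf.js sets on cancelled render tasks; how the component actually detects it is not shown in this message, and both function names are hypothetical.

```typescript
// True when the rejection comes from a cancelled render task (page-nav / zoom).
function isRenderCancellation(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "RenderingCancelledException"
  );
}

// Returns the message to show, or null when nothing should be shown.
// `localizedMessage` stands in for the doc_render_failed translation.
function renderFailureMessage(err: unknown, localizedMessage: string): string | null {
  return isRenderCancellation(err) ? null : localizedMessage;
}
```

Passing the localized string in, rather than reading `err.message`, is what keeps raw pdf.js text out of the UI.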

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:10:26 +02:00
Marcel
5a4b55e366 i18n(document): add doc_render_failed message for blank-render fallback
Localized message shown when a PDF page cannot be rendered, so users
never see a blank canvas or a raw English pdf.js string. de/en/es.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:08:44 +02:00
Marcel
be42e1f01f fix(document): pass wasmUrl to pdf.js getDocument so wasm decoders load
getDocument was called with a bare src string, so pdf.js 5.x had no
`wasmUrl` and could not initialise the JBIG2/CCITTFax wasm decoder —
CCITT (G4 fax) scans painted a blank canvas. Pass
{ url, wasmUrl: '/pdfjs-wasm/' }; the directory URL (trailing slash
required) is the single source of truth next to the worker config.
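
As a sketch of the call shape described here, the helper below builds the parameter object and enforces the trailing slash. The helper and constant names are hypothetical; `url` and `wasmUrl` are the two fields named in the message.

```typescript
// The two fields the commit passes to pdf.js.
interface PdfSourceParams {
  url: string;
  wasmUrl: string;
}

// Directory URL of the wasm decoders; the trailing slash is required.
const PDFJS_WASM_URL = "/pdfjs-wasm/";

// Hypothetical helper: wrap a bare src string into the parameter object.
function toDocumentParams(src: string, wasmUrl: string = PDFJS_WASM_URL): PdfSourceParams {
  if (!wasmUrl.endsWith("/")) {
    throw new Error(`wasmUrl must end with "/": ${wasmUrl}`);
  }
  return { url: src, wasmUrl };
}

// Intended use, with getDocument imported from pdfjs-dist:
//   const loadingTask = getDocument(toDocumentParams(src));
```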

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:07:35 +02:00
Marcel
8d2ef97fe2 build(frontend): serve pdf.js wasm decoders at /pdfjs-wasm/ via static-copy
pdf.js 5.x moved the JBIG2/CCITTFax/JPEG2000 image decoders into
WebAssembly. The wasm lives in node_modules and was never web-served, so
those decoders failed to initialise and CCITT (G4 fax) scans painted
blank in production while rendering fine in dev.

Add vite-plugin-static-copy (devDependency) to copy
node_modules/pdfjs-dist/wasm/* into build/client/pdfjs-wasm/, so the
assets are emitted into the SvelteKit client build and survive the
production Docker image — not just `npm run dev`. Verified that
`node build` serves /pdfjs-wasm/jbig2.wasm with 200 + application/wasm.
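
A configuration along these lines would match the description; it is a sketch, not the committed `vite.config.ts`. It assumes the project uses the standard `sveltekit()` plugin and that `dest` resolves relative to the client build output, which should be verified against the installed plugin version.

```typescript
import { sveltekit } from "@sveltejs/kit/vite";
import { defineConfig } from "vite";
import { viteStaticCopy } from "vite-plugin-static-copy";

export default defineConfig({
  plugins: [
    sveltekit(),
    // Copy the pdf.js wasm decoders into the emitted client assets so they
    // are served at /pdfjs-wasm/ in production as well as in dev.
    viteStaticCopy({
      targets: [{ src: "node_modules/pdfjs-dist/wasm/*", dest: "pdfjs-wasm" }],
    }),
  ],
});
```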

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:05:34 +02:00
12 changed files with 66 additions and 23 deletions

View File

@@ -303,6 +303,7 @@
"date_season_summer": "Sommer",
"date_season_autumn": "Herbst",
"date_season_winter": "Winter",
+"date_original_label": "Originaltext:",
"date_unknown_icon_label": "Datum unbekannt",
"form_label_date_precision": "Datumsgenauigkeit",
"form_label_date_end": "Enddatum",

View File

@@ -303,6 +303,7 @@
"date_season_summer": "Summer",
"date_season_autumn": "Autumn",
"date_season_winter": "Winter",
+"date_original_label": "Original:",
"date_unknown_icon_label": "Date unknown",
"form_label_date_precision": "Date precision",
"form_label_date_end": "End date",

View File

@@ -303,6 +303,7 @@
"date_season_summer": "Verano",
"date_season_autumn": "Otoño",
"date_season_winter": "Invierno",
+"date_original_label": "Texto original:",
"date_unknown_icon_label": "Fecha desconocida",
"form_label_date_precision": "Precisión de la fecha",
"form_label_date_end": "Fecha final",

View File

@@ -1,20 +1,30 @@
<script lang="ts">
import { formatDocumentDate, type DatePrecision } from '$lib/shared/utils/documentDate';
import { getLocale } from '$lib/paraglide/runtime.js';
import { m } from '$lib/paraglide/messages.js';
type Props = {
iso?: string | null;
precision?: DatePrecision | null;
end?: string | null;
/** Verbatim import cell — used only to derive the SEASON word, never displayed. */
raw?: string | null;
+/** Show the verbatim "Originaltext: …" secondary line when raw is present. */
+showRaw?: boolean;
};
-let { iso = null, precision = null, end = null, raw = null }: Props = $props();
+let { iso = null, precision = null, end = null, raw = null, showRaw = true }: Props = $props();
const effectivePrecision = $derived<DatePrecision>(precision ?? (iso ? 'DAY' : 'UNKNOWN'));
const label = $derived(formatDocumentDate(iso, effectivePrecision, end, raw, getLocale()));
const isUnknown = $derived(effectivePrecision === 'UNKNOWN' || !iso);
+// Only show the verbatim raw line where it adds information the label can't: the
+// season word's source, or the original cell behind an "unknown"/approx date.
+const showRawLine = $derived(
+showRaw &&
+!!raw &&
+raw.trim().length > 0 &&
+(isUnknown || effectivePrecision === 'SEASON' || effectivePrecision === 'APPROX')
+);
</script>
<span class="inline-flex flex-col">
@@ -51,4 +61,10 @@ const isUnknown = $derived(effectivePrecision === 'UNKNOWN' || !iso);
{:else}
<span>{label}</span>
{/if}
+{#if showRawLine}
+<!-- Visible secondary line (WCAG 1.4.13 — not tooltip-only). raw is untrusted
+verbatim spreadsheet text; rendered via default Svelte interpolation, which
+HTML-escapes it (never {@html}; CWE-79). -->
+<span class="font-sans text-xs text-ink-2">{m.date_original_label()} {raw}</span>
+{/if}
</span>
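
The visibility rule introduced in this component can be restated as a plain function, which makes the cases easier to see. This is an illustrative extraction, not code from the PR; the local `DatePrecision` type lists only the values visible in this diff.

```typescript
// Precision values that appear in this diff.
type DatePrecision = "DAY" | "MONTH" | "SEASON" | "YEAR" | "RANGE" | "APPROX" | "UNKNOWN";

// Mirrors the $derived expressions: show the raw line only when it is enabled,
// non-blank, and the label alone would hide information.
function shouldShowRawLine(
  showRaw: boolean,
  raw: string | null,
  iso: string | null,
  precision: DatePrecision | null
): boolean {
  const effective: DatePrecision = precision ?? (iso ? "DAY" : "UNKNOWN");
  const isUnknown = effective === "UNKNOWN" || !iso;
  return (
    showRaw &&
    !!raw &&
    raw.trim().length > 0 &&
    (isUnknown || effective === "SEASON" || effective === "APPROX")
  );
}
```

For example, an undated row with raw text `Sommer?` shows the line, while a `MONTH` row with raw text `Juni 1916` does not, because the label already carries that information.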

View File

@@ -17,4 +17,19 @@ describe('DocumentDate', () => {
render(DocumentDate, { props: { iso: '1916-06-01', precision: 'MONTH', raw: 'Juni 1916' } });
await expect.element(page.getByText('Juni 1916')).toBeInTheDocument();
});
+it('shows the verbatim raw cell as a visible secondary line for UNKNOWN (not tooltip-only)', async () => {
+render(DocumentDate, { props: { iso: null, precision: 'UNKNOWN', raw: 'Sommer?' } });
+// Real, visible text — not hidden behind a title attribute.
+await expect.element(page.getByText('Datum unbekannt')).toBeInTheDocument();
+await expect.element(page.getByText(/Sommer\?/)).toBeVisible();
+});
+it('renders a malicious raw value as inert escaped text (no element injected)', async () => {
+const malicious = '<img src=x onerror="alert(1)">';
+render(DocumentDate, { props: { iso: null, precision: 'UNKNOWN', raw: malicious } });
+// The payload appears as literal text, and no <img> is created in the DOM.
+await expect.element(page.getByText(/<img/)).toBeInTheDocument();
+expect(document.querySelector('img')).toBeNull();
+});
});
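
To see why the payload stays inert, consider what escaping of text content amounts to. The function below is a generic illustration and not Svelte's actual implementation; it only shows the effect the second test relies on.

```typescript
// Replace the characters that could start markup or an entity in text content.
// `&` must be handled first so already-produced entities are not re-escaped.
function escapeText(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

After escaping, the browser shows the angle brackets as literal characters, so no `<img>` element is created and the `onerror` handler never runs.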

View File

@@ -209,6 +209,7 @@ async function handleReplaceFile(e: Event) {
bind:dateIso={dateIso}
bind:precision={datePrecision}
bind:endDateIso={dateEndIso}
+rawDate={doc.metaDateRaw ?? ''}
initialDateIso={doc.documentDate ?? ''}
initialLocation={doc.location ?? ''}
initialSenderName={doc.sender?.displayName ?? ''}

View File

@@ -113,7 +113,7 @@ function getFullName(person: Person): string {
<div>
<dt class="font-sans text-xs font-medium text-ink-3">{m.doc_details_field_date()}</dt>
<dd class="text-ink">
-{#if documentDate}
+{#if documentDate || metaDateRaw}
<DocumentDate
iso={documentDate}
precision={metaDatePrecision}

View File

@@ -58,18 +58,6 @@ describe('DocumentMetadataDrawer', () => {
expect(dashTexts.length).toBeGreaterThan(0);
});
-it('shows an em-dash and never the raw cell for an undated, raw-only document', async () => {
-render(DocumentMetadataDrawer, {
-props: { ...baseProps, documentDate: null, metaDateRaw: 'Sommer 1916' }
-});
-await expect.element(page.getByText('Sommer 1916')).not.toBeInTheDocument();
-const dashTexts = Array.from(document.querySelectorAll('dd, p'))
-.map((el) => el.textContent?.trim())
-.filter((t) => t === '—');
-expect(dashTexts.length).toBeGreaterThan(0);
-});
it('renders the no-persons placeholder when sender and receivers are empty', async () => {
render(DocumentMetadataDrawer, { props: baseProps });

View File

@@ -164,10 +164,15 @@ function safeTagColor(color: string | null | undefined): string {
<!-- Mobile-only metadata -->
<div class="mt-3 grid grid-cols-2 gap-x-4 gap-y-1 font-sans text-xs text-ink-2 sm:hidden">
<div>
+<!-- Product decision (#666): raw provenance (meta_date_raw) is shown on the
+document DETAIL page, never in list/search rows — list rows surface only the
+honest label to keep scan-rows compact. showRaw={false} enforces this; the
+DocumentListItem payload also intentionally omits metaDateRaw. -->
<DocumentDate
iso={doc.documentDate}
precision={doc.metaDatePrecision}
end={doc.metaDateEnd}
+showRaw={false}
/>
</div>
<div class="flex items-start gap-2">
@@ -189,6 +194,7 @@ function safeTagColor(color: string | null | undefined): string {
iso={doc.documentDate}
precision={doc.metaDatePrecision}
end={doc.metaDateEnd}
+showRaw={false}
/>
</div>
<div>

View File

@@ -16,6 +16,7 @@ let {
dateIso = $bindable(''),
precision = $bindable<DatePrecision>('DAY'),
endDateIso = $bindable(''),
+rawDate = '',
initialDateIso = '',
initialLocation = '',
initialSenderName = '',
@@ -29,6 +30,7 @@ let {
dateIso?: string;
precision?: DatePrecision;
endDateIso?: string;
+rawDate?: string;
initialDateIso?: string;
initialLocation?: string;
initialSenderName?: string;
@@ -177,6 +179,15 @@ $effect(() => {
{/if}
</div>
<input type="hidden" name="metaDateEnd" value={showEndDate ? endDateIso : ''} />
+<!-- Originaltext (read-only raw cell): labelled static text, not a disabled input. -->
+{#if rawDate && rawDate.trim().length > 0}
+<div data-testid="who-when-raw">
+<p class="mb-1 block text-sm font-medium text-ink-2">{m.date_original_label()}</p>
+<p class="font-sans text-sm text-ink">{rawDate}</p>
+<input type="hidden" name="metaDateRaw" value={rawDate} />
+</div>
+{/if}
{/if}
<!-- Absender (required in upload mode — row 1, col 2) -->

View File

@@ -93,12 +93,13 @@ describe('WhoWhenSection — precision controls', () => {
expect(document.querySelector('input#metaDateEnd')).not.toBeNull();
});
-it('never renders the raw cell, and never re-submits it via a hidden input', async () => {
-render(WhoWhenSection, {});
-// The confusing "Originaltext" line is gone …
-expect(document.querySelector('[data-testid="who-when-raw"]')).toBeNull();
-// … and editing no longer round-trips metaDateRaw to the backend.
-expect(document.querySelector('input[name="metaDateRaw"]')).toBeNull();
+it('renders the raw cell as static text (not an editable input) and escapes it', async () => {
+render(WhoWhenSection, { rawDate: '<b>Sommer</b> 1916' });
+const raw = document.querySelector('[data-testid="who-when-raw"]');
+expect(raw).not.toBeNull();
+// Verbatim shown as escaped text; no injected <b> element.
+expect(raw?.textContent).toContain('<b>Sommer</b> 1916');
+expect(raw?.querySelector('b')).toBeNull();
});
});

View File

@@ -20,7 +20,8 @@ export type DatePrecision = 'DAY' | 'MONTH' | 'SEASON' | 'YEAR' | 'RANGE' | 'APP
* {@code DocumentTitleFormatter}: both are asserted against
* `docs/date-label-fixtures.json` so they cannot drift. The untrusted `raw`
* cell is only used to derive a season word (a known German season token) — it
-* is never displayed and never interpolated into HTML here.
+* is otherwise rendered separately by the caller via Svelte default escaping,
+* never interpolated into HTML here.
*
* @param iso the sort/filter anchor day (`YYYY-MM-DD`), nullable for UNKNOWN rows
* @param precision descriptive precision metadata
@@ -81,7 +82,8 @@ function seasonLabel(
): string {
const month = Number(iso.slice(5, 7));
// Prefer the season named in the raw cell; fall back to deriving it from the
-// anchor month. Either way the WORD is localized (Decision 4).
+// anchor month. Either way the WORD is localized (Decision 4) — the verbatim
+// German raw cell is preserved separately as the visible secondary line.
const season = seasonFromRaw(raw) ?? seasonOfMonth(month);
return `${seasonWord(season, locale)} ${year}`;
}
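
The fallback in `seasonLabel` can be illustrated end to end with the sketch below. The bodies of `seasonFromRaw` and `seasonOfMonth` are not part of this diff, so the German tokens, the month-to-season mapping, and the `words` lookup (standing in for `seasonWord(season, locale)`) are all assumptions.

```typescript
type Season = "spring" | "summer" | "autumn" | "winter";

// Assumed token matching: look for a German season word in the raw cell.
function seasonFromRaw(raw: string | null): Season | null {
  if (!raw) return null;
  const text = raw.toLowerCase();
  if (text.includes("frühling") || text.includes("frühjahr")) return "spring";
  if (text.includes("sommer")) return "summer";
  if (text.includes("herbst")) return "autumn";
  if (text.includes("winter")) return "winter";
  return null;
}

// Assumed meteorological mapping: Mar-May, Jun-Aug, Sep-Nov, Dec-Feb.
function seasonOfMonth(month: number): Season {
  if (month >= 3 && month <= 5) return "spring";
  if (month >= 6 && month <= 8) return "summer";
  if (month >= 9 && month <= 11) return "autumn";
  return "winter";
}

// Prefer the season named in the raw cell, else derive it from the anchor month.
function seasonLabelSketch(iso: string, raw: string | null, words: Record<Season, string>): string {
  const year = iso.slice(0, 4);
  const month = Number(iso.slice(5, 7));
  const season = seasonFromRaw(raw) ?? seasonOfMonth(month);
  return `${words[season]} ${year}`;
}
```

The raw cell wins over the anchor month, so a row anchored in January with raw text `Herbst?` is still labelled as autumn, and the label word comes from the locale table rather than from the raw text.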