Compare commits

..

16 Commits

Author SHA1 Message Date
Marcel
e16b7402bd fix(document): make the PDF error state accessible (alert + larger link)
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m20s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m42s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m7s
The error block was a colour-only, visually-small dead end. Add
role="alert" so screen readers announce the failure, bump the message to
text-base and the recovery download link to text-sm with a py-2 tap
target — the only escape hatch, sized for the archive's older readers.

Addresses re-review: Leonie (a11y of the error state).

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
229c1b0539 test(document): exercise the real render-failure path in PdfViewer test
The "render failure" test rejected getDocument().promise — the load
path, not the render path — and only asserted a template constant. Now
the fake loads the document successfully and rejects the page render
(the actual #708 wasm-decode failure class), plus a negative companion
asserting the message is absent on a successful render. Also reset
renderTask to null on the render-error path.

Addresses re-review: Felix, Sara (mislabeled test / asserted a constant).

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
f24c415b04 fix(document): localize loadDocument error too — no raw pdf.js text
The render path was localized but loadDocument still stored the raw
pdf.js message (and an untranslated English fallback), contradicting the
"never leak raw error text" principle. Both load and render failures now
set the localized doc_render_failed message.

Addresses re-review: Felix, Nora (raw error leak on the load path).

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
4c57a2262f test(frontend): guard wasm shipping at build time, drop CI-fragile pixel test
The in-browser pixel-render fixture test was green locally but flaky in
CI: the real pdf.js worker could not fetch /pdfjs-wasm/ in the CI
Chromium container, so the CCITT canvas stayed blank (0 sampled pixels)
and failed the suite — green locally, red in CI, root cause not locally
reproducible. A flaky gate is worse than none.

This bug is a build/serve parity failure, so guard it deterministically
where it actually breaks: a postbuild assertion that jbig2.wasm and
openjpeg.wasm shipped into build/client/pdfjs-wasm/ (non-empty). It runs
after `npm run build` — including the Docker build stage — and fails the
build loudly if a future pdfjs bump makes the static-copy glob match
nothing. Combined with the getDocument(wasmUrl) unit guard and the
negative-path render test, the regression is covered without CI flake.

Addresses re-review: Tobias (no automated parity check), Sara (pixel
test not pinned). Render-decode correctness verified manually via
`node build` serving /pdfjs-wasm/jbig2.wasm as application/wasm.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
b8e01f997d docs(caddy): note future CSP must allow wasm-unsafe-eval for pdf.js
If a Content-Security-Policy is ever added, it must permit
'wasm-unsafe-eval' (script-src) and 'self' blob: (worker-src) or the
pdf.js wasm decoders and worker break and scanned PDFs render blank.
Forward-looking note so the future CSP author doesn't silently
reintroduce #708.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
e8e57d2712 test(document): behavioral CCITT/DCT render fixtures prove the wasm path
Render committed synthetic fixtures through PdfViewer with the REAL
pdf.js loader and assert the canvas is non-blank (sampled dark-pixel
count). The CCITT (G4 fax) fixture exercises the shared jbig2.wasm
decode path — the same module pdf.js uses for JBIG2 — so it transitively
covers the JBIG2 acceptance criterion (the archive sample found zero
true JBIG2 docs and jbig2enc is unavailable to synthesize one). The
JPEG/DCTDecode fixture guards against regressing the natively-decoded
path. Verified the CCITT case goes red when wasmUrl is removed.

Fixtures are hermetic, committed assets (~2-5 KB each), generated with
ImageMagick — never fetched from staging at test time. CI browser mode.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
817835fd6a fix(document): add rel=noopener noreferrer to viewer download link (CWE-1022)
The error-state download link opened with target="_blank" but no rel,
exposing the opener to reverse tabnavbabbing. Add rel="noopener
noreferrer". Same-origin so low severity, but a one-token fix in a file
this issue already touches.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
c361b3cd45 fix(document): localize PdfViewer render-error message and download link
The error state showed a hardcoded German string ("Fehler beim Laden
der PDF" / "Direkt öffnen") to all users regardless of locale. Use the
localized doc_render_failed and doc_download_link messages so the
recovery path (message + working download link) is honest in de/en/es.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
5c8034d298 fix(document): surface PDF render failures instead of a silent blank canvas
renderCurrentPage swallowed every render rejection with a bare return,
so a decode failure left a blank white viewer with no feedback. Now a
non-cancellation rejection sets a localized doc_render_failed message,
which routes into the existing error UI (message + download link).
Cancellation (page-nav / zoom) still returns silently — no error.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
8b1b070254 i18n(document): add doc_render_failed message for blank-render fallback
Localized message shown when a PDF page cannot be rendered, so users
never see a blank canvas or a raw English pdf.js string. de/en/es.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
4ca1c967d2 fix(document): pass wasmUrl to pdf.js getDocument so wasm decoders load
getDocument was called with a bare src string, so pdf.js 5.x had no
`wasmUrl` and could not initialise the JBIG2/CCITTFax wasm decoder —
CCITT (G4 fax) scans painted a blank canvas. Pass
{ url, wasmUrl: '/pdfjs-wasm/' }; the directory URL (trailing slash
required) is the single source of truth next to the worker config.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
24d9d975d1 build(frontend): serve pdf.js wasm decoders at /pdfjs-wasm/ via static-copy
pdf.js 5.x moved the JBIG2/CCITTFax/JPEG2000 image decoders into
WebAssembly. The wasm lives in node_modules and was never web-served, so
those decoders failed to initialise and CCITT (G4 fax) scans painted
blank in production while rendering fine in dev.

Add vite-plugin-static-copy (devDependency) to copy
node_modules/pdfjs-dist/wasm/* into build/client/pdfjs-wasm/, so the
assets are emitted into the SvelteKit client build and survive the
production Docker image — not just `npm run dev`. Verified that
`node build` serves /pdfjs-wasm/jbig2.wasm with 200 + application/wasm.

Refs #708

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:12:23 +02:00
Marcel
8a1cc2d1f0 chore(i18n): drop the unused date_original_label key and stale comments
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m18s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m39s
CI / fail2ban Regex (pull_request) Successful in 46s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m5s
CI / Unit & Component Tests (push) Successful in 3m19s
CI / OCR Service Tests (push) Successful in 24s
CI / Backend Unit Tests (push) Successful in 3m37s
CI / fail2ban Regex (push) Successful in 44s
CI / Semgrep Security Scan (push) Successful in 21s
CI / Compose Bucket Idempotency (push) Successful in 1m6s
With the visible "Originaltext" line gone from every view, the
date_original_label message has no remaining references — remove it from
de/en/es. Also drop the now-inaccurate comments in documentDate.ts that
described the raw cell as "preserved separately as the visible secondary
line"; the raw cell now only feeds the SEASON word and is never shown.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:10:20 +02:00
Marcel
d5bf401085 feat(document): stop surfacing the raw cell in the detail drawer
The detail drawer's date cell rendered DocumentDate whenever a date OR a
raw cell was present (`{#if documentDate || metaDateRaw}`). For an
undated, raw-only document that meant the verbatim import text leaked
into the view. Tighten the guard to `{#if documentDate}` so such a
document shows "—". The raw prop is still passed through for the SEASON
word on dated documents. Covered by a new test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:10:20 +02:00
Marcel
4944918692 feat(document): remove the visible Originaltext line from DocumentDate
DocumentDate rendered an "Originaltext: <raw>" secondary line for
UNKNOWN/SEASON/APPROX dates, gated by a showRaw prop. Drop the visible
line, the showRaw prop, the showRawLine derived, and the now-unused
date_original_label message import. The raw prop stays — it still feeds
the SEASON word in formatDocumentDate, which only ever maps a fixed
German season token (never emits raw text), so no XSS surface remains.

Update both DocumentRow call sites to drop the now-gone showRaw={false}
and the comment that justified it. Remove the two DocumentDate tests
that asserted on the deleted DOM sink (the UNKNOWN secondary line and
its XSS-escaping); the DAY/MONTH coverage stays.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:10:20 +02:00
Marcel
bf90427bfa feat(document): drop the read-only Originaltext field from the edit form
The "Originaltext:" line in WhoWhenSection rendered the verbatim import
cell (metaDateRaw) as static text plus a hidden input that re-submitted
it on every save. Editors mistook it for an editable field. Remove the
visible line, the hidden round-trip input, and the now-unused rawDate
prop (here and at the DocumentEditLayout call site). The backend's
partial update preserves the stored value, so no data is lost.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 20:10:20 +02:00
12 changed files with 23 additions and 66 deletions

View File

@@ -303,7 +303,6 @@
"date_season_summer": "Sommer",
"date_season_autumn": "Herbst",
"date_season_winter": "Winter",
"date_original_label": "Originaltext:",
"date_unknown_icon_label": "Datum unbekannt",
"form_label_date_precision": "Datumsgenauigkeit",
"form_label_date_end": "Enddatum",

View File

@@ -303,7 +303,6 @@
"date_season_summer": "Summer",
"date_season_autumn": "Autumn",
"date_season_winter": "Winter",
"date_original_label": "Original:",
"date_unknown_icon_label": "Date unknown",
"form_label_date_precision": "Date precision",
"form_label_date_end": "End date",

View File

@@ -303,7 +303,6 @@
"date_season_summer": "Verano",
"date_season_autumn": "Otoño",
"date_season_winter": "Invierno",
"date_original_label": "Texto original:",
"date_unknown_icon_label": "Fecha desconocida",
"form_label_date_precision": "Precisión de la fecha",
"form_label_date_end": "Fecha final",

View File

@@ -1,30 +1,20 @@
<script lang="ts">
import { formatDocumentDate, type DatePrecision } from '$lib/shared/utils/documentDate';
import { getLocale } from '$lib/paraglide/runtime.js';
import { m } from '$lib/paraglide/messages.js';
type Props = {
iso?: string | null;
precision?: DatePrecision | null;
end?: string | null;
/** Verbatim import cell — used only to derive the SEASON word, never displayed. */
raw?: string | null;
/** Show the verbatim "Originaltext: …" secondary line when raw is present. */
showRaw?: boolean;
};
let { iso = null, precision = null, end = null, raw = null, showRaw = true }: Props = $props();
let { iso = null, precision = null, end = null, raw = null }: Props = $props();
const effectivePrecision = $derived<DatePrecision>(precision ?? (iso ? 'DAY' : 'UNKNOWN'));
const label = $derived(formatDocumentDate(iso, effectivePrecision, end, raw, getLocale()));
const isUnknown = $derived(effectivePrecision === 'UNKNOWN' || !iso);
// Only show the verbatim raw line where it adds information the label can't: the
// season word's source, or the original cell behind an "unknown"/approx date.
const showRawLine = $derived(
showRaw &&
!!raw &&
raw.trim().length > 0 &&
(isUnknown || effectivePrecision === 'SEASON' || effectivePrecision === 'APPROX')
);
</script>
<span class="inline-flex flex-col">
@@ -61,10 +51,4 @@ const showRawLine = $derived(
{:else}
<span>{label}</span>
{/if}
{#if showRawLine}
<!-- Visible secondary line (WCAG 1.4.13 — not tooltip-only). raw is untrusted
verbatim spreadsheet text; rendered via default Svelte interpolation, which
HTML-escapes it (never {@html}; CWE-79). -->
<span class="font-sans text-xs text-ink-2">{m.date_original_label()} {raw}</span>
{/if}
</span>

View File

@@ -17,19 +17,4 @@ describe('DocumentDate', () => {
render(DocumentDate, { props: { iso: '1916-06-01', precision: 'MONTH', raw: 'Juni 1916' } });
await expect.element(page.getByText('Juni 1916')).toBeInTheDocument();
});
it('shows the verbatim raw cell as a visible secondary line for UNKNOWN (not tooltip-only)', async () => {
render(DocumentDate, { props: { iso: null, precision: 'UNKNOWN', raw: 'Sommer?' } });
// Real, visible text — not hidden behind a title attribute.
await expect.element(page.getByText('Datum unbekannt')).toBeInTheDocument();
await expect.element(page.getByText(/Sommer\?/)).toBeVisible();
});
it('renders a malicious raw value as inert escaped text (no element injected)', async () => {
const malicious = '<img src=x onerror="alert(1)">';
render(DocumentDate, { props: { iso: null, precision: 'UNKNOWN', raw: malicious } });
// The payload appears as literal text, and no <img> is created in the DOM.
await expect.element(page.getByText(/<img/)).toBeInTheDocument();
expect(document.querySelector('img')).toBeNull();
});
});

View File

@@ -209,7 +209,6 @@ async function handleReplaceFile(e: Event) {
bind:dateIso={dateIso}
bind:precision={datePrecision}
bind:endDateIso={dateEndIso}
rawDate={doc.metaDateRaw ?? ''}
initialDateIso={doc.documentDate ?? ''}
initialLocation={doc.location ?? ''}
initialSenderName={doc.sender?.displayName ?? ''}

View File

@@ -113,7 +113,7 @@ function getFullName(person: Person): string {
<div>
<dt class="font-sans text-xs font-medium text-ink-3">{m.doc_details_field_date()}</dt>
<dd class="text-ink">
{#if documentDate || metaDateRaw}
{#if documentDate}
<DocumentDate
iso={documentDate}
precision={metaDatePrecision}

View File

@@ -58,6 +58,18 @@ describe('DocumentMetadataDrawer', () => {
expect(dashTexts.length).toBeGreaterThan(0);
});
it('shows an em-dash and never the raw cell for an undated, raw-only document', async () => {
render(DocumentMetadataDrawer, {
props: { ...baseProps, documentDate: null, metaDateRaw: 'Sommer 1916' }
});
await expect.element(page.getByText('Sommer 1916')).not.toBeInTheDocument();
const dashTexts = Array.from(document.querySelectorAll('dd, p'))
.map((el) => el.textContent?.trim())
.filter((t) => t === '—');
expect(dashTexts.length).toBeGreaterThan(0);
});
it('renders the no-persons placeholder when sender and receivers are empty', async () => {
render(DocumentMetadataDrawer, { props: baseProps });

View File

@@ -164,15 +164,10 @@ function safeTagColor(color: string | null | undefined): string {
<!-- Mobile-only metadata -->
<div class="mt-3 grid grid-cols-2 gap-x-4 gap-y-1 font-sans text-xs text-ink-2 sm:hidden">
<div>
<!-- Product decision (#666): raw provenance (meta_date_raw) is shown on the
document DETAIL page, never in list/search rows — list rows surface only the
honest label to keep scan-rows compact. showRaw={false} enforces this; the
DocumentListItem payload also intentionally omits metaDateRaw. -->
<DocumentDate
iso={doc.documentDate}
precision={doc.metaDatePrecision}
end={doc.metaDateEnd}
showRaw={false}
/>
</div>
<div class="flex items-start gap-2">
@@ -194,7 +189,6 @@ function safeTagColor(color: string | null | undefined): string {
iso={doc.documentDate}
precision={doc.metaDatePrecision}
end={doc.metaDateEnd}
showRaw={false}
/>
</div>
<div>

View File

@@ -16,7 +16,6 @@ let {
dateIso = $bindable(''),
precision = $bindable<DatePrecision>('DAY'),
endDateIso = $bindable(''),
rawDate = '',
initialDateIso = '',
initialLocation = '',
initialSenderName = '',
@@ -30,7 +29,6 @@ let {
dateIso?: string;
precision?: DatePrecision;
endDateIso?: string;
rawDate?: string;
initialDateIso?: string;
initialLocation?: string;
initialSenderName?: string;
@@ -179,15 +177,6 @@ $effect(() => {
{/if}
</div>
<input type="hidden" name="metaDateEnd" value={showEndDate ? endDateIso : ''} />
<!-- Originaltext (read-only raw cell): labelled static text, not a disabled input. -->
{#if rawDate && rawDate.trim().length > 0}
<div data-testid="who-when-raw">
<p class="mb-1 block text-sm font-medium text-ink-2">{m.date_original_label()}</p>
<p class="font-sans text-sm text-ink">{rawDate}</p>
<input type="hidden" name="metaDateRaw" value={rawDate} />
</div>
{/if}
{/if}
<!-- Absender (required in upload mode — row 1, col 2) -->

View File

@@ -93,13 +93,12 @@ describe('WhoWhenSection — precision controls', () => {
expect(document.querySelector('input#metaDateEnd')).not.toBeNull();
});
it('renders the raw cell as static text (not an editable input) and escapes it', async () => {
render(WhoWhenSection, { rawDate: '<b>Sommer</b> 1916' });
const raw = document.querySelector('[data-testid="who-when-raw"]');
expect(raw).not.toBeNull();
// Verbatim shown as escaped text; no injected <b> element.
expect(raw?.textContent).toContain('<b>Sommer</b> 1916');
expect(raw?.querySelector('b')).toBeNull();
it('never renders the raw cell, and never re-submits it via a hidden input', async () => {
render(WhoWhenSection, {});
// The confusing "Originaltext" line is gone …
expect(document.querySelector('[data-testid="who-when-raw"]')).toBeNull();
// … and editing no longer round-trips metaDateRaw to the backend.
expect(document.querySelector('input[name="metaDateRaw"]')).toBeNull();
});
});

View File

@@ -20,8 +20,7 @@ export type DatePrecision = 'DAY' | 'MONTH' | 'SEASON' | 'YEAR' | 'RANGE' | 'APP
* {@code DocumentTitleFormatter}: both are asserted against
* `docs/date-label-fixtures.json` so they cannot drift. The untrusted `raw`
* cell is only used to derive a season word (a known German season token) — it
* is otherwise rendered separately by the caller via Svelte default escaping,
* never interpolated into HTML here.
* is never displayed and never interpolated into HTML here.
*
* @param iso the sort/filter anchor day (`YYYY-MM-DD`), nullable for UNKNOWN rows
* @param precision descriptive precision metadata
@@ -82,8 +81,7 @@ function seasonLabel(
): string {
const month = Number(iso.slice(5, 7));
// Prefer the season named in the raw cell; fall back to deriving it from the
// anchor month. Either way the WORD is localized (Decision 4) — the verbatim
// German raw cell is preserved separately as the visible secondary line.
// anchor month. Either way the WORD is localized (Decision 4).
const season = seasonFromRaw(raw) ?? seasonOfMonth(month);
return `${seasonWord(season, locale)} ${year}`;
}