fix(document-viewer): surface error + retry when file load stalls instead of spinning forever #322
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Context
On document detail, the PDF/image viewer currently shows only an indefinite loading spinner when the underlying file can't be fetched. Captured twice during Phase B2 audit on 2026-04-24 — the viewer spun for the entire screenshot timeout with no error, no retry, no diagnostic.
Users have no way to distinguish:
Silent hangs feel like a broken app. Particularly costly on mobile (reader persona on phones with variable connectivity — this is the primary consumer surface).
Non-goals
Proposed behaviour
Wrap the viewer in an error boundary that watches for:
load/errorevents from the underlying<img>,<iframe>, or PDF.js instance.loadevent has fired yet.fetch()preflight oronerrorwith status capture — pick whichever is cleanest for the current implementation).State machine:
loadfiresload, no errImplementation plan
Frontend
frontend/src/lib/components/DocumentViewerErrorBoundary.svelte:documentId,fileUrl,children snippet.status: 'loading' | 'ok' | 'timeout' | 'error-404' | 'error-403' | 'error-5xx' | 'network'.setTimeout(10_000); cleared on childloadevent.frontend/src/routes/documents/[id]/+page.sveltearound the existing viewer component.status = 'loading'and regenerate the URL (cache-bust with a?retry=<n>param to force a fresh fetch).Backend
i18n
6 new Paraglide keys:
viewer_error_timeout_title,viewer_error_404,viewer_error_403,viewer_error_5xx,viewer_error_network,viewer_error_retry,viewer_error_report.Tests
errorevent with mocked status code, assert correct message renders.offlinecontext, reload, expect "Keine Verbindung" UI.Verification
Manual runs on the dev stack:
docker compose stop minio). Reload a document page. Within 10 s, error UI renders with retry. Start MinIO. Click retry. Page loads.Document.visible=false(if such a flag exists, else simulate 403). Navigate. 403 UI shows.Acceptance criteria
Critical files
Related
👨💻 Markus Keller — Application Architect
Observations
Backend error propagation is already correct.
DocumentController.getDocumentFile()already mapsStorageFileNotFoundException → DomainException.notFound(FILE_NOT_FOUND)and permission failures go through Spring Security /PermissionAspectas 403. TheGlobalExceptionHandlerconverts everyDomainExceptionto a structured JSON body with anErrorCode. No backend changes are strictly needed — the issue's "Verify current behaviour first" note is already answered: it is correct.The actual gap is entirely in
useFileLoader.svelte.ts. TheloadFile()method currently doesif (!response.ok) throw new Error('Failed to load file')— it discards the HTTP status code completely before the catch block converts everything to the same hardcoded German string. Status 404, 403, and 503 are all treated identically.DocumentViewer.sveltealready acceptserror: stringas a prop and renders it. The viewer itself does not need to be a new "error boundary component" — the existing prop surface is sufficient. The issue proposes a newDocumentViewerErrorBoundary.sveltewrapper, which would duplicate the orchestration logic already in+page.svelte. The simpler path is to extenduseFileLoaderto expose a typed status instead.The proposed component boundary is questionable. A
DocumentViewerErrorBoundary.sveltewithdocumentId+fileUrl+children snippetprops would need to know about loading state, retry logic, AND rendering the error UI — that is three concerns in one component. The existing separation (useFileLoaderowns fetch state,DocumentViewerrenders based on props) is cleaner.The "Report button → Gitea new-issue URL" idea has no architectural home. A deep link to the Gitea instance URL (heim-nas:3005) would be hardcoded infrastructure detail in frontend code, or require a new config surface. The
mailto:fallback is simpler and does not couple the frontend to the ops environment.Recommendations
Extend
useFileLoaderto exposefileStatus: 'loading' | 'ok' | 'timeout' | 'error-404' | 'error-403' | 'error-5xx' | 'network'instead of justfileError: string. The caller (+page.svelte) passes this toDocumentViewer, which renders the appropriate message. This requires zero new components and no new prop contracts.Implement timeout via
AbortControllerwith a 10-second signal insideuseFileLoader.loadFile(). This is cleaner than asetTimeoutthat races against fetch completion and requires manual cleanup.Drop the
DocumentViewerErrorBoundary.sveltefrom the plan. The existinguseFileLoader→DocumentViewerpipeline already has the right shape. Adding a wrapper component introduces a layer with no clear name — "ErrorBoundary" names an error handling mechanism, not a visible UI region.Use
mailto:for the report button, not a Gitea URL. The Gitea instance hostname is internal infrastructure. Hardcoding it in frontend code is an ops coupling that breaks whenever the instance moves.mailto:admin@familyarchive.local?subject=Viewer+error+%23<documentId>requires no config surface.No architecture documentation updates needed (no new packages, routes, or services). The CLAUDE.md package table and the C4 diagrams are unaffected.
Open Decisions
👨💻 Felix Brandt — Fullstack Developer
Observations
useFileLoader.svelte.tsdiscards the HTTP status code. Line 21 readsif (!response.ok) throw new Error('Failed to load file')— the status is available onresponse.statusbut is not captured before the throw. The catch block then sets a hardcoded German string regardless of whether the failure was 404, 403, 503, or a network error. The tests inuseFileLoader.svelte.test.tsconfirm this: the test'sets fileError on a failed fetch (non-ok response)'only checksfileError !== ''— it does not assert the status is differentiated.DocumentViewer.sveltealready handleserror: stringrendering (lines 66–77 in the current file). However, the error UI only shows a generic{error}paragraph and a download link. It does not differentiate by error type for distinct messages or actions (retry vs. back).PdfViewer.sveltehas its own inline error state (renderer.error) that also renders a hardcoded German string ("Fehler beim Laden der PDF") without going through theerrorprop ofDocumentViewer. This is a second unhandled error surface not mentioned in the issue — if PDF.js itself fails to parse the blob, the error boundary inuseFileLoaderwill not catch it.The
loadFilefunction does not captureresponse.statusbefore the exception path, making it impossible to distinguish 404 vs. 403 vs. 5xx after the fact without refactoring.The retry mechanism needs a cache-bust. The issue proposes
?retry=<n>. However,useFileLoadercreates an object URL from a blob — it does not serve the URL directly to<img>or<iframe>. The cache-bust needs to be on thefetch()call to the backend, not on the blob URL. A?_t=<Date.now()>query param on the/api/documents/${id}/filerequest achieves this.The image viewer path (
{:else if fileUrl}inDocumentViewer.svelteline 107) has noonload/onerrorhandler at all. If the image blob itself is corrupt or thecreateObjectURLreturns an empty blob, it silently renders nothing.Recommendations
Refactor
useFileLoaderto capture and exposehttpStatus: number | nullalongsidefileError. Replace the singlefileError: stringwithfileStatus: 'loading' | 'ok' | 'timeout' | 'error-404' | 'error-403' | 'error-5xx' | 'network'. Derive the human-readable message in the component, not in the hook — the hook should expose facts, not display strings.Add
AbortController+ a 10-second timeout signal toloadFile(). On abort, setfileStatus = 'timeout'. Clear the controller on successful load or explicit retry.Add
onerrorhandler to the<img>element inDocumentViewer.svelteto surface image decode failures.Use
?_t=${Date.now()}on the retry fetch — not on the blob URL.Write the failing tests first (per TDD): add test cases to
useFileLoader.svelte.test.tsforstatus: 404,status: 403,status: 503, andAbortErrorbefore touching the production code.Address the
PdfViewer.svelteinline error state separately or in the same PR — it should either pipe through theerrorprop or use the same status enum. Leaving two independent error surfaces will confuse the UX.Open Decisions
👨💻 Tobias Wendt — DevOps & Platform Engineer
Observations
The E2E verification steps require stopping MinIO mid-test (
docker compose stop minio). This is realistic for manual verification but is not automatable in CI without a custom Docker Compose CI overlay that can inject MinIO failures. The existing CI setup does not appear to have such an overlay.The "MinIO is down" scenario is the hardest to reproduce deterministically in CI. A stopped MinIO container returns a connection refused error, which maps to a network failure at the fetch layer — not an HTTP 5xx from the backend. The backend's
GlobalExceptionHandlercatchesRuntimeExceptionfromFileServiceand returns 500, so the frontend would actually see an HTTP 500 in that case, not a network error. The E2E testdocker compose stop minio→expect 5xx UIis the correct mapping, but needs to be documented clearly.The 10-second timeout E2E test (
with MinIO stopped, navigate to a document → within 10 s see error UI) will be slow in CI — adding 10 seconds of wall-clock time per test run. This is a layer mismatch: timeout behavior is better validated at the unit/component layer usingvi.useFakeTimers().No infrastructure changes are required for this issue. No new Docker services, no new environment variables (if the 10s timeout stays hardcoded), no Caddy config changes. The existing health check setup is unaffected.
The "Report button → Gitea new-issue URL" would require the frontend to know the Gitea instance's external URL (
http://heim-nas:3005). That is an infrastructure detail that would need to be injected via aPUBLIC_GITEA_URLenv var in SvelteKit's.envor the Docker Compose environment. This adds a config surface for a minor feature.Recommendations
Move the timeout test to the component layer using
vi.useFakeTimers(). Keep the E2E test to verify the full "MinIO down → 5xx error UI" path (not the timeout path), which is faster because it does not wait 10 seconds.Add a
docker-compose.ci.ymloverlay that includes aminio-flakeyprofile or use Playwright'spage.route()to intercept/api/documents/*/fileand return a synthetic 503 — this is more reliable and faster than stopping a real container.If the report button uses
mailto:, no env var is needed. If Gitea URL is chosen, addPUBLIC_GITEA_URLto.env.exampleand document it indocs/DEPLOYMENT.md.Do not add the Gitea instance URL to the Docker Compose file — it belongs in
.envas aPUBLIC_GITEA_URLvar that defaults to empty (which disables the Gitea link and falls back to mailto).Open Decisions
page.route()mocking or real MinIO outage scenarios?page.route()is faster and deterministic. Real MinIO outage tests the full stack more faithfully but is slow and CI-unfriendly.👨💻 Elicit — Requirements Engineer
Observations
The issue is well-specified overall. It includes a state machine table, concrete trigger-to-UI mappings, an i18n key list, a test plan, and an acceptance criteria checklist. It meets the Definition of Ready on most dimensions.
Two acceptance criteria are ambiguous or untestable as written:
"10-second timeout triggers a retry-able error state"— does not specify the start point of the 10 seconds (navigation start? fetch start?onMount?). The issue body says "10 seconds from navigation if noloadevent has fired yet" but the AC does not capture this. A test engineer cannot write a deterministic test without knowing the precise clock start."Report button produces a pre-filled Gitea issue URL (or mailto fallback)"— "or" is an unresolved branch. Both paths are acceptable? Under what condition does each fire? If the condition is "whether a Gitea API token is available in scope," that token availability is an infrastructure concern that is not resolved in the issue.The non-goal "Not handling MinIO presigned-URL expiry specially — surfaces as HTTP 403" implies a user-visible 403 message for an expired presigned URL. But the current architecture does NOT use presigned URLs for file serving —
DocumentController.getDocumentFile()fetches the file server-side and streams it to the client. A 403 in the viewer means permission denied on the Spring Boot endpoint, not an expired presigned URL. The non-goal is correct in its conclusion (treat as 403) but wrong in its reasoning.Missing NFR: accessibility. The error UI includes retry buttons, back buttons, and a report button. None of the acceptance criteria check that these are keyboard-accessible or have proper ARIA roles. Issue #318 (reader-path responsive) is referenced, but accessibility is a separate concern.
Missing NFR: what happens on intermittent failure (retry succeeds)? The AC says "Retry button re-issues the fetch with cache-bust" but does not specify what state the component returns to on success after a retry. Does the spinner reappear? Is the retry count shown to the user?
The i18n key count in the spec body (6 keys) does not match the list (7 keys listed):
viewer_error_timeout_title,viewer_error_404,viewer_error_403,viewer_error_5xx,viewer_error_network,viewer_error_retry,viewer_error_report— that is 7, not 6.Recommendations
Tighten AC 1:
"The timeout clock starts whenloadFile()is called. After 10 seconds with no successful response, the UI transitions to the timeout error state."This is testable withvi.useFakeTimers().Resolve the report button branch before implementation: Pick one mechanism (mailto or Gitea URL) and make it unconditional. Conditional behavior based on runtime token availability adds complexity and an untested branch. Recommend: mailto unconditionally for v1. A Gitea deep-link can be a follow-up enhancement (separate issue).
Add an explicit AC for the retry-success path:
"Given the timeout error state is shown, when the user clicks Retry and the subsequent fetch succeeds within 10 seconds, then the viewer renders normally with no error UI visible."Add a minimal accessibility AC:
"All error state buttons (Retry, Back, Report) are reachable by keyboard Tab and activated by Enter/Space."Fix the i18n key count to 7 in the issue body (minor, but counts matter for implementation tracking).
Open Decisions
👨💻 Nora "NullX" Steiner — Security Engineer
Observations
The report button's Gitea URL construction is a potential open redirect / information disclosure vector. If the report URL is constructed from
document.location.href + documentIdand rendered as an<a>or passed towindow.open(), an attacker who can control the document URL or ID could inject a crafted URL. The risk is low in a family app, but the pattern of constructing URLs from runtime state deserves explicit attention.The cache-bust parameter
?retry=<n>or?_t=<timestamp>on the file fetch has no security implication — it is a backend-facing internal request authenticated by session cookie. No issue here.The
fetch()call inuseFileLoader.tssends the session cookie (same-origin request). The 403 path is correctly handled server-side via@RequirePermission. No bypass risk in the new error-state code.The report button must not expose internal document paths or backend error messages in the pre-filled issue body. The issue spec says "prefilled title/body containing document id + current URL + timestamp." The current URL contains the document UUID — this is acceptable for a private family app. However, the error message from the backend should NOT be included in the pre-filled body because
GlobalExceptionHandlerincludes the raw file path in developer messages (e.g.,"File missing in storage: documents/abc-123.pdf"). Only the document UUID and a human-friendly description should be included.The
rel="noopener noreferrer"attribute on the download fallback link inDocumentViewer.svelte(line 71) — the link opens/api/documents/{doc.id}/filein atarget="_blank"— is missingrel="noopener noreferrer". This is a pre-existing issue but will be in scope when this component is touched. The opened tab can accesswindow.openerand redirect the parent page.The
StorageFileNotFoundExceptioninFileServicefalls through to the genericExceptionhandler if it is not explicitly caught inDocumentController. Looking at the code, it IS caught ingetDocumentFile()(line 106) and mapped toDomainException.notFound(). The generic handler at line 64 ofGlobalExceptionHandlerwould catch it as a 500 if any call path missed the catch — this is a latent risk if new download methods are added without the try/catch pattern.Recommendations
Fix
rel="noopener noreferrer"on the existing download fallback link inDocumentViewer.svelteline 71 as part of this PR. It is a pre-existing CWE-1022 and touching the component in this issue is the right time to fix it.Do not include backend error messages in the report button pre-fill. Only include: document UUID, current page URL path (not query string), timestamp, and a user-typed description field. Example:
mailto:admin@familyarchive.local?subject=Viewer-Fehler+%23<uuid>&body=Seite%3A+%2Fdocuments%2F<uuid>%0ADatum%3A+<date>.Add explicit unit test coverage for the 403 path in
useFileLoaderto assert that the error state does not leak the backend error message string — only a localized, user-safe message is exposed.Consider adding a
StorageFileNotFoundExceptionhandler toGlobalExceptionHandlerthat explicitly returns 404, rather than relying on the catch in every controller method. This closes the latent risk of unhandledStorageFileNotFoundExceptionreturning 500.Open Decisions
👨💻 Sara Holt — QA Engineer & Test Strategist
Observations
The existing
useFileLoader.svelte.test.tstests do not cover HTTP status differentiation at all. The test'sets fileError on a failed fetch (non-ok response)'usessetupFetch(false)with nostatusfield on the mock response — soresponse.statusisundefined. Any new code that readsresponse.statuswould be untested by the existing suite. The tests need to be updated before the new status logic is written (TDD: write the failing tests first).The issue proposes both a component spec (
DocumentViewerErrorBoundary.spec.ts) and an E2E spec (e2e/viewer-errors.spec.ts). This is the correct two-layer approach. The component test handles unit-level state machine verification; the E2E handles the full integration path.The timeout test strategy needs
vi.useFakeTimers(). The issue spec says "advance timers by 10 s" — this implies fake timers, which is correct. However, if the timeout is implemented withAbortController + setTimeout,vi.useFakeTimers()does NOT automatically fakeAbortControllersignals. The implementation must usesetTimeoutdirectly (notAbortController's native timeout) for fake timers to work, OR the test must usevi.advanceTimersByTimeAsync(10_000)which handles promise microtasks correctly.The E2E test for the timeout case (
with MinIO stopped → within 10 s see error UI) will add 10+ seconds of wall-clock time to CI. This is a pyramid violation: timeout behavior belongs in the component/unit layer, not E2E.Missing test: what happens when retry is clicked and the SECOND fetch succeeds? The issue test plan only covers "assert URL is re-fetched with cache-bust" — it does not assert that the successful retry restores the
okstate and removes the error UI.Missing test: what happens when the
<img>element'sonerrorfires afterfileUrlis set? The currentDocumentViewer.sveltehas noonerrorhandler on the<img>(line 109). A corrupt blob URL would silently show nothing.The E2E test "navigate to a deleted document (delete via API mid-test)" is a complex setup that requires test isolation. Deleting a document mid-test while other tests may be using it is a shared-state risk. Use a dedicated test document created and deleted within the test's
beforeAll/afterAll.Recommendations
Write these failing tests first, in this order:
useFileLoader:'sets status to error-404 when response is 404'useFileLoader:'sets status to error-403 when response is 403'useFileLoader:'sets status to error-5xx when response is 503'useFileLoader:'sets status to timeout after 10 seconds with no response'(withvi.useFakeTimers())useFileLoader:'sets status to network when fetch throws TypeError'useFileLoader:'resets to loading then ok on successful retry''shows retry button on timeout state''shows back button but no retry on 403 state'Add
statusfield to the mock fetch response insetupFetch():Move the timeout E2E scenario to the component test layer. Keep only the
page.route()mock scenarios (404, 403, 5xx, network) in the E2E suite — they are fast and don't require fake timers.Page Object Model for the error states:
Run
axeon all error state UIs — the error states will be the primary UX for users with connectivity issues, who may also be on assistive technology.Open Decisions
fetchwas called twice? Asserting the exact?_t=timestamp is brittle (time-dependent). Asserting thatfetchwas called with a URL containing the base path (and a different query param than the first call) is more robust.👨💻 Leonie Voss — UI/UX Design Lead & Accessibility Strategist
Observations
The existing
DocumentViewer.svelteerror state (line 66–77) is incomplete for the intended audience. The senior reader persona (60+, phone, variable connectivity) will encounter this error most. The current UI shows{error}intext-ink-3— a muted color — with a small underlined download link. This fails several design principles: the color is likely too low-contrast on the dark PDF background, the touch target for the download link is not 44px, and there is no action hierarchy (no primary CTA).The state machine table in the issue is correct and user-appropriate — the German copy is clear and the action hierarchy (Retry > Zurück > Problem melden) makes sense. The 403 omitting Retry is the right call (retrying won't help with permission failures).
The timeout state message
"Das Dokument lädt länger als erwartet."is good — it is honest without alarming. The suggested actions (Erneut versuchen, Problem melden) are appropriate.The 404 message
"Die Datei wurde nicht gefunden oder wurde gelöscht."is good. The [Zurück] [Problem melden] actions are appropriate — no retry button is correct.The "Report problem" action must have a visible label, not just an icon, given the senior audience. Icon-only buttons are a WCAG 2.1 failure (SC 1.1.1).
The error UI lives inside
<div class="absolute inset-0 bg-pdf-bg">on a dark background. The design tokens for the dark viewer panel (bg-pdf-bg) are not standard surface tokens — error text and buttons styled withtext-ink-3will have unknown contrast on this background. All error UI colors must be verified against the actualbg-pdf-bgvalue inlayout.css.The spinner (loading state) already renders correctly according to the issue — no changes needed there. The issue correctly calls this the "current behaviour" to preserve.
The
<img>viewer path has no loading state — it just appears whenfileUrlis set. For large images on slow connections, the image container is blank until the blob URL is decoded. A skeleton placeholder during decode would improve perceived performance.Recommendations
Error UI pattern for each state should follow this structure:
The vertical centering is already provided by
flex h-full flex-col items-center justify-center.Buttons must be
min-h-[44px] min-w-[44px]— WCAG 2.2 SC 2.5.8. This is especially critical for the Retry button which is the primary recovery action on mobile.Use
aria-live="polite"on the error container so that screen readers announce the state change when the timeout triggers. The timeout state transition happens silently otherwise.Use
role="alert"specifically for network errors (most disruptive) andaria-live="polite"for timeout and HTTP errors (less urgent).Do not use
text-ink-3onbg-pdf-bgwithout verifying contrast. Usetext-ink-2ortext-white/80for body copy on the dark viewer background. Provide a utility classtext-viewer-mutedor similar that is defined with the correct contrast value againstbg-pdf-bg.Add
focus-visible:ring-2 focus-visible:ring-brand-mintto all error state buttons — keyboard focus must be visible on the dark background. The standard navy ring may not be visible on dark backgrounds; mint provides better contrast.The
rel="noopener noreferrer"on the existing download link (line 71 ofDocumentViewer.svelte) is missing — the security team flagged this too. Fix it in this PR.All three i18n locales (de/en/es) must be completed before merge. Spanish is often omitted from "done" criteria — make it a checklist item at the PR level, not just the issue level.
Open Decisions
Decision Queue — Open Items Needing Human Judgment
Consolidated from all persona reviews. These are genuine tradeoffs — no clear winner without product/ops context.
1. Report Button Mechanism (Elicit + Markus + Tobias + Nora)
The question: Gitea new-issue deep-link, or
mailto:?PUBLIC_GITEA_URLas an env var, and couples the frontend to the internal hostname (heim-nas:3005). Breaks if the instance moves.mailto:requires no config surface, works offline, but lands in an email inbox rather than the issue tracker. No infrastructure coupling.Unanimous recommendation from all four personas: use
mailto:unconditionally for v1. A Gitea deep-link can be a follow-up enhancement in its own issue once the ops URL stabilization question is resolved.Decision needed: Confirm
mailto:for v1 and pick the admin email address to use in the link.2. Timeout Duration — 10s Hardcoded vs. Configurable (Markus)
The question: Is 10 seconds the right default for a family app on home Wi-Fi serving large PDFs?
PUBLIC_VIEWER_TIMEOUT_MSvar makes it tunable without code change.Decision needed: Pick a duration (10s / 15s / 20s) and decide hardcoded vs. env var.
3. Retry Count Cap (Elicit)
The question: Should the Retry button be disabled or replaced with a final "contact admin" message after N consecutive failures?
"Falls das Problem weiterhin besteht, wenden Sie sich an den Administrator."No retry possible until page reload.Decision needed: Unlimited vs. capped (and if capped, what N).
4. E2E Test Strategy for Infrastructure Failures (Tobias + Sara)
The question: Should the E2E tests use
page.route()intercepts (fast, deterministic, no real MinIO) or real Docker Compose MinIO outage (slow, more faithful)?page.route()to return synthetic 503/404/network-error: adds ~0s CI overhead, testable without Docker changes, but does not exercise the backend error path.docker compose stop minio: tests the full stack faithfully, but adds 10+ seconds per scenario and requires CI infrastructure changes.Unanimous recommendation from both personas: use
page.route()for CI E2E tests; keep the real MinIO outage test as a documented manual verification step.Decision needed: Confirm this split or override.
5.
PdfViewer.svelteInline Error State — This PR or Follow-up? (Felix)The question:
PdfViewer.sveltehas a hardcoded German error string for PDF.js load failures ("Fehler beim Laden der PDF") that is independent of theuseFileLoaderstatus system this issue builds. It represents a second unhandled error surface.Decision needed: In-scope for this issue or create a follow-up?