feat(observability): add handleError hook with structured stdout sink #462
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Context
frontend/src/hooks.server.tsexportshandleandhandleFetchonly. There is nohandleErrorhook. As a result, every unhandled SSR error renders SvelteKit's default 500 page and disappears — no error sink, no Sentry, no log aggregation.Operationally this means:
Approach
Implement
handleErrorin bothhooks.server.tsandhooks.client.ts. Forward to structured stdout for now (Loki picks it up after #140). When a real error sink is chosen (Sentry, GlitchTip, etc.), swap the destination — the contract stays the same.frontend/src/hooks.server.tsThe returned object becomes
$page.erroron the rendered error page — surfaceerrorIdto the user so they can quote it in a bug report.frontend/src/hooks.client.ts(new file)frontend/src/routes/+error.svelteShow
errorIdto the user, with a "copy to clipboard" affordance:i18n keys:
error_page_title,error_page_id_label(de/en/es).Lint-time enforcement of related cleanup
Add to
frontend/eslint.config.js:This catches the stray
console.logthe audit found atfrontend/src/routes/api/tags/+server.ts:2and prevents new ones (closes audit F-21).Critical files
frontend/src/hooks.server.ts— addhandleErrorfrontend/src/hooks.client.ts— new filefrontend/src/routes/+error.svelte— surface error IDfrontend/messages/{de,en,es}.json— 2 new keysfrontend/eslint.config.js—no-consolerulefrontend/src/routes/api/tags/+server.ts:2— remove the existingconsole.logVerification
npm run dev, intentionally throw inside a+page.server.tsload → server stdout shows JSON line witherrorId; user-facing page shows "Es ist etwas schiefgelaufen" + the sameerrorId.npm run dev, throw inside an in-browser action → browser console shows[client-error]with same shape; user sees same error UI.npm run lintfails if a newconsole.logis added; passes after removing the existing one.Acceptance criteria
handleErrorexported in bothhooks.server.tsandhooks.client.ts.errorId.errorIdvisible to the user on+error.svelte(copyable).console.loginfrontend/src(existing one removed).no-consoleenforced.Effort
S — half a day.
Risk if not addressed
Production incidents are undebuggable. Unhappy-path bugs accumulate without telemetry to find them.
Tracked in audit doc as F-02 (Critical) + F-21 (Medium, folded into this PR).
👨💻 Markus Keller — Application Architect
Observations
handleErrorhook means every unhandled SSR exception is a silent production black hole. This is a straightforward, well-scoped fix.{ message, errorId }fromhandleErroris architecturally sound: it extendsApp.Errorcleanly. The issue does not mention thatApp.Errormust be augmented insrc/app.d.tsto declare theerrorIdfield — without this TypeScript will complain when+error.sveltereadspage.error?.errorId. Add:no-consoleESLint rule addition is the right lint-time enforcement mechanism. However, the existinghooks.server.tsalready usesconsole.error(...)in theuserGrouphandler (line 56). That call is legitimate — it should survive the new rule becauseno-consoleallowserrorandwarnby default in the proposed config. Confirm this carefully during implementation.hooks.client.ts) and modifies the error page. Neither touches a backend domain, Flyway migration, or Docker service — no C4 or DB diagram update needed. However, if the team adds a future Sentry integration, that's a new external system requiringl1-context.pumlupdate. The issue correctly calls this out as a future swap point.docs/audits/file tracking these, that file should be updated to mark them as addressed once the PR merges. I seedocs/audits/2026-05-07-pre-prod-architectural-review.mdin the git status — the implementing PR should include a pass over that file to close the relevant findings.Recommendations
App.Erroraugmentation tosrc/app.d.tsas the very first step — it is the TypeScript contract everything else depends on.crypto.randomUUID()on the server (Node 21, available natively). On the client,crypto.randomUUID()is also available in all modern browsers — no polyfill needed. The issue's code is correct; just make this explicit in the PR description.ts,level,errorId,status,url,userId,error.name,error.message,error.stack. Do not log request headers, query parameters, or cookie values in the initial implementation — they may contain tokens. The issue's PII note is good; make it a concrete exclusion list in the implementation.console.loginapi/tags/+server.ts:27for removal in the same PR — the issue already calls this out as F-21. Bundle it; don't leave it for a follow-up.ErrorServiceor abstraction layer. The two hook files are the right abstraction level for this scale. Keep it boring.Open Decisions
errorIdsourcing: the client hook generates its own UUID independently. This means a client-side error and a server-side re-render of the error page will have different IDs. For correlating a client crash report to a server log, the IDs will never match. Whether this matters depends on whether client errors are ever expected to have a server-side counterpart — for this app's SSR-first architecture, probably not. No action needed now, but worth keeping in mind when Sentry is added.👨💻 Felix Brandt — Fullstack Developer
Observations
+error.svelte(14 lines) displayspage.statusandpage.error?.messagebut has noerrorIdsurface at all — this is the primary gap the issue addresses.hooks.client.tsdoes not exist.hooks.server.tsexportshandleandhandleFetchonly —handleErroris missing. Both are confirmed by reading the file.console.log('Tags Data', data)atfrontend/src/routes/api/tags/+server.ts:27is confirmed. It is the onlyconsole.loginfrontend/src/— easy to remove.handleErrorimplementations are clean and correct. One naming note: the returned object usesmessage(matches SvelteKit's defaultApp.Error.messagefield) anderrorId(new field). This is consistent with the project's error-handling idiom.+error.sveltesnippet in the issue uses$page.error?.errorId— this reads fine in Svelte 5 with$app/state'spagerune (the existing file already usesimport { page } from '$app/state'). No store migration needed.errorIdin a<code>block but no button. This should be implemented as a small<button>that callsnavigator.clipboard.writeText(page.error.errorId)— consistent with how other copy actions in the codebase work.Recommendations
loadfunction in a test page that verifies a thrown error produces a response witherrorIdin$page.error— or at minimum write a unit test that thehandleErrorfunction returns{ message, errorId }whereerrorIdmatches a UUID regex.console.logstatement, verifynpm run lintfails, then add the rule and verify it passes.+error.svelte: the issue's snippet is minimal (good). Don't create a child component — at 15-20 lines it doesn't need splitting. Keep the copy button inline.handleErroron the server should serializeerror.stackonly whenstatus >= 500. For 4xx errors (which can also reach the error hook viaerror()from load functions), stack traces add noise and may expose path info. Add a conditional:console.loginapi/tags/+server.tsin the same commit as the ESLint rule — otherwise the rule will immediately fail CI on the existing codebase. The issue already groups these; enforce the commit order.error_page_titleanderror_page_id_labelshould be added to all three locale files in the same commit. The existingpage_title_errorkey is already present —error_page_titlewould be a duplicate concept. Reuse the existing key for the title; only adderror_page_id_labelas a new key.Open Decisions
console.errorinhooks.client.tsis redundant and potentially verbose. Omiterror.stackfrom the client hook payload — the browser DevTools already surface it. This is a recommendation, not a blocker.👨💻 Tobias Wendt — DevOps & Platform Engineer
Observations
console.error(JSON.stringify({...}))line lands indocker logs frontendand is automatically ingested by Loki via promtail with zero configuration change. No additional infrastructure needed for phase 1.ts,level,errorId,status,url,userId,error) is clean. Thetsfield uses ISO 8601, which Loki's log pipeline can parse natively for time ordering.docker-compose.ymlis listed as modified ingit status. I have not seen the diff, but if the frontend service definition changes there, verify the Node adapter's startup command still uses stdout (not a log file) —@sveltejs/adapter-nodedoes by default, so no change expected.docs/infrastructure/self-hosted-catalogue.md) — it's the natural fit when a real error sink is needed, and it runs on the VPS at zero additional cost. The code contract (return{ message, errorId }from the hook, swap the destination) is the right abstraction.no-consolerule addition (no-console: ['error', { allow: ['warn', 'error'] }]) is a CI-time control — it gates every PR. This is the correct enforcement layer.Recommendations
JSON.stringify(...)call has no line breaks in the error object — a multi-line stack trace will break Loki's line-based ingestion. Truncate or replace newlines in the stack:source="frontend-ssr"label in the log line so Loki queries can filter frontend errors from backend logs in a single dashboard panel. Add this field to the JSON payload now — it costs nothing and saves a schema migration later.docker-compose.ymlare part of this PR (beyond what was already modified). The stdout sink is handled by the existing container runtime.npm run lintgreen. The only pre-existingconsole.logis atapi/tags/+server.ts:27— remove it in the same PR. Ifnpm run lintwas not part of CI before, add it now as a step before the build step.Open Decisions
errorIdin the log cannot be found fast enough." At that point, stand up GlitchTip from the self-hosted catalogue, add the Sentry SDK (GlitchTip is Sentry-compatible), and swap the hook body. The current stdout approach is good enough until then.👨💻 Elicit — Requirements Engineer
Observations
JSON.stringify(...)always produces single-line output. The acceptance criterion should clarify that stack traces with embedded newlines must be escaped (see DevOps note).hooks.server.tscontext) and the same enforcement mechanism (no-consolerule). No scope creep concern.App.Erroraugmentation (missing from the spec) adds ~15 minutes.error_page_title,error_page_id_label) are needed in all three locale files. The issue lists this in Critical files but the acceptance criteria do not include a localization check. Minor gap.Recommendations
App.Errorinterface insrc/app.d.tsdeclares{ message: string; errorId?: string }andsvelte-checkpasses." This catches the TypeScript contract gap before it reaches runtime.page_title_errori18n key for the error page title — it already exists in all three locales. Onlyerror_page_id_labelis a net-new key. This reduces the i18n work to 3 lines (one per locale).errorIdmust be copyable with a single interaction (button or click-to-select). The issue mentions "copy to clipboard" but the snippet only shows a<code>block. Specify the exact interaction so the implementer doesn't ship a non-copyable code block.+page.server.tsload" is good manual QA. For a P0 item, consider whether a Playwright smoke test that triggers a 500 and verifies the error page renders anerrorIdshould be part of the acceptance criteria.👨💻 Nora "NullX" Steiner — Security Engineer
Observations
error.message— in the Familienarchiv context, aDomainExceptionmessage may include entity IDs ("Document not found: <UUID>") or usernames ("Unknown user: admin@family.local"). UUIDs are low-risk; usernames are borderline. The current approach is acceptable but deserves explicit attention.error.stackin the stdout payload can include file system paths from the Node.js process (e.g.,/home/marcel/Desktop/familienarchiv/frontend/.svelte-kit/...). In the dev environment this is fine; in production, these paths could aid an attacker in understanding the server's directory layout if logs are ever misconfigured to be publicly accessible. Recommendation: strip or hash path components in production.errorIdreturned to the user in$page.erroris a UUID — this is safe. UUIDs reveal nothing about the system. Good design.hooks.client.tsis visible to any user who opens DevTools.{ errorId, status, message, url, error }is acceptable — theerrorobject in the browser will include stack traces from the client bundle. Since the production bundle is minified, path leakage is minimal. No concern here.no-consoleESLint rule targetsconsole.log(disallowed) while allowingconsole.warnandconsole.error. The existingconsole.error(...)call inhooks.server.tsline 56 (userGroup handler) and the newhandleErrorbody both useconsole.error— these survive the rule correctly.url: event.url.pathnamefield in the server log is user-controlled. A crafted URL could inject JSON characters (",\n,{,}) into the log line, potentially breaking Loki's parser or injecting false log entries.JSON.stringify({...})at the top level handles this correctly because the entire object is stringified — the pathname is a JSON string value and its special characters are escaped automatically. This is safe as written.event.request.bodyis never read or logged in the hook.Recommendations
error.stack: log the full stack in development, truncate to the first 3 lines in production. This is a one-liner:error.messagefrom the client hook (or document explicitly that it may be user-visible). In the browser,console.erroroutput is visible to the end user in DevTools. For this family app with trusted users, this is not a meaningful threat, but it establishes a good habit for any future public-facing derivative.handleErrorwith a craftedevent.url.pathnamecontaining"}\n{"level":"admin"and verifies the output is a valid single-line JSON object. This proves the log injection path is closed.Open Decisions
userIdin the server log: the issue logsevent.locals.user?.id ?? null. User IDs (UUIDs) are pseudonymous — they identify an account without revealing a name. For a private family app this is appropriate and useful for incident correlation. For any future GDPR-scope deployment, user IDs in logs are personal data requiring a retention policy. Not a blocker now; flag for the production hardening checklist.👨💻 Sara Holt — QA Engineer
Observations
handleErroris a plain TypeScript function that takes{ error, event, status, message }and returns{ message: string, errorId: string }. It is unit-testable without any SvelteKit runtime. A Vitest test can import it directly, call it with a mock event, and assert on the returned shape and the stdout output.vitest-browser-sveltefor component tests and Vitest (Node environment) for server-side logic. ThehandleErrorfunction fits naturally into the Node environment project.+error.sveltechange (adding theerrorIddisplay and copy button) should have avitest-browser-sveltecomponent test that:page.errorincludingerrorIderrorIdtext is visibleno-console) is tested implicitly bynpm run lintin CI. No separate unit test needed — but CI must run lint as a gate.console.logremoval atapi/tags/+server.ts:27has an existing unit test implication: if there were tests for this route's handler, they would be unaffected by this removal. Theconsole.logis a debug artifact, not load-bearing behavior.Recommendations
handleError(server):+error.svelte: usevitest-browser-svelteto render it withpage.error = { message: 'test error', errorId: 'abc-123' }mocked via a store override, and assertgetByText('abc-123')is visible.errorIdelement is presentThis covers the full SSR error path that unit tests cannot reach.
npm run lintis a CI step that runs before the build. If it is not currently in the Gitea Actions workflow, add it as part of this PR.no-consolerule itself: add a fixture file (undersrc/lib/**/__fixtures__/) with aconsole.logcall and verifynpm run lintreports an error on it. The existingeslint.config.jsalready excludes__fixtures__from the regular lint pass — this makes it a safe test location.Open Decisions
hooks.server.tsandhooks.client.tsare currently not covered by the Vitest coverage report (if one exists). Whether to include them in the coverage threshold or keep them as manually-verified integration points is a team decision. Given the P0 severity, recommend including them in coverage.👨💻 Leonie Voss — UI/UX Design Lead
Observations
+error.svelte(14 lines) is minimal but functional: it showspage.status(large, bold,text-6xl) andpage.error?.message(small,text-ink-2). It uses the correctfont-sansandtext-ink/text-ink-2tokens. Brand compliance is already correct.errorIddisplay and the copy button. The issue's snippet shows the ID in a<code>block but does not specify the copy interaction, touch target size, or focus state. For the transcriber audience (60+, laptop/tablet), this matters.page.statusnumber (500, 404) is not meaningful to a 65-year-old transcriber. A human-readable headline in German ("Es ist etwas schiefgelaufen" as the issue proposes) is essential.<code>element forerrorIdwill render in the browser's monospace fallback font, which may have poor contrast depending on the system font. The project uses Tinos (serif) and Montserrat (sans) — neither is applied to<code>by default.aria-label, minimum 44px touch target, and a confirmation state ("Kopiert!" for 2 seconds) so seniors know the action succeeded.Recommendations
Add a human-readable headline before the status code. The issue proposes "Es ist etwas schiefgelaufen" — use the
error_page_titlei18n key already present in all three locales (check the key name matches). Place it as an<h1>withfont-serif text-2xl text-ink.Style the
errorIddisplay to brand standards:The
select-allclass on the<code>element means users can also triple-click to select the ID without using the button — essential fallback for seniors who may not trust or find the button.Add a confirmation state to the copy button: a 2-second "Kopiert!" feedback (swap the button label using a
$stateboolean) prevents seniors from pressing the button multiple times uncertain whether it worked.Check the error page at 320px: at small viewport widths,
text-6xlfor the status code (96px) will dominate the screen. Consider reducing totext-4xlortext-5xlon mobile, scaling up on desktop:Verify the existing
page.error?.messagedisplay: with the newhandleError, the message will always be"An unexpected error occurred"(localized). The hardcoded fallback'Internal Error'in the current template (?? 'Internal Error') should be replaced with a Paraglide key:?? m.error_internal_error().Open Decisions
Decision Queue — Open Decisions for #462
Consolidated from all persona reviews. Each item requires a judgment call before or during implementation.
Theme 1: Error ID Correlation (Server vs. Client)
Raised by: Markus Keller
The server
handleErrorand the clienthandleErroreach generate independent UUIDs. A client-side crash that triggers a page re-render on the server will produce two unrelated IDs. For this app's SSR-first architecture this is probably acceptable — client errors are mostly post-hydration UI bugs, not server failures.Decision needed: Accept independent IDs for now (deferred until Sentry/GlitchTip is added), or thread a server-generated
errorIdthrough the initial SSR response for the client to reuse?Recommendation: Accept independent IDs now. Revisit when a real error sink is connected.
Theme 2: Stack Traces in Client Hook
Raised by: Felix Brandt
Browser DevTools already show the full stack for client-side errors. Logging
error.stackin thehooks.client.tsconsole.errorcall is redundant and adds noise.Decision needed: Include or omit
error.stackfrom the client hook'sconsole.errorpayload?Recommendation: Omit from the client hook. The browser surfaces it automatically.
Theme 3: Stack Traces in Production (Security + DevOps)
Raised by: Nora Steiner, Tobias Wendt
error.stackin stdout may include absolute file system paths (e.g.,/home/marcel/Desktop/familienarchiv/...). In development, fine. In production with proper log access controls, acceptable. If logs are ever accidentally exposed, path information aids attackers.Decision needed: Log full stack traces in production, truncate to N lines, or strip path components?
Recommendation: Truncate to first 3 lines in production using
process.env.NODE_ENV === 'production'. Zero infrastructure cost, reduces leakage surface.Theme 4: Show Status Code to End Users
Raised by: Leonie Voss
500and404are not meaningful to a 65-year-old transcriber. The current+error.svelteshows the status code prominently (text-6xl). With the new error page, theerrorIdis the user-facing artifact — the status code is developer-facing.Decision needed: Keep the large status code, make it smaller, or hide it entirely for production users?
Recommendation: Keep it but reduce visual weight —
text-2xlortext-xlin mutedtext-ink-3, and make the human-readable message (h1) more prominent. A senior's first read should be "something went wrong" + the ID, not "500".Theme 5:
userIdin Server Logs and GDPRRaised by: Nora Steiner
The server hook logs
event.locals.user?.id(a UUID). For a private family app, this is useful for incident correlation and low risk. For any future deployment with external users, user IDs in logs are personal data under GDPR, requiring a documented retention policy.Decision needed: Log
userIdas proposed, or log a hashed/pseudonymized value?Recommendation: Log the raw UUID for now — it is already pseudonymous and the app is family-only. Add a note in the PR that a retention policy is required before any public launch.