As a family member I want to see which documents need transcription and which are ready to read so I know where to contribute #240
Goal
Encourage transcription contributions by surfacing documents that need work, and reward users by making finished documents easy to find.
Two patterns to implement
A — Dashboard Widgets
Two new cards on the home dashboard (alongside the existing "needs metadata" widget):
/enrich/{id}.
/documents/{id}.
B — Smart Search Filter Chips
Three quick-filter chips on the document search page:
Thresholds
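The three states could be derived as in the sketch below. It assumes the thresholds discussed in this issue (0 annotations; reviewed share below 75%; reviewed share at or above 90%). The interface and its field names are hypothetical, chosen only for illustration.

```typescript
// Hypothetical shape: the issue does not define these field names.
interface TranscriptionCounts {
  annotationCount: number; // DocumentAnnotation rows for the document
  blockCount: number;      // TranscriptionBlock rows for the document
  reviewedCount: number;   // blocks with reviewed = true
}

type TranscriptionFilter = "NEEDS_ANNOTATIONS" | "NEEDS_REVIEW" | "READY_TO_READ";

// Assumed thresholds, taken from the discussion in this issue.
const NEEDS_REVIEW_BELOW = 0.75;
const READY_TO_READ_FROM = 0.9;

function classify(c: TranscriptionCounts): TranscriptionFilter | null {
  if (c.annotationCount === 0) return "NEEDS_ANNOTATIONS";
  // Avoid division by zero: annotations without blocks count as 0% reviewed.
  const reviewedShare = c.blockCount === 0 ? 0 : c.reviewedCount / c.blockCount;
  if (reviewedShare < NEEDS_REVIEW_BELOW) return "NEEDS_REVIEW";
  if (reviewedShare >= READY_TO_READ_FROM) return "READY_TO_READ";
  // From 75% up to (but excluding) 90%: matched by none of the three filters.
  return null;
}
```

With these assumptions, a document that is exactly 75% reviewed falls into the gap between NEEDS_REVIEW and READY_TO_READ and matches no filter.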
Backend changes
- TranscriptionFilter enum (NEEDS_ANNOTATIONS, NEEDS_REVIEW, READY_TO_READ)
- @Query methods on DocumentRepository for the three states (percentage calculation via SQL subquery)
- DocumentController:
  - GET /api/documents/needs-transcription?size=3
  - GET /api/documents/ready-to-read?size=3
  - transcriptionFilter request param on GET /api/documents/search
- DocumentSpecifications.hasTranscriptionFilter()
Frontend changes
- DashboardNeedsTranscription.svelte widget
- DashboardReadyToRead.svelte widget
- +page.server.ts: fetch both widget data + pass transcriptionFilter to search
- +page.svelte: render widgets in dashboard + filter chips in search
- de.json, en.json, es.json
Data model (already in DB, no migration needed)
- DocumentAnnotation.document_id: drawn annotation boxes
- TranscriptionBlock.reviewed (boolean): quality-check flag per text block
- Document via document_id
👨‍💻 Felix Brandt — Senior Fullstack Developer
Questions
- Merge ordering in findNeedsTranscription(): The proposed service method combines findNeedingAnnotations() + findNeedingReview() via stream().distinct().limit(size). After dedup, what's the sort order? Mixing two separately ordered lists produces an arbitrary result. Should the combined result be sorted by "urgency" (0 annotations first, then by ascending review %)?
- TranscriptionFilter enum placement: The issue places this in the model package, but it's a query parameter, not a domain entity. Existing pattern: request/query parameters live in dto/ or as @RequestParam types. Would dto/TranscriptionFilter.java or a new query/ package fit better?
- Filter chip exclusivity: Are the three filter chips mutually exclusive (one active at a time) or combinable? The issue implies one-at-a-time, but NEEDS_ANNOTATIONS and NEEDS_REVIEW together would mean the same as "needs any work". Should the URL param be a single transcriptionFilter=X value (not multi-value)?
- Widget empty state for "Lesefertig": The issue says "hidden when empty". Make this explicit in the component contract. An empty docs array → {#if docs.length > 0} wrapping the whole card? Or a CSS hidden class? Let's agree on the pattern before implementing.
Suggestions
- Consider splitting findNeedsTranscription() into two separate service methods (findNeedingAnnotations(int size) and findNeedingReview(int size)) and doing the merge at the call site in +page.server.ts. Each method is then independently testable and the merge logic lives in one well-named place.
- The filter chips are a good candidate for extracting into a small TranscriptionFilterChips.svelte component (instead of inline in +page.svelte): they have their own state (active chip) and will grow with future filter additions.
- TDD approach: write the three failing repository integration tests first, one for each native query, before writing a single line of SQL. The test data setup will immediately reveal the edge cases (annotations with 0 blocks, exactly-75%-reviewed documents).
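To make the merge-ordering question concrete, here is a minimal sketch of a call-site merge with an explicit "urgency" order: deduplicate, put documents with 0 annotations first, then sort by ascending reviewed share, then apply the limit. The row shape (QueueDoc) and the function name are hypothetical; the real API response may look different.

```typescript
// Hypothetical row shape; only `id` is taken from the issue.
interface QueueDoc {
  id: string;
  annotationCount: number;
  reviewedShare: number; // 0..1
}

function mergeNeedsTranscription(
  needingAnnotations: QueueDoc[],
  needingReview: QueueDoc[],
  size: number,
): QueueDoc[] {
  // Deduplicate by id: a document may appear in both lists.
  const seen = new Set<string>();
  const unique = needingAnnotations.concat(needingReview).filter((doc) => {
    if (seen.has(doc.id)) return false;
    seen.add(doc.id);
    return true;
  });

  // Urgency order: 0 annotations first, then ascending reviewed share.
  // The id comparison is an assumed tie-breaker that keeps the result deterministic.
  unique.sort((a, b) => {
    const aRank = a.annotationCount === 0 ? 0 : 1;
    const bRank = b.annotationCount === 0 ? 0 : 1;
    if (aRank !== bRank) return aRank - bRank;
    if (a.reviewedShare !== b.reviewedShare) return a.reviewedShare - b.reviewedShare;
    return a.id.localeCompare(b.id);
  });

  return unique.slice(0, Math.max(0, size));
}
```

Sorting before applying the limit is the point of the sketch: limiting first would drop documents according to whichever list happened to come first.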
🏛️ Markus Keller — Application Architect
Questions
- Business rules in the repository layer: The native @Query methods embed the threshold values (75%, 90%) directly in SQL strings. If these change, you must modify DocumentRepository, which is the wrong layer for a business rule. Consider passing the threshold as a @Param("threshold") double threshold so the service owns the rule and the repository is a dumb query executor.
- Specification bypass for the search filter: The issue notes that hasTranscriptionFilter() in DocumentSpecifications "may need to fall back to native queries." If it does, the Specification pattern becomes leaky: some filters compose via Criteria API, one falls back to a native query. That inconsistency will confuse future developers. The FTS case already handles this cleanly by bypassing the Specification system entirely and doing a pre-filter on IDs. Should transcriptionFilter follow the same pattern: a pre-filter that narrows the ID set, then the rest of the Specification chain applies?
- Domain ownership: TranscriptionFilter spans two entities: DocumentAnnotation (ownership check) and TranscriptionBlock (reviewed % check). The DocumentService owns the query, but neither DocumentAnnotationService nor TranscriptionService is consulted. Is this an intentional shortcut, or does TranscriptionService (if it exists) need a method like getDocumentIdsByCompletionStatus(TranscriptionFilter, int size) that DocumentService delegates to?
Suggestions
- The two new endpoints (/needs-transcription, /ready-to-read) are read-only dashboard projections, not core document CRUD. Consider whether they belong in a StatsController or DashboardController rather than DocumentController, which is already large. Grouping by purpose is more readable than grouping by entity.
- The TranscriptionFilter enum should live in dto/ or alongside the search parameter types, not in model/. Entities in model/ represent persistent state; enums representing UI query intent live in dto/.
🧪 Sara Holt — QA Engineer
Missing Acceptance Criteria
The issue defines the UI and backend changes but has no observable acceptance criteria. Before implementation starts, I'd want to agree on at least:
- NEEDS_ANNOTATIONS returns only documents with 0 annotation rows
- NEEDS_REVIEW returns documents with ≥ 1 annotation AND reviewed% < 75%
- READY_TO_READ returns only documents with reviewed% ≥ 90%
Edge Cases to Test
- < 75%, not ≤ 75%)
- PLACEHOLDER (no file yet)
- status NOT IN ('PLACEHOLDER', 'ARCHIVED'))
- ?size=0)
Test Coverage Gaps
- ::float casting. Testcontainers with postgres:16-alpine is mandatory here; H2 will not catch a broken CAST.
- findNeedsTranscription() merge logic: the deduplicate + limit behavior should be verified in isolation (a document that has 0 annotations will also have 0 blocks reviewed, so it could appear in both repository queries).
- DashboardReadyToRead should not render when docs.length === 0.
- +page.server.ts should handle Promise.allSettled rejection for the two new API calls gracefully (return empty array, not undefined).
🔒 Nora Steiner — Application Security
Authorization on New Endpoints
The two new endpoints (GET /api/documents/needs-transcription, GET /api/documents/ready-to-read) return List<Document>: the full entity, including all fields. The issue doesn't mention a @RequirePermission annotation for them.
- /api/documents/incomplete endpoint, it likely requires at minimum READ_ALL. Make this explicit; don't rely on a blanket anyRequest().authenticated() policy if other endpoints in DocumentController already use @RequirePermission(READ_ALL).
- Document entity includes metadataComplete, internal timestamps, and potentially sensitive metadata fields. If a lower-privilege role should only see the "needs work" prompt but not full document details, consider returning a slim DTO (e.g., { id, title, documentDate }) instead of the full entity.
Input Validation
- transcriptionFilter as an enum @RequestParam: Spring MVC will return 400 Bad Request for unknown enum values by default, which is the correct behavior. Verify that the error response goes through the existing DomainException / error handler, not as a raw Spring stack trace.
- size parameter on dashboard endpoints: what's the maximum allowed value? An unbounded size=10000 would return thousands of documents and could become an inadvertent data exfiltration vector. Add @Max(20) or a hard cap in the service layer.
No New Attack Surface
The native SQL uses JPA named parameters (:size, :threshold), which are injection-safe. The filter enum is validated at deserialization. No new file uploads, no new user-controlled strings flowing into queries. This feature has minimal security surface overall.
Logging
Verify that the new transcriptionFilter parameter is not logged verbatim with logger.info("Search called with filter: " + transcriptionFilter); use SLF4J parameterized logging instead: logger.debug("Transcription filter: {}", transcriptionFilter).
🎨 Leonie Voss — UX Design & Accessibility
"Transkription ausstehend" Widget — Context Clarity
The widget mixes two distinct cases into one list: documents with no annotations at all, and documents with partial reviews. For a 60+ user, seeing "Brief von Oma 1943" in this list with no explanation is confusing — why is it here? What do I need to do?
Suggestion: Add a short contextual subtitle per row:
This removes ambiguity and tells the user what action to take.
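A minimal sketch of how such a per-row subtitle could be derived from the two cases the widget mixes. The row shape and the subtitle wording are placeholders invented for illustration; the final copy would come from the i18n files.

```typescript
// Hypothetical row shape; the field names are not from the issue.
interface PendingRow {
  annotationCount: number;
  reviewedBlocks: number;
  totalBlocks: number;
}

// Placeholder copy only: the real strings belong in de.json / en.json / es.json.
function pendingSubtitle(row: PendingRow): string {
  if (row.annotationCount === 0) {
    // Case 1: nothing has been annotated yet.
    return "No text regions marked yet";
  }
  // Case 2: annotated, but review is still incomplete.
  return `${row.reviewedBlocks} of ${row.totalBlocks} blocks reviewed`;
}
```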
Filter Chips — Accessibility
- <button aria-pressed="true|false"> elements, not styled <a> links. Screen readers announce aria-pressed state, so users know which filter is active.
- aria-label="Lesefertig, vollständig transkribiert" so screen reader users hear a complete description, not just "✓".
- min-h-[44px] touch target (WCAG 2.2 SC 2.5.5 Target Size (Enhanced), Level AAA; the Level AA minimum in SC 2.5.8 is 24px; the larger target is critical for 60+ users on touch devices). If the design uses small pill shapes, add py-3 to meet the minimum.
Mobile Layout (320px)
Three filter chips side-by-side at 320px will overflow and either clip or force horizontal scroll without indication. Recommend:
flex-wrap is the correct fix: it avoids hidden overflow and works for both sighted and screen-reader users.
"Lesefertig" Empty State — Hidden, Not Collapsed
Hiding the "Lesefertig" card entirely when empty is the right call. Showing an empty "reward" section would feel like a broken UI. Confirm the implementation uses {#if readyToRead.length > 0} (conditional render, no DOM node) rather than CSS hidden (node exists, inaccessible to screen reader but present in DOM).
Positive Reinforcement Copy
The empty state for "Transkription ausstehend" ("Alle Dokumente sind transkribiert ✓") is a nice moment. Consider making it visually celebratory, with a soft mint background (bg-brand-mint/10) rather than default grey, so it reads as an achievement rather than just an empty list.
⚙️ Tobias Wendt — DevOps & Platform
Query Performance — Check Indexes First
The native SQL queries use correlated subqueries on document_annotations.document_id and transcription_blocks.document_id. These run on every dashboard load and on every search request that includes a transcriptionFilter.
Before merging, verify that index coverage exists:
If document_id is not indexed on these tables, both subqueries become sequential scans and will be visible on the dashboard latency graphs. A missing index is a Flyway migration, not an application change.
Dashboard Load Budget
The home page already runs 3 parallel API calls on every dashboard load (stats, incomplete, recent-activity). Adding 2 more takes it to 5 parallel calls. On the CX32 (8GB RAM, 4 vCPU), this is fine: Promise.allSettled keeps them non-blocking. But set a mental baseline: if the dashboard P95 latency climbs above 500ms after this lands, these subquery-heavy endpoints are the first suspects.
No New Infrastructure
This feature adds zero new services, zero new volumes, and zero new environment variables. The CI pipeline does not change. From an ops perspective, this is low-risk.
Observability Recommendation
Once deployed, keep an eye on:
- /api/documents/needs-transcription and /api/documents/ready-to-read in application logs
If the subqueries start showing up as slow in Postgres pg_stat_statements, a materialized view refreshed on a schedule (or updated by a trigger on transcription_blocks.reviewed changes) would be the next step.
Design decisions — Mission Control Strip
We worked through the dashboard expansion problem and landed on a full-width Mission Control Strip below the existing two-column grid. It doesn't touch the right column (DropZone + NeedsMetadata), and nothing gets pushed below the fold.
Spec files (PR #244)
- docs/specs/dashboard-expansion-patterns.html: four alternatives evaluated (Tabs, Accordion, Mission Control Strip, Priority Queue) with mockups and a recommendation
- docs/specs/mission-control-strip-final.html: everything needed for implementation
Three columns = three pipeline stages
- Column 1: annotation_count = 0
- Column 2: annotation_count > 0 AND reviewed_pct < 0.90
- Column 3: reviewed_pct ≥ 0.90
Sorting strategy — seeded weekly shuffle
Partial progress always bubbles to the top. Zero-progress documents rotate weekly so buried easy docs eventually surface. Within a week the order is stable (no jitter on refresh).
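One possible way to get this behaviour, sketched under assumptions: the week index since the Unix epoch serves as the seed, and zero-progress documents are ordered by a hash of seed plus document id, which stands in for a seeded shuffle. The StripDoc shape, the descending order within the partial-progress group, and the FNV-1a-style hash are illustrative choices, not taken from the spec files.

```typescript
// Hypothetical row shape; `progress` is the share of completed work, 0..1.
interface StripDoc {
  id: string;
  progress: number;
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Week index since the Unix epoch: constant for seven days, then increments.
function weekIndex(now: Date): number {
  return Math.floor(now.getTime() / WEEK_MS);
}

// FNV-1a-style 32-bit hash over UTF-16 code units; deterministic for a given input.
function hash32(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

function sortQueue(docs: StripDoc[], now: Date): StripDoc[] {
  const week = weekIndex(now);
  const key = (d: StripDoc): number => hash32(`${week}:${d.id}`);

  // Partial progress first (assumed: most progress on top, id as tie-breaker).
  const partial = docs
    .filter((d) => d.progress > 0)
    .sort((a, b) => b.progress - a.progress || a.id.localeCompare(b.id));

  // Zero progress: order depends only on (week, id), so it is stable within a week.
  const zero = docs
    .filter((d) => d.progress === 0)
    .sort((a, b) => key(a) - key(b) || a.id.localeCompare(b.id));

  return partial.concat(zero);
}
```

Because the epoch began on a Thursday, this particular seed rolls over on Thursdays at 00:00 UTC; a calendar-week seed would need a different weekIndex.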
Hard documents — expert flag escape hatch
A new needs_expert BOOLEAN column on documents. Contributors can mark a document illegible from the enrich page (PATCH /api/documents/{id}/needs-expert). The UI shows a purple "Experten gesucht" badge; flagged docs sort after all unflagged ones. This prevents hard Kurrent pages from monopolising the Transkription column forever.
Progress bar — cold-start problem solved
1 500 documents with 0 transcriptions → a global "12 / 1 500" bar at 0.8% fill is demotivating (endowed-progress effect working against us). Solution: three different granularities:
- ↑ +5 diese Woche (shows momentum, not total)
- 3 / 8 Blöcke (only shown when annotation_count > 0)
- 94% geprüft
No global progress bar at all during the early phase.
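The three label formats could be produced by small pure helpers like the ones below. The function and parameter names are hypothetical, and what counts as a "done" block is left open because the issue does not specify it.

```typescript
// Weekly momentum, e.g. "↑ +5 diese Woche".
function weeklyMomentumLabel(newThisWeek: number): string {
  return `↑ +${newThisWeek} diese Woche`;
}

// Per-document block progress, e.g. "3 / 8 Blöcke".
// Returns null when there are no annotations, so the caller can skip rendering.
function blockProgressLabel(
  annotationCount: number,
  doneBlocks: number,
  totalBlocks: number,
): string | null {
  if (annotationCount <= 0) return null;
  return `${doneBlocks} / ${totalBlocks} Blöcke`;
}

// Reviewed share, e.g. "94% geprüft"; expects a value between 0 and 1.
function reviewedLabel(reviewedPct: number): string {
  return `${Math.round(reviewedPct * 100)}% geprüft`;
}
```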
New backend contracts needed
- GET /api/transcription/segmentation-queue
- GET /api/transcription/transcription-queue
- GET /api/transcription/ready-to-read
- GET /api/transcription/weekly-stats
- PATCH /api/documents/{id}/needs-expert
New Svelte components
- MissionControlStrip.svelte: strip wrapper with 3-column grid
- SegmentationColumn.svelte: col 1
- TranscriptionColumn.svelte: col 2 (with per-doc bars)
- ReadyColumn.svelte: col 3 (mint background when filled, dashed border when empty)
- ExpertBadge.svelte: purple badge + sort-to-bottom logic
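The sort-to-bottom rule (flagged docs after all unflagged ones) might look like the sketch below. It assumes the needs_expert column is exposed to the frontend as a needsExpert boolean, which is a guess at the JSON naming, and it keeps the incoming order within each group.

```typescript
// Assumed frontend field name for the needs_expert column.
interface ExpertFlagged {
  needsExpert: boolean;
}

function sortExpertsLast<T extends ExpertFlagged>(docs: T[]): T[] {
  // Copy first so the caller's array is not mutated.
  // Array.prototype.sort is stable (ES2019), so the existing order,
  // e.g. the weekly shuffle, is preserved inside each group.
  return docs.slice().sort((a, b) => Number(a.needsExpert) - Number(b.needsExpert));
}
```

Keeping this as a pure function separate from the badge markup would make it testable without rendering a component.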