As a family member I want to see which documents need transcription and which are ready to read so I know where to contribute #240

Closed
opened 2026-04-15 14:20:02 +02:00 by marcel · 7 comments
Owner

Goal

Encourage transcription contributions by surfacing documents that need work, and reward users by making finished documents easy to find.

Two patterns to implement

A — Dashboard Widgets

Two new cards on the home dashboard (alongside the existing "needs metadata" widget):

  • "Transkription ausstehend" — up to 3 documents that have no annotations yet, or where fewer than 75% of transcription blocks are marked as reviewed. Links to /enrich/{id}.
  • "Lesefertig" — up to 3 documents where ≥ 90% of blocks are reviewed. Links to /documents/{id}.

B — Smart Search Filter Chips

Three quick-filter chips on the document search page:

| Chip | Filter |
|------|--------|
| Keine Transkription | 0 annotations |
| Review ausstehend | annotations present, reviewed < 75% |
| Lesefertig ✓ | reviewed ≥ 90% |

Thresholds

  • "Needs work": 0 annotations OR reviewed blocks / total blocks < 75%
  • "Ready to read": reviewed blocks / total blocks ≥ 90%
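The two thresholds can be captured in one place. A minimal sketch — `TranscriptionFilter` is the enum named in this issue, while the `classify` helper and its signature are hypothetical — that also handles the zero-block and exactly-75% boundary cases:

```java
import java.util.Optional;

// States named in the issue's TranscriptionFilter enum.
enum TranscriptionFilter { NEEDS_ANNOTATIONS, NEEDS_REVIEW, READY_TO_READ }

final class TranscriptionState {
    static final double NEEDS_REVIEW_BELOW = 0.75;
    static final double READY_AT = 0.90;

    // Hypothetical helper: maps raw counts to a filter state.
    // Returns empty for the 75–90% band, which matches no chip.
    static Optional<TranscriptionFilter> classify(long annotations, long reviewedBlocks, long totalBlocks) {
        if (annotations == 0) {
            return Optional.of(TranscriptionFilter.NEEDS_ANNOTATIONS);
        }
        // Guard against division by zero: a box drawn but no text blocks yet counts as 0% reviewed.
        double reviewedPct = totalBlocks == 0 ? 0.0 : (double) reviewedBlocks / totalBlocks;
        if (reviewedPct >= READY_AT) {
            return Optional.of(TranscriptionFilter.READY_TO_READ);
        }
        if (reviewedPct < NEEDS_REVIEW_BELOW) {
            return Optional.of(TranscriptionFilter.NEEDS_REVIEW);
        }
        return Optional.empty();
    }
}
```

Note that a document at exactly 75% reviewed (3 of 4 blocks) falls into the empty band: the threshold is strictly less-than.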

Backend changes

  • New TranscriptionFilter enum (NEEDS_ANNOTATIONS, NEEDS_REVIEW, READY_TO_READ)
  • Native @Query methods on DocumentRepository for the three states (percentage calculation via SQL subquery)
  • Two new endpoints on DocumentController:
    • GET /api/documents/needs-transcription?size=3
    • GET /api/documents/ready-to-read?size=3
  • New transcriptionFilter request param on GET /api/documents/search
  • New specification in DocumentSpecifications.hasTranscriptionFilter()

Frontend changes

  • New DashboardNeedsTranscription.svelte widget
  • New DashboardReadyToRead.svelte widget
  • +page.server.ts: fetch both widget data + pass transcriptionFilter to search
  • +page.svelte: render widgets in dashboard + filter chips in search
  • New i18n keys in de.json, en.json, es.json

Data model (already in DB, no migration needed)

  • DocumentAnnotation.document_id — drawn annotation boxes
  • TranscriptionBlock.reviewed (boolean) — quality-check flag per text block
  • Both linked to Document via document_id
marcel added the collaboration, feature, ui labels 2026-04-15 14:20:06 +02:00
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Questions

  • Merge ordering in findNeedsTranscription(): The proposed service method combines findNeedingAnnotations() + findNeedingReview() via stream().distinct().limit(size). After dedup, what's the sort order? Mixing two separately-ordered lists produces an arbitrary result. Should the combined result be sorted by "urgency" (0 annotations first, then by ascending review %)?

  • TranscriptionFilter enum placement: The issue places this in the model package, but it's a query parameter, not a domain entity. Existing pattern: request/query parameters live in dto/ or as @RequestParam types. Would dto/TranscriptionFilter.java or a new query/ package fit better?

  • Filter chip exclusivity: Are the three filter chips mutually exclusive (one active at a time) or combinable? The issue implies one-at-a-time, but NEEDS_ANNOTATIONS and NEEDS_REVIEW together would mean the same as "needs any work". Should the URL param be a single transcriptionFilter=X value (not multi-value)?

  • Widget empty state for "Lesefertig": The issue says "hidden when empty" — make this explicit in the component contract. An empty docs array → {#if docs.length > 0} wrapping the whole card? Or a CSS hidden class? Let's agree on the pattern before implementing.

Suggestions

  • Consider splitting findNeedsTranscription() into two separate service methods (findNeedingAnnotations(int size) and findNeedingReview(int size)) and doing the merge at the call site in +page.server.ts. Each method is then independently testable and the merge logic lives in one well-named place.

  • The filter chips are a good candidate for extracting into a small TranscriptionFilterChips.svelte component (instead of inline in +page.svelte) — they have their own state (active chip) and will grow with future filter additions.

  • TDD approach: write the three failing repository integration tests first — one for each native query — before writing a single line of SQL. The test data setup will immediately reveal the edge cases (annotations with 0 blocks, exactly-75%-reviewed documents).
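The merge-ordering question above can be made concrete. A sketch only — the `DocRow` record and `merge` method are hypothetical stand-ins for whatever the repository queries return — showing dedup by id followed by the proposed urgency sort (0 annotations first, then ascending review %):

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical row shape: id plus the fields needed for urgency ordering.
record DocRow(String id, long annotations, double reviewedPct) {}

final class NeedsTranscriptionMerge {
    // Concatenate both repository result lists, dedupe by id (keeping the
    // first occurrence), sort by urgency, and cap at `size`.
    static List<DocRow> merge(List<DocRow> needingAnnotations, List<DocRow> needingReview, int size) {
        return Stream.concat(needingAnnotations.stream(), needingReview.stream())
                .collect(Collectors.toMap(DocRow::id, r -> r, (a, b) -> a, LinkedHashMap::new))
                .values().stream()
                .sorted(Comparator
                        .comparingLong((DocRow r) -> r.annotations() == 0 ? 0 : 1)
                        .thenComparingDouble(DocRow::reviewedPct))
                .limit(size)
                .toList();
    }
}
```

With this shape, a document with 0 annotations (which also has 0 reviewed blocks and so appears in both repository queries) is deduped and still sorts to the top.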

Author
Owner

🏛️ Markus Keller — Application Architect

Questions

  • Business rules in the repository layer: The native @Query methods embed the threshold values (75%, 90%) directly in SQL strings. If these change, you must modify DocumentRepository — the wrong layer for a business rule. Consider passing the threshold as a @Param("threshold") double threshold so the service owns the rule and the repository is a dumb query executor.

  • Specification bypass for the search filter: The issue notes that hasTranscriptionFilter() in DocumentSpecifications "may need to fall back to native queries." If it does, the Specification pattern becomes leaky — some filters compose via Criteria API, one falls back to a native query. That inconsistency will confuse future developers. The FTS case already handles this cleanly by bypassing the Specification system entirely and doing a pre-filter on IDs. Should transcriptionFilter follow the same pattern: a pre-filter that narrows the ID set, then the rest of the Specification chain applies?

  • Domain ownership: TranscriptionFilter spans two entities — DocumentAnnotation (ownership check) and TranscriptionBlock (reviewed % check). The DocumentService owns the query, but neither DocumentAnnotationService nor TranscriptionService are consulted. Is this an intentional shortcut, or does TranscriptionService (if it exists) need a method like getDocumentIdsByCompletionStatus(TranscriptionFilter, int size) that DocumentService delegates to?

Suggestions

  • The two new endpoints (/needs-transcription, /ready-to-read) are read-only dashboard projections — not core document CRUD. Consider whether they belong in a StatsController or DashboardController rather than DocumentController, which is already large. Grouping by purpose is more readable than grouping by entity.

  • The TranscriptionFilter enum should live in dto/ or alongside the search parameter types, not in model/. Entities in model/ represent persistent state; enums representing UI query intent live in dto/.

Author
Owner

🧪 Sara Holt — QA Engineer

Missing Acceptance Criteria

The issue defines the UI and backend changes but has no observable acceptance criteria. Before implementation starts, I'd want to agree on at least:

  • Home page shows "Transkription ausstehend" widget with ≤ 3 documents
  • Home page shows "Lesefertig" widget only when at least 1 document qualifies
  • Search filter chip NEEDS_ANNOTATIONS returns only documents with 0 annotation rows
  • Search filter chip NEEDS_REVIEW returns documents with ≥ 1 annotation AND reviewed% < 75%
  • Search filter chip READY_TO_READ returns only documents with reviewed% ≥ 90%

Edge Cases to Test

| Scenario | Expected behavior |
|---|---|
| Document has annotations but 0 transcription blocks (box drawn, no text yet) | Reviewed% = 0% → appears in NEEDS_REVIEW; the native SQL must guard against division by zero |
| Document has exactly 75% reviewed (e.g., 3 of 4 blocks) | Does NOT appear in NEEDS_REVIEW (threshold is < 75%, not ≤ 75%) |
| Document with status PLACEHOLDER (no file yet) | Must NOT appear in either widget or filter results (filtered by status NOT IN ('PLACEHOLDER', 'ARCHIVED')) |
| Empty database — no documents qualify for any category | Widget renders empty state / is hidden gracefully, no 500 error |
| Size param at the boundary (e.g., ?size=0) | Should return an empty list, not a SQL error |

Test Coverage Gaps

  • Integration test for native SQL: the percentage calculation uses PostgreSQL-specific ::float casting. Testcontainers with postgres:16-alpine is mandatory here — H2 will not catch a broken CAST.
  • Unit test for findNeedsTranscription() merge logic: the deduplicate + limit behavior should be verified in isolation (a document that has 0 annotations will also have 0 blocks reviewed — it could appear in both repository queries).
  • Frontend component test: DashboardReadyToRead should not render when docs.length === 0.
  • Load function test: +page.server.ts should handle Promise.allSettled rejection for the two new API calls gracefully (return empty array, not undefined).
Author
Owner

🔒 Nora Steiner — Application Security

Authorization on New Endpoints

The two new endpoints (GET /api/documents/needs-transcription, GET /api/documents/ready-to-read) return List<Document> — the full entity, including all fields. The issue doesn't mention a @RequirePermission annotation for them.

  • What permission should these endpoints require? Looking at the existing /api/documents/incomplete endpoint, it likely requires at minimum READ_ALL. Make this explicit — don't rely on a blanket anyRequest().authenticated() policy if other endpoints in DocumentController already use @RequirePermission(READ_ALL).
  • The full Document entity includes metadataComplete, internal timestamps, and potentially sensitive metadata fields. If a lower-privilege role should only see the "needs work" prompt but not full document details, consider returning a slim DTO (e.g., { id, title, documentDate }) instead of the full entity.

Input Validation

  • transcriptionFilter as an enum @RequestParam: Spring MVC will return 400 Bad Request for unknown enum values by default — this is the correct behavior. Verify that the error response goes through the existing DomainException / error handler, not as a raw Spring stack trace.
  • The size parameter on dashboard endpoints: what's the maximum allowed value? An unbound size=10000 would return thousands of documents and could become an inadvertent data exfiltration vector. Add @Max(20) or a hard cap in the service layer.
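A service-layer cap is a one-liner. A sketch under this review's assumption of a maximum of 20 — the class name, constant, and `clamp` helper are all hypothetical:

```java
// Sketch: hard cap for the dashboard `size` parameter, enforced in the
// service layer regardless of any @Max annotation on the controller.
final class PageSizes {
    static final int MAX_DASHBOARD_SIZE = 20; // assumption: cap suggested in this review

    static int clamp(int requested) {
        if (requested < 0) return 0;               // negative → empty result, not an error
        return Math.min(requested, MAX_DASHBOARD_SIZE);
    }
}
```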

No New Attack Surface

The native SQL uses JPA named parameters (:size, :threshold) — injection-safe. The filter enum is validated at deserialization. No new file uploads, no new user-controlled strings flowing into queries. This feature has minimal security surface overall.

Logging

Verify that the new transcriptionFilter parameter is not logged verbatim with logger.info("Search called with filter: " + transcriptionFilter) — use SLF4J parameterized logging: logger.debug("Transcription filter: {}", transcriptionFilter).

Author
Owner

🎨 Leonie Voss — UX Design & Accessibility

"Transkription ausstehend" Widget — Context Clarity

The widget mixes two distinct cases into one list: documents with no annotations at all, and documents with partial reviews. For a 60+ user, seeing "Brief von Oma 1943" in this list with no explanation is confusing — why is it here? What do I need to do?

Suggestion: Add a short contextual subtitle per row:

  • "Noch nicht begonnen" (0 annotations)
  • "3 von 8 Blöcken geprüft" (partial review %)

This removes ambiguity and tells the user what action to take.

Filter Chips — Accessibility

  • Filter chips must be <button aria-pressed="true|false"> elements, not styled <a> links. Screen readers announce aria-pressed state, so users know which filter is active.
  • The "Lesefertig ✓" label uses a Unicode checkmark. Add aria-label="Lesefertig, vollständig transkribiert" so screen reader users hear a complete description, not just "✓".
  • Each chip needs a min-h-[44px] touch target (WCAG 2.2 requirement, critical for 60+ users on touch devices). If the design uses small pill shapes, add py-3 to meet the minimum.

Mobile Layout (320px)

Three filter chips side-by-side at 320px will overflow and either clip or force horizontal scroll without indication. Recommend:

```svelte
<div class="flex flex-wrap gap-2">
  <!-- chips wrap to second line at narrow widths -->
</div>
```

flex-wrap is the correct fix — it avoids hidden overflow and works for both sighted and screen-reader users.

"Lesefertig" Empty State — Hidden, Not Collapsed

Hiding the "Lesefertig" card entirely when empty is the right call. Showing an empty "reward" section would feel like a broken UI. Confirm the implementation uses {#if readyToRead.length > 0} (conditional render, no DOM node) rather than CSS hidden (node exists, inaccessible to screen reader but present in DOM).

Positive Reinforcement Copy

The empty state for "Transkription ausstehend" ("Alle Dokumente sind transkribiert ✓") is a nice moment. Consider making it visually celebratory — a soft mint background (bg-brand-mint/10) rather than default grey, so it reads as an achievement rather than just an empty list.

Author
Owner

⚙️ Tobias Wendt — DevOps & Platform

Query Performance — Check Indexes First

The native SQL queries use correlated subqueries on document_annotations.document_id and transcription_blocks.document_id. These run on every dashboard load and on every search request that includes a transcriptionFilter.

Before merging, verify that index coverage exists:

```sql
-- Run EXPLAIN ANALYZE on the production DB (or a realistic dataset):
EXPLAIN ANALYZE
SELECT COUNT(*) FROM document_annotations WHERE document_id = '<some-uuid>';

EXPLAIN ANALYZE
SELECT SUM(CASE WHEN reviewed THEN 1.0 ELSE 0.0 END)::float / COUNT(*)
FROM transcription_blocks WHERE document_id = '<some-uuid>';
```

If document_id is not indexed on these tables, both subqueries become sequential scans and will be visible on the dashboard latency graphs. A missing index is a Flyway migration, not an application change.

Dashboard Load Budget

The home page already runs 3 parallel API calls on every dashboard load (stats, incomplete, recent-activity). Adding 2 more takes it to 5 parallel calls. On the CX32 (8GB RAM, 4 vCPU), this is fine — Promise.allSettled keeps them non-blocking. But set a mental baseline: if the dashboard P95 latency climbs above 500ms after this lands, these subquery-heavy endpoints are the first suspects.

No New Infrastructure

This feature adds zero new services, zero new volumes, and zero new environment variables. The CI pipeline does not change. From an ops perspective, this is low-risk.

Observability Recommendation

Once deployed, keep an eye on:

  • Query duration for /api/documents/needs-transcription and /api/documents/ready-to-read in application logs
  • Home page load time in browser dev tools

If the subqueries start showing up as slow in Postgres pg_stat_statements, a materialized view refreshed on a schedule (or updated by a trigger on transcription_blocks.reviewed changes) would be the next step.

Author
Owner

Design decisions — Mission Control Strip

We worked through the dashboard expansion problem and landed on a full-width Mission Control Strip below the existing two-column grid. It doesn't touch the right column (DropZone + NeedsMetadata), and nothing gets pushed below the fold.

Spec files (PR #244)

  • Pattern analysis: docs/specs/dashboard-expansion-patterns.html — four alternatives evaluated (Tabs, Accordion, Mission Control Strip, Priority Queue) with mockups and a recommendation
  • Final blueprint: docs/specs/mission-control-strip-final.html — everything needed for implementation

Three columns = three pipeline stages

| Column | Filter | Skill needed |
|---|---|---|
| Segmentierung | annotation_count = 0 | None — anyone draws boxes |
| Transkription | annotation_count > 0 AND reviewed_pct < 0.90 | Kurrent helpful |
| Lesefertig | reviewed_pct ≥ 0.90 | — |

Sorting strategy — seeded weekly shuffle

```sql
ORDER BY
  textedBlocks DESC,
  HASHTEXT(id::text || EXTRACT(WEEK FROM NOW())::int::text)
```

Partial progress always bubbles to the top. Zero-progress documents rotate weekly so buried easy docs eventually surface. Within a week the order is stable (no jitter on refresh).
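The stability property can be checked outside SQL. A sketch under stated assumptions — `WeeklyShuffle` and `weeklySortKey` are hypothetical, and Java's `String.hashCode` stands in for Postgres `HASHTEXT` (different mixing, same determinism):

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;

final class WeeklyShuffle {
    // Deterministic per-week sort key: identical on every refresh within the
    // same ISO week, different once the week number changes, so zero-progress
    // documents rotate weekly without jitter.
    static int weeklySortKey(String documentId, LocalDate today) {
        int week = today.get(WeekFields.ISO.weekOfWeekBasedYear());
        return (documentId + week).hashCode();
    }
}
```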

Hard documents — expert flag escape hatch

A new needs_expert BOOLEAN column on documents. Contributors can mark a document illegible from the enrich page (PATCH /api/documents/{id}/needs-expert). The UI shows a purple Experten gesucht badge; flagged docs sort after all unflagged ones. Prevents hard Kurrent pages from monopolising the Transkription column forever.


Progress bar — cold-start problem solved

1 500 documents with 0 transcriptions → a global "12 / 1 500" bar at 0.8% fill is demotivating (endowed-progress effect working against us). Solution: three different granularities:

  • Column headers — weekly pulse: ↑ +5 diese Woche (shows momentum, not total)
  • Transkription rows — per-document mini bar: 3 / 8 Blöcke (only shown when annotation_count > 0)
  • Lesefertig rows — percentage text: 94% geprüft

No global progress bar at all during the early phase.


New backend contracts needed

| Endpoint | Purpose |
|---|---|
| GET /api/transcription/segmentation-queue | Top 5 docs for Segmentierung column |
| GET /api/transcription/transcription-queue | Top 5 docs for Transkription column |
| GET /api/transcription/ready-to-read | Top 5 docs for Lesefertig column |
| GET /api/transcription/weekly-stats | Weekly pulse counts per column |
| PATCH /api/documents/{id}/needs-expert | Set/unset the expert flag |

New Svelte components

  • MissionControlStrip.svelte — strip wrapper with 3-column grid
  • SegmentationColumn.svelte — col 1
  • TranscriptionColumn.svelte — col 2 (with per-doc bars)
  • ReadyColumn.svelte — col 3 (mint background when filled, dashed border when empty)
  • ExpertBadge.svelte — purple badge + sort-to-bottom logic
Reference: marcel/familienarchiv#240