feat: Annotation-backed collaborative transcription system #176
Summary
Implement a collaborative transcription system where users draw turquoise rectangles on PDF scans to mark text regions, then type transcriptions into linked text blocks. Includes block-level comment threads with quoted text selections, version history, and mode exclusivity (transcribe mode vs annotate mode).
This is the core transcription editing experience — the counterpart to the read mode (separate issue).
Motivation
Family members need to collaboratively transcribe handwritten letters. The annotation-backed approach links each transcription block to a specific region on the scan, enabling scroll-sync and visual correspondence between handwriting and typed text.
Spec
📄 docs/specs/annotation-transcription-final-spec.html: open locally in browser for full mockups
Key screens
Core concepts
Annotation-backed blocks
transcription_block is created, linked to the annotation via annotation_id
sort_order (manual drag) or annotation Y-position
Mode exclusivity
Block-level comment threads with quoted selections
block_id only, no char-offset anchoring (avoids OT/CRDT complexity)
> "Breslau")
Version history
Data model changes
New table:
transcription_blocks: id, annotation_id, document_id, text, label, sort_order, created_by, updated_by, created_at, updated_at
Modified table:
document_comments: block_id
No char_offset_start/end columns: deliberately avoided. Char offsets shift when text is edited, requiring OT/CRDT infrastructure (Y.js future work, not MVP). A stale quote is better than a broken offset.
API endpoints
/api/documents/{id}/transcription-blocks
/api/documents/{id}/transcription-blocks
/api/transcription-blocks/{id}
/api/transcription-blocks/{id}
/api/transcription-blocks/{id}/history
/api/transcription-blocks/{id}/comments
/api/transcription-blocks/{id}/comments
Component architecture
TranscriptionEditView.svelte
TranscriptionBlock.svelte
BlockCommentThread.svelte
TranscriptionToolbar.svelte
AnnotationLayer.svelte
Acceptance criteria
transcription_blocks table created via Flyway migration
document_comments.block_id column added via Flyway migration
👨‍💻 Felix Brandt - Senior Fullstack Developer
Questions & Observations
This is a large feature — 7 new API endpoints, 2 Flyway migrations, 5+ new Svelte components, contenteditable text editing, drag-and-drop reordering, auto-save, comment threads, version history. This needs to be broken into multiple PRs. I'd suggest: (1) backend data model + CRUD endpoints, (2) frontend annotation drawing + block rendering, (3) comment threads, (4) version history, (5) auto-save + reordering. Each PR is independently shippable and testable.
contenteditable is a minefield: the spec says blocks use contenteditable for text editing. This means we need to handle: paste events (strip HTML, keep plain text only), undo/redo (browser default or custom?), keyboard shortcuts, cursor position management, IME input for non-Latin scripts. Have we considered using a <textarea> with auto-resize instead? Simpler, more predictable, and avoids the entire class of contenteditable bugs. The visual difference is negligible with proper styling.
Auto-save debounced 1.5s: the AC says "auto-save on block text changes (debounced 1.5s)." What happens when the user navigates away before the debounce fires? We need either beforeunload protection or an immediate flush on blur/navigation. Also: what's the HTTP method? PATCH for partial updates (just the text field) or PUT for the full block? The endpoint table shows PUT for update, which means we send the full block every time, even if only one character changed. Consider a PATCH endpoint for text-only updates to reduce payload.
document_id denormalized on transcription_blocks: the issue notes this is "for query convenience." That's fine, but it creates a consistency risk: annotation.document_id could theoretically differ from block.document_id. Should we add a DB constraint or just trust the application layer? A CHECK constraint cannot reference another table, so I'd suggest a composite foreign key on (annotation_id, document_id) or at minimum a comment in the migration explaining why it's denormalized.
Version history: the endpoint GET /api/transcription-blocks/{id}/history implies we're tracking block revisions. But the data model has no transcription_block_history table. Are we using Hibernate Envers, a manual audit table, or something else? This needs to be specced at the data model level before implementation.
Suggestions
TDD plan for the backend: Start with TranscriptionBlockService (test create, update, delete, and reorder). Then TranscriptionBlockController (test endpoint routing, permission checks, validation). Then the comment thread endpoints. Each layer gets its own test class.
Component decomposition: The 5 components listed look right. TranscriptionBlock.svelte is the densest: it owns the contenteditable, the block header (label, number), the footer (Kommentieren button, comment count), and the comment thread. That's potentially 4 visual regions, so consider splitting the footer into BlockFooter.svelte and the contenteditable into BlockEditor.svelte.
Annotation drawing: the "draw a rectangle on the PDF" interaction is non-trivial. This is essentially a canvas interaction layer. Is there an existing library we're using for annotations, or is this custom? The answer significantly affects the implementation complexity.
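To make the auto-save question above concrete, here is a minimal sketch of a debounced saver that can also flush immediately on blur or navigation. Everything in it is hypothetical (createAutoSaver, the Timers interface, the save callback); the issue does not prescribe an implementation, and the injectable timers exist only so the logic can be tested without waiting 1.5s.

```typescript
// Hypothetical helper, not part of the spec: debounce saves, but allow an
// immediate flush so pending text is not lost when the user leaves.
type SaveFn = (text: string) => void;

interface Timers {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
}

const realTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

function createAutoSaver(save: SaveFn, delayMs = 1500, timers: Timers = realTimers) {
  let pending: string | null = null; // latest unsaved text, if any
  let handle: unknown = null;        // handle of the scheduled debounce timer

  // Save the pending text right now and cancel any scheduled timer.
  function flush(): void {
    if (handle !== null) {
      timers.clear(handle);
      handle = null;
    }
    if (pending !== null) {
      const text = pending;
      pending = null;
      save(text);
    }
  }

  // Called on every text change: remember the text and restart the timer.
  function onInput(text: string): void {
    pending = text;
    if (handle !== null) timers.clear(handle);
    handle = timers.set(flush, delayMs);
  }

  return { onInput, flush, hasPending: () => pending !== null };
}
```

In a component, flush would be wired to the block's blur event and to route changes, while hasPending could decide whether a beforeunload confirmation is needed.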
🏗️ Markus Keller — Application Architect
Questions & Observations
This is the biggest feature in the project so far — new domain entity, new table, new service, new controller, modifications to the existing annotation and comment systems, contenteditable editing, auto-save, version history. This needs a phased implementation plan, not a single issue. I'd recommend splitting into at least 3 issues: (1) data model + backend CRUD, (2) frontend transcription UI, (3) comment threads + history.
Domain boundary question: TranscriptionBlock vs DocumentAnnotation. The block has an annotation_id FK, making it dependent on the annotation domain. But the block also has its own document_id, text, label, sort_order, created_by, updated_by. Is TranscriptionBlock its own domain or a sub-entity of DocumentAnnotation? This determines: does it get its own service (TranscriptionBlockService) or does it live inside the existing annotation service? I'd argue for its own service: the block has its own CRUD lifecycle, its own comment threads, its own version history. The annotation is just a spatial anchor.
document_comments.block_id FK: adding a nullable FK to the existing comments table ties two domains together at the DB level. This is acceptable for MVP, but be aware: deleting a transcription block now needs to handle cascading comment cleanup. ON DELETE SET NULL or ON DELETE CASCADE? SET NULL preserves the comment but orphans it; CASCADE deletes the thread. The issue doesn't specify.
Version history storage: no table defined for history. Options:
1. transcription_blocks_AUD), minimal code. But it's a framework coupling.
2. transcription_block_versions table: explicit, queryable, no framework magic.
3. SYSTEM_TIME versioning. Most powerful but complex.
For MVP, I'd recommend option 2: a simple transcription_block_versions table with (id, block_id, text, changed_by, changed_at). One INSERT per save. Simple query for history. No framework dependency.
Auto-save architecture: debounced 1.5s from the frontend, hitting PUT /api/transcription-blocks/{id}. This means: concurrent edits by two users on the same block will cause last-write-wins conflicts. Is that acceptable for MVP? If not, you need optimistic locking (@Version column + 409 Conflict response). The issue doesn't address concurrent editing; worth deciding now.
Suggestions
Flyway migration order: Create transcription_blocks table first, then alter document_comments to add block_id FK. Two separate migration files. The FK on document_comments should reference transcription_blocks(id) with ON DELETE SET NULL; orphaning a comment is better than silently deleting discussion.
API path design: The mixed paths (/api/documents/{id}/transcription-blocks for list/create, /api/transcription-blocks/{id} for update/delete) are fine; they follow the existing pattern. But the comment endpoints under /api/transcription-blocks/{id}/comments should verify that the authenticated user has READ_ALL permission on the parent document, not just on the block itself. Permission check must flow through the document.
🧪 Sara Holt - QA Engineer & Test Strategist
Questions & Observations
15 acceptance criteria — good coverage, but several are compound and need splitting for proper test mapping:
Missing acceptance criteria:
WRITE_ALL? Can a user edit another user's block?
TEXT (unlimited), but the frontend should probably have a reasonable limit.
Version history: the AC says "Version history accessible via Verlauf button" but doesn't define what's in the history. What fields are tracked? Just text changes? Label changes? Reordering? Who sees the history: all users or only WRITE_ALL?
Suggestions
TranscriptionBlockService: create, update, delete, reorder, version history retrieval. Permission checks. Cascade behavior on delete.
block_id on comments works. Version history inserts on save.
TranscriptionBlock.svelte: renders text, handles contenteditable input, fires save callback.
BlockCommentThread.svelte: renders comments, shows quoted text, handles new comment submission.
🔒 Nora "NullX" Steiner - Application Security Engineer
Questions & Observations
New attack surface: 7 API endpoints — this is the first feature that introduces user-generated content stored in the database (transcription text, comments with quoted selections). Each endpoint needs authorization checks.
contenteditable → stored text: users type into a contenteditable div, and this text is stored as-is in the text column. Critical question: is the text stored as plain text or HTML? If contenteditable produces HTML (which it does by default: <div>, <br>, <b> tags from paste/formatting), and that HTML is rendered back via {@html} in Svelte, we have a stored XSS vulnerability. The text MUST be either:
{text} interpolation, never {@html}), or
This is the single most important security consideration in this feature. The issue doesn't specify the text format.
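As a rough illustration of the plain-text path: Svelte's {text} interpolation escapes markup automatically, so the sketch below only shows what that escaping does to a stored payload. The escapeHtml helper is hypothetical and written for illustration; it is not an API from the project.

```typescript
// Hypothetical helper showing the effect of treating stored text as plain
// text: markup characters are escaped, so a pasted payload renders as
// visible characters instead of being interpreted as HTML.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text: string): string {
  // Replace each special character with its entity; unknown chars pass through.
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}
```

With this treatment, a block containing an image tag with an onerror handler is shown literally to the reader rather than executed.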
Comment body with markdown blockquotes: comments contain > "Breslau" quoted selections. If the comment body is rendered as markdown, any markdown injection (links, images, HTML in markdown) could be a vector. Ensure the markdown renderer (if used) sanitizes HTML within markdown. If comments are plain text with just the > visual treatment, this is fine.
Block deletion cascade: document_comments.block_id FK. If a malicious user can delete a block they don't own, they can orphan or cascade-delete other users' comments. Verify: delete endpoint checks that the user has WRITE_ALL permission AND (optionally) is the block creator or an admin.
Auto-save rate limiting: debounced 1.5s on the frontend, but a malicious client can bypass the debounce and flood the PUT endpoint. Add server-side rate limiting or at minimum throttle per-user per-block (e.g., max 1 update per second per block).
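The per-user, per-block throttle could look roughly like the sketch below. It is written in TypeScript purely to show the logic; the real check would live in the backend, and the class name, method name, and in-memory map are all assumptions made for illustration.

```typescript
// Hypothetical sketch of "max 1 update per second per user per block".
// The backend would need its own equivalent; this only shows the rule.
class BlockUpdateThrottle {
  private readonly lastAccepted = new Map<string, number>();
  private readonly minIntervalMs: number;

  constructor(minIntervalMs = 1000) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true if the update may proceed, false if it arrives too soon
  // after the previous accepted update by the same user on the same block.
  tryAccept(userId: string, blockId: string, nowMs: number = Date.now()): boolean {
    const key = `${userId}:${blockId}`;
    const last = this.lastAccepted.get(key);
    if (last !== undefined && nowMs - last < this.minIntervalMs) {
      return false;
    }
    this.lastAccepted.set(key, nowMs);
    return true;
  }
}
```

A rejected update would typically map to a 429 response. Note that the map in this sketch grows without bound, so a production version would need eviction.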
Version history data exposure: GET /api/transcription-blocks/{id}/history returns who changed what and when. This includes changed_by user information. Ensure this endpoint requires at least READ_ALL permission on the parent document, and that the user info returned is limited (display name only, not email or internal user IDs).
Suggestions
contenteditable output is stripped to text content before save."
@RequirePermission(Permission.WRITE_ALL) on create/update/delete endpoints. READ_ALL on list/get/history/comments.
🎨 Leonie Voss - UI/UX Design Lead
Questions & Observations
Contenteditable styling: the block cards use contenteditable for text input. This is the most delicate UI element in the feature. Key concerns:
[contenteditable]:empty::before can handle this.
::selection { background: rgba(0,199,177,.2) }) to visually connect the editing experience to the turquoise annotation color.
Block card density: each block has: numbered badge, label, contenteditable text area, footer with Kommentieren button + comment count + hint text. On a 4-block letter, the right panel is quite dense. Ensure adequate spacing: at least 12px between blocks, 8px padding inside each card. The footer should be visually quieter than the text area; don't let UI chrome compete with the content.
"Text markieren für Zitat" hint — this hint in the block footer guides users to select text before commenting. But it's always visible, even when no one is commenting. Consider showing it only when the comment thread is open or when the Kommentieren button is hovered. Permanent hints become visual noise after the first use.
Mobile transcribe mode (S4) — on mobile (stacked vertical), the contenteditable blocks need extra care:
Mode exclusivity visual feedback — when switching from annotate to transcribe mode, the yellow annotations dim and turquoise annotations brighten. This transition should be animated (opacity 300ms ease) so the user understands what changed. An abrupt switch could be confusing for 60+ users.
Suggestions
Block numbering — the turquoise numbered badges on both the PDF annotations and the block cards are the visual anchor linking the two panels. These numbers must always match and be prominent. If a user reorders blocks, the numbers must update on both sides simultaneously. Test with 10+ blocks to ensure the numbers remain readable at smaller sizes.
Auto-save indicator — the spec mentions auto-save but the AC doesn't describe a visual indicator. Users (especially 60+ users) need reassurance that their work is saved. Suggest a subtle "Gespeichert ✓" text in the status bar or block footer that appears after each successful save, fading after 2s. And a "Speichere..." indicator during the save request.
🔧 Tobias Wendt — DevOps & Platform Engineer
Questions & Observations
Flyway migrations: two new migrations: create transcription_blocks table, alter document_comments to add block_id. These run automatically on backend startup. No manual intervention needed. Standard pattern, no concerns.
Auto-save traffic impact: a 1.5s debounce caps each active editor at ~40 PUT requests per minute (60s / 1.5s); a true debounce fires only after a pause in typing, so the real rate is lower. With the expected user base (small family, 1-3 concurrent editors), this is negligible. But if the feature grows, consider: are we writing a full row UPDATE on every save, or just the text column? Full row UPDATE is fine for now, but a PATCH approach would reduce WAL volume in PostgreSQL.
Version history storage growth: if we insert a version row on every auto-save (as often as every 1.5s), a single editing session on one block could generate hundreds of version rows. Consider: should version history be throttled (e.g., one version per 30 seconds of inactivity) rather than one per save? Or: should we compact versions older than 24h into a single "session summary"? The unbounded growth of a _versions table could become a disk space concern over years.
PDF rendering + annotation drawing: drawing rectangles on a PDF requires PDF.js or similar. This is a client-side dependency with no infrastructure impact. But the PDF files themselves are served from MinIO/S3. If the PDF is large (multi-page scanned letters at 300dpi), the initial load time could be noticeable. Ensure the PDF viewer supports page-by-page loading rather than requiring the full file upfront.
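Returning to the version-history growth concern above, one way to throttle is to coalesce saves from the same editor into a single version until there has been a quiet period. The sketch below assumes the (block_id, text, changed_by, changed_at) shape suggested earlier in the thread; recordVersion and the 30-second default are illustrative, not decided.

```typescript
// Hypothetical sketch: keep one version per editing session by overwriting
// the latest row while the same user keeps saving within the quiet window.
interface BlockVersion {
  blockId: string;
  text: string;
  changedBy: string;
  changedAt: number; // epoch milliseconds
}

function recordVersion(
  history: BlockVersion[],
  next: BlockVersion,
  quietMs = 30_000,
): BlockVersion[] {
  const last = history[history.length - 1];
  const sameSession =
    last !== undefined &&
    last.changedBy === next.changedBy &&
    next.changedAt - last.changedAt < quietMs;

  // Same session: replace the latest version instead of appending a new one.
  if (sameSession) {
    return [...history.slice(0, -1), next];
  }
  // New session (different user or long enough pause): append.
  return [...history, next];
}
```

The trade-off is that intermediate states inside one session are lost, which may or may not be acceptable for the Verlauf view.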
No new environment variables, no new services: the feature adds backend endpoints and frontend components but no new infrastructure. The transcription_blocks table lives in the existing PostgreSQL database. Comments are in the existing document_comments table. Clean.
Suggestions
Database index: add an index on transcription_blocks(document_id, sort_order) for the list endpoint. And on document_comments(block_id) for the comment thread queries. These should be in the Flyway migration.
Backup consideration: transcription text is user-generated content that cannot be re-derived from the uploaded files. It's the output of manual human work. Ensure the existing PostgreSQL backup strategy (WAL-G or pg_dump) covers this table. It should by default, but worth verifying the backup is actually tested with a restore after this feature ships.
🎨 Leonie Voss — UI/UX Discussion Summary
Worked through 10 open UI/UX items with the team. All resolved.
Resolved items
Textarea over contenteditable: Use <textarea> with CSS auto-resize, styled to look seamless (no visible border, matching serif font, resize: none). Eliminates the entire class of contenteditable bugs (paste HTML, undo/redo inconsistency, IME issues) and resolves NullX's stored XSS concern, since textarea content is always plain text. Update the spec and AC accordingly.
Per-block save indicator: Each block shows its own save state in the footer, right-aligned, text-xs text-ink-2:
text-error, persistent with retry button
Turquoise left border on focus: Active block gets border-l-2 border-turquoise. The matching PDF annotation simultaneously brightens from dimmed (30%) to full opacity. Creates a visual link between "this block" and "that rectangle on the scan."
Quote hint shown on focus only: "Text markieren für Zitat" hint appears only when the block is focused (pairs with the turquoise left border focus state). Unfocused blocks stay clean: just text and a minimal footer with Kommentieren button and save status.
Desktop drag + mobile arrows: Desktop: drag handle (⠿ grip icon) on the left of each block. Mobile (< 768px): replace drag handle with ▲/▼ arrow buttons, 44×44px tap targets. Same position, right interaction per device.
300ms mode transition animation: transition: opacity 300ms ease on all annotation rectangles during mode switch. Dimmed annotations fade to 30%, active annotations fade to 100% simultaneously. Users with prefers-reduced-motion get instant switch.
Three-layer error escalation for save failures:
beforeunload confirmation dialog when unsaved changes exist and user navigates away
border-l-2 border-error) on blocks with failed saves: visible from a distance, uses same spatial position as the turquoise focus border
Block card spacing: gap-3 (12px) between blocks. p-4 (16px) card padding. Seamless textarea with py-2 internal padding. Footer separated by pt-2 border-t border-line, elements at text-xs text-ink-2. Numbered badge: 24px turquoise circle, top-left, slightly overlapping. Optional label: text-xs font-medium uppercase tracking-wide above textarea. Key principle: text area dominates, chrome recedes.
Stale quote detection: Simple block.text.includes(quotedString) check when rendering comments. If the quoted text is still found in the block, render the blockquote normally. If not found, add a muted label below: "Zitat aus älterer Version" (text-xs text-ink-2). No visual drama, just informational.
Drawing interaction affordance: Three layers for discoverability:
cursor: crosshair when hovering over the PDF in transcribe mode
Overall read
This is a complex feature but the UI decisions are now well-grounded. The textarea decision is the biggest win — it cuts implementation complexity dramatically while improving accessibility and security. The per-block save indicator and error escalation are essential for our 60+ users handling irreplaceable family documents. Ship with confidence.
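The stale quote detection resolved above is small enough to sketch. The block.text.includes(quotedString) check comes from the discussion; the extractQuote helper and the assumption that a comment body stores the quote as a line of the form > "Breslau" are illustrative guesses at the storage format.

```typescript
// Hypothetical helper: pull the quoted selection out of a comment body,
// assuming it is stored as a blockquote line such as: > "Breslau"
function extractQuote(commentBody: string): string | null {
  const match = /^>[ \t]*"(.*)"[ \t]*$/m.exec(commentBody);
  return match?.[1] ?? null;
}

// A quote is stale when the comment has one and the current block text
// no longer contains it. Comments without a quote are never stale.
function isQuoteStale(blockText: string, commentBody: string): boolean {
  const quote = extractQuote(commentBody);
  return quote !== null && !blockText.includes(quote);
}
```

When isQuoteStale returns true, the thread would render the muted "Zitat aus älterer Version" label under the blockquote.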
🏗️ Markus Keller — Application Architect
Interactive architecture discussion with the project owner. All 7 items were resolved.
Resolved Items
Domain boundary: TranscriptionBlock ownership → TranscriptionService as a facade. Owns TranscriptionBlockRepository, delegates annotation validation to AnnotationService. Transcription is its own bounded context, not a sub-concern of annotations.
Comment FK on block delete → ON DELETE CASCADE. Deleting a block deletes its comments. The UI must show a confirmation dialog with the comment count before proceeding.
Concurrent editing → Optimistic locking via @Version on TranscriptionBlock. Stale writes receive a 409 Conflict. Frontend shows a "block was modified by someone else, reload" message. Per-block saves keep the conflict scope narrow.
Version history → Manual transcription_block_versions table and entity created in #176. TranscriptionService.updateBlock() writes a version row on every save. No retrieval API or UI yet: history is captured from day one, consumption comes later.
Block reordering → Single transactional endpoint PUT /documents/{docId}/transcription-blocks/reorder receiving the full ordering. Atomic all-or-nothing: partial failures roll back.
Annotation↔Block lifecycle → The block is the aggregate root. The annotation is a dependent; no standalone annotation delete exists. Creating a block creates its annotation; deleting a block CASCADEs to the annotation (and per item 2, to comments). Users can only move/resize annotations, never delete them independently.
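The all-or-nothing reorder decided above can be sketched as a pure function: validate that the request lists every block exactly once, and only then compute the new sort_order values. The Block shape and applyReorder name are assumptions for illustration; the actual endpoint would do the equivalent inside a database transaction.

```typescript
// Hypothetical sketch of the "full ordering, atomic" reorder rule.
interface Block {
  id: string;
  sortOrder: number;
}

function applyReorder(blocks: Block[], orderedIds: string[]): Block[] {
  const existing = new Set(blocks.map((b) => b.id));
  const requested = new Set(orderedIds);

  // The request must be a permutation of the document's block ids:
  // same length, no duplicates, no unknown ids.
  const isPermutation =
    orderedIds.length === blocks.length &&
    requested.size === orderedIds.length &&
    orderedIds.every((id) => existing.has(id));
  if (!isPermutation) {
    throw new Error("Reorder must list every block of the document exactly once");
  }

  // Build the new list without mutating the input, so a rejected request
  // leaves the previous ordering untouched.
  const position = new Map(orderedIds.map((id, index) => [id, index] as const));
  return blocks
    .map((b) => ({ ...b, sortOrder: position.get(b.id)! }))
    .sort((a, b) => a.sortOrder - b.sortOrder);
}
```

Because validation happens before any change is computed, a malformed request never produces a half-applied ordering.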
API nesting → Document-level REST resources: /api/documents/{docId}/transcription-blocks/*. Annotation coordinates travel as a nested object in the block request/response payload. Authorization checks happen at the document level.
Architectural Overview
The key insight from this discussion is that transcription is its own bounded context, not a sub-feature of annotations. The TranscriptionService facade owns the block lifecycle, the annotation is a visual dependent of the block (not the other way around), and the API is rooted at the document level. This gives a clean domain model that scales naturally as read mode (#177), version history retrieval, and future features are added.
The data integrity chain is deliberate and strict: delete block → CASCADE annotation + CASCADE comments, with UI confirmation at the trigger point. Optimistic locking prevents silent overwrites. Version history is captured from day one with zero retrieval overhead until it's needed.
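To illustrate how optimistic locking and version capture interact, here is an in-memory sketch of the update path. It is not the Spring/JPA implementation: InMemoryTranscriptionStore, the result shape, and the field names are hypothetical stand-ins for @Version on TranscriptionBlock and the transcription_block_versions table.

```typescript
// Hypothetical in-memory model of "stale write gets 409, accepted write
// bumps the version and records a history row".
interface StoredBlock {
  id: string;
  text: string;
  version: number;
}

interface VersionRow {
  blockId: string;
  text: string;
  changedBy: string;
}

type UpdateResult =
  | { status: 200; block: StoredBlock }
  | { status: 409; current: StoredBlock };

class InMemoryTranscriptionStore {
  private readonly blocks = new Map<string, StoredBlock>();
  readonly versions: VersionRow[] = [];

  create(id: string, text: string): StoredBlock {
    const block: StoredBlock = { id, text, version: 0 };
    this.blocks.set(id, block);
    return block;
  }

  updateBlock(id: string, text: string, expectedVersion: number, userId: string): UpdateResult {
    const current = this.blocks.get(id);
    if (current === undefined) {
      throw new Error(`Unknown block ${id}`);
    }
    // Stale write: the client edited an older version of the block.
    if (current.version !== expectedVersion) {
      return { status: 409, current };
    }
    // Accepted write: bump the version and capture a history row.
    const updated: StoredBlock = { ...current, text, version: current.version + 1 };
    this.blocks.set(id, updated);
    this.versions.push({ blockId: id, text, changedBy: userId });
    return { status: 200, block: updated };
  }
}
```

On a 409 the frontend would show the "block was modified by someone else, reload" message and retry only after the user has seen the current text.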