---
name: transcribe
description: Transcribe a document's PDF by visually analyzing each page, creating annotation-backed transcription blocks via the API with paragraph-level bounding boxes and OCR text.
---

# Transcribe: PDF-to-Transcription-Blocks Workflow

## Argument

The user provides:

1. A **document URL**, e.g. `http://localhost:5173/documents/{id}`. Extract the document UUID from the path.
2. A **PDF file path**, e.g. `@import/C-1654.pdf`. This is the source file to read and transcribe.

---

## Phase 1: Gather Context

1. **Read the PDF** using the Read tool to get the visual content of every page.
2. **Check the API.** The transcription blocks endpoint is:
   ```
   POST /api/documents/{documentId}/transcription-blocks
   ```
   with Basic Auth (`admin:admin123`) and JSON body:
   ```json
   {
     "pageNumber": <1-based>,
     "x": <0-1 normalized>,
     "y": <0-1 normalized>,
     "width": <0-1 normalized>,
     "height": <0-1 normalized>,
     "text": "transcribed text",
     "label": "optional label or null"
   }
   ```
3. **Check for existing blocks** with `GET /api/documents/{documentId}/transcription-blocks`. If blocks already exist, ask the user whether to delete them first or abort. Do not silently overwrite.

### Coordinate system

- All coordinates are **normalized 0-1 fractions** of page width and height.
- `(x, y)` is the **top-left corner** of the annotation rectangle.
- Page numbers are **1-based** (page 1 = 1, page 2 = 2).

---

## Phase 2: Visual Analysis & Segmentation

For each page of the PDF:

1. **Identify the script type**: typewritten, Kurrent/Sütterlin, Latin handwriting, mixed, printed, etc.
2. **Segment into logical blocks.** Each block is one visual paragraph or logical section:
   - Header / letterhead / date line
   - Salutation / greeting
   - Body paragraphs (split at natural paragraph breaks)
   - Closing / signature
   - Address fields (postcards)
   - Margin notes, annotations, stamps
   - Rotated text sections (note the rotation in the label)
3. **Estimate bounding boxes** for each block as normalized 0-1 coordinates. The rectangle should tightly enclose all the text in that block with a small margin.
4. **Assign labels** to structural blocks:
   - `Briefkopf`: letterhead / header with date and location
   - `Anrede`: salutation line
   - `Gruss`: closing and signature
   - `Adresse`: address field (postcards)
   - `Fortsetzung (gedreht)`: rotated continuation text
   - `null`: regular body paragraphs (no label needed)

---

## Phase 3: Transcription

For each identified block, transcribe the text:

### Rules

- **Never guess**. If a word or passage is not clearly readable, use `[unleserlich]` as a placeholder.
- Preserve the original spelling, punctuation, and line breaks where they indicate structure (e.g. address lines, signature blocks). Do not "correct" old German spelling.
- For typewritten text with handwritten corrections or additions above or below the line, note them inline, e.g. `statt [unleserlich]`, or describe them in brackets: `[handschriftliche Ergänzung: ...]`.
- For Kurrent/Sütterlin script, be especially conservative. It is better to mark something `[unleserlich]` than to guess incorrectly. If an entire block is unreadable, use: `[unleserlich - Kurrentschrift, kurze Beschreibung des Inhaltsbereichs]`.
- For rotated text, note the rotation in the label field.
- Use `\n` for line breaks within a block (e.g. multi-line addresses, signature blocks).

### Script-specific guidance

| Script | Expected readability | Notes |
|--------|---------------------|-------|
| Typewritten (Schreibmaschine) | High: most words should be readable | Watch for corrections, strikethroughs, carbon copy artifacts |
| Latin handwriting | Medium: depends on the hand | Easier than Kurrent but still variable |
| Kurrent / Sütterlin | Low: expect heavy `[unleserlich]` usage | Angular strokes, long-s, distinctive letter forms. Context helps (dates, place names, salutations are easier) |
| Mixed | Per-section | Common on postcards: Latin address + Kurrent message |

---

## Phase 4: Create Blocks via API

1. **Delete existing blocks** if the user approved it in Phase 1.
2. **Create blocks in reading order** using `curl` with Basic Auth:
   ```bash
   curl -s -u admin:admin123 -X POST \
     "http://localhost:8080/api/documents/${DOC_ID}/transcription-blocks" \
     -H "Content-Type: application/json" \
     -d '{
       "pageNumber": 1,
       "x": 0.03,
       "y": 0.02,
       "width": 0.94,
       "height": 0.07,
       "text": "...",
       "label": "Briefkopf"
     }'
   ```
3. Create blocks **page by page, top to bottom**. The API auto-assigns `sortOrder` incrementally.
4. Verify that each response returns a valid block ID.

---

## Phase 5: Summary

After all blocks are created, present a table:

| # | Page | Label | Readability | Content (truncated) |
|---|------|-------|-------------|---------------------|

Where readability is one of:

- **Klar**: fully readable, no `[unleserlich]` markers
- **Teilweise**: some `[unleserlich]` markers, majority readable
- **Schwer**: heavy `[unleserlich]` usage, only fragments readable
- **Unleserlich**: entire block could not be transcribed

End with a note about the overall script type and any sections that would benefit from expert review.
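
As an illustration of the coordinate rules in Phase 1, the request body in Phase 4, and the readability categories in Phase 5, the following is a minimal Python sketch. It is not part of the API. The helper names (`normalize_rect`, `build_block_payload`, `classify_readability`) are hypothetical, and the cut-off between **Teilweise** and **Schwer** (more readable words than `[unleserlich]` markers) is an assumption chosen for the example; the workflow itself only says "majority readable".

```python
import json
import re

# Matches "[unleserlich]" and longer variants such as
# "[unleserlich - Kurrentschrift, ...]".
UNREADABLE = re.compile(r"\[unleserlich[^\]]*\]")


def normalize_rect(px_x, px_y, px_w, px_h, page_w, page_h):
    """Convert a pixel rectangle (top-left origin) to 0-1 page fractions."""
    return (
        round(px_x / page_w, 4),
        round(px_y / page_h, 4),
        round(px_w / page_w, 4),
        round(px_h / page_h, 4),
    )


def build_block_payload(page_number, x, y, width, height, text, label=None):
    """Validate one block and return the JSON body as a dict."""
    if page_number < 1:
        raise ValueError("pageNumber is 1-based")
    for name, value in (("x", x), ("y", y), ("width", width), ("height", height)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside the 0-1 range")
    # The rectangle must not extend past the right or bottom page edge
    # (small tolerance for floating-point rounding).
    if x + width > 1.0 + 1e-9 or y + height > 1.0 + 1e-9:
        raise ValueError("rectangle extends beyond the page")
    return {
        "pageNumber": page_number,
        "x": x,
        "y": y,
        "width": width,
        "height": height,
        "text": text,
        "label": label,  # None is serialized as JSON null
    }


def classify_readability(text):
    """Map a transcribed block to one of the Phase 5 categories.

    Assumption: "Teilweise" means more readable words than markers.
    """
    markers = len(UNREADABLE.findall(text))
    readable_words = len(re.findall(r"\w+", UNREADABLE.sub(" ", text)))
    if markers == 0:
        return "Klar"
    if readable_words == 0:
        return "Unleserlich"
    return "Teilweise" if readable_words > markers else "Schwer"


if __name__ == "__main__":
    # Hypothetical letterhead measured on a 2000 x 2000 px page rendering.
    x, y, w, h = normalize_rect(60, 40, 1880, 140, page_w=2000, page_h=2000)
    payload = build_block_payload(
        1, x, y, w, h, "Berlin, den [unleserlich] 1923", "Briefkopf"
    )
    print(json.dumps(payload, ensure_ascii=False))
    print(classify_readability(payload["text"]))
```

The printed JSON can be passed as the `-d` body of the `curl` call in Phase 4. Validating before sending keeps out-of-range rectangles from reaching the API, whose behavior on invalid coordinates is not described here.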