---
name: transcribe
description: Transcribe a document's PDF by visually analyzing each page, creating annotation-backed transcription blocks via the API with paragraph-level bounding boxes and OCR text.
---

# Transcribe — PDF-to-Transcription-Blocks Workflow

## Argument

The user provides:
1. A **document URL**, e.g. `http://localhost:5173/documents/{id}` — extract the document UUID from the path.
2. A **PDF file path**, e.g. `@import/C-1654.pdf` — the source file to read and transcribe.

---

## Phase 1 — Gather Context

1. **Read the PDF** using the Read tool to get the visual content of every page.
2. **Check the API** — the transcription blocks endpoint is:
   ```
   POST /api/documents/{documentId}/transcription-blocks
   ```
   with Basic Auth (`admin:admin123`) and JSON body:
   ```json
   {
     "pageNumber": <1-based>,
     "x": <0-1 normalized>,
     "y": <0-1 normalized>,
     "width": <0-1 normalized>,
     "height": <0-1 normalized>,
     "text": "transcribed text",
     "label": "optional label or null"
   }
   ```
3. **Check for existing blocks** — `GET /api/documents/{documentId}/transcription-blocks`. If blocks already exist, ask the user whether to delete them first or abort. Do not silently overwrite.

### Coordinate system

- All coordinates are **normalized 0-1 fractions** of page width and height.
- `x`, `y` is the **top-left corner** of the annotation rectangle.
- Page numbers are **1-based** (page 1 = 1, page 2 = 2).

---

## Phase 2 — Visual Analysis & Segmentation

For each page of the PDF:

1. **Identify the script type**: typewritten, Kurrent/Sutterlin, Latin handwriting, mixed, printed, etc.
2. **Segment into logical blocks** — each block is one visual paragraph or logical section:
   - Header / letterhead / date line
   - Salutation / greeting
   - Body paragraphs (split at natural paragraph breaks)
   - Closing / signature
   - Address fields (postcards)
   - Margin notes, annotations, stamps
   - Rotated text sections (note the rotation in the label)
3. **Estimate bounding boxes** for each block as normalized 0-1 coordinates. The rectangle should tightly enclose all the text in that block with a small margin.
4. **Assign labels** to structural blocks:
   - `Briefkopf` — letterhead / header with date and location
   - `Anrede` — salutation line
   - `Gruss` — closing and signature
   - `Adresse` — address field (postcards)
   - `Fortsetzung (gedreht)` — rotated continuation text
   - `null` — regular body paragraphs (no label needed)

---

## Phase 3 — Transcription

For each identified block, transcribe the text:

### Rules

- **Never guess**. If a word or passage is not clearly readable, use `[unleserlich]` as a placeholder.
- Preserve the original spelling, punctuation, and line breaks where they indicate structure (e.g. address lines, signature blocks). Do not "correct" old German spelling.
- For typewritten text with handwritten corrections/additions above or below the line, note them inline, e.g. `statt [unleserlich]` or describe in brackets: `[handschriftliche Erganzung: ...]`.
- For Kurrent/Sutterlin script: be especially conservative. It is better to mark something `[unleserlich]` than to guess incorrectly. If an entire block is unreadable, use: `[unleserlich - Kurrentschrift, kurze Beschreibung des Inhaltsbereichs]`.
- For rotated text, note the rotation in the label field.
- Use `\n` for line breaks within a block (e.g. multi-line addresses, signature blocks).

### Script-specific guidance

| Script | Confidence threshold | Notes |
|--------|---------------------|-------|
| Typewritten (Schreibmaschine) | High — most words should be readable | Watch for corrections, strikethroughs, carbon copy artifacts |
| Latin handwriting | Medium — depends on hand | Easier than Kurrent but still variable |
| Kurrent / Sutterlin | Low — expect heavy `[unleserlich]` usage | Angular strokes, long-s, distinctive letter forms. Context helps (dates, place names, salutations are easier) |
| Mixed | Per-section | Common on postcards: Latin address + Kurrent message |

---

## Phase 4 — Create Blocks via API

1. **Delete existing blocks** if user approved it in Phase 1.
2. **Create blocks in reading order** using `curl` with Basic Auth:
   ```bash
   curl -s -u admin:admin123 -X POST \
     "http://localhost:8080/api/documents/${DOC_ID}/transcription-blocks" \
     -H "Content-Type: application/json" \
     -d '{ "pageNumber": 1, "x": 0.03, "y": 0.02, "width": 0.94, "height": 0.07, "text": "...", "label": "Briefkopf" }'
   ```
3. Create blocks **page by page, top to bottom**. The API auto-assigns `sortOrder` incrementally.
4. Verify each response returns a valid block ID.

---

## Phase 5 — Summary

After all blocks are created, present a table:

| # | Page | Label | Readability | Content (truncated) |
|---|------|-------|-------------|---------------------|

Where readability is one of:
- **Klar** — fully readable, no `[unleserlich]` markers
- **Teilweise** — some `[unleserlich]` markers, majority readable
- **Schwer** — heavy `[unleserlich]` usage, only fragments readable
- **Unleserlich** — entire block could not be transcribed

End with a note about the overall script type and any sections that would benefit from expert review.