5.2 KiB
5.2 KiB
name, description
| name | description |
|---|---|
| transcribe | Transcribe a document's PDF by visually analyzing each page, creating annotation-backed transcription blocks via the API with paragraph-level bounding boxes and OCR text. |
Transcribe — PDF-to-Transcription-Blocks Workflow
Argument
The user provides:
- A document URL, e.g.
http://localhost:5173/documents/{id}— extract the document UUID from the path. - A PDF file path, e.g.
@import/C-1654.pdf— the source file to read and transcribe.
Phase 1 — Gather Context
- Read the PDF using the Read tool to get the visual content of every page.
- Check the API — the transcription blocks endpoint is:
with Basic Auth (
POST /api/documents/{documentId}/transcription-blocksadmin:admin123) and JSON body:{ "pageNumber": <1-based>, "x": <0-1 normalized>, "y": <0-1 normalized>, "width": <0-1 normalized>, "height": <0-1 normalized>, "text": "transcribed text", "label": "optional label or null" } - Check for existing blocks —
GET /api/documents/{documentId}/transcription-blocks. If blocks already exist, ask the user whether to delete them first or abort. Do not silently overwrite.
Coordinate system
- All coordinates are normalized 0-1 fractions of page width and height.
x,yis the top-left corner of the annotation rectangle.- Page numbers are 1-based (page 1 = 1, page 2 = 2).
Phase 2 — Visual Analysis & Segmentation
For each page of the PDF:
- Identify the script type: typewritten, Kurrent/Sutterlin, Latin handwriting, mixed, printed, etc.
- Segment into logical blocks — each block is one visual paragraph or logical section:
- Header / letterhead / date line
- Salutation / greeting
- Body paragraphs (split at natural paragraph breaks)
- Closing / signature
- Address fields (postcards)
- Margin notes, annotations, stamps
- Rotated text sections (note the rotation in the label)
- Estimate bounding boxes for each block as normalized 0-1 coordinates. The rectangle should tightly enclose all the text in that block with a small margin.
- Assign labels to structural blocks:
Briefkopf— letterhead / header with date and locationAnrede— salutation lineGruss— closing and signatureAdresse— address field (postcards)Fortsetzung (gedreht)— rotated continuation textnull— regular body paragraphs (no label needed)
Phase 3 — Transcription
For each identified block, transcribe the text:
Rules
- Never guess. If a word or passage is not clearly readable, use
[unleserlich]as a placeholder. - Preserve the original spelling, punctuation, and line breaks where they indicate structure (e.g. address lines, signature blocks). Do not "correct" old German spelling.
- For typewritten text with handwritten corrections/additions above or below the line, note them inline, e.g.
statt [unleserlich]or describe in brackets:[handschriftliche Erganzung: ...]. - For Kurrent/Sutterlin script: be especially conservative. It is better to mark something
[unleserlich]than to guess incorrectly. If an entire block is unreadable, use:[unleserlich - Kurrentschrift, kurze Beschreibung des Inhaltsbereichs]. - For rotated text, note the rotation in the label field.
- Use
\nfor line breaks within a block (e.g. multi-line addresses, signature blocks).
Script-specific guidance
| Script | Confidence threshold | Notes |
|---|---|---|
| Typewritten (Schreibmaschine) | High — most words should be readable | Watch for corrections, strikethroughs, carbon copy artifacts |
| Latin handwriting | Medium — depends on hand | Easier than Kurrent but still variable |
| Kurrent / Sutterlin | Low — expect heavy [unleserlich] usage |
Angular strokes, long-s, distinctive letter forms. Context helps (dates, place names, salutations are easier) |
| Mixed | Per-section | Common on postcards: Latin address + Kurrent message |
Phase 4 — Create Blocks via API
- Delete existing blocks if user approved it in Phase 1.
- Create blocks in reading order using
curlwith Basic Auth:curl -s -u admin:admin123 -X POST \ "http://localhost:8080/api/documents/${DOC_ID}/transcription-blocks" \ -H "Content-Type: application/json" \ -d '{ "pageNumber": 1, "x": 0.03, "y": 0.02, "width": 0.94, "height": 0.07, "text": "...", "label": "Briefkopf" }' - Create blocks page by page, top to bottom. The API auto-assigns
sortOrderincrementally. - Verify each response returns a valid block ID.
Phase 5 — Summary
After all blocks are created, present a table:
| # | Page | Label | Readability | Content (truncated) |
|---|
Where readability is one of:
- Klar — fully readable, no
[unleserlich]markers - Teilweise — some
[unleserlich]markers, majority readable - Schwer — heavy
[unleserlich]usage, only fragments readable - Unleserlich — entire block could not be transcribed
End with a note about the overall script type and any sections that would benefit from expert review.