familienarchiv/.claude/skills/transcribe/SKILL.md
2026-04-14 23:21:15 +02:00


name: transcribe
description: Transcribe a document's PDF by visually analyzing each page, creating annotation-backed transcription blocks via the API with paragraph-level bounding boxes and OCR text.

Transcribe — PDF-to-Transcription-Blocks Workflow

Arguments

The user provides:

  1. A document URL, e.g. http://localhost:5173/documents/{id} — extract the document UUID from the path.
  2. A PDF file path, e.g. @import/C-1654.pdf — the source file to read and transcribe.
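
Extracting the UUID from the document URL can be sketched as follows. This is a minimal illustration, not part of the skill itself: the function name `extract_document_id` is hypothetical, and it assumes the ID is the path segment directly after `documents`.

```python
import uuid
from urllib.parse import urlparse


def extract_document_id(document_url: str) -> str:
    """Return the UUID that follows /documents/ in the URL path."""
    segments = [s for s in urlparse(document_url).path.split("/") if s]
    try:
        candidate = segments[segments.index("documents") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no /documents/{{id}} segment in {document_url!r}") from None
    # uuid.UUID raises ValueError if the segment is not a well-formed UUID.
    return str(uuid.UUID(candidate))
```

Rejecting malformed IDs at this point avoids sending requests to a non-existent document later in the workflow.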

Phase 1 — Gather Context

  1. Read the PDF using the Read tool to get the visual content of every page.
  2. Check the API — the transcription blocks endpoint is:
    POST /api/documents/{documentId}/transcription-blocks
    
    with Basic Auth (admin:admin123) and JSON body:
    {
      "pageNumber": <1-based>,
      "x": <0-1 normalized>,
      "y": <0-1 normalized>,
      "width": <0-1 normalized>,
      "height": <0-1 normalized>,
      "text": "transcribed text",
      "label": "optional label or null"
    }
    
  3. Check for existing blocks: GET /api/documents/{documentId}/transcription-blocks. If blocks already exist, ask the user whether to delete them first or abort. Do not silently overwrite.
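
The existing-blocks check in step 3 could look roughly like this in Python. It is a sketch under assumptions: the helper names are hypothetical, the base URL is taken from the curl example in Phase 4, and the response is assumed to be a JSON array of blocks, which this document does not confirm.

```python
import base64
import json
import urllib.request


def build_list_request(base_url, doc_id, user, password):
    """Build the GET request for a document's transcription blocks."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/api/documents/{doc_id}/transcription-blocks",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def fetch_existing_blocks(request):
    """Send the request and decode the body (assumed: a JSON array)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

If `fetch_existing_blocks` returns a non-empty list, stop and ask the user before creating anything.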

Coordinate system

  • All coordinates are normalized 0-1 fractions of page width and height.
  • x, y is the top-left corner of the annotation rectangle.
  • Page numbers are 1-based: the first page is pageNumber 1, the second is pageNumber 2.
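
As a worked example of the normalization, the sketch below converts a rectangle measured in pixels on a rendered page image into the 0-1 fractions described above. The function names are hypothetical, and two points are assumptions rather than documented API behavior: that pixel coordinates are measured from the top-left of the page, and that a box must not extend past the page edge.

```python
def validate_box(box):
    """Raise ValueError if a normalized box falls outside the page."""
    x, y, w, h = box["x"], box["y"], box["width"], box["height"]
    if not (0 <= x <= 1 and 0 <= y <= 1):
        raise ValueError(f"top-left corner outside page: {box}")
    if w <= 0 or h <= 0:
        raise ValueError(f"width and height must be positive: {box}")
    # Small tolerance so float rounding at the page edge is not rejected.
    if x + w > 1 + 1e-9 or y + h > 1 + 1e-9:
        raise ValueError(f"box extends past the page edge: {box}")


def normalize_box(left_px, top_px, width_px, height_px, page_width_px, page_height_px):
    """Convert a pixel rectangle to normalized 0-1 page fractions."""
    if page_width_px <= 0 or page_height_px <= 0:
        raise ValueError("page dimensions must be positive")
    box = {
        "x": left_px / page_width_px,
        "y": top_px / page_height_px,
        "width": width_px / page_width_px,
        "height": height_px / page_height_px,
    }
    validate_box(box)
    return {key: round(value, 4) for key, value in box.items()}
```

For a 2000 x 2000 pixel page, `normalize_box(60, 40, 1880, 140, 2000, 2000)` yields x 0.03, y 0.02, width 0.94, height 0.07.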

Phase 2 — Visual Analysis & Segmentation

For each page of the PDF:

  1. Identify the script type: typewritten, Kurrent/Sütterlin, Latin handwriting, mixed, printed, etc.
  2. Segment into logical blocks — each block is one visual paragraph or logical section:
    • Header / letterhead / date line
    • Salutation / greeting
    • Body paragraphs (split at natural paragraph breaks)
    • Closing / signature
    • Address fields (postcards)
    • Margin notes, annotations, stamps
    • Rotated text sections (note the rotation in the label)
  3. Estimate bounding boxes for each block as normalized 0-1 coordinates. The rectangle should tightly enclose all the text in that block with a small margin.
  4. Assign labels to structural blocks:
    • Briefkopf — letterhead / header with date and location
    • Anrede — salutation line
    • Gruss — closing and signature
    • Adresse — address field (postcards)
    • Fortsetzung (gedreht) — rotated continuation text
    • null — regular body paragraphs (no label needed)
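
One way to hold the result of segmentation is a small record per block that mirrors the request body from Phase 1. The class name `Segment` and its methods are illustrative only. The label tuple copies the labels listed above and is not assumed to be exhaustive.

```python
from dataclasses import dataclass
from typing import Optional

# Structural labels named in this skill; None marks a regular body paragraph.
STRUCTURAL_LABELS = ("Briefkopf", "Anrede", "Gruss", "Adresse", "Fortsetzung (gedreht)")


@dataclass
class Segment:
    page_number: int          # 1-based
    x: float                  # normalized top-left corner
    y: float
    width: float              # normalized size
    height: float
    label: Optional[str] = None
    text: str = ""            # filled in during Phase 3

    def is_structural(self) -> bool:
        return self.label in STRUCTURAL_LABELS

    def to_payload(self) -> dict:
        """Map the record to the JSON field names used by the API."""
        return {
            "pageNumber": self.page_number,
            "x": self.x,
            "y": self.y,
            "width": self.width,
            "height": self.height,
            "text": self.text,
            "label": self.label,
        }
```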

Phase 3 — Transcription

For each identified block, transcribe the text:

Rules

  • Never guess. If a word or passage is not clearly readable, use [unleserlich] as a placeholder.
  • Preserve the original spelling, punctuation, and line breaks where they indicate structure (e.g. address lines, signature blocks). Do not "correct" old German spelling.
  • For typewritten text with handwritten corrections/additions above or below the line, note them inline, e.g. statt [unleserlich], or describe them in brackets: [handschriftliche Ergänzung: ...].
  • For Kurrent/Sütterlin script: be especially conservative. It is better to mark something [unleserlich] than to guess incorrectly. If an entire block is unreadable, use: [unleserlich - Kurrentschrift, kurze Beschreibung des Inhaltsbereichs].
  • For rotated text, note the rotation in the label field.
  • Use \n for line breaks within a block (e.g. multi-line addresses, signature blocks).
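
Hand-writing JSON for multi-line text is error-prone, because line breaks must appear as the escape sequence `\n` and quotes or apostrophes in the transcription can break a shell-quoted body. The following sketch lets the standard `json` module do the escaping. The function name `encode_text_block` is hypothetical.

```python
import json


def encode_text_block(page_number, box, lines, label=None):
    """Serialize one block; `lines` are joined with newline characters."""
    body = {
        "pageNumber": page_number,
        **box,  # expects the keys x, y, width, height
        "text": "\n".join(lines),
        "label": label,  # None is serialized as JSON null
    }
    # ensure_ascii=False keeps umlauts and ß readable; send the result as UTF-8.
    return json.dumps(body, ensure_ascii=False)
```

The returned string is a single line in which each line break appears as `\n`, and it can be passed to curl via `--data-binary @file` or sent directly from Python.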

Script-specific guidance

| Script | Confidence threshold | Notes |
|---|---|---|
| Typewritten (Schreibmaschine) | High: most words should be readable | Watch for corrections, strikethroughs, carbon copy artifacts |
| Latin handwriting | Medium: depends on hand | Easier than Kurrent but still variable |
| Kurrent / Sütterlin | Low: expect heavy [unleserlich] usage | Angular strokes, long-s, distinctive letter forms. Context helps (dates, place names, salutations are easier) |
| Mixed | Per-section | Common on postcards: Latin address + Kurrent message |

Phase 4 — Create Blocks via API

  1. Delete existing blocks if the user approved deletion in Phase 1.
  2. Create blocks in reading order using curl with Basic Auth:
    curl -s -u admin:admin123 -X POST \
      "http://localhost:8080/api/documents/${DOC_ID}/transcription-blocks" \
      -H "Content-Type: application/json" \
      -d '{ "pageNumber": 1, "x": 0.03, "y": 0.02, "width": 0.94, "height": 0.07, "text": "...", "label": "Briefkopf" }'
    
  3. Create blocks page by page, top to bottom. The API auto-assigns sortOrder incrementally.
  4. Verify each response returns a valid block ID.
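
Steps 2 to 4 can be sketched in Python as an alternative to issuing curl calls by hand. Several things here are assumptions: the function names are hypothetical, the response is assumed to be a JSON object with an `id` field, and sorting by page, then `y`, then `x` is only an approximation of reading order that will not suit rotated text or margin notes.

```python
import base64
import json
import urllib.request


def post_block(base_url, doc_id, payload, user, password):
    """POST one block and return the decoded JSON response."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        f"{base_url}/api/documents/{doc_id}/transcription-blocks",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def create_blocks(payloads, send):
    """Send payloads page by page, top to bottom, and collect block IDs.

    `send` is any callable that takes a payload dict and returns the
    decoded response, which keeps this function testable without a server.
    """
    ordered = sorted(payloads, key=lambda p: (p["pageNumber"], p["y"], p["x"]))
    created_ids = []
    for payload in ordered:
        response = send(payload)
        block_id = response.get("id")  # assumed field name
        if not block_id:
            raise RuntimeError(f"no block ID in response for page {payload['pageNumber']}")
        created_ids.append(block_id)
    return created_ids
```

A call such as `create_blocks(payloads, lambda p: post_block("http://localhost:8080", doc_id, p, "admin", "admin123"))` stops at the first response that lacks an ID, so a failure does not go unnoticed while later blocks are still being created.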

Phase 5 — Summary

After all blocks are created, present a table:

| # | Page | Label | Readability | Content (truncated) |
|---|---|---|---|---|

Where readability is one of:

  • Klar — fully readable, no [unleserlich] markers
  • Teilweise — some [unleserlich] markers, majority readable
  • Schwer — heavy [unleserlich] usage, only fragments readable
  • Unleserlich — entire block could not be transcribed
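
A rough way to derive the readability level from a block's text is to count `[unleserlich ...]` markers against the remaining words. The cut-off between Teilweise and Schwer is not defined in this skill, so the rule used here (readable words must outnumber markers) is an assumption, and the function name is hypothetical.

```python
import re

# Matches [unleserlich] as well as longer forms like [unleserlich - Kurrentschrift, ...].
MARKER = re.compile(r"\[unleserlich[^\]]*\]")


def classify_readability(text: str) -> str:
    """Map a transcribed block to one of the four readability levels."""
    markers = MARKER.findall(text)
    if not markers:
        return "Klar"
    readable_words = MARKER.sub(" ", text).split()
    if not readable_words:
        return "Unleserlich"
    # Assumed threshold: "majority readable" means words outnumber markers.
    if len(readable_words) > len(markers):
        return "Teilweise"
    return "Schwer"
```

Because a single marker can stand for a whole passage, treat the result as a starting point and adjust it by visual judgment.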

End with a note about the overall script type and any sections that would benefit from expert review.