Files

Marcel 3d3d4b8616 chore: add Claude personas, skills, memory, and project docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-14 23:21:15 +02:00

5.2 KiB

Raw Blame History

name, description

name	description
transcribe	Transcribe a document's PDF by visually analyzing each page, creating annotation-backed transcription blocks via the API with paragraph-level bounding boxes and OCR text.

Transcribe — PDF-to-Transcription-Blocks Workflow

Argument

The user provides:

A document URL, e.g. http://localhost:5173/documents/{id} — extract the document UUID from the path.
A PDF file path, e.g. @import/C-1654.pdf — the source file to read and transcribe.

Phase 1 — Gather Context

Read the PDF using the Read tool to get the visual content of every page.

Check the API — the transcription blocks endpoint is:

POST /api/documents/{documentId}/transcription-blocks

with Basic Auth (admin:admin123) and JSON body:

{
  "pageNumber": <1-based>,
  "x": <0-1 normalized>,
  "y": <0-1 normalized>,
  "width": <0-1 normalized>,
  "height": <0-1 normalized>,
  "text": "transcribed text",
  "label": "optional label or null"
}

Check for existing blocks — GET /api/documents/{documentId}/transcription-blocks. If blocks already exist, ask the user whether to delete them first or abort. Do not silently overwrite.

Coordinate system

All coordinates are normalized 0-1 fractions of page width and height.
x, y is the top-left corner of the annotation rectangle.
Page numbers are 1-based (page 1 = 1, page 2 = 2).

Phase 2 — Visual Analysis & Segmentation

For each page of the PDF:

Identify the script type: typewritten, Kurrent/Sutterlin, Latin handwriting, mixed, printed, etc.
Segment into logical blocks — each block is one visual paragraph or logical section:
- Header / letterhead / date line
- Salutation / greeting
- Body paragraphs (split at natural paragraph breaks)
- Closing / signature
- Address fields (postcards)
- Margin notes, annotations, stamps
- Rotated text sections (note the rotation in the label)
Estimate bounding boxes for each block as normalized 0-1 coordinates. The rectangle should tightly enclose all the text in that block with a small margin.
Assign labels to structural blocks:
- Briefkopf — letterhead / header with date and location
- Anrede — salutation line
- Gruss — closing and signature
- Adresse — address field (postcards)
- Fortsetzung (gedreht) — rotated continuation text
- null — regular body paragraphs (no label needed)

Phase 3 — Transcription

For each identified block, transcribe the text:

Rules

Never guess. If a word or passage is not clearly readable, use [unleserlich] as a placeholder.
Preserve the original spelling, punctuation, and line breaks where they indicate structure (e.g. address lines, signature blocks). Do not "correct" old German spelling.
For typewritten text with handwritten corrections/additions above or below the line, note them inline, e.g. statt [unleserlich] or describe in brackets: [handschriftliche Erganzung: ...].
For Kurrent/Sutterlin script: be especially conservative. It is better to mark something [unleserlich] than to guess incorrectly. If an entire block is unreadable, use: [unleserlich - Kurrentschrift, kurze Beschreibung des Inhaltsbereichs].
For rotated text, note the rotation in the label field.
Use \n for line breaks within a block (e.g. multi-line addresses, signature blocks).

Script-specific guidance

Script	Confidence threshold	Notes
Typewritten (Schreibmaschine)	High — most words should be readable	Watch for corrections, strikethroughs, carbon copy artifacts
Latin handwriting	Medium — depends on hand	Easier than Kurrent but still variable
Kurrent / Sutterlin	Low — expect heavy `[unleserlich]` usage	Angular strokes, long-s, distinctive letter forms. Context helps (dates, place names, salutations are easier)
Mixed	Per-section	Common on postcards: Latin address + Kurrent message

Phase 4 — Create Blocks via API

Delete existing blocks if user approved it in Phase 1.

Create blocks in reading order using curl with Basic Auth:

curl -s -u admin:admin123 -X POST \
  "http://localhost:8080/api/documents/${DOC_ID}/transcription-blocks" \
  -H "Content-Type: application/json" \
  -d '{ "pageNumber": 1, "x": 0.03, "y": 0.02, "width": 0.94, "height": 0.07, "text": "...", "label": "Briefkopf" }'

Create blocks page by page, top to bottom. The API auto-assigns sortOrder incrementally.
Verify each response returns a valid block ID.

Phase 5 — Summary

After all blocks are created, present a table:

#	Page	Label	Readability	Content (truncated)

Where readability is one of:

Klar — fully readable, no [unleserlich] markers
Teilweise — some [unleserlich] markers, majority readable
Schwer — heavy [unleserlich] usage, only fragments readable
Unleserlich — entire block could not be transcribed

End with a note about the overall script type and any sections that would benefit from expert review.

5.2 KiB Raw Blame History