The previous approach ran find across the htrmopo cache, which failed
because the -newer /tmp comparison ran in a separate container. The
script now parses the 'Model dir: <path>' line from the kraken get
output directly.
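
A minimal sketch of the parsing step, assuming the output contains a single
line of the form 'Model dir: <path>' as quoted above. The helper name
parse_model_dir and the canned sample output (including its path) are made up
for illustration; they are not taken from the script or from real kraken output.

```shell
# Hypothetical helper: read command output on stdin and print the path
# that follows "Model dir:" on the first matching line.
parse_model_dir() {
  sed -n 's/^.*Model dir:[[:space:]]*//p' | head -n 1
}

# Canned stand-in for `kraken get <DOI>` output (illustrative only).
sample_output='Retrieving model
Model dir: /tmp/example-cache/abc123'

model_dir="$(printf '%s\n' "$sample_output" | parse_model_dir)"
echo "$model_dir"
```

In a real run the canned text would be replaced by piping the actual command
output into the helper, and an empty result should be treated as a failure.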
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Kraken 7 uses DOIs (not short names) to identify models from Zenodo.
Updated to use actual DOIs:
- 10.5281/zenodo.7933463 — German handwriting HTR
- 10.5281/zenodo.13788177 — McCATMuS generic handwritten/printed/typed
Added the -f pdf flag for PDF input, volume mounts for the import
directory, and a post-download copy from the htrmopo cache to the
models volume.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Runbook script to download both HTR-United Kurrent model candidates
(german_kurrent_manu_9, kurrent-de) into the ocr_models Docker volume,
test them against sample documents, and activate the winner.
Usage:
./scripts/download-kraken-models.sh # download both
./scripts/download-kraken-models.sh --activate 1 # pick model 1
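
One way the --activate option shown above could be handled is sketched below.
This is an assumption about the script's internals, not its actual code: the
function names are hypothetical, and the mapping of 1 and 2 to the two
candidates simply follows the order in which they are listed above.

```shell
# Hypothetical mapping from candidate number to model name.
# The ordering is an assumption based on the list in the message above.
candidate_name() {
  case "$1" in
    1) echo "german_kurrent_manu_9" ;;
    2) echo "kurrent-de" ;;
    *) return 1 ;;
  esac
}

# Hypothetical argument parser: sets $activate to the chosen number,
# or leaves it empty when the script should download both candidates.
parse_args() {
  activate=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --activate)
        [ "$#" -ge 2 ] || { echo "--activate needs a model number" >&2; return 2; }
        activate="$2"
        shift 2
        ;;
      *)
        echo "unknown argument: $1" >&2
        return 2
        ;;
    esac
  done
}

parse_args --activate 1
echo "activating: $(candidate_name "$activate")"
```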
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stops the container, removes the stale node_modules volume, and
rebuilds the image. Run this after adding or updating npm dependencies.
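
The three steps above might look roughly like the following sketch. The
service name, the volume name, and the DRY_RUN switch are all hypothetical
placeholders rather than values from this repository. The sketch also removes
the stopped container before the volume, on the assumption that Docker will
not delete a volume that a container still references.

```shell
# Hypothetical names; adjust to the real compose service and volume.
SERVICE="${SERVICE:-frontend}"
VOLUME="${VOLUME:-myproject_node_modules}"
# Defaults to printing commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

rebuild_service() {
  run docker compose rm --stop --force "$SERVICE"  # stop and remove the container
  run docker volume rm "$VOLUME"                   # drop the stale node_modules volume
  run docker compose build "$SERVICE"              # rebuild the image
}

rebuild_service
```

With DRY_RUN=0 the same function executes the commands for real.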
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>