feat(ocr): add DTA-derived historical German wordlist and generation script

153K words from dtak+dtae 1800-1899 corpora (min_freq=20),
covering pre-reform spellings common in Kurrent/Sütterlin documents.
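
For illustration only, a minimal sketch of how a min_freq cutoff like the one
above could be applied. This is not the generation script from this commit:
the function name `build_wordlist`, the tokenizing regex, and the choice to
preserve case (German nouns are capitalized) are all assumptions.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of Unicode letters (no digits, no underscore).
WORD_RE = re.compile(r"[^\W\d_]+")


def build_wordlist(texts, min_freq=20):
    """Return the sorted words that occur at least min_freq times in texts."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text))
    return sorted(word for word, n in counts.items() if n >= min_freq)


if __name__ == "__main__":
    # Tiny made-up sample, not DTA data.
    sample = ["Thür Thür theil", "Thür theil seyn"]
    for word in build_wordlist(sample, min_freq=2):
        print(word)
```

A higher min_freq trades coverage of rare historical spellings for fewer
OCR-noise entries in the resulting list.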

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Marcel
2026-04-17 16:48:26 +02:00
parent 6faaa3b7d6
commit 30a6cbeb7f
2 changed files with 153641 additions and 0 deletions

File diff suppressed because it is too large.