docs: add ADR-023 and glossary entries for OCR metrics

ADR-023 captures why prometheus-fastapi-instrumentator was chosen, the build_metrics(registry) factory pattern, and the test rebinding seam. The glossary gains four ops-aligned terms — illegible word, models-ready gauge, recognition vs segmentation accuracy — so the metrics documentation in OBSERVABILITY.md has a vocabulary to lean on. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 17:06:44 +02:00
parent 2dbb3c37b4
commit 2df71beb7e
2 changed files with 102 additions and 0 deletions
--- a/docs/GLOSSARY.md
+++ b/docs/GLOSSARY.md
@@ -80,6 +80,14 @@ _See also [DocumentStatus lifecycle](#documentstatus-lifecycle)._

 **Sütterlin** — A specific standardized style of Kurrent taught in German schools from 1915 to 1941.

+**Illegible word** — a word whose recognition confidence falls below the configured threshold; replaced with the literal token `[unleserlich]` in the rendered block text and counted in the `ocr_illegible_words_total` Prometheus counter.
+
+**Models-ready gauge** — the `ocr_models_ready` Prometheus gauge, flipped from `0` to `1` once the FastAPI lifespan startup has finished loading the Kraken model and the spell-checker. Used both for the `/health` endpoint and as the supervised signal for the `ocr_models_ready < 1 for 2m` alert.
+
+**Recognition model accuracy** — the accuracy reported by `ketos train` for the recognition (text-line) model, exposed as `ocr_model_accuracy{kind="recognition"}`. Sourced from `_parse_best_checkpoint` on the highest-scoring checkpoint after training.
+
+**Segmentation model accuracy** — the accuracy reported by `ketos segtrain` for the baseline layout analysis (`blla`) model, exposed as `ocr_model_accuracy{kind="segmentation"}`. Distinct from recognition accuracy because the two models are trained and improved independently.
+
 ---

 ## Other Domain Terms