Add Prometheus alerting rules for OCR service #654
Context
PR #653 wires the ocr_* Prometheus metrics on ocr:8000/metrics. The
scrape target is configured, but no alerting rules consume the new signals
yet. Without alerts, a stuck or degraded OCR service is invisible until a
user reports it.
Scope
Add a Prometheus rule file (most likely
infra/observability/prometheus/rules/ocr.yml), wired up via the rule_files: list in prometheus.yml and containing the three alerts below.
Ownership of dispatch (Alertmanager → email / webhook) is out of scope for
this issue; assume the existing receiver chain.
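For orientation, the wiring could look roughly like the fragment below. This is a sketch, not the final config: it assumes prometheus.yml sits in infra/observability/prometheus/, so that the relative entry resolves against the config file's directory. If the file is mounted elsewhere in the container, the path needs adjusting.

```yaml
# prometheus.yml (fragment)
# Assumption: this file lives next to the rules/ directory, so the
# relative path below points at the new OCR rule file.
rule_files:
  - rules/ocr.yml
```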
Alert 1 — OCR service models not ready
Rationale: ocr_models_ready is set to 1 exactly once at the end of the
FastAPI lifespan. If it stays 0, the container is up but cannot serve OCR.
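A minimal sketch of what this rule might look like, based only on the rationale above. The group name, alert name, 5m hold time, severity label and annotation text are all assumptions to be tuned in review; only the ocr_models_ready metric comes from PR #653.

```yaml
# rules/ocr.yml (fragment, illustrative only)
groups:
  - name: ocr                      # assumed group name
    rules:
      - alert: OcrModelsNotReady   # assumed alert name
        # Fires while the gauge reports 0, i.e. the process is up and
        # being scraped but the lifespan hook has not finished loading models.
        expr: ocr_models_ready == 0
        for: 5m                    # assumed grace period for model loading
        labels:
          severity: critical       # assumed severity
        annotations:
          summary: "OCR service is up but its models are not ready"
          description: "ocr_models_ready has been 0 on {{ $labels.instance }} for 5 minutes; the container cannot serve OCR."
```

This expression only matches when the series exists with value 0. If the target is down and the series is absent, the rule does not fire, so a fully dead container would need to be caught by a separate check.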
Alert 2 — High skipped-page rate
Alert 3 — Training error rate
Acceptance criteria
infra/observability/prometheus/rules/ocr.yml exists with the three rules above.
prometheus.yml includes the file via rule_files:.
promtool check rules infra/observability/prometheus/rules/ocr.yml passes locally.
Related
docs/adr/023-prometheus-instrumentator-and-metrics-registry-injection.md