Adds a logging.Filter on uvicorn.access that drops records whose request
path is /metrics or /health. Each is hit on a tight schedule (Prometheus
scrape interval and Docker healthcheck), so unfiltered they dominate the
access log without carrying any information about real traffic.
Refs #652 (Nora's recommendation)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirrors the existing _models_ready bool so Prometheus has a time-series
liveness/readiness signal for future alerting rules (e.g.
ocr_models_ready < 1 for 2m).
Refs #652 (AC7)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hits /train then /segtrain through the same test, each with a distinct
mocked accuracy, and asserts the labelled gauges reflect the two values.
Locks down the kind-label separation between recognition and segmentation
accuracy (decision #2).
Refs #652 (AC6)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wraps the await asyncio.to_thread(_run_*) calls in /train, /train-sender,
and /segtrain with try/except. Recognition training (/train, /train-sender)
shares kind="recognition"; /segtrain uses kind="segmentation". The
ocr_model_accuracy gauge is set per kind on success.
Refs #652 (AC6, decision #2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wraps every asyncio.to_thread(engine.extract_*) call with time.monotonic()
deltas in /ocr (per document) and in both /ocr/stream generators (per page).
Streaming buckets are the useful operational signal; the non-streaming
observation is a bonus.
Refs #652 (AC5)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Walks block["words"] before apply_confidence_markers strips the list, then
increments ocr_words_total by len(words) and ocr_illegible_words_total by
the count below threshold. Same pattern in both /ocr and /ocr/stream so the
ratio illegible/words is a faithful quality signal across endpoints.
Refs #652 (AC4)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bumps the counter in both /ocr/stream except blocks (standard and guided
generators) so the existing skipped_pages local variable now also flows
into Prometheus.
Refs #652 (AC3b)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bumps the counter inside both the standard and guided /ocr/stream
generators after a page yields its blocks, before the per-page json line is
emitted. Also moves the ocr_jobs_total increment for /ocr/stream right after
engine selection so the counter still fires when a page later errors out.
Refs #652 (AC3a)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Locks down AC2 for the non-Kurrent path. The same code branch in /ocr that
sets engine_name from script_type now has explicit coverage for both
HANDWRITING_KURRENT → kraken and TYPEWRITER → surya.
Refs #652 (AC2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pick engine="kraken" for HANDWRITING_KURRENT, engine="surya" otherwise,
then increment after the blocks have been extracted.
Refs #652 (AC2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Encapsulates every custom OCR metric in an OcrMetrics frozen dataclass and
exposes a `build_metrics(registry)` factory. Production main.py binds against
the default REGISTRY; tests construct a fresh CollectorRegistry per case and
monkeypatch main.metrics, so counter values stay isolated between tests
(decision #3 on issue #652, Option A).
Refs #652
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Locks down AC1: prometheus-fastapi-instrumentator must keep auto-exposing
http_requests_total and http_request_duration_seconds for application
traffic, not just register the /metrics endpoint.
Refs #652
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mount the instrumentator immediately after FastAPI app creation, excluding
/health and /metrics from request metrics to keep http_requests_total focused
on real application traffic.
Refs #652
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>