# ADR-023: Prometheus Instrumentator and Metrics Registry Injection

## Status

Accepted

## Context

Until issue #652 the OCR service exposed no `/metrics` endpoint. The observability stack already scrapes the Spring Boot backend's actuator endpoint, but it had nothing to scrape on the Python side. Without HTTP- and domain-level metrics from `ocr-service`, we cannot answer questions such as "what share of words is rendered as `[unleserlich]`?" or "is the training error rate above its budget?" from Grafana.

Two implementation requirements shaped the design:

1. **Counter / gauge isolation in tests.** `prometheus_client` collectors are usually defined as module-level singletons and registered by name on the global `REGISTRY`. Reloading the module or naively re-instantiating a collector with the same name raises a duplicated-collector error (`ValueError: Duplicated timeseries in CollectorRegistry`), and state leaks across tests (a `.inc()` in test A is still visible to test B). The test harness therefore needs a way to swap the active metrics container for a fresh per-test instance.
2. **Minimal blast radius on the request path.** We did not want to hand-instrument every endpoint with FastAPI middleware. The `prometheus-fastapi-instrumentator` library already provides `http_requests_total`, `http_request_duration_seconds`, and the `/metrics` exposition route, all with idiomatic Prometheus names.

## Decision

- Add `prometheus-fastapi-instrumentator==7.0.0` to `ocr-service/requirements.txt` and pin its transitive dependency `prometheus-client==0.25.0` there explicitly.
- Mount the instrumentator once at module load: `Instrumentator(excluded_handlers=["/health", "/metrics"]).instrument(app).expose(app)`. This adds the `/metrics` route and HTTP-level request metrics for dashboards without changing any endpoint code.
- Define every domain metric (`ocr_jobs_total`, `ocr_pages_total`, `ocr_processing_seconds`, …) inside a `build_metrics(registry)` factory in `ocr-service/metrics.py` that returns a frozen `OcrMetrics` dataclass. Production code binds the container to the default `REGISTRY` once: `metrics: OcrMetrics = build_metrics(REGISTRY)`.
- Tests use a `fresh_metrics` fixture that builds a new `CollectorRegistry()` per test and monkeypatches `main.metrics` with a container bound to it. Endpoint code keeps reading attributes of `metrics` without knowing whether it is talking to the global registry or a per-test one.

## Consequences

**Positive**

- One reusable factory holds all metric definitions, so future metrics are added in one place.
- Tests run with full counter isolation: a test that uses `fresh_metrics` cannot observe increments made by another test, because each test gets its own dataclass instance bound to its own registry.
- The instrumentator provides the `http_*` metrics for free, including a Grafana-ready request-duration histogram that pairs with the Spring Boot one.

**Negative**

- One extra level of indirection: any test that asserts on metric values must monkeypatch `main.metrics`, not the registry. Swapping the registry alone is harmless but has no effect, because the dataclass still holds references to the collectors registered on the original registry.
- `prometheus-client` is now pinned. Upgrading it requires an explicit bump and a re-check of the instrumentator's supported version range.
- `/metrics` is exposed without authentication and relies on the Docker internal network for confidentiality. See [docs/OBSERVABILITY.md §Internal-only endpoints](../OBSERVABILITY.md) for the Caddy snippet that must be added if the service ever gets a host-side port mapping.

## Alternatives considered

- **Hand-roll the `/metrics` endpoint.** Rejected: it would duplicate what `prometheus-fastapi-instrumentator` already ships, plus the middleware needed for the HTTP histograms.
- **Skip the factory and pass `registry` as a function argument everywhere.** Rejected: it clutters every endpoint signature and breaks symmetry with the Spring Boot side, which also relies on a process-global Micrometer registry.
- **Use a `pytest` autouse fixture that resets `REGISTRY` between tests.** Rejected: `prometheus_client` does not expose a clean "unregister all" hook, so we would have to rely on private APIs.
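
The factory-plus-container pattern from the Decision section can be sketched as follows. This is a minimal illustration assuming `prometheus_client` is installed, not the actual contents of `ocr-service/metrics.py`: the three fields, the hypothetical `status` label, and the help strings are assumptions, and two throwaway `CollectorRegistry` instances stand in for two separate tests.

```python
from dataclasses import dataclass

from prometheus_client import REGISTRY, CollectorRegistry, Counter, Histogram


@dataclass(frozen=True)
class OcrMetrics:
    """Immutable container for the domain collectors (illustrative subset)."""

    jobs_total: Counter  # hypothetical "status" label
    pages_total: Counter
    processing_seconds: Histogram


def build_metrics(registry: CollectorRegistry) -> OcrMetrics:
    """Create every domain collector and register it on `registry` only."""
    return OcrMetrics(
        jobs_total=Counter(
            "ocr_jobs_total", "OCR jobs processed.", ["status"], registry=registry
        ),
        pages_total=Counter(
            "ocr_pages_total", "Pages run through OCR.", registry=registry
        ),
        processing_seconds=Histogram(
            "ocr_processing_seconds", "Wall-clock time per OCR job.", registry=registry
        ),
    )


# Production: bind once to the process-global registry that /metrics exposes.
metrics: OcrMetrics = build_metrics(REGISTRY)

# Tests: each test builds its own registry, so every counter starts from zero.
registry_a, registry_b = CollectorRegistry(), CollectorRegistry()
metrics_a, metrics_b = build_metrics(registry_a), build_metrics(registry_b)

metrics_a.jobs_total.labels(status="ok").inc()
metrics_a.pages_total.inc(3)
with metrics_a.processing_seconds.time():
    pass  # the OCR work would run here

jobs_a = registry_a.get_sample_value("ocr_jobs_total", {"status": "ok"})
jobs_b = registry_b.get_sample_value("ocr_jobs_total", {"status": "ok"})
pages_b = registry_b.get_sample_value("ocr_pages_total")
print(jobs_a, jobs_b, pages_b)  # 1.0 None 0.0

# Registering the same names twice on one registry is the failure the
# per-test registries avoid.
try:
    build_metrics(registry_a)
    duplicate_error = ""
except ValueError as exc:
    duplicate_error = str(exc)  # "Duplicated timeseries in CollectorRegistry: ..."
```

In a pytest suite, the `fresh_metrics` fixture would do roughly the equivalent of `monkeypatch.setattr(main, "metrics", build_metrics(CollectorRegistry()))` and hand the new registry to the test so it can call `get_sample_value` on it; the exact fixture body shown here is an assumption. Because the dataclass is frozen, a test cannot accidentally reassign a single collector on the shared container.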
## References

- Issue: [#652](https://git.raddatz.cloud/marcel/familienarchiv/issues/652)
- Library:
- Code: `ocr-service/metrics.py`, `ocr-service/main.py`, `ocr-service/test_metrics.py`