diff --git a/docs/GLOSSARY.md b/docs/GLOSSARY.md
index 21addb4c..99da1775 100644
--- a/docs/GLOSSARY.md
+++ b/docs/GLOSSARY.md
@@ -80,6 +80,14 @@ _See also [DocumentStatus lifecycle](#documentstatus-lifecycle)._
 
 **Sütterlin** — A specific standardized style of Kurrent taught in German schools from 1915 to 1941.
 
+**Illegible word** — a word whose recognition confidence falls below the configured threshold; replaced with the literal token `[unleserlich]` in the rendered block text and counted in the `ocr_illegible_words_total` Prometheus counter.
+
+**Models-ready gauge** — the `ocr_models_ready` Prometheus gauge, flipped from `0` to `1` once the FastAPI lifespan startup has finished loading the Kraken model and the spell-checker. Used both for the `/health` endpoint and as the supervised signal for the `ocr_models_ready < 1 for 2m` alert.
+
+**Recognition model accuracy** — the accuracy reported by `ketos train` for the recognition (text-line) model, exposed as `ocr_model_accuracy{kind="recognition"}`. Sourced from `_parse_best_checkpoint` on the highest-scoring checkpoint after training.
+
+**Segmentation model accuracy** — the accuracy reported by `ketos segtrain` for the baseline layout analysis (`blla`) model, exposed as `ocr_model_accuracy{kind="segmentation"}`. Distinct from recognition accuracy because the two models are trained and improved independently.
+
 ---
 
 ## Other Domain Terms
diff --git a/docs/OBSERVABILITY.md b/docs/OBSERVABILITY.md
index b895e849..2b3855d6 100644
--- a/docs/OBSERVABILITY.md
+++ b/docs/OBSERVABILITY.md
@@ -118,11 +118,14 @@ To find a trace for a specific request in staging/production, either increase th
 
 ## Metrics (Prometheus → Grafana)
 
-Prometheus scrapes the backend management endpoint every 15 s:
+Prometheus scrapes two targets every 15 s:
 
 ```
 Target: backend:8081/actuator/prometheus
 Labels: job="spring-boot", application="Familienarchiv"
+
+Target: ocr:8000/metrics
+Labels: job="ocr-service"
 ```
 
 All Spring Boot metrics carry the `application="Familienarchiv"` tag, which is how the Grafana Spring Boot Observability dashboard (ID 17175) filters to this service.
@@ -146,6 +149,70 @@ jvm_memory_used_bytes{area="heap", application="Familienarchiv"}
 hikaricp_connections_active
 ```
 
+### OCR-service custom metrics
+
+Exposed at `ocr:8000/metrics` by `prometheus-fastapi-instrumentator`. The
+`http_*` metrics describe the FastAPI request layer; the `ocr_*` series are
+domain-specific. **Never label these with PII or document content** — labels
+have unbounded cardinality risk and are visible to anyone with Grafana access.
+
+| Metric | Type | Labels | Unit | What it tracks |
+|---|---|---|---|---|
+| `ocr_jobs_total` | Counter | `engine` (`surya`/`kraken`), `script_type` | jobs | OCR jobs that started after a successful PDF download |
+| `ocr_pages_total` | Counter | `engine` | pages | Successfully OCR'd pages in the streaming generator |
+| `ocr_skipped_pages_total` | Counter | — | pages | Pages skipped because the engine raised on them |
+| `ocr_words_total` | Counter | — | words | Recognized words summed across every block |
+| `ocr_illegible_words_total` | Counter | — | words | Words below the confidence threshold (rendered as `[unleserlich]`) |
+| `ocr_processing_seconds` | Histogram | `engine` | seconds | Per-page (stream) or per-document (`/ocr`) engine time, excluding preprocessing |
+| `ocr_training_runs_total` | Counter | `kind` (`recognition`/`segmentation`), `outcome` (`success`/`error`) | runs | Completed training runs |
+| `ocr_model_accuracy` | Gauge | `kind` | ratio (0–1) | Latest accuracy reported by a successful training run |
+| `ocr_models_ready` | Gauge | — | 0\|1 | 1 once the lifespan startup has finished loading models |
+
+Canonical example queries (the same ones referenced in issue #652):
+
+```promql
+# OCR throughput by engine
+sum by (engine) (rate(ocr_pages_total[5m]))
+
+# Share of words rendered as [unleserlich]
+sum(rate(ocr_illegible_words_total[5m]))
+  / sum(rate(ocr_words_total[5m]))
+
+# p95 page processing time per engine
+histogram_quantile(0.95, sum by (engine, le) (
+  rate(ocr_processing_seconds_bucket[5m])
+))
+
+# Training error rate
+sum(rate(ocr_training_runs_total{outcome="error"}[1h]))
+  / sum(rate(ocr_training_runs_total[1h]))
+
+# Latest recognition vs segmentation accuracy
+ocr_model_accuracy
+```
+
+### Internal-only endpoints
+
+`/metrics` is exposed by the OCR service over plain HTTP without
+authentication. The container is reachable only on the internal Docker
+network — Caddy never proxies to it directly. If the service is ever
+exposed (e.g. a `ports:` mapping is added), block the endpoint at the
+reverse proxy:
+
+```caddy
+ocr.example.com {
+    @internal_only path /metrics /health
+    respond @internal_only 404
+    reverse_proxy ocr:8000
+}
+```
+
+The `MetricsPathFilter` in `ocr-service/main.py` suppresses uvicorn's
+**stdout** access log lines for `/metrics` and `/health` so the container
+console stays focused on real OCR traffic. Promtail/Loki still receive
+access lines from any other source. Treat the filter as console
+noise-control, not an audit-suppression mechanism.
+
 ## Errors (GlitchTip)
 
 GlitchTip receives errors from both the backend (via Sentry Java SDK) and the frontend (via Sentry JavaScript SDK). It groups events by fingerprint, tracks first/last seen times, and links to the release that introduced the error.
diff --git a/docs/adr/023-prometheus-instrumentator-and-metrics-registry-injection.md b/docs/adr/023-prometheus-instrumentator-and-metrics-registry-injection.md
new file mode 100644
index 00000000..5e8a1020
--- /dev/null
+++ b/docs/adr/023-prometheus-instrumentator-and-metrics-registry-injection.md
@@ -0,0 +1,94 @@
+# ADR-023: Prometheus Instrumentator and Metrics Registry Injection
+
+## Status
+
+Accepted
+
+## Context
+
+Until issue #652 the OCR service exposed no `/metrics` endpoint. The
+observability stack already scrapes the Spring Boot backend's actuator
+endpoint, but it had nothing to scrape on the Python side. Without HTTP-
+and domain-level metrics from `ocr-service` we cannot answer questions
+like "what is the share of words rendered as `[unleserlich]`" or
+"is the training error rate above its budget" from Grafana.
+
+Two implementation requirements influenced the design:
+
+1. **Counter / gauge isolation in tests.** `prometheus_client` collectors
+   are module-level singletons keyed by name on the global `REGISTRY`.
+   Re-importing or naively re-instantiating them raises a duplicated-
+   collector error and cross-test state leaks (a `.inc()` in test A is
+   still readable by test B). A test harness needs a way to swap the
+   active container for a fresh per-test instance.
+
+2. **Minimal blast radius on the request path.** We did not want to
+   hand-instrument every endpoint with FastAPI middleware. The
+   `prometheus-fastapi-instrumentator` library already provides
+   `http_requests_total`, `http_request_duration_seconds`, and the
+   `/metrics` exposition route, all idiomatic Prometheus names.
+
+## Decision
+
+- Add `prometheus-fastapi-instrumentator==7.0.0` and pin its transitive
+  dependency `prometheus-client==0.25.0` explicitly in
+  `ocr-service/requirements.txt`.
+- Mount the instrumentator once at module load:
+  `Instrumentator(excluded_handlers=["/health", "/metrics"]).instrument(app).expose(app)`.
+  This adds `/metrics` and an HTTP-level dashboard surface without
+  changing any endpoint code.
+- Define every domain metric (`ocr_jobs_total`, `ocr_pages_total`,
+  `ocr_processing_seconds`, …) inside a `build_metrics(registry)`
+  factory in `ocr-service/metrics.py` that returns a frozen `OcrMetrics`
+  dataclass. Production code binds the container to the default
+  `REGISTRY` once: `metrics: OcrMetrics = build_metrics(REGISTRY)`.
+- Tests use a `fresh_metrics` fixture that builds a new
+  `CollectorRegistry()` per test and monkeypatches `main.metrics` with
+  a container bound to it. The endpoint code keeps reading
+  `metrics.<name>` without knowing whether it is talking to the global
+  registry or a per-test one.
+
+## Consequences
+
+**Positive**
+
+- One reusable factory captures the metric definitions; future metrics
+  go in one place.
+- Tests run with full counter isolation. Cross-test state leakage is
+  impossible because each test sees its own dataclass instance.
+- The instrumentator gives us `http_*` metrics for free, including a
+  Grafana-ready histogram that pairs with the Spring Boot one.
+
+**Negative**
+
+- One extra level of indirection: any test that asserts on metric
+  values must remember to monkeypatch `main.metrics`, not the registry
+  directly. Rebinding through the registry is harmless but useless —
+  the dataclass holds references to the original collectors.
+- `prometheus-client` is now pinned. Upgrading it requires an explicit
+  bump and re-checking the instrumentator's compatibility range.
+- `/metrics` is exposed unauthenticated and relies on the Docker
+  internal network for confidentiality. See
+  [docs/OBSERVABILITY.md §Internal-only endpoints](../OBSERVABILITY.md)
+  for the Caddy snippet that must be added if the service ever gets a
+  host-side port mapping.
+
+## Alternatives considered
+
+- **Hand-roll the `/metrics` endpoint.** Rejected: would have meant
+  duplicating what `prometheus-fastapi-instrumentator` ships, plus
+  middleware for the HTTP histograms.
+- **Skip the factory; pass `registry` as a function argument
+  everywhere.** Rejected: clutters every endpoint signature and breaks
+  the symmetry with the Spring Boot side, which also relies on a
+  process-global Micrometer registry.
+- **Use a `pytest` autouse fixture that resets `REGISTRY` between
+  tests.** Rejected: `prometheus_client` does not expose a clean
+  "unregister all" hook, and we would be relying on private APIs.
+
+## References
+
+- Issue: [#652](https://git.raddatz.cloud/marcel/familienarchiv/issues/652)
+- Library: <https://github.com/trallnag/prometheus-fastapi-instrumentator>
+- Code: `ocr-service/metrics.py`, `ocr-service/main.py`,
+  `ocr-service/test_metrics.py`
diff --git a/docs/architecture/c4/l2-containers.puml b/docs/architecture/c4/l2-containers.puml
index 346efe75..8d66a614 100644
--- a/docs/architecture/c4/l2-containers.puml
+++ b/docs/architecture/c4/l2-containers.puml
@@ -43,6 +43,8 @@ Rel(ocr, storage, "Fetches PDF via presigned URL", "HTTP / S3 presigned")
 Rel(mc, storage, "Bootstraps bucket + service account on startup", "MinIO Client CLI")
 Rel(promtail, loki, "Pushes log streams", "HTTP/Loki push API")
 Rel(backend, tempo, "Sends distributed traces via OTLP", "HTTP / OTLP / port 4318 (archiv-net)")
+Rel(prometheus, backend, "Scrapes JVM + HTTP metrics", "HTTP 8081 /actuator/prometheus")
+Rel(prometheus, ocr, "Scrapes OCR + http_* metrics", "HTTP 8000 /metrics")
 Rel(grafana, prometheus, "Queries metrics", "HTTP 9090")
 Rel(grafana, loki, "Queries logs", "HTTP 3100")
 Rel(grafana, tempo, "Queries traces", "HTTP 3200")
diff --git a/infra/observability/prometheus/prometheus.yml b/infra/observability/prometheus/prometheus.yml
index a29cc75b..4838bc1c 100644
--- a/infra/observability/prometheus/prometheus.yml
+++ b/infra/observability/prometheus/prometheus.yml
@@ -20,7 +20,4 @@ scrape_configs:
   - job_name: ocr-service
     metrics_path: /metrics
     static_configs:
-      # TODO: remove or add prometheus-client to ocr-service.
-      # The Python OCR service does not currently expose Prometheus metrics.
-      # This target will show as DOWN until prometheus-client is added to ocr-service.
       - targets: ['ocr:8000']
diff --git a/ocr-service/main.py b/ocr-service/main.py
index 409cc78f..bde3fb2f 100644
--- a/ocr-service/main.py
+++ b/ocr-service/main.py
@@ -2,6 +2,7 @@
 
 import asyncio
 import glob
+import inspect
 import io
 import json
 import logging
@@ -10,9 +11,11 @@ import re
 import shutil
 import subprocess
 import tempfile
+import time
 import zipfile
 from contextlib import asynccontextmanager
 from datetime import datetime, timezone
+from typing import Awaitable, Callable
 from urllib.parse import urlparse
 
 import httpx
@@ -20,8 +23,11 @@ import pypdfium2 as pdfium
 from fastapi import FastAPI, Form, Header, HTTPException, UploadFile
 from fastapi.responses import StreamingResponse
 from PIL import Image
+from prometheus_client import REGISTRY
+from prometheus_fastapi_instrumentator import Instrumentator
 
 from confidence import apply_confidence_markers, get_threshold
+from metrics import OcrMetrics, build_metrics
 from spell_check import correct_text, load_spell_checker
 from engines import kraken as kraken_engine
 from engines import surya as surya_engine
@@ -37,6 +43,12 @@ logger = logging.getLogger(__name__)
 
 _models_ready = False
 
+# One-shot import-time binding to the default REGISTRY. Tests that need a
+# clean counter state must monkeypatch `main.metrics` with a container built
+# from a fresh CollectorRegistry — rebinding through the registry directly
+# will not retarget the references stored in the OcrMetrics dataclass.
+metrics: OcrMetrics = build_metrics(REGISTRY)
+
 ALLOWED_PDF_HOSTS = set(
     h.strip() for h in os.getenv("ALLOWED_PDF_HOSTS", "minio,localhost,127.0.0.1").split(",")
 )
@@ -44,6 +56,42 @@ ALLOWED_PDF_HOSTS = set(
 _SPELL_CHECK_SCRIPT_TYPES = {"HANDWRITING_KURRENT", "HANDWRITING_LATIN"}
 
 
+async def _record_training(
+    runner: Callable[[], Awaitable[dict] | dict],
+    kind: str,
+) -> dict:
+    """Run a training callable and record outcome + accuracy metrics.
+
+    Wraps the per-endpoint try/except + outcome counter + accuracy gauge
+    block that used to be repeated at /train, /train-sender, and /segtrain.
+    The runner returns a dict with at least an `accuracy` key; if its value
+    is None, the gauge is left at its default.
+    """
+    try:
+        result = runner()
+        if inspect.isawaitable(result):
+            result = await result
+    except Exception:
+        metrics.ocr_training_runs_total.labels(kind=kind, outcome="error").inc()
+        raise
+    metrics.ocr_training_runs_total.labels(kind=kind, outcome="success").inc()
+    if result.get("accuracy") is not None:
+        metrics.ocr_model_accuracy.labels(kind=kind).set(result["accuracy"])
+    return result
+
+
+def _observe_block_words(words: list[dict], threshold: float) -> None:
+    """Record per-block word counts and below-threshold word counts.
+
+    Pre: `words` is non-empty. Caller checks for that — keeping the helper
+    branch-free makes the call sites read as a single line.
+    """
+    metrics.ocr_words_total.inc(len(words))
+    metrics.ocr_illegible_words_total.inc(
+        sum(1 for w in words if w["confidence"] < threshold)
+    )
+
+
 def _validate_url(url: str) -> None:
     """Validate that the PDF URL points to an allowed host (SSRF protection)."""
     parsed = urlparse(url)
@@ -63,6 +111,7 @@ async def lifespan(app: FastAPI):
     kraken_engine.load_models()
     load_spell_checker()
     _models_ready = True
+    metrics.ocr_models_ready.set(1)
     logger.info("Startup complete — ready to accept requests")
 
     yield
@@ -72,6 +121,28 @@ async def lifespan(app: FastAPI):
 
 app = FastAPI(title="Familienarchiv OCR Service", lifespan=lifespan)
 
+# /metrics is unauthenticated — relies on Docker-internal-network exposure
+# only (CWE-200 risk if `ports:` ever maps 8000 to host). See
+# docs/OBSERVABILITY.md §Internal-only endpoints for the Caddy block snippet.
+Instrumentator(excluded_handlers=["/health", "/metrics"]).instrument(app).expose(app)
+
+
+class MetricsPathFilter(logging.Filter):
+    """Drop uvicorn.access entries for /metrics and /health to keep logs focused."""
+
+    _SUPPRESSED_PATHS = {"/metrics", "/health"}
+
+    def filter(self, record: logging.LogRecord) -> bool:
+        # uvicorn.access formats as: '%s - "%s %s HTTP/%s" %d'
+        if record.args and len(record.args) >= 3:
+            path = record.args[2]
+            if isinstance(path, str) and path in self._SUPPRESSED_PATHS:
+                return False
+        return True
+
+
+logging.getLogger("uvicorn.access").addFilter(MetricsPathFilter())
+
 
 @app.get("/health")
 def health():
@@ -99,7 +170,9 @@ async def run_ocr(request: OcrRequest):
         del img
 
     script_type = request.scriptType.upper()
+    engine_name = "kraken" if script_type == "HANDWRITING_KURRENT" else "surya"
 
+    extract_started = time.monotonic()
     if script_type == "HANDWRITING_KURRENT":
         if not kraken_engine.is_available():
             raise HTTPException(
@@ -111,11 +184,18 @@ async def run_ocr(request: OcrRequest):
     else:
         # TYPEWRITER, HANDWRITING_LATIN, UNKNOWN — all use Surya
         blocks = await asyncio.to_thread(surya_engine.extract_blocks, images, request.language)
+    metrics.ocr_processing_seconds.labels(engine=engine_name).observe(
+        time.monotonic() - extract_started
+    )
+
+    metrics.ocr_jobs_total.labels(engine=engine_name, script_type=script_type).inc()
 
     threshold = get_threshold(script_type)
     for block in blocks:
-        if block.get("words"):
-            block["text"] = apply_confidence_markers(block["words"], threshold)
+        words = block.get("words") or []
+        if words:
+            _observe_block_words(words, threshold)
+            block["text"] = apply_confidence_markers(words, threshold)
         block.pop("words", None)
         if script_type in _SPELL_CHECK_SCRIPT_TYPES:
             block["text"] = correct_text(block["text"])
@@ -146,6 +226,9 @@ async def run_ocr_stream(request: OcrRequest):
         )
 
     engine = kraken_engine if use_kraken else surya_engine
+    engine_name = "kraken" if use_kraken else "surya"
+
+    metrics.ocr_jobs_total.labels(engine=engine_name, script_type=script_type).inc()
 
     if request.regions:
         # Guided mode: recognize only the user-drawn annotation regions
@@ -176,12 +259,15 @@ async def run_ocr_stream(request: OcrRequest):
                     image = await asyncio.to_thread(preprocess_page, image)
                     blocks = []
                     sender_path = request.senderModelPath if use_kraken else None
+                    engine_seconds = 0.0
                     for region in page_regions:
+                        region_started = time.monotonic()
                         text = await asyncio.to_thread(
                             engine.extract_region_text, image,
                             region.x, region.y, region.width, region.height,
                             sender_path,
                         )
+                        engine_seconds += time.monotonic() - region_started
                         if script_type in _SPELL_CHECK_SCRIPT_TYPES:
                             text = correct_text(text)
                         blocks.append({
@@ -195,7 +281,11 @@ async def run_ocr_stream(request: OcrRequest):
                             "annotationId": region.annotationId,
                         })
 
+                    metrics.ocr_processing_seconds.labels(engine=engine_name).observe(
+                        engine_seconds
+                    )
                     total_blocks += len(blocks)
+                    metrics.ocr_pages_total.labels(engine=engine_name).inc()
                     yield json.dumps({
                         "type": "page",
                         "pageNumber": page_idx,
@@ -205,6 +295,7 @@ async def run_ocr_stream(request: OcrRequest):
                 except Exception:
                     logger.exception("Guided OCR failed on page %d", page_idx)
                     skipped_pages += 1
+                    metrics.ocr_skipped_pages_total.inc()
                     yield json.dumps({
                         "type": "error",
                         "pageNumber": page_idx,
@@ -238,18 +329,25 @@ async def run_ocr_stream(request: OcrRequest):
                 yield json.dumps({"type": "preprocessing", "pageNumber": page_idx}) + "\n"
                 image = await asyncio.to_thread(preprocess_page, image)
                 sender_path = request.senderModelPath if use_kraken else None
+                page_started = time.monotonic()
                 blocks = await asyncio.to_thread(
                     engine.extract_page_blocks, image, page_idx, request.language, sender_path
                 )
+                metrics.ocr_processing_seconds.labels(engine=engine_name).observe(
+                    time.monotonic() - page_started
+                )
 
                 for block in blocks:
-                    if block.get("words"):
-                        block["text"] = apply_confidence_markers(block["words"], threshold)
+                    words = block.get("words") or []
+                    if words:
+                        _observe_block_words(words, threshold)
+                        block["text"] = apply_confidence_markers(words, threshold)
                     block.pop("words", None)
                     if script_type in _SPELL_CHECK_SCRIPT_TYPES:
                         block["text"] = correct_text(block["text"])
 
                 total_blocks += len(blocks)
+                metrics.ocr_pages_total.labels(engine=engine_name).inc()
                 yield json.dumps({
                     "type": "page",
                     "pageNumber": page_idx,
@@ -259,6 +357,7 @@ async def run_ocr_stream(request: OcrRequest):
             except Exception:
                 logger.exception("OCR failed on page %d", page_idx)
                 skipped_pages += 1
+                metrics.ocr_skipped_pages_total.inc()
                 yield json.dumps({
                     "type": "error",
                     "pageNumber": page_idx,
@@ -438,8 +537,7 @@ async def train_model(
 
             return {"loss": None, "accuracy": accuracy, "cer": cer, "epochs": epochs}
 
-    result = await asyncio.to_thread(_run_training)
-    return result
+    return await _record_training(lambda: asyncio.to_thread(_run_training), kind="recognition")
 
 
 @app.post("/train-sender")
@@ -518,8 +616,9 @@ async def train_sender_model(
 
             return {"loss": None, "accuracy": accuracy, "cer": cer, "epochs": epochs}
 
-    result = await asyncio.to_thread(_run_sender_training)
-    return result
+    return await _record_training(
+        lambda: asyncio.to_thread(_run_sender_training), kind="recognition"
+    )
 
 
 @app.post("/segtrain")
@@ -628,8 +727,7 @@ async def segtrain_model(
 
             return {"loss": None, "accuracy": accuracy, "cer": cer, "epochs": epochs}
 
-    result = await asyncio.to_thread(_run_segtrain)
-    return result
+    return await _record_training(lambda: asyncio.to_thread(_run_segtrain), kind="segmentation")
 
 
 async def _download_and_convert_pdf(url: str) -> list[Image.Image]:
diff --git a/ocr-service/metrics.py b/ocr-service/metrics.py
new file mode 100644
index 00000000..66efb9f2
--- /dev/null
+++ b/ocr-service/metrics.py
@@ -0,0 +1,92 @@
+"""Prometheus metric definitions for the OCR service.
+
+`build_metrics(registry)` returns a fresh `OcrMetrics` instance bound to the
+given `CollectorRegistry`. Production code calls it once at module load with
+the default `REGISTRY`; tests pass a per-test `CollectorRegistry()` to keep
+counter values isolated between cases (decision #3 on issue #652).
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram
+
+
+@dataclass(frozen=True)
+class OcrMetrics:
+    """Container for every custom OCR metric.
+
+    Counters and gauges are immutable references to `prometheus_client`
+    instances. Mutating them (`.inc()`, `.observe()`, `.set()`) is safe;
+    rebinding the field on the dataclass is not — use `build_metrics` to get
+    a new container.
+    """
+
+    ocr_jobs_total: Counter
+    ocr_pages_total: Counter
+    ocr_skipped_pages_total: Counter
+    ocr_words_total: Counter
+    ocr_illegible_words_total: Counter
+    ocr_processing_seconds: Histogram
+    ocr_training_runs_total: Counter
+    ocr_model_accuracy: Gauge
+    ocr_models_ready: Gauge
+
+
+def build_metrics(registry: CollectorRegistry) -> OcrMetrics:
+    """Create one OcrMetrics instance bound to `registry`."""
+    return OcrMetrics(
+        ocr_jobs_total=Counter(
+            "ocr_jobs_total",
+            "Number of OCR jobs processed, labelled by engine and script type.",
+            ["engine", "script_type"],
+            registry=registry,
+        ),
+        ocr_pages_total=Counter(
+            "ocr_pages_total",
+            "Number of pages successfully OCR'd, labelled by engine.",
+            ["engine"],
+            registry=registry,
+        ),
+        ocr_skipped_pages_total=Counter(
+            "ocr_skipped_pages_total",
+            "Number of pages skipped because the OCR engine raised.",
+            registry=registry,
+        ),
+        ocr_words_total=Counter(
+            "ocr_words_total",
+            "Number of words recognized across all OCR blocks.",
+            registry=registry,
+        ),
+        ocr_illegible_words_total=Counter(
+            "ocr_illegible_words_total",
+            "Number of words below the confidence threshold "
+            "(replaced with [unleserlich]).",
+            registry=registry,
+        ),
+        ocr_processing_seconds=Histogram(
+            "ocr_processing_seconds",
+            "OCR processing time per page (streaming) or per document (non-streaming).",
+            ["engine"],
+            registry=registry,
+        ),
+        ocr_training_runs_total=Counter(
+            "ocr_training_runs_total",
+            "Number of training runs, labelled by kind (recognition|segmentation) "
+            "and outcome (success|error).",
+            ["kind", "outcome"],
+            registry=registry,
+        ),
+        ocr_model_accuracy=Gauge(
+            "ocr_model_accuracy",
+            "Latest model accuracy reported by a successful training run.",
+            ["kind"],
+            registry=registry,
+        ),
+        ocr_models_ready=Gauge(
+            "ocr_models_ready",
+            "1 once the lifespan startup has finished loading models, 0 before.",
+            registry=registry,
+        ),
+    )
diff --git a/ocr-service/requirements.txt b/ocr-service/requirements.txt
index befaafcb..39c492cc 100644
--- a/ocr-service/requirements.txt
+++ b/ocr-service/requirements.txt
@@ -10,3 +10,5 @@ pyvips>=2.2.0
 httpx==0.28.1
 pyspellchecker==0.9.0
 opencv-python-headless==4.11.0.86
+prometheus-fastapi-instrumentator==7.0.0
+prometheus-client==0.25.0
diff --git a/ocr-service/test_metrics.py b/ocr-service/test_metrics.py
new file mode 100644
index 00000000..42afd275
--- /dev/null
+++ b/ocr-service/test_metrics.py
@@ -0,0 +1,638 @@
+"""Tests for Prometheus metrics exposed by the OCR service.
+
+Each test that asserts on a counter/gauge value uses a fresh CollectorRegistry
+(see decision #3 on issue #652) to keep the metrics isolated between tests.
+"""
+
+import contextlib
+import io
+import zipfile
+from unittest.mock import AsyncMock, patch
+
+import pytest
+from httpx import ASGITransport, AsyncClient
+from PIL import Image
+from prometheus_client import CollectorRegistry
+
+from main import app
+from metrics import build_metrics
+
+
+@contextlib.asynccontextmanager
+async def ocr_client(*, raise_app_exceptions: bool = True):
+    """Yield an AsyncClient with model-loaders patched and _models_ready forced on.
+
+    The shared setup for almost every metrics test: stub the heavy lifecycle
+    hooks (kraken_engine.load_models, load_spell_checker), flip the readiness
+    flag so request handlers do not 503, and restore it afterwards.
+    """
+    with patch("main.kraken_engine.load_models"), \
+         patch("main.load_spell_checker"):
+        transport = ASGITransport(app=app, raise_app_exceptions=raise_app_exceptions)
+        async with AsyncClient(transport=transport, base_url="http://test") as client:
+            import main as main_module
+            main_module._models_ready = True
+            try:
+                yield client
+            finally:
+                main_module._models_ready = False
+
+
+def _minimal_zip() -> bytes:
+    """Return a ZIP containing one fake .xml so endpoint validation passes."""
+    buf = io.BytesIO()
+    with zipfile.ZipFile(buf, "w") as zf:
+        zf.writestr("page_01.xml", "<PcGts/>")
+    return buf.getvalue()
+
+
+def _fake_training_result(accuracy: float = 0.91) -> dict:
+    return {"loss": None, "accuracy": accuracy, "cer": round(1 - accuracy, 4), "epochs": 5}
+
+
+@pytest.fixture
+def fresh_metrics(monkeypatch):
+    """Replace the module-level `main.metrics` with one bound to a fresh registry."""
+    registry = CollectorRegistry()
+    test_metrics = build_metrics(registry)
+    monkeypatch.setattr("main.metrics", test_metrics)
+    return test_metrics
+
+
+@pytest.mark.asyncio
+async def test_metrics_endpoint_returns_200():
+    """`GET /metrics` returns 200 with Prometheus exposition content.
+
+    Uses the global REGISTRY by design — does NOT take the `fresh_metrics` fixture.
+    The `/metrics` endpoint is wired by `prometheus-fastapi-instrumentator`, which
+    binds to the default REGISTRY at app-construction time; swapping `main.metrics`
+    via the fixture would not redirect what `/metrics` exposes. This test only
+    asserts response shape (status code + content-type substring), not numeric
+    counter values, so cross-test state leakage cannot affect it.
+    """
+    with patch("main.kraken_engine.load_models"), \
+         patch("main.load_spell_checker"):
+        async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
+            response = await client.get("/metrics")
+
+    assert response.status_code == 200
+    assert "text/plain" in response.headers.get("content-type", "")
+
+
+@pytest.mark.asyncio
+async def test_metrics_includes_http_request_metrics_after_ocr_call():
+    """After a request to /ocr, `/metrics` exposes auto-instrumented http_* metrics.
+
+    Uses the global REGISTRY by design — does NOT take the `fresh_metrics` fixture.
+    The `http_requests_total` / `http_request_duration_seconds` metrics live on
+    the instrumentator's default REGISTRY (not on `main.metrics`), so a fresh
+    CollectorRegistry would never see them. This test only asserts response shape
+    (substring presence in the exposition body), not numeric counter values, so
+    cross-test state leakage cannot affect it.
+    """
+    mock_images = [Image.new("RGB", (100, 100))]
+    mock_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
+                    "polygon": None, "text": "hi", "words": []}]
+
+    with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
+         patch("main.preprocess_page", side_effect=lambda img: img), \
+         patch("main.surya_engine.extract_blocks", return_value=mock_blocks):
+        async with ocr_client() as client:
+            ocr_response = await client.post("/ocr", json={
+                "pdfUrl": "http://minio/doc.pdf",
+                "scriptType": "TYPEWRITER",
+                "language": "de",
+            })
+            assert ocr_response.status_code == 200, ocr_response.text
+
+            metrics_response = await client.get("/metrics")
+
+    body = metrics_response.text
+    assert "http_requests_total" in body
+    assert "http_request_duration_seconds" in body
+
+
+def test_build_metrics_registers_all_custom_metrics_on_given_registry():
+    """`build_metrics` returns an OcrMetrics bound to the supplied registry."""
+    registry = CollectorRegistry()
+    metrics = build_metrics(registry)
+
+    metric_names = {m.name for m in registry.collect()}
+    expected = {
+        "ocr_jobs",
+        "ocr_pages",
+        "ocr_skipped_pages",
+        "ocr_words",
+        "ocr_illegible_words",
+        "ocr_processing_seconds",
+        "ocr_training_runs",
+        "ocr_model_accuracy",
+        "ocr_models_ready",
+    }
+    assert expected <= metric_names, f"missing: {expected - metric_names}"
+
+    # A second registry yields a separate container — no shared state.
+    other_metrics = build_metrics(CollectorRegistry())
+    assert metrics is not other_metrics
+
+
+async def _drive_ocr(client: AsyncClient, *, script_type: str) -> None:
+    """Helper — fires /ocr with a single mocked page and asserts a 200."""
+    response = await client.post("/ocr", json={
+        "pdfUrl": "http://minio/doc.pdf",
+        "scriptType": script_type,
+        "language": "de",
+    })
+    assert response.status_code == 200, response.text
+
+
+@pytest.mark.asyncio
+async def test_ocr_jobs_total_incremented_with_kraken_engine_label_for_kurrent(fresh_metrics):
+    """A /ocr call with HANDWRITING_KURRENT increments engine=kraken."""
+    mock_images = [Image.new("RGB", (100, 100))]
+    mock_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
+                    "polygon": None, "text": "hi", "words": []}]
+
+    with patch("main.correct_text", side_effect=lambda t: t), \
+         patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
+         patch("main.preprocess_page", side_effect=lambda img: img), \
+         patch("main.kraken_engine.is_available", return_value=True), \
+         patch("main.kraken_engine.extract_blocks", return_value=mock_blocks):
+        async with ocr_client() as client:
+            await _drive_ocr(client, script_type="HANDWRITING_KURRENT")
+
+    value = fresh_metrics.ocr_jobs_total.labels(
+        engine="kraken", script_type="HANDWRITING_KURRENT"
+    )._value.get()
+    assert value == 1.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_jobs_total_incremented_with_surya_engine_label_for_typewriter(fresh_metrics):
+    """A /ocr call with TYPEWRITER increments engine=surya."""
+    mock_images = [Image.new("RGB", (100, 100))]
+    mock_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
+                    "polygon": None, "text": "hi", "words": []}]
+
+    with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
+         patch("main.preprocess_page", side_effect=lambda img: img), \
+         patch("main.surya_engine.extract_blocks", return_value=mock_blocks):
+        async with ocr_client() as client:
+            await _drive_ocr(client, script_type="TYPEWRITER")
+
+    value = fresh_metrics.ocr_jobs_total.labels(
+        engine="surya", script_type="TYPEWRITER"
+    )._value.get()
+    assert value == 1.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_pages_total_incremented_once_per_page_in_stream(fresh_metrics):
+    """The /ocr/stream generator increments ocr_pages_total per successful page."""
+    mock_images = [Image.new("RGB", (100, 100)) for _ in range(3)]
+    mock_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
+                    "polygon": None, "text": "hi", "words": []}]
+
+    with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
+         patch("main.preprocess_page", side_effect=lambda img: img), \
+         patch("main.surya_engine.extract_page_blocks", return_value=mock_blocks):
+        async with ocr_client() as client:
+            async with client.stream("POST", "/ocr/stream", json={
+                "pdfUrl": "http://minio/doc.pdf",
+                "scriptType": "TYPEWRITER",
+                "language": "de",
+            }) as response:
+                assert response.status_code == 200
+                # Drain the stream so all per-page increments fire.
+                async for _ in response.aiter_lines():
+                    pass
+
+    value = fresh_metrics.ocr_pages_total.labels(engine="surya")._value.get()
+    assert value == 3.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_skipped_pages_total_incremented_when_engine_raises_for_a_page(fresh_metrics):
+    """When the engine raises on a page, ocr_skipped_pages_total bumps and the stream finishes."""
+    mock_images = [Image.new("RGB", (100, 100)) for _ in range(2)]
+    good_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
+                    "polygon": None, "text": "ok", "words": []}]
+
+    call_count = {"n": 0}
+
+    def extract_side_effect(*args, **kwargs):
+        call_count["n"] += 1
+        if call_count["n"] == 1:
+            raise RuntimeError("synthetic engine failure")
+        return good_blocks
+
+    with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
+         patch("main.preprocess_page", side_effect=lambda img: img), \
+         patch("main.surya_engine.extract_page_blocks", side_effect=extract_side_effect):
+        async with ocr_client() as client:
+            async with client.stream("POST", "/ocr/stream", json={
+                "pdfUrl": "http://minio/doc.pdf",
+                "scriptType": "TYPEWRITER",
+                "language": "de",
+            }) as response:
+                assert response.status_code == 200
+                saw_error = False
+                async for line in response.aiter_lines():
+                    if line and '"type": "error"' in line:
+                        saw_error = True
+                assert saw_error
+
+    assert fresh_metrics.ocr_skipped_pages_total._value.get() == 1.0
+    # The second page still succeeds.
+    assert fresh_metrics.ocr_pages_total.labels(engine="surya")._value.get() == 1.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_words_and_illegible_words_total_sum_across_blocks(fresh_metrics):
+    """Counters reflect totals summed over every block in the request.
+
+    Threshold defaults to THRESHOLD_DEFAULT (0.3) for non-Kurrent scripts. Two
+    blocks: 3 words above + 2 words below threshold across blocks.
+    """
+    mock_images = [Image.new("RGB", (100, 100))]
+    mock_blocks = [
+        {"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
+         "polygon": None, "text": "ignored",
+         "words": [{"text": "Lieber", "confidence": 0.9},
+                   {"text": "Freund", "confidence": 0.1}]},
+        {"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
+         "polygon": None, "text": "ignored",
+         "words": [{"text": "Gruss", "confidence": 0.8},
+                   {"text": "verschmiert", "confidence": 0.05},
+                   {"text": "Karl", "confidence": 0.95}]},
+    ]
+
+    with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
+         patch("main.preprocess_page", side_effect=lambda img: img), \
+         patch("main.surya_engine.extract_blocks", return_value=mock_blocks):
+        async with ocr_client() as client:
+            await _drive_ocr(client, script_type="TYPEWRITER")
+
+    assert fresh_metrics.ocr_words_total._value.get() == 5.0
+    assert fresh_metrics.ocr_illegible_words_total._value.get() == 2.0
+
+
+def _histogram_count_sum(histogram, **labels) -> tuple[float, float]:
+    """Read the per-label-set _count and _sum from a prometheus_client Histogram."""
+    child = histogram.labels(**labels)
+    return child._sum.get(), sum(b.get() for b in child._buckets)
+
+
+@pytest.mark.asyncio
+async def test_ocr_processing_seconds_histogram_observed_per_page_in_stream(fresh_metrics):
+    """The streaming generator observes ocr_processing_seconds once per page."""
+    mock_images = [Image.new("RGB", (100, 100)) for _ in range(2)]
+    mock_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
+                    "polygon": None, "text": "ok", "words": []}]
+
+    with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
+         patch("main.preprocess_page", side_effect=lambda img: img), \
+         patch("main.surya_engine.extract_page_blocks", return_value=mock_blocks):
+        async with ocr_client() as client:
+            async with client.stream("POST", "/ocr/stream", json={
+                "pdfUrl": "http://minio/doc.pdf",
+                "scriptType": "TYPEWRITER",
+                "language": "de",
+            }) as response:
+                assert response.status_code == 200
+                async for _ in response.aiter_lines():
+                    pass
+
+    sum_seconds, count = _histogram_count_sum(
+        fresh_metrics.ocr_processing_seconds, engine="surya"
+    )
+    assert count == 2.0
+    assert sum_seconds >= 0.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_training_runs_total_incremented_with_recognition_success_label(fresh_metrics):
+    """/train success increments ocr_training_runs_total{kind=recognition, outcome=success}."""
+    async def fake_to_thread(func, *args, **kwargs):
+        return _fake_training_result()
+
+    with patch("main.TRAINING_TOKEN", "secret-token"), \
+         patch("main._models_ready", True), \
+         patch("main.asyncio.to_thread", side_effect=fake_to_thread):
+        async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
+            response = await client.post(
+                "/train",
+                files={"file": ("training.zip", _minimal_zip(), "application/zip")},
+                headers={"X-Training-Token": "secret-token"},
+            )
+
+    assert response.status_code == 200
+    assert fresh_metrics.ocr_training_runs_total.labels(
+        kind="recognition", outcome="success"
+    )._value.get() == 1.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_training_runs_total_incremented_with_recognition_error_label(fresh_metrics):
+    """When ketos exits non-zero, the error counter bumps and the exception propagates.
+
+    Uses the narrowest available seam — `subprocess.run` returning a failing
+    CompletedProcess — instead of stubbing the asyncio.to_thread boundary,
+    so the test exercises the real _run_training error path.
+    """
+    from subprocess import CompletedProcess
+
+    failing_proc = CompletedProcess(
+        args=["ketos"], returncode=1, stdout="", stderr="synthetic ketos failure"
+    )
+
+    with patch("main.TRAINING_TOKEN", "secret-token"), \
+         patch("main._models_ready", True), \
+         patch("main.subprocess.run", return_value=failing_proc):
+        transport = ASGITransport(app=app, raise_app_exceptions=False)
+        async with AsyncClient(transport=transport, base_url="http://test") as client:
+            response = await client.post(
+                "/train",
+                files={"file": ("training.zip", _minimal_zip(), "application/zip")},
+                headers={"X-Training-Token": "secret-token"},
+            )
+
+    assert response.status_code == 500
+    assert fresh_metrics.ocr_training_runs_total.labels(
+        kind="recognition", outcome="error"
+    )._value.get() == 1.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_training_runs_total_incremented_with_segmentation_success_label(fresh_metrics):
+    """/segtrain success increments ocr_training_runs_total{kind=segmentation, outcome=success}."""
+    async def fake_to_thread(func, *args, **kwargs):
+        return _fake_training_result(accuracy=0.83)
+
+    with patch("main.TRAINING_TOKEN", "secret-token"), \
+         patch("main._models_ready", True), \
+         patch("main.asyncio.to_thread", side_effect=fake_to_thread):
+        async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
+            response = await client.post(
+                "/segtrain",
+                files={"file": ("training.zip", _minimal_zip(), "application/zip")},
+                headers={"X-Training-Token": "secret-token"},
+            )
+
+    assert response.status_code == 200
+    assert fresh_metrics.ocr_training_runs_total.labels(
+        kind="segmentation", outcome="success"
+    )._value.get() == 1.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_training_runs_total_incremented_with_recognition_success_label_for_train_sender(fresh_metrics):
+    """/train-sender success increments ocr_training_runs_total{kind=recognition, outcome=success}."""
+    async def fake_to_thread(func, *args, **kwargs):
+        return _fake_training_result()
+
+    with patch("main.TRAINING_TOKEN", "secret-token"), \
+         patch("main._models_ready", True), \
+         patch("main.asyncio.to_thread", side_effect=fake_to_thread):
+        async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
+            response = await client.post(
+                "/train-sender",
+                files={"file": ("training.zip", _minimal_zip(), "application/zip")},
+                data={"output_model_path": "/app/models/sender_test.mlmodel"},
+                headers={"X-Training-Token": "secret-token"},
+            )
+
+    assert response.status_code == 200, response.text
+    assert fresh_metrics.ocr_training_runs_total.labels(
+        kind="recognition", outcome="success"
+    )._value.get() == 1.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_model_accuracy_gauge_stays_default_when_training_returns_no_accuracy(fresh_metrics):
+    """When the runner returns accuracy=None, ocr_model_accuracy must remain at its default 0."""
+    async def fake_to_thread(func, *args, **kwargs):
+        return {"loss": None, "accuracy": None, "cer": None, "epochs": 5}
+
+    with patch("main.TRAINING_TOKEN", "secret-token"), \
+         patch("main._models_ready", True), \
+         patch("main.asyncio.to_thread", side_effect=fake_to_thread):
+        async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
+            response = await client.post(
+                "/train",
+                files={"file": ("training.zip", _minimal_zip(), "application/zip")},
+                headers={"X-Training-Token": "secret-token"},
+            )
+
+    assert response.status_code == 200
+    # Gauge was never .set() — accessing the label child still creates it with default 0.0.
+    assert fresh_metrics.ocr_model_accuracy.labels(
+        kind="recognition"
+    )._value.get() == 0.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_model_accuracy_gauge_set_per_kind_after_successful_training(fresh_metrics):
+    """After /train and /segtrain succeed, ocr_model_accuracy{kind=...} reflects the result."""
+    recognition_accuracy = 0.917
+    segmentation_accuracy = 0.834
+
+    async def fake_recognition_to_thread(func, *args, **kwargs):
+        return _fake_training_result(accuracy=recognition_accuracy)
+
+    async def fake_segmentation_to_thread(func, *args, **kwargs):
+        return _fake_training_result(accuracy=segmentation_accuracy)
+
+    with patch("main.TRAINING_TOKEN", "secret-token"), \
+         patch("main._models_ready", True):
+        async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
+            with patch("main.asyncio.to_thread", side_effect=fake_recognition_to_thread):
+                rec_resp = await client.post(
+                    "/train",
+                    files={"file": ("training.zip", _minimal_zip(), "application/zip")},
+                    headers={"X-Training-Token": "secret-token"},
+                )
+            assert rec_resp.status_code == 200
+            with patch("main.asyncio.to_thread", side_effect=fake_segmentation_to_thread):
+                seg_resp = await client.post(
+                    "/segtrain",
+                    files={"file": ("training.zip", _minimal_zip(), "application/zip")},
+                    headers={"X-Training-Token": "secret-token"},
+                )
+            assert seg_resp.status_code == 200
+
+    assert fresh_metrics.ocr_model_accuracy.labels(kind="recognition")._value.get() == pytest.approx(recognition_accuracy)
+    assert fresh_metrics.ocr_model_accuracy.labels(kind="segmentation")._value.get() == pytest.approx(segmentation_accuracy)
+
+
+def test_ocr_models_ready_gauge_defaults_to_zero():
+    """A freshly-built OcrMetrics has ocr_models_ready=0 before lifespan runs."""
+    metrics = build_metrics(CollectorRegistry())
+    assert metrics.ocr_models_ready._value.get() == 0.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_models_ready_gauge_is_one_after_lifespan_startup(fresh_metrics):
+    """The lifespan flips ocr_models_ready to 1 once load_models / load_spell_checker return.
+
+    ASGITransport does not run lifespan by default, so the lifespan context
+    manager is driven directly to exercise the startup code path.
+    """
+    assert fresh_metrics.ocr_models_ready._value.get() == 0.0
+    with patch("main.kraken_engine.load_models"), \
+         patch("main.load_spell_checker"):
+        async with app.router.lifespan_context(app):
+            assert fresh_metrics.ocr_models_ready._value.get() == 1.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_processing_seconds_histogram_observed_per_page_in_guided_stream(fresh_metrics):
+    """The guided streaming generator observes ocr_processing_seconds once per page."""
+    mock_images = [Image.new("RGB", (100, 100)) for _ in range(2)]
+    regions = [
+        {"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 0.5, "height": 0.5, "annotationId": "a1"},
+        {"pageNumber": 2, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0, "annotationId": "a2"},
+    ]
+
+    with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
+         patch("main.preprocess_page", side_effect=lambda img: img), \
+         patch("main.surya_engine.extract_region_text", return_value="text"):
+        async with ocr_client() as client:
+            async with client.stream("POST", "/ocr/stream", json={
+                "pdfUrl": "http://minio/doc.pdf",
+                "scriptType": "TYPEWRITER",
+                "language": "de",
+                "regions": regions,
+            }) as response:
+                assert response.status_code == 200
+                async for _ in response.aiter_lines():
+                    pass
+
+    sum_seconds, count = _histogram_count_sum(
+        fresh_metrics.ocr_processing_seconds, engine="surya"
+    )
+    assert count == 2.0
+    assert sum_seconds >= 0.0
+
+
+@pytest.mark.asyncio
+async def test_ocr_processing_seconds_histogram_excludes_spell_check_time_in_guided_stream(fresh_metrics):
+    """The guided observation must time engine work only, not the spell-check pass.
+
+    Wall-clock bound rather than a structural `patch("main.time.monotonic")`:
+    the patched attribute is the *global* `time.monotonic`, which httpx and
+    asyncio also consume — they exhaust the deterministic sequence before the
+    request reaches the engine loop. Bound is sized against the failure mode,
+    not the noise floor: spell-check sleeps 0.05s × 2 regions = 0.1s, so a
+    timer that accidentally wrapped `correct_text` would observe >= 0.1s. The
+    0.09s ceiling catches that bug while leaving ~90ms of slack for slow CI
+    runners (engine work is instantaneous under the mock).
+    """
+    mock_images = [Image.new("RGB", (100, 100))]
+    regions = [
+        {"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 0.5, "height": 0.5, "annotationId": "a1"},
+        {"pageNumber": 1, "x": 0.5, "y": 0.0, "width": 0.5, "height": 0.5, "annotationId": "a2"},
+    ]
+
+    def slow_correct(text):
+        import time as _time
+        _time.sleep(0.05)
+        return text
+
+    with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
+         patch("main.preprocess_page", side_effect=lambda img: img), \
+         patch("main.kraken_engine.is_available", return_value=True), \
+         patch("main.kraken_engine.extract_region_text", return_value="text"), \
+         patch("main.correct_text", side_effect=slow_correct):
+        async with ocr_client() as client:
+            async with client.stream("POST", "/ocr/stream", json={
+                "pdfUrl": "http://minio/doc.pdf",
+                "scriptType": "HANDWRITING_KURRENT",
+                "language": "de",
+                "regions": regions,
+            }) as response:
+                assert response.status_code == 200
+                async for _ in response.aiter_lines():
+                    pass
+
+    sum_seconds, _ = _histogram_count_sum(
+        fresh_metrics.ocr_processing_seconds, engine="kraken"
+    )
+    assert sum_seconds < 0.09, f"timing must exclude spell-check; got sum={sum_seconds}"
+
+
+@pytest.mark.asyncio
+async def test_ocr_jobs_total_not_incremented_when_pdf_download_fails_in_stream(fresh_metrics):
+    """If `_download_and_convert_pdf` raises, ocr_jobs_total is NOT incremented.
+
+    Mirrors the /ocr endpoint's semantics: the counter only records jobs that
+    actually started OCR work, not failed downloads.
+    """
+    async def fail_download(url):
+        raise RuntimeError("synthetic download failure")
+
+    with patch("main._download_and_convert_pdf", new=fail_download):
+        async with ocr_client(raise_app_exceptions=False) as client:
+            response = await client.post("/ocr/stream", json={
+                "pdfUrl": "http://minio/doc.pdf",
+                "scriptType": "TYPEWRITER",
+                "language": "de",
+            })
+
+    assert response.status_code == 500
+    assert fresh_metrics.ocr_jobs_total.labels(
+        engine="surya", script_type="TYPEWRITER"
+    )._value.get() == 0.0
+
+
+def test_uvicorn_access_log_filter_fails_open_on_short_or_missing_args():
+    """The filter must default-allow records when args is None or shorter than expected.
+
+    Locks in fail-open behavior: if uvicorn ever changes its format we keep
+    forwarding records to the handler rather than silently dropping logs.
+    """
+    import logging as _logging
+    from main import MetricsPathFilter
+
+    filt = MetricsPathFilter()
+    none_record = _logging.LogRecord(
+        name="uvicorn.access", level=_logging.INFO, pathname="", lineno=0,
+        msg="some message", args=None, exc_info=None,
+    )
+    short_record = _logging.LogRecord(
+        name="uvicorn.access", level=_logging.INFO, pathname="", lineno=0,
+        msg="%s %s", args=("a", "b"), exc_info=None,
+    )
+
+    assert filt.filter(none_record) is True
+    assert filt.filter(short_record) is True
+
+
+def test_uvicorn_access_log_filter_skips_metrics_path():
+    """The MetricsPathFilter drops uvicorn.access log records that target /metrics."""
+    import logging as _logging
+    from main import MetricsPathFilter
+
+    filt = MetricsPathFilter()
+    metrics_record = _logging.LogRecord(
+        name="uvicorn.access", level=_logging.INFO, pathname="", lineno=0,
+        msg='%s - "%s %s HTTP/%s" %d',
+        args=("127.0.0.1:1234", "GET", "/metrics", "1.1", 200),
+        exc_info=None,
+    )
+    health_record = _logging.LogRecord(
+        name="uvicorn.access", level=_logging.INFO, pathname="", lineno=0,
+        msg='%s - "%s %s HTTP/%s" %d',
+        args=("127.0.0.1:1234", "GET", "/health", "1.1", 200),
+        exc_info=None,
+    )
+    ocr_record = _logging.LogRecord(
+        name="uvicorn.access", level=_logging.INFO, pathname="", lineno=0,
+        msg='%s - "%s %s HTTP/%s" %d',
+        args=("127.0.0.1:1234", "POST", "/ocr", "1.1", 200),
+        exc_info=None,
+    )
+
+    assert filt.filter(metrics_record) is False
+    assert filt.filter(health_record) is False
+    assert filt.filter(ocr_record) is True