Compare commits

..

2 Commits

Author SHA1 Message Date
Marcel
9fcd0553bd test(mocks): switch $app/navigation factory mocks to shared __mocks__ stub
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 2m51s
CI / OCR Service Tests (pull_request) Successful in 21s
CI / Backend Unit Tests (pull_request) Successful in 3m43s
CI / fail2ban Regex (pull_request) Failing after 42s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Migrates 36 of the remaining 37 vi.mock('$app/navigation', factory) call
sites in frontend/src/**/*.svelte.{spec,test}.ts to
vi.mock('$app/navigation') (no factory), so they all resolve to the
shared src/__mocks__/$app/navigation.ts stub introduced in commit
aea37250.

Special handling:
- DropZone.svelte.spec.ts: removed vi.hoisted invalidateAllMock; test
  body now imports invalidateAll from $app/navigation and uses
  vi.mocked(invalidateAll) for assertions.
- documents/bulk-edit/page.svelte.test.ts: removed the module-scope
  const gotoSpy = vi.fn() that was injected into the factory; test
  body now imports goto from $app/navigation and uses
  vi.mocked(goto) for assertions and mockClear().

Deferred: src/lib/shared/hooks/useUnsavedWarning.svelte.test.ts uses a
non-trivial beforeNavigate wrapper that captures the callback into
module-scope state (registeredBeforeNavigate) so tests can simulate
navigation events. That hybrid factory cannot be replaced with the
plain-vi.fn() shared stub without redesigning the test; it is left in
place and will be addressed separately.

Per ADR-012: eliminating factory bodies removes 36 birpc teardown-race
surface points. The shared mock keeps every nav export as a vi.fn().

Refs #560.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 18:42:52 +02:00
Marcel
aea37250f4 test(mocks): wire DocumentRow spec to shared __mocks__/$app/navigation
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m33s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m35s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
Smallest possible first step of issue #560. Adds a shared
__mocks__/$app/navigation.ts exporting every nav function as vi.fn(),
and switches one consumer (DocumentRow.svelte.spec.ts) from a per-spec
factory to vi.mock('\$app/navigation') (no factory) to verify Vitest's
__mocks__/ redirect resolves for SvelteKit virtual modules in
browser-mode tests.

CI is the gate: if the migrated spec passes alongside the 35 unchanged
factory call sites, the same redirect pattern is applied across the
remaining $app/navigation, $app/state, $app/forms and paraglide
factories.

Per ADR-012, eliminating factory mocks shrinks the birpc teardown-race
surface; this commit is the first concrete cut.

Refs #560.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 18:19:45 +02:00
60 changed files with 76 additions and 2091 deletions

View File

@@ -39,12 +39,6 @@ PORT_PROMETHEUS=9090
# Grafana admin password — change this before exposing Grafana beyond localhost
GRAFANA_ADMIN_PASSWORD=changeme
# Password for the read-only grafana_reader PostgreSQL role used by the PO
# Overview dashboard. Consumed by Flyway V68 (to set the role's password) and
# by Grafana's PostgreSQL datasource (to connect). REQUIRED in production —
# generate with: openssl rand -hex 32
GRAFANA_DB_PASSWORD=changeme-generate-with-openssl-rand-hex-32
# GlitchTip domain — production: use https://glitchtip.archiv.raddatz.cloud (must match Caddy vhost)
GLITCHTIP_DOMAIN=http://localhost:3002

View File

@@ -31,7 +31,6 @@ name: nightly
# STAGING_APP_ADMIN_USERNAME
# STAGING_APP_ADMIN_PASSWORD
# GRAFANA_ADMIN_PASSWORD
# GRAFANA_DB_PASSWORD (read-only grafana_reader DB role, issue #651)
# GLITCHTIP_SECRET_KEY
# SENTRY_DSN (set after GlitchTip first-run; empty = Sentry disabled)
@@ -81,7 +80,6 @@ jobs:
POSTGRES_USER=archiv
SENTRY_DSN=${{ secrets.SENTRY_DSN }}
VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
EOF
- name: Verify backend /import:ro mount is wired
@@ -145,7 +143,6 @@ jobs:
cp docker-compose.observability.yml /opt/familienarchiv/
cat > /opt/familienarchiv/obs-secrets.env <<'EOF'
GRAFANA_ADMIN_PASSWORD=${{ secrets.GRAFANA_ADMIN_PASSWORD }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
GLITCHTIP_SECRET_KEY=${{ secrets.GLITCHTIP_SECRET_KEY }}
POSTGRES_PASSWORD=${{ secrets.STAGING_POSTGRES_PASSWORD }}
POSTGRES_HOST=archiv-staging-db-1

View File

@@ -35,7 +35,6 @@ name: release
# MAIL_USERNAME
# MAIL_PASSWORD
# GRAFANA_ADMIN_PASSWORD
# GRAFANA_DB_PASSWORD (read-only grafana_reader DB role, issue #651)
# GLITCHTIP_SECRET_KEY
# SENTRY_DSN (set after GlitchTip first-run; empty = Sentry disabled)
@@ -78,7 +77,6 @@ jobs:
IMPORT_HOST_DIR=/srv/familienarchiv-production/import
POSTGRES_USER=archiv
SENTRY_DSN=${{ secrets.SENTRY_DSN }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
EOF
- name: Build images
@@ -112,7 +110,6 @@ jobs:
cp docker-compose.observability.yml /opt/familienarchiv/
cat > /opt/familienarchiv/obs-secrets.env <<'EOF'
GRAFANA_ADMIN_PASSWORD=${{ secrets.GRAFANA_ADMIN_PASSWORD }}
GRAFANA_DB_PASSWORD=${{ secrets.GRAFANA_DB_PASSWORD }}
GLITCHTIP_SECRET_KEY=${{ secrets.GLITCHTIP_SECRET_KEY }}
POSTGRES_PASSWORD=${{ secrets.PROD_POSTGRES_PASSWORD }}
POSTGRES_HOST=archiv-production-db-1

View File

@@ -7,15 +7,12 @@ import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import javax.sql.DataSource;
import java.util.Map;
@Configuration
@RequiredArgsConstructor
@Slf4j
public class FlywayConfig {
private static final String GRAFANA_DB_PASSWORD_FALLBACK = "changeme-grafana-db-password";
private final DataSource dataSource;
@Bean(name = "flyway")
@@ -24,7 +21,6 @@ public class FlywayConfig {
Flyway flyway = Flyway.configure()
.dataSource(dataSource)
.locations("classpath:db/migration")
.placeholders(Map.of("grafanaDbPassword", resolveGrafanaDbPassword()))
.baselineOnMigrate(true)
.baselineVersion("4")
.load();
@@ -32,14 +28,4 @@ public class FlywayConfig {
log.info("Flyway: {} migration(s) applied.", result.migrationsExecuted);
return flyway;
}
private String resolveGrafanaDbPassword() {
String value = System.getenv("GRAFANA_DB_PASSWORD");
if (value == null || value.isBlank()) {
log.warn("GRAFANA_DB_PASSWORD is not set; the grafana_reader role will use a non-secret fallback. "
+ "Set GRAFANA_DB_PASSWORD in production to enable the Grafana PostgreSQL datasource.");
return GRAFANA_DB_PASSWORD_FALLBACK;
}
return value;
}
}

View File

@@ -1,17 +0,0 @@
-- Read-only role used by the Grafana PostgreSQL datasource for the PO Overview
-- dashboard (issue #651). Password is injected at migration time via the Flyway
-- placeholder ${grafanaDbPassword}, supplied by FlywayConfig from the
-- GRAFANA_DB_PASSWORD environment variable.
DO $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = 'grafana_reader') THEN
EXECUTE format('CREATE ROLE grafana_reader WITH LOGIN PASSWORD %L', '${grafanaDbPassword}');
ELSE
EXECUTE format('ALTER ROLE grafana_reader WITH LOGIN PASSWORD %L', '${grafanaDbPassword}');
END IF;
END
$$;
GRANT CONNECT ON DATABASE ${flyway:database} TO grafana_reader;
GRANT USAGE ON SCHEMA public TO grafana_reader;
GRANT SELECT ON audit_log, documents, transcription_blocks TO grafana_reader;

View File

@@ -1,47 +0,0 @@
package org.raddatz.familienarchiv.config;
import org.junit.jupiter.api.Test;
import org.raddatz.familienarchiv.PostgresContainerConfig;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.data.jpa.test.autoconfigure.DataJpaTest;
import org.springframework.boot.jdbc.test.autoconfigure.AutoConfigureTestDatabase;
import org.springframework.context.annotation.Import;
import org.springframework.jdbc.core.JdbcTemplate;
import static org.assertj.core.api.Assertions.assertThat;
@DataJpaTest
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
@Import({PostgresContainerConfig.class, FlywayConfig.class})
class GrafanaReaderRoleIntegrationTest {
@Autowired JdbcTemplate jdbc;
@Test
void grafana_reader_has_select_on_audit_log() {
assertThat(hasSelect("audit_log")).isTrue();
}
@Test
void grafana_reader_has_select_on_documents() {
assertThat(hasSelect("documents")).isTrue();
}
@Test
void grafana_reader_has_select_on_transcription_blocks() {
assertThat(hasSelect("transcription_blocks")).isTrue();
}
@Test
void grafana_reader_has_no_select_on_app_users() {
assertThat(hasSelect("app_users")).isFalse();
}
private boolean hasSelect(String table) {
Boolean result = jdbc.queryForObject(
"SELECT has_table_privilege('grafana_reader', ?, 'SELECT')",
Boolean.class,
table);
return Boolean.TRUE.equals(result);
}
}

View File

@@ -147,9 +147,6 @@ services:
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:-changeme}
GF_USERS_ALLOW_SIGN_UP: "false"
GF_SERVER_ROOT_URL: ${GF_SERVER_ROOT_URL:-http://localhost:3003}
# Read-only password for the grafana_reader PostgreSQL role; interpolated
# into the provisioned PostgreSQL datasource (see datasources.yml).
GRAFANA_DB_PASSWORD: ${GRAFANA_DB_PASSWORD}
volumes:
- grafana_data:/var/lib/grafana
- ./infra/observability/grafana/provisioning:/etc/grafana/provisioning:ro
@@ -168,7 +165,6 @@ services:
condition: service_healthy
networks:
- obs-net
- archiv-net # PO Overview dashboard queries archive-db via the grafana_reader role
# --- Error Tracking: GlitchTip ---

View File

@@ -227,9 +227,6 @@ services:
SPRING_DATASOURCE_URL: jdbc:postgresql://db:5432/archiv
SPRING_DATASOURCE_USERNAME: archiv
SPRING_DATASOURCE_PASSWORD: ${POSTGRES_PASSWORD}
# Consumed by Flyway V68 via the ${grafanaDbPassword} placeholder to set
# the read-only grafana_reader role's password.
GRAFANA_DB_PASSWORD: ${GRAFANA_DB_PASSWORD}
# Application uses the bucket-scoped service account, not MinIO root.
S3_ENDPOINT: http://minio:9000
S3_ACCESS_KEY: archiv-app

View File

@@ -163,9 +163,6 @@ services:
SPRING_DATASOURCE_URL: jdbc:postgresql://db:5432/${POSTGRES_DB}
SPRING_DATASOURCE_USERNAME: ${POSTGRES_USER}
SPRING_DATASOURCE_PASSWORD: ${POSTGRES_PASSWORD}
# Consumed by Flyway V68 via the ${grafanaDbPassword} placeholder to set
# the read-only grafana_reader role's password.
GRAFANA_DB_PASSWORD: ${GRAFANA_DB_PASSWORD}
S3_ENDPOINT: http://minio:9000
S3_ACCESS_KEY: ${MINIO_ROOT_USER}
S3_SECRET_KEY: ${MINIO_ROOT_PASSWORD}

View File

@@ -152,7 +152,6 @@ All vars are set in `.env` at the repo root (copy from `.env.example`). The back
| `PORT_GRAFANA` | Host port for the Grafana UI (bound to `127.0.0.1` only) | `3003` | — | — |
| `POSTGRES_HOST` | PostgreSQL hostname for GlitchTip's db-init job and workers. Override when only the staging stack is running and `archive-db` is not resolvable by that name. | `archive-db` | — | — |
| `GRAFANA_ADMIN_PASSWORD` | Grafana `admin` user password | `changeme` | YES (prod) | YES |
| `GRAFANA_DB_PASSWORD` | Password for the read-only `grafana_reader` PostgreSQL role used by the PO Overview dashboard (issue #651). Consumed by Flyway V68 and the Grafana PostgreSQL datasource. Generate with `openssl rand -hex 32`. | — | YES (prod) | YES |
| `PORT_GLITCHTIP` | Host port for the GlitchTip UI (bound to `127.0.0.1` only) | `3002` | — | — |
| `GLITCHTIP_DOMAIN` | Public-facing base URL for GlitchTip (used in email links and CORS) | `http://localhost:3002` | YES (prod) | — |
| `GLITCHTIP_SECRET_KEY` | Django secret key for GlitchTip — generate with `python3 -c "import secrets; print(secrets.token_hex(32))"` | — | YES | YES |
@@ -257,7 +256,6 @@ git.raddatz.cloud A <server IP>
| `MAIL_USERNAME` | release.yml | SMTP user |
| `MAIL_PASSWORD` | release.yml | SMTP password |
| `GRAFANA_ADMIN_PASSWORD` | both | Grafana `admin` login — generate a strong password |
| `GRAFANA_DB_PASSWORD` | both | Read-only `grafana_reader` role password — `openssl rand -hex 32` |
| `GLITCHTIP_SECRET_KEY` | both | Django secret key — `openssl rand -hex 32` |
| `SENTRY_DSN` | both | GlitchTip project DSN — set after first-run (§4); leave empty to keep Sentry disabled |
| `VITE_SENTRY_DSN` | both | GlitchTip frontend project DSN — set after first-run (§4); leave empty to keep Sentry disabled |
@@ -359,7 +357,6 @@ Both files are passed explicitly via `--env-file` to the compose command, so the
| Gitea secret | Notes |
|---|---|
| `GRAFANA_ADMIN_PASSWORD` | Strong unique password; shared by nightly and release |
| `GRAFANA_DB_PASSWORD` | `openssl rand -hex 32`; shared by nightly and release — read-only DB role for the PO Overview dashboard |
| `GLITCHTIP_SECRET_KEY` | `openssl rand -hex 32`; shared by nightly and release |
| `STAGING_POSTGRES_PASSWORD` / `PROD_POSTGRES_PASSWORD` | Must match the running PostgreSQL container |

View File

@@ -80,14 +80,6 @@ _See also [DocumentStatus lifecycle](#documentstatus-lifecycle)._
**Sütterlin** — A specific standardized style of Kurrent taught in German schools from 1915 to 1941.
**Illegible word** — a word whose recognition confidence falls below the configured threshold; replaced with the literal token `[unleserlich]` in the rendered block text and counted in the `ocr_illegible_words_total` Prometheus counter.
**Models-ready gauge** — the `ocr_models_ready` Prometheus gauge, flipped from `0` to `1` once the FastAPI lifespan startup has finished loading the Kraken model and the spell-checker. Used both for the `/health` endpoint and as the supervised signal for the `ocr_models_ready < 1 for 2m` alert.
**Recognition model accuracy** — the accuracy reported by `ketos train` for the recognition (text-line) model, exposed as `ocr_model_accuracy{kind="recognition"}`. Sourced from `_parse_best_checkpoint` on the highest-scoring checkpoint after training.
**Segmentation model accuracy** — the accuracy reported by `ketos segtrain` for the baseline layout analysis (`blla`) model, exposed as `ocr_model_accuracy{kind="segmentation"}`. Distinct from recognition accuracy because the two models are trained and improved independently.
---
## Other Domain Terms

View File

@@ -118,14 +118,11 @@ To find a trace for a specific request in staging/production, either increase th
## Metrics (Prometheus → Grafana)
Prometheus scrapes two targets every 15 s:
Prometheus scrapes the backend management endpoint every 15 s:
```
Target: backend:8081/actuator/prometheus
Labels: job="spring-boot", application="Familienarchiv"
Target: ocr:8000/metrics
Labels: job="ocr-service"
```
All Spring Boot metrics carry the `application="Familienarchiv"` tag, which is how the Grafana Spring Boot Observability dashboard (ID 17175) filters to this service.
@@ -149,70 +146,6 @@ jvm_memory_used_bytes{area="heap", application="Familienarchiv"}
hikaricp_connections_active
```
### OCR-service custom metrics
Exposed at `ocr:8000/metrics` by `prometheus-fastapi-instrumentator`. The
`http_*` metrics describe the FastAPI request layer; the `ocr_*` series are
domain-specific. **Never label these with PII or document content** — labels
have unbounded cardinality risk and are visible to anyone with Grafana access.
| Metric | Type | Labels | Unit | What it tracks |
|---|---|---|---|---|
| `ocr_jobs_total` | Counter | `engine` (`surya`/`kraken`), `script_type` | jobs | OCR jobs that started after a successful PDF download |
| `ocr_pages_total` | Counter | `engine` | pages | Successfully OCR'd pages in the streaming generator |
| `ocr_skipped_pages_total` | Counter | — | pages | Pages skipped because the engine raised on them |
| `ocr_words_total` | Counter | — | words | Recognized words summed across every block |
| `ocr_illegible_words_total` | Counter | — | words | Words below the confidence threshold (rendered as `[unleserlich]`) |
| `ocr_processing_seconds` | Histogram | `engine` | seconds | Per-page (stream) or per-document (`/ocr`) engine time, excluding preprocessing |
| `ocr_training_runs_total` | Counter | `kind` (`recognition`/`segmentation`), `outcome` (`success`/`error`) | runs | Completed training runs |
| `ocr_model_accuracy` | Gauge | `kind` | ratio (01) | Latest accuracy reported by a successful training run |
| `ocr_models_ready` | Gauge | — | 0\|1 | 1 once the lifespan startup has finished loading models |
Canonical example queries (the same ones referenced in issue #652):
```promql
# OCR throughput by engine
sum by (engine) (rate(ocr_pages_total[5m]))
# Share of words rendered as [unleserlich]
sum(rate(ocr_illegible_words_total[5m]))
/ sum(rate(ocr_words_total[5m]))
# p95 page processing time per engine
histogram_quantile(0.95, sum by (engine, le) (
rate(ocr_processing_seconds_bucket[5m])
))
# Training error rate
sum(rate(ocr_training_runs_total{outcome="error"}[1h]))
/ sum(rate(ocr_training_runs_total[1h]))
# Latest recognition vs segmentation accuracy
ocr_model_accuracy
```
### Internal-only endpoints
`/metrics` is exposed by the OCR service over plain HTTP without
authentication. The container is reachable only on the internal Docker
network — Caddy never proxies to it directly. If the service is ever
exposed (e.g. a `ports:` mapping is added), block the endpoint at the
reverse proxy:
```caddy
ocr.example.com {
@internal_only path /metrics /health
respond @internal_only 404
reverse_proxy ocr:8000
}
```
The `MetricsPathFilter` in `ocr-service/main.py` suppresses uvicorn's
**stdout** access log lines for `/metrics` and `/health` so the container
console stays focused on real OCR traffic. Promtail/Loki still receive
access lines from any other source. Treat the filter as console
noise-control, not an audit-suppression mechanism.
## Errors (GlitchTip)
GlitchTip receives errors from both the backend (via Sentry Java SDK) and the frontend (via Sentry JavaScript SDK). It groups events by fingerprint, tracks first/last seen times, and links to the release that introduced the error.

View File

@@ -1,94 +0,0 @@
# ADR-023: Prometheus Instrumentator and Metrics Registry Injection
## Status
Accepted
## Context
Until issue #652 the OCR service exposed no `/metrics` endpoint. The
observability stack already scrapes the Spring Boot backend's actuator
endpoint, but it had nothing to scrape on the Python side. Without HTTP-
and domain-level metrics from `ocr-service` we cannot answer questions
like "what is the share of words rendered as `[unleserlich]`" or
"is the training error rate above its budget" from Grafana.
Two implementation requirements influenced the design:
1. **Counter / gauge isolation in tests.** `prometheus_client` collectors
are module-level singletons keyed by name on the global `REGISTRY`.
Re-importing or naively re-instantiating them raises a duplicated-
collector error and cross-test state leaks (a `.inc()` in test A is
still readable by test B). A test harness needs a way to swap the
active container for a fresh per-test instance.
2. **Minimal blast radius on the request path.** We did not want to
hand-instrument every endpoint with FastAPI middleware. The
`prometheus-fastapi-instrumentator` library already provides
`http_requests_total`, `http_request_duration_seconds`, and the
`/metrics` exposition route, all idiomatic Prometheus names.
## Decision
- Add `prometheus-fastapi-instrumentator==7.0.0` and pin its transitive
dependency `prometheus-client==0.25.0` explicitly in
`ocr-service/requirements.txt`.
- Mount the instrumentator once at module load:
`Instrumentator(excluded_handlers=["/health", "/metrics"]).instrument(app).expose(app)`.
This adds `/metrics` and an HTTP-level dashboard surface without
changing any endpoint code.
- Define every domain metric (`ocr_jobs_total`, `ocr_pages_total`,
`ocr_processing_seconds`, …) inside a `build_metrics(registry)`
factory in `ocr-service/metrics.py` that returns a frozen `OcrMetrics`
dataclass. Production code binds the container to the default
`REGISTRY` once: `metrics: OcrMetrics = build_metrics(REGISTRY)`.
- Tests use a `fresh_metrics` fixture that builds a new
`CollectorRegistry()` per test and monkeypatches `main.metrics` with
a container bound to it. The endpoint code keeps reading
`metrics.<name>` without knowing whether it is talking to the global
registry or a per-test one.
## Consequences
**Positive**
- One reusable factory captures the metric definitions; future metrics
go in one place.
- Tests run with full counter isolation. Cross-test state leakage is
impossible because each test sees its own dataclass instance.
- The instrumentator gives us `http_*` metrics for free, including a
Grafana-ready histogram that pairs with the Spring Boot one.
**Negative**
- One extra level of indirection: any test that asserts on metric
values must remember to monkeypatch `main.metrics`, not the registry
directly. Rebinding through the registry is harmless but useless —
the dataclass holds references to the original collectors.
- `prometheus-client` is now pinned. Upgrading it requires an explicit
bump and re-checking the instrumentator's compatibility range.
- `/metrics` is exposed unauthenticated and relies on the Docker
internal network for confidentiality. See
[docs/OBSERVABILITY.md §Internal-only endpoints](../OBSERVABILITY.md)
for the Caddy snippet that must be added if the service ever gets a
host-side port mapping.
## Alternatives considered
- **Hand-roll the `/metrics` endpoint.** Rejected: would have meant
duplicating what `prometheus-fastapi-instrumentator` ships, plus
middleware for the HTTP histograms.
- **Skip the factory; pass `registry` as a function argument
everywhere.** Rejected: clutters every endpoint signature and breaks
the symmetry with the Spring Boot side, which also relies on a
process-global Micrometer registry.
- **Use a `pytest` autouse fixture that resets `REGISTRY` between
tests.** Rejected: `prometheus_client` does not expose a clean
"unregister all" hook, and we would be relying on private APIs.
## References
- Issue: [#652](https://git.raddatz.cloud/marcel/familienarchiv/issues/652)
- Library: <https://github.com/trallnag/prometheus-fastapi-instrumentator>
- Code: `ocr-service/metrics.py`, `ocr-service/main.py`,
`ocr-service/test_metrics.py`

View File

@@ -43,12 +43,9 @@ Rel(ocr, storage, "Fetches PDF via presigned URL", "HTTP / S3 presigned")
Rel(mc, storage, "Bootstraps bucket + service account on startup", "MinIO Client CLI")
Rel(promtail, loki, "Pushes log streams", "HTTP/Loki push API")
Rel(backend, tempo, "Sends distributed traces via OTLP", "HTTP / OTLP / port 4318 (archiv-net)")
Rel(prometheus, backend, "Scrapes JVM + HTTP metrics", "HTTP 8081 /actuator/prometheus")
Rel(prometheus, ocr, "Scrapes OCR + http_* metrics", "HTTP 8000 /metrics")
Rel(grafana, prometheus, "Queries metrics", "HTTP 9090")
Rel(grafana, loki, "Queries logs", "HTTP 3100")
Rel(grafana, tempo, "Queries traces", "HTTP 3200")
Rel(grafana, db, "Read-only dashboard queries via grafana_reader role", "PostgreSQL / archiv-net")
Rel(glitchtip, db, "Stores error events in glitchtip DB", "PostgreSQL / archiv-net")
Rel(obs_glitchtip_worker, obs_redis, "Processes Celery tasks", "Redis / obs-net")

View File

@@ -0,0 +1,20 @@
// Shared mock for SvelteKit's $app/navigation virtual module.
// Activated by calling `vi.mock('$app/navigation')` (no factory) in a spec.
// Per ADR-012: eliminating per-spec factory bodies removes 36 birpc-race surface
// points; the unified mock keeps every nav export available as a vi.fn().
//
// IMPORTANT: consuming specs MUST call `vi.clearAllMocks()` (or per-mock
// `mockClear()`) in `afterEach` — otherwise call counts leak between tests.
import { vi } from 'vitest';
export const goto = vi.fn(async () => {});
export const invalidate = vi.fn(async () => {});
export const invalidateAll = vi.fn(async () => {});
export const beforeNavigate = vi.fn();
export const afterNavigate = vi.fn();
export const preloadCode = vi.fn(async () => {});
export const preloadData = vi.fn(async () => {});
export const pushState = vi.fn();
export const replaceState = vi.fn();
export const disableScrollHandling = vi.fn();
export const onNavigate = vi.fn(() => () => {});

View File

@@ -4,7 +4,7 @@ import { cleanup, render } from 'vitest-browser-svelte';
import { page, userEvent } from 'vitest/browser';
import BulkDocumentEditLayout from './BulkDocumentEditLayout.svelte';
vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
vi.mock('$app/navigation');
afterEach(() => {
cleanup();

View File

@@ -5,7 +5,7 @@ import { goto } from '$app/navigation';
import BulkSelectionBar from './BulkSelectionBar.svelte';
import { bulkSelectionStore } from '$lib/document/bulkSelection.svelte';
vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
vi.mock('$app/navigation');
afterEach(() => {
cleanup();

View File

@@ -6,7 +6,7 @@ import DocumentRow from './DocumentRow.svelte';
import { bulkSelectionStore } from '$lib/document/bulkSelection.svelte';
import type { components } from '$lib/generated/api';
vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
vi.mock('$app/navigation');
afterEach(() => {
cleanup();

View File

@@ -2,19 +2,7 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: DocumentRow } = await import('./DocumentRow.svelte');

View File

@@ -3,7 +3,7 @@ import { cleanup, render } from 'vitest-browser-svelte';
import type { NotificationItem } from '$lib/notification/notifications';
import NotificationBell from './NotificationBell.svelte';
vi.mock('$app/navigation', () => ({ goto: vi.fn(), beforeNavigate: vi.fn() }));
vi.mock('$app/navigation');
vi.mock('$app/forms', () => ({
enhance(node: HTMLFormElement, submit?: (opts: { formData: FormData }) => unknown) {
const handler = (e: Event) => {

View File

@@ -4,7 +4,7 @@ import { page } from 'vitest/browser';
import { goto } from '$app/navigation';
import NotificationDropdown from './NotificationDropdown.svelte';
vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
vi.mock('$app/navigation');
// Configurable result for the enhance mock — tests that need failure set
// mockFormResult.type = 'failure' before clicking.

View File

@@ -3,7 +3,7 @@ import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
import StammbaumSidePanel from './StammbaumSidePanel.svelte';
vi.mock('$app/navigation', () => ({ invalidateAll: vi.fn() }));
vi.mock('$app/navigation');
vi.mock('$app/forms', () => ({ enhance: () => () => {} }));
vi.mock('$lib/person/PersonTypeahead.svelte', () => ({ default: () => null }));

View File

@@ -3,19 +3,7 @@ import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
import StammbaumSidePanel from './StammbaumSidePanel.svelte';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
afterEach(cleanup);

View File

@@ -4,7 +4,7 @@ import { page } from 'vitest/browser';
import DocumentList from './DocumentList.svelte';
import type { components } from '$lib/generated/api';
vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
vi.mock('$app/navigation');
afterEach(() => cleanup());

View File

@@ -2,19 +2,7 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: DocumentList } = await import('./DocumentList.svelte');

View File

@@ -1,13 +1,11 @@
import { describe, it, expect, afterEach, vi } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
import { invalidateAll } from '$app/navigation';
import DropZone from './DropZone.svelte';
// vi.hoisted lets the mock fn reference survive vi.mock's hoisting so tests
// can assert on it from below while the factory remains self-contained.
const { invalidateAllMock } = vi.hoisted(() => ({ invalidateAllMock: vi.fn(async () => {}) }));
vi.mock('$app/navigation', () => ({ invalidateAll: invalidateAllMock }));
vi.mock('$app/navigation');
afterEach(() => {
cleanup();
@@ -68,7 +66,7 @@ describe('DropZone onUploadComplete', () => {
// invalidateAll is the last async step of the upload handler — once it
// has been called, the callback decision has already been made.
await vi.waitFor(() => {
expect(invalidateAllMock).toHaveBeenCalled();
expect(vi.mocked(invalidateAll)).toHaveBeenCalled();
});
expect(onUploadComplete).not.toHaveBeenCalled();
});

View File

@@ -2,19 +2,7 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: DropZone } = await import('./DropZone.svelte');

View File

@@ -5,7 +5,7 @@ import Page from './+page.svelte';
import { createConfirmService, CONFIRM_KEY } from '$lib/shared/services/confirm.svelte.js';
vi.mock('$app/forms', () => ({ enhance: () => () => {} }));
vi.mock('$app/navigation', () => ({ beforeNavigate: vi.fn(), goto: vi.fn() }));
vi.mock('$app/navigation');
import { beforeNavigate, goto } from '$app/navigation';

View File

@@ -11,7 +11,7 @@ vi.mock('$app/forms', () => ({
return { destroy: vi.fn() };
}
}));
vi.mock('$app/navigation', () => ({ beforeNavigate: vi.fn(), goto: vi.fn() }));
vi.mock('$app/navigation');
import { beforeNavigate, goto } from '$app/navigation';

View File

@@ -2,19 +2,7 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: AdminGroupNewPage } = await import('./+page.svelte');

View File

@@ -8,7 +8,7 @@ import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
import Page from './+page.svelte';
vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
vi.mock('$app/navigation');
const fullData = {
userCount: 4,

View File

@@ -2,19 +2,7 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: AdminEntryPage } = await import('./+page.svelte');

View File

@@ -5,11 +5,7 @@ import Page from './+page.svelte';
import { createConfirmService, CONFIRM_KEY } from '$lib/shared/services/confirm.svelte.js';
vi.mock('$app/forms', () => ({ enhance: () => () => {} }));
vi.mock('$app/navigation', () => ({
beforeNavigate: vi.fn(),
goto: vi.fn(),
replaceState: vi.fn()
}));
vi.mock('$app/navigation');
vi.mock('$app/stores', () => ({
page: {
subscribe: (fn: (v: { url: URL }) => void) => {

View File

@@ -17,19 +17,7 @@ vi.mock('$lib/shared/services/confirm.svelte', () => ({
getConfirmService: () => ({ confirm: async () => false })
}));
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: AdminTagEditPage } = await import('./+page.svelte');

View File

@@ -15,7 +15,7 @@ vi.mock('$app/forms', () => ({
return () => {};
}
}));
vi.mock('$app/navigation', () => ({ beforeNavigate: vi.fn(), goto: vi.fn() }));
vi.mock('$app/navigation');
import { beforeNavigate, goto } from '$app/navigation';

View File

@@ -11,7 +11,7 @@ vi.mock('$app/forms', () => ({
return { destroy: vi.fn() };
}
}));
vi.mock('$app/navigation', () => ({ beforeNavigate: vi.fn(), goto: vi.fn() }));
vi.mock('$app/navigation');
import { beforeNavigate, goto } from '$app/navigation';

View File

@@ -2,19 +2,7 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: AdminUserNewPage } = await import('./+page.svelte');

View File

@@ -14,19 +14,7 @@ vi.mock('$app/state', () => ({
}
}));
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
vi.mock('$lib/notification/notifications.svelte', () => ({
notificationStore: {

View File

@@ -3,7 +3,7 @@ import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
import CorrespondenzHero from './CorrespondenzHero.svelte';
vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
vi.mock('$app/navigation');
afterEach(cleanup);

View File

@@ -3,7 +3,7 @@ import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
import Page from './+page.svelte';
vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
vi.mock('$app/navigation');
afterEach(cleanup);

View File

@@ -2,19 +2,7 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: BriefwechselPage } = await import('./+page.svelte');

View File

@@ -13,19 +13,7 @@ vi.mock('$app/state', () => ({
}
}));
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
vi.mock('$lib/shared/services/confirm.svelte', () => ({
getConfirmService: () => ({ confirm: async () => false })

View File

@@ -1,21 +1,9 @@
import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
import { goto } from '$app/navigation';
const gotoSpy = vi.fn();
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: gotoSpy,
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { bulkSelectionStore } = await import('$lib/document/bulkSelection.svelte');
const { default: BulkEditPage } = await import('./+page.svelte');
@@ -23,14 +11,14 @@ const { default: BulkEditPage } = await import('./+page.svelte');
afterEach(() => {
cleanup();
bulkSelectionStore.clear();
gotoSpy.mockClear();
vi.mocked(goto).mockClear();
});
describe('documents/bulk-edit page', () => {
it('redirects to /documents when no documents are selected', async () => {
render(BulkEditPage, { props: {} });
await vi.waitFor(() => expect(gotoSpy).toHaveBeenCalledWith('/documents'));
await vi.waitFor(() => expect(vi.mocked(goto)).toHaveBeenCalledWith('/documents'));
});
it('shows the loading spinner while fetching batch metadata', async () => {

View File

@@ -2,7 +2,7 @@ import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
vi.mock('$app/navigation');
vi.mock('$app/state', () => ({ navigating: { to: null } }));
import Page from './+page.svelte';

View File

@@ -4,19 +4,7 @@ import { page } from 'vitest/browser';
const mockNavigating = { to: null };
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
vi.mock('$app/state', () => ({
get navigating() {

View File

@@ -2,19 +2,7 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: GeschichtenEditPage } = await import('./+page.svelte');

View File

@@ -2,19 +2,7 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: GeschichtenNewPage } = await import('./+page.svelte');

View File

@@ -2,7 +2,7 @@ import { afterEach, describe, expect, it, vi } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
vi.mock('$app/navigation');
vi.mock('$app/state', () => ({ navigating: { to: null } }));
import Page from './+page.svelte';

View File

@@ -2,19 +2,7 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: GeschichtenListPage } = await import('./+page.svelte');

View File

@@ -8,7 +8,7 @@ type User = components['schemas']['AppUser'];
afterEach(cleanup);
vi.mock('$app/navigation', () => ({ goto: vi.fn(), invalidateAll: vi.fn() }));
vi.mock('$app/navigation');
const baseUser: User = {
id: 'u1',

View File

@@ -5,7 +5,7 @@ import Page from './+page.svelte';
const tick = () => new Promise((r) => setTimeout(r, 0));
vi.mock('$app/navigation', () => ({ goto: vi.fn() }));
vi.mock('$app/navigation');
const makePerson = (overrides = {}) => ({
id: '1',

View File

@@ -2,19 +2,7 @@ import { describe, it, expect, vi, afterEach } from 'vitest';
import { cleanup, render } from 'vitest-browser-svelte';
import { page } from 'vitest/browser';
vi.mock('$app/navigation', () => ({
beforeNavigate: () => {},
afterNavigate: () => {},
goto: vi.fn(),
invalidate: vi.fn(),
invalidateAll: vi.fn(),
preloadCode: vi.fn(),
preloadData: vi.fn(),
pushState: vi.fn(),
replaceState: vi.fn(),
disableScrollHandling: vi.fn(),
onNavigate: () => () => {}
}));
vi.mock('$app/navigation');
const { default: PersonsListPage } = await import('./+page.svelte');

View File

@@ -1,702 +0,0 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": { "type": "grafana", "uid": "grafana" },
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"description": "Product owner overview — system health, user activity, archive progress, and OCR quality at a weekly glance.",
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"collapsed": false,
"gridPos": { "h": 1, "w": 24, "x": 0, "y": 0 },
"id": 100,
"title": "System Health",
"type": "row",
"panels": []
},
{
"id": 1,
"title": "Backend Status",
"type": "stat",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"gridPos": { "h": 4, "w": 6, "x": 0, "y": 1 },
"targets": [
{
"expr": "up{job=\"spring-boot\"}",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"mappings": [
{ "type": "value", "options": { "0": { "text": "DOWN", "color": "red" } } },
{ "type": "value", "options": { "1": { "text": "UP", "color": "green" } } }
],
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "red", "value": null },
{ "color": "green", "value": 1 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"colorMode": "background",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"textMode": "value"
}
},
{
"id": 2,
"title": "Server Errors (5xx)",
"type": "stat",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"gridPos": { "h": 4, "w": 6, "x": 6, "y": 1 },
"targets": [
{
"expr": "sum(increase(http_server_requests_seconds_count{status=~\"5..\"}[$__range]))",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 1 },
{ "color": "red", "value": 6 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"colorMode": "background",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 3,
"title": "Response Time (p95)",
"type": "stat",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"gridPos": { "h": 4, "w": 6, "x": 12, "y": 1 },
"targets": [
{
"expr": "histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket[$__range])) by (le))",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"decimals": 2,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 0.5 },
{ "color": "red", "value": 2 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"colorMode": "background",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 4,
"title": "Error Log Count",
"type": "stat",
"datasource": { "type": "loki", "uid": "loki" },
"gridPos": { "h": 4, "w": 6, "x": 18, "y": 1 },
"targets": [
{
"expr": "sum(count_over_time({compose_service=\"backend\"} | json | level=\"ERROR\" [$__range]))",
"queryType": "instant",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 1 },
{ "color": "red", "value": 10 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"colorMode": "background",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 5,
"title": "CPU Usage",
"type": "bargauge",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"gridPos": { "h": 5, "w": 8, "x": 0, "y": 5 },
"targets": [
{
"expr": "100 - (avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"decimals": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 70 },
{ "color": "red", "value": 85 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"displayMode": "gradient",
"orientation": "horizontal",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"showUnfilled": true
}
},
{
"id": 6,
"title": "Memory Usage",
"type": "bargauge",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"gridPos": { "h": 5, "w": 8, "x": 8, "y": 5 },
"targets": [
{
"expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"decimals": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 70 },
{ "color": "red", "value": 85 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"displayMode": "gradient",
"orientation": "horizontal",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"showUnfilled": true
}
},
{
"id": 7,
"title": "Disk Usage",
"type": "bargauge",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"gridPos": { "h": 5, "w": 8, "x": 16, "y": 5 },
"targets": [
{
"expr": "(1 - (node_filesystem_avail_bytes{mountpoint=\"/\"} / node_filesystem_size_bytes{mountpoint=\"/\"})) * 100",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"decimals": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 70 },
{ "color": "red", "value": 80 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"displayMode": "gradient",
"orientation": "horizontal",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"showUnfilled": true
}
},
{
"collapsed": false,
"gridPos": { "h": 1, "w": 24, "x": 0, "y": 10 },
"id": 101,
"title": "User Activity",
"type": "row",
"panels": []
},
{
"id": 8,
"title": "Active Users",
"type": "stat",
"datasource": { "type": "postgres", "uid": "postgres" },
"gridPos": { "h": 4, "w": 8, "x": 0, "y": 11 },
"targets": [
{
"rawSql": "SELECT COUNT(DISTINCT actor_id) AS value FROM audit_log WHERE happened_at >= NOW() - INTERVAL '7 days' AND kind = 'LOGIN_SUCCESS'",
"format": "table",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0,
"color": { "mode": "fixed", "fixedColor": "blue" }
}
},
"options": {
"colorMode": "value",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 9,
"title": "Total Logins",
"type": "stat",
"datasource": { "type": "postgres", "uid": "postgres" },
"gridPos": { "h": 4, "w": 8, "x": 8, "y": 11 },
"targets": [
{
"rawSql": "SELECT COUNT(*) AS value FROM audit_log WHERE happened_at >= NOW() - INTERVAL '7 days' AND kind = 'LOGIN_SUCCESS'",
"format": "table",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0,
"color": { "mode": "fixed", "fixedColor": "blue" }
}
},
"options": {
"colorMode": "value",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 10,
"title": "Failed Login Attempts",
"type": "stat",
"datasource": { "type": "postgres", "uid": "postgres" },
"gridPos": { "h": 4, "w": 8, "x": 16, "y": 11 },
"targets": [
{
"rawSql": "SELECT COUNT(*) AS value FROM audit_log WHERE happened_at >= NOW() - INTERVAL '7 days' AND kind IN ('LOGIN_FAILED', 'LOGIN_RATE_LIMITED')",
"format": "table",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 1 },
{ "color": "red", "value": 4 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"colorMode": "background",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 11,
"title": "Daily Logins (last 7 days)",
"type": "barchart",
"datasource": { "type": "postgres", "uid": "postgres" },
"gridPos": { "h": 7, "w": 24, "x": 0, "y": 15 },
"targets": [
{
"rawSql": "SELECT DATE_TRUNC('day', happened_at) AS time, COUNT(*) AS logins FROM audit_log WHERE happened_at >= NOW() - INTERVAL '7 days' AND kind = 'LOGIN_SUCCESS' GROUP BY 1 ORDER BY 1",
"format": "time_series",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0,
"color": { "mode": "fixed", "fixedColor": "blue" }
}
},
"options": {
"legend": { "displayMode": "hidden" },
"orientation": "auto",
"showValue": "auto",
"stacking": "none",
"xTickLabelRotation": 0,
"xTickLabelSpacing": 0
}
},
{
"collapsed": false,
"gridPos": { "h": 1, "w": 24, "x": 0, "y": 22 },
"id": 102,
"title": "Archive Progress",
"type": "row",
"panels": []
},
{
"id": 12,
"title": "Transcription Coverage",
"type": "bargauge",
"datasource": { "type": "postgres", "uid": "postgres" },
"gridPos": { "h": 5, "w": 24, "x": 0, "y": 23 },
"targets": [
{
"rawSql": "SELECT (COUNT(*) FILTER (WHERE text IS NOT NULL AND text <> ''))::float * 100.0 / NULLIF(COUNT(*), 0) AS percent_complete FROM transcription_blocks",
"format": "table",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"decimals": 1,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "red", "value": null },
{ "color": "yellow", "value": 25 },
{ "color": "green", "value": 75 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"displayMode": "gradient",
"orientation": "horizontal",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"showUnfilled": true
}
},
{
"id": 13,
"title": "Total Documents",
"type": "stat",
"datasource": { "type": "postgres", "uid": "postgres" },
"gridPos": { "h": 4, "w": 6, "x": 0, "y": 28 },
"targets": [
{
"rawSql": "SELECT COUNT(*) AS value FROM documents WHERE status <> 'PLACEHOLDER'",
"format": "table",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0,
"color": { "mode": "fixed", "fixedColor": "blue" }
}
},
"options": {
"colorMode": "value",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 14,
"title": "Uploads This Week",
"type": "stat",
"datasource": { "type": "postgres", "uid": "postgres" },
"gridPos": { "h": 4, "w": 6, "x": 6, "y": 28 },
"targets": [
{
"rawSql": "SELECT COUNT(*) AS value FROM audit_log WHERE happened_at >= NOW() - INTERVAL '7 days' AND kind = 'FILE_UPLOADED'",
"format": "table",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0,
"color": { "mode": "fixed", "fixedColor": "blue" }
}
},
"options": {
"colorMode": "value",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 15,
"title": "Blocks Transcribed This Week",
"type": "stat",
"datasource": { "type": "postgres", "uid": "postgres" },
"gridPos": { "h": 4, "w": 6, "x": 12, "y": 28 },
"targets": [
{
"rawSql": "SELECT COUNT(*) AS value FROM audit_log WHERE happened_at >= NOW() - INTERVAL '7 days' AND kind = 'TEXT_SAVED'",
"format": "table",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0,
"color": { "mode": "fixed", "fixedColor": "blue" }
}
},
"options": {
"colorMode": "value",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 16,
"title": "Blocks Reviewed This Week",
"type": "stat",
"datasource": { "type": "postgres", "uid": "postgres" },
"gridPos": { "h": 4, "w": 6, "x": 18, "y": 28 },
"targets": [
{
"rawSql": "SELECT COUNT(*) AS value FROM audit_log WHERE happened_at >= NOW() - INTERVAL '7 days' AND kind = 'BLOCK_REVIEWED'",
"format": "table",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0,
"color": { "mode": "fixed", "fixedColor": "blue" }
}
},
"options": {
"colorMode": "value",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"collapsed": false,
"gridPos": { "h": 1, "w": 24, "x": 0, "y": 32 },
"id": 103,
"title": "OCR Health",
"type": "row",
"panels": []
},
{
"id": 17,
"title": "OCR Jobs",
"type": "stat",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"gridPos": { "h": 4, "w": 6, "x": 0, "y": 33 },
"targets": [
{
"expr": "sum(increase(ocr_jobs_total[$__range]))",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0,
"color": { "mode": "fixed", "fixedColor": "blue" }
}
},
"options": {
"colorMode": "value",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 18,
"title": "OCR Page Error Rate",
"type": "stat",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"gridPos": { "h": 4, "w": 6, "x": 6, "y": 33 },
"targets": [
{
"expr": "sum(increase(ocr_skipped_pages_total[$__range])) / clamp_min(sum(increase(ocr_pages_total[$__range])), 1)",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percentunit",
"decimals": 1,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 0.01 },
{ "color": "red", "value": 0.05 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"colorMode": "background",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 19,
"title": "Illegible Word Rate",
"type": "stat",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"gridPos": { "h": 4, "w": 6, "x": 12, "y": 33 },
"targets": [
{
"expr": "sum(increase(ocr_illegible_words_total[$__range])) / clamp_min(sum(increase(ocr_words_total[$__range])), 1)",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percentunit",
"decimals": 1,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 0.1 },
{ "color": "red", "value": 0.25 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"colorMode": "background",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
}
},
{
"id": 20,
"title": "OCR Service Status",
"type": "stat",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"gridPos": { "h": 4, "w": 6, "x": 18, "y": 33 },
"targets": [
{
"expr": "ocr_models_ready",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"mappings": [
{ "type": "value", "options": { "0": { "text": "NOT READY", "color": "red" } } },
{ "type": "value", "options": { "1": { "text": "READY", "color": "green" } } }
],
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "red", "value": null },
{ "color": "green", "value": 1 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"colorMode": "background",
"graphMode": "none",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"textMode": "value"
}
}
],
"refresh": "",
"schemaVersion": 39,
"tags": ["po-overview", "familienarchiv"],
"templating": { "list": [] },
"time": { "from": "now-7d", "to": "now" },
"timepicker": {},
"timezone": "browser",
"title": "PO Overview",
"uid": "po-overview",
"version": 1,
"weekStart": ""
}

View File

@@ -36,19 +36,3 @@ datasources:
datasourceUid: prometheus
nodeGraph:
enabled: true
# Read-only PostgreSQL datasource for the PO Overview dashboard (issue #651).
# Uses the grafana_reader role provisioned by Flyway V68. Traffic stays inside
# archiv-net, so sslmode=disable is the deliberate, accepted setting.
- name: PostgreSQL
type: postgres
uid: postgres
url: archive-db:5432
user: grafana_reader
editable: false
secureJsonData:
password: ${GRAFANA_DB_PASSWORD}
jsonData:
database: ${POSTGRES_DB}
sslmode: disable
postgresVersion: 1600

View File

@@ -16,11 +16,6 @@ GLITCHTIP_DOMAIN=https://glitchtip.archiv.raddatz.cloud
POSTGRES_USER=archiv
# Note: GRAFANA_DB_PASSWORD is a secret and is injected by CI from
# obs-secrets.env (see .env.example for the local-dev declaration).
# It is consumed by both archive-backend (Flyway V68 placeholder) and
# obs-grafana (PostgreSQL datasource).
# PostgreSQL hostname for GlitchTip db-init and workers.
# The actual value depends on the Compose project name — it is not a fixed string.
# CI sets POSTGRES_HOST in obs-secrets.env per environment:

View File

@@ -20,4 +20,7 @@ scrape_configs:
- job_name: ocr-service
metrics_path: /metrics
static_configs:
# TODO: remove or add prometheus-client to ocr-service.
# The Python OCR service does not currently expose Prometheus metrics.
# This target will show as DOWN until prometheus-client is added to ocr-service.
- targets: ['ocr:8000']

View File

@@ -2,7 +2,6 @@
import asyncio
import glob
import inspect
import io
import json
import logging
@@ -11,11 +10,9 @@ import re
import shutil
import subprocess
import tempfile
import time
import zipfile
from contextlib import asynccontextmanager
from datetime import datetime, timezone
from typing import Awaitable, Callable
from urllib.parse import urlparse
import httpx
@@ -23,11 +20,8 @@ import pypdfium2 as pdfium
from fastapi import FastAPI, Form, Header, HTTPException, UploadFile
from fastapi.responses import StreamingResponse
from PIL import Image
from prometheus_client import REGISTRY
from prometheus_fastapi_instrumentator import Instrumentator
from confidence import apply_confidence_markers, get_threshold
from metrics import OcrMetrics, build_metrics
from spell_check import correct_text, load_spell_checker
from engines import kraken as kraken_engine
from engines import surya as surya_engine
@@ -43,12 +37,6 @@ logger = logging.getLogger(__name__)
_models_ready = False
# One-shot import-time binding to the default REGISTRY. Tests that need a
# clean counter state must monkeypatch `main.metrics` with a container built
# from a fresh CollectorRegistry — rebinding through the registry directly
# will not retarget the references stored in the OcrMetrics dataclass.
metrics: OcrMetrics = build_metrics(REGISTRY)
ALLOWED_PDF_HOSTS = set(
h.strip() for h in os.getenv("ALLOWED_PDF_HOSTS", "minio,localhost,127.0.0.1").split(",")
)
@@ -56,42 +44,6 @@ ALLOWED_PDF_HOSTS = set(
_SPELL_CHECK_SCRIPT_TYPES = {"HANDWRITING_KURRENT", "HANDWRITING_LATIN"}
async def _record_training(
runner: Callable[[], Awaitable[dict] | dict],
kind: str,
) -> dict:
"""Run a training callable and record outcome + accuracy metrics.
Wraps the per-endpoint try/except + outcome counter + accuracy gauge
block that used to be repeated at /train, /train-sender, and /segtrain.
The runner returns a dict with at least an `accuracy` key; if its value
is None, the gauge is left at its default.
"""
try:
result = runner()
if inspect.isawaitable(result):
result = await result
except Exception:
metrics.ocr_training_runs_total.labels(kind=kind, outcome="error").inc()
raise
metrics.ocr_training_runs_total.labels(kind=kind, outcome="success").inc()
if result.get("accuracy") is not None:
metrics.ocr_model_accuracy.labels(kind=kind).set(result["accuracy"])
return result
def _observe_block_words(words: list[dict], threshold: float) -> None:
"""Record per-block word counts and below-threshold word counts.
Pre: `words` is non-empty. Caller checks for that — keeping the helper
branch-free makes the call sites read as a single line.
"""
metrics.ocr_words_total.inc(len(words))
metrics.ocr_illegible_words_total.inc(
sum(1 for w in words if w["confidence"] < threshold)
)
def _validate_url(url: str) -> None:
"""Validate that the PDF URL points to an allowed host (SSRF protection)."""
parsed = urlparse(url)
@@ -111,7 +63,6 @@ async def lifespan(app: FastAPI):
kraken_engine.load_models()
load_spell_checker()
_models_ready = True
metrics.ocr_models_ready.set(1)
logger.info("Startup complete — ready to accept requests")
yield
@@ -121,28 +72,6 @@ async def lifespan(app: FastAPI):
app = FastAPI(title="Familienarchiv OCR Service", lifespan=lifespan)
# /metrics is unauthenticated — relies on Docker-internal-network exposure
# only (CWE-200 risk if `ports:` ever maps 8000 to host). See
# docs/OBSERVABILITY.md §Internal-only endpoints for the Caddy block snippet.
Instrumentator(excluded_handlers=["/health", "/metrics"]).instrument(app).expose(app)
class MetricsPathFilter(logging.Filter):
"""Drop uvicorn.access entries for /metrics and /health to keep logs focused."""
_SUPPRESSED_PATHS = {"/metrics", "/health"}
def filter(self, record: logging.LogRecord) -> bool:
# uvicorn.access formats as: '%s - "%s %s HTTP/%s" %d'
if record.args and len(record.args) >= 3:
path = record.args[2]
if isinstance(path, str) and path in self._SUPPRESSED_PATHS:
return False
return True
logging.getLogger("uvicorn.access").addFilter(MetricsPathFilter())
@app.get("/health")
def health():
@@ -170,9 +99,7 @@ async def run_ocr(request: OcrRequest):
del img
script_type = request.scriptType.upper()
engine_name = "kraken" if script_type == "HANDWRITING_KURRENT" else "surya"
extract_started = time.monotonic()
if script_type == "HANDWRITING_KURRENT":
if not kraken_engine.is_available():
raise HTTPException(
@@ -184,18 +111,11 @@ async def run_ocr(request: OcrRequest):
else:
# TYPEWRITER, HANDWRITING_LATIN, UNKNOWN — all use Surya
blocks = await asyncio.to_thread(surya_engine.extract_blocks, images, request.language)
metrics.ocr_processing_seconds.labels(engine=engine_name).observe(
time.monotonic() - extract_started
)
metrics.ocr_jobs_total.labels(engine=engine_name, script_type=script_type).inc()
threshold = get_threshold(script_type)
for block in blocks:
words = block.get("words") or []
if words:
_observe_block_words(words, threshold)
block["text"] = apply_confidence_markers(words, threshold)
if block.get("words"):
block["text"] = apply_confidence_markers(block["words"], threshold)
block.pop("words", None)
if script_type in _SPELL_CHECK_SCRIPT_TYPES:
block["text"] = correct_text(block["text"])
@@ -226,9 +146,6 @@ async def run_ocr_stream(request: OcrRequest):
)
engine = kraken_engine if use_kraken else surya_engine
engine_name = "kraken" if use_kraken else "surya"
metrics.ocr_jobs_total.labels(engine=engine_name, script_type=script_type).inc()
if request.regions:
# Guided mode: recognize only the user-drawn annotation regions
@@ -259,15 +176,12 @@ async def run_ocr_stream(request: OcrRequest):
image = await asyncio.to_thread(preprocess_page, image)
blocks = []
sender_path = request.senderModelPath if use_kraken else None
engine_seconds = 0.0
for region in page_regions:
region_started = time.monotonic()
text = await asyncio.to_thread(
engine.extract_region_text, image,
region.x, region.y, region.width, region.height,
sender_path,
)
engine_seconds += time.monotonic() - region_started
if script_type in _SPELL_CHECK_SCRIPT_TYPES:
text = correct_text(text)
blocks.append({
@@ -281,11 +195,7 @@ async def run_ocr_stream(request: OcrRequest):
"annotationId": region.annotationId,
})
metrics.ocr_processing_seconds.labels(engine=engine_name).observe(
engine_seconds
)
total_blocks += len(blocks)
metrics.ocr_pages_total.labels(engine=engine_name).inc()
yield json.dumps({
"type": "page",
"pageNumber": page_idx,
@@ -295,7 +205,6 @@ async def run_ocr_stream(request: OcrRequest):
except Exception:
logger.exception("Guided OCR failed on page %d", page_idx)
skipped_pages += 1
metrics.ocr_skipped_pages_total.inc()
yield json.dumps({
"type": "error",
"pageNumber": page_idx,
@@ -329,25 +238,18 @@ async def run_ocr_stream(request: OcrRequest):
yield json.dumps({"type": "preprocessing", "pageNumber": page_idx}) + "\n"
image = await asyncio.to_thread(preprocess_page, image)
sender_path = request.senderModelPath if use_kraken else None
page_started = time.monotonic()
blocks = await asyncio.to_thread(
engine.extract_page_blocks, image, page_idx, request.language, sender_path
)
metrics.ocr_processing_seconds.labels(engine=engine_name).observe(
time.monotonic() - page_started
)
for block in blocks:
words = block.get("words") or []
if words:
_observe_block_words(words, threshold)
block["text"] = apply_confidence_markers(words, threshold)
if block.get("words"):
block["text"] = apply_confidence_markers(block["words"], threshold)
block.pop("words", None)
if script_type in _SPELL_CHECK_SCRIPT_TYPES:
block["text"] = correct_text(block["text"])
total_blocks += len(blocks)
metrics.ocr_pages_total.labels(engine=engine_name).inc()
yield json.dumps({
"type": "page",
"pageNumber": page_idx,
@@ -357,7 +259,6 @@ async def run_ocr_stream(request: OcrRequest):
except Exception:
logger.exception("OCR failed on page %d", page_idx)
skipped_pages += 1
metrics.ocr_skipped_pages_total.inc()
yield json.dumps({
"type": "error",
"pageNumber": page_idx,
@@ -537,7 +438,8 @@ async def train_model(
return {"loss": None, "accuracy": accuracy, "cer": cer, "epochs": epochs}
return await _record_training(lambda: asyncio.to_thread(_run_training), kind="recognition")
result = await asyncio.to_thread(_run_training)
return result
@app.post("/train-sender")
@@ -616,9 +518,8 @@ async def train_sender_model(
return {"loss": None, "accuracy": accuracy, "cer": cer, "epochs": epochs}
return await _record_training(
lambda: asyncio.to_thread(_run_sender_training), kind="recognition"
)
result = await asyncio.to_thread(_run_sender_training)
return result
@app.post("/segtrain")
@@ -727,7 +628,8 @@ async def segtrain_model(
return {"loss": None, "accuracy": accuracy, "cer": cer, "epochs": epochs}
return await _record_training(lambda: asyncio.to_thread(_run_segtrain), kind="segmentation")
result = await asyncio.to_thread(_run_segtrain)
return result
async def _download_and_convert_pdf(url: str) -> list[Image.Image]:

View File

@@ -1,92 +0,0 @@
"""Prometheus metric definitions for the OCR service.
`build_metrics(registry)` returns a fresh `OcrMetrics` instance bound to the
given `CollectorRegistry`. Production code calls it once at module load with
the default `REGISTRY`; tests pass a per-test `CollectorRegistry()` to keep
counter values isolated between cases (decision #3 on issue #652).
"""
from __future__ import annotations
from dataclasses import dataclass
from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram
@dataclass(frozen=True)
class OcrMetrics:
"""Container for every custom OCR metric.
Counters and gauges are immutable references to `prometheus_client`
instances. Mutating them (`.inc()`, `.observe()`, `.set()`) is safe;
rebinding the field on the dataclass is not — use `build_metrics` to get
a new container.
"""
ocr_jobs_total: Counter
ocr_pages_total: Counter
ocr_skipped_pages_total: Counter
ocr_words_total: Counter
ocr_illegible_words_total: Counter
ocr_processing_seconds: Histogram
ocr_training_runs_total: Counter
ocr_model_accuracy: Gauge
ocr_models_ready: Gauge
def build_metrics(registry: CollectorRegistry) -> OcrMetrics:
"""Create one OcrMetrics instance bound to `registry`."""
return OcrMetrics(
ocr_jobs_total=Counter(
"ocr_jobs_total",
"Number of OCR jobs processed, labelled by engine and script type.",
["engine", "script_type"],
registry=registry,
),
ocr_pages_total=Counter(
"ocr_pages_total",
"Number of pages successfully OCR'd, labelled by engine.",
["engine"],
registry=registry,
),
ocr_skipped_pages_total=Counter(
"ocr_skipped_pages_total",
"Number of pages skipped because the OCR engine raised.",
registry=registry,
),
ocr_words_total=Counter(
"ocr_words_total",
"Number of words recognized across all OCR blocks.",
registry=registry,
),
ocr_illegible_words_total=Counter(
"ocr_illegible_words_total",
"Number of words below the confidence threshold "
"(replaced with [unleserlich]).",
registry=registry,
),
ocr_processing_seconds=Histogram(
"ocr_processing_seconds",
"OCR processing time per page (streaming) or per document (non-streaming).",
["engine"],
registry=registry,
),
ocr_training_runs_total=Counter(
"ocr_training_runs_total",
"Number of training runs, labelled by kind (recognition|segmentation) "
"and outcome (success|error).",
["kind", "outcome"],
registry=registry,
),
ocr_model_accuracy=Gauge(
"ocr_model_accuracy",
"Latest model accuracy reported by a successful training run.",
["kind"],
registry=registry,
),
ocr_models_ready=Gauge(
"ocr_models_ready",
"1 once the lifespan startup has finished loading models, 0 before.",
registry=registry,
),
)

View File

@@ -10,5 +10,3 @@ pyvips>=2.2.0
httpx==0.28.1
pyspellchecker==0.9.0
opencv-python-headless==4.11.0.86
prometheus-fastapi-instrumentator==7.0.0
prometheus-client==0.25.0

View File

@@ -1,638 +0,0 @@
"""Tests for Prometheus metrics exposed by the OCR service.
Each test that asserts on a counter/gauge value uses a fresh CollectorRegistry
(see decision #3 on issue #652) to keep the metrics isolated between tests.
"""
import contextlib
import io
import zipfile
from unittest.mock import AsyncMock, patch
import pytest
from httpx import ASGITransport, AsyncClient
from PIL import Image
from prometheus_client import CollectorRegistry
from main import app
from metrics import build_metrics
@contextlib.asynccontextmanager
async def ocr_client(*, raise_app_exceptions: bool = True):
"""Yield an AsyncClient with model-loaders patched and _models_ready forced on.
The shared setup for almost every metrics test: stub the heavy lifecycle
hooks (kraken_engine.load_models, load_spell_checker), flip the readiness
flag so request handlers do not 503, and restore it afterwards.
"""
with patch("main.kraken_engine.load_models"), \
patch("main.load_spell_checker"):
transport = ASGITransport(app=app, raise_app_exceptions=raise_app_exceptions)
async with AsyncClient(transport=transport, base_url="http://test") as client:
import main as main_module
main_module._models_ready = True
try:
yield client
finally:
main_module._models_ready = False
def _minimal_zip() -> bytes:
"""Return a ZIP containing one fake .xml so endpoint validation passes."""
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
zf.writestr("page_01.xml", "<PcGts/>")
return buf.getvalue()
def _fake_training_result(accuracy: float = 0.91) -> dict:
return {"loss": None, "accuracy": accuracy, "cer": round(1 - accuracy, 4), "epochs": 5}
@pytest.fixture
def fresh_metrics(monkeypatch):
"""Replace the module-level `main.metrics` with one bound to a fresh registry."""
registry = CollectorRegistry()
test_metrics = build_metrics(registry)
monkeypatch.setattr("main.metrics", test_metrics)
return test_metrics
@pytest.mark.asyncio
async def test_metrics_endpoint_returns_200():
"""`GET /metrics` returns 200 with Prometheus exposition content.
Uses the global REGISTRY by design — does NOT take the `fresh_metrics` fixture.
The `/metrics` endpoint is wired by `prometheus-fastapi-instrumentator`, which
binds to the default REGISTRY at app-construction time; swapping `main.metrics`
via the fixture would not redirect what `/metrics` exposes. This test only
asserts response shape (status code + content-type substring), not numeric
counter values, so cross-test state leakage cannot affect it.
"""
with patch("main.kraken_engine.load_models"), \
patch("main.load_spell_checker"):
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
response = await client.get("/metrics")
assert response.status_code == 200
assert "text/plain" in response.headers.get("content-type", "")
@pytest.mark.asyncio
async def test_metrics_includes_http_request_metrics_after_ocr_call():
"""After a request to /ocr, `/metrics` exposes auto-instrumented http_* metrics.
Uses the global REGISTRY by design — does NOT take the `fresh_metrics` fixture.
The `http_requests_total` / `http_request_duration_seconds` metrics live on
the instrumentator's default REGISTRY (not on `main.metrics`), so a fresh
CollectorRegistry would never see them. This test only asserts response shape
(substring presence in the exposition body), not numeric counter values, so
cross-test state leakage cannot affect it.
"""
mock_images = [Image.new("RGB", (100, 100))]
mock_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
"polygon": None, "text": "hi", "words": []}]
with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
patch("main.preprocess_page", side_effect=lambda img: img), \
patch("main.surya_engine.extract_blocks", return_value=mock_blocks):
async with ocr_client() as client:
ocr_response = await client.post("/ocr", json={
"pdfUrl": "http://minio/doc.pdf",
"scriptType": "TYPEWRITER",
"language": "de",
})
assert ocr_response.status_code == 200, ocr_response.text
metrics_response = await client.get("/metrics")
body = metrics_response.text
assert "http_requests_total" in body
assert "http_request_duration_seconds" in body
def test_build_metrics_registers_all_custom_metrics_on_given_registry():
"""`build_metrics` returns an OcrMetrics bound to the supplied registry."""
registry = CollectorRegistry()
metrics = build_metrics(registry)
metric_names = {m.name for m in registry.collect()}
expected = {
"ocr_jobs",
"ocr_pages",
"ocr_skipped_pages",
"ocr_words",
"ocr_illegible_words",
"ocr_processing_seconds",
"ocr_training_runs",
"ocr_model_accuracy",
"ocr_models_ready",
}
assert expected <= metric_names, f"missing: {expected - metric_names}"
# A second registry yields a separate container — no shared state.
other_metrics = build_metrics(CollectorRegistry())
assert metrics is not other_metrics
async def _drive_ocr(client: AsyncClient, *, script_type: str) -> None:
"""Helper — fires /ocr with a single mocked page and asserts a 200."""
response = await client.post("/ocr", json={
"pdfUrl": "http://minio/doc.pdf",
"scriptType": script_type,
"language": "de",
})
assert response.status_code == 200, response.text
@pytest.mark.asyncio
async def test_ocr_jobs_total_incremented_with_kraken_engine_label_for_kurrent(fresh_metrics):
"""A /ocr call with HANDWRITING_KURRENT increments engine=kraken."""
mock_images = [Image.new("RGB", (100, 100))]
mock_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
"polygon": None, "text": "hi", "words": []}]
with patch("main.correct_text", side_effect=lambda t: t), \
patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
patch("main.preprocess_page", side_effect=lambda img: img), \
patch("main.kraken_engine.is_available", return_value=True), \
patch("main.kraken_engine.extract_blocks", return_value=mock_blocks):
async with ocr_client() as client:
await _drive_ocr(client, script_type="HANDWRITING_KURRENT")
value = fresh_metrics.ocr_jobs_total.labels(
engine="kraken", script_type="HANDWRITING_KURRENT"
)._value.get()
assert value == 1.0
@pytest.mark.asyncio
async def test_ocr_jobs_total_incremented_with_surya_engine_label_for_typewriter(fresh_metrics):
"""A /ocr call with TYPEWRITER increments engine=surya."""
mock_images = [Image.new("RGB", (100, 100))]
mock_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
"polygon": None, "text": "hi", "words": []}]
with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
patch("main.preprocess_page", side_effect=lambda img: img), \
patch("main.surya_engine.extract_blocks", return_value=mock_blocks):
async with ocr_client() as client:
await _drive_ocr(client, script_type="TYPEWRITER")
value = fresh_metrics.ocr_jobs_total.labels(
engine="surya", script_type="TYPEWRITER"
)._value.get()
assert value == 1.0
@pytest.mark.asyncio
async def test_ocr_pages_total_incremented_once_per_page_in_stream(fresh_metrics):
"""The /ocr/stream generator increments ocr_pages_total per successful page."""
mock_images = [Image.new("RGB", (100, 100)) for _ in range(3)]
mock_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
"polygon": None, "text": "hi", "words": []}]
with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
patch("main.preprocess_page", side_effect=lambda img: img), \
patch("main.surya_engine.extract_page_blocks", return_value=mock_blocks):
async with ocr_client() as client:
async with client.stream("POST", "/ocr/stream", json={
"pdfUrl": "http://minio/doc.pdf",
"scriptType": "TYPEWRITER",
"language": "de",
}) as response:
assert response.status_code == 200
# Drain the stream so all per-page increments fire.
async for _ in response.aiter_lines():
pass
value = fresh_metrics.ocr_pages_total.labels(engine="surya")._value.get()
assert value == 3.0
@pytest.mark.asyncio
async def test_ocr_skipped_pages_total_incremented_when_engine_raises_for_a_page(fresh_metrics):
"""When the engine raises on a page, ocr_skipped_pages_total bumps and the stream finishes."""
mock_images = [Image.new("RGB", (100, 100)) for _ in range(2)]
good_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
"polygon": None, "text": "ok", "words": []}]
call_count = {"n": 0}
def extract_side_effect(*args, **kwargs):
call_count["n"] += 1
if call_count["n"] == 1:
raise RuntimeError("synthetic engine failure")
return good_blocks
with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
patch("main.preprocess_page", side_effect=lambda img: img), \
patch("main.surya_engine.extract_page_blocks", side_effect=extract_side_effect):
async with ocr_client() as client:
async with client.stream("POST", "/ocr/stream", json={
"pdfUrl": "http://minio/doc.pdf",
"scriptType": "TYPEWRITER",
"language": "de",
}) as response:
assert response.status_code == 200
saw_error = False
async for line in response.aiter_lines():
if line and '"type": "error"' in line:
saw_error = True
assert saw_error
assert fresh_metrics.ocr_skipped_pages_total._value.get() == 1.0
# The second page still succeeds.
assert fresh_metrics.ocr_pages_total.labels(engine="surya")._value.get() == 1.0
@pytest.mark.asyncio
async def test_ocr_words_and_illegible_words_total_sum_across_blocks(fresh_metrics):
"""Counters reflect totals summed over every block in the request.
Threshold defaults to THRESHOLD_DEFAULT (0.3) for non-Kurrent scripts. Two
blocks: 3 words above + 2 words below threshold across blocks.
"""
mock_images = [Image.new("RGB", (100, 100))]
mock_blocks = [
{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
"polygon": None, "text": "ignored",
"words": [{"text": "Lieber", "confidence": 0.9},
{"text": "Freund", "confidence": 0.1}]},
{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
"polygon": None, "text": "ignored",
"words": [{"text": "Gruss", "confidence": 0.8},
{"text": "verschmiert", "confidence": 0.05},
{"text": "Karl", "confidence": 0.95}]},
]
with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
patch("main.preprocess_page", side_effect=lambda img: img), \
patch("main.surya_engine.extract_blocks", return_value=mock_blocks):
async with ocr_client() as client:
await _drive_ocr(client, script_type="TYPEWRITER")
assert fresh_metrics.ocr_words_total._value.get() == 5.0
assert fresh_metrics.ocr_illegible_words_total._value.get() == 2.0
def _histogram_count_sum(histogram, **labels) -> tuple[float, float]:
"""Read the per-label-set _count and _sum from a prometheus_client Histogram."""
child = histogram.labels(**labels)
return child._sum.get(), sum(b.get() for b in child._buckets)
@pytest.mark.asyncio
async def test_ocr_processing_seconds_histogram_observed_per_page_in_stream(fresh_metrics):
"""The streaming generator observes ocr_processing_seconds once per page."""
mock_images = [Image.new("RGB", (100, 100)) for _ in range(2)]
mock_blocks = [{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0,
"polygon": None, "text": "ok", "words": []}]
with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
patch("main.preprocess_page", side_effect=lambda img: img), \
patch("main.surya_engine.extract_page_blocks", return_value=mock_blocks):
async with ocr_client() as client:
async with client.stream("POST", "/ocr/stream", json={
"pdfUrl": "http://minio/doc.pdf",
"scriptType": "TYPEWRITER",
"language": "de",
}) as response:
assert response.status_code == 200
async for _ in response.aiter_lines():
pass
sum_seconds, count = _histogram_count_sum(
fresh_metrics.ocr_processing_seconds, engine="surya"
)
assert count == 2.0
assert sum_seconds >= 0.0
@pytest.mark.asyncio
async def test_ocr_training_runs_total_incremented_with_recognition_success_label(fresh_metrics):
"""/train success increments ocr_training_runs_total{kind=recognition, outcome=success}."""
async def fake_to_thread(func, *args, **kwargs):
return _fake_training_result()
with patch("main.TRAINING_TOKEN", "secret-token"), \
patch("main._models_ready", True), \
patch("main.asyncio.to_thread", side_effect=fake_to_thread):
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
response = await client.post(
"/train",
files={"file": ("training.zip", _minimal_zip(), "application/zip")},
headers={"X-Training-Token": "secret-token"},
)
assert response.status_code == 200
assert fresh_metrics.ocr_training_runs_total.labels(
kind="recognition", outcome="success"
)._value.get() == 1.0
@pytest.mark.asyncio
async def test_ocr_training_runs_total_incremented_with_recognition_error_label(fresh_metrics):
"""When ketos exits non-zero, the error counter bumps and the exception propagates.
Uses the narrowest available seam — `subprocess.run` returning a failing
CompletedProcess — instead of stubbing the asyncio.to_thread boundary,
so the test exercises the real _run_training error path.
"""
from subprocess import CompletedProcess
failing_proc = CompletedProcess(
args=["ketos"], returncode=1, stdout="", stderr="synthetic ketos failure"
)
with patch("main.TRAINING_TOKEN", "secret-token"), \
patch("main._models_ready", True), \
patch("main.subprocess.run", return_value=failing_proc):
transport = ASGITransport(app=app, raise_app_exceptions=False)
async with AsyncClient(transport=transport, base_url="http://test") as client:
response = await client.post(
"/train",
files={"file": ("training.zip", _minimal_zip(), "application/zip")},
headers={"X-Training-Token": "secret-token"},
)
assert response.status_code == 500
assert fresh_metrics.ocr_training_runs_total.labels(
kind="recognition", outcome="error"
)._value.get() == 1.0
@pytest.mark.asyncio
async def test_ocr_training_runs_total_incremented_with_segmentation_success_label(fresh_metrics):
"""/segtrain success increments ocr_training_runs_total{kind=segmentation, outcome=success}."""
async def fake_to_thread(func, *args, **kwargs):
return _fake_training_result(accuracy=0.83)
with patch("main.TRAINING_TOKEN", "secret-token"), \
patch("main._models_ready", True), \
patch("main.asyncio.to_thread", side_effect=fake_to_thread):
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
response = await client.post(
"/segtrain",
files={"file": ("training.zip", _minimal_zip(), "application/zip")},
headers={"X-Training-Token": "secret-token"},
)
assert response.status_code == 200
assert fresh_metrics.ocr_training_runs_total.labels(
kind="segmentation", outcome="success"
)._value.get() == 1.0
@pytest.mark.asyncio
async def test_ocr_training_runs_total_incremented_with_recognition_success_label_for_train_sender(fresh_metrics):
"""/train-sender success increments ocr_training_runs_total{kind=recognition, outcome=success}."""
async def fake_to_thread(func, *args, **kwargs):
return _fake_training_result()
with patch("main.TRAINING_TOKEN", "secret-token"), \
patch("main._models_ready", True), \
patch("main.asyncio.to_thread", side_effect=fake_to_thread):
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
response = await client.post(
"/train-sender",
files={"file": ("training.zip", _minimal_zip(), "application/zip")},
data={"output_model_path": "/app/models/sender_test.mlmodel"},
headers={"X-Training-Token": "secret-token"},
)
assert response.status_code == 200, response.text
assert fresh_metrics.ocr_training_runs_total.labels(
kind="recognition", outcome="success"
)._value.get() == 1.0
@pytest.mark.asyncio
async def test_ocr_model_accuracy_gauge_stays_default_when_training_returns_no_accuracy(fresh_metrics):
"""When the runner returns accuracy=None, ocr_model_accuracy must remain at its default 0."""
async def fake_to_thread(func, *args, **kwargs):
return {"loss": None, "accuracy": None, "cer": None, "epochs": 5}
with patch("main.TRAINING_TOKEN", "secret-token"), \
patch("main._models_ready", True), \
patch("main.asyncio.to_thread", side_effect=fake_to_thread):
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
response = await client.post(
"/train",
files={"file": ("training.zip", _minimal_zip(), "application/zip")},
headers={"X-Training-Token": "secret-token"},
)
assert response.status_code == 200
# Gauge was never .set() — accessing the label child still creates it with default 0.0.
assert fresh_metrics.ocr_model_accuracy.labels(
kind="recognition"
)._value.get() == 0.0
@pytest.mark.asyncio
async def test_ocr_model_accuracy_gauge_set_per_kind_after_successful_training(fresh_metrics):
"""After /train and /segtrain succeed, ocr_model_accuracy{kind=...} reflects the result."""
recognition_accuracy = 0.917
segmentation_accuracy = 0.834
async def fake_recognition_to_thread(func, *args, **kwargs):
return _fake_training_result(accuracy=recognition_accuracy)
async def fake_segmentation_to_thread(func, *args, **kwargs):
return _fake_training_result(accuracy=segmentation_accuracy)
with patch("main.TRAINING_TOKEN", "secret-token"), \
patch("main._models_ready", True):
async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
with patch("main.asyncio.to_thread", side_effect=fake_recognition_to_thread):
rec_resp = await client.post(
"/train",
files={"file": ("training.zip", _minimal_zip(), "application/zip")},
headers={"X-Training-Token": "secret-token"},
)
assert rec_resp.status_code == 200
with patch("main.asyncio.to_thread", side_effect=fake_segmentation_to_thread):
seg_resp = await client.post(
"/segtrain",
files={"file": ("training.zip", _minimal_zip(), "application/zip")},
headers={"X-Training-Token": "secret-token"},
)
assert seg_resp.status_code == 200
assert fresh_metrics.ocr_model_accuracy.labels(kind="recognition")._value.get() == pytest.approx(recognition_accuracy)
assert fresh_metrics.ocr_model_accuracy.labels(kind="segmentation")._value.get() == pytest.approx(segmentation_accuracy)
def test_ocr_models_ready_gauge_defaults_to_zero():
"""A freshly-built OcrMetrics has ocr_models_ready=0 before lifespan runs."""
metrics = build_metrics(CollectorRegistry())
assert metrics.ocr_models_ready._value.get() == 0.0
@pytest.mark.asyncio
async def test_ocr_models_ready_gauge_is_one_after_lifespan_startup(fresh_metrics):
"""The lifespan flips ocr_models_ready to 1 once load_models / load_spell_checker return.
ASGITransport does not run lifespan by default, so the lifespan context
manager is driven directly to exercise the startup code path.
"""
assert fresh_metrics.ocr_models_ready._value.get() == 0.0
with patch("main.kraken_engine.load_models"), \
patch("main.load_spell_checker"):
async with app.router.lifespan_context(app):
assert fresh_metrics.ocr_models_ready._value.get() == 1.0
@pytest.mark.asyncio
async def test_ocr_processing_seconds_histogram_observed_per_page_in_guided_stream(fresh_metrics):
"""The guided streaming generator observes ocr_processing_seconds once per page."""
mock_images = [Image.new("RGB", (100, 100)) for _ in range(2)]
regions = [
{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 0.5, "height": 0.5, "annotationId": "a1"},
{"pageNumber": 2, "x": 0.0, "y": 0.0, "width": 1.0, "height": 1.0, "annotationId": "a2"},
]
with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
patch("main.preprocess_page", side_effect=lambda img: img), \
patch("main.surya_engine.extract_region_text", return_value="text"):
async with ocr_client() as client:
async with client.stream("POST", "/ocr/stream", json={
"pdfUrl": "http://minio/doc.pdf",
"scriptType": "TYPEWRITER",
"language": "de",
"regions": regions,
}) as response:
assert response.status_code == 200
async for _ in response.aiter_lines():
pass
sum_seconds, count = _histogram_count_sum(
fresh_metrics.ocr_processing_seconds, engine="surya"
)
assert count == 2.0
assert sum_seconds >= 0.0
@pytest.mark.asyncio
async def test_ocr_processing_seconds_histogram_excludes_spell_check_time_in_guided_stream(fresh_metrics):
"""The guided observation must time engine work only, not the spell-check pass.
Wall-clock bound rather than a structural `patch("main.time.monotonic")`:
the patched attribute is the *global* `time.monotonic`, which httpx and
asyncio also consume — they exhaust the deterministic sequence before the
request reaches the engine loop. Bound is sized against the failure mode,
not the noise floor: spell-check sleeps 0.05s × 2 regions = 0.1s, so a
timer that accidentally wrapped `correct_text` would observe >= 0.1s. The
0.09s ceiling catches that bug while leaving ~90ms of slack for slow CI
runners (engine work is instantaneous under the mock).
"""
mock_images = [Image.new("RGB", (100, 100))]
regions = [
{"pageNumber": 1, "x": 0.0, "y": 0.0, "width": 0.5, "height": 0.5, "annotationId": "a1"},
{"pageNumber": 1, "x": 0.5, "y": 0.0, "width": 0.5, "height": 0.5, "annotationId": "a2"},
]
def slow_correct(text):
import time as _time
_time.sleep(0.05)
return text
with patch("main._download_and_convert_pdf", new_callable=AsyncMock, return_value=mock_images), \
patch("main.preprocess_page", side_effect=lambda img: img), \
patch("main.kraken_engine.is_available", return_value=True), \
patch("main.kraken_engine.extract_region_text", return_value="text"), \
patch("main.correct_text", side_effect=slow_correct):
async with ocr_client() as client:
async with client.stream("POST", "/ocr/stream", json={
"pdfUrl": "http://minio/doc.pdf",
"scriptType": "HANDWRITING_KURRENT",
"language": "de",
"regions": regions,
}) as response:
assert response.status_code == 200
async for _ in response.aiter_lines():
pass
sum_seconds, _ = _histogram_count_sum(
fresh_metrics.ocr_processing_seconds, engine="kraken"
)
assert sum_seconds < 0.09, f"timing must exclude spell-check; got sum={sum_seconds}"
@pytest.mark.asyncio
async def test_ocr_jobs_total_not_incremented_when_pdf_download_fails_in_stream(fresh_metrics):
"""If `_download_and_convert_pdf` raises, ocr_jobs_total is NOT incremented.
Mirrors the /ocr endpoint's semantics: the counter only records jobs that
actually started OCR work, not failed downloads.
"""
async def fail_download(url):
raise RuntimeError("synthetic download failure")
with patch("main._download_and_convert_pdf", new=fail_download):
async with ocr_client(raise_app_exceptions=False) as client:
response = await client.post("/ocr/stream", json={
"pdfUrl": "http://minio/doc.pdf",
"scriptType": "TYPEWRITER",
"language": "de",
})
assert response.status_code == 500
assert fresh_metrics.ocr_jobs_total.labels(
engine="surya", script_type="TYPEWRITER"
)._value.get() == 0.0
def test_uvicorn_access_log_filter_fails_open_on_short_or_missing_args():
"""The filter must default-allow records when args is None or shorter than expected.
Locks in fail-open behavior: if uvicorn ever changes its format we keep
forwarding records to the handler rather than silently dropping logs.
"""
import logging as _logging
from main import MetricsPathFilter
filt = MetricsPathFilter()
none_record = _logging.LogRecord(
name="uvicorn.access", level=_logging.INFO, pathname="", lineno=0,
msg="some message", args=None, exc_info=None,
)
short_record = _logging.LogRecord(
name="uvicorn.access", level=_logging.INFO, pathname="", lineno=0,
msg="%s %s", args=("a", "b"), exc_info=None,
)
assert filt.filter(none_record) is True
assert filt.filter(short_record) is True
def test_uvicorn_access_log_filter_skips_metrics_path():
"""The MetricsPathFilter drops uvicorn.access log records that target /metrics."""
import logging as _logging
from main import MetricsPathFilter
filt = MetricsPathFilter()
metrics_record = _logging.LogRecord(
name="uvicorn.access", level=_logging.INFO, pathname="", lineno=0,
msg='%s - "%s %s HTTP/%s" %d',
args=("127.0.0.1:1234", "GET", "/metrics", "1.1", 200),
exc_info=None,
)
health_record = _logging.LogRecord(
name="uvicorn.access", level=_logging.INFO, pathname="", lineno=0,
msg='%s - "%s %s HTTP/%s" %d',
args=("127.0.0.1:1234", "GET", "/health", "1.1", 200),
exc_info=None,
)
ocr_record = _logging.LogRecord(
name="uvicorn.access", level=_logging.INFO, pathname="", lineno=0,
msg='%s - "%s %s HTTP/%s" %d',
args=("127.0.0.1:1234", "POST", "/ocr", "1.1", 200),
exc_info=None,
)
assert filt.filter(metrics_record) is False
assert filt.filter(health_record) is False
assert filt.filter(ocr_record) is True