familienarchiv

Author	SHA1	Message	Date
Marcel	525f091b3a	feat(ocr): suppress uvicorn access logs for /metrics and /health Adds a logging.Filter on uvicorn.access that drops records whose request path is /metrics or /health. Each is hit on a tight schedule (Prometheus scrape interval and Docker healthcheck), so unfiltered they dominate the access log without carrying any information about real traffic. Refs #652 (Nora's recommendation) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:16:14 +02:00
Marcel	d6abf990c7	feat(ocr): flip ocr_models_ready to 1 once the lifespan startup finishes Mirrors the existing _models_ready bool so Prometheus has a time-series liveness/readiness signal for future alerting rules (e.g. ocr_models_ready < 1 for 2m). Refs #652 (AC7) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:15:11 +02:00
Marcel	77d59c5d83	test(ocr): assert ocr_model_accuracy gauge is set per kind on success Hits /train then /segtrain through the same test, each with a distinct mocked accuracy, and asserts the labelled gauges reflect the two values. Locks down the kind-label separation between recognition and segmentation accuracy (decision #2). Refs #652 (AC6) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:13:05 +02:00
Marcel	6c2b9af10b	feat(ocr): record training runs in ocr_training_runs_total per kind and outcome Wraps the await asyncio.to_thread(_run_*) calls in /train, /train-sender, and /segtrain with try/except. Recognition training (/train, /train-sender) shares kind="recognition"; /segtrain uses kind="segmentation". The ocr_model_accuracy gauge is set per kind on success. Refs #652 (AC6, decision #2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:12:26 +02:00
Marcel	2e3744d9ef	feat(ocr): observe ocr_processing_seconds around engine.to_thread calls Wraps every asyncio.to_thread(engine.extract_*) call with time.monotonic() deltas in /ocr (per document) and in both /ocr/stream generators (per page). Streaming buckets are the useful operational signal; the non-streaming observation is a bonus. Refs #652 (AC5) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:09:25 +02:00
Marcel	131ed336bc	feat(ocr): count words and illegible words at the OCR call sites Walks block["words"] before apply_confidence_markers strips the list, then increments ocr_words_total by len(words) and ocr_illegible_words_total by the count below threshold. Same pattern in both /ocr and /ocr/stream so the ratio illegible/words is a faithful quality signal across endpoints. Refs #652 (AC4) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:07:59 +02:00
Marcel	3fa3460dbf	feat(ocr): increment ocr_skipped_pages_total on per-page engine failure Bumps the counter in both /ocr/stream except blocks (standard and guided generators) so the existing skipped_pages local variable now also flows into Prometheus. Refs #652 (AC3b) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:06:50 +02:00
Marcel	79edb94558	feat(ocr): increment ocr_pages_total per successful page in stream Bumps the counter inside both the standard and guided /ocr/stream generators after a page yields its blocks, before the per-page json line is emitted. Also moves the ocr_jobs_total increment for /ocr/stream right after engine selection so the counter still fires when a page later errors out. Refs #652 (AC3a) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:05:36 +02:00
Marcel	52d8dc2b20	test(ocr): assert ocr_jobs_total label is engine=surya for typewriter Locks down AC2 for the non-Kurrent path. The same code branch in /ocr that sets engine_name from script_type now has explicit coverage for both HANDWRITING_KURRENT → kraken and TYPEWRITER → surya. Refs #652 (AC2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:04:20 +02:00
Marcel	696b71da5a	feat(ocr): increment ocr_jobs_total with engine and script_type labels Pick engine="kraken" for HANDWRITING_KURRENT, engine="surya" otherwise, then increment after the blocks have been extracted. Refs #652 (AC2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:03:37 +02:00
Marcel	f3e3545d06	feat(ocr): add metrics.py factory with test-scoped CollectorRegistry support Encapsulates every custom OCR metric in an OcrMetrics frozen dataclass and exposes a `build_metrics(registry)` factory. Production main.py binds against the default REGISTRY; tests construct a fresh CollectorRegistry per case and monkeypatch main.metrics, so counter values stay isolated between tests (decision #3 on issue #652, Option A). Refs #652 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:02:20 +02:00
Marcel	4bb6685edb	test(ocr): assert http_* metrics appear after an /ocr request Locks down AC1: prometheus-fastapi-instrumentator must keep auto-exposing http_requests_total and http_request_duration_seconds for application traffic, not just register the /metrics endpoint. Refs #652 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 16:00:33 +02:00
Marcel	18c93d4eaa	feat(ocr): expose /metrics endpoint via prometheus-fastapi-instrumentator Mount the instrumentator immediately after FastAPI app creation, excluding /health and /metrics from request metrics to keep http_requests_total focused on real application traffic. Refs #652 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 15:59:37 +02:00
Marcel	6839cf2a33	docs(ocr): clarify entrypoint comment and add manual run hint for skipped test - entrypoint.sh: replace "cross-job ground-truth leakage" with plain "Remove stale partial downloads left by a previous docker-kill" - test_tmpdir_is_inside_persistent_cache_volume: add docker exec command so future developers know how to run this deployment-contract test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 11:20:45 +02:00
Marcel	775b5c062e	test(ocr): add orphan cleanup behavior tests for entrypoint.sh find -mtime test_entrypoint_removes_day_old_orphans and test_entrypoint_preserves_fresh_files verify the find -mtime +1 -delete logic using os.utime() to fabricate old mtimes without mocking system time. Also extracts _run_entrypoint helper to remove subprocess setup duplication. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 11:19:33 +02:00
Marcel	e31dac5c9c	test(ocr): assert entrypoint.sh exit code in test_entrypoint_creates_tmpdir A silent non-zero exit would previously cause the test to pass incorrectly because only directory creation was checked. Exit code is now the first assertion, catching regressions before the filesystem check runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 11:18:14 +02:00
Marcel	c2bd1b34f0	refactor(ocr): extract _validate_zip_entry to utils.py so ZIP Slip test runs in CI _validate_zip_entry has no ML-stack dependency; importing it via main.py pulled in surya/torch and caused the test to be skipped in CI. Moving it to utils.py (fastapi only) and adding fastapi to the CI lightweight install lets test_zipslip_still_anchors_under_custom_tmpdir run on every push. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 11:17:15 +02:00
Marcel	cfd49ff69e	docs(ocr): document TMPDIR convention and add ADR-021 - ocr-service/README.md: add HF_HOME, XDG_CACHE_HOME, TORCH_HOME, TMPDIR rows to the environment variables table - ocr-service/CLAUDE.md: LLM reminder: TMPDIR must stay on the cache volume - docs/adr/021-tmpdir-persistent-volume-staging.md: records the decision, trade-offs, and rejected alternatives (Approach B / C) for issue #614 - ci.yml: add test_tmpdir.py to the OCR CI run (stdlib-only tests, no ML stack) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 10:58:10 +02:00
Marcel	240b373f68	fix(ocr): create TMPDIR on startup and clear day-old orphans On a fresh ocr_cache volume /app/cache/.tmp does not exist yet. The mkdir ensures the first Surya model download can proceed without ENOSPC on the 512 MB /tmp tmpfs. The find cleanup removes fragments left by docker-kill mid-download, preventing cross-job ground-truth leakage. Fixes #614. See ADR-021. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 10:54:17 +02:00
Marcel	09a043431e	build(ocr): set ENV TMPDIR=/app/cache/.tmp so docker run uses SSD staging Without this, running the image outside compose loses the TMPDIR redirect and Surya model downloads fall back to the 512 MB /tmp tmpfs (ENOSPC). See issue #614, ADR-021. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 10:53:15 +02:00
Marcel	bead6f1811	fix(ocr): handle empty-string HTRMOPO_DIR env var with or-fallback os.environ.get(key, default) returns "" when the key exists but is blank — the default is only used when the key is absent. The or-fallback treats both absence and blank values as "use the default". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 18:53:26 +02:00
Marcel	fc8b4b164b	security(ocr): redirect XDG cache and Torch home away from read-only HOME Prevents PyTorch/Matplotlib/Ketos from writing to /home/ocr which is on the read-only container filesystem — fixes Nora's blocker. Also restores the explanatory comment on the ocr_cache volume mount. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 17:30:39 +02:00
Marcel	eb63df2000	test(ocr): add startup root canary tests for main.py lifespan Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 17:29:47 +02:00
Marcel	53bd574660	test(ocr): replace vacuous startswith assertion with equality check Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 17:26:58 +02:00
Marcel	581ba01d8d	security(ocr): log warning on startup when running as root Adds a canary log line if os.getuid() == 0. Produces an observable signal in container logs if the USER directive is ever removed from the Dockerfile, without requiring an external audit tool. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 16:51:00 +02:00
Marcel	9db42d6cc1	fix(ocr): resolve HTRMOPO_DIR from env var, not ~ expansion With --no-create-home, os.path.expanduser("~") resolves to "/" causing kraken get to write to /.local/share/htrmopo. Replace with os.environ.get("HTRMOPO_DIR", "/app/models/.htrmopo") so the path is explicit and override-friendly without a home directory. Adds two tests verifying env-var resolution and ~-free default. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 16:49:21 +02:00
Marcel	1aca4c4a41	security(ocr): add non-root user and set HOME/HF_HOME in Dockerfile CIS Docker §4.1: run uvicorn as UID 1000 (ocr) instead of root. Creates /home/ocr and /app/cache with correct ownership so named volumes inherit ocr:ocr on first Docker mount. Sets HOME and HF_HOME so ~ expansion and Hugging Face caching resolve under /app, not /root. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 16:46:25 +02:00
Marcel	50b18f0849	docs(legibility): fix three review blockers in DOC-7 - docs/README.md: remove duplicate infrastructure/ entry at end of folder tree - ocr-service/CLAUDE.md: add LLM reminder: prefix to ALLOWED_PDF_HOSTS SSRF warning (consistent with all other machine-readable instructions) - backend/CLAUDE.md: restore ResponseStatusException note for simple controller validation, which avoids LLMs reaching for DomainException for trivial checks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 07:41:02 +02:00
Marcel	86c13a230c	docs(legibility): migrate CLAUDE.md rules into human docs (DOC-7) Processes all 7 CLAUDE.md files according to the 3-bucket classification. Migration targets (CONTRIBUTING.md, docs/ARCHITECTURE.md, docs/DEPLOYMENT.md, domain READMEs) are introduced by DOC-2/4/5/6; this PR must merge last. ### scripts/CLAUDE.md → scripts/README.md New `scripts/README.md` with full script documentation (preserving the ⚠️ destructive-operation warning on reset-db.sh). `scripts/CLAUDE.md` reduced to a pointer + "document new scripts in README.md" reminder. ### .devcontainer/CLAUDE.md → .devcontainer/README.md New `.devcontainer/README.md` with all configuration, usage, and limitations. `.devcontainer/CLAUDE.md` reduced to a single pointer line. ### docs/CLAUDE.md → docs/README.md New `docs/README.md` covering the folder structure, ADR guide, infrastructure docs, and specs folder. `docs/CLAUDE.md` reduced to pointer + ADR reminder. ### ocr-service/CLAUDE.md Reduced to pointer to `ocr-service/README.md` (content migrated in DOC-6). Kept LLM reminders: single-node constraint, ALLOWED_PDF_HOSTS SSRF risk. ### backend/CLAUDE.md - Layering Rules → pointer to docs/ARCHITECTURE.md - Error Handling → pointer to CONTRIBUTING.md + reminder - Security/Permissions → pointer to docs/ARCHITECTURE.md + reminder - Package Structure → tagged TODO post-REFACTOR-1 - Fixed errors.ts path to frontend/src/lib/shared/errors.ts - Added ANNOTATE_ALL + BLOG_WRITE to permission list - Key Entities, Entity Code Style, Services → kept (Bucket-2) ### root CLAUDE.md - Stack, Infrastructure, Dev Container → pointers - Layering Rules, Error Handling, Security, OpenAPI, API Client, Date Handling, UI Components, Frontend Error Handling → pointers + reminders - Package Structure → tagged TODO post-REFACTOR-1 - Domain Model, Entity Code Style, Form Actions, Styling → kept (Bucket-2) ### frontend/CLAUDE.md - API Client Pattern, Date Handling → pointers + reminders - Key UI Components → pointer to domain READMEs - Styling, Form Actions, How to Run, Vite Proxy, i18n → kept (Bucket-2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 07:41:02 +02:00
Marcel	a1b89670c0	docs(legibility): add 18 per-domain README.md files (DOC-6) Backend (9): document, person, tag, user, geschichte, notification, ocr, audit, dashboard. Frontend (8): document, person, tag, user, geschichte, notification, ocr, shared. OCR service (1): ocr-service/README.md. Each README covers: what the domain owns, explicit non-ownership, public surface (verified by grep against the codebase), internal layout, and cross-domain dependencies. Closes #400 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 07:36:38 +02:00
Marcel	e85057bed2	refactor(document): move document domain core to document/ package Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 12:39:20 +02:00
Marcel	23cf88856e	fix(ocr): guard Kraken block extraction against missing boundary/baseline extract_page_blocks() walked `record.boundary` and `record.baseline` unconditionally, so a record that arrived without either (malformed kraken output, or a MagicMock in tests that iterates to nothing) crashed with "min() arg is an empty sequence". Coerce both attributes through list(), require at least 3 points for the polygon path, fall back to the baseline path when the polygon is missing, and skip the record entirely when neither is usable; emitting no block is safer than emitting one with garbage coordinates. The test helper now sets `boundary` and `baseline` explicitly to mirror real Kraken 7.0 records (and so the happy-path test exercises the polygon branch). A new regression test covers the skip path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-23 09:33:03 +02:00
Marcel	1f7b712dd0	fix(ocr): accept sender_model_path in Surya engine so non-Kurrent OCR works main.py unifies the call to both engines and always passes `sender_model_path` (None for non-Kurrent scripts). Surya's extract_region_text / extract_page_blocks accepted one fewer positional arg than Kraken's, so every guided-OCR run on a TYPEWRITER or HANDWRITING_LATIN document raised "takes 5 positional arguments but 6 were given" and the stream returned 0 blocks / 1 skipped page. Add an ignored `sender_model_path` kwarg to both Surya functions so the signatures match Kraken's, and guard the regression with two signature tests in test_engines.py that compare both engines' parameter lists. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-23 09:28:25 +02:00
Marcel	64a854aad6	refactor(ocr): mark _SenderModelRegistry.contains as private (_contains) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 21:26:46 +02:00
Marcel	84c09e41ef	test(ocr): add /train-sender auth tests and run sender registry tests in CI Add 503/403 auth tests for the /train-sender endpoint, matching the pattern already used for /train and /segtrain. Also surface test_sender_registry.py in CI (it needs no ML stack) and add pytest-asyncio to the install step. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 21:14:27 +02:00
Marcel	000079fd50	refactor(ocr): rename _contains to contains in SenderModelRegistry Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 20:53:16 +02:00
Marcel	07035b9fa9	style(ocr): add Image type hints to extract_page_blocks and extract_region_text Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 20:22:34 +02:00
Marcel	eab37b9ac9	test(ocr): verify load failure does not cache broken entry in SenderModelRegistry Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 20:19:40 +02:00
Marcel	64d27d6d61	feat(ocr): per-sender model registry and /train-sender endpoint engines/kraken.py: - Add _SenderModelRegistry with LRU eviction (max configurable via OCR_MAX_CACHED_MODELS env var), double-checked locking, invalidate(), and path whitelist (/app/models/ only) - Add _load_sender_model() helper for testability - extract_page_blocks() and extract_region_text() accept optional sender_model_path; route to sender registry when provided models.py: - OcrRequest gains senderModelPath: str \| None = None field main.py: - /ocr and /ocr/stream pass request.senderModelPath to Kraken engine - New /train-sender endpoint: validates output_model_path, runs ketos train with base model as starting point, invalidates sender cache docker-compose.yml: - Add OCR_MAX_CACHED_MODELS: "5" to ocr-service environment test_sender_registry.py: - 4 tests: cache hit, LRU eviction, invalidate, path traversal guard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 18:05:39 +02:00
Marcel	c5e6ed922b	test(ocr): decouple correction tests from exact library dictionary state Replace exact-string assertions in test_correctable_ocr_error_gets_corrected and test_sentence_with_multiple_corrections with structural assertions that verify behavior (correction attempted, marker present, expected stem) without coupling to a specific pyspellchecker version's frequency weights. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 17:23:09 +02:00
Marcel	ec85f228c1	refactor(ocr): document > 50 frequency threshold rationale Strict greater-than avoids non-determinism: if multiple candidates share the minimum frequency value, pyspellchecker's ranking is undefined. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 17:21:37 +02:00
Marcel	fea24aee25	refactor(ocr): make collapse_adjacent_markers a public function Drop underscore prefix — the helper is part of confidence.py's effective public API since spell_check.py imports and calls it directly. Fixes reviewer concern: importing a _-prefixed name across module boundaries contradicts Python's private-by-convention signal. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 17:20:31 +02:00
Marcel	77100ab1e6	feat(ocr): integrate spell-check post-processing for handwriting script types Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 16:54:17 +02:00
Marcel	092131930c	feat(ocr): add spell_check module with German spellchecker and historical wordlist Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 16:52:50 +02:00
Marcel	47f9a0bf73	test(ocr): add failing tests for spell_check module Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 16:51:38 +02:00
Marcel	30a6cbeb7f	feat(ocr): add DTA-derived historical German wordlist and generation script 153K words from dtak+dtae 1800-1899 corpora (min_freq=20), covering pre-reform spellings common in Kurrent/Sütterlin documents. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 16:48:26 +02:00
Marcel	6faaa3b7d6	feat(ocr): add pyspellchecker dependency Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 16:41:24 +02:00
Marcel	77747aa556	refactor(ocr): extract _collapse_adjacent_markers helper and add CORRECTION_MARKER Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 16:40:39 +02:00
Marcel	4cb7c975f5	test(ocr): add resilience tests for tiny image and unexpected exception propagation Add test for 1×1 image (sub-tile-size) resilience and narrow preprocess_page fallback from except Exception to (cv2.error, ValueError, MemoryError) so programming errors propagate instead of being silently swallowed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 15:16:17 +02:00
Marcel	b310caaeeb	feat(ocr): integrate preprocessing into stream and batch endpoints Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 14:16:47 +02:00
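
Commit 525f091b3a describes a logging.Filter on uvicorn.access that drops records for /metrics and /health. The following is a minimal sketch of that idea, not the code from the commit. The names QuietPathFilter, QUIET_PATHS and install_quiet_path_filter are hypothetical, and the sketch assumes that uvicorn's access logger passes its record arguments as a tuple of (client_addr, method, full_path, http_version, status_code).

```python
import logging

# Paths polled on a fixed schedule (Prometheus scrape, Docker healthcheck).
QUIET_PATHS = frozenset({"/metrics", "/health"})


class QuietPathFilter(logging.Filter):
    """Drop access-log records for QUIET_PATHS (hypothetical name)."""

    def filter(self, record: logging.LogRecord) -> bool:
        args = record.args
        # Assumption: args is (client_addr, method, full_path, http_version, status_code).
        if isinstance(args, tuple) and len(args) >= 3:
            # Ignore any query string when comparing the path.
            path = str(args[2]).split("?", 1)[0]
            return path not in QUIET_PATHS
        # Unknown record shape: keep it rather than hide real traffic.
        return True


def install_quiet_path_filter() -> None:
    """Attach the filter to uvicorn's access logger (hypothetical helper)."""
    logging.getLogger("uvicorn.access").addFilter(QuietPathFilter())
```

Records with an unexpected argument shape are kept on purpose, so a change in the logger's format would produce noisy logs rather than silently dropped ones.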

91 Commits
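
Commit bead6f1811 notes that os.environ.get(key, default) returns "" when the variable exists but is blank, and switches to an or-fallback. A small sketch of the difference follows. The helper name resolve_htrmopo_dir and the environ parameter are hypothetical and exist only to make the behavior easy to check; the default path is the one quoted in commit 9db42d6cc1.

```python
import os

# Default quoted in commit 9db42d6cc1.
DEFAULT_HTRMOPO_DIR = "/app/models/.htrmopo"


def resolve_htrmopo_dir(environ=os.environ) -> str:
    """Return HTRMOPO_DIR, treating an absent or blank value as 'use the default'."""
    # get(key, default) only applies the default when the key is absent;
    # `or` also covers a key that is present but set to "".
    return environ.get("HTRMOPO_DIR") or DEFAULT_HTRMOPO_DIR


# A blank value slips past the two-argument form of get():
blank_env = {"HTRMOPO_DIR": ""}
naive = blank_env.get("HTRMOPO_DIR", DEFAULT_HTRMOPO_DIR)  # "" (default not used)
robust = resolve_htrmopo_dir(blank_env)  # falls back to DEFAULT_HTRMOPO_DIR
```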