test(ocr): add integration test for full streaming pipeline with a real image #258

Open
opened 2026-04-17 15:20:18 +02:00 by marcel · 0 comments
Owner

Background

Deferred during PR #255 review cycle 1 (Sara Holt — QA review).

Concern

The preprocessing pipeline is tested at unit level only. test_stream.py mocks the OCR engine and test_preprocessing.py tests preprocess_page in isolation. There is no test that spins up the FastAPI app and posts a real image through /ocr/stream to verify the full event sequence end-to-end.

"No integration test — the preprocessing pipeline is only tested at unit level. There's no test that spins up the FastAPI app and calls /stream with a real image to verify the full event sequence end-to-end." — @Sara

Suggested approach

Use httpx.AsyncClient + ASGITransport (already used in other tests) with a small real image fixture (PNG or PDF). The test should assert that:

  • A preprocessing event is emitted for each page
  • A page event follows each preprocessing event
  • A done event closes the stream

This would catch event-ordering bugs in generate() / generate_guided() that the unit tests cannot catch, because they exercise each stage in isolation.
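The three assertions above can be sketched as a small helper that works on the lines read from the response. This is only a sketch under assumptions: it assumes /ocr/stream uses the Server-Sent Events wire format with `event: <name>` lines, which this issue does not confirm, and `parse_event_names` and `check_event_order` are hypothetical names, not existing project code.

```python
from typing import Iterable, List


def parse_event_names(lines: Iterable[str]) -> List[str]:
    """Collect event names from lines of the form 'event: <name>'.

    Assumption: the endpoint emits Server-Sent Events. Adjust the parsing
    if the stream uses another framing (e.g. newline-delimited JSON).
    """
    names = []
    for line in lines:
        if line.startswith("event:"):
            names.append(line[len("event:"):].strip())
    return names


def check_event_order(names: List[str], expected_pages: int) -> None:
    """Raise AssertionError unless the stream has the expected shape.

    Expected shape: one preprocessing event per page, each followed by a
    page event, and exactly one done event at the very end. Event types
    other than these three are ignored.
    """
    assert names, "stream produced no events"
    assert names[-1] == "done", f"stream did not end with done: {names}"
    assert names.count("done") == 1, f"more than one done event: {names}"

    # Only the two per-page event types matter for the ordering check.
    per_page = [n for n in names[:-1] if n in ("preprocessing", "page")]
    expected = ["preprocessing", "page"] * expected_pages
    assert per_page == expected, f"unexpected event order: {per_page}"
```

In the actual test, the lines would presumably come from `response.aiter_lines()` inside `async with client.stream("POST", "/ocr/stream", ...)`, with the client built as `httpx.AsyncClient(transport=httpx.ASGITransport(app=app), base_url="http://test")`. The request body (for example, the multipart field name for the upload) depends on the endpoint's signature and is not specified here.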

Why deferred

The test requires a real PDF fixture in the test assets, and the OCR models must be available in the test environment (or be replaced by a lightweight mock that doesn't need patching). This is infrastructure work beyond the scope of PR #255.
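One way to reduce the fixture burden, sketched here as an option rather than a decision, is to generate a tiny valid PNG at test time with the standard library instead of committing a binary asset. `make_gray_png` is a hypothetical helper. Whether a blank page is enough for the OCR engine to emit `page` events is an assumption that would need checking, and a PDF fixture would still have to be added separately.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, type, data, CRC32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_gray_png(width: int, height: int, value: int = 255) -> bytes:
    """Return the bytes of a solid 8-bit grayscale PNG.

    value is the gray level for every pixel (255 = white).
    """
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with filter byte 0 (no filtering).
    row = b"\x00" + bytes([value]) * width
    idat = zlib.compress(row * height)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )
```

The resulting bytes can be passed directly as the upload payload, so the test needs no file on disk. A scanned page with actual text would still be the better fixture for checking OCR output rather than just event order.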

Reference

PR: http://heim-nas:3005/marcel/familienarchiv/pulls/255

marcel added the test label 2026-04-17 15:33:32 +02:00

Reference: marcel/familienarchiv#258