feat(ocr): add Python OCR microservice, RestClientOcrClient, Docker Compose
Python microservice (ocr-service/):
- FastAPI app with /ocr and /health endpoints
- Surya engine: transformer-based OCR for typewritten/modern handwriting
- Kraken engine: historical HTR for Kurrent/Suetterlin with pure-Python
  polygon-to-quad approximation (gift wrapping + rotating calipers)
- Eager model loading at startup via lifespan context manager
- PDF download via httpx, page rendering via pypdfium2 at 300 DPI

Java RestClientOcrClient:
- Implements OcrClient + OcrHealthClient interfaces
- Calls Python service via Spring RestClient
- Health check with graceful fallback

Docker Compose:
- New ocr-service container (mem_limit 6g, no host ports)
- Health check with start_period 60s for model loading
- ocr_models volume for Kraken model files
- Backend depends on ocr-service health

Refs #226, #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
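The polygon-to-quad step named in the Kraken bullet is not part of the file shown in this diff, so the following is only a rough pure-Python sketch of the general idea, not the commit's implementation. It builds a convex hull by gift wrapping and then picks the smallest enclosing rectangle by testing each hull edge as a candidate side (a simple per-edge projection variant of rotating calipers, quadratic in the hull size). The function names `convex_hull` and `min_area_quad` are hypothetical.

```python
import math


def _cross(o, a, b):
    """Z component of (a - o) x (b - o); negative means b lies to the right of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def convex_hull(points):
    """Gift wrapping (Jarvis march) over the distinct input points."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:
        return pts
    hull, current = [], pts[0]  # the leftmost point is always on the hull
    while True:
        hull.append(current)
        candidate = pts[0] if pts[0] != current else pts[1]
        for p in pts:
            if p == current:
                continue
            c = _cross(current, candidate, p)
            # Wrap towards the right-most point; on ties keep the farthest one.
            if c < 0 or (c == 0 and _dist2(current, p) > _dist2(current, candidate)):
                candidate = p
        current = candidate
        if current == pts[0] or len(hull) > len(pts):  # closed, or safety guard
            return hull


def min_area_quad(points):
    """Four corners of the minimum-area rectangle enclosing the points."""
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")
    best_area, best_corners = None, None
    for i, (x0, y0) in enumerate(hull):
        x1, y1 = hull[(i + 1) % len(hull)]
        length = math.hypot(x1 - x0, y1 - y0)
        ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit vector along the edge
        vx, vy = -uy, ux                                  # unit normal to the edge
        us = [(px - x0) * ux + (py - y0) * uy for px, py in hull]
        vs = [(px - x0) * vx + (py - y0) * vy for px, py in hull]
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if best_area is None or area < best_area:
            best_area = area
            best_corners = [
                (x0 + u * ux + v * vx, y0 + u * uy + v * vy)
                for u, v in ((min(us), min(vs)), (max(us), min(vs)),
                             (max(us), max(vs)), (min(us), max(vs)))
            ]
    return best_corners
```

For a diamond-shaped polygon such as `[(1, 0), (2, 1), (1, 2), (0, 1)]`, the returned rectangle follows the diamond's edges instead of the larger axis-aligned bounding box, which is the reason to prefer a rotated quad for slanted handwriting lines.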
20
ocr-service/models.py
Normal file
@@ -0,0 +1,20 @@
from pydantic import BaseModel, Field


class OcrRequest(BaseModel):
    pdf_url: str = Field(..., alias="pdfUrl")
    script_type: str = Field("UNKNOWN", alias="scriptType")
    language: str = "de"


class OcrBlock(BaseModel):
    page_number: int = Field(..., alias="pageNumber")
    x: float
    y: float
    width: float
    height: float
    polygon: list[list[float]] | None = None
    text: str

    class Config:
        populate_by_name = True