Files
familienarchiv/ocr-service/Dockerfile
Marcel 6737bd6db5 feat(ocr): add Python OCR microservice, RestClientOcrClient, Docker Compose
Python microservice (ocr-service/):
- FastAPI app with /ocr and /health endpoints
- Surya engine: transformer-based OCR for typewritten/modern handwriting
- Kraken engine: historical HTR for Kurrent/Suetterlin with
  pure-Python polygon-to-quad approximation (gift wrapping + rotating calipers)
- Eager model loading at startup via lifespan context manager
- PDF download via httpx, page rendering via pypdfium2 at 300 DPI

Java RestClientOcrClient:
- Implements OcrClient + OcrHealthClient interfaces
- Calls Python service via Spring RestClient
- Health check with graceful fallback

Docker Compose:
- New ocr-service container (mem_limit 6g, no host ports)
- Health check with start_period 60s for model loading
- ocr_models volume for Kraken model files
- Backend depends on ocr-service health

Refs #226, #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:26:40 +02:00

24 lines
593 B
Docker

FROM python:3.11-slim
WORKDIR /app
# curl for healthcheck; libgomp1 for PyTorch CPU threading
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
libgomp1 \
&& rm -rf /var/lib/apt/lists/*
# PyTorch CPU-only — separate layer; the whl/cpu index strips all CUDA variants (~2 GB saved)
RUN pip install --no-cache-dir \
torch==2.5.1 \
--index-url https://download.pytorch.org/whl/cpu
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]