familienarchiv/ocr-service/Dockerfile
Marcel e8375d6c72 fix(ocr-service): add entrypoint that validates blla model format on startup
Adds ensure_blla_model.py which loads the blla segmentation model with
ketos on every container start. If the model is missing or in the legacy
PyTorch ZIP format (incompatible with ketos 7), it re-downloads the
correct CoreML protobuf model from Zenodo (DOI 10.5281/zenodo.14602569).
The Dockerfile now uses entrypoint.sh which runs this check before
starting uvicorn.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
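
The startup check described in the commit message could look roughly like the sketch below. It is not the contents of ensure_blla_model.py. The commit says the real script loads the model with ketos. This sketch swaps in a simpler heuristic, assuming the legacy PyTorch format is a ZIP container that the standard library's `zipfile.is_zipfile` can detect. The names `needs_redownload`, `ensure_model`, and the `fetch` callback are hypothetical, and the download step is left to the caller because the exact Zenodo file URL is not given here.

```python
import zipfile
from pathlib import Path
from typing import Callable


def needs_redownload(model_path: Path) -> bool:
    """Return True if the model is missing or looks like a legacy ZIP archive."""
    if not model_path.is_file():
        return True
    # Assumption: the legacy PyTorch format is a ZIP container, so a ZIP
    # signature is treated as "wrong format". This is a heuristic and does
    # not replace actually loading the model.
    return zipfile.is_zipfile(model_path)


def ensure_model(model_path: Path, fetch: Callable[[Path], None]) -> bool:
    """Call fetch(model_path) when needed; return True if a download happened."""
    if needs_redownload(model_path):
        # Make sure the target directory exists before the caller writes to it.
        model_path.parent.mkdir(parents=True, exist_ok=True)
        fetch(model_path)
        return True
    return False
```

An entrypoint script would run such a check once and only then start uvicorn, so a stale model is replaced before the service accepts requests.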

FROM python:3.11-slim
WORKDIR /app
# curl for healthcheck; libgomp1 for PyTorch CPU threading; libvips for kraken PDF support
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl \
        libgomp1 \
        libvips42 \
    && rm -rf /var/lib/apt/lists/*
# PyTorch CPU-only — separate layer; the whl/cpu index strips all CUDA variants (~2 GB saved)
# torchvision must also come from the CPU index to match torch's operator registrations
RUN pip install --no-cache-dir \
        torch==2.7.1 \
        torchvision==0.22.1 \
        --index-url https://download.pytorch.org/whl/cpu
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN chmod +x /app/entrypoint.sh
EXPOSE 8000
CMD ["/app/entrypoint.sh"]