feat(ocr): add Python OCR microservice, RestClientOcrClient, Docker Compose
Python microservice (ocr-service/):
- FastAPI app with /ocr and /health endpoints
- Surya engine: transformer-based OCR for typewritten/modern handwriting
- Kraken engine: historical HTR for Kurrent/Suetterlin with pure-Python
  polygon-to-quad approximation (gift wrapping + rotating calipers)
- Eager model loading at startup via lifespan context manager
- PDF download via httpx, page rendering via pypdfium2 at 300 DPI

Java RestClientOcrClient:
- Implements OcrClient + OcrHealthClient interfaces
- Calls Python service via Spring RestClient
- Health check with graceful fallback

Docker Compose:
- New ocr-service container (mem_limit 6g, no host ports)
- Health check with start_period 60s for model loading
- ocr_models volume for Kraken model files
- Backend depends on ocr-service health

Refs #226, #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
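
The polygon-to-quad step named in the message can be illustrated with a minimal pure-Python sketch. This is not the code from ocr-service/; the function names (`convex_hull`, `polygon_to_quad`, `quad_area`) are hypothetical. It builds the hull by gift wrapping and then tries every hull edge as a rectangle side, which is a simpler O(h^2) stand-in for a true rotating-calipers sweep. Both approaches rely on the same property: a minimum-area enclosing rectangle has one side collinear with a hull edge.

```python
import math

Point = tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def convex_hull(points: list[Point]) -> list[Point]:
    """Gift wrapping (Jarvis march); hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    hull: list[Point] = []
    current = pts[0]  # the lexicographically smallest point is always on the hull
    while len(hull) <= len(pts):  # guard against floating-point cycling
        hull.append(current)
        candidate = pts[1] if current == pts[0] else pts[0]
        for p in pts:
            if p == current:
                continue
            turn = cross(current, candidate, p)
            # Prefer points to the right of current -> candidate; on ties take the farther one.
            if turn < 0 or (turn == 0 and dist2(current, p) > dist2(current, candidate)):
                candidate = p
        current = candidate
        if current == pts[0]:
            break
    return hull


def polygon_to_quad(points: list[Point]) -> list[Point]:
    """Smallest-area rectangle (four corners, counter-clockwise) enclosing the points."""
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("hull has fewer than three vertices")
    best_area = math.inf
    best_quad: list[Point] = []
    for i, a in enumerate(hull):
        b = hull[(i + 1) % len(hull)]
        length = math.hypot(b[0] - a[0], b[1] - a[1])
        ux, uy = (b[0] - a[0]) / length, (b[1] - a[1]) / length  # unit vector along the edge
        nx, ny = -uy, ux                                         # unit normal to the edge
        us = [p[0] * ux + p[1] * uy for p in hull]
        ns = [p[0] * nx + p[1] * ny for p in hull]
        u0, u1, n0, n1 = min(us), max(us), min(ns), max(ns)
        area = (u1 - u0) * (n1 - n0)
        if area < best_area:
            best_area = area
            best_quad = [
                (u * ux + n * nx, u * uy + n * ny)
                for u, n in ((u0, n0), (u1, n0), (u1, n1), (u0, n1))
            ]
    return best_quad


def quad_area(quad: list[Point]) -> float:
    """Shoelace area of a four-point polygon."""
    total = sum(
        quad[i][0] * quad[(i + 1) % 4][1] - quad[(i + 1) % 4][0] * quad[i][1]
        for i in range(4)
    )
    return abs(total) / 2
```

For a diamond-shaped outline such as `[(1, 0), (2, 1), (1, 2), (0, 1)]`, the returned quad has area 2, whereas the axis-aligned bounding box has area 4. That difference is why an oriented quad is the tighter fit for slanted handwriting lines.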
@@ -71,6 +71,28 @@ services:
     networks:
       - archive-net
 
+  # --- OCR: Python microservice (Surya + Kraken) ---
+  ocr-service:
+    build:
+      context: ./ocr-service
+      dockerfile: Dockerfile
+    container_name: archive-ocr
+    restart: unless-stopped
+    mem_limit: 6g
+    memswap_limit: 6g
+    volumes:
+      - ocr_models:/app/models
+    environment:
+      KRAKEN_MODEL_PATH: /app/models/german_kurrent.mlmodel
+    networks:
+      - archive-net
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
+      interval: 10s
+      timeout: 5s
+      retries: 12
+      start_period: 60s
+
   # --- Backend: Spring Boot ---
   backend:
     build:
@@ -89,6 +111,8 @@ services:
         condition: service_healthy
       mailpit:
         condition: service_started
+      ocr-service:
+        condition: service_healthy
     environment:
       SPRING_DATASOURCE_URL: jdbc:postgresql://db:5432/${POSTGRES_DB}
       SPRING_DATASOURCE_USERNAME: ${POSTGRES_USER}
@@ -109,6 +133,8 @@ services:
       # Mailpit needs no auth or STARTTLS; production SMTP overrides these via .env
       SPRING_MAIL_PROPERTIES_MAIL_SMTP_AUTH: ${MAIL_SMTP_AUTH:-false}
       SPRING_MAIL_PROPERTIES_MAIL_SMTP_STARTTLS_ENABLE: ${MAIL_STARTTLS_ENABLE:-false}
+      APP_OCR_BASE_URL: http://ocr-service:8000
+      APP_S3_INTERNAL_URL: http://minio:9000
     ports:
       - "${PORT_BACKEND}:8080"
     networks:
@@ -155,3 +181,4 @@ networks:
 volumes:
   frontend_node_modules:
   maven_cache:
+  ocr_models:
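
The healthcheck settings in the Compose diff (interval 10s, timeout 5s, retries 12) and the "graceful fallback" mentioned for the Java client can be pictured with a small standard-library Python sketch. It is an illustration only, not Docker's scheduler and not the RestClientOcrClient code. The names `is_healthy` and `wait_until_healthy` are hypothetical, and the default URL assumes the `APP_OCR_BASE_URL` value plus the `/health` path from the commit message.

```python
import time
import urllib.request
from typing import Callable


def is_healthy(url: str = "http://ocr-service:8000/health", timeout: float = 5.0) -> bool:
    """Probe the health endpoint; report False instead of raising when it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    retries: int = 12,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `retries` times, pausing `interval` seconds between attempts."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

A caller could treat a `False` result as "OCR currently unavailable" and continue without it rather than failing outright, for example `ocr_enabled = wait_until_healthy(is_healthy)`. Whether the Java client behaves this way is an assumption here.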