docs(ocr): document single-node constraint for OCR training

Training reloads the Kraken model in-process on the Python OCR service.
The DB-level RUNNING constraint prevents concurrent training API calls
but cannot keep multiple OCR service replicas from diverging on model
state. Add explicit comments in docker-compose.yml and
OcrTrainingService to warn against accidental horizontal scaling.
See ADR-001.
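
The gap between the two guarantees can be sketched in a few lines of Java.
This is an in-memory illustration only, not the project's code: the class
names RunningGuard and OcrReplica, the integer model version, and the
triggerTraining helper are all hypothetical stand-ins. The guard plays the
role of the DB-level RUNNING constraint, and each replica keeps its model in
process memory, as the commit describes for the Python service.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SingleNodeSketch {

    /** Hypothetical stand-in for the DB-level RUNNING constraint: one run at a time. */
    static final class RunningGuard {
        private final AtomicBoolean running = new AtomicBoolean(false);

        boolean tryStart() { return running.compareAndSet(false, true); }

        void finish() { running.set(false); }
    }

    /** Hypothetical stand-in for one OCR service replica; the model lives in process memory. */
    static final class OcrReplica {
        private int loadedModelVersion = 1;

        // Training finishes and the replica reloads the new model in-process.
        void trainAndReload(int newVersion) { loadedModelVersion = newVersion; }

        int loadedModelVersion() { return loadedModelVersion; }
    }

    /** Runs one training on the given replica, unless another run is already RUNNING. */
    static boolean triggerTraining(RunningGuard guard, OcrReplica target, int newVersion) {
        if (!guard.tryStart()) {
            return false; // rejected: a run is already in progress
        }
        try {
            target.trainAndReload(newVersion);
            return true;
        } finally {
            guard.finish();
        }
    }

    public static void main(String[] args) {
        RunningGuard guard = new RunningGuard();
        OcrReplica replicaA = new OcrReplica();
        OcrReplica replicaB = new OcrReplica();

        // The guard admits the run, but only the replica that handled it reloads.
        triggerTraining(guard, replicaA, 2);

        System.out.println("replica A model version: " + replicaA.loadedModelVersion());
        System.out.println("replica B model version: " + replicaB.loadedModelVersion());
    }
}
```

Running main prints version 2 for replica A and version 1 for replica B: the
guard serialized the training run, yet the second replica still serves the
old model. That is the divergence the single-instance rule avoids.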

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Marcel
2026-04-13 23:01:45 +02:00
parent 2b355e748e
commit 287920a982
2 changed files with 7 additions and 0 deletions


@@ -42,6 +42,10 @@ public class OcrTrainingService {
             List<OcrTrainingRun> runs
     ) {}

+    // Not safe for horizontal scaling: training reloads the Kraken model in-process on the
+    // Python OCR service after each run. The DB-level RUNNING constraint (V30 partial unique
+    // index) prevents concurrent training API calls, but cannot prevent two OCR service replicas
+    // from diverging on model state. Deploy as a single instance only. See ADR-001.
     @Transactional
     public OcrTrainingRun triggerTraining(UUID triggeredBy) {
         if (trainingRunRepository.findFirstByStatus(TrainingStatus.RUNNING).isPresent()) {


@@ -72,6 +72,9 @@ services:
       - archive-net

   # --- OCR: Python microservice (Surya + Kraken) ---
+  # Single-node only: OCR training reloads the model in-process after each run.
+  # Running multiple replicas would cause training conflicts and model-state divergence.
+  # See ADR-001 for the architectural rationale.
   ocr-service:
     build:
       context: ./ocr-service