docs(ocr): document single-node constraint for OCR training

Training reloads the Kraken model in-process on the Python service. The DB-level RUNNING constraint prevents concurrent API calls but cannot protect against multi-replica deployments. Added explicit comments in docker-compose.yml and OcrTrainingService to prevent accidental horizontal scaling. See ADR-001.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -42,6 +42,10 @@ public class OcrTrainingService {
             List<OcrTrainingRun> runs
     ) {}
 
+    // Not safe for horizontal scaling: training reloads the Kraken model in-process on the
+    // Python OCR service after each run. The DB-level RUNNING constraint (V30 partial unique
+    // index) prevents concurrent training API calls, but cannot prevent two OCR service replicas
+    // from diverging on model state. Deploy as a single instance only. See ADR-001.
     @Transactional
     public OcrTrainingRun triggerTraining(UUID triggeredBy) {
         if (trainingRunRepository.findFirstByStatus(TrainingStatus.RUNNING).isPresent()) {
@@ -72,6 +72,9 @@ services:
       - archive-net
 
   # --- OCR: Python microservice (Surya + Kraken) ---
+  # Single-node only: OCR training reloads the model in-process after each run.
+  # Running multiple replicas would cause training conflicts and model-state divergence.
+  # See ADR-001 for the architectural rationale.
   ocr-service:
     build:
       context: ./ocr-service
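
Note: the limitation described in the commit message can be illustrated with a small in-memory model. The sketch below is not the project's code. `SharedRunTable`, `OcrReplica`, and the integer model version are hypothetical stand-ins, assuming the database permits at most one RUNNING run and that each OCR service process keeps its loaded model in memory. It shows that such a constraint serializes training runs, yet a replica that did not perform the run keeps serving its old model.

```java
/** Hypothetical in-memory model of the single-node constraint; not the project's code. */
final class ReplicaDivergenceSketch {

    /** Stands in for the shared database: at most one run may be RUNNING at a time. */
    static final class SharedRunTable {
        private boolean running = false;
        private int completedRuns = 0;

        /** Mimics a unique constraint on RUNNING rows: a second concurrent start is rejected. */
        synchronized boolean tryStart() {
            if (running) {
                return false;
            }
            running = true;
            return true;
        }

        /** Marks the active run as finished and returns the total number of completed runs. */
        synchronized int finish() {
            running = false;
            completedRuns++;
            return completedRuns;
        }
    }

    /** Stands in for one OCR service process that holds its model in memory. */
    static final class OcrReplica {
        private final SharedRunTable table;
        private int loadedModelVersion = 0;

        OcrReplica(SharedRunTable table) {
            this.table = table;
        }

        /** Trains, then reloads the model in this process only. Returns false if a run is active. */
        boolean train() {
            if (!table.tryStart()) {
                return false;
            }
            // The reload happens in-process, so other replicas never see the new version.
            loadedModelVersion = table.finish();
            return true;
        }

        int loadedModelVersion() {
            return loadedModelVersion;
        }
    }

    public static void main(String[] args) {
        SharedRunTable table = new SharedRunTable();
        OcrReplica a = new OcrReplica(table);
        OcrReplica b = new OcrReplica(table);

        a.train();

        System.out.println("replica A model version: " + a.loadedModelVersion()); // prints 1
        System.out.println("replica B model version: " + b.loadedModelVersion()); // prints 0
    }
}
```

In this model the shared table does its job (a second `train()` call is rejected while a run is active), but nothing propagates the reloaded model to replica B. That gap is the one the new comments warn about.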