familienarchiv/docs/architecture/c4/l3-backend-3f-ocr.puml

Component Diagram: API Backend — OCR Orchestration

API Backend (Spring Boot) [system]

«component» OcrController [Spring MVC — /api/ocr]
    REST entry point: trigger single or batch OCR jobs, stream progress via SSE, query job status, and manage training runs and per-sender models.

«component» OcrService [Spring Service]
    Creates OcrJob and OcrJobDocument records, checks Python service health, and delegates async execution to OcrAsyncRunner.

«component» OcrBatchService [Spring Service]
    Orchestrates multi-document OCR jobs, iterating documents and delegating each to OcrAsyncRunner.

«component» OcrAsyncRunner [Spring Component — @Async]
    Async worker that streams OCR results from Python page by page, persists transcription blocks and annotations via domain services, and emits progress via SSE.

«component» RestClientOcrClient [Spring Component]
    HTTP client wrapping the Python service: POST /ocr/stream (NDJSON), /train, /segtrain, and /train-sender. Falls back from streaming to batch on 404.

«component» OcrTrainingService [Spring Service]
    Orchestrates model training: exports training data as ZIP, calls Python /train or /segtrain, persists training metrics in OcrTrainingRunRepository.

«component» OcrJobRepository, OcrJobDocumentRepository [Spring Data JPA]
    Reads and writes OcrJob and OcrJobDocument records. Tracks job status (RUNNING/DONE/FAILED), per-document progress, page counts, and error messages.

«component» TranscriptionService [Spring Service]
    See diagram 3c. Called by OcrAsyncRunner to persist transcription blocks per page.

«component» AnnotationService [Spring Service]
    See diagram 3c. Called by OcrAsyncRunner to persist OCR-generated annotation regions per page.

«container» Web Frontend [SvelteKit]
«container» PostgreSQL [PostgreSQL 16]
«container» Object Storage [MinIO (S3-compatible)]
«container» OCR Service [Python FastAPI]

Relationships
    OCR trigger, status, and progress requests [HTTP / JSON / SSE]
    Single-document jobs
    Batch jobs
    Training runs
    Delegates async execution
    Delegates async execution
    Streams OCR results page by page [HTTP / NDJSON]
    Sends training data ZIP [HTTP / multipart]
    POST /ocr/stream, /train, /segtrain, /train-sender [HTTP / REST]
    Saves transcription blocks per page
    Saves annotation regions per page
    Reads / writes OCR job state
    SQL queries [JDBC]
    Generates presigned URLs for PDF fetch [S3 API]
    Fetches PDF via presigned URL [HTTP / S3 presigned]
    Persists training run metrics [JDBC]
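
The diagram notes that RestClientOcrClient falls back from streaming to batch on 404. The sketch below illustrates one way such a fallback could be structured in plain Java. It is not taken from the repository: the class OcrFallbackSketch, the exception HttpStatusException, the interfaces StreamingCall and BatchCall, and the method fetchResults are all hypothetical names, and the real client presumably uses Spring's HTTP client types rather than these stand-ins.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these names come from the actual codebase. */
class OcrFallbackSketch {

    /** Stand-in for whatever exception the real HTTP client raises on an error status. */
    static class HttpStatusException extends RuntimeException {
        final int status;

        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    /** Streaming call: pushes each NDJSON line to the consumer as it arrives. */
    interface StreamingCall {
        void run(Consumer<String> onLine);
    }

    /** Batch call: returns every result in one response. */
    interface BatchCall {
        List<String> run();
    }

    /**
     * Tries the streaming endpoint first. If it answers 404 (assumed here to mean
     * the Python service does not expose streaming), replays the batch results
     * through the same consumer. Any other status is rethrown unchanged.
     *
     * @return true if the batch fallback was used
     */
    static boolean fetchResults(StreamingCall streaming, BatchCall batch, Consumer<String> onResult) {
        try {
            streaming.run(onResult);
            return false;
        } catch (HttpStatusException e) {
            if (e.status != 404) {
                throw e;
            }
            // A 404 arrives before any body, so no partial results were delivered yet.
            for (String result : batch.run()) {
                onResult.accept(result);
            }
            return true;
        }
    }
}
```

Routing both paths through the same consumer means the caller (OcrAsyncRunner in the diagram) would not need to know which endpoint actually served the results.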