familienarchiv/docs/architecture/c4/l3-backend-3f-ocr.puml (at bc2dd3a98a355a992a8aebb1d4c4dbcbbe529e33)
Marcel, 421d7ffd37: docs(c4): add L3 backend 3e persons, 3f OCR, 3g supporting domains
2026-05-06 22:52:21 +02:00, 3.1 KiB
Component Diagram: API Backend - OCR Orchestration
API Backend (Spring Boot) [system]
«component» OcrController [Spring MVC - /api/ocr]
REST entry point: trigger single or batch OCR jobs, stream progress via SSE, query job status, and manage training runs and per-sender models.
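
The controller description mentions streaming progress via SSE. As a rough illustration of what one progress event looks like on the wire, the sketch below formats a single server-sent event by hand. The event name `progress` and the JSON fields `page` and `totalPages` are assumptions made for this example; the diagram does not specify the payload, and the actual backend may well use a framework helper instead of building the text itself.

```java
// Minimal sketch of the text/event-stream framing for one SSE event.
// Event name and payload fields are hypothetical, not taken from the diagram.
final class SseEventSketch {

    /** Formats one event: an "event:" field, one "data:" field per payload line, then a blank line. */
    static String formatEvent(String eventName, String data) {
        StringBuilder out = new StringBuilder();
        out.append("event: ").append(eventName).append('\n');
        // A multi-line payload is sent as several consecutive "data:" fields.
        for (String line : data.split("\n", -1)) {
            out.append("data: ").append(line).append('\n');
        }
        out.append('\n'); // the empty line terminates the event
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(formatEvent("progress", "{\"page\":3,\"totalPages\":12}"));
    }
}
```

A browser-side `EventSource` listener registered for the chosen event name would receive the `data` payload as a string.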
«component» OcrService [Spring Service]
Creates OcrJob and OcrJobDocument records, checks Python service health, and delegates async execution to OcrAsyncRunner.
«component» OcrBatchService [Spring Service]
Orchestrates multi-document OCR jobs, iterating documents and delegating each to OcrAsyncRunner.
«component» OcrAsyncRunner [Spring Component - @Async]
Async worker that streams OCR results from Python page by page, persists transcription blocks and annotations via domain services, and emits progress via SSE.
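
The page-by-page streaming described for OcrAsyncRunner relies on NDJSON, where every non-empty line of the response body is one complete JSON document. The sketch below shows only that line framing, with a callback standing in for the persistence and progress steps. The class name, method name, and the record contents are hypothetical; parsing each line into a typed object is left out because the diagram does not say which JSON library is used.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Minimal sketch of NDJSON consumption: one record per non-empty line.
final class NdjsonStreamSketch {

    /** Hands every non-blank line to the callback as it arrives and returns the record count. */
    static int forEachRecord(BufferedReader reader, Consumer<String> onRecord) {
        int count = 0;
        Iterator<String> lines = reader.lines().iterator();
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.trim().isEmpty()) {
                continue; // tolerate blank separator lines
            }
            onRecord.accept(line); // hypothetical hook: persist blocks, emit progress
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical payload; the real per-page schema is not given in the diagram.
        String body = "{\"page\":1,\"blocks\":[]}\n{\"page\":2,\"blocks\":[]}\n";
        List<String> seen = new ArrayList<>();
        int n = forEachRecord(new BufferedReader(new StringReader(body)), seen::add);
        System.out.println(n + " records received");
    }
}
```

Because records are handled as soon as each line is read, the first page can be processed while later pages are still being recognized.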
«component» RestClientOcrClient [Spring Component]
HTTP client wrapping the Python service: POST /ocr/stream (NDJSON), /train, /segtrain, and /train-sender. Falls back from streaming to batch on 404.
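
The fallback from streaming to batch on 404 can be pictured as a small wrapper around two calls. The sketch below is one possible shape under stated assumptions: `HttpStatusException` and `OcrEndpoint` are hypothetical types invented for the example, and the two endpoints are passed in as lambdas because the diagram does not name the batch route.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of "try streaming, fall back to batch on 404".
// All types here are hypothetical stand-ins, not the project's classes.
final class OcrFallbackSketch {

    /** Hypothetical exception carrying the HTTP status of a failed call. */
    static final class HttpStatusException extends RuntimeException {
        final int status;

        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    /** Hypothetical abstraction over one way of calling the OCR service. */
    interface OcrEndpoint {
        List<String> recognize(String pdfUrl);
    }

    static List<String> recognizeWithFallback(String pdfUrl, OcrEndpoint streaming, OcrEndpoint batch) {
        try {
            return streaming.recognize(pdfUrl);
        } catch (HttpStatusException e) {
            if (e.status == 404) {
                // The streaming route does not exist on this service version.
                return batch.recognize(pdfUrl);
            }
            throw e; // any other status is a real failure
        }
    }

    public static void main(String[] args) {
        OcrEndpoint streaming = url -> { throw new HttpStatusException(404); };
        OcrEndpoint batch = url -> Arrays.asList("page 1", "page 2");
        System.out.println(recognizeWithFallback("https://example.invalid/doc.pdf", streaming, batch));
    }
}
```

Only 404 triggers the fallback in this sketch; other errors propagate so that genuine failures are not hidden behind a second request.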
«component» OcrTrainingService [Spring Service]
Orchestrates model training: exports training data as ZIP, calls Python /train or /segtrain, persists training metrics in OcrTrainingRunRepository.
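
Exporting training data as a ZIP can be done in memory with the JDK alone. The sketch below packs a map of file names to text contents into a ZIP byte array and reads the entry names back. The file names and contents are invented for illustration; the diagram does not describe the layout of the real training archive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Minimal sketch of building a training-data ZIP in memory (layout is hypothetical).
final class TrainingZipSketch {

    /** Writes each map entry as one UTF-8 text file inside the archive. */
    static byte[] zip(Map<String, String> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray(); // the archive is complete once the stream is closed
    }

    /** Lists entry names in archive order, mainly to verify the export. */
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("line-0001.gt.txt", "Liebe Mutter,"); // hypothetical ground-truth file
        files.put("line-0002.gt.txt", "vielen Dank.");
        System.out.println(entryNames(zip(files)));
    }
}
```

The resulting byte array could then be sent as the file part of a multipart request.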
«component» OcrJobRepository, OcrJobDocumentRepository [Spring Data JPA]
Reads and writes OcrJob and OcrJobDocument records. Tracks job status (RUNNING/DONE/FAILED), per-document progress, page counts, and error messages.
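
The three job states named above suggest a simple lifecycle. The sketch below models it in memory: a job starts as RUNNING, becomes DONE when every document has finished, and becomes FAILED when an error is recorded. Only the status names come from the diagram; the transition rules, class names, and fields are assumptions made for illustration and say nothing about how the JPA entities are actually structured.

```java
// Minimal in-memory sketch of job status tracking (transition rules are assumed).
final class OcrJobStateSketch {

    enum Status { RUNNING, DONE, FAILED }

    static final class JobState {
        private final int totalDocuments;
        private int finishedDocuments;
        private Status status = Status.RUNNING;
        private String errorMessage;

        JobState(int totalDocuments) {
            if (totalDocuments <= 0) {
                throw new IllegalArgumentException("a job needs at least one document");
            }
            this.totalDocuments = totalDocuments;
        }

        /** Records one finished document; the job is DONE after the last one. */
        void documentFinished() {
            if (status != Status.RUNNING) {
                throw new IllegalStateException("job is already " + status);
            }
            finishedDocuments++;
            if (finishedDocuments == totalDocuments) {
                status = Status.DONE;
            }
        }

        /** Marks the job as FAILED and keeps the message for later status queries. */
        void fail(String message) {
            status = Status.FAILED;
            errorMessage = message;
        }

        Status status() { return status; }
        int finishedDocuments() { return finishedDocuments; }
        String errorMessage() { return errorMessage; }
    }

    public static void main(String[] args) {
        JobState job = new JobState(2);
        job.documentFinished();
        job.documentFinished();
        System.out.println(job.status());
    }
}
```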
«container» Web Frontend [SvelteKit]
«container» PostgreSQL [PostgreSQL 16]
«container» Object Storage [MinIO (S3-compatible)]
«container» OCR Service [Python FastAPI]
«component» TranscriptionService [Spring Service]
See diagram 3c. Called by OcrAsyncRunner to persist transcription blocks per page.
«component» AnnotationService [Spring Service]
See diagram 3c. Called by OcrAsyncRunner to persist OCR-generated annotation regions per page.
OCR trigger, status, and progress requests [HTTP / JSON / SSE]
Single-document jobs
Batch jobs
Training runs
Delegates async execution
Delegates async execution
Streams OCR results page by page [HTTP / NDJSON]
Sends training data ZIP [HTTP / multipart]
POST /ocr/stream, /train, /segtrain, /train-sender [HTTP / REST]
Saves transcription blocks per page
Saves annotation regions per page
Reads / writes OCR job state
SQL queries [JDBC]
Generates presigned URLs for PDF fetch [S3 API]
Fetches PDF via presigned URL [HTTP / S3 presigned]
Persists training run metrics [JDBC]