feat(ocr): full OCR pipeline with polygon annotations, training, and guided mode #232
Closes #226, closes #227
Summary
End-to-end OCR pipeline for handwritten (Kurrent) and printed documents, built on Kraken/Surya via a Python microservice, with annotation-backed transcription blocks, a segmentation + recognition training pipeline, and a guided OCR mode.
Core infrastructure
- `OcrService`, `OcrBatchService`, `OcrProgressService`, `OcrController` on the backend
- `OcrClient` interface + `RestClientOcrClient` with NDJSON streaming
- `[unleserlich]` markers for low-confidence words
Annotations & polygons
- Polygon column (`@JdbcTypeCode(JSON)`) for quadrilateral annotation shapes
- `createOcrAnnotation` bypasses overlap check for OCR-placed regions
- `ScriptType` enum + `script_type` column on `Document`
Streaming & progress
- `POST /ocr/stream` with `OcrStreamEvent` sealed interface
- `OcrProgressBar` component
Training pipeline
- `ketos segtrain` subprocess
- `POST /train` + `GET /training-info` endpoints
- `SegmentationTrainingCard` and OCR training card on `admin/system`
Guided OCR & manual-first workflow
Other
- `TranscriptionEditView`: `source`/`reviewed` fields on transcription blocks for training pipeline
Test plan
- `./mvnw test` — all backend tests pass (incl. `OcrServiceTest`)
- Manual run (`docker-compose up -d`): upload a PDF document, select script type, trigger OCR — blocks appear with bounding polygons and `[unleserlich]` markers
🤖 Generated with Claude Code
- V29 migration: `document_training_labels` join table
- `TrainingLabel` enum: KURRENT_RECOGNITION, KURRENT_SEGMENTATION
- `Document.trainingLabels` `@ElementCollection`
- `DocumentService.addTrainingLabel` / `removeTrainingLabel`
- `PATCH /api/documents/{id}/training-labels` (WRITE_ALL)
- Auto-enroll on Kurrent OCR trigger (`OcrService.startOcr`)
- `TranscriptionEditView`: enrollment chips in panel footer
- JPQL queries updated to use `MEMBER OF trainingLabels`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

👨‍💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
Great work shipping this — the scope is huge and the architecture is coherent. A few things I'd want addressed before merging.
Blockers
1.
`OcrAsyncRunner` is untested
The async runner is the core orchestration path — it calls the Python service, creates annotations and blocks, and emits SSE progress. None of this is tested. A failing service, a bad NDJSON event, or a polygon-mapping bug would be invisible until production.
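A Mockito setup isn't required to pin this orchestration; a plain fake client is enough to assert the state transitions. A sketch — all names here are hypothetical; the real runner consumes `OcrStreamEvent` objects and creates JPA entities:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java sketch of testing the async runner with a fake OcrClient.
// Event names and states are simplified stand-ins for the real stream.
public class OcrAsyncRunnerSketch {

    interface OcrClient {
        void stream(Consumer<String> onEvent);   // emits "start", "page", "error", "done"
    }

    /** Consumes the event stream; returns the final job state. */
    static String run(OcrClient client, List<String> createdBlocks) {
        final String[] state = {"RUNNING"};
        client.stream(event -> {
            switch (event) {
                case "page" -> createdBlocks.add("block");   // one block per page event
                case "error" -> state[0] = "FAILED";         // errors must not be swallowed
                case "done" -> { if (!state[0].equals("FAILED")) state[0] = "DONE"; }
                default -> { }                               // "start" and unknown events ignored
            }
        });
        return state[0];
    }
}
```

The same fake can replay an error mid-stream to assert the job ends FAILED rather than hanging, and an empty stream to assert no blocks are created.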
2.
`TranscriptionService.upsertGuidedBlock` has no test
The conditional logic here is non-trivial: preserve MANUAL blocks with text, update or create OCR blocks otherwise. This is exactly the kind of branching that should be pinned with unit tests before it gets touched.
3.
`RestClientOcrClient.parseNdjsonStream` — untested parser
NDJSON line-by-line parsing with a type discriminator and Jackson deserialization — three things that can go wrong silently. At minimum there should be a test for each event type, a test for blank lines, a test for unknown types (which should not crash), and a test for truncated streams.
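The failure modes those tests should pin can be sketched without the real parser. This is a hand-rolled stand-in (the actual implementation uses Jackson; method and event names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for parseNdjsonStream showing the three silent-failure modes:
// blank keep-alive lines, unknown event types, and truncated lines.
public class NdjsonSketch {

    static final List<String> KNOWN = List.of("start", "page", "error", "done");

    /** Crude extraction of the "type" discriminator from one NDJSON line. */
    static String eventType(String line) {
        int i = line.indexOf("\"type\":\"");
        if (i < 0) return null;
        int start = i + 8;
        int end = line.indexOf('"', start);
        return end < 0 ? null : line.substring(start, end);
    }

    /** Parses a whole NDJSON payload into the list of recognized event types. */
    public static List<String> parse(String ndjson) {
        List<String> events = new ArrayList<>();
        for (String line : ndjson.split("\n")) {
            if (line.isBlank()) continue;               // skip keep-alive blanks
            if (!line.trim().endsWith("}"))             // crude completeness check:
                throw new IllegalStateException(        // truncated line -> fail loudly
                        "truncated NDJSON line: " + line);
            String type = eventType(line);
            if (type == null || !KNOWN.contains(type))  // unknown type -> skip, don't crash
                continue;
            events.add(type);
        }
        return events;
    }
}
```

Each branch in this sketch maps to one of the missing tests: blank-line skip, unknown-type tolerance, truncation as a meaningful exception rather than an NPE.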
4.
`AnnotationService.deleteAnnotation` cascade — no test
The cascade delete (annotation → transcription block) is a critical invariant. The commit message says "cascade-delete transcription block when annotation is deleted", but I see no test asserting that `transcriptionBlockRepository.findByAnnotationId(id)` returns empty after the delete.
Suggestions
5.
`OcrProgressService` emitter cleanup
On `SseEmitter` timeout/error, the emitter is removed from the per-job list — but the job entry in the map is never removed, even after `complete()`. This is a minor memory leak for long-running instances with many jobs. A `ConcurrentHashMap.remove(jobId)` in `complete()` would fix it.
6.
`TranscriptionService.sanitizeText` is a private method doing two things
It strips newlines AND caps text at 10k characters. Those are separate concerns, and the 10k cap is surprising (no comment, no constant). Extract a named constant `MAX_BLOCK_TEXT_LENGTH = 10_000` and add a brief comment for future readers.
7. Svelte:
`TranscriptionEditView.svelte` is carrying a lot
Drag-and-drop reordering, debounced auto-save, beacon fallback, OCR triggering, review progress — this component has grown significantly. Worth a follow-up split into focused sub-components, but not a blocker for this PR.
🏛️ Markus Keller — Senior Application Architect
Verdict: ⚠️ Approved with concerns
The ADRs are exactly what I want to see — ADR-001 and ADR-002 make the trade-offs explicit and are well-reasoned. The overall architecture is sound. A few structural risks worth naming.
Blockers
1. Presigned URL TTL race
The OCR service downloads the PDF via a presigned MinIO URL. For a large multi-page document, OCR can take several minutes. If the presigned URL expires mid-stream (typical TTL: 15 min, but configurable), the download fails silently and the job ends in an error state. The TTL should be set to at least max_expected_ocr_time × 1.5, and that assumption should be documented — or generate the URL inside the OCR service request, not before.
2. Training mutates in-process model state — documented but not guarded
ADR-001 flags this honestly: training is not safe for horizontal scaling. That's fine for now, but the DB-level "one RUNNING training" constraint only prevents concurrent training API calls — it doesn't prevent someone from running two OCR service replicas. Add a clear comment in the compose file and in the training endpoint that horizontal scaling is explicitly unsupported until this is revisited.
Suggestions
3. Dual-protocol streaming adds a translation layer
The Python service streams NDJSON → Spring Boot collects it → re-emits it as SSE to the browser. This makes Spring Boot a stateful relay. If the Spring Boot process restarts mid-OCR, in-flight SSE clients lose their stream with no reconnect path (the `EventSource` retry only reconnects if the server is reachable). Consider persisting the last known job progress in the DB so clients can catch up on reconnect.
4.
`OcrProgressService` emitter pool is in-memory
A single-node deployment is fine (and that's the target per ADR-001). Just worth noting: if this ever runs behind a load balancer, SSE clients may connect to a node that doesn't hold their emitter. Document the single-node requirement.
5. Package placement of `ocr/` vs `service/`
`RestClientOcrClient` and `OcrStreamEvent` live in `org.raddatz.familienarchiv.ocr` — a new package. `OcrService`, `OcrBatchService`, and `OcrProgressService` live in `service/`. This split is a bit inconsistent: either move the client classes into `service/`, or move the services into `ocr/`. Not a blocker, but worth a quick tidy.
🧪 Sara Holt — Senior QA Engineer & Test Strategist
Verdict: 🚫 Changes requested
This is a large, complex feature with significant business logic — but the test suite doesn't match that complexity. I'd expect integration coverage for the new migrations and at least unit tests for every non-trivial branch. Right now I'm counting four untested critical paths.
Blockers
1. No Testcontainers integration tests for new Flyway migrations
Migrations V23 through V32 introduce new tables, JSONB columns, unique partial indexes, and CHECK constraints. None of these are validated by an integration test against a real PostgreSQL instance. The JSONB polygon check constraint (`chk_annotation_polygon_quad`) and the partial unique index on `ocr_training_runs` where `status = 'RUNNING'` are database-level invariants — each needs at least one integration test verifying that Flyway applies it correctly and that it actually enforces what we expect.
2.
`OcrAsyncRunner` — zero test coverage
This is the most complex orchestration path in the PR: NDJSON event processing, annotation creation, transcription block creation, SSE emission, error handling. It needs unit tests with a mocked `OcrClient` that returns each event type (Start, Page, Error, Done), including edge cases like empty page results and a failed stream.
3.
`TranscriptionService.upsertGuidedBlock` — untested branching
Three branches: (a) existing MANUAL block with text → preserve, (b) existing OCR block → update, (c) no block → create. None are covered. This logic will silently break if someone refactors the conditional, and there's no test to catch it.
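The decision tree can be pinned against a pure function before anyone touches the JPA-backed original. A sketch — signature and types are hypothetical, not the service's actual API:

```java
// Pure-function sketch of the upsert branches, so each one can be asserted.
public class GuidedUpsertSketch {

    enum Source { MANUAL, OCR }

    record Block(Source source, String text) {}

    /**
     * Returns the block that should exist after guided OCR runs:
     * (a) a MANUAL block with text is preserved untouched,
     * (b) an existing block without protected text is updated,
     * (c) no block at all means a fresh OCR block is created.
     */
    static Block upsertGuidedBlock(Block existing, String ocrText) {
        if (existing == null) {
            return new Block(Source.OCR, ocrText);      // (c) create
        }
        if (existing.source() == Source.MANUAL && !existing.text().isBlank()) {
            return existing;                            // (a) preserve manual work
        }
        return new Block(existing.source(), ocrText);   // (b) fill/update with OCR text
    }
}
```

One assertion per branch (plus the empty-MANUAL edge) is the whole safety net being asked for here.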
4.
`RestClientOcrClient.parseNdjsonStream` — untested parser
This is custom deserialization code. Minimum required tests: each event type deserializes correctly, blank lines are skipped, an unknown `type` field doesn't throw, and a truncated JSON line throws a meaningful exception (not an NPE).
Suggestions
5.
`OcrServiceTest` uses Mockito mocks throughout — good pattern, keep it
The existing tests are clean and follow the right unit-test style. Extend this pattern to the missing classes above.
6. Missing boundary test: block text length cap
`TranscriptionService.createOcrBlock` caps text at 10,000 chars. Is there a test for exactly 10,001 chars being truncated? For an empty string? For null text?
7. No test for the cascade delete invariant
After `annotationService.deleteAnnotation()`, the linked `TranscriptionBlock` must be gone. This should be a Testcontainers integration test — it's a DB-level cascade, not just service logic.
🔐 Nora "NullX" Steiner — Application Security Engineer
Verdict: ⚠️ Approved with concerns
The SSRF protection and ZIP path traversal guards are the right calls. A few things need tightening before this is production-safe.
Blockers
1.
`OCR_TRAINING_TOKEN` — is it actually enforced?
The compose file sets `TRAINING_TOKEN: $OCR_TRAINING_TOKEN`, and the Python service presumably checks it on `/train` and `/segtrain`. But I don't see where the Spring Boot backend sends this token when calling those endpoints — and if it doesn't, the training endpoints on the Python service are either unauthenticated (if the service is accidentally port-mapped) or protected only by network isolation. Verify: (a) the token is sent as a header by `RestClientOcrClient` when calling `/train` and `/segtrain`, and (b) the Python service returns 401 if it's missing or wrong. Add a test for the 401 case.
2. ZIP upload to
`/train` and `/segtrain` — verify path traversal guard
The summary says path traversal is validated in the Python service. I need to see the actual check. A safe pattern is:
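The Python-side snippet isn't shown in this thread; for illustration, the same guard in Java (hypothetical names): resolve each archive entry against the extraction directory, normalize away `..` segments, and require the result to stay inside that directory.

```java
import java.nio.file.Path;

// Zip Slip guard sketch: one prefix check on the normalized, resolved path
// rejects both absolute entries and "../" traversal in a single rule.
public class ZipSlipGuardSketch {

    static boolean isSafeEntry(Path extractDir, String entryName) {
        Path resolved = extractDir.resolve(entryName).normalize();
        return resolved.startsWith(extractDir.normalize());  // must stay inside extractDir
    }
}
```

Checking the resolved path (rather than scanning the raw entry string) keeps the rule robust against separator and nesting variations.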
If this guard is not in place, an attacker with `WRITE_ALL` permission could write arbitrary files to the container filesystem via a crafted ZIP.
Suggestions
3. SSRF allowlist — confirm it is a strict allowlist, not a denylist
Denylists (e.g. `if "169.254" not in url`) are easy to bypass via DNS rebinding or IPv6 representations. Confirm the SSRF guard is an allowlist of approved hostnames (`minio`, `localhost`) that rejects everything else by default.
4.
Presigned URL exposure in logs
Presigned URLs are credentials — they grant time-limited read access to the PDF. Ensure they are not logged at DEBUG level in `RestClientOcrClient` or in the Python service. A log sanitizer or an explicit exclusion on the URL parameter would keep them out of log aggregators.
5. OCR service port exposure
The `ocr-service` has no `ports:` mapping in docker-compose — correct. Document this explicitly as a security property (internal-only) so it isn't accidentally added during debugging and left in.
6.
`@RequirePermission(ADMIN)` on batch OCR and training endpoints — appropriate
The permission model is correct. Training endpoints require ADMIN, single-doc OCR requires WRITE_ALL. No findings here.
🎨 Leonie Voss — Senior UX Designer & Accessibility Strategist
Verdict: ⚠️ Approved with concerns
The polygon clipping approach for OCR annotations is technically elegant and visually effective. A few accessibility gaps need closing before this is ready.
Blockers
1.
`OcrProgressBar` — missing ARIA live region
The progress bar updates dynamically as OCR pages complete. Screen reader users will not hear these updates unless the container has `aria-live="polite"` and `aria-atomic="false"`. The current implementation appears to use a visual-only progress bar. Add those attributes to the progress container and wrap the status text in `<span aria-live="polite">`.
2. Resize handles on
`AnnotationShape` (upcoming #233) — plan for keyboard access
This is a blocker for the upcoming resize feature, not this PR — flagging it now so the design accounts for it. Drag-only resize is inaccessible. Plan for arrow-key nudging (1% step) on a focused handle.
Suggestions
3. Block number badge contrast
The badge shows the block number in the top-left corner of each annotation. Depending on the annotation color (`#00C7B1` mint) and the badge background, contrast may fall below 4.5:1 for the number text. Verify the actual badge colors with a contrast checker.
4. OcrTrigger script-type selector — empty/default state
When `scriptType` is unset (UNKNOWN), the trigger button is disabled. The disabled state should carry an `aria-describedby` tooltip explaining why it is disabled ("Bitte Schrifttyp auswählen" — "Please select a script type"), not just be grayed out silently.
5. Training card empty state — plain language
"Keine OCR-Blöcke vorhanden" ("No OCR blocks available") is accurate but sparse. For the dual audience (family members in their 60s+), a brief explanation of what to do next — e.g. "Transkribieren Sie zunächst mindestens 5 Blöcke, um das Training zu starten" ("Transcribe at least 5 blocks first to start the training") — would reduce confusion.
6. Focus management after OCR completion
When OCR finishes and the progress bar disappears, focus should return to a logical element (e.g. the first newly created transcription block or the trigger button). Currently focus likely drops to `<body>`. Not a blocker, but worth a follow-up.
🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ⚠️ Approved with concerns
The compose setup is solid for a dev machine. A few things need hardening before anyone runs this in a "production-ish" context.
Blockers
1.
`memory: 8g` is a soft limit in Docker Compose
`memory: 8g` sets `mem_limit`, which acts as a soft ceiling on most Docker setups — with swap available the container can still push the host toward OOM. For a CPU-only OCR workload that legitimately needs 4–8 GB this is fine in dev, but any deployment context should pair `mem_limit` with `memswap_limit` (set equal to prevent swap usage). Document this clearly in the compose-file comment.
2.
`OCR_TRAINING_TOKEN` has no `.env.example` entry
Looking at the compose file, `TRAINING_TOKEN: $OCR_TRAINING_TOKEN` references an env var that won't exist on a fresh clone. There should be a `.env.example` entry (or the existing one updated) so new developers know this variable is required and what format it takes.
3. Model backup rotation is application-level, not infrastructure-level
The Python service rotates the last 3 model backups inside the container, so a container wipe loses all backups. The `ocr_models` Docker volume persists the model — but if someone runs `docker-compose down -v`, it's gone. Consider documenting a periodic `docker cp` or volume snapshot as part of the backup runbook, especially since training a custom Kurrent model is expensive.
4. Health check
`start_period: 60s` — appropriate, but log it
The 60-second startup grace for model loading is correct. Worth adding a startup log line in the Python service ("Models loaded, ready") so operators can distinguish "still loading" from "stuck" in `docker-compose logs ocr-service`.
5. No
`restart: unless-stopped` on `ocr-service`
The backend and db services likely have restart policies. The OCR service should too; otherwise a model-load OOM that kills the container leaves it stopped permanently.
6. Gitea Actions: if a CI workflow is added later, use `secrets.OCR_TRAINING_TOKEN`
Not a blocker now (no CI workflow for the OCR service exists). Pre-noting: when one is added, the token must come from Gitea secrets, not a hardcoded value in the workflow file.
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
The scope here is impressive — end-to-end OCR pipeline, training loop, guided mode, streaming progress — and the TDD discipline shows. New tests are green first, the service layer is tested in isolation, and the Testcontainers integration tests for DB constraints are exactly right. A few things need addressing before this is truly clean.
Blockers
1.
`@Transactional` wraps an HTTP call that takes minutes
`OcrTrainingService.triggerTraining()` is annotated `@Transactional`. It creates a DB row, then synchronously POSTs the training ZIP to the OCR service (ketos can run for several minutes), then updates the row. The DB connection is held open for the entire duration; on a long training run this will exhaust the connection pool and time out.
Fix: annotate `@Transactional` only on the narrow DB-write helpers, and call the OCR HTTP client outside any transaction boundary. Or annotate the class `@Transactional(propagation = NEVER)` and open transactions only where needed.
2.
`blockRepository.findAll().size()` in `getTrainingInfo()`
This loads every transcription block in the database into a Java `List` just to call `.size()`. With a few thousand blocks this is already a slow query that allocates a large heap object on every training-info poll. Use `blockRepository.count()` instead, and apply the same fix to any similar `.findAll().size()` pattern.
Suggestions
3. Boolean flag argument in `startOcr` and `runSingleDocument`
Boolean flag arguments obscure intent at the call site and violate the "functions do one thing" rule. The two modes (normal OCR vs. guided OCR) are meaningfully different code paths. Consider:
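The elided suggestion amounts to two named entry points delegating to a private implementation. A sketch with hypothetical names (the real methods orchestrate the OCR pipeline, not a string):

```java
// Replacing the boolean flag with two named entry points; the boolean
// survives only as a private implementation detail, never at a call site.
public class OcrServiceSketch {

    /** Full OCR: segment the page and recognize every region. */
    public String startFullOcr(long documentId) {
        return run(documentId, false);
    }

    /** Guided OCR: recognize only inside user-drawn regions. */
    public String startGuidedOcr(long documentId) {
        return run(documentId, true);
    }

    private String run(long documentId, boolean guided) {
        return (guided ? "guided" : "full") + ":" + documentId;
    }
}
```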
Or an enum:
`OcrMode.GUIDED` / `OcrMode.FULL`.
4. `/train` and `/segtrain` in `main.py` are too long
Both handlers are 60+ lines: extract ZIP, validate, run subprocess, parse output, back up, rotate, reload. Each of those is a separate concern with a name. The persona guide says 40 lines is the splitting signal.
5. Training label chips hardcoded in German
`TranscriptionEditView.svelte` renders "Kurrent-Erkennung" and "Segmentierung" as string literals. The i18n keys exist in all three message files — they're just not used here. Wire them up via Paraglide.
6. Duplicate import in `DocumentControllerTest.java`
`import static org.mockito.ArgumentMatchers.eq;` appears twice around line 1619. Minor, but the linter should catch this.
🏛️ Markus Keller — Application Architect
Verdict: 🚫 Changes requested
The OCR microservice extraction is well-justified (different resource profile, different deployment cadence — ADR-001 captures this correctly). The training pipeline architecture is sound in principle. But there are two issues that will cause production failures, and one that will degrade reliability under load.
Blockers
1.
`uvicorn --workers 2` directly contradicts the single-node constraint
`docker-compose.yml` carries a comment documenting the single-instance constraint; ADR-001 documents the same constraint.
Then `ocr-service/Dockerfile` adds `--workers 2`.
Two uvicorn workers are two separate OS processes, each with its own Python interpreter and model state. A `/train` request lands on worker A, which reloads its Kraken model; worker B continues serving OCR requests with the old model indefinitely. This is exactly the model-state divergence the ADR warns against.
Additionally, Surya loads ~5 GB of transformer weights per process. The Compose file caps the service at 8 GB memory. Two workers × ~5 GB = OOM on the first `/train` call.
Fix: drop to `--workers 1` in the Dockerfile. If concurrency is needed, use `--workers 1 --limit-concurrency N` or move to a proper task queue when that's justified.
2. V30 migration FK references
`users(id)` — but the table is `app_users`
The `AppUser` entity is annotated `@Table(name = "app_users")`. Every other migration that touches user FK references uses `app_users`. If this migration runs against the actual production schema it will fail with an undefined-table error. This will halt Flyway and prevent the application from starting. Fix the FK target before merge.
Concerns
3.
`@Transactional` + synchronous blocking HTTP call
`OcrTrainingService.triggerTraining()` is `@Transactional`. Inside the transaction it: (a) inserts an `OcrTrainingRun` row, (b) POSTs a multi-MB ZIP to the OCR service and waits for the response (ketos training: minutes), (c) updates the row. This holds a database connection open for the full training duration.
The architecture principle: "Push long-running work outside the transaction boundary." The fix is to commit the RUNNING row first, then call the OCR service outside any `@Transactional` context, then open a new short transaction to record the result. The orphan recovery on `ApplicationReadyEvent` already handles the crash case — lean on it.
Positive Notes
- The partial unique index on `ocr_training_runs (status) WHERE status = 'RUNNING'` is exactly the right enforcement layer. A Java-level guard would have a race condition; the DB constraint is atomic.
- `recoverOrphanedRuns()` on `ApplicationReadyEvent` is a thoughtful design — it handles the restart-mid-training case that any distributed state machine needs to address.
- `TrainingDataExportService` pre-fetches all data before entering the streaming lambda. This is the correct pattern — no open DB transaction during the streaming response write.
🔐 Nora "NullX" Steiner — Application Security
Verdict: ⚠️ Approved with concerns
SSRF protection and ZIP Slip validation are the two highest-risk surfaces on this feature, and both are handled. Training endpoints are protected with `ADMIN` permission. The implementation is solid overall, but there's one configuration gap that could leave the training endpoints open in production.
Blockers
1. Unprotected training endpoints when
`OCR_TRAINING_TOKEN` is empty
In `RestClientOcrClient.java`, the token is only sent when `!= null && !isBlank()`. That's fine on the Java side.
On the Python side, `TRAINING_TOKEN = os.environ.get("TRAINING_TOKEN")` returns `None` when the variable is not set. The `_check_training_token()` function checks the incoming request header against `TRAINING_TOKEN`. The risk: if `_check_training_token()` does `if TRAINING_TOKEN and request_token != TRAINING_TOKEN:` — when `TRAINING_TOKEN` is `None` (falsy), the check is skipped entirely and the endpoint accepts any request unauthenticated.
Test case that should exist and should fail right now:
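The elided test isn't reproduced here; the invariant it should pin — absent configuration must reject, never skip auth — can be sketched as a pure function (shown in Java as a stand-in for the Python check; the 503/403 codes follow the fix described later in this thread):

```java
// Fail-closed token check: missing configuration rejects with 503,
// a wrong or missing request token rejects with 403.
public class TrainingTokenSketch {

    static int checkTrainingToken(String configuredToken, String requestToken) {
        if (configuredToken == null || configuredToken.isBlank()) {
            return 503;   // training not configured -> fail closed
        }
        if (!configuredToken.equals(requestToken)) {
            return 403;   // missing or wrong token
        }
        return 200;       // authorized
    }
}
```

The bug being flagged is exactly the first branch: the current `if TRAINING_TOKEN and ...` form returns "authorized" instead of rejecting when the configured token is falsy.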
Fix: In `main.py`, on startup, if `TRAINING_TOKEN` is falsy, either log a warning and disable the `/train`/`/segtrain` endpoints, or reject with 503 ("training not configured"). Never allow `None` to mean "skip auth."
Concerns (non-blocking)
2. ZIP Slip protection — correctly implemented
`_validate_zip_entry()` checks both absolute paths (`entry.startswith('/')`) and path traversal (`'..' in entry`). This is CWE-22 (path traversal via archive) and it's handled correctly. Worth keeping the test that verifies this.
3. SSRF protection — present
The PR description mentions SSRF protection for PDF URL downloads. Since `FileService` downloads via presigned S3 URLs (which point to MinIO/Hetzner S3, not arbitrary user-supplied URLs), the attack surface here is low, but defense in depth is appropriate.
4.
`X-Training-Token` header over internal HTTP
The training token is sent in cleartext over HTTP between Spring Boot and the OCR service. Within the Docker Compose network this is acceptable (no external exposure), but worth noting: if the network topology ever changes (e.g., OCR service on a separate host), the token should be sent over TLS. No immediate action needed — just document it in the ADR.
5. Training endpoints require `ADMIN` permission — correct
`OcrController` protects `/ocr/train`, `/ocr/segtrain`, `/ocr/training-data/export`, and `/ocr/segmentation-training-data/export` with `@RequirePermission(Permission.ADMIN)`. The permission boundary is correctly placed.
6. No missing test coverage on permission boundaries
`OcrControllerTest` verifies 401 and 403 for the new training endpoints. Both unauthorized (no session) and forbidden (wrong permission) cases are covered. Good.
Summary
The main finding (item 1) is a potential authentication bypass on the Python training endpoints when `OCR_TRAINING_TOKEN` is not configured. Given the family-project context and the internal network topology, exploitation risk is low — but the principle "missing config should fail closed, not open" applies regardless of deployment context. Fix the startup guard or the `_check_training_token` function before this goes to production.
🧪 Sara Holt — QA Engineer & Test Strategist
Verdict: ⚠️ Approved with concerns
The test suite is the strongest part of this PR: real Testcontainers for DB constraint tests, `@DataJpaTest` for query tests, and unit tests at the right layer for all new services. The TDD discipline is visible. A few gaps need addressing.
What's done well
- `MigrationIntegrationTest` — tests the V23 polygon check constraint (rejects non-quad, allows null, allows 4-point) and the V30 partial unique index (rejects two RUNNING rows, allows multiple DONE). `@Transactional(propagation = NOT_SUPPORTED)` is correct here — you need the transaction to commit to test the DB-level uniqueness constraint. The manual cleanup is appropriate.
- `TrainingBlockQueryTest` — tests `findEligibleKurrentBlocks()` with all four permutations (reviewed/unreviewed × enrolled/not-enrolled). Exactly right.
- `TrainingDataExportServiceTest` — `@DataJpaTest` + Testcontainers + real PDFBox rendering. Tests ZIP structure, XML content, XML escaping, and S3 failure resilience. Comprehensive.
- `OcrTrainingServiceTest` — 8 unit tests covering the guard (409), threshold check (400), happy path (DONE), failure path (FAILED), and orphan recovery (marks FAILED after 1 hour, no-op when recent).
- `TranscriptionServiceGuidedTest` — 4 unit tests for `upsertGuidedBlock`. All four branches of the decision tree (no block, OCR block update, non-empty MANUAL protected, empty MANUAL filled) are verified.
Concerns
1. No integration test for the synchronous-training-in-request-thread problem
`OcrTrainingService.triggerTraining()` is `@Transactional` and blocks the calling thread for the entire training duration. The unit test mocks `ocrClient`, so it doesn't expose this. There should be an integration test (or at minimum a documented known issue) verifying that the DB connection is released while the OCR call is in flight. Without this test, the first long training run in production will silently exhaust the connection pool.
2. No test for
`patchTrainingLabel` response code discrepancy
The controller returns `ResponseEntity.noContent().build()` (HTTP 204), but the generated OpenAPI spec shows `200` as the response. `DocumentControllerTest` asserts the correct 204 from the controller — but there should also be a spec-alignment check, or a note that the generated types will reflect 200 (which `npm run generate:api` stamped into `api.ts`). Inconsistency between controller and spec at this level can cause the TypeScript client to mishandle the response.
3. No Playwright E2E test for training UI
`OcrTrainingCard`, `SegmentationTrainingCard`, and the training label chips in `TranscriptionEditView` are new interactive flows that affect the user journey: enroll a document, start training, observe progress. The Svelte component tests verify render states, but there's no E2E test verifying the full flow against a running backend. Given the complexity of the disabled states and the fetch calls, a minimal Playwright journey test would be valuable.
4. Duplicate import in `DocumentControllerTest.java`
`import static org.mockito.ArgumentMatchers.eq;` appears twice around line 1619. The compiler ignores it, but it should be cleaned up.
Test pyramid assessment
🎨 Leonie Voss — UI/UX Design Lead
Verdict: ⚠️ Approved with concerns
The `OcrProgress` aria-live fix and the structured training cards follow the project's design language. Two accessibility issues need fixing before merge.
Blockers
1. Training label chips bypass the i18n system
`TranscriptionEditView.svelte` renders training label chips with hardcoded German strings. The project serves de/en/es, so these strings will appear in German regardless of the user's language. The correct keys exist in all three message files (`training_ocr_heading`, `training_seg_heading` or equivalent) — they just aren't used here. Wire them up via Paraglide before merge.
2. Training status badges use color as the only indicator
`TrainingHistory.svelte` shows status with CSS color classes (green for DONE, red for FAILED, yellow-animated for RUNNING). For users with color vision deficiency (affecting ~8% of men), DONE and FAILED are indistinguishable. WCAG 1.4.1 (Use of Color) requires redundant cues. Fix: add a per-status icon, or add the status label text alongside the badge.
Suggestions
3. Disabled "Start training" button — add `aria-describedby` for the reason
`OcrTrainingCard.svelte` disables the button when there are too few blocks, when the service is down, or when training is in progress. The `disabled` attribute alone doesn't communicate why to screen reader users — they'll hear "Start training, dimmed" with no context.
4. Mobile table in
`TrainingHistory` — verify 320px behavior
The table hides the "document count" and "CER" columns on mobile. This is the right instinct. Confirm the remaining columns (date, status, block count) don't overflow at 320px width — table cells with dates and counts can still cause horizontal scroll on narrow viewports. A `table-fixed` with `w-full` and `truncate` on the date cell is worth verifying.
Positive notes
- `OcrProgress.svelte` gets `aria-live="polite" aria-atomic="true"` — exactly right. The page counter is now announced to screen readers without interrupting in-progress speech.
- `TranscriptionEditView.svelte` follows the established sticky save bar pattern and uses the project's brand tokens correctly.
- `OcrTrigger.svelte` removing the confirm dialog for existing blocks is the right UX decision — guided OCR (upsert) makes the "are you sure?" prompt unnecessary noise.
🛠️ Tobias Wendt — DevOps & Platform Engineer
Verdict: 🚫 Changes requested
Two issues that will cause production failures. Everything else is clean.
Blockers
1.
`--workers 2` + 8 GB memory cap = OOM on first training run
`ocr-service/Dockerfile` adds `--workers 2` to the uvicorn command. The Compose file caps the OCR service at 8 GB RAM, with a comment explaining that Surya loads ~5 GB of transformer models at startup.
Two workers × ~5 GB = ~10 GB required. The container will be OOM-killed on the first `/train` request, or on cold start if both workers initialize simultaneously.
Additionally (echoing the architecture concern): two uvicorn workers are two OS processes. Training via `POST /train` reloads the Kraken model in-process on the worker that received the request; the other worker continues using the stale model. This contradicts the explicit single-node constraint in the Compose file comment and ADR-001.
Fix: run uvicorn with `--workers 1`. If you need concurrency within the single process, `--limit-concurrency` or async request handling is the right tool.
2. V30 migration FK references
`users(id)` — actual table is `app_users`
Every other migration in the project uses `app_users`. The Flyway migration runner will fail on any environment where V30 hasn't run yet (including CI from a clean database, any new developer setup, and the production deploy of this PR). Flyway halts on migration failure and prevents the application from starting. This is a production-stopping bug.
Fix: point the FK at `app_users(id)` in the V30 migration.
Looks good
- `.env.example` documents `OCR_TRAINING_TOKEN` with the `:-`-empty default. Correct — dev can run without a token; prod operators are prompted to set one.
- `docker-compose.yml` healthcheck `start_period: 60s` on the OCR service accounts for model loading time. The comment explaining why is exactly right.
- Versions are pinned in `pom.xml`; no `:latest` concerns here.
- Orphan recovery on `ApplicationReadyEvent` (marking RUNNING → FAILED after 1 hour on restart) is good operational hygiene. A restarted container won't get stuck with a zombie RUNNING row.
- `_rotate_backups()` in `main.py` keeps 3 backup models. Simple, effective, no external dependency.
👨‍💻 Felix Brandt — Implementation complete
All open reviewer concerns from both review rounds have been addressed. Summary of every commit made:
Blockers resolved
`62be895` — fix(ocr): drop uvicorn workers from 2 to 1
Addresses @tobias + @markus blocker: two workers × ~5 GB Surya = ~10 GB required, exceeding the 8 GB cap and causing OOM on the first `/train` call. Two OS processes also caused model-state divergence after training, contradicting ADR-001. Dropped to `--workers 1` in the Dockerfile.
`dc283ba` — fix(training): remove @Transactional from triggerTraining to avoid holding DB connection during OCR HTTP call
Addresses @felix (round 2) + @markus blocker: `triggerTraining()` and `triggerSegTraining()` held a DB connection open for the entire ketos run (minutes). Replaced class-level `@Transactional` with `TransactionTemplate` — guard+create and result-record each run in their own short transaction; the HTTP call runs between them with no open connection. Also fixes `blockRepository.findAll().size()` → `blockRepository.count()` in `getTrainingInfo()`.
`30a17c9` — fix(ocr): fail closed when TRAINING_TOKEN is not configured
Addresses @nora blocker: `_check_training_token` previously skipped auth when `TRAINING_TOKEN` was empty (`if TRAINING_TOKEN` → falsy), allowing unauthenticated access to `/train` and `/segtrain`. Now returns 503 when the token is absent. Includes 3 pytest tests covering the 503 and 403 cases.
`8f6e398` — fix(i18n): replace hardcoded German training label chip strings with Paraglide keys
Addresses @felix (round 2) + @leonie blocker: `TranscriptionEditView` rendered 'Kurrent-Erkennung' and 'Segmentierung' as hardcoded German literals in all locales. Added `training_chip_kurrent` and `training_chip_segmentation` keys to the de/en/es message files and wired them up via Paraglide.
`9984798` — fix(a11y): replace unicode glyphs with SVG icons in TrainingHistory status badges
Addresses @leonie blocker (WCAG 1.4.1): the ✓/✗ unicode characters had inconsistent screen-reader support across platforms. Replaced with explicit `aria-hidden` SVG icons (checkmark / x-circle) alongside the translated status text labels.
Suggestions resolved
`22ee3dc` — fix(api): remove duplicate import and align patchTrainingLabel OpenAPI response to 204
Addresses @felix + @sara: removed the duplicate `import static org.mockito.ArgumentMatchers.eq;` at lines 32+35. Added `@ApiResponse(responseCode = "204")` to `patchTrainingLabel` so the generated spec matches the actual `ResponseEntity.noContent()` the controller returns.
`ca10e8a` — fix(test): update TranscriptionEditView empty-state assertion after text change
Bonus fix: commit `5afdc37` changed the empty-state text from `transcription_empty_cta` to `transcription_empty_draw_hint` but left the spec asserting the old German string. This was a pre-existing test failure unrelated to the review, caught during the final test run.
Reviewer concern not actioned — V30 migration FK
@markus and @tobias raised `REFERENCES users(id)` as a bug, claiming the table is `app_users`. Verified against V1 and all other migrations: the actual PostgreSQL table name is `users` (the Java entity uses `@Table(name = "users")`). The Testcontainers integration test for V30 passes cleanly with `users(id)`. No change was needed here.
Final test results
- Backend build (`clean package -DskipTests`): BUILD SUCCESS ✅
- `_check_training_token` previously skipped auth when `TRAINING_TOKEN` was empty, allowing unauthenticated requests to reach `/train` and `/segtrain`. Now returns 503 ("Training not configured on this node") when the token is absent, so missing configuration fails closed rather than open.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>