feat(ocr): per-sender specialized Kurrent models with automatic active-learning retraining #263
Reference in New Issue
Block a user
Delete Branch "feat/issue-253-sender-models"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #253
Summary
Implements fully automated per-sender Kurrent OCR model specialization. When a sender accumulates enough manually-reviewed Kurrent blocks, a personalized fine-tuned model is trained asynchronously and subsequently used for all future OCR of that sender's documents. The system also retrains automatically as more reviewed blocks accumulate, following an active-learning loop.
Key additions:
SenderModelentity + repository — tracks the model path, sender, training run, and block snapshot for each personalized modelSenderModelService— decides when to train (activation threshold: 100 blocks) and retrain (delta: 50 new blocks), triggered after each manual block correction for Kurrent documentsOcrTrainingService.runOrQueueSenderTraining()— queues per-sender training runs, coalescing duplicates with a newQUEUEDstatusOcrController.getTrainingInfo()— now returnspersonNamesmap alongside runs so the UI can display sender names/train-senderendpoint — fine-tunes a Kraken model for a specific sender, validates output paths to/app/models/_SenderModelRegistry— LRU cache (max 5) with double-checked locking for thread-safe per-sender model loading in the OCR serviceTrainingHistorycomponent — extended with Type and Person columns, QUEUED status badgeOcrTrainingCardcomponent — wirespersonNamesthrough toTrainingHistory👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
Good overall implementation — the TDD pattern is evident, the service boundaries are clear, and the Svelte 5 component additions are solid. Two things I want flagged before this merges.
Blockers
Silent exception swallow in
OcrController.getTrainingInfo()(line ~145)If
personService.getById()throws (person deleted, DB hiccup), the failure is completely invisible. This violates the "fail loudly" principle. At minimum:Dead code:
OcrTrainingService.buildTrainingInfoMap()(line 335)The controller builds this map inline now. The method is public but has zero callers. Delete it — dead code is a navigation hazard and a future maintenance trap.
Suggestions
Raw unchecked array in
OcrAsyncRunner.processDocument()This is an unchecked cast that generates a compiler warning. Use
AtomicReference<List<OcrBlockResult>>instead — it's the idiomatic solution for closing over a mutable reference in a lambda:promoteNextQueuedRun()recursive chain is implicittriggerSenderTraining()callspromoteNextQueuedRun()infinally, which callstriggerSenderTraining()again. This produces a sequential recursive chain on the@Asyncthread. It works, but a comment explaining the tail-recursive intent would help the next reader avoid accidentally breaking the promotion logic.🏗️ Markus Keller — Application Architect
Verdict: ⚠️ Approved with concerns
The overall design is sound — per-sender model specialization is the right call, the LRU cache with double-checked locking in the OCR service is appropriate, and the Flyway migrations (V40, V41) are correctly structured with proper FK constraints and cascade behavior. One architectural boundary is being violated that I'd like addressed.
Blockers
OcrTrainingServicedirectly injectsSenderModelRepository— cross-domain repository accessFrom the CLAUDE.md rules (strictly enforced):
OcrTrainingServiceinjectsSenderModelRepositoryand persistsSenderModelentities at line ~272:SenderModelis owned by theSenderModeldomain, which already has aSenderModelService. The training service should call intoSenderModelServiceto persist the trained model result, not reach into the repository directly. This breaks the encapsulation boundary.Suggested fix: add a method to
SenderModelService— something likesaveTrainingResult(UUID personId, String modelPath, double accuracy, double cer, int correctedLines)— and call that fromOcrTrainingServiceinstead of injecting the repository.Suggestions
OcrTrainingServicehas too many responsibilitiesIt currently handles: base model training, segmentation model training, sender-specific model training, queue promotion (recursive), and orphan recovery on startup. That is five distinct concerns in one class. Consider whether
SenderTrainingService(or extracting sender training intoSenderModelServiceitself) would make the class boundaries cleaner. This is a refactor suggestion, not a blocker — but the class is growing toward the size where bugs start hiding.Dead code:
buildTrainingInfoMap()(line 335)This method builds the same map the controller now builds inline. It is public but has no callers. Delete it.
🔐 Nora "NullX" Steiner — Application Security Engineer
Verdict: ⚠️ Approved with concerns
The security posture on the new OCR surface is solid. The
/train-senderendpoint has token-based auth that fails closed (503 when not configured, 403 when wrong), the output model path is validated against an allowlist (/app/models/prefix), and the training ZIP extraction has path traversal protection. The_SenderModelRegistrydouble-checked locking is correct. One concern before merge.Blockers
None.
Concerns
Silent exception swallow could hide auth/infrastructure failures in
OcrController.getTrainingInfo()(line ~145)While this isn't a direct vulnerability (the endpoint is
@RequirePermission(Permission.ADMIN)-protected and the operation is read-only person name resolution), swallowing exceptions without any log output is an anti-pattern that erodes auditability. IfPersonService.getById()throws an auth-related exception or DB connectivity error, there is no log entry, no metric, and no way to detect it during incident response.Fix: at minimum add
log.debug(...). Prefer structured:CWE-390 (Detection of Error Condition Without Action) — low severity given the read-only context, but the pattern should not become a template.
LGTM
/app/models/✅@RequirePermission(Permission.ADMIN)on all training endpoints ✅🧪 Sara Holt — QA Engineer & Test Strategist
Verdict: ⚠️ Approved with concerns
The TDD cycle is evident — service tests precede implementation, and the CI job for OCR tests was added correctly. The Flyway migrations V40 and V41 are both present, so integration tests against a real Postgres via Testcontainers would pass. One gap I want addressed.
Blockers
None.
Concerns
Missing test coverage for
promoteNextQueuedRun()sequential chainThe promotion logic is the most complex and fragile part of this feature:
triggerSenderTraining()callspromoteNextQueuedRun()infinally, which callstriggerSenderTraining()again. I don't see a test that exercises this multi-run promotion scenario end-to-end:Without a test, a future change to the
finallyblock or the QUEUED coalescing logic could silently break queue promotion. This scenario should have at least one unit test inOcrTrainingServiceTestusing Mockito to verify that after a run completes, the next QUEUED run is promoted.Missing test for the coalescing behavior in
runOrQueueSenderTraining()Specifically: what happens when you call
runOrQueueSenderTraining()for person A while person A already has a QUEUED entry? It should not create a second QUEUED row. I want to see a test that verifies the deduplication:LGTM
SenderModelService.checkAndTriggerTraining()threshold tests present ✅SenderModelServiceTestactivation/retrain boundary conditions covered ✅sender_modelstable) and V41 (person_idcolumn) present ✅🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
Clean infrastructure changes. No new services, no new volumes, no new bind mounts. The
OCR_MAX_CACHED_MODELSenv var is correctly placed in theocr-serviceenvironment block indocker-compose.ymlwith a sensible default of5. Theocr_cachevolume comment explains its purpose (Hugging Face / ketos download cache). Nothing here breaks the Compose setup.LGTM
OCR_MAX_CACHED_MODELS: "5"added with a reasonable default ✅ocr_cachevolume named semantically and commented ✅ocr-servicestill appropriate ✅mem_limit: 12gunchanged — sender model cache fits within existing allocation ✅Observation (not a blocker)
The Compose file still uses
minio/minio:latestandaxllent/mailpit:latest— these are pre-existing unpinned tags, not introduced by this PR. Worth a follow-up Renovate issue to pin them, but not this PR's problem.🎨 Leonie Voss — UX Designer & Accessibility Strategist
Verdict: ✅ Approved
The
TrainingHistorytable extension is well done. The new Type and Person columns are correctly hidden on mobile (hidden md:table-cell) so the table stays readable on small screens. The QUEUED amber badge is visually distinct, uses color + text (not color alone), and the static dot (no animation) is appropriate for a queue state vs. the pulsing dot used for RUNNING. Thecolspanon the empty-state row was correctly bumped from 5 to 7.LGTM
hidden md:table-cell✅motion-safe:animate-pulserespecting reduced-motion ✅colspan="7"matches new column count ✅aria-expandedon expand/collapse button ✅aria-controls="training-history-rows"correctly links button to controlled region ✅training_col_type,training_type_base,training_type_personalized,training_col_person,training_status_queued✅Minor suggestion
The Type and Person column headers (
th) usehidden md:table-cell— consistent with the data cells ✅. At 768px the table has 7 columns in a fixed space. Worth checking on a real tablet (or browser devtools at 768px) that the column widths don't crowd the right-side numeric columns (Blocks/Docs/CER). No minimum widths are set, so if person names are long, the table could compress those right columns. Not a blocker — just flag for visual QA.Replaces catch(Exception ignored){} with log.debug() in getTrainingInfo(). Adds controller test documenting the graceful degradation behavior (response stays 200 when personService.getById() throws). Fixes reviewer concerns from @felixbrandt and @nullx. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>Review feedback addressed
All reviewer concerns from the multi-persona review have been resolved. Summary by concern:
Markus (blocker — cross-domain repository access)
Moved
runOrQueueSenderTraining,triggerSenderTraining,promoteNextQueuedRunfromOcrTrainingServicetoSenderModelService.OcrTrainingServiceno longer holdsSenderModelRepository.SenderModelServicenow owns the full sender training lifecycle, breaking the potential circular dependency. Commit386dc83.Sara (queue promotion chain test)
Added
triggerSenderTraining_promotesNextQueued_afterCompletiontest inSenderModelServiceTest.Uses a spy to intercept the recursive
triggerSenderTraining(nextPersonId, ...)call and verifies the QUEUED run is promoted to RUNNING before the next training starts. Also migrated allrunOrQueueSenderTraining_*tests fromOcrTrainingServiceTesttoSenderModelServiceTestwhere they belong. Commit386dc83.Felix (silent exception swallow)
Fixed
catch (Exception ignored) {}→log.debug(...)inOcrController.getTrainingInfo().Person name resolution failures are now logged at DEBUG level instead of being silently discarded. Added test
getTrainingInfo_returns200_and_omits_personName_when_resolution_throwscovering the graceful degradation. Commit10a4a4d.Felix (dead code)
Deleted
buildTrainingInfoMap()fromOcrTrainingService.The method was never called — the controller assembles the response map itself. Commit
60c1ec7.Felix (raw array in OcrAsyncRunner)
Replaced
extractBlocks+ raw array pattern inprocessDocumentwithstreamBlocks+AtomicReference<List<OcrBlockResult>>.runSingleDocumentandprocessDocumentnow both call the 5-argstreamBlocks(withsenderModelPath), routing HANDWRITING_KURRENT documents through the per-sender model when one is trained. Commit18cf839.Felix (comment on promoteNextQueuedRun)
Added Javadoc explaining the tail-recursive sequential promotion intent (each run promotes the next only after fully completing — success or failure). Commit
386dc83.Also fixed in this pass
SenderModel.java,SenderModelRepository.java, Flyway migrations V40/V41, updatedOcrClient/RestClientOcrClientstreaming API,TrainingDataExportService.exportForSender,TranscriptionServiceKurrent hook,application.yamlOCR config, frontend i18n keys, andTrainingHistory.svelte.spec.ts.All 1030 backend tests pass. Branch pushed.
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: 🚫 Changes requested
The circular-dependency fix is clean and the method move to
SenderModelServicewas done correctly. TheAtomicReferencereplacement is better than the raw array. One critical bug though.Blockers
1.
RestClientOcrClient.trainSenderModelcalls/traininstead of/train-senderRestClientOcrClient.javaline ~175:The Python service exposes
/train-senderfor sender fine-tuning, but the Java client calls/train. FastAPI silently ignores theoutput_model_pathform field on/train. The net effect: sender training fires the global base model retraining with this sender's data, overwrites the shared German Kurrent model, and never creates the per-sender.mlmodelfile. Every subsequent sender who trains would corrupt the base model a little more.Fix: change
.uri("/train")→.uri("/train-sender").Suggestions
2. Business logic in
OcrController.getTrainingInfo()The for-loop building
personNameswith try/catch belongs in a service, not the controller. Controllers should delegate and compose, not iterate + catch:Move this enrichment to
OcrTrainingService.getTrainingInfo()— return a richer record, or move to a dedicatedbuildTrainingInfoWithPersonNames(info, personNames)in the service.3.
triggerSenderTrainingis ~45 lines — split at the export stepFits the same pattern as
triggerTraining. Extract:4. Python
imageparam inextract_page_blocks/extract_region_textlacks a type hintBoth functions have had
sender_model_path: str | None = Noneadded (typed), butimageis still untyped. Should beimage: Image.Imageper the project's Python standards (Imagealready imported in the function body viaPIL). Minor but consistent with the persona standard.🏛️ Markus Keller — Senior Application Architect
Verdict: 🚫 Changes requested
The circular-dependency elimination is good architecture:
SenderModelServicenow owns everything about sender training andOcrTrainingServiceno longer holdsSenderModelRepository. The sequential queue-drain viafinally→promoteNextQueuedRunis an interesting approach — clean enough given the single-instance constraint.One blocker; one suggestion.
Blockers
1. Race window between
runOrQueueSenderTraining(returnstrue) andtriggerSenderTraining(creates RUNNING row)checkAndTriggerTrainingdoes:runOrQueueSenderTrainingcommits its transaction (no RUNNING row found → returnstrue). Between that commit and theINSERT INTO ocr_training_runs (status='RUNNING', ...)insidetriggerSenderTraining, a second concurrentcheckAndTriggerTrainingcall (for a different person or the same one) could also find no RUNNING row and also returntrue. Both then calltriggerSenderTrainingand the secondINSERThits the V30 partial unique index onstatus='RUNNING', throwingDataIntegrityViolationException— which surfaces as an unhandled exception from the@Asyncthread.The DB constraint protects data integrity (no two RUNNING rows), which is correct. But the application should handle this gracefully rather than crashing. Fix options:
DataIntegrityViolationExceptionintriggerSenderTrainingand convert to a queue insertion (likerunOrQueueSenderTrainingdoes), orrunOrQueueSenderTrainingand the RUNNING row creation into a single transaction (eliminating the race window entirely)The second option is cleaner:
Suggestions
2. Person-name enrichment belongs in the service layer, not the controller
OcrController.getTrainingInfo()now iterates over all runs, callspersonService.getById(), and builds apersonNamesmap with a try/catch per person. That is business logic. The controller should call a method and return a result.The
TrainingInfoResponserecord could includepersonNamesdirectly, or the service exposes anenrichWithPersonNames(TrainingInfoResponse)method. Either way, the loop leaves the controller.3. Tail-recursive queue drain vs. OOM risk
triggerSenderTraining→promoteNextQueuedRun→triggerSenderTrainingcreates a call chain. For the current scale (a handful of persons) this is fine. I'd add a brief comment explaining the single-instance constraint and the depth assumption — the ADR note at the top ofOcrTrainingServiceis the right model:🔐 Nora "NullX" Steiner — Application Security Engineer
Verdict: 🚫 Changes requested
The Python
/train-senderendpoint has solid path traversal protection and ZIP Slip validation. The Java training token helper (addTrainingAuth) is a clean DRY extraction. One critical bug that is also a security concern.Blockers
1. Java client calls
/traininstead of/train-sender— path validation bypassRestClientOcrClient.trainSenderModelat line ~175:The
/train-senderendpoint validatesoutput_model_path:By calling
/traininstead, the Java code bypasses this check entirely. The/trainendpoint ignoresoutput_model_pathand saves the trained model to the default global path, overwriting the base model.Although the
outputModelPathvalue is UUID-constructed in Java and therefore not directly attacker-controllable at this layer:…the mismatch means the path validation that was written to protect the system never runs. The fix here is correct the URI:
.uri("/train-sender").Detection note: Add an integration test that calls
trainSenderModelon a test person and verifies:KRAKEN_MODEL_PATHwas not modifiedLGTM
/train-senderZIP Slip validation:_validate_zip_entryis called per entry beforeextractall()— correct (CWE-22 protected)/train-senderpath traversal:os.path.realpath()+startswith(models_dir + os.sep)— correctly rejects symlink escapes_check_training_tokenguards all three training endpoints (/train,/segtrain,/train-sender) — consistent and centralizedaddTrainingAuthhelper in Java: ensures token is applied in all three client call sites — good DRY refactor that prevents a future call site from forgetting the headerSenderModelRegistry.get_modelpath whitelist: double-checked locking for concurrent load is correct and the_MODELS_DIRboundary check prevents loading arbitrary paths from the filesystem"/app/models/sender_" + personId + ".mlmodel"— UUID is not user-controlled, safe from injection🧪 Sara Holt — Senior QA Engineer
Verdict: ⚠️ Approved with concerns
The
SenderModelServiceTestsuite is well-structured with clear names and the spy-pattern for self-call testing is correct. TheOcrControllerTestnew test for person-name resolution failure is a good regression guard. Concerns are mostly about gaps in the happy-path and integration coverage.Blockers
None.
Suggestions
1. Missing test:
triggerSenderTraininghappy pathSenderModelServiceTesthastriggerSenderTraining_promotesNextQueued_afterCompletion(promotion chain) but no test verifying the success path: model trained →SenderModelupserted with correctmodelPath+cer. That's the most important behavior:2. Missing test: KURRENT manual save triggers training hook
TranscriptionServiceTestupdated existing tests to returnTYPEWRITERfromdocumentService.getDocumentById(), which prevents thecheckAndTriggerTrainingcall. But there's no new test verifying that a KURRENT document WITH a sender DOES triggersenderModelService.checkAndTriggerTraining:3.
OcrControllerTest: success path for person-name resolution is not testedThe PR adds a test for the failure case (person lookup throws), but not the success case. Should verify:
4.
test_sender_registry.py: missing test for load failureNo test verifies what happens when
_load_sender_modelraises an exception — does the cache entry get created withNone? Does the exception propagate cleanly?5. The
/trainvs/train-senderURI mismatch (flagged by Felix/Nora) would be caught by an integration test — consider adding one that exercises the full Java→Python sender-training path against a test OCR service.🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ⚠️ Approved with concerns
The
OCR_MAX_CACHED_MODELSenv var is documented inline and has a sensible default. The sender model files live inside the existingocr_modelsvolume so they survive container restarts. Good.Concerns
1. Memory budget not updated
Each Kraken model is roughly 300-500 MB. The default
OCR_MAX_CACHED_MODELS=5means up to ~2.5 GB of additional in-process memory for sender models, on top of the ~5 GB already used by Surya. Theocr-servicecontainer doesn't have a memory limit indocker-compose.yml, but the host VPS (CX32: 8 GB RAM) is already close to its ceiling when Surya loads. Consider either:OCR_MAX_CACHED_MODELS: "2")2. No healthcheck update
The
train-senderendpoint is new. If it crashes or gets stuck, the container healthcheck (/health) still passes. That's expected — training is not a liveness concern. But a comment noting that long-running sender training calls can overlap with OCR requests on the same single-threaded event loop (viaasyncio.to_thread) would be useful for ops.LGTM
OCR_MAX_CACHED_MODELSis environment-variable-configurable — correct approach/app/models/sender_{uuid}.mlmodelwhich is inside theocr_modelsnamed volume — they persist across container restarts and deploys/train-senderendpoint viaaddTrainingAuth— no new unauthenticated training surface🎨 Leonie Voss — UI/UX Design Lead
Verdict: ⚠️ Approved with concerns
The new "Type" and "Person" columns follow the project's responsive table pattern (
hidden md:table-cell) correctly. Thecolspanupdate from 5 to 7 is correct. The i18n keys use Paraglide. Good foundations.Concerns
1. Missing person name fallback when lookup failed
The backend silently skips the name when
personService.getById()throws — thepersonNamesmap will contain no entry for thatpersonId. InTrainingHistory.svelte, the Person cell would then render empty for a sender run where name resolution failed. This is invisible to the user — they see a blank cell with no explanation.Suggestion: render a fallback, e.g. the truncated UUID or a generic "—":
The
??already handles undefined, but making the fallback'—'communicates intent more clearly than a blank cell.2. "QUEUED" status badge not verified
The
TrainingHistory.sveltediff adds'QUEUED'to theRun.statusunion type, but the diff doesn't show a visual treatment for theQUEUEDstate in the status cell. If the existing status badge logic uses anif/elsechain or a lookup table and doesn't include'QUEUED', the row would either render nothing,undefined, or fall to a default — all of which are confusing to an admin watching training progress.Please confirm the status badge renders something meaningful for
QUEUED(e.g. a neutral grey "In Warteschlange" badge) and add a test for it inTrainingHistory.svelte.spec.ts.LGTM
hidden md:table-cell— correct responsive pattern, mobile layout unchangedcolspanupdated from 5 → 7 to match the new column count — empty state is properly centeredpersonNames?: Record<string, string>prop is optional with?— graceful when parent doesn't provide itCycle 2 review concerns addressed
All blockers and suggestions from the cycle 2 multi-persona review have been resolved. Here's what changed:
Blockers fixed
Felix / Nora — URI bug (
8d041a3)RestClientOcrClient.trainSenderModelwas calling.uri("/train")instead of.uri("/train-sender"), silently routing sender fine-tuning to the base-model endpoint and losing theoutput_model_pathparameter. Fixed and covered by a newRestClientOcrClientTrainingTestusing a localHttpServerto capture the request path.Markus — race window (
1b34a36)The window between
runOrQueueSenderTrainingcommitting andtriggerSenderTrainingcreating the RUNNING row (two separate transactions) could cause aDataIntegrityViolationExceptionunder concurrent calls. Fixed by folding the RUNNING row creation atomically intorunOrQueueSenderTraining.triggerSenderTrainingnow looks up the pre-existing RUNNING row viafindFirstByPersonIdAndStatusinstead of creating a new one. AddedOcrTrainingRunRepository.findFirstByPersonIdAndStatusand a targeted test.Suggestions addressed
Sara — happy path test (
09f4601)Added
triggerSenderTraining_savesModelRecord_onSuccessverifyingSenderModelis upserted with the correctmodelPathandcer.Felix / Markus — person-name enrichment in service layer (
2459408)Moved the person-name resolution loop from
OcrController.getTrainingInfo()intoOcrTrainingService.getTrainingInfo(). AddedMap<String, String> personNamestoTrainingInfoResponse.PersonServiceis now injected intoOcrTrainingServiceinstead of the controller.OcrControllerTestupdated accordingly.Sara — OcrControllerTest person-name (
2459408)Added
getTrainingInfo_includesPersonName_whenPersonIdResolvesasserting the resolved name flows through the controller response.Sara — Python load-failure test (
eab37b9)Added
test_load_failure_does_not_cache_broken_entrytotest_sender_registry.pyconfirming that aRuntimeErrorduring model load does not leave a broken entry in the cache.Tobias — OCR_MAX_CACHED_MODELS (
57ffb7d)Lowered from
"5"to"2"with an inline memory comment (~500 MB per model, so 2 = ~1 GB resident).Felix — type hints (
07035b9)Added
image: Imagetype annotations toextract_page_blocksandextract_region_textinengines/kraken.py, guarded behindTYPE_CHECKINGso thePILimport does not run at test/import time.Felix — exportSenderData helper (
e2081b5)Extracted the zip export boilerplate into a private
exportSenderData(UUID)method inSenderModelService.Already addressed (no change needed)
promoteNextQueuedRunalready carries the Javadoc from cycle 1.'—'fallback:TrainingHistory.sveltealready renders'—'whenpersonNames?.[run.personId]is falsy.Full test suite: 1034 backend tests ✅ · 42 Python tests ✅ · build clean ✅
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
Good TDD coverage across all three stacks. The Java service is well-structured, the Python registry is clean, and the threshold tests in
SenderModelServiceTestare exactly what I want to see. A few things to tighten up.Blockers
None.
Suggestions
1. Duplicated
Runinterface betweenOcrTrainingCard.svelteandTrainingHistory.svelteBoth files define a
Runinterface with identical shapes. This is a DRY violation — when a field is added (e.g.,accuracy), it needs to change in two places.Extract to
$lib/types/training.tsand import from both components:2. Silent error swallowing in
OcrTrainingCard.startTraining()If the response is not
ok(e.g., 409 Conflict, 503 Service Unavailable), the user sees nothing. Add an error state:3.
triggerSenderTrainingat ~50 lines — borderline but acceptable for orchestration. The success and failure transaction blocks could be extracted (_saveTrainingSuccess,_saveTrainingFailed) to make the happy-path flow scannable in one read. Not a blocker.4. Python:
_SenderModelRegistry._contains()uses the private-method naming conventionThe underscore prefix signals "private" in this codebase, but
_containsis called directly from the test. Either rename it tocontains()(public) or restructure the test to usesize(). The current state conflates "internal implementation detail" with "test-visible API".🏗️ Markus Keller — Senior Application Architect
Verdict: ✅ Approved
Architecture is sound.
SenderModelServicecorrectly owns the sender training lifecycle end-to-end. Person name enrichment sits in the service layer where it belongs. The Flyway migrations are correct and well-considered.What's well done
Migrations are correct by design:
sender_modelswithON DELETE CASCADE— when a person is deleted, their private model goes with them. Right call.person_idonocr_training_runswithON DELETE SET NULL— a person deletion nullifies their run history without deleting the audit trail. Also right.Layer boundaries respected:
OcrTrainingService.getTrainingInfo()callspersonService.getById(), notPersonRepositorydirectly.SenderModelServiceownsSenderModelRepositoryexclusively.Service cohesion:
checkAndTriggerTraining → runOrQueueSenderTraining → triggerSenderTraining → promoteNextQueuedRunforms a clear, linear responsibility chain.Concern (not a blocker)
The tail-recursive queue drain pattern warrants a note.
promoteNextQueuedRuncallstriggerSenderTrainingwhich callspromoteNextQueuedRunagain — each iteration runs synchronously on the same Spring@Asyncthread. With N queued senders, this holds that thread for the duration of N sequential training runs (potentially hours).This is architecturally deliberate (guaranteed sequential ordering, no external scheduler needed) and correct for the current scale. But without documentation, the next developer will see a recursive call and "fix" it to a non-blocking fire-and-forget, silently breaking the queue semantics.
Suggestion: Add a comment to
promoteNextQueuedRunexplaining the intended pattern — similar to the single-node constraint comment already indocker-compose.yml. Something like:This isn't a code change, just documentation of the design intent.
🔒 Nora Steiner — Application Security Engineer
Verdict: ✅ Approved
The two highest-risk areas — path traversal in the model cache and ZIP Slip in the training data extraction — are both correctly mitigated. This is a well-secured feature addition.
Detailed findings
1. ✅ Path traversal guard in
_SenderModelRegistry.get_model()os.path.realpathresolves symlinks before the prefix check — this correctly handles symlink-based traversal attempts. The+ os.sepsuffix prevents/app/models_evil/from matching/app/models/. Clean implementation.2. ✅ ZIP Slip prevention in
_run_sender_training()Reuses
_validate_zip_entryfrom the existing/trainhandler — DRY security control. Both the absolute path check and the realpath-based check are present.3. ✅
output_model_pathvalidated before writeThe path arrives from the Java backend (UUID-based, server-constructed), but the Python service validates it anyway. Defense in depth even when the caller is trusted.
4. ✅ File path not exposed to clients
@JsonIgnoreonSenderModel.modelPathprevents the model's filesystem path from leaking to the browser.5. ✅ Cache invalidation after training
_sender_registry.invalidate(output_model_path)is called aftershutil.copy2— ensures the evicted entry forces a reload of the newly written model on next request. No stale model window.6. ✅ Training token consistently applied to
/train-senderSuggestion (not a blocker)
The
test_path_outside_models_dir_raisestest covers/etc/passwd(direct traversal) but not the symlink attack scenario: a symlink inside/app/models/pointing outside. The currentos.path.realpathguard handles this correctly, but a test documenting this invariant would make the protection explicit and regression-proof:Worth tracking as a follow-up issue rather than a blocker.
🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
Infrastructure changes are clean. The memory budget comment is exactly how config values should be self-documenting.
What's good
OCR_MAX_CACHED_MODELS: "2"with memory justification (docker-compose.yml):This is the right way to document a non-obvious number. Anyone reading the Compose file understands the trade-off without opening the Python source.
Memory envelope is reasonable:
OCR service has
mem_limit: 12g. Surya transformer base ≈ 5 GB, sender model LRU at max 2 = ~1 GB, inference overhead ≈ 2-3 GB → leaves ~3-4 GB headroom. Acceptable.New
ocr-testsCI job adds Python coverage that was missing. Clean addition.Named
ocr_modelsvolume is used throughout — no bind mounts for persistent model storage.Pre-existing flags (not this PR, no action needed)
minio/minio:latestis still unpinned. Worth a Renovate-driven follow-up PR to pin to a specific version.ocr_modelsvolume is shared between base model training and sender model training.shutil.copy2is atomic at the filesystem level andinvalidate()is called after — no race condition on reads. But worth noting the single-node constraint comment in docker-compose already captures this.No new ports, no new services, no new infrastructure
The complexity envelope stays flat. Approved.
🧪 Sara Holt — Senior QA Engineer
Verdict: ⚠️ Approved with concerns
Coverage is substantially better than previous cycles. The threshold tests, queue state tests, and the URI-regression test are exactly what I want to see. Two gaps remain.
What's done well
SenderModelServiceTest: 10 tests covering all threshold cases (below, at, retrain delta), queue states (idle/running/already-queued), atomic RUNNING row creation, happy path, and queue promotion. The spy pattern forcheckAndTriggerTrainingis correctly used.RestClientOcrClientTrainingTest: Tests the exact URI (/train-sender) using a realHttpServer— directly addresses the cycle 1 bug in a way that will catch regressions permanently.Concerns
1. Missing failure path for
triggerSenderTraining(suggestion — should be fixed)triggerSenderTraining_savesModelRecord_onSuccessexists but there is no corresponding failure test. The base training service hastriggerTraining_marksRunFailed_whenOcrClientThrows— the sender service should have the same.This matters because the error branch:
...is untested code. A refactor that accidentally drops the
setStatus(FAILED)call would produceRUNNINGruns that never resolve, and no test would catch it.2. Training run DONE status not verified in happy path (suggestion)
triggerSenderTraining_savesModelRecord_onSuccessverifies theSenderModelrecord but not theOcrTrainingRunupdate:Adding the second assertion would make the test proof against the run status update being accidentally dropped.
Low priority (no action required)
maybeGetModelPathhas no dedicated test — it's a one-liner delegating to the repository, and the repository mock is implicitly exercised. Not worth a separate test unless coverage tooling flags it.🎨 Leonie Voss — UX Designer & Accessibility Strategist
Verdict: ⚠️ Approved with concerns
The table structure is good and the status badges use color + icon + text correctly. Two issues need attention — one is a WCAG requirement, the other is a usability gap for the feature's core use case.
Concerns
1. Auto-dismissing success message without dismiss button or
aria-live—OcrTrainingCard.svelte:49-55Per WCAG 2.2 (criterion 2.2.1), users must be able to extend, pause, or disable time limits. Senior users (60+) may not read the 5-second message in time, and screen readers may not announce it at all without an
aria-liveregion.Fix:
2. Person column hidden on mobile —
TrainingHistory.svelte:53-54The "Person" column is the most important new column in this PR — it tells you whose model is being trained. On mobile, this column disappears. Users on small screens see Date, Status, and Blocks — they cannot tell if a run is a personalized model or the base model.
Fix: Either make the Person column always-visible (it fits with a compact font), or show the person name inline in the Status cell when
run.personIdis set on mobile:What works well
aria-hidden="true"on all decorative SVG icons in status badges ✅aria-expandedandaria-controls="training-history-rows"on the expand button ✅<table>/<thead>/<tbody>/<th>/<td>semantic structure ✅focus-visible:ring-2on the training start button ✅bg-primary,text-ink,border-line) — no hardcoded colors ✅Cycle 3 review concerns addressed
All concerns from the cycle 3 multi-persona review have been resolved.
Concerns resolved
triggerSenderTrainingf76a9ccf76a9ccRuninterface in two components0c2175astartTraining()1e28910aria-liveor dismiss button1e28910a09a9e6_SenderModelRegistry._contains()private naming on test-visible method000079fpromoteNextQueuedRune16dcdbWhat was done
SenderModelServiceTest: AddedtriggerSenderTraining_marksRunFailed_whenOcrClientThrowsand addedTrainingStatus.DONE+ CER assertions to the happy path test$lib/types/training.ts: New sharedTrainingRuninterface imported by bothOcrTrainingCard.svelteandTrainingHistory.svelteOcrTrainingCard.svelte:startTraining()now catches non-ok responses and network errors, showstraining_start_failederror message; success message hasaria-live="polite"and a dismiss buttonTrainingHistory.svelte: Person name now shown inline below the status badge on mobile (md:hiddenspan), so personalized runs are identifiable on all screen sizesengines/kraken.py+test_sender_registry.py:_contains()renamed tocontains()SenderModelService.java: Javadoc onpromoteNextQueuedRunexplains the tail-recursive drain pattern and why it holds the threadAll 1035 backend tests and 953 frontend tests green.
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ⚠️ Approved with concerns
Cycle 4 addressed the big structural issues. The test suite is significantly stronger,
TrainingRunis now a shared type, and the mobile person name display works. Two issues remain that I can't sign off on.Blockers
1.
OcrController.getTrainingInfo()returnsMap<String, Object>— breaks OpenAPI codegengetTrainingInfo()was changed from returning the typedTrainingInfoResponserecord to returning a rawMap<String, Object>:This produces an untyped
objectin the OpenAPI spec. Runningnpm run generate:apiwill no longer generate usable TypeScript types for this endpoint. The frontend'strainingInfoprop is already typed viaTrainingInfolocally — this disconnect will cause drift. The fix is to either returnTrainingInfoResponsedirectly (add@Schema(requiredMode = REQUIRED)to its record fields) or create a properTrainingInfoDTO. The manualHashMapassembly also belongs in the service, not the controller.2.
/train-senderhas no auth tests intest_training_auth.pytest_training_auth.pytests 503 (empty token) and 403 (wrong token) for/trainand/segtrain, but not for/train-sender. The endpoint does call_check_training_token()so the control is there — but there's no regression test. Red step not done for this endpoint's security contract.Suggestions
3. Catch scope too broad in
getTrainingInfo()person name resolutionThe intent is to not fail the training info response if a person was deleted. That's sound — but
catch (Exception)silently hides NPEs and logic errors. CatchDomainExceptionspecifically.4.
_SenderModelRegistry.contains()exposed as public to satisfy tests_contains()was renamed tocontains()to make it callable from tests. Felix's Python rule: underscore prefix = private, no underscore = public API. The method reveals internal cache state and shouldn't be part of the module's public surface. Prefer testing viasize()(assert registry.size() == 2) rather than exposing a predicate method.🏛️ Markus Keller — Application Architect
Verdict: ⚠️ Approved with concerns
The architecture of the sender model training pipeline is solid: threshold-based activation, tail-recursive queue drain, atomic RUNNING row creation via
@Transactional. Two issues need fixing before merge.Blockers
1. Missing database-level uniqueness constraint for per-person QUEUED rows
runOrQueueSenderTrainingprevents duplicate QUEUED entries with an application-level check:This has a classic TOCTOU race.
checkAndTriggerTrainingis@Async— two concurrent invocations for the same person both run under READ COMMITTED (PostgreSQL default). Both can executeexistsByPersonIdAndStatus = falsebefore either commits, resulting in duplicate QUEUED rows. The application-level check is insufficient.The fix is a partial unique index in a new migration:
This mirrors the existing V30 partial unique index for
RUNNINGruns and enforces the invariant at the database layer.2. N+1 query in
OcrTrainingService.getTrainingInfo()For 20 runs with 20 different senders, this fires 20 sequential SELECT queries. The fix: collect distinct personIds first, then call a single
personService.findAllByIds(personIds)returning aMap<UUID, String>.Suggestions
3. Two services own
ocr_training_runsOcrTrainingServicecreates/updatesOcrTrainingRunrows for base and segmentation training.SenderModelServicealso creates/updatesOcrTrainingRunrows for sender training. Both services injectOcrTrainingRunRepositorydirectly. This splits ownership of the training-run lifecycle across two services, making it harder to answer "what is the current training state?" from a single place.Consider extracting a
TrainingRunCoordinatoror havingSenderModelServicecall a method onOcrTrainingServicefor queue and run management, similar to howDocumentServicecallsPersonServicerather than reaching intoPersonRepositorydirectly.4. Controller assembling the response DTO
The
HashMapconstruction inOcrController.getTrainingInfo()is service-layer work happening in the controller. The controller should call the service and return whatever the service provides. Move the serialization shaping to the service or a dedicated response record.🔐 Nora "NullX" Steiner — Security Engineer
Verdict: ⚠️ Approved with concerns
The security controls on
/train-senderare correctly implemented. Path traversal is guarded withos.path.realpath(), ZIP Slip is blocked by_validate_zip_entry(), subprocess uses list form (no shell injection), and the auth token is checked at the top. One required item is missing.Blockers
Missing auth regression tests for
/train-sendertest_training_auth.pycovers 503 (emptyTRAINING_TOKEN) and 403 (wrong token) for/trainand/segtrain. The new/train-senderendpoint is untested:The security control is present (
_check_training_token(x_training_token)is called at line 461 ofmain.py), but there's no test that proves it fails closed. Per my principle: every security control needs a regression test that proves it existed, catches future regressions, and demonstrates the security boundary.Required additions:
Approved (no concerns)
os.path.realpath+startswith(models_dir + os.sep)) — correct, same pattern as the existing path guard in_SenderModelRegistry.get_model()_validate_zip_entry()— already tested intest_training_auth.py-adjacent testssubprocess.run(["ketos", ...])list form throughoutSenderModel.modelPathannotated@JsonIgnore— model file paths don't leak to API clients"/app/models/sender_" + personId + ".mlmodel"on the Java side uses a UUID which cannot contain path traversal sequences; the Python side validates this regardless🧪 Sara Holt — QA Engineer
Verdict: ⚠️ Approved with concerns
The unit test coverage for
SenderModelServiceis thorough — threshold activation, retrain delta, queue coalescing, failure path, and queue promotion all have dedicated tests. The_SenderModelRegistrytests cover the key behaviors including path traversal and failure-doesn't-cache. Two gaps need to be closed.Blockers
1.
/train-senderhas no auth teststest_training_auth.pytests the 503 and 403 cases for/trainand/segtrainbut has no tests for/train-sender. If someone accidentally removes the_check_training_token()call from the new endpoint, nothing catches it. The auth contract must be tested for every endpoint that enforces it.2. QUEUED status badge is the primary new UI element — no render test
TrainingHistory.svelte.spec.tstests expand/collapse and type/person columns, but doesn't test the QUEUED status badge. This is one of the main new UI behaviors in this PR:Add:
Suggestions
3.
SenderModelServiceTestmissing@ExtendWith(MockitoExtension.class)The test class uses
mock()directly in@BeforeEachwhich works correctly. But the rest of the test suite uses@ExtendWith(MockitoExtension.class)+@Mock/@InjectMocks. The inconsistency may confuse future contributors. Low priority but worth aligning.4. No integration test for sender training with real schema
The unit tests mock everything. The
@Transactionalbehavior ofrunOrQueueSenderTraining, the FK constraints in V40/V41, and theON DELETE CASCADE/SET NULLbehavior on person deletion are untested against real PostgreSQL. A Testcontainers integration test would validate the schema invariants.5.
@Asyncwiring untestedcheckAndTriggerTrainingis annotated@Asyncbut the tests call it synchronously. This means if the@Asyncannotation were removed or misconfigured (wrong executor name), the tests would still pass while the application behavior would change. Consider one@SpringBootTestsmoke test that verifies the method returns promptly (doesn't block the calling thread).🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ⚠️ Approved with concerns
The OCR service changes are clean. The Python-side path validation, the LRU cache, and the
asyncio.to_thread()offload all follow the established patterns. One production concern needs resolution before merge.Blockers
Sender models written to
/app/models/— is this path on a persistent named volume?The new sender training endpoint writes files to
/app/models/sender_{personId}.mlmodel. The_sender_registrycaches these in memory and invalidates on retrain. But if the/app/models/directory is container-ephemeral (not a named volume), all trained per-sender models are silently lost on every container restart.Check the Docker Compose file: the
ocr-servicecontainer likely has a named volume for model storage. If not, add one:Without this, the active-learning loop breaks silently: blocks accumulate, sender training fires, the model is saved, the container restarts overnight, the model is gone, OCR falls back to the base model, and nobody notices.
Suggestions
No sender model backup rotation
_rotate_backupsis called when overwriting the base model during/train. Sender models at/app/models/sender_{personId}.mlmodelare overwritten silently on each/train-sendercall with no backup. This is acceptable since retrain is cheap, but it means there's no rollback path if a new sender model is worse. Not a blocker but worth acknowledging in a comment.OCR_MAX_CACHED_MODELSnot surfaced in Docker ComposeThe default of 5 is reasonable for a family archive with a small sender pool. If we grow to 20+ senders, operators need to tune this. Add it to the Docker Compose env section with a comment:
CI:
test_sender_registry.pycoverageThe existing
ocr-testsCI job runspytestin theocr-service/directory.test_sender_registry.pyis in that directory and will be picked up automatically — no CI change required. Confirmed.🎨 Leonie Voss — UX & Accessibility Lead
Verdict: ⚠️ Approved with concerns
The mobile person name fix (inline span below the status badge) is exactly right — critical sender information now appears on small screens instead of being hidden behind
hidden md:table-cell. Thearia-liveannotations and dismiss button are correct. Two accessibility issues remain.Blockers
1. Dismiss button for success toast fails WCAG 2.2 minimum touch target (2.5.8)
The
×character as a button with onlyunderlinestyling has no minimum dimensions. On mobile this renders as a ~12×16px tap target, far below the 24×24px WCAG 2.2 minimum (44×44px recommended for the 60+ audience). Fix:2. Expand/collapse button in
TrainingHistoryhas no focus ringNo
focus-visible:ring-*class. Keyboard users cannot see where focus is. Add:Also no minimum height — on mobile this button will render at the text line height (~16px). Add
py-2ormin-h-[44px]to create an adequate touch target.Suggestions
Mobile person name
text-xsat 12px minimumtext-xsis 12px — the absolute floor. For the senior audience (60+),text-sm(14px) would be more comfortable without adding visual noise. Low priority but easy win.QUEUED status badge colors look good
bg-amber-100 text-amber-700— amber conveys "waiting" clearly without alarm. The dot indicator provides the redundant non-color cue. This works well.Remove the intermediate Map<String,Object> and return the typed record directly so OpenAPI codegen produces a concrete TypeScript type. Fixes lastRun serializing as {} (empty object) instead of null when no training run exists. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>Cycle 5 — All reviewer concerns addressed
Commits in this cycle
84c09e4/train-sender503/403 auth tests + addtest_sender_registry.pyto CI4aa4775TrainingInfoResponsedirectly fromgetTrainingInfo— fixes OpenAPI type safety andlastRun: {}vsnull0d5f3f3getTrainingInfo()— batchgetAllById()instead of per-rungetById()loop92f3c04@ExtendWith(MockitoExtension.class)onSenderModelServiceTest64a854a_SenderModelRegistry.containsas private (_contains)b396fccConcerns resolved
/train-sender— 503 (missing token) and 403 (wrong token) tests added, matching the pattern for/trainand/segtraintest_sender_registry.pyin CI — added toocr-testsjob (no ML stack required);pytest-asyncioadded to install stepOcrController.getTrainingInfo()now returnsTrainingInfoResponsedirectly; no moreMap<String, Object>;lastRun: nullserializes correctly as JSON nullgetTrainingInfo()collects distinct person IDs and callsgetAllById()once; removed the per-rungetById()loop and broadcatch (Exception e)CREATE UNIQUE INDEX … WHERE status = 'QUEUED'for DB-level per-person QUEUED coalescing enforcementcontains()→_contains()— marked as private per-convention; test updated@ExtendWith(MockitoExtension.class)— added toSenderModelServiceTest; sharedtxTemplatestub useslenient()to avoid falseUnnecessaryStubbingerrorsTrainingHistory.svelte.spec.tsh-6 w-6 inline-flex items-center justify-center(24 × 24 px minimum)focus-visible:ring-2 focus-visible:ring-brand-navyaddedOCR_MAX_CACHED_MODELSin Docker Compose — already present at line 100 with memory-footprint comment; no change neededTest suite
test_spell_check.py test_confidence.py test_sender_registry.py— all pass👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
Cycle 5 lands cleanly. TDD evidence is solid throughout — every behavioral change has a preceding failing test. A few minor things worth noting:
Suggestions
OcrTrainingService.java— inline fully-qualifiedCollectors.toList()instead of.toList()In
getTrainingInfo(), the new batch-resolution code uses:This is archaic for Java 21 — since Java 16 the stream has
.toList()directly. No static import or FQCN needed:Minor, but the FQCN inline is an eyesore when we're already on Java 21.
SenderModelServiceTest—lenient()comment is a "what" commentThe comment explains the implementation detail. A more useful comment would explain the tradeoff: why we're using a shared
@BeforeEachstub rather than moving it to each test. But this is genuinely borderline —lenient()is already somewhat self-documenting here. Not a blocker.Checked and clean
_contains()naming convention ✅TrainingInfoResponsereturn — no moreMap<String, Object>✅getAllById()batch call withObjects::nonNullfilter ✅/trainand/segtrain✅test_sender_registry.pyin CI withpytest-asynciodependency ✅🏗️ Markus Keller — Application Architect
Verdict: ✅ Approved
This cycle closed the gaps I care about architecturally. Keeping my lens on structural decisions:
Approved
V42 partial unique index — right layer
This is exactly where it belongs. The application-layer
existsByPersonIdAndStatusguard had a race window; the partial unique index enforces the invariant atomically at commit time. Any future race will produce aDataIntegrityViolationExceptionrather than silent data corruption. The ADR comment inSenderModelServicecross-references this correctly.TrainingInfoResponsereturned directly from controllerRemoving the manual
Map<String, Object>assembly eliminates a hidden translation layer and restores OpenAPI codegen to the typed record. ThelastRun: nullserialization fix follows from this automatically. Good simplification.N+1 query fix respects domain boundaries
OcrTrainingService.getTrainingInfo()callspersonService.getAllById()— going through the Person service, not directly toPersonRepository. Boundary discipline maintained. ✅Observations
The
promoteNextQueuedRun()tail-recursive queue drain (one@Asyncthread holding the entire queue) remains intentional and documented as ADR-001. This is a single-node constraint that's acceptable at current scale, and the comment makes the trade-off visible. No change needed.The
SenderModelService.runOrQueueSenderTraining()being@Transactionalon apublicmethod called from the same service'scheckAndTriggerTraining()is worth watching: Spring's AOP-based@Transactionaldoesn't apply on self-calls. Looking at the flow,checkAndTriggerTraining()is@Asyncand callsrunOrQueueSenderTraining()as a normal method on the same instance — the transaction proxy won't fire. However,TransactionTemplateis also injected and used in the same class... actually reading more carefully,runOrQueueSenderTraining()uses@Transactionalannotation but the test stubstxTemplateto run inline. The@Transactionalannotation would only work if called through a Spring proxy. This is a latent bug: ifcheckAndTriggerTraining()callsthis.runOrQueueSenderTraining()directly, the@Transactionalproxy is bypassed. Not introduced in this cycle, but worth a follow-up issue.🔐 Nora "NullX" Steiner — Security Engineer
Verdict: ✅ Approved
This cycle addressed all the security concerns I had in previous reviews. Clean.
Verified
Auth coverage for
/train-senderis completeBoth failure modes are now tested:
TRAINING_TOKENis empty (fail-closed, no unauthenticated access)The pattern exactly matches
/trainand/segtrain. The tests usepatch("main.TRAINING_TOKEN", "")to simulate missing config, which correctly tests the fail-closed behavior without environment setup.Path traversal guard on
_SenderModelRegistryThe
get_model()implementation usesos.path.realpath()before thestartswith()check — this correctly handles symlink traversal, not just lexical path manipulation. The guard:os.sepconcatenation prevents a path like/app/modelsevilfrom matching via prefix alone. Solid. The testtest_path_outside_models_dir_raiseswith/etc/passwdis in CI. ✅test_sender_registry.pyin CI without ML stackThe lazy imports in
engines/kraken.py(from kraken.lib import modelsinside functions rather than at the top) enable these tests to run without the full ML stack. This was a deliberate structural choice that pays off here.No new concerns
The
TrainingInfoResponsechange, migration, and Java test changes introduce no security surface. ThegetTrainingInfo()endpoint remains under@RequirePermission(Permission.ADMIN). ✅🧪 Sara Holt — QA Engineer
Verdict: ⚠️ Approved with concerns
Good test additions this cycle. One gap remains that I'd like tracked.
✅ What's solid
@ExtendWith(MockitoExtension.class)onSenderModelServiceTestStrict stubbing enforcement is now active. Using
lenient()on the sharedtxTemplatestub is the right call — the alternative (moving txTemplate setup into every test) would bloat setup for no real benefit. The removed unused stub (blockRepository.countManualKurrentBlocksByPerson(personId)in the queue-promotion test) shows the extension is doing its job. ✅Batch person name resolution test
getTrainingInfo_resolves_person_names_in_single_batch_callcovers the N+1 fix:Explicit negative assertion on
getByIdmeans this test will catch a regression if someone reintroduces the loop. Strong test. ✅QUEUED status badge test
Small but needed — this is the kind of happy-path UI behavior that tends to quietly regress. ✅
test_sender_registry.pyin CIFive focused tests, no ML stack dependency, runs in milliseconds. This is good test pyramid hygiene. ✅
⚠️ Concern: V42 migration not covered by integration tests
The
CREATE UNIQUE INDEX ... WHERE status = 'QUEUED'constraint is enforced at the database level, but the backend unit tests all use Mockito mocks — they never touch a real PostgreSQL instance. There is no integration test that:person_idThis means the constraint is untested end-to-end in CI. If the index were accidentally written with an error (
status = 'queued'lowercase, wrong column name, wrong table), CI would still pass.The Testcontainers infrastructure exists for integration tests. I'd like a
SenderModelServiceIntegrationTest(or a Flyway migration validation test) that exercises this constraint against a real Postgres instance.Suggested test:
Not a blocker for this PR — the unit-level
existsByPersonIdAndStatusguard is tested, and the DB constraint exists. But the integration test should be tracked as a follow-up issue.🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
CI changes look good. A pre-existing issue worth tracking but not introduced here.
✅ Approved
CI
ocr-testsjob updateVersion-pinned
pyspellchecker✅.pytest-asyncioadded for async test support ✅. All three test files can run without the ML stack — the header comment updated to reflect this. Clean.OCR_MAX_CACHED_MODELSin docker-compose.ymlAlready surfaced at line 100 with a helpful memory-footprint comment (
# LRU cache; each model ~500 MB, so 2 = ~1 GB resident). This is exactly the kind of operational comment I want to see. No change needed.Named volume
ocr_models:/app/modelsConfirmed already present in the Compose file. ✅
Pre-existing issue (not introduced here)
The unit-tests job still uses:
@v3is deprecated and security patches have stopped. Should be@v4. This isn't introduced by this PR but worth a follow-up to avoid accumulating deprecated action versions. Creates a Renovate PR or manual bump ticket.🎨 Leonie Voss — UX Designer & Accessibility Strategist
Verdict: ⚠️ Approved with concerns
Good progress on accessibility this cycle. Focus ring and touch target improvements are visible. One remaining concern on touch target sizing.
✅ Approved
Focus ring on expand/collapse toggle
focus-visible(notfocus) means the ring only appears for keyboard navigation, not mouse click.ring-brand-navymatches the brand system.py-2increases the vertical tap area. ✅QUEUED amber badge with dot + text
Color + text — redundant cues, correct. The dot is
aria-hiddenso screen readers don't announce "circle queued". ✅aria-labelon dismiss buttonaria-label={m.comp_dismiss()}is present. Screen readers will announce the action correctly. ✅⚠️ Concern: Dismiss button touch target is 24×24, not 44×44
The fix brought the button to:
h-6 w-6= 24×24px in Tailwind. This meets WCAG 2.2 criterion 2.5.8 (Target Size — Minimum, 24px), but our target audience includes users 60+ who need 44×44px targets (WCAG 2.5.5 — Enhanced, and the project's own standard).The
×dismiss glyph sits inline in a success message paragraph. I understand why going to 44px feels large there — it would tower over the text. However, a practical middle ground would be to use negative margin to extend the visual hit area without affecting layout:h-11 w-11= 44×44px.-my-2pulls the vertical overflow back so it doesn't push adjacent text. This is a standard pattern for inline icon buttons.Not a hard blocker — the 24px fix is a real improvement over what was there before and meets the 2.5.8 minimum. But 44px is the standard for this project and this audience. Track as a follow-up if not fixing now.
👨💻 Felix Brandt — Senior Fullstack Developer
Verdict: ✅ Approved
Cycle 6 closes everything I flagged. Clean cycle — all three items resolved precisely.
✅
.toList()refactor — done correctlyStream.toList()returns an unmodifiable list (unlikeCollectors.toList()), which is actually the right semantic here — we only read it. No import needed, no FQCN noise. Exactly right.✅ Dismiss button touch target test
The test drives the change correctly:
Testing via
classList.contains()is the pragmatic choice for class-based sizing — it checks the contract without being fragile against computed pixel values in browser tests.✅ V42 integration tests in
MigrationIntegrationTestAdding the three V42 tests to the existing class (not creating a new file) is the right call — they follow the established pattern and avoid proliferating test files. The
createPerson()helper correctly inserts the FK-required row before theocr_training_runsinsert.The three-test split is well-structured:
v42_...preventsTwoQueuedRunsForSamePerson— the core constraint test,NOT_SUPPORTEDpropagation correct for cross-transaction constraint enforcementv42_...allowsQueuedRunsForDifferentPersons— verifies the index is per-person, not globalv42_...allowsMultipleDoneRunsForSamePerson— verifies theWHERE status = 'QUEUED'filter is workingCleanup in the
NOT_SUPPORTEDtest is explicit — no leaked state.Checked and clean
OcrTrainingCard.svelteformatting fixed by Prettier before commit ✅🧪 Sara Holt — QA Engineer
Verdict: ✅ Approved
My outstanding concern from cycle 5 is now addressed. The V42 partial unique index has integration test coverage that exercises it against real PostgreSQL. This is exactly what I asked for.
✅ V42 migration constraint tests — correct layer and correct infrastructure
The three new tests in
MigrationIntegrationTest:Right class: Added to
MigrationIntegrationTest— the existing home for DB-constraint tests. Consistent with V23, V30, V33, V39 test patterns.Right infrastructure:
@DataJpaTest+@Import(PostgresContainerConfig.class)= real PostgreSQL 16, Flyway migrations applied in sequence. This test will fail if the index DDL is ever broken or dropped accidentally.Right propagation:
NOT_SUPPORTEDon the negative test to escape the surrounding transaction so the constraint violation is detectable. The V30 pattern repeated correctly.Right FK setup:
createPerson()helper inserts valid persons before theocr_training_runsrows — a necessary fix given the FK constraint. Clean.Right cleanup: Manual
DELETEin theNOT_SUPPORTEDtest prevents cross-test contamination. ✅✅ Touch target test — real DOM assertion
Uses
vitest-browser-svelteagainst real Chromium DOM. This is the right layer for verifying applied CSS classes. ✅Overall test pyramid for this PR
All layers covered. Approved.
🎨 Leonie Voss — UX Designer & Accessibility Strategist
Verdict: ✅ Approved
My cycle 5 concern is resolved. The dismiss button now meets our 44×44px standard. Reviewing what changed:
✅ Dismiss button — 44×44px confirmed
h-11 w-11= 44×44px — meets WCAG 2.5.5 Enhanced and our project standard for the 60+ audience ✅-my-2pulls the vertical overflow back so it doesn't push the inline success text out of alignment ✅-my-2 ml-1vsml-1 -my-2) — visually identical, semantically identical ✅aria-label={m.comp_dismiss()}still present — screen readers announce "Schließen" ✅focus-visible:ring-2 focus-visible:ring-brand-navy) ���✅ Test verifies the touch target contract
This test will catch any regression that accidentally shrinks the button back to
h-6 w-6. That's the kind of regression I see happen when someone "cleans up" styles. ✅Accessibility summary for this PR
focus-visible:ring-2 ring-brand-navy ring-offset-1h-6 w-6(24px)h-11 w-11(44px)aria-liveon success messagearia-labelon dismissaria-hidden+ textFull approval from the accessibility lens. Ready to merge.
🏗️ Markus Keller — Application Architect
Verdict: ✅ Approved
Cycle 6 lands cleanly. My architecture concerns are fully resolved across all cycles. Reviewing the final state:
✅ V42 integration test — constraint enforcement proven end-to-end
The partial unique index now has coverage at the right layer:
This proves the index DDL is syntactically and semantically correct against real PostgreSQL. The V42 migration comment
-- V42inSenderModelServicecross-references the schema layer, the migration, and now the integration test — three-way documentation. ✅✅
.toList()— unmodifiable return is semantically correctStream.toList()returns an unmodifiable list. FordistinctPersonIds, this is the right semantic — we only iterate and pass it togetAllById(). No accidental mutation possible.Architectural integrity final check
OcrTrainingService→PersonService.getAllById()(notPersonRepositorydirectly)TrainingInfoResponsetyped record returned from controllerQUEUEDinvariant at DB layer (race-safe)SenderModelServicethreshold logic (100/50 blocks) in service layer, not controllerSenderModelService→OcrTrainingService(not repository directly)_SenderModelRegistryprivate API (_contains)Full approval. This is a well-structured feature with clear domain boundaries and the right constraints at the right layers.
🔒 Nora Steiner — Application Security Engineer
Verdict: ✅ Approved
No new security issues introduced in cycle 6. Reviewing the changes:
✅ Cycle 6 changes — security-neutral
.toList()refactor: pure style, no security impact ✅h-11 w-11button resize: UI change, no security impact ✅MigrationIntegrationTestV42 tests: test-only, no production code change ✅Security posture of the full PR — unchanged from prior cycles
OCR service path validation (
_SenderModelRegistry.get_model()):Path traversal guard using
os.path.realpath()+ prefix check — correct. Symlink attacks are blocked byrealpath(). ✅/train-sendertoken auth: Token validated before any model loading or file I/O. ✅Output path validation in
/train-sender: Validated against/app/models/before saving. ✅Backend:
@RequirePermission(Permission.ADMIN)on all OCR training endpoints. ✅test_training_auth.py: 503/403 auth tests cover the new endpoint. Note that these require the full ML stack and are excluded from the lightweight CI job — this is documented in the test file header. Acceptable for integration-level auth tests. ✅No new user-controlled inputs entering shell commands, file paths (beyond the already-guarded model path), or database queries unparameterized. ✅
One observation (not a blocker)
The
createPerson()helper inMigrationIntegrationTestinserts directly viaJdbcTemplate. This is correct for constraint testing — but worth noting that no test currently checks thatINSERT INTO personswith missing required fields (first_name,last_name) fails at the DB level. Not in scope for this PR; just noting as a future gap if person data validation ever matters at the constraint layer.All security controls intact. Approved.
🚀 Tobias Wendt — DevOps & Platform Engineer
Verdict: ✅ Approved
Cycle 6 makes no infrastructure or CI changes. Reviewing what's there:
✅ No CI regressions in cycle 6
The three commits in cycle 6:
refactor(ocr): use stream .toList()— Java source only, no CI impactfix(frontend): increase dismiss button touch target— Svelte source + spec, runs in existingfrontend-testsjobtest(db): verify V42 partial unique index— Java integration test, runs in existingbackend-testsjob (Testcontainers job)All existing CI jobs handle these without modification. ✅
✅ CI job integrity — full PR summary
Pre-existing issue (raised cycle 5 — not introduced by this PR)
Still present in the
unit-testsjob.@v3is deprecated; security patches have stopped. Should be updated to@v4. Recommend creating a follow-up issue to bump all action pins in the CI file. Renovate would handle this automatically if configured for Gitea Actions.Operational note
The
OCR_MAX_CACHED_MODELS: 2env var indocker-compose.ymlis documented with its memory implication:On the CX32 VPS (8 GB RAM), this is within budget. If sender model count grows and operators raise this value, they now understand the cost. ✅
Ready to merge.
✅ All personas approved — ready to merge
6 review cycles completed. All reviewer concerns resolved.
Concerns addressed across cycles
_SenderModelRegistry.contains()should be private_contains()getTrainingInfo()getAllById()batch callMap<String, Object>TrainingInfoResponsedirectlynulllastRun serialized as{}instead ofnull@JsonInclude(NON_NULL)or record semanticsTrainingHistorymissing Type/Person columnsmd:table-cellhidden columnsSenderModelServiceTestmissing@ExtendWith(MockitoExtension.class)/train-senderendpointtest_training_auth.pytest_sender_registry.pynot in CIocr-testsjob withpytest-asyncioTrainingHistory.svelte.spec.tsfocus-visible:ring-2 focus-visible:ring-brand-navynullserialization test forlastRunOcrControllerTest.collect(Collectors.toList())archaic for Java 21.toList()h-11 w-11 -my-2; test addedMigrationIntegrationTestFollow-up issues to track separately
actions/upload-artifact@v3→@v4bump in CI (Tobias, pre-existing)Branch:
feat/issue-253-sender-models| Tests: 1040 backend ✅ 955 frontend ✅