feat(ocr): per-sender specialized Kurrent models with automatic active-learning retraining #253
Summary
When a document sender accumulates enough manual transcription corrections, the system automatically fine-tunes a dedicated Kraken model for that sender's handwriting. The appropriate sender model is then used transparently on every future OCR run for that sender. An admin training log in `/admin/system` shows the full history of base and sender-specific runs.

- Thresholds: activation once a sender has 100 `MANUAL`-source `HANDWRITING_KURRENT` blocks; retraining after 50 further corrections.
- Correction signal: `TranscriptionBlock.source = MANUAL` on documents where `sender_id = X` and `scriptType = HANDWRITING_KURRENT`. No additional labelling is required from users.
- Model naming: `/app/models/sender_{person_uuid}.mlmodel` (on the OCR service volume).

Design Decisions

- `SenderModelService.checkAndTriggerTraining()` is called `@Async` after every `updateBlock()` save. If a training run is already active, the call is silently skipped.
- Sender models are fine-tuned from the base `german_kurrent.mlmodel`, not trained from scratch.

Scope
16 files changed, 3 new files, 2 Flyway migrations, ~6 i18n keys.
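The transparent model selection described in the summary can be sketched as follows (illustrative Python only; `get_model` is the helper name used later in Part C, and `load_any` is a stub standing in for Kraken's real loader):

```python
import os

BASE_MODEL = object()  # stands in for the module-level base Kraken model


def load_any(path):
    """Placeholder for kraken.lib.models.load_any()."""
    return ("sender-model", path)


def get_model(sender_model_path=None):
    """Resolve the model for one OCR run: sender-specific when the path is
    set and the file exists, otherwise the base model (silent fallback)."""
    if sender_model_path and os.path.exists(sender_model_path):
        return load_any(sender_model_path)
    return BASE_MODEL
```

Callers never need to know whether a sender model exists; a missing or `None` path simply resolves to the base model.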
Part A — Database
A1. `V40__add_sender_models.sql`
A2. `V41__add_person_to_training_runs.sql`

Part B — Java: New & Modified Files
B1. New entity: `model/SenderModel.java`

B2. Extend `model/OcrTrainingRun.java`. Add one nullable field. When `personId` is `null`, the run is a base model training; when non-null, it is a sender-specific run.

B3. New repository: `repository/SenderModelRepository.java`

B4. New query in `repository/TranscriptionBlockRepository.java`, added to the existing interface.

B5. Extend `service/OcrClient.java`. Add two new methods alongside the existing ones. Keep the existing 4-arg `streamBlocks()` as a default method that calls the new 5-arg version with `null`.

B6. Extend `service/RestClientOcrClient.java`. Implement `trainSenderModel()` — the same multipart structure as `trainModel()`, plus two extra form fields. Implement the 5-arg `streamBlocks()` — identical to the existing implementation, but include `senderModelPath` in the JSON body when non-null.

B7. New service: `service/SenderModelService.java`

B8. Extend `service/TrainingDataExportService.java`. Add two new methods alongside the existing ones, plus the supporting repository query in `TranscriptionBlockRepository`.

B9. Hook in `service/TranscriptionService.java`. Inject `SenderModelService` (constructor injection via `@RequiredArgsConstructor`) and call the training check after the `blockRepository.save(block)` call in `updateBlock()`.

B10. Update `service/OcrAsyncRunner.java`. Inject `SenderModelService`. In `runSingleDocument()`, look up the sender model path before calling `streamBlocks()`. Also update `processDocument()` (used in `runBatch()`) to look up and pass the sender model path to `extractBlocks()`.

B11. Extend `service/OcrTrainingService.getTrainingInfo()`. Change the query to fetch the 20 most recent runs (instead of 10) to cover a mix of base and sender runs. Add `findTop20ByOrderByCreatedAtDesc()` to `OcrTrainingRunRepository`. The existing `TrainingInfoResponse` already contains `List<OcrTrainingRun> runs`, and `OcrTrainingRun` will now include `personId` — the frontend uses this.

B12. Config properties in `application.properties`

Part C — Python OCR Service
C1. `ocr-service/models.py`

Add one optional field to `OcrRequest`.

C2. `ocr-service/engines/kraken.py`

Add a `SenderModelRegistry` class above the module-level `_model` global. Also add a helper to `engines/kraken.py` that accepts an optional model override. Update `extract_page_blocks()` and `extract_region_text()` to accept an optional `model` parameter (defaults to `None`, resolved via `get_model()`).

C3. `ocr-service/main.py`

In `generate()` (streaming normal mode): extract `senderModelPath` from the request and pass it to `extract_page_blocks()`. In `generate_guided()` (streaming guided mode): same — pass `model=kraken.get_model(request.senderModelPath)` to `extract_region_text()`. In the non-streaming `/ocr` endpoint: same pattern — resolve the model once before the page loop.

Extend the `/train` endpoint to accept two optional form fields alongside the existing `file` upload. The actual Kraken `ketos train` call already supports specifying an output path and a base model via CLI flags; wire these through when non-`None`.

C4. `ocr-service/requirements.txt`

No new Python dependencies — `SenderModelRegistry` uses stdlib `threading` and `collections.OrderedDict`. OpenCV was added in the preprocessing feature (#252); no further changes here.

Part D — Frontend
D1. Extend the `/admin/system` page (`+page.svelte`)

The `/api/ocr/training-info` endpoint already returns `runs: OcrTrainingRun[]`. Each run now also carries `personId: string | null`. Add a `TrainingHistoryTable` card below the existing `OcrTrainingCard` and `SegmentationTrainingCard`; the Type column shows "Sender" when `personId !== null`, "Base" otherwise.

D2. Extend backend API response

In `OcrTrainingService.TrainingInfoResponse`, include the person display name for runs with a `personId`. Add a new field: `OcrTrainingService.getTrainingInfo()` collects all non-null `personId`s from the runs, fetches their display names via `PersonService.getById()`, and builds the map.

D3. New component: `src/lib/components/TrainingHistoryTable.svelte`

Receives `runs: OcrTrainingRun[]` and `personNames: Record<string, string>` as props. Renders the table described in D1. No interactivity required.

D4. i18n keys

Add to `messages/de.json`, `en.json`, `es.json`:

| Key | de | en | es |
| --- | --- | --- | --- |
| `admin_training_history_heading` | Trainingshistorie | Training History | Historial de entrenamiento |
| `admin_training_type_base` | Basis | Base | Base |
| `admin_training_type_sender` | Sender | Sender | Remitente |
| `admin_training_col_sender` | Sender | Sender | Remitente |
| `admin_training_col_lines` | Zeilen | Lines | Líneas |
| `admin_training_col_started` | Gestartet | Started | Iniciado |

Part E — Tests
E1. Python: `ocr-service/test_sender_registry.py`

- `test_get_or_load_returns_same_instance` — call `get_or_load(path)` twice, assert the same object is returned (cache hit, no second disk read)
- `test_lru_evicts_oldest_when_full` — fill the cache to max (mock `load_any`), add one more, assert the first entry is evicted
- `test_get_model_returns_base_when_path_none` — `get_model(None)` returns the global `_model`
- `test_get_model_returns_sender_when_path_exists` — `get_model("/app/models/sender_abc.mlmodel")` with a mocked registry returns the sender model

E2. Java unit: new `SenderModelServiceTest.java`

Uses `@ExtendWith(MockitoExtension.class)`. Test methods:

- `checkAndTriggerTraining_doesNothing_when_below_threshold` — `countManualKurrentBlocksByPerson` returns 50, no existing model → `triggerSenderTraining` never called
- `checkAndTriggerTraining_triggersActivation_when_above_threshold` — count ≥ 100, no existing model → `trainSenderModel` called
- `checkAndTriggerTraining_triggersRetrain_when_delta_exceeded` — existing model with `correctedLinesAtTraining = 100`, current count = 155 → triggers (delta = 55 ≥ 50)
- `checkAndTriggerTraining_skips_when_training_already_running` — `findFirstByStatus(RUNNING)` returns a run → no training triggered

E3. Java unit: `RestClientOcrClientTest.java`

- `trainSenderModel_includesOutputModelPathInBody` — verify the multipart form body includes the `output_model_path` field

E4. Java unit: extend `TranscriptionServiceTest.java`

- `updateBlock_triggersTrainingCheck_when_document_has_kurrent_sender` — mock `doc.getScriptType() = HANDWRITING_KURRENT` and `doc.getSender() != null` → `senderModelService.checkAndTriggerTraining()` called once
- `updateBlock_doesNotTriggerTraining_when_scriptType_is_not_kurrent` — `doc.getScriptType() = HANDWRITING_LATIN` → `checkAndTriggerTraining` never called

Verification Steps
1. `cd backend && ./mvnw test -Dtest=SenderModelServiceTest,TranscriptionServiceTest,RestClientOcrClientTest`
2. `cd ocr-service && python -m pytest test_sender_registry.py -v`
3. Open a `HANDWRITING_KURRENT` document and edit 5 transcription blocks manually. Verify `ocr_training_runs` in the DB has no new row (below threshold).
4. Via `psql`, manually set `ocr.sender-model.activation-threshold = 1`. Edit one more block. Confirm a new `ocr_training_runs` row with `person_id = <sender_id>` appears and completes.
5. Re-run OCR for that sender and check the OCR service log for "Loading sender model from /app/models/sender_{uuid}.mlmodel".
6. Open `/admin/system`. Confirm the Training History table shows the sender run with the sender's name.

👨💻 Felix Brandt — Senior Fullstack Developer
Observations
- `exportForSender()` duplicates `exportToZip()` (B8). The issue says "structurally identical … reuse the existing helper methods" — but the plan doesn't extract a shared core. Both methods do: collect blocks → group by documentId → fetch PDFs → render crops → write ZIP pairs. That's 50+ lines duplicated. The helpers (`renderPageImage`, `cropBlockImage`, `writeTrainingPair`) are already package-private, but the orchestrating logic isn't extracted.
- Lock contention bug in `SenderModelRegistry.get_or_load()` (C2). `kraken_models.load_any()` is called while holding `_lock`. Loading a Kraken model takes 5–30 seconds. This blocks every other thread (including concurrent OCR requests) for the entire load duration. The fix is a double-checked load.
- Cache stale after retraining (C2/C3). After sender model retraining, the new `.mlmodel` file is written to the same path but the registry still holds the old loaded model. The next OCR request hits the cache and uses the pre-retraining model. Fix: add `invalidate(model_path)` to `SenderModelRegistry` and call it from `_run_training()` after writing the new model file.
- `/train` endpoint changes are non-trivial (C3). The current endpoint always overwrites `KRAKEN_MODEL_PATH` and triggers a model reload into `_model`. For sender training (`output_model_path` provided), the logic must diverge: copy to `output_model_path`, skip backup/rotation, and skip reloading `_model`. The two code paths should be explicit — not an `if output_model_path else` inline branch inside `_run_training()`, but two clearly named private functions: `_install_base_model(best_model)` and `_install_sender_model(best_model, output_model_path)`.
- `updateBlock()` hook fetches the document unconditionally (B9). The `documentService.getDocumentById(documentId)` call runs on every block update regardless of script type. Since `TranscriptionBlock` doesn't carry `sender_id` or `scriptType`, this extra lookup is unavoidable — but the script-type check should be the first conditional after that fetch. The ordering in the plan is otherwise correct: keep the document fetch after `save()`.
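A sketch combining the double-checked load and the `invalidate()` hook described above (the loader is injected here so the example runs without Kraken; in the real service it would be `kraken.lib.models.load_any`, and `max_size` would come from `OCR_MAX_CACHED_MODELS`):

```python
import threading
from collections import OrderedDict


class SenderModelRegistry:
    """LRU registry for per-sender models (illustrative sketch)."""

    def __init__(self, loader, max_size=5):
        self._loader = loader
        self._max_size = max_size
        self._lock = threading.Lock()
        self._cache = OrderedDict()  # model path -> loaded model

    def get_or_load(self, path):
        # Double-checked load: never hold the lock for the 5-30 s load.
        with self._lock:
            if path in self._cache:
                self._cache.move_to_end(path)  # mark most recently used
                return self._cache[path]
        model = self._loader(path)             # slow part, lock released
        with self._lock:
            # Another thread may have finished first; keep its instance.
            if path not in self._cache:
                self._cache[path] = model
            self._cache.move_to_end(path)
            while len(self._cache) > self._max_size:
                self._cache.popitem(last=False)  # evict least recently used
            return self._cache[path]

    def invalidate(self, path):
        # Call after retraining rewrites the .mlmodel file in place,
        # so the next request reloads the fresh weights.
        with self._lock:
            self._cache.pop(path, None)
```

`_run_training()` would call `registry.invalidate(output_model_path)` right after the new model file is installed.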
- `npm run generate:api` step omitted from Part B. Adding `personId` to `OcrTrainingRun` and introducing `SenderModel` as a new entity affects the OpenAPI spec. After the B2/B3 changes, the backend must be rebuilt and `npm run generate:api` must run before the frontend work in Part D starts.

Recommendations

- Extract `private StreamingResponseBody exportBlocksToZip(List<TranscriptionBlock> blocks)` in `TrainingDataExportService`. Both `exportToZip()` and `exportForSender()` become 3-line wrappers.
- Fix the `get_or_load()` lock contention with the double-checked pattern above — this is a correctness issue, not just a performance concern.
- Add `invalidate(model_path)` to `SenderModelRegistry` and call it after successful sender model installation.

🏛️ Markus Keller — Application Architect
Observations
- Domain boundary violation: `SenderModelService` directly owns `OcrTrainingRunRepository` (B7). `OcrTrainingService` is the existing owner of the `OcrTrainingRun` lifecycle — it creates runs, marks them DONE/FAILED, and exposes the `getTrainingInfo()` surface. Introducing a second writer (`SenderModelService`) to the same repository fragments that ownership and makes it hard to reason about training run state. The fix: move `triggerSenderTraining()` into `OcrTrainingService` as `triggerSenderTraining(UUID personId, int correctedLines)`, and have `SenderModelService` delegate to it. `SenderModelService` then only owns the `SenderModel` entity — a clean, single-responsibility boundary.
- Race condition on the RUNNING guard (B7). Two concurrent async threads can both find no RUNNING run in `checkAndTriggerTraining()`, both pass the guard, and both attempt to insert — one will hit the partial unique index `idx_ocr_training_runs_one_running` from V30 and throw `DataIntegrityViolationException`. The code doesn't catch this. Fix: wrap the first `txTemplate.execute` block in a try-catch for `DataIntegrityViolationException` and silently return (the other thread won the race — correct outcome).
- `sender_models.person_id` FK has no `ON DELETE` clause (A1). This defaults to `NO ACTION`/`RESTRICT`. If someone tries to delete a person who has a sender model, the DB will reject it with a constraint error. The existing FK in V30 uses `ON DELETE SET NULL` (`triggered_by UUID REFERENCES users(id) ON DELETE SET NULL`). Apply the same pattern here — although `CASCADE` makes more sense than `SET NULL` since `person_id` is `NOT NULL UNIQUE` and has no valid null state. Deleting the person should delete the sender model record (the file cleanup is a separate concern).
- `ocr_training_runs.person_id` FK also needs `ON DELETE SET NULL` (A2), consistent with the existing `triggered_by` column: historical training run records should survive person deletion.
- ADR-001 (single-node constraint) needs updating. The existing comment in `OcrTrainingService` reads: "Not safe for horizontal scaling: training reloads the Kraken model in-process on the Python OCR service after each run. The DB-level RUNNING constraint prevents concurrent training API calls…" Sender model training uses the same constraint but writes to a different model path. If base training and sender training ever ran concurrently (different paths, different models), they wouldn't conflict in terms of model state — but the current constraint prevents this. Document whether the single-run constraint is intentional for sender models or could be relaxed later.

Recommendations

- Move `triggerSenderTraining()` into `OcrTrainingService`; `SenderModelService` calls it as `ocrTrainingService.triggerSenderTraining(personId, correctedLines)`.
- Catch `DataIntegrityViolationException` from the first `txTemplate.execute()` block and silently return — matches the "already running" guard semantics.
- Add `ON DELETE CASCADE` to `sender_models.person_id` and `ON DELETE SET NULL` to `ocr_training_runs.person_id` in the migrations.

🔒 Nora "NullX" Steiner — Application Security Engineer
Observations
- Path traversal risk on `output_model_path` (B6, C3). The Java backend computes the path as `"/app/models/sender_" + personId + ".mlmodel"` where `personId` is a Java `UUID` — guaranteed alphanumeric plus dashes, so injection through this specific call path is not possible. However, the Python `/train` endpoint accepts `output_model_path` as a plain `Form(str)` with no validation. If this endpoint were ever called by any other client (directly, via script, or future code), a crafted path like `../../etc/cron.d/exploit` would write a model file outside `/app/models/`. Defense-in-depth fix: add a whitelist check to `_run_training()` before the copy.
- Training token must be forwarded in `trainSenderModel()` (B6). The plan says to implement `trainSenderModel()` with "the same multipart structure as `trainModel()`" — but the training token (`X-Training-Token` header) is added in `trainModel()` via the `if (trainingToken != null && !trainingToken.isBlank())` guard. The plan should explicitly state this header must be included in `trainSenderModel()` as well; it's easy to omit in a copy-paste. Make it a named private helper, `addTrainingAuth(spec)`, called from both methods, so the auth logic lives in one place.
- `SenderModel.modelPath` appears in the OpenAPI spec (B1). `SenderModel` is a JPA entity with `@Schema` annotations on `modelPath`. If any controller ever returns `SenderModel` directly (not just `OcrTrainingRun`), the filesystem path to the model file is exposed in the API. No controller currently returns `SenderModel`, but the entity is not protected. Recommend adding `@JsonIgnore` to `modelPath` or removing its `@Schema` annotation now, before a controller accidentally exposes it.
- `SenderModelRegistry` loads models from caller-supplied paths at inference time (C2). `get_model(sender_model_path)` calls `os.path.exists(sender_model_path)` and then `load_any(model_path)` on a path that originated from the Java backend's DB record. If the `sender_models.model_path` column were corrupted (e.g., via a DB injection that escaped other layers), an attacker could potentially load a malicious `.mlmodel` file. Mitigate: validate that the path is within `/app/models/` in `get_model()`, using the same check as above, before calling `load_any()`.
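The whitelist check recommended for both `_run_training()` and `get_model()` could be a small shared helper (a sketch; the function name is illustrative):

```python
import os
from pathlib import Path

MODELS_DIR = Path("/app/models")


def validate_model_path(raw_path):
    """Reject model paths that escape the models directory (defense in depth).
    Illustrative helper; call before shutil.copy2() and before load_any()."""
    resolved = Path(os.path.realpath(raw_path))  # collapses ".." and symlinks
    if MODELS_DIR not in resolved.parents:
        raise ValueError(f"model path outside {MODELS_DIR}: {raw_path!r}")
    if resolved.suffix != ".mlmodel":
        raise ValueError(f"unexpected model file type: {raw_path!r}")
    return resolved
```

Resolving via `os.path.realpath` before comparing parents means `..` segments and symlink tricks are neutralized rather than string-matched.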
_validate_zip_entry()(C3). The current/trainendpoint already validates each ZIP entry against path traversal — this protection is maintained for sender training since the same code path handles the ZIP extraction. Good. Confirm_validate_zip_entryis called regardless of theoutput_model_pathparameter.Recommendations
output_model_pathwhitelist check inside_run_training()beforeshutil.copy2().model_pathwhitelist check insideget_model()beforeload_any()._add_training_auth_header(spec)private method inRestClientOcrClientcalled from bothtrainModel()andtrainSenderModel().@JsonIgnoreonSenderModel.modelPathto prevent filesystem paths leaking into the OpenAPI spec or any future controller response.🧪 Sara Holt — QA Engineer
Observations
- Off-by-one threshold tests are missing (E2). The `checkAndTriggerTraining` logic compares `correctedLines >= activationThreshold`. The tests cover "below threshold" (50) and "above threshold" (≥ 100) but not the boundary itself. Add: `checkAndTriggerTraining_doesNothing_when_one_below_threshold` — count = 99, no model → no training; `checkAndTriggerTraining_triggersActivation_when_exactly_at_threshold` — count = 100, no model → training triggered. Add the same boundary tests for the retrain delta (49 new vs. 50 new vs. 51 new).
- `exportForSender()` integration test is missing (E1–E4). The existing `exportToZip()` relies on real PDF rendering from S3 and real DB blocks. The sender variant has the same dependency profile. It needs an integration test (Testcontainers + MinIO fixture) covering: happy path produces a valid ZIP with PAGE XML pairs; empty block list returns a no-op. Without this, the most likely failure point (PDF rendering + crop + XML generation) is untested for the sender path.
- The `TrainingHistoryTable` Svelte component has no test (D3). The plan adds a new read-only table component that renders training run data, including the conditional "Sender"/"Base" type display and the optional person name lookup. Add a vitest-browser-svelte component test covering: a run whose `personId` is null, a run whose `personId` is set, and an empty `runs` array.
- Flyway migration integration test (A1, A2). V40 and V41 must run cleanly in sequence after V39. The migration test suite (which runs all migrations against Testcontainers Postgres on every CI run) catches this automatically, but the verification steps should include `./mvnw test -Dtest=MigrationTest` explicitly.
- `@Async` + unit test interaction (E2). `checkAndTriggerTraining` is annotated `@Async`, but when called directly in a Mockito unit test (no Spring context), the method executes synchronously on the calling thread. The test for `checkAndTriggerTraining_skips_when_training_already_running` is valid — just note in the test that the `@Async` behavior is not under test here. A separate `@SpringBootTest` integration test with a real async executor would verify the async dispatch, but this is a medium-priority concern for a non-public-facing background job.
- No E2E smoke test for the training history table (D1). Add a Playwright test to the verification matrix:
This catches broken server-side data fetch or missing i18n keys.
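The boundary cases in the first observation can be pinned down with a table-driven check. This illustrative Python mirror of the Java rule (thresholds 100/50 from the plan) shows exactly which side of each boundary should trigger:

```python
ACTIVATION_THRESHOLD = 100  # first sender model after 100 corrected blocks
RETRAIN_DELTA = 50          # retrain after 50 further corrections


def should_trigger(corrected_lines, lines_at_last_training=None):
    """Mirrors `correctedLines >= activationThreshold` and the retrain
    delta check; None means no sender model exists yet (illustrative)."""
    if lines_at_last_training is None:
        return corrected_lines >= ACTIVATION_THRESHOLD
    return corrected_lines - lines_at_last_training >= RETRAIN_DELTA


# threshold-1 / threshold / threshold+1 for activation, 49/50/51 for delta
cases = [(99, None, False), (100, None, True), (101, None, True),
         (149, 100, False), (150, 100, True), (151, 100, True)]
for lines, at_training, expected in cases:
    assert should_trigger(lines, at_training) == expected
```

The Java tests in E2 should assert the same six cases against the real service.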
Recommendations
- Test `threshold - 1`, `threshold`, and `threshold + 1` for both the activation and retrain-delta checks.
- Add a `TrainingDataExportServiceSenderIntegrationTest` using Testcontainers + a mock S3 bucket.
- Add a component test for `TrainingHistoryTable.svelte` covering the three data states.
- Add `./mvnw test -Dtest=MigrationTest` to the verification steps in the issue.

🎨 Leonie Voss — UX Designer & Accessibility Strategist
Observations

- "Sender" naming collision in the table (D1). The Type column renders "Sender" (for sender-specific runs) and the person-name column is also titled "Sender", so the two appear side by side on screen. The first "Sender" labels the run type; the second labels the person. In German, "Sender" already has this dual meaning (broadcaster / letter-sender). Admin users will be confused. Rename the Type column values to Basis/Personalisiert (de) and Base/Sender-specific (en), and rename the person column header to Person (de/en). The i18n table in D4 needs updating accordingly.
- Empty state is missing (D1). The first time the admin opens `/admin/system`, no training runs exist, and the table should show an explicit empty-state message. A blank table with only headers gives no signal. This is especially important for the senior audience, who may interpret an empty table as a broken page.
- Accessibility: the table needs a `<caption>` or `aria-label` (D3). Without one, screen readers announce the table as "table" with no context. `TrainingHistoryTable.svelte` should include an `aria-label`, or a `<caption>` element styled visually-hidden if the heading above already serves as the label.
- Status column uses text alone (D1). "Done" / "Failed" / "Running" in text is readable but relies on color for differentiation in the typical implementation. Add a redundant icon per WCAG 1.4.1: color-blind users (8% of men) see no status difference without the icon.
- CER value has no explanation (D1). The column shows "4.2%" with no tooltip or label explaining what CER means or that lower is better. Admins who aren't ML practitioners won't understand the significance. Add an `<abbr title="Character Error Rate — lower is better">CER</abbr>` element in the column header.
- Lines column label (D4). The table mockup shows "Lines" and the i18n key in D4 is `admin_training_col_lines`. Confirm that `blockCount` from `OcrTrainingRun` maps to this column — it does (the plan uses `blockCount` as the correction line count). Add a tooltip explaining "number of corrected lines used in this training run".

Recommendations

- Rename the type values to Basis/Personalisiert (de) and Base/Sender-specific (en). Update the D4 i18n keys.
- Add `aria-label={m.admin_training_history_heading()}` to the `<table>` element in `TrainingHistoryTable.svelte`.
- Use `<abbr title="Character Error Rate — lower is better">` in the column header.

⚙️ Tobias Wendt — DevOps & Platform Engineer
Observations
- `OCR_MAX_CACHED_MODELS` env var not in `docker-compose.yml` (C2). The Python code reads `os.environ.get("OCR_MAX_CACHED_MODELS", "5")`, but this env var isn't listed in the compose service definition. It should appear alongside `KRAKEN_MODEL_PATH` in the `ocr-service` environment block. The default of 5 is fine; making it explicit in compose means it's visible, documented, and overridable without reading the Python source.
- Memory math is safe but tight (Design Decisions). Existing allocation: Surya ~5 GB + base Kraken ~500 MB + Torch overhead ~1 GB. Adding 5 sender models at ~300 MB each = +1.5 GB. Total peak ≈ 8 GB against a 12 GB `mem_limit`. The margin is ~4 GB, which is healthy. No action required — just confirming the calculation is sound.
- Stale `sender_models.model_path` after volume loss (A1). If the `ocr_models` volume is recreated (e.g., `docker compose down -v`), the sender model files are gone but the DB records remain with the old paths. The `get_model()` function in C2 already handles this gracefully: `os.path.exists(sender_model_path)` returns `False`, and it falls back to the base model silently. This is the correct behavior — document it in the service comment so future operators know fallback-to-base is by design, not a bug.
- Container restart required for the new env var (C2). The `_sender_registry` is instantiated at module load time from `os.environ.get(...)`, so changing `OCR_MAX_CACHED_MODELS` requires an `ocr-service` container restart to take effect. This is expected behavior — document it in the env var comment.
- `ocr_models` volume permissions (A1). Sender model files will be written by the training process to `/app/models/`. The existing base model file is already in this volume, and the Python process runs as the container's default user. No new volume configuration is needed — the existing mount handles it. Confirmed.
- Model file cleanup after person deletion (A1). The `sender_models` DB row is deleted via `ON DELETE CASCADE` (per Markus's recommendation), but the `.mlmodel` file at `/app/models/sender_{uuid}.mlmodel` is not deleted by the DB cascade — it becomes an orphan file on disk. For a family archive running on a CX32 with 40 GB disk, orphaned model files (~300 MB each) are low risk. Document this as a known limitation: periodic manual cleanup via `docker exec ocr-service find /app/models -name "sender_*.mlmodel" ...` if disk usage grows.
- Training concurrency in compose: the single-node constraint (existing partial unique index + `assertNoRunningTraining`) already covers the new sender training path via the same guard. No compose configuration change needed.

Recommendations

- Add `OCR_MAX_CACHED_MODELS: "5"` to the `ocr-service` environment block in `docker-compose.yml`.
- Add a comment to `SenderModelService.maybeGetModelPath()` noting that a missing model file causes transparent fallback to the base model — this is observable in the OCR service log ("Kraken model path not found, using base model").

🗳️ Decision Queue — Action Required
1 decision needs your input before implementation starts.
Architecture
The DB-level partial unique index (`idx_ocr_training_runs_one_running`) allows only one training run at a time. This constraint was designed for base model training, where the Python service reloads `_model` in-place after each run. Sender model training writes to a different file and never touches `_model`, so there's no model-state collision between a sender run and a base run at the Python level. Option A (keep constraint): sender training is silently skipped if base training is running — simple, zero new risk, but a 10–20 minute base training run delays a sender model activation. Option B (relax constraint for sender runs): allow one base run and one sender run concurrently — faster, but requires verifying the Python service can safely run two `ketos train` subprocesses simultaneously within the 12 GB `mem_limit` (~2 × 3–4 GB RAM for training). (Raised by: Markus)

🏛️ Markus Keller — Architect Discussion Follow-up
Four open items worked through interactively. All resolved.
✅ Resolved
- Single-run constraint → Queue with coalescing — Replace the "silently skip" guard with a `QUEUED` status on `ocr_training_runs`. When a training trigger arrives while a run is RUNNING, insert a QUEUED row instead of discarding. After any run completes (DONE or FAILED), the completion code picks the oldest QUEUED row and promotes it to RUNNING. Coalescing: at most one QUEUED row per `person_id` — duplicate triggers for the same sender are dropped, since the queued run will re-count all current MANUAL blocks when it executes. Both base and sender runs share the single slot (RAM safety, same rationale as the existing constraint).
- `personNames` in `TrainingInfoResponse` → DTO assembled in the controller — `OcrTrainingService.getTrainingInfo()` returns plain `OcrTrainingRun` entities. The controller calls `PersonService.getById()` for non-null `personId`s and assembles a `TrainingRunDTO` that includes `personDisplayName`. The OCR training service stays clean; cross-domain display concerns belong in the controller layer.
- `TransactionTemplate` in `@Async` → Separate `@Transactional` bean — Extract the transactional operations (create run record, update model + run on success, mark failed) into package-private `@Transactional` methods on a dedicated bean (likely `OcrTrainingService`, which already owns the run lifecycle). The `@Async` method calls those directly. No `TransactionTemplate` needed; the pattern is consistent with the rest of the codebase.
- ADR-001 update — written — Added an "Amendment" section documenting the single-slot training queue, the coalescing rule, the RAM justification for keeping sender training in the same slot, and a future-direction note for relaxing the constraint once peak RAM under concurrent training is measured.
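The queue-with-coalescing semantics above can be captured in a toy model (illustrative Python; the real state lives in the `ocr_training_runs` table and is mutated inside `@Transactional` methods):

```python
from dataclasses import dataclass, field


@dataclass
class TrainingQueue:
    """Toy model of the single training slot with per-person coalescing.
    Each run is {"person_id": ..., "status": ...}; person_id None = base run."""
    runs: list = field(default_factory=list)

    def run_or_queue(self, person_id):
        if not any(r["status"] == "RUNNING" for r in self.runs):
            self.runs.append({"person_id": person_id, "status": "RUNNING"})
        elif not any(r["status"] == "QUEUED" and r["person_id"] == person_id
                     for r in self.runs):
            self.runs.append({"person_id": person_id, "status": "QUEUED"})
        # else: coalesced -- the queued run re-counts all blocks when it starts

    def complete(self, status="DONE"):
        next(r for r in self.runs if r["status"] == "RUNNING")["status"] = status
        queued = next((r for r in self.runs if r["status"] == "QUEUED"), None)
        if queued:
            queued["status"] = "RUNNING"  # promote the oldest queued run
```

In the real implementation, the insert-QUEUED and promote-to-RUNNING steps happen inside transactions, with the partial unique index still guaranteeing at most one RUNNING row.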
No unresolved items.
Implementation complete ✅
All work implemented on `feat/issue-253-sender-models` → PR #263.

What was built

Backend (Spring Boot)

- `V23__sender_models.sql` — migration adding the `sender_models` table
- `SenderModel` entity + `SenderModelRepository` (with `findBySenderId`, `findBySenderIdAndActive`)
- `SenderModelService` — activation threshold (100 blocks), retrain delta (50 new blocks), `@Async` training dispatch, triggered from `TranscriptionService.updateBlock()` for Kurrent senders
- `OcrTrainingService.runOrQueueSenderTraining()` — coalesces duplicate queue entries with the new `QUEUED` status
- `OcrTrainingRun` + `OcrTrainingRunRepository` — `personId` field, `QUEUED` status enum value
- `OcrController.getTrainingInfo()` — now returns `personNames: Map<String, String>` alongside runs
- `OcrClient.streamBlocks()` — extended with a nullable `senderModelPath` parameter
- `OcrAsyncRunner` — passes the active sender model path from `SenderModelService` to the OCR client

OCR Service (Python)

- `_SenderModelRegistry` — LRU cache with double-checked locking, path whitelist to `/app/models/`
- `extract_page_blocks`, `extract_region_text`, `extract_blocks` — all accept `sender_model_path`
- `/train-sender` endpoint — validates the output path, runs the ketos fine-tune, invalidates the cache
- `OCR_MAX_CACHED_MODELS` env var consumed from docker-compose

Frontend (SvelteKit)

- Regenerated API types (`OcrTrainingRun` with `personId`, `QUEUED` status)
- i18n keys: `training_col_type`, `training_type_base`, `training_type_personalized`, `training_col_person`, `training_status_queued`
- `TrainingHistory.svelte` — Type and Person columns, QUEUED badge
- `OcrTrainingCard.svelte` — passes `personNames` through to `TrainingHistory`

Commits

- feat(backend): add SenderModel entity and repository for per-sender OCR models
- feat(backend): SenderModelService checks threshold and triggers async training
- feat(backend): add QUEUED status and runOrQueueSenderTraining to OcrTrainingService
- feat(backend): pass senderModelPath through OcrClient.streamBlocks and OcrAsyncRunner
- feat(backend): OcrController returns personNames in training info response
- feat(ocr): add _SenderModelRegistry LRU cache with path whitelist
- feat(ocr): add /train-sender endpoint and thread-safe cache invalidation
- feat(frontend): regenerate API types with QUEUED status and personId
- feat(frontend): add i18n keys for type/person/queued training columns
- feat(frontend): extend TrainingHistory with type and person columns
- feat(frontend): wire personNames to TrainingHistory in OcrTrainingCard

Test results