feat(admin): OCR admin pages — overview & model detail #264

Closed
opened 2026-04-17 19:12:22 +02:00 by marcel · 10 comments
Owner

Spec

UI/UX spec: docs/specs/ocr-admin-spec.html
Commit: 78eca8e

Two new admin routes:

  • /admin/ocr — health bar, stat cards, models table (global + all senders), global action buttons
  • /admin/ocr/global — global model detail with metric cards + full training history
  • /admin/ocr/[personId] — sender model detail with metric cards + full training history + person link

Backend gaps to close first

1 — Add sender models list to GET /api/ocr/training-info

The overview models table must show all known sender models, including those not in the last 20 runs. Currently the response only includes runs-derived data.

Changes:

  • Add senderModels: List<SenderModel> to TrainingInfoResponse (via SenderModelRepository.findAll())
  • Inject SenderModelRepository into OcrTrainingService
  • Expand personNames in OcrController.getTrainingInfo() to cover all senders, not just recent-run persons

2 — Per-model history via ?personId= query param

Detail pages need the full run history for one model — top-20 global is insufficient once there are many senders and frequent global runs.

Changes to OcrTrainingRunRepository:

List<OcrTrainingRun> findByPersonIdIsNullOrderByCreatedAtDesc();
List<OcrTrainingRun> findByPersonIdOrderByCreatedAtDesc(UUID personId);

Change to GET /api/ocr/training-info: add optional ?personId= query param.

  • ?personId=global → returns full global model history (personId IS NULL)
  • ?personId={uuid} → returns full sender history for that person
  • Without param: current overview behaviour (+ senderModels field from gap 1)

3 — POST /api/ocr/train-sender endpoint

The sender detail page has a "Jetzt neu trainieren" button. There is no HTTP endpoint exposing sender training today (SenderModelService.triggerSenderTraining() is internal only).

Add to OcrController:

POST /api/ocr/train-sender
Body: { "personId": UUID }
Permission: ADMIN
 calls SenderModelService.runOrQueueSenderTraining()
 returns 201 with queued/started status

No backend changes needed

  • Stats fields (availableBlocks, availableDocuments, availableSegBlocks, ocrServiceAvailable)
  • TrainingStatus enum (QUEUED / RUNNING / DONE / FAILED)
  • Export endpoints
  • SSE progress endpoint
  • POST /api/ocr/train and POST /api/ocr/segtrain

Frontend work (after backend)

  • Add OCR entry to EntityNav (new "System" section with separator; count = 1 + senderModels.length)
  • New routes: admin/ocr/+page.svelte, admin/ocr/global/+page.svelte, admin/ocr/[personId]/+page.svelte
  • Reuse existing TrainingHistory.svelte component on detail pages
  • Health bar + models table are new components
  • SSE wiring via existing /api/ocr/jobs/{jobId}/progress channel
  • Regenerate API types after backend changes (npm run generate:api)
## Spec UI/UX spec: [`docs/specs/ocr-admin-spec.html`](http://192.168.178.71:3005/marcel/familienarchiv/src/branch/main/docs/specs/ocr-admin-spec.html) Commit: `78eca8e` Two new admin routes: - `/admin/ocr` — health bar, stat cards, models table (global + all senders), global action buttons - `/admin/ocr/global` — global model detail with metric cards + full training history - `/admin/ocr/[personId]` — sender model detail with metric cards + full training history + person link --- ## Backend gaps to close first ### 1 — Add sender models list to `GET /api/ocr/training-info` The overview models table must show **all known sender models**, including those not in the last 20 runs. Currently the response only includes runs-derived data. **Changes:** - Add `senderModels: List<SenderModel>` to `TrainingInfoResponse` (via `SenderModelRepository.findAll()`) - Inject `SenderModelRepository` into `OcrTrainingService` - Expand `personNames` in `OcrController.getTrainingInfo()` to cover all senders, not just recent-run persons ### 2 — Per-model history via `?personId=` query param Detail pages need the **full run history for one model** — top-20 global is insufficient once there are many senders and frequent global runs. **Changes to `OcrTrainingRunRepository`:** ```java List<OcrTrainingRun> findByPersonIdIsNullOrderByCreatedAtDesc(); List<OcrTrainingRun> findByPersonIdOrderByCreatedAtDesc(UUID personId); ``` **Change to `GET /api/ocr/training-info`:** add optional `?personId=` query param. - `?personId=global` → returns full global model history (`personId IS NULL`) - `?personId={uuid}` → returns full sender history for that person - Without param: current overview behaviour (+ senderModels field from gap 1) ### 3 — `POST /api/ocr/train-sender` endpoint The sender detail page has a "Jetzt neu trainieren" button. There is no HTTP endpoint exposing sender training today (`SenderModelService.triggerSenderTraining()` is internal only). **Add to `OcrController`:** ```java POST /api/ocr/train-sender Body: { "personId": UUID } Permission: ADMIN → calls SenderModelService.runOrQueueSenderTraining() → returns 201 with queued/started status ``` --- ## No backend changes needed - Stats fields (`availableBlocks`, `availableDocuments`, `availableSegBlocks`, `ocrServiceAvailable`) ✅ - `TrainingStatus` enum (QUEUED / RUNNING / DONE / FAILED) ✅ - Export endpoints ✅ - SSE progress endpoint ✅ - `POST /api/ocr/train` and `POST /api/ocr/segtrain` ✅ --- ## Frontend work (after backend) - Add OCR entry to `EntityNav` (new "System" section with separator; count = `1 + senderModels.length`) - New routes: `admin/ocr/+page.svelte`, `admin/ocr/global/+page.svelte`, `admin/ocr/[personId]/+page.svelte` - Reuse existing `TrainingHistory.svelte` component on detail pages - Health bar + models table are new components - SSE wiring via existing `/api/ocr/jobs/{jobId}/progress` channel - Regenerate API types after backend changes (`npm run generate:api`)
marcel added the featureui labels 2026-04-17 19:12:26 +02:00
Author
Owner

🏗️ Markus Keller — Application Architect

Observations

  • Cross-domain repository access in the issue spec. Gap 1 explicitly says "Inject SenderModelRepository into OcrTrainingService." This violates the project's layering rule: each service owns its repository; cross-domain data access goes through the owning service, not its repository. SenderModelRepository belongs to the sender-model domain — OcrTrainingService has no business holding a direct reference to it.

  • TrainingInfoResponse is a private nested record inside OcrTrainingService (lines 38–46). Once we add senderModels and expose the type via the controller, it needs to be a top-level class. A controller method should never implicitly reference a private nested record from a service.

  • ?personId=global is a magic string in a UUID-typed parameter slot. This breaks the implicit type contract of the parameter, will confuse OpenAPI documentation (which will type it as string rather than uuid), and needs special-case parsing in the backend. Clean alternatives exist.

  • Three routes sharing the same full-history shape (/admin/ocr/global and /admin/ocr/[personId]) is a good reuse of TrainingHistory.svelte. This part of the design is sound.

Recommendations

  • Fix the cross-domain access: Add List<SenderModel> getAllSenderModels() to SenderModelService and call it from OcrTrainingService.getTrainingInfo(). Never inject SenderModelRepository into a service that doesn't own it.

  • Promote TrainingInfoResponse to dto/TrainingInfoResponse.java (or a top-level record in the service package if you prefer). It's already controller-visible — a private nested type is structurally wrong for that role.

  • Replace ?personId=global with path-based routing. The cleanest option: GET /api/ocr/training-info (overview), GET /api/ocr/training-info/global (full global history), GET /api/ocr/training-info/{personId} (sender history). Path segments make the intent explicit and the OpenAPI type can be uuid for the personId variant. If a single endpoint with a query param is preferred, use two separate params: ?scope=global vs ?personId={uuid} — not a string-that-pretends-to-be-a-UUID.

## 🏗️ Markus Keller — Application Architect ### Observations - **Cross-domain repository access in the issue spec.** Gap 1 explicitly says "Inject `SenderModelRepository` into `OcrTrainingService`." This violates the project's layering rule: each service owns its repository; cross-domain data access goes through the owning service, not its repository. `SenderModelRepository` belongs to the sender-model domain — `OcrTrainingService` has no business holding a direct reference to it. - **`TrainingInfoResponse` is a private nested record inside `OcrTrainingService` (lines 38–46).** Once we add `senderModels` and expose the type via the controller, it needs to be a top-level class. A controller method should never implicitly reference a private nested record from a service. - **`?personId=global` is a magic string in a UUID-typed parameter slot.** This breaks the implicit type contract of the parameter, will confuse OpenAPI documentation (which will type it as `string` rather than `uuid`), and needs special-case parsing in the backend. Clean alternatives exist. - **Three routes sharing the same full-history shape** (`/admin/ocr/global` and `/admin/ocr/[personId]`) is a good reuse of `TrainingHistory.svelte`. This part of the design is sound. ### Recommendations - **Fix the cross-domain access:** Add `List<SenderModel> getAllSenderModels()` to `SenderModelService` and call it from `OcrTrainingService.getTrainingInfo()`. Never inject `SenderModelRepository` into a service that doesn't own it. - **Promote `TrainingInfoResponse` to `dto/TrainingInfoResponse.java`** (or a top-level record in the `service` package if you prefer). It's already controller-visible — a private nested type is structurally wrong for that role. - **Replace `?personId=global` with path-based routing.** The cleanest option: `GET /api/ocr/training-info` (overview), `GET /api/ocr/training-info/global` (full global history), `GET /api/ocr/training-info/{personId}` (sender history). Path segments make the intent explicit and the OpenAPI type can be `uuid` for the personId variant. If a single endpoint with a query param is preferred, use two separate params: `?scope=global` vs `?personId={uuid}` — not a string-that-pretends-to-be-a-UUID.
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Observations

  • runOrQueueSenderTraining(UUID personId, int correctedLines) requires correctedLines, but the new endpoint body only has personId. The issue spec says "calls SenderModelService.runOrQueueSenderTraining()" without addressing the second parameter. Passing 0 silently is not obvious — it needs a deliberate decision.

  • HTTP 201 is wrong for an async trigger. The existing POST /api/ocr/train and POST /api/ocr/segtrain return the OcrTrainingRun entity with an implicit 200. The new POST /api/ocr/train-sender queues or starts work — it doesn't create a resource. 202 Accepted with the OcrTrainingRun body is the correct status and matches the async intent.

  • senderModels on TrainingInfoResponse will be null on the overview path for all existing frontend consumers until the backend change lands and npm run generate:api is re-run. The TypeScript type will need to reflect this as senderModels?: SenderModel[] or be required once the backend ships. Plan the regeneration step before writing any frontend overview code.

  • Component granularity for the three new pages. The issue mentions "health bar + models table are new components." Following the one-visual-region-per-file rule, I'd name these: OcrHealthBar.svelte, OcrModelsTable.svelte, OcrStatCards.svelte. The detail pages (global/+page.svelte, [personId]/+page.svelte) should be under 60 lines each by composing these.

  • TrainingHistory.svelte reuse is correct as-is — its props { runs: Run[], personNames?: Record<string, string> } already cover the detail page use case. No modification needed.

Recommendations

  • Add triggerManualSenderTraining(UUID personId): OcrTrainingRun to SenderModelService — a thin wrapper that calls runOrQueueSenderTraining(personId, 0) with documented intent. The 0 for correctedLines becomes an explicit named concept rather than a silent magic number at the call site.

  • Change the endpoint response to ResponseEntity<OcrTrainingRun> returning ResponseEntity.accepted().body(run) (HTTP 202), consistent with the async nature and distinguishable from a true resource creation.

  • Before any frontend work: rebuild the backend JAR, start with --spring.profiles.active=dev, run npm run generate:api. Do this immediately after each backend gap is closed — don't batch all three gaps before regenerating.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer ### Observations - **`runOrQueueSenderTraining(UUID personId, int correctedLines)` requires `correctedLines`, but the new endpoint body only has `personId`.** The issue spec says "calls `SenderModelService.runOrQueueSenderTraining()`" without addressing the second parameter. Passing `0` silently is not obvious — it needs a deliberate decision. - **HTTP 201 is wrong for an async trigger.** The existing `POST /api/ocr/train` and `POST /api/ocr/segtrain` return the `OcrTrainingRun` entity with an implicit 200. The new `POST /api/ocr/train-sender` queues or starts work — it doesn't create a resource. **202 Accepted** with the `OcrTrainingRun` body is the correct status and matches the async intent. - **`senderModels` on `TrainingInfoResponse` will be `null` on the overview path for all existing frontend consumers** until the backend change lands and `npm run generate:api` is re-run. The TypeScript type will need to reflect this as `senderModels?: SenderModel[]` or be required once the backend ships. Plan the regeneration step before writing any frontend overview code. - **Component granularity for the three new pages.** The issue mentions "health bar + models table are new components." Following the one-visual-region-per-file rule, I'd name these: `OcrHealthBar.svelte`, `OcrModelsTable.svelte`, `OcrStatCards.svelte`. The detail pages (`global/+page.svelte`, `[personId]/+page.svelte`) should be under 60 lines each by composing these. - **`TrainingHistory.svelte` reuse is correct as-is** — its props `{ runs: Run[], personNames?: Record<string, string> }` already cover the detail page use case. No modification needed. ### Recommendations - Add `triggerManualSenderTraining(UUID personId): OcrTrainingRun` to `SenderModelService` — a thin wrapper that calls `runOrQueueSenderTraining(personId, 0)` with documented intent. The `0` for `correctedLines` becomes an explicit named concept rather than a silent magic number at the call site. - Change the endpoint response to `ResponseEntity<OcrTrainingRun>` returning `ResponseEntity.accepted().body(run)` (HTTP 202), consistent with the async nature and distinguishable from a true resource creation. - Before any frontend work: rebuild the backend JAR, start with `--spring.profiles.active=dev`, run `npm run generate:api`. Do this immediately after each backend gap is closed — don't batch all three gaps before regenerating.
Author
Owner

🔐 Nora "NullX" Steiner — Security Engineer

Observations

  • POST /api/ocr/train-sender is correctly scoped to ADMIN. The @RequirePermission(Permission.ADMIN) annotation via PermissionAspect will enforce this at the AOP layer.

  • UUID validation at the controller boundary. If personId in the request body is declared as UUID (Java type), Spring's Jackson deserializer will throw a 400 automatically on malformed input. But if it's declared as String and converted manually, an invalid UUID produces an opaque 500. Declare it UUID in the DTO.

  • No person existence check is described in the spec. The issue says the endpoint "calls SenderModelService.runOrQueueSenderTraining()" directly. If the personId belongs to a non-existent person, the behavior is undefined — potentially a 500 from a foreign key constraint, or silently training a phantom model. A structured 404 should come back instead.

  • Duplicate training prevention is already handled. runOrQueueSenderTraining prevents duplicate queued runs per person via application-level check. This is the correct defense against rapid repeated clicks on "Jetzt neu trainieren."

  • SSE permission parity. The existing /api/ocr/jobs/{jobId}/progress requires READ_ALL. ADMIN permission must imply READ_ALL in the permission model — confirm this before wiring the sender detail page's progress polling, since an ADMIN-only user who lacks READ_ALL would lose their SSE stream mid-training.

  • The ?personId=global sentinel (from Gap 2). If this string is passed to any UUID-typed field, it will cause a 400 or 500. The controller must handle it with an explicit if ("global".equals(personId)) branch — this is an untyped special case that lives entirely in application logic and is invisible to the type system or OpenAPI tooling.

Recommendations

  • Declare the request body DTO with UUID personId (not String). Jackson validates format; you get a free 400 on malformed input.
  • Before calling runOrQueueSenderTraining, call personService.getById(personId) — this returns a structured DomainException.notFound(ErrorCode.PERSON_NOT_FOUND, ...) rather than a DB constraint error.
  • Confirm that the Permission enum hierarchy ensures ADMIN ⊇ READ_ALL, or add READ_ALL as a second annotation where needed on the SSE endpoint.
  • If the ?personId=global design survives (see Markus's concern), isolate the parsing in a dedicated method with a comment explaining the intent — don't scatter "global".equals(...) checks inline.
## 🔐 Nora "NullX" Steiner — Security Engineer ### Observations - **`POST /api/ocr/train-sender` is correctly scoped to `ADMIN`.** ✅ The `@RequirePermission(Permission.ADMIN)` annotation via `PermissionAspect` will enforce this at the AOP layer. - **UUID validation at the controller boundary.** If `personId` in the request body is declared as `UUID` (Java type), Spring's Jackson deserializer will throw a 400 automatically on malformed input. ✅ But if it's declared as `String` and converted manually, an invalid UUID produces an opaque 500. Declare it `UUID` in the DTO. - **No person existence check is described in the spec.** The issue says the endpoint "calls `SenderModelService.runOrQueueSenderTraining()`" directly. If the `personId` belongs to a non-existent person, the behavior is undefined — potentially a 500 from a foreign key constraint, or silently training a phantom model. A structured 404 should come back instead. - **Duplicate training prevention is already handled.** `runOrQueueSenderTraining` prevents duplicate queued runs per person via application-level check. ✅ This is the correct defense against rapid repeated clicks on "Jetzt neu trainieren." - **SSE permission parity.** The existing `/api/ocr/jobs/{jobId}/progress` requires `READ_ALL`. ADMIN permission must imply READ_ALL in the permission model — confirm this before wiring the sender detail page's progress polling, since an ADMIN-only user who lacks `READ_ALL` would lose their SSE stream mid-training. - **The `?personId=global` sentinel (from Gap 2).** If this string is passed to any UUID-typed field, it will cause a 400 or 500. The controller must handle it with an explicit `if ("global".equals(personId))` branch — this is an untyped special case that lives entirely in application logic and is invisible to the type system or OpenAPI tooling. ### Recommendations - Declare the request body DTO with `UUID personId` (not `String`). Jackson validates format; you get a free 400 on malformed input. - Before calling `runOrQueueSenderTraining`, call `personService.getById(personId)` — this returns a structured `DomainException.notFound(ErrorCode.PERSON_NOT_FOUND, ...)` rather than a DB constraint error. - Confirm that the Permission enum hierarchy ensures ADMIN ⊇ READ_ALL, or add `READ_ALL` as a second annotation where needed on the SSE endpoint. - If the `?personId=global` design survives (see Markus's concern), isolate the parsing in a dedicated method with a comment explaining the intent — don't scatter `"global".equals(...)` checks inline.
Author
Owner

🧪 Sara Holt — QA Engineer

Observations

  • Two new derived query methods (findByPersonIdIsNullOrderByCreatedAtDesc(), findByPersonIdOrderByCreatedAtDesc(UUID personId)) look correct as Spring Data naming, but the IS NULL path is non-obvious. Spring Data's null-handling for derived queries can surprise — findByPersonIdIsNull is explicit, but it's worth an integration test against real Postgres (not H2) to confirm the generated SQL matches expectations.

  • The ?personId= query parameter has four distinct code paths that all need explicit test coverage:

    1. No param → overview (top-20, + senderModels list)
    2. ?personId=global → full global history (personId IS NULL)
    3. ?personId={valid uuid} → full sender history
    4. ?personId={invalid string} → 400
    5. ?personId={valid uuid, no matching person} → 404 (if existence check is added)

    That's five @WebMvcTest cases for a single endpoint change.

  • POST /api/ocr/train-sender permission boundary tests are not mentioned in the issue. These need to be written before the implementation (TDD): 401 unauthenticated, 403 for READ_ALL-only user, 404 for unknown personId, 202 for happy path.

  • TrainingHistory.svelte.spec.ts — the existing spec tests cover runs + personNames for recent-run persons. Once personNames is expanded to cover all senders (not just those in recent runs), add a test for the case where a person appears in personNames but not in the run list.

  • runOrQueueSenderTraining duplicate-prevention logic relies on application-level checks (not a DB unique constraint per the code). This is a race condition risk under concurrent requests. This path needs an integration test with Testcontainers and real Postgres to verify idempotency — a pure Mockito unit test cannot catch a TOCTOU race.

Recommendations

  • Write @WebMvcTest tests for all five ?personId= branches before implementing the endpoint change. The red test proves each branch is real.
  • Add an integration test (@SpringBootTest + Testcontainers) for both new repository methods specifically verifying the IS NULL / IS NOT NULL distinction against a real ocr_training_runs table.
  • For the duplicate-prevention race: add a comment in the service and a TODO for a DB-level unique partial index (CREATE UNIQUE INDEX ... WHERE status = 'QUEUED' per person) if concurrent admin use becomes realistic.
## 🧪 Sara Holt — QA Engineer ### Observations - **Two new derived query methods** (`findByPersonIdIsNullOrderByCreatedAtDesc()`, `findByPersonIdOrderByCreatedAtDesc(UUID personId)`) look correct as Spring Data naming, but the `IS NULL` path is non-obvious. Spring Data's null-handling for derived queries can surprise — `findByPersonIdIsNull` is explicit, but it's worth an integration test against real Postgres (not H2) to confirm the generated SQL matches expectations. - **The `?personId=` query parameter has four distinct code paths** that all need explicit test coverage: 1. No param → overview (top-20, + senderModels list) 2. `?personId=global` → full global history (personId IS NULL) 3. `?personId={valid uuid}` → full sender history 4. `?personId={invalid string}` → 400 5. `?personId={valid uuid, no matching person}` → 404 (if existence check is added) That's five `@WebMvcTest` cases for a single endpoint change. - **`POST /api/ocr/train-sender` permission boundary tests** are not mentioned in the issue. These need to be written before the implementation (TDD): 401 unauthenticated, 403 for READ_ALL-only user, 404 for unknown personId, 202 for happy path. - **`TrainingHistory.svelte.spec.ts`** — the existing spec tests cover `runs` + `personNames` for recent-run persons. Once `personNames` is expanded to cover all senders (not just those in recent runs), add a test for the case where a person appears in `personNames` but not in the run list. - **`runOrQueueSenderTraining` duplicate-prevention logic** relies on application-level checks (not a DB unique constraint per the code). This is a race condition risk under concurrent requests. This path needs an integration test with Testcontainers and real Postgres to verify idempotency — a pure Mockito unit test cannot catch a TOCTOU race. ### Recommendations - Write `@WebMvcTest` tests for all five `?personId=` branches before implementing the endpoint change. The red test proves each branch is real. - Add an integration test (`@SpringBootTest` + Testcontainers) for both new repository methods specifically verifying the `IS NULL` / `IS NOT NULL` distinction against a real `ocr_training_runs` table. - For the duplicate-prevention race: add a comment in the service and a TODO for a DB-level unique partial index (`CREATE UNIQUE INDEX ... WHERE status = 'QUEUED'` per person) if concurrent admin use becomes realistic.
Author
Owner

🎨 Leonie Voss — UX Designer & Accessibility Strategist

Observations

  • The spec file docs/specs/ocr-admin-spec.html does not exist in the repository at the current commit. I cannot review the visual design against the spec — this review is based solely on the issue description.

  • "Health bar" component. A horizontal bar conveying health/availability status is an accessibility risk if it uses color alone (green/red). Color-blind users cannot distinguish healthy from degraded. The component also needs ARIA progressbar semantics if it shows a percentage.

  • count = 1 + senderModels.length in the EntityNav badge. A badge showing "OCR (5)" next to an admin nav item is ambiguous — does it mean 5 models? 5 senders? Unseen errors? The senior user segment won't parse this without context.

  • "Jetzt neu trainieren" button on the sender detail page. If a training run is QUEUED or RUNNING, the button must be visually disabled and communicate its state. Clicking a button and seeing nothing happen is confusing for all users, critical for seniors.

  • SSE progress updates. If the overview or detail page streams live training progress, the updating text must be in an aria-live region so screen readers announce changes without requiring the user to navigate to the element.

  • Three new admin routes — each needs a consistent <title> (for browser tab and screen reader page announcements) and a back-link using the project's standard back-link pattern.

Recommendations

  • Implement the health bar as:

    <div role="progressbar" aria-valuenow={healthPercent} aria-valuemin={0} aria-valuemax={100}
         aria-label="OCR Service  Verfügbarkeit">
      <!-- fill bar -->
      <span class="sr-only">{healthPercent}% verfügbar</span>
    </div>
    

    Add an icon (✓ / ✗) alongside the color fill for color-blind parity.

  • Remove the numeric count from the EntityNav OCR entry, or use a subtitle: "1 global + {n} Sender" as a descriptive label rather than a raw number badge.

  • Bind "Jetzt neu trainieren" to disabled={status === 'RUNNING' || status === 'QUEUED'} and show an inline spinner when active. Touch target: min-h-[44px].

  • Wrap SSE-driven status text in:

    <p aria-live="polite" aria-atomic="false">{statusMessage}</p>
    
  • Ensure each new route sets <svelte:head><title>OCR — Familienarchiv</title></svelte:head> following the existing admin page pattern.

## 🎨 Leonie Voss — UX Designer & Accessibility Strategist ### Observations - **The spec file `docs/specs/ocr-admin-spec.html` does not exist in the repository at the current commit.** I cannot review the visual design against the spec — this review is based solely on the issue description. - **"Health bar" component.** A horizontal bar conveying health/availability status is an accessibility risk if it uses color alone (green/red). Color-blind users cannot distinguish healthy from degraded. The component also needs ARIA progressbar semantics if it shows a percentage. - **`count = 1 + senderModels.length` in the EntityNav badge.** A badge showing "OCR (5)" next to an admin nav item is ambiguous — does it mean 5 models? 5 senders? Unseen errors? The senior user segment won't parse this without context. - **"Jetzt neu trainieren" button on the sender detail page.** If a training run is QUEUED or RUNNING, the button must be visually disabled and communicate its state. Clicking a button and seeing nothing happen is confusing for all users, critical for seniors. - **SSE progress updates.** If the overview or detail page streams live training progress, the updating text must be in an `aria-live` region so screen readers announce changes without requiring the user to navigate to the element. - **Three new admin routes** — each needs a consistent `<title>` (for browser tab and screen reader page announcements) and a back-link using the project's standard back-link pattern. ### Recommendations - Implement the health bar as: ```svelte <div role="progressbar" aria-valuenow={healthPercent} aria-valuemin={0} aria-valuemax={100} aria-label="OCR Service — Verfügbarkeit"> <!-- fill bar --> <span class="sr-only">{healthPercent}% verfügbar</span> </div> ``` Add an icon (✓ / ✗) alongside the color fill for color-blind parity. - Remove the numeric count from the EntityNav OCR entry, or use a subtitle: `"1 global + {n} Sender"` as a descriptive label rather than a raw number badge. - Bind "Jetzt neu trainieren" to `disabled={status === 'RUNNING' || status === 'QUEUED'}` and show an inline spinner when active. Touch target: `min-h-[44px]`. - Wrap SSE-driven status text in: ```svelte <p aria-live="polite" aria-atomic="false">{statusMessage}</p> ``` - Ensure each new route sets `<svelte:head><title>OCR — Familienarchiv</title></svelte:head>` following the existing admin page pattern.
Author
Owner

🛠️ Tobias Wendt — DevOps & Platform Engineer

Observations

  • No new infrastructure components needed for this issue. No new Docker services, volumes, networks, or environment variables are introduced.

  • OCR service memory limits. The admin "Jetzt neu trainieren" button puts model training directly in the hands of admins, potentially triggering it more frequently than the automated threshold-based path. If docker-compose.yml doesn't already have deploy.resources.limits.memory on the OCR service, concurrent or back-to-back sender training runs could OOM-kill the container (Surya loads ~5GB of transformer weights).

  • SenderModelRepository.findAll() on every admin page load. The overview endpoint will call findAll() on every visit to /admin/ocr. For a family archive with <30 senders this is negligible, but it's a full table scan with no pagination. Note it as a future concern if the archive grows.

  • SSE stream timeout at the Caddy reverse proxy. The progress channel at /api/ocr/jobs/{jobId}/progress is a long-lived HTTP connection. Caddy's default reverse_proxy directive uses a 0s (infinite) read timeout, but if the Caddyfile has explicit timeout overrides, the SSE stream will be silently cut mid-training. Verify flush_interval -1 is set for SSE routes, or that no explicit read timeout applies to the /api/ocr/jobs/*/progress path.

  • V40 and V41 migrations are already on the branch (confirmed from git status). No additional migrations needed for this issue.

Recommendations

  • Check docker-compose.yml for the OCR service — if there's no deploy.resources.limits.memory, add 8G (matching the Surya load requirement documented in docs/infrastructure/).
  • Verify the Caddyfile has either flush_interval -1 globally for SSE routes, or an explicit handle /api/ocr/jobs/*/progress block with that setting. This is needed for the existing SSE consumers too — if it's not there yet, this is the right time to add it.
  • No CI pipeline changes needed. The new endpoints and Svelte routes are covered by existing backend test and frontend build jobs.
## 🛠️ Tobias Wendt — DevOps & Platform Engineer ### Observations - **No new infrastructure components needed for this issue.** No new Docker services, volumes, networks, or environment variables are introduced. ✅ - **OCR service memory limits.** The admin "Jetzt neu trainieren" button puts model training directly in the hands of admins, potentially triggering it more frequently than the automated threshold-based path. If `docker-compose.yml` doesn't already have `deploy.resources.limits.memory` on the OCR service, concurrent or back-to-back sender training runs could OOM-kill the container (Surya loads ~5GB of transformer weights). - **`SenderModelRepository.findAll()` on every admin page load.** The overview endpoint will call `findAll()` on every visit to `/admin/ocr`. For a family archive with <30 senders this is negligible, but it's a full table scan with no pagination. Note it as a future concern if the archive grows. - **SSE stream timeout at the Caddy reverse proxy.** The progress channel at `/api/ocr/jobs/{jobId}/progress` is a long-lived HTTP connection. Caddy's default `reverse_proxy` directive uses a 0s (infinite) read timeout, but if the Caddyfile has explicit timeout overrides, the SSE stream will be silently cut mid-training. Verify `flush_interval -1` is set for SSE routes, or that no explicit read timeout applies to the `/api/ocr/jobs/*/progress` path. - **V40 and V41 migrations are already on the branch** (confirmed from git status). No additional migrations needed for this issue. ### Recommendations - Check `docker-compose.yml` for the OCR service — if there's no `deploy.resources.limits.memory`, add `8G` (matching the Surya load requirement documented in `docs/infrastructure/`). - Verify the Caddyfile has either `flush_interval -1` globally for SSE routes, or an explicit `handle /api/ocr/jobs/*/progress` block with that setting. This is needed for the existing SSE consumers too — if it's not there yet, this is the right time to add it. - No CI pipeline changes needed. The new endpoints and Svelte routes are covered by existing backend test and frontend build jobs.
Author
Owner

🗳️ Decision Queue — Action Required

1 decision needs your input before implementation starts.

API Design

  • Routing for per-model history: query param sentinel vs path segments — The issue spec uses GET /api/ocr/training-info?personId=global (a magic string in a UUID slot) to distinguish global history from sender history. Markus recommends path segments (GET /api/ocr/training-info/global and GET /api/ocr/training-info/{personId}) which give a clean OpenAPI type (uuid for the personId variant) and explicit routing. Nora flagged that the ?personId=global approach requires special-case parsing that's invisible to the type system. Trade-off: path segments require two controller methods and slightly different SvelteKit load function URLs; the sentinel query param keeps it as one method but conflates string and UUID semantics and makes OpenAPI generation awkward. (Raised by: Markus, Nora)
## 🗳️ Decision Queue — Action Required _1 decision needs your input before implementation starts._ ### API Design - **Routing for per-model history: query param sentinel vs path segments** — The issue spec uses `GET /api/ocr/training-info?personId=global` (a magic string in a UUID slot) to distinguish global history from sender history. Markus recommends path segments (`GET /api/ocr/training-info/global` and `GET /api/ocr/training-info/{personId}`) which give a clean OpenAPI type (`uuid` for the personId variant) and explicit routing. Nora flagged that the `?personId=global` approach requires special-case parsing that's invisible to the type system. **Trade-off:** path segments require two controller methods and slightly different SvelteKit load function URLs; the sentinel query param keeps it as one method but conflates string and UUID semantics and makes OpenAPI generation awkward. _(Raised by: Markus, Nora)_
Author
Owner

🏗️ Markus Keller — Architecture Discussion

Four open items from my review comment worked through and closed. These supersede the Decision Queue item posted earlier.

Resolved

  1. Cross-domain repo access — Do not inject SenderModelRepository into OcrTrainingService. Add getAllSenderModels(): List<SenderModel> to SenderModelService and call that from OcrTrainingService.getTrainingInfo(). Service-to-service is the permitted cross-domain channel.

  2. Routing design for per-model history — The ?personId=global sentinel is dropped. Use three path-based endpoints instead:

    • GET /api/ocr/training-info — overview (top-20 + senderModels list)
    • GET /api/ocr/training-info/global — full global model history
    • GET /api/ocr/training-info/{personId} — full sender history (personId typed uuid)

    OpenAPI will type personId correctly and no sentinel parsing is needed in the controller.

  3. TrainingInfoResponse location — Move from private nested record inside OcrTrainingService to dto/TrainingInfoResponse.java (top-level class). It's controller-visible with a new field — a private nested type is the wrong home for it.

  4. POST /api/ocr/train-sender full contract — Settled as:

    • Request body: { personId: UUID } — declare as Java UUID type so Jackson rejects malformed input with a 400
    • Person existence guard: call personService.getById(personId) before SenderModelService — returns structured DomainException.notFound(PERSON_NOT_FOUND) rather than a FK constraint error
    • Response: 202 Accepted with OcrTrainingRun body (async trigger, not resource creation)

The issue spec's Gap 1–3 descriptions should be read with these four corrections applied before implementation starts.

## 🏗️ Markus Keller — Architecture Discussion Four open items from my review comment worked through and closed. These supersede the Decision Queue item posted earlier. ### Resolved 1. **Cross-domain repo access** — Do not inject `SenderModelRepository` into `OcrTrainingService`. Add `getAllSenderModels(): List<SenderModel>` to `SenderModelService` and call that from `OcrTrainingService.getTrainingInfo()`. Service-to-service is the permitted cross-domain channel. 2. **Routing design for per-model history** — The `?personId=global` sentinel is dropped. Use three path-based endpoints instead: - `GET /api/ocr/training-info` — overview (top-20 + senderModels list) - `GET /api/ocr/training-info/global` — full global model history - `GET /api/ocr/training-info/{personId}` — full sender history (`personId` typed `uuid`) OpenAPI will type `personId` correctly and no sentinel parsing is needed in the controller. 3. **`TrainingInfoResponse` location** — Move from private nested record inside `OcrTrainingService` to `dto/TrainingInfoResponse.java` (top-level class). It's controller-visible with a new field — a private nested type is the wrong home for it. 4. **`POST /api/ocr/train-sender` full contract** — Settled as: - Request body: `{ personId: UUID }` — declare as Java `UUID` type so Jackson rejects malformed input with a 400 - Person existence guard: call `personService.getById(personId)` before `SenderModelService` — returns structured `DomainException.notFound(PERSON_NOT_FOUND)` rather than a FK constraint error - Response: **202 Accepted** with `OcrTrainingRun` body (async trigger, not resource creation) The issue spec's Gap 1–3 descriptions should be read with these four corrections applied before implementation starts.
Author
Owner

Implementation complete

Branch: feat/issue-264-ocr-admin-pages

Commits

Backend

  • b1b7418 — Promote TrainingInfoResponse to dto/, add senderModels field
  • 178afcd — Add per-model history endpoints (GET /api/ocr/training-info/global, GET /api/ocr/training-info/{personId})
  • c3fa09d — Add POST /api/ocr/train-sender endpoint for manual sender training trigger
  • 99e7176 — Service-level tests for triggerManualSenderTraining

Frontend

  • f533817 — Regenerate TypeScript API types with new OCR endpoints
  • 5f4e60a — Add OCR entry to EntityNav sidebar and flyout
  • 0d8ac46OcrHealthBar, OcrStatCards, OcrModelsTable components (with tests)
  • 8acb830 — Admin OCR routes: /admin/ocr overview, /admin/ocr/global history, /admin/ocr/[personId] sender detail (with page server tests)

Test results

  • Backend: 1060 tests
  • Frontend: 971 tests
## Implementation complete ✅ Branch: `feat/issue-264-ocr-admin-pages` ### Commits **Backend** - `b1b7418` — Promote `TrainingInfoResponse` to `dto/`, add `senderModels` field - `178afcd` — Add per-model history endpoints (`GET /api/ocr/training-info/global`, `GET /api/ocr/training-info/{personId}`) - `c3fa09d` — Add `POST /api/ocr/train-sender` endpoint for manual sender training trigger - `99e7176` — Service-level tests for `triggerManualSenderTraining` **Frontend** - `f533817` — Regenerate TypeScript API types with new OCR endpoints - `5f4e60a` — Add OCR entry to EntityNav sidebar and flyout - `0d8ac46` — `OcrHealthBar`, `OcrStatCards`, `OcrModelsTable` components (with tests) - `8acb830` — Admin OCR routes: `/admin/ocr` overview, `/admin/ocr/global` history, `/admin/ocr/[personId]` sender detail (with page server tests) ### Test results - Backend: 1060 tests ✅ - Frontend: 971 tests ✅
Author
Owner

Implementation complete

All acceptance criteria from the issue are implemented and fully reviewed via PR #265.

What was implemented

Backend

  • GET /api/ocr/models — returns all sender models with personId, cer, accuracy, correctedLinesAtTraining
  • GET /api/ocr/training-info — training overview (global stats, per-person model list)
  • GET /api/ocr/training-info/global — global training history runs
  • GET /api/ocr/training-info/{personId} — per-person training history
  • POST /api/ocr/train-sender — manually trigger sender model training for a person
  • ErrorCode.OCR_TRAINING_CONFLICT added for the defensive path in triggerManualSenderTraining and triggerSenderTraining
  • All endpoints guarded with @RequirePermission(Permission.ADMIN)
  • 20 SenderModelServiceTest tests passing

Frontend

  • /admin/ocr — OCR overview page (health bar, stat cards, sender models table)
  • /admin/ocr/global — global training history page
  • /admin/ocr/[personId] — per-person training history page
  • OcrModelsTable, OcrStatCards, OcrHealthBar, TrainingHistory components
  • WCAG 2.5.5 touch targets (44px) on all table links
  • Focus rings on all interactive elements
  • Full i18n (de/en/es) via Paraglide

Commits

  • 794000c — locale-agnostic OcrHealthBar tests, focus rings on OCR links
  • 2466553 — fix: IllegalStateExceptionDomainException.internal in triggerManualSenderTraining
  • fc892f0 — fix: pass personId through load fn; widen touch targets in table rows
  • 16bcd0f — fix: IllegalStateExceptionDomainException.internal in triggerSenderTraining

Post-merge task

Issue #266 tracks regenerating TypeScript types so TriggerSenderTrainingDTO.personId becomes non-optional in the generated client.

PR: http://heim-nas:3005/marcel/familienarchiv/pulls/265

## Implementation complete ✅ All acceptance criteria from the issue are implemented and fully reviewed via PR #265. ### What was implemented **Backend** - `GET /api/ocr/models` — returns all sender models with `personId`, `cer`, `accuracy`, `correctedLinesAtTraining` - `GET /api/ocr/training-info` — training overview (global stats, per-person model list) - `GET /api/ocr/training-info/global` — global training history runs - `GET /api/ocr/training-info/{personId}` — per-person training history - `POST /api/ocr/train-sender` — manually trigger sender model training for a person - `ErrorCode.OCR_TRAINING_CONFLICT` added for the defensive path in `triggerManualSenderTraining` and `triggerSenderTraining` - All endpoints guarded with `@RequirePermission(Permission.ADMIN)` - 20 `SenderModelServiceTest` tests passing **Frontend** - `/admin/ocr` — OCR overview page (health bar, stat cards, sender models table) - `/admin/ocr/global` — global training history page - `/admin/ocr/[personId]` — per-person training history page - `OcrModelsTable`, `OcrStatCards`, `OcrHealthBar`, `TrainingHistory` components - WCAG 2.5.5 touch targets (44px) on all table links - Focus rings on all interactive elements - Full i18n (de/en/es) via Paraglide ### Commits - `794000c` — locale-agnostic OcrHealthBar tests, focus rings on OCR links - `2466553` — fix: `IllegalStateException` → `DomainException.internal` in `triggerManualSenderTraining` - `fc892f0` — fix: pass `personId` through load fn; widen touch targets in table rows - `16bcd0f` — fix: `IllegalStateException` → `DomainException.internal` in `triggerSenderTraining` ### Post-merge task Issue #266 tracks regenerating TypeScript types so `TriggerSenderTrainingDTO.personId` becomes non-optional in the generated client. PR: http://heim-nas:3005/marcel/familienarchiv/pulls/265
Sign in to join this conversation.
No Label feature ui
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#264