test(search): add sender name FTS coverage and combined filter test

- should_find_document_by_sender_name: symmetric with existing receiver test
- fts_combined_with_status_filter_excludes_non_matching_status: verifies hasIds(rankedIds).and(hasStatus(...)) two-phase search works together

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

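
The combined-filter test above exercises a two-phase search: full-text search first produces ranked ids, then an id filter is AND-ed with a status filter. The search code itself is not part of this diff, so the following is only a minimal sketch of that composition in plain Java. It uses `java.util.function.Predicate` as a stand-in for whatever specification type the repository really uses, and the `Doc` class, the `Status` values, and the `search` method are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TwoPhaseSearch {

    // Hypothetical status values; the real enum is not shown in this diff.
    enum Status { UPLOADED, REVIEWED }

    // Hypothetical minimal document type for the sketch.
    static final class Doc {
        final String id;
        final Status status;

        Doc(String id, Status status) {
            this.id = id;
            this.status = status;
        }
    }

    // Phase 2a: keep only documents whose id was returned by the full-text phase.
    static Predicate<Doc> hasIds(List<String> rankedIds) {
        Set<String> ids = Set.copyOf(rankedIds);
        return doc -> ids.contains(doc.id);
    }

    // Phase 2b: keep only documents with the requested status.
    static Predicate<Doc> hasStatus(Status status) {
        return doc -> doc.status == status;
    }

    // Combines both filters; a document must satisfy both to be returned.
    static List<String> search(List<Doc> all, List<String> rankedIds, Status status) {
        return all.stream()
                .filter(hasIds(rankedIds).and(hasStatus(status)))
                .map(doc -> doc.id)
                .collect(Collectors.toList());
    }
}
```

With documents `a` (REVIEWED), `b` (UPLOADED) and `c` (REVIEWED), ranked ids `[b, a]` and status `REVIEWED`, only `a` passes: `b` matches the text search but has the wrong status, and `c` has the right status but was never ranked.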
refactor(search): replace O(n²) indexOf with HashMap for rank ordering
2026-04-15 11:03:37 +02:00 · 2026-04-15 10:59:05 +02:00 · 2026-04-15 10:57:24 +02:00 · 2026-04-14 23:47:45 +02:00 · 2026-04-14 23:46:24 +02:00 · 2026-04-14 23:38:12 +02:00
27 changed files with 129 additions and 824 deletions
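
The rank-ordering refactor named in the commit list above is not included in the file diffs that follow, so here is a hedged sketch of the general technique rather than the project's actual code. Sorting results with `rankedIds.indexOf(id)` as the sort key scans the ranked list on every lookup, which makes the lookups quadratic overall. Building a `HashMap` from id to position once makes each lookup constant time on average. The class name `RankOrdering` and both method names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RankOrdering {

    // Builds an id -> position lookup in one pass over the ranked list.
    // putIfAbsent keeps the first occurrence, matching List.indexOf semantics.
    static <T> Map<T, Integer> rankIndex(List<T> rankedIds) {
        Map<T, Integer> index = new HashMap<>();
        for (int i = 0; i < rankedIds.size(); i++) {
            index.putIfAbsent(rankedIds.get(i), i);
        }
        return index;
    }

    // Returns a copy of ids ordered by their position in rankedIds.
    // Ids missing from rankedIds sort last and keep their relative order,
    // because List.sort is stable.
    static <T> List<T> sortByRank(List<T> ids, List<T> rankedIds) {
        Map<T, Integer> index = rankIndex(rankedIds);
        List<T> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparingInt(
                (T id) -> index.getOrDefault(id, Integer.MAX_VALUE)));
        return sorted;
    }
}
```

For example, `RankOrdering.sortByRank(List.of("doc-a", "doc-b", "doc-c"), List.of("doc-c", "doc-a", "doc-b"))` returns the ids in ranked order, `[doc-c, doc-a, doc-b]`.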
--- a/.env.example
+++ b/.env.example
@@ -21,10 +21,9 @@ PORT_FRONTEND=5173
 PORT_MAILPIT_UI=8100
 PORT_MAILPIT_SMTP=1025

-# OCR Training — secret token required to call /train and /segtrain on the OCR service.
-# Also set in the backend so it can pass the token through. Must not be empty in production.
-# Generate with: python3 -c "import secrets; print(secrets.token_hex(32))"
-OCR_TRAINING_TOKEN=change-me-in-production
+# OCR Training — set a secret token to protect the /train and /segtrain endpoints on the
+# Python OCR microservice. Leave empty to disable token authentication (development only).
+# OCR_TRAINING_TOKEN=change-me-in-production

 # Production SMTP — uncomment and fill in to send real emails instead of catching them
 # APP_BASE_URL=https://your-domain.example.com
--- a/backend/.dockerignore
+++ b/backend/.dockerignore
@@ -1,4 +0,0 @@
-target/
-.git/
-*.md
-api_tests/
--- a/backend/Dockerfile
+++ b/backend/Dockerfile
@@ -1,18 +1,9 @@
-FROM eclipse-temurin:21.0.10_7-jdk-noble AS builder
+FROM eclipse-temurin:21-jdk
+
 WORKDIR /app

-# Copy wrapper and POM first — dependency layer is cached separately from source
-COPY .mvn .mvn
-COPY mvnw pom.xml ./
-RUN --mount=type=cache,target=/root/.m2 ./mvnw dependency:go-offline -q
-
-COPY src ./src
-# -Dmaven.test.skip=true skips test compilation entirely (not just execution)
-RUN --mount=type=cache,target=/root/.m2 ./mvnw clean package -Dmaven.test.skip=true -q
-
-FROM eclipse-temurin:21.0.10_7-jre-noble
-WORKDIR /app
-# Spring Boot repackages to *.jar; pre-repackage artifact uses .jar.original, not .jar
-COPY --from=builder /app/target/*.jar app.jar
 EXPOSE 8080
-CMD ["java", "-jar", "app.jar"]
+
+# Source code and mvnw are mounted via docker-compose volume at runtime.
+# Maven dependencies are cached in a named volume (~/.m2).
+CMD ["./mvnw", "spring-boot:run"]
--- a/backend/src/main/java/org/raddatz/familienarchiv/repository/OcrTrainingRunRepository.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/repository/OcrTrainingRunRepository.java
@@ -12,5 +12,5 @@ public interface OcrTrainingRunRepository extends JpaRepository<OcrTrainingRun,

    Optional<OcrTrainingRun> findFirstByStatus(TrainingStatus status);

-    List<OcrTrainingRun> findTop10ByOrderByCreatedAtDesc();
+    List<OcrTrainingRun> findTop5ByOrderByCreatedAtDesc();
 }
--- a/backend/src/main/java/org/raddatz/familienarchiv/service/OcrTrainingService.java
+++ b/backend/src/main/java/org/raddatz/familienarchiv/service/OcrTrainingService.java
@@ -45,13 +45,6 @@ public class OcrTrainingService {
            List<OcrTrainingRun> runs
    ) {}

-    private void assertNoRunningTraining() {
-        if (trainingRunRepository.findFirstByStatus(TrainingStatus.RUNNING).isPresent()) {
-            throw DomainException.conflict(ErrorCode.TRAINING_ALREADY_RUNNING,
-                    "A training run is already in progress");
-        }
-    }
-
    // Not safe for horizontal scaling: training reloads the Kraken model in-process on the
    // Python OCR service after each run. The DB-level RUNNING constraint (V30 partial unique
    // index) prevents concurrent training API calls, but cannot prevent two OCR service replicas
@@ -60,7 +53,10 @@ public class OcrTrainingService {
        // Short transaction: guard check + create RUNNING row, then commit immediately.
        // The DB connection is released before the OCR HTTP call, which can take several minutes.
        OcrTrainingRun run = Objects.requireNonNull(txTemplate.execute(status -> {
-            assertNoRunningTraining();
+            if (trainingRunRepository.findFirstByStatus(TrainingStatus.RUNNING).isPresent()) {
+                throw DomainException.conflict(ErrorCode.TRAINING_ALREADY_RUNNING,
+                        "A training run is already in progress");
+            }

            var eligibleBlocks = trainingDataExportService.queryEligibleBlocks();
            if (eligibleBlocks.size() < 5) {
@@ -124,7 +120,10 @@ public class OcrTrainingService {
    public OcrTrainingRun triggerSegTraining(UUID triggeredBy) {
        // Same pattern as triggerTraining: narrow transactions around DB writes only.
        OcrTrainingRun run = Objects.requireNonNull(txTemplate.execute(status -> {
-            assertNoRunningTraining();
+            if (trainingRunRepository.findFirstByStatus(TrainingStatus.RUNNING).isPresent()) {
+                throw DomainException.conflict(ErrorCode.TRAINING_ALREADY_RUNNING,
+                        "A training run is already in progress");
+            }

            var segBlocks = segmentationTrainingExportService.querySegmentationBlocks();
            if (segBlocks.size() < 5) {
@@ -163,12 +162,11 @@ public class OcrTrainingService {
            return Objects.requireNonNull(txTemplate.execute(status -> {
                run.setStatus(TrainingStatus.DONE);
                run.setCompletedAt(Instant.now());
-                run.setCer(result.cer());
                run.setLoss(result.loss());
                run.setAccuracy(result.accuracy());
                run.setEpochs(result.epochs());
                OcrTrainingRun updated = trainingRunRepository.save(run);
-                log.info("[trainingRun={}] Segmentation training completed — cer={} epochs={}", runId, result.cer(), result.epochs());
+                log.info("[trainingRun={}] Segmentation training completed — epochs={}", runId, result.epochs());
                return updated;
            }));
        } catch (Exception e) {
@@ -195,7 +193,7 @@ public class OcrTrainingService {
        int totalOcrBlocks = (int) blockRepository.count();
        int availableSegBlocks = segmentationTrainingExportService.querySegmentationBlocks().size();

-        List<OcrTrainingRun> recentRuns = trainingRunRepository.findTop10ByOrderByCreatedAtDesc();
+        List<OcrTrainingRun> recentRuns = trainingRunRepository.findTop5ByOrderByCreatedAtDesc();
        OcrTrainingRun lastRun = recentRuns.isEmpty() ? null : recentRuns.get(0);

        return new TrainingInfoResponse(
--- a/backend/src/test/java/org/raddatz/familienarchiv/service/OcrTrainingServiceTest.java
+++ b/backend/src/test/java/org/raddatz/familienarchiv/service/OcrTrainingServiceTest.java
@@ -53,7 +53,7 @@ class OcrTrainingServiceTest {
        service = new OcrTrainingService(runRepository, exportService, segExportService, ocrClient, healthClient, blockRepository, txTemplate);

        when(blockRepository.count()).thenReturn(0L);
-        when(runRepository.findTop10ByOrderByCreatedAtDesc()).thenReturn(List.of());
+        when(runRepository.findTop5ByOrderByCreatedAtDesc()).thenReturn(List.of());
        when(segExportService.querySegmentationBlocks()).thenReturn(List.of());
    }

@@ -146,90 +146,6 @@ class OcrTrainingServiceTest {
                run.getStatus() == TrainingStatus.FAILED && run.getErrorMessage() != null));
    }

-    // ─── triggerSegTraining ───────────────────────────────────────────────────
-
-    @Test
-    void triggerSegTraining_throws409_whenRunningRunExists() {
-        when(runRepository.findFirstByStatus(TrainingStatus.RUNNING))
-                .thenReturn(Optional.of(OcrTrainingRun.builder()
-                        .id(UUID.randomUUID()).status(TrainingStatus.RUNNING)
-                        .blockCount(5).documentCount(2).modelName("blla").build()));
-
-        assertThatThrownBy(() -> service.triggerSegTraining(null))
-                .isInstanceOf(DomainException.class)
-                .extracting("status")
-                .satisfies(s -> assertThat(s.toString()).contains("409"));
-    }
-
-    @Test
-    void triggerSegTraining_throws422_whenFewerThan5Segments() {
-        when(runRepository.findFirstByStatus(TrainingStatus.RUNNING)).thenReturn(Optional.empty());
-        when(segExportService.querySegmentationBlocks()).thenReturn(List.of(
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(UUID.randomUUID()).build(),
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(UUID.randomUUID()).build()
-        ));
-
-        assertThatThrownBy(() -> service.triggerSegTraining(null))
-                .isInstanceOf(DomainException.class);
-    }
-
-    @Test
-    void triggerSegTraining_createsRunWithBlla_andMarksDoneWithCer() throws Exception {
-        when(runRepository.findFirstByStatus(TrainingStatus.RUNNING)).thenReturn(Optional.empty());
-
-        UUID docA = UUID.randomUUID();
-        UUID docB = UUID.randomUUID();
-        List<TranscriptionBlock> segs = List.of(
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(docA).build(),
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(docA).build(),
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(docA).build(),
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(docA).build(),
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(docB).build()
-        );
-        when(segExportService.querySegmentationBlocks()).thenReturn(segs);
-        when(segExportService.exportToZip()).thenReturn(out -> {});
-        when(ocrClient.segtrainModel(any())).thenReturn(new OcrClient.TrainingResult(null, 0.92, 0.08, 5));
-
-        OcrTrainingRun saved = OcrTrainingRun.builder()
-                .id(UUID.randomUUID()).status(TrainingStatus.RUNNING)
-                .blockCount(5).documentCount(2).modelName("blla").build();
-        when(runRepository.save(any())).thenReturn(saved);
-
-        service.triggerSegTraining(null);
-
-        verify(runRepository, atLeastOnce()).save(argThat(run ->
-                run.getStatus() == TrainingStatus.DONE
-                        && "blla".equals(run.getModelName())
-                        && run.getCer() != null));
-    }
-
-    @Test
-    void triggerSegTraining_marksRunFailed_whenOcrClientThrows() throws Exception {
-        when(runRepository.findFirstByStatus(TrainingStatus.RUNNING)).thenReturn(Optional.empty());
-
-        UUID docA = UUID.randomUUID();
-        List<TranscriptionBlock> segs = List.of(
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(docA).build(),
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(docA).build(),
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(docA).build(),
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(docA).build(),
-                TranscriptionBlock.builder().id(UUID.randomUUID()).documentId(docA).build()
-        );
-        when(segExportService.querySegmentationBlocks()).thenReturn(segs);
-        when(segExportService.exportToZip()).thenReturn(out -> {});
-        when(ocrClient.segtrainModel(any())).thenThrow(new RuntimeException("seg timeout"));
-
-        OcrTrainingRun saved = OcrTrainingRun.builder()
-                .id(UUID.randomUUID()).status(TrainingStatus.RUNNING)
-                .blockCount(5).documentCount(1).modelName("blla").build();
-        when(runRepository.save(any())).thenReturn(saved);
-
-        service.triggerSegTraining(null);
-
-        verify(runRepository, atLeastOnce()).save(argThat(run ->
-                run.getStatus() == TrainingStatus.FAILED && run.getErrorMessage() != null));
-    }
-
    // ─── Orphan recovery ──────────────────────────────────────────────────────

    @Test
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -83,11 +83,11 @@ services:
    restart: unless-stopped
    expose:
      - "8000"
-    mem_limit: 12g
-    memswap_limit: 12g
+    mem_limit: 8g
+    memswap_limit: 8g
    volumes:
      - ocr_models:/app/models
-      - ocr_cache:/root/.cache  # Hugging Face / ketos model download cache — prevents re-downloads on container recreate
+      - ocr_cache:/root/.cache
    environment:
      KRAKEN_MODEL_PATH: /app/models/german_kurrent.mlmodel
      TRAINING_TOKEN: "${OCR_TRAINING_TOKEN:-}"
@@ -102,7 +102,7 @@ services:
      interval: 10s
      timeout: 5s
      retries: 12
-      start_period: 120s
+      start_period: 60s

  # --- Backend: Spring Boot ---
  backend:
@@ -112,7 +112,9 @@ services:
    container_name: archive-backend
    restart: unless-stopped
    volumes:
+      - ./backend:/app
      - ./import:/import
+      - maven_cache:/root/.m2
    depends_on:
      db:
        condition: service_healthy
@@ -143,7 +145,6 @@ services:
      SPRING_MAIL_PROPERTIES_MAIL_SMTP_AUTH: ${MAIL_SMTP_AUTH:-false}
      SPRING_MAIL_PROPERTIES_MAIL_SMTP_STARTTLS_ENABLE: ${MAIL_STARTTLS_ENABLE:-false}
      APP_OCR_BASE_URL: http://ocr-service:8000
-      APP_OCR_TRAINING_TOKEN: "${OCR_TRAINING_TOKEN:-}"
    ports:
      - "${PORT_BACKEND}:8080"
    networks:
@@ -153,7 +154,7 @@ services:
      interval: 15s
      timeout: 5s
      retries: 10
-      start_period: 30s  # JAR starts in ~15s; was 60s when compilation happened at startup
+      start_period: 60s

  # --- Frontend: SvelteKit (Dev Server) ---
  frontend:
@@ -189,5 +190,6 @@ networks:

 volumes:
  frontend_node_modules:
+  maven_cache:
  ocr_models:
  ocr_cache:
--- a/frontend/messages/de.json
+++ b/frontend/messages/de.json
@@ -79,8 +79,6 @@
 	"docs_list_from": "Von",
 	"docs_list_to": "An",
 	"docs_list_unknown": "Unbekannt",
-	"docs_group_undated": "Undatiert",
-	"docs_group_unknown": "Unbekannt",
 	"doc_section_who_when": "Wer & Wann",
 	"doc_section_description": "Beschreibung",
 	"doc_section_file": "Datei",
@@ -560,7 +558,6 @@
 	"training_history_col_cer": "Fehlerrate",
 	"training_status_done": "Fertig",
 	"training_status_failed": "Fehler",
-	"training_error_detail_label": "Fehlerdetails",
 	"training_status_running": "Läuft…",
 	"training_seg_heading": "Segmentierung trainieren",
 	"training_seg_description": "Starte ein neues Training mit annotierten Segmentierungsbereichen, um die Texterkennung zu verbessern.",
--- a/frontend/messages/en.json
+++ b/frontend/messages/en.json
@@ -79,8 +79,6 @@
 	"docs_list_from": "From",
 	"docs_list_to": "To",
 	"docs_list_unknown": "Unknown",
-	"docs_group_undated": "Undated",
-	"docs_group_unknown": "Unknown",
 	"doc_section_who_when": "Who & When",
 	"doc_section_description": "Description",
 	"doc_section_file": "File",
@@ -560,7 +558,6 @@
 	"training_history_col_cer": "Error Rate",
 	"training_status_done": "Done",
 	"training_status_failed": "Failed",
-	"training_error_detail_label": "Error details",
 	"training_status_running": "Running…",
 	"training_seg_heading": "Train segmentation",
 	"training_seg_description": "Start a new training run using annotated segmentation regions to improve text detection.",
--- a/frontend/messages/es.json
+++ b/frontend/messages/es.json
@@ -79,8 +79,6 @@
 	"docs_list_from": "De",
 	"docs_list_to": "Para",
 	"docs_list_unknown": "Desconocido",
-	"docs_group_undated": "Sin fecha",
-	"docs_group_unknown": "Desconocido",
 	"doc_section_who_when": "Quién & Cuándo",
 	"doc_section_description": "Descripción",
 	"doc_section_file": "Archivo",
@@ -560,7 +558,6 @@
 	"training_history_col_cer": "Tasa de error",
 	"training_status_done": "Listo",
 	"training_status_failed": "Error",
-	"training_error_detail_label": "Detalles del error",
 	"training_status_running": "Ejecutando…",
 	"training_seg_heading": "Entrenar segmentación",
 	"training_seg_description": "Inicia un nuevo entrenamiento con regiones de segmentación anotadas para mejorar la detección de texto.",
--- a/frontend/src/lib/components/GroupDivider.svelte
+++ b/frontend/src/lib/components/GroupDivider.svelte
@@ -1,15 +0,0 @@
-<script lang="ts">
-let { label }: { label: string } = $props();
-</script>
-
-<div
-	data-testid="group-divider"
-	role="separator"
-	aria-label={label}
-	class="relative flex items-center py-2 text-center"
->
-	<div class="flex-grow border-t border-line"></div>
-	<span class="mx-4 font-sans text-sm font-bold tracking-widest text-ink/60 uppercase">{label}</span
-	>
-	<div class="flex-grow border-t border-line"></div>
-</div>
--- a/frontend/src/lib/components/GroupDivider.svelte.spec.ts
+++ b/frontend/src/lib/components/GroupDivider.svelte.spec.ts
@@ -1,23 +0,0 @@
-import { describe, expect, it, afterEach } from 'vitest';
-import { cleanup, render } from 'vitest-browser-svelte';
-import { page } from 'vitest/browser';
-import GroupDivider from './GroupDivider.svelte';
-
-afterEach(() => cleanup());
-
-describe('GroupDivider', () => {
-	it('renders the label text', async () => {
-		render(GroupDivider, { label: '1938' });
-		await expect.element(page.getByText('1938')).toBeInTheDocument();
-	});
-
-	it('has data-testid="group-divider" on the root element', async () => {
-		render(GroupDivider, { label: 'Test' });
-		await expect.element(page.getByTestId('group-divider')).toBeInTheDocument();
-	});
-
-	it('renders a person name label', async () => {
-		render(GroupDivider, { label: 'Anna Müller' });
-		await expect.element(page.getByText('Anna Müller')).toBeInTheDocument();
-	});
-});
--- a/frontend/src/lib/components/TrainingHistory.svelte
+++ b/frontend/src/lib/components/TrainingHistory.svelte
@@ -20,12 +20,6 @@ interface Props {

 let { runs }: Props = $props();

-const COLLAPSED_COUNT = 3;
-let expanded = $state(false);
-
-const visibleRuns = $derived(expanded ? runs : runs.slice(0, COLLAPSED_COUNT));
-const hasMore = $derived(runs.length > COLLAPSED_COUNT);
-
 const dateFormatter = new Intl.DateTimeFormat('de-DE', {
 	day: 'numeric',
 	month: 'short',
@@ -52,7 +46,7 @@ function formatCer(cer: number | undefined | null): string {
 			<th class="hidden pb-2 text-right md:table-cell">{m.training_history_col_cer()}</th>
 		</tr>
 	</thead>
-	<tbody id="training-history-rows">
+	<tbody>
 		{#if runs.length === 0}
 			<tr>
 				<td colspan="5" class="py-4 text-center text-sm text-ink-2">
@@ -60,7 +54,7 @@ function formatCer(cer: number | undefined | null): string {
 				</td>
 			</tr>
 		{:else}
-			{#each visibleRuns as run (run.id)}
+			{#each runs as run (run.id)}
 				<tr class="border-b border-line/50 last:border-0">
 					<td class="py-2 text-ink-2">{formatDate(run.createdAt)}</td>
 					<td class="py-2">
@@ -85,6 +79,7 @@ function formatCer(cer: number | undefined | null): string {
 						{:else if run.status === 'FAILED'}
 							<span
 								class="inline-flex items-center gap-1 rounded-sm bg-red-100 px-1.5 py-0.5 text-xs font-medium text-red-700"
+								title={run.errorMessage}
 							>
 								<svg
 									aria-hidden="true"
@@ -100,21 +95,13 @@ function formatCer(cer: number | undefined | null): string {
 								</svg>
 								{m.training_status_failed()}
 							</span>
-							{#if run.errorMessage}
-								<details class="mt-0.5">
-									<summary class="cursor-pointer text-xs text-red-700 underline">
-										{m.training_error_detail_label()}
-									</summary>
-									<p class="mt-1 text-xs text-red-600">{run.errorMessage}</p>
-								</details>
-							{/if}
 						{:else}
 							<span
 								class="inline-flex items-center gap-1 rounded-sm bg-yellow-100 px-1.5 py-0.5 text-xs font-medium text-yellow-700"
 							>
 								<span
 									aria-hidden="true"
-									class="h-1.5 w-1.5 rounded-full bg-yellow-500 motion-safe:animate-pulse"
+									class="h-1.5 w-1.5 animate-pulse rounded-full bg-yellow-500"
 								></span>
 								{m.training_status_running()}
 							</span>
@@ -130,17 +117,3 @@ function formatCer(cer: number | undefined | null): string {
 		{/if}
 	</tbody>
 </table>
-
-{#if hasMore}
-	<div class="mt-2 text-center">
-		<button
-			type="button"
-			aria-expanded={expanded}
-			aria-controls="training-history-rows"
-			class="text-xs font-medium text-ink-3 transition-colors hover:text-ink"
-			onclick={() => (expanded = !expanded)}
-		>
-			{expanded ? m.comp_expandable_show_less() : m.comp_expandable_show_more()}
-		</button>
-	</div>
-{/if}
--- a/frontend/src/lib/components/TrainingHistory.svelte.spec.ts
+++ b/frontend/src/lib/components/TrainingHistory.svelte.spec.ts
@@ -1,52 +0,0 @@
-import { afterEach, describe, expect, it } from 'vitest';
-import { cleanup, render } from 'vitest-browser-svelte';
-import { page } from 'vitest/browser';
-import TrainingHistory from './TrainingHistory.svelte';
-
-afterEach(cleanup);
-
-function makeRun(i: number) {
-	return {
-		id: `run-${i}`,
-		status: 'DONE' as const,
-		blockCount: 10,
-		documentCount: 2,
-		modelName: 'german_kurrent',
-		createdAt: `2026-01-0${i + 1}T12:00:00Z`,
-		completedAt: `2026-01-0${i + 1}T12:05:00Z`
-	};
-}
-
-const fiveRuns = Array.from({ length: 5 }, (_, i) => makeRun(i));
-const twoRuns = Array.from({ length: 2 }, (_, i) => makeRun(i));
-
-describe('TrainingHistory — expand/collapse', () => {
-	it('shows only 3 runs initially when more than 3 exist', async () => {
-		render(TrainingHistory, { runs: fiveRuns });
-
-		const rows = page.getByRole('row');
-		// 1 header row + 3 data rows = 4 total
-		await expect.element(rows.nth(3)).toBeInTheDocument();
-		await expect.element(rows.nth(4)).not.toBeInTheDocument();
-
-		await expect.element(page.getByRole('button', { name: /Mehr anzeigen/i })).toBeInTheDocument();
-	});
-
-	it('shows all runs after clicking the expand button', async () => {
-		render(TrainingHistory, { runs: fiveRuns });
-
-		await page.getByRole('button', { name: /Mehr anzeigen/i }).click();
-
-		const rows = page.getByRole('row');
-		// 1 header row + 5 data rows = 6 total
-		await expect.element(rows.nth(5)).toBeInTheDocument();
-	});
-
-	it('hides the toggle button when 3 or fewer runs exist', async () => {
-		render(TrainingHistory, { runs: twoRuns });
-
-		await expect
-			.element(page.getByRole('button', { name: /Mehr anzeigen/i }))
-			.not.toBeInTheDocument();
-	});
-});
--- a/frontend/src/lib/utils/groupDocuments.spec.ts
+++ b/frontend/src/lib/utils/groupDocuments.spec.ts
@@ -1,165 +0,0 @@
-import { describe, expect, it } from 'vitest';
-import { groupDocuments } from './groupDocuments';
-
-type Doc = {
-	id: string;
-	documentDate?: string | null;
-	sender?: { displayName: string } | null;
-	receivers?: { displayName: string }[];
-};
-
-const doc = (overrides: Partial<Doc> & { id: string }): Doc => ({
-	documentDate: null,
-	sender: null,
-	receivers: [],
-	...overrides
-});
-
-// ─── DATE sort ───────────────────────────────────────────────────────────────
-
-describe('groupDocuments — DATE sort', () => {
-	it('produces one group per distinct year', () => {
-		const docs = [
-			doc({ id: 'a', documentDate: '1923-04-12' }),
-			doc({ id: 'b', documentDate: '1938-01-01' }),
-			doc({ id: 'c', documentDate: '1965-08-03' })
-		];
-		const groups = groupDocuments(docs, 'DATE', 'Undatiert');
-		expect(groups.map((g) => g.label)).toEqual(['1923', '1938', '1965']);
-		expect(groups.every((g) => g.documents.length === 1)).toBe(true);
-	});
-
-	it('puts multiple docs from the same year into one group', () => {
-		const docs = [
-			doc({ id: 'a', documentDate: '1938-03-01' }),
-			doc({ id: 'b', documentDate: '1938-11-15' })
-		];
-		const groups = groupDocuments(docs, 'DATE', 'Undatiert');
-		expect(groups).toHaveLength(1);
-		expect(groups[0].label).toBe('1938');
-		expect(groups[0].documents).toHaveLength(2);
-	});
-
-	it('places undated docs in the fallback group at the bottom', () => {
-		const docs = [
-			doc({ id: 'a', documentDate: '1938-01-01' }),
-			doc({ id: 'b', documentDate: null }),
-			doc({ id: 'c', documentDate: null })
-		];
-		const groups = groupDocuments(docs, 'DATE', 'Undatiert');
-		expect(groups).toHaveLength(2);
-		expect(groups[0].label).toBe('1938');
-		expect(groups[1].label).toBe('Undatiert');
-		expect(groups[1].documents.map((d) => d.id)).toEqual(['b', 'c']);
-	});
-
-	it('returns one group with fallback label when all docs are undated', () => {
-		const docs = [doc({ id: 'a' }), doc({ id: 'b' })];
-		const groups = groupDocuments(docs, 'DATE', 'Undatiert');
-		expect(groups).toHaveLength(1);
-		expect(groups[0].label).toBe('Undatiert');
-	});
-
-	it('returns one group when all docs are from the same year', () => {
-		const docs = [
-			doc({ id: 'a', documentDate: '1938-01-01' }),
-			doc({ id: 'b', documentDate: '1938-06-15' })
-		];
-		const groups = groupDocuments(docs, 'DATE', 'Undatiert');
-		expect(groups).toHaveLength(1);
-	});
-});
-
-// ─── SENDER sort ─────────────────────────────────────────────────────────────
-
-describe('groupDocuments — SENDER sort', () => {
-	it('produces one group per distinct sender', () => {
-		const docs = [
-			doc({ id: 'a', sender: { displayName: 'Anna Müller' } }),
-			doc({ id: 'b', sender: { displayName: 'Karl Bauer' } }),
-			doc({ id: 'c', sender: { displayName: 'Anna Müller' } })
-		];
-		const groups = groupDocuments(docs, 'SENDER', 'Unbekannt');
-		expect(groups.map((g) => g.label)).toEqual(['Anna Müller', 'Karl Bauer']);
-		expect(groups[0].documents).toHaveLength(2);
-		expect(groups[1].documents).toHaveLength(1);
-	});
-
-	it('places docs with no sender in the fallback group at the bottom', () => {
-		const docs = [
-			doc({ id: 'a', sender: { displayName: 'Anna Müller' } }),
-			doc({ id: 'b', sender: null })
-		];
-		const groups = groupDocuments(docs, 'SENDER', 'Unbekannt');
-		expect(groups).toHaveLength(2);
-		expect(groups[0].label).toBe('Anna Müller');
-		expect(groups[1].label).toBe('Unbekannt');
-		expect(groups[1].documents[0].id).toBe('b');
-	});
-});
-
-// ─── RECEIVER sort ───────────────────────────────────────────────────────────
-
-describe('groupDocuments — RECEIVER sort', () => {
-	it('a doc with two receivers appears in both receiver groups', () => {
-		const docs = [
-			doc({
-				id: 'a',
-				receivers: [{ displayName: 'Anna' }, { displayName: 'Karl' }]
-			})
-		];
-		const groups = groupDocuments(docs, 'RECEIVER', 'Unbekannt');
-		expect(groups.map((g) => g.label)).toEqual(['Anna', 'Karl']);
-		expect(groups[0].documents[0].id).toBe('a');
-		expect(groups[1].documents[0].id).toBe('a');
-	});
-
-	it('places docs with no receivers in the fallback group at the bottom', () => {
-		const docs = [
-			doc({ id: 'a', receivers: [{ displayName: 'Anna' }] }),
-			doc({ id: 'b', receivers: [] })
-		];
-		const groups = groupDocuments(docs, 'RECEIVER', 'Unbekannt');
-		expect(groups).toHaveLength(2);
-		expect(groups[0].label).toBe('Anna');
-		expect(groups[1].label).toBe('Unbekannt');
-		expect(groups[1].documents[0].id).toBe('b');
-	});
-
-	it('composite keys are unique: groupLabel + doc.id identifies each item', () => {
-		const docs = [
-			doc({ id: 'a', receivers: [{ displayName: 'Anna' }, { displayName: 'Karl' }] }),
-			doc({ id: 'b', receivers: [{ displayName: 'Anna' }] })
-		];
-		const groups = groupDocuments(docs, 'RECEIVER', 'Unbekannt');
-		const keys = groups.flatMap((g) => g.documents.map((d) => `${g.label}-${d.id}`));
-		const uniqueKeys = new Set(keys);
-		expect(uniqueKeys.size).toBe(keys.length);
-	});
-});
-
-// ─── Non-groupable sorts ──────────────────────────────────────────────────────
-
-describe('groupDocuments — non-groupable sorts', () => {
-	it('TITLE sort returns one group containing all documents', () => {
-		const docs = [doc({ id: 'a' }), doc({ id: 'b' })];
-		const groups = groupDocuments(docs, 'TITLE', 'Undatiert');
-		expect(groups).toHaveLength(1);
-		expect(groups[0].documents).toHaveLength(2);
-	});
-
-	it('UPLOAD_DATE sort returns one group containing all documents', () => {
-		const docs = [doc({ id: 'a' }), doc({ id: 'b' })];
-		const groups = groupDocuments(docs, 'UPLOAD_DATE', 'Undatiert');
-		expect(groups).toHaveLength(1);
-		expect(groups[0].documents).toHaveLength(2);
-	});
-});
-
-// ─── Edge cases ──────────────────────────────────────────────────────────────
-
-describe('groupDocuments — edge cases', () => {
-	it('returns empty array for an empty document list', () => {
-		expect(groupDocuments([], 'DATE', 'Undatiert')).toEqual([]);
-	});
-});
--- a/frontend/src/lib/utils/groupDocuments.ts
+++ b/frontend/src/lib/utils/groupDocuments.ts
@@ -1,56 +0,0 @@
-export type GroupableDoc = {
-	id: string;
-	documentDate?: string | null;
-	sender?: { displayName: string } | null;
-	receivers?: { displayName: string }[];
-};
-
-export type DocumentGroup<T extends GroupableDoc> = {
-	label: string;
-	documents: T[];
-};
-
-const GROUPABLE_SORTS = ['DATE', 'SENDER', 'RECEIVER'] as const;
-type GroupableSort = (typeof GROUPABLE_SORTS)[number];
-
-export function groupDocuments<T extends GroupableDoc>(
-	docs: T[],
-	sort: string,
-	fallbackLabel: string
-): DocumentGroup<T>[] {
-	if (docs.length === 0) return [];
-	if (!GROUPABLE_SORTS.includes(sort as GroupableSort)) {
-		return [{ label: '', documents: [...docs] }];
-	}
-
-	const groupMap = new Map<string, T[]>();
-	const fallbackDocs: T[] = [];
-
-	for (const doc of docs) {
-		const keys = extractGroupKeys(doc, sort as GroupableSort);
-		if (keys.length === 0) {
-			fallbackDocs.push(doc);
-		} else {
-			for (const key of keys) {
-				if (!groupMap.has(key)) groupMap.set(key, []);
-				groupMap.get(key)!.push(doc);
-			}
-		}
-	}
-
-	const groups = [...groupMap.entries()].map(([label, documents]) => ({ label, documents }));
-	if (fallbackDocs.length > 0) groups.push({ label: fallbackLabel, documents: fallbackDocs });
-	return groups;
-}
-
-function extractGroupKeys<T extends GroupableDoc>(doc: T, sort: GroupableSort): string[] {
-	if (sort === 'DATE') {
-		const year = doc.documentDate
-			? String(new Date(doc.documentDate + 'T12:00:00').getFullYear())
-			: null;
-		return year ? [year] : [];
-	}
-	if (sort === 'SENDER') return doc.sender ? [doc.sender.displayName] : [];
-	if (sort === 'RECEIVER') return (doc.receivers ?? []).map((r) => r.displayName);
-	return [];
-}
--- a/frontend/src/routes/+page.server.ts
+++ b/frontend/src/routes/+page.server.ts
@@ -13,18 +13,8 @@ export async function load({ url, fetch }) {
 	const senderId = url.searchParams.get('senderId') || '';
 	const receiverId = url.searchParams.get('receiverId') || '';
 	const tags = url.searchParams.getAll('tag');
-	const VALID_SORTS = ['DATE', 'TITLE', 'SENDER', 'RECEIVER', 'UPLOAD_DATE'] as const;
-	type ValidSort = (typeof VALID_SORTS)[number];
-	const rawSort = url.searchParams.get('sort') ?? 'DATE';
-	const sort: ValidSort = (VALID_SORTS as readonly string[]).includes(rawSort)
-		? (rawSort as ValidSort)
-		: 'DATE';
-	const VALID_DIRS = ['asc', 'desc'] as const;
-	type ValidDir = (typeof VALID_DIRS)[number];
-	const rawDir = url.searchParams.get('dir') ?? 'desc';
-	const dir: ValidDir = (VALID_DIRS as readonly string[]).includes(rawDir)
-		? (rawDir as ValidDir)
-		: 'desc';
+	const sort = url.searchParams.get('sort') || 'DATE';
+	const dir = url.searchParams.get('dir') || 'desc';
 	const tagQ = url.searchParams.get('tagQ') || '';

 	const isDashboard = !q && !from && !to && !senderId && !receiverId && !tags.length && !tagQ;
@@ -45,7 +35,7 @@ export async function load({ url, fetch }) {
 								receiverId: receiverId || undefined,
 								tag: tags.length ? tags : undefined,
 								tagQ: tagQ || undefined,
-								sort,
+								sort: sort as 'DATE' | 'TITLE' | 'SENDER' | 'RECEIVER' | 'UPLOAD_DATE',
 								dir: dir || undefined
 							}
 						}
--- a/frontend/src/routes/+page.svelte
+++ b/frontend/src/routes/+page.svelte
@@ -139,7 +139,6 @@ const showRightColumn = $derived(data.canWrite || (data.incompleteDocs?.length ?
 			error={data.error}
 			total={data.total ?? 0}
 			q={q}
-			sort={sort}
 		/>
 	{/if}
 </main>
--- a/frontend/src/routes/DocumentList.svelte
+++ b/frontend/src/routes/DocumentList.svelte
@@ -30,9 +30,7 @@ let {
 	sort?: string;
 } = $props();

-const fallbackLabel = $derived(
-	(sort ?? 'DATE') === 'DATE' ? m.docs_group_undated() : m.docs_group_unknown()
-);
+const fallbackLabel = $derived(sort === 'DATE' ? m.docs_group_undated() : m.docs_group_unknown());
 const groupedDocuments = $derived.by(() =>
 	groupDocuments(documents, sort ?? 'DATE', fallbackLabel)
 );
--- a/frontend/src/routes/DocumentList.svelte.spec.ts
+++ b/frontend/src/routes/DocumentList.svelte.spec.ts
@@ -1,12 +1,10 @@
-import { describe, expect, it, vi, afterEach } from 'vitest';
-import { cleanup, render } from 'vitest-browser-svelte';
+import { describe, expect, it, vi } from 'vitest';
+import { render } from 'vitest-browser-svelte';
 import { page } from 'vitest/browser';
 import DocumentList from './DocumentList.svelte';

 vi.mock('$app/navigation', () => ({ goto: vi.fn() }));

-afterEach(() => cleanup());
-
 const baseProps = {
 	documents: [],
 	canWrite: false,
@@ -15,14 +13,7 @@ const baseProps = {
 	q: ''
 };

-type DocOverrides = {
-	id?: string;
-	documentDate?: string | null;
-	sender?: { firstName?: string | null; lastName: string; displayName: string } | null;
-	receivers?: { firstName?: string | null; lastName: string; displayName: string }[];
-};
-
-const makeDoc = (overrides: DocOverrides = {}) => ({
+const makeDoc = () => ({
 	id: '1',
 	title: 'Testbrief',
 	originalFilename: 'testbrief.pdf',
@@ -30,9 +21,8 @@ const makeDoc = (overrides: DocOverrides = {}) => ({
 	documentDate: '2024-03-15',
 	location: null,
 	sender: null,
-	receivers: [] as { firstName?: string | null; lastName: string; displayName: string }[],
-	tags: [],
-	...overrides
+	receivers: [],
+	tags: []
 });

 describe('DocumentList – result count', () => {
@@ -59,59 +49,3 @@ describe('DocumentList – empty state with search term', () => {
 		await expect.element(page.getByText(/"Urlaub"/)).toBeInTheDocument();
 	});
 });
-
-// ─── Group headers ────────────────────────────────────────────────────────────
-
-describe('DocumentList – group headers', () => {
-	it('renders group-divider elements when DATE sort spans multiple years', async () => {
-		const documents = [
-			makeDoc({ id: '1', documentDate: '1923-04-12' }),
-			makeDoc({ id: '2', documentDate: '1965-08-03' })
-		];
-		render(DocumentList, { ...baseProps, documents, total: 2, sort: 'DATE' });
-		await expect.element(page.getByTestId('group-divider').first()).toBeInTheDocument();
-	});
-
-	it('does not render group-divider when DATE sort has only one distinct year', async () => {
-		const documents = [
-			makeDoc({ id: '1', documentDate: '1938-01-01' }),
-			makeDoc({ id: '2', documentDate: '1938-06-15' })
-		];
-		render(DocumentList, { ...baseProps, documents, total: 2, sort: 'DATE' });
-		await expect.element(page.getByTestId('group-divider')).not.toBeInTheDocument();
-	});
-
-	it('does not render group-divider for TITLE sort', async () => {
-		const documents = [
-			makeDoc({ id: '1', documentDate: '1923-04-12' }),
-			makeDoc({ id: '2', documentDate: '1965-08-03' })
-		];
-		render(DocumentList, { ...baseProps, documents, total: 2, sort: 'TITLE' });
-		await expect.element(page.getByTestId('group-divider')).not.toBeInTheDocument();
-	});
-
-	it('shows Undatiert fallback label when sort is undefined and doc has no date', async () => {
-		const documents = [
-			makeDoc({ id: '1', documentDate: '1938-01-01' }),
-			makeDoc({ id: '2', documentDate: null })
-		];
-		render(DocumentList, { ...baseProps, documents, total: 2 }); // sort omitted — defaults to DATE grouping
-		await expect.element(page.getByText(/UNDATIERT/i)).toBeInTheDocument();
-	});
-
-	it('a doc with two receivers appears in both receiver groups', async () => {
-		const documents = [
-			makeDoc({
-				id: '1',
-				receivers: [
-					{ firstName: null, lastName: 'Müller', displayName: 'Anna Müller' },
-					{ firstName: null, lastName: 'Bauer', displayName: 'Karl Bauer' }
-				]
-			})
-		];
-		render(DocumentList, { ...baseProps, documents, total: 1, sort: 'RECEIVER' });
-		const links = page.getByRole('link', { name: /Testbrief/ });
-		await expect.element(links.first()).toBeInTheDocument();
-		await expect.element(links.nth(1)).toBeInTheDocument();
-	});
-});
--- a/frontend/src/routes/conversations/ConversationTimeline.svelte
+++ b/frontend/src/routes/conversations/ConversationTimeline.svelte
@@ -1,8 +1,6 @@
 <script lang="ts">
 import { m } from '$lib/paraglide/messages.js';
 import { formatDate } from '$lib/utils/date';
-import GroupDivider from '$lib/components/GroupDivider.svelte';
-import { groupDocuments } from '$lib/utils/groupDocuments';

 let {
 	documents,
@@ -31,15 +29,22 @@ let {

 const documentYears = $derived(
 	documents
-		.map((doc) =>
-			doc.documentDate ? new Date(doc.documentDate + 'T12:00:00').getFullYear() : null
-		)
+		.map((doc) => (doc.documentDate ? new Date(doc.documentDate).getFullYear() : null))
 		.filter((y): y is number => y !== null)
 );
 const yearFrom = $derived(documentYears.length > 0 ? Math.min(...documentYears) : null);
 const yearTo = $derived(documentYears.length > 0 ? Math.max(...documentYears) : null);

-const documentGroups = $derived.by(() => groupDocuments(documents, 'DATE', ''));
+const enrichedDocuments = $derived(
+	documents.map((doc, i) => {
+		const year = doc.documentDate ? new Date(doc.documentDate).getFullYear() : null;
+		const prevYear =
+			i > 0 && documents[i - 1].documentDate
+				? new Date(documents[i - 1].documentDate!).getFullYear()
+				: null;
+		return { doc, year, showYearDivider: year !== null && year !== prevYear };
+	})
+);
 </script>

 <!-- Summary bar -->
@@ -77,83 +82,87 @@ const documentGroups = $derived.by(() => groupDocuments(documents, 'DATE', ''));

 	<div class="p-6 md:p-8">
 		<div class="relative z-10 flex flex-col gap-4">
-			{#each documentGroups as group (group.label)}
-				{#if group.label}
-					<GroupDivider label={group.label} />
+			{#each enrichedDocuments as { doc, year, showYearDivider } (doc.id)}
+				{#if showYearDivider}
+					<div data-testid="year-divider" class="relative flex items-center py-2 text-center">
+						<div class="flex-grow border-t border-line"></div>
+						<span class="mx-4 font-sans text-xs font-bold tracking-widest text-ink/40 uppercase"
+							>{year}</span
+						>
+						<div class="flex-grow border-t border-line"></div>
+					</div>
 				{/if}
-				{#each group.documents as doc (doc.id)}
-					{@const isRight = doc.sender?.id === senderId}
+				{@const isRight = doc.sender?.id === senderId}

-					<!-- Message Row -->
-					<div class="flex w-full {isRight ? 'justify-end' : 'justify-start'}">
-						<!-- Bubble Group -->
-						<div
-							class="flex max-w-[90%] gap-3 md:max-w-[70%] {isRight
+				<!-- Message Row -->
+				<div class="flex w-full {isRight ? 'justify-end' : 'justify-start'}">
+					<!-- Bubble Group -->
+					<div
+						class="flex max-w-[90%] gap-3 md:max-w-[70%] {isRight
 							? 'flex-row-reverse'
 							: 'flex-row'}"
-						>
-							<!-- AVATAR -->
-							<div class="mt-auto mb-1 hidden flex-shrink-0 sm:block">
-								<div
-									class="flex h-8 w-8 items-center justify-center rounded-full border font-serif text-xs shadow-sm
+					>
+						<!-- AVATAR -->
+						<div class="mt-auto mb-1 hidden flex-shrink-0 sm:block">
+							<div
+								class="flex h-8 w-8 items-center justify-center rounded-full border font-serif text-xs shadow-sm
                                {isRight
 									? 'border-primary bg-primary text-primary-fg'
 									: 'border-line bg-surface text-ink'}"
-								>
-									{#if doc.sender}
-										{doc.sender.firstName ? doc.sender.firstName[0] : doc.sender.lastName[0]}{doc.sender.lastName[0]}
-									{:else}
-										?
-									{/if}
-								</div>
+							>
+								{#if doc.sender}
+									{doc.sender.firstName ? doc.sender.firstName[0] : doc.sender.lastName[0]}{doc.sender.lastName[0]}
+								{:else}
+									?
+								{/if}
 							</div>
+						</div>

-							<!-- BUBBLE CARD -->
-							<a
-								href="/documents/{doc.id}"
-								class="group block transform rounded border p-4 shadow-sm transition-all duration-200 hover:-translate-y-0.5 hover:shadow-md
+						<!-- BUBBLE CARD -->
+						<a
+							href="/documents/{doc.id}"
+							class="group block transform rounded border p-4 shadow-sm transition-all duration-200 hover:-translate-y-0.5 hover:shadow-md
                               {isRight
 								? 'rounded-br-none border-primary bg-primary text-primary-fg'
 								: 'rounded-bl-none border-line bg-muted/50 text-ink'}"
-							>
-								<!-- Header -->
-								<div class="mb-2 flex items-start justify-between gap-4">
-									<h3
-										class="font-serif text-sm leading-snug font-medium {isRight
+						>
+							<!-- Header -->
+							<div class="mb-2 flex items-start justify-between gap-4">
+								<h3
+									class="font-serif text-sm leading-snug font-medium {isRight
 										? 'text-primary-fg'
 										: 'text-ink'}"
-									>
-										{doc.title || doc.originalFilename}
-									</h3>
+								>
+									{doc.title || doc.originalFilename}
+								</h3>

-									<!-- Status Dot -->
-									<span
-										class="mt-1.5 h-1.5 w-1.5 flex-shrink-0 rounded-full
+								<!-- Status Dot -->
+								<span
+									class="mt-1.5 h-1.5 w-1.5 flex-shrink-0 rounded-full
                                    {doc.status === 'UPLOADED' ? 'bg-accent' : 'bg-yellow-400'}"
-										title={doc.status}
-									>
-									</span>
-								</div>
+									title={doc.status}
+								>
+								</span>
+							</div>

-								<!-- Metadata -->
-								<div
-									class="flex flex-wrap gap-3 font-sans text-[10px] tracking-wider uppercase opacity-80 {isRight
+							<!-- Metadata -->
+							<div
+								class="flex flex-wrap gap-3 font-sans text-[10px] tracking-wider uppercase opacity-80 {isRight
 									? 'text-primary-fg/70'
 									: 'text-ink-2'}"
-								>
+							>
+								<span class="flex items-center">
+									{doc.documentDate ? formatDate(doc.documentDate) : '—'}
+								</span>
+								{#if doc.location}
 									<span class="flex items-center">
-										{doc.documentDate ? formatDate(doc.documentDate) : '—'}
+										• {doc.location}
 									</span>
-									{#if doc.location}
-										<span class="flex items-center">
-											• {doc.location}
-										</span>
-									{/if}
-								</div>
-							</a>
-						</div>
+								{/if}
+							</div>
+						</a>
 					</div>
-				{/each}
+				</div>
 			{/each}
 		</div>
 	</div>
--- a/frontend/src/routes/conversations/page.svelte.spec.ts
+++ b/frontend/src/routes/conversations/page.svelte.spec.ts
@@ -116,7 +116,7 @@ describe('Conversations page – summary', () => {
 describe('Conversations page – year dividers', () => {
 	it('renders a year divider for the first document', async () => {
 		render(Page, { data: withDocs });
-		await expect.element(page.getByTestId('group-divider').first()).toHaveTextContent('1923');
+		await expect.element(page.getByTestId('year-divider').first()).toHaveTextContent('1923');
 	});

 	it('renders a divider for each new year in the document list', async () => {
@@ -128,8 +128,8 @@ describe('Conversations page – year dividers', () => {
 			]
 		};
 		render(Page, { data });
-		await expect.element(page.getByTestId('group-divider').first()).toHaveTextContent('1923');
-		await expect.element(page.getByTestId('group-divider').nth(1)).toHaveTextContent('1965');
+		await expect.element(page.getByTestId('year-divider').first()).toHaveTextContent('1923');
+		await expect.element(page.getByTestId('year-divider').nth(1)).toHaveTextContent('1965');
 	});

 	it('does not render a second divider for documents from the same year', async () => {
@@ -142,8 +142,8 @@ describe('Conversations page – year dividers', () => {
 		};
 		render(Page, { data });
 		// Only one divider for 1923; 1965 divider should not appear
-		await expect.element(page.getByTestId('group-divider').first()).toHaveTextContent('1923');
-		await expect.element(page.getByTestId('group-divider').nth(1)).not.toBeInTheDocument();
+		await expect.element(page.getByTestId('year-divider').first()).toHaveTextContent('1923');
+		await expect.element(page.getByTestId('year-divider').nth(1)).not.toBeInTheDocument();
 	});
 });

--- a/ocr-service/Dockerfile
+++ b/ocr-service/Dockerfile
@@ -1,4 +1,4 @@
-FROM python:3.11.9-slim
+FROM python:3.11-slim

 WORKDIR /app

@@ -21,8 +21,6 @@ RUN pip install --no-cache-dir -r requirements.txt

 COPY . .

-RUN chmod +x /app/entrypoint.sh
-
 EXPOSE 8000

-CMD ["/app/entrypoint.sh"]
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "1"]
--- a/ocr-service/ensure_blla_model.py
+++ b/ocr-service/ensure_blla_model.py
@@ -1,80 +0,0 @@
-"""Validates the blla segmentation base model and downloads it if needed.
-
-Run at container startup before uvicorn. ketos 7 requires the model in
-CoreML protobuf or safetensors format — legacy PyTorch ZIP archives
-(torch.save output from kraken <4) are not loadable and will be replaced.
-
-Exits non-zero on failure so Docker marks the container unhealthy rather
-than silently starting with a broken model.
-"""
-
-import glob
-import logging
-import os
-import shutil
-import subprocess
-import sys
-
-logging.basicConfig(
-    level=logging.INFO,
-    format="%(levelname)s:ensure_blla_model:%(message)s",
-)
-log = logging.getLogger(__name__)
-
-BLLA_MODEL_PATH = os.environ.get("BLLA_MODEL_PATH", "/app/models/blla.mlmodel")
-# DOI for "General segmentation model for print and handwriting" — ketos 7 compatible.
-BLLA_MODEL_DOI = "10.5281/zenodo.14602569"
-HTRMOPO_DIR = os.path.expanduser("~/.local/share/htrmopo")
-
-
-def _model_is_loadable(path: str) -> bool:
-    try:
-        from kraken.lib import vgsl
-
-        vgsl.TorchVGSLModel.load_model(path)
-        return True
-    except (RuntimeError, OSError, ValueError) as e:
-        log.warning("Model at %s failed to load: %s", path, e)
-        return False
-    except Exception:
-        log.debug("Unexpected error loading model at %s", path, exc_info=True)
-        return False
-
-
-def _download_blla() -> str:
-    log.info("Downloading blla model (DOI %s) ...", BLLA_MODEL_DOI)
-    result = subprocess.run(
-        ["kraken", "get", BLLA_MODEL_DOI],
-        capture_output=True,
-        text=True,
-    )
-    if result.returncode != 0:
-        log.error("kraken get failed: %s", result.stderr)
-        sys.exit(1)
-
-    candidates = sorted(glob.glob(os.path.join(HTRMOPO_DIR, "*/blla.mlmodel")))
-    if not candidates:
-        log.error("Downloaded blla.mlmodel not found under %s", HTRMOPO_DIR)
-        sys.exit(1)
-
-    return candidates[-1]
-
-
-def main() -> None:
-    if os.path.exists(BLLA_MODEL_PATH):
-        if _model_is_loadable(BLLA_MODEL_PATH):
-            log.info("blla model OK: %s", BLLA_MODEL_PATH)
-            return
-        log.warning(
-            "blla model at %s is in an incompatible format — replacing", BLLA_MODEL_PATH
-        )
-        os.rename(BLLA_MODEL_PATH, BLLA_MODEL_PATH + ".incompatible")
-
-    os.makedirs(os.path.dirname(BLLA_MODEL_PATH), exist_ok=True)
-    downloaded = _download_blla()
-    shutil.copy2(downloaded, BLLA_MODEL_PATH)
-    log.info("Installed blla model at %s", BLLA_MODEL_PATH)
-
-
-if __name__ == "__main__":
-    main()
--- a/ocr-service/entrypoint.sh
+++ b/ocr-service/entrypoint.sh
@@ -1,9 +0,0 @@
-#!/bin/bash
-set -euo pipefail
-
-# Validate the blla segmentation base model and download it if missing or
-# incompatible. ketos 7 dropped support for legacy PyTorch ZIP archives —
-# this ensures the volume always holds a loadable CoreML protobuf model.
-python3 /app/ensure_blla_model.py
-
-exec uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1
--- a/ocr-service/main.py
+++ b/ocr-service/main.py
@@ -472,35 +472,16 @@ async def segtrain_model(
                "-q", "fixed",
                "-N", "10",
            ]
-            # Train at 800px height. The default blla model uses 1800px, which peaks at
-            # ~7+ GB on CPU and kills the host (ketos ignores -s when -i is present, so
-            # we cannot override the height of an existing model).
-            # Strategy: only use the base model if it is already at 800px (i.e. was
-            # produced by a previous fine-tuning run here). Otherwise train from scratch —
-            # the first run bootstraps a 800px model; all subsequent runs fine-tune it.
-            seg_spec = (
-                "[1,800,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 "
-                "Cr3,3,256 Gn32 Cr3,3,256 Gn32 Lbx32 Lby32 Cr1,1,32 Gn32 Lby32 Lbx32]"
-            )
-            use_base_model = False
            if os.path.exists(blla_model_path):
-                try:
-                    from kraken.lib import vgsl as _vgsl
-                    _m = _vgsl.TorchVGSLModel.load_model(blla_model_path)
-                    use_base_model = _m.input[2] == 800  # input is (batch, channels, H, W)
-                    if not use_base_model:
-                        log.info(
-                            "Base model height is %dpx — skipping -i to avoid OOM; "
-                            "will train from scratch at 800px",
-                            _m.input[2],
-                        )
-                except Exception as exc:
-                    log.warning("Could not inspect base model height, training from scratch: %s", exc)
-
-            if use_base_model:
-                cmd += ["-i", blla_model_path, "--resize", "union", "-s", seg_spec]
+                cmd += ["-i", blla_model_path, "--resize", "both"]
            else:
-                cmd += ["-s", seg_spec]
+                # No pretrained model — train from scratch with reduced height (800px)
+                # to keep peak RAM under ~200 MB on CPU (default 1800px uses ~500 MB+)
+                cmd += [
+                    "-s",
+                    "[1,800,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 "
+                    "Cr3,3,256 Gn32 Cr3,3,256 Gn32 Lbx32 Lby32 Cr1,1,32 Gn32 Lby32 Lbx32]",
+                ]
            cmd += xml_files

            log.info("Running: %s", " ".join(cmd[:5]) + " ...")
@@ -512,8 +493,7 @@ async def segtrain_model(
                raise RuntimeError(f"ketos segtrain failed (exit {proc.returncode}): {proc.stderr[-500:]}")

            accuracy, epochs = _parse_best_checkpoint(checkpoint_dir)
-            cer = round(1.0 - accuracy, 4) if accuracy is not None else None
-            log.info("Segmentation training complete — epochs=%s accuracy=%s cer=%s", epochs, accuracy, cer)
+            log.info("Segmentation training complete — epochs=%s accuracy=%s", epochs, accuracy)

            best_model = _find_best_model(checkpoint_dir)
            if best_model is None:
@@ -528,7 +508,7 @@ async def segtrain_model(
            shutil.copy2(best_model, blla_model_path)
            log.info("Replaced blla model at %s", blla_model_path)

-            return {"loss": None, "accuracy": accuracy, "cer": cer, "epochs": epochs}
+            return {"loss": None, "accuracy": accuracy, "cer": None, "epochs": epochs}

    result = await asyncio.to_thread(_run_segtrain)
    return result
--- a/ocr-service/test_ensure_blla_model.py
+++ b/ocr-service/test_ensure_blla_model.py
@@ -1,69 +0,0 @@
-"""Unit tests for ensure_blla_model.main()."""
-
-from unittest.mock import MagicMock, call, patch
-
-import ensure_blla_model
-
-
-# ─── Model already loadable ───────────────────────────────────────────────────
-
-
-def test_main_returns_early_when_model_is_loadable():
-    """When the model exists and loads cleanly, no download or rename occurs."""
-    with (
-        patch("os.path.exists", return_value=True),
-        patch.object(ensure_blla_model, "_model_is_loadable", return_value=True),
-        patch.object(ensure_blla_model, "_download_blla") as mock_download,
-        patch("os.rename") as mock_rename,
-    ):
-        ensure_blla_model.main()
-
-    mock_download.assert_not_called()
-    mock_rename.assert_not_called()
-
-
-# ─── Model exists but is incompatible ─────────────────────────────────────────
-
-
-def test_main_replaces_incompatible_model():
-    """An incompatible model is renamed and replaced with a fresh download."""
-    fake_path = "/app/models/blla.mlmodel"
-    downloaded_path = "/tmp/downloaded.mlmodel"
-
-    with (
-        patch.object(ensure_blla_model, "BLLA_MODEL_PATH", fake_path),
-        patch("os.path.exists", return_value=True),
-        patch.object(ensure_blla_model, "_model_is_loadable", return_value=False),
-        patch.object(ensure_blla_model, "_download_blla", return_value=downloaded_path),
-        patch("os.rename") as mock_rename,
-        patch("shutil.copy2") as mock_copy,
-        patch("os.makedirs"),
-    ):
-        ensure_blla_model.main()
-
-    mock_rename.assert_called_once_with(fake_path, fake_path + ".incompatible")
-    mock_copy.assert_called_once_with(downloaded_path, fake_path)
-
-
-# ─── Model missing ────────────────────────────────────────────────────────────
-
-
-def test_main_downloads_when_model_missing():
-    """When the model file doesn't exist at all, it is downloaded without rename."""
-    fake_path = "/app/models/blla.mlmodel"
-    downloaded_path = "/tmp/downloaded.mlmodel"
-
-    with (
-        patch.object(ensure_blla_model, "BLLA_MODEL_PATH", fake_path),
-        patch("os.path.exists", return_value=False),
-        patch.object(ensure_blla_model, "_model_is_loadable") as mock_loadable,
-        patch.object(ensure_blla_model, "_download_blla", return_value=downloaded_path),
-        patch("os.rename") as mock_rename,
-        patch("shutil.copy2") as mock_copy,
-        patch("os.makedirs"),
-    ):
-        ensure_blla_model.main()
-
-    mock_loadable.assert_not_called()
-    mock_rename.assert_not_called()
-    mock_copy.assert_called_once_with(downloaded_path, fake_path)
Author	SHA1	Message	Date
Marcel	13955a5459	test(search): add sender name FTS coverage and combined filter test - should_find_document_by_sender_name — symmetric with existing receiver test - fts_combined_with_status_filter_excludes_non_matching_status — verifies hasIds(rankedIds).and(hasStatus(...)) two-phase search works together Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:03:37 +02:00
Marcel	5affe21b79	refactor(search): replace O(n²) indexOf with HashMap for rank ordering ids.indexOf() scans the full list for each document, giving O(n²) total. Build a Map<UUID, Integer> once at O(n) and use getOrDefault at O(1) per document. Behavior is identical; existing tests remain green. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 10:59:05 +02:00
Marcel	e621bdd890	fix(search): respect DATE sort when text is present — do not override with relevance When a user explicitly selects DATE sort with a text query active, the previous code treated it identically to RELEVANCE, silently discarding the user's sort choice. Remove DATE from the useRankOrder condition so that explicit DATE sort always goes through the standard JPA sort path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 10:57:24 +02:00
Marcel	3421f3203c	feat(fts): backfill search_vector for all existing documents (V35) Fires the BEFORE UPDATE trigger for every documents row, which recomputes the tsvector from all currently-linked metadata, blocks, receivers, and tags. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 23:47:45 +02:00
Marcel	349f74d39a	feat(fts): replace ILIKE hasText with FTS two-phase search and RELEVANCE sort - DocumentSort: add RELEVANCE enum value - DocumentSpecifications: remove hasText() ILIKE, add hasIds(List<UUID>) for FTS-pre-filtered ID sets - DocumentService.searchDocuments(): FTS two-phase path — findRankedIdsByFts() returns ranked UUIDs, hasIds() narrows subsequent Specification query, in-memory re-sort preserves rank order; RELEVANCE is the default when text is present and no explicit non-relevance sort is requested - DocumentSpecificationsTest: remove hasText() tests (Specification removed) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 23:46:24 +02:00
Marcel	efeb913d4c	feat(fts): add search_vector column, GIN index, DB triggers, and FTS repository method (V34) - V34 migration: adds search_vector tsvector column with GIN index - BEFORE INSERT/UPDATE trigger on documents rebuilds vector from title (A), summary + transcription_blocks.text (B), sender/receiver names (C), tag names + location (D) using german FTS config - AFTER triggers on transcription_blocks, document_receivers, document_tags touch the parent document row to re-fire the BEFORE UPDATE trigger - DocumentRepository.findRankedIdsByFts() native query using websearch_to_tsquery - DocumentFtsTest: 12 integration tests covering stemming, trigger sync, ranking, stop words, malformed input, receiver and tag search Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 23:38:12 +02:00
Marcel	fc27043d40	chore: add Claude personas, skills, memory, and project docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 20:22:39 +02:00
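
The rank-ordering change described in commit 5affe21b79 (replace a per-document `indexOf` scan with a map built once) can be illustrated with a minimal sketch. This is not the project's code: the commit refers to a Java `Map<UUID, Integer>` and `getOrDefault`, while the sketch below is written in Python, and the function names, the dict-shaped documents, and the "unknown ids sort last" fallback are all assumptions made for illustration.

```python
from uuid import uuid4


def order_by_rank_quadratic(ranked_ids, documents):
    """Baseline: list.index() rescans ranked_ids for every document, O(n^2) overall."""
    return sorted(documents, key=lambda d: ranked_ids.index(d["id"]))


def order_by_rank(ranked_ids, documents):
    """Build an id -> position lookup once (O(n)), then sort with O(1) lookups."""
    rank = {doc_id: position for position, doc_id in enumerate(ranked_ids)}
    # Assumption: ids missing from ranked_ids sort after all ranked ones,
    # similar in spirit to a getOrDefault-style fallback value.
    fallback = len(ranked_ids)
    return sorted(documents, key=lambda d: rank.get(d["id"], fallback))


if __name__ == "__main__":
    ids = [uuid4() for _ in range(4)]  # phase 1: ids as returned in rank order
    docs = [{"id": ids[2]}, {"id": ids[0]}, {"id": ids[3]}, {"id": ids[1]}]  # phase 2: unordered rows
    ordered = order_by_rank(ids, docs)
    print([ids.index(d["id"]) for d in ordered])
```

Both functions produce the same order when every document id appears in `ranked_ids`, which matches the commit's statement that behavior is unchanged; only the lookup cost per document differs.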