docs(legibility): migrate CLAUDE.md rules into human docs — DOC-7

Processes all 7 CLAUDE.md files according to the 3-bucket classification.
Migration targets (CONTRIBUTING.md, docs/ARCHITECTURE.md, docs/DEPLOYMENT.md,
domain READMEs) are introduced by DOC-2/4/5/6; this PR must merge last.

### scripts/CLAUDE.md → scripts/README.md

New `scripts/README.md` with full script documentation (preserving the ⚠️
destructive-operation warning on reset-db.sh). `scripts/CLAUDE.md` reduced to
a pointer + "document new scripts in README.md" reminder.

### .devcontainer/CLAUDE.md → .devcontainer/README.md

New `.devcontainer/README.md` with all configuration, usage, and limitations.
`.devcontainer/CLAUDE.md` reduced to a single pointer line.

### docs/CLAUDE.md → docs/README.md

New `docs/README.md` covering the folder structure, ADR guide, infrastructure
docs, and specs folder. `docs/CLAUDE.md` reduced to pointer + ADR reminder.

### ocr-service/CLAUDE.md

Reduced to pointer to `ocr-service/README.md` (content migrated in DOC-6).
Kept LLM reminders: single-node constraint, ALLOWED_PDF_HOSTS SSRF risk.

### backend/CLAUDE.md

- Layering Rules → pointer to docs/ARCHITECTURE.md
- Error Handling → pointer to CONTRIBUTING.md + reminder
- Security/Permissions → pointer to docs/ARCHITECTURE.md + reminder
- Package Structure → tagged TODO post-REFACTOR-1
- Fixed errors.ts path to frontend/src/lib/shared/errors.ts
- Added ANNOTATE_ALL + BLOG_WRITE to permission list
- Key Entities, Entity Code Style, Services → kept (Bucket-2)

### root CLAUDE.md

- Stack, Infrastructure, Dev Container → pointers
- Layering Rules, Error Handling, Security, OpenAPI, API Client, Date Handling, UI Components, Frontend Error Handling → pointers + reminders
- Package Structure → tagged TODO post-REFACTOR-1
- Domain Model, Entity Code Style, Form Actions, Styling → kept (Bucket-2)

### frontend/CLAUDE.md

- API Client Pattern, Date Handling → pointers + reminders
- Key UI Components → pointer to domain READMEs
- Styling, Form Actions, How to Run, Vite Proxy, i18n → kept (Bucket-2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 23:33:41 +02:00
parent a5f4b0df31
commit e2c86626f7
11 changed files with 452 additions and 732 deletions
--- a/.devcontainer/CLAUDE.md
+++ b/.devcontainer/CLAUDE.md
@@ -1,96 +1,3 @@
-# Dev Container — Familienarchiv
+# Dev Container

-## Overview
-
-VS Code Dev Container configuration for a pre-configured development environment. Includes Java 21, Maven, and Node.js 24 — everything needed to work on both backend and frontend.
-
-## Configuration
-
-File: `.devcontainer/devcontainer.json`
-
-### Included Features
-
-| Feature | Version | Purpose |
-|---|---|---|
-| Java | 21 | Spring Boot backend |
-| Maven | bundled with Java feature | Build tool |
-| Node.js | 24 | SvelteKit frontend |
-
-### VS Code Extensions (Auto-installed)
-
-| Extension | Purpose |
-|---|---|
-| `vscjava.vscode-java-pack` | Java language support, debugging, testing |
-| `vmware.vscode-spring-boot` | Spring Boot tooling |
-| `gabrielbb.vscode-lombok` | Lombok annotation support |
-| `humao.rest-client` | HTTP request files (for `backend/api_tests/`) |
-
-### Ports
-
-- `8080` forwarded to host — access backend at `http://localhost:8080`
-
-### User
-
-Runs as `vscode` user (not root) for security.
-
-## How to Use
-
-### Prerequisites
-
-- VS Code with the **Dev Containers** extension installed
-- Docker running locally
-
-### Open in Dev Container
-
-1. Open the project in VS Code
-2. Press `F1` → type "Dev Containers: Reopen in Container"
-3. VS Code will:
-   - Build the container using the root `docker-compose.yml`
-   - Install Java 21, Maven, and Node 24
-   - Install the listed extensions
-   - Mount the workspace folder
-
-### Working Inside the Container
-
-Once inside the container, you have access to both stacks:
-
-```bash
-# Backend
-cd backend
-./mvnw spring-boot:run
-
-# Frontend (in a new terminal)
-cd frontend
-npm install
-npm run dev
-```
-
-The container reuses the `docker-compose.yml` services, so PostgreSQL and MinIO are available automatically.
-
-### Forwarding Frontend Port
-
-The devcontainer config only forwards port 8080 by default. To access the frontend dev server (port 5173 or 3000), either:
-
-1. Add `5173` to `forwardPorts` in `devcontainer.json`, or
-2. Use the VS Code "Ports" panel to forward it dynamically
-
-## Limitations
-
-- The devcontainer attaches to the `backend` service from `docker-compose.yml`, so it inherits those environment variables
-- OCR service and other containers should be started separately via `docker-compose up -d`
-- GPU passthrough for OCR training is not configured
-
-## Customization
-
-To add more tools or extensions, edit `.devcontainer/devcontainer.json`:
-
-```json
-{
-  "features": {
-    "ghcr.io/devcontainers/features/python:1": {
-      "version": "3.11"
-    }
-  },
-  "forwardPorts": [8080, 5173, 3000]
-}
-```
+→ See [.devcontainer/README.md](./README.md) for configuration, usage, and known limitations.
--- a/.devcontainer/README.md
+++ b/.devcontainer/README.md
@@ -0,0 +1,94 @@
+# Dev Container — Familienarchiv
+
+VS Code Dev Container configuration for a pre-configured development environment. Includes Java 21, Maven, and Node.js 24 — everything needed to work on both backend and frontend.
+
+## Configuration
+
+File: `.devcontainer/devcontainer.json`
+
+### Included Features
+
+| Feature | Version                   | Purpose             |
+| ------- | ------------------------- | ------------------- |
+| Java    | 21                        | Spring Boot backend |
+| Maven   | bundled with Java feature | Build tool          |
+| Node.js | 24                        | SvelteKit frontend  |
+
+### VS Code Extensions (Auto-installed)
+
+| Extension                   | Purpose                                       |
+| --------------------------- | --------------------------------------------- |
+| `vscjava.vscode-java-pack`  | Java language support, debugging, testing     |
+| `vmware.vscode-spring-boot` | Spring Boot tooling                           |
+| `gabrielbb.vscode-lombok`   | Lombok annotation support                     |
+| `humao.rest-client`         | HTTP request files (for `backend/api_tests/`) |
+
+### Ports
+
+- `8080` forwarded to host — access backend at `http://localhost:8080`
+
+### User
+
+Runs as `vscode` user (not root) for security.
+
+## How to Use
+
+### Prerequisites
+
+- VS Code with the **Dev Containers** extension installed
+- Docker running locally
+
+### Open in Dev Container
+
+1. Open the project in VS Code
+2. Press `F1` → type "Dev Containers: Reopen in Container"
+3. VS Code will:
+   - Build the container using the root `docker-compose.yml`
+   - Install Java 21, Maven, and Node 24
+   - Install the listed extensions
+   - Mount the workspace folder
+
+### Working Inside the Container
+
+Once inside the container, you have access to both stacks:
+
+```bash
+# Backend
+cd backend
+./mvnw spring-boot:run
+
+# Frontend (in a new terminal)
+cd frontend
+npm install
+npm run dev
+```
+
+The container reuses the `docker-compose.yml` services, so PostgreSQL and MinIO are available automatically.
+
+### Forwarding Frontend Port
+
+The devcontainer config only forwards port 8080 by default. To access the frontend dev server (port 5173 or 3000), either:
+
+1. Add `5173` to `forwardPorts` in `devcontainer.json`, or
+2. Use the VS Code "Ports" panel to forward it dynamically
+
+## Limitations
+
+- The devcontainer attaches to the `backend` service from `docker-compose.yml`, so it inherits those environment variables
+- OCR service and other containers should be started separately via `docker-compose up -d`
+- GPU passthrough for OCR training is not configured
+
+## Customization
+
+To add more tools or extensions, edit `.devcontainer/devcontainer.json`:
+
+```json
+{
+  "features": {
+    "ghcr.io/devcontainers/features/python:1": {
+      "version": "3.11"
+    }
+  },
+  "forwardPorts": [8080, 5173, 3000]
+}
+```
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -2,6 +2,8 @@

 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

+> For a human-readable project overview, see [README.md](./README.md).
+
 ## Project Overview

 **Familienarchiv** is a family document archival system — a full-stack web app for digitizing, organizing, and searching family documents. Key features: file uploads (stored in MinIO/S3), metadata management, Excel/ODS batch import, full-text search, conversation threads between family members, and role-based access control.
@@ -16,6 +18,8 @@ See [CODESTYLE.md](./CODESTYLE.md) for coding standards: Clean Code, DRY/KISS tr

 ## Stack

+→ See [README.md §Tech Stack](./README.md#tech-stack)
+
 - **Backend**: Spring Boot 4.0 (Java 21, Maven, Jetty, JPA/Hibernate, Flyway, Spring Security, Spring Session JDBC)
 - **Frontend**: SvelteKit 2 with Svelte 5, TypeScript, Tailwind CSS 4, Paraglide.js (i18n: de/en/es)
 - **Database**: PostgreSQL 16
@@ -25,12 +29,13 @@ See [CODESTYLE.md](./CODESTYLE.md) for coding standards: Clean Code, DRY/KISS tr
 ## Common Commands

 ### Running the Full Stack
+
 ```bash
-# From repo root — starts PostgreSQL, MinIO, and Spring Boot backend
 docker-compose up -d
 ```

 ### Backend (Spring Boot)
+
 ```bash
 cd backend

@@ -42,6 +47,7 @@ cd backend
 ```

 ### Frontend (SvelteKit)
+
 ```bash
 cd frontend

@@ -64,7 +70,7 @@ npm run generate:api  # Regenerate TypeScript API types from OpenAPI spec

 ### Package Structure

-Package-by-domain: each domain owns its controller, service, repository, entities, and DTOs.
+<!-- TODO: rewrite post-REFACTOR-1 — see Epic 4 -->

 ```
 backend/src/main/java/org/raddatz/familienarchiv/
@@ -88,27 +94,21 @@ backend/src/main/java/org/raddatz/familienarchiv/
 └── user/                User domain — AppUser, UserGroup, UserService, auth controllers
 ```

-### Layering Rules (strictly enforced)
+### Layering Rules

-```
-Controller → Service → Repository → DB
-```
+→ See [docs/ARCHITECTURE.md §Layering rule](./docs/ARCHITECTURE.md#layering-rule)

-- **Controllers** never inject or call repositories directly.
-- **Services** never reach into another domain's repository. Call the other domain's service instead.
-  - ✅ `DocumentService` → `PersonService.getById()` → `PersonRepository`
-  - ❌ `DocumentService` → `PersonRepository` directly
-- This keeps domain boundaries clear and business logic testable in isolation.
+**LLM reminder:** controllers never call repositories directly; services never reach into another domain's repository — always call the other domain's service instead.

 ### Domain Model

-| Entity | Table | Key relationships |
-|---|---|---|
-| `Document` | `documents` | ManyToOne `sender` (Person), ManyToMany `receivers` (Person), ManyToMany `tags` (Tag) |
-| `Person` | `persons` | Referenced by documents as sender/receiver |
-| `Tag` | `tag` | ManyToMany with documents via `document_tags` |
-| `AppUser` | `app_users` | ManyToMany `groups` (UserGroup) |
-| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |
+| Entity      | Table         | Key relationships                                                                     |
+| ----------- | ------------- | ------------------------------------------------------------------------------------- |
+| `Document`  | `documents`   | ManyToOne `sender` (Person), ManyToMany `receivers` (Person), ManyToMany `tags` (Tag) |
+| `Person`    | `persons`     | Referenced by documents as sender/receiver                                            |
+| `Tag`       | `tag`         | ManyToMany with documents via `document_tags`                                         |
+| `AppUser`   | `app_users`   | ManyToMany `groups` (UserGroup)                                                       |
+| `UserGroup` | `user_groups` | Has a `Set<String> permissions`                                                       |

 **`DocumentStatus` lifecycle:** `PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`

@@ -118,6 +118,7 @@ Controller → Service → Repository → DB
 ### Entity Code Style

 All entities use these Lombok annotations:
+
 ```java
@Entity
@Table(name = "table_name")
@@ -146,65 +147,29 @@ Services are annotated with `@Service`, `@RequiredArgsConstructor`, and optional
 - Read methods are not annotated (default non-transactional is fine).
 - Each service owns its domain's repository. Cross-domain data access goes through the other domain's service.

-**Existing services:**
-
-| Service | Responsibility |
-|---|---|
-| `DocumentService` | Document CRUD, search, tag cascade delete |
-| `PersonService` | Person CRUD, find-or-create by alias |
-| `TagService` | Tag find/create/update/delete |
-| `UserService` | User and group CRUD |
-| `FileService` | S3/MinIO upload and download |
-| `MassImportService` | Async ODS/Excel import; delegates to PersonService and TagService |
-
 ### DTOs

-Input DTOs live in `dto/`. Response types are the model entities themselves (no response DTOs).
+Input DTOs live flat in the domain package. Response types are the model entities themselves (no response DTOs).

-- `DocumentUpdateDTO` — used for both create and update (all fields optional)
-- `CreateUserRequest` — user creation
-- `GroupDTO` — group create/update
+- `@Schema(requiredMode = REQUIRED)` on every field the backend always populates — drives TypeScript generation.

 ### Error Handling

-Use `DomainException` for all domain errors. Never throw raw exceptions from service methods.
+→ See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)

-```java
-// Static factories match common HTTP status codes:
-DomainException.notFound(ErrorCode.DOCUMENT_NOT_FOUND, "Document not found: " + id)
-DomainException.forbidden("Access denied")
-DomainException.conflict(ErrorCode.IMPORT_ALREADY_RUNNING, "Already running")
-DomainException.internal(ErrorCode.FILE_UPLOAD_FAILED, "Upload failed: " + e.getMessage())
-```
-
-`ErrorCode` is an enum in `exception/ErrorCode.java`. When adding a new error case, add the value there **and** mirror it in the frontend's `src/lib/errors.ts` + add a Paraglide translation key.
-
-For simple validation in controllers (not domain logic), `ResponseStatusException` is acceptable:
-```java
-throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "firstName is required");
-```
+**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) mirror in `frontend/src/lib/shared/errors.ts`, (3) add i18n keys in `messages/{de,en,es}.json`.

 ### Security / Permissions

-Use `@RequirePermission` on controller methods (or the whole controller class):
+→ See [docs/ARCHITECTURE.md §Permission system](./docs/ARCHITECTURE.md#permission-system)

-```java
-@RequirePermission(Permission.WRITE_ALL)
-public Document updateDocument(...) { ... }
-```
-
-Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`
-
-`PermissionAspect` (AOP) checks the current user's `UserGroup.permissions` at runtime.
+**LLM reminder:** `@RequirePermission(Permission.WRITE_ALL)` is **required** on every `POST`, `PUT`, `PATCH`, `DELETE` endpoint — not optional. Do not mix with Spring Security's `@PreAuthorize`. Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`, `ANNOTATE_ALL`, `BLOG_WRITE`.

 ### OpenAPI / API Types

-SpringDoc generates the spec at `/v3/api-docs` (only accessible when running with `--spring.profiles.active=dev`).
+→ See [CONTRIBUTING.md §Walkthrough B — Add a new endpoint](./CONTRIBUTING.md#4-walkthrough-b--add-a-new-endpoint)

-When changing any model field or endpoint:
-1. Rebuild the backend JAR with `-DskipTests`
-2. Start it with `--spring.profiles.active=dev`
-3. Run `npm run generate:api` in `frontend/`
+**LLM reminder:** always run `npm run generate:api` in `frontend/` after any backend model or endpoint change — this is the most common cause of TypeScript type errors.

 ---

@@ -233,79 +198,52 @@ frontend/src/routes/

 ### API Client Pattern

-All server-side API calls use the typed client from `$lib/api.server.ts`:
+→ See [CONTRIBUTING.md §Frontend API client](./CONTRIBUTING.md#frontend-api-client)

-```typescript
-const api = createApiClient(fetch);
-const result = await api.GET('/api/persons/{id}', { params: { path: { id } } });
-
-// Always check via response.ok, NOT result.error
-if (!result.response.ok) {
-    const code = (result.error as unknown as { code?: string })?.code;
-    throw error(result.response.status, getErrorMessage(code));
-}
-return { person: result.data! };
-```
-
-Key rules:
-- Use `!result.response.ok` for error checking (not `if (result.error)` — this breaks when the spec has no error responses defined)
-- Cast errors as `result.error as unknown as { code?: string }` to extract the backend error code
-- Use `result.data!` (non-null assertion) after an ok check — TypeScript knows it's present
-
-For multipart/form-data endpoints (file uploads), bypass the typed client and use raw `fetch`:
-```typescript
-const res = await fetch(`${baseUrl}/api/documents`, { method: 'POST', body: formData });
-```
+**LLM reminder:** check `!result.response.ok` (not `result.error` — breaks when spec has no error responses defined); cast errors as `result.error as unknown as { code?: string }`; use `result.data!` after an ok check.

 ### Form Actions Pattern

 ```typescript
 // +page.server.ts
 export const actions = {
-    default: async ({ request, fetch }) => {
-        const formData = await request.formData();
-        const name = formData.get('name') as string;  // cast needed — FormData returns FormDataEntryValue
-        // ...
-        return fail(400, { error: 'message' });  // on error
-        throw redirect(303, '/target');           // on success
-    }
+  default: async ({ request, fetch }) => {
+    const formData = await request.formData();
+    const name = formData.get("name") as string;
+    // ...
+    return fail(400, { error: "message" }); // on error
+    throw redirect(303, "/target"); // on success
+  },
 };
 ```

 ### Date Handling

-- **Forms**: German format `dd.mm.yyyy` with auto-dot insertion via `handleDateInput()`. A hidden `<input type="hidden" name="documentDate" value={dateIso}>` sends ISO format to the backend.
-- **Display**: Always use `Intl.DateTimeFormat` with `T12:00:00` suffix to prevent UTC timezone off-by-one:
-  ```typescript
-  new Intl.DateTimeFormat('de-DE', { day: 'numeric', month: 'long', year: 'numeric' })
-      .format(new Date(doc.documentDate + 'T12:00:00'))
-  ```
+→ See [CONTRIBUTING.md §Date handling](./CONTRIBUTING.md#date-handling)
+
+**LLM reminder:** always append `T12:00:00` when constructing `new Date()` from an ISO date string — prevents UTC timezone off-by-one errors.

 ### UI Component Library

-Custom components in `src/lib/components/`:
-
-| Component | Props | Description |
-|---|---|---|
-| `PersonTypeahead` | `name`, `label`, `value`, `initialName`, `on:change` | Single-person selector with typeahead dropdown |
-| `PersonMultiSelect` | `selectedPersons` (bind) | Chip-based multi-person selector |
-| `TagInput` | `tags` (bind), `allowCreation?`, `on:change` | Tag chip input with typeahead |
+→ See per-domain READMEs: [`frontend/src/lib/person/README.md`](./frontend/src/lib/person/README.md), [`frontend/src/lib/tag/README.md`](./frontend/src/lib/tag/README.md), [`frontend/src/lib/document/README.md`](./frontend/src/lib/document/README.md), [`frontend/src/lib/shared/README.md`](./frontend/src/lib/shared/README.md)

 ### Styling Conventions (Tailwind CSS 4)

 Brand color utilities (defined in `layout.css`):

-| Class | Value | Usage |
-|---|---|---|
-| `brand-navy` | `#002850` | Primary text, buttons, headers |
+| Class        | Value     | Usage                            |
+| ------------ | --------- | -------------------------------- |
+| `brand-navy` | `#002850` | Primary text, buttons, headers   |
 | `brand-mint` | `#A6DAD8` | Accents, hover underlines, icons |
-| `brand-sand` | `#E4E2D7` | Page background, card borders |
+| `brand-sand` | `#E4E2D7` | Page background, card borders    |

 Typography:
+
 - `font-serif` (Merriweather) — body text, document titles, names
 - `font-sans` (Montserrat) — labels, metadata, UI chrome

 Card pattern for content sections:
+
 ```svelte
 <div class="bg-white shadow-sm border border-brand-sand rounded-sm p-6">
    <h2 class="text-xs font-bold uppercase tracking-widest text-gray-400 mb-5">Section Title</h2>
@@ -313,48 +251,19 @@ Card pattern for content sections:
 </div>
 ```

-Save bar pattern — use **sticky full-bleed** for long forms (edit document), **card-style with `mt-4`** for short forms (new person):
-```svelte
-<!-- Long forms: sticky, full-bleed -->
-<div class="sticky bottom-0 z-10 -mx-4 px-6 py-4 bg-white border-t border-brand-sand shadow-[0_-2px_8px_rgba(0,0,0,0.06)] flex items-center justify-between">
-
-<!-- Short forms: card, top margin -->
-<div class="mt-4 flex items-center justify-between rounded-sm border border-brand-sand bg-white px-6 py-4 shadow-sm">
-```
-
-Back button pattern — use the shared `<BackButton>` component from `$lib/components/BackButton.svelte`:
-```svelte
-<script lang="ts">
-    import BackButton from '$lib/components/BackButton.svelte';
-</script>
-
-<BackButton />
-```
-The component calls `history.back()` so the user returns to wherever they came from. Label is always "Zurück" (no contextual suffix — destination is unknown). Touch target ≥ 44px and focus ring are built in. Do not use a static `<a href>` for back navigation.
-
-Subtle action link (e.g. "new document/person"):
-```svelte
-<a href="/documents/new" class="inline-flex items-center gap-1 text-sm font-medium text-brand-navy/60 hover:text-brand-navy transition-colors">
-    <svg class="w-4 h-4" ...><!-- plus icon --></svg>
-    Neues Dokument
-</a>
-```
+Back button pattern — use the shared `<BackButton>` component from `$lib/shared/primitives/BackButton.svelte`. Do not use a static `<a href>` for back navigation.

 ### Error Handling (Frontend)

-`src/lib/errors.ts` mirrors the backend `ErrorCode` enum and maps codes to Paraglide translation keys. When adding a new `ErrorCode` on the backend:
-1. Add it to `ErrorCode.java`
-2. Add it to the `ErrorCode` type in `errors.ts`
-3. Add a `case` in `getErrorMessage()`
-4. Add the translation key in `messages/de.json`, `en.json`, `es.json`
+→ See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)
+
+**LLM reminder:** when adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`.

 ---

 ## Infrastructure

-The `docker-compose.yml` at the repo root orchestrates everything. A MinIO MC helper container runs at startup to create the `archive-documents` bucket. The backend container depends on both `db` and `minio` being healthy.
-
-Database migrations live in `backend/src/main/resources/db/migration/` (Flyway, SQL files named `V{n}__{description}.sql`).
+→ See [docs/DEPLOYMENT.md](./docs/DEPLOYMENT.md)

 ## API Testing

@@ -362,4 +271,4 @@ HTTP test files are in `backend/api_tests/` for use with the VS Code REST Client

 ## Dev Container

-A `.devcontainer/` config is available (Java 21 + Node 24, ports 8080 and 3000 forwarded). Use VS Code's "Reopen in Container" for a pre-configured environment.
+→ See [.devcontainer/README.md](./.devcontainer/README.md)
--- a/backend/CLAUDE.md
+++ b/backend/CLAUDE.md
@@ -11,7 +11,7 @@ Spring Boot 4.0 monolith serving the Familienarchiv REST API. Handles document m
 - **Server**: Jetty (not Tomcat — excluded in pom.xml)
 - **Data**: PostgreSQL 16, JPA/Hibernate, Spring Data JPA
 - **Migrations**: Flyway (SQL files in `src/main/resources/db/migration/`)
-- **Security**: Spring Security, Spring Session JDBC, JWT tokens
+- **Security**: Spring Security, Spring Session JDBC
 - **File Storage**: MinIO via AWS SDK v2 (S3-compatible)
 - **Spreadsheet Import**: Apache POI 5.5.0 (Excel/ODS)
 - **API Docs**: SpringDoc OpenAPI 3.x (`/v3/api-docs` — dev profile only)
@@ -19,7 +19,7 @@ Spring Boot 4.0 monolith serving the Familienarchiv REST API. Handles document m

 ## Package Structure

-Package-by-domain: each domain owns its controller, service, repository, entities, and DTOs.
+<!-- TODO: rewrite post-REFACTOR-1 — see Epic 4 -->

 ```
 src/main/java/org/raddatz/familienarchiv/
@@ -43,31 +43,28 @@ src/main/java/org/raddatz/familienarchiv/
 └── user/                # User domain — AppUser, UserGroup, UserService, auth controllers
 ```

-## Layering Rules (Strict)
+For per-domain ownership and public surface, see each domain's `README.md`.

-```
-Controller → Service → Repository → DB
-```
+## Layering Rules

-- **Controllers never call repositories directly.**
-- **Services never reach into another domain's repository.** Call the other domain's service instead.
-  - ✅ `DocumentService` → `PersonService.getById()` → `PersonRepository`
-  - ❌ `DocumentService` → `PersonRepository` directly
+→ See [docs/ARCHITECTURE.md §Layering rule](../docs/ARCHITECTURE.md#layering-rule)
+
+**LLM reminder:** controllers never call repositories directly; services never reach into another domain's repository — always call the other domain's service.

 ## Key Entities

-| Entity | Table | Key Relationships |
-|---|---|---|
-| `Document` | `documents` | ManyToOne sender (Person), ManyToMany receivers (Person), ManyToMany tags (Tag) |
-| `Person` | `persons` | Referenced by documents as sender/receiver; name aliases table |
-| `Tag` | `tag` | ManyToMany with documents via `document_tags`; self-referencing parent for tree |
-| `AppUser` | `app_users` | ManyToMany groups (UserGroup) |
-| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |
-| `TranscriptionBlock` | `transcription_blocks` | Per-document, per-page text blocks with polygons |
-| `DocumentAnnotation` | `document_annotations` | Free-form annotations on document pages |
-| `Comment` | `document_comments` | Threaded comments with mentions |
-| `Notification` | `notifications` | User notification feed |
-| `OcrJob` / `OcrJobDocument` | `ocr_jobs`, `ocr_job_documents` | Batch OCR job tracking |
+| Entity                      | Table                           | Key Relationships                                                               |
+| --------------------------- | ------------------------------- | ------------------------------------------------------------------------------- |
+| `Document`                  | `documents`                     | ManyToOne sender (Person), ManyToMany receivers (Person), ManyToMany tags (Tag) |
+| `Person`                    | `persons`                       | Referenced by documents as sender/receiver; name aliases table                  |
+| `Tag`                       | `tag`                           | ManyToMany with documents via `document_tags`; self-referencing parent for tree |
+| `AppUser`                   | `app_users`                     | ManyToMany groups (UserGroup)                                                   |
+| `UserGroup`                 | `user_groups`                   | Has a `Set<String> permissions`                                                 |
+| `TranscriptionBlock`        | `transcription_blocks`          | Per-document, per-page text blocks with polygons                                |
+| `DocumentAnnotation`        | `document_annotations`          | Free-form annotations on document pages                                         |
+| `Comment`                   | `document_comments`             | Threaded comments with mentions                                                 |
+| `Notification`              | `notifications`                 | User notification feed                                                          |
+| `OcrJob` / `OcrJobDocument` | `ocr_jobs`, `ocr_job_documents` | Batch OCR job tracking                                                          |

 **`DocumentStatus` lifecycle:** `PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`

@@ -104,32 +101,15 @@ public class MyEntity {

 ## Error Handling

-Use `DomainException` for all domain errors:
+→ See [CONTRIBUTING.md §Error handling](../CONTRIBUTING.md#error-handling)

-```java
-DomainException.notFound(ErrorCode.DOCUMENT_NOT_FOUND, "...")
-DomainException.forbidden("...")
-DomainException.conflict(ErrorCode.IMPORT_ALREADY_RUNNING, "...")
-DomainException.internal(ErrorCode.FILE_UPLOAD_FAILED, "...")
-```
-
-When adding a new `ErrorCode`:
-1. Add to `ErrorCode.java`
-2. Mirror in frontend `src/lib/errors.ts`
-3. Add Paraglide translation key in `messages/{de,en,es}.json`
+**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` — never throw raw exceptions from service methods. When adding a new `ErrorCode`: add to `ErrorCode.java`, mirror in `frontend/src/lib/shared/errors.ts`, add i18n keys in `messages/{de,en,es}.json`.

 ## Security / Permissions

-Use `@RequirePermission` on controller methods or classes:
+→ See [docs/ARCHITECTURE.md §Permission system](../docs/ARCHITECTURE.md#permission-system)

-```java
-@RequirePermission(Permission.WRITE_ALL)
-public Document updateDocument(...) { ... }
-```
-
-Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`
-
-`PermissionAspect` checks the current user's `UserGroup.permissions` at runtime.
+**LLM reminder:** `@RequirePermission(Permission.WRITE_ALL)` is **required** on every `POST`, `PUT`, `PATCH`, `DELETE` endpoint — not optional. Do not mix with Spring Security's `@PreAuthorize`. Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`, `ANNOTATE_ALL`, `BLOG_WRITE`.

 ## OCR Integration

@@ -141,49 +121,35 @@ The backend orchestrates OCR by calling the Python `ocr-service` microservice vi
 - `OcrBatchService` — handles batch/job workflows
 - `OcrAsyncRunner` — async execution of OCR jobs

+For ocr-service internals, see [`ocr-service/README.md`](../ocr-service/README.md).
+
 ## API Testing

 HTTP test files in `backend/api_tests/` for the VS Code REST Client extension.

 ## How to Run

-### Local Development
-
 ```bash
 cd backend

-# Run with dev profile (requires PostgreSQL + MinIO running via docker-compose)
-./mvnw spring-boot:run
-
-# Build JAR (with tests)
-./mvnw clean package
-
-# Build JAR skipping tests
+./mvnw spring-boot:run          # Run with dev profile (requires PostgreSQL + MinIO)
+./mvnw clean package            # Build JAR (with tests)
 ./mvnw clean package -DskipTests
-
-# Run all tests
-./mvnw test
-
-# Run a single test class
-./mvnw test -Dtest=ClassName
-
-# Run with coverage (JaCoCo)
-./mvnw clean verify
+./mvnw test                     # Run all tests
+./mvnw test -Dtest=ClassName    # Run a single test class
+./mvnw clean verify             # Run with JaCoCo coverage report
 ```

-### OpenAPI TypeScript Generation
+**OpenAPI / TypeScript type generation:**

-1. Build and start backend with `--spring.profiles.active=dev`
-2. In `frontend/`, run: `npm run generate:api`
+1. Start backend with `--spring.profiles.active=dev`
+2. In `frontend/`: `npm run generate:api`

-### Profiles
-
-- **dev** (default): Enables OpenAPI, dev configs, e2e seeds
-- **prod**: Production profile — no dev endpoints
+**LLM reminder:** always regenerate types after any model or endpoint change — the most common cause of "where did my TypeScript type go?"

 ## Testing

 - Unit tests: Mockito + JUnit, pure in-memory
 - Slice tests: `@WebMvcTest`, `@DataJpaTest` with Testcontainers PostgreSQL
 - Integration tests: Full Spring context with Testcontainers
-- Coverage gate: 88% branch coverage overall (JaCoCo)
+- Coverage gate: 88% branch coverage (JaCoCo)
--- a/docs/CLAUDE.md
+++ b/docs/CLAUDE.md
@@ -1,97 +1,5 @@
-# Docs — Familienarchiv
+# docs/

-## Overview
+→ See [docs/README.md](./README.md) for the folder structure and documentation guide.

-Project documentation organized into four categories: architecture decision records (ADRs), system architecture diagrams, infrastructure runbooks, and detailed UI/UX specifications.
-
-## Folder Structure
-
-```
-docs/
-├── adr/                     # Architecture Decision Records
-├── architecture/            # C4 model diagrams and system architecture docs
-├── infrastructure/          # Deployment, CI/CD, and ops guides
-├── specs/                   # UI/UX feature specifications (HTML)
-├── app-analysis-*.md        # Application analysis reports
-├── mail.md                  # Mail system documentation
-├── security-guide.md        # Security policies and hardening guide
-├── STYLEGUIDE.md            # Coding and design style guide
-├── TODO-backend.md          # Backend backlog
-└── TODO-frontend.md         # Frontend backlog
-```
-
-## ADR (`adr/`)
-
-Architecture Decision Records capture major technical decisions and their rationale.
-
-| ADR | Title | Status |
-|---|---|---|
-| `001-ocr-python-microservice.md` | OCR as a separate Python container | Accepted |
-| `002-polygon-jsonb-storage.md` | Polygon coordinates in JSONB columns | Accepted |
-| `003-chronik-unified-activity-feed.md` | Unified activity feed (Chronik) | Accepted |
-
-When making a significant architectural change (new service, data model change, technology swap), write a new ADR following the format:
- Status (Proposed / Accepted / Deprecated / Superseded)
- Context (forces at play)
- Decision (what we decided)
- Consequences (trade-offs)
- Alternatives Considered (table format)
-
-## Architecture (`architecture/`)
-
-Contains C4 model diagrams describing the system at different zoom levels:
-
- **Context diagram** — How Familienarchiv fits into the user and system ecosystem
- **Container diagram** — The high-level technology choices (Spring Boot, SvelteKit, PostgreSQL, MinIO, OCR service)
- **Component diagram** — Major structural components within the backend
-
-Written in Markdown with embedded Mermaid or PlantUML diagrams (`c4-diagrams.md`).
-
-## Infrastructure (`infrastructure/`)
-
-Operational documentation for running Familienarchiv in production and CI.
-
-| Document | Purpose |
-|---|---|
-| `ci-gitea.md` | Gitea CI/CD pipeline configuration |
-| `production-compose.md` | Production Docker Compose setup |
-| `s3-migration.md` | Migrating documents between S3 buckets |
-| `self-hosted-catalogue.md` | Self-hosted software catalogue |
-
-## Specs (`specs/`)
-
-High-fidelity UI/UX specifications written as standalone HTML files. These are design documents that describe exact layout, interactions, and responsive behavior before implementation.
-
-Each spec typically includes:
- Visual mockups with CSS-in-HTML styling
- Interaction flows and state transitions
- Responsive breakpoint behavior
- Accessibility requirements
-
-Examples of active spec areas:
- Document detail page (`document-topbar-*.html`, `documents-page-spec.html`)
- Admin interfaces (`admin-redesign-*.html`, `admin-tag-overhaul.html`)
- Transcription workflows (`inline-transcription-*.html`, `annotation-transcription-*.html`)
- Dashboard and activity feeds (`dashboard-*.html`, `chronik-spec.html`)
- OCR admin (`ocr-admin-spec.html`)
-
-## How to Use
-
-1. **Before implementing a feature**, check `specs/` for an existing specification.
-2. **When proposing a new architecture**, draft an ADR in `adr/` and discuss before coding.
-3. **When deploying**, follow `infrastructure/production-compose.md`.
-4. **Keep TODO files updated** — they serve as lightweight backlogs.
-
-## Style Guide
-
-`STYLEGUIDE.md` covers:
- Code formatting and linting rules
- Component naming conventions
- Color palette and typography
- Accessibility standards (WCAG 2.1 AA)
-
-## Contributing
-
- ADRs should be sequential (`NNN-descriptive-name.md`).
- Specs should be self-contained HTML files viewable in a browser.
- Infrastructure docs should include copy-pasteable commands.
+**LLM reminder:** ADRs are sequential — use the next number after the highest existing one in `docs/adr/`. When making a significant architectural change (new service, data model change, technology swap), write a new ADR before implementing.
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,86 @@
+# docs/
+
+Project documentation organised into four categories: architecture decision records (ADRs), system architecture diagrams, infrastructure runbooks, and detailed UI/UX specifications.
+
+## Folder structure
+
+```
+docs/
+├── adr/                     # Architecture Decision Records
+├── architecture/            # C4 model diagrams and system architecture docs
+├── infrastructure/          # Deployment, CI/CD, and ops guides
+├── specs/                   # UI/UX feature specifications (HTML)
+├── ARCHITECTURE.md          # Human-readable architecture overview (DOC-2)
+├── DEPLOYMENT.md            # Day-1 checklist and operational reference (DOC-5)
+├── GLOSSARY.md              # Domain terminology (DOC-3)
+├── security-guide.md        # Security policies and hardening guide
+└── STYLEGUIDE.md            # Coding and design style guide
+```
+
+## ADR (`adr/`)
+
+Architecture Decision Records capture major technical decisions and their rationale.
+
+| ADR                                    | Title                                | Status   |
+| -------------------------------------- | ------------------------------------ | -------- |
+| `001-ocr-python-microservice.md`       | OCR as a separate Python container   | Accepted |
+| `002-polygon-jsonb-storage.md`         | Polygon coordinates in JSONB columns | Accepted |
+| `003-chronik-unified-activity-feed.md` | Unified activity feed (Chronik)      | Accepted |
+
+When making a significant architectural change (new service, data model change, technology swap), write a new ADR:
+
+- **Status** (Proposed / Accepted / Deprecated / Superseded)
+- **Context** (forces at play)
+- **Decision** (what we decided)
+- **Consequences** (trade-offs)
+- **Alternatives Considered** (table format)
+
+ADRs are sequential (`NNN-descriptive-name.md`). Do not reuse numbers.
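+
+To pick the next number, a small shell helper can scan the folder. This is a minimal sketch, not a script that exists in this repository: `next_adr_number` is a hypothetical name, and it assumes every ADR file matches `NNN-descriptive-name.md`.
+
+```bash
+# Hypothetical helper: print the next free ADR number, zero-padded to three digits.
+# Assumes ADR files are named NNN-descriptive-name.md inside the given directory.
+next_adr_number() {
+  ls "${1:-docs/adr}" 2>/dev/null |
+    # Keep only the NNN prefix of files that follow the naming scheme.
+    sed -n 's/^\([0-9][0-9][0-9]\)-.*\.md$/\1/p' |
+    # Track the highest number seen ("+ 0" forces a decimal conversion), then add one.
+    awk 'BEGIN { max = 0 } { if ($1 + 0 > max) max = $1 + 0 } END { printf "%03d\n", max + 1 }'
+}
+```
+
+With the three ADRs listed above, `next_adr_number docs/adr` prints `004`. Taking the highest existing number rather than the file count means a gap left by a withdrawn ADR is never reused.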
+
+## Architecture (`architecture/`)
+
+Contains C4 model diagrams describing the system at different zoom levels:
+
+- **Context diagram** — How Familienarchiv fits into the user and system ecosystem
+- **Container diagram** — The high-level technology choices (Spring Boot, SvelteKit, PostgreSQL, MinIO, OCR service)
+- **Component diagram** — Major structural components within the backend
+
+Written in Markdown with embedded Mermaid diagrams (`c4-diagrams.md`). Gitea renders these automatically.
+
+For the human-readable architecture narrative, see [`docs/ARCHITECTURE.md`](ARCHITECTURE.md).
+
+## Infrastructure (`infrastructure/`)
+
+Operational documentation for running Familienarchiv in production and CI.
+
+| Document                   | Purpose                                              |
+| -------------------------- | ---------------------------------------------------- |
+| `ci-gitea.md`              | Gitea CI/CD pipeline configuration                   |
+| `production-compose.md`    | Production Docker Compose setup and VPS provisioning |
+| `s3-migration.md`          | Migrating documents between S3 buckets               |
+| `self-hosted-catalogue.md` | Self-hosted software catalogue                       |
+
+For the day-1 deployment checklist, see [`docs/DEPLOYMENT.md`](DEPLOYMENT.md).
+
+## Specs (`specs/`)
+
+High-fidelity UI/UX specifications written as standalone HTML files. These are design documents describing exact layout, interactions, and responsive behavior before implementation.
+
+Each spec typically includes:
+
+- Visual mockups with CSS-in-HTML styling
+- Interaction flows and state transitions
+- Responsive breakpoint behavior
+- Accessibility requirements
+
+Before implementing a feature, check `specs/` for an existing specification.
+
+## Style Guide
+
+[`docs/STYLEGUIDE.md`](STYLEGUIDE.md) covers:
+
+- Code formatting and linting rules
+- Component naming conventions
+- Color palette and typography
+- Accessibility standards (WCAG 2.1 AA)
--- a/1
+++ b/1
--- a/frontend/CLAUDE.md
+++ b/frontend/CLAUDE.md
@@ -71,29 +71,13 @@ src/
 └── ...                  # Other SvelteKit config files
 ```

+For per-domain component inventories, see the domain READMEs in `src/lib/<domain>/README.md`.
+
 ## API Client Pattern

-All server-side API calls use the typed client from `$lib/api.server.ts`:
+→ See [CONTRIBUTING.md §Frontend API client](../CONTRIBUTING.md#frontend-api-client)

-```typescript
-const api = createApiClient(fetch);
-const result = await api.GET('/api/persons/{id}', { params: { path: { id } } });
-
-// Always check via response.ok, NOT result.error
-if (!result.response.ok) {
-	const code = (result.error as unknown as { code?: string })?.code;
-	throw error(result.response.status, getErrorMessage(code));
-}
-return { person: result.data! };
-```
-
-Key rules:
-
- Use `!result.response.ok` for error checking (not `if (result.error)` — breaks when spec has no error responses defined)
- Cast errors as `result.error as unknown as { code?: string }` to extract backend error code
- Use `result.data!` after an ok check
-
-For multipart/form-data (file uploads), bypass the typed client and use raw `fetch`.
+**LLM reminder:** check `!result.response.ok` (not `result.error` — breaks when spec has no error responses); cast errors as `result.error as unknown as { code?: string }`; use `result.data!` after an ok check. For multipart/form-data (file uploads), bypass the typed client and use raw `fetch`.

 ## Form Actions Pattern

@@ -102,7 +86,7 @@ For multipart/form-data (file uploads), bypass the typed client and use raw `fet
 export const actions = {
 	default: async ({ request, fetch }) => {
 		const formData = await request.formData();
-		const name = formData.get('name') as string;
+		const name = formData.get('name') as string; // cast needed — FormData returns FormDataEntryValue
 		// ...
 		return fail(400, { error: 'message' }); // on error
 		throw redirect(303, '/target'); // on success
@@ -112,13 +96,9 @@ export const actions = {

 ## Date Handling

- **Forms**: German format `dd.mm.yyyy` with auto-dot insertion via `handleDateInput()`. A hidden `<input type="hidden" name="documentDate" value={dateIso}>` sends ISO to the backend.
- **Display**: Always use `Intl.DateTimeFormat` with `T12:00:00` suffix to prevent UTC off-by-one:
-  ```typescript
-  new Intl.DateTimeFormat('de-DE', { day: 'numeric', month: 'long', year: 'numeric' }).format(
-  	new Date(doc.documentDate + 'T12:00:00')
-  );
-  ```
+→ See [CONTRIBUTING.md §Date handling](../CONTRIBUTING.md#date-handling)
+
+**LLM reminder:** always append `T12:00:00` when constructing `new Date()` from an ISO date string — prevents UTC timezone off-by-one errors. Forms use German `dd.mm.yyyy` format via `handleDateInput()` with a hidden ISO input.

 ## Styling Conventions (Tailwind CSS 4)

@@ -146,15 +126,9 @@ Card pattern for content sections:

 ## Key UI Components

-| Component            | Location                       | Props                                   | Description                                |
-| -------------------- | ------------------------------ | --------------------------------------- | ------------------------------------------ |
-| `PersonTypeahead`    | `$lib/person/`                 | `name`, `label`, `value`, `initialName` | Single-person selector with typeahead      |
-| `PersonMultiSelect`  | `$lib/person/`                 | `selectedPersons` (bind)                | Chip-based multi-person selector           |
-| `TagInput`           | `$lib/tag/`                    | `tags` (bind), `allowCreation?`         | Tag chip input with typeahead              |
-| `PdfViewer`          | `$lib/document/`               | `url`, `annotations`                    | PDF rendering with annotation overlay      |
-| `TranscriptionBlock` | `$lib/document/transcription/` | `block`, `mode`                         | Read/edit transcription block              |
-| `DocumentTopBar`     | `$lib/document/`               | `document`                              | Responsive document metadata header        |
-| `BackButton`         | `$lib/shared/primitives/`      | —                                       | Calls `history.back()`; 44 px touch target |
+→ See per-domain READMEs: [`src/lib/person/README.md`](src/lib/person/README.md), [`src/lib/tag/README.md`](src/lib/tag/README.md), [`src/lib/document/README.md`](src/lib/document/README.md), [`src/lib/shared/README.md`](src/lib/shared/README.md)
+
+**LLM reminder:** `BackButton` is at `$lib/shared/primitives/BackButton.svelte` — use it for all back navigation; never a static `<a href>`. API client is at `$lib/shared/api.server`.

 ## How to Run

--- a/ocr-service/CLAUDE.md
+++ b/ocr-service/CLAUDE.md
@@ -1,154 +1,7 @@
-# OCR Service — Familienarchiv
+# OCR Service

-## Overview
+→ See [ocr-service/README.md](./README.md) for tech stack, architecture, endpoints, environment variables, local development, testing, and training.

-Python FastAPI microservice that performs OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition) on historical family documents. It exposes a simple HTTP API consumed by the Spring Boot backend. The service is stateless — all job tracking and business logic remain in Java.
+**LLM reminder:** the OCR service is a **single-node container** — training reloads the model in-process, so multiple replicas cause model-state divergence (see ADR-001). All job tracking and business logic stay in Spring Boot; the Python service is stateless OCR only.

-## Tech Stack
-
- **Framework**: FastAPI 0.115.6 (Python 3.11)
- **OCR Engines**:
-  - **Surya** (`surya-ocr`) — Transformer-based, handles typewritten and modern Latin handwriting
-  - **Kraken** (`kraken==7.0`) — Historical HTR model support, required for pre-1941 German Kurrent/Sütterlin scripts
- **ML**: PyTorch 2.7.1 (CPU-only), torchvision, transformers
- **PDF Processing**: `pypdfium2` (rendering), `pillow`
- **Image Processing**: `opencv-python-headless`, `pyvips`
- **Spell Checking**: `pyspellchecker`
- **HTTP Client**: `httpx`
-
-## Architecture
-
-The service is a single-node container (see ADR-001). OCR training reloads the model in-process after each run, so multiple replicas would cause training conflicts and model-state divergence.
-
-### Interface Contract
-
-**Request:**
-```json
-{
-  "pdfUrl": "http://minio:9000/archive-documents/abc.pdf?presigned...",
-  "scriptType": "HANDWRITING_KURRENT",
-  "language": "de"
-}
-```
-
-**Response:** Array of `OcrBlock` objects:
-```json
-[
-  {
-    "pageNumber": 0,
-    "x": 0.12, "y": 0.08, "width": 0.76, "height": 0.04,
-    "polygon": [[0.12,0.08],[0.88,0.09],[0.87,0.12],[0.13,0.11]],
-    "text": "Sehr geehrter Herr ..."
-  }
-]
-```
-
-Coordinates are normalized (0-1) relative to page dimensions.
-
-### File Structure
-
-```
-ocr-service/
-├── main.py                  # FastAPI app, endpoints, request handling
-├── models.py                # Pydantic models (OcrRequest, OcrBlock)
-├── engines/
-│   ├── __init__.py
-│   ├── kraken.py            # Kraken engine wrapper (Kurrent models)
-│   └── surya.py             # Surya engine wrapper (typewritten/Latin)
-├── preprocessing.py         # Image preprocessing (CLAHE, deskew, denoise)
-├── confidence.py            # Confidence scoring and thresholding
-├── spell_check.py           # Post-OCR spell correction
-├── ensure_blla_model.py     # Model download / verification helper
-├── dictionaries/            # Historical word lists for spell checking
-├── requirements.txt         # Python dependencies
-├── Dockerfile               # Production container image
-└── entrypoint.sh            # Container startup script
-```
-
-### Key Endpoints
-
-| Endpoint | Method | Description |
-|---|---|---|
-| `/health` | GET | Returns 200 only after models are loaded |
-| `/ocr` | POST | Extract text blocks from a PDF URL |
-| `/ocr/stream` | POST | Streaming OCR with SSE-style progress events |
-| `/training/submit` | POST | Submit training data for model fine-tuning |
-
-### Environment Variables
-
-| Variable | Default | Description |
-|---|---|---|
-| `KRAKEN_MODEL_PATH` | `/app/models/german_kurrent.mlmodel` | Path to Kraken model file |
-| `TRAINING_TOKEN` | `""` | Bearer token required for training endpoints |
-| `OCR_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for Latin scripts |
-| `OCR_CONFIDENCE_THRESHOLD_KURRENT` | `0.5` | Minimum confidence for Kurrent scripts |
-| `RECOGNITION_BATCH_SIZE` | `16` | Kraken recognition batch size |
-| `DETECTOR_BATCH_SIZE` | `8` | Surya detector batch size |
-| `OCR_CLAHE_CLIP_LIMIT` | `2.0` | CLAHE contrast enhancement limit |
-| `OCR_CLAHE_TILE_SIZE` | `8` | CLAHE tile grid size |
-| `OCR_MAX_CACHED_MODELS` | `2` | LRU model cache size (~500 MB each) |
-| `ALLOWED_PDF_HOSTS` | `minio,localhost,127.0.0.1` | SSRF protection — allowed PDF URL hosts |
-
-## How to Run
-
-### Local Development (Python venv)
-
-```bash
-cd ocr-service
-python -m venv .venv
-source .venv/bin/activate
-
-# Install PyTorch CPU first (saves ~2 GB vs CUDA)
-pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cpu
-
-# Install remaining dependencies
-pip install -r requirements.txt
-
-# Run development server
-fastapi dev main.py --host 0.0.0.0 --port 8000
-
-# Or production mode
-uvicorn main:app --host 0.0.0.0 --port 8000
-```
-
-### Docker (via docker-compose)
-
-The OCR service is included in the root `docker-compose.yml`:
-
-```bash
-docker-compose up -d ocr-service
-```
-
-The container:
- Exposes port 8000 internally (not mapped to host by default)
- Mounts `ocr_models` and `ocr_cache` volumes for persistence
- Has a 120-second startup grace period for model loading
- Memory limit: 12 GB
-
-### Model Downloads
-
-Use the helper script to download Kraken models:
-
-```bash
-./scripts/download-kraken-models.sh
-```
-
-Models are stored in the `ocr_models` Docker volume or `./ocr-service/models/` locally.
-
-## Testing
-
-Only a subset of tests can run without the full ML stack:
-
-```bash
-cd ocr-service
-pip install pytest pytest-asyncio pyspellchecker
-
-# No ML required — pure logic tests
-python -m pytest test_spell_check.py test_confidence.py test_sender_registry.py -v
-```
-
-Tests requiring PyTorch/Kraken/Surya (e.g., `test_engines.py`) must be run in the Docker container or a fully provisioned venv.
-
-## Training
-
-The service supports in-process model fine-tuning via Kraken's `ketos` training pipeline. Training endpoints require the `TRAINING_TOKEN` bearer token. After training completes, the model is reloaded in-process — this is why only a single replica is supported.
+`ALLOWED_PDF_HOSTS` must never be set to `*`: a wildcard removes the SSRF protection. The default (`minio,localhost,127.0.0.1`) is correct for dev.
--- a/scripts/CLAUDE.md
+++ b/scripts/CLAUDE.md
@@ -1,144 +1,5 @@
-# Scripts — Familienarchiv
+# scripts/

-## Overview
+→ See [scripts/README.md](./README.md) for the full list of scripts, their purpose, and usage.

-Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.
-
-## Scripts
-
-### `reset-db.sh`
-**Purpose**: Hard-reset the development database, wiping all documents, persons, tags, and related data.
-
-**Usage:**
-```bash
-./scripts/reset-db.sh
-# Type 'yes' to confirm
-```
-
-**What it truncates:**
- `transcription_block_versions`
- `transcription_blocks`
- `comment_mentions`
- `document_comments`
- `document_annotations`
- `document_versions`
- `notifications`
- `documents`
- `person_name_aliases`
- `persons`
- `tag`
-
-> ⚠️ **Destructive operation** — only for development!
-
---
-
-### `rebuild-frontend.sh`
-**Purpose**: Force a clean rebuild of the frontend Docker container.
-
-**Usage:**
-```bash
-./scripts/rebuild-frontend.sh
-```
-
---
-
-### `download-kraken-models.sh`
-**Purpose**: Download Kraken HTR models for German Kurrent and Sütterlin scripts.
-
-**Usage:**
-```bash
-./scripts/download-kraken-models.sh
-```
-
-Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100-500 MB each.
-
---
-
-### `download-paperless.sh`
-**Purpose**: Download exported documents from a Paperless-ngx instance.
-
-**Usage:**
-```bash
-./scripts/download-paperless.sh
-```
-
-Requires environment variables or config for the Paperless API endpoint and token.
-
---
-
-### `flatten-paperless.sh`
-**Purpose**: Flatten nested Paperless export directories into a single import-ready structure.
-
-**Usage:**
-```bash
-./scripts/flatten-paperless.sh
-```
-
---
-
-### `generate_data.py`
-**Purpose**: Generate synthetic test data for development.
-
-**Usage:**
-```bash
-python scripts/generate_data.py
-```
-
-Generates fake documents, persons, and tags suitable for load testing or UI development.
-
---
-
-### `prepare_historical_dict.py`
-**Purpose**: Build a historical German word dictionary for the OCR spell-checker.
-
-**Usage:**
-```bash
-python scripts/prepare_historical_dict.py
-```
-
-Processes raw word lists into the format expected by `ocr-service/spell_check.py`.
-
---
-
-### `schema.sql`
-**Purpose**: Complete database schema dump for reference.
-
-**Note**: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.
-
---
-
-### `large-data.sql`
-**Purpose**: Pre-seeded dataset with a large number of documents for performance testing.
-
-**Usage:**
-```bash
-# Import into PostgreSQL
-docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
-```
-
-## How to Use
-
-Most scripts should be run from the **repository root**:
-
-```bash
-# Database reset
-./scripts/reset-db.sh
-
-# Model download
-./scripts/download-kraken-models.sh
-
-# Data generation
-cd scripts && python generate_data.py
-```
-
-Ensure scripts are executable:
-```bash
-chmod +x scripts/*.sh
-```
-
-## Adding New Scripts
-
-1. Place the script in `scripts/`
-2. Add a header comment describing purpose and usage
-3. Make it executable (`chmod +x`)
-4. Document it in this `CLAUDE.md`
+**LLM reminder:** when adding a new script, document it in `scripts/README.md` (not here).
--- a/scripts/README.md
+++ b/scripts/README.md
@@ -0,0 +1,161 @@
+# scripts/
+
+Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.
+
+## Scripts
+
+### `reset-db.sh`
+
+**Purpose**: Hard-reset the development database, wiping all documents, persons, tags, and related data.
+
+**Usage:**
+
+```bash
+./scripts/reset-db.sh
+# Type 'yes' to confirm
+```
+
+**What it truncates:**
+
+- `transcription_block_versions`
+- `transcription_blocks`
+- `comment_mentions`
+- `document_comments`
+- `document_annotations`
+- `document_versions`
+- `notifications`
+- `documents`
+- `person_name_aliases`
+- `persons`
+- `tag`
+
+> ⚠️ **Destructive operation — only for development!** This wipes ALL data. Not reversible without a backup.
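+
+The confirmation prompt can be pictured as a guard like the one below. This is a sketch under stated assumptions, not the actual contents of `reset-db.sh`: the function name `confirm_destructive` and the prompt wording are hypothetical.
+
+```bash
+# Hypothetical guard: succeed only when the user types exactly "yes".
+confirm_destructive() {
+  # Prompt on stderr so stdout stays clean for piping.
+  printf 'This wipes ALL data. Type "yes" to continue: ' >&2
+  IFS= read -r answer || return 1   # EOF or closed stdin counts as "no"
+  [ "$answer" = "yes" ]
+}
+
+# Intended call site, shown as a comment so the sketch has no side effects:
+#   confirm_destructive || { echo "Aborted." >&2; exit 1; }
+```
+
+Requiring the literal word `yes` (rather than accepting `y` or an empty line) makes an accidental Enter keypress abort instead of truncating tables.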
+
+---
+
+### `rebuild-frontend.sh`
+
+**Purpose**: Force a clean rebuild of the frontend Docker container.
+
+**Usage:**
+
+```bash
+./scripts/rebuild-frontend.sh
+```
+
+---
+
+### `download-kraken-models.sh`
+
+**Purpose**: Download Kraken HTR models for German Kurrent and Sütterlin scripts.
+
+**Usage:**
+
+```bash
+./scripts/download-kraken-models.sh
+```
+
+Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100–500 MB each.
+
+---
+
+### `download-paperless.sh`
+
+**Purpose**: Download exported documents from a Paperless-ngx instance.
+
+**Usage:**
+
+```bash
+./scripts/download-paperless.sh
+```
+
+Requires environment variables or config for the Paperless API endpoint and token.
+
+---
+
+### `flatten-paperless.sh`
+
+**Purpose**: Flatten nested Paperless export directories into a single import-ready structure.
+
+**Usage:**
+
+```bash
+./scripts/flatten-paperless.sh
+```
+
+---
+
+### `generate_data.py`
+
+**Purpose**: Generate synthetic test data for development.
+
+**Usage:**
+
+```bash
+python scripts/generate_data.py
+```
+
+Generates fake documents, persons, and tags suitable for load testing or UI development.
+
+---
+
+### `prepare_historical_dict.py`
+
+**Purpose**: Build a historical German word dictionary for the OCR spell-checker.
+
+**Usage:**
+
+```bash
+python scripts/prepare_historical_dict.py
+```
+
+Processes raw word lists into the format expected by `ocr-service/spell_check.py`.
+
+---
+
+### `schema.sql`
+
+**Purpose**: Complete database schema dump for reference.
+
+**Note**: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.
+
+---
+
+### `large-data.sql`
+
+**Purpose**: Pre-seeded dataset with a large number of documents for performance testing.
+
+**Usage:**
+
+```bash
+# Import into PostgreSQL
+docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
+```
+
+## How to Use
+
+Most scripts should be run from the **repository root**:
+
+```bash
+# Database reset
+./scripts/reset-db.sh
+
+# Model download
+./scripts/download-kraken-models.sh
+
+# Data generation
+python scripts/generate_data.py
+```
+
+Ensure scripts are executable:
+
+```bash
+chmod +x scripts/*.sh
+```
+
+## Adding New Scripts
+
+1. Place the script in `scripts/`
+2. Add a header comment describing purpose and usage
+3. Make it executable (`chmod +x`)
+4. Document it in this `README.md`
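+
+The steps above can be sketched as a starting skeleton. Everything here is illustrative: `example-task.sh` is a hypothetical placeholder name, not a script in this repository, and the body is meant to be replaced with the real work.
+
+```bash
+#!/usr/bin/env bash
+# example-task.sh: hypothetical placeholder name, not a script in this repo.
+#
+# Purpose: one-line description of what the script does.
+# Usage:   ./scripts/example-task.sh [--help]
+
+# Fail fast: exit on errors, on unset variables, and on failures inside pipelines.
+set -euo pipefail
+
+usage() {
+  echo "Usage: ./scripts/example-task.sh [--help]"
+}
+
+main() {
+  if [ "${1:-}" = "--help" ]; then
+    usage
+    return 0
+  fi
+  echo "example-task: nothing to do yet"   # replace with the real work
+}
+
+main "$@"
+```
+
+After saving the file, `chmod +x scripts/example-task.sh` covers step 3, and a new `###` section in this README covers step 4.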