docs(legibility): migrate CLAUDE.md rules into human docs — DOC-7

Processes all 7 CLAUDE.md files according to the 3-bucket classification. Migration targets (CONTRIBUTING.md, docs/ARCHITECTURE.md, docs/DEPLOYMENT.md, domain READMEs) are introduced by DOC-2/4/5/6, so this PR must merge last.

### scripts/CLAUDE.md → scripts/README.md

New `scripts/README.md` with full script documentation (preserving the ⚠️ destructive-operation warning on reset-db.sh). `scripts/CLAUDE.md` reduced to a pointer + "document new scripts in README.md" reminder.

### .devcontainer/CLAUDE.md → .devcontainer/README.md

New `.devcontainer/README.md` with all configuration, usage, and limitations. `.devcontainer/CLAUDE.md` reduced to a single pointer line.

### docs/CLAUDE.md → docs/README.md

New `docs/README.md` covering the folder structure, ADR guide, infrastructure docs, and specs folder. `docs/CLAUDE.md` reduced to pointer + ADR reminder.

### ocr-service/CLAUDE.md

Reduced to pointer to `ocr-service/README.md` (content migrated in DOC-6). Kept LLM reminders: single-node constraint, ALLOWED_PDF_HOSTS SSRF risk.

### backend/CLAUDE.md

- Layering Rules → pointer to docs/ARCHITECTURE.md
- Error Handling → pointer to CONTRIBUTING.md + reminder
- Security/Permissions → pointer to docs/ARCHITECTURE.md + reminder
- Package Structure → tagged TODO post-REFACTOR-1
- Fixed errors.ts path to frontend/src/lib/shared/errors.ts
- Added ANNOTATE_ALL + BLOG_WRITE to permission list
- Key Entities, Entity Code Style, Services → kept (Bucket-2)

### root CLAUDE.md

- Stack, Infrastructure, Dev Container → pointers
- Layering Rules, Error Handling, Security, OpenAPI, API Client, Date Handling, UI Components, Frontend Error Handling → pointers + reminders
- Package Structure → tagged TODO post-REFACTOR-1
- Domain Model, Entity Code Style, Form Actions, Styling → kept (Bucket-2)

### frontend/CLAUDE.md

- API Client Pattern, Date Handling → pointers + reminders
- Key UI Components → pointer to domain READMEs
- Styling, Form Actions, How to Run, Vite Proxy, i18n → kept (Bucket-2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 23:33:41 +02:00
parent 513fda2888
commit 86c13a230c
11 changed files with 452 additions and 732 deletions
--- a/.devcontainer/CLAUDE.md
+++ b/.devcontainer/CLAUDE.md
@@ -1,96 +1,3 @@
-# Dev Container — Familienarchiv
+# Dev Container
-## Overview
+→ See [.devcontainer/README.md](./README.md) for configuration, usage, and known limitations.
 VS Code Dev Container configuration for a pre-configured development environment. Includes Java 21, Maven, and Node.js 24 — everything needed to work on both backend and frontend.
 ## Configuration
 File: `.devcontainer/devcontainer.json`
 ### Included Features
 | Feature | Version | Purpose |
 |---|---|---|
 | Java | 21 | Spring Boot backend |
 | Maven | bundled with Java feature | Build tool |
 | Node.js | 24 | SvelteKit frontend |
 ### VS Code Extensions (Auto-installed)
 | Extension | Purpose |
 |---|---|
 | `vscjava.vscode-java-pack` | Java language support, debugging, testing |
 | `vmware.vscode-spring-boot` | Spring Boot tooling |
 | `gabrielbb.vscode-lombok` | Lombok annotation support |
 | `humao.rest-client` | HTTP request files (for `backend/api_tests/`) |
 ### Ports
 - `8080` forwarded to host — access backend at `http://localhost:8080`
 ### User
 Runs as `vscode` user (not root) for security.
 ## How to Use
 ### Prerequisites
 - VS Code with the **Dev Containers** extension installed
 - Docker running locally
 ### Open in Dev Container
 1. Open the project in VS Code
 2. Press `F1` → type "Dev Containers: Reopen in Container"
 3. VS Code will:
   - Build the container using the root `docker-compose.yml`
   - Install Java 21, Maven, and Node 24
   - Install the listed extensions
   - Mount the workspace folder
 ### Working Inside the Container
 Once inside the container, you have access to both stacks:
 ```bash
 # Backend
 cd backend
 ./mvnw spring-boot:run
 # Frontend (in a new terminal)
 cd frontend
 npm install
 npm run dev
 ```
 The container reuses the `docker-compose.yml` services, so PostgreSQL and MinIO are available automatically.
 ### Forwarding Frontend Port
 The devcontainer config only forwards port 8080 by default. To access the frontend dev server (port 5173 or 3000), either:
 1. Add `5173` to `forwardPorts` in `devcontainer.json`, or
 2. Use the VS Code "Ports" panel to forward it dynamically
 ## Limitations
 - The devcontainer attaches to the `backend` service from `docker-compose.yml`, so it inherits those environment variables
 - OCR service and other containers should be started separately via `docker-compose up -d`
 - GPU passthrough for OCR training is not configured
 ## Customization
 To add more tools or extensions, edit `.devcontainer/devcontainer.json`:
 ```json
 {
  "features": {
    "ghcr.io/devcontainers/features/python:1": {
      "version": "3.11"
    }
  },
  "forwardPorts": [8080, 5173, 3000]
 }
 ```
--- a/.devcontainer/README.md
+++ b/.devcontainer/README.md
@@ -0,0 +1,94 @@
 # Dev Container — Familienarchiv
 VS Code Dev Container configuration for a pre-configured development environment. Includes Java 21, Maven, and Node.js 24 — everything needed to work on both backend and frontend.
 ## Configuration
 File: `.devcontainer/devcontainer.json`
 ### Included Features
 | Feature | Version                   | Purpose             |
 | ------- | ------------------------- | ------------------- |
 | Java    | 21                        | Spring Boot backend |
 | Maven   | bundled with Java feature | Build tool          |
 | Node.js | 24                        | SvelteKit frontend  |
 ### VS Code Extensions (Auto-installed)
 | Extension                   | Purpose                                       |
 | --------------------------- | --------------------------------------------- |
 | `vscjava.vscode-java-pack`  | Java language support, debugging, testing     |
 | `vmware.vscode-spring-boot` | Spring Boot tooling                           |
 | `gabrielbb.vscode-lombok`   | Lombok annotation support                     |
 | `humao.rest-client`         | HTTP request files (for `backend/api_tests/`) |
 ### Ports
 - `8080` forwarded to host — access backend at `http://localhost:8080`
 ### User
 Runs as `vscode` user (not root) for security.
 ## How to Use
 ### Prerequisites
 - VS Code with the **Dev Containers** extension installed
 - Docker running locally
 ### Open in Dev Container
 1. Open the project in VS Code
 2. Press `F1` → type "Dev Containers: Reopen in Container"
 3. VS Code will:
   - Build the container using the root `docker-compose.yml`
   - Install Java 21, Maven, and Node 24
   - Install the listed extensions
   - Mount the workspace folder
 ### Working Inside the Container
 Once inside the container, you have access to both stacks:
 ```bash
 # Backend
 cd backend
 ./mvnw spring-boot:run
 # Frontend (in a new terminal)
 cd frontend
 npm install
 npm run dev
 ```
 The container reuses the `docker-compose.yml` services, so PostgreSQL and MinIO are available automatically.
 ### Forwarding Frontend Port
 The devcontainer config only forwards port 8080 by default. To access the frontend dev server (port 5173 or 3000), either:
 1. Add `5173` to `forwardPorts` in `devcontainer.json`, or
 2. Use the VS Code "Ports" panel to forward it dynamically
 ## Limitations
 - The devcontainer attaches to the `backend` service from `docker-compose.yml`, so it inherits those environment variables
 - OCR service and other containers should be started separately via `docker-compose up -d`
 - GPU passthrough for OCR training is not configured
 ## Customization
 To add more tools or extensions, edit `.devcontainer/devcontainer.json`:
 ```json
 {
  "features": {
    "ghcr.io/devcontainers/features/python:1": {
      "version": "3.11"
    }
  },
  "forwardPorts": [8080, 5173, 3000]
 }
 ```
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,6 +4,8 @@
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 > For a human-readable project overview, see [README.md](./README.md).
 ## Project Overview
 **Familienarchiv** is a family document archival system — a full-stack web app for digitizing, organizing, and searching family documents. Key features: file uploads (stored in MinIO/S3), metadata management, Excel/ODS batch import, full-text search, conversation threads between family members, and role-based access control.
@@ -18,6 +20,8 @@ See [CODESTYLE.md](./CODESTYLE.md) for coding standards: Clean Code, DRY/KISS tr
 ## Stack
 → See [README.md §Tech Stack](./README.md#tech-stack)
 - **Backend**: Spring Boot 4.0 (Java 21, Maven, Jetty, JPA/Hibernate, Flyway, Spring Security, Spring Session JDBC)
 - **Frontend**: SvelteKit 2 with Svelte 5, TypeScript, Tailwind CSS 4, Paraglide.js (i18n: de/en/es)
 - **Database**: PostgreSQL 16
@@ -27,12 +31,13 @@ See [CODESTYLE.md](./CODESTYLE.md) for coding standards: Clean Code, DRY/KISS tr
 ## Common Commands
 ### Running the Full Stack
 ```bash
 # From repo root — starts PostgreSQL, MinIO, and Spring Boot backend
 docker-compose up -d
 ```
 ### Backend (Spring Boot)
 ```bash
 cd backend
@@ -44,6 +49,7 @@ cd backend
 ```
 ### Frontend (SvelteKit)
 ```bash
 cd frontend
@@ -66,7 +72,7 @@ npm run generate:api  # Regenerate TypeScript API types from OpenAPI spec
 ### Package Structure
-Package-by-domain: each domain owns its controller, service, repository, entities, and DTOs.
+<!-- TODO: rewrite post-REFACTOR-1 — see Epic 4 -->
 ```
 backend/src/main/java/org/raddatz/familienarchiv/
@@ -90,27 +96,21 @@ backend/src/main/java/org/raddatz/familienarchiv/
 └── user/                User domain — AppUser, UserGroup, UserService, auth controllers
 ```
-### Layering Rules (strictly enforced)
+### Layering Rules
-```
+→ See [docs/ARCHITECTURE.md §Layering rule](./docs/ARCHITECTURE.md#layering-rule)
 Controller → Service → Repository → DB
 ```
- **Controllers** never inject or call repositories directly.
+**LLM reminder:** controllers never call repositories directly; services never reach into another domain's repository — always call the other domain's service instead.
 - **Services** never reach into another domain's repository. Call the other domain's service instead.
  - ✅ `DocumentService` → `PersonService.getById()` → `PersonRepository`
  - ❌ `DocumentService` → `PersonRepository` directly
 - This keeps domain boundaries clear and business logic testable in isolation.
 ### Domain Model
-| Entity | Table | Key relationships |
+| Entity      | Table         | Key relationships                                                                     |
-|---|---|---|
+| ----------- | ------------- | ------------------------------------------------------------------------------------- |
-| `Document` | `documents` | ManyToOne `sender` (Person), ManyToMany `receivers` (Person), ManyToMany `tags` (Tag) |
+| `Document`  | `documents`   | ManyToOne `sender` (Person), ManyToMany `receivers` (Person), ManyToMany `tags` (Tag) |
-| `Person` | `persons` | Referenced by documents as sender/receiver |
+| `Person`    | `persons`     | Referenced by documents as sender/receiver                                            |
-| `Tag` | `tag` | ManyToMany with documents via `document_tags` |
+| `Tag`       | `tag`         | ManyToMany with documents via `document_tags`                                         |
-| `AppUser` | `app_users` | ManyToMany `groups` (UserGroup) |
+| `AppUser`   | `app_users`   | ManyToMany `groups` (UserGroup)                                                       |
-| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |
+| `UserGroup` | `user_groups` | Has a `Set<String> permissions`                                                       |
 **`DocumentStatus` lifecycle:** `PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`
@@ -120,6 +120,7 @@ Controller → Service → Repository → DB
 ### Entity Code Style
 All entities use these Lombok annotations:
 ```java
@Entity
@Table(name = "table_name")
@@ -148,65 +149,29 @@ Services are annotated with `@Service`, `@RequiredArgsConstructor`, and optional
 - Read methods are not annotated (default non-transactional is fine).
 - Each service owns its domain's repository. Cross-domain data access goes through the other domain's service.
 **Existing services:**
 | Service | Responsibility |
 |---|---|
 | `DocumentService` | Document CRUD, search, tag cascade delete |
 | `PersonService` | Person CRUD, find-or-create by alias |
 | `TagService` | Tag find/create/update/delete |
 | `UserService` | User and group CRUD |
 | `FileService` | S3/MinIO upload and download |
 | `MassImportService` | Async ODS/Excel import; delegates to PersonService and TagService |
 ### DTOs
-Input DTOs live in `dto/`. Response types are the model entities themselves (no response DTOs).
+Input DTOs live flat in the domain package. Response types are the model entities themselves (no response DTOs).
- `DocumentUpdateDTO` — used for both create and update (all fields optional)
+- `@Schema(requiredMode = REQUIRED)` on every field the backend always populates — drives TypeScript generation.
 - `CreateUserRequest` — user creation
 - `GroupDTO` — group create/update
 ### Error Handling
-Use `DomainException` for all domain errors. Never throw raw exceptions from service methods.
+→ See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)
-```java
+**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) mirror in `frontend/src/lib/shared/errors.ts`, (3) add i18n keys in `messages/{de,en,es}.json`.
 // Static factories match common HTTP status codes:
 DomainException.notFound(ErrorCode.DOCUMENT_NOT_FOUND, "Document not found: " + id)
 DomainException.forbidden("Access denied")
 DomainException.conflict(ErrorCode.IMPORT_ALREADY_RUNNING, "Already running")
 DomainException.internal(ErrorCode.FILE_UPLOAD_FAILED, "Upload failed: " + e.getMessage())
 ```
 `ErrorCode` is an enum in `exception/ErrorCode.java`. When adding a new error case, add the value there **and** mirror it in the frontend's `src/lib/errors.ts` + add a Paraglide translation key.
 For simple validation in controllers (not domain logic), `ResponseStatusException` is acceptable:
 ```java
 throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "firstName is required");
 ```
 ### Security / Permissions
-Use `@RequirePermission` on controller methods (or the whole controller class):
+→ See [docs/ARCHITECTURE.md §Permission system](./docs/ARCHITECTURE.md#permission-system)
-```java
+**LLM reminder:** `@RequirePermission(Permission.WRITE_ALL)` is **required** on every `POST`, `PUT`, `PATCH`, `DELETE` endpoint — not optional. Do not mix with Spring Security's `@PreAuthorize`. Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`, `ANNOTATE_ALL`, `BLOG_WRITE`.
@RequirePermission(Permission.WRITE_ALL)
 public Document updateDocument(...) { ... }
 ```
 Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`
 `PermissionAspect` (AOP) checks the current user's `UserGroup.permissions` at runtime.
 ### OpenAPI / API Types
-SpringDoc generates the spec at `/v3/api-docs` (only accessible when running with `--spring.profiles.active=dev`).
+→ See [CONTRIBUTING.md §Walkthrough B — Add a new endpoint](./CONTRIBUTING.md#4-walkthrough-b--add-a-new-endpoint)
-When changing any model field or endpoint:
+**LLM reminder:** always run `npm run generate:api` in `frontend/` after any backend model or endpoint change — this is the most common cause of TypeScript type errors.
 1. Rebuild the backend JAR with `-DskipTests`
 2. Start it with `--spring.profiles.active=dev`
 3. Run `npm run generate:api` in `frontend/`
 ---
@@ -235,79 +200,52 @@ frontend/src/routes/
 ### API Client Pattern
-All server-side API calls use the typed client from `$lib/api.server.ts`:
+→ See [CONTRIBUTING.md §Frontend API client](./CONTRIBUTING.md#frontend-api-client)
-```typescript
+**LLM reminder:** check `!result.response.ok` (not `result.error` — breaks when spec has no error responses defined); cast errors as `result.error as unknown as { code?: string }`; use `result.data!` after an ok check.
 const api = createApiClient(fetch);
 const result = await api.GET('/api/persons/{id}', { params: { path: { id } } });
 // Always check via response.ok, NOT result.error
 if (!result.response.ok) {
    const code = (result.error as unknown as { code?: string })?.code;
    throw error(result.response.status, getErrorMessage(code));
 }
 return { person: result.data! };
 ```
 Key rules:
 - Use `!result.response.ok` for error checking (not `if (result.error)` — this breaks when the spec has no error responses defined)
 - Cast errors as `result.error as unknown as { code?: string }` to extract the backend error code
 - Use `result.data!` (non-null assertion) after an ok check — TypeScript knows it's present
 For multipart/form-data endpoints (file uploads), bypass the typed client and use raw `fetch`:
 ```typescript
 const res = await fetch(`${baseUrl}/api/documents`, { method: 'POST', body: formData });
 ```
 ### Form Actions Pattern
 ```typescript
 // +page.server.ts
 export const actions = {
-    default: async ({ request, fetch }) => {
+  default: async ({ request, fetch }) => {
-        const formData = await request.formData();
+    const formData = await request.formData();
-        const name = formData.get('name') as string;  // cast needed — FormData returns FormDataEntryValue
+    const name = formData.get("name") as string;
-        // ...
+    // ...
-        return fail(400, { error: 'message' });  // on error
+    return fail(400, { error: "message" }); // on error
-        throw redirect(303, '/target');           // on success
+    throw redirect(303, "/target"); // on success
-    }
+  },
 };
 ```
 ### Date Handling
- **Forms**: German format `dd.mm.yyyy` with auto-dot insertion via `handleDateInput()`. A hidden `<input type="hidden" name="documentDate" value={dateIso}>` sends ISO format to the backend.
+→ See [CONTRIBUTING.md §Date handling](./CONTRIBUTING.md#date-handling)
- **Display**: Always use `Intl.DateTimeFormat` with `T12:00:00` suffix to prevent UTC timezone off-by-one:
+
-  ```typescript
+**LLM reminder:** always append `T12:00:00` when constructing `new Date()` from an ISO date string — prevents UTC timezone off-by-one errors.
  new Intl.DateTimeFormat('de-DE', { day: 'numeric', month: 'long', year: 'numeric' })
      .format(new Date(doc.documentDate + 'T12:00:00'))
  ```
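The `T12:00:00` rule can be illustrated with a small sketch. The helper names `toLocalNoon` and `formatDocumentDate` are hypothetical and do not appear in the repository; only the suffix and the `Intl.DateTimeFormat` options are taken from the Date Handling section.

```typescript
// Hypothetical helpers for illustration; not part of the codebase.
function toLocalNoon(isoDate: string): Date {
  // A date-only ISO string such as "2024-03-01" is parsed as UTC midnight,
  // which falls on the previous calendar day in time zones behind UTC.
  // With a time part and no offset, the string is parsed as local time,
  // so local noon always stays on the intended day.
  return new Date(isoDate + 'T12:00:00');
}

function formatDocumentDate(isoDate: string, locale: string = 'de-DE'): string {
  // Same formatter options as the display rule in the Date Handling section.
  return new Intl.DateTimeFormat(locale, {
    day: 'numeric',
    month: 'long',
    year: 'numeric'
  }).format(toLocalNoon(isoDate));
}
```

Because noon is far from both day boundaries, the formatted day matches the stored ISO date in any time zone the server or browser runs in.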
 ### UI Component Library
-Custom components in `src/lib/components/`:
+→ See per-domain READMEs: [`frontend/src/lib/person/README.md`](./frontend/src/lib/person/README.md), [`frontend/src/lib/tag/README.md`](./frontend/src/lib/tag/README.md), [`frontend/src/lib/document/README.md`](./frontend/src/lib/document/README.md), [`frontend/src/lib/shared/README.md`](./frontend/src/lib/shared/README.md)
 | Component | Props | Description |
 |---|---|---|
 | `PersonTypeahead` | `name`, `label`, `value`, `initialName`, `on:change` | Single-person selector with typeahead dropdown |
 | `PersonMultiSelect` | `selectedPersons` (bind) | Chip-based multi-person selector |
 | `TagInput` | `tags` (bind), `allowCreation?`, `on:change` | Tag chip input with typeahead |
 ### Styling Conventions (Tailwind CSS 4)
 Brand color utilities (defined in `layout.css`):
-| Class | Value | Usage |
+| Class        | Value     | Usage                            |
-|---|---|---|
+| ------------ | --------- | -------------------------------- |
-| `brand-navy` | `#002850` | Primary text, buttons, headers |
+| `brand-navy` | `#002850` | Primary text, buttons, headers   |
 | `brand-mint` | `#A6DAD8` | Accents, hover underlines, icons |
-| `brand-sand` | `#E4E2D7` | Page background, card borders |
+| `brand-sand` | `#E4E2D7` | Page background, card borders    |
 Typography:
 - `font-serif` (Merriweather) — body text, document titles, names
 - `font-sans` (Montserrat) — labels, metadata, UI chrome
 Card pattern for content sections:
 ```svelte
 <div class="bg-white shadow-sm border border-brand-sand rounded-sm p-6">
    <h2 class="text-xs font-bold uppercase tracking-widest text-gray-400 mb-5">Section Title</h2>
@@ -315,48 +253,19 @@ Card pattern for content sections:
 </div>
 ```
-Save bar pattern — use **sticky full-bleed** for long forms (edit document), **card-style with `mt-4`** for short forms (new person):
+Back button pattern — use the shared `<BackButton>` component from `$lib/shared/primitives/BackButton.svelte`. Do not use a static `<a href>` for back navigation.
 ```svelte
 <!-- Long forms: sticky, full-bleed -->
 <div class="sticky bottom-0 z-10 -mx-4 px-6 py-4 bg-white border-t border-brand-sand shadow-[0_-2px_8px_rgba(0,0,0,0.06)] flex items-center justify-between">
 <!-- Short forms: card, top margin -->
 <div class="mt-4 flex items-center justify-between rounded-sm border border-brand-sand bg-white px-6 py-4 shadow-sm">
 ```
 Back button pattern — use the shared `<BackButton>` component from `$lib/components/BackButton.svelte`:
 ```svelte
 <script lang="ts">
    import BackButton from '$lib/components/BackButton.svelte';
 </script>
 <BackButton />
 ```
 The component calls `history.back()` so the user returns to wherever they came from. Label is always "Zurück" (no contextual suffix — destination is unknown). Touch target ≥ 44px and focus ring are built in. Do not use a static `<a href>` for back navigation.
 Subtle action link (e.g. "new document/person"):
 ```svelte
 <a href="/documents/new" class="inline-flex items-center gap-1 text-sm font-medium text-brand-navy/60 hover:text-brand-navy transition-colors">
    <svg class="w-4 h-4" ...><!-- plus icon --></svg>
    Neues Dokument
 </a>
 ```
 ### Error Handling (Frontend)
-`src/lib/errors.ts` mirrors the backend `ErrorCode` enum and maps codes to Paraglide translation keys. When adding a new `ErrorCode` on the backend:
+→ See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)
-1. Add it to `ErrorCode.java`
+
-2. Add it to the `ErrorCode` type in `errors.ts`
+**LLM reminder:** when adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`.
 3. Add a `case` in `getErrorMessage()`
 4. Add the translation key in `messages/de.json`, `en.json`, `es.json`
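The frontend half of these steps might look roughly like the sketch below. It is an assumption about the shape of `errors.ts`, not a copy of it: the three codes are the ones named in the backend examples, while the returned message keys (`error_document_not_found` and so on) are placeholders standing in for the real Paraglide translation keys.

```typescript
// Step 2: the union type mirrors the backend ErrorCode enum.
// Only three codes are listed here; the real enum is larger.
type ErrorCode =
  | 'DOCUMENT_NOT_FOUND'
  | 'IMPORT_ALREADY_RUNNING'
  | 'FILE_UPLOAD_FAILED';

// Step 3: one case per code. The returned strings are placeholder keys;
// the real function resolves Paraglide messages instead.
function getErrorMessage(code?: string): string {
  switch (code as ErrorCode | undefined) {
    case 'DOCUMENT_NOT_FOUND':
      return 'error_document_not_found';
    case 'IMPORT_ALREADY_RUNNING':
      return 'error_import_already_running';
    case 'FILE_UPLOAD_FAILED':
      return 'error_file_upload_failed';
    default:
      // Unknown or missing codes fall back to a generic message, so a
      // backend code that was never mirrored still renders something.
      return 'error_unknown';
  }
}
```

A code added to `ErrorCode.java` but not mirrored here would silently hit the default branch, which is why the reminder lists all four steps together.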
 ---
 ## Infrastructure
-The `docker-compose.yml` at the repo root orchestrates everything. A MinIO MC helper container runs at startup to create the `archive-documents` bucket. The backend container depends on both `db` and `minio` being healthy.
+→ See [docs/DEPLOYMENT.md](./docs/DEPLOYMENT.md)
 Database migrations live in `backend/src/main/resources/db/migration/` (Flyway, SQL files named `V{n}__{description}.sql`).
 ## API Testing
@@ -364,4 +273,4 @@ HTTP test files are in `backend/api_tests/` for use with the VS Code REST Client
 ## Dev Container
-A `.devcontainer/` config is available (Java 21 + Node 24, ports 8080 and 3000 forwarded). Use VS Code's "Reopen in Container" for a pre-configured environment.
+→ See [.devcontainer/README.md](./.devcontainer/README.md)
--- a/backend/CLAUDE.md
+++ b/backend/CLAUDE.md
@@ -11,7 +11,7 @@ Spring Boot 4.0 monolith serving the Familienarchiv REST API. Handles document m
 - **Server**: Jetty (not Tomcat — excluded in pom.xml)
 - **Data**: PostgreSQL 16, JPA/Hibernate, Spring Data JPA
 - **Migrations**: Flyway (SQL files in `src/main/resources/db/migration/`)
- **Security**: Spring Security, Spring Session JDBC, JWT tokens
+- **Security**: Spring Security, Spring Session JDBC
 - **File Storage**: MinIO via AWS SDK v2 (S3-compatible)
 - **Spreadsheet Import**: Apache POI 5.5.0 (Excel/ODS)
 - **API Docs**: SpringDoc OpenAPI 3.x (`/v3/api-docs` — dev profile only)
@@ -19,7 +19,7 @@ Spring Boot 4.0 monolith serving the Familienarchiv REST API. Handles document m
 ## Package Structure
-Package-by-domain: each domain owns its controller, service, repository, entities, and DTOs.
+<!-- TODO: rewrite post-REFACTOR-1 — see Epic 4 -->
 ```
 src/main/java/org/raddatz/familienarchiv/
@@ -43,31 +43,28 @@ src/main/java/org/raddatz/familienarchiv/
 └── user/                # User domain — AppUser, UserGroup, UserService, auth controllers
 ```
-## Layering Rules (Strict)
+For per-domain ownership and public surface, see each domain's `README.md`.
-```
+## Layering Rules
 Controller → Service → Repository → DB
 ```
- **Controllers never call repositories directly.**
+→ See [docs/ARCHITECTURE.md §Layering rule](../docs/ARCHITECTURE.md#layering-rule)
- **Services never reach into another domain's repository.** Call the other domain's service instead.
+
-  - ✅ `DocumentService` → `PersonService.getById()` → `PersonRepository`
+**LLM reminder:** controllers never call repositories directly; services never reach into another domain's repository — always call the other domain's service.
  - ❌ `DocumentService` → `PersonRepository` directly
 ## Key Entities
-| Entity | Table | Key Relationships |
+| Entity                      | Table                           | Key Relationships                                                               |
-|---|---|---|
+| --------------------------- | ------------------------------- | ------------------------------------------------------------------------------- |
-| `Document` | `documents` | ManyToOne sender (Person), ManyToMany receivers (Person), ManyToMany tags (Tag) |
+| `Document`                  | `documents`                     | ManyToOne sender (Person), ManyToMany receivers (Person), ManyToMany tags (Tag) |
-| `Person` | `persons` | Referenced by documents as sender/receiver; name aliases table |
+| `Person`                    | `persons`                       | Referenced by documents as sender/receiver; name aliases table                  |
-| `Tag` | `tag` | ManyToMany with documents via `document_tags`; self-referencing parent for tree |
+| `Tag`                       | `tag`                           | ManyToMany with documents via `document_tags`; self-referencing parent for tree |
-| `AppUser` | `app_users` | ManyToMany groups (UserGroup) |
+| `AppUser`                   | `app_users`                     | ManyToMany groups (UserGroup)                                                   |
-| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |
+| `UserGroup`                 | `user_groups`                   | Has a `Set<String> permissions`                                                 |
-| `TranscriptionBlock` | `transcription_blocks` | Per-document, per-page text blocks with polygons |
+| `TranscriptionBlock`        | `transcription_blocks`          | Per-document, per-page text blocks with polygons                                |
-| `DocumentAnnotation` | `document_annotations` | Free-form annotations on document pages |
+| `DocumentAnnotation`        | `document_annotations`          | Free-form annotations on document pages                                         |
-| `Comment` | `document_comments` | Threaded comments with mentions |
+| `Comment`                   | `document_comments`             | Threaded comments with mentions                                                 |
-| `Notification` | `notifications` | User notification feed |
+| `Notification`              | `notifications`                 | User notification feed                                                          |
-| `OcrJob` / `OcrJobDocument` | `ocr_jobs`, `ocr_job_documents` | Batch OCR job tracking |
+| `OcrJob` / `OcrJobDocument` | `ocr_jobs`, `ocr_job_documents` | Batch OCR job tracking                                                          |
 **`DocumentStatus` lifecycle:** `PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`
@@ -104,32 +101,15 @@ public class MyEntity {
 ## Error Handling
-Use `DomainException` for all domain errors:
+→ See [CONTRIBUTING.md §Error handling](../CONTRIBUTING.md#error-handling)
-```java
+**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` — never throw raw exceptions from service methods. When adding a new `ErrorCode`: add to `ErrorCode.java`, mirror in `frontend/src/lib/shared/errors.ts`, add i18n keys in `messages/{de,en,es}.json`.
 DomainException.notFound(ErrorCode.DOCUMENT_NOT_FOUND, "...")
 DomainException.forbidden("...")
 DomainException.conflict(ErrorCode.IMPORT_ALREADY_RUNNING, "...")
 DomainException.internal(ErrorCode.FILE_UPLOAD_FAILED, "...")
 ```
 When adding a new `ErrorCode`:
 1. Add to `ErrorCode.java`
 2. Mirror in frontend `src/lib/errors.ts`
 3. Add Paraglide translation key in `messages/{de,en,es}.json`
 ## Security / Permissions
-Use `@RequirePermission` on controller methods or classes:
+→ See [docs/ARCHITECTURE.md §Permission system](../docs/ARCHITECTURE.md#permission-system)
-```java
+**LLM reminder:** `@RequirePermission(Permission.WRITE_ALL)` is **required** on every `POST`, `PUT`, `PATCH`, `DELETE` endpoint — not optional. Do not mix with Spring Security's `@PreAuthorize`. Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`, `ANNOTATE_ALL`, `BLOG_WRITE`.
@RequirePermission(Permission.WRITE_ALL)
 public Document updateDocument(...) { ... }
 ```
 Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`
 `PermissionAspect` checks the current user's `UserGroup.permissions` at runtime.
 ## OCR Integration
@@ -141,49 +121,35 @@ The backend orchestrates OCR by calling the Python `ocr-service` microservice vi
 - `OcrBatchService` — handles batch/job workflows
 - `OcrAsyncRunner` — async execution of OCR jobs
 For ocr-service internals, see [`ocr-service/README.md`](../ocr-service/README.md).
 ## API Testing
 HTTP test files in `backend/api_tests/` for the VS Code REST Client extension.
 ## How to Run
 ### Local Development
 ```bash
 cd backend
-# Run with dev profile (requires PostgreSQL + MinIO running via docker-compose)
+./mvnw spring-boot:run          # Run with dev profile (requires PostgreSQL + MinIO)
-./mvnw spring-boot:run
+./mvnw clean package            # Build JAR (with tests)
 # Build JAR (with tests)
 ./mvnw clean package
 # Build JAR skipping tests
 ./mvnw clean package -DskipTests
-
+./mvnw test                     # Run all tests
-# Run all tests
+./mvnw test -Dtest=ClassName    # Run a single test class
-./mvnw test
+./mvnw clean verify             # Run with JaCoCo coverage report
 # Run a single test class
 ./mvnw test -Dtest=ClassName
 # Run with coverage (JaCoCo)
 ./mvnw clean verify
 ```
-### OpenAPI TypeScript Generation
+**OpenAPI / TypeScript type generation:**
-1. Build and start backend with `--spring.profiles.active=dev`
+1. Start backend with `--spring.profiles.active=dev`
-2. In `frontend/`, run: `npm run generate:api`
+2. In `frontend/`: `npm run generate:api`
-### Profiles
+**LLM reminder:** always regenerate types after any model or endpoint change — the most common cause of "where did my TypeScript type go?"
 - **dev** (default): Enables OpenAPI, dev configs, e2e seeds
 - **prod**: Production profile — no dev endpoints
 ## Testing
 - Unit tests: Mockito + JUnit, pure in-memory
 - Slice tests: `@WebMvcTest`, `@DataJpaTest` with Testcontainers PostgreSQL
 - Integration tests: Full Spring context with Testcontainers
- Coverage gate: 88% branch coverage overall (JaCoCo)
+- Coverage gate: 88% branch coverage (JaCoCo)
--- a/docs/CLAUDE.md
+++ b/docs/CLAUDE.md
@@ -1,97 +1,5 @@
-# Docs — Familienarchiv
+# docs/
-## Overview
+→ See [docs/README.md](./README.md) for the folder structure and documentation guide.
-Project documentation organized into four categories: architecture decision records (ADRs), system architecture diagrams, infrastructure runbooks, and detailed UI/UX specifications.
+**LLM reminder:** ADRs are sequential — use the next number after the highest existing one in `docs/adr/`. When making a significant architectural change (new service, data model change, technology swap), write a new ADR before implementing.
 ## Folder Structure
 ```
 docs/
 ├── adr/                     # Architecture Decision Records
 ├── architecture/            # C4 model diagrams and system architecture docs
 ├── infrastructure/          # Deployment, CI/CD, and ops guides
 ├── specs/                   # UI/UX feature specifications (HTML)
 ├── app-analysis-*.md        # Application analysis reports
 ├── mail.md                  # Mail system documentation
 ├── security-guide.md        # Security policies and hardening guide
 ├── STYLEGUIDE.md            # Coding and design style guide
 ├── TODO-backend.md          # Backend backlog
 └── TODO-frontend.md         # Frontend backlog
 ```
 ## ADR (`adr/`)
 Architecture Decision Records capture major technical decisions and their rationale.
 | ADR | Title | Status |
 |---|---|---|
 | `001-ocr-python-microservice.md` | OCR as a separate Python container | Accepted |
 | `002-polygon-jsonb-storage.md` | Polygon coordinates in JSONB columns | Accepted |
 | `003-chronik-unified-activity-feed.md` | Unified activity feed (Chronik) | Accepted |
 When making a significant architectural change (new service, data model change, technology swap), write a new ADR following the format:
 - Status (Proposed / Accepted / Deprecated / Superseded)
 - Context (forces at play)
 - Decision (what we decided)
 - Consequences (trade-offs)
 - Alternatives Considered (table format)
 ## Architecture (`architecture/`)
 Contains C4 model diagrams describing the system at different zoom levels:
 - **Context diagram** — How Familienarchiv fits into the user and system ecosystem
 - **Container diagram** — The high-level technology choices (Spring Boot, SvelteKit, PostgreSQL, MinIO, OCR service)
 - **Component diagram** — Major structural components within the backend
 Written in Markdown with embedded Mermaid or PlantUML diagrams (`c4-diagrams.md`).
 ## Infrastructure (`infrastructure/`)
 Operational documentation for running Familienarchiv in production and CI.
 | Document | Purpose |
 |---|---|
 | `ci-gitea.md` | Gitea CI/CD pipeline configuration |
 | `production-compose.md` | Production Docker Compose setup |
 | `s3-migration.md` | Migrating documents between S3 buckets |
 | `self-hosted-catalogue.md` | Self-hosted software catalogue |
 ## Specs (`specs/`)
 High-fidelity UI/UX specifications written as standalone HTML files. These are design documents that describe exact layout, interactions, and responsive behavior before implementation.
 Each spec typically includes:
 - Visual mockups with CSS-in-HTML styling
 - Interaction flows and state transitions
 - Responsive breakpoint behavior
 - Accessibility requirements
 Examples of active spec areas:
 - Document detail page (`document-topbar-*.html`, `documents-page-spec.html`)
 - Admin interfaces (`admin-redesign-*.html`, `admin-tag-overhaul.html`)
 - Transcription workflows (`inline-transcription-*.html`, `annotation-transcription-*.html`)
 - Dashboard and activity feeds (`dashboard-*.html`, `chronik-spec.html`)
 - OCR admin (`ocr-admin-spec.html`)
 ## How to Use
 1. **Before implementing a feature**, check `specs/` for an existing specification.
 2. **When proposing a new architecture**, draft an ADR in `adr/` and discuss before coding.
 3. **When deploying**, follow `infrastructure/production-compose.md`.
 4. **Keep TODO files updated** — they serve as lightweight backlogs.
 ## Style Guide
 `STYLEGUIDE.md` covers:
 - Code formatting and linting rules
 - Component naming conventions
 - Color palette and typography
 - Accessibility standards (WCAG 2.1 AA)
 ## Contributing
 - ADRs should be sequential (`NNN-descriptive-name.md`).
 - Specs should be self-contained HTML files viewable in a browser.
 - Infrastructure docs should include copy-pasteable commands.
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,86 @@
 # docs/
 Project documentation organized into four categories: architecture decision records (ADRs), system architecture diagrams, infrastructure runbooks, and detailed UI/UX specifications.
 ## Folder structure
 ```
 docs/
 ├── adr/                     # Architecture Decision Records
 ├── architecture/            # C4 model diagrams and system architecture docs
 ├── infrastructure/          # Deployment, CI/CD, and ops guides
 ├── specs/                   # UI/UX feature specifications (HTML)
 ├── ARCHITECTURE.md          # Human-readable architecture overview (DOC-2)
 ├── DEPLOYMENT.md            # Day-1 checklist and operational reference (DOC-5)
 ├── GLOSSARY.md              # Domain terminology (DOC-3)
 ├── security-guide.md        # Security policies and hardening guide
 └── STYLEGUIDE.md            # Coding and design style guide
 ```
 ## ADR (`adr/`)
 Architecture Decision Records capture major technical decisions and their rationale.
 | ADR                                    | Title                                | Status   |
 | -------------------------------------- | ------------------------------------ | -------- |
 | `001-ocr-python-microservice.md`       | OCR as a separate Python container   | Accepted |
 | `002-polygon-jsonb-storage.md`         | Polygon coordinates in JSONB columns | Accepted |
 | `003-chronik-unified-activity-feed.md` | Unified activity feed (Chronik)      | Accepted |
 When making a significant architectural change (new service, data model change, technology swap), write a new ADR:
 - **Status** (Proposed / Accepted / Deprecated / Superseded)
 - **Context** (forces at play)
 - **Decision** (what we decided)
 - **Consequences** (trade-offs)
 - **Alternatives Considered** (table format)
 ADRs are sequential (`NNN-descriptive-name.md`). Do not reuse numbers.
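The numbering rule can be checked mechanically. The following is a minimal sketch, not a script that exists in this repository: `next_adr_number` is a hypothetical helper, and it assumes every ADR file follows the `NNN-descriptive-name.md` pattern with a three-digit prefix.

```python
import re
from pathlib import Path

# Matches "NNN-descriptive-name.md"; anything else in the folder is ignored.
ADR_PATTERN = re.compile(r"^(\d{3})-.+\.md$")


def next_adr_number(adr_dir: Path) -> str:
    """Return the number after the highest existing ADR, zero-padded to 3 digits."""
    numbers = []
    for entry in adr_dir.iterdir():
        match = ADR_PATTERN.match(entry.name)
        if match:
            numbers.append(int(match.group(1)))
    highest = max(numbers, default=0)  # an empty folder starts at 001
    return f"{highest + 1:03d}"


if __name__ == "__main__":
    # Assumes the command is run from the repository root.
    print(next_adr_number(Path("docs/adr")))
```

Taking the maximum rather than counting files means a gap left by a deprecated or superseded ADR is never filled, which matches the "do not reuse numbers" rule.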
 ## Architecture (`architecture/`)
 Contains C4 model diagrams describing the system at different zoom levels:
 - **Context diagram** — How Familienarchiv fits into the user and system ecosystem
 - **Container diagram** — The high-level technology choices (Spring Boot, SvelteKit, PostgreSQL, MinIO, OCR service)
 - **Component diagram** — Major structural components within the backend
 Written in Markdown with embedded Mermaid diagrams (`c4-diagrams.md`). Gitea renders these automatically.
 For the human-readable architecture narrative, see [`docs/ARCHITECTURE.md`](ARCHITECTURE.md).
 ## Infrastructure (`infrastructure/`)
 Operational documentation for running Familienarchiv in production and CI.
 | Document                   | Purpose                                              |
 | -------------------------- | ---------------------------------------------------- |
 | `ci-gitea.md`              | Gitea CI/CD pipeline configuration                   |
 | `production-compose.md`    | Production Docker Compose setup and VPS provisioning |
 | `s3-migration.md`          | Migrating documents between S3 buckets               |
 | `self-hosted-catalogue.md` | Self-hosted software catalogue                       |
 For the day-1 deployment checklist, see [`docs/DEPLOYMENT.md`](DEPLOYMENT.md).
 ## Specs (`specs/`)
 High-fidelity UI/UX specifications written as standalone HTML files. These are design documents describing exact layout, interactions, and responsive behavior before implementation.
 Each spec typically includes:
 - Visual mockups with CSS-in-HTML styling
 - Interaction flows and state transitions
 - Responsive breakpoint behavior
 - Accessibility requirements
 Before implementing a feature, check `specs/` for an existing specification.
 ## Style Guide
 [`docs/STYLEGUIDE.md`](STYLEGUIDE.md) covers:
 - Code formatting and linting rules
 - Component naming conventions
 - Color palette and typography
 - Accessibility standards (WCAG 2.1 AA)
--- a/frontend/CLAUDE.md
+++ b/frontend/CLAUDE.md
@@ -71,29 +71,13 @@ src/
 └── ...                  # Other SvelteKit config files
 ```
 For per-domain component inventories, see the domain READMEs in `src/lib/<domain>/README.md`.
 ## API Client Pattern
-All server-side API calls use the typed client from `$lib/api.server.ts`:
+→ See [CONTRIBUTING.md §Frontend API client](../CONTRIBUTING.md#frontend-api-client)
-```typescript
+**LLM reminder:** check `!result.response.ok` (not `result.error` — breaks when spec has no error responses); cast errors as `result.error as unknown as { code?: string }`; use `result.data!` after an ok check. For multipart/form-data (file uploads), bypass the typed client and use raw `fetch`.
 const api = createApiClient(fetch);
 const result = await api.GET('/api/persons/{id}', { params: { path: { id } } });
 // Always check via response.ok, NOT result.error
 if (!result.response.ok) {
 	const code = (result.error as unknown as { code?: string })?.code;
 	throw error(result.response.status, getErrorMessage(code));
 }
 return { person: result.data! };
 ```
 Key rules:
 - Use `!result.response.ok` for error checking (not `if (result.error)` — breaks when spec has no error responses defined)
 - Cast errors as `result.error as unknown as { code?: string }` to extract backend error code
 - Use `result.data!` after an ok check
 For multipart/form-data (file uploads), bypass the typed client and use raw `fetch`.
 ## Form Actions Pattern
@@ -102,7 +86,7 @@ For multipart/form-data (file uploads), bypass the typed client and use raw `fet
 export const actions = {
 	default: async ({ request, fetch }) => {
 		const formData = await request.formData();
-		const name = formData.get('name') as string;
+		const name = formData.get('name') as string; // cast needed — FormData returns FormDataEntryValue
 		// ...
 		return fail(400, { error: 'message' }); // on error
 		throw redirect(303, '/target'); // on success
@@ -112,13 +96,9 @@ export const actions = {
 ## Date Handling
- **Forms**: German format `dd.mm.yyyy` with auto-dot insertion via `handleDateInput()`. A hidden `<input type="hidden" name="documentDate" value={dateIso}>` sends ISO to the backend.
+→ See [CONTRIBUTING.md §Date handling](../CONTRIBUTING.md#date-handling)
- **Display**: Always use `Intl.DateTimeFormat` with `T12:00:00` suffix to prevent UTC off-by-one:
+
-  ```typescript
+**LLM reminder:** always append `T12:00:00` when constructing `new Date()` from an ISO date string — prevents UTC timezone off-by-one errors. Forms use German `dd.mm.yyyy` format via `handleDateInput()` with a hidden ISO input.
  new Intl.DateTimeFormat('de-DE', { day: 'numeric', month: 'long', year: 'numeric' }).format(
  	new Date(doc.documentDate + 'T12:00:00')
  );
  ```
 ## Styling Conventions (Tailwind CSS 4)
@@ -146,15 +126,9 @@ Card pattern for content sections:
 ## Key UI Components
-| Component            | Location                       | Props                                   | Description                                |
+→ See per-domain READMEs: [`src/lib/person/README.md`](src/lib/person/README.md), [`src/lib/tag/README.md`](src/lib/tag/README.md), [`src/lib/document/README.md`](src/lib/document/README.md), [`src/lib/shared/README.md`](src/lib/shared/README.md)
-| -------------------- | ------------------------------ | --------------------------------------- | ------------------------------------------ |
+
-| `PersonTypeahead`    | `$lib/person/`                 | `name`, `label`, `value`, `initialName` | Single-person selector with typeahead      |
+**LLM reminder:** `BackButton` is at `$lib/shared/primitives/BackButton.svelte` — use it for all back navigation; never a static `<a href>`. API client is at `$lib/shared/api.server`.
 | `PersonMultiSelect`  | `$lib/person/`                 | `selectedPersons` (bind)                | Chip-based multi-person selector           |
 | `TagInput`           | `$lib/tag/`                    | `tags` (bind), `allowCreation?`         | Tag chip input with typeahead              |
 | `PdfViewer`          | `$lib/document/`               | `url`, `annotations`                    | PDF rendering with annotation overlay      |
 | `TranscriptionBlock` | `$lib/document/transcription/` | `block`, `mode`                         | Read/edit transcription block              |
 | `DocumentTopBar`     | `$lib/document/`               | `document`                              | Responsive document metadata header        |
 | `BackButton`         | `$lib/shared/primitives/`      | —                                       | Calls `history.back()`; 44 px touch target |
 ## How to Run
--- a/ocr-service/CLAUDE.md
+++ b/ocr-service/CLAUDE.md
@@ -1,154 +1,7 @@
-# OCR Service — Familienarchiv
+# OCR Service
-## Overview
+→ See [ocr-service/README.md](./README.md) for tech stack, architecture, endpoints, environment variables, local development, testing, and training.
-Python FastAPI microservice that performs OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition) on historical family documents. It exposes a simple HTTP API consumed by the Spring Boot backend. The service is stateless — all job tracking and business logic remain in Java.
+**LLM reminder:** the OCR service is a **single-node container** — training reloads the model in-process, so multiple replicas cause model-state divergence (see ADR-001). All job tracking and business logic stay in Spring Boot; the Python service is stateless OCR only.
-## Tech Stack
+`ALLOWED_PDF_HOSTS` must never be set to `*` — that opens SSRF. The default (`minio,localhost,127.0.0.1`) is correct for dev.
 - **Framework**: FastAPI 0.115.6 (Python 3.11)
 - **OCR Engines**:
  - **Surya** (`surya-ocr`) — Transformer-based, handles typewritten and modern Latin handwriting
  - **Kraken** (`kraken==7.0`) — Historical HTR model support, required for pre-1941 German Kurrent/Sütterlin scripts
 - **ML**: PyTorch 2.7.1 (CPU-only), torchvision, transformers
 - **PDF Processing**: `pypdfium2` (rendering), `pillow`
 - **Image Processing**: `opencv-python-headless`, `pyvips`
 - **Spell Checking**: `pyspellchecker`
 - **HTTP Client**: `httpx`
 ## Architecture
 The service is a single-node container (see ADR-001). OCR training reloads the model in-process after each run, so multiple replicas would cause training conflicts and model-state divergence.
 ### Interface Contract
 **Request:**
 ```json
 {
  "pdfUrl": "http://minio:9000/archive-documents/abc.pdf?presigned...",
  "scriptType": "HANDWRITING_KURRENT",
  "language": "de"
 }
 ```
 **Response:** Array of `OcrBlock` objects:
 ```json
 [
  {
    "pageNumber": 0,
    "x": 0.12, "y": 0.08, "width": 0.76, "height": 0.04,
    "polygon": [[0.12,0.08],[0.88,0.09],[0.87,0.12],[0.13,0.11]],
    "text": "Sehr geehrter Herr ..."
  }
 ]
 ```
 Coordinates are normalized (0-1) relative to page dimensions.
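To make the normalization concrete, here is a small sketch of how a consumer might map an `OcrBlock` back to pixel coordinates. The helper names (`PixelBox`, `to_pixel_box`, `to_pixel_polygon`) and the rounding choice are assumptions for illustration; they are not part of the service or the backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelBox:
    left: int
    top: int
    width: int
    height: int


def to_pixel_box(block: dict, page_width_px: int, page_height_px: int) -> PixelBox:
    """Scale the normalized bounding box of one OCR block to pixel units."""
    return PixelBox(
        left=round(block["x"] * page_width_px),
        top=round(block["y"] * page_height_px),
        width=round(block["width"] * page_width_px),
        height=round(block["height"] * page_height_px),
    )


def to_pixel_polygon(block: dict, page_width_px: int, page_height_px: int) -> list:
    """Scale each normalized [x, y] polygon point to a pixel (x, y) tuple."""
    return [
        (round(x * page_width_px), round(y * page_height_px))
        for x, y in block["polygon"]
    ]
```

Because the values are relative to the page, the same block can be drawn at any render size without re-running OCR.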
 ### File Structure
 ```
 ocr-service/
 ├── main.py                  # FastAPI app, endpoints, request handling
 ├── models.py                # Pydantic models (OcrRequest, OcrBlock)
 ├── engines/
 │   ├── __init__.py
 │   ├── kraken.py            # Kraken engine wrapper (Kurrent models)
 │   └── surya.py             # Surya engine wrapper (typewritten/Latin)
 ├── preprocessing.py         # Image preprocessing (CLAHE, deskew, denoise)
 ├── confidence.py            # Confidence scoring and thresholding
 ├── spell_check.py           # Post-OCR spell correction
 ├── ensure_blla_model.py     # Model download / verification helper
 ├── dictionaries/            # Historical word lists for spell checking
 ├── requirements.txt         # Python dependencies
 ├── Dockerfile               # Production container image
 └── entrypoint.sh            # Container startup script
 ```
 ### Key Endpoints
 | Endpoint | Method | Description |
 |---|---|---|
 | `/health` | GET | Returns 200 only after models are loaded |
 | `/ocr` | POST | Extract text blocks from a PDF URL |
 | `/ocr/stream` | POST | Streaming OCR with SSE-style progress events |
 | `/training/submit` | POST | Submit training data for model fine-tuning |
 ### Environment Variables
 | Variable | Default | Description |
 |---|---|---|
 | `KRAKEN_MODEL_PATH` | `/app/models/german_kurrent.mlmodel` | Path to Kraken model file |
 | `TRAINING_TOKEN` | `""` | Bearer token required for training endpoints |
 | `OCR_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for Latin scripts |
 | `OCR_CONFIDENCE_THRESHOLD_KURRENT` | `0.5` | Minimum confidence for Kurrent scripts |
 | `RECOGNITION_BATCH_SIZE` | `16` | Kraken recognition batch size |
 | `DETECTOR_BATCH_SIZE` | `8` | Surya detector batch size |
 | `OCR_CLAHE_CLIP_LIMIT` | `2.0` | CLAHE contrast enhancement limit |
 | `OCR_CLAHE_TILE_SIZE` | `8` | CLAHE tile grid size |
 | `OCR_MAX_CACHED_MODELS` | `2` | LRU model cache size (~500 MB each) |
 | `ALLOWED_PDF_HOSTS` | `minio,localhost,127.0.0.1` | SSRF protection — allowed PDF URL hosts |
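As an illustration of what an `ALLOWED_PDF_HOSTS` allow-list protects against, the sketch below checks a PDF URL against a comma-separated host list. It is not the service's actual implementation: `parse_allowed_hosts` and `is_allowed_pdf_url` are hypothetical names, and this version deliberately has no wildcard support, so a `*` entry would match nothing here.

```python
from urllib.parse import urlparse


def parse_allowed_hosts(raw: str) -> frozenset:
    """Split a comma-separated host list such as 'minio,localhost,127.0.0.1'."""
    return frozenset(host.strip().lower() for host in raw.split(",") if host.strip())


def is_allowed_pdf_url(pdf_url: str, allowed_hosts: frozenset) -> bool:
    """Accept only http(s) URLs whose host is explicitly listed."""
    parsed = urlparse(pdf_url)
    if parsed.scheme not in ("http", "https"):
        return False  # rejects file://, ftp://, and similar schemes
    host = parsed.hostname  # already lowercased, port stripped
    return host is not None and host in allowed_hosts
```

With the default list, a presigned MinIO URL passes while a request pointing at an arbitrary internal or external host is refused before any fetch happens.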
 ## How to Run
 ### Local Development (Python venv)
 ```bash
 cd ocr-service
 python -m venv .venv
 source .venv/bin/activate
 # Install PyTorch CPU first (saves ~2 GB vs CUDA)
 pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cpu
 # Install remaining dependencies
 pip install -r requirements.txt
 # Run development server
 fastapi dev main.py --host 0.0.0.0 --port 8000
 # Or production mode
 uvicorn main:app --host 0.0.0.0 --port 8000
 ```
 ### Docker (via docker-compose)
 The OCR service is included in the root `docker-compose.yml`:
 ```bash
 docker-compose up -d ocr-service
 ```
 The container:
 - Exposes port 8000 internally (not mapped to host by default)
 - Mounts `ocr_models` and `ocr_cache` volumes for persistence
 - Has a 120-second startup grace period for model loading
 - Memory limit: 12 GB
 ### Model Downloads
 Use the helper script to download Kraken models:
 ```bash
 ./scripts/download-kraken-models.sh
 ```
 Models are stored in the `ocr_models` Docker volume or `./ocr-service/models/` locally.
 ## Testing
 Only a subset of tests can run without the full ML stack:
 ```bash
 cd ocr-service
 pip install pytest pytest-asyncio pyspellchecker
 # No ML required — pure logic tests
 python -m pytest test_spell_check.py test_confidence.py test_sender_registry.py -v
 ```
 Tests requiring PyTorch/Kraken/Surya (e.g., `test_engines.py`) must be run in the Docker container or a fully provisioned venv.
 ## Training
 The service supports in-process model fine-tuning via Kraken's `ketos` training pipeline. Training endpoints require the `TRAINING_TOKEN` bearer token. After training completes, the model is reloaded in-process — this is why only a single replica is supported.
--- a/scripts/CLAUDE.md
+++ b/scripts/CLAUDE.md
@@ -1,144 +1,5 @@
-# Scripts — Familienarchiv
+# scripts/
-## Overview
+→ See [scripts/README.md](./README.md) for the full list of scripts, their purpose, and usage.
-Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.
+**LLM reminder:** when adding a new script, document it in `scripts/README.md` (not here).
 ## Scripts
 ### `reset-db.sh`
 **Purpose**: Hard-reset the development database, wiping all documents, persons, tags, and related data.
 **Usage:**
 ```bash
 ./scripts/reset-db.sh
 # Type 'yes' to confirm
 ```
 **What it truncates:**
 - `transcription_block_versions`
 - `transcription_blocks`
 - `comment_mentions`
 - `document_comments`
 - `document_annotations`
 - `document_versions`
 - `notifications`
 - `documents`
 - `person_name_aliases`
 - `persons`
 - `tag`
 > ⚠️ **Destructive operation** — only for development!
 ---
 ### `rebuild-frontend.sh`
 **Purpose**: Force a clean rebuild of the frontend Docker container.
 **Usage:**
 ```bash
 ./scripts/rebuild-frontend.sh
 ```
 ---
 ### `download-kraken-models.sh`
 **Purpose**: Download Kraken HTR models for German Kurrent and Sütterlin scripts.
 **Usage:**
 ```bash
 ./scripts/download-kraken-models.sh
 ```
 Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100-500 MB each.
 ---
 ### `download-paperless.sh`
 **Purpose**: Download exported documents from a Paperless-ngx instance.
 **Usage:**
 ```bash
 ./scripts/download-paperless.sh
 ```
 Requires environment variables or config for the Paperless API endpoint and token.
 ---
 ### `flatten-paperless.sh`
 **Purpose**: Flatten nested Paperless export directories into a single import-ready structure.
 **Usage:**
 ```bash
 ./scripts/flatten-paperless.sh
 ```
 ---
 ### `generate_data.py`
 **Purpose**: Generate synthetic test data for development.
 **Usage:**
 ```bash
 python scripts/generate_data.py
 ```
 Generates fake documents, persons, and tags suitable for load testing or UI development.
 ---
 ### `prepare_historical_dict.py`
 **Purpose**: Build a historical German word dictionary for the OCR spell-checker.
 **Usage:**
 ```bash
 python scripts/prepare_historical_dict.py
 ```
 Processes raw word lists into the format expected by `ocr-service/spell_check.py`.
 ---
 ### `schema.sql`
 **Purpose**: Complete database schema dump for reference.
 **Note**: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.
 ---
 ### `large-data.sql`
 **Purpose**: Pre-seeded dataset with a large number of documents for performance testing.
 **Usage:**
 ```bash
 # Import into PostgreSQL
 docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
 ```
 ## How to Use
 Most scripts should be run from the **repository root**:
 ```bash
 # Database reset
 ./scripts/reset-db.sh
 # Model download
 ./scripts/download-kraken-models.sh
 # Data generation
 cd scripts && python generate_data.py
 ```
 Ensure scripts are executable:
 ```bash
 chmod +x scripts/*.sh
 ```
 ## Adding New Scripts
 1. Place the script in `scripts/`
 2. Add a header comment describing purpose and usage
 3. Make it executable (`chmod +x`)
 4. Document it in this `CLAUDE.md`
--- a/scripts/README.md
+++ b/scripts/README.md
@@ -0,0 +1,161 @@
 # scripts/
 Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.
 ## Scripts
 ### `reset-db.sh`
 **Purpose**: Hard-reset the development database, wiping all documents, persons, tags, and related data.
 **Usage:**
 ```bash
 ./scripts/reset-db.sh
 # Type 'yes' to confirm
 ```
 **What it truncates:**
 - `transcription_block_versions`
 - `transcription_blocks`
 - `comment_mentions`
 - `document_comments`
 - `document_annotations`
 - `document_versions`
 - `notifications`
 - `documents`
 - `person_name_aliases`
 - `persons`
 - `tag`
 > ⚠️ **Destructive operation — only for development!** This wipes ALL data. Not reversible without a backup.
 ---
 ### `rebuild-frontend.sh`
 **Purpose**: Force a clean rebuild of the frontend Docker container.
 **Usage:**
 ```bash
 ./scripts/rebuild-frontend.sh
 ```
 ---
 ### `download-kraken-models.sh`
 **Purpose**: Download Kraken HTR models for German Kurrent and Sütterlin scripts.
 **Usage:**
 ```bash
 ./scripts/download-kraken-models.sh
 ```
 Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100–500 MB each.
 ---
 ### `download-paperless.sh`
 **Purpose**: Download exported documents from a Paperless-ngx instance.
 **Usage:**
 ```bash
 ./scripts/download-paperless.sh
 ```
 Requires environment variables or config for the Paperless API endpoint and token.
 ---
 ### `flatten-paperless.sh`
 **Purpose**: Flatten nested Paperless export directories into a single import-ready structure.
 **Usage:**
 ```bash
 ./scripts/flatten-paperless.sh
 ```
 ---
 ### `generate_data.py`
 **Purpose**: Generate synthetic test data for development.
 **Usage:**
 ```bash
 python scripts/generate_data.py
 ```
 Generates fake documents, persons, and tags suitable for load testing or UI development.
 ---
 ### `prepare_historical_dict.py`
 **Purpose**: Build a historical German word dictionary for the OCR spell-checker.
 **Usage:**
 ```bash
 python scripts/prepare_historical_dict.py
 ```
 Processes raw word lists into the format expected by `ocr-service/spell_check.py`.
 ---
 ### `schema.sql`
 **Purpose**: Complete database schema dump for reference.
 **Note**: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.
 ---
 ### `large-data.sql`
 **Purpose**: Pre-seeded dataset with a large number of documents for performance testing.
 **Usage:**
 ```bash
 # Import into PostgreSQL
 docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
 ```
 ## How to Use
 Most scripts should be run from the **repository root**:
 ```bash
 # Database reset
 ./scripts/reset-db.sh
 # Model download
 ./scripts/download-kraken-models.sh
 # Data generation
 python scripts/generate_data.py
 ```
 Ensure scripts are executable:
 ```bash
 chmod +x scripts/*.sh
 ```
 ## Adding New Scripts
 1. Place the script in `scripts/`
 2. Add a header comment describing purpose and usage
 3. Make it executable (`chmod +x`)
 4. Document it in this `README.md`