docs(legibility): migrate CLAUDE.md rules into human docs — DOC-7
Processes all 7 CLAUDE.md files according to the 3-bucket classification. Migration targets (CONTRIBUTING.md, docs/ARCHITECTURE.md, docs/DEPLOYMENT.md, domain READMEs) are introduced by DOC-2/4/5/6, so this PR must merge last.

### scripts/CLAUDE.md → scripts/README.md

New `scripts/README.md` with full script documentation (preserving the ⚠️ destructive-operation warning on `reset-db.sh`). `scripts/CLAUDE.md` reduced to a pointer plus a "document new scripts in README.md" reminder.

### .devcontainer/CLAUDE.md → .devcontainer/README.md

New `.devcontainer/README.md` with all configuration, usage, and limitations. `.devcontainer/CLAUDE.md` reduced to a single pointer line.

### docs/CLAUDE.md → docs/README.md

New `docs/README.md` covering the folder structure, ADR guide, infrastructure docs, and specs folder. `docs/CLAUDE.md` reduced to a pointer plus an ADR reminder.

### ocr-service/CLAUDE.md

Reduced to a pointer to `ocr-service/README.md` (content migrated in DOC-6). Kept LLM reminders: single-node constraint, `ALLOWED_PDF_HOSTS` SSRF risk.

### backend/CLAUDE.md

- Layering Rules → pointer to docs/ARCHITECTURE.md
- Error Handling → pointer to CONTRIBUTING.md + reminder
- Security/Permissions → pointer to docs/ARCHITECTURE.md + reminder
- Package Structure → tagged TODO post-REFACTOR-1
- Fixed errors.ts path to frontend/src/lib/shared/errors.ts
- Added ANNOTATE_ALL + BLOG_WRITE to permission list
- Key Entities, Entity Code Style, Services → kept (Bucket-2)

### root CLAUDE.md

- Stack, Infrastructure, Dev Container → pointers
- Layering Rules, Error Handling, Security, OpenAPI, API Client, Date Handling, UI Components, Frontend Error Handling → pointers + reminders
- Package Structure → tagged TODO post-REFACTOR-1
- Domain Model, Entity Code Style, Form Actions, Styling → kept (Bucket-2)

### frontend/CLAUDE.md

- API Client Pattern, Date Handling → pointers + reminders
- Key UI Components → pointer to domain READMEs
- Styling, Form Actions, How to Run, Vite Proxy, i18n → kept (Bucket-2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -1,96 +1,3 @@
# Dev Container — Familienarchiv
# Dev Container

## Overview

VS Code Dev Container configuration for a pre-configured development environment. Includes Java 21, Maven, and Node.js 24 — everything needed to work on both backend and frontend.

## Configuration

File: `.devcontainer/devcontainer.json`

### Included Features

| Feature | Version | Purpose |
|---|---|---|
| Java | 21 | Spring Boot backend |
| Maven | bundled with Java feature | Build tool |
| Node.js | 24 | SvelteKit frontend |

### VS Code Extensions (Auto-installed)

| Extension | Purpose |
|---|---|
| `vscjava.vscode-java-pack` | Java language support, debugging, testing |
| `vmware.vscode-spring-boot` | Spring Boot tooling |
| `gabrielbb.vscode-lombok` | Lombok annotation support |
| `humao.rest-client` | HTTP request files (for `backend/api_tests/`) |

### Ports

- `8080` forwarded to host — access backend at `http://localhost:8080`

### User

Runs as `vscode` user (not root) for security.

## How to Use

### Prerequisites

- VS Code with the **Dev Containers** extension installed
- Docker running locally

### Open in Dev Container

1. Open the project in VS Code
2. Press `F1` → type "Dev Containers: Reopen in Container"
3. VS Code will:
   - Build the container using the root `docker-compose.yml`
   - Install Java 21, Maven, and Node 24
   - Install the listed extensions
   - Mount the workspace folder

### Working Inside the Container

Once inside the container, you have access to both stacks:

```bash
# Backend
cd backend
./mvnw spring-boot:run

# Frontend (in a new terminal)
cd frontend
npm install
npm run dev
```

The container reuses the `docker-compose.yml` services, so PostgreSQL and MinIO are available automatically.

### Forwarding Frontend Port

The devcontainer config only forwards port 8080 by default. To access the frontend dev server (port 5173 or 3000), either:

1. Add `5173` to `forwardPorts` in `devcontainer.json`, or
2. Use the VS Code "Ports" panel to forward it dynamically

## Limitations

- The devcontainer attaches to the `backend` service from `docker-compose.yml`, so it inherits those environment variables
- OCR service and other containers should be started separately via `docker-compose up -d`
- GPU passthrough for OCR training is not configured

## Customization

To add more tools or extensions, edit `.devcontainer/devcontainer.json`:

```json
{
  "features": {
    "ghcr.io/devcontainers/features/python:1": {
      "version": "3.11"
    }
  },
  "forwardPorts": [8080, 5173, 3000]
}
```
→ See [.devcontainer/README.md](./README.md) for configuration, usage, and known limitations.

94
.devcontainer/README.md
Normal file
@@ -0,0 +1,94 @@
# Dev Container — Familienarchiv

VS Code Dev Container configuration for a pre-configured development environment. Includes Java 21, Maven, and Node.js 24 — everything needed to work on both backend and frontend.

## Configuration

File: `.devcontainer/devcontainer.json`

### Included Features

| Feature | Version | Purpose |
| ------- | ------------------------- | ------------------- |
| Java | 21 | Spring Boot backend |
| Maven | bundled with Java feature | Build tool |
| Node.js | 24 | SvelteKit frontend |

### VS Code Extensions (Auto-installed)

| Extension | Purpose |
| --------------------------- | --------------------------------------------- |
| `vscjava.vscode-java-pack` | Java language support, debugging, testing |
| `vmware.vscode-spring-boot` | Spring Boot tooling |
| `gabrielbb.vscode-lombok` | Lombok annotation support |
| `humao.rest-client` | HTTP request files (for `backend/api_tests/`) |

### Ports

- `8080` forwarded to host — access backend at `http://localhost:8080`

### User

Runs as `vscode` user (not root) for security.

## How to Use

### Prerequisites

- VS Code with the **Dev Containers** extension installed
- Docker running locally

### Open in Dev Container

1. Open the project in VS Code
2. Press `F1` → type "Dev Containers: Reopen in Container"
3. VS Code will:
   - Build the container using the root `docker-compose.yml`
   - Install Java 21, Maven, and Node 24
   - Install the listed extensions
   - Mount the workspace folder

### Working Inside the Container

Once inside the container, you have access to both stacks:

```bash
# Backend
cd backend
./mvnw spring-boot:run

# Frontend (in a new terminal)
cd frontend
npm install
npm run dev
```

The container reuses the `docker-compose.yml` services, so PostgreSQL and MinIO are available automatically.

### Forwarding Frontend Port

The devcontainer config only forwards port 8080 by default. To access the frontend dev server (port 5173 or 3000), either:

1. Add `5173` to `forwardPorts` in `devcontainer.json`, or
2. Use the VS Code "Ports" panel to forward it dynamically

## Limitations

- The devcontainer attaches to the `backend` service from `docker-compose.yml`, so it inherits those environment variables
- OCR service and other containers should be started separately via `docker-compose up -d`
- GPU passthrough for OCR training is not configured

## Customization

To add more tools or extensions, edit `.devcontainer/devcontainer.json`:

```json
{
  "features": {
    "ghcr.io/devcontainers/features/python:1": {
      "version": "3.11"
    }
  },
  "forwardPorts": [8080, 5173, 3000]
}
```
195
CLAUDE.md
@@ -2,6 +2,8 @@

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

> For a human-readable project overview, see [README.md](./README.md).

## Project Overview

**Familienarchiv** is a family document archival system — a full-stack web app for digitizing, organizing, and searching family documents. Key features: file uploads (stored in MinIO/S3), metadata management, Excel/ODS batch import, full-text search, conversation threads between family members, and role-based access control.
@@ -16,6 +18,8 @@ See [CODESTYLE.md](./CODESTYLE.md) for coding standards: Clean Code, DRY/KISS tr

## Stack

→ See [README.md §Tech Stack](./README.md#tech-stack)

- **Backend**: Spring Boot 4.0 (Java 21, Maven, Jetty, JPA/Hibernate, Flyway, Spring Security, Spring Session JDBC)
- **Frontend**: SvelteKit 2 with Svelte 5, TypeScript, Tailwind CSS 4, Paraglide.js (i18n: de/en/es)
- **Database**: PostgreSQL 16
@@ -25,12 +29,13 @@ See [CODESTYLE.md](./CODESTYLE.md) for coding standards: Clean Code, DRY/KISS tr
## Common Commands

### Running the Full Stack

```bash
# From repo root — starts PostgreSQL, MinIO, and Spring Boot backend
docker-compose up -d
```

### Backend (Spring Boot)

```bash
cd backend

@@ -42,6 +47,7 @@ cd backend
```

### Frontend (SvelteKit)

```bash
cd frontend

@@ -64,7 +70,7 @@ npm run generate:api # Regenerate TypeScript API types from OpenAPI spec

### Package Structure

Package-by-domain: each domain owns its controller, service, repository, entities, and DTOs.
<!-- TODO: rewrite post-REFACTOR-1 — see Epic 4 -->

```
backend/src/main/java/org/raddatz/familienarchiv/
@@ -88,27 +94,21 @@ backend/src/main/java/org/raddatz/familienarchiv/
└── user/ User domain — AppUser, UserGroup, UserService, auth controllers
```

### Layering Rules (strictly enforced)
### Layering Rules

```
Controller → Service → Repository → DB
```
→ See [docs/ARCHITECTURE.md §Layering rule](./docs/ARCHITECTURE.md#layering-rule)

- **Controllers** never inject or call repositories directly.
- **Services** never reach into another domain's repository. Call the other domain's service instead.
  - ✅ `DocumentService` → `PersonService.getById()` → `PersonRepository`
  - ❌ `DocumentService` → `PersonRepository` directly
- This keeps domain boundaries clear and business logic testable in isolation.
**LLM reminder:** controllers never call repositories directly; services never reach into another domain's repository — always call the other domain's service instead.

### Domain Model

| Entity | Table | Key relationships |
|---|---|---|
| `Document` | `documents` | ManyToOne `sender` (Person), ManyToMany `receivers` (Person), ManyToMany `tags` (Tag) |
| `Person` | `persons` | Referenced by documents as sender/receiver |
| `Tag` | `tag` | ManyToMany with documents via `document_tags` |
| `AppUser` | `app_users` | ManyToMany `groups` (UserGroup) |
| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |
| Entity | Table | Key relationships |
| ----------- | ------------- | ------------------------------------------------------------------------------------- |
| `Document` | `documents` | ManyToOne `sender` (Person), ManyToMany `receivers` (Person), ManyToMany `tags` (Tag) |
| `Person` | `persons` | Referenced by documents as sender/receiver |
| `Tag` | `tag` | ManyToMany with documents via `document_tags` |
| `AppUser` | `app_users` | ManyToMany `groups` (UserGroup) |
| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |

**`DocumentStatus` lifecycle:** `PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`

@@ -118,6 +118,7 @@ Controller → Service → Repository → DB
### Entity Code Style

All entities use these Lombok annotations:

```java
@Entity
@Table(name = "table_name")
@@ -146,65 +147,29 @@ Services are annotated with `@Service`, `@RequiredArgsConstructor`, and optional
- Read methods are not annotated (default non-transactional is fine).
- Each service owns its domain's repository. Cross-domain data access goes through the other domain's service.

**Existing services:**

| Service | Responsibility |
|---|---|
| `DocumentService` | Document CRUD, search, tag cascade delete |
| `PersonService` | Person CRUD, find-or-create by alias |
| `TagService` | Tag find/create/update/delete |
| `UserService` | User and group CRUD |
| `FileService` | S3/MinIO upload and download |
| `MassImportService` | Async ODS/Excel import; delegates to PersonService and TagService |

### DTOs

Input DTOs live in `dto/`. Response types are the model entities themselves (no response DTOs).
Input DTOs live flat in the domain package. Response types are the model entities themselves (no response DTOs).

- `DocumentUpdateDTO` — used for both create and update (all fields optional)
- `CreateUserRequest` — user creation
- `GroupDTO` — group create/update
- `@Schema(requiredMode = REQUIRED)` on every field the backend always populates — drives TypeScript generation.

### Error Handling

Use `DomainException` for all domain errors. Never throw raw exceptions from service methods.
→ See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)

```java
// Static factories match common HTTP status codes:
DomainException.notFound(ErrorCode.DOCUMENT_NOT_FOUND, "Document not found: " + id)
DomainException.forbidden("Access denied")
DomainException.conflict(ErrorCode.IMPORT_ALREADY_RUNNING, "Already running")
DomainException.internal(ErrorCode.FILE_UPLOAD_FAILED, "Upload failed: " + e.getMessage())
```

`ErrorCode` is an enum in `exception/ErrorCode.java`. When adding a new error case, add the value there **and** mirror it in the frontend's `src/lib/errors.ts` + add a Paraglide translation key.

For simple validation in controllers (not domain logic), `ResponseStatusException` is acceptable:
```java
throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "firstName is required");
```
**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) mirror in `frontend/src/lib/shared/errors.ts`, (3) add i18n keys in `messages/{de,en,es}.json`.

### Security / Permissions

Use `@RequirePermission` on controller methods (or the whole controller class):
→ See [docs/ARCHITECTURE.md §Permission system](./docs/ARCHITECTURE.md#permission-system)

```java
@RequirePermission(Permission.WRITE_ALL)
public Document updateDocument(...) { ... }
```

Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`

`PermissionAspect` (AOP) checks the current user's `UserGroup.permissions` at runtime.
**LLM reminder:** `@RequirePermission(Permission.WRITE_ALL)` is **required** on every `POST`, `PUT`, `PATCH`, `DELETE` endpoint — not optional. Do not mix with Spring Security's `@PreAuthorize`. Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`, `ANNOTATE_ALL`, `BLOG_WRITE`.

### OpenAPI / API Types

SpringDoc generates the spec at `/v3/api-docs` (only accessible when running with `--spring.profiles.active=dev`).
→ See [CONTRIBUTING.md §Walkthrough B — Add a new endpoint](./CONTRIBUTING.md#4-walkthrough-b--add-a-new-endpoint)

When changing any model field or endpoint:
1. Rebuild the backend JAR with `-DskipTests`
2. Start it with `--spring.profiles.active=dev`
3. Run `npm run generate:api` in `frontend/`
**LLM reminder:** always run `npm run generate:api` in `frontend/` after any backend model or endpoint change — this is the most common cause of TypeScript type errors.

---

@@ -233,79 +198,52 @@ frontend/src/routes/

### API Client Pattern

All server-side API calls use the typed client from `$lib/api.server.ts`:
→ See [CONTRIBUTING.md §Frontend API client](./CONTRIBUTING.md#frontend-api-client)

```typescript
const api = createApiClient(fetch);
const result = await api.GET('/api/persons/{id}', { params: { path: { id } } });

// Always check via response.ok, NOT result.error
if (!result.response.ok) {
  const code = (result.error as unknown as { code?: string })?.code;
  throw error(result.response.status, getErrorMessage(code));
}
return { person: result.data! };
```

Key rules:
- Use `!result.response.ok` for error checking (not `if (result.error)` — this breaks when the spec has no error responses defined)
- Cast errors as `result.error as unknown as { code?: string }` to extract the backend error code
- Use `result.data!` (non-null assertion) after an ok check — TypeScript knows it's present

For multipart/form-data endpoints (file uploads), bypass the typed client and use raw `fetch`:
```typescript
const res = await fetch(`${baseUrl}/api/documents`, { method: 'POST', body: formData });
```
**LLM reminder:** check `!result.response.ok` (not `result.error` — breaks when spec has no error responses defined); cast errors as `result.error as unknown as { code?: string }`; use `result.data!` after an ok check.
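
The `response.ok` rule can be sketched in isolation. This assumes the client returns an openapi-fetch-style `{ data, error, response }` object, as in the example above. `ApiResult`, `HttpError`, `unwrap`, and the `getErrorMessage` stand-in are hypothetical names; they are not the project's `$lib/api.server.ts` or SvelteKit's `error()` helper.

```typescript
// Hypothetical result shape, modelled on the `{ data, error, response }`
// object the typed client returns in the example above.
type ApiResult<T> = {
  data?: T;
  error?: unknown;
  response: { ok: boolean; status: number };
};

// Stand-in for SvelteKit's `error()`; a plain class keeps the sketch self-contained.
class HttpError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Hypothetical stand-in for the project's getErrorMessage().
function getErrorMessage(code?: string): string {
  return code ? `error.${code}` : 'error.unknown';
}

function unwrap<T>(result: ApiResult<T>): T {
  // Decide on the HTTP status, not on `result.error`, which is unreliable
  // when the OpenAPI spec declares no error responses.
  if (!result.response.ok) {
    const code = (result.error as unknown as { code?: string })?.code;
    throw new HttpError(result.response.status, getErrorMessage(code));
  }
  // After the ok check the body is expected to be present.
  return result.data!;
}
```

In a real `+page.server.ts`, SvelteKit's `error(status, message)` would take the place of `HttpError`. The shape of the check stays the same.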

### Form Actions Pattern

```typescript
// +page.server.ts
export const actions = {
  default: async ({ request, fetch }) => {
    const formData = await request.formData();
    const name = formData.get('name') as string; // cast needed — FormData returns FormDataEntryValue
    // ...
    return fail(400, { error: 'message' }); // on error
    throw redirect(303, '/target'); // on success
  }
  default: async ({ request, fetch }) => {
    const formData = await request.formData();
    const name = formData.get("name") as string;
    // ...
    return fail(400, { error: "message" }); // on error
    throw redirect(303, "/target"); // on success
  },
};
```

### Date Handling

- **Forms**: German format `dd.mm.yyyy` with auto-dot insertion via `handleDateInput()`. A hidden `<input type="hidden" name="documentDate" value={dateIso}>` sends ISO format to the backend.
- **Display**: Always use `Intl.DateTimeFormat` with `T12:00:00` suffix to prevent UTC timezone off-by-one:
  ```typescript
  new Intl.DateTimeFormat('de-DE', { day: 'numeric', month: 'long', year: 'numeric' })
    .format(new Date(doc.documentDate + 'T12:00:00'))
  ```
→ See [CONTRIBUTING.md §Date handling](./CONTRIBUTING.md#date-handling)

**LLM reminder:** always append `T12:00:00` when constructing `new Date()` from an ISO date string — prevents UTC timezone off-by-one errors.
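
Here is a minimal sketch of both halves of the date rule. The form side inserts dots while `dd.mm.yyyy` is typed and converts the result to the ISO value a hidden input would send. The display side formats an ISO date with the `T12:00:00` suffix. `autoDot`, `germanToIso`, and `formatDocumentDate` are hypothetical helpers, and the project's own `handleDateInput()` may behave differently.

```typescript
// Hypothetical: insert dots as the user types digits, e.g. "05041923" -> "05.04.1923".
function autoDot(raw: string): string {
  const digits = raw.replace(/\D/g, '').slice(0, 8);
  let out = digits.slice(0, 2);
  if (digits.length > 2) out += '.' + digits.slice(2, 4);
  if (digits.length > 4) out += '.' + digits.slice(4, 8);
  return out;
}

// Hypothetical: "05.04.1923" -> "1923-04-05"; returns null for incomplete input.
function germanToIso(value: string): string | null {
  const m = /^(\d{2})\.(\d{2})\.(\d{4})$/.exec(value);
  return m ? `${m[3]}-${m[2]}-${m[1]}` : null;
}

// Appending T12:00:00 makes the Date local noon instead of UTC midnight,
// so formatting in the local time zone keeps the same calendar day.
function formatDocumentDate(iso: string): string {
  return new Intl.DateTimeFormat('de-DE', { day: 'numeric', month: 'long', year: 'numeric' })
    .format(new Date(iso + 'T12:00:00'));
}
```

Without the suffix, `new Date('1923-04-05')` is parsed as UTC midnight. In time zones west of UTC, it therefore formats as 4 April.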

### UI Component Library

Custom components in `src/lib/components/`:

| Component | Props | Description |
|---|---|---|
| `PersonTypeahead` | `name`, `label`, `value`, `initialName`, `on:change` | Single-person selector with typeahead dropdown |
| `PersonMultiSelect` | `selectedPersons` (bind) | Chip-based multi-person selector |
| `TagInput` | `tags` (bind), `allowCreation?`, `on:change` | Tag chip input with typeahead |
→ See per-domain READMEs: [`frontend/src/lib/person/README.md`](./frontend/src/lib/person/README.md), [`frontend/src/lib/tag/README.md`](./frontend/src/lib/tag/README.md), [`frontend/src/lib/document/README.md`](./frontend/src/lib/document/README.md), [`frontend/src/lib/shared/README.md`](./frontend/src/lib/shared/README.md)

### Styling Conventions (Tailwind CSS 4)

Brand color utilities (defined in `layout.css`):

| Class | Value | Usage |
|---|---|---|
| `brand-navy` | `#002850` | Primary text, buttons, headers |
| Class | Value | Usage |
| ------------ | --------- | -------------------------------- |
| `brand-navy` | `#002850` | Primary text, buttons, headers |
| `brand-mint` | `#A6DAD8` | Accents, hover underlines, icons |
| `brand-sand` | `#E4E2D7` | Page background, card borders |
| `brand-sand` | `#E4E2D7` | Page background, card borders |

Typography:

- `font-serif` (Merriweather) — body text, document titles, names
- `font-sans` (Montserrat) — labels, metadata, UI chrome

Card pattern for content sections:

```svelte
<div class="bg-white shadow-sm border border-brand-sand rounded-sm p-6">
  <h2 class="text-xs font-bold uppercase tracking-widest text-gray-400 mb-5">Section Title</h2>
@@ -313,48 +251,19 @@ Card pattern for content sections:
</div>
```

Save bar pattern — use **sticky full-bleed** for long forms (edit document), **card-style with `mt-4`** for short forms (new person):
```svelte
<!-- Long forms: sticky, full-bleed -->
<div class="sticky bottom-0 z-10 -mx-4 px-6 py-4 bg-white border-t border-brand-sand shadow-[0_-2px_8px_rgba(0,0,0,0.06)] flex items-center justify-between">

<!-- Short forms: card, top margin -->
<div class="mt-4 flex items-center justify-between rounded-sm border border-brand-sand bg-white px-6 py-4 shadow-sm">
```

Back button pattern — use the shared `<BackButton>` component from `$lib/components/BackButton.svelte`:
```svelte
<script lang="ts">
  import BackButton from '$lib/components/BackButton.svelte';
</script>

<BackButton />
```
The component calls `history.back()` so the user returns to wherever they came from. Label is always "Zurück" (no contextual suffix — destination is unknown). Touch target ≥ 44px and focus ring are built in. Do not use a static `<a href>` for back navigation.

Subtle action link (e.g. "new document/person"):
```svelte
<a href="/documents/new" class="inline-flex items-center gap-1 text-sm font-medium text-brand-navy/60 hover:text-brand-navy transition-colors">
  <svg class="w-4 h-4" ...><!-- plus icon --></svg>
  Neues Dokument
</a>
```
Back button pattern — use the shared `<BackButton>` component from `$lib/shared/primitives/BackButton.svelte`. Do not use a static `<a href>` for back navigation.

### Error Handling (Frontend)

`src/lib/errors.ts` mirrors the backend `ErrorCode` enum and maps codes to Paraglide translation keys. When adding a new `ErrorCode` on the backend:
1. Add it to `ErrorCode.java`
2. Add it to the `ErrorCode` type in `errors.ts`
3. Add a `case` in `getErrorMessage()`
4. Add the translation key in `messages/de.json`, `en.json`, `es.json`
→ See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)

**LLM reminder:** when adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`.
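
This minimal sketch shows how the mirrored `ErrorCode` type and `getErrorMessage()` fit together. The three codes come from the backend `DomainException` examples. The `messages` object stands in for the Paraglide-generated message functions, and its keys and texts are hypothetical.

```typescript
// Mirror of (a subset of) the backend ErrorCode enum.
type ErrorCode = 'DOCUMENT_NOT_FOUND' | 'IMPORT_ALREADY_RUNNING' | 'FILE_UPLOAD_FAILED';

// Stand-in for Paraglide message functions (hypothetical keys and texts).
const messages = {
  error_document_not_found: () => 'Document not found',
  error_import_already_running: () => 'An import is already running',
  error_file_upload_failed: () => 'Upload failed',
  error_unknown: () => 'Unknown error',
};

function getErrorMessage(code?: string): string {
  // One `case` per ErrorCode value; anything unrecognised falls through to a generic message.
  switch (code as ErrorCode | undefined) {
    case 'DOCUMENT_NOT_FOUND':
      return messages.error_document_not_found();
    case 'IMPORT_ALREADY_RUNNING':
      return messages.error_import_already_running();
    case 'FILE_UPLOAD_FAILED':
      return messages.error_file_upload_failed();
    default:
      return messages.error_unknown();
  }
}
```

Keeping the union type in sync with `ErrorCode.java` lets TypeScript reject a `case` label that is not part of the union.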

---

## Infrastructure

The `docker-compose.yml` at the repo root orchestrates everything. A MinIO MC helper container runs at startup to create the `archive-documents` bucket. The backend container depends on both `db` and `minio` being healthy.

Database migrations live in `backend/src/main/resources/db/migration/` (Flyway, SQL files named `V{n}__{description}.sql`).
→ See [docs/DEPLOYMENT.md](./docs/DEPLOYMENT.md)

## API Testing

@@ -362,4 +271,4 @@ HTTP test files are in `backend/api_tests/` for use with the VS Code REST Client

## Dev Container

A `.devcontainer/` config is available (Java 21 + Node 24, ports 8080 and 3000 forwarded). Use VS Code's "Reopen in Container" for a pre-configured environment.
→ See [.devcontainer/README.md](./.devcontainer/README.md)

@@ -11,7 +11,7 @@ Spring Boot 4.0 monolith serving the Familienarchiv REST API. Handles document m
- **Server**: Jetty (not Tomcat — excluded in pom.xml)
- **Data**: PostgreSQL 16, JPA/Hibernate, Spring Data JPA
- **Migrations**: Flyway (SQL files in `src/main/resources/db/migration/`)
- **Security**: Spring Security, Spring Session JDBC, JWT tokens
- **Security**: Spring Security, Spring Session JDBC
- **File Storage**: MinIO via AWS SDK v2 (S3-compatible)
- **Spreadsheet Import**: Apache POI 5.5.0 (Excel/ODS)
- **API Docs**: SpringDoc OpenAPI 3.x (`/v3/api-docs` — dev profile only)
@@ -19,7 +19,7 @@ Spring Boot 4.0 monolith serving the Familienarchiv REST API. Handles document m

## Package Structure

Package-by-domain: each domain owns its controller, service, repository, entities, and DTOs.
<!-- TODO: rewrite post-REFACTOR-1 — see Epic 4 -->

```
src/main/java/org/raddatz/familienarchiv/
@@ -43,31 +43,28 @@ src/main/java/org/raddatz/familienarchiv/
└── user/ # User domain — AppUser, UserGroup, UserService, auth controllers
```

## Layering Rules (Strict)
For per-domain ownership and public surface, see each domain's `README.md`.

```
Controller → Service → Repository → DB
```
## Layering Rules

- **Controllers never call repositories directly.**
- **Services never reach into another domain's repository.** Call the other domain's service instead.
  - ✅ `DocumentService` → `PersonService.getById()` → `PersonRepository`
  - ❌ `DocumentService` → `PersonRepository` directly
→ See [docs/ARCHITECTURE.md §Layering rule](../docs/ARCHITECTURE.md#layering-rule)

**LLM reminder:** controllers never call repositories directly; services never reach into another domain's repository — always call the other domain's service.

## Key Entities

| Entity | Table | Key Relationships |
|---|---|---|
| `Document` | `documents` | ManyToOne sender (Person), ManyToMany receivers (Person), ManyToMany tags (Tag) |
| `Person` | `persons` | Referenced by documents as sender/receiver; name aliases table |
| `Tag` | `tag` | ManyToMany with documents via `document_tags`; self-referencing parent for tree |
| `AppUser` | `app_users` | ManyToMany groups (UserGroup) |
| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |
| `TranscriptionBlock` | `transcription_blocks` | Per-document, per-page text blocks with polygons |
| `DocumentAnnotation` | `document_annotations` | Free-form annotations on document pages |
| `Comment` | `document_comments` | Threaded comments with mentions |
| `Notification` | `notifications` | User notification feed |
| `OcrJob` / `OcrJobDocument` | `ocr_jobs`, `ocr_job_documents` | Batch OCR job tracking |
| Entity | Table | Key Relationships |
| --------------------------- | ------------------------------- | ------------------------------------------------------------------------------- |
| `Document` | `documents` | ManyToOne sender (Person), ManyToMany receivers (Person), ManyToMany tags (Tag) |
| `Person` | `persons` | Referenced by documents as sender/receiver; name aliases table |
| `Tag` | `tag` | ManyToMany with documents via `document_tags`; self-referencing parent for tree |
| `AppUser` | `app_users` | ManyToMany groups (UserGroup) |
| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |
| `TranscriptionBlock` | `transcription_blocks` | Per-document, per-page text blocks with polygons |
| `DocumentAnnotation` | `document_annotations` | Free-form annotations on document pages |
| `Comment` | `document_comments` | Threaded comments with mentions |
| `Notification` | `notifications` | User notification feed |
| `OcrJob` / `OcrJobDocument` | `ocr_jobs`, `ocr_job_documents` | Batch OCR job tracking |

**`DocumentStatus` lifecycle:** `PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`
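
The lifecycle line can be read as an ordered enum. This sketch assumes that a document only moves forward one step at a time. The source does not state that rule, and the real backend may allow other transitions. `DocumentStatus` is defined here as a local TypeScript union, not the generated API type, and `nextStatus` and `isValidTransition` are hypothetical helpers.

```typescript
// Order taken from the lifecycle line above.
const LIFECYCLE = ['PLACEHOLDER', 'UPLOADED', 'TRANSCRIBED', 'REVIEWED', 'ARCHIVED'] as const;
type DocumentStatus = (typeof LIFECYCLE)[number];

// Next status in the lifecycle, or null once ARCHIVED is reached.
function nextStatus(current: DocumentStatus): DocumentStatus | null {
  const i = LIFECYCLE.indexOf(current);
  return i < LIFECYCLE.length - 1 ? LIFECYCLE[i + 1] : null;
}

// Assumption: only single forward steps count as valid transitions.
function isValidTransition(from: DocumentStatus, to: DocumentStatus): boolean {
  return nextStatus(from) === to;
}
```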

@@ -104,32 +101,15 @@ public class MyEntity {

## Error Handling

Use `DomainException` for all domain errors:
→ See [CONTRIBUTING.md §Error handling](../CONTRIBUTING.md#error-handling)

```java
DomainException.notFound(ErrorCode.DOCUMENT_NOT_FOUND, "...")
DomainException.forbidden("...")
DomainException.conflict(ErrorCode.IMPORT_ALREADY_RUNNING, "...")
DomainException.internal(ErrorCode.FILE_UPLOAD_FAILED, "...")
```

When adding a new `ErrorCode`:
1. Add to `ErrorCode.java`
2. Mirror in frontend `src/lib/errors.ts`
3. Add Paraglide translation key in `messages/{de,en,es}.json`
**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` — never throw raw exceptions from service methods. When adding a new `ErrorCode`: add to `ErrorCode.java`, mirror in `frontend/src/lib/shared/errors.ts`, add i18n keys in `messages/{de,en,es}.json`.

## Security / Permissions

Use `@RequirePermission` on controller methods or classes:
→ See [docs/ARCHITECTURE.md §Permission system](../docs/ARCHITECTURE.md#permission-system)

```java
@RequirePermission(Permission.WRITE_ALL)
public Document updateDocument(...) { ... }
```

Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`

`PermissionAspect` checks the current user's `UserGroup.permissions` at runtime.
**LLM reminder:** `@RequirePermission(Permission.WRITE_ALL)` is **required** on every `POST`, `PUT`, `PATCH`, `DELETE` endpoint — not optional. Do not mix with Spring Security's `@PreAuthorize`. Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`, `ANNOTATE_ALL`, `BLOG_WRITE`.
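
Below is a TypeScript model of the runtime check described above. It rests on an assumed reading of `PermissionAspect`, which is Java AOP code not shown here: a user holds a permission if any of their `UserGroup`s lists it in `permissions`. The sketch does not assume that `ADMIN` implies other permissions, because the source does not say so. `User`, `Group`, and `hasPermission` are hypothetical names.

```typescript
// Permission names as listed in the LLM reminder above.
type Permission =
  | 'READ_ALL' | 'WRITE_ALL' | 'ADMIN' | 'ADMIN_USER'
  | 'ADMIN_TAG' | 'ADMIN_PERMISSION' | 'ANNOTATE_ALL' | 'BLOG_WRITE';

// Simplified stand-ins for AppUser / UserGroup (AppUser has ManyToMany groups).
interface Group { name: string; permissions: Set<string>; }
interface User { username: string; groups: Group[]; }

// Assumed semantics: granted if at least one of the user's groups carries the permission.
function hasPermission(user: User, required: Permission): boolean {
  return user.groups.some((g) => g.permissions.has(required));
}
```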

## OCR Integration

@@ -141,49 +121,35 @@ The backend orchestrates OCR by calling the Python `ocr-service` microservice vi
- `OcrBatchService` — handles batch/job workflows
- `OcrAsyncRunner` — async execution of OCR jobs

For ocr-service internals, see [`ocr-service/README.md`](../ocr-service/README.md).

## API Testing

HTTP test files in `backend/api_tests/` for the VS Code REST Client extension.

## How to Run

### Local Development

```bash
cd backend

# Run with dev profile (requires PostgreSQL + MinIO running via docker-compose)
./mvnw spring-boot:run

# Build JAR (with tests)
./mvnw clean package

# Build JAR skipping tests
./mvnw spring-boot:run # Run with dev profile (requires PostgreSQL + MinIO)
./mvnw clean package # Build JAR (with tests)
./mvnw clean package -DskipTests

# Run all tests
./mvnw test

# Run a single test class
./mvnw test -Dtest=ClassName

# Run with coverage (JaCoCo)
./mvnw clean verify
./mvnw test # Run all tests
./mvnw test -Dtest=ClassName # Run a single test class
./mvnw clean verify # Run with JaCoCo coverage report
```

### OpenAPI TypeScript Generation
**OpenAPI / TypeScript type generation:**

1. Build and start backend with `--spring.profiles.active=dev`
2. In `frontend/`, run: `npm run generate:api`
1. Start backend with `--spring.profiles.active=dev`
2. In `frontend/`: `npm run generate:api`

### Profiles

- **dev** (default): Enables OpenAPI, dev configs, e2e seeds
- **prod**: Production profile — no dev endpoints
**LLM reminder:** always regenerate types after any model or endpoint change — the most common cause of "where did my TypeScript type go?"

## Testing

- Unit tests: Mockito + JUnit, pure in-memory
- Slice tests: `@WebMvcTest`, `@DataJpaTest` with Testcontainers PostgreSQL
- Integration tests: Full Spring context with Testcontainers
- Coverage gate: 88% branch coverage overall (JaCoCo)
- Coverage gate: 88% branch coverage (JaCoCo)
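
The 88% gate is the ratio of covered branches to all branches. As a rough local sanity check, the same figure can be recomputed from JaCoCo's CSV report (by default written to `target/site/jacoco/jacoco.csv`), which has `BRANCH_MISSED` and `BRANCH_COVERED` columns. This sketch assumes a plain CSV without quoted fields, and `branchCoverage`/`meetsGate` are hypothetical helpers; the authoritative gate is whatever JaCoCo rule the build enforces.

```typescript
// Overall branch coverage from the text of a JaCoCo CSV report (naive split, no quoted fields).
function branchCoverage(csv: string): number {
	const lines = csv.trim().split(/\r?\n/);
	const header = (lines[0] ?? '').split(',');
	const missedIdx = header.indexOf('BRANCH_MISSED');
	const coveredIdx = header.indexOf('BRANCH_COVERED');
	if (missedIdx < 0 || coveredIdx < 0) {
		throw new Error('BRANCH_MISSED / BRANCH_COVERED columns not found');
	}
	let missed = 0;
	let covered = 0;
	for (const line of lines.slice(1)) {
		const cells = line.split(',');
		missed += Number(cells[missedIdx]);
		covered += Number(cells[coveredIdx]);
	}
	const total = missed + covered;
	// No branches at all: treat as fully covered rather than dividing by zero.
	return total === 0 ? 1 : covered / total;
}

// Gate check using the 88% threshold from the list above.
function meetsGate(csv: string, threshold = 0.88): boolean {
	return branchCoverage(csv) >= threshold;
}
```

Summing missed and covered counts across all rows before dividing matters: averaging per-class percentages would overweight small classes.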

@@ -1,97 +1,5 @@
# Docs — Familienarchiv
# docs/

## Overview
→ See [docs/README.md](./README.md) for the folder structure and documentation guide.

Project documentation organized into four categories: architecture decision records (ADRs), system architecture diagrams, infrastructure runbooks, and detailed UI/UX specifications.

## Folder Structure

```
docs/
├── adr/ # Architecture Decision Records
├── architecture/ # C4 model diagrams and system architecture docs
├── infrastructure/ # Deployment, CI/CD, and ops guides
├── specs/ # UI/UX feature specifications (HTML)
├── app-analysis-*.md # Application analysis reports
├── mail.md # Mail system documentation
├── security-guide.md # Security policies and hardening guide
├── STYLEGUIDE.md # Coding and design style guide
├── TODO-backend.md # Backend backlog
└── TODO-frontend.md # Frontend backlog
```

## ADR (`adr/`)

Architecture Decision Records capture major technical decisions and their rationale.

| ADR | Title | Status |
|---|---|---|
| `001-ocr-python-microservice.md` | OCR as a separate Python container | Accepted |
| `002-polygon-jsonb-storage.md` | Polygon coordinates in JSONB columns | Accepted |
| `003-chronik-unified-activity-feed.md` | Unified activity feed (Chronik) | Accepted |

When making a significant architectural change (new service, data model change, technology swap), write a new ADR following the format:
- Status (Proposed / Accepted / Deprecated / Superseded)
- Context (forces at play)
- Decision (what we decided)
- Consequences (trade-offs)
- Alternatives Considered (table format)

## Architecture (`architecture/`)

Contains C4 model diagrams describing the system at different zoom levels:

- **Context diagram** — How Familienarchiv fits into the user and system ecosystem
- **Container diagram** — The high-level technology choices (Spring Boot, SvelteKit, PostgreSQL, MinIO, OCR service)
- **Component diagram** — Major structural components within the backend

Written in Markdown with embedded Mermaid or PlantUML diagrams (`c4-diagrams.md`).

## Infrastructure (`infrastructure/`)

Operational documentation for running Familienarchiv in production and CI.

| Document | Purpose |
|---|---|
| `ci-gitea.md` | Gitea CI/CD pipeline configuration |
| `production-compose.md` | Production Docker Compose setup |
| `s3-migration.md` | Migrating documents between S3 buckets |
| `self-hosted-catalogue.md` | Self-hosted software catalogue |

## Specs (`specs/`)

High-fidelity UI/UX specifications written as standalone HTML files. These are design documents that describe exact layout, interactions, and responsive behavior before implementation.

Each spec typically includes:
- Visual mockups with CSS-in-HTML styling
- Interaction flows and state transitions
- Responsive breakpoint behavior
- Accessibility requirements

Examples of active spec areas:
- Document detail page (`document-topbar-*.html`, `documents-page-spec.html`)
- Admin interfaces (`admin-redesign-*.html`, `admin-tag-overhaul.html`)
- Transcription workflows (`inline-transcription-*.html`, `annotation-transcription-*.html`)
- Dashboard and activity feeds (`dashboard-*.html`, `chronik-spec.html`)
- OCR admin (`ocr-admin-spec.html`)

## How to Use

1. **Before implementing a feature**, check `specs/` for an existing specification.
2. **When proposing a new architecture**, draft an ADR in `adr/` and discuss before coding.
3. **When deploying**, follow `infrastructure/production-compose.md`.
4. **Keep TODO files updated** — they serve as lightweight backlogs.

## Style Guide

`STYLEGUIDE.md` covers:
- Code formatting and linting rules
- Component naming conventions
- Color palette and typography
- Accessibility standards (WCAG 2.1 AA)

## Contributing

- ADRs should be sequential (`NNN-descriptive-name.md`).
- Specs should be self-contained HTML files viewable in a browser.
- Infrastructure docs should include copy-pasteable commands.
**LLM reminder:** ADRs are sequential — use the next number after the highest existing one in `docs/adr/`. When making a significant architectural change (new service, data model change, technology swap), write a new ADR before implementing.

86
docs/README.md
Normal file
@@ -0,0 +1,86 @@
# docs/

Project documentation organised into four categories: architecture decision records (ADRs), system architecture diagrams, infrastructure runbooks, and detailed UI/UX specifications.

## Folder structure

```
docs/
├── adr/ # Architecture Decision Records
├── architecture/ # C4 model diagrams and system architecture docs
├── infrastructure/ # Deployment, CI/CD, and ops guides
├── specs/ # UI/UX feature specifications (HTML)
├── ARCHITECTURE.md # Human-readable architecture overview (DOC-2)
├── DEPLOYMENT.md # Day-1 checklist and operational reference (DOC-5)
├── GLOSSARY.md # Domain terminology (DOC-3)
├── security-guide.md # Security policies and hardening guide
└── STYLEGUIDE.md # Coding and design style guide
```

## ADR (`adr/`)

Architecture Decision Records capture major technical decisions and their rationale.

| ADR | Title | Status |
| -------------------------------------- | ------------------------------------ | -------- |
| `001-ocr-python-microservice.md` | OCR as a separate Python container | Accepted |
| `002-polygon-jsonb-storage.md` | Polygon coordinates in JSONB columns | Accepted |
| `003-chronik-unified-activity-feed.md` | Unified activity feed (Chronik) | Accepted |

When making a significant architectural change (new service, data model change, technology swap), write a new ADR:

- **Status** (Proposed / Accepted / Deprecated / Superseded)
- **Context** (forces at play)
- **Decision** (what we decided)
- **Consequences** (trade-offs)
- **Alternatives Considered** (table format)

ADRs are sequential (`NNN-descriptive-name.md`). Do not reuse numbers.
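
The numbering rule (next number after the highest existing one, never reuse) can be checked mechanically. A minimal sketch, assuming every ADR file name starts with a three-digit prefix followed by `-`; `nextAdrNumber` is a hypothetical helper, not an existing project script.

```typescript
// Returns the next zero-padded ADR number, based on the highest existing prefix.
function nextAdrNumber(fileNames: string[]): string {
	let highest = 0;
	for (const name of fileNames) {
		const match = /^(\d{3})-/.exec(name);
		if (match) highest = Math.max(highest, Number(match[1]));
	}
	// Gaps are never refilled: numbering continues after the highest prefix.
	return String(highest + 1).padStart(3, '0');
}
```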

## Architecture (`architecture/`)

Contains C4 model diagrams describing the system at different zoom levels:

- **Context diagram** — How Familienarchiv fits into the user and system ecosystem
- **Container diagram** — The high-level technology choices (Spring Boot, SvelteKit, PostgreSQL, MinIO, OCR service)
- **Component diagram** — Major structural components within the backend

Written in Markdown with embedded Mermaid diagrams (`c4-diagrams.md`). Gitea renders these automatically.

For the human-readable architecture narrative, see [`docs/ARCHITECTURE.md`](ARCHITECTURE.md).

## Infrastructure (`infrastructure/`)

Operational documentation for running Familienarchiv in production and CI.

| Document | Purpose |
| -------------------------- | ---------------------------------------------------- |
| `ci-gitea.md` | Gitea CI/CD pipeline configuration |
| `production-compose.md` | Production Docker Compose setup and VPS provisioning |
| `s3-migration.md` | Migrating documents between S3 buckets |
| `self-hosted-catalogue.md` | Self-hosted software catalogue |

For the day-1 deployment checklist, see [`docs/DEPLOYMENT.md`](DEPLOYMENT.md).

## Specs (`specs/`)

High-fidelity UI/UX specifications written as standalone HTML files. These are design documents describing exact layout, interactions, and responsive behavior before implementation.

Each spec typically includes:

- Visual mockups with CSS-in-HTML styling
- Interaction flows and state transitions
- Responsive breakpoint behavior
- Accessibility requirements

Before implementing a feature, check `specs/` for an existing specification.

## Style Guide

[`docs/STYLEGUIDE.md`](STYLEGUIDE.md) covers:

- Code formatting and linting rules
- Component naming conventions
- Color palette and typography
- Accessibility standards (WCAG 2.1 AA)
1
familienarchiv-408
Submodule
Submodule familienarchiv-408 added at 6ecff120e6
@@ -71,29 +71,13 @@ src/
└── ... # Other SvelteKit config files
```

For per-domain component inventories, see the domain READMEs in `src/lib/<domain>/README.md`.

## API Client Pattern

All server-side API calls use the typed client from `$lib/api.server.ts`:
→ See [CONTRIBUTING.md §Frontend API client](../CONTRIBUTING.md#frontend-api-client)

```typescript
const api = createApiClient(fetch);
const result = await api.GET('/api/persons/{id}', { params: { path: { id } } });

// Always check via response.ok, NOT result.error
if (!result.response.ok) {
	const code = (result.error as unknown as { code?: string })?.code;
	throw error(result.response.status, getErrorMessage(code));
}
return { person: result.data! };
```

Key rules:

- Use `!result.response.ok` for error checking (not `if (result.error)` — breaks when spec has no error responses defined)
- Cast errors as `result.error as unknown as { code?: string }` to extract backend error code
- Use `result.data!` after an ok check

For multipart/form-data (file uploads), bypass the typed client and use raw `fetch`.
**LLM reminder:** check `!result.response.ok` (not `result.error` — breaks when spec has no error responses); cast errors as `result.error as unknown as { code?: string }`; use `result.data!` after an ok check. For multipart/form-data (file uploads), bypass the typed client and use raw `fetch`.
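
The double cast `as unknown as { code?: string }` silences the compiler without checking anything at runtime. A hedged alternative is a small narrowing helper; it assumes, as the cast does, that the backend error body is an object with an optional string `code`. `extractErrorCode` is a hypothetical name, not part of the project's API client.

```typescript
// Safely reads the backend error code from the `error` field of an API result.
function extractErrorCode(error: unknown): string | undefined {
	if (typeof error === 'object' && error !== null && 'code' in error) {
		const code = (error as { code: unknown }).code;
		return typeof code === 'string' ? code : undefined;
	}
	// Missing, non-object, or code-less bodies yield undefined.
	return undefined;
}
```

With it, the error branch above would read `const code = extractErrorCode(result.error);`, and a malformed body simply yields `undefined`, which `getErrorMessage` already has to accept.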

## Form Actions Pattern

@@ -102,7 +86,7 @@ For multipart/form-data (file uploads), bypass the typed client and use raw `fet
export const actions = {
	default: async ({ request, fetch }) => {
		const formData = await request.formData();
		const name = formData.get('name') as string;
		const name = formData.get('name') as string; // cast needed — FormData returns FormDataEntryValue
		// ...
		return fail(400, { error: 'message' }); // on error
		throw redirect(303, '/target'); // on success
@@ -112,13 +96,9 @@ export const actions = {

## Date Handling

- **Forms**: German format `dd.mm.yyyy` with auto-dot insertion via `handleDateInput()`. A hidden `<input type="hidden" name="documentDate" value={dateIso}>` sends ISO to the backend.
- **Display**: Always use `Intl.DateTimeFormat` with `T12:00:00` suffix to prevent UTC off-by-one:
```typescript
new Intl.DateTimeFormat('de-DE', { day: 'numeric', month: 'long', year: 'numeric' }).format(
	new Date(doc.documentDate + 'T12:00:00')
);
```
→ See [CONTRIBUTING.md §Date handling](../CONTRIBUTING.md#date-handling)

**LLM reminder:** always append `T12:00:00` when constructing `new Date()` from an ISO date string — prevents UTC timezone off-by-one errors. Forms use German `dd.mm.yyyy` format via `handleDateInput()` with a hidden ISO input.
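
Why the `T12:00:00` suffix works: under the ECMAScript date-string rules, a date-only string such as `'1921-03-05'` is parsed as midnight UTC, so anywhere west of UTC it displays as the previous day, while a date-time string without an offset is parsed as local time. The sketch below pairs that with one possible German-input conversion; `autoInsertDots`, `germanToIso`, and `formatGermanDate` are hypothetical stand-ins, not the project's `handleDateInput()`.

```typescript
// Formats typed digits as dd.mm.yyyy, inserting the dots automatically.
function autoInsertDots(raw: string): string {
	const digits = raw.replace(/\D/g, '').slice(0, 8);
	if (digits.length <= 2) return digits;
	if (digits.length <= 4) return `${digits.slice(0, 2)}.${digits.slice(2)}`;
	return `${digits.slice(0, 2)}.${digits.slice(2, 4)}.${digits.slice(4)}`;
}

// Converts dd.mm.yyyy to yyyy-mm-dd for the hidden input; returns null if invalid.
function germanToIso(value: string): string | null {
	const match = /^(\d{2})\.(\d{2})\.(\d{4})$/.exec(value);
	if (!match) return null;
	const [, day, month, year] = match;
	const iso = `${year}-${month}-${day}`;
	// Reject impossible dates such as 31.02.1921 by round-tripping through Date at local noon.
	const check = new Date(`${iso}T12:00:00`);
	return check.getDate() === Number(day) && check.getMonth() + 1 === Number(month) ? iso : null;
}

// Display: parse at local noon so the calendar day never shifts.
function formatGermanDate(iso: string): string {
	return new Intl.DateTimeFormat('de-DE', { day: 'numeric', month: 'long', year: 'numeric' }).format(
		new Date(`${iso}T12:00:00`)
	);
}
```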

## Styling Conventions (Tailwind CSS 4)

@@ -146,15 +126,9 @@ Card pattern for content sections:

## Key UI Components

| Component | Location | Props | Description |
| -------------------- | ------------------------------ | --------------------------------------- | ------------------------------------------ |
| `PersonTypeahead` | `$lib/person/` | `name`, `label`, `value`, `initialName` | Single-person selector with typeahead |
| `PersonMultiSelect` | `$lib/person/` | `selectedPersons` (bind) | Chip-based multi-person selector |
| `TagInput` | `$lib/tag/` | `tags` (bind), `allowCreation?` | Tag chip input with typeahead |
| `PdfViewer` | `$lib/document/` | `url`, `annotations` | PDF rendering with annotation overlay |
| `TranscriptionBlock` | `$lib/document/transcription/` | `block`, `mode` | Read/edit transcription block |
| `DocumentTopBar` | `$lib/document/` | `document` | Responsive document metadata header |
| `BackButton` | `$lib/shared/primitives/` | — | Calls `history.back()`; 44 px touch target |
→ See per-domain READMEs: [`src/lib/person/README.md`](src/lib/person/README.md), [`src/lib/tag/README.md`](src/lib/tag/README.md), [`src/lib/document/README.md`](src/lib/document/README.md), [`src/lib/shared/README.md`](src/lib/shared/README.md)

**LLM reminder:** `BackButton` is at `$lib/shared/primitives/BackButton.svelte` — use it for all back navigation; never a static `<a href>`. API client is at `$lib/shared/api.server`.

## How to Run

@@ -1,154 +1,7 @@
# OCR Service — Familienarchiv
# OCR Service

## Overview
→ See [ocr-service/README.md](./README.md) for tech stack, architecture, endpoints, environment variables, local development, testing, and training.

Python FastAPI microservice that performs OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition) on historical family documents. It exposes a simple HTTP API consumed by the Spring Boot backend. The service is stateless — all job tracking and business logic remain in Java.
**LLM reminder:** the OCR service is a **single-node container** — training reloads the model in-process, so multiple replicas cause model-state divergence (see ADR-001). All job tracking and business logic stay in Spring Boot; the Python service is stateless OCR only.

## Tech Stack

- **Framework**: FastAPI 0.115.6 (Python 3.11)
- **OCR Engines**:
  - **Surya** (`surya-ocr`) — Transformer-based, handles typewritten and modern Latin handwriting
  - **Kraken** (`kraken==7.0`) — Historical HTR model support, required for pre-1941 German Kurrent/Sütterlin scripts
- **ML**: PyTorch 2.7.1 (CPU-only), torchvision, transformers
- **PDF Processing**: `pypdfium2` (rendering), `pillow`
- **Image Processing**: `opencv-python-headless`, `pyvips`
- **Spell Checking**: `pyspellchecker`
- **HTTP Client**: `httpx`

## Architecture

The service is a single-node container (see ADR-001). OCR training reloads the model in-process after each run, so multiple replicas would cause training conflicts and model-state divergence.

### Interface Contract

**Request:**
```json
{
  "pdfUrl": "http://minio:9000/archive-documents/abc.pdf?presigned...",
  "scriptType": "HANDWRITING_KURRENT",
  "language": "de"
}
```

**Response:** Array of `OcrBlock` objects:
```json
[
  {
    "pageNumber": 0,
    "x": 0.12, "y": 0.08, "width": 0.76, "height": 0.04,
    "polygon": [[0.12,0.08],[0.88,0.09],[0.87,0.12],[0.13,0.11]],
    "text": "Sehr geehrter Herr ..."
  }
]
```

Coordinates are normalized (0-1) relative to page dimensions.
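
Since block coordinates are normalized, anything that draws them (an annotation overlay, for instance) has to scale them to the rendered page size. A sketch with types transcribed from the JSON example above; `toPixels`, `PixelBox`, and the page-size arguments are illustrative assumptions.

```typescript
// Shape of one OcrBlock as shown in the response example above.
interface OcrBlock {
	pageNumber: number;
	x: number;
	y: number;
	width: number;
	height: number;
	polygon: [number, number][];
	text: string;
}

interface PixelBox {
	left: number;
	top: number;
	width: number;
	height: number;
	points: [number, number][];
}

// Scales normalized (0-1) coordinates to a page rendered at pageWidth x pageHeight pixels.
function toPixels(block: OcrBlock, pageWidth: number, pageHeight: number): PixelBox {
	return {
		left: block.x * pageWidth,
		top: block.y * pageHeight,
		width: block.width * pageWidth,
		height: block.height * pageHeight,
		points: block.polygon.map(([px, py]): [number, number] => [px * pageWidth, py * pageHeight])
	};
}
```

Keeping the stored values normalized means the same block renders correctly at any zoom level; only this final scaling step depends on the viewport.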

### File Structure

```
ocr-service/
├── main.py # FastAPI app, endpoints, request handling
├── models.py # Pydantic models (OcrRequest, OcrBlock)
├── engines/
│   ├── __init__.py
│   ├── kraken.py # Kraken engine wrapper (Kurrent models)
│   └── surya.py # Surya engine wrapper (typewritten/Latin)
├── preprocessing.py # Image preprocessing (CLAHE, deskew, denoise)
├── confidence.py # Confidence scoring and thresholding
├── spell_check.py # Post-OCR spell correction
├── ensure_blla_model.py # Model download / verification helper
├── dictionaries/ # Historical word lists for spell checking
├── requirements.txt # Python dependencies
├── Dockerfile # Production container image
└── entrypoint.sh # Container startup script
```

### Key Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Returns 200 only after models are loaded |
| `/ocr` | POST | Extract text blocks from a PDF URL |
| `/ocr/stream` | POST | Streaming OCR with SSE-style progress events |
| `/training/submit` | POST | Submit training data for model fine-tuning |
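
Because `/health` returns 200 only once models are loaded, anything that depends on the service should poll it rather than assume readiness. A sketch of such a wait loop; the fetch function is injected so the logic can run without a container, the 2-second interval is an arbitrary assumption, and the 120-second default mirrors the container's startup grace period. `waitForHealthy` is a hypothetical helper.

```typescript
// Minimal fetch shape the loop needs; the global fetch satisfies it.
type FetchLike = (url: string) => Promise<{ status: number }>;

// Polls `${baseUrl}/health` until it returns 200 or the timeout elapses.
async function waitForHealthy(
	baseUrl: string,
	fetchFn: FetchLike,
	timeoutMs = 120_000,
	intervalMs = 2_000
): Promise<boolean> {
	const deadline = Date.now() + timeoutMs;
	while (Date.now() < deadline) {
		try {
			const response = await fetchFn(`${baseUrl}/health`);
			if (response.status === 200) return true;
		} catch {
			// Connection refused while the container is still starting: keep polling.
		}
		await new Promise((resolve) => setTimeout(resolve, intervalMs));
	}
	return false;
}
```

Against a running stack this might be called as `await waitForHealthy('http://ocr-service:8000', fetch)`, assuming the caller sits on the same Docker network, since port 8000 is not mapped to the host by default.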

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `KRAKEN_MODEL_PATH` | `/app/models/german_kurrent.mlmodel` | Path to Kraken model file |
| `TRAINING_TOKEN` | `""` | Bearer token required for training endpoints |
| `OCR_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for Latin scripts |
| `OCR_CONFIDENCE_THRESHOLD_KURRENT` | `0.5` | Minimum confidence for Kurrent scripts |
| `RECOGNITION_BATCH_SIZE` | `16` | Kraken recognition batch size |
| `DETECTOR_BATCH_SIZE` | `8` | Surya detector batch size |
| `OCR_CLAHE_CLIP_LIMIT` | `2.0` | CLAHE contrast enhancement limit |
| `OCR_CLAHE_TILE_SIZE` | `8` | CLAHE tile grid size |
| `OCR_MAX_CACHED_MODELS` | `2` | LRU model cache size (~500 MB each) |
| `ALLOWED_PDF_HOSTS` | `minio,localhost,127.0.0.1` | SSRF protection — allowed PDF URL hosts |
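
`OCR_MAX_CACHED_MODELS` caps how many models (roughly 500 MB each) stay resident. The least-recently-used policy behind such a cap is sketched below with a JavaScript `Map`, which iterates in insertion order; this illustrates the eviction rule only and is not the Python service's implementation. The loader callback and model keys are hypothetical.

```typescript
// Least-recently-used cache: a Map iterates in insertion order, so the first key is the oldest.
class LruCache<K, V> {
	private readonly entries = new Map<K, V>();
	private readonly capacity: number;

	constructor(capacity: number) {
		this.capacity = capacity;
	}

	// Returns the cached value, loading (and possibly evicting) on a miss.
	get(key: K, load: (key: K) => V): V {
		if (this.entries.has(key)) {
			const cached = this.entries.get(key) as V;
			// Re-insert to mark the entry as most recently used.
			this.entries.delete(key);
			this.entries.set(key, cached);
			return cached;
		}
		const value = load(key);
		this.entries.set(key, value);
		if (this.entries.size > this.capacity) {
			const oldest = this.entries.keys().next().value;
			if (oldest !== undefined) this.entries.delete(oldest);
		}
		return value;
	}

	// Keys ordered from least to most recently used.
	keys(): K[] {
		return Array.from(this.entries.keys());
	}
}
```

With a capacity of 2, loading a third distinct model evicts whichever of the two cached models was used least recently, which keeps resident model memory at roughly 1 GB.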

## How to Run

### Local Development (Python venv)

```bash
cd ocr-service
python -m venv .venv
source .venv/bin/activate

# Install PyTorch CPU first (saves ~2 GB vs CUDA)
pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cpu

# Install remaining dependencies
pip install -r requirements.txt

# Run development server
fastapi dev main.py --host 0.0.0.0 --port 8000

# Or production mode
uvicorn main:app --host 0.0.0.0 --port 8000
```

### Docker (via docker-compose)

The OCR service is included in the root `docker-compose.yml`:

```bash
docker-compose up -d ocr-service
```

The container:
- Exposes port 8000 internally (not mapped to host by default)
- Mounts `ocr_models` and `ocr_cache` volumes for persistence
- Has a 120-second startup grace period for model loading
- Memory limit: 12 GB

### Model Downloads

Use the helper script to download Kraken models:

```bash
./scripts/download-kraken-models.sh
```

Models are stored in the `ocr_models` Docker volume or `./ocr-service/models/` locally.

## Testing

Only a subset of tests can run without the full ML stack:

```bash
cd ocr-service
pip install pytest pytest-asyncio pyspellchecker

# No ML required — pure logic tests
python -m pytest test_spell_check.py test_confidence.py test_sender_registry.py -v
```

Tests requiring PyTorch/Kraken/Surya (e.g., `test_engines.py`) must be run in the Docker container or a fully provisioned venv.

## Training

The service supports in-process model fine-tuning via Kraken's `ketos` training pipeline. Training endpoints require the `TRAINING_TOKEN` bearer token. After training completes, the model is reloaded in-process — this is why only a single replica is supported.
`ALLOWED_PDF_HOSTS` must never be set to `*` — that opens SSRF. The default (`minio,localhost,127.0.0.1`) is correct for dev.
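
An allowlist only protects against SSRF if it compares the parsed host exactly rather than matching substrings of the URL (a substring check for `minio` would also accept `http://minio.attacker.example/`). A TypeScript sketch of the idea follows; the real check lives in the Python service and may differ. It assumes the variable is a comma-separated list, as its default suggests, and it fails closed if `*` appears. `parseAllowedHosts` and `isAllowedPdfUrl` are hypothetical names.

```typescript
// Parses the comma-separated ALLOWED_PDF_HOSTS value into a set of lowercase host names.
function parseAllowedHosts(raw: string): Set<string> {
	return new Set(
		raw
			.split(',')
			.map((host) => host.trim().toLowerCase())
			.filter((host) => host.length > 0)
	);
}

// Accepts only http(s) URLs whose exact hostname is on the allowlist.
function isAllowedPdfUrl(pdfUrl: string, allowedHosts: Set<string>): boolean {
	let url: URL;
	try {
		url = new URL(pdfUrl);
	} catch {
		return false; // Not a parseable absolute URL.
	}
	if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
	// Fail closed: a wildcard entry is treated as a misconfiguration, not as "allow all".
	if (allowedHosts.has('*')) return false;
	return allowedHosts.has(url.hostname.toLowerCase());
}
```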

@@ -1,144 +1,5 @@
# Scripts — Familienarchiv
# scripts/

## Overview
→ See [scripts/README.md](./README.md) for the full list of scripts, their purpose, and usage.

Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.

## Scripts

### `reset-db.sh`
**Purpose**: Hard-reset the development database, wiping all documents, persons, tags, and related data.

**Usage:**
```bash
./scripts/reset-db.sh
# Type 'yes' to confirm
```

**What it truncates:**
- `transcription_block_versions`
- `transcription_blocks`
- `comment_mentions`
- `document_comments`
- `document_annotations`
- `document_versions`
- `notifications`
- `documents`
- `person_name_aliases`
- `persons`
- `tag`

> ⚠️ **Destructive operation** — only for development!

---

### `rebuild-frontend.sh`
**Purpose**: Force a clean rebuild of the frontend Docker container.

**Usage:**
```bash
./scripts/rebuild-frontend.sh
```

---

### `download-kraken-models.sh`
**Purpose**: Download Kraken HTR models for German Kurrent and Sütterlin scripts.

**Usage:**
```bash
./scripts/download-kraken-models.sh
```

Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100-500 MB each.

---

### `download-paperless.sh`
**Purpose**: Download exported documents from a Paperless-ngx instance.

**Usage:**
```bash
./scripts/download-paperless.sh
```

Requires environment variables or config for the Paperless API endpoint and token.

---

### `flatten-paperless.sh`
**Purpose**: Flatten nested Paperless export directories into a single import-ready structure.

**Usage:**
```bash
./scripts/flatten-paperless.sh
```

---

### `generate_data.py`
**Purpose**: Generate synthetic test data for development.

**Usage:**
```bash
python scripts/generate_data.py
```

Generates fake documents, persons, and tags suitable for load testing or UI development.

---

### `prepare_historical_dict.py`
**Purpose**: Build a historical German word dictionary for the OCR spell-checker.

**Usage:**
```bash
python scripts/prepare_historical_dict.py
```

Processes raw word lists into the format expected by `ocr-service/spell_check.py`.

---

### `schema.sql`
**Purpose**: Complete database schema dump for reference.

**Note**: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.

---

### `large-data.sql`
**Purpose**: Pre-seeded dataset with a large number of documents for performance testing.

**Usage:**
```bash
# Import into PostgreSQL
docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
```

## How to Use

Most scripts should be run from the **repository root**:

```bash
# Database reset
./scripts/reset-db.sh

# Model download
./scripts/download-kraken-models.sh

# Data generation
cd scripts && python generate_data.py
```

Ensure scripts are executable:
```bash
chmod +x scripts/*.sh
```

## Adding New Scripts

1. Place the script in `scripts/`
2. Add a header comment describing purpose and usage
3. Make it executable (`chmod +x`)
4. Document it in this `CLAUDE.md`
**LLM reminder:** when adding a new script, document it in `scripts/README.md` (not here).

161
scripts/README.md
Normal file
@@ -0,0 +1,161 @@
# scripts/

Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.

## Scripts

### `reset-db.sh`

**Purpose**: Hard-reset the development database, wiping all documents, persons, tags, and related data.

**Usage:**

```bash
./scripts/reset-db.sh
# Type 'yes' to confirm
```

**What it truncates:**

- `transcription_block_versions`
- `transcription_blocks`
- `comment_mentions`
- `document_comments`
- `document_annotations`
- `document_versions`
- `notifications`
- `documents`
- `person_name_aliases`
- `persons`
- `tag`

> ⚠️ **Destructive operation — only for development!** This wipes ALL data. Not reversible without a backup.

---

### `rebuild-frontend.sh`

**Purpose**: Force a clean rebuild of the frontend Docker container.

**Usage:**

```bash
./scripts/rebuild-frontend.sh
```

---

### `download-kraken-models.sh`

**Purpose**: Download Kraken HTR models for German Kurrent and Sütterlin scripts.

**Usage:**

```bash
./scripts/download-kraken-models.sh
```

Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100–500 MB each.

---

### `download-paperless.sh`

**Purpose**: Download exported documents from a Paperless-ngx instance.

**Usage:**

```bash
./scripts/download-paperless.sh
```

Requires environment variables or config for the Paperless API endpoint and token.

---

### `flatten-paperless.sh`

**Purpose**: Flatten nested Paperless export directories into a single import-ready structure.

**Usage:**

```bash
./scripts/flatten-paperless.sh
```

---

### `generate_data.py`

**Purpose**: Generate synthetic test data for development.

**Usage:**

```bash
python scripts/generate_data.py
```

Generates fake documents, persons, and tags suitable for load testing or UI development.

---

### `prepare_historical_dict.py`

**Purpose**: Build a historical German word dictionary for the OCR spell-checker.

**Usage:**

```bash
python scripts/prepare_historical_dict.py
```

Processes raw word lists into the format expected by `ocr-service/spell_check.py`.

---

### `schema.sql`

**Purpose**: Complete database schema dump for reference.

**Note**: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.

---

### `large-data.sql`

**Purpose**: Pre-seeded dataset with a large number of documents for performance testing.

**Usage:**

```bash
# Import into PostgreSQL
docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
```

## How to Use

Most scripts should be run from the **repository root**:

```bash
# Database reset
./scripts/reset-db.sh

# Model download
./scripts/download-kraken-models.sh

# Data generation
cd scripts && python generate_data.py
```

Ensure scripts are executable:

```bash
chmod +x scripts/*.sh
```

## Adding New Scripts

1. Place the script in `scripts/`
2. Add a header comment describing purpose and usage
3. Make it executable (`chmod +x`)
4. Document it in this `README.md`