docs(legibility): migrate CLAUDE.md rules into human docs — DOC-7
Processes all 7 CLAUDE.md files according to the 3-bucket classification. Migration targets (CONTRIBUTING.md, docs/ARCHITECTURE.md, docs/DEPLOYMENT.md, domain READMEs) are introduced by DOC-2/4/5/6, so this PR must merge last.

### scripts/CLAUDE.md → scripts/README.md

New `scripts/README.md` with full script documentation (preserving the ⚠️ destructive-operation warning on reset-db.sh). `scripts/CLAUDE.md` reduced to a pointer + "document new scripts in README.md" reminder.

### .devcontainer/CLAUDE.md → .devcontainer/README.md

New `.devcontainer/README.md` with all configuration, usage, and limitations. `.devcontainer/CLAUDE.md` reduced to a single pointer line.

### docs/CLAUDE.md → docs/README.md

New `docs/README.md` covering the folder structure, ADR guide, infrastructure docs, and specs folder. `docs/CLAUDE.md` reduced to pointer + ADR reminder.

### ocr-service/CLAUDE.md

Reduced to pointer to `ocr-service/README.md` (content migrated in DOC-6). Kept LLM reminders: single-node constraint, ALLOWED_PDF_HOSTS SSRF risk.

### backend/CLAUDE.md

- Layering Rules → pointer to docs/ARCHITECTURE.md
- Error Handling → pointer to CONTRIBUTING.md + reminder
- Security/Permissions → pointer to docs/ARCHITECTURE.md + reminder
- Package Structure → tagged TODO post-REFACTOR-1
- Fixed errors.ts path to frontend/src/lib/shared/errors.ts
- Added ANNOTATE_ALL + BLOG_WRITE to permission list
- Key Entities, Entity Code Style, Services → kept (Bucket-2)

### root CLAUDE.md

- Stack, Infrastructure, Dev Container → pointers
- Layering Rules, Error Handling, Security, OpenAPI, API Client, Date Handling, UI Components, Frontend Error Handling → pointers + reminders
- Package Structure → tagged TODO post-REFACTOR-1
- Domain Model, Entity Code Style, Form Actions, Styling → kept (Bucket-2)

### frontend/CLAUDE.md

- API Client Pattern, Date Handling → pointers + reminders
- Key UI Components → pointer to domain READMEs
- Styling, Form Actions, How to Run, Vite Proxy, i18n → kept (Bucket-2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -1,96 +1,3 @@
|
|||||||
# Dev Container — Familienarchiv
|
# Dev Container
|
||||||
|
|
||||||
## Overview
|
→ See [.devcontainer/README.md](./README.md) for configuration, usage, and known limitations.
|
||||||
|
|
||||||
VS Code Dev Container configuration for a pre-configured development environment. Includes Java 21, Maven, and Node.js 24 — everything needed to work on both backend and frontend.
|
|
||||||
|
|
||||||
## Configuration
|
|
||||||
|
|
||||||
File: `.devcontainer/devcontainer.json`
|
|
||||||
|
|
||||||
### Included Features
|
|
||||||
|
|
||||||
| Feature | Version | Purpose |
|
|
||||||
|---|---|---|
|
|
||||||
| Java | 21 | Spring Boot backend |
|
|
||||||
| Maven | bundled with Java feature | Build tool |
|
|
||||||
| Node.js | 24 | SvelteKit frontend |
|
|
||||||
|
|
||||||
### VS Code Extensions (Auto-installed)
|
|
||||||
|
|
||||||
| Extension | Purpose |
|
|
||||||
|---|---|
|
|
||||||
| `vscjava.vscode-java-pack` | Java language support, debugging, testing |
|
|
||||||
| `vmware.vscode-spring-boot` | Spring Boot tooling |
|
|
||||||
| `gabrielbb.vscode-lombok` | Lombok annotation support |
|
|
||||||
| `humao.rest-client` | HTTP request files (for `backend/api_tests/`) |
|
|
||||||
|
|
||||||
### Ports
|
|
||||||
|
|
||||||
- `8080` forwarded to host — access backend at `http://localhost:8080`
|
|
||||||
|
|
||||||
### User
|
|
||||||
|
|
||||||
Runs as `vscode` user (not root) for security.
|
|
||||||
|
|
||||||
## How to Use
|
|
||||||
|
|
||||||
### Prerequisites
|
|
||||||
|
|
||||||
- VS Code with the **Dev Containers** extension installed
|
|
||||||
- Docker running locally
|
|
||||||
|
|
||||||
### Open in Dev Container
|
|
||||||
|
|
||||||
1. Open the project in VS Code
|
|
||||||
2. Press `F1` → type "Dev Containers: Reopen in Container"
|
|
||||||
3. VS Code will:
|
|
||||||
- Build the container using the root `docker-compose.yml`
|
|
||||||
- Install Java 21, Maven, and Node 24
|
|
||||||
- Install the listed extensions
|
|
||||||
- Mount the workspace folder
|
|
||||||
|
|
||||||
### Working Inside the Container
|
|
||||||
|
|
||||||
Once inside the container, you have access to both stacks:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Backend
|
|
||||||
cd backend
|
|
||||||
./mvnw spring-boot:run
|
|
||||||
|
|
||||||
# Frontend (in a new terminal)
|
|
||||||
cd frontend
|
|
||||||
npm install
|
|
||||||
npm run dev
|
|
||||||
```
|
|
||||||
|
|
||||||
The container reuses the `docker-compose.yml` services, so PostgreSQL and MinIO are available automatically.
|
|
||||||
|
|
||||||
### Forwarding Frontend Port
|
|
||||||
|
|
||||||
The devcontainer config only forwards port 8080 by default. To access the frontend dev server (port 5173 or 3000), either:
|
|
||||||
|
|
||||||
1. Add `5173` to `forwardPorts` in `devcontainer.json`, or
|
|
||||||
2. Use the VS Code "Ports" panel to forward it dynamically
|
|
||||||
|
|
||||||
## Limitations
|
|
||||||
|
|
||||||
- The devcontainer attaches to the `backend` service from `docker-compose.yml`, so it inherits those environment variables
|
|
||||||
- OCR service and other containers should be started separately via `docker-compose up -d`
|
|
||||||
- GPU passthrough for OCR training is not configured
|
|
||||||
|
|
||||||
## Customization
|
|
||||||
|
|
||||||
To add more tools or extensions, edit `.devcontainer/devcontainer.json`:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"features": {
|
|
||||||
"ghcr.io/devcontainers/features/python:1": {
|
|
||||||
"version": "3.11"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"forwardPorts": [8080, 5173, 3000]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|||||||
94
.devcontainer/README.md
Normal file
@@ -0,0 +1,94 @@
|
|||||||
|
# Dev Container — Familienarchiv
|
||||||
|
|
||||||
|
VS Code Dev Container configuration for a pre-configured development environment. Includes Java 21, Maven, and Node.js 24 — everything needed to work on both backend and frontend.
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
File: `.devcontainer/devcontainer.json`
|
||||||
|
|
||||||
|
### Included Features
|
||||||
|
|
||||||
|
| Feature | Version | Purpose |
|
||||||
|
| ------- | ------------------------- | ------------------- |
|
||||||
|
| Java | 21 | Spring Boot backend |
|
||||||
|
| Maven | bundled with Java feature | Build tool |
|
||||||
|
| Node.js | 24 | SvelteKit frontend |
|
||||||
|
|
||||||
|
### VS Code Extensions (Auto-installed)
|
||||||
|
|
||||||
|
| Extension | Purpose |
|
||||||
|
| --------------------------- | --------------------------------------------- |
|
||||||
|
| `vscjava.vscode-java-pack` | Java language support, debugging, testing |
|
||||||
|
| `vmware.vscode-spring-boot` | Spring Boot tooling |
|
||||||
|
| `gabrielbb.vscode-lombok` | Lombok annotation support |
|
||||||
|
| `humao.rest-client` | HTTP request files (for `backend/api_tests/`) |
|
||||||
|
|
||||||
|
### Ports
|
||||||
|
|
||||||
|
- `8080` forwarded to host — access backend at `http://localhost:8080`
|
||||||
|
|
||||||
|
### User
|
||||||
|
|
||||||
|
Runs as `vscode` user (not root) for security.
|
||||||
|
|
||||||
|
## How to Use
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
- VS Code with the **Dev Containers** extension installed
|
||||||
|
- Docker running locally
|
||||||
|
|
||||||
|
### Open in Dev Container
|
||||||
|
|
||||||
|
1. Open the project in VS Code
|
||||||
|
2. Press `F1` → type "Dev Containers: Reopen in Container"
|
||||||
|
3. VS Code will:
|
||||||
|
- Build the container using the root `docker-compose.yml`
|
||||||
|
- Install Java 21, Maven, and Node 24
|
||||||
|
- Install the listed extensions
|
||||||
|
- Mount the workspace folder
|
||||||
|
|
||||||
|
### Working Inside the Container
|
||||||
|
|
||||||
|
Once inside the container, you have access to both stacks:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Backend
|
||||||
|
cd backend
|
||||||
|
./mvnw spring-boot:run
|
||||||
|
|
||||||
|
# Frontend (in a new terminal)
|
||||||
|
cd frontend
|
||||||
|
npm install
|
||||||
|
npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
The container reuses the `docker-compose.yml` services, so PostgreSQL and MinIO are available automatically.
|
||||||
|
|
||||||
|
### Forwarding Frontend Port
|
||||||
|
|
||||||
|
The devcontainer config only forwards port 8080 by default. To access the frontend dev server (port 5173 or 3000), either:
|
||||||
|
|
||||||
|
1. Add `5173` to `forwardPorts` in `devcontainer.json`, or
|
||||||
|
2. Use the VS Code "Ports" panel to forward it dynamically
|
||||||
|
|
||||||
|
## Limitations
|
||||||
|
|
||||||
|
- The devcontainer attaches to the `backend` service from `docker-compose.yml`, so it inherits those environment variables
|
||||||
|
- OCR service and other containers should be started separately via `docker-compose up -d`
|
||||||
|
- GPU passthrough for OCR training is not configured
|
||||||
|
|
||||||
|
## Customization
|
||||||
|
|
||||||
|
To add more tools or extensions, edit `.devcontainer/devcontainer.json`:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"features": {
|
||||||
|
"ghcr.io/devcontainers/features/python:1": {
|
||||||
|
"version": "3.11"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"forwardPorts": [8080, 5173, 3000]
|
||||||
|
}
|
||||||
|
```
|
||||||
195
CLAUDE.md
@@ -4,6 +4,8 @@
|
|||||||
|
|
||||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
> For a human-readable project overview, see [README.md](./README.md).
|
||||||
|
|
||||||
## Project Overview
|
## Project Overview
|
||||||
|
|
||||||
**Familienarchiv** is a family document archival system — a full-stack web app for digitizing, organizing, and searching family documents. Key features: file uploads (stored in MinIO/S3), metadata management, Excel/ODS batch import, full-text search, conversation threads between family members, and role-based access control.
|
**Familienarchiv** is a family document archival system — a full-stack web app for digitizing, organizing, and searching family documents. Key features: file uploads (stored in MinIO/S3), metadata management, Excel/ODS batch import, full-text search, conversation threads between family members, and role-based access control.
|
||||||
@@ -18,6 +20,8 @@ See [CODESTYLE.md](./CODESTYLE.md) for coding standards: Clean Code, DRY/KISS tr
|
|||||||
|
|
||||||
## Stack
|
## Stack
|
||||||
|
|
||||||
|
→ See [README.md §Tech Stack](./README.md#tech-stack)
|
||||||
|
|
||||||
- **Backend**: Spring Boot 4.0 (Java 21, Maven, Jetty, JPA/Hibernate, Flyway, Spring Security, Spring Session JDBC)
|
- **Backend**: Spring Boot 4.0 (Java 21, Maven, Jetty, JPA/Hibernate, Flyway, Spring Security, Spring Session JDBC)
|
||||||
- **Frontend**: SvelteKit 2 with Svelte 5, TypeScript, Tailwind CSS 4, Paraglide.js (i18n: de/en/es)
|
- **Frontend**: SvelteKit 2 with Svelte 5, TypeScript, Tailwind CSS 4, Paraglide.js (i18n: de/en/es)
|
||||||
- **Database**: PostgreSQL 16
|
- **Database**: PostgreSQL 16
|
||||||
@@ -27,12 +31,13 @@ See [CODESTYLE.md](./CODESTYLE.md) for coding standards: Clean Code, DRY/KISS tr
|
|||||||
## Common Commands
|
## Common Commands
|
||||||
|
|
||||||
### Running the Full Stack
|
### Running the Full Stack
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# From repo root — starts PostgreSQL, MinIO, and Spring Boot backend
|
|
||||||
docker-compose up -d
|
docker-compose up -d
|
||||||
```
|
```
|
||||||
|
|
||||||
### Backend (Spring Boot)
|
### Backend (Spring Boot)
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd backend
|
cd backend
|
||||||
|
|
||||||
@@ -44,6 +49,7 @@ cd backend
|
|||||||
```
|
```
|
||||||
|
|
||||||
### Frontend (SvelteKit)
|
### Frontend (SvelteKit)
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd frontend
|
cd frontend
|
||||||
|
|
||||||
@@ -66,7 +72,7 @@ npm run generate:api # Regenerate TypeScript API types from OpenAPI spec
|
|||||||
|
|
||||||
### Package Structure
|
### Package Structure
|
||||||
|
|
||||||
Package-by-domain: each domain owns its controller, service, repository, entities, and DTOs.
|
<!-- TODO: rewrite post-REFACTOR-1 — see Epic 4 -->
|
||||||
|
|
||||||
```
|
```
|
||||||
backend/src/main/java/org/raddatz/familienarchiv/
|
backend/src/main/java/org/raddatz/familienarchiv/
|
||||||
@@ -90,27 +96,21 @@ backend/src/main/java/org/raddatz/familienarchiv/
|
|||||||
└── user/ User domain — AppUser, UserGroup, UserService, auth controllers
|
└── user/ User domain — AppUser, UserGroup, UserService, auth controllers
|
||||||
```
|
```
|
||||||
|
|
||||||
### Layering Rules (strictly enforced)
|
### Layering Rules
|
||||||
|
|
||||||
```
|
→ See [docs/ARCHITECTURE.md §Layering rule](./docs/ARCHITECTURE.md#layering-rule)
|
||||||
Controller → Service → Repository → DB
|
|
||||||
```
|
|
||||||
|
|
||||||
- **Controllers** never inject or call repositories directly.
|
**LLM reminder:** controllers never call repositories directly; services never reach into another domain's repository — always call the other domain's service instead.
|
||||||
- **Services** never reach into another domain's repository. Call the other domain's service instead.
|
|
||||||
- ✅ `DocumentService` → `PersonService.getById()` → `PersonRepository`
|
|
||||||
- ❌ `DocumentService` → `PersonRepository` directly
|
|
||||||
- This keeps domain boundaries clear and business logic testable in isolation.
|
|
||||||
|
|
||||||
### Domain Model
|
### Domain Model
|
||||||
|
|
||||||
| Entity | Table | Key relationships |
|
| Entity | Table | Key relationships |
|
||||||
|---|---|---|
|
| ----------- | ------------- | ------------------------------------------------------------------------------------- |
|
||||||
| `Document` | `documents` | ManyToOne `sender` (Person), ManyToMany `receivers` (Person), ManyToMany `tags` (Tag) |
|
| `Document` | `documents` | ManyToOne `sender` (Person), ManyToMany `receivers` (Person), ManyToMany `tags` (Tag) |
|
||||||
| `Person` | `persons` | Referenced by documents as sender/receiver |
|
| `Person` | `persons` | Referenced by documents as sender/receiver |
|
||||||
| `Tag` | `tag` | ManyToMany with documents via `document_tags` |
|
| `Tag` | `tag` | ManyToMany with documents via `document_tags` |
|
||||||
| `AppUser` | `app_users` | ManyToMany `groups` (UserGroup) |
|
| `AppUser` | `app_users` | ManyToMany `groups` (UserGroup) |
|
||||||
| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |
|
| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |
|
||||||
|
|
||||||
**`DocumentStatus` lifecycle:** `PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`
|
**`DocumentStatus` lifecycle:** `PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`
|
||||||
|
|
||||||
@@ -120,6 +120,7 @@ Controller → Service → Repository → DB
|
|||||||
### Entity Code Style
|
### Entity Code Style
|
||||||
|
|
||||||
All entities use these Lombok annotations:
|
All entities use these Lombok annotations:
|
||||||
|
|
||||||
```java
|
```java
|
||||||
@Entity
|
@Entity
|
||||||
@Table(name = "table_name")
|
@Table(name = "table_name")
|
||||||
@@ -148,65 +149,29 @@ Services are annotated with `@Service`, `@RequiredArgsConstructor`, and optional
|
|||||||
- Read methods are not annotated (default non-transactional is fine).
|
- Read methods are not annotated (default non-transactional is fine).
|
||||||
- Each service owns its domain's repository. Cross-domain data access goes through the other domain's service.
|
- Each service owns its domain's repository. Cross-domain data access goes through the other domain's service.
|
||||||
|
|
||||||
**Existing services:**
|
|
||||||
|
|
||||||
| Service | Responsibility |
|
|
||||||
|---|---|
|
|
||||||
| `DocumentService` | Document CRUD, search, tag cascade delete |
|
|
||||||
| `PersonService` | Person CRUD, find-or-create by alias |
|
|
||||||
| `TagService` | Tag find/create/update/delete |
|
|
||||||
| `UserService` | User and group CRUD |
|
|
||||||
| `FileService` | S3/MinIO upload and download |
|
|
||||||
| `MassImportService` | Async ODS/Excel import; delegates to PersonService and TagService |
|
|
||||||
|
|
||||||
### DTOs
|
### DTOs
|
||||||
|
|
||||||
Input DTOs live in `dto/`. Response types are the model entities themselves (no response DTOs).
|
Input DTOs live flat in the domain package. Response types are the model entities themselves (no response DTOs).
|
||||||
|
|
||||||
- `DocumentUpdateDTO` — used for both create and update (all fields optional)
|
- `@Schema(requiredMode = REQUIRED)` on every field the backend always populates — drives TypeScript generation.
|
||||||
- `CreateUserRequest` — user creation
|
|
||||||
- `GroupDTO` — group create/update
|
|
||||||
|
|
||||||
### Error Handling
|
### Error Handling
|
||||||
|
|
||||||
Use `DomainException` for all domain errors. Never throw raw exceptions from service methods.
|
→ See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)
|
||||||
|
|
||||||
```java
|
**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` from service methods — never throw raw exceptions. When adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) mirror in `frontend/src/lib/shared/errors.ts`, (3) add i18n keys in `messages/{de,en,es}.json`.
|
||||||
// Static factories match common HTTP status codes:
|
|
||||||
DomainException.notFound(ErrorCode.DOCUMENT_NOT_FOUND, "Document not found: " + id)
|
|
||||||
DomainException.forbidden("Access denied")
|
|
||||||
DomainException.conflict(ErrorCode.IMPORT_ALREADY_RUNNING, "Already running")
|
|
||||||
DomainException.internal(ErrorCode.FILE_UPLOAD_FAILED, "Upload failed: " + e.getMessage())
|
|
||||||
```
|
|
||||||
|
|
||||||
`ErrorCode` is an enum in `exception/ErrorCode.java`. When adding a new error case, add the value there **and** mirror it in the frontend's `src/lib/errors.ts` + add a Paraglide translation key.
|
|
||||||
|
|
||||||
For simple validation in controllers (not domain logic), `ResponseStatusException` is acceptable:
|
|
||||||
```java
|
|
||||||
throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "firstName is required");
|
|
||||||
```
|
|
||||||
|
|
||||||
### Security / Permissions
|
### Security / Permissions
|
||||||
|
|
||||||
Use `@RequirePermission` on controller methods (or the whole controller class):
|
→ See [docs/ARCHITECTURE.md §Permission system](./docs/ARCHITECTURE.md#permission-system)
|
||||||
|
|
||||||
```java
|
**LLM reminder:** `@RequirePermission(Permission.WRITE_ALL)` is **required** on every `POST`, `PUT`, `PATCH`, `DELETE` endpoint — not optional. Do not mix with Spring Security's `@PreAuthorize`. Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`, `ANNOTATE_ALL`, `BLOG_WRITE`.
|
||||||
@RequirePermission(Permission.WRITE_ALL)
|
|
||||||
public Document updateDocument(...) { ... }
|
|
||||||
```
|
|
||||||
|
|
||||||
Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`
|
|
||||||
|
|
||||||
`PermissionAspect` (AOP) checks the current user's `UserGroup.permissions` at runtime.
|
|
||||||
|
|
||||||
### OpenAPI / API Types
|
### OpenAPI / API Types
|
||||||
|
|
||||||
SpringDoc generates the spec at `/v3/api-docs` (only accessible when running with `--spring.profiles.active=dev`).
|
→ See [CONTRIBUTING.md §Walkthrough B — Add a new endpoint](./CONTRIBUTING.md#4-walkthrough-b--add-a-new-endpoint)
|
||||||
|
|
||||||
When changing any model field or endpoint:
|
**LLM reminder:** always run `npm run generate:api` in `frontend/` after any backend model or endpoint change — this is the most common cause of TypeScript type errors.
|
||||||
1. Rebuild the backend JAR with `-DskipTests`
|
|
||||||
2. Start it with `--spring.profiles.active=dev`
|
|
||||||
3. Run `npm run generate:api` in `frontend/`
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -235,79 +200,52 @@ frontend/src/routes/
|
|||||||
|
|
||||||
### API Client Pattern
|
### API Client Pattern
|
||||||
|
|
||||||
All server-side API calls use the typed client from `$lib/api.server.ts`:
|
→ See [CONTRIBUTING.md §Frontend API client](./CONTRIBUTING.md#frontend-api-client)
|
||||||
|
|
||||||
```typescript
|
**LLM reminder:** check `!result.response.ok` (not `result.error` — breaks when spec has no error responses defined); cast errors as `result.error as unknown as { code?: string }`; use `result.data!` after an ok check.
|
||||||
const api = createApiClient(fetch);
|
|
||||||
const result = await api.GET('/api/persons/{id}', { params: { path: { id } } });
|
|
||||||
|
|
||||||
// Always check via response.ok, NOT result.error
|
|
||||||
if (!result.response.ok) {
|
|
||||||
const code = (result.error as unknown as { code?: string })?.code;
|
|
||||||
throw error(result.response.status, getErrorMessage(code));
|
|
||||||
}
|
|
||||||
return { person: result.data! };
|
|
||||||
```
|
|
||||||
|
|
||||||
Key rules:
|
|
||||||
- Use `!result.response.ok` for error checking (not `if (result.error)` — this breaks when the spec has no error responses defined)
|
|
||||||
- Cast errors as `result.error as unknown as { code?: string }` to extract the backend error code
|
|
||||||
- Use `result.data!` (non-null assertion) after an ok check — TypeScript knows it's present
|
|
||||||
|
|
||||||
For multipart/form-data endpoints (file uploads), bypass the typed client and use raw `fetch`:
|
|
||||||
```typescript
|
|
||||||
const res = await fetch(`${baseUrl}/api/documents`, { method: 'POST', body: formData });
|
|
||||||
```
|
|
||||||
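The three rules in the API client reminder (check `response.ok`, cast the error to read its `code`, assert `data` after the ok check) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the project's client: `ApiResult`, `ApiError`, and `unwrap` are hypothetical names, and the result shape is only assumed to resemble what the typed client returns.

```typescript
// Hypothetical result shape; the real typed client's types may differ.
interface ApiResult<T> {
  data?: T;
  error?: unknown;
  response: { ok: boolean; status: number };
}

// Hypothetical error type carrying the HTTP status and the backend error code, if present.
class ApiError extends Error {
  readonly status: number;
  readonly code?: string;

  constructor(status: number, code?: string) {
    super(`API request failed with status ${status}`);
    this.status = status;
    this.code = code;
  }
}

// Returns the payload on success and throws ApiError otherwise.
function unwrap<T>(result: ApiResult<T>): T {
  // Check response.ok rather than result.error: the error field can be
  // undefined when the spec defines no error responses.
  if (!result.response.ok) {
    const code = (result.error as unknown as { code?: string })?.code;
    throw new ApiError(result.response.status, code);
  }
  // Non-null assertion is acceptable here because the ok check has passed.
  return result.data!;
}
```

In a load function, a caller could catch `ApiError` and translate `code` into a user-facing message; how that maps onto SvelteKit's `error()` helper is left to the project's own conventions.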
|
|
||||||
### Form Actions Pattern
|
### Form Actions Pattern
|
||||||
|
|
||||||
```typescript
|
```typescript
|
||||||
// +page.server.ts
|
// +page.server.ts
|
||||||
export const actions = {
|
export const actions = {
|
||||||
default: async ({ request, fetch }) => {
|
default: async ({ request, fetch }) => {
|
||||||
const formData = await request.formData();
|
const formData = await request.formData();
|
||||||
const name = formData.get('name') as string; // cast needed — FormData returns FormDataEntryValue
|
const name = formData.get("name") as string;
|
||||||
// ...
|
// ...
|
||||||
return fail(400, { error: 'message' }); // on error
|
return fail(400, { error: "message" }); // on error
|
||||||
throw redirect(303, '/target'); // on success
|
throw redirect(303, "/target"); // on success
|
||||||
}
|
},
|
||||||
};
|
};
|
||||||
```
|
```
|
||||||
|
|
||||||
### Date Handling
|
### Date Handling
|
||||||
|
|
||||||
- **Forms**: German format `dd.mm.yyyy` with auto-dot insertion via `handleDateInput()`. A hidden `<input type="hidden" name="documentDate" value={dateIso}>` sends ISO format to the backend.
|
→ See [CONTRIBUTING.md §Date handling](./CONTRIBUTING.md#date-handling)
|
||||||
- **Display**: Always use `Intl.DateTimeFormat` with `T12:00:00` suffix to prevent UTC timezone off-by-one:
|
|
||||||
```typescript
|
**LLM reminder:** always append `T12:00:00` when constructing `new Date()` from an ISO date string — prevents UTC timezone off-by-one errors.
|
||||||
new Intl.DateTimeFormat('de-DE', { day: 'numeric', month: 'long', year: 'numeric' })
|
|
||||||
.format(new Date(doc.documentDate + 'T12:00:00'))
|
|
||||||
```
|
|
||||||
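The `T12:00:00` rule from the date handling reminder can be wrapped in a small helper. The sketch below is illustrative only: `parseIsoDateAtNoon` and `formatDocumentDate` are hypothetical names, and the input is assumed to be a plain `yyyy-mm-dd` string.

```typescript
// Hypothetical helper: parses a date-only ISO string (yyyy-mm-dd) as local noon.
function parseIsoDateAtNoon(isoDate: string): Date {
  // A date-only string is parsed as UTC midnight, which falls on the previous
  // calendar day in time zones west of UTC. With a time part and no offset,
  // the string is parsed as local time, so noon stays on the intended day.
  return new Date(isoDate + "T12:00:00");
}

// Hypothetical helper: formats the date for display in German long form.
function formatDocumentDate(isoDate: string): string {
  return new Intl.DateTimeFormat("de-DE", {
    day: "numeric",
    month: "long",
    year: "numeric",
  }).format(parseIsoDateAtNoon(isoDate));
}
```

Keeping the suffix in one helper means call sites cannot forget it; whether the project centralizes this is not stated in the source.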
|
|
||||||
### UI Component Library
|
### UI Component Library
|
||||||
|
|
||||||
Custom components in `src/lib/components/`:
|
→ See per-domain READMEs: [`frontend/src/lib/person/README.md`](./frontend/src/lib/person/README.md), [`frontend/src/lib/tag/README.md`](./frontend/src/lib/tag/README.md), [`frontend/src/lib/document/README.md`](./frontend/src/lib/document/README.md), [`frontend/src/lib/shared/README.md`](./frontend/src/lib/shared/README.md)
|
||||||
|
|
||||||
| Component | Props | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| `PersonTypeahead` | `name`, `label`, `value`, `initialName`, `on:change` | Single-person selector with typeahead dropdown |
|
|
||||||
| `PersonMultiSelect` | `selectedPersons` (bind) | Chip-based multi-person selector |
|
|
||||||
| `TagInput` | `tags` (bind), `allowCreation?`, `on:change` | Tag chip input with typeahead |
|
|
||||||
|
|
||||||
### Styling Conventions (Tailwind CSS 4)
|
### Styling Conventions (Tailwind CSS 4)
|
||||||
|
|
||||||
Brand color utilities (defined in `layout.css`):
|
Brand color utilities (defined in `layout.css`):
|
||||||
|
|
||||||
| Class | Value | Usage |
|
| Class | Value | Usage |
|
||||||
|---|---|---|
|
| ------------ | --------- | -------------------------------- |
|
||||||
| `brand-navy` | `#002850` | Primary text, buttons, headers |
|
| `brand-navy` | `#002850` | Primary text, buttons, headers |
|
||||||
| `brand-mint` | `#A6DAD8` | Accents, hover underlines, icons |
|
| `brand-mint` | `#A6DAD8` | Accents, hover underlines, icons |
|
||||||
| `brand-sand` | `#E4E2D7` | Page background, card borders |
|
| `brand-sand` | `#E4E2D7` | Page background, card borders |
|
||||||
|
|
||||||
Typography:
|
Typography:
|
||||||
|
|
||||||
- `font-serif` (Merriweather) — body text, document titles, names
|
- `font-serif` (Merriweather) — body text, document titles, names
|
||||||
- `font-sans` (Montserrat) — labels, metadata, UI chrome
|
- `font-sans` (Montserrat) — labels, metadata, UI chrome
|
||||||
|
|
||||||
Card pattern for content sections:
|
Card pattern for content sections:
|
||||||
|
|
||||||
```svelte
|
```svelte
|
||||||
<div class="bg-white shadow-sm border border-brand-sand rounded-sm p-6">
|
<div class="bg-white shadow-sm border border-brand-sand rounded-sm p-6">
|
||||||
<h2 class="text-xs font-bold uppercase tracking-widest text-gray-400 mb-5">Section Title</h2>
|
<h2 class="text-xs font-bold uppercase tracking-widest text-gray-400 mb-5">Section Title</h2>
|
||||||
@@ -315,48 +253,19 @@ Card pattern for content sections:
|
|||||||
</div>
|
</div>
|
||||||
```
|
```
|
||||||
|
|
||||||
Save bar pattern — use **sticky full-bleed** for long forms (edit document), **card-style with `mt-4`** for short forms (new person):
|
Back button pattern — use the shared `<BackButton>` component from `$lib/shared/primitives/BackButton.svelte`. Do not use a static `<a href>` for back navigation.
|
||||||
```svelte
|
|
||||||
<!-- Long forms: sticky, full-bleed -->
|
|
||||||
<div class="sticky bottom-0 z-10 -mx-4 px-6 py-4 bg-white border-t border-brand-sand shadow-[0_-2px_8px_rgba(0,0,0,0.06)] flex items-center justify-between">
|
|
||||||
|
|
||||||
<!-- Short forms: card, top margin -->
|
|
||||||
<div class="mt-4 flex items-center justify-between rounded-sm border border-brand-sand bg-white px-6 py-4 shadow-sm">
|
|
||||||
```
|
|
||||||
|
|
||||||
Back button pattern — use the shared `<BackButton>` component from `$lib/components/BackButton.svelte`:
|
|
||||||
```svelte
|
|
||||||
<script lang="ts">
|
|
||||||
import BackButton from '$lib/components/BackButton.svelte';
|
|
||||||
</script>
|
|
||||||
|
|
||||||
<BackButton />
|
|
||||||
```
|
|
||||||
The component calls `history.back()` so the user returns to wherever they came from. Label is always "Zurück" (no contextual suffix — destination is unknown). Touch target ≥ 44px and focus ring are built in. Do not use a static `<a href>` for back navigation.
|
|
||||||
|
|
||||||
Subtle action link (e.g. "new document/person"):
|
|
||||||
```svelte
|
|
||||||
<a href="/documents/new" class="inline-flex items-center gap-1 text-sm font-medium text-brand-navy/60 hover:text-brand-navy transition-colors">
|
|
||||||
<svg class="w-4 h-4" ...><!-- plus icon --></svg>
|
|
||||||
Neues Dokument
|
|
||||||
</a>
|
|
||||||
```
|
|
||||||
|
|
||||||
### Error Handling (Frontend)
|
### Error Handling (Frontend)
|
||||||
|
|
||||||
`src/lib/errors.ts` mirrors the backend `ErrorCode` enum and maps codes to Paraglide translation keys. When adding a new `ErrorCode` on the backend:
|
→ See [CONTRIBUTING.md §Error handling](./CONTRIBUTING.md#error-handling)
|
||||||
1. Add it to `ErrorCode.java`
|
|
||||||
2. Add it to the `ErrorCode` type in `errors.ts`
|
**LLM reminder:** when adding a new `ErrorCode`: (1) add to `ErrorCode.java`, (2) add to `ErrorCode` type in `frontend/src/lib/shared/errors.ts`, (3) add a `case` in `getErrorMessage()`, (4) add i18n keys in `messages/{de,en,es}.json`.
|
||||||
3. Add a `case` in `getErrorMessage()`
|
|
||||||
4. Add the translation key in `messages/de.json`, `en.json`, `es.json`
|
|
||||||
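The frontend half of the `ErrorCode` checklist might look roughly like the sketch below. It is an assumption-laden illustration, not the contents of `errors.ts`: only the three code names are taken from the backend examples in this diff, while the `messages` object and its keys are hypothetical stand-ins for the Paraglide translation functions.

```typescript
// Subset of backend error codes mirrored as a union type (step 2 of the checklist).
type ErrorCode = "DOCUMENT_NOT_FOUND" | "IMPORT_ALREADY_RUNNING" | "FILE_UPLOAD_FAILED";

// Hypothetical stand-in for translated messages (step 4); keys and texts are illustrative.
const messages = {
  error_document_not_found: "Document not found",
  error_import_already_running: "An import is already running",
  error_file_upload_failed: "File upload failed",
  error_unknown: "An unexpected error occurred",
} as const;

// One case per code (step 3); unknown or missing codes fall back to a generic message.
function getErrorMessage(code?: string): string {
  switch (code as ErrorCode | undefined) {
    case "DOCUMENT_NOT_FOUND":
      return messages.error_document_not_found;
    case "IMPORT_ALREADY_RUNNING":
      return messages.error_import_already_running;
    case "FILE_UPLOAD_FAILED":
      return messages.error_file_upload_failed;
    default:
      return messages.error_unknown;
  }
}
```

The default branch matters when the backend ships a new code before the frontend mirror is updated: the user still sees a message instead of a raw code.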
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Infrastructure
|
## Infrastructure
|
||||||
|
|
||||||
The `docker-compose.yml` at the repo root orchestrates everything. A MinIO MC helper container runs at startup to create the `archive-documents` bucket. The backend container depends on both `db` and `minio` being healthy.
|
→ See [docs/DEPLOYMENT.md](./docs/DEPLOYMENT.md)
|
||||||
|
|
||||||
Database migrations live in `backend/src/main/resources/db/migration/` (Flyway, SQL files named `V{n}__{description}.sql`).
|
|
||||||
|
|
||||||
## API Testing
|
## API Testing
|
||||||
|
|
||||||
@@ -364,4 +273,4 @@ HTTP test files are in `backend/api_tests/` for use with the VS Code REST Client
|
|||||||
|
|
||||||
## Dev Container
|
## Dev Container
|
||||||
|
|
||||||
A `.devcontainer/` config is available (Java 21 + Node 24, ports 8080 and 3000 forwarded). Use VS Code's "Reopen in Container" for a pre-configured environment.
|
→ See [.devcontainer/README.md](./.devcontainer/README.md)
|
||||||
|
|||||||
@@ -11,7 +11,7 @@ Spring Boot 4.0 monolith serving the Familienarchiv REST API. Handles document m
|
|||||||
- **Server**: Jetty (not Tomcat — excluded in pom.xml)
|
- **Server**: Jetty (not Tomcat — excluded in pom.xml)
|
||||||
- **Data**: PostgreSQL 16, JPA/Hibernate, Spring Data JPA
|
- **Data**: PostgreSQL 16, JPA/Hibernate, Spring Data JPA
|
||||||
- **Migrations**: Flyway (SQL files in `src/main/resources/db/migration/`)
|
- **Migrations**: Flyway (SQL files in `src/main/resources/db/migration/`)
|
||||||
- **Security**: Spring Security, Spring Session JDBC, JWT tokens
|
- **Security**: Spring Security, Spring Session JDBC
|
||||||
- **File Storage**: MinIO via AWS SDK v2 (S3-compatible)
|
- **File Storage**: MinIO via AWS SDK v2 (S3-compatible)
|
||||||
- **Spreadsheet Import**: Apache POI 5.5.0 (Excel/ODS)
|
- **Spreadsheet Import**: Apache POI 5.5.0 (Excel/ODS)
|
||||||
- **API Docs**: SpringDoc OpenAPI 3.x (`/v3/api-docs` — dev profile only)
|
- **API Docs**: SpringDoc OpenAPI 3.x (`/v3/api-docs` — dev profile only)
|
||||||
@@ -19,7 +19,7 @@ Spring Boot 4.0 monolith serving the Familienarchiv REST API. Handles document m
|
|||||||
|
|
||||||
## Package Structure
|
## Package Structure
|
||||||
|
|
||||||
Package-by-domain: each domain owns its controller, service, repository, entities, and DTOs.
|
<!-- TODO: rewrite post-REFACTOR-1 — see Epic 4 -->
|
||||||
|
|
||||||
```
|
```
|
||||||
src/main/java/org/raddatz/familienarchiv/
|
src/main/java/org/raddatz/familienarchiv/
|
||||||
@@ -43,31 +43,28 @@ src/main/java/org/raddatz/familienarchiv/
|
|||||||
└── user/ # User domain — AppUser, UserGroup, UserService, auth controllers
|
└── user/ # User domain — AppUser, UserGroup, UserService, auth controllers
|
||||||
```
|
```
|
||||||
|
|
||||||
## Layering Rules (Strict)
|
For per-domain ownership and public surface, see each domain's `README.md`.
|
||||||
|
|
||||||
```
|
## Layering Rules
|
||||||
Controller → Service → Repository → DB
|
|
||||||
```
|
|
||||||
|
|
||||||
- **Controllers never call repositories directly.**
|
→ See [docs/ARCHITECTURE.md §Layering rule](../docs/ARCHITECTURE.md#layering-rule)
|
||||||
- **Services never reach into another domain's repository.** Call the other domain's service instead.
|
|
||||||
- ✅ `DocumentService` → `PersonService.getById()` → `PersonRepository`
|
**LLM reminder:** controllers never call repositories directly; services never reach into another domain's repository — always call the other domain's service.
|
||||||
- ❌ `DocumentService` → `PersonRepository` directly
|
|
||||||
|
|
||||||
## Key Entities
|
## Key Entities
|
||||||
|
|
||||||
| Entity | Table | Key Relationships |
|
| Entity | Table | Key Relationships |
|
||||||
|---|---|---|
|
| --------------------------- | ------------------------------- | ------------------------------------------------------------------------------- |
|
||||||
| `Document` | `documents` | ManyToOne sender (Person), ManyToMany receivers (Person), ManyToMany tags (Tag) |
|
| `Document` | `documents` | ManyToOne sender (Person), ManyToMany receivers (Person), ManyToMany tags (Tag) |
|
||||||
| `Person` | `persons` | Referenced by documents as sender/receiver; name aliases table |
|
| `Person` | `persons` | Referenced by documents as sender/receiver; name aliases table |
|
||||||
| `Tag` | `tag` | ManyToMany with documents via `document_tags`; self-referencing parent for tree |
|
| `Tag` | `tag` | ManyToMany with documents via `document_tags`; self-referencing parent for tree |
|
||||||
| `AppUser` | `app_users` | ManyToMany groups (UserGroup) |
|
| `AppUser` | `app_users` | ManyToMany groups (UserGroup) |
|
||||||
| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |
|
| `UserGroup` | `user_groups` | Has a `Set<String> permissions` |
|
||||||
| `TranscriptionBlock` | `transcription_blocks` | Per-document, per-page text blocks with polygons |
|
| `TranscriptionBlock` | `transcription_blocks` | Per-document, per-page text blocks with polygons |
|
||||||
| `DocumentAnnotation` | `document_annotations` | Free-form annotations on document pages |
|
| `DocumentAnnotation` | `document_annotations` | Free-form annotations on document pages |
|
||||||
| `Comment` | `document_comments` | Threaded comments with mentions |
|
| `Comment` | `document_comments` | Threaded comments with mentions |
|
||||||
| `Notification` | `notifications` | User notification feed |
|
| `Notification` | `notifications` | User notification feed |
|
||||||
| `OcrJob` / `OcrJobDocument` | `ocr_jobs`, `ocr_job_documents` | Batch OCR job tracking |
|
| `OcrJob` / `OcrJobDocument` | `ocr_jobs`, `ocr_job_documents` | Batch OCR job tracking |
|
||||||
|
|
||||||
**`DocumentStatus` lifecycle:** `PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`
|
**`DocumentStatus` lifecycle:** `PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED`
|
||||||
|
|
||||||
@@ -104,32 +101,15 @@ public class MyEntity {
|
|||||||
|
|
||||||
## Error Handling
|
## Error Handling
|
||||||
|
|
||||||
Use `DomainException` for all domain errors:
|
→ See [CONTRIBUTING.md §Error handling](../CONTRIBUTING.md#error-handling)
|
||||||
|
|
||||||
```java
|
**LLM reminder:** use `DomainException.notFound/forbidden/conflict/internal()` — never throw raw exceptions from service methods. When adding a new `ErrorCode`: add to `ErrorCode.java`, mirror in `frontend/src/lib/shared/errors.ts`, add i18n keys in `messages/{de,en,es}.json`.
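One way to make the "mirror in the frontend" step harder to forget is to type the message lookup so that a missing entry fails to compile. The block below is a minimal sketch, not the actual contents of `frontend/src/lib/shared/errors.ts`: the three codes are the ones used in the backend examples, while `messageKeys`, `getErrorMessageKey`, and every key string are hypothetical names chosen for illustration.

```typescript
// Codes copied from the backend examples; the authoritative list is ErrorCode.java.
type ErrorCode = 'DOCUMENT_NOT_FOUND' | 'IMPORT_ALREADY_RUNNING' | 'FILE_UPLOAD_FAILED';

// Hypothetical i18n keys. Record<ErrorCode, string> forces one entry per code,
// so adding a code to the union without a key is a compile-time error.
const messageKeys: Record<ErrorCode, string> = {
  DOCUMENT_NOT_FOUND: 'error_document_not_found',
  IMPORT_ALREADY_RUNNING: 'error_import_already_running',
  FILE_UPLOAD_FAILED: 'error_file_upload_failed'
};

// Hypothetical helper: maps a backend code to a message key,
// falling back to a generic key for unknown or missing codes.
function getErrorMessageKey(code?: string): string {
  if (code !== undefined && Object.prototype.hasOwnProperty.call(messageKeys, code)) {
    return messageKeys[code as ErrorCode];
  }
  return 'error_unknown';
}
```

The translated strings themselves would still need to be added to `messages/{de,en,es}.json`; the type check only covers the code-to-key mapping.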
|
||||||
DomainException.notFound(ErrorCode.DOCUMENT_NOT_FOUND, "...")
|
|
||||||
DomainException.forbidden("...")
|
|
||||||
DomainException.conflict(ErrorCode.IMPORT_ALREADY_RUNNING, "...")
|
|
||||||
DomainException.internal(ErrorCode.FILE_UPLOAD_FAILED, "...")
|
|
||||||
```
|
|
||||||
|
|
||||||
When adding a new `ErrorCode`:
|
|
||||||
1. Add to `ErrorCode.java`
|
|
||||||
2. Mirror in frontend `src/lib/errors.ts`
|
|
||||||
3. Add Paraglide translation key in `messages/{de,en,es}.json`
|
|
||||||
|
|
||||||
## Security / Permissions
|
## Security / Permissions
|
||||||
|
|
||||||
Use `@RequirePermission` on controller methods or classes:
|
→ See [docs/ARCHITECTURE.md §Permission system](../docs/ARCHITECTURE.md#permission-system)
|
||||||
|
|
||||||
```java
|
**LLM reminder:** `@RequirePermission(Permission.WRITE_ALL)` is **required** on every `POST`, `PUT`, `PATCH`, `DELETE` endpoint — not optional. Do not mix with Spring Security's `@PreAuthorize`. Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`, `ANNOTATE_ALL`, `BLOG_WRITE`.
|
||||||
@RequirePermission(Permission.WRITE_ALL)
|
|
||||||
public Document updateDocument(...) { ... }
|
|
||||||
```
|
|
||||||
|
|
||||||
Available permissions: `READ_ALL`, `WRITE_ALL`, `ADMIN`, `ADMIN_USER`, `ADMIN_TAG`, `ADMIN_PERMISSION`
|
|
||||||
|
|
||||||
`PermissionAspect` checks the current user's `UserGroup.permissions` at runtime.
|
|
||||||
|
|
||||||
## OCR Integration
|
## OCR Integration
|
||||||
|
|
||||||
@@ -141,49 +121,35 @@ The backend orchestrates OCR by calling the Python `ocr-service` microservice vi
|
|||||||
- `OcrBatchService` — handles batch/job workflows
|
- `OcrBatchService` — handles batch/job workflows
|
||||||
- `OcrAsyncRunner` — async execution of OCR jobs
|
- `OcrAsyncRunner` — async execution of OCR jobs
|
||||||
|
|
||||||
|
For ocr-service internals, see [`ocr-service/README.md`](../ocr-service/README.md).
|
||||||
|
|
||||||
## API Testing
|
## API Testing
|
||||||
|
|
||||||
HTTP test files in `backend/api_tests/` for the VS Code REST Client extension.
|
HTTP test files in `backend/api_tests/` for the VS Code REST Client extension.
|
||||||
|
|
||||||
## How to Run
|
## How to Run
|
||||||
|
|
||||||
### Local Development
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd backend
|
cd backend
|
||||||
|
|
||||||
# Run with dev profile (requires PostgreSQL + MinIO running via docker-compose)
|
./mvnw spring-boot:run # Run with dev profile (requires PostgreSQL + MinIO)
|
||||||
./mvnw spring-boot:run
|
./mvnw clean package # Build JAR (with tests)
|
||||||
|
|
||||||
# Build JAR (with tests)
|
|
||||||
./mvnw clean package
|
|
||||||
|
|
||||||
# Build JAR skipping tests
|
|
||||||
./mvnw clean package -DskipTests
|
./mvnw clean package -DskipTests
|
||||||
|
./mvnw test # Run all tests
|
||||||
# Run all tests
|
./mvnw test -Dtest=ClassName # Run a single test class
|
||||||
./mvnw test
|
./mvnw clean verify # Run with JaCoCo coverage report
|
||||||
|
|
||||||
# Run a single test class
|
|
||||||
./mvnw test -Dtest=ClassName
|
|
||||||
|
|
||||||
# Run with coverage (JaCoCo)
|
|
||||||
./mvnw clean verify
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### OpenAPI TypeScript Generation
|
**OpenAPI / TypeScript type generation:**
|
||||||
|
|
||||||
1. Build and start backend with `--spring.profiles.active=dev`
|
1. Start backend with `--spring.profiles.active=dev`
|
||||||
2. In `frontend/`, run: `npm run generate:api`
|
2. In `frontend/`: `npm run generate:api`
|
||||||
|
|
||||||
### Profiles
|
**LLM reminder:** always regenerate types after any model or endpoint change; skipping this step is the most common cause of "where did my TypeScript type go?"
|
||||||
|
|
||||||
- **dev** (default): Enables OpenAPI, dev configs, e2e seeds
|
|
||||||
- **prod**: Production profile — no dev endpoints
|
|
||||||
|
|
||||||
## Testing
|
## Testing
|
||||||
|
|
||||||
- Unit tests: Mockito + JUnit, pure in-memory
|
- Unit tests: Mockito + JUnit, pure in-memory
|
||||||
- Slice tests: `@WebMvcTest`, `@DataJpaTest` with Testcontainers PostgreSQL
|
- Slice tests: `@WebMvcTest`, `@DataJpaTest` with Testcontainers PostgreSQL
|
||||||
- Integration tests: Full Spring context with Testcontainers
|
- Integration tests: Full Spring context with Testcontainers
|
||||||
- Coverage gate: 88% branch coverage overall (JaCoCo)
|
- Coverage gate: 88% branch coverage (JaCoCo)
|
||||||
|
|||||||
@@ -1,97 +1,5 @@
|
|||||||
# Docs — Familienarchiv
|
# docs/
|
||||||
|
|
||||||
## Overview
|
→ See [docs/README.md](./README.md) for the folder structure and documentation guide.
|
||||||
|
|
||||||
Project documentation organized into four categories: architecture decision records (ADRs), system architecture diagrams, infrastructure runbooks, and detailed UI/UX specifications.
|
**LLM reminder:** ADRs are sequential — use the next number after the highest existing one in `docs/adr/`. When making a significant architectural change (new service, data model change, technology swap), write a new ADR before implementing.
|
||||||
|
|
||||||
## Folder Structure
|
|
||||||
|
|
||||||
```
|
|
||||||
docs/
|
|
||||||
├── adr/ # Architecture Decision Records
|
|
||||||
├── architecture/ # C4 model diagrams and system architecture docs
|
|
||||||
├── infrastructure/ # Deployment, CI/CD, and ops guides
|
|
||||||
├── specs/ # UI/UX feature specifications (HTML)
|
|
||||||
├── app-analysis-*.md # Application analysis reports
|
|
||||||
├── mail.md # Mail system documentation
|
|
||||||
├── security-guide.md # Security policies and hardening guide
|
|
||||||
├── STYLEGUIDE.md # Coding and design style guide
|
|
||||||
├── TODO-backend.md # Backend backlog
|
|
||||||
└── TODO-frontend.md # Frontend backlog
|
|
||||||
```
|
|
||||||
|
|
||||||
## ADR (`adr/`)
|
|
||||||
|
|
||||||
Architecture Decision Records capture major technical decisions and their rationale.
|
|
||||||
|
|
||||||
| ADR | Title | Status |
|
|
||||||
|---|---|---|
|
|
||||||
| `001-ocr-python-microservice.md` | OCR as a separate Python container | Accepted |
|
|
||||||
| `002-polygon-jsonb-storage.md` | Polygon coordinates in JSONB columns | Accepted |
|
|
||||||
| `003-chronik-unified-activity-feed.md` | Unified activity feed (Chronik) | Accepted |
|
|
||||||
|
|
||||||
When making a significant architectural change (new service, data model change, technology swap), write a new ADR following the format:
|
|
||||||
- Status (Proposed / Accepted / Deprecated / Superseded)
|
|
||||||
- Context (forces at play)
|
|
||||||
- Decision (what we decided)
|
|
||||||
- Consequences (trade-offs)
|
|
||||||
- Alternatives Considered (table format)
|
|
||||||
|
|
||||||
## Architecture (`architecture/`)
|
|
||||||
|
|
||||||
Contains C4 model diagrams describing the system at different zoom levels:
|
|
||||||
|
|
||||||
- **Context diagram** — How Familienarchiv fits into the user and system ecosystem
|
|
||||||
- **Container diagram** — The high-level technology choices (Spring Boot, SvelteKit, PostgreSQL, MinIO, OCR service)
|
|
||||||
- **Component diagram** — Major structural components within the backend
|
|
||||||
|
|
||||||
Written in Markdown with embedded Mermaid or PlantUML diagrams (`c4-diagrams.md`).
|
|
||||||
|
|
||||||
## Infrastructure (`infrastructure/`)
|
|
||||||
|
|
||||||
Operational documentation for running Familienarchiv in production and CI.
|
|
||||||
|
|
||||||
| Document | Purpose |
|
|
||||||
|---|---|
|
|
||||||
| `ci-gitea.md` | Gitea CI/CD pipeline configuration |
|
|
||||||
| `production-compose.md` | Production Docker Compose setup |
|
|
||||||
| `s3-migration.md` | Migrating documents between S3 buckets |
|
|
||||||
| `self-hosted-catalogue.md` | Self-hosted software catalogue |
|
|
||||||
|
|
||||||
## Specs (`specs/`)
|
|
||||||
|
|
||||||
High-fidelity UI/UX specifications written as standalone HTML files. These are design documents that describe exact layout, interactions, and responsive behavior before implementation.
|
|
||||||
|
|
||||||
Each spec typically includes:
|
|
||||||
- Visual mockups with CSS-in-HTML styling
|
|
||||||
- Interaction flows and state transitions
|
|
||||||
- Responsive breakpoint behavior
|
|
||||||
- Accessibility requirements
|
|
||||||
|
|
||||||
Examples of active spec areas:
|
|
||||||
- Document detail page (`document-topbar-*.html`, `documents-page-spec.html`)
|
|
||||||
- Admin interfaces (`admin-redesign-*.html`, `admin-tag-overhaul.html`)
|
|
||||||
- Transcription workflows (`inline-transcription-*.html`, `annotation-transcription-*.html`)
|
|
||||||
- Dashboard and activity feeds (`dashboard-*.html`, `chronik-spec.html`)
|
|
||||||
- OCR admin (`ocr-admin-spec.html`)
|
|
||||||
|
|
||||||
## How to Use
|
|
||||||
|
|
||||||
1. **Before implementing a feature**, check `specs/` for an existing specification.
|
|
||||||
2. **When proposing a new architecture**, draft an ADR in `adr/` and discuss before coding.
|
|
||||||
3. **When deploying**, follow `infrastructure/production-compose.md`.
|
|
||||||
4. **Keep TODO files updated** — they serve as lightweight backlogs.
|
|
||||||
|
|
||||||
## Style Guide
|
|
||||||
|
|
||||||
`STYLEGUIDE.md` covers:
|
|
||||||
- Code formatting and linting rules
|
|
||||||
- Component naming conventions
|
|
||||||
- Color palette and typography
|
|
||||||
- Accessibility standards (WCAG 2.1 AA)
|
|
||||||
|
|
||||||
## Contributing
|
|
||||||
|
|
||||||
- ADRs should be sequential (`NNN-descriptive-name.md`).
|
|
||||||
- Specs should be self-contained HTML files viewable in a browser.
|
|
||||||
- Infrastructure docs should include copy-pasteable commands.
|
|
||||||
|
|||||||
86
docs/README.md
Normal file
@@ -0,0 +1,86 @@
|
|||||||
|
# docs/
|
||||||
|
|
||||||
|
Project documentation organized into four categories: architecture decision records (ADRs), system architecture diagrams, infrastructure runbooks, and detailed UI/UX specifications.
|
||||||
|
|
||||||
|
## Folder structure
|
||||||
|
|
||||||
|
```
|
||||||
|
docs/
|
||||||
|
├── adr/ # Architecture Decision Records
|
||||||
|
├── architecture/ # C4 model diagrams and system architecture docs
|
||||||
|
├── infrastructure/ # Deployment, CI/CD, and ops guides
|
||||||
|
├── specs/ # UI/UX feature specifications (HTML)
|
||||||
|
├── ARCHITECTURE.md # Human-readable architecture overview (DOC-2)
|
||||||
|
├── DEPLOYMENT.md # Day-1 checklist and operational reference (DOC-5)
|
||||||
|
├── GLOSSARY.md # Domain terminology (DOC-3)
|
||||||
|
├── security-guide.md # Security policies and hardening guide
|
||||||
|
└── STYLEGUIDE.md # Coding and design style guide
|
||||||
|
```
|
||||||
|
|
||||||
|
## ADR (`adr/`)
|
||||||
|
|
||||||
|
Architecture Decision Records capture major technical decisions and their rationale.
|
||||||
|
|
||||||
|
| ADR | Title | Status |
|
||||||
|
| -------------------------------------- | ------------------------------------ | -------- |
|
||||||
|
| `001-ocr-python-microservice.md` | OCR as a separate Python container | Accepted |
|
||||||
|
| `002-polygon-jsonb-storage.md` | Polygon coordinates in JSONB columns | Accepted |
|
||||||
|
| `003-chronik-unified-activity-feed.md` | Unified activity feed (Chronik) | Accepted |
|
||||||
|
|
||||||
|
When making a significant architectural change (new service, data model change, technology swap), write a new ADR:
|
||||||
|
|
||||||
|
- **Status** (Proposed / Accepted / Deprecated / Superseded)
|
||||||
|
- **Context** (forces at play)
|
||||||
|
- **Decision** (what we decided)
|
||||||
|
- **Consequences** (trade-offs)
|
||||||
|
- **Alternatives Considered** (table format)
|
||||||
|
|
||||||
|
ADRs are sequential (`NNN-descriptive-name.md`). Do not reuse numbers.
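The numbering rule can be checked mechanically. The following is a minimal sketch, assuming ADR filenames follow the `NNN-descriptive-name.md` pattern shown in the table above; `nextAdrFilename` is a hypothetical helper, not an existing script in this repository.

```typescript
// Hypothetical helper: given the filenames already in docs/adr/,
// return the filename for the next ADR with the given slug.
function nextAdrFilename(existing: string[], slug: string): string {
  const numbers = existing
    .map((name) => /^(\d{3})-/.exec(name))
    .filter((match): match is RegExpExecArray => match !== null)
    .map((match) => Number(match[1]));

  // Use the highest existing number plus one, so gaps are never refilled
  // and numbers are never reused.
  const next = (numbers.length > 0 ? Math.max(...numbers) : 0) + 1;
  return `${String(next).padStart(3, '0')}-${slug}.md`;
}
```

With the three ADRs listed above, a call such as `nextAdrFilename(files, 'example-decision')` yields `004-example-decision.md`.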
|
||||||
|
|
||||||
|
## Architecture (`architecture/`)
|
||||||
|
|
||||||
|
Contains C4 model diagrams describing the system at different zoom levels:
|
||||||
|
|
||||||
|
- **Context diagram** — How Familienarchiv fits into the user and system ecosystem
|
||||||
|
- **Container diagram** — The high-level technology choices (Spring Boot, SvelteKit, PostgreSQL, MinIO, OCR service)
|
||||||
|
- **Component diagram** — Major structural components within the backend
|
||||||
|
|
||||||
|
Written in Markdown with embedded Mermaid diagrams (`c4-diagrams.md`). Gitea renders these automatically.
|
||||||
|
|
||||||
|
For the human-readable architecture narrative, see [`docs/ARCHITECTURE.md`](ARCHITECTURE.md).
|
||||||
|
|
||||||
|
## Infrastructure (`infrastructure/`)
|
||||||
|
|
||||||
|
Operational documentation for running Familienarchiv in production and CI.
|
||||||
|
|
||||||
|
| Document | Purpose |
|
||||||
|
| -------------------------- | ---------------------------------------------------- |
|
||||||
|
| `ci-gitea.md` | Gitea CI/CD pipeline configuration |
|
||||||
|
| `production-compose.md` | Production Docker Compose setup and VPS provisioning |
|
||||||
|
| `s3-migration.md` | Migrating documents between S3 buckets |
|
||||||
|
| `self-hosted-catalogue.md` | Self-hosted software catalogue |
|
||||||
|
|
||||||
|
For the day-1 deployment checklist, see [`docs/DEPLOYMENT.md`](DEPLOYMENT.md).
|
||||||
|
|
||||||
|
## Specs (`specs/`)
|
||||||
|
|
||||||
|
High-fidelity UI/UX specifications written as standalone HTML files. These are design documents describing exact layout, interactions, and responsive behavior before implementation.
|
||||||
|
|
||||||
|
Each spec typically includes:
|
||||||
|
|
||||||
|
- Visual mockups with CSS-in-HTML styling
|
||||||
|
- Interaction flows and state transitions
|
||||||
|
- Responsive breakpoint behavior
|
||||||
|
- Accessibility requirements
|
||||||
|
|
||||||
|
Before implementing a feature, check `specs/` for an existing specification.
|
||||||
|
|
||||||
|
## Style Guide
|
||||||
|
|
||||||
|
[`docs/STYLEGUIDE.md`](STYLEGUIDE.md) covers:
|
||||||
|
|
||||||
|
- Code formatting and linting rules
|
||||||
|
- Component naming conventions
|
||||||
|
- Color palette and typography
|
||||||
|
- Accessibility standards (WCAG 2.1 AA)
|
||||||
1
familienarchiv-408
Submodule
Submodule familienarchiv-408 added at 6ecff120e6
@@ -71,29 +71,13 @@ src/
|
|||||||
└── ... # Other SvelteKit config files
|
└── ... # Other SvelteKit config files
|
||||||
```
|
```
|
||||||
|
|
||||||
|
For per-domain component inventories, see the domain READMEs in `src/lib/<domain>/README.md`.
|
||||||
|
|
||||||
## API Client Pattern
|
## API Client Pattern
|
||||||
|
|
||||||
All server-side API calls use the typed client from `$lib/api.server.ts`:
|
→ See [CONTRIBUTING.md §Frontend API client](../CONTRIBUTING.md#frontend-api-client)
|
||||||
|
|
||||||
```typescript
|
**LLM reminder:** check `!result.response.ok` (not `result.error` — breaks when spec has no error responses); cast errors as `result.error as unknown as { code?: string }`; use `result.data!` after an ok check. For multipart/form-data (file uploads), bypass the typed client and use raw `fetch`.
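The three rules in the reminder can be folded into one small helper. This is a sketch under stated assumptions: `ApiResult` is a simplified stand-in for whatever the typed client actually returns, and `ApiFailure` and `unwrapResult` are hypothetical names that do not exist in the codebase.

```typescript
// Simplified stand-in for the typed client's result shape (assumption).
interface ApiResult<T> {
  data?: T;
  error?: unknown;
  response: { ok: boolean; status: number };
}

// Hypothetical error type carrying the HTTP status and the backend error code.
class ApiFailure extends Error {
  readonly status: number;
  readonly code: string | undefined;

  constructor(status: number, code: string | undefined) {
    super(`API request failed with status ${status}`);
    this.status = status;
    this.code = code;
  }
}

function unwrapResult<T>(result: ApiResult<T>): T {
  // Decide on response.ok, not on result.error: when the spec declares
  // no error responses, result.error is not a reliable signal.
  if (!result.response.ok) {
    // Double cast to read the backend error code from an untyped error body.
    const code = (result.error as unknown as { code?: string })?.code;
    throw new ApiFailure(result.response.status, code);
  }
  // Safe after the ok check.
  return result.data!;
}
```

In a SvelteKit `load` function the caller would typically translate the failure into `error(status, message)` rather than letting a custom exception propagate; the helper only shows the order of the checks.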
|
||||||
const api = createApiClient(fetch);
|
|
||||||
const result = await api.GET('/api/persons/{id}', { params: { path: { id } } });
|
|
||||||
|
|
||||||
// Always check via response.ok, NOT result.error
|
|
||||||
if (!result.response.ok) {
|
|
||||||
const code = (result.error as unknown as { code?: string })?.code;
|
|
||||||
throw error(result.response.status, getErrorMessage(code));
|
|
||||||
}
|
|
||||||
return { person: result.data! };
|
|
||||||
```
|
|
||||||
|
|
||||||
Key rules:
|
|
||||||
|
|
||||||
- Use `!result.response.ok` for error checking (not `if (result.error)` — breaks when spec has no error responses defined)
|
|
||||||
- Cast errors as `result.error as unknown as { code?: string }` to extract backend error code
|
|
||||||
- Use `result.data!` after an ok check
|
|
||||||
|
|
||||||
For multipart/form-data (file uploads), bypass the typed client and use raw `fetch`.
|
|
||||||
|
|
||||||
## Form Actions Pattern
|
## Form Actions Pattern
|
||||||
|
|
||||||
@@ -102,7 +86,7 @@ For multipart/form-data (file uploads), bypass the typed client and use raw `fet
|
|||||||
export const actions = {
|
export const actions = {
|
||||||
default: async ({ request, fetch }) => {
|
default: async ({ request, fetch }) => {
|
||||||
const formData = await request.formData();
|
const formData = await request.formData();
|
||||||
const name = formData.get('name') as string;
|
const name = formData.get('name') as string; // cast needed — FormData returns FormDataEntryValue
|
||||||
// ...
|
// ...
|
||||||
return fail(400, { error: 'message' }); // on error
|
return fail(400, { error: 'message' }); // on error
|
||||||
throw redirect(303, '/target'); // on success
|
throw redirect(303, '/target'); // on success
|
||||||
@@ -112,13 +96,9 @@ export const actions = {
|
|||||||
|
|
||||||
## Date Handling
|
## Date Handling
|
||||||
|
|
||||||
- **Forms**: German format `dd.mm.yyyy` with auto-dot insertion via `handleDateInput()`. A hidden `<input type="hidden" name="documentDate" value={dateIso}>` sends ISO to the backend.
|
→ See [CONTRIBUTING.md §Date handling](../CONTRIBUTING.md#date-handling)
|
||||||
- **Display**: Always use `Intl.DateTimeFormat` with `T12:00:00` suffix to prevent UTC off-by-one:
|
|
||||||
```typescript
|
**LLM reminder:** always append `T12:00:00` when constructing `new Date()` from an ISO date string — prevents UTC timezone off-by-one errors. Forms use German `dd.mm.yyyy` format via `handleDateInput()` with a hidden ISO input.
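The suffix matters because a date-only ISO string such as `2024-03-09` is parsed as UTC midnight, which can fall on the previous calendar day in timezones west of UTC. A minimal sketch of the rule follows; `parseIsoDate` and `formatDocumentDate` are hypothetical helper names, and only the `T12:00:00` suffix and the `Intl.DateTimeFormat` options come from the project's documented convention.

```typescript
// Hypothetical helper: parse an ISO date (yyyy-mm-dd) as local noon.
// A date-time string without an offset is interpreted in local time,
// so the calendar day is the same in every timezone.
function parseIsoDate(isoDate: string): Date {
  return new Date(isoDate + 'T12:00:00');
}

// Hypothetical helper: format an ISO date for display, German locale by default.
function formatDocumentDate(isoDate: string, locale = 'de-DE'): string {
  return new Intl.DateTimeFormat(locale, {
    day: 'numeric',
    month: 'long',
    year: 'numeric'
  }).format(parseIsoDate(isoDate));
}
```

Centralizing the suffix in one function means call sites never construct `new Date()` from a bare date string.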
|
||||||
new Intl.DateTimeFormat('de-DE', { day: 'numeric', month: 'long', year: 'numeric' }).format(
|
|
||||||
new Date(doc.documentDate + 'T12:00:00')
|
|
||||||
);
|
|
||||||
```
|
|
||||||
|
|
||||||
## Styling Conventions (Tailwind CSS 4)
|
## Styling Conventions (Tailwind CSS 4)
|
||||||
|
|
||||||
@@ -146,15 +126,9 @@ Card pattern for content sections:
|
|||||||
|
|
||||||
## Key UI Components
|
## Key UI Components
|
||||||
|
|
||||||
| Component | Location | Props | Description |
|
→ See per-domain READMEs: [`src/lib/person/README.md`](src/lib/person/README.md), [`src/lib/tag/README.md`](src/lib/tag/README.md), [`src/lib/document/README.md`](src/lib/document/README.md), [`src/lib/shared/README.md`](src/lib/shared/README.md)
|
||||||
| -------------------- | ------------------------------ | --------------------------------------- | ------------------------------------------ |
|
|
||||||
| `PersonTypeahead` | `$lib/person/` | `name`, `label`, `value`, `initialName` | Single-person selector with typeahead |
|
**LLM reminder:** `BackButton` is at `$lib/shared/primitives/BackButton.svelte` — use it for all back navigation; never a static `<a href>`. API client is at `$lib/shared/api.server`.
|
||||||
| `PersonMultiSelect` | `$lib/person/` | `selectedPersons` (bind) | Chip-based multi-person selector |
|
|
||||||
| `TagInput` | `$lib/tag/` | `tags` (bind), `allowCreation?` | Tag chip input with typeahead |
|
|
||||||
| `PdfViewer` | `$lib/document/` | `url`, `annotations` | PDF rendering with annotation overlay |
|
|
||||||
| `TranscriptionBlock` | `$lib/document/transcription/` | `block`, `mode` | Read/edit transcription block |
|
|
||||||
| `DocumentTopBar` | `$lib/document/` | `document` | Responsive document metadata header |
|
|
||||||
| `BackButton` | `$lib/shared/primitives/` | — | Calls `history.back()`; 44 px touch target |
|
|
||||||
|
|
||||||
## How to Run
|
## How to Run
|
||||||
|
|
||||||
|
|||||||
@@ -1,154 +1,7 @@
|
|||||||
# OCR Service — Familienarchiv
|
# OCR Service
|
||||||
|
|
||||||
## Overview
|
→ See [ocr-service/README.md](./README.md) for tech stack, architecture, endpoints, environment variables, local development, testing, and training.
|
||||||
|
|
||||||
Python FastAPI microservice that performs OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition) on historical family documents. It exposes a simple HTTP API consumed by the Spring Boot backend. The service is stateless — all job tracking and business logic remain in Java.
|
**LLM reminder:** the OCR service is a **single-node container** — training reloads the model in-process, so multiple replicas cause model-state divergence (see ADR-001). All job tracking and business logic stay in Spring Boot; the Python service is stateless OCR only.
|
||||||
|
|
||||||
## Tech Stack
|
`ALLOWED_PDF_HOSTS` must never be set to `*` — that opens SSRF. The default (`minio,localhost,127.0.0.1`) is correct for dev.
|
||||||
|
|
||||||
- **Framework**: FastAPI 0.115.6 (Python 3.11)
|
|
||||||
- **OCR Engines**:
|
|
||||||
- **Surya** (`surya-ocr`) — Transformer-based, handles typewritten and modern Latin handwriting
|
|
||||||
- **Kraken** (`kraken==7.0`) — Historical HTR model support, required for pre-1941 German Kurrent/Sütterlin scripts
|
|
||||||
- **ML**: PyTorch 2.7.1 (CPU-only), torchvision, transformers
|
|
||||||
- **PDF Processing**: `pypdfium2` (rendering), `pillow`
|
|
||||||
- **Image Processing**: `opencv-python-headless`, `pyvips`
|
|
||||||
- **Spell Checking**: `pyspellchecker`
|
|
||||||
- **HTTP Client**: `httpx`
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
The service is a single-node container (see ADR-001). OCR training reloads the model in-process after each run, so multiple replicas would cause training conflicts and model-state divergence.
|
|
||||||
|
|
||||||
### Interface Contract
|
|
||||||
|
|
||||||
**Request:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"pdfUrl": "http://minio:9000/archive-documents/abc.pdf?presigned...",
|
|
||||||
"scriptType": "HANDWRITING_KURRENT",
|
|
||||||
"language": "de"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**Response:** Array of `OcrBlock` objects:
|
|
||||||
```json
|
|
||||||
[
|
|
||||||
{
|
|
||||||
"pageNumber": 0,
|
|
||||||
"x": 0.12, "y": 0.08, "width": 0.76, "height": 0.04,
|
|
||||||
"polygon": [[0.12,0.08],[0.88,0.09],[0.87,0.12],[0.13,0.11]],
|
|
||||||
"text": "Sehr geehrter Herr ..."
|
|
||||||
}
|
|
||||||
]
|
|
||||||
```
|
|
||||||
|
|
||||||
Coordinates are normalized (0-1) relative to page dimensions.
|
|
||||||
|
|
||||||
### File Structure
|
|
||||||
|
|
||||||
```
|
|
||||||
ocr-service/
|
|
||||||
├── main.py # FastAPI app, endpoints, request handling
|
|
||||||
├── models.py # Pydantic models (OcrRequest, OcrBlock)
|
|
||||||
├── engines/
|
|
||||||
│ ├── __init__.py
|
|
||||||
│ ├── kraken.py # Kraken engine wrapper (Kurrent models)
|
|
||||||
│ └── surya.py # Surya engine wrapper (typewritten/Latin)
|
|
||||||
├── preprocessing.py # Image preprocessing (CLAHE, deskew, denoise)
|
|
||||||
├── confidence.py # Confidence scoring and thresholding
|
|
||||||
├── spell_check.py # Post-OCR spell correction
|
|
||||||
├── ensure_blla_model.py # Model download / verification helper
|
|
||||||
├── dictionaries/ # Historical word lists for spell checking
|
|
||||||
├── requirements.txt # Python dependencies
|
|
||||||
├── Dockerfile # Production container image
|
|
||||||
└── entrypoint.sh # Container startup script
|
|
||||||
```
|
|
||||||
|
|
||||||
### Key Endpoints
|
|
||||||
|
|
||||||
| Endpoint | Method | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| `/health` | GET | Returns 200 only after models are loaded |
|
|
||||||
| `/ocr` | POST | Extract text blocks from a PDF URL |
|
|
||||||
| `/ocr/stream` | POST | Streaming OCR with SSE-style progress events |
|
|
||||||
| `/training/submit` | POST | Submit training data for model fine-tuning |
|
|
||||||
|
|
||||||
### Environment Variables
|
|
||||||
|
|
||||||
| Variable | Default | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| `KRAKEN_MODEL_PATH` | `/app/models/german_kurrent.mlmodel` | Path to Kraken model file |
|
|
||||||
| `TRAINING_TOKEN` | `""` | Bearer token required for training endpoints |
|
|
||||||
| `OCR_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for Latin scripts |
|
|
||||||
| `OCR_CONFIDENCE_THRESHOLD_KURRENT` | `0.5` | Minimum confidence for Kurrent scripts |
|
|
||||||
| `RECOGNITION_BATCH_SIZE` | `16` | Kraken recognition batch size |
|
|
||||||
| `DETECTOR_BATCH_SIZE` | `8` | Surya detector batch size |
|
|
||||||
| `OCR_CLAHE_CLIP_LIMIT` | `2.0` | CLAHE contrast enhancement limit |
|
|
||||||
| `OCR_CLAHE_TILE_SIZE` | `8` | CLAHE tile grid size |
|
|
||||||
| `OCR_MAX_CACHED_MODELS` | `2` | LRU model cache size (~500 MB each) |
|
|
||||||
| `ALLOWED_PDF_HOSTS` | `minio,localhost,127.0.0.1` | SSRF protection — allowed PDF URL hosts |
|
|
||||||
|
|
||||||
## How to Run
|
|
||||||
|
|
||||||
### Local Development (Python venv)
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd ocr-service
|
|
||||||
python -m venv .venv
|
|
||||||
source .venv/bin/activate
|
|
||||||
|
|
||||||
# Install PyTorch CPU first (saves ~2 GB vs CUDA)
|
|
||||||
pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cpu
|
|
||||||
|
|
||||||
# Install remaining dependencies
|
|
||||||
pip install -r requirements.txt
|
|
||||||
|
|
||||||
# Run development server
|
|
||||||
fastapi dev main.py --host 0.0.0.0 --port 8000
|
|
||||||
|
|
||||||
# Or production mode
|
|
||||||
uvicorn main:app --host 0.0.0.0 --port 8000
|
|
||||||
```
|
|
||||||
|
|
||||||
### Docker (via docker-compose)
|
|
||||||
|
|
||||||
The OCR service is included in the root `docker-compose.yml`:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker-compose up -d ocr-service
|
|
||||||
```
|
|
||||||
|
|
||||||
The container:
|
|
||||||
- Exposes port 8000 internally (not mapped to host by default)
|
|
||||||
- Mounts `ocr_models` and `ocr_cache` volumes for persistence
|
|
||||||
- Has a 120-second startup grace period for model loading
|
|
||||||
- Memory limit: 12 GB
|
|
||||||
|
|
||||||
### Model Downloads
|
|
||||||
|
|
||||||
Use the helper script to download Kraken models:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
./scripts/download-kraken-models.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
Models are stored in the `ocr_models` Docker volume or `./ocr-service/models/` locally.
|
|
||||||
|
|
||||||
## Testing
|
|
||||||
|
|
||||||
Only a subset of tests can run without the full ML stack:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd ocr-service
|
|
||||||
pip install pytest pytest-asyncio pyspellchecker
|
|
||||||
|
|
||||||
# No ML required — pure logic tests
|
|
||||||
python -m pytest test_spell_check.py test_confidence.py test_sender_registry.py -v
|
|
||||||
```
|
|
||||||
|
|
||||||
Tests requiring PyTorch/Kraken/Surya (e.g., `test_engines.py`) must be run in the Docker container or a fully provisioned venv.
|
|
||||||
|
|
||||||
## Training
|
|
||||||
|
|
||||||
The service supports in-process model fine-tuning via Kraken's `ketos` training pipeline. Training endpoints require the `TRAINING_TOKEN` bearer token. After training completes, the model is reloaded in-process — this is why only a single replica is supported.
|
|
||||||
|
|||||||
@@ -1,144 +1,5 @@
-# Scripts — Familienarchiv
+# scripts/

-## Overview
+→ See [scripts/README.md](./README.md) for the full list of scripts, their purpose, and usage.

-Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.
+**LLM reminder:** when adding a new script, document it in `scripts/README.md` (not here).
-
-## Scripts
-
-### `reset-db.sh`
-**Purpose**: Hard-reset the development database, wiping all documents, persons, tags, and related data.
-
-**Usage:**
-```bash
-./scripts/reset-db.sh
-# Type 'yes' to confirm
-```
-
-**What it truncates:**
-- `transcription_block_versions`
-- `transcription_blocks`
-- `comment_mentions`
-- `document_comments`
-- `document_annotations`
-- `document_versions`
-- `notifications`
-- `documents`
-- `person_name_aliases`
-- `persons`
-- `tag`
-
-> ⚠️ **Destructive operation** — only for development!
-
----
-
-### `rebuild-frontend.sh`
-**Purpose**: Force a clean rebuild of the frontend Docker container.
-
-**Usage:**
-```bash
-./scripts/rebuild-frontend.sh
-```
-
----
-
-### `download-kraken-models.sh`
-**Purpose**: Download Kraken HTR models for German Kurrent and Sütterlin scripts.
-
-**Usage:**
-```bash
-./scripts/download-kraken-models.sh
-```
-
-Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100-500 MB each.
-
----
-
-### `download-paperless.sh`
-**Purpose**: Download exported documents from a Paperless-ngx instance.
-
-**Usage:**
-```bash
-./scripts/download-paperless.sh
-```
-
-Requires environment variables or config for the Paperless API endpoint and token.
-
----
-
-### `flatten-paperless.sh`
-**Purpose**: Flatten nested Paperless export directories into a single import-ready structure.
-
-**Usage:**
-```bash
-./scripts/flatten-paperless.sh
-```
-
----
-
-### `generate_data.py`
-**Purpose**: Generate synthetic test data for development.
-
-**Usage:**
-```bash
-python scripts/generate_data.py
-```
-
-Generates fake documents, persons, and tags suitable for load testing or UI development.
-
----
-
-### `prepare_historical_dict.py`
-**Purpose**: Build a historical German word dictionary for the OCR spell-checker.
-
-**Usage:**
-```bash
-python scripts/prepare_historical_dict.py
-```
-
-Processes raw word lists into the format expected by `ocr-service/spell_check.py`.
-
----
-
-### `schema.sql`
-**Purpose**: Complete database schema dump for reference.
-
-**Note**: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.
-
----
-
-### `large-data.sql`
-**Purpose**: Pre-seeded dataset with a large number of documents for performance testing.
-
-**Usage:**
-```bash
-# Import into PostgreSQL
-docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
-```
-
-## How to Use
-
-Most scripts should be run from the **repository root**:
-
-```bash
-# Database reset
-./scripts/reset-db.sh
-
-# Model download
-./scripts/download-kraken-models.sh
-
-# Data generation
-cd scripts && python generate_data.py
-```
-
-Ensure scripts are executable:
-```bash
-chmod +x scripts/*.sh
-```
-
-## Adding New Scripts
-
-1. Place the script in `scripts/`
-2. Add a header comment describing purpose and usage
-3. Make it executable (`chmod +x`)
-4. Document it in this `CLAUDE.md`
161
scripts/README.md
Normal file
@@ -0,0 +1,161 @@
# scripts/

Utility scripts for development, data management, model downloads, and database operations. These are standalone shell and Python scripts used outside the normal application runtime.

## Scripts

### `reset-db.sh`

**Purpose**: Hard-reset the development database, wiping all documents, persons, tags, and related data.

**Usage:**

```bash
./scripts/reset-db.sh
# Type 'yes' to confirm
```

**What it truncates:**

- `transcription_block_versions`
- `transcription_blocks`
- `comment_mentions`
- `document_comments`
- `document_annotations`
- `document_versions`
- `notifications`
- `documents`
- `person_name_aliases`
- `persons`
- `tag`

> ⚠️ **Destructive operation — only for development!** This wipes ALL data. Not reversible without a backup.
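
The `Type 'yes' to confirm` step implies a confirmation guard in front of the truncation. A minimal sketch of such a guard follows; it assumes nothing about how `reset-db.sh` actually implements the prompt, and the function name `confirm_destructive` is hypothetical.

```bash
# Hypothetical helper, not taken from reset-db.sh.
# Prompts on stderr and succeeds only if the user types exactly "yes".
confirm_destructive() {
  # $1: short description of what is about to be destroyed
  printf 'This will %s. Type "yes" to continue: ' "$1" >&2
  # Treat end-of-input (for example a closed stdin) as a refusal.
  read -r answer || return 1
  [ "$answer" = "yes" ]
}
```

A script could then bail out early with `confirm_destructive "wipe the development database" || exit 1` before touching any table.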

---

### `rebuild-frontend.sh`

**Purpose**: Force a clean rebuild of the frontend Docker container.

**Usage:**

```bash
./scripts/rebuild-frontend.sh
```

---

### `download-kraken-models.sh`

**Purpose**: Download Kraken HTR models for German Kurrent and Sütterlin scripts.

**Usage:**

```bash
./scripts/download-kraken-models.sh
```

Downloads models into `./ocr-service/models/` or the `ocr_models` Docker volume. Models are ~100–500 MB each.

---

### `download-paperless.sh`

**Purpose**: Download exported documents from a Paperless-ngx instance.

**Usage:**

```bash
./scripts/download-paperless.sh
```

Requires environment variables or config for the Paperless API endpoint and token.
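
The variable names the script reads are not listed here. Purely as an illustration, a pre-flight check for required settings might look like the sketch below; `PAPERLESS_URL` and `PAPERLESS_TOKEN` are placeholder names and `require_env` is a hypothetical helper, so check `download-paperless.sh` for the names it really expects.

```bash
# Hypothetical helper: fail early if any named variable is unset or empty.
# The variable names passed in are placeholders, not confirmed by the script.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Called as `require_env PAPERLESS_URL PAPERLESS_TOKEN || exit 1`, this reports every missing setting at once instead of failing on the first API call.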

---

### `flatten-paperless.sh`

**Purpose**: Flatten nested Paperless export directories into a single import-ready structure.

**Usage:**

```bash
./scripts/flatten-paperless.sh
```
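
To make "flattening" concrete, here is one possible approach, sketched under assumptions: it copies every file from a nested tree into a single directory and encodes the original sub-path in the file name to avoid collisions. The function `flatten_dir` and its naming scheme are hypothetical and may not match what `flatten-paperless.sh` does.

```bash
# Hypothetical sketch: copy all regular files under SRC into DEST with no
# subdirectories. "2020/letters/a.pdf" becomes "2020_letters_a.pdf".
# Assumes SRC has no trailing slash and file names contain no newlines.
flatten_dir() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  find "$src" -type f | while IFS= read -r file; do
    rel=${file#"$src"/}                        # path relative to SRC
    flat=$(printf '%s' "$rel" | tr '/' '_')    # replace separators
    cp "$file" "$dest/$flat"
  done
}
```

Copying rather than moving keeps the original export intact, which is the safer default while the import is still being verified.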

---

### `generate_data.py`

**Purpose**: Generate synthetic test data for development.

**Usage:**

```bash
python scripts/generate_data.py
```

Generates fake documents, persons, and tags suitable for load testing or UI development.

---

### `prepare_historical_dict.py`

**Purpose**: Build a historical German word dictionary for the OCR spell-checker.

**Usage:**

```bash
python scripts/prepare_historical_dict.py
```

Processes raw word lists into the format expected by `ocr-service/spell_check.py`.

---

### `schema.sql`

**Purpose**: Complete database schema dump for reference.

**Note**: Flyway migrations in `backend/src/main/resources/db/migration/` are the source of truth for schema evolution. `schema.sql` is a snapshot for quick reference only.

---

### `large-data.sql`

**Purpose**: Pre-seeded dataset with a large number of documents for performance testing.

**Usage:**

```bash
# Import into PostgreSQL
docker exec -i archive-db psql -U archive_user -d family_archive_db < scripts/large-data.sql
```

## How to Use

Most scripts should be run from the **repository root**:

```bash
# Database reset
./scripts/reset-db.sh

# Model download
./scripts/download-kraken-models.sh

# Data generation
cd scripts && python generate_data.py
```
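
A script can also remove the dependency on the caller's working directory by locating the repository root itself. The sketch below assumes the script lives directly in `scripts/`; the helper name `repo_root_of` is hypothetical and none of the existing scripts are known to use it.

```bash
# Hypothetical helper: given the path of a script inside scripts/,
# print the absolute path of the repository root (the parent of scripts/).
repo_root_of() {
  script_dir=$(cd "$(dirname "$1")" >/dev/null && pwd) || return 1
  dirname "$script_dir"
}
```

Starting a script with `cd "$(repo_root_of "$0")"` would let relative paths such as `./ocr-service/models/` resolve the same way no matter where it is invoked from.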

Ensure scripts are executable:

```bash
chmod +x scripts/*.sh
```
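
To see which scripts still lack the executable bit before changing anything, a small check like the following can help. It is a sketch, and `list_non_executable` is a hypothetical helper rather than part of the repository.

```bash
# Hypothetical helper: print every *.sh file in the given directory
# that the current user cannot execute.
list_non_executable() {
  for f in "$1"/*.sh; do
    [ -e "$f" ] || continue      # the glob matched nothing
    [ -x "$f" ] || echo "$f"
  done
}
```

Running `list_non_executable scripts` from the repository root prints nothing when every shell script is already executable.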

## Adding New Scripts

1. Place the script in `scripts/`
2. Add a header comment describing purpose and usage
3. Make it executable (`chmod +x`)
4. Document it in this `README.md`
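
As a starting point for step 2, a new shell script could begin with a header like the one below. This is only a suggested skeleton: the file name `example-task.sh`, the header layout, and the `main` function are hypothetical and not a convention the repository is known to enforce.

```bash
#!/usr/bin/env bash
#
# example-task.sh (hypothetical name)
# Purpose: one sentence describing what the script does.
# Usage:   ./scripts/example-task.sh [name]
#
# Abort on errors and on use of unset variables.
set -eu

main() {
  # First argument is optional and defaults to "world".
  name=${1:-world}
  echo "hello, $name"
}

main "$@"
```

Keeping the purpose and usage lines at the top of the file makes it easy to copy them into the matching `README.md` entry in step 4.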