familienarchiv/backend/CLAUDE.md
Marcel 50b18f0849
docs(legibility): fix three review blockers in DOC-7
- docs/README.md: remove duplicate infrastructure/ entry at end of folder tree
- ocr-service/CLAUDE.md: add **LLM reminder:** prefix to ALLOWED_PDF_HOSTS
  SSRF warning (consistent with all other machine-readable instructions)
- backend/CLAUDE.md: restore ResponseStatusException note for simple controller
  validation — avoids LLMs reaching for DomainException for trivial checks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 07:41:02 +02:00

Backend — Familienarchiv

Overview

Spring Boot 4.0 monolith serving the Familienarchiv REST API. Handles document management, person/entity tracking, transcription workflows, OCR orchestration, user management, and full-text search.

Tech Stack

  • Framework: Spring Boot 4.0 (Java 21)
  • Build: Maven (./mvnw wrapper)
  • Server: Jetty (not Tomcat — excluded in pom.xml)
  • Data: PostgreSQL 16, JPA/Hibernate, Spring Data JPA
  • Migrations: Flyway (SQL files in src/main/resources/db/migration/)
  • Security: Spring Security, Spring Session JDBC
  • File Storage: MinIO via AWS SDK v2 (S3-compatible)
  • Spreadsheet Import: Apache POI 5.5.0 (Excel/ODS)
  • API Docs: SpringDoc OpenAPI 3.x (/v3/api-docs — dev profile only)
  • Monitoring: Spring Boot Actuator (/actuator/health)

Package Structure

src/main/java/org/raddatz/familienarchiv/
├── audit/               # Audit logging (AuditService, AuditLogQueryService)
├── config/              # Infrastructure config (MinioConfig, AsyncConfig, WebConfig)
├── dashboard/           # Dashboard analytics + StatsController/StatsService
├── document/            # Document domain — entities, controller, service, repository, DTOs
│   ├── annotation/      # DocumentAnnotation, AnnotationService, AnnotationController
│   ├── comment/         # DocumentComment, CommentService, CommentController
│   └── transcription/   # TranscriptionBlock, TranscriptionService, TranscriptionBlockQueryService
├── exception/           # DomainException, ErrorCode, GlobalExceptionHandler
├── filestorage/         # FileService (S3/MinIO)
├── geschichte/          # Geschichte (story) domain
├── importing/           # MassImportService
├── notification/        # Notification domain + SseEmitterRegistry
├── ocr/                 # OCR domain — OcrService, OcrBatchService, training
├── person/              # Person domain — Person, PersonService, PersonController
│   └── relationship/    # PersonRelationship sub-domain
├── security/            # SecurityConfig, Permission, @RequirePermission, PermissionAspect
├── tag/                 # Tag domain — Tag, TagService, TagController
└── user/                # User domain — AppUser, UserGroup, UserService, auth controllers

For per-domain ownership and public surface, see each domain's README.md.

Layering Rules

→ See docs/ARCHITECTURE.md §Layering rule

LLM reminder: controllers never call repositories directly; services never reach into another domain's repository — always call the other domain's service.
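A minimal plain-Java sketch of the cross-domain rule, with constructor injection standing in for Spring beans. DocumentService depends on PersonService and never on PersonRepository. The trimmed-down Person record, the method names, and describeSender() are hypothetical and exist only for illustration.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical, trimmed-down stand-in for the Person entity.
record Person(UUID id, String name) {}

// Owned by the person domain; only PersonService may depend on it.
interface PersonRepository {
    Optional<Person> findById(UUID id);
}

// Public surface of the person domain that other domains call.
class PersonService {
    private final PersonRepository personRepository;

    PersonService(PersonRepository personRepository) {
        this.personRepository = personRepository;
    }

    Optional<Person> findById(UUID id) {
        return personRepository.findById(id);
    }
}

// Document domain: depends on PersonService, never on PersonRepository.
class DocumentService {
    private final PersonService personService;

    DocumentService(PersonService personService) {
        this.personService = personService;
    }

    String describeSender(UUID senderId) {
        return personService.findById(senderId)
                .map(sender -> "Sent by " + sender.name())
                .orElse("Sent by unknown person");
    }
}
```

In the real code both classes would be Spring beans wired through @RequiredArgsConstructor. Constructor injection keeps the dependency direction visible in each class signature, which makes a violation easy to spot in review.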

Key Entities

| Entity | Table | Key relationships |
|---|---|---|
| Document | documents | ManyToOne sender (Person), ManyToMany receivers (Person), ManyToMany tags (Tag) |
| Person | persons | Referenced by documents as sender/receiver; has a name aliases table |
| Tag | tag | ManyToMany with documents via document_tags; self-referencing parent forms a tag tree |
| AppUser | app_users | ManyToMany groups (UserGroup) |
| UserGroup | user_groups | Holds a Set<String> of permissions |
| TranscriptionBlock | transcription_blocks | Per-document, per-page text blocks with polygons |
| DocumentAnnotation | document_annotations | Free-form annotations on document pages |
| DocumentComment | document_comments | Threaded comments with mentions |
| Notification | notifications | User notification feed |
| OcrJob / OcrJobDocument | ocr_jobs, ocr_job_documents | Batch OCR job tracking |

DocumentStatus lifecycle: PLACEHOLDER → UPLOADED → TRANSCRIBED → REVIEWED → ARCHIVED
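The lifecycle reads as a linear progression. The source does not say whether steps can be skipped or reversed, so the sketch below assumes a strictly forward, one-step-at-a-time lifecycle. The next() and canTransitionTo() helpers are hypothetical and not part of the existing DocumentStatus.

```java
import java.util.Optional;

// Sketch of DocumentStatus assuming a strictly linear, forward-only lifecycle.
// Declaration order matters: next() relies on ordinal().
enum DocumentStatus {
    PLACEHOLDER, UPLOADED, TRANSCRIBED, REVIEWED, ARCHIVED;

    // Returns the following status, or empty for the terminal ARCHIVED state.
    Optional<DocumentStatus> next() {
        DocumentStatus[] all = values();
        int nextIndex = ordinal() + 1;
        return nextIndex < all.length ? Optional.of(all[nextIndex]) : Optional.empty();
    }

    // Hypothetical guard: only a single forward step is allowed.
    boolean canTransitionTo(DocumentStatus target) {
        return next().map(target::equals).orElse(false);
    }
}
```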

Entity Code Style

All entities use these Lombok annotations:

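import io.swagger.v3.oas.annotations.media.Schema;
import jakarta.persistence.*;
import lombok.*;

import java.util.UUID;

@Entity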
@Table(name = "table_name")
@Data
@NoArgsConstructor
@AllArgsConstructor
@Builder
public class MyEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    @Schema(requiredMode = Schema.RequiredMode.REQUIRED)
    private UUID id;
    // ...
}
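  • Put @Schema(requiredMode = REQUIRED) on every field the backend always populates. This marks the field as required in the OpenAPI spec, so the generated TypeScript type is non-optional.
  • Collection fields are initialized with new HashSet<>() and annotated @Builder.Default. Without it, Lombok's builder ignores the field initializer and leaves the collection null.
  • Timestamps use @CreationTimestamp / @UpdateTimestamp.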

Services

  • Annotated with @Service, @RequiredArgsConstructor, optionally @Slf4j.
  • Write methods: @Transactional.
  • Read methods: no @Transactional annotation (non-transactional by default).
  • Cross-domain access goes through the other domain's service, never its repository.

Error Handling

→ See CONTRIBUTING.md §Error handling

LLM reminder: use DomainException.notFound/forbidden/conflict/internal() — never throw raw exceptions from service methods. For simple controller validation (not domain logic), ResponseStatusException is acceptable: throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "…"). When adding a new ErrorCode: add to ErrorCode.java, mirror in frontend/src/lib/shared/errors.ts, add i18n keys in messages/{de,en,es}.json.
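The following plain-Java sketch shows the shape these rules imply. The real DomainException and ErrorCode live in the exception/ package. The ErrorCode constants, the integer status field, the factory signatures, and the DocumentLookup helper are all assumptions for illustration. GlobalExceptionHandler would then translate the status and code into the HTTP response.

```java
import java.util.Map;

// Hypothetical error codes. Each real ErrorCode must also be mirrored in
// frontend/src/lib/shared/errors.ts and given i18n keys in messages/{de,en,es}.json.
enum ErrorCode {
    DOCUMENT_NOT_FOUND, PERMISSION_DENIED, DUPLICATE_TAG, OCR_FAILED
}

// Sketch of a domain exception that carries an HTTP status and a machine-readable code.
class DomainException extends RuntimeException {
    private final int status;
    private final ErrorCode code;

    private DomainException(int status, ErrorCode code, String message) {
        super(message);
        this.status = status;
        this.code = code;
    }

    static DomainException notFound(ErrorCode code, String message) {
        return new DomainException(404, code, message);
    }

    static DomainException forbidden(ErrorCode code, String message) {
        return new DomainException(403, code, message);
    }

    static DomainException conflict(ErrorCode code, String message) {
        return new DomainException(409, code, message);
    }

    static DomainException internal(ErrorCode code, String message) {
        return new DomainException(500, code, message);
    }

    int status() { return status; }

    ErrorCode code() { return code; }
}

// Hypothetical service-style lookup: throws a DomainException, never a raw exception.
class DocumentLookup {
    static String requireTitle(Map<String, String> titles, String id) {
        String title = titles.get(id);
        if (title == null) {
            throw DomainException.notFound(ErrorCode.DOCUMENT_NOT_FOUND, "Document " + id + " not found");
        }
        return title;
    }
}
```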

Security / Permissions

→ See docs/ARCHITECTURE.md §Permission system

LLM reminder: @RequirePermission(Permission.WRITE_ALL) is required on every POST, PUT, PATCH, DELETE endpoint — not optional. Do not mix with Spring Security's @PreAuthorize. Available permissions: READ_ALL, WRITE_ALL, ADMIN, ADMIN_USER, ADMIN_TAG, ADMIN_PERMISSION, ANNOTATE_ALL, BLOG_WRITE.
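In the backend the check runs in PermissionAspect via Spring AOP. As a dependency-free sketch of the same idea, the code below reads a @RequirePermission annotation reflectively and compares it against a user's permission strings (UserGroup stores a Set<String>). The annotation's retention and target, the TagControllerSketch class, and the allows() helper are assumptions. This sketch also skips unannotated methods entirely, which may not match how the real aspect treats read endpoints.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;

enum Permission {
    READ_ALL, WRITE_ALL, ADMIN, ADMIN_USER, ADMIN_TAG, ADMIN_PERMISSION, ANNOTATE_ALL, BLOG_WRITE
}

// Runtime retention so the check below can see the annotation via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RequirePermission {
    Permission value();
}

// Hypothetical controller: every write endpoint carries the annotation.
class TagControllerSketch {
    @RequirePermission(Permission.WRITE_ALL)
    void deleteTag(String id) { /* delete logic omitted */ }

    void listTags() { /* read endpoint, no annotation in this sketch */ }
}

final class PermissionCheck {
    private PermissionCheck() {}

    // userPermissions: union of the Set<String> permissions of all the user's groups.
    // Unannotated methods are not checked in this sketch.
    static boolean allows(Method method, Set<String> userPermissions) {
        RequirePermission required = method.getAnnotation(RequirePermission.class);
        return required == null || userPermissions.contains(required.value().name());
    }
}
```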

OCR Integration

The backend orchestrates OCR by calling the Python ocr-service microservice via RestClient:

  • OcrClient interface — mockable for tests
  • RestClientOcrClient — implementation using Spring RestClient
  • OcrService — orchestrates presigned URL generation, OCR call, block mapping
  • OcrBatchService — handles batch/job workflows
  • OcrAsyncRunner — async execution of OCR jobs
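Because OcrService depends only on the OcrClient interface, tests can substitute a fake for the Python service. Here is a plain-Java sketch of that seam. The recognize() signature, the OcrBlock record, the placeholder storage URL, and the line-mapping format are hypothetical, and the real interface may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result type: one recognized text block per page region.
record OcrBlock(int page, String text) {}

// The seam: production uses an HTTP implementation, tests use a fake.
interface OcrClient {
    List<OcrBlock> recognize(String presignedPdfUrl);
}

// Orchestrator sketch: build a URL, call OCR, map the returned blocks.
class OcrServiceSketch {
    private final OcrClient ocrClient;

    OcrServiceSketch(OcrClient ocrClient) {
        this.ocrClient = ocrClient;
    }

    List<String> transcribe(String documentKey) {
        // Placeholder for presigned URL generation via FileService/MinIO.
        String url = "https://storage.example/" + documentKey;
        List<String> lines = new ArrayList<>();
        for (OcrBlock block : ocrClient.recognize(url)) {
            lines.add("p" + block.page() + ": " + block.text());
        }
        return lines;
    }
}
```

A test can pass a lambda as the OcrClient and assert on both the returned lines and the URL the service requested, with no HTTP server involved.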

For ocr-service internals, see ocr-service/README.md.

API Testing

HTTP request files for the VS Code REST Client extension live in backend/api_tests/.

How to Run

cd backend

./mvnw spring-boot:run          # Run with dev profile (requires PostgreSQL + MinIO)
./mvnw clean package            # Build JAR (with tests)
./mvnw clean package -DskipTests # Build JAR without running tests
./mvnw test                     # Run all tests
./mvnw test -Dtest=ClassName    # Run a single test class
./mvnw clean verify             # Run with JaCoCo coverage report

OpenAPI / TypeScript type generation:

  1. Start the backend with --spring.profiles.active=dev (/v3/api-docs is only exposed in the dev profile)
  2. In frontend/: npm run generate:api

LLM reminder: always regenerate types after any model or endpoint change. Stale generated types are the most common cause of "where did my TypeScript type go?"

Testing

  • Unit tests: Mockito + JUnit, pure in-memory
  • Slice tests: @WebMvcTest for controllers; @DataJpaTest with Testcontainers PostgreSQL for repositories
  • Integration tests: Full Spring context with Testcontainers
  • Coverage gate: 88% branch coverage (JaCoCo)