familienarchiv

Author	SHA1	Message	Date
Marcel	e9cf2998fe	fix(ocr): reduce mem_limit to 4g, allow 4g swap for 16GB dev machines mem_limit 4g keeps more RAM free for the host. memswap_limit 8g (= 4g swap) lets peaks spill to disk instead of OOM-killing. Slower during peak inference but won't starve the dev machine. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 22:33:05 +02:00
Marcel	902d423f3c	fix(ocr): reduce memory usage for 16GB dev machines - Surya models lazy-load on first OCR request instead of at startup (saves ~3-4GB idle RAM; Kraken stays eager at ~16MB) - Process one page at a time in Surya engine (limits peak memory) - RECOGNITION_BATCH_SIZE=1, DETECTOR_BATCH_SIZE=1 (slower but fits in RAM) - Revert mem_limit back to 6GB (sufficient with these optimizations) - Render DPI stays at 200 Idle memory: ~2GB (Kraken only). Peak during OCR: ~5-6GB (Surya loaded). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 22:26:50 +02:00
Marcel	7f78bc9cf4	fix(ocr): increase memory limit to 10GB, reduce render DPI to 200 Surya 0.17 models use ~5GB idle. At 300 DPI on a multi-page PDF, page images + inference tensors push past the 6GB limit, causing OOM kills during 'Detecting bboxes'. Increased to 10GB and reduced render DPI to 200 (still sufficient for OCR; page images have ~44% of the pixels, i.e. ~56% less image memory). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 22:20:36 +02:00
Marcel	f064b27439	feat(ocr): per-script-type confidence thresholds Kurrent OCR produces much lower confidence than typewriter/Latin. Separate thresholds allow aggressive filtering for Kurrent (0.5) while keeping typewriter lenient (0.3). - OCR_CONFIDENCE_THRESHOLD: default for Surya paths (0.3) - OCR_CONFIDENCE_THRESHOLD_KURRENT: Kraken Kurrent path (0.5) - apply_confidence_markers() now accepts threshold parameter - get_threshold(script_type) selects the right threshold Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 20:50:59 +02:00
Marcel	c74539b04b	feat(ocr): auto-insert [unleserlich] markers for low-confidence words New confidence.py module with two functions: - apply_confidence_markers(): replaces words below threshold with [unleserlich], collapses adjacent markers into one - words_from_characters(): reconstructs word-level confidence from Kraken's character-level data Surya 0.17 provides native word-level confidence via line.words. Kraken 7.0 provides per-character confidences via record.confidences. Both engines now pass word+confidence data through main.py, which applies the marker post-processing before returning the API response. Threshold configurable via OCR_CONFIDENCE_THRESHOLD env var (default 0.3). Frontend already renders [unleserlich] markers via transcriptionMarkers.ts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 19:16:17 +02:00
Marcel	6737bd6db5	feat(ocr): add Python OCR microservice, RestClientOcrClient, Docker Compose Python microservice (ocr-service/): - FastAPI app with /ocr and /health endpoints - Surya engine: transformer-based OCR for typewritten/modern handwriting - Kraken engine: historical HTR for Kurrent/Suetterlin with pure-Python polygon-to-quad approximation (gift wrapping + rotating calipers) - Eager model loading at startup via lifespan context manager - PDF download via httpx, page rendering via pypdfium2 at 300 DPI Java RestClientOcrClient: - Implements OcrClient + OcrHealthClient interfaces - Calls Python service via Spring RestClient - Health check with graceful fallback Docker Compose: - New ocr-service container (mem_limit 6g, no host ports) - Health check with start_period 60s for model loading - ocr_models volume for Kraken model files - Backend depends on ocr-service health Refs #226, #227 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 15:26:40 +02:00
Marcel	ea1c097ae0	fix(e2e): activate e2e profile in dev mode and create reader user idempotently - Add e2e to the dev Maven profile's spring.profiles.active so DataInitializer always runs when developing/testing locally - Create the reader test user independently of the person-seed guard so it survives restarts where seed data already exists - Set SPRING_PROFILES_ACTIVE=dev,e2e in docker-compose backend service Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-24 08:25:54 +01:00
Marcel	c18cdbfac1	feat(dev): add Mailpit mail catcher to docker-compose Adds a Mailpit container that catches all outgoing emails locally so password reset links can be tested without a real SMTP server. - Backend defaults to MAIL_HOST=mailpit / MAIL_PORT=1025 in compose - SMTP auth and STARTTLS disabled for Mailpit (no credentials needed) - Web inbox available at http://localhost:8025 - Production SMTP still works by overriding MAIL_HOST, MAIL_PORT, MAIL_USERNAME, MAIL_SMTP_AUTH, and MAIL_STARTTLS_ENABLE in .env Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 09:10:17 +01:00
Marcel	908221f04d	feat(frontend): add forgot-password and reset-password pages - /forgot-password: email form → sends POST /api/auth/forgot-password → success banner - /reset-password: password form reads token from URL → sends POST /api/auth/reset-password - Login page: add "Passwort vergessen?" link - hooks.server.ts: add /forgot-password and /reset-password to PUBLIC_PATHS; skip auth injection for public auth API endpoints - errors.ts: add INVALID_RESET_TOKEN error code - i18n: add all new message keys in de/en/es - playwright.config.ts: use E2E_BASE_URL for webServer check URL (allows reusing docker dev server at port 5173 locally) - ci.yml: pass E2E_BACKEND_URL=http://localhost:8080 to E2E test step - e2e/password-reset.spec.ts: 5 tests (4 pass locally, full flow requires e2e profile in CI) - Regenerated OpenAPI types including new /api/auth/* endpoints Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 07:26:35 +01:00
Marcel	9b67db74eb	feat: auto-start Spring Boot backend via docker-compose Replace the devcontainer (sleep infinity + VS Code image) with a proper dev setup: - Dockerfile: eclipse-temurin:21-jdk-alpine running ./mvnw spring-boot:run - Source mounted at /app, Maven deps cached in named volume maven_cache - Healthcheck on /actuator/health so frontend waits until backend is ready - frontend depends_on backend: service_healthy (was service_started) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 12:03:14 +01:00
Marcel	3280125140	feat: add frontend dev container to docker-compose - frontend/Dockerfile: Node 20 Alpine image running npm run dev - docker-compose: frontend service with depends_on db/minio/backend, source mounted as volume, named volume for node_modules to avoid OS binary conflicts between host and container - vite.config.ts: make API proxy target configurable via API_PROXY_TARGET env var (defaults to localhost:8080 for local dev, set to http://backend:8080 inside Docker) - .env: update PORT_FRONTEND to 5173 (actual vite dev server port) Usage: docker compose up frontend # starts frontend + all dependencies docker compose up # starts everything Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 12:03:14 +01:00
Marcel	0cb8812692	fix: correct devcontainer workspace path mismatch Volume was mounting ./backend to /workspaces/backend, but devcontainer.json pointed VS Code to /workspaces/familienarchiv — causing the broken path shown in Remote Explorer. Now mounts the full project root to /workspaces/familienarchiv, which matches the workspaceFolder variable. Also gives container access to frontend/ for running npm run generate:api without leaving the devcontainer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-15 13:44:06 +01:00
Marcel	e63adb964d	restructure: flatten workspace nesting, move devcontainer to root - backend/workspaces/backend/ → backend/ - backend/workspaces/frontend/ → frontend/ - backend/.devcontainer/ + .vscode/ → repo root (where VS Code expects them) - loose scripts/SQL files → scripts/ - replace nested git repo with single repo at project root - update docker-compose.yml build context and devcontainer.json path - add root .gitignore Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-15 11:47:58 +01:00
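
The confidence post-processing described in commits c74539b04b and f064b27439 could look roughly like the sketch below. Only the function names, the `[unleserlich]` marker, and the 0.3 / 0.5 thresholds come from the commit messages. The signatures, the `"kurrent"` script-type key, splitting words on whitespace, and averaging character confidences are assumptions for illustration, not the contents of the repository's `confidence.py`.

```python
from typing import List, Tuple

MARKER = "[unleserlich]"

# Threshold values are from the commit message; the dict layout and the
# "kurrent" key are assumptions made for this sketch.
DEFAULT_THRESHOLD = 0.3
THRESHOLDS = {"kurrent": 0.5}


def get_threshold(script_type: str) -> float:
    """Return the threshold for a script type, falling back to the default."""
    return THRESHOLDS.get(script_type.lower(), DEFAULT_THRESHOLD)


def words_from_characters(
    text: str, confidences: List[float]
) -> List[Tuple[str, float]]:
    """Group per-character confidences into (word, confidence) pairs.

    Words are split on whitespace and each word gets the mean of its
    character confidences (the mean is an assumption).
    """
    words: List[Tuple[str, float]] = []
    chars: List[str] = []
    confs: List[float] = []
    for ch, conf in zip(text, confidences):
        if ch.isspace():
            if chars:
                words.append(("".join(chars), sum(confs) / len(confs)))
                chars, confs = [], []
        else:
            chars.append(ch)
            confs.append(conf)
    if chars:
        words.append(("".join(chars), sum(confs) / len(confs)))
    return words


def apply_confidence_markers(
    words: List[Tuple[str, float]], threshold: float
) -> str:
    """Replace words below the threshold with the marker.

    Runs of adjacent low-confidence words collapse into a single marker.
    """
    out: List[str] = []
    for word, conf in words:
        if conf < threshold:
            if not out or out[-1] != MARKER:
                out.append(MARKER)
        else:
            out.append(word)
    return " ".join(out)
```

With the same word list, the stricter Kurrent threshold marks more words than the default one, which is the behavior the per-script-type commit describes.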

13 Commits
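
The DPI reduction in commit 7f78bc9cf4 can be checked with quick arithmetic. Pixel count scales with the square of the DPI, so rendering at 200 DPI instead of 300 DPI yields about 44% of the pixels, a reduction of roughly 56% for the raw page images. The sketch below assumes uncompressed 8-bit RGB bitmaps and A4 pages; both are assumptions, and it says nothing about model weights or inference tensors, which do not shrink in the same proportion.

```python
def pixel_ratio(new_dpi: float, old_dpi: float) -> float:
    """Ratio of pixel counts when the same page is rendered at new_dpi vs old_dpi."""
    return (new_dpi / old_dpi) ** 2


def page_bytes(width_in: float, height_in: float, dpi: int, channels: int = 3) -> int:
    """Approximate size of one uncompressed page bitmap (assumes 1 byte per channel)."""
    return round(width_in * dpi) * round(height_in * dpi) * channels


# A4 page dimensions in inches (assumed page size for the example).
A4_WIDTH_IN, A4_HEIGHT_IN = 8.27, 11.69

ratio = pixel_ratio(200, 300)
print(f"200 DPI keeps {ratio:.1%} of the pixels, saving {1 - ratio:.1%}")

for dpi in (300, 200):
    mib = page_bytes(A4_WIDTH_IN, A4_HEIGHT_IN, dpi) / 2**20
    print(f"{dpi} DPI: about {mib:.1f} MiB per A4 page")
```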