Commit Graph

13 Commits

Author SHA1 Message Date
Marcel
e9cf2998fe fix(ocr): reduce mem_limit to 4g, allow 4g swap for 16GB dev machines
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
mem_limit 4g keeps more RAM free for the host. memswap_limit 8g
(= 4g swap) lets peaks spill to disk instead of OOM-killing.
Slower during peak inference but won't starve the dev machine.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:33:05 +02:00
Marcel
902d423f3c fix(ocr): reduce memory usage for 16GB dev machines
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
- Surya models lazy-load on first OCR request instead of at startup
  (saves ~3-4GB idle RAM — Kraken stays eager at ~16MB)
- Process one page at a time in Surya engine (limits peak memory)
- RECOGNITION_BATCH_SIZE=1, DETECTOR_BATCH_SIZE=1 (slower but fits in RAM)
- Revert mem_limit back to 6GB (sufficient with these optimizations)
- Render DPI stays at 200

Idle memory: ~2GB (Kraken only). Peak during OCR: ~5-6GB (Surya loaded).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:26:50 +02:00
Marcel
7f78bc9cf4 fix(ocr): increase memory limit to 10GB, reduce render DPI to 200
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 0s
CI / Unit & Component Tests (pull_request) Failing after 0s
CI / Backend Unit Tests (pull_request) Failing after 1s
Surya 0.17 models use ~5GB idle. At 300 DPI on a multi-page PDF,
page images + inference tensors push past the 6GB limit, causing
OOM kills during 'Detecting bboxes'. Increased to 10GB and reduced
render DPI to 200 (still sufficient for OCR, uses ~44% less memory).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:20:36 +02:00
Marcel
f064b27439 feat(ocr): per-script-type confidence thresholds
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
Kurrent OCR produces much lower confidence than typewriter/Latin.
Separate thresholds allow aggressive filtering for Kurrent (0.5)
while keeping typewriter lenient (0.3).

- OCR_CONFIDENCE_THRESHOLD: default for Surya paths (0.3)
- OCR_CONFIDENCE_THRESHOLD_KURRENT: Kraken Kurrent path (0.5)
- apply_confidence_markers() now accepts threshold parameter
- get_threshold(script_type) selects the right threshold

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 20:50:59 +02:00
Marcel
c74539b04b feat(ocr): auto-insert [unleserlich] markers for low-confidence words
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 2s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 1s
New confidence.py module with two functions:
- apply_confidence_markers(): replaces words below threshold with
  [unleserlich], collapses adjacent markers into one
- words_from_characters(): reconstructs word-level confidence from
  Kraken's character-level data

Surya 0.17 provides native word-level confidence via line.words.
Kraken 7.0 provides per-character confidences via record.confidences.
Both engines now pass word+confidence data through main.py, which
applies the marker post-processing before returning the API response.

Threshold configurable via OCR_CONFIDENCE_THRESHOLD env var (default 0.3).
Frontend already renders [unleserlich] markers via transcriptionMarkers.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:16:17 +02:00
Marcel
6737bd6db5 feat(ocr): add Python OCR microservice, RestClientOcrClient, Docker Compose
Python microservice (ocr-service/):
- FastAPI app with /ocr and /health endpoints
- Surya engine: transformer-based OCR for typewritten/modern handwriting
- Kraken engine: historical HTR for Kurrent/Suetterlin with
  pure-Python polygon-to-quad approximation (gift wrapping + rotating calipers)
- Eager model loading at startup via lifespan context manager
- PDF download via httpx, page rendering via pypdfium2 at 300 DPI

Java RestClientOcrClient:
- Implements OcrClient + OcrHealthClient interfaces
- Calls Python service via Spring RestClient
- Health check with graceful fallback

Docker Compose:
- New ocr-service container (mem_limit 6g, no host ports)
- Health check with start_period 60s for model loading
- ocr_models volume for Kraken model files
- Backend depends on ocr-service health

Refs #226, #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:26:40 +02:00
Marcel
ea1c097ae0 fix(e2e): activate e2e profile in dev mode and create reader user idempotently
- Add e2e to the dev Maven profile's spring.profiles.active so
  DataInitializer always runs when developing/testing locally
- Create the reader test user independently of the person-seed guard
  so it survives restarts where seed data already exists
- Set SPRING_PROFILES_ACTIVE=dev,e2e in docker-compose backend service

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 08:25:54 +01:00
Marcel
c18cdbfac1 feat(dev): add Mailpit mail catcher to docker-compose
Some checks failed
CI / Backend Unit Tests (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
CI / Unit & Component Tests (pull_request) Successful in 2m6s
CI / Backend Unit Tests (pull_request) Successful in 2m7s
CI / E2E Tests (pull_request) Has been cancelled
Adds a Mailpit container that catches all outgoing emails locally so
password reset links can be tested without a real SMTP server.

- Backend defaults to MAIL_HOST=mailpit / MAIL_PORT=1025 in compose
- SMTP auth and STARTTLS disabled for Mailpit (no credentials needed)
- Web inbox available at http://localhost:8025
- Production SMTP still works by overriding MAIL_HOST, MAIL_PORT,
  MAIL_USERNAME, MAIL_SMTP_AUTH, and MAIL_STARTTLS_ENABLE in .env

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 09:10:17 +01:00
Marcel
908221f04d feat(frontend): add forgot-password and reset-password pages
Some checks failed
CI / Unit & Component Tests (push) Successful in 2m7s
CI / Backend Unit Tests (push) Successful in 2m3s
CI / E2E Tests (push) Failing after 14m54s
CI / Unit & Component Tests (pull_request) Successful in 2m4s
CI / E2E Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
- /forgot-password: email form → sends POST /api/auth/forgot-password → success banner
- /reset-password: password form reads token from URL → sends POST /api/auth/reset-password
- Login page: add "Passwort vergessen?" link
- hooks.server.ts: add /forgot-password and /reset-password to PUBLIC_PATHS; skip auth
  injection for public auth API endpoints
- errors.ts: add INVALID_RESET_TOKEN error code
- i18n: add all new message keys in de/en/es
- playwright.config.ts: use E2E_BASE_URL for webServer check URL (allows reusing docker
  dev server at port 5173 locally)
- ci.yml: pass E2E_BACKEND_URL=http://localhost:8080 to E2E test step
- e2e/password-reset.spec.ts: 5 tests (4 pass locally, full flow requires e2e profile in CI)
- Regenerated OpenAPI types including new /api/auth/* endpoints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 07:26:35 +01:00
Marcel
9b67db74eb feat: auto-start Spring Boot backend via docker-compose
Replace the devcontainer (sleep infinity + VS Code image) with a proper
dev setup:
- Dockerfile: eclipse-temurin:21-jdk-alpine running ./mvnw spring-boot:run
- Source mounted at /app, Maven deps cached in named volume maven_cache
- Healthcheck on /actuator/health so frontend waits until backend is ready
- frontend depends_on backend: service_healthy (was service_started)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 12:03:14 +01:00
Marcel
3280125140 feat: add frontend dev container to docker-compose
- frontend/Dockerfile: Node 20 Alpine image running npm run dev
- docker-compose: frontend service with depends_on db/minio/backend,
  source mounted as volume, named volume for node_modules to avoid
  OS binary conflicts between host and container
- vite.config.ts: make API proxy target configurable via
  API_PROXY_TARGET env var (defaults to localhost:8080 for local dev,
  set to http://backend:8080 inside Docker)
- .env: update PORT_FRONTEND to 5173 (actual vite dev server port)

Usage:
  docker compose up frontend   # starts frontend + all dependencies
  docker compose up            # starts everything

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 12:03:14 +01:00
Marcel
0cb8812692 fix: correct devcontainer workspace path mismatch
Volume was mounting ./backend to /workspaces/backend, but devcontainer.json
pointed VS Code to /workspaces/familienarchiv — causing the broken path shown
in Remote Explorer.

Now mounts the full project root to /workspaces/familienarchiv, which matches
the workspaceFolder variable. Also gives container access to frontend/ for
running npm run generate:api without leaving the devcontainer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 13:44:06 +01:00
Marcel
e63adb964d restructure: flatten workspace nesting, move devcontainer to root
- backend/workspaces/backend/ → backend/
- backend/workspaces/frontend/ → frontend/
- backend/.devcontainer/ + .vscode/ → repo root (where VS Code expects them)
- loose scripts/SQL files → scripts/
- replace nested git repo with single repo at project root
- update docker-compose.yml build context and devcontainer.json path
- add root .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 11:47:58 +01:00