Commit Graph

27 Commits

Author SHA1 Message Date
Marcel
57ffb7d751 chore(ocr): lower OCR_MAX_CACHED_MODELS to 2 with memory budget comment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 20:20:53 +02:00
Marcel
64d27d6d61 feat(ocr): per-sender model registry and /train-sender endpoint
engines/kraken.py:
- Add _SenderModelRegistry with LRU eviction (max configurable via
  OCR_MAX_CACHED_MODELS env var), double-checked locking, invalidate(),
  and path whitelist (/app/models/ only)
- Add _load_sender_model() helper for testability
- extract_page_blocks() and extract_region_text() accept optional
  sender_model_path; route to sender registry when provided

models.py:
- OcrRequest gains senderModelPath: str | None = None field

main.py:
- /ocr and /ocr/stream pass request.senderModelPath to Kraken engine
- New /train-sender endpoint: validates output_model_path, runs ketos
  train with base model as starting point, invalidates sender cache

docker-compose.yml:
- Add OCR_MAX_CACHED_MODELS: "5" to ocr-service environment

test_sender_registry.py:
- 4 tests: cache hit, LRU eviction, invalidate, path traversal guard

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:05:39 +02:00
Marcel
615d404ba9 chore(ocr): add opencv-python-headless, libglib2.0-0, and CLAHE env vars
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:14:47 +02:00
Marcel
57c44cf02f devops(backend): reduce healthcheck start_period to 30s
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
With a pre-built JAR, Spring Boot + Flyway starts in ~15 seconds.
The previous 60s was sized for runtime compilation (90+ seconds).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:33:03 +02:00
Marcel
3c46d820ad devops(backend): switch to multi-stage Docker build
Replace runtime mvn spring-boot:run with a proper multi-stage build:
- Stage 1 (builder): compiles JAR with BuildKit cache mount for ~/.m2
- Stage 2 (runtime): eclipse-temurin:21-jre with only the JAR

Removes the backend source volume mount and maven_cache named volume.
Deploy with: docker compose up -d --build

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:33:03 +02:00
Marcel
e4719b9487 fix(deploy): increase OCR healthcheck start_period, comment ocr_cache volume, add token hint
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
- start_period 60s → 120s: Zenodo download on cold start can exceed 60s on slow connections
- ocr_cache volume comment: documents what the cache stores for future operators
- .env.example: add token generation command to prevent weak placeholder in production

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
f08897b801 fix(deploy): wire OCR training token to backend and raise container memory limit
- Pass OCR_TRAINING_TOKEN through to the backend container as
  APP_OCR_TRAINING_TOKEN so RestClientOcrClient sends the X-Training-Token
  header when calling /train and /segtrain.
- Raise mem_limit/memswap_limit from 8g to 12g to give segtrain headroom
  on hosts with more available RAM.
- Uncomment OCR_TRAINING_TOKEN in .env.example — it is now required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 21:17:53 +02:00
Marcel
287920a982 docs(ocr): document single-node constraint for OCR training
Training reloads the Kraken model in-process on the Python service.
The DB-level RUNNING constraint prevents concurrent API calls but
cannot protect against multi-replica deployments. Added explicit
comments in docker-compose.yml and OcrTrainingService to prevent
accidental horizontal scaling. See ADR-001.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 23:01:45 +02:00
Marcel
bc97a2dade feat(ocr): add /train endpoint to OCR service and OcrClient.trainModel()
- POST /train in ocr-service with ZIP Slip validation, TemporaryDirectory,
  ketos transfer learning, timestamped backups (keep last 3), in-process reload
- X-Training-Token auth (no-op in dev when TRAINING_TOKEN env is empty)
- trainModel() in OcrClient interface + RestClientOcrClient (10-min timeout,
  multipart upload, forwards X-Training-Token when configured)
- TRAINING_TOKEN env var wired in docker-compose; --workers 2 in Dockerfile
  so /health stays responsive during synchronous training

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 14:40:53 +02:00
Marcel
0beaf351f0 fix(docker): soften ocr-service dependency and clean up compose
Changed ocr-service dependency from service_healthy to service_started
since the backend already handles OCR unavailability gracefully. Removed
unused APP_S3_INTERNAL_URL env var. Added expose directive and
.dockerignore for ocr-service.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:29:21 +02:00
Marcel
0b0d4a7d5e perf(ocr): double batch sizes (detector=8, recognition=16)
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
4GB headroom in the container. Doubling batches should use ~2GB
more RAM but significantly speed up inference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 23:23:13 +02:00
Marcel
1b7540143e fix(ocr): persist model cache across container restarts
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 0s
CI / Backend Unit Tests (pull_request) Failing after 1s
Surya downloads models from HuggingFace to /root/.cache on first use.
Without a volume, every container restart re-downloads ~73MB+.
Added ocr_cache volume to persist the cache.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 23:21:51 +02:00
Marcel
2cc7dcd5e3 perf(ocr): increase batch sizes (detector=4, recognition=8)
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 2s
5GB free on host during OCR, container at 3.8/8GB. Larger batches
use more memory but process faster on CPU.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 23:19:22 +02:00
Marcel
741979304c fix(ocr): increase to 8g mem_limit and larger batch sizes
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 2s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 0s
5GB free on host while OCR runs — give the container more room.
Bump batch sizes (detector=2, recognition=4) so it processes
faster with the available memory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:35:34 +02:00
Marcel
e9cf2998fe fix(ocr): reduce mem_limit to 4g, allow 4g swap for 16GB dev machines
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
mem_limit 4g keeps more RAM free for the host. memswap_limit 8g
(= 4g swap) lets peaks spill to disk instead of OOM-killing.
Slower during peak inference but won't starve the dev machine.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:33:05 +02:00
Marcel
902d423f3c fix(ocr): reduce memory usage for 16GB dev machines
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
- Surya models lazy-load on first OCR request instead of at startup
  (saves ~3-4GB idle RAM — Kraken stays eager at ~16MB)
- Process one page at a time in Surya engine (limits peak memory)
- RECOGNITION_BATCH_SIZE=1, DETECTOR_BATCH_SIZE=1 (slower but fits in RAM)
- Revert mem_limit back to 6GB (sufficient with these optimizations)
- Render DPI stays at 200

Idle memory: ~2GB (Kraken only). Peak during OCR: ~5-6GB (Surya loaded).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:26:50 +02:00
Marcel
7f78bc9cf4 fix(ocr): increase memory limit to 10GB, reduce render DPI to 200
Some checks failed
CI / Unit & Component Tests (push) Failing after 1s
CI / Backend Unit Tests (push) Failing after 0s
CI / Unit & Component Tests (pull_request) Failing after 0s
CI / Backend Unit Tests (pull_request) Failing after 1s
Surya 0.17 models use ~5GB idle. At 300 DPI on a multi-page PDF,
page images + inference tensors push past the 6GB limit, causing
OOM kills during 'Detecting bboxes'. Increased to 10GB and reduced
render DPI to 200 (still sufficient for OCR, uses ~44% less memory).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 22:20:36 +02:00
Marcel
f064b27439 feat(ocr): per-script-type confidence thresholds
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 1s
CI / Unit & Component Tests (pull_request) Failing after 1s
CI / Backend Unit Tests (pull_request) Failing after 1s
Kurrent OCR produces much lower confidence than typewriter/Latin.
Separate thresholds allow aggressive filtering for Kurrent (0.5)
while keeping typewriter lenient (0.3).

- OCR_CONFIDENCE_THRESHOLD: default for Surya paths (0.3)
- OCR_CONFIDENCE_THRESHOLD_KURRENT: Kraken Kurrent path (0.5)
- apply_confidence_markers() now accepts threshold parameter
- get_threshold(script_type) selects the right threshold

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 20:50:59 +02:00
Marcel
c74539b04b feat(ocr): auto-insert [unleserlich] markers for low-confidence words
Some checks failed
CI / Unit & Component Tests (push) Failing after 2s
CI / Backend Unit Tests (push) Failing after 2s
CI / Unit & Component Tests (pull_request) Failing after 2s
CI / Backend Unit Tests (pull_request) Failing after 1s
New confidence.py module with two functions:
- apply_confidence_markers(): replaces words below threshold with
  [unleserlich], collapses adjacent markers into one
- words_from_characters(): reconstructs word-level confidence from
  Kraken's character-level data

Surya 0.17 provides native word-level confidence via line.words.
Kraken 7.0 provides per-character confidences via record.confidences.
Both engines now pass word+confidence data through main.py, which
applies the marker post-processing before returning the API response.

Threshold configurable via OCR_CONFIDENCE_THRESHOLD env var (default 0.3).
Frontend already renders [unleserlich] markers via transcriptionMarkers.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 19:16:17 +02:00
Marcel
6737bd6db5 feat(ocr): add Python OCR microservice, RestClientOcrClient, Docker Compose
Python microservice (ocr-service/):
- FastAPI app with /ocr and /health endpoints
- Surya engine: transformer-based OCR for typewritten/modern handwriting
- Kraken engine: historical HTR for Kurrent/Suetterlin with
  pure-Python polygon-to-quad approximation (gift wrapping + rotating calipers)
- Eager model loading at startup via lifespan context manager
- PDF download via httpx, page rendering via pypdfium2 at 300 DPI

Java RestClientOcrClient:
- Implements OcrClient + OcrHealthClient interfaces
- Calls Python service via Spring RestClient
- Health check with graceful fallback

Docker Compose:
- New ocr-service container (mem_limit 6g, no host ports)
- Health check with start_period 60s for model loading
- ocr_models volume for Kraken model files
- Backend depends on ocr-service health

Refs #226, #227

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 15:26:40 +02:00
Marcel
ea1c097ae0 fix(e2e): activate e2e profile in dev mode and create reader user idempotently
- Add e2e to the dev Maven profile's spring.profiles.active so
  DataInitializer always runs when developing/testing locally
- Create the reader test user independently of the person-seed guard
  so it survives restarts where seed data already exists
- Set SPRING_PROFILES_ACTIVE=dev,e2e in docker-compose backend service

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 08:25:54 +01:00
Marcel
c18cdbfac1 feat(dev): add Mailpit mail catcher to docker-compose
Some checks failed
CI / Backend Unit Tests (push) Has been cancelled
CI / E2E Tests (push) Has been cancelled
CI / Unit & Component Tests (push) Has been cancelled
CI / Unit & Component Tests (pull_request) Successful in 2m6s
CI / Backend Unit Tests (pull_request) Successful in 2m7s
CI / E2E Tests (pull_request) Has been cancelled
Adds a Mailpit container that catches all outgoing emails locally so
password reset links can be tested without a real SMTP server.

- Backend defaults to MAIL_HOST=mailpit / MAIL_PORT=1025 in compose
- SMTP auth and STARTTLS disabled for Mailpit (no credentials needed)
- Web inbox available at http://localhost:8025
- Production SMTP still works by overriding MAIL_HOST, MAIL_PORT,
  MAIL_USERNAME, MAIL_SMTP_AUTH, and MAIL_STARTTLS_ENABLE in .env

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 09:10:17 +01:00
Marcel
908221f04d feat(frontend): add forgot-password and reset-password pages
Some checks failed
CI / Unit & Component Tests (push) Successful in 2m7s
CI / Backend Unit Tests (push) Successful in 2m3s
CI / E2E Tests (push) Failing after 14m54s
CI / Unit & Component Tests (pull_request) Successful in 2m4s
CI / E2E Tests (pull_request) Has been cancelled
CI / Backend Unit Tests (pull_request) Has been cancelled
- /forgot-password: email form → sends POST /api/auth/forgot-password → success banner
- /reset-password: password form reads token from URL → sends POST /api/auth/reset-password
- Login page: add "Passwort vergessen?" link
- hooks.server.ts: add /forgot-password and /reset-password to PUBLIC_PATHS; skip auth
  injection for public auth API endpoints
- errors.ts: add INVALID_RESET_TOKEN error code
- i18n: add all new message keys in de/en/es
- playwright.config.ts: use E2E_BASE_URL for webServer check URL (allows reusing docker
  dev server at port 5173 locally)
- ci.yml: pass E2E_BACKEND_URL=http://localhost:8080 to E2E test step
- e2e/password-reset.spec.ts: 5 tests (4 pass locally, full flow requires e2e profile in CI)
- Regenerated OpenAPI types including new /api/auth/* endpoints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 07:26:35 +01:00
Marcel
9b67db74eb feat: auto-start Spring Boot backend via docker-compose
Replace the devcontainer (sleep infinity + VS Code image) with a proper
dev setup:
- Dockerfile: eclipse-temurin:21-jdk-alpine running ./mvnw spring-boot:run
- Source mounted at /app, Maven deps cached in named volume maven_cache
- Healthcheck on /actuator/health so frontend waits until backend is ready
- frontend depends_on backend: service_healthy (was service_started)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 12:03:14 +01:00
Marcel
3280125140 feat: add frontend dev container to docker-compose
- frontend/Dockerfile: Node 20 Alpine image running npm run dev
- docker-compose: frontend service with depends_on db/minio/backend,
  source mounted as volume, named volume for node_modules to avoid
  OS binary conflicts between host and container
- vite.config.ts: make API proxy target configurable via
  API_PROXY_TARGET env var (defaults to localhost:8080 for local dev,
  set to http://backend:8080 inside Docker)
- .env: update PORT_FRONTEND to 5173 (actual vite dev server port)

Usage:
  docker compose up frontend   # starts frontend + all dependencies
  docker compose up            # starts everything

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 12:03:14 +01:00
Marcel
0cb8812692 fix: correct devcontainer workspace path mismatch
Volume was mounting ./backend to /workspaces/backend, but devcontainer.json
pointed VS Code to /workspaces/familienarchiv — causing the broken path shown
in Remote Explorer.

Now mounts the full project root to /workspaces/familienarchiv, which matches
the workspaceFolder variable. Also gives container access to frontend/ for
running npm run generate:api without leaving the devcontainer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 13:44:06 +01:00
Marcel
e63adb964d restructure: flatten workspace nesting, move devcontainer to root
- backend/workspaces/backend/ → backend/
- backend/workspaces/frontend/ → frontend/
- backend/.devcontainer/ + .vscode/ → repo root (where VS Code expects them)
- loose scripts/SQL files → scripts/
- replace nested git repo with single repo at project root
- update docker-compose.yml build context and devcontainer.json path
- add root .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 11:47:58 +01:00