From ea38efc734c2e0bf35b4c651fe09501f95e20ca9 Mon Sep 17 00:00:00 2001
From: Marcel
Date: Wed, 27 May 2026 21:30:40 +0200
Subject: [PATCH] docs: drop remaining stale MassImportService/ExcelService references

Replace the legacy raw-spreadsheet importer references left behind
after #674 with the canonical import architecture
(CanonicalImportOrchestrator + four loaders) and document #686
index-based PDF resolution.

- l3-backend-3b: DocumentImporter now resolves PDF by index
  (importDir/ .pdf) with index validation + canonical-path containment
  + %PDF magic-byte check (no recursive walk / homoglyph file-path
  guards)
- c4-diagrams.md: replace massImport/excelSvc components + their rels
  with an importOrch (CanonicalImportOrchestrator) component wired to
  doc/person/tag services; refresh adminCtrl and adminSystem
  descriptions
- ARCHITECTURE.md: importing package row now describes the orchestrator
  + four loaders consuming canonical artifacts
- TODO-backend.md: remove obsolete "MassImportService provides no
  status" item (service deleted; orchestrator already exposes
  import-status); update stale ExcelService test-coverage suggestion

Refs #686

Co-Authored-By: Claude Opus 4.7
---
 docs/ARCHITECTURE.md                         |  2 +-
 docs/TODO-backend.md                         | 13 +------------
 docs/architecture/c4-diagrams.md             | 16 ++++++++--------
 .../c4/l3-backend-3b-document-management.puml |  2 +-
 .../c4/l3-frontend-3d-administration.puml    |  2 +-
 5 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index de071a43..5bc46261 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -65,7 +65,7 @@ Members of the cross-cutting layer have no entity of their own, no user-facing C
 | `dashboard` | Stats aggregation for the admin dashboard and Family Pulse widget | Aggregates from 3+ domains; no owned entities |
 | `exception` | `DomainException`, `ErrorCode` enum, `GlobalExceptionHandler` | Framework infra; consumed by every controller and service. Adding a new `ErrorCode` requires matching updates in `frontend/src/lib/shared/errors.ts` and all three `messages/*.json` locale files. Current security-related codes: `CSRF_TOKEN_MISSING` (403 on mutating request without valid `X-XSRF-TOKEN` header), `TOO_MANY_LOGIN_ATTEMPTS` (429 when login rate limit exceeded). |
 | `filestorage` | `FileService` — MinIO/S3 upload, download, presigned-URL generation | Generic service; consumed by `document` and `ocr` |
-| `importing` | `MassImportService` — async ODS/Excel batch import | Orchestrates across `person`, `tag`, `document` |
+| `importing` | `CanonicalImportOrchestrator` — async canonical import running four idempotent loaders (`TagTreeImporter` → `PersonRegisterImporter` → `PersonTreeImporter` → `DocumentImporter`) over the normalizer's committed canonical artifacts (`canonical-*.xlsx` + `canonical-persons-tree.json`) | Orchestrates across `person`, `tag`, `document` |
 | `security` | `SecurityConfig`, `Permission` enum, `@RequirePermission` annotation, `PermissionAspect` (AOP) | Framework infra; enforced globally across all controllers |
 
 **Frontend `shared/`** follows the same admission criteria. Key members: `api.server.ts` (typed openapi-fetch client factory), `errors.ts` (backend `ErrorCode` → i18n mapping), `shared/primitives/` (generic UI components used across ≥2 domains), `shared/discussion/` (comment/mention editor used by `document` and `geschichte`), `shared/utils/` (pure date/sort/debounce utilities).
diff --git a/docs/TODO-backend.md b/docs/TODO-backend.md
index 7b03f802..e09f47e9 100644
--- a/docs/TODO-backend.md
+++ b/docs/TODO-backend.md
@@ -94,17 +94,6 @@ The schema includes `spring_session` and `spring_session_attributes` tables, but
 
 ---
 
-### `MassImportService` provides no status or error feedback
-**File:** `service/MassImportService.java`, `controller/AdminController.java`
-
-`/api/admin/trigger-import` returns immediately (async), but there is no way for the admin to know whether the import succeeded, failed, or is still running. Errors during async execution are silently swallowed.
-
-**Fix options:**
-- Store import job status in a DB table (`import_jobs`) with state (`RUNNING`, `DONE`, `FAILED`) and expose a `GET /api/admin/import-status` endpoint
-- Alternatively, make the endpoint synchronous since it already blocks on file I/O — only use async if you need true non-blocking behaviour
-
----
-
 ## Missing Capabilities
 
 ### No test coverage
@@ -114,7 +103,7 @@ The only test is a Spring context load test. No unit or integration tests exist
 
 **Suggested starting points (highest value for effort):**
 1. `DocumentSpecifications` — pure logic, easy to unit test with an in-memory H2 or Testcontainers PostgreSQL
-2. `ExcelService` — parsing logic, test with fixture `.xlsx` files (one exists in `api_tests/`)
+2. Canonical import loaders (`CanonicalSheetReader`, `DocumentImporter`, etc.) — parsing/upsert logic, test with fixture canonical `.xlsx` files
 3. `PermissionAspect` — security logic should be tested; use `@WithMockUser` from Spring Security Test
 
 ---
diff --git a/docs/architecture/c4-diagrams.md b/docs/architecture/c4-diagrams.md
index 858082aa..d01cdfaa 100644
--- a/docs/architecture/c4-diagrams.md
+++ b/docs/architecture/c4-diagrams.md
@@ -93,7 +93,7 @@ C4Component
 
 ### 3b — Document Management & Import
 
-Document management, file storage, and bulk Excel/ODS import.
+Document management, file storage, and the canonical import.
 
 ```mermaid
 C4Component
@@ -105,12 +105,11 @@ C4Component
 
     System_Boundary(backend, "API Backend (Spring Boot)") {
         Component(docCtrl, "DocumentController", "Spring MVC — /api/documents", "CRUD for documents: search, get by ID, update metadata, upload/download file, conversation thread, and batch metadata updates.")
-        Component(adminCtrl, "AdminController", "Spring MVC — /api/admin", "Triggers asynchronous Excel/ODS mass import (requires ADMIN permission). Reports import state (IDLE/RUNNING/DONE/FAILED).")
+        Component(adminCtrl, "AdminController", "Spring MVC — /api/admin", "Triggers the asynchronous canonical import (requires ADMIN permission). Reports import state via GET /api/admin/import-status (IDLE/RUNNING/DONE/FAILED).")
 
         Component(docSvc, "DocumentService", "Spring Service", "Core document business logic: store, update, search. Resolves persons and tags, delegates file I/O to FileService, builds dynamic JPA Specifications, and integrates with audit logging.")
         Component(fileSvc, "FileService", "Spring Service", "Wraps AWS SDK v2 S3Client. Uploads files with UUID-keyed paths, computes SHA-256 hash, downloads with content-type detection, and generates presigned URLs for OCR access.")
-        Component(massImport, "MassImportService", "Spring Service — @Async", "Reads Excel/ODS files from /import mount. Tracks import state (IDLE/RUNNING/DONE/FAILED) and delegates to ExcelService. Returns immediately; processing runs asynchronously.")
-        Component(excelSvc, "ExcelService", "Spring Service", "Parses Excel/ODS workbooks (Apache POI). Column indices configurable via application.properties. Creates/updates document records per row.")
+        Component(importOrch, "CanonicalImportOrchestrator", "Spring Service — @Async", "Runs four idempotent loaders (TagTree → PersonRegister → PersonTree → Document) in a fixed DAG over the normalizer's committed canonical artifacts (canonical-*.xlsx + canonical-persons-tree.json) from /import — see diagram 3b. Owns the IDLE/RUNNING/DONE/FAILED state machine.")
         Component(minioConf, "MinioConfig", "Spring @Configuration", "Creates the S3Client and S3Presigner beans with path-style access for MinIO. Validates MinIO connectivity on startup.")
 
         Component(docRepo, "DocumentRepository", "Spring Data JPA", "Queries documents with Specification-based dynamic search, bidirectional conversation thread queries, full-text search with ranking and match highlighting, and transcription pipeline queue projections.")
@@ -123,14 +122,15 @@ C4Component
     Rel(frontend, docCtrl, "Document requests", "HTTP / JSON")
     Rel(frontend, adminCtrl, "Trigger import", "HTTP / JSON")
     Rel(docCtrl, docSvc, "Delegates to", "")
-    Rel(adminCtrl, massImport, "Triggers", "")
+    Rel(adminCtrl, importOrch, "Triggers", "")
     Rel(docSvc, fileSvc, "Upload / download files", "")
     Rel(docSvc, docRepo, "Reads / writes documents", "")
     Rel(docSvc, docSpec, "Builds search predicates", "")
     Rel(docSvc, personSvc, "Resolves sender / receivers", "")
     Rel(docSvc, tagSvc, "Finds or creates tags", "")
-    Rel(massImport, excelSvc, "Parses Excel/ODS file", "")
-    Rel(excelSvc, docSvc, "Creates / updates documents", "")
+    Rel(importOrch, docSvc, "Upserts documents (PDF by index) — see 3b", "")
+    Rel(importOrch, personSvc, "Upserts persons + relationships", "")
+    Rel(importOrch, tagSvc, "Upserts tag hierarchy", "")
     Rel(minioConf, fileSvc, "Provides S3Client and S3Presigner beans", "")
     Rel(fileSvc, minio, "PUT / GET / presigned URL objects", "S3 API / HTTP")
     Rel(docRepo, db, "SQL queries", "JDBC")
@@ -492,7 +492,7 @@ C4Component
         Component(adminGroups, "/admin/groups, /admin/groups/[id], /admin/groups/new", "SvelteKit Routes", "Permission group management: create/edit groups and their permission sets.")
         Component(adminTags, "/admin/tags and /admin/tags/[id]", "SvelteKit Routes", "Tag administration: edit tag hierarchy, merge tags, delete subtrees.")
         Component(adminOcr, "/admin/ocr and /admin/ocr/[personId]", "SvelteKit Routes", "Global and per-person OCR configuration. Manages script types and triggers sender model training.")
-        Component(adminSystem, "/admin/system", "SvelteKit Route", "System status panel. Triggers Excel/ODS mass import (POST /api/admin/trigger-import). Displays import state.")
+        Component(adminSystem, "/admin/system", "SvelteKit Route", "System status panel. Triggers the canonical import (POST /api/admin/trigger-import). Displays import state.")
         Component(hilfe, "/hilfe/transkription", "SvelteKit Route", "Static transcription style guide for Kurrent and Sütterlin character recognition. No backend calls.")
     }
 
diff --git a/docs/architecture/c4/l3-backend-3b-document-management.puml b/docs/architecture/c4/l3-backend-3b-document-management.puml
index cca25d75..ac2f0208 100644
--- a/docs/architecture/c4/l3-backend-3b-document-management.puml
+++ b/docs/architecture/c4/l3-backend-3b-document-management.puml
@@ -16,7 +16,7 @@ System_Boundary(backend, "API Backend (Spring Boot)") {
     Component(tagTreeLoader, "TagTreeImporter", "Spring Component", "Upserts the tag hierarchy from canonical-tag-tree.xlsx via TagService (by canonical tag_path).")
     Component(personRegLoader, "PersonRegisterImporter", "Spring Component", "Upserts register persons from canonical-persons.xlsx via PersonService (by normalizer person_id).")
     Component(personTreeLoader, "PersonTreeImporter", "Spring Component", "Upserts tree persons + relationships from canonical-persons-tree.json via PersonService and RelationshipService.")
-    Component(docLoader, "DocumentImporter", "Spring Component", "Loads canonical-documents.xlsx: routes attribution register-first (raw cell always retained in sender_text/receiver_text), parses clean dates, builds an honest precision-aware title via DocumentTitleFormatter, keeps the S3 upload + thumbnail plumbing, and ports the path-traversal / homoglyph / absolute-path / %PDF magic-byte security guards.")
+    Component(docLoader, "DocumentImporter", "Spring Component", "Loads canonical-documents.xlsx: routes attribution register-first (raw cell always retained in sender_text/receiver_text), parses clean dates, builds an honest precision-aware title via DocumentTitleFormatter, keeps the S3 upload + thumbnail plumbing, and resolves each PDF by index (importDir/.pdf) guarded by strict index validation + canonical-path containment + %PDF magic-byte check (no recursive walk).")
     Component(titleFmt, "DocumentTitleFormatter", "Pure helper", "Formats the date label baked into an import title at exactly the data's precision (MONTH -> 'Juni 1916', never a fabricated day). Mirrors the frontend formatDocumentDate; both are pinned to docs/date-label-fixtures.json (#666).")
     Component(sheetReader, "CanonicalSheetReader", "POI helper", "Maps a canonical .xlsx by header name (no positional indices), splits pipe-delimited list columns, fails closed (IMPORT_ARTIFACT_INVALID) on a missing required header.")
     Component(minioConf, "MinioConfig", "Spring @Configuration", "Creates the S3Client and S3Presigner beans with path-style access for MinIO. Validates MinIO connectivity on startup.")
diff --git a/docs/architecture/c4/l3-frontend-3d-administration.puml b/docs/architecture/c4/l3-frontend-3d-administration.puml
index 3f7c89ef..5b711b3a 100644
--- a/docs/architecture/c4/l3-frontend-3d-administration.puml
+++ b/docs/architecture/c4/l3-frontend-3d-administration.puml
@@ -12,7 +12,7 @@ System_Boundary(frontend, "Web Frontend (SvelteKit / SSR)") {
     Component(adminGroups, "/admin/groups, /admin/groups/[id], /admin/groups/new", "SvelteKit Routes", "Permission group management: create/edit groups and their permission sets.")
     Component(adminTags, "/admin/tags and /admin/tags/[id]", "SvelteKit Routes", "Tag administration: edit tag hierarchy, merge tags, delete subtrees.")
     Component(adminOcr, "/admin/ocr and /admin/ocr/[personId]", "SvelteKit Routes", "Global and per-person OCR configuration. Manages script types and triggers sender model training.")
-    Component(adminSystem, "/admin/system", "SvelteKit Route", "System status panel. Triggers Excel/ODS mass import (POST /api/admin/trigger-import). Displays import state.")
+    Component(adminSystem, "/admin/system", "SvelteKit Route", "System status panel. Triggers the canonical import (POST /api/admin/trigger-import). Displays import state.")
     Component(hilfe, "/hilfe/transkription", "SvelteKit Route", "Static transcription style guide for Kurrent and Sütterlin character recognition. No backend calls.")
 }
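
The index-based PDF resolution named in the DocumentImporter description (index validation, canonical-path containment, %PDF magic-byte check, no recursive walk) could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: the class name `PdfIndexResolver`, the digits-only index rule, and the `<index>.pdf` file naming are all hypothetical, since the patch shows neither the real validation rule nor the exact file-name pattern.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not the project's DocumentImporter. */
final class PdfIndexResolver {

    // Assumption: an index is 1 to 9 ASCII digits. The real rule is not shown in the patch.
    private static final Pattern INDEX = Pattern.compile("[0-9]{1,9}");
    private static final byte[] PDF_MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);

    private PdfIndexResolver() {}

    /** Rejects null, empty, and anything with separators, dots, or non-ASCII characters. */
    static boolean isValidIndex(String index) {
        return index != null && INDEX.matcher(index).matches();
    }

    /** Resolves exactly one candidate file in importDir; never walks subdirectories. */
    static Optional<Path> resolve(Path importDir, String index) throws IOException {
        if (!isValidIndex(index)) {
            return Optional.empty();
        }
        // Assumption: the file is named "<index>.pdf" directly inside importDir.
        Path candidate = importDir.resolve(index + ".pdf");
        if (!Files.isRegularFile(candidate)) {
            return Optional.empty();
        }
        // Canonical-path containment: resolve symlinks on both sides, then compare.
        Path realDir = importDir.toRealPath();
        Path realFile = candidate.toRealPath();
        if (!realFile.startsWith(realDir)) {
            return Optional.empty();
        }
        return hasPdfMagic(realFile) ? Optional.of(realFile) : Optional.empty();
    }

    /** True only if the first four bytes of the file are "%PDF". */
    static boolean hasPdfMagic(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = in.readNBytes(PDF_MAGIC.length);
            return Arrays.equals(head, PDF_MAGIC);
        }
    }
}
```

Because the index is validated before it is ever joined to a path, a value such as `../7` never reaches the filesystem; the containment check then covers the remaining case of a symlink inside the import directory that points elsewhere.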