Replace the legacy raw-spreadsheet importer references left behind after #674 with the canonical import architecture (CanonicalImportOrchestrator + four loaders) and document #686 index-based PDF resolution.

- l3-backend-3b: DocumentImporter now resolves PDF by index (importDir/<index>.pdf) with index validation + canonical-path containment + %PDF magic-byte check (no recursive walk / homoglyph file-path guards)
- c4-diagrams.md: replace massImport/excelSvc components + their rels with an importOrch (CanonicalImportOrchestrator) component wired to doc/person/tag services; refresh adminCtrl and adminSystem descriptions
- ARCHITECTURE.md: importing package row now describes the orchestrator + four loaders consuming canonical artifacts
- TODO-backend.md: remove obsolete "MassImportService provides no status" item (service deleted; orchestrator already exposes import-status); update stale ExcelService test-coverage suggestion

Refs #686
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
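The index-based PDF resolution described in this commit could look roughly like the sketch below. It is not the project's DocumentImporter code: the class name `IndexedPdfResolver`, the accepted index pattern, and the choice to return an empty `Optional` instead of raising an import error are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not the real DocumentImporter.
final class IndexedPdfResolver {

    // Assumption: an index is a short run of ASCII letters, digits, '_' or '-'.
    // Anything else (dots, slashes, non-ASCII lookalikes) is rejected up front.
    private static final Pattern INDEX = Pattern.compile("[A-Za-z0-9_-]{1,64}");

    private static final byte[] PDF_MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);

    private IndexedPdfResolver() {}

    static Optional<Path> resolve(Path importDir, String index) throws IOException {
        // 1. Strict index validation.
        if (index == null || !INDEX.matcher(index).matches()) {
            return Optional.empty();
        }

        // 2. Direct lookup of importDir/<index>.pdf, no recursive walk.
        Path candidate = importDir.resolve(index + ".pdf");
        if (!Files.isRegularFile(candidate)) {
            return Optional.empty();
        }

        // 3. Canonical-path containment: resolve symlinks on both sides and
        //    make sure the file still lives under the import directory.
        Path root = importDir.toRealPath();
        Path real = candidate.toRealPath();
        if (!real.startsWith(root)) {
            return Optional.empty();
        }

        // 4. Magic-byte check: the file must start with "%PDF".
        byte[] head = new byte[PDF_MAGIC.length];
        try (InputStream in = Files.newInputStream(real)) {
            int read = in.readNBytes(head, 0, head.length);
            if (read != head.length || !Arrays.equals(head, PDF_MAGIC)) {
                return Optional.empty();
            }
        }
        return Optional.of(real);
    }
}
```

Because the file name is derived only from a validated index, a traversal attempt such as `../secret` is rejected before any file-system access happens.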
@startuml
!include <C4/C4_Component>
title Component Diagram: API Backend — Document Management & Canonical Import
Container(frontend, "Web Frontend", "SvelteKit")
ContainerDb(db, "PostgreSQL", "PostgreSQL 16")
ContainerDb(minio, "Object Storage", "MinIO (S3-compatible)")
System_Boundary(backend, "API Backend (Spring Boot)") {
Component(docCtrl, "DocumentController", "Spring MVC — /api/documents", "CRUD for documents: search, get by ID, update metadata, upload/download file, conversation thread, batch metadata updates, and per-month density aggregation for the timeline filter widget.")
Component(adminCtrl, "AdminController", "Spring MVC — /api/admin", "Triggers the asynchronous canonical import (requires ADMIN permission). Reports import state (IDLE/RUNNING/DONE/FAILED).")
Component(docSvc, "DocumentService", "Spring Service", "Core document business logic: store, update, search. Resolves persons and tags, delegates file I/O to FileService, builds dynamic JPA Specifications, and integrates with audit logging.")
Component(fileSvc, "FileService", "Spring Service", "Wraps AWS SDK v2 S3Client. Uploads files with UUID-keyed paths, computes SHA-256 hash, downloads with content-type detection, and generates presigned URLs for OCR access.")
Component(importOrch, "CanonicalImportOrchestrator", "Spring Service — @Async", "Runs the four canonical loaders in an explicit dependency DAG (TagTree → PersonRegister → PersonTree → Document). Smoke-checks all four artifacts before starting, owns the IDLE/RUNNING/DONE/FAILED state machine, fails closed on a malformed artifact.")
Component(tagTreeLoader, "TagTreeImporter", "Spring Component", "Upserts the tag hierarchy from canonical-tag-tree.xlsx via TagService (by canonical tag_path).")
Component(personRegLoader, "PersonRegisterImporter", "Spring Component", "Upserts register persons from canonical-persons.xlsx via PersonService (by normalizer person_id).")
Component(personTreeLoader, "PersonTreeImporter", "Spring Component", "Upserts tree persons + relationships from canonical-persons-tree.json via PersonService and RelationshipService.")
Component(docLoader, "DocumentImporter", "Spring Component", "Loads canonical-documents.xlsx: routes attribution register-first (raw cell always retained in sender_text/receiver_text), parses clean dates, builds an honest precision-aware title via DocumentTitleFormatter, keeps the S3 upload + thumbnail plumbing, and resolves each PDF by index (importDir/<index>.pdf) guarded by strict index validation + canonical-path containment + %PDF magic-byte check (no recursive walk).")
Component(titleFmt, "DocumentTitleFormatter", "Pure helper", "Formats the date label baked into an import title at exactly the data's precision (MONTH -> 'Juni 1916', never a fabricated day). Mirrors the frontend formatDocumentDate; both are pinned to docs/date-label-fixtures.json (#666).")
Component(sheetReader, "CanonicalSheetReader", "POI helper", "Maps a canonical .xlsx by header name (no positional indices), splits pipe-delimited list columns, fails closed (IMPORT_ARTIFACT_INVALID) on a missing required header.")
Component(minioConf, "MinioConfig", "Spring @Configuration", "Creates the S3Client and S3Presigner beans with path-style access for MinIO. Validates MinIO connectivity on startup.")
Component(docRepo, "DocumentRepository", "Spring Data JPA", "Queries documents with Specification-based dynamic search, bidirectional conversation thread queries, full-text search with ranking and match highlighting, and transcription pipeline queue projections.")
Component(docSpec, "DocumentSpecifications", "JPA Criteria API", "Factory for composable predicates: hasText (full-text), hasSender, hasReceiver, isBetween (date range), hasTags (subquery AND/OR logic).")
}
Component(personSvc, "PersonService", "Spring Service", "See diagram 3e. Resolves sender / receiver persons by ID; upserts persons by source_ref for the importer.")
Component(tagSvc, "TagService", "Spring Service", "See diagram 3d. Finds or creates tags by name; upserts tags by source_ref for the importer.")
Component(relSvc, "RelationshipService", "Spring Service", "See diagram 3e. Creates family relationships from the person tree during import.")
Rel(frontend, docCtrl, "Document requests", "HTTP / JSON")
Rel(frontend, adminCtrl, "Trigger import", "HTTP / JSON")
Rel(docCtrl, docSvc, "Delegates to")
Rel(adminCtrl, importOrch, "Triggers")
Rel(docSvc, fileSvc, "Upload / download files")
Rel(docSvc, docRepo, "Reads / writes documents")
Rel(docSvc, docSpec, "Builds search predicates")
Rel(docSvc, personSvc, "Resolves sender / receivers")
Rel(docSvc, tagSvc, "Finds or creates tags")
Rel(importOrch, tagTreeLoader, "1. Loads tags")
Rel(importOrch, personRegLoader, "2. Loads register persons")
Rel(importOrch, personTreeLoader, "3. Loads tree persons + relationships")
Rel(importOrch, docLoader, "4. Loads documents")
Rel(tagTreeLoader, sheetReader, "Reads canonical .xlsx")
Rel(personRegLoader, sheetReader, "Reads canonical .xlsx")
Rel(docLoader, sheetReader, "Reads canonical .xlsx")
Rel(docLoader, titleFmt, "Builds honest title date")
Rel(tagTreeLoader, tagSvc, "Upserts tags by source_ref")
Rel(personRegLoader, personSvc, "Upserts persons by source_ref")
Rel(personTreeLoader, personSvc, "Upserts persons by source_ref")
Rel(personTreeLoader, relSvc, "Creates relationships")
Rel(docLoader, docSvc, "Upserts documents by index")
Rel(docLoader, personSvc, "Register-first match / provisional person")
Rel(docLoader, tagSvc, "Attaches tag by source_ref")
Rel(docLoader, fileSvc, "Uploads resolved file")
Rel(minioConf, fileSvc, "Provides S3Client and S3Presigner beans")
Rel(fileSvc, minio, "PUT / GET / presigned URL objects", "S3 API / HTTP")
Rel(docRepo, db, "SQL queries", "JDBC")
@enduml
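For readers who prefer code to boxes and arrows, the CanonicalImportOrchestrator behaviour described in the diagram above (smoke-check every artifact first, run the loaders in a fixed order, track IDLE/RUNNING/DONE/FAILED) might be sketched as follows. This is an illustrative sketch only: `ImportOrchestratorSketch`, its nested `Loader` type, and the rule that a new run may start from any state except RUNNING are assumptions, and the real service additionally runs under Spring's `@Async`.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the orchestrator; names are illustrative.
final class ImportOrchestratorSketch {

    enum State { IDLE, RUNNING, DONE, FAILED }

    // One loader step: a name, a cheap artifact smoke check, and the load itself.
    static final class Loader {
        final String name;
        final BooleanSupplier smokeCheck;
        final Runnable load;

        Loader(String name, BooleanSupplier smokeCheck, Runnable load) {
            this.name = name;
            this.smokeCheck = smokeCheck;
            this.load = load;
        }
    }

    private final AtomicReference<State> state = new AtomicReference<>(State.IDLE);
    private final List<Loader> loadersInOrder;

    // Callers pass the loaders already sorted by dependency, e.g.
    // TagTree, PersonRegister, PersonTree, Document.
    ImportOrchestratorSketch(List<Loader> loadersInOrder) {
        this.loadersInOrder = List.copyOf(loadersInOrder);
    }

    State state() {
        return state.get();
    }

    // Returns false if an import is already running, true once this run finished.
    boolean run() {
        State previous = state.getAndUpdate(s -> s == State.RUNNING ? s : State.RUNNING);
        if (previous == State.RUNNING) {
            return false;
        }
        try {
            // Fail closed: check every artifact before any loader writes data.
            for (Loader loader : loadersInOrder) {
                if (!loader.smokeCheck.getAsBoolean()) {
                    throw new IllegalStateException("invalid artifact for " + loader.name);
                }
            }
            for (Loader loader : loadersInOrder) {
                loader.load.run();
            }
            state.set(State.DONE);
        } catch (RuntimeException e) {
            state.set(State.FAILED);
        }
        return true;
    }
}
```

Checking all artifacts before the first load means a malformed documents sheet cannot leave tags and persons half-imported.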
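The CanonicalSheetReader component in the diagram above maps columns by header name, splits pipe-delimited list cells, and fails closed on a missing required header. A minimal sketch of those three ideas, without Apache POI and operating on plain strings, could look like this. The class name `CanonicalSheetSketch`, the use of `IllegalStateException`, and the column names in the usage note are assumptions; only the `IMPORT_ARTIFACT_INVALID` code comes from the diagram.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper working on already-extracted cell strings.
final class CanonicalSheetSketch {

    private CanonicalSheetSketch() {}

    // Builds a header-name -> column-index map and rejects the sheet
    // if any required header is absent.
    static Map<String, Integer> indexHeaders(List<String> headerRow, Set<String> required) {
        Map<String, Integer> byName = new LinkedHashMap<>();
        for (int i = 0; i < headerRow.size(); i++) {
            String header = headerRow.get(i);
            if (header != null && !header.isBlank()) {
                byName.putIfAbsent(header.trim(), i);
            }
        }
        for (String name : required) {
            if (!byName.containsKey(name)) {
                throw new IllegalStateException(
                        "IMPORT_ARTIFACT_INVALID: missing required header '" + name + "'");
            }
        }
        return byName;
    }

    // Splits a pipe-delimited cell into trimmed, non-empty values.
    static List<String> splitList(String cell) {
        List<String> values = new ArrayList<>();
        if (cell == null || cell.isBlank()) {
            return values;
        }
        for (String part : cell.split("\\|")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }
}
```

With a header row of `index`, `title`, `tags` (made-up names), a loader would look up `tags` by name rather than by position, so reordering columns in the spreadsheet does not silently shift data into the wrong field.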