familienarchiv/docs/architecture/c4/l3-backend-3b-document-management.puml
Marcel 0a3d12b9af
docs: drop remaining stale MassImportService/ExcelService references
Replace the legacy raw-spreadsheet importer references left behind after
#674 with the canonical import architecture (CanonicalImportOrchestrator +
four loaders) and document #686 index-based PDF resolution.

- l3-backend-3b: DocumentImporter now resolves PDF by index (importDir/
  <index>.pdf) with index validation + canonical-path containment + %PDF
  magic-byte check (no recursive walk / homoglyph file-path guards)
- c4-diagrams.md: replace massImport/excelSvc components + their rels with
  an importOrch (CanonicalImportOrchestrator) component wired to doc/person/
  tag services; refresh adminCtrl and adminSystem descriptions
- ARCHITECTURE.md: importing package row now describes the orchestrator +
  four loaders consuming canonical artifacts
- TODO-backend.md: remove obsolete "MassImportService provides no status"
  item (service deleted; orchestrator already exposes import-status); update
  stale ExcelService test-coverage suggestion

Refs #686

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:08:45 +02:00

5.7 KiB

Component Diagram: API Backend — Document Management & Canonical Import

API Backend (Spring Boot) [system]

Components:

- DocumentController [Spring MVC — /api/documents]
  CRUD for documents: search, get by ID, update metadata, upload/download
  file, conversation thread, batch metadata updates, and per-month density
  aggregation for the timeline filter widget.
- AdminController [Spring MVC — /api/admin]
  Triggers the asynchronous canonical import (requires ADMIN permission).
  Reports import state (IDLE/RUNNING/DONE/FAILED).
- DocumentService [Spring Service]
  Core document business logic: store, update, search. Resolves persons and
  tags, delegates file I/O to FileService, builds dynamic JPA Specifications,
  and integrates with audit logging.
- FileService [Spring Service]
  Wraps AWS SDK v2 S3Client. Uploads files with UUID-keyed paths, computes
  SHA-256 hash, downloads with content-type detection, and generates
  presigned URLs for OCR access.
- CanonicalImportOrchestrator [Spring Service — @Async]
  Runs the four canonical loaders in an explicit dependency DAG (TagTree →
  PersonRegister → PersonTree → Document). Smoke-checks all four artifacts
  before starting, owns the IDLE/RUNNING/DONE/FAILED state machine, fails
  closed on a malformed artifact.
- TagTreeImporter [Spring Component]
  Upserts the tag hierarchy from canonical-tag-tree.xlsx via TagService (by
  canonical tag_path).
- PersonRegisterImporter [Spring Component]
  Upserts register persons from canonical-persons.xlsx via PersonService (by
  normalizer person_id).
- PersonTreeImporter [Spring Component]
  Upserts tree persons + relationships from canonical-persons-tree.json via
  PersonService and RelationshipService.
- DocumentImporter [Spring Component]
  Loads canonical-documents.xlsx: routes attribution register-first (raw cell
  always retained in sender_text/receiver_text), parses clean dates, builds
  an honest precision-aware title via DocumentTitleFormatter, keeps the S3
  upload + thumbnail plumbing, and resolves each PDF by index
  (importDir/<index>.pdf) guarded by strict index validation +
  canonical-path containment + %PDF magic-byte check (no recursive walk).
- DocumentTitleFormatter [Pure helper]
  Formats the date label baked into an import title at exactly the data's
  precision (MONTH -> 'Juni 1916', never a fabricated day). Mirrors the
  frontend formatDocumentDate; both are pinned to
  docs/date-label-fixtures.json (#666).
- CanonicalSheetReader [POI helper]
  Maps a canonical .xlsx by header name (no positional indices), splits
  pipe-delimited list columns, fails closed (IMPORT_ARTIFACT_INVALID) on a
  missing required header.
- MinioConfig [Spring @Configuration]
  Creates the S3Client and S3Presigner beans with path-style access for
  MinIO. Validates MinIO connectivity on startup.
- DocumentRepository [Spring Data JPA]
  Queries documents with Specification-based dynamic search, bidirectional
  conversation thread queries, full-text search with ranking and match
  highlighting, and transcription pipeline queue projections.
- DocumentSpecifications [JPA Criteria API]
  Factory for composable predicates: hasText (full-text), hasSender,
  hasReceiver, isBetween (date range), hasTags (subquery AND/OR logic).

Containers:

- Web Frontend [SvelteKit]
- PostgreSQL [PostgreSQL 16]
- Object Storage [MinIO (S3-compatible)]

Components from other diagrams:

- PersonService [Spring Service]
  See diagram 3e. Resolves sender / receiver persons by ID; upserts persons
  by source_ref for the importer.
- TagService [Spring Service]
  See diagram 3d. Finds or creates tags by name; upserts tags by source_ref
  for the importer.
- RelationshipService [Spring Service]
  See diagram 3e. Creates family relationships from the person tree during
  import.

Relationships (labels only):

- Document requests [HTTP / JSON]
- Trigger import [HTTP / JSON]
- Delegates to
- Triggers
- Upload / download files
- Reads / writes documents
- Builds search predicates
- Resolves sender / receivers
- Finds or creates tags
- 1. Loads tags
- 2. Loads register persons
- 3. Loads tree persons + relationships
- 4. Loads documents
- Reads canonical .xlsx (three edges)
- Builds honest title date
- Upserts tags by source_ref
- Upserts persons by source_ref (two edges)
- Creates relationships
- Upserts documents by index
- Register-first match / provisional person
- Attaches tag by source_ref
- Uploads resolved file
- Provides S3Client and S3Presigner beans
- PUT / GET / presigned URL objects [S3 API / HTTP]
- SQL queries [JDBC]
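
The DocumentImporter description names three guards on index-based PDF resolution: strict index validation, canonical-path containment, and a %PDF magic-byte check. The sketch below shows one way those three checks could be chained. It is not the project's code: the class name IndexedPdfResolver, the method resolve, and the digits-only index rule are assumptions made for illustration, and the real importer may validate indices differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not the project's DocumentImporter. */
final class IndexedPdfResolver {

    // Assumption: an index is 1 to 9 ASCII digits. The real rule may differ.
    private static final Pattern INDEX = Pattern.compile("[0-9]{1,9}");
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F'};

    private IndexedPdfResolver() {}

    /** Returns the real path of importDir/<index>.pdf, or empty if any guard fails. */
    static Optional<Path> resolve(Path importDir, String index) throws IOException {
        // Guard 1: strict index validation, so separators and ".." never reach the path.
        if (index == null || !INDEX.matcher(index).matches()) {
            return Optional.empty();
        }
        Path candidate = importDir.resolve(index + ".pdf");
        if (!Files.isRegularFile(candidate)) {
            return Optional.empty();
        }

        // Guard 2: canonical-path containment. After symlinks are resolved, the
        // file must still be a direct child of the import directory.
        Path realDir = importDir.toRealPath();
        Path realFile = candidate.toRealPath();
        if (!realDir.equals(realFile.getParent())) {
            return Optional.empty();
        }

        // Guard 3: the first four bytes must be "%PDF".
        byte[] head = new byte[PDF_MAGIC.length];
        try (InputStream in = Files.newInputStream(realFile)) {
            int read = in.readNBytes(head, 0, head.length);
            if (read != head.length || !Arrays.equals(head, PDF_MAGIC)) {
                return Optional.empty();
            }
        }
        return Optional.of(realFile);
    }
}
```

Because the path is built from a validated index rather than discovered by walking the directory tree, a lookup touches exactly one candidate file, which is consistent with the "no recursive walk" note in the diagram.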