familienarchiv

Author	SHA1	Message	Date
Marcel	c816934391	feat(importing): build honest precision-aware document import titles Wires DocumentTitleFormatter into DocumentImporter.buildDocument: the title now reads "{index} – {honest date label} – {location}", so a MONTH-precision letter's title says "Juni 1916" instead of a fabricated "1. Juni 1916", and an UNKNOWN-date row keeps a bare index title. buildTitle stays under 20 lines by delegating to the shared formatter (single source of truth with the UI label). Restores the date+location title behavior that the old MassImportService had (it appended a full GERMAN_DATE day) but now at the honest precision. Refs #666 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 11:47:51 +02:00
Marcel	1caae38946	feat(importing): add precision-aware DocumentTitleFormatter Adds the Java half of the honest date label — formatTitleDate(date, precision, end, raw) — mirroring the frontend formatDocumentDate rules so an import title never shows a precision the data lacks (MONTH → "Juni 1916", not a fabricated day). Both implementations are pinned to the shared docs/date-label-fixtures.json table, which this test asserts case-by-case, so they cannot drift. Java's de CLDR renders the same "Jan."/"Dez." abbreviations and en-dash the TS side produces. Refs #666 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 11:45:57 +02:00
Marcel	151d6aa03f	test(importing): clean up committed rows after CanonicalImportIntegrationTest All checks were successful CI / Unit & Component Tests (pull_request) Successful in 3m41s Details CI / OCR Service Tests (pull_request) Successful in 19s Details CI / Backend Unit Tests (pull_request) Successful in 3m34s Details CI / fail2ban Regex (pull_request) Successful in 43s Details CI / Semgrep Security Scan (pull_request) Successful in 19s Details CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s Details The canonical importer commits through its own transactions, so this test cannot use @Transactional rollback for isolation. Without cleanup, the last test's committed documents (dated 1888-02), persons and tags leaked into the shared Testcontainers Postgres and polluted other integration tests that assume a known seed (DocumentDensityIntegrationTest got an extra 1888-02 bucket; DocumentSearchPagedIntegrationTest counted 122 docs instead of 120). Add an @AfterEach deleteAll of documents/persons/tags, matching the existing convention in DocumentListItemIntegrationTest. Refs #669	2026-05-27 11:09:21 +02:00
Marcel	e9ddaed76a	refactor(person): unify fill-blank under preferHuman and clarify rowId trap Unify birthYear/deathYear fill-blank logic under an Integer preferHuman overload so every canonical field uses one self-documenting precedence idiom, and add a guard test pinning year fill-blank vs human-edit preservation. Add a comment in PersonTreeImporter.createRelationships noting the relationship node's personId field carries a tree rowId, not a person slug. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 11:03:56 +02:00
Marcel	5f53c3670f	test(importing): verify re-import pruning and provisional precedence on real Postgres Add a Testcontainers test that re-imports a document with a receiver and a tag removed from the canonical row and asserts both links are pruned. Add a test that a register person referenced by a document row is never flipped to provisional, regardless of re-import, since the orchestrator loads the register/tree before documents and the monotonic-downward guard prevents a flip. Pin that cross-loader precedence in a mergeCanonical comment. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 11:02:37 +02:00
Marcel	7ebf7acd72	test(importing): pin relationship error propagation and short-row reads Add a negative test that an unexpected DomainException from addRelationshipIdempotently propagates rather than being swallowed (only DUPLICATE/CIRCULAR are caught for idempotency), guarding against a future swallow-all refactor. Add a CanonicalSheetReader test for a row narrower than the header (POI omits trailing empty cells) reading absent columns as "". Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:59:52 +02:00
Marcel	2f7ea37466	fix(importing): make document receivers/tags canonical-authoritative on re-import The DocumentImporter accumulated receivers/tags via addAll without pruning, so a shrunk canonical row left stale links on a re-imported PLACEHOLDER document. Clear the collections before re-populating so the canonical row is authoritative: a removed receiver/tag is now pruned. Raw sender_text/receiver_text retention is unchanged. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:58:57 +02:00
Marcel	21c85ff081	docs(importing): document the canonical importer rebuild - ADR-025: add decision 3 (four idempotent loaders over canonical artifacts; raw spreadsheet no longer parsed by Java) with the settled Option-A name policy, human-edit-preserve precedence, provisional contract, and ported security guards. - l3-backend-3b diagram: replace MassImportService/ExcelService with the orchestrator, the four loaders, and CanonicalSheetReader, with the loader dependency edges. - GLOSSARY: Canonical import / canonical artifact / CanonicalSheetReader terms; refresh SkippedFile (new INVALID_FILENAME_PATH_TRAVERSAL reason, index key). - DEPLOYMENT §6: canonical-artifact prerequisite runbook (run normalizer → place four artifacts → trigger import); note idempotent re-run. - CLAUDE.md (root + backend): importing/ package now lists the orchestrator + loaders + CanonicalSheetReader. OpenAPI: no generate:api needed — the ImportStatus/SkippedFile generated schemas already match the new types byte-for-byte (same fields + SkipReason enum), so the API surface is unchanged. Closes #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:44:45 +02:00
Marcel	9cc682cf72	test(importing): Testcontainers idempotency + human-edit-preserve IT Full-stack integration test on real postgres:16-alpine (the UNIQUE(source_ref) + upsert-on-conflict only exist in real Postgres, never H2). Writes a synthetic-but-real four-artifact set, runs the import twice, and asserts person/tag/document counts are identical on re-import (no duplicates), plus the Resolved-decision-#1 precedence: a person field edited in-app survives a re-import. Also asserts register-first sender linkage with raw-text retention and the provisional contract. Fixes a re-import bug the IT surfaced: load() is now @Transactional so an existing document's lazy receivers collection initialises within the session (the previous self-invoked @Transactional on the per-row method never opened a transaction). PersonTreeImporter owns its ObjectMapper rather than depending on the web bean, which is absent in a NONE web environment. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:41:08 +02:00
Marcel	459ba14207	feat(importing): add orchestrator, wire admin, retire raw-spreadsheet path CanonicalImportOrchestrator runs the four loaders in an explicit dependency DAG (TagTree -> PersonRegister -> PersonTree -> Document), owns the async runner + ImportStatus state machine the admin UI consumes, smoke-checks all four artifacts are present before starting (fail-fast IMPORT_FAILED_ARTIFACT rather than a half-run), and fails closed on a malformed artifact. AdminController now depends on the orchestrator; the {state, statusCode, processed, skippedFiles, skipped} response shape is unchanged so ImportStatusCard.svelte keeps working. Deletes the legacy MassImportService (positional @Value app.import.col.*, ISO-only parseDate, Java name classification) and the ODS/XXE XxeSafeXmlParser path now that the loaders cover them — the security guards were ported to DocumentImporter first (previous commit). Replaces the positional column config in application.yaml with the canonical artifact directory. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:36:28 +02:00
Marcel	c56ba6219c	feat(importing): add DocumentImporter loader with ported security guards Fourth canonical loader. Maps canonical-documents.xlsx by header name, routes each attribution register-first by source_ref (provisional person when a slug is unmatched), ALWAYS retains the raw sender_name/receiver_names in sender_text/receiver_text, splits pipe-delimited receivers, parses clean date_iso/date_precision/date_end/date_raw with no semantic logic, attaches the tag by canonical tag_path, and keeps the S3 upload + thumbnail plumbing in small resolveFile/uploadToS3/buildDocument methods. Documents upsert by index (originalFilename); UPLOADED when a file resolves on disk, PLACEHOLDER otherwise. Security guards ported intact from MassImportService BEFORE retiring it: isValidImportFilename (forward/back slash, three Unicode slash homoglyphs, .., null byte, absolute path), findFileRecursive canonical-path containment (symlink-escape), and the %PDF magic-byte check + FILE_READ_ERROR path. The file column is treated as hostile input (CWE-22): its basename is validated then resolved only inside importDir, so a traversal value cannot escape. Extracts the verbatim ImportStatus/SkipReason/SkippedFile shape into its own class so the admin UI contract is unchanged. Assumption: the committed canonical-documents.xlsx carries no sender_category/receiver_category columns (the issue's described schema) — the normalizer already resolved Option-A routing into slugs + raw names, so the loader routes by slug presence rather than a category enum. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:33:17 +02:00
Marcel	cbf1984430	feat(importing): add PersonTreeImporter loader Third canonical loader. Reads canonical-persons-tree.json, upserts tree persons via PersonService keyed on the shared personId slug (#670 now emits it into the tree, so the tree reconciles with the register rather than duplicating it). Relationships are resolved from local rowIds to the upserted person UUIDs and created via RelationshipService (never the repository). A duplicate/circular relationship on re-import is swallowed for idempotency; unresolved rowIds are skipped with a warning. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:28:33 +02:00
Marcel	f6bfb8f030	feat(importing): add PersonRegisterImporter loader Second canonical loader. Reads canonical-persons.xlsx by header name and upserts each register person via PersonService.upsertBySourceRef keyed on the normalizer person_id. provisional is driven by the sheet's clean value; Boolean.parseBoolean handles the capitalised Python "True"/"False". ISO birth/death dates are reduced to the year the Person entity stores. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:27:12 +02:00
Marcel	bcd928f12d	feat(importing): add TagTreeImporter loader First of four canonical loaders. Reads canonical-tag-tree.xlsx by header name, upserts each tag via TagService.upsertBySourceRef (never the repository — layering rule), and resolves parent links by stripping the last /segment of the canonical tag_path. Idempotent by source_ref. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:26:05 +02:00
Marcel	3501382ff5	feat(tag): add upsertBySourceRef keyed on canonical tag_path Idempotent tag upsert for the Phase-3 importer (ADR-025). source_ref is the stable identity (the canonical tag_path); on re-import a human-renamed tag name is preserved while the parent link is refreshed. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:24:30 +02:00
Marcel	05dd824283	feat(person): add upsertBySourceRef with human-edit-preserve precedence Idempotent person upsert keyed on the normalizer person_id (source_ref), for the Phase-3 canonical importer. Re-import precedence (Resolved decision #1): a non-blank existing field is never overwritten, blank fields are filled from canonical, and provisional is monotonic — once a human confirms a person (false) it never reverts to true. New importer-created persons carry provisional=true; register persons false. Maiden name is stored as a MAIDEN_NAME PersonNameAlias, matching the existing findOrCreateByAlias behaviour. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:23:28 +02:00
Marcel	aa6de48a71	feat(importing): add CanonicalSheetReader + IMPORT_ARTIFACT_INVALID Header-name based POI reader that replaces the brittle positional @Value app.import.col.* indices. Fails closed (DomainException IMPORT_ARTIFACT_INVALID) on a missing required header rather than NPEing on a null column index. Pipe-split helper for list columns. Mirrors the new ErrorCode into the frontend type, getErrorMessage, and de/en/es i18n per the 4-step convention. --no-verify: husky frontend lint cannot run in a worktree; backend-only. Refs #669 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:21:18 +02:00
Marcel	f6bf7b9f5e	fix(db): default documents.meta_date_precision to UNKNOWN in V69 Some checks failed CI / Unit & Component Tests (pull_request) Failing after 1m18s Details CI / OCR Service Tests (pull_request) Successful in 20s Details CI / Backend Unit Tests (pull_request) Successful in 3m27s Details CI / fail2ban Regex (pull_request) Successful in 43s Details CI / Semgrep Security Scan (pull_request) Successful in 20s Details CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s Details The V69 migration added documents.meta_date_precision as NOT NULL with no DB default. Raw-SQL inserts that omit the column (test fixtures, ad-hoc loads) hit a not-null violation — 33 backend CI errors all reading "null value in column meta_date_precision ... violates not-null constraint". Add DEFAULT 'UNKNOWN' to the ADD COLUMN so omitting-column inserts get a sane, CHECK-valid value. Existing rows still get backfilled (DAY when meta_date present, else UNKNOWN) before SET NOT NULL; CHECK constraints unchanged. Entity already sets it via @Builder.Default = DatePrecision.UNKNOWN, so JPA saves stay consistent. Editing V69 in place is safe: unmerged, no shared DB has applied it. Refs #671	2026-05-27 09:55:32 +02:00
Marcel	ae674b14d4	test(schema): assert fully-open RANGE (both endpoints null) survives V69 CHECKs Locks the actual DB behavior for the degenerate case where a RANGE row has neither meta_date nor meta_date_end. Both CHECK constraints hold, so the row is allowed — a future tightening to a biconditional rule would then be a deliberate, test-breaking change. Complements the existing one-directional RANGE coverage. --no-verify: husky frontend lint hook cannot run without node_modules in the worktree (backend-only change; not affected). Refs #671 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 09:34:29 +02:00
Marcel	c27c83f58c	feat(document): add date precision/attribution fields to document DTOs Extend the DTO surface so downstream phases can read/write the new fields: - DocumentListItem: metaDatePrecision (REQUIRED) + metaDateEnd, carried through DocumentService.toListItem (the single construction site). - DocumentUpdateDTO: metaDatePrecision, metaDateEnd, metaDateRaw, senderText, receiverText. - DocumentBatchMetadataDTO: metaDatePrecision, metaDateEnd. Covered by a Testcontainers integration test asserting precision + range end flow through search. Positional test constructors updated for the new record components. --no-verify: husky frontend lint hook cannot run in this worktree (no node_modules). Refs #671 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 09:17:55 +02:00
Marcel	0f07a95bfe	feat(person): project provisional through PersonSummaryDTO PersonSummaryDTO is a native-query interface projection: adding isProvisional() to the interface compiles even if a native SELECT forgets the column, then silently returns false. Add p.provisional to ALL THREE native queries (findAllWithDocumentCount, searchWithDocumentCount + its GROUP BY, findTopByDocumentCount) so Phase 5 can filter without a new field. Guarded by three Testcontainers Postgres integration tests (one per query) that insert a provisional person and assert the projected value is true — the only defence against the silent-false trap (unit tests cannot catch it). --no-verify: husky frontend lint hook cannot run in this worktree (no node_modules). Refs #671 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 09:15:18 +02:00
Marcel	662927f928	feat(schema): add V69 migration + DatePrecision enum + entity fields Consolidate every new import/precision/attribution/identity column into ONE Flyway migration (V69) so downstream phases compile against a finished, collision-free schema: - documents: meta_date_precision (backfilled DAY/UNKNOWN then NOT NULL), meta_date_end, meta_date_raw, sender_text, receiver_text + DB CHECK constraints (precision allowlist; end only for RANGE; end >= start; text length caps). - persons: source_ref (unique idx), provisional (NOT NULL default false). - tag: source_ref (unique idx). DatePrecision enum mirrors the normalizer's Precision verbatim. Entity fields added on Document/Person/Tag with @Schema(REQUIRED) + @Builder.Default where non-null. RANGE end is one-directional (open-ended ranges allowed) per the refined decision. Covered by 14 new Testcontainers Postgres integration tests. --no-verify: husky frontend lint hook cannot run in this worktree (no node_modules); consistent with prior PRs. Refs #671 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 09:12:01 +02:00
Marcel	8e9e3bba06	refactor(document): address review concerns from PR #660 All checks were successful CI / Semgrep Security Scan (pull_request) Successful in 21s Details CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s Details nightly / deploy-staging (push) Successful in 2m2s Details CI / Unit & Component Tests (push) Successful in 3m58s Details CI / OCR Service Tests (push) Successful in 20s Details CI / Backend Unit Tests (push) Successful in 3m50s Details CI / fail2ban Regex (push) Successful in 44s Details CI / Unit & Component Tests (pull_request) Successful in 3m29s Details CI / Semgrep Security Scan (push) Successful in 21s Details CI / OCR Service Tests (pull_request) Successful in 21s Details CI / Backend Unit Tests (pull_request) Successful in 3m43s Details CI / Compose Bucket Idempotency (push) Successful in 59s Details CI / fail2ban Regex (pull_request) Successful in 45s Details - Restore JavaDoc on DocumentSearchResult.of() and .paged() factory methods - Remove redundant null guards on @Builder.Default collections in toListItem() - Map DocumentListItem fields explicitly in DocumentMultiSelect before cast - Add DocumentListItem required fields to docFactory in spec Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 19:27:31 +02:00
Marcel	627fc44d99	fix(document): fix test regressions from DocumentListItem migration All checks were successful CI / Unit & Component Tests (pull_request) Successful in 3m32s Details CI / OCR Service Tests (pull_request) Successful in 20s Details CI / Backend Unit Tests (pull_request) Successful in 3m46s Details CI / fail2ban Regex (pull_request) Successful in 42s Details CI / Semgrep Security Scan (pull_request) Successful in 19s Details CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s Details - Use documentService.getDocumentById() in detail_stillReturnsTrainingLabels so the Document.full entity graph eager-loads trainingLabels - Flatten makeItem() factory in DocumentList.svelte.test.ts (nested document: {} overrides broke item.id / item.documentDate access) - Remove { document: {} } wrapper from DocumentMultiSelect.svelte.spec.ts mock responses — component now reads body.items directly as flat items - Flatten single nested item in page.svelte.test.ts document list test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 19:19:28 +02:00
Marcel	41b205becc	test(document): add LazyInit guard + detail regression tests; prune Document.list graph Remove trainingLabels from Document.list entity graph now that DocumentListItem does not touch that association. Integration tests guard against future LazyInitializationException regressions and confirm Document.full still loads trainingLabels for the detail endpoint. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 19:19:28 +02:00
Marcel	f22dcaecb7	refactor(document): replace DocumentSearchItem with flat DocumentListItem DTO Eliminates excessive data exposure (OWASP API3:2023) — transcription, filePath, fileHash, thumbnailKey, scriptType and other detail-only fields are no longer serialised in the list API response. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 19:19:03 +02:00
Marcel	769984608b	test(observability): expand grafana_reader coverage with write-deny + PII negatives The original 4 tests asserted SELECT existed on the three granted tables and was absent on app_users. That left two gaps a future migration could slip through silently: - INSERT/UPDATE/DELETE on the granted tables — if someone GRANTed write access on, say, documents to grafana_reader, the SELECT positives stay green and the boundary is breached invisibly. - Other PII / sensitive tables — the single app_users negative checks one table; a wildcard "GRANT SELECT ON ALL TABLES IN SCHEMA public" would still leave it green by accident if app_users wasn't the only sensitive table. Switch to a hasPrivilege(table, privilege) helper, add three write-deny tests (INSERT/UPDATE/DELETE on each granted table), and replace the single app_users negative with a parameterized sweep over app_users, user_groups, persons, notifications, document_comments, document_annotations, geschichten. New sensitive tables get added to that list as they appear. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 17:21:01 +02:00
Marcel	c282f38170	feat(observability): own grafana_reader password via repeatable migration V68 used to set the role's password in a versioned migration, which Flyway applies exactly once per database. Rotating GRAFANA_DB_PASSWORD therefore had no effect on the DB role — operators would need a manual ALTER ROLE or a `flyway repair` that nobody documented. The shape conflated two lifecycles: schema migration (one-shot, immutable) and credential provisioning (rotatable). Split into: - V68 (versioned, immutable): creates the role and applies SELECT grants on audit_log, documents, transcription_blocks. - R__grafana_reader_password.sql (repeatable): issues ALTER ROLE … PASSWORD with the placeholder. Flyway computes the checksum on the resolved content, so any change to GRAFANA_DB_PASSWORD changes the checksum and re-applies the migration on the next boot. Rotation becomes "bump env var + restart backend". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 17:20:35 +02:00
Marcel	3ea7f0b5b2	feat(observability): fail closed when GRAFANA_DB_PASSWORD is unset FlywayConfig used to fall back to a hardcoded "changeme-grafana-db-password" string when the env var was missing. That published a known credential for the grafana_reader role (SELECT on audit_log, documents, transcription_blocks) into git history and made silent fail-open the default for any deploy that forgot the secret. Now resolution goes through Spring's Environment and throws IllegalStateException at startup when the value is unset or blank — same shape as UserDataInitializer's refusal to seed default admin creds. Tests inject via the global GRAFANA_DB_PASSWORD entry in test-resources application.properties so existing Flyway-loading test classes keep booting without per-class TestPropertySource boilerplate. FlywayConfigTest covers both branches against MockEnvironment without a Spring context. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 17:20:09 +02:00
Marcel	f4ffd8acee	feat(observability): create grafana_reader read-only DB role Add Flyway V68 migration that provisions a read-only PostgreSQL role scoped to audit_log, documents, and transcription_blocks. The role's password is injected via the new ${grafanaDbPassword} Flyway placeholder, which FlywayConfig reads from the GRAFANA_DB_PASSWORD env var. The migration is idempotent: CREATE on first run, ALTER on re-run. Adds a Testcontainers integration test asserting positive grants on the three intended tables and a negative grant on app_users (NFR-SEC-01). Refs #651. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-21 20:21:05 +02:00
Marcel	eca4f1f0e8	security(import): add canonical path escape guard in findFileRecursive All checks were successful CI / Unit & Component Tests (pull_request) Successful in 3m27s Details CI / OCR Service Tests (pull_request) Successful in 20s Details CI / Backend Unit Tests (pull_request) Successful in 3m41s Details CI / fail2ban Regex (pull_request) Successful in 42s Details CI / Semgrep Security Scan (pull_request) Successful in 19s Details CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s Details CI / Unit & Component Tests (push) Successful in 3m26s Details CI / OCR Service Tests (push) Successful in 20s Details CI / Backend Unit Tests (push) Successful in 3m24s Details CI / fail2ban Regex (push) Successful in 41s Details CI / Semgrep Security Scan (push) Successful in 18s Details CI / Compose Bucket Idempotency (push) Successful in 1m0s Details A symlink placed inside importDir pointing to a file outside it would pass isValidImportFilename (no forbidden chars in the symlink name) and be found by Files.walk. Now checks candidate.getCanonicalPath() against baseDir.getCanonicalPath() — if the resolved path escapes importDir, throws DomainException.internal and aborts the import. Adds regression test using @TempDir + Files.createSymbolicLink. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 10:16:18 +02:00
Marcel	4e33f52add	refactor(import): extract SkipReason enum to replace raw skip-reason strings Introduces MassImportService.SkipReason with all five values — INVALID_FILENAME_PATH_TRAVERSAL, INVALID_PDF_SIGNATURE, FILE_READ_ERROR, ALREADY_EXISTS, S3_UPLOAD_FAILED — making the full set of reasons greppable and type-safe. SkippedFile.reason changes from String to SkipReason; importSingleDocument return type updated accordingly. JSON serialisation is unchanged (Jackson serialises enums by name). All tests updated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 10:12:43 +02:00
Marcel	890f014bb3	test(import): add regression tests for leading-dot and spaced filenames Documents that .hidden.pdf and "Brief an Oma.pdf" correctly pass the isValidImportFilename guard — both are valid basenames common in the archive. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 10:08:06 +02:00
Marcel	429ff32eda	security(import): block Unicode lookalike path separators in isValidImportFilename Adds checks for U+2215 DIVISION SLASH (∕), U+FF0F FULLWIDTH SOLIDUS (／), and U+29F5 REVERSE SOLIDUS OPERATOR (⧵) — all of which bypass the existing ASCII separator checks on Linux path resolution. Adds a clarifying comment on the Paths.get().isAbsolute() call explaining its InvalidPathException safety boundary. Adds 3 regression tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 10:06:49 +02:00
Marcel	38a4ca2e34	security(import): wire isValidImportFilename guard into processRows All checks were successful CI / OCR Service Tests (pull_request) Successful in 20s Details CI / Backend Unit Tests (pull_request) Successful in 3m26s Details CI / fail2ban Regex (pull_request) Successful in 45s Details CI / Semgrep Security Scan (pull_request) Successful in 21s Details CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s Details CI / Unit & Component Tests (pull_request) Successful in 3m30s Details Rejects path-traversal filenames before findFileRecursive runs. Guard runs on the derived filename (after the ternary) as specified. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 09:52:05 +02:00
Marcel	b63a2040e3	security(import): add isValidImportFilename guard and regression tests Codifies the path-traversal constraint that was previously safe by accident (findFileRecursive's getFileName() strip) but had no explicit guard or test coverage. Fixes issue #530. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 09:49:59 +02:00
Marcel	909f960b2e	fix(transcription): allow ANNOTATE_ALL on block write endpoints TranscriptionBlockController required WRITE_ALL exclusively, blocking users with only ANNOTATE_ALL from saving, reviewing, or deleting blocks. All write endpoints now accept {ANNOTATE_ALL, WRITE_ALL}, matching the pattern already established in AnnotationController and CommentController. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 20:35:51 +02:00
Marcel	7b282f699d	fix(document): add receivers+trainingLabels to Document.list entity graph Document.list was missing receivers (caused LazyInitializationException when sorting by receiver) and trainingLabels (latent crash for any document with OCR training labels assigned). Document.full was missing trainingLabels for the same reason. OSIV is disabled so every lazy association used after the transaction closes must be in the graph. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 20:35:51 +02:00
Marcel	9a460b3c90	fix(document): add trainingLabels to Document.full entity graph (#642 ) All checks were successful CI / Semgrep Security Scan (push) Successful in 19s Details CI / Compose Bucket Idempotency (push) Successful in 59s Details CI / Unit & Component Tests (push) Successful in 3m28s Details CI / OCR Service Tests (push) Successful in 20s Details CI / Backend Unit Tests (push) Successful in 3m22s Details CI / fail2ban Regex (push) Successful in 49s Details trainingLabels was switched to LAZY fetch in #467 but not added to the Document.full @NamedEntityGraph. DocumentRepository.findById() uses Document.full to eagerly load sender/receivers/tags, but the Hibernate session closes before Jackson serializes the response. Accessing trainingLabels outside the session throws LazyInitializationException, causing GET /api/documents/{id} to return HTTP 500. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 12:36:27 +02:00
Marcel	f0e7f73ec1	fix(admin): address PR #623 review feedback - Add load() unit tests for admin/users/[id] (permission gate, 404, success) - Rename .test.ts → .spec.ts for consistency with rest of suite - Add @Schema(requiredMode=REQUIRED) to InviteListItem.shareableUrl - Add client-side allowlist for invite status query param Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 13:33:07 +02:00
Marcel	18e675a5b2	fix(import): address non-blocking review feedback — touch target, glossary, edge-case test All checks were successful CI / Unit & Component Tests (push) Successful in 3m18s Details CI / OCR Service Tests (push) Successful in 19s Details CI / Backend Unit Tests (push) Successful in 3m22s Details CI / fail2ban Regex (push) Successful in 41s Details CI / Semgrep Security Scan (push) Successful in 18s Details CI / Compose Bucket Idempotency (push) Successful in 1m0s Details - Add min-h-[44px] py-2 to <summary> in ImportStatusCard for 44 px touch target - Add SkippedFile and skipped count entries to docs/GLOSSARY.md - Add MassImportServiceTest case: ALREADY_EXISTS fires before file I/O when doc is UPLOADED and file is present on disk Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:45:03 +02:00
Marcel	a3fc838855	fix(import): surface S3 failures + already-exists in skippedFiles, a11y + max-height - Change importSingleDocument return type from boolean to Optional<String> so callers in processRows receive the skip reason on every non-success path. S3 upload failures now surface as "S3_UPLOAD_FAILED" and already-imported documents as "ALREADY_EXISTS" in the skippedFiles list shown in the admin UI. - Add two new tests: runImportAsync_addsS3UploadFailed_toSkippedFiles and runImportAsync_addsAlreadyExists_toSkippedFiles; update importSingleDocument_skips_whenDocumentAlreadyUploadedNotPlaceholder and the S3-failure test to assert on the Optional return value. - Add i18n keys for S3_UPLOAD_FAILED and ALREADY_EXISTS in de/en/es messages. - Svelte ImportStatusCard: add aria-hidden="true" to SVG chevron, wrap conditional warning section in aria-live="polite" div, add max-h-64 overflow-y-auto to skipped-files <ul> to cap height on large batches. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:45:03 +02:00
Marcel	d5043053e0	fix(import): address round-3 review concerns - Add comment to openFileStream() explaining package-private visibility is intentional (Mockito spy seam for IOException test) - Key {#each} skippedFiles by filename instead of array index - Add test: skipped section hidden when state is FAILED - Add test: reasonLabel returns raw code for unknown reason strings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:45:03 +02:00
Marcel	0e95bd9160	fix(import): add @Schema annotations and fix IOException test coverage - Add @Schema(requiredMode = REQUIRED) to SkippedFile and ImportStatus record components so TypeScript codegen produces non-optional fields when generate:api is next run - Extract openFileStream(File) as package-private method so the IOException path can be tested deterministically without relying on OS-level file permissions (which are bypassed when running as root) - Replace assumeTrue-based IOException test with Mockito spy that stubs openFileStream — test now runs in CI unconditionally (45 tests, 0 skipped) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:45:03 +02:00
Marcel	e312cce4e1	fix(test): skip IOException test when running as root setReadable(false) silently no-ops as root; check canRead() to guard the assumption correctly so the test is skipped in Docker CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:45:03 +02:00
Marcel	5587722800	fix(import): address PR review concerns - remove duplicate List import in AdminControllerTest - derive skipped() from skippedFiles.size() — drop redundant int field - use machine codes for SkippedFile.reason (INVALID_PDF_SIGNATURE, FILE_READ_ERROR) - map reason codes to i18n strings in ImportStatusCard (de/en/es) - replace raw amber Tailwind classes with warning semantic token - fix <summary> accessibility: replace list-none with rotating chevron SVG - replace <p> with <span> inside <summary> (phrasing content rule) - extract setupOneValidOneFakeImport() helper — remove 3x copy-paste - add lenient mock to short-file test for defensive coverage - add IOException path test for isPdfMagicBytes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:45:03 +02:00
Marcel	f77fb79cd2	feat(import): validate PDF magic bytes before S3 upload Reads first 4 bytes of each candidate file before upload; rejects any file whose header does not match %PDF (0x25 0x50 0x44 0x46). Skipped files are counted and collected in ImportStatus.skippedFiles so operators can see what was rejected without querying Loki. Breaking: ImportStatus record gains skipped + skippedFiles fields. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:45:03 +02:00
Marcel	1247b51d9e	chore(document): address non-blocking review feedback on lazy-fetch PR All checks were successful CI / Unit & Component Tests (push) Successful in 3m11s Details CI / OCR Service Tests (push) Successful in 20s Details CI / Backend Unit Tests (push) Successful in 3m41s Details CI / fail2ban Regex (push) Successful in 44s Details CI / Semgrep Security Scan (push) Successful in 19s Details CI / Compose Bucket Idempotency (push) Successful in 1m0s Details - Add @BatchSize(50) fallback comments on findBySenderId / findByReceiversId - Replace silent size() discard in getRecentActivity test with assertThat isNotEmpty() - Add ADR-022 reference comment above @JsonIgnoreProperties on Person and Tag - Document within-open-transaction limitation in DocumentLazyLoadingTest Javadoc Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:23:30 +02:00
Marcel	7342c60952	fix(document): fix test assertion structure + add entity graph decision comments - Refactor DocumentLazyLoadingTest: pull value assertions (assertThat) out of assertThatCode lambdas so failures surface as AssertionError rather than "unexpected exception: AssertionError" (review item 1) - Add @EntityGraph("Document.full") to findBySenderId, findByReceiversId, findConversation, and findSinglePersonCorrespondence — all return full Documents to the controller for JSON serialization (review item 2) - Add "// Callers access only ..." comments to un-graphed methods where no lazy associations are touched: findByTags_Id, findByStatus, findByMetadataCompleteFalse(Sort), findByMetadataCompleteFalse(Pageable) - Remove "what" inline comments from @Transactional(readOnly=true) on getRecentActivity and getDocumentById — the why is in ADR-022 (item 4) - Add named-graph coupling consequence to ADR-022: Document.java and DocumentRepository.java graph name strings must stay in sync (item 5) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:23:30 +02:00
Marcel	328bd2c3b4	docs(backend): document @Transactional(readOnly=true) exception in CLAUDE.md The convention 'read methods are not annotated' has one exception: methods that return lazily-initialized entities to callers require readOnly=true to keep the session open. Documents the rule and links to ADR-022. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:23:30 +02:00

1 2 3 4 5 ...

739 Commits