familienarchiv

Author	SHA1	Message	Date
Marcel	94a40237f4	feat(normalizer): generate structured tags from Schlagwort + Inhalt fields Adds tags.py module implementing a three-outcome heuristic: - Individual-to-individual correspondence tags ("Clara an Herbert") → dropped - Group/collective correspondence ("Clara an Kinder", "Walter an Geschwister") → Briefwechsel/<value> - Semantic/event tags ("Brautbriefe", "Alltag", "zur Hochzeit") → Themen/<value> Three correspondence patterns detected: space-an-space, starts-with-"an ", and abbreviated-sender form ("Maria W.an Clara"). COLLECTIVE_TERMS in config.py extended with 17 plural/group relational terms (söhne, brüder, schwiegereltern, cousinen, etc.) confirmed against the full Excel. Also adds two-phase summary mining: every run emits review/tag-candidates.csv; subsequent runs apply keywords from overrides/approved-themes.csv as Themen tags. Outputs: canonical-documents.xlsx gets pipe-separated "Parent/Child" tag paths; canonical-tag-tree.xlsx provides the full tag hierarchy for backend pre-import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 19:47:36 +02:00
Marcel	5efe3b8a7c	feat(normalizer): parse Spanish month names + Month DD-YYYY hyphen form Add Spanish month names (Mexican-branch letters) to config.MONTHS and let the month-first matcher accept a hyphen (not just a dot) before the year, so "Mayo 18-1929"/"Junio 7-904" parse without manual overrides. Also bound 4-digit years to 1700-2100 so gross typos ("23-9003") stay in review instead of producing a bogus year. Cuts unknown-date rate 9.2% -> 7.9%. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 17:00:33 +02:00
Marcel	0f1f9055c3	docs(normalizer): add overrides/ README with structure + examples Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:53:03 +02:00
Marcel	8cac63e938	feat(normalizer): drop unmatched-names.csv; unresolved-names is the names report The unmatched list was just non-family correspondents (expected noise); their count stays in summary.txt and they remain in canonical-persons.xlsx. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:46:08 +02:00
Marcel	97db718f81	docs(import): add unresolved-names plan + worklog entry Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:01:18 +02:00
Marcel	06127724de	docs(normalizer): document unresolved-names.csv review report Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:59:45 +02:00
Marcel	7c017eca2a	test(normalizer): assert unresolved stat key + drop duplicate assertion Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:58:34 +02:00
Marcel	97ab9e38df	feat(normalizer): unresolved-names report + fix ambiguous-pair over-flagging Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:54:37 +02:00
Marcel	f10b80a03f	feat(normalizer): build_given_names from register + supplement Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:51:23 +02:00
Marcel	6478cc58ae	feat(normalizer): classify_name + NameClass Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:47:40 +02:00
Marcel	a7c45b3a0e	feat(normalizer): config tables for name classification Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:43:31 +02:00
Marcel	5ff0c25e10	chore: drop stray reader-dashboard test from this branch page.server.spec.ts picked up an unrelated reader-dashboard test case via a cross-session staging race; restore it to match main so this PR only touches the import-normalizer tool + docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:07:14 +02:00
Marcel	7ba3a29592	docs(import): record normalizer completion + dry-run results in worklog Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:56:20 +02:00
Marcel	d314fd9338	docs(normalizer): README + seed overrides Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:51:20 +02:00
Marcel	18d5a1e2da	feat(normalizer): orchestrator + end-to-end integration test Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:46:13 +02:00
Marcel	df00ea4238	fix(normalizer): defang leading LF in CSV + assert pinned workbook timestamp Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:43:45 +02:00
Marcel	ff1a7c07f1	feat(normalizer): overrides loader + xlsx/csv writers Recovered from an entangled commit: these files were correct but had been bundled into an unrelated reader-dashboard commit by a concurrent session. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:39:28 +02:00
Marcel	366b484815	test(normalizer): real provisional-vs-register collision + override-hits coverage Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:25:49 +02:00
Marcel	88c8063227	feat(normalizer): person resolution context + to_canonical Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:18:09 +02:00
Marcel	3066d3d3ff	refactor(normalizer): harden triage index guard + index_file_mismatch tests Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:15:50 +02:00
Marcel	3e7ddea90a	feat(normalizer): row extraction, triage, canonical record Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:12:48 +02:00
Marcel	75b3ca8b9e	fix(normalizer): don't coerce boolean cells to 1/0 Add bool guard before the int branch in _cell_to_str so True/False cells are preserved as "True"/"False" instead of "1"/"0". Add two regression tests covering the fix and missing-sheet error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:11:19 +02:00
Marcel	74c4c390fc	feat(normalizer): xlsx ingest + header mapping Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:08:30 +02:00
Marcel	29087319e6	test(normalizer): cover AliasIndex unambiguous first-name resolution Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:07:20 +02:00
Marcel	53457d9319	feat(normalizer): alias index with maiden/married/nickname resolution Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:04:11 +02:00
Marcel	2d97595e9c	fix(normalizer): split_receivers returns [] for a geb.-only cell Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:02:35 +02:00
Marcel	a177077b40	feat(normalizer): receiver splitting Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:59:51 +02:00
Marcel	b7a2332861	fix(normalizer): suffix all members of a colliding person-id group Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:58:35 +02:00
Marcel	1da1a8d223	feat(normalizer): person register parsing Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:54:37 +02:00
Marcel	59715bdccd	fix(normalizer): require day-dot in English month-first matcher (structural anti-shadow) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:53:05 +02:00
Marcel	53a661adb6	feat(normalizer): month/year, feast/season, range matchers + overrides Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:47:26 +02:00
Marcel	4942c0ea07	feat(normalizer): day-first month-name matcher Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:42:36 +02:00
Marcel	7edc002ebb	feat(normalizer): roman-numeral month matcher Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:38:32 +02:00
Marcel	b43dd6cdd4	fix(normalizer): keep Task 5 scoped — drop year-only matcher (belongs to Task 8) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:36:48 +02:00
Marcel	cff486dda7	fix(normalizer): treat leading date qualifiers (nach/vor/…) as APPROX _preprocess now sets approx=True when a leading marker is stripped; add _match_year_only so bare years (e.g. "nach 1900" -> "1900") resolve to 1900-01-01/YEAR before being upgraded to APPROX. Strengthen test_parse_approx_marker_upgrades_precision and add test_parse_leading_qualifier_is_approx (11 tests, all pass). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:35:19 +02:00
Marcel	df14e6b1ee	feat(normalizer): parse_date dispatch + iso/numeric matchers Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:30:07 +02:00
Marcel	1908dde859	feat(normalizer): year expansion century rule Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:27:26 +02:00
Marcel	4845e7a3c1	feat(normalizer): feast + season resolution Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:24:26 +02:00
Marcel	c6cceec6e9	feat(normalizer): Easter computus Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:21:39 +02:00
Marcel	8f6f4f2d62	feat(normalizer): scaffold tool + config tables Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:18:52 +02:00
Marcel	6f7aa643c9	docs(import): add normalizer implementation plan + apply persona review 17-task TDD plan for tools/import-normalizer/. Incorporates inline 6-persona review: content-deterministic idempotency, duplicate-index fix, provisional-id collision guard, date-parser edge cases, multi-sender split, CSV-injection defang, pinned deps. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 12:55:50 +02:00
Marcel	adfff420a5	docs(import): add import-migration analysis + normalizer spec Document the raw archive spreadsheet findings (IMP-01..12) and a requirements spec for an offline normalizer that produces a clean canonical dataset before import. Local docs only; no Gitea issue yet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 12:32:37 +02:00
Marcel	8e9e3bba06	refactor(document): address review concerns from PR #660 - Restore JavaDoc on DocumentSearchResult.of() and .paged() factory methods - Remove redundant null guards on @Builder.Default collections in toListItem() - Map DocumentListItem fields explicitly in DocumentMultiSelect before cast - Add DocumentListItem required fields to docFactory in spec Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 19:27:31 +02:00
Marcel	627fc44d99	fix(document): fix test regressions from DocumentListItem migration - Use documentService.getDocumentById() in detail_stillReturnsTrainingLabels so the Document.full entity graph eager-loads trainingLabels - Flatten makeItem() factory in DocumentList.svelte.test.ts (nested document: {} overrides broke item.id / item.documentDate access) - Remove { document: {} } wrapper from DocumentMultiSelect.svelte.spec.ts mock responses; component now reads body.items directly as flat items - Flatten single nested item in page.svelte.test.ts document list test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 19:19:28 +02:00
Marcel	6583226d79	refactor(document): migrate frontend from DocumentSearchItem to flat DocumentListItem All components, specs, and the generated API client now use the new DocumentListItem shape — flat access (item.title, item.sender) instead of the removed item.document.* nesting. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 19:19:28 +02:00
Marcel	41b205becc	test(document): add LazyInit guard + detail regression tests; prune Document.list graph Remove trainingLabels from Document.list entity graph now that DocumentListItem does not touch that association. Integration tests guard against future LazyInitializationException regressions and confirm Document.full still loads trainingLabels for the detail endpoint. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 19:19:28 +02:00
Marcel	f22dcaecb7	refactor(document): replace DocumentSearchItem with flat DocumentListItem DTO Eliminates excessive data exposure (OWASP API3:2023) — transcription, filePath, fileHash, thumbnailKey, scriptType and other detail-only fields are no longer serialised in the list API response. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 19:19:03 +02:00
Marcel	1109ab917b	docs(observability): ADR-024 + rotation runbook for grafana_reader ADR-024 records the deliberate cross-domain link (obs-grafana joins archiv-net to query archive-db via the SELECT-only grafana_reader role), the rejected alternatives (Prometheus exporter, read replica, versioned migration + flyway repair, hardcoded fallback), and the consequences, specifically that a Grafana compromise gains TCP reach to archive-db but is bounded by the role's least-privilege grants. The DEPLOYMENT.md runbook documents the rotation procedure that R__grafana_reader_password.sql now enables: bump GRAFANA_DB_PASSWORD, restart backend (Flyway re-applies because the resolved checksum changed), restart obs-grafana (datasource picks up the new env var). Also calls out the fail-closed startup behavior so operators who hit IllegalStateException know it is deliberate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 17:21:27 +02:00
Marcel	769984608b	test(observability): expand grafana_reader coverage with write-deny + PII negatives The original 4 tests asserted SELECT existed on the three granted tables and was absent on app_users. That left two gaps a future migration could slip through silently: - INSERT/UPDATE/DELETE on the granted tables — if someone GRANTed write access on, say, documents to grafana_reader, the SELECT positives stay green and the boundary is breached invisibly. - Other PII / sensitive tables — the single app_users negative checks one table; a wildcard "GRANT SELECT ON ALL TABLES IN SCHEMA public" would still leave it green by accident if app_users wasn't the only sensitive table. Switch to a hasPrivilege(table, privilege) helper, add three write-deny tests (INSERT/UPDATE/DELETE on each granted table), and replace the single app_users negative with a parameterized sweep over app_users, user_groups, persons, notifications, document_comments, document_annotations, geschichten. New sensitive tables get added to that list as they appear. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 17:21:01 +02:00
Marcel	c282f38170	feat(observability): own grafana_reader password via repeatable migration V68 used to set the role's password in a versioned migration, which Flyway applies exactly once per database. Rotating GRAFANA_DB_PASSWORD therefore had no effect on the DB role — operators would need a manual ALTER ROLE or a `flyway repair` that nobody documented. The shape conflated two lifecycles: schema migration (one-shot, immutable) and credential provisioning (rotatable). Split into: - V68 (versioned, immutable): creates the role and applies SELECT grants on audit_log, documents, transcription_blocks. - R__grafana_reader_password.sql (repeatable): issues ALTER ROLE … PASSWORD with the placeholder. Flyway computes the checksum on the resolved content, so any change to GRAFANA_DB_PASSWORD changes the checksum and re-applies the migration on the next boot. Rotation becomes "bump env var + restart backend". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 17:20:35 +02:00
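
The three-outcome tag heuristic described in commit 94a40237f4 could look roughly like the sketch below. It is an illustration only, not the contents of tags.py: the names classify_tag and join_tags, the regular expressions, and the shortened COLLECTIVE_TERMS set are all assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical subset; the real COLLECTIVE_TERMS table lives in config.py.
COLLECTIVE_TERMS = {"kinder", "geschwister", "söhne", "brüder",
                    "schwiegereltern", "cousinen"}

# The three correspondence shapes named in the commit message (assumed regexes).
_PATTERNS = (
    re.compile(r"^an\s+(?P<receiver>.+)$"),        # starts with "an "
    re.compile(r"^.+?\s+an\s+(?P<receiver>.+)$"),  # "Clara an Herbert"
    re.compile(r"^.+?\.an\s+(?P<receiver>.+)$"),   # "Maria W.an Clara"
)


def classify_tag(raw: str) -> Optional[str]:
    """Return a "Parent/Child" tag path, or None when the tag is dropped."""
    value = raw.strip()
    if not value:
        return None
    for pattern in _PATTERNS:
        match = pattern.match(value)
        if match is None:
            continue
        words = match.group("receiver").lower().split()
        if any(word in COLLECTIVE_TERMS for word in words):
            return f"Briefwechsel/{value}"  # group or collective receiver
        return None  # individual-to-individual correspondence is dropped
    return f"Themen/{value}"  # anything else counts as a semantic/event tag


def join_tags(raw_tags: list[str]) -> str:
    """Join the surviving tag paths with pipes, as in canonical-documents.xlsx."""
    paths = [p for p in map(classify_tag, raw_tags) if p is not None]
    return "|".join(paths)
```

With these assumptions, join_tags(["Clara an Herbert", "Walter an Geschwister", "Alltag"]) keeps the group letter and the theme and drops the individual correspondence.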


2858 Commits
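
The boolean fix in commit 75b3ca8b9e relies on a Python detail: bool is a subclass of int, so a numeric branch that runs first will swallow True/False. A minimal sketch of such a guard follows; the function name cell_to_str and every branch other than the bool check are assumptions for illustration, not the project's _cell_to_str.

```python
def cell_to_str(value: object) -> str:
    """Render a spreadsheet cell as text (hypothetical stand-in for _cell_to_str)."""
    if value is None:
        return ""
    # bool is a subclass of int, so this check must come before the numeric
    # branch; otherwise True/False would be rendered as "1"/"0".
    if isinstance(value, bool):
        return str(value)
    if isinstance(value, (int, float)):
        number = float(value)
        # Integral numbers (e.g. 1929.0 from a numeric cell) print without ".0".
        return str(int(number)) if number.is_integer() else str(value)
    return str(value).strip()
```

Removing the isinstance(value, bool) check in this sketch makes True come out as "1", which is the coercion the commit message describes.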