familienarchiv

Author	SHA1	Message	Date
Marcel	889d301f16	fix(normalizer): correct _MIN_YEAR comment in test (1700 not 1500)	2026-05-25 20:53:16 +02:00
Marcel	443c7a48db	fix(normalizer): don't convert plausible typo years as Excel serials	2026-05-25 20:46:42 +02:00
Marcel	9ae1196d1c	feat(normalizer): add persons_tree skeleton + year extraction	2026-05-25 20:41:25 +02:00
Marcel	b37fd1728b	docs(importer): add Personendatei importer implementation plan 9-task TDD plan for persons_tree.py — year extraction, name index, deduplication, SPOUSE_OF/PARENT_OF extraction, CLI + JSON output. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 20:38:14 +02:00
Marcel	6103d5d229	docs(importer): resolve open questions in Personendatei importer spec OQ-01: tool deduplicates rows with identical (firstName, lastName, birthYear) OQ-02: birthPlace/deathPlace kept as separate JSON fields OQ-03: multi-name firstName stored verbatim Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 20:28:45 +02:00
Marcel	7b483d357a	docs(importer): add Personendatei importer design spec Two-pass Python tool (persons_tree.py) that normalizes import/Personendatei 2.xlsx into canonical-persons-tree.json with persons, SPOUSE_OF/PARENT_OF relationships, and an unresolved[] list for manual review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 20:26:30 +02:00
Marcel	94a40237f4	feat(normalizer): generate structured tags from Schlagwort + Inhalt fields Adds tags.py module implementing a three-outcome heuristic: - Individual-to-individual correspondence tags ("Clara an Herbert") → dropped - Group/collective correspondence ("Clara an Kinder", "Walter an Geschwister") → Briefwechsel/<value> - Semantic/event tags ("Brautbriefe", "Alltag", "zur Hochzeit") → Themen/<value> Three correspondence patterns detected: space-an-space, starts-with-"an ", and abbreviated-sender form ("Maria W.an Clara"). COLLECTIVE_TERMS in config.py extended with 17 plural/group relational terms (söhne, brüder, schwiegereltern, cousinen, etc.) confirmed against the full Excel. Also adds two-phase summary mining: every run emits review/tag-candidates.csv; subsequent runs apply keywords from overrides/approved-themes.csv as Themen tags. Outputs: canonical-documents.xlsx gets pipe-separated "Parent/Child" tag paths; canonical-tag-tree.xlsx provides the full tag hierarchy for backend pre-import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 19:47:36 +02:00
Marcel	5efe3b8a7c	feat(normalizer): parse Spanish month names + Month DD-YYYY hyphen form Add Spanish month names (Mexican-branch letters) to config.MONTHS and let the month-first matcher accept a hyphen (not just a dot) before the year, so "Mayo 18-1929"/"Junio 7-904" parse without manual overrides. Also bound 4-digit years to 1700-2100 so gross typos ("23-9003") stay in review instead of producing a bogus year. Cuts unknown-date rate 9.2% -> 7.9%. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 17:00:33 +02:00
Marcel	0f1f9055c3	docs(normalizer): add overrides/ README with structure + examples Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:53:03 +02:00
Marcel	8cac63e938	feat(normalizer): drop unmatched-names.csv; unresolved-names is the names report The unmatched list was just non-family correspondents (expected noise); their count stays in summary.txt and they remain in canonical-persons.xlsx. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:46:08 +02:00
Marcel	97db718f81	docs(import): add unresolved-names plan + worklog entry Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:01:18 +02:00
Marcel	06127724de	docs(normalizer): document unresolved-names.csv review report Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:59:45 +02:00
Marcel	7c017eca2a	test(normalizer): assert unresolved stat key + drop duplicate assertion Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:58:34 +02:00
Marcel	97ab9e38df	feat(normalizer): unresolved-names report + fix ambiguous-pair over-flagging Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:54:37 +02:00
Marcel	f10b80a03f	feat(normalizer): build_given_names from register + supplement Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:51:23 +02:00
Marcel	6478cc58ae	feat(normalizer): classify_name + NameClass Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:47:40 +02:00
Marcel	a7c45b3a0e	feat(normalizer): config tables for name classification Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:43:31 +02:00
Marcel	5ff0c25e10	chore: drop stray reader-dashboard test from this branch page.server.spec.ts picked up an unrelated reader-dashboard test case via a cross-session staging race; restore it to match main so this PR only touches the import-normalizer tool + docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:07:14 +02:00
Marcel	7ba3a29592	docs(import): record normalizer completion + dry-run results in worklog Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:56:20 +02:00
Marcel	d314fd9338	docs(normalizer): README + seed overrides Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:51:20 +02:00
Marcel	18d5a1e2da	feat(normalizer): orchestrator + end-to-end integration test Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:46:13 +02:00
Marcel	df00ea4238	fix(normalizer): defang leading LF in CSV + assert pinned workbook timestamp Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:43:45 +02:00
Marcel	ff1a7c07f1	feat(normalizer): overrides loader + xlsx/csv writers Recovered from an entangled commit: these files were correct but had been bundled into an unrelated reader-dashboard commit by a concurrent session. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:39:28 +02:00
Marcel	366b484815	test(normalizer): real provisional-vs-register collision + override-hits coverage Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:25:49 +02:00
Marcel	88c8063227	feat(normalizer): person resolution context + to_canonical Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:18:09 +02:00
Marcel	3066d3d3ff	refactor(normalizer): harden triage index guard + index_file_mismatch tests Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:15:50 +02:00
Marcel	3e7ddea90a	feat(normalizer): row extraction, triage, canonical record Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:12:48 +02:00
Marcel	75b3ca8b9e	fix(normalizer): don't coerce boolean cells to 1/0 Add bool guard before the int branch in _cell_to_str so True/False cells are preserved as "True"/"False" instead of "1"/"0". Add two regression tests covering the fix and missing-sheet error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:11:19 +02:00
Marcel	74c4c390fc	feat(normalizer): xlsx ingest + header mapping Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:08:30 +02:00
Marcel	29087319e6	test(normalizer): cover AliasIndex unambiguous first-name resolution Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:07:20 +02:00
Marcel	53457d9319	feat(normalizer): alias index with maiden/married/nickname resolution Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:04:11 +02:00
Marcel	2d97595e9c	fix(normalizer): split_receivers returns [] for a geb.-only cell Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:02:35 +02:00
Marcel	a177077b40	feat(normalizer): receiver splitting Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:59:51 +02:00
Marcel	b7a2332861	fix(normalizer): suffix all members of a colliding person-id group Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:58:35 +02:00
Marcel	1da1a8d223	feat(normalizer): person register parsing Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:54:37 +02:00
Marcel	59715bdccd	fix(normalizer): require day-dot in English month-first matcher (structural anti-shadow) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:53:05 +02:00
Marcel	53a661adb6	feat(normalizer): month/year, feast/season, range matchers + overrides Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:47:26 +02:00
Marcel	4942c0ea07	feat(normalizer): day-first month-name matcher Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:42:36 +02:00
Marcel	7edc002ebb	feat(normalizer): roman-numeral month matcher Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:38:32 +02:00
Marcel	b43dd6cdd4	fix(normalizer): keep Task 5 scoped — drop year-only matcher (belongs to Task 8) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:36:48 +02:00
Marcel	cff486dda7	fix(normalizer): treat leading date qualifiers (nach/vor/…) as APPROX _preprocess now sets approx=True when a leading marker is stripped; add _match_year_only so bare years (e.g. "nach 1900" -> "1900") resolve to 1900-01-01/YEAR before being upgraded to APPROX. Strengthen test_parse_approx_marker_upgrades_precision and add test_parse_leading_qualifier_is_approx (11 tests, all pass). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:35:19 +02:00
Marcel	df14e6b1ee	feat(normalizer): parse_date dispatch + iso/numeric matchers Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:30:07 +02:00
Marcel	1908dde859	feat(normalizer): year expansion century rule Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:27:26 +02:00
Marcel	4845e7a3c1	feat(normalizer): feast + season resolution Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:24:26 +02:00
Marcel	c6cceec6e9	feat(normalizer): Easter computus Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:21:39 +02:00
Marcel	8f6f4f2d62	feat(normalizer): scaffold tool + config tables Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:18:52 +02:00
Marcel	6f7aa643c9	docs(import): add normalizer implementation plan + apply persona review 17-task TDD plan for tools/import-normalizer/. Incorporates inline 6-persona review: content-deterministic idempotency, duplicate-index fix, provisional-id collision guard, date-parser edge cases, multi-sender split, CSV-injection defang, pinned deps. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 12:55:50 +02:00
Marcel	adfff420a5	docs(import): add import-migration analysis + normalizer spec Document the raw archive spreadsheet findings (IMP-01..12) and a requirements spec for an offline normalizer that produces a clean canonical dataset before import. Local docs only; no Gitea issue yet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 12:32:37 +02:00
Marcel	8e9e3bba06	refactor(document): address review concerns from PR #660 - Restore JavaDoc on DocumentSearchResult.of() and .paged() factory methods - Remove redundant null guards on @Builder.Default collections in toListItem() - Map DocumentListItem fields explicitly in DocumentMultiSelect before cast - Add DocumentListItem required fields to docFactory in spec Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 19:27:31 +02:00
Marcel	627fc44d99	fix(document): fix test regressions from DocumentListItem migration - Use documentService.getDocumentById() in detail_stillReturnsTrainingLabels so the Document.full entity graph eager-loads trainingLabels - Flatten makeItem() factory in DocumentList.svelte.test.ts (nested document: {} overrides broke item.id / item.documentDate access) - Remove { document: {} } wrapper from DocumentMultiSelect.svelte.spec.ts mock responses; component now reads body.items directly as flat items - Flatten single nested item in page.svelte.test.ts document list test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-22 19:19:28 +02:00
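
Commit 94a40237f4 describes a three-outcome tag heuristic in prose only. A minimal sketch of how such a classifier could look is given here, assuming a single regex covers the three correspondence patterns named in the message. The function name `classify_tag`, the regex, and the shortened `COLLECTIVE_TERMS` set are illustrative assumptions, not the contents of tags.py or config.py.

```python
import re
from typing import Optional

# Hypothetical subset; the commit says config.COLLECTIVE_TERMS holds such terms.
COLLECTIVE_TERMS = {"kinder", "geschwister", "söhne", "brüder",
                    "schwiegereltern", "cousinen"}

# The three correspondence patterns from the commit message:
# leading "an ", space-an-space, and abbreviated sender ("Maria W.an Clara").
_CORRESPONDENCE = re.compile(r"(?:^an\s+|\s+an\s+|\.an\s+)", re.IGNORECASE)


def classify_tag(value: str) -> Optional[str]:
    """Return a "Parent/Child" tag path, or None when the tag is dropped."""
    value = value.strip()
    if not value:
        return None
    match = _CORRESPONDENCE.search(value)
    if match is None:
        # No correspondence pattern: treat as a semantic/event tag.
        return f"Themen/{value}"
    receiver_words = value[match.end():].lower().split()
    if any(word in COLLECTIVE_TERMS for word in receiver_words):
        # Receiver is a group: keep as group correspondence.
        return f"Briefwechsel/{value}"
    # Individual-to-individual correspondence is dropped.
    return None
```

With the examples quoted in the commit message, "Clara an Herbert" and "Maria W.an Clara" are dropped, "Clara an Kinder" maps to a Briefwechsel path, and "Brautbriefe" maps to a Themen path.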


2864 Commits
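
Commit 75b3ca8b9e mentions a bool guard placed before the int branch in _cell_to_str. The reason the order matters is that `bool` is a subclass of `int` in Python, so an int check alone also catches True/False. The sketch below illustrates this under assumptions: `cell_to_str` is a hypothetical stand-in, and the real function's other branches are not shown in the log.

```python
def cell_to_str(value) -> str:
    """Hypothetical stand-in for _cell_to_str; only the branch order is the point."""
    if value is None:
        return ""
    # Guard first: isinstance(True, int) is True, so without this check
    # the int branch below would turn True/False into "1"/"0".
    if isinstance(value, bool):
        return str(value)
    if isinstance(value, int):
        return str(int(value))
    # Assumed behaviour: whole-number floats such as 3.0 print as "3".
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value).strip()
```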