Marcel
5efe3b8a7c
feat(normalizer): parse Spanish month names + Month DD-YYYY hyphen form
...
CI / Unit & Component Tests (pull_request) Successful in 3m31s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m42s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s
Add Spanish month names (Mexican-branch letters) to config.MONTHS and let
the month-first matcher accept a hyphen (not just a dot) before the year, so
"Mayo 18-1929"/"Junio 7-904" parse without manual overrides. Also bound
4-digit years to 1700-2100 so gross typos ("23-9003") stay in review instead
of producing a bogus year. Cuts unknown-date rate 9.2% -> 7.9%.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 17:00:33 +02:00
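A minimal sketch of the month-first matching described in 5efe3b8a7c, assuming a regex-based matcher. The `MONTHS` subset, `match_month_first`, and the pattern itself are hypothetical stand-ins, not the normalizer's actual code. Abbreviated years such as "904" are deliberately left out, because the century-expansion rule is not shown in this log.

```python
import re
from datetime import date

# Hypothetical subset of a month table; the real config.MONTHS is not shown.
MONTHS = {"mayo": 5, "junio": 6}

# Year bounds taken from the commit message.
MIN_YEAR, MAX_YEAR = 1700, 2100

# Month name, day, then a dot OR a hyphen before a 4-digit year.
MONTH_FIRST = re.compile(r"^\s*([^\W\d_]+)\s+(\d{1,2})\s*[.\-]\s*(\d{4})\s*$")


def match_month_first(text):
    """Return a date for 'Month DD-YYYY' / 'Month DD. YYYY', else None."""
    m = MONTH_FIRST.match(text)
    if m is None:
        return None
    month = MONTHS.get(m.group(1).lower())
    if month is None:
        return None
    day, year = int(m.group(2)), int(m.group(3))
    # Out-of-range years are rejected so the row stays in manual review.
    if not MIN_YEAR <= year <= MAX_YEAR:
        return None
    try:
        return date(year, month, day)
    except ValueError:  # e.g. day 31 in a 30-day month
        return None
```

Under these assumptions, `match_month_first("Mayo 18-1929")` yields `date(1929, 5, 18)`, while a typo such as `"Mayo 23-9003"` returns `None` instead of a bogus year.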
Marcel
0f1f9055c3
docs(normalizer): add overrides/ README with structure + examples
...
CI / Unit & Component Tests (pull_request) Successful in 3m27s
CI / OCR Service Tests (pull_request) Successful in 21s
CI / Backend Unit Tests (pull_request) Successful in 3m40s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m3s
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 16:53:03 +02:00
Marcel
8cac63e938
feat(normalizer): drop unmatched-names.csv; unresolved-names is the names report
...
CI / Unit & Component Tests (pull_request) Successful in 3m32s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 3m26s
CI / fail2ban Regex (pull_request) Successful in 47s
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
The unmatched list was just non-family correspondents (expected noise);
their count stays in summary.txt and they remain in canonical-persons.xlsx.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 16:46:08 +02:00
Marcel
06127724de
docs(normalizer): document unresolved-names.csv review report
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:59:45 +02:00
Marcel
7c017eca2a
test(normalizer): assert unresolved stat key + drop duplicate assertion
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:58:34 +02:00
Marcel
97ab9e38df
feat(normalizer): unresolved-names report + fix ambiguous-pair over-flagging
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:54:37 +02:00
Marcel
f10b80a03f
feat(normalizer): build_given_names from register + supplement
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:51:23 +02:00
Marcel
6478cc58ae
feat(normalizer): classify_name + NameClass
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:47:40 +02:00
Marcel
a7c45b3a0e
feat(normalizer): config tables for name classification
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:43:31 +02:00
Marcel
d314fd9338
docs(normalizer): README + seed overrides
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:51:20 +02:00
Marcel
18d5a1e2da
feat(normalizer): orchestrator + end-to-end integration test
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:46:13 +02:00
Marcel
df00ea4238
fix(normalizer): defang leading LF in CSV + assert pinned workbook timestamp
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:43:45 +02:00
Marcel
ff1a7c07f1
feat(normalizer): overrides loader + xlsx/csv writers
...
Recovered from an entangled commit: these files were correct but had been
bundled into an unrelated reader-dashboard commit by a concurrent session.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:39:28 +02:00
Marcel
366b484815
test(normalizer): real provisional-vs-register collision + override-hits coverage
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:25:49 +02:00
Marcel
88c8063227
feat(normalizer): person resolution context + to_canonical
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:18:09 +02:00
Marcel
3066d3d3ff
refactor(normalizer): harden triage index guard + index_file_mismatch tests
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:15:50 +02:00
Marcel
3e7ddea90a
feat(normalizer): row extraction, triage, canonical record
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:12:48 +02:00
Marcel
75b3ca8b9e
fix(normalizer): don't coerce boolean cells to 1/0
...
Add bool guard before the int branch in _cell_to_str so True/False
cells are preserved as "True"/"False" instead of "1"/"0". Add two
regression tests covering the fix and the missing-sheet error.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:11:19 +02:00
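The ordering issue behind 75b3ca8b9e can be illustrated as follows. In Python, `bool` is a subclass of `int`, so an `isinstance(value, int)` branch also catches `True`/`False` unless a bool check runs first. This is only a sketch: `cell_to_str` is a hypothetical stand-in for `_cell_to_str`, and the shape of its numeric branches is an assumption.

```python
def cell_to_str(value):
    """Hypothetical stand-in for _cell_to_str; the real function is not shown."""
    if value is None:
        return ""
    # bool is a subclass of int, so this guard must precede the int branch.
    if isinstance(value, bool):
        return str(value)
    # Assumed int branch: without the guard above, True would become "1" here.
    if isinstance(value, int):
        return str(int(value))
    # Assumed float branch: whole-number floats are rendered without ".0".
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value).strip()
```

With the guard in place, `cell_to_str(True)` returns `"True"`; moving the guard below the int branch would return `"1"` instead.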
Marcel
74c4c390fc
feat(normalizer): xlsx ingest + header mapping
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:08:30 +02:00
Marcel
29087319e6
test(normalizer): cover AliasIndex unambiguous first-name resolution
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:07:20 +02:00
Marcel
53457d9319
feat(normalizer): alias index with maiden/married/nickname resolution
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:04:11 +02:00
Marcel
2d97595e9c
fix(normalizer): split_receivers returns [] for a geb.-only cell
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:02:35 +02:00
Marcel
a177077b40
feat(normalizer): receiver splitting
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:59:51 +02:00
Marcel
b7a2332861
fix(normalizer): suffix all members of a colliding person-id group
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:58:35 +02:00
Marcel
1da1a8d223
feat(normalizer): person register parsing
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:54:37 +02:00
Marcel
59715bdccd
fix(normalizer): require day-dot in English month-first matcher (structural anti-shadow)
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:53:05 +02:00
Marcel
53a661adb6
feat(normalizer): month/year, feast/season, range matchers + overrides
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:47:26 +02:00
Marcel
4942c0ea07
feat(normalizer): day-first month-name matcher
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:42:36 +02:00
Marcel
7edc002ebb
feat(normalizer): roman-numeral month matcher
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:38:32 +02:00
Marcel
b43dd6cdd4
fix(normalizer): keep Task 5 scoped — drop year-only matcher (belongs to Task 8)
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:36:48 +02:00
Marcel
cff486dda7
fix(normalizer): treat leading date qualifiers (nach/vor/…) as APPROX
...
_preprocess now sets approx=True when a leading marker is stripped; add
_match_year_only so bare years (e.g. "nach 1900" -> "1900") resolve to
1900-01-01/YEAR before being upgraded to APPROX. Strengthen
test_parse_approx_marker_upgrades_precision and add
test_parse_leading_qualifier_is_approx (11 tests, all pass).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:35:19 +02:00
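A rough sketch of the flow cff486dda7 describes: strip a leading qualifier, remember that it was there, match the bare year, then upgrade the precision to APPROX. All names below (`preprocess`, `match_year_only`, `parse`, the marker tuple, the string precision labels) are hypothetical; the commit names only "nach" and "vor" explicitly and the real `_preprocess` is not shown.

```python
import re
from datetime import date

# Hypothetical marker list; the commit names only "nach" and "vor".
LEADING_QUALIFIERS = ("nach", "vor")
_QUALIFIER_RE = re.compile(
    r"^\s*(?:%s)\b\.?\s*" % "|".join(LEADING_QUALIFIERS), re.IGNORECASE
)
_YEAR_ONLY_RE = re.compile(r"^(\d{4})$")


def preprocess(text):
    """Strip one leading qualifier; report whether one was removed."""
    stripped, count = _QUALIFIER_RE.subn("", text, count=1)
    return stripped.strip(), count > 0


def match_year_only(text):
    """Resolve a bare 4-digit year to January 1 with YEAR precision."""
    m = _YEAR_ONLY_RE.match(text)
    if m is None:
        return None
    year = int(m.group(1))
    if year < 1:  # datetime.date does not support year 0
        return None
    return date(year, 1, 1), "YEAR"


def parse(text):
    cleaned, approx = preprocess(text)
    hit = match_year_only(cleaned)
    if hit is None:
        return None
    value, precision = hit
    # A stripped qualifier upgrades whatever precision was matched to APPROX.
    return value, ("APPROX" if approx else precision)
```

Under these assumptions, `parse("nach 1900")` returns `(date(1900, 1, 1), "APPROX")`, while `parse("1900")` keeps `"YEAR"` precision.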
Marcel
df14e6b1ee
feat(normalizer): parse_date dispatch + iso/numeric matchers
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:30:07 +02:00
Marcel
1908dde859
feat(normalizer): year expansion century rule
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:27:26 +02:00
Marcel
4845e7a3c1
feat(normalizer): feast + season resolution
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:24:26 +02:00
Marcel
c6cceec6e9
feat(normalizer): Easter computus
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:21:39 +02:00
Marcel
8f6f4f2d62
feat(normalizer): scaffold tool + config tables
...
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:18:52 +02:00