Commit Graph

3 Commits

Author SHA1 Message Date
Marcel
5efe3b8a7c feat(normalizer): parse Spanish month names + Month DD-YYYY hyphen form
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m31s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m42s
CI / fail2ban Regex (pull_request) Successful in 45s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s
Add Spanish month names (Mexican-branch letters) to config.MONTHS and let
the month-first matcher accept a hyphen (not just a dot) before the year, so
"Mayo 18-1929"/"Junio 7-904" parse without manual overrides. Also bound
4-digit years to 1700-2100 so gross typos ("23-9003") stay in review instead
of producing a bogus year. Cuts unknown-date rate 9.2% -> 7.9%.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 17:00:33 +02:00
Marcel
a7c45b3a0e feat(normalizer): config tables for name classification
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:43:31 +02:00
Marcel
8f6f4f2d62 feat(normalizer): scaffold tool + config tables
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:18:52 +02:00