docs(import): record normalizer completion + dry-run results in worklog
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 1m17s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 3m46s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 1m17s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 3m46s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -4,6 +4,46 @@ Running log of each working session. **Resume here.** Newest entry on top.
|
||||
|
||||
---
|
||||
|
||||
## 2026-05-25 (session 4) — Built the normalizer (subagent-driven, all 17 tasks)
|
||||
|
||||
**Did:** Executed the plan subagent-driven (implementer + spec review + code-quality review per
|
||||
task). The tool `tools/import-normalizer/` is **complete and passing (57 tests)**. Final
|
||||
opus review: **READY** — determinism verified on the real corpus (two runs → identical cell
|
||||
matrices + byte-identical review files), zero silent drops.
|
||||
|
||||
**Per-task code review caught & fixed real issues** (all in the committed code): leading
|
||||
qualifiers `nach/vor/…` now → APPROX; English month-first matcher hardened to structurally
|
||||
not shadow `Mai 1895`; person-id collision de-dup suffixes *all* members; `split_receivers`
|
||||
returns `[]` for a `geb.`-only cell; boolean cells no longer coerced to `1/0`; duplicate-index
|
||||
flags every occurrence; provisional ids never steal a register id; CSV-injection defanged.
|
||||
|
||||
**REAL DRY-RUN** (`python normalize.py` over the actual archive — outputs are gitignored):
|
||||
- documents_emitted **7,582** (+225 empty +93 blank-index +42 x-suffix = 7,942 rows read, 0 dropped)
|
||||
- register_persons **163**, provisional_persons **942**
|
||||
- dates: DAY 6,509 / MONTH 36 / RANGE 36 / APPROX 28 / YEAR 17 / SEASON 1 / UNKNOWN 955
|
||||
- **unknown_date_rate 9.2%** (of dated rows; target ≤5% pre-override, ≤0.5% after overrides)
|
||||
- duplicate_index 85, index_file_mismatches 550, ambiguous_receivers 303
|
||||
|
||||
**⚠️ Concurrency incident:** a parallel Claude session committed reader-dashboard work to this
|
||||
branch and hard-reset it mid-execution, deleting the Task 15 files and orphaning a commit.
|
||||
Recovered via reflog (`reset --hard 366b4848` + `checkout 401160e3 -- <task15 files>`); no code
|
||||
lost. Casualty: my *during-execution* edits to the plan/spec docs (02/03) for Tasks 5–14 were
|
||||
discarded — **the committed code + tests are the source of truth**, not the plan doc, which now
|
||||
reflects the pre-execution + persona-review version.
|
||||
|
||||
**Next steps (iterative refinement — the overrides loop, as designed):**
|
||||
1. Shave the 9.2% UNKNOWN cheaply: add **Spanish month names** (Enero…Diciembre) and the
|
||||
`Mon DD-YYYY` dash form to `config.MONTHS`/the parser (Mexican-branch correspondence);
|
||||
revisit the 58–72 two-digit-year band (real `…58/59/60` dates = 1958–1960, just past the
|
||||
1873–1957 window — decide whether to extend the upper bound in `config`).
|
||||
2. `?` (99×) is genuinely "date unknown" — leave UNKNOWN or add a convention.
|
||||
3. Populate `overrides/dates.csv` + `overrides/names.csv` from the review CSVs and re-run.
|
||||
4. README note: a leading `'`/`!` in a `review/*.csv` `raw` cell may be a CSV-defang artifact —
|
||||
match against the true source value when writing overrides.
|
||||
5. Phase 2 (separate spec): wire the canonical contract into the Java `MassImportService`.
|
||||
|
||||
---
|
||||
|
||||
## 2026-05-25 (session 3) — Implementation plan + persona review
|
||||
|
||||
**Did:**
|
||||
|
||||
Reference in New Issue
Block a user