diff --git a/docs/import-migration/02-normalization-spec.md b/docs/import-migration/02-normalization-spec.md
index b2829d23..b301c42c 100644
--- a/docs/import-migration/02-normalization-spec.md
+++ b/docs/import-migration/02-normalization-spec.md
@@ -176,6 +176,14 @@ letter actually said.*
   Silvester=12-31, …). Seasons map to representative months: Frühling/Frühjahr=Apr,
   Sommer=Jul, Herbst=Oct, Winter=Jan. The feast/season tables and Easter algorithm live in
   `config.py` (NFR-MAINT-01).
+- **REQ-DATE-07** — **Intra-month day ranges carry an end day; half-resolved ranges are
+  flagged.** For a day range like `7./8. Sept.1923`, `date_iso` holds the start day, the end
+  day is resolved against the shared month/year into `date_end`, and `date_precision` =
+  `RANGE`. If the **start** parses but the **end day is impossible** (e.g. `10./40.1.1917`),
+  the row keeps the start and `RANGE` precision, leaves `date_end` **empty**, and is flagged
+  `needs_review = range_end_unparsed` — the unparseable end is dropped honestly (surfaced for
+  review), never silently invented or clamped. A `RANGE` row **may** therefore legitimately
+  have an empty `date_end`; the importer must treat `date_end` as optional even on a `RANGE`.
 
 ### 4.4 Person resolution & dedup (`FR-PERS`, `FR-DEDUP`) — resolves IMP-04, IMP-05, IMP-11
 
@@ -262,6 +270,7 @@ DB schema.
 | Field | Required | Format / values | Notes |
 | --- | --- | --- | --- |
 | `index` | yes | string | Stable key; basis for PDF matching. |
+| `file` | no | string | verbatim `Datei` value (e.g. `H-0730.pdf`); carried through for the importer to link the scanned PDF. |
 | `box` | no | string | from `Box`. |
 | `folder` | no | string | from `Mappe`. |
 | `sender_person_id` | no | person_id | resolved; empty if no sender. |
@@ -271,11 +280,12 @@ DB schema.
 | `date_iso` | no | `YYYY-MM-DD` | best-effort; empty if `UNKNOWN`. |
 | `date_raw` | no | string | verbatim source date. |
 | `date_precision` | yes | enum | `DAY\|MONTH\|SEASON\|YEAR\|RANGE\|APPROX\|UNKNOWN`. |
+| `date_end` | no | `YYYY-MM-DD` or empty | RANGE end day (e.g. `7./8. Sept.1923` → `date_iso` = start, `date_end` = end). Empty for every non-RANGE precision **and** for a half-resolved RANGE whose end did not parse (see REQ-DATE-07). |
 | `location` | no | string | from `Ort`. |
 | `tags` | no | `tag\|tag` | from `Schlagwort`. |
 | `summary` | no | string | from `Inhalt`. |
 | `source_row` | yes | int | provenance (NFR-DATA-01). |
-| `needs_review` | yes | `flag\|flag` or empty | review flags (REQ-PROV-02). |
+| `needs_review` | yes | `flag\|flag` or empty | review flags (REQ-PROV-02). Flags include `unparsed_date`, `range_end_unparsed` (half-resolved RANGE, REQ-DATE-07), `unmatched_sender`, `unmatched_receiver`, `multi_sender`, `index_file_mismatch`, `duplicate_index`. |
 
 ### 6.2 `canonical-persons.xlsx`
 
@@ -295,6 +305,27 @@ DB schema.
 | `aliases` | no | `a\|b\|c` | every surface form that maps here. |
 | `provisional` | yes | bool | true if created from a document string, not the register. |
 
+### 6.3 `canonical-persons-tree.json`
+
+The de-duplicated genealogical tree (family members + their relationships) the importer
+uses to seed the family graph. Each `persons[]` entry carries a `personId` that **joins
+1:1 onto** `person_id` in `canonical-persons.xlsx`.
+
+| Field | Required | Format | Notes |
+| --- | --- | --- | --- |
+| `personId` | yes | slug | The register's **verbatim** `person_id` (e.g. `cram-hans-1`), propagated — never re-slugified — so collision suffixes match `canonical-persons.xlsx` exactly. Every tree `personId` exists in the register; the register is the sole slug authority. |
+| `firstName` / `lastName` / `maidenName` | first/last yes | string | name parts. |
+| `birthYear` / `deathYear` | no | int or null | year only (tree granularity). |
+| `birthPlace` / `deathPlace` | no | string or null | from the register. |
+| `generation` | no | int or null | parsed from `G n`. |
+| `notes` | no | string or null | leftover Bemerkung text after relationship extraction. |
+| `familyMember` | yes | bool | always true for tree persons. |
+
+A top-level `generated_at` is pinned to a fixed timestamp (`2020-01-01T00:00:00`) for
+reproducibility (NFR-IDEM-01), not a wall-clock value. `relationships[]` carry `SPOUSE_OF`
+and `PARENT_OF` edges keyed by `rowId`; `unresolved[]` lists relationship strings that did
+not match a tree person.
+
 ---
 
 ## 7. Prioritized Backlog (MoSCoW)
@@ -339,7 +370,7 @@ DB schema.
 | ID | Question | Why it matters | Ref | Resolution |
 | --- | --- | --- | --- | --- |
 | OQ-01 ✅ | Season/holiday → date. | Accuracy of ~70 SEASON/feast rows. | REQ-DATE-06 | **Resolved (2026-05-25):** movable feasts (Ostern, Pfingsten, Himmelfahrt, Advent, …) **computed per year from Easter — never a fixed month**; fixed feasts looked up (Weihnachten=12-25, Neujahr=01-01, …); seasons = mid-season month (Frühling=Apr, Sommer=Jul, Herbst=Oct, Winter=Jan). |
-| OQ-02 ✅ | Date ranges: start only, or start+end? | Sorting/display of ~315 range values. | REQ-DATE-02 | **Confirmed:** store **start** in `date_iso`, precision `RANGE`, full text in `date_raw`. |
+| OQ-02 ✅ | Date ranges: start only, or start+end? | Sorting/display of ~315 range values. | REQ-DATE-02, REQ-DATE-07 | **Confirmed (updated #670):** store **start** in `date_iso`, precision `RANGE`, full text in `date_raw`, **and the resolved end day in `date_end`** for intra-month day ranges. A half-resolved range (start parsed, end impossible) keeps `date_end` empty and is flagged `range_end_unparsed`. |
 | OQ-03 ✅ | `person_id` format. | Stability across re-runs; diffability. | §6 | **Confirmed:** readable slug `lastname-firstname`, numeric suffix on collision. |
 | OQ-04 ✅ | `x`-suffix row handling. | 42 rows. | REQ-TRIAGE-03 | **Resolved (2026-05-25):** `x` rows are transcriptions of the base letter but not yet mappable → **skip this pass**, log to `review/skipped-x-suffix.csv` for later linking. |
 | OQ-05 ✅ | Importer output format. | Phase-2 reader. | B11 | **Confirmed:** `.xlsx` (openpyxl-native, headered). |
diff --git a/tools/import-normalizer/.gitignore b/tools/import-normalizer/.gitignore
index 426c6709..1907040b 100644
--- a/tools/import-normalizer/.gitignore
+++ b/tools/import-normalizer/.gitignore
@@ -1,6 +1,7 @@
 .venv/
-out/
+out/*
 !out/canonical-persons-tree.json
+!out/*.xlsx
 review/
 __pycache__/
 *.pyc
diff --git a/tools/import-normalizer/dates.py b/tools/import-normalizer/dates.py
index 77245680..907178b2 100644
--- a/tools/import-normalizer/dates.py
+++ b/tools/import-normalizer/dates.py
@@ -66,6 +66,24 @@ class ParsedDate:
     iso: str | None
     precision: Precision
     raw: str
+    end: str | None = None  # RANGE end day; None for every non-RANGE precision
+    # True only for a half-resolved RANGE: the start parsed but the end did not, so
+    # the end was dropped and the row should surface in review (#670, Gap 2).
+    needs_review: bool = False
+
+
+@dataclass(frozen=True)
+class MatchResult:
+    """Uniform return shape for every _match_* matcher.
+
+    A matcher returns None when it does not match, or a MatchResult when it does.
+    `end` is the RANGE end day (None for every non-RANGE precision); `needs_review`
+    is True only for a half-resolved RANGE whose start parsed but end did not.
+    """
+    iso: str
+    precision: Precision
+    end: str | None = None
+    needs_review: bool = False
 
 
 _LEADING_MARKERS = re.compile(
@@ -97,7 +115,7 @@ def _match_iso(s):
     if re.fullmatch(r"\d{4}-\d{2}-\d{2}", s):
         try:
             datetime.date.fromisoformat(s)
-            return s, Precision.DAY
+            return MatchResult(s, Precision.DAY)
         except ValueError:
             return None
     return None
@@ -112,7 +130,7 @@ def _match_numeric(s):
     if year is None or not (1 <= month <= 12):
         return None
     try:
-        return datetime.date(year, month, day).isoformat(), Precision.DAY
+        return MatchResult(datetime.date(year, month, day).isoformat(), Precision.DAY)
     except ValueError:
         return None
 
@@ -130,7 +148,7 @@ def _match_roman(s):
     if not month or year is None:
         return None
     try:
-        return datetime.date(year, month, day).isoformat(), Precision.DAY
+        return MatchResult(datetime.date(year, month, day).isoformat(), Precision.DAY)
     except ValueError:
         return None
 
@@ -146,7 +164,7 @@ def _build_day_month_year(day, month, year):
     if not month or year is None or not (1 <= month <= 12):
         return None
     try:
-        return datetime.date(year, month, day).isoformat(), Precision.DAY
+        return MatchResult(datetime.date(year, month, day).isoformat(), Precision.DAY)
     except ValueError:
         return None
 
@@ -188,7 +206,7 @@ def _match_month_year(s):
     year = expand_year(m.group(2))
     if not month or year is None:
         return None
-    return datetime.date(year, month, 1).isoformat(), Precision.MONTH
+    return MatchResult(datetime.date(year, month, 1).isoformat(), Precision.MONTH)
 
 
 def _match_feast_season(s):
@@ -198,33 +216,44 @@ def _match_feast_season(s):
     year = expand_year(m.group(2))
     if year is None:
         return None
-    return resolve_feast_or_season(m.group(1), year)
+    resolved = resolve_feast_or_season(m.group(1), year)
+    if resolved is None:
+        return None
+    iso, precision = resolved
+    return MatchResult(iso, precision)
 
 
 def _match_year_only(s):
     if _YEAR_ONLY_RE.fullmatch(s):
-        return datetime.date(int(s), 1, 1).isoformat(), Precision.YEAR
+        return MatchResult(datetime.date(int(s), 1, 1).isoformat(), Precision.YEAR)
     return None
 
 
 def _match_range(s):
     m = _RANGE_YY_RE.fullmatch(s)
     if m:
-        return datetime.date(int(m.group(1)), 1, 1).isoformat(), Precision.RANGE
+        return MatchResult(datetime.date(int(m.group(1)), 1, 1).isoformat(), Precision.RANGE)
     m = _RANGE_DAY_RE.fullmatch(s)
     if m:
-        first = f"{m.group(1)}.{m.group(3)}"  # "7." + "Sept.1923" -> "7.Sept.1923"
-        for matcher in (_match_numeric, _match_monthname_a):
-            r = matcher(first)
-            if r:
-                return r[0], Precision.RANGE
+        day_start, day_end, rest = m.group(1), m.group(2), m.group(3)
+        # "10." + "1.1917" -> "10.1.1917"; resolve start and end day against the shared month/year
+        for matcher in (_match_numeric, _match_roman, _match_monthname_a):
+            start = matcher(f"{day_start}.{rest}")
+            if start:
+                end = matcher(f"{day_end}.{rest}")
+                # Half-resolved range (start parsed, end did not — e.g. the impossible
+                # end day in "10./40.1.1917"): keep the start and RANGE precision, drop
+                # the end, and flag needs_review so the dropped end surfaces (#670, Gap 2).
+                return MatchResult(start.iso, Precision.RANGE,
+                                   end.iso if end else None,
+                                   needs_review=end is None)
     m = _RANGE_HYPHEN_RE.fullmatch(s)
     if m:
         start = m.group(1).strip()
         for matcher in (_match_numeric, _match_roman, _match_monthname_a, _match_year_only):
             r = matcher(start)
             if r:
-                return r[0], Precision.RANGE
+                return MatchResult(r.iso, Precision.RANGE)
     return None
 
 
@@ -253,10 +282,8 @@ def parse_date(raw: str, date_overrides: dict | None = None) -> ParsedDate:
     for matcher in _MATCHERS:
         result = matcher(cleaned)
         if result:
-            iso, precision = result
-            if approx:
-                precision = Precision.APPROX
-            return ParsedDate(iso, precision, raw)
+            precision = Precision.APPROX if approx else result.precision
+            return ParsedDate(result.iso, precision, raw, result.end, result.needs_review)
     return ParsedDate(None, Precision.UNKNOWN, raw)
 
 
diff --git a/tools/import-normalizer/documents.py b/tools/import-normalizer/documents.py
index 3ebac821..94381acf 100644
--- a/tools/import-normalizer/documents.py
+++ b/tools/import-normalizer/documents.py
@@ -31,6 +31,7 @@ class RawRow:
 @dataclass
 class CanonicalDocument:
     index: str
+    file: str = ""
     box: str = ""
     folder: str = ""
     sender_person_id: str = ""
@@ -40,6 +41,7 @@ class CanonicalDocument:
     date_iso: str = ""
     date_raw: str = ""
     date_precision: str = ""
+    date_end: str = ""
     location: str = ""
     tags: list = field(default_factory=list)
     summary: str = ""
@@ -105,15 +107,18 @@ def to_canonical(raw, ctx, date_overrides: dict, approved_themes: frozenset = fr
 
     if raw.date.strip() and pd.precision == _dates.Precision.UNKNOWN:
         flags.append("unparsed_date")
+    if pd.needs_review:
+        flags.append("range_end_unparsed")
     if index_file_mismatch(raw.index, raw.file):
         flags.append("index_file_mismatch")
     return CanonicalDocument(
-        index=raw.index, box=raw.box, folder=raw.folder,
+        index=raw.index, file=raw.file, box=raw.box, folder=raw.folder,
         sender_person_id=sender_id, sender_name=sender_name,
         receiver_person_ids=[r[0] for r in receivers],
         receiver_names=[r[1] for r in receivers],
         date_iso=pd.iso or "", date_raw=raw.date,
         date_precision=str(pd.precision),
+        date_end=pd.end or "",
         location=raw.location, tags=_tags.generate_tags(raw.tags, raw.summary, approved_themes),
         summary=raw.summary, source_row=raw.source_row, needs_review=flags,
     )
diff --git a/tools/import-normalizer/out/canonical-documents.xlsx b/tools/import-normalizer/out/canonical-documents.xlsx
new file mode 100644
index 00000000..e0731299
Binary files /dev/null and b/tools/import-normalizer/out/canonical-documents.xlsx differ
diff --git a/tools/import-normalizer/out/canonical-persons-tree.json b/tools/import-normalizer/out/canonical-persons-tree.json
index 663f0b9f..8a5acc57 100644
--- a/tools/import-normalizer/out/canonical-persons-tree.json
+++ b/tools/import-normalizer/out/canonical-persons-tree.json
@@ -1,5 +1,5 @@
 {
-  "generated_at": "2026-05-25T21:18:00.241406",
+  "generated_at": "2020-01-01T00:00:00",
   "source": "Personendatei 2.xlsx",
   "stats": {
     "persons": 157,
@@ -19,7 +19,8 @@
       "birthPlace": "Garz",
       "deathPlace": "Espelkamp",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "allemeyer-elsgard"
     },
     {
       "rowId": "row_003",
@@ -33,7 +34,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "allemeyer-werner"
     },
     {
       "rowId": "row_004",
@@ -47,7 +49,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "allemeyer-juergen"
     },
     {
       "rowId": "row_005",
@@ -61,7 +64,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "allemeyer-jutta"
     },
     {
       "rowId": "row_006",
@@ -75,7 +79,8 @@
       "birthPlace": "Bünde,Westfalen",
       "deathPlace": "Berlin",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "bertkau-hanna"
     },
     {
       "rowId": "row_007",
@@ -89,7 +94,8 @@
       "birthPlace": "Schülperneusiel",
       "deathPlace": "Göteborg",
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "blomquist-charlotte"
     },
     {
       "rowId": "row_008",
@@ -103,7 +109,8 @@
       "birthPlace": "Göteborg",
       "deathPlace": "Haga",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "blomquist-karl-erhard"
     },
     {
       "rowId": "row_009",
@@ -117,7 +124,8 @@
       "birthPlace": "Mexiko",
       "deathPlace": "Bohrmann",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "bohrmann-else"
     },
     {
       "rowId": "row_010",
@@ -131,7 +139,8 @@
       "birthPlace": "Mannheim",
       "deathPlace": "Heidelberg",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "bohrmann-ludwig"
     },
     {
       "rowId": "row_011",
@@ -145,7 +154,8 @@
       "birthPlace": "Karlsruhe",
       "deathPlace": "Kassel",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "bohrmann-kurt"
     },
     {
       "rowId": "row_012",
@@ -159,7 +169,8 @@
       "birthPlace": null,
       "deathPlace": "Kassel",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "bohrmann-ruth"
     },
     {
       "rowId": "row_013",
@@ -173,7 +184,8 @@
       "birthPlace": "Mainz",
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "braun-ruth"
     },
     {
       "rowId": "row_014",
@@ -187,7 +199,8 @@
       "birthPlace": "Berlin",
       "deathPlace": "Düsseldorf",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "burkhard-meier-ellen"
     },
     {
       "rowId": "row_015",
@@ -201,7 +214,8 @@
       "birthPlace": "Berlin",
       "deathPlace": "Aachen",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-alli"
     },
     {
       "rowId": "row_016",
@@ -215,7 +229,8 @@
       "birthPlace": "Schleswig Holstein",
       "deathPlace": "Monterrey, Mexiko",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-alma"
     },
     {
       "rowId": "row_017",
@@ -229,7 +244,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-berit"
     },
     {
       "rowId": "row_018",
@@ -243,7 +259,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-bjoern"
     },
     {
       "rowId": "row_019",
@@ -257,7 +274,8 @@
       "birthPlace": "Ruhrort",
       "deathPlace": "Berlin",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-clara"
     },
     {
       "rowId": "row_020",
@@ -271,7 +289,8 @@
       "birthPlace": "Essen",
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-doris"
     },
     {
       "rowId": "row_021",
@@ -285,7 +304,8 @@
       "birthPlace": "Berlin",
       "deathPlace": "Berlin",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-ella-anita"
     },
     {
       "rowId": "row_022",
@@ -299,7 +319,8 @@
       "birthPlace": "Berlin",
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-elsbeth"
     },
     {
       "rowId": "row_023",
@@ -313,7 +334,8 @@
       "birthPlace": "Vogtland",
       "deathPlace": "Federal Way",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-erna"
     },
     {
       "rowId": "row_024",
@@ -327,7 +349,8 @@
       "birthPlace": "Mexiko DF",
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-franziska"
     },
     {
       "rowId": "row_025",
@@ -341,7 +364,8 @@
       "birthPlace": null,
       "deathPlace": "Berlin",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-gisela"
     },
     {
       "rowId": "row_026",
@@ -355,7 +379,8 @@
       "birthPlace": "Mexiko",
       "deathPlace": "Monterrey, Mexiko",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-hans"
     },
     {
       "rowId": "row_027",
@@ -369,7 +394,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-hans-robert"
     },
     {
       "rowId": "row_028",
@@ -383,7 +409,8 @@
       "birthPlace": "Eagle Pass, Texas, USA, Texas, USA",
       "deathPlace": "Berlin",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-herbert"
     },
     {
       "rowId": "row_029",
@@ -397,7 +424,8 @@
       "birthPlace": "Burg Schwalbach",
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-ilse"
     },
     {
       "rowId": "row_030",
@@ -411,7 +439,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-jens"
     },
     {
       "rowId": "row_031",
@@ -425,7 +454,8 @@
       "birthPlace": "Hamburg",
       "deathPlace": "Monterrey, Mexiko",
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-john-james-juan"
     },
     {
       "rowId": "row_032",
@@ -439,7 +469,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-jutta"
     },
     {
       "rowId": "row_033",
@@ -453,7 +484,8 @@
       "birthPlace": "Eagle Pass, Texas, USA, Texas, USA",
       "deathPlace": "an der Marne",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-kurt"
     },
     {
       "rowId": "row_034",
@@ -467,7 +499,8 @@
       "birthPlace": "Berlin",
       "deathPlace": "Berlin",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-kurt-georg"
     },
     {
       "rowId": "row_035",
@@ -481,7 +514,8 @@
       "birthPlace": "Schleswig Holstein",
       "deathPlace": "Monterrey, Mexiko",
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-marie"
     },
     {
       "rowId": "row_036",
@@ -495,7 +529,8 @@
       "birthPlace": "Berlin",
       "deathPlace": "Berlin",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-margret"
     },
     {
       "rowId": "row_037",
@@ -509,7 +544,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-martin"
     },
     {
       "rowId": "row_038",
@@ -523,7 +559,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-meike"
     },
     {
       "rowId": "row_039",
@@ -537,7 +574,8 @@
       "birthPlace": "Aachen",
       "deathPlace": "Essen",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-otto-herbert"
     },
     {
       "rowId": "row_040",
@@ -551,7 +589,8 @@
       "birthPlace": "Texas",
       "deathPlace": "Tenafly",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-ralph"
     },
     {
       "rowId": "row_041",
@@ -565,7 +604,8 @@
       "birthPlace": "Aachen",
       "deathPlace": "Aachen",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-ruth"
     },
     {
       "rowId": "row_042",
@@ -579,7 +619,8 @@
       "birthPlace": "Texas",
       "deathPlace": "Aachen",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-walter-sen"
     },
     {
       "rowId": "row_043",
@@ -593,7 +634,8 @@
       "birthPlace": "Berlin",
       "deathPlace": "Mexiko",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-walter-john"
     },
     {
       "rowId": "row_044",
@@ -607,7 +649,8 @@
       "birthPlace": "Essen",
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-walter-otto"
     },
     {
       "rowId": "row_045",
@@ -621,7 +664,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-heydrich-ingrid"
     },
     {
       "rowId": "row_046",
@@ -635,7 +679,8 @@
       "birthPlace": "Morelia, Mexiko",
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-silke"
     },
     {
       "rowId": "row_047",
@@ -649,7 +694,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-thomas"
     },
     {
       "rowId": "row_048",
@@ -663,7 +709,8 @@
       "birthPlace": "Morelia, Mexiko",
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-walter"
     },
     {
       "rowId": "row_049",
@@ -677,7 +724,8 @@
       "birthPlace": "Tuxpan, Mexiko",
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-heydrich-kurt"
     },
     {
       "rowId": "row_050",
@@ -691,7 +739,8 @@
       "birthPlace": "Monterrey, Mexiko",
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-schmolke-sabina"
     },
     {
       "rowId": "row_051",
@@ -705,7 +754,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-schmolke-carolina"
     },
     {
       "rowId": "row_052",
@@ -719,7 +769,8 @@
       "birthPlace": "Aachen",
       "deathPlace": "Aachen",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-heinemann-rosemarie"
     },
     {
       "rowId": "row_053",
@@ -733,7 +784,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-gonzales-verena"
     },
     {
       "rowId": "row_054",
@@ -747,7 +799,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-gonzales-simona"
     },
     {
       "rowId": "row_055",
@@ -761,7 +814,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "cram-rodriguez-catharina"
     },
     {
       "rowId": "row_056",
@@ -775,7 +829,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "crisolli-karl-august"
     },
     {
       "rowId": "row_057",
@@ -789,7 +844,8 @@
       "birthPlace": "Berlin",
       "deathPlace": "Schweiz",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "crisolli-moelle-rudolf-walter"
     },
     {
       "rowId": "row_058",
@@ -803,7 +859,8 @@
       "birthPlace": null,
       "deathPlace": "Ruhrort",
       "generation": 0,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-albert"
     },
     {
       "rowId": "row_059",
@@ -817,7 +874,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-brigitte"
     },
     {
       "rowId": "row_060",
@@ -831,7 +889,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 0,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-clara"
     },
     {
       "rowId": "row_061",
@@ -845,7 +904,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 0,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-emilie"
     },
     {
       "rowId": "row_062",
@@ -859,7 +919,8 @@
       "birthPlace": "Hückeswagen",
       "deathPlace": "Berlin",
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-eugenie"
     },
     {
       "rowId": "row_063",
@@ -873,7 +934,8 @@
       "birthPlace": "Ruhrort",
       "deathPlace": "Frankreich",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-georg"
     },
     {
       "rowId": "row_064",
@@ -887,7 +949,8 @@
       "birthPlace": "Ruhrort",
       "deathPlace": "Verdun",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-hans"
     },
     {
       "rowId": "row_065",
@@ -901,7 +964,8 @@
       "birthPlace": null,
       "deathPlace": "Heidelberg",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-hilde"
     },
     {
       "rowId": "row_066",
@@ -915,7 +979,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-marie-elisabeth"
     },
     {
       "rowId": "row_067",
@@ -929,7 +994,8 @@
       "birthPlace": "Ruhrort",
       "deathPlace": "Berlin",
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-paul"
     },
     {
       "rowId": "row_068",
@@ -943,7 +1009,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-paul-friedrich"
     },
     {
       "rowId": "row_069",
@@ -957,7 +1024,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-paul-otto"
     },
     {
       "rowId": "row_070",
@@ -971,7 +1039,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-ursula"
     },
     {
       "rowId": "row_071",
@@ -985,7 +1054,8 @@
       "birthPlace": "Ruhrort",
       "deathPlace": "Berlin",
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-walter"
     },
     {
       "rowId": "row_072",
@@ -999,7 +1069,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 0,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "de-gruyter-julius"
     },
     {
       "rowId": "row_073",
@@ -1013,7 +1084,8 @@
       "birthPlace": "Berlin",
       "deathPlace": "Leipzig",
       "generation": 0,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "delbrueck-berta-tante-tueten"
     },
     {
       "rowId": "row_074",
@@ -1027,7 +1099,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "dieckmann-ella"
     },
     {
       "rowId": "row_075",
@@ -1041,7 +1114,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "duncker-dolores-dodo"
     },
     {
       "rowId": "row_076",
@@ -1055,7 +1129,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "duncker-max"
     },
     {
       "rowId": "row_077",
@@ -1069,7 +1144,8 @@
       "birthPlace": null,
       "deathPlace": "Mannheim",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "duerr-felix"
     },
     {
       "rowId": "row_078",
@@ -1083,7 +1159,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "duerr-felix-sen"
     },
     {
       "rowId": "row_079",
@@ -1097,7 +1174,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "duerr-herta"
     },
     {
       "rowId": "row_080",
@@ -1111,7 +1189,8 @@
       "birthPlace": null,
       "deathPlace": "Bad Homburg",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "duvenbeck-bernhard"
     },
     {
       "rowId": "row_081",
@@ -1125,7 +1204,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "duvenbeck-birgitta"
     },
     {
       "rowId": "row_082",
@@ -1139,7 +1219,8 @@
       "birthPlace": "Heidelberg",
       "deathPlace": "Bad Homburg",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "duvenbeck-lili"
     },
     {
       "rowId": "row_083",
@@ -1153,7 +1234,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "epping-else"
     },
     {
       "rowId": "row_084",
@@ -1167,7 +1249,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "faerber-editha"
     },
     {
       "rowId": "row_085",
@@ -1181,7 +1264,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "gaedeke-gudula"
     },
     {
       "rowId": "row_086",
@@ -1195,7 +1279,8 @@
       "birthPlace": "Tuxpan, Mexiko",
       "deathPlace": null,
       "generation": 4,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "gomez-cram-susana"
     },
     {
       "rowId": "row_087",
@@ -1209,7 +1294,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "gomez-cram-arturo-jun"
     },
     {
       "rowId": "row_088",
@@ -1223,7 +1309,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "gomez-cram-roberto"
     },
     {
       "rowId": "row_089",
@@ -1237,7 +1324,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 5,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "gomez-cram-ingrid-jun"
     },
     {
       "rowId": "row_090",
@@ -1251,7 +1339,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "gruber-gertrud-tante-tutu"
     },
     {
       "rowId": "row_091",
@@ -1265,7 +1354,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "gruber-wolfgang"
     },
     {
       "rowId": "row_092",
@@ -1279,7 +1369,8 @@
       "birthPlace": null,
       "deathPlace": "Berlin",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "hafner-erdmuthe"
     },
     {
       "rowId": "row_093",
@@ -1293,7 +1384,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "heydrich-gertrud"
     },
     {
       "rowId": "row_094",
@@ -1307,7 +1399,8 @@
       "birthPlace": "Berlin",
       "deathPlace": "Denver, Colorado, USA",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "heydrich-heider"
     },
     {
       "rowId": "row_095",
@@ -1321,7 +1414,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "heydrich-peter"
     },
     {
       "rowId": "row_096",
@@ -1335,7 +1429,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "heydrich-dieter"
     },
     {
       "rowId": "row_097",
@@ -1349,7 +1444,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 0,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "kisker-clara"
     },
     {
       "rowId": "row_098",
@@ -1363,7 +1459,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 0,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "kisker-alexander-lippstadt"
     },
     {
       "rowId": "row_099",
@@ -1377,7 +1474,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "kracker-v-schwartzenf-ingrid"
     },
     {
       "rowId": "row_100",
@@ -1391,7 +1489,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "kuehne-margarete"
     },
     {
       "rowId": "row_101",
@@ -1405,7 +1504,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 0,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "liebrecht-emilie"
     },
     {
       "rowId": "row_102",
@@ -1419,7 +1519,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "linser-elsbeth"
     },
     {
       "rowId": "row_103",
@@ -1433,7 +1534,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "martius-annemarie"
     },
     {
       "rowId": "row_104",
@@ -1447,7 +1549,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "meier-burkhardt"
     },
     {
       "rowId": "row_105",
@@ -1461,7 +1564,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "meier-michael"
     },
     {
       "rowId": "row_106",
@@ -1475,7 +1579,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "moeller-herta"
     },
     {
       "rowId": "row_107",
@@ -1489,7 +1594,8 @@
       "birthPlace": "Hückeswagen",
       "deathPlace": "Hückeswagen",
       "generation": 0,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "mueller-reinhard"
     },
     {
       "rowId": "row_108",
@@ -1503,7 +1609,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 0,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "mueller-carl"
     },
     {
       "rowId": "row_109",
@@ -1517,7 +1624,8 @@
       "birthPlace": "Elberfeld",
       "deathPlace": "Hückeswagen",
       "generation": 0,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "mueller-eugenie"
     },
     {
       "rowId": "row_110",
@@ -1531,7 +1639,8 @@
       "birthPlace": "Bielefeld",
       "deathPlace": "Königstein",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "ober-hermann"
     },
     {
       "rowId": "row_111",
@@ -1545,7 +1654,8 @@
       "birthPlace": "Garz",
       "deathPlace": "Bad Soden",
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "ober-inge"
     },
     {
       "rowId": "row_112",
@@ -1559,7 +1669,8 @@
       "birthPlace": "Hamburg",
       "deathPlace": "Hanmburg",
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "quast-mary"
     },
     {
       "rowId": "row_113",
@@ -1573,7 +1684,8 @@
       "birthPlace": "Hamburg",
       "deathPlace": "Hamburg",
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "quast-emil"
     },
     {
       "rowId": "row_114",
@@ -1587,7 +1699,8 @@
       "birthPlace": "Hamburg",
       "deathPlace": "Hamburg",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "quast-richard"
     },
     {
       "rowId": "row_115",
@@ -1601,7 +1714,8 @@
       "birthPlace": null,
       "deathPlace": "Hausschneiderin in H 14",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "pietzsch-hilde"
     },
     {
       "rowId": "row_116",
@@ -1615,7 +1729,8 @@
       "birthPlace": null,
       "deathPlace": "Überführung v Hans u Geo d Gr aus Frankreich",
       "generation": 2,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "rammelt-sophie-u-walter"
     },
     {
       "rowId": "row_117",
@@ -1629,7 +1744,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "rammelt-peter"
     },
     {
       "rowId": "row_118",
@@ -1643,7 +1759,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "roehr-schefold-harald-bimchen"
     },
     {
       "rowId": "row_119",
@@ -1657,7 +1774,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 3,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "ross-marlise-marie-luise"
     },
     {
       "rowId": "row_120",
@@ -1671,7 +1789,8 @@
       "birthPlace": "Schülperneuensiel",
       "deathPlace": "Göteborg",
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "ruge-charlotte"
     },
     {
       "rowId": "row_121",
@@ -1685,7 +1804,8 @@
       "birthPlace": "Altona",
       "deathPlace": "Monterrey, Mexiko",
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "ruge-emma"
     },
     {
       "rowId": "row_122",
@@ -1699,7 +1819,8 @@
       "birthPlace": null,
       "deathPlace": null,
       "generation": 1,
-      "familyMember": true
+      "familyMember": true,
+      "personId": "ruhfus-clara"
     },
     {
       "rowId": "row_123",
@@ -1713,7 +1834,8 @@
       "birthPlace": null,
       "deathPlace": null,
"generation": 2, - "familyMember": true + "familyMember": true, + "personId": "ruhfus-fritz" }, { "rowId": "row_124", @@ -1727,7 +1849,8 @@ "birthPlace": null, "deathPlace": null, "generation": 2, - "familyMember": true + "familyMember": true, + "personId": "ruhfus-heinz" }, { "rowId": "row_125", @@ -1741,7 +1864,8 @@ "birthPlace": null, "deathPlace": null, "generation": 0, - "familyMember": true + "familyMember": true, + "personId": "schroeder-bertha" }, { "rowId": "row_126", @@ -1755,7 +1879,8 @@ "birthPlace": null, "deathPlace": null, "generation": 1, - "familyMember": true + "familyMember": true, + "personId": "schroeder-emil-lennep" }, { "rowId": "row_127", @@ -1769,7 +1894,8 @@ "birthPlace": "Mainz", "deathPlace": null, "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "schuetz-christa-1" }, { "rowId": "row_128", @@ -1783,7 +1909,8 @@ "birthPlace": "Berlin", "deathPlace": "Lüneburg", "generation": 3, - "familyMember": true + "familyMember": true, + "personId": "seils-clara-eugenie-1" }, { "rowId": "row_129", @@ -1797,7 +1924,8 @@ "birthPlace": "Hamburg", "deathPlace": null, "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "seils-christoph-1" }, { "rowId": "row_130", @@ -1811,7 +1939,8 @@ "birthPlace": "Hamburg", "deathPlace": null, "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "seils-dorothee-1" }, { "rowId": "row_131", @@ -1825,7 +1954,8 @@ "birthPlace": "Stade", "deathPlace": null, "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "seils-gabriele-1" }, { "rowId": "row_132", @@ -1839,7 +1969,8 @@ "birthPlace": null, "deathPlace": "Berlin", "generation": 3, - "familyMember": true + "familyMember": true, + "personId": "seils-peter-ernst-albert-1" }, { "rowId": "row_133", @@ -1853,7 +1984,8 @@ "birthPlace": "Altona", "deathPlace": "Monterrey, Mexiko", "generation": 1, - "familyMember": true + "familyMember": true, + "personId": "schefold-emma" }, 
{ "rowId": "row_134", @@ -1867,7 +1999,8 @@ "birthPlace": "Pforzheim", "deathPlace": "Monterrey, Mexiko", "generation": 1, - "familyMember": true + "familyMember": true, + "personId": "schefold-adolf" }, { "rowId": "row_135", @@ -1881,7 +2014,8 @@ "birthPlace": "Monterrey, Mexiko", "deathPlace": "Hannover", "generation": 2, - "familyMember": true + "familyMember": true, + "personId": "schefold-erich" }, { "rowId": "row_136", @@ -1895,7 +2029,8 @@ "birthPlace": "Monterrey, Mexiko", "deathPlace": "Monterrey, Mexiko", "generation": 2, - "familyMember": true + "familyMember": true, + "personId": "schefold-mieze-maria" }, { "rowId": "row_137", @@ -1909,7 +2044,8 @@ "birthPlace": "Monterrey, Mexiko", "deathPlace": "Mexiko", "generation": 2, - "familyMember": true + "familyMember": true, + "personId": "schefold-willy" }, { "rowId": "row_144", @@ -1923,7 +2059,8 @@ "birthPlace": "Berlin", "deathPlace": "Würzburg", "generation": 3, - "familyMember": true + "familyMember": true, + "personId": "siebert-hannemarie-sen" }, { "rowId": "row_145", @@ -1937,7 +2074,8 @@ "birthPlace": "Mainz", "deathPlace": "Berlin", "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "siebert-georg" }, { "rowId": "row_146", @@ -1951,7 +2089,8 @@ "birthPlace": "Mainz", "deathPlace": null, "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "siebert-hannemarie-jun" }, { "rowId": "row_147", @@ -1965,7 +2104,8 @@ "birthPlace": "Mainz", "deathPlace": null, "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "siebert-john-walter" }, { "rowId": "row_148", @@ -1979,7 +2119,8 @@ "birthPlace": "Mainz", "deathPlace": null, "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "siebert-juergen" }, { "rowId": "row_149", @@ -1993,7 +2134,8 @@ "birthPlace": "Mainz", "deathPlace": "Schwäbisch Hall", "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "siebert-konrad" }, { "rowId": 
"row_150", @@ -2007,7 +2149,8 @@ "birthPlace": "Magdeburg", "deathPlace": "Berlin", "generation": 2, - "familyMember": true + "familyMember": true, + "personId": "siebert-magdalena-leni" }, { "rowId": "row_151", @@ -2021,7 +2164,8 @@ "birthPlace": "Mainz", "deathPlace": "Berlin", "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "siebert-margret" }, { "rowId": "row_152", @@ -2035,7 +2179,8 @@ "birthPlace": "Berlin", "deathPlace": "Grünstadt", "generation": 3, - "familyMember": true + "familyMember": true, + "personId": "siebert-guenther" }, { "rowId": "row_153", @@ -2049,7 +2194,8 @@ "birthPlace": "Mainz", "deathPlace": null, "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "siebert-rudolf" }, { "rowId": "row_154", @@ -2063,7 +2209,8 @@ "birthPlace": "Mainz", "deathPlace": null, "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "siebert-spissmann-karola" }, { "rowId": "row_155", @@ -2077,7 +2224,8 @@ "birthPlace": "Karlsruhe", "deathPlace": "Heidelberg", "generation": 3, - "familyMember": true + "familyMember": true, + "personId": "thiel-helga" }, { "rowId": "row_156", @@ -2091,7 +2239,8 @@ "birthPlace": null, "deathPlace": null, "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "thiel-baerbel" }, { "rowId": "row_157", @@ -2105,7 +2254,8 @@ "birthPlace": null, "deathPlace": null, "generation": 4, - "familyMember": true + "familyMember": true, + "personId": "tran-renate" }, { "rowId": "row_158", @@ -2119,7 +2269,8 @@ "birthPlace": null, "deathPlace": null, "generation": 2, - "familyMember": true + "familyMember": true, + "personId": "von-blumenthal-ilse" }, { "rowId": "row_159", @@ -2133,7 +2284,8 @@ "birthPlace": null, "deathPlace": null, "generation": 1, - "familyMember": true + "familyMember": true, + "personId": "weinlig-milly" }, { "rowId": "row_160", @@ -2147,7 +2299,8 @@ "birthPlace": null, "deathPlace": "Lektorat der Geisteswissenschaften, 
besondere Persönlichkeit", "generation": 2, - "familyMember": true + "familyMember": true, + "personId": "wenzel-prof-heinz" }, { "rowId": "row_161", @@ -2161,7 +2314,8 @@ "birthPlace": null, "deathPlace": null, "generation": 0, - "familyMember": true + "familyMember": true, + "personId": "wiehager-helene" }, { "rowId": "row_162", @@ -2175,7 +2329,8 @@ "birthPlace": "Mexiko", "deathPlace": "Garz", "generation": 2, - "familyMember": true + "familyMember": true, + "personId": "woehler-anita" }, { "rowId": "row_163", @@ -2189,7 +2344,8 @@ "birthPlace": null, "deathPlace": "Garz", "generation": 2, - "familyMember": true + "familyMember": true, + "personId": "woehler-oskar" }, { "rowId": "row_164", @@ -2203,7 +2359,8 @@ "birthPlace": null, "deathPlace": null, "generation": 2, - "familyMember": true + "familyMember": true, + "personId": "wittkopp-hans" } ], "relationships": [ diff --git a/tools/import-normalizer/out/canonical-persons.xlsx b/tools/import-normalizer/out/canonical-persons.xlsx new file mode 100644 index 00000000..cdefc3f5 Binary files /dev/null and b/tools/import-normalizer/out/canonical-persons.xlsx differ diff --git a/tools/import-normalizer/out/canonical-tag-tree.xlsx b/tools/import-normalizer/out/canonical-tag-tree.xlsx new file mode 100644 index 00000000..6933897d Binary files /dev/null and b/tools/import-normalizer/out/canonical-tag-tree.xlsx differ diff --git a/tools/import-normalizer/persons_tree.py b/tools/import-normalizer/persons_tree.py index e2d92d6b..5c18897c 100644 --- a/tools/import-normalizer/persons_tree.py +++ b/tools/import-normalizer/persons_tree.py @@ -8,9 +8,14 @@ from pathlib import Path import config import dates +import persons as _persons from persons import _strip_accents +# Pinned so the committed tree JSON is reproducible and does not churn on every run +# (NFR-IDEM-01) — mirrors writers._FIXED_TS for the xlsx exports. 
+_GENERATED_AT = "2020-01-01T00:00:00" + _MIN_YEAR = 1700 _MAX_YEAR = 2100 # Threshold: if parse_date parses a pure-digit string as a year outside [_MIN_YEAR, _MAX_YEAR], @@ -175,6 +180,29 @@ def _parse_row(row_num: int, fields: dict) -> dict: } +def _attach_person_ids(tree_persons: list[dict], raw_dicts: list[dict]) -> None: + """Attach the register's verbatim person_id to each tree person, in place. + + The register (persons.parse_register) is the sole authority for person_id; it + slugifies and suffixes colliding ids exactly once. We propagate that id rather + than re-slugify in the tree, because re-slugifying would not reproduce the + register's collision suffixes and so would not reconcile 1:1 with the register + (#670, Gap 3). + + tree_persons and raw_dicts must be the same length and in the same row order — + parse_register and _parse_row both keep exactly the rows that have a last name. + """ + register = _persons.parse_register(raw_dicts) + if len(tree_persons) != len(register): + raise ValueError( + "person_id propagation requires equal length: " + f"{len(tree_persons)} tree persons vs {len(register)} register persons " + "(the positional zip would otherwise silently truncate and mis-join ids)" + ) + for tree_person, register_person in zip(tree_persons, register): + tree_person["personId"] = register_person.person_id + + def _deduplicate(persons: list[dict]) -> tuple[list[dict], list[str]]: """Remove duplicate rows. 
Two-stage: @@ -339,11 +367,17 @@ def main() -> None: # --- Pass 1: parse rows --- persons_raw: list[dict] = [] + raw_dicts: list[dict] = [] for row_num, row in enumerate(rows[1:], start=2): field_dict = {field: (row[col] if col < len(row) else "") for field, col in fields_map.items()} if not field_dict.get("last_name", "").strip(): continue persons_raw.append(_parse_row(row_num, field_dict)) + raw_dicts.append(field_dict) + + # Propagate the register's verbatim person_id before dedup so the tree reconciles 1:1 + # with canonical-persons.xlsx (#670, Gap 3). + _attach_person_ids(persons_raw, raw_dicts) persons, skipped_msgs = _deduplicate(persons_raw) for msg in skipped_msgs: @@ -387,7 +421,7 @@ def main() -> None: return output = { - "generated_at": datetime.datetime.now().isoformat(), + "generated_at": _GENERATED_AT, "source": Path(args.input).name, "stats": { "persons": len(persons), diff --git a/tools/import-normalizer/tests/test_dates.py b/tools/import-normalizer/tests/test_dates.py index 2a43ad61..2b59796a 100644 --- a/tools/import-normalizer/tests/test_dates.py +++ b/tools/import-normalizer/tests/test_dates.py @@ -2,6 +2,18 @@ import datetime import dates from dates import Precision +def test_matchers_return_uniform_matchresult(): + # Every matcher returns a MatchResult(iso, precision, end) — no 2- vs 3-tuple + # length-sniffing. A non-range matcher leaves end=None; a range matcher sets it. 
+ day = dates._match_numeric("15.2.1888") + assert isinstance(day, dates.MatchResult) + assert (day.iso, day.precision, day.end) == ("1888-02-15", Precision.DAY, None) + + rng = dates._match_range("10./11.1.1917") + assert isinstance(rng, dates.MatchResult) + assert (rng.iso, rng.precision, rng.end) == ("1917-01-10", Precision.RANGE, "1917-01-11") + + def test_easter_known_years(): # Anonymous Gregorian algorithm — verified against published tables assert dates.easter(2024) == datetime.date(2024, 3, 31) @@ -115,10 +127,55 @@ def test_parse_invalid_calendar_date_is_unknown(): assert dates.parse_date("31.4.1916").precision == Precision.UNKNOWN def test_parse_intra_month_day_range(): - # "7./8. Sept.1923" -> start day, RANGE. Must NOT be confused with slash-date "17/6. 1916". - assert dates.parse_date("7./8. Sept.1923") == dates.ParsedDate("1923-09-07", Precision.RANGE, "7./8. Sept.1923") + # "7./8. Sept.1923" -> start day, RANGE, end day 8th. Must NOT be confused with slash-date "17/6. 1916". + assert dates.parse_date("7./8. Sept.1923") == dates.ParsedDate("1923-09-07", Precision.RANGE, "7./8. Sept.1923", "1923-09-08") assert dates.parse_date("17/6. 1916") == dates.ParsedDate("1916-06-17", Precision.DAY, "17/6. 1916") +def test_parse_intra_month_day_range_carries_end_day(): + # the intra-month day range surfaces the END day so Phase 4 can render meta_date_end + r = dates.parse_date("10./11.1.1917") + assert r.iso == "1917-01-10" + assert r.precision == Precision.RANGE + assert r.end == "1917-01-11" + +def test_parse_roman_month_day_range(): + # "10./11.I.1917" — Roman-numeral-month range; previously fell through to UNKNOWN + r = dates.parse_date("10./11.I.1917") + assert r.iso == "1917-01-10" + assert r.precision == Precision.RANGE + assert r.end == "1917-01-11" + +def test_parse_range_invalid_end_keeps_start_flags_review(): + # "10./40.1.1917" — the 40th is an impossible end day. 
The start parses fine, + # so the row stays RANGE with the start preserved, the unparseable end is dropped + # (end is None), and the half-resolved range is flagged needs_review so the + # dropped end surfaces honestly instead of vanishing silently (#670, Gap 2). + r = dates.parse_date("10./40.1.1917") + assert r.iso == "1917-01-10" + assert r.precision == Precision.RANGE + assert r.end is None + assert r.needs_review is True + + +def test_parse_range_valid_end_not_flagged(): + # a fully-resolved range carries its end and is NOT flagged for review + r = dates.parse_date("10./11.1.1917") + assert r.end == "1917-01-11" + assert r.needs_review is False + + +def test_parse_non_range_has_no_review_flag(): + # every fully-parsed non-range date is never flagged for review by the date layer + assert dates.parse_date("15.2.1888").needs_review is False + assert dates.parse_date("Mai 1895").needs_review is False + assert dates.parse_date("").needs_review is False + + +def test_parse_non_range_has_no_end(): + assert dates.parse_date("15.2.1888").end is None + assert dates.parse_date("Mai 1895").end is None + assert dates.parse_date("").end is None + def test_parse_trailing_note_stripped_but_raw_preserved(): r = dates.parse_date("17.Nov 1887, 2. 
Brief") # REQ-DATE-04 assert r.iso == "1887-11-17" diff --git a/tools/import-normalizer/tests/test_documents.py b/tools/import-normalizer/tests/test_documents.py index 52f5025f..fe07f40d 100644 --- a/tools/import-normalizer/tests/test_documents.py +++ b/tools/import-normalizer/tests/test_documents.py @@ -52,8 +52,59 @@ def test_to_canonical_resolves_and_flags(): assert doc.receiver_person_ids == ["de-gruyter-eugenie"] # matched via maiden alias assert doc.date_iso == "1888-02-15" and doc.date_precision == "DAY" assert doc.tags == ["Themen/Brautbriefe"] + assert doc.file == r"..\__scan\W-0001.pdf" # file name carried through for the importer assert doc.needs_review == [] + +def test_to_canonical_carries_file_name(): + ctx = _ctx() + raw = documents.RawRow(source_row=4, index="H-0730", sender="", receivers="", + file="H-0730.pdf") + doc = documents.to_canonical(raw, ctx, date_overrides={}) + assert doc.file == "H-0730.pdf" + + +def test_to_canonical_range_carries_date_end(): + ctx = _ctx() + raw = documents.RawRow(source_row=4, index="H-0730", sender="", receivers="", + date="10./11.1.1917") + doc = documents.to_canonical(raw, ctx, date_overrides={}) + assert doc.date_iso == "1917-01-10" + assert doc.date_precision == "RANGE" + assert doc.date_end == "1917-01-11" + + +def test_to_canonical_non_range_has_empty_date_end(): + ctx = _ctx() + raw = documents.RawRow(source_row=4, index="H-0730", sender="", receivers="", + date="15.2.1888") + doc = documents.to_canonical(raw, ctx, date_overrides={}) + assert doc.date_precision == "DAY" + assert doc.date_end == "" + +def test_to_canonical_half_resolved_range_flags_review(): + # an impossible end day ("10./40.1.1917") keeps the start + RANGE precision but + # drops the unparseable end; the document must surface this as a review flag + # so the importer (#669) knows date_end is empty on a RANGE row by design. 
+ ctx = _ctx() + raw = documents.RawRow(source_row=5, index="H-0731", sender="", receivers="", + date="10./40.1.1917") + doc = documents.to_canonical(raw, ctx, date_overrides={}) + assert doc.date_iso == "1917-01-10" + assert doc.date_precision == "RANGE" + assert doc.date_end == "" + assert "range_end_unparsed" in doc.needs_review + + +def test_to_canonical_full_range_not_flagged(): + ctx = _ctx() + raw = documents.RawRow(source_row=5, index="H-0730", sender="", receivers="", + date="10./11.1.1917") + doc = documents.to_canonical(raw, ctx, date_overrides={}) + assert doc.date_end == "1917-01-11" + assert "range_end_unparsed" not in doc.needs_review + + def test_to_canonical_unmatched_and_unparsed(): ctx = _ctx() raw = documents.RawRow(source_row=9, index="C-0001", diff --git a/tools/import-normalizer/tests/test_normalize.py b/tools/import-normalizer/tests/test_normalize.py index c6638d9e..2adf2a4a 100644 --- a/tools/import-normalizer/tests/test_normalize.py +++ b/tools/import-normalizer/tests/test_normalize.py @@ -1,3 +1,8 @@ +import json +import subprocess +import sys +from pathlib import Path + import openpyxl import normalize @@ -119,3 +124,56 @@ def test_approved_themes_applied(tmp_path): tag_values = [ws.cell(row=r, column=tag_col + 1).value for r in range(2, ws.max_row + 1)] # W-0001 has Inhalt "Geschäftsreise" — should get an extra Themen/geschäftsreise tag assert any(v and "Themen/geschäftsreise" in v for v in tag_values) + + +def _person_wb_with_collision(tmp_path): + # Two "Hans Cram" rows force the register to suffix the colliding slug (-1/-2); + # the tree must carry those exact suffixed ids so the join still reconciles. 
+ wb = openpyxl.Workbook(); ws = wb.active; ws.title = "Tabelle1" + ws.append(["Generation", "Familienname", "Vorname", "geb als", "Geburtsdatum", + "Geburtsort", "Todesdatum", "Sterbeort", "verheiratet mit", "Bemerkung"]) + ws.append(["G 1", "de Gruyter", "Walter", "", "", "", "", "", "", ""]) + ws.append(["G 1", "de Gruyter", "Eugenie", "Müller", "", "", "", "", "", ""]) + ws.append(["G 2", "Cram", "Hans", "", "1890", "", "", "", "", ""]) + ws.append(["G 3", "Cram", "Hans", "", "1925", "", "", "", "", ""]) + p = tmp_path / "persons.xlsx"; wb.save(p); return p + + +def _generate_tree(person_wb, out_path): + script = Path(__file__).parent.parent / "persons_tree.py" + result = subprocess.run( + [sys.executable, str(script), "--input", str(person_wb), "--output", str(out_path)], + capture_output=True, text=True, + ) + assert result.returncode == 0, result.stderr + return json.loads(out_path.read_text(encoding="utf-8")) + + +def test_tree_person_ids_reconcile_with_persons_xlsx(tmp_path): + # The real #669 contract: every personId in canonical-persons-tree.json must join + # 1:1 onto a person_id in canonical-persons.xlsx — no orphan tree id, no duplicate. + # Both artifacts are produced from the SAME person workbook (collision included). 
+ person_wb = _person_wb_with_collision(tmp_path) + out_dir = tmp_path / "out"; review_dir = tmp_path / "review" + + normalize.run( + document_workbook=_doc_wb(tmp_path), document_sheet="Familienarchiv", + person_workbook=person_wb, person_sheet="Tabelle1", + out_dir=out_dir, review_dir=review_dir, date_overrides={}, name_overrides={}) + + tree = _generate_tree(person_wb, tmp_path / "tree.json") + tree_ids = [p["personId"] for p in tree["persons"]] + + wb = openpyxl.load_workbook(out_dir / "canonical-persons.xlsx") + ws = wb.active + header = [c.value for c in ws[1]] + pid_col = header.index("person_id") + register_ids = [ws.cell(row=r, column=pid_col + 1).value for r in range(2, ws.max_row + 1)] + + # tree ids are unique (no duplicate join key) + assert len(tree_ids) == len(set(tree_ids)) + # the suffixed collision ids actually reached the tree + assert "cram-hans-1" in tree_ids and "cram-hans-2" in tree_ids + # every tree id resolves to exactly one register row — the join is total and 1:1 + register_counts = {pid: register_ids.count(pid) for pid in tree_ids} + assert all(count == 1 for count in register_counts.values()), register_counts diff --git a/tools/import-normalizer/tests/test_persons_tree.py b/tools/import-normalizer/tests/test_persons_tree.py index d8de1e67..cdf7a450 100644 --- a/tools/import-normalizer/tests/test_persons_tree.py +++ b/tools/import-normalizer/tests/test_persons_tree.py @@ -433,6 +433,64 @@ def test_parse_bemerkung_sohn_with_trailing_remark(): assert notes == "nach Mexiko emigriert" +def test_generated_at_is_fixed_for_reproducibility(): + # NFR-IDEM-01: a pinned timestamp so the committed tree JSON doesn't churn on every run + assert persons_tree._GENERATED_AT == "2020-01-01T00:00:00" + + +def test_attach_person_ids_propagates_register_slug(): + # the tree person must carry the register's verbatim person_id (slug), not a recomputed one + raw_dicts = [ + {"generation": "G 1", "last_name": "de Gruyter", "first_name": "Walter", + 
"maiden_name": "", "birth_date": "", "birth_place": "", + "death_date": "", "death_place": "", "spouse": "", "notes": ""}, + {"generation": "G 1", "last_name": "de Gruyter", "first_name": "Eugenie", + "maiden_name": "Müller", "birth_date": "", "birth_place": "", + "death_date": "", "death_place": "", "spouse": "", "notes": ""}, + ] + tree_persons = [persons_tree._parse_row(n, d) for n, d in enumerate(raw_dicts, start=2)] + persons_tree._attach_person_ids(tree_persons, raw_dicts) + assert tree_persons[0]["personId"] == "de-gruyter-walter" + assert tree_persons[1]["personId"] == "de-gruyter-eugenie" + + +def test_attach_person_ids_raises_on_length_divergence(): + # The propagation is a positional zip; if tree_persons and the register drift in + # length (e.g. a future filter change), zip would silently truncate and mis-join ids. + # The guard must fail loudly instead. + raw_dicts = [ + {"generation": "G 1", "last_name": "de Gruyter", "first_name": "Walter", + "maiden_name": "", "birth_date": "", "birth_place": "", + "death_date": "", "death_place": "", "spouse": "", "notes": ""}, + # second register row has a last name -> parse_register keeps it ... + {"generation": "G 1", "last_name": "de Gruyter", "first_name": "Eugenie", + "maiden_name": "Müller", "birth_date": "", "birth_place": "", + "death_date": "", "death_place": "", "spouse": "", "notes": ""}, + ] + # ... but the tree side only has one person -> lengths diverge. 
+ tree_persons = [persons_tree._parse_row(2, raw_dicts[0])] + import pytest + with pytest.raises(ValueError, match="length"): + persons_tree._attach_person_ids(tree_persons, raw_dicts) + + +def test_attach_person_ids_carries_register_collision_suffix(): + # when two register rows slug-collide, the register suffixes the ids (-1, -2); + # those exact suffixed ids must reach the tree persons, never a recomputed bare slug + raw_dicts = [ + {"generation": "G 2", "last_name": "Cram", "first_name": "Hans", + "maiden_name": "", "birth_date": "1890", "birth_place": "", + "death_date": "", "death_place": "", "spouse": "", "notes": ""}, + {"generation": "G 3", "last_name": "Cram", "first_name": "Hans", + "maiden_name": "", "birth_date": "1925", "birth_place": "", + "death_date": "", "death_place": "", "spouse": "", "notes": ""}, + ] + tree_persons = [persons_tree._parse_row(n, d) for n, d in enumerate(raw_dicts, start=2)] + persons_tree._attach_person_ids(tree_persons, raw_dicts) + assert tree_persons[0]["personId"] == "cram-hans-1" + assert tree_persons[1]["personId"] == "cram-hans-2" + + import subprocess diff --git a/tools/import-normalizer/tests/test_writers.py b/tools/import-normalizer/tests/test_writers.py index 37c4e199..9f20d501 100644 --- a/tools/import-normalizer/tests/test_writers.py +++ b/tools/import-normalizer/tests/test_writers.py @@ -31,6 +31,21 @@ def test_write_documents_xlsx_joins_lists(tmp_path): assert row["receiver_person_ids"] == "a|b" assert row["needs_review"] == "unparsed_date" + +def test_write_documents_xlsx_carries_file_and_date_end(tmp_path): + doc = documents.CanonicalDocument( + index="H-0730", file="H-0730.pdf", date_iso="1917-01-10", + date_precision="RANGE", date_end="1917-01-11") + out = tmp_path / "docs.xlsx" + writers.write_documents_xlsx([doc], out) + wb = openpyxl.load_workbook(out) + ws = wb.active + header = [c.value for c in ws[1]] + assert "file" in header and "date_end" in header + row = {h: c.value for h, c in zip(header, ws[2])} 
+ assert row["file"] == "H-0730.pdf" + assert row["date_end"] == "1917-01-11" + def test_write_documents_xlsx_pins_timestamp(tmp_path): # determinism (NFR-IDEM-01): workbook created/modified are pinned, not the current time doc = documents.CanonicalDocument(index="W-0001") diff --git a/tools/import-normalizer/writers.py b/tools/import-normalizer/writers.py index 05b4d52e..5b9799e1 100644 --- a/tools/import-normalizer/writers.py +++ b/tools/import-normalizer/writers.py @@ -22,9 +22,10 @@ def _csv_safe(value): return "'" + s if s[:1] in ("=", "+", "-", "@", "\t", "\r", "\n") else s -DOC_COLUMNS = ["index", "box", "folder", "sender_person_id", "sender_name", +DOC_COLUMNS = ["index", "file", "box", "folder", "sender_person_id", "sender_name", "receiver_person_ids", "receiver_names", "date_iso", "date_raw", - "date_precision", "location", "tags", "summary", "source_row", "needs_review"] + "date_precision", "date_end", "location", "tags", "summary", + "source_row", "needs_review"] PERSON_COLUMNS = ["person_id", "last_name", "first_name", "maiden_name", "title", "nickname", "birth_date", "birth_date_raw", "birth_place", "death_date", "death_date_raw",