feat(normalizer): complete canonical exports for the importer (Phase 1, #670) #672
@@ -176,6 +176,14 @@ letter actually said.*
 Silvester=12-31, …). Seasons map to representative months: Frühling/Frühjahr=Apr, Sommer=Jul,
 Herbst=Oct, Winter=Jan. The feast/season tables and Easter algorithm live in `config.py`
 (NFR-MAINT-01).
+- **REQ-DATE-07** — **Intra-month day ranges carry an end day; half-resolved ranges are
+flagged.** For a day range like `7./8. Sept.1923`, `date_iso` holds the start day, the end
+day is resolved against the shared month/year into `date_end`, and `date_precision` =
+`RANGE`. If the **start** parses but the **end day is impossible** (e.g. `10./40.1.1917`),
+the row keeps the start and `RANGE` precision, leaves `date_end` **empty**, and is flagged
+`needs_review = range_end_unparsed` — the unparseable end is dropped honestly (surfaced for
+review), never silently invented or clamped. A `RANGE` row **may** therefore legitimately
+have an empty `date_end`; the importer must treat `date_end` as optional even on a `RANGE`.
 
 ### 4.4 Person resolution & dedup (`FR-PERS`, `FR-DEDUP`) — resolves IMP-04, IMP-05, IMP-11
 
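REQ-DATE-07 can be illustrated with a minimal sketch. It assumes a simplified, numeric-only `D./D.M.YYYY` pattern. `resolve_day_range` and `_DAY_RANGE` are hypothetical names and are not the normalizer's `_match_range`, which also tries Roman-numeral and month-name matchers. The sketch resolves the end day against the shared month and year. An impossible end day is dropped rather than clamped and is reported for review.

```python
import datetime
import re
from typing import Optional, Tuple

# Hypothetical, simplified pattern: "D./D.M.YYYY" with a numeric month only.
_DAY_RANGE = re.compile(r"(\d{1,2})\./(\d{1,2})\.(\d{1,2})\.(\d{4})")


def _iso_or_none(year: int, month: int, day: int) -> Optional[str]:
    """ISO date string, or None when the calendar date does not exist."""
    try:
        return datetime.date(year, month, day).isoformat()
    except ValueError:
        return None


def resolve_day_range(raw: str) -> Optional[Tuple[str, Optional[str], bool]]:
    """Return (date_iso, date_end, needs_review), or None if no usable range."""
    m = _DAY_RANGE.fullmatch(raw.strip())
    if not m:
        return None
    day_start, day_end, month, year = (int(g) for g in m.groups())
    start = _iso_or_none(year, month, day_start)
    if start is None:
        return None  # no valid start: leave the string to other matchers / UNKNOWN
    end = _iso_or_none(year, month, day_end)
    # An impossible end is dropped (never clamped) and flagged for review.
    return start, end, end is None
```

For `10./40.1.1917`, the sketch returns the start `1917-01-10`, no end, and the review flag. This is the half-resolved case the importer has to tolerate.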
@@ -262,6 +270,7 @@ DB schema.
 | Field | Required | Format / values | Notes |
 | --- | --- | --- | --- |
 | `index` | yes | string | Stable key; basis for PDF matching. |
+| `file` | no | string | verbatim `Datei` value (e.g. `H-0730.pdf`); carried through for the importer to link the scanned PDF. |
 | `box` | no | string | from `Box`. |
 | `folder` | no | string | from `Mappe`. |
 | `sender_person_id` | no | person_id | resolved; empty if no sender. |
@@ -271,11 +280,12 @@ DB schema.
 | `date_iso` | no | `YYYY-MM-DD` | best-effort; empty if `UNKNOWN`. |
 | `date_raw` | no | string | verbatim source date. |
 | `date_precision` | yes | enum | `DAY\|MONTH\|SEASON\|YEAR\|RANGE\|APPROX\|UNKNOWN`. |
+| `date_end` | no | `YYYY-MM-DD` or empty | RANGE end day (e.g. `7./8. Sept.1923` → `date_iso` = start, `date_end` = end). Empty for every non-RANGE precision **and** for a half-resolved RANGE whose end did not parse (see REQ-DATE-07). |
 | `location` | no | string | from `Ort`. |
 | `tags` | no | `tag\|tag` | from `Schlagwort`. |
 | `summary` | no | string | from `Inhalt`. |
 | `source_row` | yes | int | provenance (NFR-DATA-01). |
-| `needs_review` | yes | `flag\|flag` or empty | review flags (REQ-PROV-02). |
+| `needs_review` | yes | `flag\|flag` or empty | review flags (REQ-PROV-02). Flags include `unparsed_date`, `range_end_unparsed` (half-resolved RANGE, REQ-DATE-07), `unmatched_sender`, `unmatched_receiver`, `multi_sender`, `index_file_mismatch`, `duplicate_index`. |
 
 ### 6.2 `canonical-persons.xlsx`
 
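This table sets two contract points for the consuming side. First, `date_end` is optional even on `RANGE` rows. Second, `needs_review` holds pipe-separated flags. A reader for both might look like the sketch below. It assumes each sheet row has already been turned into a dict keyed by the header names. `date_span` and `review_flags` are hypothetical helpers, not part of the importer.

```python
import datetime
from typing import Optional, Set, Tuple


def date_span(row: dict) -> Tuple[Optional[datetime.date], Optional[datetime.date]]:
    """Return (start, end) for one canonical-documents row.

    `date_end` is optional even when `date_precision` is RANGE (a half-resolved
    range keeps its start but has no end), so an empty cell maps to None.
    """
    start_raw = str(row.get("date_iso") or "").strip()
    end_raw = str(row.get("date_end") or "").strip()
    start = datetime.date.fromisoformat(start_raw) if start_raw else None
    end = datetime.date.fromisoformat(end_raw) if end_raw else None
    return start, end


def review_flags(row: dict) -> Set[str]:
    """Split the pipe-separated `needs_review` cell; empty or missing means no flags."""
    value = str(row.get("needs_review") or "").strip()
    return set(value.split("|")) if value else set()
```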
@@ -295,6 +305,27 @@ DB schema.
 | `aliases` | no | `a\|b\|c` | every surface form that maps here. |
 | `provisional` | yes | bool | true if created from a document string, not the register. |
 
+### 6.3 `canonical-persons-tree.json`
+
+The de-duplicated genealogical tree (family members + their relationships) the importer
+uses to seed the family graph. Each `persons[]` entry carries a `personId` that **joins
+1:1 onto** `person_id` in `canonical-persons.xlsx`.
+
+| Field | Required | Format | Notes |
+| --- | --- | --- | --- |
+| `personId` | yes | slug | The register's **verbatim** `person_id` (e.g. `cram-hans-1`), propagated — never re-slugified — so collision suffixes match `canonical-persons.xlsx` exactly. Every tree `personId` exists in the register; the register is the sole slug authority. |
+| `firstName` / `lastName` / `maidenName` | first/last yes | string | name parts. |
+| `birthYear` / `deathYear` | no | int or null | year only (tree granularity). |
+| `birthPlace` / `deathPlace` | no | string or null | from the register. |
+| `generation` | no | int or null | parsed from `G n`. |
+| `notes` | no | string or null | leftover Bemerkung text after relationship extraction. |
+| `familyMember` | yes | bool | always true for tree persons. |
+
+A top-level `generated_at` is pinned to a fixed timestamp (`2020-01-01T00:00:00`) for
+reproducibility (NFR-IDEM-01), not a wall-clock value. `relationships[]` carry `SPOUSE_OF`
+and `PARENT_OF` edges keyed by `rowId`; `unresolved[]` lists relationship strings that did
+not match a tree person.
+
 ---
 
 ## 7. Prioritized Backlog (MoSCoW)
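The 1:1 join that §6.3 promises can be checked mechanically. The minimal sketch below assumes the tree JSON has been loaded with `json.load` and that the register's `person_id` column is available as an iterable. `check_tree_join` is a hypothetical helper. It reports two kinds of problem:

- Tree ids that are missing from the register, for example an unsuffixed `cram-hans` where the register holds `cram-hans-1`.
- Ids used by more than one tree entry.

```python
import json
from collections import Counter
from typing import Iterable


def check_tree_join(tree: dict, register_ids: Iterable[str]) -> dict:
    """Report tree personIds that would break the 1:1 join onto the register."""
    known = set(register_ids)
    tree_ids = [person["personId"] for person in tree.get("persons", [])]
    counts = Counter(tree_ids)
    return {
        # ids with no matching person_id in canonical-persons.xlsx
        "orphans": sorted(set(tree_ids) - known),
        # ids carried by more than one tree entry
        "duplicates": sorted(pid for pid, n in counts.items() if n > 1),
    }


# Usage with a tiny inline document instead of the real file.
tree = json.loads('{"persons": [{"personId": "cram-hans-1"}, {"personId": "cram-hans"}]}')
report = check_tree_join(tree, ["cram-hans-1", "cram-hans-2"])
```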
@@ -339,7 +370,7 @@ DB schema.
 | ID | Question | Why it matters | Ref | Resolution |
 | --- | --- | --- | --- | --- |
 | OQ-01 ✅ | Season/holiday → date. | Accuracy of ~70 SEASON/feast rows. | REQ-DATE-06 | **Resolved (2026-05-25):** movable feasts (Ostern, Pfingsten, Himmelfahrt, Advent, …) **computed per year from Easter — never a fixed month**; fixed feasts looked up (Weihnachten=12-25, Neujahr=01-01, …); seasons = mid-season month (Frühling=Apr, Sommer=Jul, Herbst=Oct, Winter=Jan). |
-| OQ-02 ✅ | Date ranges: start only, or start+end? | Sorting/display of ~315 range values. | REQ-DATE-02 | **Confirmed:** store **start** in `date_iso`, precision `RANGE`, full text in `date_raw`. |
+| OQ-02 ✅ | Date ranges: start only, or start+end? | Sorting/display of ~315 range values. | REQ-DATE-02, REQ-DATE-07 | **Confirmed (updated #670):** store **start** in `date_iso`, precision `RANGE`, full text in `date_raw`, **and the resolved end day in `date_end`** for intra-month day ranges. A half-resolved range (start parsed, end impossible) keeps `date_end` empty and is flagged `range_end_unparsed`. |
 | OQ-03 ✅ | `person_id` format. | Stability across re-runs; diffability. | §6 | **Confirmed:** readable slug `lastname-firstname`, numeric suffix on collision. |
 | OQ-04 ✅ | `x`-suffix row handling. | 42 rows. | REQ-TRIAGE-03 | **Resolved (2026-05-25):** `x` rows are transcriptions of the base letter but not yet mappable → **skip this pass**, log to `review/skipped-x-suffix.csv` for later linking. |
 | OQ-05 ✅ | Importer output format. | Phase-2 reader. | B11 | **Confirmed:** `.xlsx` (openpyxl-native, headered). |
tools/import-normalizer/.gitignore
@@ -1,6 +1,7 @@
 .venv/
-out/
+out/*
 !out/canonical-persons-tree.json
+!out/*.xlsx
 review/
 __pycache__/
 *.pyc
@@ -66,6 +66,24 @@ class ParsedDate:
     iso: str | None
     precision: Precision
     raw: str
+    end: str | None = None # RANGE end day; None for every non-RANGE precision
+    # True only for a half-resolved RANGE: the start parsed but the end did not, so
+    # the end was dropped and the row should surface in review (#670, Gap 2).
+    needs_review: bool = False
+
+
+@dataclass(frozen=True)
+class MatchResult:
+    """Uniform return shape for every _match_* matcher.
+
+    A matcher returns None when it does not match, or a MatchResult when it does.
+    `end` is the RANGE end day (None for every non-RANGE precision); `needs_review`
+    is True only for a half-resolved RANGE whose start parsed but end did not.
+    """
+    iso: str
+    precision: Precision
+    end: str | None = None
+    needs_review: bool = False
 
 
 _LEADING_MARKERS = re.compile(
@@ -97,7 +115,7 @@ def _match_iso(s):
     if re.fullmatch(r"\d{4}-\d{2}-\d{2}", s):
         try:
             datetime.date.fromisoformat(s)
-            return s, Precision.DAY
+            return MatchResult(s, Precision.DAY)
         except ValueError:
             return None
     return None
@@ -112,7 +130,7 @@ def _match_numeric(s):
     if year is None or not (1 <= month <= 12):
         return None
     try:
-        return datetime.date(year, month, day).isoformat(), Precision.DAY
+        return MatchResult(datetime.date(year, month, day).isoformat(), Precision.DAY)
     except ValueError:
         return None
 
@@ -130,7 +148,7 @@ def _match_roman(s):
     if not month or year is None:
         return None
     try:
-        return datetime.date(year, month, day).isoformat(), Precision.DAY
+        return MatchResult(datetime.date(year, month, day).isoformat(), Precision.DAY)
     except ValueError:
         return None
 
@@ -146,7 +164,7 @@ def _build_day_month_year(day, month, year):
     if not month or year is None or not (1 <= month <= 12):
         return None
     try:
-        return datetime.date(year, month, day).isoformat(), Precision.DAY
+        return MatchResult(datetime.date(year, month, day).isoformat(), Precision.DAY)
     except ValueError:
         return None
 
@@ -188,7 +206,7 @@ def _match_month_year(s):
     year = expand_year(m.group(2))
     if not month or year is None:
         return None
-    return datetime.date(year, month, 1).isoformat(), Precision.MONTH
+    return MatchResult(datetime.date(year, month, 1).isoformat(), Precision.MONTH)
 
 
 def _match_feast_season(s):
@@ -198,33 +216,44 @@ def _match_feast_season(s):
     year = expand_year(m.group(2))
     if year is None:
         return None
-    return resolve_feast_or_season(m.group(1), year)
+    resolved = resolve_feast_or_season(m.group(1), year)
+    if resolved is None:
+        return None
+    iso, precision = resolved
+    return MatchResult(iso, precision)
 
 
 def _match_year_only(s):
     if _YEAR_ONLY_RE.fullmatch(s):
-        return datetime.date(int(s), 1, 1).isoformat(), Precision.YEAR
+        return MatchResult(datetime.date(int(s), 1, 1).isoformat(), Precision.YEAR)
     return None
 
 
 def _match_range(s):
     m = _RANGE_YY_RE.fullmatch(s)
     if m:
-        return datetime.date(int(m.group(1)), 1, 1).isoformat(), Precision.RANGE
+        return MatchResult(datetime.date(int(m.group(1)), 1, 1).isoformat(), Precision.RANGE)
     m = _RANGE_DAY_RE.fullmatch(s)
     if m:
-        first = f"{m.group(1)}.{m.group(3)}" # "7." + "Sept.1923" -> "7.Sept.1923"
-        for matcher in (_match_numeric, _match_monthname_a):
-            r = matcher(first)
-            if r:
-                return r[0], Precision.RANGE
+        day_start, day_end, rest = m.group(1), m.group(2), m.group(3)
+        # "10." + "1.1917" -> "10.1.1917"; resolve start and end day against the shared month/year
+        for matcher in (_match_numeric, _match_roman, _match_monthname_a):
+            start = matcher(f"{day_start}.{rest}")
+            if start:
+                end = matcher(f"{day_end}.{rest}")
+                # Half-resolved range (start parsed, end did not — e.g. the impossible
+                # end day in "10./40.1.1917"): keep the start and RANGE precision, drop
+                # the end, and flag needs_review so the dropped end surfaces (#670, Gap 2).
+                return MatchResult(start.iso, Precision.RANGE,
+                                   end.iso if end else None,
+                                   needs_review=end is None)
     m = _RANGE_HYPHEN_RE.fullmatch(s)
     if m:
         start = m.group(1).strip()
         for matcher in (_match_numeric, _match_roman, _match_monthname_a, _match_year_only):
             r = matcher(start)
             if r:
-                return r[0], Precision.RANGE
+                return MatchResult(r.iso, Precision.RANGE)
     return None
 
 
@@ -253,10 +282,8 @@ def parse_date(raw: str, date_overrides: dict | None = None) -> ParsedDate:
     for matcher in _MATCHERS:
         result = matcher(cleaned)
         if result:
-            iso, precision = result
-            if approx:
-                precision = Precision.APPROX
-            return ParsedDate(iso, precision, raw)
+            precision = Precision.APPROX if approx else result.precision
+            return ParsedDate(result.iso, precision, raw, result.end, result.needs_review)
     return ParsedDate(None, Precision.UNKNOWN, raw)
 
 
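The reworked `parse_date` loop is a first-match-wins chain over matchers that all return the same `MatchResult` shape. The sketch below reproduces that composition so it runs on its own. It uses stand-in types and one toy matcher; `first_match` and `toy_range` are hypothetical.

The sketch also makes one consequence visible. The approximate-marker override replaces only the precision. So if such a marker can precede a day range, the result keeps its `end` under `APPROX` precision. The `date_end` column, however, is documented as empty for every non-RANGE precision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional


class Precision(str, Enum):
    DAY = "DAY"
    RANGE = "RANGE"
    APPROX = "APPROX"
    UNKNOWN = "UNKNOWN"


@dataclass(frozen=True)
class MatchResult:
    iso: str
    precision: Precision
    end: Optional[str] = None
    needs_review: bool = False


@dataclass(frozen=True)
class ParsedDate:
    iso: Optional[str]
    precision: Precision
    raw: str
    end: Optional[str] = None
    needs_review: bool = False


Matcher = Callable[[str], Optional[MatchResult]]


def first_match(cleaned: str, raw: str, matchers: Iterable[Matcher], approx: bool = False) -> ParsedDate:
    """First matcher wins; an approximate marker overrides only the precision."""
    for matcher in matchers:
        result = matcher(cleaned)
        if result:
            precision = Precision.APPROX if approx else result.precision
            return ParsedDate(result.iso, precision, raw, result.end, result.needs_review)
    return ParsedDate(None, Precision.UNKNOWN, raw)


def toy_range(s: str) -> Optional[MatchResult]:
    # Hypothetical stand-in matcher that only knows one range string.
    if s == "10./11.1.1917":
        return MatchResult("1917-01-10", Precision.RANGE, "1917-01-11")
    return None
```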
@@ -31,6 +31,7 @@ class RawRow:
 @dataclass
 class CanonicalDocument:
     index: str
+    file: str = ""
     box: str = ""
     folder: str = ""
     sender_person_id: str = ""
@@ -40,6 +41,7 @@ class CanonicalDocument:
     date_iso: str = ""
     date_raw: str = ""
     date_precision: str = ""
+    date_end: str = ""
     location: str = ""
     tags: list = field(default_factory=list)
     summary: str = ""
@@ -105,15 +107,18 @@ def to_canonical(raw, ctx, date_overrides: dict, approved_themes: frozenset = fr
 
     if raw.date.strip() and pd.precision == _dates.Precision.UNKNOWN:
         flags.append("unparsed_date")
+    if pd.needs_review:
+        flags.append("range_end_unparsed")
     if index_file_mismatch(raw.index, raw.file):
         flags.append("index_file_mismatch")
 
     return CanonicalDocument(
-        index=raw.index, box=raw.box, folder=raw.folder,
+        index=raw.index, file=raw.file, box=raw.box, folder=raw.folder,
         sender_person_id=sender_id, sender_name=sender_name,
         receiver_person_ids=[r[0] for r in receivers],
         receiver_names=[r[1] for r in receivers],
         date_iso=pd.iso or "", date_raw=raw.date, date_precision=str(pd.precision),
+        date_end=pd.end or "",
         location=raw.location, tags=_tags.generate_tags(raw.tags, raw.summary, approved_themes), summary=raw.summary,
         source_row=raw.source_row, needs_review=flags,
     )

tools/import-normalizer/out/canonical-documents.xlsx
Binary file not shown.

File diff suppressed because it is too large.

tools/import-normalizer/out/canonical-persons.xlsx
Binary file not shown.

tools/import-normalizer/out/canonical-tag-tree.xlsx
Binary file not shown.
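The list-valued fields (`tags`, `needs_review`) are documented as `a|b` strings in the exported sheets. How the writer flattens them is not part of this diff. The sketch below assumes a simple `'|'.join`, applied through `dataclasses.asdict` to `DocumentRow`, a hypothetical, trimmed stand-in for `CanonicalDocument`.

```python
import dataclasses
from dataclasses import dataclass, field


@dataclass
class DocumentRow:
    # Hypothetical, trimmed stand-in for CanonicalDocument (a few fields only).
    index: str
    file: str = ""
    date_iso: str = ""
    date_precision: str = ""
    date_end: str = ""
    tags: list = field(default_factory=list)
    needs_review: list = field(default_factory=list)


def to_sheet_row(doc: DocumentRow) -> dict:
    """Flatten a document for a headered sheet: list fields become 'a|b' strings."""
    row = {}
    for name, value in dataclasses.asdict(doc).items():
        row[name] = "|".join(value) if isinstance(value, list) else value
    return row
```

`dataclasses.asdict` preserves field order, so the dict keys can double as the sheet's header row.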

@@ -8,9 +8,14 @@ from pathlib import Path
 
 import config
 import dates
+import persons as _persons
 from persons import _strip_accents
 
 
+# Pinned so the committed tree JSON is reproducible and does not churn on every run
+# (NFR-IDEM-01) — mirrors writers._FIXED_TS for the xlsx exports.
+_GENERATED_AT = "2020-01-01T00:00:00"
+
 _MIN_YEAR = 1700
 _MAX_YEAR = 2100
 # Threshold: if parse_date parses a pure-digit string as a year outside [_MIN_YEAR, _MAX_YEAR],
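Pinning `generated_at` is what makes the committed JSON byte-stable across runs (NFR-IDEM-01). A quick way to check that property is to render the same input twice and compare digests. In this sketch, `render_tree` and `fingerprint` are hypothetical. The `json.dumps` options are assumptions, not the script's actual serialization settings.

```python
import hashlib
import json

GENERATED_AT = "2020-01-01T00:00:00"  # pinned, never datetime.now()


def render_tree(persons: list, source: str) -> str:
    output = {
        "generated_at": GENERATED_AT,
        "source": source,
        "stats": {"persons": len(persons)},
        "persons": persons,
    }
    # Assumed serialization options; the real script's settings are not shown in the diff.
    return json.dumps(output, ensure_ascii=False, indent=2)


def fingerprint(text: str) -> str:
    """SHA-256 of the rendered text, used to compare two runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```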
@@ -175,6 +180,29 @@ def _parse_row(row_num: int, fields: dict) -> dict:
     }
 
 
+def _attach_person_ids(tree_persons: list[dict], raw_dicts: list[dict]) -> None:
+    """Attach the register's verbatim person_id to each tree person, in place.
+
+    The register (persons.parse_register) is the sole authority for person_id; it
+    slugifies and suffixes colliding ids exactly once. We propagate that id rather
+    than re-slugify in the tree, because re-slugifying would not reproduce the
+    register's collision suffixes and so would not reconcile 1:1 with the register
+    (#670, Gap 3).
+
+    tree_persons and raw_dicts must be the same length and in the same row order —
+    parse_register and _parse_row both keep exactly the rows that have a last name.
+    """
+    register = _persons.parse_register(raw_dicts)
+    if len(tree_persons) != len(register):
+        raise ValueError(
+            "person_id propagation requires equal length: "
+            f"{len(tree_persons)} tree persons vs {len(register)} register persons "
+            "(the positional zip would otherwise silently truncate and mis-join ids)"
+        )
+    for tree_person, register_person in zip(tree_persons, register):
+        tree_person["personId"] = register_person.person_id
+
+
 def _deduplicate(persons: list[dict]) -> tuple[list[dict], list[str]]:
     """Remove duplicate rows. Two-stage:
 
@@ -339,11 +367,17 @@ def main() -> None:
 
     # --- Pass 1: parse rows ---
     persons_raw: list[dict] = []
+    raw_dicts: list[dict] = []
     for row_num, row in enumerate(rows[1:], start=2):
         field_dict = {field: (row[col] if col < len(row) else "") for field, col in fields_map.items()}
         if not field_dict.get("last_name", "").strip():
             continue
         persons_raw.append(_parse_row(row_num, field_dict))
+        raw_dicts.append(field_dict)
 
+    # Propagate the register's verbatim person_id before dedup so the tree reconciles 1:1
+    # with canonical-persons.xlsx (#670, Gap 3).
+    _attach_person_ids(persons_raw, raw_dicts)
+
     persons, skipped_msgs = _deduplicate(persons_raw)
     for msg in skipped_msgs:
@@ -387,7 +421,7 @@ def main() -> None:
         return
 
     output = {
-        "generated_at": datetime.datetime.now().isoformat(),
+        "generated_at": _GENERATED_AT,
         "source": Path(args.input).name,
         "stats": {
             "persons": len(persons),

@@ -2,6 +2,18 @@ import datetime
 import dates
 from dates import Precision
 
+def test_matchers_return_uniform_matchresult():
+    # Every matcher returns a MatchResult(iso, precision, end) — no 2- vs 3-tuple
+    # length-sniffing. A non-range matcher leaves end=None; a range matcher sets it.
+    day = dates._match_numeric("15.2.1888")
+    assert isinstance(day, dates.MatchResult)
+    assert (day.iso, day.precision, day.end) == ("1888-02-15", Precision.DAY, None)
+
+    rng = dates._match_range("10./11.1.1917")
+    assert isinstance(rng, dates.MatchResult)
+    assert (rng.iso, rng.precision, rng.end) == ("1917-01-10", Precision.RANGE, "1917-01-11")
+
+
 def test_easter_known_years():
     # Anonymous Gregorian algorithm — verified against published tables
     assert dates.easter(2024) == datetime.date(2024, 3, 31)
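The test above pins `dates.easter` to published dates. For reference, here is a self-contained sketch of the Anonymous Gregorian (Meeus/Jones/Butcher) computation. It also derives two movable feasts from Easter:

- Ascension (Himmelfahrt): Easter + 39 days.
- Whit Sunday (Pfingsten): Easter + 49 days.

The offset table and helper names are hypothetical stand-ins for the tables in `config.py`. The sketch computes the first Sunday of Advent from Christmas rather than from Easter, since it is the fourth Sunday before 25 December.

```python
import datetime


def easter(year: int) -> datetime.date:
    """Gregorian Easter Sunday (Anonymous Gregorian / Meeus-Jones-Butcher algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    el = (32 + 2 * e + 2 * i - h - k) % 7  # "l" in the published algorithm
    m = (a + 11 * h + 22 * el) // 451
    month, day = divmod(h + el - 7 * m + 114, 31)
    return datetime.date(year, month, day + 1)


# Offsets in days from Easter Sunday (hypothetical table for this sketch).
_EASTER_OFFSETS = {"Ostern": 0, "Himmelfahrt": 39, "Pfingsten": 49}


def movable_feast(name: str, year: int) -> datetime.date:
    return easter(year) + datetime.timedelta(days=_EASTER_OFFSETS[name])


def first_advent(year: int) -> datetime.date:
    # Fourth Advent is the Sunday on or before 24 December; first Advent is 3 weeks earlier.
    dec24 = datetime.date(year, 12, 24)
    fourth = dec24 - datetime.timedelta(days=(dec24.weekday() + 1) % 7)
    return fourth - datetime.timedelta(weeks=3)
```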
@@ -115,10 +127,55 @@ def test_parse_invalid_calendar_date_is_unknown():
     assert dates.parse_date("31.4.1916").precision == Precision.UNKNOWN
 
 def test_parse_intra_month_day_range():
-    # "7./8. Sept.1923" -> start day, RANGE. Must NOT be confused with slash-date "17/6. 1916".
-    assert dates.parse_date("7./8. Sept.1923") == dates.ParsedDate("1923-09-07", Precision.RANGE, "7./8. Sept.1923")
+    # "7./8. Sept.1923" -> start day, RANGE, end day 8th. Must NOT be confused with slash-date "17/6. 1916".
+    assert dates.parse_date("7./8. Sept.1923") == dates.ParsedDate("1923-09-07", Precision.RANGE, "7./8. Sept.1923", "1923-09-08")
     assert dates.parse_date("17/6. 1916") == dates.ParsedDate("1916-06-17", Precision.DAY, "17/6. 1916")
 
+def test_parse_intra_month_day_range_carries_end_day():
+    # the intra-month day range surfaces the END day so Phase 4 can render meta_date_end
+    r = dates.parse_date("10./11.1.1917")
+    assert r.iso == "1917-01-10"
+    assert r.precision == Precision.RANGE
+    assert r.end == "1917-01-11"
+
+def test_parse_roman_month_day_range():
+    # "10./11.I.1917" — Roman-numeral-month range; previously fell through to UNKNOWN
+    r = dates.parse_date("10./11.I.1917")
+    assert r.iso == "1917-01-10"
+    assert r.precision == Precision.RANGE
+    assert r.end == "1917-01-11"
+
+def test_parse_range_invalid_end_keeps_start_flags_review():
+    # "10./40.1.1917" — the 40th is an impossible end day. The start parses fine,
+    # so the row stays RANGE with the start preserved, the unparseable end is dropped
+    # (end is None), and the half-resolved range is flagged needs_review so the
+    # dropped end surfaces honestly instead of vanishing silently (#670, Gap 2).
+    r = dates.parse_date("10./40.1.1917")
+    assert r.iso == "1917-01-10"
+    assert r.precision == Precision.RANGE
+    assert r.end is None
+    assert r.needs_review is True
+
+
+def test_parse_range_valid_end_not_flagged():
+    # a fully-resolved range carries its end and is NOT flagged for review
+    r = dates.parse_date("10./11.1.1917")
+    assert r.end == "1917-01-11"
+    assert r.needs_review is False
+
+
+def test_parse_non_range_has_no_review_flag():
+    # every fully-parsed non-range date is never flagged for review by the date layer
+    assert dates.parse_date("15.2.1888").needs_review is False
+    assert dates.parse_date("Mai 1895").needs_review is False
+    assert dates.parse_date("").needs_review is False
+
+
+def test_parse_non_range_has_no_end():
+    assert dates.parse_date("15.2.1888").end is None
+    assert dates.parse_date("Mai 1895").end is None
+    assert dates.parse_date("").end is None
+
 def test_parse_trailing_note_stripped_but_raw_preserved():
     r = dates.parse_date("17.Nov 1887, 2. Brief") # REQ-DATE-04
     assert r.iso == "1887-11-17"

@@ -52,8 +52,59 @@ def test_to_canonical_resolves_and_flags():
     assert doc.receiver_person_ids == ["de-gruyter-eugenie"] # matched via maiden alias
     assert doc.date_iso == "1888-02-15" and doc.date_precision == "DAY"
     assert doc.tags == ["Themen/Brautbriefe"]
+    assert doc.file == r"..\__scan\W-0001.pdf" # file name carried through for the importer
     assert doc.needs_review == []
 
 
+def test_to_canonical_carries_file_name():
+    ctx = _ctx()
+    raw = documents.RawRow(source_row=4, index="H-0730", sender="", receivers="",
+                           file="H-0730.pdf")
+    doc = documents.to_canonical(raw, ctx, date_overrides={})
+    assert doc.file == "H-0730.pdf"
+
+
+def test_to_canonical_range_carries_date_end():
+    ctx = _ctx()
+    raw = documents.RawRow(source_row=4, index="H-0730", sender="", receivers="",
+                           date="10./11.1.1917")
+    doc = documents.to_canonical(raw, ctx, date_overrides={})
+    assert doc.date_iso == "1917-01-10"
+    assert doc.date_precision == "RANGE"
+    assert doc.date_end == "1917-01-11"
+
+
+def test_to_canonical_non_range_has_empty_date_end():
+    ctx = _ctx()
+    raw = documents.RawRow(source_row=4, index="H-0730", sender="", receivers="",
+                           date="15.2.1888")
+    doc = documents.to_canonical(raw, ctx, date_overrides={})
+    assert doc.date_precision == "DAY"
+    assert doc.date_end == ""
+
+def test_to_canonical_half_resolved_range_flags_review():
+    # an impossible end day ("10./40.1.1917") keeps the start + RANGE precision but
+    # drops the unparseable end; the document must surface this as a review flag
+    # so the importer (#669) knows date_end is empty on a RANGE row by design.
+    ctx = _ctx()
+    raw = documents.RawRow(source_row=5, index="H-0731", sender="", receivers="",
+                           date="10./40.1.1917")
+    doc = documents.to_canonical(raw, ctx, date_overrides={})
+    assert doc.date_iso == "1917-01-10"
+    assert doc.date_precision == "RANGE"
+    assert doc.date_end == ""
+    assert "range_end_unparsed" in doc.needs_review
+
+
+def test_to_canonical_full_range_not_flagged():
+    ctx = _ctx()
+    raw = documents.RawRow(source_row=5, index="H-0730", sender="", receivers="",
+                           date="10./11.1.1917")
+    doc = documents.to_canonical(raw, ctx, date_overrides={})
+    assert doc.date_end == "1917-01-11"
+    assert "range_end_unparsed" not in doc.needs_review
+
+
 def test_to_canonical_unmatched_and_unparsed():
     ctx = _ctx()
     raw = documents.RawRow(source_row=9, index="C-0001",

@@ -1,3 +1,8 @@
+import json
+import subprocess
+import sys
+from pathlib import Path
+
 import openpyxl
 import normalize
 
@@ -119,3 +124,56 @@ def test_approved_themes_applied(tmp_path):
     tag_values = [ws.cell(row=r, column=tag_col + 1).value for r in range(2, ws.max_row + 1)]
     # W-0001 has Inhalt "Geschäftsreise" — should get an extra Themen/geschäftsreise tag
     assert any(v and "Themen/geschäftsreise" in v for v in tag_values)
+
+
+def _person_wb_with_collision(tmp_path):
+    # Two "Hans Cram" rows force the register to suffix the colliding slug (-1/-2);
+    # the tree must carry those exact suffixed ids so the join still reconciles.
+    wb = openpyxl.Workbook(); ws = wb.active; ws.title = "Tabelle1"
+    ws.append(["Generation", "Familienname", "Vorname", "geb als", "Geburtsdatum",
+               "Geburtsort", "Todesdatum", "Sterbeort", "verheiratet mit", "Bemerkung"])
+    ws.append(["G 1", "de Gruyter", "Walter", "", "", "", "", "", "", ""])
+    ws.append(["G 1", "de Gruyter", "Eugenie", "Müller", "", "", "", "", "", ""])
+    ws.append(["G 2", "Cram", "Hans", "", "1890", "", "", "", "", ""])
+    ws.append(["G 3", "Cram", "Hans", "", "1925", "", "", "", "", ""])
+    p = tmp_path / "persons.xlsx"; wb.save(p); return p
+
+
+def _generate_tree(person_wb, out_path):
+    script = Path(__file__).parent.parent / "persons_tree.py"
+    result = subprocess.run(
+        [sys.executable, str(script), "--input", str(person_wb), "--output", str(out_path)],
+        capture_output=True, text=True,
+    )
+    assert result.returncode == 0, result.stderr
+    return json.loads(out_path.read_text(encoding="utf-8"))
+
+
+def test_tree_person_ids_reconcile_with_persons_xlsx(tmp_path):
+    # The real #669 contract: every personId in canonical-persons-tree.json must join
+    # 1:1 onto a person_id in canonical-persons.xlsx — no orphan tree id, no duplicate.
+    # Both artifacts are produced from the SAME person workbook (collision included).
+    person_wb = _person_wb_with_collision(tmp_path)
+    out_dir = tmp_path / "out"; review_dir = tmp_path / "review"
+
+    normalize.run(
+        document_workbook=_doc_wb(tmp_path), document_sheet="Familienarchiv",
+        person_workbook=person_wb, person_sheet="Tabelle1",
+        out_dir=out_dir, review_dir=review_dir, date_overrides={}, name_overrides={})
+
+    tree = _generate_tree(person_wb, tmp_path / "tree.json")
+    tree_ids = [p["personId"] for p in tree["persons"]]
+
+    wb = openpyxl.load_workbook(out_dir / "canonical-persons.xlsx")
+    ws = wb.active
+    header = [c.value for c in ws[1]]
+    pid_col = header.index("person_id")
+    register_ids = [ws.cell(row=r, column=pid_col + 1).value for r in range(2, ws.max_row + 1)]
+
+    # tree ids are unique (no duplicate join key)
+    assert len(tree_ids) == len(set(tree_ids))
+    # the suffixed collision ids actually reached the tree
+    assert "cram-hans-1" in tree_ids and "cram-hans-2" in tree_ids
+    # every tree id resolves to exactly one register row — the join is total and 1:1
+    register_counts = {pid: register_ids.count(pid) for pid in tree_ids}
+    assert all(count == 1 for count in register_counts.values()), register_counts
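The reconciliation test asserts that tree ids are unique and that each one resolves to exactly one `person_id` row in the register. A sketch of the same check as a reusable report (for example, to print every problem at once instead of failing on the first), assuming both id lists are already loaded; `check_tree_register_join` is a hypothetical helper, not part of `normalize` or `persons_tree`.

```python
from collections import Counter


def check_tree_register_join(tree_ids, register_ids):
    """Return a list of join problems; an empty list means every tree id joins 1:1."""
    problems = []
    tree_counts = Counter(tree_ids)
    register_counts = Counter(register_ids)
    # a tree id that occurs twice would make the join key ambiguous
    for pid, n in sorted(tree_counts.items()):
        if n > 1:
            problems.append(f"duplicate tree id: {pid} (x{n})")
    # each distinct tree id must hit exactly one register row
    for pid in sorted(tree_counts):
        n = register_counts.get(pid, 0)
        if n == 0:
            problems.append(f"orphan tree id: {pid}")
        elif n > 1:
            problems.append(f"ambiguous register id: {pid} (x{n})")
    return problems
```

Counting each list once with `collections.Counter` keeps the check linear, whereas `register_ids.count(pid)` inside a comprehension rescans the register for every tree id; the difference only matters for large registers.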
@@ -433,6 +433,64 @@ def test_parse_bemerkung_sohn_with_trailing_remark():
     assert notes == "nach Mexiko emigriert"
 
 
+def test_generated_at_is_fixed_for_reproducibility():
+    # NFR-IDEM-01: a pinned timestamp so the committed tree JSON doesn't churn on every run
+    assert persons_tree._GENERATED_AT == "2020-01-01T00:00:00"
+
+
+def test_attach_person_ids_propagates_register_slug():
+    # the tree person must carry the register's verbatim person_id (slug), not a recomputed one
+    raw_dicts = [
+        {"generation": "G 1", "last_name": "de Gruyter", "first_name": "Walter",
+         "maiden_name": "", "birth_date": "", "birth_place": "",
+         "death_date": "", "death_place": "", "spouse": "", "notes": ""},
+        {"generation": "G 1", "last_name": "de Gruyter", "first_name": "Eugenie",
+         "maiden_name": "Müller", "birth_date": "", "birth_place": "",
+         "death_date": "", "death_place": "", "spouse": "", "notes": ""},
+    ]
+    tree_persons = [persons_tree._parse_row(n, d) for n, d in enumerate(raw_dicts, start=2)]
+    persons_tree._attach_person_ids(tree_persons, raw_dicts)
+    assert tree_persons[0]["personId"] == "de-gruyter-walter"
+    assert tree_persons[1]["personId"] == "de-gruyter-eugenie"
+
+
+def test_attach_person_ids_raises_on_length_divergence():
+    # The propagation is a positional zip; if tree_persons and the register drift in
+    # length (e.g. a future filter change), zip would silently truncate and mis-join ids.
+    # The guard must fail loudly instead.
+    raw_dicts = [
+        {"generation": "G 1", "last_name": "de Gruyter", "first_name": "Walter",
+         "maiden_name": "", "birth_date": "", "birth_place": "",
+         "death_date": "", "death_place": "", "spouse": "", "notes": ""},
+        # second register row has a last name -> parse_register keeps it ...
+        {"generation": "G 1", "last_name": "de Gruyter", "first_name": "Eugenie",
+         "maiden_name": "Müller", "birth_date": "", "birth_place": "",
+         "death_date": "", "death_place": "", "spouse": "", "notes": ""},
+    ]
+    # ... but the tree side only has one person -> lengths diverge.
+    tree_persons = [persons_tree._parse_row(2, raw_dicts[0])]
+    import pytest
+    with pytest.raises(ValueError, match="length"):
+        persons_tree._attach_person_ids(tree_persons, raw_dicts)
+
+
+def test_attach_person_ids_carries_register_collision_suffix():
+    # when two register rows slug-collide, the register suffixes the ids (-1, -2);
+    # those exact suffixed ids must reach the tree persons, never a recomputed bare slug
+    raw_dicts = [
+        {"generation": "G 2", "last_name": "Cram", "first_name": "Hans",
+         "maiden_name": "", "birth_date": "1890", "birth_place": "",
+         "death_date": "", "death_place": "", "spouse": "", "notes": ""},
+        {"generation": "G 3", "last_name": "Cram", "first_name": "Hans",
+         "maiden_name": "", "birth_date": "1925", "birth_place": "",
+         "death_date": "", "death_place": "", "spouse": "", "notes": ""},
+    ]
+    tree_persons = [persons_tree._parse_row(n, d) for n, d in enumerate(raw_dicts, start=2)]
+    persons_tree._attach_person_ids(tree_persons, raw_dicts)
+    assert tree_persons[0]["personId"] == "cram-hans-1"
+    assert tree_persons[1]["personId"] == "cram-hans-2"
+
+
 import subprocess
 
 
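`test_generated_at_is_fixed_for_reproducibility` (NFR-IDEM-01) pins `persons_tree._GENERATED_AT` so the committed tree JSON does not churn between runs. A pinned timestamp only helps if the rest of the serialisation is stable too. A minimal sketch of that idea, assuming a `{"generatedAt": ..., "persons": [...]}` layout; `dump_tree` and both key names are hypothetical, not taken from `persons_tree`.

```python
import json

# Hypothetical constant mirroring persons_tree._GENERATED_AT: a fixed value instead of
# datetime.now(), so regenerating the tree from the same input gives identical text.
GENERATED_AT = "2020-01-01T00:00:00"


def dump_tree(persons):
    payload = {"generatedAt": GENERATED_AT, "persons": persons}
    # sort_keys plus a fixed indent keep key order and whitespace stable across runs;
    # ensure_ascii=False keeps names such as "Müller" readable in the committed file
    return json.dumps(payload, ensure_ascii=False, indent=2, sort_keys=True) + "\n"
```

Sorting keys makes the output independent of the order in which a person dict was built, which is the other common source of diff noise in committed JSON.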
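The three `_attach_person_ids` tests describe a positional copy of register ids onto tree persons that must keep collision suffixes and must refuse to truncate. A sketch under stated assumptions: `slugify`, `register_ids` and `attach_person_ids` are hypothetical, and both the slug rule (lowercase, runs of non-word characters become `-`, no umlaut folding) and the "suffix every member of a colliding group" rule are inferred from the expected ids `de-gruyter-walter` and `cram-hans-1`/`cram-hans-2`, not taken from the register's code.

```python
import re
from collections import Counter


def slugify(last_name: str, first_name: str) -> str:
    # Hypothetical rule: lowercase, runs of non-word characters -> "-".
    # No umlaut folding is attempted here.
    return re.sub(r"\W+", "-", f"{last_name} {first_name}".lower()).strip("-")


def register_ids(rows: list[dict]) -> list[str]:
    """One id per register row; members of a colliding slug group get -1, -2, ... in row order."""
    slugs = [slugify(r["last_name"], r["first_name"]) for r in rows]
    group_sizes = Counter(slugs)
    seen: Counter = Counter()
    ids = []
    for slug in slugs:
        if group_sizes[slug] == 1:
            ids.append(slug)  # unique slug stays bare
        else:
            seen[slug] += 1
            ids.append(f"{slug}-{seen[slug]}")
    return ids


def attach_person_ids(tree_persons: list[dict], rows: list[dict]) -> None:
    """Copy register ids onto tree persons by position, refusing to truncate."""
    ids = register_ids(rows)
    if len(tree_persons) != len(ids):
        raise ValueError(
            f"length mismatch: {len(tree_persons)} tree persons vs {len(ids)} register rows")
    for person, person_id in zip(tree_persons, ids):
        person["personId"] = person_id
```

`zip(tree_persons, ids, strict=True)` (Python 3.10+) would also refuse to truncate, but its built-in message does not mention "length", which `test_attach_person_ids_raises_on_length_divergence` matches on; an explicit check keeps the error message under the code's control.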
@@ -31,6 +31,21 @@ def test_write_documents_xlsx_joins_lists(tmp_path):
     assert row["receiver_person_ids"] == "a|b"
     assert row["needs_review"] == "unparsed_date"
 
+
+def test_write_documents_xlsx_carries_file_and_date_end(tmp_path):
+    doc = documents.CanonicalDocument(
+        index="H-0730", file="H-0730.pdf", date_iso="1917-01-10",
+        date_precision="RANGE", date_end="1917-01-11")
+    out = tmp_path / "docs.xlsx"
+    writers.write_documents_xlsx([doc], out)
+    wb = openpyxl.load_workbook(out)
+    ws = wb.active
+    header = [c.value for c in ws[1]]
+    assert "file" in header and "date_end" in header
+    row = {h: c.value for h, c in zip(header, ws[2])}
+    assert row["file"] == "H-0730.pdf"
+    assert row["date_end"] == "1917-01-11"
+
 def test_write_documents_xlsx_pins_timestamp(tmp_path):
     # determinism (NFR-IDEM-01): workbook created/modified are pinned, not the current time
     doc = documents.CanonicalDocument(index="W-0001")
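`test_write_documents_xlsx_carries_file_and_date_end` covers the writer side of the two new columns. On the importer (#669) side, REQ-DATE-07 makes `date_end` optional even on a `RANGE` row, and `file` is optional per the field table. A sketch of per-row checks that honour this, assuming the row arrives as a dict keyed by `DOC_COLUMNS` and that a multi-valued `needs_review` cell is `|`-joined like the other list columns; `check_document_row` and the pairing rule (an empty `date_end` on `RANGE` should come with `range_end_unparsed`) are assumptions, not the importer's code.

```python
def check_document_row(row: dict) -> list[str]:
    """Hypothetical importer-side checks for one canonical-documents row.

    `row` maps DOC_COLUMNS names to cell values; empty cells may be None.
    Returns a list of problems; an empty list means the row is importable.
    """
    def cell(name):
        value = row.get(name)
        return "" if value is None else str(value).strip()

    problems = []
    if not cell("index"):
        problems.append("missing index")  # index is required per the field table
    # file is deliberately not checked: it is optional, a row without a PDF still imports
    flags = {f for f in cell("needs_review").split("|") if f}
    if cell("date_precision") == "RANGE" and not cell("date_end"):
        # REQ-DATE-07: an empty date_end on a RANGE row is legitimate. Assumption:
        # such a row should always carry the range_end_unparsed review flag.
        if "range_end_unparsed" not in flags:
            problems.append("RANGE without date_end and without range_end_unparsed flag")
    return problems
```

Normalising every cell through `cell()` matters because an empty cell may come back as `None` rather than `""` when the workbook is read with openpyxl, and a check written as `row["date_end"] == ""` would then misfire.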
@@ -22,9 +22,10 @@ def _csv_safe(value):
     return "'" + s if s[:1] in ("=", "+", "-", "@", "\t", "\r", "\n") else s
 
 
-DOC_COLUMNS = ["index", "box", "folder", "sender_person_id", "sender_name",
+DOC_COLUMNS = ["index", "file", "box", "folder", "sender_person_id", "sender_name",
                "receiver_person_ids", "receiver_names", "date_iso", "date_raw",
-               "date_precision", "location", "tags", "summary", "source_row", "needs_review"]
+               "date_precision", "date_end", "location", "tags", "summary",
+               "source_row", "needs_review"]
 
 PERSON_COLUMNS = ["person_id", "last_name", "first_name", "maiden_name", "title", "nickname",
                   "birth_date", "birth_date_raw", "birth_place", "death_date", "death_date_raw",
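The `DOC_COLUMNS` change puts `file` next to `index` and `date_end` after `date_precision`. A CSV-flavoured sketch of flattening one document into that column order, assuming a plain dict instead of `CanonicalDocument`; `to_csv_row` and `write_documents_csv` are hypothetical, and `csv_safe` is reconstructed from the one `_csv_safe` line visible in the diff (its `None` handling is an assumption).

```python
import csv
import io

DOC_COLUMNS = ["index", "file", "box", "folder", "sender_person_id", "sender_name",
               "receiver_person_ids", "receiver_names", "date_iso", "date_raw",
               "date_precision", "date_end", "location", "tags", "summary",
               "source_row", "needs_review"]


def csv_safe(value) -> str:
    # Reconstruction of writers._csv_safe: only its return line is visible, so the
    # None -> "" conversion here is an assumption.
    s = "" if value is None else str(value)
    # prefix a quote so spreadsheet apps don't evaluate the cell as a formula
    return "'" + s if s[:1] in ("=", "+", "-", "@", "\t", "\r", "\n") else s


def to_csv_row(doc: dict) -> list[str]:
    """Flatten one document dict into DOC_COLUMNS order; missing keys become ""."""
    out = []
    for col in DOC_COLUMNS:
        value = doc.get(col, "")
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)  # lists are "|"-joined, as in the writer tests
        out.append(csv_safe(value))
    return out


def write_documents_csv(docs: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(DOC_COLUMNS)
    for doc in docs:
        writer.writerow(to_csv_row(doc))
    return buf.getvalue()
```

Driving every row from the single `DOC_COLUMNS` list means a new column such as `date_end` only has to be added in one place for the header and every row to stay aligned.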