Commit Graph

2854 Commits

Author SHA1 Message Date
Marcel
97db718f81 docs(import): add unresolved-names plan + worklog entry
All checks were successful
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
CI / Backend Unit Tests (pull_request) Successful in 3m52s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Unit & Component Tests (pull_request) Successful in 4m13s
CI / Semgrep Security Scan (pull_request) Successful in 20s
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 16:01:18 +02:00
Marcel
06127724de docs(normalizer): document unresolved-names.csv review report
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:59:45 +02:00
Marcel
7c017eca2a test(normalizer): assert unresolved stat key + drop duplicate assertion
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:58:34 +02:00
Marcel
97ab9e38df feat(normalizer): unresolved-names report + fix ambiguous-pair over-flagging
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:54:37 +02:00
Marcel
f10b80a03f feat(normalizer): build_given_names from register + supplement
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:51:23 +02:00
Marcel
6478cc58ae feat(normalizer): classify_name + NameClass
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:47:40 +02:00
Marcel
a7c45b3a0e feat(normalizer): config tables for name classification
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:43:31 +02:00
Marcel
5ff0c25e10 chore: drop stray reader-dashboard test from this branch
All checks were successful
CI / Semgrep Security Scan (pull_request) Successful in 23s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s
CI / Unit & Component Tests (pull_request) Successful in 3m31s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m53s
CI / fail2ban Regex (pull_request) Successful in 41s
page.server.spec.ts picked up an unrelated reader-dashboard test case via
a cross-session staging race; restore it to match main so this PR only
touches the import-normalizer tool + docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:07:14 +02:00
Marcel
7ba3a29592 docs(import): record normalizer completion + dry-run results in worklog
Some checks failed
CI / Unit & Component Tests (pull_request) Failing after 1m17s
CI / OCR Service Tests (pull_request) Successful in 19s
CI / Backend Unit Tests (pull_request) Successful in 3m46s
CI / fail2ban Regex (pull_request) Successful in 41s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:56:20 +02:00
Marcel
d314fd9338 docs(normalizer): README + seed overrides
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:51:20 +02:00
Marcel
18d5a1e2da feat(normalizer): orchestrator + end-to-end integration test
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:46:13 +02:00
Marcel
df00ea4238 fix(normalizer): defang leading LF in CSV + assert pinned workbook timestamp
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:43:45 +02:00
Marcel
ff1a7c07f1 feat(normalizer): overrides loader + xlsx/csv writers
Recovered from an entangled commit: these files were correct but had been
bundled into an unrelated reader-dashboard commit by a concurrent session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:39:28 +02:00
Marcel
366b484815 test(normalizer): real provisional-vs-register collision + override-hits coverage
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:25:49 +02:00
Marcel
88c8063227 feat(normalizer): person resolution context + to_canonical
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:18:09 +02:00
Marcel
3066d3d3ff refactor(normalizer): harden triage index guard + index_file_mismatch tests
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:15:50 +02:00
Marcel
3e7ddea90a feat(normalizer): row extraction, triage, canonical record
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:12:48 +02:00
Marcel
75b3ca8b9e fix(normalizer): don't coerce boolean cells to 1/0
Add bool guard before the int branch in _cell_to_str so True/False
cells are preserved as "True"/"False" instead of "1"/"0". Add two
regression tests covering the fix and missing-sheet error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:11:19 +02:00
Marcel
74c4c390fc feat(normalizer): xlsx ingest + header mapping
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:08:30 +02:00
Marcel
29087319e6 test(normalizer): cover AliasIndex unambiguous first-name resolution
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:07:20 +02:00
Marcel
53457d9319 feat(normalizer): alias index with maiden/married/nickname resolution
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:04:11 +02:00
Marcel
2d97595e9c fix(normalizer): split_receivers returns [] for a geb.-only cell
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:02:35 +02:00
Marcel
a177077b40 feat(normalizer): receiver splitting
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:59:51 +02:00
Marcel
b7a2332861 fix(normalizer): suffix all members of a colliding person-id group
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:58:35 +02:00
Marcel
1da1a8d223 feat(normalizer): person register parsing
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:54:37 +02:00
Marcel
59715bdccd fix(normalizer): require day-dot in English month-first matcher (structural anti-shadow)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:53:05 +02:00
Marcel
53a661adb6 feat(normalizer): month/year, feast/season, range matchers + overrides
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:47:26 +02:00
Marcel
4942c0ea07 feat(normalizer): day-first month-name matcher
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:42:36 +02:00
Marcel
7edc002ebb feat(normalizer): roman-numeral month matcher
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:38:32 +02:00
Marcel
b43dd6cdd4 fix(normalizer): keep Task 5 scoped — drop year-only matcher (belongs to Task 8)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:36:48 +02:00
Marcel
cff486dda7 fix(normalizer): treat leading date qualifiers (nach/vor/…) as APPROX
_preprocess now sets approx=True when a leading marker is stripped; add
_match_year_only so bare years (e.g. "nach 1900" -> "1900") resolve to
1900-01-01/YEAR before being upgraded to APPROX. Strengthen
test_parse_approx_marker_upgrades_precision and add
test_parse_leading_qualifier_is_approx (11 tests, all pass).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:35:19 +02:00
Marcel
df14e6b1ee feat(normalizer): parse_date dispatch + iso/numeric matchers
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:30:07 +02:00
Marcel
1908dde859 feat(normalizer): year expansion century rule
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:27:26 +02:00
Marcel
4845e7a3c1 feat(normalizer): feast + season resolution
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:24:26 +02:00
Marcel
c6cceec6e9 feat(normalizer): Easter computus
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:21:39 +02:00
Marcel
8f6f4f2d62 feat(normalizer): scaffold tool + config tables
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:18:52 +02:00
Marcel
6f7aa643c9 docs(import): add normalizer implementation plan + apply persona review
17-task TDD plan for tools/import-normalizer/. Incorporates inline
6-persona review: content-deterministic idempotency, duplicate-index
fix, provisional-id collision guard, date-parser edge cases, multi-sender
split, CSV-injection defang, pinned deps.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 12:55:50 +02:00
Marcel
adfff420a5 docs(import): add import-migration analysis + normalizer spec
Document the raw archive spreadsheet findings (IMP-01..12) and a
requirements spec for an offline normalizer that produces a clean
canonical dataset before import. Local docs only; no Gitea issue yet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 12:32:37 +02:00
Marcel
8e9e3bba06 refactor(document): address review concerns from PR #660
All checks were successful
CI / Semgrep Security Scan (pull_request) Successful in 21s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s
nightly / deploy-staging (push) Successful in 2m2s
CI / Unit & Component Tests (push) Successful in 3m58s
CI / OCR Service Tests (push) Successful in 20s
CI / Backend Unit Tests (push) Successful in 3m50s
CI / fail2ban Regex (push) Successful in 44s
CI / Unit & Component Tests (pull_request) Successful in 3m29s
CI / Semgrep Security Scan (push) Successful in 21s
CI / OCR Service Tests (pull_request) Successful in 21s
CI / Backend Unit Tests (pull_request) Successful in 3m43s
CI / Compose Bucket Idempotency (push) Successful in 59s
CI / fail2ban Regex (pull_request) Successful in 45s
- Restore JavaDoc on DocumentSearchResult.of() and .paged() factory methods
- Remove redundant null guards on @Builder.Default collections in toListItem()
- Map DocumentListItem fields explicitly in DocumentMultiSelect before cast
- Add DocumentListItem required fields to docFactory in spec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 19:27:31 +02:00
Marcel
627fc44d99 fix(document): fix test regressions from DocumentListItem migration
All checks were successful
CI / Unit & Component Tests (pull_request) Successful in 3m32s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m46s
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 19s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m0s
- Use documentService.getDocumentById() in detail_stillReturnsTrainingLabels
  so the Document.full entity graph eager-loads trainingLabels
- Flatten makeItem() factory in DocumentList.svelte.test.ts (nested
  document: {} overrides broke item.id / item.documentDate access)
- Remove { document: {} } wrapper from DocumentMultiSelect.svelte.spec.ts
  mock responses — component now reads body.items directly as flat items
- Flatten single nested item in page.svelte.test.ts document list test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 19:19:28 +02:00
Marcel
6583226d79 refactor(document): migrate frontend from DocumentSearchItem to flat DocumentListItem
All components, specs, and the generated API client now use the new
DocumentListItem shape — flat access (item.title, item.sender) instead of
the removed item.document.* nesting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 19:19:28 +02:00
Marcel
41b205becc test(document): add LazyInit guard + detail regression tests; prune Document.list graph
Remove trainingLabels from Document.list entity graph now that DocumentListItem
does not touch that association. Integration tests guard against future
LazyInitializationException regressions and confirm Document.full still
loads trainingLabels for the detail endpoint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 19:19:28 +02:00
Marcel
f22dcaecb7 refactor(document): replace DocumentSearchItem with flat DocumentListItem DTO
Eliminates excessive data exposure (OWASP API3:2023) — transcription,
filePath, fileHash, thumbnailKey, scriptType and other detail-only fields
are no longer serialised in the list API response.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 19:19:03 +02:00
Marcel
1109ab917b docs(observability): ADR-024 + rotation runbook for grafana_reader
All checks were successful
CI / Backend Unit Tests (push) Successful in 3m35s
CI / fail2ban Regex (push) Successful in 42s
CI / Semgrep Security Scan (push) Successful in 19s
CI / Compose Bucket Idempotency (push) Successful in 1m3s
nightly / deploy-staging (push) Successful in 2m0s
CI / Unit & Component Tests (pull_request) Successful in 3m39s
CI / OCR Service Tests (pull_request) Successful in 22s
CI / Backend Unit Tests (pull_request) Successful in 3m53s
CI / fail2ban Regex (pull_request) Successful in 43s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m1s
CI / Unit & Component Tests (push) Successful in 3m39s
CI / OCR Service Tests (push) Successful in 20s
ADR-024 records the deliberate cross-domain link (obs-grafana joins
archiv-net to query archive-db via the SELECT-only grafana_reader role),
the rejected alternatives (Prometheus exporter, read replica, versioned
migration + flyway repair, hardcoded fallback), and the consequences —
specifically that a Grafana compromise gains TCP reach to archive-db
but is bounded by the role's least-privilege grants.

The DEPLOYMENT.md runbook documents the rotation procedure that
R__grafana_reader_password.sql now enables: bump GRAFANA_DB_PASSWORD,
restart backend (Flyway re-applies because the resolved checksum
changed), restart obs-grafana (datasource picks up the new env var).
Also calls out the fail-closed startup behavior so operators who hit
IllegalStateException know it is deliberate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 17:21:27 +02:00
Marcel
769984608b test(observability): expand grafana_reader coverage with write-deny + PII negatives
The original 4 tests asserted SELECT existed on the three granted tables
and was absent on app_users. That left two gaps a future migration could
slip through silently:

- INSERT/UPDATE/DELETE on the granted tables — if someone GRANTed write
  access on, say, documents to grafana_reader, the SELECT positives stay
  green and the boundary is breached invisibly.
- Other PII / sensitive tables — the single app_users negative checks
  one table; a wildcard "GRANT SELECT ON ALL TABLES IN SCHEMA public"
  would still leave it green by accident if app_users wasn't the only
  sensitive table.

Switch to a hasPrivilege(table, privilege) helper, add three write-deny
tests (INSERT/UPDATE/DELETE on each granted table), and replace the
single app_users negative with a parameterized sweep over app_users,
user_groups, persons, notifications, document_comments,
document_annotations, geschichten. New sensitive tables get added to
that list as they appear.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 17:21:01 +02:00
Marcel
c282f38170 feat(observability): own grafana_reader password via repeatable migration
V68 used to set the role's password in a versioned migration, which Flyway
applies exactly once per database. Rotating GRAFANA_DB_PASSWORD therefore
had no effect on the DB role — operators would need a manual ALTER ROLE
or a `flyway repair` that nobody documented. The shape conflated two
lifecycles: schema migration (one-shot, immutable) and credential
provisioning (rotatable).

Split into:
- V68 (versioned, immutable): creates the role and applies SELECT grants
  on audit_log, documents, transcription_blocks.
- R__grafana_reader_password.sql (repeatable): issues ALTER ROLE … PASSWORD
  with the placeholder. Flyway computes the checksum on the resolved
  content, so any change to GRAFANA_DB_PASSWORD changes the checksum and
  re-applies the migration on the next boot. Rotation becomes "bump env
  var + restart backend".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 17:20:35 +02:00
Marcel
3ea7f0b5b2 feat(observability): fail closed when GRAFANA_DB_PASSWORD is unset
FlywayConfig used to fall back to a hardcoded "changeme-grafana-db-password"
string when the env var was missing. That published a known credential for
the grafana_reader role (SELECT on audit_log, documents, transcription_blocks)
into git history and made silent fail-open the default for any deploy that
forgot the secret. Now resolution goes through Spring's Environment and
throws IllegalStateException at startup when the value is unset or blank —
same shape as UserDataInitializer's refusal to seed default admin creds.

Tests inject via the global GRAFANA_DB_PASSWORD entry in test-resources
application.properties so existing Flyway-loading test classes keep
booting without per-class TestPropertySource boilerplate. FlywayConfigTest
covers both branches against MockEnvironment without a Spring context.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 17:20:09 +02:00
Marcel
bcba4dab80 ci(observability): inject GRAFANA_DB_PASSWORD from Gitea secrets
All checks were successful
CI / fail2ban Regex (pull_request) Successful in 42s
CI / Semgrep Security Scan (pull_request) Successful in 20s
CI / Compose Bucket Idempotency (pull_request) Successful in 1m2s
CI / Unit & Component Tests (pull_request) Successful in 3m32s
CI / OCR Service Tests (pull_request) Successful in 20s
CI / Backend Unit Tests (pull_request) Successful in 3m30s
Wires the new GRAFANA_DB_PASSWORD secret through the deploy pipeline:

- docker-compose.prod.yml: backend env now passes GRAFANA_DB_PASSWORD
  through so Flyway V68 can resolve the ${grafanaDbPassword} placeholder
  in production and staging (it already worked in local dev via
  docker-compose.yml).
- release.yml + nightly.yml: declare GRAFANA_DB_PASSWORD as a required
  Gitea secret, write it into .env.production / .env.staging (consumed
  by archive-backend), and into /opt/familienarchiv/obs-secrets.env
  (consumed by obs-grafana's PostgreSQL datasource).

Operator action before the next deploy: add a GRAFANA_DB_PASSWORD value
to the Gitea repo secrets (openssl rand -hex 32).

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 20:21:27 +02:00
Marcel
a4a3e3b105 docs(architecture): show Grafana→PostgreSQL link for PO Overview dashboard
Adds the new read-only connection from Grafana to archive-db (via the
grafana_reader role) introduced by the PO Overview dashboard.

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 20:21:05 +02:00
Marcel
cac00ed711 docs(deployment): document GRAFANA_DB_PASSWORD across env tables
Adds GRAFANA_DB_PASSWORD to the observability-stack env-var table, the
Gitea secrets table, and the obs-secrets.env reference, so operators see
the variable wherever they look for related secrets.

Refs #651.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 20:21:05 +02:00