feat(tags): split flat tag taxonomy into documentType + event + freeform dimensions #325

Closed
opened 2026-04-24 13:26:09 +02:00 by marcel · 0 comments

Context

/admin/tags already informally groups tags into sections (Dokumententyp, Ereignisse, Alltag, …). All live in a single Tag table, but they serve at least three distinct semantic roles:

  • Document type — what kind of document is this? Examples with current usage counts: Telegramm (2), Schulrechnung (1), Kaufvertrag (1), Briefumschlag (2), Sterbeurkunde & Traueranzeige (1), Adressenliste (2). Should be a controlled vocabulary, single-valued per document.
  • Event — a dated family happening: Brautbriefe (129), Kondolenz (60), Tod Walter & Geburtstag Clara (1), Danksagung (1), Fest (1).
  • Theme / freeform topic — freeform context: Alltag (31), Erbauseinandersetzung (12), Alltag in Ruhrort, Briefwechsel Religionszugehörigkeit Lilli.

Currently all three live as rows in Tag, loosely grouped only by a spec file (docs/specs/admin-tag-overhaul.html). Consequences:

  • Tag auto-suggest during upload surfaces irrelevant matches (typing "Te…" for a theme returns Telegramm — a document type).
  • The /documents filter can't narrow by document type alone (the document-type filter and the topic filter compete in one UI).
  • Usage counts skew wildly (129 vs 1), making a flat tag cloud noisy.

Marcel already has a design spec at docs/specs/admin-tag-overhaul.html — reuse it for the admin UI.

Non-goals

  • No deletion of any current tag. All tags remain usable after the migration.
  • No mass re-tagging of documents beyond the automated migration mapping.
  • No English/Spanish renaming of existing German tag names — taxonomy stays in German.

Proposed model

Layer 1: introduce DocumentType enum on Document

  • DocumentType { LETTER, TELEGRAM, RECEIPT, CONTRACT, DEATH_CERTIFICATE, ADDRESS_LIST, ENVELOPE, POSTCARD, OTHER }.
  • Document.documentType: DocumentType NOT NULL DEFAULT 'LETTER'.
  • Single-valued. Required on upload (default LETTER).
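A minimal sketch of the enum, assuming the upload handler receives the type as an optional string. `parseOrDefault` is a hypothetical helper that only illustrates the "default LETTER" rule; in the Spring controller the conversion would normally happen during parameter binding instead.

```java
/** Controlled vocabulary for Document.documentType; exactly one value per document. */
enum DocumentType {
    LETTER, TELEGRAM, RECEIPT, CONTRACT, DEATH_CERTIFICATE,
    ADDRESS_LIST, ENVELOPE, POSTCARD, OTHER;

    /**
     * Hypothetical helper: a missing or blank value falls back to LETTER, mirroring
     * the column's NOT NULL DEFAULT; any name outside the enum is rejected.
     */
    static DocumentType parseOrDefault(String raw) {
        if (raw == null || raw.isBlank()) {
            return LETTER;
        }
        // Enum.valueOf throws IllegalArgumentException for unknown names (case-sensitive).
        return valueOf(raw.trim());
    }
}

class DocumentTypeDemo {
    public static void main(String[] args) {
        System.out.println(DocumentType.parseOrDefault(null));       // LETTER
        System.out.println(DocumentType.parseOrDefault("TELEGRAM")); // TELEGRAM
    }
}
```

Rejecting unknown names, rather than storing them, is what keeps the vocabulary controlled.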

Layer 2: extend Tag with a subtype column

  • Tag.tagType: TagType NOT NULL DEFAULT 'FREEFORM' — enum { FREEFORM, EVENT, DOCUMENT_TYPE_LEGACY }.
  • DOCUMENT_TYPE_LEGACY marks the migrated document-type tags so they can be displayed but not re-used during new uploads.
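A sketch of the subtype enum, assuming the typeahead gets a list of candidate tags before rendering. `suggestableOnUpload` and the `TagRow` record are hypothetical names for this illustration. The point is that legacy document-type tags are filtered out of suggestions rather than deleted.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Subtype column on Tag; FREEFORM is the column default. */
enum TagType {
    FREEFORM, EVENT, DOCUMENT_TYPE_LEGACY;

    /** Legacy document-type tags stay displayable but are never offered for new uploads. */
    boolean suggestableOnUpload() {
        return this != DOCUMENT_TYPE_LEGACY;
    }
}

/** Minimal stand-in for the Tag entity (hypothetical). */
record TagRow(String name, TagType tagType) {}

class TagTypeDemo {
    /** Drops legacy tags from a candidate list before it reaches the typeahead. */
    static List<TagRow> suggestable(List<TagRow> candidates) {
        return candidates.stream()
                .filter(t -> t.tagType().suggestableOnUpload())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TagRow> hits = List.of(
                new TagRow("Telegramm", TagType.DOCUMENT_TYPE_LEGACY),
                new TagRow("Alltag", TagType.FREEFORM));
        System.out.println(suggestable(hits)); // [TagRow[name=Alltag, tagType=FREEFORM]]
    }
}
```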

Migration (one-time, scripted)

A Flyway V<next>__document_type_and_tag_type.sql:

  1. Add documents.document_type column.
  2. Add tag.tag_type column.
  3. For each document-type tag name (hardcoded list based on the current data — see docs/specs/admin-tag-overhaul.html):
    • For every document with that tag, set document_type to the matching enum.
    • Mark the tag itself tag_type = DOCUMENT_TYPE_LEGACY.
    • Emit a report row (via an INSERT INTO migration_reports … inside the Flyway script, or a console log) so the operator can review.
  4. For each event-heavy tag (usage count ≥ 20 and the name is on a known event-name list), set tag_type = EVENT.
  5. Leave the rest as FREEFORM.

The migration must be idempotent and reversible in spirit: running it twice changes nothing, and dropping the two new columns restores the pre-migration state.
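The classification in steps 3 to 5 can be modelled in plain Java against in-memory stand-ins for the two tables. The real migration is SQL, so treat this as a model of the rules rather than the migration itself. The name-to-enum mapping and the event list are illustrative (the authoritative lists live in docs/specs/admin-tag-overhaul.html), and Schulrechnung → RECEIPT in particular is an assumption. One design assumption is worth flagging: a document is only retyped while it is still at the default LETTER. A document carrying two document-type tags therefore keeps the first match, and the same guard is what lets a second run produce an empty report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum DocumentType { LETTER, TELEGRAM, RECEIPT, CONTRACT, DEATH_CERTIFICATE, ADDRESS_LIST, ENVELOPE, POSTCARD, OTHER }

enum TagType { FREEFORM, EVENT, DOCUMENT_TYPE_LEGACY }

/** In-memory stand-in for a tag row, starting at the column default FREEFORM. */
final class Tag {
    final String name;
    TagType tagType = TagType.FREEFORM;
    Tag(String name) { this.name = name; }
}

/** In-memory stand-in for a documents row, starting at the column default LETTER. */
final class Doc {
    final int id;
    final List<Tag> tags;
    DocumentType documentType = DocumentType.LETTER;
    Doc(int id, List<Tag> tags) { this.id = id; this.tags = tags; }
}

final class TagMigration {
    // Illustrative mapping; the authoritative list is in docs/specs/admin-tag-overhaul.html.
    static final Map<String, DocumentType> LEGACY_NAMES = Map.of(
            "Telegramm", DocumentType.TELEGRAM,
            "Schulrechnung", DocumentType.RECEIPT, // assumed mapping
            "Kaufvertrag", DocumentType.CONTRACT,
            "Briefumschlag", DocumentType.ENVELOPE,
            "Sterbeurkunde & Traueranzeige", DocumentType.DEATH_CERTIFICATE,
            "Adressenliste", DocumentType.ADDRESS_LIST);
    // Illustrative "known event name list".
    static final Set<String> KNOWN_EVENTS = Set.of("Brautbriefe", "Kondolenz");
    static final int EVENT_MIN_USAGE = 20;

    /** Applies steps 3 to 5 and returns one report line per value it actually changed. */
    static List<String> run(List<Tag> tags, List<Doc> docs) {
        List<String> report = new ArrayList<>();
        Map<Tag, Integer> usage = new HashMap<>();
        for (Doc d : docs) {
            for (Tag t : d.tags) usage.merge(t, 1, Integer::sum);
        }
        // Step 3: document-type tags retype their documents and become legacy.
        for (Tag t : tags) {
            DocumentType target = LEGACY_NAMES.get(t.name);
            if (target == null) continue;
            for (Doc d : docs) {
                // Only retype documents still at the default: a second run changes nothing,
                // and a document with two document-type tags keeps the first match.
                if (d.tags.contains(t) && d.documentType == DocumentType.LETTER) {
                    d.documentType = target;
                    report.add("document " + d.id + " -> " + target);
                }
            }
            if (t.tagType != TagType.DOCUMENT_TYPE_LEGACY) {
                t.tagType = TagType.DOCUMENT_TYPE_LEGACY;
                report.add("tag " + t.name + " -> DOCUMENT_TYPE_LEGACY");
            }
        }
        // Step 4: frequently used tags on the known event list become EVENT.
        for (Tag t : tags) {
            boolean eventHeavy = usage.getOrDefault(t, 0) >= EVENT_MIN_USAGE
                    && KNOWN_EVENTS.contains(t.name);
            if (t.tagType == TagType.FREEFORM && eventHeavy) {
                t.tagType = TagType.EVENT;
                report.add("tag " + t.name + " -> EVENT");
            }
        }
        // Step 5: everything else keeps the FREEFORM default.
        return report;
    }
}

class TagMigrationDemo {
    public static void main(String[] args) {
        Tag telegramm = new Tag("Telegramm");
        Tag brautbriefe = new Tag("Brautbriefe");
        Tag alltag = new Tag("Alltag");
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc(0, List.of(telegramm, alltag)));
        for (int i = 1; i <= 20; i++) docs.add(new Doc(i, List.of(brautbriefe)));
        List<Tag> tags = List.of(telegramm, brautbriefe, alltag);

        System.out.println(TagMigration.run(tags, docs));
        // [document 0 -> TELEGRAM, tag Telegramm -> DOCUMENT_TYPE_LEGACY, tag Brautbriefe -> EVENT]
        System.out.println(TagMigration.run(tags, docs)); // []
    }
}
```

Returning only the rows that actually changed makes the report double as an idempotency check: an empty report on a re-run is the expected outcome.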

Implementation plan

Backend

  • Enum DocumentType (new).
  • Enum TagType (new).
  • Document.documentType field + @Schema(requiredMode = REQUIRED).
  • Tag.tagType field + @Schema(requiredMode = REQUIRED).
  • Flyway migration (above).
  • DocumentService + TagService methods to filter by type.
  • DocumentController.search accepts ?type=<DocumentType> filter.
  • TagController.search accepts ?tagType=<TagType> filter (used by typeahead).
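An in-memory sketch of the two service-level filters, assuming an absent query parameter means "no filter". `DocumentRow`, `TagRow` and `TypeFilters` are hypothetical stand-ins; a real implementation would presumably push the predicate into the repository query rather than filter in memory. On the controller side, Spring's `@RequestParam(required = false)` binds a missing parameter to `null` and converts a present value to the enum by name.

```java
import java.util.List;
import java.util.stream.Collectors;

enum DocumentType { LETTER, TELEGRAM, RECEIPT, CONTRACT, DEATH_CERTIFICATE, ADDRESS_LIST, ENVELOPE, POSTCARD, OTHER }

enum TagType { FREEFORM, EVENT, DOCUMENT_TYPE_LEGACY }

record DocumentRow(String title, DocumentType documentType) {}

record TagRow(String name, TagType tagType) {}

/** Hypothetical stand-in for the DocumentService/TagService filter methods. */
final class TypeFilters {
    /** A null type (parameter absent) returns everything. */
    static List<DocumentRow> documentsByType(List<DocumentRow> all, DocumentType type) {
        return all.stream()
                .filter(d -> type == null || d.documentType() == type)
                .collect(Collectors.toList());
    }

    /** A null tagType (parameter absent) returns everything. */
    static List<TagRow> tagsByType(List<TagRow> all, TagType tagType) {
        return all.stream()
                .filter(t -> tagType == null || t.tagType() == tagType)
                .collect(Collectors.toList());
    }
}

class TypeFiltersDemo {
    public static void main(String[] args) {
        List<DocumentRow> docs = List.of(
                new DocumentRow("doc-1", DocumentType.LETTER),
                new DocumentRow("doc-2", DocumentType.TELEGRAM));
        System.out.println(TypeFilters.documentsByType(docs, DocumentType.TELEGRAM).size()); // 1
        System.out.println(TypeFilters.documentsByType(docs, null).size());                  // 2
    }
}
```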

Frontend

  • Upload form (/documents/new): new dropdown for DocumentType (required, default LETTER).
  • Edit form (/documents/[id]/edit): same dropdown.
  • TagInput.svelte: when opened in the "event" context, narrow the typeahead to tagType=EVENT OR FREEFORM; in the "freeform" context, to FREEFORM only. Never surface DOCUMENT_TYPE_LEGACY tags in the typeahead.
  • /documents filter bar: new "Typ" facet driven by DocumentType enum.
  • /admin/tags: three tabs or filter chips — Freeform / Event / Dokumententyp (Legacy). Follow docs/specs/admin-tag-overhaul.html.
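The context rule for TagInput.svelte reduces to a small lookup. It is sketched here in Java to match the other examples; the component itself would express it in TypeScript. `TagInputContext` and `allowedTagTypes` are hypothetical names. Note that the event context needs two tag types, while TagController.search as planned accepts a single ?tagType=; a repeated parameter or two requests would be one way to bridge that, which is an assumption rather than part of the plan.

```java
import java.util.EnumSet;
import java.util.Set;

enum TagType { FREEFORM, EVENT, DOCUMENT_TYPE_LEGACY }

/** The two TagInput contexts named in the plan (hypothetical enum). */
enum TagInputContext { EVENT, FREEFORM }

final class TypeaheadRules {
    /** Tag types the typeahead may surface; DOCUMENT_TYPE_LEGACY is never included. */
    static Set<TagType> allowedTagTypes(TagInputContext context) {
        return switch (context) {
            case EVENT -> EnumSet.of(TagType.EVENT, TagType.FREEFORM);
            case FREEFORM -> EnumSet.of(TagType.FREEFORM);
        };
    }
}

class TypeaheadDemo {
    public static void main(String[] args) {
        System.out.println(TypeaheadRules.allowedTagTypes(TagInputContext.EVENT));    // [FREEFORM, EVENT]
        System.out.println(TypeaheadRules.allowedTagTypes(TagInputContext.FREEFORM)); // [FREEFORM]
    }
}
```

Using an exhaustive switch means that adding a new context later fails at compile time until its rule is defined.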

i18n

10–12 new keys for the UI (dropdown labels, tab names, filter facet).
Translations for each DocumentType enum value in de/en/es (exposed via a utility documentTypeLabel(type, locale)).
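A sketch of the documentTypeLabel(type, locale) contract, written in Java for consistency with the other examples. The real utility would read its strings from frontend/messages/{de,en,es}.json rather than hard-coding them. The labels below are placeholder translations, and falling back to German for any other locale is an assumption based on the taxonomy staying German.

```java
import java.util.Locale;

enum DocumentType {
    LETTER("Brief", "Letter", "Carta"),
    TELEGRAM("Telegramm", "Telegram", "Telegrama"),
    RECEIPT("Quittung", "Receipt", "Recibo"),
    CONTRACT("Vertrag", "Contract", "Contrato"),
    DEATH_CERTIFICATE("Sterbeurkunde", "Death certificate", "Certificado de defunción"),
    ADDRESS_LIST("Adressenliste", "Address list", "Lista de direcciones"),
    ENVELOPE("Briefumschlag", "Envelope", "Sobre"),
    POSTCARD("Postkarte", "Postcard", "Postal"),
    OTHER("Sonstiges", "Other", "Otro");

    // Placeholder labels; the real strings would live in the message files.
    private final String de, en, es;

    DocumentType(String de, String en, String es) {
        this.de = de;
        this.en = en;
        this.es = es;
    }

    /** Hypothetical Java counterpart of documentTypeLabel; unknown locales fall back to German. */
    static String documentTypeLabel(DocumentType type, Locale locale) {
        switch (locale.getLanguage()) {
            case "en": return type.en;
            case "es": return type.es;
            default: return type.de;
        }
    }
}

class LabelDemo {
    public static void main(String[] args) {
        System.out.println(DocumentType.documentTypeLabel(DocumentType.TELEGRAM, Locale.ENGLISH)); // Telegram
        System.out.println(DocumentType.documentTypeLabel(DocumentType.TELEGRAM, Locale.FRENCH));  // Telegramm
    }
}
```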

Tests

  • Backend unit (migration): apply on a seeded fixture with known document-type tags; assert documents get correct enum; assert tags get correct tagType; assert report count matches expected.
  • Controller tests: /api/documents/search?type=TELEGRAM returns only telegrams; /api/tags/search?tagType=EVENT returns only events.
  • Frontend: TagInput narrows suggestions by context. Upload form saves a DocumentType. /documents filter chip works.
  • E2E: upload a new doc with type=LETTER and 2 event tags; search with ?type=LETTER and confirm it is listed; search with ?type=TELEGRAM and confirm it is not.

Verification

Post-migration sanity checks:

  • SELECT document_type, COUNT(*) FROM documents GROUP BY document_type; — distribution matches the pre-migration tag counts.
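  • SELECT tag_type, COUNT(*) FROM tag GROUP BY tag_type; — expected Freeform > Event > Legacy. With the ≥ 20 usage threshold from step 4, only Brautbriefe and Kondolenz among the events listed above qualify as EVENT, while six document-type tags become Legacy, so Event may come out below Legacy; confirm against the real data.
  • No document lacks a document_type (no NULL values).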
  • Total tag count unchanged (nothing deleted).
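Three of the four checks above can be automated in a small function that compares query results, assuming those results are collected (by hand or via JDBC) before and after the migration. `MigrationSanityCheck` is a hypothetical name, and the per-type comparison assumes each document carried at most one document-type tag before the migration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical post-migration check over the results of the queries above. */
final class MigrationSanityCheck {
    /**
     * @param tagsBefore        SELECT COUNT(*) FROM tag, taken before the migration
     * @param tagsAfter         the same query afterwards
     * @param legacyUsageBefore pre-migration usage of each document-type tag, keyed by its target enum name
     * @param docsByTypeAfter   SELECT document_type, COUNT(*) FROM documents GROUP BY document_type
     * @param docsWithoutType   SELECT COUNT(*) FROM documents WHERE document_type IS NULL
     * @return human-readable failures; empty when all checks pass
     */
    static List<String> failures(long tagsBefore, long tagsAfter,
                                 Map<String, Long> legacyUsageBefore,
                                 Map<String, Long> docsByTypeAfter,
                                 long docsWithoutType) {
        List<String> failures = new ArrayList<>();
        if (tagsBefore != tagsAfter) {
            failures.add("tag count changed: " + tagsBefore + " -> " + tagsAfter);
        }
        if (docsWithoutType != 0) {
            failures.add(docsWithoutType + " documents without document_type");
        }
        // Assumes at most one document-type tag per document before the migration.
        for (Map.Entry<String, Long> e : legacyUsageBefore.entrySet()) {
            long after = docsByTypeAfter.getOrDefault(e.getKey(), 0L);
            if (after != e.getValue()) {
                failures.add(e.getKey() + ": expected " + e.getValue() + " documents, found " + after);
            }
        }
        return failures;
    }
}

class SanityCheckDemo {
    public static void main(String[] args) {
        Map<String, Long> before = Map.of("TELEGRAM", 2L, "ADDRESS_LIST", 2L);
        Map<String, Long> after = Map.of("LETTER", 10L, "TELEGRAM", 2L, "ADDRESS_LIST", 1L);
        System.out.println(MigrationSanityCheck.failures(50, 50, before, after, 0));
        // [ADDRESS_LIST: expected 2 documents, found 1]
    }
}
```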

Acceptance criteria

  • DocumentType enum exists on Document with non-null default
  • Tag.tagType column exists with non-null default
  • Flyway migration classifies existing tags idempotently
  • Migration emits a report / log of what was classified
  • Upload + edit forms show DocumentType dropdown
  • /documents filter supports type
  • TagInput narrows suggestions by context
  • /admin/tags separates Freeform / Event / Legacy
  • No data loss — pre-migration tag counts tally with post-migration combined count
  • i18n complete for de/en/es

Critical files

backend/src/main/java/org/raddatz/familienarchiv/model/Document.java                   (+ documentType)
backend/src/main/java/org/raddatz/familienarchiv/model/DocumentType.java               (new)
backend/src/main/java/org/raddatz/familienarchiv/model/Tag.java                        (+ tagType)
backend/src/main/java/org/raddatz/familienarchiv/model/TagType.java                    (new)
backend/src/main/resources/db/migration/V<next>__document_type_and_tag_type.sql        (new)
backend/src/main/java/org/raddatz/familienarchiv/service/DocumentService.java          (filter)
backend/src/main/java/org/raddatz/familienarchiv/service/TagService.java               (filter)
backend/src/main/java/org/raddatz/familienarchiv/controller/DocumentController.java    (search param)
backend/src/main/java/org/raddatz/familienarchiv/controller/TagController.java         (search param)
frontend/src/routes/documents/new/+page.svelte                                         (dropdown)
frontend/src/routes/documents/[id]/edit/+page.svelte                                   (dropdown)
frontend/src/routes/documents/+page.svelte                                             (filter)
frontend/src/routes/admin/tags/+page.svelte                                            (tabs)
frontend/src/lib/components/TagInput.svelte                                            (context)
frontend/src/lib/generated/api.ts                                                      (regen)
frontend/messages/{de,en,es}.json

Related

  • Existing spec: docs/specs/admin-tag-overhaul.html
  • Adjacent: tag cloud on /persons/[id] (#306 Korrespondenz-Überblick) must be updated to respect tagType after this merges.
marcel added the P2-medium, feature, refactor labels 2026-04-24 13:28:13 +02:00
Reference: marcel/familienarchiv#325