As a reader browsing topics I want each theme box to count documents across its whole sub-topic tree so I can judge a topic's depth at a glance #698

Closed
opened 2026-05-31 11:13:13 +02:00 by marcel · 1 comment
Owner

Context / problem

On /themen, each theme box shows a count of documents tagged with that exact tag (rows in document_tags, GROUP BY tag_id — never recursive). A parent tag whose documents mostly live on its children therefore reads as near-empty, understating how much material a topic actually holds. The reader surfaces should instead show the whole subtree.

Strongest rationale: the document search at /documents?tag=X already expands the tag to all its descendants (DocumentService:531TagRepository.findDescendantIdsByName). So today a theme box shows "Reisen — 2" but clicking it lands the user on a page with 7+ results — the count and its own destination disagree. A subtree rollup makes the box number equal what the user actually finds. Count↔destination parity is the core win.

User story

As a reader browsing topics, I want each theme box to show how many documents exist under that topic including its sub-topics, so that I can judge how rich a topic is without drilling into every child.

Decisions (resolved with product owner)

  • Scope of count: tag + all descendants (full subtree rollup), not just direct children, not literal siblings.
  • Double counting: count distinct documents — a document tagged with several tags inside the same subtree counts once.
  • Display / contract — Option B (new field, not a global redefinition): add a new field subtreeDocumentCount to TagTreeNodeDTO. The reader surfaces read the rollup; the admin surfaces keep the existing documentCount (direct count) unchanged.
    • Reader (use subtreeDocumentCount): /themen page (themen/+page.svelte) and the dashboard ThemenWidget.svelte.
    • Admin (keep documentCount, direct): admin/tags sidebar (TagTreeNode.svelte), and the two operation previews TagMergeZone.svelte and TagDeleteGuard.svelte.
    • Why not redefine documentCount globally (Option A): documentCount is read as logic input by TagMergeZone (line 87, "N documents will move") and TagDeleteGuard (line 54, delete-impact warning). Both describe direct-document operations (merge re-tags only direct docs; single-tag delete removes only direct document_tags). A rollup there would misstate a destructive action's effect. Option B keeps those previews truthful and requires no admin rework.
  • Child ordering on /themen may change as a result of the new counts — accepted.
  • Performance target ≤300 ms server-side for the current tag corpus — confirmed (validate in Grafana, not as a test gate; see NFRs).

Behaviour rules (EARS)

  • REQ-THEMEN-01: The system shall compute, for every tag, subtreeDocumentCount = the count of distinct documents tagged with that tag or any of its descendant tags.
  • REQ-THEMEN-02: If a document is tagged with multiple tags within the same subtree (e.g. both a parent and its child), then subtreeDocumentCount shall count that document exactly once.
  • REQ-THEMEN-03: The system shall continue to populate documentCount with the existing direct per-tag count, unchanged. Both fields are returned by GET /api/tags/tree.
  • REQ-THEMEN-04: The reader surfaces (/themen page and ThemenWidget) shall display subtreeDocumentCount; the admin surfaces (sidebar tree, merge preview, delete-impact guard) shall continue to display documentCount.
  • REQ-THEMEN-05: While a tag's entire subtree contains zero documents (subtreeDocumentCount == 0), the system shall omit that tag from the /themen page (current empty-tag hiding behaviour is preserved, now keyed on the rollup).
  • REQ-THEMEN-06: If the tag hierarchy contains a cycle or exceeds the existing depth guard, then the rollup shall terminate safely without error (reuse the current 50-level recursion guard).

Acceptance criteria (Given-When-Then)

  1. Given a root tag Reisen with 2 direct documents and a child Reisen → Italien with 5 documents (none shared), when I view /themen, then the Reisen box header shows 7 (subtreeDocumentCount).
  2. Given a document tagged with both Reisen and Reisen → Italien, when subtreeDocumentCount for Reisen is computed, then that document is counted once (if the 2 and 5 overlap by one document, the box shows 6, not 7). Make-or-break case — distinguishes COUNT(DISTINCT) from a naive sum-of-children.
  3. Given a grandchild tag Reisen → Italien → Rom with 3 documents, when I view the Reisen box, then those 3 are included (full descendant depth, not just direct children).
  4. Given a leaf tag with no children, when I view its box count, then subtreeDocumentCount equals its documentCount (rollup of a leaf = its direct count).
  5. Given a parent tag with 0 direct documents but children that have documents, when I view /themen, then its box shows a non-zero total (today it shows no number).
  6. Given a tag whose entire subtree has 0 documents, when the page loads, then the tag is not rendered (unchanged).
  7. Given a uniquely-named tag T, when I compare the /themen box count for T against the number of results on /documents?tag=T, then the two are equal (count↔destination parity — both expand to descendants distinct). Fixture must use a unique name; name-based search vs ID-based rollup only align when names don't collide.
  8. Given an admin viewing TagMergeZone / TagDeleteGuard for a tag, when the preview/impact number is shown, then it equals the tag's direct documentCount (admin previews are unchanged by this feature).

Non-functional requirements

  • NFR-PERF-THEMEN-01: The /themen tree, including all rolled-up distinct counts, shall be produced in a constant number of aggregate queries (no N+1 / no per-tag counting) — the existing direct-count aggregate plus the new subtree-rollup query. Assert this structurally in tests; do not encode the ≤300 ms wall-clock as a JUnit assertion (flaky on a loaded CI runner). Validate the 300 ms target via the Actuator → Prometheus → Grafana http_server_requests panel for /api/tags/tree.
  • Counts remain global (not scoped to any search filter). Safe: the endpoint is authenticated (TagControllerTest:95 asserts 401 for anonymous), and there is no per-document ACL, so a subtree count leaks nothing a READ_ALL user can't already enumerate.

Implementation notes (from review)

  • Compute the rollup in one recursive CTE in PostgreSQL — build a tag-closure of (ancestor_id, descendant_id) pairs (recursive, depth guard ≤50, reusing the existing TagRepository CTE idiom), join to document_tags on descendant_id, then GROUP BY ancestor_id with COUNT(DISTINCT document_id). One query, all tags. bigint → int via the existing TagCount.getCount().intValue() path.
  • Keep the existing findDocumentCountsPerTag (GROUP BY tag_id) for documentCount — direct count is unchanged. buildTree() maps both counts onto each node.
  • Do NOT roll up by summing child documentCount ints in Java — overlapping documents make that wrong (AC #2). Distinct-count belongs in the DB.
  • DTO + codegen: add @Schema(requiredMode = REQUIRED) int subtreeDocumentCount to TagTreeNodeDTO, then npm run generate:api in frontend/.
  • Frontend wiring:
    • /themen +page.svelte and ThemenWidget.svelte → display subtreeDocumentCount; their visibility filter keys on subtreeDocumentCount > 0.
    • hasAnyDocuments ($lib/shared/utils/tagUtils.ts) currently checks documentCount recursively. For the reader surfaces switch the check to subtreeDocumentCount > 0 (a single field read — the recursion becomes unnecessary). Leave admin consumers untouched.
    • Admin components (TagTreeNode, TagMergeZone, TagDeleteGuard) → no change, continue reading documentCount.
  • Tests (TDD, real Postgres via Testcontainers postgres:16-alpine — not H2; recursive CTE + COUNT(DISTINCT) needs real PG):
    • Backend integration: one test per AC 2/3/4/6 + cycle guard (REQ-THEMEN-06).
    • Parity test (AC #7): uniquely-named root with descendants; assert themenBoxCount(T) === documentSearchResultCount(?tag=T).
    • Characterization tests for the admin previews (AC #8): pin admin_tag_merge_preview_docs and admin_tag_delete_impact to the direct documentCount — none exist today, so add them so a future change can't silently desync a destructive-action preview.
    • The existing TagServiceTest.getTagTree_populatesDocumentCount_fromAggregateQuery stays green (direct count unchanged); getTagTree_callsFindDocumentCountsPerTag_exactlyOnce needs updating — getTagTree now also issues the rollup query.
    • Component test (vitest-browser-svelte): parent with documentCount: 0 but subtreeDocumentCount > 0/themen header shows the rollup (AC #5).
    • Respect the 88% branch coverage gate.
  • Fix the stale JavaDoc on TagService.getTagTree() (lines 177-178) — it claims the tree is "only used for the admin sidebar," which is false (themen + dashboard widget + admin). Update backend/.../tag/README.md.
  • Known pre-existing caveat (not this issue's job): theme box links key on tag name (/documents?tag={tag.name}); rollup is per tag ID. If two distinct tags share a name, name-based search and per-ID rollup diverge together.

Scope / docs

  • No new Flyway migration, no schema change, no new service → no DB/C4 diagram updates triggered. A new DTO field requires generate:api. Doc touch-ups: the getTagTree() JavaDoc + tag/README.md. No ADR required.

Out of scope / parked (Release +1, "Could")

  • Surfacing subtreeDocumentCount on the admin sidebar too (e.g. "12 direct · 200 in subtree") if admins later want subtree context alongside the direct count.
## Context / problem On `/themen`, each theme box shows a count of documents tagged with that *exact* tag (rows in `document_tags`, `GROUP BY tag_id` — never recursive). A parent tag whose documents mostly live on its children therefore reads as near-empty, understating how much material a topic actually holds. The reader surfaces should instead show the whole subtree. **Strongest rationale:** the document search at `/documents?tag=X` *already expands the tag to all its descendants* (`DocumentService:531` → `TagRepository.findDescendantIdsByName`). So today a theme box shows "Reisen — 2" but clicking it lands the user on a page with 7+ results — **the count and its own destination disagree.** A subtree rollup makes the box number equal what the user actually finds. Count↔destination parity is the core win. ## User story > As a **reader browsing topics**, I want each theme box to show how many documents exist under that topic *including its sub-topics*, so that I can judge how rich a topic is without drilling into every child. ## Decisions (resolved with product owner) - **Scope of count:** tag + **all descendants** (full subtree rollup), not just direct children, not literal siblings. - **Double counting:** count **distinct documents** — a document tagged with several tags inside the same subtree counts **once**. - **Display / contract — Option B (new field, not a global redefinition):** add a new field **`subtreeDocumentCount`** to `TagTreeNodeDTO`. The **reader surfaces** read the rollup; the **admin surfaces keep the existing `documentCount` (direct count) unchanged.** - Reader (use `subtreeDocumentCount`): `/themen` page (`themen/+page.svelte`) **and** the dashboard `ThemenWidget.svelte`. - Admin (keep `documentCount`, direct): `admin/tags` sidebar (`TagTreeNode.svelte`), and the two operation previews `TagMergeZone.svelte` and `TagDeleteGuard.svelte`. - **Why not redefine `documentCount` globally (Option A):** `documentCount` is read as *logic input* by `TagMergeZone` (line 87, "N documents will move") and `TagDeleteGuard` (line 54, delete-impact warning). Both describe **direct-document** operations (merge re-tags only direct docs; single-tag delete removes only direct `document_tags`). A rollup there would misstate a destructive action's effect. Option B keeps those previews truthful and requires no admin rework. - **Child ordering on `/themen` may change** as a result of the new counts — accepted. - **Performance target ≤300 ms** server-side for the current tag corpus — confirmed (validate in Grafana, not as a test gate; see NFRs). ## Behaviour rules (EARS) - **REQ-THEMEN-01:** The system shall compute, for every tag, `subtreeDocumentCount` = the count of **distinct** documents tagged with that tag **or any of its descendant tags**. - **REQ-THEMEN-02:** If a document is tagged with multiple tags within the same subtree (e.g. both a parent and its child), then `subtreeDocumentCount` shall count that document **exactly once**. - **REQ-THEMEN-03:** The system shall continue to populate `documentCount` with the existing **direct** per-tag count, unchanged. Both fields are returned by `GET /api/tags/tree`. - **REQ-THEMEN-04:** The reader surfaces (`/themen` page and `ThemenWidget`) shall display `subtreeDocumentCount`; the admin surfaces (sidebar tree, merge preview, delete-impact guard) shall continue to display `documentCount`. - **REQ-THEMEN-05:** While a tag's entire subtree contains zero documents (`subtreeDocumentCount == 0`), the system shall omit that tag from the `/themen` page (current empty-tag hiding behaviour is preserved, now keyed on the rollup). - **REQ-THEMEN-06:** If the tag hierarchy contains a cycle or exceeds the existing depth guard, then the rollup shall terminate safely without error (reuse the current 50-level recursion guard). ## Acceptance criteria (Given-When-Then) 1. **Given** a root tag *Reisen* with 2 direct documents and a child *Reisen → Italien* with 5 documents (none shared), **when** I view `/themen`, **then** the *Reisen* box header shows **7** (`subtreeDocumentCount`). 2. **Given** a document tagged with both *Reisen* and *Reisen → Italien*, **when** `subtreeDocumentCount` for *Reisen* is computed, **then** that document is counted **once** (if the 2 and 5 overlap by one document, the box shows **6**, not 7). _Make-or-break case — distinguishes `COUNT(DISTINCT)` from a naive sum-of-children._ 3. **Given** a grandchild tag *Reisen → Italien → Rom* with 3 documents, **when** I view the *Reisen* box, **then** those 3 are included (full descendant depth, not just direct children). 4. **Given** a leaf tag with no children, **when** I view its box count, **then** `subtreeDocumentCount` equals its `documentCount` (rollup of a leaf = its direct count). 5. **Given** a parent tag with **0 direct documents** but children that have documents, **when** I view `/themen`, **then** its box shows a **non-zero** total (today it shows no number). 6. **Given** a tag whose entire subtree has 0 documents, **when** the page loads, **then** the tag is **not rendered** (unchanged). 7. **Given** a uniquely-named tag *T*, **when** I compare the `/themen` box count for *T* against the number of results on `/documents?tag=T`, **then** the two are **equal** (count↔destination parity — both expand to descendants distinct). _Fixture must use a unique name; name-based search vs ID-based rollup only align when names don't collide._ 8. **Given** an admin viewing `TagMergeZone` / `TagDeleteGuard` for a tag, **when** the preview/impact number is shown, **then** it equals the tag's **direct** `documentCount` (admin previews are unchanged by this feature). ## Non-functional requirements - **NFR-PERF-THEMEN-01:** The `/themen` tree, including all rolled-up distinct counts, shall be produced in a **constant number of aggregate queries (no N+1 / no per-tag counting)** — the existing direct-count aggregate plus the new subtree-rollup query. Assert this **structurally** in tests; **do not** encode the ≤300 ms wall-clock as a JUnit assertion (flaky on a loaded CI runner). Validate the 300 ms target via the Actuator → Prometheus → **Grafana** `http_server_requests` panel for `/api/tags/tree`. - Counts remain **global** (not scoped to any search filter). Safe: the endpoint is **authenticated** (`TagControllerTest:95` asserts 401 for anonymous), and there is no per-document ACL, so a subtree count leaks nothing a `READ_ALL` user can't already enumerate. ## Implementation notes (from review) - **Compute the rollup in one recursive CTE in PostgreSQL** — build a **tag-closure** of `(ancestor_id, descendant_id)` pairs (recursive, depth guard ≤50, reusing the existing `TagRepository` CTE idiom), join to `document_tags` on `descendant_id`, then `GROUP BY ancestor_id` with `COUNT(DISTINCT document_id)`. One query, all tags. `bigint → int` via the existing `TagCount.getCount().intValue()` path. - **Keep the existing `findDocumentCountsPerTag` (`GROUP BY tag_id`) for `documentCount`** — direct count is unchanged. `buildTree()` maps **both** counts onto each node. - **Do NOT roll up by summing child `documentCount` ints in Java** — overlapping documents make that wrong (AC #2). Distinct-count belongs in the DB. - **DTO + codegen:** add `@Schema(requiredMode = REQUIRED) int subtreeDocumentCount` to `TagTreeNodeDTO`, then **`npm run generate:api`** in `frontend/`. - **Frontend wiring:** - `/themen` `+page.svelte` and `ThemenWidget.svelte` → display `subtreeDocumentCount`; their visibility filter keys on `subtreeDocumentCount > 0`. - `hasAnyDocuments` (`$lib/shared/utils/tagUtils.ts`) currently checks `documentCount` recursively. For the reader surfaces switch the check to `subtreeDocumentCount > 0` (a single field read — the recursion becomes unnecessary). Leave admin consumers untouched. - Admin components (`TagTreeNode`, `TagMergeZone`, `TagDeleteGuard`) → **no change**, continue reading `documentCount`. - **Tests (TDD, real Postgres via Testcontainers `postgres:16-alpine` — not H2; recursive CTE + `COUNT(DISTINCT)` needs real PG):** - Backend integration: one test per AC 2/3/4/6 + cycle guard (REQ-THEMEN-06). - **Parity test (AC #7):** uniquely-named root with descendants; assert `themenBoxCount(T) === documentSearchResultCount(?tag=T)`. - **Characterization tests for the admin previews (AC #8):** pin `admin_tag_merge_preview_docs` and `admin_tag_delete_impact` to the **direct** `documentCount` — none exist today, so add them so a future change can't silently desync a destructive-action preview. - The existing `TagServiceTest.getTagTree_populatesDocumentCount_fromAggregateQuery` stays green (direct count unchanged); `getTagTree_callsFindDocumentCountsPerTag_exactlyOnce` needs updating — `getTagTree` now also issues the rollup query. - Component test (`vitest-browser-svelte`): parent with `documentCount: 0` but `subtreeDocumentCount > 0` → `/themen` header shows the rollup (AC #5). - Respect the **88% branch coverage** gate. - **Fix the stale JavaDoc** on `TagService.getTagTree()` (lines 177-178) — it claims the tree is "only used for the admin sidebar," which is false (themen + dashboard widget + admin). Update `backend/.../tag/README.md`. - **Known pre-existing caveat (not this issue's job):** theme box links key on tag **name** (`/documents?tag={tag.name}`); rollup is per tag **ID**. If two distinct tags share a name, name-based search and per-ID rollup diverge together. ## Scope / docs - No new Flyway migration, no schema change, no new service → no DB/C4 diagram updates triggered. A new DTO field requires `generate:api`. Doc touch-ups: the `getTagTree()` JavaDoc + `tag/README.md`. No ADR required. ## Out of scope / parked (Release +1, "Could") - Surfacing `subtreeDocumentCount` on the admin sidebar too (e.g. "12 direct · 200 in subtree") if admins later want subtree context alongside the direct count.
marcel added the P2-mediumfeatureui labels 2026-05-31 11:13:18 +02:00
Author
Owner

Implemented

Built on a worktree branch (worktree-feat+issue-698-themen-subtree-count) following Option B — a new subtreeDocumentCount field, reader surfaces read the rollup, admin surfaces keep the direct documentCount.

Backend

  • 335ef18a feat(tag) — added subtreeDocumentCount to TagTreeNodeDTO; getTagTree() now maps both counts onto each node from two aggregate queries (no N+1, NFR-PERF-THEMEN-01 ). New TagRepository.findSubtreeDocumentCountsPerTag() is one recursive-CTE tag closure (depth guard ≤50) ⋈ document_tags, GROUP BY ancestor_id with COUNT(DISTINCT document_id)REQ-THEMEN-01/02/03/06. Stale getTagTree() JavaDoc corrected.
  • 08a7f892 test(tag) — Testcontainers postgres:16-alpine integration tests: leaf = direct (AC#4), shared doc counted once (AC#1+#2), full grandchild depth (AC#3), empty subtree absent (REQ-THEMEN-05), cycle terminates via the guard (REQ-THEMEN-06), and rollup == distinct documents from the real findDescendantIdsByName + DocumentSpecifications.hasTags search expansion (AC#7 parity).
  • 4e8962a0 docs(tag)tag/README.md now documents the two counts and which surfaces read which.

Frontend

  • e8377b57 feat(themen) — regenerated TagTreeNodeDTO type; hasAnyDocuments now keys on subtreeDocumentCount > 0 (single field read, recursion dropped). Note: the generated type was hand-edited to match generate:api output since codegen needs a live dev backend — re-running npm run generate:api is byte-equivalent.
  • 1fd8c2b2 feat(themen)/themen page (header, child rows, aria-labels) and dashboard ThemenWidget display subtreeDocumentCount. A parent with 0 direct docs but populated children now shows a non-zero total (AC#5, REQ-THEMEN-04).
  • aa9d4aaf test(tag) — admin sidebar fixtures gain the now-required field (no behaviour change).
  • 98be732b test(admin-tags) — characterization tests pinning the merge preview and delete-impact warning to the direct documentCount; each passes a stray subtreeDocumentCount and asserts it does not leak (AC#8).

Validation

  • ./mvnw clean package -DskipTests → BUILD SUCCESS.
  • Tag backend tests green: TagServiceTest (43), TagControllerTest (18), TagRollupRepositoryIntegrationTest (6).
  • npm run lint clean; npm run check adds zero new type errors (browser-mode component tests run on CI per project convention).

Notes / deferred

  • Browser-mode component tests (themen page AC#5, ThemenWidget, admin characterization) are validated on CI, not run locally (project convention).
  • "Could" items (subtree context on the admin sidebar) remain out of scope as parked.

Next: open a PR from this branch.

## Implemented ✅ Built on a worktree branch (`worktree-feat+issue-698-themen-subtree-count`) following Option B — a new `subtreeDocumentCount` field, reader surfaces read the rollup, admin surfaces keep the direct `documentCount`. ### Backend - **`335ef18a`** `feat(tag)` — added `subtreeDocumentCount` to `TagTreeNodeDTO`; `getTagTree()` now maps **both** counts onto each node from two aggregate queries (no N+1, **NFR-PERF-THEMEN-01** ✅). New `TagRepository.findSubtreeDocumentCountsPerTag()` is one recursive-CTE tag closure (depth guard ≤50) ⋈ `document_tags`, `GROUP BY ancestor_id` with `COUNT(DISTINCT document_id)` — **REQ-THEMEN-01/02/03/06**. Stale `getTagTree()` JavaDoc corrected. - **`08a7f892`** `test(tag)` — Testcontainers `postgres:16-alpine` integration tests: leaf = direct (**AC#4**), shared doc counted once (**AC#1+#2**), full grandchild depth (**AC#3**), empty subtree absent (**REQ-THEMEN-05**), cycle terminates via the guard (**REQ-THEMEN-06**), and rollup == distinct documents from the real `findDescendantIdsByName` + `DocumentSpecifications.hasTags` search expansion (**AC#7 parity**). - **`4e8962a0`** `docs(tag)` — `tag/README.md` now documents the two counts and which surfaces read which. ### Frontend - **`e8377b57`** `feat(themen)` — regenerated `TagTreeNodeDTO` type; `hasAnyDocuments` now keys on `subtreeDocumentCount > 0` (single field read, recursion dropped). _Note: the generated type was hand-edited to match `generate:api` output since codegen needs a live dev backend — re-running `npm run generate:api` is byte-equivalent._ - **`1fd8c2b2`** `feat(themen)` — `/themen` page (header, child rows, aria-labels) and dashboard `ThemenWidget` display `subtreeDocumentCount`. A parent with 0 direct docs but populated children now shows a non-zero total (**AC#5**, **REQ-THEMEN-04**). - **`aa9d4aaf`** `test(tag)` — admin sidebar fixtures gain the now-required field (no behaviour change). - **`98be732b`** `test(admin-tags)` — characterization tests pinning the merge preview and delete-impact warning to the **direct** `documentCount`; each passes a stray `subtreeDocumentCount` and asserts it does not leak (**AC#8**). ### Validation - `./mvnw clean package -DskipTests` → BUILD SUCCESS. - Tag backend tests green: `TagServiceTest` (43), `TagControllerTest` (18), `TagRollupRepositoryIntegrationTest` (6). - `npm run lint` clean; `npm run check` adds **zero** new type errors (browser-mode component tests run on CI per project convention). ### Notes / deferred - Browser-mode component tests (themen page AC#5, ThemenWidget, admin characterization) are validated on CI, not run locally (project convention). - "Could" items (subtree context on the admin sidebar) remain out of scope as parked. Next: open a PR from this branch.
Sign in to join this conversation.
No Label P2-medium feature ui
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#698