feat: extend annotations to support polygon shapes (4-point quadrilateral) #227

Closed
opened 2026-04-12 11:43:44 +02:00 by marcel · 9 comments
Owner

Overview

Currently DocumentAnnotation stores axis-aligned bounding boxes (x, y, width, height). This is sufficient for manually drawn rectangles and Surya OCR output, but loses precision for Kraken OCR output — Kraken uses baseline segmentation that returns polygon boundaries for historical handwriting (Kurrent, Sütterlin), where text lines curve and slant.

Extending annotations to support an optional quadrilateral (4 corner points) would allow Kraken-generated blocks to be stored and displayed with precise shape, while remaining fully backward compatible with all existing rectangular annotations.

Tracked as a mid-term goal from the OCR pipeline design discussion in #226.


Problem

Kraken's segmentation outputs polygon boundaries, not rectangles. For Kurrent text lines the difference is significant — a line may be rotated 5–10° or follow a slight curve. Today the OCR pipeline computes the AABB (axis-aligned bounding box) of each Kraken polygon before storing it. The resulting annotation covers the right region but is visually imprecise ��� it bleeds into adjacent lines and doesn't highlight the actual text area.


Proposed Solution

Database

Add an optional polygon column to document_annotations:

ALTER TABLE document_annotations
ADD COLUMN polygon JSONB;
  • null → rectangle, rendered using existing x, y, width, height fields (fully backward compatible)
  • non-null → normalized 4-point quadrilateral: [[x1,y1],[x2,y2],[x3,y3],[x4,y4]] with coordinates in 0–1 range relative to page dimensions

The existing x, y, width, height fields are retained as the axis-aligned bounding box of the polygon — used for overlap detection and as a fallback for clients that don't yet support polygons.

Backend

  • DocumentAnnotation entity: add @Column(columnDefinition = "jsonb") private String polygon
  • CreateAnnotationDTO: add optional polygon field
  • AnnotationService: pass through polygon without modification; overlap detection continues to use the AABB fields
  • OpenAPI spec: expose polygon on the annotation response type

OCR Service (Python)

Instead of computing a simple AABB from Kraken's polygon output, approximate to a quadrilateral:

from scipy.spatial import ConvexHull
import numpy as np

def polygon_to_quad(points, page_w, page_h):
    hull = ConvexHull(np.array(points))
    hull_pts = [points[i] for i in hull.vertices]
    quad = approximate_to_quad(hull_pts)  # 4-point simplification
    return [[p[0]/page_w, p[1]/page_h] for p in quad]

Return the quad alongside the AABB in the OCR response:

{
  "pageNumber": 0,
  "x": 0.10, "y": 0.08, "width": 0.78, "height": 0.06,
  "polygon": [[0.10,0.08],[0.88,0.09],[0.87,0.14],[0.11,0.13]]
}

Frontend

  • PdfViewer: render polygon overlay using SVG <polygon> when annotation.polygon is present, fall back to <rect> for null
  • Annotation type in types.ts: add polygon?: [number, number][] | null

Out of Scope

  • Free-form polygon drawing tool for manual annotations (users still draw rectangles)
  • More than 4 points per annotation — quadrilateral is sufficient for rotated/slanted lines
  • Changing overlap detection to use polygon geometry instead of AABB

Depends On

  • #226 — OCR pipeline must ship first; this improves its Kraken output quality
## Overview Currently `DocumentAnnotation` stores axis-aligned bounding boxes (`x, y, width, height`). This is sufficient for manually drawn rectangles and Surya OCR output, but loses precision for Kraken OCR output — Kraken uses baseline segmentation that returns polygon boundaries for historical handwriting (Kurrent, Sütterlin), where text lines curve and slant. Extending annotations to support an optional quadrilateral (4 corner points) would allow Kraken-generated blocks to be stored and displayed with precise shape, while remaining fully backward compatible with all existing rectangular annotations. Tracked as a mid-term goal from the OCR pipeline design discussion in #226. --- ## Problem Kraken's segmentation outputs polygon boundaries, not rectangles. For Kurrent text lines the difference is significant — a line may be rotated 5–10° or follow a slight curve. Today the OCR pipeline computes the AABB (axis-aligned bounding box) of each Kraken polygon before storing it. The resulting annotation covers the right region but is visually imprecise ��� it bleeds into adjacent lines and doesn't highlight the actual text area. --- ## Proposed Solution ### Database Add an optional `polygon` column to `document_annotations`: ```sql ALTER TABLE document_annotations ADD COLUMN polygon JSONB; ``` - `null` → rectangle, rendered using existing `x, y, width, height` fields (fully backward compatible) - non-null → normalized 4-point quadrilateral: `[[x1,y1],[x2,y2],[x3,y3],[x4,y4]]` with coordinates in 0–1 range relative to page dimensions The existing `x, y, width, height` fields are retained as the axis-aligned bounding box of the polygon — used for overlap detection and as a fallback for clients that don't yet support polygons. ### Backend - `DocumentAnnotation` entity: add `@Column(columnDefinition = "jsonb") private String polygon` - `CreateAnnotationDTO`: add optional `polygon` field - `AnnotationService`: pass through polygon without modification; overlap detection continues to use the AABB fields - OpenAPI spec: expose `polygon` on the annotation response type ### OCR Service (Python) Instead of computing a simple AABB from Kraken's polygon output, approximate to a quadrilateral: ```python from scipy.spatial import ConvexHull import numpy as np def polygon_to_quad(points, page_w, page_h): hull = ConvexHull(np.array(points)) hull_pts = [points[i] for i in hull.vertices] quad = approximate_to_quad(hull_pts) # 4-point simplification return [[p[0]/page_w, p[1]/page_h] for p in quad] ``` Return the quad alongside the AABB in the OCR response: ```json { "pageNumber": 0, "x": 0.10, "y": 0.08, "width": 0.78, "height": 0.06, "polygon": [[0.10,0.08],[0.88,0.09],[0.87,0.14],[0.11,0.13]] } ``` ### Frontend - `PdfViewer`: render polygon overlay using SVG `<polygon>` when `annotation.polygon` is present, fall back to `<rect>` for null - `Annotation` type in `types.ts`: add `polygon?: [number, number][] | null` --- ## Out of Scope - Free-form polygon drawing tool for manual annotations (users still draw rectangles) - More than 4 points per annotation — quadrilateral is sufficient for rotated/slanted lines - Changing overlap detection to use polygon geometry instead of AABB --- ## Depends On - #226 — OCR pipeline must ship first; this improves its Kraken output quality
marcel added the feature label 2026-04-12 11:43:48 +02:00
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Questions & Observations

  • The most interesting function isn't shown. The Python snippet calls approximate_to_quad(hull_pts) but doesn't implement it. That simplification from N convex hull points down to 4 is where all the precision vs. correctness trade-off lives. What algorithm is intended here? Ramer–Douglas–Peucker? A bounding quadrilateral approach? This is the one piece that needs a unit test and a design decision before implementation starts.

  • String polygon in the Java entity. Storing JSONB as a raw String works but requires manual parsing at every callsite. Consider using a proper @Type mapping (Hypersistence Utils JsonType) or a value object class with Jackson deserialization so the entity can hold List<List<Double>> polygon directly — cleaner, testable, and avoids string manipulation in service code.

  • Naming tension: "polygon" vs. "quadrilateral". The issue title says "4-point quadrilateral" but the field is named polygon. If the schema only ever supports exactly 4 points, quadrilateral or quad is more precise and self-documenting. If it might later expand to N points, polygon is fine — but then the Out of Scope section should say that explicitly, because the Java entity and validation should enforce the 4-point constraint now.

  • Where's the input validation? The issue says "pass through polygon without modification" in AnnotationService. That implies no validation. But the frontend sends this data — it should be rejected if it has ≠ 4 points, contains values outside [0,1], or contains duplicate coordinates. Fail fast at the DTO boundary.

  • Frontend component splitting signal. PdfViewer is described as rendering either <polygon> or <rect> based on a runtime condition. If the rendering logic for each shape type grows (hover state, click handler, highlight animation), this component will accumulate two distinct visual behaviors. Consider a named AnnotationOverlay.svelte that takes a single annotation and decides which SVG primitive to use internally — PdfViewer stays clean.

Suggestions

  • Write the failing test for polygon round-trip first: create annotation with polygon via API → fetch annotation → assert polygon coordinates match exactly. This catches serialization issues (float precision, null handling) before any frontend work.
  • For the Python approximate_to_quad function: write pytest unit tests for a known rotated line polygon and verify the 4 output points are geometrically correct before wiring it into the OCR pipeline.
## 👨‍💻 Felix Brandt — Senior Fullstack Developer ### Questions & Observations - **The most interesting function isn't shown.** The Python snippet calls `approximate_to_quad(hull_pts)` but doesn't implement it. That simplification from N convex hull points down to 4 is where all the precision vs. correctness trade-off lives. What algorithm is intended here? Ramer–Douglas–Peucker? A bounding quadrilateral approach? This is the one piece that needs a unit test and a design decision before implementation starts. - **`String polygon` in the Java entity.** Storing JSONB as a raw `String` works but requires manual parsing at every callsite. Consider using a proper `@Type` mapping (Hypersistence Utils `JsonType`) or a value object class with Jackson deserialization so the entity can hold `List<List<Double>> polygon` directly — cleaner, testable, and avoids string manipulation in service code. - **Naming tension: "polygon" vs. "quadrilateral".** The issue title says "4-point quadrilateral" but the field is named `polygon`. If the schema only ever supports exactly 4 points, `quadrilateral` or `quad` is more precise and self-documenting. If it might later expand to N points, `polygon` is fine — but then the Out of Scope section should say that explicitly, because the Java entity and validation should enforce the 4-point constraint now. - **Where's the input validation?** The issue says "pass through polygon without modification" in `AnnotationService`. That implies no validation. But the frontend sends this data — it should be rejected if it has ≠ 4 points, contains values outside [0,1], or contains duplicate coordinates. Fail fast at the DTO boundary. - **Frontend component splitting signal.** `PdfViewer` is described as rendering either `<polygon>` or `<rect>` based on a runtime condition. If the rendering logic for each shape type grows (hover state, click handler, highlight animation), this component will accumulate two distinct visual behaviors. Consider a named `AnnotationOverlay.svelte` that takes a single annotation and decides which SVG primitive to use internally — `PdfViewer` stays clean. ### Suggestions - Write the failing test for polygon round-trip first: create annotation with polygon via API → fetch annotation → assert polygon coordinates match exactly. This catches serialization issues (float precision, null handling) before any frontend work. - For the Python `approximate_to_quad` function: write pytest unit tests for a known rotated line polygon and verify the 4 output points are geometrically correct before wiring it into the OCR pipeline.
Author
Owner

🏛️ Markus Keller — Application Architect

Questions & Observations

  • JSONB vs. normalized columns. Storing the polygon as JSONB is pragmatic — it avoids a second table and keeps the entity flat. The trade-off: you can't query on individual point coordinates, you lose type-level enforcement in the database, and range checks (0 ≤ x ≤ 1) can't be expressed as CHECK constraints without a PL/pgSQL function. For this use case (pass-through, display only, no server-side geometry queries), JSONB is the right call. But document that decision — an ADR entry, or at minimum a comment in the Flyway migration, explaining why JSONB was chosen over NUMERIC(8,6) columns.

  • The "pass through without modification" decision is worth questioning. The proposal stores whatever the OCR service sends as the polygon, with overlap detection continuing to use the AABB. If that assumption ever breaks — say, a future feature needs server-side polygon intersection — the service layer will need significant rework. Is there a stronger invariant we can establish now? For example: "polygon, if present, must be the convex hull of the text area as reported by Kraken." Document what the field semantically means, not just its format.

  • The approximate_to_quad function introduces an approximation gap. ConvexHull of Kraken's polygon output → 4-point quad is a lossy compression step. For strongly curved Kurrent lines, the AABB and the quad may both be poor representations. Worth stating explicitly in the issue whether quadrilateral is a permanent solution or a stepping stone. The Out of Scope note says "more than 4 points per annotation" — but does that mean "not now" or "never"? The schema design (JSONB) can accommodate N points, so leaving the door open costs nothing.

  • Interface between OCR service and Java backend. The issue describes what the OCR service returns, but not how it's integrated. Is the Python service called synchronously by AnnotationService? Does it post back to a Spring endpoint? Is there a shared contract (OpenAPI spec or JSON schema)? The polygon field in the OCR response JSON needs a formal contract — otherwise, format drift between the two services will cause silent data corruption.

  • Flyway migration is a one-liner — good. ADD COLUMN polygon JSONB is safe, backward compatible, and reversible (you can DROP COLUMN cleanly). No concerns there.

Suggestions

  • Add a CHECK constraint on polygon at the database level using a simple JSON structure check, e.g. CHECK (polygon IS NULL OR jsonb_array_length(polygon) = 4). This enforces the 4-point invariant at the data layer, where it belongs.
  • Define a JSON Schema or Pydantic model for the polygon field in the OCR service — use it as the canonical contract shared with the Java backend, instead of relying on implicit structure agreement.
## 🏛️ Markus Keller — Application Architect ### Questions & Observations - **JSONB vs. normalized columns.** Storing the polygon as `JSONB` is pragmatic — it avoids a second table and keeps the entity flat. The trade-off: you can't query on individual point coordinates, you lose type-level enforcement in the database, and range checks (`0 ≤ x ≤ 1`) can't be expressed as `CHECK` constraints without a PL/pgSQL function. For this use case (pass-through, display only, no server-side geometry queries), JSONB is the right call. But document that decision — an ADR entry, or at minimum a comment in the Flyway migration, explaining why JSONB was chosen over `NUMERIC(8,6)` columns. - **The "pass through without modification" decision is worth questioning.** The proposal stores whatever the OCR service sends as the polygon, with overlap detection continuing to use the AABB. If that assumption ever breaks — say, a future feature needs server-side polygon intersection — the service layer will need significant rework. Is there a stronger invariant we can establish now? For example: "polygon, if present, must be the convex hull of the text area as reported by Kraken." Document what the field semantically means, not just its format. - **The `approximate_to_quad` function introduces an approximation gap.** ConvexHull of Kraken's polygon output → 4-point quad is a lossy compression step. For strongly curved Kurrent lines, the AABB and the quad may both be poor representations. Worth stating explicitly in the issue whether quadrilateral is a permanent solution or a stepping stone. The Out of Scope note says "more than 4 points per annotation" — but does that mean "not now" or "never"? The schema design (`JSONB`) can accommodate N points, so leaving the door open costs nothing. - **Interface between OCR service and Java backend.** The issue describes what the OCR service *returns*, but not how it's integrated. Is the Python service called synchronously by `AnnotationService`? Does it post back to a Spring endpoint? Is there a shared contract (OpenAPI spec or JSON schema)? The `polygon` field in the OCR response JSON needs a formal contract — otherwise, format drift between the two services will cause silent data corruption. - **Flyway migration is a one-liner — good.** `ADD COLUMN polygon JSONB` is safe, backward compatible, and reversible (you can `DROP COLUMN` cleanly). No concerns there. ### Suggestions - Add a `CHECK` constraint on `polygon` at the database level using a simple JSON structure check, e.g. `CHECK (polygon IS NULL OR jsonb_array_length(polygon) = 4)`. This enforces the 4-point invariant at the data layer, where it belongs. - Define a JSON Schema or Pydantic model for the polygon field in the OCR service — use it as the canonical contract shared with the Java backend, instead of relying on implicit structure agreement.
Author
Owner

🧪 Sara Holt — QA Engineer

Questions & Observations

  • Acceptance criteria are missing. The issue describes the implementation well but doesn't define what "done" looks like from a behavior standpoint. Before implementation: what should a tester verify? Some candidates:

    • Annotation created with polygon: null → stored and retrieved as null, rendered as <rect>
    • Annotation created with a valid 4-point polygon → stored and retrieved with coordinate precision preserved, rendered as <polygon>
    • Annotation created with a 3-point or 5-point polygon → rejected with a 400 error
    • Existing annotations (no polygon column yet) → continue to display correctly after migration
  • The approximate_to_quad algorithm needs a unit test suite in Python. This is a non-trivial geometric computation. Test cases to cover:

    • Perfectly horizontal line (quad should degenerate to near-rectangle)
    • 5–10° rotated line (verify the quad captures the rotation)
    • Strongly curved Kurrent line (verify the ConvexHull step doesn't produce a degenerate quad)
    • Single-point or two-point polygon input (what does ConvexHull do with degenerate input? Does it throw?)
  • Float precision in the round-trip. Coordinates are normalized to [0,1] and stored in JSONB. After serialization → storage → retrieval → deserialization, are the floats stable? JavaScript's 64-bit float and PostgreSQL's float8 will agree, but if the OCR service produces Python float32 values, precision loss may occur. This should be verified with an integration test.

  • Migration test. The Flyway migration adds a nullable column — trivially safe, but the integration test suite should confirm that: (a) the migration runs cleanly on a fresh database, and (b) existing annotation rows have polygon = null after migration and still load correctly in the application.

  • Frontend rendering regression. Any change to PdfViewer that adds conditional rendering logic risks breaking the existing <rect> path. A Vitest component test asserting that annotation.polygon = null renders a <rect> (not a <polygon>) should be added as a regression guard.

Suggested Test Plan

Layer Test Type
Python unit polygon_to_quad with rotated, curved, degenerate inputs pytest unit
Java unit DTO validation rejects polygon with ≠ 4 points JUnit + Mockito
Java integration Annotation round-trip with polygon field (Testcontainers) @DataJpaTest
Java integration Migration: existing rows unaffected, polygon is null Flyway + Testcontainers
Frontend unit PdfViewer renders <rect> for null polygon, <polygon> otherwise Vitest
E2E OCR pipeline end-to-end: Kraken block → annotation stored with polygon Playwright
## 🧪 Sara Holt — QA Engineer ### Questions & Observations - **Acceptance criteria are missing.** The issue describes the implementation well but doesn't define what "done" looks like from a behavior standpoint. Before implementation: what should a tester verify? Some candidates: - Annotation created with `polygon: null` → stored and retrieved as `null`, rendered as `<rect>` - Annotation created with a valid 4-point polygon → stored and retrieved with coordinate precision preserved, rendered as `<polygon>` - Annotation created with a 3-point or 5-point polygon → rejected with a 400 error - Existing annotations (no polygon column yet) → continue to display correctly after migration - **The `approximate_to_quad` algorithm needs a unit test suite in Python.** This is a non-trivial geometric computation. Test cases to cover: - Perfectly horizontal line (quad should degenerate to near-rectangle) - 5–10° rotated line (verify the quad captures the rotation) - Strongly curved Kurrent line (verify the ConvexHull step doesn't produce a degenerate quad) - Single-point or two-point polygon input (what does ConvexHull do with degenerate input? Does it throw?) - **Float precision in the round-trip.** Coordinates are normalized to `[0,1]` and stored in JSONB. After serialization → storage → retrieval → deserialization, are the floats stable? JavaScript's 64-bit float and PostgreSQL's `float8` will agree, but if the OCR service produces Python `float32` values, precision loss may occur. This should be verified with an integration test. - **Migration test.** The Flyway migration adds a nullable column — trivially safe, but the integration test suite should confirm that: (a) the migration runs cleanly on a fresh database, and (b) existing annotation rows have `polygon = null` after migration and still load correctly in the application. - **Frontend rendering regression.** Any change to `PdfViewer` that adds conditional rendering logic risks breaking the existing `<rect>` path. A Vitest component test asserting that `annotation.polygon = null` renders a `<rect>` (not a `<polygon>`) should be added as a regression guard. ### Suggested Test Plan | Layer | Test | Type | |---|---|---| | Python unit | `polygon_to_quad` with rotated, curved, degenerate inputs | pytest unit | | Java unit | DTO validation rejects polygon with ≠ 4 points | JUnit + Mockito | | Java integration | Annotation round-trip with polygon field (Testcontainers) | @DataJpaTest | | Java integration | Migration: existing rows unaffected, `polygon` is null | Flyway + Testcontainers | | Frontend unit | `PdfViewer` renders `<rect>` for null polygon, `<polygon>` otherwise | Vitest | | E2E | OCR pipeline end-to-end: Kraken block → annotation stored with polygon | Playwright |
Author
Owner

🔐 Nora "NullX" Steiner — Application Security Engineer

Questions & Observations

  • Unvalidated JSONB input stored directly. The issue says AnnotationService passes through the polygon "without modification." If the polygon field in CreateAnnotationDTO is a raw String (or generic Object) that is written to JSONB without structural validation, a client can store arbitrary JSON in this column. While JSONB doesn't enable SQL injection on its own, it creates two risks: (1) bloated storage if an attacker sends megabyte-sized arrays, and (2) downstream type-confusion bugs if code later assumes the structure is [[x,y],[x,y],[x,y],[x,y]]. CWE-20 (Improper Input Validation) applies here.

    Fix: Validate in the DTO before the service ever sees the value:

    @Size(min = 4, max = 4, message = "polygon must have exactly 4 points")
    @Valid
    private List<@Size(min = 2, max = 2) List<@DecimalMin("0.0") @DecimalMax("1.0") Double>> polygon;
    

    This rejects malformed input at the controller boundary — no service change needed.

  • Mass assignment risk. If polygon is added to DocumentAnnotation entity and the entity is also used in update paths, ensure the update DTO also explicitly controls which fields can be set. Passing an entity directly to a form endpoint is a mass assignment vector (CWE-915). The issue mentions CreateAnnotationDTO — verify that no update endpoint exposes the full entity to be overwritten.

  • SVG <polygon> injection in the frontend. The points attribute of an SVG <polygon> element is populated from backend data. If coordinates are rendered via string interpolation into an SVG attribute without escaping, an attacker who stores crafted polygon data could inject SVG attributes. In Svelte, binding via {points} is safe (Svelte escapes attribute values), but double-check the rendering code uses points={formattedPoints} not {@html ...}.

  • No concerns with the JSONB storage itself. JSONB is not a SQL injection vector when accessed through JPA/Hibernate. The migration (ADD COLUMN polygon JSONB) is safe.

  • scipy/numpy supply chain. The OCR service is adding scipy as a new dependency for ConvexHull. Ensure requirements.txt pins the exact version and that the CI pipeline runs pip-audit or similar. scipy is a large dependency with a history of CVEs in its Fortran bindings (generally not exploitable in this context, but worth noting for dependency hygiene).

Summary of Findings

# Finding Severity CWE
1 Unvalidated polygon JSONB stored directly Medium CWE-20
2 Mass assignment risk on annotation update path Low (verify) CWE-915
3 SVG attribute injection via polygon points Low (Svelte mitigates) CWE-79
4 scipy version not pinned Informational CWE-1395
## 🔐 Nora "NullX" Steiner — Application Security Engineer ### Questions & Observations - **Unvalidated JSONB input stored directly.** The issue says `AnnotationService` passes through the polygon "without modification." If the `polygon` field in `CreateAnnotationDTO` is a raw `String` (or generic `Object`) that is written to JSONB without structural validation, a client can store arbitrary JSON in this column. While JSONB doesn't enable SQL injection on its own, it creates two risks: (1) bloated storage if an attacker sends megabyte-sized arrays, and (2) downstream type-confusion bugs if code later assumes the structure is `[[x,y],[x,y],[x,y],[x,y]]`. CWE-20 (Improper Input Validation) applies here. **Fix:** Validate in the DTO before the service ever sees the value: ```java @Size(min = 4, max = 4, message = "polygon must have exactly 4 points") @Valid private List<@Size(min = 2, max = 2) List<@DecimalMin("0.0") @DecimalMax("1.0") Double>> polygon; ``` This rejects malformed input at the controller boundary — no service change needed. - **Mass assignment risk.** If `polygon` is added to `DocumentAnnotation` entity and the entity is also used in update paths, ensure the update DTO also explicitly controls which fields can be set. Passing an entity directly to a form endpoint is a mass assignment vector (CWE-915). The issue mentions `CreateAnnotationDTO` — verify that no update endpoint exposes the full entity to be overwritten. - **SVG `<polygon>` injection in the frontend.** The `points` attribute of an SVG `<polygon>` element is populated from backend data. If coordinates are rendered via string interpolation into an SVG attribute without escaping, an attacker who stores crafted polygon data could inject SVG attributes. In Svelte, binding via `{points}` is safe (Svelte escapes attribute values), but double-check the rendering code uses `points={formattedPoints}` not `{@html ...}`. - **No concerns with the JSONB storage itself.** JSONB is not a SQL injection vector when accessed through JPA/Hibernate. The migration (`ADD COLUMN polygon JSONB`) is safe. - **scipy/numpy supply chain.** The OCR service is adding `scipy` as a new dependency for `ConvexHull`. Ensure `requirements.txt` pins the exact version and that the CI pipeline runs `pip-audit` or similar. `scipy` is a large dependency with a history of CVEs in its Fortran bindings (generally not exploitable in this context, but worth noting for dependency hygiene). ### Summary of Findings | # | Finding | Severity | CWE | |---|---|---|---| | 1 | Unvalidated polygon JSONB stored directly | Medium | CWE-20 | | 2 | Mass assignment risk on annotation update path | Low (verify) | CWE-915 | | 3 | SVG attribute injection via polygon points | Low (Svelte mitigates) | CWE-79 | | 4 | scipy version not pinned | Informational | CWE-1395 |
Author
Owner

🎨 Leonie Voss — UI/UX Designer

Questions & Observations

  • No visual design spec. This is a technical issue, but the polygon overlay will be visible to users when they view OCR'd documents. What should it look like? Right now, the rectangular annotation style exists somewhere in the codebase. The polygon variant needs the same visual treatment: fill opacity, stroke color, hover state, selected state. Without a design spec, the developer will make arbitrary choices that may not match the existing annotation appearance.

  • The fallback rendering path must be visually identical to the current behavior. The <polygon> SVG element and the <rect> SVG element render differently in SVG coordinate space. A <rect> is defined by x, y, width, height — a <polygon> by points. If styling is applied via CSS class (e.g. .annotation-overlay), both elements will inherit it. But if any style is applied via SVG presentation attributes (fill, stroke) directly on the element tag, you need to explicitly replicate them on both elements. Verify this is handled.

  • Accessibility of SVG shapes. Both <rect> and <polygon> need the same accessible treatment:

    • role="img" or role="region" with an aria-label describing the annotation (e.g. "OCR block, page 1")
    • If annotations are interactive (clickable/selectable), they need tabindex="0", keyboard handlers, and a visible focus indicator
    • The focus outline must work on both shape types — outline behaves differently on SVG elements vs HTML elements; use stroke + filter: drop-shadow for SVG focus states
  • Will users be able to distinguish polygon annotations from rectangle annotations? If all annotations look the same regardless of shape, users lose the precision benefit this feature is supposed to provide. Consider a subtle visual cue (e.g. slightly different stroke style or corner treatment) that signals "this annotation was computed from a precise polygon." At minimum, note in the issue whether same-appearance is intentional or a gap.

  • Coordinate rendering at different zoom levels. Normalized 0–1 coordinates scaled by the page dimensions in the viewer — this is the right approach and should scale correctly. But confirm that the SVG viewport or viewBox is set consistently for both <rect> and <polygon> rendering. A mismatch in how page dimensions are mapped to SVG space would cause the polygon overlay to appear in the wrong position.

Suggestions

  • Add a visual annotation design note to the issue (or link a spec) before implementation — even a one-paragraph description of "looks like existing rect annotations, same stroke/fill" is sufficient.
  • Test the rendered result at the common zoom levels your users use (100%, 150%, fit-to-width) and at both the top-left and bottom-right corners of a page where coordinate normalization errors are most visible.
## 🎨 Leonie Voss — UI/UX Designer ### Questions & Observations - **No visual design spec.** This is a technical issue, but the polygon overlay will be visible to users when they view OCR'd documents. What should it look like? Right now, the rectangular annotation style exists somewhere in the codebase. The polygon variant needs the same visual treatment: fill opacity, stroke color, hover state, selected state. Without a design spec, the developer will make arbitrary choices that may not match the existing annotation appearance. - **The fallback rendering path must be visually identical to the current behavior.** The `<polygon>` SVG element and the `<rect>` SVG element render differently in SVG coordinate space. A `<rect>` is defined by `x`, `y`, `width`, `height` — a `<polygon>` by `points`. If styling is applied via CSS class (e.g. `.annotation-overlay`), both elements will inherit it. But if any style is applied via SVG presentation attributes (`fill`, `stroke`) directly on the element tag, you need to explicitly replicate them on both elements. Verify this is handled. - **Accessibility of SVG shapes.** Both `<rect>` and `<polygon>` need the same accessible treatment: - `role="img"` or `role="region"` with an `aria-label` describing the annotation (e.g. "OCR block, page 1") - If annotations are interactive (clickable/selectable), they need `tabindex="0"`, keyboard handlers, and a visible focus indicator - The focus outline must work on both shape types — `outline` behaves differently on SVG elements vs HTML elements; use `stroke` + `filter: drop-shadow` for SVG focus states - **Will users be able to distinguish polygon annotations from rectangle annotations?** If all annotations look the same regardless of shape, users lose the precision benefit this feature is supposed to provide. Consider a subtle visual cue (e.g. slightly different stroke style or corner treatment) that signals "this annotation was computed from a precise polygon." At minimum, note in the issue whether same-appearance is intentional or a gap. - **Coordinate rendering at different zoom levels.** Normalized 0–1 coordinates scaled by the page dimensions in the viewer — this is the right approach and should scale correctly. But confirm that the SVG viewport or `viewBox` is set consistently for both `<rect>` and `<polygon>` rendering. A mismatch in how page dimensions are mapped to SVG space would cause the polygon overlay to appear in the wrong position. ### Suggestions - Add a visual annotation design note to the issue (or link a spec) before implementation — even a one-paragraph description of "looks like existing rect annotations, same stroke/fill" is sufficient. - Test the rendered result at the common zoom levels your users use (100%, 150%, fit-to-width) and at both the top-left and bottom-right corners of a page where coordinate normalization errors are most visible.
Author
Owner

🛠️ Tobias Wendt — DevOps & Platform Engineer

Questions & Observations

  • scipy is a heavyweight dependency. scipy pulls in numpy, LAPACK/BLAS bindings, and multiple compiled extensions — the Docker image for the OCR service could grow by 300–500 MB just for ConvexHull. If only ConvexHull is needed, consider whether scipy.spatial.ConvexHull is really necessary or if a minimal pure-Python implementation (e.g. the gift-wrapping algorithm for 2D points) would suffice. This avoids the image bloat, speeds up CI image builds, and eliminates a large compiled-extension dependency from the attack surface.

  • The approximate_to_quad function is the undefined piece. From a deployment standpoint, if this function fails at runtime (e.g. on a degenerate polygon that makes ConvexHull throw a QhullError), what happens? Does the OCR service crash? Return a 500 to the Java backend? Return the annotation without a polygon field? The fallback behavior needs to be defined and handled gracefully — the OCR pipeline should not fail a whole document because one text block had a degenerate polygon.

  • No observability change mentioned. The polygon computation is a new failure mode in the OCR service. If polygon_to_quad silently produces incorrect results (coordinates outside [0,1], NaN values, etc.), you'll have no visibility unless you log or metric it. Add a log line when polygon computation falls back to None, and consider a counter metric (e.g. Prometheus ocr_polygon_fallback_total) so you can see the fallback rate in Grafana.

  • Migration deployment. ADD COLUMN polygon JSONB is safe to run with the application live (no lock escalation, no data rewrite). The backend can be deployed before or after the migration without compatibility issues in either direction — the column is nullable and the existing code ignores it. Good design.

  • Python service requirements.txt. Confirm scipy and numpy are pinned to exact versions in requirements.txt, not just scipy>=1.x. Floating dependency versions in a container image are reproducibility problems waiting to happen. The CI build should fail if the lock file is out of date.

Suggestions

  • Consider wrapping polygon_to_quad in a try/except that logs the error and returns None, so a polygon computation failure degrades gracefully to AABB-only storage rather than blocking the annotation entirely.
  • Add the polygon fallback rate to the existing OCR service health endpoint or Prometheus metrics so you can detect systematic failures without reading raw logs.
## 🛠️ Tobias Wendt — DevOps & Platform Engineer ### Questions & Observations - **scipy is a heavyweight dependency.** `scipy` pulls in `numpy`, LAPACK/BLAS bindings, and multiple compiled extensions — the Docker image for the OCR service could grow by 300–500 MB just for `ConvexHull`. If only `ConvexHull` is needed, consider whether `scipy.spatial.ConvexHull` is really necessary or if a minimal pure-Python implementation (e.g. the gift-wrapping algorithm for 2D points) would suffice. This avoids the image bloat, speeds up CI image builds, and eliminates a large compiled-extension dependency from the attack surface. - **The `approximate_to_quad` function is the undefined piece.** From a deployment standpoint, if this function fails at runtime (e.g. on a degenerate polygon that makes ConvexHull throw a `QhullError`), what happens? Does the OCR service crash? Return a 500 to the Java backend? Return the annotation without a polygon field? The fallback behavior needs to be defined and handled gracefully — the OCR pipeline should not fail a whole document because one text block had a degenerate polygon. - **No observability change mentioned.** The polygon computation is a new failure mode in the OCR service. If `polygon_to_quad` silently produces incorrect results (coordinates outside [0,1], NaN values, etc.), you'll have no visibility unless you log or metric it. Add a log line when polygon computation falls back to None, and consider a counter metric (e.g. Prometheus `ocr_polygon_fallback_total`) so you can see the fallback rate in Grafana. - **Migration deployment.** `ADD COLUMN polygon JSONB` is safe to run with the application live (no lock escalation, no data rewrite). The backend can be deployed before or after the migration without compatibility issues in either direction — the column is nullable and the existing code ignores it. Good design. - **Python service requirements.txt.** Confirm `scipy` and `numpy` are pinned to exact versions in `requirements.txt`, not just `scipy>=1.x`. Floating dependency versions in a container image are reproducibility problems waiting to happen. The CI build should fail if the lock file is out of date. ### Suggestions - Consider wrapping `polygon_to_quad` in a try/except that logs the error and returns `None`, so a polygon computation failure degrades gracefully to AABB-only storage rather than blocking the annotation entirely. - Add the polygon fallback rate to the existing OCR service health endpoint or Prometheus metrics so you can detect systematic failures without reading raw logs.
Author
Owner

🏛️ Markus Keller — Application Architect

Interactive discussion — architectural decisions agreed with @marcel


Resolved: OCR service integration contract

The existing ADR-001 (docs/adr/001-ocr-python-microservice.md) already defines the transport: Spring Boot calls the Python OCR service via RestClient over HTTP. The polygon field is simply added to the existing JSON response body. Contract is enforced by a Pydantic model in the Python service and a matching Java DTO on the Spring Boot side. No new transport decision required.


Resolved: 4-point constraint enforced at the database layer

The 4-point limit is a permanent domain constraint, not a temporary restriction. The Flyway migration will include a named CHECK constraint:

ALTER TABLE document_annotations ADD COLUMN polygon JSONB;

ALTER TABLE document_annotations
ADD CONSTRAINT chk_annotation_polygon_quad
    CHECK (polygon IS NULL OR jsonb_array_length(polygon) = 4);

Two separate statements so the constraint is named and independently droppable.


Resolved: Java entity typed via AttributeConverter, no new dependency

private String polygon is rejected. The entity will hold List<List<Double>> polygon backed by a small AttributeConverter<List<List<Double>>, String>. Hypersistence Utils is not currently a project dependency and won't be added for a single use case.


Resolved: JSONB decision gets a full ADR

A new docs/adr/002-polygon-jsonb-storage.md will document the JSONB choice, the alternatives considered (8 NUMERIC columns, separate join table), and why they were rejected (no server-side geometry queries needed, pass-through display-only storage).


Resolved: Semantic invariant for the polygon field

Agreed definition for the entity Javadoc and the ADR:

polygon, if present, is a 4-point quadrilateral with coordinates normalized to [0, 1] relative to page dimensions. It may originate from OCR engine output (Kraken) or from a future manual drawing tool. The AABB fields (x, y, width, height) remain the authoritative bounding box for overlap detection and are always populated.

Key points: polygon is source-agnostic (Kraken or future manual input), AABB is always the geometry source of truth for server-side logic.


Overall read

The design is sound. The main gaps were documentation (ADR, semantic definition) and a missing DB-level constraint — both easy to close before implementation starts. The integration contract with the OCR service was already resolved by ADR-001, just not referenced from this issue. Recommend linking #226 and ADR-001 explicitly in the implementation tasks.

## 🏛️ Markus Keller — Application Architect *Interactive discussion — architectural decisions agreed with @marcel* --- ### ✅ Resolved: OCR service integration contract The existing ADR-001 (`docs/adr/001-ocr-python-microservice.md`) already defines the transport: Spring Boot calls the Python OCR service via `RestClient` over HTTP. The `polygon` field is simply added to the existing JSON response body. Contract is enforced by a Pydantic model in the Python service and a matching Java DTO on the Spring Boot side. No new transport decision required. --- ### ✅ Resolved: 4-point constraint enforced at the database layer The 4-point limit is a permanent domain constraint, not a temporary restriction. The Flyway migration will include a named `CHECK` constraint: ```sql ALTER TABLE document_annotations ADD COLUMN polygon JSONB; ALTER TABLE document_annotations ADD CONSTRAINT chk_annotation_polygon_quad CHECK (polygon IS NULL OR jsonb_array_length(polygon) = 4); ``` Two separate statements so the constraint is named and independently droppable. --- ### ✅ Resolved: Java entity typed via `AttributeConverter`, no new dependency `private String polygon` is rejected. The entity will hold `List<List<Double>> polygon` backed by a small `AttributeConverter<List<List<Double>>, String>`. Hypersistence Utils is not currently a project dependency and won't be added for a single use case. --- ### ✅ Resolved: JSONB decision gets a full ADR A new `docs/adr/002-polygon-jsonb-storage.md` will document the JSONB choice, the alternatives considered (8 `NUMERIC` columns, separate join table), and why they were rejected (no server-side geometry queries needed, pass-through display-only storage). --- ### ✅ Resolved: Semantic invariant for the `polygon` field Agreed definition for the entity Javadoc and the ADR: > `polygon`, if present, is a 4-point quadrilateral with coordinates normalized to [0, 1] relative to page dimensions. It may originate from OCR engine output (Kraken) or from a future manual drawing tool. The AABB fields (`x`, `y`, `width`, `height`) remain the authoritative bounding box for overlap detection and are always populated. Key points: polygon is source-agnostic (Kraken or future manual input), AABB is always the geometry source of truth for server-side logic. --- ### Overall read The design is sound. The main gaps were documentation (ADR, semantic definition) and a missing DB-level constraint — both easy to close before implementation starts. The integration contract with the OCR service was already resolved by ADR-001, just not referenced from this issue. Recommend linking #226 and ADR-001 explicitly in the implementation tasks.
Author
Owner

👨‍💻 Felix Brandt — Senior Fullstack Developer

Interactive discussion — implementation decisions agreed with @marcel


Resolved: approximate_to_quad algorithm — pure Python, no scipy

The Python snippet in the issue calls approximate_to_quad but never implements it. Agreed algorithm: rotating calipers minimum bounding rectangle on the convex hull of Kraken's polygon output. Convex hull computed with a pure-Python gift-wrapping (Jarvis march) implementation.

  • No scipy, no numpy, no cv2 — pure Python only
  • Tobias's image-bloat concern (scipy pulling in LAPACK/BLAS) is resolved as a side effect

Resolved: DTO validation scope

Four-layer validation on CreateAnnotationDTO.polygon:

@Size(min = 4, max = 4)
@UniquePoints
@Valid
private List<@Size(min = 2, max = 2) List<@DecimalMin("0.0") @DecimalMax("1.0") Double>> polygon;
  • Exactly 4 points (@Size)
  • Exactly 2 coordinates per point (@Size)
  • Each coordinate in [0, 1] (@DecimalMin / @DecimalMax)
  • No duplicate points — custom @UniquePoints validator with a proper validation message; null passes through (field is optional)

DB CHECK constraint (from Markus's discussion) enforces the 4-point invariant at the data layer as a second line of defence.


Resolved: AnnotationShape.svelte component split

PdfViewer must not own the rect/polygon rendering decision — that is a second responsibility. Extract AnnotationShape.svelte as a separate commit before the polygon feature:

  • AnnotationShape.svelte receives one annotation, renders <rect> or <polygon> internally
  • PdfViewer iterates and delegates: {#each annotations as annotation (annotation.id)}<AnnotationShape {annotation} />{/each}
  • Commit 1: extract component, move existing <rect> rendering — no behavior change
  • Commit 2: add polygon branch inside AnnotationShape

Resolved: TDD order

First failing test is a DTO validation unit test:

@Test
void shouldRejectPolygonWithThreePoints() {
    var dto = new CreateAnnotationDTO();
    dto.setPolygon(List.of(
        List.of(0.1, 0.1),
        List.of(0.9, 0.1),
        List.of(0.9, 0.9)
    ));
    var violations = validator.validate(dto);
    assertThat(violations).anyMatch(v -> v.getPropertyPath().toString().equals("polygon"));
}

Full TDD order: DTO validation → AttributeConverter unit → round-trip @DataJpaTest → API integration slice.


Resolved: Implementation sequencing — #226 and #227 ship together

The "Depends On: #226" relationship in this issue is backward for Kraken quality. Shipping #226 first means Kraken goes live with AABB-only output that is explicitly described as imprecise. Agreed: implement #226 and #227 in a single branch, closing both issues in one PR.

Branch implementation order:

  1. Base OCR microservice — Surya path, AABB only (#226 core)
  2. Kraken path with polygon_to_quad and polygon output (#226 + #227)
  3. Java backend changes — AttributeConverter, DTO validation, Flyway migration (#227)
  4. AnnotationShape.svelte split + polygon rendering (#227)

Overall read

The feature is well-scoped. The main implementation gaps were the undefined approximate_to_quad algorithm and the DTO validation boundary. Both are now settled. The sequencing fix (shipping #226 and #227 together) avoids deliberately degraded Kraken output and simplifies the deployment story.

## 👨‍💻 Felix Brandt — Senior Fullstack Developer *Interactive discussion — implementation decisions agreed with @marcel* --- ### ✅ Resolved: `approximate_to_quad` algorithm — pure Python, no scipy The Python snippet in the issue calls `approximate_to_quad` but never implements it. Agreed algorithm: **rotating calipers minimum bounding rectangle** on the convex hull of Kraken's polygon output. Convex hull computed with a pure-Python **gift-wrapping (Jarvis march)** implementation. - No `scipy`, no `numpy`, no `cv2` — pure Python only - Tobias's image-bloat concern (scipy pulling in LAPACK/BLAS) is resolved as a side effect --- ### ✅ Resolved: DTO validation scope Four-layer validation on `CreateAnnotationDTO.polygon`: ```java @Size(min = 4, max = 4) @UniquePoints @Valid private List<@Size(min = 2, max = 2) List<@DecimalMin("0.0") @DecimalMax("1.0") Double>> polygon; ``` - Exactly 4 points (`@Size`) - Exactly 2 coordinates per point (`@Size`) - Each coordinate in [0, 1] (`@DecimalMin` / `@DecimalMax`) - No duplicate points — custom `@UniquePoints` validator with a proper validation message; `null` passes through (field is optional) DB `CHECK` constraint (from Markus's discussion) enforces the 4-point invariant at the data layer as a second line of defence. --- ### ✅ Resolved: `AnnotationShape.svelte` component split `PdfViewer` must not own the rect/polygon rendering decision — that is a second responsibility. Extract `AnnotationShape.svelte` as a **separate commit before the polygon feature**: - `AnnotationShape.svelte` receives one annotation, renders `<rect>` or `<polygon>` internally - `PdfViewer` iterates and delegates: `{#each annotations as annotation (annotation.id)}<AnnotationShape {annotation} />{/each}` - Commit 1: extract component, move existing `<rect>` rendering — no behavior change - Commit 2: add polygon branch inside `AnnotationShape` --- ### ✅ Resolved: TDD order First failing test is a DTO validation unit test: ```java @Test void shouldRejectPolygonWithThreePoints() { var dto = new CreateAnnotationDTO(); dto.setPolygon(List.of( List.of(0.1, 0.1), List.of(0.9, 0.1), List.of(0.9, 0.9) )); var violations = validator.validate(dto); assertThat(violations).anyMatch(v -> v.getPropertyPath().toString().equals("polygon")); } ``` Full TDD order: DTO validation → `AttributeConverter` unit → round-trip `@DataJpaTest` → API integration slice. --- ### ✅ Resolved: Implementation sequencing — #226 and #227 ship together The "Depends On: #226" relationship in this issue is backward for Kraken quality. Shipping #226 first means Kraken goes live with AABB-only output that is explicitly described as imprecise. Agreed: **implement #226 and #227 in a single branch**, closing both issues in one PR. Branch implementation order: 1. Base OCR microservice — Surya path, AABB only (#226 core) 2. Kraken path with `polygon_to_quad` and polygon output (#226 + #227) 3. Java backend changes — `AttributeConverter`, DTO validation, Flyway migration (#227) 4. `AnnotationShape.svelte` split + polygon rendering (#227) --- ### Overall read The feature is well-scoped. The main implementation gaps were the undefined `approximate_to_quad` algorithm and the DTO validation boundary. Both are now settled. The sequencing fix (shipping #226 and #227 together) avoids deliberately degraded Kraken output and simplifies the deployment story.
Author
Owner

Implementation Complete — Polygon Annotation Support

Implemented together with #226 on branch feat/issue-226-227-ocr-pipeline-polygon (as agreed in the discussion summary).

What was implemented for #227

Database:

  • V23 migration: polygon JSONB column on document_annotations with CHECK (polygon IS NULL OR jsonb_array_length(polygon) = 4)

Backend:

  • DocumentAnnotation entity: polygon field using @JdbcTypeCode(SqlTypes.JSON)
  • PolygonConverter: standalone AttributeConverter<List<List<Double>>, String> (retained as utility)
  • CreateAnnotationDTO: polygon field with @Size(min=4, max=4), @UniquePoints, coordinate range validation @DecimalMin("0.0") @DecimalMax("1.0")
  • @UniquePoints custom validator rejects duplicate coordinates
  • createOcrAnnotation() on AnnotationService: skips overlap check, accepts optional polygon
  • ADR-002 documenting the JSONB design decision

Python microservice:

  • Kraken engine outputs quadrilateral polygons via pure-Python convex hull (Jarvis march) + minimum bounding rectangle (rotating calipers) — no scipy dependency
  • Polygon coordinates normalized to [0,1] relative to page dimensions

Frontend:

  • AnnotationShape.svelte: extracted from AnnotationLayer, renders <rect> or clip-path: polygon() based on annotation.polygon
  • Annotation type updated with optional polygon?: [number, number][] | null
  • All 687 frontend tests pass

Test coverage

  • PolygonConverterTest: 6 tests (null handling, serialization, round-trip)
  • UniquePointsValidatorTest: 8 tests (null, valid, duplicates, wrong count, out-of-range coords)
  • AnnotationServiceTest: 3 new tests for createOcrAnnotation (polygon pass-through, null polygon, no overlap check)
  • All existing AnnotationLayer browser tests pass with the AnnotationShape extraction
## Implementation Complete — Polygon Annotation Support Implemented together with #226 on branch `feat/issue-226-227-ocr-pipeline-polygon` (as agreed in the discussion summary). ### What was implemented for #227 **Database:** - V23 migration: `polygon JSONB` column on `document_annotations` with `CHECK (polygon IS NULL OR jsonb_array_length(polygon) = 4)` **Backend:** - `DocumentAnnotation` entity: `polygon` field using `@JdbcTypeCode(SqlTypes.JSON)` - `PolygonConverter`: standalone `AttributeConverter<List<List<Double>>, String>` (retained as utility) - `CreateAnnotationDTO`: polygon field with `@Size(min=4, max=4)`, `@UniquePoints`, coordinate range validation `@DecimalMin("0.0") @DecimalMax("1.0")` - `@UniquePoints` custom validator rejects duplicate coordinates - `createOcrAnnotation()` on `AnnotationService`: skips overlap check, accepts optional polygon - ADR-002 documenting the JSONB design decision **Python microservice:** - Kraken engine outputs quadrilateral polygons via pure-Python convex hull (Jarvis march) + minimum bounding rectangle (rotating calipers) — no scipy dependency - Polygon coordinates normalized to [0,1] relative to page dimensions **Frontend:** - `AnnotationShape.svelte`: extracted from `AnnotationLayer`, renders `<rect>` or `clip-path: polygon()` based on `annotation.polygon` - `Annotation` type updated with optional `polygon?: [number, number][] | null` - All 687 frontend tests pass ### Test coverage - `PolygonConverterTest`: 6 tests (null handling, serialization, round-trip) - `UniquePointsValidatorTest`: 8 tests (null, valid, duplicates, wrong count, out-of-range coords) - `AnnotationServiceTest`: 3 new tests for `createOcrAnnotation` (polygon pass-through, null polygon, no overlap check) - All existing `AnnotationLayer` browser tests pass with the AnnotationShape extraction
Sign in to join this conversation.
No Label feature
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marcel/familienarchiv#227