feat: extend annotations to support polygon shapes (4-point quadrilateral) #227
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Overview
Currently
DocumentAnnotationstores axis-aligned bounding boxes (x, y, width, height). This is sufficient for manually drawn rectangles and Surya OCR output, but loses precision for Kraken OCR output — Kraken uses baseline segmentation that returns polygon boundaries for historical handwriting (Kurrent, Sütterlin), where text lines curve and slant.Extending annotations to support an optional quadrilateral (4 corner points) would allow Kraken-generated blocks to be stored and displayed with precise shape, while remaining fully backward compatible with all existing rectangular annotations.
Tracked as a mid-term goal from the OCR pipeline design discussion in #226.
Problem
Kraken's segmentation outputs polygon boundaries, not rectangles. For Kurrent text lines the difference is significant — a line may be rotated 5–10° or follow a slight curve. Today the OCR pipeline computes the AABB (axis-aligned bounding box) of each Kraken polygon before storing it. The resulting annotation covers the right region but is visually imprecise ��� it bleeds into adjacent lines and doesn't highlight the actual text area.
Proposed Solution
Database
Add an optional
polygoncolumn todocument_annotations:null→ rectangle, rendered using existingx, y, width, heightfields (fully backward compatible)[[x1,y1],[x2,y2],[x3,y3],[x4,y4]]with coordinates in 0–1 range relative to page dimensionsThe existing
x, y, width, heightfields are retained as the axis-aligned bounding box of the polygon — used for overlap detection and as a fallback for clients that don't yet support polygons.Backend
DocumentAnnotationentity: add@Column(columnDefinition = "jsonb") private String polygonCreateAnnotationDTO: add optionalpolygonfieldAnnotationService: pass through polygon without modification; overlap detection continues to use the AABB fieldspolygonon the annotation response typeOCR Service (Python)
Instead of computing a simple AABB from Kraken's polygon output, approximate to a quadrilateral:
Return the quad alongside the AABB in the OCR response:
Frontend
PdfViewer: render polygon overlay using SVG<polygon>whenannotation.polygonis present, fall back to<rect>for nullAnnotationtype intypes.ts: addpolygon?: [number, number][] | nullOut of Scope
Depends On
👨💻 Felix Brandt — Senior Fullstack Developer
Questions & Observations
The most interesting function isn't shown. The Python snippet calls
approximate_to_quad(hull_pts)but doesn't implement it. That simplification from N convex hull points down to 4 is where all the precision vs. correctness trade-off lives. What algorithm is intended here? Ramer–Douglas–Peucker? A bounding quadrilateral approach? This is the one piece that needs a unit test and a design decision before implementation starts.String polygonin the Java entity. Storing JSONB as a rawStringworks but requires manual parsing at every callsite. Consider using a proper@Typemapping (Hypersistence UtilsJsonType) or a value object class with Jackson deserialization so the entity can holdList<List<Double>> polygondirectly — cleaner, testable, and avoids string manipulation in service code.Naming tension: "polygon" vs. "quadrilateral". The issue title says "4-point quadrilateral" but the field is named
polygon. If the schema only ever supports exactly 4 points,quadrilateralorquadis more precise and self-documenting. If it might later expand to N points,polygonis fine — but then the Out of Scope section should say that explicitly, because the Java entity and validation should enforce the 4-point constraint now.Where's the input validation? The issue says "pass through polygon without modification" in
AnnotationService. That implies no validation. But the frontend sends this data — it should be rejected if it has ≠ 4 points, contains values outside [0,1], or contains duplicate coordinates. Fail fast at the DTO boundary.Frontend component splitting signal.
PdfVieweris described as rendering either<polygon>or<rect>based on a runtime condition. If the rendering logic for each shape type grows (hover state, click handler, highlight animation), this component will accumulate two distinct visual behaviors. Consider a namedAnnotationOverlay.sveltethat takes a single annotation and decides which SVG primitive to use internally —PdfViewerstays clean.Suggestions
approximate_to_quadfunction: write pytest unit tests for a known rotated line polygon and verify the 4 output points are geometrically correct before wiring it into the OCR pipeline.🏛️ Markus Keller — Application Architect
Questions & Observations
JSONB vs. normalized columns. Storing the polygon as
JSONBis pragmatic — it avoids a second table and keeps the entity flat. The trade-off: you can't query on individual point coordinates, you lose type-level enforcement in the database, and range checks (0 ≤ x ≤ 1) can't be expressed asCHECKconstraints without a PL/pgSQL function. For this use case (pass-through, display only, no server-side geometry queries), JSONB is the right call. But document that decision — an ADR entry, or at minimum a comment in the Flyway migration, explaining why JSONB was chosen overNUMERIC(8,6)columns.The "pass through without modification" decision is worth questioning. The proposal stores whatever the OCR service sends as the polygon, with overlap detection continuing to use the AABB. If that assumption ever breaks — say, a future feature needs server-side polygon intersection — the service layer will need significant rework. Is there a stronger invariant we can establish now? For example: "polygon, if present, must be the convex hull of the text area as reported by Kraken." Document what the field semantically means, not just its format.
The
approximate_to_quadfunction introduces an approximation gap. ConvexHull of Kraken's polygon output → 4-point quad is a lossy compression step. For strongly curved Kurrent lines, the AABB and the quad may both be poor representations. Worth stating explicitly in the issue whether quadrilateral is a permanent solution or a stepping stone. The Out of Scope note says "more than 4 points per annotation" — but does that mean "not now" or "never"? The schema design (JSONB) can accommodate N points, so leaving the door open costs nothing.Interface between OCR service and Java backend. The issue describes what the OCR service returns, but not how it's integrated. Is the Python service called synchronously by
AnnotationService? Does it post back to a Spring endpoint? Is there a shared contract (OpenAPI spec or JSON schema)? Thepolygonfield in the OCR response JSON needs a formal contract — otherwise, format drift between the two services will cause silent data corruption.Flyway migration is a one-liner — good.
ADD COLUMN polygon JSONBis safe, backward compatible, and reversible (you canDROP COLUMNcleanly). No concerns there.Suggestions
CHECKconstraint onpolygonat the database level using a simple JSON structure check, e.g.CHECK (polygon IS NULL OR jsonb_array_length(polygon) = 4). This enforces the 4-point invariant at the data layer, where it belongs.🧪 Sara Holt — QA Engineer
Questions & Observations
Acceptance criteria are missing. The issue describes the implementation well but doesn't define what "done" looks like from a behavior standpoint. Before implementation: what should a tester verify? Some candidates:
polygon: null→ stored and retrieved asnull, rendered as<rect><polygon>The
approximate_to_quadalgorithm needs a unit test suite in Python. This is a non-trivial geometric computation. Test cases to cover:Float precision in the round-trip. Coordinates are normalized to
[0,1]and stored in JSONB. After serialization → storage → retrieval → deserialization, are the floats stable? JavaScript's 64-bit float and PostgreSQL'sfloat8will agree, but if the OCR service produces Pythonfloat32values, precision loss may occur. This should be verified with an integration test.Migration test. The Flyway migration adds a nullable column — trivially safe, but the integration test suite should confirm that: (a) the migration runs cleanly on a fresh database, and (b) existing annotation rows have
polygon = nullafter migration and still load correctly in the application.Frontend rendering regression. Any change to
PdfViewerthat adds conditional rendering logic risks breaking the existing<rect>path. A Vitest component test asserting thatannotation.polygon = nullrenders a<rect>(not a<polygon>) should be added as a regression guard.Suggested Test Plan
polygon_to_quadwith rotated, curved, degenerate inputspolygonis nullPdfViewerrenders<rect>for null polygon,<polygon>otherwise🔐 Nora "NullX" Steiner — Application Security Engineer
Questions & Observations
Unvalidated JSONB input stored directly. The issue says
AnnotationServicepasses through the polygon "without modification." If thepolygonfield inCreateAnnotationDTOis a rawString(or genericObject) that is written to JSONB without structural validation, a client can store arbitrary JSON in this column. While JSONB doesn't enable SQL injection on its own, it creates two risks: (1) bloated storage if an attacker sends megabyte-sized arrays, and (2) downstream type-confusion bugs if code later assumes the structure is[[x,y],[x,y],[x,y],[x,y]]. CWE-20 (Improper Input Validation) applies here.Fix: Validate in the DTO before the service ever sees the value:
This rejects malformed input at the controller boundary — no service change needed.
Mass assignment risk. If
polygonis added toDocumentAnnotationentity and the entity is also used in update paths, ensure the update DTO also explicitly controls which fields can be set. Passing an entity directly to a form endpoint is a mass assignment vector (CWE-915). The issue mentionsCreateAnnotationDTO— verify that no update endpoint exposes the full entity to be overwritten.SVG
<polygon>injection in the frontend. Thepointsattribute of an SVG<polygon>element is populated from backend data. If coordinates are rendered via string interpolation into an SVG attribute without escaping, an attacker who stores crafted polygon data could inject SVG attributes. In Svelte, binding via{points}is safe (Svelte escapes attribute values), but double-check the rendering code usespoints={formattedPoints}not{@html ...}.No concerns with the JSONB storage itself. JSONB is not a SQL injection vector when accessed through JPA/Hibernate. The migration (
ADD COLUMN polygon JSONB) is safe.scipy/numpy supply chain. The OCR service is adding
scipyas a new dependency forConvexHull. Ensurerequirements.txtpins the exact version and that the CI pipeline runspip-auditor similar.scipyis a large dependency with a history of CVEs in its Fortran bindings (generally not exploitable in this context, but worth noting for dependency hygiene).Summary of Findings
🎨 Leonie Voss — UI/UX Designer
Questions & Observations
No visual design spec. This is a technical issue, but the polygon overlay will be visible to users when they view OCR'd documents. What should it look like? Right now, the rectangular annotation style exists somewhere in the codebase. The polygon variant needs the same visual treatment: fill opacity, stroke color, hover state, selected state. Without a design spec, the developer will make arbitrary choices that may not match the existing annotation appearance.
The fallback rendering path must be visually identical to the current behavior. The
<polygon>SVG element and the<rect>SVG element render differently in SVG coordinate space. A<rect>is defined byx,y,width,height— a<polygon>bypoints. If styling is applied via CSS class (e.g..annotation-overlay), both elements will inherit it. But if any style is applied via SVG presentation attributes (fill,stroke) directly on the element tag, you need to explicitly replicate them on both elements. Verify this is handled.Accessibility of SVG shapes. Both
<rect>and<polygon>need the same accessible treatment:role="img"orrole="region"with anaria-labeldescribing the annotation (e.g. "OCR block, page 1")tabindex="0", keyboard handlers, and a visible focus indicatoroutlinebehaves differently on SVG elements vs HTML elements; usestroke+filter: drop-shadowfor SVG focus statesWill users be able to distinguish polygon annotations from rectangle annotations? If all annotations look the same regardless of shape, users lose the precision benefit this feature is supposed to provide. Consider a subtle visual cue (e.g. slightly different stroke style or corner treatment) that signals "this annotation was computed from a precise polygon." At minimum, note in the issue whether same-appearance is intentional or a gap.
Coordinate rendering at different zoom levels. Normalized 0–1 coordinates scaled by the page dimensions in the viewer — this is the right approach and should scale correctly. But confirm that the SVG viewport or
viewBoxis set consistently for both<rect>and<polygon>rendering. A mismatch in how page dimensions are mapped to SVG space would cause the polygon overlay to appear in the wrong position.Suggestions
🛠️ Tobias Wendt — DevOps & Platform Engineer
Questions & Observations
scipy is a heavyweight dependency.
scipypulls innumpy, LAPACK/BLAS bindings, and multiple compiled extensions — the Docker image for the OCR service could grow by 300–500 MB just forConvexHull. If onlyConvexHullis needed, consider whetherscipy.spatial.ConvexHullis really necessary or if a minimal pure-Python implementation (e.g. the gift-wrapping algorithm for 2D points) would suffice. This avoids the image bloat, speeds up CI image builds, and eliminates a large compiled-extension dependency from the attack surface.The
approximate_to_quadfunction is the undefined piece. From a deployment standpoint, if this function fails at runtime (e.g. on a degenerate polygon that makes ConvexHull throw aQhullError), what happens? Does the OCR service crash? Return a 500 to the Java backend? Return the annotation without a polygon field? The fallback behavior needs to be defined and handled gracefully — the OCR pipeline should not fail a whole document because one text block had a degenerate polygon.No observability change mentioned. The polygon computation is a new failure mode in the OCR service. If
polygon_to_quadsilently produces incorrect results (coordinates outside [0,1], NaN values, etc.), you'll have no visibility unless you log or metric it. Add a log line when polygon computation falls back to None, and consider a counter metric (e.g. Prometheusocr_polygon_fallback_total) so you can see the fallback rate in Grafana.Migration deployment.
ADD COLUMN polygon JSONBis safe to run with the application live (no lock escalation, no data rewrite). The backend can be deployed before or after the migration without compatibility issues in either direction — the column is nullable and the existing code ignores it. Good design.Python service requirements.txt. Confirm
scipyandnumpyare pinned to exact versions inrequirements.txt, not justscipy>=1.x. Floating dependency versions in a container image are reproducibility problems waiting to happen. The CI build should fail if the lock file is out of date.Suggestions
polygon_to_quadin a try/except that logs the error and returnsNone, so a polygon computation failure degrades gracefully to AABB-only storage rather than blocking the annotation entirely.🏛️ Markus Keller — Application Architect
Interactive discussion — architectural decisions agreed with @marcel
✅ Resolved: OCR service integration contract
The existing ADR-001 (
docs/adr/001-ocr-python-microservice.md) already defines the transport: Spring Boot calls the Python OCR service viaRestClientover HTTP. Thepolygonfield is simply added to the existing JSON response body. Contract is enforced by a Pydantic model in the Python service and a matching Java DTO on the Spring Boot side. No new transport decision required.✅ Resolved: 4-point constraint enforced at the database layer
The 4-point limit is a permanent domain constraint, not a temporary restriction. The Flyway migration will include a named
CHECKconstraint:Two separate statements so the constraint is named and independently droppable.
✅ Resolved: Java entity typed via
AttributeConverter, no new dependencyprivate String polygonis rejected. The entity will holdList<List<Double>> polygonbacked by a smallAttributeConverter<List<List<Double>>, String>. Hypersistence Utils is not currently a project dependency and won't be added for a single use case.✅ Resolved: JSONB decision gets a full ADR
A new
docs/adr/002-polygon-jsonb-storage.mdwill document the JSONB choice, the alternatives considered (8NUMERICcolumns, separate join table), and why they were rejected (no server-side geometry queries needed, pass-through display-only storage).✅ Resolved: Semantic invariant for the
polygonfieldAgreed definition for the entity Javadoc and the ADR:
Key points: polygon is source-agnostic (Kraken or future manual input), AABB is always the geometry source of truth for server-side logic.
Overall read
The design is sound. The main gaps were documentation (ADR, semantic definition) and a missing DB-level constraint — both easy to close before implementation starts. The integration contract with the OCR service was already resolved by ADR-001, just not referenced from this issue. Recommend linking #226 and ADR-001 explicitly in the implementation tasks.
👨💻 Felix Brandt — Senior Fullstack Developer
Interactive discussion — implementation decisions agreed with @marcel
✅ Resolved:
approximate_to_quadalgorithm — pure Python, no scipyThe Python snippet in the issue calls
approximate_to_quadbut never implements it. Agreed algorithm: rotating calipers minimum bounding rectangle on the convex hull of Kraken's polygon output. Convex hull computed with a pure-Python gift-wrapping (Jarvis march) implementation.scipy, nonumpy, nocv2— pure Python only✅ Resolved: DTO validation scope
Four-layer validation on
CreateAnnotationDTO.polygon:@Size)@Size)@DecimalMin/@DecimalMax)@UniquePointsvalidator with a proper validation message;nullpasses through (field is optional)DB
CHECKconstraint (from Markus's discussion) enforces the 4-point invariant at the data layer as a second line of defence.✅ Resolved:
AnnotationShape.sveltecomponent splitPdfViewermust not own the rect/polygon rendering decision — that is a second responsibility. ExtractAnnotationShape.svelteas a separate commit before the polygon feature:AnnotationShape.sveltereceives one annotation, renders<rect>or<polygon>internallyPdfVieweriterates and delegates:{#each annotations as annotation (annotation.id)}<AnnotationShape {annotation} />{/each}<rect>rendering — no behavior changeAnnotationShape✅ Resolved: TDD order
First failing test is a DTO validation unit test:
Full TDD order: DTO validation →
AttributeConverterunit → round-trip@DataJpaTest→ API integration slice.✅ Resolved: Implementation sequencing — #226 and #227 ship together
The "Depends On: #226" relationship in this issue is backward for Kraken quality. Shipping #226 first means Kraken goes live with AABB-only output that is explicitly described as imprecise. Agreed: implement #226 and #227 in a single branch, closing both issues in one PR.
Branch implementation order:
polygon_to_quadand polygon output (#226 + #227)AttributeConverter, DTO validation, Flyway migration (#227)AnnotationShape.sveltesplit + polygon rendering (#227)Overall read
The feature is well-scoped. The main implementation gaps were the undefined
approximate_to_quadalgorithm and the DTO validation boundary. Both are now settled. The sequencing fix (shipping #226 and #227 together) avoids deliberately degraded Kraken output and simplifies the deployment story.Implementation Complete — Polygon Annotation Support
Implemented together with #226 on branch
feat/issue-226-227-ocr-pipeline-polygon(as agreed in the discussion summary).What was implemented for #227
Database:
polygon JSONBcolumn ondocument_annotationswithCHECK (polygon IS NULL OR jsonb_array_length(polygon) = 4)Backend:
DocumentAnnotationentity:polygonfield using@JdbcTypeCode(SqlTypes.JSON)PolygonConverter: standaloneAttributeConverter<List<List<Double>>, String>(retained as utility)CreateAnnotationDTO: polygon field with@Size(min=4, max=4),@UniquePoints, coordinate range validation@DecimalMin("0.0") @DecimalMax("1.0")@UniquePointscustom validator rejects duplicate coordinatescreateOcrAnnotation()onAnnotationService: skips overlap check, accepts optional polygonPython microservice:
Frontend:
AnnotationShape.svelte: extracted fromAnnotationLayer, renders<rect>orclip-path: polygon()based onannotation.polygonAnnotationtype updated with optionalpolygon?: [number, number][] | nullTest coverage
PolygonConverterTest: 6 tests (null handling, serialization, round-trip)UniquePointsValidatorTest: 8 tests (null, valid, duplicates, wrong count, out-of-range coords)AnnotationServiceTest: 3 new tests forcreateOcrAnnotation(polygon pass-through, null polygon, no overlap check)AnnotationLayerbrowser tests pass with the AnnotationShape extraction