Locating Items on a Document Page

Focus mode

Locating Items on a Document Page - Amazon Textract

Amazon Textract operations return the location and geometry of items found on a document page. DetectDocumentText and GetDocumentTextDetection return the location and geometry for lines and words, while AnalyzeDocument and GetDocumentAnalysis return the location and geometry of key-value pairs, tables, cells, and selection elements.

To determine where an item is on a document page, use the bounding box (Geometry) information returned by the Amazon Textract operation in a Block object. The Geometry object contains two types of location and geometric information for detected items:

An axis-aligned BoundingBox object that contains the top-left coordinate and the width and height of the item.
A polygon object that describes the outline of the item, specified as an array of Point objects that contain X (horizontal axis) and Y (vertical axis) document page coordinates of each point.

The JSON for a Block object looks similar to the following. Note the BoundingBox and Polygon fields.


{
    "Geometry": {
        "BoundingBox": {
            "Width": 0.053907789289951324, 
            "Top": 0.08913730084896088, 
            "Left": 0.11085548996925354, 
            "Height": 0.013171200640499592
        }, 
        "Polygon": [
            {
                "Y": 0.08985357731580734, 
                "X": 0.11085548996925354
            }, 
            {
                "Y": 0.08913730084896088, 
                "X": 0.16447919607162476
            }, 
            {
                "Y": 0.10159222036600113, 
                "X": 0.16476328670978546
            }, 
            {
                "Y": 0.10230850428342819, 
                "X": 0.11113958805799484
            }
        ]
    }, 
    "Text": "Name:", 
    "TextType": "PRINTED",
    "BlockType": "WORD", 
    "Confidence": 99.56285858154297, 
    "Id": "c734fca6-c4c4-415c-b6c1-30f7510b72ee"
},

You can use geometry information to draw bounding boxes around detected items. For an example that uses BoundingBox and Polygon information to draw boxes around lines and vertical lines at the start and end of each word, see Detecting Document Text with Amazon Textract. The example output is similar to the following.

Form with fields for Name, Address, and Birthdate displaying personal information of Jane Doe with address 123 Any Street, Anytown, USA and birthdate 12-26-1980.

Bounding Box

A bounding box (BoundingBox) has the following properties:

Height – The height of the bounding box as a ratio of the overall document page height.
Left – The X coordinate of the top-left point of the bounding box as a ratio of the overall document page width.
Top – The Y coordinate of the top-left point of the bounding box as a ratio of the overall document page height.
Width – The width of the bounding box as a ratio of the overall document page width.

Each BoundingBox property has a value between 0 and 1. The value is a ratio of the overall image width (applies to Left and Width) or height (applies to Height and Top). For example, if the input image is 700 x 200 pixels, and the top-left coordinate of the bounding box is (350,50) pixels, the API returns a Left value of 0.5 (350/700) and a Top value of 0.25 (50/200).

The following diagram shows the range of a document page that each BoundingBox property covers.

To display the bounding box with the correct location and size, you multiply the BoundingBox values by the document page width or height (depending on the value you want) to get the pixel values. You use the pixel values to display the bounding box. An example is using a document page of 608 pixels width x 588 pixels height, and the following bounding box values for analyzed text:


BoundingBox.Left: 0.3922065
BoundingBox.Top: 0.15567766
BoundingBox.Width: 0.284666
BoundingBox.Height: 0.2930403

The location of the text bounding box in pixels is calculated as follows:

Left coordinate = BoundingBox.Left (0.3922065) * document page width (608) = 238

Top coordinate = BoundingBox.Top (0.15567766) * document page height (588) = 91

Bounding box width = BoundingBox.Width (0.284666) * document page width (608) = 173

Bounding box height = BoundingBox.Height (0.2930403) * document page height (588) = 172

You use these values to display a bounding box around the analyzed text. The following Java and Python examples demonstrate how to display a bounding box.

anchor anchor


    public void ShowBoundingBox(int imageHeight, int imageWidth, BoundingBox box, Graphics2D g2d) {

        float left = imageWidth * box.getLeft();
        float top = imageHeight * box.getTop();

        // Display bounding box.
        g2d.setColor(new Color(0, 212, 0));
        g2d.drawRect(Math.round(left / scale), Math.round(top / scale),
                Math.round((imageWidth * box.getWidth()) / scale), Math.round((imageHeight * box.getHeight())) / scale);

    }

Polygon

The polygon returned by AnalyzeDocument is an array of Point objects. Each Point has an X and Y coordinate for a specific location on the document page. Like the BoundingBox coordinates, the polygon coordinates are normalized to the document width and height, and are between 0 and 1.

You can use points in the polygon array to display a finer-grain bounding box around a Block object. You calculate the position of each polygon point on the document page by using the same technique used for BoundingBoxes. Multiply the X coordinate by the document page width, and multiply the Y coordinate by the document page height.

The following example shows how to display the vertical lines of a polygon.


    public void ShowPolygonVerticals(int imageHeight, int imageWidth, List <Point> points, Graphics2D g2d) {

        g2d.setColor(new Color(0, 212, 0));
        Object[] parry = points.toArray();
        g2d.setStroke(new BasicStroke(2));

        g2d.drawLine(Math.round(((Point) parry[0]).getX() * imageWidth),
                Math.round(((Point) parry[0]).getY() * imageHeight), Math.round(((Point) parry[3]).getX() * imageWidth),
                Math.round(((Point) parry[3]).getY() * imageHeight));

        g2d.setColor(new Color(255, 0, 0));
        g2d.drawLine(Math.round(((Point) parry[1]).getX() * imageWidth),
                Math.round(((Point) parry[1]).getY() * imageHeight), Math.round(((Point) parry[2]).getX() * imageWidth),
                Math.round(((Point) parry[2]).getY() * imageHeight));

    }

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Interpreting Responses

Text Detection and Document Analysis Response Objects

Next topic:

Text Detection and Document Analysis Response Objects

Previous topic:

Interpreting Responses

Select your cookie preferences

Customize cookie preferences

Essential

Performance

Functional

Advertising

Unable to save cookie preferences

Locating Items on a Document Page

Bounding Box

Polygon

Next topic:

Previous topic:

Need help?

On this page

Did this page help you?

Form with fields for Name, Address, and Birthdate displaying personal information of Jane Doe with address 123 Any Street, Anytown, USA and birthdate 12-26-1980.