Extracting Key-Value Pairs from a Form Document

The following Python example shows how to extract key-value pairs in form documents from Block objects that are stored in a map. Block objects are returned from a call to AnalyzeDocument. For more information, see Form Data (Key-Value Pairs).

You use the following functions:

  • get_kv_map – Calls AnalyzeDocument and stores the KEY and VALUE Block objects in maps that are keyed by block ID.

  • get_kv_relationship and find_value_block – Construct the key-value relationships from the maps.
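
To make the relationship structure concrete, the following is a minimal offline sketch of how a KEY block points to its VALUE block and to its WORD children. The block dictionaries are hand-written stand-ins that use only the fields the example in this topic reads. They are assumptions for illustration, not real AnalyzeDocument output, and the helper names ids_of and words are hypothetical.

```python
# Hand-written stand-ins for Textract Block objects (illustrative, not real API output).
blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Full"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
]

# Index every block by ID so relationships can be followed by lookup.
block_map = {block["Id"]: block for block in blocks}


def ids_of(block, rel_type):
    """Return the IDs listed under relationships of the given type (hypothetical helper)."""
    ids = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] == rel_type:
            ids.extend(relationship["Ids"])
    return ids


def words(block):
    """Join the text of the WORD children of a block (hypothetical helper)."""
    children = (block_map[child_id] for child_id in ids_of(block, "CHILD"))
    return " ".join(child["Text"] for child in children if child["BlockType"] == "WORD")


# For each KEY block, follow its VALUE relationship and collect the value text.
pairs = {}
for block in blocks:
    if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block["EntityTypes"]:
        pairs[words(block)] = [words(block_map[value_id]) for value_id in ids_of(block, "VALUE")]

print(pairs)
```

The key text comes from the KEY block's CHILD relationship, and the value text comes from the CHILD relationship of the block that the VALUE relationship points to.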

To extract key-value pairs from a form document
  1. Configure your environment. For more information, see Prerequisites.

  2. Save the following example code to a file named textract_python_kv_parser.py. In the function get_kv_map, replace profile-name with the name of a profile that can assume the role, and replace region with the AWS Region in which you want to run the code.

    import boto3
    import sys
    import re
    from collections import defaultdict


    def get_kv_map(file_name):
        with open(file_name, 'rb') as file:
            img_test = file.read()
            bytes_test = bytearray(img_test)
            print('Image loaded', file_name)

        # process using image bytes
        session = boto3.Session(profile_name='profile-name')
        client = session.client('textract', region_name='region')
        response = client.analyze_document(Document={'Bytes': bytes_test}, FeatureTypes=['FORMS'])

        # Get the text blocks
        blocks = response['Blocks']

        # get key and value maps
        key_map = {}
        value_map = {}
        block_map = {}
        for block in blocks:
            block_id = block['Id']
            block_map[block_id] = block
            if block['BlockType'] == "KEY_VALUE_SET":
                if 'KEY' in block['EntityTypes']:
                    key_map[block_id] = block
                else:
                    value_map[block_id] = block

        return key_map, value_map, block_map


    def get_kv_relationship(key_map, value_map, block_map):
        kvs = defaultdict(list)
        for block_id, key_block in key_map.items():
            value_block = find_value_block(key_block, value_map)
            key = get_text(key_block, block_map)
            val = get_text(value_block, block_map)
            kvs[key].append(val)
        return kvs


    def find_value_block(key_block, value_map):
        # Returns None if the key block has no VALUE relationship
        value_block = None
        for relationship in key_block.get('Relationships', []):
            if relationship['Type'] == 'VALUE':
                for value_id in relationship['Ids']:
                    value_block = value_map[value_id]
        return value_block


    def get_text(result, blocks_map):
        # Concatenates the WORD children of a block; selected check boxes become 'X'
        text = ''
        if result and 'Relationships' in result:
            for relationship in result['Relationships']:
                if relationship['Type'] == 'CHILD':
                    for child_id in relationship['Ids']:
                        word = blocks_map[child_id]
                        if word['BlockType'] == 'WORD':
                            text += word['Text'] + ' '
                        if word['BlockType'] == 'SELECTION_ELEMENT':
                            if word['SelectionStatus'] == 'SELECTED':
                                text += 'X '
        return text


    def print_kvs(kvs):
        for key, value in kvs.items():
            print(key, ":", value)


    def search_value(kvs, search_key):
        # search_key is treated as a case-insensitive regular expression
        for key, value in kvs.items():
            if re.search(search_key, key, re.IGNORECASE):
                return value


    def main(file_name):
        key_map, value_map, block_map = get_kv_map(file_name)

        # Get Key Value relationship
        kvs = get_kv_relationship(key_map, value_map, block_map)
        print("\n\n== FOUND KEY : VALUE pairs ===\n")
        print_kvs(kvs)

        # Start searching a key value
        while input('\n Do you want to search a value for a key? (enter "n" for exit) ') != 'n':
            search_key = input('\n Enter a search key:')
            print('The value is:', search_value(kvs, search_key))


    if __name__ == "__main__":
        file_name = sys.argv[1]
        main(file_name)

  3. At the command prompt, enter the following command. Replace file with the document image file that you want to analyze.

    python textract_python_kv_parser.py file
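  4. The script prints every key-value pair that it found and then asks whether you want to search. Enter anything other than n to continue, and then enter a key that's in the input document. The key is matched as a case-insensitive regular expression. If the code detects the key, it displays the key's value. Otherwise, it displays None.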
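
Because search_value passes the search key to re.search, the key you enter is interpreted as a regular expression and not as literal text. The following sketch reproduces search_value from the example and runs it against a hand-written dictionary. The dictionary contents and the search_value_literal wrapper are assumptions added for illustration and are not part of the original example.

```python
import re

# Hypothetical extracted pairs, in the shape that get_kv_relationship returns.
# get_text appends a space after each word, so keys and values end with a space.
kvs = {
    "Full Name: ": ["Jane Doe "],
    "Phone Number: ": ["555-0100 "],
    "Mailing Address (optional): ": ["123 Any Street "],
}


def search_value(kvs, search_key):
    """Same logic as the example: return the value of the first key that matches."""
    for key, value in kvs.items():
        if re.search(search_key, key, re.IGNORECASE):
            return value


def search_value_literal(kvs, search_key):
    """Hypothetical variant that escapes the input so it is matched as literal text."""
    return search_value(kvs, re.escape(search_key))


# A partial, case-insensitive match is enough to find a key.
print(search_value(kvs, "phone"))

# Parentheses form a regex group, so this pattern does not match the literal "(optional)".
print(search_value(kvs, "address (optional)"))

# Escaping the input first makes the same text match literally.
print(search_value_literal(kvs, "address (optional)"))
```

If the keys in your forms contain characters such as parentheses or question marks, escaping the input as in search_value_literal is one way to avoid unexpected misses. An input with an unbalanced parenthesis would otherwise raise re.error.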
