

# Match input data using a matching workflow
<a name="create-matching-workflow"></a>

A *matching workflow* is a data processing job that combines and compares data from different input sources and determines which records match based on different matching techniques. AWS Entity Resolution reads your data from your specified locations, finds matches between records, and assigns a [Match ID](glossary.md#match-id-defin) to each matched set of data.

The following diagram summarizes how to create a matching workflow.

![\[A summary of the four steps to create a matching workflow in AWS Entity Resolution\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/HIW-Matching-Workflow.png)

**Topics**
+ [Matching workflow types](#matching-workflow-types)
+ [Data output options](#data-output-options)
+ [Matching workflow results](#matching-workflow-results)
+ [Creating a rule-based matching workflow](creating-matching-workflow-rule-based.md)
+ [Creating a machine learning-based matching workflow](create-matching-workflow-ml.md)
+ [Creating a provider service-based matching workflow](create-matching-workflow-provider.md)
+ [Editing a matching workflow](edit-matching-workflow.md)
+ [Deleting a matching workflow](delete-matching-workflow.md)
+ [Modifying or generating a Match ID for a rule-based matching workflow](generate-match-id.md)
+ [Looking up a Match ID for a rule-based matching workflow](find-match-id.md)
+ [Deleting records from a rule-based or ML-based matching workflow](delete-records.md)
+ [Troubleshooting matching workflows](troubleshooting.md)

## Matching workflow types
<a name="matching-workflow-types"></a>

AWS Entity Resolution supports three types of matching workflows: 

Rule-based matching  
Uses configurable rules to identify matching records based on exact or fuzzy matching of specified fields. You define the matching criteria, such as matching names that are spelled similarly or addresses that are formatted differently. 

Machine learning-based matching  
Uses machine learning models to identify similar records, even when the data has variations, errors, or missing fields. This approach can detect more complex matches than rule-based matching. 

Provider service-based matching  
Uses third-party data providers to enrich and validate your data before matching. This type of matching is not compatible with Amazon Connect Customer Profiles output.

## Data output options
<a name="data-output-options"></a>

AWS Entity Resolution can write data output files to: 
+ An Amazon S3 location that you specify 
+ Amazon Connect Customer Profiles (for customer data deduplication) 

**Important**  
Exporting to Amazon Connect Customer Profiles is not compatible with provider-based matching. To export to Amazon Connect Customer Profiles, you must use rule-based matching or machine learning-based matching.

You can also have AWS Entity Resolution hash the output data, which helps you maintain control over your data.

The following table shows the three types of matching workflows and their supported output destinations.


| Matching type | S3 output | Customer Profiles Output | 
| --- | --- | --- | 
| [rule-based](creating-matching-workflow-rule-based.md) | ![\[Yes\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/success_icon.png) Yes | ![\[Yes\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/success_icon.png) Yes | 
| [machine learning-based](create-matching-workflow-ml.md) | ![\[Yes\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/success_icon.png) Yes | ![\[Yes\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/success_icon.png) Yes | 
| [provider service-based](create-matching-workflow-provider.md) | ![\[Yes\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/success_icon.png) Yes | ![\[No\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/negative_icon.png) No | 
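
The compatibility table above can be expressed as a small lookup, which is handy when a script needs to reject an unsupported combination before creating a workflow. This is a minimal sketch: the dictionary keys and the `supports_output` helper are hypothetical labels chosen for illustration, not values from the AWS Entity Resolution API.

```python
# Supported output destinations per matching type, transcribed from the table above.
# The string labels are illustrative only; they are not API enum values.
SUPPORTED_OUTPUTS = {
    "rule-based": {"s3", "customer-profiles"},
    "machine learning-based": {"s3", "customer-profiles"},
    "provider service-based": {"s3"},
}


def supports_output(matching_type: str, destination: str) -> bool:
    """Return True if the matching type can write to the given destination."""
    if matching_type not in SUPPORTED_OUTPUTS:
        raise ValueError(f"unknown matching type: {matching_type!r}")
    return destination in SUPPORTED_OUTPUTS[matching_type]


print(supports_output("provider service-based", "customer-profiles"))  # False
```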

## Matching workflow results
<a name="matching-workflow-results"></a>

After you create and run a matching workflow, you can view the results in your specified S3 location or in Amazon Connect Customer Profiles. Matching workflows generate IDs after the data is indexed.

A matching workflow can have multiple runs, and the results of each run (successes or errors) are written to a folder named after the `jobId`.

For each run for S3 output destinations:
+ The data output contains both a file for successful matches and a file for errors
+ Successful results are written to a `success` folder containing multiple files
+ Errors are written to an `error` folder with multiple fields
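
The folder layout described above can be used to separate the objects of a single run. The following sketch assumes only what this section states: each run's output sits under a folder named after the `jobId`, which in turn contains a `success` or `error` folder. The `group_run_output` helper and the sample keys are hypothetical; any prefix in front of the `jobId` folder is ignored.

```python
from collections import defaultdict


def group_run_output(keys, job_id):
    """Split object keys for one run into 'success' and 'error' groups.

    Assumes each relevant key has a path segment equal to job_id that is
    immediately followed by a 'success' or 'error' segment. Other keys are skipped.
    """
    groups = defaultdict(list)
    for key in keys:
        parts = key.split("/")
        for i, part in enumerate(parts[:-1]):
            if part == job_id and parts[i + 1] in ("success", "error"):
                groups[parts[i + 1]].append(key)
                break
    return dict(groups)


# Hypothetical object keys listed from the output location.
keys = [
    "output/my-workflow/job123/success/part-0.csv",
    "output/my-workflow/job123/error/part-0.csv",
    "output/my-workflow/job999/success/part-0.csv",
]
print(group_run_output(keys, "job123"))
```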

For each run for Amazon Connect Customer Profiles output destinations:
+ Deduplicated customer records are sent directly to your Amazon Connect instance
+ You can view your recent job history in the AWS Entity Resolution console
+ Existing profiles in Amazon Connect are not included in the deduplication process

After you create and run a matching workflow, you can use the output of [rule-based matching](creating-matching-workflow-rule-based.md) or [machine learning (ML) matching](create-matching-workflow-ml.md) as an input to [provider service-based matching](create-matching-workflow-provider.md) or the other way around to meet your business needs. 

For example, to save provider subscription costs, you can first run [rule-based matching](creating-matching-workflow-rule-based.md) to find matches on your data. Then, you can send a subset of unmatched records to [provider service-based matching](create-matching-workflow-provider.md). Note that if you plan to export to Customer Profiles, you should use rule-based or machine learning-based matching only.

For more information about troubleshooting errors, see [Troubleshooting matching workflows](troubleshooting.md). 

# Creating a rule-based matching workflow
<a name="creating-matching-workflow-rule-based"></a>

*[Rule-based matching](glossary.md#rule-based-matching-defn)* is a hierarchical set of waterfall matching rules that AWS Entity Resolution suggests based on the data that you input and that you can fully configure. The rule-based matching workflow enables you to compare cleartext or hashed data to find exact matches based on criteria that you customize.

When AWS Entity Resolution finds a match between two or more records in your data, it assigns:
+ A [Match ID](glossary.md#match-id-defin) to the records in the matched set of data
+ The [Match rule](glossary.md#match-rule-defn) that generated the match.
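
To make the waterfall idea concrete, the following sketch checks an ordered list of rules against a pair of records and reports the first rule that matches. It is an illustration only, not the service's implementation: the `first_matching_rule` helper, the record fields, and the choice to treat empty values as non-matching are all assumptions made for the example.

```python
def first_matching_rule(record_a, record_b, rules):
    """Return the name of the first rule under which both records agree on
    every match key, or None if no rule matches.

    rules is an ordered list of (rule_name, [match_key, ...]) pairs. Earlier
    rules are tried first, which is what makes the rule set a waterfall.
    """
    for rule_name, match_keys in rules:
        if all(
            record_a.get(key) not in (None, "")  # assumption: empty values never match
            and record_a.get(key) == record_b.get(key)
            for key in match_keys
        ):
            return rule_name
    return None


rules = [
    ("Rule 1", ["email", "phone"]),  # strictest rule first
    ("Rule 2", ["email"]),
]
a = {"email": "jo@example.com", "phone": "555-0100"}
b = {"email": "jo@example.com", "phone": "555-0199"}
print(first_matching_rule(a, b, rules))  # Rule 2
```

Records that match would then share a Match ID, and the returned rule name corresponds to the Match rule recorded for the match.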

When you create a rule-based matching workflow in AWS Entity Resolution, you must choose either a **Simple** or **Advanced** rule type. The rule type determines the complexity of rule conditions you can create. You can't change the rule type after creating the workflow.

You can use the following chart to compare the two **Rule types** and determine which one suits your use case.


**Rule type comparison chart**  

| Use case | Advanced rule type | Simple rule type | 
| --- | --- | --- | 
| Schema mappings mapped one-to-one with input types | ![\[Yes\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/success_icon.png) Yes | No | 
| Schema mapping with multiple data columns mapped to the same input types | ![\[No\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/negative_icon.png) No | Yes | 
| Supports Exact and Fuzzy matching | ![\[Yes\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/success_icon.png) Yes | No (Exact matching only) | 
| Supports AND, OR, and parentheses operators | ![\[Yes\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/success_icon.png) Yes | No (AND operator only) | 
| Supports batch workflows | ![\[Yes\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/success_icon.png) Yes | Yes | 
| Supports incremental workflows | ![\[Yes\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/success_icon.png) Yes | Yes | 
| Supports real-time workflows | ![\[No\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/negative_icon.png) No | Yes | 
| Supports ID mapping workflows | ![\[No\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/negative_icon.png) No | Yes | 
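
Because the rule type can't be changed after the workflow is created, it can help to check your requirements against the chart before you start. The following sketch encodes the chart as a lookup; the capability names and the `candidate_rule_types` helper are hypothetical shorthand for the chart rows, not console or API terms.

```python
# Capabilities per rule type, transcribed from the comparison chart above.
# The capability names are illustrative shorthand for the chart rows.
RULE_TYPE_SUPPORT = {
    "Advanced": {
        "one_to_one_schema_mapping": True,
        "many_columns_per_input_type": False,
        "fuzzy_matching": True,
        "or_and_parentheses": True,
        "batch": True,
        "incremental": True,
        "real_time": False,
        "id_mapping": False,
    },
    "Simple": {
        "one_to_one_schema_mapping": False,
        "many_columns_per_input_type": True,
        "fuzzy_matching": False,
        "or_and_parentheses": False,
        "batch": True,
        "incremental": True,
        "real_time": True,
        "id_mapping": True,
    },
}


def candidate_rule_types(required):
    """Return the rule types that support every capability in `required`."""
    return [
        name
        for name, capabilities in RULE_TYPE_SUPPORT.items()
        if all(capabilities[item] for item in required)
    ]


print(candidate_rule_types(["fuzzy_matching", "incremental"]))  # ['Advanced']
print(candidate_rule_types(["real_time"]))                      # ['Simple']
```

An empty result means that no single rule type covers the combination, for example fuzzy matching together with real-time workflows.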

After you have determined which rule type you want to use, use the following topics to create a rule-based matching workflow with either the **Advanced** or **Simple** rule type.

**Topics**
+ [Creating a rule-based matching workflow with the Advanced rule type](rule-based-mw-advanced.md)
+ [Creating a rule-based matching workflow with the Simple rule type](rule-based-mw-simple.md)

# Creating a rule-based matching workflow with the Advanced rule type
<a name="rule-based-mw-advanced"></a>

**Prerequisites**

Before you create a rule-based matching workflow, you must:

1. Create a schema mapping. For more information, see [Creating a schema mapping](create-schema-mapping.md).

1. If using Amazon Connect Customer Profiles as your output destination, ensure you have the appropriate permissions configured.
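
For the **Advanced** rule type, the schema mapping must also satisfy the match key requirement described in the Step 1 note of the console procedure: input fields can share a match key only when they are grouped together. The following sketch checks a mapping for violations before you create the workflow. The dictionary shape mirrors the `matchKey` and `groupName` example in that note; the `find_match_key_conflicts` helper and the sample field names are hypothetical.

```python
from collections import defaultdict


def find_match_key_conflicts(schema_mapping):
    """Return {match_key: [field, ...]} for match keys that are shared by
    fields which are not all in the same group."""
    fields_by_key = defaultdict(list)
    for field, attrs in schema_mapping.items():
        match_key = attrs.get("matchKey")
        if match_key:  # fields without a match key are not checked here
            fields_by_key[match_key].append((field, attrs.get("groupName")))

    conflicts = {}
    for match_key, fields in fields_by_key.items():
        groups = {group for _, group in fields}
        # Sharing is allowed only when every field carries the same group name.
        if len(fields) > 1 and (len(groups) != 1 or None in groups):
            conflicts[match_key] = [field for field, _ in fields]
    return conflicts


valid = {
    "firstName": {"matchKey": "name", "groupName": "name"},
    "lastName": {"matchKey": "name", "groupName": "name"},
}
invalid = {
    "homePhone": {"matchKey": "phone"},
    "workPhone": {"matchKey": "phone"},
}
print(find_match_key_conflicts(valid))    # {}
print(find_match_key_conflicts(invalid))  # {'phone': ['homePhone', 'workPhone']}
```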

The following procedure demonstrates how to create a rule-based matching workflow with the **Advanced** rule type using either the AWS Entity Resolution console or the `CreateMatchingWorkflow` API.

------
#### [ Console ]

**To create a rule-based matching workflow with the **Advanced** rule type using the console**

1. Sign in to the AWS Management Console and open the AWS Entity Resolution console at [https://console.aws.amazon.com/entityresolution/](https://console.aws.amazon.com/entityresolution/).

1. In the left navigation pane, under **Workflows**, choose **Matching**.

1. On the **Matching workflows** page, in the upper right corner, choose **Create matching workflow**.

1. For **Step 1: Specify matching workflow details**, do the following: 

   1. Enter a **Matching workflow name** and an optional **Description**.

   1. For **Data input**, choose an **AWS Region**, **AWS Glue database**, the **AWS Glue table**, and then the corresponding **Schema mapping**.

      You can add up to 19 data inputs.
**Note**  
To use **Advanced** rules, your schema mappings must meet the following requirements:  
Each input field must be mapped to a unique match key, unless the fields are grouped together.
If input fields are grouped together, they can share the same match key.  
For example, the following schema mapping would be valid for **Advanced** rules:  
`firstName: { matchKey: 'name', groupName: 'name' }`  
`lastName: { matchKey: 'name', groupName: 'name' }`  
In this case, the `firstName` and `lastName` fields are grouped together and share the same `name` match key, which is allowed.  
Review your schema mappings and update them to follow this one-to-one matching rule, unless the fields are properly grouped, in order to use **Advanced** rules.
If your data table has a DELETE column, the schema mapping's type must be `String` and you can't have a `matchKey` and `groupName`. 

   1. The **Normalize data** option is selected by default, so that data inputs are normalized before matching. If you don't want to normalize data, deselect the **Normalize data** option.
**Note**  
Normalization is only supported for the following scenarios in **Create schema mapping**:   
If the following **Name** sub-types are grouped: **First name**, **Middle name**, **Last name**.
If the following **Address** sub-types are grouped: **Street address 1**, **Street address 2**, **Street address 3**, **City**, **State**, **Country**, **Postal code**.
If the following **Phone** sub-types are grouped: **Phone number**, **Phone country code**.

   1. To specify the **Service access** permissions, choose an option and take the recommended action.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/rule-based-mw-advanced.html)

   1. (Optional) To enable **Tags** for the resource, choose **Add new tag**, and then enter the **Key** and **Value** pair.

   1. Choose **Next**.

1. For **Step 2: Choose matching technique**:

   1. For **Matching method**, choose **Rule-based matching**.

   1. For **Rule type**, choose **Advanced**.  
![\[Choose matching technique screen with the Advanced rule-based matching option selected.\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/choose-matching-method-rule-based-advanced.PNG)

   1. For **Processing cadence**, select one of the following options.
      + Choose **Manual** to run a workflow on demand for a bulk update 
      + Choose **Automatic** to run a workflow as soon as new data is in your S3 bucket 
**Note**  
If you choose **Automatic**, ensure that you have Amazon EventBridge notifications turned on for your S3 bucket. For instructions on enabling Amazon EventBridge using the S3 console, see [Enabling Amazon EventBridge](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-event-notifications-eventbridge.html) in the *Amazon S3 User Guide*.

   1. For **Matching rules**, enter a **Rule name** and then build the **Rule condition** by choosing the appropriate matching functions and operators from the dropdown list based on your goal.

      You can create up to 25 rules.

      You must combine a fuzzy matching function (**Cosine**, **Levenshtein**, or **Soundex**) with an exact matching function (**Exact**, **ExactManyToMany**) using the **AND** operator.

      You can use the following table to help decide what type of function or operator you want to use, depending on your goal.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/rule-based-mw-advanced.html)  
**Example Rule condition that matches on phone numbers and email**  

      The following is an example of a rule condition that matches records on phone numbers (**Phone** match key) and email addresses (**Email address** match key):

      `Exact(Phone,EmptyValues=Process) AND Levenshtein("Email address",2)`  
![\[Example of a rule condition that matches records on phone numbers and email addresses.\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/matching-rule-condition-example.png)

      The **Phone** match key uses the **Exact** matching function to match identical strings. The **Phone** match key processes empty values in matching using the **EmptyValues=Process** modifier.

      The **Email address** match key uses the **Levenshtein** matching function to match data with misspellings using the default Levenshtein Distance algorithm threshold of 2. The **Email address** match key doesn't use any optional modifiers.

      The **AND** operator combines the **Exact** matching function and the **Levenshtein** matching function.  
**Example Rule condition that uses ExactManyToMany to perform matchkey matching**  

      The following is an example of a rule condition that matches records on three address fields (**HomeAddress** match key, **BillingAddress** match key, and **ShippingAddress** match key) to find potential matches by checking if any of them have identical values. 

      The `ExactManyToMany` operator evaluates all possible combinations of the specified address fields to identify exact matches between any two or more addresses. For example, it would detect if the `HomeAddress` matches either the `BillingAddress` or `ShippingAddress`, or if all three addresses match exactly.

      ```
      ExactManyToMany(HomeAddress, BillingAddress, ShippingAddress)
      ```  
**Example Rule condition that uses clustering**  

      In Advanced rule-based matching with fuzzy conditions, the system first groups records into clusters based on exact matches. After these initial clusters are formed, the system applies fuzzy matching filters to identify additional matches within each cluster. For optimal performance, select exact match conditions based on your data patterns to create well-defined initial clusters. 

      The following is an example of a rule condition that combines multiple exact matches with a fuzzy match requirement. It uses `AND` operators to check that three fields — `FullName`, Date of Birth (`DOB`), and `Address` — match exactly between records. It also allows for minor variations in the `InternalID` field using a Levenshtein distance of `1`. The Levenshtein distance measures the minimum number of single-character edits required to change one string into another. A distance of 1 means it will match `InternalIDs` that differ by only one character (like a single typo, deletion, or insertion). This combination of conditions helps identify records that are very likely to represent the same entity, even if there are small discrepancies in the identifier.

      ```
      Exact(FullName) AND Exact(DOB) AND Exact(Address) AND Levenshtein(InternalID, 1)
      ```

   1. Choose **Next**.

1. For **Step 3: Specify data output and format**:

   1. For **Data output destination and format**, choose the **Amazon S3 location** for the data output and whether the **Data format** will be **Normalized data** or **Original data**.

   1. For **Encryption**, if you choose to **Customize encryption settings**, enter the **AWS KMS key** ARN.

   1. View the **System generated output**.

   1. For **Data output**, decide which fields you want to include, hide, or mask, and then take the recommended actions based on your goals.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/rule-based-mw-advanced.html)

   1. Choose **Next**.

1. For **Step 4: Review and create**:

   1. Review the selections that you made for the previous steps and edit if necessary.

   1. Choose **Create and run**.

      A message appears, indicating that the matching workflow has been created and that the job has started.

1. On the matching workflow details page, on the **Metrics** tab, view the following under **Last job metrics**:
   + The **Job ID**. 
   + The **Status** of the matching workflow job: **Queued**, **In progress**, **Completed**, **Failed** 
   + The **Time completed** for the workflow job.
   + The number of **Records processed**. 
   + The number of **Records not processed**. 
   + The **Unique match IDs generated**.
   + The number of **Input records**.

   You can also view the job metrics for matching workflow jobs that have been previously run under the **Job history**.

1. After the matching workflow job completes (**Status** is **Completed**), you can go to the **Data output** tab and then select your **Amazon S3 location** to view the results.

1. (**Manual** processing type only) If you have created a **Rule-based matching** workflow with the **Manual** processing type, you can run the matching workflow anytime by choosing **Run workflow** on the matching workflow details page.

1. (**Automatic** processing type only) If your data table has a DELETE column, then: 
   + Records set to *true* in the DELETE column are deleted.
   + Records set to *false* in the DELETE column are ingested into S3.

   For more information, see [Step 1: Prepare first-party data tables](prepare-input-data.md#prepare-first-party-tables).
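
The clustering rule condition shown in Step 2 combines exact comparisons with a Levenshtein threshold. The following sketch shows what that combination means for a pair of records. It is an illustration rather than the service's implementation: the `levenshtein` and `same_entity` helpers are hypothetical, and the threshold is assumed to be an inclusive maximum distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions
    needed to turn a into b (dynamic programming over two rows)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def same_entity(rec_a, rec_b, max_distance=1):
    """Mirror of the condition:
    Exact(FullName) AND Exact(DOB) AND Exact(Address) AND Levenshtein(InternalID, 1)
    """
    exact = all(rec_a[key] == rec_b[key] for key in ("FullName", "DOB", "Address"))
    return exact and levenshtein(rec_a["InternalID"], rec_b["InternalID"]) <= max_distance


a = {"FullName": "Jo Doe", "DOB": "1990-01-01", "Address": "1 Main St", "InternalID": "A1234"}
b = dict(a, InternalID="A1235")  # one substituted character
c = dict(a, InternalID="A1299")  # two substituted characters
print(same_entity(a, b))  # True
print(same_entity(a, c))  # False
```

Evaluating the exact comparisons first reflects the clustering behavior described in that step: the fuzzy comparison is only applied to records that already agree on the exact fields.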

------
#### [ API ]

**To create a rule-based matching workflow with the **Advanced** rule type using the API**
**Note**  
By default, the workflow uses standard (batch) processing. To use incremental (automatic) processing, you must explicitly configure it.

1. Open a terminal or command prompt to make the API request.

1. Create a POST request to the following endpoint: 

   ```
   /matchingworkflows
   ```

1. In the request header, set `Content-Type` to `application/json`. 
**Note**  
For a complete list of supported programming languages, see the *[AWS Entity Resolution API Reference](https://docs.aws.amazon.com/entityresolution/latest/apireference/Welcome.html)*. 

1. For the request body, provide the following JSON parameters. The required and optional parameters are listed after the request syntax: 

   ```
   {
      "description": "string",
      "incrementalRunConfig": { 
         "incrementalRunType": "string"
      },
      "inputSourceConfig": [ 
         { 
            "applyNormalization": boolean,
            "inputSourceARN": "string",
            "schemaName": "string"
         }
      ],
      "outputSourceConfig": [ 
         { 
            "applyNormalization": boolean,
            "KMSArn": "string",
            "output": [ 
               { 
                  "hashed": boolean,
                  "name": "string"
               }
            ],
            "outputS3Path": "string"
         }
      ],
      "resolutionTechniques": { 
         "providerProperties": { 
            "intermediateSourceConfiguration": { 
               "intermediateS3Path": "string"
            },
            "providerConfiguration": JSON value,
            "providerServiceArn": "string"
         },
         "resolutionType": "RULE_MATCHING",
         "ruleBasedProperties": { 
            "attributeMatchingModel": "string",
            "matchPurpose": "string",
            "rules": [ 
               { 
                  "matchingKeys": [ "string" ],
                  "ruleName": "string"
               }
            ]
         },
         "ruleConditionProperties": { 
            "rules": [ 
               { 
                  "condition": "string",
                  "ruleName": "string"
               }
            ]
         }
      },
      "roleArn": "string",
      "tags": { 
         "string" : "string" 
      },
      "workflowName": "string"
   }
   ```

   Where:
   + `workflowName` (required) – Must be unique and between 1–255 characters matching the pattern `[a-zA-Z_0-9-]*`
   + `inputSourceConfig` (required) – List of 1–20 input source configurations
   + `outputSourceConfig` (required) – Exactly one output source configuration
   + `resolutionTechniques` (required) – Set `resolutionType` to `RULE_MATCHING` for rule-based matching
   + `roleArn` (required) – IAM role ARN for workflow execution
   + `ruleConditionProperties` (required) – List of rule conditions and the name of the matching rule.

   Optional parameters include:
   + `description` – Up to 255 characters
   + `incrementalRunConfig` – Incremental run type configuration
   + `tags` – Up to 200 key-value pairs

1. (Optional) To use incremental processing instead of the default standard (batch) processing, add the following parameter to the request body: 

   ```
   "incrementalRunConfig": {
      "incrementalRunType": "AUTOMATIC"
   }
   ```

1. Send the request.

1. If successful, you'll receive a response with status code 200 and a JSON body containing: 

   ```
   {
      "workflowArn": "string",
      "workflowName": "string",
      // Plus all configured workflow details
   }
   ```

1. If the call is unsuccessful, you might receive one of these errors:
   + 400 – ConflictException if the workflow name already exists
   + 400 – ValidationException if the input fails validation
   + 402 – ExceedsLimitException if account limits are exceeded
   + 403 – AccessDeniedException if you don't have sufficient access
   + 429 – ThrottlingException if the request was throttled
   + 500 – InternalServerException if there's an internal service failure
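
The request body from the preceding steps can be assembled and checked in code before it is sent. The following sketch builds a body for the **Advanced** rule type and enforces the limits listed in the parameter descriptions. The `build_advanced_workflow_request` helper, the ARNs, the bucket, and the schema and workflow names are hypothetical placeholders, and the name pattern is taken from the API reference as an assumption. Signing and sending the request are left out.

```python
import json
import re

# Name pattern for workflowName (assumed from the CreateMatchingWorkflow API reference).
WORKFLOW_NAME_PATTERN = re.compile(r"[a-zA-Z_0-9-]*")


def build_advanced_workflow_request(name, role_arn, inputs, output, rules,
                                    description=None, incremental=False):
    """Assemble and sanity-check a request body for the Advanced rule type."""
    if not 1 <= len(name) <= 255 or not WORKFLOW_NAME_PATTERN.fullmatch(name):
        raise ValueError("workflowName must be 1-255 characters from [a-zA-Z_0-9-]")
    if not 1 <= len(inputs) <= 20:
        raise ValueError("inputSourceConfig must list 1-20 input sources")
    if description is not None and len(description) > 255:
        raise ValueError("description is limited to 255 characters")

    body = {
        "workflowName": name,
        "roleArn": role_arn,
        "inputSourceConfig": inputs,
        "outputSourceConfig": [output],  # exactly one output configuration
        "resolutionTechniques": {
            "resolutionType": "RULE_MATCHING",
            "ruleConditionProperties": {"rules": rules},
        },
    }
    if description is not None:
        body["description"] = description
    if incremental:  # omit for the default standard (batch) processing
        body["incrementalRunConfig"] = {"incrementalRunType": "AUTOMATIC"}
    return body


# All identifiers below are placeholders for illustration.
request_body = build_advanced_workflow_request(
    name="example-advanced-workflow",
    role_arn="arn:aws:iam::111122223333:role/ExampleEntityResolutionRole",
    inputs=[{
        "inputSourceARN": "arn:aws:glue:us-east-1:111122223333:table/exampledb/exampletable",
        "schemaName": "example-schema",
        "applyNormalization": True,
    }],
    output={
        "outputS3Path": "s3://amzn-s3-demo-bucket/output/",
        "applyNormalization": True,
        "output": [{"name": "email", "hashed": False}],
    },
    rules=[{
        "ruleName": "PhoneAndEmail",
        "condition": 'Exact(Phone,EmptyValues=Process) AND Levenshtein("Email address",2)',
    }],
    incremental=True,
)
print(json.dumps(request_body, indent=2))
```

Checking the limits locally catches the most common causes of a `ValidationException` before the request reaches the service.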

------

# Creating a rule-based matching workflow with the Simple rule type
<a name="rule-based-mw-simple"></a>

**Prerequisites**

Before you create a rule-based matching workflow, you must:

1. Create a schema mapping. For more information, see [Creating a schema mapping](create-schema-mapping.md).

1. If using Amazon Connect Customer Profiles as your output destination, ensure you have the appropriate permissions configured.

The following procedure demonstrates how to create a rule-based matching workflow with the **Simple** rule type using either the AWS Entity Resolution console or the `CreateMatchingWorkflow` API.

------
#### [ Console ]

**To create a rule-based matching workflow with the **Simple** rule type using the console**

1. Sign in to the AWS Management Console and open the AWS Entity Resolution console at [https://console.aws.amazon.com/entityresolution/](https://console.aws.amazon.com/entityresolution/).

1. In the left navigation pane, under **Workflows**, choose **Matching**.

1. On the **Matching workflows** page, in the upper right corner, choose **Create matching workflow**.

1. For **Step 1: Specify matching workflow details**, do the following: 

   1. Enter a **Matching workflow name** and an optional **Description**.

   1. For **Data input**, choose an **AWS Region**, **AWS Glue database**, the **AWS Glue table**, and then the corresponding **Schema mapping**.

      You can add up to 19 data inputs.

   1. The **Normalize data** option is selected by default, so that data inputs are normalized before matching. If you don't want to normalize data, deselect the **Normalize data** option.
**Note**  
Normalization is only supported for the following scenarios in **Create schema mapping**:   
If the following **Name** sub-types are grouped: **First name**, **Middle name**, **Last name**.
If the following **Address** sub-types are grouped: **Street address 1**, **Street address 2**, **Street address 3**, **City**, **State**, **Country**, **Postal code**.
If the following **Phone** sub-types are grouped: **Phone number**, **Phone country code**.

   1. To specify the **Service access** permissions, choose an option and take the recommended action.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/rule-based-mw-simple.html)

   1. (Optional) To enable **Tags** for the resource, choose **Add new tag**, and then enter the **Key** and **Value** pair.

   1. Choose **Next**.

1. For **Step 2: Choose matching technique**:

   1. For **Matching method**, choose **Rule-based matching**.

   1. For **Rule type**, choose **Simple**.  
![\[Choose matching technique screen with the Simple rule-based matching option selected.\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/choose-matching-method-rule-based-simple.PNG)

   1. For **Processing cadence**, select one of the following options.
      + Choose **Manual** to run a workflow on demand for a bulk update 
      + Choose **Automatic** to run a workflow as soon as new data is in your S3 bucket 
**Note**  
If you choose **Automatic**, ensure that you have Amazon EventBridge notifications turned on for your S3 bucket. For instructions on enabling Amazon EventBridge using the S3 console, see [Enabling Amazon EventBridge](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-event-notifications-eventbridge.html) in the *Amazon S3 User Guide*.

   1. (Optional) For **Index only for ID mapping**, you can choose to **Turn on** the ability to only index the data and not generate IDs. 

      By default, matching workflows generate IDs after the data is indexed. 

   1. For **Matching rules**, enter a **Rule name** and then choose the **Match keys** for that rule.

      You can create up to 15 rules and you can apply up to 15 different match keys across your rules to define match criteria.  
![\[Matching rules interface with fields to enter rule name and select match keys.\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/matching-rules.PNG)

   1. For **Comparison type**, choose one of the following options based on your goal.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/rule-based-mw-simple.html)  
![\[Comparison type options: Multiple input fields to find matches across data stored in multiple fields, or Single input field to limit comparison within one field.\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/comparison-type.PNG)

   1. Choose **Next**.

1. For **Step 3: Specify data output and format**:

   1. For **Data output destination and format**, choose the **Amazon S3 location** for the data output and whether the **Data format** will be **Normalized data** or **Original data**.

   1. For **Encryption**, if you choose to **Customize encryption settings**, enter the **AWS KMS key** ARN.

   1. View the **System generated output**.

   1. For **Data output**, decide which fields you want to include, hide, or mask, and then take the recommended actions based on your goals.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/rule-based-mw-simple.html)

   1. Choose **Next**.

1. For **Step 4: Review and create**:

   1. Review the selections that you made for the previous steps and edit if necessary.

   1. Choose **Create and run**.

      A message appears, indicating that the matching workflow has been created and that the job has started.

1. On the matching workflow details page, on the **Metrics** tab, view the following under **Last job metrics**:
   + The **Job ID**. 
   + The **Status** of the matching workflow job: **Queued**, **In progress**, **Completed**, **Failed** 
   + The **Time completed** for the workflow job.
   + The number of **Records processed**. 
   + The number of **Records not processed**. 
   + The **Unique match IDs generated**.
   + The number of **Input records**.

   You can also view the job metrics for matching workflow jobs that have been previously run under the **Job history**.

1. After the matching workflow job completes (**Status** is **Completed**), you can go to the **Data output** tab and then select your **Amazon S3 location** to view the results.

1. (**Manual** processing type only) If you have created a **Rule-based matching** workflow with the **Manual** processing type, you can run the matching workflow anytime by choosing **Run workflow** on the matching workflow details page.
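
The limits from the **Matching rules** step (up to 15 rules and up to 15 different match keys across those rules) can be checked before you enter the rules in the console. This is a minimal sketch: the `check_simple_rules` helper is hypothetical, and the rule shape borrows the `ruleName` and `matchingKeys` field names from the API request syntax.

```python
MAX_RULES = 15
MAX_DISTINCT_MATCH_KEYS = 15


def check_simple_rules(rules):
    """Return a list of limit violations for Simple rule type rules.

    rules is a list of {"ruleName": str, "matchingKeys": [str, ...]} entries.
    """
    problems = []
    if len(rules) > MAX_RULES:
        problems.append(f"{len(rules)} rules defined; the limit is {MAX_RULES}")
    # Match keys are counted once, even if several rules use them.
    distinct_keys = {key for rule in rules for key in rule["matchingKeys"]}
    if len(distinct_keys) > MAX_DISTINCT_MATCH_KEYS:
        problems.append(
            f"{len(distinct_keys)} distinct match keys used; "
            f"the limit is {MAX_DISTINCT_MATCH_KEYS}"
        )
    return problems


rules = [
    {"ruleName": "Rule 1", "matchingKeys": ["Email", "Phone"]},
    {"ruleName": "Rule 2", "matchingKeys": ["Email"]},
]
print(check_simple_rules(rules))  # []
```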

------
#### [ API ]

**To create a rule-based matching workflow with the **Simple** rule type using the API**
**Note**  
By default, the workflow uses standard (batch) processing. To use incremental (automatic) processing, you must explicitly configure it.

1. Open a terminal or command prompt to make the API request.

1. Create a POST request to the following endpoint: 

   ```
   /matchingworkflows
   ```

1. In the request header, set `Content-Type` to `application/json`. 
**Note**  
For a complete list of supported programming languages, see the *[AWS Entity Resolution API Reference](https://docs.aws.amazon.com/entityresolution/latest/apireference/Welcome.html)*. 

1. For the request body, provide the following JSON parameters. The required and optional parameters are listed after the request syntax: 

   ```
   {
      "description": "string",
      "incrementalRunConfig": { 
         "incrementalRunType": "string"
      },
      "inputSourceConfig": [ 
         { 
            "applyNormalization": boolean,
            "inputSourceARN": "string",
            "schemaName": "string"
         }
      ],
      "outputSourceConfig": [ 
         { 
            "applyNormalization": boolean,
            "KMSArn": "string",
            "output": [ 
               { 
                  "hashed": boolean,
                  "name": "string"
               }
            ],
            "outputS3Path": "string"
         }
      ],
      "resolutionTechniques": { 
         "providerProperties": { 
            "intermediateSourceConfiguration": { 
               "intermediateS3Path": "string"
            },
            "providerConfiguration": JSON value,
            "providerServiceArn": "string"
         },
         "resolutionType": "RULE_MATCHING",
         "ruleBasedProperties": { 
            "attributeMatchingModel": "string",
            "matchPurpose": "string",
            "rules": [ 
               { 
                  "matchingKeys": [ "string" ],
                  "ruleName": "string"
               }
            ]
         },
         "ruleConditionProperties": { 
            "rules": [ 
               { 
                  "condition": "string",
                  "ruleName": "string"
               }
            ]
         }
      },
      "roleArn": "string",
      "tags": { 
         "string" : "string" 
      },
      "workflowName": "string"
   }
   ```

   Where:
   + `workflowName` (required) – Must be unique and between 1–255 characters, matching the pattern `[a-zA-Z_0-9-]*`
   + `inputSourceConfig` (required) – List of 1–20 input source configurations
   + `outputSourceConfig` (required) – Exactly one output source configuration
   + `resolutionTechniques` (required) – Set `resolutionType` to "RULE_MATCHING" for rule-based matching
   + `roleArn` (required) – IAM role ARN for workflow execution
   + `ruleConditionProperties` (required) – List of rule conditions and the name of the matching rule.

   Optional parameters include:
   + `description` – Up to 255 characters
   + `incrementalRunConfig` – Incremental run type configuration
   + `tags` – Up to 200 key-value pairs

1. (Optional) To use incremental processing instead of the default standard (batch) processing, add the following parameter to the request body: 

   ```
   "incrementalRunConfig": {
      "incrementalRunType": "AUTOMATIC"
   }
   ```

1. Send the request.

1. If successful, you'll receive a response with status code 200 and a JSON body containing: 

   ```
   {
      "workflowArn": "string",
      "workflowName": "string",
      // Plus all configured workflow details
   }
   ```

1. If the call is unsuccessful, you might receive one of these errors:
   + 400 – ConflictException if the workflow name already exists
   + 400 – ValidationException if the input fails validation
   + 402 – ExceedsLimitException if account limits are exceeded
   + 403 – AccessDeniedException if you don't have sufficient access
   + 429 – ThrottlingException if the request was throttled
   + 500 – InternalServerException if there's an internal service failure

------

# Creating a machine learning-based matching workflow
<a name="create-matching-workflow-ml"></a>

*[Machine learning-based matching](glossary.md#ml-matching-defn)* is a preset process that attempts to match records across all of the data that you input. The machine learning-based matching workflow enables you to compare cleartext data to find a broad range of matches using a machine learning model.

**Note**  
The machine learning model doesn't support the comparison of hashed data.

When AWS Entity Resolution finds a match between two or more records in your data, it assigns:
+ A [Match ID](glossary.md#match-id-defin) to the records in the matched set of data
+ The match [confidence level](glossary.md#confidence-level-defn) percentage.

You can use the output of an ML-based matching workflow as an input for data service provider matching, or vice versa, to meet your specific goals. For example, you can first run ML-based matching to find matches across your data sources on your own records. If a subset wasn't matched, you can then run [provider service-based matching](create-matching-workflow-provider.md) to find additional matches.
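
As a rough illustration of that two-stage approach, the following sketch picks out the records that ML-based matching left unmatched so that they can be passed to a provider service-based workflow. It rests on assumptions that you should verify against your own output files: the column names `RecordId` and `MatchID` are placeholders, and a record is treated as unmatched when its Match ID appears only once in the output.

```python
import csv
import io
from collections import Counter


def unmatched_record_ids(output_csv, id_col="RecordId", match_col="MatchID"):
    """Return IDs of records whose Match ID is not shared with any other record."""
    rows = list(csv.DictReader(io.StringIO(output_csv)))
    # Count how many records carry each Match ID.
    group_sizes = Counter(row[match_col] for row in rows)
    # A group of size 1 means no other record matched this one (assumption).
    return [row[id_col] for row in rows if group_sizes[row[match_col]] == 1]


# Hypothetical output sample, for illustration only.
sample_output = """RecordId,MatchID
r1,m-001
r2,m-001
r3,m-002
"""

print(unmatched_record_ids(sample_output))  # ['r3']
```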

**Prerequisites**

Before you create an ML-based matching workflow, you must:

1. Create a schema mapping. For more information, see [Creating a schema mapping](create-schema-mapping.md).

1. If using Amazon Connect Customer Profiles as your output destination, ensure you have the appropriate permissions configured.

**To create an ML-based matching workflow:**

1. Sign in to the AWS Management Console and open the AWS Entity Resolution console at [https://console.aws.amazon.com/entityresolution/](https://console.aws.amazon.com/entityresolution/).

1. In the left navigation pane, under **Workflows**, choose **Matching**.

1. On the **Matching workflows** page, in the upper right corner, choose **Create matching workflow**.

1. For **Step 1: Specify matching workflow details**, do the following: 

   1. Enter a **Matching workflow name** and an optional **Description**.

   1. For **Data input**, choose an **AWS Region**, **AWS Glue database**, the **AWS Glue table**, and then the corresponding **Schema mapping**.

      You can add up to 20 data inputs.

   1. The **Normalize data** option is selected by default, so that data inputs are normalized before matching. If you don't want to normalize data, deselect the **Normalize data** option.

      Machine learning-based matching only normalizes [Name](glossary.md#normalization-ML-defn-name), [Phone](glossary.md#normalization-ML-defn-phone), and [Email](glossary.md#normalization-ML-defn-email).

   1. To specify the **Service access** permissions, choose an option and take the recommended action.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/create-matching-workflow-ml.html)

   1. (Optional) To enable **Tags** for the resource, choose **Add new tag**, and then enter the **Key** and **Value** pair.

   1. Choose **Next**.

1. For **Step 2: Choose matching technique**:

   1. For **Matching method**, choose **Machine learning-based matching**.  
![\[AWS Entity Resolution matching workflow creation interface with options for rule-based or machine learning matching.\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/choose-matching-method-machine-learning.PNG)

   1. For **Processing cadence**, the **Manual** option is selected.

      This option enables you to run a workflow on demand for a bulk update.
**Note**  
Automatic (incremental) processing is not supported for machine learning-based matching workflows.

   1. Choose **Next**.

1. For **Step 3: Specify data output and format**:

   1. For **Data output destination and format**, choose the **Amazon S3 location** for the data output and whether the **Data format** will be **Normalized data** or **Original data**.

   1. For **Encryption**, if you choose to **Customize encryption settings**, enter the **AWS KMS key** ARN.

   1. View the **System generated output**.

   1. For **Data output**, decide which fields you want to include, hide, or mask, and then take the recommended actions based on your goals.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/create-matching-workflow-ml.html)

   1. Choose **Next**.

1. For **Step 4: Review and create**:

   1. Review the selections that you made for the previous steps and edit if necessary.

   1. Choose **Create and run**.

      A message appears, indicating that the matching workflow has been created and that the job has started.

1. On the matching workflow details page, on the **Metrics** tab, view the following under **Last job metrics**:
   + The **Job ID**. 
   + The **Status** of the matching workflow job: **Queued**, **In progress**, **Completed**, **Failed** 
   + The **Time completed** for the workflow job.
   + The number of **Records processed**. 
   + The number of **Records not processed**. 
   + The **Unique match IDs generated**.
   + The number of **Input records**.

   You can also view the job metrics for matching workflow jobs that have been previously run under the **Job history**.

1. After the matching workflow job completes (**Status** is **Completed**), you can go to the **Data output** tab and then select your **Amazon S3 location** to view the results.

1. (**Manual** processing type only) If you have created a **Machine learning-based matching** workflow with the **Manual** processing type, you can run the matching workflow anytime by choosing **Run workflow** on the matching workflow details page.
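
If you monitor jobs from a script instead of the **Metrics** tab, the wait for a terminal status can be sketched as a simple polling loop. This example is an assumption-laden sketch: `wait_for_job` and the `get_status` callable are hypothetical names, no AWS call is made, and the status strings are the console labels listed in this procedure, which may differ from the values an API returns.

```python
import time

# Console status labels that mean the job has finished.
TERMINAL_STATUSES = {"Completed", "Failed"}


def wait_for_job(get_status, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling budget")


# Usage with a stand-in status source instead of a real service call.
fake_statuses = iter(["Queued", "In progress", "In progress", "Completed"])
final_status = wait_for_job(
    lambda: next(fake_statuses), poll_seconds=0, sleep=lambda seconds: None
)
print(final_status)  # Completed
```

Passing `get_status` and `sleep` as parameters keeps the loop independent of any particular client library and makes it easy to test without waiting.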

# Creating a provider service-based matching workflow
<a name="create-matching-workflow-provider"></a>

*[Provider service-based matching](glossary.md#provider-service-matching)* enables you to match your known identifiers with your preferred data service provider.

AWS Entity Resolution currently supports the following data provider services:
+ LiveRamp
+ TransUnion
+ Unified ID 2.0

For more information about the supported provider services, see [Preparing third-party input data](prepare-third-party-input-data.md).

You can use a public subscription for these providers on AWS Data Exchange or negotiate a private offer directly with the data provider. For more information about creating a new subscription or reusing an existing subscription to a provider service, see [Step 1: Subscribe to a provider service on AWS Data Exchange](prepare-third-party-input-data.md#subscribe-provider-service).

The following sections describe how to create a provider-based matching workflow.

**Topics**
+ [Creating a matching workflow with LiveRamp](#create-mw-liveramp)
+ [Creating a matching workflow with TransUnion](#create-mw-transunion)
+ [Creating a matching workflow with UID 2.0](#create-mw-uid)

## Creating a matching workflow with LiveRamp
<a name="create-mw-liveramp"></a>

The LiveRamp service provides an identifier called the RampID. The RampID is one of the most commonly used IDs in demand-side platforms to create an audience for an advertising campaign. Using a matching workflow with LiveRamp, you can resolve hashed email addresses to RampIDs.

**Note**  
AWS Entity Resolution supports PII-based RampID assignment.

**Prerequisites**

Before you create a matching workflow with LiveRamp, you must:

1. Create a schema mapping. For more information, see [Creating a schema mapping](create-schema-mapping.md).

1. Have a subscription to the LiveRamp service

1. Have appropriate permissions configured to the Amazon S3 data staging bucket where you want the matching workflow output to be temporarily written

Before you create a matching workflow with LiveRamp, add the following permissions to the S3 data staging bucket.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::715724997226:root"
            },
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::<staging-bucket>",
                "arn:aws:s3:::<staging-bucket>/*"
            ]
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::715724997226:root"
            },
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:GetBucketPolicy",
                "s3:ListBucketVersions",
                "s3:GetBucketAcl"
            ],
            "Resource": [
                "arn:aws:s3:::<staging-bucket>",
                "arn:aws:s3:::<staging-bucket>/*"
            ]
        }
    ]
}
```

------

Replace each *<user input placeholder>* with your own information.


| Placeholder | Description |
| --- |--- |
| staging-bucket | Amazon S3 bucket that temporarily stores your data while running a provider service-based workflow. | 
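
If you manage bucket policies from code, the policy above can be generated with the placeholder filled in. The following is a minimal sketch: `staging_bucket_policy` is a hypothetical helper and `my-staging-bucket` is a placeholder bucket name. The actions and the principal account ID are copied from the policy shown in this section.

```python
import json

# Actions copied from the two statements of the policy in this section.
OBJECT_ACTIONS = [
    "s3:PutObject",
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:DeleteObject",
]
BUCKET_ACTIONS = [
    "s3:ListBucket",
    "s3:GetBucketLocation",
    "s3:GetBucketPolicy",
    "s3:ListBucketVersions",
    "s3:GetBucketAcl",
]


def staging_bucket_policy(bucket, provider_account_id):
    """Build the staging bucket policy for one provider account."""
    principal = {"AWS": f"arn:aws:iam::{provider_account_id}:root"}
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": actions,
                "Resource": resources,
            }
            for actions in (OBJECT_ACTIONS, BUCKET_ACTIONS)
        ],
    }


# "my-staging-bucket" is a placeholder; the account ID is the one shown above.
policy = staging_bucket_policy("my-staging-bucket", "715724997226")
print(json.dumps(policy, indent=4))
```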

**To create a matching workflow with LiveRamp:**

1. Sign in to the AWS Management Console and open the AWS Entity Resolution console at [https://console.aws.amazon.com/entityresolution/](https://console.aws.amazon.com/entityresolution/).

1. In the left navigation pane, under **Workflows**, choose **Matching**.

1. On the **Matching workflows** page, in the upper right corner, choose **Create matching workflow**.

1. For **Step 1: Specify matching workflow details**, do the following: 

   1. Enter a **Matching workflow name** and an optional **Description**.

   1. For **Data input**, choose an **AWS Region**, **AWS Glue database**, the **AWS Glue table**, and then the corresponding **Schema mapping**.

      You can add up to 20 data inputs.

   1. The **Normalize data** option is selected by default, so that data inputs are normalized before matching. 
**Note**  
Normalization is supported only for the following scenarios in **Create schema mapping**:
+ The following **Name** sub-types are grouped: **First name**, **Middle name**, **Last name**.
+ The following **Address** sub-types are grouped: **Street address 1**, **Street address 2**, **Street address 3 name**, **City name**, **State**, **Country**, **Postal code**.
+ The following **Phone** sub-types are grouped: **Phone number**, **Phone country code**.

      If you are using the email-only resolution process, deselect the **Normalize data** option, because only hashed emails are used for input data.

   1. To specify the **Service access** permissions, choose an option and take the recommended action.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/create-matching-workflow-provider.html)

   1. (Optional) To enable **Tags** for the resource, choose **Add new tag**, and then enter the **Key** and **Value** pair.

   1. Choose **Next**.

1. For **Step 2: Choose matching technique**:

   1. For **Matching method**, choose **Provider services**.

   1. For **Provider services**, choose **LiveRamp**.
**Note**  
Ensure that your data input file format and normalization are aligned with the provider service's guidelines.  
For more information about input file formatting guidelines for the matching workflow, see [Perform Identity Resolution Through ADX](https://docs.liveramp.com/identity/en/perform-identity-resolution-through-adx.html) in the LiveRamp documentation. 

   1. For **LiveRamp products**, choose a product from the dropdown list.  
![\[Provider services options with the LiveRamp provider service selected.\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/choose-matching-method-liveramp.png)
**Note**  
If you choose **Assignment PII**, then you must provide at least one non-identifier column when performing entity resolution. For example, GENDER.

   1. For **LiveRamp configuration**, enter a **Client ID manager ARN** and a **Client secret manager ARN**.  
![\[LiveRamp configuration form with fields for Client ID manager ARN and Client secret manager ARN.\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/choose-matching-method-liveramp-config.png)

   1. For **Data staging**, choose the **Amazon S3 location** for the temporary storage of your data while it processes. 

      You must have permission to the data staging **Amazon S3 location**. For more information, see [Creating a workflow job role for AWS Entity Resolution](create-workflow-job-role.md).

   1. Choose **Next**.

1. For **Step 3: Specify data output**:

   1. For **Data output destination and format**, choose the **Amazon S3 location** for the data output and whether the **Data format** will be **Normalized data** or **Original data**.

   1. For **Encryption**, if you choose to **Customize encryption settings**, enter the **AWS KMS key** ARN.

   1. View the **LiveRamp generated output**.

      This is the additional information generated by LiveRamp.

   1. For **Data output**, decide which fields you want to include, hide, or mask, and then take the recommended actions based on your goals. 
**Note**  
Because LiveRamp privacy filters remove personally identifiable information (PII), some fields display an **Output** state of **Unavailable**.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/create-matching-workflow-provider.html)  
![\[AWS Entity Resolution ID mapping workflow creation interface with options to specify data output location.\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/specify-data-output.PNG)

   1. Choose **Next**.

1. For **Step 4: Review and create**:

   1. Review the selections that you made for the previous steps and edit if necessary.

   1. Choose **Create and run**.

      A message appears, indicating that the matching workflow has been created and that the job has started.

1. On the matching workflow details page, on the **Metrics** tab, view the following under **Last job metrics**:
   + The **Job ID**. 
   + The **Status** of the matching workflow job: **Queued**, **In progress**, **Completed**, **Failed** 
   + The **Time completed** for the workflow job.
   + The number of **Records processed**. 
   + The number of **Records not processed**. 
   + The **Unique match IDs generated**.
   + The number of **Input records**.

   You can also view the job metrics for matching workflow jobs that have been previously run under the **Job history**.

1. After the matching workflow job completes (**Status** is **Completed**), you can go to the **Data output** tab and then select your **Amazon S3 location** to view the results.

## Creating a matching workflow with TransUnion
<a name="create-mw-transunion"></a>

If you have a subscription to the TransUnion service, you can improve customer understanding by linking, matching, and enhancing customer-related records stored across disparate channels with TransUnion Person and Household E Keys and over 200 data attributes.

The TransUnion service provides identifiers known as the TransUnion Individual and Household IDs. TransUnion provides ID assignment (also known as encoding) of known identifiers such as name, address, phone number, and email address.

**Prerequisites**

Before you create a matching workflow with TransUnion, you must:

1. Create a schema mapping. For more information, see [Creating a schema mapping](create-schema-mapping.md).

1. Have a subscription to the TransUnion service

1. Have appropriate permissions configured to the Amazon S3 data staging bucket where you want the matching workflow output to be temporarily written

Before you create a matching workflow with TransUnion, add the following permissions to the S3 data staging bucket.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::381491956555:root"
            },
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::<staging-bucket>",
                "arn:aws:s3:::<staging-bucket>/*"
            ]
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::381491956555:root"
            },
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:GetBucketPolicy",
                "s3:ListBucketVersions",
                "s3:GetBucketAcl"
            ],
            "Resource": [
                "arn:aws:s3:::<staging-bucket>",
                "arn:aws:s3:::<staging-bucket>/*"
            ]
        }
    ]
}
```

------

Replace each *<user input placeholder>* with your own information.


| Placeholder | Description |
| --- |--- |
| staging-bucket | Amazon S3 bucket that temporarily stores your data while running a provider service-based workflow. | 
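
To confirm that an existing bucket policy already grants what the provider needs, you can compare it with the actions listed in the policy above. The following sketch is deliberately simple and makes several assumptions: `missing_actions` is a hypothetical helper, it looks only at `Allow` statements that name the principal exactly, and it ignores wildcards, `Resource` scoping, conditions, and `Deny` statements.

```python
# The nine actions granted by the policy in this section.
REQUIRED_ACTIONS = {
    "s3:PutObject",
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:DeleteObject",
    "s3:ListBucket",
    "s3:GetBucketLocation",
    "s3:GetBucketPolicy",
    "s3:ListBucketVersions",
    "s3:GetBucketAcl",
}


def missing_actions(policy, principal_arn):
    """Return the required actions that the policy does not allow for the principal."""
    granted = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # for example "*"; not evaluated in this sketch
        principals = principal.get("AWS", [])
        if isinstance(principals, str):
            principals = [principals]
        if principal_arn not in principals:
            continue
        actions = statement.get("Action", [])
        granted.update([actions] if isinstance(actions, str) else actions)
    return REQUIRED_ACTIONS - granted


# A policy that contains only the first statement; the bucket name is a placeholder.
partial_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::381491956555:root"},
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject",
            ],
            "Resource": ["arn:aws:s3:::my-staging-bucket/*"],
        }
    ],
}

print(sorted(missing_actions(partial_policy, "arn:aws:iam::381491956555:root")))
```

An empty result means every listed action appears in a matching `Allow` statement. It doesn't prove that access works end to end.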

**To create a matching workflow with TransUnion:**

1. Sign in to the AWS Management Console and open the AWS Entity Resolution console at [https://console.aws.amazon.com/entityresolution/](https://console.aws.amazon.com/entityresolution/).

1. In the left navigation pane, under **Workflows**, choose **Matching**.

1. On the **Matching workflows** page, in the upper right corner, choose **Create matching workflow**.

1. For **Step 1: Specify matching workflow details**, do the following: 

   1. Enter a **Matching workflow name** and an optional **Description**.

   1. For **Data input**, choose an **AWS Region**, **AWS Glue database**, the **AWS Glue table**, and then the corresponding **Schema mapping**.

      You can add up to 20 data inputs.

   1. The **Normalize data** option is selected by default, so that data inputs are normalized before matching. If you don't want to normalize data, deselect the **Normalize data** option.
**Note**  
Normalization is supported only for the following scenarios in **Create schema mapping**:
+ The following **Name** sub-types are grouped: **First name**, **Middle name**, **Last name**.
+ The following **Address** sub-types are grouped: **Street address 1**, **Street address 2**, **Street address 3 name**, **City name**, **State**, **Country**, **Postal code**.
+ The following **Phone** sub-types are grouped: **Phone number**, **Phone country code**.

   1. To specify the **Service access** permissions, choose an option and take the recommended action.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/create-matching-workflow-provider.html)

   1. (Optional) To enable **Tags** for the resource, choose **Add new tag**, and then enter the **Key** and **Value** pair.

   1. Choose **Next**.

1. For **Step 2: Choose matching technique**:

   1. For **Matching method**, choose **Provider services**.

   1. For **Provider services**, choose **TransUnion**.
**Note**  
Ensure that your data input file format and normalization are aligned with the provider service's guidelines.  
![\[Provider services options with the TransUnion provider service selected.\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/choose-matching-method-transunion.PNG)

   1. For **Data staging**, choose the **Amazon S3 location** for the temporary storage of your data while it processes. 

      You must have permission to the data staging **Amazon S3 location**. For more information, see [Creating a workflow job role for AWS Entity Resolution](create-workflow-job-role.md).

   1. Choose **Next**.

1. For **Step 3: Specify data output**:

   1. For **Data output destination and format**, choose the **Amazon S3 location** for the data output and whether the **Data format** will be **Normalized data** or **Original data**.

   1. For **Encryption**, if you choose to **Customize encryption settings**, enter the **AWS KMS key** ARN.

   1. View the **TransUnion generated output**.

      This is the additional information generated by TransUnion.

   1. For **Data output**, decide which fields you want to include, hide, or mask, and then take the recommended actions based on your goals.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/create-matching-workflow-provider.html)

   1. For **System generated output**, view all of the fields that are included. 

   1. Choose **Next**.

1. For **Step 4: Review and create**:

   1. Review the selections that you made for the previous steps and edit if necessary.

   1. Choose **Create and run**.

      A message appears, indicating that the matching workflow has been created and that the job has started.

1. On the matching workflow details page, on the **Metrics** tab, view the following under **Last job metrics**:
   + The **Job ID**. 
   + The **Status** of the matching workflow job: **Queued**, **In progress**, **Completed**, **Failed** 
   + The **Time completed** for the workflow job.
   + The number of **Records processed**. 
   + The number of **Records not processed**. 
   + The **Unique match IDs generated**.
   + The number of **Input records**.

   You can also view the job metrics for matching workflow jobs that have been previously run under the **Job history**.

1. After the matching workflow job completes (**Status** is **Completed**), you can go to the **Data output** tab and then select your **Amazon S3 location** to view the results.

## Creating a matching workflow with UID 2.0
<a name="create-mw-uid"></a>

If you have a subscription to the Unified ID 2.0 service, you can activate advertising campaigns with deterministic identity and lean on interoperability with many UID2-enabled participants across the advertising ecosystem. For more information, see [Unified ID 2.0 Overview](https://unifiedid.com/docs/intro).

The Unified ID 2.0 service provides raw UID2s, which are used for building advertising campaigns in The Trade Desk platform. UID 2.0 is generated using an open source framework.

In one workflow you can use either **Email Address** or **Phone number** for raw UID2 generation but not both. If both are present in the schema mapping, then the workflow will pick the **Email Address** and the **Phone number** will be a pass-through field. To support both, create a new schema mapping where **Phone number** is mapped but **Email Address** isn't mapped. Then, create a second workflow using this new schema mapping.
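
The selection rule described above can be summarized in a few lines. This is only an illustration of the documented behavior, not service code: `uid2_field_roles` and the role labels `uid2-source` and `pass-through` are hypothetical names.

```python
def uid2_field_roles(mapped_types):
    """Describe how one workflow treats Email Address and Phone number."""
    has_email = "Email Address" in mapped_types
    has_phone = "Phone number" in mapped_types

    roles = {}
    if has_email:
        # Email Address wins whenever it is mapped.
        roles["Email Address"] = "uid2-source"
        if has_phone:
            roles["Phone number"] = "pass-through"
    elif has_phone:
        # Phone number is used only when Email Address is not mapped.
        roles["Phone number"] = "uid2-source"
    return roles


print(uid2_field_roles(["Email Address", "Phone number"]))
print(uid2_field_roles(["Phone number"]))
```

The second call corresponds to the separate schema mapping that you create when you also want raw UID2s generated from phone numbers.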

**Note**  
Raw UID2s are created by adding salts from salt buckets, which are rotated approximately once a year. When a salt bucket is rotated, the raw UID2 is rotated with it. Therefore, it's recommended that you refresh the raw UID2s daily. For more information, see [https://unifiedid.com/docs/getting-started/gs-faqs#how-often-should-uid2s-be-refreshed-for-incremental-updates](https://unifiedid.com/docs/getting-started/gs-faqs#how-often-should-uid2s-be-refreshed-for-incremental-updates).

**Prerequisites**

Before you create a matching workflow with UID 2.0, you must:

1. Create a schema mapping. For more information, see [Creating a schema mapping](create-schema-mapping.md).

1. Have a subscription to the UID 2.0 service

**To create a matching workflow with UID 2.0:**

1. Sign in to the AWS Management Console and open the AWS Entity Resolution console at [https://console.aws.amazon.com/entityresolution/](https://console.aws.amazon.com/entityresolution/).

1. In the left navigation pane, under **Workflows**, choose **Matching**.

1. On the **Matching workflows** page, in the upper right corner, choose **Create matching workflow**.

1. For **Step 1: Specify matching workflow details**, do the following: 

   1. Enter a **Matching workflow name** and an optional **Description**.

   1. For **Data input**, choose an **AWS Region**, **AWS Glue database**, the **AWS Glue table**, and then the corresponding **Schema mapping**.

      You can add up to 20 data inputs.

   1. Leave the **Normalize data** option selected, so that data inputs (**Email Address** or **Phone number**) are normalized before matching. 

      For more information about **Email Address** normalization, see [Email Address Normalization](https://unifiedid.com/docs/getting-started/gs-normalization-encoding#email-address-normalization) in the UID 2.0 documentation.

      For more information about **Phone number** normalization, see [Phone Number Normalization](https://unifiedid.com/docs/getting-started/gs-normalization-encoding#phone-number-normalization) in the UID 2.0 documentation.

   1. To specify the **Service access** permissions, choose an option and take the recommended action.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/create-matching-workflow-provider.html)

   1. (Optional) To enable **Tags** for the resource, choose **Add new tag**, and then enter the **Key** and **Value** pair.

   1. Choose **Next**.

1. For **Step 2: Choose matching technique**:

   1. For **Matching method**, choose **Provider services**.

   1. For **Provider services**, choose **Unified ID 2.0**.  
![\[Provider services options with the Unified ID provider service selected.\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/images/choose-matching-method-uid.PNG)

   1. Choose **Next**.

1. For **Step 3: Specify data output**:

   1. For **Data output destination and format**, choose the **Amazon S3 location** for the data output and whether the **Data format** will be **Normalized data** or **Original data**.

   1. For **Encryption**, if you choose to **Customize encryption settings**, enter the **AWS KMS key** ARN.

   1. View the **Unified ID 2.0 generated output**.

      This is a list of all of the additional information generated by UID 2.0.

   1. For **Data output**, decide which fields you want to include, hide, or mask, and then take the recommended actions based on your goals.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/create-matching-workflow-provider.html)

   1. For **System generated output**, view all of the fields that are included. 

   1. Choose **Next**.

1. For **Step 4: Review and create**:

   1. Review the selections that you made for the previous steps and edit if necessary.

   1. Choose **Create and run**.

      A message appears, indicating that the matching workflow has been created and that the job has started.

1. On the matching workflow details page, on the **Metrics** tab, view the following under **Last job metrics**:
   + The **Job ID**. 
   + The **Status** of the matching workflow job: **Queued**, **In progress**, **Completed**, **Failed** 
   + The **Time completed** for the workflow job.
   + The number of **Records processed**. 
   + The number of **Records not processed**. 
   + The **Unique match IDs generated**.
   + The number of **Input records**.

   You can also view the job metrics for matching workflow jobs that have been previously run under the **Job history**.

1. After the matching workflow job completes (**Status** is **Completed**), you can go to the **Data output** tab and then select your **Amazon S3 location** to view the results.

# Editing a matching workflow
<a name="edit-matching-workflow"></a>

Editing the matching workflow allows you to keep your entity resolution processes up-to-date and responsive to your organization's changing requirements over time. You may want to adjust the matching criteria, techniques, or data outputs to improve the accuracy and efficiency of the entity resolution process. If you identify problems or errors in the results of the current workflow, editing it can help you diagnose and resolve those issues. 

**To edit a matching workflow:**

1. Sign in to the AWS Management Console and open the AWS Entity Resolution console at [https://console.aws.amazon.com/entityresolution/](https://console.aws.amazon.com/entityresolution/).

1. In the left navigation pane, under **Workflows**, choose **Matching**.

1. Choose the matching workflow.

1. On the matching workflow details page, in the upper right corner, choose **Edit workflow**.

1. On the **Specify matching workflow details** page, make any necessary changes and then choose **Next**.

1. On the **Choose matching technique** page, make any necessary changes and then choose **Next**.
**Important**  
You can change the **Processing cadence** from **Manual** to **Automatic**, but after you change it to **Automatic**, you can't change it back to **Manual**.   
If the **Processing cadence** is already set to **Automatic**, you can't change it to **Manual**.

1. On the **Specify data output** page, make any necessary changes and then choose **Next**.

1. On the **Review and save** page, make any necessary changes and then choose **Save**.
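
The one-way rule for **Processing cadence** can be expressed as a small lookup, which is useful if you review planned edits in a script before you apply them. `can_set_cadence` is a hypothetical helper. It only mirrors the rule stated in this procedure.

```python
# Allowed (current, requested) pairs: Manual can become Automatic, never the reverse.
ALLOWED_CADENCE_CHANGES = {
    ("Manual", "Manual"),
    ("Manual", "Automatic"),
    ("Automatic", "Automatic"),
}


def can_set_cadence(current, requested):
    """Return True if the workflow can move from current to requested cadence."""
    return (current, requested) in ALLOWED_CADENCE_CHANGES


print(can_set_cadence("Manual", "Automatic"))  # True
print(can_set_cadence("Automatic", "Manual"))  # False
```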

# Deleting a matching workflow
<a name="delete-matching-workflow"></a>

If a matching workflow is no longer being used or has become obsolete, deleting it can help keep your workspace organized and uncluttered. If you've developed a new, improved workflow that replaces an older one, deleting the old workflow can help ensure you're only using the most up-to-date processes.

**To delete a matching workflow:**

1. Sign in to the AWS Management Console and open the AWS Entity Resolution console at [https://console.aws.amazon.com/entityresolution/](https://console.aws.amazon.com/entityresolution/).

1. In the left navigation pane, under **Workflows**, choose **Matching**.

1. Choose the matching workflow.

1. On the matching workflow details page, in the upper right corner, choose **Delete**.

1. Confirm the deletion and then choose **Delete**.

# Modifying or generating a Match ID for a rule-based matching workflow
<a name="generate-match-id"></a>

A *Match ID* is the identifier that AWS Entity Resolution generates and applies to each matched record set after a matching workflow runs. The Match ID is part of the matching workflow metadata that is included in the output.

When you need to update records for an existing customer or add a new customer to your dataset, you can use the AWS Entity Resolution console or the `GenerateMatchId` API. Modifying an existing Match ID helps maintain consistency when you update customer information. Generating a new Match ID is necessary when you add previously unidentified customers to your system.

**Note**  
Additional charges apply, whether you use the console or the API. The processing type you choose affects both the accuracy and response time of the operation.

**Important**  
If you revoke AWS Entity Resolution permissions to your S3 bucket while a job is in progress, AWS Entity Resolution will still process and charge for outputting results to S3 but can't deliver the results to your bucket. To avoid this issue, make sure that AWS Entity Resolution has the correct permissions to write to your S3 bucket before starting a job. If permissions are revoked during processing, AWS Entity Resolution attempts to re-deliver results for up to 30 days after job completion once you restore the correct bucket permissions.

The following procedure guides you through the process of looking up or generating a Match ID, selecting a processing type, and viewing the results. 

------
#### [ Console ]

**To modify or generate a Match ID using the console**

1. Sign in to the AWS Management Console and open the AWS Entity Resolution console at [https://console.aws.amazon.com/entityresolution/](https://console.aws.amazon.com/entityresolution/).

1. In the left navigation pane, under **Workflows**, choose **Matching**.

1. Choose the rule-based matching workflow that has been processed (**Job status** is **Completed**).

1. On the matching workflow details page, choose the **Match IDs** tab.

1. Choose **Modify or generate match ID**.
**Note**  
The **Modify or generate match ID** option is only available for matching workflows that use the **Automatic** processing cadence. If you have selected the **Manual** processing cadence, this option will appear inactive. To use this option, edit your workflow to use the **Automatic** processing cadence. For more information about editing workflows, see [Editing a matching workflow](edit-matching-workflow.md).

1. Select the **AWS Glue table** from the dropdown list.

   If there is only one AWS Glue table in the workflow, it's selected by default.

1. Choose the **Processing type**.
   + **Consistent** – You can look up an existing match ID or generate and save a new match ID immediately. This option has the highest accuracy and the slowest response time.
   + **Background** (shown as `EVENTUAL` in the API) – You can look up an existing match ID or generate a new match ID immediately. The updated record is saved in the background. This option has a fast initial response, with complete results available in S3 later.
   + **Quick ID generation** (shown as `EVENTUAL_NO_LOOKUP` in the API) – You can create a new match ID without looking up an existing one. The updated record is saved in the background. This option has the fastest response. It is recommended for unique records only.

1. For **Record attributes**, do the following:

   1. Enter the **Value** for the **Unique ID**.

   1. Enter a **Value** for each **Match key** that will match with existing records based on the rules configured in your workflow.

1. Choose **Find match ID and save record**.

   A success message appears, stating that either the Match ID was found or a new Match ID was generated and the record was saved. 

1. View the corresponding Match ID and the associated rule that was saved to the matching workflow in the success message. 

1. (Optional) To copy the match ID, choose **Copy**. 

------
#### [ API ]

**To modify or generate a Match ID using the API**
**Note**  
To call this API successfully, you must have first successfully run a rule-based matching workflow using the [StartMatchingJob API](https://docs.aws.amazon.com/entityresolution/latest/apireference/API_StartMatchingJob.html).   
For a complete list of supported programming languages, see the [See Also](https://docs.aws.amazon.com/entityresolution/latest/apireference/API_GenerateMatchId.html#API_GenerateMatchId_SeeAlso) section of the [GenerateMatchId API](https://docs.aws.amazon.com/entityresolution/latest/apireference/API_GenerateMatchId.html).

1. Open a terminal or command prompt to make the API request.

1. Create a POST request to the following endpoint: 

   ```
   /matchingworkflows/workflowName/generateMatches
   ```

1. In the request header, set `Content-type` to `application/json`.

1. In the request URI, specify your `workflowName`. 

   The `workflowName` must: 
   + Be between 1 and 255 characters long 
   + Match the pattern `[a-zA-Z_0-9-]*`

1. For the request body, provide the following JSON: 

   ```
   {
      "processingType": "string",
      "records": [ 
         { 
            "inputSourceARN": "string",
            "recordAttributeMap": { 
               "string" : "string" 
            },
            "uniqueId": "string"
         }
      ]
   }
   ```

   Where: 
   + `processingType` (optional) - Defaults to `CONSISTENT`. Choose one of these values: 
     + `CONSISTENT` - For highest accuracy with slower response time 
     + `EVENTUAL` - For faster initial response with background processing 
     + `EVENTUAL_NO_LOOKUP` - For fastest response when records are known to be unique 
   + `records` (required) - Array containing exactly one record object

1. Send the request. 

   If successful, you'll receive a response with status code 200 and a JSON body containing:

   ```
   {
      "failedRecords": [ 
         { 
            "errorMessage": "string",
            "inputSourceARN": "string",
            "uniqueId": "string"
         }
      ],
      "matchGroups": [ 
         { 
            "matchId": "string",
            "matchRule": "string",
            "records": [ 
               { 
                  "inputSourceARN": "string",
                  "recordId": "string"
               }
            ]
         }
      ]
   }
   ```

   If the call is unsuccessful, you might receive one of these errors:
   + 403 - AccessDeniedException if you don't have sufficient access
   + 404 - ResourceNotFoundException if the resource can't be found
   + 429 - ThrottlingException if the request was throttled
   + 400 - ValidationException if the input fails validation
   + 500 - InternalServerException if there's an internal service failure

------
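
The API steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation. It assumes that the `workflowName` pattern reads `[a-zA-Z_0-9-]*`, that the AWS SDK for Python (`boto3`) is installed with credentials configured, and that its `entityresolution` client exposes `generate_match_id`. The helper names, the example ARN, and the attribute names are hypothetical.

```python
import re

# Constraints on workflowName as listed in the API steps (pattern assumed).
WORKFLOW_NAME_PATTERN = re.compile(r"[a-zA-Z_0-9-]*")
PROCESSING_TYPES = {"CONSISTENT", "EVENTUAL", "EVENTUAL_NO_LOOKUP"}


def validate_workflow_name(name):
    """Return the name unchanged if it meets the documented constraints."""
    if not 1 <= len(name) <= 255:
        raise ValueError("workflowName must be 1-255 characters long")
    if WORKFLOW_NAME_PATTERN.fullmatch(name) is None:
        raise ValueError("workflowName allows only letters, digits, '_' and '-'")
    return name


def build_generate_match_id_request(workflow_name, input_source_arn, unique_id,
                                    attributes, processing_type="CONSISTENT"):
    """Build arguments that mirror the GenerateMatchId request."""
    if processing_type not in PROCESSING_TYPES:
        raise ValueError(f"unknown processingType: {processing_type}")
    return {
        "workflowName": validate_workflow_name(workflow_name),
        "processingType": processing_type,
        # The records array must contain exactly one record object.
        "records": [{
            "inputSourceARN": input_source_arn,
            "uniqueId": unique_id,
            "recordAttributeMap": dict(attributes),
        }],
    }


def summarize_response(response):
    """Return (match_id, match_rule, failed_records) from a response body."""
    groups = response.get("matchGroups", [])
    first = groups[0] if groups else {}
    return (first.get("matchId"), first.get("matchRule"),
            response.get("failedRecords", []))


def generate_match_id(request):
    """Send the request with boto3 (assumed to be installed and configured)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("entityresolution")
    return summarize_response(client.generate_match_id(**request))


# Hypothetical values for illustration only.
example_request = build_generate_match_id_request(
    "my-workflow",
    "arn:aws:glue:us-east-1:111122223333:table/exampledb/exampletable",
    "record-001",
    {"email": "jane@example.com"},
)
```

Validating the workflow name and processing type before sending the request surfaces input problems locally instead of as a `ValidationException` from the service.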

# Looking up a Match ID for a rule-based matching workflow
<a name="find-match-id"></a>

After you run a rule-based matching workflow, you can retrieve the Match ID and associated rule for each processed record. This information helps you understand how records were matched and which rules were applied. The following procedure shows how to access this data using either the AWS Entity Resolution console or the `GetMatchId` API.

------
#### [ Console ]

**To look up a Match ID using the console**

1. Sign in to the AWS Management Console and open the AWS Entity Resolution console at [https://console.aws.amazon.com/entityresolution/](https://console.aws.amazon.com/entityresolution/).

1. In the left navigation pane, under **Workflows**, choose **Matching**.

1. Choose the rule-based matching workflow that has been processed (**Job status** is **Completed**).

1. On the matching workflow details page, choose the **Match IDs** tab.

1. Choose **Look up match ID**.
**Note**  
The **Look up match ID** option is only available for matching workflows that use the **Automatic** processing cadence. If you have selected the **Manual** processing cadence, this option will appear inactive. To use this option, edit your workflow to use the **Automatic** processing cadence. For more information about editing workflows, see [Editing a matching workflow](edit-matching-workflow.md).

1. Do one of the following:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/entityresolution/latest/userguide/find-match-id.html)

1. For **Record attributes**, enter the **Value** for an existing **Match key** to look up for each existing record.
**Tip**  
Enter as many values as you can to help find the Match ID. 

1. The **Normalize data** option is selected by default, so that data inputs are normalized before matching. If you don't want to normalize data, deselect the **Normalize data** option.

1. To view the matching rules, expand **View matching rules**.

1. Choose **Look up**.

   A success message appears, stating that the Match ID was found. 

1. View the corresponding Match ID and the associated rule that was found. 

------
#### [ API ]

**To look up a Match ID using the API**
**Note**  
To call this API successfully, you must have first successfully run a rule-based matching workflow using the [StartMatchingJob API](https://docs.aws.amazon.com/entityresolution/latest/apireference/API_StartMatchingJob.html).   
For a complete list of supported programming languages, see the [See Also](https://docs.aws.amazon.com/entityresolution/latest/apireference/API_GetMatchId.html#API_GetMatchId_SeeAlso) section of the [GetMatchId API](https://docs.aws.amazon.com/entityresolution/latest/apireference/API_GetMatchId.html).

1. Open a terminal or command prompt to make the API request.

1. Create a POST request to the following endpoint: 

   ```
   /matchingworkflows/workflowName/matches
   ```

1. In the request header, set `Content-type` to `application/json`.

1. In the request URI, specify your `workflowName`. 

   The `workflowName` must: 
   + Be between 1 and 255 characters long 
   + Match the pattern `[a-zA-Z_0-9-]*`

1. For the request body, provide the following JSON: 

   ```
   {
      "applyNormalization": boolean,
      "record": { 
         "string" : "string" 
      }
   }
   ```

   Where: 

   + `applyNormalization` (optional) - Set to `true` to normalize attributes defined in the schema
   + `record` (required) - The record to fetch the Match ID for

1. Send the request. 

   If successful, you'll receive a response with status code 200 and a JSON body containing: 

   ```
   {
      "matchId": "string",
      "matchRule": "string"
   }
   ```

   The `matchId` is the unique identifier for this group of matched records, and `matchRule` indicates which rule the record matched on. 

   If the call is unsuccessful, you might receive one of these errors:
   + 403 - AccessDeniedException if you don't have sufficient access
   + 404 - ResourceNotFoundException if the resource can't be found
   + 429 - ThrottlingException if the request was throttled
   + 400 - ValidationException if the input fails validation
   + 500 - InternalServerException if there's an internal service failure

------
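
The lookup steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: `boto3` is installed with credentials configured, and its `entityresolution` client exposes `get_match_id`. The helper names and attribute names are hypothetical. The `apply_normalization` default of `True` mirrors the console default and is not a statement about the API default.

```python
# Errors listed in the API steps: exception name -> (HTTP status, meaning).
DOCUMENTED_ERRORS = {
    "AccessDeniedException": (403, "you don't have sufficient access"),
    "ResourceNotFoundException": (404, "the resource can't be found"),
    "ThrottlingException": (429, "the request was throttled"),
    "ValidationException": (400, "the input fails validation"),
    "InternalServerException": (500, "internal service failure"),
}


def build_get_match_id_request(workflow_name, record, apply_normalization=True):
    """Build arguments that mirror the GetMatchId request."""
    if not record:
        raise ValueError("record must contain at least one attribute")
    return {
        "workflowName": workflow_name,
        # The request body maps attribute names to string values.
        "record": {str(key): str(value) for key, value in record.items()},
        "applyNormalization": bool(apply_normalization),
    }


def look_up_match_id(request):
    """Call GetMatchId with boto3 and return (matchId, matchRule)."""
    import boto3  # imported lazily so the helpers above work without boto3
    from botocore.exceptions import ClientError

    client = boto3.client("entityresolution")
    try:
        response = client.get_match_id(**request)
    except ClientError as err:
        code = err.response["Error"]["Code"]
        status, meaning = DOCUMENTED_ERRORS.get(code, (None, "undocumented error"))
        raise RuntimeError(f"GetMatchId failed ({status} {code}): {meaning}") from err
    return response.get("matchId"), response.get("matchRule")


# Hypothetical values for illustration only.
example_lookup = build_get_match_id_request(
    "my-workflow", {"email": "jane@example.com", "phone": 5551234}
)
```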

# Deleting records from a rule-based or ML-based matching workflow
<a name="delete-records"></a>

If you need to comply with data management regulations, you can delete records from either a rule-based or ML-based matching workflow.

**To delete records from a rule-based or ML-based matching workflow**

1. Sign in to the AWS Management Console and open the AWS Entity Resolution console at [https://console.aws.amazon.com/entityresolution/](https://console.aws.amazon.com/entityresolution/).

1. In the left navigation pane, under **Workflows**, choose **Matching**.

1. Choose the rule-based or ML-based matching workflow.

1. On the matching workflow details page, choose **Delete unique IDs** from the **Actions** dropdown list. 

1. In the **Unique IDs** section, enter the unique IDs that you want to delete.

   You can enter up to 10 unique IDs.

1. Specify the **Input source** from which to delete the unique IDs.

   If there is only one **Input source** for the workflow, the **Input source** is listed by default. 

   If you only specify one **Input source**, the unique IDs in other input sources won't be affected.

1. Choose **Delete unique IDs**.
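
For more IDs than the console accepts at once, a programmatic equivalent might look like the following sketch. It assumes that `boto3` is installed with credentials configured and that its `entityresolution` client exposes `batch_delete_unique_id` with `workflowName`, `uniqueIds`, and `inputSource` parameters. The batch size of 10 mirrors the console limit above and is an assumption for the API. The helper names are hypothetical.

```python
def chunk_unique_ids(unique_ids, batch_size=10):
    """Remove duplicates (keeping order) and split the IDs into batches."""
    ids = list(dict.fromkeys(unique_ids))
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


def delete_unique_ids(workflow_name, unique_ids, input_source_arn=None):
    """Delete unique IDs from a matching workflow, one batch per request."""
    import boto3  # imported lazily so chunk_unique_ids works without boto3

    client = boto3.client("entityresolution")
    responses = []
    for batch in chunk_unique_ids(unique_ids):
        kwargs = {"workflowName": workflow_name, "uniqueIds": batch}
        # Limit the deletion to one input source only when one is given.
        if input_source_arn is not None:
            kwargs["inputSource"] = input_source_arn
        responses.append(client.batch_delete_unique_id(**kwargs))
    return responses
```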

# Troubleshooting matching workflows
<a name="troubleshooting"></a>

Use the following information to help you diagnose and fix common issues that you might encounter when running matching workflows.

## I received an error file after running a matching workflow
<a name="troubleshooting_error_code_1"></a>

### Common cause
<a name="troubleshooting_common_cause"></a>

A matching workflow can have multiple runs. The results of each run (successes or errors) are written to a folder that is named with the run's `jobId`.

The successful results for a matching workflow are written to a `success` folder that contains multiple files, and each file contains a subset of the successful records. 

The errors for a matching workflow are written to an `error` folder that contains multiple files, and each file contains a subset of the error records.

The error file can be created for the following reasons:
+ The [Unique ID](glossary.md#unique-id-defn) is: 
  + null
  + missing in a row of data
  + missing in a record in the data table
  + repeated in another row of data in the data table
  + not specified
  + not unique within the same source
  + not unique across multiple sources
  + overlaps across sources
  + exceeds 38 characters (rule-based matching workflow only)
+ One of the fields in the [schema mapping](glossary.md#schema-mapping-definition) includes a reserved name:
  + EmailAddress
  + InputSourceARN
  + MatchRule
  + MatchID
  + HashingProtocol
  + ConfidenceLevel
  + Source

**Note**  
If a record appears in the error file for one of the reasons listed previously, you are charged, because the service incurs a processing cost for that record. If a record appears in the error file because of an internal server error, you aren't charged.

### Resolution
<a name="troubleshooting_resolution"></a>

**To resolve this issue**

1. Check to see if the [Unique ID](glossary.md#unique-id-defn) is valid.

   If the [Unique ID](glossary.md#unique-id-defn) isn't valid, update the Unique ID in your data table, save the new data table, create a new schema mapping, and run the matching workflow again.

1. Check if one of the fields in the [schema mapping](glossary.md#schema-mapping-definition) includes a reserved name.

   If one of the fields includes a reserved name, create a new schema mapping with a new name, and run the matching workflow again.
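
The two checks above can be run on your data before you start the workflow. The following Python sketch is illustrative only. It assumes that rows are available as dictionaries, that reserved names are compared case-sensitively, and that the 38-character limit applies only to rule-based workflows, as listed earlier. The function names and the `customer_id` field are hypothetical.

```python
# Reserved names and the length limit come from the list of common causes.
RESERVED_NAMES = {
    "EmailAddress", "InputSourceARN", "MatchRule", "MatchID",
    "HashingProtocol", "ConfidenceLevel", "Source",
}
MAX_RULE_BASED_UNIQUE_ID_LENGTH = 38


def find_reserved_fields(field_names):
    """Return the schema mapping field names that are reserved names."""
    return sorted(name for name in field_names if name in RESERVED_NAMES)


def find_unique_id_problems(rows, unique_id_field, rule_based=True):
    """Return (row_index, problem) pairs for Unique IDs in one input source."""
    problems = []
    seen = set()
    for index, row in enumerate(rows):
        value = row.get(unique_id_field)
        if value is None or str(value).strip() == "":
            problems.append((index, "missing or null Unique ID"))
            continue
        value = str(value)
        if rule_based and len(value) > MAX_RULE_BASED_UNIQUE_ID_LENGTH:
            problems.append((index, "Unique ID exceeds 38 characters"))
        if value in seen:
            problems.append((index, "duplicate Unique ID"))
        seen.add(value)
    return problems


# Hypothetical rows for illustration only.
sample_rows = [
    {"customer_id": "a1"},
    {"customer_id": None},
    {"customer_id": "a1"},
    {"customer_id": "x" * 39},
]
sample_problems = find_unique_id_problems(sample_rows, "customer_id")
```

This sketch checks a single input source. To catch Unique IDs that overlap across sources, pass the rows from all sources to `find_unique_id_problems` in one call.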