Variables
Variables represent data elements that you want to use in a fraud prediction. These variables can be taken from the event dataset that you prepared for training your model, from your Amazon Fraud Detector model's risk score outputs, or from Amazon SageMaker AI models. For more information about variables taken from the event dataset, see Get event dataset requirements using the Data models explorer.
The variables you want to use in your fraud prediction must first be created and then added to the event when creating your event type. Each variable you create must be assigned a data type, a default value, and optionally a variable type. Amazon Fraud Detector enriches some of the variables that you provide, such as IP addresses, bank identification numbers (BINs), and phone numbers, to create additional inputs and boost performance for the models that use these variables.
Data types
Variables must have a data type for the data element that the variable represents and can optionally be assigned one of the predefined variable types. For variables that are assigned to a variable type, the data type is preselected. Possible data types include the following:
Data type | Description | Default value | Example values |
---|---|---|---|
String | Any combination of letters, whole numbers, or both | <empty> | abc, 123, 1D3B |
Integer | Positive or negative whole numbers | 0 | 1, -1 |
Boolean | True or False | False | True, False |
DateTime | Date and time specified in the ISO 8601 standard UTC format only | <empty> | 2019-11-30T13:01:01Z |
Float | Numbers with decimal points | 0.0 | 4.01, 0.10 |
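As a minimal sketch of producing a DateTime value in the form shown in the table, the following Python snippet converts a timezone-aware `datetime` to a UTC string that ends in `Z`. The helper name `to_utc_timestamp` is hypothetical and is not part of any Amazon Fraud Detector API.

```python
from datetime import datetime, timedelta, timezone


def to_utc_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC string.

    Hypothetical helper for illustration only.
    """
    if moment.tzinfo is None:
        # A naive datetime has no offset, so it can't be converted to UTC safely.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A value already in UTC is formatted unchanged.
utc_value = to_utc_timestamp(datetime(2019, 11, 30, 13, 1, 1, tzinfo=timezone.utc))

# A value with a +01:00 offset is shifted to UTC before formatting.
offset_value = to_utc_timestamp(
    datetime(2019, 11, 30, 14, 1, 1, tzinfo=timezone(timedelta(hours=1)))
)
```

Both calls produce the same string because they describe the same instant.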
Default value
Variables must have a default value. When Amazon Fraud Detector generates fraud predictions, this default value is used to run a rule or model if Amazon Fraud Detector doesn't receive a value for a variable. Default values you provide must match the selected data type. In the AWS Console, Amazon Fraud Detector assigns the default value of 0 for integers, false for Booleans, 0.0 for floats, and (empty) for strings. You can set a custom default value for any of these data types.
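The following Python sketch shows one way to assemble the settings for a new variable and fall back to the console defaults listed above when no custom default is given. The function `build_variable_request` and the `CONSOLE_DEFAULTS` mapping are hypothetical names used for illustration. The dictionary keys are assumed to follow the parameters of the boto3 `create_variable` call, and whether the API accepts an empty string as a default is an assumption that isn't verified here.

```python
# Console defaults from the text above, written as strings.
# The empty-string entries mirror the "<empty>" defaults (assumption).
CONSOLE_DEFAULTS = {
    "STRING": "",
    "INTEGER": "0",
    "BOOLEAN": "false",
    "FLOAT": "0.0",
    "DATETIME": "",
}


def build_variable_request(name, data_type, default_value=None, variable_type=None):
    """Build keyword arguments for creating a variable (illustrative only)."""
    if data_type not in CONSOLE_DEFAULTS:
        raise ValueError(f"unsupported data type: {data_type}")
    request = {
        "name": name,
        "dataType": data_type,
        "dataSource": "EVENT",  # the variable comes from the event dataset
        "defaultValue": (
            CONSOLE_DEFAULTS[data_type] if default_value is None else default_value
        ),
    }
    if variable_type is not None:
        # The variable type is optional, so it is added only when provided.
        request["variableType"] = variable_type
    return request


price_request = build_variable_request("order_total", "FLOAT")
ip_request = build_variable_request("ip_address", "STRING", variable_type="IP_ADDRESS")

# With AWS credentials configured, a request could then be sent with boto3:
#   import boto3
#   boto3.client("frauddetector").create_variable(**ip_request)
```

Keeping the request as a plain dictionary makes it easy to inspect the data type and default value before anything is sent to the service.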
Variable types
When you create a variable, you can optionally assign the variable to a variable type. A variable type represents a common data element that is used to train models and to generate fraud predictions. Only variables with an associated variable type can be used for model training. As part of the model training process, Amazon Fraud Detector uses the variable type associated with the variable to perform variable enrichments, feature engineering, and risk scoring.
Amazon Fraud Detector has predefined the following variable types that you can assign to your variables.
Category | Variable type | Description | Data type | Example |
---|---|---|---|---|
Session | IP_ADDRESS | The IP address that's collected during the event | String | 192.0.2.0 (Note: Amazon Fraud Detector enriches this data. For more information, see Geolocation enrichment.) |
Session | USERAGENT | The user agent that's collected during the event | String | Mozilla 5.0 (Windows NT 10.0, Win64, x64,rv:68.0) Gecko 20100101 |
Session | FINGERPRINT | The unique identifier for a device used for the event | String | sadfow987u234 |
Session | SESSION_ID | The session ID for the event's active session | String | sid123456789 |
Session | ARE_CREDENTIALS_VALID | Indicates if the credentials used for event login are valid | Boolean | True |
User | EMAIL_ADDRESS | The email address that's collected during the event | String | abc@domain.com |
User | PHONE_NUMBER | The phone number collected during the event | String | +1 555-0100 (Note: Amazon Fraud Detector enriches this data. For more information, see Phone number enrichment.) |
Billing | BILLING_NAME | The name that's associated with the billing address | String | John Doe |
Billing | BILLING_PHONE | The phone number that's associated with the billing address | String | +1 555-0100 (Note: Amazon Fraud Detector enriches this data. For more information, see Phone number enrichment.) |
Billing | BILLING_ADDRESS_L1 | The first line of the billing address | String | Any street |
Billing | BILLING_ADDRESS_L2 | The second line of the billing address | String | Any unit 123 |
Billing | BILLING_CITY | The city that's in the billing address | String | Any City |
Billing | BILLING_STATE | The state or province that's in the billing address | String | Any state or province |
Billing | BILLING_COUNTRY | The country that's in the billing address | String | Any country (Note: Amazon Fraud Detector enriches this data. For more information, see Geolocation enrichment.) |
Billing | BILLING_ZIP | The postal code that's in the billing address | String | 01234 (Note: Amazon Fraud Detector enriches this data. For more information, see Geolocation enrichment.) |
Shipping | SHIPPING_NAME | The name that's associated with the shipping address | String | John Doe |
Shipping | SHIPPING_PHONE | The phone number that's associated with the shipping address | String | +1 555-0100 (Note: Amazon Fraud Detector enriches this data. For more information, see Phone number enrichment.) |
Shipping | SHIPPING_ADDRESS_L1 | The first line of the shipping address | String | 123 Any Street |
Shipping | SHIPPING_ADDRESS_L2 | The second line of the shipping address | String | Unit 123 |
Shipping | SHIPPING_CITY | The city that's in the shipping address | String | Any City |
Shipping | SHIPPING_STATE | The state or province that's in the shipping address | String | Any State |
Shipping | SHIPPING_COUNTRY | The country that's in the shipping address | String | Any Country (Note: Amazon Fraud Detector enriches this data. For more information, see Geolocation enrichment.) |
Shipping | SHIPPING_ZIP | The postal code that's in the shipping address | String | 01234 (Note: Amazon Fraud Detector enriches this data. For more information, see Geolocation enrichment.) |
Payment | ORDER_ID | The unique identifier for the transaction | String | LUX60 |
Payment | PRICE | The total order price | String | 560.00 |
Payment | CURRENCY_CODE | The ISO 4217 currency code | String | USD |
Payment | PAYMENT_TYPE | The payment method that's used for payment during the event | String | Credit card |
Payment | AUTH_CODE | The alphanumerical code that's sent by a credit card issuer or issuing bank | String | 0000 |
Payment | AVS | The address verification system (AVS) response code from the card processor | String | Y |
Product | PRODUCT_CATEGORY | The product category of order item | String | Kitchen |
Custom | NUMERIC | Any variable that can be represented as a real number | Float | 1.224 |
Custom | CATEGORICAL | Any variable that describes categories, segments, or groups | String | Large |
Custom | FREE_FORM_TEXT | Any free form text that's captured as part of the event (for example, a customer review or comment) | String | Example of a free form text input |
Assigning variable to a variable type
If you are planning to use a variable for training your model, it is important that you choose the right variable type to assign to the variable. An incorrect variable type assignment can negatively impact your model performance. It can also become very difficult for you to change the assignment later, especially if multiple models and events have used the variable.
You can assign your variable any one of the predefined variable types or one of the custom variable types: FREE_FORM_TEXT, CATEGORICAL, or NUMERIC.
Important notes for assigning variables to the right variable types
- If the variable matches one of the predefined variable types, use it. Make sure the variable type corresponds to the variable. For example, if you assign an ip_address variable to the EMAIL_ADDRESS variable type, the ip_address variable will not get enriched with enrichments such as ASN, ISP, geolocation, and risk score. For more information, see Variable enrichments.
- If the variable doesn't match any of the predefined variable types, follow the recommendations listed below to assign one of the custom variable types.
  - Assign the CATEGORICAL variable type to variables that typically do not have natural ordering and can be put into categories, segments, or groups. The dataset you are using to train your model might have ID variables such as merchant_id, campaign_id, or policy_id. These variables represent groups (for example, all customers with the same policy_id represent a group). Variables that have the following data must be assigned the CATEGORICAL variable type:
    - Variables that contain data such as customer_ID, segment_ID, color_ID, department_code, or product_ID.
    - Variables that contain Boolean data with true, false, or null values.
    - Variables that can be put into groups or categories such as company name, product category, card type, or referral medium.
    Note: ENTITY_ID is a reserved variable type used by Amazon Fraud Detector to assign to the ENTITY_ID variable. The ENTITY_ID variable is the ID of the entity initiating the action you want to evaluate. If you are creating a Transaction Fraud Insight (TFI) model type, you are required to provide the ENTITY_ID variable. You will need to decide which variable in your data uniquely identifies the entity initiating the action and pass it on as the ENTITY_ID variable. Assign the CATEGORICAL variable type to all the other IDs in your dataset, if they are present and if you are using them for model training. Examples of other IDs that are not an entity in your dataset can be merchant_ID, policy_ID, and campaign_ID.
  - Assign the FREE_FORM_TEXT variable type to variables that contain a block of text. Examples of FREE_FORM_TEXT variables are user reviews, comments, dates, and referral codes. The FREE_FORM_TEXT data contains multiple tokens separated by a delimiter. The delimiter can be any character other than an alphanumeric character or the underscore symbol. For example, user reviews and comments can be separated by a space delimiter, and dates and referral codes can use hyphens as delimiters to separate the prefix, suffix, and intermediate parts. Amazon Fraud Detector uses the delimiters to extract data from FREE_FORM_TEXT variables.
  - Assign the NUMERIC variable type to variables that are real numbers and have inherent ordering. Examples of NUMERIC variables include day_of_the_week, incident_severity, and customer_rating. Although you can assign the CATEGORICAL variable type to these variables, we strongly recommend that you assign all real number variables with inherent order to the NUMERIC variable type.
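To make the FREE_FORM_TEXT delimiter rule concrete, the following Python sketch splits text on any run of characters that are not alphanumeric or underscore. It only illustrates the rule as stated and is not Amazon Fraud Detector's actual extraction logic. The function name `tokenize_free_form_text` is hypothetical, and treating "alphanumeric" as ASCII letters and digits is an assumption.

```python
import re

# Any run of characters other than ASCII letters, digits, or underscore
# is treated as a delimiter (assumption: ASCII-only alphanumerics).
_DELIMITERS = re.compile(r"[^A-Za-z0-9_]+")


def tokenize_free_form_text(text: str) -> list[str]:
    """Split free-form text into tokens, dropping empty pieces."""
    return [token for token in _DELIMITERS.split(text) if token]


review_tokens = tokenize_free_form_text("great product, fast shipping")
code_tokens = tokenize_free_form_text("PROMO-2021-XYZ")
underscore_tokens = tokenize_free_form_text("user_name ok")
```

A review splits on spaces and punctuation, a referral code splits on hyphens, and an underscore stays inside its token.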
Variable enrichments
Amazon Fraud Detector enriches some of the raw data elements that you provide, such as IP addresses, bank identification numbers (BINs), and phone numbers, to create additional inputs and boost performance for the models that use these data elements. The enrichment helps identify potentially suspicious situations and helps the models capture more fraud.
Phone number enrichment
Amazon Fraud Detector enriches phone number data with additional information that relates to geolocation, the original carrier, and the validity of the phone number. Phone number enrichment is automatically enabled for all models that are trained on or after December 13, 2021 and have a phone number that includes a country code (+xxx). If you included a phone number variable in your model and trained it before December 13, 2021, retrain your model so that it can take advantage of this enrichment.
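As a rough pre-check before sending data, the following Python sketch tests whether a phone number string starts with a plus sign followed by one to three digits, matching the (+xxx) country code form mentioned above. The function name `has_country_code_prefix` is hypothetical. The check doesn't confirm that the digits are a real country code or that the number is valid.

```python
import re

# A leading "+" followed by one to three digits, as in the (+xxx) form.
_COUNTRY_CODE_PREFIX = re.compile(r"^\+[0-9]{1,3}")


def has_country_code_prefix(phone_number: str) -> bool:
    """Return True if the value begins with a +xxx style prefix."""
    return _COUNTRY_CODE_PREFIX.match(phone_number.strip()) is not None


with_prefix = has_country_code_prefix("+1 555-0100")
without_prefix = has_country_code_prefix("555-0100")
```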
We highly recommend that you use the following format for phone number variables to ensure that your data is enriched successfully.
Geolocation enrichment
Starting on February 8, 2022, Amazon Fraud Detector calculates the physical distance between the IP_ADDRESS, BILLING_ZIP, and SHIPPING_ZIP values that you provide for an event. The calculated distances are used as inputs to your fraud detection model.
To enable geolocation enrichment, your event data must include at least two of the three variables: IP_ADDRESS, BILLING_ZIP, or SHIPPING_ZIP. In addition, each BILLING_ZIP and SHIPPING_ZIP value must have a valid BILLING_COUNTRY code and SHIPPING_COUNTRY code respectively. If you have a model that was trained before February 8, 2022 and it includes these variables, you must retrain the model to enable the geolocation enrichment.
If Amazon Fraud Detector can't determine the location that's associated with the IP_ADDRESS, BILLING_ZIP, or SHIPPING_ZIP values for an event because the data isn't valid, a special placeholder value is used instead. For example, suppose that an event has valid IP_ADDRESS and BILLING_ZIP values, but the SHIPPING_ZIP value isn't valid. In this case, enrichment is done only for IP_ADDRESS -> BILLING_ZIP. The enrichment isn't done for IP_ADDRESS -> SHIPPING_ZIP and BILLING_ZIP -> SHIPPING_ZIP. Instead, the placeholder values are used in their place. Whether or not geolocation enrichment is enabled for your model, the performance of your model doesn't change.
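The pairing behavior in the example above can be sketched in Python as follows. The function `enrichable_pairs` is a hypothetical helper that takes a flag per variable saying whether its value was valid and returns the pairs for which a distance could be computed. How validity is determined is outside this sketch, and this is not the service's implementation.

```python
from itertools import combinations

# The three variables that take part in geolocation enrichment.
GEO_VARIABLES = ("IP_ADDRESS", "BILLING_ZIP", "SHIPPING_ZIP")


def enrichable_pairs(is_valid: dict[str, bool]) -> list[tuple[str, str]]:
    """Return the variable pairs where both values are valid.

    Pairs that are missing from the result would get placeholder values.
    """
    usable = [name for name in GEO_VARIABLES if is_valid.get(name, False)]
    return list(combinations(usable, 2))


# Valid IP address and billing ZIP, but the shipping ZIP isn't valid.
partial = enrichable_pairs(
    {"IP_ADDRESS": True, "BILLING_ZIP": True, "SHIPPING_ZIP": False}
)

# All three values valid, so every pair can be enriched.
complete = enrichable_pairs(
    {"IP_ADDRESS": True, "BILLING_ZIP": True, "SHIPPING_ZIP": True}
)
```

With only one valid value the result is empty, which corresponds to the requirement that at least two of the three variables are present.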
You can opt out of geolocation enrichment by mapping your BILLING_ZIP and SHIPPING_ZIP variables to the CUSTOM_CATEGORICAL variable type. Changing the variable type doesn't affect your model's performance.
Geolocation variable format
We highly recommend that you use the following format for geolocation variables to ensure that your location data is enriched successfully.
Variable | Format | Description |
---|---|---|
IP_ADDRESS | IPv4 | For example, 1.1.1.1 |
BILLING_ZIP and SHIPPING_ZIP | The ISO 3166-1 alpha-2 | For more information, see the Country and territory codes section in this topic. |
BILLING_COUNTRY and SHIPPING_COUNTRY | The ISO 3166-1 alpha-2 | For more information, see the Country and territory codes section in this topic. Amazon Fraud Detector tries to match all the common variations of a country's name to its ISO 3166-1 two-letter standard country code. However, we cannot guarantee they will be matched correctly. |
Country and territory codes
The following table provides a complete list of the countries and territories that are supported by Amazon Fraud Detector for geolocation enrichment. Each country and territory has an assigned country code (specifically, the ISO 3166-1 alpha-2 two-letter country code) and a postal code format.
Postal code format
9 - number
a - letter
[X] - X is optional. For example, Guernsey "GY9[9] 9aa" means both "GY9 9aa" and "GY99 9aa" are valid. Use one format.
[X/XX] - either X or XX can be used. For example, Bermuda "aa[aa/99]" means both "aa aa" and "aa 99" are valid. Use either one of these formats, but do not use both.
Some countries have a fixed prefix. For example, the postal code for Andorra is AD999. This means the postal code must start with the letters AD followed by three numbers.
Code | Name | Postal code |
---|---|---|
AD | Andorra | AD999 |
AR | Argentina | 9999 |
AT | Austria | 9999 |
AU | Australia | 9999 |
AZ | Azerbaijan | AZ 9999 |
BD | Bangladesh | 9999 |
BE | Belgium | 9999 |
BG | Bulgaria | 9999 |
BM | Bermuda | aa[aa/99] |
BY | Belarus | 999999 |
CA | Canada | a9a 9a9 |
CH | Switzerland | 9999 |
CL | Chile | 9999999 |
CO | Colombia | 999999 |
CR | Costa Rica | 99999 |
CY | Cyprus | 9999 |
CZ | Czechia | 999 99 |
DE | Germany | 99999 |
DK | Denmark | 9999 |
DO | Dominican Republic | 99999 |
DZ | Algeria | 99999 |
EE | Estonia | 99999 |
ES | Spain | 99999 |
FI | Finland | 99999 |
FM | Federated States of Micronesia | 99999 |
FO | Faroe Islands | 999 |
FR | France | 99999 |
GB | United Kingdom | a[a]9[a/9] 9aa |
GG | Guernsey | GY9[9] 9aa |
GL | Greenland | 9999 |
GP | Guadeloupe | 99999 |
GT | Guatemala | 99999 |
GU | Guam | 99999 |
HR | Croatia | 99999 |
HU | Hungary | 9999 |
IE | Ireland | a99[a/9][a/9][a/9][a/9] |
IM | Isle of Man | IM9[9]9aa |
IN | India | 999999 |
IS | Iceland | 999 |
IT | Italy | 99999 |
JE | Jersey | JE9[9]9aa |
JP | Japan | 999-9999 |
KR | Republic of Korea | 99999 |
LI | Liechtenstein | 9999 |
LK | Sri Lanka | 99999 |
LT | Lithuania | 99999 |
LU | Luxembourg | L-9999 |
LV | Latvia | LV-9999 |
MC | Monaco | 99999 |
MD | Republic of Moldova | 9999 |
MH | Marshall Islands | 99999 |
MK | North Macedonia | 9999 |
MP | Northern Mariana Islands | 99999 |
MQ | Martinique | 99999 |
MT | Malta | aaa 9999 |
MX | Mexico | 99999 |
MY | Malaysia | 99999 |
NL | Netherlands | 9999 aa |
NO | Norway | 9999 |
NZ | New Zealand | 9999 |
PH | Philippines | 9999 |
PK | Pakistan | 99999 |
PL | Poland | 99-999 |
PR | Puerto Rico | 99999 |
PT | Portugal | 9999-999 |
PW | Palau | 99999 |
RE | Reunion | 99999 |
RO | Romania | 999999 |
RU | Russian Federation | 999999 |
SE | Sweden | 999 99 |
SG | Singapore | 999999 |
SI | Slovenia | 9999 |
SK | Slovakia | 999 99 |
SM | San Marino | 99999 |
TH | Thailand | 99999 |
TR | Turkey | 99999 |
UA | Ukraine | 99999 |
US | United States | 99999 |
UY | Uruguay | 99999 |
VI | Virgin Islands, US | 99999 |
WF | Wallis and Futuna | 99999 |
YT | Mayotte | 99999 |
ZA | South Africa | 9999 |
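The postal code notation used in the table can be checked locally by translating a pattern into a regular expression. The Python sketch below reads the legend literally: `9` is a digit, `a` is a letter, `[X]` is optional, `[X/XX]` requires one of the alternatives, and every other character must appear as written. The function names `postal_pattern_to_regex` and `_convert_plain` are hypothetical, and Amazon Fraud Detector might interpret some patterns more leniently than this sketch does.

```python
import re


def _convert_plain(text: str) -> str:
    """Translate pattern characters outside brackets into regex fragments."""
    parts = []
    for char in text:
        if char == "9":
            parts.append("[0-9]")
        elif char == "a":
            parts.append("[A-Za-z]")
        else:
            # Fixed prefixes, spaces, and hyphens are matched literally.
            parts.append(re.escape(char))
    return "".join(parts)


def postal_pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a table pattern such as "GY9[9] 9aa" into a regex."""
    pieces = []
    index = 0
    while index < len(pattern):
        if pattern[index] == "[":
            end = pattern.index("]", index)
            alternatives = pattern[index + 1:end].split("/")
            body = "|".join(_convert_plain(alt) for alt in alternatives)
            # [X] is optional; [X/XX] needs exactly one alternative.
            suffix = "?" if len(alternatives) == 1 else ""
            pieces.append(f"(?:{body}){suffix}")
            index = end + 1
        else:
            pieces.append(_convert_plain(pattern[index]))
            index += 1
    return re.compile("".join(pieces))


guernsey = postal_pattern_to_regex("GY9[9] 9aa")
short_form_ok = guernsey.fullmatch("GY1 1AA") is not None
long_form_ok = guernsey.fullmatch("GY10 1AA") is not None
andorra_ok = postal_pattern_to_regex("AD999").fullmatch("AD500") is not None
```

Using `fullmatch` rather than `match` ensures that extra trailing characters cause the check to fail.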
Useragent enrichment
If you create the Account Takeover Insights (ATI) model, you must provide a variable of the useragent variable type in your dataset. This variable contains the browser, device, and OS data of a login event. Amazon Fraud Detector enriches the useragent data with additional information such as user_agent_family, OS_family, and device_family.