

# Discovering sensitive data with Macie
<a name="data-classification"></a>

With Amazon Macie, you can automate discovery, logging, and reporting of sensitive data in your Amazon Simple Storage Service (Amazon S3) data estate. You can do this in two ways: by configuring Macie to perform automated sensitive data discovery, and by creating and running sensitive data discovery jobs.

Automated sensitive data discovery provides broad visibility into where sensitive data might reside in your Amazon S3 data estate. With this option, Macie evaluates your S3 bucket inventory on a daily basis and uses sampling techniques to identify and select representative S3 objects from your buckets. Macie then retrieves and analyzes the selected objects, inspecting them for sensitive data. For more information, see [Performing automated sensitive data discovery](discovery-asdd.md).

Sensitive data discovery jobs provide deeper, more targeted analysis. With this option, you define the breadth and depth of the analysis—specific S3 buckets that you select or buckets that match specific criteria. You can also refine the scope of the analysis by choosing options such as custom criteria that derive from properties of S3 objects. In addition, you can configure a job to run only once for on-demand analysis and assessment, or on a recurring basis for periodic analysis, assessment, and monitoring. For more information, see [Running sensitive data discovery jobs](discovery-jobs.md).
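As a sketch of how these job choices map onto the Macie API, the following Python function builds keyword arguments for the `CreateClassificationJob` operation, which boto3 exposes as `create_classification_job` on a `macie2` client. The helper `build_job_request` and its parameters are hypothetical. The request keys (`jobType`, `s3JobDefinition`, `bucketDefinitions`, `scheduleFrequency`, `samplingPercentage`, `managedDataIdentifierSelector`) follow the Macie API, but check them against the current API reference before you rely on them.

```python
import uuid


def build_job_request(name, account_id, buckets, recurring=False, sampling_percentage=100):
    """Build keyword arguments for the Macie CreateClassificationJob operation.

    Hypothetical helper; the dictionary keys follow the Macie API request syntax.
    """
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token for this request
        "name": name,
        # ONE_TIME for on-demand analysis, SCHEDULED for periodic analysis.
        "jobType": "SCHEDULED" if recurring else "ONE_TIME",
        "s3JobDefinition": {
            # Analyze specific buckets that you select. To select buckets that
            # match criteria instead, the API accepts bucketCriteria.
            "bucketDefinitions": [{"accountId": account_id, "buckets": list(buckets)}],
        },
        # Percentage of eligible objects to analyze (1-100).
        "samplingPercentage": sampling_percentage,
        # Use Macie's recommended set of managed data identifiers.
        "managedDataIdentifierSelector": "RECOMMENDED",
    }
    if recurring:
        # Run daily; weeklySchedule and monthlySchedule are the alternatives.
        request["scheduleFrequency"] = {"dailySchedule": {}}
    return request
```

To submit the request, you could pass it to boto3, for example `boto3.client("macie2").create_classification_job(**build_job_request("nightly-scan", "111122223333", ["amzn-s3-demo-bucket"], recurring=True))`. Building the dictionary separately keeps the job definition testable without AWS credentials.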

With either option, automated sensitive data discovery or sensitive data discovery jobs, you can configure Macie to analyze S3 objects by using managed data identifiers that it provides, custom data identifiers that you define, or a combination of the two. You can also fine-tune the analysis with allow lists. When you configure settings for automated sensitive data discovery or a sensitive data discovery job, you specify which of the following to use:
+ **Managed data identifiers** – These are built-in criteria and techniques that are designed to detect specific types of sensitive data. For example, they can detect credit card numbers, AWS secret access keys, and passport numbers for particular countries and regions. They can detect a large and growing list of sensitive data types for many countries and regions. This includes multiple types of personally identifiable information (PII), financial information, and credentials data. For more information, see [Using managed data identifiers](managed-data-identifiers.md).
+ **Custom data identifiers** – These are custom criteria that you define to detect sensitive data. Each custom data identifier specifies a regular expression (*regex*) that defines a text pattern to match and, optionally, character sequences and a proximity rule that refine the results. You can use them to detect sensitive data that reflects your particular scenarios, intellectual property, or proprietary data—for example, employee IDs, customer account numbers, or internal data classifications. For more information, see [Building custom data identifiers](custom-data-identifiers.md).
+ **Allow lists** – These specify text and text patterns that you want Macie to ignore. You can use them to specify sensitive data exceptions for your particular scenarios or environment—for example, public names or phone numbers for your organization, or sample data that your organization uses for testing. If Macie finds text that matches an entry or pattern in an allow list, Macie doesn’t report that occurrence of text. This is the case even if the text matches the criteria of a managed or custom data identifier. For more information, see [Defining sensitive data exceptions with allow lists](allow-lists.md).
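To make the interplay between a custom data identifier (regex, keywords, proximity rule, ignore words) and an allow list concrete, here is a minimal local sketch in Python. It is not how Macie works internally: `find_custom_matches` and `apply_allow_list` are hypothetical names, and the distance rule, case handling, and matching behavior are assumptions stated in the docstrings. In Macie, an allow list specifies either predefined text or a regular expression; the sketch accepts both for brevity.

```python
import re


def find_custom_matches(text, regex, keywords=(), ignore_words=(), max_distance=50):
    """Approximate a custom data identifier locally; not Macie's detection engine.

    Assumptions: keywords are compared case-insensitively and must appear
    entirely within the max_distance characters that precede a match; a match
    is dropped if it contains any ignore word.
    """
    results = []
    for m in re.finditer(regex, text):
        if any(word in m.group() for word in ignore_words):
            continue
        if keywords:
            window = text[max(0, m.start() - max_distance):m.start()].lower()
            if not any(k.lower() in window for k in keywords):
                continue
        results.append(m.group())
    return results


def apply_allow_list(matches, allowed_text=(), allowed_regex=None):
    """Drop matches covered by an allow list (simplified).

    Assumptions: a predefined text entry must equal the matched text exactly,
    and a regex entry must match the whole matched text.
    """
    allowed = set(allowed_text)
    pattern = re.compile(allowed_regex) if allowed_regex else None
    return [
        match for match in matches
        if match not in allowed and not (pattern and pattern.fullmatch(match))
    ]
```

For example, with the regex `EMP-\d{5}` and the keyword `employee id`, only employee IDs that follow that keyword closely are reported, and `apply_allow_list` can then remove a test ID such as `EMP-00000`.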

When Macie analyzes an S3 object, it retrieves the latest version of the object from Amazon S3 and then inspects the object's contents for sensitive data. Macie can analyze an object if all of the following are true:
+ The object uses a supported file or storage format and it's stored in an S3 general purpose bucket using a supported storage class. For more information, see [Supported storage classes and formats](discovery-supported-storage.md).
+ If the object is encrypted, it’s encrypted with a key that Macie can access and is allowed to use. For more information, see [Analyzing encrypted S3 objects](discovery-supported-encryption-types.md).
+ If the object is stored in a bucket that has a restrictive bucket policy, the policy allows Macie to access objects in the bucket. For more information, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md).
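These conditions form a single all-or-nothing check. The following Python sketch makes that explicit. `ObjectFacts` and `macie_can_analyze` are hypothetical names, and the caller supplies the supported storage classes and file extensions, so the sketch doesn't assert what Macie supports. Judging the file format by its extension is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class ObjectFacts:
    """Hypothetical summary of an S3 object, filled in from your own inventory."""
    key: str
    storage_class: str
    in_general_purpose_bucket: bool
    encrypted: bool
    macie_can_use_key: bool
    bucket_policy_allows_macie: bool


def macie_can_analyze(obj, supported_storage_classes, supported_extensions):
    """Return True only if the format/storage, encryption, and policy conditions all hold.

    Supported storage classes and extensions are parameters, not hard-coded lists.
    """
    extension = obj.key.rsplit(".", 1)[-1].lower() if "." in obj.key else ""
    supported_location = (
        obj.in_general_purpose_bucket
        and obj.storage_class in supported_storage_classes
        and extension in supported_extensions
    )
    # Unencrypted objects pass; encrypted ones need a key Macie can use.
    key_usable = not obj.encrypted or obj.macie_can_use_key
    return supported_location and key_usable and obj.bucket_policy_allows_macie
```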

To help you meet and maintain compliance with your data security and privacy requirements, Macie produces records of the sensitive data that it finds and the analysis that it performs—*sensitive data findings* and *sensitive data discovery results*. A *sensitive data finding* is a detailed report of sensitive data that Macie found in an S3 object. A *sensitive data discovery result* is a record that logs details about the analysis of an object. Each type of record adheres to a standardized schema, which can help you query, monitor, and process them by using other applications, services, and systems as necessary.
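Because findings follow a standardized schema, a short script can aggregate them. The following Python sketch tallies occurrences of each sensitive data type across a list of findings, such as the `findings` list that the Macie `GetFindings` operation returns (boto3: `get_findings`). The function name is hypothetical, and the field path `classificationDetails.result.sensitiveData[].detections[]`, with `type` and `count` on each detection, is assumed here; confirm it against findings from your own account.

```python
from collections import Counter


def count_detections(findings):
    """Tally occurrences of each sensitive data type across Macie findings.

    Assumes each finding is a dict shaped like the findings that GetFindings
    returns, with detections under classificationDetails.result.sensitiveData.
    """
    totals = Counter()
    for finding in findings:
        result = finding.get("classificationDetails", {}).get("result", {})
        for category in result.get("sensitiveData", []):
            for detection in category.get("detections", []):
                totals[detection["type"]] += detection.get("count", 0)
    return totals
```

With boto3, you might collect finding IDs with `list_findings` and then pass them to `get_findings(findingIds=...)`, feeding the returned `findings` list to this function.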

**Tip**  
Although Macie is optimized for Amazon S3, you can use it to discover sensitive data in resources that you currently store elsewhere. You can do this by moving the data to Amazon S3 temporarily or permanently. For example, export Amazon Relational Database Service or Amazon Aurora snapshots to Amazon S3 in Apache Parquet format. Or export an Amazon DynamoDB table to Amazon S3. You can then create a job to analyze the data in Amazon S3.

**Topics**
+ [Using managed data identifiers](managed-data-identifiers.md)
+ [Building custom data identifiers](custom-data-identifiers.md)
+ [Defining sensitive data exceptions with allow lists](allow-lists.md)
+ [Performing automated sensitive data discovery](discovery-asdd.md)
+ [Running sensitive data discovery jobs](discovery-jobs.md)
+ [Analyzing encrypted S3 objects](discovery-supported-encryption-types.md)
+ [Storing and retaining sensitive data discovery results](discovery-results-repository-s3.md)
+ [Supported storage classes and formats](discovery-supported-storage.md)

# Using managed data identifiers
<a name="managed-data-identifiers"></a>

Amazon Macie uses a combination of criteria and techniques, including machine learning and pattern matching, to detect sensitive data in Amazon Simple Storage Service (Amazon S3) objects. These criteria and techniques, collectively referred to as *managed data identifiers*, can detect a large and growing list of sensitive data types for many countries and regions, including multiple types of credentials data, financial information, personal health information (PHI), and personally identifiable information (PII). Each managed data identifier is designed to detect a specific type of sensitive data—for example, AWS secret access keys, credit card numbers, or passport numbers for a particular country or region.

Macie can detect the following categories of sensitive data by using managed data identifiers:
+ Credentials, for credentials data such as private keys and AWS secret access keys.
+ Financial information, for financial data such as credit card numbers and bank account numbers.
+ Personal information, for PHI such as health insurance and medical identification numbers, and PII such as driver's license identification numbers and passport numbers.

Within each category, Macie can detect multiple types of sensitive data. The topics in this section list and describe each type and any relevant requirements for detecting it. For each type, they also indicate the unique identifier (ID) for the managed data identifier that's designed to detect the data. When you [create a sensitive data discovery job](discovery-jobs-create.md) or [configure settings for automated sensitive data discovery](discovery-asdd-account-configure.md), you can use these IDs to specify which managed data identifiers you want Macie to use when it analyzes S3 objects.
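For jobs, the Macie API expresses this choice with a `managedDataIdentifierSelector` value (`ALL`, `EXCLUDE`, `INCLUDE`, `NONE`, or `RECOMMENDED`) plus an optional list of IDs in `managedDataIdentifierIds`. The following Python sketch approximates how those values resolve to a set of managed data identifiers. The function `effective_identifiers` is hypothetical, and the caller supplies the full and recommended ID sets rather than the sketch hard-coding them.

```python
def effective_identifiers(selector, all_ids, recommended_ids=(), ids=()):
    """Approximate which managed data identifiers a job uses for a selector value.

    all_ids and recommended_ids are supplied by the caller; ids plays the role
    of managedDataIdentifierIds.
    """
    chosen = set(ids)
    if selector == "ALL":
        return set(all_ids)
    if selector == "EXCLUDE":
        return set(all_ids) - chosen  # everything except the listed IDs
    if selector == "INCLUDE":
        return chosen  # only the listed IDs
    if selector == "NONE":
        return set()  # rely on custom data identifiers only
    if selector == "RECOMMENDED":
        return set(recommended_ids)
    raise ValueError(f"unknown selector: {selector}")
```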

**Topics**
+ [Keyword requirements](managed-data-identifiers-keywords.md)
+ [Quick reference by sensitive data type](mdis-reference-quick.md)
+ [Detailed reference by sensitive data category](mdis-reference.md)

For a list of managed data identifiers that we recommend for jobs, see [Managed data identifiers recommended for sensitive data discovery jobs](discovery-jobs-mdis-recommended.md). For a list of managed data identifiers that we recommend and are used by default for automated sensitive data discovery, see [Default settings for automated sensitive data discovery](discovery-asdd-settings-defaults.md).

# Keyword requirements for managed data identifiers
<a name="managed-data-identifiers-keywords"></a>

To detect certain types of sensitive data by using managed data identifiers, Amazon Macie requires a keyword to be in proximity of the data. If this is the case for a particular type of data, reference topics in this section indicate the keyword requirements for that data.

If a keyword has to be in proximity of a particular type of data, the keyword typically has to be within 30 characters (inclusively) of the data. Additional proximity requirements vary based on the file type or storage format of an Amazon Simple Storage Service (Amazon S3) object.

**Structured columnar data**  
For columnar data, a keyword has to be part of the same value or in the name of the column or field that stores a value. This is the case for Microsoft Excel workbooks, CSV files, and TSV files.  
For example, if the value for a field contains both *SSN* and a nine-digit number that uses the syntax of a US Social Security number (SSN), Macie can detect the SSN in the field. Similarly, if the name of a column contains *SSN*, Macie can detect each SSN in the column. Macie treats the values in that column as being in proximity of the keyword *SSN*.
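The column rule for CSV data can be illustrated with a short Python sketch. It is a simplification, not Macie's implementation: the hypothetical function `find_ssns_in_csv` checks only the basic nine-digit SSN syntax (with optional hyphens), and keyword matching is a plain case-insensitive substring test.

```python
import csv
import io
import re

# Simplified syntax check for a US SSN: nine digits, optionally hyphenated.
SSN_PATTERN = re.compile(r"\d{3}-?\d{2}-?\d{4}")


def find_ssns_in_csv(csv_text, keyword="ssn"):
    """Return SSN-like values that are in proximity of the keyword.

    A value counts if its column name contains the keyword (the whole value must
    look like an SSN), or if the value itself contains the keyword.
    """
    kw = keyword.lower()
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    keyword_columns = {i for i, name in enumerate(header) if kw in name.lower()}
    found = []
    for row in reader:
        for i, value in enumerate(row):
            if i in keyword_columns:
                match = SSN_PATTERN.fullmatch(value.strip())
            elif kw in value.lower():
                match = SSN_PATTERN.search(value)
            else:
                match = None
            if match:
                found.append(match.group())
    return found
```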

**Structured record-based data**  
For record-based data, a keyword has to be part of the same value or in the name of an element in the path to the field or array that stores a value. This is the case for Apache Avro object containers, Apache Parquet files, JSON files, and JSON Lines files.  
For example, if the value for a field contains both *credentials* and a character sequence that uses the syntax of an AWS secret access key, Macie can detect the key in the field. Similarly, if the path to a field is `$.credentials.aws.key`, Macie can detect an AWS secret access key in the field. Macie treats the value in the field as being in proximity of the keyword *credentials*.
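The path rule for record-based data can be sketched in Python by walking a parsed JSON document and tracking the names of the elements on the way to each value. The function name is hypothetical, and the 40-character pattern is only a rough stand-in for the syntax of an AWS secret access key, not the check that Macie performs.

```python
import json
import re

# Rough stand-in for AWS secret access key syntax: 40 base64-style characters.
SECRET_KEY_PATTERN = re.compile(r"[A-Za-z0-9/+]{40}")


def find_keys_near_keyword(document, keyword="credentials"):
    """Return (path, value) pairs for key-like strings in proximity of the keyword.

    A string counts if any element of its path, or the string itself, contains
    the keyword (case-insensitive). Array indexes are not added to the path.
    """
    kw = keyword.lower()

    def walk(node, path):
        if isinstance(node, dict):
            for name, child in node.items():
                yield from walk(child, path + [str(name)])
        elif isinstance(node, list):
            for child in node:
                yield from walk(child, path)
        elif isinstance(node, str):
            in_path = any(kw in element.lower() for element in path)
            if in_path or kw in node.lower():
                match = SECRET_KEY_PATTERN.search(node)
                if match:
                    yield "$." + ".".join(path), match.group()

    return list(walk(json.loads(document), []))
```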

**Unstructured data**  
For unstructured data, a keyword typically has to be within 30 characters (inclusively) of the data. There aren't any additional proximity requirements. This is the case for Adobe Portable Document Format files, Microsoft Word documents, email messages, and non-binary text files other than CSV, JSON, JSON Lines, and TSV files. This includes any structured data, such as tables or XML, in these types of files.
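A minimal Python sketch of the 30-character rule for unstructured text follows. It assumes that distance means the number of characters separating the keyword from the matched text, in either direction, and that "inclusively" means a gap of exactly 30 characters still counts. Macie's exact measurement isn't specified here, so treat the hypothetical function `matches_near_keywords` as illustrative.

```python
import re


def _gap(a, b):
    """Number of characters separating two (start, end) spans; 0 if they overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def matches_near_keywords(text, pattern, keywords, max_gap=30):
    """Return matches of pattern that have a keyword within max_gap characters.

    Keywords are matched case-insensitively; a gap of exactly max_gap counts.
    """
    keyword_spans = [
        k.span()
        for keyword in keywords
        for k in re.finditer(re.escape(keyword), text, re.IGNORECASE)
    ]
    return [
        m.group()
        for m in re.finditer(pattern, text)
        if any(_gap(m.span(), span) <= max_gap for span in keyword_spans)
    ]
```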

Keywords aren’t case sensitive. In addition, if a keyword contains a space, Macie automatically matches keyword variations that don’t contain the space or contain an underscore (_) or a hyphen (-) instead of the space. In certain cases, Macie also expands or abbreviates a keyword to address common variations of the keyword.
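The space-handling rule can be expressed as a small Python sketch that generates the variations of a keyword and checks text for any of them. The function names are hypothetical, and the sketch covers only case folding and the space, underscore, and hyphen variations; the additional expansions and abbreviations that Macie applies in certain cases aren't modeled.

```python
def keyword_variants(keyword):
    """Return lowercase variants of a keyword: as-is, and with each space
    removed or replaced by an underscore or a hyphen."""
    base = keyword.lower()
    variants = {base}
    if " " in base:
        variants.update({
            base.replace(" ", ""),
            base.replace(" ", "_"),
            base.replace(" ", "-"),
        })
    return variants


def contains_keyword(text, keyword):
    """Case-insensitive check for any variant of the keyword in the text."""
    lowered = text.lower()
    return any(variant in lowered for variant in keyword_variants(keyword))
```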

For a demonstration of how keywords provide context and help Macie detect specific types of sensitive data, watch the following video:




# Quick reference: Managed data identifiers by type
<a name="mdis-reference-quick"></a>

In Amazon Macie, a *managed data identifier* is a set of built-in criteria and techniques that are designed to detect a specific type of sensitive data—for example, credit card numbers, AWS secret access keys, or passport numbers for a particular country or region. These identifiers can detect a large and growing list of sensitive data types for many countries and regions, including multiple types of credentials data, financial information, personal health information (PHI), and personally identifiable information (PII).

The following table lists all the managed data identifiers that Macie currently provides, organized by sensitive data type. For each type, it provides the following information:
+ **Sensitive data category** – Specifies the general category of sensitive data that includes the type: *Credentials*, for credentials data such as private keys; *Financial information*, for financial data such as credit card numbers and bank account numbers; *Personal information: PHI*, for personal health information such as health insurance and medical identification numbers; and *Personal information: PII*, for personally identifiable information such as driver's license identification numbers and passport numbers.
+ **Managed data identifier ID** – Specifies the unique identifier (ID) for one or more managed data identifiers that are designed to detect the data. When you create a sensitive data discovery job or configure settings for automated sensitive data discovery, you can use these IDs to specify which managed data identifiers you want Macie to use when it analyzes data. For a list of managed data identifiers that we recommend for jobs, see [Managed data identifiers recommended for sensitive data discovery jobs](discovery-jobs-mdis-recommended.md). For a list of managed data identifiers that we recommend for automated sensitive data discovery, see [Default settings for automated sensitive data discovery](discovery-asdd-settings-defaults.md).
+ **Keyword required** – Specifies whether detection requires a keyword to be in proximity of the data. For information about how Macie uses keywords when it analyzes data, see [Keyword requirements](managed-data-identifiers-keywords.md).
+ **Countries and regions** – Specifies which countries and regions the applicable managed data identifiers are designed for. If the managed data identifiers aren't designed for particular countries and regions, this value is *Any*.

To review additional details about the managed data identifiers for a particular type of sensitive data, choose the type.


| Sensitive data type | Sensitive data category | Managed data identifier ID | Keyword required | Countries and regions | 
| --- | --- | --- | --- | --- | 
| [AWS secret access key](mdis-reference-credentials.md#mdis-reference-AWS-CREDENTIALS) | Credentials | AWS_CREDENTIALS | Yes | Any | 
| [Bank account number](mdis-reference-financial.md#mdis-reference-BAN) | Financial information | BANK_ACCOUNT_NUMBER (for both Canada and the US) | Yes | Canada, US | 
| [Basic Bank Account Number (BBAN)](mdis-reference-financial.md#mdis-reference-BBAN) | Financial information | Depending on country or region: FRANCE_BANK_ACCOUNT_NUMBER, GERMANY_BANK_ACCOUNT_NUMBER, ITALY_BANK_ACCOUNT_NUMBER, SPAIN_BANK_ACCOUNT_NUMBER, UK_BANK_ACCOUNT_NUMBER | Yes | France, Germany, Italy, Spain, UK | 
| [Birth date](mdis-reference-pii.md#mdis-reference-DATE_OF_BIRTH) | Personal information: PII | DATE_OF_BIRTH | Yes | Any | 
| [Credit card expiration date](mdis-reference-financial.md#mdis-reference-CC-expiration) | Financial information | CREDIT_CARD_EXPIRATION | Yes | Any | 
| [Credit card magnetic stripe data](mdis-reference-financial.md#mdis-reference-CC-stripe) | Financial information | CREDIT_CARD_MAGNETIC_STRIPE | Yes | Any | 
| [Credit card number](mdis-reference-financial.md#mdis-reference-CC-number) | Financial information | CREDIT_CARD_NUMBER (for credit card numbers in proximity of a keyword), CREDIT_CARD_NUMBER_(NO_KEYWORD) (for credit card numbers not in proximity of a keyword) | Varies | Any | 
| [Credit card verification code](mdis-reference-financial.md#mdis-reference-CC-verification-code) | Financial information | CREDIT_CARD_SECURITY_CODE | Yes | Any | 
| [Driver’s license identification number](mdis-reference-pii.md#mdis-reference-DL-num) | Personal information: PII | Depending on country or region: AUSTRALIA_DRIVERS_LICENSE, AUSTRIA_DRIVERS_LICENSE, BELGIUM_DRIVERS_LICENSE, BULGARIA_DRIVERS_LICENSE, CANADA_DRIVERS_LICENSE, CROATIA_DRIVERS_LICENSE, CYPRUS_DRIVERS_LICENSE, CZECHIA_DRIVERS_LICENSE, DENMARK_DRIVERS_LICENSE, DRIVERS_LICENSE (for the US), ESTONIA_DRIVERS_LICENSE, FINLAND_DRIVERS_LICENSE, FRANCE_DRIVERS_LICENSE, GERMANY_DRIVERS_LICENSE, GREECE_DRIVERS_LICENSE, HUNGARY_DRIVERS_LICENSE, INDIA_DRIVERS_LICENSE, IRELAND_DRIVERS_LICENSE, ITALY_DRIVERS_LICENSE, LATVIA_DRIVERS_LICENSE, LITHUANIA_DRIVERS_LICENSE, LUXEMBOURG_DRIVERS_LICENSE, MALTA_DRIVERS_LICENSE, NETHERLANDS_DRIVERS_LICENSE, POLAND_DRIVERS_LICENSE, PORTUGAL_DRIVERS_LICENSE, ROMANIA_DRIVERS_LICENSE, SLOVAKIA_DRIVERS_LICENSE, SLOVENIA_DRIVERS_LICENSE, SPAIN_DRIVERS_LICENSE, SWEDEN_DRIVERS_LICENSE, UK_DRIVERS_LICENSE | Yes | Australia, Austria, Belgium, Bulgaria, Canada, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, India, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, UK, US | 
| [Drug Enforcement Administration (DEA) Registration Number](mdis-reference-phi.md#mdis-reference-DEA-registration-num) | Personal information: PHI | US_DRUG_ENFORCEMENT_AGENCY_NUMBER | Yes | US | 
| [Electoral roll number](mdis-reference-pii.md#mdis-reference-electoral-roll-num) | Personal information: PII | UK_ELECTORAL_ROLL_NUMBER | Yes | UK | 
| [Full name](mdis-reference-pii.md#mdis-reference-full-name) | Personal information: PII | NAME | No | Any, if the name uses a Latin character set | 
| [Global Positioning System (GPS) coordinates](mdis-reference-pii.md#mdis-reference-GPS) | Personal information: PII | LATITUDE_LONGITUDE | Yes | Any, if the coordinates are in proximity of an English keyword | 
| [Google Cloud API key](mdis-reference-credentials.md#mdis-reference-GCP-API-key) | Credentials | GCP_API_KEY | Yes | Any | 
| [Health Insurance Claim Number (HICN)](mdis-reference-phi.md#mdis-reference-HICN) | Personal information: PHI | USA_HEALTH_INSURANCE_CLAIM_NUMBER | Yes | US | 
| [Health insurance or medical identification number](mdis-reference-phi.md#mdis-reference-HI-ID) | Personal information: PHI | Depending on country or region: CANADA_HEALTH_NUMBER, EUROPEAN_HEALTH_INSURANCE_CARD_NUMBER, FINLAND_EUROPEAN_HEALTH_INSURANCE_NUMBER, FRANCE_HEALTH_INSURANCE_NUMBER, UK_NHS_NUMBER, USA_MEDICARE_BENEFICIARY_IDENTIFIER | Yes | Canada, EU, Finland, France, UK, US | 
| [Healthcare Common Procedure Coding System (HCPCS) code](mdis-reference-phi.md#mdis-reference-HCPCS) | Personal information: PHI | USA_HEALTHCARE_PROCEDURE_CODE | Yes | US | 
| [HTTP Basic Authorization header](mdis-reference-credentials.md#mdis-reference-HTTP_BASIC_AUTH_HEADER) | Credentials | HTTP_BASIC_AUTH_HEADER | No | Any | 
| [HTTP cookie](mdis-reference-pii.md#mdis-reference-HTTP_COOKIE) | Personal information: PII | HTTP_COOKIE | No | Any | 
| [International Bank Account Number (IBAN)](mdis-reference-financial.md#mdis-reference-IBAN) | Financial information | Depending on country or region: ALBANIA_BANK_ACCOUNT_NUMBER, ANDORRA_BANK_ACCOUNT_NUMBER, BOSNIA_AND_HERZEGOVINA_BANK_ACCOUNT_NUMBER, BRAZIL_BANK_ACCOUNT_NUMBER, BULGARIA_BANK_ACCOUNT_NUMBER, COSTA_RICA_BANK_ACCOUNT_NUMBER, CROATIA_BANK_ACCOUNT_NUMBER, CYPRUS_BANK_ACCOUNT_NUMBER, CZECH_REPUBLIC_BANK_ACCOUNT_NUMBER, DENMARK_BANK_ACCOUNT_NUMBER, DOMINICAN_REPUBLIC_BANK_ACCOUNT_NUMBER, EGYPT_BANK_ACCOUNT_NUMBER, ESTONIA_BANK_ACCOUNT_NUMBER, FAROE_ISLANDS_BANK_ACCOUNT_NUMBER, FINLAND_BANK_ACCOUNT_NUMBER, FRANCE_BANK_ACCOUNT_NUMBER, GEORGIA_BANK_ACCOUNT_NUMBER, GERMANY_BANK_ACCOUNT_NUMBER, GREECE_BANK_ACCOUNT_NUMBER, GREENLAND_BANK_ACCOUNT_NUMBER, HUNGARY_BANK_ACCOUNT_NUMBER, ICELAND_BANK_ACCOUNT_NUMBER, IRELAND_BANK_ACCOUNT_NUMBER, ITALY_BANK_ACCOUNT_NUMBER, JORDAN_BANK_ACCOUNT_NUMBER, KOSOVO_BANK_ACCOUNT_NUMBER, LIECHTENSTEIN_BANK_ACCOUNT_NUMBER, LITHUANIA_BANK_ACCOUNT_NUMBER, MALTA_BANK_ACCOUNT_NUMBER, MAURITANIA_BANK_ACCOUNT_NUMBER, MAURITIUS_BANK_ACCOUNT_NUMBER, MONACO_BANK_ACCOUNT_NUMBER, MONTENEGRO_BANK_ACCOUNT_NUMBER, NETHERLANDS_BANK_ACCOUNT_NUMBER, NORTH_MACEDONIA_BANK_ACCOUNT_NUMBER, POLAND_BANK_ACCOUNT_NUMBER, PORTUGAL_BANK_ACCOUNT_NUMBER, SAN_MARINO_BANK_ACCOUNT_NUMBER, SENEGAL_BANK_ACCOUNT_NUMBER, SERBIA_BANK_ACCOUNT_NUMBER, SLOVAKIA_BANK_ACCOUNT_NUMBER, SLOVENIA_BANK_ACCOUNT_NUMBER, SPAIN_BANK_ACCOUNT_NUMBER, SWEDEN_BANK_ACCOUNT_NUMBER, SWITZERLAND_BANK_ACCOUNT_NUMBER, TIMOR_LESTE_BANK_ACCOUNT_NUMBER, TUNISIA_BANK_ACCOUNT_NUMBER, TURKIYE_BANK_ACCOUNT_NUMBER, UK_BANK_ACCOUNT_NUMBER, UKRAINE_BANK_ACCOUNT_NUMBER, UNITED_ARAB_EMIRATES_BANK_ACCOUNT_NUMBER, VIRGIN_ISLANDS_BANK_ACCOUNT_NUMBER (for the British Virgin Islands) | No | Albania, Andorra, Bosnia-Herzegovina, Brazil, Bulgaria, Costa Rica, Croatia, Cyprus, Czech Republic, Denmark, Dominican Republic, Egypt, Estonia, Faroe Islands, Finland, France, Georgia, Germany, Greece, Greenland, Hungary, Iceland, Ireland, Italy, Jordan, Kosovo, Liechtenstein, Lithuania, Malta, Mauritania, Mauritius, Monaco, Montenegro, Netherlands, North Macedonia, Poland, Portugal, San Marino, Senegal, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Timor-Leste, Tunisia, Türkiye, UK, Ukraine, United Arab Emirates, Virgin Islands (British) | 
| [JSON Web Token (JWT)](mdis-reference-credentials.md#mdis-reference-JSON_WEB_TOKEN) | Credentials | JSON_WEB_TOKEN | No | Any | 
| [Mailing address](mdis-reference-pii.md#mdis-reference-mailing-address) | Personal information: PII | ADDRESS, BRAZIL_CEP_CODE (for Brazil's Código de Endereçamento Postal) | Varies | Australia, Brazil, Canada, France, Germany, Italy, Spain, UK, US | 
| [National Drug Code (NDC)](mdis-reference-phi.md#mdis-reference-NDC) | Personal information: PHI | USA_NATIONAL_DRUG_CODE | Yes | US | 
| [National identification number](mdis-reference-pii.md#mdis-reference-national-id) | Personal information: PII | Depending on country or region: ARGENTINA_DNI_NUMBER, BRAZIL_RG_NUMBER, CHILE_RUT_NUMBER, COLOMBIA_CITIZENSHIP_CARD_NUMBER, FRANCE_NATIONAL_IDENTIFICATION_NUMBER, GERMANY_NATIONAL_IDENTIFICATION_NUMBER, INDIA_AADHAAR_NUMBER, ITALY_NATIONAL_IDENTIFICATION_NUMBER, MEXICO_CURP_NUMBER, SPAIN_DNI_NUMBER | Yes | Argentina, Brazil, Chile, Colombia, France, Germany, India, Italy, Mexico, Spain | 
| [National Insurance Number (NINO)](mdis-reference-pii.md#mdis-reference-NINO) | Personal information: PII | UK_NATIONAL_INSURANCE_NUMBER | Yes | UK | 
| [National Provider Identifier (NPI)](mdis-reference-phi.md#mdis-reference-NPI) | Personal information: PHI | USA_NATIONAL_PROVIDER_IDENTIFIER | Yes | US | 
| [OpenSSH private key](mdis-reference-credentials.md#mdis-reference-OPENSSH_PRIVATE_KEY) | Credentials | OPENSSH_PRIVATE_KEY | No | Any | 
| [Passport number](mdis-reference-pii.md#mdis-reference-passport-num) | Personal information: PII | Depending on country or region: CANADA_PASSPORT_NUMBER, FRANCE_PASSPORT_NUMBER, GERMANY_PASSPORT_NUMBER, ITALY_PASSPORT_NUMBER, SPAIN_PASSPORT_NUMBER, UK_PASSPORT_NUMBER, USA_PASSPORT_NUMBER | Yes | Canada, France, Germany, Italy, Spain, UK, US | 
| [Permanent residence number](mdis-reference-pii.md#mdis-reference-permanent-residence-num) | Personal information: PII | CANADA_NATIONAL_IDENTIFICATION_NUMBER | Yes | Canada | 
| [PGP private key](mdis-reference-credentials.md#mdis-reference-PGP_PRIVATE_KEY) | Credentials | PGP_PRIVATE_KEY | No | Any | 
| [Phone number](mdis-reference-pii.md#mdis-reference-phone-num) | Personal information: PII | Depending on country or region: BRAZIL_PHONE_NUMBER, FRANCE_PHONE_NUMBER, GERMANY_PHONE_NUMBER, ITALY_PHONE_NUMBER, PHONE_NUMBER (for Canada and the US), SPAIN_PHONE_NUMBER, UK_PHONE_NUMBER | Varies | Brazil, Canada, France, Germany, Italy, Spain, UK, US | 
| [Public-Key Cryptography Standard (PKCS) private key](mdis-reference-credentials.md#mdis-reference-PKCS) | Credentials | PKCS | No | Any | 
| [Public transportation card number](mdis-reference-pii.md#mdis-reference-public-transport-num) | Personal information: PII | ARGENTINA_TARJETA_SUBE | Yes | Argentina | 
| [PuTTY private key](mdis-reference-credentials.md#mdis-reference-PUTTY_PRIVATE_KEY) | Credentials | PUTTY_PRIVATE_KEY | No | Any | 
| [Social Insurance Number (SIN)](mdis-reference-pii.md#mdis-reference-social-insurance-num) | Personal information: PII | CANADA_SOCIAL_INSURANCE_NUMBER | Yes | Canada | 
| [Social Security number (SSN)](mdis-reference-pii.md#mdis-reference-social-security-num) | Personal information: PII | Depending on country or region: SPAIN_SOCIAL_SECURITY_NUMBER, USA_SOCIAL_SECURITY_NUMBER | Yes | Spain, US | 
| [Stripe API key](mdis-reference-credentials.md#mdis-reference-Stripe_API_key) | Credentials | STRIPE_CREDENTIALS | No | Any | 
| [Taxpayer identification or reference number](mdis-reference-pii.md#mdis-reference-taxpayer-num) | Personal information: PII | Depending on country or region: ARGENTINA_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER, ARGENTINA_ORGANIZATION_TAX_IDENTIFICATION_NUMBER, AUSTRALIA_TAX_FILE_NUMBER, BRAZIL_CNPJ_NUMBER, BRAZIL_CPF_NUMBER, CHILE_RUT_NUMBER, COLOMBIA_INDIVIDUAL_NIT_NUMBER, COLOMBIA_ORGANIZATION_NIT_NUMBER, FRANCE_TAX_IDENTIFICATION_NUMBER, GERMANY_TAX_IDENTIFICATION_NUMBER, INDIA_PERMANENT_ACCOUNT_NUMBER, ITALY_NATIONAL_IDENTIFICATION_NUMBER, MEXICO_INDIVIDUAL_RFC_NUMBER, MEXICO_ORGANIZATION_RFC_NUMBER, SPAIN_NIE_NUMBER, SPAIN_NIF_NUMBER, SPAIN_TAX_IDENTIFICATION_NUMBER, UK_TAX_IDENTIFICATION_NUMBER, USA_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER | Yes | Argentina, Australia, Brazil, Chile, Colombia, France, Germany, India, Italy, Mexico, Spain, UK, US | 
| [Unique device identifier (UDI)](mdis-reference-phi.md#mdis-reference-UDI) | Personal information: PHI | MEDICAL_DEVICE_UDI | Yes | US | 
| [Vehicle identification number (VIN)](mdis-reference-pii.md#mdis-reference-vin) | Personal information: PII | VEHICLE_IDENTIFICATION_NUMBER | Yes | Any, if the VIN is in proximity of a keyword in one of the following languages: English, French, German, Lithuanian, Polish, Portuguese, Romanian, or Spanish | 

# Detailed reference: Managed data identifiers by category
<a name="mdis-reference"></a>

In Amazon Macie, *managed data identifiers* are built-in criteria and techniques that are designed to detect specific types of sensitive data. They can detect a large and growing list of sensitive data types for many countries and regions, including multiple types of credentials data, financial information, and personal information. Each managed data identifier is designed to detect a specific type of sensitive data—for example, AWS secret access keys, credit card numbers, or passport numbers for a particular country or region. 

Macie can detect several categories of sensitive data by using managed data identifiers. Within each category, Macie can detect multiple types of sensitive data. The topics in this section list and describe each type and any relevant requirements for detecting the data. You can browse the topics by category:
+ [Credentials](mdis-reference-credentials.md) – For credentials data such as private keys and AWS secret access keys.
+ [Financial information](mdis-reference-financial.md) – For financial data such as credit card numbers and bank account numbers.
+ [Personal information: PHI](mdis-reference-phi.md) – For personal health information (PHI) such as health insurance and medical identification numbers.
+ [Personal information: PII](mdis-reference-pii.md) – For personally identifiable information (PII) such as driver's license identification numbers and passport numbers.

Or choose a specific type of sensitive data from the following table. The table lists all the managed data identifiers that Macie currently provides, organized by sensitive data type. The table also summarizes relevant requirements for detecting each type.


| Sensitive data type | Sensitive data category | Managed data identifier ID | Keyword required | Countries and regions | 
| --- | --- | --- | --- | --- | 
| [AWS secret access key](mdis-reference-credentials.md#mdis-reference-AWS-CREDENTIALS) | Credentials | AWS_CREDENTIALS | Yes | Any | 
| [Bank account number](mdis-reference-financial.md#mdis-reference-BAN) | Financial information | BANK_ACCOUNT_NUMBER (for both Canada and the US) | Yes | Canada, US | 
| [Basic Bank Account Number (BBAN)](mdis-reference-financial.md#mdis-reference-BBAN) | Financial information | Depending on country or region: FRANCE_BANK_ACCOUNT_NUMBER, GERMANY_BANK_ACCOUNT_NUMBER, ITALY_BANK_ACCOUNT_NUMBER, SPAIN_BANK_ACCOUNT_NUMBER, UK_BANK_ACCOUNT_NUMBER | Yes | France, Germany, Italy, Spain, UK | 
| [Birth date](mdis-reference-pii.md#mdis-reference-DATE_OF_BIRTH) | Personal information: PII | DATE_OF_BIRTH | Yes | Any | 
| [Credit card expiration date](mdis-reference-financial.md#mdis-reference-CC-expiration) | Financial information | CREDIT_CARD_EXPIRATION | Yes | Any | 
| [Credit card magnetic stripe data](mdis-reference-financial.md#mdis-reference-CC-stripe) | Financial information | CREDIT_CARD_MAGNETIC_STRIPE | Yes | Any | 
| [Credit card number](mdis-reference-financial.md#mdis-reference-CC-number) | Financial information | CREDIT_CARD_NUMBER (for credit card numbers in proximity of a keyword), CREDIT_CARD_NUMBER_(NO_KEYWORD) (for credit card numbers not in proximity of a keyword) | Varies | Any | 
| [Credit card verification code](mdis-reference-financial.md#mdis-reference-CC-verification-code) | Financial information | CREDIT_CARD_SECURITY_CODE | Yes | Any | 
| [Driver’s license identification number](mdis-reference-pii.md#mdis-reference-DL-num) | Personal information: PII | Depending on country or region: AUSTRALIA_DRIVERS_LICENSE, AUSTRIA_DRIVERS_LICENSE, BELGIUM_DRIVERS_LICENSE, BULGARIA_DRIVERS_LICENSE, CANADA_DRIVERS_LICENSE, CROATIA_DRIVERS_LICENSE, CYPRUS_DRIVERS_LICENSE, CZECHIA_DRIVERS_LICENSE, DENMARK_DRIVERS_LICENSE, DRIVERS_LICENSE (for the US), ESTONIA_DRIVERS_LICENSE, FINLAND_DRIVERS_LICENSE, FRANCE_DRIVERS_LICENSE, GERMANY_DRIVERS_LICENSE, GREECE_DRIVERS_LICENSE, HUNGARY_DRIVERS_LICENSE, INDIA_DRIVERS_LICENSE, IRELAND_DRIVERS_LICENSE, ITALY_DRIVERS_LICENSE, LATVIA_DRIVERS_LICENSE, LITHUANIA_DRIVERS_LICENSE, LUXEMBOURG_DRIVERS_LICENSE, MALTA_DRIVERS_LICENSE, NETHERLANDS_DRIVERS_LICENSE, POLAND_DRIVERS_LICENSE, PORTUGAL_DRIVERS_LICENSE, ROMANIA_DRIVERS_LICENSE, SLOVAKIA_DRIVERS_LICENSE, SLOVENIA_DRIVERS_LICENSE, SPAIN_DRIVERS_LICENSE, SWEDEN_DRIVERS_LICENSE, UK_DRIVERS_LICENSE | Yes | Australia, Austria, Belgium, Bulgaria, Canada, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, India, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, UK, US | 
| [Drug Enforcement Administration (DEA) Registration Number](mdis-reference-phi.md#mdis-reference-DEA-registration-num) | Personal information: PHI | US_DRUG_ENFORCEMENT_AGENCY_NUMBER | Yes | US | 
| [Electoral roll number](mdis-reference-pii.md#mdis-reference-electoral-roll-num) | Personal information: PII | UK\$1ELECTORAL\$1ROLL\$1NUMBER | Yes | UK | 
| [Full name](mdis-reference-pii.md#mdis-reference-full-name) | Personal information: PII | NAME | No | Any, if the name uses a Latin character set | 
| [Global Positioning System (GPS) coordinates](mdis-reference-pii.md#mdis-reference-GPS) | Personal information: PII | LATITUDE\$1LONGITUDE | Yes | Any, if the coordinates are in proximity of an English keyword | 
| [Google Cloud API key](mdis-reference-credentials.md#mdis-reference-GCP-API-key) | Credentials | GCP\$1API\$1KEY | Yes | Any | 
| [Health Insurance Claim Number (HICN)](mdis-reference-phi.md#mdis-reference-HICN) | Personal information: PHI | USA\$1HEALTH\$1INSURANCE\$1CLAIM\$1NUMBER | Yes | US | 
| [Health insurance or medical identification number](mdis-reference-phi.md#mdis-reference-HI-ID) | Personal information: PHI |  Depending on country or region: CANADA\$1HEALTH\$1NUMBER, EUROPEAN\$1HEALTH\$1INSURANCE\$1CARD\$1NUMBER, FINLAND\$1EUROPEAN\$1HEALTH\$1INSURANCE\$1NUMBER, FRANCE\$1HEALTH\$1INSURANCE\$1NUMBER, UK\$1NHS\$1NUMBER, USA\$1MEDICARE\$1BENEFICIARY\$1IDENTIFIER  | Yes | Canada, EU, Finland, France, UK, US | 
| [Healthcare Common Procedure Coding System (HCPCS) code](mdis-reference-phi.md#mdis-reference-HCPCS) | Personal information: PHI | USA\$1HEALTHCARE\$1PROCEDURE\$1CODE | Yes | US | 
| [HTTP Basic Authorization header](mdis-reference-credentials.md#mdis-reference-HTTP_BASIC_AUTH_HEADER) | Credentials | HTTP\$1BASIC\$1AUTH\$1HEADER | No | Any | 
| [HTTP cookie](mdis-reference-pii.md#mdis-reference-HTTP_COOKIE) | Personal information: PII | HTTP\$1COOKIE | No | Any | 
| [International Bank Account Number (IBAN)](mdis-reference-financial.md#mdis-reference-IBAN) | Financial information |  Depending on country or region: ALBANIA\$1BANK\$1ACCOUNT\$1NUMBER, ANDORRA\$1BANK\$1ACCOUNT\$1NUMBER, BOSNIA\$1AND\$1HERZEGOVINA\$1BANK\$1ACCOUNT\$1NUMBER, BRAZIL\$1BANK\$1ACCOUNT\$1NUMBER, BULGARIA\$1BANK\$1ACCOUNT\$1NUMBER, COSTA\$1RICA\$1BANK\$1ACCOUNT\$1NUMBER, CROATIA\$1BANK\$1ACCOUNT\$1NUMBER, CYPRUS\$1BANK\$1ACCOUNT\$1NUMBER, CZECH\$1REPUBLIC\$1BANK\$1ACCOUNT\$1NUMBER, DENMARK\$1BANK\$1ACCOUNT\$1NUMBER, DOMINICAN\$1REPUBLIC\$1BANK\$1ACCOUNT\$1NUMBER, EGYPT\$1BANK\$1ACCOUNT\$1NUMBER, ESTONIA\$1BANK\$1ACCOUNT\$1NUMBER, FAROE\$1ISLANDS\$1BANK\$1ACCOUNT\$1NUMBER, FINLAND\$1BANK\$1ACCOUNT\$1NUMBER, FRANCE\$1BANK\$1ACCOUNT\$1NUMBER, GEORGIA\$1BANK\$1ACCOUNT\$1NUMBER, GERMANY\$1BANK\$1ACCOUNT\$1NUMBER, GREECE\$1BANK\$1ACCOUNT\$1NUMBER, GREENLAND\$1BANK\$1ACCOUNT\$1NUMBER, HUNGARY\$1BANK\$1ACCOUNT\$1NUMBER, ICELAND\$1BANK\$1ACCOUNT\$1NUMBER, IRELAND\$1BANK\$1ACCOUNT\$1NUMBER, ITALY\$1BANK\$1ACCOUNT\$1NUMBER, JORDAN\$1BANK\$1ACCOUNT\$1NUMBER, KOSOVO\$1BANK\$1ACCOUNT\$1NUMBER, LIECHTENSTEIN\$1BANK\$1ACCOUNT\$1NUMBER, LITHUANIA\$1BANK\$1ACCOUNT\$1NUMBER, MALTA\$1BANK\$1ACCOUNT\$1NUMBER, MAURITANIA\$1BANK\$1ACCOUNT\$1NUMBER, MAURITIUS\$1BANK\$1ACCOUNT\$1NUMBER, MONACO\$1BANK\$1ACCOUNT\$1NUMBER, MONTENEGRO\$1BANK\$1ACCOUNT\$1NUMBER, NETHERLANDS\$1BANK\$1ACCOUNT\$1NUMBER, NORTH\$1MACEDONIA\$1BANK\$1ACCOUNT\$1NUMBER, POLAND\$1BANK\$1ACCOUNT\$1NUMBER, PORTUGAL\$1BANK\$1ACCOUNT\$1NUMBER, SAN\$1MARINO\$1BANK\$1ACCOUNT\$1NUMBER, SENEGAL\$1BANK\$1ACCOUNT\$1NUMBER, SERBIA\$1BANK\$1ACCOUNT\$1NUMBER, SLOVAKIA\$1BANK\$1ACCOUNT\$1NUMBER, SLOVENIA\$1BANK\$1ACCOUNT\$1NUMBER, SPAIN\$1BANK\$1ACCOUNT\$1NUMBER, SWEDEN\$1BANK\$1ACCOUNT\$1NUMBER, SWITZERLAND\$1BANK\$1ACCOUNT\$1NUMBER, TIMOR\$1LESTE\$1BANK\$1ACCOUNT\$1NUMBER, TUNISIA\$1BANK\$1ACCOUNT\$1NUMBER, TURKIYE\$1BANK\$1ACCOUNT\$1NUMBER, UK\$1BANK\$1ACCOUNT\$1NUMBER, UKRAINE\$1BANK\$1ACCOUNT\$1NUMBER, 
UNITED\$1ARAB\$1EMIRATES\$1BANK\$1ACCOUNT\$1NUMBER, VIRGIN\$1ISLANDS\$1BANK\$1ACCOUNT\$1NUMBER (for the British Virgin Islands)  | No | Albania, Andorra, Bosnia-Herzegovina, Brazil, Bulgaria, Costa Rica, Croatia, Cyprus, Czech Republic, Denmark, Dominican Republic, Egypt, Estonia, Faroe Islands, Finland, France, Georgia, Germany, Greece, Greenland, Hungary, Iceland, Ireland, Italy, Jordan, Kosovo, Liechtenstein, Lithuania, Malta, Mauritania, Mauritius, Monaco, Montenegro, Netherlands, North Macedonia, Poland, Portugal, San Marino, Senegal, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Timor-Leste, Tunisia, Türkiye, UK, Ukraine, United Arab Emirates, Virgin Islands (British) | 
| [JSON Web Token (JWT)](mdis-reference-credentials.md#mdis-reference-JSON_WEB_TOKEN) | Credentials | JSON\$1WEB\$1TOKEN | No | Any | 
| [Mailing address](mdis-reference-pii.md#mdis-reference-mailing-address) | Personal information: PII | ADDRESS, BRAZIL\$1CEP\$1CODE (for Brazil's Código de Endereçamento Postal) | Varies | Australia, Brazil, Canada, France, Germany, Italy, Spain, UK, US | 
| [National Drug Code (NDC)](mdis-reference-phi.md#mdis-reference-NDC) | Personal information: PHI | USA\$1NATIONAL\$1DRUG\$1CODE | Yes | US | 
| [National identification number](mdis-reference-pii.md#mdis-reference-national-id) | Personal information: PII |  Depending on country or region: ARGENTINA\$1DNI\$1NUMBER, BRAZIL\$1RG\$1NUMBER, CHILE\$1RUT\$1NUMBER, COLOMBIA\$1CITIZENSHIP\$1CARD\$1NUMBER, FRANCE\$1NATIONAL\$1IDENTIFICATION\$1NUMBER, GERMANY\$1NATIONAL\$1IDENTIFICATION\$1NUMBER, INDIA\$1AADHAAR\$1NUMBER, ITALY\$1NATIONAL\$1IDENTIFICATION\$1NUMBER, MEXICO\$1CURP\$1NUMBER, SPAIN\$1DNI\$1NUMBER  | Yes | Argentina, Brazil, Chile, Colombia, France, Germany, India, Italy, Mexico, Spain | 
| [National Insurance Number (NINO)](mdis-reference-pii.md#mdis-reference-NINO) | Personal information: PII | UK\$1NATIONAL\$1INSURANCE\$1NUMBER | Yes | UK | 
| [National Provider Identifier (NPI)](mdis-reference-phi.md#mdis-reference-NPI) | Personal information: PHI | USA\$1NATIONAL\$1PROVIDER\$1IDENTIFIER | Yes | US | 
| [OpenSSH private key](mdis-reference-credentials.md#mdis-reference-OPENSSH_PRIVATE_KEY) | Credentials | OPENSSH\$1PRIVATE\$1KEY | No | Any | 
| [Passport number](mdis-reference-pii.md#mdis-reference-passport-num) | Personal information: PII |  Depending on country or region: CANADA\$1PASSPORT\$1NUMBER, FRANCE\$1PASSPORT\$1NUMBER, GERMANY\$1PASSPORT\$1NUMBER, ITALY\$1PASSPORT\$1NUMBER, SPAIN\$1PASSPORT\$1NUMBER, UK\$1PASSPORT\$1NUMBER, USA\$1PASSPORT\$1NUMBER  | Yes | Canada, France, Germany, Italy, Spain, UK, US | 
| [Permanent residence number](mdis-reference-pii.md#mdis-reference-permanent-residence-num) | Personal information: PII | CANADA\$1NATIONAL\$1IDENTIFICATION\$1NUMBER | Yes | Canada | 
| [PGP private key](mdis-reference-credentials.md#mdis-reference-PGP_PRIVATE_KEY) | Credentials | PGP\$1PRIVATE\$1KEY | No | Any | 
| [Phone number](mdis-reference-pii.md#mdis-reference-phone-num) | Personal information: PII |  Depending on country or region: BRAZIL\$1PHONE\$1NUMBER, FRANCE\$1PHONE\$1NUMBER, GERMANY\$1PHONE\$1NUMBER, ITALY\$1PHONE\$1NUMBER, PHONE\$1NUMBER (for Canada and the US), SPAIN\$1PHONE\$1NUMBER, UK\$1PHONE\$1NUMBER  | Varies | Brazil, Canada, France, Germany, Italy, Spain, UK, US | 
| [Public-Key Cryptography Standard (PKCS) private key](mdis-reference-credentials.md#mdis-reference-PKCS) | Credentials | PKCS | No | Any | 
| [Public transportation card number](mdis-reference-pii.md#mdis-reference-public-transport-num) | Personal information: PII | ARGENTINA\$1TARJETA\$1SUBE | Yes | Argentina | 
| [PuTTY private key](mdis-reference-credentials.md#mdis-reference-PUTTY_PRIVATE_KEY) | Credentials | PUTTY\$1PRIVATE\$1KEY | No | Any | 
| [Social Insurance Number (SIN)](mdis-reference-pii.md#mdis-reference-social-insurance-num) | Personal information: PII | CANADA\$1SOCIAL\$1INSURANCE\$1NUMBER | Yes | Canada | 
| [Social Security number (SSN)](mdis-reference-pii.md#mdis-reference-social-security-num) | Personal information: PII |  Depending on country or region: SPAIN\$1SOCIAL\$1SECURITY\$1NUMBER, USA\$1SOCIAL\$1SECURITY\$1NUMBER  | Yes | Spain, US | 
| [Stripe API key](mdis-reference-credentials.md#mdis-reference-Stripe_API_key) | Credentials | STRIPE\$1CREDENTIALS | No | Any | 
| [Taxpayer identification or reference number](mdis-reference-pii.md#mdis-reference-taxpayer-num) | Personal information: PII |  Depending on country or region: ARGENTINA\$1INDIVIDUAL\$1TAX\$1IDENTIFICATION\$1NUMBER, ARGENTINA\$1ORGANIZATION\$1TAX\$1IDENTIFICATION\$1NUMBER, AUSTRALIA\$1TAX\$1FILE\$1NUMBER, BRAZIL\$1CNPJ\$1NUMBER, BRAZIL\$1CPF\$1NUMBER, CHILE\$1RUT\$1NUMBER, COLOMBIA\$1INDIVIDUAL\$1NIT\$1NUMBER, COLOMBIA\$1ORGANIZATION\$1NIT\$1NUMBER, FRANCE\$1TAX\$1IDENTIFICATION\$1NUMBER, GERMANY\$1TAX\$1IDENTIFICATION\$1NUMBER, INDIA\$1PERMANENT\$1ACCOUNT\$1NUMBER, ITALY\$1NATIONAL\$1IDENTIFICATION\$1NUMBER, MEXICO\$1INDIVIDUAL\$1RFC\$1NUMBER, MEXICO\$1ORGANIZATION\$1RFC\$1NUMBER, SPAIN\$1NIE\$1NUMBER, SPAIN\$1NIF\$1NUMBER, SPAIN\$1TAX\$1IDENTIFICATION\$1NUMBER, UK\$1TAX\$1IDENTIFICATION\$1NUMBER, USA\$1INDIVIDUAL\$1TAX\$1IDENTIFICATION\$1NUMBER  | Yes | Argentina, Australia, Brazil, Chile, Colombia, France, Germany, India, Italy, Mexico, Spain, UK, US | 
| [Unique device identifier (UDI)](mdis-reference-phi.md#mdis-reference-UDI) | Personal information: PHI | MEDICAL\$1DEVICE\$1UDI | Yes | US | 
| [Vehicle identification number (VIN)](mdis-reference-pii.md#mdis-reference-vin) | Personal information: PII | VEHICLE\$1IDENTIFICATION\$1NUMBER | Yes | Any, if the VIN is in proximity of a keyword in one of the following languages: English, French, German, Lithuanian, Polish, Portuguese, Romanian, or Spanish | 

# Managed data identifiers for credentials data
<a name="mdis-reference-credentials"></a>

Amazon Macie can detect multiple types of sensitive credentials data by using managed data identifiers. The topics on this page specify each type and provide information about the managed data identifier that's designed to detect the data. Each topic provides the following information:<a name="mdi-ref-fields-singular"></a>
+ **Managed data identifier ID** – Specifies the unique identifier (ID) for the managed data identifier that's designed to detect the data. When you [create a sensitive data discovery job](discovery-jobs-create.md) or [configure settings for automated sensitive data discovery](discovery-asdd-account-configure.md), you can use this ID to specify whether you want Macie to use the managed data identifier when it analyzes data.
+ **Supported countries and regions** – Indicates which countries or regions the applicable managed data identifier is designed for. If the managed data identifier isn't designed for a particular country or region, this value is *Any*.
+ **Keyword required** – Specifies whether detection requires a keyword to be in proximity of the data. If a keyword is required, the topic also provides examples of required keywords. For information about how Macie uses keywords when it analyzes data, see [Keyword requirements](managed-data-identifiers-keywords.md).
+ **Comments** – Provides any relevant details that might affect your choice of managed data identifier or your investigation into reported occurrences of the sensitive data. The details include information such as supported standards, syntax requirements, and exceptions.
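As an illustration, the IDs listed in these topics are the values you would pass to the Macie API when a job should use only specific managed data identifiers. The following minimal sketch assumes the AWS SDK for Python (Boto3) and the `macie2` client's `create_classification_job` operation. With `managedDataIdentifierSelector` set to `INCLUDE`, the job uses only the IDs in `managedDataIdentifierIds`. The helper name, job name, account ID, and bucket name are hypothetical placeholders. The actual API call is left commented out so that the sketch runs without AWS credentials.

```python
import uuid


def build_job_request(account_id, bucket_names, identifier_ids):
    """Build keyword arguments for a one-time Macie classification job.

    build_job_request is a hypothetical helper. The INCLUDE selector limits
    the job to the managed data identifiers whose IDs are listed.
    """
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token for the request
        "jobType": "ONE_TIME",
        "name": "credentials-scan-example",  # hypothetical job name
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(bucket_names)}
            ]
        },
        "managedDataIdentifierSelector": "INCLUDE",
        "managedDataIdentifierIds": list(identifier_ids),
    }


if __name__ == "__main__":
    request = build_job_request(
        "111122223333",  # placeholder account ID
        ["amzn-s3-demo-bucket"],  # placeholder bucket name
        ["AWS_CREDENTIALS", "STRIPE_CREDENTIALS"],
    )
    # To submit the job (requires AWS credentials and Macie permissions):
    # import boto3
    # boto3.client("macie2").create_classification_job(**request)
    print(request["managedDataIdentifierIds"])
```

Building the request as a plain dictionary keeps the choice of identifiers easy to review before anything is sent to AWS.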

The topics are listed in alphabetical order by sensitive data type.

**Topics**
+ [AWS secret access key](#mdis-reference-AWS-CREDENTIALS)
+ [Google Cloud API key](#mdis-reference-GCP-API-key)
+ [HTTP Basic Authorization header](#mdis-reference-HTTP_BASIC_AUTH_HEADER)
+ [JSON Web Token (JWT)](#mdis-reference-JSON_WEB_TOKEN)
+ [OpenSSH private key](#mdis-reference-OPENSSH_PRIVATE_KEY)
+ [PGP private key](#mdis-reference-PGP_PRIVATE_KEY)
+ [Public-Key Cryptography Standard (PKCS) private key](#mdis-reference-PKCS)
+ [PuTTY private key](#mdis-reference-PUTTY_PRIVATE_KEY)
+ [Stripe API key](#mdis-reference-Stripe_API_key)

## AWS secret access key
<a name="mdis-reference-AWS-CREDENTIALS"></a>

**Managed data identifier ID:** AWS_CREDENTIALS

**Supported countries and regions:** Any

**Keyword required:** Yes. Keywords include: *aws_secret_access_key, credentials, secret access key, secret key, set-awscredential*

**Comments:** Macie doesn't report occurrences of the following character sequences, which are commonly used as fictitious examples: `je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY` and `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY`.

## Google Cloud API key
<a name="mdis-reference-GCP-API-key"></a>

**Managed data identifier ID:** GCP_API_KEY

**Supported countries and regions:** Any

**Keyword required:** Yes. Keywords include: *G_PLACES_KEY, GCP api key, GCP key, google cloud key, google-api-key, google-cloud-apikeys, GOOGLEKEY, X-goog-api-key*

**Comments:** Macie can detect only the string (`keyString`) component of a Google Cloud API key. Support doesn't include detection of the ID or display name component of a Google Cloud API key.

## HTTP Basic Authorization header
<a name="mdis-reference-HTTP_BASIC_AUTH_HEADER"></a>

**Managed data identifier ID:** HTTP_BASIC_AUTH_HEADER

**Supported countries and regions:** Any

**Keyword required:** No

**Comments:** Detection requires a complete header, including the field name and authentication scheme directive, as specified by [RFC 7617](https://tools.ietf.org/html/rfc7617). For example: `Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==` and `Proxy-Authorization: Basic dGVzdDoxMjPCow==`.
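The credentials in a Basic header are only base64-encoded, not encrypted, so anyone who finds the header can recover them. The following sketch decodes the two example headers according to the RFC 7617 structure. It is not how Macie detects the header. The function name is hypothetical, and the sketch assumes that the decoded credentials are UTF-8.

```python
import base64


def decode_basic_auth(header_line):
    """Return (user_id, password) from a Basic Authorization header line."""
    field_name, _, value = header_line.partition(":")
    if field_name.strip().lower() not in ("authorization", "proxy-authorization"):
        raise ValueError("not an Authorization or Proxy-Authorization header")
    scheme, _, credentials = value.strip().partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("authentication scheme is not Basic")
    # RFC 7617: the token is base64 of "user-id:password" (UTF-8 assumed here).
    decoded = base64.b64decode(credentials.strip(), validate=True).decode("utf-8")
    user_id, _, password = decoded.partition(":")
    return user_id, password


if __name__ == "__main__":
    print(decode_basic_auth("Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="))
    # ('Aladdin', 'open sesame')
```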

## JSON Web Token (JWT)
<a name="mdis-reference-JSON_WEB_TOKEN"></a>

**Managed data identifier ID:** JSON_WEB_TOKEN

**Supported countries and regions:** Any

**Keyword required:** No

**Comments:** Macie can detect JSON Web Tokens (JWTs) that comply with the requirements specified by [RFC 7519](https://tools.ietf.org/html/rfc7519) for JSON Web Signature (JWS) structures. The tokens can be signed or unsigned.
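JWTs that use JWS structures are typically carried in the JWS compact serialization from RFC 7515. This form consists of three base64url-encoded segments (header, payload, and signature) joined by periods. An unsigned token, which uses `"alg": "none"`, has an empty signature segment. The following sketch uses hypothetical helper names to split and decode such a token. It doesn't verify signatures.

```python
import base64
import json


def b64url_decode(segment):
    """Decode a base64url segment, restoring the padding that JWS omits."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data):
    """Encode bytes as unpadded base64url text."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def inspect_jws(token):
    """Return (header, payload, has_signature) for a JWS compact token."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("JWS compact serialization has exactly three parts")
    header = json.loads(b64url_decode(parts[0]))
    payload = json.loads(b64url_decode(parts[1]))
    return header, payload, parts[2] != ""  # empty third part means unsigned


if __name__ == "__main__":
    # Build an unsigned token for illustration only.
    header = b64url_encode(json.dumps({"alg": "none"}).encode())
    payload = b64url_encode(json.dumps({"sub": "example"}).encode())
    print(inspect_jws(f"{header}.{payload}."))
    # ({'alg': 'none'}, {'sub': 'example'}, False)
```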

## OpenSSH private key
<a name="mdis-reference-OPENSSH_PRIVATE_KEY"></a>

**Managed data identifier ID:** OPENSSH_PRIVATE_KEY

**Supported countries and regions:** Any

**Keyword required:** No

**Comments:** None

## PGP private key
<a name="mdis-reference-PGP_PRIVATE_KEY"></a>

**Managed data identifier ID:** PGP_PRIVATE_KEY

**Supported countries and regions:** Any

**Keyword required:** No

**Comments:** None

## Public-Key Cryptography Standard (PKCS) private key
<a name="mdis-reference-PKCS"></a>

**Managed data identifier ID:** PKCS

**Supported countries and regions:** Any

**Keyword required:** No

**Comments:** None

## PuTTY private key
<a name="mdis-reference-PUTTY_PRIVATE_KEY"></a>

**Managed data identifier ID:** PUTTY_PRIVATE_KEY

**Supported countries and regions:** Any

**Keyword required:** No

**Comments:** Macie can detect PuTTY private keys that use the following standard headers and header sequence: `PuTTY-User-Key-File`, `Encryption`, `Comment`, `Public-Lines`, `Private-Lines`, and `Private-MAC`. The header values can contain alphanumeric characters, hyphens (`-`), and line feed (`\n`) or carriage return (`\r`) characters. `Public-Lines` and `Private-Lines` values can also contain forward slashes (`/`), plus signs (`+`), and equal signs (`=`). `Private-MAC` values can also contain plus signs (`+`). Support doesn't include detection of private keys with header values that contain other characters, such as spaces or underscores (`_`). Support also doesn't include detection of private keys that include custom headers.
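To make the header order and character rules above concrete, the following sketch checks a list of `(name, value)` pairs against those documented constraints. It is an illustration of the rules as written on this page, not Macie's detector. The header names are taken exactly as listed above, and the constant and function names are hypothetical.

```python
import re

# Header names, in order, as listed in this topic.
HEADER_ORDER = [
    "PuTTY-User-Key-File", "Encryption", "Comment",
    "Public-Lines", "Private-Lines", "Private-MAC",
]

# Characters that every header value may contain: alphanumerics, -, \n, \r.
_BASE = r"A-Za-z0-9\-\n\r"
ALLOWED_VALUE = {name: re.compile(rf"[{_BASE}]*") for name in HEADER_ORDER}
# Some headers allow a few extra characters.
ALLOWED_VALUE["Public-Lines"] = re.compile(rf"[{_BASE}/+=]*")
ALLOWED_VALUE["Private-Lines"] = re.compile(rf"[{_BASE}/+=]*")
ALLOWED_VALUE["Private-MAC"] = re.compile(rf"[{_BASE}+]*")


def matches_documented_format(headers):
    """Check (name, value) pairs, in file order, against the documented rules."""
    names = [name for name, _ in headers]
    if names != HEADER_ORDER:
        return False  # missing, reordered, or custom headers
    return all(ALLOWED_VALUE[name].fullmatch(value) for name, value in headers)
```

For example, a `Comment` value such as `my key` fails this check because spaces aren't in any allowed character set, which mirrors the documented exclusion.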

## Stripe API key
<a name="mdis-reference-Stripe_API_key"></a>

**Managed data identifier ID:** STRIPE_CREDENTIALS

**Supported countries and regions:** Any

**Keyword required:** No

**Comments:** Macie doesn't report occurrences of the following character sequences, which are commonly used in Stripe code examples: `sk_test_4eC39HqLyjWDarjtT1zdp7dc` and `pk_test_TYooMQauvdEDq54NiTphI7jx`.

# Managed data identifiers for financial information
<a name="mdis-reference-financial"></a>

Amazon Macie can detect multiple types of sensitive financial information by using managed data identifiers. The topics on this page list each type and provide information about the managed data identifiers that are designed to detect the data. Each topic provides the following information:<a name="mdi-ref-fields-plural"></a>
+ **Managed data identifier ID** – Specifies the unique identifier (ID) for one or more managed data identifiers that are designed to detect the data. When you [create a sensitive data discovery job](discovery-jobs-create.md) or [configure settings for automated sensitive data discovery](discovery-asdd-account-configure.md), you can use these IDs to specify which managed data identifiers you want Macie to use when it analyzes data.
+ **Supported countries and regions** – Indicates which countries and regions the applicable managed data identifiers are designed for. If the managed data identifiers aren't designed for particular countries or regions, this value is *Any*.
+ **Keyword required** – Specifies whether detection requires a keyword to be in proximity of the data. If a keyword is required, the topic also provides examples of required keywords. For information about how Macie uses keywords when it analyzes data, see [Keyword requirements](managed-data-identifiers-keywords.md).
+ **Comments** – Provides any relevant details that might affect your choice of managed data identifier or your investigation into reported occurrences of the sensitive data. The details include information such as supported standards, syntax requirements, and exceptions.

The topics are listed in alphabetical order by sensitive data type.

**Topics**
+ [Bank account number](#mdis-reference-BAN)
+ [Basic Bank Account Number (BBAN)](#mdis-reference-BBAN)
+ [Credit card expiration date](#mdis-reference-CC-expiration)
+ [Credit card magnetic stripe data](#mdis-reference-CC-stripe)
+ [Credit card number](#mdis-reference-CC-number)
+ [Credit card verification code](#mdis-reference-CC-verification-code)
+ [International Bank Account Number (IBAN)](#mdis-reference-IBAN)

## Bank account number
<a name="mdis-reference-BAN"></a>

Macie can detect Canadian and US bank account numbers that consist of 9–17 digit sequences and don't contain any spaces.

**Managed data identifier ID:** BANK_ACCOUNT_NUMBER

**Supported countries and regions:** Canada, US

**Keyword required:** Yes. Keywords include: *bank account, bank acct, checking account, checking acct, deposit account, deposit acct, savings account, savings acct, chequing account, chequing acct*

**Comments:** This managed data identifier is explicitly designed to detect bank account numbers for Canada and the US. These countries don’t use the Basic Bank Account Number (BBAN) or International Bank Account Number (IBAN) formats defined by the ISO international standard for numbering bank accounts, as specified by [ISO 13616](https://www.iso.org/standard/81090.html). To detect bank account numbers for other countries and regions, use the managed data identifiers that are designed for those formats. For more information, see [Basic Bank Account Number (BBAN)](#mdis-reference-BBAN) and [International Bank Account Number (IBAN)](#mdis-reference-IBAN).

## Basic Bank Account Number (BBAN)
<a name="mdis-reference-BBAN"></a>

Macie can detect Basic Bank Account Numbers (BBANs) that conform to the BBAN structure defined by the ISO international standard for numbering bank accounts, as specified by [ISO 13616](https://www.iso.org/standard/81090.html). This includes BBANs that don't contain spaces, or use space or hyphen separators—for example, `NWBK60161331926819`, `NWBK 6016 1331 9268 19`, and `NWBK-6016-1331-9268-19`.

**Managed data identifier ID:** Depending on country or region, FRANCE_BANK_ACCOUNT_NUMBER, GERMANY_BANK_ACCOUNT_NUMBER, ITALY_BANK_ACCOUNT_NUMBER, SPAIN_BANK_ACCOUNT_NUMBER, UK_BANK_ACCOUNT_NUMBER

**Supported countries and regions:** France, Germany, Italy, Spain, UK

**Keyword required:** Yes. The following table lists the keywords that Macie recognizes for specific countries and regions.


| Country or region | Keywords | 
| --- | --- | 
| France | account code, account number, accountno#, accountnumber#, bban, code bancaire, compte bancaire, customer account id, customer account number, customer bank account id, iban, numéro de compte | 
| Germany | account code, account number, accountno#, accountnumber#, bankleitzahl, bban, customer account id, customer account number, customer bank account id, geheimzahl, iban, kartennummer, kontonummer, kreditkartennummer, sepa | 
| Italy | account code, account number, accountno#, accountnumber#, bban, codice bancario, conto bancario, customer account id, customer account number, customer bank account id, iban, numero di conto | 
| Spain | account code, account number, accountno#, accountnumber#, bban, código cuenta, código cuenta bancaria, cuenta cliente id, customer account ID, customer account number, customer bank account id, iban, número cuenta bancaria cliente, número cuenta cliente | 
| UK | account code, account number, accountno#, accountnumber#, bban, customer account id, customer account number, customer bank account id, iban, sepa | 

**Comments:** These managed data identifiers can also detect International Bank Account Numbers (IBANs) that comply with the ISO 13616 standard. For more information, see [International Bank Account Number (IBAN)](#mdis-reference-IBAN). The managed data identifier for the UK (UK_BANK_ACCOUNT_NUMBER) can also detect domestic bank account numbers for the UK—for example, `60-16-13 31926819`.

## Credit card expiration date
<a name="mdis-reference-CC-expiration"></a>

**Managed data identifier ID:** CREDIT_CARD_EXPIRATION

**Supported countries and regions:** Any

**Keyword required:** Yes. Keywords include: *exp d, exp m, exp y, expiration, expiry*

**Comments:** Support includes most date formats, such as all digits and combinations of digits and names of months. Date components can be separated by slashes (/), hyphens (-), or applicable keywords. For example, Macie can detect dates such as `02/26`, `02/2026`, `Feb 2026`, `26-Feb`, and `expY=2026, expM=02`.

## Credit card magnetic stripe data
<a name="mdis-reference-CC-stripe"></a>

**Managed data identifier ID:** CREDIT_CARD_MAGNETIC_STRIPE

**Supported countries and regions:** Any

**Keyword required:** Yes. Keywords include: *card data, iso7813, mag, magstripe, stripe, swipe*

**Comments:** Support includes tracks 1 and 2.

## Credit card number
<a name="mdis-reference-CC-number"></a>

**Managed data identifier ID:** CREDIT_CARD_NUMBER for credit card numbers that are in proximity of a keyword, CREDIT_CARD_NUMBER_(NO_KEYWORD) for credit card numbers that aren't in proximity of a keyword

**Supported countries and regions:** Any

**Keyword required:** Varies. Keywords are required by the CREDIT_CARD_NUMBER managed data identifier. Keywords include: *account number, american express, amex, bank card, c card, card, cc #, ccn, check card, cred card, credit, credit card, credit cards, credit no, credit num, dankort, debit, debit card, debit no, debit num, diners club, discover, electron, japanese card bureau, jcb, mastercard, mc, pan, payment account number, payment card number, pcn, pmnt #, pmnt card, pmnt no, pmnt number, union pay, visa*. Keywords aren't required by the CREDIT_CARD_NUMBER_(NO_KEYWORD) managed data identifier.

**Comments:** Detection requires the data to be a 13–19 digit sequence that adheres to the Luhn check formula and uses a standard card number prefix for any of the following types of credit cards: American Express, Dankort, Diners Club, Discover, Electron, Japanese Card Bureau (JCB), Mastercard, UnionPay, and Visa.

Macie doesn't report occurrences of the following sequences, which credit card issuers have reserved for public testing: `122000000000003`, `2222405343248877`, `2222990905257051`, `2223007648726984`, `2223577120017656`, `30569309025904`, `34343434343434`, `3528000700000000`, `3530111333300000`, `3566002020360505`, `36148900647913`, `36700102000000`, `371449635398431`, `378282246310005`, `378734493671000`, `38520000023237`, `4012888888881881`, `4111111111111111`, `4222222222222`, `4444333322221111`, `4462030000000000`, `4484070000000000`, `4911830000000`, `4917300800000000`, `4917610000000000`, `4917610000000000003`, `5019717010103742`, `5105105105105100`, `5111010030175156`, `5185540810000019`, `5200828282828210`, `5204230080000017`, `5204740009900014`, `5420923878724339`, `5454545454545454`, `5455330760000018`, `5506900490000436`, `5506900490000444`, `5506900510000234`, `5506920809243667`, `5506922400634930`, `5506927427317625`, `5553042241984105`, `5555553753048194`, `5555555555554444`, `5610591081018250`, `6011000990139424`, `6011000400000000`, `6011111111111117`, `630490017740292441`, `630495060000000000`, `6331101999990016`, `6759649826438453`, `6799990100000000019`, and `76009244561`.
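The Luhn (mod 10) check in the requirement above is a simple checksum. The following sketch implements it, together with the 13–19 digit length rule. It doesn't model Macie's issuer-prefix or keyword handling, and the function names are hypothetical.

```python
import re


def luhn_valid(digits):
    """Return True if a string of ASCII digits passes the Luhn (mod 10) check."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9  # same as adding the two digits of the product
        total += value
    return total % 10 == 0


def passes_length_and_luhn(candidate):
    """13-19 ASCII digits that satisfy Luhn; issuer-prefix checks are not modeled."""
    return bool(re.fullmatch(r"[0-9]{13,19}", candidate)) and luhn_valid(candidate)
```

For example, `4111111111111111` passes both checks in this sketch. It is also on the reserved test list above, so Macie doesn't report it.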

## Credit card verification code
<a name="mdis-reference-CC-verification-code"></a>

**Managed data identifier ID:** CREDIT_CARD_SECURITY_CODE

**Supported countries and regions:** Any

**Keyword required:** Yes. Keywords include: *card id, card identification code, card identification number, card security code, card validation code, card validation number, card verification data, card verification value, cvc, cvc2, cvv, cvv2, elo verification code*

**Comments:** None

## International Bank Account Number (IBAN)
<a name="mdis-reference-IBAN"></a>

Macie can detect International Bank Account Numbers (IBANs) that consist of up to 34 alphanumeric characters, including elements such as country code. More specifically, Macie can detect IBANs that comply with the ISO international standard for numbering bank accounts, as specified by [ISO 13616](https://www.iso.org/standard/81090.html). This includes IBANs that don't contain spaces, or use space or hyphen separators—for example, `GB29NWBK60161331926819`, `GB29 NWBK 6016 1331 9268 19`, and `GB29-NWBK-6016-1331-9268-19`. Detection includes validation checks based on the Modulus 97 scheme.

**Managed data identifier ID:** Depending on country or region, ALBANIA_BANK_ACCOUNT_NUMBER, ANDORRA_BANK_ACCOUNT_NUMBER, BOSNIA_AND_HERZEGOVINA_BANK_ACCOUNT_NUMBER, BRAZIL_BANK_ACCOUNT_NUMBER, BULGARIA_BANK_ACCOUNT_NUMBER, COSTA_RICA_BANK_ACCOUNT_NUMBER, CROATIA_BANK_ACCOUNT_NUMBER, CYPRUS_BANK_ACCOUNT_NUMBER, CZECH_REPUBLIC_BANK_ACCOUNT_NUMBER, DENMARK_BANK_ACCOUNT_NUMBER, DOMINICAN_REPUBLIC_BANK_ACCOUNT_NUMBER, EGYPT_BANK_ACCOUNT_NUMBER, ESTONIA_BANK_ACCOUNT_NUMBER, FAROE_ISLANDS_BANK_ACCOUNT_NUMBER, FINLAND_BANK_ACCOUNT_NUMBER, FRANCE_BANK_ACCOUNT_NUMBER, GEORGIA_BANK_ACCOUNT_NUMBER, GERMANY_BANK_ACCOUNT_NUMBER, GREECE_BANK_ACCOUNT_NUMBER, GREENLAND_BANK_ACCOUNT_NUMBER, HUNGARY_BANK_ACCOUNT_NUMBER, ICELAND_BANK_ACCOUNT_NUMBER, IRELAND_BANK_ACCOUNT_NUMBER, ITALY_BANK_ACCOUNT_NUMBER, JORDAN_BANK_ACCOUNT_NUMBER, KOSOVO_BANK_ACCOUNT_NUMBER, LIECHTENSTEIN_BANK_ACCOUNT_NUMBER, LITHUANIA_BANK_ACCOUNT_NUMBER, MALTA_BANK_ACCOUNT_NUMBER, MAURITANIA_BANK_ACCOUNT_NUMBER, MAURITIUS_BANK_ACCOUNT_NUMBER, MONACO_BANK_ACCOUNT_NUMBER, MONTENEGRO_BANK_ACCOUNT_NUMBER, NETHERLANDS_BANK_ACCOUNT_NUMBER, NORTH_MACEDONIA_BANK_ACCOUNT_NUMBER, POLAND_BANK_ACCOUNT_NUMBER, PORTUGAL_BANK_ACCOUNT_NUMBER, SAN_MARINO_BANK_ACCOUNT_NUMBER, SENEGAL_BANK_ACCOUNT_NUMBER, SERBIA_BANK_ACCOUNT_NUMBER, SLOVAKIA_BANK_ACCOUNT_NUMBER, SLOVENIA_BANK_ACCOUNT_NUMBER, SPAIN_BANK_ACCOUNT_NUMBER, SWEDEN_BANK_ACCOUNT_NUMBER, SWITZERLAND_BANK_ACCOUNT_NUMBER, TIMOR_LESTE_BANK_ACCOUNT_NUMBER, TUNISIA_BANK_ACCOUNT_NUMBER, TURKIYE_BANK_ACCOUNT_NUMBER, UK_BANK_ACCOUNT_NUMBER, UKRAINE_BANK_ACCOUNT_NUMBER, UNITED_ARAB_EMIRATES_BANK_ACCOUNT_NUMBER, VIRGIN_ISLANDS_BANK_ACCOUNT_NUMBER (for the British Virgin Islands)

**Supported countries and regions:** Albania, Andorra, Bosnia-Herzegovina, Brazil, Bulgaria, Costa Rica, Croatia, Cyprus, Czech Republic, Denmark, Dominican Republic, Egypt, Estonia, Faroe Islands, Finland, France, Georgia, Germany, Greece, Greenland, Hungary, Iceland, Ireland, Italy, Jordan, Kosovo, Liechtenstein, Lithuania, Malta, Mauritania, Mauritius, Monaco, Montenegro, Netherlands, North Macedonia, Poland, Portugal, San Marino, Senegal, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Timor-Leste, Tunisia, Türkiye, UK, Ukraine, United Arab Emirates, Virgin Islands (British)

**Keyword required:** No

**Comments:** The managed data identifiers for France, Germany, Italy, Spain, and the UK can also detect Basic Bank Account Numbers (BBANs) that conform to the BBAN structure defined by the ISO 13616 standard, if the character sequence is in proximity of a keyword. For more information, see [Basic Bank Account Number (BBAN)](#mdis-reference-BBAN).
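The Modulus 97 validation mentioned in the IBAN description is the ISO 7064 MOD 97-10 check used by ISO 13616:

1. Remove the separators.
2. Move the first four characters (the country code and check digits) to the end.
3. Replace each letter with two digits (A=10 through Z=35).
4. Confirm that the resulting number leaves a remainder of 1 when divided by 97.

The following sketch implements only this checksum and a basic shape check. It doesn't check country-specific lengths or BBAN structures, and the function names are hypothetical.

```python
import re


def normalize_iban(text):
    """Remove space or hyphen separators and convert to uppercase."""
    return re.sub(r"[ -]", "", text).upper()


def iban_mod97_valid(text):
    """Return True if text passes the ISO 7064 MOD 97-10 check used for IBANs."""
    iban = normalize_iban(text)
    # Country code, two check digits, then up to 30 BBAN characters (34 max).
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}", iban):
        return False
    rearranged = iban[4:] + iban[:4]  # move country code and check digits to the end
    # int(ch, 36) maps 0-9 to themselves and A-Z to 10-35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

All three example forms of `GB29NWBK60161331926819` shown earlier normalize to the same string and pass the check. Changing the check digits to `28` makes it fail.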

# Managed data identifiers for PHI
<a name="mdis-reference-phi"></a>

Amazon Macie can detect multiple types of sensitive personal health information (PHI) by using managed data identifiers. The topics on this page specify each type and provide information about the managed data identifier that's designed to detect the data. Each topic provides the following information:<a name="mdi-ref-fields-singular"></a>
+ **Managed data identifier ID** – Specifies the unique identifier (ID) for the managed data identifier that's designed to detect the data. When you [create a sensitive data discovery job](discovery-jobs-create.md) or [configure settings for automated sensitive data discovery](discovery-asdd-account-configure.md), you can use this ID to specify whether you want Macie to use the managed data identifier when it analyzes data.
+ **Supported countries and regions** – Indicates which countries or regions the applicable managed data identifier is designed for. If the managed data identifier isn't designed for a particular country or region, this value is *Any*.
+ **Keyword required** – Specifies whether detection requires a keyword to be in proximity of the data. If a keyword is required, the topic also provides examples of required keywords. For information about how Macie uses keywords when it analyzes data, see [Keyword requirements](managed-data-identifiers-keywords.md).
+ **Comments** – Provides any relevant details that might affect your choice of managed data identifier or your investigation into reported occurrences of the sensitive data. The details include information such as supported standards, syntax requirements, and exceptions.

The topics are listed in alphabetical order by sensitive data type.

**Topics**
+ [Drug Enforcement Agency (DEA) Registration Number](#mdis-reference-DEA-registration-num)
+ [Health Insurance Claim Number (HICN)](#mdis-reference-HICN)
+ [Health insurance or medical identification number](#mdis-reference-HI-ID)
+ [Healthcare Common Procedure Coding System (HCPCS) code](#mdis-reference-HCPCS)
+ [National Drug Code (NDC)](#mdis-reference-NDC)
+ [National Provider Identifier (NPI)](#mdis-reference-NPI)
+ [Unique device identifier (UDI)](#mdis-reference-UDI)

## Drug Enforcement Agency (DEA) Registration Number
<a name="mdis-reference-DEA-registration-num"></a>

**Managed data identifier ID:** US_DRUG_ENFORCEMENT_AGENCY_NUMBER

**Supported countries and regions:** US

**Keyword required:** Yes. Keywords include: *dea number, dea registration*

**Comments:** None

## Health Insurance Claim Number (HICN)
<a name="mdis-reference-HICN"></a>

**Managed data identifier ID:** USA_HEALTH_INSURANCE_CLAIM_NUMBER

**Supported countries and regions:** US

**Keyword required:** Yes. Keywords include: *health insurance claim number, hic no, hic no., hic number, hic#, hicn, hicn#., hicno#*

**Comments:** None

## Health insurance or medical identification number
<a name="mdis-reference-HI-ID"></a>

Support includes European Health Insurance Card numbers for the EU and Finland, health insurance numbers for France, Medicare Beneficiary Identifiers for the US, NHS numbers for the UK, and Personal Health Numbers for Canada.

**Managed data identifier ID:** Depending on country or region, CANADA_HEALTH_NUMBER, EUROPEAN_HEALTH_INSURANCE_CARD_NUMBER, FINLAND_EUROPEAN_HEALTH_INSURANCE_NUMBER, FRANCE_HEALTH_INSURANCE_NUMBER, UK_NHS_NUMBER, USA_MEDICARE_BENEFICIARY_IDENTIFIER

**Supported countries and regions:** Canada, EU, Finland, France, UK, US

**Keyword required:** Yes. The following table lists the keywords that Macie recognizes for specific countries and regions.


| Country or region | Keywords | 
| --- | --- | 
| Canada | canada healthcare number, msp number, personal healthcare number, phn, soins de santé | 
| EU | assicurazione sanitaria numero, carta assicurazione numero, carte d’assurance maladie, carte européenne d'assurance maladie, ceam, ehic, ehic#, finlandehicnumber#, gesundheitskarte, hälsokort, health card, health card number, health insurance card, health insurance number, insurance card number, krankenversicherungskarte, krankenversicherungsnummer, medical account number, numero conto medico, numéro d’assurance maladie, numéro de carte d’assurance, numéro de compte medical, número de cuenta médica, número de seguro de salud, número de tarjeta de seguro, sairaanhoitokortin, sairausvakuutuskortti, sairausvakuutusnumero, sjukförsäkring nummer, sjukförsäkringskort, suomi ehic-numero, tarjeta de salud, terveyskortti, tessera sanitaria assicurazione numero, versicherungsnummer | 
| Finland | ehic, ehic#, finland health insurance card, finlandehicnumber#, finska sjukförsäkringskort, hälsokort, health card, health card number, health insurance card, health insurance number, sairaanhoitokortin, sairausvakuutuskortti, sairausvakuutusnumero, sjukförsäkring nummer, sjukförsäkringskort, suomen sairausvakuutuskortti, suomi ehic-numero, terveyskortti | 
| France | carte d'assuré social, carte vitale, insurance card | 
| UK | national health service, NHS | 
| US | mbi, medicare beneficiary | 

**Comments:** None

## Healthcare Common Procedure Coding System (HCPCS) code
<a name="mdis-reference-HCPCS"></a>

**Managed data identifier ID:** USA_HEALTHCARE_PROCEDURE_CODE

**Supported countries and regions:** US

**Keyword required:** Yes. Keywords include: *current procedural terminology, hcpcs, healthcare common procedure coding system*

**Comments:** None

## National Drug Code (NDC)
<a name="mdis-reference-NDC"></a>

**Managed data identifier ID:** USA_NATIONAL_DRUG_CODE

**Supported countries and regions:** US

**Keyword required:** Yes. Keywords include: *national drug code, ndc*

**Comments:** None

## National Provider Identifier (NPI)
<a name="mdis-reference-NPI"></a>

**Managed data identifier ID:** USA_NATIONAL_PROVIDER_IDENTIFIER

**Supported countries and regions:** US

**Keyword required:** Yes. Keywords include: *hipaa, n.p.i, national provider, npi*

**Comments:** None

## Unique device identifier (UDI)
<a name="mdis-reference-UDI"></a>

**Managed data identifier ID:** MEDICAL_DEVICE_UDI

**Supported countries and regions:** US

**Keyword required:** Yes. Keywords include: *blood, blood bag, dev id, device id, device identifier, gs1, hibcc, iccbba, med, udi, unique device id, unique device identifier*

**Comments:** Macie can detect unique device identifiers (UDIs) that comply with formats approved by the US Food and Drug Administration. This includes standard formats defined by GS1, HIBCC, and ICCBBA. For ICCBBA, support is for the ISBT standard.

# Managed data identifiers for PII
<a name="mdis-reference-pii"></a>

Amazon Macie can detect multiple types of sensitive, personally identifiable information (PII) by using managed data identifiers. The topics on this page list each type and provide information about the managed data identifiers that are designed to detect the data. Each topic provides the following information:<a name="mdi-ref-fields-plural"></a>
+ **Managed data identifier ID** – Specifies the unique identifier (ID) for one or more managed data identifiers that are designed to detect the data. When you [create a sensitive data discovery job](discovery-jobs-create.md) or [configure settings for automated sensitive data discovery](discovery-asdd-account-configure.md), you can use these IDs to specify which managed data identifiers you want Macie to use when it analyzes data.
+ **Supported countries and regions** – Indicates which countries and regions the applicable managed data identifiers are designed for. If the managed data identifiers aren't designed for particular countries or regions, this value is *Any*.
+ **Keyword required** – Specifies whether detection requires a keyword to be in proximity of the data. If a keyword is required, the topic also provides examples of required keywords. For information about how Macie uses keywords when it analyzes data, see [Keyword requirements](managed-data-identifiers-keywords.md).
+ **Comments** – Provides any relevant details that might affect your choice of managed data identifier or your investigation into reported occurrences of the sensitive data. The details include information such as supported standards, syntax requirements, and exceptions.
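The managed data identifier IDs on this page are the values that you pass when you select identifiers programmatically. As a minimal sketch, assuming the AWS SDK for Python (Boto3) `macie2` client and its `create_classification_job` operation, the following function builds the request arguments for a one-time job that uses only a chosen set of managed data identifiers. The `INCLUDE` selector asks Macie to use only the listed IDs. The function name, job name, account ID, and bucket name are hypothetical placeholders; check the parameter names against the current Macie API reference before relying on them.

```python
import uuid


def build_job_request(name, account_id, buckets, managed_ids):
    """Return kwargs for a boto3 macie2 create_classification_job call.

    Hypothetical helper: uses the INCLUDE selector so that the job analyzes
    objects with only the managed data identifiers listed in managed_ids.
    """
    if not managed_ids:
        raise ValueError("managed_ids must list at least one identifier ID")
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "name": name,
        "jobType": "ONE_TIME",
        "managedDataIdentifierSelector": "INCLUDE",
        "managedDataIdentifierIds": list(managed_ids),
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ]
        },
    }


request = build_job_request(
    name="pii-scan-example",            # hypothetical job name
    account_id="111122223333",          # placeholder account ID
    buckets=["amzn-s3-demo-bucket"],    # placeholder bucket name
    managed_ids=["DATE_OF_BIRTH", "USA_PASSPORT_NUMBER", "NAME"],
)
```

You might then submit the request with `boto3.client("macie2").create_classification_job(**request)`. Building the arguments separately keeps the identifier selection easy to review and test without calling AWS.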

The topics are listed in alphabetical order by sensitive data type.

**Topics**
+ [Birth date](#mdis-reference-DATE_OF_BIRTH)
+ [Driver’s license identification number](#mdis-reference-DL-num)
+ [Electoral roll number](#mdis-reference-electoral-roll-num)
+ [Full name](#mdis-reference-full-name)
+ [Global Positioning System (GPS) coordinates](#mdis-reference-GPS)
+ [HTTP cookie](#mdis-reference-HTTP_COOKIE)
+ [Mailing address](#mdis-reference-mailing-address)
+ [National identification number](#mdis-reference-national-id)
+ [National Insurance Number (NINO)](#mdis-reference-NINO)
+ [Passport number](#mdis-reference-passport-num)
+ [Permanent residence number](#mdis-reference-permanent-residence-num)
+ [Phone number](#mdis-reference-phone-num)
+ [Public transportation card number](#mdis-reference-public-transport-num)
+ [Social Insurance Number (SIN)](#mdis-reference-social-insurance-num)
+ [Social Security number (SSN)](#mdis-reference-social-security-num)
+ [Taxpayer identification or reference number](#mdis-reference-taxpayer-num)
+ [Vehicle identification number (VIN)](#mdis-reference-vin)

## Birth date
<a name="mdis-reference-DATE_OF_BIRTH"></a>

**Managed data identifier ID:** DATE_OF_BIRTH

**Supported countries and regions:** Any

**Keyword required:** Yes. Keywords include: *bday, b-day, birth date, birthday, date of birth, dob*

**Comments:** Support includes most date formats, such as all digits and combinations of digits and names of months. Date components can be separated by spaces, slashes (/), or hyphens (‐).

## Driver’s license identification number
<a name="mdis-reference-DL-num"></a>

**Managed data identifier ID:** Depending on country or region, AUSTRALIA_DRIVERS_LICENSE, AUSTRIA_DRIVERS_LICENSE, BELGIUM_DRIVERS_LICENSE, BULGARIA_DRIVERS_LICENSE, CANADA_DRIVERS_LICENSE, CROATIA_DRIVERS_LICENSE, CYPRUS_DRIVERS_LICENSE, CZECHIA_DRIVERS_LICENSE, DENMARK_DRIVERS_LICENSE, DRIVERS_LICENSE (for the US), ESTONIA_DRIVERS_LICENSE, FINLAND_DRIVERS_LICENSE, FRANCE_DRIVERS_LICENSE, GERMANY_DRIVERS_LICENSE, GREECE_DRIVERS_LICENSE, HUNGARY_DRIVERS_LICENSE, INDIA_DRIVERS_LICENSE, IRELAND_DRIVERS_LICENSE, ITALY_DRIVERS_LICENSE, LATVIA_DRIVERS_LICENSE, LITHUANIA_DRIVERS_LICENSE, LUXEMBOURG_DRIVERS_LICENSE, MALTA_DRIVERS_LICENSE, NETHERLANDS_DRIVERS_LICENSE, POLAND_DRIVERS_LICENSE, PORTUGAL_DRIVERS_LICENSE, ROMANIA_DRIVERS_LICENSE, SLOVAKIA_DRIVERS_LICENSE, SLOVENIA_DRIVERS_LICENSE, SPAIN_DRIVERS_LICENSE, SWEDEN_DRIVERS_LICENSE, UK_DRIVERS_LICENSE

**Supported countries and regions:** Australia, Austria, Belgium, Bulgaria, Canada, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, India, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, UK, US

**Keyword required:** Yes. The following table lists the keywords that Macie recognizes for specific countries and regions.


| Country or region | Keywords | 
| --- | --- | 
| Australia | dl#, dl:, dlno#, driver licence, driver license, driver permit, drivers lic., drivers licence, driver's licence, drivers license, driver's license, drivers permit, driver's permit, drivers permit number, driving licence, driving license, driving permit | 
| Austria | führerschein, fuhrerschein, führerschein republik österreich, fuhrerschein republik osterreich | 
| Belgium | fuehrerschein, fuehrerschein- nr, fuehrerscheinnummer, fuhrerschein, führerschein, fuhrerschein- nr, führerschein- nr, fuhrerscheinnummer, führerscheinnummer, numéro permis conduire, permis de conduire, rijbewijs, rijbewijsnummer | 
| Bulgaria | превозно средство, свидетелство за управление на моторно, свидетелство за управление на мпс, сумпс, шофьорска книжка | 
| Canada | dl#, dl:, dlno#, driver licence, driver licences, driver license, driver licenses, driver permit, drivers lic., drivers licence, driver's licence, drivers licences, driver's licences, drivers license, driver's license, drivers licenses, driver's licenses, drivers permit, driver's permit, drivers permit number, driving licence, driving license, driving permit, permis de conduire | 
| Croatia | vozačka dozvola | 
| Cyprus | άδεια οδήγησης | 
| Czech Republic | číslo licence, císlo licence řidiče, číslo řidičského průkazu, ovladače lic., povolení k jízdě, povolení řidiče, řidiči povolení, řidičský prúkaz, řidičský průkaz | 
| Denmark | kørekort, kørekortnummer | 
| Estonia | juhi litsentsi number, juhiloa number, juhiluba, juhiluba number | 
| Finland | ajokortin numero, ajokortti, förare lic., körkort, körkort nummer, kuljettaja lic., permis de conduire | 
| France | permis de conduire | 
| Germany | fuehrerschein, fuehrerschein- nr, fuehrerscheinnummer, fuhrerschein, führerschein, fuhrerschein- nr, führerschein- nr, fuhrerscheinnummer, führerscheinnummer | 
| Greece | δεια οδήγησης, adeia odigisis | 
| Hungary | illesztőprogramok lic, jogosítvány, jogsi, licencszám, vezető engedély, vezetői engedély | 
| India | driver licence, driver licences, driver license, driver licenses, drivers lic., drivers licence, driver's licence, drivers licences, driver's licences, drivers license, driver's license, drivers licenses, driver's licenses, driving licence, driving license | 
| Ireland | ceadúnas tiomána | 
| Italy | patente di guida, patente di guida numero, patente guida, patente guida numero | 
| Latvia | autovadītāja apliecība, licences numurs, vadītāja apliecība, vadītāja apliecības numurs, vadītāja atļauja, vadītāja licences numurs, vadītāji lic. | 
| Lithuania | vairuotojo pažymėjimas | 
| Luxembourg | fahrerlaubnis, führerschäin | 
| Malta | liċenzja tas-sewqan | 
| Netherlands | permis de conduire, rijbewijs, rijbewijsnummer | 
| Poland | numer licencyjny, prawo jazdy, zezwolenie na prowadzenie | 
| Portugal | carta de condução, carteira de habilitação, carteira de motorist, carteira habilitação, carteira motorist, licença condução, licença de condução, número de licença, número licença, permissão condução, permissão de condução | 
| Romania | numărul permisului de conducere, permis de conducere | 
| Slovakia | číslo licencie, číslo vodičského preukazu, ovládače lic., povolenia vodičov, povolenie jazdu, povolenie na jazdu, povolenie vodiča, vodičský preukaz | 
| Slovenia | vozniško dovoljenje | 
| Spain | carnet conducer, el carnet de conducer, licencia conducer, licencia de manejo, número carnet conducer, número de carnet de conducer, número de permiso conducer, número de permiso de conducer, número licencia conducer, número permiso conducer, permiso conducción, permiso conducer, permiso de conducción | 
| Sweden | ajokortin numero, dlno# ajokortti, drivere lic., förare lic., körkort, körkort nummer, körkortsnummer, kuljettajat lic. | 
| UK | dl#, dl:, dlno#, driver licence, driver licences, driver license, driver licenses, driver permit, drivers lic., drivers licence, driver's licence, drivers licences, driver's licences, drivers license, driver's license, drivers licenses, driver's licenses, drivers permit, driver's permit, drivers permit number, driving licence, driving license, driving permit | 
| US | dl#, dl:, dlno#, driver licence, driver licences, driver license, driver licenses, driver permit, drivers lic., drivers licence, driver's licence, drivers licences, driver's licences, drivers license, driver's license, drivers licenses, driver's licenses, drivers permit, driver's permit, drivers permit number, driving licence, driving license, driving permit | 

**Comments:** None

## Electoral roll number
<a name="mdis-reference-electoral-roll-num"></a>

**Managed data identifier ID:** UK_ELECTORAL_ROLL_NUMBER

**Supported countries and regions:** UK

**Keyword required:** Yes. Keywords include: *electoral #, electoral number, electoral roll #, electoral roll no., electoral roll number, electoralrollno*

**Comments:** None

## Full name
<a name="mdis-reference-full-name"></a>

**Managed data identifier ID:** NAME

**Supported countries and regions:** Any

**Keyword required:** No

**Comments:** Macie can detect full names only. Support is limited to Latin character sets.

## Global Positioning System (GPS) coordinates
<a name="mdis-reference-GPS"></a>

**Managed data identifier ID:** LATITUDE_LONGITUDE

**Supported countries and regions:** Any, if the coordinates are in proximity of an English keyword.

**Keyword required:** Yes. Keywords include: *coordinate, coordinates, lat long, latitude longitude, position*

**Comments:** Macie can detect GPS coordinates if the latitude and longitude coordinates are stored as a pair in Decimal Degrees (DD) format, for example `41.948614,-87.655311`. Macie doesn't detect coordinates in Degrees Decimal Minutes (DDM) format, for example `41°56.9168'N 87°39.3187'W`, or in Degrees, Minutes, Seconds (DMS) format, for example `41°56'55.0104"N 87°39'19.1196"W`.
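To make the difference between the supported and unsupported formats concrete, here is a minimal sketch of a matcher for DD latitude,longitude pairs. The pattern and the range check are assumptions for illustration, not the pattern that Macie uses, and the function name is hypothetical.

```python
import re

# Hypothetical pattern for a Decimal Degrees (DD) pair such as
# "41.948614,-87.655311"; an optional space after the comma is allowed.
DD_PAIR = re.compile(r"(-?\d{1,2}\.\d+),\s?(-?\d{1,3}\.\d+)")


def find_dd_pairs(text):
    """Return (latitude, longitude) floats for DD pairs with in-range values."""
    pairs = []
    for match in DD_PAIR.finditer(text):
        lat, lon = float(match.group(1)), float(match.group(2))
        # Discard pairs outside the valid latitude and longitude ranges.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            pairs.append((lat, lon))
    return pairs
```

DDM and DMS strings such as `41°56.9168'N 87°39.3187'W` produce no matches here, because neither contains two decimal numbers joined by a comma.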

## HTTP cookie
<a name="mdis-reference-HTTP_COOKIE"></a>

**Managed data identifier ID:** HTTP_COOKIE

**Supported countries and regions:** Any

**Keyword required:** No

**Comments:** Detection requires a complete `Cookie` or `Set-Cookie` header. The header can include one or more name-value pairs, for example: `Set-Cookie: id=TWlrZQ` and `Cookie: session=3948; lang=en`.
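The requirement for a complete header can be illustrated with a small parser. This is a hypothetical sketch, not Macie's implementation: it returns name-value pairs only when the line starts with a `Cookie` or `Set-Cookie` header name, and it treats `Set-Cookie` attributes that contain `=` (such as `Path=/`) as pairs too.

```python
def parse_cookie_header(line):
    """Return name-value pairs from a complete Cookie or Set-Cookie header.

    Returns an empty list if the line isn't a complete header.
    """
    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() not in ("cookie", "set-cookie"):
        return []
    pairs = []
    for part in value.split(";"):
        # Split on the first "=" only, so values such as "abc==" stay intact.
        key, eq, val = part.strip().partition("=")
        if eq and key:
            pairs.append((key, val))
    return pairs
```

For example, `parse_cookie_header("Cookie: session=3948; lang=en")` yields two pairs, while the bare fragment `session=3948` yields none because the header name is missing.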

## Mailing address
<a name="mdis-reference-mailing-address"></a>

**Managed data identifier ID:** ADDRESS (for Australia, Canada, France, Germany, Italy, Spain, UK, and the US), BRAZIL_CEP_CODE (for Brazil's Código de Endereçamento Postal)

**Supported countries and regions:** Australia, Brazil, Canada, France, Germany, Italy, Spain, UK, US

**Keyword required:** Varies. The ADDRESS managed data identifier doesn't require keywords. The BRAZIL_CEP_CODE managed data identifier does, and its keywords include: *cep, código de endereçamento postal, codigo de endereçamento postal, código postal, codigo postal*

**Comments:** Although the ADDRESS managed data identifier doesn't require a keyword, detection requires an address to include the name of a city or place and a corresponding ZIP code or postal code for a supported country or region. The BRAZIL_CEP_CODE managed data identifier can detect only the Código de Endereçamento Postal (CEP) portion of an address.

## National identification number
<a name="mdis-reference-national-id"></a>

Support includes: Aadhaar numbers for India; Cédula de Ciudadanía numbers for Colombia; Clave Única de Registro de Población (CURP) numbers for Mexico; Codice Fiscale numbers for Italy; Documento Nacional de Identidad (DNI) numbers for Argentina and Spain; French National Institute for Statistics and Economic Studies (INSEE) codes; German National Identity Card numbers; Registro Geral (RG) numbers for Brazil; and, Rol Único Nacional (RUN) numbers for Chile.

**Managed data identifier ID:** Depending on country or region, ARGENTINA_DNI_NUMBER, BRAZIL_RG_NUMBER, CHILE_RUT_NUMBER, COLOMBIA_CITIZENSHIP_CARD_NUMBER, FRANCE_NATIONAL_IDENTIFICATION_NUMBER, GERMANY_NATIONAL_IDENTIFICATION_NUMBER, INDIA_AADHAAR_NUMBER, ITALY_NATIONAL_IDENTIFICATION_NUMBER, MEXICO_CURP_NUMBER, SPAIN_DNI_NUMBER

**Supported countries and regions:** Argentina, Brazil, Chile, Colombia, France, Germany, India, Italy, Mexico, Spain

**Keyword required:** Yes. The following table lists the keywords that Macie recognizes for specific countries and regions.


| Country or region | Keywords | 
| --- | --- | 
| Argentina | dni, dni#, d.n.i., documento nacional de identidad | 
| Brazil | registro geral, rg | 
| Chile | identidad número, nacional identidad, national unique role, nationaluniqueroleID#, número identificación, rol único nacional, rol único tributario, run, run#, r.u.n., rut, rut#, r.u.t., unique national number, unique national role, unique tax registry, unique tax role, unique tributary number, unique tributary role | 
| Colombia | cédula de ciudadanía, documento de identificación | 
| France | assurance sociale, carte nationale d’identité, cni, code sécurité sociale, French social security number, fssn#, insee, insurance number, national id number, nationalid#, numéro d'assurance, sécurité sociale, sécurité sociale non., sécurité sociale numéro, social, social security, social security number, socialsecuritynumber, ss#, ssn, ssn# | 
| Germany | ausweisnummer, id number, identification number, identity number, insurance number, personal id, personalausweis | 
| India | aadhaar, aadhar, adhaar, uidai | 
| Italy | codice fiscal, dati anagrafici, ehic, health card, health insurance card, p. iva, partita i.v.a., personal data, tax code, tessera sanitaria | 
| Mexico | clave personal identidad, clave única, clave única de registro de población, clavepersonalIdentidad, curp, registration code, registry code, personal identidad clave, population code | 
| Spain | dni, dni#, dninúmero#, documento nacional de identidad, identidad único, identidadúnico#, insurance number, national identification number, national identity, nationalid#, nationalidno#, número nacional identidad, personal identification number, personal identity no, unique identity number, uniqueid# | 

**Comments:** The managed data identifier for Chile (CHILE_RUT_NUMBER) is designed to detect both Rol Único Nacional (RUN) numbers and Rol Único Tributario (RUT) numbers. For either type of number, Macie doesn't report occurrences where all the digits are zeroes, such as `00000000-K`, because they're commonly used as examples.

Although DNI numbers for Argentina and Spain have different syntaxes, there are similarities between them. Therefore, Macie might report a DNI number for Argentina as a DNI number for Spain, or the other way around. In addition, Macie doesn't report occurrences of the following character sequences, which are commonly used as example DNI numbers: `99999999` and `99.999.999`. Macie also doesn't report occurrences that consist of only zeroes—for example, `000000000` and `00.000.000`.
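The DNI exclusions in the preceding paragraph amount to a post-filter on candidate matches. The following sketch applies only that filter. It doesn't model DNI syntax or the Argentina and Spain ambiguity, and the function and constant names are hypothetical.

```python
# Example DNI sequences that this page says Macie doesn't report.
EXAMPLE_DNI_VALUES = {"99999999", "99.999.999"}


def is_reportable_dni(candidate):
    """Return False for the example or all-zero DNI sequences on this page."""
    if candidate in EXAMPLE_DNI_VALUES:
        return False
    digits = candidate.replace(".", "")  # ignore dot separators
    # Sequences that consist of only zeroes are excluded.
    if digits and set(digits) == {"0"}:
        return False
    return True
```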

## National Insurance Number (NINO)
<a name="mdis-reference-NINO"></a>

**Managed data identifier ID:** UK_NATIONAL_INSURANCE_NUMBER

**Supported countries and regions:** UK

**Keyword required:** Yes. Keywords include: *insurance no., insurance number, insurance#, national insurance number, nationalinsurance#, nationalinsurancenumber, nin, nino*

**Comments:** None

## Passport number
<a name="mdis-reference-passport-num"></a>

**Managed data identifier ID:** Depending on country or region, CANADA_PASSPORT_NUMBER, FRANCE_PASSPORT_NUMBER, GERMANY_PASSPORT_NUMBER, ITALY_PASSPORT_NUMBER, SPAIN_PASSPORT_NUMBER, UK_PASSPORT_NUMBER, USA_PASSPORT_NUMBER

**Supported countries and regions:** Canada, France, Germany, Italy, Spain, UK, US

**Keyword required:** Yes. The following table lists the keywords that Macie recognizes for specific countries and regions.


| Country or region | Keywords | 
| --- | --- | 
| Canada | passeport, passeport#, passport, passport#, passportno, passportno# | 
| France | numéro de passeport, passeport, passeport #, passeport n °, passeport non | 
| Germany | ausstellungsdatum, ausstellungsort, geburtsdatum, passport, passports, reisepass, reisepass–nr, reisepassnummer | 
| Italy | italian passport number, numéro passeport, numéro passeport italien, passaporto, passaporto italiana, passaporto numero, passport number, repubblica italiana passaporto | 
| Spain | españa pasaporte, libreta pasaporte, número pasaporte, pasaporte, passport, passport book, passport no, passport number, spain passport | 
| UK | passeport #, passeport n °, passeport non, passeportn °, passport #, passport no, passport number, passport#, passportid | 
| US | passport, travel document | 

**Comments:** None

## Permanent residence number
<a name="mdis-reference-permanent-residence-num"></a>

**Managed data identifier ID:** CANADA_NATIONAL_IDENTIFICATION_NUMBER

**Supported countries and regions:** Canada

**Keyword required:** Yes. Keywords include: *carte résident permanent, numéro carte résident permanent, numéro résident permanent, permanent resident card, permanent resident card number, permanent resident no, permanent resident no., permanent resident number, pr no, pr no., pr non, pr number, résident permanent no., résident permanent non*

**Comments:** None

## Phone number
<a name="mdis-reference-phone-num"></a>

**Managed data identifier ID:** Depending on country or region, BRAZIL_PHONE_NUMBER, FRANCE_PHONE_NUMBER, GERMANY_PHONE_NUMBER, ITALY_PHONE_NUMBER, PHONE_NUMBER (for Canada and the US), SPAIN_PHONE_NUMBER, UK_PHONE_NUMBER

**Supported countries and regions:** Brazil, Canada, France, Germany, Italy, Spain, UK, US

**Keyword required:** Varies. If a keyword is in proximity of the data, the number doesn’t have to include a country code. Keywords include: *cell, contact, fax, fax number, mobile, phone, phone number, tel, telephone, telephone number*. For Brazil, keywords also include: *cel, celular, fone, móvel, número residencial, numero residencial, telefone*. If a keyword isn’t in proximity of the data, the number has to include a country code.
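The keyword-or-country-code rule above can be sketched as a small decision function. This is an illustration under stated assumptions, not Macie's logic: the caller decides what text counts as being in proximity of the number, only the English keywords from this page are included (the Brazil-specific keywords are omitted), and a country code is assumed to be written as `+` followed by digits. The names are hypothetical.

```python
import re

# English keywords from this page; Brazil-specific keywords are omitted.
PHONE_KEYWORDS = [
    "cell", "contact", "fax", "fax number", "mobile", "phone",
    "phone number", "tel", "telephone", "telephone number",
]
# Whole-word, case-insensitive match; longest keywords are tried first.
_KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in
                        sorted(PHONE_KEYWORDS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def phone_rule_satisfied(number, nearby_text):
    """Return True if the number passes the keyword-or-country-code rule."""
    if _KEYWORD_RE.search(nearby_text):
        return True  # a nearby keyword means no country code is needed
    # Simplification: a country code is "+" followed by at least one digit.
    return re.match(r"\+\d", number.strip()) is not None
```

Whole-word matching keeps a keyword such as *tel* from matching inside an unrelated word like "hotel".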

**Comments:** For the US, support includes toll-free numbers.

## Public transportation card number
<a name="mdis-reference-public-transport-num"></a>

**Managed data identifier ID:** ARGENTINA_TARJETA_SUBE

**Supported countries and regions:** Argentina

**Keyword required:** Yes. Keywords include: *sistema único de boleto electrónico, sube*

**Comments:** Macie can detect 16‐digit Sistema Único de Boleto Electrónico (SUBE) card numbers that begin with `6061` and adhere to the Luhn check formula. Card number components can be separated by spaces or hyphens (‐), or not use a separator—for example, `6061 1234 1234 1234`, `6061‐1234‐1234‐1234`, and `6061123412341234`.
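The SUBE criteria on this page (16 digits, a `6061` prefix, optional space or hyphen separators, and the Luhn check) can be combined as in the following sketch. Requiring the same separator between every group is an assumption, and the names are hypothetical. The pattern uses a `\1` backreference, which Python's `re` module supports even though Macie custom data identifiers don't.

```python
import re

# 16 digits starting with 6061, in groups of four. \1 requires the same
# separator throughout: a space, an ASCII hyphen, a U+2010 hyphen, or none.
SUBE_RE = re.compile(r"6061([ \-\u2010]?)\d{4}\1\d{4}\1\d{4}")


def luhn_valid(digits):
    """Return True if the string of digits passes the Luhn check."""
    total = 0
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_sube_number(text):
    """Return True if text is a 6061-prefixed, Luhn-valid 16-digit number."""
    match = SUBE_RE.fullmatch(text.strip())
    if match is None:
        return False
    digits = re.sub(r"\D", "", match.group(0))  # drop separators
    return luhn_valid(digits)
```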

## Social Insurance Number (SIN)
<a name="mdis-reference-social-insurance-num"></a>

**Managed data identifier ID:** CANADA_SOCIAL_INSURANCE_NUMBER

**Supported countries and regions:** Canada

**Keyword required:** Yes. Keywords include: *canadian id, numéro d'assurance sociale, sin, social insurance number*

**Comments:** None

## Social Security number (SSN)
<a name="mdis-reference-social-security-num"></a>

**Managed data identifier ID:** Depending on country or region, SPAIN_SOCIAL_SECURITY_NUMBER, USA_SOCIAL_SECURITY_NUMBER

**Supported countries and regions:** Spain, US

**Keyword required:** Yes. For Spain, keywords include: *número de la seguridad social, social security no., social security number, socialsecurityno#, ssn, ssn#*. For the US, keywords include: *social security, ss#, ssn*.

**Comments:** None

## Taxpayer identification or reference number
<a name="mdis-reference-taxpayer-num"></a>

Support includes: CUIL and CUIT codes for Argentina; CIF, NIE, and NIF numbers for Spain; CNPJ and CPF numbers for Brazil; Codice Fiscale numbers for Italy; ITINs for the US; NIT numbers for Colombia; PANs for India; RFC numbers for Mexico; RUN and RUT numbers for Chile; Steueridentifikationsnummer numbers for Germany; TFNs for Australia; TINs for France; and, TRN and UTR numbers for the UK.

**Managed data identifier ID:** Depending on country or region, ARGENTINA_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER, ARGENTINA_ORGANIZATION_TAX_IDENTIFICATION_NUMBER, AUSTRALIA_TAX_FILE_NUMBER, BRAZIL_CNPJ_NUMBER, BRAZIL_CPF_NUMBER, CHILE_RUT_NUMBER, COLOMBIA_INDIVIDUAL_NIT_NUMBER, COLOMBIA_ORGANIZATION_NIT_NUMBER, FRANCE_TAX_IDENTIFICATION_NUMBER, GERMANY_TAX_IDENTIFICATION_NUMBER, INDIA_PERMANENT_ACCOUNT_NUMBER, ITALY_NATIONAL_IDENTIFICATION_NUMBER, MEXICO_INDIVIDUAL_RFC_NUMBER, MEXICO_ORGANIZATION_RFC_NUMBER, SPAIN_NIE_NUMBER, SPAIN_NIF_NUMBER, SPAIN_TAX_IDENTIFICATION_NUMBER, UK_TAX_IDENTIFICATION_NUMBER, USA_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER

**Supported countries and regions:** Argentina, Australia, Brazil, Chile, Colombia, France, Germany, India, Italy, Mexico, Spain, UK, US

**Keyword required:** Yes. The following table lists the keywords that Macie recognizes for specific countries and regions.


| Country or region | Keywords | 
| --- | --- | 
| Argentina | argentina taxpayer id, clave única de identificación tributaria, cuil, c.u.i.l, cuit, c.u.i.t, número de identificación fiscal, número de contribuyente, unified labor identification code | 
| Australia | tax file number, tfn | 
| Brazil | cadastro de pessoa física, cadastro de pessoa fisica, cadastro de pessoas físicas, cadastro de pessoas fisicas, cadastro nacional da pessoa jurídica, cadastro nacional da pessoa juridica, cnpj, cpf | 
| Chile | identidad número, nacional identidad, national unique role, nationaluniqueroleID#, número identificación, rol único nacional, rol único tributario, run, run#, r.u.n., rut, rut#, r.u.t., unique national number, unique national role, unique tax registry, unique tax role, unique tributary number, unique tributary role | 
| Colombia | nit, nit., nit#, n.i.t. | 
| France | numéro d'identification fiscal, tax id, tax identification number, tax number, tin, tin\$1 | 
| Germany | identifikationsnummer, steuer id, steueridentifikationsnummer, steuernummer, tax id, tax identification number, tax number | 
| India | e-pan, pan card, pan number, permanent account number | 
| Italy | codice fiscal, dati anagrafici, ehic, health card, health insurance card, p. iva, partita i.v.a., personal data, tax code, tessera sanitaria | 
| Mexico | código del registro federal de contribuyentes, identificación de impuestos, identificacion de impuestos, impuesto al valor agregado, iva, iva#, i.v.a., registro federal de contribuyentes, rfc, rfc#, r.f.c. | 
| Spain | cif, cif número, cifnúmero#, nie, nif, número de contribuyente, número de identidad de extranjero, número de identificación fiscal, número de impuesto corporativo, personal tax number, tax id, tax identification number, tax number, tin, tin# | 
| UK | paye, tax id, tax id no., tax id number, tax identification, tax identification#, tax no., tax number, tax reference, tax#, taxid#, temporary reference number, tin, trn, unique tax reference, unique taxpayer reference, utr | 
| US | i.t.i.n., individual taxpayer identification number, itin | 

**Comments:** The managed data identifier for Chile (CHILE_RUT_NUMBER) is designed to detect both Rol Único Nacional (RUN) numbers and Rol Único Tributario (RUT) numbers. For Registro Federal de Contribuyentes (RFC) numbers for Mexico, Macie doesn't report occurrences of the following character sequences, which are commonly used as example RFC numbers: `XAXX010101000` and `XEXX010101000`.

For several types of taxpayer identification and reference numbers, Macie doesn't report occurrences where all the digits are zeroes—for example, `00000000-K`, `000000000`, and `00.000.000`. This is because the use of only zeroes is common in examples of certain types of taxpayer identification and reference numbers.
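The two exclusion rules above can be expressed as a simple filter over candidate matches. This sketch simplifies in one respect: it applies the all-zero rule to every candidate, whereas this page says the rule applies to several, not all, types of taxpayer numbers. Letters and separators, such as the `K` check character in `00000000-K`, are ignored when checking for zeroes. The names are hypothetical.

```python
import re

# Example RFC values that this page says Macie doesn't report.
EXAMPLE_RFC_VALUES = {"XAXX010101000", "XEXX010101000"}


def is_reportable_tax_id(candidate):
    """Apply the RFC-example and all-zero exclusion rules from this page."""
    if candidate in EXAMPLE_RFC_VALUES:
        return False
    digits = re.sub(r"\D", "", candidate)  # keep only the digits
    # Exclude candidates whose digits are all zeroes.
    return not (digits and set(digits) == {"0"})
```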

## Vehicle identification number (VIN)
<a name="mdis-reference-vin"></a>

**Managed data identifier ID:** VEHICLE_IDENTIFICATION_NUMBER

**Supported countries and regions:** Any, if the VIN is in proximity of a keyword in one of the following languages: English, French, German, Lithuanian, Polish, Portuguese, Romanian, or Spanish.

**Keyword required:** Yes. Keywords include: *Fahrgestellnummer, niv, numarul de identificare, numarul seriei de sasiu, numer VIN, Número de Identificação do Veículo, Número de Identificación de Automóviles, numéro d'identification du véhicule, vehicle identification number, vin, VIN numeris*

**Comments:** Macie can detect VINs that consist of a 17-character sequence and adhere to the ISO 3779 and 3780 standards. These standards were designed for worldwide use.
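As a rough illustration of the format part of these criteria, the following sketch checks for a 17-character sequence drawn from the ISO 3779 character set: digits and capital letters other than I, O, and Q. It doesn't check for a nearby keyword and doesn't perform any check-digit validation. The function name and the sample VIN are illustrative only.

```python
import re

# 17 characters: digits and capital letters, excluding I, O, and Q.
VIN_RE = re.compile(r"[A-HJ-NPR-Z0-9]{17}")


def looks_like_vin(text):
    """Return True if text has the 17-character ISO 3779 VIN shape."""
    return VIN_RE.fullmatch(text) is not None
```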

# Building custom data identifiers
<a name="custom-data-identifiers"></a>

In addition to using the managed data identifiers that Amazon Macie provides, you can build and use custom data identifiers. A *custom data identifier* is a set of criteria that you define to detect sensitive data in Amazon Simple Storage Service (Amazon S3) objects. The criteria consist of a regular expression (*regex*) that defines a text pattern to match and, optionally, character sequences and a proximity rule that refine the results. The character sequences can be: *keywords*, which are words or phrases that must be in proximity of text that matches the regex, or *ignore words*, which are words or phrases to exclude from results.

With custom data identifiers, you can define detection criteria that reflect your organization's particular scenarios, intellectual property, or proprietary data. For example, you can detect employee IDs, customer account numbers, or internal data classifications. If you configure [sensitive data discovery jobs](discovery-jobs.md) or [automated sensitive data discovery](discovery-asdd.md) to use these identifiers, you can supplement the [managed data identifiers](managed-data-identifiers.md) that Macie provides.

In addition to detection criteria, you can optionally configure custom severity settings for findings that a custom data identifier produces. By default, Macie assigns the *Medium* severity to all the findings that a custom data identifier produces. Severity doesn't change based on the number of occurrences of text that match an identifier's detection criteria. If you configure custom severity settings, severity can be based on the number of occurrences of text that match the criteria.
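The following sketch models how occurrence-based severity might be computed. It assumes a list of thresholds shaped like the `severityLevels` setting in the Macie API (`occurrencesThreshold` and `severity` keys). It also assumes two behaviors that this page doesn't state: the highest threshold that the count meets determines the severity, and no finding is produced below the lowest threshold. Without custom settings, it returns *Medium* regardless of the count, as described above. The function name and threshold values are hypothetical.

```python
# Hypothetical thresholds, ascending from LOW to HIGH.
EXAMPLE_LEVELS = [
    {"occurrencesThreshold": 1, "severity": "LOW"},
    {"occurrencesThreshold": 50, "severity": "MEDIUM"},
    {"occurrencesThreshold": 100, "severity": "HIGH"},
]


def severity_for(occurrences, severity_levels=None):
    """Return the finding severity for a count, or None if there's no finding."""
    if occurrences < 1:
        return None
    if not severity_levels:
        # Default: Medium, no matter how many occurrences there are.
        return "MEDIUM"
    met = [lvl for lvl in severity_levels
           if occurrences >= lvl["occurrencesThreshold"]]
    if not met:
        return None  # assumed: below the lowest threshold, no finding
    # The highest threshold that the count meets determines the severity.
    return max(met, key=lambda lvl: lvl["occurrencesThreshold"])["severity"]
```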

**Topics**
+ [Configuration options for custom data identifiers](cdis-options.md)
+ [Creating a custom data identifier](cdis-create.md)
+ [Deleting a custom data identifier](cdis-delete.md)

# Configuration options for custom data identifiers
<a name="cdis-options"></a>

By using custom data identifiers, you can define custom criteria for detecting sensitive data in Amazon Simple Storage Service (Amazon S3) objects. You can supplement the [managed data identifiers](managed-data-identifiers.md) that Amazon Macie provides, and detect sensitive data that reflects your organization's particular scenarios, intellectual property, or proprietary data.

Each custom data identifier specifies detection criteria and, optionally, severity settings for findings that the identifier produces. The detection criteria specify a regular expression that defines a text pattern to match in an S3 object. The criteria can also specify character sequences and a proximity rule that refine the results. The severity settings specify which severity to assign to findings. Severity can be based on the number of occurrences of text that match the identifier's detection criteria.

**Topics**
+ [Detection criteria](#cdis-detection-criteria)
+ [Severity settings for findings](#cdis-finding-severity)

## Detection criteria
<a name="cdis-detection-criteria"></a>

When you create a custom data identifier, you specify a regular expression (*regex*) that defines a text pattern to match. You can also specify character sequences, such as words and phrases, and a proximity rule that refine the results. The character sequences can be: *keywords*, which are words or phrases that must be in proximity of text that matches the regex, or *ignore words*, which are words or phrases to exclude from results.

For the regex, Amazon Macie supports a subset of the pattern syntax provided by the [Perl Compatible Regular Expressions (PCRE) library](https://www.pcre.org/). Of the constructs provided by the PCRE library, Macie doesn’t support the following pattern elements:
+ Backreferences
+ Capturing groups
+ Conditional patterns
+ Embedded code
+ Global pattern flags, such as `/i`, `/m`, and `/x`
+ Recursive patterns
+ Positive and negative look-behind and look-ahead zero-width assertions, such as `?=`, `?!`, `?<=`, and `?<!`

The regex can contain as many as 512 characters.
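
Before you submit a pattern, it can help to screen it locally for constructs that Macie rejects. The following Python sketch is a hypothetical helper, not a Macie or AWS tool. It applies rough textual checks for some of the unsupported constructs listed above and for the 512-character limit. It isn't a PCRE parser, so it can miss cases or flag harmless text, such as a parenthesis inside a character class. Macie's own validation remains authoritative.

```python
import re

MAX_REGEX_LENGTH = 512  # Limit stated in this guide.

# Rough textual detectors for some constructs that Macie doesn't support.
# These are heuristics, not a PCRE parser: escaped or bracketed characters
# can cause false positives or missed cases.
UNSUPPORTED_CONSTRUCTS = {
    "backreference": re.compile(r"\\[1-9]|\\k<|\\g\{?-?\d"),
    "lookaround assertion": re.compile(r"\(\?<?[=!]"),
    "capturing group": re.compile(r"(?<!\\)\((?!\?)"),
    "conditional pattern": re.compile(r"\(\?\("),
    "recursive pattern": re.compile(r"\(\?(?:R|[+-]?\d+|&)"),
    "embedded code": re.compile(r"\(\?\??\{"),
}


def precheck_regex(pattern: str) -> list[str]:
    """Return descriptions of likely problems in a draft regex pattern."""
    problems = []
    if len(pattern) > MAX_REGEX_LENGTH:
        problems.append(
            f"pattern has {len(pattern)} characters (max {MAX_REGEX_LENGTH})"
        )
    for name, detector in UNSUPPORTED_CONSTRUCTS.items():
        if detector.search(pattern):
            problems.append(f"possible {name}")
    return problems


print(precheck_regex(r"[A-Z]-\d{8}"))  # []
print(precheck_regex(r"(\d{3})-\1"))   # ['possible backreference', 'possible capturing group']
```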

To create an effective regex pattern for a custom data identifier, note the following tips and recommendations:
+ Use anchors (`^` or `$`) only if you expect the pattern to appear at the beginning or end of a file, not the beginning or end of a line.
+ For performance reasons, Macie limits the size of bounded repeat groups. For example, `\d{100,1000}` won’t compile in Macie. To approximate this functionality, you can use an open-ended repeat such as `\d{100,}`.
+ To make parts of a pattern case insensitive, you can use the `(?i)` construct instead of the `/i` flag.
+ There’s no need to optimize prefixes or alternations manually. For example, changing `/hello|hi|hey/` to `/h(?:ello|i|ey)/` won’t improve performance.
+ For performance reasons, Macie limits the number of repeated wildcards. For example, `a*b*a*` won’t compile in Macie.
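
Python's `re` module isn't PCRE, but it handles the constructs in the first and third tips the same way, so it can illustrate them. In this sketch, the sample text stands in for the contents of a hypothetical S3 object. Without a multiline flag, `^` anchors to the start of the entire text, so the anchored pattern finds only the ID on the first line. The sketch uses the scoped `(?i:...)` form because recent Python versions reject a bare `(?i)` in the middle of a pattern. PCRE accepts both forms, but whether Macie accepts the scoped form is an assumption that you can check with the **Evaluate** option or the TestCustomDataIdentifier operation.

```python
import re

# Two lines of text from a hypothetical S3 object.
sample = "F-12345678 hired\nP-87654321 hired\n"

# No multiline flag: ^ matches only at the start of the whole text.
anchored = re.findall(r"^[A-Z]-\d{8}", sample)

# Without the anchor, the pattern matches on every line.
unanchored = re.findall(r"[A-Z]-\d{8}", sample)

# Only the keyword is case insensitive; [A-Z] still needs an uppercase letter.
scoped = re.findall(r"(?i:employee) [A-Z]-\d{8}",
                    "EMPLOYEE F-12345678, Employee f-12345678")

print(anchored)    # ['F-12345678']
print(unanchored)  # ['F-12345678', 'P-87654321']
print(scoped)      # ['EMPLOYEE F-12345678']
```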

To protect against malformed or long-running expressions, Macie automatically tests regex patterns against a collection of sample text when you create a custom data identifier. If there's an issue with the regex, Macie returns an error that describes the issue.

In addition to the regex, you can optionally specify character sequences and a proximity rule to refine the results.

**Keywords**  
These are specific character sequences that must be in proximity of text that matches the regex pattern. The proximity requirements vary based on an S3 object's storage format or file type:  
+ **Structured columnar data** – Macie includes a result if the text matches the regex pattern and a keyword is in the name of the field or column that stores the text, or the text is preceded by and within the maximum match distance of a keyword in the same field or cell value. This is the case for Microsoft Excel workbooks, CSV files, and TSV files.
+ **Structured record-based data** – Macie includes a result if the text matches the regex pattern and the text is within the maximum match distance of a keyword. The keyword can be in the name of an element in the path to the field or array that stores the text, or it can precede and be part of the same value in the field or array that stores the text. This is the case for Apache Avro object containers, Apache Parquet files, JSON files, and JSON Lines files.
+ **Unstructured data** – Macie includes a result if the text matches the regex pattern and the text is preceded by and within the maximum match distance of a keyword. This is the case for Adobe Portable Document Format files, Microsoft Word documents, email messages, and non-binary text files other than CSV, JSON, JSON Lines, and TSV files. This includes any structured data, such as tables, in these types of files.
You can specify as many as 50 keywords. Each keyword can contain 3–90 UTF-8 characters. Keywords aren't case sensitive.

**Maximum match distance**  
This is a character-based proximity rule for keywords. Macie uses this setting to determine whether a keyword precedes text that matches the regex pattern. The setting defines the maximum number of characters that can exist between the end of a complete keyword and the end of text that matches the regex pattern. Macie includes a result if the text:  
+ Matches the regex pattern,
+ Occurs after at least one complete keyword, and
+ Occurs within the specified distance of the keyword.
Otherwise, Macie excludes the text from results.  
You can specify a distance of 1–300 characters. The default distance is 50 characters. For best results, this distance should be greater than the minimum number of characters of text that the regex is designed to detect. If only part of the text is within the maximum match distance of a keyword, Macie doesn’t include it in results.
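
The following Python sketch approximates the rule for unstructured data under stated assumptions. It treats a keyword as any case-insensitive occurrence of the full keyword string, and it keeps a regex match only if some keyword ends at or before the start of the match and the distance from the end of that keyword to the end of the match is at most the maximum match distance. The function name and these simplifications are illustrative; Macie's actual matching, especially for structured data, can differ.

```python
import re


def keyword_matches(text, regex, keywords, max_distance=50):
    """Return regex matches that a complete keyword precedes within max_distance.

    Simplified model of the proximity rule for unstructured data.
    """
    # End offsets of every keyword occurrence; keywords aren't case sensitive.
    keyword_ends = [
        hit.end()
        for keyword in keywords
        for hit in re.finditer(re.escape(keyword), text, re.IGNORECASE)
    ]
    results = []
    for match in re.finditer(regex, text):
        # The keyword must come first, and the distance from the end of the
        # keyword to the end of the match must not exceed max_distance, so the
        # whole match falls inside the window.
        if any(end <= match.start() and match.end() - end <= max_distance
               for end in keyword_ends):
            results.append(match.group())
    return results


text = "Employee ID: F-12345678. Unrelated code X-00000000 appears much later."
print(keyword_matches(text, r"[A-Z]-\d{8}", ["employee", "employee ID"], 20))
# ['F-12345678']
```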

**Ignore words**  
These are specific character sequences to exclude from results. If text matches the regex pattern but it contains an ignore word, Macie doesn't include it in results.  
You can specify as many as 10 ignore words. Each ignore word can contain 4–90 UTF-8 characters. Ignore words are case sensitive.
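
As a minimal sketch, ignore-word filtering can be modeled as a case-sensitive substring test on each regex match. The helper below is hypothetical and also checks the count and length limits stated above. It doesn't model how Macie combines ignore words with keywords.

```python
import re


def drop_ignored(text, regex, ignore_words):
    """Return regex matches that don't contain any ignore word."""
    # Limits stated in this guide: up to 10 ignore words, 4-90 characters each.
    if len(ignore_words) > 10 or any(not 4 <= len(w) <= 90 for w in ignore_words):
        raise ValueError("use at most 10 ignore words of 4-90 characters each")
    matches = [m.group() for m in re.finditer(regex, text)]
    # Ignore words are case sensitive, so this is a plain substring test.
    return [m for m in matches if not any(word in m for word in ignore_words)]


sample = "PROD-1234 TEST-0000"
print(drop_ignored(sample, r"[A-Z]{4}-\d{4}", ["TEST"]))  # ['PROD-1234']
print(drop_ignored(sample, r"[A-Z]{4}-\d{4}", ["test"]))  # ['PROD-1234', 'TEST-0000']
```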

**Note**  
Before you create a custom data identifier, we strongly recommend that you test and refine its detection criteria with sample data. Because custom data identifiers are used by sensitive data discovery jobs, you can't change a custom data identifier after you create it. This helps ensure that you have an immutable history of sensitive data findings and discovery results for data privacy and protection audits or investigations that you perform.  
You can test detection criteria by using the Amazon Macie console or the Amazon Macie API. To test the criteria by using the console, use the options in the **Evaluate** section while you're creating the custom data identifier. To test the criteria programmatically, use the [TestCustomDataIdentifier](https://docs.aws.amazon.com/macie/latest/APIReference/custom-data-identifiers-test.html) operation of the Amazon Macie API. If you're using the AWS Command Line Interface, run the [test-custom-data-identifier](https://docs.aws.amazon.com/cli/latest/reference/macie2/test-custom-data-identifier.html) command to test the criteria.


## Severity settings for findings
<a name="cdis-finding-severity"></a>

When you create a custom data identifier, you can also specify custom severity settings for sensitive data findings that the identifier produces. By default, Amazon Macie assigns the *Medium* severity to all the findings that a custom data identifier produces. If an S3 object contains at least one occurrence of text that matches the detection criteria, Macie automatically assigns the *Medium* severity to the resulting finding.

With custom severity settings, you specify which severity to assign based on the number of occurrences of text that match the detection criteria. You can define *occurrences thresholds* for as many as three severity levels: *Low* (least severe), *Medium*, and *High* (most severe). An *occurrences threshold* is the minimum number of matches that must exist in an S3 object to produce a finding with the specified severity. If you specify more than one threshold, the thresholds must be in ascending order by severity, moving from *Low* to *High*.

For example, the following image shows severity settings that specify three occurrences thresholds, one for each severity level that Macie supports.

![Severity settings that specify occurrences thresholds for Low, Medium, and High severity levels.](http://docs.aws.amazon.com/macie/latest/user/images/scrn-cdi-severity.png)


The following table indicates the severity of the findings that the custom data identifier produces.


| Occurrences threshold | Severity level | Result | 
| --- | --- | --- | 
| 1 | Low | If an S3 object contains 1–49 occurrences of text that match the detection criteria, the severity of the resulting finding is Low.  | 
| 50 | Medium | If an S3 object contains 50–99 occurrences of text that match the detection criteria, the severity of the resulting finding is Medium. | 
| 100 | High | If an S3 object contains 100 or more occurrences of text that match the detection criteria, the severity of the resulting finding is High. | 

You can also use severity settings to specify whether to create a finding at all. If an S3 object contains fewer occurrences than the lowest occurrences threshold, Macie doesn't create a finding.
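
The following Python sketch models how occurrences thresholds map a match count to a severity, using the default setting and the thresholds from the preceding table. The dictionary keys mirror the `occurrencesThreshold` and `severity` fields shown in the AWS CLI examples for creating a custom data identifier. The function itself is hypothetical, and treating equal thresholds as invalid is an assumption.

```python
SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}

# Default: any number of matches (one or more) produces a Medium finding.
DEFAULT_LEVELS = [{"occurrencesThreshold": 1, "severity": "MEDIUM"}]


def finding_severity(occurrences, levels=None):
    """Return the finding severity for a match count, or None for no finding."""
    levels = DEFAULT_LEVELS if levels is None else levels
    # Order thresholds from least to most severe and check that they ascend.
    ordered = sorted(levels, key=lambda level: SEVERITY_RANK[level["severity"]])
    thresholds = [level["occurrencesThreshold"] for level in ordered]
    if any(low >= high for low, high in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must ascend from LOW to HIGH")
    severity = None  # Fewer matches than the lowest threshold: no finding.
    for level in ordered:
        if occurrences >= level["occurrencesThreshold"]:
            severity = level["severity"]
    return severity


table_levels = [
    {"occurrencesThreshold": 1, "severity": "LOW"},
    {"occurrencesThreshold": 50, "severity": "MEDIUM"},
    {"occurrencesThreshold": 100, "severity": "HIGH"},
]
for count in (0, 49, 50, 99, 100):
    print(count, finding_severity(count, table_levels))
```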

# Creating a custom data identifier
<a name="cdis-create"></a>

A *custom data identifier* is a set of criteria that you define to detect sensitive data in Amazon Simple Storage Service (Amazon S3) objects. When you create a custom data identifier, you specify a regular expression (*regex*) that defines a text pattern to match in an S3 object. You can also specify character sequences and a proximity rule that refine the results. The character sequences can be: *keywords*, which are words or phrases that must be in proximity of text that matches the regex, or *ignore words*, which are words or phrases to exclude from results. By using custom data identifiers, you can supplement the [managed data identifiers](managed-data-identifiers.md) that Amazon Macie provides, and detect sensitive data that reflects your organization's particular scenarios, intellectual property, or proprietary data.

For example, many companies have a specific syntax for employee IDs. One such syntax might be: a capital letter that indicates whether an employee is a full-time (*F*) or part-time (*P*) employee, followed by a hyphen (`-`), followed by an eight-digit sequence that identifies the employee. Examples are: *F-12345678* for a full-time employee, and *P-87654321* for a part-time employee. To detect employee IDs that use this syntax, you might create a custom data identifier that specifies the following regex: `[A-Z]-\d{8}`. To refine the analysis and avoid false positives, you might also configure the identifier to use keywords (`employee` and `employee ID`) and a maximum match distance of 20 characters. With these criteria, results include text that matches the regex if the text occurs after the keyword *employee* or *employee ID* and all the text occurs within 20 characters of one of those keywords.
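
To see how these example criteria interact, the following Python sketch applies them to a few sample strings. It uses a simplified, hypothetical model of the keyword proximity rule (the keyword must come first, and the distance from the end of the keyword to the end of the match can't exceed 20 characters); it isn't Macie's implementation. The regex matches only the ASCII hyphen-minus character, so an ID typed with an en dash doesn't match.

```python
import re

# Criteria from the employee ID example.
REGEX = r"[A-Z]-\d{8}"
KEYWORDS = ["employee", "employee ID"]
MAX_MATCH_DISTANCE = 20


def employee_ids(text):
    """Return matches that a keyword precedes within MAX_MATCH_DISTANCE."""
    keyword_ends = [hit.end() for kw in KEYWORDS
                    for hit in re.finditer(re.escape(kw), text, re.IGNORECASE)]
    return [m.group() for m in re.finditer(REGEX, text)
            if any(end <= m.start() and m.end() - end <= MAX_MATCH_DISTANCE
                   for end in keyword_ends)]


samples = [
    "Employee ID: F-12345678",   # keyword close by
    "employee P-87654321",       # keywords aren't case sensitive
    "Order number Q-12345678",   # no keyword
    "Employee records were archived last year; batch F-12345678",  # too far
    "employee F\u201312345678",  # en dash instead of a hyphen
]
for sample in samples:
    print(repr(sample), employee_ids(sample))
```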


In addition to detection criteria, you can optionally specify custom severity settings for findings that a custom data identifier produces. Severity can be based on the number of occurrences of text that match the identifier's detection criteria. If you don't specify these settings, Macie automatically assigns the *Medium* severity to all the findings that the identifier produces. Severity doesn't change based on the number of occurrences of text that match the identifier's detection criteria.

For detailed information about these and other settings, see [Configuration options for custom data identifiers](cdis-options.md).

**To create a custom data identifier**  
You can create a custom data identifier by using the Amazon Macie console or the Amazon Macie API.

------
#### [ Console ]

Follow these steps to create a custom data identifier by using the Amazon Macie console.

**To create a custom data identifier**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, under **Settings**, choose **Custom data identifiers**.

1. Choose **Create**.

1. For **Name**, enter a name for the custom data identifier. The name can contain as many as 128 characters.

1. For **Description**, optionally enter a brief description of the custom data identifier. The description can contain as many as 512 characters.
**Note**  
Avoid including sensitive data in the name or description of a custom data identifier. Other users of your account might be able to access the name or description, depending on the actions that they're allowed to perform in Macie.

1. For **Regular expression**, enter the regular expression (*regex*) that defines the text pattern to match. The regex can contain as many as 512 characters.

   Macie supports a subset of the pattern syntax provided by the [Perl Compatible Regular Expressions (PCRE) library](https://www.pcre.org/). For additional details and tips, see [Detection criteria for custom data identifiers](cdis-options.md#cdis-detection-criteria).

1. For **Keywords**, optionally enter as many as 50 character sequences (separated by commas) to define specific text that must be in proximity of text that matches the regex pattern.

   Macie includes an occurrence in results only if the text matches the regex pattern and the text is within the maximum match distance of one of these keywords. Each keyword can contain 3–90 UTF-8 characters. Keywords aren't case sensitive.

1. For **Ignore words**, optionally enter as many as 10 character sequences (separated by commas) that define specific text to exclude from results.

   Macie excludes an occurrence from results if the text matches the regex pattern but it contains one of these ignore words. Each ignore word can contain 4–90 UTF-8 characters. Ignore words are case sensitive.

1. For **Maximum match distance**, optionally enter the maximum number of characters that can exist between the end of a keyword and the end of text that matches the regex pattern.

   Macie includes an occurrence in results only if the text matches the regex pattern and the text is within this distance of a complete keyword. The distance can be 1–300 characters. The default distance is 50 characters.

1. For **Severity**, choose how to determine the severity of sensitive data findings that the custom data identifier produces:
   + To automatically assign the *Medium* severity to all findings, choose **Use Medium severity for any number of matches (default)**. With this option, Macie automatically assigns the *Medium* severity to a finding if the affected S3 object contains one or more occurrences of text that match the detection criteria.
   + To assign severity based on occurrences thresholds that you specify, choose **Use custom settings to determine severity**. Then use the **Occurrences threshold** and **Severity level** options to specify the minimum number of matches that must exist in an S3 object to produce a finding with a selected severity.

     You can specify as many as three occurrences thresholds, one for each severity level that Macie supports: *Low* (least severe), *Medium*, or *High* (most severe). If you specify more than one, the thresholds must be in ascending order by severity, moving from *Low* to *High*. If an S3 object contains fewer occurrences than the lowest threshold, Macie doesn't create a finding.

1. (Optional) For **Tags**, choose **Add tag**, and then enter as many as 50 tags to assign to the custom data identifier.

   A *tag* is a label that you define and assign to certain types of AWS resources. Each tag consists of a required tag key and an optional tag value. Tags can help you identify, categorize, and manage resources in different ways, such as by purpose, owner, environment, or other criteria. To learn more, see [Tagging Macie resources](tagging-resources.md).

1. (Optional) For **Evaluate**, enter up to 1,000 characters in the **Sample data** box, and then choose **Test** to test the detection criteria. Macie evaluates the sample data and reports the number of occurrences of text that match the criteria. You can repeat this step as many times as you like to refine and optimize the criteria.
**Note**  
We strongly recommend that you test and refine the detection criteria with sample data. Because custom data identifiers are used by sensitive data discovery jobs, you can't change a custom data identifier after you create it. This helps ensure that you have an immutable history of sensitive data findings and discovery results.  
Because Macie applies additional logic when it processes structured records, the number of matches that the **Evaluate** section reports might differ in some cases from the results that jobs produce.

1. When you finish, choose **Submit**.

Macie tests the settings and verifies that it can compile the regex. If there's an issue with a setting or the regex, Macie displays an error that describes the issue. After you address any issues, you can save the custom data identifier.

------
#### [ API ]

To create a custom data identifier programmatically, use the [CreateCustomDataIdentifier](https://docs.aws.amazon.com/macie/latest/APIReference/custom-data-identifiers.html) operation of the Amazon Macie API. Or, if you're using the AWS Command Line Interface (AWS CLI), run the [create-custom-data-identifier](https://docs.aws.amazon.com/cli/latest/reference/macie2/create-custom-data-identifier.html) command.

**Note**  
Before you create a custom data identifier, we strongly recommend that you test and refine its detection criteria with sample data. Because custom data identifiers are used by sensitive data discovery jobs, you can't change a custom data identifier after you create it. This helps ensure that you have an immutable history of sensitive data findings and discovery results.  
To test the criteria programmatically, you can use the [TestCustomDataIdentifier](https://docs.aws.amazon.com/macie/latest/APIReference/custom-data-identifiers-test.html) operation of the Amazon Macie API. This operation provides an environment for evaluating sample data with detection criteria. If you're using the AWS CLI, you can run the [test-custom-data-identifier](https://docs.aws.amazon.com/cli/latest/reference/macie2/test-custom-data-identifier.html) command to test the criteria.

When you're ready to create the custom data identifier, use the following parameters to define its detection criteria:
+ `regex` – Specify the regular expression (*regex*) that defines the text pattern to match. The regex can contain as many as 512 characters.

  Macie supports a subset of the pattern syntax provided by the [Perl Compatible Regular Expressions (PCRE) library](https://www.pcre.org/). For additional details and tips, see [Detection criteria for custom data identifiers](cdis-options.md#cdis-detection-criteria).
+ `keywords` – Optionally specify 1–50 character sequences (*keywords*) that must be in proximity of text that matches the regex pattern.

  Macie includes an occurrence in results only if the text matches the regex pattern and the text is within the maximum match distance of one of these keywords. Each keyword can contain 3–90 UTF-8 characters. Keywords aren't case sensitive.
+ `maximumMatchDistance` – Optionally specify the maximum number of characters that can exist between the end of a keyword and the end of text that matches the regex pattern. If you're using the AWS CLI, use the `maximum-match-distance` parameter to specify this value.

  Macie includes an occurrence in results only if the text matches the regex pattern and the text is within this distance of a complete keyword. The distance can be 1–300 characters. The default distance is 50 characters.
+ `ignoreWords` – Optionally specify 1–10 character sequences (*ignore words*) to exclude from results. If you're using the AWS CLI, use the `ignore-words` parameter to specify these character sequences.

  Macie excludes an occurrence from results if the text matches the regex pattern but it contains one of these ignore words. Each ignore word can contain 4–90 UTF-8 characters. Ignore words are case sensitive.

To specify the severity of sensitive data findings that the custom data identifier produces, use the `severityLevels` parameter or, if you're using the AWS CLI, the `severity-levels` parameter:
+ To automatically assign the `MEDIUM` severity to all the findings, omit this parameter. Macie then uses the default setting. By default, Macie assigns the `MEDIUM` severity to a finding if the affected S3 object contains one or more occurrences of text that match the detection criteria.
+ To assign severity based on occurrences thresholds that you specify, specify the minimum number of matches that must exist in an S3 object to produce a finding with a specified severity.

  You can specify as many as three occurrences thresholds, one for each severity level that Macie supports: `LOW` (least severe), `MEDIUM`, or `HIGH` (most severe). If you specify more than one, the thresholds must be in ascending order by severity, moving from `LOW` to `HIGH`. If an S3 object contains fewer occurrences than the lowest threshold, Macie doesn't create a finding.

Use additional parameters to specify a name and other settings, such as tags, for the custom data identifier. Avoid including sensitive data in these settings. Other users of your account might be able to access these values, depending on the actions that they're allowed to perform in Macie.
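
Because you can't change a custom data identifier after you create it, it can help to check a request against the limits in this section before you send it. The following Python sketch is a hypothetical helper, not part of an AWS SDK. It builds a request body whose keys follow the parameter names listed above and raises `ValueError` when a value falls outside a stated limit. The minimum lengths for `name` and `regex`, and applying the console's 128-character name, 512-character description, and 50-tag limits to the API, are assumptions; Macie's own validation is authoritative.

```python
import json

SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def _require(condition, message):
    """Raise ValueError with message if condition is false."""
    if not condition:
        raise ValueError(message)


def build_cdi_request(name, regex, *, keywords=None, ignore_words=None,
                      maximum_match_distance=None, severity_levels=None,
                      description=None, tags=None):
    """Return a CreateCustomDataIdentifier-style request body (hypothetical helper)."""
    _require(1 <= len(name) <= 128, "name must have 1-128 characters")
    _require(1 <= len(regex) <= 512, "regex must have 1-512 characters")
    body = {"name": name, "regex": regex}
    if keywords is not None:
        _require(1 <= len(keywords) <= 50, "specify 1-50 keywords")
        _require(all(3 <= len(k) <= 90 for k in keywords),
                 "each keyword needs 3-90 characters")
        body["keywords"] = list(keywords)
    if ignore_words is not None:
        _require(1 <= len(ignore_words) <= 10, "specify 1-10 ignore words")
        _require(all(4 <= len(w) <= 90 for w in ignore_words),
                 "each ignore word needs 4-90 characters")
        body["ignoreWords"] = list(ignore_words)
    if maximum_match_distance is not None:
        _require(1 <= maximum_match_distance <= 300,
                 "maximum match distance must be 1-300")
        body["maximumMatchDistance"] = maximum_match_distance
    if severity_levels is not None:
        _require(1 <= len(severity_levels) <= 3, "specify 1-3 severity levels")
        ordered = sorted(severity_levels,
                         key=lambda level: SEVERITY_RANK[level["severity"]])
        thresholds = [level["occurrencesThreshold"] for level in ordered]
        _require(all(a < b for a, b in zip(thresholds, thresholds[1:])),
                 "thresholds must ascend from LOW to HIGH")
        body["severityLevels"] = ordered
    if description is not None:
        _require(len(description) <= 512, "description can have 512 characters at most")
        body["description"] = description
    if tags is not None:
        _require(len(tags) <= 50, "specify 50 tags at most")
        body["tags"] = dict(tags)
    return body


if __name__ == "__main__":
    request = build_cdi_request(
        "EmployeeIDs", r"[A-Z]-\d{8}",
        keywords=["employee", "employee ID"], maximum_match_distance=20,
    )
    print(json.dumps(request, indent=2))
```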

When you submit your request, Macie tests the settings and verifies that it can compile the regex. If there's an issue with a setting or the regex, the request fails and Macie returns a message that describes the issue. If the request succeeds, you receive output similar to the following:

```
{
    "customDataIdentifierId": "393950aa-82ea-4bdc-8f7b-e5be3example"
}
```

Where `customDataIdentifierId` specifies the unique identifier (ID) for the custom data identifier that was created.

To subsequently retrieve and review the settings for the custom data identifier, use the [GetCustomDataIdentifier](https://docs.aws.amazon.com/macie/latest/APIReference/custom-data-identifiers-id.html) operation or, if you’re using the AWS CLI, run the [get-custom-data-identifier](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-custom-data-identifier.html) command. For the `id` parameter, specify the custom data identifier's ID.

The following examples show how to use the AWS CLI to create a custom data identifier. The examples create a custom data identifier that's designed to detect employee IDs that use a specific syntax and are within proximity of a specified keyword. The examples also define custom severity settings for findings that the identifier produces.

This example is formatted for Linux, macOS, or Unix, and it uses the backslash (`\`) line-continuation character to improve readability.

```
$ aws macie2 create-custom-data-identifier \
--name "EmployeeIDs" \
--regex "[A-Z]-\d{8}" \
--keywords '["employee","employee ID"]' \
--maximum-match-distance 20 \
--severity-levels '[{"occurrencesThreshold":1,"severity":"LOW"},{"occurrencesThreshold":50,"severity":"MEDIUM"},{"occurrencesThreshold":100,"severity":"HIGH"}]' \
--description "Detects employee IDs in proximity of a keyword." \
--tags '{"Stack":"Production"}'
```

This example is formatted for Microsoft Windows, and it uses the caret (`^`) line-continuation character to improve readability.

```
C:\> aws macie2 create-custom-data-identifier ^
--name "EmployeeIDs" ^
--regex "[A-Z]-\d{8}" ^
--keywords "[\"employee\",\"employee ID\"]" ^
--maximum-match-distance 20 ^
--severity-levels "[{\"occurrencesThreshold\":1,\"severity\":\"LOW\"},{\"occurrencesThreshold\":50,\"severity\":\"MEDIUM\"},{\"occurrencesThreshold\":100,\"severity\":\"HIGH\"}]" ^
--description "Detects employee IDs in proximity of a keyword." ^
--tags={\"Stack\":\"Production\"}
```

Where:
+ `EmployeeIDs` is the name of the custom data identifier.
+ `[A-Z]-\d{8}` is the regex for the text pattern to match.
+ `employee` and `employee ID` are keywords that must be in proximity of text that matches the regex pattern.
+ `20` is the maximum number of characters that can exist between the end of a keyword and the end of text that matches the regex pattern.
+ `description` specifies a brief description of the custom data identifier.
+ `severity-levels` defines custom occurrences thresholds for the severity of findings that the custom data identifier produces: `LOW` for 1–49 occurrences; `MEDIUM` for 50–99 occurrences; and `HIGH` for 100 or more occurrences.
+ `Stack` is the tag key of the tag to assign to the custom data identifier. `Production` is the tag value for the specified tag key.
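
The JSON arguments in these examples differ only in shell quoting. As an illustration, the following Python sketch derives both forms from the same data: `shlex.quote` produces the single-quoted form for Linux, macOS, or Unix shells, and a simple quote-escaping function (a hypothetical helper) produces the form used in the Windows example. The Windows helper assumes that the values contain no backslashes or cmd.exe special characters such as `^` or `&`; it isn't a general-purpose escaping routine.

```python
import json
import shlex


def compact_json(value):
    """Serialize value as compact JSON, as in the CLI examples."""
    return json.dumps(value, separators=(",", ":"))


def posix_arg(value):
    """Quote compact JSON for a Linux, macOS, or Unix shell."""
    return shlex.quote(compact_json(value))


def windows_arg(value):
    """Wrap compact JSON in double quotes and escape its inner double quotes.

    Assumes the JSON contains no backslashes or cmd.exe special characters.
    """
    return '"' + compact_json(value).replace('"', '\\"') + '"'


keywords = ["employee", "employee ID"]
severity_levels = [
    {"occurrencesThreshold": 1, "severity": "LOW"},
    {"occurrencesThreshold": 50, "severity": "MEDIUM"},
    {"occurrencesThreshold": 100, "severity": "HIGH"},
]
print("--keywords", posix_arg(keywords))
print("--keywords", windows_arg(keywords))
print("--severity-levels", windows_arg(severity_levels))
```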

------

After you create the custom data identifier, you can [create and configure sensitive data discovery jobs](discovery-jobs-create.md) to use it, or [add it to your settings for automated sensitive data discovery](discovery-asdd-account-configure.md).

# Deleting a custom data identifier
<a name="cdis-delete"></a>

After you create a custom data identifier, you can delete it. If you do this, Amazon Macie soft deletes the custom data identifier. This means that a record of the custom data identifier remains for your account, but it’s marked as deleted. If a custom data identifier has this status, you can’t configure new sensitive data discovery jobs to use it or add it to your settings for automated sensitive data discovery. In addition, you can no longer access it by using the Amazon Macie console. You can, however, retrieve its settings by using the Amazon Macie API. If you delete a custom data identifier, it doesn’t count against the quota of custom data identifiers for your account.

If you configure a sensitive data discovery job to use a custom data identifier that you subsequently delete, the job will run as scheduled and continue to use the custom data identifier. This means that your job results, both sensitive data findings and sensitive data discovery results, will report text that matches the identifier's criteria. This helps ensure that you have an immutable history of sensitive data findings and discovery results for data privacy and protection audits or investigations that you perform.

Similarly, if you configure automated sensitive data discovery to use a custom data identifier that you subsequently delete, daily analysis cycles will proceed and continue to use the custom data identifier. This means that sensitive data findings, statistics, and other types of results will continue to report text that matches the identifier's criteria.

Before you delete a custom data identifier, do the following to prevent Macie from using it during subsequent analysis cycles and job runs:
+ Check your settings for automated sensitive data discovery. If you added the custom data identifier to these settings, remove it. For more information, see [Configuring settings for automated sensitive data discovery](discovery-asdd-account-configure.md).
+ Review your job inventory to identify jobs that use the custom data identifier and are scheduled to run in the future. If you want a job to stop using the custom data identifier, you can cancel the job. Then create a copy of the job, adjust the settings for the copy, and save the copy as a new job. For more information, see [Managing sensitive data discovery jobs](discovery-jobs-manage.md).
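
If you export job details for review, a filter like the following Python sketch can help you find jobs that still reference the custom data identifier. The dictionaries here are hypothetical records whose keys (`name`, `jobStatus`, and `customDataIdentifierIds`) are modeled on the job details that Macie reports; check the field names and status values against the Amazon Macie API Reference before you rely on them. Treating `CANCELLED` and `COMPLETE` jobs as unable to run again is also an assumption.

```python
# Statuses assumed to mean that a job won't run again.
FINISHED_STATUSES = {"CANCELLED", "COMPLETE"}


def jobs_to_review(jobs, custom_data_identifier_id):
    """Return names of unfinished jobs that use the custom data identifier."""
    return [
        job["name"]
        for job in jobs
        if custom_data_identifier_id in job.get("customDataIdentifierIds", [])
        and job.get("jobStatus") not in FINISHED_STATUSES
    ]


cdi_id = "393950aa-82ea-4bdc-8f7b-e5be3example"
jobs = [  # Hypothetical job records.
    {"name": "weekly-hr-scan", "jobStatus": "IDLE", "customDataIdentifierIds": [cdi_id]},
    {"name": "one-time-audit", "jobStatus": "COMPLETE", "customDataIdentifierIds": [cdi_id]},
    {"name": "finance-scan", "jobStatus": "RUNNING", "customDataIdentifierIds": []},
]
print(jobs_to_review(jobs, cdi_id))
```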

It's also a good idea to note the unique identifier (ID) that Macie assigned to the custom data identifier. You'll need this ID if you later want to review the custom data identifier's settings.

After you complete the preceding tasks, delete the custom data identifier.

**To delete a custom data identifier**  
You can delete a custom data identifier by using the Amazon Macie console or the Amazon Macie API.

------
#### [ Console ]

Follow these steps to delete a custom data identifier by using the Amazon Macie console.

**To delete a custom data identifier**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, under **Settings**, choose **Custom data identifiers**.

1. To note the unique identifier (ID) for the custom data identifier that you want to delete, choose the custom data identifier's name. On the page that appears, the **Id** box displays this ID. After you note the ID, choose **Custom data identifiers** in the navigation pane again.

1. On the **Custom data identifiers** page, select the checkbox for the custom data identifier to delete.

1. On the **Actions** menu, choose **Delete**.

1. When prompted for confirmation, choose **Ok**.

------
#### [ API ]

To delete a custom data identifier programmatically, use the [DeleteCustomDataIdentifier](https://docs.aws.amazon.com/macie/latest/APIReference/custom-data-identifiers-id.html) operation of the Amazon Macie API. Or, if you're using the AWS Command Line Interface (AWS CLI), run the [delete-custom-data-identifier](https://docs.aws.amazon.com/cli/latest/reference/macie2/delete-custom-data-identifier.html) command.

For the `id` parameter, specify the unique identifier (ID) for the custom data identifier that you want to delete. You can get this ID by using the [ListCustomDataIdentifiers](https://docs.aws.amazon.com/macie/latest/APIReference/custom-data-identifiers-list.html) operation. This operation retrieves a subset of information about the custom data identifiers for your account. If you're using the AWS CLI, you can run the [list-custom-data-identifiers](https://docs.aws.amazon.com/cli/latest/reference/macie2/list-custom-data-identifiers.html) command to retrieve this information.
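
If you know only the name of the custom data identifier, you can look up its ID in the output of `list-custom-data-identifiers`. The following Python sketch assumes that each page of output is a JSON object with an `items` array whose entries include `id` and `name` fields; verify this shape against the ListCustomDataIdentifiers reference. The helper and the sample output are hypothetical, and the helper accepts several pages in case the output is paginated.

```python
import json


def find_ids_by_name(pages, name):
    """Return the IDs of custom data identifiers whose name equals name."""
    return [
        item["id"]
        for page in pages
        for item in page.get("items", [])
        if item.get("name") == name
    ]


# Hypothetical output, for example saved from:
#   aws macie2 list-custom-data-identifiers > cdis.json
output = """
{
  "items": [
    {"id": "393950aa-82ea-4bdc-8f7b-e5be3example", "name": "EmployeeIDs"},
    {"id": "9c8d7e6f-1234-4abc-9def-0123example", "name": "CustomerAccounts"}
  ]
}
"""
print(find_ids_by_name([json.loads(output)], "EmployeeIDs"))
```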

The following example shows how to delete a custom data identifier by using the AWS CLI.

```
$ aws macie2 delete-custom-data-identifier --id 393950aa-82ea-4bdc-8f7b-e5be3example
```

Where *393950aa-82ea-4bdc-8f7b-e5be3example* is the ID for the custom data identifier to delete.

If the request succeeds, Macie returns an empty HTTP 200 response. Otherwise, Macie returns an HTTP 4*xx* or 500 response indicating why the request failed.

------

To review a custom data identifier's settings after you delete it, use the [GetCustomDataIdentifier](https://docs.aws.amazon.com/macie/latest/APIReference/custom-data-identifiers-id.html) operation of the Amazon Macie API. Or, if you're using the AWS CLI, run the [get-custom-data-identifier](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-custom-data-identifier.html) command. For the `id` parameter, specify the custom data identifier's ID. After you delete a custom data identifier, you can't access its settings by using the Amazon Macie console.

# Defining sensitive data exceptions with allow lists
<a name="allow-lists"></a>

With allow lists in Amazon Macie, you can define specific text and text patterns that you want Macie to ignore when it inspects Amazon Simple Storage Service (Amazon S3) objects for sensitive data. These are typically sensitive data exceptions for your particular scenarios or environment. If data matches text or a text pattern in an allow list, Macie doesn’t report the data. This is the case even if the data matches the criteria of a [managed data identifier](managed-data-identifiers.md) or a [custom data identifier](custom-data-identifiers.md). By using allow lists, you can refine your analysis of Amazon S3 data and reduce noise.

You can create and use two types of allow lists in Macie:
+ **Predefined text** – For this type of list, you specify certain character sequences to ignore. For example, you might specify the names of public representatives for your organization, specific phone numbers, or specific sample data that your organization uses for testing. If you use this type of list, Macie ignores text that exactly matches an entry in the list.

  This type of allow list is helpful if you want to specify words, phrases, and other kinds of character sequences that aren’t sensitive, aren’t likely to change, and don’t necessarily adhere to a common pattern.
+ **Regular expression** – For this type of list, you specify a regular expression (*regex*) that defines a text pattern to ignore. For example, you might specify the pattern for your organization's public phone numbers, email addresses for your organization’s domain, or patterned sample data that your organization uses for testing. If you use this type of list, Macie ignores text that completely matches the pattern defined by the list.

  This type of allow list is helpful if you want to specify text that isn’t sensitive but varies or is likely to change while also adhering to a common pattern.

After you create an allow list, you can [create and configure sensitive data discovery jobs](discovery-jobs-create.md) to use it, or [add it to your settings for automated sensitive data discovery](discovery-asdd-account-configure.md). Macie then uses the list when it analyzes data. If Macie finds text that matches an entry or pattern in an allow list, Macie doesn’t report that occurrence of text in sensitive data findings, statistics, and other types of results.

You can manage and use allow lists in all the AWS Regions where Macie is currently available except the Asia Pacific (Osaka) Region.

**Topics**
+ [Configuration options for allow lists](allow-lists-options.md)
+ [Creating an allow list](allow-lists-create.md)
+ [Checking the status of an allow list](allow-lists-status-check.md)
+ [Changing an allow list](allow-lists-change.md)
+ [Deleting an allow list](allow-lists-delete.md)

# Configuration options and requirements for allow lists
<a name="allow-lists-options"></a>

In Amazon Macie, you can use allow lists to specify text or text patterns that you want Macie to ignore when it inspects Amazon Simple Storage Service (Amazon S3) objects for sensitive data. Macie provides options for two types of allow lists: predefined text and regular expressions.

A list of predefined text is helpful if you want Macie to ignore specific words, phrases, and other kinds of character sequences that you don't consider sensitive. Examples are: the names of public representatives for your organization, specific phone numbers, or specific sample data that your organization uses for testing. If Macie finds text that matches the criteria of a managed or custom data identifier and the text also matches an entry in an allow list, Macie doesn't report that occurrence of text in sensitive data findings, statistics, and other types of results.

A regular expression (*regex*) is helpful if you want Macie to ignore text that varies or is likely to change while also adhering to a common pattern. The regex specifies a text pattern to ignore. Examples are: public phone numbers for your organization, email addresses for your organization's domain, or patterned sample data that your organization uses for testing. If Macie finds text that matches the criteria of a managed or custom data identifier and the text also matches a regex pattern in an allow list, Macie doesn't report that occurrence of text in sensitive data findings, statistics, and other types of results.

You can create and use both types of allow lists in all the AWS Regions where Macie is currently available except the Asia Pacific (Osaka) Region. As you create and manage allow lists, keep the following options and requirements in mind. Also note that list entries and regex patterns for mailing addresses aren't supported.

**Contents**
+ [Options and requirements for lists of predefined text](#allow-lists-options-s3list)
  + [Syntax requirements](#allow-lists-options-s3list-syntax)
  + [Storage requirements](#allow-lists-options-s3list-storage)
  + [Encryption/Decryption requirements](#allow-lists-options-s3list-encryption)
  + [Design considerations and recommendations](#allow-lists-options-s3list-notes)
+ [Options and requirements for regular expressions](#allow-lists-options-regex)
  + [Syntax support and recommendations](#allow-lists-options-regex-syntax)
  + [Examples](#allow-lists-options-regex-examples)

## Options and requirements for lists of predefined text
<a name="allow-lists-options-s3list"></a>

For this type of allow list, you provide a line-delimited plaintext file that lists specific character sequences to ignore. The list entries are typically words, phrases, and other kinds of character sequences that you don’t consider sensitive, aren’t likely to change, and don’t necessarily adhere to a specific pattern. If you use this type of list, Amazon Macie doesn't report occurrences of text that exactly match an entry in the list. Macie treats each list entry as a string literal value.

To use this type of allow list, start by creating the list in a text editor and saving it as a plaintext file. Then upload the list to an S3 general purpose bucket. Also ensure that the storage and encryption settings for the bucket and the object allow Macie to retrieve and decrypt the list. Then [create and configure settings for the list](allow-lists-create.md) in Macie.

After you configure the settings in Macie, we recommend that you test the allow list with a small, representative set of data for your account or organization. To test a list, you can [create a one-time job](discovery-jobs-create.md). Configure the job to use the list in addition to the managed and custom data identifiers that you typically use to analyze data. You can then review the job's results—sensitive data findings, sensitive data discovery results, or both. If the job's results differ from what you expect, you can change and test the list until the results are what you expect.

After you finish configuring and testing an allow list, you can create and configure additional jobs to use it, or add it to your settings for automated sensitive data discovery. When those jobs start to run or the next automated discovery analysis cycle starts, Macie retrieves the latest version of the list from Amazon S3 and stores it in temporary memory. Macie then uses this temporary copy of the list when it inspects S3 objects for sensitive data. When a job finishes running or the analysis cycle is complete, Macie permanently deletes its copy of the list from memory. The list doesn't persist in Macie. Only the list's settings persist in Macie.

**Important**  
Because lists of predefined text don't persist in Macie, it's important to [check the status of your allow lists](allow-lists-status-check.md) periodically. If Macie can’t retrieve or parse a list that you configured a job or automated discovery to use, Macie doesn’t use the list. This might produce unexpected results, such as sensitive data findings for text that you specified in the list.

**Topics**
+ [Syntax requirements](#allow-lists-options-s3list-syntax)
+ [Storage requirements](#allow-lists-options-s3list-storage)
+ [Encryption/Decryption requirements](#allow-lists-options-s3list-encryption)
+ [Design considerations and recommendations](#allow-lists-options-s3list-notes)

### Syntax requirements
<a name="allow-lists-options-s3list-syntax"></a>

When you create this type of allow list, note the following requirements for the list's file:
+ The list must be stored as a plaintext (`text/plain`) file, such as a .txt, .text, or .plain file.
+ The list must use line breaks to separate individual entries. For example:

  ```
  Akua Mansa
  John Doe
  Martha Rivera
  425-555-0100
  425-555-0101
  425-555-0102
  ```

  Macie treats each line as a single, distinct entry in the list. The file can also contain blank lines to improve readability. Macie skips blank lines when it parses the file.
+ Each entry can contain 1–90 UTF-8 characters.
+ Each entry must be a complete, exact match for the text to ignore. Macie doesn't support use of wildcard characters or partial values for entries. Macie treats each entry as a string literal value. Matches aren't case sensitive.
+ The file can contain 1–100,000 entries.
+ The total storage size of the file can't exceed 35 MB.
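Before you upload a list, you might want to check the file against these requirements locally. The following is a minimal sketch, not a Macie tool. It assumes that "35 MB" means 35 × 1024 × 1024 bytes and that whitespace-only lines count as blank lines. The function name `check_allow_list_text` is hypothetical.

```python
# Limits from the syntax requirements above. Treating "35 MB" as
# 35 * 1024 * 1024 bytes is an assumption.
MAX_ENTRY_CHARS = 90
MAX_ENTRIES = 100_000
MAX_FILE_BYTES = 35 * 1024 * 1024


def check_allow_list_text(data: bytes) -> list:
    """Return a list of problems found in the raw bytes of a predefined-text allow list."""
    problems = []
    if len(data) > MAX_FILE_BYTES:
        problems.append(f"file is {len(data)} bytes; limit is {MAX_FILE_BYTES}")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        return problems + [f"file isn't valid UTF-8: {err}"]

    entries = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Macie skips blank lines; treating whitespace-only lines as blank is an assumption.
        if not line.strip():
            continue
        entries += 1
        if len(line) > MAX_ENTRY_CHARS:
            problems.append(f"line {lineno}: {len(line)} characters; limit is {MAX_ENTRY_CHARS}")

    if entries == 0:
        problems.append("file contains no entries")
    elif entries > MAX_ENTRIES:
        problems.append(f"file contains {entries} entries; limit is {MAX_ENTRIES}")
    return problems


sample = "Akua Mansa\nJohn Doe\n\n425-555-0100\n".encode("utf-8")
print(check_allow_list_text(sample))  # []
```

Passing the check doesn't guarantee that Macie can parse the list. After you configure the list in Macie, check its status.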

### Storage requirements
<a name="allow-lists-options-s3list-storage"></a>

As you add and manage allow lists in Amazon S3, note the following storage requirements and recommendations:
+ **Regional support** – An allow list must be stored in a bucket that's in the same AWS Region as your Macie account. Macie can’t access an allow list if it’s stored in a different Region.
+ **Bucket ownership** – An allow list must be stored in a bucket that's owned by your AWS account. If you want other accounts to use the same allow list, consider creating an Amazon S3 replication rule to replicate the list to buckets that are owned by those accounts. For information about replicating S3 objects, see [Replicating objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication.html) in the *Amazon Simple Storage Service User Guide*.

  In addition, your AWS Identity and Access Management (IAM) identity must have read access to the bucket and object that store the list. Otherwise, you won't be allowed to create or update the list's settings or check the list's status by using Macie.
+ **Storage types and classes** – An allow list must be stored in a general purpose bucket, not a directory bucket. In addition, it must be stored using one of the following storage classes: Reduced Redundancy (RRS), S3 Glacier Instant Retrieval, S3 Intelligent-Tiering, S3 One Zone-IA, S3 Standard, or S3 Standard-IA.
+ **Bucket policies** – If you store an allow list in a bucket that has a restrictive bucket policy, ensure that the policy allows Macie to retrieve the list. To do this, you can add a condition for the Macie service-linked role to the bucket policy. For more information, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md).

  Also ensure that the policy allows your IAM identity to have read access to the bucket. Otherwise, you won't be allowed to create or update the list's settings or check the list's status by using Macie.
+ **Object paths** – If you store more than one allow list in Amazon S3, the object path for each list must be unique. In other words, each allow list must be stored separately in its own S3 object.
+ **Versioning** – When you add an allow list to a bucket, we recommend that you also enable versioning for the bucket. You can then use date and time values to correlate versions of the list with the results of sensitive data discovery jobs and automated sensitive data discovery cycles that use the list. This can help with data privacy and protection audits or investigations that you perform.
+ **Object Lock** – To prevent an allow list from being deleted or overwritten for a certain amount of time or indefinitely, you can enable Object Lock for the bucket that stores the list. Enabling this setting doesn’t prevent Macie from accessing the list. For information about this setting, see [Locking objects with Object Lock](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html) in the *Amazon Simple Storage Service User Guide*.
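To check the storage class requirement programmatically, you can inspect the object's metadata, for example the dictionary that boto3's `s3.head_object(Bucket=..., Key=...)` returns. The following sketch maps the supported classes to the `StorageClass` identifiers that the Amazon S3 API uses. It relies on the fact that HeadObject omits `StorageClass` for objects in S3 Standard. The function name `storage_class_ok` is hypothetical.

```python
# Storage classes listed in the storage requirements above, written as the
# StorageClass identifiers that the Amazon S3 API uses.
SUPPORTED_STORAGE_CLASSES = {
    "REDUCED_REDUNDANCY",   # Reduced Redundancy (RRS)
    "GLACIER_IR",           # S3 Glacier Instant Retrieval
    "INTELLIGENT_TIERING",  # S3 Intelligent-Tiering
    "ONEZONE_IA",           # S3 One Zone-IA
    "STANDARD",             # S3 Standard
    "STANDARD_IA",          # S3 Standard-IA
}


def storage_class_ok(head_object_response: dict) -> bool:
    """Return True if the object's storage class is one that Macie can read an allow list from."""
    # HeadObject omits StorageClass for S3 Standard objects, so treat a missing value as STANDARD.
    storage_class = head_object_response.get("StorageClass", "STANDARD")
    return storage_class in SUPPORTED_STORAGE_CLASSES


print(storage_class_ok({}))                            # True
print(storage_class_ok({"StorageClass": "GLACIER"}))   # False
```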

### Encryption/Decryption requirements
<a name="allow-lists-options-s3list-encryption"></a>

If you encrypt an allow list in Amazon S3, the permissions policy for the [Macie service-linked role](service-linked-roles.md) typically grants Macie the permissions that it needs to decrypt the list. However, this depends on the type of encryption that’s used:
+ If a list is encrypted using server-side encryption with an Amazon S3 managed key (SSE-S3), Macie can decrypt the list. The service-linked role for your Macie account grants Macie the permissions that it needs.
+ If a list is encrypted using server-side encryption with an AWS managed AWS KMS key (DSSE-KMS or SSE-KMS), Macie can decrypt the list. The service-linked role for your Macie account grants Macie the permissions that it needs.
+ If a list is encrypted using server-side encryption with a customer managed AWS KMS key (DSSE-KMS or SSE-KMS), Macie can decrypt the list only if you allow Macie to use the key. To learn how to do this, see [Allowing Macie to use a customer managed AWS KMS key](discovery-supported-encryption-types.md#discovery-supported-encryption-cmk-configuration).
**Note**  
You can encrypt a list with a customer managed AWS KMS key in an external key store. However, the key might then be slower and less reliable than a key that’s managed entirely within AWS KMS. If latency or an availability issue prevents Macie from decrypting the list, Macie doesn’t use the list when it analyzes S3 objects. This might produce unexpected results, such as sensitive data findings for text that you specified in the list. To reduce this risk, consider storing the list in an S3 bucket that’s configured to use the key as an S3 Bucket Key.  
For information about using KMS keys in external key stores, see [External key stores](https://docs.aws.amazon.com/kms/latest/developerguide/keystore-external.html) in the *AWS Key Management Service Developer Guide*. For information about using S3 Bucket Keys, see [Reducing the cost of SSE-KMS with Amazon S3 Bucket Keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-key.html) in the *Amazon Simple Storage Service User Guide*.
+ If a list is encrypted using server-side encryption with a customer-provided key (SSE-C) or client-side encryption, Macie can’t decrypt the list. Consider using SSE-S3, DSSE-KMS, or SSE-KMS encryption instead.
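The rules above amount to a small decision table. The following sketch encodes it under some assumptions. The inputs are the `ServerSideEncryption` and `SSECustomerAlgorithm` values that Amazon S3 reports for the object, and the `KeyManager` value (`AWS` or `CUSTOMER`) that AWS KMS reports for the key. Client-side encryption isn't visible in this metadata, so the sketch doesn't cover it. The function name and its `key_policy_allows_macie` flag are hypothetical.

```python
from typing import Optional


def macie_can_decrypt(
    server_side_encryption: Optional[str],
    kms_key_manager: Optional[str] = None,
    sse_customer_algorithm: Optional[str] = None,
    key_policy_allows_macie: bool = False,
) -> bool:
    """Rough decision table for whether Macie can decrypt an allow list object.

    server_side_encryption: "AES256" (SSE-S3), "aws:kms" (SSE-KMS), or
        "aws:kms:dsse" (DSSE-KMS), as reported by Amazon S3.
    kms_key_manager: "AWS" for an AWS managed key, "CUSTOMER" for a customer managed key.
    sse_customer_algorithm: set when the object uses SSE-C.
    key_policy_allows_macie: whether you've allowed Macie to use a customer managed key.
    """
    if sse_customer_algorithm:
        return False  # SSE-C: Macie can't decrypt the list
    if server_side_encryption in (None, "AES256"):
        return True  # not server-side encrypted, or SSE-S3
    if server_side_encryption in ("aws:kms", "aws:kms:dsse"):
        if kms_key_manager == "AWS":
            return True  # the service-linked role grants the needed permissions
        if kms_key_manager == "CUSTOMER":
            return key_policy_allows_macie
    return False
```

Remember that your own IAM identity must also be allowed to use a KMS key, which this sketch doesn't model.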

If a list is encrypted with an AWS managed KMS key or a customer managed KMS key, your AWS Identity and Access Management (IAM) identity must also be allowed to use the key. Otherwise, you won't be allowed to create or update the list's settings or check the list's status by using Macie. To learn how to check or change the permissions for a KMS key, see [Key policies in AWS KMS](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html) in the *AWS Key Management Service Developer Guide*.

For detailed information about encryption options for Amazon S3 data, see [Protecting data with encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingEncryption.html) in the *Amazon Simple Storage Service User Guide*.

### Design considerations and recommendations
<a name="allow-lists-options-s3list-notes"></a>

In general, Macie treats each entry in an allow list as a string literal value. That is to say, Macie ignores each occurrence of text that exactly matches a complete entry in an allow list. Matches aren't case sensitive.

However, Macie uses the entries as part of a larger data extraction and analysis framework. The framework includes machine learning and pattern matching functions that factor in dimensions such as grammatical and syntactical variations and, in many cases, keyword proximity. The framework also factors in an S3 object’s file type or storage format. Therefore, keep the following considerations and recommendations in mind as you add and manage the entries in an allow list.

**Prepare for different file types and storage formats**  
For unstructured data, such as text in an Adobe Portable Document Format (.pdf) file, Macie ignores text that exactly matches a complete entry in an allow list, including text that spans multiple lines or pages.  
For structured data, such as columnar data in a CSV file or record-based data in a JSON file, Macie ignores text that exactly matches a complete entry in an allow list if all the text is stored in a single field, cell, or array. This requirement doesn’t apply to structured data that’s stored in an otherwise unstructured file, such as a table in a .pdf file.  
For example, consider the following content in a CSV file:  

```
Name,Account ID
Akua Mansa,111111111111
John Doe,222222222222
```
If `Akua Mansa` and `John Doe` are entries in an allow list, Macie ignores those names in the CSV file. The complete text of each list entry is stored in a single `Name` field.  
Conversely, consider a CSV file that contains the following columns and fields:  

```
First Name,Last Name,Account ID
Akua,Mansa,111111111111
John,Doe,222222222222
```
If `Akua Mansa` and `John Doe` are entries in an allow list, Macie doesn’t ignore those names in the CSV file. None of the fields in the CSV file contain the complete text of an entry in the allow list.
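The following sketch illustrates the single-field requirement for the two CSV examples. It's a simplified model, not Macie's implementation: it only checks whether the complete text of an entry appears inside one field, using a case-insensitive comparison. The function name is hypothetical.

```python
import csv
import io


def entries_in_single_fields(csv_text: str, allow_list: list) -> list:
    """Return allow list entries whose complete text appears inside one CSV field.

    Simplified model: Macie's real analysis also involves detection by data
    identifiers, so this only illustrates the single-field requirement.
    """
    found = []
    for row in csv.reader(io.StringIO(csv_text)):
        for field in row:
            for entry in allow_list:
                # Matches aren't case sensitive, so compare casefolded text.
                if entry.casefold() in field.casefold() and entry not in found:
                    found.append(entry)
    return found


allow_list = ["Akua Mansa", "John Doe"]
one_name_column = "Name,Account ID\nAkua Mansa,111111111111\nJohn Doe,222222222222\n"
split_name_columns = "First Name,Last Name,Account ID\nAkua,Mansa,111111111111\nJohn,Doe,222222222222\n"

print(entries_in_single_fields(one_name_column, allow_list))     # ['Akua Mansa', 'John Doe']
print(entries_in_single_fields(split_name_columns, allow_list))  # []
```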

**Include common variations**  
Add entries for common variations of numeric data, proper nouns, terms, and alphanumeric character sequences. For example, if you add names or phrases that contain only one space between words, also add variations that include two spaces between words. Similarly, add words and phrases that do and don’t contain special characters, and consider including common syntactical and semantic variations.  
For the US phone number *425-555-0100*, for example, you might add these entries to an allow list:  

```
425-555-0100
425.555.0100
(425) 555-0100
+1-425-555-0100
```
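If you maintain many phone numbers, you might generate these variations instead of typing them. The following is a minimal sketch that produces the four formats shown above for one US number. The function name `us_phone_variants` is hypothetical.

```python
def us_phone_variants(area: str, exchange: str, line: str) -> list:
    """Return common formats of one US phone number, one allow list entry each."""
    return [
        f"{area}-{exchange}-{line}",
        f"{area}.{exchange}.{line}",
        f"({area}) {exchange}-{line}",
        f"+1-{area}-{exchange}-{line}",
    ]


# Write one entry per line, ready to save as a plaintext allow list.
print("\n".join(us_phone_variants("425", "555", "0100")))
```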
For the date *February 1, 2022* in a multinational context, you might add entries that include common syntactical variations for English and French, including variations that do and don't include special characters:  

```
February 1, 2022
1 février 2022
1 fevrier 2022
Feb 01, 2022
1 fév 2022
1 fev 2022
02/01/2022
01/02/2022
```
For names of people, include entries for various forms of a name that you don't consider sensitive. For example, include the first name followed by the last name; the last name followed by the first name; the first and last name separated by one space; the first and last name separated by two spaces; and nicknames.  
For the name *Martha Rivera*, for example, you might add:  

```
Martha Rivera
Martha  Rivera
Rivera, Martha
Rivera,  Martha
Rivera Martha
Rivera  Martha
```
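The following sketch generates the six name entries shown above from a first and last name. It doesn't produce nicknames, which you'd still add by hand. The function name `name_variants` is hypothetical.

```python
def name_variants(first: str, last: str) -> list:
    """Return first-last and last-first forms of a name with one or two spaces."""
    variants = []
    for gap in (" ", "  "):  # one space, then two spaces
        variants.append(f"{first}{gap}{last}")
    for gap in (" ", "  "):
        variants.append(f"{last},{gap}{first}")
    for gap in (" ", "  "):
        variants.append(f"{last}{gap}{first}")
    return variants


# Write one entry per line, ready to save as a plaintext allow list.
print("\n".join(name_variants("Martha", "Rivera")))
```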
If you want to ignore variations of a specific name that contains many parts, create an allow list that uses a regular expression instead. For example, for the name *Dr. Martha Lyda Rivera, PhD*, you might use the following regular expression: `^(Dr. )?Martha\s(Lyda|L\.)?\s?Rivera,?( PhD)?$`.
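Before you save a regex in Macie, you can try it against sample names. The following sketch uses Python's `re` module. Note that `re` isn't the PCRE library that Macie draws on, so treat it as a quick local check for this pattern, not as proof of how Macie evaluates it.

```python
import re

# Pattern from the example above.
PATTERN = re.compile(r"^(Dr. )?Martha\s(Lyda|L\.)?\s?Rivera,?( PhD)?$")

samples = {
    "Dr. Martha Lyda Rivera, PhD": True,
    "Dr. Martha Rivera, PhD": True,
    "Martha L. Rivera": True,
    "Martha Rivera": True,
    "Rivera, Martha": False,
}

for text, expected in samples.items():
    matched = PATTERN.search(text) is not None
    print(f"{text!r}: matched={matched} expected={expected}")
```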

## Options and requirements for regular expressions
<a name="allow-lists-options-regex"></a>

For this type of allow list, you specify a regular expression (*regex*) that defines a text pattern to ignore. For example, you might specify the pattern for your organization's public phone numbers, email addresses for your organization’s domain, or patterned sample data that your organization uses for testing. The regex defines a common pattern for a specific kind of data that you don’t consider sensitive. If you use this type of allow list, Amazon Macie doesn't report occurrences of text that completely match the specified pattern. Unlike an allow list that specifies predefined text to ignore, you create and store the regex and all other list settings in Macie.

When you create or update this type of allow list, you can test the list’s regex with sample data before you save the list. We recommend that you do this with multiple sets of sample data. If you create a regex that’s too general, Macie might ignore occurrences of text that you consider sensitive. If a regex is too specific, Macie might not ignore occurrences of text that you don’t consider sensitive. To protect against malformed or long-running expressions, Macie also compiles and tests the regex against a collection of sample text automatically, and notifies you of issues to address.

For additional testing, we recommend that you also test the list’s regex with a small, representative set of data for your account or organization. To do this, you can [create a one-time job](discovery-jobs-create.md). Configure the job to use the list in addition to the managed and custom data identifiers that you typically use to analyze data. You can then review the job's results—sensitive data findings, sensitive data discovery results, or both. If the job's results differ from what you expect, you can change and test the regex until the results are what you expect.

After you configure and test an allow list, you can create and configure additional jobs to use it, or add it to your settings for automated sensitive data discovery. When those jobs run or Macie performs automated discovery, Macie uses the latest version of the list's regex to analyze data.

**Topics**
+ [Syntax support and recommendations](#allow-lists-options-regex-syntax)
+ [Examples](#allow-lists-options-regex-examples)

### Syntax support and recommendations
<a name="allow-lists-options-regex-syntax"></a>

An allow list can specify a regular expression (*regex*) that contains as many as 512 characters. Macie supports a subset of the regex pattern syntax provided by the [Perl Compatible Regular Expressions (PCRE) library](https://www.pcre.org/). Of the constructs provided by the PCRE library, Macie doesn’t support the following pattern elements:
+ Backreferences
+ Capturing groups
+ Conditional patterns
+ Embedded code
+ Global pattern flags, such as `/i`, `/m`, and `/x`
+ Recursive patterns
+ Positive and negative look-behind and look-ahead zero-width assertions, such as `?=`, `?!`, `?<=`, and `?<!`

To create effective regex patterns for allow lists, note the following tips and recommendations:
+ **Anchors** – Use anchors (`^` or `$`) only if you expect the pattern to appear at the beginning or end of a file, not the beginning or end of a line.
+ **Bounded repeats** – For performance reasons, Macie limits the size of bounded repeat groups. For example, `\d{100,1000}` won’t compile in Macie. To approximate this functionality, you can use an open-ended repeat such as `\d{100,}`.
+ **Case insensitivity** – To make parts of a pattern case insensitive, you can use the `(?i)` construct instead of the `/i` flag.
+ **Performance** – There’s no need to optimize prefixes or alternations manually. For example, changing `/hello|hi|hey/` to `/h(?:ello|i|ey)/` won’t improve performance.
+ **Wildcards** – For performance reasons, Macie limits the number of repeated wildcards. For example, `a*b*a*` won’t compile in Macie.
+ **Alternation** – To specify more than one pattern in a single allow list, you can use the alternation operator (`|`) to concatenate the patterns. If you do this, Macie uses OR logic to combine the patterns and form a new pattern. For example, if you specify `(apple|orange)`, Macie recognizes both *apple* and *orange* as a match and ignores occurrences of both words. If you concatenate patterns, be sure to limit the overall length of the concatenated expression to 512 or fewer characters.
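You might catch some of these problems before Macie tests the regex. The following is a hypothetical, rough pre-check, not Macie's validator. It checks only the 512-character limit, look-ahead and look-behind assertions, backreferences, and trailing `/flags`, using simple text heuristics that can produce false positives. It doesn't check bounded repeats, wildcards, or the other listed constructs. It also compiles the pattern with Python's `re` module, which differs from PCRE in places.

```python
import re

MAX_REGEX_CHARS = 512

# Heuristic detectors for a few unsupported constructs (hypothetical, not exhaustive).
UNSUPPORTED = [
    (re.compile(r"\(\?<?[=!]"), "look-ahead or look-behind assertion"),
    (re.compile(r"\\[1-9]|\\k<"), "backreference"),
    (re.compile(r"^/.*/[a-z]+$", re.DOTALL), "global pattern flags"),
]


def precheck(pattern: str) -> list:
    """Return possible problems with a regex intended for a Macie allow list."""
    problems = []
    if len(pattern) > MAX_REGEX_CHARS:
        problems.append(f"pattern has {len(pattern)} characters; limit is {MAX_REGEX_CHARS}")
    for detector, label in UNSUPPORTED:
        if detector.search(pattern):
            problems.append(f"possible unsupported construct: {label}")
    try:
        re.compile(pattern)
    except re.error as err:
        problems.append(f"doesn't compile with Python re: {err}")
    return problems


print(precheck(r"(apple|orange)"))  # []
print(precheck(r"\d{3}(?=-)"))      # ['possible unsupported construct: look-ahead or look-behind assertion']
```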

Finally, when you develop the regex, design it to accommodate different file types and storage formats. Macie uses the regex as part of a larger data extraction and analysis framework. The framework factors in an S3 object’s file type or storage format. For structured data, such as columnar data in a CSV file or record-based data in a JSON file, Macie ignores text that completely matches the pattern only if all the text is stored in a single field, cell, or array. This requirement doesn’t apply to structured data that’s stored in an otherwise unstructured file, such as a table in an Adobe Portable Document Format (.pdf) file. For unstructured data, such as text in a .pdf file, Macie ignores text that completely matches the pattern, including text that spans multiple lines or pages.

### Examples
<a name="allow-lists-options-regex-examples"></a>

The following examples demonstrate valid regex patterns for some common scenarios.

**Email addresses**  
If you use a custom data identifier to detect email addresses, you can ignore email addresses that you don't consider sensitive, such as email addresses for your organization.  
To ignore email addresses for a particular second-level and top-level domain, you can use this pattern:  
`[a-zA-Z0-9_.+\\-]+@example\.com`  
Where *example* is the name of the second-level domain and *com* is the top-level domain. In this case, Macie matches and ignores addresses such as *johndoe@example.com* and *john.doe@example.com*.  
To ignore email addresses for a particular domain in any generic top-level domain (gTLD), such as *.com* or *.gov*, you can use this pattern:  
`[a-zA-Z0-9_.+\\-]+@example\.[a-zA-Z]{2,}`  
Where *example* is the name of the domain. In this case, Macie matches and ignores addresses such as *johndoe@example.com*, *john.doe@example.gov*, and *johndoe@example.edu*.  
To ignore email addresses for a particular domain in specific country code top-level domains (ccTLDs), such as *.ca* for Canada or *.au* for Australia, you can use this pattern:  
`[a-zA-Z0-9_.+\\-]+@example\.(ca|au)`  
Where *example* is the name of the domain and *ca* and *au* are specific ccTLDs to ignore. In this case, Macie matches and ignores addresses such as *johndoe@example.ca* and *john.doe@example.au*.  
To ignore email addresses that are for a particular domain and gTLD and include third- and fourth-level domains, you can use this pattern:  
`[a-zA-Z0-9_.+\\-]+@([a-zA-Z0-9-]+\.)?[a-zA-Z0-9-]+\.example\.com`  
Where *example* is the name of the domain and *com* is the gTLD. In this case, Macie matches and ignores addresses such as *johndoe@www.example.com* and *john.doe@www.team.example.com*.
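To confirm that each email pattern behaves as described, you can test it against the sample addresses. The following sketch uses Python's `re.fullmatch`, because Macie ignores text that completely matches a pattern. Python's `re` isn't PCRE, so treat the results as a local sanity check only. The addresses ending in *example.org* and the helper name `check_cases` are illustrative additions.

```python
import re

# Patterns from the examples above; "example" stands in for your domain.
CASES = {
    r"[a-zA-Z0-9_.+\\-]+@example\.com": {
        "johndoe@example.com": True,
        "john.doe@example.com": True,
        "johndoe@example.org": False,
    },
    r"[a-zA-Z0-9_.+\\-]+@example\.[a-zA-Z]{2,}": {
        "johndoe@example.com": True,
        "john.doe@example.gov": True,
        "johndoe@example.edu": True,
    },
    r"[a-zA-Z0-9_.+\\-]+@example\.(ca|au)": {
        "johndoe@example.ca": True,
        "john.doe@example.au": True,
        "johndoe@example.com": False,
    },
    r"[a-zA-Z0-9_.+\\-]+@([a-zA-Z0-9-]+\.)?[a-zA-Z0-9-]+\.example\.com": {
        "johndoe@www.example.com": True,
        "john.doe@www.team.example.com": True,
        "johndoe@example.com": False,
    },
}


def check_cases(cases: dict) -> list:
    """Return (pattern, text) pairs whose full-match result differs from the expectation."""
    failures = []
    for pattern, samples in cases.items():
        compiled = re.compile(pattern)
        for text, expected in samples.items():
            if (compiled.fullmatch(text) is not None) != expected:
                failures.append((pattern, text))
    return failures


print(check_cases(CASES))  # []
```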

**Phone numbers**  
Macie provides managed data identifiers that can detect phone numbers for several countries and regions. To ignore certain phone numbers, such as toll-free numbers or public phone numbers for your organization, you can use patterns such as the following.  
To ignore toll-free, US phone numbers that use the *800* area code and are formatted as *(800) ###-####*:  
`^\(?800\)?[ -]?\d{3}[ -]?\d{4}$`  
To ignore toll-free, US phone numbers that use the *888* area code and are formatted as *(888) ###-####*:  
`^\(?888\)?[ -]?\d{3}[ -]?\d{4}$`  
To ignore French phone numbers that include the *33* country code, drop the leading *0* of the 10-digit national number, and are formatted as *+33 # ## ## ## ##*:  
`^\+33 \d( \d\d){4}$`  
To ignore US and Canadian phone numbers that use particular area and exchange codes, don’t include a country code, and are formatted as *(###) ###-####*:  
`^\(?123\)?[ -]?555[ -]?\d{4}$`  
Where *123* is the area code and *555* is the exchange code.  
To ignore US and Canadian phone numbers that use particular area and exchange codes, include a country code, and are formatted as *+1 (###) ###-####*:  
`^\+1[ -]?\(?123\)?[ -]?555[ -]?\d{4}$`  
Where *123* is the area code and *555* is the exchange code.
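You can check the phone number patterns the same way. The following sketch tests several of the patterns above with Python's `re` module. Each sample is a standalone string, so `^` and `$` apply to the start and end of that string. In Macie, anchors refer to the beginning and end of a file. The non-matching samples are illustrative additions.

```python
import re

# Patterns from the phone number examples above.
CASES = {
    r"^\(?800\)?[ -]?\d{3}[ -]?\d{4}$": {
        "(800) 555-0100": True,
        "800-555-0100": True,
        "(801) 555-0100": False,
    },
    r"^\(?888\)?[ -]?\d{3}[ -]?\d{4}$": {
        "(888) 555-0199": True,
    },
    r"^\+33 \d( \d\d){4}$": {
        "+33 1 23 45 67 89": True,
        "+33 12 34 56 78 90": False,
    },
    r"^\(?123\)?[ -]?555[ -]?\d{4}$": {
        "(123) 555-0100": True,
        "123-555-0100": True,
        "(123) 556-0100": False,
    },
}

for pattern, samples in CASES.items():
    compiled = re.compile(pattern)
    for text, expected in samples.items():
        matched = compiled.search(text) is not None
        status = "ok  " if matched == expected else "FAIL"
        print(f"{status} {pattern} {text!r}")
```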

# Creating an allow list
<a name="allow-lists-create"></a>

In Amazon Macie, an allow list defines specific text or a text pattern that you want Macie to ignore when it inspects Amazon Simple Storage Service (Amazon S3) objects for sensitive data. If text matches an entry or pattern in an allow list, Macie doesn’t report the text in sensitive data findings, statistics, or other types of results. This is the case even if the text matches the criteria of a [managed data identifier](managed-data-identifiers.md) or a [custom data identifier](custom-data-identifiers.md).

You can create the following types of allow lists in Macie.

**Predefined text**  
Use this type of list to specify words, phrases, and other kinds of character sequences that aren’t sensitive, aren’t likely to change, and don’t necessarily adhere to a common pattern. Examples are: the names of public representatives for your organization, specific phone numbers, and specific sample data that your organization uses for testing. If you use this type of list, Macie ignores text that exactly matches an entry in the list.  
For this type of list, you create a line-delimited plaintext file that lists specific text to ignore. You then store the file in an S3 bucket and configure settings for Macie to access the list in the bucket. You can then create and configure sensitive data discovery jobs to use the list, or add the list to your settings for automated sensitive data discovery. When each job starts to run or the next automated discovery analysis cycle starts, Macie retrieves the latest version of the list from Amazon S3. Macie then uses that version of the list when it inspects S3 objects for sensitive data. If Macie finds text that exactly matches an entry in the list, Macie doesn't report that occurrence of text as sensitive data.

**Regular expression**  
Use this type of list to specify a regular expression (*regex*) that defines a text pattern to ignore. Examples are: public phone numbers for your organization, email addresses for your organization’s domain, and patterned sample data that your organization uses for testing. If you use this type of list, Macie ignores text that completely matches the regex pattern defined by the list.  
For this type of list, you create a regex that defines a common pattern for text that isn't sensitive but varies or is likely to change. Unlike a list of predefined text, you create and store the regex and all other list settings in Macie. You can then create and configure sensitive data discovery jobs to use the list, or add the list to your settings for automated sensitive data discovery. When those jobs run or Macie performs automated discovery, Macie uses the latest version of the list's regex to analyze data. If Macie finds text that completely matches the pattern defined by the list, Macie doesn't report that occurrence of text as sensitive data.

For detailed requirements, recommendations, and examples of each type, see [Configuration options and requirements for allow lists](allow-lists-options.md).

You can create as many as 10 allow lists in each supported AWS Region: up to five allow lists that specify predefined text, and up to five allow lists that specify regular expressions. You can create and use allow lists in all the AWS Regions where Macie is currently available except the Asia Pacific (Osaka) Region.

**To create an allow list**  
How you create an allow list depends on the type of list that you want to create: a file that lists predefined text to ignore, or a regular expression that defines a text pattern to ignore. The following sections provide instructions for each type. Choose the section for the type of list that you want to create.



## Predefined text
<a name="allow-lists-create-s3list"></a>

Before you create this type of allow list in Macie, do the following:

1. By using a text editor, create a line-delimited plaintext file that lists specific text to ignore—for example, a .txt, .text, or .plain file. For more information, see [Syntax requirements](allow-lists-options.md#allow-lists-options-s3list-syntax).

1. Upload the file to an S3 general purpose bucket and note the name of the bucket and the object. You'll need to enter these names when you configure the settings in Macie.

1. Ensure that the settings for the S3 bucket and object allow you and Macie to retrieve the list from the bucket. For more information, see [Storage requirements](allow-lists-options.md#allow-lists-options-s3list-storage).

1. If you encrypted the S3 object, ensure that it's encrypted with a key that you and Macie are allowed to use. For more information, see [Encryption/Decryption requirements](allow-lists-options.md#allow-lists-options-s3list-encryption).

After you complete these tasks, you're ready to configure the list's settings in Macie. You can configure the settings by using the Amazon Macie console or the Amazon Macie API. 

------
#### [ Console ]

Follow these steps to configure the settings for an allow list by using the Amazon Macie console.

**To configure allow list settings in Macie**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, under **Settings**, choose **Allow lists**.

1. On the **Allow lists** page, choose **Create**.

1. Under **Select a list type**, choose **Predefined text**.

1. Under **List settings**, use the following options to enter additional settings for the allow list:
   + For **Name**, enter a name for the list. The name can contain as many as 128 characters.
   + For **Description**, optionally enter a brief description of the list. The description can contain as many as 512 characters.
   + For **S3 bucket name**, enter the name of the bucket that stores the list.

     In Amazon S3, you can find this value in the **Name** field of the bucket's properties. This value is case sensitive. In addition, don't use wildcard characters or partial values when you enter the name.
   + For **S3 object name**, enter the name of the S3 object that stores the list.

     In Amazon S3, you can find this value in the **Key** field of the object's properties. If the name includes a path, be sure to include the complete path when you enter the name, for example **allowlists/macie/mylist.txt**. This value is case sensitive. In addition, don't use wildcard characters or partial values when you enter the name.

1. (Optional) Under **Tags**, choose **Add tag**, and then enter as many as 50 tags to assign to the allow list.

   A *tag* is a label that you define and assign to certain types of AWS resources. Each tag consists of a required tag key and an optional tag value. Tags can help you identify, categorize, and manage resources in different ways, such as by purpose, owner, environment, or other criteria. To learn more, see [Tagging Macie resources](tagging-resources.md).

1. When you finish, choose **Create**.

Macie tests the list's settings. Macie also verifies that it can retrieve the list from Amazon S3 and parse the list's content. If an error occurs, Macie displays a message that describes the error. For detailed information that can help you troubleshoot the error, see [Options and requirements for lists of predefined text](allow-lists-options.md#allow-lists-options-s3list). After you address any errors, you can save the list's settings.

------
#### [ API ]

To configure allow list settings programmatically, use the [CreateAllowList](https://docs.aws.amazon.com/macie/latest/APIReference/allow-lists.html) operation of the Amazon Macie API and specify the appropriate values for the required parameters.

For the `criteria` parameter, use an `s3WordsList` object to specify the name of the S3 bucket (`bucketName`) and the name of the S3 object (`objectKey`) that stores the list. To determine the bucket name, refer to the `Name` field in Amazon S3. To determine the object name, refer to the `Key` field in Amazon S3. Note that these values are case sensitive. In addition, don't use wildcard characters or partial values when you specify these names.
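If you build requests in code, the following Python sketch shows one way to assemble these parameters. The helper `s3_words_list_request` is hypothetical. It checks the name and description lengths that the console enforces. It passes the bucket and object names through unchanged because they must match the values in Amazon S3 exactly. Assuming boto3 is installed and configured, you could pass the result to `create_allow_list` on a `macie2` client.

```python
import json


def s3_words_list_request(name, bucket_name, object_key, description=None):
    """Build keyword arguments for a CreateAllowList request for a list stored in S3.

    Bucket and object names are passed through unchanged because Macie expects the
    exact, case-sensitive values (no wildcards or partial names).
    """
    if not 1 <= len(name) <= 128:
        raise ValueError("name must contain 1-128 characters")
    if description is not None and len(description) > 512:
        raise ValueError("description can contain at most 512 characters")

    request = {
        "name": name,
        "criteria": {
            "s3WordsList": {"bucketName": bucket_name, "objectKey": object_key}
        },
    }
    if description is not None:
        request["description"] = description
    return request


if __name__ == "__main__":
    req = s3_words_list_request(
        "my_allow_list",
        "amzn-s3-demo-bucket",
        "allowlists/macie/mylist.txt",
        "Lists public phone numbers and names for Example Corp.",
    )
    # Compact JSON, in the same form as the --criteria value in the CLI examples.
    print(json.dumps(req["criteria"], separators=(",", ":")))
    # With boto3 (assumed installed and configured):
    # boto3.client("macie2").create_allow_list(**req)
```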

To configure the settings by using the AWS CLI, run the [create-allow-list](https://docs.aws.amazon.com/cli/latest/reference/macie2/create-allow-list.html) command and specify the appropriate values for the required parameters. The following examples show how to configure the settings for an allow list that's stored in an S3 bucket named *amzn-s3-demo-bucket*. The name of the S3 object that stores the list is *allowlists/macie/mylist.txt*.

This example is formatted for Linux, macOS, or Unix, and it uses the backslash (`\`) line-continuation character to improve readability.

```
$ aws macie2 create-allow-list \
--criteria '{"s3WordsList":{"bucketName":"amzn-s3-demo-bucket","objectKey":"allowlists/macie/mylist.txt"}}' \
--name my_allow_list \
--description "Lists public phone numbers and names for Example Corp."
```

This example is formatted for Microsoft Windows and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws macie2 create-allow-list ^
--criteria={\"s3WordsList\":{\"bucketName\":\"amzn-s3-demo-bucket\",\"objectKey\":\"allowlists/macie/mylist.txt\"}} ^
--name my_allow_list ^
--description "Lists public phone numbers and names for Example Corp."
```

When you submit your request, Macie tests the list's settings. Macie also verifies that it can retrieve the list from Amazon S3 and parse the list's content. If an error occurs, your request fails and Macie returns a message that describes the error. For detailed information that can help you troubleshoot the error, see [Options and requirements for lists of predefined text](allow-lists-options.md#allow-lists-options-s3list).

If Macie can retrieve and parse the list, your request succeeds and you receive output similar to the following.

```
{
    "arn": "arn:aws:macie2:us-west-2:123456789012:allow-list/nkr81bmtu2542yyexample",
    "id": "nkr81bmtu2542yyexample"
}
```

Where `arn` is the Amazon Resource Name (ARN) of the allow list that was created, and `id` is the unique identifier for the list.
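You need the `id` value later to check, change, or delete the list. The following Python sketch uses a hypothetical helper to parse the response. It confirms that the ARN and the `id` refer to the same list, assuming the ARN format shown in the preceding output.

```python
import json


def parse_allow_list_response(raw):
    """Return the Region, account ID, and list ID from a CreateAllowList response."""
    response = json.loads(raw)
    # Expected form: arn:<partition>:macie2:<region>:<account-id>:allow-list/<id>
    _, _partition, service, region, account, resource = response["arn"].split(":", 5)
    resource_type, list_id = resource.split("/", 1)
    if service != "macie2" or resource_type != "allow-list":
        raise ValueError("not a Macie allow list ARN")
    if list_id != response["id"]:
        raise ValueError("arn and id refer to different lists")
    return {"region": region, "account": account, "id": list_id}
```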

------

After you save the list's settings, you can [create and configure sensitive data discovery jobs](discovery-jobs-create.md) to use the list, or [add the list to your settings for automated sensitive data discovery](discovery-asdd-account-configure.md). Each time those jobs start to run or an automated discovery analysis cycle starts, Macie retrieves the latest version of the list from Amazon S3. Macie then uses that version of the list when it analyzes data.

## Regular expression
<a name="allow-lists-create-regex"></a>

When you create an allow list that specifies a regular expression (*regex*), you define the regex and all other list settings directly in Macie. For the regex, Macie supports a subset of the pattern syntax provided by the [Perl Compatible Regular Expressions (PCRE) library](https://www.pcre.org/). For more information, see [Syntax support and recommendations](allow-lists-options.md#allow-lists-options-regex-syntax). 

You can create this type of list by using the Amazon Macie console or the Amazon Macie API. 

------
#### [ Console ]

Follow these steps to create an allow list by using the Amazon Macie console.

**To create an allow list by using the console**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, under **Settings**, choose **Allow lists**.

1. On the **Allow lists** page, choose **Create**.

1. Under **Select a list type**, choose **Regular expression**.

1. Under **List settings**, use the following options to enter additional settings for the allow list:
   + For **Name**, enter a name for the list. The name can contain as many as 128 characters.
   + For **Description**, optionally enter a brief description of the list. The description can contain as many as 512 characters.
   + For **Regular expression**, enter the regex that defines the text pattern to ignore. The regex can contain as many as 512 characters.

1. (Optional) For **Evaluate**, enter up to 1,000 characters in the **Sample data** box, and then choose **Test** to test the regex. Macie evaluates the sample data and reports the number of occurrences of text that match the regex. You can repeat this step as many times as you like to refine and optimize the regex.

   **Note**  
   We recommend that you test and refine the regex with multiple sets of sample data. If you create a regex that’s too general, Macie might ignore occurrences of text that you consider sensitive. If a regex is too specific, Macie might not ignore occurrences of text that you don’t consider sensitive.

1. (Optional) Under **Tags**, choose **Add tag**, and then enter as many as 50 tags to assign to the allow list.

   A *tag* is a label that you define and assign to certain types of AWS resources. Each tag consists of a required tag key and an optional tag value. Tags can help you identify, categorize, and manage resources in different ways, such as by purpose, owner, environment, or other criteria. To learn more, see [Tagging Macie resources](tagging-resources.md).

1. When you finish, choose **Create**.

Macie tests the list's settings. Macie also tests the regex to verify that it can compile the expression. If an error occurs, Macie displays a message that describes the error. For detailed information that can help you troubleshoot the error, see [Options and requirements for regular expressions](allow-lists-options.md#allow-lists-options-regex). After you address any errors, you can save the allow list.

------
#### [ API ]

Before you create this type of allow list in Macie, we recommend that you test and refine the regex with multiple sets of sample data. If you create a regex that’s too general, Macie might ignore occurrences of text that you consider sensitive. If a regex is too specific, Macie might not ignore occurrences of text that you don’t consider sensitive.
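For a quick first pass before you test in Macie, you could check candidate samples locally. The following Python sketch uses a hypothetical helper, `prescreen_regex`. Keep these limitations in mind:

+ The sketch uses the standard `re` module, which isn't PCRE. A pattern that behaves well here can still behave differently in Macie, or fail to compile there.
+ The sketch assumes substring matching (`re.search`).

```python
import re


def prescreen_regex(pattern, should_ignore, should_keep):
    """Flag samples that a candidate allow-list regex handles unexpectedly.

    Python's re module isn't PCRE, and substring matching (re.search) is an
    assumption, so treat a clean result only as a first pass before testing in Macie.
    Returns (too_specific, too_general):
      too_specific: samples you want ignored that the regex doesn't match
      too_general: samples you consider sensitive that the regex does match
    """
    compiled = re.compile(pattern)
    too_specific = [s for s in should_ignore if not compiled.search(s)]
    too_general = [s for s in should_keep if compiled.search(s)]
    return too_specific, too_general
```

For example, an unescaped `.` matches any character. As a result, `prescreen_regex("[a-z]@example.com", [], ["j@exampleXcom"])` reports `j@exampleXcom` as too general.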

To test an expression with Macie, you can use the [TestCustomDataIdentifier](https://docs.aws.amazon.com/macie/latest/APIReference/custom-data-identifiers-test.html) operation of the Amazon Macie API or, for the AWS CLI, run the [test-custom-data-identifier](https://docs.aws.amazon.com/cli/latest/reference/macie2/test-custom-data-identifier.html) command. Macie uses the same underlying code to compile expressions for allow lists and custom data identifiers. If you test an expression in this way, be sure to specify values only for the `regex` and `sampleText` parameters. Otherwise, you'll receive inaccurate results.

When you're ready to create this type of allow list, use the [CreateAllowList](https://docs.aws.amazon.com/macie/latest/APIReference/allow-lists.html) operation of the Amazon Macie API and specify the appropriate values for the required parameters. For the `criteria` parameter, use the `regex` field to specify the regular expression that defines the text pattern to ignore. The expression can contain as many as 512 characters.

To create this type of list by using the AWS CLI, run the [create-allow-list](https://docs.aws.amazon.com/cli/latest/reference/macie2/create-allow-list.html) command and specify the appropriate values for the required parameters. The following examples create an allow list named *my\_allow\_list*. The regex is designed to ignore all email addresses that a custom data identifier might otherwise detect for the `example.com` domain.

This example is formatted for Linux, macOS, or Unix, and it uses the backslash (`\`) line-continuation character to improve readability.

```
$ aws macie2 create-allow-list \
--criteria '{"regex":"[a-z]@example.com"}' \
--name my_allow_list \
--description "Ignores all email addresses for Example Corp."
```

This example is formatted for Microsoft Windows and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws macie2 create-allow-list ^
--criteria={\"regex\":\"[a-z]@example.com\"} ^
--name my_allow_list ^
--description "Ignores all email addresses for Example Corp."
```
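The two `--criteria` forms differ only in quoting. The following Python sketch uses a hypothetical helper to render both forms from one regex and to enforce the 512-character limit. It handles only simple patterns:

+ The Windows form assumes that the regex contains no backslashes, spaces, or `cmd` metacharacters such as `^`, `&`, `|`, `<`, or `>`.
+ The Linux form assumes that the regex contains no single quotes.

```python
import json


def criteria_arguments(regex):
    """Render the --criteria argument for a regex allow list in both shell styles.

    Linux/macOS/Unix form assumes the regex has no single quotes. Windows form
    assumes no backslashes, spaces, or cmd metacharacters (^ & | < >).
    """
    if not 1 <= len(regex) <= 512:
        raise ValueError("the regex must contain 1-512 characters")
    compact = json.dumps({"regex": regex}, separators=(",", ":"))
    unix_form = "--criteria '" + compact + "'"
    # cmd.exe form used in the Windows examples: escape each double quote.
    windows_form = "--criteria=" + compact.replace('"', '\\"')
    return unix_form, windows_form
```

For `[a-z]@example.com`, the helper returns the same two `--criteria` values as the preceding examples.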

When you submit your request, Macie tests the list's settings. Macie also tests the regex to verify that it can compile the expression. If an error occurs, the request fails and Macie returns a message that describes the error. For detailed information that can help you troubleshoot the error, see [Options and requirements for regular expressions](allow-lists-options.md#allow-lists-options-regex).

If Macie can compile the expression, the request succeeds and you receive output similar to the following:

```
{
    "arn": "arn:aws:macie2:us-west-2:123456789012:allow-list/km2d4y22hp6rv05example",
    "id": "km2d4y22hp6rv05example"
}
```

Where `arn` is the Amazon Resource Name (ARN) of the allow list that was created, and `id` is the unique identifier for the list.

------

After you save the list, you can [create and configure sensitive data discovery jobs](discovery-jobs-create.md) to use it, or [add it to your settings for automated sensitive data discovery](discovery-asdd-account-configure.md). When those jobs run or Macie performs automated discovery, Macie uses the latest version of the list's regex to analyze data.

# Checking the status of an allow list
<a name="allow-lists-status-check"></a>

If you create an allow list, it's important to check its status periodically. Otherwise, errors might cause Amazon Macie to produce unexpected analysis results for your Amazon Simple Storage Service (Amazon S3) data. For example, Macie might create sensitive data findings for text that you specified in an allow list.

If you configure a sensitive data discovery job to use an allow list and Macie can't access or use the list when the job starts to run, the job continues to run. However, Macie doesn't use the list when it analyzes S3 objects. Similarly, if an analysis cycle starts for automated sensitive data discovery and Macie can't access or use a specified allow list, the analysis continues but Macie doesn't use the list.

Errors are unlikely to occur for an allow list that specifies a regular expression (*regex*). This is partly because Macie automatically tests the regex when you create or update the list's settings. In addition, you store the regex and all other list settings in Macie.

However, errors can occur for an allow list that specifies predefined text, partly because you store the list in Amazon S3 instead of Macie. Common causes of errors are:
+ The S3 bucket or object is deleted.
+ The S3 bucket or object is renamed and the list's settings in Macie don't specify the new name.
+ The S3 bucket's permissions settings are changed and Macie loses access to the bucket and the object.
+ The encryption settings for the S3 bucket are changed and Macie can't decrypt the object that stores the list.
+ The policy for the encryption key is changed and Macie loses access to the key, so Macie can't decrypt the S3 object that stores the list.

**Important**  
Because these errors affect your analyses' results, we recommend that you check the status of all of your allow lists periodically. We recommend that you also do this if you change the permissions or encryption settings for an S3 bucket that stores an allow list, or you change the policy for an AWS Key Management Service (AWS KMS) key that's used to encrypt a list.

For detailed information that can help you troubleshoot errors that occur, see [Options and requirements for lists of predefined text](allow-lists-options.md#allow-lists-options-s3list).

**To check the status of an allow list**  
You can check the status of an allow list by using the Amazon Macie console or the Amazon Macie API. On the console, you can use a single page to check the status of all of your allow lists at the same time. If you use the Amazon Macie API, you can check the status of individual allow lists, one at a time.



------
#### [ Console ]

Follow these steps to check the status of your allow lists by using the Amazon Macie console.

**To check the status of your allow lists**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, under **Settings**, choose **Allow lists**.

1. On the **Allow lists** page, choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)). Macie tests the settings for all of your allow lists and updates the **Status** field to indicate the current status of each list.

   If a list specifies a regular expression, its status is typically **OK**. This means that Macie can compile the expression. If a list specifies predefined text, its status can be any of the following values.

      
**OK**  
Macie can retrieve and parse the contents of the list.  
**Access denied**  
Macie isn't allowed to access the S3 object that stores the list. Amazon S3 denied the request to retrieve the object. A list can also have this status if the object is encrypted with a customer managed AWS KMS key that Macie isn't allowed to use.   
To address this error, review the bucket policy and other permissions settings for the bucket and the object. Ensure that Macie is allowed to access and retrieve the object. If the object is encrypted with a customer managed AWS KMS key, also review the key policy and ensure that Macie is allowed to use the key.   
**Error**  
A transient or internal error occurred when Macie attempted to retrieve or parse the contents of the list. An allow list can also have this status if it's encrypted with an encryption key that Amazon S3 and Macie can't access or use.  
To address this error, wait a few minutes and then choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) again. If the status continues to be **Error**, check the encryption settings for the S3 object. Ensure that the object is encrypted with a key that Amazon S3 and Macie can access and use.  
**Object is empty**  
Macie can retrieve the list from Amazon S3 but the list doesn't contain any content.  
To address this error, download the object from Amazon S3 and ensure that it contains the correct entries. If the entries are correct, review the list's settings in Macie. Ensure that the specified bucket and object names are correct.  
**Object not found**  
The list doesn't exist in Amazon S3.  
To address this error, review the list's settings in Macie. Ensure that the specified bucket and object names are correct.  
**Quota exceeded**  
Macie can access the list in Amazon S3. However, the number of entries in the list or the storage size of the list exceeds the quota for an allow list.  
To address this error, break the list into multiple files. Ensure that each file contains fewer than 100,000 entries. Also ensure that the size of each file is less than 35 MB. Then, upload each file to Amazon S3. When you finish, configure allow list settings in Macie for each file. You can have as many as five lists of predefined text in each supported AWS Region.  
**Throttled**  
Amazon S3 throttled the request to retrieve the list.  
To address this error, wait a few minutes and then choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) again.  
**User access denied**  
Amazon S3 denied the request to retrieve the object. If the specified object exists, you're not allowed to access it or it's encrypted with an AWS KMS key that you're not allowed to use.  
To address this error, work with your AWS administrator to ensure that the list's settings specify the correct bucket and object names, and you have read access to the bucket and the object. If the object is encrypted, also ensure that it's encrypted with a key that you're allowed to use.

1. To review the settings and status of a specific list, choose the list's name.
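If you need to split a list, the following Python sketch groups entries into files that each stay within the limits described for **Quota exceeded**: fewer than 100,000 entries and less than 35 MB per file. It raises an error if the result would need more than five lists. The sketch makes these assumptions:

+ Each line contains one entry.
+ The files use UTF-8 encoding.
+ 1 MB equals 1,024 × 1,024 bytes.

The function name is hypothetical.

```python
MAX_ENTRIES = 100_000          # each file needs fewer entries than this
MAX_BYTES = 35 * 1024 * 1024   # and must be smaller than this (1 MB = 1024 * 1024 bytes, assumed)
MAX_LISTS_PER_REGION = 5


def split_entries(entries, max_entries=MAX_ENTRIES - 1, max_bytes=MAX_BYTES - 1):
    """Group entries into chunks that each stay within the entry and size limits."""
    chunks, current, current_bytes = [], [], 0
    for entry in entries:
        size = len(entry.encode("utf-8")) + 1  # +1 for the trailing newline
        if size > max_bytes:
            raise ValueError("a single entry exceeds the size limit")
        # Start a new chunk when adding this entry would break either limit.
        if current and (len(current) >= max_entries or current_bytes + size > max_bytes):
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(entry)
        current_bytes += size
    if current:
        chunks.append(current)
    if len(chunks) > MAX_LISTS_PER_REGION:
        raise ValueError(
            f"needs {len(chunks)} files, but a Region supports at most "
            f"{MAX_LISTS_PER_REGION} lists of predefined text"
        )
    return chunks
```

You could then write each chunk to its own file, one entry per line, and upload each file to Amazon S3.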

------
#### [ API ]

To check the status of an allow list programmatically, use the [GetAllowList](https://docs.aws.amazon.com/macie/latest/APIReference/allow-lists-id.html) operation of the Amazon Macie API. Or, if you're using the AWS CLI, run the [get-allow-list](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-allow-list.html) command.

For the `id` parameter, specify the unique identifier for the allow list whose status you want to check. To get this identifier, you can use the [ListAllowLists](https://docs.aws.amazon.com/macie/latest/APIReference/allow-lists.html) operation. The **ListAllowLists** operation retrieves information about all the allow lists for your account. If you're using the AWS CLI, you can run the [list-allow-lists](https://docs.aws.amazon.com/cli/latest/reference/macie2/list-allow-lists.html) command to retrieve this information.
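To check every list without using the console, you could combine these two operations. In the following Python sketch, `client` is assumed to be a boto3 Macie client, and the function name is hypothetical. Each **GetAllowList** request makes Macie retest a list's settings, so the loop sends one request per list.

```python
def allow_list_statuses(client):
    """Return {list_id: {"name", "code", "description"}} for every allow list in the Region.

    `client` is assumed to be a boto3 Macie client, for example boto3.client("macie2").
    """
    statuses = {}
    kwargs = {}
    while True:
        page = client.list_allow_lists(**kwargs)
        for summary in page.get("allowLists", []):
            # Each GetAllowList request makes Macie retest that list's settings.
            status = client.get_allow_list(id=summary["id"])["status"]
            statuses[summary["id"]] = {
                "name": summary.get("name"),
                "code": status["code"],
                "description": status.get("description"),
            }
        token = page.get("nextToken")
        if not token:
            return statuses
        kwargs = {"nextToken": token}
```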

When you submit a **GetAllowList** request, Macie tests all the settings for the allow list. If the settings specify a regular expression (`regex`), Macie verifies that it can compile the expression. If the settings specify a list of predefined text (`s3WordsList`), Macie verifies that it can retrieve and parse the list.

Macie then returns a `GetAllowListResponse` object that provides the details of the allow list. In the `GetAllowListResponse` object, the `status` object indicates the current status of the list: a status code (`code`) and, depending on the status code, a brief description of the list's status (`description`).

If the allow list specifies a regex, the status code is typically `OK` and there isn't an associated description. This means that Macie compiled the expression successfully.

If the allow list specifies predefined text, the status code varies depending on the test results:
+ If Macie retrieved and parsed the list successfully, the status code is `OK` and there isn't an associated description.
+ If an error prevented Macie from retrieving or parsing the list, the status code and description indicate the nature of the error that occurred. 

For a list of possible status codes and a description of each one, see [AllowListStatus](https://docs.aws.amazon.com/macie/latest/APIReference/allow-lists-id.html#allow-lists-id-model-allowliststatus) in the *Amazon Macie API Reference*.
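The following Python sketch maps status codes to the labels and remedies described for the console. The code names in `CONSOLE_LABELS` are assumptions based on the AllowListStatus reference, and the mapping is illustrative. Confirm both against that reference before you rely on them.

```python
# Assumed mapping from AllowListStatus codes to the console labels described in
# this topic; confirm the code names in the Amazon Macie API Reference.
CONSOLE_LABELS = {
    "OK": "OK",
    "S3_OBJECT_ACCESS_DENIED": "Access denied",
    "UNKNOWN_ERROR": "Error",
    "S3_OBJECT_EMPTY": "Object is empty",
    "S3_OBJECT_NOT_FOUND": "Object not found",
    "S3_OBJECT_OVERSIZE": "Quota exceeded",
    "S3_THROTTLED": "Throttled",
    "S3_USER_ACCESS_DENIED": "User access denied",
}

# The console guidance for these two statuses is to wait a few minutes and retry.
RETRY_LATER = {"UNKNOWN_ERROR", "S3_THROTTLED"}


def triage(status):
    """Return (console_label, suggested_action) for a GetAllowList `status` object."""
    code = status["code"]
    label = CONSOLE_LABELS.get(code, code)  # fall back to the raw code if unknown
    if code == "OK":
        action = "none"
    elif code in RETRY_LATER:
        action = "wait a few minutes, then check again"
    else:
        action = "review the S3 object, its permissions, and the list's settings"
    return label, action
```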

------

# Changing an allow list
<a name="allow-lists-change"></a>

After you create an allow list, you can change most of the list's settings in Amazon Macie. For example, you can change the list's name and description. You can also add and edit tags for the list. The only setting that you can't change is a list's type. For example, if an existing list specifies a regular expression (*regex*), you can't change its type to predefined text.

If an allow list specifies predefined text, you can also change the entries in the list. To do this, update the file that contains the entries, and then upload the new version of the file to Amazon Simple Storage Service (Amazon S3). The next time Macie prepares to use the list, Macie retrieves the latest version of the file from Amazon S3. When you upload the new file, store it in the same S3 bucket with the same object name (key). If you change the name of the bucket or object, update the list's settings in Macie to specify the new names.

**To change the settings for an allow list**  
You can change the settings for an allow list by using the Amazon Macie console or the Amazon Macie API.



------
#### [ Console ]

Follow these steps to change an allow list's settings by using the Amazon Macie console.

**To change an allow list's settings by using the console**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, under **Settings**, choose **Allow lists**.

1. On the **Allow lists** page, choose the name of the allow list that you want to change. The allow list page opens and displays the current settings for the list.

1. To add or edit tags for the allow list, choose **Manage tags** in the **Tags** section. Then change the tags as necessary. When you finish, choose **Save**.

1. To change other settings for the allow list, choose **Edit** in the **List settings** section. Then change the settings that you want:
   + **Name** – Enter a new name for the list. The name can contain as many as 128 characters.
   + **Description** – Enter a new description of the list. The description can contain as many as 512 characters.
   + If the allow list specifies predefined text:
     + **S3 bucket name** – Enter the name of the bucket that stores the list.

       In Amazon S3, you can find this value in the **Name** field of the bucket's properties. This value is case sensitive. In addition, don't use wildcard characters or partial values when you enter the name.
     + **S3 object name** – Enter the name of the S3 object that stores the list.

       In Amazon S3, you can find this value in the **Key** field of the object's properties. If the name includes a path, be sure to include the complete path when you enter the name, for example **allowlists/macie/mylist.txt**. This value is case sensitive. In addition, don't use wildcard characters or partial values when you enter the name.
   + If the allow list specifies a regular expression (*regex*), enter a new regex in the **Regular expression** box. The regex can contain as many as 512 characters.

     After you enter the new regex, optionally test it. To do this, enter up to 1,000 characters in the **Sample data** box, and then choose **Test**. Macie evaluates the sample data and reports the number of occurrences of text that match the regex. You can repeat this step as many times as you like to refine and optimize the regex before you save your changes.

1. When you finish, choose **Save**.

Macie tests the list's settings. For a list of predefined text, Macie also verifies that it can retrieve the list from Amazon S3 and parse the list's content. For a regex, Macie also verifies that it can compile the expression. If an error occurs, Macie displays a message that describes the error. For detailed information that can help you troubleshoot the error, see [Configuration options and requirements for allow lists](allow-lists-options.md). After you address any errors, you can save your changes.

------
#### [ API ]

To change an allow list's settings programmatically, use the [UpdateAllowList](https://docs.aws.amazon.com/macie/latest/APIReference/allow-lists-id.html) operation of the Amazon Macie API. Or, if you're using the AWS CLI, run the [update-allow-list](https://docs.aws.amazon.com/cli/latest/reference/macie2/update-allow-list.html) command. In your request, use the supported parameters to specify a new value for each setting that you want to change. Note that the `criteria`, `id`, and `name` parameters are required. If you don't want to change the value for a required parameter, specify the current value for the parameter. 
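Because `criteria`, `id`, and `name` are required, a script that changes only one setting has to send the current values of the others. The following Python sketch uses a hypothetical helper, `update_allow_list_settings`. It reads the current values with `get_allow_list` and then applies your changes. It also rejects a change to the list's type, which Macie doesn't support. It assumes that `client` is a boto3 Macie client.

```python
ALLOWED_CHANGES = {"name", "description", "criteria"}


def update_allow_list_settings(client, list_id, **changes):
    """Change selected settings, re-sending current values for required parameters.

    `client` is assumed to be a boto3 Macie client, for example boto3.client("macie2").
    """
    unknown = set(changes) - ALLOWED_CHANGES
    if unknown:
        raise TypeError(f"unsupported settings: {sorted(unknown)}")

    current = client.get_allow_list(id=list_id)
    # A list's type (regex or s3WordsList) can't be changed.
    if "criteria" in changes and set(changes["criteria"]) != set(current["criteria"]):
        raise ValueError("can't change the type of an existing allow list")

    request = {
        "id": list_id,
        "name": changes.get("name", current["name"]),
        "criteria": changes.get("criteria", current["criteria"]),
    }
    description = changes.get("description", current.get("description"))
    if description is not None:
        request["description"] = description
    return client.update_allow_list(**request)
```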

For example, the following command changes the name and description of an existing allow list. The example is formatted for Microsoft Windows and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws macie2 update-allow-list ^
--id km2d4y22hp6rv05example ^
--name my_allow_list-email ^
--criteria={\"regex\":\"[a-z]@example.com\"} ^
--description "Ignores all email addresses for the example.com domain"
```

Where:
+ *km2d4y22hp6rv05example* is the unique identifier for the list.
+ *my\_allow\_list-email* is the new name for the list.
+ *[a-z]@example.com* is the list's criteria, a regular expression.
+ *Ignores all email addresses for the example.com domain* is the new description for the list.

When you submit your request, Macie tests the list's settings. If the list specifies predefined text (`s3WordsList`), this includes verifying that Macie can retrieve the list from Amazon S3 and parse the list's content. If the list specifies a regex (`regex`), this includes verifying that Macie can compile the expression.

If an error occurs when Macie tests the settings, your request fails and Macie returns a message that describes the error. For detailed information that can help you troubleshoot the error, see [Configuration options and requirements for allow lists](allow-lists-options.md). If the request fails for another reason, Macie returns an HTTP 4*xx* or 500 response that indicates why the operation failed.

If your request succeeds, Macie updates the list's settings and you receive output similar to the following.

```
{
    "arn": "arn:aws:macie2:us-west-2:123456789012:allow-list/km2d4y22hp6rv05example",
    "id": "km2d4y22hp6rv05example"
}
```

Where `arn` is the Amazon Resource Name (ARN) of the allow list that was updated, and `id` is the unique identifier for the list.

------

# Deleting an allow list
<a name="allow-lists-delete"></a>

When you delete an allow list in Amazon Macie, you permanently delete all the list's settings. These settings can't be recovered after they're deleted. If the settings specify a list of predefined text that you store in Amazon Simple Storage Service (Amazon S3), Macie doesn't delete the S3 object that stores the list. Only the settings in Macie are deleted.

If you configure sensitive data discovery jobs to use an allow list that you subsequently delete, the jobs will run as scheduled. However, your job results, both sensitive data findings and sensitive data discovery results, might report text that you previously specified in the allow list. Similarly, if you configure automated sensitive data discovery to use a list that you subsequently delete, daily analysis cycles will proceed. However, sensitive data findings, statistics, and other types of results might report text that you previously specified in the allow list.

Before you delete an allow list, we recommend that you [review your job inventory](discovery-jobs-manage-view.md) to identify jobs that use the list and are scheduled to run in the future. In the inventory, the details panel indicates whether a job is configured to use any allow lists and, if so, which ones. We recommend that you also [check your settings for automated sensitive data discovery](discovery-asdd-account-configure.md). You might determine that it's best to change a list instead of deleting it.

As an additional safeguard, Macie checks the settings for all of your jobs when you try to delete an allow list. If you configured jobs to use the list and any of those jobs have a status other than **Complete** or **Cancelled**, Macie doesn't delete the list unless you provide additional confirmation.
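To run a similar check yourself before you delete a list, you could use the following Python sketch. It lists your jobs and skips any job whose status is `COMPLETE` or `CANCELLED`. It then reports the remaining jobs whose `allowListIds` include the list. The sketch makes these assumptions:

+ `client` is a boto3 Macie client.
+ Job details include an `allowListIds` field.

The function name is hypothetical.

```python
INACTIVE_STATUSES = {"COMPLETE", "CANCELLED"}


def jobs_using_allow_list(client, allow_list_id):
    """Return IDs of jobs that aren't COMPLETE or CANCELLED and use the allow list.

    `client` is assumed to be a boto3 Macie client, for example boto3.client("macie2").
    """
    job_ids = []
    kwargs = {}
    while True:
        page = client.list_classification_jobs(**kwargs)
        for job in page.get("items", []):
            if job.get("jobStatus") in INACTIVE_STATUSES:
                continue
            details = client.describe_classification_job(jobId=job["jobId"])
            if allow_list_id in details.get("allowListIds", []):
                job_ids.append(job["jobId"])
        token = page.get("nextToken")
        if not token:
            return job_ids
        kwargs = {"nextToken": token}
```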

**To delete an allow list**  
You can delete an allow list by using the Amazon Macie console or the Amazon Macie API.

 

------
#### [ Console ]

Follow these steps to delete an allow list by using the Amazon Macie console.

**To delete an allow list by using the console**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, under **Settings**, choose **Allow lists**.

1. On the **Allow lists** page, select the checkbox for the allow list that you want to delete.

1. On the **Actions** menu, choose **Delete**.

1. When prompted for confirmation, enter **delete**, and then choose **Delete**.

------
#### [ API ]

To delete an allow list programmatically, use the [DeleteAllowList](https://docs.aws.amazon.com/macie/latest/APIReference/allow-lists-id.html) operation of the Amazon Macie API. For the `id` parameter, specify the unique identifier for the allow list to delete. You can get this identifier by using the [ListAllowLists](https://docs.aws.amazon.com/macie/latest/APIReference/allow-lists.html) operation. The **ListAllowLists** operation retrieves information about all the allow lists for your account. If you're using the AWS CLI, you can run the [list-allow-lists](https://docs.aws.amazon.com/cli/latest/reference/macie2/list-allow-lists.html) command to retrieve this information.

For the `ignoreJobChecks` parameter, specify whether to force deletion of the list, even if sensitive data discovery jobs are configured to use the list:
+ If you specify `false`, Macie checks the settings for all of your jobs that have a status other than `COMPLETE` or `CANCELLED`. If none of those jobs are configured to use the list, Macie deletes the list permanently. If any of those jobs are configured to use the list, Macie rejects your request and returns an HTTP 400 (`ValidationException`) error. The error message reports the number of applicable jobs, up to 200.
+ If you specify `true`, Macie deletes the list permanently without checking the settings for any of your jobs. 

To delete an allow list by using the AWS CLI, run the [delete-allow-list](https://docs.aws.amazon.com/cli/latest/reference/macie2/delete-allow-list.html) command. For example:

```
C:\> aws macie2 delete-allow-list --id nkr81bmtu2542yyexample --ignore-job-checks false
```

Where *nkr81bmtu2542yyexample* is the unique identifier for the allow list to delete.

If your request succeeds, Macie returns an empty HTTP 200 response. Otherwise, Macie returns an HTTP 4*xx* or 500 response that indicates why the operation failed.

------

If the allow list specified predefined text, you can optionally delete the S3 object that stores the list. However, keeping this object can help ensure that you have an immutable history of sensitive data findings and discovery results for data privacy and protection audits or investigations.

# Performing automated sensitive data discovery
<a name="discovery-asdd"></a>

For broad visibility into where sensitive data might reside in your Amazon Simple Storage Service (Amazon S3) data estate, configure Amazon Macie to perform automated sensitive data discovery for your account or organization. With automated sensitive data discovery, Macie continually evaluates your S3 bucket inventory and uses sampling techniques to identify and select representative S3 objects in your buckets. Macie then retrieves and analyzes the selected objects, inspecting them for sensitive data.

By default, Macie selects and analyzes objects from all of your S3 general purpose buckets. If you're the Macie administrator for an organization, this includes objects in buckets that your member accounts own. You can adjust the scope of the analyses by excluding specific buckets. For example, you might exclude buckets that typically store AWS logging data. If you're a Macie administrator, an additional option is to enable or disable automated sensitive data discovery for individual accounts in your organization on a case-by-case basis.

You can tailor the analyses to focus on specific types of sensitive data. By default, Macie analyzes S3 objects by using the set of managed data identifiers that we recommend for automated sensitive data discovery. To tailor the analyses, you can configure Macie to use specific [managed data identifiers](managed-data-identifiers.md) that Macie provides, [custom data identifiers](custom-data-identifiers.md) that you define, or a combination of the two. You can also refine the analyses by configuring Macie to use [allow lists](allow-lists.md) that you specify.

As the analysis progresses each day, Macie produces records of the sensitive data that it finds and the analysis that it performs: *sensitive data findings*, which report sensitive data that Macie finds in individual S3 objects, and *sensitive data discovery results*, which log details about the analysis of individual S3 objects. Macie also updates statistics, inventory data, and other information that it provides about your Amazon S3 data. For example, an interactive heat map on the console provides a visual representation of data sensitivity across your data estate:

![The S3 buckets map. It shows different colored squares, one for each S3 bucket, grouped by account.](http://docs.aws.amazon.com/macie/latest/user/images/scrn-s3-map-small.png)

These features are designed to help you evaluate data sensitivity across your Amazon S3 data estate, and drill down to investigate and assess individual accounts, buckets, and objects. They can also help you determine where to perform deeper, more immediate analysis by [running sensitive data discovery jobs](discovery-jobs.md). When you combine these features with information that Macie provides about the security and privacy of your Amazon S3 data, you can also identify cases where immediate remediation might be necessary, such as a publicly accessible bucket in which Macie found sensitive data.

To configure and manage automated sensitive data discovery, you must be the Macie administrator for an organization or have a standalone Macie account.

**Topics**
+ [How automated sensitive data discovery works](discovery-asdd-how-it-works.md)
+ [Configuring automated sensitive data discovery](discovery-asdd-account-manage.md)
+ [Reviewing automated sensitive data discovery results](discovery-asdd-results-s3.md)
+ [Assessing automated sensitive data discovery coverage](discovery-coverage.md)
+ [Adjusting sensitivity scores for S3 buckets](discovery-asdd-s3bucket-manage.md)
+ [Sensitivity scoring for S3 buckets](discovery-scoring-s3.md)
+ [Default settings for automated sensitive data discovery](discovery-asdd-settings-defaults.md)

# How automated sensitive data discovery works
<a name="discovery-asdd-how-it-works"></a>

When you enable Amazon Macie for your AWS account, Macie creates an AWS Identity and Access Management (IAM) [service-linked role](service-linked-roles.md) for your account in the current AWS Region. The permissions policy for this role allows Macie to call other AWS services and monitor AWS resources on your behalf. By using this role, Macie generates and maintains an inventory of your Amazon Simple Storage Service (Amazon S3) general purpose buckets in the Region. The inventory includes information about each of your S3 buckets and objects in the buckets. If you're the Macie administrator for an organization, your inventory includes information about buckets that your member accounts own. For more information, see [Managing multiple accounts](macie-accounts.md).

If you enable automated sensitive data discovery, Macie evaluates your inventory data on a daily basis to identify S3 objects that are eligible for automated discovery. As part of the evaluation, Macie also selects a sampling of representative objects to analyze. Macie then retrieves and analyzes the latest version of each selected object, inspecting it for sensitive data.

As the analysis progresses each day, Macie updates statistics, inventory data, and other information that it provides about your Amazon S3 data. Macie also produces records of the sensitive data it finds and the analysis that it performs. The resulting data provides insight into where Macie found sensitive data in your Amazon S3 data estate, which can span all the S3 general purpose buckets for your account. The data can help you assess the security and privacy of your Amazon S3 data, determine where to perform a deeper investigation, and identify cases where remediation is necessary.

To configure and manage automated sensitive data discovery, you must be the Macie administrator for an organization or have a standalone Macie account. If your account is part of an organization, only the Macie administrator for your organization can enable or disable automated discovery for accounts in the organization. In addition, only the Macie administrator can configure and manage automated discovery settings for the accounts. This includes settings that define the scope and nature of the analyses that Macie performs. If you have a member account in an organization, contact your Macie administrator to learn about the settings for your account and organization.

**Topics**
+ [Key components](#discovery-asdd-how-it-works-components)
+ [Considerations](#discovery-asdd-how-it-works-considerations)

## Key components
<a name="discovery-asdd-how-it-works-components"></a>

Amazon Macie uses a combination of features and techniques to perform automated sensitive data discovery. These work together with features that Macie provides to help you [monitor your Amazon S3 data for security and access control](monitoring-s3-how-it-works.md).

**Selecting S3 objects to analyze**  
On a daily basis, Macie evaluates your Amazon S3 inventory data to identify S3 objects that are eligible for analysis by automated sensitive data discovery. If you're the Macie administrator for an organization, by default the evaluation includes data for S3 buckets that your member accounts own.  
As part of the evaluation, Macie uses sampling techniques to select representative S3 objects to analyze. The techniques define groups of objects that have similar metadata and are likely to have similar content. The groups are based on dimensions such as bucket name, prefix, storage class, file name extension, and last modified date. Macie then selects a representative set of samples from each group, retrieves the latest version of each selected object from Amazon S3, and analyzes each selected object to determine whether the object contains sensitive data. When the analysis is complete, Macie discards its copy of the object.  
The sampling strategy prioritizes distributed analyses. In general, it takes a breadth-first approach to your Amazon S3 data estate. Each day, Macie selects a representative set of S3 objects from as many of your general purpose buckets as possible, based on the total storage size of all the classifiable objects in your Amazon S3 data estate. For example, if Macie has already analyzed and found sensitive data in objects in one bucket and hasn't yet analyzed objects in another bucket, the latter bucket is a higher priority for analysis. With this approach, you gain broad insight into the sensitivity of your Amazon S3 data more quickly. Depending on the size of your data estate, analysis results can begin to appear within 48 hours.  
The sampling strategy also prioritizes analysis of different kinds of S3 objects and objects that were recently created or changed. Any single object sample isn’t guaranteed to be conclusive. Therefore, analysis of a diverse set of objects can yield better insight into the types and amount of sensitive data that an S3 bucket might contain. In addition, prioritizing new or recently changed objects helps the analysis adapt to changes to your bucket inventory. For example, if objects are created or changed after a previous analysis, those objects are a higher priority for subsequent analysis. Conversely, if an object was previously analyzed and hasn't changed since that analysis, Macie doesn't analyze the object again. This approach helps you establish sensitivity baselines for individual S3 buckets. Then, as continual, incremental analyses progress for your account, your sensitivity assessments of individual buckets can become increasingly deep and detailed at a predictable rate.
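Macie doesn't publish the details of its sampling algorithm. The following toy sketch is not Macie's implementation; it only illustrates the two ideas described above: grouping objects that share metadata, and spreading each day's selections across buckets before going deeper into any one bucket. The `S3Object` type, the grouping key, and the daily `budget` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class S3Object:
    bucket: str
    key: str
    storage_class: str
    last_modified: str  # ISO date, for example "2024-05-01"


def group_key(obj):
    # Hypothetical grouping dimensions: bucket, top-level prefix,
    # storage class, file name extension, and last-modified month.
    prefix = obj.key.split("/", 1)[0] if "/" in obj.key else ""
    name = obj.key.rsplit("/", 1)[-1]
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return (obj.bucket, prefix, obj.storage_class, ext, obj.last_modified[:7])


def pick_daily_sample(objects, analyzed, budget):
    """Pick up to `budget` objects, one per group, rotating across buckets."""
    groups = defaultdict(list)
    for obj in objects:
        # Skip objects that were already analyzed and haven't changed since.
        if obj not in analyzed:
            groups[group_key(obj)].append(obj)
    # Within each group, prefer the most recently modified object.
    for members in groups.values():
        members.sort(key=lambda o: o.last_modified, reverse=True)
    # Collect one representative per group, queued by bucket.
    per_bucket = defaultdict(list)
    for key, members in sorted(groups.items()):
        per_bucket[key[0]].append(members[0])
    # Breadth first: take one representative from each bucket per round.
    picked = []
    for round_ in zip_longest(*per_bucket.values()):
        for obj in round_:
            if obj is not None and len(picked) < budget:
                picked.append(obj)
    return picked
```

In this toy model, an object that changes gets a new `last_modified` value, so it no longer matches its entry in `analyzed` and becomes eligible again, which loosely mirrors the prioritization of new and changed objects.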

**Defining the scope of the analyses**  
By default, Macie includes all the S3 general purpose buckets for your account when it evaluates your inventory data and selects S3 objects to analyze. If you're the Macie administrator for an organization, this includes buckets that your member accounts own.  
You can adjust the scope of the analyses by excluding specific S3 buckets from automated sensitive data discovery. For example, you might want to exclude buckets that typically store AWS logging data, such as AWS CloudTrail event logs. To exclude a bucket, you can change the automated discovery settings for your account or the bucket. If you do this, Macie starts excluding the bucket when the next daily evaluation and analysis cycle starts. You can exclude as many as 1,000 buckets from analyses. If you exclude an S3 bucket, you can include it again later. To do this, change the settings for your account or the bucket again. Macie then starts including the bucket when the next daily evaluation and analysis cycle starts.  
If you're the Macie administrator for an organization, you can also enable or disable automated sensitive data discovery for individual accounts in your organization. If you disable automated discovery for an account, Macie excludes all the S3 buckets that the account owns. If you subsequently re-enable automated discovery for the account, Macie starts including the buckets again.
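The following minimal sketch shows how the exclusion list and per-account status might be managed with the AWS SDK for Python (Boto3). It assumes that `client` is a `boto3.client("macie2")` client for the administrator or standalone account in the current Region, and that the account has a single classification scope for automated discovery. The helper function names are hypothetical, and the 1,000-bucket check only mirrors the limit stated above for a single request.

```python
MAX_EXCLUDED_BUCKETS = 1000  # Limit stated in this guide.


def automated_discovery_scope_id(client):
    # Assumes a single classification scope per account for automated discovery.
    scopes = client.list_classification_scopes()["classificationScopes"]
    return scopes[0]["id"]


def exclude_buckets(client, bucket_names):
    """Add S3 buckets to the exclusion list for automated discovery."""
    names = list(dict.fromkeys(bucket_names))  # Drop duplicates, keep order.
    if len(names) > MAX_EXCLUDED_BUCKETS:
        raise ValueError("You can exclude at most 1,000 buckets.")
    client.update_classification_scope(
        id=automated_discovery_scope_id(client),
        # "ADD" appends to the existing exclusion list.
        s3={"excludes": {"bucketNames": names, "operation": "ADD"}},
    )
    return names


def include_buckets(client, bucket_names):
    """Remove S3 buckets from the exclusion list so Macie analyzes them again."""
    client.update_classification_scope(
        id=automated_discovery_scope_id(client),
        s3={"excludes": {"bucketNames": list(bucket_names), "operation": "REMOVE"}},
    )


def set_member_status(client, account_ids, enabled):
    """Enable or disable automated discovery for member accounts (administrator only)."""
    status = "ENABLED" if enabled else "DISABLED"
    response = client.batch_update_automated_discovery_accounts(
        accounts=[{"accountId": a, "status": status} for a in account_ids]
    )
    # Accounts that couldn't be updated are reported in the errors list.
    return [e["accountId"] for e in response.get("errors", [])]
```

As described above, changes to the exclusion list take effect when the next daily evaluation and analysis cycle starts.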

**Determining which types of sensitive data to detect and report**  
By default, Macie inspects S3 objects by using the set of managed data identifiers that we recommend for automated sensitive data discovery. For a list of these managed data identifiers, see [Default settings for automated sensitive data discovery](discovery-asdd-settings-defaults.md).  
You can tailor the analyses to focus on specific types of sensitive data. To do this, change your automated discovery settings in any of the following ways:  
+ **Add or remove managed data identifiers** – A *managed data identifier* is a set of built-in criteria and techniques that are designed to detect a specific type of sensitive data, such as credit card numbers, AWS secret access keys, or passport numbers for a particular country or region. For more information, see [Using managed data identifiers](managed-data-identifiers.md).
+ **Add or remove custom data identifiers** – A *custom data identifier* is a set of criteria that you define to detect sensitive data. With custom data identifiers, you can detect sensitive data that reflects your organization's particular scenarios, intellectual property, or proprietary data. For example, you can detect employee IDs, customer account numbers, or internal data classifications. For more information, see [Building custom data identifiers](custom-data-identifiers.md).
+ **Add or remove allow lists** – In Macie, an allow list specifies text or a text pattern that you want Macie to ignore in S3 objects. These are typically sensitive data exceptions for your particular scenarios or environment, such as public names or phone numbers for your organization, or sample data that your organization uses for testing. For more information, see [Defining sensitive data exceptions with allow lists](allow-lists.md).
If you change a setting, Macie applies your change when the next daily analysis cycle starts. If you're the Macie administrator for an organization, Macie uses the settings for your account when it analyzes S3 objects for other accounts in your organization.  
You can also configure bucket-level settings that determine whether specific types of sensitive data are included in assessments of a bucket's sensitivity. To learn how, see [Adjusting sensitivity scores for S3 buckets](discovery-asdd-s3bucket-manage.md).
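As a minimal sketch, the following Boto3 code adds custom data identifiers and allow lists to the settings for automated discovery, and excludes managed data identifiers. It assumes that `client` is a `boto3.client("macie2")` client, that the account has a single sensitivity inspection template, and that an update replaces the template's current settings, so it reads the existing settings (which also requires permission for `macie2:GetSensitivityInspectionTemplate`) and merges them first. The function names and merge logic are illustrative, not part of the Macie API.

```python
def _merge(existing, extra):
    # Combine two ID lists, dropping duplicates and keeping order.
    return list(dict.fromkeys([*existing, *extra]))


def tailor_inspection_template(client, custom_ids=(), allow_list_ids=(), exclude_managed_ids=()):
    """Merge new identifiers and allow lists into the automated discovery template."""
    templates = client.list_sensitivity_inspection_templates()["sensitivityInspectionTemplates"]
    template_id = templates[0]["id"]  # Assumes one template per account.
    current = client.get_sensitivity_inspection_template(id=template_id)
    includes = current.get("includes", {})
    excludes = current.get("excludes", {})
    new_includes = {
        "allowListIds": _merge(includes.get("allowListIds", []), allow_list_ids),
        "customDataIdentifierIds": _merge(includes.get("customDataIdentifierIds", []), custom_ids),
        "managedDataIdentifierIds": list(includes.get("managedDataIdentifierIds", [])),
    }
    new_excludes = {
        "managedDataIdentifierIds": _merge(excludes.get("managedDataIdentifierIds", []), exclude_managed_ids),
    }
    client.update_sensitivity_inspection_template(
        id=template_id, includes=new_includes, excludes=new_excludes
    )
    return new_includes, new_excludes
```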

**Calculating sensitivity scores**  
By default, Macie automatically calculates a sensitivity score for each S3 general purpose bucket for your account. If you're the Macie administrator for an organization, this includes buckets that your member accounts own.  
In Macie, a *sensitivity score* is a quantitative measure of the intersection of two primary dimensions: the amount of sensitive data that Macie has found in a bucket, and the amount of data that Macie has analyzed in a bucket. A bucket's sensitivity score determines which sensitivity label Macie assigns to the bucket. A *sensitivity label* is a qualitative representation of a bucket's sensitivity score—for example, *Sensitive*, *Not sensitive*, and *Not yet analyzed*. For details about the range of sensitivity scores and labels that Macie defines, see [Sensitivity scoring for S3 buckets](discovery-scoring-s3.md).  
An S3 bucket's sensitivity score and label don't imply or otherwise indicate the criticality or importance that the bucket or the bucket's objects might have for you or your organization. Instead, they're intended to provide reference points that can help you identify and monitor potential security risks.  
When you enable automated sensitive data discovery for the first time, Macie automatically assigns a sensitivity score of *50* and the *Not yet analyzed* label to each S3 bucket. The exception is empty buckets. An *empty bucket* is a bucket that doesn't store any objects, or a bucket in which all the objects contain zero (0) bytes of data. Macie assigns each empty bucket a score of *1* and the *Not sensitive* label.  
As automated sensitive data discovery progresses, Macie updates sensitivity scores and labels to reflect the results of its analyses. For example:  
+ If Macie doesn't find sensitive data in an object, Macie decreases the bucket's sensitivity score and updates the bucket's sensitivity label as necessary.
+ If Macie finds sensitive data in an object, Macie increases the bucket's sensitivity score and updates the bucket's sensitivity label as necessary.
+ If Macie finds sensitive data in an object that's subsequently changed, Macie removes sensitive data detections for the object from the bucket's sensitivity score and updates the bucket's sensitivity label as necessary.
+ If Macie finds sensitive data in an object that's subsequently deleted, Macie removes sensitive data detections for the object from the bucket's sensitivity score and updates the bucket's sensitivity label as necessary.
You can adjust the sensitivity scoring settings for individual S3 buckets by including or excluding specific types of sensitive data from a bucket's score. You can also override a bucket's calculated score by manually assigning the maximum score (*100*) to the bucket. If you assign the maximum score, the bucket's label is *Sensitive*. For more information, see [Adjusting sensitivity scores for S3 buckets](discovery-asdd-s3bucket-manage.md).
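A minimal Boto3 sketch of the manual override might look like the following. It assumes that `client` is a `boto3.client("macie2")` client for the Macie administrator or standalone account, and that omitting `sensitivityScoreOverride` in an update returns the bucket to an automatically calculated score. The helper function names are hypothetical.

```python
def pin_bucket_as_sensitive(client, bucket_arn):
    """Assign the maximum sensitivity score (100) to a bucket, then read it back."""
    client.update_resource_profile(resourceArn=bucket_arn, sensitivityScoreOverride=100)
    profile = client.get_resource_profile(resourceArn=bucket_arn)
    # Returns the current score and whether it was set manually.
    return profile["sensitivityScore"], profile["sensitivityScoreOverridden"]


def revert_to_calculated_score(client, bucket_arn):
    # Assumption: leaving out sensitivityScoreOverride restores automatic scoring.
    client.update_resource_profile(resourceArn=bucket_arn)
```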

**Generating metadata, statistics, and other types of results**  
When you enable automated sensitive data discovery, Macie generates and begins maintaining additional inventory data, statistics, and other information about the S3 general purpose buckets for your account. If you're the Macie administrator for an organization, by default this includes buckets that your member accounts own.  
The additional information captures the results of the automated sensitive data discovery activities that Macie has performed thus far. It also supplements other information that Macie provides about your Amazon S3 data, such as the public access and shared access settings for individual buckets. The additional information includes:  
+ An interactive, visual representation of data sensitivity across your Amazon S3 data estate.
+ Aggregated data sensitivity statistics, such as the total number of buckets that Macie has found sensitive data in and how many of those buckets are publicly accessible.
+ Bucket-level details that indicate the current status of the analyses. For example, a list of objects that Macie has analyzed in a bucket, the types of sensitive data that Macie has found in a bucket, and the number of occurrences of each type of sensitive data that Macie found.
The information also includes statistics and details that can help you assess and monitor coverage of your Amazon S3 data. You can check the status of the analyses for your data estate overall and for individual S3 buckets. You can also identify issues that prevented Macie from analyzing objects in specific buckets. If you remediate the issues, you can increase coverage of your Amazon S3 data during subsequent analysis cycles. For more information, see [Assessing automated sensitive data discovery coverage](discovery-coverage.md).  
Macie automatically recalculates and updates this information while it performs automated sensitive data discovery. For example, if Macie finds sensitive data in an S3 object that's subsequently changed or deleted, Macie updates the applicable bucket's metadata: it removes the object from the list of analyzed objects; removes occurrences of sensitive data that Macie found in the object; recalculates the sensitivity score, if the score is calculated automatically; and updates the sensitivity label as necessary to reflect the new score.  
In addition to metadata and statistics, Macie produces records of the sensitive data it finds and the analysis that it performs: *sensitive data findings*, which report sensitive data that Macie finds in individual S3 objects, and *sensitive data discovery results*, which log details about the analysis of individual S3 objects.  
For more information, see [Reviewing automated sensitive data discovery results](discovery-asdd-results-s3.md).
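To retrieve some of these bucket-level details programmatically, you might page through the detections in a bucket's resource profile, as in the following minimal Boto3 sketch. It assumes that `client` is a `boto3.client("macie2")` client and that each detection reports a `name`, a `count`, and a `suppressed` flag; the `summarize_bucket` function is hypothetical.

```python
def summarize_bucket(client, bucket_arn):
    """Return (sensitive data type, occurrences) pairs for a bucket, most frequent first."""
    detections, token = [], None
    while True:
        kwargs = {"resourceArn": bucket_arn}
        if token:
            kwargs["nextToken"] = token
        page = client.list_resource_profile_detections(**kwargs)
        detections.extend(page.get("detections", []))
        token = page.get("nextToken")
        if not token:
            break
    # Skip types that are excluded (suppressed) from the bucket's score.
    counts = [(d["name"], d["count"]) for d in detections if not d.get("suppressed")]
    return sorted(counts, key=lambda item: item[1], reverse=True)
```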

## Considerations
<a name="discovery-asdd-how-it-works-considerations"></a>

As you configure and use Amazon Macie to perform automated sensitive data discovery for your Amazon S3 data, keep the following in mind:
+ Your automated discovery settings apply only to the current AWS Region. Consequently, the resulting analyses and data apply only to S3 general purpose buckets and objects in the current Region. To perform automated discovery and access the resulting data in additional Regions, enable and configure automated discovery in each additional Region.
+ If you're the Macie administrator for an organization:
  + You can perform automated discovery for a member account only if Macie is enabled for the account in the current Region. In addition, you must enable automated discovery for the account in that Region. Members can't enable or disable automated discovery for their own accounts.
  + If you enable automated discovery for a member account, Macie uses the automated discovery settings for your administrator account when it analyzes data for the member account. The applicable settings are: the list of S3 buckets to exclude from analyses, and the managed data identifiers, custom data identifiers, and allow lists to use when analyzing S3 objects. Members can't review or change these settings.
  + Members can't access automated discovery settings for individual S3 buckets that they own. For example, a member can't review or adjust the sensitivity scoring settings for one of their buckets. Only the Macie administrator can access these settings.
  + Members have read access to sensitive data discovery statistics and other results that Macie directly provides for their S3 buckets. For example, a member can use Macie to review sensitivity scores and coverage data for their S3 buckets. The exception is sensitive data findings. Only the Macie administrator has direct access to findings that automated discovery produces.
+ If an S3 bucket's permissions settings prevent Macie from accessing or retrieving information about the bucket or the bucket’s objects, Macie can't perform automated discovery for the bucket. Macie can only provide a subset of information about the bucket, such as the account ID for the AWS account that owns the bucket, the bucket's name, and when Macie most recently retrieved bucket and object metadata for the bucket as part of the [daily refresh cycle](monitoring-s3-how-it-works.md#monitoring-s3-how-it-works-data-refresh). In your bucket inventory, the sensitivity score for these buckets is *50* and their sensitivity label is *Not yet analyzed*. To identify S3 buckets where this is the case, you can refer to coverage data. For more information, see [Assessing automated sensitive data discovery coverage](discovery-coverage.md).
+ To be eligible for selection and analysis, an S3 object must be stored in a general purpose bucket and it must be *classifiable*. A *classifiable* object uses a supported Amazon S3 storage class and it has a file name extension for a supported file or storage format. For more information, see [Supported storage classes and formats](discovery-supported-storage.md).
+ If an S3 object is encrypted, Macie can analyze it only if it's encrypted with a key that Macie can access and is allowed to use. For more information, see [Analyzing encrypted S3 objects](discovery-supported-encryption-types.md). To identify cases where encryption settings prevented Macie from analyzing one or more objects in a bucket, you can refer to coverage data. For more information, see [Assessing automated sensitive data discovery coverage](discovery-coverage.md).
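To list buckets whose permissions settings prevent Macie from analyzing them, you might page through your bucket inventory and keep the buckets that report an error code, as in this minimal Boto3 sketch. It assumes that `client` is a `boto3.client("macie2")` client for the current Region and that affected buckets carry an `errorCode` value such as `ACCESS_DENIED`; the function name is hypothetical. Because the inventory is Region-specific, run it once for each Region.

```python
def buckets_macie_cannot_access(client):
    """Return (account ID, bucket name, error code) for buckets that Macie can't access."""
    blocked, token = [], None
    while True:
        kwargs = {"nextToken": token} if token else {}
        page = client.describe_buckets(**kwargs)
        for bucket in page.get("buckets", []):
            # Buckets that Macie can't access report an error code.
            if bucket.get("errorCode"):
                blocked.append((bucket["accountId"], bucket["bucketName"], bucket["errorCode"]))
        token = page.get("nextToken")
        if not token:
            break
    return blocked
```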

# Configuring automated sensitive data discovery
<a name="discovery-asdd-account-manage"></a>

To gain broad visibility into where sensitive data might reside in your Amazon Simple Storage Service (Amazon S3) data estate, enable and configure automated sensitive data discovery for your account or organization. Amazon Macie then evaluates your S3 bucket inventory on a daily basis and uses sampling techniques to identify and select representative S3 objects from your buckets. Macie retrieves and analyzes the selected objects, inspecting them for sensitive data. If you're the Macie administrator for an organization, by default this includes objects in S3 buckets that your member accounts own. 

As the analysis progresses each day, Macie produces records of the sensitive data it finds and the analysis that it performs. Macie also updates statistics, inventory data, and other information that it provides about your Amazon S3 data. The resulting data provides insight into where Macie found sensitive data in your Amazon S3 data estate, which can span all the S3 buckets for your account or organization. For more information, see [How automated sensitive data discovery works](discovery-asdd-how-it-works.md).

If you have a standalone Macie account or you're the Macie administrator for an organization, you can configure and manage automated sensitive data discovery for your account or organization. This includes enabling and disabling automated discovery, and configuring settings that define the scope and nature of the analyses that Macie performs. If you have a member account in an organization, contact your Macie administrator to learn about the settings for your account and organization.

**Topics**
+ [Prerequisites for configuring automated sensitive data discovery](discovery-asdd-account-configure-prereqs.md)
+ [Enabling automated sensitive data discovery](discovery-asdd-account-enable.md)
+ [Configuring settings for automated sensitive data discovery](discovery-asdd-account-configure.md)
+ [Disabling automated sensitive data discovery](discovery-asdd-account-disable.md)

# Prerequisites for configuring automated sensitive data discovery
<a name="discovery-asdd-account-configure-prereqs"></a>

Before you enable or configure settings for automated sensitive data discovery, complete the following tasks. This helps ensure that you have the resources and permissions that you need.

To complete these tasks, you must be the Amazon Macie administrator for an organization or have a standalone Macie account. If your account is part of an organization, only the Macie administrator for your organization can enable or disable automated sensitive data discovery for accounts in the organization. In addition, only the Macie administrator can configure automated discovery settings for the accounts.

**Topics**
+ [Step 1: Configure a repository for sensitive data discovery results](#discovery-asdd-account-configure-prereqs-sddr)
+ [Step 2: Verify your permissions](#discovery-asdd-account-configure-prereqs-perms)
+ [Next steps](#discovery-asdd-account-configure-prereqs-next)

## Step 1: Configure a repository for sensitive data discovery results
<a name="discovery-asdd-account-configure-prereqs-sddr"></a>

When Amazon Macie performs automated sensitive data discovery, it creates an analysis record for each Amazon Simple Storage Service (Amazon S3) object that it selects for analysis. These records, referred to as *sensitive data discovery results*, log details about the analysis of individual S3 objects. This includes objects that Macie doesn't find sensitive data in, and objects that Macie can't analyze due to errors or issues such as permissions settings. If Macie finds sensitive data in an object, the sensitive data discovery result includes information about the sensitive data that Macie found. Sensitive data discovery results provide you with analysis records that can be helpful for data privacy and protection audits or investigations.

Macie stores your sensitive data discovery results for only 90 days. To access the results and enable long-term storage and retention of them, configure Macie to store the results in an S3 bucket. The bucket can serve as a definitive, long-term repository for all of your sensitive data discovery results. If you're the Macie administrator for an organization, this includes sensitive data discovery results for member accounts that you enable automated sensitive data discovery for.

To verify that you configured this repository, choose **Discovery results** in the navigation pane on the Amazon Macie console. If you prefer to do this programmatically, use the [GetClassificationExportConfiguration](https://docs.aws.amazon.com/macie/latest/APIReference/classification-export-configuration.html) operation of the Amazon Macie API. To learn more about sensitive data discovery results and how to configure this repository, see [Storing and retaining sensitive data discovery results](discovery-results-repository-s3.md).

If you configured the repository, Macie creates a folder named `automated-sensitive-data-discovery` in the repository when you enable automated sensitive data discovery for the first time. This folder stores sensitive data discovery results that Macie creates while performing automated discovery for your account or organization.

If you use Macie in multiple AWS Regions, verify that you configured the repository for each of those Regions.
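The following minimal Boto3 sketch checks several Regions at once and reports the ones where no repository is configured. It assumes a caller-supplied `client_for_region` function that returns a Macie client for a given Region, and that an unconfigured Region returns no `s3Destination` in its export configuration. The function name and the Regions in the example are illustrative.

```python
def regions_missing_results_repository(client_for_region, regions):
    """Return the Regions where no S3 repository is configured for discovery results."""
    missing = []
    for region in regions:
        config = client_for_region(region).get_classification_export_configuration()
        destination = config.get("configuration", {}).get("s3Destination")
        if not destination or not destination.get("bucketName"):
            missing.append(region)
    return missing

# Example (requires AWS credentials):
# import boto3
# print(regions_missing_results_repository(
#     lambda r: boto3.client("macie2", region_name=r), ["us-east-1", "eu-west-1"]))
```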

## Step 2: Verify your permissions
<a name="discovery-asdd-account-configure-prereqs-perms"></a>

To verify your permissions, use AWS Identity and Access Management (IAM) to review the IAM policies that are attached to your IAM identity. Then compare the information in those policies to the following list of actions that you must be allowed to perform:
+ `macie2:GetMacieSession`
+ `macie2:UpdateAutomatedDiscoveryConfiguration`
+ `macie2:ListClassificationScopes`
+ `macie2:UpdateClassificationScope`
+ `macie2:ListSensitivityInspectionTemplates`
+ `macie2:UpdateSensitivityInspectionTemplate`

The first action allows you to access your Amazon Macie account. The second action allows you to enable or disable automated sensitive data discovery for your account or organization. For an organization, it also allows you to specify whether automated discovery is enabled automatically for accounts in your organization. The remaining actions allow you to identify and change the configuration settings.

If you plan to review or change the configuration settings by using the Amazon Macie console, you must also be allowed to perform the following actions:
+ `macie2:GetAutomatedDiscoveryConfiguration`
+ `macie2:GetClassificationScope`
+ `macie2:GetSensitivityInspectionTemplate`

These actions allow you to retrieve your current configuration settings and the status of automated sensitive data discovery for your account or organization. Permission to perform these actions is optional if you plan to change the configuration settings programmatically.

If you're the Macie administrator for an organization, you must also be allowed to perform the following actions:
+ `macie2:ListAutomatedDiscoveryAccounts`
+ `macie2:BatchUpdateAutomatedDiscoveryAccounts`

The first action allows you to retrieve the status of automated sensitive data discovery for individual accounts in your organization. The second action allows you to enable or disable automated discovery for individual accounts in your organization.

If you're not allowed to perform the requisite actions, ask your AWS administrator for assistance.
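If it helps to give your AWS administrator a starting point, the following Python sketch assembles an identity-based IAM policy from the actions listed above. The grouping into core, console, and administrator actions follows this section; the `Sid` value and the use of `"Resource": "*"` are assumptions made for brevity, so review and scope the policy to fit your own requirements.

```python
import json

CORE_ACTIONS = [
    "macie2:GetMacieSession",
    "macie2:UpdateAutomatedDiscoveryConfiguration",
    "macie2:ListClassificationScopes",
    "macie2:UpdateClassificationScope",
    "macie2:ListSensitivityInspectionTemplates",
    "macie2:UpdateSensitivityInspectionTemplate",
]
# Needed to review or change settings on the Macie console.
CONSOLE_ACTIONS = [
    "macie2:GetAutomatedDiscoveryConfiguration",
    "macie2:GetClassificationScope",
    "macie2:GetSensitivityInspectionTemplate",
]
# Needed only by the Macie administrator for an organization.
ADMIN_ACTIONS = [
    "macie2:ListAutomatedDiscoveryAccounts",
    "macie2:BatchUpdateAutomatedDiscoveryAccounts",
]


def automated_discovery_policy(use_console=True, is_administrator=False):
    """Build an IAM policy document that allows the listed Macie actions."""
    actions = list(CORE_ACTIONS)
    if use_console:
        actions += CONSOLE_ACTIONS
    if is_administrator:
        actions += ADMIN_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "MacieAutomatedDiscovery",  # Hypothetical statement ID.
            "Effect": "Allow",
            "Action": actions,
            "Resource": "*",  # Assumption: broad scope for brevity.
        }],
    }


print(json.dumps(automated_discovery_policy(is_administrator=True), indent=2))
```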

## Next steps
<a name="discovery-asdd-account-configure-prereqs-next"></a>

After you complete the preceding tasks, you're ready to enable and configure the settings for your account or organization:
+ [Enabling automated sensitive data discovery](discovery-asdd-account-enable.md)
+ [Configuring settings for automated sensitive data discovery](discovery-asdd-account-configure.md)

 

# Enabling automated sensitive data discovery
<a name="discovery-asdd-account-enable"></a>

When you enable automated sensitive data discovery, Amazon Macie begins evaluating your Amazon Simple Storage Service (Amazon S3) inventory data and performing other automated discovery activities for your account in the current AWS Region. If you're the Macie administrator for an organization, by default the evaluation and activities include S3 buckets that your member accounts own. Depending on the size of your Amazon S3 data estate, statistics and other results can begin to appear within 48 hours.

After you enable automated sensitive data discovery, you can configure settings that refine the scope and nature of the analyses that Macie performs. These settings specify any S3 buckets to exclude from analyses. They also specify the managed data identifiers, custom data identifiers, and allow lists that you want Macie to use when it analyzes S3 objects. For information about these settings, see [Configuring settings for automated sensitive data discovery](discovery-asdd-account-configure.md). If you're the Macie administrator for an organization, you can also refine the scope of the analyses by enabling or disabling automated sensitive data discovery for individual accounts in your organization on a case-by-case basis.

To enable automated sensitive data discovery, you must be the Macie administrator for an organization or have a standalone Macie account. If you have a member account in an organization, work with your Macie administrator to enable automated sensitive data discovery for your account.

**To enable automated sensitive data discovery**  
If you're the Macie administrator for an organization or you have a standalone Macie account, you can enable automated sensitive data discovery by using the Amazon Macie console or the Amazon Macie API. If you're enabling it for the first time, start by [completing the prerequisite tasks](discovery-asdd-account-configure-prereqs.md). This helps ensure that you have the resources and permissions that you need.
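If you prefer the API, the following minimal Boto3 sketch enables automated discovery in the client's Region and then reads back the status. It assumes that `client` is a `boto3.client("macie2")` client. The optional `autoEnableOrganizationMembers` value (`ALL`, `NEW`, or `NONE`) applies only to a Macie administrator account; check the Amazon Macie API Reference for the exact meaning of each value. The function name is hypothetical.

```python
VALID_AUTO_ENABLE = {"ALL", "NEW", "NONE"}


def enable_automated_discovery(client, auto_enable_members=None):
    """Enable automated discovery in the client's Region and return the new status."""
    kwargs = {"status": "ENABLED"}
    if auto_enable_members is not None:
        # Administrator accounts only: controls automatic enablement for members.
        if auto_enable_members not in VALID_AUTO_ENABLE:
            raise ValueError(f"Expected one of {sorted(VALID_AUTO_ENABLE)}")
        kwargs["autoEnableOrganizationMembers"] = auto_enable_members
    client.update_automated_discovery_configuration(**kwargs)
    return client.get_automated_discovery_configuration()["status"]
```

As with the console, the setting applies only to the client's Region, so repeat the call with a client for each additional Region.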

------
#### [ Console ]

Follow these steps to enable automated sensitive data discovery by using the Amazon Macie console.

**To enable automated sensitive data discovery**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. By using the AWS Region selector in the upper-right corner of the page, choose the Region in which you want to enable automated sensitive data discovery.

1. In the navigation pane, under **Settings**, choose **Automated sensitive data discovery**.

1. If you have a standalone Macie account, choose **Enable** in the **Status** section.

1. If you're the Macie administrator for an organization, choose an option in the **Status** section to specify the accounts to enable automated sensitive data discovery for:
   + To enable it for all the accounts in your organization, choose **Enable**. In the dialog box that appears, choose **My organization**. For an organization in AWS Organizations, select **Enable automatically for new accounts** to also enable it automatically for accounts that subsequently join your organization. When you finish, choose **Enable**.
   + To enable it for only particular member accounts, choose **Manage accounts**. Then, in the table on the **Accounts** page, select the checkbox for each account to enable it for. When you finish, choose **Enable automated sensitive data discovery** on the **Actions** menu.
   + To enable it for only your Macie administrator account, choose **Enable**. In the dialog box that appears, choose **My account** and clear **Enable automatically for new accounts**. When you finish, choose **Enable**.

If you use Macie in multiple Regions and want to enable automated sensitive data discovery in additional Regions, repeat the preceding steps in each additional Region.

To subsequently check or change the status of automated sensitive data discovery for individual accounts in an organization, choose **Accounts** in the navigation pane. On the **Accounts** page, the **Automated sensitive data discovery** field in the table indicates the current status of automated discovery for an account. To change the status for an account, select the checkbox for the account. Then use the **Actions** menu to enable or disable automated discovery for the account.

------
#### [ API ]

To enable automated sensitive data discovery programmatically, you have several options:
+ To enable it for a Macie administrator account, an organization, or a standalone Macie account, use the [UpdateAutomatedDiscoveryConfiguration](https://docs.aws.amazon.com/macie/latest/APIReference/automated-discovery-configuration.html) operation. Or, if you're using the AWS Command Line Interface (AWS CLI), run the [update-automated-discovery-configuration](https://docs.aws.amazon.com/cli/latest/reference/macie2/update-automated-discovery-configuration.html) command.
+ To enable it for only particular member accounts in an organization, use the [BatchUpdateAutomatedDiscoveryAccounts](https://docs.aws.amazon.com/macie/latest/APIReference/automated-discovery-accounts.html) operation. Or, if you're using the AWS CLI, run the [batch-update-automated-discovery-accounts](https://docs.aws.amazon.com/cli/latest/reference/macie2/batch-update-automated-discovery-accounts.html) command. To enable automated discovery for a member account, you must first enable it for your administrator account or organization.

Additional options and details vary depending on the type of account that you have.

If you're a Macie administrator, use the **UpdateAutomatedDiscoveryConfiguration** operation or run the **update-automated-discovery-configuration** command to enable automated sensitive data discovery for your account or organization. In your request, specify `ENABLED` for the `status` parameter. For the `autoEnableOrganizationMembers` parameter, specify the accounts to enable it for. If you're using the AWS CLI, specify the accounts by using the `auto-enable-organization-members` parameter. Valid values are:
+ `ALL` (default) – Enable it for all the accounts in your organization. This includes your administrator account, existing member accounts, and accounts that subsequently join your organization.
+ `NEW` – Enable it for your administrator account. Also enable it automatically for accounts that subsequently join your organization. If you previously enabled automated discovery for your organization and you specify this value, automated discovery will continue to be enabled for existing member accounts that it's currently enabled for.
+ `NONE` – Enable it for only your administrator account. Don't enable it automatically for accounts that subsequently join your organization. If you previously enabled automated discovery for your organization and you specify this value, automated discovery will continue to be enabled for existing member accounts that it's currently enabled for.

If you want to selectively enable automated sensitive data discovery for only particular member accounts, specify `NEW` or `NONE`. You can then use the **BatchUpdateAutomatedDiscoveryAccounts** operation or run the **batch-update-automated-discovery-accounts** command to enable automated discovery for the accounts.

If you have a standalone Macie account, use the **UpdateAutomatedDiscoveryConfiguration** operation or run the **update-automated-discovery-configuration** command to enable automated sensitive data discovery for your account. In your request, specify `ENABLED` for the `status` parameter. For the `autoEnableOrganizationMembers` parameter, consider whether you plan to become the Macie administrator for other accounts, and specify the appropriate value. If you specify `NONE`, automated discovery isn't enabled automatically for an account when you become the Macie administrator for the account. If you specify `ALL` or `NEW`, automated discovery is enabled automatically for the account. If you're using the AWS CLI, use the `auto-enable-organization-members` parameter to specify the appropriate value for this setting.

The following examples show how to use the AWS CLI to enable automated sensitive data discovery for one or more accounts in an organization. This first example enables automated discovery for all the accounts in an organization for the first time. It enables automated discovery for the Macie administrator account, all existing member accounts, and any accounts that subsequently join the organization.

```
$ aws macie2 update-automated-discovery-configuration --status ENABLED --auto-enable-organization-members ALL --region us-east-1
```

Where *us-east-1* is the Region in which to enable automated sensitive data discovery for the accounts, the US East (N. Virginia) Region. If the request succeeds, Macie enables automated discovery for the accounts and returns an empty response.

The next example changes the member enablement setting for an organization to `NONE`. With this change, automated sensitive data discovery isn't enabled automatically for accounts that subsequently join the organization. It remains enabled only for the Macie administrator account and any existing member accounts that it's currently enabled for.

```
$ aws macie2 update-automated-discovery-configuration --status ENABLED --auto-enable-organization-members NONE --region us-east-1
```

Where *us-east-1* is the Region in which to change the setting, the US East (N. Virginia) Region. If the request succeeds, Macie updates the setting and returns an empty response.
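To confirm that a change took effect, you can read the configuration back by running the **get-automated-discovery-configuration** command. The following bash function is a minimal sketch, not part of any Macie tooling. The name `show_discovery_settings` is hypothetical, and the sketch assumes that the AWS CLI is installed and configured. It uses the AWS CLI global `--query` and `--output text` options to print only the `status` and `autoEnableOrganizationMembers` values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the automated discovery status and the member
# enablement setting (autoEnableOrganizationMembers) for one Region.
# Assumes the AWS CLI is installed and configured with suitable credentials.
show_discovery_settings() {
  local region="${1:?usage: show_discovery_settings REGION}"
  aws macie2 get-automated-discovery-configuration \
    --region "$region" \
    --query '[status, autoEnableOrganizationMembers]' \
    --output text
}
```

For example, after the preceding change, `show_discovery_settings us-east-1` should print `ENABLED` and `NONE` as tab-separated values on one line.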

The following examples enable automated sensitive data discovery for two member accounts in an organization. The Macie administrator has already enabled automated discovery for the organization. This example is formatted for Linux, macOS, or Unix, and it uses the backslash (`\`) line-continuation character to improve readability.

```
$ aws macie2 batch-update-automated-discovery-accounts \
--region us-east-1 \
--accounts '[{"accountId":"123456789012","status":"ENABLED"},{"accountId":"111122223333","status":"ENABLED"}]'
```

This example is formatted for Microsoft Windows and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws macie2 batch-update-automated-discovery-accounts ^
--region us-east-1 ^
--accounts=[{\"accountId\":\"123456789012\",\"status\":\"ENABLED\"},{\"accountId\":\"111122223333\",\"status\":\"ENABLED\"}]
```

Where:
+ *us-east-1* is the Region in which to enable automated sensitive data discovery for the specified accounts, the US East (N. Virginia) Region.
+ *123456789012* and *111122223333* are the account IDs for the accounts to enable automated sensitive data discovery for.
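The preceding examples list two accounts inline. For a longer list, a small helper can build the value for the `--accounts` parameter. The following bash sketch shows one way to do this. The function names are hypothetical, the AWS CLI is assumed to be installed and configured, and the 12-digit check on account IDs is a local sanity check that this sketch adds.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for enabling or disabling automated discovery for a
# list of member accounts. Assumes the AWS CLI is installed and configured.

# Print the JSON array for the --accounts parameter.
# Usage: build_accounts_json ENABLED|DISABLED ACCOUNT_ID...
build_accounts_json() {
  local status="$1"; shift
  case "$status" in
    ENABLED|DISABLED) ;;
    *) echo "status must be ENABLED or DISABLED" >&2; return 1 ;;
  esac
  local json="" id
  for id in "$@"; do
    # AWS account IDs are 12 digits; this local check catches typos early.
    if [[ ! "$id" =~ ^[0-9]{12}$ ]]; then
      echo "invalid account ID: $id" >&2
      return 1
    fi
    json+="${json:+,}{\"accountId\":\"${id}\",\"status\":\"${status}\"}"
  done
  printf '[%s]\n' "$json"
}

# Send one batch update request for the given Region, status, and accounts.
update_member_accounts() {
  local region="$1" status="$2"; shift 2
  local accounts
  accounts="$(build_accounts_json "$status" "$@")" || return 1
  aws macie2 batch-update-automated-discovery-accounts \
    --region "$region" \
    --accounts "$accounts"
}
```

For example, `update_member_accounts us-east-1 ENABLED 123456789012 111122223333` sends the same request as the preceding Linux example. To disable automated discovery for the accounts instead, specify `DISABLED`.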

If the request succeeds for all specified accounts, Macie returns an empty `errors` array. If the request fails for some accounts, the array specifies the error that occurred for each affected account. For example:

```
"errors": [
    {
        "accountId": "123456789012",
        "errorCode": "ACCOUNT_PAUSED"
    }
]
```

In the preceding response, the request failed for the specified account (`123456789012`) because Macie is currently suspended for the account. To address this error, the Macie administrator must first enable Macie for the account.

If the request fails for all accounts, you receive a message that describes the error that occurred. 
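If you script this operation, you can turn the `errors` array into a per-account report. The following bash sketch is illustrative only. The function name is hypothetical, and the sketch relies on the AWS CLI global `--query` and `--output text` options to reduce the array to one `accountId` and `errorCode` pair per line. It returns a non-zero status if any account failed, so that a calling script can stop.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: send a batch update, print one line for each account
# that failed, and return 2 if any account failed.
# Assumes the AWS CLI is installed and configured.
update_members_and_report() {
  local region="$1" accounts="$2"   # accounts: JSON array for --accounts
  local failures
  failures="$(aws macie2 batch-update-automated-discovery-accounts \
    --region "$region" \
    --accounts "$accounts" \
    --query 'errors[].[accountId, errorCode]' \
    --output text)" || return 1
  # An empty errors array produces no rows; a missing array may print "None".
  if [[ -z "${failures//[[:space:]]/}" || "$failures" == "None" ]]; then
    echo "Automated discovery updated for all specified accounts."
    return 0
  fi
  local account code
  while IFS=$'\t' read -r account code; do
    echo "Failed for account ${account}: ${code}"
  done <<< "$failures"
  return 2
}
```

Pass the Region and the same JSON array that you would use for the `--accounts` parameter, for example `update_members_and_report us-east-1 '[{"accountId":"123456789012","status":"ENABLED"}]'`.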

------

# Configuring settings for automated sensitive data discovery
<a name="discovery-asdd-account-configure"></a>

If you enable automated sensitive data discovery for your account or organization, you can adjust your automated discovery settings to refine the analyses that Amazon Macie performs. The settings specify Amazon Simple Storage Service (Amazon S3) buckets to exclude from analyses. They also specify the types and occurrences of sensitive data to detect and report—the managed data identifiers, custom data identifiers, and allow lists to use when analyzing S3 objects.

By default, Macie performs automated sensitive data discovery for all the S3 general purpose buckets for your account. If you're the Macie administrator for an organization, this includes buckets that your member accounts own. You can exclude specific buckets from the analyses. For example, you might exclude buckets that typically store AWS logging data, such as AWS CloudTrail event logs. If you exclude a bucket, you can include it again later. 

In addition, by default, Macie analyzes S3 objects by using only the set of managed data identifiers that we recommend for automated sensitive data discovery. It doesn't use any custom data identifiers or allow lists that you've defined. To customize the analyses, you can add or remove specific managed data identifiers, custom data identifiers, and allow lists.

If you change a setting, Macie applies your change when the next evaluation and analysis cycle starts, typically within 24 hours. In addition, your change applies only to the current AWS Region. To make the same change in additional Regions, repeat the applicable steps in each additional Region.

**Topics**
+ [Configuration options for organizations](#discovery-asdd-configure-options-orgs)
+ [Excluding or including S3 buckets](#discovery-asdd-account-configure-s3buckets)
+ [Adding or removing managed data identifiers](#discovery-asdd-account-configure-mdis)
+ [Adding or removing custom data identifiers](#discovery-asdd-account-configure-cdis)
+ [Adding or removing allow lists](#discovery-asdd-account-configure-als)

**Note**  
To configure settings for automated sensitive data discovery, you must be the Macie administrator for an organization or have a standalone Macie account. If your account is part of an organization, only the Macie administrator for your organization can configure and manage the settings for accounts in your organization. If you have a member account, contact your Macie administrator to learn about the settings for your account and organization.

## Configuration options for organizations
<a name="discovery-asdd-configure-options-orgs"></a>

If an account is part of an organization that centrally manages multiple Amazon Macie accounts, the Macie administrator for the organization configures and manages automated sensitive data discovery for accounts in the organization. This includes settings that define the scope and nature of the analyses that Macie performs for the accounts. Members can't access these settings for their own accounts.

If you're the Macie administrator for an organization, you can define the scope of the analyses in several ways:
+ **Automatically enable automated sensitive data discovery for accounts** – When you enable automated sensitive data discovery, you specify whether to enable it for all existing accounts and new member accounts, only for new member accounts, or for no member accounts. If you enable it for new member accounts, it's enabled automatically for each account when that account joins your organization in Macie. If it's enabled for an account, Macie includes S3 buckets that the account owns. If it's disabled for an account, Macie excludes buckets that the account owns.
+ **Selectively enable automated sensitive data discovery for accounts** – With this option, you enable or disable automated sensitive data discovery for individual accounts on a case-by-case basis. If you enable it for an account, Macie includes S3 buckets that the account owns. If you don't enable it or you disable it for an account, Macie excludes buckets that the account owns.
+ **Exclude specific S3 buckets from automated sensitive data discovery** – If you enable automated sensitive data discovery for an account, you can exclude particular S3 buckets that the account owns. Macie then skips the buckets when it performs automated discovery. To exclude particular buckets, add them to the exclusion list in the configuration settings for your administrator account. You can exclude as many as 1,000 buckets for your organization.

By default, automated sensitive data discovery is enabled automatically for all new and existing accounts in an organization. In addition, Macie includes all the S3 buckets that the accounts own. If you keep the default settings, this means that Macie performs automated discovery for all the buckets for your administrator account, which includes all the buckets that your member accounts own.

As a Macie administrator, you also define the nature of the analyses that Macie performs for your organization. You do this by configuring additional settings for your administrator account: the managed data identifiers, custom data identifiers, and allow lists that you want Macie to use when it analyzes S3 objects. Macie uses the settings for your administrator account when it analyzes S3 objects for other accounts in your organization.

## Excluding or including S3 buckets in automated sensitive data discovery
<a name="discovery-asdd-account-configure-s3buckets"></a>

By default, Amazon Macie performs automated sensitive data discovery for all the S3 general purpose buckets for your account. If you're the Macie administrator for an organization, this includes buckets that your member accounts own.

To refine the scope, you can exclude as many as 1,000 S3 buckets from analyses. If you exclude a bucket, Macie stops selecting and analyzing objects in the bucket when it performs automated sensitive data discovery. Existing sensitive data discovery statistics and details for the bucket persist. For example, the bucket's current sensitivity score remains unchanged. After you exclude a bucket, you can include it again later.

**To exclude or include an S3 bucket in automated sensitive data discovery**  
You can exclude or subsequently include an S3 bucket by using the Amazon Macie console or the Amazon Macie API.

------
#### [ Console ]

Follow these steps to exclude or subsequently include an S3 bucket by using the Amazon Macie console.

**To exclude or include an S3 bucket**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. By using the AWS Region selector in the upper-right corner of the page, choose the Region in which you want to exclude or include specific S3 buckets in analyses.

1. In the navigation pane, under **Settings**, choose **Automated sensitive data discovery**.

   The **Automated sensitive data discovery** page appears and displays your current settings. On that page, the **S3 buckets** section lists S3 buckets that are currently excluded, or it indicates that all buckets are currently included.

1. In the **S3 buckets** section, choose **Edit**.

1. Do one of the following:
   + To exclude one or more S3 buckets, choose **Add buckets to the exclude list**. Then, in the **S3 buckets** table, select the checkbox for each bucket to exclude. The table lists all the general purpose buckets for your account or organization in the current Region.
   + To include one or more S3 buckets that you previously excluded, choose **Remove buckets from the exclude list**. Then, in the **S3 buckets** table, select the checkbox for each bucket to include. The table lists all the buckets that are currently excluded from analyses.

   To find specific buckets more easily, enter search criteria in the search box above the table. You can also sort the table by choosing a column heading.

1. When you finish selecting buckets, choose **Add** or **Remove**, depending on the option that you chose in the preceding step.

**Tip**  
You can also exclude or include individual S3 buckets on a case-by-case basis while you review bucket details on the console. To do this, choose the bucket on the **S3 buckets** page. Then, in the details panel, change the **Exclude from automated discovery** setting for the bucket.

------
#### [ API ]

To exclude or subsequently include an S3 bucket programmatically, use the Amazon Macie API to update the classification scope for your account. The classification scope specifies buckets that you don't want Macie to analyze when it performs automated sensitive data discovery. It defines a bucket exclusion list for automated discovery.

When you update the classification scope, you specify whether to add individual buckets to the exclusion list, remove them from it, or overwrite the current list with a new list. Therefore, it's a good idea to start by retrieving and reviewing your current list. To retrieve the list, use the [GetClassificationScope](https://docs.aws.amazon.com/macie/latest/APIReference/classification-scopes-id.html) operation. If you're using the AWS Command Line Interface (AWS CLI), run the [get-classification-scope](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-classification-scope.html) command to retrieve the list.

To retrieve or update the classification scope, you have to specify its unique identifier (`id`). You can get this identifier by using the [GetAutomatedDiscoveryConfiguration](https://docs.aws.amazon.com/macie/latest/APIReference/automated-discovery-configuration.html) operation. This operation retrieves your current configuration settings for automated sensitive data discovery, including the unique identifier for the classification scope for your account in the current AWS Region. If you're using the AWS CLI, run the [get-automated-discovery-configuration](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-automated-discovery-configuration.html) command to retrieve this information.
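The following bash sketch combines these two lookups. It's an illustration under stated assumptions: the function names are hypothetical, the AWS CLI is assumed to be installed and configured, and the `--query` expressions assume the response fields `classificationScopeId` (from **get-automated-discovery-configuration**) and `s3.excludes.bucketNames` (from **get-classification-scope**).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assume the AWS CLI is installed and configured.

# Print the unique identifier of the classification scope for a Region.
get_classification_scope_id() {
  local region="${1:?usage: get_classification_scope_id REGION}"
  aws macie2 get-automated-discovery-configuration \
    --region "$region" \
    --query classificationScopeId --output text
}

# Print the buckets that are currently on the exclusion list.
show_excluded_buckets() {
  local region="${1:?usage: show_excluded_buckets REGION}"
  local scope_id
  scope_id="$(get_classification_scope_id "$region")" || return 1
  aws macie2 get-classification-scope \
    --region "$region" --id "$scope_id" \
    --query 's3.excludes.bucketNames' --output text
}
```

For example, `show_excluded_buckets us-east-1` prints the names of the buckets that are currently excluded, which you can review before you update the list.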

When you're ready to update the classification scope, use the [UpdateClassificationScope](https://docs.aws.amazon.com/macie/latest/APIReference/classification-scopes-id.html) operation or, if you're using the AWS CLI, run the [update-classification-scope](https://docs.aws.amazon.com/cli/latest/reference/macie2/update-classification-scope.html) command. In your request, use the supported parameters to exclude or include an S3 bucket in subsequent analyses:
+ To exclude one or more buckets, specify the name of each bucket for the `bucketNames` parameter. For the `operation` parameter, specify `ADD`.
+ To include one or more buckets that you previously excluded, specify the name of each bucket for the `bucketNames` parameter. For the `operation` parameter, specify `REMOVE`.
+ To overwrite the current list with a new list of buckets to exclude, specify `REPLACE` for the `operation` parameter. For the `bucketNames` parameter, specify the name of each bucket to exclude.

Each value for the `bucketNames` parameter must be the full name of an existing general purpose bucket in the current Region. Values are case sensitive. If your request succeeds, Macie updates the classification scope and returns an empty response.

The following examples show how to use the AWS CLI to update the classification scope for an account. The first set of examples excludes two S3 buckets (*amzn-s3-demo-bucket1* and *amzn-s3-demo-bucket2*) from subsequent analyses. It adds the buckets to the list of buckets to exclude.

This example is formatted for Linux, macOS, or Unix, and it uses the backslash (`\`) line-continuation character to improve readability.

```
$ aws macie2 update-classification-scope \
--id 117aff7ed76b59a59c3224ebdexample \
--s3 '{"excludes":{"bucketNames":["amzn-s3-demo-bucket1","amzn-s3-demo-bucket2"],"operation": "ADD"}}'
```

This example is formatted for Microsoft Windows and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws macie2 update-classification-scope ^
--id 117aff7ed76b59a59c3224ebdexample ^
--s3={\"excludes\":{\"bucketNames\":[\"amzn-s3-demo-bucket1\",\"amzn-s3-demo-bucket2\"],\"operation\":\"ADD\"}}
```

The next set of examples includes the same buckets (*amzn-s3-demo-bucket1* and *amzn-s3-demo-bucket2*) in subsequent analyses again. It removes the buckets from the list of buckets to exclude. For Linux, macOS, or Unix:

```
$ aws macie2 update-classification-scope \
--id 117aff7ed76b59a59c3224ebdexample \
--s3 '{"excludes":{"bucketNames":["amzn-s3-demo-bucket1","amzn-s3-demo-bucket2"],"operation": "REMOVE"}}'
```

For Microsoft Windows:

```
C:\> aws macie2 update-classification-scope ^
--id 117aff7ed76b59a59c3224ebdexample ^
--s3={\"excludes\":{\"bucketNames\":[\"amzn-s3-demo-bucket1\",\"amzn-s3-demo-bucket2\"],\"operation\":\"REMOVE\"}}
```

The following examples overwrite and replace the current list with a new list of S3 buckets to exclude. The new list specifies three buckets to exclude: *amzn-s3-demo-bucket*, *amzn-s3-demo-bucket2*, and *amzn-s3-demo-bucket3*. For Linux, macOS, or Unix:

```
$ aws macie2 update-classification-scope \
--id 117aff7ed76b59a59c3224ebdexample \
--s3 '{"excludes":{"bucketNames":["amzn-s3-demo-bucket","amzn-s3-demo-bucket2","amzn-s3-demo-bucket3"],"operation": "REPLACE"}}'
```

For Microsoft Windows:

```
C:\> aws macie2 update-classification-scope ^
--id 117aff7ed76b59a59c3224ebdexample ^
--s3={\"excludes\":{\"bucketNames\":[\"amzn-s3-demo-bucket\",\"amzn-s3-demo-bucket2\",\"amzn-s3-demo-bucket3\"],\"operation\":\"REPLACE\"}}
```
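The preceding examples hardcode the JSON value for the `--s3` parameter. If you build the value in a script, checking the `operation` value locally can catch typos before you send a request. The following bash sketch does that. The function names are hypothetical, the AWS CLI is assumed to be installed and configured, and requiring at least one bucket name is a choice of this sketch rather than a documented API rule.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the --s3 value of update-classification-scope.
# Assumes the AWS CLI is installed and configured.

# Print the --s3 JSON. Usage: build_scope_json ADD|REMOVE|REPLACE BUCKET...
# S3 bucket names can't contain quotes or backslashes, so no JSON escaping
# is done here. Requiring at least one name is a choice of this sketch.
build_scope_json() {
  local operation="$1"; shift
  case "$operation" in
    ADD|REMOVE|REPLACE) ;;
    *) echo "operation must be ADD, REMOVE, or REPLACE" >&2; return 1 ;;
  esac
  if (( $# == 0 )); then
    echo "specify at least one bucket name" >&2
    return 1
  fi
  local names="" bucket
  for bucket in "$@"; do
    names+="${names:+,}\"${bucket}\""
  done
  printf '{"excludes":{"bucketNames":[%s],"operation":"%s"}}\n' "$names" "$operation"
}

# Update the exclusion list for the classification scope SCOPE_ID.
update_exclusions() {
  local scope_id="$1" operation="$2"; shift 2
  local payload
  payload="$(build_scope_json "$operation" "$@")" || return 1
  aws macie2 update-classification-scope --id "$scope_id" --s3 "$payload"
}
```

For example, `update_exclusions 117aff7ed76b59a59c3224ebdexample ADD amzn-s3-demo-bucket1 amzn-s3-demo-bucket2` sends a request equivalent to the first Linux example in this section.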

------

## Adding or removing managed data identifiers from automated sensitive data discovery
<a name="discovery-asdd-account-configure-mdis"></a>

A *managed data identifier* is a set of built-in criteria and techniques that are designed to detect a specific type of sensitive data—for example, credit card numbers, AWS secret access keys, or passport numbers for a particular country or region. By default, Amazon Macie analyzes S3 objects by using the set of managed data identifiers that we recommend for automated sensitive data discovery. To review a list of these identifiers, see [Default settings for automated sensitive data discovery](discovery-asdd-settings-defaults.md).

You can tailor the analyses to focus on specific types of sensitive data:
+ Add managed data identifiers for the types of sensitive data that you want Macie to detect and report.
+ Remove managed data identifiers for the types of sensitive data that you don't want Macie to detect and report.

For a complete list of all the managed data identifiers that Macie currently provides and details for each one, see [Using managed data identifiers](managed-data-identifiers.md).
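You can also look up managed data identifier IDs from the command line by using the ListManagedDataIdentifiers operation, which the AWS CLI exposes as the `list-managed-data-identifiers` command. The following bash sketch filters the IDs by keyword. The function name is hypothetical, the AWS CLI is assumed to be installed and configured, and the sketch doesn't handle pagination explicitly, so confirm results against the complete list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print managed data identifier IDs that contain a
# keyword (case-insensitive), one per line. Assumes the AWS CLI is installed
# and configured. Pagination isn't handled explicitly here.
find_managed_identifiers() {
  local keyword="${1:?usage: find_managed_identifiers KEYWORD}"
  aws macie2 list-managed-data-identifiers \
    --query 'items[].id' --output text |
    tr '\t' '\n' |
    grep -i -- "$keyword"
}
```

For example, `find_managed_identifiers passport` lists IDs such as `USA_PASSPORT_NUMBER`.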

If you remove a managed data identifier, your change doesn't affect existing sensitive data discovery statistics and details for S3 buckets. For example, if you remove the managed data identifier for AWS secret access keys and Macie previously detected that data in a bucket, Macie continues to report those detections. However, instead of removing the identifier, which affects subsequent analyses of all buckets, consider excluding its detections from sensitivity scores for only particular buckets. For more information, see [Adjusting sensitivity scores for S3 buckets](discovery-asdd-s3bucket-manage.md).

**To add or remove managed data identifiers from automated sensitive data discovery**  
You can add or remove managed data identifiers by using the Amazon Macie console or the Amazon Macie API.

------
#### [ Console ]

Follow these steps to add or remove a managed data identifier by using the Amazon Macie console.

**To add or remove a managed data identifier**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. By using the AWS Region selector in the upper-right corner of the page, choose the Region in which you want to add or remove a managed data identifier from analyses.

1. In the navigation pane, under **Settings**, choose **Automated sensitive data discovery**.

   The **Automated sensitive data discovery** page appears and displays your current settings. On that page, the **Managed data identifiers** section organizes those settings into two tabs:
   + **Added to default** – This tab lists managed data identifiers that you added. Macie uses these identifiers in addition to the identifiers in the default set that you haven't removed.
   + **Removed from default** – This tab lists managed data identifiers that you removed. Macie doesn't use these identifiers.

1. In the **Managed data identifiers** section, choose **Edit**.

1. Do any of the following:
   + To add one or more managed data identifiers, choose the **Added to default** tab. Then, in the table, select the checkbox for each managed data identifier to add. If a checkbox is already selected, you already added that identifier.
   + To remove one or more managed data identifiers, choose the **Removed from default** tab. Then, in the table, select the checkbox for each managed data identifier to remove. If a checkbox is already selected, you already removed that identifier.

   On each tab, the table displays a list of all the managed data identifiers that Macie currently provides. In the table, the first column specifies each managed data identifier's ID. The ID describes the type of sensitive data that an identifier is designed to detect, for example, **USA_PASSPORT_NUMBER** for US passport numbers. To find specific managed data identifiers more easily, enter search criteria in the search box above the table. You can also sort the table by choosing a column heading.

1. When you finish, choose **Save**.

------
#### [ API ]

To add or remove a managed data identifier programmatically, use the Amazon Macie API to update the sensitivity inspection template for your account. The template stores settings that specify which managed data identifiers to use (*include*) in addition to the ones in the default set, and which managed data identifiers not to use (*exclude*). The settings also specify any custom data identifiers and allow lists that you want Macie to use.

When you update the template, you overwrite its current settings. Therefore, it's a good idea to start by retrieving your current settings and determining which ones you want to keep. To retrieve your current settings, use the [GetSensitivityInspectionTemplate](https://docs.aws.amazon.com/macie/latest/APIReference/templates-sensitivity-inspections-id.html) operation. If you're using the AWS Command Line Interface (AWS CLI), run the [get-sensitivity-inspection-template](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-sensitivity-inspection-template.html) command to retrieve the settings.

To retrieve or update the template, you have to specify its unique identifier (`id`). You can get this identifier by using the [GetAutomatedDiscoveryConfiguration](https://docs.aws.amazon.com/macie/latest/APIReference/automated-discovery-configuration.html) operation. This operation retrieves your current configuration settings for automated sensitive data discovery, including the unique identifier for the sensitivity inspection template for your account in the current AWS Region. If you're using the AWS CLI, run the [get-automated-discovery-configuration](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-automated-discovery-configuration.html) command to retrieve this information.
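Because an update overwrites the template's current settings, you might save a copy of those settings before you change them. The following bash sketch looks up the template ID and writes the template to a local JSON file. The function name and the default file name are hypothetical, and the sketch assumes that the AWS CLI is installed and configured and that the configuration response includes the `sensitivityInspectionTemplateId` field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the current sensitivity inspection template to a
# local JSON file before you overwrite it, and print the file name.
# Assumes the AWS CLI is installed and configured.
backup_inspection_template() {
  local region="${1:?usage: backup_inspection_template REGION [FILE]}"
  local file="${2:-sensitivity-inspection-template-${region}.json}"
  local template_id
  # The template ID is part of the automated discovery configuration.
  template_id="$(aws macie2 get-automated-discovery-configuration \
    --region "$region" \
    --query sensitivityInspectionTemplateId --output text)" || return 1
  aws macie2 get-sensitivity-inspection-template \
    --region "$region" --id "$template_id" --output json > "$file" || return 1
  printf '%s\n' "$file"
}
```

For example, `backup_inspection_template us-east-1` writes the current settings to a file and prints the file's name.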

When you're ready to update the template, use the [UpdateSensitivityInspectionTemplate](https://docs.aws.amazon.com/macie/latest/APIReference/templates-sensitivity-inspections-id.html) operation or, if you're using the AWS CLI, run the [update-sensitivity-inspection-template](https://docs.aws.amazon.com/cli/latest/reference/macie2/update-sensitivity-inspection-template.html) command. In your request, use the appropriate parameters to add or remove one or more managed data identifiers from subsequent analyses:
+ To start using a managed data identifier, specify its ID for the `managedDataIdentifierIds` parameter of the `includes` parameter.
+ To stop using a managed data identifier, specify its ID for the `managedDataIdentifierIds` parameter of the `excludes` parameter.
+ To restore the default settings, don't specify any IDs for the `includes` and `excludes` parameters. Macie then starts using only the managed data identifiers that are in the default set.

In addition to the parameters for managed data identifiers, use the appropriate `includes` parameters to specify any custom data identifiers (`customDataIdentifierIds`) and allow lists (`allowListIds`) that you want Macie to use. Also specify the Region that your request applies to. If your request succeeds, Macie updates the template and returns an empty response.

The following examples show how to use the AWS CLI to update the sensitivity inspection template for an account. The examples add one managed data identifier and remove another from subsequent analyses. They also maintain current settings that specify two custom data identifiers to use.

This example is formatted for Linux, macOS, or Unix, and it uses the backslash (`\`) line-continuation character to improve readability.

```
$ aws macie2 update-sensitivity-inspection-template \
--id fd7b6d71c8006fcd6391e6eedexample \
--excludes '{"managedDataIdentifierIds":["UK_ELECTORAL_ROLL_NUMBER"]}' \
--includes '{"managedDataIdentifierIds":["STRIPE_CREDENTIALS"],"customDataIdentifierIds":["3293a69d-4a1e-4a07-8715-208ddexample","6fad0fb5-3e82-4270-bede-469f2example"]}'
```

This example is formatted for Microsoft Windows and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws macie2 update-sensitivity-inspection-template ^
--id fd7b6d71c8006fcd6391e6eedexample ^
--excludes={\"managedDataIdentifierIds\":[\"UK_ELECTORAL_ROLL_NUMBER\"]} ^
--includes={\"managedDataIdentifierIds\":[\"STRIPE_CREDENTIALS\"],\"customDataIdentifierIds\":[\"3293a69d-4a1e-4a07-8715-208ddexample\",\"6fad0fb5-3e82-4270-bede-469f2example\"]}
```

Where:
+ *fd7b6d71c8006fcd6391e6eedexample* is the unique identifier for the sensitivity inspection template to update.
+ *UK_ELECTORAL_ROLL_NUMBER* is the ID for the managed data identifier to stop using (*exclude*).
+ *STRIPE_CREDENTIALS* is the ID for the managed data identifier to start using (*include*).
+ *3293a69d-4a1e-4a07-8715-208ddexample* and *6fad0fb5-3e82-4270-bede-469f2example* are the unique identifiers for the custom data identifiers to use.
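If you maintain these ID lists in a script, you can build the JSON values instead of typing them by hand. The following bash sketch shows one way to do this. The function names are hypothetical, the AWS CLI is assumed to be installed and configured, and each list argument is a space-separated string of IDs. Because the update overwrites the template's current settings, the sketch always sends a complete set of managed and custom data identifier settings. It doesn't handle allow lists (`allowListIds`), so add them to the `includes` value if you use them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for composing the --includes and --excludes values.
# Assumes the AWS CLI is installed and configured.

# Print a JSON array of strings built from the arguments.
json_array() {
  local out="" item
  for item in "$@"; do
    out+="${out:+,}\"${item}\""
  done
  printf '[%s]\n' "$out"
}

# Usage: update_inspection_template TEMPLATE_ID "ADD_MDIS" "REMOVE_MDIS" "CDIS"
# Each list is a space-separated string of IDs. Allow lists aren't handled.
update_inspection_template() {
  local template_id="$1"
  local -a add_managed remove_managed custom
  read -r -a add_managed <<< "$2"
  read -r -a remove_managed <<< "$3"
  read -r -a custom <<< "$4"
  aws macie2 update-sensitivity-inspection-template \
    --id "$template_id" \
    --excludes "{\"managedDataIdentifierIds\":$(json_array "${remove_managed[@]}")}" \
    --includes "{\"managedDataIdentifierIds\":$(json_array "${add_managed[@]}"),\"customDataIdentifierIds\":$(json_array "${custom[@]}")}"
}
```

For example, the following call sends a request equivalent to the preceding Linux example: `update_inspection_template fd7b6d71c8006fcd6391e6eedexample STRIPE_CREDENTIALS UK_ELECTORAL_ROLL_NUMBER '3293a69d-4a1e-4a07-8715-208ddexample 6fad0fb5-3e82-4270-bede-469f2example'`.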

------

## Adding or removing custom data identifiers from automated sensitive data discovery
<a name="discovery-asdd-account-configure-cdis"></a>

A *custom data identifier* is a set of criteria that you define to detect sensitive data. The criteria consist of a regular expression (*regex*) that defines a text pattern to match and, optionally, character sequences and a proximity rule that refine the results. To learn more, see [Building custom data identifiers](custom-data-identifiers.md).

By default, Amazon Macie doesn't use custom data identifiers when it performs automated sensitive data discovery. If you want Macie to use specific custom data identifiers, you can add them to subsequent analyses. Macie then uses the custom data identifiers in addition to any managed data identifiers that you configure Macie to use.

If you add a custom data identifier, you can later remove it. Your change doesn't affect existing sensitive data discovery statistics and details for S3 buckets. That is to say, if you remove a custom data identifier that previously produced detections for a bucket, Macie continues to report those detections. However, instead of removing the identifier, which affects subsequent analyses of all buckets, consider excluding its detections from sensitivity scores for only particular buckets. For more information, see [Adjusting sensitivity scores for S3 buckets](discovery-asdd-s3bucket-manage.md).

**To add or remove custom data identifiers from automated sensitive data discovery**  
You can add or remove custom data identifiers by using the Amazon Macie console or the Amazon Macie API.

------
#### [ Console ]

Follow these steps to add or remove a custom data identifier by using the Amazon Macie console.

**To add or remove a custom data identifier**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. By using the AWS Region selector in the upper-right corner of the page, choose the Region in which you want to add or remove a custom data identifier from analyses.

1. In the navigation pane, under **Settings**, choose **Automated sensitive data discovery**.

   The **Automated sensitive data discovery** page appears and displays your current settings. On that page, the **Custom data identifiers** section lists custom data identifiers that you already added, or it indicates that you haven't added any custom data identifiers.

1. In the **Custom data identifiers** section, choose **Edit**.

1. Do any of the following:
   + To add one or more custom data identifiers, select the checkbox for each custom data identifier to add. If a checkbox is already selected, you already added that identifier.
   + To remove one or more custom data identifiers, clear the checkbox for each custom data identifier to remove. If a checkbox is already cleared, Macie doesn't currently use that identifier.

   **Tip**  
   To review or test the settings for a custom data identifier before you add or remove it, choose the link icon (![The link icon, which is a blue box that has an arrow in it.](http://docs.aws.amazon.com/macie/latest/user/images/icon-external-link.png)) next to the identifier's name. Macie opens a page that displays the identifier's settings. To also test the identifier with sample data, enter up to 1,000 characters of text in the **Sample data** box on that page. Then choose **Test**. Macie evaluates the sample data and reports the number of matches.

1. When you finish, choose **Save**.

------
#### [ API ]

To add or remove a custom data identifier programmatically, use the Amazon Macie API to update the sensitivity inspection template for your account. The template stores settings that specify which custom data identifiers you want Macie to use when performing automated sensitive data discovery. The settings also specify which managed data identifiers and allow lists to use.

When you update the template, you overwrite its current settings. Therefore, it's a good idea to start by retrieving your current settings and determining which ones you want to keep. To retrieve your current settings, use the [GetSensitivityInspectionTemplate](https://docs.aws.amazon.com/macie/latest/APIReference/templates-sensitivity-inspections-id.html) operation. If you're using the AWS Command Line Interface (AWS CLI), run the [get-sensitivity-inspection-template](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-sensitivity-inspection-template.html) command to retrieve the settings.

To retrieve or update the template, you have to specify its unique identifier (`id`). You can get this identifier by using the [GetAutomatedDiscoveryConfiguration](https://docs.aws.amazon.com/macie/latest/APIReference/automated-discovery-configuration.html) operation. This operation retrieves your current configuration settings for automated sensitive data discovery, including the unique identifier for the sensitivity inspection template for your account in the current AWS Region. If you're using the AWS CLI, run the [get-automated-discovery-configuration](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-automated-discovery-configuration.html) command to retrieve this information.
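To look up the template ID and save the current settings from a script, you could wrap the two `get-` commands in shell functions, as in the following sketch. It assumes a POSIX shell, configured AWS credentials, and a placeholder Region; the function names and the output file name are hypothetical. The `--query` option is the AWS CLI's built-in JMESPath filter, used here to return only the `sensitivityInspectionTemplateId` field of the response.

```shell
#!/bin/sh
# Sketch: retrieve the template ID, then save the template's current
# settings so that you can decide which ones to keep.

REGION="us-east-1"   # placeholder: the Region to work in

# Print the unique identifier of the sensitivity inspection template for
# your account in $REGION.
get_template_id() {
  aws macie2 get-automated-discovery-configuration \
    --region "$REGION" \
    --query sensitivityInspectionTemplateId \
    --output text
}

# Write the current settings of the template whose ID is $1 to the file
# named in $2, as JSON.
save_template_settings() {
  aws macie2 get-sensitivity-inspection-template \
    --region "$REGION" \
    --id "$1" \
    --output json > "$2"
}
```

For example, `TEMPLATE_ID="$(get_template_id)"` followed by `save_template_settings "$TEMPLATE_ID" current-template.json` writes the settings to *current-template.json*, where you can review the `includes` and `excludes` values before you update the template.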

When you're ready to update the template, use the [UpdateSensitivityInspectionTemplate](https://docs.aws.amazon.com/macie/latest/APIReference/templates-sensitivity-inspections-id.html) operation or, if you're using the AWS CLI, run the [update-sensitivity-inspection-template](https://docs.aws.amazon.com/cli/latest/reference/macie2/update-sensitivity-inspection-template.html) command. In your request, use the `customDataIdentifierIds` parameter to add or remove one or more custom data identifiers from subsequent analyses: 
+ To start using a custom data identifier, specify its unique identifier for the parameter.
+ To stop using a custom data identifier, omit its unique identifier from the parameter.

Use additional parameters to specify which managed data identifiers and allow lists you want Macie to use. Also specify the Region that your request applies to. If your request succeeds, Macie updates the template and returns an empty response.

The following examples show how to use the AWS CLI to update the sensitivity inspection template for an account. The examples add two custom data identifiers to subsequent analyses. They also preserve the current managed data identifier and allow list settings: the default set of managed data identifiers and one allow list.

This example is formatted for Linux, macOS, or Unix, and it uses the backslash (`\`) line-continuation character to improve readability.

```
$ aws macie2 update-sensitivity-inspection-template \
--id fd7b6d71c8006fcd6391e6eedexample \
--includes '{"allowListIds":["nkr81bmtu2542yyexample"],"customDataIdentifierIds":["3293a69d-4a1e-4a07-8715-208ddexample","6fad0fb5-3e82-4270-bede-469f2example"]}'
```

This example is formatted for Microsoft Windows, and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws macie2 update-sensitivity-inspection-template ^
--id fd7b6d71c8006fcd6391e6eedexample ^
--includes={\"allowListIds\":[\"nkr81bmtu2542yyexample\"],\"customDataIdentifierIds\":[\"3293a69d-4a1e-4a07-8715-208ddexample\",\"6fad0fb5-3e82-4270-bede-469f2example\"]}
```

Where:
+ *fd7b6d71c8006fcd6391e6eedexample* is the unique identifier for the sensitivity inspection template to update.
+ *nkr81bmtu2542yyexample* is the unique identifier for the allow list to use.
+ *3293a69d-4a1e-4a07-8715-208ddexample* and *6fad0fb5-3e82-4270-bede-469f2example* are the unique identifiers for the custom data identifiers to use.
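Removing a custom data identifier works the same way: you send the list of IDs to keep and omit the one to stop using. The following POSIX shell sketch derives the new `--includes` value from a list of current IDs. The `without_id` and `json_string_array` helpers and the variable names are hypothetical, the IDs are the example IDs from this section, and the sketch assumes that IDs contain no spaces, double quotes, or backslashes.

```shell
#!/bin/sh
# Sketch: stop using one custom data identifier by omitting its ID, while
# keeping the current allow list. Names are hypothetical.

TEMPLATE_ID="fd7b6d71c8006fcd6391e6eedexample"

# Print the arguments as a JSON array of strings, for example ["a","b"].
json_string_array() {
  out=""
  for id in "$@"; do
    if [ -n "$out" ]; then
      out="$out,"
    fi
    out="$out\"$id\""
  done
  printf '[%s]' "$out"
}

# Print every argument after the first, except those equal to $1,
# separated by single spaces.
without_id() {
  drop="$1"
  shift
  kept=""
  for id in "$@"; do
    if [ "$id" != "$drop" ]; then
      kept="${kept:+$kept }$id"
    fi
  done
  printf '%s' "$kept"
}

CURRENT_CDI_IDS="3293a69d-4a1e-4a07-8715-208ddexample 6fad0fb5-3e82-4270-bede-469f2example"
CDI_TO_REMOVE="6fad0fb5-3e82-4270-bede-469f2example"

# Unquoted on purpose: each space-separated ID becomes its own argument.
KEPT_CDI_IDS="$(without_id "$CDI_TO_REMOVE" $CURRENT_CDI_IDS)"
INCLUDES="{\"allowListIds\":$(json_string_array nkr81bmtu2542yyexample),\"customDataIdentifierIds\":$(json_string_array $KEPT_CDI_IDS)}"

# Send the update when you're ready. Nothing above calls this function.
update_template() {
  aws macie2 update-sensitivity-inspection-template \
    --id "$TEMPLATE_ID" \
    --includes "$INCLUDES"
}
```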

------

## Adding or removing allow lists from automated sensitive data discovery
<a name="discovery-asdd-account-configure-als"></a>

In Amazon Macie, an allow list defines specific text or a text pattern that you want Macie to ignore when it inspects S3 objects for sensitive data. If text matches an entry or pattern in an allow list, Macie doesn’t report the text. This is the case even if the text matches the criteria of a managed or custom data identifier. To learn more, see [Defining sensitive data exceptions with allow lists](allow-lists.md).

By default, Macie doesn't use allow lists when it performs automated sensitive data discovery. If you want Macie to use specific allow lists, you can add them to subsequent analyses. If you add an allow list, you can later remove it.

**To add or remove allow lists from automated sensitive data discovery**  
You can add or remove allow lists by using the Amazon Macie console or the Amazon Macie API.

------
#### [ Console ]

Follow these steps to add or remove an allow list by using the Amazon Macie console.

**To add or remove an allow list**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. By using the AWS Region selector in the upper-right corner of the page, choose the Region in which you want to add or remove an allow list from analyses.

1. In the navigation pane, under **Settings**, choose **Automated sensitive data discovery**. 

   The **Automated sensitive data discovery** page appears and displays your current settings. On that page, the **Allow lists** section specifies allow lists that you already added, or it indicates that you haven't added any allow lists.

1. In the **Allow lists** section, choose **Edit**.

1. Do any of the following:
   + To add one or more allow lists, select the checkbox for each allow list to add. If a checkbox is already selected, you already added that list.
   + To remove one or more allow lists, clear the checkbox for each allow list to remove. If a checkbox is already cleared, Macie doesn't currently use that list.

   **Tip**  
   To review the settings for an allow list before you add or remove it, choose the link icon (![The link icon, which is a blue box that has an arrow in it.](http://docs.aws.amazon.com/macie/latest/user/images/icon-external-link.png)) next to the list's name. Macie opens a page that displays the list's settings. If the list specifies a regular expression (*regex*), you can also use this page to test the regex with sample data. To do this, enter up to 1,000 characters of text in the **Sample data** box, and then choose **Test**. Macie evaluates the sample data and reports the number of matches.

1. When you finish, choose **Save**.

------
#### [ API ]

To add or remove an allow list programmatically, use the Amazon Macie API to update the sensitivity inspection template for your account. The template stores settings that specify which allow lists you want Macie to use when performing automated sensitive data discovery. The settings also specify which managed data identifiers and custom data identifiers to use.

When you update the template, you overwrite its current settings. Therefore, it's a good idea to start by retrieving your current settings and determining which ones you want to keep. To retrieve your current settings, use the [GetSensitivityInspectionTemplate](https://docs.aws.amazon.com/macie/latest/APIReference/templates-sensitivity-inspections-id.html) operation. If you're using the AWS Command Line Interface (AWS CLI), run the [get-sensitivity-inspection-template](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-sensitivity-inspection-template.html) command to retrieve the settings.

To retrieve or update the template, you have to specify its unique identifier (`id`). You can get this identifier by using the [GetAutomatedDiscoveryConfiguration](https://docs.aws.amazon.com/macie/latest/APIReference/automated-discovery-configuration.html) operation. This operation retrieves your current configuration settings for automated sensitive data discovery, including the unique identifier for the sensitivity inspection template for your account in the current AWS Region. If you're using the AWS CLI, run the [get-automated-discovery-configuration](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-automated-discovery-configuration.html) command to retrieve this information.
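Because an update replaces the template's settings, one way to avoid retyping the IDs that you want to keep is to save the current `includes` value to a file, edit it, and pass the file back. The AWS CLI reads a parameter value from a local file when the value starts with `file://`. The following sketch assumes a POSIX shell, configured credentials, and a placeholder Region and template ID; the function names and the *includes.json* file name are hypothetical. It handles only `includes`. If your template also has `excludes` settings that you want to keep, save and pass them the same way.

```shell
#!/bin/sh
# Sketch: save the current includes settings, edit the file, then apply it.

REGION="us-east-1"                              # placeholder
TEMPLATE_ID="fd7b6d71c8006fcd6391e6eedexample"  # placeholder

# Save the template's current includes settings (allow lists, custom data
# identifiers, and managed data identifiers) to includes.json.
save_includes() {
  aws macie2 get-sensitivity-inspection-template \
    --region "$REGION" \
    --id "$TEMPLATE_ID" \
    --query includes \
    --output json > includes.json
}

# After you edit includes.json (for example, to add or remove an ID in
# allowListIds), send it back as the new includes settings.
apply_includes() {
  aws macie2 update-sensitivity-inspection-template \
    --region "$REGION" \
    --id "$TEMPLATE_ID" \
    --includes file://includes.json
}
```

If the template has no `includes` settings yet, the saved file might not contain a JSON object, so check its contents before you run `apply_includes`.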

When you're ready to update the template, use the [UpdateSensitivityInspectionTemplate](https://docs.aws.amazon.com/macie/latest/APIReference/templates-sensitivity-inspections-id.html) operation or, if you're using the AWS CLI, run the [update-sensitivity-inspection-template](https://docs.aws.amazon.com/cli/latest/reference/macie2/update-sensitivity-inspection-template.html) command. In your request, use the `allowListIds` parameter to add or remove one or more allow lists from subsequent analyses:
+ To start using an allow list, specify its unique identifier for the parameter.
+ To stop using an allow list, omit its unique identifier from the parameter.

Use additional parameters to specify which managed data identifiers and custom data identifiers you want Macie to use. Also specify the Region that your request applies to. If your request succeeds, Macie updates the template and returns an empty response.

The following examples show how to use the AWS CLI to update the sensitivity inspection template for an account. The examples add an allow list to subsequent analyses. They also preserve the current managed data identifier and custom data identifier settings: the default set of managed data identifiers and two custom data identifiers.

This example is formatted for Linux, macOS, or Unix, and it uses the backslash (`\`) line-continuation character to improve readability.

```
$ aws macie2 update-sensitivity-inspection-template \
--id fd7b6d71c8006fcd6391e6eedexample \
--includes '{"allowListIds":["nkr81bmtu2542yyexample"],"customDataIdentifierIds":["3293a69d-4a1e-4a07-8715-208ddexample","6fad0fb5-3e82-4270-bede-469f2example"]}'
```

This example is formatted for Microsoft Windows, and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws macie2 update-sensitivity-inspection-template ^
--id fd7b6d71c8006fcd6391e6eedexample ^
--includes={\"allowListIds\":[\"nkr81bmtu2542yyexample\"],\"customDataIdentifierIds\":[\"3293a69d-4a1e-4a07-8715-208ddexample\",\"6fad0fb5-3e82-4270-bede-469f2example\"]}
```

Where:
+ *fd7b6d71c8006fcd6391e6eedexample* is the unique identifier for the sensitivity inspection template to update.
+ *nkr81bmtu2542yyexample* is the unique identifier for the allow list to use.
+ *3293a69d-4a1e-4a07-8715-208ddexample* and *6fad0fb5-3e82-4270-bede-469f2example* are the unique identifiers for the custom data identifiers to use.

------

# Disabling automated sensitive data discovery
<a name="discovery-asdd-account-disable"></a>

You can disable automated sensitive data discovery for an account or organization at any time. If you do this, Amazon Macie stops performing all automated discovery activities for the account or organization before a subsequent evaluation and analysis cycle starts, typically within 48 hours. Additional effects vary:
+ If you're a Macie administrator and you disable it for an individual account in your organization, you and the account can continue to access all statistical data, inventory data, and other information that Macie produced and directly provided while performing automated discovery for the account. You can subsequently enable automated discovery for the account again. Macie then resumes all automated discovery activities for the account.
+ If you're a Macie administrator and you disable it for your organization, you and the accounts in your organization lose access to all statistical data, inventory data, and other information that Macie produced and directly provided while performing automated discovery for your organization. For example, your S3 bucket inventory no longer includes sensitivity visualizations or analysis statistics. You can subsequently enable automated discovery for your organization again. Macie then resumes all automated discovery activities for accounts in your organization. If you re-enable it within 30 days, you and the accounts regain access to data and information that Macie previously produced and directly provided while performing automated discovery. If you don't re-enable it within 30 days, Macie permanently deletes this data and information.
+ If you disable it for your standalone Macie account, you lose access to all statistical data, inventory data, and other information that Macie produced and directly provided while performing automated discovery for your account. If you don't re-enable it within 30 days, Macie permanently deletes this data and information.

You can continue to access sensitive data findings that Macie produced while performing automated sensitive data discovery for the account or organization. Macie stores findings for 90 days. Macie also retains your configuration settings for automated discovery. In addition, data that you stored or published to other AWS services, such as sensitive data discovery results in Amazon S3 and finding events in Amazon EventBridge, remains intact and isn't affected.

**To disable automated sensitive data discovery**  
If you're the Macie administrator for an organization or you have a standalone Macie account, you can disable automated sensitive data discovery by using the Amazon Macie console or the Amazon Macie API. If you have a member account in an organization, work with your Macie administrator to disable automated discovery for your account. Only your Macie administrator can disable automated discovery for your account.

------
#### [ Console ]

Follow these steps to disable automated sensitive data discovery by using the Amazon Macie console.

**To disable automated sensitive data discovery**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. By using the AWS Region selector in the upper-right corner of the page, choose the Region in which you want to disable automated sensitive data discovery.

1. In the navigation pane, under **Settings**, choose **Automated sensitive data discovery**.

1. If you're the Macie administrator for an organization, choose an option in the **Status** section to specify the accounts to disable automated sensitive data discovery for:
   + To disable it for only particular member accounts, choose **Manage accounts**. Then, in the table on the **Accounts** page, select the checkbox for each account to disable it for. When you finish, choose **Disable automated sensitive data discovery** on the **Actions** menu.
   + To disable it for only your Macie administrator account, choose **Disable**. In the dialog box that appears, choose **My account**, and then choose **Disable**.
   + To disable it for all the accounts in your organization and your organization overall, choose **Disable**. In the dialog box that appears, choose **My organization**, and then choose **Disable**.

1. If you have a standalone Macie account, choose **Disable** in the **Status** section.

If you use Macie in multiple Regions and want to disable automated sensitive data discovery in additional Regions, repeat the preceding steps in each additional Region.

------
#### [ API ]

With the Amazon Macie API, you can disable automated sensitive data discovery in two ways. How you disable it depends partly on the type of account that you have. If you're the Macie administrator for an organization, it also depends on whether you want to disable automated discovery for only particular member accounts or your organization overall. If you disable it for your organization, you disable it for all the accounts that are currently part of your organization. If additional accounts subsequently join your organization, automated discovery is also disabled for those accounts.

To disable automated sensitive data discovery for an organization or a standalone Macie account, use the [UpdateAutomatedDiscoveryConfiguration](https://docs.aws.amazon.com/macie/latest/APIReference/automated-discovery-configuration.html) operation. Or, if you're using the AWS Command Line Interface (AWS CLI), run the [update-automated-discovery-configuration](https://docs.aws.amazon.com/cli/latest/reference/macie2/update-automated-discovery-configuration.html) command. In your request, specify `DISABLED` for the `status` parameter.

To disable automated sensitive data discovery for only particular member accounts in an organization, use the [BatchUpdateAutomatedDiscoveryAccounts](https://docs.aws.amazon.com/macie/latest/APIReference/automated-discovery-accounts.html) operation. Or, if you're using the AWS CLI, run the [batch-update-automated-discovery-accounts](https://docs.aws.amazon.com/cli/latest/reference/macie2/batch-update-automated-discovery-accounts.html) command. In your request, use the `accountId` parameter to specify the account ID for an account that you want to disable automated discovery for. For the `status` parameter, specify `DISABLED`. To disable automated discovery for an account, Macie must currently be enabled for the account.

The following examples show how to use the AWS CLI to disable automated sensitive data discovery for one or more accounts in an organization. This first example disables automated discovery for an organization. It disables automated discovery for the Macie administrator account and all member accounts in the organization.

```
$ aws macie2 update-automated-discovery-configuration --status DISABLED --region us-east-1
```

Where *us-east-1* is the Region in which to disable automated sensitive data discovery for the organization, the US East (N. Virginia) Region. If the request succeeds, Macie disables automated discovery for the organization and returns an empty response.
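Each request applies to only the Region that you specify. If you use Macie in several Regions, you could run the same command in a loop, as in the following sketch. The Region list is a placeholder, the function names are hypothetical, and the sketch assumes a POSIX shell with credentials for the Macie administrator account or standalone account. The `show_status` function reads the `status` field of the `get-automated-discovery-configuration` response so that you can confirm the configuration change in each Region.

```shell
#!/bin/sh
# Sketch: disable automated sensitive data discovery for the organization (or
# standalone account) in each Region listed in $REGIONS.

REGIONS="us-east-1 us-west-2"   # placeholder: Regions where you use Macie

disable_in_regions() {
  for region in $REGIONS; do
    echo "Disabling automated sensitive data discovery in $region"
    aws macie2 update-automated-discovery-configuration \
      --status DISABLED \
      --region "$region" || return 1   # stop at the first failed request
  done
}

# Print the current status for each Region.
show_status() {
  for region in $REGIONS; do
    printf '%s: ' "$region"
    aws macie2 get-automated-discovery-configuration \
      --region "$region" \
      --query status \
      --output text
  done
}
```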

These next examples disable automated sensitive data discovery for two member accounts in an organization. This example is formatted for Linux, macOS, or Unix, and it uses the backslash (`\`) line-continuation character to improve readability.

```
$ aws macie2 batch-update-automated-discovery-accounts \
--region us-east-1 \
--accounts '[{"accountId":"123456789012","status":"DISABLED"},{"accountId":"111122223333","status":"DISABLED"}]'
```

This example is formatted for Microsoft Windows, and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws macie2 batch-update-automated-discovery-accounts ^
--region us-east-1 ^
--accounts=[{\"accountId\":\"123456789012\",\"status\":\"DISABLED\"},{\"accountId\":\"111122223333\",\"status\":\"DISABLED\"}]
```

Where:
+ *us-east-1* is the Region in which to disable automated sensitive data discovery for the specified accounts, the US East (N. Virginia) Region.
+ *123456789012* and *111122223333* are the account IDs for the accounts to disable automated sensitive data discovery for.
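For more than a few accounts, typing the `--accounts` JSON by hand is error prone. The following POSIX shell sketch generates it from a list of account IDs. The `accounts_json` and `disable_accounts` functions are hypothetical, the account IDs are the example IDs from this section, and the Region is a placeholder.

```shell
#!/bin/sh
# Sketch: build the --accounts value for batch-update-automated-discovery-accounts.

REGION="us-east-1"   # placeholder

# Print a JSON array with one {"accountId":...,"status":...} object for each
# account ID after the first argument. $1 is the status, such as DISABLED.
accounts_json() {
  acct_status="$1"
  shift
  out=""
  for account_id in "$@"; do
    if [ -n "$out" ]; then
      out="$out,"
    fi
    out="${out}{\"accountId\":\"$account_id\",\"status\":\"$acct_status\"}"
  done
  printf '[%s]' "$out"
}

ACCOUNTS="$(accounts_json DISABLED 123456789012 111122223333)"

# Send the request when you're ready. Nothing above calls this function.
disable_accounts() {
  aws macie2 batch-update-automated-discovery-accounts \
    --region "$REGION" \
    --accounts "$ACCOUNTS"
}
```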

If the request succeeds for all specified accounts, Macie returns an empty `errors` array. If the request fails for some accounts, the array specifies the error that occurred for each affected account. For example:

```
"errors": [
    {
        "accountId": "123456789012",
        "errorCode": "ACCOUNT_PAUSED"
    }
]
```

In the preceding response, the request failed for the specified account (`123456789012`) because Macie is currently suspended for the account.
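In a script, you can check the `errors` array instead of reading the raw response. The following sketch uses the AWS CLI `--query` option to return only the account ID and error code for each failed account, as tab-separated text, and prints one line for each. It assumes a POSIX shell and a placeholder Region, the function names are hypothetical, and it assumes that the text output is empty or blank when the `errors` array is empty.

```shell
#!/bin/sh
# Sketch: report accounts that a batch update failed for.

# Send the request and print one "accountId<TAB>errorCode" line per failure.
# $1 is the --accounts JSON value.
disable_accounts_errors() {
  aws macie2 batch-update-automated-discovery-accounts \
    --region us-east-1 \
    --accounts "$1" \
    --query 'errors[].[accountId,errorCode]' \
    --output text
}

# Read "accountId<TAB>errorCode" lines on stdin and print a message for each.
# Returns 1 if any error lines were read, and 0 otherwise.
report_errors() {
  tab="$(printf '\t')"
  failed=0
  while IFS="$tab" read -r account_id error_code; do
    if [ -z "$account_id" ]; then
      continue   # skip blank lines
    fi
    echo "Request failed for account $account_id: $error_code"
    failed=1
  done
  return "$failed"
}
```

For example, `disable_accounts_errors '[{"accountId":"123456789012","status":"DISABLED"}]' | report_errors`. If the request fails for all accounts, the AWS CLI reports an error message instead of an `errors` array, as described next.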

If the request fails for all accounts, you receive a message that describes the error that occurred. For example:

```
An error occurred (ConflictException) when calling the BatchUpdateAutomatedDiscoveryAccounts operation: Cannot modify account states while auto-enable is set to ALL.
```

In the preceding response, the request failed because the member enablement setting for the organization is currently configured to enable automated sensitive data discovery for all accounts (`ALL`). To address the error, the Macie administrator must first change this setting to `NONE` or `NEW`. For information about this setting, see [Enabling automated sensitive data discovery](discovery-asdd-account-enable.md).
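Before you retry the request, you can check the member enablement setting and, if necessary, change it. The following sketch assumes a POSIX shell, a placeholder Region, and that the `get-automated-discovery-configuration` response reports this setting in an `autoEnableOrganizationMembers` field that `update-automated-discovery-configuration` accepts through an `--auto-enable-organization-members` option. Verify these names in the AWS CLI command reference for your CLI version. The function names are hypothetical. The sketch also assumes that the update request requires a `status` value, so it passes `ENABLED` to keep automated discovery enabled for the organization overall.

```shell
#!/bin/sh
# Sketch: check whether the member enablement setting blocks per-account
# changes, and change it to NEW if it does. Option and field names are
# assumptions; verify them against the AWS CLI reference.

REGION="us-east-1"   # placeholder

# Print the current member enablement setting: ALL, NEW, or NONE.
get_auto_enable_setting() {
  aws macie2 get-automated-discovery-configuration \
    --region "$REGION" \
    --query autoEnableOrganizationMembers \
    --output text
}

# Succeed (return 0) if the setting prevents per-account status changes.
blocks_account_updates() {
  [ "$1" = "ALL" ]
}

# Switch the setting to NEW so that, as this sketch assumes, only accounts
# that join later are enabled automatically and existing accounts can be
# changed individually.
allow_account_updates() {
  aws macie2 update-automated-discovery-configuration \
    --region "$REGION" \
    --status ENABLED \
    --auto-enable-organization-members NEW
}
```

For example: `if blocks_account_updates "$(get_auto_enable_setting)"; then allow_account_updates; fi`.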

------

# Reviewing automated sensitive data discovery results
<a name="discovery-asdd-results-s3"></a>

If automated sensitive data discovery is enabled, Amazon Macie automatically generates and maintains additional inventory data, statistics, and other information about the Amazon Simple Storage Service (Amazon S3) general purpose buckets for your account. If you're the Macie administrator for an organization, by default this includes S3 buckets that your member accounts own.

The additional information captures the results of automated sensitive data discovery activities that Macie has performed thus far. It also supplements other information that Macie provides about your Amazon S3 data, such as public access and encryption settings for individual S3 buckets. In addition to metadata and statistics, Macie produces records of the sensitive data it finds and the analysis that it performs—*sensitive data findings* and *sensitive data discovery results*.

As automated sensitive data discovery progresses each day, the following features and data can help you review and evaluate the results:
+ [**Summary** dashboard](discovery-asdd-results-s3-dashboard.md) – Provides aggregated statistics for your Amazon S3 data estate. The statistics include data for key metrics such as the total number of buckets that Macie has found sensitive data in, and how many of those buckets are publicly accessible. They also report issues that affect coverage of your Amazon S3 data.
+ [**S3 buckets** heat map](discovery-asdd-results-s3-inventory-map.md) – Provides an interactive, visual representation of data sensitivity across your data estate, grouped by AWS account. For each account, the map includes aggregated sensitivity statistics and it uses colors to indicate the current sensitivity score for each bucket that the account owns. The map also uses symbols to help you identify buckets that are publicly accessible, can't be analyzed by Macie, and more.
+ [**S3 buckets** table](discovery-asdd-results-s3-inventory-table.md) – Provides summary information for each S3 bucket in your inventory. For each bucket, the table includes data such as the bucket's current sensitivity score, the number of objects that Macie can analyze in the bucket, and whether you configured any sensitive data discovery jobs to periodically analyze objects in the bucket. You can export data from the table to a comma-separated values (CSV) file.
+ [**S3 bucket** details](discovery-asdd-results-s3-inventory-details.md) – Provides detailed statistics and information about an S3 bucket. The details include a list of objects that Macie has analyzed in the bucket, and a breakdown of the types and number of occurrences of sensitive data that Macie has found in the bucket. These are in addition to details about settings that affect the security and privacy of the bucket’s data.
+ [**Sensitive data findings**](discovery-asdd-results-s3-findings.md) – Provide detailed reports of sensitive data that Macie found in individual S3 objects. The details include when Macie found the sensitive data, and the types and number of occurrences of the sensitive data that Macie found. The details also include information about the affected S3 bucket and object, including the bucket's public access settings and when the object was most recently changed.
+ [**Sensitive data discovery results**](discovery-asdd-results-s3-sddrs.md) – Provide records of the analysis that Macie performed for individual S3 objects. This includes objects that Macie doesn't find sensitive data in, and objects that Macie can't analyze due to issues or errors. If Macie finds sensitive data in an object, the sensitive data discovery result provides information about the sensitive data that Macie found.

With this data, you can evaluate data sensitivity across your Amazon S3 data estate and drill down to evaluate and investigate individual S3 buckets and objects. Combined with information that Macie provides about the security and privacy of your Amazon S3 data, you can also identify cases where immediate remediation might be necessary—for example, a publicly accessible bucket that Macie found sensitive data in.

Additional data can help you assess and monitor coverage of your Amazon S3 data. With coverage data, you can check the status of the analyses for your data estate overall and individual S3 buckets within it. You can also identify issues that prevented Macie from analyzing objects in specific buckets. If you remediate the issues, you can increase coverage of your Amazon S3 data during subsequent analysis cycles. For more information, see [Assessing automated sensitive data discovery coverage](discovery-coverage.md).

**Topics**
+ [Reviewing data sensitivity statistics on the Summary dashboard](discovery-asdd-results-s3-dashboard.md)
+ [Visualizing data sensitivity with the S3 buckets map](discovery-asdd-results-s3-inventory-map.md)
+ [Assessing data sensitivity with the S3 buckets table](discovery-asdd-results-s3-inventory-table.md)
+ [Reviewing data sensitivity details for S3 buckets](discovery-asdd-results-s3-inventory-details.md)
+ [Analyzing findings from automated sensitive data discovery](discovery-asdd-results-s3-findings.md)
+ [Accessing discovery results from automated sensitive data discovery](discovery-asdd-results-s3-sddrs.md)

# Reviewing data sensitivity statistics on the Summary dashboard
<a name="discovery-asdd-results-s3-dashboard"></a>

On the Amazon Macie console, the **Summary** dashboard provides a snapshot of aggregated statistics and findings data for your Amazon Simple Storage Service (Amazon S3) data in the current AWS Region. It's designed to help you assess the overall security posture of your Amazon S3 data.

Dashboard statistics include data for key security metrics such as the number of S3 general purpose buckets that are publicly accessible or shared with other AWS accounts. The dashboard also displays groups of aggregated findings data for your account—for example, the buckets that generated the most findings during the preceding seven days. If you're the Macie administrator for an organization, the dashboard provides aggregated statistics and data for all the accounts in your organization. You can optionally filter the data by account.

If automated sensitive data discovery is enabled, the **Summary** dashboard includes additional statistics. The statistics capture the status and results of automated discovery activities that Macie has performed thus far for your Amazon S3 data. The following image shows an example of these statistics. 

![\[Sensitive data discovery statistics on the Summary dashboard. Each statistic has example data.\]](http://docs.aws.amazon.com/macie/latest/user/images/scrn-summary-dashboard-sensitivity.png)


The statistics are organized primarily into two sections, **Automated discovery** and **Coverage issues**. Statistics in the **Automated discovery** section provide a snapshot of the current status and results of automated sensitive data discovery activities. Statistics in the **Coverage issues** section indicate whether issues prevented Macie from analyzing objects in individual S3 buckets. The statistics don't include data for sensitive data discovery jobs that you create and run. However, remediating coverage issues for automated sensitive data discovery is likely to also increase coverage by jobs that you subsequently run.

**Topics**
+ [Displaying the dashboard](#discovery-asdd-results-s3-dashboard-view)
+ [Understanding statistics on the dashboard](#discovery-asdd-results-s3-dashboard-statistics)

## Displaying the Summary dashboard
<a name="discovery-asdd-results-s3-dashboard-view"></a>

Follow these steps to display the **Summary** dashboard on the Amazon Macie console. To query the statistics programmatically, use the [GetBucketStatistics](https://docs.aws.amazon.com/macie/latest/APIReference/datasources-s3-statistics.html) operation of the Amazon Macie API.

**To display the Summary dashboard**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Summary**. Macie displays the **Summary** dashboard.

1. To drill down and review the supporting data for an item on the dashboard, choose the item.

If you're the Macie administrator for an organization, the dashboard displays aggregated statistics and data for your account and member accounts in your organization. To display data for only a particular account, enter the account's ID in the **Account** box above the dashboard.
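
If you query these statistics with the API instead, the response groups buckets by sensitivity much as the dashboard does. The following Python sketch flattens a GetBucketStatistics-style response into the four sensitivity groups that the dashboard displays. It assumes the response field names `bucketStatisticsBySensitivity`, `sensitive`, `notSensitive`, `notClassified`, `classificationError`, `totalCount`, `publiclyAccessibleCount`, `classifiableSizeInBytes`, and `sizeInBytes`; verify them against the API reference before relying on them. The function only processes a dictionary, so it runs without AWS credentials.

```python
from typing import Any


def summarize_sensitivity(stats: dict[str, Any]) -> dict[str, dict[str, int]]:
    """Flatten the sensitivity groups of a GetBucketStatistics-style response.

    Assumption: the response has a "bucketStatisticsBySensitivity" object whose
    keys are "sensitive", "notSensitive", "notClassified", and "classificationError".
    """
    # Map the assumed API keys to the labels used on the Summary dashboard.
    labels = {
        "sensitive": "Sensitive",
        "notSensitive": "Not sensitive",
        "notClassified": "Not yet analyzed",
        "classificationError": "Classification error",
    }
    groups = stats.get("bucketStatisticsBySensitivity", {})
    summary = {}
    for key, label in labels.items():
        group = groups.get(key, {})  # a missing group is reported as all zeros
        summary[label] = {
            "buckets": group.get("totalCount", 0),
            "publicly_accessible": group.get("publiclyAccessibleCount", 0),
            "classifiable_bytes": group.get("classifiableSizeInBytes", 0),
            "total_bytes": group.get("sizeInBytes", 0),
        }
    return summary
```

With the AWS SDK for Python (Boto3), you might obtain the input with `boto3.client("macie2").get_bucket_statistics()`, optionally passing `accountId` to limit the results to a single account.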

## Understanding sensitive data discovery statistics on the Summary dashboard
<a name="discovery-asdd-results-s3-dashboard-statistics"></a>

The **Summary** dashboard includes aggregated statistics that can help you monitor automated sensitive data discovery for your Amazon S3 data. It provides a snapshot of the current status and results of the analyses for your Amazon S3 data in the current AWS Region. For example, you can use dashboard statistics to quickly determine how many S3 buckets Amazon Macie has found sensitive data in, and how many of those buckets are publicly accessible. You can also assess coverage of your Amazon S3 data. Coverage statistics can help you identify issues that prevent Macie from analyzing objects in individual S3 buckets. 

On the dashboard, statistics for automated sensitive data discovery are organized into the following sections:
+ [Storage and sensitive data discovery](#discovery-asdd-results-s3-dashboard-storage-statistics)
+ [Automated discovery](#discovery-asdd-results-s3-dashboard-sensitivity-statistics)
+ [Coverage issues](#discovery-asdd-results-s3-dashboard-coverage-statistics)

Individual statistics in each section are as follows. For information about statistics in other sections of the dashboard, see [Understanding components of the Summary dashboard](monitoring-s3-dashboard.md#monitoring-s3-dashboard-components-main).

### Storage and sensitive data discovery
<a name="discovery-asdd-results-s3-dashboard-storage-statistics"></a>

At the top of the dashboard, statistics indicate how much data you store in Amazon S3, and how much of that data Amazon Macie can analyze to detect sensitive data. The following image shows an example of these statistics for an organization with seven accounts.

![\[The Storage and sensitive data discovery section of the dashboard. Each field contains example data.\]](http://docs.aws.amazon.com/macie/latest/user/images/scrn-summary-dashboard-storage.png)


Individual statistics in this section are:
+ **Total accounts** – This field appears if you're the Macie administrator for an organization or you have a standalone Macie account. It indicates the total number of AWS accounts that own buckets in your bucket inventory. If you're a Macie administrator, this is the total number of Macie accounts that you manage for your organization. If you have a standalone Macie account, this value is *1*.
+ **Total S3 buckets** – This field appears if you have a member account in an organization. It indicates the total number of general purpose buckets in your inventory, including buckets that don't store any objects.
+ **Storage** – These statistics provide information about the storage size of objects in your bucket inventory:
  + **Classifiable** – The total storage size of all the objects that Macie can analyze in the buckets.
  + **Total** – The total storage size of all the objects in the buckets, including objects that Macie can’t analyze.

  If any of the objects are compressed files, these values don’t reflect the actual size of those files after they’re decompressed. If versioning is enabled for any of the buckets, these values are based on the storage size of the latest version of each object in those buckets.
+ **Objects** – These statistics provide information about the number of objects in your bucket inventory:
  + **Classifiable** – The total number of objects that Macie can analyze in the buckets.
  + **Total** – The total number of objects in the buckets, including objects that Macie can’t analyze.

In the preceding statistics, data and objects are *classifiable* if they use a supported Amazon S3 storage class and they have a file name extension for a supported file or storage format. Macie can analyze classifiable objects to detect sensitive data. For more information, see [Supported storage classes and formats](discovery-supported-storage.md).
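
To make the classifiable-versus-total distinction concrete, the following sketch computes the **Storage** and **Objects** statistics for a list of objects. The `S3Object` type and both support sets are hypothetical; the sets used in any example are an illustrative subset, not Macie's full list of supported storage classes and formats. Following the notes above, only the latest version of each object counts, and sizes are stored (possibly compressed) sizes.

```python
from dataclasses import dataclass


@dataclass
class S3Object:
    """Hypothetical object record used only for this illustration."""
    key: str
    storage_class: str
    size_bytes: int  # stored size; compressed files are not expanded
    is_latest: bool = True  # only the latest version counts toward the totals


def is_classifiable(obj, supported_classes, supported_extensions):
    """True if the object uses a supported storage class and file name extension."""
    name = obj.key.rsplit("/", 1)[-1]  # ignore dots in "folder" prefixes
    _, dot, ext = name.rpartition(".")
    if not dot:
        return False  # no file name extension at all
    return obj.storage_class in supported_classes and ext.lower() in supported_extensions


def storage_and_object_stats(objects, supported_classes, supported_extensions):
    latest = [o for o in objects if o.is_latest]
    classifiable = [o for o in latest
                    if is_classifiable(o, supported_classes, supported_extensions)]
    return {
        "storage_classifiable": sum(o.size_bytes for o in classifiable),
        "storage_total": sum(o.size_bytes for o in latest),
        "objects_classifiable": len(classifiable),
        "objects_total": len(latest),
    }
```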

Note that **Storage** and **Objects** statistics don't include data about objects in buckets that Macie isn't allowed to access. To identify buckets where this is the case, choose the **Access denied** statistic in the **Coverage issues** section of the dashboard.

### Automated discovery
<a name="discovery-asdd-results-s3-dashboard-sensitivity-statistics"></a>

This section captures the status and results of automated sensitive data discovery activities that Amazon Macie has performed thus far for your Amazon S3 data. The following image shows an example of the statistics that this section provides.

![\[The Automated discovery section of the dashboard. A chart and related fields contain example data.\]](http://docs.aws.amazon.com/macie/latest/user/images/scrn-summary-dashboard-asdd.png)


Individual statistics in this section are as follows.

**Total buckets**  
The doughnut chart indicates the total number of buckets in your bucket inventory. The chart groups the buckets into categories based on each bucket's current sensitivity score:  
+ **Sensitive** (*red*) – The total number of buckets whose sensitivity score ranges from *51* through *100*.
+ **Not sensitive** (*blue*) – The total number of buckets whose sensitivity score ranges from *1* through *49*.
+ **Not yet analyzed** (*light gray*) – The total number of buckets whose sensitivity score is *50*.
+ **Classification error** (*dark gray*) – The total number of buckets whose sensitivity score is *-1*.
For details about the range of sensitivity scores and labels that Macie defines, see [Sensitivity scoring for S3 buckets](discovery-scoring-s3.md).  
To review additional statistics for a group, hover over the group:  
+ **Buckets** – The total number of buckets.
+ **Publicly accessible** – The total number of buckets that allow the general public to have read or write access to the bucket.
+ **Classifiable bytes** – The total storage size of all the objects that Macie can analyze in the buckets. These objects use supported Amazon S3 storage classes and they have file name extensions for supported file or storage formats. For more information, see [Supported storage classes and formats](discovery-supported-storage.md).
+ **Total bytes** – The total storage size of all the buckets.
In the preceding statistics, storage size values are based on the storage size of the latest version of each object in the buckets. If any of the objects are compressed files, these values don’t reflect the actual size of those files after they’re decompressed.
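
The grouping behind the doughnut chart can be sketched as follows. This is a minimal illustration, assuming bucket records shaped like the DescribeBuckets response (`sensitivityScore`, `publicAccess.effectivePermission`, `classifiableSizeInBytes`, `sizeInBytes`); it isn't how the console computes the chart.

```python
def sensitivity_group(score):
    """Map a sensitivity score to the doughnut chart group described above."""
    if score == -1:
        return "Classification error"
    if score == 50:
        return "Not yet analyzed"
    if 51 <= score <= 100:
        return "Sensitive"
    if 1 <= score <= 49:
        return "Not sensitive"
    raise ValueError(f"score outside the documented ranges: {score}")


def doughnut_groups(buckets):
    """Aggregate the hover statistics for each group (assumed field names)."""
    groups = {}
    for b in buckets:
        g = groups.setdefault(
            sensitivity_group(b["sensitivityScore"]),
            {"buckets": 0, "publicly_accessible": 0,
             "classifiable_bytes": 0, "total_bytes": 0},
        )
        g["buckets"] += 1
        if b.get("publicAccess", {}).get("effectivePermission") == "PUBLIC":
            g["publicly_accessible"] += 1
        g["classifiable_bytes"] += b.get("classifiableSizeInBytes", 0)
        g["total_bytes"] += b.get("sizeInBytes", 0)
    return groups
```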

**Sensitive**  
This area indicates the total number of buckets that currently have a sensitivity score ranging from *51* through *100*. Within this group, **Publicly accessible** indicates the total number of buckets that also allow the general public to have read or write access to the bucket.

**Not sensitive**  
This area indicates the total number of buckets that currently have a sensitivity score ranging from *1* through *49*. Within this group, **Publicly accessible** indicates the total number of buckets that also allow the general public to have read or write access to the bucket.

To determine and calculate values for **Publicly accessible** statistics, Macie analyzes a combination of account- and bucket-level settings for each bucket, such as the block public access settings for the account and bucket, and the bucket policy for the bucket. Macie does this for up to 10,000 buckets for an account. For more information, see [How Macie monitors Amazon S3 data security](monitoring-s3-how-it-works.md).

Note that statistics in the **Automated discovery** section don't include the results of sensitive data discovery jobs that you create and run.

### Coverage issues
<a name="discovery-asdd-results-s3-dashboard-coverage-statistics"></a>

In this section, statistics indicate whether certain types of issues prevented Amazon Macie from analyzing objects in individual S3 buckets. The following image shows an example of the statistics that this section provides.

![\[The Coverage issues section of the dashboard. Each field contains example data.\]](http://docs.aws.amazon.com/macie/latest/user/images/scrn-summary-dashboard-coverage.png)


Individual statistics in this section are:
+ **Access denied** – The total number of buckets that Macie isn't allowed to access. Macie can't analyze any objects in these buckets. The buckets' permissions settings prevent Macie from accessing the buckets and the buckets' objects.
+ **Classification error** – The total number of buckets that Macie hasn't analyzed yet due to object-level classification errors. Macie tried to analyze one or more objects in these buckets. However, Macie couldn't analyze the objects due to issues with object-level permissions settings, object content, or quotas.
+ **Unclassifiable** – The total number of buckets that don't store any classifiable objects. Macie can't analyze any objects in these buckets. All the objects use Amazon S3 storage classes that Macie doesn't support, or they have file name extensions for file or storage formats that Macie doesn't support. 
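
As a rough illustration, the following sketch tallies the three coverage statistics from DescribeBuckets-style records. The mapping is a heuristic and an assumption: it treats an `errorCode` of `ACCESS_DENIED` as access denied, a sensitivity score of -1 as a classification error, and a bucket that stores objects but no classifiable objects as unclassifiable. The console's own logic might differ.

```python
def coverage_issues(buckets):
    """Approximate the Coverage issues statistics (heuristic, assumed fields)."""
    counts = {"Access denied": 0, "Classification error": 0, "Unclassifiable": 0}
    for b in buckets:
        if b.get("errorCode") == "ACCESS_DENIED":
            # Macie can't read the bucket, so no other check is meaningful.
            counts["Access denied"] += 1
        elif b.get("sensitivityScore") == -1:
            counts["Classification error"] += 1
        elif b.get("objectCount", 0) > 0 and b.get("classifiableObjectCount", 0) == 0:
            # Objects exist, but none use a supported storage class and format.
            counts["Unclassifiable"] += 1
    return counts
```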

Choose the value for a statistic to display additional details and, as applicable, remediation guidance. If you remediate access issues and classification errors, you can increase coverage of your Amazon S3 data during subsequent analysis cycles. For more information, see [Assessing automated sensitive data discovery coverage](discovery-coverage.md).

Note that statistics in the **Coverage issues** section don't explicitly include data for sensitive data discovery jobs that you create and run. However, remediating coverage issues that affect automated sensitive data discovery is likely to also increase coverage by jobs that you subsequently run.

# Visualizing data sensitivity with the S3 buckets map
<a name="discovery-asdd-results-s3-inventory-map"></a>

On the Amazon Macie console, the **S3 buckets** heat map provides an interactive, visual representation of data sensitivity across your Amazon Simple Storage Service (Amazon S3) data estate. It captures the results of automated sensitive data discovery activities that Macie has performed thus far for your Amazon S3 data in the current AWS Region.

If you're the Macie administrator for an organization, the map includes results for S3 buckets that your member accounts own. The data is grouped by AWS account and sorted by account ID, as shown in the following image.

![\[The S3 buckets map. It shows different colored squares, one for each bucket, grouped by account.\]](http://docs.aws.amazon.com/macie/latest/user/images/scrn-s3-map-small.png)


The map displays data for up to 100 S3 buckets for each account. To display data for all buckets, you can [switch to table view](discovery-asdd-results-s3-inventory-table.md) and review the data in tabular format instead.

To display the map, choose **S3 buckets** in the navigation pane on the console. Then choose map (![\[The map view button, which is a button that displays four black squares.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-s3-map-view.png)) at the top of the page. The map is available only if automated sensitive data discovery is currently enabled. It doesn't include the results of sensitive data discovery jobs that you create and run.

**Topics**
+ [Interpreting data in the S3 buckets map](#discovery-asdd-results-s3-inventory-map-legend)
+ [Interacting with the S3 buckets map](#discovery-asdd-results-s3-inventory-map-use)

## Interpreting data in the S3 buckets map
<a name="discovery-asdd-results-s3-inventory-map-legend"></a>

In the **S3 buckets** map, each square represents an S3 general purpose bucket in your bucket inventory. The color of a square represents a bucket's current sensitivity score, which measures the intersection of two primary dimensions: the amount of sensitive data that Macie has found in the bucket, and the amount of data that Macie has analyzed in the bucket. The intensity of the color's hue represents where a score falls in a range of data sensitivity values, as shown in the following image.

![\[The color spectrum for sensitivity scores: blue hues for 1-49, red hues for 51-100, and gray for -1.\]](http://docs.aws.amazon.com/macie/latest/user/images/sensitivity-scoring-spectrum.png)


In general, you can interpret color and hue intensity as follows:
+ **Blue** – If a bucket's current sensitivity score ranges from *1* through *49*, the bucket's square is blue and the bucket's sensitivity label is **Not sensitive**. The intensity of the blue hue reflects the number of unique objects that Macie has analyzed in the bucket relative to the total number of unique objects in the bucket. A darker hue indicates a lower sensitivity score.
+ **No color** – If a bucket's current sensitivity score is *50*, the bucket's square isn't colored and the bucket's sensitivity label is **Not yet analyzed**. In addition, the square has a dashed border.
+ **Red** – If a bucket's current sensitivity score ranges from *51* through *100*, the bucket's square is red and the bucket's sensitivity label is **Sensitive**. The intensity of the red hue reflects the amount of sensitive data that Macie has found in the bucket. A darker hue indicates a higher sensitivity score.
+ **Gray** – If a bucket's current sensitivity score is *-1*, the bucket's square is dark gray and the bucket's sensitivity label is **Classification error**. Hue intensity doesn't vary.
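
The color rules above can be expressed as a small function. The score ranges, colors, labels, and dashed border come from this page; the linear intensity scale from 0.0 (lightest) to 1.0 (darkest) is an assumption made for illustration, because the page describes only the direction in which the hue darkens.

```python
from typing import NamedTuple


class Square(NamedTuple):
    color: str
    label: str
    intensity: float  # 0.0 lightest to 1.0 darkest; linear scale is an assumption
    dashed_border: bool


def map_square(score: int) -> Square:
    """Return the map styling for a bucket's sensitivity score."""
    if score == -1:
        return Square("dark gray", "Classification error", 1.0, False)  # fixed hue
    if score == 50:
        return Square("none", "Not yet analyzed", 0.0, True)
    if 1 <= score <= 49:
        # Lower score -> darker blue.
        return Square("blue", "Not sensitive", (50 - score) / 49, False)
    if 51 <= score <= 100:
        # Higher score -> darker red.
        return Square("red", "Sensitive", (score - 50) / 50, False)
    raise ValueError(f"score outside the documented ranges: {score}")
```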

For details about the range of sensitivity scores and labels that Macie defines, see [Sensitivity scoring for S3 buckets](discovery-scoring-s3.md).

In the map, the square for an S3 bucket might also contain a symbol. The symbol indicates an error, issue, or other type of consideration that might affect your evaluation of a bucket's sensitivity. A symbol can also indicate a potential issue with the security of the bucket—for example, the bucket is publicly accessible. The following table lists the symbols that Macie uses to notify you of these cases.


| Symbol | Definition | Description | 
| --- | --- | --- | 
|  ![\[The Access denied symbol, which is a gray exclamation point.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-map-access-denied.png)  | Access denied |  Macie isn't allowed to access the bucket or the bucket's objects. Consequently, Macie can't analyze any objects in the bucket.  This issue typically occurs because a bucket has a restrictive bucket policy. For information about how to address this issue, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md).  | 
|  ![\[The Publicly accessible symbol, which is a solid, gray, upward-facing arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-map-publicly-accessible.png)  | Publicly accessible |  The general public has read or write access to the bucket. To make this determination, Macie analyzes a combination of settings for each bucket, such as the block public access settings for the account and the bucket, and the bucket policy for the bucket. Macie can do this for up to 10,000 buckets for an account. For more information, see [How Macie monitors Amazon S3 data security](monitoring-s3-how-it-works.md).  | 
|  ![\[The Unclassifiable symbol, which is a gray question mark.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-map-unclassifiable.png)  | Unclassifiable |  Macie can't analyze any objects in the bucket. All the bucket's objects use Amazon S3 storage classes that Macie doesn't support, or they have file name extensions for file or storage formats that Macie doesn't support. For Macie to analyze an object, the object must use a supported storage class and have a file name extension for a supported file or storage format. For more information, see [Supported storage classes and formats](discovery-supported-storage.md).  | 
|  ![\[The Zero bytes symbol, which is the number zero.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-map-zero-bytes.png)  | Zero bytes |  The bucket doesn't store any objects for Macie to analyze. The bucket is empty or all the objects in the bucket contain zero (0) bytes of data.  | 
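
The following sketch shows how the symbols in the table could be derived for one bucket. It assumes DescribeBuckets-style fields (`errorCode`, `publicAccess.effectivePermission`, `sizeInBytes`, `classifiableObjectCount`). The precedence among Access denied, Zero bytes, and Unclassifiable is also an assumption, chosen so that a bucket that Macie can't access isn't also reported as empty.

```python
def bucket_symbols(bucket: dict) -> list[str]:
    """Return the map symbols that apply to one bucket (assumed field names)."""
    symbols = []
    if bucket.get("errorCode") == "ACCESS_DENIED":
        symbols.append("Access denied")
    elif bucket.get("sizeInBytes", 0) == 0:
        # Empty bucket, or every object contains zero bytes.
        symbols.append("Zero bytes")
    elif bucket.get("classifiableObjectCount", 0) == 0:
        symbols.append("Unclassifiable")
    # Public access is a separate security signal and can accompany any of the above.
    if bucket.get("publicAccess", {}).get("effectivePermission") == "PUBLIC":
        symbols.append("Publicly accessible")
    return symbols
```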

## Interacting with the S3 buckets map
<a name="discovery-asdd-results-s3-inventory-map-use"></a>

As you review the **S3 buckets** map, you can interact with it in different ways to reveal and evaluate additional data and details for individual accounts and buckets. Follow these steps to display the map and use various features that it provides. 

**To interact with the S3 buckets map**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **S3 buckets**. The **S3 buckets** page displays a map of your bucket inventory. If the page displays your inventory in tabular format instead, choose map (![\[The map view button, which is a button that displays four black squares.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-s3-map-view.png)) at the top of the page.

   By default, the map doesn't display data for buckets that are currently excluded from automated sensitive data discovery. If you're the Macie administrator for an organization, it also doesn't display data for accounts that automated sensitive data discovery is currently disabled for. To display this data, choose **X** in the **Is monitored by automated discovery** filter token below the filter box.

1. At the top of the page, optionally choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) to retrieve the latest bucket metadata from Amazon S3.

1. In the **S3 buckets** map, do any of the following:
   + To determine how many buckets have a specific sensitivity label, refer to the colored badges immediately below an AWS account ID. The badges display aggregated bucket counts, broken down by sensitivity label.

     For example, the red badge reports the total number of buckets that are owned by the account and have the **Sensitive** label. The sensitivity score for these buckets ranges from *51* through *100*. The blue badge reports the total number of buckets that are owned by the account and have the **Not sensitive** label. The sensitivity score for these buckets ranges from *1* through *49*.
   + To review a subset of information about a bucket, hover over the bucket's square. A popover displays the bucket's name and current sensitivity score.

     The popover also displays the total number of objects that Macie can analyze in the bucket and the total storage size of the latest version of those objects. These objects are *classifiable*. They use supported Amazon S3 storage classes and they have file name extensions for supported file or storage formats. For more information, see [Supported storage classes and formats](discovery-supported-storage.md).
   + To filter the map and display only those buckets that have a specific value for a field, place your cursor in the filter box, and then add a filter condition for the field. Macie applies the condition's criteria and displays the condition below the filter box. To further refine the results, add filter conditions for additional fields. For more information, see [Filtering your S3 bucket inventory](monitoring-s3-inventory-filter.md).
   + To drill down and display only those buckets that are owned by a particular account, choose the account ID for the account. Macie opens a new tab that filters and displays data only for that account.

1. To review data sensitivity statistics and other information for a particular bucket, choose the bucket's square. Then refer to the details panel. For information about these details, see [Reviewing data sensitivity details for S3 buckets](discovery-asdd-results-s3-inventory-details.md).
**Tip**  
On the **Bucket details** tab of the panel, you can pivot and drill down on many of the fields. To show buckets that have the same value for a field, choose ![\[The zoom in icon, which is a magnifying glass that has a plus sign in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-magnifying-glass-plus-sign.png) in the field. To show buckets that have other values for a field, choose ![\[The zoom out icon, which is a magnifying glass that has a minus sign in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-magnifying-glass-minus-sign.png) in the field.
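
The per-account badges and the per-account display limit can be sketched as follows. The function sorts accounts by account ID, counts buckets by sensitivity label, and keeps up to `per_account_limit` bucket names per account (100 on the map, according to this guide). This guide doesn't say which buckets the map shows when an account owns more than 100, so keeping the first ones in input order is an assumption, as are the DescribeBuckets-style field names.

```python
from collections import Counter, defaultdict


def label_for(score):
    """Sensitivity label for a score (ranges as documented above)."""
    if score == -1:
        return "Classification error"
    if score == 50:
        return "Not yet analyzed"
    return "Sensitive" if score > 50 else "Not sensitive"


def account_badges(buckets, per_account_limit=100):
    """Group buckets by account ID, count labels, and cap the buckets shown."""
    by_account = defaultdict(list)
    for b in buckets:
        by_account[b["accountId"]].append(b)
    result = {}
    for account_id in sorted(by_account):  # the map sorts accounts by ID
        owned = by_account[account_id]
        result[account_id] = {
            # Badge counts cover every bucket the account owns (assumption).
            "badges": Counter(label_for(b["sensitivityScore"]) for b in owned),
            "shown": [b["bucketName"] for b in owned[:per_account_limit]],
        }
    return result
```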

# Assessing data sensitivity with the S3 buckets table
<a name="discovery-asdd-results-s3-inventory-table"></a>

To review summary information for your Amazon Simple Storage Service (Amazon S3) buckets, you can use the **S3 buckets** table on the Amazon Macie console. By using the table, you can review and analyze an inventory of your general purpose buckets in the current AWS Region, and drill down to review detailed information and statistics for individual buckets. If you're the Macie administrator for an organization, the table includes information about buckets that your member accounts own. If you prefer to access and query the data programmatically, you can use the [DescribeBuckets](https://docs.aws.amazon.com/macie/latest/APIReference/datasources-s3.html) operation of the Amazon Macie API. 

On the console, you can sort and filter the table to customize your view. You can also export data from the table to a comma-separated values (CSV) file. If you choose an S3 bucket in the table, the details panel displays additional information about the bucket. This includes details and statistics for settings and metrics that provide insight into the security and privacy of the bucket’s data. If automated sensitive data discovery is enabled, it also includes data that captures the results of automated discovery activities that Macie has performed thus far for the bucket.
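
When you use the API, DescribeBuckets returns results in pages. The following sketch follows the `nextToken` value until every bucket record has been returned. It accepts any client object that has a `describe_buckets` method, such as a Boto3 `macie2` client, so it can also be exercised with a stub; the optional `criteria` argument is passed through unchanged.

```python
from typing import Any, Iterator, Optional


def iter_buckets(client: Any, criteria: Optional[dict] = None) -> Iterator[dict]:
    """Yield every bucket record from DescribeBuckets, following nextToken.

    `client` is expected to behave like boto3.client("macie2").
    """
    kwargs: dict = {}
    if criteria:
        kwargs["criteria"] = criteria
    while True:
        page = client.describe_buckets(**kwargs)
        yield from page.get("buckets", [])
        token = page.get("nextToken")
        if not token:
            break  # no more pages
        kwargs["nextToken"] = token
```

For example, `iter_buckets(boto3.client("macie2"), {"accountId": {"eq": ["111122223333"]}})` would limit the results to buckets that one account owns. The account ID is a placeholder.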

**To assess data sensitivity by using the S3 buckets table**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **S3 buckets**. The **S3 buckets** page displays your bucket inventory.

   By default, the page doesn't display data for buckets that are currently excluded from automated sensitive data discovery. If you're the Macie administrator for an organization, it also doesn't display data for accounts that automated sensitive data discovery is currently disabled for. To display this data, choose **X** in the **Is monitored by automated discovery** filter token below the filter box.

1. Choose table (![\[The table view button, which is a button that displays three black horizontal lines.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-s3-table-view.png)) at the top of the page. Macie displays the number of buckets in your inventory and a table of the buckets.

1. To retrieve the latest bucket metadata from Amazon S3, choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) at the top of the page.

   If the information icon (![\[The information icon, which is a blue circle that has a lowercase letter i in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-info-blue.png)) appears next to any bucket names, we recommend that you refresh the data. This icon indicates that a bucket was created during the past 24 hours, possibly after Macie last retrieved bucket and object metadata from Amazon S3 as part of the [daily refresh cycle](monitoring-s3-how-it-works.md#monitoring-s3-how-it-works-data-refresh).

1. In the **S3 buckets** table, review summary information about each bucket in your inventory:
   + **Sensitivity** – The bucket's current sensitivity score. For information about the range of sensitivity scores that Macie defines, see [Sensitivity scoring for S3 buckets](discovery-scoring-s3.md).
   + **Bucket** – The name of the bucket.
   + **Account** – The account ID for the AWS account that owns the bucket.
   + **Classifiable objects** – The total number of objects that Macie can analyze to detect sensitive data in the bucket.
   + **Classifiable size** – The total storage size of all the objects that Macie can analyze to detect sensitive data in the bucket.

     This value doesn’t reflect the actual size of any compressed objects after they're decompressed. Also, if versioning is enabled for the bucket, this value is based on the storage size of the latest version of each object in the bucket.
   + **Monitored by job** – Whether you configured any sensitive data discovery jobs to periodically analyze objects in the bucket on a daily, weekly, or monthly basis.

     If the value for this field is *Yes*, the bucket is explicitly included in a periodic job or the bucket matched the criteria for a periodic job within the past 24 hours. In addition, the status of at least one of those jobs is not *Cancelled*. Macie updates this data on a daily basis.
   + **Latest job run** – If you configured any one-time or periodic sensitive data discovery jobs to analyze objects in the bucket, this field indicates the most recent date and time when one of those jobs started to run. Otherwise, a dash (–) appears in this field. 

   In the preceding data, objects are *classifiable* if they use a supported Amazon S3 storage class and they have a file name extension for a supported file or storage format. Macie can analyze classifiable objects to detect sensitive data. For more information, see [Supported storage classes and formats](discovery-supported-storage.md).

1. To analyze your inventory by using the table, do any of the following:
   + To sort the table by a specific field, choose the column heading for the field. To change the sort order, choose the column heading again.
   + To filter the table and display only those buckets that have a specific value for a field, place your cursor in the filter box, and then add a filter condition for the field. To further refine the results, add filter conditions for additional fields. For more information, see [Filtering your S3 bucket inventory](monitoring-s3-inventory-filter.md).
   + To review data sensitivity statistics and other information for a particular bucket, choose the bucket's name. Then refer to the details panel. For information about these details, see [Reviewing data sensitivity details for S3 buckets](discovery-asdd-results-s3-inventory-details.md).
**Tip**  
On the **Bucket details** tab of the panel, you can pivot and drill down on many of the fields. To show buckets that have the same value for a field, choose ![\[The zoom in icon, which is a magnifying glass that has a plus sign in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-magnifying-glass-plus-sign.png) in the field. To show buckets that have other values for a field, choose ![\[The zoom out icon, which is a magnifying glass that has a minus sign in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-magnifying-glass-minus-sign.png) in the field.

1. To export data from the table to a CSV file, select the checkbox for each row to export, or select the checkbox in the selection column heading to select all rows. Then choose **Export to CSV** at the top of the page. You can export up to 50,000 rows from the table. 

1. To perform deeper, more immediate analysis of objects in one or more buckets, select the checkbox for each bucket. Then choose **Create job**. For more information, see [Creating a sensitive data discovery job](discovery-jobs-create.md).
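
If you want a similar export from API data, the following sketch writes the table's columns to CSV. It assumes DescribeBuckets-style records, including a `jobDetails` object whose `isMonitoredByJob` value is `TRUE`, `FALSE`, or `UNKNOWN`, and it applies the 50,000-row limit from the export step. The column mapping is illustrative and isn't necessarily the console's export format.

```python
import csv
import io

COLUMNS = ["Sensitivity", "Bucket", "Account", "Classifiable objects",
           "Classifiable size", "Monitored by job", "Latest job run"]
MAX_EXPORT_ROWS = 50_000  # export limit stated in the console procedure


def to_row(b: dict) -> list:
    """Convert one DescribeBuckets-style record (assumed fields) to a table row."""
    job = b.get("jobDetails", {})
    monitored = {"TRUE": "Yes", "FALSE": "No"}.get(job.get("isMonitoredByJob"), "Unknown")
    return [
        b.get("sensitivityScore"),
        b.get("bucketName"),
        b.get("accountId"),
        b.get("classifiableObjectCount", 0),
        b.get("classifiableSizeInBytes", 0),
        monitored,
        job.get("lastJobRunTime") or "–",  # dash when no job has run
    ]


def export_csv(buckets, descending=True) -> str:
    """Sort by sensitivity score and return CSV text, capped at the row limit."""
    rows = sorted(buckets, key=lambda b: b.get("sensitivityScore", 0), reverse=descending)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for b in rows[:MAX_EXPORT_ROWS]:
        writer.writerow(to_row(b))
    return out.getvalue()
```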

# Reviewing data sensitivity details for S3 buckets
<a name="discovery-asdd-results-s3-inventory-details"></a>

As automated sensitive data discovery progresses, you can review detailed results in statistics and other information that Amazon Macie provides about each of your Amazon Simple Storage Service (Amazon S3) buckets. If you're the Macie administrator for an organization, this includes buckets that your member accounts own.

The statistics and information include details that provide insight into the security and privacy of an S3 bucket’s data. They also capture the results of automated sensitive data discovery activities that Macie has performed thus far for a bucket. For example, you can find a list of objects that Macie has analyzed in a bucket. You can also find a breakdown of the types and number of occurrences of sensitive data that Macie has found in a bucket. Note that this data doesn't include the results of sensitive data discovery jobs that you create and run.

Macie automatically recalculates and updates statistics and details for your S3 buckets while it performs automated sensitive data discovery. For example:
+ If Macie doesn't find sensitive data in an S3 object, Macie decreases the bucket's sensitivity score and updates the bucket's sensitivity label as necessary. Macie also adds the object to the list of objects that it selected for analysis.
+ If Macie finds sensitive data in an S3 object, Macie adds those occurrences to the breakdown of sensitive data types that Macie has found in the bucket. Macie also increases the bucket's sensitivity score and updates the bucket's sensitivity label as necessary. In addition, Macie adds the object to the list of objects that it selected for analysis. These tasks are in addition to creating a sensitive data finding for the object.
+ If Macie finds sensitive data in an S3 object that's subsequently changed or deleted, Macie removes sensitive data occurrences for the object from the bucket's breakdown of sensitive data types. Macie also decreases the bucket's sensitivity score and updates the bucket's sensitivity label as necessary. In addition, Macie removes the object from the list of objects that it selected for analysis.
+ If Macie attempts to analyze an S3 object but an issue or error prevents analysis, Macie adds the object to the list of objects that it selected for analysis, and indicates that it wasn't able to analyze the object.
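
The bookkeeping described in the preceding list can be modeled with a small data structure. The following sketch is hypothetical and deliberately leaves out the sensitivity score, because this page doesn't describe the scoring formula. It tracks only the breakdown of sensitive data types and the list of objects selected for analysis; any sensitive data type names you pass in are just labels.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BucketDiscoveryState:
    """Hypothetical model of per-bucket discovery results (not Macie's code)."""
    occurrences_by_object: dict = field(default_factory=dict)  # key -> Counter
    selected: dict = field(default_factory=dict)  # key -> "analyzed" | "skipped"

    def record_analysis(self, key, found=None):
        """Record a completed analysis; `found` maps data type -> occurrences."""
        self.selected[key] = "analyzed"
        if found:
            self.occurrences_by_object[key] = Counter(found)
        else:
            self.occurrences_by_object.pop(key, None)

    def record_failure(self, key):
        """Selected for analysis, but an issue or error prevented it."""
        self.selected[key] = "skipped"

    def object_changed_or_deleted(self, key):
        """Drop the object's occurrences and remove it from the selected list."""
        self.selected.pop(key, None)
        self.occurrences_by_object.pop(key, None)

    def breakdown(self):
        """Total occurrences of each sensitive data type across the bucket."""
        total = Counter()
        for counts in self.occurrences_by_object.values():
            total.update(counts)
        return total
```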

If you're the Macie administrator for an organization or you have a standalone Macie account, you can optionally use these details to assess and adjust certain automated discovery settings for an S3 bucket. For example, you can include or exclude specific types of sensitive data from a bucket's score. For more information, see [Adjusting sensitivity scores for S3 buckets](discovery-asdd-s3bucket-manage.md).

**To review data sensitivity details for an S3 bucket**  
To review data sensitivity and other details for an S3 bucket, you can use the Amazon Macie console or the Amazon Macie API. On the console, the details panel provides centralized access to this information. With the API, you can retrieve and process the data programmatically.

------
#### [ Console ]

Follow these steps to review data sensitivity and other details for an S3 bucket by using the Amazon Macie console.

**To review the details for an S3 bucket**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **S3 buckets**. The **S3 buckets** page displays an interactive map of your bucket inventory. Optionally choose table (![\[The table view button, which is a button that displays three black horizontal lines.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-s3-table-view.png)) at the top of the page to display your inventory in tabular format instead.

   By default, the page doesn't display data for buckets that are currently excluded from automated sensitive data discovery. If you're the Macie administrator for an organization, it also doesn't display data for accounts for which automated sensitive data discovery is currently disabled. To display this data, choose **X** in the **Is monitored by automated discovery** filter token below the filter box.

1. To retrieve the latest bucket metadata from Amazon S3, choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) at the top of the page.

1. Choose the bucket whose details you want to review. The details panel displays data sensitivity statistics and other information about the bucket.

The top of the panel shows general information about the bucket: the bucket's name, the account ID for the AWS account that owns the bucket, and the bucket's current sensitivity score. If you're a Macie administrator or you have a standalone Macie account, it also provides options for changing certain automated discovery settings for the bucket. Additional settings and information are organized into the following tabs:

[Sensitivity](#discovery-asdd-results-s3-inventory-sensitivity-details) | [Bucket details](#discovery-asdd-results-s3-inventory-bucket-details) | [Object samples](#discovery-asdd-results-s3-inventory-sample-details) | [Sensitive data discovery](#discovery-asdd-results-s3-inventory-sdd-details)

Individual settings and information on each tab are as follows.

**Sensitivity**  
This tab shows the bucket's current sensitivity score, ranging from *-1* to *100*. For information about the range of sensitivity scores that Macie defines, see [Sensitivity scoring for S3 buckets](discovery-scoring-s3.md).  
The tab also provides a breakdown of the types of sensitive data that Macie has found in the bucket's objects, and the number of occurrences of each type:  
+ **Sensitive data type** – The unique identifier (ID) for the managed data identifier that detected the data, or the name of the custom data identifier that detected the data.

  A managed data identifier's ID describes the type of sensitive data that it's designed to detect, for example, **USA_PASSPORT_NUMBER** for US passport numbers. For details about each managed data identifier, see [Using managed data identifiers](managed-data-identifiers.md).
+ **Count** – The total number of occurrences of the data that the managed or custom data identifier detected.
+ **Scoring status** – This field appears if you're a Macie administrator or you have a standalone Macie account. It specifies whether occurrences of the data are included in or excluded from the bucket's sensitivity score.

  If Macie automatically calculates the bucket's score, you can adjust the calculation by including or excluding specific types of sensitive data: select the checkbox for the identifier that detected the sensitive data, and then choose an option on the **Actions** menu. For more information, see [Adjusting sensitivity scores for S3 buckets](discovery-asdd-s3bucket-manage.md).
If Macie hasn't found sensitive data in objects that the bucket currently stores, this section shows the **No detections found** message.  
Note that the **Sensitivity** tab doesn't include data for objects that were changed or deleted after Macie analyzed them. If objects are changed or deleted after analysis, Macie automatically recalculates and updates the appropriate statistics and data to exclude the objects.

**Bucket details**  
This tab provides details about the bucket's settings, including data security and privacy settings. For example, you can review breakdowns of the bucket’s public access settings, and determine whether the bucket replicates objects or is shared with other AWS accounts.  
Of special note, the **Last updated** field indicates when Macie most recently retrieved metadata from Amazon S3 for the bucket or the bucket’s objects. The **Latest automated discovery run** field indicates when Macie most recently analyzed objects in the bucket while performing automated sensitive data discovery. If this analysis hasn't occurred, a dash (–) appears in this field.  
The tab also provides object-level statistics that can help you assess how much data Macie can analyze in the bucket. It also indicates whether you configured any sensitive data discovery jobs to analyze objects in the bucket. If you have, you can access details about the job that ran most recently and then optionally display any findings that the job produced.  
In certain cases, this tab might not include all the details of a bucket. This can occur if you store more than 10,000 buckets in Amazon S3. Macie maintains complete inventory data for only 10,000 buckets for an account—the 10,000 buckets that were most recently created or changed. Macie can, however, analyze objects in buckets that exceed this quota. To review additional details for the buckets, use Amazon S3.  
For additional details about the information on this tab, see [Reviewing the details of S3 buckets](monitoring-s3-inventory-review.md#monitoring-s3-inventory-view-details).

**Object samples**  
This tab lists objects that Macie selected for analysis while performing automated sensitive data discovery for the bucket. Optionally choose an object's name to open the Amazon S3 console and display the object's properties.  
The list includes data for up to 100 objects. Objects are listed in order of their **Object sensitivity** value: **Sensitive** objects first, then **Not sensitive** objects, and then objects that Macie wasn't able to analyze.  
In the list, the **Object sensitivity** field indicates whether Macie found sensitive data in an object:  
+ **Sensitive** – Macie found at least one occurrence of sensitive data in the object.
+ **Not sensitive** – Macie didn't find sensitive data in the object.
+ **–** (*dash*) – Macie wasn't able to complete its analysis of the object due to an issue or error.
The **Classification result** field indicates whether Macie was able to analyze an object:  
+ **Complete** – Macie completed its analysis of the object.
+ **Partial** – Macie analyzed only a subset of data in the object due to an issue or error. For example, the object is an archive file that contains files in an unsupported format.
+ **Skipped** – Macie wasn't able to analyze any data in the object due to an issue or error. For example, the object is encrypted with a key that Macie isn't allowed to use.
Note that the list doesn't include objects that were changed or deleted after Macie analyzed or attempted to analyze them. Macie automatically removes an object from the list if the object is subsequently changed or deleted.
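The ordering that the **Object samples** tab uses can be illustrated with a short sketch. This is not Macie's implementation. It assumes that each object is a dictionary with an `arn` and an optional `sensitive` Boolean, similar to the items that the ListResourceProfileArtifacts operation returns, and it treats a missing `sensitive` value as an object that Macie wasn't able to analyze. The `order_object_samples` function and the sample object names are hypothetical.

```python
from typing import Any, Dict, List

MAX_SAMPLES = 100  # the tab lists data for up to 100 objects


def order_object_samples(artifacts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Order objects as Sensitive, then Not sensitive, then not analyzed (hypothetical helper)."""

    def rank(artifact: Dict[str, Any]) -> int:
        sensitive = artifact.get("sensitive")
        if sensitive is True:
            return 0  # Sensitive
        if sensitive is False:
            return 1  # Not sensitive
        return 2  # dash (-): Macie wasn't able to analyze the object

    # sorted() is stable, so objects with the same rank keep their input order.
    return sorted(artifacts, key=rank)[:MAX_SAMPLES]


samples = [
    {"arn": "arn:aws:s3:::amzn-s3-demo-bucket/a.vssx", "classificationResultStatus": "SKIPPED"},
    {"arn": "arn:aws:s3:::amzn-s3-demo-bucket/b.csv", "classificationResultStatus": "COMPLETE", "sensitive": False},
    {"arn": "arn:aws:s3:::amzn-s3-demo-bucket/c.json", "classificationResultStatus": "COMPLETE", "sensitive": True},
]
for item in order_object_samples(samples):
    print(item["arn"])
```

Sorting on a small integer rank keeps the three display groups explicit, and the final slice reflects the 100-object limit.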

**Sensitive data discovery**  
This tab provides aggregated, automated sensitive data discovery statistics for the bucket:  
+ **Analyzed bytes** – The total amount of data, in bytes, that Macie has analyzed in the bucket.
+ **Classifiable bytes** – The total storage size, in bytes, of all the objects that Macie can analyze in the bucket. These objects use supported Amazon S3 storage classes and they have file name extensions for supported file or storage formats. For more information, see [Supported storage classes and formats](discovery-supported-storage.md).
+ **Total detections** – The total number of occurrences of sensitive data that Macie has found in the bucket. This includes occurrences that are currently suppressed by the sensitivity scoring settings for the bucket.
The **Objects analyzed** chart indicates the total number of objects that Macie has analyzed in the bucket. It also provides a visual representation of the number of objects that Macie did or didn't find sensitive data in. The legend below the chart shows a breakdown of these results:  
+ **Sensitive objects** (*red*) – The total number of objects that Macie found at least one occurrence of sensitive data in.
+ **Not sensitive objects** (*blue*) – The total number of objects that Macie didn't find sensitive data in.
+ **Objects skipped** (*dark gray*) – The total number of objects that Macie wasn't able to analyze due to an issue or error.
The area below the chart's legend provides a breakdown of cases where Macie wasn't able to analyze objects because certain types of permissions issues or cryptographic errors occurred:  
+ **Skipped: Invalid encryption** – The total number of objects that are encrypted with customer-provided keys. Macie can't access these keys.
+ **Skipped: Invalid KMS** – The total number of objects that are encrypted with AWS Key Management Service (AWS KMS) keys that are no longer available. These objects are encrypted with AWS KMS keys that were disabled, are scheduled for deletion, or were deleted. Macie can't use these keys.
+ **Skipped: Permission denied** – The total number of objects that Macie isn't allowed to access due to the permissions settings for the object, or the permissions settings for the key that was used to encrypt the object.
For details about these and other types of issues and errors that can occur, see [Remediating coverage issues](discovery-coverage-remediate.md). If you remediate the issues and errors, you can increase coverage of the bucket's data during subsequent analysis cycles.  
Statistics on the **Sensitive data discovery** tab don't include data for objects that were changed or deleted after Macie analyzed or attempted to analyze them. If objects are changed or deleted after Macie analyzes or attempts to analyze them, Macie automatically recalculates these statistics to exclude the objects.

------
#### [ API ]

To retrieve data sensitivity and other details for an S3 bucket programmatically, you have several options. The appropriate option depends on the details that you want to retrieve:
+ To retrieve a bucket's current sensitivity score and aggregated analysis statistics, use the [GetResourceProfile](https://docs.aws.amazon.com/macie/latest/APIReference/resource-profiles.html) operation. Or, if you're using the AWS Command Line Interface (AWS CLI), run the [get-resource-profile](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-resource-profile.html) command. The statistics include data such as the number of objects that Macie has analyzed, and the number of objects that Macie has found sensitive data in.
+ To retrieve a breakdown of the types and amount of sensitive data that Macie has found in a bucket, use the [ListResourceProfileDetections](https://docs.aws.amazon.com/macie/latest/APIReference/resource-profiles-detections.html) operation. Or, if you're using the AWS CLI, run the [list-resource-profile-detections](https://docs.aws.amazon.com/cli/latest/reference/macie2/list-resource-profile-detections.html) command. The breakdown also provides details about the managed or custom data identifier that detected each type of sensitive data.
+ To retrieve a list of up to 100 objects that Macie selected from a bucket for analysis, use the [ListResourceProfileArtifacts](https://docs.aws.amazon.com/macie/latest/APIReference/resource-profiles-artifacts.html) operation. Or, if you're using the AWS CLI, run the [list-resource-profile-artifacts](https://docs.aws.amazon.com/cli/latest/reference/macie2/list-resource-profile-artifacts.html) command. For each object, the list specifies the Amazon Resource Name (ARN) of the object, whether Macie completed its analysis of the object, and whether Macie found sensitive data in the object.

In your request, use the `resourceArn` parameter to specify the ARN of the bucket to retrieve the details for. If you're using the AWS CLI, use the `--resource-arn` parameter to specify the ARN.

For additional details about an S3 bucket, such as the bucket's public access settings, use the [DescribeBuckets](https://docs.aws.amazon.com/macie/latest/APIReference/datasources-s3.html) operation. If you're using the AWS CLI, run the [describe-buckets](https://docs.aws.amazon.com/cli/latest/reference/macie2/describe-buckets.html) command to retrieve these details. In your request, optionally use filter criteria to specify the name of the bucket. For more information and examples, see [Filtering your S3 bucket inventory](monitoring-s3-inventory-filter.md).

The following examples show how to use the AWS CLI to retrieve data sensitivity details for an S3 bucket. This first example retrieves the current sensitivity score and aggregated analysis statistics for a bucket.

```
$ aws macie2 get-resource-profile --resource-arn arn:aws:s3:::amzn-s3-demo-bucket
```

Where *arn:aws:s3:::amzn-s3-demo-bucket* is the ARN of the bucket. If the request succeeds, you receive output similar to the following:

```
{
    "profileUpdatedAt": "2024-11-21T15:44:46+00:00",
    "sensitivityScore": 83,
    "sensitivityScoreOverridden": false,
    "statistics": {
        "totalBytesClassified": 933599,
        "totalDetections": 3641,
        "totalDetectionsSuppressed": 0,
        "totalItemsClassified": 111,
        "totalItemsSensitive": 84,
        "totalItemsSkipped": 1,
        "totalItemsSkippedInvalidEncryption": 0,
        "totalItemsSkippedInvalidKms": 0,
        "totalItemsSkippedPermissionDenied": 0
    }
}
```
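To use this output in a script, you can parse the JSON and inspect the `statistics` object. The following Python sketch relies only on field names that appear in the preceding output. It reports the skipped-object counters that are greater than zero. These counters correspond to the **Skipped: Invalid encryption**, **Skipped: Invalid KMS**, and **Skipped: Permission denied** values on the console. The sketch also assumes that these three counters are subsets of `totalItemsSkipped`, so it attributes any remainder to other issues or errors. The `skipped_breakdown` function is hypothetical.

```python
import json
from typing import Dict

# get-resource-profile statistics fields and the console labels they correspond to.
SKIP_FIELDS = {
    "totalItemsSkippedInvalidEncryption": "Skipped: Invalid encryption",
    "totalItemsSkippedInvalidKms": "Skipped: Invalid KMS",
    "totalItemsSkippedPermissionDenied": "Skipped: Permission denied",
}


def skipped_breakdown(profile_json: str) -> Dict[str, int]:
    """Summarize why objects were skipped, using the profile's statistics object."""
    stats = json.loads(profile_json)["statistics"]
    breakdown = {
        label: stats[field]
        for field, label in SKIP_FIELDS.items()
        if stats.get(field, 0) > 0
    }
    # Assumption: the three counters above are subsets of totalItemsSkipped,
    # so any remainder was skipped because of some other issue or error.
    remainder = stats.get("totalItemsSkipped", 0) - sum(stats.get(f, 0) for f in SKIP_FIELDS)
    if remainder > 0:
        breakdown["Skipped: other issue or error"] = remainder
    return breakdown


# Fields excerpted from the example output above.
output = """{"sensitivityScore": 83, "statistics": {"totalItemsSkipped": 1,
  "totalItemsSkippedInvalidEncryption": 0, "totalItemsSkippedInvalidKms": 0,
  "totalItemsSkippedPermissionDenied": 0}}"""
print(skipped_breakdown(output))
```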

The next example retrieves a breakdown of the types of sensitive data that Macie has found in an S3 bucket, and the number of occurrences of each type. The breakdown also specifies which managed data identifier or custom data identifier detected the data. It also indicates whether the occurrences are currently excluded (`suppressed`) from the bucket's sensitivity score, if the score is calculated automatically by Macie.

```
$ aws macie2 list-resource-profile-detections --resource-arn arn:aws:s3:::amzn-s3-demo-bucket
```

Where *arn:aws:s3:::amzn-s3-demo-bucket* is the ARN of the bucket. If the request succeeds, you receive output similar to the following:

```
{
    "detections": [
        {
            "count": 8,
            "id": "AWS_CREDENTIALS",
            "name": "AWS_CREDENTIALS",
            "suppressed": false,
            "type": "MANAGED"
        },
        {
            "count": 1194,
            "id": "CREDIT_CARD_NUMBER",
            "name": "CREDIT_CARD_NUMBER",
            "suppressed": false,
            "type": "MANAGED"
        },
        {
            "count": 1194,
            "id": "CREDIT_CARD_SECURITY_CODE",
            "name": "CREDIT_CARD_SECURITY_CODE",
            "suppressed": false,
            "type": "MANAGED"
        },
        {
            "arn": "arn:aws:macie2:us-east-1:123456789012:custom-data-identifier/3293a69d-4a1e-4a07-8715-208ddexample",
            "count": 8,
            "id": "3293a69d-4a1e-4a07-8715-208ddexample",
            "name": "Employee IDs with keyword",
            "suppressed": false,
            "type": "CUSTOM"
        },
        {
            "count": 1237,
            "id": "USA_SOCIAL_SECURITY_NUMBER",
            "name": "USA_SOCIAL_SECURITY_NUMBER",
            "suppressed": false,
            "type": "MANAGED"
        }
    ]
}
```
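Each entry includes a `count` and a `suppressed` flag, so you can total the occurrences that currently count toward a bucket's sensitivity score. In the preceding output, the counts (8 + 1,194 + 1,194 + 8 + 1,237) add up to 3,641, which is the `totalDetections` value shown in the earlier `get-resource-profile` example. The following Python sketch separates included occurrences from suppressed ones and groups the included occurrences by identifier type (`MANAGED` or `CUSTOM`). The `summarize_detections` function is hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Tuple


def summarize_detections(detections_json: str) -> Tuple[int, int, Dict[str, int]]:
    """Return (included, suppressed, included-by-identifier-type) occurrence counts."""
    included, suppressed = 0, 0
    by_type: Counter = Counter()
    for detection in json.loads(detections_json)["detections"]:
        if detection.get("suppressed", False):
            suppressed += detection["count"]  # excluded from the sensitivity score
        else:
            included += detection["count"]
            by_type[detection["type"]] += detection["count"]  # MANAGED or CUSTOM
    return included, suppressed, dict(by_type)


# Fields excerpted from the example output above.
output = json.dumps({"detections": [
    {"count": 8, "id": "AWS_CREDENTIALS", "suppressed": False, "type": "MANAGED"},
    {"count": 1194, "id": "CREDIT_CARD_NUMBER", "suppressed": False, "type": "MANAGED"},
    {"count": 1194, "id": "CREDIT_CARD_SECURITY_CODE", "suppressed": False, "type": "MANAGED"},
    {"count": 8, "id": "3293a69d-4a1e-4a07-8715-208ddexample", "suppressed": False, "type": "CUSTOM"},
    {"count": 1237, "id": "USA_SOCIAL_SECURITY_NUMBER", "suppressed": False, "type": "MANAGED"},
]})
print(summarize_detections(output))
```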

This example retrieves a list of objects that Macie selected from an S3 bucket for analysis. For each object, the list also indicates whether Macie completed its analysis of the object, and whether Macie found sensitive data in the object.

```
$ aws macie2 list-resource-profile-artifacts --resource-arn arn:aws:s3:::amzn-s3-demo-bucket
```

Where *arn:aws:s3:::amzn-s3-demo-bucket* is the ARN of the bucket. If the request succeeds, you receive output similar to the following:

```
{
    "artifacts": [
        {
            "arn": "arn:aws:s3:::amzn-s3-demo-bucket/amzn-s3-demo-object1.csv",
            "classificationResultStatus": "COMPLETE",
            "sensitive": true
        },
        {
            "arn": "arn:aws:s3:::amzn-s3-demo-bucket/amzn-s3-demo-object2.xlsx",
            "classificationResultStatus": "COMPLETE",
            "sensitive": true
        },
        {
            "arn": "arn:aws:s3:::amzn-s3-demo-bucket/amzn-s3-demo-object3.json",
            "classificationResultStatus": "COMPLETE",
            "sensitive": true
        },
        {
            "arn": "arn:aws:s3:::amzn-s3-demo-bucket/amzn-s3-demo-object4.pdf",
            "classificationResultStatus": "COMPLETE",
            "sensitive": true
        },
        {
            "arn": "arn:aws:s3:::amzn-s3-demo-bucket/amzn-s3-demo-object5.zip",
            "classificationResultStatus": "PARTIAL",
            "sensitive": true
        },
        {
            "arn": "arn:aws:s3:::amzn-s3-demo-bucket/amzn-s3-demo-object6.vssx",
            "classificationResultStatus": "SKIPPED"
        }
    ]
}
```
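The `classificationResultStatus` and `sensitive` fields correspond to the **Classification result** and **Object sensitivity** fields on the console. In the preceding output, the skipped object has no `sensitive` field. The following Python sketch counts the artifacts by status and by sensitivity. Based on the example output, it assumes that a missing `sensitive` field means sensitivity wasn't determined. The `tally_artifacts` and `sensitivity_label` functions are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict, Tuple


def sensitivity_label(artifact: Dict[str, Any]) -> str:
    # Assumption: a missing "sensitive" field means Macie couldn't determine sensitivity.
    if "sensitive" not in artifact:
        return "undetermined"
    return "sensitive" if artifact["sensitive"] else "not sensitive"


def tally_artifacts(artifacts_json: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Count artifacts by classification result status and by sensitivity."""
    artifacts = json.loads(artifacts_json)["artifacts"]
    by_status = Counter(a["classificationResultStatus"] for a in artifacts)
    by_sensitivity = Counter(sensitivity_label(a) for a in artifacts)
    return dict(by_status), dict(by_sensitivity)


# Statuses and flags excerpted from the example output above.
prefix = "arn:aws:s3:::amzn-s3-demo-bucket/"
output = json.dumps({"artifacts": [
    {"arn": prefix + "amzn-s3-demo-object1.csv", "classificationResultStatus": "COMPLETE", "sensitive": True},
    {"arn": prefix + "amzn-s3-demo-object2.xlsx", "classificationResultStatus": "COMPLETE", "sensitive": True},
    {"arn": prefix + "amzn-s3-demo-object3.json", "classificationResultStatus": "COMPLETE", "sensitive": True},
    {"arn": prefix + "amzn-s3-demo-object4.pdf", "classificationResultStatus": "COMPLETE", "sensitive": True},
    {"arn": prefix + "amzn-s3-demo-object5.zip", "classificationResultStatus": "PARTIAL", "sensitive": True},
    {"arn": prefix + "amzn-s3-demo-object6.vssx", "classificationResultStatus": "SKIPPED"},
]})
print(tally_artifacts(output))
```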

------

# Analyzing findings from automated sensitive data discovery
<a name="discovery-asdd-results-s3-findings"></a>

When Amazon Macie performs automated sensitive data discovery, it creates a sensitive data finding for each Amazon Simple Storage Service (Amazon S3) object that it finds sensitive data in. A *sensitive data finding* is a detailed report of sensitive data that Macie found in an S3 object. A finding doesn't include the sensitive data that Macie found. Instead, it provides information that you can use for further investigation and remediation as necessary.

Each sensitive data finding provides a severity rating and details such as:
+ The date and time when Macie found the sensitive data.
+ The category and types of sensitive data that Macie found.
+ The number of occurrences of each type of sensitive data that Macie found.
+ How Macie found the sensitive data: through automated sensitive data discovery or a sensitive data discovery job.
+ The name, public access settings, encryption type, and other information about the affected S3 bucket and object.

Depending on the affected S3 object's file type or storage format, the details can also include the location of as many as 15 occurrences of the sensitive data that Macie found.

Macie stores sensitive data findings for 90 days. You can access them by using the Amazon Macie console or the Amazon Macie API. You can also monitor and process findings by using other applications, services, and systems. For more information, see [Reviewing and analyzing findings](findings.md).

**To analyze findings produced by automated sensitive data discovery**  
To identify and analyze findings that Macie created while performing automated sensitive data discovery, you can filter your findings. With filters, you use specific attributes of findings to build custom views and queries for findings. To filter findings, you can use the Amazon Macie console or submit queries programmatically using the Amazon Macie API. For more information, see [Filtering findings](findings-filter-overview.md).

**Note**  
If your account is part of an organization that centrally manages multiple Macie accounts, only the Macie administrator for your organization has direct access to findings that automated sensitive data discovery produces for accounts in your organization. If you have a member account and want to review the findings for your account, contact your Macie administrator.

------
#### [ Console ]

Follow these steps to identify and analyze the findings by using the Amazon Macie console.

**To analyze findings produced by automated discovery**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Findings**.

1. To display findings that were suppressed by a [suppression rule](findings-suppression.md), change the **Finding status** setting. Choose **All** to display both suppressed and unsuppressed findings, or choose **Archived** to display only suppressed findings. To then hide suppressed findings again, choose **Current**.

1. Place your cursor in the **Filter criteria** box. In the list of fields that appears, choose **Origin type**.

   This field specifies how Macie found the sensitive data that produced a finding: through automated sensitive data discovery or a sensitive data discovery job. To find this field in the list of filter fields, you can browse the complete list, or enter part of the field's name to narrow the list of fields.

1. Select **AUTOMATED_SENSITIVE_DATA_DISCOVERY** as the value for the field, and then choose **Apply**. Macie applies the filter criteria and adds the condition to a filter token in the **Filter criteria** box.

1. To refine the results, add filter conditions for additional fields. For example, use **Created at** for the time range when a finding was created, **S3 bucket name** for the name of an affected bucket, or **Sensitive data detection type** for the type of sensitive data that was detected and produced a finding.

If you want to subsequently use this set of conditions again, you can save it as a filter rule. To do this, choose **Save rule** in the **Filter criteria** box. Then enter a name and, optionally, a description for the rule. When you finish, choose **Save**.

------
#### [ API ]

To identify and analyze the findings programmatically, specify filter criteria in queries that you submit using the [ListFindings](https://docs.aws.amazon.com/macie/latest/APIReference/findings.html) or [GetFindingStatistics](https://docs.aws.amazon.com/macie/latest/APIReference/findings-statistics.html) operation of the Amazon Macie API. The **ListFindings** operation returns an array of finding IDs, one ID for each finding that matches the filter criteria. You can then use those IDs to retrieve the details of each finding. The **GetFindingStatistics** operation returns aggregated statistical data about all the findings that match the filter criteria, grouped by a field that you specify in your request. For more information about filtering findings programmatically, see [Filtering findings](findings-filter-overview.md).

In the filter criteria, include a condition for the `originType` field. This field specifies how Macie found the sensitive data that produced a finding: through automated sensitive data discovery or a sensitive data discovery job. If automated sensitive data discovery produced a finding, the value for this field is `AUTOMATED_SENSITIVE_DATA_DISCOVERY`.

To identify and analyze the findings by using the AWS Command Line Interface (AWS CLI), run the [list-findings](https://docs.aws.amazon.com/cli/latest/reference/macie2/list-findings.html) or [get-finding-statistics](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-finding-statistics.html) command. The following examples use the **list-findings** command to retrieve finding IDs for all high-severity findings that automated sensitive data discovery produced in the current AWS Region.

This example is formatted for Linux, macOS, or Unix, and it uses the backslash (\\) line-continuation character to improve readability.

```
$ aws macie2 list-findings \
--finding-criteria '{"criterion":{"classificationDetails.originType":{"eq":["AUTOMATED_SENSITIVE_DATA_DISCOVERY"]},"severity.description":{"eq":["High"]}}}'
```

This example is formatted for Microsoft Windows and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws macie2 list-findings ^
--finding-criteria={\"criterion\":{\"classificationDetails.originType\":{\"eq\":[\"AUTOMATED_SENSITIVE_DATA_DISCOVERY\"]},\"severity.description\":{\"eq\":[\"High\"]}}}
```

Where:
+ `classificationDetails.originType` specifies the JSON name of the **Origin type** field, and:
  + `eq` specifies the *equals* operator.
  + `AUTOMATED_SENSITIVE_DATA_DISCOVERY` is an enumerated value for the field.
+ `severity.description` specifies the JSON name of the **Severity** field, and:
  + `eq` specifies the *equals* operator.
  + `High` is an enumerated value for the field.

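Escaping JSON by hand, especially for the Windows command prompt, is error prone, so you might prefer to generate the criteria in a script. The following Python sketch builds the same `--finding-criteria` value as the Linux example from a dictionary. It applies the `eq` operator to each field, and `json.dumps` with compact separators reproduces the example string exactly. The `finding_criteria` helper is hypothetical. If you use an AWS SDK such as boto3 instead of the AWS CLI, you would typically pass the equivalent dictionary, not a string, as the `findingCriteria` parameter of ListFindings.

```python
import json
from typing import Dict, List


def finding_criteria(conditions: Dict[str, List[str]]) -> str:
    """Build a --finding-criteria value that applies the eq operator to each field."""
    criterion = {field: {"eq": values} for field, values in conditions.items()}
    # Compact separators produce the same string that the Linux example passes.
    return json.dumps({"criterion": criterion}, separators=(",", ":"))


criteria = finding_criteria({
    "classificationDetails.originType": ["AUTOMATED_SENSITIVE_DATA_DISCOVERY"],
    "severity.description": ["High"],
})
print(criteria)
```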
If the request succeeds, Macie returns a `findingIds` array. The array lists the unique identifier for each finding that matches the filter criteria, as shown in the following example.

```
{
    "findingIds": [
        "1f1c2d74db5d8caa76859ec52example",
        "6cfa9ac820dd6d55cad30d851example",
        "702a6fd8750e567d1a3a63138example",
        "826e94e2a820312f9f964cf60example",
        "274511c3fdcd87010a19a3a42example"
    ]
}
```

If no findings match the filter criteria, Macie returns an empty `findingIds` array.

```
{
    "findingIds": []
}
```

------

# Accessing discovery results from automated sensitive data discovery
<a name="discovery-asdd-results-s3-sddrs"></a>

When Amazon Macie performs automated sensitive data discovery, it creates an analysis record for each Amazon Simple Storage Service (Amazon S3) object that it selects for analysis. These records, referred to as *sensitive data discovery results*, log details about the analysis that Macie performs on individual S3 objects. This includes objects that Macie doesn't find sensitive data in, and objects that Macie can't analyze due to errors or issues such as permissions settings or use of an unsupported file or storage format. Sensitive data discovery results provide you with analysis records that can be helpful for data privacy and protection audits or investigations.

If Macie finds sensitive data in an S3 object, the sensitive data discovery result provides information about the sensitive data that Macie found. The information includes the same types of details that a sensitive data finding provides. It provides additional information too, such as the location of as many as 1,000 occurrences of each type of sensitive data that Macie found. For example: 
+ The column and row number for a cell or field in a Microsoft Excel workbook, CSV file, or TSV file
+ The path to a field or array in a JSON or JSON Lines file
+ The line number for a line in a non-binary text file other than a CSV, JSON, JSON Lines, or TSV file—for example, an HTML, TXT, or XML file
+ The page number for a page in an Adobe Portable Document Format (PDF) file
+ The record index and the path to a field in a record in an Apache Avro object container or Apache Parquet file

If the affected S3 object is an archive file, such as a .tar or .zip file, the sensitive data discovery result also provides detailed location data for occurrences of sensitive data in individual files that Macie extracted from the archive. Macie doesn’t include this information in sensitive data findings for archive files. To report location data, sensitive data discovery results use a [standardized JSON schema](findings-locate-sd-schema.md).

**Note**  
As is the case with sensitive data findings, sensitive data discovery results don't include sensitive data that Macie finds in S3 objects. Instead, they provide analysis details that can be helpful for audits or investigations.

Macie stores your sensitive data discovery results for 90 days. You can’t access them directly on the Amazon Macie console or with the Amazon Macie API. Instead, you configure Macie to encrypt and store them in an S3 bucket. The bucket can serve as a definitive, long-term repository for all of your sensitive data discovery results. To determine where this repository is for your account, choose **Discovery results** in the navigation pane on the Amazon Macie console. To do this programmatically, use the [GetClassificationExportConfiguration](https://docs.aws.amazon.com/macie/latest/APIReference/classification-export-configuration.html) operation of the Amazon Macie API. If you haven't configured this repository for your account, see [Storing and retaining sensitive data discovery results](discovery-results-repository-s3.md) to learn how.

After you configure Macie to store your sensitive data discovery results in an S3 bucket, Macie writes the results to JSON Lines (.jsonl) files, and it encrypts and adds those files to the bucket as GNU Zip (.gz) files. For automated sensitive data discovery, Macie adds the files to a folder named `automated-sensitive-data-discovery` in the bucket. You can then optionally access and query the results in that folder. If your account is part of an organization that centrally manages multiple Macie accounts, Macie adds the files to the `automated-sensitive-data-discovery` folder in the bucket for your Macie administrator's account.
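After you download a results file from the `automated-sensitive-data-discovery` folder, you can read it with standard tools. The following Python sketch decompresses a `.jsonl.gz` file and parses one JSON record per line. It assumes that the file is already downloaded and readable, for example, that you had permission to use the AWS KMS key that protects it. It doesn't depend on any specific fields in the records. The `read_discovery_results` function and the sample file are hypothetical, and the sample records aren't real Macie output.

```python
import gzip
import json
import os
import tempfile
from typing import Any, Dict, Iterator


def read_discovery_results(path: str) -> Iterator[Dict[str, Any]]:
    """Yield each record from a local JSON Lines file compressed with GNU Zip."""
    with gzip.open(path, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Demonstration with a small, locally created sample file (not real Macie output).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.jsonl.gz")
with gzip.open(sample_path, mode="wt", encoding="utf-8") as out:
    out.write('{"id": 1}\n{"id": 2}\n')

records = list(read_discovery_results(sample_path))
print(len(records))
```

Using a generator keeps memory use low for large result files, because only one record is parsed at a time.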

Sensitive data discovery results adhere to a standardized schema. This can help you query, monitor, and process them by using other applications, services, and systems. For a detailed, instructional example of how you might query and use these results, see the following blog post on the *AWS Security Blog*: [How to query and visualize Macie sensitive data discovery results with Amazon Athena and Amazon Quick](https://aws.amazon.com/blogs/security/how-to-query-and-visualize-macie-sensitive-data-discovery-results-with-athena-and-quicksight/). For samples of Athena queries that you can use to analyze the results, visit the [Amazon Macie Results Analytics repository](https://github.com/aws-samples/amazon-macie-results-analytics) on GitHub. This repository also provides instructions for configuring Athena to retrieve and decrypt your results, and scripts for creating tables for the results.

# Assessing automated sensitive data discovery coverage
<a name="discovery-coverage"></a>

As automated sensitive data discovery progresses for your account or organization, Amazon Macie provides statistics and details to help you assess and monitor its coverage of your Amazon Simple Storage Service (Amazon S3) data estate. With this data, you can check the status of automated sensitive data discovery for your data estate overall and individual S3 buckets within it. You can also identify issues that prevented Macie from analyzing objects in specific buckets. If you remediate the issues, you can increase coverage of your Amazon S3 data during subsequent analysis cycles.

Coverage data provides a snapshot of the current status of automated sensitive data discovery for your S3 general purpose buckets in the current AWS Region. If you're the Macie administrator for an organization, this includes buckets that your member accounts own. For each bucket, the data indicates whether issues occurred when Macie attempted to analyze objects in the bucket. If issues occurred, the data indicates the nature of each issue and, in certain cases, the number of occurrences. The data is updated as automated sensitive data discovery progresses each day. If Macie analyzes or attempts to analyze one or more objects in a bucket during a daily analysis cycle, Macie updates coverage and other data to reflect the results.

For certain types of issues, you can review the data in aggregate for all of your S3 general purpose buckets and optionally drill down for additional details about each bucket. For example, coverage data can help you quickly identify all the buckets that Macie isn't allowed to access for your account. Coverage data also reports object-level issues that occurred. These issues, referred to as *classification errors*, prevented Macie from analyzing specific objects in a bucket. For example, you can determine how many objects Macie couldn't analyze in a bucket because the objects are encrypted with an AWS Key Management Service (AWS KMS) key that's no longer available.

If you use the Amazon Macie console to review coverage data, your view of the data includes guidance for remediating each type of issue. Subsequent topics in this section also provide remediation guidance for each type.

**Topics**
+ [Reviewing coverage data](discovery-coverage-review.md)
+ [Remediating coverage issues](discovery-coverage-remediate.md)

# Reviewing coverage data for automated sensitive data discovery
<a name="discovery-coverage-review"></a>

To review and assess coverage by automated sensitive data discovery, you can use the Amazon Macie console or the Amazon Macie API. Both the console and the API provide data that indicates the current status of the analyses for your Amazon Simple Storage Service (Amazon S3) general purpose buckets in the current AWS Region. The data includes information about issues that create gaps in the analyses:
+ Buckets that Macie isn't allowed to access. Macie can't analyze any objects in these buckets. The buckets' permissions settings prevent Macie from accessing the buckets and the buckets' objects.
+ Buckets that don't store any classifiable objects. Macie can't analyze any objects in these buckets. All the objects use Amazon S3 storage classes that Macie doesn't support, or they have file name extensions for file or storage formats that Macie doesn't support. 
+ Buckets that Macie hasn’t been able to analyze yet due to object-level classification errors. Macie attempted to analyze one or more objects in these buckets. However, Macie couldn't analyze the objects due to issues with object-level permissions settings, object content, or quotas.

Coverage data is updated as automated sensitive data discovery progresses each day. If you're the Macie administrator for an organization, the data includes information for S3 buckets that your member accounts own.

**Note**  
Coverage data doesn't explicitly include results for sensitive data discovery jobs that you create and run. However, remediating coverage issues that affect automated sensitive data discovery is likely to also increase coverage by jobs that you subsequently run. To assess coverage for a job, [review the job's results](discovery-jobs-manage-results.md). If a job's log events or other results indicate coverage issues, [remediation guidance for automated sensitive data discovery](discovery-coverage-remediate.md) can help you address some of the issues.

**To review coverage data for automated sensitive data discovery**  
To review coverage data for automated sensitive data discovery, you can use the Amazon Macie console or the Amazon Macie API. On the console, a single page provides a unified view of coverage data for all of your S3 general purpose buckets in the current Region. This includes a rollup of issues that recently occurred for each bucket. The page also provides options for reviewing groups of data by issue type. To track your investigation of issues for specific buckets, you can export data from the page to a comma-separated values (CSV) file.

------
#### [ Console ]

Follow these steps to review coverage data by using the Amazon Macie console.

**To review coverage data**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Resource coverage**.

1. On the **Resource coverage** page, choose the tab for the type of coverage data that you want to review:
   + **All** – Lists all the buckets for your account. For each bucket, the **Issues** field indicates whether issues prevented Macie from analyzing objects in the bucket. If the value for this field is **None**, Macie has analyzed at least one of the bucket's objects or Macie hasn't attempted to analyze any of the bucket's objects yet. If there are issues, this field indicates the nature of the issues and how to remediate them. For object-level classification errors, it might also indicate (in parentheses) the number of occurrences of the error.
   + **Access denied** – Lists buckets that Macie isn't allowed to access. The permissions settings for these buckets prevent Macie from accessing the buckets and the buckets' objects. Consequently, Macie can't analyze any objects in the buckets. 
   + **Classification error** – Lists buckets that Macie hasn’t analyzed yet due to object-level classification errors—issues with object-level permissions settings, object content, or quotas. For each bucket, the **Issues** field indicates the nature of each type of error that occurred and prevented Macie from analyzing an object in the bucket. It also indicates how to remediate each type of error. Depending on the error, it might also indicate (in parentheses) the number of occurrences of the error.
   + **Unclassifiable** – Lists buckets that Macie can't analyze because they don't store any classifiable objects. All the objects in these buckets use unsupported Amazon S3 storage classes or they have file name extensions for unsupported file or storage formats. Consequently, Macie can't analyze any objects in the buckets. 

1. To drill down and review the supporting data for a bucket, choose the bucket's name. Then refer to the details panel for statistics and other information about the bucket.

1. To export the table to a CSV file, choose **Export to CSV** at the top of the page. The resulting CSV file contains a subset of metadata for each bucket in the table, for up to 50,000 buckets. The file includes a **Coverage issues** field. The value for this field indicates whether issues prevented Macie from analyzing objects in the bucket and, if so, the nature of the issues.

------
#### [ API ]

To review coverage data programmatically, specify filter criteria in queries that you submit using the [DescribeBuckets](https://docs.aws.amazon.com/macie/latest/APIReference/datasources-s3.html) operation of the Amazon Macie API. This operation returns an array of objects. Each object contains statistical data and other information about an S3 general purpose bucket that matches the filter criteria.

In the filter criteria, include a condition for the type of coverage data that you want to review:
+ To identify buckets that Macie isn't allowed to access due to the buckets' permissions settings, include a condition where the value for the `errorCode` field equals `ACCESS_DENIED`.
+ To identify buckets that Macie is allowed to access and hasn't analyzed yet, include conditions where the value for the `sensitivityScore` field equals `50` and the value for the `errorCode` field doesn't equal `ACCESS_DENIED`.
+ To identify buckets that Macie can't analyze because all the buckets' objects use unsupported storage classes or formats, include conditions where the value for the `classifiableSizeInBytes` field equals `0` and the value for the `sizeInBytes` field is greater than `0`.
+ To identify buckets for which Macie has analyzed at least one object, include conditions where the value for the `sensitivityScore` field is in the range 1–99 and doesn't equal `50`. To also include buckets that you manually assigned the maximum score to, use the range 1–100.
+ To identify buckets that Macie hasn’t analyzed yet due to object-level classification errors, include a condition where the value for the `sensitivityScore` field equals `-1`. To then review a breakdown of the types and number of errors that occurred for a particular bucket, use the [GetResourceProfile](https://docs.aws.amazon.com/macie/latest/APIReference/resource-profiles.html) operation.

If you're using the AWS Command Line Interface (AWS CLI), specify filter criteria in queries that you submit by running the [describe-buckets](https://docs.aws.amazon.com/cli/latest/reference/macie2/describe-buckets.html) command. To review a breakdown of the types and number of errors that occurred for a particular S3 bucket, if any, run the [get-resource-profile](https://docs.aws.amazon.com/cli/latest/reference/macie2/get-resource-profile.html) command.

For example, the following AWS CLI commands use filter criteria to retrieve the details of all the S3 buckets that Macie isn't allowed to access due to the buckets' permissions settings.

This example is formatted for Linux, macOS, or Unix:

```
$ aws macie2 describe-buckets --criteria '{"errorCode":{"eq":["ACCESS_DENIED"]}}'
```

This example is formatted for Microsoft Windows:

```
C:\> aws macie2 describe-buckets --criteria={\"errorCode\":{\"eq\":[\"ACCESS_DENIED\"]}}
```

If your request succeeds, Macie returns a `buckets` array. The array contains an object for each S3 bucket that’s in the current AWS Region and matches the filter criteria.

If no S3 buckets match the filter criteria, Macie returns an empty `buckets` array.

```
{
    "buckets": []
}
```

For more information about specifying filter criteria in queries, including examples of common criteria, see [Filtering your S3 bucket inventory](monitoring-s3-inventory-filter.md).
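The other filter conditions listed earlier in this section can be expressed in the same way as the `ACCESS_DENIED` example. The following sketch, formatted for Linux, macOS, or Unix, wraps each condition in a shell function and adds a helper for `get-resource-profile`. The function names and the `amzn-s3-demo-bucket` placeholder are hypothetical. It assumes that the `eq` and `neq` operators take string values (as in the `ACCESS_DENIED` example), that `gt`, `gte`, and `lte` take numbers, that several operators on one field are combined with AND, and that bucket ARNs use the standard `arn:aws:s3:::` format.

```shell
#!/usr/bin/env bash
# Sketch: coverage queries for automated sensitive data discovery.
# Each describe-buckets helper prints the names of matching S3 general
# purpose buckets in the current AWS Region. Function names are hypothetical.

# Buckets that Macie hasn't analyzed yet due to object-level classification errors.
buckets_with_classification_errors() {
  aws macie2 describe-buckets \
    --criteria '{"sensitivityScore":{"eq":["-1"]}}' \
    --query 'buckets[].bucketName' --output text
}

# Buckets that Macie is allowed to access but hasn't analyzed yet.
buckets_not_yet_analyzed() {
  aws macie2 describe-buckets \
    --criteria '{"sensitivityScore":{"eq":["50"]},"errorCode":{"neq":["ACCESS_DENIED"]}}' \
    --query 'buckets[].bucketName' --output text
}

# Buckets that store data but no classifiable objects.
unclassifiable_buckets() {
  aws macie2 describe-buckets \
    --criteria '{"classifiableSizeInBytes":{"eq":["0"]},"sizeInBytes":{"gt":0}}' \
    --query 'buckets[].bucketName' --output text
}

# Buckets in which Macie has analyzed at least one object (1-99, excluding 50).
# Assumes gte, lte, and neq can be combined on one field.
analyzed_buckets() {
  aws macie2 describe-buckets \
    --criteria '{"sensitivityScore":{"gte":1,"lte":99,"neq":["50"]}}' \
    --query 'buckets[].bucketName' --output text
}

# Analysis statistics, including counts of skipped objects, for one bucket.
# $1: bucket name
bucket_profile_statistics() {
  aws macie2 get-resource-profile \
    --resource-arn "arn:aws:s3:::$1" \
    --query 'statistics'
}
```

For example, `bucket_profile_statistics amzn-s3-demo-bucket` prints the `statistics` object that `get-resource-profile` returns for that bucket. Keeping each set of criteria in a named function makes the queries easier to reuse in recurring checks.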

------

For detailed information that can help you address coverage issues, see [Remediating coverage issues for automated sensitive data discovery](discovery-coverage-remediate.md).

# Remediating coverage issues for automated sensitive data discovery
<a name="discovery-coverage-remediate"></a>

As automated sensitive data discovery progresses each day, Amazon Macie provides statistics and details to help you assess and monitor its coverage of your Amazon Simple Storage Service (Amazon S3) data estate. By [reviewing coverage data](discovery-coverage-review.md), you can check the status of automated sensitive data discovery for your data estate overall and individual S3 buckets within it. You can also identify issues that prevented Macie from analyzing objects in specific buckets. If you remediate the issues, you can increase coverage of your Amazon S3 data during subsequent analysis cycles.

Macie reports several types of issues that reduce coverage of your Amazon S3 data by automated sensitive data discovery. This includes bucket-level issues that prevent Macie from analyzing any objects in an S3 bucket. It also includes object-level issues. These issues, referred to as *classification errors*, prevented Macie from analyzing specific objects in a bucket. The following information can help you investigate and remediate the issues.

**Topics**
+ [Access denied](#discovery-issues-access-denied)
+ [Classification error: Invalid content](#discovery-issues-invalid-content)
+ [Classification error: Invalid encryption](#discovery-issues-classification-error-invalid-encryption)
+ [Classification error: Invalid KMS key](#discovery-issues-classification-error-invalid-key)
+ [Classification error: Permission denied](#discovery-issues-classification-error-permission-denied)
+ [Unclassifiable](#discovery-issues-unclassifiable)

**Tip**  
To investigate object-level classification errors for an S3 bucket, start by reviewing the list of object samples for the bucket. This list indicates which objects Macie analyzed or attempted to analyze in the bucket, for up to 100 objects.   
To review the list on the Amazon Macie console, choose the bucket on the **S3 buckets** page, and then choose the **Object samples** tab in the details panel. To review the list programmatically, use the [ListResourceProfileArtifacts](https://docs.aws.amazon.com/macie/latest/APIReference/resource-profiles-artifacts.html) operation of the Amazon Macie API. If the status of the analysis for an object is **Skipped** (`SKIPPED`), the object might have caused the error.
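As a rough illustration of the programmatic option, the following AWS CLI sketch prints only the sampled objects whose analysis status is `SKIPPED`. It assumes that each item in the response's `artifacts` array includes `arn` and `classificationResultStatus` fields. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the ARNs of sampled objects that Macie skipped in a bucket.
# $1: bucket name. Assumes each artifact has a classificationResultStatus field.
skipped_sample_objects() {
  aws macie2 list-resource-profile-artifacts \
    --resource-arn "arn:aws:s3:::$1" \
    --query "artifacts[?classificationResultStatus=='SKIPPED'].arn" \
    --output text
}
```

For example, `skipped_sample_objects amzn-s3-demo-bucket` returns object ARNs that you can use as a starting point for the checks described in the following sections.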

## Access denied
<a name="discovery-issues-access-denied"></a>

This issue indicates that an S3 bucket's permissions settings prevent Macie from accessing the bucket and the bucket’s objects. Macie can't retrieve and analyze any objects in the bucket.

**Details**  
The most common cause for this type of issue is a restrictive bucket policy. A *bucket policy* is a resource-based AWS Identity and Access Management (IAM) policy that specifies which actions a principal (user, account, service, or other entity) can perform on an S3 bucket, and the conditions under which a principal can perform those actions. A *restrictive bucket policy* uses explicit `Allow` or `Deny` statements that grant or restrict access to a bucket's data based on specific conditions. For example, a bucket policy might contain an `Allow` or `Deny` statement that blocks access to a bucket unless requests originate from specific source IP addresses.  
If the bucket policy for an S3 bucket contains an explicit `Deny` statement with one or more conditions, Macie might not be allowed to retrieve and analyze the bucket’s objects to detect sensitive data. Macie can only provide a subset of information about the bucket, such as the bucket's name and creation date.

**Remediation guidance**  
To remediate this issue, update the bucket policy for the S3 bucket. Ensure that the policy allows Macie to access the bucket and the bucket’s objects. To allow this access, add a condition for the Macie service-linked role (`AWSServiceRoleForAmazonMacie`) to the policy. The condition should exclude the Macie service-linked role from matching the `Deny` restriction in the policy. It can do this by using the `aws:PrincipalArn` global condition context key and the Amazon Resource Name (ARN) of the Macie service-linked role for your account.  
If you update the bucket policy and Macie gains access to the S3 bucket, Macie will detect the change. When this happens, Macie will update statistics, inventory data, and other information that it provides about your Amazon S3 data. In addition, the bucket's objects will be a higher priority for analysis during a subsequent analysis cycle.
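The following sketch looks up the ARN of the Macie service-linked role and prints a condition that you could merge into the `Condition` element of the `Deny` statement. The function names are hypothetical, and the example uses the `StringNotLike` operator. If the statement already uses that operator for another key, add `aws:PrincipalArn` to the existing block rather than adding a second `StringNotLike` key. Because every condition in a statement must match for the statement to apply, the `Deny` then no longer applies to requests from the Macie role.

```shell
#!/usr/bin/env bash
# Sketch: exempt the Macie service-linked role from a Deny statement.

# Print the ARN of the Macie service-linked role for the current account.
macie_role_arn() {
  aws iam get-role --role-name AWSServiceRoleForAmazonMacie \
    --query 'Role.Arn' --output text
}

# Print a condition block to merge into the Deny statement's "Condition" element.
# $1: ARN of the Macie service-linked role
macie_exemption_condition() {
  printf '{"StringNotLike":{"aws:PrincipalArn":"%s"}}\n' "$1"
}
```

For example, `macie_exemption_condition "$(macie_role_arn)"` prints a block such as `{"StringNotLike":{"aws:PrincipalArn":"arn:aws:iam::111122223333:role/aws-service-role/macie.amazonaws.com/AWSServiceRoleForAmazonMacie"}}`. Review the merged policy before you apply it with `aws s3api put-bucket-policy`, because that command replaces the bucket's entire policy.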

**Additional reference**  
For more information about updating an S3 bucket policy to allow Macie to access a bucket, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md). For information about using bucket policies to control access to buckets, see [Bucket policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-policies.html) and [How Amazon S3 authorizes a request](https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-s3-evaluates-access-control.html) in the *Amazon Simple Storage Service User Guide*.

## Classification error: Invalid content
<a name="discovery-issues-invalid-content"></a>

This type of classification error occurs if Macie attempts to analyze an object in an S3 bucket and the object is malformed or the object contains content that exceeds a sensitive data discovery quota. Macie can't analyze the object.

**Details**  
This error typically occurs because an S3 object is a malformed or corrupted file. Consequently, Macie can't parse and analyze all the data in the file.  
This error can also occur if analysis of an S3 object would exceed a sensitive data discovery quota for an individual file. For example, the storage size of the object exceeds the size quota for that type of file.  
For either case, Macie can't complete its analysis of the S3 object and the status of the analysis for the object is **Skipped** (`SKIPPED`).

**Remediation guidance**  
To investigate this error, download the S3 object and check the formatting and contents of the file. Also assess the contents of the file against Macie quotas for sensitive data discovery.  
If you don't remediate this error, Macie will try to analyze other objects in the S3 bucket. If Macie analyzes another object successfully, Macie will update coverage data and other information that it provides about the bucket.
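To start an investigation, you might use a sketch like the following. It prints an object's storage size and content type, which you can compare against the quota for that type of file, and then downloads a copy to the current directory so that you can check the file's formatting. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: gather details about an object that Macie skipped as invalid content.
# $1: bucket name, $2: object key
inspect_object() {
  # Storage size (bytes) and content type, to compare against Macie quotas.
  aws s3api head-object --bucket "$1" --key "$2" \
    --query '[ContentLength, ContentType]' --output text
  # Download a local copy to check the file's formatting and contents.
  aws s3 cp "s3://$1/$2" "./$(basename "$2")"
}
```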

**Additional reference**  
For a list of sensitive data discovery quotas, including the quotas for certain types of files, see [Quotas for Macie](macie-quotas.md). For information about how Macie updates sensitivity scores and other information that it provides about S3 buckets, see [How automated sensitive data discovery works](discovery-asdd-how-it-works.md).

## Classification error: Invalid encryption
<a name="discovery-issues-classification-error-invalid-encryption"></a>

This type of classification error occurs if Macie attempts to analyze an object in an S3 bucket and the object is encrypted with a customer-provided key. The object uses SSE-C encryption, which means that Macie can't retrieve and analyze the object.

**Details**  
Amazon S3 supports multiple encryption options for S3 objects. For most of these options, Macie can decrypt an object by using the Macie service-linked role for your account. However, this depends on the type of encryption that was used.  
For Macie to decrypt an S3 object, the object must be encrypted with a key that Macie can access and is allowed to use. If an object is encrypted with a customer-provided key, Macie can't provide the requisite key material to retrieve the object from Amazon S3. Consequently, Macie can't analyze the object and the status of the analysis for the object is **Skipped** (`SKIPPED`).

**Remediation guidance**  
To remediate this error, encrypt S3 objects with Amazon S3 managed keys or AWS Key Management Service (AWS KMS) keys. If you prefer to use AWS KMS keys, the keys can be AWS managed KMS keys, or customer managed KMS keys that Macie is allowed to use.  
To encrypt existing S3 objects with keys that Macie can access and use, you can change the encryption settings for the objects. To encrypt new objects with keys that Macie can access and use, change the default encryption settings for the S3 bucket. Also ensure that the bucket's policy doesn't require new objects to be encrypted with a customer-provided key.  
If you don't remediate this error, Macie will try to analyze other objects in the S3 bucket. If Macie analyzes another object successfully, Macie will update coverage data and other information that it provides about the bucket.
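As an illustration, the following sketch shows a bucket's current default encryption settings and then makes Amazon S3 managed keys (SSE-S3) the default for new objects. The function names are hypothetical. Default encryption applies only to requests that don't specify their own encryption settings, so a client that sends SSE-C headers can still create SSE-C objects. This is one reason to also check the bucket's policy.

```shell
#!/usr/bin/env bash
# Sketch: review and change default encryption for a bucket.
# $1: bucket name

# Show the bucket's current default encryption rule.
show_default_encryption() {
  aws s3api get-bucket-encryption --bucket "$1" \
    --query 'ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault'
}

# Use Amazon S3 managed keys (SSE-S3) by default for new objects.
use_sse_s3_by_default() {
  aws s3api put-bucket-encryption --bucket "$1" \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
}
```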

**Additional reference**  
For information about requirements and options for using Macie to analyze encrypted S3 objects, see [Analyzing encrypted Amazon S3 objects](discovery-supported-encryption-types.md). For information about encryption options and settings for S3 buckets, see [Protecting data with encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingEncryption.html) and [Setting default server-side encryption behavior for S3 buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-encryption.html) in the *Amazon Simple Storage Service User Guide*.

## Classification error: Invalid KMS key
<a name="discovery-issues-classification-error-invalid-key"></a>

This type of classification error occurs if Macie attempts to analyze an object in an S3 bucket and the object is encrypted with an AWS Key Management Service (AWS KMS) key that's no longer available. Macie can't retrieve and analyze the object.

**Details**  
AWS KMS provides options for disabling and deleting customer managed AWS KMS keys. If an S3 object is encrypted with a KMS key that is disabled, is scheduled for deletion, or was deleted, Macie can't retrieve and decrypt the object. Consequently, Macie can't analyze the object and the status of the analysis for the object is **Skipped** (`SKIPPED`). For Macie to analyze an encrypted object, the object must be encrypted with a key that Macie can access and is allowed to use.

**Remediation guidance**  
To remediate this error, re-enable the applicable AWS KMS key or cancel the scheduled deletion of the key, depending on the current status of the key. If the applicable key was already deleted, this error cannot be remediated.   
To determine which AWS KMS key was used to encrypt an S3 object, you can start by using Macie to review the server-side encryption settings for the S3 bucket. If the default encryption settings for the bucket are configured to use a KMS key, the bucket's details indicate which key is used. You can then check the status of that key. Alternatively, you can use Amazon S3 to review the encryption settings for the bucket and individual objects in the bucket.  
If you don't remediate this error, Macie will try to analyze other objects in the S3 bucket. If Macie analyzes another object successfully, Macie will update coverage data and other information that it provides about the bucket.
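The following sketch covers these checks with the AWS CLI. It reads the KMS key from the bucket's default encryption settings or from an individual object, and then restores the key based on its current state. Canceling a scheduled deletion leaves the key disabled, so the sketch enables the key afterward. The function names are hypothetical, and the sketch doesn't handle key states other than `Disabled` and `PendingDeletion`.

```shell
#!/usr/bin/env bash
# Sketch: find and restore the KMS key behind an "Invalid KMS key" error.

# KMS key configured for the bucket's default encryption, if any. $1: bucket name
default_kms_key() {
  aws s3api get-bucket-encryption --bucket "$1" \
    --query 'ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.KMSMasterKeyID' \
    --output text
}

# Encryption type and KMS key ID for one object. $1: bucket name, $2: object key
object_kms_key() {
  aws s3api head-object --bucket "$1" --key "$2" \
    --query '[ServerSideEncryption, SSEKMSKeyId]' --output text
}

# Re-enable a key or cancel its scheduled deletion. $1: key ID or ARN
restore_kms_key() {
  state="$(aws kms describe-key --key-id "$1" --query 'KeyMetadata.KeyState' --output text)"
  case "$state" in
    PendingDeletion)
      # Canceling deletion leaves the key disabled, so enable it as well.
      aws kms cancel-key-deletion --key-id "$1" && aws kms enable-key --key-id "$1" ;;
    Disabled)
      aws kms enable-key --key-id "$1" ;;
    *)
      echo "Key $1 is in state $state; no change made." ;;
  esac
}
```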

**Additional reference**  
For information about using Macie to review the server-side encryption settings for an S3 bucket, see [Reviewing the details of S3 buckets](monitoring-s3-inventory-review.md#monitoring-s3-inventory-view-details). For information about re-enabling an AWS KMS key or canceling the scheduled deletion of a key, see [Enabling and disabling keys](https://docs.aws.amazon.com/kms/latest/developerguide/enabling-keys.html) and [Deleting keys](https://docs.aws.amazon.com/kms/latest/developerguide/deleting-keys.html) in the *AWS Key Management Service Developer Guide*.

## Classification error: Permission denied
<a name="discovery-issues-classification-error-permission-denied"></a>

This type of classification error occurs if Macie attempts to analyze an object in an S3 bucket and Macie can't retrieve or decrypt the object due to the permissions settings for the object or the permissions settings for the key that was used to encrypt the object. Macie can't retrieve and analyze the object.

**Details**  
This error typically occurs because an S3 object is encrypted with a customer managed AWS Key Management Service (AWS KMS) key that Macie isn’t allowed to use. If an object is encrypted with a customer managed AWS KMS key, the key's policy must allow Macie to decrypt data by using the key.  
This error can also occur if Amazon S3 permissions settings prevent Macie from retrieving an S3 object. The bucket policy for the S3 bucket might restrict access to specific bucket objects or allow only certain principals (users, accounts, services, or other entities) to access the objects. Or the access control list (ACL) for an object might restrict access to the object. Consequently, Macie might not be allowed to access the object.  
For any of the preceding cases, Macie can't retrieve and analyze the object, and the status of the analysis for the object is **Skipped** (`SKIPPED`).

**Remediation guidance**  
To remediate this error, determine whether the S3 object is encrypted with a customer managed AWS KMS key. If it is, ensure that the key's policy allows the Macie service-linked role (`AWSServiceRoleForAmazonMacie`) to decrypt data with the key. How you allow this access depends on whether the account that owns the AWS KMS key also owns the S3 bucket that stores the object. If the same account owns the KMS key and the bucket, a user of the account has to update the key's policy. If one account owns the KMS key and a different account owns the bucket, a user of the account that owns the key has to allow cross-account access to the key.  
You can automatically generate a list of all the customer managed AWS KMS keys that Macie needs to access to analyze objects in the S3 buckets for your account. To do this, run the AWS KMS Permission Analyzer script, which is available from the [Amazon Macie Scripts](https://github.com/aws-samples/amazon-macie-scripts) repository on GitHub. The script can also generate an additional script of AWS Command Line Interface (AWS CLI) commands. You can optionally run those commands to update the requisite configuration settings and policies for KMS keys that you specify.  
If Macie is already allowed to use the applicable AWS KMS key or the S3 object isn't encrypted with a customer managed KMS key, ensure that the bucket's policy allows Macie to access the object. Also verify that the object's ACL allows Macie to read the object's data and metadata.   
For the bucket policy, you can allow this access by adding a condition for the Macie service-linked role to the policy. The condition should exclude the Macie service-linked role from matching the `Deny` restriction in the policy. It can do this by using the `aws:PrincipalArn` global condition context key and the Amazon Resource Name (ARN) of the Macie service-linked role for your account.  
For the object ACL, you can allow this access by working with the object owner to add your AWS account as a grantee with `READ` permissions for the object. Macie can then use the service-linked role for your account to retrieve and analyze the object. Also consider changing the Object Ownership settings for the bucket. You can use these settings to disable ACLs for all the objects in the bucket and grant ownership permissions to the account that owns the bucket.  
If you don't remediate this error, Macie will try to analyze other objects in the S3 bucket. If Macie analyzes another object successfully, Macie will update coverage data and other information that it provides about the bucket.
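A sketch of these checks with the AWS CLI follows. The key policy check is only a text search for the role name, not a full policy evaluation, so confirm what the matching statement actually allows. The function names are hypothetical. Setting Object Ownership to `BucketOwnerEnforced` disables ACLs for the whole bucket, so confirm that nothing else relies on those ACLs before you change it.

```shell
#!/usr/bin/env bash
# Sketch: checks for a "Permission denied" classification error.

# Succeeds if the key policy mentions the Macie service-linked role. $1: key ID or ARN
key_policy_mentions_macie() {
  aws kms get-key-policy --key-id "$1" --policy-name default \
    --query 'Policy' --output text | grep -q 'AWSServiceRoleForAmazonMacie'
}

# Show the grants in an object's ACL. $1: bucket name, $2: object key
show_object_acl() {
  aws s3api get-object-acl --bucket "$1" --key "$2" --query 'Grants'
}

# Disable ACLs and give the bucket owner ownership of every object. $1: bucket name
enforce_bucket_owner() {
  aws s3api put-bucket-ownership-controls --bucket "$1" \
    --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'
}
```

For example, `key_policy_mentions_macie "$KEY_ID" || echo "Review the key policy"` flags a key whose policy doesn't reference the role at all.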

**Additional reference**  
For more information about allowing Macie to decrypt data with a customer managed AWS KMS key, see [Allowing Macie to use a customer managed AWS KMS key](discovery-supported-encryption-types.md#discovery-supported-encryption-cmk-configuration). For information about updating an S3 bucket policy to allow Macie to access a bucket, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md).  
For information about updating a key policy, see [Changing a key policy](https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-modifying.html) in the *AWS Key Management Service Developer Guide*. For information about using customer managed AWS KMS keys to encrypt S3 objects, see [Using server-side encryption with AWS KMS keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) in the *Amazon Simple Storage Service User Guide*.   
For information about using bucket policies to control access to S3 buckets, see [Access control](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-management.html) and [How Amazon S3 authorizes a request](https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-s3-evaluates-access-control.html) in the *Amazon Simple Storage Service User Guide*. For information about using ACLs or Object Ownership settings to control access to S3 objects, see [Managing access with ACLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/acls.html) and [Controlling ownership of objects and disabling ACLs for your bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/about-object-ownership.html) in the *Amazon Simple Storage Service User Guide*.

## Unclassifiable
<a name="discovery-issues-unclassifiable"></a>

This issue indicates that all the objects in an S3 bucket are stored using unsupported Amazon S3 storage classes or unsupported file or storage formats. Macie can't analyze any objects in the bucket.

**Details**  
To be eligible for selection and analysis, an S3 object must use an Amazon S3 storage class that Macie supports. The object must also have a file name extension for a file or storage format that Macie supports. If an object doesn't meet these criteria, the object is treated as an *unclassifiable object*. Macie doesn't attempt to retrieve or analyze data in unclassifiable objects.  
If all the objects in an S3 bucket are unclassifiable objects, the overall bucket is an *unclassifiable bucket*. Macie can't perform automated sensitive data discovery for the bucket.

**Remediation guidance**  
To address this issue, review lifecycle configuration rules and other settings that determine which storage classes are used to store objects in the S3 bucket. Consider adjusting those settings to use storage classes that Macie supports. You can also change the storage class of existing objects in the bucket.  
Also assess the file and storage formats of existing objects in the S3 bucket. To analyze the objects, consider porting the data, either temporarily or permanently, to new objects that use a supported format.  
If objects are added to the S3 bucket and they use a supported storage class and format, Macie will detect the objects the next time it evaluates your bucket inventory. When this happens, Macie will stop reporting that the bucket is *unclassifiable* in statistics, coverage data, and other information that it provides about your Amazon S3 data. In addition, the new objects will be a higher priority for analysis during a subsequent analysis cycle.
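For example, the following sketch lists the storage class of each object in a bucket and copies an object onto itself with a different storage class. The function names are hypothetical. A single copy operation handles objects up to 5 GB, keys with special characters might need URL encoding in `--copy-source`, and objects in archival storage classes typically have to be restored before they can be copied.

```shell
#!/usr/bin/env bash
# Sketch: find objects in unsupported storage classes and move them.

# Print each object's key and storage class. $1: bucket name
list_storage_classes() {
  aws s3api list-objects-v2 --bucket "$1" \
    --query 'Contents[].[Key, StorageClass]' --output text
}

# Copy an object onto itself with a new storage class (STANDARD by default).
# $1: bucket name, $2: object key, $3: optional target storage class
change_storage_class() {
  aws s3api copy-object --bucket "$1" --key "$2" \
    --copy-source "$1/$2" --storage-class "${3:-STANDARD}"
}
```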

**Additional reference**  
For information about the Amazon S3 storage classes and the file and storage formats that Macie supports, see [Supported storage classes and formats](discovery-supported-storage.md). For information about lifecycle configuration rules and the storage class options that Amazon S3 provides, see [Managing your storage lifecycle](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html) and [Using Amazon S3 storage classes](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html) in the *Amazon Simple Storage Service User Guide*. 

# Adjusting sensitivity scores for S3 buckets
<a name="discovery-asdd-s3bucket-manage"></a>

As you review and evaluate statistics, data, and other results of automated sensitive data discovery, there might be cases where you want to fine tune sensitivity assessments of your Amazon Simple Storage Service (Amazon S3) buckets. You might also want to capture the results of investigations that you or your organization performs for specific buckets. If you're the Amazon Macie administrator for an organization or you have a standalone Macie account, you can make these changes by adjusting the sensitivity score and other settings for individual buckets. If you have a member account in an organization, work with your Macie administrator to adjust the settings for buckets that you own. Only the Macie administrator for your organization can adjust these settings for your buckets.

If you're a Macie administrator or you have a standalone Macie account, you can adjust the sensitivity score for an S3 bucket in the following ways:
+ **Assign a sensitivity score** – By default, Macie automatically calculates a bucket's sensitivity score. The score is based primarily on the amount of sensitive data that Macie has found in a bucket, and the amount of data that Macie has analyzed in a bucket. For more information, see [Sensitivity scoring for S3 buckets](discovery-scoring-s3.md).

  You can override a bucket's calculated score and manually assign the maximum score (*100*), which also applies the *Sensitive* label to the bucket. If you do this, Macie continues to perform automated sensitive data discovery for the bucket. However, subsequent analyses don't affect the bucket's score. To calculate the score automatically again, change the setting again.
+ **Exclude or include sensitive data types in the sensitivity score** – If it's calculated automatically, a bucket's sensitivity score is based partly on the amount of sensitive data that Macie has found in the bucket. This derives primarily from the nature and number of sensitive data types that Macie has found, and the number of occurrences of each type. By default, Macie includes occurrences of all types of sensitive data when it calculates a bucket's score.

  You can adjust the calculation by excluding or including specific types of sensitive data in a bucket's score. For example, if Macie detected mailing addresses in a bucket and you determine that this is acceptable, you can exclude all occurrences of mailing addresses from the bucket's score. If you exclude a sensitive data type, Macie continues to inspect the bucket for that type of data, and report occurrences that it finds. However, those occurrences don't affect the bucket's score. To include a sensitive data type in the score again, change the setting again.
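If you prefer the Amazon Macie API, the following AWS CLI sketch shows both adjustments. It assumes that omitting `--sensitivity-score-override` returns the bucket to a calculated score, and that the list passed to `update-resource-profile-detections` replaces the bucket's current set of excluded data identifiers. The function names are hypothetical, and bucket ARNs use the standard `arn:aws:s3:::` format.

```shell
#!/usr/bin/env bash
# Sketch: adjust the sensitivity score for a bucket. $1: bucket name

# Manually assign the maximum score (100), which also applies the Sensitive label.
assign_max_score() {
  aws macie2 update-resource-profile \
    --resource-arn "arn:aws:s3:::$1" --sensitivity-score-override 100
}

# Go back to a score that Macie calculates automatically.
use_calculated_score() {
  aws macie2 update-resource-profile --resource-arn "arn:aws:s3:::$1"
}

# Exclude detections by one managed data identifier from the score.
# $2: managed data identifier ID, for example USA_PASSPORT_NUMBER
exclude_managed_type_from_score() {
  aws macie2 update-resource-profile-detections \
    --resource-arn "arn:aws:s3:::$1" \
    --suppress-data-identifiers "[{\"id\":\"$2\",\"type\":\"MANAGED\"}]"
}
```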

You can also exclude an S3 bucket from subsequent analyses. If you exclude a bucket, existing sensitive data discovery statistics and details for the bucket persist. For example, the bucket's current sensitivity score remains unchanged. However, Macie stops analyzing objects in the bucket when it performs automated sensitive data discovery. After you exclude a bucket, you can include it again later.
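With the AWS CLI, excluding a bucket is a change to the classification scope that automated sensitive data discovery uses. The following sketch assumes that the scope's ID is available in the `classificationScopeId` field of the automated discovery configuration. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: exclude a bucket from, or include it again in, automated discovery.

# Print the ID of the classification scope for automated discovery.
classification_scope_id() {
  aws macie2 get-automated-discovery-configuration \
    --query 'classificationScopeId' --output text
}

# $1: scope ID, $2: bucket name, $3: ADD to exclude (default) or REMOVE to include again
update_bucket_exclusion() {
  aws macie2 update-classification-scope --id "$1" \
    --s3 "{\"excludes\":{\"bucketNames\":[\"$2\"],\"operation\":\"${3:-ADD}\"}}"
}
```

For example, `update_bucket_exclusion "$(classification_scope_id)" amzn-s3-demo-bucket` excludes the bucket, and passing `REMOVE` as the third argument includes it again.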

If you change a setting that affects the sensitivity score for an S3 bucket, Macie immediately begins to recalculate the score. Macie also updates relevant statistics and other information that it provides about the bucket and your Amazon S3 data overall. For example, if you assign the maximum score to a bucket, Macie increments the count of *Sensitive* buckets in aggregated statistics.

**To adjust the sensitivity score or other settings for an S3 bucket**  
To adjust the sensitivity score or other settings for an S3 bucket, you can use the Amazon Macie console or the Amazon Macie API.

------
#### [ Console ]

Follow these steps to adjust the sensitivity score or a setting for an S3 bucket by using the Amazon Macie console.

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **S3 buckets**. The **S3 buckets** page displays your bucket inventory.

   By default, the page doesn't display data for buckets that are currently excluded from analyses. If you're the Macie administrator for an organization, it also doesn't display data for accounts for which automated sensitive data discovery is currently disabled. To display this data, choose **X** in the **Is monitored by automated discovery** filter token below the filter box.

1. Choose the S3 bucket that has a setting to adjust. You can choose the bucket by using the table view (![\[The table view button, which is a button that displays three black horizontal lines.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-s3-table-view.png)) or the interactive map (![\[The map view button, which is a button that displays four black squares.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-s3-map-view.png)).

1. In the details panel, do any of the following:
   + To override the calculated sensitivity score and manually assign the maximum score, turn on **Assign maximum score** (![\[A toggle switch with a gray background and the toggle positioned to the left.\]](http://docs.aws.amazon.com/macie/latest/user/images/tgl-gray-off.png)). This changes the bucket's score to *100* and applies the *Sensitive* label to the bucket.
   + To assign a sensitivity score that Macie calculates automatically, turn off **Assign maximum score** (![\[A toggle switch with a blue background and the toggle positioned to the right.\]](http://docs.aws.amazon.com/macie/latest/user/images/tgl-blue-on.png)).
   + To exclude or include specific types of sensitive data in the sensitivity score, choose the **Sensitivity** tab. In the **Detections** table, select the checkbox for the sensitive data type to exclude or include. Then, on the **Actions** menu, choose **Exclude from score** to exclude the type or choose **Include in score** to include the type.

     In the table, the **Sensitive data type** field specifies the managed data identifier or custom data identifier that detected the data. For a managed data identifier, this is a unique identifier (ID) that describes the type of sensitive data that the identifier is designed to detect. For example, **USA_PASSPORT_NUMBER** is the ID for US passport numbers. For details about each managed data identifier, see [Using managed data identifiers](managed-data-identifiers.md).
   + To exclude the bucket from subsequent analyses, turn on **Exclude from automated discovery** (![\[A toggle switch with a gray background and the toggle positioned to the left.\]](http://docs.aws.amazon.com/macie/latest/user/images/tgl-gray-off.png)).
   + To include the bucket in subsequent analyses, if you previously excluded it, turn off **Exclude from automated discovery** (![\[A toggle switch with a blue background and the toggle positioned to the right.\]](http://docs.aws.amazon.com/macie/latest/user/images/tgl-blue-on.png)).

------
#### [ API ]

To adjust the sensitivity score or a setting for an S3 bucket programmatically, you have several options. The appropriate option depends on what you want to adjust.

**Assign a sensitivity score**  
To assign a sensitivity score to an S3 bucket, use the [UpdateResourceProfile](https://docs.aws.amazon.com/macie/latest/APIReference/resource-profiles.html) operation. In your request, use the `resourceArn` parameter to specify the Amazon Resource Name (ARN) of the bucket. For the `sensitivityScoreOverride` parameter, do one of the following:  
+ To override the calculated score and manually assign the maximum score, specify `100`.
+ To assign a score that Macie calculates automatically, omit the parameter. If this parameter is null, Macie calculates and assigns the score.
If you're using the AWS Command Line Interface (AWS CLI), run the [update-resource-profile](https://docs.aws.amazon.com/cli/latest/reference/macie2/update-resource-profile.html) command to assign a sensitivity score to an S3 bucket. In your request, use the `resource-arn` parameter to specify the ARN of the bucket. To assign the maximum score, use the `sensitivity-score-override` parameter; to assign the score that Macie calculates, omit it.  
If your request succeeds, Macie assigns the specified score and returns an empty response.

**Exclude or include sensitive data types in the sensitivity score**  
To exclude or include sensitive data types in the sensitivity score for an S3 bucket, use the [UpdateResourceProfileDetections](https://docs.aws.amazon.com/macie/latest/APIReference/resource-profiles-detections.html) operation. When you use this operation, you overwrite the current inclusion and exclusion settings for a bucket's score. Therefore, it's a good idea to first retrieve the current settings and determine which ones you want to keep. To retrieve the current settings, use the [ListResourceProfileDetections](https://docs.aws.amazon.com/macie/latest/APIReference/resource-profiles-detections.html) operation.  
When you're ready to update the settings, use the `resourceArn` parameter to specify the ARN of the S3 bucket. For the `suppressDataIdentifiers` parameter, do one of the following:  
+ To exclude a sensitive data type from the bucket's score, use the `type` parameter to specify the type of data identifier that detected the data, a managed data identifier (`MANAGED`) or a custom data identifier (`CUSTOM`). Use the `id` parameter to specify the unique identifier for the managed or custom data identifier that detected the data.
+ To include a sensitive data type in the bucket's score, omit the entry for the managed or custom data identifier that detected the data.
+ To include all sensitive data types in the bucket's score, don't specify any values. If the value for the `suppressDataIdentifiers` parameter is null (empty), Macie includes all types of detections when it calculates the score.
If you're using the AWS CLI, run the [update-resource-profile-detections](https://docs.aws.amazon.com/cli/latest/reference/macie2/update-resource-profile-detections.html) command to exclude or include sensitive data types in the sensitivity score for an S3 bucket. Use the `resource-arn` parameter to specify the ARN of the bucket. Use the `suppress-data-identifiers` parameter to specify which sensitive data types to exclude or include in the bucket's score. To first retrieve and review the current settings for the bucket, run the [list-resource-profile-detections](https://docs.aws.amazon.com/cli/latest/reference/macie2/list-resource-profile-detections.html) command.   
If your request succeeds, Macie updates the settings and returns an empty response.
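Because `UpdateResourceProfileDetections` replaces a bucket's existing exclusion settings, a script that adds one exclusion has to read the current settings first. The following shell sketch shows that read-modify-write pattern with the AWS CLI. It isn't an official tool: the function names `build_suppress_list` and `add_suppression` are hypothetical, and the sketch assumes that each entry in the `detections` array returned by `list-resource-profile-detections` includes `type`, `id`, and `suppressed` fields, and that all detections fit in one page of results (it ignores `nextToken`).

```shell
# Sketch only: exclude one more sensitive data type from a bucket's sensitivity
# score while keeping the exclusions that are already in place.
# build_suppress_list and add_suppression are hypothetical helper names.

# Print a JSON array for --suppress-data-identifiers from TYPE:ID arguments,
# for example MANAGED:ADDRESS. With no arguments, it prints [], which
# includes all sensitive data types in the score.
build_suppress_list() {
  local json="" pair
  for pair in "$@"; do
    json="$json${json:+,}{\"type\":\"${pair%%:*}\",\"id\":\"${pair#*:}\"}"
  done
  printf '[%s]\n' "$json"
}

# Usage: add_suppression BUCKET_ARN TYPE ID   (TYPE is MANAGED or CUSTOM)
add_suppression() {
  local bucket_arn="$1" new_pair="$2:$3" current type id tab
  tab=$(printf '\t')

  # Read the data identifiers that are currently excluded (suppressed) as
  # "TYPE<tab>ID" lines. Assumption: all detections fit in one page.
  current=$(aws macie2 list-resource-profile-detections \
    --resource-arn "$bucket_arn" \
    --query 'detections[?suppressed].[type, id]' \
    --output text) || return 1

  # Rebuild the positional parameters as TYPE:ID pairs.
  set --
  while IFS="$tab" read -r type id; do
    [ -n "$type" ] || continue
    [ "$type:$id" = "$new_pair" ] && return 0  # already excluded; nothing to do
    set -- "$@" "$type:$id"
  done <<EOF
$current
EOF
  set -- "$@" "$new_pair"

  # This call replaces the bucket's previous exclusion settings.
  aws macie2 update-resource-profile-detections \
    --resource-arn "$bucket_arn" \
    --suppress-data-identifiers "$(build_suppress_list "$@")"
}
```

For example, `add_suppression arn:aws:s3:::amzn-s3-demo-bucket3 MANAGED ADDRESS` excludes mailing addresses from the bucket's score without discarding any exclusions that were already in place.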

**Exclude or include an S3 bucket in analyses**  
To exclude or subsequently include an S3 bucket in analyses, use the [UpdateClassificationScope](https://docs.aws.amazon.com/macie/latest/APIReference/classification-scopes-id.html) operation. Or, if you're using the AWS CLI, run the [update-classification-scope](https://docs.aws.amazon.com/cli/latest/reference/macie2/update-classification-scope.html) command. For additional details and examples, see [Excluding or including S3 buckets in automated sensitive data discovery](discovery-asdd-account-configure.md#discovery-asdd-account-configure-s3buckets).
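As a rough illustration of the `UpdateClassificationScope` approach, the following shell sketch adds a bucket to, or removes it from, the list of buckets that are excluded from automated sensitive data discovery. It assumes that `get-automated-discovery-configuration` reports the account's `classificationScopeId`, and that the `--s3` value accepts an `excludes` object with `bucketNames` and an `operation` of `ADD` or `REMOVE`; check these details against the linked topic and the API reference before relying on them. The function name `set_bucket_excluded` is hypothetical.

```shell
# Sketch only: add a bucket to, or remove it from, the exclusion list for
# automated sensitive data discovery. set_bucket_excluded is hypothetical.
# Usage: set_bucket_excluded BUCKET_NAME yes|no
set_bucket_excluded() {
  local bucket="$1" operation scope_id
  case "$2" in
    yes) operation=ADD ;;     # exclude the bucket from subsequent analyses
    no)  operation=REMOVE ;;  # include a previously excluded bucket again
    *)   echo "usage: set_bucket_excluded BUCKET_NAME yes|no" >&2; return 2 ;;
  esac

  # Look up the ID of the account's classification scope (assumed field name).
  scope_id=$(aws macie2 get-automated-discovery-configuration \
    --query classificationScopeId --output text) || return 1

  # ADD and REMOVE are assumed to change only the named bucket in the list.
  aws macie2 update-classification-scope --id "$scope_id" \
    --s3 "{\"excludes\":{\"bucketNames\":[\"$bucket\"],\"operation\":\"$operation\"}}"
}
```

For example, `set_bucket_excluded amzn-s3-demo-bucket yes` excludes the bucket, and `set_bucket_excluded amzn-s3-demo-bucket no` includes it again. This operation takes a bucket name rather than an ARN.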

The following examples show how to use the AWS CLI to adjust individual settings for an S3 bucket. This first example manually assigns the maximum sensitivity score (`100`) to a bucket. It overrides the bucket's calculated score.

```
$ aws macie2 update-resource-profile --resource-arn arn:aws:s3:::amzn-s3-demo-bucket --sensitivity-score-override 100
```

Where *arn:aws:s3:::amzn-s3-demo-bucket* is the ARN of the S3 bucket.

The next example changes the sensitivity score for an S3 bucket to a score that Macie calculates automatically. The bucket currently has a manually assigned score that overrides the calculated score. This example removes that override by omitting the `sensitivity-score-override` parameter from the request.

```
$ aws macie2 update-resource-profile --resource-arn arn:aws:s3:::amzn-s3-demo-bucket2
```

Where *arn:aws:s3:::amzn-s3-demo-bucket2* is the ARN of the S3 bucket.

The following examples exclude particular types of sensitive data from the sensitivity score for an S3 bucket. This example is formatted for Linux, macOS, or Unix, and it uses the backslash (`\`) line-continuation character to improve readability.

```
$ aws macie2 update-resource-profile-detections \
--resource-arn arn:aws:s3:::amzn-s3-demo-bucket3 \
--suppress-data-identifiers '[{"type":"MANAGED","id":"ADDRESS"},{"type":"CUSTOM","id":"3293a69d-4a1e-4a07-8715-208ddexample"}]'
```

This example is formatted for Microsoft Windows, and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws macie2 update-resource-profile-detections ^
--resource-arn arn:aws:s3:::amzn-s3-demo-bucket3 ^
--suppress-data-identifiers=[{\"type\":\"MANAGED\",\"id\":\"ADDRESS\"},{\"type\":\"CUSTOM\",\"id\":\"3293a69d-4a1e-4a07-8715-208ddexample\"}]
```

Where:
+ *arn:aws:s3:::amzn-s3-demo-bucket3* is the ARN of the S3 bucket.
+ *ADDRESS* is the unique identifier for the managed data identifier that detected a type of sensitive data to exclude (mailing addresses).
+ *3293a69d-4a1e-4a07-8715-208ddexample* is the unique identifier for the custom data identifier that detected a type of sensitive data to exclude.

The next set of examples reverses those exclusions, so that the bucket's sensitivity score again includes all types of sensitive data. It overwrites the current exclusion settings for the bucket by specifying an empty array (`[]`) for the `suppress-data-identifiers` parameter. For Linux, macOS, or Unix:

```
$ aws macie2 update-resource-profile-detections --resource-arn arn:aws:s3:::amzn-s3-demo-bucket3 --suppress-data-identifiers '[]'
```

For Microsoft Windows:

```
C:\> aws macie2 update-resource-profile-detections --resource-arn arn:aws:s3:::amzn-s3-demo-bucket3 --suppress-data-identifiers=[]
```

Where *arn:aws:s3:::amzn-s3-demo-bucket3* is the ARN of the S3 bucket.

------

# Sensitivity scoring for S3 buckets
<a name="discovery-scoring-s3"></a>

If automated sensitive data discovery is enabled, Amazon Macie automatically calculates and assigns a sensitivity score to each Amazon Simple Storage Service (Amazon S3) general purpose bucket that it monitors and analyzes for an account or organization. A *sensitivity score* is a quantitative representation of the amount of sensitive data that an S3 bucket might contain. Based on that score, Macie also assigns a sensitivity label to each bucket. A *sensitivity label* is a qualitative representation of a bucket's sensitivity score. These values can serve as reference points for determining where sensitive data might reside in your Amazon S3 data estate, and identifying and monitoring potential security risks for that data.

By default, an S3 bucket's sensitivity score and label reflect the results of automated sensitive data discovery activities that Macie has performed thus far for the bucket. They don't reflect the results of sensitive data discovery jobs that you create and run. In addition, neither the score nor the label implies or otherwise indicates the criticality or importance that a bucket or a bucket's objects might have for you or your organization. However, you can override a bucket's calculated score by manually assigning the maximum score (*100*) to the bucket. This also assigns the *Sensitive* label to the bucket. To override a calculated score, you must be the Macie administrator for the account that owns the bucket, or have a standalone Macie account.

**Topics**
+ [Sensitivity scoring dimensions and ranges](#discovery-scoring-s3-dimensions)
+ [Monitoring sensitivity scores](#discovery-scoring-s3-monitoring)

## Sensitivity scoring dimensions and ranges
<a name="discovery-scoring-s3-dimensions"></a>

If it's calculated by Amazon Macie, an S3 bucket's sensitivity score is a quantitative measure of the intersection of two primary dimensions: 
+ The amount of sensitive data that Macie has found in the bucket. This derives primarily from the nature and number of sensitive data types that Macie has found in the bucket, and the number of occurrences of each type.
+ The amount of data that Macie has analyzed in the bucket. This derives primarily from the number of unique objects that Macie has analyzed in the bucket relative to the total number of unique objects in the bucket. 

An S3 bucket's sensitivity score also determines which sensitivity label Macie assigns to the bucket. The sensitivity label is a qualitative representation of the score—for example, *Sensitive* or *Not sensitive*. On the Amazon Macie console, a bucket's sensitivity score also determines which color Macie uses to represent the bucket in data visualizations, as shown in the following image.

![\[The color spectrum for sensitivity scores: blue hues for 1-49, red hues for 51-100, and gray for -1.\]](http://docs.aws.amazon.com/macie/latest/user/images/sensitivity-scoring-spectrum.png)


Sensitivity scores range from *-1* through *100*, as described in the following table. To assess inputs to an S3 bucket's score, you can refer to sensitive data discovery statistics and other details that Macie provides about the bucket. 


| Sensitivity score | Sensitivity label | Additional information | 
| --- | --- | --- | 
| -1 | Classification error |  Macie hasn't successfully analyzed any of the bucket's objects yet due to object-level classification errors—issues with object-level permissions settings, object content, or quotas.  When Macie tried to analyze one or more objects in the bucket, errors occurred. For example, an object is a malformed file, or an object is encrypted with a key that Macie can't access or isn't allowed to use. Coverage data for the bucket can help you investigate and remediate the errors. For more information, see [Assessing automated sensitive data discovery coverage](discovery-coverage.md). Macie will continue to try to analyze objects in the bucket. If Macie analyzes an object successfully, Macie will update the bucket's sensitivity score and label to reflect the results of the analysis.  | 
| 1-49 | Not sensitive |  In this range, a higher score, such as *49*, indicates that Macie has analyzed relatively few objects in the bucket. A lower score, such as *1*, indicates that Macie has analyzed many objects in the bucket (relative to the total number of objects in the bucket) and detected relatively few types and occurrences of sensitive data in those objects. A score of *1* can also indicate that the bucket doesn't store any objects or all the objects in the bucket contain zero (0) bytes of data. Object statistics in the bucket's details can help you determine if this is the case. For more information, see [Reviewing S3 bucket details](discovery-asdd-results-s3-inventory-details.md).  | 
| 50 | Not yet analyzed |  Macie hasn't yet analyzed, or tried to analyze, any of the bucket's objects. Macie automatically assigns this score when automated discovery is initially enabled or a bucket is added to the bucket inventory for an account. In an organization, a bucket can also have this score if automated discovery has never been enabled for the account that owns the bucket. A score of *50* can also indicate that the bucket's permissions settings prevent Macie from accessing the bucket or the bucket’s objects. This is typically due to a restrictive bucket policy. In this case, Macie can provide only a subset of information about the bucket, so the bucket's details can help you identify the issue. For information about how to address this issue, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md).  | 
| 51-99 | Sensitive |  In this range, a higher score, such as *99*, indicates that Macie has analyzed many objects in the bucket (relative to the total number of objects in the bucket) and detected many types and occurrences of sensitive data in those objects. A lower score, such as *51*, indicates that Macie has analyzed a moderate number of objects in the bucket (relative to the total number of objects in the bucket) and detected at least a few types and occurrences of sensitive data in those objects.  | 
| 100 | Sensitive |  The score was manually assigned to the bucket, overriding the calculated score. Macie doesn't calculate or automatically assign this score.  | 
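
The ranges in the preceding table amount to a simple lookup. The following shell function is an illustrative sketch, for example for labeling scores that you retrieve programmatically. The name `sensitivity_label` is hypothetical, and the function doesn't call AWS.

```shell
# Map a Macie sensitivity score to the sensitivity label described in the
# table above. Hypothetical helper; it doesn't call AWS.
sensitivity_label() {
  case "$1" in
    -1)  echo "Classification error" ;;
    50)  echo "Not yet analyzed" ;;
    100) echo "Sensitive" ;;   # manually assigned maximum score
    *)
      # Non-numeric input makes [ fail quietly and falls through to the error.
      if [ "$1" -ge 1 ] 2>/dev/null && [ "$1" -le 49 ]; then
        echo "Not sensitive"
      elif [ "$1" -ge 51 ] 2>/dev/null && [ "$1" -le 99 ]; then
        echo "Sensitive"
      else
        echo "invalid score: $1" >&2
        return 1
      fi
      ;;
  esac
}
```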

## Monitoring sensitivity scores
<a name="discovery-scoring-s3-monitoring"></a>

When automated sensitive data discovery is initially enabled for an account, Amazon Macie automatically assigns a sensitivity score of *50* to each S3 bucket that the account owns. Macie also assigns this score to a bucket when the bucket is added to the bucket inventory for an account. Based on that score, each bucket's sensitivity label is *Not yet analyzed*. The exception is an empty bucket, which is a bucket that doesn't store any objects or all the objects in the bucket contain zero (0) bytes of data. If this is the case for a bucket, Macie assigns a score of *1* to the bucket and the bucket's sensitivity label is *Not sensitive*.

As automated sensitive data discovery progresses each day, Macie updates sensitivity scores and labels for S3 buckets to reflect the results of its analysis. For example:
+ If Macie doesn't find sensitive data in an object, Macie decreases the bucket's sensitivity score and updates the sensitivity label as necessary.
+ If Macie finds sensitive data in an object, Macie increases the bucket's sensitivity score and updates the sensitivity label as necessary.
+ If Macie finds sensitive data in an object that's subsequently changed, Macie removes sensitive data detections for the object from the bucket's sensitivity score and updates the sensitivity label as necessary.
+ If Macie finds sensitive data in an object that's subsequently deleted, Macie removes sensitive data detections for the object from the bucket's sensitivity score and updates the sensitivity label as necessary.
+ If an object is added to a bucket that was previously empty and Macie finds sensitive data in the object, Macie increases the bucket's sensitivity score and updates the sensitivity label as necessary.
+ If a bucket's permissions settings prevent Macie from accessing or retrieving information about the bucket or the bucket’s objects, Macie changes the bucket's sensitivity score to *50* and changes the bucket's sensitivity label to *Not yet analyzed*.

Analysis results can begin to appear within 48 hours of enabling automated sensitive data discovery for an account.
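
Because scores change as analysis progresses, you might want to check a bucket's current score from a script. The following shell sketch assumes that the `get-resource-profile` command returns `sensitivityScore` and `sensitivityScoreOverridden` fields for the bucket, and that the AWS CLI text format prints booleans as `True` or `False`. The function name `bucket_score_status` is hypothetical.

```shell
# Sketch only: report a bucket's current sensitivity score and whether that
# score is a manual override rather than a score that Macie calculated.
# Usage: bucket_score_status BUCKET_ARN
bucket_score_status() {
  local result score overridden
  # --output text prints the two selected values separated by a tab.
  result=$(aws macie2 get-resource-profile --resource-arn "$1" \
    --query '[sensitivityScore, sensitivityScoreOverridden]' \
    --output text) || return 1
  score=$(printf '%s\n' "$result" | cut -f1)
  overridden=$(printf '%s\n' "$result" | cut -f2)
  if [ "$overridden" = "True" ]; then
    echo "$1: score $score (manually assigned)"
  else
    echo "$1: score $score (calculated by Macie)"
  fi
}
```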

If you're the Macie administrator for an organization or you have a standalone Macie account, you can adjust sensitivity scoring settings for your organization or account:
+ To adjust the settings for subsequent analyses of all S3 buckets, change the settings for your account. You can start including or excluding specific managed data identifiers, custom data identifiers, or allow lists. You can also exclude specific buckets. For more information, see [Configuring automated discovery settings](discovery-asdd-account-configure.md).
+ To adjust the settings for individual S3 buckets, change the settings for each bucket. You can include or exclude specific types of sensitive data from a bucket's score. You can also specify whether to assign an automatically calculated score to a bucket. For more information, see [Adjusting sensitivity scores for S3 buckets](discovery-asdd-s3bucket-manage.md).

If you disable automated sensitive data discovery, the effect varies for existing sensitivity scores and labels. If you disable it for a member account in an organization, existing scores and labels persist for S3 buckets that the account owns. If you disable it for an organization overall or a standalone Macie account, existing scores and labels persist for only 30 days. After 30 days, Macie resets scores and labels for all the buckets that the organization or account owns. If a bucket stores objects, Macie changes the score to *50* and assigns the *Not yet analyzed* label to the bucket. If a bucket is empty, Macie changes the score to *1* and assigns the *Not sensitive* label to the bucket. After this reset, Macie stops updating sensitivity scores and labels for the buckets, unless you enable automated sensitive data discovery for the organization or account again.

# Default settings for automated sensitive data discovery
<a name="discovery-asdd-settings-defaults"></a>

If automated sensitive data discovery is enabled, Amazon Macie automatically selects and analyzes sample objects from all the Amazon Simple Storage Service (Amazon S3) general purpose buckets for your account. If you're the Macie administrator for an organization, by default this includes S3 buckets that your member accounts own. 

If you're a Macie administrator or you have a standalone Macie account, you can refine the scope of the analyses by excluding specific S3 buckets from automated sensitive data discovery. You can do this in two ways: by changing the settings for your account, and by changing the settings for individual buckets. As a Macie administrator, you can also enable or disable automated sensitive data discovery for individual accounts in your organization.

By default, Macie analyzes S3 objects by using only the set of managed data identifiers that we recommend for automated sensitive data discovery. Macie doesn't use any custom data identifiers or allow lists that you defined. If you're a Macie administrator or you have a standalone Macie account, you can customize the analyses by configuring Macie to use specific managed data identifiers, custom data identifiers, and allow lists. You can do this by changing the settings for your account. 

For information about changing your settings, see [Configuring settings for automated sensitive data discovery](discovery-asdd-account-configure.md).

**Topics**
+ [Default managed data identifiers](#discovery-asdd-settings-defaults-mdis)
+ [Updates to the default settings](#discovery-asdd-mdis-default-updates)

## Default managed data identifiers for automated sensitive data discovery
<a name="discovery-asdd-settings-defaults-mdis"></a>

By default, Amazon Macie analyzes S3 objects by using only the set of managed data identifiers that we recommend for automated sensitive data discovery. This default set of managed data identifiers is designed to detect common categories and types of sensitive data. Based on our research, it can detect general categories and types of sensitive data while also optimizing your results by reducing noise.

The default set is dynamic. As we release new managed data identifiers, we add them to the default set if they're likely to further optimize your automated sensitive data discovery results. Over time, we might also add existing managed data identifiers to the set or remove them from it. Removal of a managed data identifier doesn't affect existing sensitive data discovery statistics and details for your S3 buckets. For example, if we remove the managed data identifier for a type of sensitive data that Macie previously detected in a bucket, Macie continues to report those detections. If we add or remove a managed data identifier from the default set, we update this page to indicate the nature and timing of the change. For automatic alerts about these changes, you can subscribe to the RSS feed on the [Macie document history](doc-history.md) page.

The following topics list the managed data identifiers that are currently in the default set, organized by sensitive data category and type. They specify the unique identifier (ID) for each managed data identifier in the set. This ID describes the type of sensitive data that a managed data identifier is designed to detect, for example: `PGP_PRIVATE_KEY` for PGP private keys and `USA_PASSPORT_NUMBER` for US passport numbers. If you change your settings for automated sensitive data discovery, you can use this ID to explicitly exclude a managed data identifier from subsequent analyses.
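
Before you exclude a managed data identifier, you might want to check which IDs your settings already exclude. The following shell sketch assumes that `get-automated-discovery-configuration` reports a `sensitivityInspectionTemplateId`, and that `get-sensitivity-inspection-template` returns the excluded IDs in an `excludes.managedDataIdentifierIds` list. The function name `excluded_managed_ids` is hypothetical; for supported ways to change the settings, see [Configuring settings for automated sensitive data discovery](discovery-asdd-account-configure.md).

```shell
# Sketch only: print the managed data identifier IDs that your automated
# sensitive data discovery settings currently exclude, one per line.
# excluded_managed_ids is a hypothetical helper name.
excluded_managed_ids() {
  local template_id
  # Assumption: the configuration reports the ID of the sensitivity
  # inspection template that stores these settings.
  template_id=$(aws macie2 get-automated-discovery-configuration \
    --query sensitivityInspectionTemplateId --output text) || return 1

  # Text output prints the list on one tab-separated line, or None if the
  # list is absent; split it into lines and drop None.
  aws macie2 get-sensitivity-inspection-template --id "$template_id" \
    --query 'excludes.managedDataIdentifierIds' --output text |
    tr '\t' '\n' | sed '/^None$/d'
}
```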

**Topics**
+ [Credentials](#discovery-asdd-settings-defaults-mdis-credentials)
+ [Financial information](#discovery-asdd-settings-defaults-mdis-financial)
+ [Personally identifiable information (PII)](#discovery-asdd-settings-defaults-mdis-pii)

 For details about specific managed data identifiers or a complete list of all the managed data identifiers that Macie currently provides, see [Using managed data identifiers](managed-data-identifiers.md).

### Credentials
<a name="discovery-asdd-settings-defaults-mdis-credentials"></a>

To detect occurrences of credentials data in S3 objects, Macie uses the following managed data identifiers by default.


| Sensitive data type | Managed data identifier ID | 
| --- | --- | 
| AWS secret access key | AWS_CREDENTIALS | 
| HTTP Basic Authorization header | HTTP_BASIC_AUTH_HEADER | 
| OpenSSH private key | OPENSSH_PRIVATE_KEY | 
| PGP private key | PGP_PRIVATE_KEY | 
| Public Key Cryptography Standard (PKCS) private key | PKCS | 
| PuTTY private key | PUTTY_PRIVATE_KEY | 

### Financial information
<a name="discovery-asdd-settings-defaults-mdis-financial"></a>

To detect occurrences of financial information in S3 objects, Macie uses the following managed data identifiers by default.


| Sensitive data type | Managed data identifier ID | 
| --- | --- | 
| Credit card magnetic stripe data | CREDIT_CARD_MAGNETIC_STRIPE | 
| Credit card number | CREDIT_CARD_NUMBER (for credit card numbers in proximity of a keyword) | 

### Personally identifiable information (PII)
<a name="discovery-asdd-settings-defaults-mdis-pii"></a>

To detect occurrences of personally identifiable information (PII) in S3 objects, Macie uses the following managed data identifiers by default.


| Sensitive data type | Managed data identifier ID | 
| --- | --- | 
| Driver’s license identification number | CANADA_DRIVERS_LICENSE, DRIVERS_LICENSE (for the US), UK_DRIVERS_LICENSE | 
| Electoral roll number | UK_ELECTORAL_ROLL_NUMBER | 
| National identification number | FRANCE_NATIONAL_IDENTIFICATION_NUMBER, GERMANY_NATIONAL_IDENTIFICATION_NUMBER, ITALY_NATIONAL_IDENTIFICATION_NUMBER, SPAIN_DNI_NUMBER | 
| National Insurance Number (NINO) | UK_NATIONAL_INSURANCE_NUMBER | 
| Passport number | CANADA_PASSPORT_NUMBER, FRANCE_PASSPORT_NUMBER, GERMANY_PASSPORT_NUMBER, ITALY_PASSPORT_NUMBER, SPAIN_PASSPORT_NUMBER, UK_PASSPORT_NUMBER, USA_PASSPORT_NUMBER | 
| Social Insurance Number (SIN) | CANADA_SOCIAL_INSURANCE_NUMBER | 
| Social Security number (SSN) | SPAIN_SOCIAL_SECURITY_NUMBER, USA_SOCIAL_SECURITY_NUMBER | 
| Taxpayer identification or reference number | AUSTRALIA_TAX_FILE_NUMBER, BRAZIL_CPF_NUMBER, FRANCE_TAX_IDENTIFICATION_NUMBER, GERMANY_TAX_IDENTIFICATION_NUMBER, SPAIN_NIE_NUMBER, SPAIN_NIF_NUMBER, SPAIN_TAX_IDENTIFICATION_NUMBER, USA_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER | 

## Updates to the default settings for automated sensitive data discovery
<a name="discovery-asdd-mdis-default-updates"></a>

The following table describes changes to the settings that Amazon Macie uses by default for automated sensitive data discovery. For automatic alerts about these changes, subscribe to the RSS feed on the [Macie document history](doc-history.md) page.


| Change | Description | Date | 
| --- | --- | --- | 
|  Implemented a new, dynamic set of default managed data identifiers  |  New automated sensitive data discovery configurations are now based on a dynamic [default set of managed data identifiers](#discovery-asdd-settings-defaults-mdis). If you enable automated sensitive data discovery for the first time on or after this date, your configuration is based on the dynamic set. If you enabled automated sensitive data discovery for the first time before this date, your configuration is based on a different set of managed data identifiers. For more information, see the notes after this table.  | August 2, 2023 | 
|  General availability  |  Initial release of automated sensitive data discovery.  |  November 28, 2022  | 

If you initially enabled automated sensitive data discovery prior to August 2, 2023, your configuration isn't based on the dynamic set of default managed data identifiers. Instead, it's based on a static set of managed data identifiers that we defined for the initial release of automated sensitive data discovery, as listed in the table below.

To determine when you initially enabled automated sensitive data discovery, you can use the Amazon Macie console: choose **Automated sensitive data discovery** in the navigation pane, and then refer to the enabled date in the **Status** section. You can also do this programmatically: use the [GetAutomatedDiscoveryConfiguration](https://docs.aws.amazon.com/macie/latest/APIReference/automated-discovery-configuration.html) operation of the Amazon Macie API and refer to the value for the `firstEnabledAt` field. If the date is prior to August 2, 2023, and you want to start using the dynamic set of default managed data identifiers, contact AWS Support for assistance.
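
The date check can also be scripted. The following shell sketch reads `firstEnabledAt` with the AWS CLI and compares its calendar date with August 2, 2023. It assumes that the CLI prints the timestamp in ISO 8601 format (for example, `2023-07-15T18:20:43+00:00`); the function name `uses_static_default_set` is hypothetical.

```shell
# Sketch only: decide whether automated sensitive data discovery was first
# enabled before August 2, 2023, in which case the configuration is based on
# the static set of managed data identifiers. Hypothetical helper name.
# Returns 0 (static set), 1 (dynamic set), or 2 (undetermined).
uses_static_default_set() {
  local first_enabled ymd
  first_enabled=$(aws macie2 get-automated-discovery-configuration \
    --query firstEnabledAt --output text) || return 2
  case "$first_enabled" in
    "" | None) return 2 ;;  # no first-enabled timestamp was returned
  esac
  # Assumption: an ISO 8601 timestamp such as 2023-07-15T18:20:43+00:00.
  # Keep YYYYMMDD (characters 1-4, 6-7, 9-10) and compare it as an integer.
  ymd=$(printf '%s\n' "$first_enabled" | cut -c1-4,6-7,9-10)
  [ "$ymd" -lt 20230802 ] 2>/dev/null && return 0
  [ "$ymd" -ge 20230802 ] 2>/dev/null && return 1
  return 2
}
```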

The following table lists all the managed data identifiers that are in the static set. The table is sorted first by sensitive data category and then by sensitive data type. For details about specific managed data identifiers, see [Using managed data identifiers](managed-data-identifiers.md).


| Sensitive data category | Sensitive data type | Managed data identifier ID | 
| --- | --- | --- | 
| Credentials | AWS secret access key | AWS_CREDENTIALS | 
| Credentials | HTTP Basic Authorization header | HTTP_BASIC_AUTH_HEADER | 
| Credentials | OpenSSH private key | OPENSSH_PRIVATE_KEY | 
| Credentials | PGP private key | PGP_PRIVATE_KEY | 
| Credentials | Public Key Cryptography Standard (PKCS) private key | PKCS | 
| Credentials | PuTTY private key | PUTTY_PRIVATE_KEY | 
| Financial information | Bank account number | BANK_ACCOUNT_NUMBER (for Canadian and US bank account numbers), FRANCE_BANK_ACCOUNT_NUMBER, GERMANY_BANK_ACCOUNT_NUMBER, ITALY_BANK_ACCOUNT_NUMBER, SPAIN_BANK_ACCOUNT_NUMBER, UK_BANK_ACCOUNT_NUMBER | 
| Financial information | Credit card expiration date | CREDIT_CARD_EXPIRATION | 
| Financial information | Credit card magnetic stripe data | CREDIT_CARD_MAGNETIC_STRIPE | 
| Financial information | Credit card number | CREDIT_CARD_NUMBER (for credit card numbers in proximity of a keyword) | 
| Financial information | Credit card verification code | CREDIT_CARD_SECURITY_CODE | 
| Personal information: Personal health information (PHI) | Drug Enforcement Administration (DEA) Registration Number | US_DRUG_ENFORCEMENT_AGENCY_NUMBER | 
| Personal information: PHI | Health Insurance Claim Number (HICN) | USA_HEALTH_INSURANCE_CLAIM_NUMBER | 
| Personal information: PHI | Health insurance or medical identification number | CANADA_HEALTH_NUMBER, EUROPEAN_HEALTH_INSURANCE_CARD_NUMBER, FINLAND_EUROPEAN_HEALTH_INSURANCE_NUMBER, FRANCE_HEALTH_INSURANCE_NUMBER, UK_NHS_NUMBER, USA_MEDICARE_BENEFICIARY_IDENTIFIER | 
| Personal information: PHI | Healthcare Common Procedure Coding System (HCPCS) code | USA_HEALTHCARE_PROCEDURE_CODE | 
| Personal information: PHI | National Drug Code (NDC) | USA_NATIONAL_DRUG_CODE | 
| Personal information: PHI | National Provider Identifier (NPI) | USA_NATIONAL_PROVIDER_IDENTIFIER | 
| Personal information: PHI | Unique device identifier (UDI) | MEDICAL_DEVICE_UDI | 
| Personal information: Personally identifiable information (PII) | Birth date | DATE_OF_BIRTH | 
| Personal information: PII | Driver’s license identification number | AUSTRALIA_DRIVERS_LICENSE, AUSTRIA_DRIVERS_LICENSE, BELGIUM_DRIVERS_LICENSE, BULGARIA_DRIVERS_LICENSE, CANADA_DRIVERS_LICENSE, CROATIA_DRIVERS_LICENSE, CYPRUS_DRIVERS_LICENSE, CZECHIA_DRIVERS_LICENSE, DENMARK_DRIVERS_LICENSE, DRIVERS_LICENSE (for the US), ESTONIA_DRIVERS_LICENSE, FINLAND_DRIVERS_LICENSE, FRANCE_DRIVERS_LICENSE, GERMANY_DRIVERS_LICENSE, GREECE_DRIVERS_LICENSE, HUNGARY_DRIVERS_LICENSE, IRELAND_DRIVERS_LICENSE, ITALY_DRIVERS_LICENSE, LATVIA_DRIVERS_LICENSE, LITHUANIA_DRIVERS_LICENSE, LUXEMBOURG_DRIVERS_LICENSE, MALTA_DRIVERS_LICENSE, NETHERLANDS_DRIVERS_LICENSE, POLAND_DRIVERS_LICENSE, PORTUGAL_DRIVERS_LICENSE, ROMANIA_DRIVERS_LICENSE, SLOVAKIA_DRIVERS_LICENSE, SLOVENIA_DRIVERS_LICENSE, SPAIN_DRIVERS_LICENSE, SWEDEN_DRIVERS_LICENSE, UK_DRIVERS_LICENSE | 
| Personal information: PII | Electoral roll number | UK_ELECTORAL_ROLL_NUMBER | 
| Personal information: PII | Full name | NAME | 
| Personal information: PII | Global Positioning System (GPS) coordinates | LATITUDE_LONGITUDE | 
| Personal information: PII | Mailing address | ADDRESS, BRAZIL_CEP_CODE | 
| Personal information: PII | National identification number | BRAZIL_RG_NUMBER, FRANCE_NATIONAL_IDENTIFICATION_NUMBER, GERMANY_NATIONAL_IDENTIFICATION_NUMBER, ITALY_NATIONAL_IDENTIFICATION_NUMBER, SPAIN_DNI_NUMBER | 
| Personal information: PII | National Insurance Number (NINO) | UK_NATIONAL_INSURANCE_NUMBER | 
| Personal information: PII | Passport number | CANADA_PASSPORT_NUMBER, FRANCE_PASSPORT_NUMBER, GERMANY_PASSPORT_NUMBER, ITALY_PASSPORT_NUMBER, SPAIN_PASSPORT_NUMBER, UK_PASSPORT_NUMBER, USA_PASSPORT_NUMBER | 
| Personal information: PII | Permanent residence number | CANADA_NATIONAL_IDENTIFICATION_NUMBER | 
| Personal information: PII | Phone number | BRAZIL_PHONE_NUMBER, FRANCE_PHONE_NUMBER, GERMANY_PHONE_NUMBER, ITALY_PHONE_NUMBER, PHONE_NUMBER (for Canada and the US), SPAIN_PHONE_NUMBER, UK_PHONE_NUMBER | 
| Personal information: PII | Social Insurance Number (SIN) | CANADA_SOCIAL_INSURANCE_NUMBER | 
| Personal information: PII | Social Security number (SSN) | SPAIN_SOCIAL_SECURITY_NUMBER, USA_SOCIAL_SECURITY_NUMBER | 
| Personal information: PII | Taxpayer identification or reference number | AUSTRALIA_TAX_FILE_NUMBER, BRAZIL_CNPJ_NUMBER, BRAZIL_CPF_NUMBER, FRANCE_TAX_IDENTIFICATION_NUMBER, GERMANY_TAX_IDENTIFICATION_NUMBER, SPAIN_NIE_NUMBER, SPAIN_NIF_NUMBER, SPAIN_TAX_IDENTIFICATION_NUMBER, UK_TAX_IDENTIFICATION_NUMBER, USA_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER | 
| Personal information: PII | Vehicle identification number (VIN) | VEHICLE_IDENTIFICATION_NUMBER | 

# Running sensitive data discovery jobs
<a name="discovery-jobs"></a>

With Amazon Macie, you can create and run sensitive data discovery jobs to automate discovery, logging, and reporting of sensitive data in Amazon Simple Storage Service (Amazon S3) general purpose buckets. A *sensitive data discovery job* is a series of automated processing and analysis tasks that Macie performs to detect and report sensitive data in Amazon S3 objects. Each job provides detailed reports of the sensitive data that Macie finds and the analysis that Macie performs. By creating and running jobs, you can build and maintain a comprehensive view of the data that your organization stores in Amazon S3 and any security or compliance risks for that data.

To help you meet and maintain compliance with your data security and privacy requirements, Macie provides several options for scheduling and defining the scope of a job. You can configure a job to run only once for on-demand analysis and assessment, or on a recurring basis for periodic analysis, assessment, and monitoring. You also define the breadth and depth of a job's analysis—specific S3 buckets that you select or buckets that match specific criteria. You can optionally refine the scope of that analysis by choosing additional options. The options include custom criteria that derive from properties of S3 objects, such as tags, prefixes, and when an object was last modified.
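If you create jobs programmatically, these scheduling and scope options all go into a single API request. The following Python sketch builds that request as a plain dictionary. It covers a one-time job or a weekly job that analyzes explicitly selected buckets and skips objects whose keys begin with `AWSLogs`. The field names reflect our reading of the Macie `CreateClassificationJob` API and are an assumption; verify them against the API reference. The job names, account ID, and bucket name are placeholders.

```python
# Sketch of a request for the Amazon Macie CreateClassificationJob API.
# ASSUMPTION: field names reflect our reading of that API; verify them against
# the Amazon Macie API Reference. Account ID and bucket name are placeholders.


def build_job_request(name, account_id, buckets, recurring=False):
    """Return keyword arguments for a one-time or weekly discovery job."""
    request = {
        "name": name,
        # ONE_TIME runs once; SCHEDULED runs on the schedule set below.
        "jobType": "SCHEDULED" if recurring else "ONE_TIME",
        "s3JobDefinition": {
            # Explicitly selected buckets, grouped by the account that owns them.
            "bucketDefinitions": [{"accountId": account_id, "buckets": list(buckets)}],
            # Optional object criteria: skip objects whose keys begin with "AWSLogs".
            "scoping": {
                "excludes": {
                    "and": [
                        {
                            "simpleScopeTerm": {
                                "comparator": "STARTS_WITH",
                                "key": "OBJECT_KEY",
                                "values": ["AWSLogs"],
                            }
                        }
                    ]
                }
            },
        },
    }
    if recurring:
        # Periodic analysis: run once a week.
        request["scheduleFrequency"] = {"weeklySchedule": {"dayOfWeek": "MONDAY"}}
    return request


weekly = build_job_request("weekly-scan", "111122223333", ["amzn-s3-demo-bucket"], recurring=True)
one_time = build_job_request("one-time-scan", "111122223333", ["amzn-s3-demo-bucket"])
```

With AWS credentials configured, you could pass the dictionary to the AWS SDK for Python (Boto3), for example `boto3.client("macie2").create_classification_job(**weekly)`, after adding any other parameters that the API requires.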

For each job, you also specify the types of sensitive data that you want Macie to detect and report. You can configure a job to use [managed data identifiers](managed-data-identifiers.md) that Macie provides, [custom data identifiers](custom-data-identifiers.md) that you define, or a combination of the two. By selecting specific managed and custom data identifiers for a job, you can tailor the analysis to focus on specific types of sensitive data. To fine tune the analysis, you can also configure a job to use [allow lists](allow-lists.md). Allow lists specify text and text patterns that you want Macie to ignore, typically sensitive data exceptions for your organization's particular scenarios or environment.

Each job produces records of the sensitive data that Macie finds (*sensitive data findings*) and the analysis that Macie performs (*sensitive data discovery results*). A *sensitive data finding* is a detailed report of sensitive data that Macie found in an S3 object. A *sensitive data discovery result* is a record that logs details about the analysis of an S3 object. Macie creates a sensitive data discovery result for each object that you configure a job to analyze. This includes objects in which Macie doesn't find any sensitive data, which therefore don't produce sensitive data findings, and objects that Macie can't analyze due to errors or issues. Each type of record adheres to a standardized schema, which can help you query, monitor, and process the records to meet your security and compliance requirements.
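The relationship between the two record types can be modeled in a few lines. This toy Python sketch is not Macie's implementation, and `ObjectAnalysis` and `records_for_job` are hypothetical names. In the model, every object in scope yields a discovery result, but only objects that contain detected sensitive data yield a finding.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ObjectAnalysis:
    """Hypothetical record of what happened when one object was analyzed."""
    key: str
    detections: List[str] = field(default_factory=list)  # sensitive data types found
    error: Optional[str] = None  # set when the object couldn't be analyzed


def records_for_job(analyses: List[ObjectAnalysis]) -> Tuple[List[str], List[str]]:
    # Every object in scope gets a discovery result, even with no detections or an error.
    results = [a.key for a in analyses]
    # Findings are produced only for objects in which sensitive data was found.
    findings = [a.key for a in analyses if a.error is None and a.detections]
    return results, findings


results, findings = records_for_job([
    ObjectAnalysis("hr/employees.csv", detections=["USA_SOCIAL_SECURITY_NUMBER"]),
    ObjectAnalysis("public/readme.txt"),
    ObjectAnalysis("archive/locked.zip", error="access denied"),
])
```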

**Topics**
+ [Scope options for jobs](discovery-jobs-scope.md)
+ [Creating a job](discovery-jobs-create.md)
+ [Reviewing job results](discovery-jobs-manage-results.md)
+ [Managing jobs](discovery-jobs-manage.md)
+ [Monitoring jobs with CloudWatch Logs](discovery-jobs-monitor-cw-logs.md)
+ [Forecasting and monitoring job costs](discovery-jobs-costs.md)
+ [Managed data identifiers recommended for jobs](discovery-jobs-mdis-recommended.md)

# Scope options for sensitive data discovery jobs
<a name="discovery-jobs-scope"></a>

With sensitive data discovery jobs, you define the scope of the analysis that Amazon Macie performs to detect and report sensitive data in your Amazon Simple Storage Service (Amazon S3) general purpose buckets. To help you do this, Macie provides several job-specific options that you can choose when you create and configure a job.

**Topics**
+ [S3 buckets or bucket criteria](#discovery-jobs-scope-buckets)
+ [Sampling depth](#discovery-jobs-scope-sampling)
+ [Initial run: Include existing S3 objects](#discovery-jobs-scope-objects)
+ [S3 object criteria](#discovery-jobs-scope-criteria)

## S3 buckets or bucket criteria
<a name="discovery-jobs-scope-buckets"></a>

When you create a sensitive data discovery job, you specify which S3 buckets store objects that you want Macie to analyze when the job runs. You can do this in two ways: by selecting specific S3 buckets from your bucket inventory, or by specifying custom criteria that derive from properties of S3 buckets.

**Select specific S3 buckets**  
With this option, you explicitly select each S3 bucket to analyze. Then, when the job runs, Macie analyzes objects only in the buckets that you select. If you configure a job to run periodically on a daily, weekly, or monthly basis, Macie analyzes objects in those same buckets each time the job runs.   
This configuration is helpful for cases where you want to perform targeted analysis of a specific set of data. It gives you precise, predictable control over which buckets a job analyzes.

**Specify S3 bucket criteria**  
With this option, you define runtime criteria that determine which S3 buckets to analyze. The criteria consist of one or more conditions that derive from bucket properties, such as public access settings and tags. When the job runs, Macie identifies buckets that match your criteria, and then analyzes objects in those buckets. If you configure a job to run periodically, Macie does this each time the job runs. Consequently, Macie might analyze objects in different buckets each time the job runs, depending on changes to your bucket inventory and the criteria that you define.  
This configuration is helpful for cases where you want the scope of the analysis to dynamically adapt to changes to your bucket inventory. If you configure a job to use bucket criteria and run periodically, Macie automatically identifies new buckets that match the criteria and inspects those buckets for sensitive data.

The topics in this section provide additional details about each option.

**Topics**
+ [Selecting specific S3 buckets](#discovery-jobs-scope-buckets-select)
+ [Specifying S3 bucket criteria](#discovery-jobs-scope-buckets-criteria)

### Selecting specific S3 buckets
<a name="discovery-jobs-scope-buckets-select"></a>

If you choose to explicitly select each S3 bucket that you want a job to analyze, Macie provides you with an inventory of your general purpose buckets in the current AWS Region. You can then review your inventory and select the buckets that you want. If you're the Macie administrator for an organization, your inventory includes buckets that your member accounts own. You can select as many as 1,000 of these buckets, spanning as many as 1,000 accounts.

To help you make your bucket selections, the inventory provides details and statistics for each bucket. This includes the amount of data that a job can analyze in each bucket, expressed as *classifiable objects*. Classifiable objects are objects that use a [supported Amazon S3 storage class](discovery-supported-storage.md#discovery-supported-s3-classes) and have a file name extension for a [supported file or storage format](discovery-supported-storage.md#discovery-supported-formats). The inventory also indicates whether you configured any existing jobs to analyze objects in a bucket. These details can help you estimate the breadth of a job and refine your bucket selections.

In the inventory table:
+ **Sensitivity** – Specifies the bucket's current sensitivity score, if [automated sensitive data discovery](discovery-asdd.md) is enabled.
+ **Classifiable objects** – Specifies the total number of objects that the job can analyze in the bucket.
+ **Classifiable size** – Specifies the total storage size of all the objects that the job can analyze in the bucket.

  If the bucket stores compressed objects, this value doesn’t reflect the actual size of those objects after they're decompressed. If versioning is enabled for the bucket, this value is based on the storage size of the latest version of each object in the bucket.
+ **Monitored by job** – Specifies whether you configured any existing jobs to periodically analyze objects in the bucket on a daily, weekly, or monthly basis.

  If the value for this field is **Yes**, the bucket is explicitly included in a periodic job or the bucket matched the criteria for a periodic job within the past 24 hours. In addition, the status of at least one of those jobs is not *Cancelled*. Macie updates this data on a daily basis.
+ **Latest job run** – If you configured any periodic or one-time jobs to analyze objects in the bucket, this field specifies the most recent date and time when one of those jobs started to run. Otherwise, a dash (–) appears in this field.
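The rule behind the **Monitored by job** field can be expressed as a small predicate. The following Python sketch is an illustrative model only: the `PeriodicJob` type and its fields are hypothetical, and the status strings are assumptions. A bucket counts as monitored if at least one job that isn't cancelled meets either of two conditions: it includes the bucket explicitly, or the bucket matched its criteria in the past 24 hours.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Optional, Set


@dataclass
class PeriodicJob:
    """Hypothetical view of a daily, weekly, or monthly job."""
    status: str  # e.g. "RUNNING", "IDLE", "PAUSED", or "CANCELLED" (assumed values)
    explicit_buckets: Set[str] = field(default_factory=set)
    # Bucket name -> most recent time the bucket matched the job's bucket criteria.
    criteria_matches: Dict[str, datetime] = field(default_factory=dict)


def monitored_by_job(bucket: str, jobs: Iterable[PeriodicJob], now: datetime) -> bool:
    window_start = now - timedelta(hours=24)
    relevant = []
    for job in jobs:
        matched_at: Optional[datetime] = job.criteria_matches.get(bucket)
        if bucket in job.explicit_buckets or (matched_at is not None and matched_at >= window_start):
            relevant.append(job)
    # "Yes" only if at least one of those jobs isn't cancelled.
    return any(job.status != "CANCELLED" for job in relevant)


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
jobs = [
    PeriodicJob("CANCELLED", explicit_buckets={"bucket-a"}),
    PeriodicJob("IDLE", criteria_matches={"bucket-b": now - timedelta(hours=1)}),
    PeriodicJob("IDLE", criteria_matches={"bucket-c": now - timedelta(days=3)}),
]
```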

If the information icon (![\[The information icon, which is a blue circle that has a lowercase letter i in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-info-blue.png)) appears next to any bucket names, we recommend that you retrieve the latest bucket metadata from Amazon S3. To do this, choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) above the table. The information icon indicates that a bucket was created during the past 24 hours, possibly after Macie last retrieved bucket and object metadata from Amazon S3 as part of the daily refresh cycle. For more information, see [Data refreshes](monitoring-s3-how-it-works.md#monitoring-s3-how-it-works-data-refresh).

If the warning icon (![\[The warning icon, which is a red triangle that has an exclamation point in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-warning-red.png)) appears next to a bucket's name, Macie isn't allowed to access the bucket or the bucket's objects. This means that the job won't be able to analyze objects in the bucket. To investigate the issue, review the bucket’s policy and permissions settings in Amazon S3. For example, the bucket might have a restrictive bucket policy. For more information, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md).

To customize your view and find specific buckets more easily, you can filter the table by entering filter criteria in the filter box. The following table provides some examples.


| To show all buckets that... | Apply this filter... | 
| --- | --- | 
| Are owned by a specific account | Account ID = the 12-digit ID for the account | 
| Are publicly accessible | Effective permission = Public | 
| Aren't included in any periodic jobs | Actively monitored by job = False | 
| Aren't included in any periodic or one-time jobs | Defined in job = False | 
| Have a specific tag key\* | Tag key = the tag key | 
| Have a specific tag value\* | Tag value = the tag value | 
| Store unencrypted objects (or objects that use client-side encryption) | Object count by encryption is No encryption and From = 1 | 

\* Tag keys and values are case sensitive. Also, you have to specify a complete, valid value. You can’t specify partial values or use wildcard characters.

To display additional details for a bucket, choose the bucket's name and refer to the details panel. In the panel, you can also:
+ Pivot and drill down on certain fields by choosing a magnifying glass for the field. Choose ![\[The zoom in icon, which is a magnifying glass that has a plus sign in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-magnifying-glass-plus-sign.png) to show buckets with the same value. Choose ![\[The zoom out icon, which is a magnifying glass that has a minus sign in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-magnifying-glass-minus-sign.png) to show buckets with other values.
+ Retrieve the latest metadata for objects in the bucket. This can be helpful if you recently created a bucket or made significant changes to the bucket's objects during the past 24 hours. To retrieve the data, choose refresh (![\[The refresh button, which is a button that displays an empty, dark gray circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-object-data.png)) in the **Object statistics** section of the panel. This option is available for buckets that store 30,000 or fewer objects.

In certain cases, the panel might not include all the details of a bucket. This can occur if you store more than 10,000 buckets in Amazon S3. Macie maintains complete inventory data for only 10,000 buckets for an account—the 10,000 buckets that were most recently created or changed. You can, however, configure a job to analyze objects in buckets that exceed this quota. To review additional details for these buckets, use Amazon S3.

### Specifying S3 bucket criteria
<a name="discovery-jobs-scope-buckets-criteria"></a>

If you choose to specify bucket criteria for a job, Macie provides options for defining and testing the criteria. These are runtime criteria that determine which S3 buckets store objects to analyze. Each time the job runs, Macie identifies general purpose buckets that match your criteria, and then analyzes objects in the appropriate buckets. If you're the Macie administrator for an organization, this includes buckets that your member accounts own. 

#### Defining bucket criteria
<a name="discovery-jobs-scope-buckets-criteria-define"></a>

Bucket criteria consist of one or more conditions that derive from properties of S3 buckets. Each condition, also referred to as a *criterion*, consists of the following parts:
+ A property-based field, such as **Account ID** or **Effective permission**.
+ An operator, either *equals* (`eq`) or *not equals* (`neq`).
+ One or more values.
+ An include or exclude statement that indicates whether to analyze (*include*) or skip (*exclude*) buckets that match the condition.

If you specify more than one value for a field, Macie uses OR logic to join the values. If you specify more than one condition for the criteria, Macie uses AND logic to join the conditions. In addition, exclude conditions take precedence over include conditions. For example, if you include buckets that are publicly accessible and exclude buckets that have specific tags, the job analyzes objects in any bucket that's publicly accessible unless the bucket has one of the specified tags.
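The matching rules for bucket criteria can be sketched as follows. This Python model is illustrative, not Macie's code, and the `Condition` type, property names, and values are hypothetical. It joins multiple values for one condition with OR and joins conditions with AND, and exclude conditions take precedence over include conditions. With several exclude conditions, the sketch skips a bucket only if the bucket matches all of them; that grouping is our reading of the AND rule above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Condition:
    prop: str  # hypothetical property name, e.g. "effective_permission"
    operator: str  # "eq" or "neq"
    values: Tuple[str, ...]  # multiple values are joined with OR
    include: bool  # True = include, False = exclude


def condition_matches(cond: Condition, bucket: Dict[str, str]) -> bool:
    hit = bucket.get(cond.prop) in cond.values
    return hit if cond.operator == "eq" else not hit


def bucket_in_scope(bucket: Dict[str, str], criteria: List[Condition]) -> bool:
    includes = [c for c in criteria if c.include]
    excludes = [c for c in criteria if not c.include]
    # Exclude conditions take precedence; conditions in each group are joined with AND.
    if excludes and all(condition_matches(c, bucket) for c in excludes):
        return False
    return all(condition_matches(c, bucket) for c in includes)


criteria = [
    Condition("effective_permission", "eq", ("PUBLIC",), include=True),
    Condition("account_id", "eq", ("111122223333", "444455556666"), include=False),
]
```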

You can define conditions that derive from any of the following property-based fields for S3 buckets.

**Account ID**   
The unique identifier (ID) for the AWS account that owns a bucket. To specify multiple values for this field, enter the ID for each account and separate each entry with a comma.  
Note that Macie doesn't support the use of wildcard characters or partial values for this field.

**Bucket name**  
The name of a bucket. This field correlates to the **Name** field, not the **Amazon Resource Name (ARN)** field, in Amazon S3. To specify multiple values for this field, enter the name of each bucket and separate each entry with a comma.  
Note that values are case sensitive. In addition, Macie doesn't support the use of wildcard characters or partial values for this field.

**Effective permission**  
Specifies whether a bucket is publicly accessible. You can choose one or more of the following values for this field:  
+ **Not public** – The general public doesn't have read or write access to the bucket.
+ **Public** – The general public has read or write access to the bucket.
+ **Unknown** – Macie wasn't able to evaluate the public access settings for the bucket. An issue or quota prevented Macie from retrieving and evaluating the requisite data.
To determine whether a bucket is publicly accessible, Macie analyzes a combination of account- and bucket-level settings for the bucket: the block public access settings for the account, the block public access settings for the bucket, the bucket policy for the bucket, and the access control list (ACL) for the bucket. For information about these settings, see [Access control](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-management.html) and [Blocking public access to your Amazon S3 storage](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-control-block-public-access.html) in the *Amazon Simple Storage Service User Guide*.

**Shared access**  
Specifies whether a bucket is shared with another AWS account, an Amazon CloudFront origin access identity (OAI), or a CloudFront origin access control (OAC). You can choose one or more of the following values for this field:  
+ **External** – The bucket is shared with any combination of the following: a CloudFront OAI, a CloudFront OAC, or an account that's external to (not part of) your organization.
+ **Internal** – The bucket is shared with one or more accounts that are internal to (part of) your organization. It isn't shared with a CloudFront OAI or OAC.
+ **Not shared** – The bucket isn't shared with another account, a CloudFront OAI, or a CloudFront OAC.
+ **Unknown** – Macie wasn't able to evaluate the shared access settings for the bucket. An issue or quota prevented Macie from retrieving and evaluating the requisite data.
To determine whether a bucket is shared with another AWS account, Macie analyzes the bucket policy and ACL for the bucket. In addition, an *organization* is defined as a set of Macie accounts that are centrally managed as a group of related accounts through AWS Organizations or by Macie invitation. For information about Amazon S3 options for sharing buckets, see [Access control](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-management.html) in the *Amazon Simple Storage Service User Guide*.  
To determine whether a bucket is shared with a CloudFront OAI or OAC, Macie analyzes the bucket policy for the bucket. A CloudFront OAI or OAC allows users to access a bucket's objects through one or more specified CloudFront distributions. For information about CloudFront OAIs and OACs, see [Restricting access to an Amazon S3 origin](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-restricting-access-to-s3.html) in the *Amazon CloudFront Developer Guide*.
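The four **Shared access** values can be summarized as a classification function. In this Python sketch, the function name and parameters are hypothetical. It assumes that you already know two things: which other accounts the bucket policy or ACL grants access to, and whether the bucket policy grants access to a CloudFront OAI or OAC.

```python
from typing import Iterable


def shared_access(grantee_accounts: Iterable[str], org_accounts: Iterable[str],
                  cloudfront_oai_or_oac: bool, evaluated: bool = True) -> str:
    """Classify a bucket the way the Shared access field describes it.

    grantee_accounts: other AWS accounts granted access by the bucket policy or ACL.
    org_accounts: accounts that belong to your Macie organization.
    """
    if not evaluated:
        return "Unknown"  # Macie couldn't retrieve or evaluate the settings
    grantees = set(grantee_accounts)
    external = grantees - set(org_accounts)
    if cloudfront_oai_or_oac or external:
        return "External"
    if grantees:
        return "Internal"  # shared only with accounts in the organization
    return "Not shared"
```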

**Tags**  
The tags that are associated with a bucket. Tags are labels that you can define and assign to certain types of AWS resources, including S3 buckets. Each tag consists of a required tag key and an optional tag value. For information about tagging S3 buckets, see [Using cost allocation S3 bucket tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/CostAllocTagging.html) in the *Amazon Simple Storage Service User Guide*.  
For a sensitive data discovery job, you can use this type of condition to include or exclude buckets that have a specific tag key, a specific tag value, or a specific tag key and tag value (as a pair). For example:  
+ If you specify **Project** as a tag key and don't specify any tag values for a condition, any bucket that has the *Project* tag key matches the condition’s criteria, regardless of the tag values that are associated with that tag key.
+ If you specify **Development** and **Test** as tag values and don't specify any tag keys for a condition, any bucket that has the **Development** or **Test** tag value matches the condition’s criteria, regardless of the tag keys that are associated with those tag values.
Tag keys and values are case sensitive. In addition, Macie doesn't support the use of wildcard characters or partial values in tag conditions.  
To specify multiple tag keys in a condition, enter each tag key in the **Key** field and separate each entry with a comma. To specify multiple tag values in a condition, enter each tag value in the **Value** field and separate each entry with a comma.  
If you store more than 10,000 buckets in Amazon S3, note that Macie doesn't maintain tag data for all the buckets. Macie maintains complete inventory data for only 10,000 buckets for an account: the 10,000 buckets that were most recently created or changed. For all other buckets, tag keys and values aren't included in inventory data. As a result, these buckets don't match a tag-based condition that uses the *equals* (`eq`) operator, and they do match a tag-based condition that uses the *not equals* (`neq`) operator.
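The following sketch models the tag-matching rules, including the effect of the 10,000-bucket inventory limit. It's an illustrative Python sketch with hypothetical names. Each entry is a `(key, value)` pair, where `None` means that part of the tag wasn't specified. A bucket outside the inventory is passed as `None` and treated as having no tags.

```python
from typing import Dict, Iterable, Optional, Tuple

TagSpec = Tuple[Optional[str], Optional[str]]  # (key, value); None means "not specified"


def spec_matches(tags: Dict[str, str], key: Optional[str], value: Optional[str]) -> bool:
    if key is not None and value is not None:
        return tags.get(key) == value  # key and value as a pair
    if key is not None:
        return key in tags  # key only, regardless of its value
    return value in tags.values()  # value only, regardless of its key


def tag_condition_matches(bucket_tags: Optional[Dict[str, str]],
                          specs: Iterable[TagSpec], operator: str = "eq") -> bool:
    # bucket_tags is None for a bucket outside the 10,000-bucket inventory,
    # so the bucket is treated as having no tags at all.
    tags = bucket_tags if bucket_tags is not None else {}
    # Exact, case-sensitive comparison; multiple entries are joined with OR.
    hit = any(spec_matches(tags, key, value) for key, value in specs)
    return hit if operator == "eq" else not hit
```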

#### Testing bucket criteria
<a name="discovery-jobs-scope-buckets-criteria-test"></a>

While you define your bucket criteria, you can test and refine the criteria by previewing the results. To do this, expand the **Preview the criteria results** section that appears below the criteria on the console. This section displays a table of up to 25 general purpose buckets that currently match the criteria.

The table also provides insight into the amount of data that the job can analyze in each bucket, expressed as *classifiable objects*. Classifiable objects are objects that use a [supported Amazon S3 storage class](discovery-supported-storage.md#discovery-supported-s3-classes) and have a file name extension for a [supported file or storage format](discovery-supported-storage.md#discovery-supported-formats). In addition, the table indicates whether you configured any existing jobs to periodically analyze objects in a bucket.

In the table:
+ **Sensitivity** – Specifies the bucket's current sensitivity score, if [automated sensitive data discovery](discovery-asdd.md) is enabled.
+ **Classifiable objects** – Specifies the total number of objects that the job can analyze in the bucket.
+ **Classifiable size** – Specifies the total storage size of all the objects that the job can analyze in the bucket.

  If the bucket stores compressed objects, this value doesn’t reflect the actual size of those objects after they're decompressed. If versioning is enabled for the bucket, this value is based on the storage size of the latest version of each object in the bucket.
+ **Monitored by job** – Specifies whether you configured any existing jobs to periodically analyze objects in the bucket on a daily, weekly, or monthly basis.

  If the value for this field is **Yes**, the bucket is explicitly included in a periodic job or the bucket matched the criteria for a periodic job within the past 24 hours. In addition, the status of at least one of those jobs is not *Cancelled*. Macie updates this data on a daily basis.

If the warning icon (![\[The warning icon, which is a red triangle that has an exclamation point in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-warning-red.png)) appears next to a bucket's name, Macie isn't allowed to access the bucket or the bucket's objects. This means that the job won't be able to analyze objects in the bucket. To investigate the issue, review the bucket’s policy and permissions settings in Amazon S3. For example, the bucket might have a restrictive bucket policy. For more information, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md).

To refine the bucket criteria for the job, use the filter options to add, change, or remove conditions from the criteria. Macie then updates the table to reflect your changes.

## Sampling depth
<a name="discovery-jobs-scope-sampling"></a>

With this option, you specify the percentage of eligible S3 objects that you want a sensitive data discovery job to analyze. Eligible objects are objects that use a [supported Amazon S3 storage class](discovery-supported-storage.md#discovery-supported-s3-classes), have a file name extension for a [supported file or storage format](discovery-supported-storage.md#discovery-supported-formats), and match any other criteria that you specify for the job.

If this value is less than 100%, Macie selects eligible objects to analyze at random, up to the specified percentage, and analyzes all the data in those objects. For example, if you configure a job to analyze 10,000 objects and you specify a sampling depth of 20%, Macie analyzes approximately 2,000 randomly selected, eligible objects when the job runs.

Reducing the sampling depth of a job can lower the cost and reduce the duration of a job. It's helpful for cases where the data in objects is highly consistent and you want to determine whether an S3 bucket, rather than each object, stores sensitive data.

Note that this option controls the percentage of *objects* that are analyzed, not the percentage of *bytes* that are analyzed. If you enter a sampling depth that’s less than 100%, Macie analyzes all the data in each selected object, not that percentage of the data in each selected object.
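To make the distinction between objects and bytes concrete, the following Python sketch selects a random sample of eligible objects by count. It then counts every byte of each selected object. The function names are hypothetical. The sketch picks exactly `round(n * depth / 100)` objects, whereas the text above says that Macie's selection is approximate.

```python
import random
from typing import List, Tuple

Obj = Tuple[str, int]  # (key, size in bytes)


def sample_for_job(eligible: List[Obj], depth_percent: int, rng: random.Random) -> List[Obj]:
    # Sampling depth applies to the number of objects, not the number of bytes.
    count = round(len(eligible) * depth_percent / 100)
    return rng.sample(eligible, count)


def bytes_analyzed(selected: List[Obj]) -> int:
    # Each selected object is analyzed in full.
    return sum(size for _, size in selected)


eligible = [(f"data/file-{i}.csv", 1_000 + i) for i in range(10_000)]
selected = sample_for_job(eligible, 20, random.Random(42))
```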

## Initial run: Include existing S3 objects
<a name="discovery-jobs-scope-objects"></a>

You can use sensitive data discovery jobs to perform ongoing, incremental analysis of objects in S3 buckets. If you configure a job to run periodically, Macie does this for you automatically—each run analyzes only those objects that were created or changed after the preceding run. With the **Include existing objects** option, you choose the starting point for the first increment:
+ To analyze all existing objects immediately after you finish creating the job, select the checkbox for this option.
+ To wait and analyze only those objects that are created or changed after you create the job and before the first run, clear the checkbox for this option.

  Clearing this checkbox is helpful for cases where you already analyzed the data and want to continue to analyze it periodically. For example, if you previously used another service or application to classify data and you recently started using Macie, you might use this option to ensure continued discovery and classification of your data without incurring unnecessary costs or duplicating classification data.

Each subsequent run of a periodic job automatically analyzes only those objects that are created or changed after the preceding run.

For both periodic and one-time jobs, you can also configure a job to analyze only those objects that are created or changed before or after a certain time or during a certain time range. To do this, add object criteria that use the last modified date for objects.
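The starting point for each increment can be modeled as a cutoff time. In the following Python sketch, the names are hypothetical and the logic is a model, not Macie's implementation. The first run includes either all existing objects or only those changed after the job was created. Each later run includes only objects changed after the preceding run. The text above doesn't say whether an object modified at exactly the cutoff time is included, and the sketch excludes it.

```python
from datetime import datetime
from typing import Dict, List, Optional


def objects_for_run(last_modified: Dict[str, datetime], job_created: datetime,
                    previous_run: Optional[datetime], include_existing: bool) -> List[str]:
    if previous_run is not None:
        # Subsequent runs: only objects created or changed after the preceding run.
        cutoff = previous_run
    elif include_existing:
        # First run with "Include existing objects" selected: every object.
        return sorted(last_modified)
    else:
        # First run with the checkbox cleared: only objects changed after job creation.
        cutoff = job_created
    return sorted(key for key, modified in last_modified.items() if modified > cutoff)


objects = {
    "old.csv": datetime(2024, 1, 1),
    "new.csv": datetime(2024, 1, 5),
    "newer.csv": datetime(2024, 1, 10),
}
created = datetime(2024, 1, 3)
```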

## S3 object criteria
<a name="discovery-jobs-scope-criteria"></a>

To fine tune the scope of a sensitive data discovery job, you can define custom criteria for S3 objects. Macie uses these criteria to determine which objects to analyze (*include*) or skip (*exclude*) when the job runs. The criteria consist of one or more conditions that derive from properties of S3 objects. The conditions apply to objects in all the S3 buckets that are included in the analysis. If a bucket stores multiple versions of an object, the conditions apply to the latest version of the object.

If you define multiple conditions as object criteria, Macie uses AND logic to join the conditions. In addition, exclude conditions take precedence over include conditions. For example, if you include objects that have the .pdf file name extension and exclude objects that are larger than 5 MB, the job analyzes any object that has the .pdf file name extension, unless the object is larger than 5 MB.
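The .pdf and 5 MB example can be written as predicates, as in the following Python sketch. The helper names are hypothetical. The sketch assumes that 1 MB means 1,048,576 bytes and that the file name extension is the text after the last dot in the object's file name.

```python
from typing import Any, Callable, Dict, List

Obj = Dict[str, Any]
Predicate = Callable[[Obj], bool]
MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def extension(key: str) -> str:
    name = key.rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[1] if "." in name else ""


def is_pdf(obj: Obj) -> bool:
    # Case-sensitive comparison; several extensions in one condition are joined with OR.
    return extension(obj["key"]) in {"pdf"}


def over_5_mb(obj: Obj) -> bool:
    return obj["size"] > 5 * MB


def object_in_scope(obj: Obj, includes: List[Predicate], excludes: List[Predicate]) -> bool:
    # Exclude conditions take precedence; conditions in each group are joined with AND.
    if excludes and all(cond(obj) for cond in excludes):
        return False
    return all(cond(obj) for cond in includes)
```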

You can define conditions that derive from any of the following properties of S3 objects.

**File name extension**  
This correlates to the file name extension of an S3 object. You can use this type of condition to include or exclude objects based on file type. To do this for multiple types of files, enter the file name extension for each type and separate each entry with a comma—for example: **docx,pdf,xlsx**. If you enter multiple file name extensions as values for a condition, Macie uses OR logic to join the values.  
Note that values are case sensitive. In addition, Macie doesn't support the use of partial values or wildcard characters in this type of condition.  
For information about the types of files that Macie can analyze, see [Supported file and storage formats](discovery-supported-storage.md#discovery-supported-formats).

**Last modified**  
This correlates to the **Last modified** field in Amazon S3. In Amazon S3, this field stores the date and time when an S3 object was created or last changed, whichever is later.  
For a sensitive data discovery job, this condition can be a specific date, a specific date and time, or an exclusive time range:  
+ To analyze objects that were last modified after a certain date or date and time, enter the values in the **From** fields.
+ To analyze objects that were last modified before a certain date or date and time, enter the values in the **To** fields.
+ To analyze objects that were last modified during a certain time range, use the **From** fields to enter the values for the first date or date and time in the time range. Use the **To** fields to enter the values for the last date or date and time in the time range.
+ To analyze objects that were last modified at any time during a certain single day, enter the date in the **From** date field. Enter the date for the next day in the **To** date field. Then verify that both time fields are blank. (Macie treats a blank time field as `00:00:00`.) For example, to analyze objects that changed on August 9, 2023, enter **2023/08/09** in the **From** date field, enter **2023/08/10** in the **To** date field, and don't enter a value in either time field.
Enter any time values in Coordinated Universal Time (UTC) and use 24-hour notation.
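The single-day recipe works because a blank time field means midnight. The following Python sketch models a **Last modified** condition with hypothetical helper names. It assumes that the **From** value is inclusive and the **To** value is exclusive, which is one reading of "exclusive time range". The text doesn't spell out the boundary behavior at each end.

```python
from datetime import date, datetime, time, timezone
from typing import Optional


def to_utc(day: date, at: Optional[time] = None) -> datetime:
    # A blank time field is treated as 00:00:00; all values are in UTC.
    return datetime.combine(day, at if at is not None else time(0, 0, 0), tzinfo=timezone.utc)


def last_modified_matches(modified: datetime, start: Optional[datetime] = None,
                          end: Optional[datetime] = None) -> bool:
    # ASSUMPTION: From is inclusive and To is exclusive (a half-open range).
    if start is not None and modified < start:
        return False
    if end is not None and modified >= end:
        return False
    return True


# Objects that changed at any time on August 9, 2023 (UTC).
start, end = to_utc(date(2023, 8, 9)), to_utc(date(2023, 8, 10))
```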

**Prefix**  
This correlates to the **Key** field in Amazon S3. In Amazon S3, this field stores the name of an S3 object, including the object's prefix. A *prefix* is similar to a directory path within a bucket. It enables you to group similar objects together in a bucket, much like you might store similar files together in a folder on a file system. For information about object prefixes and folders in Amazon S3, see [Organizing objects in the Amazon S3 console using folders](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html) in the *Amazon Simple Storage Service User Guide*.  
You can use this type of condition to include or exclude objects whose keys (names) begin with a certain value. For example, to exclude all objects whose key begins with *AWSLogs*, enter **AWSLogs** as the value for a **Prefix** condition, and then choose **Exclude**.   
If you enter multiple prefixes as values for a condition, Macie uses OR logic to join the values. For example, if you enter **AWSLogs1** and **AWSLogs2** as values for a condition, any object whose key begins with *AWSLogs1* or *AWSLogs2* matches the condition’s criteria.  
When you enter a value for a **Prefix** condition, keep the following in mind:  
+ Values are case sensitive.
+ Macie doesn't support the use of wildcard characters in these values.
+ In Amazon S3, an object’s key doesn’t include the name of the bucket that stores the object. For this reason, don’t specify bucket names in these values.
+ If a prefix includes a delimiter, include the delimiter in the value. For example, enter **AWSLogs/eventlogs** to define a condition for all objects whose key begins with *AWSLogs/eventlogs*. Macie supports the default Amazon S3 delimiter, which is a slash (/), and custom delimiters.
Also note that an object matches a condition's criteria only if the beginning of the object's key exactly matches the value that you enter, starting with the first character of the key. In addition, Macie applies a condition to the complete **Key** value for an object, including the object's file name.   
For example, if an object's key is *AWSLogs/eventlogs/testlog.csv* and you enter any of the following values for a condition, the object matches the condition's criteria:  
+ **AWSLogs**
+ **AWSLogs/event**
+ **AWSLogs/eventlogs/**
+ **AWSLogs/eventlogs/testlog**
+ **AWSLogs/eventlogs/testlog.csv**
However, if you enter **eventlogs**, the object doesn't match the criteria—the condition's value doesn't include the first part of the key, *AWSLogs/*. Similarly, if you enter **awslogs**, the object doesn't match the criteria due to differences in capitalization.
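The prefix rules above amount to a case-sensitive "starts with" test on the full key, with OR logic across values. The following is a minimal local sketch of that behavior. The function names are hypothetical. The `simpleScopeTerm` shape, with the `OBJECT_KEY` key and the `STARTS_WITH` comparator, reflects our reading of the Macie API's job scoping structure, so verify it against the API reference before you rely on it.

```python
from typing import Dict, List


def matches_prefix_condition(object_key: str, values: List[str]) -> bool:
    """Return True if the object's key begins with any of the condition's values.

    - Comparison starts at the first character of the full key (file name included).
    - It is case sensitive, and characters such as "*" are compared literally,
      so wildcards never expand.
    - Multiple values are joined with OR logic.
    """
    return any(object_key.startswith(value) for value in values)


def prefix_scope_term(values: List[str]) -> Dict[str, object]:
    """Express the same condition as a job scoping term (assumed API shape)."""
    return {"simpleScopeTerm": {"comparator": "STARTS_WITH", "key": "OBJECT_KEY", "values": list(values)}}
```

With the key *AWSLogs/eventlogs/testlog.csv*, each of the five matching values listed above returns `True`. The values **eventlogs** and **awslogs** return `False`.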

**Storage size**  
This correlates to the **Size** field in Amazon S3. In Amazon S3, this field indicates the total storage size of an S3 object. If an object is a compressed file, this value doesn't reflect the actual size of the file after it's decompressed.  
You can use this type of condition to include or exclude objects that are smaller than a certain size, larger than a certain size, or fall within a certain size range. Macie applies this type of condition to all types of objects, including compressed or archive files and the files that they contain. For information about size-based restrictions for each supported format, see [Quotas for Macie](macie-quotas.md).

**Tags**  
The tags that are associated with an S3 object. Tags are labels that you can define and assign to certain types of AWS resources, including S3 objects. Each tag consists of a required tag key and an optional tag value. For information about tagging S3 objects, see [Categorizing your storage using tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) in the *Amazon Simple Storage Service User Guide*.  
For a sensitive data discovery job, you can use this type of condition to include or exclude objects that have a specific tag. This can be a specific tag key or a specific tag key and tag value (as a pair). If you specify multiple tags as values for a condition, Macie uses OR logic to join the values. For example, if you specify **Project1** and **Project2** as tag keys for a condition, any object that has the *Project1* or *Project2* tag key matches the condition’s criteria.  
Note that tag keys and values are case sensitive. In addition, Macie doesn't support the use of partial values or wildcard characters in this type of condition.
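The tag rules above (OR logic, key-only or key-value matching, exact and case-sensitive comparison) can be modeled locally. This is a hypothetical sketch, not a Macie API. The `matches_tag_condition` name and the `(key, value)` tuple format are illustrative, and a value of `None` stands for "match on the tag key alone".

```python
from typing import Dict, List, Optional, Tuple

TagTerm = Tuple[str, Optional[str]]  # (tag key, tag value or None for a key-only match)


def matches_tag_condition(object_tags: Dict[str, str], terms: List[TagTerm]) -> bool:
    """Return True if the object has any of the specified tags (OR logic).

    A term with value None matches on the tag key alone; otherwise both the key
    and the value must match. Comparisons are exact and case sensitive, with no
    partial values or wildcards.
    """
    for key, value in terms:
        if key in object_tags and (value is None or object_tags[key] == value):
            return True
    return False
```

For example, an object tagged *Project2* matches a condition that specifies the **Project1** and **Project2** tag keys. An object tagged *project1* (lowercase) doesn't match.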

# Creating a sensitive data discovery job
<a name="discovery-jobs-create"></a>

With Amazon Macie, you can create and run sensitive data discovery jobs to automate discovery, logging, and reporting of sensitive data in Amazon Simple Storage Service (Amazon S3) general purpose buckets. A *sensitive data discovery job* is a series of automated processing and analysis tasks that Macie performs to detect and report sensitive data in Amazon S3 objects. As the analysis progresses, Macie provides detailed reports of the sensitive data that it finds and the analysis that it performs: *sensitive data findings*, which report sensitive data that Macie finds in individual S3 objects, and *sensitive data discovery results*, which log details about the analysis of individual S3 objects. For more information, see [Reviewing job results](discovery-jobs-manage-results.md).

When you create a job, you start by specifying which S3 buckets store objects that you want Macie to analyze when the job runs—specific buckets that you select or buckets that match specific criteria. Then you specify how often to run the job—once, or periodically on a daily, weekly, or monthly basis. You can also choose options to refine the scope of the job's analysis. The options include custom criteria that derive from properties of S3 objects, such as tags, prefixes, and when an object was last modified.

After you define the schedule and scope of the job, you specify which managed data identifiers and custom data identifiers to use: 
+ A *managed data identifier* is a set of built-in criteria and techniques that are designed to detect a specific type of sensitive data—for example, credit card numbers, AWS secret access keys, or passport numbers for a particular country or region. These identifiers can detect a large and growing list of sensitive data types for many countries and regions, including multiple types of credentials data, financial information, and personally identifiable information (PII). For more information, see [Using managed data identifiers](managed-data-identifiers.md).
+ A *custom data identifier* is a set of criteria that you define to detect sensitive data. With custom data identifiers, you can detect sensitive data that reflects your organization's particular scenarios, intellectual property, or proprietary data, such as employee IDs, customer account numbers, or internal data classifications. Custom data identifiers can supplement the managed data identifiers that Macie provides. For more information, see [Building custom data identifiers](custom-data-identifiers.md).

Next, you can optionally select allow lists to use. In Macie, an *allow list* specifies text or a text pattern to ignore. These are typically sensitive data exceptions for your particular scenarios or environment, such as public names or phone numbers for your organization, or sample data that your organization uses for testing. For more information, see [Defining sensitive data exceptions with allow lists](allow-lists.md).

When you finish choosing these options, you're ready to enter general settings for the job, such as the job's name and description. You can then review and save the job.

**Topics**
+ [Before you begin: Set up key resources](#discovery-jobs-create-prerequisites)
+ [Step 1: Choose S3 buckets](#discovery-jobs-create-step1)
+ [Step 2: Review your S3 bucket selections or criteria](#discovery-jobs-create-step2)
+ [Step 3: Define the schedule and refine the scope](#discovery-jobs-create-step3)
+ [Step 4: Select managed data identifiers](#discovery-jobs-create-step4)
+ [Step 5: Select custom data identifiers](#discovery-jobs-create-step5)
+ [Step 6: Select allow lists](#discovery-jobs-create-step6)
+ [Step 7: Enter general settings](#discovery-jobs-create-step7)
+ [Step 8: Review and create](#discovery-jobs-create-step8)

## Before you begin: Set up key resources
<a name="discovery-jobs-create-prerequisites"></a>

Before you create a job, it's a good idea to take the following steps: 
+ Verify that you configured a repository for your sensitive data discovery results. To do this, choose **Discovery results** in the navigation pane on the Amazon Macie console. To learn about these settings, see [Storing and retaining sensitive data discovery results](discovery-results-repository-s3.md).
+ Create any custom data identifiers that you want the job to use. To learn how, see [Building custom data identifiers](custom-data-identifiers.md).
+ Create any allow lists that you want the job to use. To learn how, see [Defining sensitive data exceptions with allow lists](allow-lists.md).
+ If you want to analyze S3 objects that are encrypted, ensure that Macie can access and use the appropriate encryption keys. For more information, see [Analyzing encrypted S3 objects](discovery-supported-encryption-types.md).
+ If you want to analyze objects in an S3 bucket that has a restrictive bucket policy, ensure that Macie is allowed to access the objects. For more information, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md).

If you do these things before you create a job, you streamline creation of the job and help ensure that the job can analyze the data that you want.
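If you script your setup, you can check the repository prerequisite programmatically. The following is a minimal sketch. It assumes the response shape of the Macie `GetClassificationExportConfiguration` operation (a `configuration` object with an `s3Destination` that names a `bucketName`). The helper name `repository_configured` is hypothetical. The helper inspects a response dictionary, so you can test it without AWS credentials.

```python
from typing import Any, Dict


def repository_configured(response: Dict[str, Any]) -> bool:
    """Return True if a GetClassificationExportConfiguration response names an S3 bucket."""
    configuration = response.get("configuration") or {}
    destination = configuration.get("s3Destination") or {}
    return bool(destination.get("bucketName"))


# Example (requires boto3 and AWS credentials):
#   import boto3
#   response = boto3.client("macie2").get_classification_export_configuration()
#   print(repository_configured(response))
```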

## Step 1: Choose S3 buckets
<a name="discovery-jobs-create-step1"></a>

When you create a job, the first step is to specify which S3 buckets store objects that you want Macie to analyze when the job runs. For this step, you have two options:
+ **Select specific buckets** – With this option, you explicitly select each S3 bucket to analyze. Then, when the job runs, Macie analyzes objects only in the buckets that you select.
+ **Specify bucket criteria** – With this option, you define runtime criteria that determine which S3 buckets to analyze. The criteria consist of one or more conditions that derive from bucket properties. Then, when the job runs, Macie identifies buckets that match your criteria and analyzes objects in those buckets.

For detailed information about these options, see [Scope options for jobs](discovery-jobs-scope.md).

The following sections provide instructions for choosing and configuring each option. Choose the section for the option that you want.

### Select specific buckets
<a name="discovery-jobs-create-step1-buckets-select"></a>

If you choose to explicitly select each S3 bucket to analyze, Macie provides you with an inventory of your general purpose buckets in the current AWS Region. You can then use this inventory to select one or more buckets for the job. To learn about this inventory, see [Selecting specific S3 buckets](discovery-jobs-scope.md#discovery-jobs-scope-buckets-select).

If you're the Macie administrator for an organization, the inventory includes buckets that are owned by member accounts in your organization. You can select as many as 1,000 of these buckets, spanning as many as 1,000 accounts.

**To select specific S3 buckets for the job**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**.

1. Choose **Create job**.

1. On the **Choose S3 buckets** page, choose **Select specific buckets**. Macie displays a table of all the general purpose buckets for your account in the current Region. 

1. In the **Select S3 buckets** section, optionally choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) to retrieve the latest bucket metadata from Amazon S3.

   If the information icon (![\[The information icon, which is a blue circle that has a lowercase letter i in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-info-blue.png)) appears next to any bucket names, we recommend that you do this. This icon indicates that a bucket was created during the past 24 hours, possibly after Macie last retrieved bucket and object metadata from Amazon S3 as part of the [daily refresh cycle](monitoring-s3-how-it-works.md#monitoring-s3-how-it-works-data-refresh).

1. In the table, select the checkbox for each bucket that you want the job to analyze. 
**Tip**  
To find specific buckets more easily, enter filter criteria in the filter box above the table. You can also sort the table by choosing a column heading.
To determine whether you already configured a job to periodically analyze objects in a bucket, refer to the **Monitored by job** field. If **Yes** appears in a field, the bucket is explicitly included in a periodic job or the bucket matched the criteria for a periodic job within the past 24 hours. In addition, the status of at least one of those jobs is not *Cancelled*. Macie updates this data on a daily basis. 
To determine when an existing periodic or one-time job most recently analyzed objects in a bucket, refer to the **Latest job run** field. For additional information about that job, refer to the bucket's details.
To display a bucket's details, choose the bucket's name. In addition to job-related information, the details panel provides statistics and other information about the bucket, such as the bucket's public access settings. To learn more about this data, see [Reviewing your S3 bucket inventory](monitoring-s3-inventory-review.md).

1. When you finish selecting buckets, choose **Next**.

In the next step, you'll review and verify your selections.
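When you create a job through the Macie API instead of the console, explicitly selected buckets are grouped by account. The following is a minimal sketch, assuming the API's `bucketDefinitions` list of `{"accountId": ..., "buckets": [...]}` entries. The `build_bucket_definitions` helper and its `(account ID, bucket name)` input format are hypothetical. The sketch also enforces the 1,000-bucket limit described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MAX_BUCKETS = 1000  # documented limit for explicitly selected buckets


def build_bucket_definitions(selections: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Group (account ID, bucket name) pairs into bucketDefinitions entries."""
    unique = list(dict.fromkeys(selections))  # drop duplicate selections, keep their order
    if len(unique) > MAX_BUCKETS:
        raise ValueError(f"A job can include at most {MAX_BUCKETS} buckets; got {len(unique)}")
    by_account: Dict[str, List[str]] = defaultdict(list)
    for account_id, bucket in unique:
        by_account[account_id].append(bucket)
    return [{"accountId": account_id, "buckets": buckets} for account_id, buckets in by_account.items()]
```

Because each bucket belongs to one account, staying within 1,000 buckets also keeps the job within 1,000 accounts.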

### Specify bucket criteria
<a name="discovery-jobs-create-step1-buckets-criteria"></a>

If you choose to specify runtime criteria that determine which S3 buckets to analyze, Macie provides options to help you choose fields, operators, and values for individual conditions in the criteria. To learn more about these options, see [Specifying S3 bucket criteria](discovery-jobs-scope.md#discovery-jobs-scope-buckets-criteria).

**To specify S3 bucket criteria for the job**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**.

1. Choose **Create job**.

1. On the **Choose S3 buckets** page, choose **Specify bucket criteria**.

1. Under **Specify bucket criteria**, do the following to add a condition to the criteria:

   1. Place your cursor in the filter box, and then choose the bucket property to use for the condition.

   1. In the first box, choose an operator for the condition, **Equals** or **Not equals**.

   1. In the next box, enter one or more values for the property.

      Depending on the type and nature of the bucket property, Macie displays different options for entering values. For example, if you choose the **Effective permission** property, Macie displays a list of values to choose from. If you choose the **Account ID** property, Macie displays a text box in which you can enter one or more AWS account IDs. To enter multiple values in a text box, enter each value and separate each entry with a comma.

   1. Choose **Apply**. Macie adds the condition and displays it below the filter box.

      By default, Macie adds the condition with an include statement. This means that the job is configured to analyze (*include*) objects in buckets that match the condition. To skip (*exclude*) buckets that match the condition, choose **Include** for the condition, and then choose **Exclude**.

   1. Repeat the preceding steps for each additional condition that you want to add to the criteria.

1. To test your criteria, expand the **Preview the criteria results** section. This section displays a table of up to 25 general purpose buckets that currently match the criteria.

1. To refine your criteria, do any of the following: 
   + To remove a condition, choose **X** for the condition.
   + To change a condition, remove the condition by choosing **X** for the condition. Then add a condition that has the correct settings.
   + To remove all conditions, choose **Clear filters**.

   Macie updates the table of criteria results to reflect your changes.

1. When you finish specifying bucket criteria, choose **Next**.

In the next step, you'll review and verify your criteria.
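The console conditions above map onto the `bucketCriteria` structure of the Macie API. The following is a minimal sketch of that translation. It assumes `simpleCriterion` keys such as `ACCOUNT_ID` and `S3_BUCKET_EFFECTIVE_PERMISSION`, the `EQ` and `NE` comparators, and `includes` and `excludes` blocks that each hold an `and` array. Check these names against the API reference. The condition dictionary format and the helper names are hypothetical.

```python
from typing import Any, Dict, List

OPERATORS = {"Equals": "EQ", "Not equals": "NE"}


def parse_text_box(text: str) -> List[str]:
    """Split comma-separated input into trimmed, non-empty values."""
    return [value.strip() for value in text.split(",") if value.strip()]


def build_bucket_criteria(conditions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Translate console-style conditions into a bucketCriteria structure.

    Each condition is a dict with "key" (for example "ACCOUNT_ID"), "operator"
    ("Equals" or "Not equals"), "values" (comma-separated text), and an optional
    "exclude" flag that mirrors switching the condition from Include to Exclude.
    """
    blocks: Dict[str, List[Dict[str, Any]]] = {"includes": [], "excludes": []}
    for condition in conditions:
        criterion = {
            "comparator": OPERATORS[condition["operator"]],
            "key": condition["key"],
            "values": parse_text_box(condition["values"]),
        }
        target = "excludes" if condition.get("exclude") else "includes"
        blocks[target].append({"simpleCriterion": criterion})
    # Every condition in an "and" array must be met; empty blocks are left out.
    return {name: {"and": terms} for name, terms in blocks.items() if terms}
```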

## Step 2: Review your S3 bucket selections or criteria
<a name="discovery-jobs-create-step2"></a>

For this step, verify that you chose the correct settings in the preceding step:
+ **Review your bucket selections** – If you selected specific S3 buckets for the job, review the table of buckets and change your bucket selections as necessary. The table provides insight into the projected scope and cost of the job's analysis. The data is based on the size and types of objects that are currently stored in each bucket.

  In the table, the **Estimated cost** field indicates the total estimated cost (in US dollars) of analyzing objects in an S3 bucket. Each estimate reflects the projected amount of uncompressed data that the job will analyze in a bucket. If any objects are compressed or archive files, the estimate assumes that the files use a 3:1 compression ratio and the job can analyze all extracted files. For more information, see [Forecasting and monitoring job costs](discovery-jobs-costs.md).
+ **Review your bucket criteria** – If you specified bucket criteria for the job, review each condition in the criteria. To change the criteria, choose **Previous**, and then use the filter options in the preceding step to enter the correct criteria. When you finish, choose **Next**.
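The 3:1 compression assumption behind the **Estimated cost** field can be illustrated with simple arithmetic. The following is a simplified sketch, not Macie's estimator. It ignores unsupported file types, pricing tiers, and any free-tier allowances. It also treats 1 GB as 1,024³ bytes. The `price_per_gb_usd` input is hypothetical, so look up current Macie pricing for real values.

```python
from typing import Iterable, Tuple

ASSUMED_COMPRESSION_RATIO = 3  # estimates assume compressed or archive files expand 3:1
BYTES_PER_GB = 1024 ** 3       # assumption for this sketch


def projected_uncompressed_bytes(objects: Iterable[Tuple[int, bool]]) -> int:
    """Each item is (storage size in bytes, whether it's a compressed or archive file)."""
    return sum(size * ASSUMED_COMPRESSION_RATIO if compressed else size for size, compressed in objects)


def rough_cost_usd(objects: Iterable[Tuple[int, bool]], price_per_gb_usd: float) -> float:
    """Multiply the projected uncompressed gigabytes by a hypothetical per-GB price."""
    return round(projected_uncompressed_bytes(objects) / BYTES_PER_GB * price_per_gb_usd, 2)
```

For example, 1 GB of uncompressed objects plus a 1 GB archive projects to 4 GB of data to analyze, because the archive is assumed to expand to 3 GB.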

When you finish reviewing and verifying the settings, choose **Next**.

## Step 3: Define the schedule and refine the scope
<a name="discovery-jobs-create-step3"></a>

For this step, specify how often you want the job to run—once, or periodically on a daily, weekly, or monthly basis. Also choose various options to refine the scope of the job's analysis. To learn about these options, see [Scope options for jobs](discovery-jobs-scope.md).

**To define the schedule and refine the scope of the job**

1. On the **Refine the scope** page, specify how often you want the job to run: 
   + To run the job only once, immediately after you finish creating it, choose **One-time job**.
   + To run the job periodically on a recurring basis, choose **Scheduled job**. For **Update frequency**, choose whether to run the job daily, weekly, or monthly. Then use the **Include existing objects** option to define the scope of the job's first run:
     + Select this checkbox to analyze all existing objects immediately after you finish creating the job. Each subsequent run analyzes only those objects that are created or changed after the preceding run.
     + Clear this checkbox to skip analysis of all existing objects. The job's first run analyzes only those objects that are created or changed after you finish creating the job and before the first run starts. Each subsequent run analyzes only those objects that are created or changed after the preceding run.

       Clearing this checkbox is helpful for cases where you already analyzed the data and want to continue to analyze it periodically. For example, if you previously used another service or application to classify data and you recently started using Macie, you might use this option to ensure continued discovery and classification of your data without incurring unnecessary costs or duplicating classification data.

1. (Optional) To specify the percentage of objects that you want the job to analyze, enter the percentage in the **Sampling depth** box.

   If this value is less than 100%, Macie selects the objects to analyze at random, up to the specified percentage, and analyzes all the data in those objects. The default value is 100%.

1. (Optional) To add specific criteria that determine which S3 objects are included in or excluded from the job's analysis, expand the **Additional settings** section, and then enter the criteria. These criteria consist of individual conditions that derive from properties of objects:
   + To analyze (*include*) objects that meet a specific condition, enter the condition type and value, and then choose **Include**.
   + To skip (*exclude*) objects that meet a specific condition, enter the condition type and value, and then choose **Exclude**.

   Repeat this step for each include or exclude condition that you want.

   If you enter multiple conditions, any exclude conditions take precedence over include conditions. For example, if you include objects that have the .pdf file name extension and exclude objects that are larger than 5 MB, the job analyzes any object that has the .pdf file name extension, unless the object is larger than 5 MB.

1. When you finish, choose **Next**.
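The rule that exclude conditions take precedence over include conditions can be sketched as a small predicate. This is a hypothetical local model, and `S3Object`, `in_job_scope`, and the sample conditions are illustrative names. The documentation above doesn't say how several conditions of the same kind combine. This sketch assumes AND logic, mirroring the `and` arrays in the Macie API's job scoping structure. It also assumes that 5 MB means 5 × 1,024 × 1,024 bytes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class S3Object:
    key: str
    size_bytes: int


Condition = Callable[[S3Object], bool]


def in_job_scope(obj: S3Object, includes: List[Condition], excludes: List[Condition]) -> bool:
    """Decide whether an object is analyzed, letting exclude criteria override include criteria."""
    # Assumption: conditions in the same list are joined with AND logic.
    if excludes and all(condition(obj) for condition in excludes):
        return False  # matches the exclude criteria, so skipped even if it also matches the includes
    # With no include conditions, every object that isn't excluded stays in scope.
    return all(condition(obj) for condition in includes)


FIVE_MB = 5 * 1024 * 1024


def is_pdf(obj: S3Object) -> bool:
    return obj.key.endswith(".pdf")


def larger_than_5_mb(obj: S3Object) -> bool:
    return obj.size_bytes > FIVE_MB
```

With `includes=[is_pdf]` and `excludes=[larger_than_5_mb]`, a 1 MB .pdf file is analyzed. A 6 MB .pdf file and a small .txt file are both skipped.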

## Step 4: Select managed data identifiers
<a name="discovery-jobs-create-step4"></a>

For this step, specify which managed data identifiers you want the job to use when it analyzes S3 objects. You have two options:
+ **Use recommended settings** – With this option, the job analyzes S3 objects by using the set of managed data identifiers that we recommend for jobs. This set is designed to detect common categories and types of sensitive data. To review a list of managed data identifiers that are currently in the set, see [Managed data identifiers recommended for jobs](discovery-jobs-mdis-recommended.md). We update that list each time we add a managed data identifier to the set or remove one from it.
+ **Use custom settings** – With this option, the job analyzes S3 objects by using managed data identifiers that you select. This can be all or only some of the managed data identifiers that are currently available. You can also configure the job to not use any managed data identifiers. The job can instead use only custom data identifiers that you select in the next step. To review a list of managed data identifiers that are currently available, see [Quick reference: Managed data identifiers by type](mdis-reference-quick.md). We update that list each time we release a new managed data identifier.

When you choose either option, Macie displays a table of managed data identifiers. In the table, the **Sensitive data type** field specifies the unique identifier (ID) for a managed data identifier. This ID describes the type of sensitive data that the managed data identifier is designed to detect, for example: **USA_PASSPORT_NUMBER** for US passport numbers, **CREDIT_CARD_NUMBER** for credit card numbers, and **PGP_PRIVATE_KEY** for PGP private keys. To find specific identifiers more quickly, you can sort and filter the table by sensitive data category or type.

**To select managed data identifiers for the job**

1. On the **Select managed data identifiers** page, under **Managed data identifier options**, do one of the following:
   + To use the set of managed data identifiers that we recommend for jobs, choose **Recommended**.

     If you choose this option and you configured the job to run more than once, each run automatically uses all the managed data identifiers that are in the recommended set when the run starts. This includes new managed data identifiers that we release and add to the set. It excludes managed data identifiers that we remove from the set and no longer recommend for jobs.
   + To use only specific managed data identifiers that you select, choose **Custom**, and then choose **Use specific managed data identifiers**. Then, in the table, select the checkbox for each managed data identifier that you want the job to use.

     If you choose this option and you configured the job to run more than once, each run uses only the managed data identifiers that you select. In other words, the job uses these same managed data identifiers each time it runs.
   + To use all the managed data identifiers that Macie currently provides, choose **Custom**, and then choose **Use specific managed data identifiers**. Then, in the table, select the checkbox in the selection column heading to select all rows.

     If you choose this option and you configured the job to run more than once, each run uses only the managed data identifiers that you select. In other words, the job uses these same managed data identifiers each time it runs.
   + To not use any managed data identifiers and use only custom data identifiers, choose **Custom**, and then choose **Don't use any managed data identifiers**. Then, in the next step, select the custom data identifiers to use.

1. When you finish, choose **Next**.
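The console choices above correspond roughly to settings in a `CreateClassificationJob` request. The following is a minimal sketch, assuming the API's `managedDataIdentifierSelector` values `RECOMMENDED`, `INCLUDE`, and `NONE` and the `managedDataIdentifierIds` list. The option labels and helper name are hypothetical. The sketch treats "select every row in the table" as `INCLUDE` with every ID listed, which keeps the set fixed across runs as the documentation above describes. The API also accepts other selector values, which this sketch doesn't use.

```python
from typing import Any, Dict, List, Optional


def managed_identifier_settings(option: str, selected_ids: Optional[List[str]] = None) -> Dict[str, Any]:
    """Map a console choice ("recommended", "specific", or "none") to request fields."""
    if option == "recommended":
        # Each run uses the recommended set as it stands when the run starts.
        return {"managedDataIdentifierSelector": "RECOMMENDED"}
    if option == "specific":
        if not selected_ids:
            raise ValueError("Select at least one managed data identifier")
        # The same identifiers are used on every run, even if new ones are released later.
        return {"managedDataIdentifierSelector": "INCLUDE", "managedDataIdentifierIds": list(selected_ids)}
    if option == "none":
        # The job then relies entirely on the custom data identifiers that you select next.
        return {"managedDataIdentifierSelector": "NONE"}
    raise ValueError(f"Unknown option: {option!r}")
```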

## Step 5: Select custom data identifiers
<a name="discovery-jobs-create-step5"></a>

For this step, select any custom data identifiers that you want the job to use when it analyzes S3 objects. The job will use the selected identifiers in addition to any managed data identifiers that you configured the job to use. To learn more about custom data identifiers, see [Building custom data identifiers](custom-data-identifiers.md).

**To select custom data identifiers for the job**

1. On the **Select custom data identifiers** page, select the checkbox for each custom data identifier that you want the job to use. You can select as many as 30 custom data identifiers.
**Tip**  
To review or test the settings for a custom data identifier before you select it, choose the link icon (![\[The link icon, which is a blue box that has an arrow in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-external-link.png)) next to the identifier's name. Macie opens a page that displays the identifier's settings.  
You can also use this page to test the identifier with sample data. To do this, enter up to 1,000 characters of text in the **Sample data** box, and then choose **Test**. Macie evaluates the sample data by using the identifier, and then reports the number of matches.

1. When you finish selecting custom data identifiers, choose **Next**.
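Before you use the console's **Test** option, you might preview a custom data identifier's regex locally. The following is a rough sketch only. Python's `re` syntax isn't identical to the regex syntax that Macie supports, and the sketch ignores keywords, ignore words, and maximum match distance. The `EMP-` employee ID pattern and the helper names are hypothetical. For an authoritative count, use the console or the Macie `TestCustomDataIdentifier` operation.

```python
import re
from typing import List

MAX_SAMPLE_CHARS = 1000      # the Sample data box accepts up to 1,000 characters
MAX_CUSTOM_IDENTIFIERS = 30  # a job can use as many as 30 custom data identifiers


def count_regex_matches(pattern: str, sample_data: str) -> int:
    """Count non-overlapping regex matches in sample text (a local preview only)."""
    if len(sample_data) > MAX_SAMPLE_CHARS:
        raise ValueError(f"Sample data is limited to {MAX_SAMPLE_CHARS} characters")
    return sum(1 for _ in re.finditer(pattern, sample_data))


def check_selection_size(identifier_ids: List[str]) -> None:
    """Raise ValueError if more custom data identifiers are selected than a job allows."""
    if len(identifier_ids) > MAX_CUSTOM_IDENTIFIERS:
        raise ValueError(f"A job can use at most {MAX_CUSTOM_IDENTIFIERS} custom data identifiers")
```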

## Step 6: Select allow lists
<a name="discovery-jobs-create-step6"></a>

For this step, select any allow lists that you want the job to use when it analyzes S3 objects. To learn more about allow lists, see [Defining sensitive data exceptions with allow lists](allow-lists.md).

**To select allow lists for the job**

1. On the **Select allow lists** page, select the checkbox for each allow list that you want the job to use. You can select as many as 10 lists.
**Tip**  
To review the settings for an allow list before you select it, choose the link icon (![\[The link icon, which is a blue box that has an arrow in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-external-link.png)) next to the list's name. Macie opens a page that displays the list's settings.  
If the list specifies a regular expression (*regex*), you can also use this page to test the regex with sample data. To do this, enter up to 1,000 characters of text in the **Sample data** box, and then choose **Test**. Macie evaluates the sample data by using the regex, and then reports the number of matches.

1. When you finish selecting allow lists, choose **Next**.
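To see the effect of an allow list that specifies a regex, you can model it as a filter over candidate matches. This is a hypothetical illustration, not how Macie is implemented. It assumes that a candidate is ignored when it fully matches an allow-list pattern, and it uses Python `re`, whose syntax can differ from Macie's. The example pattern targets the fictional 555-01xx phone numbers often used in test data.

```python
import re
from typing import List

MAX_ALLOW_LISTS = 10  # a job can use as many as 10 allow lists


def apply_allow_list_regexes(candidates: List[str], allow_regexes: List[str]) -> List[str]:
    """Drop candidate matches that fully match any allow-list regex (hypothetical model)."""
    if len(allow_regexes) > MAX_ALLOW_LISTS:
        raise ValueError(f"A job can use at most {MAX_ALLOW_LISTS} allow lists")
    compiled = [re.compile(pattern) for pattern in allow_regexes]
    return [c for c in candidates if not any(regex.fullmatch(c) for regex in compiled)]
```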

## Step 7: Enter general settings
<a name="discovery-jobs-create-step7"></a>

For this step, specify a name and, optionally, a description of the job. You can also assign tags to the job. A *tag* is a label that you define and assign to certain types of AWS resources. Each tag consists of a required tag key and an optional tag value. Tags can help you identify, categorize, and manage resources in different ways, such as by purpose, owner, environment, or other criteria. To learn more, see [Tagging Macie resources](tagging-resources.md).

**To enter general settings for the job**

1. On the **Enter general settings** page, enter a name for the job in the **Job name** box. The name can contain as many as 500 characters. 

1. (Optional) For **Job description**, enter a brief description of the job. The description can contain as many as 200 characters. 

1. (Optional) For **Tags**, choose **Add tag**, and then enter as many as 50 tags to assign to the job.

1. When you finish, choose **Next**.
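The limits in this step are easy to check before you submit a job. The following is a minimal validation sketch based only on the limits stated above: a required name of up to 500 characters, a description of up to 200 characters, and up to 50 tags. The function name is hypothetical.

```python
from typing import Dict, List, Optional

MAX_NAME_CHARS = 500
MAX_DESCRIPTION_CHARS = 200
MAX_TAGS = 50


def general_settings_problems(name: str, description: str = "", tags: Optional[Dict[str, str]] = None) -> List[str]:
    """Return a list of problems with the general settings (empty if the settings look valid)."""
    problems: List[str] = []
    if not name.strip():
        problems.append("Job name is required")
    elif len(name) > MAX_NAME_CHARS:
        problems.append(f"Job name exceeds {MAX_NAME_CHARS} characters")
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(f"Job description exceeds {MAX_DESCRIPTION_CHARS} characters")
    if tags and len(tags) > MAX_TAGS:
        problems.append(f"A job can have at most {MAX_TAGS} tags")
    return problems
```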

## Step 8: Review and create
<a name="discovery-jobs-create-step8"></a>

For this final step, review the job's configuration settings and verify that they're correct. This is an important step. After you create a job, you can’t change any of these settings. This helps ensure that you have an immutable history of sensitive data findings and discovery results for data privacy and protection audits or investigations that you perform.

Depending on the job's settings, you can also review the total estimated cost (in US dollars) of running the job once. If you selected specific S3 buckets for the job, the estimate is based on the size and types of objects in the buckets that you selected, and how much of that data the job can analyze. If you specified bucket criteria for the job, the estimate is based on the size and types of objects in as many as 500 buckets that currently match the criteria, and how much of that data the job can analyze. To learn about this estimate, see [Forecasting and monitoring job costs](discovery-jobs-costs.md).

**To review and create the job**

1. On the **Review and create** page, review each setting and verify that it's correct. To change a setting, choose **Edit** in the section that contains the setting, and then enter the correct setting. You can also use the navigation tabs to go to the page that contains a setting.

1. When you finish verifying the settings, choose **Submit** to create and save the job. Macie checks the settings and notifies you of any issues to address.
**Note**  
If you haven’t configured a repository for your sensitive data discovery results, Macie displays a warning and doesn't save the job. To address this issue, choose **Configure** in the **Repository for sensitive data discovery results** section. Then enter the configuration settings for the repository. To learn how, see [Storing and retaining sensitive data discovery results](discovery-results-repository-s3.md). After you enter the settings, return to the **Review and create** page and choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) in the **Repository for sensitive data discovery results** section of the page.  
Although we don't recommend it, you can temporarily override the repository requirement and save the job. If you do this, you risk losing discovery results from the job—Macie retains the results for only 90 days. To temporarily override the requirement, select the checkbox for the override option.

1. If Macie notifies you of issues to address, address the issues, and then choose **Submit** again to create and save the job.

If you configured the job to run once, on a daily basis, or on the current day of the week or month, Macie starts running the job immediately after you save it. Otherwise, Macie prepares to run the job on the specified day of the week or month. To monitor the job, you can [check the status of the job](discovery-jobs-status-check.md).
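If you create jobs programmatically, the settings from the preceding steps come together in one request. The following is a minimal sketch that builds keyword arguments for the boto3 `macie2` client's `create_classification_job` method. It assumes the API fields `jobType` (`ONE_TIME` or `SCHEDULED`), `scheduleFrequency`, `initialRun` (which corresponds to **Include existing objects**), `samplingPercentage`, and `s3JobDefinition`. The builder function and its parameters are hypothetical, so confirm field names and allowed values in the Macie API reference. Because you can't change a job's settings after you create it, review the dictionary before you submit it.

```python
import uuid
from typing import Any, Dict, List, Optional

WEEKDAYS = {"SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY"}


def build_create_job_request(
    name: str,
    bucket_definitions: List[Dict[str, Any]],
    schedule: Optional[str] = None,  # None for a one-time job, else "daily", "weekly", or "monthly"
    day: Optional[Any] = None,  # weekday name such as "MONDAY", or a day of the month (1-31)
    include_existing_objects: bool = True,
    sampling_percentage: int = 100,
    description: str = "",
    custom_data_identifier_ids: Optional[List[str]] = None,
    allow_list_ids: Optional[List[str]] = None,
    tags: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a CreateClassificationJob call."""
    request: Dict[str, Any] = {
        "clientToken": str(uuid.uuid4()),  # idempotency token for safe retries
        "name": name,
        "s3JobDefinition": {"bucketDefinitions": bucket_definitions},
        "samplingPercentage": sampling_percentage,
        "managedDataIdentifierSelector": "RECOMMENDED",
    }
    # Optional settings are added only when they have a value.
    if description:
        request["description"] = description
    if custom_data_identifier_ids:
        request["customDataIdentifierIds"] = list(custom_data_identifier_ids)
    if allow_list_ids:
        request["allowListIds"] = list(allow_list_ids)
    if tags:
        request["tags"] = dict(tags)

    if schedule is None:
        request["jobType"] = "ONE_TIME"  # initialRun doesn't apply to one-time jobs
        return request

    request["jobType"] = "SCHEDULED"
    request["initialRun"] = include_existing_objects
    if schedule == "daily":
        request["scheduleFrequency"] = {"dailySchedule": {}}
    elif schedule == "weekly" and day in WEEKDAYS:
        request["scheduleFrequency"] = {"weeklySchedule": {"dayOfWeek": day}}
    elif schedule == "monthly" and isinstance(day, int) and 1 <= day <= 31:
        request["scheduleFrequency"] = {"monthlySchedule": {"dayOfMonth": day}}
    else:
        raise ValueError(f"Invalid schedule {schedule!r} with day {day!r}")
    return request


# Example (requires boto3 and AWS credentials):
#   import boto3
#   request = build_create_job_request("weekly-pii-scan", [{"accountId": "111122223333", "buckets": ["logs"]}],
#                                      schedule="weekly", day="MONDAY")
#   response = boto3.client("macie2").create_classification_job(**request)
#   print(response["jobId"])
```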

# Reviewing the results of a sensitive data discovery job
<a name="discovery-jobs-manage-results"></a>

When you run a sensitive data discovery job, Amazon Macie automatically calculates and reports certain statistical data for the job. For example, Macie reports the number of times that the job has run, and the approximate number of Amazon Simple Storage Service (Amazon S3) objects that the job has yet to process during its current run. Macie also produces several types of results for the job: *log events*, *sensitive data findings*, and *sensitive data discovery results*.

**Topics**
+ [Types of job results](#discovery-jobs-manage-results-types)
+ [Reviewing job statistics and results](#discovery-jobs-manage-results-review)

## Types of results for sensitive data discovery jobs
<a name="discovery-jobs-manage-results-types"></a>

As a sensitive data discovery job progresses, Amazon Macie produces the following types of results for the job.

**Log event**  
This is a record of an event that occurred while the job was running. Macie automatically logs and publishes data for certain events to Amazon CloudWatch Logs. The data in these logs provides a record of changes to the job's progress or status, such as the exact date and time when the job started or stopped running. The data also provides details about any account- or bucket-level errors that occurred while the job ran.  
Log events can help you monitor a job and address any issues that prevented the job from analyzing the data that you want. If a job uses runtime criteria to determine which S3 buckets to analyze, log events can also help you determine whether any S3 buckets matched the criteria when the job ran and, if so, which ones.  
You can access log events by using the Amazon CloudWatch console or the Amazon CloudWatch Logs API. To help you navigate to the log events for a job, the Amazon Macie console provides a link to them. For more information, see [Monitoring jobs with CloudWatch Logs](discovery-jobs-monitor-cw-logs.md).
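
If you use the CloudWatch Logs API, a sketch like the following retrieves a job's log events with the `FilterLogEvents` operation (`filter_log_events` in the AWS SDK for Python, Boto3). It assumes that Macie publishes job events to the `/aws/macie/classificationjobs` log group, in a log stream named after the job's unique identifier; confirm the log group and stream names in your account. The function name is hypothetical, and `logs_client` is expected to be a Boto3 CloudWatch Logs client, such as `boto3.client("logs")`.

```python
from typing import Any, Iterator

# Assumption: the log group that Macie publishes job events to.
MACIE_LOG_GROUP = "/aws/macie/classificationjobs"


def iter_job_log_events(logs_client: Any, job_id: str) -> Iterator[dict]:
    """Yield CloudWatch Logs events for one Macie job, following nextToken pages."""
    # Assumption: the job's log stream name starts with the job ID.
    kwargs = {"logGroupName": MACIE_LOG_GROUP, "logStreamNamePrefix": job_id}
    while True:
        page = logs_client.filter_log_events(**kwargs)
        yield from page.get("events", [])
        token = page.get("nextToken")
        if not token:
            return  # No more pages.
        kwargs["nextToken"] = token
```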

**Sensitive data finding**  
This is a report of sensitive data that Macie found in an S3 object. Each finding provides a severity rating and details such as:  
+ The date and time when Macie found the sensitive data.
+ The category and types of sensitive data that Macie found.
+ The number of occurrences of each type of sensitive data that Macie found.
+ The unique identifier for the job that produced the finding.
+ The name, public access settings, encryption type, and other information about the affected S3 bucket and object.
Depending on the affected S3 object's file type or storage format, the details can also include the location of as many as 15 occurrences of the sensitive data that Macie found. To report location data, sensitive data findings use a [standardized JSON schema](findings-locate-sd-schema.md).  
A sensitive data finding doesn't include the sensitive data that Macie found. Instead, it provides information that you can use for further investigation and remediation as necessary.  
Macie stores sensitive data findings for 90 days. You can access them by using the Amazon Macie console or the Amazon Macie API. You can also monitor and process them by using other applications, services, and systems. For more information, see [Reviewing and analyzing findings](findings.md).
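
For example, to gauge what a finding reports without opening the console, you might tally its occurrence counts by sensitive data type. This sketch assumes a finding dictionary as returned by the `GetFindings` operation, with counts under `classificationDetails.result.sensitiveData[].detections[]`; verify those field names against the Amazon Macie API Reference. It covers only detections by managed data identifiers, and the function name is hypothetical.

```python
from collections import Counter


def count_sensitive_data_types(finding: dict) -> Counter:
    """Tally reported occurrences per managed sensitive data type in one finding."""
    # Assumed location of the detection data in a finding.
    result = finding.get("classificationDetails", {}).get("result", {})
    counts: Counter = Counter()
    for group in result.get("sensitiveData", []):
        # Each group covers one category, such as personal or financial information.
        for detection in group.get("detections", []):
            counts[detection["type"]] += detection.get("count", 0)
    return counts
```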

**Sensitive data discovery result**  
This is a record that logs details about the analysis of an S3 object. Macie automatically creates a sensitive data discovery result for each object that you configure a job to analyze. This includes objects in which Macie doesn't find sensitive data, and that therefore don't produce sensitive data findings. It also includes objects that Macie can't analyze due to errors or issues such as permissions settings or use of an unsupported file or storage format.  
If Macie finds sensitive data in an S3 object, the sensitive data discovery result includes data from the corresponding sensitive data finding. It provides additional information too, such as the location of as many as 1,000 occurrences of each type of sensitive data that Macie found in the object. For example:   
+ The column and row number for a cell or field in a Microsoft Excel workbook, CSV file, or TSV file
+ The path to a field or array in a JSON or JSON Lines file
+ The line number for a line in a non-binary text file other than a CSV, JSON, JSON Lines, or TSV file—for example, an HTML, TXT, or XML file
+ The page number for a page in an Adobe Portable Document Format (PDF) file
+ The record index and the path to a field in a record in an Apache Avro object container or Apache Parquet file
If the affected S3 object is an archive file, such as a .tar or .zip file, the sensitive data discovery result also provides detailed location data for occurrences of sensitive data in individual files that Macie extracted from the archive. Macie doesn’t include this information in sensitive data findings for archive files. To report location data, sensitive data discovery results use a [standardized JSON schema](findings-locate-sd-schema.md).  
A sensitive data discovery result doesn't include the sensitive data that Macie found. Instead, it provides you with an analysis record that can be helpful for data privacy and protection audits or investigations.  
Macie stores your sensitive data discovery results for 90 days. You can’t access them directly on the Amazon Macie console or with the Amazon Macie API. Instead, you configure Macie to encrypt and store them in an S3 bucket. The bucket can serve as a definitive, long-term repository for all of your sensitive data discovery results. You can then optionally access and query the results in that repository. To learn how to configure these settings, see [Storing and retaining sensitive data discovery results](discovery-results-repository-s3.md).  
After you configure the settings, Macie writes your sensitive data discovery results to JSON Lines (.jsonl) files, and it encrypts and adds those files to the S3 bucket as GNU Zip (.gz) files. To help you navigate to the results, the Amazon Macie console provides links to them.
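
Because each results file is a GZIP-compressed JSON Lines file, you can read one with the Python standard library alone. This is a minimal sketch that assumes you already retrieved the object's bytes from the S3 bucket (for example, with Boto3's `get_object`), using credentials that are allowed to decrypt with the AWS KMS key that Macie uses to encrypt the results. The function name is hypothetical, and the sketch doesn't assume any particular fields in each record.

```python
import gzip
import json
from typing import Iterator


def iter_discovery_results(gz_bytes: bytes) -> Iterator[dict]:
    """Decompress a .jsonl.gz results file and yield one record per non-empty line."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    for line in text.splitlines():
        if line.strip():  # Skip blank lines.
            yield json.loads(line)
```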

Sensitive data findings and sensitive data discovery results both adhere to standardized schemas. This can help you optionally query, monitor, and process them by using other applications, services, and systems.

**Tips**  
For a detailed, instructional example of how you might query and use sensitive data discovery results to analyze and report potential data security risks, see the following blog post on the *AWS Security Blog*: [How to query and visualize Macie sensitive data discovery results with Amazon Athena and Amazon Quick](https://aws.amazon.com/blogs/security/how-to-query-and-visualize-macie-sensitive-data-discovery-results-with-athena-and-quicksight/).  
For samples of Amazon Athena queries that you can use to analyze sensitive data discovery results, visit the [Amazon Macie Results Analytics repository](https://github.com/aws-samples/amazon-macie-results-analytics) on GitHub. This repository also provides instructions for configuring Athena to retrieve and decrypt your results, and scripts for creating tables for the results.

## Reviewing statistics and results for a sensitive data discovery job
<a name="discovery-jobs-manage-results-review"></a>

To review processing statistics and the results of a sensitive data discovery job, you can use the Amazon Macie console or the Amazon Macie API. Follow these steps to review the statistics and results by using the console.

To access a job's processing statistics programmatically, use the [DescribeClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API. For programmatic access to the findings that a job produced, use the [ListFindings](https://docs.aws.amazon.com/macie/latest/APIReference/findings.html) operation and specify the job's unique identifier in a filter condition for the `classificationDetails.jobId` field. To learn how, see [Creating and applying filters to Macie findings](findings-filter-procedure.md). You can then use the [GetFindings](https://docs.aws.amazon.com/macie/latest/APIReference/findings-describe.html) operation to retrieve the details of the findings.
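
The ListFindings and GetFindings steps might look like the following in Python with Boto3. This sketch assumes that `macie_client` is a `boto3.client("macie2")` client. It follows `nextToken` pagination and requests finding details in batches of 50 identifiers, which is assumed to be the per-call limit for `GetFindings`. The function name `get_job_findings` is hypothetical.

```python
from typing import Any, List


def get_job_findings(macie_client: Any, job_id: str, batch_size: int = 50) -> List[dict]:
    """Return the full details of every finding that a job produced."""
    # Filter condition on the job's unique identifier.
    criteria = {"criterion": {"classificationDetails.jobId": {"eq": [job_id]}}}
    kwargs: dict = {"findingCriteria": criteria}
    finding_ids: List[str] = []
    while True:
        page = macie_client.list_findings(**kwargs)
        finding_ids.extend(page.get("findingIds", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
    findings: List[dict] = []
    # Retrieve details in batches (batch size assumed to be the API limit).
    for i in range(0, len(finding_ids), batch_size):
        resp = macie_client.get_findings(findingIds=finding_ids[i:i + batch_size])
        findings.extend(resp.get("findings", []))
    return findings
```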

**To review statistics and results for a job**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**.

1. On the **Jobs** page, choose the name of the job whose statistics and results you want to review. The details panel displays statistics, settings, and other information about the job.

1. In the details panel, do any of the following:
   + To review processing statistics for the job, refer to the **Statistics** section of the panel. This section displays statistics such as the number of times that the job has run, and the approximate number of objects that the job has yet to process during its current run.
   + To review log events for the job, choose **Show results** at the top of the panel, and then choose **Show CloudWatch logs**. Macie opens the Amazon CloudWatch console and displays a table of the log events that Macie published for the job.
   + To review all the sensitive data findings that the job produced, choose **Show results** at the top of the panel, and then choose **Show findings**. Macie opens the **Findings** page and displays all the findings from the job. To review the details of a particular finding, choose the finding, and then refer to the details panel.

     **Tip**  
     In the finding details panel, you can use the link in the **Detailed result location** field to navigate to the corresponding sensitive data discovery result in Amazon S3:  
     + If the finding applies to a large archive or compressed file, the link displays the folder that contains the discovery results for the file. An archive or compressed file is *large* if it generates more than 100 discovery results.
     + If the finding applies to a small archive or compressed file, the link displays the file that contains the discovery results for the file. An archive or compressed file is *small* if it generates 100 or fewer discovery results.
     + If the finding applies to another type of file, the link displays the file that contains the discovery results for the file.
   + To review all the sensitive data discovery results that the job produced, choose **Show results** at the top of the panel, and then choose **Show classifications**. Macie opens the Amazon S3 console and displays the folder that contains all the discovery results for the job. This option is available only after you configure Macie to [store your sensitive data discovery results](discovery-results-repository-s3.md) in an S3 bucket.
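
To read the same processing statistics programmatically, a minimal sketch could call `DescribeClassificationJob` and pick out the fields that correspond to the **Statistics** section. It assumes that `macie_client` is a `boto3.client("macie2")` client and that the response includes `jobStatus`, `lastRunTime`, and a `statistics` object with `numberOfRuns` and `approximateNumberOfObjectsToProcess`. The function name is hypothetical.

```python
from typing import Any


def job_progress(macie_client: Any, job_id: str) -> dict:
    """Summarize the status and processing statistics for one job."""
    job = macie_client.describe_classification_job(jobId=job_id)
    stats = job.get("statistics", {})
    return {
        "status": job.get("jobStatus"),
        "runs": stats.get("numberOfRuns", 0),
        "objects_remaining": stats.get("approximateNumberOfObjectsToProcess", 0),
        "last_run": job.get("lastRunTime"),  # None if the job hasn't run yet.
    }
```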

# Managing sensitive data discovery jobs
<a name="discovery-jobs-manage"></a>

To help you manage your sensitive data discovery jobs, Amazon Macie maintains a complete inventory of your jobs in each AWS Region. With this inventory, you can manage your jobs as a single collection, and access configuration settings, processing statistics, and the status of individual jobs.

For example, you can identify all the jobs that you configured to run on a recurring basis for periodic analysis, assessment, and monitoring. You can also review a breakdown of the configuration settings for a job. This includes settings that define the scope of the analysis. It also includes settings that specify the types of sensitive data that you want Macie to detect and report when the job runs. If you use the Amazon Macie console to manage your jobs, each job's details also provide direct access to [sensitive data findings and other results](discovery-jobs-manage-results.md) that the job produced.

In addition to these tasks, you can create custom variations of individual jobs. You can copy an existing job, adjust the settings for the copy, and then save the copy as a new job. This can be helpful for cases where you want to analyze different sets of data in the same way, or the same set of data in different ways. It can also be helpful if you want to adjust the configuration settings for an existing job—cancel the existing job, copy it, and then adjust and save the copy as a new job.

**Topics**
+ [Reviewing your job inventory](discovery-jobs-manage-view.md)
+ [Reviewing configuration settings for a job](discovery-jobs-manage-settings.md)
+ [Checking the status of a job](discovery-jobs-status-check.md)
+ [Changing the status of a job](discovery-jobs-status-change.md)
+ [Copying a job](discovery-jobs-manage-copy.md)

# Reviewing your inventory of sensitive data discovery jobs
<a name="discovery-jobs-manage-view"></a>

On the Amazon Macie console, you can review a complete inventory of your sensitive data discovery jobs in the current AWS Region. The inventory provides both summary information for all of your jobs and details about individual jobs. Summary information includes the current status of each job; whether a job runs on a scheduled, periodic basis; and whether a job is configured to analyze objects in specific Amazon Simple Storage Service (Amazon S3) buckets or S3 buckets that match runtime criteria. For individual jobs, you can also access details such as a breakdown of the job's configuration settings. If a job has already run, the details also provide direct access to sensitive data findings and other types of results that the job produced.

**To review your job inventory**

Follow these steps to review your job inventory by using the Amazon Macie console. To access your inventory programmatically, use the [ListClassificationJobs](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-list.html) operation of the Amazon Macie API.

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**. The **Jobs** page opens and displays the number of jobs in your inventory and a table of those jobs.

1. At the top of the page, optionally choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) to retrieve the current status of each job.

1. In the **Jobs** table, review summary information for your jobs:
   + **Job name** – The name of the job.
   + **Resources** – Whether the job is configured to analyze objects in specific S3 buckets or buckets that match runtime criteria. If you explicitly selected buckets for the job to analyze, this field indicates the number of buckets that you selected. If you configured the job to use runtime criteria, the value for this field is **Criteria based**.
   + **Job type** – Whether the job is configured to run once (**One time**) or on a scheduled, periodic basis (**Scheduled**). 
   + **Status** – The current status of the job. To learn more about this value, see [Checking the status of a job](discovery-jobs-status-check.md).
   + **Created at** – When the job was created.

1. To analyze your inventory or find a specific job more quickly, do any of the following:
   + To sort the table by a specific field, choose the column heading for the field. To change the sort order, choose the column heading again.
   + To show only those jobs that have a specific value for a field, place your cursor in the filter box. In the menu that appears, choose the field to use for the filter, and enter the value for the filter. Then choose **Apply**.
   + To hide jobs that have a specific value for a field, place your cursor in the filter box. In the menu that appears, choose the field to use for the filter, and enter the value for the filter. Then choose **Apply**. In the filter box, choose the equals icon (![\[The equals icon, which is a solid gray circle.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-operator-equals.png)) for the filter. This changes the filter's operator from *equals* to *not equals* (![\[The not equals icon, which is an empty gray circle that has a backslash in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-operator-not-equals.png)).
   + To remove a filter, choose the remove filter icon (![\[The remove filter condition icon, which is a circle that has an X in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-filter-remove.png)) for the filter to remove.

1. To review additional settings and details for a particular job, choose the job's name. Then refer to the details panel. For information about these details, see [Reviewing configuration settings for a job](discovery-jobs-manage-settings.md).
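
The console's filters have a rough counterpart in the `filterCriteria` parameter of `ListClassificationJobs`: conditions under `includes` behave like *equals* filters, and conditions under `excludes` behave like *not equals* filters. The following Boto3 sketch assumes a `boto3.client("macie2")` client and API status values such as `RUNNING` or `CANCELLED`. The function name and its keyword arguments are hypothetical.

```python
from typing import Any, List, Optional


def list_jobs(macie_client: Any, status: Optional[str] = None,
              exclude_status: Optional[str] = None) -> List[dict]:
    """List jobs, optionally showing only or hiding jobs with a given jobStatus."""
    criteria: dict = {}
    if status:
        # Comparable to an "equals" filter on the console.
        criteria["includes"] = [{"comparator": "EQ", "key": "jobStatus", "values": [status]}]
    if exclude_status:
        # Comparable to a "not equals" filter on the console.
        criteria["excludes"] = [{"comparator": "EQ", "key": "jobStatus", "values": [exclude_status]}]
    kwargs: dict = {"filterCriteria": criteria} if criteria else {}
    jobs: List[dict] = []
    while True:
        page = macie_client.list_classification_jobs(**kwargs)
        jobs.extend(page.get("items", []))
        token = page.get("nextToken")
        if not token:
            return jobs
        kwargs["nextToken"] = token
```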

# Reviewing the settings for a sensitive data discovery job
<a name="discovery-jobs-manage-settings"></a>

On the Amazon Macie console, you can use the details panel on the **Jobs** page to review configuration settings and other information about individual sensitive data discovery jobs. For example, you can review a list of the Amazon Simple Storage Service (Amazon S3) buckets that a job is configured to analyze. You can also determine which managed and custom data identifiers a job is configured to use when analyzing objects in those buckets.

Note that you can’t change any configuration settings for an existing job. This helps ensure that you have an immutable history of sensitive data findings and discovery results for data privacy and protection audits or investigations that you perform.

If you want to change an existing job, you can [cancel the job](discovery-jobs-status-change.md). Then [copy the job](discovery-jobs-manage-copy.md), configure the copy to use the settings that you want, and save the copy as a new job. If you do this, you should also take steps to ensure that the new job doesn't analyze existing data in the same way again. To do this, note the date and time when you cancel the existing job. Then configure the scope of the new job to include only those objects that are created or changed after you cancel the original job. For example, you can use [object criteria](discovery-jobs-scope.md#discovery-jobs-scope-criteria) to define an exclude condition that specifies when you cancelled the original job.
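
As an illustration of such an exclude condition, the following sketch builds a `scoping` block for a job's `s3JobDefinition`. It assumes the `CreateClassificationJob` request shape, in which an `excludes` block holds `simpleScopeTerm` conditions, and that `OBJECT_LAST_MODIFIED_DATE` values are UTC timestamps in extended ISO 8601 format; check both against the Amazon Macie API Reference. The function name is hypothetical. Because it uses the `LT` (less than) comparator, objects last modified exactly at the cancellation time are still analyzed.

```python
from datetime import datetime, timezone


def exclude_before(cancelled_at: datetime) -> dict:
    """Build a scoping block that excludes objects last modified before cancelled_at."""
    # Expects a timezone-aware datetime; naive values are treated as local time.
    stamp = cancelled_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "excludes": {
            "and": [
                {
                    "simpleScopeTerm": {
                        "comparator": "LT",
                        "key": "OBJECT_LAST_MODIFIED_DATE",
                        "values": [stamp],
                    }
                }
            ]
        }
    }
```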

**To review the configuration settings for a job**

Follow these steps to review a job's configuration settings by using the Amazon Macie console. To review the settings programmatically, use the [DescribeClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API.

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**. The **Jobs** page opens and displays the number of jobs in your inventory and a table of those jobs.

1. In the **Jobs** table, choose the name of the job whose settings you want to review. To find the job more quickly, you can filter the table by using the filter options above the table. You can also sort the table in ascending or descending order by certain fields.

When you choose a job in the table, the details panel displays the job's configuration settings and other information about the job. Depending on the job's settings, the panel contains the following sections.

**General information**  
This section provides general information about the job. For example, it shows the Amazon Resource Name (ARN) of the job, when the job most recently started to run, and the current status of the job. If you paused the job, this section also indicates when you paused the job, and when the job or latest job run expired or will expire if you don't resume it.

**Statistics**  
This section shows processing statistics for the job. For example, it specifies the number of times that the job has run, and the approximate number of S3 objects that the job has yet to process during its current run.

**Scope**  
This section indicates how often the job runs. It also shows settings that refine the job's scope—for example, the [sampling depth](discovery-jobs-scope.md#discovery-jobs-scope-sampling), and any [object criteria](discovery-jobs-scope.md#discovery-jobs-scope-criteria) that include or exclude S3 objects from the analysis.

**S3 buckets**  
This section appears in the panel if the job is configured to analyze buckets that you explicitly selected when you created the job. It indicates the number of AWS accounts that the job is configured to analyze data for. It also indicates the number of buckets that the job is configured to analyze and the names of those buckets (grouped by account).  
To show the complete list of accounts and buckets in JSON format, choose the number in the **Total buckets** field.

**S3 bucket criteria**  
This section appears in the panel if the job uses runtime criteria to determine which buckets to analyze. It lists the criteria that the job is configured to use. To show the criteria in JSON format, choose **Details**. Then choose the **Criteria** tab in the window that appears.  
To review a list of buckets that currently match the criteria, choose **Details**. Then choose the **Matching buckets** tab in the window that appears. Optionally choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) to retrieve the latest data. The tab lists up to 25 buckets that currently match the criteria.  
If the job has already run, you can also determine whether any buckets matched the criteria when the job ran and, if so, the names of those buckets. To do this, review log events for the job: choose **Show results** at the top of the panel, and then choose **Show CloudWatch logs**. Macie opens the Amazon CloudWatch console and displays a table of log events for the job. The events include a `BUCKET_MATCHED_THE_CRITERIA` event for each bucket that matched the criteria and was included in the job's analysis. For more information, see [Monitoring jobs with CloudWatch Logs](discovery-jobs-monitor-cw-logs.md).
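
If you retrieve the log events programmatically, you could extract the names of matching buckets from the event messages. This sketch assumes that each message is a JSON object with an `eventType` field and that the bucket name is in `affectedResource.value`; confirm these fields against the events in your own log stream. Messages that aren't valid JSON are skipped. The function name is hypothetical.

```python
import json
from typing import Iterable, List


def matched_buckets(messages: Iterable[str]) -> List[str]:
    """Return bucket names from BUCKET_MATCHED_THE_CRITERIA events, without duplicates."""
    names: List[str] = []
    for raw in messages:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Not a JSON event message.
        if event.get("eventType") != "BUCKET_MATCHED_THE_CRITERIA":
            continue
        # Assumed location of the bucket name in the event.
        name = event.get("affectedResource", {}).get("value")
        if name and name not in names:
            names.append(name)
    return names
```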

**Custom data identifiers**  
This section appears in the panel if the job is configured to use one or more [custom data identifiers](custom-data-identifiers.md). It specifies the names of those custom data identifiers.

**Allow lists**  
This section appears in the panel if the job is configured to use one or more [allow lists](allow-lists.md). It specifies the names of those lists. To review the settings and status of a list, choose the link icon (![\[The link icon, which is a blue box that has an arrow in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-view-resource-blue.png)) next to the list's name.

**Managed data identifiers**  
This section indicates which [managed data identifiers](managed-data-identifiers.md) the job is configured to use. This is determined by the managed data identifier selection type for the job:  
+ **Recommended** – Use the managed data identifiers that are in the [recommended set](discovery-jobs-mdis-recommended.md) when the job runs.
+ **Include selected** – Use only the managed data identifiers listed in the **Selections** section.
+ **Include all** – Use all the managed data identifiers that are available when the job runs.
+ **Exclude selected** – Use all the managed data identifiers that are available when the job runs, except the ones listed in the **Selections** section.
+ **Exclude all** – Don't use any managed data identifiers. Use only the specified custom data identifiers.
To review these settings in JSON format, choose **Details**.
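
The selection types above correspond to values of the `managedDataIdentifierSelector` field that `DescribeClassificationJob` returns. The following sketch maps each value to its console label and, for the two types that use a selection, lists the identifiers from `managedDataIdentifierIds`. The value-to-label mapping is an assumption based on the descriptions above, and the function name is hypothetical.

```python
# Assumed mapping of API selector values to console labels.
SELECTOR_LABELS = {
    "RECOMMENDED": "Recommended",
    "INCLUDE": "Include selected",
    "ALL": "Include all",
    "EXCLUDE": "Exclude selected",
    "NONE": "Exclude all",
}


def describe_mdi_selection(job: dict) -> str:
    """Summarize which managed data identifiers a job is configured to use."""
    selector = job["managedDataIdentifierSelector"]
    label = SELECTOR_LABELS[selector]
    if selector in ("INCLUDE", "EXCLUDE"):
        # Only these types depend on an explicit list of identifiers.
        ids = job.get("managedDataIdentifierIds", [])
        return f"{label}: {', '.join(ids)}"
    return label
```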

**Tags**  
This section appears in the panel if tags are assigned to the job. It lists those tags. A *tag* is a label that you define and assign to certain types of AWS resources. Each tag consists of a required tag key and an optional tag value. To learn more, see [Tagging Macie resources](tagging-resources.md).

To review and save the job's settings in JSON format, choose the unique identifier for the job (**Job ID**) at the top of the panel. Then choose **Download**.

# Checking the status of a sensitive data discovery job
<a name="discovery-jobs-status-check"></a>

When you create a sensitive data discovery job, its initial status is **Active (Running)** or **Active (Idle)**, depending on the job's type and schedule. The job then passes through additional states, which you can monitor as the job progresses.

**Tip**  
In addition to monitoring the overall status of a job, you can monitor specific events that occur as a job progresses. You can do this by using logging data that Amazon Macie automatically publishes to Amazon CloudWatch Logs. The data in these logs provides a record of changes to a job's status and details about any account- or bucket-level errors that occur while a job runs. For more information, see [Monitoring jobs with CloudWatch Logs](discovery-jobs-monitor-cw-logs.md).

**To check the status of a job**

Follow these steps to check the status of a job by using the Amazon Macie console. To check a job's status programmatically, use the [DescribeClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API.

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**. The **Jobs** page opens and displays the number of jobs in your inventory and a table of those jobs.

1. At the top of the page, choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) to retrieve the current status of each job.

1. In the **Jobs** table, locate the job whose status you want to check. To find the job more quickly, you can filter the table by using the filter options above the table. You can also sort the table in ascending or descending order by certain fields.

1. Refer to the **Status** field in the table. This field indicates the job's current status.

A job's status can be one of the following.

**Active (Idle)**  
For a periodic job, the previous run is complete and the next scheduled run is pending. This value doesn't apply to one-time jobs.

**Active (Running)**  
For a one-time job, the job is currently in progress. For a periodic job, a scheduled run is in progress.

**Cancelled**  
For any type of job, the job was stopped permanently (cancelled).  
A job has this status if you explicitly cancelled it or, if it's a one-time job, you paused the job and didn't resume it within 30 days. A job can also have this status if you previously [suspended Macie](suspend-macie.md) in the current AWS Region.

**Complete**  
For a one-time job, the job ran successfully and is now complete. This value doesn't apply to periodic jobs. Instead, the status of a periodic job changes to **Active (Idle)** when each run completes successfully.

**Paused (By Macie)**  
For any type of job, the job was stopped temporarily (paused) by Macie.  
A job has this status if completion of the job or a job run would exceed the monthly [sensitive data discovery quota](macie-quotas.md) for your account. When this happens, Macie automatically pauses the job. Macie automatically resumes the job when the next calendar month starts and the monthly quota is reset for your account, or when you increase the quota for your account, whichever occurs first.  
If you’re the Macie administrator for an organization and you configured the job to analyze data for member accounts, the job can also have this status if completion of the job or a job run would exceed the monthly sensitive data discovery quota for a member account.  
If a job is running and the analysis of eligible objects reaches this quota for a member account, the job stops analyzing objects that are owned by the account. When the job finishes analyzing objects for all other accounts that haven’t met the quota, Macie automatically pauses the job. If it’s a one-time job, Macie automatically resumes the job when the next calendar month starts or the quota is increased for all the affected accounts, whichever occurs first. If it’s a periodic job, Macie automatically resumes the job when the next run is scheduled to start or the next calendar month starts, whichever occurs first. If a scheduled run starts before the next calendar month starts or the quota is increased for an affected account, the job doesn’t analyze objects that are owned by the account.

**Paused (By user)**  
For any type of job, the job was stopped temporarily (paused) by you.  
If you pause a one-time job and you don't resume it within 30 days, the job expires and Macie cancels it. If you pause a periodic job while it's actively running and you don't resume it within 30 days, the job's run expires and Macie cancels the run. To check the expiration date for a paused job or job run, choose the job's name in the table, and then refer to the **Expires** field in the **Status details** section of the details panel.
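
When you check a job's status programmatically, the `jobStatus` value that `DescribeClassificationJob` or `ListClassificationJobs` returns uses different names from the console. The following mapping is a sketch based on the status descriptions above; verify it against the Amazon Macie API Reference. The helper names are hypothetical. `is_terminal` treats **Cancelled** and **Complete** as final, because a cancelled job can't be resumed and a complete one-time job doesn't run again.

```python
# Assumed mapping of API jobStatus values to console Status labels.
STATUS_LABELS = {
    "IDLE": "Active (Idle)",
    "RUNNING": "Active (Running)",
    "CANCELLED": "Cancelled",
    "COMPLETE": "Complete",
    "PAUSED": "Paused (By Macie)",
    "USER_PAUSED": "Paused (By user)",
}


def console_status(job_status: str) -> str:
    """Return the console label for an API jobStatus value (or the value itself)."""
    return STATUS_LABELS.get(job_status, job_status)


def is_terminal(job_status: str) -> bool:
    """Return True for statuses after which the job won't run again."""
    return job_status in {"CANCELLED", "COMPLETE"}
```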

If a job is cancelled or paused, you can refer to the job's details to determine whether the job started to run or, for a periodic job, ran at least once before it was cancelled or paused. To do this, choose the job's name in the **Jobs** table, and then refer to the details panel. In the panel, the **Number of runs** field indicates the number of times that the job has run. The **Last run time** field indicates the most recent date and time when the job started to run.

Depending on the job’s current status, you can optionally pause, resume, or cancel the job. For more information, see [Changing the status of a job](discovery-jobs-status-change.md).

# Changing the status of a sensitive data discovery job
<a name="discovery-jobs-status-change"></a>

After you create a sensitive data discovery job, you can pause it temporarily or cancel it permanently. When you pause a job that's actively running, Amazon Macie immediately begins to pause all processing tasks for the job. When you cancel a job that's actively running, Macie immediately begins to stop all processing tasks for the job. You can’t resume or restart a job after it’s cancelled.

If you pause a one-time job, you can resume it within 30 days. When you resume the job, Macie immediately resumes processing from the point where you paused the job. Macie doesn't restart the job from the beginning. If you don't resume a one-time job within 30 days of pausing it, the job expires and Macie cancels it.

If you pause a periodic job, you can resume it at any time. If you resume a periodic job and the job was idle when you paused it, Macie resumes the job according to the schedule and other configuration settings that you chose when you created the job. If you resume a periodic job and the job was actively running when you paused it, how Macie resumes the job depends on when you resume the job:
+ If you resume the job within 30 days of pausing it, Macie immediately resumes the latest scheduled run from the point where you paused the job. Macie doesn't restart the run from the beginning.
+ If you don't resume the job within 30 days of pausing it, the latest scheduled run expires and Macie cancels all remaining processing tasks for the run. When you subsequently resume the job, Macie resumes the job according to the schedule and other configuration settings that you chose when you created the job.

To help you determine when a paused job or job run will expire, Macie adds an expiration date to the job’s details while the job is paused. In addition, we notify you approximately seven days before the job or job run will expire. We notify you again when the job or job run expires and is cancelled. To notify you, we send email to the address that's associated with your AWS account. We also create AWS Health events and Amazon CloudWatch Events for your account. To check the expiration date by using the console, choose the job’s name in the table on the **Jobs** page. Then refer to the **Expires** field in the **Status details** section of the details panel. To check the date programmatically, use the [DescribeClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API. 
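
For example, a script could compute how much time remains before a paused job or job run expires, and flag jobs that are within the roughly seven-day notification window. This sketch assumes that a `DescribeClassificationJob` response includes a `userPausedDetails` object with a timezone-aware `jobExpiresAt` timestamp, as Boto3 returns it. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def days_until_expiry(job: dict, now: Optional[datetime] = None) -> Optional[float]:
    """Return days until a user-paused job or job run expires, or None if not applicable."""
    details = job.get("userPausedDetails")
    if not details or "jobExpiresAt" not in details:
        return None  # The job isn't paused by a user.
    now = now or datetime.now(timezone.utc)
    return (details["jobExpiresAt"] - now) / timedelta(days=1)
```

A caller might treat a non-`None` result of 7 or less as a job that needs attention soon.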

**To pause, resume, or cancel a job**

To pause, resume, or cancel a job by using the Amazon Macie console, follow these steps. To do this programmatically, use the [UpdateClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API.

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**. The **Jobs** page opens and displays the number of jobs in your inventory and a table of those jobs.

1. At the top of the page, choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) to retrieve the current status of each job.

1. In the **Jobs** table, select the checkbox for the job that you want to pause, resume, or cancel. To find the job more quickly, you can filter the table by using the filter options above the table. You can also sort the table in ascending or descending order by certain fields.

1. On the **Actions** menu, do one of the following:
   + To pause the job temporarily, choose **Pause**. This option is available only if the job's current status is **Active (Idle)**, **Active (Running)**, or **Paused (By Macie)**.
   + To resume the job, choose **Resume**. This option is available only if the job's current status is **Paused (By user)**.
   + To cancel the job permanently, choose **Cancel**. If you choose this option, you can't subsequently resume or restart the job.
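
The UpdateClassificationJob operation follows the same rules as the console. The following Python sketch builds the request parameters for each action. It assumes the console statuses map to these `jobStatus` values in the API: **Active (Idle)** is `IDLE`, **Active (Running)** is `RUNNING`, **Paused (By Macie)** is `PAUSED`, and **Paused (By user)** is `USER_PAUSED`. The helper name is hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# action -> (jobStatus to send, current statuses that permit the action)
ACTIONS: Dict[str, Tuple[str, Optional[FrozenSet[str]]]] = {
    "pause": ("USER_PAUSED", frozenset({"IDLE", "RUNNING", "PAUSED"})),
    "resume": ("RUNNING", frozenset({"USER_PAUSED"})),
    # The console steps don't restrict Cancel by status, so this sketch
    # leaves that check to the service.
    "cancel": ("CANCELLED", None),
}


def build_update_request(job_id: str, current_status: str, action: str) -> dict:
    """Return UpdateClassificationJob parameters for a pause, resume, or cancel."""
    new_status, allowed = ACTIONS[action]
    if allowed is not None and current_status not in allowed:
        raise ValueError(f"can't {action} a job whose status is {current_status}")
    return {"jobId": job_id, "jobStatus": new_status}
```

With Boto3, you could pass the result to `boto3.client("macie2").update_classification_job(**request)`. As in the console, cancelling is permanent.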

# Copying a sensitive data discovery job
<a name="discovery-jobs-manage-copy"></a>

To quickly create a sensitive data discovery job that's similar to an existing job, you can create a copy of the existing job. You can then edit the copy's settings and save the copy as a new job. This can be helpful for cases where you want to analyze different sets of data in the same way, or the same set of data in different ways. It can also be helpful if you want to adjust the configuration settings for an existing job: cancel the existing job, copy it, and then adjust and save the copy as a new job.

**To copy a job**

Follow these steps to copy a job by using the Amazon Macie console. To copy a job programmatically, use the [DescribeClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API to retrieve the configuration settings for the job that you want to copy. Then use the [CreateClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs.html) operation to create a copy of the job.

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**. The **Jobs** page opens and displays the number of jobs in your inventory and a table of those jobs.

1. In the **Jobs** table, select the checkbox for the job that you want to copy. To find the job more quickly, you can filter the table by using the filter options above the table. You can also sort the table in ascending or descending order by certain fields.

1. On the **Actions** menu, choose **Copy to new**.

1. Complete the steps on the console to review and adjust the settings for the copy of the job. For the **Refine the scope** step, consider choosing options that prevent the job from analyzing existing data in the same way again: 
   + For a one-time job, use [object criteria](discovery-jobs-scope.md#discovery-jobs-scope-criteria) to include only those objects that were created or changed after a certain time. For example, if you're creating a copy of a job that you cancelled, add a **Last modified** condition that specifies the date and time when you cancelled the existing job.
   + For a periodic job, clear the **Include existing objects** checkbox. If you do this, the first run of the job analyzes only those objects that are created or changed after you create the job and before the job's first run. You can also use [object criteria](discovery-jobs-scope.md#discovery-jobs-scope-criteria) to exclude objects that were last modified before a certain date and time.

   For additional details about this and other steps, see [Creating a sensitive data discovery job](discovery-jobs-create.md).

1. When you finish, choose **Submit** to save the copy as a new job.

If you configured the job to run once, on a daily basis, or on the current day of the week or month, Macie starts running the job immediately after you save it. Otherwise, Macie prepares to run the job on the specified day of the week or month. To monitor the job, you can [check the status of the job](discovery-jobs-status-check.md).
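
The programmatic approach works in two steps: retrieve the settings with DescribeClassificationJob, then submit them with CreateClassificationJob. The following Python sketch shows the idea. It assumes the response was parsed into a dictionary and copies only the fields that the two operations share. Read-only fields such as `jobId`, `jobArn`, `jobStatus`, `createdAt`, and `statistics` are dropped. The sketch also assumes that, for a periodic (`SCHEDULED`) job, `initialRun` corresponds to the **Include existing objects** option. The function name is hypothetical. Check the field list against the API reference before relying on it.

```python
import copy
import uuid
from typing import Optional

# Settings that a DescribeClassificationJob response shares with a
# CreateClassificationJob request. Read-only fields are intentionally omitted.
COPYABLE_FIELDS = (
    "allowListIds",
    "customDataIdentifierIds",
    "description",
    "initialRun",
    "jobType",
    "managedDataIdentifierIds",
    "managedDataIdentifierSelector",
    "s3JobDefinition",
    "samplingPercentage",
    "scheduleFrequency",
    "tags",
)


def build_copy_request(job: dict, new_name: str,
                       include_existing_objects: Optional[bool] = None) -> dict:
    """Return CreateClassificationJob parameters for a copy of `job`."""
    # Deep copy so later edits to the request don't change the original settings.
    request = {k: copy.deepcopy(job[k]) for k in COPYABLE_FIELDS if k in job}
    request["name"] = new_name
    # A fresh idempotency token, so the copy isn't treated as a retry.
    request["clientToken"] = str(uuid.uuid4())
    if include_existing_objects is not None and request.get("jobType") == "SCHEDULED":
        # Assumed to correspond to the "Include existing objects" checkbox.
        request["initialRun"] = include_existing_objects
    return request
```

Before you submit the request, you would typically also adjust `s3JobDefinition`, for example to add object criteria, as described in the **Refine the scope** step.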

# Monitoring sensitive data discovery jobs with CloudWatch Logs
<a name="discovery-jobs-monitor-cw-logs"></a>

In addition to [monitoring the overall status](discovery-jobs-status-check.md) of a sensitive data discovery job, you can monitor and analyze specific events that occur as a job progresses. You can do this by using near real-time logging data that Amazon Macie automatically publishes to Amazon CloudWatch Logs. The data in these logs provides a record of changes to a job's progress or status. For example, you can use the data to determine the exact date and time when a job started to run, was paused, or finished running.

The log data also provides details about any account- or bucket-level errors that occur while a job runs. For example, Macie logs an event if the permissions settings for an Amazon Simple Storage Service (Amazon S3) bucket prevent a job from analyzing objects in the bucket. The event indicates when the error occurred, and it identifies the affected bucket and the AWS account that owns the bucket. The data for these types of events can help you identify, investigate, and address errors that prevent Macie from analyzing the data that you want.

With Amazon CloudWatch Logs, you can monitor, store, and access log files from multiple systems, applications, and AWS services, including Macie. You can also query and analyze log data, and configure CloudWatch Logs to notify you when certain events occur or thresholds are met. CloudWatch Logs also provides features for archiving log data and exporting the data to Amazon S3. To learn more about CloudWatch Logs, see the [Amazon CloudWatch Logs User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html).

**Topics**
+ [How logging works for jobs](discovery-jobs-monitor-cw-logs-configure.md)
+ [Reviewing logs for jobs](discovery-jobs-monitor-cw-logs-review.md)
+ [Understanding log events for jobs](discovery-jobs-monitor-cw-logs-ref.md)

# How logging works for sensitive data discovery jobs
<a name="discovery-jobs-monitor-cw-logs-configure"></a>

When you start running sensitive data discovery jobs, Amazon Macie automatically creates and configures the appropriate resources in Amazon CloudWatch Logs to log events for all of your jobs. Macie then publishes event data to those resources automatically when your jobs run. The permissions policy for the Macie [service-linked role](service-linked-roles.md) for your account allows Macie to perform these tasks on your behalf. You don't need to take any steps to create or configure resources in CloudWatch Logs to log event data for your jobs.

In CloudWatch Logs, logs are organized into *log groups*. Each log group contains *log streams*. Each log stream contains *log events*. The general purpose of each of these resources is as follows:
+ A *log group* is a collection of log streams that share the same retention, monitoring, and access control settings—for example, the collection of logs for all of your sensitive data discovery jobs.
+ A *log stream* is a sequence of log events that share the same source—for example, an individual sensitive data discovery job.
+ A *log event* is a record of an activity that was recorded by an application or resource—for example, an individual event that Macie recorded and published for a particular sensitive data discovery job.

Macie publishes events for all of your sensitive data discovery jobs to one log group. Each job has a unique log stream in that log group. The log group has the following prefix and name:

`/aws/macie/classificationjobs`

If this log group already exists, Macie uses it to store log events for your jobs. This can be helpful if your organization uses automated configuration, such as [AWS CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html), to create log groups with predefined retention periods, encryption settings, tags, metric filters, and so on, for job events.

If this log group doesn't exist, Macie creates it with the default settings that CloudWatch Logs uses for new log groups. The settings include a log retention period of **Never Expire**, which means that CloudWatch Logs stores the logs indefinitely. You can change the retention period for the log group. To learn how, see [Working with log groups and log streams](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Working-with-log-groups-and-streams.html) in the *Amazon CloudWatch Logs User Guide*.

Within this log group, Macie creates a unique log stream for each job that you run, the first time that the job runs. The name of the log stream is the unique identifier for the job, such as `85a55dc0fa6ed0be5939d0408example`, in the following format:

`/aws/macie/classificationjobs/85a55dc0fa6ed0be5939d0408example`

Each log stream contains all the log events that Macie recorded and published for the corresponding job. For periodic jobs, this includes events for all of the job's runs. If you delete the log stream for a periodic job, Macie creates the stream again the next time that the job runs. If you delete the log stream for a one-time job, you can't restore it.

Logging is always enabled for all of your jobs. You can't disable it or otherwise prevent Macie from publishing job events to CloudWatch Logs. If you don't want to store the logs, you can reduce the retention period for the log group to as little as one day. At the end of the retention period, CloudWatch Logs automatically deletes expired event data from the log group.
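
To shorten the retention period programmatically, you can use the PutRetentionPolicy operation of the CloudWatch Logs API. The following Python sketch builds the request for the Macie log group and rejects values that CloudWatch Logs doesn't accept. The allowed values reflect the CloudWatch Logs documentation at the time of writing and might change. The helper name is hypothetical.

```python
LOG_GROUP_NAME = "/aws/macie/classificationjobs"

# Retention periods, in days, that CloudWatch Logs accepts for a log group.
VALID_RETENTION_DAYS = frozenset({
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
})


def build_retention_request(days: int) -> dict:
    """Return PutRetentionPolicy parameters for the Macie job log group."""
    if days not in VALID_RETENTION_DAYS:
        raise ValueError(f"{days} isn't a supported retention period")
    return {"logGroupName": LOG_GROUP_NAME, "retentionInDays": days}
```

With Boto3, `boto3.client("logs").put_retention_policy(**build_retention_request(1))` would keep job events for one day.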

# Reviewing logs for sensitive data discovery jobs
<a name="discovery-jobs-monitor-cw-logs-review"></a>

After you start running sensitive data discovery jobs in Amazon Macie, you can review logs for your jobs by using Amazon CloudWatch Logs. CloudWatch Logs provides features that are designed to help you review, analyze, and monitor log data. You can use these features to work with log streams and events for jobs as you would work with any other type of log data in CloudWatch Logs.

For example, you can search and filter aggregate data to identify specific types of events that occurred for all of your jobs during a specific time range. Or you can perform a targeted review of all the events that occurred for a particular job. CloudWatch Logs also provides options for monitoring log data, defining metric filters, and creating custom alarms.

**Tip**  
To quickly navigate to the log data for a particular job, you can use the Amazon Macie console. To do this, choose the job's name on the **Jobs** page. At the top of the details panel, choose **Show results**, and then choose **Show CloudWatch logs**. Macie opens the Amazon CloudWatch console and displays a table of log events for the job.

**To review logs for sensitive data discovery jobs**

Follow these steps to navigate to and review log data by using the Amazon CloudWatch console. To review the data programmatically, use the [Amazon CloudWatch Logs API](https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/Welcome.html).

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. By using the AWS Region selector in the upper-right corner of the page, choose the Region in which you ran the jobs whose logs you want to review.

1. In the navigation pane, choose **Logs**, and then choose **Log groups**.

1. On the **Log groups** page, choose the **/aws/macie/classificationjobs** log group. CloudWatch displays a table of log streams for the jobs that you've run. There is one unique stream for each job. The name of each stream correlates to the unique identifier for a job.

1. On the **Log streams** tab, do one of the following:
   + To review the log events for a particular job, choose the log stream for the job. To find the stream more easily, enter the job's unique identifier in the filter box above the table. After you choose the log stream, CloudWatch displays a table of log events for the job.
   + To review log events for all of your jobs, choose **Search all log streams**. CloudWatch displays a table of log events for all of your jobs.

1. (Optional) In the filter box above the table, enter terms, phrases, or values that specify characteristics of specific events to review. For more information, see [Search log data using filter patterns](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SearchDataFilterPattern.html) in the *Amazon CloudWatch Logs User Guide*.

1. To review the details of a specific log event, choose expand (![The expand row icon, which is a right-facing solid arrow.](http://docs.aws.amazon.com/macie/latest/user/images/icon-caret-right-filled.png)) in the row for the event. CloudWatch displays the event's details in JSON format. To learn more about these details, see [Understanding log events for jobs](discovery-jobs-monitor-cw-logs-ref.md).
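
If you review the data programmatically, the FilterLogEvents operation of the CloudWatch Logs API offers the same choices as the console. You can search one job's stream or all streams, add an optional filter pattern, and set an optional time range. The following Python sketch builds the request parameters. It assumes that each log stream is named with the job's unique identifier, as described in [How logging works for jobs](discovery-jobs-monitor-cw-logs-configure.md). It matches events with a JSON filter pattern on the `eventType` field. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

LOG_GROUP_NAME = "/aws/macie/classificationjobs"


def epoch_millis(moment: datetime) -> int:
    """Convert a datetime to milliseconds since the epoch, treating naive values as UTC."""
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


def build_filter_request(job_id: Optional[str] = None,
                         event_type: Optional[str] = None,
                         start: Optional[datetime] = None,
                         end: Optional[datetime] = None) -> dict:
    """Return FilterLogEvents parameters for Macie job events."""
    request: dict = {"logGroupName": LOG_GROUP_NAME}
    if job_id:
        # Omit the job ID to search all log streams (all jobs).
        request["logStreamNames"] = [job_id]
    if event_type:
        request["filterPattern"] = f'{{ $.eventType = "{event_type}" }}'
    if start:
        request["startTime"] = epoch_millis(start)
    if end:
        request["endTime"] = epoch_millis(end)
    return request
```

With Boto3, you could pass the result to `boto3.client("logs").filter_log_events(**request)`. Use the `nextToken` value in each response to page through the results.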

As you familiarize yourself with the data in the log events, you can perform additional tasks to streamline analysis and monitoring of the data. For example, you can [create metric filters](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/MonitoringLogData.html) that turn log data into numerical CloudWatch metrics. You can also [create custom alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ConsoleAlarms.html) that make it easier to identify and respond to specific log events. For more information, see the [Amazon CloudWatch Logs User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html).
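
For example, a metric filter can count bucket-level error events across all of your jobs. The following Python sketch builds a JSON filter pattern that matches any of several `eventType` values and wraps it in PutMetricFilter request parameters. The helper names are hypothetical. The filter name, metric name, and namespace in the usage note are placeholders.

```python
from typing import Iterable

LOG_GROUP_NAME = "/aws/macie/classificationjobs"

BUCKET_ERROR_TYPES = (
    "BUCKET_ACCESS_DENIED",
    "BUCKET_DETAILS_UNAVAILABLE",
    "BUCKET_DOES_NOT_EXIST",
    "BUCKET_IN_DIFFERENT_REGION",
    "BUCKET_OWNER_CHANGED",
)


def event_type_pattern(event_types: Iterable[str]) -> str:
    """Build a JSON filter pattern that matches any of the given event types."""
    clauses = " || ".join(f'($.eventType = "{t}")' for t in event_types)
    return "{ " + clauses + " }"


def build_metric_filter_request(filter_name: str, metric_name: str,
                                namespace: str, event_types: Iterable[str]) -> dict:
    """Return PutMetricFilter parameters that count matching log events."""
    return {
        "logGroupName": LOG_GROUP_NAME,
        "filterName": filter_name,
        "filterPattern": event_type_pattern(event_types),
        "metricTransformations": [{
            "metricName": metric_name,
            "metricNamespace": namespace,
            "metricValue": "1",  # add 1 to the metric for each matching event
        }],
    }
```

For instance, `build_metric_filter_request("MacieBucketErrors", "BucketErrorCount", "Custom/Macie", BUCKET_ERROR_TYPES)` could be passed to `boto3.client("logs").put_metric_filter(**request)`. You could then create an alarm on the resulting metric.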

# Understanding log events for sensitive data discovery jobs
<a name="discovery-jobs-monitor-cw-logs-ref"></a>

To help you monitor your sensitive data discovery jobs, Amazon Macie automatically publishes logging data for jobs to Amazon CloudWatch Logs. The data in these logs provides a record of changes to a job's progress or status. For example, you can use the data to determine the exact date and time when a job started to run or finished running. The data also provides details about certain types of errors that can occur while a job runs. This data can help you identify, investigate, and address errors that prevent Macie from analyzing the data that you want.

When you start running jobs, Macie automatically creates and configures the appropriate resources in CloudWatch Logs to log events for all of your jobs. Macie then publishes event data to those resources automatically when your jobs run. For more information, see [How logging works for jobs](discovery-jobs-monitor-cw-logs-configure.md).

By using CloudWatch Logs, you can then query and analyze log data for your jobs. For example, you can search and filter aggregate data to identify specific types of events that occurred for all of your jobs during a specific time range. Or you can perform a targeted review of all the events that occurred for a particular job. CloudWatch Logs also provides options for monitoring log data, defining metric filters, and creating custom alarms. For example, you can configure CloudWatch Logs to notify you if a certain type of event occurs when your jobs run. For more information, see the [Amazon CloudWatch Logs User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html).

**Contents**
+ [Log event schema for jobs](#discovery-jobs-monitor-cw-logs-schema)
+ [Types of log events for jobs](#discovery-jobs-monitor-cw-logs-event-index)
  + [Job status events](#discovery-jobs-monitor-cw-logs-event-index-status)
  + [Account-level error events](#discovery-jobs-monitor-cw-logs-event-index-account-errors)
  + [Bucket-level error events](#discovery-jobs-monitor-cw-logs-event-index-bucket-errors)

## Log event schema for sensitive data discovery jobs
<a name="discovery-jobs-monitor-cw-logs-schema"></a>

Each log event for a sensitive data discovery job is a JSON object that contains a standard set of fields and conforms to the Amazon CloudWatch Logs event schema. Some types of events have additional fields that provide information that's particularly useful for that type of event. For example, events for account-level errors include the account ID for the affected AWS account. Events for bucket-level errors include the name of the affected Amazon Simple Storage Service (Amazon S3) bucket.

The following example shows the log event schema for sensitive data discovery jobs. In this example, the event reports that Amazon Macie wasn't able to analyze any objects in an S3 bucket because Amazon S3 denied access to the bucket.

```
{
    "adminAccountId": "123456789012",
    "jobId": "85a55dc0fa6ed0be5939d0408example",
    "eventType": "BUCKET_ACCESS_DENIED",
    "occurredAt": "2024-04-14T17:11:30.574809Z",
    "description": "Macie doesn’t have permission to access the affected S3 bucket.",
    "jobName": "My_Macie_Job",
    "operation": "ListObjectsV2",
    "runDate": "2024-04-14T17:08:30.345809Z",
    "affectedAccount": "111122223333",
    "affectedResource": {
        "type": "S3_BUCKET_NAME",
        "value": "amzn-s3-demo-bucket"
    }
}
```

In the preceding example, Macie attempted to list the bucket's objects by using the [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html) operation of the Amazon S3 API. When Macie sent the request to Amazon S3, Amazon S3 denied access to the bucket. 

The following fields are common to all log events for sensitive data discovery jobs:
+ `adminAccountId` – The unique identifier for the AWS account that created the job.
+ `jobId` – The unique identifier for the job.
+ `eventType` – The type of event that occurred.
+ `occurredAt` – The date and time, in Coordinated Universal Time (UTC) and extended ISO 8601 format, when the event occurred.
+ `description` – A brief description of the event.
+ `jobName` – The name of the job.

Depending on the type and nature of an event, a log event can also contain the following fields:
+ `affectedAccount` – The unique identifier for the AWS account that owns the affected resource.
+ `affectedResource` – A JSON object that provides details about the affected resource. In the object, the `type` field specifies the type of metadata that identifies the resource, such as `S3_BUCKET_NAME`. The `value` field specifies the value of that metadata, such as the name of the bucket.
+ `operation` – The operation that Macie attempted to perform when the error occurred.
+ `runDate` – The date and time, in Coordinated Universal Time (UTC) and extended ISO 8601 format, when the applicable job or job run started.
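
If you process these events in your own tooling, it can help to validate the common fields and normalize the optional ones. The following Python sketch parses one event message into a dataclass. The class and function names are hypothetical. Optional fields default to `None` when an event doesn't include them.

```python
import json
from dataclasses import dataclass
from typing import Optional

COMMON_FIELDS = ("adminAccountId", "jobId", "eventType",
                 "occurredAt", "description", "jobName")


@dataclass
class JobLogEvent:
    # Fields common to all job log events.
    admin_account_id: str
    job_id: str
    event_type: str
    occurred_at: str
    description: str
    job_name: str
    # Fields that appear only for some types of events.
    affected_account: Optional[str] = None
    resource_type: Optional[str] = None
    resource_value: Optional[str] = None
    operation: Optional[str] = None
    run_date: Optional[str] = None


def parse_job_log_event(message: str) -> JobLogEvent:
    """Parse the JSON message of one log event for a discovery job."""
    data = json.loads(message)
    missing = [name for name in COMMON_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing common fields: {missing}")
    resource = data.get("affectedResource") or {}
    return JobLogEvent(
        admin_account_id=data["adminAccountId"],
        job_id=data["jobId"],
        event_type=data["eventType"],
        occurred_at=data["occurredAt"],
        description=data["description"],
        job_name=data["jobName"],
        affected_account=data.get("affectedAccount"),
        resource_type=resource.get("type"),
        resource_value=resource.get("value"),
        operation=data.get("operation"),
        run_date=data.get("runDate"),
    )
```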

## Types of log events for sensitive data discovery jobs
<a name="discovery-jobs-monitor-cw-logs-event-index"></a>

Amazon Macie publishes log events for three categories of events that can occur for a sensitive data discovery job:
+ Job status events, which record changes to the status or progress of a job or a job run.
+ Account-level error events, which record errors that prevented Macie from analyzing Amazon S3 data for a specific AWS account.
+ Bucket-level error events, which record errors that prevented Macie from analyzing data in a specific S3 bucket.

The topics in this section list and describe the types of events that Macie publishes for each category.
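
To route events by category in your own tooling, you can map each event type to one of the three categories. The following Python sketch uses the event types listed in the tables in the following topics. The function name is hypothetical. It uses explicit sets rather than prefix checks because one job status event, `BUCKET_MATCHED_THE_CRITERIA`, starts with `BUCKET_` but isn't a bucket-level error.

```python
JOB_STATUS_EVENTS = frozenset({
    "JOB_CREATED", "ONE_TIME_JOB_STARTED", "SCHEDULED_RUN_STARTED",
    "BUCKET_MATCHED_THE_CRITERIA", "NO_BUCKETS_MATCHED_THE_CRITERIA",
    "SCHEDULED_RUN_COMPLETED", "JOB_PAUSED_BY_USER", "JOB_RESUMED_BY_USER",
    "JOB_PAUSED_BY_MACIE_SERVICE_QUOTA_MET",
    "JOB_RESUMED_BY_MACIE_SERVICE_QUOTA_LIFTED",
    "JOB_CANCELLED", "JOB_COMPLETED",
})
ACCOUNT_ERROR_EVENTS = frozenset({
    "ACCOUNT_ACCESS_DENIED", "ACCOUNT_DISABLED", "ACCOUNT_DISASSOCIATED",
    "ACCOUNT_ISOLATED", "ACCOUNT_REGION_DISABLED", "ACCOUNT_SUSPENDED",
    "ACCOUNT_TERMINATED",
})
BUCKET_ERROR_EVENTS = frozenset({
    "BUCKET_ACCESS_DENIED", "BUCKET_DETAILS_UNAVAILABLE", "BUCKET_DOES_NOT_EXIST",
    "BUCKET_IN_DIFFERENT_REGION", "BUCKET_OWNER_CHANGED",
})


def event_category(event_type: str) -> str:
    """Return the category of a job log event type, or "unknown"."""
    # Explicit membership checks, not prefixes: BUCKET_MATCHED_THE_CRITERIA
    # starts with "BUCKET_" but is a job status event.
    if event_type in JOB_STATUS_EVENTS:
        return "job status"
    if event_type in ACCOUNT_ERROR_EVENTS:
        return "account-level error"
    if event_type in BUCKET_ERROR_EVENTS:
        return "bucket-level error"
    return "unknown"
```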

### Job status events
<a name="discovery-jobs-monitor-cw-logs-event-index-status"></a>

A job status event records a change to the status or progress of a job or a job run. For periodic jobs, Macie logs and publishes these events for both the overall job and individual job runs.

The following example uses sample data to show the structure and nature of the fields in a job status event. In this example, a `SCHEDULED_RUN_COMPLETED` event indicates that a scheduled run of a periodic job finished running. The run started on April 14, 2024, at 17:09:30 UTC, as indicated by the `runDate` field. The run finished on April 14, 2024, at 17:16:30 UTC, as indicated by the `occurredAt` field.

```
{
    "adminAccountId": "123456789012",
    "jobId": "ffad0e71455f38a4c7c220f3cexample",
    "eventType": "SCHEDULED_RUN_COMPLETED",
    "occurredAt": "2024-04-14T17:16:30.574809Z",
    "description": "The scheduled job run finished running.",
    "jobName": "My_Daily_Macie_Job",
    "runDate": "2024-04-14T17:09:30.574809Z"
}
```
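
The `runDate` field records when the run started, and `occurredAt` records when this event occurred. You can therefore compute how long a run took from its completion event. The following minimal Python sketch assumes the event has already been parsed into a dictionary. The function names are hypothetical.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    """Parse an extended ISO 8601 UTC timestamp such as 2024-04-14T17:16:30.574809Z."""
    # Replace "Z" so fromisoformat() also works on Python versions before 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def run_duration_seconds(event: dict) -> float:
    """Seconds between when a run started (runDate) and when the event occurred."""
    if "runDate" not in event:
        raise ValueError("event has no runDate field")
    started = parse_utc(event["runDate"])
    finished = parse_utc(event["occurredAt"])
    return (finished - started).total_seconds()
```

For the preceding example event, this returns 420.0 seconds (seven minutes).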

The following table lists and describes the types of job status events that Macie logs and publishes to CloudWatch Logs. The **Event type** column indicates the name of each event as it appears in the `eventType` field of an event. The **Description** column provides a brief description of the event as it appears in the `description` field of an event. The **Additional information** column provides information about the type of job that the event applies to. The table is sorted first by the general chronological order in which events might occur, and then in ascending alphabetical order by event type.


| Event type | Description | Additional information | 
| --- | --- | --- | 
| JOB_CREATED | The job was created. | Applies to one-time and periodic jobs. |
| ONE_TIME_JOB_STARTED | The job started running. | Applies only to one-time jobs. |
| SCHEDULED_RUN_STARTED | The scheduled job run started running. | Applies only to periodic jobs. To log the start of a one-time job, Macie publishes a ONE_TIME_JOB_STARTED event, not this type of event. |
| BUCKET_MATCHED_THE_CRITERIA | The affected bucket matched the bucket criteria specified for the job. | Applies to one-time and periodic jobs that use runtime bucket criteria to determine which S3 buckets to analyze. The `affectedResource` object specifies the name of the bucket that matched the criteria and was included in the job's analysis. |
| NO_BUCKETS_MATCHED_THE_CRITERIA | The job started running but no buckets currently match the bucket criteria specified for the job. The job didn't analyze any data. | Applies to one-time and periodic jobs that use runtime bucket criteria to determine which S3 buckets to analyze. |
| SCHEDULED_RUN_COMPLETED | The scheduled job run finished running. | Applies only to periodic jobs. To log completion of a one-time job, Macie publishes a JOB_COMPLETED event, not this type of event. |
| JOB_PAUSED_BY_USER | The job was paused by a user. | Applies to one-time and periodic jobs that you stopped temporarily (paused). |
| JOB_RESUMED_BY_USER | The job was resumed by a user. | Applies to one-time and periodic jobs that you stopped temporarily (paused) and later resumed. |
| JOB_PAUSED_BY_MACIE_SERVICE_QUOTA_MET | The job was paused by Macie. Completion of the job would exceed a monthly quota for the affected account. | Applies to one-time and periodic jobs that Macie stopped temporarily (paused). Macie automatically pauses a job when additional processing by the job or a job run would exceed the monthly [sensitive data discovery quota](macie-quotas.md) for one or more accounts that the job analyzes data for. To avoid this issue, consider increasing the quota for the affected accounts. |
| JOB_RESUMED_BY_MACIE_SERVICE_QUOTA_LIFTED | The job was resumed by Macie. The monthly service quota was lifted for the affected account. | Applies to one-time and periodic jobs that Macie stopped temporarily (paused) and later resumed. If Macie automatically paused a one-time job, Macie automatically resumes the job when the subsequent month starts or the monthly sensitive data discovery quota is increased for all the affected accounts, whichever occurs first. If Macie automatically paused a periodic job, Macie automatically resumes the job when the next run is scheduled to start or the subsequent month starts, whichever occurs first. |
| JOB_CANCELLED | The job was cancelled. | Applies to one-time and periodic jobs that you stopped permanently (cancelled) or, for one-time jobs, paused and didn't resume within 30 days. If you suspend or disable Macie, this type of event also applies to jobs that were active or paused when you suspended or disabled Macie. Macie automatically cancels your jobs in an AWS Region if you suspend or disable Macie in the Region. |
| JOB_COMPLETED | The job finished running. | Applies only to one-time jobs. To log completion of a job run for a periodic job, Macie publishes a SCHEDULED_RUN_COMPLETED event, not this type of event. |

### Account-level error events
<a name="discovery-jobs-monitor-cw-logs-event-index-account-errors"></a>

An account-level error event records an error that prevented Macie from analyzing objects in S3 buckets that are owned by a specific AWS account. The `affectedAccount` field in each event specifies the account ID for that account.

The following example uses sample data to show the structure and nature of the fields in an account-level error event. In this example, an `ACCOUNT_ACCESS_DENIED` event indicates that Macie wasn't able to analyze objects in any S3 buckets that are owned by account `444455556666`.

```
{
    "adminAccountId": "123456789012",
    "jobId": "85a55dc0fa6ed0be5939d0408example",
    "eventType": "ACCOUNT_ACCESS_DENIED",
    "occurredAt": "2024-04-14T17:08:30.585709Z",
    "description": "Macie doesn’t have permission to access S3 bucket data for the affected account.",
    "jobName": "My_Macie_Job",
    "operation": "ListBuckets",
    "runDate": "2024-04-14T17:05:27.574809Z",
    "affectedAccount": "444455556666"
}
```

The following table lists and describes the types of account-level error events that Macie logs and publishes to CloudWatch Logs. The **Event type** column indicates the name of each event as it appears in the `eventType` field of an event. The **Description** column provides a brief description of the event as it appears in the `description` field of an event. The **Additional information** column provides any applicable tips for investigating or addressing the error that occurred. The table is sorted in ascending alphabetical order by event type.


| Event type | Description | Additional information | 
| --- | --- | --- | 
| ACCOUNT_ACCESS_DENIED | Macie doesn’t have permission to access S3 bucket data for the affected account. | This typically occurs because the buckets that are owned by the account have restrictive bucket policies. For information about how to address this issue, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md). The value for the `operation` field in the event can help you determine which permissions settings prevented Macie from accessing S3 data for the account. This field indicates the Amazon S3 operation that Macie attempted to perform when the error occurred. |
| ACCOUNT_DISABLED | The job skipped resources that are owned by the affected account. Macie was disabled for the account. | To address this issue, re-enable Macie for the account in the same AWS Region. |
| ACCOUNT_DISASSOCIATED | The job skipped resources that are owned by the affected account. The account isn't associated with your Macie administrator account as a member account anymore. | This occurs if you, as a Macie administrator for an organization, configure a job to analyze data for a member account and the account is later removed from your organization. To address this issue, re-associate the affected account with your Macie administrator account as a member account. For more information, see [Managing multiple accounts](macie-accounts.md). |
| ACCOUNT_ISOLATED | The job skipped resources that are owned by the affected account. The AWS account was isolated. | – |
| ACCOUNT_REGION_DISABLED | The job skipped resources that are owned by the affected account. The AWS account isn't active in the current AWS Region. | – |
| ACCOUNT_SUSPENDED | The job was cancelled or skipped resources that are owned by the affected account. Macie was suspended for the account. | If the specified account is your own account, Macie automatically cancelled the job when you suspended Macie in the same Region. To address the issue, re-enable Macie in the Region. If the specified account is a member account, re-enable Macie for that account in the same Region. |
| ACCOUNT_TERMINATED | The job skipped resources that are owned by the affected account. The AWS account was terminated. | – |
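
To see which accounts a job skipped and why, you can group account-level error events by the `affectedAccount` field. The following minimal Python sketch assumes the events were already parsed into dictionaries, for example from the results of a FilterLogEvents request. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

ACCOUNT_ERROR_EVENTS = frozenset({
    "ACCOUNT_ACCESS_DENIED", "ACCOUNT_DISABLED", "ACCOUNT_DISASSOCIATED",
    "ACCOUNT_ISOLATED", "ACCOUNT_REGION_DISABLED", "ACCOUNT_SUSPENDED",
    "ACCOUNT_TERMINATED",
})


def account_errors_by_account(events: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each affected account ID to the sorted account-level error types logged for it."""
    found: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        if event.get("eventType") in ACCOUNT_ERROR_EVENTS:
            found[event["affectedAccount"]].add(event["eventType"])
    return {account: sorted(types) for account, types in found.items()}
```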

### Bucket-level error events
<a name="discovery-jobs-monitor-cw-logs-event-index-bucket-errors"></a>

A bucket-level error event records an error that prevented Macie from analyzing objects in a specific S3 bucket. The `affectedAccount` field in each event specifies the account ID for the AWS account that owns the bucket. The `affectedResource` object in each event specifies the name of the bucket.

The following example uses sample data to show the structure and nature of the fields in a bucket-level error event. In this example, a `BUCKET_ACCESS_DENIED` event indicates that Macie wasn't able to analyze any objects in the S3 bucket named `amzn-s3-demo-bucket`. When Macie attempted to list the bucket's objects by using the [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html) operation of the Amazon S3 API, Amazon S3 denied access to the bucket.

```
{
    "adminAccountId": "123456789012",
    "jobId": "85a55dc0fa6ed0be5939d0408example",
    "eventType": "BUCKET_ACCESS_DENIED",
    "occurredAt": "2024-04-14T17:11:30.574809Z",
    "description": "Macie doesn’t have permission to access the affected S3 bucket.",
    "jobName": "My_Macie_Job",
    "operation": "ListObjectsV2",
    "runDate": "2024-04-14T17:09:30.685209Z",
    "affectedAccount": "111122223333",
    "affectedResource": {
        "type": "S3_BUCKET_NAME",
        "value": "amzn-s3-demo-bucket"
    }
}
```

The following table lists and describes the types of bucket-level error events that Macie logs and publishes to CloudWatch Logs. The **Event type** column indicates the name of each event as it appears in the `eventType` field of an event. The **Description** column provides a brief description of the event as it appears in the `description` field of an event. The **Additional information** column provides any applicable tips for investigating or addressing the error that occurred. The table is sorted in ascending alphabetical order by event type.


| Event type | Description | Additional information | 
| --- | --- | --- | 
| BUCKET_ACCESS_DENIED | Macie doesn’t have permission to access the affected S3 bucket. | This typically occurs because a bucket has a restrictive bucket policy. For information about how to address this issue, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md). The value for the `operation` field in the event can help you determine which permissions settings prevented Macie from accessing the bucket. This field indicates the Amazon S3 operation that Macie attempted to perform when the error occurred. |
| BUCKET_DETAILS_UNAVAILABLE | A temporary issue prevented Macie from retrieving details about the bucket and the bucket’s objects. | This occurs if a transient issue prevented Macie from retrieving the bucket and object metadata that it needs to analyze a bucket's objects. For example, an Amazon S3 exception occurred when Macie tried to verify that it's allowed to access the bucket. To address the issue for a one-time job, consider creating and running a new, one-time job to analyze objects in the bucket. For a periodic job, Macie tries to retrieve the metadata again during the next job run. |
| BUCKET_DOES_NOT_EXIST | The affected S3 bucket doesn’t exist anymore. | This typically occurs because a bucket was deleted. |
| BUCKET_IN_DIFFERENT_REGION | The affected S3 bucket was moved to a different AWS Region. | – |
| BUCKET_OWNER_CHANGED | The owner of the affected S3 bucket changed. Macie doesn’t have permission to access the bucket anymore. | This typically occurs if ownership of a bucket was transferred to an AWS account that isn't part of your organization. The `affectedAccount` field in the event indicates the account ID for the account that previously owned the bucket. |
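
When several runs report the same access problem, a de-duplicated list of affected buckets, together with the Amazon S3 operation that failed, can make bucket policies easier to review. The following minimal Python sketch assumes the events were already parsed into dictionaries. The type and function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class DeniedBucket(NamedTuple):
    account_id: str
    bucket: str
    operation: str


def denied_buckets(events: Iterable[dict]) -> List[DeniedBucket]:
    """List unique buckets that reported BUCKET_ACCESS_DENIED, in first-seen order."""
    seen = set()
    result = []
    for event in events:
        if event.get("eventType") != "BUCKET_ACCESS_DENIED":
            continue
        resource = event.get("affectedResource") or {}
        if resource.get("type") != "S3_BUCKET_NAME":
            continue
        item = DeniedBucket(event["affectedAccount"], resource["value"],
                            event.get("operation", "unknown"))
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```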

# Forecasting and monitoring costs for sensitive data discovery jobs
<a name="discovery-jobs-costs"></a>

Amazon Macie pricing is based partly on the amount of data that you analyze by running sensitive data discovery jobs. To forecast and monitor your estimated costs for running sensitive data discovery jobs, you can review cost estimates that Macie provides when you create a job and after you start running jobs. 

To review and monitor your actual costs, you can use AWS Billing and Cost Management. AWS Billing and Cost Management provides features that are designed to help you track and analyze your costs for AWS services, and manage budgets for your account or organization. It also provides features that can help you forecast usage costs based on historical data. To learn more, see the [AWS Billing User Guide](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-what-is.html).

For information about Macie pricing, see [Amazon Macie pricing](https://aws.amazon.com/macie/pricing/).

**Topics**
+ [Forecasting the cost of a job](#discovery-jobs-costs-forecast)
+ [Monitoring estimated costs for jobs](#discovery-jobs-costs-track)

## Forecasting the cost of a sensitive data discovery job
<a name="discovery-jobs-costs-forecast"></a>

When you create a sensitive data discovery job, Amazon Macie can calculate and display estimated costs during two key steps in the job creation process: when you review the table of S3 buckets that you selected for the job (step 2) and when you review all the settings for the job (step 8). These estimates can help you determine whether to adjust the job's settings before you save the job. The availability and nature of the estimates depend on the settings that you choose for the job.

**Reviewing estimated costs for individual buckets (step 2)**  
If you explicitly select individual buckets for a job to analyze, you can review the estimated cost of analyzing objects in each of those buckets. Macie displays these estimates during step 2 of the job creation process, when you review your bucket selections. In the table for this step, the **Estimated cost** field indicates the total estimated cost (in US dollars) of running the job once to analyze objects in a bucket.  
Each estimate reflects the projected amount of uncompressed data that the job will analyze in a bucket, based on the size and types of objects that are currently stored in the bucket. The estimate also reflects Macie pricing for the current AWS Region.  
Only classifiable objects are included in the cost estimate for a bucket. A *classifiable object* is an S3 object that uses a [supported Amazon S3 storage class](discovery-supported-storage.md#discovery-supported-s3-classes) and has a file name extension for a [supported file or storage format](discovery-supported-storage.md#discovery-supported-formats). If any classifiable objects are compressed or archive files, the estimate assumes that the files use a 3:1 compression ratio and the job can analyze all extracted files.
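
The per-bucket estimate described above can be approximated with a short calculation. The following Python sketch is illustrative only: the `S3Object` type and helper names are hypothetical, the storage class and file name extension sets are small assumed subsets of what Macie supports, and `price_per_gib` is a hypothetical input rather than actual Macie pricing. It counts only classifiable objects and scales compressed or archive files by the assumed 3:1 ratio.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative subsets only. See the supported storage classes and
# supported file or storage formats pages for the authoritative lists.
SUPPORTED_STORAGE_CLASSES = {"STANDARD", "STANDARD_IA", "ONEZONE_IA"}
PLAIN_EXTENSIONS = {"csv", "json", "txt", "pdf", "docx", "xlsx", "parquet"}
COMPRESSED_EXTENSIONS = {"gz", "zip", "tar"}  # compressed or archive files
ASSUMED_COMPRESSION_RATIO = 3  # the default 3:1 assumption
GIB = 1024**3


@dataclass
class S3Object:
    key: str
    size_bytes: int
    storage_class: str


def extension(key: str) -> str:
    """Return the lowercase file name extension of an object key, or ''."""
    name = key.rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[-1].lower() if "." in name else ""


def is_classifiable(obj: S3Object) -> bool:
    """Supported storage class AND a supported file name extension."""
    ext = extension(obj.key)
    supported_ext = ext in PLAIN_EXTENSIONS or ext in COMPRESSED_EXTENSIONS
    return obj.storage_class in SUPPORTED_STORAGE_CLASSES and supported_ext


def projected_bytes(objects: list) -> int:
    """Projected uncompressed bytes that one run of the job would analyze."""
    total = 0
    for obj in objects:
        if not is_classifiable(obj):
            continue  # not classifiable, so excluded from the estimate
        is_compressed = extension(obj.key) in COMPRESSED_EXTENSIONS
        factor = ASSUMED_COMPRESSION_RATIO if is_compressed else 1
        total += obj.size_bytes * factor
    return total


def estimated_cost_usd(objects: list, price_per_gib: float) -> float:
    """price_per_gib is HYPOTHETICAL; actual Macie pricing varies by Region."""
    return projected_bytes(objects) / GIB * price_per_gib
```

Because Macie's own calculation also reflects pricing for the current Region, treat a result like this as a rough cross-check of the console's **Estimated cost** value, not a replacement for it.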

**Reviewing the total estimated cost of a job (step 8)**  
If you create a one-time job or you create and configure a periodic job to include existing S3 objects, Macie calculates and displays the job's total estimated cost during the final step of the job creation process. You can review this estimate while you review and verify all the settings that you selected for the job.  
This estimate indicates the total projected cost (in US dollars) of running the job once in the current Region. The estimate reflects the projected amount of uncompressed data that the job will analyze. It's based on the size and types of objects that are currently stored in buckets that you explicitly selected for the job or up to 500 buckets that currently match bucket criteria that you specified for the job, depending on the job's settings.  
Note that this estimate doesn't reflect any options that you selected to refine and reduce the scope of the job—for example, a lower sampling depth, or criteria that exclude certain S3 objects from the job. It also doesn't reflect your monthly [sensitive data discovery quota](macie-quotas.md), which might limit the scope and cost of the job's analysis, or any discounts that might apply to your account.  
In addition to the total estimated cost of the job, the estimate provides aggregated data that offers insight into the projected scope and cost of the job:  
+ **Size** values indicate the total storage size of the objects that the job can and can't analyze.
+ **Object count** values indicate the total number of objects that the job can and can't analyze.
In these values, a **Classifiable** object is an S3 object that uses a [supported Amazon S3 storage class](discovery-supported-storage.md#discovery-supported-s3-classes) and has a file name extension for a [supported file or storage format](discovery-supported-storage.md#discovery-supported-formats). Only classifiable objects are included in the cost estimate. A **Not classifiable** object is an object that doesn't use a supported storage class or doesn't have a file name extension for a supported file or storage format. These objects aren't included in the cost estimate.   
The estimate provides additional aggregated data for S3 objects that are compressed or archive files. The **Compressed** value indicates the total storage size of objects that use a supported Amazon S3 storage class and have a file name extension for a supported type of compressed or archive file. The **Uncompressed** value indicates the approximate size of these objects if they're decompressed, based on a specified compression ratio. This data is relevant due to the way that Macie analyzes compressed files and archive files.  
When Macie analyzes a compressed or archive file, it inspects both the full file and the contents of the file. To inspect the file’s contents, Macie decompresses the file, and then inspects each extracted file that uses a supported format. The actual amount of data that a job analyzes therefore depends on:  
+ Whether a file uses compression and, if so, the compression ratio that it uses.
+ The number, size, and format of the extracted files.
By default, Macie assumes the following when it calculates cost estimates for a job:   
+ All compressed and archive files use a 3:1 compression ratio.
+ All the extracted files use a supported file or storage format.
These assumptions can result in a larger size estimate for the scope of the data that the job will analyze, and, consequently, a higher cost estimate for the job.   
You can recalculate the job's total estimated cost based on a different compression ratio. To do this, choose the ratio from the **Choose an estimated compression ratio** list in the **Estimated cost** section. Macie then updates the estimate to match your selection.
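
The effect of choosing a different compression ratio can be illustrated with simple arithmetic. This Python sketch assumes that the projected size equals the size of classifiable objects that aren't compressed, plus the **Compressed** value multiplied by the ratio. The function name and the 10 GiB and 5 GiB inputs are hypothetical.

```python
GIB = 1024**3


def projected_size_bytes(plain_bytes: int, compressed_bytes: int, ratio: float = 3.0) -> float:
    """Approximate uncompressed size of the data that one job run would analyze.

    plain_bytes: classifiable objects that aren't compressed or archive files
    compressed_bytes: the estimate's "Compressed" value
    ratio: assumed compression ratio (3.0 means 3:1, the default assumption)
    """
    if ratio < 1:
        raise ValueError("expected a compression ratio of at least 1:1")
    # The "Uncompressed" value is the compressed size scaled by the ratio.
    return plain_bytes + compressed_bytes * ratio


# Hypothetical job scope: 10 GiB of plain classifiable data, 5 GiB compressed.
for r in (1.0, 2.0, 3.0):
    size = projected_size_bytes(10 * GIB, 5 * GIB, r)
    print(f"{r:.0f}:1 -> {size / GIB:.1f} GiB")
```

A lower ratio shrinks the projected size, which is why the default 3:1 assumption tends to produce a higher cost estimate.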

For more information about how Macie calculates estimated costs, see [Understanding estimated usage costs](account-mgmt-costs-calculations.md).

## Monitoring estimated costs for sensitive data discovery jobs
<a name="discovery-jobs-costs-track"></a>

If you’re already running sensitive data discovery jobs, the **Usage** page on the Amazon Macie console can help you monitor the estimated cost of those jobs. The page shows your estimated costs (in US dollars) for using Macie in the current AWS Region during the current calendar month. For information about how Macie calculates these estimates, see [Understanding estimated usage costs](account-mgmt-costs-calculations.md).

**To review your estimated costs for running jobs**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. By using the AWS Region selector in the upper-right corner of the page, choose the Region in which you want to review your estimated costs.

1. In the navigation pane, choose **Usage**.

1. On the **Usage** page, refer to the breakdown of estimated costs for your account. The **Sensitive data discovery jobs** item reports the total estimated cost of the jobs that you've run thus far during the current month in the current Region.

   If you're the Macie administrator for an organization, the **Estimated costs** section shows estimated costs for your organization overall for the current month in the current Region. To show the total estimated cost of the jobs that were run for a specific account, choose the account in the table. The **Estimated costs** section then shows a breakdown of estimated costs for the account, including the estimated cost of the jobs that were run. To show this data for a different account, choose the account in the table. To clear your account selection, choose **X** next to the account ID.

To review and monitor your actual costs, use [AWS Billing and Cost Management](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-what-is.html).
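
If you prefer to check the estimated costs programmatically, the Macie API provides a GetUsageTotals operation (for example, `get_usage_totals(timeRange="MONTH_TO_DATE")` on a boto3 `macie2` client). The following Python sketch assumes a response shaped like that operation's output and extracts the estimated cost of sensitive data discovery jobs. The function name and sample values are hypothetical, and the `SENSITIVE_DATA_DISCOVERY` usage type is assumed to correspond to the **Sensitive data discovery jobs** item on the **Usage** page.

```python
def discovery_job_cost(response: dict) -> tuple:
    """Return (estimated cost, currency) for sensitive data discovery jobs.

    ASSUMPTION: `response` has the shape of a Macie GetUsageTotals response:
    {"timeRange": ..., "usageTotals": [{"type": ..., "estimatedCost": "...", "currency": ...}]}
    """
    for total in response.get("usageTotals", []):
        if total.get("type") == "SENSITIVE_DATA_DISCOVERY":
            # estimatedCost is returned as a string, so convert it.
            return float(total["estimatedCost"]), total.get("currency", "USD")
    return 0.0, "USD"  # no job usage reported for the time range


# Made-up sample response for illustration.
sample = {
    "timeRange": "MONTH_TO_DATE",
    "usageTotals": [
        {"type": "DATA_INVENTORY_EVALUATION", "estimatedCost": "0.10", "currency": "USD"},
        {"type": "SENSITIVE_DATA_DISCOVERY", "estimatedCost": "12.50", "currency": "USD"},
    ],
}
cost, currency = discovery_job_cost(sample)
print(f"Sensitive data discovery jobs: {cost:.2f} {currency}")
```

Like the **Usage** page, these values are estimates; actual charges appear in AWS Billing and Cost Management.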

# Managed data identifiers recommended for sensitive data discovery jobs
<a name="discovery-jobs-mdis-recommended"></a>

To optimize the results of your sensitive data discovery jobs, you can configure individual jobs to automatically use the set of managed data identifiers that we recommend for jobs. A *managed data identifier* is a set of built-in criteria and techniques that are designed to detect a specific type of sensitive data—for example, AWS secret access keys, credit card numbers, or passport numbers for a particular country or region.

The recommended set of managed data identifiers is designed to detect common categories and types of sensitive data. Based on our research, it can detect general categories and types of sensitive data while also optimizing your job results by reducing noise. As we release new managed data identifiers, we add them to this set if they're likely to further optimize your job results. Over time, we might also add existing managed data identifiers to the set or remove them from it. If we add a managed data identifier to the recommended set or remove one from it, we update this page to indicate the nature and timing of the change. For automatic alerts about these changes, you can subscribe to the RSS feed on the [Macie document history](doc-history.md) page.

When you create a sensitive data discovery job, you specify which managed data identifiers you want the job to use to analyze objects in Amazon Simple Storage Service (Amazon S3) buckets. To configure a job to use the recommended set of managed data identifiers, choose the *Recommended* option when you create the job. The job will then automatically use all the managed data identifiers that are in the recommended set when the job starts to run. If you configure a job to run more than once, each run will automatically use all the managed data identifiers that are in the recommended set when the run starts.
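
When you create a job with the Macie API rather than the console, the recommended set is selected through the job's managed data identifier selector. The following Python sketch builds a request body for the CreateClassificationJob operation with `managedDataIdentifierSelector` set to `RECOMMENDED`. The helper function, job name, account ID, and bucket name are hypothetical, and the daily schedule for recurring jobs is an arbitrary illustrative choice; check the Macie API reference before relying on these field names.

```python
import json
import uuid


def recommended_job_request(name: str, account_id: str, buckets: list, one_time: bool = True) -> dict:
    """Build a CreateClassificationJob request body that uses the recommended set."""
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token for the request
        "name": name,
        "jobType": "ONE_TIME" if one_time else "SCHEDULED",
        # Each run uses the managed data identifiers that are in the
        # recommended set when that run starts.
        "managedDataIdentifierSelector": "RECOMMENDED",
        "s3JobDefinition": {
            "bucketDefinitions": [{"accountId": account_id, "buckets": buckets}]
        },
    }
    if not one_time:
        # Scheduled jobs also need a schedule; daily is an arbitrary example.
        request["scheduleFrequency"] = {"dailySchedule": {}}
    return request


if __name__ == "__main__":
    body = recommended_job_request("example-job", "111122223333", ["amzn-s3-demo-bucket"])
    print(json.dumps(body, indent=2))
```

The keys in this dictionary are intended to match the keyword arguments of the `create_classification_job` method of a boto3 `macie2` client.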

The following topics list the managed data identifiers that are currently in the recommended set, organized by sensitive data category and type. They specify the unique identifier (ID) for each managed data identifier in the set. This ID describes the type of sensitive data that a managed data identifier is designed to detect, for example: `PGP_PRIVATE_KEY` for PGP private keys and `USA_PASSPORT_NUMBER` for US passport numbers.

**Topics**
+ [Credentials](#discovery-jobs-mdis-recommended-credentials)
+ [Financial information](#discovery-jobs-mdis-recommended-financial)
+ [Personally identifiable information (PII)](#discovery-jobs-mdis-recommended-pii)
+ [Updates to the recommended set](#discovery-jobs-mdis-recommended-updates)

For details about specific managed data identifiers or a complete list of all the managed data identifiers that Macie currently provides, see [Using managed data identifiers](managed-data-identifiers.md).

## Credentials
<a name="discovery-jobs-mdis-recommended-credentials"></a>

To detect occurrences of credentials data in S3 objects, the recommended set uses the following managed data identifiers.


| Sensitive data type | Managed data identifier ID | 
| --- | --- | 
| AWS secret access key | AWS_CREDENTIALS | 
| HTTP Basic Authorization header | HTTP_BASIC_AUTH_HEADER | 
| OpenSSH private key | OPENSSH_PRIVATE_KEY | 
| PGP private key | PGP_PRIVATE_KEY | 
| Public Key Cryptography Standard (PKCS) private key | PKCS | 
| PuTTY private key | PUTTY_PRIVATE_KEY | 

## Financial information
<a name="discovery-jobs-mdis-recommended-financial"></a>

To detect occurrences of financial information in S3 objects, the recommended set uses the following managed data identifiers.


| Sensitive data type | Managed data identifier ID | 
| --- | --- | 
| Credit card magnetic stripe data | CREDIT_CARD_MAGNETIC_STRIPE | 
| Credit card number | CREDIT_CARD_NUMBER (for credit card numbers in proximity to a keyword) | 

## Personally identifiable information (PII)
<a name="discovery-jobs-mdis-recommended-pii"></a>

To detect occurrences of personally identifiable information (PII) in S3 objects, the recommended set uses the following managed data identifiers.


| Sensitive data type | Managed data identifier ID | 
| --- | --- | 
| Driver’s license identification number | CANADA_DRIVERS_LICENSE, DRIVERS_LICENSE (for the US), UK_DRIVERS_LICENSE | 
| Electoral roll number | UK_ELECTORAL_ROLL_NUMBER | 
| National identification number | FRANCE_NATIONAL_IDENTIFICATION_NUMBER, GERMANY_NATIONAL_IDENTIFICATION_NUMBER, ITALY_NATIONAL_IDENTIFICATION_NUMBER, SPAIN_DNI_NUMBER | 
| National Insurance Number (NINO) | UK_NATIONAL_INSURANCE_NUMBER | 
| Passport number | CANADA_PASSPORT_NUMBER, FRANCE_PASSPORT_NUMBER, GERMANY_PASSPORT_NUMBER, ITALY_PASSPORT_NUMBER, SPAIN_PASSPORT_NUMBER, UK_PASSPORT_NUMBER, USA_PASSPORT_NUMBER | 
| Social Insurance Number (SIN) | CANADA_SOCIAL_INSURANCE_NUMBER | 
| Social Security number (SSN) | SPAIN_SOCIAL_SECURITY_NUMBER, USA_SOCIAL_SECURITY_NUMBER | 
| Taxpayer identification or reference number | AUSTRALIA_TAX_FILE_NUMBER, BRAZIL_CPF_NUMBER, FRANCE_TAX_IDENTIFICATION_NUMBER, GERMANY_TAX_IDENTIFICATION_NUMBER, SPAIN_NIE_NUMBER, SPAIN_NIF_NUMBER, SPAIN_TAX_IDENTIFICATION_NUMBER, USA_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER | 
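
If you configure jobs with a custom selection of managed data identifiers, it can help to compare that selection with the recommended set. The following Python sketch encodes the IDs listed in the preceding tables as a dictionary keyed by category. It is a snapshot of this page only, because the set can change over time (see **Updates to the recommended set**), and the function names are hypothetical.

```python
# Snapshot of the recommended set as listed on this page. The set can change.
RECOMMENDED_SET = {
    "Credentials": {
        "AWS_CREDENTIALS", "HTTP_BASIC_AUTH_HEADER", "OPENSSH_PRIVATE_KEY",
        "PGP_PRIVATE_KEY", "PKCS", "PUTTY_PRIVATE_KEY",
    },
    "Financial information": {"CREDIT_CARD_MAGNETIC_STRIPE", "CREDIT_CARD_NUMBER"},
    "Personally identifiable information (PII)": {
        "CANADA_DRIVERS_LICENSE", "DRIVERS_LICENSE", "UK_DRIVERS_LICENSE",
        "UK_ELECTORAL_ROLL_NUMBER",
        "FRANCE_NATIONAL_IDENTIFICATION_NUMBER", "GERMANY_NATIONAL_IDENTIFICATION_NUMBER",
        "ITALY_NATIONAL_IDENTIFICATION_NUMBER", "SPAIN_DNI_NUMBER",
        "UK_NATIONAL_INSURANCE_NUMBER",
        "CANADA_PASSPORT_NUMBER", "FRANCE_PASSPORT_NUMBER", "GERMANY_PASSPORT_NUMBER",
        "ITALY_PASSPORT_NUMBER", "SPAIN_PASSPORT_NUMBER", "UK_PASSPORT_NUMBER",
        "USA_PASSPORT_NUMBER",
        "CANADA_SOCIAL_INSURANCE_NUMBER",
        "SPAIN_SOCIAL_SECURITY_NUMBER", "USA_SOCIAL_SECURITY_NUMBER",
        "AUSTRALIA_TAX_FILE_NUMBER", "BRAZIL_CPF_NUMBER", "FRANCE_TAX_IDENTIFICATION_NUMBER",
        "GERMANY_TAX_IDENTIFICATION_NUMBER", "SPAIN_NIE_NUMBER", "SPAIN_NIF_NUMBER",
        "SPAIN_TAX_IDENTIFICATION_NUMBER", "USA_INDIVIDUAL_TAX_IDENTIFICATION_NUMBER",
    },
}


def recommended_category(identifier_id: str):
    """Return the category of a managed data identifier ID, or None if it isn't in the set."""
    for category, ids in RECOMMENDED_SET.items():
        if identifier_id in ids:
            return category
    return None


def outside_recommended(identifier_ids) -> list:
    """IDs from a custom selection that the recommended set doesn't include."""
    all_ids = set().union(*RECOMMENDED_SET.values())
    return sorted(set(identifier_ids) - all_ids)
```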

## Updates to the recommended set
<a name="discovery-jobs-mdis-recommended-updates"></a>

The following table describes changes to the set of managed data identifiers that we recommend for sensitive data discovery jobs. For automatic alerts about these changes, subscribe to the RSS feed on the [Macie document history](doc-history.md) page.


| Change | Description | Date | 
| --- | --- | --- | 
|  General availability  |  Initial release of the recommended set.  |  June 27, 2023  | 

# Analyzing encrypted Amazon S3 objects
<a name="discovery-supported-encryption-types"></a>

When you enable Amazon Macie for your AWS account, Macie creates a [service-linked role](service-linked-roles.md) that grants Macie the permissions that it requires to call Amazon Simple Storage Service (Amazon S3) and other AWS services on your behalf. A service-linked role simplifies the process of setting up an AWS service because you don't have to manually add permissions for the service to complete actions on your behalf. To learn about this type of role, see [IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) in the *AWS Identity and Access Management User Guide*.

The permissions policy for the Macie service-linked role (`AWSServiceRoleForAmazonMacie`) allows Macie to perform actions that include retrieving information about your S3 buckets and objects, and retrieving and analyzing objects in your S3 buckets. If your account is the Macie administrator account for an organization, the policy also allows Macie to perform these actions on your behalf for member accounts in your organization.

If an S3 object is encrypted, the permissions policy for the Macie service-linked role typically grants Macie the permissions that it requires to decrypt the object. However, this depends on the type of encryption that was used. It can also depend on whether Macie is allowed to use the appropriate encryption key.

**Topics**
+ [Encryption options for S3 objects](#discovery-supported-encryption-types-matrix)
+ [Allowing Macie to use a customer managed AWS KMS key](#discovery-supported-encryption-cmk-configuration)

## Encryption options for Amazon S3 objects
<a name="discovery-supported-encryption-types-matrix"></a>

Amazon S3 supports multiple encryption options for S3 objects. For most of these options, Amazon Macie can decrypt an object by using the Macie service-linked role for your account. However, this depends on the type of encryption that was used to encrypt an object.

**Server-side encryption with Amazon S3 managed keys (SSE-S3)**  
If an object is encrypted using server-side encryption with an Amazon S3 managed key (SSE-S3), Macie can decrypt the object.  
To learn about this type of encryption, see [Using server-side encryption with Amazon S3 managed keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html) in the *Amazon Simple Storage Service User Guide*.

**Server-side encryption with AWS KMS keys (DSSE-KMS and SSE-KMS)**  
If an object is encrypted using dual-layer server-side encryption or server-side encryption with an AWS managed AWS KMS key (DSSE-KMS or SSE-KMS), Macie can decrypt the object.  
If an object is encrypted using dual-layer server-side encryption or server-side encryption with a customer managed AWS KMS key (DSSE-KMS or SSE-KMS), Macie can decrypt the object only if you [allow Macie to use the key](#discovery-supported-encryption-cmk-configuration). This is the case for objects that are encrypted with KMS keys managed entirely within AWS KMS and KMS keys in an external key store. If Macie isn't allowed to use the applicable KMS key, Macie can only store and report metadata for the object.  
To learn about these types of encryption, see [Using dual-layer server-side encryption with AWS KMS keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingDSSEncryption.html) and [Using server-side encryption with AWS KMS keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) in the *Amazon Simple Storage Service User Guide*.  
You can automatically generate a list of all the customer managed AWS KMS keys that Macie needs to access to analyze objects in S3 buckets for your account. To do this, run the AWS KMS Permission Analyzer script, which is available from the [Amazon Macie Scripts](https://github.com/aws-samples/amazon-macie-scripts) repository on GitHub. The script can also generate an additional script of AWS Command Line Interface (AWS CLI) commands. You can optionally run those commands to update the requisite configuration settings and policies for KMS keys that you specify.

**Server-side encryption with customer-provided keys (SSE-C)**  
If an object is encrypted using server-side encryption with a customer-provided key (SSE-C), Macie can't decrypt the object. Macie can only store and report metadata for the object.  
To learn about this type of encryption, see [Using server-side encryption with customer-provided keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ServerSideEncryptionCustomerKeys.html) in the *Amazon Simple Storage Service User Guide*.

**Client-side encryption**  
If an object is encrypted using client-side encryption, Macie can't decrypt the object. Macie can only store and report metadata for the object. For example, Macie can report the size of the object and the tags that are associated with the object.   
To learn about this type of encryption in the context of Amazon S3, see [Protecting data by using client-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingClientSideEncryption.html) in the *Amazon Simple Storage Service User Guide*.
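
The decision rules in this section can be summarized as a small function. The following Python sketch is a hypothetical helper, not part of any AWS SDK. It assumes that its inputs mirror object metadata fields that Amazon S3 reports (`ServerSideEncryption` and `SSECustomerAlgorithm`), plus the `KeyManager` value (`AWS` or `CUSTOMER`) that AWS KMS reports for the object's key.

```python
from typing import Optional


def macie_decrypt_outcome(
    server_side_encryption: Optional[str],
    sse_customer_algorithm: Optional[str] = None,
    kms_key_manager: Optional[str] = None,
    macie_allowed_to_use_key: bool = False,
) -> str:
    """Classify whether Macie can decrypt an object, per the options above."""
    if sse_customer_algorithm:
        # SSE-C: Macie can't decrypt the object.
        return "metadata only (SSE-C)"
    if server_side_encryption == "AES256":
        return "can decrypt (SSE-S3)"
    if server_side_encryption in ("aws:kms", "aws:kms:dsse"):
        # AWS managed keys work; customer managed keys need explicit access.
        if kms_key_manager == "AWS" or macie_allowed_to_use_key:
            return "can decrypt (DSSE-KMS or SSE-KMS)"
        return "metadata only (customer managed key that Macie isn't allowed to use)"
    # No server-side encryption reported: the object is unencrypted or
    # client-side encrypted, which this field alone doesn't distinguish.
    return "unencrypted or client-side encrypted (Macie can't decrypt client-side encryption)"
```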

You can [filter your bucket inventory](monitoring-s3-inventory-filter.md) in Macie to determine which S3 buckets store objects that use certain types of encryption. You can also determine which buckets use certain types of server-side encryption by default when storing new objects. The following table provides examples of filters that you can apply to your bucket inventory to find this information.


| To show buckets that... | Apply this filter... | 
| --- | --- | 
| Store objects that use SSE-C encryption | Object count by encryption is Customer provided and From = 1 | 
| Store objects that use DSSE-KMS or SSE-KMS encryption | Object count by encryption is AWS KMS managed and From = 1 | 
| Store objects that use SSE-S3 encryption | Object count by encryption is Amazon S3 managed and From = 1 | 
| Store objects that use client-side encryption (or aren't encrypted) | Object count by encryption is No encryption and From = 1 | 
| Encrypt new objects by default using DSSE-KMS encryption | Default encryption = aws:kms:dsse | 
| Encrypt new objects by default using SSE-KMS encryption | Default encryption = aws:kms | 
| Encrypt new objects by default using SSE-S3 encryption | Default encryption = AES256 | 

If a bucket is configured to encrypt new objects by default using DSSE-KMS or SSE-KMS encryption, you can also determine which AWS KMS key is used. To do this, choose the bucket on the **S3 buckets** page. In the bucket details panel, under **Server-side encryption**, refer to the **AWS KMS key** field. This field shows the Amazon Resource Name (ARN) or unique identifier (key ID) for the key.
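
The same filters can be applied outside the console. The following Python sketch assumes bucket dictionaries shaped like the bucket metadata that the Macie DescribeBuckets operation returns, using the `objectCountByEncryptionType` and `serverSideEncryption` fields. The mapping from console labels to field names, the function names, and the sample buckets are assumptions for illustration.

```python
# ASSUMPTION: field names follow Macie's bucket metadata (DescribeBuckets).
ENCRYPTION_FILTERS = {
    "Customer provided": "customerManaged",  # SSE-C
    "AWS KMS managed": "kmsManaged",         # DSSE-KMS or SSE-KMS
    "Amazon S3 managed": "s3Managed",        # SSE-S3
    "No encryption": "unencrypted",          # client-side encrypted or unencrypted
}


def buckets_storing(buckets: list, console_label: str) -> list:
    """Names of buckets with at least one object of the given type (From = 1)."""
    field = ENCRYPTION_FILTERS[console_label]
    return [
        b["bucketName"]
        for b in buckets
        if b.get("objectCountByEncryptionType", {}).get(field, 0) >= 1
    ]


def default_kms_key(bucket: dict):
    """ARN or key ID of the default KMS key, if the bucket defaults to DSSE-KMS or SSE-KMS."""
    sse = bucket.get("serverSideEncryption", {})
    if sse.get("type") in ("aws:kms", "aws:kms:dsse"):
        return sse.get("kmsMasterKeyId")
    return None


# Hypothetical sample inventory.
buckets = [
    {"bucketName": "amzn-s3-demo-bucket1",
     "objectCountByEncryptionType": {"customerManaged": 2, "kmsManaged": 0, "s3Managed": 10, "unencrypted": 0},
     "serverSideEncryption": {"type": "AES256"}},
    {"bucketName": "amzn-s3-demo-bucket2",
     "objectCountByEncryptionType": {"customerManaged": 0, "kmsManaged": 5, "s3Managed": 0, "unencrypted": 1},
     "serverSideEncryption": {"type": "aws:kms",
                              "kmsMasterKeyId": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"}},
]
```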

## Allowing Macie to use a customer managed AWS KMS key
<a name="discovery-supported-encryption-cmk-configuration"></a>

If an Amazon S3 object is encrypted using dual-layer server-side encryption or server-side encryption with a customer managed AWS KMS key (DSSE-KMS or SSE-KMS), Amazon Macie can decrypt the object only if it is allowed to use the key. How to provide this access depends on whether the account that owns the key also owns the S3 bucket that stores the object:
+ If the same account owns the AWS KMS key and the bucket, a user of the account has to update the key's policy. 
+ If one account owns the AWS KMS key and a different account owns the bucket, a user of the account that owns the key has to allow cross-account access to the key.

This topic describes how to perform these tasks and provides examples for both scenarios. To learn more about allowing access to customer managed AWS KMS keys, see [KMS key access and permissions](https://docs.aws.amazon.com/kms/latest/developerguide/control-access.html) in the *AWS Key Management Service Developer Guide*.

### Allowing same-account access to a customer managed key
<a name="discovery-supported-encryption-cmk-configuration-1account"></a>

If the same account owns both the AWS KMS key and the S3 bucket, a user of the account has to add a statement to the policy for the key. The additional statement must allow the Macie service-linked role for the account to decrypt data by using the key. For detailed information about updating a key policy, see [Changing a key policy](https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-modifying.html) in the *AWS Key Management Service Developer Guide*.

In the statement:
+ The `Principal` element must specify the Amazon Resource Name (ARN) of the Macie service-linked role for the account that owns the AWS KMS key and the S3 bucket.

  If the account is in an opt-in AWS Region, the ARN must also include the appropriate Region code for the Region. For example, if the account is in the Middle East (Bahrain) Region, which has the Region code *me-south-1*, the `Principal` element must specify `arn:aws:iam::123456789012:role/aws-service-role/macie.me-south-1.amazonaws.com/AWSServiceRoleForAmazonMacie`, where *123456789012* is the account ID for the account. For a list of Region codes for the Regions where Macie is currently available, see [Amazon Macie endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/macie.html) in the *AWS General Reference*.
+ The `Action` array must specify the `kms:Decrypt` action. This is the only AWS KMS action that Macie must be allowed to perform to decrypt an S3 object that's encrypted with the key.

The following is an example of the statement to add to the policy for an AWS KMS key. 

```
{
    "Sid": "Allow the Macie service-linked role to use the key",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/aws-service-role/macie.amazonaws.com/AWSServiceRoleForAmazonMacie"
    },
    "Action": [
        "kms:Decrypt"
    ],
    "Resource": "*"
}
```

In the preceding example: 
+ The `AWS` field in the `Principal` element specifies the ARN of the Macie service-linked role (`AWSServiceRoleForAmazonMacie`) for the account. It allows the Macie service-linked role to perform the action specified by the policy statement. *123456789012* is an example account ID. Replace this value with the account ID for the account that owns the KMS key and the S3 bucket.
+ The `Action` array specifies the action that the Macie service-linked role is allowed to perform using the KMS key—decrypt ciphertext that's encrypted with the key.

Where you add this statement to a key policy depends on the structure and elements that the policy currently contains. When you add the statement, ensure that the syntax is valid. Key policies use JSON format, so you also have to add a comma before or after the statement, depending on where you add it in the policy. 
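
Editing the policy programmatically is one way to avoid JSON syntax mistakes such as a missing comma. The following Python sketch parses an existing key policy, appends the statement shown in the preceding example, and serializes the result. The helper names are hypothetical, skipping a statement whose `Sid` already exists is a design choice of this sketch, and the optional `opt_in_region` argument applies the opt-in Region rule described earlier.

```python
import json
from typing import Optional


def macie_slr_arn(account_id: str, opt_in_region: Optional[str] = None) -> str:
    """ARN of AWSServiceRoleForAmazonMacie; opt-in Regions add the Region code."""
    service = f"macie.{opt_in_region}.amazonaws.com" if opt_in_region else "macie.amazonaws.com"
    return f"arn:aws:iam::{account_id}:role/aws-service-role/{service}/AWSServiceRoleForAmazonMacie"


def add_macie_statement(policy_json: str, account_id: str, opt_in_region: Optional[str] = None) -> str:
    """Return the key policy with the Macie decrypt statement added (once)."""
    policy = json.loads(policy_json)
    statement = {
        "Sid": "Allow the Macie service-linked role to use the key",
        "Effect": "Allow",
        "Principal": {"AWS": macie_slr_arn(account_id, opt_in_region)},
        "Action": ["kms:Decrypt"],
        "Resource": "*",
    }
    statements = policy.setdefault("Statement", [])
    if isinstance(statements, dict):  # a single statement can be a bare object
        statements = policy["Statement"] = [statements]
    if all(s.get("Sid") != statement["Sid"] for s in statements):
        statements.append(statement)
    return json.dumps(policy, indent=4)  # json.dumps always emits valid JSON
```

You could read the current policy and write the updated one with AWS KMS operations such as `get-key-policy` and `put-key-policy` in the AWS CLI.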

### Allowing cross-account access to a customer managed key
<a name="discovery-supported-encryption-cmk-configuration-xaccount"></a>

If one account owns the AWS KMS key (*key owner*) and a different account owns the S3 bucket (*bucket owner*), the key owner has to provide the bucket owner with cross-account access to the KMS key. To do this, the key owner first ensures that the key's policy allows the bucket owner to both use the key and create a grant for the key. The bucket owner then creates a grant for the key. A *grant* is a policy instrument that allows AWS principals to use KMS keys in cryptographic operations if the conditions specified by the grant are met. In this case, the grant delegates the relevant permissions to the Macie service-linked role for the bucket owner's account.

For detailed information about updating a key policy, see [Changing a key policy](https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-modifying.html) in the *AWS Key Management Service Developer Guide*. To learn about grants, see [Grants in AWS KMS](https://docs.aws.amazon.com/kms/latest/developerguide/grants.html) in the *AWS Key Management Service Developer Guide*.

**Step 1: Update the key policy**  
In the key policy, the key owner should ensure that the policy includes two statements:
+ The first statement allows the bucket owner to use the key to decrypt data.
+ The second statement allows the bucket owner to create a grant for the Macie service-linked role for their (the bucket owner's) account.

In the first statement, the `Principal` element must specify the ARN of the bucket owner's account. The `Action` array must specify the `kms:Decrypt` action. This is the only AWS KMS action that Macie must be allowed to perform to decrypt an object that's encrypted with the key. The following is an example of this statement in the policy for an AWS KMS key. 

```
{
    "Sid": "Allow account 111122223333 to use the key",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
    },
    "Action": [
        "kms:Decrypt"
    ],
    "Resource": "*"
}
```

In the preceding example: 
+ The `AWS` field in the `Principal` element specifies the ARN of the bucket owner's account (*111122223333*). It allows the bucket owner to perform the action specified by the policy statement. *111122223333* is an example account ID. Replace this value with the account ID for the bucket owner's account.
+ The `Action` array specifies the action that the bucket owner is allowed to perform using the KMS key—decrypt ciphertext that's encrypted with the key.

The second statement in the key policy allows the bucket owner to create a grant for the Macie service-linked role for their account. In this statement, the `Principal` element must specify the ARN of the bucket owner's account. The `Action` array must specify the `kms:CreateGrant` action. A `Condition` element can filter access to the `kms:CreateGrant` action specified in the statement. The following is an example of this statement in the policy for an AWS KMS key.

```
{
    "Sid": "Allow account 111122223333 to create a grant",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
    },
    "Action": [
        "kms:CreateGrant"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "kms:GranteePrincipal": "arn:aws:iam::111122223333:role/aws-service-role/macie.amazonaws.com/AWSServiceRoleForAmazonMacie"
        }
    }
}
```

In the preceding example:
+ The `AWS` field in the `Principal` element specifies the ARN of the bucket owner's account (*111122223333*). It allows the bucket owner to perform the action specified by the policy statement. *111122223333* is an example account ID. Replace this value with the account ID for the bucket owner's account.
+ The `Action` array specifies the action that the bucket owner is allowed to perform on the KMS key—create a grant for the key.
+ The `Condition` element uses the `StringEquals` [condition operator](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_condition_operators.html) and the `kms:GranteePrincipal` [condition key](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awskeymanagementservice.html#awskeymanagementservice-policy-keys) to filter access to the action specified by the policy statement. In this case, the bucket owner can create a grant only for the specified `GranteePrincipal`, which is the ARN of the Macie service-linked role for their account. In that ARN, *111122223333* is an example account ID. Replace this value with the account ID for the bucket owner's account.

  If the bucket owner's account is in an opt-in AWS Region, also include the appropriate Region code in the ARN of the Macie service-linked role. For example, if the account is in the Middle East (Bahrain) Region, which has the Region code *me-south-1*, replace `macie.amazonaws.com` with `macie.me-south-1.amazonaws.com` in the ARN. For a list of Region codes for the Regions where Macie is currently available, see [Amazon Macie endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/macie.html) in the *AWS General Reference*.

Where the key owner adds these statements to the key policy depends on the structure and elements that the policy currently contains. When the key owner adds the statements, they should ensure that the syntax is valid. Key policies use JSON format, so the key owner also has to add a comma before or after each statement, depending on where they add the statement to the policy.
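
The two statements differ only in their actions and the grant condition, so they can be generated from the bucket owner's account ID. The following Python sketch, with a hypothetical function name, produces both statements in the form shown in the preceding examples, including the optional opt-in Region code in the service-linked role ARN. The key owner would still need to merge them into the existing key policy.

```python
import json
from typing import Optional


def cross_account_statements(bucket_owner_id: str, opt_in_region: Optional[str] = None) -> list:
    """The two key policy statements that the key owner adds for a bucket owner."""
    service = f"macie.{opt_in_region}.amazonaws.com" if opt_in_region else "macie.amazonaws.com"
    macie_role = (
        f"arn:aws:iam::{bucket_owner_id}:role/aws-service-role/"
        f"{service}/AWSServiceRoleForAmazonMacie"
    )
    bucket_owner = f"arn:aws:iam::{bucket_owner_id}:root"
    return [
        {
            # Statement 1: let the bucket owner's account decrypt with the key.
            "Sid": f"Allow account {bucket_owner_id} to use the key",
            "Effect": "Allow",
            "Principal": {"AWS": bucket_owner},
            "Action": ["kms:Decrypt"],
            "Resource": "*",
        },
        {
            # Statement 2: let it create a grant, only for its Macie service-linked role.
            "Sid": f"Allow account {bucket_owner_id} to create a grant",
            "Effect": "Allow",
            "Principal": {"AWS": bucket_owner},
            "Action": ["kms:CreateGrant"],
            "Resource": "*",
            "Condition": {"StringEquals": {"kms:GranteePrincipal": macie_role}},
        },
    ]


print(json.dumps(cross_account_statements("111122223333"), indent=4))
```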

**Step 2: Create a grant**  
After the key owner updates the key policy as necessary, the bucket owner must create a grant for the key. The grant delegates the relevant permissions to the Macie service-linked role for their (the bucket owner's) account. Before the bucket owner creates the grant, they should verify that they're allowed to perform the `kms:CreateGrant` action for their account. This action allows them to add a grant to an existing, customer managed AWS KMS key.

To create the grant, the bucket owner can use the [CreateGrant](https://docs.aws.amazon.com/kms/latest/APIReference/API_CreateGrant.html) operation of the AWS Key Management Service API. When the bucket owner creates the grant, they should specify the following values for the required parameters:
+ `KeyId` – The ARN of the KMS key. For cross-account access to a KMS key, this value must be an ARN. It can't be a key ID.
+ `GranteePrincipal` – The ARN of the Macie service-linked role (`AWSServiceRoleForAmazonMacie`) for their account. This value should be `arn:aws:iam::111122223333:role/aws-service-role/macie.amazonaws.com/AWSServiceRoleForAmazonMacie`, where *111122223333* is the account ID for the bucket owner's account.

  If their account is in an opt-in Region, the ARN must include the appropriate Region code. For example, if their account is in the Middle East (Bahrain) Region, which has the Region code *me-south-1*, the ARN should be `arn:aws:iam::111122223333:role/aws-service-role/macie.me-south-1.amazonaws.com/AWSServiceRoleForAmazonMacie`, where *111122223333* is the account ID for the bucket owner's account.
+ `Operations` – The AWS KMS decrypt action (`Decrypt`). This is the only AWS KMS action that Macie must be allowed to perform to decrypt an object that's encrypted with the KMS key.
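
The parameter requirements above can also be checked in code before the grant is created. The following Python sketch builds the three required CreateGrant parameters and rejects a key ID where a key ARN is required. The function name and the simple ARN check are assumptions of this sketch; the resulting dictionary is intended to match the keyword arguments of the `create_grant` method of a boto3 `kms` client.

```python
from typing import Optional


def create_grant_params(key_arn: str, bucket_owner_id: str, opt_in_region: Optional[str] = None) -> dict:
    """Required CreateGrant parameters for the bucket owner's Macie service-linked role."""
    # For cross-account access, KeyId must be a key ARN, not a key ID.
    if not (key_arn.startswith("arn:") and ":key/" in key_arn):
        raise ValueError("KeyId must be the ARN of the KMS key for cross-account access")
    service = f"macie.{opt_in_region}.amazonaws.com" if opt_in_region else "macie.amazonaws.com"
    return {
        "KeyId": key_arn,
        "GranteePrincipal": (
            f"arn:aws:iam::{bucket_owner_id}:role/aws-service-role/"
            f"{service}/AWSServiceRoleForAmazonMacie"
        ),
        "Operations": ["Decrypt"],  # the only operation that Macie needs
    }
```

For example, the bucket owner could pass the result to `boto3.client("kms").create_grant(**params)`, which is equivalent to the CLI command that follows.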

To create a grant for a customer managed KMS key by using the AWS Command Line Interface (AWS CLI), run the [create-grant](https://docs.aws.amazon.com/cli/latest/reference/kms/create-grant.html) command. The following example shows how. The example is formatted for Microsoft Windows and it uses the caret (^) line-continuation character to improve readability.

```
C:\> aws kms create-grant ^
--key-id arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab ^
--grantee-principal arn:aws:iam::111122223333:role/aws-service-role/macie.amazonaws.com/AWSServiceRoleForAmazonMacie ^
--operations "Decrypt"
```

Where:
+ `key-id` specifies the ARN of the KMS key to apply the grant to.
+ `grantee-principal` specifies the ARN of the Macie service-linked role for the account that's allowed to perform the action specified by the grant. This value should match the ARN specified by the `kms:GranteePrincipal` condition of the second statement in the key policy.
+ `operations` specifies the action that the grant allows the specified principal to perform—decrypt ciphertext that's encrypted with the KMS key.
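If you script this step, the grantee principal ARN is the value that's easiest to get wrong, especially in an opt-in Region. The following Python sketch assembles the `create-grant` arguments from the values described above. The helper names `macie_slr_arn` and `create_grant_args` are hypothetical. The sketch assumes the standard `aws` partition, and it only builds the argument list. It doesn't call AWS.

```python
from __future__ import annotations


def macie_slr_arn(account_id: str, opt_in_region: str | None = None) -> str:
    """Return the ARN of the Macie service-linked role for an account.

    Pass opt_in_region (for example, "me-south-1") only if the account is in
    an opt-in Region; the Region code then becomes part of the service name.
    """
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"expected a 12-digit account ID, got {account_id!r}")
    service = f"macie.{opt_in_region}.amazonaws.com" if opt_in_region else "macie.amazonaws.com"
    return f"arn:aws:iam::{account_id}:role/aws-service-role/{service}/AWSServiceRoleForAmazonMacie"


def create_grant_args(key_arn: str, account_id: str, opt_in_region: str | None = None) -> list[str]:
    """Build the argument list for `aws kms create-grant`."""
    parts = key_arn.split(":")
    # Cross-account grants need the full key ARN; a bare key ID is rejected here.
    if len(parts) < 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"expected a KMS key ARN, got {key_arn!r}")
    return [
        "aws", "kms", "create-grant",
        "--key-id", key_arn,
        "--grantee-principal", macie_slr_arn(account_id, opt_in_region),
        "--operations", "Decrypt",
    ]
```

You could then pass the list to `subprocess.run(args, check=True)` from an environment where the AWS CLI is installed and configured with the bucket owner's credentials.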

If the command runs successfully, you receive output similar to the following.

```
{
    "GrantToken": "<grant token>",
    "GrantId": "1a2b3c4d2f5e69f440bae30eaec9570bb1fb7358824f9ddfa1aa5a0dab1a59b2"
}
```

Where `GrantToken` is a unique, non-secret, variable-length, base64-encoded string that represents the grant that was created, and `GrantId` is the unique identifier for the grant.

# Storing and retaining sensitive data discovery results
<a name="discovery-results-repository-s3"></a>

When you run a sensitive data discovery job or Amazon Macie performs automated sensitive data discovery, Macie creates an analysis record for each Amazon Simple Storage Service (Amazon S3) object that's included in the scope of the analysis. These records, referred to as *sensitive data discovery results*, log details about the analysis that Macie performs on individual S3 objects. This includes objects in which Macie doesn't detect sensitive data, and which therefore don't produce findings, as well as objects that Macie can't analyze due to errors or issues. If Macie detects sensitive data in an object, the record includes data from the corresponding finding as well as additional information. Sensitive data discovery results provide you with analysis records that can be helpful for data privacy and protection audits or investigations.

Macie stores your sensitive data discovery results for only 90 days. To access your results and enable long-term storage and retention of them, configure Macie to encrypt the results with an AWS Key Management Service (AWS KMS) key and store them in an S3 bucket. The bucket can serve as a definitive, long-term repository for all of your sensitive data discovery results. You can then optionally access and query the results in that repository.

This topic guides you through the process of using the AWS Management Console to configure a repository for your sensitive data discovery results. The configuration is a combination of an AWS KMS key that encrypts the results, an S3 general purpose bucket that stores the results, and Macie settings that specify which key and bucket to use. If you prefer to configure the Macie settings programmatically, you can use the [PutClassificationExportConfiguration](https://docs.aws.amazon.com/macie/latest/APIReference/classification-export-configuration.html) operation of the Amazon Macie API.
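As a rough illustration of the API route, the following Python sketch builds the `configuration` object for PutClassificationExportConfiguration. The field names (`s3Destination`, `bucketName`, `keyPrefix`, and `kmsKeyArn`) are assumed to match the S3Destination object in the Amazon Macie API Reference, so confirm them there before you rely on them. The bucket name, prefix, and key ARN are placeholders, and `export_configuration` is a hypothetical helper.

```python
from __future__ import annotations

import json


def export_configuration(bucket_name: str, kms_key_arn: str, key_prefix: str | None = None) -> dict:
    """Build the `configuration` object for PutClassificationExportConfiguration."""
    s3_destination = {"bucketName": bucket_name, "kmsKeyArn": kms_key_arn}
    if key_prefix:  # the prefix is optional; omit the field when it isn't used
        s3_destination["keyPrefix"] = key_prefix
    return {"s3Destination": s3_destination}


config = export_configuration(
    "amzn-s3-demo-bucket",
    "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
    key_prefix="macie-results/",
)
# A JSON string that you can inspect before sending the request.
print(json.dumps(config, indent=2))
```

If the AWS SDK for Python (Boto3) is installed, you might then pass `config` to `boto3.client("macie2").put_classification_export_configuration(configuration=config)`. The sketch itself uses only the standard library so that you can review the request before you send it.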

When you configure the settings in Macie, your choices apply only to the current AWS Region. If you're the Macie administrator for an organization, your choices apply only to your account. They don't apply to any associated member accounts. If you enable automated sensitive data discovery or run sensitive data discovery jobs to analyze data for member accounts, Macie stores the sensitive data discovery results in the repository for your administrator account.

If you use Macie in multiple AWS Regions, configure the repository settings for each Region in which you use Macie. You can optionally store sensitive data discovery results for multiple Regions in the same S3 bucket. However, note the following requirements:
+ To store the results for a Region that AWS enables by default for AWS accounts, such as the US East (N. Virginia) Region, you have to choose a bucket in a Region that's enabled by default. The results can't be stored in a bucket in an opt-in Region (Region that's disabled by default).
+ To store the results for an opt-in Region, such as the Middle East (Bahrain) Region, you have to choose a bucket in that same Region or a Region that's enabled by default. The results can't be stored in a bucket in a different opt-in Region.
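The two Region rules above can be expressed as a small check. This Python sketch is illustrative only. `OPT_IN_REGIONS` is a partial, hardcoded sample of opt-in Regions (an assumption for the example), so look up the current list in the *AWS Account Management User Guide* rather than relying on it. `can_store_results` is a hypothetical helper.

```python
# Illustrative subset of opt-in Regions (an assumption for this sketch, not
# a complete list). Check the AWS Account Management User Guide for the
# current set of Regions that are disabled by default.
OPT_IN_REGIONS = frozenset({"af-south-1", "ap-east-1", "eu-south-1", "me-south-1"})


def can_store_results(macie_region: str, bucket_region: str,
                      opt_in_regions: frozenset = OPT_IN_REGIONS) -> bool:
    """Return True if results for macie_region may be stored in bucket_region."""
    bucket_enabled_by_default = bucket_region not in opt_in_regions
    if macie_region not in opt_in_regions:
        # Results for a default Region must go to a bucket in a default Region.
        return bucket_enabled_by_default
    # Results for an opt-in Region: the same Region or any default Region.
    return bucket_region == macie_region or bucket_enabled_by_default
```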

To determine whether a Region is enabled by default, see [Enable or disable AWS Regions in your account](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-regions.html) in the *AWS Account Management User Guide*. In addition to the preceding requirements, also consider whether you want to [retrieve samples of sensitive data](findings-retrieve-sd.md) that Macie reports in individual findings. To retrieve sensitive data samples from an affected S3 object, all of the following resources and data must be stored in the same Region: the affected object, the applicable finding, and the corresponding sensitive data discovery result.

**Topics**
+ [Before you begin: Learn key concepts](#discovery-results-repository-s3-overview)
+ [Step 1: Verify your permissions](#discovery-results-repository-s3-permissions)
+ [Step 2: Configure an AWS KMS key](#discovery-results-repository-s3-key-policy)
+ [Step 3: Choose an S3 bucket](#discovery-results-repository-s3-choose-bucket)

## Before you begin: Learn key concepts
<a name="discovery-results-repository-s3-overview"></a>

Amazon Macie automatically creates a sensitive data discovery result for each Amazon S3 object that it analyzes or attempts to analyze when you run a sensitive data discovery job or when Macie performs automated sensitive data discovery. This includes:
+ Objects that Macie detects sensitive data in, and therefore also produce sensitive data findings.
+ Objects that Macie doesn't detect sensitive data in, and therefore don't produce sensitive data findings.
+ Objects that Macie can't analyze due to errors or issues such as permissions settings or use of an unsupported file or storage format.

If Macie detects sensitive data in an S3 object, the sensitive data discovery result includes data from the corresponding sensitive data finding. It provides additional information too, such as the location of as many as 1,000 occurrences of each type of sensitive data that Macie found in the object. For example: 
+ The column and row number for a cell or field in a Microsoft Excel workbook, CSV file, or TSV file
+ The path to a field or array in a JSON or JSON Lines file
+ The line number for a line in a non-binary text file other than a CSV, JSON, JSON Lines, or TSV file—for example, an HTML, TXT, or XML file
+ The page number for a page in an Adobe Portable Document Format (PDF) file
+ The record index and the path to a field in a record in an Apache Avro object container or Apache Parquet file

If the affected S3 object is an archive file, such as a .tar or .zip file, the sensitive data discovery result also provides detailed location data for occurrences of sensitive data in individual files that Macie extracted from the archive. Macie doesn’t include this information in sensitive data findings for archive files. To report location data, sensitive data discovery results use a [standardized JSON schema](findings-locate-sd-schema.md).

A sensitive data discovery result doesn't include the sensitive data that Macie found. Instead, it provides you with an analysis record that can be helpful for audits or investigations.

Macie stores your sensitive data discovery results for 90 days. You can’t access them directly on the Amazon Macie console or with the Amazon Macie API. Instead, follow the steps in this topic to configure Macie to encrypt your results with an AWS KMS key that you specify, and store the results in an S3 general purpose bucket that you also specify. Macie then writes the results to JSON Lines (.jsonl) files, adds the files to the bucket as GNU Zip (.gz) files, and encrypts the data using SSE-KMS encryption. As of November 8, 2023, Macie also signs the resulting S3 objects with a Hash-based Message Authentication Code (HMAC) AWS KMS key.
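After you download a results file from the repository, you can read it with the Python standard library alone. This sketch assumes that you already retrieved the object and saved it locally. Amazon S3 decrypts SSE-KMS objects on download for callers that are allowed to use the KMS key. The function name `read_discovery_results` is hypothetical. The sketch makes no assumptions about the fields in each record, and it doesn't verify the HMAC signature.

```python
import gzip
import json
from typing import Iterator


def read_discovery_results(path: str) -> Iterator[dict]:
    """Yield each record in a gzip-compressed JSON Lines file as a dict."""
    # "rt" opens the gzip stream in text mode so it can be read line by line.
    with gzip.open(path, mode="rt", encoding="utf-8") as results_file:
        for line in results_file:
            line = line.strip()
            if line:  # ignore blank lines
                yield json.loads(line)
```

For example, `sum(1 for _ in read_discovery_results("results.jsonl.gz"))` counts the records in a downloaded file. The file name is a placeholder.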

After you configure Macie to store your sensitive data discovery results in an S3 bucket, the bucket can serve as a definitive, long-term repository for the results. You can then optionally access and query the results in that repository. 

**Tips**  
For a detailed, instructional example of how you might query and use sensitive data discovery results to analyze and report potential data security risks, see the following blog post on the *AWS Security Blog*: [How to query and visualize Macie sensitive data discovery results with Amazon Athena and Amazon Quick](https://aws.amazon.com/blogs/security/how-to-query-and-visualize-macie-sensitive-data-discovery-results-with-athena-and-quicksight/).  
For samples of Amazon Athena queries that you can use to analyze sensitive data discovery results, visit the [Amazon Macie Results Analytics repository](https://github.com/aws-samples/amazon-macie-results-analytics) on GitHub. This repository also provides instructions for configuring Athena to retrieve and decrypt your results, and scripts for creating tables for the results.

## Step 1: Verify your permissions
<a name="discovery-results-repository-s3-permissions"></a>

Before you configure a repository for your sensitive data discovery results, verify that you have the permissions that you need to encrypt and store the results. To verify your permissions, use AWS Identity and Access Management (IAM) to review the IAM policies that are attached to your IAM identity. Then compare the information in those policies to the following list of actions that you must be allowed to perform to configure the repository.

**Amazon Macie**  
For Macie, verify that you're allowed to perform the following action:  
`macie2:PutClassificationExportConfiguration`  
This action allows you to add or change the repository settings in Macie.

**Amazon S3**  
For Amazon S3, verify that you're allowed to perform the following actions:  
+ `s3:CreateBucket`
+ `s3:GetBucketLocation`
+ `s3:ListAllMyBuckets`
+ `s3:PutBucketAcl`
+ `s3:PutBucketPolicy`
+ `s3:PutBucketPublicAccessBlock`
+ `s3:PutObject`
These actions allow you to access and configure an S3 general purpose bucket that can serve as the repository.

**AWS KMS**  
To use the Amazon Macie console to add or change the repository settings, also verify that you're allowed to perform the following AWS KMS actions:  
+ `kms:DescribeKey`
+ `kms:ListAliases`
These actions allow you to retrieve and display information about the AWS KMS keys for your account. You can then choose one of these keys to encrypt your sensitive data discovery results.  
If you plan to create a new AWS KMS key to encrypt the data, you also need to be allowed to perform the following actions: `kms:CreateKey`, `kms:GetKeyPolicy`, and `kms:PutKeyPolicy`.

If you're not allowed to perform the requisite actions, ask your AWS administrator for assistance before you proceed to the next step.
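To make the comparison concrete, the following Python sketch checks a set of allowed actions against the lists above. It's a simplification. It matches action names, including `*` and `?` wildcards, case-insensitively. However, it ignores `Deny` statements, conditions, resource scoping, permissions boundaries, and service control policies. For an authoritative answer, use the IAM policy simulator. The names `REQUIRED_ACTIONS` and `missing_actions` are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Actions listed in this step, grouped so optional sets can be included as needed.
REQUIRED_ACTIONS = {
    "macie": {"macie2:PutClassificationExportConfiguration"},
    "s3": {
        "s3:CreateBucket", "s3:GetBucketLocation", "s3:ListAllMyBuckets",
        "s3:PutBucketAcl", "s3:PutBucketPolicy", "s3:PutBucketPublicAccessBlock",
        "s3:PutObject",
    },
    "kms_console": {"kms:DescribeKey", "kms:ListAliases"},
    "kms_new_key": {"kms:CreateKey", "kms:GetKeyPolicy", "kms:PutKeyPolicy"},
}


def _is_allowed(action: str, allowed_patterns: set[str]) -> bool:
    # IAM action names are case-insensitive and support * and ? wildcards.
    action = action.lower()
    return any(fnmatchcase(action, pattern.lower()) for pattern in allowed_patterns)


def missing_actions(allowed_patterns: set[str], use_console: bool = True,
                    create_new_key: bool = False) -> set[str]:
    """Return the required actions that no allowed pattern covers."""
    needed = REQUIRED_ACTIONS["macie"] | REQUIRED_ACTIONS["s3"]
    if use_console:
        needed |= REQUIRED_ACTIONS["kms_console"]
    if create_new_key:
        needed |= REQUIRED_ACTIONS["kms_new_key"]
    return {action for action in needed if not _is_allowed(action, allowed_patterns)}
```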

## Step 2: Configure an AWS KMS key
<a name="discovery-results-repository-s3-key-policy"></a>

After you verify your permissions, determine which AWS KMS key you want Macie to use to encrypt your sensitive data discovery results. The key must be a customer managed, symmetric encryption KMS key that's enabled in the same AWS Region as the S3 bucket where you want to store the results.

The key can be an existing AWS KMS key from your own account, or an existing AWS KMS key that another account owns. If you want to use a new KMS key, create the key before proceeding. If you want to use an existing key that another account owns, obtain the Amazon Resource Name (ARN) of the key. You'll need to enter this ARN when you configure the repository settings in Macie. For information about creating and reviewing the settings for KMS keys, see the [AWS Key Management Service Developer Guide](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html).
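You can check the key requirements stated above before you edit the key policy. The following Python sketch inspects a dictionary in the shape of the `KeyMetadata` object that the AWS KMS DescribeKey operation returns. It then reports any requirement that the key doesn't meet. The function name `key_problems` is hypothetical, and the sample metadata in the checks is made up.

```python
from __future__ import annotations


def key_problems(key_metadata: dict, bucket_region: str) -> list[str]:
    """List the repository requirements that a KMS key doesn't meet.

    key_metadata is assumed to have the shape of the KeyMetadata object that
    the AWS KMS DescribeKey operation returns.
    """
    problems = []
    if key_metadata.get("KeyManager") != "CUSTOMER":
        problems.append("the key must be a customer managed key")
    if (key_metadata.get("KeySpec") != "SYMMETRIC_DEFAULT"
            or key_metadata.get("KeyUsage") != "ENCRYPT_DECRYPT"):
        problems.append("the key must be a symmetric encryption key")
    if key_metadata.get("KeyState") != "Enabled":
        problems.append("the key must be enabled")
    # Key ARNs look like arn:aws:kms:REGION:ACCOUNT:key/KEY-ID.
    arn_parts = key_metadata.get("Arn", "").split(":")
    key_region = arn_parts[3] if len(arn_parts) > 3 else None
    if key_region != bucket_region:
        problems.append(f"the key is in {key_region}, but the bucket is in {bucket_region}")
    return problems
```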

**Note**  
The key can be an AWS KMS key in an external key store. However, the key might then be slower and less reliable than a key that’s managed entirely within AWS KMS. You can reduce this risk by storing your sensitive data discovery results in an S3 bucket that’s configured to use the key as an S3 Bucket Key. Doing so reduces the number of AWS KMS requests that must be made to encrypt your sensitive data discovery results.  
For information about using KMS keys in external key stores, see [External key stores](https://docs.aws.amazon.com/kms/latest/developerguide/keystore-external.html) in the *AWS Key Management Service Developer Guide*. For information about using S3 Bucket Keys, see [Reducing the cost of SSE-KMS with Amazon S3 Bucket Keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-key.html) in the *Amazon Simple Storage Service User Guide*.

After you determine which KMS key you want Macie to use, give Macie permission to use the key. Otherwise, Macie won't be able to encrypt or store your results in the repository. To give Macie permission to use the key, update the key policy for the key. For detailed information about key policies and managing access to KMS keys, see [Key policies in AWS KMS](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html) in the *AWS Key Management Service Developer Guide*.

**To update the key policy**

1. Open the AWS KMS console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms).

1. To change the AWS Region, use the Region selector in the upper-right corner of the page.

1. Choose the key that you want Macie to use to encrypt your sensitive data discovery results.

1. On the **Key policy** tab, choose **Edit**.

1. Copy the following statement to your clipboard, and then add it to the policy:

   ```
   {
       "Sid": "Allow Macie to use the key",
       "Effect": "Allow",
       "Principal": {
           "Service": "macie.amazonaws.com"
       },
       "Action": [
           "kms:GenerateDataKey",
           "kms:Encrypt"
       ],
       "Resource": "*",
       "Condition": {
           "StringEquals": {
               "aws:SourceAccount": "111122223333"
           },
           "ArnLike": {
               "aws:SourceArn": [
                   "arn:aws:macie2:us-east-1:111122223333:export-configuration:*",
                   "arn:aws:macie2:us-east-1:111122223333:classification-job/*"
               ]
           }
       }
   }
   ```

   **Note**  
   When you add the statement to the policy, make sure that the syntax is valid. Policies use JSON format. This means that you need to also add a comma before or after the statement, depending on where you add the statement to the policy. If you add the statement as the last statement, add a comma after the closing curly brace for the preceding statement. If you add it as the first statement or between two existing statements, add a comma after the closing curly brace for the statement.

1. Update the statement with the correct values for your environment:
   + In the `Condition` fields, replace the placeholder values, where:
     + *111122223333* is the account ID for your AWS account.
     + *us-east-1* is the Region code for the AWS Region in which you're using Macie and want to allow Macie to use the key.

       If you use Macie in multiple Regions and want to allow Macie to use the key in additional Regions, add ARNs for each additional Region to the `aws:SourceArn` condition. For example:

       ```
       "aws:SourceArn": [
           "arn:aws:macie2:us-east-1:111122223333:export-configuration:*",
           "arn:aws:macie2:us-east-1:111122223333:classification-job/*",
           "arn:aws:macie2:us-west-2:111122223333:export-configuration:*",
           "arn:aws:macie2:us-west-2:111122223333:classification-job/*"
       ]
       ```

       Alternatively, you can allow Macie to use the key in all Regions. To do this, replace the placeholder value with the wildcard character (`*`). For example:

       ```
       "aws:SourceArn": [
           "arn:aws:macie2:*:111122223333:export-configuration:*",
           "arn:aws:macie2:*:111122223333:classification-job/*"
       ]
       ```
   + If you're using Macie in an opt-in Region, add the appropriate Region code to the value for the `Service` field. For example, if you're using Macie in the Middle East (Bahrain) Region, which has the Region code *me-south-1*, replace `macie.amazonaws.com` with `macie.me-south-1.amazonaws.com`.

     For a list of Regions where Macie is currently available and the Region code for each one, see [Amazon Macie endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/macie.html) in the *AWS General Reference*.

   Note that the `Condition` fields use two IAM global condition keys:
   + [aws:SourceAccount](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourceaccount) – This condition allows Macie to perform the specified actions only for your account. More specifically, it determines which account can perform the specified actions for the resources and actions specified by the `aws:SourceArn` condition.

     To allow Macie to perform the specified actions for additional accounts, add the account ID for each additional account to this condition. For example:

     ```
     "aws:SourceAccount": ["111122223333", "444455556666"]
     ```
   + [aws:SourceArn](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourcearn) – This condition prevents other AWS services from performing the specified actions. It also prevents Macie from using the key while performing other actions for your account. In other words, it allows Macie to encrypt S3 objects with the key only if: the objects are sensitive data discovery results, and the results are for automated sensitive data discovery or sensitive data discovery jobs created by the specified account in the specified Region.

     To allow Macie to perform the specified actions for additional accounts, add ARNs for each additional account to this condition. For example:

     ```
     "aws:SourceArn": [
         "arn:aws:macie2:us-east-1:111122223333:export-configuration:*",
         "arn:aws:macie2:us-east-1:111122223333:classification-job/*",
         "arn:aws:macie2:us-east-1:444455556666:export-configuration:*",
         "arn:aws:macie2:us-east-1:444455556666:classification-job/*"
     ]
     ```

   The accounts specified by the `aws:SourceAccount` and `aws:SourceArn` conditions should match.

   These conditions help prevent Macie from being used as a [confused deputy](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html) during transactions with AWS KMS. Although we don’t recommend it, you can remove these conditions from the statement.

1. When you finish adding and updating the statement, choose **Save changes**.
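Instead of editing the JSON by hand, you can generate the statement and append it programmatically. This also avoids the comma placement issues described in the note. This Python sketch is a hypothetical helper, not part of any AWS tool. It builds the statement for one or more accounts and Regions (pass `["*"]` for all Regions), keeps the `aws:SourceAccount` and `aws:SourceArn` conditions consistent, and appends the statement to an existing key policy document.

```python
from __future__ import annotations

import json


def macie_key_statement(account_ids: list[str], regions: list[str],
                        opt_in_region: str | None = None) -> dict:
    """Build the "Allow Macie to use the key" statement for a KMS key policy.

    Pass regions=["*"] to allow Macie to use the key in all Regions. Set
    opt_in_region (for example, "me-south-1") only when you use Macie in an
    opt-in Region, so that the service principal includes the Region code.
    """
    service = f"macie.{opt_in_region}.amazonaws.com" if opt_in_region else "macie.amazonaws.com"
    # One pair of source ARNs per account and Region keeps both conditions in sync.
    source_arns = [
        f"arn:aws:macie2:{region}:{account}:{resource}"
        for account in account_ids
        for region in regions
        for resource in ("export-configuration:*", "classification-job/*")
    ]
    source_account = account_ids[0] if len(account_ids) == 1 else list(account_ids)
    return {
        "Sid": "Allow Macie to use the key",
        "Effect": "Allow",
        "Principal": {"Service": service},
        "Action": ["kms:GenerateDataKey", "kms:Encrypt"],
        "Resource": "*",
        "Condition": {
            "StringEquals": {"aws:SourceAccount": source_account},
            "ArnLike": {"aws:SourceArn": source_arns},
        },
    }


def add_statement(policy_json: str, statement: dict) -> str:
    """Append a statement to a key policy document and return valid JSON."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # tolerate a single statement object
        statements = [statements]
    policy["Statement"] = statements + [statement]
    return json.dumps(policy, indent=4)
```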

## Step 3: Choose an S3 bucket
<a name="discovery-results-repository-s3-choose-bucket"></a>

After you verify your permissions and configure the AWS KMS key, you're ready to specify which S3 bucket you want to use as the repository for your sensitive data discovery results. You have two options:
+ **Use a new S3 bucket that Macie creates** – If you choose this option, Macie automatically creates a new S3 general purpose bucket in the current AWS Region for your discovery results. Macie also applies a bucket policy to the bucket. The policy allows Macie to add objects to the bucket. It also requires the objects to be encrypted with the AWS KMS key that you specify, using SSE-KMS encryption. To review the policy, choose **View policy** on the Amazon Macie console after you specify a name for the bucket and the KMS key to use.
+ **Use an existing S3 bucket that you create** – If you prefer to store your discovery results in a particular S3 bucket that you create, create the bucket before you proceed. The bucket must be a general purpose bucket. In addition, the bucket's settings and policy must allow Macie to add objects to the bucket. This topic explains which settings to check and how to update the policy. It also provides examples of the statements to add to the policy.

The following sections provide instructions for each option. Choose the section for the option that you want.

### Use a new S3 bucket that Macie creates
<a name="discovery-results-repository-s3-new-bucket"></a>

If you prefer to use a new S3 bucket that Macie creates for you, the final step in the process is to configure the repository settings in Macie.

**To configure the repository settings in Macie**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, under **Settings**, choose **Discovery results**.

1. Under **Repository for sensitive data discovery results**, choose **Create bucket**.

1. In the **Create a bucket** box, enter a name for the bucket.

   The name must be unique across all S3 buckets. In addition, the name can consist only of lowercase letters, numbers, dots (.), and hyphens (-). For additional naming requirements, see [Bucket naming rules](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html) in the *Amazon Simple Storage Service User Guide*.

1. Expand the **Advanced** section.

1. (Optional) To specify a prefix to use in the path to a location in the bucket, enter the prefix in the **Data discovery result prefix** box.

   When you enter a value, Macie updates the example below the box to show the path to the bucket location where it will store your discovery results.

1. For **Block all public access**, choose **Yes** to enable all block public access settings for the bucket.

   For information about these settings, see [Blocking public access to your Amazon S3 storage](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-control-block-public-access.html) in the *Amazon Simple Storage Service User Guide*.

1. Under **Encryption settings**, specify the AWS KMS key that you want Macie to use to encrypt the results:
   + To use a key from your own account, choose **Select a key from your account**. Then, in the **AWS KMS key** list, choose the key to use. The list displays customer managed, symmetric encryption KMS keys for your account.
   + To use a key that another account owns, choose **Enter the ARN of a key from another account**. Then, in the **AWS KMS key ARN** box, enter the Amazon Resource Name (ARN) of the key to use—for example, **`arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab`**.

1. When you finish entering the settings, choose **Save**.

   Macie tests the settings to verify that they're correct. If any settings are incorrect, Macie displays an error message to help you address the issue.

After you save the repository settings, Macie adds existing discovery results for the preceding 90 days to the repository. Macie also starts adding new discovery results to the repository.
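Before you enter a bucket name, you can screen it against the basic naming rules. This Python sketch is deliberately partial. It checks length, allowed characters, the first and last character, adjacent periods, IPv4 address format, and two reserved affixes (`xn--` and `-s3alias`). It doesn't check every rule in the Amazon S3 documentation, and it can't check global uniqueness. `check_bucket_name` is a hypothetical helper.

```python
from __future__ import annotations

import ipaddress
import re

# Lowercase letters, numbers, dots, and hyphens; must start and end with a letter or number.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")


def check_bucket_name(name: str) -> list[str]:
    """Return the basic naming rules that the name breaks (empty list if none)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("must be 3 to 63 characters long")
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("must contain only lowercase letters, numbers, dots, and hyphens, "
                        "and begin and end with a letter or number")
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        pass  # not an IPv4 address, which is what we want
    else:
        problems.append("must not be formatted as an IP address")
    if name.startswith("xn--"):
        problems.append("must not start with the reserved prefix xn--")
    if name.endswith("-s3alias"):
        problems.append("must not end with the reserved suffix -s3alias")
    return problems
```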

### Use an existing S3 bucket that you create
<a name="discovery-results-repository-s3-existing-bucket"></a>

If you prefer to store your sensitive data discovery results in a particular S3 bucket that you create, create and configure the bucket before you configure the settings in Macie. When you create the bucket, note the following requirements:
+ The bucket must be a general purpose bucket. It can't be another type of bucket, such as a directory bucket.
+ To store your discovery results for a Region that's enabled by default for AWS accounts, such as the US East (N. Virginia) Region, the bucket has to be in a Region that's enabled by default. The results can't be stored in a bucket in an opt-in Region (Region that's disabled by default).
+ To store your discovery results for an opt-in Region, such as the Middle East (Bahrain) Region, the bucket has to be in the same Region or a Region that's enabled by default. The results can't be stored in a bucket in a different opt-in Region.

To determine whether a Region is enabled by default, see [Enable or disable AWS Regions in your account](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-regions.html) in the *AWS Account Management User Guide*.

After you create the bucket, update the bucket's policy to allow Macie to retrieve information about the bucket and add objects to the bucket. You can then configure the settings in Macie.

**To update the bucket policy for the bucket**

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. Choose the bucket that you want to store your discovery results in.

1. Choose the **Permissions** tab.

1. In the **Bucket policy** section, choose **Edit**.

1. Copy the following example policy to your clipboard:

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "Allow Macie to use the GetBucketLocation operation",
               "Effect": "Allow",
               "Principal": {
                   "Service": "macie.amazonaws.com"
               },
               "Action": "s3:GetBucketLocation",
               "Resource": "arn:aws:s3:::amzn-s3-demo-bucket",
               "Condition": {
                   "StringEquals": {
                       "aws:SourceAccount": "111122223333"
                   },
                   "ArnLike": {
                       "aws:SourceArn": [
                           "arn:aws:macie2:us-east-1:111122223333:export-configuration:*",
                           "arn:aws:macie2:us-east-1:111122223333:classification-job/*"
                       ]
                   }
               }
           },
           {
               "Sid": "Allow Macie to add objects to the bucket",
               "Effect": "Allow",
               "Principal": {
                   "Service": "macie.amazonaws.com"
               },
               "Action": "s3:PutObject",
               "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/[optional prefix/]*",
               "Condition": {
                   "StringEquals": {
                       "aws:SourceAccount": "111122223333"
                   },
                   "ArnLike": {
                       "aws:SourceArn": [
                           "arn:aws:macie2:us-east-1:111122223333:export-configuration:*",
                           "arn:aws:macie2:us-east-1:111122223333:classification-job/*"
                       ]
                   }
               }
           },
           {
               "Sid": "Deny unencrypted object uploads. This is optional",
               "Effect": "Deny",
               "Principal": {
                   "Service": "macie.amazonaws.com"
               },
               "Action": "s3:PutObject",
               "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/[optional prefix/]*",
               "Condition": {
                   "StringNotEquals": {
                       "s3:x-amz-server-side-encryption": "aws:kms"
                   }
               }
           },
           {
               "Sid": "Deny incorrect encryption headers. This is optional",
               "Effect": "Deny",
               "Principal": {
                   "Service": "macie.amazonaws.com"
               },
               "Action": "s3:PutObject",
               "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/[optional prefix/]*",
               "Condition": {
                   "StringNotEquals": {
                       "s3:x-amz-server-side-encryption-aws-kms-key-id": "arn:aws:kms:us-east-1:111122223333:key/KMSKeyId"
                   }
               }
           },
           {
               "Sid": "Deny non-HTTPS access",
               "Effect": "Deny",
               "Principal": "*",
               "Action": "s3:*",
               "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",
               "Condition": {
                   "Bool": {
                       "aws:SecureTransport": "false"
                   }
               }
           }
       ]
   }
   ```

1. Paste the example policy in the **Bucket policy** editor on the Amazon S3 console.

1. Update the example policy with the correct values for your environment:
   + In the optional statement that denies incorrect encryption headers:
     + Replace *amzn-s3-demo-bucket* with the name of the bucket. To also specify a prefix for a path to a location in the bucket, replace *[optional prefix/]* with the prefix. Otherwise, remove the *[optional prefix/]* placeholder value.
     + In the `StringNotEquals` condition, replace *arn:aws:kms:us-east-1:111122223333:key/KMSKeyId* with the Amazon Resource Name (ARN) of the AWS KMS key to use for encryption of your discovery results.
   + In all other statements, replace the placeholder values, where:
     + *amzn-s3-demo-bucket* is the name of the bucket.
     + *[optional prefix/]* is the prefix for a path to a location in the bucket. Remove this placeholder value if you don't want to specify a prefix.
     + *111122223333* is the account ID for your AWS account.
     + *us-east-1* is the Region code for the AWS Region in which you're using Macie and want to allow Macie to add discovery results to the bucket.

       If you use Macie in multiple Regions and want to allow Macie to add results to the bucket for additional Regions, add ARNs for each additional Region to the `aws:SourceArn` condition. For example:

       ```
       "aws:SourceArn": [
           "arn:aws:macie2:us-east-1:111122223333:export-configuration:*",
           "arn:aws:macie2:us-east-1:111122223333:classification-job/*",
           "arn:aws:macie2:us-west-2:111122223333:export-configuration:*",
           "arn:aws:macie2:us-west-2:111122223333:classification-job/*"
       ]
       ```

       Alternatively, you can allow Macie to add results to the bucket for all Regions in which you use Macie. To do this, replace the placeholder value with the wildcard character (`*`). For example:

       ```
       "aws:SourceArn": [
           "arn:aws:macie2:*:111122223333:export-configuration:*",
           "arn:aws:macie2:*:111122223333:classification-job/*"
       ]
       ```
   + If you're using Macie in an opt-in Region, add the appropriate Region code to the value for the `Service` field in each statement that specifies the Macie service principal. For example, if you're using Macie in the Middle East (Bahrain) Region, which has the Region code *me-south-1*, replace `macie.amazonaws.com` with `macie.me-south-1.amazonaws.com` in each applicable statement.

     For a list of Regions where Macie is currently available and the Region code for each one, see [Amazon Macie endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/macie.html) in the *AWS General Reference*.

   Note that the example policy includes statements that allow Macie to determine which Region the bucket resides in (`GetBucketLocation`) and add objects to the bucket (`PutObject`). These statements define conditions that use two IAM global condition keys:
   + [aws:SourceAccount](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourceaccount) – This condition allows Macie to add sensitive data discovery results to the bucket only for your account. It prevents Macie from adding discovery results for other accounts to the bucket. More specifically, the condition specifies which account can use the bucket for the resources and actions specified by the `aws:SourceArn` condition.

     To store results for additional accounts in the bucket, add the account ID for each additional account to this condition. For example:

     ```
     "aws:SourceAccount": ["111122223333", "444455556666"]
     ```
   + [aws:SourceArn](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourcearn) – This condition restricts access to the bucket based on the source of the objects that are being added to the bucket. It prevents other AWS services from adding objects to the bucket. It also prevents Macie from adding objects to the bucket while performing other actions for your account. More specifically, the condition allows Macie to add objects to the bucket only if the objects are sensitive data discovery results, and only if those results are for automated sensitive data discovery or for sensitive data discovery jobs that were created by the specified account in the specified Region.

     To allow Macie to perform the specified actions for additional accounts, add the ARNs for each additional account to this condition. For example:

     ```
     "aws:SourceArn": [
         "arn:aws:macie2:us-east-1:111122223333:export-configuration:*",
         "arn:aws:macie2:us-east-1:111122223333:classification-job/*",
         "arn:aws:macie2:us-east-1:444455556666:export-configuration:*",
         "arn:aws:macie2:us-east-1:444455556666:classification-job/*"
     ]
     ```

   The accounts specified by the `aws:SourceAccount` and `aws:SourceArn` conditions should match.

   Both conditions help prevent Macie from being used as a [confused deputy](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html) during transactions with Amazon S3. Although we don’t recommend it, you can remove these conditions from the bucket policy.

1. When you finish updating the bucket policy, choose **Save changes**. 
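If you manage the bucket policy as code instead of editing it in the console, you can generate the condition values from a single list of accounts. That way, the accounts in `aws:SourceAccount` and `aws:SourceArn` always match. The following Python sketch shows one way to do this. The function names are hypothetical, and the ARN patterns are the ones shown in the examples above. The sketch doesn't know which Regions are opt-in Regions, so you supply that information yourself.

```python
def macie_source_conditions(account_ids, regions):
    """Build matching values for the aws:SourceAccount and aws:SourceArn keys.

    account_ids: 12-digit AWS account IDs, kept as strings so that IDs with
    leading zeros aren't mangled by JSON number handling.
    regions: Region codes such as "us-east-1", or ["*"] for all Regions.
    """
    for account_id in account_ids:
        if len(account_id) != 12 or not account_id.isdigit():
            raise ValueError(f"not a 12-digit account ID: {account_id!r}")
    source_arns = []
    for account_id in account_ids:
        for region in regions:
            # ARN pattern for the export configuration
            source_arns.append(
                f"arn:aws:macie2:{region}:{account_id}:export-configuration:*")
            # ARN pattern for sensitive data discovery (classification) jobs
            source_arns.append(
                f"arn:aws:macie2:{region}:{account_id}:classification-job/*")
    return {"aws:SourceAccount": list(account_ids), "aws:SourceArn": source_arns}


def macie_service_principal(region, opt_in):
    """Service principal for policy statements; opt_in is caller-supplied."""
    return f"macie.{region}.amazonaws.com" if opt_in else "macie.amazonaws.com"
```

For example, `macie_source_conditions(["111122223333", "444455556666"], ["us-east-1"])` produces the same four ARNs as the multi-account example above. Passing `["*"]` as the Regions produces the all-Regions form. Place each value under the condition operator that your policy already uses for that key.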

You can now configure the repository settings in Macie.

**To configure the repository settings in Macie**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, under **Settings**, choose **Discovery results**.

1. Under **Repository for sensitive data discovery results**, choose **Existing bucket**.

1. For **Choose a bucket**, select the bucket that you want to store your discovery results in.

1. To specify a prefix for a path to a location in the bucket, expand the **Advanced** section. Then, for **Data discovery result prefix**, enter the prefix.

   When you enter a value, Macie updates the example below the box to show the path to the bucket location where it will store your discovery results.

1. Under **Encryption settings**, specify the AWS KMS key that you want Macie to use to encrypt the results:
   + To use a key from your own account, choose **Select a key from your account**. Then, in the **AWS KMS key** list, choose the key to use. The list displays customer managed, symmetric encryption KMS keys for your account.
   + To use a key that another account owns, choose **Enter the ARN of a key from another account**. Then, in the **AWS KMS key ARN** box, enter the ARN of the key to use—for example, **arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab**.

1. When you finish entering the settings, choose **Save**.

   Macie tests the settings to verify that they're correct. If any settings are incorrect, Macie displays an error message to help you address the issue.
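You can also save these settings programmatically. The following sketch assumes the AWS SDK for Python (Boto3). In Boto3, the `macie2` client's `put_classification_export_configuration` operation accepts an `s3Destination` with `bucketName`, optional `keyPrefix`, and `kmsKeyArn` fields. The helper names are hypothetical. The sketch doesn't create or update the bucket policy, which must already allow Macie to write to the bucket.

```python
def build_export_configuration(bucket_name, kms_key_arn, key_prefix=None):
    """Build the configuration for Macie's PutClassificationExportConfiguration.

    The fields mirror the console settings: bucket, optional prefix, KMS key.
    """
    # Loose sanity check only: expects a full key ARN, not a key ID or alias.
    if not (kms_key_arn.startswith("arn:") and ":kms:" in kms_key_arn):
        raise ValueError("expected the full ARN of a KMS key")
    destination = {"bucketName": bucket_name, "kmsKeyArn": kms_key_arn}
    if key_prefix:
        # Same value as the console's "Data discovery result prefix" setting
        destination["keyPrefix"] = key_prefix
    return {"s3Destination": destination}


def configure_repository(macie_client, bucket_name, kms_key_arn, key_prefix=None):
    """Save the repository settings by using a Boto3 "macie2" client."""
    configuration = build_export_configuration(bucket_name, kms_key_arn, key_prefix)
    return macie_client.put_classification_export_configuration(
        configuration=configuration)
```

For example, you might call `configure_repository(boto3.client("macie2", region_name="us-east-1"), "amzn-s3-demo-bucket", "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab", "macie-results/")`. In that call, the bucket name and prefix are placeholders. Building the request in a separate function lets you validate or log it without calling AWS.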

After you save the repository settings, Macie adds existing discovery results for the preceding 90 days to the repository. Macie also starts adding new discovery results to the repository.

**Note**  
If you subsequently change the **Data discovery result prefix** setting, also update the bucket policy in Amazon S3. Policy statements that specify the previous prefix must specify the new prefix. Otherwise, Macie won't be allowed to add your discovery results to the bucket.
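If you keep the bucket policy in a JSON file, a small script can update the affected statements when the prefix changes. This sketch assumes that the statements reference the prefix through S3 object ARNs of the form `arn:aws:s3:::bucket/prefix` in their `Resource` element. `update_results_prefix` is a hypothetical helper. Review its output before you save the policy.

```python
import copy


def update_results_prefix(policy, bucket, old_prefix, new_prefix):
    """Return a copy of a bucket policy with object ARNs moved to new_prefix.

    Only Resource values that start with arn:aws:s3:::<bucket>/<old_prefix>
    change. Include a trailing "/" in both prefixes so that a prefix such as
    "results" doesn't also match "results-archive". The input isn't modified.
    """
    old = f"arn:aws:s3:::{bucket}/{old_prefix}"
    new = f"arn:aws:s3:::{bucket}/{new_prefix}"
    updated = copy.deepcopy(policy)
    statements = updated.get("Statement", [])
    if isinstance(statements, dict):  # a single statement object is also valid
        statements = [statements]
    for statement in statements:
        resource = statement.get("Resource")
        if isinstance(resource, str):
            if resource.startswith(old):
                statement["Resource"] = new + resource[len(old):]
        elif isinstance(resource, list):
            statement["Resource"] = [
                new + r[len(old):] if r.startswith(old) else r for r in resource
            ]
    return updated
```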

**Tip**  
To reduce server-side encryption costs, also configure the S3 bucket to use an S3 Bucket Key, and specify the AWS KMS key that you configured for encryption of your sensitive data discovery results. Use of an S3 Bucket Key reduces the number of calls to AWS KMS, which can reduce AWS KMS request costs. If the KMS key is in an external key store, use of an S3 Bucket Key can also minimize the performance impact of using the key. To learn more, see [Reducing the cost of SSE-KMS with Amazon S3 Bucket Keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-key.html) in the *Amazon Simple Storage Service User Guide*.
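As a sketch, the following function builds a server-side encryption configuration that enables an S3 Bucket Key for a bucket that uses SSE-KMS. The result has the shape that the Amazon S3 `PutBucketEncryption` operation expects. The function name is hypothetical. You would pass the result to a Boto3 `s3` client as `put_bucket_encryption(Bucket="amzn-s3-demo-bucket", ServerSideEncryptionConfiguration=bucket_key_encryption_config(key_arn))`, where the bucket name is a placeholder.

```python
def bucket_key_encryption_config(kms_key_arn):
    """ServerSideEncryptionConfiguration that uses SSE-KMS with an S3 Bucket Key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    # The same key that Macie uses to encrypt discovery results
                    "KMSMasterKeyID": kms_key_arn,
                },
                # Reduces the number of requests from Amazon S3 to AWS KMS
                "BucketKeyEnabled": True,
            }
        ]
    }
```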

# Supported storage classes and formats
<a name="discovery-supported-storage"></a>

To help you discover sensitive data in your Amazon Simple Storage Service (Amazon S3) data estate, Amazon Macie supports most Amazon S3 storage classes and a wide variety of file and storage formats. This support applies to the use of [managed data identifiers](managed-data-identifiers.md) and the use of [custom data identifiers](custom-data-identifiers.md) to analyze S3 objects.

For Macie to analyze an S3 object, the object has to be stored in an Amazon S3 general purpose bucket using a supported storage class. The object also has to use a supported file or storage format. The topics in this section list the storage classes and the file and storage formats that Macie currently supports.

**Tip**  
Although Macie is optimized for Amazon S3, you can use it to discover sensitive data in resources that you currently store elsewhere. You can do this by moving the data to Amazon S3 temporarily or permanently. For example, export Amazon Relational Database Service or Amazon Aurora snapshots to Amazon S3 in Apache Parquet format. Or export an Amazon DynamoDB table to Amazon S3. You can then create a sensitive data discovery job to analyze the data in Amazon S3.

**Topics**
+ [Supported Amazon S3 storage classes](#discovery-supported-s3-classes)
+ [Supported file and storage formats](#discovery-supported-formats)

## Supported Amazon S3 storage classes
<a name="discovery-supported-s3-classes"></a>

For sensitive data discovery, Amazon Macie supports the following Amazon S3 storage classes:
+ Reduced Redundancy Storage (RRS)
+ S3 Glacier Instant Retrieval
+ S3 Intelligent‐Tiering
+ S3 One Zone‐Infrequent Access (S3 One Zone‐IA)
+ S3 Standard
+ S3 Standard‐Infrequent Access (S3 Standard‐IA)

Macie doesn’t analyze S3 objects that use other Amazon S3 storage classes, such as S3 Glacier Deep Archive or S3 Express One Zone. In addition, Macie doesn't analyze objects that are stored in S3 directory buckets.
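To check objects before you include them in a job, you can compare each object's storage class with the supported list. The following sketch uses the storage class values that the Amazon S3 API reports, for example in `ListObjectsV2` results. The mapping from those values to the class names above is this example's own, and the function name is hypothetical. The sketch checks individual objects only.

```python
# Amazon S3 API storage class values for the storage classes listed above.
SUPPORTED_STORAGE_CLASSES = frozenset({
    "STANDARD",             # S3 Standard
    "REDUCED_REDUNDANCY",   # Reduced Redundancy Storage (RRS)
    "STANDARD_IA",          # S3 Standard-IA
    "ONEZONE_IA",           # S3 One Zone-IA
    "INTELLIGENT_TIERING",  # S3 Intelligent-Tiering
    "GLACIER_IR",           # S3 Glacier Instant Retrieval
})


def is_supported_storage_class(storage_class):
    """True if an object in this storage class can be analyzed by Macie.

    HeadObject omits the storage class for S3 Standard objects, so a missing
    value (None or "") is treated as STANDARD.
    """
    return (storage_class or "STANDARD") in SUPPORTED_STORAGE_CLASSES
```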

If you configure a sensitive data discovery job to analyze S3 objects that don't use a supported Amazon S3 storage class, Macie skips those objects when the job runs. Macie doesn't attempt to retrieve or analyze data in the objects—the objects are treated as *unclassifiable objects*. An *unclassifiable object* is an object that doesn't use a supported storage class or a supported file or storage format. Macie analyzes only those objects that use a supported storage class and a supported file or storage format.

Similarly, if you configure Macie to perform automated sensitive data discovery, unclassifiable objects aren't eligible for selection and analysis. Macie selects only those objects that use a supported Amazon S3 storage class and a supported file or storage format.

To identify S3 buckets that store unclassifiable objects, you can [filter your S3 bucket inventory](monitoring-s3-inventory-filter.md). For each bucket in your inventory, there are fields that report the number and total storage size of unclassifiable objects in the bucket.
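The same inventory data is available through the Macie API. The following sketch summarizes it from the `buckets` list that the `DescribeBuckets` operation returns (`describe_buckets` in Boto3). It assumes that each entry has `unclassifiableObjectCount` and `unclassifiableObjectSizeInBytes` objects with `fileType`, `storageClass`, and `total` values. Check those field names against the Macie API reference before you rely on them. The function name is hypothetical.

```python
def buckets_with_unclassifiable_objects(buckets):
    """Summarize buckets that contain at least one unclassifiable object.

    `buckets` is assumed to be the "buckets" list from a DescribeBuckets
    response; missing statistics are treated as zero.
    """
    summary = []
    for bucket in buckets:
        count = bucket.get("unclassifiableObjectCount", {})
        size = bucket.get("unclassifiableObjectSizeInBytes", {})
        if count.get("total", 0) > 0:
            summary.append({
                "bucketName": bucket["bucketName"],
                "objects": count.get("total", 0),
                "bytes": size.get("total", 0),
                # Unclassifiable because of an unsupported storage class
                "storageClassObjects": count.get("storageClass", 0),
                # Unclassifiable because of an unsupported file type
                "fileTypeObjects": count.get("fileType", 0),
            })
    return summary
```

`describe_buckets` returns results in pages, so a complete inventory requires following `nextToken` until the response no longer includes it.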

For detailed information about the storage classes that Amazon S3 provides, see [Using Amazon S3 storage classes](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html) in the *Amazon Simple Storage Service User Guide*.

## Supported file and storage formats
<a name="discovery-supported-formats"></a>

When Amazon Macie analyzes an S3 object, Macie retrieves the latest version of the object from Amazon S3, and then performs a deep inspection of the object's contents. This inspection takes the file or storage format of the data into account. Macie can analyze data in many different formats, including commonly used compression and archive formats.

When Macie analyzes data in a compressed or archive file, Macie inspects both the full file and the contents of the file. To inspect the file’s contents, Macie decompresses the file, and then inspects each extracted file that uses a supported format. Macie can do this for as many as 1,000,000 files and up to a nested depth of 10 levels. For information about additional quotas that apply to sensitive data discovery, see [Quotas for Macie](macie-quotas.md).
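To estimate locally whether an archive stays within these limits before you upload it, you can count its files and nesting depth. The following Python sketch handles only nested ZIP archives and uses the standard `zipfile` module. It counts each nested archive as a file in addition to its contents, and it treats the outermost archive as level 1. Both choices are assumptions about how the limits are counted, so treat the result as a rough check.

```python
import io
import zipfile

MAX_FILES = 1_000_000  # limits stated above
MAX_DEPTH = 10


def archive_stats(data, depth=1):
    """Return (file_count, deepest_level) for a ZIP archive given as bytes."""
    files = 0
    deepest = depth
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            files += 1
            if info.filename.lower().endswith(".zip"):
                # Recurse into the nested archive one level deeper.
                inner_files, inner_depth = archive_stats(archive.read(info), depth + 1)
                files += inner_files
                deepest = max(deepest, inner_depth)
    return files, deepest


def within_macie_limits(data):
    files, depth = archive_stats(data)
    return files <= MAX_FILES and depth <= MAX_DEPTH
```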

The following table lists and describes the types of file and storage formats that Macie can analyze to detect sensitive data. For each supported type, the table also lists the applicable file name extensions.


| File or storage type | Description | File name extensions | 
| --- | --- | --- | 
|  Big data  |  Apache Avro object containers and Apache Parquet files  |  .avro, .parquet  | 
|  Compression or archive  |  GNU Zip compressed archives, TAR archives, and ZIP compressed archives  |  .gz, .gzip, .tar, .zip  | 
|  Document  |  Adobe Portable Document Format files, Microsoft Excel workbooks, and Microsoft Word documents  |  .doc, .docx, .pdf, .xls, .xlsx  | 
|  Email message  |  Electronic mail files whose contents comply with the requirements specified by an IETF RFC for electronic mail messages, such as [RFC 2822](https://www.rfc-editor.org/rfc/rfc2822)  |  .eml  | 
|  Text  |  Non-binary text files, such as comma-separated values (CSV) files, Extensible Markup Language (XML) files, Hypertext Markup Language (HTML) files, JavaScript Object Notation (JSON) files, JSON Lines files, plaintext documents, tab-separated values (TSV) files, and YAML files  |  Depending on the type of non-binary text file: .csv, .htm, .html, .json, .jsonl, .tsv, .txt, .xml, .yaml, .yml, and others  | 

Macie doesn’t analyze data in images, audio, video, or other types of multimedia content.
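As a rough pre-screen, you can compare object keys against the file name extensions in the table. The following sketch does that. The text row ends with "and others", so the set isn't exhaustive. Treat a key that doesn't match as one to check, not as one that is definitely unsupported. The names in the sketch are hypothetical.

```python
import os

# Extensions from the supported-formats table. The text row ends with
# "and others", so this set is deliberately incomplete.
LISTED_EXTENSIONS = frozenset({
    ".avro", ".parquet",                         # big data
    ".gz", ".gzip", ".tar", ".zip",              # compression or archive
    ".doc", ".docx", ".pdf", ".xls", ".xlsx",    # document
    ".eml",                                      # email message
    ".csv", ".htm", ".html", ".json", ".jsonl",  # text
    ".tsv", ".txt", ".xml", ".yaml", ".yml",
})


def has_listed_extension(key):
    """True if the object key's final extension appears in the table."""
    _, ext = os.path.splitext(key)
    return ext.lower() in LISTED_EXTENSIONS
```

Because only the final extension is compared, a key such as `backup.tar.gz` matches on `.gz`.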

If you configure a sensitive data discovery job to analyze S3 objects that don't use a supported file or storage format, Macie skips those objects when the job runs. Macie doesn't attempt to retrieve or analyze data in the objects—the objects are treated as *unclassifiable objects*. An *unclassifiable object* is an object that doesn't use a supported Amazon S3 storage class or a supported file or storage format. Macie analyzes only those objects that use a supported storage class and a supported file or storage format.

Similarly, if you configure Macie to perform automated sensitive data discovery, unclassifiable objects aren't eligible for selection and analysis. Macie selects only those objects that use a supported Amazon S3 storage class and a supported file or storage format.

To identify S3 buckets that store unclassifiable objects, you can [filter your S3 bucket inventory](monitoring-s3-inventory-filter.md). For each bucket in your inventory, there are fields that report the number and total storage size of unclassifiable objects in the bucket.