Document history for AWS Glue DataBrew Developer Guide

Current API version: databrew-2017-07-25

The following table describes the documentation for this release of AWS Glue DataBrew. If you want to be notified when the AWS Glue DataBrew Developer Guide is updated, you can subscribe to the RSS feed.

Change	Description	Date
glue:GetCustomEntityType added to AWS managed policies	This permission is required to execute AWS Glue DataBrew profile jobs with PII-identification enabled. For more information, see AWS Glue DataBrew updates to AWS managed policies.	March 20, 2024
Support for multiple hashing algorithms in the CRYPTOGRAPHIC_HASH transformation	You can now specify a hashing algorithm when hashing values in a column. For more information, see CRYPTOGRAPHIC_HASH.	August 11, 2023
glue:BatchGetCustomEntityTypes added to AWS managed policies	This permission is required to execute AWS Glue DataBrew profile jobs with PII-identification enabled. For more information, see AWS Glue DataBrew updates to AWS managed policies.	May 9, 2022
Support for Apache ORC file format	DataBrew now supports Apache ORC as a file format for DataBrew data sources and outputs. For more information, see Supported file types for data sources.	March 31, 2022
Support for cross-account AWS Glue Data Catalog Amazon S3 access	You can now access AWS Glue Data Catalog S3 tables from other AWS accounts if an appropriate resource policy is created in the AWS Glue console. After creating a policy, the relevant Data Catalog S3 tables can be selected as input sources when creating a DataBrew dataset. For more information, see Supported connections for data sources and outputs.	March 11, 2022
Support for native console integration with Amazon AppFlow	DataBrew now has native console integration with Amazon AppFlow. This integration means that you can connect to data from Salesforce, Zendesk, Slack, ServiceNow, and other software-as-a-service (SaaS) applications. You can also connect to data from AWS services such as Amazon S3 and Amazon Redshift. For more information, see Supported connections for data sources and outputs.	November 18, 2021
Support for data quality rules	DataBrew now supports the creation of data quality rules, which are customizable validation checks that define business requirements for specific data. For more information, see Validating data quality in AWS Glue DataBrew.	November 18, 2021
Support for custom SQL statements	DataBrew now supports custom SQL statements for retrieving data from Amazon Redshift and Snowflake. This support means that you can use a purpose-built query to select and limit the data returned from large tables. For more information, see Supported connections for data sources and outputs.	November 18, 2021
Support for PII detection	DataBrew now supports detection of personally identifiable information (PII). This gives you the option of masking PII during data preparation. For more information, see Identifying and handling personally identifiable information (PII).	November 18, 2021
Support for additional AWS Regions	DataBrew now supports additional AWS Regions. For a list of supported Regions, see AWS Glue DataBrew endpoints and quotas.	October 5, 2021
Support for writing data to Lake Formation-based Amazon S3 tables	DataBrew now supports writing data into AWS Glue Data Catalog S3 tables based on AWS Lake Formation. DataBrew also now supports writing data into Tableau Hyper format. For more information, see Creating and working with AWS Glue DataBrew recipe jobs.	August 13, 2021
Support for writing data into JDBC destinations	DataBrew now supports writing data directly into JDBC-supported databases and data warehouses. These include Amazon Redshift, Snowflake, Microsoft SQL Server, MySQL, Oracle Database, and PostgreSQL. For more information, see Creating and working with AWS Glue DataBrew recipe jobs.	July 23, 2021
Support for specifying which data quality statistics are generated for a profile job	DataBrew now supports specifying which data quality statistics are autogenerated for datasets in a profile job. For more information, see Creating and working with AWS Glue DataBrew recipe jobs.	July 23, 2021
Support for writing datasets into the AWS Glue Data Catalog	DataBrew now includes support for writing datasets directly into the AWS Glue Data Catalog. You can store the datasets created by jobs that run your data preparation recipes as Data Catalog tables for Amazon S3, Amazon Redshift, and Amazon RDS. The supported RDS tables include those for Amazon Aurora, RDS for Oracle, RDS for Microsoft SQL Server, RDS for MySQL, and RDS for PostgreSQL.	June 30, 2021
Support for identifying advanced data types	DataBrew can now automatically identify and mark advanced data types for columns, which makes it easier to normalize columns that contain certain types of data. These types include Social Security number, email address, phone number, gender, credit card, URL, IP address, date and time, currency, ZIP code, country, region, state, and city.	June 30, 2021
Support for using Amazon AppFlow to transfer data from SaaS applications	DataBrew now supports using Amazon AppFlow to transfer data into Amazon S3 from third-party software-as-a-service (SaaS) applications such as Salesforce, Zendesk, Slack, and ServiceNow. For more information, see Supported connections for data sources and outputs.	April 29, 2021
Support for creating DataBrew datasets with input from JDBC databases	DataBrew now supports creating datasets from data in JDBC-supported databases and data warehouses, including Amazon Redshift, Snowflake, Microsoft SQL Server, MySQL, Oracle Database, and PostgreSQL. For more information, see Supported connections for data sources and outputs.	April 2, 2021
Support for additional AWS Regions	DataBrew now supports additional AWS Regions. For a list of supported Regions, see AWS Glue DataBrew endpoints and quotas.	January 28, 2021
New transforms for handling duplication	Four new transforms for handling duplication have been added to the DataBrew console and API. For more information, see DELETE_DUPLICATE_ROWS, FLAG_DUPLICATE_ROWS, FLAG_DUPLICATES_IN_COLUMN, and REMOVE_DUPLICATES in Data quality recipe steps.	January 28, 2021
Additional CSV delimiters	DataBrew now supports additional delimiters besides commas in comma-separated value (CSV) files used to create DataBrew datasets. For more information, see Creating and using AWS Glue DataBrew datasets.	January 28, 2021
DataBrew extension for JupyterLab	Now you can use AWS Glue DataBrew as an extension in JupyterLab. For more information, see Using DataBrew as an extension in JupyterLab.	November 20, 2020
New data preparation tool: AWS Glue DataBrew	This is the first release of the AWS Glue DataBrew Developer Guide.	November 11, 2020
