Document history for AWS Glue DataBrew Developer Guide

Current API version: databrew-2017-07-25

The following table describes important changes to the AWS Glue DataBrew Developer Guide. To be notified when the guide is updated, you can subscribe to the RSS feed.

Change | Description | Date

glue:GetCustomEntityType added to AWS managed policies

This permission is required to execute AWS Glue DataBrew profile jobs with PII-identification enabled. For more information, see AWS Glue DataBrew updates to AWS managed policies.

March 20, 2024

Support for multiple hashing algorithms in the CRYPTOGRAPHIC_HASH transformation

You can now specify a hashing algorithm when hashing values in a column. For more information, see CRYPTOGRAPHIC_HASH.

August 11, 2023

glue:BatchGetCustomEntityTypes added to AWS managed policies

This permission is required to execute AWS Glue DataBrew profile jobs with PII-identification enabled. For more information, see AWS Glue DataBrew updates to AWS managed policies.

May 9, 2022

Support for Apache ORC file format

DataBrew now supports Apache ORC as a file format for DataBrew data sources and outputs. For more information, see Supported file types for data sources.

March 31, 2022

Support for cross-account AWS Glue Data Catalog Amazon S3 access

You can now access AWS Glue Data Catalog S3 tables from other AWS accounts if an appropriate resource policy is created in the AWS Glue console. After creating a policy, the relevant Data Catalog S3 tables can be selected as input sources when creating a DataBrew dataset. For more information, see Supported connections for data sources and outputs.

March 11, 2022

Support for native console integration with Amazon AppFlow

DataBrew now has native console integration with Amazon AppFlow. This integration means that you can connect to data from Salesforce, Zendesk, Slack, ServiceNow, and other software-as-a-service (SaaS) applications. You can also connect to data from AWS services such as Amazon S3 and Amazon Redshift. For more information, see Supported connections for data sources and outputs.

November 18, 2021

Support for data quality rules

DataBrew now supports the creation of data quality rules, which are customizable validation checks that define business requirements for specific data. For more information, see Validating data quality in AWS Glue DataBrew.

November 18, 2021

Support for custom SQL statements

DataBrew now supports custom SQL statements for retrieving data from Amazon Redshift and Snowflake. This support means that you can use a purpose-built query to select and limit the data returned from large tables. For more information, see Supported connections for data sources and outputs.

November 18, 2021

Support for PII detection

DataBrew now supports detection of personally identifiable information (PII). This support gives you the option of masking PII during data preparation. For more information, see Identifying and handling personally identifiable information (PII).

November 18, 2021

Support for additional AWS Regions

DataBrew now supports additional AWS Regions. For a list of supported Regions, see AWS Glue DataBrew endpoints and quotas.

October 5, 2021

Support for writing data to Lake Formation-based Amazon S3 tables

DataBrew now supports writing data into AWS Glue Data Catalog S3 tables based on AWS Lake Formation. DataBrew also now supports writing data into Tableau Hyper format. For more information, see Creating and working with AWS Glue DataBrew recipe jobs.

August 13, 2021

Support for writing data into JDBC destinations

DataBrew now supports writing data directly into JDBC-supported databases and data warehouses. These include Amazon Redshift, Snowflake, Microsoft SQL Server, MySQL, Oracle Database, and PostgreSQL. For more information, see Creating and working with AWS Glue DataBrew recipe jobs.

July 23, 2021

Support for specifying which data quality statistics are generated for a profile job

DataBrew now supports specifying which data quality statistics are autogenerated for datasets in a profile job. For more information, see Creating and working with AWS Glue DataBrew recipe jobs.

July 23, 2021

Support for writing datasets into the AWS Glue Data Catalog

DataBrew now includes support for writing datasets directly into the AWS Glue Data Catalog. You can choose to store datasets created from jobs that run your data preparation recipes in Amazon S3, Amazon Redshift, and Amazon RDS tables in the Data Catalog. The RDS tables supported include those for Amazon Aurora, RDS for Oracle, RDS for Microsoft SQL Server, RDS for MySQL, and RDS for PostgreSQL.

June 30, 2021

Support for identifying advanced data types

DataBrew can now automatically identify and mark advanced data types for columns, which makes it easier to normalize columns that contain certain types of data. These types of data include Social Security number, email address, phone number, gender, credit card, URL, IP address, date and time, currency, ZIP code, country, region, state, and city.

June 30, 2021

Support for using Amazon AppFlow to transfer data from SaaS applications

DataBrew now supports using Amazon AppFlow to transfer data into Amazon S3 from third-party software-as-a-service (SaaS) applications such as Salesforce, Zendesk, Slack, and ServiceNow. For more information, see Supported connections for data sources and outputs.

April 29, 2021

Support for creating DataBrew datasets with input from JDBC databases

DataBrew now supports creating datasets from data in JDBC-supported databases and data warehouses, including Amazon Redshift, Snowflake, Microsoft SQL Server, MySQL, Oracle Database, and PostgreSQL. For more information, see Supported connections for data sources and outputs.

April 2, 2021

Support for additional AWS Regions

DataBrew now supports additional AWS Regions. For a list of supported Regions, see AWS Glue DataBrew endpoints and quotas.

January 28, 2021

New transforms for handling duplication

Four new transforms for handling duplication have been added to the DataBrew console and API. For more information, see DELETE_DUPLICATE_ROWS, FLAG_DUPLICATE_ROWS, FLAG_DUPLICATES_IN_COLUMN, and REMOVE_DUPLICATES in Data quality recipe steps.

January 28, 2021

Additional CSV delimiters

DataBrew now supports additional delimiters besides commas in comma-separated value (CSV) files used to create DataBrew datasets. For more information, see Creating and using AWS Glue DataBrew datasets.

January 28, 2021

DataBrew extension for JupyterLab

Now you can use AWS Glue DataBrew as an extension in JupyterLab. For more information, see Using DataBrew as an extension in JupyterLab.

November 20, 2020

New data preparation tool: AWS Glue DataBrew

This is the first release of the AWS Glue DataBrew Developer Guide.

November 11, 2020