Integration with AWS Glue - Amazon Athena

AWS Glue is a fully managed extract, transform, and load (ETL) service. One of its key capabilities is analyzing and categorizing data. You can use AWS Glue crawlers to automatically infer database and table schemas from your data in Amazon S3 and store the associated metadata in the AWS Glue Data Catalog.
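As a rough illustration of the crawler workflow, the following Python sketch builds a request for the AWS Glue `CreateCrawler` API and then starts the crawler through boto3. The helper names `crawler_request` and `create_and_start_crawler` are hypothetical, and every value passed in (crawler name, IAM role ARN, database name, S3 path) is a placeholder. The sketch assumes boto3 is installed and that credentials with the necessary AWS Glue permissions are configured.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the AWS Glue CreateCrawler API.

    All argument values are placeholders supplied by the caller.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request):
    """Create the crawler, then start it (hypothetical helper).

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so crawler_request works without the SDK

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
```

When the crawler finishes, the tables that it inferred appear in the Data Catalog database named in the request.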

Athena uses the AWS Glue Data Catalog to store and retrieve table metadata for the Amazon S3 data in your Amazon Web Services account. The table metadata lets the Athena query engine know how to find, read, and process the data that you want to query.
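To make the role of table metadata more concrete, the following Python sketch reads one table definition with the AWS Glue `GetTable` API and extracts the fields that tell a query engine where the data lives and how it is structured. The helper names `summarize_table` and `fetch_table_summary` are hypothetical. The sketch assumes the response has the usual `Table` and `StorageDescriptor` layout, and it is not a description of how Athena itself reads the catalog.

```python
def summarize_table(response):
    """Extract location, columns, and partition keys from a GetTable response."""
    table = response["Table"]
    descriptor = table["StorageDescriptor"]
    return {
        "location": descriptor["Location"],
        "columns": [(c["Name"], c["Type"]) for c in descriptor["Columns"]],
        "partition_keys": [k["Name"] for k in table.get("PartitionKeys", [])],
    }


def fetch_table_summary(database, table_name):
    """Fetch and summarize one table (hypothetical helper).

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so summarize_table works without the SDK

    glue = boto3.client("glue")
    return summarize_table(glue.get_table(DatabaseName=database, Name=table_name))
```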

To create database and table schemas in the AWS Glue Data Catalog, you can run an AWS Glue crawler on a data source from within Athena, or you can run Data Definition Language (DDL) queries directly in the Athena query editor. Then, using the schemas that you created, you can run Data Manipulation Language (DML) queries in Athena to query the data.
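The DDL-then-DML flow can also be driven programmatically. The following Python sketch builds a simple `CREATE EXTERNAL TABLE` statement and a `SELECT` statement, and submits them with the Athena `StartQueryExecution` API through boto3. The helper names (`create_table_ddl`, `select_sample`, `run_query`) are hypothetical, the comma-delimited text format is an assumption about the data, and the table name, database name, and S3 locations are placeholders.

```python
def create_table_ddl(table, columns, s3_location):
    """Build a DDL statement for a comma-delimited text table.

    columns is a list of (name, type) pairs. The delimiter is an assumption.
    """
    column_list = ", ".join(f"`{name}` {type_}" for name, type_ in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({column_list}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"LOCATION '{s3_location}'"
    )


def select_sample(table, limit=10):
    """Build a DML statement that reads a few rows from the table."""
    return f'SELECT * FROM "{table}" LIMIT {int(limit)}'


def run_query(sql, database, output_location):
    """Submit a query to Athena and return its execution ID (hypothetical helper).

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so the builders work without the SDK

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Queries run asynchronously, so a caller would typically poll the returned execution ID with `get_query_execution` before reading results.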

You can register an AWS Glue Data Catalog from an account other than your own. After you configure the required IAM permissions for AWS Glue, you can use Athena to run cross-account queries. For more information, see Cross-account access to AWS Glue data catalogs.
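As a minimal sketch of the registration step, the following Python code builds a request for the Athena `CreateDataCatalog` API with type `GLUE` and the owning account ID in the `catalog-id` parameter, and forms a fully qualified table reference for use in a cross-account query. The helper names are hypothetical, the catalog name and account ID are placeholders, and the sketch assumes the required IAM permissions are already in place in both accounts.

```python
def cross_account_catalog_request(catalog_name, owner_account_id):
    """Build keyword arguments for the Athena CreateDataCatalog API.

    owner_account_id is the ID of the account that owns the Data Catalog.
    """
    return {
        "Name": catalog_name,
        "Type": "GLUE",
        "Parameters": {"catalog-id": owner_account_id},
    }


def qualified_table(catalog_name, database, table):
    """Return a catalog-qualified table reference for use in a DML query."""
    return f'"{catalog_name}"."{database}"."{table}"'


def register_catalog(request):
    """Register the catalog with Athena (hypothetical helper).

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so the builders work without the SDK

    boto3.client("athena").create_data_catalog(**request)
```

After registration, a query such as `SELECT * FROM "shared_catalog"."sales"."orders"` (names are illustrative) would read a table from the other account's catalog.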

For more information about the AWS Glue Data Catalog, see Data Catalog and crawlers in AWS Glue in the AWS Glue Developer Guide.

For an illustrative article showing how to use AWS Glue and Athena to process XML data, see Process and analyze highly nested and large XML files using AWS Glue and Amazon Athena in the AWS Big Data Blog.

Separate charges apply to AWS Glue. For more information, see AWS Glue pricing.