Configure cross-account access to a shared AWS Glue Data Catalog using Amazon Athena
Created by Denis Avdonin (AWS)
Summary
This pattern provides step-by-step instructions, including AWS Identity and Access Management (IAM) policy samples, to configure cross-account sharing of a dataset stored in an Amazon Simple Storage Service (Amazon S3) bucket by using the AWS Glue Data Catalog. An AWS Glue crawler collects the dataset's metadata and puts it into the AWS Glue Data Catalog. The S3 bucket and the AWS Glue Data Catalog reside in an AWS account referred to as the data account. You can provide access to IAM principals in another AWS account, referred to as the consumer account. Users in the consumer account can then query the data by using the Amazon Athena serverless query engine.
Prerequisites and limitations
Prerequisites
Two active AWS accounts
An S3 bucket in one of the AWS accounts
AWS Command Line Interface (AWS CLI), installed and configured (or AWS CloudShell for running AWS CLI commands)
Product versions
This pattern works with Athena engine version 2 and Athena engine version 3 only. We recommend that you upgrade to Athena engine version 3. If you can't upgrade from Athena engine version 1 to Athena engine version 3, follow the approach in Cross-account AWS Glue Data Catalog access with Amazon Athena (AWS Big Data Blog).
Architecture
Target technology stack
Amazon Athena
Amazon Simple Storage Service (Amazon S3)
AWS Glue
AWS Identity and Access Management (IAM)
AWS Key Management Service (AWS KMS)
The following diagram shows an architecture that uses IAM permissions to share data in an S3 bucket in one AWS account (data account) with another AWS account (consumer account) through the AWS Glue Data Catalog.
The diagram shows the following workflow:
The S3 bucket policy in the data account grants permissions to an IAM role in the consumer account and to the AWS Glue crawler service role in the data account.
The AWS KMS key policy in the data account grants permissions to the IAM role in the consumer account and to the AWS Glue crawler service role in the data account.
The AWS Glue crawler in the data account discovers the schema of the data that’s stored in the S3 bucket.
The resource policy of the AWS Glue Data Catalog in the data account grants access to the IAM role in the consumer account.
A user creates a named catalog reference in the consumer account by using an AWS CLI command.
An IAM policy grants an IAM role in the consumer account access to resources in the data account. The IAM role’s trust policy allows users in the consumer account to assume the IAM role.
A user in the consumer account assumes the IAM role and accesses objects in the data catalog by using SQL queries.
The Athena serverless engine runs the SQL queries.
Note
IAM best practices recommend that you grant permissions to an IAM role and use identity federation.
Tools
Amazon Athena is an interactive query service that helps you analyze data directly in Amazon S3 by using standard SQL.
Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.
AWS Glue is a fully managed extract, transform, and load (ETL) service. It helps you reliably categorize, clean, enrich, and move data between data stores and data streams.
AWS Identity and Access Management (IAM) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.
AWS Key Management Service (AWS KMS) helps you create and control cryptographic keys to protect your data.
Epics
Set up access in the data account
Task | Description | Skills required |
---|---|---|
Grant access to data in the S3 bucket. | Create an S3 bucket policy and add it to the bucket where the data is stored. The bucket policy grants the IAM role in the consumer account and the AWS Glue crawler service role in the data account permission to list the bucket and read its objects. | Cloud administrator |
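(If required) Grant access to the data encryption key. | If the S3 bucket is encrypted by an AWS KMS key, update the key policy with a statement that allows the IAM role in the consumer account and the AWS Glue crawler service role in the data account to decrypt with the key (kms:Decrypt). | Cloud administrator |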
Grant the crawler access to the data. | Attach an IAM policy to the crawler's service role that allows the role to list the bucket and read the objects in it. | Cloud administrator |
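(If required) Grant the crawler access to the data encryption key. | If the S3 bucket is encrypted by an AWS KMS key, grant the crawler's service role permission to use the key by adding a kms:Decrypt statement for the key to the role's IAM policy. | Cloud administrator |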
Grant the IAM role in the consumer account and the crawler access to the data catalog. | Update the AWS Glue Data Catalog resource policy in the data account to grant access to the IAM role in the consumer account and to the crawler service role. A broad policy allows all AWS Glue actions on all databases and tables in the data account. You can customize the policy to grant only the permissions that the consumer principals require. For example, you can provide read-only access to specific tables or views in a database. | Cloud administrator |
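The data-account permissions in the table above can be sketched as IAM policy documents. The following Python snippet is a minimal sketch, not the pattern's own templates: the account IDs, Region, bucket name, role names, and key ID are hypothetical placeholders, and the code only builds and prints the JSON documents without applying them. It assumes read-only S3 access (s3:ListBucket on the bucket, s3:GetObject on its objects), kms:Decrypt for encrypted buckets, and the broad glue:* Data Catalog resource policy that the table describes, which you would normally narrow.

```python
"""Sketch: data-account policy documents for cross-account Athena access.

All identifiers below are hypothetical placeholders; substitute your own values.
"""
import json

DATA_ACCOUNT_ID = "111111111111"      # hypothetical data account
CONSUMER_ACCOUNT_ID = "222222222222"  # hypothetical consumer account
REGION = "us-east-1"                  # hypothetical Region
BUCKET = "example-data-bucket"        # hypothetical bucket name
CONSUMER_ROLE_ARN = f"arn:aws:iam::{CONSUMER_ACCOUNT_ID}:role/ExampleConsumerRole"
CRAWLER_ROLE_ARN = (
    f"arn:aws:iam::{DATA_ACCOUNT_ID}:role/service-role/AWSGlueServiceRole-example"
)
KMS_KEY_ARN = f"arn:aws:kms:{REGION}:{DATA_ACCOUNT_ID}:key/EXAMPLE-KEY-ID"
BUCKET_ARN = f"arn:aws:s3:::{BUCKET}"

# Both the consumer-account role and the crawler role need to read the data.
PRINCIPALS = {"AWS": [CONSUMER_ROLE_ARN, CRAWLER_ROLE_ARN]}


def bucket_policy() -> dict:
    """Bucket policy: list the bucket and read every object in it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Principal": PRINCIPALS,
             "Action": "s3:ListBucket", "Resource": BUCKET_ARN},
            {"Effect": "Allow", "Principal": PRINCIPALS,
             "Action": "s3:GetObject", "Resource": f"{BUCKET_ARN}/*"},
        ],
    }


def kms_key_policy_statement() -> dict:
    """Statement to add to the key policy; in a key policy, '*' means this key."""
    return {"Effect": "Allow", "Principal": PRINCIPALS,
            "Action": "kms:Decrypt", "Resource": "*"}


def crawler_role_policy(encrypted: bool = True) -> dict:
    """Identity policy for the crawler's service role."""
    statements = [
        {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": BUCKET_ARN},
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": f"{BUCKET_ARN}/*"},
    ]
    if encrypted:  # Only needed when the bucket is encrypted with a KMS key.
        statements.append(
            {"Effect": "Allow", "Action": "kms:Decrypt", "Resource": KMS_KEY_ARN})
    return {"Version": "2012-10-17", "Statement": statements}


def catalog_resource_policy() -> dict:
    """Data Catalog resource policy; glue:* on everything is broad, narrow it as needed."""
    glue = f"arn:aws:glue:{REGION}:{DATA_ACCOUNT_ID}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": PRINCIPALS,
            "Action": "glue:*",
            "Resource": [f"{glue}:catalog", f"{glue}:database/*", f"{glue}:table/*"],
        }],
    }


if __name__ == "__main__":
    for title, document in [
        ("S3 bucket policy", bucket_policy()),
        ("KMS key policy statement", kms_key_policy_statement()),
        ("Crawler role policy", crawler_role_policy()),
        ("Data Catalog resource policy", catalog_resource_policy()),
    ]:
        print(f"# {title}")
        print(json.dumps(document, indent=2))
```

Keeping the policies as data makes it easy to review the principals before you apply them, for example with aws s3api put-bucket-policy for the bucket policy or aws glue put-resource-policy for the Data Catalog policy.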
Set up access in the consumer account
Task | Description | Skills required |
---|---|---|
Create a named reference for the data catalog. | To create a named data catalog reference, use CloudShell or a locally installed AWS CLI to run the aws athena create-data-catalog command with --type GLUE and the data account ID as the catalog-id parameter. | Cloud administrator |
Grant the IAM role in the consumer account access to the data. | Attach a policy to the IAM role in the consumer account that grants the role cross-account access to the S3 bucket and to the AWS Glue Data Catalog in the data account. Next, update the role's trust policy to specify which users can assume the IAM role. Finally, grant users permission to assume the IAM role by attaching a policy that allows the sts:AssumeRole action on the role to the user group that they belong to. | Cloud administrator |
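(If required) Grant the IAM role in the consumer account access to the data encryption key. | If the S3 bucket is encrypted by an AWS KMS key, grant the IAM role in the consumer account permission to use the key by adding a kms:Decrypt statement for the key to the role's IAM policy. | Cloud administrator |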
Switch to the IAM role in the consumer account to access data. | As a data consumer, switch to the IAM role to access data in the data account. | Data consumer |
Access the data. | Query the data by using Athena. For example, open the Athena query editor and run a SELECT query that references the table as <catalog name>.<database name>.<table name>, where <catalog name> is the named reference that you created. Instead of using a named catalog reference, you can also refer to the catalog by its Amazon Resource Name (ARN). Note: If you use a dynamic catalog reference in a query or view, surround the reference with escaped double quotation marks (\"). For more information, see Cross-account access to AWS Glue data catalogs in the Amazon Athena User Guide. | Data consumer |
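The consumer-account steps in the table above can be sketched the same way. This is a minimal Python sketch under stated assumptions rather than the pattern's own templates: the account IDs, Region, bucket, role, user, and catalog name are hypothetical placeholders. It builds the arguments for the aws athena create-data-catalog command, the consumer role's permissions and trust policies, a group policy that allows sts:AssumeRole on the role, and two example queries: one that uses the named catalog reference and one that uses the catalog ARN form described in the Athena documentation on cross-account catalog access. The role also needs Athena permissions and access to a query result location, which the sketch does not cover.

```python
"""Sketch: consumer-account pieces for querying a shared Data Catalog with Athena.

All identifiers below are hypothetical placeholders; substitute your own values.
"""
import json
import shlex

DATA_ACCOUNT_ID = "111111111111"      # hypothetical data account
CONSUMER_ACCOUNT_ID = "222222222222"  # hypothetical consumer account
REGION = "us-east-1"                  # hypothetical Region
BUCKET = "example-data-bucket"        # hypothetical bucket name
ROLE_ARN = f"arn:aws:iam::{CONSUMER_ACCOUNT_ID}:role/ExampleConsumerRole"
USER_ARN = f"arn:aws:iam::{CONSUMER_ACCOUNT_ID}:user/example-analyst"
CATALOG_NAME = "shared_catalog"       # hypothetical named catalog reference
GLUE_PREFIX = f"arn:aws:glue:{REGION}:{DATA_ACCOUNT_ID}"


def create_data_catalog_args() -> list:
    """AWS CLI arguments that register the data account's catalog under a local name."""
    return ["aws", "athena", "create-data-catalog",
            "--name", CATALOG_NAME,
            "--type", "GLUE",
            "--parameters", f"catalog-id={DATA_ACCOUNT_ID}"]


def role_permissions_policy() -> dict:
    """Identity policy for the consumer role: read the bucket and use the catalog.

    Not covered: Athena permissions, the query result location, and kms:Decrypt
    on the key if the bucket is encrypted.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow",
             "Action": ["s3:ListBucket", "s3:GetObject"],
             "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"]},
            {"Effect": "Allow",
             "Action": "glue:*",
             "Resource": [f"{GLUE_PREFIX}:catalog",
                          f"{GLUE_PREFIX}:database/*",
                          f"{GLUE_PREFIX}:table/*"]},
        ],
    }


def role_trust_policy() -> dict:
    """Trust policy: which consumer-account users may assume the role."""
    return {"Version": "2012-10-17",
            "Statement": [{"Effect": "Allow",
                           "Principal": {"AWS": USER_ARN},
                           "Action": "sts:AssumeRole"}]}


def group_assume_role_policy() -> dict:
    """Identity policy for the users' group: allow sts:AssumeRole on the role."""
    return {"Version": "2012-10-17",
            "Statement": [{"Effect": "Allow",
                           "Action": "sts:AssumeRole",
                           "Resource": ROLE_ARN}]}


def example_queries(database: str, table: str) -> tuple:
    """Two ways to reference a shared table: by catalog name and by catalog ARN."""
    by_name = f"SELECT * FROM {CATALOG_NAME}.{database}.{table} LIMIT 10"
    # The catalog ARN is wrapped in double quotes in the SQL text.
    by_arn = f'SELECT * FROM "{GLUE_PREFIX}:catalog".{database}.{table} LIMIT 10'
    return by_name, by_arn


if __name__ == "__main__":
    print(shlex.join(create_data_catalog_args()))
    for document in (role_permissions_policy(), role_trust_policy(),
                     group_assume_role_policy()):
        print(json.dumps(document, indent=2))
    for query in example_queries("example_db", "example_table"):
        print(query)
```

Keeping the CLI command as an argument list lets you print it for CloudShell or pass it directly to subprocess.run without worrying about shell quoting.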
Related resources
Cross-account access to AWS Glue data catalogs (Athena documentation)
create-data-catalog (AWS CLI Command Reference)
Cross-account AWS Glue Data Catalog access with Amazon Athena (AWS Big Data Blog)
Security best practices in IAM (IAM documentation)
Additional information
Using Lake Formation as an alternative for cross-account sharing
You can also use AWS Lake Formation to share access to AWS Glue Data Catalog objects across accounts. Lake Formation provides fine-grained access control at the column and row level, tag-based access control, governed tables for ACID transactions, and other functionality. Although Lake Formation is well integrated with Athena, it requires additional configuration compared with this pattern's IAM-only approach. We recommend that you choose between Lake Formation and IAM-only access control in the wider context of your overall solution architecture, considering which other services are involved and how they integrate with each approach.