Configure cross-account access to a shared AWS Glue Data Catalog using Amazon Athena - AWS Prescriptive Guidance

Created by Denis Avdonin (AWS)

Environment: Production

Technologies: Data lakes; Analytics; Big data

Workload: All other workloads

AWS services: Amazon Athena; AWS Glue

Summary

This pattern provides step-by-step instructions, including AWS Identity and Access Management (IAM) policy samples, to configure cross-account sharing of a dataset stored in an Amazon Simple Storage Service (Amazon S3) bucket by using the AWS Glue Data Catalog. An AWS Glue crawler collects the dataset’s metadata and stores it in the AWS Glue Data Catalog. The S3 bucket and the AWS Glue Data Catalog reside in an AWS account referred to as the data account. You can provide access to IAM principals in another AWS account, referred to as the consumer account. Users in the consumer account can then query the data by using the Amazon Athena serverless query engine.

Prerequisites and limitations

Prerequisites

Product versions

This pattern works with Athena engine version 2 and Athena engine version 3 only. We recommend that you upgrade to Athena engine version 3. If you can’t upgrade from Athena engine version 1 to Athena engine version 3, then follow the approach from Cross-account AWS Glue Data Catalog access with Amazon Athena in the AWS Big Data Blog.

Architecture

Target technology stack

  • Amazon Athena

  • Amazon Simple Storage Service (Amazon S3)

  • AWS Glue

  • AWS Identity and Access Management (IAM)

  • AWS Key Management Service (AWS KMS)

The following diagram shows an architecture that uses IAM permissions to share data in an S3 bucket in one AWS account (data account) with another AWS account (consumer account) through the AWS Glue Data Catalog.

Sharing a dataset in an S3 bucket between a data account and a consumer account by using the AWS Glue Data Catalog.

The diagram shows the following workflow:

  1. The S3 bucket policy in the data account grants permissions to an IAM role in the consumer account and to the AWS Glue crawler service role in the data account.

  2. The AWS KMS key policy in the data account grants permissions to the IAM role in the consumer account and to the AWS Glue crawler service role in the data account.

  3. The AWS Glue crawler in the data account discovers the schema of the data that’s stored in the S3 bucket.

  4. The resource policy of the AWS Glue Data Catalog in the data account grants access to the IAM role in the consumer account.

  5. A user creates a named catalog reference in the consumer account by using an AWS CLI command.

  6. An IAM policy grants an IAM role in the consumer account access to resources in the data account. The IAM role’s trust policy allows users in the consumer account to assume the IAM role.

  7. A user in the consumer account assumes the IAM role and accesses objects in the data catalog by using SQL queries.

  8. The Athena serverless engine runs the SQL queries.

Note: IAM best practices recommend that you grant permissions to an IAM role and use identity federation.

Tools

  • Amazon Athena is an interactive query service that helps you analyze data directly in Amazon S3 by using standard SQL.

  • Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

  • AWS Glue is a fully managed extract, transform, and load (ETL) service. It helps you reliably categorize, clean, enrich, and move data between data stores and data streams.

  • AWS Identity and Access Management (IAM) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.

  • AWS Key Management Service (AWS KMS) helps you create and control cryptographic keys to protect your data.

Epics

Task | Description | Skills required

Grant access to data in the S3 bucket.

Create an S3 bucket policy based on the following template and assign the policy to the bucket where the data is stored.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<consumer account id>:role/<role name>",
          "arn:aws:iam::<data account id>:role/service-role/AWSGlueServiceRole-data-bucket-crawler"
        ]
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::data-bucket/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<consumer account id>:role/<role name>",
          "arn:aws:iam::<data account id>:role/service-role/AWSGlueServiceRole-data-bucket-crawler"
        ]
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::data-bucket"
    }
  ]
}

The bucket policy grants permissions to the IAM role in the consumer account and to the AWS Glue crawler service role in the data account.
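If you maintain this policy for more than one bucket, it can help to generate it instead of editing the JSON by hand. The following is a minimal Python sketch that rebuilds the template above by using only the standard library. The function name `build_bucket_policy` and the account IDs and role name in the example call are hypothetical placeholders, not values from this pattern.

```python
import json


def build_bucket_policy(bucket, consumer_role_arn, crawler_role_arn):
    """Return the S3 bucket policy from the template as a Python dict."""
    principals = {"AWS": [consumer_role_arn, crawler_role_arn]}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object reads apply to keys inside the bucket, hence "/*".
                "Effect": "Allow",
                "Principal": principals,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Listing applies to the bucket itself, so there is no suffix.
                "Effect": "Allow",
                "Principal": principals,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


# Hypothetical placeholder values, for illustration only.
policy = build_bucket_policy(
    bucket="data-bucket",
    consumer_role_arn="arn:aws:iam::111111111111:role/athena-consumer",
    crawler_role_arn=(
        "arn:aws:iam::222222222222:role/service-role/"
        "AWSGlueServiceRole-data-bucket-crawler"
    ),
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

You can paste the printed JSON into the bucket policy editor in the Amazon S3 console.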

Cloud administrator

(If required) Grant access to the data encryption key.

If the S3 bucket is encrypted by an AWS KMS key, grant kms:Decrypt permission on the key to the IAM role in the consumer account and to the AWS Glue crawler service role in the data account.

Update the key policy with the following statement:

{
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::<consumer account id>:role/<role name>",
      "arn:aws:iam::<data account id>:role/service-role/AWSGlueServiceRole-data-bucket-crawler"
    ]
  },
  "Action": "kms:Decrypt",
  "Resource": "arn:aws:kms:<region>:<data account id>:key/<key id>"
}
Cloud administrator

Grant the crawler access to the data.

Attach the following IAM policy to the crawler’s service role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::data-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::data-bucket"
    }
  ]
}
Cloud administrator

(If required) Grant the crawler access to the data encryption key.

If the S3 bucket is encrypted by an AWS KMS key, grant kms:Decrypt permission on the key to the crawler’s service role by attaching a policy that contains the following statement:

{
  "Effect": "Allow",
  "Action": "kms:Decrypt",
  "Resource": "arn:aws:kms:<region>:<data account id>:key/<key id>"
}
Cloud administrator

Grant the IAM role in the consumer account and the crawler access to the data catalog.

  1. Sign in to the AWS Management Console and open the AWS Glue console.

  2. In the navigation pane, under Data Catalog, choose Settings.

  3. In the Permissions section, add the following statement, and then choose Save.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<consumer account id>:role/<role name>",
          "arn:aws:iam::<data account id>:role/service-role/AWSGlueServiceRole-data-bucket-crawler"
        ]
      },
      "Action": "glue:*",
      "Resource": [
        "arn:aws:glue:<region>:<data account id>:catalog",
        "arn:aws:glue:<region>:<data account id>:database/*",
        "arn:aws:glue:<region>:<data account id>:table/*"
      ]
    }
  ]
}

This policy allows all AWS Glue actions on all databases and tables in the data account. You can customize the policy to grant only required permissions to the consumer principals. For example, you can provide read-only access to specific tables or views in a database.
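As one possible way to narrow the policy, the following Python sketch builds a statement that gives the consumer role read-only access to specific tables in one database. The list of actions is an assumption about what read queries typically need, so verify it against your own workload. The function name, database name, table names, account IDs, and role name are hypothetical. The sketch covers the consumer role only; the crawler’s service role still needs its own permissions to write table metadata.

```python
import json

# Assumed read-only action set; verify it against the queries you run.
READ_ONLY_GLUE_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetPartition",
    "glue:GetPartitions",
    "glue:BatchGetPartition",
]


def build_read_only_catalog_policy(region, data_account_id, consumer_role_arn,
                                   database, tables):
    """Return a Glue resource policy limited to one database and its tables."""
    base = f"arn:aws:glue:{region}:{data_account_id}"
    # Table access also requires the enclosing catalog and database resources.
    resources = [f"{base}:catalog", f"{base}:database/{database}"]
    resources += [f"{base}:table/{database}/{table}" for table in tables]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": [consumer_role_arn]},
                "Action": READ_ONLY_GLUE_ACTIONS,
                "Resource": resources,
            }
        ],
    }


# Hypothetical placeholder values, for illustration only.
read_only_policy = build_read_only_catalog_policy(
    region="us-east-1",
    data_account_id="222222222222",
    consumer_role_arn="arn:aws:iam::111111111111:role/athena-consumer",
    database="sales",
    tables=["orders", "customers"],
)
print(json.dumps(read_only_policy, indent=2))
```

Listing tables explicitly keeps the grant auditable, at the cost of updating the policy whenever you share a new table.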

Cloud administrator
Task | Description | Skills required

Create a named reference for the data catalog.

To create a named data catalog reference, use CloudShell or a locally installed AWS CLI to run the following command:

aws athena create-data-catalog --name <shared catalog name> --type GLUE --parameters catalog-id=<data account id>
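If you prefer to script this step in Python, the same request can be sketched with the AWS SDK for Python (Boto3). In the following example, `catalog_request` mirrors the CLI options --name, --type, and --parameters. The helper names, the catalog name, and the account ID are hypothetical, and the account ID check is an added convenience rather than part of this pattern. Boto3 is imported only inside `create_shared_catalog`, so the request builder runs without it.

```python
def catalog_request(name, data_account_id):
    """Build keyword arguments that mirror --name, --type, and --parameters."""
    if not (data_account_id.isdigit() and len(data_account_id) == 12):
        raise ValueError("AWS account IDs are 12-digit numbers")
    return {
        "Name": name,
        "Type": "GLUE",
        # catalog-id is the ID of the account that owns the Data Catalog.
        "Parameters": {"catalog-id": data_account_id},
    }


def create_shared_catalog(name, data_account_id):
    """Register the catalog reference; needs boto3 and consumer credentials."""
    import boto3  # Imported lazily so the builder above works without boto3.

    athena = boto3.client("athena")
    return athena.create_data_catalog(**catalog_request(name, data_account_id))


# Hypothetical placeholder values, for illustration only.
request = catalog_request("shared_catalog", "222222222222")
print(request)
```

Call `create_shared_catalog` from a session in the consumer account, because the named reference is created there.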
Cloud administrator

Grant the IAM role in the consumer account access to the data.

Attach the following policy to the IAM role in the consumer account to grant the role cross-account access to the data:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::data-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::data-bucket"
    },
    {
      "Effect": "Allow",
      "Action": "glue:*",
      "Resource": [
        "arn:aws:glue:<region>:<data account id>:catalog",
        "arn:aws:glue:<region>:<data account id>:database/*",
        "arn:aws:glue:<region>:<data account id>:table/*"
      ]
    }
  ]
}

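Next, use the following template for the role’s trust policy to specify which users can assume the IAM role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<consumer account id>:user/<IAM user>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Finally, grant users permission to assume the IAM role by attaching an identity-based policy that allows the sts:AssumeRole action on the role to the user group that they belong to. The trust policy itself can’t be attached to a user group, because identity-based policies don’t accept a Principal element.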
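This step involves two related but different documents, which the following Python sketch builds side by side. A trust policy is attached to the role and names its callers in a Principal element. A policy attached to a user group has no Principal element and instead names the role in its Resource element. The function names, account ID, user names, and role name are hypothetical placeholders.

```python
import json


def build_trust_policy(user_arns):
    """Trust policy for the role: lists who is allowed to assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": sorted(user_arns)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_assume_role_policy(role_arn):
    """Identity-based policy for a user group: names the role as Resource."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": role_arn,
            }
        ],
    }


# Hypothetical placeholder values, for illustration only.
trust_policy = build_trust_policy(
    ["arn:aws:iam::111111111111:user/bob", "arn:aws:iam::111111111111:user/alice"]
)
group_policy = build_assume_role_policy(
    "arn:aws:iam::111111111111:role/athena-consumer"
)
print(json.dumps(trust_policy, indent=2))
print(json.dumps(group_policy, indent=2))
```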

Cloud administrator

(If required) Grant the IAM role in the consumer account access to the data encryption key.

If the S3 bucket is encrypted by an AWS KMS key, grant kms:Decrypt permission on the key to the IAM role in the consumer account by attaching a policy that contains the following statement:

{
  "Effect": "Allow",
  "Action": "kms:Decrypt",
  "Resource": "arn:aws:kms:<region>:<data account id>:key/<key id>"
}
Cloud administrator

Switch to the IAM role in the consumer account to access data.

As a data consumer, switch to the IAM role to access data in the data account.
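Outside the console, switching roles typically means calling AWS Security Token Service (AWS STS) and using the temporary credentials that it returns. The following Python sketch shows one way to do that with Boto3. The helper names, default session name, account ID, and role name are hypothetical. Boto3 is imported only inside `assume_consumer_role`, so the two pure helpers run without it.

```python
def role_arn(account_id, role_name):
    """Format the ARN of an IAM role in the given account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def session_kwargs_from_credentials(credentials):
    """Map the Credentials part of an AssumeRole response to Session arguments."""
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }


def assume_consumer_role(account_id, role_name, session_name="athena-consumer"):
    """Return a boto3 Session that uses the role's temporary credentials."""
    import boto3  # Imported lazily so the helpers above work without boto3.

    sts = boto3.client("sts")
    response = sts.assume_role(
        RoleArn=role_arn(account_id, role_name),
        RoleSessionName=session_name,
    )
    return boto3.Session(**session_kwargs_from_credentials(response["Credentials"]))


# Hypothetical placeholder values, for illustration only.
consumer_role = role_arn("111111111111", "athena-consumer")
print(consumer_role)
```

A session returned by `assume_consumer_role` can then create an Athena client with `session.client("athena")`.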

Data consumer

Access the data.

Query data using Athena. For example, open the Athena query editor and run the following query:

SELECT * FROM <shared catalog name>.<database name>.<table name>

Instead of using a named catalog reference, you can also refer to the catalog by its Amazon Resource Name (ARN).

Note: If you use a dynamic catalog reference in a query or view, surround the reference with escaped double quotation marks (\"). For example:

SELECT * FROM \"glue:arn:aws:glue:<region>:<data account id>:catalog\".<database name>.<table name>

For more information, see Cross-account access to AWS Glue data catalogs in the Amazon Athena User Guide.
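If you build these queries in code, the two reference styles can be sketched as follows. This example assumes that the backslashes in the preceding example are string escaping, so the query text that reaches Athena contains plain double quotation marks around the dynamic reference. Confirm this for the client that you use. The function names and the Region, account ID, database, and table values are hypothetical placeholders.

```python
def named_catalog_query(catalog, database, table):
    """Query through a named catalog reference created in the consumer account."""
    return f"SELECT * FROM {catalog}.{database}.{table}"


def dynamic_catalog_query(region, data_account_id, database, table):
    """Query through the catalog ARN instead of a named reference."""
    catalog_ref = f"glue:arn:aws:glue:{region}:{data_account_id}:catalog"
    # The reference contains colons, so it is wrapped in double quotation marks.
    return f'SELECT * FROM "{catalog_ref}".{database}.{table}'


# Hypothetical placeholder values, for illustration only.
named_sql = named_catalog_query("shared_catalog", "sales", "orders")
dynamic_sql = dynamic_catalog_query("us-east-1", "222222222222", "sales", "orders")
print(named_sql)
print(dynamic_sql)
```

These helpers interpolate identifiers directly, so pass only trusted values to them.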

Data consumer

Related resources

Additional information

Using Lake Formation as an alternative for cross-account sharing

You can also use AWS Lake Formation to share access to AWS Glue catalog objects across accounts. Lake Formation provides fine-grained access control at the column and row level, tag-based access control, governed tables for ACID transactions, and other functionality. Although Lake Formation is well integrated with Athena, it requires more configuration than this pattern’s IAM-only approach. We recommend that you choose between Lake Formation and IAM-only access controls in the context of your overall solution architecture. Considerations include which other services are involved and how they integrate with each approach.