Allow Lambda function access to external Hive metastores - Amazon Athena

To invoke a Lambda function in your account, you must create a role that has the following permissions:

  • AWSLambdaVPCAccessExecutionRole – An AWS managed policy that gives the Lambda execution role permission to manage the elastic network interfaces that connect your function to a VPC. Make sure that you have enough network interfaces and IP addresses available.

  • AmazonAthenaFullAccess – The AmazonAthenaFullAccess managed policy grants full access to Athena.

  • An Amazon S3 policy to allow the Lambda function to write to S3 and to allow Athena to read from S3.

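For example, the following policy defines the permissions for the spill location s3://amzn-s3-demo-bucket/spill. The bucket-level actions (s3:GetBucketLocation and s3:ListBucket) apply to the bucket ARN, and the object-level actions (s3:GetObject and s3:PutObject) apply to the objects under the spill/ prefix.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "s3:GetBucketLocation", "s3:ListBucket" ],
      "Resource": "arn:aws:s3:::amzn-s3-demo-bucket"
    },
    {
      "Effect": "Allow",
      "Action": [ "s3:GetObject", "s3:PutObject" ],
      "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/spill/*"
    }
  ]
}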
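If you manage several spill locations, a small helper can generate this kind of policy from an s3:// URI. The following is a minimal Python sketch, not an AWS tool: the function name spill_policy is hypothetical, and it assumes that the Lambda function needs only the four actions shown above. It scopes the bucket-level actions to the bucket ARN and the object-level actions to the objects under the prefix.

```python
import json
from urllib.parse import urlparse


def spill_policy(spill_uri: str) -> dict:
    """Build an IAM policy for a spill location given as s3://bucket/prefix.

    Hypothetical helper: bucket-level actions target the bucket ARN and
    object-level actions target the objects under the prefix.
    """
    parsed = urlparse(spill_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/prefix URI, got {spill_uri!r}")
    bucket = parsed.netloc
    prefix = parsed.path.strip("/")
    objects = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Actions evaluated against the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Actions evaluated against the individual spilled objects.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": objects,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(spill_policy("s3://amzn-s3-demo-bucket/spill"), indent=2))
```

A backslash form such as s3:\\mybucket\spill is rejected with ValueError, because urlparse finds no bucket name after the scheme.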

Whenever you use IAM policies, make sure that you follow IAM best practices. For more information, see Security best practices in IAM in the IAM User Guide.
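The role requirements at the start of this topic can be gathered in one place before you create the role. The following Python sketch is an outline under stated assumptions rather than a deployment script: role_setup_plan and the example role name are hypothetical, RoleName and AssumeRolePolicyDocument mirror the parameters of the IAM CreateRole operation, and ManagedPolicyArns is simply a list of the AWS managed policies that you would then attach (for example, with AttachRolePolicy). The Amazon S3 spill policy is left out because it depends on your bucket.

```python
import json

# Trust policy that lets the AWS Lambda service assume the execution role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# The AWS managed policies named in the permission list, by ARN.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole",
    "arn:aws:iam::aws:policy/AmazonAthenaFullAccess",
]


def role_setup_plan(role_name: str) -> dict:
    """Collect what the metastore Lambda execution role needs (hypothetical helper)."""
    return {
        "RoleName": role_name,
        # CreateRole expects the trust policy as a JSON string.
        "AssumeRolePolicyDocument": json.dumps(LAMBDA_TRUST_POLICY),
        # Not a CreateRole parameter: policies to attach after the role exists.
        "ManagedPolicyArns": list(MANAGED_POLICY_ARNS),
    }


if __name__ == "__main__":
    print(json.dumps(role_setup_plan("ehms-lambda-role"), indent=2))
```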

Create Lambda functions

To create a Lambda function in your account, you need either function development permissions or the AWSLambdaFullAccess managed policy. For more information, see Identity-based IAM policies for AWS Lambda.

Because Athena uses the AWS Serverless Application Repository to create Lambda functions, the superuser or administrator who creates Lambda functions should also have IAM policies to allow Athena federated queries.

Configure permissions for catalog registration and metadata API operations

For API access to catalog registration and metadata operations, you can use the AmazonAthenaFullAccess managed policy. If you do not use the AmazonAthenaFullAccess policy, add the following API operations to your Athena policies:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "athena:ListDataCatalogs",
        "athena:GetDataCatalog",
        "athena:CreateDataCatalog",
        "athena:UpdateDataCatalog",
        "athena:DeleteDataCatalog",
        "athena:GetDatabase",
        "athena:ListDatabases",
        "athena:GetTableMetadata",
        "athena:ListTableMetadata"
      ],
      "Resource": [ "*" ]
    }
  ]
}
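If you maintain your own Athena policy instead of AmazonAthenaFullAccess, you can check it against the list above before you deploy it. In this Python sketch, missing_catalog_actions is a hypothetical helper that treats * and ? in Action values as wildcards and compares names case-insensitively. It deliberately ignores Deny statements, NotAction, Resource scoping, and conditions, so an empty result means only that every listed action appears in some Allow statement.

```python
from fnmatch import fnmatchcase

# Catalog registration and metadata operations from the policy above.
REQUIRED_CATALOG_ACTIONS = frozenset({
    "athena:ListDataCatalogs",
    "athena:GetDataCatalog",
    "athena:CreateDataCatalog",
    "athena:UpdateDataCatalog",
    "athena:DeleteDataCatalog",
    "athena:GetDatabase",
    "athena:ListDatabases",
    "athena:GetTableMetadata",
    "athena:ListTableMetadata",
})


def _as_list(value):
    """IAM accepts a single value or a list for Statement, Action, and Resource."""
    return value if isinstance(value, list) else [value]


def missing_catalog_actions(policy: dict) -> set:
    """Return required actions that no Allow statement mentions (wildcards honored)."""
    patterns = [
        action.lower()
        for stmt in _as_list(policy.get("Statement", []))
        if stmt.get("Effect") == "Allow"
        for action in _as_list(stmt.get("Action", []))
    ]
    return {
        required
        for required in REQUIRED_CATALOG_ACTIONS
        if not any(fnmatchcase(required.lower(), p) for p in patterns)
    }
```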

Call a Lambda function across regions

By default, Athena invokes Lambda functions in the same AWS Region in which it runs the query. To invoke a Lambda function in a different Region, specify the full ARN of the Lambda function.

The following example shows how a catalog in the Europe (Frankfurt) Region can specify a Lambda function in the US East (N. Virginia) Region to fetch data from the Hive metastore in the Europe (Frankfurt) Region.

arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new

When you specify the full ARN in this way, Athena can call the external-hms-service-new Lambda function in us-east-1 to fetch the Hive metastore data from eu-central-1.
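Because the Region is embedded in the ARN, you can tell from the ARN alone whether a catalog points at a function in another Region. The following Python sketch splits a Lambda function ARN into its fields; parse_lambda_arn and is_cross_region are hypothetical helpers, and the parser accepts only the arn:partition:lambda:region:account:function:name[:qualifier] form.

```python
from typing import NamedTuple


class LambdaArn(NamedTuple):
    region: str
    account_id: str
    function_name: str


def parse_lambda_arn(arn: str) -> LambdaArn:
    """Split arn:<partition>:lambda:<region>:<account>:function:<name>[:<qualifier>]."""
    parts = arn.split(":")
    if len(parts) < 7 or parts[0] != "arn" or parts[2] != "lambda" or parts[5] != "function":
        raise ValueError(f"not a Lambda function ARN: {arn!r}")
    return LambdaArn(region=parts[3], account_id=parts[4], function_name=parts[6])


def is_cross_region(arn: str, query_region: str) -> bool:
    """True when the function is outside the Region in which Athena runs the query."""
    return parse_lambda_arn(arn).region != query_region
```

For the example ARN, parse_lambda_arn reports region us-east-1, so is_cross_region returns True for queries that run in eu-central-1.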

Note

The catalog should be registered in the same AWS Region that you use to run Athena queries.

Call a Lambda function across accounts

Sometimes you might need to access a Hive metastore in a different account. For example, you might run the Hive metastore in one account and run Athena queries from another. Different groups or teams might also run their own Hive metastores in their own accounts, inside their VPCs, or you might want to access metadata from the Hive metastores of several groups or teams.

Athena uses the cross-account access support in AWS Lambda to enable cross-account access to Hive metastores.

Note

Cross-account access for Athena normally means cross-account access to both the metadata and the data in Amazon S3.

Imagine the following scenario:

  • Account 111122223333 sets up the Lambda function external-hms-service-new in us-east-1 so that Athena can access a Hive metastore running on an Amazon EMR cluster.

  • Account 111122223333 wants to allow account 444455556666 to access the Hive metastore data.

To grant account 444455556666 access to the Lambda function external-hms-service-new, account 111122223333 uses the following AWS CLI add-permission command. The command has been formatted for readability.

$ aws --profile perf-test lambda add-permission \
      --function-name external-hms-service-new \
      --region us-east-1 \
      --statement-id Id-ehms-invocation2 \
      --action "lambda:InvokeFunction" \
      --principal arn:aws:iam::444455556666:user/perf1-test

{
    "Statement": "{\"Sid\":\"Id-ehms-invocation2\",
                   \"Effect\":\"Allow\",
                   \"Principal\":{\"AWS\":\"arn:aws:iam::444455556666:user/perf1-test\"},
                   \"Action\":\"lambda:InvokeFunction\",
                   \"Resource\":\"arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new\"}"
}
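If you script this step, it can help to build the same command and the statement you expect it to return. In this Python sketch, add_permission_args only assembles the add-permission options used above as an argument list (it does not call AWS), expected_invoke_statement is a hypothetical helper that reproduces the statement from the sample output, and matches_cli_output compares the two. The CLI returns the statement as a JSON string inside JSON, so it is decoded twice.

```python
import json


def add_permission_args(profile, function_name, region, statement_id, principal_arn):
    """Assemble the aws lambda add-permission command shown above as an argument list."""
    return [
        "aws", "--profile", profile, "lambda", "add-permission",
        "--function-name", function_name,
        "--region", region,
        "--statement-id", statement_id,
        "--action", "lambda:InvokeFunction",
        "--principal", principal_arn,
    ]


def expected_invoke_statement(statement_id, principal_arn, function_arn):
    """Statement that add-permission should report for an IAM principal."""
    return {
        "Sid": statement_id,
        "Effect": "Allow",
        "Principal": {"AWS": principal_arn},
        "Action": "lambda:InvokeFunction",
        "Resource": function_arn,
    }


def matches_cli_output(cli_output: str, expected: dict) -> bool:
    """The output is {"Statement": "<JSON string>"}, so decode twice before comparing."""
    return json.loads(json.loads(cli_output)["Statement"]) == expected
```

You could pass the list from add_permission_args to subprocess.run if the AWS CLI is installed and configured with the perf-test profile.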

To check the Lambda permission, use the get-policy command, as in the following example. The command has been formatted for readability.

$ aws --profile perf-test lambda get-policy \
      --function-name arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new \
      --region us-east-1

{
    "RevisionId": "711e93ea-9851-44c8-a09f-5f2a2829d40f",
    "Policy": "{\"Version\":\"2012-10-17\",
                \"Id\":\"default\",
                \"Statement\":[{\"Sid\":\"Id-ehms-invocation2\",
                               \"Effect\":\"Allow\",
                               \"Principal\":{\"AWS\":\"arn:aws:iam::444455556666:user/perf1-test\"},
                               \"Action\":\"lambda:InvokeFunction\",
                               \"Resource\":\"arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new\"}]}"
}
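The Policy field in this output is itself a JSON-encoded string, which makes it awkward to inspect by eye when a function has many statements. The following Python sketch decodes it and lists the AWS principals that an Allow statement grants lambda:InvokeFunction. The helper name principals_allowed_to_invoke is hypothetical, and the sketch does not evaluate Deny statements or conditions.

```python
import json


def _as_list(value):
    return value if isinstance(value, list) else [value]


def principals_allowed_to_invoke(get_policy_output: str) -> set:
    """Collect AWS principals that an Allow statement grants lambda:InvokeFunction.

    The get-policy output is JSON whose "Policy" field is a JSON string,
    so it is decoded twice. Deny statements and conditions are ignored.
    """
    policy = json.loads(json.loads(get_policy_output)["Policy"])
    principals = set()
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        if not any(a in ("lambda:InvokeFunction", "lambda:*", "*") for a in actions):
            continue
        principal = stmt.get("Principal", {})
        if isinstance(principal, str):  # for example, "*"
            principals.add(principal)
        else:
            principals.update(_as_list(principal.get("AWS", [])))
    return principals
```

Run against the sample output above, the function returns only the perf1-test user in account 444455556666.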

After the permission is added, account 444455556666 can use the full ARN of the Lambda function in us-east-1 when it defines the catalog ehms:

arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new
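One way to express that catalog definition is as a request to the Athena CreateDataCatalog operation. The following Python sketch assumes the HIVE catalog type, with the Lambda function passed in a metadata-function parameter; hive_catalog_request and create_data_catalog_args are hypothetical helpers that only build the request and the equivalent aws athena create-data-catalog arguments. The region argument should be the Region in which account 444455556666 runs its Athena queries.

```python
import json


def hive_catalog_request(name: str, function_arn: str) -> dict:
    """Request body for Athena CreateDataCatalog with the HIVE type (assumed shape)."""
    return {
        "Name": name,
        "Type": "HIVE",
        # The metastore Lambda function, referenced by its full (possibly cross-account) ARN.
        "Parameters": {"metadata-function": function_arn},
    }


def create_data_catalog_args(name: str, function_arn: str, region: str) -> list:
    """Equivalent aws athena create-data-catalog invocation as an argument list."""
    request = hive_catalog_request(name, function_arn)
    # CLI shorthand for a map: key=value pairs separated by commas.
    parameters = ",".join(f"{k}={v}" for k, v in request["Parameters"].items())
    return [
        "aws", "athena", "create-data-catalog",
        "--name", request["Name"],
        "--type", request["Type"],
        "--parameters", parameters,
        "--region", region,
    ]


if __name__ == "__main__":
    arn = "arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new"
    print(json.dumps(hive_catalog_request("ehms", arn), indent=2))
```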

For information about cross region invocation, see Call a Lambda function across regions earlier in this topic.

Grant cross-account access to data

Before you can run Athena queries, you must grant cross-account access to the data in Amazon S3. You can do this in one of the following ways:

  • Update the access control list (ACL) of the Amazon S3 bucket to grant access to the other account's canonical user ID.

  • Add cross account access to the Amazon S3 bucket policy.

For example, add the following statement to the Amazon S3 bucket policy in account 111122223333 to allow the user perf1-test in account 444455556666 to read data from the specified Amazon S3 location.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1234567890123",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::444455556666:user/perf1-test"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::athena-test/lambda/dataset/*"
    }
  ]
}

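To sanity-check a bucket policy like this one, you can test whether it lets a given principal read a given object. The following Python sketch (grants_get_object is a hypothetical helper) matches the object ARN against the Resource patterns with shell-style wildcards. It is only a rough approximation of IAM policy evaluation: it ignores Deny statements, conditions, and ACLs, and it does not look at the identity-based policies in account 444455556666, which must also allow the request for cross-account access to work.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def grants_get_object(policy: dict, principal_arn: str, bucket: str, key: str) -> bool:
    """Return True if some Allow statement covers s3:GetObject on bucket/key for principal_arn."""
    object_arn = f"arn:aws:s3:::{bucket}/{key}"
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        principals = [principal] if isinstance(principal, str) else _as_list(principal.get("AWS", []))
        if principal_arn not in principals and "*" not in principals:
            continue
        # Action names are case-insensitive in IAM, so compare in lowercase.
        actions = [a.lower() for a in _as_list(stmt.get("Action", []))]
        if not any(fnmatchcase("s3:getobject", a) for a in actions):
            continue
        if any(fnmatchcase(object_arn, r) for r in _as_list(stmt.get("Resource", []))):
            return True
    return False
```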
Note

You might need to grant cross account access to Amazon S3 not only to your data, but also to your Amazon S3 spill location. Your Lambda function spills extra data to the spill location when the size of the response object exceeds a given threshold. See the beginning of this topic for a sample policy.
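Granting the querying account read access to the spill location follows the same pattern as the data grant. The following Python sketch returns a copy of a bucket policy with one extra statement that lets a principal read spilled objects. The helper name with_spill_read_access and the Sid value in the usage are hypothetical, and the sketch assumes that s3:GetObject on the spill prefix is the access your setup needs. Calling it again with the same Sid replaces the earlier statement instead of adding a duplicate.

```python
import copy


def with_spill_read_access(bucket_policy: dict, sid: str, principal_arn: str,
                           spill_bucket: str, spill_prefix: str) -> dict:
    """Return a copy of bucket_policy that also lets principal_arn read spilled objects."""
    updated = copy.deepcopy(bucket_policy)
    updated.setdefault("Version", "2012-10-17")
    existing = updated.get("Statement", [])
    if isinstance(existing, dict):  # a single statement may be written without a list
        existing = [existing]
    # Drop any earlier statement with the same Sid so repeated calls stay idempotent.
    statements = [s for s in existing if s.get("Sid") != sid]
    prefix = spill_prefix.strip("/")
    resource = f"arn:aws:s3:::{spill_bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{spill_bucket}/*"
    statements.append({
        "Sid": sid,
        "Effect": "Allow",
        "Principal": {"AWS": principal_arn},
        "Action": "s3:GetObject",
        "Resource": resource,
    })
    updated["Statement"] = statements
    return updated
```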

In this example, after cross-account access is granted, account 444455556666 can use the catalog ehms in its own account to query tables that are defined in account 111122223333.

In the following example, the SQL Workbench profile perf-test-1 is for account 444455556666. The query uses catalog ehms to access the Hive metastore and the Amazon S3 data in account 111122223333.
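In a query, the catalog name is the first part of a catalog.database.table reference. The following Python sketch builds such a query string; quote_identifier, qualified_table, and sample_query are hypothetical helpers, the database and table names in the test are placeholders, and each identifier is wrapped in double quotes (with embedded double quotes doubled), which is how identifiers are quoted in Athena SELECT statements.

```python
def quote_identifier(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified_table(catalog: str, database: str, table: str) -> str:
    """Build a catalog.database.table reference for an Athena SELECT statement."""
    return ".".join(quote_identifier(part) for part in (catalog, database, table))


def sample_query(catalog: str, database: str, table: str, limit: int = 10) -> str:
    """Preview a few rows from a table that lives in another catalog."""
    return f"SELECT * FROM {qualified_table(catalog, database, table)} LIMIT {int(limit)}"


if __name__ == "__main__":
    print(sample_query("ehms", "default", "sales"))
```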

Accessing Hive metastore and Amazon S3 data across accounts in SQL Workbench.