Using Amazon S3 Tables with AWS analytics services
Note
The integration with AWS analytics services for table buckets is in preview release and is subject to change.
To make the tables in your account accessible to AWS analytics services, you integrate your table buckets with AWS Glue Data Catalog and AWS Lake Formation. You can use this integration to work with your S3 tables in these services:
- Amazon Athena
- Amazon Redshift
- Amazon EMR
- Amazon QuickSight
- Amazon Data Firehose
Note
This integration uses the AWS Glue and AWS Lake Formation services and might incur AWS Glue request and storage costs. For more information, see AWS Glue Pricing.
Additional pricing applies for running queries on your S3 tables. For more information, see pricing information for the query engine you're using.
How the integration works
When you create a table bucket in the console with integration enabled, Amazon S3 takes the following actions to integrate the table buckets in the selected Region with AWS analytics services:
- Creates a new AWS Identity and Access Management (IAM) service role that gives Lake Formation access to all your table buckets.
- Using the service role, Lake Formation registers table buckets in the current Region. This allows Lake Formation to manage access, permissions, and governance for all current and future table buckets in that Region.
- Adds the s3tablescatalog to the AWS Glue Data Catalog in the current Region. This allows all your table buckets, namespaces, and tables to be populated in the Data Catalog.
Note
The Amazon S3 console automates these actions. If you want to integrate programmatically, you must perform these actions manually.
You integrate once per AWS Region. Once the integration is completed, all current and future table buckets, namespaces, and tables are added to the AWS Glue Data Catalog in that Region.
The following illustration shows how s3tablescatalog automatically populates table buckets, namespaces, and tables in the current Region as corresponding objects in the Data Catalog. Table buckets are populated as sub-catalogs. Namespaces within a table bucket are populated as databases within their respective sub-catalogs. Tables are populated as tables in their respective databases.
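The mapping can be sketched as a small shell helper that prints where a given S3 table appears in the Data Catalog. This is an illustration, not an AWS tool: the function names and the table name my_table are hypothetical, and the sub-catalog ID format (<account-id>:s3tablescatalog/<table-bucket-name>) is taken from the resource link example on this page.

```shell
#!/usr/bin/env bash
# Sketch: show where an S3 table appears in the AWS Glue Data Catalog after
# integration (table bucket -> sub-catalog, namespace -> database, table -> table).
# Function names are hypothetical helpers, not part of any AWS tool.
set -eu

# Sub-catalog ID for a table bucket, in the "<account-id>:s3tablescatalog/<bucket>"
# form that this page uses as a CatalogId.
table_bucket_catalog_id() {
  local account_id="$1" bucket="$2"
  printf '%s:s3tablescatalog/%s\n' "$account_id" "$bucket"
}

# Print the Data Catalog location (catalog, database, table) of one S3 table.
describe_s3_table_mapping() {
  local account_id="$1" bucket="$2" namespace="$3" table="$4"
  printf 'catalog:  %s\n' "$(table_bucket_catalog_id "$account_id" "$bucket")"
  printf 'database: %s\n' "$namespace"
  printf 'table:    %s\n' "$table"
}

# Example with this page's placeholders and a hypothetical table named my_table.
describe_s3_table_mapping 111122223333 amzn-s3-demo-table-bucket my_namespace my_table
```

The namespace passes through unchanged as the database name, which is why the resource link example later on this page uses the namespace as its DatabaseName.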
Prerequisites for integration
The following prerequisites are required to integrate S3 Tables with AWS analytics services.
- Attach the AWSLakeFormationDataAdmin AWS managed policy to your IAM principal to make that user a data lake administrator. For more information on how to create a data lake administrator, see Create a data lake administrator.
- Add permissions for the glue:PassConnection operation to your IAM principal.
- Add permissions for the lakeformation:RegisterResource operation to your IAM principal.
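The two operation permissions above could be granted with an inline IAM policy like the one this sketch writes. The "Resource": "*" scope, the policy and file names, and the IAM user name data-lake-admin in the commented commands are assumptions; narrow the resource scope if your organization requires it.

```shell
#!/usr/bin/env bash
# Sketch: write an inline IAM policy granting the two operations listed in the
# prerequisites. "Resource": "*" is an assumption, not a documented requirement.
set -eu

# Write the policy document to the given path (default shown) and print the path.
write_prereq_policy() {
  local out="${1:-S3TablesIntegrationPrereqs.json}"
  cat > "$out" <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowS3TablesIntegrationPrereqs",
            "Effect": "Allow",
            "Action": [
                "glue:PassConnection",
                "lakeformation:RegisterResource"
            ],
            "Resource": "*"
        }
    ]
}
EOF
  printf '%s\n' "$out"
}

# Usage (not run by this sketch), for a hypothetical IAM user named data-lake-admin:
#   write_prereq_policy
#   aws iam attach-user-policy --user-name data-lake-admin \
#       --policy-arn arn:aws:iam::aws:policy/AWSLakeFormationDataAdmin
#   aws iam put-user-policy --user-name data-lake-admin \
#       --policy-name S3TablesIntegrationPrereqs \
#       --policy-document file://S3TablesIntegrationPrereqs.json
```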
Integrating table buckets with AWS analytics services
This integration must be done once per AWS Region.
To integrate table buckets using the Amazon S3 console
Open the Amazon S3 console at https://console.aws.amazon.com/s3/.
In the left navigation pane, choose Table buckets.
Choose Create table bucket.
The Create table bucket page opens.
Enter a Table bucket name and make sure Enable integration is selected.
Choose Create table bucket. Amazon S3 attempts to integrate your table buckets in that Region automatically.
The first time you integrate table buckets in any Region, Amazon S3 creates a new IAM service role on your behalf. This role allows Lake Formation to access all table buckets in your account and federate access to your tables in the AWS Glue Data Catalog.
To integrate table buckets using the AWS CLI
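The following steps show how you can use the AWS CLI to integrate table buckets. To use this example, replace the user input placeholders with your own information. Use the same Region in every step, because the integration applies only to the Region in which you perform it.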
- Create a table bucket.
aws s3tables create-table-bucket \
    --region us-east-1 \
    --name amzn-s3-demo-table-bucket
- Create an IAM service role that allows Lake Formation to access your table resources.
Create the role.
aws iam create-role \
    --role-name S3TablesRoleForLakeFormation \
    --assume-role-policy-document file://Role-Trust-Policy.json
Role-Trust-Policy.json:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LakeFormationDataAccessPolicy",
            "Effect": "Allow",
            "Principal": { "Service": "lakeformation.amazonaws.com" },
            "Action": [ "sts:AssumeRole", "sts:SetContext", "sts:SetSourceIdentity" ],
            "Condition": { "StringEquals": { "aws:SourceAccount": "111122223333" } }
        }
    ]
}
Attach a policy to the role.
aws iam put-role-policy \
    --role-name S3TablesRoleForLakeFormation \
    --policy-name LakeFormationDataAccessPermissionsForS3TableBucket \
    --policy-document file://LF-GluePolicy.json
LF-GluePolicy.json:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LakeFormationPermissionsForS3ListTableBucket",
            "Effect": "Allow",
            "Action": [ "s3tables:ListTableBuckets" ],
            "Resource": [ "*" ]
        },
        {
            "Sid": "LakeFormationDataAccessPermissionsForS3TableBucket",
            "Effect": "Allow",
            "Action": [
                "s3tables:CreateTableBucket",
                "s3tables:GetTableBucket",
                "s3tables:CreateNamespace",
                "s3tables:GetNamespace",
                "s3tables:ListNamespaces",
                "s3tables:DeleteNamespace",
                "s3tables:DeleteTableBucket",
                "s3tables:CreateTable",
                "s3tables:DeleteTable",
                "s3tables:GetTable",
                "s3tables:ListTables",
                "s3tables:RenameTable",
                "s3tables:UpdateTableMetadataLocation",
                "s3tables:GetTableMetadataLocation",
                "s3tables:GetTableData",
                "s3tables:PutTableData"
            ],
            "Resource": [ "arn:aws:s3tables:us-east-1:111122223333:bucket/*" ]
        }
    ]
}
- Create the s3tablescatalog. This populates the AWS Glue Data Catalog with objects corresponding to table buckets, namespaces, and tables.
aws glue create-catalog \
    --region us-east-1 \
    --cli-input-json file://catalog.json
catalog.json:
{
    "Name": "s3tablescatalog",
    "CatalogInput": {
        "FederatedCatalog": {
            "Identifier": "arn:aws:s3tables:us-east-1:111122223333:bucket/*",
            "ConnectionName": "aws:s3tables"
        },
        "CreateDatabaseDefaultPermissions": [],
        "CreateTableDefaultPermissions": []
    }
}
- Register table buckets with Lake Formation.
aws lakeformation register-resource \
    --region us-east-1 \
    --cli-input-json file://input.json
input.json:
{
    "ResourceArn": "arn:aws:s3tables:us-east-1:111122223333:bucket/*",
    "WithFederation": true,
    "RoleArn": "arn:aws:iam::111122223333:role/S3TablesRoleForLakeFormation"
}
Verify that the s3tablescatalog was added in AWS Glue by using the following command.
aws glue get-catalog --region us-east-1 --catalog-id s3tablescatalog
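Because every CLI step above must target the same Region and account, it can help to generate catalog.json and input.json from a single pair of values so that the table bucket ARN matches in both files. The following is a minimal sketch under that assumption: the variable names REGION, ACCOUNT_ID, and ROLE_NAME and the helper functions are hypothetical, and the defaults are this page's placeholders. The aws commands appear only as comments and are not run by the script.

```shell
#!/usr/bin/env bash
# Sketch: write catalog.json and input.json for the CLI steps from one
# Region/account pair, so the table bucket ARN is identical in both files.
# Variable and function names are hypothetical; defaults are this page's placeholders.
set -eu

REGION="${REGION:-us-east-1}"
ACCOUNT_ID="${ACCOUNT_ID:-111122223333}"
ROLE_NAME="${ROLE_NAME:-S3TablesRoleForLakeFormation}"

# Wildcard ARN covering all current and future table buckets in the Region.
table_buckets_arn() {
  printf 'arn:aws:s3tables:%s:%s:bucket/*' "$REGION" "$ACCOUNT_ID"
}

# Write both input files into the given directory (default: current directory).
write_integration_inputs() {
  local dir="${1:-.}"
  local arn
  arn="$(table_buckets_arn)"

  # Input for: aws glue create-catalog --cli-input-json file://catalog.json
  cat > "$dir/catalog.json" <<EOF
{
    "Name": "s3tablescatalog",
    "CatalogInput": {
        "FederatedCatalog": {
            "Identifier": "$arn",
            "ConnectionName": "aws:s3tables"
        },
        "CreateDatabaseDefaultPermissions": [],
        "CreateTableDefaultPermissions": []
    }
}
EOF

  # Input for: aws lakeformation register-resource --cli-input-json file://input.json
  cat > "$dir/input.json" <<EOF
{
    "ResourceArn": "$arn",
    "WithFederation": true,
    "RoleArn": "arn:aws:iam::$ACCOUNT_ID:role/$ROLE_NAME"
}
EOF
}

# Usage (the aws commands are not run by this sketch):
#   write_integration_inputs .
#   aws glue create-catalog --region "$REGION" --cli-input-json file://catalog.json
#   aws lakeformation register-resource --region "$REGION" --cli-input-json file://input.json
```

Deriving both files from one ARN helper means a Region change is made in a single place rather than in each JSON file.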
Next steps
Creating a resource link to your table namespaces
To access your tables, some AWS analytics services need a resource link that targets your table's namespace. A resource link is a Data Catalog object that acts as an alias or pointer to another Data Catalog resource, such as a database or table. The link is stored in the Data Catalog of the account or Region where it's created. For more information, see How resource links work in the Lake Formation Developer Guide.
After you complete the integration, create resource links to work with your tables in the following services:
- Amazon Redshift
- Amazon Data Firehose
- Amazon EMR
- (Optional) Amazon Athena
You create resource links to your table namespaces, and then provide the name of the link to AWS analytics services so they can work with the linked tables.
To create a resource link to a table namespace
The following AWS CLI command shows how to create a resource link that you can use to connect your S3 tables to AWS analytics services. To use this example, replace the user input placeholders with your own information.
aws glue create-database \
    --region us-east-1 \
    --catalog-id "111122223333" \
    --database-input \
    '{
        "Name": "resource-link-name",
        "TargetDatabase": {
            "CatalogId": "111122223333:s3tablescatalog/amzn-s3-demo-table-bucket",
            "DatabaseName": "my_namespace"
        },
        "CreateTableDefaultPermissions": []
    }'
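If you create several resource links, building the --database-input value from its parts avoids typos in the CatalogId, which follows the <account-id>:s3tablescatalog/<table-bucket-name> form shown above. This is a sketch assuming that form; the helper name and the link name my_link are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the --database-input JSON for a resource link that targets a
# table namespace. The helper name and example values are hypothetical.
set -eu

# Arguments: link name, account ID, table bucket name, namespace.
resource_link_input() {
  local link_name="$1" account_id="$2" bucket="$3" namespace="$4"
  printf '{"Name": "%s", "TargetDatabase": {"CatalogId": "%s:s3tablescatalog/%s", "DatabaseName": "%s"}, "CreateTableDefaultPermissions": []}\n' \
    "$link_name" "$account_id" "$bucket" "$namespace"
}

# Usage (not run by this sketch):
#   aws glue create-database --region us-east-1 --catalog-id 111122223333 \
#       --database-input "$(resource_link_input my_link 111122223333 amzn-s3-demo-table-bucket my_namespace)"
```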
Grant Lake Formation permissions on your table resources
After integration, Lake Formation manages access to your table resources. Lake Formation uses its own permissions model, which enables fine-grained access control for Data Catalog resources. Lake Formation requires that each IAM principal (user or role) be authorized to perform actions on Lake Formation–managed resources. For more information, see Overview of Lake Formation permissions. Before principals can access tables in AWS analytics services, you must grant Lake Formation permissions on those resources.
You need to grant Lake Formation permissions on your tables to work with them in the following services:
- Amazon Redshift
- Amazon Data Firehose
- Amazon QuickSight
- Amazon Athena
You can grant a principal Lake Formation permissions on a table in an S3 table bucket by using either the Lake Formation console or the AWS CLI.
Note
If you are using a resource link to access your tables, you must separately grant permissions to both the resource link and the target (linked) namespace or table.
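As a sketch of the CLI route, the following helpers build the --resource JSON for the aws lakeformation grant-permissions command: one for a table in a table bucket, and one for a resource link, which needs its own grant as the note above explains. The helper names, the principal role, the link and table names, and the specific permissions (DESCRIBE on the link, SELECT and DESCRIBE on the table) are assumptions; grant what your workload requires.

```shell
#!/usr/bin/env bash
# Sketch: build --resource JSON for "aws lakeformation grant-permissions".
# Helper names and all example values are hypothetical.
set -eu

# Resource JSON for a table in a table bucket. The table's catalog is the
# table bucket's sub-catalog; its database is the namespace.
lf_table_resource() {
  local account_id="$1" bucket="$2" namespace="$3" table="$4"
  printf '{"Table": {"CatalogId": "%s:s3tablescatalog/%s", "DatabaseName": "%s", "Name": "%s"}}\n' \
    "$account_id" "$bucket" "$namespace" "$table"
}

# Resource JSON for a resource link, which is a database in the account's
# default Data Catalog.
lf_resource_link_resource() {
  local account_id="$1" link_name="$2"
  printf '{"Database": {"CatalogId": "%s", "Name": "%s"}}\n' "$account_id" "$link_name"
}

# Usage (not run by this sketch); the role ARN is hypothetical:
#   PRINCIPAL=DataLakePrincipalIdentifier=arn:aws:iam::111122223333:role/hypothetical-analyst-role
#   aws lakeformation grant-permissions --region us-east-1 \
#       --principal "$PRINCIPAL" --permissions DESCRIBE \
#       --resource "$(lf_resource_link_resource 111122223333 my_link)"
#   aws lakeformation grant-permissions --region us-east-1 \
#       --principal "$PRINCIPAL" --permissions SELECT DESCRIBE \
#       --resource "$(lf_table_resource 111122223333 amzn-s3-demo-table-bucket my_namespace my_table)"
```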