Using Amazon S3 Tables with AWS analytics services - Amazon Simple Storage Service

Using Amazon S3 Tables with AWS analytics services

Note

The integration with AWS analytics services for table buckets is in preview release and is subject to change.

To make tables in your account accessible to AWS analytics services, you integrate your table buckets with AWS Glue Data Catalog and AWS Lake Formation. After integration, you can work with your S3 tables in the services covered on this page, such as Amazon Athena, Amazon Redshift, Amazon EMR, Amazon QuickSight, and Amazon Data Firehose.

Note

This integration uses the AWS Glue and AWS Lake Formation services and might incur AWS Glue request and storage costs. For more information, see AWS Glue Pricing.

Additional pricing applies for running queries on your S3 tables. For more information, see pricing information for the query engine you're using.

How the integration works

When you create a table bucket in the console, Amazon S3 initiates the following actions to integrate table buckets in the Region that you have selected with AWS analytics services:

  1. Creates a new AWS Identity and Access Management (IAM) service role that gives Lake Formation access to all your table buckets.

  2. Using the service role, Lake Formation registers table buckets in the current Region. This allows Lake Formation to manage access, permissions, and governance for all current and future table buckets in that Region.

  3. Adds the s3tablescatalog to the AWS Glue Data Catalog in the current Region. This allows all your table buckets, namespaces, and tables to be populated in the Data Catalog.

Note

These actions are automated through the Amazon S3 console. If you want to integrate programmatically, you must perform these actions manually.

You integrate once per AWS Region. Once the integration is completed, all current and future table buckets, namespaces, and tables are added to the AWS Glue Data Catalog in that Region.

The following illustration shows how s3tablescatalog automatically populates table buckets, namespaces, and tables in the current Region as corresponding objects in the Data Catalog. Table buckets are populated as sub-catalogs. Namespaces within a table bucket are populated as databases within their respective sub-catalogs. Tables are populated as tables in their respective databases.

The ways table resources are represented in AWS Glue Data Catalog
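The mapping described above can be sketched as a small helper. This is a minimal illustration only: it assumes the catalog ID format `<account-id>:s3tablescatalog/<table-bucket-name>` that the resource link example on this page uses, and the names `CatalogLocation`, `to_catalog_location`, and `my_table` are hypothetical, not part of any AWS SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogLocation:
    """Where an S3 table appears in the Data Catalog (hypothetical helper type)."""
    catalog_id: str  # sub-catalog that represents the table bucket
    database: str    # database that represents the namespace
    table: str       # the table keeps its own name


def to_catalog_location(account_id: str, table_bucket: str,
                        namespace: str, table: str) -> CatalogLocation:
    # Assumed format: <account-id>:s3tablescatalog/<table-bucket-name>
    catalog_id = f"{account_id}:s3tablescatalog/{table_bucket}"
    return CatalogLocation(catalog_id=catalog_id, database=namespace, table=table)


loc = to_catalog_location("111122223333", "amzn-s3-demo-table-bucket",
                          "my_namespace", "my_table")
print(loc)
```

The same three values (catalog ID, database name, table name) are what the Lake Formation grant examples later on this page expect.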

Prerequisites for integration

The following prerequisites are required to integrate S3 Tables with AWS analytics services.

  • Create a table bucket.

  • Attach the AWSLakeFormationDataAdmin AWS managed policy to your IAM principal to make that user a data lake administrator. For more information on how to create a data lake administrator, see Create a data lake administrator.

  • Add permissions for the glue:PassConnection operation to your IAM principal.

  • Add permissions for the lakeformation:RegisterResource operation to your IAM principal.
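The last two prerequisites could be met with an inline IAM policy along the following lines. This is a sketch under assumptions, not an official policy: the `Sid` value is made up, and `"Resource": "*"` is a placeholder that you would likely want to scope down for your account.

```python
import json


def prerequisite_policy() -> dict:
    """Build an IAM policy document that allows the two operations listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3TablesIntegrationPrerequisites",  # hypothetical Sid
                "Effect": "Allow",
                "Action": [
                    "glue:PassConnection",
                    "lakeformation:RegisterResource",
                ],
                "Resource": "*",  # assumption: narrow this for real use
            }
        ],
    }


policy = prerequisite_policy()
print(json.dumps(policy, indent=2))
```

You could save the printed JSON to a file and attach it to your IAM principal with the tooling you normally use for IAM policies.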

Integrating table buckets with AWS analytics services

This integration must be done once per AWS Region.

  1. Open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. In the left navigation pane, choose Table buckets.

  3. Choose Create table bucket.

    The Create table bucket page opens.

  4. Enter a Table bucket name and make sure Enable integration is selected.

  5. Choose Create table bucket. Amazon S3 will attempt to automatically integrate your table buckets in that Region.

The first time you integrate table buckets in any Region, Amazon S3 creates a new IAM service role on your behalf. This role allows Lake Formation to access all table buckets in your account and federate access to your tables in AWS Glue Data Catalog.

To integrate table buckets using the AWS CLI

The following steps show how you can use the AWS CLI to integrate table buckets. To use this example, replace the user input placeholders with your own information.

  1. Create a table bucket

    aws s3tables create-table-bucket \
        --region us-east-1 \
        --name amzn-s3-demo-table-bucket

  2. Create an IAM service role that allows Lake Formation to access your table resources.

    1. Create the role

      aws iam create-role \
          --role-name S3TablesRoleForLakeFormation \
          --assume-role-policy-document file://Role-Trust-Policy.json

      Role-Trust-Policy.json:

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Sid": "LakeFormationDataAccessPolicy",
                  "Effect": "Allow",
                  "Principal": {
                      "Service": "lakeformation.amazonaws.com"
                  },
                  "Action": [
                      "sts:AssumeRole",
                      "sts:SetContext",
                      "sts:SetSourceIdentity"
                  ],
                  "Condition": {
                      "StringEquals": {
                          "aws:SourceAccount": "111122223333"
                      }
                  }
              }
          ]
      }

    2. Attach a policy to the role.

      aws iam put-role-policy \
          --role-name S3TablesRoleForLakeFormation \
          --policy-name LakeFormationDataAccessPermissionsForS3TableBucket \
          --policy-document file://LF-GluePolicy.json

      LF-GluePolicy.json:

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Sid": "LakeFormationPermissionsForS3ListTableBucket",
                  "Effect": "Allow",
                  "Action": [
                      "s3tables:ListTableBuckets"
                  ],
                  "Resource": [
                      "*"
                  ]
              },
              {
                  "Sid": "LakeFormationDataAccessPermissionsForS3TableBucket",
                  "Effect": "Allow",
                  "Action": [
                      "s3tables:CreateTableBucket",
                      "s3tables:GetTableBucket",
                      "s3tables:CreateNamespace",
                      "s3tables:GetNamespace",
                      "s3tables:ListNamespaces",
                      "s3tables:DeleteNamespace",
                      "s3tables:DeleteTableBucket",
                      "s3tables:CreateTable",
                      "s3tables:DeleteTable",
                      "s3tables:GetTable",
                      "s3tables:ListTables",
                      "s3tables:RenameTable",
                      "s3tables:UpdateTableMetadataLocation",
                      "s3tables:GetTableMetadataLocation",
                      "s3tables:GetTableData",
                      "s3tables:PutTableData"
                  ],
                  "Resource": [
                      "arn:aws:s3tables:us-east-1:111122223333:bucket/*"
                  ]
              }
          ]
      }

  3. Create the s3tablescatalog. This populates the AWS Glue Data Catalog with objects corresponding to table buckets, namespaces, and tables.

    aws glue create-catalog \
        --region us-east-1 \
        --cli-input-json file://catalog.json

    catalog.json:

    {
        "Name": "s3tablescatalog",
        "CatalogInput": {
            "FederatedCatalog": {
                "Identifier": "arn:aws:s3tables:us-east-1:111122223333:bucket/*",
                "ConnectionName": "aws:s3tables"
            },
            "CreateDatabaseDefaultPermissions": [],
            "CreateTableDefaultPermissions": []
        }
    }

  4. Register table buckets with Lake Formation

    aws lakeformation register-resource \
        --region us-east-1 \
        --cli-input-json file://input.json

    input.json:

    {
        "ResourceArn": "arn:aws:s3tables:us-east-1:111122223333:bucket/*",
        "WithFederation": true,
        "RoleArn": "arn:aws:iam::111122223333:role/S3TablesRoleForLakeFormation"
    }

  5. Verify that the s3tablescatalog was added in AWS Glue by using the following command.

    aws glue get-catalog --catalog-id s3tablescatalog
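Because the account ID and Region appear in several of the JSON files used in the steps above, it can help to generate those files from a single pair of values. The following sketch builds the trust policy, catalog input, and register-resource input as Python dictionaries. The function name `integration_inputs` is hypothetical, the documents mirror the examples on this page, and nothing here calls AWS.

```python
import json

ROLE_NAME = "S3TablesRoleForLakeFormation"  # role name used in the steps above


def integration_inputs(account_id: str, region: str) -> dict:
    """Return the JSON documents for the CLI steps, keyed by file name."""
    # One ARN pattern covering every table bucket in the Region.
    bucket_arn = f"arn:aws:s3tables:{region}:{account_id}:bucket/*"
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "LakeFormationDataAccessPolicy",
            "Effect": "Allow",
            "Principal": {"Service": "lakeformation.amazonaws.com"},
            "Action": ["sts:AssumeRole", "sts:SetContext", "sts:SetSourceIdentity"],
            "Condition": {"StringEquals": {"aws:SourceAccount": account_id}},
        }],
    }
    catalog = {
        "Name": "s3tablescatalog",
        "CatalogInput": {
            "FederatedCatalog": {
                "Identifier": bucket_arn,
                "ConnectionName": "aws:s3tables",
            },
            "CreateDatabaseDefaultPermissions": [],
            "CreateTableDefaultPermissions": [],
        },
    }
    register = {
        "ResourceArn": bucket_arn,
        "WithFederation": True,
        "RoleArn": f"arn:aws:iam::{account_id}:role/{ROLE_NAME}",
    }
    return {
        "Role-Trust-Policy.json": trust_policy,
        "catalog.json": catalog,
        "input.json": register,
    }


docs = integration_inputs("111122223333", "us-east-1")
for filename, document in docs.items():
    print(f"--- {filename}")
    print(json.dumps(document, indent=2))
```

Writing each printed document to the file name shown gives you inputs for the `create-role`, `create-catalog`, and `register-resource` commands that all refer to the same account and Region.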

To access your tables, some AWS analytics services need a resource link that targets your table's namespace. A resource link is a Data Catalog object that acts as an alias or pointer to another Data Catalog resource, such as a database or table. The link is stored in the Data Catalog of the account or Region where it's created. For more information, see How resource links work in the Lake Formation Developer Guide.

After you complete the integration, you create resource links to work with your tables in the following services:

  • Amazon Redshift

  • Amazon Data Firehose

  • Amazon EMR

  • (Optional) Amazon Athena

You create resource links to your table namespaces, and then provide the name of the link to AWS analytics services so they can work with the linked tables.

To create a resource link to a table namespace
  • The following CLI command shows how to create a resource link that can be used to connect your S3 tables to AWS analytics services. To use this example, replace the user input placeholders with your own information.

    aws glue create-database \
        --region us-east-2 \
        --catalog-id "111122223333" \
        --database-input '{
            "Name": "resource-link-name",
            "TargetDatabase": {
                "CatalogId": "111122223333:s3tablescatalog/amzn-s3-demo-table-bucket",
                "DatabaseName": "my_namespace"
            },
            "CreateTableDefaultPermissions": []
        }'
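If you create resource links for several namespaces, you might prefer to generate the command rather than edit the JSON by hand. The following sketch assembles the same `aws glue create-database` arguments shown above. The helper name `resource_link_command` is hypothetical, and the code only prints the command; it does not run it.

```python
import json
import shlex
from typing import List


def resource_link_command(account_id: str, region: str, link_name: str,
                          table_bucket: str, namespace: str) -> List[str]:
    """Build the argument list for a resource link to a table namespace."""
    database_input = {
        "Name": link_name,
        "TargetDatabase": {
            # Sub-catalog that represents the table bucket.
            "CatalogId": f"{account_id}:s3tablescatalog/{table_bucket}",
            "DatabaseName": namespace,
        },
        "CreateTableDefaultPermissions": [],
    }
    return [
        "aws", "glue", "create-database",
        "--region", region,
        "--catalog-id", account_id,
        "--database-input", json.dumps(database_input),
    ]


cmd = resource_link_command("111122223333", "us-east-2", "resource-link-name",
                            "amzn-s3-demo-table-bucket", "my_namespace")
# shlex.join quotes the JSON argument so the line can be pasted into a POSIX shell.
print(shlex.join(cmd))
```

Building the command as a list keeps the JSON payload as a single argument, which avoids quoting mistakes when namespace or link names change.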

Grant Lake Formation permissions on your table resources

After integration, Lake Formation manages access to your table resources. Lake Formation uses its own permissions model, which enables fine-grained access control for Data Catalog resources. Lake Formation requires that each IAM principal (user or role) be authorized to perform actions on Lake Formation–managed resources. For more information, see Overview of Lake Formation permissions. Before principals can access tables in AWS analytics services, you must grant Lake Formation permissions on those resources.

You need to grant Lake Formation permissions on your tables to work with them in the following services:

  • Amazon Redshift

  • Amazon Data Firehose

  • Amazon QuickSight

  • Amazon Athena

You can grant a principal Lake Formation permissions on a table in an S3 table bucket through either the Lake Formation console or the AWS CLI.

Note

If you are using a resource link to access your tables, you must separately grant permissions on both the resource link and the target (linked) namespace or table.

Console
  1. Open the AWS Lake Formation console at https://console.aws.amazon.com/lakeformation/, and sign in as a data lake administrator. For more information on how to create a data lake administrator, see Create a data lake administrator.

  2. In the navigation pane, choose Data permissions and then choose Grant.

  3. On the Grant Permissions page, under Principals, choose the principal that needs access to the table. For query engines, this is the principal that runs queries. For Firehose, this is the service role that you use to stream data.

  4. Under LF-Tags or catalog resources, choose Named Data Catalog resources.

  5. For Catalogs, choose the AWS Glue data catalog that was created when you integrated your table bucket. For example, <accountID>:s3tablescatalog/<table-bucket-name>.

  6. For Databases, choose the S3 table bucket namespace that you created.

  7. For Tables, choose the S3 table that you created in the S3 table bucket.

  8. For Table permissions, choose Super.

  9. Choose Grant.

CLI
  1. Make sure that you are running the AWS CLI commands as a data lake administrator. For more information, see Create a data lake administrator.

  2. Run the following command to grant an IAM principal Lake Formation permissions on a table in an S3 table bucket.

    aws lakeformation grant-permissions \
        --region <region e.g. us-east-1> \
        --cli-input-json '{
            "Principal": {
                "DataLakePrincipalIdentifier": "<user or role ARN e.g. arn:aws:iam::<accountID>:role/ExampleRole>"
            },
            "Resource": {
                "Table": {
                    "CatalogId": "<accountID>:s3tablescatalog/<S3 table bucket name>",
                    "DatabaseName": "<S3 table bucket namespace e.g. test_namespace>",
                    "Name": "<S3 table bucket table name e.g. test_table>"
                }
            },
            "Permissions": [
                "ALL"
            ]
        }'
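When you access tables through a resource link, two grants are involved: one on the target table and one on the link itself. The following sketch builds both `--cli-input-json` payloads. The helper names are hypothetical. The resource link grant rests on assumptions that you should check against the Lake Formation documentation: that the link lives in your account's default catalog, and that `DESCRIBE` is the permission it needs.

```python
import json


def table_grant(principal_arn: str, account_id: str, table_bucket: str,
                namespace: str, table: str, permissions=("ALL",)) -> dict:
    """Payload that grants permissions on a table in an S3 table bucket."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                "CatalogId": f"{account_id}:s3tablescatalog/{table_bucket}",
                "DatabaseName": namespace,
                "Name": table,
            }
        },
        "Permissions": list(permissions),
    }


def resource_link_grant(principal_arn: str, account_id: str, link_name: str) -> dict:
    """Payload that grants DESCRIBE on a resource link (assumed to be sufficient)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # Assumption: the link is a database in the account's default catalog.
        "Resource": {"Database": {"CatalogId": account_id, "Name": link_name}},
        "Permissions": ["DESCRIBE"],
    }


role = "arn:aws:iam::111122223333:role/ExampleRole"
grants = [
    table_grant(role, "111122223333", "amzn-s3-demo-table-bucket",
                "test_namespace", "test_table"),
    resource_link_grant(role, "111122223333", "resource-link-name"),
]
for grant in grants:
    print(json.dumps(grant, indent=2))
```

Each printed document can be passed to `aws lakeformation grant-permissions --cli-input-json`, one call per grant.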