Amazon S3 Tables integration
Amazon SageMaker Lakehouse unifies all your data across Amazon S3 data lakes, Amazon Redshift data warehouses, and third-party data sources without having to copy data. Amazon S3 Tables delivers the first cloud object store with built-in Apache Iceberg support. Amazon SageMaker Lakehouse integrates with Amazon S3 Tables so you can access S3 Tables from AWS analytics services, such as Amazon Redshift, Amazon Athena, Amazon EMR, and AWS Glue, and from Apache Iceberg-compatible engines, such as Apache Spark or PyIceberg.
Amazon SageMaker Lakehouse integration with Amazon S3 Tables helps you build secure analytics workflows that join data from Amazon S3 Tables with other sources, such as Amazon Redshift data warehouses and third-party or federated data sources (for example, Amazon DynamoDB or PostgreSQL). SageMaker Lakehouse also enables centralized management of fine-grained data access permissions for S3 Tables and other data, and applies them consistently across all engines. To get started, complete the steps in the following sections.
Prerequisites: complete all the steps in Getting started with Amazon SageMaker Lakehouse.
Enable Amazon S3 Integration
- Navigate to the Amazon S3 console. In the left navigation pane, choose Table buckets.
- Choose Create table bucket.
- On the Create table bucket page, enter a Table bucket name and select Enable integration.
- Choose Create table bucket.
- You will see confirmation when Amazon S3 completes integration of your table buckets with SageMaker Lakehouse.
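If you prefer to script the bucket creation instead of using the console, the following is a minimal sketch using the `create_table_bucket` call of the boto3 `s3tables` client. The helper names and the naming rule encoded in the regular expression are assumptions for illustration and should be checked against the S3 Tables documentation. This sketch only creates the bucket. It does not perform the console's Enable integration step.

```python
import re

# Assumed naming rule (verify against the S3 Tables documentation):
# 3-63 characters, lowercase letters, digits, and hyphens,
# starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_table_bucket_name(name: str) -> bool:
    """Return True if the name satisfies the assumed naming rule."""
    return bool(_BUCKET_NAME.fullmatch(name))


def create_table_bucket(client, name: str) -> str:
    """Create a table bucket and return its ARN.

    `client` is expected to behave like boto3.client("s3tables").
    """
    if not is_valid_table_bucket_name(name):
        raise ValueError(f"invalid table bucket name: {name!r}")
    response = client.create_table_bucket(name=name)
    return response["arn"]


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   arn = create_table_bucket(boto3.client("s3tables"), "my-table-bucket")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before you run it against a real account.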
Onboard S3 Tables in SageMaker Lakehouse
To provide access to S3 Tables, complete the following steps:
- Navigate to the AWS Lake Formation console.
- In the left navigation pane, choose Catalogs, and then choose S3tablescatalog.
- From S3tablescatalog, under Objects, choose the name of your newly created table bucket.
- From the Actions menu, select Grant.
- On the Grant permissions page, under IAM users and roles, select your Amazon SageMaker Unified Studio Project role. To grant full access, under Catalog Permissions > Grant, select Super user.
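The same grant can in principle be scripted with the Lake Formation `grant_permissions` API. The sketch below only illustrates the shape of such a request. The helper names are hypothetical, and both the catalog identifier format (`<account-id>:s3tablescatalog/<table-bucket>`) and the permission value that corresponds to the console's Super user option are assumptions to verify against the Lake Formation API reference before use.

```python
def build_catalog_grant(account_id, table_bucket, role_arn, permissions=("ALL",)):
    """Build keyword arguments for a Lake Formation grant_permissions call.

    Assumptions (not confirmed by this guide):
    - the table bucket catalog is addressed as
      "<account-id>:s3tablescatalog/<table-bucket>"
    - "ALL" is an acceptable stand-in for the console's Super user choice
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {
            "Catalog": {"Id": f"{account_id}:s3tablescatalog/{table_bucket}"}
        },
        "Permissions": list(permissions),
    }


def grant_catalog_access(client, account_id, table_bucket, role_arn):
    """Send the grant; `client` should behave like boto3.client("lakeformation")."""
    request = build_catalog_grant(account_id, table_bucket, role_arn)
    client.grant_permissions(**request)
    return request


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   grant_catalog_access(
#       boto3.client("lakeformation"),
#       "111122223333",
#       "my-table-bucket",
#       "arn:aws:iam::111122223333:role/my-project-role",
#   )
```

Keeping the request builder separate from the API call lets you review the exact principal, resource, and permissions before anything is granted.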
Create S3 Table and add data in SageMaker Lakehouse
- Navigate to Amazon SageMaker Unified Studio, and select the project.
- From the Build menu, select Query Editor, and ensure you have Athena selected in Connections.
- Create a database using SQL.

    CREATE DATABASE "s3tablescatalog/<Your Bucket Name>".<YourDBName>;

- Create an S3 table using SQL.

    CREATE TABLE "s3tablescatalog/<Your Bucket Name>".<YourDBName>.<YourTableName> (
      c_salutation string,
      c_login string,
      c_first_name string,
      c_last_name string,
      c_email_address string
    )
    TBLPROPERTIES ('table_type'='ICEBERG');

- Add data using SQL.

    INSERT INTO "s3tablescatalog/<Your Bucket Name>".<YourDBName>.<YourTableName>
    VALUES ('Dr.', '1381546', 'Joyce', 'Deaton', 'Joyce.Deaton@qhtrwert.edu');
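The three statements above can also be submitted outside the Query Editor. The following is a minimal sketch that builds the same statements from your bucket, database, and table names, and runs each one through the Athena `start_query_execution` and `get_query_execution` APIs. The helper names, the identifier checks, and the `primary` workgroup default are assumptions for illustration. The sketch also assumes the workgroup already has a query result location configured.

```python
import re
import time


def build_statements(bucket: str, database: str, table: str) -> list[str]:
    """Return the CREATE DATABASE, CREATE TABLE, and INSERT statements."""
    # Conservative identifier checks (an assumption, not an Athena rule set).
    if not re.fullmatch(r"[a-z0-9-]+", bucket):
        raise ValueError(f"unexpected bucket name: {bucket!r}")
    for name in (database, table):
        if not re.fullmatch(r"[a-z0-9_]+", name):
            raise ValueError(f"unexpected identifier: {name!r}")
    catalog = f'"s3tablescatalog/{bucket}"'
    target = f"{catalog}.{database}.{table}"
    return [
        f"CREATE DATABASE {catalog}.{database}",
        f"CREATE TABLE {target} ("
        "c_salutation string, c_login string, c_first_name string, "
        "c_last_name string, c_email_address string) "
        "TBLPROPERTIES ('table_type'='ICEBERG')",
        f"INSERT INTO {target} VALUES "
        "('Dr.', '1381546', 'Joyce', 'Deaton', 'Joyce.Deaton@qhtrwert.edu')",
    ]


def run_statement(athena, sql: str, workgroup: str = "primary",
                  poll_seconds: float = 1.0) -> str:
    """Run one statement and return its final state.

    `athena` should behave like boto3.client("athena").
    """
    started = athena.start_query_execution(QueryString=sql, WorkGroup=workgroup)
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   athena = boto3.client("athena")
#   for sql in build_statements("my-table-bucket", "sales", "customers"):
#       print(run_statement(athena, sql))
```

Running the statements one at a time and checking each final state makes it clear which step failed if, for example, the project role has not yet been granted access to the catalog.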
You can now use the following integrated analytics services:
- Amazon Athena: create databases and tables, query data, and add data in S3 Tables.
- Amazon Redshift: query data from S3 Tables.
- Amazon EMR: create tables and namespaces, query data, and add data in S3 Tables.
- AWS Glue: create tables and namespaces, query data, and add data in S3 Tables.
- AWS Lake Formation: grant fine-grained permissions for S3 table catalogs, databases, tables, columns, and cells.
Note
Access to S3 Tables with SageMaker Lakehouse is available in the AWS Regions where S3 Tables are available. Amazon SageMaker Unified Studio Visual ETL flow integration is not supported.
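For Apache Iceberg-compatible engines such as PyIceberg, access typically goes through an Iceberg REST catalog. The sketch below shows one way the connection properties might be assembled for the AWS Glue Iceberg REST endpoint. The helper names are hypothetical, and the endpoint URL and warehouse format are assumptions based on the Glue Iceberg REST endpoint documentation, so confirm them for your Region and account. The `load_table` helper requires the `pyiceberg` package and AWS credentials with the permissions granted earlier.

```python
def glue_rest_catalog_properties(account_id: str, table_bucket: str, region: str) -> dict:
    """Build PyIceberg REST catalog properties for an S3 table bucket.

    The URI and warehouse formats are assumptions to verify before use.
    """
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "warehouse": f"{account_id}:s3tablescatalog/{table_bucket}",
        # Sign REST requests with SigV4 for the Glue service.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


def load_table(account_id, table_bucket, region, database, table):
    """Load an Iceberg table handle through PyIceberg."""
    # Imported lazily so the property builder works without pyiceberg installed.
    from pyiceberg.catalog import load_catalog

    properties = glue_rest_catalog_properties(account_id, table_bucket, region)
    catalog = load_catalog("s3tablescatalog", **properties)
    return catalog.load_table(f"{database}.{table}")


# Example usage (requires pyiceberg and AWS credentials):
#   tbl = load_table("111122223333", "my-table-bucket", "us-east-1",
#                    "sales", "customers")
#   print(tbl.scan().to_arrow())
```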