Amazon SageMaker Unified Studio is in preview release and is subject to change.
Getting started with Amazon SageMaker Lakehouse
This tutorial helps you get started with Amazon SageMaker Lakehouse as a user. If you are new to Amazon SageMaker Lakehouse, start by reading Amazon SageMaker Lakehouse. If you are new to Amazon SageMaker Unified Studio, start by reading the concepts and terminology in Amazon SageMaker Unified Studio terminology and concepts.
Prerequisites
-
Your administrator must grant you access to Amazon SageMaker Unified Studio.
If you don't have access to it, contact your administrator. For more information, see Access Amazon SageMaker Unified Studio.
-
You must have an Amazon SageMaker Unified Studio project and the proper membership role in that project.
If you don't have proper access to a project, contact your administrator. To view your project membership role, choose Actions in the top right corner of the project overview page, and then choose Manage members. Your membership role appears in the Role column.
Create a project
You can create a project from a project profile, which defines a template for projects in your domain. To use Amazon SageMaker Lakehouse, your project must be created from either the Data analytics and AI-ML model development project profile or the SQL analytics project profile. For more information about creating a project, see Create a project in the Amazon SageMaker Unified Studio User Guide.
When using Amazon SageMaker Unified Studio, you can create the following resources in the lakehouse:
-
Databases in AWS Glue Data Catalog
Amazon SageMaker Lakehouse is implemented on AWS Glue and AWS Lake Formation in your AWS account.
-
A catalog to store data in Redshift Managed Storage (RMS) format
After you create a catalog in RMS format, you can view it by opening the AWS Lake Formation console (https://console.aws.amazon.com/lakeformation/) and finding it in the Catalogs list.
-
Provisioning permissions
An IAM role is created when you create a project. Each project has a dedicated IAM role, and this role has permissions to the resources that are created in that project. The Amazon Resource Name (ARN) of the role is shown in the Project details section of the Project overview page.
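You may need the role name or account ID separately, for example to find the role in the IAM console. The following sketch splits a role ARN copied from the Project details section into its parts. It relies only on the general IAM role ARN layout, `arn:<partition>:iam::<account-id>:role/<optional-path/><role-name>`. The `RoleArn` and `parse_role_arn` names are hypothetical helpers, and the sample ARN in the usage note is a placeholder. It does not come from a real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleArn:
    """Parts of an IAM role ARN (hypothetical helper type)."""
    partition: str
    account_id: str
    role_path: str
    role_name: str


def parse_role_arn(arn: str) -> RoleArn:
    """Split arn:<partition>:iam::<account-id>:role/<path/><name> into parts."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {arn!r}")
    _, partition, _, region, account_id, resource = parts
    if region:
        # IAM is a global service, so the region field of its ARNs is empty.
        raise ValueError(f"unexpected region in IAM ARN: {arn!r}")
    if not resource.startswith("role/"):
        raise ValueError(f"not a role ARN: {arn!r}")
    # After "role/" comes an optional path, then the role name.
    path, _, name = resource[len("role/"):].rpartition("/")
    role_path = f"/{path}/" if path else "/"
    return RoleArn(partition, account_id, role_path, name)
```

For example, `parse_role_arn("arn:aws:iam::111122223333:role/service-role/example-project-role")` returns the account ID `111122223333`, the path `/service-role/`, and the role name `example-project-role`.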
Browse data
You can browse data in Amazon SageMaker Lakehouse by completing the following steps.
To browse data
-
Choose a project to view the data.
-
On the project page, choose Data in the left navigation. This opens the Data explorer in the middle of the page.
The Data explorer includes Lakehouse, Redshift, and S3.
-
Expand Lakehouse to view catalogs, databases, and tables.
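You can also list the same hierarchy programmatically. The following sketch assumes you have boto3 and credentials that are allowed to read the AWS Glue Data Catalog. It lists each database in the account's default Glue catalog and the tables in each database, using the Glue `get_databases` and `get_tables` paginators. It does not cover catalogs other than the default one, such as RMS catalogs. The function name `list_catalog_tree` is a hypothetical helper. The function takes the client as a parameter, so the listing logic can be exercised without network access.

```python
from typing import Dict, List


def list_catalog_tree(glue) -> Dict[str, List[str]]:
    """Return {database_name: [table_name, ...]} for the default Glue catalog.

    `glue` is expected to behave like boto3.client("glue").
    """
    tree: Dict[str, List[str]] = {}
    for db_page in glue.get_paginator("get_databases").paginate():
        for db in db_page["DatabaseList"]:
            tables: List[str] = []
            table_pages = glue.get_paginator("get_tables").paginate(
                DatabaseName=db["Name"]
            )
            for table_page in table_pages:
                tables.extend(t["Name"] for t in table_page["TableList"])
            tree[db["Name"]] = tables
    return tree
```

With boto3 installed, a call such as `list_catalog_tree(boto3.client("glue"))` returns a dictionary that mirrors what you see when you expand Lakehouse in the Data explorer. Whether your project role can call these APIs outside Amazon SageMaker Unified Studio depends on how your administrator configured it.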
Upload data
You can upload data in CSV or JSON format to a catalog. To upload data, follow the instructions in Uploading data.
After the upload completes, the new table is listed within the database under AwsDataCatalog.
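The following sketch prepares a small CSV file with Python's standard `csv` module before you upload it. It assumes you want a header row that names each column. The column names and values are hypothetical examples, and `rows_to_csv` is a hypothetical helper. This page does not specify any upload requirements beyond CSV or JSON format, so check Uploading data for the accepted layout.

```python
import csv
import io
from typing import Dict, List


def rows_to_csv(fieldnames: List[str], rows: List[Dict[str, object]]) -> str:
    """Serialize rows to CSV text, writing a header row first."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output identical across platforms.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The `csv` module quotes values that contain commas, so a value such as `3,000` is written as `"3,000"` and is not split across two columns. You can save the returned text to a file with a `.csv` extension and upload it.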
Query data
You can query data using a supported query editor.
To query data
-
Under Lakehouse, choose AwsDataCatalog at the top. Expand the catalog to view the list of databases, and then choose a database.
-
In the selected database, choose a table. Then choose the three-dot menu to the right of the table to view the supported query tools.
-
To query with Athena, choose Query with Athena. This opens the Data explorer page, where you can run SQL queries. For SQL syntax, see SQL reference for Athena.
-
To query with Amazon Redshift instead, choose Query with Amazon Redshift. This opens the Data explorer page, where you can run SQL queries. For more information, see Querying a database using the query editor v2.
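You can also run an Athena query from code. The following sketch assumes boto3 and credentials that are allowed to run Athena queries. It uses the Athena `start_query_execution`, `get_query_execution`, and `get_query_results` APIs, and targets the `AwsDataCatalog` catalog shown in the Data explorer. The database name, the S3 output location, and the workgroup are placeholders that you must supply. The helper names `build_athena_request` and `run_athena_query` are hypothetical. This page does not state which workgroup or result location your project uses.

```python
import time
from typing import Optional


def build_athena_request(sql: str, database: str,
                         output_location: Optional[str] = None,
                         workgroup: Optional[str] = None) -> dict:
    """Build keyword arguments for Athena start_query_execution."""
    request = {
        "QueryString": sql,
        "QueryExecutionContext": {"Catalog": "AwsDataCatalog", "Database": database},
    }
    if output_location:
        # S3 URI where Athena writes results (placeholder, supply your own).
        request["ResultConfiguration"] = {"OutputLocation": output_location}
    if workgroup:
        request["WorkGroup"] = workgroup
    return request


def run_athena_query(athena, request: dict, poll_seconds: float = 1.0) -> list:
    """Start a query, poll until it finishes, and return the first page of rows.

    `athena` is expected to behave like boto3.client("athena").
    """
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")
    # Only the first page of results is returned; use NextToken for more rows.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

For example, `run_athena_query(boto3.client("athena"), build_athena_request("SELECT * FROM my_table LIMIT 10", "my_database", workgroup="my-workgroup"))` runs a query against the hypothetical `my_table` table. For a `SELECT` query, the first row returned usually contains the column names.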
To subscribe to an asset, see Request subscription to assets in Amazon SageMaker Unified Studio.
To publish data to the catalog from the lakehouse inventory, see Publishing data in Amazon SageMaker Lakehouse.