Accessing table data

There are multiple ways to access tables in Amazon S3 table buckets. You can integrate tables with AWS analytics services using Amazon SageMaker Lakehouse, or access tables directly using either the Amazon S3 Tables Iceberg REST endpoint or the Amazon S3 Tables Catalog for Apache Iceberg. The access method you use depends on your catalog setup, governance model, and access control needs. The following is an overview of these access methods.

Choosing an access method

Amazon SageMaker Lakehouse integration: This is the recommended access method for working with tables in S3 table buckets. The integration gives you unified table management, centralized governance, and fine-grained access control across multiple AWS analytics services.
Direct access: Use this method if you need to work with AWS Partner Network (APN) catalog implementations, custom catalog implementations, or if you only need to perform basic read/write operations on tables within a single table bucket.

Accessing tables through the Amazon SageMaker Lakehouse integration

You can integrate S3 table buckets with Amazon SageMaker Lakehouse to access tables from AWS analytics services, such as Amazon Athena, Amazon Redshift, and Amazon QuickSight. Amazon SageMaker Lakehouse unifies your data across Amazon S3 data lakes and Amazon Redshift data warehouses, so you can build analytics, machine learning (ML), and generative AI applications on a single copy of data. The integration populates the AWS Glue Data Catalog with your table resources, and federates access to these resources with AWS Lake Formation. For more information on integrating, see Using Amazon S3 Tables with AWS analytics services.

The integration enables fine-grained access control through AWS Lake Formation to provide additional security. Lake Formation uses a combination of its own permissions model and the IAM permissions model to control access to table resources and underlying data. This means that a request to access your table must pass permission checks by both IAM and Lake Formation. For more information, see Lake Formation permissions overview in the AWS Lake Formation Developer Guide.
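The two-layer check described above behaves like a logical AND. The following is a minimal sketch for illustration only, not a description of how either service evaluates permissions; the `AccessRequest` type and its two fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical stand-in for the outcome of each permission check."""

    iam_allows: bool             # outcome of the IAM policy evaluation
    lake_formation_allows: bool  # outcome of the Lake Formation permission check


def is_authorized(request: AccessRequest) -> bool:
    # The request succeeds only when both checks pass.
    return request.iam_allows and request.lake_formation_allows
```

For example, a principal whose IAM policy allows the call but who has no Lake Formation grant on the table would be denied under this model.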

AWS analytics services such as Amazon Athena, Amazon Redshift, and Amazon QuickSight can access tables through this integration.

Accessing tables using the AWS Glue Iceberg REST endpoint

Once your S3 table buckets are integrated with Amazon SageMaker Lakehouse, you can also use the AWS Glue Iceberg REST endpoint to connect to S3 tables from third-party query engines that support Iceberg. For more information, see Accessing Amazon S3 tables using the AWS Glue Iceberg REST endpoint.

We recommend using the AWS Glue Iceberg REST endpoint when you want to access tables from Spark, PyIceberg, or other Iceberg-compatible clients.

The following clients can access tables directly through the AWS Glue Iceberg REST endpoint:

Any Iceberg client, including Spark, PyIceberg, and more.
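As a rough illustration of connecting an Iceberg client to the AWS Glue Iceberg REST endpoint, the following sketch builds PyIceberg REST catalog properties. The endpoint URL pattern, the `account-id:s3tablescatalog/bucket-name` warehouse format, the `glue` signing name, and the helper names are assumptions for this example; confirm the exact values in the linked topic before relying on them.

```python
def glue_rest_catalog_properties(region: str, account_id: str, table_bucket: str) -> dict:
    """Build PyIceberg REST catalog properties (values are assumptions, see lead-in)."""
    return {
        "type": "rest",
        # Assumed endpoint pattern for the AWS Glue Iceberg REST endpoint.
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        # Assumed warehouse format for an integrated table bucket.
        "warehouse": f"{account_id}:s3tablescatalog/{table_bucket}",
        # Requests are signed with SigV4 for the Glue service.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


def load_glue_rest_catalog(region: str, account_id: str, table_bucket: str):
    # Imported lazily so that building the properties does not require PyIceberg.
    from pyiceberg.catalog import load_catalog

    props = glue_rest_catalog_properties(region, account_id, table_bucket)
    return load_catalog("glue_rest", **props)
```

Calling `load_glue_rest_catalog("us-east-1", "111122223333", "amzn-s3-demo-table-bucket")` requires PyIceberg, AWS credentials, and the IAM and Lake Formation permissions described earlier.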

Accessing tables directly

You can access tables directly from open source query engines through methods that bridge S3 Tables management operations to your Apache Iceberg analytics applications. There are two direct access methods: the Amazon S3 Tables Iceberg REST endpoint and the Amazon S3 Tables Catalog for Apache Iceberg. The REST endpoint is recommended.

We recommend direct access if you access tables in self-managed catalog implementations, or only need to perform basic read/write operations on tables in a single table bucket. For other access scenarios, we recommend the Amazon SageMaker Lakehouse integration.

Direct access to tables is managed through either IAM identity-based policies or resource-based policies attached to tables and table buckets. You do not need to manage Lake Formation permissions for tables when you access them directly.
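To make the IAM side of direct access concrete, the following sketch assembles an identity-based policy that allows read access to every table in one table bucket. The helper name, the statement ID, and the choice of actions are assumptions for illustration; adjust the actions and resource ARN to your own requirements.

```python
import json


def direct_read_policy(region: str, account_id: str, table_bucket: str) -> dict:
    """Build an illustrative read-only identity-based policy for one table bucket."""
    bucket_arn = f"arn:aws:s3tables:{region}:{account_id}:bucket/{table_bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadTablesInOneBucket",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "s3tables:GetTable",
                    "s3tables:GetTableData",
                    "s3tables:GetTableMetadataLocation",
                ],
                # Applies to all tables in the bucket.
                "Resource": f"{bucket_arn}/table/*",
            }
        ],
    }


if __name__ == "__main__":
    policy = direct_read_policy("us-east-1", "111122223333", "amzn-s3-demo-table-bucket")
    print(json.dumps(policy, indent=2))
```

Because no Lake Formation permissions are involved in direct access, a policy like this one (or an equivalent resource-based policy) is the only authorization layer for the request.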

Accessing tables through the Amazon S3 Tables Iceberg REST endpoint

You can use the Amazon S3 Tables Iceberg REST endpoint to access your tables directly from any Iceberg REST-compatible client through HTTP endpoints. For more information, see Accessing tables using the Amazon S3 Tables Iceberg REST endpoint.

The following AWS analytics services and query engines can access tables directly using the Amazon S3 Tables Iceberg REST endpoint:

Supported query engines

Any Iceberg client, including Spark, PyIceberg, and more.
Amazon EMR
AWS Glue ETL
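A PyIceberg client could be pointed at the Amazon S3 Tables Iceberg REST endpoint roughly as follows. This is a sketch under assumptions: the endpoint URL pattern, the use of the table bucket ARN as the warehouse, the `s3tables` signing name, and the helper names are not taken from this page and should be checked against the linked topic.

```python
def s3tables_rest_catalog_properties(region: str, table_bucket_arn: str) -> dict:
    """Build PyIceberg REST catalog properties (values are assumptions, see lead-in)."""
    return {
        "type": "rest",
        # Assumed endpoint pattern for the S3 Tables Iceberg REST endpoint.
        "uri": f"https://s3tables.{region}.amazonaws.com/iceberg",
        # Assumed: the warehouse is the ARN of a single table bucket.
        "warehouse": table_bucket_arn,
        # Requests are signed with SigV4 for the S3 Tables service.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": region,
    }


def list_table_bucket_namespaces(region: str, table_bucket_arn: str):
    # Imported lazily so that building the properties does not require PyIceberg.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(
        "s3tables_rest", **s3tables_rest_catalog_properties(region, table_bucket_arn)
    )
    return catalog.list_namespaces()
```

Unlike the AWS Glue endpoint, this configuration is scoped to one table bucket, which matches the single-bucket use case recommended for direct access.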

Accessing tables directly through the Amazon S3 Tables Catalog for Apache Iceberg

You can also access tables directly from query engines like Apache Spark by using the S3 Tables client catalog. For more information, see Accessing Amazon S3 tables with the Amazon S3 Tables Catalog for Apache Iceberg. However, S3 recommends using the Amazon S3 Tables Iceberg REST endpoint for direct access because it supports more applications without requiring language-specific or engine-specific code.

The following query engines can access tables directly using the client catalog:

Apache Spark
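For Apache Spark, the client catalog is wired in through Spark catalog settings. The following sketch collects those settings in a dictionary and applies them to a session builder. The catalog name `s3tablesbucket`, the helper names, and the application name are hypothetical, and the class names are assumptions to verify against the linked topic. The Iceberg Spark runtime and the S3 Tables catalog JARs must also be on the Spark classpath, which this sketch does not handle.

```python
def s3tables_spark_conf(catalog_name: str, table_bucket_arn: str) -> dict:
    """Build Spark settings that register the S3 Tables client catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Register an Iceberg catalog under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Assumed implementation class for the S3 Tables client catalog.
        f"{prefix}.catalog-impl": "software.amazon.s3tables.iceberg.S3TablesCatalog",
        # The warehouse is the ARN of the table bucket.
        f"{prefix}.warehouse": table_bucket_arn,
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
    }


def build_spark_session(catalog_name: str, table_bucket_arn: str):
    # Imported lazily so that building the settings does not require PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("s3tables-sketch")
    for key, value in s3tables_spark_conf(catalog_name, table_bucket_arn).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session built this way, tables would be addressed as `s3tablesbucket.<namespace>.<table>` in Spark SQL.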
