Prerequisites to use Apache Iceberg Tables as a destination

Choose from the following options to complete the required prerequisites.

Topics

Prerequisites to deliver to Iceberg Tables in Amazon S3
Prerequisites to deliver to Amazon S3 Tables

Prerequisites to deliver to Iceberg Tables in Amazon S3

Before you begin, complete the following prerequisites.

Create an Amazon S3 bucket – You must create an Amazon S3 bucket whose path you provide as the metadata file location when you create tables. For more information, see Create an S3 bucket.
Create an IAM role with required permissions – Firehose needs an IAM role with specific permissions to access AWS Glue tables and write data to Amazon S3. The same role is used to grant AWS Glue access to Amazon S3 buckets. You need this IAM role when you create an Iceberg Table and a Firehose stream. For more information, see Grant Firehose access to an Apache Iceberg Tables destination.
Create Apache Iceberg Tables – If you configure unique keys in the Firehose stream for updates and deletes, Firehose validates that the table and unique keys exist as part of stream creation. In this scenario, you must create the tables before you create the Firehose stream. You can use AWS Glue to create Apache Iceberg Tables. For more information, see Creating Apache Iceberg tables. If you don't configure unique keys in the Firehose stream, you don't need to create Iceberg tables before you create the stream.
Note
Firehose supports the following table version and format for Apache Iceberg tables.
- Table format version – Firehose supports only the V2 table format. Don't create tables in V1 format; otherwise, you get an error and Firehose delivers the data to the S3 error bucket instead.
- Data storage format – Firehose writes data to Apache Iceberg Tables in Parquet format.
- Row level operation – Firehose supports the Merge-on-Read (MOR) mode of writing data to Apache Iceberg Tables.
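
As an illustration of the table-creation prerequisite, the following Python sketch assembles the request parameters for the AWS Glue CreateTable API with the Iceberg table format version set to 2. The database name, table name, columns, and bucket location are hypothetical placeholders, and the helper function is not part of any AWS library. The sketch only builds the request; it doesn't call AWS.

```python
def build_iceberg_create_table_request(database, table, columns, location):
    """Assemble keyword arguments for the AWS Glue CreateTable API.

    The helper name and all argument values are illustrative assumptions.
    """
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            "StorageDescriptor": {
                # Each column is a (name, type) pair.
                "Columns": [{"Name": name, "Type": col_type} for name, col_type in columns],
                # S3 path under which the table data and metadata are stored.
                "Location": location,
            },
        },
        # Firehose supports only the V2 table format, so request version 2.
        "OpenTableFormatInput": {
            "IcebergInput": {"MetadataOperation": "CREATE", "Version": "2"}
        },
    }


# Hypothetical names and bucket; replace them with your own resources.
request = build_iceberg_create_table_request(
    database="example_db",
    table="example_events",
    columns=[("event_id", "string"), ("amount", "int")],
    location="s3://amzn-s3-demo-bucket/example_events/",
)
```

If boto3 is installed and credentials are configured, you could pass the result to `boto3.client("glue").create_table(**request)`.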

Prerequisites to deliver to Amazon S3 Tables

To deliver data to Amazon S3 table buckets, complete the following prerequisites.

Create an S3 table bucket, a namespace, and tables in the table bucket, and complete the other integration steps outlined in Getting started with Amazon S3 Tables. Column names must be lowercase because of limitations imposed by the S3 Tables catalog integration, as described in S3 tables catalog integration limitations.
Create a resource link to the namespace – Firehose streams data to the tables in a database registered in the default catalog of the AWS Glue Data Catalog. To stream data to tables in S3 table buckets, create a resource link in the default catalog that points to the namespace in the table bucket. A resource link is a Data Catalog object that acts as an alias or pointer to another Data Catalog resource, such as a database or table.
Create an IAM role with required permissions – Firehose needs an IAM role with specific permissions to access AWS Glue tables and write data to tables in an Amazon S3 table bucket. To write to tables in an S3 table bucket, you must also provide the IAM role with the required permissions in AWS Lake Formation. You configure this IAM role when you create a Firehose stream. For more information, see Grant Firehose access to Amazon S3 Tables.
Configure AWS Lake Formation permissions – AWS Lake Formation manages access to your table resources. Lake Formation uses its own permissions model that enables fine-grained access control for Data Catalog resources. For Firehose to ingest data into table buckets, the Firehose role requires DESCRIBE permissions on the resource link, so that it can discover the S3 Tables namespace through the link, and read and write permissions on the underlying table.
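
The resource link and Lake Formation steps above can be sketched as request parameters for the AWS Glue CreateDatabase and Lake Formation GrantPermissions APIs. This is a minimal sketch under stated assumptions: the account ID, table bucket, namespace, table, link name, and role ARN are hypothetical, the `<account-id>:s3tablescatalog/<table-bucket>` catalog ID format is assumed from the S3 Tables catalog integration, and the `ALL` table permission is a broad stand-in for the read and write permissions described above. The sketch only builds the requests; it doesn't call AWS.

```python
def build_resource_link_request(account_id, table_bucket, namespace, link_name):
    """Parameters for Glue CreateDatabase that create a resource link
    in the default catalog pointing to an S3 Tables namespace."""
    return {
        "CatalogId": account_id,
        "DatabaseInput": {
            "Name": link_name,
            "TargetDatabase": {
                # Assumed catalog ID format for the S3 Tables integration.
                "CatalogId": f"{account_id}:s3tablescatalog/{table_bucket}",
                "DatabaseName": namespace,
            },
            "CreateTableDefaultPermissions": [],
        },
    }


def build_lake_formation_grants(role_arn, account_id, table_bucket,
                                namespace, table, link_name, table_permissions):
    """Parameters for two Lake Formation GrantPermissions calls."""
    principal = {"DataLakePrincipalIdentifier": role_arn}
    describe_link = {
        "Principal": principal,
        # DESCRIBE on the resource link in the default catalog.
        "Resource": {"Database": {"CatalogId": account_id, "Name": link_name}},
        "Permissions": ["DESCRIBE"],
    }
    table_access = {
        "Principal": principal,
        # Permissions on the underlying table in the table bucket.
        "Resource": {
            "Table": {
                "CatalogId": f"{account_id}:s3tablescatalog/{table_bucket}",
                "DatabaseName": namespace,
                "Name": table,
            }
        },
        "Permissions": list(table_permissions),
    }
    return [describe_link, table_access]


# Hypothetical identifiers; replace them with your own resources.
link_request = build_resource_link_request(
    "111122223333", "amzn-s3-demo-table-bucket", "example_namespace", "example_link"
)
grants = build_lake_formation_grants(
    role_arn="arn:aws:iam::111122223333:role/ExampleFirehoseRole",
    account_id="111122223333",
    table_bucket="amzn-s3-demo-table-bucket",
    namespace="example_namespace",
    table="example_events",
    link_name="example_link",
    table_permissions=["ALL"],  # assumption: narrow this to what your setup needs
)
```

With boto3 and credentials available, these could be passed to `boto3.client("glue").create_database(**link_request)` and, for each entry in `grants`, to `boto3.client("lakeformation").grant_permissions(**grant)`.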

For step-by-step integration, refer to the blog Build a data lake for streaming data with Amazon S3 Tables and Amazon Data Firehose. For additional information, also refer to Using Amazon S3 Tables with AWS analytics services.

You use the name of the resource link that you created as part of the prerequisites as the database name in your Firehose stream configuration for routing purposes. You can specify the database and table names in the Unique key section of your Firehose stream configuration if you are routing to a single table, or send them as part of your input data so that Firehose routes each record to the right table using JSON Query expressions.
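
To make the second routing option more concrete, the following Python sketch shows one way a producer might embed routing information in each input record. The `routing` field, its key names, and the two expressions are hypothetical assumptions, not names that Firehose requires. The `resolve` helper only mimics a simple dotted-path lookup for illustration; in a real stream, Firehose evaluates the JSON Query expressions that you configure.

```python
import json

# Hypothetical JSON Query expressions that a stream could be configured with.
ROUTING_EXPRESSIONS = {
    "Database": ".routing.database",
    "Table": ".routing.table",
}


def with_routing(payload, resource_link_name, table_name):
    """Return the payload as a JSON string with routing fields added."""
    record = dict(payload)
    record["routing"] = {"database": resource_link_name, "table": table_name}
    return json.dumps(record)


def resolve(expression, record_json):
    """Evaluate a simple dotted path such as '.a.b' against a JSON record.

    This covers only a tiny subset of JSON Query syntax, for illustration.
    """
    value = json.loads(record_json)
    for key in expression.strip(".").split("."):
        value = value[key]
    return value


# The database value is the resource link name, not the namespace itself.
record = with_routing({"event_id": "e-1"}, "example_link", "example_events")
database = resolve(ROUTING_EXPRESSIONS["Database"], record)
table = resolve(ROUTING_EXPRESSIONS["Table"], record)
```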

For more ways to create resource links, see Creating a resource link to a shared Data Catalog table or Creating a resource link to a shared Data Catalog database in the Lake Formation user guide.
