Prerequisites to use Apache Iceberg Tables as a destination - Amazon Data Firehose

Prerequisites to use Apache Iceberg Tables as a destination

Choose from the following options to complete the required prerequisites.

Prerequisites to deliver to Iceberg Tables in Amazon S3

Before you begin, complete the following prerequisites.

  • Create an Amazon S3 bucket – You need an Amazon S3 bucket so that you can specify the metadata file path when you create tables. For more information, see Create an S3 bucket.

  • Create an IAM role with required permissions – Firehose needs an IAM role with specific permissions to access AWS Glue tables and write data to Amazon S3. The same role is used to grant AWS Glue access to Amazon S3 buckets. You need this IAM role when you create an Iceberg table and when you create a Firehose stream. For more information, see Grant Firehose access to an Apache Iceberg Tables destination.

  • Create Apache Iceberg Tables – If you configure unique keys in the Firehose stream for updates and deletes, Firehose validates that the table and unique keys exist as part of stream creation. In this scenario, you must create the tables before you create the Firehose stream. You can use AWS Glue to create Apache Iceberg Tables. For more information, see Creating Apache Iceberg tables. If you don't configure unique keys in the Firehose stream, you don't need to create Iceberg tables before you create the Firehose stream.

    Note

    Firehose supports the following table version and format for Apache Iceberg tables.

    • Table format version – Firehose supports only the V2 table format. Don't create tables in V1 format; otherwise, you get an error and the data is delivered to the S3 error bucket instead.

    • Data storage format – Firehose writes data to Apache Iceberg Tables in Parquet format.

    • Row-level operation – Firehose supports the Merge-on-Read (MOR) mode of writing data to Apache Iceberg Tables.
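As an illustration of creating a V2 table ahead of the Firehose stream, the following is a minimal sketch that uses the AWS Glue CreateTable API through boto3. The database name, table name, column list, and bucket path are hypothetical placeholders, and the sketch assumes that the database already exists and that your credentials allow glue:CreateTable. Treat it as one possible approach, not the required one.

```python
def build_iceberg_table_request(database, table, location, columns):
    """Build the keyword arguments for glue.create_table.

    All argument values are caller-supplied placeholders. `columns` is a
    list of (name, type) pairs using AWS Glue type names.
    """
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            "StorageDescriptor": {
                "Columns": [{"Name": n, "Type": t} for n, t in columns],
                # S3 path under which the table's metadata and data are stored.
                "Location": location,
            },
        },
        # Ask AWS Glue to create Iceberg metadata in table format version 2,
        # the only version that Firehose supports.
        "OpenTableFormatInput": {
            "IcebergInput": {"MetadataOperation": "CREATE", "Version": "2"}
        },
    }


def create_iceberg_table(request):
    """Send the request to AWS Glue. Requires boto3 and AWS credentials."""
    import boto3  # imported here so that building a request needs no AWS setup

    return boto3.client("glue").create_table(**request)


# Hypothetical names, for illustration only.
example_request = build_iceberg_table_request(
    database="example_db",
    table="example_events",
    location="s3://amzn-s3-demo-bucket/example_events/",
    columns=[("event_id", "string"), ("payload", "string")],
)
```

Calling create_iceberg_table(example_request) would issue the actual API call. If you plan to configure unique keys, the columns that you intend to use as keys (event_id in this sketch) need to exist in the table before you create the stream.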

Prerequisites to deliver to Amazon S3 Tables

To deliver data to Amazon S3 table buckets, complete the following prerequisites.

  • Create an IAM role with required permissions – Firehose needs an IAM role with specific permissions to access AWS Glue tables and write data to tables in an Amazon S3 table bucket. To write to tables in an S3 table bucket, you must also grant the IAM role the required permissions in AWS Lake Formation. You configure this IAM role when you create a Firehose stream. For more information, see Grant Firehose access to Amazon S3 Tables.

  • Create an S3 table bucket, a namespace, and tables in the table bucket, and complete the other integration steps outlined in Integrating Amazon S3 Tables with AWS analytics services.

    Note

    As part of these steps, grant the AWS Lake Formation DESCRIBE permission to the IAM role that you created previously.

    Use the names of the database and table resource links that you created as part of these prerequisites as the database and table names in your Firehose stream configuration for routing purposes. If you route to a single table, you can enter them in the Unique key section of your Firehose stream configuration. Otherwise, send them as part of your input data so that Firehose can route each record to the right table using JSON Query expressions.

    For more ways to create resource links, see Creating a resource link to a shared Data Catalog table or Creating a resource link to a shared Data Catalog database in the Lake Formation user guide.
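The DESCRIBE grant mentioned in the note above can also be scripted. The following is a minimal sketch that uses the AWS Lake Formation GrantPermissions API through boto3. The role ARN and the resource link names are hypothetical placeholders, and which resources need the grant depends on how you completed the integration steps, so treat the two builders as options rather than a prescribed sequence.

```python
def describe_grant_for_database(role_arn, database_name):
    """Build a GrantPermissions request for DESCRIBE on a database
    (for example, a database resource link)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Database": {"Name": database_name}},
        "Permissions": ["DESCRIBE"],
    }


def describe_grant_for_table(role_arn, database_name, table_name):
    """Build a GrantPermissions request for DESCRIBE on a table
    (for example, a table resource link) inside `database_name`."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Table": {"DatabaseName": database_name, "Name": table_name}},
        "Permissions": ["DESCRIBE"],
    }


def apply_grants(grants):
    """Send each request to Lake Formation. Requires boto3 and credentials
    for a principal that is allowed to grant these permissions."""
    import boto3  # imported here so that building requests needs no AWS setup

    client = boto3.client("lakeformation")
    for grant in grants:
        client.grant_permissions(**grant)


# Hypothetical role and resource link names, for illustration only.
ROLE_ARN = "arn:aws:iam::111122223333:role/ExampleFirehoseRole"
example_grants = [
    describe_grant_for_database(ROLE_ARN, "example_db_link"),
    describe_grant_for_table(ROLE_ARN, "example_db_link", "example_table_link"),
]
```

Calling apply_grants(example_grants) would issue the grants. Keeping the request builders separate from the API call lets you review the requests before applying them.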