Streaming data to tables with Amazon Data Firehose
Note
The integration with AWS analytics services for table buckets is in preview release and is subject to change.
Amazon Data Firehose is a fully managed service for delivering real-time streaming data.
After you integrate your table buckets with AWS analytics services, you can configure Firehose to deliver data into your S3 tables. To do so, you create an IAM service role that allows Firehose to access your tables. Next, you create a resource link to your table or your table's namespace, and grant the Firehose service role explicit permissions on those resources. Then, you can create a Firehose stream that routes data to your table.
Creating a role for Firehose to use S3 tables as a destination
Firehose needs an IAM service role with specific permissions to access AWS Glue tables and write data to S3 tables. You provide this IAM role when you create a Firehose stream.
1. Open the IAM console at https://console.aws.amazon.com/iam/.
2. In the left navigation pane, choose Policies.
3. Choose Create policy, and choose JSON in the policy editor.
4. Add the following inline policy, which grants permissions to all databases and tables in your Data Catalog. If you want, you can grant permissions only to specific tables and databases. To use this policy, replace the input placeholders with your own information.

   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "glue:GetTable",
                   "glue:GetDatabase",
                   "glue:UpdateTable"
               ],
               "Resource": [
                   "arn:aws:glue:us-east-1:111122223333:catalog",
                   "arn:aws:glue:us-east-1:111122223333:database/*",
                   "arn:aws:glue:us-east-1:111122223333:table/*/*"
               ]
           },
           {
               "Effect": "Allow",
               "Action": [
                   "s3:AbortMultipartUpload",
                   "s3:GetBucketLocation",
                   "s3:GetObject",
                   "s3:ListBucket",
                   "s3:ListBucketMultipartUploads",
                   "s3:PutObject"
               ],
               "Resource": [
                   "arn:aws:s3:::amzn-s3-demo-logging-bucket",
                   "arn:aws:s3:::amzn-s3-demo-logging-bucket/*"
               ]
           },
           {
               "Effect": "Allow",
               "Action": [
                   "kinesis:DescribeStream",
                   "kinesis:GetShardIterator",
                   "kinesis:GetRecords",
                   "kinesis:ListShards"
               ],
               "Resource": "arn:aws:kinesis:us-east-1:111122223333:stream/stream-name"
           },
           {
               "Effect": "Allow",
               "Action": [
                   "lakeformation:GetDataAccess"
               ],
               "Resource": "*"
           },
           {
               "Effect": "Allow",
               "Action": [
                   "kms:Decrypt",
                   "kms:GenerateDataKey"
               ],
               "Resource": [
                   "arn:aws:kms:us-east-1:111122223333:key/KMS-key-id"
               ],
               "Condition": {
                   "StringEquals": {
                       "kms:ViaService": "s3.region.amazonaws.com"
                   },
                   "StringLike": {
                       "kms:EncryptionContext:aws:s3:arn": "arn:aws:s3:::amzn-s3-demo-bucket/prefix*"
                   }
               }
           },
           {
               "Effect": "Allow",
               "Action": [
                   "logs:PutLogEvents"
               ],
               "Resource": [
                   "arn:aws:logs:us-east-1:111122223333:log-group:log-group-name:log-stream:log-stream-name"
               ]
           },
           {
               "Effect": "Allow",
               "Action": [
                   "lambda:InvokeFunction",
                   "lambda:GetFunctionConfiguration"
               ],
               "Resource": [
                   "arn:aws:lambda:us-east-1:111122223333:function:function-name:function-version"
               ]
           }
       ]
   }
   ```

   This policy has statements that allow access to Kinesis Data Streams, invoking Lambda functions, and access to AWS KMS keys. If you don't use any of these resources, you can remove the corresponding statements.

   If error logging is enabled, Firehose also sends data delivery errors to your CloudWatch log group and streams. For this, you must configure log group and log stream names. For log group and log stream names, see Monitor Amazon Data Firehose Using CloudWatch Logs.
5. After you create the policy, create an IAM role with AWS service as the Trusted entity type.
6. For Service or use case, choose Kinesis. For Use case, choose Kinesis Firehose.
7. Choose Next, and then select the policy you created earlier.
8. Give your role a name. Review your role details, and choose Create role. The role will have the following trust policy.

   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "sts:AssumeRole"
               ],
               "Principal": {
                   "Service": [
                       "firehose.amazonaws.com"
                   ]
               }
           }
       ]
   }
   ```
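If you manage the service role with scripts rather than the console, you can build the policy document programmatically before creating the role. The following Python sketch (stdlib only; the Region, account ID, and bucket name are the same illustrative placeholders used in the policy above) renders the Glue and S3 statements of the permissions policy with your own values substituted. Extend it with the Kinesis, KMS, Logs, and Lambda statements if you use those resources:

```python
import json


def build_glue_s3_policy(region: str, account_id: str, bucket: str) -> str:
    """Render the Glue and S3 portions of the Firehose service-role policy,
    substituting the caller's Region, account ID, and logging bucket.
    (Illustrative helper; not the complete policy shown above.)"""
    glue_arn = f"arn:aws:glue:{region}:{account_id}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glue:GetTable", "glue:GetDatabase", "glue:UpdateTable"],
                "Resource": [
                    f"{glue_arn}:catalog",
                    f"{glue_arn}:database/*",
                    f"{glue_arn}:table/*/*",
                ],
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                    "s3:PutObject",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
        ],
    }
    return json.dumps(policy, indent=2)


policy_json = build_glue_s3_policy(
    "us-east-1", "111122223333", "amzn-s3-demo-logging-bucket"
)
print(policy_json)
```

The rendered JSON can then be pasted into the policy editor in step 4, or passed to the IAM `CreatePolicy` API.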
Granting the Firehose role permissions to your table data
AWS Lake Formation permissions control fine-grained access to AWS Glue Data Catalog resources and the underlying data in S3 tables. For more information, see Overview of Lake Formation permissions. For Firehose to stream to your tables, the service role you created for it needs explicit permission from Lake Formation to access the resource link and the linked table data.
Prerequisites
Note
Granting permissions on a resource link doesn't grant permissions on the target (linked) namespace or table. You must separately grant permissions to both the resource link and the target.
To grant Firehose permission to a resource link
1. Create a resource link to the namespace of the table you want to stream to. For more information, see Create a resource link to your namespace.
2. Open the Lake Formation console at https://console.aws.amazon.com/lakeformation/.
3. Choose Databases, and then select the name of the resource link you created for your table namespace.
4. From the Actions menu, under Permissions, select Grant. This opens the permission settings for the resource link.
5. For Principals, choose IAM users and roles, and then select the Firehose service role you created previously.
6. For Resource link permissions, select Describe. Then choose Grant.
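The console grant above corresponds to the Lake Formation GrantPermissions API. The following sketch builds that request for granting Describe on the resource link itself; the role ARN and resource link name are placeholders for your own values:

```python
# Placeholders: substitute your Firehose service role ARN and the name of
# the resource link you created for your table namespace.
role_arn = "arn:aws:iam::111122223333:role/firehose-s3-tables-role"
resource_link_name = "namespace_resource_link"

# GrantPermissions request: Describe on the resource link (a Database
# resource in Lake Formation terms).
describe_request = {
    "Principal": {"DataLakePrincipalIdentifier": role_arn},
    "Resource": {"Database": {"Name": resource_link_name}},
    "Permissions": ["DESCRIBE"],
}

# To apply the grant (assumes boto3 is installed and credentials are configured):
# import boto3
# boto3.client("lakeformation").grant_permissions(**describe_request)
print(describe_request)
```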
To grant Firehose permission to a linked namespace
1. Open the Lake Formation console at https://console.aws.amazon.com/lakeformation/.
2. Choose Databases, and then select the name of the resource link you created for your table namespace.
3. From the Actions menu, under Permissions, select Grant on target. This opens the permission settings for the linked table namespace.
4. For Principals, choose IAM users and roles, and then select the Firehose service role you created previously.
5. (Optional) Under LF-Tags or catalog resources, choose Named Data Catalog resources, and then select a specific table in that database to give Firehose access to.
6. For Database permissions (or Table permissions if you are limiting access to specific tables), select Super. Then choose Grant.
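For scripted setups, the grant-on-target step can likewise be expressed as a Lake Formation GrantPermissions request. In the API, the console's Super permission is represented as ALL (this mapping, and the namespace and table names below, are assumptions for illustration):

```python
# Placeholders: your Firehose service role ARN and the target (linked)
# namespace and table behind the resource link.
role_arn = "arn:aws:iam::111122223333:role/firehose-s3-tables-role"

# GrantPermissions request: Super (ALL in the API) on the linked table.
super_on_target = {
    "Principal": {"DataLakePrincipalIdentifier": role_arn},
    "Resource": {
        "Table": {
            "DatabaseName": "linked_namespace",  # target namespace, not the resource link
            "Name": "my_table",
        }
    },
    "Permissions": ["ALL"],
}

# To apply the grant (assumes boto3 is installed and credentials are configured):
# import boto3
# boto3.client("lakeformation").grant_permissions(**super_on_target)
print(super_on_target)
```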
Setting up a Firehose stream to S3 tables
The following procedure shows how to set up a Firehose stream that delivers data to S3 tables using the console. The following prerequisites are required to set up a Firehose stream to S3 tables.
Prerequisites
- Create the role for Firehose to access S3 tables.
- Create a resource link to the namespace of the table you want to stream to. For more information, see Create a resource link to your namespace.
- Grant Firehose permission to access your resource link and the linked namespace data. For more information, see Granting the Firehose role permissions to your table data.
To provide routing information to Firehose when you configure a stream, use the name of the resource link you created for your namespace as the database name, and the name of a table in that namespace as the table name. You can use these values in the Unique key section of a Firehose stream configuration to route data to a single table. You can also use these values to route data to a table using JSONQuery expressions. For more information, see Route incoming records to a single Iceberg table.
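The routing values described above can also be built programmatically, for example when creating the stream through the Firehose API rather than the console. The sketch below assembles the single-table routing configuration as a Python list and serializes it to the JSON shape shown in the console's Unique key configuration field (the resource link and table names are placeholders):

```python
import json

# Route all records to a single table: the resource link name serves as the
# database name, and the table lives in the linked namespace.
destination_table_configuration = [
    {
        "DestinationDatabaseName": "namespace_resource_link",  # placeholder
        "DestinationTableName": "my_table",                    # placeholder
        # Optional fields:
        # "UniqueKeys": ["record_id"],            # columns for update/delete matching
        # "S3ErrorOutputPrefix": "firehose-errors/",
    }
]

unique_key_configuration = json.dumps(destination_table_configuration, indent=2)
print(unique_key_configuration)
```

The resulting JSON can be pasted into the Unique key configuration field in the console procedure that follows.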
To set up a Firehose stream to S3 tables (Console)
1. Open the Firehose console at https://console.aws.amazon.com/firehose/.
2. Choose Create Firehose stream.
3. For Source, choose one of the following sources:
   - Amazon Kinesis Data Streams
   - Amazon MSK
   - Direct PUT
4. For Destination, choose Apache Iceberg Tables.
5. Enter a Firehose stream name.
6. Configure your Source settings.
7. For Destination settings, select Current Account and the AWS Region of the tables you want to stream to.
8. In the Unique key configuration JSON field, use the resource link for your namespace as the DestinationDatabaseName and the name of the table to stream data to as the DestinationTableName. The other values are optional:

   ```json
   [
       {
           "DestinationDatabaseName": "namespace_resource_link",
           "DestinationTableName": "my_table",
           "UniqueKeys": [
               "OPTIONAL_COLUMN_PLACEHOLDER"
           ],
           "S3ErrorOutputPrefix": "OPTIONAL_PREFIX_PLACEHOLDER"
       }
   ]
   ```
9. Under Backup settings, specify an S3 backup bucket.
10. For Existing IAM roles under Advanced settings, select the IAM role you created for Firehose.
For more information about the other settings you can configure for a stream, see Set up the Firehose stream in the Firehose Developer Guide.
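After the stream is created with a Direct PUT source, you can verify delivery by sending a test record. The sketch below shows one way to encode an event as a newline-delimited JSON record for the Firehose PutRecord API; the stream name is a placeholder, and the boto3 call is left commented because it requires configured credentials:

```python
import json


def encode_record(event: dict) -> bytes:
    """Serialize one event as a newline-delimited JSON record, a common
    payload shape for JSON delivery to Iceberg tables."""
    return (json.dumps(event) + "\n").encode("utf-8")


record = encode_record({"record_id": 1, "status": "ok"})

# To send it to a Direct PUT stream (assumes boto3 is installed and
# credentials are configured; the stream name is a placeholder):
# import boto3
# boto3.client("firehose").put_record(
#     DeliveryStreamName="my-s3-tables-stream",
#     Record={"Data": record},
# )
```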