Develop producers using the Amazon Kinesis Producer Library (KPL)
An Amazon Kinesis Data Streams producer is an application that puts user data records into a Kinesis data stream (also called data ingestion). The Kinesis Producer Library (KPL) simplifies producer application development, allowing developers to achieve high write throughput to a Kinesis data stream.
You can monitor the KPL with Amazon CloudWatch. For more information, see Monitor the Kinesis Producer Library with Amazon CloudWatch.
Topics
- Review the role of the KPL
- Realize the advantages of using the KPL
- Understand when not to use the KPL
- Install the KPL
- Transition to Amazon Trust Services (ATS) certificates for the KPL
- KPL supported platforms
- KPL key concepts
- Integrate the KPL with producer code
- Write to your Kinesis data stream using the KPL
- Configure the Kinesis Producer Library
- Implement consumer de-aggregation
- Use the KPL with Amazon Data Firehose
- Use the KPL with the AWS Glue Schema Registry
- Configure the KPL proxy configuration
Note
We recommend that you upgrade to the latest KPL version. The KPL is regularly updated with new releases that include the latest dependency and security patches, bug fixes, and backward-compatible new features. For more information, see https://github.com/awslabs/amazon-kinesis-producer/releases/.
Review the role of the KPL
The KPL is an easy-to-use, highly configurable library that helps you write to a Kinesis data stream. It acts as an intermediary between your producer application code and the Kinesis Data Streams API actions. The KPL performs the following primary tasks:
- Writes to one or more Kinesis data streams with an automatic and configurable retry mechanism
- Collects records and uses PutRecords to write multiple records to multiple shards per request
- Aggregates user records to increase payload size and improve throughput
- Integrates seamlessly with the Kinesis Client Library (KCL) to de-aggregate batched records on the consumer
- Submits Amazon CloudWatch metrics on your behalf to provide visibility into producer performance
Note that the KPL is different from the Kinesis Data Streams API that is available in the AWS SDKs.
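The collection task listed above can be pictured with a small, self-contained sketch. It is not the KPL's implementation and does not call the KPL or the AWS SDK. The class `BatchingSketch`, the method `collect`, and the two caps passed to it (500 records and 50,000 bytes per batch) are hypothetical names and values chosen only to show how many small records might be grouped into a few multi-record requests.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical batching helper, not part of the KPL.
public class BatchingSketch {

    /**
     * Groups records into batches. A batch is closed when adding the next
     * record would exceed either the record-count cap or the byte cap.
     * Both caps are caller-supplied illustration values.
     */
    static List<List<byte[]>> collect(List<byte[]> records, int maxCount, int maxBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentBytes = 0;
        for (byte[] record : records) {
            boolean full = current.size() >= maxCount
                    || currentBytes + record.length > maxBytes;
            if (full && !current.isEmpty()) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // An oversized record still gets a batch of its own.
            current.add(record);
            currentBytes += record.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            records.add(new byte[100]); // small 100-byte events
        }
        List<List<byte[]>> batches = collect(records, 500, 50_000);
        System.out.println(batches.size() + " batches instead of "
                + records.size() + " single-record writes");
    }
}
```

In this sketch, 1,200 small events become three batches. Issuing fewer, larger requests is the general idea behind the collection and aggregation tasks.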
Realize the advantages of using the KPL
The following list represents some of the major advantages to using the KPL for developing Kinesis Data Streams producers.
The KPL can be used in either synchronous or asynchronous use cases. We suggest using the asynchronous interface, which offers higher performance, unless there is a specific reason to use synchronous behavior. For more information about these two use cases and example code, see Write to your Kinesis data stream using the KPL.
- Performance Benefits
  The KPL can help build high-performance producers. Consider a situation where your Amazon EC2 instances serve as a proxy for collecting 100-byte events from hundreds or thousands of low-power devices and writing records into a Kinesis data stream. These EC2 instances must each write thousands of events per second to your data stream. To achieve the throughput needed, producers must implement complicated logic, such as batching or multithreading, in addition to retry logic and record de-aggregation at the consumer side. The KPL performs all of these tasks for you.
- Consumer-Side Ease of Use
  For consumer-side developers using the KCL in Java, the KPL integrates without additional effort. When the KCL retrieves an aggregated Kinesis Data Streams record consisting of multiple KPL user records, it automatically invokes the KPL to extract the individual user records before returning them to the user.
  For consumer-side developers who do not use the KCL but instead use the API operation GetRecords directly, a KPL Java library is available to extract the individual user records before returning them to the user.
- Producer Monitoring
  You can collect, monitor, and analyze your Kinesis Data Streams producers using Amazon CloudWatch and the KPL. The KPL emits throughput, error, and other metrics to CloudWatch on your behalf, and is configurable to monitor at the stream, shard, or producer level.
- Asynchronous Architecture
  Because the KPL may buffer records before sending them to Kinesis Data Streams, it does not force the caller application to block and wait for a confirmation that the record has arrived at the server before continuing execution. A call to put a record into the KPL always returns immediately and does not wait for the record to be sent or a response to be received from the server. Instead, a Future object is created that receives the result of sending the record to Kinesis Data Streams at a later time. This is the same behavior as asynchronous clients in the AWS SDK.
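The asynchronous pattern described above can be sketched with the JDK's own `CompletableFuture`. This is a minimal illustration, not the KPL API. The class `BufferingProducer` and its `put` method are hypothetical stand-ins, and the "send" is simulated on a background thread rather than going to Kinesis Data Streams.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncPutSketch {

    // Hypothetical stand-in for a buffering producer; not the KPL API.
    static class BufferingProducer implements AutoCloseable {
        private final ExecutorService sender = Executors.newSingleThreadExecutor();

        /** Returns at once; the simulated send completes the future later. */
        CompletableFuture<String> put(String data) {
            return CompletableFuture.supplyAsync(() -> "sent:" + data, sender);
        }

        @Override
        public void close() {
            sender.shutdown(); // already-submitted sends still finish
        }
    }

    public static void main(String[] args) {
        try (BufferingProducer producer = new BufferingProducer()) {
            CompletableFuture<String> result = producer.put("event-1");

            // The caller is free to continue here; it can attach a callback...
            result.thenAccept(r -> System.out.println("callback saw " + r));

            // ...or block only at the point where the outcome is actually needed.
            System.out.println("result: " + result.join());
        }
    }
}
```

Attaching a callback keeps the calling thread free. Calling `join()` right after every put would effectively turn the call back into a synchronous one.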
Understand when not to use the KPL
The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable). Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance. Applications that cannot tolerate this additional delay may need to use the AWS SDK directly. For more information about using the AWS SDK with Kinesis Data Streams, see Develop producers using the Amazon Kinesis Data Streams API with the AWS SDK for Java. For more information about RecordMaxBufferedTime and other user-configurable properties of the KPL, see Configure the Kinesis Producer Library.
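The trade-off between buffering time and packing efficiency can be made concrete with some back-of-the-envelope arithmetic. The sketch below uses a deliberately simplified model. It assumes records arrive at a steady rate and that a flush is triggered only by the buffering timer, and it ignores any size or count limits. The arrival rate, the record size, and the three buffering times are hypothetical values, and `BufferTimeSketch` is not part of the KPL.

```java
// Illustrative arithmetic only; a simplified model, not KPL behavior.
public class BufferTimeSketch {

    /** Records accumulated during one buffering window at a steady arrival rate. */
    static long recordsPerFlush(long recordsPerSecond, long bufferedTimeMillis) {
        return recordsPerSecond * bufferedTimeMillis / 1000;
    }

    public static void main(String[] args) {
        long recordsPerSecond = 5_000; // hypothetical arrival rate
        int recordBytes = 100;         // hypothetical record size

        // Hypothetical buffering times, in milliseconds.
        for (long bufferedMillis : new long[] {10, 100, 1_000}) {
            long records = recordsPerFlush(recordsPerSecond, bufferedMillis);
            System.out.printf(
                    "buffer=%d ms: about %d records (%d bytes) per flush, up to %d ms added delay%n",
                    bufferedMillis, records, records * recordBytes, bufferedMillis);
        }
    }
}
```

Under these assumptions, a longer buffering window packs proportionally more records into each flush, at the cost of a proportionally longer worst-case delay before a record is sent.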