

# Develop producers using the Amazon Kinesis Producer Library (KPL)
<a name="developing-producers-with-kpl"></a>

An Amazon Kinesis Data Streams producer is an application that puts user data records into a Kinesis data stream (also called *data ingestion*). The Amazon Kinesis Producer Library (KPL) simplifies producer application development, letting developers achieve high write throughput to a Kinesis data stream. 

You can monitor the KPL with Amazon CloudWatch. For more information, see [Monitor the Kinesis Producer Library with Amazon CloudWatch](monitoring-with-kpl.md).

**Topics**
+ [Review the role of the KPL](#developing-producers-with-kpl-role)
+ [Realize the advantages of using the KPL](#developing-producers-with-kpl-advantage)
+ [Understand when not to use the KPL](#developing-producers-with-kpl-when)
+ [Install the KPL](kinesis-kpl-dl-install.md)
+ [Migrate from KPL 0.x to KPL 1.x](kpl-migration-1x.md)
+ [Transition to Amazon Trust Services (ATS) certificates for the KPL](kinesis-kpl-upgrades.md)
+ [KPL supported platforms](kinesis-kpl-supported-plats.md)
+ [KPL key concepts](kinesis-kpl-concepts.md)
+ [Integrate the KPL with producer code](kinesis-kpl-integration.md)
+ [Write to your Kinesis data stream using the KPL](kinesis-kpl-writing.md)
+ [Configure the Amazon Kinesis Producer Library](kinesis-kpl-config.md)
+ [Implement consumer de-aggregation](kinesis-kpl-consumer-deaggregation.md)
+ [Use the KPL with Amazon Data Firehose](kpl-with-firehose.md)
+ [Use the KPL with the AWS Glue Schema Registry](kpl-with-schemaregistry.md)
+ [Configure the KPL proxy configuration](kpl-proxy-configuration.md)
+ [KPL version lifecycle policy](kpl-version-lifecycle-policy.md)

**Note**  
We recommend that you upgrade to the latest KPL version. The KPL is regularly updated with releases that include the latest dependency and security patches, bug fixes, and backward-compatible new features. For more information, see [https://github.com/awslabs/amazon-kinesis-producer/releases/](https://github.com/awslabs/amazon-kinesis-producer/releases/).

## Review the role of the KPL
<a name="developing-producers-with-kpl-role"></a>

The KPL is an easy-to-use, highly configurable library that helps you write to a Kinesis data stream. It acts as an intermediary between your producer application code and the Kinesis Data Streams API actions. The KPL performs the following primary tasks: 
+ Writes to one or more Kinesis data streams with an automatic and configurable retry mechanism
+ Collects records and uses `PutRecords` to write multiple records to multiple shards per request
+ Aggregates user records to increase payload size and improve throughput
+ Integrates seamlessly with the [Kinesis Client Library](https://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html) (KCL) to de-aggregate batched records on the consumer side
+ Submits Amazon CloudWatch metrics on your behalf to provide visibility into producer performance

Note that the KPL is different from the Kinesis Data Streams API that is available in the [AWS SDKs](https://aws.amazon.com/tools/). The Kinesis Data Streams API helps you manage many aspects of Kinesis Data Streams (including creating streams, resharding, and putting and getting records), while the KPL provides a layer of abstraction specifically for ingesting data. For information about the Kinesis Data Streams API, see the [Amazon Kinesis API Reference](https://docs.aws.amazon.com/kinesis/latest/APIReference/).

## Realize the advantages of using the KPL
<a name="developing-producers-with-kpl-advantage"></a>

The following list describes some of the major advantages of using the KPL to develop Kinesis Data Streams producers.

The KPL can be used in either synchronous or asynchronous use cases. We suggest using the asynchronous interface for its higher performance, unless there is a specific reason to use synchronous behavior. For more information about these two use cases and example code, see [Write to your Kinesis data stream using the KPL](kinesis-kpl-writing.md).

 **Performance Benefits**   
The KPL can help build high-performance producers. Consider a situation where your Amazon EC2 instances serve as a proxy for collecting 100-byte events from hundreds or thousands of low-power devices and writing records into a Kinesis data stream. These EC2 instances must each write thousands of events per second to your data stream. To achieve the required throughput, producers must implement complicated logic, such as batching or multithreading, in addition to retry logic and record de-aggregation on the consumer side. The KPL performs all of these tasks for you. 

 **Consumer-Side Ease of Use**   
For consumer-side developers using the KCL in Java, the KPL integrates without additional effort. When the KCL retrieves an aggregated Kinesis Data Streams record consisting of multiple KPL user records, it automatically invokes the KPL to extract the individual user records before returning them to the user.   
For consumer-side developers who do not use the KCL but instead use the API operation `GetRecords` directly, a KPL Java library is available to extract the individual user records before returning them to the user. 

 **Producer Monitoring**   
You can collect, monitor, and analyze your Kinesis Data Streams producers using Amazon CloudWatch and the KPL. The KPL emits throughput, error, and other metrics to CloudWatch on your behalf, and is configurable to monitor at the stream, shard, or producer level.

 **Asynchronous Architecture**   
Because the KPL may buffer records before sending them to Kinesis Data Streams, it does not force the caller application to block and wait for confirmation that the record has arrived at the server before continuing. A call to put a record into the KPL always returns immediately and does not wait for the record to be sent or for a response to be received from the server. Instead, a `Future` object is created that receives the result of sending the record to Kinesis Data Streams at a later time. This is the same behavior as asynchronous clients in the AWS SDK.

## Understand when not to use the KPL
<a name="developing-producers-with-kpl-when"></a>

The KPL can incur an additional processing delay of up to `RecordMaxBufferedTime` within the library (user-configurable). Larger values of `RecordMaxBufferedTime` result in higher packing efficiencies and better performance. Applications that cannot tolerate this additional delay might need to use the AWS SDK directly. For more information about using the AWS SDK with Kinesis Data Streams, see [Develop producers using the Amazon Kinesis Data Streams API with the AWS SDK for Java](developing-producers-with-sdk.md). For more information about `RecordMaxBufferedTime` and other user-configurable properties of the KPL, see [Configure the Amazon Kinesis Producer Library](kinesis-kpl-config.md).
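If your application is latency-sensitive but can still tolerate some buffering, you can shrink the buffering window instead of abandoning the KPL. A minimal sketch follows; the 100-millisecond value is an illustrative choice, not a recommendation:

```java
// Sketch: trade packing efficiency for latency by lowering RecordMaxBufferedTime.
// 100 ms here is an assumed example value; tune it against your latency budget.
KinesisProducerConfiguration config = new KinesisProducerConfiguration()
        .setRecordMaxBufferedTime(100);  // milliseconds; smaller values mean less batching delay

KinesisProducer producer = new KinesisProducer(config);
```

Lowering `RecordMaxBufferedTime` reduces the worst-case in-library delay at the cost of smaller batches and therefore lower throughput per request.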

# Install the KPL
<a name="kinesis-kpl-dl-install"></a>

Amazon provides pre-built binaries of the C++ Amazon Kinesis Producer Library (KPL) for macOS, Windows, and recent Linux distributions (for supported platform details, see the next section). These binaries are packaged as part of Java .jar files and are automatically invoked and used if you are using Maven to install the package. To locate the latest versions of the KPL and KCL, use the following Maven search links:
+ [KPL](https://search.maven.org/#search|ga|1|amazon-kinesis-producer)
+ [KCL](https://search.maven.org/#search|ga|1|amazon-kinesis-client)

The Linux binaries are compiled with the GNU Compiler Collection (GCC) and statically linked against libstdc++. They are expected to work on any 64-bit Linux distribution that includes glibc version 2.5 or higher.

Users of earlier Linux distributions can build the KPL using the build instructions provided along with the source on GitHub. To download the KPL from GitHub, see [Amazon Kinesis Producer Library](https://github.com/awslabs/amazon-kinesis-producer).

**Important**  
Amazon Kinesis Producer Library (KPL) 0.x will reach end-of-support on January 30, 2026. We **strongly recommend** that you migrate your KPL applications using version 0.x to the latest KPL version before January 30, 2026. To find the latest KPL version, see the [KPL page on Github](https://github.com/awslabs/amazon-kinesis-producer). For information about migrating from KPL 0.x to KPL 1.x, see [Migrate from KPL 0.x to KPL 1.x](kpl-migration-1x.md).

# Migrate from KPL 0.x to KPL 1.x
<a name="kpl-migration-1x"></a>

This topic provides step-by-step instructions to migrate your producer from KPL 0.x to KPL 1.x. KPL 1.x introduces support for the AWS SDK for Java 2.x while maintaining interface compatibility with previous versions. You don't have to update your core data processing logic to migrate to KPL 1.x. 

1. **Make sure that you have the following prerequisites:**
   + Java Development Kit (JDK) 8 or later
   + AWS SDK for Java 2.x
   + Maven or Gradle for dependency management

1. **Add dependencies**

   If you're using Maven, add the following dependency to your `pom.xml` file. Make sure to update the `groupId` from `com.amazonaws` to `software.amazon.kinesis` and to replace the version `1.x.x` with the latest KPL version. 

   ```
   <dependency>
       <groupId>software.amazon.kinesis</groupId>
       <artifactId>amazon-kinesis-producer</artifactId>
       <version>1.x.x</version> <!-- Use the latest version -->
   </dependency>
   ```

   If you're using Gradle, add the following to your `build.gradle` file. Make sure to replace `1.x.x` with the latest KPL version. 

   ```
   implementation 'software.amazon.kinesis:amazon-kinesis-producer:1.x.x'
   ```

   You can check for the latest version of the KPL on the [Maven Central Repository](https://central.sonatype.com/search?q=amazon-kinesis-producer). 

1. **Update import statements for KPL**

   KPL 1.x uses the AWS SDK for Java 2.x and an updated package name that starts with `software.amazon.kinesis`, compared to the previous KPL package name, which starts with `com.amazonaws.services.kinesis`.

   Replace the import for `com.amazonaws.services.kinesis` with `software.amazon.kinesis`. The following table lists the imports that you must replace.  
**Import replacements**    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/streams/latest/dev/kpl-migration-1x.html)

1. **Update import statements for AWS credentials provider classes**

   When migrating to KPL 1.x, you must update the AWS SDK for Java 1.x packages and classes imported in your KPL application code to their AWS SDK for Java 2.x equivalents. Credentials provider classes are among the most common imports in KPL applications. See [Credentials provider changes](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/migration-client-credentials.html) in the AWS SDK for Java 2.x migration guide for the full list of credentials provider changes. Here is the most common import change that you might need to make in your KPL applications. 

   **Import in KPL 0.x**

   ```
   import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
   ```

   **Import in KPL 1.x**

   ```
   import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
   ```

   If you import any other credentials providers based on the AWS SDK for Java 1.x, you must update them to their AWS SDK for Java 2.x equivalents. If you didn't import any classes or packages from the AWS SDK for Java 1.x, you can skip this step.

1. **Update the credentials provider configuration in the KPL configuration**

   The credentials provider configuration in KPL 1.x requires the AWS SDK for Java 2.x credentials providers. If you are passing AWS SDK for Java 1.x credentials providers in the `KinesisProducerConfiguration` by overriding the default credentials provider, you must update them to AWS SDK for Java 2.x credentials providers. See [Credentials provider changes](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/migration-client-credentials.html) in the AWS SDK for Java 2.x migration guide for the full list of credentials provider changes. If you didn't override the default credentials provider in the KPL configuration, you can skip this step.

   For example, if you are overriding the default credentials provider for the KPL with the following code:

   ```
   KinesisProducerConfiguration config = new KinesisProducerConfiguration();
   // SDK v1 default credentials provider
   config.setCredentialsProvider(new DefaultAWSCredentialsProviderChain());
   ```

   You must update it with the following code to use the AWS SDK for Java 2.x credentials provider:

   ```
   KinesisProducerConfiguration config = new KinesisProducerConfiguration();
   // New SDK v2 default credentials provider
   config.setCredentialsProvider(DefaultCredentialsProvider.create());
   ```

# Transition to Amazon Trust Services (ATS) certificates for the KPL
<a name="kinesis-kpl-upgrades"></a>

On February 9, 2018, at 9:00 AM PST, Amazon Kinesis Data Streams installed ATS certificates. To continue to be able to write records to Kinesis Data Streams using the Amazon Kinesis Producer Library (KPL), you must upgrade your installation of the KPL to [version 0.12.6](http://search.maven.org/#artifactdetails|com.amazonaws|amazon-kinesis-producer|0.12.6|jar) or later. This change affects all AWS Regions.

For information about the move to ATS, see [How to Prepare for AWS’s Move to Its Own Certificate Authority](https://aws.amazon.com/blogs/security/how-to-prepare-for-aws-move-to-its-own-certificate-authority/).

If you encounter problems and need technical support, [create a case](https://console.aws.amazon.com/support/v1#/case/create) with the AWS Support Center.

# KPL supported platforms
<a name="kinesis-kpl-supported-plats"></a>

The Amazon Kinesis Producer Library (KPL) is written in C++ and runs as a child process to the main user process. Precompiled 64-bit native binaries are bundled with the Java release and are managed by the Java wrapper.

The Java package runs without the need to install any additional libraries on the following operating systems:
+ Linux distributions with kernel 2.6.18 (September 2006) and later
+ Apple OS X 10.9 and later
+ Windows Server 2008 and later
**Important**  
Windows Server 2008 and later is supported only for KPL versions earlier than 0.14.0.   
The Windows platform is not supported in KPL version 0.14.0 and later.

Note that the KPL is 64-bit only.

## Source code
<a name="kinesis-kpl-supported-plats-source-code"></a>

If the binaries provided in the KPL installation are not sufficient for your environment, the core of the KPL is written as a C++ module. The source code for the C++ module and the Java interface are released under the Amazon Public License and are available on GitHub at [Amazon Kinesis Producer Library](https://github.com/awslabs/amazon-kinesis-producer). Although the KPL can be used on any platform for which a recent standards-compliant C++ compiler and JRE are available, Amazon doesn't officially support any platform that is not on the supported platforms list.

# KPL key concepts
<a name="kinesis-kpl-concepts"></a>

The following sections contain concepts and terminology necessary to understand and benefit from the Amazon Kinesis Producer Library (KPL).

**Topics**
+ [Records](#kinesis-kpl-concepts-records)
+ [Batching](#kinesis-kpl-concepts-batching)
+ [Aggregation](#kinesis-kpl-concepts-aggretation)
+ [Collection](#kinesis-kpl-concepts-collection)

## Records
<a name="kinesis-kpl-concepts-records"></a>

In this guide, we distinguish between *KPL user records* and *Kinesis Data Streams records*. When we use the term *record* without a qualifier, we refer to a *KPL user record*. When we refer to a Kinesis Data Streams record, we explicitly say *Kinesis Data Streams record*.

A KPL user record is a blob of data that has particular meaning to the user. Examples include a JSON blob representing a UI event on a website, or a log entry from a web server.

A Kinesis Data Streams record is an instance of the `Record` data structure defined by the Kinesis Data Streams service API. It contains a partition key, sequence number, and a blob of data. 

## Batching
<a name="kinesis-kpl-concepts-batching"></a>

*Batching* refers to performing a single action on multiple items instead of repeatedly performing the action on each individual item. 

In this context, the "item" is a record, and the action is sending it to Kinesis Data Streams. In a non-batching situation, you would place each record in a separate Kinesis Data Streams record and make one HTTP request to send it to Kinesis Data Streams. With batching, each HTTP request can carry multiple records instead of just one.

The KPL supports two types of batching:
+ *Aggregation* – Storing multiple records within a single Kinesis Data Streams record. 
+ *Collection* – Using the API operation `PutRecords` to send multiple Kinesis Data Streams records to one or more shards in your Kinesis data stream. 

The two types of KPL batching are designed to coexist and can be turned on or off independently of one another. By default, both are turned on.
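Both behaviors are controlled through `KinesisProducerConfiguration`. For example, a producer whose consumers cannot de-aggregate can turn aggregation off while keeping collection. A sketch, using the KPL's `AggregationEnabled` setting:

```java
// Sketch: keep collection (PutRecords batching) but disable aggregation, so
// each user record becomes its own Kinesis Data Streams record.
KinesisProducerConfiguration config = new KinesisProducerConfiguration()
        .setAggregationEnabled(false);  // collection remains enabled by default

KinesisProducer producer = new KinesisProducer(config);
```

With this configuration, consumers that call `GetRecords` directly receive one user record per Kinesis Data Streams record, with no de-aggregation step required.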

## Aggregation
<a name="kinesis-kpl-concepts-aggretation"></a>

*Aggregation* refers to the storage of multiple records in a Kinesis Data Streams record. Aggregation allows customers to increase the number of records sent per API call, which effectively increases producer throughput.

Each Kinesis Data Streams shard supports writes of up to 1,000 Kinesis Data Streams records per second, or 1 MB of throughput per second. The records-per-second limit binds customers with records smaller than 1 KB. Record aggregation allows customers to combine multiple records into a single Kinesis Data Streams record, improving their per-shard throughput. 

Consider the case of one shard in Region us-east-1 that is currently running at a constant rate of 1,000 records per second, with records that are 512 bytes each. With KPL aggregation, you can pack 1,000 records into only 10 Kinesis Data Streams records, reducing the RPS to 10 (at 50 KB each).
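The arithmetic in this example can be checked directly. The following is an illustrative calculation only; the record size and packing ratio come from the example above, not from service limits, and the small per-aggregate protocol overhead is ignored:

```java
// Illustrative arithmetic for the aggregation example above.
public class AggregationMath {
    // Aggregated Kinesis Data Streams records needed to carry the user records.
    static int aggregatedRecords(int userRecords, int userRecordsPerAggregate) {
        return (userRecords + userRecordsPerAggregate - 1) / userRecordsPerAggregate;
    }

    // Payload size of one aggregated record, ignoring aggregation overhead.
    static int aggregateSizeBytes(int userRecordsPerAggregate, int userRecordSizeBytes) {
        return userRecordsPerAggregate * userRecordSizeBytes;
    }

    public static void main(String[] args) {
        // 1,000 records per second at 512 bytes each, 100 user records per aggregate:
        System.out.println(aggregatedRecords(1000, 100));   // prints 10
        System.out.println(aggregateSizeBytes(100, 512));   // prints 51200 (roughly 50 KB)
    }
}
```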

## Collection
<a name="kinesis-kpl-concepts-collection"></a>

*Collection* refers to batching multiple Kinesis Data Streams records and sending them in a single HTTP request with a call to the API operation `PutRecords`, instead of sending each Kinesis Data Streams record in its own HTTP request.

This increases throughput compared to using no collection because it reduces the overhead of making many separate HTTP requests. In fact, `PutRecords` itself was specifically designed for this purpose.
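To make the overhead reduction concrete, consider the request-count arithmetic. The 500-records-per-request figure is the documented `PutRecords` batch limit; the rest is an illustrative sketch:

```java
// Illustrative only: HTTP requests needed with and without collection.
public class CollectionMath {
    // PutRecords accepts up to 500 records per request.
    static int requestsNeeded(int records, int recordsPerRequest) {
        return (records + recordsPerRequest - 1) / recordsPerRequest;
    }

    public static void main(String[] args) {
        System.out.println(requestsNeeded(1000, 1));    // prints 1000: one request per record
        System.out.println(requestsNeeded(1000, 500));  // prints 2: full collection
    }
}
```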

Collection differs from aggregation in that it works with groups of Kinesis Data Streams records. The Kinesis Data Streams records being collected can still contain multiple records from the user. The relationship can be visualized as follows:

```
record 0 --|
record 1   |        [ Aggregation ]
    ...    |--> Amazon Kinesis record 0 --|
    ...    |                              |
record A --|                              |
                                          |
    ...                   ...             |
                                          |
record K --|                              |
record L   |                              |      [ Collection ]
    ...    |--> Amazon Kinesis record C --|--> PutRecords Request
    ...    |                              |
record S --|                              |
                                          |
    ...                   ...             |
                                          |
record AA--|                              |
record BB  |                              |
    ...    |--> Amazon Kinesis record M --|
    ...    |
record ZZ--|
```

# Integrate the KPL with producer code
<a name="kinesis-kpl-integration"></a>

The Amazon Kinesis Producer Library (KPL) runs in a separate process, and communicates with your parent user process using IPC. This architecture is sometimes called a [microservice](http://en.wikipedia.org/wiki/Microservices), and is chosen for two main reasons:

**1) Your user process will not crash even if the KPL crashes**  
Your process could have tasks unrelated to Kinesis Data Streams, and may be able to continue operation even if the KPL crashes. It is also possible for your parent user process to restart the KPL and recover to a fully working state (this functionality is in the official wrappers).

An example is a web server that sends metrics to Kinesis Data Streams; the server can continue serving pages even if the Kinesis Data Streams part has stopped working. Crashing the whole server because of a bug in the KPL would therefore cause an unnecessary outage.

**2) Arbitrary clients can be supported**  
There are always customers who use languages other than the ones officially supported. These customers should also be able to use the KPL easily.

## Recommended usage matrix
<a name="kinesis-kpl-integration-usage"></a>

The following usage matrix lists the recommended settings for different users and advises you about whether and how you should use the KPL. Keep in mind that if aggregation is enabled, de-aggregation must also be used to extract your records on the consumer side. 


| Producer side language | Consumer side language | KCL Version | Checkpoint logic | Can you use the KPL? | Caveats | 
| --- | --- | --- | --- | --- | --- | 
| Anything but Java | * | * | * | No | N/A | 
| Java | Java | Uses Java SDK directly | N/A | Yes | If aggregation is used, you have to use the provided de-aggregation library after GetRecords calls. | 
| Java | Anything but Java | Uses SDK directly | N/A | Yes | Must disable aggregation.  | 
| Java | Java | 1.3.x | N/A | Yes | Must disable aggregation. | 
| Java | Java  | 1.4.x | Calls checkpoint without any arguments | Yes | None | 
| Java | Java | 1.4.x | Calls checkpoint with an explicit sequence number | Yes | Either disable aggregation, or change the code to use extended sequence numbers for checkpointing. | 
| Java | Anything but Java  | 1.3.x + Multilanguage daemon + language-specific wrapper | N/A | Yes | Must disable aggregation.  | 

# Write to your Kinesis data stream using the KPL
<a name="kinesis-kpl-writing"></a>

The following sections show sample code in a progression from the most basic producer to fully asynchronous code.

## Barebones producer code
<a name="kinesis-kpl-writing-code"></a>

The following code is all that is needed to write a minimal working producer. The Amazon Kinesis Producer Library (KPL) user records are processed in the background.

```
// KinesisProducer gets credentials automatically, using the
// DefaultAWSCredentialsProviderChain.
// It also gets the Region automatically from the EC2 metadata service.
KinesisProducer kinesis = new KinesisProducer();  
// Put some records 
for (int i = 0; i < 100; ++i) {
    ByteBuffer data = ByteBuffer.wrap("myData".getBytes("UTF-8"));
    // doesn't block       
    kinesis.addUserRecord("myStream", "myPartitionKey", data); 
}  
// Do other stuff ...
```

## Respond to results synchronously
<a name="kinesis-kpl-writing-synchronous"></a>

In the previous example, the code didn't check whether the KPL user records succeeded. The KPL performs any retries needed to account for failures. But if you want to check on the results, you can examine them using the `Future` objects that are returned from `addUserRecord`, as in the following example (previous example shown for context):

```
KinesisProducer kinesis = new KinesisProducer();  

// Put some records and save the Futures 
List<Future<UserRecordResult>> putFutures = new LinkedList<Future<UserRecordResult>>(); 
for (int i = 0; i < 100; i++) {
    ByteBuffer data = ByteBuffer.wrap("myData".getBytes("UTF-8"));
    // doesn't block 
    putFutures.add(
        kinesis.addUserRecord("myStream", "myPartitionKey", data)); 
}  

// Wait for puts to finish and check the results 
for (Future<UserRecordResult> f : putFutures) {
    UserRecordResult result = f.get(); // this does block     
    if (result.isSuccessful()) {         
        System.out.println("Put record into shard " + 
                            result.getShardId());     
    } else {
        for (Attempt attempt : result.getAttempts()) {
            // Analyze and respond to the failure         
        }
    }
}
```

## Respond to results asynchronously
<a name="kinesis-kpl-writing-asynchronous"></a>

The previous example calls `get()` on a `Future` object, which blocks the calling thread. If you don't want to block, you can use an asynchronous callback, as shown in the following example:

```
KinesisProducer kinesis = new KinesisProducer();

FutureCallback<UserRecordResult> myCallback = new FutureCallback<UserRecordResult>() {     
    @Override public void onFailure(Throwable t) {
        /* Analyze and respond to the failure  */ 
    };     
    @Override public void onSuccess(UserRecordResult result) { 
        /* Respond to the success */ 
    };
};

for (int i = 0; i < 100; ++i) {
    ByteBuffer data = ByteBuffer.wrap("myData".getBytes("UTF-8"));      
    ListenableFuture<UserRecordResult> f = kinesis.addUserRecord("myStream", "myPartitionKey", data);     
    // If the Future is complete by the time we call addCallback, the callback will be invoked immediately.
    Futures.addCallback(f, myCallback); 
}
```

# Configure the Amazon Kinesis Producer Library
<a name="kinesis-kpl-config"></a>

Although the default settings should work well for most use cases, you may want to change some of them to tailor the behavior of the `KinesisProducer` to your needs. To do so, pass an instance of the `KinesisProducerConfiguration` class to the `KinesisProducer` constructor, for example:

```
KinesisProducerConfiguration config = new KinesisProducerConfiguration()
        .setRecordMaxBufferedTime(3000)
        .setMaxConnections(1)
        .setRequestTimeout(60000)
        .setRegion("us-west-1");
        
final KinesisProducer kinesisProducer = new KinesisProducer(config);
```

You can also load a configuration from a properties file:

```
KinesisProducerConfiguration config = KinesisProducerConfiguration.fromPropertiesFile("default_config.properties");
```

You can substitute any path and file name that the user process has access to. You can additionally call set methods on the `KinesisProducerConfiguration` instance created this way to customize the config.

The properties file should specify parameters using their names in PascalCase. The names match those used in the set methods in the `KinesisProducerConfiguration` class. For example:

```
RecordMaxBufferedTime = 100
MaxConnections = 4
RequestTimeout = 6000
Region = us-west-1
```

For more information about configuration parameter usage rules and value limits, see the [sample configuration properties file on GitHub](https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties).

Note that after `KinesisProducer` is initialized, changing the `KinesisProducerConfiguration` instance that was used has no further effect. `KinesisProducer` does not currently support dynamic reconfiguration.

# Implement consumer de-aggregation
<a name="kinesis-kpl-consumer-deaggregation"></a>

Beginning with release 1.4.0, the KCL supports automatic de-aggregation of KPL user records. Consumer application code written with previous versions of the KCL will compile without any modification after you update the KCL. However, if KPL aggregation is being used on the producer side, there is a subtlety involving checkpointing: all subrecords within an aggregated record have the same sequence number, so additional data has to be stored with the checkpoint if you need to distinguish between subrecords. This additional data is referred to as the *subsequence number*.

**Topics**
+ [Migrate from previous versions of the KCL](#kinesis-kpl-consumer-deaggregation-migration)
+ [Use KCL extensions for KPL de-aggregation](#kinesis-kpl-consumer-deaggregation-extensions)
+ [Use GetRecords directly](#kinesis-kpl-consumer-deaggregation-getrecords)

## Migrate from previous versions of the KCL
<a name="kinesis-kpl-consumer-deaggregation-migration"></a>

You are not required to change your existing calls to do checkpointing with aggregation. It is still guaranteed that you can retrieve all records successfully stored in Kinesis Data Streams. The KCL now provides two new checkpoint operations to support particular use cases, described next.

If your existing code was written for the KCL before KPL support, and your checkpoint operation is called without arguments, it is equivalent to checkpointing the sequence number of the last KPL user record in the batch. If your checkpoint operation is called with a sequence number string, it is equivalent to checkpointing the given sequence number of the batch along with the implicit subsequence number 0 (zero).

Calling the new KCL checkpoint operation `checkpoint()` without any arguments is semantically equivalent to checkpointing the sequence number of the last `Record` call in the batch, along with the implicit subsequence number 0 (zero). 

Calling the new KCL checkpoint operation `checkpoint(Record record)` is semantically equivalent to checkpointing the given `Record`’s sequence number along with the implicit subsequence number 0 (zero). If the `Record` call is actually a `UserRecord`, the `UserRecord` sequence number and subsequence number are checkpointed. 

Calling the new KCL checkpoint operation `checkpoint(String sequenceNumber, long subSequenceNumber)` explicitly checkpoints the given sequence number along with the given subsequence number. 

In any of these cases, after the checkpoint is stored in the Amazon DynamoDB checkpoint table, the KCL can correctly resume retrieving records even when the application crashes and restarts. If more records are contained within the sequence, retrieval occurs starting with the next subsequence number record within the record with the most recently checkpointed sequence number. If the most recent checkpoint included the very last subsequence number of the previous sequence number record, retrieval occurs starting with the record with the next sequence number. 

The next section discusses details of sequence and subsequence checkpointing for consumers that must avoid skipping and duplication of records. If skipping (or duplication) of records when stopping and restarting your consumer’s record processing is not important, you can run your existing code with no modification.

## Use KCL extensions for KPL de-aggregation
<a name="kinesis-kpl-consumer-deaggregation-extensions"></a>

KPL de-aggregation can involve subsequence checkpointing. To facilitate using subsequence checkpointing, a `UserRecord` class has been added to the KCL:

```
public class UserRecord extends Record {     
    public long getSubSequenceNumber() {
    /* ... */
    }      
    @Override 
    public int hashCode() {
    /* contract-satisfying implementation */ 
    }      
    @Override 
    public boolean equals(Object obj) {
    /* contract-satisfying implementation */ 
    } 
}
```

This class is now used instead of `Record`. This does not break existing code because it is a subclass of `Record`. The `UserRecord` class represents both actual subrecords and standard, non-aggregated records. Non-aggregated records can be thought of as aggregated records with exactly one subrecord.

In addition, two new operations are added to `IRecordProcessorCheckpointer`:

```
public void checkpoint(Record record); 
public void checkpoint(String sequenceNumber, long subSequenceNumber);
```

To begin using subsequence number checkpointing, convert your existing checkpoint calls. Change code of the following form:

```
checkpointer.checkpoint(record.getSequenceNumber());
```

To the following new form:

```
checkpointer.checkpoint(record);
```

We recommend that you use the `checkpoint(Record record)` form for subsequence checkpointing. However, if you are already storing `sequenceNumbers` as strings to use for checkpointing, you should now also store `subSequenceNumber`, as shown in the following example:

```
String sequenceNumber = record.getSequenceNumber();
long subSequenceNumber = ((UserRecord) record).getSubSequenceNumber();
// ... do other processing
checkpointer.checkpoint(sequenceNumber, subSequenceNumber);
```

The cast from `Record` to `UserRecord` always succeeds because the implementation always uses `UserRecord`. However, this approach is not recommended unless you need to perform arithmetic on the sequence numbers.

While processing KPL user records, the KCL writes the subsequence number into Amazon DynamoDB as an extra field for each row. Previous versions of the KCL used `AFTER_SEQUENCE_NUMBER` to fetch records when resuming checkpoints. The current KCL with KPL support uses `AT_SEQUENCE_NUMBER` instead. When the record at the checkpointed sequence number is retrieved, the checkpointed subsequence number is checked, and subrecords are dropped as appropriate (which may be all of them, if the last subrecord is the one checkpointed). Again, non-aggregated records can be thought of as aggregated records with a single subrecord, so the same algorithm works for both aggregated and non-aggregated records.
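The subrecord-dropping step can be sketched as follows. This is a hypothetical model of the algorithm just described, not the KCL source: after fetching the record at the checkpointed sequence number with `AT_SEQUENCE_NUMBER`, every subrecord up to and including the checkpointed subsequence number is dropped, and only the later subrecords are processed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not the KCL implementation) of how resuming with
// AT_SEQUENCE_NUMBER drops already-processed subrecords of the record
// fetched at the checkpointed sequence number.
public class ResumeModel {
    /**
     * Returns the subsequence numbers still to process: everything
     * after the checkpointed subsequence number. Subrecords are
     * numbered 0 .. totalSubRecords - 1.
     */
    public static List<Long> subRecordsToProcess(long totalSubRecords,
                                                 long checkpointedSubSequence) {
        List<Long> remaining = new ArrayList<>();
        for (long sub = checkpointedSubSequence + 1; sub < totalSubRecords; sub++) {
            remaining.add(sub);
        }
        return remaining; // empty if the last subrecord was checkpointed
    }

    public static void main(String[] args) {
        // Aggregated record with 5 subrecords (0..4), checkpointed at subrecord 2:
        System.out.println(subRecordsToProcess(5, 2)); // [3, 4]
        // Checkpointed at the last subrecord: all subrecords are dropped.
        System.out.println(subRecordsToProcess(5, 4)); // []
        // A non-aggregated record behaves as an aggregated record with one subrecord.
        System.out.println(subRecordsToProcess(1, 0)); // []
    }
}
```

Note how the last case shows why the same algorithm covers non-aggregated records: with a single subrecord, a checkpoint at subsequence number 0 drops the whole record on resume, exactly as `AFTER_SEQUENCE_NUMBER` would have.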

## Use GetRecords directly
<a name="kinesis-kpl-consumer-deaggregation-getrecords"></a>

You can also choose not to use the KCL but instead invoke the API operation `GetRecords` directly to retrieve Kinesis Data Streams records. To unpack these retrieved records into your original KPL user records, call one of the following static operations in `UserRecord.java`:

```
public static List<Record> deaggregate(List<Record> records)

public static List<UserRecord> deaggregate(List<UserRecord> records, BigInteger startingHashKey, BigInteger endingHashKey)
```

The first operation uses the default value `0` (zero) for `startingHashKey` and the default value `2^128 - 1` for `endingHashKey`.

Each of these operations de-aggregates the given list of Kinesis Data Streams records into a list of KPL user records. Any KPL user record whose explicit hash key or partition key falls outside the range from `startingHashKey` (inclusive) to `endingHashKey` (inclusive) is discarded from the returned list of records.
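The range filter can be illustrated with a self-contained sketch. This is a hypothetical model of the filtering behavior, not the `UserRecord.deaggregate` source; it also shows how the single-argument overload's default bounds (`0` through `2^128 - 1`) cover the entire hash key space, so that overload discards nothing.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the UserRecord.deaggregate source) of the
// inclusive hash-key range filter applied during de-aggregation.
public class HashKeyFilter {
    static final BigInteger DEFAULT_STARTING_HASH_KEY = BigInteger.ZERO;
    static final BigInteger DEFAULT_ENDING_HASH_KEY =
            BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE); // 2^128 - 1

    /** Keep only hash keys inside [start, end], both ends inclusive. */
    public static List<BigInteger> filter(List<BigInteger> effectiveHashKeys,
                                          BigInteger start, BigInteger end) {
        List<BigInteger> kept = new ArrayList<>();
        for (BigInteger key : effectiveHashKeys) {
            if (key.compareTo(start) >= 0 && key.compareTo(end) <= 0) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<BigInteger> keys = List.of(
                BigInteger.valueOf(10), BigInteger.valueOf(500), BigInteger.valueOf(9000));
        // An explicit range [100, 1000] keeps only the key 500:
        System.out.println(filter(keys, BigInteger.valueOf(100), BigInteger.valueOf(1000)));
        // The default full range keeps every key:
        System.out.println(filter(keys, DEFAULT_STARTING_HASH_KEY, DEFAULT_ENDING_HASH_KEY));
    }
}
```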

# Use the KPL with Amazon Data Firehose
<a name="kpl-with-firehose"></a>

If you use the Kinesis Producer Library (KPL) to write data to a Kinesis data stream, you can use aggregation to combine the records that you write to that Kinesis data stream. If you then use that data stream as a source for your Firehose delivery stream, Firehose de-aggregates the records before it delivers them to the destination. If you configure your delivery stream to transform the data, Firehose de-aggregates the records before it delivers them to AWS Lambda. For more information, see [Writing to Amazon Firehose Using Kinesis Data Streams](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-kinesis-streams.html).

# Use the KPL with the AWS Glue Schema Registry
<a name="kpl-with-schemaregistry"></a>

You can integrate your Kinesis data streams with the AWS Glue Schema Registry. The AWS Glue Schema Registry allows you to centrally discover, control, and evolve schemas, while ensuring data produced is continuously validated by a registered schema. A schema defines the structure and format of a data record. A schema is a versioned specification for reliable data publication, consumption, or storage. The AWS Glue Schema Registry enables you to improve end-to-end data quality and data governance within your streaming applications. For more information, see [AWS Glue Schema Registry](https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html). One of the ways to set up this integration is through the KPL and Kinesis Client Library (KCL) libraries in Java. 

**Important**  
Currently, Kinesis Data Streams and AWS Glue schema registry integration is only supported for the Kinesis data streams that use KPL producers implemented in Java. Multi-language support is not provided. 

For detailed instructions on how to set up integration of Kinesis Data Streams with Schema Registry using the KPL, see the "Interacting with Data Using the KPL/KCL Libraries" section in [Use Case: Integrating Amazon Kinesis Data Streams with the AWS Glue Schema Registry](https://docs.aws.amazon.com/glue/latest/dg/schema-registry-integrations.html#schema-registry-integrations-kds).

# Configure the KPL proxy configuration
<a name="kpl-proxy-configuration"></a>

For applications that cannot directly connect to the internet, all AWS SDK clients support the use of HTTP or HTTPS proxies. In a typical enterprise environment, all outbound network traffic has to go through proxy servers. If your application uses the Kinesis Producer Library (KPL) to collect and send data to AWS in an environment that uses proxy servers, your application requires KPL proxy configuration. The KPL is a high-level library built on top of the AWS Kinesis SDK. It is split into a native process and a wrapper. The native process performs all of the jobs of processing and sending records, while the wrapper manages the native process and communicates with it. For more information, see [Implementing Efficient and Reliable Producers with the Amazon Kinesis Producer Library](https://aws.amazon.com/blogs/big-data/implementing-efficient-and-reliable-producers-with-the-amazon-kinesis-producer-library/). 

The wrapper is written in Java and the native process is written in C++ using the AWS Kinesis SDK. KPL version 0.14.7 and higher supports proxy configuration in the Java wrapper, which passes all proxy configurations to the native process. For more information, see [https://github.com/awslabs/amazon-kinesis-producer/releases/tag/v0.14.7](https://github.com/awslabs/amazon-kinesis-producer/releases/tag/v0.14.7).

You can use the following code to add proxy configurations to your KPL applications.

```
KinesisProducerConfiguration configuration = new KinesisProducerConfiguration();
// The next four lines configure the proxy
configuration.setProxyHost("10.0.0.0");     // required
configuration.setProxyPort(3128);           // default port is 443
configuration.setProxyUserName("username"); // no default
configuration.setProxyPassword("password"); // no default

KinesisProducer kinesisProducer = new KinesisProducer(configuration);
```

# KPL version lifecycle policy
<a name="kpl-version-lifecycle-policy"></a>

This topic outlines the version lifecycle policy for Amazon Kinesis Producer Library (KPL). AWS regularly provides new releases for KPL versions to support new features and enhancements, bug fixes, security patches, and dependency updates. We recommend that you stay up-to-date with KPL versions to keep up with the latest features, security updates, and underlying dependencies. We **don't** recommend continued use of an unsupported KPL version.

The lifecycle for major KPL versions consists of the following three phases:
+ **General availability (GA)** – During this phase, the major version is fully supported. AWS provides regular minor and patch version releases that include support for new features or API updates for Kinesis Data Streams, as well as bug and security fixes.
+ **Maintenance mode** – AWS limits patch version releases to address critical bug fixes and security issues only. The major version won't receive updates for new features or APIs of Kinesis Data Streams.
+ **End-of-support** – The major version no longer receives updates or releases. Previously published releases continue to be available through public package managers, and the code remains on GitHub. Use of a version that has reached end-of-support is at your discretion. We recommend that you upgrade to the latest major version.


| Major version | Current phase | Release date | Maintenance mode date | End-of-support date | 
| --- | --- | --- | --- | --- | 
| KPL 0.x | Maintenance mode | 2015-06-02 | 2025-04-17 | 2026-01-30 | 
| KPL 1.x | General availability | 2024-12-15 | -- | -- | 