How Lambda processes records from Amazon Kinesis Data Streams
You can use a Lambda function to process records in an Amazon Kinesis data stream. You can map a Lambda function to a Kinesis Data Streams shared-throughput consumer (standard iterator), or to a dedicated-throughput consumer with enhanced fan-out. For standard iterators, Lambda polls each shard in your Kinesis stream for records using HTTP protocol. The event source mapping shares read throughput with other consumers of the shard.
For details about Kinesis data streams, see Reading Data from Amazon Kinesis Data Streams.
Note
Kinesis charges for each shard and, for enhanced fan-out, data read from the stream. For pricing details, see
Amazon Kinesis pricing
Topics
- Polling and batching streams
- Example event
- Process Amazon Kinesis Data Streams records with Lambda
- Configuring partial batch response with Kinesis Data Streams and Lambda
- Retain discarded batch records for a Kinesis Data Streams event source in Lambda
- Implementing stateful Kinesis Data Streams processing in Lambda
- Lambda parameters for Amazon Kinesis Data Streams event source mappings
- Using event filtering with a Kinesis event source
- Tutorial: Using Lambda with Kinesis Data Streams
Polling and batching streams
Lambda reads records from the data stream and invokes your function synchronously with an event that contains stream records. Lambda reads records in batches and invokes your function to process records from the batch. Each batch contains records from a single shard/data stream.
For standard Kinesis data streams, Lambda polls shards in your stream for records at a rate of once per second for each shard. For Kinesis enhanced fan-out, Lambda uses an HTTP/2 connection to listen for records being pushed from Kinesis. When records are available, Lambda invokes your function and waits for the result.
By default, Lambda invokes your function as soon as records are available. If the batch that Lambda reads from the event source has only one record in it, Lambda sends only one record to the function. To avoid invoking the function with a small number of records, you can tell the event source to buffer records for up to 5 minutes by configuring a batching window. Before invoking the function, Lambda continues to read records from the event source until it has gathered a full batch, the batching window expires, or the batch reaches the payload limit of 6 MB. For more information, see Batching behavior.
Warning
Lambda event source mappings process each event at least once, and duplicate processing of records can occur. To avoid potential issues
related to duplicate events, we strongly recommend that you make your function code idempotent. To learn more, see How do I make my Lambda function idempotent
Lambda doesn't wait for any configured extensions to complete before sending the next batch for processing. In other words, your extensions may continue to run as Lambda processes the next batch of records. This can cause throttling issues if you breach any of your account's concurrency settings or limits. To detect whether this is a potential issue, monitor your functions and check whether you're seeing higher concurrency metrics than expected for your event source mapping. Due to short times in between invokes, Lambda may briefly report higher concurrency usage than the number of shards. This can be true even for Lambda functions without extensions.
Configure the ParallelizationFactor setting to process one shard of a Kinesis data stream with more than one Lambda invocation simultaneously.
You can specify the number of concurrent batches that Lambda polls from a shard via a parallelization factor from 1 (default) to 10. For example, when you set ParallelizationFactor
to 2, you can have 200 concurrent Lambda invocations at maximum to process 100 Kinesis data shards (though in practice, you may see different values for the ConcurrentExecutions
metric).
This helps scale up the processing throughput when the data volume is volatile and
the IteratorAge
is high. When you increase the number of concurrent batches per shard, Lambda still ensures in-order processing at the partition-key level.
You can also use ParallelizationFactor
with Kinesis aggregation. The behavior of the event source mapping
depends on whether you're using enhanced fan-out:
-
Without enhanced fan-out: All of the events inside an aggregated event must have the same partition key. The partition key must also match that of the aggregated event. If the events inside the aggregated event have different partition keys, Lambda cannot guarantee in-order processing of the events by partition key.
-
With enhanced fan-out: First, Lambda decodes the aggregated event into its individual events. The aggregated event can have a different partition key than events it contains. However, events that don't correspond to the partition key are dropped and lost
. Lambda doesn't process these events, and doesn't send them to a configured failure destination.
Example event
{ "Records": [ { "kinesis": { "kinesisSchemaVersion": "1.0", "partitionKey": "1", "sequenceNumber": "49590338271490256608559692538361571095921575989136588898", "data": "SGVsbG8sIHRoaXMgaXMgYSB0ZXN0Lg==", "approximateArrivalTimestamp": 1545084650.987 }, "eventSource": "aws:kinesis", "eventVersion": "1.0", "eventID": "shardId-000000000006:49590338271490256608559692538361571095921575989136588898", "eventName": "aws:kinesis:record", "invokeIdentityArn": "arn:aws:iam::123456789012:role/lambda-role", "awsRegion": "us-east-2", "eventSourceARN": "arn:aws:kinesis:us-east-2:123456789012:stream/lambda-stream" }, { "kinesis": { "kinesisSchemaVersion": "1.0", "partitionKey": "1", "sequenceNumber": "49590338271490256608559692540925702759324208523137515618", "data": "VGhpcyBpcyBvbmx5IGEgdGVzdC4=", "approximateArrivalTimestamp": 1545084711.166 }, "eventSource": "aws:kinesis", "eventVersion": "1.0", "eventID": "shardId-000000000006:49590338271490256608559692540925702759324208523137515618", "eventName": "aws:kinesis:record", "invokeIdentityArn": "arn:aws:iam::123456789012:role/lambda-role", "awsRegion": "us-east-2", "eventSourceARN": "arn:aws:kinesis:us-east-2:123456789012:stream/lambda-stream" } ] }