

# Transform source data in Amazon Data Firehose
<a name="data-transformation"></a>

Amazon Data Firehose can invoke your Lambda function to transform incoming source data and deliver the transformed data to destinations. You can enable Amazon Data Firehose data transformation when you create your Firehose stream.

## Understand data transformation flow
<a name="data-transformation-flow"></a>

When you enable Firehose data transformation, Firehose buffers incoming data. The buffering size hint ranges from 0.2 MB to 3 MB. The default Lambda buffering size hint is 1 MB for all destinations except Splunk and Snowflake, for which the default is 256 KB. The Lambda buffering interval hint ranges from 0 to 900 seconds. The default Lambda buffering interval hint is 60 seconds for all destinations except Snowflake, for which the default is 30 seconds. To adjust the buffering size and interval, set the [ProcessingConfiguration](https://docs.aws.amazon.com/firehose/latest/APIReference/API_ProcessingConfiguration.html) parameter of the [CreateDeliveryStream](https://docs.aws.amazon.com/firehose/latest/APIReference/API_CreateDeliveryStream.html) or [UpdateDestination](https://docs.aws.amazon.com/firehose/latest/APIReference/API_UpdateDestination.html) API with the [ProcessorParameter](https://docs.aws.amazon.com/firehose/latest/APIReference/API_ProcessorParameter.html) values `BufferSizeInMBs` and `IntervalInSeconds`.

Firehose then invokes the specified Lambda function synchronously with each buffered batch, using the AWS Lambda synchronous invocation mode. Lambda returns the transformed data to Firehose, and Firehose sends it to the destination when the destination buffering size or buffering interval is reached, whichever happens first.

**Important**  
The Lambda synchronous invocation mode has a payload size limit of 6 MB for both the request and the response. Make sure that your buffering size for sending the request to the function is less than or equal to 6 MB. Also ensure that the response that your function returns doesn't exceed 6 MB.
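
As a minimal sketch of this configuration, the following builds a `ProcessingConfiguration` payload of the kind you would pass to a `CreateDeliveryStream` or `UpdateDestination` call (for example, through an AWS SDK such as boto3). The function ARN and parameter values are placeholders; the parameter names assume the `ProcessorParameter` API reference, where the buffering interval appears as `BufferIntervalInSeconds`.

```python
# Sketch only: builds the ProcessingConfiguration payload; nothing is sent to AWS here.
# The Lambda ARN below is a placeholder.
processing_configuration = {
    "Enabled": True,
    "Processors": [
        {
            "Type": "Lambda",
            "Parameters": [
                {
                    "ParameterName": "LambdaArn",
                    "ParameterValue": "arn:aws:lambda:us-east-1:111122223333:function:my-transform",
                },
                # Keep the request payload at or under the 6 MB synchronous limit.
                {"ParameterName": "BufferSizeInMBs", "ParameterValue": "3"},
                {"ParameterName": "BufferIntervalInSeconds", "ParameterValue": "60"},
            ],
        }
    ],
}
```

You would pass this dictionary as the `ProcessingConfiguration` argument of the create or update call in your SDK of choice.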

## Lambda invocation duration
<a name="data-transformation-execution-duration"></a>

Amazon Data Firehose supports a Lambda invocation time of up to 5 minutes. If your Lambda function takes more than 5 minutes to complete, you get the following error: "Firehose encountered timeout errors when calling AWS Lambda. The maximum supported function timeout is 5 minutes."

For information about what Amazon Data Firehose does if such an error occurs, see [Handle failure in data transformation](data-transformation-failure-handling.md).

# Required parameters for data transformation
<a name="data-transformation-status-model"></a>

All transformed records from Lambda must contain the following parameters. Otherwise, Amazon Data Firehose rejects the records and treats the rejection as a data transformation failure.

------
#### [ For Kinesis Data Streams and Direct PUT ]

The following parameters are required for all transformed records from Lambda.
+ `recordId` – The record ID is passed from Amazon Data Firehose to Lambda during the invocation. The transformed record must contain the same record ID. Any mismatch between the ID of the original record and the ID of the transformed record is treated as a data transformation failure.
+ `result` – The status of the data transformation of the record. The possible values are: `Ok` (the record was transformed successfully), `Dropped` (the record was dropped intentionally by your processing logic), and `ProcessingFailed` (the record could not be transformed). If a record has a status of `Ok` or `Dropped`, Amazon Data Firehose considers it successfully processed. Otherwise, Amazon Data Firehose considers it unsuccessfully processed.
+ `data` – The transformed data payload, after base64-encoding.

  Following is a sample Lambda result output:

  ```
  {
      "recordId": "<recordId from the Lambda input>",
      "result": "Ok",
      "data": "<Base64 encoded Transformed data>"
  }
  ```
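
This contract can be sketched as a minimal Python Lambda handler. The uppercase transform is purely illustrative; the required pieces are echoing `recordId` unchanged, setting `result`, and returning the payload base64-encoded in `data`.

```python
import base64

def lambda_handler(event, context):
    """Transform each buffered record and return it with the required fields."""
    output = []
    for record in event["records"]:
        # The incoming payload is base64-encoded; decode it before transforming.
        payload = base64.b64decode(record["data"]).decode("utf-8")
        transformed = payload.upper()  # illustrative transform only
        output.append({
            "recordId": record["recordId"],  # must match the original record ID
            "result": "Ok",                  # Ok | Dropped | ProcessingFailed
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Returning `"Dropped"` instead of `"Ok"` lets your logic filter out a record without it counting as a transformation failure.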

------
#### [ For Amazon MSK ]

The following parameters are required for all transformed records from Lambda.
+ `recordId` – The record ID is passed from Firehose to Lambda during the invocation. The transformed record must contain the same record ID. Any mismatch between the ID of the original record and the ID of the transformed record is treated as a data transformation failure.
+ `result` – The status of the data transformation of the record. The possible values are: `Ok` (the record was transformed successfully), `Dropped` (the record was dropped intentionally by your processing logic), and `ProcessingFailed` (the record could not be transformed). If a record has a status of `Ok` or `Dropped`, Firehose considers it successfully processed. Otherwise, Firehose considers it unsuccessfully processed.
+ `kafkaRecordValue` – The transformed data payload, after base64-encoding.

  Following is a sample Lambda result output:

  ```
  {
      "recordId": "<recordId from the Lambda input>",
      "result": "Ok",
      "kafkaRecordValue": "<Base64 encoded Transformed data>"
  }
  ```
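
A sketch of the equivalent handler for an Amazon MSK source follows. It assumes the incoming records carry their payload base64-encoded in a `kafkaRecordValue` field, mirroring the output contract above; verify the exact input event shape for your stream.

```python
import base64

def lambda_handler(event, context):
    """Transform MSK-sourced records, returning kafkaRecordValue instead of data."""
    output = []
    for record in event["records"]:
        # Assumption: the payload arrives base64-encoded in kafkaRecordValue.
        payload = base64.b64decode(record["kafkaRecordValue"]).decode("utf-8")
        transformed = payload.strip() + "\n"  # illustrative transform: normalize whitespace
        output.append({
            "recordId": record["recordId"],  # must match the original record ID
            "result": "Ok",                  # Ok | Dropped | ProcessingFailed
            "kafkaRecordValue": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```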

------

# Supported Lambda blueprints
<a name="lambda-blueprints"></a>

These blueprints demonstrate how you can create and use AWS Lambda functions to transform data in your Firehose streams.

**To see the blueprints that are available in the AWS Lambda console**

1. Sign in to the AWS Management Console and open the AWS Lambda console at [https://console.aws.amazon.com/lambda/](https://console.aws.amazon.com/lambda/).

1. Choose **Create function**, and then choose **Use a blueprint**.

1. In the **Blueprints** field, search for the keyword `firehose` to find the Amazon Data Firehose Lambda blueprints.

List of blueprints:
+ **Process records sent to Amazon Data Firehose stream (Node.js, Python)**

  This blueprint shows a basic example of how to process data in your Firehose stream using AWS Lambda.

  *Latest release date:* November 2016.

  *Release notes:* none.
+ **Process CloudWatch Logs sent to Firehose**

  This blueprint is deprecated. Do not use this blueprint. It might incur high charges when the decompressed CloudWatch Logs data exceeds 6 MB (the Lambda limit). For information on processing CloudWatch Logs sent to Firehose, see [Writing to Firehose Using CloudWatch Logs](https://docs.aws.amazon.com/firehose/latest/dev/writing-with-cloudwatch-logs.html).
+ **Convert Amazon Data Firehose stream records in syslog format to JSON (Node.js)**

  This blueprint shows how you can convert input records in RFC3164 Syslog format to JSON. 

  *Latest release date:* November 2016.

  *Release notes:* none.

**To see the blueprints that are available in the AWS Serverless Application Repository**

1. Go to [AWS Serverless Application Repository](https://aws.amazon.com/serverless/serverlessrepo).

1. Choose **Browse all applications**.

1. In the **Applications** field, search for the keyword `firehose`.

You can also create a Lambda function without using a blueprint. See [Getting Started with AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/getting-started.html).

# Handle failure in data transformation
<a name="data-transformation-failure-handling"></a>

If your Lambda function invocation fails because of a network timeout or because you've reached the Lambda invocation limit, Amazon Data Firehose retries the invocation three times by default. If the invocation still does not succeed, Amazon Data Firehose skips that batch of records and treats the skipped records as unsuccessfully processed. You can specify or override the retry options using the [CreateDeliveryStream](https://docs.aws.amazon.com/firehose/latest/APIReference/API_CreateDeliveryStream.html) or [UpdateDestination](https://docs.aws.amazon.com/firehose/latest/APIReference/API_UpdateDestination.html) API. For this type of failure, you can log invocation errors to Amazon CloudWatch Logs. For more information, see [Monitor Amazon Data Firehose Using CloudWatch Logs](monitoring-with-cloudwatch-logs.md).

If the status of the data transformation of a record is `ProcessingFailed`, Amazon Data Firehose treats the record as unsuccessfully processed. For this type of failure, you can emit error logs to Amazon CloudWatch Logs from your Lambda function. For more information, see [Accessing Amazon CloudWatch Logs for AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-logs.html) in the *AWS Lambda Developer Guide*.

If a data transformation fails, the unsuccessfully processed records are delivered to your S3 bucket in the `processing-failed` folder. The records have the following format:

```
{
    "attemptsMade": "count",
    "arrivalTimestamp": "timestamp",
    "errorCode": "code",
    "errorMessage": "message",
    "attemptEndingTimestamp": "timestamp",
    "rawData": "data",
    "lambdaArn": "arn"
}
```

`attemptsMade`  
The number of invocation requests attempted.

`arrivalTimestamp`  
The time that the record was received by Amazon Data Firehose.

`errorCode`  
The HTTP error code returned by Lambda.

`errorMessage`  
The error message returned by Lambda.

`attemptEndingTimestamp`  
The time that Amazon Data Firehose stopped attempting Lambda invocations.

`rawData`  
The base64-encoded record data.

`lambdaArn`  
The Amazon Resource Name (ARN) of the Lambda function.
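
For example, recovering the original payload from a processing-failed record for inspection or replay only requires base64-decoding `rawData`. The record below is illustrative; its field values, error code, and ARN are made up, but its shape matches the documented format.

```python
import base64

# Illustrative processing-failed record as it might appear in the
# processing-failed folder of your S3 bucket (values are placeholders).
failed_record = {
    "attemptsMade": "4",
    "arrivalTimestamp": "1670000000000",
    "errorCode": "Lambda.FunctionError",
    "errorMessage": "The Lambda function returned an invalid result.",
    "attemptEndingTimestamp": "1670000012345",
    "rawData": base64.b64encode(b'{"ticker": "AMZN"}').decode("utf-8"),
    "lambdaArn": "arn:aws:lambda:us-east-1:111122223333:function:my-transform",
}

# Recover the original source record for reprocessing.
original = base64.b64decode(failed_record["rawData"]).decode("utf-8")
```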

# Back up source records
<a name="data-transformation-source-record-backup"></a>

Amazon Data Firehose can back up all untransformed records to your S3 bucket concurrently while delivering transformed records to the destination. You can enable source record backup when you create or update your Firehose stream. You cannot disable source record backup after you enable it.