How Step Functions parses input CSV files
Step Functions parses CSV files based on the following rules:
- Commas (,) are delimiters that separate fields.
- Newlines are delimiters that separate records.
- Fields are treated as strings. For data type conversions, use the States.StringToJson intrinsic function in ItemSelector (Map).
- Double quotation marks (" ") are not required to enclose strings. However, strings that are enclosed in double quotation marks can contain commas and newlines without those characters acting as field or record delimiters.
- To include a literal double quotation mark inside a quoted string, repeat it ("").
- If a row has fewer fields than the header, Step Functions provides empty strings for the missing values.
- If a row has more fields than the header, Step Functions skips the additional fields.
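The rules above can be approximated locally to preview how a file might be split into items. The following is a minimal Python sketch, assuming that the standard library csv module is an acceptable stand-in for the Step Functions parser. It is not the parser that Step Functions uses, and the helper name parse_csv and the sample data are hypothetical.

```python
import csv
import io


def parse_csv(text, headers=None):
    """Approximate the parsing rules above with Python's csv module.

    Hypothetical helper for local experimentation only.
    """
    # newline="" keeps newlines inside quoted fields intact.
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if headers is None:
        # Treat the first record as the header row.
        headers, rows = rows[0], rows[1:]
    items = []
    for row in rows:
        # Fewer fields than headers: pad with empty strings.
        padded = row + [""] * (len(headers) - len(row))
        # More fields than headers: zip stops at the header count,
        # so the additional fields are skipped.
        items.append(dict(zip(headers, padded)))
    return items


sample = (
    "id,name,note\n"
    '1,"Doe, Jane","said ""hi""\nthen left"\n'  # quoted comma, quote, newline
    "2,Bob\n"                                    # missing field
    "3,Eve,ok,extra\n"                           # additional field
)
items = parse_csv(sample)
```

Every value in items is a string, which mirrors the rule that fields are treated as strings until they are explicitly converted.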
Example of parsing an input CSV file
Say that you have provided a CSV file named myCSVInput.csv that contains one row as input. Then, you've stored this file in an Amazon S3 bucket that's named amzn-s3-demo-bucket. The CSV file is as follows.
abc,123,"This string contains commas, a double quotation marks (""), and a newline (
)",{""MyKey"":""MyValue""},"[1,2,3]"
The following state machine reads this CSV file and uses ItemSelector (Map) to convert the data types of some of the fields.
{
  "StartAt": "Map",
  "States": {
    "Map": {
      "Type": "Map",
      "ItemProcessor": {
        "ProcessorConfig": {
          "Mode": "DISTRIBUTED",
          "ExecutionType": "STANDARD"
        },
        "StartAt": "Pass",
        "States": {
          "Pass": {
            "Type": "Pass",
            "End": true
          }
        }
      },
      "End": true,
      "Label": "Map",
      "MaxConcurrency": 1000,
      "ItemReader": {
        "Resource": "arn:aws:states:::s3:getObject",
        "ReaderConfig": {
          "InputType": "CSV",
          "CSVHeaderLocation": "GIVEN",
          "CSVHeaders": [
            "MyLetters",
            "MyNumbers",
            "MyString",
            "MyObject",
            "MyArray"
          ]
        },
        "Parameters": {
          "Bucket": "amzn-s3-demo-bucket",
          "Key": "myCSVInput.csv"
        }
      },
      "ItemSelector": {
        "MyLetters.$": "$$.Map.Item.Value.MyLetters",
        "MyNumbers.$": "States.StringToJson($$.Map.Item.Value.MyNumbers)",
        "MyString.$": "$$.Map.Item.Value.MyString",
        "MyObject.$": "States.StringToJson($$.Map.Item.Value.MyObject)",
        "MyArray.$": "States.StringToJson($$.Map.Item.Value.MyArray)"
      }
    }
  }
}
When you run this state machine, it produces the following output.
[
  {
    "MyNumbers": 123,
    "MyObject": {
      "MyKey": "MyValue"
    },
    "MyString": "This string contains commas, a double quote (\"), and a newline (\n)",
    "MyLetters": "abc",
    "MyArray": [
      1,
      2,
      3
    ]
  }
]
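The type conversions that ItemSelector performs in this example can be approximated outside Step Functions. The following Python sketch uses json.loads as a rough analogue of States.StringToJson. The function name select_item and the hard-coded item are illustrative assumptions; the item represents the string fields as they would look after the CSV row is parsed.

```python
import json


def select_item(value):
    """Mimic the ItemSelector above: convert some string fields to JSON types.

    Hypothetical helper; json.loads stands in for States.StringToJson.
    """
    return {
        "MyLetters": value["MyLetters"],              # left as a string
        "MyNumbers": json.loads(value["MyNumbers"]),  # "123" becomes a number
        "MyString": value["MyString"],                # left as a string
        "MyObject": json.loads(value["MyObject"]),    # becomes an object
        "MyArray": json.loads(value["MyArray"]),      # becomes an array
    }


# Every field of a parsed CSV row starts out as a string.
item = {
    "MyLetters": "abc",
    "MyNumbers": "123",
    "MyString": 'This string contains commas, a double quote ("), and a newline (\n)',
    "MyObject": '{"MyKey":"MyValue"}',
    "MyArray": "[1,2,3]",
}
result = select_item(item)
```

Fields that are passed through without conversion, such as MyLetters and MyString, remain strings in the result.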