

For similar capabilities to Amazon Timestream for LiveAnalytics, consider Amazon Timestream for InfluxDB. It offers simplified data ingestion and single-digit millisecond query response times for real-time analytics. Learn more [here](https://docs.aws.amazon.com//timestream/latest/developerguide/timestream-for-influxdb.html).

# Using UNLOAD to export query results to S3 from Timestream for LiveAnalytics
<a name="export-unload"></a>

Amazon Timestream for LiveAnalytics enables you to export your query results to Amazon S3 in a cost-effective and secure way using the `UNLOAD` statement. Using the `UNLOAD` statement, you can export time series data to selected S3 buckets in either Apache Parquet or comma-separated values (CSV) format, which provides flexibility to store, combine, and analyze your time series data with other services. The `UNLOAD` statement allows you to export the data in a compressed manner, which reduces the data transferred and the storage space required. `UNLOAD` also supports partitioning based on selected attributes when exporting the data, improving performance and reducing the processing time of downstream services accessing the data. In addition, you can use Amazon S3 managed keys (SSE-S3) or AWS Key Management Service (AWS KMS) managed keys (SSE-KMS) to encrypt your exported data.

## Benefits of UNLOAD from Timestream for LiveAnalytics
<a name="export-unload-benefits"></a>

The key benefits of using the `UNLOAD` statement are as follows.
+ **Operational ease** – With the `UNLOAD` statement, you can export gigabytes of data in a single query request in either Apache Parquet or CSV format, providing flexibility to select the best suited format for your downstream processing needs and making it easier to build data lakes.
+ **Secure and cost-effective** – The `UNLOAD` statement provides the capability to export your data to an S3 bucket in a compressed manner and to encrypt (SSE-KMS or SSE-S3) your data, reducing data storage costs and protecting against unauthorized access.
+ **Performance** – Using the `UNLOAD` statement, you can partition the data when exporting to an S3 bucket. Partitioning the data enables downstream services to process the data in parallel, reducing their processing time. In addition, downstream services can process only the data they need, reducing the processing resources required and thereby costs associated.

## Use cases for UNLOAD from Timestream for LiveAnalytics
<a name="export-unload-use-cases"></a>

You can use the `UNLOAD` statement to write data to your S3 bucket for the following use cases.
+ **Build Data Warehouse** – You can export gigabytes of query results into an S3 bucket and more easily add time series data into your data lake. You can use services such as Amazon Athena and Amazon Redshift to combine your time series data with other relevant data to derive complex business insights. 
+ **Build AI and ML data pipelines** – The `UNLOAD` statement enables you to easily build data pipelines for your machine learning models that access time series data, making it easier to use time series data with services such as Amazon SageMaker and Amazon EMR.
+ **Simplify ETL Processing** – Exporting data into S3 buckets can simplify the process of performing Extract, Transform, Load (ETL) operations on the data, enabling you to seamlessly use third-party tools or AWS services such as AWS Glue to process and transform the data.

# UNLOAD Concepts
<a name="export-unload-concepts"></a>

## Syntax
<a name="export-unload-concepts-syntax"></a>

```
UNLOAD (SELECT statement)
 TO 's3://bucket-name/folder'
 WITH ( option = expression [, ...] )
```

where `option` is

```
{ partitioned_by = ARRAY[ col_name[,…] ]
 | format = [ '{ CSV | PARQUET }' ]
 | compression = [ '{ GZIP | NONE }' ]
 | encryption = [ '{ SSE_KMS | SSE_S3 }' ]
 | kms_key = '<string>'
 | field_delimiter = '<character>'
 | escaped_by = '<character>'
 | include_header = [ '{ true | false }' ]
 | max_file_size = '<value>'
}
```

## Parameters
<a name="export-unload-concepts-parameters"></a>

SELECT statement  
The query statement used to select and retrieve data from one or more Timestream for LiveAnalytics tables.   

```
(SELECT column_1, column_2, column_3 from database.table
      where measure_name = 'ABC' and time between ago(1d) and now())
```

TO clause  

```
TO 's3://bucket-name/folder'
```
or  

```
TO 's3://access-point-alias/folder'
```
The `TO` clause in the `UNLOAD` statement specifies the destination for the output of the query results. You need to provide the full path, including either the Amazon S3 bucket name or the Amazon S3 access point alias, with the folder location on Amazon S3 where Timestream for LiveAnalytics writes the output file objects. The S3 bucket must be owned by the same account and be in the same Region. In addition to the query result set, Timestream for LiveAnalytics writes the manifest and metadata files to the specified destination folder. 

PARTITIONED_BY clause  

```
partitioned_by = ARRAY[ col_name[,…] ] (default: none)
```
The `partitioned_by` clause is used in queries to group and analyze data at a granular level. When you export your query results to the S3 bucket, you can choose to partition the data based on one or more columns in the select query. When partitioning the data, the exported data is divided into subsets based on the partition column, and each subset is stored in a separate folder. Within the results folder that contains your exported data, a sub-folder `folder/results/partition_column=partition_value/` is automatically created. However, note that partitioned columns are not included in the output file.   
`partitioned_by` is not a mandatory clause in the syntax. If you choose to export the data without any partitioning, you can omit the clause.   

**Example**  
Assume you are monitoring clickstream data of your website and have 5 channels of traffic, namely `direct`, `Social Media`, `Organic Search`, `Other`, and `Referral`. When exporting the data, you can choose to partition it using the column `Channel`. Within your data folder, `s3://bucketname/results`, you will have five folders, each with its respective channel name, for instance, `s3://bucketname/results/channel=Social Media/`. Within this folder you will find the data of all the customers that landed on your website through the `Social Media` channel. Similarly, you will have folders for the remaining channels.
Exported data partitioned by Channel column  

![\[Folder structure showing channels: Direct, Organic search, Other, Referral, and Social media.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/unload-results.png)
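The `results/column=value/` sub-folder pattern described above can be sketched as a tiny helper (a hypothetical illustration, not part of any AWS SDK; the function and prefix names are made up):

```python
def partition_prefix(results_prefix, column, value):
    """Build the S3 sub-prefix that UNLOAD creates for one partition value,
    following the results/partition_column=partition_value/ pattern."""
    return f"{results_prefix.rstrip('/')}/{column}={value}/"

print(partition_prefix("s3://bucketname/results", "channel", "Social Media"))
# s3://bucketname/results/channel=Social Media/
```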


FORMAT  

```
format = [ '{ CSV | PARQUET }' ], default: CSV
```
The keywords to specify the format of the query results written to your S3 bucket. You can export the data either as a comma separated value (CSV) using a comma (,) as the default delimiter or in the Apache Parquet format, an efficient open columnar storage format for analytics. 

COMPRESSION  

```
compression = [ '{ GZIP | NONE }' ], default: GZIP
```
You can compress the exported data using the GZIP compression algorithm or have it uncompressed by specifying the `NONE` option.

ENCRYPTION  

```
encryption = [ '{ SSE_KMS | SSE_S3 }' ], default: SSE_S3
```
The output files on Amazon S3 are encrypted using your selected encryption option. In addition to your data, the manifest and metadata files are also encrypted based on your selected encryption option. We currently support SSE_S3 and SSE_KMS encryption. SSE_S3 is server-side encryption in which Amazon S3 encrypts the data using 256-bit Advanced Encryption Standard (AES) encryption. SSE_KMS is server-side encryption that encrypts data using customer-managed keys.

KMS_KEY  

```
kms_key = '<string>'
```
KMS Key is a customer-defined key to encrypt exported query results. KMS Key is securely managed by AWS Key Management Service (AWS KMS) and used to encrypt data files on Amazon S3.

FIELD_DELIMITER  

```
field_delimiter = '<character>', default: (,)
```
When exporting the data in CSV format, this field specifies a single ASCII character that is used to separate fields in the output file, such as a pipe character (|), a comma (,), or a tab (\t). The default delimiter for CSV files is a comma. If a value in your data contains the chosen delimiter, the delimiter will be quoted with a quote character. For instance, if a value in your data contains `Time,stream`, then this value will be quoted as `"Time,stream"` in the exported data. The quote character used by Timestream for LiveAnalytics is double quotes (").  
Avoid specifying the carriage return character (ASCII 13, hex `0D`, text '\r') or the line break character (ASCII 10, hex `0A`, text '\n') as the `FIELD_DELIMITER` if you want to include headers in the CSV, since that will prevent many parsers from being able to parse the headers correctly in the resulting CSV output.

ESCAPED_BY  

```
escaped_by = '<character>', default: (\)
```
When exporting the data in CSV format, this field specifies the character that should be treated as an escape character in the data file written to S3 bucket. Escaping happens in the following scenarios:  

1. If the value itself contains the quote character (") then it will be escaped using an escape character. For example, if the value is `Time"stream`, where (\) is the configured escape character, then it will be escaped as `Time\"stream`. 

1. If the value contains the configured escape character, it will be escaped. For example, if the value is `Time\stream`, then it will be escaped as `Time\\stream`. 
If the exported output contains complex data types such as Arrays, Rows, or Timeseries, they are serialized as JSON strings. Following is an example.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/timestream/latest/developerguide/export-unload-concepts.html)
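The two escaping rules above can be sketched as a small function (a sketch of the described behavior, not Timestream code; the quote character is `"` and the default escape character is `\`, and values containing the delimiter are additionally quoted as described under `FIELD_DELIMITER`):

```python
def escape_csv_value(value, delimiter=",", escape="\\", quote='"'):
    """Sketch of the UNLOAD CSV escaping rules described above."""
    out = value.replace(escape, escape + escape)  # rule 2: escape the escape character
    out = out.replace(quote, escape + quote)      # rule 1: escape the quote character
    if delimiter in out:                          # quote values containing the delimiter
        out = quote + out + quote
    return out

print(escape_csv_value('Time"stream'))   # Time\"stream
print(escape_csv_value("Time\\stream"))  # Time\\stream
print(escape_csv_value("Time,stream"))   # "Time,stream"
```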

INCLUDE_HEADER  

```
include_header = 'true', default: 'false'
```
When exporting the data in CSV format, this field lets you include column names as the first row of the exported CSV data files.  
The accepted values are 'true' and 'false' and the default value is 'false'. Text transformation options such as `escaped_by` and `field_delimiter` apply to headers as well.  
When including headers, it is important that you not select a carriage return character (ASCII 13, hex `0D`, text '\r') or a line break character (ASCII 10, hex `0A`, text '\n') as the `FIELD_DELIMITER`, since that will prevent many parsers from being able to parse the headers correctly in the resulting CSV output.

MAX_FILE_SIZE  

```
max_file_size = 'X[MB|GB]' , default: '78GB'
```
This field specifies the maximum size of the files that the `UNLOAD` statement creates in Amazon S3. The `UNLOAD` statement can create multiple files but the maximum size of each file written to Amazon S3 will be approximately what is specified in this field.  
The value of the field must be between 16 MB and 78 GB, inclusive. You can specify it as an integer, such as `12GB`, or as a decimal, such as `0.5GB` or `24.7MB`. The default value is 78 GB.  
The actual file size is approximated when the file is being written, so the actual maximum size may not be exactly equal to the number you specify.
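The format and range rules above can be sketched as a small client-side validator (an illustrative helper, not AWS code; whether the units are binary or decimal is not stated in the documentation, so binary units are an assumption here):

```python
import re

def parse_max_file_size(value):
    """Sketch: validate an UNLOAD max_file_size string such as '12GB' or '24.7MB'.

    Returns the size in bytes. The allowed range is 16 MB to 78 GB inclusive,
    per the parameter description above. Binary units are assumed.
    """
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(MB|GB)", value)
    if not m:
        raise ValueError(f"unrecognized max_file_size: {value!r}")
    number, unit = float(m.group(1)), m.group(2)
    size_mb = number * (1024 if unit == "GB" else 1)
    if not 16 <= size_mb <= 78 * 1024:
        raise ValueError("max_file_size must be between 16 MB and 78 GB")
    return int(size_mb * 1024 * 1024)
```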

## What is written to my S3 bucket?
<a name="export-unload-common-questions-what-is-written"></a>

For every successfully executed `UNLOAD` query, Timestream for LiveAnalytics writes your query results, a metadata file, and a manifest file into the S3 bucket. If you have partitioned the data, you have all the partition folders in the results folder. The manifest file contains a list of the files that were written by the `UNLOAD` command. The metadata file contains information that describes the characteristics, properties, and attributes of the written data. 

## What is the exported file name?
<a name="export-unload-common-questions-what-is-filename"></a>

The exported file name contains two components: the first is the query ID and the second is a unique identifier.

CSV files

```
s3://bucket_name/results/<queryid>_<UUID>.csv
s3://bucket_name/results/<partitioncolumn>=<partitionvalue>/<queryid>_<UUID>.csv
```

Compressed CSV file

```
s3://bucket_name/results/<partitioncolumn>=<partitionvalue>/<queryid>_<UUID>.gz 
```

Parquet file

```
s3://bucket_name/results/<partitioncolumn>=<partitionvalue>/<queryid>_<UUID>.parquet
```

Metadata and Manifest files

```
s3://bucket_name/<queryid>_<UUID>_manifest.json 
s3://bucket_name/<queryid>_<UUID>_metadata.json

Because data in CSV format is compressed at the file level, compressed CSV exports have a `.gz` extension. Data in Parquet, however, is compressed at the column level, so even when you compress the data while exporting, the file still has a `.parquet` extension.
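The extension rules can be summarized in a short sketch (a hypothetical helper for illustration, not AWS code):

```python
def exported_file_extension(fmt, compression="GZIP"):
    """Sketch of the extension rules described above: GZIP-compressed CSV
    output gets .gz, uncompressed CSV gets .csv, and Parquet always keeps
    .parquet because its compression is applied per column inside the file."""
    if fmt.upper() == "PARQUET":
        return ".parquet"
    return ".gz" if compression.upper() == "GZIP" else ".csv"
```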

## What information does each file contain?
<a name="export-unload-common-questions-what-information"></a>

### Manifest file
<a name="export-unload-common-questions-what-information-manifest"></a>

The manifest file provides information on the list of files that are exported with the `UNLOAD` execution. The manifest file is available in the provided S3 bucket with the file name `s3://<bucket_name>/<queryid>_<UUID>_manifest.json`. The manifest file contains the URLs of the files in the results folder, the number of records and size of the respective files, and the query metadata (the total bytes and total rows exported to S3 for the query). 

```
{
  "result_files": [
    {
        "url":"s3://my_timestream_unloads/ec2_metrics/AEDAGANLHLBH4OLISD3CVOZZRWPX5GV2XCXRBKCVD554N6GWPWWXBP7LSG74V2Q_1448466917_szCL4YgVYzGXj2lS.gz", 
        "file_metadata": 
            { 
                "content_length_in_bytes": 32295, 
                "row_count": 10 
            }
    },
    {
        "url":"s3://my_timestream_unloads/ec2_metrics/AEDAGANLHLBH4OLISD3CVOZZRWPX5GV2XCXRBKCVD554N6GWPWWXBP7LSG74V2Q_1448466917_szCL4YgVYzGXj2lS.gz", 
        "file_metadata": 
            { 
                "content_length_in_bytes": 62295, 
                "row_count": 20 
            }
    },
  ],
  "query_metadata": 
    {
      "content_length_in_bytes": 94590, 
      "total_row_count": 30,
      "result_format": "CSV",
      "result_version": "Amazon Timestream version 1.0.0"  
    },
  "author": {
        "name": "Amazon Timestream", 
        "manifest_file_version": "1.0" 
  }
}
```
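The `query_metadata` totals are the sums of the per-file metadata in `result_files` (32295 + 62295 = 94590 bytes, 10 + 20 = 30 rows in the example above). A quick sanity check of a parsed manifest might look like this (a sketch assuming the manifest layout shown above; the function name is made up):

```python
def check_manifest_totals(manifest):
    """Sketch: verify that query_metadata totals match the per-file
    metadata listed in result_files."""
    files = [f["file_metadata"] for f in manifest["result_files"]]
    total_bytes = sum(f["content_length_in_bytes"] for f in files)
    total_rows = sum(f["row_count"] for f in files)
    qm = manifest["query_metadata"]
    return (total_bytes == qm["content_length_in_bytes"]
            and total_rows == qm["total_row_count"])

# Abbreviated manifest with the numbers from the example above:
example = {
    "result_files": [
        {"file_metadata": {"content_length_in_bytes": 32295, "row_count": 10}},
        {"file_metadata": {"content_length_in_bytes": 62295, "row_count": 20}},
    ],
    "query_metadata": {"content_length_in_bytes": 94590, "total_row_count": 30},
}
assert check_manifest_totals(example)
```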

### Metadata
<a name="export-unload-common-questions-what-information-metadata"></a>

The metadata file provides additional information about the data set such as column name, column type, and schema. The metadata file is available in the provided S3 bucket with the file name `s3://bucket_name/<queryid>_<UUID>_metadata.json`.

Following is an example of a metadata file.

```
{
    "ColumnInfo": [
        {
            "Name": "hostname",
            "Type": {
                "ScalarType": "VARCHAR"
            }
        },
        {
            "Name": "region",
            "Type": {
                "ScalarType": "VARCHAR"
            }
        },
        {
            "Name": "measure_name",
            "Type": {
                "ScalarType": "VARCHAR"
            }
        },
        {
            "Name": "cpu_utilization",
            "Type": {
                "TimeSeriesMeasureValueColumnInfo": {
                    "Type": {
                        "ScalarType": "DOUBLE"
                    }
                }
            }
        }
  ],
  "Author": {
        "Name": "Amazon Timestream", 
        "MetadataFileVersion": "1.0" 
  }
}
```

The column information shared in the metadata file has the same structure as the `ColumnInfo` sent in the Query API response for `SELECT` queries. 

### Results
<a name="export-unload-common-questions-what-information-results"></a>

The results folder contains your exported data in either Apache Parquet or CSV format. 

## Example
<a name="export-unload-example-short"></a>

When you submit an `UNLOAD` query like the following via the Query API,

```
UNLOAD(SELECT user_id, ip_address, event, session_id, measure_name, time, query, quantity, product_id, channel 
                    FROM sample_clickstream.sample_shopping WHERE time BETWEEN ago(2d) AND now()) 
                TO 's3://my_timestream_unloads/withoutpartition/' WITH ( format='CSV', compression='GZIP')
```

the `UNLOAD` query response will have 1 row with 3 columns. Those 3 columns are:
+ rows of type BIGINT - indicating the number of rows exported
+ metadataFile of type VARCHAR - which is the S3 URI of metadata file exported
+ manifestFile of type VARCHAR - which is the S3 URI of manifest file exported

You will get the following response from Query API:

```
{
    "Rows": [
        {
            "Data": [
                {
                    "ScalarValue": "20" # No of rows in output across all files
                },
                {
                    "ScalarValue": "s3://my_timestream_unloads/withoutpartition/AEDAAANGH3D7FYHOBQGQQMEAISCJ45B42OWWJMOT4N6RRJICZUA7R25VYVOHJIY_<UUID>_metadata.json" #Metadata file
                },
                {
                    "ScalarValue": "s3://my_timestream_unloads/withoutpartition/AEDAAANGH3D7FYHOBQGQQMEAISCJ45B42OWWJMOT4N6RRJICZUA7R25VYVOHJIY_<UUID>_manifest.json" #Manifest file
                }
            ]
        }
    ],
    "ColumnInfo": [
        {
            "Name": "rows",
            "Type": {
                "ScalarType": "BIGINT"
            }
        },
        {
            "Name": "metadataFile",
            "Type": {
                "ScalarType": "VARCHAR"
            }
        },
        {
            "Name": "manifestFile",
            "Type": {
                "ScalarType": "VARCHAR"
            }
        }
    ],
    "QueryId": "AEDAAANGH3D7FYHOBQGQQMEAISCJ45B42OWWJMOT4N6RRJICZUA7R25VYVOHJIY",
    "QueryStatus": {
        "ProgressPercentage": 100.0,
        "CumulativeBytesScanned": 1000,
        "CumulativeBytesMetered": 10000000
    }
}
```
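Reading those three columns out of the response can be sketched as follows (the `parse_unload_result` helper is hypothetical and assumes the response shape shown above; submitting the query itself uses the real boto3 `timestream-query` client and requires AWS credentials):

```python
def parse_unload_result(response):
    """Sketch: map the single result row of an UNLOAD query response
    onto named fields (rows, metadataFile, manifestFile)."""
    names = [col["Name"] for col in response["ColumnInfo"]]
    values = [item["ScalarValue"] for item in response["Rows"][0]["Data"]]
    result = dict(zip(names, values))
    result["rows"] = int(result["rows"])  # number of rows exported
    return result

# Submitting the query would look like this (requires AWS credentials):
# import boto3
# client = boto3.client("timestream-query")
# response = client.query(QueryString="UNLOAD(SELECT ...) TO 's3://...' WITH (format='CSV')")
# print(parse_unload_result(response))
```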

## Data types
<a name="export-unload-data-types-explanation"></a>

The `UNLOAD` statement supports all data types of the Timestream for LiveAnalytics query language described in [Supported data types](supported-data-types.md) except `time` and `unknown`.

# Prerequisites for UNLOAD from Timestream for LiveAnalytics
<a name="export-unload-prerequisites"></a>

Following are prerequisites for writing data to S3 using `UNLOAD` from Timestream for LiveAnalytics.
+ You must have permission to read data from the Timestream for LiveAnalytics table(s) to be used in an `UNLOAD` command.
+ You must have an Amazon S3 bucket in the same AWS Region as your Timestream for LiveAnalytics resources.
+ For the selected S3 bucket, ensure that the [S3 bucket policy](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html) also has permissions to allow Timestream for LiveAnalytics to export the data.
+ The credentials used to execute the `UNLOAD` query must have the necessary AWS Identity and Access Management (IAM) permissions that allow Timestream for LiveAnalytics to write the data to S3. An example policy follows:

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [{
            "Effect": "Allow",
            "Action": [
                "timestream:Select",
                "timestream:ListMeasures",
                "timestream:WriteRecords",
                "timestream:Unload"
            ],
            "Resource": "arn:aws:timestream:us-east-2:111122223333:database/database_name/table/table_name"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketAcl",
                "s3:PutObject",
                "s3:GetObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::S3_Bucket_Created",
                "arn:aws:s3:::S3_Bucket_Created/*"
            ]
        }
    ]
}
```

------

For additional context on these S3 write permissions, refer to the [Amazon Simple Storage Service guide](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html#mpuAndPermissions). If you are using a KMS key for encrypting the exported data, see the following for the additional IAM policies required.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
            "kms:DescribeKey",
            "kms:Decrypt",
            "kms:GenerateDataKey*"
        ],
        "Resource": "arn:aws:kms:us-east-2:111122223333:key/*",
        "Condition": {
            "ForAnyValue:StringLike": {
                "kms:ResourceAliases": "alias/Alias_For_Generated_Key"
            }
        }
    }, {
        "Effect": "Allow",
        "Action": [
            "kms:CreateGrant"
        ],
        "Resource": "arn:aws:kms:us-east-2:111122223333:key/*",
        "Condition": {
            "ForAnyValue:StringEquals": {
                "kms:EncryptionContextKeys": "aws:timestream:database_name"
            },
            "Bool": {
                "kms:GrantIsForAWSResource": true
            },
            "StringLike": {
                "kms:ViaService": "timestream.us-east-2.amazonaws.com"
            },
            "ForAnyValue:StringLike": {
                "kms:ResourceAliases": "alias/Alias_For_Generated_Key"
            }
        }
    }
]
}
```

------

# Best practices for UNLOAD from Timestream for LiveAnalytics
<a name="export-unload-best-practices"></a>

Following are best practices related to the UNLOAD command.
+ The amount of data that can be exported to an S3 bucket using the `UNLOAD` command is not bounded. However, the query times out in 60 minutes, and we recommend exporting no more than 60 GB of data in a single query. If you need to export more than 60 GB of data, split the job across multiple queries.
+ While you can send thousands of requests to S3 to upload the data, it is recommended to parallelize the write operations to multiple S3 prefixes. Refer to the documentation [here](https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html). The S3 API call rate could be throttled when multiple readers/writers access the same folder. 
+ Given the limit on S3 key length for defining a prefix, we recommend keeping bucket and folder names within 10-15 characters, especially when using the `partitioned_by` clause. 
+ When you receive a 4XX or 5XX for queries containing the `UNLOAD` statement, it is possible that partial results were written into the S3 bucket. Timestream for LiveAnalytics does not delete any data from your bucket. Before executing another `UNLOAD` query with the same S3 destination, we recommend manually deleting the files created by the failed query. You can identify the files written by a failed query by the corresponding query ID. For failed queries, Timestream for LiveAnalytics does not export a manifest file to the S3 bucket. 
+ Timestream for LiveAnalytics uses multipart upload to export query results to S3. When you receive a 4XX or 5XX from Timestream for LiveAnalytics for queries containing an `UNLOAD` statement, Timestream for LiveAnalytics does a best-effort abort of the multipart upload, but it is possible that some incomplete parts are left behind. Hence, we recommend setting up automatic cleanup of incomplete multipart uploads in your S3 bucket by following the guidelines [here](https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/).
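One way to set up that cleanup is an S3 lifecycle rule that aborts incomplete multipart uploads after a number of days (a sketch; the rule ID and the 7-day window below are illustrative choices, not prescribed values):

```
{
  "Rules": [
    {
      "ID": "abort-incomplete-unload-uploads",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
```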

## Recommendations for accessing the data in CSV format using CSV parser
<a name="export-unload-common-questions-what-information-recommendations-csv"></a>
+ CSV parsers don't allow you to use the same character as the delimiter, escape, and quote character.
+ Some CSV parsers cannot interpret complex data types such as Arrays; we recommend interpreting those through a JSON deserializer. 

## Recommendations for accessing the data in Parquet format
<a name="export-unload-common-questions-what-information-recommendations-parquet"></a>

1. If your use case requires UTF-8 character support in the schema (that is, in column names), we recommend using the [Parquet-mr library](https://github.com/apache/parquet-mr).

1. The timestamp in your results is represented as a 12-byte integer (INT96).

1. Timeseries will be represented as `array<row<time, value>>`; other nested structures use the corresponding data types supported in the Parquet format.

## Using the partitioned_by clause
<a name="export-unload-best-practices-partition-by"></a>
+ The column used in the `partitioned_by` field should be the last column in the select query. If more than one column is used in the `partitioned_by` field, the columns should be the last columns in the select query and in the same order as used in the `partitioned_by` field.
+ The column values used to partition the data (`partitioned_by` field) can contain only ASCII characters. While Timestream for LiveAnalytics allows UTF-8 characters in the values, S3 supports only ASCII characters as object keys.

# Example use case for UNLOAD from Timestream for LiveAnalytics
<a name="export-unload-example-use-case"></a>

Assume you are monitoring user session metrics, traffic sources, and product purchases of your e-commerce website. You are using Timestream for LiveAnalytics to derive real-time insights into user behavior, product sales, and perform marketing analytics on traffic channels (organic search, social media, direct traffic, paid campaigns and others) that drive customers to the website. 

**Topics**
+ [Exporting the data without any partitions](#export-unload-example-sample-1)
+ [Partitioning data by channel](#export-unload-example-sample-2)
+ [Partitioning data by event](#export-unload-example-sample-3)
+ [Partitioning data by both channel and event](#export-unload-example-sample-4)
+ [Manifest and metadata files](#export-unload-example-manifest-metadata)
+ [Using Glue crawlers to build Glue Data Catalog](#export-unload-example-using-glue-crawlers)

## Exporting the data without any partitions
<a name="export-unload-example-sample-1"></a>

You want to export the last two days of your data in CSV format.

```
UNLOAD(SELECT user_id, ip_address, event, session_id, measure_name, time, 
query, quantity, product_id, channel 
FROM sample_clickstream.sample_shopping 
WHERE time BETWEEN ago(2d) AND now())  
TO 's3://<bucket_name>/withoutpartition' 
WITH (  format='CSV',   
compression='GZIP')
```

## Partitioning data by channel
<a name="export-unload-example-sample-2"></a>

You want to export the last two days of data in CSV format but would like to have the data from each traffic channel in a separate folder. To do this, you need to partition the data using the `channel` column as shown in the following.

```
UNLOAD(SELECT user_id, ip_address, event, session_id, measure_name, time, 
query, quantity, product_id, channel 
FROM sample_clickstream.sample_shopping 
WHERE time BETWEEN ago(2d) AND now())  
TO 's3://<bucket_name>/partitionbychannel/' 
WITH (  
partitioned_by = ARRAY ['channel'], 
format='CSV',   
compression='GZIP')
```

## Partitioning data by event
<a name="export-unload-example-sample-3"></a>

You want to export the last two days of data in CSV format but would like to have the data for each event in a separate folder. To do this, you need to partition the data using the `event` column as shown in the following.

```
UNLOAD(SELECT user_id, ip_address, channel, session_id, measure_name, time, 
query, quantity, product_id, event 
FROM sample_clickstream.sample_shopping 
WHERE time BETWEEN ago(2d) AND now())  
TO 's3://<bucket_name>/partitionbyevent/' 
WITH (  
partitioned_by = ARRAY ['event'], 
format='CSV',   
compression='GZIP')
```

## Partitioning data by both channel and event
<a name="export-unload-example-sample-4"></a>

You want to export the last two days of data in CSV format but would like to have the data for each channel and, within each channel, each event in a separate folder. To do this, you need to partition the data using both the `channel` and `event` columns as shown in the following.

```
UNLOAD(SELECT user_id, ip_address, session_id, measure_name, time, 
query, quantity, product_id, channel,event 
FROM sample_clickstream.sample_shopping 
WHERE time BETWEEN ago(2d) AND now())  
TO 's3://<bucket_name>/partitionbychannelevent/' 
WITH (  
partitioned_by = ARRAY ['channel','event'], 
format='CSV',   
compression='GZIP')
```

## Manifest and metadata files
<a name="export-unload-example-manifest-metadata"></a>

The manifest and metadata files written for these example queries have the same structure as described in [What information does each file contain?](#export-unload-common-questions-what-information) earlier in this topic.

## Using Glue crawlers to build Glue Data Catalog
<a name="export-unload-example-using-glue-crawlers"></a>

1. Log in to your account with Admin credentials for the following validation.

1. Create a crawler for the Glue database using the guidelines provided [here](https://docs.aws.amazon.com/glue/latest/ug/tutorial-add-crawler.html). Note that the S3 folder to be provided as the data source should be the `UNLOAD` results folder, such as `s3://my_timestream_unloads/results`.

1. Run the crawler following the guidelines [here](https://docs.aws.amazon.com/glue/latest/ug/tutorial-add-crawler.html#tutorial-add-crawler-step2).

1. View the Glue table.
   + Go to **AWS Glue** → **Tables**.
   + You will see a new table created with the table prefix provided when creating the crawler.
   + You can see the schema and partition information in the table details view.

The following are other AWS services and open-source projects that use the AWS Glue Data Catalog.
+ **Amazon Athena** – For more information, see [Understanding tables, databases, and data catalogs](https://docs.aws.amazon.com/athena/latest/ug/understanding-tables-databases-and-the-data-catalog.html) in the Amazon Athena User Guide.
+ **Amazon Redshift Spectrum** – For more information, see [Querying external data using Amazon Redshift Spectrum](https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html) in the Amazon Redshift Database Developer Guide.
+ **Amazon EMR** – For more information, see [Use resource-based policies for Amazon EMR access to AWS Glue Data Catalog](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles-glue.html) in the Amazon EMR Management Guide.
+ **AWS Glue Data Catalog client for Apache Hive metastore** – For more information about this GitHub project, see [AWS Glue Data Catalog Client for Apache Hive Metastore](https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore).

# Limits for UNLOAD from Timestream for LiveAnalytics
<a name="export-unload-limits"></a>

Following are limits related to the `UNLOAD` command.
+ Concurrency for queries using the `UNLOAD` statement is 1 query per second (QPS). Exceeding the query rate might result in throttling.
+ Queries containing the `UNLOAD` statement can export at most 100 partitions per query. We recommend checking the distinct count of the selected column before using it to partition the exported data.
+ Queries containing the `UNLOAD` statement time out after 60 minutes.
+ The maximum size of the files that the `UNLOAD` statement creates in Amazon S3 is 78 GB.

For other limits for Timestream for LiveAnalytics, see [Quotas](ts-limits.md).