With Amazon Redshift, you can export semistructured data from your Amazon Redshift cluster to Amazon S3 in a variety of formats, including text, Apache Parquet, Apache ORC, and Avro. The following sections will guide you through the process of configuring and executing unload operations for your semistructured data in Amazon Redshift.
You can unload tables with SUPER data columns to Amazon S3 in a comma-separated value (CSV) or text format. Using a combination of navigation and unnest clauses, Amazon Redshift unloads hierarchical data in SUPER data format to Amazon S3 in CSV or text formats. Subsequently, you can create external tables against unloaded data and query them using Redshift Spectrum. For information on using UNLOAD and the required IAM permissions, see UNLOAD.
Before running the following example, populate the region_nations table using the processes in Loading semistructured data into Amazon Redshift. For information on the tables used in the following example, see SUPER sample dataset.
The following example unloads data into Amazon S3.
UNLOAD ('SELECT * FROM region_nations')
TO 's3://xxxxxx/'
IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3-Write'
DELIMITER AS '|'
GZIP
ALLOWOVERWRITE;
Unlike other data types where a user-defined string represents a null value, Amazon Redshift exports the SUPER data columns using the JSON format and represents it as null as determined by the JSON format. As a result, SUPER data columns ignore the NULL [AS] option used in UNLOAD commands.