Considerations and limitations for CTAS queries - Amazon Athena

Considerations and limitations for CTAS queries

The following sections describe considerations and limitations to keep in mind when you use CREATE TABLE AS SELECT (CTAS) queries in Athena.

Learn the CTAS query syntax

The CTAS query syntax differs from the syntax of CREATE [EXTERNAL] TABLE used for creating tables. See CREATE TABLE AS.

The difference between views and CTAS queries

CTAS queries write new data to a specified location in Amazon S3. Views do not write any data.

Specify a location for your CTAS query results

If your workgroup overrides the client-side setting for query results location, Athena creates your table in the location s3://amzn-s3-demo-bucket/tables/<query-id>/. To see the query results location specified for the workgroup, view the workgroup's details.

If your workgroup does not override the query results location, you can use the syntax WITH (external_location ='s3://amzn-s3-demo-bucket/') in your CTAS query to specify where your CTAS query results are stored.

Note

The external_location property must specify a location that is empty. A CTAS query checks that the path location (prefix) in the bucket is empty and never overwrites the data if the location already has data in it. To use the same location again, delete the data in the key prefix location in the bucket.

If you omit the external_location syntax and are not using the workgroup setting, Athena uses your client-side setting for the query results location and creates your table in the location s3://amzn-s3-demo-bucket/<Unsaved-or-query-name>/<year>/<month/<date>/tables/<query-id>/.

Locate orphaned files

If a CTAS or INSERT INTO statement fails, it is possible that orphaned data are left in the data location. Because Athena in some cases does not delete data or partial data from your bucket, you might be able to read this partial data in subsequent queries. To locate orphaned files for inspection or deletion, you can use the data manifest file that Athena provides to track the list of files to be written. For more information, see Identify query output files and DataManifestLocation.

Remember that ORDER BY clauses are ignored

In a CTAS query, Athena ignores ORDER BY clauses in the SELECT portion of the query.

According to the SQL specification (ISO 9075 Part 2), the ordering of the rows of a table specified by a query expression is guaranteed only for the query expression that immediately contains the ORDER BY clause. Tables in SQL are in any case inherently unordered, and implementing the ORDER BY in sub query clauses would both cause the query to perform poorly and not result in ordered output. Thus, in Athena CTAS queries, there is no guarantee that the order specified by the ORDER BY clause will be preserved when the data is written.

Choose a format to store your query results

You can store CTAS results in PARQUET, ORC, AVRO, JSON, and TEXTFILE. Multi-character delimiters are not supported for the CTAS TEXTFILE format. If you don't specify a data storage format, CTAS query results are stored in Parquet by default.

CTAS queries do not require specifying a SerDe to interpret format transformations. See Example: Writing query results to a different format.

Consider compression formats

GZIP compression is used for CTAS query results in JSON and TEXTFILE formats. For Parquet, you can use GZIP or SNAPPY, and the default is GZIP. For ORC, you can use LZ4, SNAPPY, ZLIB, or ZSTD, and the default is ZLIB. For CTAS examples that specify compression, see Example: Specifying data storage and compression formats. For more information about compression in Athena, see Use compression in Athena.

Partition and bucket your results

You can partition and bucket the results data of a CTAS query. To specify properties of the destination table, include partitioning and bucketing predicates at the end of the WITH clause. For more information, see Use partitioning and bucketing and Example: Creating bucketed and partitioned tables.

When you use CTAS to create a partitioned table, Athena has a write limit of 100 partitions. For information about working around the 100-partition limitation, see Use CTAS and INSERT INTO to work around the 100 partition limit.

Encrypt your results

You can encrypt CTAS query results in Amazon S3, similar to the way you encrypt other query results in Athena. For more information, see Encrypt Athena query results stored in Amazon S3.

The expected bucket owner setting does not apply to CTAS

For CTAS statements, the expected bucket owner setting does not apply to the destination table location in Amazon S3. The expected bucket owner setting applies only to the Amazon S3 output location that you specify for Athena query results. For more information, see Specify a query result location using the Athena console.

Column data types are preserved

Column data types for a CTAS query are the same as specified for the original query.