# SQL reference for Amazon S3 Select **Important** Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) This reference contains a description of the structured query language (SQL) elements that are supported by Amazon S3 Select. **Topics** + [ # SELECT command ](s3-select-sql-reference-select.md) + [ # Data types ](s3-select-sql-reference-data-types.md) + [ # Operators ](s3-select-sql-reference-operators.md) + [ # Reserved keywords ](s3-select-sql-reference-keyword-list.md) + [ # SQL functions ](s3-select-sql-reference-sql-functions.md) # SELECT command **Important** Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) Amazon S3 Select supports only the `SELECT` SQL command. The following ANSI standard clauses are supported for `SELECT`: + `SELECT` list + `FROM` clause + `WHERE` clause + `LIMIT` clause **Note** Amazon S3 Select queries currently do not support subqueries or joins. ## SELECT list The `SELECT` list names the columns, functions, and expressions that you want the query to return. The list represents the output of the query. ``` SELECT * SELECT projection1 AS column_alias_1, projection2 AS column_alias_2 ``` The first form of `SELECT` with the `*` (asterisk) returns every row that passed the `WHERE` clause, as-is. The second form of `SELECT` creates a row with user-defined output scalar expressions **`projection1`** and **`projection2`** for each column. ## FROM clause Amazon S3 Select supports the following forms of the `FROM` clause: ``` FROM table_name FROM table_name alias FROM table_name AS alias ``` In each form of the `FROM` clause, `table_name` is the `S3Object` that's being queried. Users coming from traditional relational databases can think of this as a database schema that contains multiple views over a table. Following standard SQL, the `FROM` clause creates rows that are filtered in the `WHERE` clause and projected in the `SELECT` list. For JSON objects that are stored in Amazon S3 Select, you can also use the following forms of the `FROM` clause: ``` FROM S3Object[*].path FROM S3Object[*].path alias FROM S3Object[*].path AS alias ``` Using this form of the `FROM` clause, you can select from arrays or objects within a JSON object. You can specify `path` by using one of the following forms: + By name (in an object): `.name` or `['name']` + By index (in an array): `[index]` + By wildcard character (in an object): `.*` + By wildcard character (in an array): `[*]` **Note** This form of the `FROM` clause works only with JSON objects. Wildcard characters always emit at least one record. If no record matches, then Amazon S3 Select emits the value `MISSING`. During output serialization (after the query finishes running), Amazon S3 Select replaces `MISSING` values with empty records. Aggregate functions (`AVG`, `COUNT`, `MAX`, `MIN`, and `SUM`) skip `MISSING` values. If you don't provide an alias when using a wildcard character, you can refer to the row by using the last element in the path. For example, you could select all prices from a list of books by using the query `SELECT price FROM S3Object[*].books[*].price`. If the path ends in a wildcard character instead of a name, then you can use the value `_1` to refer to the row. For example, instead of `SELECT price FROM S3Object[*].books[*].price`, you could use the query `SELECT _1.price FROM S3Object[*].books[*]`. Amazon S3 Select always treats a JSON document as an array of root-level values. Thus, even if the JSON object that you are querying has only one root element, the `FROM` clause must begin with `S3Object[*]`. However, for compatibility reasons, Amazon S3 Select allows you to omit the wildcard character if you don't include a path. Thus, the complete clause `FROM S3Object` is equivalent to `FROM S3Object[*] as S3Object`. If you include a path, you must also use the wildcard character. So, `FROM S3Object` and `FROM S3Object[*].path` are both valid clauses, but `FROM S3Object.path` is not. **Example** **Examples:** *Example \$11* This example shows results when using the following dataset and query: ``` { "Rules": [ {"id": "1"}, {"expr": "y > x"}, {"id": "2", "expr": "z = DEBUG"} ]} { "created": "June 27", "modified": "July 6" } ``` ``` SELECT id FROM S3Object[*].Rules[*].id ``` ``` {"id":"1"} {} {"id":"2"} {} ``` Amazon S3 Select produces each result for the following reasons: + `{"id":"id-1"}` – `S3Object[0].Rules[0].id` produced a match. + `{}` – `S3Object[0].Rules[1].id` did not match a record, so Amazon S3 Select emitted `MISSING`, which was then changed to an empty record during output serialization and returned. + `{"id":"id-2"}` – `S3Object[0].Rules[2].id` produced a match. + `{}` – `S3Object[1]` did not match on `Rules`, so Amazon S3 Select emitted `MISSING`, which was then changed to an empty record during output serialization and returned. If you don't want Amazon S3 Select to return empty records when it doesn't find a match, you can test for the value `MISSING`. The following query returns the same results as the previous query, but with the empty values omitted: ``` SELECT id FROM S3Object[*].Rules[*].id WHERE id IS NOT MISSING ``` ``` {"id":"1"} {"id":"2"} ``` *Example \$12* This example shows results when using the following dataset and queries: ``` { "created": "936864000", "dir_name": "important_docs", "files": [ { "name": "." }, { "name": ".." }, { "name": ".aws" }, { "name": "downloads" } ], "owner": "Amazon S3" } { "created": "936864000", "dir_name": "other_docs", "files": [ { "name": "." }, { "name": ".." }, { "name": "my stuff" }, { "name": "backup" } ], "owner": "User" } ``` ``` SELECT d.dir_name, d.files FROM S3Object[*] d ``` ``` {"dir_name":"important_docs","files":[{"name":"."},{"name":".."},{"name":".aws"},{"name":"downloads"}]} {"dir_name":"other_docs","files":[{"name":"."},{"name":".."},{"name":"my stuff"},{"name":"backup"}]} ``` ``` SELECT _1.dir_name, _1.owner FROM S3Object[*] ``` ``` {"dir_name":"important_docs","owner":"Amazon S3"} {"dir_name":"other_docs","owner":"User"} ``` ## WHERE clause The `WHERE` clause follows this syntax: ``` WHERE condition ``` The `WHERE` clause filters rows based on the `condition`. A condition is an expression that has a Boolean result. Only rows for which the condition evaluates to `TRUE` are returned in the result. ## LIMIT clause The `LIMIT` clause follows this syntax: ``` LIMIT number ``` The `LIMIT` clause limits the number of records that you want the query to return based on `number`. ## Attribute access The `SELECT` and `WHERE` clauses can refer to record data by using one of the methods in the following sections, depending on whether the file that is being queried is in CSV or JSON format. ### CSV + **Column Numbers** – You can refer to the *Nth* column of a row with the column name `_N`, where *`N`* is the column position. The position count starts at 1. For example, the first column is named `_1` and the second column is named `_2`. You can refer to a column as `_N` or `alias._N`. For example, `_2` and `myAlias._2` are both valid ways to refer to a column in the `SELECT` list and `WHERE` clause. + **Column Headers** – For objects in CSV format that have a header row, the headers are available to the `SELECT` list and `WHERE` clause. In particular, as in traditional SQL, within `SELECT` and `WHERE` clause expressions, you can refer to the columns by `alias.column_name` or `column_name`. ### JSON + **Document** – You can access JSON document fields as `alias.name`. You can also access nested fields, for example, `alias.name1.name2.name3`. + **List** – You can access elements in a JSON list by using zero-based indexes with the `[]` operator. For example, you can access the second element of a list as `alias[1]`. You can combine accessing list elements with fields, for example, `alias.name1.name2[1].name3`. + **Examples:** Consider this JSON object as a sample dataset: ``` {"name": "Susan Smith", "org": "engineering", "projects": [ {"project_name":"project1", "completed":false}, {"project_name":"project2", "completed":true} ] } ``` *Example \$11* The following query returns these results: ``` Select s.name from S3Object s ``` ``` {"name":"Susan Smith"} ``` *Example \$12* The following query returns these results: ``` Select s.projects[0].project_name from S3Object s ``` ``` {"project_name":"project1"} ``` ## Case sensitivity of header and attribute names With Amazon S3 Select, you can use double quotation marks to indicate that column headers (for CSV objects) and attributes (for JSON objects) are case sensitive. Without double quotation marks, object headers and attributes are case insensitive. An error is thrown in cases of ambiguity. The following examples are either 1) Amazon S3 objects in CSV format with the specified column headers, and with `FileHeaderInfo` set to `"Use"` for the query request; or 2) Amazon S3 objects in JSON format with the specified attributes. *Example \$11:* The object being queried has the header or attribute `NAME`. + The following expression successfully returns values from the object. Because there are no quotation marks, the query is case insensitive. ``` SELECT s.name from S3Object s ``` + The following expression results in a 400 error `MissingHeaderName`. Because there are quotation marks, the query is case sensitive. ``` SELECT s."name" from S3Object s ``` *Example \$12:* The Amazon S3 object being queried has one header or attribute with `NAME` and another header or attribute with `name`. + The following expression results in a 400 error `AmbiguousFieldName`. Because there are no quotation marks, the query is case insensitive, but there are two matches, so the error is thrown. ``` SELECT s.name from S3Object s ``` + The following expression successfully returns values from the object. Because there are quotation marks, the query is case sensitive, so there is no ambiguity. ``` SELECT s."NAME" from S3Object s ``` ## Using reserved keywords as user-defined terms Amazon S3 Select has a set of reserved keywords that are needed to run the SQL expressions used to query object content. Reserved keywords include function names, data types, operators, and so on. In some cases, user-defined terms, such as the column headers (for CSV files) or attributes (for JSON objects), might clash with a reserved keyword. When this happens, you must use double quotation marks to indicate that you are intentionally using a user-defined term that clashes with a reserved keyword. Otherwise a 400 parse error will result. For the full list of reserved keywords, see [Reserved keywords](s3-select-sql-reference-keyword-list.md). The following example is either 1) an Amazon S3 object in CSV format with the specified column headers, with `FileHeaderInfo` set to `"Use"` for the query request, or 2) an Amazon S3 object in JSON format with the specified attributes. *Example:* The object being queried has a header or attribute named `CAST`, which is a reserved keyword. + The following expression successfully returns values from the object. Because quotation marks are used in the query, S3 Select uses the user-defined header or attribute. ``` SELECT s."CAST" from S3Object s ``` + The following expression results in a 400 parse error. Because no quotation marks are used in the query, `CAST` clashes with a reserved keyword. ``` SELECT s.CAST from S3Object s ``` ## Scalar expressions Within the `WHERE` clause and the `SELECT` list, you can have SQL *scalar expressions*, which are expressions that return scalar values. They have the following form: + ***`literal`*** An SQL literal. + ***`column_reference`*** A reference to a column in the form `column_name` or `alias.column_name`. + **`unary_op`** **`expression`** In this case, ****`unary_op`**** is an SQL unary operator. + **`expression`** **`binary_op`** ***`expression`*** In this case, ****`binary_op`**** is an SQL binary operator. + **`func_name`** In this case, **`func_name`** is the name of the scalar function to invoke. + ***`expression`*** `[ NOT ] BETWEEN` ****`expression`**** `AND` ****`expression`**** + ***`expression`*** `LIKE` ****`expression`**** [ `ESCAPE` ***`expression`*** ] # Data types **Important** Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) Amazon S3 Select supports several primitive data types. ## Data type conversions The general rule is to follow the `CAST` function if it's defined. If `CAST` is not defined, then all input data is treated as a string. In that case, you must cast your input data into the relevant data types when necessary. For more information about the `CAST` function, see [CAST](s3-select-sql-reference-conversion.md#s3-select-sql-reference-cast). ## Supported data types Amazon S3 Select supports the following set of primitive data types. | Name | Description | Examples | | --- | --- | --- | | `bool` | A Boolean value, either `TRUE` or `FALSE`. | `FALSE` | | `int`, `integer` | An 8-byte signed integer in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. | `100000` | | `string` | A UTF8-encoded variable-length string. The default limit is 1 character. The maximum character limit is 2,147,483,647. | `'xyz'` | | `float` | An 8-byte floating point number. | `CAST(0.456 AS FLOAT)` | | `decimal`, `numeric` | A base-10 number, with a maximum precision of 38 (that is, the maximum number of significant digits), and with a scale within the range of -231 to 231-1 (that is, the base-10 exponent). Amazon S3 Select ignores scale and precision when you provide both at the same time. | `123.456 ` | | `timestamp` | Timestamps represent a specific moment in time, always include a local offset, and are capable of arbitrary precision. In the text format, timestamps follow the [W3C note on date and time formats](https://www.w3.org/TR/NOTE-datetime), but they must end with the literal `T` if the timestamps are not at least whole-day precision. Fractional seconds are allowed, with at least one digit of precision, and an unlimited maximum. Local-time offsets can be represented as either hour:minute offsets from UTC, or as the literal `Z` to denote a local time of UTC. Local-time offsets are required on timestamps with time and are not allowed on date values. | `CAST('2007-04-05T14:30Z' AS TIMESTAMP)` | ### Supported Parquet types Amazon S3 Select supports the following Parquet types. + `DATE` + `DECIMAL` + `ENUM` + `INT(8)` + `INT(16)` + `INT(32)` + `INT(64)` + `LIST` **Note** For `LIST` Parquet type output, Amazon S3 Select supports only JSON format. However, if the query limits the data to simple values, the `LIST` Parquet type can also be queried in CSV format. + `STRING` + `TIMESTAMP` supported precision (`MILLIS`/`MICROS`/`NANOS`) **Note** Timestamps saved as an `INT(96)` are unsupported. Because of the range of the `INT(64)` type, timestamps that are using the `NANOS` unit can represent only values between `1677-09-21 00:12:43` and `2262-04-11 23:47:16`. Values outside of this range cannot be represented with the `NANOS` unit. ### Mapping of Parquet types to supported data types in Amazon S3 Select | Parquet types | Supported data types | | --- | --- | | `DATE` | `timestamp` | | `DECIMAL` | `decimal`, `numeric` | | `ENUM` | `string` | | `INT(8)` | `int`, `integer` | | `INT(16)` | `int`, `integer` | | `INT(32)` | `int`, `integer` | | `INT(64)` | `decimal`, `numeric` | | `LIST` | Each Parquet type in list is mapped to the corresponding data type. | | `STRING` | `string` | | `TIMESTAMP` | `timestamp` | # Operators **Important** Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) Amazon S3 Select supports the following operators. ## Logical operators + `AND` + `NOT` + `OR` ## Comparison operators + `<` + `>` + `<=` + `>=` + `=` + `<>` + `!=` + `BETWEEN` + `IN` – For example: `IN ('a', 'b', 'c')` ## Pattern-matching operators + `LIKE` + `_` (Matches any character) + `%` (Matches any sequence of characters) ## Unitary operators + `IS NULL` + `IS NOT NULL` ## Math operators Addition, subtraction, multiplication, division, and modulo are supported, as follows: + \$1 + - + \$1 + / + % ## Operator precedence The following table shows the operators' precedence in decreasing order. | Operator or element | Associativity | Required | | --- | --- | --- | | `-` | right | unary minus | | `*`, `/`, `%` | left | multiplication, division, modulo | | `+`, `-` | left | addition, subtraction | | `IN` | | set membership | | `BETWEEN` | | range containment | | `LIKE` | | string pattern matching | | `<``>` | | less than, greater than | | `=` | right | equality, assignment | | `NOT` | right | logical negation | | `AND` | left | logical conjunction | | `OR` | left | logical disjunction | # Reserved keywords **Important** Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) The following is the list of reserved keywords for Amazon S3 Select. These keywords include function names, data types, operators, and so on, that are needed to run the SQL expressions that are used to query object content. ``` absolute action add all allocate alter and any are as asc assertion at authorization avg bag begin between bit bit_length blob bool boolean both by cascade cascaded case cast catalog char char_length character character_length check clob close coalesce collate collation column commit connect connection constraint constraints continue convert corresponding count create cross current current_date current_time current_timestamp current_user cursor date day deallocate dec decimal declare default deferrable deferred delete desc describe descriptor diagnostics disconnect distinct domain double drop else end end-exec escape except exception exec execute exists external extract false fetch first float for foreign found from full get global go goto grant group having hour identity immediate in indicator initially inner input insensitive insert int integer intersect interval into is isolation join key language last leading left level like limit list local lower match max min minute missing module month names national natural nchar next no not null nullif numeric octet_length of on only open option or order outer output overlaps pad partial pivot position precision prepare preserve primary prior privileges procedure public read real references relative restrict revoke right rollback rows schema scroll second section select session session_user set sexp size smallint some space sql sqlcode sqlerror sqlstate string struct substring sum symbol system_user table temporary then time timestamp timezone_hour timezone_minute to trailing transaction translate translation trim true tuple union unique unknown unpivot update upper usage user using value values varchar varying view when whenever where with work write year zone ``` # SQL functions **Important** Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) Amazon S3 Select supports the following SQL functions. **Topics** + [ # Aggregate functions ](s3-select-sql-reference-aggregate.md) + [ # Conditional functions ](s3-select-sql-reference-conditional.md) + [ # Conversion functions ](s3-select-sql-reference-conversion.md) + [ # Date functions ](s3-select-sql-reference-date.md) + [ # String functions ](s3-select-sql-reference-string.md) # Aggregate functions **Important** Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) Amazon S3 Select supports the following aggregate functions. | Function | Argument type | Return type | | --- | --- | --- | | `AVG(expression)` | `INT`, `FLOAT`, `DECIMAL` | `DECIMAL` for an `INT` argument, `FLOAT` for a floating-point argument; otherwise the same as the argument data type. | | `COUNT` | `-` | `INT` | | `MAX(expression)` | `INT`, `DECIMAL` | Same as the argument type. | | `MIN(expression)` | `INT`, `DECIMAL` | Same as the argument type. | | `SUM(expression)` | `INT`, `FLOAT`, `DOUBLE`, `DECIMAL` | `INT` for an `INT` argument, `FLOAT` for a floating-point argument; otherwise, the same as the argument data type. | ## SUM example To aggregate the total object sizes of a folder in an [S3 Inventory report](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html), use a `SUM` expression. The following S3 Inventory report is a CSV file that's compressed with GZIP. There are three columns. + The first column is the name of the S3 bucket (*`DOC-EXAMPLE-BUCKET`*) that the S3 Inventory report is for. + The second column is the object key name that uniquely identifies the object in the bucket. The `example-folder/` value in the first row is for the folder `example-folder`. In Amazon S3, when you create a folder in your bucket, S3 creates a 0-byte object with a key that's set to the folder name that you provided. The `example-folder/object1` value in the second row is for the object `object1` in the folder `example-folder`. The `example-folder/object2` value in the third row is for the object `object2` in the folder `example-folder`. For more information about S3 folders, see [Organizing objects in the Amazon S3 console by using folders](using-folders.md). + The third column is the object size in bytes. ``` "DOC-EXAMPLE-BUCKET","example-folder/","0" "DOC-EXAMPLE-BUCKET","example-folder/object1","2011267" "DOC-EXAMPLE-BUCKET","example-folder/object2","1570024" ``` To use a `SUM` expression to calculate the total size of the folder `example-folder`, run the SQL query with Amazon S3 Select. ``` SELECT SUM(CAST(_3 as INT)) FROM s3object s WHERE _2 LIKE 'example-folder/%' AND _2 != 'example-folder/'; ``` Query Result: ``` 3581291 ``` # Conditional functions **Important** Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) Amazon S3 Select supports the following conditional functions. **Topics** + [ ## CASE ](#s3-select-sql-reference-case) + [ ## COALESCE ](#s3-select-sql-reference-coalesce) + [ ## NULLIF ](#s3-select-sql-reference-nullif) ## CASE The `CASE` expression is a conditional expression, similar to `if/then/else` statements found in other languages. `CASE` is used to specify a result when there are multiple conditions. There are two types of `CASE` expressions: simple and searched. In simple `CASE` expressions, an expression is compared with a value. When a match is found, the specified action in the `THEN` clause is applied. If no match is found, the action in the `ELSE` clause is applied. In searched `CASE` expressions, each `CASE` is evaluated based on a Boolean expression, and the `CASE` statement returns the first matching `CASE`. If no matching `CASE` is found among the `WHEN` clauses, the action in the `ELSE` clause is returned. ### Syntax **Note** Currently, Amazon S3 Select doesn't support `ORDER BY` or queries that contain new lines. Make sure that you use queries with no line breaks. The following is a simple `CASE` statement that's used to match conditions: ``` CASE expression WHEN value THEN result [WHEN...] [ELSE result] END ``` The following is a searched `CASE` statement that's used to evaluate each condition: ``` CASE WHEN boolean condition THEN result [WHEN ...] [ELSE result] END ``` ### Examples **Note** If you use the Amazon S3 console to run the following examples and your CSV file contains a header row, choose **Exclude the first line of CSV data**. **Example 1:** Use a simple `CASE` expression to replace `New York City` with `Big Apple` in a query. Replace all other city names with `other`. ``` SELECT venuecity, CASE venuecity WHEN 'New York City' THEN 'Big Apple' ELSE 'other' END FROM S3Object; ``` Query result: ``` venuecity | case -----------------+----------- Los Angeles | other New York City | Big Apple San Francisco | other Baltimore | other ... ``` **Example 2:** Use a searched `CASE` expression to assign group numbers based on the `pricepaid` value for individual ticket sales: ``` SELECT pricepaid, CASE WHEN CAST(pricepaid as FLOAT) < 10000 THEN 'group 1' WHEN CAST(pricepaid as FLOAT) > 10000 THEN 'group 2' ELSE 'group 3' END FROM S3Object; ``` Query result: ``` pricepaid | case -----------+--------- 12624.00 | group 2 10000.00 | group 3 10000.00 | group 3 9996.00 | group 1 9988.00 | group 1 ... ``` ## COALESCE `COALESCE` evaluates the arguments in order and returns the first non-unknown value, that is, the first non-null or non-missing value. This function does not propagate null and missing values. ### Syntax ``` COALESCE ( expression, expression, ... ) ``` ### Parameters *`expression`* The target expression that the function operates on. ### Examples ``` COALESCE(1) -- 1 COALESCE(null) -- null COALESCE(null, null) -- null COALESCE(missing) -- null COALESCE(missing, missing) -- null COALESCE(1, null) -- 1 COALESCE(null, null, 1) -- 1 COALESCE(null, 'string') -- 'string' COALESCE(missing, 1) -- 1 ``` ## NULLIF Given two expressions, `NULLIF` returns `NULL` if the two expressions evaluate to the same value; otherwise, `NULLIF` returns the result of evaluating the first expression. ### Syntax ``` NULLIF ( expression1, expression2 ) ``` ### Parameters `expression1, expression2` The target expressions that the function operates on. ### Examples ``` NULLIF(1, 1) -- null NULLIF(1, 2) -- 1 NULLIF(1.0, 1) -- null NULLIF(1, '1') -- 1 NULLIF([1], [1]) -- null NULLIF(1, NULL) -- 1 NULLIF(NULL, 1) -- null NULLIF(null, null) -- null NULLIF(missing, null) -- null NULLIF(missing, missing) -- null ``` # Conversion functions **Important** Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) Amazon S3 Select supports the following conversion function. **Topics** + [ ## CAST ](#s3-select-sql-reference-cast) ## CAST The `CAST` function converts an entity, such as an expression that evaluates to a single value, from one type to another. ### Syntax ``` CAST ( expression AS data_type ) ``` ### Parameters *`expression`* A combination of one or more values, operators, and SQL functions that evaluate to a value. *`data_type`* The target data type, such as `INT`, to cast the expression to. For a list of supported data types, see [Data types](s3-select-sql-reference-data-types.md). ### Examples ``` CAST('2007-04-05T14:30Z' AS TIMESTAMP) CAST(0.456 AS FLOAT) ``` # Date functions **Important** Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) Amazon S3 Select supports the following date functions. **Topics** + [ ## DATE\$1ADD ](#s3-select-sql-reference-date-add) + [ ## DATE\$1DIFF ](#s3-select-sql-reference-date-diff) + [ ## EXTRACT ](#s3-select-sql-reference-extract) + [ ## TO\$1STRING ](#s3-select-sql-reference-to-string) + [ ## TO\$1TIMESTAMP ](#s3-select-sql-reference-to-timestamp) + [ ## UTCNOW ](#s3-select-sql-reference-utcnow) ## DATE\$1ADD Given a date part, a quantity, and a timestamp, `DATE_ADD` returns an updated timestamp by altering the date part by the quantity. ### Syntax ``` DATE_ADD( date_part, quantity, timestamp ) ``` ### Parameters *`date_part`* Specifies which part of the date to modify. This can be one of the following: + year + month + day + hour + minute + second *`quantity`* The value to apply to the updated timestamp. Positive values for `quantity` add to the timestamp's date\$1part, and negative values subtract. *`timestamp`* The target timestamp that the function operates on. ### Examples ``` DATE_ADD(year, 5, `2010-01-01T`) -- 2015-01-01 (equivalent to 2015-01-01T) DATE_ADD(month, 1, `2010T`) -- 2010-02T (result will add precision as necessary) DATE_ADD(month, 13, `2010T`) -- 2011-02T DATE_ADD(day, -1, `2017-01-10T`) -- 2017-01-09 (equivalent to 2017-01-09T) DATE_ADD(hour, 1, `2017T`) -- 2017-01-01T01:00-00:00 DATE_ADD(hour, 1, `2017-01-02T03:04Z`) -- 2017-01-02T04:04Z DATE_ADD(minute, 1, `2017-01-02T03:04:05.006Z`) -- 2017-01-02T03:05:05.006Z DATE_ADD(second, 1, `2017-01-02T03:04:05.006Z`) -- 2017-01-02T03:04:06.006Z ``` ## DATE\$1DIFF Given a date part and two valid timestamps, `DATE_DIFF` returns the difference in date parts. The return value is a negative integer when the `date_part` value of `timestamp1` is greater than the `date_part` value of `timestamp2`. The return value is a positive integer when the `date_part` value of `timestamp1` is less than the `date_part` value of `timestamp2`. ### Syntax ``` DATE_DIFF( date_part, timestamp1, timestamp2 ) ``` ### Parameters **`date_part`** Specifies which part of the timestamps to compare. For the definition of `date_part`, see [DATE\$1ADD](#s3-select-sql-reference-date-add). **`timestamp1`** The first timestamp to compare. **`timestamp2`** The second timestamp to compare. ### Examples ``` DATE_DIFF(year, `2010-01-01T`, `2011-01-01T`) -- 1 DATE_DIFF(year, `2010T`, `2010-05T`) -- 4 (2010T is equivalent to 2010-01-01T00:00:00.000Z) DATE_DIFF(month, `2010T`, `2011T`) -- 12 DATE_DIFF(month, `2011T`, `2010T`) -- -12 DATE_DIFF(day, `2010-01-01T23:00`, `2010-01-02T01:00`) -- 0 (need to be at least 24h apart to be 1 day apart) ``` ## EXTRACT Given a date part and a timestamp, `EXTRACT` returns the timestamp's date part value. ### Syntax ``` EXTRACT( date_part FROM timestamp ) ``` ### Parameters **`date_part`** Specifies which part of the timestamps to extract. This can be one of the following: + `YEAR` + `MONTH` + `DAY` + `HOUR` + `MINUTE` + `SECOND` + `TIMEZONE_HOUR` + `TIMEZONE_MINUTE` **`timestamp`** The target timestamp that the function operates on. ### Examples ``` EXTRACT(YEAR FROM `2010-01-01T`) -- 2010 EXTRACT(MONTH FROM `2010T`) -- 1 (equivalent to 2010-01-01T00:00:00.000Z) EXTRACT(MONTH FROM `2010-10T`) -- 10 EXTRACT(HOUR FROM `2017-01-02T03:04:05+07:08`) -- 3 EXTRACT(MINUTE FROM `2017-01-02T03:04:05+07:08`) -- 4 EXTRACT(TIMEZONE_HOUR FROM `2017-01-02T03:04:05+07:08`) -- 7 EXTRACT(TIMEZONE_MINUTE FROM `2017-01-02T03:04:05+07:08`) -- 8 ``` ## TO\$1STRING Given a timestamp and a format pattern, `TO_STRING` returns a string representation of the timestamp in the given format. ### Syntax ``` TO_STRING ( timestamp time_format_pattern ) ``` ### Parameters *`timestamp`* The target timestamp that the function operates on. *`time_format_pattern`* A string that has the following special character interpretations: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-select-sql-reference-date.html) ### Examples ``` TO_STRING(`1969-07-20T20:18Z`, 'MMMM d, y') -- "July 20, 1969" TO_STRING(`1969-07-20T20:18Z`, 'MMM d, yyyy') -- "Jul 20, 1969" TO_STRING(`1969-07-20T20:18Z`, 'M-d-yy') -- "7-20-69" TO_STRING(`1969-07-20T20:18Z`, 'MM-d-y') -- "07-20-1969" TO_STRING(`1969-07-20T20:18Z`, 'MMMM d, y h:m a') -- "July 20, 1969 8:18 PM" TO_STRING(`1969-07-20T20:18Z`, 'y-MM-dd''T''H:m:ssX') -- "1969-07-20T20:18:00Z" TO_STRING(`1969-07-20T20:18+08:00Z`, 'y-MM-dd''T''H:m:ssX') -- "1969-07-20T20:18:00Z" TO_STRING(`1969-07-20T20:18+08:00`, 'y-MM-dd''T''H:m:ssXXXX') -- "1969-07-20T20:18:00+0800" TO_STRING(`1969-07-20T20:18+08:00`, 'y-MM-dd''T''H:m:ssXXXXX') -- "1969-07-20T20:18:00+08:00" ``` ## TO\$1TIMESTAMP Given a string, `TO_TIMESTAMP` converts it to a timestamp. `TO_TIMESTAMP` is the inverse operation of `TO_STRING`. ### Syntax ``` TO_TIMESTAMP ( string ) ``` ### Parameters *`string`* The target string that the function operates on. ### Examples ``` TO_TIMESTAMP('2007T') -- `2007T` TO_TIMESTAMP('2007-02-23T12:14:33.079-08:00') -- `2007-02-23T12:14:33.079-08:00` ``` ## UTCNOW `UTCNOW` returns the current time in UTC as a timestamp. ### Syntax ``` UTCNOW() ``` ### Parameters `UTCNOW` takes no parameters. ### Examples ``` UTCNOW() -- 2017-10-13T16:02:11.123Z ``` # String functions **Important** Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) Amazon S3 Select supports the following string functions. **Topics** + [ ## CHAR\$1LENGTH, CHARACTER\$1LENGTH ](#s3-select-sql-reference-char-length) + [ ## LOWER ](#s3-select-sql-reference-lower) + [ ## SUBSTRING ](#s3-select-sql-reference-substring) + [ ## TRIM ](#s3-select-sql-reference-trim) + [ ## UPPER ](#s3-select-sql-reference-upper) ## CHAR\$1LENGTH, CHARACTER\$1LENGTH `CHAR_LENGTH` (or `CHARACTER_LENGTH`) counts the number of characters in the specified string. **Note** `CHAR_LENGTH` and `CHARACTER_LENGTH` are synonyms. ### Syntax ``` CHAR_LENGTH ( string ) ``` ### Parameters *`string`* The target string that the function operates on. ### Examples ``` CHAR_LENGTH('') -- 0 CHAR_LENGTH('abcdefg') -- 7 ``` ## LOWER Given a string, `LOWER` converts all uppercase characters to lowercase characters. Any non-uppercased characters remain unchanged. ### Syntax ``` LOWER ( string ) ``` ### Parameters **`string`** The target string that the function operates on. ### Examples ``` LOWER('AbCdEfG!@#$') -- 'abcdefg!@#$' ``` ## SUBSTRING Given a string, a start index, and optionally a length, `SUBSTRING` returns the substring from the start index up to the end of the string, or up to the length provided. **Note** The first character of the input string has an index position of 1. If `start` is < 1, with no length specified, then the index position is set to 1. If `start` is < 1, with a length specified, then the index position is set to `start + length -1`. If `start + length -1` < 0, then an empty string is returned. If `start + length -1` > = 0, then the substring starting at index position 1 with the length `start + length - 1` is returned. ### Syntax ``` SUBSTRING( string FROM start [ FOR length ] ) ``` ### Parameters **`string`** The target string that the function operates on. **`start`** The start position of the string. **`length`** The length of the substring to return. If not present, proceed to the end of the string. ### Examples ``` SUBSTRING("123456789", 0) -- "123456789" SUBSTRING("123456789", 1) -- "123456789" SUBSTRING("123456789", 2) -- "23456789" SUBSTRING("123456789", -4) -- "123456789" SUBSTRING("123456789", 0, 999) -- "123456789" SUBSTRING("123456789", 1, 5) -- "12345" ``` ## TRIM Trims leading or trailing characters from a string. The default character to remove is a space (`' '`). ### Syntax ``` TRIM ( [[LEADING | TRAILING | BOTH remove_chars] FROM] string ) ``` ### Parameters **`string`** The target string that the function operates on. `LEADING` \$1 `TRAILING` \$1 `BOTH` This parameter indicates whether to trim leading or trailing characters, or both leading and trailing characters. **`remove_chars`** The set of characters to remove. `remove_chars` can be a string with a length > 1. This function returns the string with any character from `remove_chars` found at the beginning or end of the string that was removed. ### Examples ``` TRIM(' foobar ') -- 'foobar' TRIM(' \tfoobar\t ') -- '\tfoobar\t' TRIM(LEADING FROM ' foobar ') -- 'foobar ' TRIM(TRAILING FROM ' foobar ') -- ' foobar' TRIM(BOTH FROM ' foobar ') -- 'foobar' TRIM(BOTH '12' FROM '1112211foobar22211122') -- 'foobar' ``` ## UPPER Given a string, `UPPER` converts all lowercase characters to uppercase characters. Any non-lowercased characters remain unchanged. ### Syntax ``` UPPER ( string ) ``` ### Parameters **`string`** The target string that the function operates on. ### Examples ``` UPPER('AbCdEfG!@#$') -- 'ABCDEFG!@#$' ```