Data types - AWS Clean Rooms

Data types

Each value that AWS Clean Rooms Spark SQL stores or retrieves has a data type with a fixed set of associated properties. Data types are declared when tables are created. A data type constrains the set of values that a column or argument can contain.

The following table lists the data types that you can use in AWS Clean Rooms Spark SQL.

| Data type name | Category | Aliases | Description |
| --- | --- | --- | --- |
| ARRAY | Nested type | Not applicable | Ordered sequence of elements of one type |
| BIGINT | Numeric types | Not applicable | Signed eight-byte integer |
| BINARY | Binary type | Not applicable | Byte sequence values |
| BOOLEAN | Boolean type | BOOL | Logical Boolean (true/false) |
| BYTE | Numeric types | Not applicable | Signed one-byte integer, from -128 to 127 |
| CHAR | Character types | CHARACTER | Fixed-length character string |
| DATE | Datetime types | Not applicable | Calendar date (year, month, day) |
| DECIMAL | Numeric types | NUMERIC | Exact numeric of selectable precision |
| FLOAT | Numeric types | FLOAT8, DOUBLE PRECISION | Double precision floating-point number |
| INTEGER | Numeric types | INT | Signed four-byte integer |
| INTERVAL | Datetime types | Not applicable | Time duration, expressed as either a year-month interval or a day-time interval |
| LONG | Numeric types | Not applicable | Signed eight-byte integer |
| MAP | Nested type | Not applicable | Set of key-value pairs |
| REAL | Numeric types | FLOAT4 | Single precision floating-point number |
| SHORT | Numeric types | Not applicable | Signed two-byte integer |
| SMALLINT | Numeric types | Not applicable | Signed two-byte integer |
| STRUCT | Nested type | Not applicable | Group of named fields |
| TIME | Datetime types | Not applicable | Time of day |
| TIMESTAMP_LTZ | Datetime types | Not applicable | Date and time with local time zone |
| TIMESTAMP_NTZ | Datetime types | Not applicable | Date and time without time zone |
| TINYINT | Numeric types | Not applicable | Signed one-byte integer, from -128 to 127 |
| VARCHAR | Character types | CHARACTER VARYING | Variable-length character string with a user-defined limit |
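The signed integer types in the table differ only in width. As a minimal sketch in plain Python (not a Clean Rooms API), the range of an n-byte two's-complement integer is -2^(8n-1) through 2^(8n-1) - 1. The names `INTEGER_WIDTHS`, `integer_range`, and `fits` are hypothetical, and the mapping assumes the byte widths given in the table descriptions.

```python
# Byte widths taken from the table descriptions (hypothetical helper mapping).
# Names that share a width, such as BIGINT and LONG, map to the same value.
INTEGER_WIDTHS = {
    "TINYINT": 1,
    "BYTE": 1,
    "SMALLINT": 2,
    "SHORT": 2,
    "INTEGER": 4,
    "INT": 4,
    "BIGINT": 8,
    "LONG": 8,
}


def integer_range(type_name: str) -> tuple[int, int]:
    """Return (min, max) for a signed two's-complement integer type."""
    bits = 8 * INTEGER_WIDTHS[type_name.upper()]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(value: int, type_name: str) -> bool:
    """Check whether value can be stored in the given integer type without overflow."""
    low, high = integer_range(type_name)
    return low <= value <= high


if __name__ == "__main__":
    for name in ("TINYINT", "SMALLINT", "INTEGER", "BIGINT"):
        print(name, integer_range(name))
```

Running the script prints `TINYINT (-128, 127)` as its first line, which matches the range stated in the table for TINYINT and BYTE.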
Note

The ARRAY, STRUCT, and MAP nested data types are currently supported only with the custom analysis rule. For more information, see Nested type.

Multibyte characters

The VARCHAR data type supports UTF-8 multibyte characters up to a maximum of four bytes. Characters of five or more bytes are not supported. To calculate the size of a VARCHAR column that contains multibyte characters, add up the number of bytes in each character; when every character has the same width, this is the number of characters multiplied by the number of bytes per character. For example, if a string has four Chinese characters and each character is three bytes long, you need a VARCHAR(12) column to store the string.
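The sizing rule can be checked by encoding a value to UTF-8 and counting bytes. This is a minimal sketch, assuming (as the example above implies) that the VARCHAR length limit counts bytes; `utf8_byte_length` and `char_widths` are hypothetical helper names, not part of any AWS API. Standard UTF-8 (RFC 3629) never uses more than four bytes per character, so any Python string that encodes successfully stays within the supported width.

```python
def utf8_byte_length(text: str) -> int:
    """Exact storage size of text in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def char_widths(text: str) -> list[int]:
    """UTF-8 width of each character: 1 for ASCII, 2 to 4 for multibyte characters."""
    return [len(ch.encode("utf-8")) for ch in text]


if __name__ == "__main__":
    chinese = "数据类型"  # four Chinese characters, three bytes each in UTF-8
    print(char_widths(chinese))       # [3, 3, 3, 3]
    print(utf8_byte_length(chinese))  # 12, so VARCHAR(12) under byte-length semantics
```

For strings that mix widths, summing the per-character widths is what matters: `"café"` has four characters but occupies five bytes, because `é` takes two.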

The VARCHAR data type doesn't support the following invalid UTF-8 code points:

0xD800–0xDFFF, the UTF-16 surrogate range (byte sequences: ED A0 80–ED BF BF)
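Values can be screened for this range before loading. The sketch below uses only Python's standard codec: the strict UTF-8 decoder rejects encoded surrogates such as `ED A0 80`, and a scan over code points catches lone surrogates already present in a `str`. The helper names `contains_surrogate` and `is_valid_utf8` are hypothetical and shown for illustration only.

```python
SURROGATE_LOW, SURROGATE_HIGH = 0xD800, 0xDFFF


def contains_surrogate(text: str) -> bool:
    """True if any character falls in the unsupported range U+D800 to U+DFFF."""
    return any(SURROGATE_LOW <= ord(ch) <= SURROGATE_HIGH for ch in text)


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8; encoded surrogates fail to decode."""
    try:
        data.decode("utf-8")  # default "strict" error handler rejects surrogates
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8(b"\xed\xa0\x80"))  # first byte sequence of the excluded range
    print(is_valid_utf8(b"\xed\x9f\xbf"))  # U+D7FF, just below the range
    print(contains_surrogate("abc\ud800"))
```

Checking both the raw bytes and the decoded string covers data that arrives as files and data produced by code that may have built strings with lone surrogates.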

The CHAR data type doesn't support multibyte characters.
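Because CHAR doesn't accept multibyte characters, a value is CHAR-compatible only if every character encodes to exactly one UTF-8 byte, which holds precisely for ASCII (U+0000 to U+007F). A minimal pre-check, using the hypothetical helper names `is_char_compatible` and `fits_char`, might look like this:

```python
def is_char_compatible(text: str) -> bool:
    """Every character must encode to a single UTF-8 byte, which holds exactly for ASCII."""
    return text.isascii()


def fits_char(text: str, n: int) -> bool:
    """Check text against a CHAR(n) column.

    For ASCII text the character count equals the byte count, so this check
    gives the same answer whether CHAR(n) counts characters or bytes.
    """
    return is_char_compatible(text) and len(text) <= n


if __name__ == "__main__":
    print(fits_char("US", 2))
    print(fits_char("caf\u00e9", 10))  # rejected: é is a two-byte character
```

A value that fails this check belongs in a VARCHAR column instead.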