Collations for T-SQL
This topic provides reference content comparing collation and character set support between Microsoft SQL Server 2019 and Amazon Aurora MySQL. You can gain insight into how these database systems handle string management, storage, and comparison rules.
Feature compatibility | AWS SCT / AWS DMS automation level | AWS SCT action code index | Key differences |
---|---|---|---|
|
|
UNICODE uses |
SQL Server Usage
SQL Server collations define the rules for string management and storage in terms of sorting, case sensitivity, accent sensitivity, and code page mapping. SQL Server supports both ASCII and UCS-2 UNICODE data.
UCS-2 UNICODE data uses a dedicated set of UNICODE data types denoted by the prefix N
: Nchar
and Nvarchar
. Their ASCII counterparts are CHAR and VARCHAR.
Choosing a collation and a character set has significant implications on data storage, logical predicate evaluations, query results, and query performance.
Note
To view all collations supported by SQL Server, use the fn_helpcollations
function as shown following: SELECT * FROM sys.fn_helpcollations()
.
Collations define the actual bitwise binary representation of all string characters and the associated sorting rules. SQL Server supports multiple collations down to the column level. A table may have multiple string columns that use different collations. Collations for non-UNICODE character sets determine the code page number representing the string characters.
Note
UNICODE and non-UNICODE data types in SQL Server aren’t compatible. A predicate or data modification that introduces a type conflict is resolved using predefined collation precedence rules. For more information, see Collation Precedence
Collations define sorting and matching sensitivity for the following string characteristics:
-
Case
-
Accent
-
Kana
-
Width
-
Variation selector
SQL Server uses a suffix naming convention that appends the option name to the collation name. For example, the collation Azeri_Cyrillic_100_CS_AS_KS_WS_SC
, is an Azeri-Cyrillic-100 collation that is case-sensitive, accent-sensitive, kana type-sensitive, width-sensitive, and has supplementary characters.
SQL Server supports three types of collation sets: * Windows Collations use the rules defined for collations by the operating system locale where UNICODE and non-UNICODE data use the same comparison algorithms. * Binary Collations use the binary bit-wise code for comparison. Therefore, the locale doesn’t affect sorting. * SQL Server Collations provide backward compatibility with previous SQL Server versions. They aren’t compatible with the windows collation rules for non-UNICODE data.
You can define collations at various levels:
-
Server-level collations determine the collations used for all system databases and is the default for future user databases. While the system databases collation can’t be changed, an alternative collation can be specified as part of the
CREATE DATABASE
statement -
Database-level collations inherit the server default unless the
CREATE DATABASE
statement explicitly sets a different collation. This collation is used as a default for allCREATE TABLE
andALTER TABLE
statements. -
Column-level collations can be specified as part of the
CREATE TABLE
orALTER TABLE
statements to override the database’s default collation setting. -
Expression-level collations can be set for individual string expressions using the
COLLATE
function. For example,SELECT * FROM MyTable ORDER BY StringColumn COLLATE Latin1_General_CS_AS
.
Note
SQL Server supports UCS-2 UNICODE only.
SQL Server 2019 adds support for UTF-8 for import and export encoding, and as database-level or column-level collation for string data. Support includes PolyBase external tables, and Always Encrypted (when not used with Enclaves). For more information, see Collation and Unicode Support
Syntax
CREATE DATABASE <Database Name> [ ON <File Specifications> ] COLLATE <Collation> [ WITH <Database Option List> ];
CREATE TABLE <Table Name> ( <Column Name> <String Data Type> COLLATE <Collation> [ <Column Constraints> ]... );
Examples
The following example creates a database with a default Bengali_100_CS_AI collation.
CREATE DATABASE MyBengaliDatabase ON ( NAME = MyBengaliDatabase_Datafile, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\MyBengaliDatabase.mdf', SIZE = 100) LOG ON ( NAME = MyBengaliDatabase_Logfile, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\MyBengaliDblog.ldf', SIZE = 25) COLLATE Bengali_100_CS_AI;
The following example creates a table with two different collations.
CREATE TABLE MyTable ( Col1 CHAR(10) COLLATE Hungarian_100_CI_AI_SC NOT NULL PRIMARY KEY, COL2 VARCHAR(100) COLLATE Sami_Sweden_Finland_100_CS_AS_KS NOT NULL );
For more information, see Collation and Unicode support
MySQL Usage
Amazon Aurora MySQL-Compatible Edition (Aurora MySQL) supports multiple character sets and a variety of collations that can be used for comparison. Similar to SQL Server, you can define collations at the server, database, and column level. Additionally, you can define collations at the table level in Aurora MySQL.
The paradigm of collations in Aurora MySQL is different than in SQL Server and consists of separate character set and collation objects. Aurora MySQL supports 41 different character sets and 222 collations. Seven different UNICODE character sets are supported including UCS-2, UTF-8 and UTF-32.
Note
Use UCS-2 which is compatible with SQL Server UNICODE types.
Each character set can have one or more associated collations with a single default collation.
Collation names have prefixes consisting of the name of their associated character set followed by suffixes that indicate additional characteristics.
To see all character sets supported by Aurora MySQL, use the INFORMATION_SCHEMA.CHARACTER_SETS
table or the SHOW CHARACTER SET
statement.
To see all collations for a character set, use the INFORMATION_SCHEMA.COLLATIONS
table or the SHOW COLLATION
statement.
Note
Character set and collation settings also affect client-to -server communications. You can set explicit collations for sessions using the SET
command. For example, SET NAMES 'utf8';
causes Aurora MySQL to treat incoming object names as UTF-8 encoded.
You can set the default character set and collations at the server level using custom cluster parameter groups. For more information, see Server Options.
At the database level, you can set a default character set and collation with the CREATE DATABASE
and ALTER DATABASE
statements. Consider the following example:
CREATE DATABASE MyDatabase CHARACTER SET latin1 COLLATE latin1_swedish_ci;
To view the default character set and collation for an Aurora MySQL databases, use the following statement:
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = '<Database Name>';
Note
In Aurora MySQL, a database is equivalent to an SQL Server schema. For more information, see Databases and Schemas.
Every string column in Aurora MySQL has a character set and an associated collation. If not explicitly specified, it will inherit the table default. To specify a non-default character set and collation, use the CHARACTER SET
and COLLATE
clauses of the CREATE TABLE
statement.
CREATE TABLE MyTable ( StringColumn VARCHAR(5) NOT NULL CHARACTER SET latin1 COLLATE latin1_german1_ci );
At the expression level, similar to SQL Server, you can use the COLLATE
function to explicitly declare a string’s collation. In addition, a prefix to the string can be used to denote a specific character set. Consider the following example:
SELECT _latin1'Latin non-UNICODE String', _utf8'UNICODE String' COLLATE utf8_danish_ci;
Note
The Aurora MySQL term for this prefix or string header is introducer. It doesn’t change the value of the string; only the character set.
At the session level, the server’s setting determines the default character set and collation used to evaluate nonqualified strings.
Although the server’s character set and collation default settings can be modified using the cluster parameter groups, it is recommended that client applications don’t assume a specific setting and explicitly set the required character set and collation using the SET NAMES
and SET CHARACTER SET
statements.
For more information, see Connection Character Sets and Collations
Syntax
The following example creates a database-level collation.
CREATE DATABASE <Database Name> [DEFAULT] CHARACTER SET <Character Set> [[DEFAULT] COLLATE <Collation>];
The following example creates a table-level collation.
CREATE TABLE <Table Name> (Column Specifications) [DEFAULT] CHARACTER SET <Character Set> [COLLATE <Collation>];
The following example creates a column collation.
CREATE TABLE <Table Name>
(
<Column Name> {CHAR | VARCHAR | TEXT} (<Length>)
CHARACTER SET CHARACTER SET <Character Set>
[COLLATE <Collation>];
The following example creates an expression collation.
_<Character Set>'<String>' COLLATE <Collation>
Examples
The following walkthrough describes how to change the cluster character set and collation.
-
Log in to your Management Console
, choose Amazon RDS , and then choose Parameter groups. -
Choose Create parameter group.
-
For Parameter group family, choose aurora-mysql5.7.
-
For Type, choose DB Cluster Parameter Group.
-
For Group name, enter the identified for the DB parameter group.
-
Choose Create.
-
Choose the newly created group on the Parameter groups list.
-
For Parameters, enter character_set_server in the search box and choose Edit parameters.
-
Choose the server default character set.
-
Delete the search term and enter collation. Select the desired default server collation and choose Preview changes.
-
Check the values and choose Close, and then choose Save changes.
-
Return to the Management Console dashboard and choose Create database.
-
For Choose a database creation method, choose Easy create.
-
For Engine type, choose Amazon Aurora .
-
Enter the instance size, cluster identifier and username. Choose Create database.
-
Modify the created instance to change the DB Parameter group.
Summary
The following table identifies similarities, differences, and key migration considerations.
Feature | SQL Server | Aurora MySQL |
---|---|---|
Unicode support |
UTF 16 using |
8 UNICODE character sets, using the |
Collations levels |
Server, Database, Column, Expression |
Server, Database, Table, Column, Expression |
View collation metadata |
|
|
For more information, see Character Sets, Collations, Unicode