

# Understanding Collations in Babelfish for Aurora PostgreSQL
<a name="babelfish-collations"></a>

When you create an Aurora PostgreSQL DB cluster with Babelfish, you choose a collation for your data. A *collation* specifies the sort order and bit patterns that produce the text or characters in a given written human language. A collation includes rules comparing data for a given set of bit patterns. Collation is related to localization. Different locales affect character mapping, sort order, and the like. Collation attributes are reflected in the names of various collations. For information about attributes, see the [Babelfish collation attributes table](#bfish-collation-attributes-table). 

Babelfish maps SQL Server collations to comparable collations provided by Babelfish. Babelfish predefines Unicode collations with culturally sensitive string comparisons and sort orders. Babelfish also provides a way to translate the collations in your SQL Server DB to the closest-matching Babelfish collation. Locale-specific collations are provided for different languages and regions. 

Some collations specify a code page that corresponds to a client-side encoding. Babelfish automatically translates from the server encoding to the client encoding depending on the collation of each output column. 

Babelfish supports the collations listed in the [Babelfish supported collations table](#bfish-collations-table). Babelfish maps SQL Server collations to comparable collations provided by Babelfish. 

Babelfish uses version 153.80 of the International Components for Unicode (ICU) collation library. For more information about ICU collations, see [ Collation](https://unicode-org.github.io/icu/userguide/collation/) in the ICU documentation. To learn more about PostgreSQL and collation, see [Collation Support](https://www.postgresql.org/docs/current/collation.html) in the the PostgreSQL documentation.

**Topics**
+ [DB cluster parameters that control collation and locale](#babelfish-collations.parameters)
+ [Deterministic and nondeterministic collations in Babelfish](#babelfish-collations.deterministic-nondeterministic)
+ [Collations supported at database level in Babelfish](#babelfish-collations.database-level)
+ [Server and object Collations in Babelfish](#babelfish-collations.reference-tables-supported-collations)
+ [Default Collation behavior in Babelfish](#babelfish-collations-default)
+ [Managing collations](collation.managing.md)
+ [Collation limitations and behavior differences](collation.limitations.md)

## DB cluster parameters that control collation and locale
<a name="babelfish-collations.parameters"></a><a name="collation-related-parameters"></a>

The following parameters affect collation behavior. 

**babelfishpg\$1tsql.default\$1locale**  
This parameter specifies the default locale used by the collation. This parameter is used in combination with attributes listed in the [Babelfish collation attributes table](#bfish-collation-attributes-table) to customize collations for a specific language and region. The default value for this parameter is `en-US`.  
The default locale applies to all Babelfish collation names that start with "BBF" and to all SQL Server collations that are mapped to Babelfish collations. Changing the setting for this parameter on an existing Babelfish DB cluster doesn't affect the locale of existing collations. For the list of collations, see the [Babelfish supported collations table](#bfish-collations-table). 

**babelfishpg\$1tsql.server\$1collation\$1name**  
This parameter specifies the default collation for the server (Aurora PostgreSQL DB cluster instance) and the database. The default value is `sql_latin1_general_cp1_ci_as`. The `server_collation_name` has to be a `CI_AS` collation because in T-SQL, the server collation determines how identifiers are compared.  
When you create your Babelfish DB cluster, you choose the **Collation name** from the selectable list. These include the collations listed in the [Babelfish supported collations table](#bfish-collations-table). Don't modify the `server_collation_name` after the Babelfish database is created.

The settings you choose when you create your Babelfish for Aurora PostgreSQL DB cluster are stored in the DB cluster parameter group associated with the cluster for these parameters and set its collation behavior.

## Deterministic and nondeterministic collations in Babelfish
<a name="babelfish-collations.deterministic-nondeterministic"></a>

Babelfish supports deterministic and nondeterministic collations:
+ A *deterministic collation* evaluates characters that have identical byte sequences as equal. That means that `x` and `X` aren't equal in a deterministic collation. Deterministic collations can be case-sensitive (CS) and accent-sensitive (AS).
+ A *nondeterministic collation* doesn't need an identical match. A nondeterministic collation evaluates `x` and `X` as equal. Nondeterministic collations are case-insensitive (CI) or accent-insensitive (AI), or both.

In the table following, you can find some behavior differences between Babelfish and PostgreSQL when using nondeterministic collations.


| Babelfish | PostgreSQL | 
| --- | --- | 
|  Supports the LIKE clause for CI\$1AS collations.  |  Doesn't support the LIKE clause on nondeterministic collations.  | 
|  Supports the LIKE clause only on the following AI collations from Babelfish version 4.2.0: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/babelfish-collations.html)  |  Doesn't support the LIKE clause on nondeterministic collations.  | 

For a list of other limitations and behavior differences for Babelfish compared to SQL Server and PostgreSQL, see [Collation limitations and behavior differences](collation.limitations.md). 

Babelfish and SQL Server follow a naming convention for collations that describe the collation attributes, as shown in the table following.<a name="bfish-collation-attributes-table"></a>


| Attribute | Description | 
| --- | --- | 
| AI | Accent-insensitive. | 
| AS | Accent-sensitive. | 
| BIN2 | BIN2 requests data to be sorted in code point order. Unicode code point order is the same character order for UTF-8, UTF-16, and UCS-2 encodings. Code point order is a fast deterministic collation. | 
| CI | Case-insensitive. | 
| CS | Case-sensitive. | 
| PREF | To sort uppercase letters before lowercase letters, use a PREF collation. If comparison is case-insensitive, the uppercase version of a letter sorts before the lowercase version, if there is no other distinction. The ICU library supports uppercase preference with `colCaseFirst=upper`, but not for CI\$1AS collations. PREF can be applied only to `CS_AS` deterministic collations. | 

## Collations supported at database level in Babelfish
<a name="babelfish-collations.database-level"></a>

The following collations are supported at database level in Babelfish:
+ bbf\$1unicode\$1bin2
+ bbf\$1unicode\$1cp1\$1ci\$1ai
+ bbf\$1unicode\$1cp1\$1ci\$1as
+ bbf\$1unicode\$1cp1250\$1ci\$1ai
+ bbf\$1unicode\$1cp1250\$1ci\$1as
+ bbf\$1unicode\$1cp1257\$1ci\$1ai
+ bbf\$1unicode\$1cp1257\$1ci\$1as
+ estonian\$1ci\$1ai
+ estonian\$1ci\$1as
+ finnish\$1swedish\$1ci\$1ai
+ finnish\$1swedish\$1ci\$1as
+ french\$1ci\$1ai
+ french\$1ci\$1as
+ latin1\$1general\$1bin2
+ latin1\$1general\$1ci\$1ai
+ latin1\$1general\$1ci\$1as
+ latin1\$1general\$190\$1bin2
+ latin1\$1general\$1100\$1bin2
+ latin1\$1general\$1140\$1bin2
+ modern\$1spanish\$1ci\$1ai
+ modern\$1spanish\$1ci\$1as
+ polish\$1ci\$1ai
+ polish\$1ci\$1as
+ sql\$1latin1\$1general\$1cp1\$1ci\$1ai
+ sql\$1latin1\$1general\$1cp1\$1ci\$1as
+ sql\$1latin1\$1general\$1cp1250\$1ci\$1as
+ sql\$1latin1\$1general\$1cp1251\$1ci\$1as
+ sql\$1latin1\$1general\$1cp1257\$1ci\$1as
+ traditional\$1spanish\$1ci\$1ai
+ traditional\$1spanish\$1ci\$1as

**Note**  
To use a different collation at the database level, ensure it matches the server-level collation. For more information, see [Server and object Collations in Babelfish](#babelfish-collations.reference-tables-supported-collations)

## Server and object Collations in Babelfish
<a name="babelfish-collations.reference-tables-supported-collations"></a>

Use the following collations as a server collation or an object collation.<a name="bfish-collations-table"></a>


| Collation ID | Notes | 
| --- | --- | 
|  bbf\$1unicode\$1general\$1ci\$1as  |  Supports case-insensitive comparison and the LIKE operator.  | 
|  bbf\$1unicode\$1cp1\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt) also known as CP1252.  | 
|  bbf\$1unicode\$1CP1250\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1250.txt) used to represent texts in Central European and Eastern European languages that use Latin script.  | 
|  bbf\$1unicode\$1CP1251\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1251.txt) for languages that use the Cyrillic script.  | 
|  bbf\$1unicode\$1cp1253\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1253.txt) used to represent modern Greek.  | 
|  bbf\$1unicode\$1cp1254\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1254.txt) that supports Turkish.  | 
|  bbf\$1unicode\$1cp1255\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt) that supports Hebrew.  | 
|  bbf\$1unicode\$1cp1256\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1256.txt) used to write languages that use Arabic script.  | 
|  bbf\$1unicode\$1cp1257\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1257.txt) used to support Estonian, Latvian, and Lithuanian languages.  | 
|  bbf\$1unicode\$1cp1258\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1258.txt) used to write Vietnamese characters.  | 
|  bbf\$1unicode\$1cp874\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit874.txt) used to write Thai characters.  | 
|  sql\$1latin1\$1general\$1cp1250\$1ci\$1as  |  [Nondeterministic single-byte character encoding](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1250.txt) used to represent Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1251\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1251.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1253\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1253.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1254\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1254.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1255\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1256\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1256.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1257\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1257.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1258\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1258.txt) that supports Latin characters.  | 
|  chinese\$1prc\$1ci\$1as  |  Nondeterministic collation that supports Chinese (PRC).  | 
|  cyrillic\$1general\$1ci\$1as  |  Nondeterministic collation that supports Cyrillic.  | 
|  finnish\$1swedish\$1ci\$1as  |  Nondeterministic collation that supports Finnish.  | 
|  french\$1ci\$1as  |  Nondeterministic collation that supports French.  | 
|  japanese\$1ci\$1as  | Nondeterministic collation that supports Japanese. Supported in Babelfish 2.1.0 and higher releases. | 
|  korean\$1wansung\$1ci\$1as  |  Nondeterministic collation that supports Korean (with dictionary sort).  | 
|  latin1\$1general\$1ci\$1as  |  Nondeterministic collation that supports Latin characters.  | 
|  modern\$1spanish\$1ci\$1as  |  Nondeterministic collation that supports Modern Spanish.  | 
|  polish\$1ci\$1as  |  Nondeterministic collation that supports Polish.  | 
|  thai\$1ci\$1as  |  Nondeterministic collation that supports Thai.  | 
|  traditional\$1spanish\$1ci\$1as  |  Nondeterministic collation that supports Spanish (traditional sort).  | 
|  turkish\$1ci\$1as  |  Nondeterministic collation that supports Turkish.  | 
|  ukrainian\$1ci\$1as  |  Nondeterministic collation that supports Ukrainian.  | 
|  vietnamese\$1ci\$1as  |  Nondeterministic collation that supports Vietnamese.  | 

You can use the following collations as object collations.<a name="bfish-icu-collations-table"></a>


| Dialect | Deterministic options | Nondeterministic options | 
| --- | --- | --- | 
|  Arabic  |  Arabic\$1CS\$1AS  |  Arabic\$1CI\$1AS Arabic\$1CI\$1AI  | 
|  Arabic script  |  BBF\$1Unicode\$1CP1256\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1256\$1CS\$1AS  |  BBF\$1Unicode\$1CP1256\$1CI\$1AI BBF\$1Unicode\$1CP1256\$1CS\$1AI  | 
|  Binary  |  latin1\$1general\$1bin2 BBF\$1Unicode\$1BIN2  |  –  | 
|  Central European and Eastern European languages that use Latin script  |  BBF\$1Unicode\$1CP1250\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1250\$1CS\$1AS  |  BBF\$1Unicode\$1CP1250\$1CI\$1AI BBF\$1Unicode\$1CP1250\$1CS\$1AI  | 
|  Chinese  |  Chinese\$1PRC\$1CS\$1AS  |  Chinese\$1PRC\$1CI\$1AS Chinese\$1PRC\$1CI\$1AI  | 
|  Cyrillic\$1General  |  Cyrillic\$1General\$1CS\$1AS  |  Cyrillic\$1General\$1CI\$1AS Cyrillic\$1General\$1CI\$1AI  | 
|  Cyrillic script  |  BBF\$1Unicode\$1CP1251\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1251\$1CS\$1AS  |  BBF\$1Unicode\$1CP1251\$1CI\$1AI BBF\$1Unicode\$1CP1251\$1CS\$1AI  | 
|  Estonian  |  Estonian\$1CS\$1AS  |  Estonian\$1CI\$1AS Estonian\$1CI\$1AI  | 
|  Estonian, Latvian, and Lithuanian  |  BBF\$1Unicode\$1CP1257\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1257\$1CS\$1AS  |  BBF\$1Unicode\$1CP1257\$1CI\$1AI BBF\$1Unicode\$1CP1257\$1CS\$1AI  | 
|  Finnish\$1Swedish  |  Finnish\$1Swedish\$1CS\$1AS  |  Finnish\$1Swedish\$1CI\$1AS Finnish\$1Swedish\$1CI\$1AI  | 
|  French  |  French\$1CS\$1AS  |  French\$1CI\$1AS French\$1CI\$1AI  | 
|  Greek  |  Greek\$1CS\$1AS  |  Greek\$1CI\$1AS Greek\$1CI\$1AI  | 
|  Hebrew  |  BBF\$1Unicode\$1CP1255\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1255\$1CS\$1AS Hebrew\$1CS\$1AS  |  BBF\$1Unicode\$1CP1255\$1CI\$1AI BBF\$1Unicode\$1CP1255\$1CS\$1AI Hebrew\$1CI\$1AS Hebrew\$1CI\$1AI  | 
|  Japanese (Babelfish 2.1.0 and higher)  | Japanese\$1CS\$1AS | Japanese\$1CI\$1AI Japanese\$1CI\$1AS | 
|  Korean\$1Wamsung  |  Korean\$1Wamsung\$1CS\$1AS  |  Korean\$1Wamsung\$1CI\$1AS Korean\$1Wamsung\$1CI\$1AI  | 
|  Latin characters for code page CP1252  |  latin1\$1general\$1cs\$1as BBF\$1Unicode\$1General\$1CS\$1AS BBF\$1Unicode\$1General\$1Pref\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1\$1CS\$1AS BBF\$1Unicode\$1CP1\$1CS\$1AS  |  latin1\$1general\$1ci\$1as latin1\$1general\$1ci\$1ai latin1\$1general\$1cs\$1ai BBF\$1Unicode\$1General\$1CI\$1AI BBF\$1Unicode\$1General\$1CS\$1AI BBF\$1Unicode\$1CP1\$1CI\$1AI BBF\$1Unicode\$1CP1\$1CS\$1AI  | 
|  Modern Greek  |  BBF\$1Unicode\$1CP1253\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1253\$1CS\$1AS  |  BBF\$1Unicode\$1CP1253\$1CI\$1AI BBF\$1Unicode\$1CP1253\$1CS\$1AI  | 
|  Modern\$1Spanish  |  Modern\$1Spanish\$1CS\$1AS  |  Modern\$1Spanish\$1CI\$1AS Modern\$1Spanish\$1CI\$1AI  | 
|  Mongolian  |  Mongolian\$1CS\$1AS  |  Mongolian\$1CI\$1AS Mongolian\$1CI\$1AI  | 
|  Polish  |  Polish\$1CS\$1AS  |  Polish\$1CI\$1AS Polish\$1CI\$1AI  | 
|  Thai  |  BBF\$1Unicode\$1CP874\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP874\$1CS\$1AS Thai\$1CS\$1AS  |  BBF\$1Unicode\$1CP874\$1CI\$1AI BBF\$1Unicode\$1CP874\$1CS\$1AI Thai\$1CI\$1AS, Thai\$1CI\$1AI  | 
|  Traditional\$1Spanish  |  Traditional\$1Spanish\$1CS\$1AS  |  Traditional\$1Spanish\$1CI\$1AS Traditional\$1Spanish\$1CI\$1AI  | 
|  Turkish  |  BBF\$1Unicode\$1CP1254\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1254\$1CS\$1AS Turkish\$1CS\$1AS  |  BBF\$1Unicode\$1CP1254\$1CI\$1AI BBF\$1Unicode\$1CP1254\$1CS\$1AI Turkish\$1CI\$1AS, Turkish\$1CI\$1AI  | 
|  Ukranian  |  Ukranian\$1CS\$1AS  |  Ukranian\$1CI\$1AS Ukranian\$1CI\$1AI  | 
|  Vietnamese  |  BBF\$1Unicode\$1CP1258\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1258\$1CS\$1AS Vietnamese\$1CS\$1AS  |  BBF\$1Unicode\$1CP1258\$1CI\$1AI BBF\$1Unicode\$1CP1258\$1CS\$1AI Vietnamese\$1CI\$1AS Vietnamese\$1CI\$1AI  | 

## Default Collation behavior in Babelfish
<a name="babelfish-collations-default"></a>

Earlier, the default collation of the collatable datatypes was `pg_catalog.default`. The datatypes and the objects that depends on these datatypes follows cases-sensitive collation. This condition potentially impacts the T-SQL objects of the data set with case-insensitive collation. Starting with Babelfish 2.3.0, the default collation for the collatable data types (except TEXT and NTEXT) is the same as the collation in the `babelfishpg_tsql.server_collation_name` parameter. When you upgrade to Babelfish 2.3.0, the default collation is picked automatically at the time of DB cluster creation, which doesn't create any visible impact. 