Collations supported by Babelfish
When you create an Aurora PostgreSQL DB cluster with Babelfish, you choose a collation for your data. A collation specifies the sort order and bit patterns that produce the text or characters in a given written human language. A collation includes rules comparing data for a given set of bit patterns. Collation is related to localization. Different locales affect character mapping, sort order, and the like. Collation attributes are reflected in the names of various collations. For information about attributes, see the Babelfish collation attributes table.
Babelfish maps SQL Server collations to comparable collations provided by Babelfish. Babelfish predefines Unicode collations with culturally sensitive string comparisons and sort orders. Babelfish also provides a way to translate the collations in your SQL Server DB to the closest-matching Babelfish collation. Locale-specific collations are provided for different languages and regions.
Some collations specify a code page that corresponds to a client-side encoding. Babelfish automatically translates from the server encoding to the client encoding depending on the collation of each output column.
Babelfish supports the collations listed in the Babelfish supported collations table. Babelfish maps SQL Server collations to comparable collations provided by Babelfish.
Babelfish uses version 153.80 of the International Components for Unicode (ICU) collation library.
For more information about ICU collations, see
Collation
Topics
DB cluster parameters that control collation and locale
The following parameters affect collation behavior.
- babelfishpg_tsql.default_locale
This parameter specifies the default locale used by the collation. This parameter is used in combination with attributes listed in the Babelfish collation attributes table to customize collations for a specific language and region. The default value for this parameter is
en-US
.The default locale applies to all Babelfish collation names that start with "BBF" and to all SQL Server collations that are mapped to Babelfish collations. Changing the setting for this parameter on an existing Babelfish DB cluster doesn't affect the locale of existing collations. For the list of collations, see the Babelfish supported collations table.
- babelfishpg_tsql.server_collation_name
This parameter specifies the default collation for the server (Aurora PostgreSQL DB cluster instance) and the database. The default value is
sql_latin1_general_cp1_ci_as
. Theserver_collation_name
has to be aCI_AS
collation because in T-SQL, the server collation determines how identifiers are compared.When you create your Babelfish DB cluster, you choose the Collation name from the selectable list. These include the collations listed in the Babelfish supported collations table. Don't modify the
server_collation_name
after the Babelfish database is created.
The settings you choose when you create your Babelfish for Aurora PostgreSQL DB cluster are stored in the DB cluster parameter group associated with the cluster for these parameters and set its collation behavior.
Deterministic and nondeterministic collations and Babelfish
Babelfish supports deterministic and nondeterministic collations:
A deterministic collation evaluates characters that have identical byte sequences as equal. That means that
x
andX
aren't equal in a deterministic collation. Deterministic collations can be case-sensitive (CS) and accent-sensitive (AS).A nondeterministic collation doesn't need an identical match. A nondeterministic collation evaluates
x
andX
as equal. Nondeterministic collations are case-insensitive (CI) or accent-insensitive (AI), or both.
In the table following, you can find some behavior differences between Babelfish and PostgreSQL when using nondeterministic collations.
Babelfish | PostgreSQL |
---|---|
Supports the LIKE clause for CI_AS collations. |
Doesn't support the LIKE clause on nondeterministic collations. |
Supports the LIKE clause only on the following AI collations from Babelfish version 4.2.0:
|
Doesn't support the LIKE clause on nondeterministic collations. |
For a list of other limitations and behavior differences for Babelfish compared to SQL Server and PostgreSQL, see Collation limitations and behavior differences.
Babelfish and SQL Server follow a naming convention for collations that describe the collation attributes, as shown in the table following.
Attribute | Description |
---|---|
AI |
Accent-insensitive. |
AS |
Accent-sensitive. |
BIN2 |
BIN2 requests data to be sorted in code point order. Unicode code point order is the same character order for UTF-8, UTF-16, and UCS-2 encodings. Code point order is a fast deterministic collation. |
CI |
Case-insensitive. |
CS |
Case-sensitive. |
PREF |
To sort uppercase letters before lowercase letters, use a PREF collation. If comparison
is case-insensitive, the uppercase version of a letter sorts before the lowercase version,
if there is no other distinction. The ICU library supports uppercase preference with
PREF can be applied only to |
Collations supported by Babelfish
Use the following collations as a server collation or an object collation.
Collation ID | Notes |
---|---|
bbf_unicode_general_ci_as |
Supports case-insensitive comparison and the LIKE operator. |
bbf_unicode_cp1_ci_as |
Nondeterministic collation |
bbf_unicode_CP1250_ci_as |
Nondeterministic collation |
bbf_unicode_CP1251_ci_as |
Nondeterministic collation |
bbf_unicode_cp1253_ci_as |
Nondeterministic collation |
bbf_unicode_cp1254_ci_as |
Nondeterministic collation |
bbf_unicode_cp1255_ci_as |
Nondeterministic collation |
bbf_unicode_cp1256_ci_as |
Nondeterministic collation |
bbf_unicode_cp1257_ci_as |
Nondeterministic collation |
bbf_unicode_cp1258_ci_as |
Nondeterministic collation |
bbf_unicode_cp874_ci_as |
Nondeterministic collation |
sql_latin1_general_cp1250_ci_as |
Nondeterministic single-byte character encoding |
sql_latin1_general_cp1251_ci_as |
Nondeterministic collation |
sql_latin1_general_cp1_ci_as |
Nondeterministic collation |
sql_latin1_general_cp1253_ci_as |
Nondeterministic collation |
sql_latin1_general_cp1254_ci_as |
Nondeterministic collation |
sql_latin1_general_cp1255_ci_as |
Nondeterministic collation |
sql_latin1_general_cp1256_ci_as |
Nondeterministic collation |
sql_latin1_general_cp1257_ci_as |
Nondeterministic collation |
sql_latin1_general_cp1258_ci_as |
Nondeterministic collation |
chinese_prc_ci_as |
Nondeterministic collation that supports Chinese (PRC). |
cyrillic_general_ci_as |
Nondeterministic collation that supports Cyrillic. |
finnish_swedish_ci_as |
Nondeterministic collation that supports Finnish. |
french_ci_as |
Nondeterministic collation that supports French. |
japanese_ci_as |
Nondeterministic collation that supports Japanese. Supported in Babelfish 2.1.0 and higher releases. |
korean_wansung_ci_as |
Nondeterministic collation that supports Korean (with dictionary sort). |
latin1_general_ci_as |
Nondeterministic collation that supports Latin characters. |
modern_spanish_ci_as |
Nondeterministic collation that supports Modern Spanish. |
polish_ci_as |
Nondeterministic collation that supports Polish. |
thai_ci_as |
Nondeterministic collation that supports Thai. |
traditional_spanish_ci_as |
Nondeterministic collation that supports Spanish (traditional sort). |
turkish_ci_as |
Nondeterministic collation that supports Turkish. |
ukrainian_ci_as |
Nondeterministic collation that supports Ukrainian. |
vietnamese_ci_as |
Nondeterministic collation that supports Vietnamese. |
You can use the following collations as object collations.
Dialect | Deterministic options | Nondeterministic options |
---|---|---|
Arabic |
Arabic_CS_AS |
Arabic_CI_AS Arabic_CI_AI |
Arabic script |
BBF_Unicode_CP1256_CS_AS BBF_Unicode_Pref_CP1256_CS_AS |
BBF_Unicode_CP1256_CI_AI BBF_Unicode_CP1256_CS_AI |
Binary |
latin1_general_bin2 BBF_Unicode_BIN2 |
– |
Central European and Eastern European languages that use Latin script |
BBF_Unicode_CP1250_CS_AS BBF_Unicode_Pref_CP1250_CS_AS |
BBF_Unicode_CP1250_CI_AI BBF_Unicode_CP1250_CS_AI |
Chinese |
Chinese_PRC_CS_AS |
Chinese_PRC_CI_AS Chinese_PRC_CI_AI |
Cyrillic_General |
Cyrillic_General_CS_AS |
Cyrillic_General_CI_AS Cyrillic_General_CI_AI |
Cyrillic script |
BBF_Unicode_CP1251_CS_AS BBF_Unicode_Pref_CP1251_CS_AS |
BBF_Unicode_CP1251_CI_AI BBF_Unicode_CP1251_CS_AI |
Estonian |
Estonian_CS_AS |
Estonian_CI_AS Estonian_CI_AI |
Estonian, Latvian, and Lithuanian |
BBF_Unicode_CP1257_CS_AS BBF_Unicode_Pref_CP1257_CS_AS |
BBF_Unicode_CP1257_CI_AI BBF_Unicode_CP1257_CS_AI |
Finnish_Swedish |
Finnish_Swedish_CS_AS |
Finnish_Swedish_CI_AS Finnish_Swedish_CI_AI |
French |
French_CS_AS |
French_CI_AS French_CI_AI |
Greek |
Greek_CS_AS |
Greek_CI_AS Greek_CI_AI |
Hebrew |
BBF_Unicode_CP1255_CS_AS BBF_Unicode_Pref_CP1255_CS_AS Hebrew_CS_AS |
BBF_Unicode_CP1255_CI_AI BBF_Unicode_CP1255_CS_AI Hebrew_CI_AS Hebrew_CI_AI |
Japanese (Babelfish 2.1.0 and higher) |
Japanese_CS_AS |
Japanese_CI_AI Japanese_CI_AS |
Korean_Wamsung |
Korean_Wamsung_CS_AS |
Korean_Wamsung_CI_AS Korean_Wamsung_CI_AI |
Latin characters for code page CP1252 |
latin1_general_cs_as BBF_Unicode_General_CS_AS BBF_Unicode_General_Pref_CS_AS BBF_Unicode_Pref_CP1_CS_AS BBF_Unicode_CP1_CS_AS |
latin1_general_ci_as latin1_general_ci_ai latin1_general_cs_ai BBF_Unicode_General_CI_AI BBF_Unicode_General_CS_AI BBF_Unicode_CP1_CI_AI BBF_Unicode_CP1_CS_AI |
Modern Greek |
BBF_Unicode_CP1253_CS_AS BBF_Unicode_Pref_CP1253_CS_AS |
BBF_Unicode_CP1253_CI_AI BBF_Unicode_CP1253_CS_AI |
Modern_Spanish |
Modern_Spanish_CS_AS |
Modern_Spanish_CI_AS Modern_Spanish_CI_AI |
Mongolian |
Mongolian_CS_AS |
Mongolian_CI_AS Mongolian_CI_AI |
Polish |
Polish_CS_AS |
Polish_CI_AS Polish_CI_AI |
Thai |
BBF_Unicode_CP874_CS_AS BBF_Unicode_Pref_CP874_CS_AS Thai_CS_AS |
BBF_Unicode_CP874_CI_AI BBF_Unicode_CP874_CS_AI Thai_CI_AS, Thai_CI_AI |
Traditional_Spanish |
Traditional_Spanish_CS_AS |
Traditional_Spanish_CI_AS Traditional_Spanish_CI_AI |
Turkish |
BBF_Unicode_CP1254_CS_AS BBF_Unicode_Pref_CP1254_CS_AS Turkish_CS_AS |
BBF_Unicode_CP1254_CI_AI BBF_Unicode_CP1254_CS_AI Turkish_CI_AS, Turkish_CI_AI |
Ukranian |
Ukranian_CS_AS |
Ukranian_CI_AS Ukranian_CI_AI |
Vietnamese |
BBF_Unicode_CP1258_CS_AS BBF_Unicode_Pref_CP1258_CS_AS Vietnamese_CS_AS |
BBF_Unicode_CP1258_CI_AI BBF_Unicode_CP1258_CS_AI Vietnamese_CI_AS Vietnamese_CI_AI |
Default Collation in Babelfish
Earlier, the default collation of the collatable datatypes was pg_catalog.default
. The datatypes and the objects that depends on these datatypes follows cases-sensitive
collation. This condition potentially impacts the T-SQL objects of the data set with case-insensitive collation. Starting with Babelfish 2.3.0,
the default collation for the collatable data types (except TEXT and NTEXT) is the same as the collation in the babelfishpg_tsql.server_collation_name
parameter. When you upgrade
to Babelfish 2.3.0, the default collation is picked automatically at the time of DB cluster creation, which doesn't create any visible impact.