Migrating to the Amazon CloudSearch 2013-01-01 API
The Amazon CloudSearch 2013-01-01 API offers several new features, including support for multiple languages, highlighting search terms in the results, and getting suggestions. To use these features, you create and configure a new 2013-01-01 search domain, modify your data pipeline to populate the new domain using the 2013-01-01 data format, and update your query pipeline to submit requests in the 2013-01-01 request format. This migration guide summarizes the API changes and highlights the ones that are most likely to affect your application.
Creating 2013-01-01 Amazon CloudSearch Domains
If you created Amazon CloudSearch domains prior to the launch of the 2013-01-01 API, you can choose which API version to use when you create a new domain. To create a 2013-01-01 domain through the console, select the 2013-01-01 version in the Create Domain Wizard. To create a 2013-01-01 domain from the command line, download and install the AWS CLI and run the aws cloudsearch create-domain
command.
Note
To create and interact with 2013-01-01 domains, you must use the AWS CLI tools. To create and interact with 2011-02-01 domains, you must use the v1 tools.
Topics
Configuring 2013-01-01 Amazon CloudSearch Domains
You can configure 2013-01-01 domains through the console, command line tools, or AWS SDKs. 2013-01-01 domains support several new configuration options:
Analysis Schemes—you configure analysis schemes to specify language-specific text processing options for
text
andtext-array
fields. Amazon CloudSearch now supports 33 languages, as well as an option for multi-language fields. For more information, see Configuring Analysis Schemes. For the complete list of supported languages, see Supported Languages.Availability Options—you can enable the Multi-AZ option to expand a domain into a second availability zone to ensure availability in the event of a service disruption. For more information, see Configuring Availability Options.
Scaling Options—you can set the desired instance type and desired replication count to increase upload or search capacity, speed up search requests, and improve fault tolerance. For more information, see Configuring Scaling Options in Amazon CloudSearch.
Suggesters—you can configure suggesters to implement autocomplete functionality. For more information, see Configuring Suggesters for Amazon CloudSearch.
Access to the Amazon CloudSearch configuration service is managed through IAM and now enables you to control access to specific configuration actions. Note that the Amazon CloudSearch ARN has also changed. Access to your domain's document and search endpoints is managed through the Amazon CloudSearch configuration service. For more information, see configure access policies.
2013-01-01 domains also support an expanded set of indexing options:
Analysis Scheme—you configure language-specific text-processing on a per field basis by specifying an analysis scheme for each
text
andtext-array
field. For more information, see Configuring Analysis Schemes.Field Types—Amazon CloudSearch now supports 11 field types:
date—contains a timestamp. Dates and times are specified in UTC (Coordinated Universal Time) according to IETF RFC3339: yyyy-mm-ddT00:00:00Z. In UTC, for example, 5:00 PM August 23, 1970 is: 1970-08-23T17:00:00Z.
date-array—a date field that can contain multiple values.
double—contains a double-precision 64-bit floating point value.
double-array—a double field that can contain multiple values.
int—contains a 64-bit signed integer value.
int-array—an integer field that can contain multiple values.
latlon—contains a location stored as a latitude and longitude value pair.
literal—contains an identifier or other data that you want to be able to match exactly.
literal-array—a literal field that can contain multiple values.
text—contains arbitrary alphanumeric data.
text-array—a text field that can contain multiple values.
Highlight—when you enable the highlight option for a field, you can retrieve excerpts that show where the search terms occur within that field. For more information, see Highlighting Search Hits in Amazon CloudSearch.
Source—you can specify a source for a field to copy data from one field to another, enabling you to use the same source data in different ways by configuring different options for the fields.
When configuring your 2013-01-01 domain, there are several things to keep in mind:
By default, when you add a field, all options valid for that field type are enabled. While this is useful for development and testing, disabling options you don't need can reduce the size of your index and improve performance.
You must use the separate array type fields for multi-valued fields.
Only single-value fields can be sort enabled.
Only
text
andtext-array
fields can be highlight enabled.All fields except
text
andtext-array
fields can be facet enabled.Literal fields are now case-sensitive.
You no longer have to store floating point values as integers—use a
double
field.You can store locations using the new
latlon
field type. For more information, see location-based searching and sorting.An
int
field is a 64-bit signed integer.Instead of configuring a default search field, you can specify which fields to search with the
q.options
parameter in your search requests. Theq.options
parameter also enables you to specify weights for each of the fields.When sorting and configuring expressions, you reference the default relevance score with the name
_score
. Due to changes in the relevance algorithm, the calculated scores will be different than they were under the 2011-02-01 API. For more information, see Configuring Expressions.Expressions now support the
logn
,atan2
, andhaversin
functions as well as the_score
(text relevance score) and_time
(epoch time) variables. If you store locations inlatlon
fields, you can reference the latitude and longitude values asFIELD.latitude
andFIELD.longitude
. You can also reference bothint
anddouble
fields in expressions. The following functions are no longer supported:cs.text_relevance
,erf
,lgamma
,rand
, andtime
. For more information, see Configuring Expressions.
For more information about configuring indexing options for a 2013-01-01 domain, see configure indexing options. For more information about configuring availability options, scaling options, text processing options, suggesters, and expressions see Creating and Managing Search Domains.
New Amazon CloudSearch Configuration Service Actions and Options
The following actions have been added to the 2013-01-01 Configuration Service API:
DefineAnalysisScheme
DefineExpression
DefineSuggester
DeleteAnalysisScheme
DeleteExpression
DeleteSuggester
DexcribeAnalysisSchemes
DescribeAvailabilityOptions
DescribeExpressions
DescribeScalingParameters
DescribeSuggesters
ListDomainNames
UpdateAvailabilityOptions
UpdateScalingParameters
The deployed
option has been added to the describe actions for index fields, access policies, and suggesters. Set the deployed
option to true to show the active configuration and exclude pending changes.
Obsolete Amazon CloudSearch Configuration Service Actions and Options
The following actions are not supported in the 2013-01-01 Configuration Service API:
DefineRankExpression
DescribeRankExpression
DeleteRankExpression
DescribeDefaultSearchField
DescribeStemmingOptions
DescribeStopwordOptions
DescribeSynonymOptions
UpdateDefaultSearchField
UpdateStemmingOptions
UpdateStopwordOptions
UpdateSynonymOptions
Uploading Data to 2013-01-01 Amazon CloudSearch Domains
With the 2013-01-01 API, you no longer have to specify document versions—updates are applied in the order they are received. You also no longer specify the lang
attribute for each document—you control language-specific text processing by configuring an analysis scheme for each text
and text-array
field.
To upload your data to a 2013-01-01 domain, you need to:
Omit the
version
andlang
attributes from your document batches.Make sure all of the document fields correspond to index fields configured for your domain. Unrecognized fields are no longer ignored, they will generate an error.
Post the document batches to your 2013-01-01 domain's doc endpoint. Note that you must specify the 2013-01-01 API version. For example, the following request posts the batch contained in
data1.json
to thedoc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com
endpoint.curl -X POST --upload-file data1.json doc-movies-123456789012.us-east-1. cloudsearch.amazonaws.com/2013-01-01/documents/batch --header "Content-Type: application/json"
The 2013-01-01 API supports prescaling your domain to increase upload capacity. If you have a large amount of data to upload, configure your domain's scaling options and select a larger desired instance type. Moving to a larger instance type enables you to upload batches in parallel and reduces the time it takes for the data to be indexed. For more information, see Configuring Scaling Options in Amazon CloudSearch.
For more information about formatting your data, see Preparing Your Data.
Searching 2013-01-01 Amazon CloudSearch Domains
Much of the effort required to migrate an existing Amazon CloudSearch search domain to the 2013-01-01 API is updating your query pipeline to submit 2013-01-01 compatible search requests.
Use the 2013-01-01 API version in all requests.
Use the
q
parameter to specify search criteria for all requests. Thebq
parameter is no longer supported. To use the structured (Boolean) search syntax, specifyq.parser=structured
in the request.Parameters cannot be repeated in a search request.
The wildcard character (*) is only supported when using the simple query parser. Use the
prefix
operator to perform prefix matching with the structured query parser. For example,q=(prefix 'oce')&q.parser=structured
.Use the field name
_id
to reference the document ID field in a search request. Thedocid
field name is no longer supported.Use the
range
operator search a field for a value within the specified range. Thefilter
operator is no longer supported.Use the new range syntax to search for ranges of values, including dates and locations stored in
latlon
fields. The double dot (..) notation is no longer supported. Separate the upper and lower bounds with a comma (,), and enclose the range in brackets or braces. A square bracket ([,]) indicates that the bound is included, a curly brace ({,}) excludes the bound. For example,year:2008..2011
is now expressed asyear:[2008,2011]
. An open ended range such asyear:..2011
is now expressed asyear:{,2011]
.Use the
term
operator to search a field for a particular value. Thefield
operator is no longer supported.Use the
q.options
parameter to specify field weights. Thecs.text_relevance
function is no longer supported. For example,q.options={fields:['title^2','plot^0.5']}
.Use the
fq
parameter to filter results without affecting how the matching documents are scored and sorted.Use a dot (.) as a separator rather than a hyphen (-) in the prefix parameters:
expr.NAME
,facet.FIELD
,highlight.FIELD
.Use the
facet.FIELD
parameter to specify all facet options. Thefacet-FIELD-top-N
,facet-FIELD-sort
, andfacet-FIELD-constraints
parameters are no longer supported.Use the
sort
parameter to specify the fields or expressions you want to use for sorting. You must explicitly specify the sort direction in the sort parameter. For example,sort=rank asc, date desc
. Therank
parameter is no longer supported.Use
expr.NAME
to define an expression in a search request. Therank-RANKNAME
parameter is no longer supported.Use
format=xml
to get results as XML. Theresult-type
parameter is no longer supported.
The 2013-01-01 search API also supports several new features:
Term boosting—use the
boost
option in a structured query to increase the importance of one part of the query relative to the other parts. For more information, see Constructing Compound Queries.Sloppy phrase search—use the
near
operator in a structured query to search atext
ortext-array
field for multiple terms and find documents that contain the terms within the specified distance of one another. You can also perform a sloppy phrase search with the simple query parser by appending the~
operator and a value to the phrase. For more information, see Searching for Phrases.Fuzzy search—use the
~
operator to perform fuzzy searches with the simple query parser. Append the~
operator and a value to a term to indicate how much terms can differ and still be considered a match. For more information, see Searching for Individual Terms.Highlighting—use the
highlight.FIELD
parameter to highlight matches in a particular field. For more information, see Highlighting Search Hits in Amazon CloudSearch.Autocomplete—configure a suggester and submit requests to the
suggester
resource to get a list of query completions and the documents in which they were found. For more information, see Getting Autocomplete Suggestions in Amazon CloudSearch.Partial search results—use the
partial=true
parameter to retrieve partial results when one or more index partitions are unavailable. By default Amazon CloudSearch only returns results if every partition can be queried.Deep paging—use the
cursor
parameter to paginate results when you have a large result set. For more information, see Paginate the results.Match all documents—use the
matchall
structured query operator to retrieve all of the documents in the index.New query parsers—use the
q.parser
parameter to select the Lucene or DisMax parsers instead of the simple or structured parser,q.parser=lucene
orq.parser=dismax
.
You'll also notice some changes in behavior when searching:
Strings are no longer tokenized on case boundaries and periods that aren't followed by a space are considered part of the term. For more information, see Text Processing in Amazon CloudSearch.
Literal fields are now case-sensitive.
Search responses no longer include the rank, match expression, or CPU time. The only status information returned is the resource ID (rid) and processing time (time-ms).
When you get facet information for an
int
field,min
andmax
values are no longer returned.
For more information about searching your data, see Searching Your Data with Amazon CloudSearch and the Search API.
New Parameters and Options in the Amazon CloudSearch 2013-01-01 Search API
The following parameters have been added to the 2013-01-01 Search API:
cursor.FIELD
expr.NAME
facet.FIELD
format
fq
highlight.FIELD
partial
pretty
q.options
q.parser
return
sort
The ~
operator has been added to the simple query language to support fuzzy searches and sloppy phrase searches.
The following operators have been added to the structured query language:
boost
matchall
near
phrase
prefix
range
term
Obsolete Amazon CloudSearch Search Parameters and Options
The following parameters are no longer supported in the 2013-01-01 search API:
bq
facet-FIELD-top-N
facet-FIELD-sort
facet-FIELD-constraints
rank
rank-RANKNAME
return-fields
result-type
t-FIELD
The following operators and shortcuts are no longer supported in structured queries:
field
filter
-
|
+
*
Updated Limits in Amazon CloudSearch 2013-01-01
This table summarizes the changes and additions to the Amazon CloudSearch limits. For the complete list of Amazon CloudSearch limits, see Limits.
Change | Summary |
---|---|
Reserved names | score is the only reserved name. |
No limit on return data | Data returned from a text field is no longer truncated at 2 KB. However, keep in mind that the maximum document size is 1 MB. |
No limit on stemming, stopword, or synonym dictionaries. | Stemming, stopword, and synonym dictionaries are configured in an analysis scheme and there is no limit on the size of an analysis scheme definition. |
Maximum number of field values | An array type field can contain up to 1000 values. |
Field size | The maximum size of literal fields is 4096 Unicode code points. |
Int field range | An int field can contain values in the range -9,223,372,036,854,775,808 - 9,223,372,036,854,775,807 (inclusive). |
Maximum number of highlights | The maximum number of occurrences of the search term(s) that can be highlighted is 5. |
Maximum number of suggesters | The maximum number of suggesters you can configure for a domain is 10. |
Maximum number of hits you can retrieve at once | The maximum number of hits you can retrieve at once is 10,000. The size parameter can contain values in the range 0 - 10000. |