Performing text search with Amazon DocumentDB
Amazon DocumentDB's native full text search feature allows you to perform text search on large textual data sets using special purpose text indexes. This section describes the functionalities of the text index feature and provides steps on how to create and use text indexes in Amazon DocumentDB. Text search limitations are also listed.
Supported functionalities
Amazon DocumentDB text search supports the following MongoDB API compatible functionalities:
Create text indexes on a single field.
Create compound text indexes that include more than one text field.
Perform single word or multi-word searches.
Control search results using weights.
Sort search results by score.
Use a text index in an aggregation pipeline.
Search for an exact phrase.
Using Amazon DocumentDB text index
To create a text index on a field containing string data, specify the string “text” as shown below:
Single field index:
db.test.createIndex({"comments": "text"})
This index supports text search queries in the "comments" string field in the specified collection.
Create a compound text index on more than one string field:
db.test.createIndex({"comments": "text", "title":"text"})
This index supports text search queries in the "comments" and "title" string fields in the specified collection. You can specify up to 30 fields when creating a compound text index. Once created, your text search queries will query all the indexed fields.
Note
Only one text index is allowed on each collection.
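The key document passed to createIndex() can also be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named buildTextIndexKeys (it is not part of any Amazon DocumentDB or MongoDB API) that enforces the 30-field limit described above:

```javascript
// Hypothetical helper: builds the key document for createIndex()
// from a list of string field names.
const MAX_TEXT_INDEX_FIELDS = 30; // limit stated in this section

function buildTextIndexKeys(fields) {
  if (!Array.isArray(fields) || fields.length === 0) {
    throw new Error("Provide at least one field name");
  }
  if (new Set(fields).size !== fields.length) {
    throw new Error("Field names must be unique");
  }
  if (fields.length > MAX_TEXT_INDEX_FIELDS) {
    throw new Error(
      `A compound text index accepts at most ${MAX_TEXT_INDEX_FIELDS} fields`
    );
  }
  const keys = {};
  for (const field of fields) {
    keys[field] = "text"; // the string "text" marks the field as text indexed
  }
  return keys;
}

const keys = buildTextIndexKeys(["comments", "title"]);
console.log(JSON.stringify(keys)); // {"comments":"text","title":"text"}
// In the mongo shell, the result could then be passed as:
//   db.test.createIndex(keys)
```

Because only one text index is allowed per collection, collecting every field to be searched into a single call like this avoids a second createIndex() attempt later.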
Listing a text index on an Amazon DocumentDB collection
You can use getIndexes() on your collection to identify and describe indexes, including text indexes, as shown in the example below:
rs0:PRIMARY> db.test.getIndexes()
[
  {
    "v" : 4,
    "key" : { "_id" : 1 },
    "name" : "_id_",
    "ns" : "test.test"
  },
  {
    "v" : 1,
    "key" : { "_fts" : "text", "_ftsx" : 1 },
    "name" : "comments_text",
    "ns" : "test.test",
    "default_language" : "english",
    "weights" : { "comments" : 1 },
    "textIndexVersion" : 1
  }
]
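Since a collection can hold only one text index, it can be useful to check for an existing one before calling createIndex(). The sketch below assumes a hypothetical helper named findTextIndex that inspects the array returned by getIndexes(); the sample input is an abbreviated copy of the listing above.

```javascript
// Hypothetical helper: returns the text index description from the array
// that getIndexes() returns, or null when the collection has none.
function findTextIndex(indexes) {
  // A text index is listed with the internal key { _fts: "text", _ftsx: 1 }.
  return indexes.find((ix) => ix.key && ix.key._fts === "text") || null;
}

// Sample input shaped like the getIndexes() output above (abbreviated).
const indexes = [
  { v: 4, key: { _id: 1 }, name: "_id_", ns: "test.test" },
  {
    v: 1,
    key: { _fts: "text", _ftsx: 1 },
    ns: "test.test",
    weights: { comments: 1 },
  },
];

const textIndex = findTextIndex(indexes);
// The weights document names the fields covered by the text index.
console.log(textIndex ? Object.keys(textIndex.weights) : "no text index");
// In the mongo shell: findTextIndex(db.test.getIndexes())
```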
Once you have created an index, start inserting data into your Amazon DocumentDB collection.
db.test.insertMany([
  {"_id": 1, "star_rating": 4, "comments": "apple is red"},
  {"_id": 2, "star_rating": 5, "comments": "pie is delicious"},
  {"_id": 3, "star_rating": 3, "comments": "apples, oranges - healthy fruit"},
  {"_id": 4, "star_rating": 2, "comments": "bake the apple pie in the oven"},
  {"_id": 5, "star_rating": 5, "comments": "interesting couch"},
  {"_id": 6, "star_rating": 5, "comments": "interested in couch for sale, year 2022"}
])
Running text search queries
Run a single-word text search query
You need to use the $text and $search operators to perform text searches.
The following example returns all documents where the text indexed field contains the string “apple” or another form of it, such as “apples”:
db.test.find({$text: {$search: "apple"}})
Output:
The output of this command looks something like this:
{ "_id" : 1, "star_rating" : 4, "comments" : "apple is red" }
{ "_id" : 3, "star_rating" : 3, "comments" : "apples, oranges - healthy fruit" }
{ "_id" : 4, "star_rating" : 2, "comments" : "bake the apple pie in the oven" }
Run a multi-word text search
You can also perform multi-word text searches on your Amazon DocumentDB data. The command below returns documents with a text indexed field containing “apple” or “pie”:
db.test.find({$text: {$search: "apple pie"}})
Output:
The output of this command looks something like this:
{ "_id" : 1, "star_rating" : 4, "comments" : "apple is red" }
{ "_id" : 2, "star_rating" : 5, "comments" : "pie is delicious" }
{ "_id" : 3, "star_rating" : 3, "comments" : "apples, oranges - healthy fruit" }
{ "_id" : 4, "star_rating" : 2, "comments" : "bake the apple pie in the oven" }
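The "apple or pie" behavior can be pictured with a small in-memory model. The sketch below is a toy illustration only: the tokenizer and the trailing-"s" stemming rule are simplifying assumptions and are not Amazon DocumentDB's actual parsing or stemming logic. It runs outside the database on a copy of the sample comments.

```javascript
// Toy illustration only: NOT Amazon DocumentDB's tokenizer or stemmer.
const docs = [
  { _id: 1, comments: "apple is red" },
  { _id: 2, comments: "pie is delicious" },
  { _id: 3, comments: "apples, oranges - healthy fruit" },
  { _id: 4, comments: "bake the apple pie in the oven" },
  { _id: 5, comments: "interesting couch" },
];

// Lowercase, split on non-alphanumeric characters, and strip a trailing "s"
// as a crude stand-in for stemming.
function toTerms(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0)
    .map((t) => (t.length > 3 && t.endsWith("s") ? t.slice(0, -1) : t));
}

// A document matches when it shares at least one term with the search string.
function matchesAnyTerm(doc, search) {
  const docTerms = new Set(toTerms(doc.comments));
  return toTerms(search).some((t) => docTerms.has(t));
}

const matchedIds = docs
  .filter((d) => matchesAnyTerm(d, "apple pie"))
  .map((d) => d._id);
console.log(matchedIds); // [ 1, 2, 3, 4 ]
```

Document 3 matches only because “apples” reduces to “apple” in this model, which mirrors the "other forms" behavior described for single-word searches.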
Run a multi-word phrase text search
For a multi-word phrase search, use this example:
db.test.find({$text: {$search: "\"apple pie\""}})
Output:
The command above returns documents with a text indexed field containing the exact phrase “apple pie”. The output of this command looks something like this:
{ "_id" : 4, "star_rating" : 2, "comments" : "bake the apple pie in the oven" }
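Escaping the inner double quotes by hand is easy to get wrong. As a minimal sketch, a hypothetical helper named phraseFilter (not a DocumentDB API) could wrap the phrase for you. Rejecting embedded double quotes is an assumption of this sketch, since this section does not describe how they would be parsed.

```javascript
// Hypothetical helper: wraps a phrase in double quotes so that $search
// treats it as an exact phrase rather than as independent terms.
function phraseFilter(phrase) {
  const trimmed = phrase.trim();
  if (trimmed.length === 0) {
    throw new Error("Phrase must not be empty");
  }
  // Assumption: embedded double quotes are rejected instead of escaped.
  if (trimmed.includes('"')) {
    throw new Error("Phrase must not contain double quotes");
  }
  return { $text: { $search: `"${trimmed}"` } };
}

const filter = phraseFilter("apple pie");
console.log(JSON.stringify(filter)); // {"$text":{"$search":"\"apple pie\""}}
// In the mongo shell: db.test.find(phraseFilter("apple pie"))
```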
Run a text search with filters
You can also combine text search with other query operators to filter results based on additional criteria:
db.test.find({$and: [{star_rating: 5}, {$text: {$search: "interest"}}]})
Output:
The command above returns documents with a text indexed field containing any form of “interest” and a “star_rating” equal to 5. The output of this command looks something like this:
{ "_id" : 5, "star_rating" : 5, "comments" : "interesting couch" }
{ "_id" : 6, "star_rating" : 5, "comments" : "interested in couch for sale, year 2022" }
Limit the number of documents returned in a text search
You can restrict the number of documents returned by using limit:
db.test.find({$and: [{star_rating: 5}, {$text: {$search: "couch"}}]}).limit(1)
Output:
The command above returns one result that satisfies the filter:
{ "_id" : 5, "star_rating" : 5, "comments" : "interesting couch" }
Sort results by text score
The following example sorts the text search results by text score:
db.test.find({$text: {$search: "apple"}}, {score: {$meta: "textScore"}}).sort({score: {$meta: "textScore"}})
Output:
The command above returns documents with a text indexed field containing “apple”, or other forms of it such as “apples”, and sorts the results by how relevant each document is to the search term. The output of this command looks something like this:
{ "_id" : 1, "star_rating" : 4, "comments" : "apple is red", "score" : 0.6079270860936958 }
{ "_id" : 3, "star_rating" : 3, "comments" : "apples, oranges - healthy fruit", "score" : 0.6079270860936958 }
{ "_id" : 4, "star_rating" : 2, "comments" : "bake the apple pie in the oven", "score" : 0.6079270860936958 }
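The projection and the sort in the example above must refer to the same score field. The following sketch, which assumes a hypothetical helper named scoreSortedSearch, builds the three documents together so that the field name cannot drift between them:

```javascript
// Hypothetical helper: returns the filter, projection, and sort documents
// used by a score-sorted text search.
function scoreSortedSearch(search, scoreField = "score") {
  const meta = { $meta: "textScore" };
  return {
    filter: { $text: { $search: search } },
    projection: { [scoreField]: meta }, // adds the score to each result
    sort: { [scoreField]: meta }, // orders results by that same score
  };
}

const q = scoreSortedSearch("apple", "relevance");
console.log(JSON.stringify(q.sort)); // {"relevance":{"$meta":"textScore"}}
// In the mongo shell:
//   db.test.find(q.filter, q.projection).sort(q.sort)
```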
$text and $search are also supported for the aggregate, count, findAndModify, update, and delete commands.
Aggregation operators
Aggregation pipeline using $match
db.test.aggregate( [{ $match: { $text: { $search: "apple pie" } } }] )
Output:
The command above returns the following results:
{ "_id" : 1, "star_rating" : 4, "comments" : "apple is red" }
{ "_id" : 3, "star_rating" : 3, "comments" : "apples, oranges - healthy fruit" }
{ "_id" : 4, "star_rating" : 2, "comments" : "bake the apple pie in the oven" }
{ "_id" : 2, "star_rating" : 5, "comments" : "pie is delicious" }
A combination of other aggregation operators
db.test.aggregate( [ { $match: { $text: { $search: "apple pie" } } }, { $sort: { score: { $meta: "textScore" } } }, { $project: { score: { $meta: "textScore" } } } ] )
Output:
The command above returns the following results:
{ "_id" : 4, "score" : 0.6079270860936958 }
{ "_id" : 1, "score" : 0.3039635430468479 }
{ "_id" : 2, "score" : 0.3039635430468479 }
{ "_id" : 3, "score" : 0.3039635430468479 }
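The output above contains only _id and score because the $project stage lists no other fields. As a sketch, a hypothetical helper named textSearchPipeline could build the same three stages and optionally keep additional fields such as comments:

```javascript
// Hypothetical helper: builds a $match / $sort / $project pipeline for a
// text search, optionally keeping extra fields in the output.
function textSearchPipeline(search, keepFields = []) {
  const project = { score: { $meta: "textScore" } };
  for (const field of keepFields) {
    project[field] = 1; // include this field next to the score
  }
  return [
    { $match: { $text: { $search: search } } }, // first stage, as in the example above
    { $sort: { score: { $meta: "textScore" } } },
    { $project: project },
  ];
}

const pipeline = textSearchPipeline("apple pie", ["comments"]);
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
// [ '$match', '$sort', '$project' ]
// In the mongo shell: db.test.aggregate(pipeline)
```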
Specify multiple fields when creating a text index
You can assign weights to up to three fields in your compound text index. The default weight assigned to a field in a text index is one (1). Weight is an optional parameter and must be in the range from 1 to 100000.
db.test.createIndex( { "firstname": "text", "lastname": "text", ... }, { weights: { "firstname": 5, "lastname":10, ... }, name: "name_text_index" } )
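The weight rules above can be checked before the index is created. The sketch below assumes a hypothetical helper named buildWeightedTextIndex. Requiring integer weights and requiring every weighted field to be part of the index are additional assumptions of this sketch, not documented rules.

```javascript
// Limits taken from this section.
const MIN_WEIGHT = 1;
const MAX_WEIGHT = 100000;
const MAX_WEIGHTED_FIELDS = 3;

// Hypothetical helper: validates weights and returns the two arguments
// for createIndex(). Fields without an explicit weight default to 1.
function buildWeightedTextIndex(fields, weights = {}, name) {
  const weighted = Object.keys(weights);
  if (weighted.length > MAX_WEIGHTED_FIELDS) {
    throw new Error(`Weights can be set on at most ${MAX_WEIGHTED_FIELDS} fields`);
  }
  for (const field of weighted) {
    // Assumption of this sketch: a weight must target an indexed field.
    if (!fields.includes(field)) {
      throw new Error(`Weight given for non-indexed field "${field}"`);
    }
    const w = weights[field];
    // Assumption of this sketch: weights are integers.
    if (!Number.isInteger(w) || w < MIN_WEIGHT || w > MAX_WEIGHT) {
      throw new Error(`Weight for "${field}" must be between 1 and 100000`);
    }
  }
  const keys = {};
  for (const field of fields) {
    keys[field] = "text";
  }
  const options = { weights: { ...weights } };
  if (name) {
    options.name = name;
  }
  return { keys, options };
}

const spec = buildWeightedTextIndex(
  ["firstname", "lastname"],
  { firstname: 5, lastname: 10 },
  "name_text_index"
);
console.log(JSON.stringify(spec.options));
// {"weights":{"firstname":5,"lastname":10},"name":"name_text_index"}
// In the mongo shell: db.test.createIndex(spec.keys, spec.options)
```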
Differences with MongoDB
Amazon DocumentDB’s text index feature uses an inverted index with a term-frequency algorithm. Text indexes are sparse by default. Because of differences in parsing logic, tokenization delimiters, and other factors, Amazon DocumentDB may not return the same result set as MongoDB for the same dataset or query shape.
The following additional differences between Amazon DocumentDB text index and MongoDB exist:
Compound indexes that combine text index fields with non-text index fields are not supported.
Amazon DocumentDB text indexes are case insensitive and diacritics insensitive.
Only the English language is supported with text indexes.
Text indexing of array (or multi-key) fields is not supported. For example, creating a text index on "a" with the document {"a": ["apple", "pie"]} will fail.
Wildcard text indexing is not supported.
Unique text indexes are not supported.
Excluding a term from a search (term negation, which MongoDB expresses by prefixing a term with a hyphen) is not supported.
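Because array-valued fields cannot be text indexed, it may help to scan a sample of documents before creating the index. The sketch below assumes a hypothetical helper named findArrayValuedDocs that checks top-level fields only (dotted paths are not handled) and runs on plain objects outside the database.

```javascript
// Hypothetical pre-check: returns the _id of every document in which one of
// the fields to be text indexed holds an array.
function findArrayValuedDocs(docs, indexedFields) {
  return docs
    .filter((doc) => indexedFields.some((field) => Array.isArray(doc[field])))
    .map((doc) => doc._id);
}

const sample = [
  { _id: 1, a: "apple pie" },
  { _id: 2, a: ["apple", "pie"] }, // array value: not supported for text indexing
];

console.log(findArrayValuedDocs(sample, ["a"])); // [ 2 ]
```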
Best practices and guidelines
For optimal performance on text search queries that sort by text score, we recommend that you create the text index before loading data.
Text indexes require additional storage for an optimized internal copy of the indexed data. This has additional cost implications.
Limitations
Text search has the following limitations in Amazon DocumentDB:
Text search is supported on Amazon DocumentDB 5.0 instance-based clusters only.