Using Neptune full-text search in Gremlin queries
NeptuneSearchStep
enables full-text search queries for the part of a Gremlin
traversal that is not converted into Neptune steps. For example, consider a query like the
following.
g.withSideEffect("Neptune#fts.endpoint", "
your-es-endpoint-URL
") .V() .tail(100) .has("name", "Neptune#fts mark*") <== # Limit the search on name
This query is converted into the following optimized traversal in Neptune.
Neptune steps: [ NeptuneGraphQueryStep(Vertex) { JoinGroupNode { PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .], {estimatedCardinality=INFINITY} }, annotations={path=[Vertex(?1):GraphStep], maxVarId=4} }, NeptuneTraverserConverterStep ] + not converted into Neptune steps: [NeptuneTailGlobalStep(100), NeptuneTinkerpopTraverserConverterStep, NeptuneSearchStep { JoinGroupNode { SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=your-OpenSearch-endpoint-URL} } JoinGroupNode { SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=your-OpenSearch-endpoint-URL} } }]
The following examples are of Gremlin queries against air-routes data:
Gremlin basic case-insensitive match
query
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'match') .V().has("city","Neptune#fts dallas") ==>v[186] ==>v[8]
Gremlin match
query
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'match') .V().has("city","Neptune#fts southampton") .local(values('code','city').fold()) .limit(5) ==>[SOU, Southampton]
Gremlin fuzzy
query
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .V().has("city","Neptune#fts allas~").values('city').limit(5) ==>Dallas ==>Dallas ==>Walla Walla ==>Velas ==>Altai
Gremlin query_string
fuzzy query
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'query_string') .V().has("city","Neptune#fts allas~").values('city').limit(5) ==>Dallas ==>Dallas
Gremlin query_string
regular expression query
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'query_string') .V().has("city","Neptune#fts /[dp]allas/").values('city').limit(5) ==>Dallas ==>Dallas
Gremlin hybrid query
This query uses a Neptune internal index and the OpenSearch index in the same query.
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .V().has("region","GB-ENG") .has('city','Neptune#fts L*') .values('city') .dedup() .limit(10) ==>London ==>Leeds ==>Liverpool ==>Land's End
Simple Gremlin full-text search example
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .V().has('desc','Neptune#fts regional municipal') .local(values('code','desc').fold()) .limit(100) ==>[HYA, Barnstable Municipal Boardman Polando Field] ==>[SPS, Sheppard Air Force Base-Wichita Falls Municipal Airport] ==>[ABR, Aberdeen Regional Airport] ==>[SLK, Adirondack Regional Airport] ==>[BFD, Bradford Regional Airport] ==>[EAR, Kearney Regional Airport] ==>[ROT, Rotorua Regional Airport] ==>[YHD, Dryden Regional Airport] ==>[TEX, Telluride Regional Airport] ==>[WOL, Illawarra Regional Airport] ==>[TUP, Tupelo Regional Airport] ==>[COU, Columbia Regional Airport] ==>[MHK, Manhattan Regional Airport] ==>[BJI, Bemidji Regional Airport] ==>[HAS, Hail Regional Airport] ==>[ALO, Waterloo Regional Airport] ==>[SHV, Shreveport Regional Airport] ==>[ABI, Abilene Regional Airport] ==>[GIZ, Jizan Regional Airport] ==>[USA, Concord Regional Airport] ==>[JMS, Jamestown Regional Airport] ==>[COS, City of Colorado Springs Municipal Airport] ==>[PKB, Mid Ohio Valley Regional Airport]
Gremlin query using query_string
With '+' and '-' Operators
Although the query_string
query type is much less forgiving than
the default simple_query_string
type, it does allow for more precise
queries. The first query below uses query_string
, while the second use
the default simple_query_string
:
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'query_string') . V().has('desc','Neptune#fts +London -(Stansted|Gatwick)') .local(values('code','desc').fold()) .limit(10) ==>[LHR, London Heathrow] ==>[YXU, London Airport] ==>[LTN, London Luton Airport] ==>[SEN, London Southend Airport] ==>[LCY, London City Airport]
Notice how simple_query_string
in the examples below
quietly ignores the '+' and '-' operators:
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .V().has('desc','Neptune#fts +London -(Stansted|Gatwick)') .local(values('code','desc').fold()) .limit(10) ==>[LHR, London Heathrow] ==>[YXU, London Airport] ==>[LGW, London Gatwick] ==>[STN, London Stansted Airport] ==>[LTN, London Luton Airport] ==>[SEN, London Southend Airport] ==>[LCY, London City Airport] ==>[SKG, Thessaloniki Macedonia International Airport] ==>[ADB, Adnan Menderes International Airport] ==>[BTV, Burlington International Airport]
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'query_string') .V().has('desc','Neptune#fts +(regional|municipal) -(international|bradford)') .local(values('code','desc').fold()) .limit(10) ==>[CZH, Corozal Municipal Airport] ==>[MMU, Morristown Municipal Airport] ==>[YBR, Brandon Municipal Airport] ==>[RDD, Redding Municipal Airport] ==>[VIS, Visalia Municipal Airport] ==>[AIA, Alliance Municipal Airport] ==>[CDR, Chadron Municipal Airport] ==>[CVN, Clovis Municipal Airport] ==>[SDY, Sidney Richland Municipal Airport] ==>[SGU, St George Municipal Airport]
Gremlin query_string
query with AND
and OR
operators
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'query_string') .V().has('desc','Neptune#fts (St AND George) OR (St AND Augustin)') .local(values('code','desc').fold()) .limit(10) ==>[YIF, St Augustin Airport] ==>[STG, St George Airport] ==>[SGO, St George Airport] ==>[SGU, St George Municipal Airport]
Gremlin term
query
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'term') .V().has("SKU","Neptune#fts ABC123DEF9") .local(values('code','city').fold()) .limit(5) ==>[AUS, Austin]
Gremlin prefix
query
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'prefix') .V().has("icao","Neptune#fts ka") .local(values('code','icao','city').fold()) .limit(5) ==>[AZO, KAZO, Kalamazoo] ==>[APN, KAPN, Alpena] ==>[ACK, KACK, Nantucket] ==>[ALO, KALO, Waterloo] ==>[ABI, KABI, Abilene]
Using Lucene syntax in Neptune Gremlin
In Neptune Gremlin, you can also write very powerful queries using the Lucene query
syntax. Note that Lucene syntax is only supported for query_string
queries
in OpenSearch.
Assume the following data:
g.addV("person") .property(T.id, "p1") .property("name", "simone") .property("surname", "rondelli") g.addV("person") .property(T.id, "p2") .property("name", "simone") .property("surname", "sengupta") g.addV("developer") .property(T.id, "p3") .property("name", "simone") .property("surname", "rondelli")
Using Lucene syntax, which is invoked when the queryType
is
query_string
, you can search this data by name and surname as follows:
g.withSideEffect("Neptune#fts.endpoint", "es_endpoint") .withSideEffect("Neptune#fts.queryType", "query_string") .V() .has("*", "Neptune#fts predicates.name.value:simone AND predicates.surname.value:rondelli") ==> v[p1], v[p3]
Note that in the has()
step above, the field is replaced by
"*"
). Actually, any value placed there is overridden by the fields
that you access within the query. You access the name field using
predicates.name.value,
because that is how the data model is
structured.
You can search by name, surname and label, as follows:
g.withSideEffect("Neptune#fts.endpoint", getEsEndpoint()) .withSideEffect("Neptune#fts.queryType", "query_string") .V() .has("*", "Neptune#fts predicates.name.value:simone AND predicates.surname.value:rondelli AND entity_type:person") ==> v[p1]
The label is accessed using entity_type
, again because that is how
the data model is structured.
You can also include nesting conditions:
g.withSideEffect("Neptune#fts.endpoint", getEsEndpoint()) .withSideEffect("Neptune#fts.queryType", "query_string") .V() .has("*", "Neptune#fts (predicates.name.value:simone AND predicates.surname.value:rondelli AND entity_type:person) OR predicates.surname.value:sengupta") ==> v[p1], v[p2]
Inserting a modern TinkerPop graph
g.addV('person').property(T.id, '1').property('name', 'marko').property('age', 29) .addV('personr').property(T.id, '2').property('name', 'vadas').property('age', 27) .addV('software').property(T.id, '3').property('name', 'lop').property('lang', 'java') .addV('person').property(T.id, '4').property('name', 'josh').property('age', 32) .addV('software').property(T.id, '5').property('name', 'ripple').property('lang', 'java') .addV('person').property(T.id, '6').property('name', 'peter').property('age', 35) g.V('1').as('a').V('2').as('b').addE('knows').from('a').to('b').property('weight', 0.5f).property(T.id, '7') .V('1').as('a').V('3').as('b').addE('created').from('a').to('b').property('weight', 0.4f).property(T.id, '9') .V('4').as('a').V('3').as('b').addE('created').from('a').to('b').property('weight', 0.4f).property(T.id, '11') .V('4').as('a').V('5').as('b').addE('created').from('a').to('b').property('weight', 1.0f).property(T.id, '10') .V('6').as('a').V('3').as('b').addE('created').from('a').to('b').property('weight', 0.2f).property(T.id, '12') .V('1').as('a').V('4').as('b').addE('knows').from('a').to('b').property('weight', 1.0f).property(T.id, '8')
Sort by string field value example
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'query_string') .withSideEffect('Neptune#fts.sortOrder', 'asc') .withSideEffect('Neptune#fts.sortBy', 'name') .V().has('name', 'Neptune#fts marko OR vadas OR ripple')
Sort by non-string field value example
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'query_string') .withSideEffect('Neptune#fts.sortOrder', 'asc') .withSideEffect('Neptune#fts.sortBy', 'age.value') .V().has('name', 'Neptune#fts marko OR vadas OR ripple')
Sort by ID field value example
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'query_string') .withSideEffect('Neptune#fts.sortOrder', 'asc') .withSideEffect('Neptune#fts.sortBy', 'Neptune#fts.entity_id') .V().has('name', 'Neptune#fts marko OR vadas OR ripple')
Sort by label field value example
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'query_string') .withSideEffect('Neptune#fts.sortOrder', 'asc') .withSideEffect('Neptune#fts.sortBy', 'Neptune#fts.entity_type') .V().has('name', 'Neptune#fts marko OR vadas OR ripple')
Sort by document_type
field value example
g.withSideEffect("Neptune#fts.endpoint", "
your-OpenSearch-endpoint-URL
") .withSideEffect('Neptune#fts.queryType', 'query_string') .withSideEffect('Neptune#fts.sortOrder', 'asc') .withSideEffect('Neptune#fts.sortBy', 'Neptune#fts.document_type') .V().has('name', 'Neptune#fts marko OR vadas OR ripple')