PageRank mutate centrality algorithm
The ranking metric computed by .pageRank.mutate
can indicate the importance
of a node within a given graph, with the most important nodes having the highest
score, and the least important node having the lowest score. PageRank is used in
search engines to rank web pages based on their importance and influence,
in citation networks to identify highly cited scientific papers, and in
recommendation systems to suggest popular and relevant content to users.
The mutate variant of the PageRank algorithm performs the PageRank calculation over the entire graph unless the configuration parameters establish a filter, and each traversed node's calculated PageRank value is stored on that node as a property.
pageRank.mutate
inputs
Inputs for the pageRank.mutate
algorithm are passed in a configuration
object parameter that contains:
-
edgeLabels (optional) – type: a list of edge label strings; example:
["route",
; default: no edge filtering....
]To filter on one more edge labels, provide a list of the ones to filter on. If no
edgeLabels
field is provided then all edge labels are processed during traversal. -
writeProperty (required) – type:
string
; default: none.A name for the new vertex property that will contain the computed PageRank values. If a property of that name already exists, it is overwritten.
-
vertexLabel (optional) – type:
string
; default: none.A vertex label for vertex filtering. If a vertex label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
-
traversalDirection (optional) – type:
string
; default:"outbound"
.The direction of edge to follow. Must be one of:
"outbound"
or"inbound"
. -
numOfIterations (optional) – type: a positive integer greater than zero; default: 20.
The number of iterations to perform to reach convergence. A number between 10 and 20 is recommended.
-
dampingFactor (optional) – type: a positive floating-point number less than or equal to
1.0
; default:0.85
.A positive floating-point damping factor between 0.0 and 1.0 that expresses the probability, at any step, that the node will continue.
-
concurrency (optional) – type: 0 or 1; default: 0.
Controls the number of concurrent threads used to run the algorithm.
If set to
0
, uses all available threads to complete execution of the individual algorithm invocation. If set to1
, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently. -
tolerance (optional) – a floating point number between 0.0 and 1.0 (inclusive). When the average difference in the pageRank values of two iterations drops below
tolerance
, the algorithm stops, regardless of whether thenumOfIterations
is reached. Default value is0.000001 (1e-6)
.-
Note that this tolerance computation is equivalent to L1 error or sum of Mean Absolute Difference (MAE)s.
-
The stopping condition is
l1_error_sum < tolerance * numNodes
, equivalent tol1_error_sum/numNodes < tolerance
.
-
-
edgeWeightProperty (optional) – type:
string
default: none.The weight property to consider for weighted pageRank computation.
-
edgeWeightType (optional) - required if
edgeWeightProperty
is present – type:string
; default: none.The type of values associated with the edgeWeightProperty argument, specified as a string. valid values: "int", "long", "float", "double".
-
If the edgeWeightProperty is not given, the algorithm runs unweighted no matter if the edgeWeightType is given or not.
-
Note that if multiple properties exist on the edge with the name specified by edgeWeightProperty, one of those property values will be sampled at random.
-
Outputs for the pageRank.mutate
algorithm
The computed PageRank values are written to a new vertex property on each node
using the property name specified by the writeProperty
input parameter.
A single Boolean success
value (true
or false
)
is returned, which indicates whether or not the writes succeeded.
Query example for pageRank.mutate
The example below computes the PageRank score of every vertex in
the graph, and writes that score to a new vertex property named P_RANK
:
CALL neptune.algo.pageRank.mutate( { writeProperty:"P_RANK", dampingFactor: 0.85, numOfIterations: 1, edgeLabels: ["route"] } )
This query illustrates how you could then access the PageRank values in the
P_RANK
vertex property. It counts how many nodes have a P_RANK
property value greater than the "SEA" node's P_RANK
property value:
MATCH (n) WHERE n.code = "SEA" WITH n.P_RANK AS lowerBound MATCH (m) WHERE m.P_RANK > lowerBound RETURN count(m)
Sample .pageRank.mutate
output
Here is an example of the output returned by .pageRank.mutate when run against the
sample air-routes dataset [nodes]
aws neptune-graph execute-query \ --graph-identifier ${graphIdentifier} \ --query-string "CALL neptune.algo.pageRank.mutate({writeProperty: 'prscore'}) YIELD success RETURN success" \ --language open_cypher \ /tmp/out.txt cat /tmp/out.txt { "results": [ { "success": true } ] }