

# Centrality algorithms in Neptune Analytics
<a name="centrality-algorithms"></a>

Centrality algorithms utilize the topology of a network to determine the relative importance or influence of a specific node within the graph. By measuring the relative importance of a node or edge within a network, centrality values can indicate which elements in a graph play a critical role in that network.

By identifying the most influential or important nodes within a network, centrality algorithms can provide insights about key players or critical points of interaction. This is valuable in social network analysis, where it helps pinpoint influential individuals, and in transportation networks, where it aids in identifying crucial hubs for efficient routing and resource allocation.

Different types of centrality algorithms use different techniques to measure the importance of a node. Understanding how an algorithm calculates centrality is important to understanding the meaning of its outputs.

In addition to returning centrality data to the client, Neptune Analytics provides mutate variations of the centrality algorithms which store the calculated centrality values as vertex properties in the graph.

Neptune Analytics supports four centrality algorithms along with their mutate variants:
+ [`degree`](degree.md)   –   This measures a node's centrality by the number of edges connected to it, and can therefore be used to find the most connected nodes in a network.
+ [`degree.mutate`](degree-mutate.md)   –   The degree centrality mutate algorithm measures the number of incident edges of each vertex it traverses and writes that calculated degree value as a property of the vertex.
+ [`pageRank`](page-rank.md)   –   This is an iterative algorithm that measures a node's centrality by the number and quality of incident edges and adjacent vertices. The centrality of a node connected to a few important nodes may therefore be higher than that of a node connected to many less important nodes. The output of this algorithm is a value that indicates the importance of a given node relative to the other nodes in the graph.
+ [`pageRank.mutate`](page-rank-mutate.md)   –   This algorithm stores the calculated PageRank of a given node as a property of the node.
+ [`closenessCentrality`](closeness-centrality.md)   –   This algorithm computes the closeness centrality (CC) metric of nodes in a graph. The closeness centrality metric of a vertex is a positive measure of how close it is to all other vertices, or how central it is in the graph. Because it indicates how quickly all other nodes in a network can be reached from a given node, it can be used in transportation networks to identify key hub locations, and in disease-spread modeling to pinpoint central locations for targeted intervention efforts.
+ [`closenessCentrality.mutate`](closeness-centrality-mutate.md)   –   This algorithm computes the closeness centrality (CC) metric of vertices in a graph and writes them as a property of each vertex.

# Degree centrality algorithm
<a name="degree"></a>

The `.degree` centrality algorithm counts the number of incident edges at each node that it traverses. This measure of how connected the node is can in turn indicate the node's importance and level of influence in the network.

The `.degree` algorithm is used in social networks to identify popular individuals with many connections, in transportation networks to locate central hubs with numerous roads leading to and from them, and in web analysis to find influential web pages with many incoming links.

The time complexity of `.degree` is `O(|E|)`, where `|E|` is the number of edges in the graph. The space complexity is `O(|V|)`, where `|V|` is the number of vertices in the graph.

## `.degree`   syntax
<a name="degree-syntax"></a>

```
CALL neptune.algo.degree(
  [node list (required)],
  {
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabels: [a list of vertex labels for filtering (optional)], 
    vertexLabel: [a node label for filtering (optional) [deprecated]],
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD node, degree
RETURN node, degree
```

## Inputs for the `.degree` algorithm
<a name="degree-inputs"></a>
+ **a node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes for which to return the edge count (degree). If an empty list is provided, the query result is also empty.

  If the algorithm is called following a `MATCH` clause (query integration), the result returned by the `MATCH` clause is taken as the node list. 
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabels**   *(optional)*   –   *type:* a list of vertex label strings;   *example:* `["airport", ...]`;   *default:* no vertex filtering.

    To filter on one more vertex labels, provide a list of the ones to filter on. If no `vertexLabels` field is provided then all vertex labels are processed during traversal.
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    [deprecated]
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.degree`   outputs
<a name="degree-outputs"></a>
+ **node**   –   A list of the requested nodes. If `vertexLabels` or `vertexLabel` is present, only the requested nodes that match in `vertexLabels ` or `vertexLabel` value are included.
+ **degree**   –   A list of corresponding degree values for the nodes with respect to edges with a label specified in `edgeLabels`.

If the input vertex list is empty, the output is empty.

## Query examples for `.degree`
<a name="degree-query-examples"></a>

This is a standalone example, where the source node list is explicitly specified in the query:

```
CALL neptune.algo.degree(["101"], {edgeLabels: ["route"]})
```

This is a more complicated standalone query submitted using the AWS CLI:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string 'CALL neptune.algo.degree(
        ["101", "102", "103"],
        {
          edgeLabels: ["route"],
          vertexLabel: ["airport"],
          traversalDirection: "inbound",
          concurrency: 2
        }
      )
      YIELD node, degree
      RETURN node, degree' \
  --language open_cypher \
  /tmp/out.txt
```

This is a query integration example with frontier injection, where `.degree` follows a `MATCH` clause and finds the degree value for all vertices returned by `MATCH(n:airport)`:

```
MATCH(n:airport)
CALL neptune.algo.degree(n, {edgeLabels: ["route"]})
YIELD degree
RETURN n, degree'
```

This is an example of multiple `.degree` invocations chained together, where the output of one invocation serves as the input of another:

```
CALL neptune.algo.degree(
  ["108"],
  {
    edgeLabels: ["route"],
    vertexLabel: ["airport"]
  }
)
YIELD node
CALL neptune.algo.degree(
  node,
  {
    edgeLabels: ["route"],
    vertexLabel: ["airport"]
  }
)
YIELD node AS node2 WITH id(node2) AS id
RETURN id
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample   `.degree`   output
<a name="degree-sample-output"></a>

Here is an example of the output returned by .degree when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string 'MATCH (n)
      CALL neptune.algo.degree(n)
      YIELD node, degree
      RETURN node, degree
      LIMIT 2' \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    {
      "node": {
        "~id": "10",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 38.94449997,
          "elev": 313,
          "longest": 11500,
          "city": "Washington D.C.",
          "type": "airport",
          "region": "US-VA",
          "desc": "Washington Dulles International Airport",
          "code": "IAD",
          "prscore": 0.002264724113047123,
          "lon": -77.45580292,
          "wccid": 2357352929951779,
          "country": "US",
          "icao": "KIAD",
          "runways": 4
        }
      },
      "degree": 312
    },
    {
      "node": {
        "~id": "12",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 40.63980103,
          "elev": 12,
          "longest": 14511,
          "city": "New York",
          "type": "airport",
          "region": "US-NY",
          "desc": "New York John F. Kennedy International Airport",
          "code": "JFK",
          "prscore": 0.002885053399950266,
          "lon": -73.77890015,
          "wccid": 2357352929951779,
          "country": "US",
          "icao": "KJFK",
          "runways": 4
        }
      },
      "degree": 403
    }
  ]
}
```

# Degree mutate centrality algorithm
<a name="degree-mutate"></a>

The `.degree.mutate` centrality algorithm counts the number of incident edges of every node in the graph. This measure of how connected the node is can in turn indicate the node's importance and level of influence in the network. The `.degree.mutate` algorithm then stores each node's calculated degree value as a property of the node.

The algorithm returns a single success flag (`true` or `false`), which indicates whether the writes succeeded or failed.

## `.degree.mutate`   syntax
<a name="degree-mutate-syntax"></a>

```
CALL neptune.algo.degree.mutate(
  {
    writeProperty: A name for the new node property where the degree values will be written,
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabels: [a list of vertex labels for filtering (optional)],
    vertexLabel: "a node label for filtering (optional) [deprecated]",
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD success
RETURN success
```

## `.degree.mutate`   inputs
<a name="degree-mutate-inputs"></a>

**a configuration object that contains:**
+ **writeProperty** *(required)*   –   *type:* `string`;   *default: none*.

  A name for the new node property that will contain the computed degree values.
+ **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

  To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
+ **vertexLabels**   *(optional)*   –   *type:* a list of vertex label strings;   *example:* `["airport", ...]`;   *default:* no vertex filtering.

  To filter on one more vertex labels, provide a list of the ones to filter on. If no `vertexLabels` field is provided then all vertex labels are processed during traversal.
+ **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

   [deprecated] A node label for node filtering. Note that it is deprecated. If vertexLabels is provided, vertexLabel is ignored.
+ **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

  The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
+ **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

  Controls the number of concurrent threads used to run the algorithm.

   If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Output of the `.degree.mutate` algorithm
<a name="degree-mutate-output"></a>

The computed degree values are written to a new vertex property using the property name specified by the `writeProperty` input parameter.

A single Boolean `success` value (`true` or `false`) is returned, which indicates whether or not the writes succeeded.

## `.degree.mutate` query examples
<a name="degree-mutate-query-examples"></a>

The example below is a standalone example, where the source vertex list is explicitly provided in the query.

This query writes the degree values of all nodes in the graph to a new vertex property called `DEGREE`:

```
CALL neptune.algo.degree.mutate({writeProperty: "DEGREE", edgeLabels: ["route]})
```

After using the mutate algorithm, the newly written properties can then be accessed in subsequent queries. For example, after the mutate algorithm call above, you could use the following query to retrieve the `.degree` property of specific nodes:

```
MATCH (n) WHERE id(n) IN ["101", "102", "103"]
RETURN n.DEGREE'
```

## Sample output from `.degree.mutate`
<a name="degree-mutate-query-examples"></a>

Here is an example of the output returned by .degree.mutate when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.degree.mutate({writeProperty: 'degree'}) YIELD success RETURN success" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt   
{
  "results": [
    { "success": true }
  ]
}
```

# PageRank centrality algorithm
<a name="page-rank"></a>

PageRank is an algorithm originally developed by Larry Page and Sergey Brin, co-founders of Google. It was originally developed to rank web pages in search engine results. The PageRank score for a given node is calculated based on the number and quality of the edges pointing to that node, as well as the importance of the nodes that are connected to it. The PageRank algorithm assigns a higher score to nodes that are linked to other high-scoring nodes, and a lower score to nodes that are linked to low-scoring nodes.

The output of PageRank can be visualized as a ranking metric for the importance of a node within a given graph, with the most important nodes having the highest score, and the least important node having the lowest score. PageRank is used in search engines to rank web pages based on their importance and influence, in citation networks to identify highly cited scientific papers, and in recommendation systems to suggest popular and relevant content to users.

The space complexity is O(\$1V\$1), where \$1V\$1 is the number of vertices in the graph.

## Personalized PageRank
<a name="personalized-page-rank"></a>

 Personalized PageRank is a variation of the original PageRank algorithm. It is generally used to measure the importance of vertices in a graph. It tailors the ranking process to individual users or specific topics. Compared to PageRank, the result of personalized pagerank is a more customized ranking of web pages or nodes in a graph, reflecting individual interests or specific areas of focus. This approach is particularly useful in recommendation systems, personalized search results, and analyzing large-scale networks with diverse content for a specific focus. 

### Example scenario: Online retail platform
<a name="personalized-pageRank-example-1"></a>

 Imagine an online retail platform where each product is a node, and customer purchases (or views) between products are edges. 

**Regular PageRank**
+  Objective: Rank products based on their general popularity across all customers. 
+  Result: Products that are frequently purchased or viewed by many customers will receive higher PageRank scores. This helps identify the most popular products across the entire platform. 

**Personalized PageRank**
+  Objective: Rank products based on their relevance to a specific customer's shopping behavior. 
+  Inputs: 
  +  Source Nodes: A list of products that the customer has previously purchased or shown interest in. 
  +  Source Weights: Optional weights indicating the relative importance of each source product (e.g., higher weight for recently purchased items). 
+  Result: Products that are not only popular but also closely related to the customer's specific shopping behavior will receive higher scores. This helps the platform recommend new products that are more likely to interest the customer. 

### Example scenario: Organizational network security
<a name="personalized-pageRank-example-2"></a>

 Imagine a network of computers within an organization where each computer is a node, and communication paths (like data transfers or network connections) between computers are edges. 

**Regular PageRank**
+  Objective: Rank computers based on their general importance within the organizational network. 
+  Result: Computers that have a high number of connections to other computers will receive higher PageRank scores. This helps identify critical nodes in the network that, if compromised, could have a significant impact on the entire network. 

**Personalized PageRank**
+  Objective: Rank computers based on their relevance to a specific security concern or department within the organization. 
+  Inputs: 
  +  Source Nodes: A list of computers that are known to handle sensitive data or are critical to a specific department (e.g., HR, Finance). 
  +  Source Weights: Optional weights indicating the relative importance of each source computer (e.g., higher weight for computers handling more sensitive data). 
+  Result: Computers that are not only well-connected but also closely related to the specific security concern or department will receive higher scores. This helps the cybersecurity team prioritize monitoring and protection efforts on the most critical nodes. 

### Example scenario: Insurance policyholder network
<a name="personalized-pageRank-example-3"></a>

 Imagine a network of policyholders where each policyholder is a node, and the relationships (like shared claims, referrals, or common risk factors) between policyholders are edges. 

**Regular PageRank**
+  Objective: Rank policyholders based on their general importance within the insurance network. 
+  Result: Policyholders who have a high number of connections to other policyholders (e.g., through shared claims or referrals) will receive higher PageRank scores. This helps identify influential policyholders who may have a significant impact on the network. 

**Personalized PageRank**
+  Objective: Rank policyholders based on their relevance to a specific risk profile or insurance product. 
+  Inputs: 
  +  Source Nodes: A list of policyholders that fit a specific risk profile or are relevant to a particular insurance product (e.g., high-risk drivers for auto insurance). 
  +  Source Weights: Optional weights indicating the relative importance of each source policyholder (e.g., higher weight for policyholders with more recent claims). 
+  Result: Policyholders who are not only well-connected but also closely related to the specific risk profile or insurance product will receive higher scores. This helps the insurance company tailor its risk assessment and underwriting processes more effectively. 

 The additional inputs to the algorithm are a list of vertices to be personalized on (`sourceNodes`) and optionally the weight distribution among those vertices (`sourceWeights`). If given, the weights are normalized before pagerank computation. 

**Note**  
 Neptune Analytics allows up to 8192 vertices in the personalization vector, sourceNodes. 

## `.pageRank`  syntax
<a name="page-rank-syntax"></a>

```
CALL neptune.algo.pageRank(
  [node list (required)],
  {
    numOfIterations: a small positive integer like 20 (optional),
    dampingFactor: a positive float less than or equal to 1.0, like 0.85 (optional)
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    concurrency: number of threads to use (optional),
    traversalDirection: the direction of edge to follow (optional),
    tolerance: a floating point number between 0.0 and 1.0 (inclusive)(optional),
    edgeWeightProperty: the weight property to consider for weighted pageRank computation (optional),
    edgeWeightType: The type of values associated with the edgeWeightProperty argument (optional),
    sourceNodes: [A list of node IDs to personalize on (optional)],
    sourceWeights: [A list of non-negative weights for the sourceNodes (optional)]
  }
)
YIELD node, rank
RETURN node, rank
```

## `.pageRank`  inputs
<a name="page-rank-inputs"></a>
+ **a node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes for which to return the page rank values. If an empty list is provided, the query result will also be empty.

  If the algorithm is called following a `MATCH` clause (query integration), the result returned by the `MATCH` clause is taken as the node list.
+ 

**a configuration object that contains:**
  + **numOfIterations** *(optional)*   –   *type:* a positive integer greater than zero;   *default: 20*.

    The number of iterations to perform to reach convergence. A number between 10 and 20 is recommended.
  + **dampingFactor** *(optional)*   –   *type:* a positive floating-point number less than or equal to `1.0`;   *default: *` 0.85`.

    A positive floating-point damping factor between 0.0 and 1.0 that expresses the probability, at any step, that the node will continue.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A vertex label for vertex filtering. If a vertex label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"outbound"` or `"inbound"`.
  + **tolerance** *(optional)*   –   a floating point number between 0.0 and 1.0 (inclusive). When the average difference in the pageRank values of two iterations drops below `tolerance`, the algorithm stops, regardless of whether the `numOfIterations` is reached. Default value is `0.000001 (1e-6)`.
    + Note that this tolerance computation is equivalent to L1 error or sum of Mean Absolute Difference (MAE)s.
    + The stopping condition is `l1_error_sum < tolerance * numNodes`, equivalent to `l1_error_sum/numNodes < tolerance`.
  + **edgeWeightProperty** *(optional)*   –   *type:* `string`   *default: none*.

    The weight property to consider for weighted pageRank computation.
  + **edgeWeightType** *(optional) - required if `edgeWeightProperty` is present*   –   *type:* `string`;   *default: none*.

    The type of values associated with the edgeWeightProperty argument, specified as a string. *valid values*: "int", "long", "float", "double".
    + If the edgeWeightProperty is not given, the algorithm runs unweighted no matter if the edgeWeightType is given or not.
    + Note that if multiple properties exist on the edge with the name specified by edgeWeightProperty, one of those property values will be sampled at random.
  + **sourceNodes** *(optional) - required if running personalized PageRank*   –   *type:* `list`;   *default: none*.

    A personalization vertex list ["101", ...]
    +  Can include 1 to 8192 vertices. 
    +  If a `vertexLabel` is provided, nodes that do not have the given `vertexLabel` are ignored. 
  + **sourceWeights** *(optional)*   –   *type:* `list`;   *default: none*.

    A personalization weight list. The weight distribution among the personalized vertices.
    +  If not provided, the default behavior is uniform distribution among the vertices given in `sourceNodes`. 
    +  There must be at least one non-zero weight in the list. 
    +  The length of the sourceWeights list must match the `sourceNodes` list. 
    +  The mapping of personalization vertex and weight lists are one to one. The first value in the weight list corresponds to the weight of first vertex in the vertex list, second value is for the second vertex, etc. 
    +  The weights can be one of `int`, `long`, `float`, or `double` types. 

## Outputs for the `.pageRank` algorithm
<a name="page-rank-outputs"></a>
+ **node**   –   A key column of the input nodes.
+ **rank**   –   A key column of the corresponding page-rank scores for those nodes.

If the input nodes list is empty, the output is empty.

## Query examples for `.pageRank`
<a name="page-rank-query-examples"></a>

This is a standalone example, where the input vertex list is explicitly specified in the query.

```
CALL neptune.algo.pageRank(
  ["101"],
  {
    numOfIterations: 1,
    dampingFactor: 0.85,
    edgeLabels: ["route"]
  }
)
```

This is a query integration example, where `.pageRank` follows a `MATCH` clause and uses frontier injection to take the output of the `MATCH` clause as its list of input nodes:

```
MATCH (n)
CALL neptune.algo.pageRank(
  n,
  {
    dampingFactor: 0.85,
    numOfIterations: 1,
    edgeLabels: ["route"]
  }
)
YIELD rank
RETURN n, rank
```

This query is an example of constraining the results of `.pageRank` based on the PageRank values, and returning them in ascending order:

```
MATCH (n)
CALL neptune.algo.pageRank(
  n,
  {
    numOfIterations: 10,
    dampingFactor: 0.85,
    tolerance: 0.0001,
    vertexLabel: "airport",
    edgeLabels: ["route"]
  }
)
YIELD rank WHERE rank > 0.004
RETURN n, rank ORDER BY rank
```

## Query examples for Personalized `.pageRank`
<a name="personalized-page-rank-query-examples"></a>

 Personalized PageRank applies the same integration and constraints. Here are some examples that pass personalization-specific configurations. 

 This is a query integration example, where .pageRank follows a MATCH clause and uses frontier injection to take the output of the MATCH clause as its input list. We use nodes “101” and “102” as personalization nodes with "1" and "1.5" weights as weights respectively: 

```
MATCH (n)
CALL neptune.algo.pageRank( n, {
    tolerance: 0.001,
    edgeLabels:["route”],
    sourceNodes:["101", "102"],
    sourceWeights:[1, 1.5]
} )
YIELD node, rank
RETURN node, rank
```

 This is an example of where only source nodes is provided. The weights of ”101” and ”102” will be "1" and "1" (same, uniform) respectively. 

```
MATCH (n)
CALL neptune.algo.pageRank( n, {
    tolerance: 0.001,
    edgeLabels:["route”],
    sourceNodes:["101", "102"],
} )
YIELD node, rank
RETURN node, rank
```

 This is an example where the weights are given as integral numbers. Note that this would yield the same result as the first example in which the weights were ("1" and "1.5"): 

```
MATCH(n)
CALL neptune.algo.pageRank( n, {
    tolerance: 0.001,
    edgeLabels:["route”],
    sourceNodes:["101", "102"],
    sourceWeights:[2, 3]
} )
YIELD node, rank
RETURN node, rank
```

## Sample   `.pageRank`   output
<a name="page-rank-sample-output"></a>

Here is an example of the output returned by .pageRank when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.pageRank(n) YIELD node, rank RETURN node, rank LIMIT 2" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    {
      "node": {
        "~id": "2709",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 65.4809036254883,
          "elev": 49,
          "longest": 8711,
          "city": "Nadym",
          "type": "airport",
          "region": "RU-YAN",
          "desc": "Nadym Airport",
          "code": "NYM",
          "lon": 72.6988983154297,
          "country": "RU",
          "icao": "USMM",
          "runways": 1
        }
      },
      "rank": 0.00016044313088059425
    },
    {
      "node": {
        "~id": "3747",
        "~entityType": "node",
        "~labels": ["continent"],
        "~properties": {
          "code": "AN",
          "type": "continent",
          "desc": "Antarctica"
        }
      },
      "rank": 0.0000404242
    }
  ]
}
```

# PageRank mutate centrality algorithm
<a name="page-rank-mutate"></a>

The ranking metric computed by `.pageRank.mutate` can indicate the importance of a node within a given graph, with the most important nodes having the highest score, and the least important node having the lowest score. PageRank is used in search engines to rank web pages based on their importance and influence, in citation networks to identify highly cited scientific papers, and in recommendation systems to suggest popular and relevant content to users.

The mutate variant of the PageRank algorithm performs the PageRank calculation over the entire graph unless the configuration parameters establish a filter, and each traversed node's calculated PageRank value is stored on that node as a property.

## `pageRank.mutate`  inputs
<a name="page-rank-mutate-inputs"></a>

Inputs for the `pageRank.mutate` algorithm are passed in a configuration object parameter that contains:
+ **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

  To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
+ **writeProperty** *(required)*   –   *type:* `string`;   *default: none*.

  A name for the new vertex property that will contain the computed PageRank values. If a property of that name already exists, it is overwritten.
+ **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

  A vertex label for vertex filtering. If a vertex label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
+ **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

  The direction of edge to follow. Must be one of: `"outbound"` or `"inbound"`.
+ **numOfIterations** *(optional)*   –   *type:* a positive integer greater than zero;   *default: 20*.

  The number of iterations to perform to reach convergence. A number between 10 and 20 is recommended.
+ **dampingFactor** *(optional)*   –   *type:* a positive floating-point number less than or equal to `1.0`;   *default: *` 0.85`.

  A positive floating-point damping factor between 0.0 and 1.0 that expresses the probability, at any step, that the node will continue.
+ **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

  Controls the number of concurrent threads used to run the algorithm.

   If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.
+ **tolerance** *(optional)*   –   a floating point number between 0.0 and 1.0 (inclusive). When the average difference in the pageRank values of two iterations drops below `tolerance`, the algorithm stops, regardless of whether the `numOfIterations` is reached. Default value is `0.000001 (1e-6)`.
  + Note that this tolerance computation is equivalent to L1 error or sum of Mean Absolute Difference (MAE)s.
  + The stopping condition is `l1_error_sum < tolerance * numNodes`, equivalent to `l1_error_sum/numNodes < tolerance`.
+ **edgeWeightProperty** *(optional)*   –   *type:* `string`   *default: none*.

  The weight property to consider for weighted pageRank computation.
+ **edgeWeightType** *(optional) - required if `edgeWeightProperty` is present*   –   *type:* `string`;   *default: none*.

  The type of values associated with the edgeWeightProperty argument, specified as a string. *valid values*: "int", "long", "float", "double".
  + If the edgeWeightProperty is not given, the algorithm runs unweighted no matter if the edgeWeightType is given or not.
  + Note that if multiple properties exist on the edge with the name specified by edgeWeightProperty, one of those property values will be sampled at random.
+ **sourceNodes** *(optional) - required if running personalized PageReank*   –   *type:* `list`;   *default: none*.

  A personalization vertex list ["101", ...]
  +  Can include 1 to 8192 vertices. 
  +  If a `vertexLabel` is provided, nodes that do not have the given `vertexLabel` are ignored. 
+ **sourceWeights** *(optional)*   –   *type:* `list`;   *default: none*.

  A personalization weight list. The weight distribution among the personalized vertices.
  +  If not provided, the default behavior is uniform distribution among the vertices given in `sourceNodes`. 
  +  There must be at least one non-zero weight in the list. 
  +  The length of the sourceWeights list must match the `sourceNodes` list. 
  +  The mapping of personalization vertex and weight lists are one to one. The first value in the weight list corresponds to the weight of first vertex in the vertex list, second value is for the second vertex, etc. 
  +  The weights can be one of `int`, `long`, `float`, or `double` types. 

## Outputs for the `pageRank.mutate` algorithm
<a name="page-rank-mutate-outputs"></a>

The computed PageRank values are written to a new vertex property on each node using the property name specified by the `writeProperty` input parameter.

A single Boolean `success` value (`true` or `false`) is returned, which indicates whether or not the writes succeeded.

## Query example for `pageRank.mutate`
<a name="page-rank-mutate-query-example"></a>

The example below computes the PageRank score of every vertex in the graph, and writes that score to a new vertex property named `P_RANK`:

```
CALL neptune.algo.pageRank.mutate(
  {
    writeProperty:"P_RANK",
    dampingFactor: 0.85,
    numOfIterations: 1,
    edgeLabels: ["route"]
  }
)
```

This query illustrates how you could then access the PageRank values in the `P_RANK` vertex property. It counts how many nodes have a `P_RANK` property value greater than the "SEA" node's `P_RANK` property value:

```
MATCH (n) WHERE n.code = "SEA" WITH n.P_RANK AS lowerBound
MATCH (m) WHERE m.P_RANK > lowerBound
RETURN count(m)
```

## Query examples for Personalized `pageRank.mutate`
<a name="personalized-page-rank-mutate-query-examples"></a>

 Personalized PagerRank applies the same integration and constraints. Here are some examples that pass personalization-specific configurations. 

 The example below computes the Personalized PageRank score of every vertex in the graph, and writes that score to a new vertex property named "PRS\$1RANK": 

```
CALL neptune.algo.pageRank.mutate(
  {
    writeProperty:"PRS_RANK",
    sourceNodes:[”101”, “103”, “105”],
    sourceWeights:[5, 3, 2],
    dampingFactor: 0.85,
    numOfIterations: 1,
    edgeLabels: ["route"]
  }
)
```

## Sample   `.pageRank.mutate`   output
<a name="page-rank-mutate-sample-output"></a>

Here is an example of the output returned by .pageRank.mutate when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.pageRank.mutate({writeProperty: 'prscore'}) YIELD success RETURN success" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    { "success": true }
  ]
}
```

# Closeness centrality algorithm
<a name="closeness-centrality"></a>

The closeness centrality algorithm computes a Closeness Centrality (CC) metric for specified nodes in a graph. The CC metric of a node can be used as a positive measure of how close it is to all other nodes or how central it is in the graph.

The CC metric can be interpreted to show how quickly all other nodes in a network can be reached from a given node, and how important it is as a central hub for rapid information flow. It can be used in transportation networks to identify key hub locations, and in disease-spread modeling to pinpoint central points for targeted intervention efforts.

The closeness centrality (CC) score of a node is calculated based on the sum of its distances to all other nodes. The CC score itself is the inverse of that number; in other words, one divided by that sum. In practice, the calculation is commonly normalized to use the average length of the shortest paths rather than the actual sum of their lengths.

## `.closenessCentrality`  syntax
<a name="closeness-centrality-syntax"></a>

```
CALL neptune.algo.closenessCentrality(
  [node list (required)],
  {
    numSources: the number of BFS sources to use for computing the CC (required)
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional),
    normalize: Boolean, set to false to prevent normalization (optional)
    concurrency: number of threads to use (optional)
  }
)
YIELD node, score
RETURN node, score
```

## Inputs for the `.closenessCentrality` algorithm
<a name="closenessCentrality-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting locations for the algorithm. Each node in the list triggers an execution of the algorithm. If an empty list is provided, the query result is also empty.

  If the algorithm is called following a `MATCH` clause (query algo integration), the source query list is the result returned by the `MATCH` clause. 
+ 

**a configuration object that contains:**
  + **numSources** *(required)*   –   *type:* `unsigned long`;   *default: none*.

    The number of BFS sources for computing approximate Closeness Centrality (CC). To compute exact closeness centrality, set `numSources` to a number larger than number of nodes, such as `maxInt`.

    Because of the computational complexity of the algorithm for large graphs, it's generally best to specify a number in the order of thousands to ten thousands, such as 8,192.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **normalize** *(optional)*   –   *type:* `Boolean`;   *default:*` true`.

    You can use this field to turn off normalization, which is on by default. Without normalization, only centrality scores of nodes within the same component can be meaningfully compared. Normalized scores can be compared across different connected components.

    The CC is normalized using the Wasserman-Faust normalization formula for unconnected graphs. If there are `n` vertices reachable from vertex `u` (including vertex `u` itself), the Wasserman-Faust normalized closeness centrality score of vertex `u` is calculated as follows:

    ```
    (n-1)^2 / (|V| - 1) * sum(distance from u to these n vertices)
    ```

    Without normalization, the centrality score of vertex `u` is calculated as:

    ```
    (|V| - 1) / sum(distance from u to all other vertices in the graph)
    ```
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A node label for node filtering. If a node label is provided, nodes matching the label are the only nodes that are included, including nodes in the input list.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `.closenessCentrality` algorithm
<a name="closenessCentrality-outputs"></a>
+ **node**   –   A key column of the input nodes.
+ **score**   –   A key column of the corresponding closeness-centrality (CC) scores for those nodes.

If the input node list is empty, the output is empty.

## `.closenessCentrality`  query examples
<a name="closenessCentrality-query-examples"></a>

This is a standalone example, where the source node list is explicitly provided in the query:

```
CALL neptune.algo.closenessCentrality(
  ["101"],
  {
    numSources: 10,
    edgeLabels: ["route"],
    vertexLabel: "airport",
    traversalDirection: "outbound",
    normalize: true,
    concurrency: 1
  }
)
YIELD node, score
RETURN node, score
```

This is a query integration example, where `.closenessCentrality.mutate` follows a `MATCH` clause and uses the output of the `MATCH` clause as its list of source nodes:

```
Match (n)
CALL neptune.algo.closenessCentrality(
  n,
  {
    numSources: 10,
    edgeLabels: ["route"],
    vertexLabel: "airport",
    traversalDirection: "outbound",
    normalize: true,
    concurrency: 1
  }
)
YIELD score
RETURN n, score
```

This is a query integration example that returns the nodes with the 10 highest CC scores:

```
CALL neptune.algo.closenessCentrality(
  n,
  {
    edgeLabels: ["route"],
    numSources: 10
  }
)
YIELD score
RETURN n, score
ORDER BY score DESC
LIMIT 10"
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample   `.closenessCentrality`   output
<a name="page-rank-sample-output"></a>

Here is an example of the output returned by .closenessCentrality when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.closenessCentrality(n, {numSources: 10}) YIELD node, score RETURN node, score limit 2" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    {
      "node": {
        "~id": "10",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 38.94449997,
          "elev": 313,
          "longest": 11500,
          "city": "Washington D.C.",
          "type": "airport",
          "region": "US-VA",
          "desc": "Washington Dulles International Airport",
          "code": "IAD",
          "prscore": 0.002264724113047123,
          "degree": 312,
          "lon": -77.45580292,
          "wccid": 2357352929951779,
          "country": "US",
          "icao": "KIAD",
          "runways": 4
        }
      },
      "score": 0.20877772569656373
    },
    {
      "node": {
        "~id": "12",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 40.63980103,
          "elev": 12,
          "longest": 14511,
          "city": "New York",
          "type": "airport",
          "region": "US-NY",
          "desc": "New York John F. Kennedy International Airport",
          "code": "JFK",
          "prscore": 0.002885053399950266,
          "degree": 403,
          "lon": -73.77890015,
          "wccid": 2357352929951779,
          "country": "US",
          "icao": "KJFK",
          "runways": 4
        }
      },
      "score": 0.2199712097644806
    }
  ]
}
```

# Closeness centrality mutatealgorithm
<a name="closeness-centrality-mutate"></a>

The closeness centrality mutate algorithm computes a Closeness Centrality (CC) metric for specified nodes in a graph. The CC metric of a node can be used as a positive measure of how close it is to all other nodes or how central it is in the graph.

The CC metric can be interpreted to show how quickly all other nodes in a network can be reached from a given node, and how important it is as a central hub for rapid information flow. It can be used in transportation networks to identify key hub locations, and in disease-spread modeling to pinpoint central points for targeted intervention efforts.

The closeness centrality (CC) score of a node is calculated based on the sum of its distances to all other vertices. The CC score itself is the inverse of that number; in other words, one divided by that sum. In practice, the calculation is commonly normalized to use the average length of the shortest paths rather than the actual sum of their lengths.

## `.closenessCentrality.mutate`  syntax
<a name="closeness-centrality-mutate-syntax"></a>

```
CALL neptune.algo.closenessCentrality.mutate(
  [node list (required)],
  {
    numSources: the number of BFS sources to use for computing the CC (required)
    writeProperty: name of the node property to write the CC score to (required)
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabel: "a node label for filtering (optional)",
    traversalDirection: traversal direction (optional),
    normalize: Boolean, set to false to prevent normalization (optional)
    concurrency: number of threads to use (optional)
  }
)
YIELD success
RETURN success
```

## `.closenessCentrality.mutate`  inputs
<a name="closeness-centrality-mutate-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting locations for the algorithm. Each node in the list triggers an execution of the algorithm. If an empty list is provided, the query result is also empty.

  If the algorithm is called following a `MATCH` clause (query algo integration), the source node list is the result returned by the `MATCH` clause. 
+ 

**a configuration object that contains:**
  + **numSources** *(required)*   –   *type:* `uint64_t`;   *default: none*.

    The number of BFS sources for computing approximate Closeness Centrality (CC). To compute exact closeness centrality, set `numSources` to a number larger than number of vertices, such as `maxInt`.

    Because of the computational complexity of the algorithm for large graphs, it's generally best to specify a number in the order of thousands to ten thousands, such as 8,192.
  + **writeProperty** *(required)*   –   *type:* `string`;   *default: none*.

    A name for the new node property that will contain the computed CC score of each node.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **normalize** *(optional)*   –   *type:* `Boolean`;   *default:*` true`.

    You can use this field to turn off normalization, which is on by default. Without normalization, only centrality scores of nodes within the same component can be meaningfully compared. Normalized scores can be compared across different connected components.

    The CC is normalizd using the Wasserman-Faust normalization formula for unconnected graphs. If there are `n` vertices reachable from vertex `u` (including vertex `u` itself), the Wasserman-Faust normalized closeness centrality score of vertex `u` is calculated as follows:

    ```
    (n-1)^2 / (|V| - 1) * sum(distance from u to these n vertices)
    ```

    Without normalization, the centrality score of vertex `u` is calculated as:

    ```
    (|V| - 1) / sum(distance from u to all other vertices in the graph)
    ```
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A node label for node filtering. If a node label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.closenessCentrality.mutate`  outputs
<a name="closeness-centrality-mutate-outputs"></a>

The closeness centrality score of each source node in the input list is written as a new node property using the property name specified in `writeProperty`.

If the algorithm is invoked as a standalone query, there is no other output.

If the algorithm is invoked following a `MATCH` clause that provides its source node list (query integration), the algorithm outputs a key column of the source vertices from the `MATCH` clause and a value column of Booleans (true or false) that indicate whether the CC value was successfully written to the node in question.

## Query examples for  `.closenessCentrality.mutate`
<a name="closeness-centrality-mutate-query-examples"></a>

This example computes closeness centrality scores and writes them as a new node property called `ccScore`:

```
CALL neptune.algo.closenessCentrality.mutate(
  {
    numSources: 10,
    writeProperty: "ccScore",
    edgeLabels: ["route"],
    vertexLabel: "airport",
    traversalDirection: "outbound",
    normalize: true,
    concurrency: 1
  }
)
```

Then you can query the `ccScore` property in a subsequent query:

```
MATCH (n) RETURN id(n), n.ccScore limit 5
```

## Sample   `.closenessCentrality.mutate`   output
<a name="page-rank-sample-output"></a>

Here is an example of the output returned by .closenessCentrality.mutate when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.closenessCentrality.mutate(
       {
         writeProperty: 'ccscore',
         numSources: 10
       }
     )
     YIELD success
     RETURN success"
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    {
      "success": true
    }
  ]
}
```