

# Neptune Analytics algorithms
<a name="algorithms"></a>

Graph algorithms are powerful tools for gaining insights into data. Neptune Analytics provides a set of optimized in-database implementations of common graph algorithms that are exposed as openCypher procedures. These algorithms analyze inherent aspects of the underlying graph structure, such as connectedness (path finding), relative importance (centrality), and community membership (community detection).

Neptune Analytics natively supports over 25 optimized graph algorithms and variants in the 5 most popular categories, as well as 3 miscellaneous graph procedures, that help customers extract insights from their graphs, which are listed in the following table.


| Category | Action | Algorithms | Common Uses | 
| --- | --- | --- | --- | 
| [Pathfinding](path-finding-algorithms.md) | Find the existence, quality, or availability of a path between nodes. |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  | 
| [Centrality](centrality-algorithms.md) | Determine the absolute or relative importance of a node in the graph. |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  | 
| [Similarity](similarity-algorithms.md) | Compare the similarities between different graph structures. |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  | 
| [Clustering and Community Detection](clustering-algorithms.md) | Identify meaningful groups or clusters within graph structures. |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  | 
| [Vector Similarity Search](vector-similarity.md) | Identify approximate nearest neighbor (ANN) nodes by comparing vector embeddings using the Hierarchical Navigable Small World (HNSW) algorithm. |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  | 
| [Miscellaneous Graph Procedures](custom-algorithms.md) | Provide summaries and statistics about the graph. |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/algorithms.html)  | 

Many of these algorithms require interacting with most or all the nodes and edges in a graph, often in an iterative fashion. As a result, they are too computationally expensive to process using normal analytic technologies. Neptune Analytics has built highly optimized implementations that allow them to run over graphs of any size. 

Algorithms in Neptune Analytics are integrated naturally into openCypher through the CALL clause, as illustrated below. This lets you combine algorithms naturally with openCypher clauses, functions, and semantics to build very complex queries. For example, you could look for the top 10 most important airports in the US-AK region like this:

```
MATCH (n:airport {region: 'US-AK'})
CALL neptune.algo.pageRank(n, {edgeLabels: ['route'], numOfIterations: 10})
YIELD rank
RETURN n.code, rank
ORDER BY rank DESC LIMIT 10
```

You can run algorithms in the SDKs using the `ExecuteOpenCypherQuery` operation or in boto3 and the AWS CLI using the `execute-query` command. If you don't want to use the SDK or CLI, you can use [https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth-connect-command-line.html#iam-auth-connect-awscurl](https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth-connect-command-line.html#iam-auth-connect-awscurl) to sign your Neptune Analytics requests [using Signature Version 4 (Sig4)](https://docs.aws.amazon.com/general/latest/gr/signing-aws-api-requests.html). For example, you can run a simple breadth-first search like this:

```
awscurl -X POST -H "Content-Type: application/x-www-form-urlencoded" \
  https://(graphIdentifier).(region).neptune-graph.amazonaws.com/opencypher \
  --service neptune-graph \
  --region (region) \
  -d "query=CALL neptune.algo.bfs([\"101\", \"102\"], {edgeLabels: [\"route\"]})"
```

You could run the same query using the AWS CLI like this:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string 'CALL neptune.algo.bfs(["101", "102"], {edgeLabels: ["route"]})' \
  --language open_cypher \
  /tmp/out.txt
```

Algorithms having signatures with different kinds of input in Neptune Analytics are exposed as separate algorithms. Unless otherwise indicated, the examples here are using the [Air Routes dataset](https://aws.amazon.com/blogs/database/let-me-graph-that-for-you-part-1-air-routes/).

Neptune Analytics currently supports five main categories of algorithm and a set of miscellenaous graph procedures:
+ [Path finding algorithms](path-finding-algorithms.md)   –   These find the existence, quality, or availability of a path or paths between two or more nodes in the graph. A path in this sense is a set of nodes and connecting edges.

  By efficiently determining the optimal route between two nodes, path-finding algorithms enable you to model real-world systems like roads or social networks as interconnected nodes and edges. Finding the shortest paths between various points is crucial in applications like route planning for GPS systems, logistics optimization, and even in solving complex problems in fields like biology or engineering.
+ [Centrality algorithms](centrality-algorithms.md)   –   These are used to determine the absolute or relative importance or influence of a node or nodes in the graph.

  By identifying the most influential or important nodes within a network, centrality algorithms can provide insights about key players or critical points of interaction. This is valuable in social network analysis, where it helps pinpoint influential individuals, and in transportation networks, where it aids in identifying crucial hubs for efficient routing and resource allocation.
+ [Similarity algorithms](similarity-algorithms.md)   –   Graph similarity algorithms allow you to compare and analyze the similarities and dissimilarities between different graph structures, which can provide insight into relationships, patterns, and commonalities across diverse datasets. This is invaluable in various fields, including biology for comparing molecular structures, social networks for identifying similar communities, and recommendation systems for suggesting similar items based on user preferences.
+ [Clustering or community-detection algorithms](clustering-algorithms.md)   –   Community-detection algorithms can identify meaningful groups or clusters of nodes in a network, revealing hidden patterns and structures that can provide insights into the organization and dynamics of complex systems. This is valuable in social network analysis, and in biology, for identifying functional modules in protein-protein interaction networks, and more generally for understanding information flow and influence propagation in many different domains.
+ [Vector Similarity Search](vector-similarity.md)   –   Vector similarity algorithms work by using vector based representations of data, a.k.a. embeddings, to answer questions about the data's context and its similarity and connection to other data. This is valuable in applications such as Retrieval Augmented Generation (RAG) applications, knowledge graph backed chatbots, and recommendation engines.
+ [Miscellenaous graph procedures](custom-algorithms.md)   –   The miscellaneous graph procedures can be ran on your graphs to give you insight into your graphs and their metrics. 

# Path-finding algorithms in Neptune Analytics
<a name="path-finding-algorithms"></a>

Path-finding algorithms are a category of graph algorithms that focus on finding a path, a connected set of nodes and edges, between two or more sets of nodes within a graph. They are often used to find available or optimized paths based on the existence, quantity, or quality of the paths and the values of properties along those paths.

By efficiently determining the best route between two nodes, path-finding algorithms enable you to model real-world systems like roads or social networks as interconnected nodes and edges. Finding the shortest paths between various points is crucial in applications like route planning for GPS systems, logistics optimization, and even in solving complex problems in fields like biology or engineering.

# Breadth-first search (BFS) path finding algorithms
<a name="bfs-algorithms"></a>

Breadth-first search (BFS) path-finding algorithms search for nodes in breadth-first order, starting from a single vertex. They can also, in the multi-source case, start from more than one vertex.

They can systematically explore and evaluate all neighboring nodes from a starting point before moving on to the neighbors of those nodes, which ensures that the algorithm searches the shallowest levels of the graph first.

Breadth-first-search is used in computer networking to find the shortest path between two devices, and in social networks to understand how information spreads through connections, and in games to explore possible moves and strategies.

**Time complexity**   –   The time complexity of breadth-first search algorithms is `O(|V|+|E|)`, where `|V|` is the number of vertices in the graph and `|E|` is the number of edges in the graph.

A breadth-first algorithm can be invoked as a *standalone* operation whose inputs are explicitly defined, or as a *query-algorithm integration* which takes as its input the output of an immediately preceding `MATCH` clause. 

Neptune Analytics supports these BFS algorithms:
+ [`.bfs`](bfs-standard.md)   –   This standard breadth-first search algorithm starts from the source vertex of the graph and returns a column of visited vertices.
+ [`.bfs.parents`](bfs-parents.md)   –   This variant of BFS starts from a source vertex or vertices and finds the parent of each vertex during the search. It returns a key column of the vertices and a value column of the parents of the key vertices.
+ [`.bfs.levels`](bfs-levels.md)   –   This variant of BFS starts from a source vertex or vertices and finds the levels of each vertex during the search. It returns a key column of the vertices and a value column of integers that are the level values of the key vertices. 

  Note that the level of a source vertex is 0.

# Standard breadth-first search (BFS) algorithm
<a name="bfs-standard"></a>

Standard breadth-first search (BFS) is an algorithm for finding nodes from a starting node or nodes in a graph in breadth-first order.

It returns the source node or nodes that it started from, and all of the nodes visited by each search.

**Note**  
Because every source node passed in leads to its own execution of the algorithm, your queries should limit the number of source nodes as much as possible.

## `.bfs`   syntax
<a name="bfs-standard-syntax"></a>

```
CALL neptune.algo.bfs(
  [source-node list (required)],
  {
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    maxDepth: maximum number of hops to traverse from a source node (optional),
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD the outputs to generate (source and/or node)
RETURN the outputs to return
```

## `.bfs`   inputs
<a name="bfs-standard-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The source-node list contains the node or nodes used as the starting locations for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source-node list is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source-node list for the algorithm.
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel**   *(optional)*   –   *type:* `string`;   *example:* `"airport"`;  *default:* no node filtering.

    If you provide a node label to filter on then only nodes matching that label will be traversed. This does not, however, filter out any nodes in the source node list.
  + **maxDepth**   *(optional)*   –   *type:* positive integer or 0 or -1;   *default:* -1.

    The maximum number of hops to traverse from a source node. If set at `-1` then there's no maximum depth limit. If set to `0`, only the nodes in the source node list are returned.
  + **traversalDirection**   *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.bfs`   outputs
<a name="bfs-standard-outputs"></a>

The `.bfs` algorithm returns:
+ **source**   –   *type:* `Node[]`.

  The source nodes.
+ **node**   –   *type:* `Node[]`.

  The nodes that the algorithm traversed from each source node.

## `.bfs`   query examples
<a name="bfs-standard-query-examples"></a>

This is a standalone example, where the query provides an explicit source node list.

```
CALL neptune.algo.bfs(
  ["101", "102"],
  {
    edgeLabels: ["route"],
    vertexLabel: "airport",
    maxDepth: 11,
    traversalDirection: "both",
    concurrency: 2
  }
)
YIELD node
```

You can run that query using the `execute-query` operation in the AWS CLI like this:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string 'CALL neptune.algo.bfs(["101", "102"],
      {edgeLabels: ["route"], vertexLabel: "airport", maxDepth: 11,
      traversalDirection: "both", concurrency: 2})' \
  --language open_cypher \
  /tmp/out.txt
```

A query like this one would return an empty result because the source list is empty:

```
CALL neptune.algo.bfs([], {edgeLabels: ["route"]})
```

By default, both the source nodes (`"source"` output) and the visited nodes (`"node"` output) are returned. You can use `YIELD` to specify which of those outputs you would like to see. For example, to see only the `"node"` outputs:

```
CALL neptune.algo.bfs(["101"], {edgeLabels: ["route"]}) YIELD node
```

The examples below are query integration examples, where `.bfs` follows a `MATCH` clause and uses the output of the `MATCH` clause as its source node list:

```
MATCH (n) WITH n LIMIT 5
CALL neptune.algo.bfs(n, {edgeLabels: ["route"]})
YIELD node
RETURN node
```

The `MATCH` clause can also explicitly specify a starting node list using the `id()` function, like this:

```
MATCH (n) where id(n)="101"
CALL neptune.algo.bfs(n, {edgeLabels: ["route"]})
YIELD node
RETURN node
```

Also:

```
MATCH (n) where id(n) IN ["101", "102"]
CALL neptune.algo.bfs(n, {edgeLabels: ["route"]})
YIELD node
RETURN COUNT(node)
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.bfs` output
<a name="bfs-standard-sample-output"></a>

Here is an example of the output returned by .bfs when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.bfs(['101'], {maxDepth: 1}) yield source, node return source, node limit 2" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [{
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.681099891662599,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "lon": 100.74700164794901,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "1483",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 39.490000000000002,
          "elev": 4557,
          "longest": 9186,
          "city": "Ordos",
          "type": "airport",
          "region": "CN-15",
          "desc": "Ordos Ejin Horo Airport",
          "code": "DSN",
          "lon": 109.861388889,
          "country": "CN",
          "icao": "ZBDS",
          "runways": 1
        }
      }
    }, {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.681099891662599,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "lon": 100.74700164794901,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "103",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 55.972599029541001,
          "elev": 622,
          "longest": 12139,
          "city": "Moscow",
          "type": "airport",
          "region": "RU-MOS",
          "desc": "Moscow, Sheremetyevo International Airport",
          "code": "SVO",
          "lon": 37.414600372314503,
          "country": "RU",
          "icao": "UUEE",
          "runways": 2
        }
      }
    }]
}
```

# Parents breadth-first search (BFS) algorithm
<a name="bfs-parents"></a>

The `parents` variant of breadth-first search is an algorithm for finding nodes from a starting node or vertices in breadth-first order and then performing a breadth-first search for the parent of each node.

It returns a key column of vertices, and a value column of the vertices that are the parents of the key vertices. The parent of a source node is itself.

**Note**  
Because every source node passed in initiates its own execution of the algorithm, your queries should limit the number of source nodes as much as possible.

## `.bfs.parents`   syntax
<a name="bfs-parents-syntax"></a>

```
CALL neptune.algo.bfs.parents(
  [source-node list (required)],
  {
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    maxDepth: maximum number of hops to traverse from a source node (optional),
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD the outputs to generate (source and/or node and/or parent)
RETURN the outputs to return
```

## `.bfs.parents`   inputs
<a name="bfs-parents-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The source-node list contains the node or nodes used as the starting locations for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source-node list is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source-node list for the algorithm.
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel**   *(optional)*   –   *type:* `string`;   *example:* `"airport"`;  *default:* no node filtering.

    If you provide a node label to filter on then only vertices matching that label will be traversed. This does not, however, filter out any nodes in the source node list.
  + **maxDepth**   *(optional)*   –   *type:* positive integer or 0 or -1;   *default:* -1.

    The maximum number of hops to traverse from a source node. If set at `-1` then there's no maximum depth limit. If set to `0`, only the vertices in the source node list are returned.
  + **traversalDirection**   *(optional)*   –   *type:* `string`;   *default:*` outbound`.

    The direction of edge to follow. Must be one of: `inbound`, `outbound`, or `both`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.bfs.parents`   outputs
<a name="bfs-parents-outputs"></a>

The `.bfs.parents` algorithm returns:
+ **source**   –   *type:* `Node[]`.

  The source nodes.
+ **node**   –   *type:* `Node[]`.

  The vertices that the algorithm traversed from each source node.
+ **parent**   –   *type:* `Node[]`.

  The parents of those traversed nodes.

## `.bfs.parents`   query examples
<a name="bfs-parents-query-examples"></a>

This is a standalone example, where the source node list is explicitly provided in the query:

```
CALL neptune.algo.bfs.parents(
  ["105", "113"],
  {
    edgeLabels: ["route"],
    vertexLabel: "airport",
    maxDepth: 2,
    traversalDirection: "both",
    concurrency: 2
  }
)
YIELD node, parent
```

A query like this one would return an empty result because the source list is empty:

```
CALL neptune.algo.bfs.parents([], {edgeLabels: ["route"]})
```

This is a query integration example, where `.bfs.parents` follows a `MATCH` clause that provides the source node list for `.bfs.parents`:

```
Match (n) with n LIMIT 5
CALL neptune.algo.bfs.parents(n, {edgeLabels: ["route"]})
YIELD node
RETURN n, node
```

This query is an example of aliasing the algorithm output:

```
MATCH (n {code: "AUS"})
CALL neptune.algo.bfs.parents(n, { edgeLabels: ["route"], maxDepth: 2})
YIELD node AS ReachedNode
RETURN ReachedNode
```

This query searches for routes to BFS from BKK, returning the starting node (BKK), 5 visited vertices, and their parents:

```
MATCH (n) where n.code CONTAINS "BKK"
CALL neptune.algo.bfs.parents(n, {edgeLabels: ["route"]})
YIELD node, parent
RETURN n, node, parent
LIMIT 5
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.bfs.parents` output
<a name="bfs-parents-sample-output"></a>

Here is an example of the output returned by .bfs.parents when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.bfs.parents(['101'], {maxDepth: 1})
                       YIELD source, node, parent
                       RETURN source, node, parent
                       LIMIT 2" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt   
{
  "results": [
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "lon": 100.747001647949,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "1483",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 39.49,
          "elev": 4557,
          "longest": 9186,
          "city": "Ordos",
          "type": "airport",
          "region": "CN-15",
          "desc": "Ordos Ejin Horo Airport",
          "code": "DSN",
          "lon": 109.861388889,
          "country": "CN",
          "icao": "ZBDS",
          "runways": 1
        }
      },
      "parent": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "lon": 100.747001647949,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      }
    },
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "lon": 100.747001647949,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "103",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 55.972599029541,
          "elev": 622,
          "longest": 12139,
          "city": "Moscow",
          "type": "airport",
          "region": "RU-MOS",
          "desc": "Moscow, Sheremetyevo International Airport",
          "code": "SVO",
          "lon": 37.4146003723145,
          "country": "RU",
          "icao": "UUEE",
          "runways": 2
        }
      },
      "parent": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "lon": 100.747001647949,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      }
    }
  ]
}
```

# Levels breadth-first search (BFS) algorithm
<a name="bfs-levels"></a>

The `levels` variant of breadth-first search is an algorithm for searching nodes from a starting node or nodes in breadth-first order. From there it performs a breadth-first search and records the hop level from the starting node of each node that it finds.

It returns a key column of nodes, and a value column containing the level values of those key nodes.

The level of a source node is 0. Note that because every source node passed into breadth-first search `levels` initiates its own execution of the algorithm, your queries should filter to a subset of the graph before executing BFS `levels` whenever possible.

## `.bfs.levels`   syntax
<a name="bfs-levels-syntax"></a>

```
CALL neptune.algo.bfs.levels(
  [source-node list (required)],
  {
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    maxDepth: maximum number of hops to traverse from a source node (optional),
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD the outputs to generate (source and/or node)
RETURN the outputs to return
```

## `.bfs.levels`   inputs
<a name="bfs-levels-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The source-node list contains the node or nodes used as the starting locations for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source-node list is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source-node list for the algorithm.
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel**   *(optional)*   –   *type:* `string`;   *example:* `"airport"`;  *default:* no node filtering.

    If you provide a node label to filter on then only nodes matching that label will be traversed. This does not, however, filter out any nodes in the source node list.
  + **maxDepth**   *(optional)*   –   *type:* positive integer or 0 or -1;   *default:* -1.

    The maximum number of hops to traverse from a source node. If set at `-1` then there's no maximum depth limit. If set to `0`, only the nodes in the source node list are returned.
  + **traversalDirection**   *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.bfs.levels`   outputs
<a name="bfs-levels-outputs"></a>

The `.bfs.levels` algorithm returns:
+ **source**   –   *type:* `Node[]`.

  The source nodes.
+ **node**   –   *type:* `Node[]`.

  The nodes that the algorithm traversed from each source node.
+ **level**   –   *type:* `integer[]`.

  The hop levels of those traversed nodes.

## `.bfs.levels`   standalone query examples
<a name="bfs-levels-standalone-examples"></a>

The examples below are standalone examples, where the query provides an explicit source node list.

A query like this one would return an empty result because the source list is empty:

```
CALL neptune.algo.bfs.levels(
  ["101", "102"],
  {
    edgeLabels: ["route"],
    vertexLabel: "airport",
    maxDepth: 6,
    traversalDirection: "both",
    concurrency: 2
  }
)
YIELD node
```

You can run the algorithm using the `execute-query` operation in the AWS CLI like this:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string 'CALL neptune.algo.bfs.levels(["101", "102"], {edgeLabels: ["route"]})' \
  --language open_cypher \
  /tmp/out.txt
```

By default, all the outputs are generated. You can use `YIELD` to specify which of those outputs to generate. For example, to generate only the `"node"` and `level` outputs:

```
CALL neptune.algo.bfs.levels(["101"], {edgeLabels: ["route"]}) YIELD node, level
```

## `.bfs.levels`   query integration examples
<a name="bfs-levels-integration-examples"></a>

The examples below are query integration examples, where `.bfs.levels` follows a `MATCH` clause and uses the output of the `MATCH` clause as its source node list:

```
MATCH (n) WITH n LIMIT 5
CALL neptune.algo.bfs.levels(n, {edgeLabels: ["route"]})
YIELD node, level
RETURN n, node, level
```

This query illustrates various ways to constrain the input and output:

```
MATCH (n) where id(n)="101"
CALL neptune.algo.bfs.levels(n, { edgeLabel: "route", maxDepth: 2})
YIELD node, level WHERE node.city CONTAINS "New"
RETURN n.city, node.city, level
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.bfs.levels` output
<a name="bfs-standard-sample-output"></a>

Here is an example of the output returned by .bfs.levels when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.bfs.levels(['101'], {maxDepth: 1}) yield source, node, level return source, node, level limit 2" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt   
{
  "results": [
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "lon": 100.747001647949,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "1483",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 39.49,
          "elev": 4557,
          "longest": 9186,
          "city": "Ordos",
          "type": "airport",
          "region": "CN-15",
          "desc": "Ordos Ejin Horo Airport",
          "code": "DSN",
          "lon": 109.861388889,
          "country": "CN",
          "icao": "ZBDS",
          "runways": 1
        }
      },
      "level": 1
    },
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "lon": 100.747001647949,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "103",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 55.972599029541,
          "elev": 622,
          "longest": 12139,
          "city": "Moscow",
          "type": "airport",
          "region": "RU-MOS",
          "desc": "Moscow, Sheremetyevo International Airport",
          "code": "SVO",
          "lon": 37.4146003723145,
          "country": "RU",
          "icao": "UUEE",
          "runways": 2
        }
      },
      "level": 1
    }
  ]
}
```

# Single-source shortest-path algorithms
<a name="sssp-algorithms"></a>

A single-source-shortest-path algorithm finds the shortest paths (or the distance of the shortest paths) between a given vertex and all reachable vertices in the graph (including itself).

By determining the most efficient routes from a single starting node to all other nodes in the graph, single-source-shortest-path can be used to calculate the shortest distances or lowest cost required to reach each destination. This is applicable in GPS systems to find the fastest routes between a starting point and different destinations, and in logistics to optimize delivery routes, and in transportation planning for efficient navigation through road networks.

Neptune Analytics supports the following single-source-shortest-path (SSSP) algorithms:
+ [`.sssp.bellmanFord`](sssp-bellmanFord.md)   –   Computes the shortest path distances from a source vertex to all other vertices in the graph using the [Bellman-Ford](https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm) algorithm. Positive edge weights must be provided using the `edgeWeightProperty`, and the traversal direction must not be set to `both`.
+ [`.sssp.bellmanFord.parents`](sssp-bellmanFord-parents.md)   –   Identifies the parent vertices along the shortest paths from the source vertex to all other vertices in the graph using the Bellman-Ford algorithm. Positive edge weights must be provided using the `edgeWeightProperty`, and the traversal direction must not be set to `both`.
+  [`.sssp.bellmanFord.path`](sssp-bellmanFord-path.md)   –   Finds the shortest path between a given source vertex and a target vertex in the graph using the [Bellman-Ford](https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm) algorithm. To compute all shortest paths from a given source vertex, the regular SSSP algorithm can be used. Positive edge weights must be provided using the `edgeWeightProperty`, and the traversal direction must not be set to `both`. 
+ [`.sssp.deltaStepping`](sssp-deltaStepping.md)   –   Computes the shortest path distances from a source vertex to all other vertices in the graph using a [delta-stepping](https://en.wikipedia.org/wiki/Parallel_single-source_shortest_path_algorithm#Delta_stepping_algorithm) algorithm. Positive edge weights must be provided using the `edgeWeightProperty`, and the traversal direction must not be set to `both`.
+ [`.sssp.deltaStepping.parents`](sssp-deltaStepping-parents.md)   –   Identifies the parent vertices along the shortest paths from the source vertex to all other vertices in the graph using a delta-stepping algorithm. Positive edge weights must be provided using the `edgeWeightProperty`, and the traversal direction must not be set to `both`.
+ [`.sssp.deltaStepping.path`](sssp-deltaStepping-path.md)   –   Finds the shortest path between a given source vertex and a target vertex in the graph using the delta-stepping algorithm. To compute all shortest paths from a given source vertex, the regular SSSP algorithm can be used. Positive edge weights must be provided using the `edgeWeightProperty`, and the traversal direction must not be set to `both`.
+ [`.topksssp`](topk-sssp.md)   –   The TopK hop-limited single source shortest path algorithm finds the single-source weighted shortest paths starting from a source vertex to all its `maxDepth` neighbors. The distance or cost from the source vertex to each target vertex is accumulated on the edge weights of the path. The topK distances of the paths are sorted in descending or ascending order.

  The algorithm can be run unweighted as well as weighted. When you run it unweighted, it's equivalent to [`.bfs.levels`](bfs-levels.md).

# Bellman-Ford single source shortest path (SSSP) algorithm
<a name="sssp-bellmanFord"></a>

The `.sssp.bellmanFord` algorithm computes the shortest path distances from a single source vertex to all other vertices in the graph using the [Bellman-Ford](https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm) algorithm.

Neptune Analytics implements the algorithm such that:
+ Positive edge weights must be provided using the `edgeWeightProperty` field
+ Negative edge weights are not supported.
+ The traversal direction cannot be set to `both`.

## `.sssp.bellmanFord`   syntax
<a name="sssp-bellmanFord-syntax"></a>

```
CALL neptune.algo.sssp.bellmanFord(
  [source-node list (required)],
  {
    edgeWeightProperty: edge weight predicate for traversal (required)
    edgeWeightType: numeric type of the edge weight property (required)
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD the outputs to generate (source and/or node)
RETURN the outputs to return
```

## `.sssp.bellmanFord`   inputs
<a name="sssp-bellmanFord-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting location(s) for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source-node list is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source-node list for the algorithm.
+ 

**a configuration object that contains:**
  + **edgeWeightProperty** *(required)*   –   *type:* `string`;   *default: none*.

    The edge weight predicate for traversal.
  + **edgeWeightType** *(required)*   –   *type:* `string`;   *valid values:* `"int"`, `"long"`, `"float"`, `"double"`.

    The numeric data type of the values in the property specified by `edgeWeightProperty`.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel**   *(optional)*   –   *type:* `string`;   *example:* `"airport"`;  *default:* no node filtering.

    A node label for node filtering. If a node label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"` or `"outbound"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `.sssp.bellmanFord` algorithm
<a name="sssp-bellmanFord-outputs"></a>

For every node that can be reached from the specified source list, the algorithm returns:
+ **source**   –   The source node.
+ **node**   –   A node found traversing from the source.
+ **distance**   –   The distance between the source node and the found node.

## `.sssp.bellmanFord`   query examples
<a name="sssp-bellmanFord-query-examples"></a>

This is a standalone query, where a source node (or nodes) is explicitly provided:

```
CALL neptune.algo.sssp.bellmanFord(
  ["101"],
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int"
  }
)
```

This is a query integration example, where `.sssp.bellmanFord` follows a `MATCH` clause and uses the output of the `MATCH` clause as its source node list:

```
MATCH (source:airport {code: 'ANC'})
CALL neptune.algo.sssp.bellmanFord(
  source,
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int",
    vertexLabel: "airport",
    traversalDirection: "outbound",
    concurrency: 1
  }
)
YIELD node, parent, distance
RETURN source, node, parent, distance
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.sssp.bellmanFord` output
<a name="sssp-bellmanFord-sample-output"></a>

Here is an example of the output returned by .sssp.bellmanFord when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.sssp.bellmanFord(['101'],
       {edgeWeightProperty: 'dist', edgeWeightType: 'int'})
     yield source, node, distance
     return source, node, distance
     limit 2" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "prscore": 0.002498496090993285,
          "degree": 308,
          "lon": 100.747001647949,
          "wccid": 2357352929951779,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "2709",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 65.4809036254883,
          "elev": 49,
          "longest": 8711,
          "city": "Nadym",
          "type": "airport",
          "region": "RU-YAN",
          "desc": "Nadym Airport",
          "code": "NYM",
          "prscore": 0.00016044313088059425,
          "degree": 18,
          "lon": 72.6988983154297,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "USMM",
          "runways": 1
        }
      },
      "distance": 3812
    },
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "prscore": 0.002498496090993285,
          "degree": 308,
          "lon": 100.747001647949,
          "wccid": 2357352929951779,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "2667",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 56.8567008972168,
          "elev": 2188,
          "longest": 6562,
          "city": "Ust-Kut",
          "type": "airport",
          "region": "RU-IRK",
          "desc": "Ust-Kut Airport",
          "code": "UKX",
          "prscore": 0.000058275499999999997,
          "degree": 4,
          "lon": 105.730003356934,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "UITT",
          "runways": 1
        }
      },
      "distance": 2993
    }
  ]
}
```

# Bellman-Ford single source shortest path (SSSP) parents algorithm
<a name="sssp-bellmanFord-parents"></a>

The `.sssp.bellmanFord.parents` algorithm uses the [Bellman-Ford](https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm) algorithm to find the parent nodes along with the shortest path distances from the source node to all other nodes in the graph.

Neptune Analytics implements the algorithm such that:
+ Positive edge weights must be provided using the `edgeWeightProperty` field
+ Negative edge weights are not supported.
+ The traversal direction cannot be set to `both`.

## `.sssp.bellmanFord.parents`   syntax
<a name="sssp-bellmanFord-parents-syntax"></a>

```
CALL neptune.algo.sssp.bellmanFord.parents(
  [source-node list (required)],
  {
    edgeWeightProperty: edge weight predicate for traversal (required)
    edgeWeightType: numeric type of the edge weight property (required)
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD the outputs to generate (source and/or node)
RETURN the outputs to return
```

## `.sssp.bellmanFord.parents`   inputs
<a name="sssp-bellmanFord-parents-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting location(s) for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source-node list is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source-node list for the algorithm.
+ 

**a configuration object that contains:**
  + **edgeWeightProperty** *(required)*   –   *type:* `string`;   *example:* `"distnce"`;   *default: none*.

    The edge weight predicate for traversal.
  + **edgeWeightType** *(required)*   –   *type:* `string`;   *valid values:* `"int"`, `"long"`, `"float"`, `"double"`.

    The numeric data type of the values in the property specified by `edgeWeightProperty`.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel**   *(optional)*   –   *type:* `string`;   *example:* `"airport"`;  *default:* no node filtering.

    A node label for node filtering. If a node label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"` or `"outbound"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `.sssp.bellmanFord.parents` algorithm
<a name="sssp-bellmanFord-parents-outputs"></a>

For every node that can be reached from the specified source list, the algorithm returns:
+ **source**   –   The source node.
+ **node**   –   A node found traversing from the source.
+ **distance**   –   The distance between the source node and the found node.
+ **parent**   –   The parent of the found node. Note that the parent of the source vertex is itself.

## `.sssp.bellmanFord.parents`   query examples
<a name="sssp-bellmanFord-parents-query-examples"></a>

This is a standalone query, where a source node (or nodes) is explicitly provided:

```
CALL neptune.algo.sssp.bellmanFord.parents(
  ["101"],
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int"
  }
)
```

This is a query integration example, where where `.sssp.bellmanFord.parents` follows a `MATCH` clause and uses the output of the `MATCH` clause as its source node list:

```
MATCH (source:airport {code: 'ANC'})
CALL neptune.algo.sssp.bellmanFord.parents(
  source,
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int",
    vertexLabel: "airport",
    traversalDirection: "outbound",
    concurrency: 1
  }
)
YIELD node, parent, distance
RETURN source, node, parent, distance
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.sssp.bellmanFord.parents` output
<a name="sssp-bellmanFord-parents-sample-output"></a>

Here is an example of the output returned by .sssp.bellmanFord.parents when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.sssp.bellmanFord.parents(['101'],
       {edgeWeightProperty: 'dist', edgeWeightType: 'int'})
     yield source, node, parent
     return source, node, parent
     limit 2" \
  --language open_cypher \
  /tmp/out.txt

cat /tmp/out.txt 
{
  "results": [
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "prscore": 0.002498496090993285,
          "degree": 308,
          "lon": 100.747001647949,
          "wccid": 2357352929951779,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "2709",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 65.4809036254883,
          "elev": 49,
          "longest": 8711,
          "city": "Nadym",
          "type": "airport",
          "region": "RU-YAN",
          "desc": "Nadym Airport",
          "code": "NYM",
          "prscore": 0.00016044313088059425,
          "degree": 18,
          "lon": 72.6988983154297,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "USMM",
          "runways": 1
        }
      },
      "parent": {
        "~id": "810",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 55.0125999450684,
          "elev": 365,
          "longest": 11818,
          "city": "Novosibirsk",
          "type": "airport",
          "region": "RU-NVS",
          "desc": "Tolmachevo Airport",
          "code": "OVB",
          "prscore": 0.0012910010991618038,
          "degree": 162,
          "lon": 82.6507034301758,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "UNNT",
          "runways": 2
        }
      }
    },
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "prscore": 0.002498496090993285,
          "degree": 308,
          "lon": 100.747001647949,
          "wccid": 2357352929951779,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "2667",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 56.8567008972168,
          "elev": 2188,
          "longest": 6562,
          "city": "Ust-Kut",
          "type": "airport",
          "region": "RU-IRK",
          "desc": "Ust-Kut Airport",
          "code": "UKX",
          "prscore": 0.000058275499999999997,
          "degree": 4,
          "lon": 105.730003356934,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "UITT",
          "runways": 1
        }
      },
      "parent": {
        "~id": "1038",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 52.2680015563965,
          "elev": 1675,
          "longest": 10384,
          "city": "Irkutsk",
          "type": "airport",
          "region": "RU-IRK",
          "desc": "Irkutsk Airport",
          "code": "IKT",
          "prscore": 0.0008466026629321277,
          "degree": 84,
          "lon": 104.388999938965,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "UIII",
          "runways": 1
        }
      }
    }
  ]
}
```

# Bellman-Ford single source single target shortest path algorithm
<a name="sssp-bellmanFord-path"></a>

 The `.sssp.bellmanFord.path` algorithm uses the Bellman-Ford algorithm to find the shortest path along with the shortest path distances from a source node to a target node in the graph. If there are multiple shortest paths between the source node and the target node, only one will be returned. The algorithm can run only weighted, with `edgeWeightProperty` provided. 

 Neptune Analytics implements the algorithm such that: 
+ Positive edge weights must be provided using the `edgeWeightProperty` field.
+ Negative edge weights are not supported.
+ The traversal direction cannot be set to `both`.

## `.sssp.bellmanFord.path`   syntax
<a name="sssp-bellmanFord-path-syntax"></a>

```
CALL neptune.algo.sssp.bellmanFord.path(
  [source node(s) (required)], [target node(s) (required)]
  {
    edgeWeightProperty: edge weight predicate for traversal (required)
    edgeWeightType: numeric type of the edge weight property (required)
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD the outputs to generate (source and/or node)
RETURN the outputs to return
```

## `.sssp.bellmanFord.path`   inputs
<a name="sssp-bellmanFord-path-inputs"></a>
+ **source node(s)**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting location(s) for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source node(s) is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source node(s) for the algorithm.
+ **target node(s)**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the ending location(s) for the algorithm.
  + Each source-target node pair produces an output of the algorithm.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the target node(s) for the algorithm.
+ 

**a configuration object that contains:**
  + **edgeWeightProperty** *(required)*   –   *type:* `string`;   *default: none*.

    The edge weight predicate for traversal.
  + **edgeWeightType** *(required)*   –   *type:* `string`;   *valid values:* `"int"`, `"long"`, `"float"`, `"double"`.

    The numeric data type of the values in the property specified by `edgeWeightProperty`.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel**   *(optional)*   –   *type:* `string`;   *example:* `"airport"`;  *default:* no node filtering.

    A node label for node filtering. If a node label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"` or `"outbound"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `.sssp.bellmanFord.path` algorithm
<a name="sssp-bellmanFord-path-outputs"></a>

For every pair of source and target nodes, the algorithm returns:
+ **source**   –   The source vertex.
+ **target**   –   The target vertex.
+ **distance**   –   The total shortest path distance from source to target.
+ **vertexPath**   –   A list of vertices in the path in visit order (including the source and the target).
+ **allDistances**   –   A list of cumulative distances to the vertices in the traversal path (including the source and the target).
+ **path**   –   An openCypher path object representing the shortest path between the source and the target. (A list of vertices from the source vertex to the target vertex, interleaved with the corresponding edges, representing the shortest path. Sequence of vertex id (source), edge id, vertex id, edge id, ..., vertex id (target)).
  + Starts and ends with a vertex, and has edges in between each vertex.
  + Includes the source and the target vertices.

## `.sssp.bellmanFord.path`   query examples
<a name="sssp-bellmanFord-path-query-examples"></a>

This is a standalone query, where a source node and target node are explicitly provided:

```
CALL neptune.algo.sssp.bellmanFord.path(
  "9", "37",
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int"
  }
)
```

This is a query integration example, where `.sssp.bellmanFord.path` follows a `MATCH` clause and uses the output of the `MATCH` clause as its source node(s) and target node(s):

```
MATCH (source:airport {code: 'FLL'})
MATCH (target:airport {code: 'HNL'})
CALL neptune.algo.sssp.bellmanFord.path(
  source, target,
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int",
    vertexLabel: "airport",
    traversalDirection: "outbound",
    concurrency: 1
  }
)
YIELD distance, vertexPath, allDistances, path
RETURN source, target, distance, vertexPath, allDistances, path
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes are returned; and that every source-target node pair produces an output, which can result in a very large query output if a large number of nodes are returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.sssp.bellmanFord.path` output
<a name="sssp-bellmanFord-path-sample-output"></a>

Here is an example of the output returned by .sssp.bellmanFord.path when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "MATCH (n:airport {code: 'FLL'})
  MATCH (m:airport {code: 'HNL'})
  CALL neptune.algo.sssp.bellmanFord.path(n, m, {
  edgeWeightProperty: "dist", 
  edgeWeightType: "int",
  edgeLabels: ["route"],
  vertexLabel: "airport",
  traversalDirection: "outbound",
  concurrency: 1
  })
  YIELD source, target, distance, vertexPath, allDistances, path
  RETURN source, target, distance, vertexPath, allDistances, path" \
  --language open_cypher \
  /tmp/out.txt

cat /tmp/out.txt 
{
  "results": [{
      "source": {
        "~id": "9",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "region": "US-FL",
          "runways": 2,
          "country": "US",
          "city": "Fort Lauderdale",
          "type": "airport",
          "icao": "KFLL",
          "lon": -80.152702331542997,
          "code": "FLL",
          "lat": 26.0725994110107,
          "longest": 9000,
          "elev": 64,
          "desc": "Fort Lauderdale/Hollywood International Airport"
        }
      },
      "target": {
        "~id": "37",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "region": "US-HI",
          "runways": 4,
          "country": "US",
          "city": "Honolulu",
          "type": "airport",
          "icao": "PHNL",
          "lon": -157.92199707031199,
          "code": "HNL",
          "lat": 21.318700790405298,
          "longest": 12312,
          "elev": 13,
          "desc": "Honolulu International Airport"
        }
      },
      "distance": 4854,
      "vertexPath": [{
          "~id": "9",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-FL",
            "runways": 2,
            "country": "US",
            "city": "Fort Lauderdale",
            "type": "airport",
            "icao": "KFLL",
            "lon": -80.152702331542997,
            "code": "FLL",
            "lat": 26.0725994110107,
            "longest": 9000,
            "elev": 64,
            "desc": "Fort Lauderdale/Hollywood International Airport"
          }
        }, {
          "~id": "11",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-TX",
            "runways": 5,
            "country": "US",
            "city": "Houston",
            "type": "airport",
            "icao": "KIAH",
            "lon": -95.341400146484403,
            "code": "IAH",
            "lat": 29.984399795532202,
            "longest": 12001,
            "elev": 96,
            "desc": "George Bush Intercontinental"
          }
        }, {
          "~id": "37",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-HI",
            "runways": 4,
            "country": "US",
            "city": "Honolulu",
            "type": "airport",
            "icao": "PHNL",
            "lon": -157.92199707031199,
            "code": "HNL",
            "lat": 21.318700790405298,
            "longest": 12312,
            "elev": 13,
            "desc": "Honolulu International Airport"
          }
        }],
      "allDistances": [0, 964, 4854],
      "path": [{
          "~id": "9",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-FL",
            "runways": 2,
            "country": "US",
            "city": "Fort Lauderdale",
            "type": "airport",
            "icao": "KFLL",
            "lon": -80.152702331542997,
            "code": "FLL",
            "lat": 26.0725994110107,
            "longest": 9000,
            "elev": 64,
            "desc": "Fort Lauderdale/Hollywood International Airport"
          }
        }, {
          "~id": "neptune_reserved_1_1152921504607567884",
          "~entityType": "relationship",
          "~start": "9",
          "~end": "11",
          "~type": "route",
          "~properties": {
            "dist": 964
          }
        }, {
          "~id": "11",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-TX",
            "runways": 5,
            "country": "US",
            "city": "Houston",
            "type": "airport",
            "icao": "KIAH",
            "lon": -95.341400146484403,
            "code": "IAH",
            "lat": 29.984399795532202,
            "longest": 12001,
            "elev": 96,
            "desc": "George Bush Intercontinental"
          }
        }, {
          "~id": "neptune_reserved_1_1152921508902600717",
          "~entityType": "relationship",
          "~start": "11",
          "~end": "37",
          "~type": "route",
          "~properties": {
            "dist": 3890
          }
        }, {
          "~id": "37",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-HI",
            "runways": 4,
            "country": "US",
            "city": "Honolulu",
            "type": "airport",
            "icao": "PHNL",
            "lon": -157.92199707031199,
            "code": "HNL",
            "lat": 21.318700790405298,
            "longest": 12312,
            "elev": 13,
            "desc": "Honolulu International Airport"
          }
        }]
    }]
}
```

# Delta-stepping single source shortest path (SSSP) algorithm
<a name="sssp-deltaStepping"></a>

The `.sssp.deltaStepping` algorithm computes the shortest path distances from a single source vertex to all other vertices in the graph using a [delta-stepping](https://en.wikipedia.org/wiki/Parallel_single-source_shortest_path_algorithm#Delta_stepping_algorithm) algorithm.

Neptune Analytics implements the algorithm such that:
+ Positive edge weights must be provided using the `edgeWeightProperty` field
+ Negative edge weights are not supported.
+ The traversal direction cannot be set to `both`.

## `.sssp.deltaStepping`   syntax
<a name="sssp-deltaStepping-syntax"></a>

```
CALL neptune.algo.sssp.deltaStepping(
  [source-node list (required)],
  {
    edgeWeightProperty: edge weight predicate for traversal (required)
    edgeWeightType: numeric type of the edge weight property (required)
    delta: the stepping delta (optional)
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD the outputs to generate (source and/or node)
RETURN the outputs to return
```

## `.sssp.deltaStepping`   inputs
<a name="sssp-deltaStepping-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting location(s) for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source-node list is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source-node list for the algorithm.
+ 

**a configuration object that contains:**
  + **edgeWeightProperty** *(required)*   –   *type:* `string`;   *default: none*.

    The edge weight predicate for traversal.
  + **edgeWeightType** *(required)*   –   *type:* `string`;   *valid values:* `"int"`, `"long"`, `"float"`, `"double"`.

    The numeric data type of the values in the property specified by `edgeWeightProperty`.
  + **delta**   *(optional)*   –   *type:* float;   *example:* `3.0`;   *default:* `2.0`.

    The delta stepping value.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel**   *(optional)*   –   *type:* `string`;   *example:* `"airport"`;  *default:* no node filtering.

    A node label for node filtering. If a node label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"` or `"outbound"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `sssp.deltaStepping` algorithm
<a name="sssp-deltaStepping-outputs"></a>

For every node that can be reached from the specified source list, the algorithm returns:
+ **source**   –   The source node.
+ **node**   –   A node found traversing from the source.
+ **distance**   –   The distance between the source node and the found node.

## `.sssp.deltaStepping`   query examples
<a name="sssp-deltaStepping-query-examples"></a>

This is a standalone query, where a source node (or nodes) is explicitly provided:

```
CALL neptune.algo.sssp.deltaStepping(
  ["101"],
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int"
  }
)
```

This is a query integration example, where where `.sssp.deltaStepping` follows a `MATCH` clause and uses the output of the `MATCH` clause as its source node list:

```
MATCH (source:airport {code: 'ANC'})
CALL neptune.algo.sssp.deltaStepping(
  source,
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int",
    vertexLabel: "airport",
    traversalDirection: "outbound",
    concurrency: 1
  }
)
YIELD node, parent, distance
RETURN source, node, parent, distance
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample   `.sssp.deltaStepping`   output
<a name="sssp-deltaStepping-sample-output"></a>

Here is an example of the output returned by .sssp.deltaStepping when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.sssp.deltaStepping(['101'],
       {edgeWeightProperty: 'dist', edgeWeightType: 'int'})
     yield source, node, distance
     return source, node, distance
     limit 2" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "prscore": 0.002498496090993285,
          "degree": 308,
          "lon": 100.747001647949,
          "wccid": 2357352929951779,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "2709",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 65.4809036254883,
          "elev": 49,
          "longest": 8711,
          "city": "Nadym",
          "type": "airport",
          "region": "RU-YAN",
          "desc": "Nadym Airport",
          "code": "NYM",
          "prscore": 0.00016044313088059425,
          "degree": 18,
          "lon": 72.6988983154297,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "USMM",
          "runways": 1
        }
      },
      "distance": 3812
    },
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "prscore": 0.002498496090993285,
          "degree": 308,
          "lon": 100.747001647949,
          "wccid": 2357352929951779,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "2667",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 56.8567008972168,
          "elev": 2188,
          "longest": 6562,
          "city": "Ust-Kut",
          "type": "airport",
          "region": "RU-IRK",
          "desc": "Ust-Kut Airport",
          "code": "UKX",
          "prscore": 0.000058275499999999997,
          "degree": 4,
          "lon": 105.730003356934,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "UITT",
          "runways": 1
        }
      },
      "distance": 2993
    }
  ]
}
```

# Delta-stepping single source shortest path (SSSP) parents algorithm
<a name="sssp-deltaStepping-parents"></a>

The `.sssp.deltaStepping.parents` algorithm computes the shortest path distances from a single source vertex to all other vertices in the graph using a [delta-stepping](https://en.wikipedia.org/wiki/Parallel_single-source_shortest_path_algorithm#Delta_stepping_algorithm) algorithm.

Neptune Analytics implements the algorithm such that:
+ Positive edge weights must be provided using the `edgeWeightProperty` field
+ Negative edge weights are not supported.
+ The traversal direction cannot be set to `both`.

## `.sssp.deltaStepping.parents`   syntax
<a name="sssp-deltaStepping-parents-syntax"></a>

```
CALL neptune.algo.sssp.deltaStepping.parents(
  [source-node list (required)],
  {
    edgeWeightProperty: edge weight predicate for traversal (required)
    edgeWeightType: numeric type of the edge weight property (required)
    delta: the stepping delta (optional)
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD the outputs to generate (source and/or node)
RETURN the outputs to return
```

## `.sssp.deltaStepping.parents`   inputs
<a name="sssp-deltaStepping-parents-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting location(s) for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source-node list is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source-node list for the algorithm.
+ 

**a configuration object that contains:**
  + **edgeWeightProperty** *(required)*   –   *type:* `string`;   *default: none*.

    The edge weight predicate for traversal.
  + **edgeWeightType** *(required)*   –   *type:* `string`;   *valid values:* `"int"`, `"long"`, `"float"`, `"double"`.

    The numeric data type of the values in the property specified by `edgeWeightProperty`.
  + **delta**   *(optional)*   –   *type:* float;   *example:* `3.0`;   *default:* `2.0`.

    The delta stepping value.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel**   *(optional)*   –   *type:* `string`;   *example:* `"airport"`;  *default:* no node filtering.

    A node label for node filtering. If a node label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"` or `"outbound"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `sssp.deltaStepping.parents` algorithm
<a name="sssp-deltaStepping-parents-outputs"></a>

For every node that can be reached from the specified source list, the algorithm returns:
+ **source**   –   The source node.
+ **node**   –   A node found traversing from the source.
+ **distance**   –   The distance between the source node and the found node.
+ **parent**   –   The parent of the found node. Note that the parent of the source vertex is itself.

## `.sssp.deltaStepping.parents`   query examples
<a name="sssp-deltaStepping-parents-query-examples"></a>

This is a standalone query, where a source node (or nodes) is explicitly provided:

```
CALL neptune.algo.sssp.deltaStepping.parents(
  ["101"],
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int"
  }
)
```

This is a query integration example, where where `.sssp.deltaStepping.parents` follows a `MATCH` clause and uses the output of the `MATCH` clause as its source node list:

```
MATCH (source:airport {code: 'ANC'})
CALL neptune.algo.sssp.deltaStepping.parents(
  source,
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int",
    vertexLabel: "airport",
    traversalDirection: "outbound",
    concurrency: 1
  }
)
YIELD node, parent, distance
RETURN source, node, parent, distance
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.sssp.deltaStepping.parents` output
<a name="sssp-deltaStepping-parents-sample-output"></a>

Here is an example of the output returned by .sssp.deltaStepping.parents when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.sssp.deltaStepping.parents(['101'],
       {edgeWeightProperty: 'dist', edgeWeightType: 'int'})
     yield source, node, distance
     return source, node, distance
     limit 2" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "prscore": 0.002498496090993285,
          "degree": 308,
          "lon": 100.747001647949,
          "wccid": 2357352929951779,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "2709",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 65.4809036254883,
          "elev": 49,
          "longest": 8711,
          "city": "Nadym",
          "type": "airport",
          "region": "RU-YAN",
          "desc": "Nadym Airport",
          "code": "NYM",
          "prscore": 0.00016044313088059425,
          "degree": 18,
          "lon": 72.6988983154297,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "USMM",
          "runways": 1
        }
      },
      "parent": {
        "~id": "810",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 55.0125999450684,
          "elev": 365,
          "longest": 11818,
          "city": "Novosibirsk",
          "type": "airport",
          "region": "RU-NVS",
          "desc": "Tolmachevo Airport",
          "code": "OVB",
          "prscore": 0.0012910010991618038,
          "degree": 162,
          "lon": 82.6507034301758,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "UNNT",
          "runways": 2
        }
      }
    },
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "prscore": 0.002498496090993285,
          "degree": 308,
          "lon": 100.747001647949,
          "wccid": 2357352929951779,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "2667",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 56.8567008972168,
          "elev": 2188,
          "longest": 6562,
          "city": "Ust-Kut",
          "type": "airport",
          "region": "RU-IRK",
          "desc": "Ust-Kut Airport",
          "code": "UKX",
          "prscore": 0.000058275499999999997,
          "degree": 4,
          "lon": 105.730003356934,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "UITT",
          "runways": 1
        }
      },
      "parent": {
        "~id": "1038",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 52.2680015563965,
          "elev": 1675,
          "longest": 10384,
          "city": "Irkutsk",
          "type": "airport",
          "region": "RU-IRK",
          "desc": "Irkutsk Airport",
          "code": "IKT",
          "prscore": 0.0008466026629321277,
          "degree": 84,
          "lon": 104.388999938965,
          "wccid": 2357352929951779,
          "country": "RU",
          "icao": "UIII",
          "runways": 1
        }
      }
    }
  ]
}
```

# DeltaStepping single source single target shortest path algorithm
<a name="sssp-deltaStepping-path"></a>

 The `.sssp.deltaStepping.path` algorithm uses the deltaStepping algorithm to find the shortest path along with the shortest path distances from a source node to a target node in the graph. If there are multiple shortest paths between the source node and the target node, only one will be returned. The algorithm can run only weighted, with `edgeWeightProperty` provided. 

 Neptune Analytics implements the algorithm such that: 
+ Positive edge weights must be provided using the `edgeWeightProperty` field.
+ Negative edge weights are not supported.
+ The traversal direction cannot be set to `both`.

## `.sssp.deltaStepping.path`   syntax
<a name="sssp-deltaStepping-path-syntax"></a>

```
CALL neptune.algo.sssp.deltaStepping.path(
  [source node(s) (required)], [target node(s) (required)]
  {
    edgeWeightProperty: edge weight predicate for traversal (required)
    edgeWeightType: numeric type of the edge weight property (required),
    delta: the stepping delta (optional)
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD the outputs to generate (source and/or node)
RETURN the outputs to return
```

## `.sssp.deltaStepping.path`   inputs
<a name="sssp-deltaStepping-path-inputs"></a>
+ **source node(s)**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting location(s) for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source node(s) is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source node(s) for the algorithm.
+ **target node(s)**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the ending location(s) for the algorithm.
  + Each source-target node pair produces an output of the algorithm.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the target node(s) for the algorithm.
+ 

**a configuration object that contains:**
  + **edgeWeightProperty** *(required)*   –   *type:* `string`;   *default: none*.

    The edge weight predicate for traversal.
  + **edgeWeightType** *(required)*   –   *type:* `string`;   *valid values:* `"int"`, `"long"`, `"float"`, `"double"`.

    The numeric data type of the values in the property specified by `edgeWeightProperty`.
  + **delta**   *(optional)*   –   *type:* `float`;   *example:* `3.0`;  *default:* 2.0

    The delta stepping value.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel**   *(optional)*   –   *type:* `string`;   *example:* `"airport"`;  *default:* no node filtering.

    A node label for node filtering. If a node label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"` or `"outbound"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `.sssp.deltaStepping.path` algorithm
<a name="sssp-deltaStepping-path-outputs"></a>

For every pair of source and target nodes, the algorithm returns:
+ **source**   –   The source vertex.
+ **target**   –   The target vertex.
+ **distance**   –   The total shortest path distance from source to target.
+ **vertexPath**   –   A list of vertices in the path in visit order (including the source and the target).
+ **allDistances**   –   A list of cumulative distances to the vertices in the traversal path (including the source and the target).
+ **path**   –   An openCypher path object representing the shortest path between the source and the target. (A list of vertices from the source vertex to the target vertex, interleaved with the corresponding edges, representing the shortest path. Sequence of vertex id (source), edge id, vertex id, edge id, ..., vertex id (target)).
  + Starts and ends with a vertex, and has edges in between each vertex.
  + Includes the source and the target vertices.

## `.sssp.deltaStepping.path`   query examples
<a name="sssp-deltaStepping-path-query-examples"></a>

This is a standalone query, where a source node and target node are explicitly provided:

```
CALL neptune.algo.sssp.deltaStepping.path(
  "9", "37",
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int",
    delta: 2.0
  }
)
```

This is a query integration example, where `.sssp.deltaStepping.path` follows a `MATCH` clause and uses the output of the `MATCH` clause as its source node(s) and target node(s):

```
MATCH (source:airport {code: 'FLL'})
MATCH (target:airport {code: 'HNL'})
CALL neptune.algo.sssp.deltaStepping.path(
  source, target,
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    delta: 2.0,
    edgeWeightType: "int",
    vertexLabel: "airport",
    traversalDirection: "outbound",
    concurrency: 1
  }
)
YIELD distance, vertexPath, allDistances, path
RETURN source, target, distance, vertexPath, allDistances, path
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes are returned; and that every source-target node pair produces an output, which can result in a very large query output if a large number of nodes are returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.sssp.deltaStepping.path` output
<a name="sssp-deltaStepping-path-sample-output"></a>

Here is an example of the output returned by .sssp.deltaStepping.path when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "MATCH (n:airport {code: 'FLL'})
  MATCH (m:airport {code: 'HNL'})
  CALL neptune.algo.sssp.deltaStepping.path(n, m, {
  edgeWeightProperty: "dist", 
  edgeWeightType: "int",
  delta: 2.0, 
  edgeLabels: ["route"],
  vertexLabel: "airport",
  traversalDirection: "outbound",
  concurrency: 1
  })
  YIELD source, target, distance, vertexPath, allDistances, path
  RETURN source, target, distance, vertexPath, allDistances, path" \
  --language open_cypher \
  /tmp/out.txt

cat /tmp/out.txt 
{
  "results": [{
      "source": {
        "~id": "9",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "region": "US-FL",
          "runways": 2,
          "country": "US",
          "city": "Fort Lauderdale",
          "type": "airport",
          "icao": "KFLL",
          "lon": -80.152702331542997,
          "code": "FLL",
          "lat": 26.0725994110107,
          "longest": 9000,
          "elev": 64,
          "desc": "Fort Lauderdale/Hollywood International Airport"
        }
      },
      "target": {
        "~id": "37",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "region": "US-HI",
          "runways": 4,
          "country": "US",
          "city": "Honolulu",
          "type": "airport",
          "icao": "PHNL",
          "lon": -157.92199707031199,
          "code": "HNL",
          "lat": 21.318700790405298,
          "longest": 12312,
          "elev": 13,
          "desc": "Honolulu International Airport"
        }
      },
      "distance": 4854,
      "vertexPath": [{
          "~id": "9",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-FL",
            "runways": 2,
            "country": "US",
            "city": "Fort Lauderdale",
            "type": "airport",
            "icao": "KFLL",
            "lon": -80.152702331542997,
            "code": "FLL",
            "lat": 26.0725994110107,
            "longest": 9000,
            "elev": 64,
            "desc": "Fort Lauderdale/Hollywood International Airport"
          }
        }, {
          "~id": "11",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-TX",
            "runways": 5,
            "country": "US",
            "city": "Houston",
            "type": "airport",
            "icao": "KIAH",
            "lon": -95.341400146484403,
            "code": "IAH",
            "lat": 29.984399795532202,
            "longest": 12001,
            "elev": 96,
            "desc": "George Bush Intercontinental"
          }
        }, {
          "~id": "37",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-HI",
            "runways": 4,
            "country": "US",
            "city": "Honolulu",
            "type": "airport",
            "icao": "PHNL",
            "lon": -157.92199707031199,
            "code": "HNL",
            "lat": 21.318700790405298,
            "longest": 12312,
            "elev": 13,
            "desc": "Honolulu International Airport"
          }
        }],
      "allDistances": [0, 964, 4854],
      "path": [{
          "~id": "9",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-FL",
            "runways": 2,
            "country": "US",
            "city": "Fort Lauderdale",
            "type": "airport",
            "icao": "KFLL",
            "lon": -80.152702331542997,
            "code": "FLL",
            "lat": 26.0725994110107,
            "longest": 9000,
            "elev": 64,
            "desc": "Fort Lauderdale/Hollywood International Airport"
          }
        }, {
          "~id": "neptune_reserved_1_1152921504607567884",
          "~entityType": "relationship",
          "~start": "9",
          "~end": "11",
          "~type": "route",
          "~properties": {
            "dist": 964
          }
        }, {
          "~id": "11",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-TX",
            "runways": 5,
            "country": "US",
            "city": "Houston",
            "type": "airport",
            "icao": "KIAH",
            "lon": -95.341400146484403,
            "code": "IAH",
            "lat": 29.984399795532202,
            "longest": 12001,
            "elev": 96,
            "desc": "George Bush Intercontinental"
          }
        }, {
          "~id": "neptune_reserved_1_1152921508902600717",
          "~entityType": "relationship",
          "~start": "11",
          "~end": "37",
          "~type": "route",
          "~properties": {
            "dist": 3890
          }
        }, {
          "~id": "37",
          "~entityType": "node",
          "~labels": ["airport"],
          "~properties": {
            "region": "US-HI",
            "runways": 4,
            "country": "US",
            "city": "Honolulu",
            "type": "airport",
            "icao": "PHNL",
            "lon": -157.92199707031199,
            "code": "HNL",
            "lat": 21.318700790405298,
            "longest": 12312,
            "elev": 13,
            "desc": "Honolulu International Airport"
          }
        }]
    }]
}
```

# TopK hop-limited single source (weighted) shortest path algorithm
<a name="topk-sssp"></a>

The `.topksssp`algorithm finds the single-source weighted shortest paths from a source node to its neighbors out to the distance specified by `maxDepth`. It accumulates the path lengths using the edge weights along the paths and then returns a sorted list of the shortest paths.

## `.topksssp`   syntax
<a name="topk-sssp-syntax"></a>

```
CALL neptune.algo.topksssp(
  [source-node list (required)],
  {
    hopCount: maximum hops on the shortest path (required),
    perHopLimits: [a list of the maximum number of nodes to carry forward at each hop (required)],
    edgeLabels: [list of edge labels for filtering (optional)],
    edgeWeightProperty: a numeric edge property to weight the traversal (optional),
    edgeWeightType: numeric type of the specified edgeWeightProperty (optional),
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional, default: outbound),
    costFunction: determines whether the topK distances are in ascending or descending order (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD source, node, distance
RETURN source, node, distance
```

## Inputs for the `topksssp` algorithm
<a name="topksssp-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting location(s) for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source-node list is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source-node list for the algorithm.
+ 

**a configuration object that contains:**
  + **hopCount** *(required)*   –   *type:* positive integer;   *default:* none.

    Restricts the number of hops on a shortest path, which restricts the number of iterations of the SSSP algorithm to be used.
  + **perHopLimits** *(required)*   –   *type: a list of integers*;   *valid values:* positive integers, or `-1` meaning unlimited;   *default:* none.

    Each integer represents the maximum number of candidate vertices to carry to the next hop.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **edgeWeightProperty** *(optional)*   –   *type:* `string`;   *default: none*.

    The edge weight predicate to for traversal. If no property is specified then the algorithm runs unweighted. If multiple properties exist on an edge having the specified name, then one of them is selected at random for the weight value.
  + **edgeWeightType** *(optional)*   –   *type:* `string`;   *valid values:* `"int"`, `"long"`, `"float"`, `"double"`.

    The numeric data type of the values in the property specified by `edgeWeightProperty`. If the `edgeWeightProperty` is not present, `edgeWeightType` is ignored and the algorithm runs unweighted. If an edge contains a property specified by `edgeWeightProperty` that has a numeric type different from what is specified in `edgeWeightType`, the property value is typecast to the type specified by `edgeWeightType`.
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A node label for node filtering. If a node label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
  + **costFunction** *(optional)*   –   *type:* `string`;   *valid values:* `"min"`, `"max"`;   *default: `"min"`*.

    Specifies the ordering of the topK distances returned. A `"min"` value indicates that the topK distances between the source and target vertices should be returned in descending order, whereas a `"max"` value indicates that they should be returned in ascending order.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `topksssp` algorithm
<a name="topksssp-outputs"></a>

For every node that can be reached from the specified source list, the algorithm returns:
+ **source**   –   The source node.
+ **node**   –   A node found traversing from the source.
+ **distance**   –   The distance between the source node and the found node.

## `.topksssp`   query examples
<a name="topksssp-query-examples"></a>

This is a standalone query, where the source node list is explicitly provided in the query:

```
CALL neptune.algo.topksssp(
  ["101"],
  {
    edgeLabels: ["route"],
    hopCount: 3,
    perHopLimits: [10, 100, 1000],
    edgeWeightProperty: "dist",
    edgeWeightType: "int"
  }
)
```

This is a query integration example, where `.topksssp` follows a `MATCH` clause and uses the output of the `MATCH` clause as its source node list:

```
MATCH (n) WHERE id(n) IN ["108","109"]
CALL neptune.algo.topksssp(
  n,
  {
    edgeLabels: ["route"],
    edgeWeightProperty: "dist",
    edgeWeightType: "int",
    hopCount: 5,
    perHopLimits: [5,10,15,20,25]
  }
)
YIELD distance
RETURN n, collect(distance) AS distances'
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.topksssp` output
<a name="topksssp-sample-output"></a>

Here is an example of the output returned by .topksssp when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier}
  --query-string "CALL neptune.algo.topksssp(['101'], {hopCount: 2, perHopLimits: [3, 5]})
      YIELD source, node, distance
      RETURN source, node, distance limit 2" \
  --language open_cypher
  /tmp/out.txt
  
  cat /tmp/out.txt
  {
  "results": [
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "prscore": 0.002498496090993285,
          "degree": 308,
          "lon": 100.747001647949,
          "wccid": 2357352929951779,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "170",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 40.8860015869,
          "elev": 294,
          "longest": 8622,
          "city": "Naples",
          "type": "airport",
          "region": "IT-72",
          "desc": "Naples International Airport",
          "code": "NAP",
          "prscore": 0.001119577675126493,
          "degree": 222,
          "lon": 14.2908000946,
          "wccid": 2357352929951779,
          "country": "IT",
          "icao": "LIRN",
          "runways": 1
        }
      },
      "distance": 2.0
    },
    {
      "source": {
        "~id": "101",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 13.6810998916626,
          "elev": 5,
          "longest": 13123,
          "city": "Bangkok",
          "type": "airport",
          "region": "TH-10",
          "desc": "Suvarnabhumi Bangkok International Airport",
          "code": "BKK",
          "prscore": 0.002498496090993285,
          "degree": 308,
          "lon": 100.747001647949,
          "wccid": 2357352929951779,
          "country": "TH",
          "icao": "VTBS",
          "runways": 2
        }
      },
      "node": {
        "~id": "12",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 40.63980103,
          "elev": 12,
          "longest": 14511,
          "city": "New York",
          "type": "airport",
          "region": "US-NY",
          "desc": "New York John F. Kennedy International Airport",
          "code": "JFK",
          "prscore": 0.002885053399950266,
          "degree": 403,
          "lon": -73.77890015,
          "wccid": 2357352929951779,
          "country": "US",
          "icao": "KJFK",
          "runways": 4
        }
      },
      "distance": 2.0
    }
  ]
}
```

# Egonet algorithms
<a name="egonet-algorithms"></a>

 The EgoNet algorithm finds the (filtered) EgoNet of a vertex to its hopCount-neighbors. An EgoNet, also known as the egocentric network, is a subgraph of a social network that encapsulates the connections of a single individual, known as the ego, and all the people they are socially connected to, known as alters. EgoNet can be used for further analysis in social networks. 

Neptune Analytics supports the following EgoNet algorithms:
+  [.egonet](egonet.md)   –   The EgoNet algorithm finds the (filtered) EgoNet of a vertex to its hopCount-neighbors. An EgoNet, also known as the egocentric network, is a subgraph of a social network that encapsulates the connections of a single individual, known as the ego, and all the people they are socially connected to, known as alters. 
+  [.egonet.edgeList](egonet-edgelist.md)   –   This variant of EgoNet returns the edge list of the ego network instead of the node list, using a different output schema. 

# .egonet
<a name="egonet"></a>

 This `EgoNet` algorithm finds the (filtered) EgoNet of a vertex to its hopCount-neighbors. An EgoNet, also known as the egocentric network, is a subgraph of a social network that encapsulates the connections of a single individual, known as the ego, and all the people they are socially connected to, known as alters. 

 For each hop, the algorithm gets the `topK` (K is specified per hop by the user via `perHopMaxNeighbor`) neighbors those have the highest/lowest (based on the `costFunction`) edge weights, and these neighbors become the source vertices for the next hop. The algorithm assumes the graph is an edge weighted graph. 

## `.egonet`   syntax
<a name="egonet-syntax"></a>

```
CALL neptune.algo.egonet(
  [source/ego-node list (required)],
  {
    hopCount: fixed hops of traversal (required),
    perHopMaxNeighbor: [list of the max number of top neighbor vertices at each hop (required)],
    perHopEdgeWeightProperty: [list of edge weight predicates at each hop (required)],
    edgeWeightType: numeric type of the specified edgeWeightProperty (required),
    edgeLabels: [list of edge labels for filtering (optional)],
    perHopVertexLabel: [list of node labels for filtering at each hop(optional)],
    perHopTraversalDirection: [list of traversal directions at each hop (optional, default: outbound)],
    costFunction: determines whether the edges having the maximum weights or the minimum weight will be included in the EgoNet (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD egoNode, nodeList, edgeList
RETURN egoNode, nodeList, edgeList
```

## Inputs for the `.egonet` algorithm
<a name="egonet-inputs"></a>
+ **a source/ego node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting location(s) for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source-node list is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source-node list for the algorithm.
+ 

**a configuration object that contains:**
  + **hopCount** *(required)*   –   *type:* positive integer;   *valid values:* 1, 2 or 3; other values will be rejected.   *default:* none.

    Restricts the number of hops during traversal.
  + **perHopMaxNeighbor** *(required)*   –   *type: a list of integers*;   *valid values:* positive integers, or `-1` meaning unlimited;   *default:* none.

    Each integer represents the maximum number of candidate vertices to carry to the next hop. It should have the same size as the value of `hopCount`.
  + **perHopEdgeWeightProperty** *(required)*   –   *type: a list of strings*;   *default:* none.

    The edge weight predicate for traversal at each hop. If multiple properties exist on an edge having the specified name, then one of them is selected at random for the weight value. It should have the same size as the value of `hopCount`.
  + **edgeWeightType** *(required)*   –   *type: string*;   *valid values:* "int", "long", "float", "double";   *default:* none.

    The numeric data type of the values in the property specified by `perHopEdgeWeightProperty`. If an edge contains a property specified by `perHopEdgeWeightProperty` that has a numeric type different from what is specified in `edgeWeightType`, the property value is typecast to the type specified by `edgeWeightType`.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **perHopVertexLabel** *(optional)*   –   *type:* a list of vertex label strings;   *default: none*.

     A list of node labels for node filtering at each hop. At each hop, if a node label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list. It should have the same size as the value of `hopCount`.
  + **perHopTraversalDirection** *(optional)*   –   *type:* a list of strings;   *valid values:* "inbound","outbound", or "both"; *default:* outbound.

    The direction of edge to follow at each hop. It should have the same size as the value of `hopCount`.
  + **costFunction** *(optional)*   –   *type:* `string`;   *valid values:* `"min"`, `"max"`;   *default: `"max"`*.

    This determines whether the edges having the maximum weights or the minimum weight will be included in the EgoNet adhering the `perHopMaxNeighbor` limits. A `min` value indicates that the edge with minimum weights will be included in the EgoNet, whereas a `max` value indicates that the edge with maximum weights will be included in the EgoNet.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `.egonet` algorithm
<a name="egonet-outputs"></a>

The .egonet algorithm returns:
+ **egoNode**   –   The ego vertex for the egonet.
+ **nodeList**   –   A list of traversed vertices from the ego vertex.
+ **edgeList**   –   A list of traversed edges from the ego vertex.

## `.egonet`   query examples
<a name="egonet-query-examples"></a>

This is a standalone query, where the source node list is explicitly provided in the query:

```
CALL neptune.algo.egonet(["101"], {
hopCount: 2,
perHopMaxNeighbor: [-1,-1],
edgeLabels: ["route"], 
perHopEdgeWeightProperty: ["dist", "dist"],
edgeWeightType: "int",
perHopVertexLabel: ["airport", "airport"], 
perHopTraversalDirection: ["outbound", "outbound"],
costFunction: "max",
concurrency: 1
})
YIELD egoNode, nodeList, edgeList
RETURN egoNode, nodeList, edgeList
```

This is a query integration example, where `.egonet` follows a `MATCH` clause and uses the output of the `MATCH` clause as its source node list:

```
MATCH (n:airport {code: 'ANC'}) 
CALL neptune.algo.egonet(n, {
hopCount: 2,
perHopMaxNeighbor: [-1,-1],
edgeLabels: ["route"], 
perHopEdgeWeightProperty: ["dist", "dist"],
edgeWeightType: "int",
perHopVertexLabel: ["airport", "airport"], 
perHopTraversalDirection: ["outbound", "outbound"],
costFunction: "max",
concurrency: 1
})
YIELD nodeList, edgeList
RETURN n, nodeList, edgeList
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.egonet` output
<a name="egonet-sample-output"></a>

Here is an example of the output returned by .egonet when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier}
  --query-string "CALL neptune.algo.egonet(["1"], \
  {perHopEdgeWeightProperty: ["dist"], \
  edgeWeightType: "int", \
  hopCount: 1, \
  perHopMaxNeighbor: [3], \
  perHopTraversalDirection: ["both"]}) \
  yield egoNode, edgeList, nodeList \
  return egoNode, edgeList, nodeList" \
  --language open_cypher
  /tmp/out.txt

  cat /tmp/out.txt
{
 "results": [{
 "egoNode": {
 "~id": "1",
 "~entityType": "node",
 "~labels": ["airport"],
 "~properties": {
 "region": "US-GA",
 "runways": 5,
 "country": "US",
 "city": "Atlanta",
 "type": "airport",
 "icao": "KATL",
 "lon": -84.4281005859375,
 "code": "ATL",
 "lat": 33.6366996765137,
 "longest": 12390,
 "elev": 1026,
 "desc": "Hartsfield - Jackson Atlanta International Airport"
 }
},
 "edgeList": [{
 "~id": "neptune_reserved_1_1152921770894950415",
 "~entityType": "relationship",
 "~start": "67",
 "~end": "1",
 "~type": "route",
 "~properties": {
 "dist": 7640
 }
 }, {
 "~id": "neptune_reserved_1_1152922020003053583",
 "~entityType": "relationship",
 "~start": "126",
 "~end": "1",
 "~type": "route",
 "~properties": {
 "dist": 8434
 }
 }, {
 "~id": "neptune_reserved_1_1152921521787699214",
 "~entityType": "relationship",
 "~start": "1",
 "~end": "58",
 "~type": "route",
 "~properties": {
 "dist": 7581
 }
 }],
 "nodeList": [{
 "~id": "126",
 "~entityType": "node",
 "~labels": ["airport"],
 "~properties": {
 "region": "ZA-GT",
 "runways": 2,
 "country": "ZA",
 "city": "Johannesburg",
 "type": "airport",
 "icao": "FAJS",
 "lon": 28.2460002899,
 "code": "JNB",
 "lat": -26.139200210599999,
 "longest": 14495,
 "elev": 5558,
 "desc": "Johannesburg, OR Tambo International Airport"
 }
 }, {
 "~id": "67",
 "~entityType": "node",
 "~labels": ["airport"],
 "~properties": {
 "region": "CN-31",
 "runways": 2,
 "country": "CN",
 "city": "Shanghai",
 "type": "airport",
 "icao": "ZSPD",
 "lon": 121.80500030517599,
 "code": "PVG",
 "lat": 31.1434001922607,
 "longest": 13123,
 "elev": 13,
 "desc": "Shanghai - Pudong International Airport"
 }
 }, {
 "~id": "58",
 "~entityType": "node",
 "~labels": ["airport"],
 "~properties": {
 "region": "AE-DU",
 "runways": 2,
 "country": "AE",
 "city": "Dubai",
 "type": "airport",
 "icao": "OMDB",
 "lon": 55.364398956300001,
 "code": "DXB",
 "lat": 25.2527999878,
 "longest": 13124,
 "elev": 62,
 "desc": "Dubai International Airport"
 }
 }, {
 "~id": "1",
 "~entityType": "node",
 "~labels": ["airport"],
 "~properties": {
 "region": "US-GA",
 "runways": 5,
 "country": "US",
 "city": "Atlanta",
 "type": "airport",
 "icao": "KATL",
 "lon": -84.4281005859375,
 "code": "ATL",
 "lat": 33.6366996765137,
 "longest": 12390,
 "elev": 1026,
 "desc": "Hartsfield - Jackson Atlanta International Airport"
 }
 }]
 }]
}
```

# .egonet.edgeList
<a name="egonet-edgelist"></a>

 This `EgoNet EdgeList` algorithm is the same as the standard `EgoNet` algorithm, except that this variant has a different output schema, which returns the `EgoNet` in an edge list form. 

## `.egonet.edgeList`   syntax
<a name="egonet-syntax"></a>

```
CALL neptune.algo.egonet.edgeList(
  [source/ego-node list (required)],
  {
    hopCount: fixed hops of traversal (required),
    perHopMaxNeighbor: [list of the max number of top neighor vertices at each hop (required)],
    perHopEdgeWeightProperty: [list of edge weight predicates at each hop (required)],
    edgeWeightType: numeric type of the specified edgeWeightProperty (required),
    edgeLabels: [list of edge labels for filtering (optional)],
    perHopVertexLabel: [list of node labels for filtering at each hop(optional)],
    perHopTraversalDirection: [list of traversal directions at each hop (optional, default: outbound)],
    costFunction: determines whether the edges having the maximum weights or the minimum weight will be included in the EgoNet (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD egoNode, source, target, weight
RETURN egoNode, source, target, weight
```

## Inputs for the `.egonet.edgeList` algorithm
<a name="egonet-edgelist-inputs"></a>
+ **a source/ego node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting location(s) for the algorithm.
  + Each starting node triggers its own execution of the algorithm.
  + If the source-node list is empty then the query result is also empty.
  + If the algorithm is called following a `MATCH` clause (this is known as query-algorithm integration), the output of the `MATCH` clause is used as the source-node list for the algorithm.
+ 

**a configuration object that contains:**
  + **hopCount** *(required)*   –   *type:* positive integer;   *valid values:* 1, 2 or 3; other values will be rejected.   *default:* none.

    Restricts the number of hops during traversal.
  + **perHopMaxNeighbor** *(required)*   –   *type: a list of integers*;   *valid values:* positive integers, or `-1` meaning unlimited;   *default:* none.

    Each integer represents the maximum number of candidate vertices to carry to the next hop. It should have the same size as the value of `hopCount`.
  + **perHopEdgeWeightProperty** *(required)*   –   *type: a list of strings*;   *default:* none.

    The edge weight predicate for traversal at each hop. If multiple properties exist on an edge having the specified name, then one of them is selected at random for the weight value. It should have the same size as the value of `hopCount`.
  + **edgeWeightType** *(required)*   –   *type: string*;   *valid values:* "int", "long", "float", "double";   *default:* none.

    The numeric data type of the values in the property specified by `perHopEdgeWeightProperty`. If an edge contains a property specified by `perHopEdgeWeightProperty` that has a numeric type different from what is specified in `edgeWeightType`, the property value is typecast to the type specified by `edgeWeightType`.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **perHopVertexLabel** *(optional)*   –   *type:* a list of vertex label strings;   *default: none*.

     A list of node labels for node filtering at each hop. At each hop, if a node label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list. It should have the same size as the value of `hopCount`.
  + **perHopTraversalDirection** *(optional)*   –   *type:* a list of strings;   *valid values:* "inbound","outbound", or "both"; *default:* outbound.

    The direction of edge to follow at each hop. It should have the same size as the value of `hopCount`.
  + **costFunction** *(optional)*   –   *type:* `string`;   *valid values:* `"min"`, `"max"`;   *default: `"max"`*.

    This determines whether the edges having the maximum weights or the minimum weight will be included in the EgoNet adhering the `perHopMaxNeigbor` limits. A `min` value indicates that the edge with minimum weights will be included in the EgoNet, whereas a `max` value indicates that the edge with maximum weights will be included in the EgoNet.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `.egonet.edgeList` algorithm
<a name="egonet-outputs"></a>

The .egonet.edgeList algorithm returns:
+ **egoNode**   –   The ego vertex of the egonet.
+ **source**   –   The source vertex of an edge in the weighted edge list.
+ **target**   –   The source vertex of an edge in the weighted edge list.
+ **weight**   –   The edge weight of the source-target edge in the weighted edge list.

## `.egonet.edgeList`   query examples
<a name="egonet-edgelist-query-examples"></a>

This ia a standalone query, where the source node list is explicitly provided in the query:

```
CALL neptune.algo.egonet.edgeList(["101"], {
hopCount: 2,
perHopMaxNeighbor: [-1,-1],
edgeLabels: ["route"], 
perHopEdgeWeightProperty: ["dist", "dist"],
edgeWeightType: "int",
perHopVertexLabel: ["airport", "airport"], 
perHopTraversalDirection: ["outbound", "outbound"],
costFunction: "max",
concurrency: 1
})
YIELD egoNode, source, target, weight
RETURN egoNode, source, target, weight
```

This is a query integration example, where `.egonet.edgeList` follows a `MATCH` clause and uses the output of the `MATCH` clause as its source node list:

```
MATCH (n:airport {code: 'ANC'}) 
CALL neptune.algo.egonet.edgeList(n, {
hopCount: 2,
perHopMaxNeighbor: [-1,-1],
edgeLabels: ["route"], 
perHopEdgeWeightProperty: ["dist", "dist"],
edgeWeightType: "int",
perHopVertexLabel: ["airport", "airport"], 
perHopTraversalDirection: ["outbound", "outbound"],
costFunction: "max",
concurrency: 1
})
YIELD source, target, weight
RETURN n, source, target, weight
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.egonet.edgeList` output
<a name="egonet-edgelist-sample-output"></a>

Here is an example of the output returned by .egonet.edgeList when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier}
  --query-string "CALL neptune.algo.egonet(["1"], \
  {perHopEdgeWeightProperty: ["dist", "dist"], \
  edgeWeightType: "int", \
  hopCount: 2, \
  perHopMaxNeighbor: [30, 30], \
  perHopTraversalDirection: ["both", "both"]}) \
  YIELD egoNode, source, target, weight \
  RETURN egoNode, source, target, weight limit 2" \
  --language open_cypher
  /tmp/out.txt

  cat /tmp/out.txt
{
 "results": [{
 "egoNode": {
 "~id": "1",
 "~entityType": "node",
 "~labels": ["airport"],
 "~properties": {
 "region": "US-GA",
 "runways": 5,
 "country": "US",
 "city": "Atlanta",
 "type": "airport",
 "icao": "KATL",
 "lon": -84.4281005859375,
 "code": "ATL",
 "lat": 33.6366996765137,
 "longest": 12390,
 "elev": 1026,
 "desc": "Hartsfield - Jackson Atlanta International Airport"
 }
 },
 "source": {
 "~id": "1",
 "~entityType": "node",
 "~labels": ["airport"],
 "~properties": {
 "region": "US-GA",
 "runways": 5,
 "country": "US",
 "city": "Atlanta",
 "type": "airport",
 "icao": "KATL",
 "lon": -84.4281005859375,
 "code": "ATL",
 "lat": 33.6366996765137,
 "longest": 12390,
 "elev": 1026,
 "desc": "Hartsfield - Jackson Atlanta International Airport"
 }
 },
 "target": {
 "~id": "27",
 "~entityType": "node",
 "~labels": ["airport"],
 "~properties": {
 "region": "US-CA",
 "runways": 3,
 "country": "US",
 "city": "Long Beach",
 "type": "airport",
 "icao": "KLGB",
 "lon": -118.15200040000001,
 "code": "LGB",
 "lat": 33.817699429999998,
 "longest": 10003,
 "elev": 60,
 "desc": "Long Beach Airport"
 }
 },
 "weight": 2460
 }, {
 "egoNode": {
 "~id": "1",
 "~entityType": "node",
 "~labels": ["airport"],
 "~properties": {
 "region": "US-GA",
 "runways": 5,
 "country": "US",
 "city": "Atlanta",
 "type": "airport",
 "icao": "KATL",
 "lon": -84.4281005859375,
 "code": "ATL",
 "lat": 33.6366996765137,
 "longest": 12390,
 "elev": 1026,
 "desc": "Hartsfield - Jackson Atlanta International Airport"
 }
 },
 "source": {
 "~id": "134",
 "~entityType": "node",
 "~labels": ["airport"],
 "~properties": {
 "region": "PE-LIM",
 "runways": 1,
 "country": "PE",
 "city": "Lima",
 "type": "airport",
 "icao": "SPIM",
 "lon": -77.1143035889,
 "code": "LIM",
 "lat": -12.021900176999999,
 "longest": 11506,
 "elev": 113,
 "desc": "Lima, Jorge Chavez International Airport"
 }
 },
 "target": {
 "~id": "1",
 "~entityType": "node",
 "~labels": ["airport"],
 "~properties": {
 "region": "US-GA",
 "runways": 5,
 "country": "US",
 "city": "Atlanta",
 "type": "airport",
 "icao": "KATL",
 "lon": -84.4281005859375,
 "code": "ATL",
 "lat": 33.6366996765137,
 "longest": 12390,
 "elev": 1026,
 "desc": "Hartsfield - Jackson Atlanta International Airport"
 }
 },
 "weight": 3189
 }]
}
```

# Centrality algorithms in Neptune Analytics
<a name="centrality-algorithms"></a>

Centrality algorithms utilize the topology of a network to determine the relative importance or influence of a specific node within the graph. By measuring the relative importance of a node or edge within a network, centrality values can indicate which elements in a graph play a critical role in that network.

By identifying the most influential or important nodes within a network, centrality algorithms can provide insights about key players or critical points of interaction. This is valuable in social network analysis, where it helps pinpoint influential individuals, and in transportation networks, where it aids in identifying crucial hubs for efficient routing and resource allocation.

Different types of centrality algorithms use different techniques to measure the importance of a node. Understanding how an algorithm calculates centrality is important to understanding the meaning of its outputs.

In addition to returning centrality data to the client, Neptune Analytics provides mutate variations of the centrality algorithms which store the calculated centrality values as vertex properties in the graph.

Neptune Analytics supports four centrality algorithms along with their mutate variants:
+ [`degree`](degree.md)   –   This measures a node's centrality by the number of edges connected to it, and can therefore be used to find the most connected nodes in a network.
+ [`degree.mutate`](degree-mutate.md)   –   The degree centrality mutate algorithm measures the number of incident edges of each vertex it traverses and writes that calculated degree value as a property of the vertex.
+ [`pageRank`](page-rank.md)   –   This is an iterative algorithm that measures a node's centrality by the number and quality of incident edges and adjacent vertices. The centrality of a node connected to a few important nodes may therefore be higher than that of a node connected to many less important nodes. The output of this algorithm is a value that indicates the importance of a given node relative to the other nodes in the graph.
+ [`pageRank.mutate`](page-rank-mutate.md)   –   This algorithm stores the calculated PageRank of a given node as a property of the node.
+ [`closenessCentrality`](closeness-centrality.md)   –   This algorithm computes the closeness centrality (CC) metric of nodes in a graph. The closeness centrality metric of a vertex is a positive measure of how close it is to all other vertices, or how central it is in the graph. Because it indicates how quickly all other nodes in a network can be reached from a given node, it can be used in transportation networks to identify key hub locations, and in disease-spread modeling to pinpoint central locations for targeted intervention efforts.
+ [`closenessCentrality.mutate`](closeness-centrality-mutate.md)   –   This algorithm computes the closeness centrality (CC) metric of vertices in a graph and writes them as a property of each vertex.

# Degree centrality algorithm
<a name="degree"></a>

The `.degree` centrality algorithm counts the number of incident edges at each node that it traverses. This measure of how connected the node is can in turn indicate the node's importance and level of influence in the network.

The `.degree` algorithm is used in social networks to identify popular individuals with many connections, in transportation networks to locate central hubs with numerous roads leading to and from them, and in web analysis to find influential web pages with many incoming links.

The time complexity of `.degree` is `O(|E|)`, where `|E|` is the number of edges in the graph. The space complexity is `O(|V|)`, where `|V|` is the number of vertices in the graph.

## `.degree`   syntax
<a name="degree-syntax"></a>

```
CALL neptune.algo.degree(
  [node list (required)],
  {
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabels: [a list of vertex labels for filtering (optional)], 
    vertexLabel: [a node label for filtering (optional) [deprecated]],
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD node, degree
RETURN node, degree
```

## Inputs for the `.degree` algorithm
<a name="degree-inputs"></a>
+ **a node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes for which to return the edge count (degree). If an empty list is provided, the query result is also empty.

  If the algorithm is called following a `MATCH` clause (query integration), the result returned by the `MATCH` clause is taken as the node list. 
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabels**   *(optional)*   –   *type:* a list of vertex label strings;   *example:* `["airport", ...]`;   *default:* no vertex filtering.

    To filter on one more vertex labels, provide a list of the ones to filter on. If no `vertexLabels` field is provided then all vertex labels are processed during traversal.
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    [deprecated]
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.degree`   outputs
<a name="degree-outputs"></a>
+ **node**   –   A list of the requested nodes. If `vertexLabels` or `vertexLabel` is present, only the requested nodes that match in `vertexLabels ` or `vertexLabel` value are included.
+ **degree**   –   A list of corresponding degree values for the nodes with respect to edges with a label specified in `edgeLabels`.

If the input vertex list is empty, the output is empty.

## Query examples for `.degree`
<a name="degree-query-examples"></a>

This is a standalone example, where the source node list is explicitly specified in the query:

```
CALL neptune.algo.degree(["101"], {edgeLabels: ["route"]})
```

This is a more complicated standalone query submitted using the AWS CLI:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string 'CALL neptune.algo.degree(
        ["101", "102", "103"],
        {
          edgeLabels: ["route"],
          vertexLabel: ["airport"],
          traversalDirection: "inbound",
          concurrency: 2
        }
      )
      YIELD node, degree
      RETURN node, degree' \
  --language open_cypher \
  /tmp/out.txt
```

This is a query integration example with frontier injection, where `.degree` follows a `MATCH` clause and finds the degree value for all vertices returned by `MATCH(n:airport)`:

```
MATCH(n:airport)
CALL neptune.algo.degree(n, {edgeLabels: ["route"]})
YIELD degree
RETURN n, degree'
```

This is an example of multiple `.degree` invocations chained together, where the output of one invocation serves as the input of another:

```
CALL neptune.algo.degree(
  ["108"],
  {
    edgeLabels: ["route"],
    vertexLabel: ["airport"]
  }
)
YIELD node
CALL neptune.algo.degree(
  node,
  {
    edgeLabels: ["route"],
    vertexLabel: ["airport"]
  }
)
YIELD node AS node2 WITH id(node2) AS id
RETURN id
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample   `.degree`   output
<a name="degree-sample-output"></a>

Here is an example of the output returned by .degree when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string 'MATCH (n)
      CALL neptune.algo.degree(n)
      YIELD node, degree
      RETURN node, degree
      LIMIT 2' \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    {
      "node": {
        "~id": "10",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 38.94449997,
          "elev": 313,
          "longest": 11500,
          "city": "Washington D.C.",
          "type": "airport",
          "region": "US-VA",
          "desc": "Washington Dulles International Airport",
          "code": "IAD",
          "prscore": 0.002264724113047123,
          "lon": -77.45580292,
          "wccid": 2357352929951779,
          "country": "US",
          "icao": "KIAD",
          "runways": 4
        }
      },
      "degree": 312
    },
    {
      "node": {
        "~id": "12",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 40.63980103,
          "elev": 12,
          "longest": 14511,
          "city": "New York",
          "type": "airport",
          "region": "US-NY",
          "desc": "New York John F. Kennedy International Airport",
          "code": "JFK",
          "prscore": 0.002885053399950266,
          "lon": -73.77890015,
          "wccid": 2357352929951779,
          "country": "US",
          "icao": "KJFK",
          "runways": 4
        }
      },
      "degree": 403
    }
  ]
}
```

# Degree mutate centrality algorithm
<a name="degree-mutate"></a>

The `.degree.mutate` centrality algorithm counts the number of incident edges of every node in the graph. This measure of how connected the node is can in turn indicate the node's importance and level of influence in the network. The `.degree.mutate` algorithm then stores each node's calculated degree value as a property of the node.

The algorithm returns a single success flag (`true` or `false`), which indicates whether the writes succeeded or failed.

## `.degree.mutate`   syntax
<a name="degree-mutate-syntax"></a>

```
CALL neptune.algo.degree.mutate(
  {
    writeProperty: A name for the new node property where the degree values will be written,
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabels: [a list of vertex labels for filtering (optional)],
    vertexLabel: "a node label for filtering (optional) [deprecated]",
    traversalDirection: traversal direction (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD success
RETURN success
```

## `.degree.mutate`   inputs
<a name="degree-mutate-inputs"></a>

**a configuration object that contains:**
+ **writeProperty** *(required)*   –   *type:* `string`;   *default: none*.

  A name for the new node property that will contain the computed degree values.
+ **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

  To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
+ **vertexLabels**   *(optional)*   –   *type:* a list of vertex label strings;   *example:* `["airport", ...]`;   *default:* no vertex filtering.

  To filter on one more vertex labels, provide a list of the ones to filter on. If no `vertexLabels` field is provided then all vertex labels are processed during traversal.
+ **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

   [deprecated] A node label for node filtering. Note that it is deprecated. If vertexLabels is provided, vertexLabel is ignored.
+ **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

  The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
+ **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

  Controls the number of concurrent threads used to run the algorithm.

   If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Output of the `.degree.mutate` algorithm
<a name="degree-mutate-output"></a>

The computed degree values are written to a new vertex property using the property name specified by the `writeProperty` input parameter.

A single Boolean `success` value (`true` or `false`) is returned, which indicates whether or not the writes succeeded.

## `.degree.mutate` query examples
<a name="degree-mutate-query-examples"></a>

The example below is a standalone example, where the source vertex list is explicitly provided in the query.

This query writes the degree values of all nodes in the graph to a new vertex property called `DEGREE`:

```
CALL neptune.algo.degree.mutate({writeProperty: "DEGREE", edgeLabels: ["route]})
```

After using the mutate algorithm, the newly written properties can then be accessed in subsequent queries. For example, after the mutate algorithm call above, you could use the following query to retrieve the `.degree` property of specific nodes:

```
MATCH (n) WHERE id(n) IN ["101", "102", "103"]
RETURN n.DEGREE'
```

## Sample output from `.degree.mutate`
<a name="degree-mutate-query-examples"></a>

Here is an example of the output returned by .degree.mutate when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.degree.mutate({writeProperty: 'degree'}) YIELD success RETURN success" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt   
{
  "results": [
    { "success": true }
  ]
}
```

# PageRank centrality algorithm
<a name="page-rank"></a>

PageRank is an algorithm originally developed by Larry Page and Sergey Brin, co-founders of Google. It was originally developed to rank web pages in search engine results. The PageRank score for a given node is calculated based on the number and quality of the edges pointing to that node, as well as the importance of the nodes that are connected to it. The PageRank algorithm assigns a higher score to nodes that are linked to other high-scoring nodes, and a lower score to nodes that are linked to low-scoring nodes.

The output of PageRank can be visualized as a ranking metric for the importance of a node within a given graph, with the most important nodes having the highest score, and the least important node having the lowest score. PageRank is used in search engines to rank web pages based on their importance and influence, in citation networks to identify highly cited scientific papers, and in recommendation systems to suggest popular and relevant content to users.

The space complexity is O(\$1V\$1), where \$1V\$1 is the number of vertices in the graph.

## Personalized PageRank
<a name="personalized-page-rank"></a>

 Personalized PageRank is a variation of the original PageRank algorithm. It is generally used to measure the importance of vertices in a graph. It tailors the ranking process to individual users or specific topics. Compared to PageRank, the result of personalized pagerank is a more customized ranking of web pages or nodes in a graph, reflecting individual interests or specific areas of focus. This approach is particularly useful in recommendation systems, personalized search results, and analyzing large-scale networks with diverse content for a specific focus. 

### Example scenario: Online retail platform
<a name="personalized-pageRank-example-1"></a>

 Imagine an online retail platform where each product is a node, and customer purchases (or views) between products are edges. 

**Regular PageRank**
+  Objective: Rank products based on their general popularity across all customers. 
+  Result: Products that are frequently purchased or viewed by many customers will receive higher PageRank scores. This helps identify the most popular products across the entire platform. 

**Personalized PageRank**
+  Objective: Rank products based on their relevance to a specific customer's shopping behavior. 
+  Inputs: 
  +  Source Nodes: A list of products that the customer has previously purchased or shown interest in. 
  +  Source Weights: Optional weights indicating the relative importance of each source product (e.g., higher weight for recently purchased items). 
+  Result: Products that are not only popular but also closely related to the customer's specific shopping behavior will receive higher scores. This helps the platform recommend new products that are more likely to interest the customer. 

### Example scenario: Organizational network security
<a name="personalized-pageRank-example-2"></a>

 Imagine a network of computers within an organization where each computer is a node, and communication paths (like data transfers or network connections) between computers are edges. 

**Regular PageRank**
+  Objective: Rank computers based on their general importance within the organizational network. 
+  Result: Computers that have a high number of connections to other computers will receive higher PageRank scores. This helps identify critical nodes in the network that, if compromised, could have a significant impact on the entire network. 

**Personalized PageRank**
+  Objective: Rank computers based on their relevance to a specific security concern or department within the organization. 
+  Inputs: 
  +  Source Nodes: A list of computers that are known to handle sensitive data or are critical to a specific department (e.g., HR, Finance). 
  +  Source Weights: Optional weights indicating the relative importance of each source computer (e.g., higher weight for computers handling more sensitive data). 
+  Result: Computers that are not only well-connected but also closely related to the specific security concern or department will receive higher scores. This helps the cybersecurity team prioritize monitoring and protection efforts on the most critical nodes. 

### Example scenario: Insurance policyholder network
<a name="personalized-pageRank-example-3"></a>

 Imagine a network of policyholders where each policyholder is a node, and the relationships (like shared claims, referrals, or common risk factors) between policyholders are edges. 

**Regular PageRank**
+  Objective: Rank policyholders based on their general importance within the insurance network. 
+  Result: Policyholders who have a high number of connections to other policyholders (e.g., through shared claims or referrals) will receive higher PageRank scores. This helps identify influential policyholders who may have a significant impact on the network. 

**Personalized PageRank**
+  Objective: Rank policyholders based on their relevance to a specific risk profile or insurance product. 
+  Inputs: 
  +  Source Nodes: A list of policyholders that fit a specific risk profile or are relevant to a particular insurance product (e.g., high-risk drivers for auto insurance). 
  +  Source Weights: Optional weights indicating the relative importance of each source policyholder (e.g., higher weight for policyholders with more recent claims). 
+  Result: Policyholders who are not only well-connected but also closely related to the specific risk profile or insurance product will receive higher scores. This helps the insurance company tailor its risk assessment and underwriting processes more effectively. 

 The additional inputs to the algorithm are a list of vertices to be personalized on (`sourceNodes`) and optionally the weight distribution among those vertices (`sourceWeights`). If given, the weights are normalized before pagerank computation. 

**Note**  
 Neptune Analytics allows up to 8192 vertices in the personalization vector, sourceNodes. 

## `.pageRank`  syntax
<a name="page-rank-syntax"></a>

```
CALL neptune.algo.pageRank(
  [node list (required)],
  {
    numOfIterations: a small positive integer like 20 (optional),
    dampingFactor: a positive float less than or equal to 1.0, like 0.85 (optional)
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    concurrency: number of threads to use (optional),
    traversalDirection: the direction of edge to follow (optional),
    tolerance: a floating point number between 0.0 and 1.0 (inclusive)(optional),
    edgeWeightProperty: the weight property to consider for weighted pageRank computation (optional),
    edgeWeightType: The type of values associated with the edgeWeightProperty argument (optional),
    sourceNodes: [A list of node IDs to personalize on (optional)],
    sourceWeights: [A list of non-negative weights for the sourceNodes (optional)]
  }
)
YIELD node, rank
RETURN node, rank
```

## `.pageRank`  inputs
<a name="page-rank-inputs"></a>
+ **a node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes for which to return the page rank values. If an empty list is provided, the query result will also be empty.

  If the algorithm is called following a `MATCH` clause (query integration), the result returned by the `MATCH` clause is taken as the node list.
+ 

**a configuration object that contains:**
  + **numOfIterations** *(optional)*   –   *type:* a positive integer greater than zero;   *default: 20*.

    The number of iterations to perform to reach convergence. A number between 10 and 20 is recommended.
  + **dampingFactor** *(optional)*   –   *type:* a positive floating-point number less than or equal to `1.0`;   *default: *` 0.85`.

    A positive floating-point damping factor between 0.0 and 1.0 that expresses the probability, at any step, that the node will continue.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A vertex label for vertex filtering. If a vertex label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"outbound"` or `"inbound"`.
  + **tolerance** *(optional)*   –   a floating point number between 0.0 and 1.0 (inclusive). When the average difference in the pageRank values of two iterations drops below `tolerance`, the algorithm stops, regardless of whether the `numOfIterations` is reached. Default value is `0.000001 (1e-6)`.
    + Note that this tolerance computation is equivalent to L1 error or sum of Mean Absolute Difference (MAE)s.
    + The stopping condition is `l1_error_sum < tolerance * numNodes`, equivalent to `l1_error_sum/numNodes < tolerance`.
  + **edgeWeightProperty** *(optional)*   –   *type:* `string`   *default: none*.

    The weight property to consider for weighted pageRank computation.
  + **edgeWeightType** *(optional) - required if `edgeWeightProperty` is present*   –   *type:* `string`;   *default: none*.

    The type of values associated with the edgeWeightProperty argument, specified as a string. *valid values*: "int", "long", "float", "double".
    + If the edgeWeightProperty is not given, the algorithm runs unweighted no matter if the edgeWeightType is given or not.
    + Note that if multiple properties exist on the edge with the name specified by edgeWeightProperty, one of those property values will be sampled at random.
  + **sourceNodes** *(optional) - required if running personalized PageRank*   –   *type:* `list`;   *default: none*.

    A personalization vertex list ["101", ...]
    +  Can include 1 to 8192 vertices. 
    +  If a `vertexLabel` is provided, nodes that do not have the given `vertexLabel` are ignored. 
  + **sourceWeights** *(optional)*   –   *type:* `list`;   *default: none*.

    A personalization weight list. The weight distribution among the personalized vertices.
    +  If not provided, the default behavior is uniform distribution among the vertices given in `sourceNodes`. 
    +  There must be at least one non-zero weight in the list. 
    +  The length of the sourceWeights list must match the `sourceNodes` list. 
    +  The mapping of personalization vertex and weight lists are one to one. The first value in the weight list corresponds to the weight of first vertex in the vertex list, second value is for the second vertex, etc. 
    +  The weights can be one of `int`, `long`, `float`, or `double` types. 

## Outputs for the `.pageRank` algorithm
<a name="page-rank-outputs"></a>
+ **node**   –   A key column of the input nodes.
+ **rank**   –   A key column of the corresponding page-rank scores for those nodes.

If the input nodes list is empty, the output is empty.

## Query examples for `.pageRank`
<a name="page-rank-query-examples"></a>

This is a standalone example, where the input vertex list is explicitly specified in the query.

```
CALL neptune.algo.pageRank(
  ["101"],
  {
    numOfIterations: 1,
    dampingFactor: 0.85,
    edgeLabels: ["route"]
  }
)
```

This is a query integration example, where `.pageRank` follows a `MATCH` clause and uses frontier injection to take the output of the `MATCH` clause as its list of input nodes:

```
MATCH (n)
CALL neptune.algo.pageRank(
  n,
  {
    dampingFactor: 0.85,
    numOfIterations: 1,
    edgeLabels: ["route"]
  }
)
YIELD rank
RETURN n, rank
```

This query is an example of constraining the results of `.pageRank` based on the PageRank values, and returning them in ascending order:

```
MATCH (n)
CALL neptune.algo.pageRank(
  n,
  {
    numOfIterations: 10,
    dampingFactor: 0.85,
    tolerance: 0.0001,
    vertexLabel: "airport",
    edgeLabels: ["route"]
  }
)
YIELD rank WHERE rank > 0.004
RETURN n, rank ORDER BY rank
```

## Query examples for Personalized `.pageRank`
<a name="personalized-page-rank-query-examples"></a>

 Personalized PageRank applies the same integration and constraints. Here are some examples that pass personalization-specific configurations. 

 This is a query integration example, where .pageRank follows a MATCH clause and uses frontier injection to take the output of the MATCH clause as its input list. We use nodes “101” and “102” as personalization nodes with "1" and "1.5" weights as weights respectively: 

```
MATCH (n)
CALL neptune.algo.pageRank( n, {
    tolerance: 0.001,
    edgeLabels:["route”],
    sourceNodes:["101", "102"],
    sourceWeights:[1, 1.5]
} )
YIELD node, rank
RETURN node, rank
```

 This is an example of where only source nodes is provided. The weights of ”101” and ”102” will be "1" and "1" (same, uniform) respectively. 

```
MATCH (n)
CALL neptune.algo.pageRank( n, {
    tolerance: 0.001,
    edgeLabels:["route”],
    sourceNodes:["101", "102"],
} )
YIELD node, rank
RETURN node, rank
```

 This is an example where the weights are given as integral numbers. Note that this would yield the same result as the first example in which the weights were ("1" and "1.5"): 

```
MATCH(n)
CALL neptune.algo.pageRank( n, {
    tolerance: 0.001,
    edgeLabels:["route”],
    sourceNodes:["101", "102"],
    sourceWeights:[2, 3]
} )
YIELD node, rank
RETURN node, rank
```

## Sample   `.pageRank`   output
<a name="page-rank-sample-output"></a>

Here is an example of the output returned by .pageRank when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.pageRank(n) YIELD node, rank RETURN node, rank LIMIT 2" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    {
      "node": {
        "~id": "2709",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 65.4809036254883,
          "elev": 49,
          "longest": 8711,
          "city": "Nadym",
          "type": "airport",
          "region": "RU-YAN",
          "desc": "Nadym Airport",
          "code": "NYM",
          "lon": 72.6988983154297,
          "country": "RU",
          "icao": "USMM",
          "runways": 1
        }
      },
      "rank": 0.00016044313088059425
    },
    {
      "node": {
        "~id": "3747",
        "~entityType": "node",
        "~labels": ["continent"],
        "~properties": {
          "code": "AN",
          "type": "continent",
          "desc": "Antarctica"
        }
      },
      "rank": 0.0000404242
    }
  ]
}
```

# PageRank mutate centrality algorithm
<a name="page-rank-mutate"></a>

The ranking metric computed by `.pageRank.mutate` can indicate the importance of a node within a given graph, with the most important nodes having the highest score, and the least important node having the lowest score. PageRank is used in search engines to rank web pages based on their importance and influence, in citation networks to identify highly cited scientific papers, and in recommendation systems to suggest popular and relevant content to users.

The mutate variant of the PageRank algorithm performs the PageRank calculation over the entire graph unless the configuration parameters establish a filter, and each traversed node's calculated PageRank value is stored on that node as a property.

## `pageRank.mutate`  inputs
<a name="page-rank-mutate-inputs"></a>

Inputs for the `pageRank.mutate` algorithm are passed in a configuration object parameter that contains:
+ **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

  To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
+ **writeProperty** *(required)*   –   *type:* `string`;   *default: none*.

  A name for the new vertex property that will contain the computed PageRank values. If a property of that name already exists, it is overwritten.
+ **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

  A vertex label for vertex filtering. If a vertex label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
+ **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

  The direction of edge to follow. Must be one of: `"outbound"` or `"inbound"`.
+ **numOfIterations** *(optional)*   –   *type:* a positive integer greater than zero;   *default: 20*.

  The number of iterations to perform to reach convergence. A number between 10 and 20 is recommended.
+ **dampingFactor** *(optional)*   –   *type:* a positive floating-point number less than or equal to `1.0`;   *default: *` 0.85`.

  A positive floating-point damping factor between 0.0 and 1.0 that expresses the probability, at any step, that the node will continue.
+ **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

  Controls the number of concurrent threads used to run the algorithm.

   If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.
+ **tolerance** *(optional)*   –   a floating point number between 0.0 and 1.0 (inclusive). When the average difference in the pageRank values of two iterations drops below `tolerance`, the algorithm stops, regardless of whether the `numOfIterations` is reached. Default value is `0.000001 (1e-6)`.
  + Note that this tolerance computation is equivalent to L1 error or sum of Mean Absolute Difference (MAE)s.
  + The stopping condition is `l1_error_sum < tolerance * numNodes`, equivalent to `l1_error_sum/numNodes < tolerance`.
+ **edgeWeightProperty** *(optional)*   –   *type:* `string`   *default: none*.

  The weight property to consider for weighted pageRank computation.
+ **edgeWeightType** *(optional) - required if `edgeWeightProperty` is present*   –   *type:* `string`;   *default: none*.

  The type of values associated with the edgeWeightProperty argument, specified as a string. *valid values*: "int", "long", "float", "double".
  + If the edgeWeightProperty is not given, the algorithm runs unweighted no matter if the edgeWeightType is given or not.
  + Note that if multiple properties exist on the edge with the name specified by edgeWeightProperty, one of those property values will be sampled at random.
+ **sourceNodes** *(optional) - required if running personalized PageReank*   –   *type:* `list`;   *default: none*.

  A personalization vertex list ["101", ...]
  +  Can include 1 to 8192 vertices. 
  +  If a `vertexLabel` is provided, nodes that do not have the given `vertexLabel` are ignored. 
+ **sourceWeights** *(optional)*   –   *type:* `list`;   *default: none*.

  A personalization weight list. The weight distribution among the personalized vertices.
  +  If not provided, the default behavior is uniform distribution among the vertices given in `sourceNodes`. 
  +  There must be at least one non-zero weight in the list. 
  +  The length of the sourceWeights list must match the `sourceNodes` list. 
  +  The mapping of personalization vertex and weight lists are one to one. The first value in the weight list corresponds to the weight of first vertex in the vertex list, second value is for the second vertex, etc. 
  +  The weights can be one of `int`, `long`, `float`, or `double` types. 

## Outputs for the `pageRank.mutate` algorithm
<a name="page-rank-mutate-outputs"></a>

The computed PageRank values are written to a new vertex property on each node using the property name specified by the `writeProperty` input parameter.

A single Boolean `success` value (`true` or `false`) is returned, which indicates whether or not the writes succeeded.

## Query example for `pageRank.mutate`
<a name="page-rank-mutate-query-example"></a>

The example below computes the PageRank score of every vertex in the graph, and writes that score to a new vertex property named `P_RANK`:

```
CALL neptune.algo.pageRank.mutate(
  {
    writeProperty:"P_RANK",
    dampingFactor: 0.85,
    numOfIterations: 1,
    edgeLabels: ["route"]
  }
)
```

This query illustrates how you could then access the PageRank values in the `P_RANK` vertex property. It counts how many nodes have a `P_RANK` property value greater than the "SEA" node's `P_RANK` property value:

```
MATCH (n) WHERE n.code = "SEA" WITH n.P_RANK AS lowerBound
MATCH (m) WHERE m.P_RANK > lowerBound
RETURN count(m)
```

## Query examples for Personalized `pageRank.mutate`
<a name="personalized-page-rank-mutate-query-examples"></a>

 Personalized PagerRank applies the same integration and constraints. Here are some examples that pass personalization-specific configurations. 

 The example below computes the Personalized PageRank score of every vertex in the graph, and writes that score to a new vertex property named "PRS\$1RANK": 

```
CALL neptune.algo.pageRank.mutate(
  {
    writeProperty:"PRS_RANK",
    sourceNodes:[”101”, “103”, “105”],
    sourceWeights:[5, 3, 2],
    dampingFactor: 0.85,
    numOfIterations: 1,
    edgeLabels: ["route"]
  }
)
```

## Sample   `.pageRank.mutate`   output
<a name="page-rank-mutate-sample-output"></a>

Here is an example of the output returned by .pageRank.mutate when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.pageRank.mutate({writeProperty: 'prscore'}) YIELD success RETURN success" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    { "success": true }
  ]
}
```

# Closeness centrality algorithm
<a name="closeness-centrality"></a>

The closeness centrality algorithm computes a Closeness Centrality (CC) metric for specified nodes in a graph. The CC metric of a node can be used as a positive measure of how close it is to all other nodes or how central it is in the graph.

The CC metric can be interpreted to show how quickly all other nodes in a network can be reached from a given node, and how important it is as a central hub for rapid information flow. It can be used in transportation networks to identify key hub locations, and in disease-spread modeling to pinpoint central points for targeted intervention efforts.

The closeness centrality (CC) score of a node is calculated based on the sum of its distances to all other nodes. The CC score itself is the inverse of that number; in other words, one divided by that sum. In practice, the calculation is commonly normalized to use the average length of the shortest paths rather than the actual sum of their lengths.

## `.closenessCentrality`  syntax
<a name="closeness-centrality-syntax"></a>

```
CALL neptune.algo.closenessCentrality(
  [node list (required)],
  {
    numSources: the number of BFS sources to use for computing the CC (required)
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional),
    normalize: Boolean, set to false to prevent normalization (optional)
    concurrency: number of threads to use (optional)
  }
)
YIELD node, score
RETURN node, score
```

## Inputs for the `.closenessCentrality` algorithm
<a name="closenessCentrality-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting locations for the algorithm. Each node in the list triggers an execution of the algorithm. If an empty list is provided, the query result is also empty.

  If the algorithm is called following a `MATCH` clause (query algo integration), the source query list is the result returned by the `MATCH` clause. 
+ 

**a configuration object that contains:**
  + **numSources** *(required)*   –   *type:* `unsigned long`;   *default: none*.

    The number of BFS sources for computing approximate Closeness Centrality (CC). To compute exact closeness centrality, set `numSources` to a number larger than number of nodes, such as `maxInt`.

    Because of the computational complexity of the algorithm for large graphs, it's generally best to specify a number in the order of thousands to ten thousands, such as 8,192.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **normalize** *(optional)*   –   *type:* `Boolean`;   *default:*` true`.

    You can use this field to turn off normalization, which is on by default. Without normalization, only centrality scores of nodes within the same component can be meaningfully compared. Normalized scores can be compared across different connected components.

    The CC is normalized using the Wasserman-Faust normalization formula for unconnected graphs. If there are `n` vertices reachable from vertex `u` (including vertex `u` itself), the Wasserman-Faust normalized closeness centrality score of vertex `u` is calculated as follows:

    ```
    (n-1)^2 / (|V| - 1) * sum(distance from u to these n vertices)
    ```

    Without normalization, the centrality score of vertex `u` is calculated as:

    ```
    (|V| - 1) / sum(distance from u to all other vertices in the graph)
    ```
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A node label for node filtering. If a node label is provided, nodes matching the label are the only nodes that are included, including nodes in the input list.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `.closenessCentrality` algorithm
<a name="closenessCentrality-outputs"></a>
+ **node**   –   A key column of the input nodes.
+ **score**   –   A key column of the corresponding closeness-centrality (CC) scores for those nodes.

If the input node list is empty, the output is empty.

## `.closenessCentrality`  query examples
<a name="closenessCentrality-query-examples"></a>

This is a standalone example, where the source node list is explicitly provided in the query:

```
CALL neptune.algo.closenessCentrality(
  ["101"],
  {
    numSources: 10,
    edgeLabels: ["route"],
    vertexLabel: "airport",
    traversalDirection: "outbound",
    normalize: true,
    concurrency: 1
  }
)
YIELD node, score
RETURN node, score
```

This is a query integration example, where `.closenessCentrality.mutate` follows a `MATCH` clause and uses the output of the `MATCH` clause as its list of source nodes:

```
Match (n)
CALL neptune.algo.closenessCentrality(
  n,
  {
    numSources: 10,
    edgeLabels: ["route"],
    vertexLabel: "airport",
    traversalDirection: "outbound",
    normalize: true,
    concurrency: 1
  }
)
YIELD score
RETURN n, score
```

This is a query integration example that returns the nodes with the 10 highest CC scores:

```
CALL neptune.algo.closenessCentrality(
  n,
  {
    edgeLabels: ["route"],
    numSources: 10
  }
)
YIELD score
RETURN n, score
ORDER BY score DESC
LIMIT 10"
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample   `.closenessCentrality`   output
<a name="page-rank-sample-output"></a>

Here is an example of the output returned by .closenessCentrality when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.closenessCentrality(n, {numSources: 10}) YIELD node, score RETURN node, score limit 2" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    {
      "node": {
        "~id": "10",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 38.94449997,
          "elev": 313,
          "longest": 11500,
          "city": "Washington D.C.",
          "type": "airport",
          "region": "US-VA",
          "desc": "Washington Dulles International Airport",
          "code": "IAD",
          "prscore": 0.002264724113047123,
          "degree": 312,
          "lon": -77.45580292,
          "wccid": 2357352929951779,
          "country": "US",
          "icao": "KIAD",
          "runways": 4
        }
      },
      "score": 0.20877772569656373
    },
    {
      "node": {
        "~id": "12",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 40.63980103,
          "elev": 12,
          "longest": 14511,
          "city": "New York",
          "type": "airport",
          "region": "US-NY",
          "desc": "New York John F. Kennedy International Airport",
          "code": "JFK",
          "prscore": 0.002885053399950266,
          "degree": 403,
          "lon": -73.77890015,
          "wccid": 2357352929951779,
          "country": "US",
          "icao": "KJFK",
          "runways": 4
        }
      },
      "score": 0.2199712097644806
    }
  ]
}
```

# Closeness centrality mutatealgorithm
<a name="closeness-centrality-mutate"></a>

The closeness centrality mutate algorithm computes a Closeness Centrality (CC) metric for specified nodes in a graph. The CC metric of a node can be used as a positive measure of how close it is to all other nodes or how central it is in the graph.

The CC metric can be interpreted to show how quickly all other nodes in a network can be reached from a given node, and how important it is as a central hub for rapid information flow. It can be used in transportation networks to identify key hub locations, and in disease-spread modeling to pinpoint central points for targeted intervention efforts.

The closeness centrality (CC) score of a node is calculated based on the sum of its distances to all other vertices. The CC score itself is the inverse of that number; in other words, one divided by that sum. In practice, the calculation is commonly normalized to use the average length of the shortest paths rather than the actual sum of their lengths.

## `.closenessCentrality.mutate`  syntax
<a name="closeness-centrality-mutate-syntax"></a>

```
CALL neptune.algo.closenessCentrality.mutate(
  [node list (required)],
  {
    numSources: the number of BFS sources to use for computing the CC (required)
    writeProperty: name of the node property to write the CC score to (required)
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabel: "a node label for filtering (optional)",
    traversalDirection: traversal direction (optional),
    normalize: Boolean, set to false to prevent normalization (optional)
    concurrency: number of threads to use (optional)
  }
)
YIELD success
RETURN success
```

## `.closenessCentrality.mutate`  inputs
<a name="closeness-centrality-mutate-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting locations for the algorithm. Each node in the list triggers an execution of the algorithm. If an empty list is provided, the query result is also empty.

  If the algorithm is called following a `MATCH` clause (query algo integration), the source node list is the result returned by the `MATCH` clause. 
+ 

**a configuration object that contains:**
  + **numSources** *(required)*   –   *type:* `uint64_t`;   *default: none*.

    The number of BFS sources for computing approximate Closeness Centrality (CC). To compute exact closeness centrality, set `numSources` to a number larger than number of vertices, such as `maxInt`.

    Because of the computational complexity of the algorithm for large graphs, it's generally best to specify a number in the order of thousands to ten thousands, such as 8,192.
  + **writeProperty** *(required)*   –   *type:* `string`;   *default: none*.

    A name for the new node property that will contain the computed CC score of each node.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **normalize** *(optional)*   –   *type:* `Boolean`;   *default:*` true`.

    You can use this field to turn off normalization, which is on by default. Without normalization, only centrality scores of nodes within the same component can be meaningfully compared. Normalized scores can be compared across different connected components.

    The CC is normalizd using the Wasserman-Faust normalization formula for unconnected graphs. If there are `n` vertices reachable from vertex `u` (including vertex `u` itself), the Wasserman-Faust normalized closeness centrality score of vertex `u` is calculated as follows:

    ```
    (n-1)^2 / (|V| - 1) * sum(distance from u to these n vertices)
    ```

    Without normalization, the centrality score of vertex `u` is calculated as:

    ```
    (|V| - 1) / sum(distance from u to all other vertices in the graph)
    ```
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A node label for node filtering. If a node label is provided, vertices matching the label are the only vertices that are included, including vertices in the input list.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.closenessCentrality.mutate`  outputs
<a name="closeness-centrality-mutate-outputs"></a>

The closeness centrality score of each source node in the input list is written as a new node property using the property name specified in `writeProperty`.

If the algorithm is invoked as a standalone query, there is no other output.

If the algorithm is invoked following a `MATCH` clause that provides its source node list (query integration), the algorithm outputs a key column of the source vertices from the `MATCH` clause and a value column of Booleans (true or false) that indicate whether the CC value was successfully written to the node in question.

## Query examples for  `.closenessCentrality.mutate`
<a name="closeness-centrality-mutate-query-examples"></a>

This example computes closeness centrality scores and writes them as a new node property called `ccScore`:

```
CALL neptune.algo.closenessCentrality.mutate(
  {
    numSources: 10,
    writeProperty: "ccScore",
    edgeLabels: ["route"],
    vertexLabel: "airport",
    traversalDirection: "outbound",
    normalize: true,
    concurrency: 1
  }
)
```

Then you can query the `ccScore` property in a subsequent query:

```
MATCH (n) RETURN id(n), n.ccScore limit 5
```

## Sample   `.closenessCentrality.mutate`   output
<a name="page-rank-sample-output"></a>

Here is an example of the output returned by .closenessCentrality.mutate when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.closenessCentrality.mutate(
       {
         writeProperty: 'ccscore',
         numSources: 10
       }
     )
     YIELD success
     RETURN success"
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt  
{
  "results": [
    {
      "success": true
    }
  ]
}
```

# Similarity algorithms in Neptune Analytics
<a name="similarity-algorithms"></a>

Graph similarity algorithms allow you to compare and analyze the similarities and dissimilarities between different graph structures, which can provide insight into relationships, patterns, and commonalities across diverse datasets. This is invaluable in various fields, including biology for comparing molecular structures, social networks for identifying similar communities, and recommendation systems for suggesting similar items based on user preferences.

Neptune Analytics supports the following similarity algorithms:
+ [`neighbors.common`](common-neighbors.md)   –   This algorithm counts the number of common neighbors of two input vertices, which is the intersection of the neighborhoods of those vertices.

  By counting how many neighboring nodes are shared by two nodes, it provides a measure of their potential interaction or similarity within the network. It's used in social network analysis to identify individuals with mutual connections, in citation networks to find influential papers referenced by multiple sources, and in transportation networks to locate critical hubs with many direct connections to other nodes.
+ [`neighbors.total`](total-neighbors.md)   –   This algorithm counts the number of total unique neighbors among two input vertices, which is the union of the neighborhoods of those vertices.
+ [`jaccardSimilarity`](jaccard-similarity.md)   –   This algorithm measures the similarity between two sets by dividing the size of their intersection by the size of their union.

  By measuring the proportion of shared neighbors relative to the total number of unique neighbors, it provides a metric for understanding the degree of overlap or commonality between different parts of a network. Jaccard similarity is applied in recommendation systems to suggest products or content to users based on their shared preferences and in biology to compare genetic sequences for identifying similarities in DNA fragments.
+ [`overlapSimilarity`](overlap-similarity.md)   –   This algorithm measures the overlap between the neighbors of two vertices.

  It quantifies the similarity between nodes by calculating the ratio of common neighbors they share to the total number of neighbors they collectively have, providing a measure of their closeness or similarity within the network. Overlap similarity is applied in social network analysis to identify communities of individuals with shared interests or interactions, and in biological networks to detect common functionalities among proteins in molecular pathways.

# Common neighbors algorithm
<a name="common-neighbors"></a>

Common neighbors is an algorithm that counts the number of common neighbors of two input nodes, which is the intersection of their neighborhoods. This provides a measure of their potential interaction or similarity within the network. The common neighbors algorithm is used in social network analysis to identify individuals with mutual connections, in citation networks to find influential papers referenced by multiple sources, and in transportation networks to locate critical hubs with many direct connections to other nodes.

## `.neighbors.common`  syntax
<a name="common-neighbors-syntax"></a>

```
CALL neptune.algo.neighbors.common(
  [first node(s)],
  [second node(s)],
  {
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional)
  }
)
YIELD common
RETURN firstNodes, secondNodes, common
```

## `.neighbors.common`  inputs
<a name="common-neighbors-inputs"></a>
+ **first node(s)** *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  One or more nodes of which to find the common neighbors with the corresponding second node(s).
+ **second node(s)** *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  One or more nodes of which to find the common neighbors with the corresponding first node(s).
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A node label for node filtering. If a node label is provided, nodes matching the label are the only nodes that are considered neighbors. This does not filter the nodes in the first or second node lists.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default: outbound*.

     The direction of edge to follow. Must be one of: "inbound", "outbound", or "both". 

## `.neighbors.common`  outputs
<a name="common-neighbors-outputs"></a>

**common**: A row for each node in the first node list and corresponding node in the second node list, and the number of neighboring nodes they have in common.

If either input node list is empty, the output is empty.

## `.neighbors.common`  query examples
<a name="common-neighbors-query-examples"></a>

This example specifies only two nodes:

```
MATCH (sydairport:airport {code: 'SYD'})
MATCH (jfkairport:airport {code: 'JFK'})
CALL neptune.algo.neighbors.common( sydairport, jfkairport, { edgeLabels: ['route'] })
YIELD common
RETURN sydairport, jfkairport, common
```

This example specifies multiple nodes. It returns a row for each combination of a US airport and a UK airport, and the number of destinations we could reach from both of those two airports:

```
MATCH (usairports:airport {country: 'US'})
MATCH (ukairports:airport {country: 'UK'}) 
CALL neptune.algo.neighbors.common(usairports, ukairports, {edgeLabels: ['route']})
YIELD common
RETURN usairports, ukairports, common
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample   `.neighbors.common`   output
<a name="common-neighbors-sample-output"></a>

Here is an example of the output returned by .neighbors.common when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "MATCH (sydairport:airport {code: 'SYD'})
                       MATCH (jfkairport:airport {code: 'JFK'})
                       CALL neptune.algo.neighbors.common(sydairport, jfkairport, {edgeLabels: ['route']})
                       YIELD common
                       RETURN sydairport, jfkairport, common" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt
{
  "results": [
    {
      "sydairport": {
        "~id": "55",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": -33.9460983276367,
          "elev": 21,
          "type": "airport",
          "code": "SYD",
          "lon": 151.177001953125,
          "runways": 3,
          "longest": 12999,
          "communityId": 2357352929951971,
          "city": "Sydney",
          "region": "AU-NSW",
          "desc": "Sydney Kingsford Smith",
          "prscore": 0.0028037719894200565,
          "degree": 206,
          "wccid": 2357352929951779,
          "ccscore": 0.19631840288639069,
          "country": "AU",
          "icao": "YSSY"
        }
      },
      "jfkairport": {
        "~id": "12",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 40.63980103,
          "elev": 12,
          "type": "airport",
          "code": "JFK",
          "lon": -73.77890015,
          "runways": 4,
          "longest": 14511,
          "communityId": 2357352929951971,
          "city": "New York",
          "region": "US-NY",
          "desc": "New York John F. Kennedy International Airport",
          "prscore": 0.002885053399950266,
          "degree": 403,
          "wccid": 2357352929951779,
          "ccscore": 0.2199712097644806,
          "country": "US",
          "icao": "KJFK"
        }
      },
      "common": 24
    }
  ]
}
```

# Total neighbors algorithm
<a name="total-neighbors"></a>

Total neighbors is an algorithm that counts the total number of unique neighbors of two input vertices, which is the union of the neighborhoods of those vertices.

## `.neighbors.total`  syntax
<a name="total-neighbors-syntax"></a>

```
CALL neptune.algo.neighbors.total(
  [first node(s)],
  [second node(s)],
  {
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional)
  }
)
YIELD total
RETURN firstNodes, secondNodes, total
```

## Inputs for the `.neighbors.total` algorithm
<a name="total-neighbors-inputs"></a>
+ **first node(s)** *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  One or more nodes of which to find the total unique neighbors with the corresponding second nodes.
+ **second node(s)** *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  One or more nodes of which to find the total unique neighbors with the corresponding first nodes.
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A node label for node filtering. If a node label is provided, nodes matching the label are the only nodes that are considered neighbors. This does not filter the nodes in the first or second node lists.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default: outbound*.

     The direction of edge to follow. Must be one of: "inbound", "outbound", or "both". 

## `.neighbors.total`  outputs
<a name="total-neighbors-outputs"></a>

**total**: A row for each node in the first node list and corresponding node in the second node list, and the total number of neighboring nodes they have.

If either input node list is empty, the output is empty.

## `.neighbors.total`  query examples
<a name="total-neighbors-query-examples"></a>

This example returns a row for each combination of a US airport and a UK airport, and the total number of destinations we could reach if we could fly out of either of the two airports.

```
MATCH (usairports:airport {country: 'US'})
MATCH (ukairports:airport {country: 'UK'}) 
CALL neptune.algo.neighbors.total(usairports, ukairports, {edgeLabels: ['route']})
YIELD total
RETURN usairports, ukairports, total"
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample   `.neighbors.total`   output
<a name="total-neighbors-sample-output"></a>

Here is an example of the output returned by .neighbors.total when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "MATCH (sydairport:airport {code: 'SYD'})
                       MATCH (jfkairport:airport {code: 'JFK'})
                       CALL neptune.algo.neighbors.total(sydairport, jfkairport, {edgeLabels: ['route']})
                       YIELD total
                       RETURN sydairport, jfkairport, total"
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt
{
  "results": [
    {
      "sydairport": {
        "~id": "55",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": -33.9460983276367,
          "elev": 21,
          "type": "airport",
          "code": "SYD",
          "lon": 151.177001953125,
          "runways": 3,
          "longest": 12999,
          "communityId": 2357352929951971,
          "city": "Sydney",
          "region": "AU-NSW",
          "desc": "Sydney Kingsford Smith",
          "prscore": 0.0028037719894200565,
          "degree": 206,
          "wccid": 2357352929951779,
          "ccscore": 0.19631840288639069,
          "country": "AU",
          "icao": "YSSY"
        }
      },
      "jfkairport": {
        "~id": "12",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 40.63980103,
          "elev": 12,
          "type": "airport",
          "code": "JFK",
          "lon": -73.77890015,
          "runways": 4,
          "longest": 14511,
          "communityId": 2357352929951971,
          "city": "New York",
          "region": "US-NY",
          "desc": "New York John F. Kennedy International Airport",
          "prscore": 0.002885053399950266,
          "degree": 403,
          "wccid": 2357352929951779,
          "ccscore": 0.2199712097644806,
          "country": "US",
          "icao": "KJFK"
        }
      },
      "total": 279
    }
  ]
}
```

# Jaccard similarity algorithm
<a name="jaccard-similarity"></a>

The Jaccard similarity algorithm measures the similarity between two sets. It is calculated by dividing the size of the intersection of the two sets by the size of their union.

By measuring the proportion of shared neighbors relative to the total number of unique neighbors, this algorithm provides a metric for the degree of overlap or commonality between different parts of a network. It can be useful in recommendation systems to suggest products or content to users based on their shared preferences and in biology to compare genetic sequences for identifying similarities in DNA fragments.

## `.jaccardSimilarity`  syntax
<a name="jaccard-similarity-syntax"></a>

```
CALL neptune.algo.jaccardSimilarity(
  [first node(s)],
  [second node(s)],
  {
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional)
  }
)
YIELD score
RETURN firstNodes, secondNodes, score
```

## `.jaccardSimilarity`  inputs
<a name="jaccard-similarity-inputs"></a>
+ **first node(s)** *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  One or more nodes for which to find the Jaccard similarity score with respect to the corresponding second node(s).
+ **second node(s)** *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  One or more nodes for which to find the Jaccard similarity score with respect to the corresponding first node(s).
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A node label for node filtering. If a node label is provided, nodes matching the label are the only nodes that are considered neighbors. This does not filter the nodes in the first or second node lists.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default: outbound*.

     The direction of edge to follow. Must be one of: "inbound", "outbound", or "both". 

## Outputs for the `.jaccardSimilarity` algorithm
<a name="jaccard-similarity-outputs"></a>

**score**: A row for each node in the first node list and corresponding node in the second node list, and the Jaccard similarity score for the two.

If either input node list is empty, the output is empty.

## `.jaccardSimilarity`  query examples
<a name="jaccard-similarity-query-examples"></a>

The example below is a query integration example, where the node list inputs for `.jaccardSimilarity` come from a preceding `MATCH` clause:

```
MATCH (n1:Person {name: "Alice"}), (n2:Person {name: "Bob"}) 
CALL neptune.algo.jaccardSimilarity(n1, n2, {edgeLabels: ['knows']})
YIELD score
RETURN n1, n2, score
```

Another example:

```
MATCH (n {code: "AUS"})
MATCH (m {code: "FLL"})
CALL neptune.algo.jaccardSimilarity(
  n,
  m,
  {
    edgeLabels: ["route"],
    vertexLabel: "airport"
  }
)
YIELD score
RETURN n, m, score
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample   `.jaccardSimilarity`   output
<a name="jaccard-similarity-sample-output"></a>

Here is an example of the output returned by .jaccardSimilarity when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "MATCH (n {code: 'AUS'})
                       MATCH (m {code: "FLL"})
                       CALL neptune.algo.jaccardSimilarity(n, m,
                           {edgeLabels: [\"route\"], vertexLabel: \"airport\"})
                       YIELD score
                       RETURN n, m, score"
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt
{
  "results": [
    {
      "n": {
        "~id": "3",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 30.1944999694824,
          "elev": 542,
          "type": "airport",
          "code": "AUS",
          "lon": -97.6698989868164,
          "runways": 2,
          "longest": 12250,
          "communityId": 2357352929951971,
          "city": "Austin",
          "region": "US-TX",
          "desc": "Austin Bergstrom International Airport",
          "prscore": 0.0012390684569254518,
          "degree": 188,
          "wccid": 2357352929951779,
          "ccscore": 0.1833982616662979,
          "country": "US",
          "icao": "KAUS"
        }
      },
      "m": {
        "~id": "9",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 26.0725994110107,
          "elev": 64,
          "type": "airport",
          "code": "FLL",
          "lon": -80.152702331543,
          "runways": 2,
          "longest": 9000,
          "communityId": 2357352929951971,
          "city": "Fort Lauderdale",
          "region": "US-FL",
          "desc": "Fort Lauderdale/Hollywood International Airport",
          "prscore": 0.0024497462436556818,
          "degree": 316,
          "wccid": 2357352929951779,
          "ccscore": 0.19741515815258027,
          "country": "US",
          "icao": "KFLL"
        }
      },
      "score": 0.2953367829322815
    }
  ]
}
```

# Overlap similarity algorithm
<a name="overlap-similarity"></a>

Overlap Similarity is an algorithm that measures the overlap between the neighbors of two nodes. It does this by dividing the intersection of the two neighborhoods by the neighbor with minimum degree.

By calculating the ratio of common neighbors shared by two nodes to the total number of neighbors they collectively have, it provides a measure of their closeness or similarity within the network. Overlap similarity is applied in social network analysis to identify communities of individuals with shared interests or interactions, and in biological networks to detect common functionalities among proteins in molecular pathways.

## `.overlapSimilarity`  syntax
<a name="overlap-similarity-syntax"></a>

```
CALL neptune.algo.overlapSimilarity(
  [first node(s)],
  [second node(s)],
  {
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional)
  }
)
YIELD score
RETURN firstNodes, secondNodes, score
```

## `.overlapSimilarity`  inputs
<a name="overlap-similarity-inputs"></a>
+ **first node(s)** *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  One or more nodes for which to find the overlap similarity score with respect to the corresponding second node(s).
+ **second node(s)** *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  One or more nodes for which to find the overlap similarity score with respect to the corresponding first node(s).
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A node label for node filtering. If a node label is provided, nodes matching the label are the only nodes that are considered neighbors. This does not filter the nodes in the first or second node lists.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default: outbound*.

     The direction of edge to follow. Must be one of: "inbound", "outbound", or "both". 

## `.overlapSimilarity`  outputs
<a name="overlap-similarity-outputs"></a>

**score**: A row for each node in the first node list and corresponding node in the second node list, and the overlap similarity score for the two.

If either input node list is empty, the output is empty.

## `.overlapSimilarity`  query examples
<a name="overlap-similarity-query-examples"></a>

This is a query integration example, where `.overlapSimilarity` takes its input node lists from the output of a `MATCH` clause:

```
MATCH (n1:Person {name: "Alice"}), (n2:Person {name: "Bob"})
CALL neptune.algo.overlapSimilarity(n1, n2, {edgeLabel: 'knows'})
YIELD score
RETURN n1, n2, score
```

Another example:

```
MATCH (n {code: "AUS"})
MATCH (m {code: "FLL"})
CALL neptune.algo.overlapSimilarity(
  n,
  m,
  {
    edgeLabels: ["route"],
    vertexLabel: "airport"
  }
)
YIELD score
RETURN n, m, score'
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample   `.overlapSimilarity`   output
<a name="overlap-similarity-sample-output"></a>

Here is an example of the output returned by .overlapSimilarity when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string 'MATCH (n {code: "AUS"})
                       MATCH (m {code: "FLL"})
                       CALL neptune.algo.overlapSimilarity(
                         n,
                         m,
                         {
                           edgeLabels: ["route"],
                           vertexLabel: "airport"
                         }
                       )
                       YIELD score
                       RETURN n, m, score' \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt
{
  "results": [
    {
      "n": {
        "~id": "3",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 30.1944999694824,
          "elev": 542,
          "type": "airport",
          "code": "AUS",
          "lon": -97.6698989868164,
          "runways": 2,
          "longest": 12250,
          "communityId": 2357352929951971,
          "city": "Austin",
          "region": "US-TX",
          "desc": "Austin Bergstrom International Airport",
          "prscore": 0.0012390684569254518,
          "degree": 188,
          "wccid": 2357352929951779,
          "ccscore": 0.1833982616662979,
          "country": "US",
          "icao": "KAUS"
        }
      },
      "m": {
        "~id": "9",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 26.0725994110107,
          "elev": 64,
          "type": "airport",
          "code": "FLL",
          "lon": -80.152702331543,
          "runways": 2,
          "longest": 9000,
          "communityId": 2357352929951971,
          "city": "Fort Lauderdale",
          "region": "US-FL",
          "desc": "Fort Lauderdale/Hollywood International Airport",
          "prscore": 0.0024497462436556818,
          "degree": 316,
          "wccid": 2357352929951779,
          "ccscore": 0.19741515815258027,
          "country": "US",
          "icao": "KFLL"
        }
      },
      "score": 0.6129032373428345
    }
  ]
}
```

# Clustering and community detection algorithms in Neptune Analytics
<a name="clustering-algorithms"></a>

Clustering algorithms evaluate how nodes are clustered in communities, in closely-knit sets, or in highly or loosely interconnected groups.

These algorithms can identify meaningful groups or clusters of nodes in a network, revealing hidden patterns and structures that can provide insights into the organization and dynamics of complex systems. This is valuable in social network analysis and in biology, for identifying functional modules in protein-protein interaction networks, and more generally for understanding information flow and influence propagation in many different domains.

Neptune Analytics supports these community detection algorithms:
+ [`wcc`](wcc.md)   –   The Weakly Connected Components (WCC) algorithm finds weakly-connected components in a directed graph. A weakly-connected component is a group of nodes where every node in the group is reachable from every other node in the group if edge direction is ignored.

  Identifying weakly-connected components helps in understanding the overall connectivity and structure of the graph. Weakly-connected components can be used in transportation networks to identify disconnected regions that may require improved connectivity, and in social networks to find isolated groups of users with limited interactions, and in webpage analysis to pinpoint sections with low accessibility.
+ [`wcc.mutate`](wcc-mutate.md)   –   This algorithm stores the calculated component value of each given node as a property of the node.
+ [`labelPropagation`](label-propagation.md)   –   Label Propagation Algorithm (LPA) is an algorithm for community detection that is also used in semi-supervised machine learning for data classification.
+ [`labelPropagation.mutate`](label-propagation-mutate.md)   –   Label Propagation Algorithm (LPA) is an algorithm that assigns labels to nodes based on the consensus of their neighboring nodes, making it useful for identifying groups. Label propagation can be applied in social networks to find groups, and in identity management to identify households, and in recommendation systems to group similar products for personalized suggestions. It can also be used in semi-supervised machine learning for data classification.
+ [`scc`](scc.md)   –   The Strongly Connected Components (SCC) algorithm identifies maximally connected subgraphs of a directed graph, where every node is reachable from every other node. This can provide insights into the tightly interconnected portions of a graph and highlight key structures within it. Strongly connected components are valuable in computer programming for detecting loops or cycles in code, in social networks to find tightly connected groups of users who interact frequently, and in web crawling to identify clusters of interlinked pages for efficient indexing.
+ [`scc.mutate`](scc-mutate.md)   –   This algorithm finds the maximally connected subgraphs of a directed graph and writes their component IDs as a new property of each subgraph node.
+  [`louvain`](louvain.md)   –   The Louvain algorithm is a hierarchical community detection method that identifies groups of densely connected nodes within networks. It works by optimizing modularity - a measure comparing internal community connection density against random networks with the same degree distribution. Through iterative local optimization and network aggregation, it efficiently detects community structures in large networks. The algorithm can run in both weighted and unweighted modes; when an edge weight property is specified, it operates in weighted mode. Louvain is valuable in social networks for identifying user communities, in biological networks to discover functional modules or protein complexes, and in financial networks to detect market segments or potential fraud rings. 
+  [`louvain.mutate`](louvain-mutate.md)   –   This Louvain variant stores the calculated community ID of each node as a property of the node, enabling persistent community assignments for applications like customer segmentation, network infrastructure optimization, and research community identification. 

# Weakly connected components algorithm
<a name="wcc"></a>

The Weakly Connected Components (WCC) algorithm finds the weakly-connected components in a directed graph. A weakly-connected component is a group of nodes in which every node is reachable from every other node when edge directions are ignored. Weakly connected components are the maximal connected subgraphs of an undirected graph.

Identifying weakly-connected components helps in understanding the overall connectivity and structure of the graph. Weakly-connected components can be used in transportation networks to identify disconnected regions that may require improved connectivity, and in social networks to find isolated groups of users with limited interactions, and in webpage analysis to pinpoint sections with low accessibility.

The time complexity of the WCC algorithm is `O(|E|logD)`, where `|E|` is the number of edges in the graph, and `D` is the diameter (the length of the longest path from one node to any other node) of the graph.

The memory used by the WCC algorithm is approximately \$1V\$1 \$1 20 bytes.

## `.wcc`  syntax
<a name="wcc-syntax"></a>

```
CALL neptune.algo.wcc(
  [source-node list (required)],
  {
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD node, component
RETURN node, component
```

## `.wcc`  inputs
<a name="wcc-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting locations for the algorithm. Each node in the list triggers an execution of the algorithm. If an empty list is provided, the query result is also empty.

  If the algorithm is called following a `MATCH` clause (query algo integration), the source query list is the result returned by the `MATCH` clause. 
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A node label for node filtering. If a node label is provided, only nodes matching the label are considered. This includes the nodes in the source node lists.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.wcc`  outputs
<a name="wcc-outputs"></a>

For each source node:
+ **node**   –   The source node.
+ **component**   –   The component ID associated with the source node.

If the input node list is empty, the output is empty.

## `.wcc`  query examples
<a name="wcc-query-examples"></a>

This is a standalone example, where the source node list is explicitly provided in the query:

```
CALL neptune.algo.wcc(
  ["101"],
  {
    edgeLabels: ["route"],
    vertexLabel: "airport",
    concurrency: 2
  }
)
YIELD node, component
RETURN node, component
```

This is a query integration example, where `.wcc` follows a `MATCH` clause and uses the output of the MATCH clause as its source node list:

```
MATCH (n) WHERE n.region = 'US-WA'
CALL neptune.algo.wcc( 
  n, 
  {
    edgeLabels: ["route"],
    vertexLabel: "airport"
  }
)
YIELD component
RETURN n, component
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.wcc` output
<a name="wcc-sample-output"></a>

Here is an example of the output returned by .wcc when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "MATCH (n)
                       CALL neptune.algo.wcc(n)
                       YIELD node, component
                       RETURN node, component
                       LIMIT 2"
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt
{
  "results": [
    {
      "node": {
        "~id": "10",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 38.94449997,
          "elev": 313,
          "longest": 11500,
          "city": "Washington D.C.",
          "type": "airport",
          "region": "US-VA",
          "desc": "Washington Dulles International Airport",
          "code": "IAD",
          "prscore": 0.002264724113047123,
          "lon": -77.45580292,
          "wccid": 2357352929951779,
          "country": "US",
          "icao": "KIAD",
          "runways": 4
        }
      },
      "component": 2357352929951779
    },
    {
      "node": {
        "~id": "12",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 40.63980103,
          "elev": 12,
          "longest": 14511,
          "city": "New York",
          "type": "airport",
          "region": "US-NY",
          "desc": "New York John F. Kennedy International Airport",
          "code": "JFK",
          "prscore": 0.002885053399950266,
          "lon": -73.77890015,
          "wccid": 2357352929951779,
          "country": "US",
          "icao": "KJFK",
          "runways": 4
        }
      },
      "component": 2357352929951779
    }
  ]
}
```

# Weakly connected components mutate algorithm
<a name="wcc-mutate"></a>

The mutate variant of the weakly connected components (WCC) algorithm performs the weakly connected components calculation over the entire graph unless the configuration parameters establish a filter, and each traversed node's calculated WCC value is stored as a property on the node.

## `.wcc.mutate`  syntax
<a name="wcc-mutate-syntax"></a>

```
CALL neptune.algo.wcc.mutate(
  {
    writeProperty: the name for the node property to which to write component IDs
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD success
RETURN success
```

## `.wcc.mutate`  inputs
<a name="wcc-mutate-inputs"></a>

Inputs for `.wcc.mutate` are passed in a configuration object that contains:
+ **`writeProperty`**   *(required)*   –   *type:* `string`;   *default: none*.

  A name for the new node property where the component IDs will be written.
+ **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

  To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
+ **`vertexLabel`**   *(optional)*   –   *type:* `string`;   *default: none*.

  The node label to filter on for traversing. Only nodes matching this label will be traversed. For example: `"airport"`.
+ **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

  Controls the number of concurrent threads used to run the algorithm.

   If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.wcc.mutate`  outputs
<a name="wcc-mutate-outputs"></a>

**success**:  The computed component IDs are written as a new property on each node using the property name specified by `writeProperty`, and a single `success` flag (`true` or `false`) is returned to indicate whether or not the writes succeeded.

## `.wcc.mutate`  query examples
<a name="wcc-mutate-query-examples"></a>

This query writes the calculated component ID of each vertex in the graph to a new property of the vertex named `CCID`:

```
CALL neptune.algo.wcc.mutate(
  {
    writeProperty: "CCID", 
    edgeLabels: ["route"],
    vertexLabel: "airport",
    concurrency: 2
  }
)
```

After the mutate algorithm call above, the following query can retrieve the CCID property of a specific node:

```
MATCH (n: airport {code: "SEA"})
RETURN n.CCID
```

## Sample `.wcc.mutate` output
<a name="wcc-mutate-sample-output"></a>

Here is an example of the output returned by .wcc.mutate when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.wcc.mutate({writeProperty: 'wccid'}) YIELD success RETURN success"
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt
{
  "results": [
    {
      "success": true
    }]
}
```

# Label propagation algorithm (LPA)
<a name="label-propagation"></a>

Label Propagation Algorithm (LPA) is an algorithm for community detection that is also used in semi-supervised machine learning for data classification.

A community structure is loosely defined as a tightly knit group of entities in social networks. LPA can be enhanced by providing a set of seed nodes, the quality of which can dramatically influence the solution quality of the found communities. If the seeds are well-selected, the quality of the solution can be good, but if not, the quality of the solution can be very bad.

See Xu T. Liu *et al*, [Direction-optimizing label propagation and its application to community detection](https://doi.org/10.1145/3387902.3392634), and Xu T. Liu *et al*, [Direction-optimizing Label Propagation Framework for Structure Detection in Graphs: Design, Implementation, and Experimental Analysis](https://doi.org/10.1145/3564593), and the [Neo4j Label Propagation API](https://neo4j.com/docs/graph-data-science/current/algorithms/label-propagation/).

The time complexity of the algorithm is `O(k|E|)`, where `|E|` is the number of edges in the graph, and `k` is the number of iterations for the algorithm to converge. Its space complexity is `O(|V|)`, where `|V|` is the number of nodes in the graph.

## `.labelPropagation`  syntax
<a name="label-propagation-syntax"></a>

```
CALL neptune.algo.labelPropagation(
  [source-node list (required)],
  {
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    vertexWeightProperty: a numeric node property used to weight the community ID (optional),
    vertexWeightType: numeric type of the specified vertexWeightProperty (optional),
    edgeWeightProperty: a numeric edge property used to weight the community ID (optional),
    edgeWeightType: numeric type of the specified edgeWeightProperty (optional),
    maxIterations: the maximum number of iterations to run (optional, default: 10),
    traversalDirection: traversal direction (optional, default: outbound),
    concurrency: number of threads to use (optional)
  }
)
Yield node, community
Return node, community
```

## `.labelPropagation`  inputs
<a name="label-propagation-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

  The node or nodes to use as the starting locations for the algorithm. Each node in the list triggers an execution of the algorithm. If an empty list is provided, the query result is also empty.

  If the algorithm is called following a `MATCH` clause (query algo integration), the source query list is the result returned by the `MATCH` clause. 
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

    A node label for node filtering. If a node label is provided, nodes matching the label are the only nodes that are included in the calculation, including nodes in the input list.
  + **vertexWeightProperty** *(optional)*   –   *type:* `string`;   *default: none*.

    The node weight used in Label Propagation. When `vertexWeightProperty` is not specified, each node's `communityId` is treated equally, as if the node weight were 1.0. When the `vertexWeightProperty` is specified without an `edgeWeightProperty`, the weight of the `communityId` for each node is the value of the node weight property. When both `vertexWeightProperty` and `edgeWeightProperty` are specified, the weight of the `communityId` is the product of the node property value and edge property value.

    Note that if multiple properties exist on the node with the name specified by `vertexWeightProperty`, one of those property values will be sampled at random.
  + **vertexWeightType** *(required if `vertexWeightProperty` is present)*   –   *type:* `string`;   *valid values:* `"int"`, `"long"`, `"float"`, `"double"`;   *default: empty*.

    The type of the numeric values in the node property specified by `vertexWeightProperty`.

    If `vertexWeightProperty` is not provided, `vertexWeightType` is ignored. If a node contains a numeric property with the name specified by `vertexWeightProperty` but its value is a different numeric type than is specified by `vertexWeightType`, the value is typecast to the type specified by `vertexWeightType`. If both `vertexWeightType` and `edgeWeightType` are given, the type specified by `edgeWeightType` is used for both node and edge properties.
  + **edgeWeightProperty** *(optional)*   –   *type:* `string`;   *default: none*.

    The numeric edge property used as a weight in Label Propagation. When `vertexWeightProperty` is not specified, the default edge weight is 1.0, so each edge is treated equally. When only `edgeWeightProperty` is provided, the weight of the communityId is the value of that edge property. When both `vertexWeightProperty` and `edgeWeightProperty` are present, the weight of a `communityId` is the product of the edge property value and the node property value.

    Note that if multiple properties exist on the edge with the name specified by `edgeWeightProperty`, one of those property values will be sampled at random.
  + **edgeWeightType** *(required if `edgeWeightProperty` is present)*   –   *type:* `string`;   *valid values:* `"int"`, `"long"`, `"float"`, `"double"`;   *default: none*.

    The type of the numeric values in the edge property specified by `edgeWeightProperty`.

    If `edgeWeightProperty` is not provided, `edgeWeightType` is ignored. If a node contains a numeric property with the name specified by `edgeWeightProperty` but its value is a different numeric type than is specified by `edgeWeightType`, the value is typecast to the type specified by `edgeWeightType`. If both `vertexWeightType` and `edgeWeightType` are given, the type specified by `edgeWeightType` is used for both node and edge properties.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

    The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
  + **maxIterations** *(optional)*   –   *type:* integer;   *default:* `10`.

    The maximum number of iterations to run.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.labelPropagation`  outputs
<a name="label-propagation-outputs"></a>
+ **node**   –   A key column of the input nodes.
+ **community**   –   A key column of the corresponding `communityId` values for those nodes. All the nodes with the same `communityId` are in the same weakly-connected component.

If the input node list is empty, the output is empty.

## `.labelPropagation`  query examples
<a name="label-propagation-query-examples"></a>

This is a standalone example, where the source node list is explicitly provided in the query. It runs the algorithm over the whole graph, but only queries the component ID of one node:

```
CALL neptune.algo.labelPropagation(
  ["101"],
  {
    edgeLabels: ["route"], 
    maxIterations: 10,
    vertexLabel: "airport",
    vertexWeightProperty: "runways",
    vertexWeightType: "int",
    edgeWeightProperty: "dist",
    edgeWeightType: "int",
    traversalDirection: "both",
    concurrency: 2
  }
)
YIELD node, community
RETURN node, community
```

This is a query integration example, where `.labelPropagation` uses the output of a preceding `MATCH` clause as its source node list:

```
Match (n) 
CALL neptune.algo.labelPropagation(
  n,
  {
    edgeLabels: ["route"], 
    maxIterations: 10,
    vertexLabel: "airport",
    vertexWeightProperty: "runways",
    vertexWeightType: "int",
    edgeWeightProperty: "dist",
    edgeWeightType: "int",
    traversalDirection: "both",
    concurrency: 2
  }
) 
YIELD community
RETURN n, community
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample   `.labelPropagation`   output
<a name="label-propagation-sample-output"></a>

Here is an example of the output returned by .labelPropagation when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
"results": [{
      "node": {
        "~id": "8",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "region": "US-TX",
          "runways": 7,
          "country": "US",
          "city": "Dallas",
          "type": "airport",
          "icao": "KDFW",
          "lon": -97.038002014160199,
          "code": "DFW",
          "lat": 32.896800994872997,
          "longest": 13401,
          "elev": 607,
          "desc": "Dallas/Fort Worth International Airport"
        }
      },
      "community": 2357352929952311
    }, {
      "node": {
        "~id": "24",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "region": "US-CA",
          "runways": 3,
          "country": "US",
          "city": "San Jose",
          "type": "airport",
          "icao": "KSJC",
          "lon": -121.929000854492,
          "code": "SJC",
          "lat": 37.362598419189503,
          "longest": 11000,
          "elev": 62,
          "desc": "Norman Y. Mineta San Jose International Airport"
        }
      },
      "community": 2357352929952311
    }]
```

# Label propagation mutate algorithm
<a name="label-propagation-mutate"></a>

[Label Propagation Algorithm (LPA)](label-propagation.md) is an algorithm for community detection that is also used in semi-supervised machine learning for data classification.

The `.labelPropagation.mutate` variant of the algorithm writes the derived community component ID of each node in the source list to a new property of that node.

## `.labelPropagation.mutate`  syntax
<a name="label-propagation-mutate-syntax"></a>

```
CALL neptune.algo.labelPropagation.mutate(
  {
    writeProperty: the name for the node property to which to write component IDs
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    vertexWeightProperty: a numeric node property used to weight the community ID (optional),
    vertexWeightType: numeric type of the specified vertexWeightProperty (optional),
    edgeWeightProperty: a numeric edge property used to weight the community ID (optional),
    edgeWeightType: numeric type of the specified edgeWeightProperty (optional),
    maxIterations: the maximum number of iterations to run (optional, default: 10),
    traversalDirection: traversal direction (optional, default: outbound),
    concurrency: number of threads to use (optional)
  }
)
```

## `.labelPropagation.mutate`  inputs
<a name="label-propagation-mutate-inputs"></a>

Inputs for `.labelPropagation.mutate` are passed in a configuration object that contains:
+ **writeProperty** *(required)*   –   *type:* `string`;   *default: none*.

  A name for the new node property that will contain the computed community component ID of the node.
+ **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

  To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
+ **vertexLabel** *(optional)*   –   *type:* `string`;   *default: none*.

  A node label for node filtering. If a node label is provided, nodes matching the label are the only nodes that are included in the calculation, including nodes in the input list.
+ **vertexWeightProperty** *(optional)*   –   *type:* `string`;   *default: none*.

  The node weight used in Label Propagation. When `vertexWeightProperty` is not specified, each node's `communityId` is treated equally, as if the node weight were 1.0. When the `vertexWeightProperty` is specified without an `edgeWeightProperty`, the weight of the `communityId` for each node is the value of the node weight property. When both `vertexWeightProperty` and `edgeWeightProperty` are specified, the weight of the `communityId` is the product of the node property value and edge property value.

  Note that if multiple properties exist on the node with the name specified by `vertexWeightProperty`, one of those property values will be sampled at random.
+ **vertexWeightType** *(required if `vertexWeightProperty` is present)*   –   *type:* `string`;   *valid values:* `"int"`, `"long"`, `"float"`, `"double"`;   *default: empty*.

  The type of the numeric values in the node property specified by `vertexWeightProperty`.

  If `vertexWeightProperty` is not provided, `vertexWeightType` is ignored. If a node contains a numeric property with the name specified by `vertexWeightProperty` but its value is a different numeric type than is specified by `vertexWeightType`, the value is typecast to the type specified by `vertexWeightType`. If both `vertexWeightType` and `edgeWeightType` are given, the type specified by `edgeWeightType` is used for both node and edge properties.
+ **edgeWeightProperty** *(optional)*   –   *type:* `string`;   *default: none*.

  The numeric edge property used as a weight in Label Propagation. When `vertexWeightProperty` is not specified, the default edge weight is 1.0, so each edge is treated equally. When only `edgeWeightProperty` is provided, the weight of the communityId is the value of that edge property. When both `vertexWeightProperty` and `edgeWeightProperty` are present, the weight of a `communityId` is the product of the edge property value and the node property value.

  Note that if multiple properties exist on the edge with the name specified by `edgeWeightProperty`, one of those property values will be sampled at random.
+ **edgeWeightType** *(required if `edgeWeightProperty` is present)*   –   *type:* `string`;   *valid values:* `"int"`, `"long"`, `"float"`, `"double"`;   *default: none*.

  The type of the numeric values in the edge property specified by `edgeWeightProperty`.

  If `edgeWeightProperty` is not provided, `edgeWeightType` is ignored. If a node contains a numeric property with the name specified by `edgeWeightProperty` but its value is a different numeric type than is specified by `edgeWeightType`, the value is typecast to the type specified by `edgeWeightType`. If both `vertexWeightType` and `edgeWeightType` are given, the type specified by `edgeWeightType` is used for both node and edge properties.
+ **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "outbound"`.

  The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
+ **maxIterations** *(optional)*   –   *type:* integer;   *default:* `10`.

  The maximum number of iterations to run.
+ **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

  Controls the number of concurrent threads used to run the algorithm.

   If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `.labelPropagation.mutate` algorithm
<a name="label-propagation-mutate-outputs"></a>

The community component IDs are written as a new node property of each source node using the property name specified by `writeProperty`.

If the algorithm is invoked as a standalone query, there is no other output.

If the algorithm is invoked immediately after a `MATCH` clause that supplies its source node list, the algorithm outputs a key column of the source nodes from the `MATCH` clause and a value column of success flags (true or false) to indicate whether or not the write to the new node property of that node succeeded.

## `.labelPropagation.mutate`  query example
<a name="label-propagation-mutate-query-example"></a>

```
CALL neptune.algo.labelPropagation.mutate(
  {
    writeProperty: "COMM_ID",
    edgeLabels: ["route"], 
    maxIterations: 10,
    vertexLabel: "airport",
    vertexWeightProperty: "runways",
    vertexWeightType: "int",
    edgeWeightProperty: "dist",
    edgeWeightType: "int",
    traversalDirection: "both",
    concurrency: 2
  }
)
```

## Sample  `.labelPropagation.mutate`  output
<a name="label-propagation-mutate-sample-output"></a>

Here is an example of the output returned by .labelPropagation.mutate when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.labelPropagation.mutate({writeProperty: 'communityId'}) YIELD success RETURN success" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt
{
  "results": [
    {
      "success": true
    }
  ]
}
```

# Strongly connected components algorithm
<a name="scc"></a>

Strongly connected components (SCC) are the maximally connected subgraphs of a directed graph where every node is reachable from every other node (in other words, there exists a path between every node in the subgraph).

Neptune Analytics implements this algorithm using a modified multi-step approach (see [BFS and Coloring-based Parallel Algorithms for Strongly Connected Components and Related Problems](https://www.cs.rpi.edu/~slotag/pub/SCC-IPDPS14.pdf), by George M. Slota, Sivasankaran Rajamanickam, and Kamesh Madduri, IPDPS 2014).

The time complexity of the `.scc` algorithm in the worst case is `O(|V|+|E|*D)`, where `|V|` is the number of nodes in the graph, `|E|` is the number of edges in the graph, and `D` is the diameter, defined as the length of the longest path from one node to another in the graph.

The space complexity is O(\$1V\$1), where \$1V\$1 is the number of vertices in the graph.

## `.scc`  syntax
<a name="scc-syntax"></a>

```
CALL neptune.algo.scc(
  [source-node list (required)],
  {
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD node, component
RETURN node, component
```

## `.scc`  inputs
<a name="scc-inputs"></a>
+ **a source node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.
+ 

**a configuration object that contains:**
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **vertexLabel**   *(optional)*   –   *type:* `string`;   *default: none*.

    A node label to filter on.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.scc`  outputs
<a name="scc-outputs"></a>

For each source node:
+ **node**   –   The source node.
+ **component**   –   The component ID associated with the source node.

If the input node list is empty, the output is empty.

## `.scc`  query examples
<a name="scc-examples"></a>

This is a query integration example, where `.scc` follows a `MATCH` clause that generates its input node list:

```
Match (n)
CALL neptune.algo.scc(n, {edgeLabels: ["route", "contains"]})
YIELD component
RETURN n, component
```

This is another query integration example:

```
Match (n)
CALL neptune.algo.scc(n, {})
Yield component
Return n, component
```

**Warning**  
It is not good practice to use `MATCH(n)` without restriction in query integrations. Keep in mind that every node returned by the `MATCH(n)` clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use `LIMIT` or put conditions on the `MATCH` clause to restrict its output appropriately.

## Sample `.scc` output
<a name="scc-sample-output"></a>

Here is an example of the output returned by .scc when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "MATCH (n) CALL neptune.algo.scc(n) YIELD node, component RETURN node, component LIMIT 2" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt
{
  "results": [
    {
      "node": {
        "~id": "10",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 38.94449997,
          "elev": 313,
          "type": "airport",
          "code": "IAD",
          "lon": -77.45580292,
          "runways": 4,
          "longest": 11500,
          "communityId": 2357352929951971,
          "city": "Washington D.C.",
          "region": "US-VA",
          "desc": "Washington Dulles International Airport",
          "prscore": 0.002264724113047123,
          "degree": 312,
          "wccid": 2357352929951779,
          "ccscore": 0.20877772569656373,
          "country": "US",
          "icao": "KIAD"
        }
      },
      "component": 2357352929966149
    },
    {
      "node": {
        "~id": "12",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 40.63980103,
          "elev": 12,
          "type": "airport",
          "code": "JFK",
          "lon": -73.77890015,
          "runways": 4,
          "longest": 14511,
          "communityId": 2357352929951971,
          "city": "New York",
          "region": "US-NY",
          "desc": "New York John F. Kennedy International Airport",
          "prscore": 0.002885053399950266,
          "degree": 403,
          "wccid": 2357352929951779,
          "ccscore": 0.2199712097644806,
          "country": "US",
          "icao": "KJFK"
        }
      },
      "component": 2357352929966149
    }
  ]
}
```

# Strongly connected components mutate algorithm
<a name="scc-mutate"></a>

[Strongly connected components (SCC)](scc.md) are the maximally connected subgraphs of a directed graph, where every node is reachable from every other node (in other words, there exists a path between every node in the subgraph).

The time complexity of the `.scc-mutate` algorithm in the worst case is `O(|V|+|E|*D)`, where `|V|` is the number of nodes in the graph, `|E|` is the number of edges in the graph, and `D` is the diameter, the length of the longest path from one node to another in the graph.

The space complexity is O(\$1V\$1), where \$1V\$1 is the number of vertices in the graph.

## `.scc.mutate`  syntax
<a name="scc-mutate-syntax"></a>

```
CALL neptune.algo.scc.mutate(
  {
    writeProperty: the name for the node property to which to write component IDs
    edgeLabels: [list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    concurrency: number of threads to use (optional)
  }
)
YIELD success
RETURN success
```

## Inputs for the `.scc.mutate` algorithm
<a name="scc-mutate-inputs"></a>

Inputs for `.scc.mutate` are passed in a configuration object that contains:
+ **writeProperty**   *(required)*   –   *type:* `string`;   *default: none*.

  A name for the new node property where the component IDs will be written.
+ **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

  To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
+ **vertexLabel**   *(optional)*   –   *type:* `string`;   *default: none*.

  The node label to filter on for traversing. Only nodes matching this label will be traversed. For example: `"airport"`.
+ **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

  Controls the number of concurrent threads used to run the algorithm.

   If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `.scc.mutate` algorithm
<a name="scc-mutate-outputs"></a>

The computed strongly connected component IDs are written as a new node property using the specified property name. A single success flag (true or false) is returned to indicate whether the computation and writes succeeded or failed.

## `.scc.mutate`  query example
<a name="scc-mutate-query-examples"></a>

```
CALL neptune.algo.scc.mutate(
  {
    writeProperty: "SCOMM_ID",
    edgeLabels: ["route", ..], 
    vertexLabel: "airport",
    concurrency: 2
  }
)
```

## Sample  `.scc.mutate`  output
<a name="scc-mutate-sample-output"></a>

Here is an example of the output returned by .scc.mutate when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.algo.scc.mutate({writeProperty: 'sccid'}) YIELD success RETURN success" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt
{
  "results": [
    {
      "success": true
    }
  ]
}
```

# Louvain algorithm
<a name="louvain"></a>

 The Louvain algorithm is a hierarchical clustering method for detecting community structures within networks. A community is defined as a subset of nodes with dense internal connections relative to sparse external connections. The algorithm iteratively optimizes modularity, which mathematically measures the strength of network division into communities by comparing the density of connections within communities to what would be expected in a random network with the same degree distribution. 

 Through a two-phase process of local optimization and network aggregation, the algorithm identifies hierarchical community structures that maximize this modularity measure. This method is particularly valuable in network science for decomposing complex networks into their natural organizational units, with applications ranging from social network analysis to biological systems. 

 The algorithm can run in unweighted or weighted mode based on the graph and user inputs. When an edge weight property is specified, the algorithm runs in weighted mode. 

 Here are several real-world applications of the Louvain algorithm for community detection: 
+  Fraud detection in financial transactions 
+  Cybersecurity threat analysis 
+  Identifying influencer communities on social networks 
+  Planning network coverage areas 
+  Protein-protein interaction networks 

 The Louvain algorithm is particularly useful in these cases because it: 
+  Handles large-scale networks efficiently 
+  Provides good quality results 
+  Works well with weighted networks 
+  Is computationally fast 
+  Can detect hierarchical community structures 

**Note**  
 There can only be one Louvain algorithm call running at a time. 
 Louvain is expected to run a long time, hence please set the query timeout to be a large number to avoid query timeout. See [ query-timeout-milliseconds](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/query-APIs-execute-query.html#query-APIs-execute-query-input) for more information on setting upper bounds on query run time. 

## `.louvain`  syntax
<a name="louvain-syntax"></a>

```
CALL neptune.algo.louvain(
  [node list (required)],
  {
    vertexLabels: [list of vertex labels for filtering (optional)],
    edgeLabels: [list of edge labels for filtering (optional)],
    edgeWeightProperty: a numeric edge property to use as weight (optional),
    edgeWeightType: numeric type of the specified edgeWeightProperty (optional),
    maxLevels: maximum number of levels to optimize at (optional, default: 10),
    maxIterations: maximum number of iterations per level (optional, default: 10),
    levelTolerance: minimum modularity change to continue to next level (optional, default: 0.01),
    iterationTolerance: minimum modularity change to continue to next iteration (optional, default: 0.0001),
    concurrency: number of threads to use (optional)
  }
)
YIELD node, community
RETURN node, community
```

## `.louvain`  inputs
<a name="louvain-inputs"></a>
+ **a node list**   *(required)*   –   *type:* `Node[]` or `NodeId[]`;   *default: none*.

   The node or nodes for which the algorithm will return the computed community ids. If the algorithm is called following a MATCH clause (query algo integration), the node query list is the result returned by the MATCH clause. 
+ 

**a configuration object that contains:**
  + **vertexLabels**   *(optional)*   –   *type:* a list of vertex label strings;   *example:* `["airport", ...]`;   *default:* no vertex filtering.

    To filter on one more vertex labels, provide a list of the ones to filter on. If no `vertexLabels` field is provided then all vertex labels are processed during traversal.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **edgeWeightProperty**   *(optional)*   –   *type:* `string`;   *default: none*.

     A string indicating the name of the edge weight property used as weight in Louvain. When the edgeWeightProperty is not specified, each edge is treated equally, i.e., the default value of the edge weight is 1. 

     Note that if multiple properties exist on the edge with the specified name, one of these values will be sampled at random. 
  + **edgeWeightType**   *(required if edgeWeightProperty is present)*   –   *type:* `string; valid values: "int", "long", "float", "double"`;   *default: none*.

     The type of the numeric values in the edge property specified by edgeWeightProperty. If the edgeWeightProperty is not given, the edgeWeightType is ignored even if it is specified. If an edge contains a property given by edgeWeightProperty, and its type is numeric but not matching the specified edgeWeightType, it will be typecast to the specified type. 
  + **maxLevels**   *(optional)*   –   *type:* `integer`;   *default: 10*.

     The maximum number of levels of granularity at which the algorithm optimizes the modularity. 
  + **maxIterations**   *(optional)*   –   *type:* `integer`;   *default: 10*.

     The maximum number of iterations to run at each level. 
  + **levelTolerance**   *(optional)*   –   *type:* `float`;   *default: .01*.

     The minimum change in modularity required to continue to the next level. 
  + **iterationTolerance**   *(optional)*   –   *type:* `float`;   *default: .0001*.

     The minimum change in modularity required to continue to the next iteration. 
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.louvain`  outputs
<a name="louvain-outputs"></a>

For each source node:
+ **node**   –   A column of the input nodes.
+ **community**   –   A column of the corresponding communityId values for those nodes. All the nodes with the same communityId are in the same community.

If the input node list is empty, the output is also empty.

## `.louvain`  query examples
<a name="louvain-examples"></a>

 Unweighted example: 

```
CALL neptune.algo.louvain(
  ["101"],
  {
    vertexLabels: ["airport"],
    edgeLabels: ["route"],
    maxLevels: 3,
    maxIterations: 10
  }
)
YIELD node, community
RETURN node, community
```

 Weighted example: 

 Sample use case: you may want to identify natural communities of stops where there is high intra-connectivity — essentially, clusters of stops that are strongly interconnected based on passenger traffic 

```
CALL neptune.algo.louvain(
  ["101"],
  {
    vertexLabels: ["airport"],
    edgeLabels: ["route"],
    maxLevels: 3,
    maxIterations: 10,
    edgeWeightProperty: "weight",
    edgeWeightType: "int"
  }
)
YIELD node, community
RETURN node, community
```

 This is a query integration example, where `.louvain` uses the output of a preceding `MATCH` clause as its node list: 

```
Match (n)
CALL neptune.algo.louvain(
  n,
  {
    vertexLabels: ["airport"],
    edgeLabels: ["route"],
    maxLevels: 1,
    maxIterations: 10,
    edgeWeightProperty: "weight",
    edgeWeightType: "int"
  }
)
YIELD community
RETURN n, community
```

**Warning**  
 It is not good practice to use MATCH(n) without restriction in query integrations. Keep in mind that every node returned by the MATCH(n) clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use LIMIT or put conditions on the MATCH clause to restrict its output appropriately.   
 The Louvain algorithm requires exclusive processing. Neptune will process only one Louvain algorithm execution at a time. Any subsequent algorithm requests submitted before the completion of an active process will result in an error response. 

## Sample `.louvain` output
<a name="louvain-mutate-sample-output"></a>

Here is an example of the output returned by .louvain when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
    --graph-identifier ${graphIdentifier} \
    --query-string 'MATCH (n) \
            CALL neptune.algo.louvain(n) \
            YIELD node, community \
            return node, community \
            limit 2' \
    --language open_cypher \
    /tmp/out.txt
cat /tmp/out.txt
{
"results": [{
 "node": {
   "~id": "10",
   "~entityType": "node",
   "~labels": ["airport"],
   "~properties": {
     "lat": 38.944499970000003,
     "elev": 313,
     "type": "airport",
     "code": "IAD",
     "lon": -77.455802919999996,
     "runways": 4,
     "longest": 11500,
     "communityId": 2357352929951971,
     "city": "Washington D.C.",
     "region": "US-VA",
     "desc": "Washington Dulles International Airport",
     "degree": 312,
     "country": "US",
     "icao": "KIAD"
   }
 },
 "community": 2357352929951971
 }, {
 "node": {
   "~id": "12",
   "~entityType": "node",
   "~labels": ["airport"],
   "~properties": {
     "lat": 40.639801030000001,
     "elev": 12,
     "type": "airport",
     "code": "JFK",
     "lon": -73.778900149999998,
     "runways": 4,
     "longest": 14511,
     "communityId": 2357352929951971,
     "city": "New York",
     "region": "US-NY",
     "desc": "New York John F. Kennedy International Airport",
     "degree": 403,
     "country": "US",
     "icao": "KJFK"
   }
 },
 "community": 2357352929951971
 }]
```

# Louvain mutate algorithm
<a name="louvain-mutate"></a>

 The [Louvain algorithm](louvain.md) is a hierarchical clustering method for detecting community structures within networks. 

 .louvain.mutate is a variant of the Louvain algorithm that writes the derived community component ID of each node in the node list to a new property of that node. 

**Note**  
 There can only be one Louvain algorithm call running at a time. 
 Louvain is expected to run a long time, hence please set the query timeout to be large number to avoid query timeout. See [ query-timeout-milliseconds](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/query-APIs-execute-query.html#query-APIs-execute-query-input) for more information on setting upper bounds on query run time. 

## `.louvain.mutate`  syntax
<a name="louvain-mutate-syntax"></a>

```
CALL neptune.algo.louvain.mutate(
  {
    writeProperty: property name to store community IDs (require),
    vertexLabels: [list of vertex labels for filtering (optional)],
    edgeLabels: [list of edge labels for filtering (optional)],
    edgeWeightProperty: a numeric edge property to use as weight (optional),
    edgeWeightType: numeric type of the specified edgeWeightProperty (optional),
    maxLevels: maximum number of levels to optimize at (optional, default: 10),
    maxIterations: maximum number of iterations per level (optional, default: 10),
    levelTolerance: minimum modularity change to continue to next level (optional, default: 0.01),
    iterationTolerance: minimum modularity change to continue to next iteration (optional, default: 0.0001),
    concurrency: number of threads to use (optional)
  }
)
YIELD success
RETURN success
```

## Inputs for the `.louvain.mutate` algorithm
<a name="louvain-mutate-inputs"></a>

Inputs for `.louvain.mutate` are passed in a configuration object that contains:
+ **writeProperty**   *(required)*   –   *type:* `string`;   *default: none*.

  A name for the new node property that will contain the computed community ID of the nodes.
+ 

**a configuration object that contains:**
  + **vertexLabels**   *(optional)*   –   *type:* `a list of vertex label strings; example ["airport", ...]`;   *default: no vertex filtering*.

     Node labels for node filtering. To filter on one or more vertex labels, provide a list of node labels. Vertices matching any label in the vertexLabels list are the only vertices that are passed to the algorithm computation. If no vertexLabels field is provided then all vertices are passed to the Louvain algorithm. 
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **edgeWeightProperty**   *(optional)*   –   *type:* `string`;   *default: none*.

     A string indicating the name of the edge weight property used as weight in Louvain. When the edgeWeightProperty is not specified, each edge is treated equally, i.e., the default value of the edge weight is 1. 

     Note that if multiple properties exist on the edge with the specified name, one of these values will be sampled at random. 
  + **edgeWeightType**   *(required if edgeWeightProperty is present)*   –   *type:* `string; valid values: "int", "long", "float", "double"`;   *default: none*.

     The type of the numeric values in the edge property specified by edgeWeightProperty. If the edgeWeightProperty is not given, the edgeWeightType is ignored even if it is specified. If an edge contains a property given by edgeWeightProperty, and its type is numeric but not matching the specified edgeWeightType, it will be typecast to the specified type. 
  + **maxLevels**   *(optional)*   –   *type:* `integer`;   *default: 10*.

     The maximum number of levels of granularity at which the algorithm optimizes the modularity. 
  + **maxIterations**   *(optional)*   –   *type:* `integer`;   *default: 10*.

     The maximum number of iterations to run at each level. 
  + **levelTolerance**   *(optional)*   –   *type:* `float`;   *default: .01*.

     The minimum change in modularity required to continue to the next level. 
  + **iterationTolerance**   *(optional)*   –   *type:* `float`;   *default: .0001*.

     The minimum change in modularity required to continue to the next iteration. 
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## Outputs for the `.louvain.mutate` algorithm
<a name="louvain-mutate-outputs"></a>

 The community IDs are written as a new node property using the property name specified by `writeProperty`. A single success flag (true or false) is returned to indicate whether the computation and writes succeeded or failed. 

## `.louvain.mutate`  query example
<a name="louvain-mutate-query-examples"></a>

 This is a standalone example, where the node list is explicitly provided in the query. It runs the algorithm over the whole graph, but only queries the community ID of one node: 

 Unweighted: 

```
CALL neptune.algo.louvain.mutate(
  {
    writeProperty: "louvainCommId",
    vertexLabels: ["airport"],
    edgeLabels: ["route"],
    maxLevels: 3,
    maxIterations: 10
  }
)
YIELD success
RETURN success
```

 Weighted: 

```
CALL neptune.algo.louvain.mutate(
  {
    writeProperty: "louvainCommId",
    vertexLabels: ["airport"],
    edgeLabels: ["route"],
    maxLevels: 3,
    maxIterations: 10,
    edgeWeightProperty: "weight",
    edgeWeightType: "int"
  }
)
YIELD success
RETURN success
```

## Sample  `.louvain.mutate`  output
<a name="louvain-mutate-sample-output"></a>

Here is an example of the output returned by .louvain.mutate when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
    --graph-identifier ${graphIdentifier} \
    --query-string 'query=CALL neptune.algo.louvain.mutate({writeProperty: 'communityId'}) \
            YIELD success RETURN success' \
    --language open_cypher \
    /tmp/out.txt
cat /tmp/out.txt
{
  "results": [
    {
      "success": true
    }
  ]
}
```

# Misc. graph procedures
<a name="custom-algorithms"></a>

 The miscellaneous graph procedures can be ran on your graphs to give you insight into your graphs and their metrics. 

 Property Graph Information (`graph.pg_info`) summarizes some of the basic metrics of the graph, such as the number of vertices, the number of edges, the number of edge properties, the number of vertex properties, the number of edge labels, and the number of vertex labels. 

 The `neptune.graph.pg_schema()` procedure provides a comprehensive overview of the graph structure. It extracts and summarizes the current schema of a Neptune Analytics graph, i.e., customers can observe the property names and types that appear on vertices and edges of particular labels within the graph. The procedure is designed for use cases such as: schema visualization, integration with third-party applications, and inclusion in open-source tools. 

 The `neptune.algo.degreeDistribution()` analyzes the structural characteristics of a graph. It calculates the frequency distribution of vertex degrees across the entire network and provides basic statistics of the distribution. 

**Topics**
+ [Property graph information](custom-algorithms-property-graph.md)
+ [Property graph schema](custom-algorithms-property-graph-schema.md)
+ [.degreeDistribution algorithm](degreeDistribution.md)

# Property graph information
<a name="custom-algorithms-property-graph"></a>

 Property Graph Information (graph.pg\$1info) summarizes some of the basic metrics of the graph, such as the number of vertices, the number of edges, the number of edge properties, the number of vertex properties, the number of edge labels, and the number of vertex labels. 

## Inputs for graph.pg\$1info
<a name="custom-algorithms-property-graph-input"></a>

There are no inputs for graph.pg\$1info.

## Outputs for graph.pg\$1info
<a name="custom-algorithms-property-graph-output"></a>

There are two columns in the output relation: the first column is the metric name and the second column is the count.

**metric: the metrics that graph.pg\$1info will return, which include:**
+ numVertices: the number of vertices in the graph.
+ numEdges: the number of edges in the graph.
+ numVertexProperties: the number of node properties in the graph.
+ numEdgeProperties: the number of edge properties in the graph.
+ numVertexLabels: the number of unique vertex labels in the graph.
+ numEdgeLabels: the number of unique edge labels in the graph.

**count**
+ count: the value of the above metrics.

## graph.pg\$1info query example
<a name="custom-algorithms-property-graph-query-example"></a>

```
## Syntax
CALL neptune.graph.pg_info() 
YIELD metric, count
RETURN metric, count
```

## graph.pg\$1info query integration
<a name="custom-algorithms-property-graph-query-integration"></a>

```
# sample query integration 
CALL neptune.graph.pg_info()
YIELD metric, count
WHERE metric = 'numVertices'
RETURN count
```

## Sample graph.pg\$1info output
<a name="custom-algorithms-property-graph-output-example"></a>

```
# sample output of graph.pg_info
aws neptune-graph execute-query \                                                                                                                                       
     --graph-identifier ${graphIdentifier} \
     --query-string "CALL neptune.graph.pg_info()
     YIELD metric, count
     RETURN metric, count " \
     --language open_cypher \
     /tmp/out.txt
cat /tmp/out.txt
{
  "results": [{
      "metric": "numVertices",
      "count": 3748
    }, {
      "metric": "numEdges",
      "count": 57538
    }, {
      "metric": "numVertexProperties",
      "count": 42773
    }, {
      "metric": "numEdgeProperties",
      "count": 50532
    }, {
      "metric": "numVertexLabels",
      "count": 4
    }, {
      "metric": "numEdgeLabels",
      "count": 2
    }]
}
```

# Property graph schema
<a name="custom-algorithms-property-graph-schema"></a>

 The `neptune.graph.pg_schema()` procedure provides a comprehensive overview of the graph structure. It extracts and summarizes the current schema of a Neptune Analytics graph, i.e., customers can observe the property names and types that appear on vertices and edges of particular labels within the graph. The procedure is designed for use cases such as: schema visualization, integration with third-party applications, and inclusion in open-source tools. 

**Benefits:**
+  Node and edge label enumeration: The procedure identifies and lists all unique labels for nodes and edges present in the graph (`nodeLabels` and `edgeLabels`, respectively). 
+  Property and data type analysis: For each node and edge label, it catalogs associated properties and their corresponding data types (`nodeLabelDetails` and `edgeLabelDetails`, respectively). This information is crucial for understanding the attributes of different graph elements. 
+  Topological relationship mapping: The procedure generates a set of triples in the format (`nodeLabel)-[edgeLabel]->(nodeLabel`), effectively summarizing the graph's topology and the relationships between different node types (`labelTriples`). 
+  Consistency across tools: By providing a standardized schema representation, the procedure ensures consistency across various third-party and open-source tools that interact with the graph database. 
+  Integration-friendly output: The schema information is formatted in a way that facilitates easy integration with AI tools, visualization software, and reporting systems. 

 This procedure provides a unified method of complete and up-to-date information extraction to support a wide range of applications from AI-driven query generation to data visualization and reporting. 

## Inputs for neptune.graph.pg\$1schema()
<a name="custom-algorithms-property-graph-schema-input"></a>

There are no inputs for neptune.graph.pg\$1schema().

## Outputs for neptune.graph.pg\$1schema()
<a name="custom-algorithms-property-graph-schema-output"></a>

 There is a single column in the output containing a map schema containing the following key components in the schema map: 
+  `nodeLabels`: A list of all unique labels assigned to nodes/vertices in the graph. 
+  `edgeLabels`: A list of all unique labels assigned to relationships/edges in the graph. 
+  `nodeLabelDetails`: For each node label, all properties associated with that node containing an enumeration of each property and the various data types it can manifest as across different nodes with the same label. 
  +  `label` - The node label or labels. 
  +  `properties` - An array of the superset of properties for the node: 
    +  `<key:> name` - The property name. 
    +  `<value:> A key-value dictionary (map)` - Stores data types that are available for the property. 
      +  `<key:> "datatypes"` , 
      +  `<value:> array[string]` 
    + e.g.,

      ```
      "contains": {
        "properties": {
          "weight": {
            "datatypes": ["Int"]
          }
        }
      }
      ```
+  `edgeLabelDetails`: For each edge label, all properties associated with edges that have that label containing an enumeration of each property and the various data types it can manifest as across different edges with the same label. 
  +  `label` - The edge label. 
  +  `properties` - A key-value dictionary (map) of properties for the edge label: 
    +  `<key:>` name - The property name 
    +  `<value:>` A key-value dictionary (map) - Stores data types that are available for the property. 
      +  `<key:> "datatypes"` , 
      +  `<value:> array[string]` 
+  `labelTriples`: A set of `nodeLabel-edgeLabel->nodeLabel` combinations that represent the connections between different types of nodes in the graph. These triples summarize the graph's topology by showing how different node types are related through various edge types. Each entry is a key-value dictionary, holding the following: 
  +  `~type` - The edge label. 
  +  `~from` - The node label of the head node of the node-edge->node. 
  +  `~to` - The node label of the tail node of the node-edge->node. 

## neptune.graph.pg\$1schema() query example
<a name="custom-algorithms-property-graph-schema-query-example"></a>

```
## Syntax
CALL neptune.graph.pg_schema() 
YIELD schema
RETURN schema
```

## neptune.graph.pg\$1schema() query integration
<a name="custom-algorithms-property-graph-schema-query-integration"></a>

```
# sample query integration.
# Calls pg_schema,
# Then acquires node labels,
# Then sorts them alphabetically,
# Then counts number of vertices with each label and returns it

CALL neptune.graph.pg_schema()
YIELD schema
WITH schema.nodeLabels as nl
UNWIND collSort(nl) as label
MATCH (n)
WHERE label in labels(n)
RETURN label, COUNT(n) as count

# output

{
  "results": [{
      "label": "airport",
      "count": 27
    }, {
      "label": "country",
      "count": 3
    }, {
      "label": "version",
      "count": 3
    }]
}%
```

## Sample neptune.graph.pg\$1schema() output
<a name="custom-algorithms-property-graph-schema-output-example"></a>

```
% aws neptune-graph execute-query \ 
    --graph-identifier ${graphIdentifier} \
    --query-string 'CALL neptune.graph.pg_schema()
                    YIELD schema
                    RETURN schema' \
    --language open_cypher \
    /tmp/out.txt
{
  "results": [{
      "schema": {
        "edgeLabelDetails": {
          "route": {
            "properties": {
              "weight": {
                "datatypes": ["Int"]
              },
              "dist": {
                "datatypes": ["Int"]
              }
            }
          },
          "contains": {
            "properties": {
              "weight": {
                "datatypes": ["Int"]
              }
            }
          }
        },
        "edgeLabels": ["route", "contains"],
        "nodeLabels": ["version", "airport", "continent", "country"],
        "labelTriples": [{
            "~type": "route",
            "~from": "airport",
            "~to": "airport"
          }, {
            "~type": "contains",
            "~from": "country",
            "~to": "airport"
          }, {
            "~type": "contains",
            "~from": "continent",
            "~to": "airport"
          }],
        "nodeLabelDetails": {
          "continent": {
            "properties": {
              "type": {
                "datatypes": ["String"]
              },
              "code": {
                "datatypes": ["String"]
              },
              "desc": {
                "datatypes": ["String"]
              }
            }
          },
          "airport": {
            "properties": {
              "type": {
                "datatypes": ["String"]
              },
              "city": {
                "datatypes": ["String"]
              },
              "icao": {
                "datatypes": ["String"]
              },
              "code": {
                "datatypes": ["String"]
              },
              "country": {
                "datatypes": ["String"]
              },
              "lat": {
                "datatypes": ["Double"]
              },
              "longest": {
                "datatypes": ["Int"]
              },
              "runways": {
                "datatypes": ["Int"]
              },
              "desc": {
                "datatypes": ["String"]
              },
              "lon": {
                "datatypes": ["Double"]
              },
              "region": {
                "datatypes": ["String"]
              },
              "elev": {
                "datatypes": ["Int"]
              }
            }
          },
          "country": {
            "properties": {
              "type": {
                "datatypes": ["String"]
              },
              "code": {
                "datatypes": ["String"]
              },
              "desc": {
                "datatypes": ["String"]
              }
            }
          },
          "version": {
            "properties": {
              "date": {
                "datatypes": ["String"]
              },
              "desc": {
                "datatypes": ["String"]
              },
              "author": {
                "datatypes": ["String"]
              },
              "type": {
                "datatypes": ["String"]
              },
              "code": {
                "datatypes": ["String"]
              }
            }
          }
        }
      }
    }]
}
```

# .degreeDistribution algorithm
<a name="degreeDistribution"></a>

The `.degreeDistribution` algorithm is a tool for analyzing and visualizing the structural characteristics of a graph. It calculates the frequency distribution of vertex degrees across the entire network and provides basic statistics of the distribution.

`.degreeDistribution` provides insight into the topology and connectivity patterns of the network, such as identifying the hubs (i.e., super nodes or high-degree nodes) and distinguishing different network types (e.g., tree vs. scale-free), which can help making informed decisions on selecting appropriate algorithms for analysis.

The `%degreeDistribution` magic command in the notebook provides an interactive visualization of the output, please see the [notebook magics](https://docs.aws.amazon.com//neptune/latest/userguide/notebooks-magics.html#notebooks-line-magics-degreeDistribution) documentation for details.

## `.degreeDistribution`   syntax
<a name="degreeDistribution-syntax"></a>

```
CALL neptune.algo.degreeDistribution(
  {
    vertexLabels: [a list of vertex labels for filtering (optional)],
    edgeLabels: [a list of edge labels for filtering (optional)],
    binWidth: a positive integer that specifies the size of each bin in the degree distribution (optional, default: 1),
    traversalDirection: the direction of edge used for degree computation (optional, default: "both"),
    concurrency: number of threads to use (optional)
  }
)
YIELD output
RETURN output
```

## `.degreeDistribution`   inputs
<a name="degreeDistribution-inputs"></a>
+ 

**a configuration object that contains:**
  + **vertexLabels**   *(optional)*   –   *type:* a list of vertex label strings;   *example:* `["airport", ...]`;   *default:* no vertex filtering.

    To filter on one more vertex labels, provide a list of the ones to filter on. If no `vertexLabels` field is provided then all vertex labels are processed during traversal.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one more edge labels, provide a list of the ones to filter on. If no `edgeLabels` field is provided then all edge labels are processed during traversal.
  + **binWidth** *(optional)*   –   *type:* `integer`;   *default: 1*.

    To specify the size of each bin in the degree distribution, provide an integer value.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:*` "both"`.

    The direction of edge to follow. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

     If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.

## `.degreeDistribution`   outputs
<a name="degreeDistribution-outputs"></a>

There is a single column in the output containing a map with the following key components:
+ **distribution**   –   A list of lists where each list item is as follows:
  + [`degree`, `count`]   –   Degree and corresponding count. The list is sorted in the increasing order of `degree`.
+ **statistics**   –   A map with the following components:
  + `maxDeg`   –   the maximum degree in the graph.
  + `mean`   –   the average degree in the graph.
  + `minDeg`   –   the minimum degree in the graph.
  + `p50`   –   the 50th percentile degree in the graph, i.e., median.
  + `p75`   –   the 75th percentile degree in the graph.
  + `p90`   –   the 90th percentile degree in the graph.
  + `p95`   –   the 95th percentile degree in the graph.
  + `p99`   –   the 99th percentile degree in the graph.
  + `p999`   –   the 99.9th percentile degree in the graph.

## `.degreeDistribution`   query examples
<a name="degreeDistribution-query-examples"></a>

This is a standalone example, where the in-degree distribution is computed for the graph with specified vertex labels and edge label, and the mean degree is returned.

```
CALL neptune.algo.degreeDistribution({
   vertexLabels: ['airport', 'country'],
   edgeLabels: ['route'],
   traversalDirection: 'inbound',
})
YIELD output
WITH output.statistics.mean as meanDegree
RETURN meanDegree
```

## Sample   `.degreeDistribution`   output
<a name="degreeDistribution-sample-output"></a>

Here is an example of the output returned by .degreeDistribution when run against the [ sample air-routes dataset [nodes]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), and [ sample air-routes dataset [edges]](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv), when using the following query:

```
aws neptune-graph execute-query \
    \
   --region ${region}
   --graph-identifier ${graphIdentifier} \
   --query-string "CALL neptune.algo.degreeDistribution({binWidth: 50, vertexLabels: ['airport', 'country'], edgeLabels: ['route'], traversalDirection: 'inbound'}) YIELD output RETURN output" \
   --language open_cypher \
   /tmp/out.txt
   
cat /tmp/out.txt
{
  "results": [{
      "output": {
        "statistics": {
          "maxDeg": 307,
          "mean": 13.511229946524065,
          "minDeg": 0,
          "p50": 3,          
          "p75": 9,
          "p90": 36,
          "p95": 67,          
          "p99": 173,
          "p999": 284          
        },
        "distribution": [[0, 268], [50, 3204], [100, 162], [150, 54], [200, 29], [250, 16], [300, 5], [350, 2]]
      }
    }]
}
```