Strongly connected components algorithm
Strongly connected components (SCC) are the maximally connected subgraphs of a directed graph where every node is reachable from every other node (in other words, there exists a path between every node in the subgraph).
Neptune Analytics implements this algorithm using a modified multi-step approach (see BFS and Coloring-based Parallel Algorithms for Strongly Connected Components and Related Problems, by George M. Slota, Sivasankaran Rajamanickam, and Kamesh Madduri, IPDPS 2014).
The time complexity of the .scc
algorithm in the worst case is
O(|V|+|E|*D)
, where |V|
is the number of nodes in the graph,
|E|
is the number of edges in the graph, and D
is the diameter,
defined as the length of the longest path from one node to another in the graph.
The algorith's space complexity is O(|V|) + [storage of data O(|V|+|E|)]
.
This can be expressed in more detail as: |V| * (7 * sizeof(local id type) +
4 * sizeof(global id type)) + O(|1DData Parts|)
.
.scc
syntax
CALL neptune.algo.scc( [
source-node list (required)
], { edgeLabels: [list of edge labels for filtering (optional)
], vertexLabel:a node label for filtering (optional)
, concurrency:number of threads to use (optional)
} ) YIELD node, component RETURN node, component
.scc
inputs
-
a source node list (required) – type:
Node[]
orNodeId[]
; default: none. -
a configuration object that contains:
-
edgeLabels (optional) – type: a list of edge label strings; example:
["route",
; default: no edge filtering....
]To filter on one more edge labels, provide a list of the ones to filter on. If no
edgeLabels
field is provided then all edge labels are processed during traversal. -
vertexLabel (optional) – type:
string
; default: none.A node label to filter on.
-
concurrency (optional) – type: 0 or 1; default: 0.
Controls the number of concurrent threads used to run the algorithm.
If set to
0
, uses all available threads to complete execution of the individual algorithm invocation. If set to1
, uses a single thread. This can be useful when requiring the invocation of many algorithms concurrently.
-
.scc
outputs
For each source node:
-
node – The source node.
-
component – The component ID associated with the source node.
If the input node list is empty, the output is empty.
.scc
query examples
This openCypher query has an empty input list, and so will have no output:
Match (n) CALL neptune.algo.scc(n, {edgeLabels: ["route", "contains"]}) YIELD component RETURN n, component
This is a query integration example, where .scc
follows a
MATCH
clause that generates its input node list:
Match (n) CALL neptune.algo.scc(n, {}) Yield component Return n, component
Warning
It is not good practice to use MATCH(n)
without restriction
in query integrations. Keep in mind that every node returned by the MATCH(n)
clause invokes the algorithm once, which can result a very long-running query if
a large number of nodes is returned. Use LIMIT
or put conditions on the
MATCH
clause to restrict its output appropriately.
Sample .scc
output
Here is an example of the output returned by .scc when run against the
sample air-routes dataset [nodes]
aws neptune-graph execute-query \ --graph-identifier ${graphIdentifier} \ --query-string "CALL neptune.algo.scc({writeProperty: 'sccid'}) YIELD success RETURN success" \ --language open_cypher \ /tmp/out.txt cat /tmp/out.txt { "results": [ { "node": { "~id": "10", "~entityType": "node", "~labels": ["airport"], "~properties": { "lat": 38.94449997, "elev": 313, "type": "airport", "code": "IAD", "lon": -77.45580292, "runways": 4, "longest": 11500, "communityId": 2357352929951971, "city": "Washington D.C.", "region": "US-VA", "desc": "Washington Dulles International Airport", "prscore": 0.002264724113047123, "degree": 312, "wccid": 2357352929951779, "ccscore": 0.20877772569656373, "country": "US", "icao": "KIAD" } }, "component": 2357352929966149 }, { "node": { "~id": "12", "~entityType": "node", "~labels": ["airport"], "~properties": { "lat": 40.63980103, "elev": 12, "type": "airport", "code": "JFK", "lon": -73.77890015, "runways": 4, "longest": 14511, "communityId": 2357352929951971, "city": "New York", "region": "US-NY", "desc": "New York John F. Kennedy International Airport", "prscore": 0.002885053399950266, "degree": 403, "wccid": 2357352929951779, "ccscore": 0.2199712097644806, "country": "US", "icao": "KJFK" } }, "component": 2357352929966149 } ] }