Neptune Analytics algorithms
Graph algorithms are powerful tools for gaining insights into data. Neptune Analytics provides a set of optimized in-database implementations of common graph algorithms that are exposed as openCypher procedures. These algorithms analyze inherent aspects of the underlying graph structure, such as connectedness (path finding), relative importance (centrality), and community membership (community detection).
Neptune Analytics natively supports over 25 optimized graph algorithms and variants in the 5 most popular categories that help customers extract insights from their graphs, which are listed in the following table.
Category | Action | Algorithms | Common Uses |
---|---|---|---|
Find the existence, quality, or availability of a path between nodes. |
|
|
|
Determines the absolute or relative importance of a node in the graph. |
|
|
|
Compare the similarities between different graph structures. |
|
|
|
Identify meaningful groups or clusters within graph structures. |
|
|
|
Identify approximate nearest neighbor (ANN) nodes by comparing vector embeddings using the Hierarchical Navigable Small World (HNSW) algorithm. |
|
|
Many of these algorithms require interacting with most or all the nodes and edges in a graph, often in an iterative fashion. As a result, they are too computationally expensive to process using normal analytic technologies. Neptune Analytics has built highly optimized implementations that allow them to run over graphs of any size.
Algorithms in Neptune Analytics are integrated naturally into openCypher through the CALL clause, as illustrated below. This lets you combine algorithms naturally with openCypher clauses, functions, and semantics to build very complex queries. For example, you could look for the top 10 most important airports in the US-AK region like this:
MATCH (n:airport {region: 'US-AK'}) CALL neptune.algo.pageRank(n, {edgeLabels: ['route'], numOfIterations: 10}) YIELD rank RETURN n.code, rank ORDER BY rank DESC LIMIT 10
You can run algorithms in the SDKs using the ExecuteOpenCypherQuery
operation or in boto3 and the AWS CLI using the execute-query
command.
If you don't want to use the SDK or CLI, you can use you can use awscurl
to sign your Neptune Analytics requests using signed
using Signature Version 4 (Sig4). For example, you can run a simple breadth-first search like this:
awscurl -X POST -H "Content-Type: application/x-www-form-urlencoded" \ https://
(graphIdentifier)
.(region)
.neptune-graph.amazonaws.com/opencypher \ --service neptune-graph \ --region(region)
\ -d "query=CALL neptune.algo.bfs([\"101\", \"102\"], {edgeLabels: [\"route\"]})"
You could run the same query using the AWS CLI like this:
aws neptune-graph execute-query \ --graph-identifier ${graphIdentifier} \ --query-string 'CALL neptune.algo.bfs(["101", "102"], {edgeLabels: ["route"]})' \ --language open_cypher \ /tmp/out.txt
Algorithms having signatures with different kinds of input
in Neptune Analytics are exposed as separate algorithms. Unless otherwise
indicated, the examples here are using the Air Routes
dataset
Neptune Analytics currently supports five main categories of algorithm:
-
Path finding algorithms – These find the existence, quality, or availability of a path or paths between two or more nodes in the graph. A path in this sense is a set of nodes and connecting edges.
By efficiently determining the optimal route between two nodes, path-finding algorithms enable you to model real-world systems like roads or social networks as interconnected nodes and edges. Finding the shortest paths between various points is crucial in applications like route planning for GPS systems, logistics optimization, and even in solving complex problems in fields like biology or engineering.
-
Centrality algorithms – These are used to determine the absolute or relative importance or influence of a node or nodes in the graph.
By identifying the most influential or important nodes within a network, centrality algorithms can provide insights about key players or critical points of interaction. This is valuable in social network analysis, where it helps pinpoint influential individuals, and in transportation networks, where it aids in identifying crucial hubs for efficient routing and resource allocation.
-
Similarity algorithms – Graph similarity algorithms allow you to compare and analyze the similarities and dissimilarities between different graph structures, which can provide insight into relationships, patterns, and commonalities across diverse datasets. This is invaluable in various fields, such as biology, for comparing molecular structures, such as social networks, for identifying similar communities, and such as recommendation systems, for suggesting similar items based on user preferences.
-
Clustering or community-detection algorithms – Community-detection algorithms can identify meaningful groups or clusters of nodes in a network, revealing hidden patterns and structures that can provide insights into the organization and dynamics of complex systems. This is valuable in social network analysis, and in biology, for identifying functional modules in protein-protein interaction networks, and more generally for understanding information flow and influence propagation in many different domains.
-
Vector Similarity Search – Vector similarity algorithms work by using vector based representations of data, a.k.a. embeddings, to answer questions about the data's context and its similarity and connection to other data. This is valuable in applications such as Retrieval Augmented Generation (RAG) applications, knowledge graph backed chatbots, and recommendation engines.