Jaccard similarity algorithm

The Jaccard similarity algorithm measures the similarity between two sets. It is calculated by dividing the size of the intersection of the two sets by the size of their union.
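
In set notation, the definition above can be written as follows. The symbols A and B for the two sets are chosen here for illustration only:

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
```

When the union is non-empty, the score ranges from 0 (no shared elements) to 1 (identical sets).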

By measuring the proportion of shared neighbors relative to the total number of unique neighbors, this algorithm provides a metric for the degree of overlap or commonality between different parts of a network. It can be useful in recommendation systems to suggest products or content to users based on their shared preferences and in biology to compare genetic sequences for identifying similarities in DNA fragments.
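
As a rough illustration of the calculation on neighbor sets, here is a minimal Python sketch. It is not the Neptune implementation; the function name `jaccard`, the sample neighbor names, and the choice to return 0.0 when both sets are empty are assumptions made for this example.

```python
def jaccard(a, b):
    """Return |a ∩ b| / |a ∪ b| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: report 0.0 when both sets are empty.
        # The documentation does not state what the algorithm does here.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical neighbor sets for two nodes.
alice_neighbors = {"carol", "dave", "erin"}
bob_neighbors = {"dave", "erin", "frank"}

# 2 shared neighbors out of 4 unique neighbors.
print(jaccard(alice_neighbors, bob_neighbors))  # 0.5
```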

`.jaccardSimilarity` syntax


CALL neptune.algo.jaccardSimilarity(
  [first node(s)],
  [second node(s)],
  {
    edgeLabels: [a list of edge labels for filtering (optional)],
    vertexLabel: a node label for filtering (optional),
    traversalDirection: traversal direction (optional)
  }
)
YIELD score
RETURN firstNodes, secondNodes, score

`.jaccardSimilarity` inputs

first node(s) (required) – type: Node[] or NodeId[]; default: none.

One or more nodes for which to find the Jaccard similarity score with respect to the corresponding second node(s).

second node(s) (required) – type: Node[] or NodeId[]; default: none.

One or more nodes for which to find the Jaccard similarity score with respect to the corresponding first node(s).

a configuration object that contains:
- edgeLabels (optional) – type: a list of edge label strings; example: ["route", ...]; default: no edge filtering.
  
  To filter on one or more edge labels, provide a list of the labels to filter on. If no edgeLabels field is provided, all edge labels are processed during traversal.
- vertexLabel (optional) – type: string; default: none.
  
  A node label for node filtering. If a node label is provided, nodes matching the label are the only nodes that are considered neighbors. This does not filter the nodes in the first or second node lists.
- traversalDirection (optional) – type: string; default: outbound.
  
  The direction of the edges to follow. Must be one of: "inbound", "outbound", or "both".
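
To make the effect of the three configuration fields concrete, the following Python sketch applies them to a small in-memory graph. It only mirrors the behavior described above and is not how Neptune Analytics implements the algorithm. The toy edge list, the `neighbors` and `jaccard_similarity` helpers, the snake_case parameter names, and the 0.0 result for an empty union are all assumptions made for illustration.

```python
# Hypothetical toy graph: (source, edge label, target). Not the air-routes data.
EDGES = [
    ("AUS", "route", "DFW"),
    ("AUS", "route", "ATL"),
    ("FLL", "route", "ATL"),
    ("FLL", "route", "JFK"),
    ("AUS", "contains", "US-TX"),
    ("DFW", "route", "AUS"),
]
NODE_LABELS = {
    "AUS": "airport", "FLL": "airport", "DFW": "airport",
    "ATL": "airport", "JFK": "airport", "US-TX": "region",
}


def neighbors(node, edge_labels=None, vertex_label=None,
              traversal_direction="outbound"):
    """Collect the neighbor set of `node` under the given filters."""
    if traversal_direction not in ("inbound", "outbound", "both"):
        raise ValueError("traversal_direction must be inbound, outbound, or both")
    found = set()
    for source, label, target in EDGES:
        # edgeLabels: skip edges whose label is not in the list.
        if edge_labels is not None and label not in edge_labels:
            continue
        if traversal_direction in ("outbound", "both") and source == node:
            found.add(target)
        if traversal_direction in ("inbound", "both") and target == node:
            found.add(source)
    # vertexLabel: keep only neighbors carrying the label.
    # The input nodes themselves are never filtered.
    if vertex_label is not None:
        found = {n for n in found if NODE_LABELS.get(n) == vertex_label}
    return found


def jaccard_similarity(first, second, **config):
    a = neighbors(first, **config)
    b = neighbors(second, **config)
    union = a | b
    # Assumption: 0.0 when neither node has any neighbors.
    return len(a & b) / len(union) if union else 0.0


print(jaccard_similarity("AUS", "FLL"))                         # 0.25
print(jaccard_similarity("AUS", "FLL", edge_labels=["route"]))  # 1 shared / 3 unique
```

In this toy graph, filtering on the "route" edge label removes the "contains" edge from AUS, which shrinks the union from four nodes to three and raises the score.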

Outputs for the `.jaccardSimilarity` algorithm

score: One row is returned for each node in the first node list paired with the corresponding node in the second node list, containing the Jaccard similarity score for that pair.

If either input node list is empty, the output is empty.

`.jaccardSimilarity` query examples

The example below is a query integration example, where the node list inputs for .jaccardSimilarity come from a preceding MATCH clause:


MATCH (n1:Person {name: "Alice"}), (n2:Person {name: "Bob"}) 
CALL neptune.algo.jaccardSimilarity(n1, n2, {edgeLabels: ['knows']})
YIELD score
RETURN n1, n2, score

Another example:


MATCH (n {code: "AUS"})
MATCH (m {code: "FLL"})
CALL neptune.algo.jaccardSimilarity(
  n,
  m,
  {
    edgeLabels: ["route"],
    vertexLabel: "airport"
  }
)
YIELD score
RETURN n, m, score

Warning

It is not good practice to use MATCH(n) without restriction in query integrations. Every node returned by the MATCH(n) clause invokes the algorithm once, which can result in a very long-running query if a large number of nodes is returned. Use LIMIT or put conditions on the MATCH clause to restrict its output appropriately.

Sample `.jaccardSimilarity` output

Here is an example of the output returned by .jaccardSimilarity when run against the sample air-routes dataset (nodes and edges), using the following query:


aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "MATCH (n {code: 'AUS'})
                       MATCH (m {code: 'FLL'})
                       CALL neptune.algo.jaccardSimilarity(n, m,
                           {edgeLabels: [\"route\"], vertexLabel: \"airport\"})
                       YIELD score
                       RETURN n, m, score" \
  --language open_cypher \
  /tmp/out.txt
  
cat /tmp/out.txt
{
  "results": [
    {
      "n": {
        "~id": "3",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 30.1944999694824,
          "elev": 542,
          "type": "airport",
          "code": "AUS",
          "lon": -97.6698989868164,
          "runways": 2,
          "longest": 12250,
          "communityId": 2357352929951971,
          "city": "Austin",
          "region": "US-TX",
          "desc": "Austin Bergstrom International Airport",
          "prscore": 0.0012390684569254518,
          "degree": 188,
          "wccid": 2357352929951779,
          "ccscore": 0.1833982616662979,
          "country": "US",
          "icao": "KAUS"
        }
      },
      "m": {
        "~id": "9",
        "~entityType": "node",
        "~labels": ["airport"],
        "~properties": {
          "lat": 26.0725994110107,
          "elev": 64,
          "type": "airport",
          "code": "FLL",
          "lon": -80.152702331543,
          "runways": 2,
          "longest": 9000,
          "communityId": 2357352929951971,
          "city": "Fort Lauderdale",
          "region": "US-FL",
          "desc": "Fort Lauderdale/Hollywood International Airport",
          "prscore": 0.0024497462436556818,
          "degree": 316,
          "wccid": 2357352929951779,
          "ccscore": 0.19741515815258027,
          "country": "US",
          "icao": "KFLL"
        }
      },
      "score": 0.2953367829322815
    }
  ]
}
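
If only the score is needed, the JSON file can be post-processed with a few lines of Python. This is a minimal sketch that assumes the result shape shown above, with the `n` and `m` keys matching the aliases in the query's RETURN clause. The `extract_scores` helper is hypothetical, and the sample string is an abbreviated copy of the output with most properties omitted.

```python
import json

# Abbreviated copy of the sample output above (most properties omitted).
sample = """
{"results": [{"n": {"~id": "3", "~properties": {"code": "AUS"}},
              "m": {"~id": "9", "~properties": {"code": "FLL"}},
              "score": 0.2953367829322815}]}
"""


def extract_scores(text):
    """Return (first code, second code, score) for each result row."""
    rows = json.loads(text)["results"]
    return [
        (row["n"]["~properties"]["code"],
         row["m"]["~properties"]["code"],
         row["score"])
        for row in rows
    ]


for first, second, score in extract_scores(sample):
    print(f"{first} vs {second}: {score:.4f}")  # AUS vs FLL: 0.2953
```

To read the real file instead of the inline sample, pass the contents of /tmp/out.txt to `extract_scores`.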
