

# Misc. graph procedures
<a name="custom-algorithms"></a>

 You can run the miscellaneous graph procedures on your graphs to gain insight into the graphs and their metrics. 

 Property Graph Information (`graph.pg_info`) summarizes some of the basic metrics of the graph, such as the number of vertices, the number of edges, the number of edge properties, the number of vertex properties, the number of edge labels, and the number of vertex labels. 

 The `neptune.graph.pg_schema()` procedure provides a comprehensive overview of the graph structure. It extracts and summarizes the current schema of a Neptune Analytics graph: the property names and data types that appear on vertices and edges with particular labels. The procedure is designed for use cases such as schema visualization, integration with third-party applications, and inclusion in open-source tools. 

 The `neptune.algo.degreeDistribution()` procedure analyzes the structural characteristics of a graph. It calculates the frequency distribution of vertex degrees across the entire network and provides basic statistics of the distribution. 

**Topics**
+ [Property graph information](custom-algorithms-property-graph.md)
+ [Property graph schema](custom-algorithms-property-graph-schema.md)
+ [.degreeDistribution algorithm](degreeDistribution.md)

# Property graph information
<a name="custom-algorithms-property-graph"></a>

 Property Graph Information (`graph.pg_info`) summarizes some of the basic metrics of the graph, such as the number of vertices, the number of edges, the number of edge properties, the number of vertex properties, the number of edge labels, and the number of vertex labels. 

## Inputs for `graph.pg_info`
<a name="custom-algorithms-property-graph-input"></a>

There are no inputs for `graph.pg_info`.

## Outputs for `graph.pg_info`
<a name="custom-algorithms-property-graph-output"></a>

There are two columns in the output relation: the first column is the metric name and the second column is the count.

**metric: the metrics that `graph.pg_info` returns, which include:**
+ numVertices: the number of vertices in the graph.
+ numEdges: the number of edges in the graph.
+ numVertexProperties: the number of vertex properties in the graph.
+ numEdgeProperties: the number of edge properties in the graph.
+ numVertexLabels: the number of unique vertex labels in the graph.
+ numEdgeLabels: the number of unique edge labels in the graph.

**count**
+ count: the value of the above metrics.

## `graph.pg_info` query example
<a name="custom-algorithms-property-graph-query-example"></a>

```
// Syntax
CALL neptune.graph.pg_info() 
YIELD metric, count
RETURN metric, count
```

## `graph.pg_info` query integration
<a name="custom-algorithms-property-graph-query-integration"></a>

```
// Sample query integration: return only the vertex count
CALL neptune.graph.pg_info()
YIELD metric, count
WHERE metric = 'numVertices'
RETURN count
```

## Sample `graph.pg_info` output
<a name="custom-algorithms-property-graph-output-example"></a>

```
# sample output of graph.pg_info
aws neptune-graph execute-query \
     --graph-identifier ${graphIdentifier} \
     --query-string "CALL neptune.graph.pg_info()
     YIELD metric, count
     RETURN metric, count " \
     --language open_cypher \
     /tmp/out.txt
cat /tmp/out.txt
{
  "results": [{
      "metric": "numVertices",
      "count": 3748
    }, {
      "metric": "numEdges",
      "count": 57538
    }, {
      "metric": "numVertexProperties",
      "count": 42773
    }, {
      "metric": "numEdgeProperties",
      "count": 50532
    }, {
      "metric": "numVertexLabels",
      "count": 4
    }, {
      "metric": "numEdgeLabels",
      "count": 2
    }]
}
```
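Because the result is a list of `{metric, count}` records, it can be convenient to fold it into a dictionary on the client. The following Python sketch assumes the response has already been saved as JSON in the shape shown above. The helper name `metrics_by_name` is hypothetical and not part of any Neptune SDK, and the derived "edges per vertex" figure is only an illustration of what the counts allow you to compute.

```python
import json

# Response in the same shape as the sample output above (values copied from it).
raw = """
{"results": [
  {"metric": "numVertices", "count": 3748},
  {"metric": "numEdges", "count": 57538},
  {"metric": "numVertexProperties", "count": 42773},
  {"metric": "numEdgeProperties", "count": 50532},
  {"metric": "numVertexLabels", "count": 4},
  {"metric": "numEdgeLabels", "count": 2}
]}
"""


def metrics_by_name(response: dict) -> dict:
    """Fold the two-column (metric, count) relation into {metric: count}."""
    return {row["metric"]: row["count"] for row in response["results"]}


metrics = metrics_by_name(json.loads(raw))

# Illustrative derived figure: average number of edges per vertex.
edges_per_vertex = metrics["numEdges"] / metrics["numVertices"]

print(f"vertices={metrics['numVertices']}, edges={metrics['numEdges']}")
print(f"edges per vertex = {edges_per_vertex:.2f}")
```

Keying by metric name avoids relying on the order in which the rows are returned.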

# Property graph schema
<a name="custom-algorithms-property-graph-schema"></a>

 The `neptune.graph.pg_schema()` procedure provides a comprehensive overview of the graph structure. It extracts and summarizes the current schema of a Neptune Analytics graph: the property names and data types that appear on vertices and edges with particular labels. The procedure is designed for use cases such as schema visualization, integration with third-party applications, and inclusion in open-source tools. 

**Benefits:**
+  Node and edge label enumeration: The procedure identifies and lists all unique labels for nodes and edges present in the graph (`nodeLabels` and `edgeLabels`, respectively). 
+  Property and data type analysis: For each node and edge label, it catalogs associated properties and their corresponding data types (`nodeLabelDetails` and `edgeLabelDetails`, respectively). This information is crucial for understanding the attributes of different graph elements. 
+  Topological relationship mapping: The procedure generates a set of triples in the format `(nodeLabel)-[edgeLabel]->(nodeLabel)`, effectively summarizing the graph's topology and the relationships between different node types (`labelTriples`). 
+  Consistency across tools: By providing a standardized schema representation, the procedure ensures consistency across various third-party and open-source tools that interact with the graph database. 
+  Integration-friendly output: The schema information is formatted in a way that facilitates easy integration with AI tools, visualization software, and reporting systems. 

 This procedure provides a single, unified way to extract complete and up-to-date schema information, supporting applications that range from AI-driven query generation to data visualization and reporting. 

## Inputs for `neptune.graph.pg_schema()`
<a name="custom-algorithms-property-graph-schema-input"></a>

There are no inputs for `neptune.graph.pg_schema()`.

## Outputs for `neptune.graph.pg_schema()`
<a name="custom-algorithms-property-graph-schema-output"></a>

 The output has a single column, `schema`, which contains a map with the following key components: 
+  `nodeLabels`: A list of all unique labels assigned to nodes/vertices in the graph. 
+  `edgeLabels`: A list of all unique labels assigned to relationships/edges in the graph. 
+  `nodeLabelDetails`: For each node label, the properties found on nodes with that label, along with the data types each property takes across those nodes. 
  +  `label` - The node label or labels. 
  +  `properties` - A key-value dictionary (map) holding the superset of properties for the node label: 
    +  `<key:> name` - The property name. 
    +  `<value:> A key-value dictionary (map)` - Stores data types that are available for the property. 
      +  `<key:> "datatypes"` , 
      +  `<value:> array[string]` 
    + e.g.,

      ```
      "contains": {
        "properties": {
          "weight": {
            "datatypes": ["Int"]
          }
        }
      }
      ```
+  `edgeLabelDetails`: For each edge label, the properties found on edges with that label, along with the data types each property takes across those edges. 
  +  `label` - The edge label. 
  +  `properties` - A key-value dictionary (map) of properties for the edge label: 
    +  `<key:>` name - The property name. 
    +  `<value:>` A key-value dictionary (map) - Stores data types that are available for the property. 
      +  `<key:> "datatypes"` , 
      +  `<value:> array[string]` 
+  `labelTriples`: A set of `nodeLabel-edgeLabel->nodeLabel` combinations that represent the connections between different types of nodes in the graph. These triples summarize the graph's topology by showing how different node types are related through various edge types. Each entry is a key-value dictionary, holding the following: 
  +  `~type` - The edge label. 
  +  `~from` - The node label of the source node, i.e., the node the edge starts from. 
  +  `~to` - The node label of the target node, i.e., the node the edge points to. 
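As an illustration of how the `labelTriples` entries can be consumed, the Python sketch below renders each entry as a `(from)-[type]->(to)` pattern. The `schema` dictionary is a hand-written excerpt in the documented shape, with labels borrowed from the air-routes sample, and `triple_patterns` is a hypothetical helper name rather than part of any Neptune library.

```python
# Hand-written excerpt in the documented shape (assumption: the schema map
# has already been fetched and decoded from JSON into a Python dict).
schema = {
    "labelTriples": [
        {"~type": "route", "~from": "airport", "~to": "airport"},
        {"~type": "contains", "~from": "country", "~to": "airport"},
    ],
}


def triple_patterns(schema: dict) -> list:
    """Render each labelTriples entry as '(from)-[type]->(to)'."""
    return [
        f"({t['~from']})-[{t['~type']}]->({t['~to']})"
        for t in schema.get("labelTriples", [])
    ]


for pattern in triple_patterns(schema):
    print(pattern)
```

Strings in this form are compact enough to paste into a prompt or a diagram legend when describing the graph's topology.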

## `neptune.graph.pg_schema()` query example
<a name="custom-algorithms-property-graph-schema-query-example"></a>

```
// Syntax
CALL neptune.graph.pg_schema() 
YIELD schema
RETURN schema
```

## `neptune.graph.pg_schema()` query integration
<a name="custom-algorithms-property-graph-schema-query-integration"></a>

```
// Sample query integration:
// call pg_schema,
// take the node labels,
// sort them alphabetically,
// then count the vertices that carry each label and return the counts.

CALL neptune.graph.pg_schema()
YIELD schema
WITH schema.nodeLabels as nl
UNWIND collSort(nl) as label
MATCH (n)
WHERE label in labels(n)
RETURN label, COUNT(n) as count

// Output

{
  "results": [{
      "label": "airport",
      "count": 27
    }, {
      "label": "country",
      "count": 3
    }, {
      "label": "version",
      "count": 3
    }]
}
```

## Sample `neptune.graph.pg_schema()` output
<a name="custom-algorithms-property-graph-schema-output-example"></a>

```
aws neptune-graph execute-query \
    --graph-identifier ${graphIdentifier} \
    --query-string 'CALL neptune.graph.pg_schema()
                    YIELD schema
                    RETURN schema' \
    --language open_cypher \
    /tmp/out.txt
{
  "results": [{
      "schema": {
        "edgeLabelDetails": {
          "route": {
            "properties": {
              "weight": {
                "datatypes": ["Int"]
              },
              "dist": {
                "datatypes": ["Int"]
              }
            }
          },
          "contains": {
            "properties": {
              "weight": {
                "datatypes": ["Int"]
              }
            }
          }
        },
        "edgeLabels": ["route", "contains"],
        "nodeLabels": ["version", "airport", "continent", "country"],
        "labelTriples": [{
            "~type": "route",
            "~from": "airport",
            "~to": "airport"
          }, {
            "~type": "contains",
            "~from": "country",
            "~to": "airport"
          }, {
            "~type": "contains",
            "~from": "continent",
            "~to": "airport"
          }],
        "nodeLabelDetails": {
          "continent": {
            "properties": {
              "type": {
                "datatypes": ["String"]
              },
              "code": {
                "datatypes": ["String"]
              },
              "desc": {
                "datatypes": ["String"]
              }
            }
          },
          "airport": {
            "properties": {
              "type": {
                "datatypes": ["String"]
              },
              "city": {
                "datatypes": ["String"]
              },
              "icao": {
                "datatypes": ["String"]
              },
              "code": {
                "datatypes": ["String"]
              },
              "country": {
                "datatypes": ["String"]
              },
              "lat": {
                "datatypes": ["Double"]
              },
              "longest": {
                "datatypes": ["Int"]
              },
              "runways": {
                "datatypes": ["Int"]
              },
              "desc": {
                "datatypes": ["String"]
              },
              "lon": {
                "datatypes": ["Double"]
              },
              "region": {
                "datatypes": ["String"]
              },
              "elev": {
                "datatypes": ["Int"]
              }
            }
          },
          "country": {
            "properties": {
              "type": {
                "datatypes": ["String"]
              },
              "code": {
                "datatypes": ["String"]
              },
              "desc": {
                "datatypes": ["String"]
              }
            }
          },
          "version": {
            "properties": {
              "date": {
                "datatypes": ["String"]
              },
              "desc": {
                "datatypes": ["String"]
              },
              "author": {
                "datatypes": ["String"]
              },
              "type": {
                "datatypes": ["String"]
              },
              "code": {
                "datatypes": ["String"]
              }
            }
          }
        }
      }
    }]
}
```
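One way to work with a schema like this on the client is to flatten `nodeLabelDetails` into rows and flag properties that appear with more than one data type. The Python sketch below is a minimal illustration under stated assumptions: the `schema` dictionary is a small excerpt of the sample above, the second data type on `elev` is hypothetical and added only to exercise the check, and `property_rows` is an illustrative helper name.

```python
# Excerpt of the sample schema above. The "Double" entry on "elev" is
# HYPOTHETICAL and added only so the mixed-type check has something to find.
schema = {
    "nodeLabelDetails": {
        "country": {"properties": {"code": {"datatypes": ["String"]},
                                   "desc": {"datatypes": ["String"]}}},
        "airport": {"properties": {"lat": {"datatypes": ["Double"]},
                                   "elev": {"datatypes": ["Int", "Double"]}}},
    }
}


def property_rows(details: dict) -> list:
    """Flatten {label: {properties: {name: {datatypes}}}} into sorted rows."""
    rows = []
    for label, info in sorted(details.items()):
        for name, spec in sorted(info.get("properties", {}).items()):
            rows.append((label, name, spec["datatypes"]))
    return rows


rows = property_rows(schema["nodeLabelDetails"])

# Properties observed with more than one data type under the same label.
mixed = [(label, name) for label, name, types in rows if len(types) > 1]

for label, name, types in rows:
    print(f"{label}.{name}: {', '.join(types)}")
print("mixed-type properties:", mixed)
```

The same function can be applied to `edgeLabelDetails`, since both maps share the `properties` and `datatypes` layout.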

# .degreeDistribution algorithm
<a name="degreeDistribution"></a>

The `.degreeDistribution` algorithm is a tool for analyzing and visualizing the structural characteristics of a graph. It calculates the frequency distribution of vertex degrees across the entire network and provides basic statistics of the distribution.

`.degreeDistribution` provides insight into the topology and connectivity patterns of the network, such as identifying hubs (i.e., super nodes or high-degree nodes) and distinguishing different network types (e.g., tree vs. scale-free), which can help you make informed decisions when selecting appropriate algorithms for analysis.

The `%degreeDistribution` magic command in the notebook provides an interactive visualization of the output. See the [notebook magics](https://docs.aws.amazon.com//neptune/latest/userguide/notebooks-magics.html#notebooks-line-magics-degreeDistribution) documentation for details.

## `.degreeDistribution`   syntax
<a name="degreeDistribution-syntax"></a>

```
CALL neptune.algo.degreeDistribution(
  {
    vertexLabels: [a list of vertex labels for filtering (optional)],
    edgeLabels: [a list of edge labels for filtering (optional)],
    binWidth: a positive integer that specifies the size of each bin in the degree distribution (optional, default: 1),
    traversalDirection: the direction of edge used for degree computation (optional, default: "both"),
    concurrency: number of threads to use (optional)
  }
)
YIELD output
RETURN output
```

## `.degreeDistribution`   inputs
<a name="degreeDistribution-inputs"></a>
+ **a configuration object that contains:**
  + **vertexLabels**   *(optional)*   –   *type:* a list of vertex label strings;   *example:* `["airport", ...]`;   *default:* no vertex filtering.

    To filter on one or more vertex labels, provide a list of the labels to filter on. If no `vertexLabels` field is provided, then all vertex labels are processed during traversal.
  + **edgeLabels**   *(optional)*   –   *type:* a list of edge label strings;   *example:* `["route", ...]`;   *default:* no edge filtering.

    To filter on one or more edge labels, provide a list of the labels to filter on. If no `edgeLabels` field is provided, then all edge labels are processed during traversal.
  + **binWidth** *(optional)*   –   *type:* `integer`;   *default: 1*.

    To specify the size of each bin in the degree distribution, provide a positive integer value.
  + **traversalDirection** *(optional)*   –   *type:* `string`;   *default:* `"both"`.

    The direction of the edges counted when computing vertex degree. Must be one of: `"inbound"`, `"outbound"`, or `"both"`.
  + **concurrency**   *(optional)*   –   *type:* 0 or 1;   *default:* 0.

    Controls the number of concurrent threads used to run the algorithm.

    If set to `0`, uses all available threads to complete execution of the individual algorithm invocation. If set to `1`, uses a single thread. A single thread can be useful when you need to run many algorithm invocations concurrently.
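When the query string is assembled programmatically, the constraints listed above can be checked before the call is sent. The following Python sketch is one possible approach, not an official client: `build_query` and `_label_list` are hypothetical helper names, the `concurrency` field is left out for brevity, and labels containing quotes or backslashes are rejected rather than escaped.

```python
VALID_DIRECTIONS = {"inbound", "outbound", "both"}


def _label_list(labels):
    """Format labels as an openCypher list literal, e.g. ['a', 'b']."""
    for label in labels:
        # Simplifying assumption: reject characters that would need escaping.
        if "'" in label or "\\" in label:
            raise ValueError(f"unsupported character in label: {label!r}")
    return "[" + ", ".join(f"'{label}'" for label in labels) + "]"


def build_query(vertex_labels=None, edge_labels=None,
                bin_width=1, traversal_direction="both"):
    """Build a degreeDistribution call from validated settings."""
    if (not isinstance(bin_width, int) or isinstance(bin_width, bool)
            or bin_width < 1):
        raise ValueError("binWidth must be a positive integer")
    if traversal_direction not in VALID_DIRECTIONS:
        raise ValueError("traversalDirection must be inbound, outbound, or both")

    fields = []
    if vertex_labels:  # omitted field means no vertex filtering
        fields.append(f"vertexLabels: {_label_list(vertex_labels)}")
    if edge_labels:    # omitted field means no edge filtering
        fields.append(f"edgeLabels: {_label_list(edge_labels)}")
    fields.append(f"binWidth: {bin_width}")
    fields.append(f"traversalDirection: '{traversal_direction}'")

    return ("CALL neptune.algo.degreeDistribution({" + ", ".join(fields)
            + "}) YIELD output RETURN output")


query = build_query(["airport", "country"], ["route"], 50, "inbound")
print(query)
```

The resulting string can be passed as the `--query-string` argument of `aws neptune-graph execute-query`.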

## `.degreeDistribution`   outputs
<a name="degreeDistribution-outputs"></a>

There is a single column in the output containing a map with the following key components:
+ **distribution**   –   A list of lists where each list item is as follows:
  + [`degree`, `count`]   –   Degree and corresponding count. The list is sorted in increasing order of `degree`.
+ **statistics**   –   A map with the following components:
  + `maxDeg`   –   the maximum degree in the graph.
  + `mean`   –   the average degree in the graph.
  + `minDeg`   –   the minimum degree in the graph.
  + `p50`   –   the 50th percentile degree in the graph, i.e., median.
  + `p75`   –   the 75th percentile degree in the graph.
  + `p90`   –   the 90th percentile degree in the graph.
  + `p95`   –   the 95th percentile degree in the graph.
  + `p99`   –   the 99th percentile degree in the graph.
  + `p999`   –   the 99.9th percentile degree in the graph.
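To make the shape of this output concrete, the Python sketch below computes a degree frequency distribution and a few of the statistics from a plain list of vertex degrees. It is a conceptual illustration only, not the procedure's implementation: it covers just the `binWidth: 1` case, where each degree is its own bin, and it omits the percentiles because their exact calculation method is not described here. The function name `degree_distribution` and the input list are made up for the example.

```python
from collections import Counter


def degree_distribution(degrees):
    """Frequency distribution of degrees (one bin per degree) plus basic stats."""
    counts = Counter(degrees)
    # [degree, count] pairs sorted in increasing order of degree.
    distribution = [[d, counts[d]] for d in sorted(counts)]
    statistics = {
        "minDeg": min(degrees),
        "maxDeg": max(degrees),
        "mean": sum(degrees) / len(degrees),
    }
    return {"distribution": distribution, "statistics": statistics}


# Made-up degrees for seven vertices: one isolated vertex and one small hub.
out = degree_distribution([0, 1, 1, 2, 2, 2, 6])
print(out["distribution"])
print(out["statistics"])
```

A long tail in the distribution, like the single degree-6 vertex here, is the pattern that points to hubs in a real graph.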

## `.degreeDistribution`   query examples
<a name="degreeDistribution-query-examples"></a>

This is a standalone example, where the in-degree distribution is computed for the graph with specified vertex labels and edge label, and the mean degree is returned.

```
CALL neptune.algo.degreeDistribution({
   vertexLabels: ['airport', 'country'],
   edgeLabels: ['route'],
   traversalDirection: 'inbound',
})
YIELD output
WITH output.statistics.mean as meanDegree
RETURN meanDegree
```

## Sample   `.degreeDistribution`   output
<a name="degreeDistribution-sample-output"></a>

Here is an example of the output returned by `.degreeDistribution` when run against the sample air-routes dataset ([nodes](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-nodes.csv), [edges](https://github.com/krlawrence/graph/blob/main/sample-data/air-routes-latest-edges.csv)), using the following query:

```
aws neptune-graph execute-query \
   --region ${region} \
   --graph-identifier ${graphIdentifier} \
   --query-string "CALL neptune.algo.degreeDistribution({binWidth: 50, vertexLabels: ['airport', 'country'], edgeLabels: ['route'], traversalDirection: 'inbound'}) YIELD output RETURN output" \
   --language open_cypher \
   /tmp/out.txt
   
cat /tmp/out.txt
{
  "results": [{
      "output": {
        "statistics": {
          "maxDeg": 307,
          "mean": 13.511229946524065,
          "minDeg": 0,
          "p50": 3,          
          "p75": 9,
          "p90": 36,
          "p95": 67,          
          "p99": 173,
          "p999": 284          
        },
        "distribution": [[0, 268], [50, 3204], [100, 162], [150, 54], [200, 29], [250, 16], [300, 5], [350, 2]]
      }
    }]
}
```
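The `distribution` list lends itself to simple post-processing, such as computing the share of vertices that falls at or below each bin. The Python sketch below uses the values from the sample output above. It treats each bin label as an opaque key, because how a label maps to a range of degrees is not spelled out here, and the variable names are illustrative.

```python
# Values copied from the "distribution" field of the sample output above.
distribution = [[0, 268], [50, 3204], [100, 162], [150, 54],
                [200, 29], [250, 16], [300, 5], [350, 2]]

# Total number of vertices covered by the distribution.
total = sum(count for _, count in distribution)

# (bin label, count, cumulative share of vertices up to and including the bin).
running = 0
cumulative = []
for bin_label, count in distribution:
    running += count
    cumulative.append((bin_label, count, running / total))

print(f"vertices counted: {total}")
for bin_label, count, share in cumulative:
    print(f"bin {bin_label:>3}: {count:>5} vertices, cumulative {share:.1%}")
```

A cumulative view like this makes the skew visible: the first two bins already account for the large majority of vertices, while the last few bins hold the hubs.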