Using Amazon Neptune with graph notebooks - Amazon Neptune

Using Amazon Neptune with graph notebooks

To work with Neptune graphs, you can use a Neptune graph notebook or create a new Neptune database using an AWS CloudFormation template.

Whether you're new to graphs and want to learn and experiment, or you're experienced and want to refine your queries, the Neptune workbench offers an interactive development environment (IDE) that can boost your productivity when you're building graph applications. The workbench provides a user-friendly interface for interacting with your Neptune database, writing queries, and visualizing your data.

By using the AWS CloudFormation template to set up your Neptune database, and the workbench to develop your graph applications, you can get started with Neptune quickly, without the need for additional tooling. This allows you to focus on building your applications rather than setting up the underlying infrastructure.

Neptune provides Jupyter and JupyterLab notebooks in the open-source Neptune graph notebook project on GitHub, and in the Neptune workbench. These notebooks offer sample application tutorials and code snippets in an interactive coding environment where you can learn about graph technology and Neptune. You can use them to walk through setting up, configuring, populating and querying graphs using different query languages, different data sets, and even different databases on the back end.

You can host these notebooks in several different ways:

  • The Neptune workbench lets you run Jupyter notebooks in a fully managed environment, hosted in Amazon SageMaker AI, and automatically loads the latest release of the Neptune graph notebook project for you. It is easy to set up the workbench in the Neptune console when you create a new Neptune database.

    Note

    When creating a Neptune notebook instance, you are provided with two options for network access: Direct access through Amazon SageMaker AI (the default) and access through a VPC. In either option, the notebook requires access to the internet to fetch package dependencies for installing the Neptune workbench. Lack of internet access will cause the creation of a Neptune notebook instance to fail.

  • You can also install Jupyter locally. This lets you run the notebooks from your laptop, connected either to Neptune or to a local instance of one of the open-source graph databases. In the latter case, you can experiment with graph technology as much as you want before you spend a penny. Then, when you're ready, you can move smoothly to the managed production environment that Neptune offers.

Using the Neptune workbench to host Neptune notebooks

Neptune offers T3 and T4g instance types that you can get started with for less than $0.10 per hour. You are billed for workbench resources through Amazon SageMaker AI, separately from your Neptune billing. See the Neptune pricing page. Jupyter and JupyterLab notebooks created on the Neptune workbench all use an Amazon Linux 2 and JupyterLab 3 environment. For more information about JupyterLab notebook support, see the Amazon SageMaker AI documentation.

You can create a Jupyter or JupyterLab notebook using the Neptune workbench in the AWS Management Console in either of two ways:

  • Use the Notebook configuration menu when creating a new Neptune DB cluster. To do this, follow the steps outlined in Launching a Neptune DB cluster using the AWS Management Console.

  • Use the Notebooks menu in the left navigation pane after your DB cluster has already been created. To do this, follow the steps below.

To create a Jupyter or JupyterLab notebook using the Notebooks menu
  1. Sign in to the AWS Management Console, and open the Amazon Neptune console at https://console.aws.amazon.com/neptune/home.

  2. In the navigation pane on the left, choose Notebooks.

  3. Choose Create notebook.

  4. In the Cluster list, choose your Neptune DB cluster. If you don't yet have a DB cluster, choose Create cluster to create one.

  5. Select a Notebook instance type.

  6. Give your notebook a name, and optionally a description.

  7. Unless you already created an AWS Identity and Access Management (IAM) role for your notebooks, choose Create an IAM role, and enter an IAM role name.

    Note

    If you do choose to re-use an IAM role created for a previous notebook, the role policy must contain the correct permissions to access the Neptune DB cluster that you're using. You can verify this by checking that the components in the resource ARN under the neptune-db:* action match that cluster. Incorrectly configured permissions result in connection errors when you try to run notebook magic commands.

  8. Choose Create notebook. The creation process may take 5 to 10 minutes before everything is ready.

  9. After your notebook is created, select it and then choose Open Jupyter or Open JupyterLab.
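The check described in the note under step 7 (comparing the components of the resource ARN with your cluster) can be sketched in Python. This is a minimal illustration, not an AWS tool: the function names are hypothetical, the example region, account ID, and resource ID are made up, and wildcard ARNs are not handled.

```python
def parse_neptune_db_arn(arn: str) -> dict:
    """Split a neptune-db resource ARN into its components.

    Expected shape (taken from the policy template on this page):
    arn:<partition>:neptune-db:<region>:<account-id>:<resource-id>/*
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "neptune-db":
        raise ValueError(f"not a neptune-db ARN: {arn!r}")
    # Drop the trailing "/*" (or any other path) after the resource ID.
    resource_id = parts[5].split("/", 1)[0]
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "resource_id": resource_id,
    }


def arn_matches_cluster(arn: str, region: str, account_id: str, resource_id: str) -> bool:
    """Return True if the ARN points at the given cluster (hypothetical helper)."""
    parsed = parse_neptune_db_arn(arn)
    return (parsed["region"], parsed["account_id"], parsed["resource_id"]) == (
        region,
        account_id,
        resource_id,
    )


# Made-up example values, for illustration only.
example_arn = "arn:aws:neptune-db:us-east-1:123456789012:cluster-EXAMPLE/*"
print(arn_matches_cluster(example_arn, "us-east-1", "123456789012", "cluster-EXAMPLE"))
```

If the function returns False for the role you plan to re-use, the policy likely targets a different cluster, Region, or account.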

The console can create an AWS Identity and Access Management (IAM) role for your notebooks, or you can create one yourself. The policy for this role should include the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::aws-neptune-notebook",
        "arn:aws:s3:::aws-neptune-notebook/*",
        "arn:aws:s3:::aws-neptune-notebook-(AWS region)",
        "arn:aws:s3:::aws-neptune-notebook-(AWS region)/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "neptune-db:*",
      "Resource": [
        "arn:aws:neptune-db:(AWS region):(AWS account ID):(Neptune resource ID)/*"
      ]
    }
  ]
}

Note that the second statement in the policy above lists one or more Neptune cluster resource IDs.

Also, the role should establish the following trust relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
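If you create the role yourself, it can help to generate both documents from your own values rather than editing the placeholders by hand. The following is a minimal sketch using only the Python standard library. The function name and the example Region, account ID, and resource IDs are assumptions for illustration; the statements themselves mirror the policy and trust relationship shown above.

```python
import json


def build_notebook_policy(region: str, account_id: str, resource_ids: list) -> dict:
    """Build the notebook role policy for one or more Neptune cluster resource IDs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read access to the buckets that hold the notebook content.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::aws-neptune-notebook",
                    "arn:aws:s3:::aws-neptune-notebook/*",
                    f"arn:aws:s3:::aws-neptune-notebook-{region}",
                    f"arn:aws:s3:::aws-neptune-notebook-{region}/*",
                ],
            },
            {
                # Data-plane access to each listed Neptune cluster.
                "Effect": "Allow",
                "Action": "neptune-db:*",
                "Resource": [
                    f"arn:aws:neptune-db:{region}:{account_id}:{rid}/*"
                    for rid in resource_ids
                ],
            },
        ],
    }


# Trust relationship that lets SageMaker AI assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Made-up example values, for illustration only.
policy = build_notebook_policy(
    "us-east-1", "123456789012", ["cluster-EXAMPLE1", "cluster-EXAMPLE2"]
)
print(json.dumps(policy, indent=2))
print(json.dumps(TRUST_POLICY, indent=2))
```

The printed JSON can be pasted into the IAM console, or passed to whatever tooling you use to create roles.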

Again, getting everything ready to go can take 5 to 10 minutes.

You can configure your new notebook to work with Neptune ML, as explained in Manually configuring a Neptune notebook for Neptune ML.

Using Python to connect a generic SageMaker AI notebook to Neptune

Connecting a notebook to Neptune is easy if you have installed the Neptune magics, but it is also possible to connect a SageMaker AI notebook to Neptune using Python, even if you are not using a Neptune notebook.

Steps to take to connect to Neptune in a SageMaker AI notebook cell
  1. Install the Gremlin Python client:

    !pip install gremlinpython

    Neptune notebooks install the Gremlin Python client for you, so this step is only necessary if you're using a plain SageMaker AI notebook.

  2. Write code such as the following to connect and issue a Gremlin query:

    from gremlin_python import statics
    from gremlin_python.structure.graph import Graph
    from gremlin_python.process.graph_traversal import __
    from gremlin_python.process.strategies import *
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.driver.aiohttp.transport import AiohttpTransport
    from gremlin_python.process.traversal import *
    import os

    port = 8182
    server = '(your server endpoint)'

    endpoint = f'wss://{server}:{port}/gremlin'

    graph=Graph()

    # call_from_event_loop=True lets the client run inside the notebook's event loop
    connection = DriverRemoteConnection(endpoint,'g',
                     transport_factory=lambda:AiohttpTransport(call_from_event_loop=True))

    g = graph.traversal().withRemote(connection)

    # Sample 10 airports and return their code and city, ordered by code
    results = (g.V().hasLabel('airport')
                    .sample(10)
                    .order()
                    .by('code')
                    .local(__.values('code','city').fold())
                    .toList())

    # Print the results in a tabular form with a row index
    for i,c in enumerate(results,1):
        print("%3d %4s %s" % (i,c[0],c[1]))

    connection.close()
Note

If you happen to be using a version of the Gremlin Python client that is older than 3.5.0, this line:

connection = DriverRemoteConnection(endpoint,'g', transport_factory=lambda:AiohttpTransport(call_from_event_loop=True))

would just be:

connection = DriverRemoteConnection(endpoint,'g')
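If the same notebook code has to run against different client versions, the choice between the two forms can be made at run time. The following is a rough sketch: the helper names are hypothetical, and the 3.5.0 threshold is the one stated in the note above. The Gremlin import is deferred so that the version check itself works without the client installed.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn a version string such as '3.6.2' into a comparable (major, minor, patch) tuple."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits, ignore suffixes like 'rc1'
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def needs_aiohttp_transport(version: str) -> bool:
    """True when the client is 3.5.0 or newer, per the note above."""
    return parse_version(version) >= (3, 5, 0)


def connection_kwargs(version: str) -> dict:
    """Extra keyword arguments to pass to DriverRemoteConnection (hypothetical helper)."""
    if not needs_aiohttp_transport(version):
        return {}
    # Imported lazily: this module only exists in gremlinpython 3.5.0 and later.
    from gremlin_python.driver.aiohttp.transport import AiohttpTransport
    return {"transport_factory": lambda: AiohttpTransport(call_from_event_loop=True)}


# Possible usage in a notebook cell, assuming gremlinpython is installed:
#   from importlib.metadata import version
#   connection = DriverRemoteConnection(
#       endpoint, 'g', **connection_kwargs(version("gremlinpython")))
print(needs_aiohttp_transport("3.4.13"), needs_aiohttp_transport("3.6.2"))
```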

Enabling CloudWatch logs on Neptune Notebooks

CloudWatch logs are now enabled by default for Neptune Notebooks. If you have an older notebook that is not producing CloudWatch logs, follow these steps to enable them manually:

  1. Sign in to the AWS Management Console and open the SageMaker AI console.

  2. On the navigation pane on the left, choose Notebook, then Notebook Instances. Look for the name of the Neptune notebook for which you would like to enable logs.

  3. Go to the details page by selecting the name of that notebook instance.

  4. If the notebook instance is running, select the Stop button, at the top right of the notebook details page.

  5. Under Permissions and encryption there is a field for IAM role ARN. Select the link in this field to go to the IAM role that this notebook instance runs with.

  6. Create the following policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogDelivery",
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:DeleteLogDelivery",
            "logs:Describe*",
            "logs:GetLogDelivery",
            "logs:GetLogEvents",
            "logs:ListLogDeliveries",
            "logs:PutLogEvents",
            "logs:PutResourcePolicy",
            "logs:UpdateLogDelivery"
          ],
          "Resource": "*"
        }
      ]
    }
  7. Save this new policy and attach it to the IAM role found in Step 5.

  8. Click Start at the top right of the SageMaker AI notebook instance details page.

  9. When logs start flowing, you should see a View Logs link beneath the field labeled Lifecycle configuration near the bottom left of the Notebook instance settings section of the details page.

If a notebook fails to start, a message on the notebook details page in the SageMaker AI console states that the notebook instance took over 5 minutes to start. CloudWatch logs relevant to this issue can be found under this name:

(your-notebook-name)/LifecycleConfigOnStart
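As a rough sketch, the log stream name above can be built and read programmatically. The helper names here are hypothetical, and the log group /aws/sagemaker/NotebookInstances is an assumption about where SageMaker AI notebook instances write their logs, so confirm it in the CloudWatch console. The boto3 import is deferred so that the naming helper works on its own; fetching events requires boto3 and valid AWS credentials.

```python
# Assumption: log group used by SageMaker AI notebook instances.
LOG_GROUP = "/aws/sagemaker/NotebookInstances"


def lifecycle_log_stream(notebook_name: str) -> str:
    """Name of the log stream that holds the on-start lifecycle output."""
    return f"{notebook_name}/LifecycleConfigOnStart"


def fetch_start_log(notebook_name: str, limit: int = 50) -> list:
    """Return recent on-start log messages (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above has no dependencies

    logs = boto3.client("logs")
    response = logs.get_log_events(
        logGroupName=LOG_GROUP,
        logStreamName=lifecycle_log_stream(notebook_name),
        limit=limit,
    )
    return [event["message"] for event in response["events"]]


# "aws-neptune-demo" is a made-up notebook name.
print(lifecycle_log_stream("aws-neptune-demo"))
```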

Setting up graph notebooks on your local machine

The graph-notebook project has instructions for setting up Neptune notebooks on your local machine.

You can connect your local notebooks either to a Neptune DB cluster, or to a local or remote instance of an open-source graph database.

Using Neptune notebooks with Neptune clusters

If you are connecting to a Neptune cluster on the back end, you may want to run the notebooks in Amazon SageMaker AI. Connecting to Neptune from SageMaker AI can be more convenient than from a local installation of the notebooks, and it will let you work more easily with Neptune ML.

For instructions about how to set up notebooks in SageMaker AI, see Launching graph-notebook using Amazon SageMaker.

For instructions about how to set up and configure Neptune itself, see Setting up Amazon Neptune.

You can also connect a local installation of the Neptune notebooks to a Neptune DB cluster. This can be somewhat more complicated because Amazon Neptune DB clusters can only be created in an Amazon Virtual Private Cloud (VPC), which is by design isolated from the outside world. There are a number of ways to connect into a VPC from outside it. One is to use a load balancer. Another is to use VPC peering (see the Amazon Virtual Private Cloud Peering Guide).

The most convenient way for most people, however, is to set up an Amazon EC2 proxy server within the VPC and then use SSH tunneling (also called port forwarding) to connect to it. You can find setup instructions at Connecting graph notebook locally to Amazon Neptune in the additional-databases/neptune folder of the graph-notebook GitHub project.
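To make the port-forwarding idea concrete, here is a small sketch that assembles an OpenSSH command forwarding a local port to the Neptune endpoint through the EC2 proxy. The function name, key file, host names, and the ec2-user login are placeholders and assumptions; the graph-notebook instructions remain the authoritative reference for the full setup.

```python
import shlex


def ssh_tunnel_command(
    key_path: str,
    ec2_host: str,
    neptune_endpoint: str,
    port: int = 8182,
    user: str = "ec2-user",  # assumption: default user on Amazon Linux
) -> list:
    """Build an ssh command that forwards localhost:<port> to Neptune via the proxy."""
    return [
        "ssh",
        "-i", key_path,                           # private key for the EC2 proxy
        "-N",                                     # forward ports only, run no remote command
        "-L", f"{port}:{neptune_endpoint}:{port}",  # local port -> Neptune endpoint
        f"{user}@{ec2_host}",
    ]


# All values below are made-up placeholders.
cmd = ssh_tunnel_command(
    "keypair.pem",
    "ec2-proxy.example.com",
    "my-cluster.cluster-example.us-east-1.neptune.amazonaws.com",
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string lets you pass it straight to subprocess.run without shell quoting issues.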

Using Neptune notebooks with open-source graph databases

To get started with graph technology at no cost, you can also use Neptune notebooks with various open-source databases on the back end. Examples are the TinkerPop Gremlin server, and the Blazegraph database.

To use Gremlin Server as your back-end database, follow these steps:

To use a local instance of Blazegraph as your back-end database, follow these steps:

  • Review the Blazegraph quick-start instructions to understand the basic setup and configuration required for running a Blazegraph instance.

  • Access the graph-notebook Blazegraph configuration GitHub folder containing the necessary files and instructions for setting up a local Blazegraph instance.

  • Within the GitHub repository, navigate to the "blazegraph" directory and follow the provided instructions to set up your local Blazegraph instance. This includes steps for downloading the Blazegraph software, configuring the necessary files, and starting the Blazegraph server.

Once you have a local Blazegraph instance running, you can integrate it with your application as the backend database for your graph-based data and queries. Refer to the documentation and example code provided in the graph-notebook repository to learn how to connect your application to the Blazegraph instance.
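As one illustration of connecting an application to a running instance, the sketch below sends a SPARQL query over HTTP using only the Python standard library, following the SPARQL 1.1 Protocol (a form-encoded POST with a query parameter). The endpoint URL is an assumption about a default local Blazegraph setup and may differ in your installation; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: SPARQL endpoint of a locally running Blazegraph; adjust to your setup.
BLAZEGRAPH_ENDPOINT = "http://localhost:9999/blazegraph/sparql"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request that asks for JSON results."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_query(endpoint: str, query: str) -> dict:
    """Send the query and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query)) as response:
        return json.load(response)


request = build_sparql_request(
    BLAZEGRAPH_ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
)
print(request.get_method(), request.full_url)
```

Calling run_query with the same arguments would execute the query once the server is up.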

Migrating your Neptune notebooks from Jupyter to JupyterLab 3

Neptune notebooks created prior to December 21, 2022 use the Amazon Linux 1 environment. You can migrate these older Jupyter notebooks to the new Amazon Linux 2 environment with JupyterLab 3 by taking the steps described in this AWS blog post: Migrate your work to an Amazon SageMaker notebook instance with Amazon Linux 2.

A few more steps apply specifically to migrating Neptune notebooks to the new environment:

Neptune-specific prerequisites

In the source Neptune notebook's IAM role, add all of the following permissions:

{
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:ListBucket",
    "s3:CreateBucket",
    "s3:PutObject"
  ],
  "Resource": [
    "arn:aws:s3:::(your ebs backup bucket name)",
    "arn:aws:s3:::(your ebs backup bucket name)/*"
  ]
},
{
  "Effect": "Allow",
  "Action": [
    "sagemaker:ListTags"
  ],
  "Resource": [
    "*"
  ]
}

Be sure to specify the correct ARN for the S3 bucket you will use for backing up.

Neptune-specific lifecycle configuration

When creating the second lifecycle configuration script for restoring the backup (from on-create.sh) as described in the blog post, the lifecycle configuration name must follow the aws-neptune-* format, like aws-neptune-sync-from-s3. This ensures that the lifecycle configuration (LCC) can be selected during notebook creation in the Neptune console.
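A quick way to catch a naming mistake before creating the resource is a simple prefix check. This is a minimal sketch with a hypothetical helper name; it only tests the aws-neptune-* pattern mentioned above and does not validate any other naming rules that SageMaker AI may enforce.

```python
NEPTUNE_PREFIX = "aws-neptune-"


def has_neptune_name_format(name: str) -> bool:
    """True if the name starts with 'aws-neptune-' followed by at least one character."""
    return name.startswith(NEPTUNE_PREFIX) and len(name) > len(NEPTUNE_PREFIX)


print(has_neptune_name_format("aws-neptune-sync-from-s3"))
print(has_neptune_name_format("sync-from-s3"))
```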

Neptune-specific synchronization from a snapshot to a new instance

In the steps described in the blog post for synchronizing from a snapshot to a new instance, here are the Neptune-specific changes:

  • On step 4, choose notebook-al2-v2.

  • On step 5, re-use the IAM role from the source Neptune notebook.

  • Between steps 7 and 8:

    • In Notebook instance settings, set a name that uses the aws-neptune-* format.

    • Open the Network settings accordion and select the same VPC, Subnet, and Security group as in the source notebook.

Neptune-specific steps after the new notebook has been created

  1. Select the Open Jupyter button for the notebook. Once the SYNC_COMPLETE file shows up in the main directory, proceed to the next step.

  2. Go to the notebook instance page in the SageMaker AI console.

  3. Stop the notebook.

  4. Select Edit.

  5. In the notebook instance settings, edit the Lifecycle configuration field by selecting the source Neptune notebook's original Lifecycle. Note that this is not the EBS backup Lifecycle.

  6. Select Update notebook settings.

  7. Start the notebook again.
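Step 1 above asks you to wait until the SYNC_COMPLETE file appears. If you prefer to check from a notebook terminal instead of watching the file browser, a small polling loop like the following would do. It is only a sketch: the function name is hypothetical, and /home/ec2-user/SageMaker as the notebook's main directory is an assumption that you should verify on your instance.

```python
import time
from pathlib import Path

# Assumption: the Jupyter root directory on a SageMaker AI notebook instance.
DEFAULT_DIRECTORY = "/home/ec2-user/SageMaker"


def wait_for_marker(
    directory: str = DEFAULT_DIRECTORY,
    marker: str = "SYNC_COMPLETE",
    timeout: float = 600.0,
    interval: float = 5.0,
) -> bool:
    """Poll until the marker file exists; return False if the timeout expires first."""
    target = Path(directory) / marker
    deadline = time.monotonic() + timeout
    while True:
        if target.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Single immediate check (timeout=0), so this example returns right away.
print(wait_for_marker(timeout=0, interval=0))
```

A return value of True means the restore has finished and you can continue with step 2.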

With the modifications described here to the steps outlined in the blog post, your graph notebooks should now be migrated onto a new Neptune notebook instance that uses the Amazon Linux 2 and JupyterLab 3 environment. They'll show up for access and management on the Neptune page in the AWS Management Console, and you can now continue your work from where you left off by selecting either Open Jupyter or Open JupyterLab.