Using magics in Amazon Neptune notebooks
The Neptune workbench provides a number of so-called magic commands in the notebooks that save a great deal of time and effort. They fall into two categories: line magics and cell magics.
Line magics are commands preceded by a single percent sign (%). They only take line input, not input from the rest of the cell body. Neptune workbench provides the following line magics:
Cell magics are preceded by two percent signs (%%) rather than one, and use the cell content as input, although they can also take line content as input. Neptune workbench provides the following cell magics:
There are also two magics, a line magic and a cell magic, for working with Neptune machine learning:
Note
When working with Neptune magics, you can generally get help text using a --help or -h parameter. With a cell magic, the body cannot be empty, so when getting help, put filler text, even a single character, in the body. For example:
%%gremlin --help
x
Variable injection in cell or line magics
Variables defined in a notebook can be referenced inside any cell or line magic in the notebook using the format: ${VAR_NAME}.
For example, suppose you define these variables:
c = 'code'
my_edge_labels = '{"route":"dist"}'
Then, this Gremlin query in a cell magic:
%%gremlin -de $my_edge_labels
g.V().has('${c}','SAF').out('route').values('${c}')
Is equivalent to this:
%%gremlin -de {"route":"dist"}
g.V().has('code','SAF').out('route').values('code')
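The substitution described above can be approximated in plain Python. This is only a sketch of the idea, not the workbench's actual implementation: the helper name inject_variables is hypothetical, and it relies on the standard library's string.Template, which happens to understand both the $NAME and ${NAME} forms.

```python
from string import Template


def inject_variables(cell_text: str, namespace: dict) -> str:
    """Replace $NAME and ${NAME} placeholders with values from namespace.

    Hypothetical helper for illustration; the workbench performs its own
    substitution before the query is sent.
    """
    values = {name: str(value) for name, value in namespace.items()}
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(cell_text).safe_substitute(values)


# The same variables as in the example above.
notebook_vars = {"c": "code", "my_edge_labels": '{"route":"dist"}'}
query = "g.V().has('${c}','SAF').out('route').values('${c}')"
print(inject_variables(query, notebook_vars))
```

Using safe_substitute rather than substitute is a design choice for the sketch: a placeholder with no matching variable is left as written rather than causing an error.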
Query arguments that work with all query languages
The following query arguments work with the %%gremlin, %%opencypher, and %%sparql magics in the Neptune workbench:
Common query arguments
- --store-to (or -s) – Specifies the name of a variable in which to store the query results.
- --silent – If present, no output is displayed after the query completes.
- --group-by (or -g) – Specifies the property used to group nodes (such as code or T.region). Vertices are colored based on their assigned group.
- --ignore-groups – If present, all grouping options are ignored.
- --display-property (or -d) – Specifies the property whose value should be displayed for each vertex. The default value for each query language is as follows:
  - For Gremlin: T.label.
  - For openCypher: ~labels.
  - For SPARQL: type.
- --edge-display-property (or -de) – Specifies the property whose value should be displayed for each edge. The default value for each query language is as follows:
  - For Gremlin: T.label.
  - For openCypher: ~labels.
  - For SPARQL: type.
- --tooltip-property (or -t) – Specifies a property whose value should be displayed as a tooltip for each node. The default value for each query language is as follows:
  - For Gremlin: T.label.
  - For openCypher: ~labels.
  - For SPARQL: type.
- --edge-tooltip-property (or -te) – Specifies a property whose value should be displayed as a tooltip for each edge. The default value for each query language is as follows:
  - For Gremlin: T.label.
  - For openCypher: ~labels.
  - For SPARQL: type.
- --label-max-length (or -l) – Specifies the maximum character length of any vertex label. Defaults to 10.
- --edge-label-max-length (or -le) – Specifies the maximum character length of any edge label. Defaults to 10. In the case of openCypher only, this is --rel-label-max-length or -rel instead.
- --simulation-duration (or -sd) – Specifies the maximum duration of the visualization physics simulation. Defaults to 1500 ms.
- --stop-physics (or -sp) – Disables visualization physics after the initial simulation has stabilized.
Property values for these arguments can consist either of a single property key, or of a JSON string that can specify a different property for each label type. A JSON string can only be specified using variable injection.
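One way to prepare such a JSON string is to build it from a Python dictionary in an ordinary notebook cell and then inject the resulting variable into the magic. The sketch below assumes hypothetical vertex labels (airport, country) and property names (code, desc); substitute the labels and properties of your own graph.

```python
import json

# Hypothetical mapping from vertex label to the property to display.
# These labels and property names are placeholders for illustration.
display_by_label = {"airport": "code", "country": "desc"}

# Compact separators keep the string free of spaces, matching the style of
# the '{"route":"dist"}' example earlier in this page.
my_node_labels = json.dumps(display_by_label, separators=(",", ":"))
print(my_node_labels)

# The variable could then be injected into a magic, for example:
#   %%gremlin -d $my_node_labels
```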
The %seed line magic
The %seed line magic is a convenient way to add data to your Neptune endpoint that you can use to explore and experiment with Gremlin, openCypher, or SPARQL queries. It provides a form where you can select the data model you want to explore (property-graph or RDF) and then choose from among a number of different sample data sets that Neptune provides.
The %load line magic
The %load line magic generates a form that you can use to submit a bulk load request to Neptune (see Neptune Loader Command). The source must be an Amazon S3 path in the same region as the Neptune cluster.
The %load_ids line magic
The %load_ids line magic retrieves the load IDs that have been submitted to the notebook's host endpoint (see Neptune Loader Get-Status request parameters). The request takes this form:
GET https://your-neptune-endpoint:port/loader
The %load_status line magic
The %load_status line magic retrieves the load status of a particular load job that has been submitted to the notebook's host endpoint, specified by the line input (see Neptune Loader Get-Status request parameters). The request takes this form:
GET https://your-neptune-endpoint:port/loader?loadId=loadId
The line magic looks like this:
%load_status load id
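The two loader request forms shown for %load_ids and %load_status differ only in the loadId query parameter. As a rough sketch, the URL could be assembled like this; loader_url is a hypothetical helper, and the default port of 8182 is taken from the configuration defaults described elsewhere on this page.

```python
from typing import Optional
from urllib.parse import urlencode


def loader_url(host: str, port: int = 8182, load_id: Optional[str] = None) -> str:
    """Build a loader URL, adding ?loadId=... when a load ID is given.

    Hypothetical helper for illustration; the magics build the request
    themselves.
    """
    base = f"https://{host}:{port}/loader"
    if load_id is None:
        return base  # form used to list load IDs
    return f"{base}?{urlencode({'loadId': load_id})}"


# Placeholder endpoint and load ID, not real values.
print(loader_url("your-neptune-endpoint"))
print(loader_url("your-neptune-endpoint", load_id="example-load-id"))
```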
The %reset_graph line magic
The %reset_graph (or %_graph_reset) line magic executes a ResetGraph call against the Neptune Analytics endpoint. It accepts the following optional line input:
- -ns or --no-skip-snapshot – If present, a final graph snapshot will be created before the graph data is deleted.
- --silent – If present, no output is displayed after the reset call is submitted.
- --store-to – Specifies a variable in which to store the ResetGraph response.
The %cancel_load line magic
The %cancel_load line magic cancels a particular load job (see Neptune Loader Cancel Job). The request takes this form:
DELETE https://your-neptune-endpoint:port/loader?loadId=loadId
The line magic looks like this:
%cancel_load load id
The %status line magic
Retrieves status information from the notebook's host endpoint (%graph_notebook_config shows the host endpoint).
For Neptune DB hosts, status information will be fetched from the health status endpoint. For Neptune Analytics hosts, the status will be retrieved via the GetGraph API. See %get_graph for more information.
The %get_graph line magic
The %get_graph line magic retrieves information about a graph via the GetGraph API.
This magic is functionally identical to %status when used with Neptune Analytics.
The %gremlin_status line magic
Retrieves Gremlin query status information.
The %opencypher_status line magic (also %oc_status)
Retrieves query status for an openCypher query. This line magic takes the following optional arguments:
- --queryId or -q – Specifies the ID of a specific running query for which to show the status.
- --cancelQuery or -c – Cancels a running query. Does not take a value.
- --silent-cancel or -s – If --silent is set to true when cancelling a query, the running query is cancelled with an HTTP response code of 200. Otherwise, the HTTP response code would be 500.
- --store-to – Specifies the name of a variable in which to store the query results.
- -w/--includeWaiting – Neptune DB only. When set to true and other parameters are not present, causes status information for waiting queries to be returned as well as for running queries. This parameter does not take a value.
- --state – Neptune Analytics only. Specifies what subset of query states to retrieve the status of.
- -m/--maxResults – Neptune Analytics only. Sets an upper limit on the set of returned queries matching the value of --state.
- --silent – If present, no output is displayed after the query completes.
The %sparql_status line magic
Retrieves SPARQL query status information.
The %stream_viewer line magic
The %stream_viewer line magic displays an interface that allows for interactively exploring the entries logged in Neptune streams, if streams are enabled on the Neptune cluster. It accepts the following optional arguments:
- language – The query language of the stream data: either gremlin or sparql. The default, if you don't supply this argument, is gremlin.
- --limit – Specifies the maximum number of stream entries to display per page. The default value, if you don't supply this argument, is 10.
Note
The %stream_viewer line magic is fully supported only on engine versions 1.0.5.1 and earlier.
The %graph_notebook_config line magic
This line magic displays a JSON object containing the configuration that the notebook is using to communicate with Neptune. The configuration includes:
- host: The endpoint to which to connect and issue commands.
- port: The port used when issuing commands to Neptune. The default is 8182.
- auth_mode: The mode of authentication to use when issuing commands to Neptune. Must be IAM if connecting to a cluster that has IAM authentication enabled, or otherwise DEFAULT.
- load_from_s3_arn: Specifies an Amazon S3 ARN for the %load magic to use. If this value is empty, the ARN must be specified in the %load command.
- ssl: A Boolean value indicating whether or not to connect to Neptune using TLS. The default value is true.
- aws_region: The region where this notebook is deployed. This information is used for IAM authentication and for %load requests.
You can change the configuration by copying the %graph_notebook_config output into a new cell and making changes to it there. Then, if you run the %%graph_notebook_config cell magic on the new cell, the configuration will be changed accordingly.
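If you would rather generate the edited configuration than edit it by hand, a small Python cell can produce the text to paste into a new cell. This is a sketch under stated assumptions: with_overrides is a hypothetical helper, the host is a placeholder, and the keys are the ones listed above.

```python
import json

# Example configuration in the shape described above; the host value is a
# placeholder, not a real endpoint.
current_config = {
    "host": "my-cluster-endpoint.example.com",
    "port": 8182,
    "auth_mode": "DEFAULT",
    "load_from_s3_arn": "",
    "ssl": True,
    "aws_region": "us-east-1",
}


def with_overrides(config: dict, **changes) -> dict:
    """Return a copy of config with the given keys changed.

    Hypothetical helper: rejects keys that are not already present, to
    catch typos such as 'hostname' instead of 'host'.
    """
    unknown = set(changes) - set(config)
    if unknown:
        raise KeyError(f"unknown configuration keys: {sorted(unknown)}")
    return {**config, **changes}


new_config = with_overrides(current_config, auth_mode="IAM")

# Print text that can be pasted into a new cell and run as a cell magic.
print("%%graph_notebook_config")
print(json.dumps(new_config, indent=2))
```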
The %graph_notebook_host line magic
Sets the line input as the notebook's host.
The %graph_notebook_version line magic
The %graph_notebook_version line magic returns the Neptune workbench notebook release number. For example, graph visualization was introduced in version 1.27.
The %graph_notebook_service line magic
The %graph_notebook_service line magic sets the line input as the service name used for Neptune requests.
The %graph_notebook_vis_options line magic
The %graph_notebook_vis_options line magic displays the current visualization settings that the notebook is using. These options are explained in the vis.js documentation.
You can modify these settings by copying the output into a new cell, making the changes you want, and then running the %%graph_notebook_vis_options cell magic on the cell.
To restore the visualization settings to their default values, you can run the %graph_notebook_vis_options line magic with a reset parameter. This resets all the visualization settings:
%graph_notebook_vis_options reset
The %statistics line magic
The %statistics line magic is used to retrieve or manage DFE engine statistics (see Managing statistics for the Neptune DFE to use). This magic can also be used to retrieve a graph summary.
It accepts the following parameters:
- --language – The query language of the statistics endpoint: either propertygraph (or pg) or rdf. If not supplied, the default is propertygraph.
- --mode (or -m) – Specifies the type of request or action to submit: one of status, disableAutoCompute, enableAutoCompute, refresh, delete, detailed, or basic. If not supplied, the default is status unless --summary is specified, in which case the default is basic.
- --summary – Retrieves the graph summary from the statistics summary endpoint of the selected language.
- --silent – If present, no output is displayed after the query completes.
- --store-to – Specifies a variable in which to store the query results.
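The way the --mode default depends on --summary can be illustrated with a small argparse sketch. This is not the magic's actual parser; parse_statistics_args is a hypothetical function that only models the --language, --mode, and --summary rules stated in the list above.

```python
import argparse
from typing import List

MODES = ["status", "disableAutoCompute", "enableAutoCompute",
         "refresh", "delete", "detailed", "basic"]


def parse_statistics_args(line: List[str]) -> argparse.Namespace:
    """Parse a %statistics-style argument list (illustrative only)."""
    parser = argparse.ArgumentParser(prog="%statistics")
    parser.add_argument("--language", choices=["propertygraph", "pg", "rdf"],
                        default="propertygraph")
    # No static default: the effective default depends on --summary.
    parser.add_argument("--mode", "-m", choices=MODES, default=None)
    parser.add_argument("--summary", action="store_true")
    args = parser.parse_args(line)
    if args.mode is None:
        args.mode = "basic" if args.summary else "status"
    return args


print(parse_statistics_args([]).mode)
print(parse_statistics_args(["--summary"]).mode)
```

Leaving the argparse default as None and resolving it afterwards is what lets the sketch distinguish "mode not supplied" from "mode explicitly set to status".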
The %summary line magic
The %summary line magic is used to retrieve graph summary information. It is available starting with Neptune engine version 1.2.1.0.
It accepts the following parameters:
- --language – The query language of the statistics endpoint: either propertygraph (or pg) or rdf. If not supplied, the default is propertygraph.
- --detailed – Toggles the display of structures fields on or off in the output. If not supplied, the default is the basic summary display mode.
- --silent – If present, no output is displayed after the query completes.
- --store-to – Specifies a variable in which to store the query results.
The %%graph_notebook_config cell magic
The %%graph_notebook_config cell magic uses a JSON object containing configuration information to modify the settings that the notebook is using to communicate with Neptune, if possible. The configuration takes the same form returned by the %graph_notebook_config line magic.
For example:
%%graph_notebook_config
{
  "host": "my-new-cluster-endpoint.amazon.com",
  "port": 8182,
  "auth_mode": "DEFAULT",
  "load_from_s3_arn": "",
  "ssl": true,
  "aws_region": "us-east-1"
}
The %%sparql cell magic
The %%sparql cell magic issues a SPARQL query to the Neptune endpoint. It accepts the following optional line input:
- -h or --help – Returns help text about these parameters.
- --path – Prefixes a path to the SPARQL endpoint. For example, if you specify --path "abc/def" then the endpoint called would be host:port/abc/def.
- --expand-all – This is a query visualization hint that tells the visualizer to include all ?s ?p ?o results in the graph diagram regardless of binding type.
  By default, a SPARQL visualization only includes triple patterns where the ?o is a uri or a bnode (blank node). All other ?o binding types such as literal strings or integers are treated as properties of the ?s node that can be viewed using the Details pane in the Graph tab.
  Use the --expand-all query hint when you want to include such literal values as vertices in the visualization instead.
  Don't combine this visualization hint with explain parameters, because explain queries are not visualized.
- --explain-type – Specifies the explain mode to use (one of: dynamic, static, or details).
- --explain-format – Specifies the response format for an explain query (one of text/csv or text/html).
- --store-to – Specifies a variable in which to store the query results.
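The default visualization rule and the effect of --expand-all can be modelled in a few lines of Python. This is a sketch of the rule as described above, not the visualizer's code: split_bindings is a hypothetical function, and it assumes rows shaped like the bindings in the W3C SPARQL JSON results format, where each bound value carries a "type" of uri, bnode, or literal.

```python
from typing import Dict, List, Tuple

Binding = Dict[str, Dict[str, str]]


def split_bindings(bindings: List[Binding], expand_all: bool = False):
    """Separate ?s ?p ?o rows into graph edges and node properties.

    Illustrative model: objects of type uri or bnode become edges; other
    objects become properties of ?s unless expand_all is set.
    """
    edges: List[Tuple[str, str, str]] = []
    properties: Dict[str, Dict[str, str]] = {}
    for row in bindings:
        s, p, o = row["s"], row["p"], row["o"]
        if expand_all or o["type"] in ("uri", "bnode"):
            edges.append((s["value"], p["value"], o["value"]))
        else:
            properties.setdefault(s["value"], {})[p["value"]] = o["value"]
    return edges, properties


# Two made-up rows: one object is a URI, the other a literal.
rows = [
    {"s": {"type": "uri", "value": "ex:a"},
     "p": {"type": "uri", "value": "ex:knows"},
     "o": {"type": "uri", "value": "ex:b"}},
    {"s": {"type": "uri", "value": "ex:a"},
     "p": {"type": "uri", "value": "ex:name"},
     "o": {"type": "literal", "value": "Alice"}},
]
print(split_bindings(rows))
print(split_bindings(rows, expand_all=True))
```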
Example of an explain query:
%%sparql explain
SELECT * WHERE {?s ?p ?o} LIMIT 10
Example of a visualization query with an --expand-all visualization hint parameter (see SPARQL visualization):
%%sparql --expand-all
SELECT * WHERE {?s ?p ?o} LIMIT 10
The %%gremlin cell magic
The %%gremlin cell magic issues a Gremlin query to the Neptune endpoint using WebSocket. It accepts an optional line input to switch to Gremlin explain mode or to the Gremlin profile API, and a separate optional visualization hint input to modify visualization output behavior (see Gremlin visualization).
Example of an explain query:
%%gremlin explain
g.V().limit(10)
Example of a profile query:
%%gremlin profile
g.V().limit(10)
Example of a visualization query with a visualization query hint:
%%gremlin -p v,outv
g.V().out().limit(10)
Optional parameters for %%gremlin profile queries
- --profile-chop – Specifies the maximum length of the profile results string. The default value if you don't supply this argument is 250.
- --profile-serializer – Specifies the serializer to use for the results. Allowed values are any of the valid MIME type or TinkerPop driver "Serializers" enum values. The default value if you don't supply this argument is application/json.
- --profile-no-results – Displays only the result count. If not used, all query results are displayed in the profile report by default.
- --profile-indexOps – Shows a detailed report of all index operations.
The %%opencypher cell magic (also %%oc)
The %%opencypher cell magic (which also has the abbreviated %%oc form) issues an openCypher query to the Neptune endpoint. It accepts the following optional line input arguments:
- mode – The query mode: either query or bolt. The default value if you don't supply this argument is query.
- --group-by or -g – Specifies the property used to group nodes. For example, code, ~id. The default value if you don't supply this argument is ~labels.
- --ignore-groups – If present, all grouping options are ignored.
- --display-property or -d – Specifies the property whose value should be displayed for each vertex. The default value if you don't supply this argument is ~labels.
- --edge-display-property or -de – Specifies the property whose value should be displayed for each edge. The default value if you don't supply this argument is ~labels.
- --label-max-length or -l – Specifies the maximum number of characters of a vertex label to display. The default value if you don't supply this argument is 10.
- --store-to or -s – Specifies the name of a variable in which to store the query results.
- --plan-cache or -pc – Specifies the plan cache mode to use. The default value is auto.
- --query-timeout or -qt – Specifies the maximum query timeout in milliseconds. The default value is 1800000.
- --query-parameters or -qp – Parameter definitions to apply to the query. This option can either accept a single variable name, or a string representation of the map.
  Example usage of --query-parameters:
  - Define a map of openCypher parameters in one notebook cell.
    params = '''{ "name":"john", "age": 20 }'''
  - Pass the parameters into --query-parameters in another cell with %%oc.
    %%oc --query-parameters params
    MATCH (n {name: $name, age: $age}) RETURN n
- --explain-type – Specifies the explain mode to use (one of: dynamic, static, or details).
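Because --query-parameters accepts a string representation of a map, it can help to build and check that string in Python before passing the variable name to %%oc. The sketch below is an assumption about a convenient workflow, not part of the magic itself; check_params is a hypothetical helper that simply confirms the string parses as a JSON object.

```python
import json


def check_params(text: str) -> dict:
    """Parse a parameter string and confirm it is a JSON object.

    Hypothetical helper: raises ValueError when the string is not valid
    JSON (for example, because of a trailing comma) or is not a map.
    """
    parsed = json.loads(text)  # json.JSONDecodeError is a ValueError
    if not isinstance(parsed, dict):
        raise ValueError("query parameters must be a JSON object")
    return parsed


# Building the string with json.dumps avoids hand-written syntax slips.
params = json.dumps({"name": "john", "age": 20})
print(check_params(params))

# The variable name is then passed on the magic line, for example:
#   %%oc --query-parameters params
```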
The %%graph_notebook_vis_options cell magic
The %%graph_notebook_vis_options cell magic lets you set visualization options for the notebook. You can copy the settings returned by the %graph_notebook_vis_options line magic into a new cell, make changes to them, and use the %%graph_notebook_vis_options cell magic to set the new values.
These options are explained in the vis.js documentation.
To restore the visualization settings to their default values, you can run the %graph_notebook_vis_options line magic with a reset parameter. This resets all the visualization settings:
%graph_notebook_vis_options reset
The %neptune_ml line magic
You can use the %neptune_ml line magic to initiate and manage various Neptune ML operations.
Note
You can also initiate and manage some Neptune ML operations using the %%neptune_ml cell magic.
- %neptune_ml export start – Starts a new export job.
  Parameters
  - --export-url exporter-endpoint – (optional) The Amazon API Gateway endpoint where the exporter can be called.
  - --export-iam – (optional) Flag indicating that requests to the export URL must be signed using SigV4.
  - --export-no-ssl – (optional) Flag indicating that SSL should not be used when connecting to the exporter.
  - --wait – (optional) Flag indicating that the operation should wait until the export has completed.
  - --wait-interval interval-to-wait – (optional) Sets the time, in seconds, between export status checks (Default: 60).
  - --wait-timeout timeout-seconds – (optional) Sets the time, in seconds, to wait for the export job to complete before returning the most recent status (Default: 3,600).
  - --store-to location-to-store-result – (optional) The variable in which to store the export result. If --wait is specified, the final status will be stored there.
- %neptune_ml export status – Retrieves the status of an export job.
  Parameters
  - --job-id export job ID – The ID of the export job for which to retrieve status.
  - --export-url exporter-endpoint – (optional) The Amazon API Gateway endpoint where the exporter can be called.
  - --export-iam – (optional) Flag indicating that requests to the export URL must be signed using SigV4.
  - --export-no-ssl – (optional) Flag indicating that SSL should not be used when connecting to the exporter.
  - --wait – (optional) Flag indicating that the operation should wait until the export has completed.
  - --wait-interval interval-to-wait – (optional) Sets the time, in seconds, between export status checks (Default: 60).
  - --wait-timeout timeout-seconds – (optional) Sets the time, in seconds, to wait for the export job to complete before returning the most recent status (Default: 3,600).
  - --store-to location-to-store-result – (optional) The variable in which to store the export result. If --wait is specified, the final status will be stored there.
- %neptune_ml dataprocessing start – Starts the Neptune ML dataprocessing step.
  Parameters
  - --job-id ID for this job – (optional) ID to assign to this job.
  - --s3-input-uri S3 URI – (optional) The S3 URI at which to find the input for this dataprocessing job.
  - --config-file-name file name – (optional) Name of the configuration file for this dataprocessing job.
  - --store-to location-to-store-result – (optional) The variable in which to store the dataprocessing result.
  - --instance-type (instance type) – (optional) The instance size to use for this dataprocessing job.
  - --wait – (optional) Flag indicating that the operation should wait until the dataprocessing has completed.
  - --wait-interval interval-to-wait – (optional) Sets the time, in seconds, between dataprocessing status checks (Default: 60).
  - --wait-timeout timeout-seconds – (optional) Sets the time, in seconds, to wait for the dataprocessing job to complete before returning the most recent status (Default: 3,600).
- %neptune_ml dataprocessing status – Retrieves the status of a dataprocessing job.
  Parameters
  - --job-id ID of the job – ID of the job for which to retrieve the status.
  - --store-to location-to-store-result – (optional) The variable in which to store the status result.
  - --wait – (optional) Flag indicating that the operation should wait until the dataprocessing has completed.
  - --wait-interval interval-to-wait – (optional) Sets the time, in seconds, between dataprocessing status checks (Default: 60).
  - --wait-timeout timeout-seconds – (optional) Sets the time, in seconds, to wait for the dataprocessing job to complete before returning the most recent status (Default: 3,600).
- %neptune_ml training start – Starts the Neptune ML model-training process.
  Parameters
  - --job-id ID for this job – (optional) ID to assign to this job.
  - --data-processing-id dataprocessing job ID – (optional) ID of the dataprocessing job that created the artifacts to use for training.
  - --s3-output-uri S3 URI – (optional) The S3 URI at which to store the output from this model-training job.
  - --instance-type (instance type) – (optional) The instance size to use for this model-training job.
  - --store-to location-to-store-result – (optional) The variable in which to store the model-training result.
  - --wait – (optional) Flag indicating that the operation should wait until the model-training has completed.
  - --wait-interval interval-to-wait – (optional) Sets the time, in seconds, between model-training status checks (Default: 60).
  - --wait-timeout timeout-seconds – (optional) Sets the time, in seconds, to wait for the model-training job to complete before returning the most recent status (Default: 3,600).
- %neptune_ml training status – Retrieves the status of a Neptune ML model-training job.
  Parameters
  - --job-id ID of the job – ID of the job for which to retrieve the status.
  - --store-to location-to-store-result – (optional) The variable in which to store the status result.
  - --wait – (optional) Flag indicating that the operation should wait until the model-training has completed.
  - --wait-interval interval-to-wait – (optional) Sets the time, in seconds, between model-training status checks (Default: 60).
  - --wait-timeout timeout-seconds – (optional) Sets the time, in seconds, to wait for the model-training job to complete before returning the most recent status (Default: 3,600).
- %neptune_ml endpoint create – Creates a query endpoint for a Neptune ML model.
  Parameters
  - --job-id ID for this job – (optional) ID to assign to this job.
  - --model-job-id model-training job ID – (optional) ID of the model-training job for which to create a query endpoint.
  - --instance-type (instance type) – (optional) The instance size to use for the query endpoint.
  - --store-to location-to-store-result – (optional) The variable in which to store the result of the endpoint creation.
  - --wait – (optional) Flag indicating that the operation should wait until the endpoint creation has completed.
  - --wait-interval interval-to-wait – (optional) Sets the time, in seconds, between status checks (Default: 60).
  - --wait-timeout timeout-seconds – (optional) Sets the time, in seconds, to wait for the endpoint creation job to complete before returning the most recent status (Default: 3,600).
- %neptune_ml endpoint status – Retrieves the status of a Neptune ML query endpoint.
  Parameters
  - --job-id endpoint creation ID – (optional) ID of an endpoint creation job for which to report status.
  - --store-to location-to-store-result – (optional) The variable in which to store the status result.
  - --wait – (optional) Flag indicating that the operation should wait until the endpoint creation has completed.
  - --wait-interval interval-to-wait – (optional) Sets the time, in seconds, between status checks (Default: 60).
  - --wait-timeout timeout-seconds – (optional) Sets the time, in seconds, to wait for the endpoint creation job to complete before returning the most recent status (Default: 3,600).
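The --wait, --wait-interval, and --wait-timeout parameters that recur in the lists above describe a polling loop: check the status at a fixed interval and, once the timeout elapses, return the most recent status. A minimal sketch of that behavior follows. It is not the magic's implementation; wait_for_job, the "status" key, and the set of terminal state names are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict

# Assumed names for states in which polling should stop (hypothetical).
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_job(get_status: Callable[[], Dict[str, str]],
                 wait_interval: float = 60,
                 wait_timeout: float = 3600) -> Dict[str, str]:
    """Poll get_status until a terminal state or until the timeout elapses.

    Returns the most recent status in either case.
    """
    deadline = time.monotonic() + wait_timeout
    status = get_status()
    while (status.get("status") not in TERMINAL_STATES
           and time.monotonic() < deadline):
        time.sleep(wait_interval)
        status = get_status()
    return status


# Usage with a fake status source that completes on the third check.
fake = iter([{"status": "InProgress"},
             {"status": "InProgress"},
             {"status": "Completed"}])
print(wait_for_job(lambda: next(fake), wait_interval=0, wait_timeout=5))
```

The defaults of 60 and 3600 seconds mirror the defaults quoted in the parameter lists; time.monotonic is used so the deadline is unaffected by system clock changes.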
The %%neptune_ml cell magic
The %%neptune_ml cell magic ignores line inputs such as --job-id or --export-url. Instead, it lets you provide those inputs and others within the cell body.
You can also save such inputs in another cell, assigned to a Jupyter variable, and then inject them into the cell body using that variable. That way, you can use such inputs over and over without having to re-enter them all every time.
This only works if the injecting variable is the only content of the cell. You cannot use multiple variables in one cell, or a combination of text and a variable.
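The restriction above amounts to a simple shape check on the cell body: it must consist of exactly one ${NAME} placeholder and nothing else. The following sketch expresses that check with a regular expression; is_single_variable_body is a hypothetical helper, and the assumption that variable names follow Python identifier rules is mine.

```python
import re

# One ${NAME} placeholder, optionally surrounded by whitespace.
_SINGLE_VARIABLE = re.compile(r"^\s*\$\{[A-Za-z_][A-Za-z0-9_]*\}\s*$")


def is_single_variable_body(cell_body: str) -> bool:
    """Return True when the body is exactly one injected variable."""
    return bool(_SINGLE_VARIABLE.match(cell_body))


print(is_single_variable_body("${export_params}"))
print(is_single_variable_body("${a} ${b}"))
```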
For example, the %%neptune_ml export start cell magic can consume a JSON document in the cell body that contains all the parameters described in Parameters used to control the Neptune export process.
In the Neptune-ML-01-Introduction-to-Node-Classification-Gremlin notebook, such a document is assigned to a Jupyter variable named export_params:
export_params = {
    "command": "export-pg",
    "params": {
        "endpoint": neptune_ml.get_host(),
        "profile": "neptune_ml",
        "useIamAuth": neptune_ml.get_iam(),
        "cloneCluster": False
    },
    "outputS3Path": f'{s3_bucket_uri}/neptune-export',
    "additionalParams": {
        "neptune_ml": {
            "targets": [
                {"node": "movie", "property": "genre"}
            ],
            "features": [
                {"node": "movie", "property": "title", "type": "word2vec"},
                {"node": "user", "property": "age", "type": "bucket_numerical", "range": [1, 100], "num_buckets": 10}
            ]
        }
    },
    "jobSize": "medium"
}
When you run this cell, Jupyter saves the parameters document under that name. Then, you can use ${export_params} to inject the JSON document into the body of a %%neptune_ml export start cell, like this:
%%neptune_ml export start --export-url {neptune_ml.get_export_service_host()} --export-iam --wait --store-to export_results
${export_params}
Available forms of the %%neptune_ml cell magic
The %%neptune_ml cell magic can be used in the following forms:
- %%neptune_ml export start – Starts a Neptune ML export process.
- %%neptune_ml dataprocessing start – Starts a Neptune ML dataprocessing job.
- %%neptune_ml training start – Starts a Neptune ML model-training job.
- %%neptune_ml endpoint create – Creates a Neptune ML query endpoint for a model.