

# Using the DLAMI with AWS Neuron
<a name="tutorial-inferentia-using"></a>

A typical workflow with the AWS Neuron SDK is to compile a previously trained machine learning model on a compilation server and then distribute the artifacts to Inf1 instances for execution. The AWS Deep Learning AMI (DLAMI) comes pre-installed with everything you need to compile models and run inference on an Inf1 instance that uses Inferentia. 

 The following sections describe how to use the DLAMI with Inferentia. 

**Topics**
+ [Using TensorFlow-Neuron and the AWS Neuron Compiler](tutorial-inferentia-tf-neuron.md)
+ [Using AWS Neuron TensorFlow Serving](tutorial-inferentia-tf-neuron-serving.md)
+ [Using MXNet-Neuron and the AWS Neuron Compiler](tutorial-inferentia-mxnet-neuron.md)
+ [Using MXNet-Neuron Model Serving](tutorial-inferentia-mxnet-neuron-serving.md)
+ [Using PyTorch-Neuron and the AWS Neuron Compiler](tutorial-inferentia-pytorch-neuron.md)

# Using TensorFlow-Neuron and the AWS Neuron Compiler
<a name="tutorial-inferentia-tf-neuron"></a>

 This tutorial shows how to use the AWS Neuron compiler to compile the Keras ResNet-50 model and export it in the SavedModel format, TensorFlow's standard model interchange format. You also learn how to run inference on an Inf1 instance with example input.  

 For more information about the Neuron SDK, see the [AWS Neuron SDK documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/tensorflow-neuron/index.html). 

**Topics**
+ [Prerequisites](#tutorial-inferentia-tf-neuron-prerequisites)
+ [Activate the Conda environment](#tutorial-inferentia-tf-neuron-activate)
+ [ResNet50 Compilation](#tutorial-inferentia-tf-neuron-compilation)
+ [ResNet50 Inference](#tutorial-inferentia-tf-neuron-inference)

## Prerequisites
<a name="tutorial-inferentia-tf-neuron-prerequisites"></a>

 Before using this tutorial, you should have completed the setup steps in [Launching a DLAMI Instance with AWS Neuron](tutorial-inferentia-launching.md). You should also be familiar with deep learning and using the DLAMI. 

## Activate the Conda environment
<a name="tutorial-inferentia-tf-neuron-activate"></a>

 Activate the TensorFlow-Neuron conda environment using the following command: 

```
source activate aws_neuron_tensorflow_p36
```

 To exit the current conda environment, run the following command: 

```
source deactivate
```

## ResNet50 Compilation
<a name="tutorial-inferentia-tf-neuron-compilation"></a>

Create a Python script called **tensorflow\_compile\_resnet50.py** that has the following content. This Python script compiles the Keras ResNet-50 model and exports it as a saved model. 

```
import os
import time
import shutil
import tensorflow as tf
import tensorflow.neuron as tfn
import tensorflow.compat.v1.keras as keras
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

# Create a workspace
WORKSPACE = './ws_resnet50'
os.makedirs(WORKSPACE, exist_ok=True)

# Prepare export directory (old one removed)
model_dir = os.path.join(WORKSPACE, 'resnet50')
compiled_model_dir = os.path.join(WORKSPACE, 'resnet50_neuron')
shutil.rmtree(model_dir, ignore_errors=True)
shutil.rmtree(compiled_model_dir, ignore_errors=True)

# Instantiate Keras ResNet50 model
keras.backend.set_learning_phase(0)
model = ResNet50(weights='imagenet')

# Export SavedModel
tf.saved_model.simple_save(
 session            = keras.backend.get_session(),
 export_dir         = model_dir,
 inputs             = {'input': model.inputs[0]},
 outputs            = {'output': model.outputs[0]})

# Compile using Neuron
tfn.saved_model.compile(model_dir, compiled_model_dir)

# Prepare SavedModel for uploading to Inf1 instance
shutil.make_archive(compiled_model_dir, 'zip', WORKSPACE, 'resnet50_neuron')
```

 Compile the model using the following command: 

```
python tensorflow_compile_resnet50.py
```

The compilation process will take a few minutes. When it completes, your output should look like the following: 

```
...
INFO:tensorflow:fusing subgraph neuron_op_d6f098c01c780733 with neuron-cc
INFO:tensorflow:Number of operations in TensorFlow session: 4638
INFO:tensorflow:Number of operations after tf.neuron optimizations: 556
INFO:tensorflow:Number of operations placed on Neuron runtime: 554
INFO:tensorflow:Successfully converted ./ws_resnet50/resnet50 to ./ws_resnet50/resnet50_neuron
...
```
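The log above tells you how much of the graph was offloaded to Inferentia. A quick back-of-the-envelope check, using the operation counts from the sample log:

```python
# Operation counts reported in the sample compilation log above
total_after_opt = 556  # operations after tf.neuron optimizations
on_neuron = 554        # operations placed on the Neuron runtime

share = on_neuron / total_after_opt
print(f"{share:.1%} of optimized operations run on Neuron")  # prints 99.6%
```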


 After compilation, the saved model is zipped at **ws\_resnet50/resnet50\_neuron.zip**. Unzip the model and download the sample image for inference using the following commands: 

```
unzip ws_resnet50/resnet50_neuron.zip -d .
curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg
```

## ResNet50 Inference
<a name="tutorial-inferentia-tf-neuron-inference"></a>

Create a Python script called **tensorflow\_infer\_resnet50.py** that has the following content. This script uses the previously compiled model to run inference on the downloaded sample image. 

```
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications import resnet50

# Create input from image
img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))
img_arr = image.img_to_array(img_sgl)
img_arr2 = np.expand_dims(img_arr, axis=0)
img_arr3 = resnet50.preprocess_input(img_arr2)
# Load model
COMPILED_MODEL_DIR = './ws_resnet50/resnet50_neuron/'
predictor_inferentia = tf.contrib.predictor.from_saved_model(COMPILED_MODEL_DIR)
# Run inference
model_feed_dict={'input': img_arr3}
infa_rslts = predictor_inferentia(model_feed_dict)
# Display results
print(resnet50.decode_predictions(infa_rslts["output"], top=5)[0])
```

 Run inference on the model using the following command: 

```
python tensorflow_infer_resnet50.py
```

 Your output should look like the following: 

```
...
[('n02123045', 'tabby', 0.6918919), ('n02127052', 'lynx', 0.12770271), ('n02123159', 'tiger_cat', 0.08277027), ('n02124075', 'Egyptian_cat', 0.06418919), ('n02128757', 'snow_leopard', 0.009290541)]
```
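Each entry in the output above is a `(WordNet ID, class name, probability)` tuple, as returned by `decode_predictions`. If you want to post-process such results, a small helper like the following (illustrative only, plain Python) formats them for display:

```python
# Hypothetical helper to pretty-print decode_predictions-style results:
# each entry is a (WordNet ID, class name, probability) tuple.
def format_predictions(preds):
    return ["{}: {:.1%}".format(name, prob) for _, name, prob in preds]

sample = [('n02123045', 'tabby', 0.6918919),
          ('n02127052', 'lynx', 0.12770271)]
print(format_predictions(sample))  # ['tabby: 69.2%', 'lynx: 12.8%']
```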

**Next Step**  
[Using AWS Neuron TensorFlow Serving](tutorial-inferentia-tf-neuron-serving.md)

# Using AWS Neuron TensorFlow Serving
<a name="tutorial-inferentia-tf-neuron-serving"></a>

This tutorial shows how to construct a graph and add an AWS Neuron compilation step before exporting the saved model to use with TensorFlow Serving. TensorFlow Serving is a serving system that allows you to scale up inference across a network. Neuron TensorFlow Serving uses the same API as normal TensorFlow Serving. The only difference is that a saved model must be compiled for AWS Inferentia, and the entry point is a different binary named `tensorflow_model_server_neuron`. The binary is found at `/usr/local/bin/tensorflow_model_server_neuron` and is pre-installed in the DLAMI. 

 For more information about the Neuron SDK, see the [AWS Neuron SDK documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/tensorflow-neuron/index.html). 

**Topics**
+ [Prerequisites](#tutorial-inferentia-tf-neuron--serving-prerequisites)
+ [Activate the Conda environment](#tutorial-inferentia-tf-neuron-serving-activate)
+ [Compile and Export the Saved Model](#tutorial-inferentia-tf-neuron-serving-compile)
+ [Serving the Saved Model](#tutorial-inferentia-tf-neuron-serving-serving)
+ [Generate inference requests to the model server](#tutorial-inferentia-tf-neuron-serving-inference)

## Prerequisites
<a name="tutorial-inferentia-tf-neuron--serving-prerequisites"></a>

Before using this tutorial, you should have completed the setup steps in [Launching a DLAMI Instance with AWS Neuron](tutorial-inferentia-launching.md). You should also be familiar with deep learning and using the DLAMI. 

## Activate the Conda environment
<a name="tutorial-inferentia-tf-neuron-serving-activate"></a>

 Activate the TensorFlow-Neuron conda environment using the following command: 

```
source activate aws_neuron_tensorflow_p36
```

 If you need to exit the current conda environment, run: 

```
source deactivate
```

## Compile and Export the Saved Model
<a name="tutorial-inferentia-tf-neuron-serving-compile"></a>

Create a Python script called `tensorflow-model-server-compile.py` with the following content. This script constructs a graph and compiles it using Neuron. It then exports the compiled graph as a saved model.  

```
import tensorflow as tf
import tensorflow.neuron
import os

tf.keras.backend.set_learning_phase(0)
model = tf.keras.applications.ResNet50(weights='imagenet')
sess = tf.keras.backend.get_session()
inputs = {'input': model.inputs[0]}
outputs = {'output': model.outputs[0]}

# save the model using tf.saved_model.simple_save
modeldir = "./resnet50/1"
tf.saved_model.simple_save(sess, modeldir, inputs, outputs)

# compile the model for Inferentia
neuron_modeldir = os.path.join(os.path.expanduser('~'), 'resnet50_inf1', '1')
tf.neuron.saved_model.compile(modeldir, neuron_modeldir, batch_size=1)
```

 Compile the model using the following command: 

```
python tensorflow-model-server-compile.py
```

 Your output should look like the following: 

```
...
INFO:tensorflow:fusing subgraph neuron_op_d6f098c01c780733 with neuron-cc
INFO:tensorflow:Number of operations in TensorFlow session: 4638
INFO:tensorflow:Number of operations after tf.neuron optimizations: 556
INFO:tensorflow:Number of operations placed on Neuron runtime: 554
INFO:tensorflow:Successfully converted ./resnet50/1 to /home/ubuntu/resnet50_inf1/1
```

## Serving the Saved Model
<a name="tutorial-inferentia-tf-neuron-serving-serving"></a>

Once the model has been compiled, you can use the following command to serve the saved model with the `tensorflow_model_server_neuron` binary: 

```
tensorflow_model_server_neuron --model_name=resnet50_inf1 \
    --model_base_path=$HOME/resnet50_inf1/ --port=8500 &
```

 Your output should look like the following. The compiled model is staged in the Inferentia device’s DRAM by the server to prepare for inference. 

```
...
2019-11-22 01:20:32.075856: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 40764 microseconds.
2019-11-22 01:20:32.075888: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /home/ubuntu/resnet50_inf1/1/assets.extra/tf_serving_warmup_requests
2019-11-22 01:20:32.075950: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: resnet50_inf1 version: 1}
2019-11-22 01:20:32.077859: I tensorflow_serving/model_servers/server.cc:353] Running gRPC ModelServer at 0.0.0.0:8500 ...
```
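Once you see the `Running gRPC ModelServer at 0.0.0.0:8500` line, the server is ready. If you want to verify from a script that the port is reachable before sending requests, here is a minimal sketch using Python's standard `socket` module (the host and port are assumed from the serving command above):

```python
import socket

def port_open(host="localhost", port=8500, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `port_open()` returns `False`, check that `tensorflow_model_server_neuron` is still running in the background.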

## Generate inference requests to the model server
<a name="tutorial-inferentia-tf-neuron-serving-inference"></a>

Create a Python script called `tensorflow-model-server-infer.py` with the following content. This script runs inference by sending requests to the model server over gRPC. 

```
import numpy as np
import grpc
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow.keras.applications.resnet50 import decode_predictions

if __name__ == '__main__':
    channel = grpc.insecure_channel('localhost:8500')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    img_file = tf.keras.utils.get_file(
        "./kitten_small.jpg",
        "https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg")
    img = image.load_img(img_file, target_size=(224, 224))
    img_array = preprocess_input(image.img_to_array(img)[None, ...])
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'resnet50_inf1'
    request.inputs['input'].CopyFrom(
        tf.contrib.util.make_tensor_proto(img_array, shape=img_array.shape))
    result = stub.Predict(request)
    prediction = tf.make_ndarray(result.outputs['output'])
    print(decode_predictions(prediction))
```

 Run inference on the model by using gRPC with the following command: 

```
python tensorflow-model-server-infer.py
```

 Your output should look like the following: 

```
[[('n02123045', 'tabby', 0.6918919), ('n02127052', 'lynx', 0.12770271), ('n02123159', 'tiger_cat', 0.08277027), ('n02124075', 'Egyptian_cat', 0.06418919), ('n02128757', 'snow_leopard', 0.009290541)]]
```

# Using MXNet-Neuron and the AWS Neuron Compiler
<a name="tutorial-inferentia-mxnet-neuron"></a>

The MXNet-Neuron compilation API provides a method to compile a model graph that you can run on an AWS Inferentia device. 

 In this example, you use the API to compile a ResNet-50 model and use it to run inference. 

 For more information about the Neuron SDK, see the [AWS Neuron SDK documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/mxnet-neuron/index.html). 

**Topics**
+ [Prerequisites](#tutorial-inferentia-mxnet-neuron-prerequisites)
+ [Activate the Conda Environment](#tutorial-inferentia-mxnet-neuron-activate)
+ [ResNet50 Compilation](#tutorial-inferentia-mxnet-neuron-compilation)
+ [ResNet50 Inference](#tutorial-inferentia-mxnet-neuron-inference)

## Prerequisites
<a name="tutorial-inferentia-mxnet-neuron-prerequisites"></a>

 Before using this tutorial, you should have completed the setup steps in [Launching a DLAMI Instance with AWS Neuron](tutorial-inferentia-launching.md). You should also be familiar with deep learning and using the DLAMI. 

## Activate the Conda Environment
<a name="tutorial-inferentia-mxnet-neuron-activate"></a>

 Activate the MXNet-Neuron conda environment using the following command: 

```
source activate aws_neuron_mxnet_p36
```

To exit the current conda environment, run: 

```
source deactivate
```

## ResNet50 Compilation
<a name="tutorial-inferentia-mxnet-neuron-compilation"></a>

Create a Python script called **mxnet\_compile\_resnet50.py** with the following content. This script uses the MXNet-Neuron compilation Python API to compile a ResNet-50 model. 

```
import mxnet as mx
import numpy as np

print("downloading...")
path='http://data.mxnet.io/models/imagenet/'
mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params')
mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json')
print("download finished.")

sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)

print("compile for inferentia using neuron... this will take a few minutes...")
inputs = { "data" : mx.nd.ones([1,3,224,224], name='data', dtype='float32') }

sym, args, aux = mx.contrib.neuron.compile(sym, args, aux, inputs)

print("save compiled model...")
mx.model.save_checkpoint("compiled_resnet50", 0, sym, args, aux)
```

 Compile the model using the following command: 

```
python mxnet_compile_resnet50.py
```

 Compilation takes a few minutes. When it finishes, the following files will be in your current directory: 

```
resnet-50-0000.params
resnet-50-symbol.json
compiled_resnet50-0000.params
compiled_resnet50-symbol.json
```

## ResNet50 Inference
<a name="tutorial-inferentia-mxnet-neuron-inference"></a>

Create a Python script called **mxnet\_infer\_resnet50.py** with the following content. This script downloads a sample image and uses it to run inference with the compiled model. 

```
import mxnet as mx
import numpy as np

path='http://data.mxnet.io/models/imagenet/'
mx.test_utils.download(path+'synset.txt')

fname = mx.test_utils.download('https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg')
img = mx.image.imread(fname)

# Convert into the format (batch, RGB, width, height)
img = mx.image.imresize(img, 224, 224)  # resize
img = img.transpose((2, 0, 1))          # channel first
img = img.expand_dims(axis=0)           # batchify
img = img.astype(dtype='float32')

sym, args, aux = mx.model.load_checkpoint('compiled_resnet50', 0)
softmax = mx.nd.random_normal(shape=(1,))
args['softmax_label'] = softmax
args['data'] = img
# Inferentia context
ctx = mx.neuron()

exe = sym.bind(ctx=ctx, args=args, aux_states=aux, grad_req='null')
with open('synset.txt', 'r') as f:
    labels = [l.rstrip() for l in f]

exe.forward(data=img)
prob = exe.outputs[0].asnumpy()
# print the top-5
prob = np.squeeze(prob)
a = np.argsort(prob)[::-1] 
for i in a[0:5]:
    print('probability=%f, class=%s' %(prob[i], labels[i]))
```

 Run inference with the compiled model using the following command: 

```
python mxnet_infer_resnet50.py
```

 Your output should look like the following: 

```
probability=0.642454, class=n02123045 tabby, tabby cat
probability=0.189407, class=n02123159 tiger cat
probability=0.100798, class=n02124075 Egyptian cat
probability=0.030649, class=n02127052 lynx, catamount
probability=0.016278, class=n02129604 tiger, Panthera tigris
```

**Next Step**  
[Using MXNet-Neuron Model Serving](tutorial-inferentia-mxnet-neuron-serving.md)

# Using MXNet-Neuron Model Serving
<a name="tutorial-inferentia-mxnet-neuron-serving"></a>

In this tutorial, you learn to use a pre-trained MXNet model to perform real-time image classification with Multi Model Server (MMS). MMS is a flexible and easy-to-use tool for serving deep learning models that are trained using any machine learning or deep learning framework. This tutorial includes a compilation step using AWS Neuron and an implementation of MMS using MXNet.

 For more information about the Neuron SDK, see the [AWS Neuron SDK documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/mxnet-neuron/index.html). 

**Topics**
+ [Prerequisites](#tutorial-inferentia-mxnet-neuron-serving-prerequisites)
+ [Activate the Conda Environment](#tutorial-inferentia-mxnet-neuron-serving-activate)
+ [Download the Example Code](#tutorial-inferentia-mxnet-neuron-serving-download)
+ [Compile the Model](#tutorial-inferentia-mxnet-neuron-serving-compile)
+ [Run Inference](#tutorial-inferentia-mxnet-neuron-serving-inference)

## Prerequisites
<a name="tutorial-inferentia-mxnet-neuron-serving-prerequisites"></a>

 Before using this tutorial, you should have completed the setup steps in [Launching a DLAMI Instance with AWS Neuron](tutorial-inferentia-launching.md). You should also be familiar with deep learning and using the DLAMI. 

## Activate the Conda Environment
<a name="tutorial-inferentia-mxnet-neuron-serving-activate"></a>

 Activate the MXNet-Neuron conda environment by using the following command: 

```
source activate aws_neuron_mxnet_p36
```

 To exit the current conda environment, run: 

```
source deactivate
```

## Download the Example Code
<a name="tutorial-inferentia-mxnet-neuron-serving-download"></a>

 To run this example, download the example code using the following commands: 

```
git clone https://github.com/awslabs/multi-model-server
cd multi-model-server/examples/mxnet_vision
```

## Compile the Model
<a name="tutorial-inferentia-mxnet-neuron-serving-compile"></a>

Create a Python script called `multi-model-server-compile.py` with the following content. This script compiles the ResNet50 model to the Inferentia device target. 

```
import mxnet as mx
from mxnet.contrib import neuron
import numpy as np

path='http://data.mxnet.io/models/imagenet/'
mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params')
mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json')
mx.test_utils.download(path+'synset.txt')

nn_name = "resnet-50"

# Load a model
sym, args, auxs = mx.model.load_checkpoint(nn_name, 0)

# Define compilation parameters: input shape and dtype
inputs = {'data' : mx.nd.zeros([1,3,224,224], dtype='float32') }

# compile graph to inferentia target
csym, cargs, cauxs = neuron.compile(sym, args, auxs, inputs)

# save compiled model
mx.model.save_checkpoint(nn_name + "_compiled", 0, csym, cargs, cauxs)
```

 To compile the model, use the following command: 

```
python multi-model-server-compile.py
```

 Your output should look like the following: 

```
...
[21:18:40] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[21:18:40] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[21:19:00] src/operator/subgraph/build_subgraph.cc:698: start to execute partition graph.
[21:19:00] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[21:19:00] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
```

 Create a file named `signature.json` with the following content to configure the input name and shape: 

```
{
  "inputs": [
    {
      "data_name": "data",
      "data_shape": [
        1,
        3,
        224,
        224
      ]
    }
  ]
}
```
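The `data_shape` above must match the (batch, channel, height, width) layout of the tensors you send to the model. As a quick sanity check on a preprocessed input, here is a sketch using NumPy, with the signature dict copied inline:

```python
import numpy as np

# Input signature copied from the signature.json above
SIGNATURE = {"inputs": [{"data_name": "data", "data_shape": [1, 3, 224, 224]}]}

def matches_signature(batch, signature=SIGNATURE):
    """Return True if the batch's shape matches the declared data_shape."""
    expected = tuple(signature["inputs"][0]["data_shape"])
    return batch.shape == expected

# A dummy batch in (batch, channel, height, width) layout
dummy = np.zeros((1, 3, 224, 224), dtype="float32")
print(matches_signature(dummy))  # True
```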

Download the `synset.txt` file by using the following command. This file is a list of names for ImageNet prediction classes. 

```
curl -O https://s3.amazonaws.com/model-server/model_archive_1.0/examples/squeezenet_v1.1/synset.txt
```

Create a custom service class following the template in the `model_service_template` folder. Copy the template into your current working directory by using the following command: 

```
cp -r ../model_service_template/* .
```

 Edit the `mxnet_model_service.py` module to replace the `mx.cpu()` context with the `mx.neuron()` context as follows. You also need to comment out the unnecessary data copy for `model_input` because MXNet-Neuron does not support the NDArray and Gluon APIs. 

```
...
self.mxnet_ctx = mx.neuron() if gpu_id is None else mx.gpu(gpu_id)
...
#model_input = [item.as_in_context(self.mxnet_ctx) for item in model_input]
```

 Package the model with model-archiver using the following commands: 

```
cd ~/multi-model-server/examples
model-archiver --force --model-name resnet-50_compiled --model-path mxnet_vision --handler mxnet_vision_service:handle
```

## Run Inference
<a name="tutorial-inferentia-mxnet-neuron-serving-inference"></a>

Start the Multi Model Server and load the model by using the following RESTful API calls. Ensure that **neuron-rtd** is running with the default settings. 

```
cd ~/multi-model-server/
multi-model-server --start --model-store examples > /dev/null # Pipe to log file if you want to keep a log of MMS
curl -v -X POST "http://localhost:8081/models?initial_workers=1&max_workers=4&synchronous=true&url=resnet-50_compiled.mar"
sleep 10 # allow sufficient time to load model
```

 Run inference using an example image with the following commands: 

```
curl -O https://raw.githubusercontent.com/awslabs/multi-model-server/master/docs/images/kitten_small.jpg
curl -X POST http://127.0.0.1:8080/predictions/resnet-50_compiled -T kitten_small.jpg
```

 Your output should look like the following: 

```
[
  {
    "probability": 0.6388034820556641,
    "class": "n02123045 tabby, tabby cat"
  },
  {
    "probability": 0.16900072991847992,
    "class": "n02123159 tiger cat"
  },
  {
    "probability": 0.12221276015043259,
    "class": "n02124075 Egyptian cat"
  },
  {
    "probability": 0.028706775978207588,
    "class": "n02127052 lynx, catamount"
  },
  {
    "probability": 0.01915954425930977,
    "class": "n02129604 tiger, Panthera tigris"
  }
]
```
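The response is a JSON array of class/probability objects. A short stdlib-only sketch of parsing such a response (the `resp` string here is a trimmed copy of the output above):

```python
import json

# Trimmed copy of the MMS response shown above
resp = '''[
  {"probability": 0.6388034820556641, "class": "n02123045 tabby, tabby cat"},
  {"probability": 0.16900072991847992, "class": "n02123159 tiger cat"}
]'''

predictions = json.loads(resp)
top = max(predictions, key=lambda p: p["probability"])
print(top["class"])  # n02123045 tabby, tabby cat
```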

 To clean up after the test, issue a delete command via the RESTful API and stop the model server using the following commands: 

```
curl -X DELETE http://127.0.0.1:8081/models/resnet-50_compiled

multi-model-server --stop
```

 You should see the following output: 

```
{
  "status": "Model \"resnet-50_compiled\" unregistered"
}
Model server stopped.
Found 1 models and 1 NCGs.
Unloading 10001 (MODEL_STATUS_STARTED) :: success
Destroying NCG 1 :: success
```

# Using PyTorch-Neuron and the AWS Neuron Compiler
<a name="tutorial-inferentia-pytorch-neuron"></a>

The PyTorch-Neuron compilation API provides a method to compile a model graph that you can run on an AWS Inferentia device. 

A trained model must be compiled to an Inferentia target before it can be deployed on Inf1 instances. The following tutorial compiles the torchvision ResNet50 model and exports it as a saved TorchScript module. This model is then used to run inference.

For convenience, this tutorial uses an Inf1 instance for both compilation and inference. In practice, you may compile your model using another instance type, such as the c5 instance family. You must then deploy your compiled model to the Inf1 inference server. For more information, see the [AWS Neuron PyTorch SDK Documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/index.html).

**Topics**
+ [Prerequisites](#tutorial-inferentia-pytorch-neuron-prerequisites)
+ [Activate the Conda Environment](#tutorial-inferentia-pytorch-neuron-activate)
+ [ResNet50 Compilation](#tutorial-inferentia-pytorch-neuron-compilation)
+ [ResNet50 Inference](#tutorial-inferentia-pytorch-neuron-inference)

## Prerequisites
<a name="tutorial-inferentia-pytorch-neuron-prerequisites"></a>

Before using this tutorial, you should have completed the setup steps in [Launching a DLAMI Instance with AWS Neuron](tutorial-inferentia-launching.md). You should also be familiar with deep learning and using the DLAMI. 

## Activate the Conda Environment
<a name="tutorial-inferentia-pytorch-neuron-activate"></a>

Activate the PyTorch-Neuron conda environment using the following command: 

```
source activate aws_neuron_pytorch_p36
```

To exit the current conda environment, run: 

```
source deactivate
```

## ResNet50 Compilation
<a name="tutorial-inferentia-pytorch-neuron-compilation"></a>

Create a Python script called **pytorch\_trace\_resnet50.py** with the following content. This script uses the PyTorch-Neuron compilation Python API to compile a ResNet-50 model. 

**Note**  
There is a dependency between the versions of the torchvision and torch packages that you should be aware of when compiling torchvision models. These dependency rules can be managed through pip. torchvision==0.6.1 matches the torch==1.5.1 release, and torchvision==0.8.2 matches the torch==1.7.1 release.
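If you script your environment setup, the pairing rules from the note can be captured in a small lookup. This helper is illustrative only and covers just the two release pairs mentioned above:

```python
# Torch -> torchvision release pairings from the note above (illustrative;
# only the two pairs mentioned are included).
TORCHVISION_FOR_TORCH = {"1.5.1": "0.6.1", "1.7.1": "0.8.2"}

def matching_torchvision(torch_version):
    """Look up the torchvision release that pairs with a torch release."""
    if torch_version not in TORCHVISION_FOR_TORCH:
        raise ValueError("no known torchvision pairing for torch==" + torch_version)
    return TORCHVISION_FOR_TORCH[torch_version]

print(matching_torchvision("1.5.1"))  # 0.6.1
```

You could then pass the result to pip, for example `pip install torch==1.5.1 torchvision==0.6.1`.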

```
import torch
import numpy as np
import os
import torch_neuron
from torchvision import models

image = torch.zeros([1, 3, 224, 224], dtype=torch.float32)

## Load a pretrained ResNet50 model
model = models.resnet50(pretrained=True)

## Tell the model we are using it for evaluation (not training)
model.eval()
model_neuron = torch.neuron.trace(model, example_inputs=[image])

## Export to saved model
model_neuron.save("resnet50_neuron.pt")
```

Run the compilation script.

```
python pytorch_trace_resnet50.py
```

Compilation takes a few minutes. When it finishes, the compiled model is saved as `resnet50_neuron.pt` in the local directory.

## ResNet50 Inference
<a name="tutorial-inferentia-pytorch-neuron-inference"></a>

Create a Python script called **pytorch\_infer\_resnet50.py** with the following content. This script downloads a sample image and uses it to run inference with the compiled model. 

```
import os
import time
import torch
import torch_neuron
import json
import numpy as np

from urllib import request

from torchvision import models, transforms, datasets

## Create an image directory containing a small kitten
os.makedirs("./torch_neuron_test/images", exist_ok=True)
request.urlretrieve("https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg",
                    "./torch_neuron_test/images/kitten_small.jpg")


## Fetch labels to output the top classifications
request.urlretrieve("https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json","imagenet_class_index.json")
idx2label = []

with open("imagenet_class_index.json", "r") as read_file:
    class_idx = json.load(read_file)
    idx2label = [class_idx[str(k)][1] for k in range(len(class_idx))]

## Import a sample image and normalize it into a tensor
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225])

eval_dataset = datasets.ImageFolder(
    os.path.dirname("./torch_neuron_test/"),
    transforms.Compose([
    transforms.Resize([224, 224]),
    transforms.ToTensor(),
    normalize,
    ])
)

image, _ = eval_dataset[0]
image = torch.tensor(image.numpy()[np.newaxis, ...])

## Load model
model_neuron = torch.jit.load('resnet50_neuron.pt')

## Predict
results = model_neuron(image)

# Get the top 5 results
top5_idx = results[0].sort()[1][-5:]

# Lookup and print the top 5 labels
top5_labels = [idx2label[idx] for idx in top5_idx]

print("Top 5 labels:\n {}".format(top5_labels))
```

Run inference with the compiled model using the following command: 

```
python pytorch_infer_resnet50.py
```

Your output should look like the following: 

```
Top 5 labels:
 ['tiger', 'lynx', 'tiger_cat', 'Egyptian_cat', 'tabby']
```