# Using AWS Neuron TensorFlow Serving
<a name="tutorial-inferentia-tf-neuron-serving"></a>

This tutorial shows how to construct a graph and add an AWS Neuron compilation step before exporting the saved model to use with TensorFlow Serving. TensorFlow Serving is a serving system that lets you scale up inference across a network. Neuron TensorFlow Serving uses the same API as regular TensorFlow Serving. There are only two differences: the saved model must be compiled for AWS Inferentia, and the entry point is a different binary named `tensorflow_model_server_neuron`. The binary is located at `/usr/local/bin/tensorflow_model_server_neuron` and comes preinstalled on the DLAMI.
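Because the only operational difference from stock TensorFlow Serving is the binary name, it can save debugging time to confirm the binary is actually present before going further. The following is a minimal Python sketch; the helper name `find_neuron_model_server` is hypothetical, and it simply looks on `PATH` before falling back to the `/usr/local/bin` location stated above.

```python
import os
import shutil
from typing import Optional

# Install location stated in this tutorial for the DLAMI.
DEFAULT_NEURON_SERVER = "/usr/local/bin/tensorflow_model_server_neuron"


def find_neuron_model_server(default: str = DEFAULT_NEURON_SERVER) -> Optional[str]:
    """Hypothetical helper: return the path to the Neuron model server binary, or None."""
    # Prefer whatever is on PATH (e.g. inside an activated Conda environment).
    on_path = shutil.which("tensorflow_model_server_neuron")
    if on_path:
        return on_path
    # Fall back to the documented location, if it exists and is executable.
    if os.path.isfile(default) and os.access(default, os.X_OK):
        return default
    return None


if __name__ == "__main__":
    path = find_neuron_model_server()
    print(path if path else "tensorflow_model_server_neuron not found")
```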

For more information about the Neuron SDK, see the [AWS Neuron SDK documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/tensorflow-neuron/index.html).

**Topics**
+ [Prerequisites](#tutorial-inferentia-tf-neuron--serving-prerequisites)
+ [Activate the Conda environment](#tutorial-inferentia-tf-neuron-serving-activate)
+ [Compile and export the saved model](#tutorial-inferentia-tf-neuron-serving-compile)
+ [Serve the saved model](#tutorial-inferentia-tf-neuron-serving-serving)
+ [Generate inference requests to the model server](#tutorial-inferentia-tf-neuron-serving-inference)

## Prerequisites
<a name="tutorial-inferentia-tf-neuron--serving-prerequisites"></a>

Before using this tutorial, you should have completed the setup steps in [Launching a DLAMI Instance with AWS Neuron](tutorial-inferentia-launching.md). You should also be familiar with deep learning and with using the DLAMI.

## Activate the Conda environment
<a name="tutorial-inferentia-tf-neuron-serving-activate"></a>

Activate the TensorFlow-Neuron Conda environment with the following command:

```
source activate aws_neuron_tensorflow_p36
```

If you need to exit the current Conda environment, run:

```
source deactivate
```

## Compile and export the saved model
<a name="tutorial-inferentia-tf-neuron-serving-compile"></a>

Create a Python script named `tensorflow-model-server-compile.py` with the following content. This script constructs a graph and compiles it with Neuron, then exports the compiled graph as a saved model.

```
import tensorflow as tf
import tensorflow.neuron
import os

tf.keras.backend.set_learning_phase(0)
model = tf.keras.applications.ResNet50(weights='imagenet')
sess = tf.keras.backend.get_session()
inputs = {'input': model.inputs[0]}
outputs = {'output': model.outputs[0]}

# save the model using tf.saved_model.simple_save
modeldir = "./resnet50/1"
tf.saved_model.simple_save(sess, modeldir, inputs, outputs)

# compile the model for Inferentia
neuron_modeldir = os.path.join(os.path.expanduser('~'), 'resnet50_inf1', '1')
tf.neuron.saved_model.compile(modeldir, neuron_modeldir, batch_size=1)
```
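The script writes the compiled model into a numbered subdirectory (`~/resnet50_inf1/1`) because TensorFlow Serving expects `--model_base_path` to contain one directory per integer model version and, under its default version policy, serves the highest-numbered one. The sketch below, with hypothetical helper names, lists those version directories and finds the newest one that contains a `saved_model.pb`, which can help confirm that the export landed where the server will look.

```python
import os
import re
from typing import List, Optional


def list_model_versions(model_base_path: str) -> List[int]:
    """Hypothetical helper: numeric version subdirectories under model_base_path, ascending."""
    versions = []
    for name in os.listdir(model_base_path):
        full = os.path.join(model_base_path, name)
        # Only all-digit directory names are treated as model versions.
        if re.fullmatch(r"[0-9]+", name) and os.path.isdir(full):
            versions.append(int(name))
    return sorted(versions)


def latest_saved_model_dir(model_base_path: str) -> Optional[str]:
    """Hypothetical helper: highest version directory that contains saved_model.pb."""
    for version in reversed(list_model_versions(model_base_path)):
        candidate = os.path.join(model_base_path, str(version))
        if os.path.isfile(os.path.join(candidate, "saved_model.pb")):
            return candidate
    return None


if __name__ == "__main__":
    base = os.path.expanduser("~/resnet50_inf1")
    print(latest_saved_model_dir(base) if os.path.isdir(base) else base + " does not exist")
```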

Compile the model with the following command:

```
python tensorflow-model-server-compile.py
```

Your output should look similar to the following:

```
...
INFO:tensorflow:fusing subgraph neuron_op_d6f098c01c780733 with neuron-cc
INFO:tensorflow:Number of operations in TensorFlow session: 4638
INFO:tensorflow:Number of operations after tf.neuron optimizations: 556
INFO:tensorflow:Number of operations placed on Neuron runtime: 554
INFO:tensorflow:Successfully converted ./resnet50/1 to /home/ubuntu/resnet50_inf1/1
```
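The last three counter lines indicate how much of the graph was offloaded: in this sample, 554 of the 556 operations that remain after `tf.neuron` optimization are placed on the Neuron runtime. The following sketch parses those lines from the sample log and computes the ratio. The regular expression is an assumption based only on the log format shown above and may need adjusting for other Neuron versions.

```python
import re
from typing import Dict

# Sample lines copied from the compiler output above.
LOG = """\
INFO:tensorflow:Number of operations in TensorFlow session: 4638
INFO:tensorflow:Number of operations after tf.neuron optimizations: 556
INFO:tensorflow:Number of operations placed on Neuron runtime: 554
"""


def parse_op_counts(log: str) -> Dict[str, int]:
    """Map each 'Number of operations <label>: <n>' label to its integer count."""
    pattern = re.compile(r"Number of operations (.+?): (\d+)")
    return {m.group(1): int(m.group(2)) for m in pattern.finditer(log)}


counts = parse_op_counts(LOG)
placed = counts["placed on Neuron runtime"]
total = counts["after tf.neuron optimizations"]
# For the sample log this prints: 554/556 = 99.6% of operations placed on Neuron
print(f"{placed}/{total} = {placed / total:.1%} of operations placed on Neuron")
```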

## Serve the saved model
<a name="tutorial-inferentia-tf-neuron-serving-serving"></a>

After the model is compiled, you can serve the saved model with the `tensorflow_model_server_neuron` binary using the following command:

```
tensorflow_model_server_neuron --model_name=resnet50_inf1 \
    --model_base_path=$HOME/resnet50_inf1/ --port=8500 &
```
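If you prefer to start the server from Python (for example, from a test harness) rather than from a shell, the command above maps directly onto an argument list for `subprocess.Popen`. This is a minimal sketch: `build_server_cmd` and `start_server` are hypothetical helpers, and only the three flags used in this tutorial are included. It quotes arguments with `shlex.quote` instead of `shlex.join`, because `shlex.join` requires Python 3.8 or later and the `_p36` environment name suggests Python 3.6.

```python
import os
import shlex
import subprocess
from typing import List


def build_server_cmd(model_name: str, model_base_path: str, port: int = 8500,
                     binary: str = "tensorflow_model_server_neuron") -> List[str]:
    """Hypothetical helper: argv equivalent to the shell command above."""
    return [
        binary,
        f"--model_name={model_name}",
        f"--model_base_path={model_base_path}",
        f"--port={port}",
    ]


def start_server(cmd: List[str]) -> subprocess.Popen:
    """Hypothetical helper: run the server in the background, like the trailing '&'."""
    return subprocess.Popen(cmd)


if __name__ == "__main__":
    cmd = build_server_cmd("resnet50_inf1", os.path.expanduser("~/resnet50_inf1/"))
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # proc = start_server(cmd)  # ... later: proc.terminate(); proc.wait()
```

Passing a list (rather than a single string with `shell=True`) keeps paths containing spaces intact without extra quoting.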

Your output should look similar to the following. The server stages the compiled model in the Inferentia device's DRAM to prepare it for inference.

```
...
2019-11-22 01:20:32.075856: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 40764 microseconds.
2019-11-22 01:20:32.075888: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /home/ubuntu/resnet50_inf1/1/assets.extra/tf_serving_warmup_requests
2019-11-22 01:20:32.075950: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: resnet50_inf1 version: 1}
2019-11-22 01:20:32.077859: I tensorflow_serving/model_servers/server.cc:353] Running gRPC ModelServer at 0.0.0.0:8500 ...
```
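Because the server is launched in the background with `&`, a client started immediately afterward may try to connect before the server is listening. A simple guard is to poll the gRPC port until it accepts TCP connections. In the sample log, the `Running gRPC ModelServer` line appears only after the servable has loaded, so an open port is a reasonable readiness signal here, although a successful `Predict` call remains the definitive check. The helper name `wait_for_port` is hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Hypothetical helper: poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A completed TCP handshake means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    ready = wait_for_port("localhost", 8500, timeout=120)
    print("gRPC port is open" if ready else "server did not come up in time")
```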

## Generate inference requests to the model server
<a name="tutorial-inferentia-tf-neuron-serving-inference"></a>

Create a Python script named `tensorflow-model-server-infer.py` with the following content. This script runs inference through gRPC, a remote procedure call (RPC) framework.

```
import numpy as np
import grpc
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow.keras.applications.resnet50 import decode_predictions

if __name__ == '__main__':
    channel = grpc.insecure_channel('localhost:8500')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    img_file = tf.keras.utils.get_file(
        "./kitten_small.jpg",
        "https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg")
    img = image.load_img(img_file, target_size=(224, 224))
    img_array = preprocess_input(image.img_to_array(img)[None, ...])
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'resnet50_inf1'
    request.inputs['input'].CopyFrom(
        tf.contrib.util.make_tensor_proto(img_array, shape=img_array.shape))
    result = stub.Predict(request)
    prediction = tf.make_ndarray(result.outputs['output'])
    print(decode_predictions(prediction))
```
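`decode_predictions` maps the 1,000 ImageNet class scores in each row to the five highest-scoring `(class_id, class_name, score)` tuples. If you serve a model with a different label set, the same top-k step is easy to do by hand. The sketch below uses NumPy with a hypothetical four-class label list; unlike Keras, it returns `(label, score)` pairs rather than three-element tuples.

```python
from typing import List, Sequence, Tuple

import numpy as np


def top_k(probs: np.ndarray, labels: Sequence[str], k: int = 5) -> List[List[Tuple[str, float]]]:
    """For each row of a (batch, num_classes) array, return the k highest (label, score) pairs."""
    results = []
    for row in np.atleast_2d(probs):
        # argsort is ascending: take the last k indices, then reverse for highest-first.
        idx = np.argsort(row)[-k:][::-1]
        results.append([(labels[i], float(row[i])) for i in idx])
    return results


if __name__ == "__main__":
    labels = ["cat", "dog", "fox", "owl"]  # hypothetical label set
    probs = np.array([[0.1, 0.6, 0.05, 0.25]])
    print(top_k(probs, labels, k=2))
```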

Run inference on the model over gRPC with the following command:

```
python tensorflow-model-server-infer.py
```

Your output should look similar to the following:

```
[[('n02123045', 'tabby', 0.6918919), ('n02127052', 'lynx', 0.12770271), ('n02123159', 'tiger_cat', 0.08277027), ('n02124075', 'Egyptian_cat', 0.06418919), ('n02128757', 'snow_leopard', 0.009290541)]]
```