Deploy a Compiled Model Using SageMaker SDK

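If the model was compiled using AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker console, you must first satisfy the prerequisites section. Then follow the use case below that matches how you compiled your model to deploy a model compiled with SageMaker Neo.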

Topics

If you compiled your model using the SageMaker SDK
If you compiled your model using MXNet or PyTorch
If you compiled your model using Boto3, the SageMaker console, or the CLI for TensorFlow

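If you compiled your model using the SageMaker SDK

The sagemaker.Model object handle for the compiled model provides the deploy() function, which creates an endpoint to serve inference requests. The function lets you set the number and type of instances used for the endpoint. You must choose an instance type that the model was compiled for. For example, in the job compiled in the Compile a Model (Amazon SageMaker SDK) section, the target is ml_c5, so the endpoint is deployed on an ml.c5 instance.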


predictor = compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.c5.4xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
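
The requirement that the endpoint instance belong to the family the model was compiled for can be checked before calling deploy(). The following is a minimal sketch, not part of the SageMaker SDK: matches_compile_target is a hypothetical helper, and it assumes that cloud compilation targets are written with underscores (ml_c5) while endpoint instance types use dots and a size suffix (ml.c5.4xlarge), as in the example above.

```python
def matches_compile_target(target_instance_family: str, instance_type: str) -> bool:
    """Return True if instance_type belongs to the compiled target family.

    Hypothetical helper for illustration only; it is not a SageMaker SDK call.
    """
    # 'ml_c5' -> 'ml.c5.' so that 'ml.c5.4xlarge' matches but 'ml.c5d.xlarge' does not
    family_prefix = target_instance_family.replace("_", ".") + "."
    return instance_type.startswith(family_prefix)


print(matches_compile_target("ml_c5", "ml.c5.4xlarge"))  # True
print(matches_compile_target("ml_c5", "ml.p3.2xlarge"))  # False
```

A check like this only compares names. It does not confirm that the compiled artifact actually loads on the chosen instance.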

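If you compiled your model using MXNet or PyTorch

Create the SageMaker model and deploy it using the deploy() API under the framework-specific Model APIs. For MXNet, it is MXNetModel, and for PyTorch, it is PyTorchModel. When you create and deploy a SageMaker model, you must set the MMS_DEFAULT_RESPONSE_TIMEOUT environment variable to 500, specify the entry_point parameter as the inference script (inference.py), and specify the source_dir parameter as the directory location (code) of the inference script. To prepare the inference script (inference.py), follow the Prerequisites step.

The following examples show how to use these functions to deploy a compiled model using the SageMaker SDK for Python: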

MXNet


from sagemaker.mxnet import MXNetModel

# Create SageMaker model and deploy an endpoint
sm_mxnet_compiled_model = MXNetModel(
    model_data='insert S3 path of compiled MXNet model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.8.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for MXNet',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_mxnet_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)

PyTorch 1.4 and Older


from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.4.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)

PyTorch 1.5 and Newer


from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.5',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
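
The three examples above differ only in the model class, the framework version, and whether env is passed. As a rough sketch, the shared constructor arguments could be assembled in one place. neo_model_kwargs is a hypothetical helper, not an SDK function. It mirrors the examples as written: the timeout variable is set for MXNet and for PyTorch 1.4 and older, and omitted for PyTorch 1.5 and newer because the example for that version omits it. Whether the variable is harmful or simply unnecessary there is not stated in this page.

```python
def neo_model_kwargs(framework, framework_version, model_data, image_uri,
                     role="AmazonSageMaker-ExecutionRole"):
    """Build keyword arguments for MXNetModel or PyTorchModel (hypothetical helper)."""
    if framework not in ("mxnet", "pytorch"):
        raise ValueError(f"unsupported framework: {framework!r}")

    kwargs = {
        "model_data": model_data,        # S3 path of the compiled model archive
        "role": role,
        "entry_point": "inference.py",   # inference script from the prerequisites
        "source_dir": "code",            # directory that contains inference.py
        "framework_version": framework_version,
        "py_version": "py3",
        "image_uri": image_uri,          # ECR image URI for the framework
    }

    # Compare only major.minor, so '1.4.0' and '1.5' are both handled.
    major_minor = tuple(int(part) for part in framework_version.split(".")[:2])
    if framework == "mxnet" or major_minor < (1, 5):
        kwargs["env"] = {"MMS_DEFAULT_RESPONSE_TIMEOUT": "500"}
    return kwargs


# Usage (requires the sagemaker package and real S3/ECR values):
#   from sagemaker.pytorch import PyTorchModel
#   model = PyTorchModel(**neo_model_kwargs("pytorch", "1.5", s3_path, ecr_uri))
print(sorted(neo_model_kwargs("pytorch", "1.5", "s3-path", "ecr-uri")))
```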

Note

The AmazonSageMakerFullAccess and AmazonS3ReadOnlyAccess policies must be attached to the AmazonSageMaker-ExecutionRole IAM role.
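
One way to confirm the note above before deploying is to list the managed policies attached to the role. The sketch below is illustrative: missing_policies and attached_policy_names are hypothetical helper names, and the AWS call assumes that boto3 is installed and that credentials with IAM read access are configured. It uses the IAM list_attached_role_policies operation through a paginator.

```python
REQUIRED_POLICIES = {"AmazonSageMakerFullAccess", "AmazonS3ReadOnlyAccess"}


def missing_policies(attached_names):
    """Return the required policy names that are absent from attached_names."""
    return sorted(REQUIRED_POLICIES - set(attached_names))


def attached_policy_names(role_name):
    """Fetch the names of managed policies attached to an IAM role."""
    import boto3  # imported lazily so the pure check above needs no AWS setup

    iam = boto3.client("iam")
    paginator = iam.get_paginator("list_attached_role_policies")
    names = []
    for page in paginator.paginate(RoleName=role_name):
        names.extend(policy["PolicyName"] for policy in page["AttachedPolicies"])
    return names


# Offline example with a hand-written list of attached policies:
print(missing_policies(["AmazonSageMakerFullAccess"]))  # ['AmazonS3ReadOnlyAccess']

# Against a real account (not executed here):
#   print(missing_policies(attached_policy_names("AmazonSageMaker-ExecutionRole")))
```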

If you compiled your model using Boto3, the SageMaker console, or the CLI for TensorFlow

Construct a TensorFlowModel object, then call deploy:


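from sagemaker.tensorflow import TensorFlowModel

# Placeholders: replace with your own role, model location, and container image
role='AmazonSageMaker-ExecutionRole'
model_path='S3 path for model file'
framework_image='inference container arn'
tf_model = TensorFlowModel(model_data=model_path,
                framework_version='1.15.3',
                role=role,
                image_uri=framework_image)
# Choose an instance type that the model was compiled for
instance_type='ml.c5.xlarge'
predictor = tf_model.deploy(instance_type=instance_type,
                    initial_instance_count=1)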

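For more information, see Deploying directly from model artifacts.

You can select a Docker image Amazon ECR URI that meets your needs from this list.

For more information on how to construct a TensorFlowModel object, see the SageMaker SDK.

Note

Your first inference request might have high latency if you deploy your model on a GPU. This is because an optimized compute kernel is built on the first inference request. We recommend that you create a warm-up file of inference requests and store it alongside your model file before sending it to TFX. This is known as "warming up" the model.

The following code snippet demonstrates how to produce the warm-up file for the image classification example in the prerequisites section: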


import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2

# Write a single PredictionLog record to the warm-up file
with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
    # Random image batch of shape (1, 224, 224, 3)
    img = np.random.uniform(0, 1, size=[224, 224, 3]).astype(np.float32)
    img = np.expand_dims(img, axis=0)
    test_data = np.repeat(img, 1, axis=0)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'compiled_models'
    request.model_spec.signature_name = 'serving_default'
    request.inputs['Placeholder:0'].CopyFrom(tf.compat.v1.make_tensor_proto(test_data, shape=test_data.shape, dtype=tf.float32))
    log = prediction_log_pb2.PredictionLog(
        predict_log=prediction_log_pb2.PredictLog(request=request))
    writer.write(log.SerializeToString())

For more information on how to "warm up" your model, see the TensorFlow TFX page.
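
Storing the warm-up file alongside the model can be sketched as follows. This assumes the TensorFlow Serving convention of placing the file at assets.extra/tf_serving_warmup_requests inside the versioned SavedModel directory. Whether your compiled model archive uses that layout is an assumption you should verify. add_warmup_and_package is a hypothetical helper and all paths are placeholders.

```python
import shutil
import tarfile
from pathlib import Path


def add_warmup_and_package(saved_model_version_dir, warmup_file, archive_path):
    """Copy the warm-up file into assets.extra/ and write a .tar.gz archive.

    Write archive_path outside saved_model_version_dir so that the archive
    does not end up containing itself.
    """
    version_dir = Path(saved_model_version_dir)
    extra_dir = version_dir / "assets.extra"
    extra_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(str(warmup_file), str(extra_dir / "tf_serving_warmup_requests"))

    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Keep the version directory name as the top-level entry in the archive
        tar.add(str(version_dir), arcname=version_dir.name)
    return Path(archive_path)


# Usage with placeholder paths (not executed here):
#   add_warmup_and_package("export/1", "tf_serving_warmup_requests", "model.tar.gz")
```

The resulting archive would then be uploaded to Amazon S3 and referenced as model_data when constructing the TensorFlowModel object.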
