Prerequisites - Amazon SageMaker


Prerequisites

Note

If you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the SageMaker console, follow the instructions in this section.

To create a SageMaker Neo compiled model, you need the following (see the sketch after this list for how these pieces fit together):

  1. The Amazon ECR URI of a Docker image. You can select one that satisfies your needs from this list.

  2. An entry point script file:

    1. For PyTorch and MXNet models:

      If you trained your model using SageMaker, the training script must implement the functions below. The training script serves as the entry point script during inference. In the example detailed in MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo, the training script (mnist.py) implements the required functions.

      If you did not train your model using SageMaker, you need to provide an entry point script (inference.py) file that can be used at the time of inference. Based on the framework, MXNet or PyTorch, the inference script location must conform to the SageMaker Python SDK Model Directory Structure for MxNet or the Model Directory Structure for PyTorch (a minimal packaging sketch follows the examples below).

      When using the Neo Inference Optimized Container images with PyTorch and MXNet on CPU and GPU instance types, the inference script must implement the following functions:

      • model_fn: Loads the model. (Optional)

      • input_fn: Converts the incoming request payload into a numpy array.

      • predict_fn: Performs the prediction.

      • output_fn: Converts the prediction output into the response payload.

      • Alternatively, you can define transform_fn to combine input_fn, predict_fn, and output_fn.

      The following are examples of inference.py scripts in a directory named code (code/inference.py) for PyTorch and MXNet (Gluon and Module). The examples first load the model and then serve it on image data on a GPU:

      MXNet Module
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401
      from collections import namedtuple

      Batch = namedtuple('Batch', ['data'])

      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()

      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          sym, arg_params, aux_params = mx.model.load_checkpoint('compiled', 0)
          mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
          exe = mod.bind(for_training=False,
                         data_shapes=[('data', (1,3,224,224))],
                         label_shapes=mod._label_shapes)
          mod.set_params(arg_params, aux_params, allow_missing=True)
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          mod.forward(Batch([data]))
          return mod

      def transform_fn(mod, image, input_content_type, output_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                                mean=mx.nd.array([0.485, 0.456, 0.406]),
                                                std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)

          # prediction/inference
          mod.forward(Batch([processed_input]))

          # post-processing
          prob = mod.get_outputs()[0].asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      MXNet Gluon
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401

      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()

      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          block = mx.gluon.nn.SymbolBlock.imports('compiled-symbol.json', ['data'],
                                                  'compiled-0000.params', ctx=ctx)
          # Hybridize the model & pass required options for Neo: static_alloc=True & static_shape=True
          block.hybridize(static_alloc=True, static_shape=True)
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          warm_up = block(data)
          return block

      def input_fn(image, input_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                                mean=mx.nd.array([0.485, 0.456, 0.406]),
                                                std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
          return processed_input

      def predict_fn(processed_input_data, block):
          # prediction/inference
          prediction = block(processed_input_data)
          return prediction

      def output_fn(prediction, output_content_type):
          # post-processing
          prob = prediction.asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      PyTorch 1.4 and Older
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle

      def model_fn(model_dir):
          """Load the model and return it.

          Providing this function is optional.
          There is a default model_fn available which will load the model
          compiled using SageMaker Neo. You can override it here.

          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
          # The compiled model is saved as "compiled.pt"
          model_path = os.path.join(model_dir, 'compiled.pt')
          with torch.neo.config(model_dir=model_dir, neo_runtime=True):
              model = torch.jit.load(model_path)
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          model = model.to(device)

          # We recommend that you run warm-up inference during model load
          sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
          with open(sample_input_path, 'rb') as input_file:
              model_input = pickle.load(input_file)
          if torch.is_tensor(model_input):
              model_input = model_input.to(device)
              model(model_input)
          elif isinstance(model_input, tuple):
              model_input = (inp.to(device) for inp in model_input if torch.is_tensor(inp))
              model(*model_input)
          else:
              print("Only supports a torch tensor or a tuple of torch tensors")

          return model

      def transform_fn(model, request_body, request_content_type, response_content_type):
          """Run prediction and return the output.

          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)

          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)

          return json.dumps(output.cpu().numpy().tolist()), response_content_type
      PyTorch 1.5 and Newer
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle

      def model_fn(model_dir):
          """Load the model and return it.

          Providing this function is optional.
          There is a default_model_fn available, which will load the model
          compiled using SageMaker Neo. You can override the default here.
          The model_fn only needs to be defined if your model needs extra
          steps to load, and can otherwise be left undefined.

          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
          # The compiled model is saved as "model.pt"
          model_path = os.path.join(model_dir, 'model.pt')
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          model = torch.jit.load(model_path, map_location=device)
          model = model.to(device)
          return model

      def transform_fn(model, request_body, request_content_type, response_content_type):
          """Run prediction and return the output.

          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)

          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)

          return json.dumps(output.cpu().numpy().tolist()), response_content_type
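
      The following is a hypothetical packaging sketch, assuming a compiled artifact named model.pt and an entry point at code/inference.py; the exact artifact names and layout depend on your framework and version and are defined by the directory-structure pages linked above:

      import tarfile

      # Assumed layout for illustration only; artifact names vary by framework and version:
      #   model.pt             <- compiled model artifact produced by Neo
      #   code/inference.py    <- entry point script implementing the functions above
      with tarfile.open('model.tar.gz', 'w:gz') as tar:
          tar.add('model.pt')
          tar.add('code/inference.py')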
    2. For inf1 instances or onnx, xgboost, and keras container images

      For all other Neo Inference Optimized Container images, or inferentia instance types, the entry point script must implement the following functions for Neo Deep Learning Runtime:

      • neo_preprocess: Converts the incoming request payload into a numpy array.

      • neo_postprocess: Converts the prediction output from Neo Deep Learning Runtime into the response body.

        Note

        The preceding two functions do not use any functionality of MXNet, PyTorch, or TensorFlow.

      For examples of how to use these functions, see the Neo Model Compilation Sample Notebooks.
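
      The following is a minimal sketch of what these two functions can look like, assuming the client sends raw image bytes and the model returns a single array of scores. It is illustrative only; the content types and output handling here are assumptions, and the sample notebooks linked above show the authoritative versions:

      import io
      import json
      import numpy as np
      from PIL import Image

      def neo_preprocess(payload, content_type):
          # Assumption: the client sends raw image bytes (for example, application/x-image)
          if not payload:
              raise ValueError('Empty request payload')
          image = Image.open(io.BytesIO(payload)).convert('RGB').resize((224, 224))
          # Convert to a float32 NCHW numpy array scaled to [0, 1]
          array = np.asarray(image).astype(np.float32) / 255.0
          array = np.transpose(array, (2, 0, 1))
          return np.expand_dims(array, axis=0)

      def neo_postprocess(result):
          # Assumption: the runtime returns a list of output arrays; serialize the first one as JSON
          scores = np.asarray(result[0]).flatten().tolist()
          return json.dumps(scores), 'application/json'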

    3. For TensorFlow models

      If your model requires custom pre-processing and post-processing logic before data is sent to the model, then you must specify an entry point script inference.py file that can be used at the time of inference. The script should implement either a pair of input_handler and output_handler functions or a single handler function.

      Note

      If the handler function is implemented, input_handler and output_handler are ignored.

      The following is a code example of an inference.py script that you can put together with the compiled model to perform custom pre-processing and post-processing on an image classification model. The SageMaker client sends the image file as an application/x-image content type to the input_handler function, where it is converted to JSON. The converted image file is then sent to the TensorFlow Model Server (TFX) using the REST API.

      import json
      import numpy as np
      import io
      from PIL import Image

      def input_handler(data, context):
          """Pre-process request input before it is sent to TensorFlow Serving REST API

          Args:
              data (obj): the request data, in format of dict or string
              context (Context): an object containing request and configuration details

          Returns:
              (dict): a JSON-serializable dict that contains request body and headers
          """
          f = data.read()
          f = io.BytesIO(f)
          image = Image.open(f).convert('RGB')
          batch_size = 1
          image = np.asarray(image.resize((512, 512)))
          image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
          body = json.dumps({"signature_name": "serving_default", "instances": image.tolist()})
          return body

      def output_handler(data, context):
          """Post-process TensorFlow Serving output before it is returned to the client.

          Args:
              data (obj): the TensorFlow serving response
              context (Context): an object containing request and configuration details

          Returns:
              (bytes, string): data to return to client, response content type
          """
          if data.status_code != 200:
              raise ValueError(data.content.decode('utf-8'))
          response_content_type = context.accept_header
          prediction = data.content
          return prediction, response_content_type

      If there is no custom pre-processing or post-processing, the SageMaker client converts the file image to JSON in a similar way before sending it over to the SageMaker endpoint.

      For more information, see Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK.
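
      As a usage illustration, a client could invoke the resulting endpoint with an image payload as follows; the endpoint name and local image file are placeholders:

      import boto3

      runtime = boto3.client('sagemaker-runtime')

      # Placeholder endpoint name and local image file
      with open('cat.jpg', 'rb') as f:
          payload = f.read()

      response = runtime.invoke_endpoint(
          EndpointName='my-compiled-tf-endpoint',
          ContentType='application/x-image',   # routed to input_handler above
          Body=payload,
      )
      print(response['Body'].read().decode('utf-8'))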

  3. The Amazon S3 bucket URI that contains the compiled model artifacts.
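
The following is a minimal sketch, using Boto3, of how the three prerequisites above come together when you register a compiled model with the CreateModel API. Every name, ARN, and URI is a placeholder, and pointing at the entry point script through the SAGEMAKER_PROGRAM environment variable is only one possible layout:

import boto3

sm = boto3.client('sagemaker')

# All names, ARNs, and URIs below are placeholders; substitute your own values.
sm.create_model(
    ModelName='my-neo-compiled-model',
    ExecutionRoleArn='arn:aws:iam::111122223333:role/MySageMakerExecutionRole',
    PrimaryContainer={
        # 1. Amazon ECR URI of the inference container image you selected
        'Image': '<ecr-image-uri-from-the-supported-list>',
        # 3. Amazon S3 URI of the compiled model artifacts
        'ModelDataUrl': 's3://<your-bucket>/output/model.tar.gz',
        # 2. Entry point script; packaging it under code/ inside model.tar.gz is one
        #    common layout, or it can be named through an environment variable such as
        #    SAGEMAKER_PROGRAM (shown here as an assumption, not a requirement)
        'Environment': {'SAGEMAKER_PROGRAM': 'inference.py'},
    },
)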