Prerequisites - Amazon SageMaker


Prerequisites

Note

If you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the SageMaker console, follow the instructions in this section.

To create a SageMaker Neo compiled model, you need the following (see the sketch after this list for how these pieces fit together):

  1. The Amazon ECR URI of a Docker image. You can select one that satisfies your needs from this list.

  2. An entry point script file:

    1. For PyTorch and MXNet models:

      If you trained your model using SageMaker, the training script must implement the functions below. The training script serves as the entry point script during inference. In the example detailed in MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo, the training script (mnist.py) implements the required functions.

      If you did not train your model using SageMaker, you need to provide an entry point script (inference.py) file that can be used at the time of inference. Based on the framework, MXNet or PyTorch, the inference script location must conform to the SageMaker Python SDK Model Directory Structure for MxNet or the Model Directory Structure for PyTorch (a minimal packaging sketch follows the examples below).

      When using the Neo Inference Optimized Container images with PyTorch and MXNet on CPU and GPU instance types, the inference script must implement the following functions:

      • model_fn: Loads the model. (Optional)

      • input_fn: Converts the incoming request payload into a numpy array.

      • predict_fn: Performs the prediction.

      • output_fn: Converts the prediction output into the response payload.

      • Alternatively, you can define transform_fn to combine input_fn, predict_fn, and output_fn.

      The following are examples of inference.py scripts in a directory named code (code/inference.py) for PyTorch and MXNet (Gluon and Module). The examples first load the model and then serve it on image data on a GPU:

      MXNet Module
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401
      from collections import namedtuple

      Batch = namedtuple('Batch', ['data'])

      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()

      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          sym, arg_params, aux_params = mx.model.load_checkpoint('compiled', 0)
          mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
          exe = mod.bind(for_training=False,
                         data_shapes=[('data', (1,3,224,224))],
                         label_shapes=mod._label_shapes)
          mod.set_params(arg_params, aux_params, allow_missing=True)
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          mod.forward(Batch([data]))
          return mod

      def transform_fn(mod, image, input_content_type, output_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                                mean=mx.nd.array([0.485, 0.456, 0.406]),
                                                std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)

          # prediction/inference
          mod.forward(Batch([processed_input]))

          # post-processing
          prob = mod.get_outputs()[0].asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      MXNet Gluon
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401

      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()

      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          block = mx.gluon.nn.SymbolBlock.imports('compiled-symbol.json', ['data'],
                                                  'compiled-0000.params', ctx=ctx)
          # Hybridize the model & pass required options for Neo: static_alloc=True & static_shape=True
          block.hybridize(static_alloc=True, static_shape=True)
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          warm_up = block(data)
          return block

      def input_fn(image, input_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                                mean=mx.nd.array([0.485, 0.456, 0.406]),
                                                std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
          return processed_input

      def predict_fn(processed_input_data, block):
          # prediction/inference
          prediction = block(processed_input_data)
          return prediction

      def output_fn(prediction, output_content_type):
          # post-processing
          prob = prediction.asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      PyTorch 1.4 and Older
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle

      def model_fn(model_dir):
          """Load the model and return it.

          Providing this function is optional.
          There is a default model_fn available which will load the model
          compiled using SageMaker Neo. You can override it here.

          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
          # The compiled model is saved as "compiled.pt"
          model_path = os.path.join(model_dir, 'compiled.pt')
          with torch.neo.config(model_dir=model_dir, neo_runtime=True):
              model = torch.jit.load(model_path)
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          model = model.to(device)

          # We recommend that you run warm-up inference during model load
          sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
          with open(sample_input_path, 'rb') as input_file:
              model_input = pickle.load(input_file)
          if torch.is_tensor(model_input):
              model_input = model_input.to(device)
              model(model_input)
          elif isinstance(model_input, tuple):
              model_input = (inp.to(device) for inp in model_input if torch.is_tensor(inp))
              model(*model_input)
          else:
              print("Only supports a torch tensor or a tuple of torch tensors")

          return model

      def transform_fn(model, request_body, request_content_type, response_content_type):
          """Run prediction and return the output.

          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)

          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)

          return json.dumps(output.cpu().numpy().tolist()), response_content_type
      PyTorch 1.5 and Newer
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle

      def model_fn(model_dir):
          """Load the model and return it.

          Providing this function is optional.
          There is a default_model_fn available, which will load the model
          compiled using SageMaker Neo. You can override the default here.
          The model_fn only needs to be defined if your model needs extra
          steps to load, and can otherwise be left undefined.

          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
          # The compiled model is saved as "model.pt"
          model_path = os.path.join(model_dir, 'model.pt')
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          model = torch.jit.load(model_path, map_location=device)
          model = model.to(device)
          return model

      def transform_fn(model, request_body, request_content_type, response_content_type):
          """Run prediction and return the output.

          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)

          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)

          return json.dumps(output.cpu().numpy().tolist()), response_content_type
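
      The following is a hypothetical packaging sketch, assuming a compiled artifact named model.pt and an entry point at code/inference.py; the exact artifact names and layout depend on your framework and version and are defined by the directory-structure pages linked above:

      import tarfile

      # Assumed layout for illustration only; artifact names vary by framework and version:
      #   model.pt             <- compiled model artifact produced by Neo
      #   code/inference.py    <- entry point script implementing the functions above
      with tarfile.open('model.tar.gz', 'w:gz') as tar:
          tar.add('model.pt')
          tar.add('code/inference.py')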
    2. For inf1 instances or onnx, xgboost, and keras container images

      For all other Neo Inference Optimized Container images, or inferentia instance types, the entry point script must implement the following functions for Neo Deep Learning Runtime:

      • neo_preprocess: Converts the incoming request payload into a numpy array.

      • neo_postprocess: Converts the prediction output from Neo Deep Learning Runtime into the response body.

        Note

        The preceding two functions do not use any functionality of MXNet, PyTorch, or TensorFlow.

      For examples of how to use these functions, see the Neo Model Compilation Sample Notebooks.
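
      The following is a minimal sketch of what these two functions can look like, assuming the client sends raw image bytes and the model returns a single array of scores. It is illustrative only; the content types and output handling here are assumptions, and the sample notebooks linked above show the authoritative versions:

      import io
      import json
      import numpy as np
      from PIL import Image

      def neo_preprocess(payload, content_type):
          # Assumption: the client sends raw image bytes (for example, application/x-image)
          if not payload:
              raise ValueError('Empty request payload')
          image = Image.open(io.BytesIO(payload)).convert('RGB').resize((224, 224))
          # Convert to a float32 NCHW numpy array scaled to [0, 1]
          array = np.asarray(image).astype(np.float32) / 255.0
          array = np.transpose(array, (2, 0, 1))
          return np.expand_dims(array, axis=0)

      def neo_postprocess(result):
          # Assumption: the runtime returns a list of output arrays; serialize the first one as JSON
          scores = np.asarray(result[0]).flatten().tolist()
          return json.dumps(scores), 'application/json'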

    3. For TensorFlow models

      If your model requires custom pre-processing and post-processing logic before data is sent to the model, then you must specify an entry point script inference.py file that can be used at the time of inference. The script should implement either a pair of input_handler and output_handler functions or a single handler function.

      Note

      If the handler function is implemented, input_handler and output_handler are ignored.

      The following is a code example of an inference.py script that you can put together with the compiled model to perform custom pre-processing and post-processing on an image classification model. The SageMaker client sends the image file as an application/x-image content type to the input_handler function, where it is converted to JSON. The converted image file is then sent to the TensorFlow Model Server (TFX) using the REST API.

      import json
      import numpy as np
      import io
      from PIL import Image

      def input_handler(data, context):
          """Pre-process request input before it is sent to TensorFlow Serving REST API

          Args:
              data (obj): the request data, in format of dict or string
              context (Context): an object containing request and configuration details

          Returns:
              (dict): a JSON-serializable dict that contains request body and headers
          """
          f = data.read()
          f = io.BytesIO(f)
          image = Image.open(f).convert('RGB')
          batch_size = 1
          image = np.asarray(image.resize((512, 512)))
          image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
          body = json.dumps({"signature_name": "serving_default", "instances": image.tolist()})
          return body

      def output_handler(data, context):
          """Post-process TensorFlow Serving output before it is returned to the client.

          Args:
              data (obj): the TensorFlow serving response
              context (Context): an object containing request and configuration details

          Returns:
              (bytes, string): data to return to client, response content type
          """
          if data.status_code != 200:
              raise ValueError(data.content.decode('utf-8'))
          response_content_type = context.accept_header
          prediction = data.content
          return prediction, response_content_type

      If there is no custom pre-processing or post-processing, the SageMaker client converts the file image to JSON in a similar way before sending it over to the SageMaker endpoint.

      For more information, see Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK.
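
      As a usage illustration, a client could invoke the resulting endpoint with an image payload as follows; the endpoint name and local image file are placeholders:

      import boto3

      runtime = boto3.client('sagemaker-runtime')

      # Placeholder endpoint name and local image file
      with open('cat.jpg', 'rb') as f:
          payload = f.read()

      response = runtime.invoke_endpoint(
          EndpointName='my-compiled-tf-endpoint',
          ContentType='application/x-image',   # routed to input_handler above
          Body=payload,
      )
      print(response['Body'].read().decode('utf-8'))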

  3. The Amazon S3 bucket URI that contains the compiled model artifacts.
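
The following is a minimal sketch, using Boto3, of how the three prerequisites above come together when you register a compiled model with the CreateModel API. Every name, ARN, and URI is a placeholder, and pointing at the entry point script through the SAGEMAKER_PROGRAM environment variable is only one possible layout:

import boto3

sm = boto3.client('sagemaker')

# All names, ARNs, and URIs below are placeholders; substitute your own values.
sm.create_model(
    ModelName='my-neo-compiled-model',
    ExecutionRoleArn='arn:aws:iam::111122223333:role/MySageMakerExecutionRole',
    PrimaryContainer={
        # 1. Amazon ECR URI of the inference container image you selected
        'Image': '<ecr-image-uri-from-the-supported-list>',
        # 3. Amazon S3 URI of the compiled model artifacts
        'ModelDataUrl': 's3://<your-bucket>/output/model.tar.gz',
        # 2. Entry point script; packaging it under code/ inside model.tar.gz is one
        #    common layout, or it can be named through an environment variable such as
        #    SAGEMAKER_PROGRAM (shown here as an assumption, not a requirement)
        'Environment': {'SAGEMAKER_PROGRAM': 'inference.py'},
    },
)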