Prerequisites
Note
Follow the instructions in this section if you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the SageMaker AI console.
To create a SageMaker Neo-compiled model, you need the following:
- A Docker image Amazon ECR URI. You can select one that meets your needs from this list.
- An entry point script file:
  - For PyTorch and MXNet models:

    If you trained your model using SageMaker AI, the training script must implement the functions described below. The training script serves as the entry point script during inference. In the example detailed in MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo, the training script (mnist.py) implements the required functions.

    If you did not train your model using SageMaker AI, you need to provide an entry point script (inference.py) file that can be used at the time of inference. Based on the framework (MXNet or PyTorch), the inference script location must conform to the SageMaker Python SDK Model Directory Structure for MXNet or Model Directory Structure for PyTorch. When using Neo Inference Optimized Container images with PyTorch and MXNet on CPU and GPU instance types, the inference script must implement the following functions:

    - model_fn: Loads the model. (Optional)
    - input_fn: Converts the incoming request payload into a numpy array.
    - predict_fn: Performs the prediction.
    - output_fn: Converts the prediction output into the response payload.
    - Alternatively, you can define transform_fn to combine input_fn, predict_fn, and output_fn.

    An inference.py script of this kind is placed in a directory named code (code/inference.py). For PyTorch and MXNet (Gluon and Module), the script first loads the model and then serves it on image data on a GPU; see the sketch below.
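    The following is a minimal sketch of such a code/inference.py for PyTorch, assuming the compiled artifact is a TorchScript file named model.pth in the model directory and that requests arrive as application/x-image. The file name, preprocessing values, and content type are illustrative assumptions rather than the official AWS sample; the MXNet (Gluon and Module) variants follow the same four-function structure.

    ```python
    # code/inference.py -- illustrative sketch, not the official AWS sample.
    # Assumes a TorchScript artifact named model.pth and application/x-image requests.
    import io
    import json
    import os

    import torch
    from PIL import Image
    from torchvision import transforms


    def model_fn(model_dir):
        """Load the compiled model and move it to the GPU if one is available."""
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model = torch.jit.load(os.path.join(model_dir, "model.pth"), map_location=device)
        model.eval()
        return model


    def input_fn(request_body, request_content_type):
        """Convert the raw image payload into a normalized tensor batch."""
        if request_content_type != "application/x-image":
            raise ValueError(f"Unsupported content type: {request_content_type}")
        image = Image.open(io.BytesIO(request_body)).convert("RGB")
        preprocess = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])
        return preprocess(image).unsqueeze(0)


    def predict_fn(input_data, model):
        """Run the forward pass on the same device the model was loaded on."""
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        with torch.no_grad():
            return model(input_data.to(device))


    def output_fn(prediction, response_content_type):
        """Serialize the prediction scores as JSON."""
        return json.dumps(prediction.cpu().numpy().tolist())
    ```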
  - For inf1 instances or onnx, xgboost, keras container images:

    For all other Neo Inference-optimized container images, or Inferentia instance types, the entry point script must implement the following functions for Neo Deep Learning Runtime:

    - neo_preprocess: Converts the incoming request payload into a numpy array.
    - neo_postprocess: Converts the prediction output from Neo Deep Learning Runtime into the response body.

    Note
    The preceding two functions do not use any of the functionalities of MXNet, PyTorch, or TensorFlow.

    For examples of how to use these functions, see Neo Model Compilation Sample Notebooks.
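    The sketch below illustrates the two functions for an image payload with a JSON response. The signatures and content types shown here are assumptions modeled on the pattern used in the sample notebooks; verify them against the notebook for your framework. As noted above, neither function relies on MXNet, PyTorch, or TensorFlow.

    ```python
    # Illustrative sketch of a Neo Deep Learning Runtime entry point; the
    # signatures, content types, and input layout are assumptions to verify
    # against the Neo Model Compilation Sample Notebooks.
    import io
    import json

    import numpy as np
    from PIL import Image


    def neo_preprocess(payload, content_type):
        """Convert the raw request payload into a numpy array for the runtime."""
        if content_type != "application/x-image":
            raise ValueError(f"Unsupported content type: {content_type}")
        image = Image.open(io.BytesIO(payload)).convert("RGB").resize((224, 224))
        # Build an NCHW float32 batch of size 1; adjust the layout to match your compiled model.
        array = np.asarray(image).astype(np.float32).transpose(2, 0, 1)
        return np.expand_dims(array, axis=0)


    def neo_postprocess(result):
        """Convert the runtime output into a response body and content type."""
        scores = np.asarray(result).squeeze()
        return json.dumps(scores.tolist()), "application/json"
    ```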
  - For TensorFlow models:
    If your model requires custom pre- and post-processing logic before data is sent to the model, then you must specify an entry point script inference.py file that can be used at the time of inference. The script should implement either a pair of input_handler and output_handler functions or a single handler function.

    Note
    If a handler function is implemented, input_handler and output_handler are ignored.

    The following is a code example of an inference.py script that you can package together with the compiled model to perform custom pre- and post-processing on an image classification model. The SageMaker AI client sends the image file as an application/x-image content type to the input_handler function, where it is converted to JSON. The converted image file is then sent to the TensorFlow Model Server (TFX) using the REST API.

    ```python
    import io
    import json

    import numpy as np
    from PIL import Image


    def input_handler(data, context):
        """Pre-process request input before it is sent to the TensorFlow Serving REST API.

        Args:
            data (obj): the request data, in format of dict or string
            context (Context): an object containing request and configuration details

        Returns:
            (dict): a JSON-serializable dict that contains request body and headers
        """
        f = data.read()
        f = io.BytesIO(f)
        image = Image.open(f).convert('RGB')
        batch_size = 1
        image = np.asarray(image.resize((512, 512)))
        image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
        body = json.dumps({"signature_name": "serving_default", "instances": image.tolist()})
        return body


    def output_handler(data, context):
        """Post-process TensorFlow Serving output before it is returned to the client.

        Args:
            data (obj): the TensorFlow Serving response
            context (Context): an object containing request and configuration details

        Returns:
            (bytes, string): data to return to client, response content type
        """
        if data.status_code != 200:
            raise ValueError(data.content.decode('utf-8'))
        response_content_type = context.accept_header
        prediction = data.content
        return prediction, response_content_type
    ```
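    For the single handler variant mentioned above, a common pattern is to pre-process the request, call the model server yourself, and post-process the response in one function. The sketch below assumes the serving container exposes the TensorFlow Serving REST endpoint through context.rest_uri, as in the SageMaker TensorFlow Serving container examples; treat that attribute name as an assumption to verify for your container version.

    ```python
    # Hypothetical single-handler sketch for the same inference.py; it reuses the
    # input_handler and output_handler functions defined above. The context.rest_uri
    # attribute is an assumption based on SageMaker TensorFlow Serving container examples.
    import requests


    def handler(data, context):
        """Combined handler; when present, input_handler and output_handler are ignored."""
        body = input_handler(data, context)                 # pre-process the request
        response = requests.post(context.rest_uri, data=body)  # call TensorFlow Serving directly
        return output_handler(response, context)            # post-process the response
    ```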
    If there is no custom pre- or post-processing, the SageMaker AI client converts the image file to JSON in a similar way before sending it over to the SageMaker AI endpoint.

    For more information, see Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK.
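    To illustrate the client side of this flow, the following sketch sends an image to a deployed endpoint as application/x-image using the SageMaker AI runtime client from Boto3. The endpoint name and file path are placeholders.

    ```python
    # Client-side sketch: invoke a deployed endpoint with an image payload.
    # The endpoint name and file path below are placeholders.
    import boto3

    runtime = boto3.client("sagemaker-runtime")

    with open("cat.jpg", "rb") as f:
        payload = f.read()

    response = runtime.invoke_endpoint(
        EndpointName="my-neo-endpoint",        # placeholder endpoint name
        ContentType="application/x-image",
        Body=payload,
    )

    print(response["Body"].read().decode("utf-8"))
    ```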
- The Amazon S3 bucket URI that contains the compiled model artifacts.
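As a rough illustration of how these prerequisites fit together, the following sketch creates and deploys a model with the SageMaker Python SDK. The image URI, S3 path, role, and instance type are placeholders, and whether entry_point and source_dir are accepted directly on Model depends on your SDK version; this is not the only supported deployment path.

```python
# Sketch: wire the ECR image, entry point script, and compiled artifacts together
# with the SageMaker Python SDK. All names below are placeholders.
import sagemaker
from sagemaker.model import Model

model = Model(
    image_uri="<neo-inference-image-ecr-uri>",          # Docker image Amazon ECR URI
    model_data="s3://my-bucket/compiled/model.tar.gz",  # compiled model artifacts in Amazon S3
    role="<execution-role-arn>",
    entry_point="inference.py",                         # entry point script
    source_dir="code",
    sagemaker_session=sagemaker.Session(),
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",                       # placeholder instance type
)
```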