Run Real-time Predictions with an Inference Pipeline
You can use trained models in an inference pipeline to make real-time predictions directly without performing external preprocessing. When you configure the pipeline, you can choose to use the built-in feature transformers already available in Amazon SageMaker AI. Or, you can implement your own transformation logic using just a few lines of scikit-learn or Spark code.
The containers in a pipeline listen on the port specified in the SAGEMAKER_BIND_TO_PORT environment variable (instead of 8080). When running in an inference pipeline, SageMaker AI automatically provides this environment variable to containers. If this environment variable isn't present, containers default to using port 8080. To indicate that your container complies with this requirement, add the following label to your Dockerfile:
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
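Inside the container, the serving process has to read this variable itself. The following is a minimal sketch of that lookup, not an official implementation: the helper name `resolve_bind_port` is hypothetical, and the web server that actually binds the socket is left out.

```python
import os

DEFAULT_PORT = 8080  # fallback used when the variable is not set


def resolve_bind_port(environ=None):
    """Return the port the container should listen on.

    Reads SAGEMAKER_BIND_TO_PORT and falls back to 8080 when it is absent.
    `environ` is injectable so the logic can be exercised without a pipeline.
    """
    env = os.environ if environ is None else environ
    raw = env.get("SAGEMAKER_BIND_TO_PORT")
    if raw is None or raw.strip() == "":
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError if the value is not an integer
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: {}".format(port))
    return port


if __name__ == "__main__":
    print("listening on port", resolve_bind_port())
```

The resolved value would then be passed to whatever server the container runs in place of a hard-coded 8080.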
If your container needs to listen on a second port, choose a port in the range specified by the SAGEMAKER_SAFE_PORT_RANGE environment variable. The value is an inclusive range in the format "XXXX-YYYY", where XXXX and YYYY are multi-digit integers. SageMaker AI provides this value automatically when you run the container in a multicontainer pipeline.
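As an illustration of how a container might pick its second port from that range, here is a small sketch. The function names `parse_safe_port_range` and `pick_secondary_port` are hypothetical, and stripping surrounding double quotes is a defensive assumption rather than documented behavior.

```python
import os


def parse_safe_port_range(value):
    """Parse an inclusive "XXXX-YYYY" range into a (low, high) tuple."""
    # Assumption: tolerate surrounding whitespace or double quotes.
    cleaned = value.strip().strip('"')
    low_text, separator, high_text = cleaned.partition("-")
    if not separator:
        raise ValueError("expected 'XXXX-YYYY', got {!r}".format(value))
    low, high = int(low_text), int(high_text)
    if low > high:
        raise ValueError("range start exceeds range end: {!r}".format(value))
    return low, high


def pick_secondary_port(environ=None, offset=0):
    """Return the port `offset` places into the safe range (inclusive)."""
    env = os.environ if environ is None else environ
    low, high = parse_safe_port_range(env["SAGEMAKER_SAFE_PORT_RANGE"])
    port = low + offset
    if offset < 0 or port > high:
        raise ValueError("offset {} falls outside {}-{}".format(offset, low, high))
    return port
```

Using a fixed offset rather than a random choice keeps the port predictable across restarts, which may make debugging easier. Whether that matters depends on the container.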
Note
To use custom Docker images in a pipeline that includes SageMaker AI built-in algorithms, you need an Amazon Elastic Container Registry (Amazon ECR) policy. Your Amazon ECR repository must grant SageMaker AI permission to pull the image. For more information, see Troubleshoot Amazon ECR Permissions for Inference Pipelines.
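For orientation, a repository policy of this kind might look like the sketch below. The statement ID and the exact action list are assumptions based on the usual set of image-pull permissions; check them against the troubleshooting page referenced above before relying on them.

```python
import json


def build_sagemaker_pull_policy():
    """Build an ECR repository policy letting SageMaker AI pull images.

    The Sid and action list are illustrative assumptions, not values
    taken from this page.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSageMakerToPull",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": [
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                    "ecr:BatchCheckLayerAvailability",
                ],
            }
        ],
    }


# Serialize to the JSON text form that ECR expects for a repository policy.
policy_text = json.dumps(build_sagemaker_pull_policy(), indent=2)
```

The resulting `policy_text` could be attached to a repository in the Amazon ECR console or passed as `policyText` to the boto3 ECR client's `set_repository_policy` call, together with the repository name.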
Create and Deploy an Inference Pipeline Endpoint
The following code creates and deploys a real-time inference pipeline model with SparkML and XGBoost models in series using the SageMaker AI SDK.
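from sagemaker.model import Model
from sagemaker.pipeline_model import PipelineModel
from sagemaker.sparkml.model import SparkMLModel

# s3_model_bucket, s3_model_key_prefix, training_image, role, timestamp_prefix,
# and the previously trained xgb_model estimator are assumed to be defined earlier.
sparkml_data = 's3://{}/{}/{}'.format(s3_model_bucket, s3_model_key_prefix, 'model.tar.gz')
sparkml_model = SparkMLModel(model_data=sparkml_data)

# Wrap the trained XGBoost artifacts in a deployable Model.
# Version 2 of the SageMaker Python SDK names this parameter image_uri (formerly image).
xgb_model = Model(model_data=xgb_model.model_data, image_uri=training_image)

model_name = 'serial-inference-' + timestamp_prefix
endpoint_name = 'serial-inference-ep-' + timestamp_prefix

# Containers run in the order listed: SparkML preprocessing first, then XGBoost.
sm_model = PipelineModel(name=model_name, role=role, models=[sparkml_model, xgb_model])
sm_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)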
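The deployment code uses `timestamp_prefix` without defining it. One plausible way to build such a value is sketched below. The helper `make_timestamp_prefix` and the timestamp format are assumptions, chosen so that the resulting names contain only letters, digits, and hyphens.

```python
from time import gmtime, strftime


def make_timestamp_prefix(now=None):
    """Return a UTC timestamp such as 2024-01-31-12-00-00 for unique names."""
    moment = gmtime() if now is None else now
    return strftime("%Y-%m-%d-%H-%M-%S", moment)


timestamp_prefix = make_timestamp_prefix()
model_name = "serial-inference-" + timestamp_prefix
endpoint_name = "serial-inference-ep-" + timestamp_prefix
```

Deriving both names from the same timestamp keeps the model and its endpoint easy to match up later.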
Request Real-Time Inference from an Inference Pipeline Endpoint
The following example shows how to make real-time predictions by calling an inference endpoint and passing a request payload in JSON format:
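import sagemaker
from sagemaker.deserializers import JSONDeserializer
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer

# Request payload: the input columns for the first container in the pipeline,
# plus the requested output format.
payload = {
    "input": [
        {"name": "Pclass", "type": "float", "val": "1.0"},
        {"name": "Embarked", "type": "string", "val": "Q"},
        {"name": "Age", "type": "double", "val": "48.0"},
        {"name": "Fare", "type": "double", "val": "100.67"},
        {"name": "SibSp", "type": "double", "val": "1.0"},
        {"name": "Sex", "type": "string", "val": "male"},
    ],
    "output": {"name": "features", "type": "double", "struct": "vector"},
}

# endpoint_name is the name of the deployed inference pipeline endpoint.
# JSONSerializer sends the payload as application/json, and JSONDeserializer
# requests and parses an application/json response.
predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker.Session(),
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

print(predictor.predict(payload))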
The response you get from predictor.predict(payload) is the model's inference result.
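The same request can also be sent through the low-level runtime API instead of a predictor object. The sketch below assumes that the endpoint accepts and returns application/json. The function name `invoke_pipeline_endpoint` is hypothetical, and the client is passed in so that the function can be exercised without AWS credentials.

```python
import json


def invoke_pipeline_endpoint(runtime_client, endpoint_name, payload):
    """Send a JSON payload to an endpoint and return the decoded response.

    `runtime_client` is expected to behave like
    boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: endpoint accepts JSON
        Accept="application/json",       # assumption: endpoint returns JSON
        Body=json.dumps(payload),
    )
    # The response body is a stream; read it fully and parse the JSON bytes.
    return json.loads(response["Body"].read())
```

In practice the client would be created with `boto3.client("sagemaker-runtime")` and the function called with the endpoint name used at deployment time.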
Real-Time Inference Pipeline Example
You can run this example notebook using the SKLearn predictor