Run Real-time Predictions with an Inference Pipeline
You can use trained models in an inference pipeline to make real-time predictions directly, without performing external preprocessing. When you configure the pipeline, you can choose to use the built-in feature transformers already available in Amazon SageMaker, or you can implement your own transformation logic using just a few lines of scikit-learn or Spark code. Spark ML models that you use in a pipeline must be serialized with the MLeap library.
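For a custom scikit-learn transformation, the logic lives in a short serving script. The following is a minimal sketch that follows the SageMaker scikit-learn container's serving-function convention (model_fn and predict_fn); it assumes a fitted transformer was saved as model.joblib during training:

import os
import joblib

def model_fn(model_dir):
    # Load the fitted scikit-learn transformer saved during training.
    return joblib.load(os.path.join(model_dir, 'model.joblib'))

def predict_fn(input_data, model):
    # Apply the transformation instead of predicting a label, so the
    # output can feed the next container in the pipeline.
    return model.transform(input_data)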
The containers in a pipeline listen on the port specified in the SAGEMAKER_BIND_TO_PORT environment variable (instead of 8080). When running in an inference pipeline, SageMaker automatically provides this environment variable to containers. If this environment variable isn't present, containers default to using port 8080. To indicate that your container complies with this requirement, use the following command to add a label to your Dockerfile:
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
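For example, a Python model server inside your container could resolve its port with the lookup-and-fallback behavior described above; a minimal sketch:

import os

# Bind to the port SageMaker assigns in a pipeline; fall back to 8080
# when the variable is absent (for example, on a single-model endpoint).
port = int(os.environ.get('SAGEMAKER_BIND_TO_PORT', '8080'))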
If your container needs to listen on a second port, choose a port in the range specified by the SAGEMAKER_SAFE_PORT_RANGE environment variable. Specify the value as an inclusive range in the format "XXXX-YYYY", where XXXX and YYYY are multi-digit integers. SageMaker provides this value automatically when you run the container in a multicontainer pipeline.
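For example, a container could parse the range and choose an additional port as follows; a minimal sketch, assuming the variable is set as described above:

import os

# Parse the inclusive "XXXX-YYYY" range that SageMaker provides and pick
# an extra port from it; any port in [low, high] is safe to bind.
low, high = (int(p) for p in os.environ['SAGEMAKER_SAFE_PORT_RANGE'].split('-'))
second_port = low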
Note
To use custom Docker images in a pipeline that includes SageMaker built-in algorithms, you need an Amazon Elastic Container Registry (Amazon ECR) policy. Your Amazon ECR repository must grant SageMaker permission to pull the image. For more information, see Troubleshoot Amazon ECR Permissions for Inference Pipelines.
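You can also attach such a policy programmatically with the boto3 Amazon ECR client. The following is a sketch, not the exact policy from the troubleshooting topic; the repository name is hypothetical:

import json
import boto3

# Repository policy that lets the SageMaker service pull the image.
policy = {
    'Version': '2008-10-17',
    'Statement': [{
        'Sid': 'AllowSageMakerToPull',
        'Effect': 'Allow',
        'Principal': {'Service': 'sagemaker.amazonaws.com'},
        'Action': [
            'ecr:GetDownloadUrlForLayer',
            'ecr:BatchGetImage',
            'ecr:BatchCheckLayerAvailability'
        ]
    }]
}

ecr = boto3.client('ecr')
ecr.set_repository_policy(
    repositoryName='my-custom-image',  # hypothetical repository name
    policyText=json.dumps(policy))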
Create and Deploy an Inference Pipeline Endpoint
The following code creates and deploys a real-time inference pipeline that runs a SparkML model and an XGBoost model in series, using the SageMaker Python SDK.
from sagemaker.model import Model
from sagemaker.pipeline_model import PipelineModel
from sagemaker.sparkml.model import SparkMLModel

# Location of the MLeap-serialized SparkML model artifacts in Amazon S3.
# s3_model_bucket, s3_model_key_prefix, training_image, role, and
# timestamp_prefix are defined earlier in the notebook.
sparkml_data = 's3://{}/{}/{}'.format(s3_model_bucket, s3_model_key_prefix, 'model.tar.gz')
sparkml_model = SparkMLModel(model_data=sparkml_data)

# Wrap the artifacts of the previously trained XGBoost estimator in a Model.
xgb_model = Model(model_data=xgb_model.model_data, image_uri=training_image)

model_name = 'serial-inference-' + timestamp_prefix
endpoint_name = 'serial-inference-ep-' + timestamp_prefix

# Chain the models so the SparkML output becomes the XGBoost input.
sm_model = PipelineModel(name=model_name, role=role, models=[sparkml_model, xgb_model])
sm_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)
Request Real-Time Inference from an Inference Pipeline Endpoint
The following example shows how to make real-time predictions by calling an inference endpoint and passing a request payload in JSON format:
import sagemaker
from sagemaker.deserializers import JSONDeserializer
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer

# Request payload in the schema that the SparkML container expects.
payload = {
    "input": [
        {"name": "Pclass", "type": "float", "val": "1.0"},
        {"name": "Embarked", "type": "string", "val": "Q"},
        {"name": "Age", "type": "double", "val": "48.0"},
        {"name": "Fare", "type": "double", "val": "100.67"},
        {"name": "SibSp", "type": "double", "val": "1.0"},
        {"name": "Sex", "type": "string", "val": "male"}
    ],
    "output": {"name": "features", "type": "double", "struct": "vector"}
}

predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker.Session(),
    serializer=JSONSerializer(),      # sends Content-Type: application/json
    deserializer=JSONDeserializer())  # requests Accept: application/json

print(predictor.predict(payload))
The response you get from predictor.predict(payload) is the model's inference result.
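If you prefer not to use the SageMaker Python SDK, you can make the equivalent request with the low-level boto3 runtime client; a minimal sketch, reusing endpoint_name and payload from the example above:

import json
import boto3

runtime = boto3.client('sagemaker-runtime')
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Accept='application/json',
    Body=json.dumps(payload))
print(response['Body'].read().decode('utf-8'))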
Real-Time Inference Pipeline Example
You can run this example notebook using the SKLearn predictor.