
Create a custom Docker container image for SageMaker and use it for model training in AWS Step Functions

Created by Julia Bluszcz (AWS), Neha Sharma (AWS), Aubrey Oosthuizen (AWS), Mohan Gowda Purushothama (AWS), and Mateusz Zaremba (AWS)

Environment: Production

Technologies: Machine learning & AI; DevOps

AWS services: Amazon ECR; Amazon SageMaker; AWS Step Functions

Summary

This pattern shows how to create a Docker container image for Amazon SageMaker and use it for model training in AWS Step Functions. By packaging custom algorithms in a container, you can run almost any code in the SageMaker environment, regardless of programming language, framework, or dependencies.

In the example SageMaker notebook provided, the custom Docker container image is stored in Amazon Elastic Container Registry (Amazon ECR). Step Functions then uses the container that's stored in Amazon ECR to run a Python processing script for SageMaker, and the container exports the trained model to Amazon Simple Storage Service (Amazon S3).

Prerequisites and limitations

Prerequisites

Product versions

  • AWS Step Functions Data Science SDK version 2.3.0

  • Amazon SageMaker Python SDK version 2.78.0

Architecture

The following diagram shows an example workflow for creating a Docker container image for SageMaker and then using it for model training in Step Functions:

Workflow for creating a Docker container image for SageMaker and using it for model training in Step Functions

The diagram shows the following workflow:

  1. A data scientist or DevOps engineer uses an Amazon SageMaker notebook to create a custom Docker container image.

  2. A data scientist or DevOps engineer stores the Docker container image in an Amazon ECR private repository that’s in a private registry.

  3. A data scientist or DevOps engineer uses the Docker container to run a Python SageMaker processing job in a Step Functions workflow.

Automation and scale

The example SageMaker notebook in this pattern uses an ml.m5.xlarge notebook instance type. You can change the instance type to fit your use case. For more information about SageMaker notebook instance types, see Amazon SageMaker Pricing.
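If you provision the notebook instance programmatically instead of through the console, the instance type is set with the InstanceType parameter. The following is a minimal sketch that uses boto3; the notebook instance name and IAM role ARN are placeholders that you would replace with your own values.

import boto3

sagemaker_client = boto3.client("sagemaker")

# Placeholder name and role ARN; replace with your own values
sagemaker_client.create_notebook_instance(
    NotebookInstanceName="byoc-example-notebook",
    InstanceType="ml.m5.xlarge",  # Change the instance type to fit your use case
    RoleArn="arn:aws:iam::111122223333:role/MySageMakerExecutionRole",
)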

Tools

  • Amazon Elastic Container Registry (Amazon ECR) is a managed container image registry service that’s secure, scalable, and reliable.

  • Amazon SageMaker is a managed machine learning (ML) service that helps you build and train ML models and then deploy them into a production-ready hosted environment.

  • Amazon SageMaker Python SDK is an open source library for training and deploying machine-learning models on SageMaker.

  • AWS Step Functions is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications.

  • AWS Step Functions Data Science Python SDK is an open source library that helps you create Step Functions workflows that process and publish machine learning models.

Epics

Task | Description | Skills required

Set up Amazon ECR and create a new private registry.

If you haven’t already, set up Amazon ECR by following the instructions in Setting up with Amazon ECR in the Amazon ECR User Guide. Each AWS account is provided with a default private Amazon ECR registry.
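For example, you can confirm that the AWS CLI is configured for your account from a notebook cell or shell; the account ID that's returned is also the ID of your default private registry.

# Confirm AWS CLI access; the account ID is also your default private registry ID
!aws sts get-caller-identity --query Account --output text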

DevOps engineer

Create an Amazon ECR private repository.

Follow the instructions in Creating a private repository in the Amazon ECR User Guide.

Note: The repository that you create is where you’ll store your custom Docker container images.
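For example, you can create the repository from a notebook cell or the AWS CLI. The repository name byoc below matches the ecr_repository value that's used later in this pattern; adjust it if you use a different name.

# Create a private repository in your default private registry
!aws ecr create-repository --repository-name byoc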

DevOps engineer

Create a Dockerfile that includes the specifications needed to run your SageMaker processing job.

Configure a Dockerfile with the specifications needed to run your SageMaker processing job. For instructions, see Adapting your own training container in the Amazon SageMaker Developer Guide.

For more information about Dockerfiles, see the Dockerfile Reference in the Docker documentation.

Example Jupyter notebook code cells to create a Dockerfile

Cell 1

# Make docker folder
!mkdir -p docker

Cell 2

%%writefile docker/Dockerfile

FROM python:3.7-slim-buster

RUN pip3 install pandas==0.25.3 scikit-learn==0.21.3

ENV PYTHONUNBUFFERED=TRUE

ENTRYPOINT ["python3"]

DevOps engineer

Build your Docker container image and push it to Amazon ECR.

  1. Build the container image from the Dockerfile that you created by running the docker build command.

  2. Push the container image to Amazon ECR by running the docker push command.

For more information, see Building and registering the container in Building your own algorithm container on GitHub.

Example Jupyter notebook code cells to build and register a Docker image

Important: Before running the following cells, make sure that you’ve created a Dockerfile and stored it in the directory called docker. Also, make sure that you’ve created an Amazon ECR repository, and that you replace the ecr_repository value in the first cell with your repository’s name.

Cell 1

import boto3

tag = ':latest'
account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.Session().region_name
ecr_repository = 'byoc'
image_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)

Cell 2

# Build docker image
!docker build -t $image_uri docker

Cell 3

# Authenticate to ECR
!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com

Cell 4

# Push docker image
!docker push $image_uri

Note: You must authenticate your Docker client to your private registry so that you can use the docker push and docker pull commands. These commands push and pull images to and from the repositories in your registry.

DevOps engineer
Task | Description | Skills required

Create a Python script that includes your custom processing and model training logic.

Write custom processing logic to run in your data processing script. Then, save it as a Python script named training.py.

For more information, see Bring your own model with SageMaker Script Mode on GitHub.

Example Python script that includes custom processing and model training logic

%%writefile training.py

from numpy import empty
import pandas as pd
import os

from sklearn import datasets, svm
from joblib import dump, load

if __name__ == '__main__':
    digits = datasets.load_digits()

    # Create the classifier object
    clf = svm.SVC(gamma=0.001, C=100.)

    # Fit the model
    clf.fit(digits.data[:-1], digits.target[:-1])

    # Write the model output in binary format
    output_path = os.path.join('/opt/ml/processing/model', "model.joblib")
    dump(clf, output_path)

Data scientist

Create a Step Functions workflow that includes your SageMaker Processing job as one of the steps.

Install and import the AWS Step Functions Data Science SDK and upload the training.py file to Amazon S3. Then, use the Amazon SageMaker Python SDK to define a processing step in Step Functions.

Important: Make sure that you’ve created an IAM execution role for Step Functions in your AWS account.
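If you want to confirm that the execution role exists before you build the workflow, a quick check such as the following can help. It assumes the role name that's used later in this pattern and credentials that allow the iam:GetRole action.

import boto3

iam = boto3.client("iam")

# Role name assumed by this pattern; replace it with the execution role that you created for Step Functions
response = iam.get_role(RoleName="AmazonSageMaker-StepFunctionsWorkflowExecutionRole")
print(response["Role"]["Arn"])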

Example environment set up and custom training script to upload to Amazon S3

!pip install stepfunctions

import boto3
import stepfunctions
import sagemaker
import datetime

from stepfunctions import steps
from stepfunctions.inputs import ExecutionInput
from stepfunctions.steps import Chain
from stepfunctions.workflow import Workflow
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
role = sagemaker.get_execution_role()
prefix = 'byoc-training-model'

# See the prerequisites section to create this role
workflow_execution_role = f"arn:aws:iam::{account_id}:role/AmazonSageMaker-StepFunctionsWorkflowExecutionRole"

execution_input = ExecutionInput(
    schema={
        "PreprocessingJobName": str,
    }
)

input_code = sagemaker_session.upload_data(
    "training.py",
    bucket=bucket,
    key_prefix="preprocessing.py",
)

Example SageMaker processing step definition that uses a custom Amazon ECR image and Python script

Note: Make sure that you use the execution_input parameter to specify the job name. The parameter's value must be unique each time the job runs. Also, the training.py file is passed to the ProcessingStep as a ProcessingInput, which means that it's copied into the container. The ProcessingInput destination must match the path that's given as the second argument in container_entrypoint.

script_processor = ScriptProcessor(
    command=['python3'],
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
)

processing_step = steps.ProcessingStep(
    "training-step",
    processor=script_processor,
    job_name=execution_input["PreprocessingJobName"],
    inputs=[
        ProcessingInput(
            source=input_code,
            destination="/opt/ml/processing/input/code",
            input_name="code",
        ),
    ],
    outputs=[
        ProcessingOutput(
            source='/opt/ml/processing/model',
            destination="s3://{}/{}".format(bucket, prefix),
            output_name='byoc-example',
        ),
    ],
    container_entrypoint=["python3", "/opt/ml/processing/input/code/training.py"],
)

Example Step Functions workflow that runs a SageMaker processing job

Note: This example workflow includes the SageMaker processing job step only, not a complete Step Functions workflow. For a full example workflow, see Example notebooks in SageMaker in the AWS Step Functions Data Science SDK documentation.

workflow_graph = Chain([processing_step])

workflow = Workflow(
    name="ProcessingWorkflow",
    definition=workflow_graph,
    role=workflow_execution_role
)

workflow.create()

# Execute the workflow
execution = workflow.execute(
    inputs={
        # Each SageMaker processing job requires a unique name
        "PreprocessingJobName": str(datetime.datetime.now().strftime("%Y%m%d%H%M-%SS")),
    }
)

execution_output = execution.get_output(wait=True)
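If you run this code in a Jupyter notebook, the AWS Step Functions Data Science SDK can also render the workflow graph and track execution progress inline. The following optional sketch builds on the workflow and execution objects defined above.

# Visualize the state machine definition in the notebook
workflow.render_graph()

# Track the running execution and list its event history
execution.render_progress()
events = execution.list_events()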
Data scientist

Related resources