Use SageMaker AI components
In this tutorial, you run a pipeline that uses SageMaker AI Components for Kubeflow Pipelines to train a classification model with the K-Means algorithm on the MNIST dataset in SageMaker AI. The workflow uses Kubeflow Pipelines as the orchestrator and SageMaker AI to execute each step of the workflow. The example was adapted from an existing SageMaker AI example.
You can define your pipeline in Python using the AWS SDK for Python (Boto3), then use the KFP dashboard, KFP CLI, or Boto3 to compile, deploy, and run your workflows. The full code for the MNIST classification pipeline example is available in the Kubeflow GitHub repository, where you can also find additional SageMaker AI Kubeflow Pipelines examples.
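To give a sense of what such a pipeline definition looks like, the following is a minimal one-step sketch, not the full MNIST example: it loads the SageMaker Training component from the Kubeflow Pipelines repository and compiles a pipeline around it. The component URL, image URI, bucket, role, and hyperparameter values are illustrative placeholders; the parameter names follow the component's declared inputs.

import json
import kfp
from kfp import components, dsl

# Load the SageMaker Training component published in the Kubeflow Pipelines
# repository (URL assumed; pin a released tag in practice).
sagemaker_train_op = components.load_component_from_url(
    "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
    "components/aws/sagemaker/train/component.yaml"
)

S3_BUCKET = "<bucket-name>"          # placeholder: your Amazon S3 bucket
ROLE_ARN = "<your-role-arn>"         # placeholder: the execution role created below
KMEANS_IMAGE = "<kmeans-image-uri>"  # placeholder: built-in K-Means image for your region

@dsl.pipeline(name="mnist-kmeans-sketch", description="One-step training sketch")
def mnist_pipeline():
    # A single SageMaker training step; the full example adds model
    # creation, deployment, and batch transform steps.
    sagemaker_train_op(
        region="us-east-1",
        image=KMEANS_IMAGE,
        hyperparameters=json.dumps({"k": "10", "feature_dim": "784"}),
        channels=json.dumps([{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3Uri": f"s3://{S3_BUCKET}/mnist_kmeans_example/data",
                "S3DataType": "S3Prefix",
            }},
            "InputMode": "File",
        }]),
        instance_type="ml.m5.xlarge",
        model_artifact_path=f"s3://{S3_BUCKET}/mnist_kmeans_example/output",
        role=ROLE_ARN,
    )

if __name__ == "__main__":
    # Produces the tar.gz intermediate representation described later.
    kfp.compiler.Compiler().compile(mnist_pipeline, "mnist-pipeline.tar.gz")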
To run the classification pipeline example, create a SageMaker AI IAM execution role that grants your training job permission to access AWS resources, then continue with the steps that correspond to your deployment option.
Create a SageMaker AI execution role
The kfp-example-sagemaker-execution-role IAM role is a runtime role assumed by SageMaker AI jobs to access AWS resources. The following commands create an IAM execution role named kfp-example-sagemaker-execution-role, attach two managed policies (AmazonSageMakerFullAccess and AmazonS3FullAccess), and create a trust relationship with SageMaker AI to grant SageMaker AI jobs access to those AWS resources. You provide this role as an input parameter when running the pipeline.

Run the following commands to create the role. Note the ARN that is returned in your output.
SAGEMAKER_EXECUTION_ROLE_NAME=kfp-example-sagemaker-execution-role

TRUST="{ \"Version\": \"2012-10-17\", \"Statement\": [ { \"Effect\": \"Allow\", \"Principal\": { \"Service\": \"sagemaker.amazonaws.com\" }, \"Action\": \"sts:AssumeRole\" } ] }"

aws iam create-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --assume-role-policy-document "$TRUST"
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam get-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --output text --query 'Role.Arn'
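If you prefer to work from Python, the same role can be created with Boto3; the following is a minimal sketch equivalent to the CLI commands above.

# Minimal Boto3 equivalent of the CLI commands above.
import json
import boto3

iam = boto3.client("iam")
role_name = "kfp-example-sagemaker-execution-role"

# Trust policy allowing SageMaker jobs to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

role = iam.create_role(
    RoleName=role_name,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the two managed policies.
for policy_arn in (
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
):
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)

print(role["Role"]["Arn"])  # note this ARN for the pipeline run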
Then follow the instructions of the SageMaker Training Pipeline tutorial for MNIST Classification with K-Means.
Prepare datasets
To run the pipelines, you need to upload the data extraction pre-processing script to an Amazon S3 bucket. This bucket and all resources for this example must be located in the us-east-1 region. For information on creating a bucket, see Creating a bucket.
From the mnist-kmeans-sagemaker folder of the Kubeflow repository you cloned on your gateway node, run the following command to upload the kmeans_preprocessing.py file to your Amazon S3 bucket. Change <bucket-name> to the name of your Amazon S3 bucket.

aws s3 cp mnist-kmeans-sagemaker/kmeans_preprocessing.py s3://<bucket-name>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
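Alternatively, here is a minimal Boto3 sketch of the same upload, assuming your bucket already exists:

# Boto3 alternative to the aws s3 cp command above; replace bucket_name
# with the name of your Amazon S3 bucket.
import boto3

bucket_name = "<bucket-name>"
s3 = boto3.client("s3", region_name="us-east-1")
s3.upload_file(
    "mnist-kmeans-sagemaker/kmeans_preprocessing.py",
    bucket_name,
    "mnist_kmeans_example/processing_code/kmeans_preprocessing.py",
)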
Compile and deploy your pipeline
After defining the pipeline, you must compile it to an intermediate representation before you submit it to the Kubeflow Pipelines service on your cluster. The intermediate representation is a workflow specification in the form of a YAML file compressed into a tar.gz file. You need the KFP SDK to compile your pipeline.
Install KFP SDK
Run the following from the command line of your gateway node:
1. Install the KFP SDK following the instructions in the Kubeflow Pipelines documentation.

2. Verify that the KFP SDK is installed with the following command:

pip show kfp

3. Verify that dsl-compile has been installed correctly as follows:

which dsl-compile
Compile your pipeline
You have three options to interact with Kubeflow Pipelines: KFP UI, KFP CLI, or the KFP SDK. The following sections illustrate the workflow using the KFP UI and CLI.
Complete the following steps from your gateway node.
1. Modify your Python file with your Amazon S3 bucket name and IAM role ARN.

2. Use the dsl-compile command from the command line to compile your pipeline as follows. Replace <path-to-python-file> with the path to your pipeline and <path-to-output> with the location where you want your tar.gz file to be.

dsl-compile --py <path-to-python-file> --output <path-to-output>
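If you would rather compile from Python than from the dsl-compile CLI, the KFP SDK exposes the same compiler. Here is a minimal sketch, assuming your pipeline function is importable (the module name below is hypothetical):

# Programmatic alternative to dsl-compile, using the KFP SDK compiler.
import kfp

# Hypothetical module name; import your own pipeline function instead.
from my_pipeline_file import mnist_pipeline

# Produces the same tar.gz intermediate representation as dsl-compile.
kfp.compiler.Compiler().compile(mnist_pipeline, "sm-pipeline.tar.gz")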
Upload and run the pipeline using the KFP CLI
Complete the following steps from the command line of your gateway node. KFP organizes runs of your pipeline as experiments. You have the option to specify an experiment name. If you do not specify one, the run will be listed under Default experiment.
1. Upload your pipeline as follows:

kfp pipeline upload --pipeline-name <pipeline-name> <path-to-output-tar.gz>

Your output should look like the following. Take note of the pipeline ID.

Pipeline 29c3ff21-49f5-4dfe-94f6-618c0e2420fe has been submitted

Pipeline Details
------------------
ID           29c3ff21-49f5-4dfe-94f6-618c0e2420fe
Name         sm-pipeline
Description
Uploaded at  2020-04-30T20:22:39+00:00
...
...
2. Create a run using the following command. The KFP CLI run command currently does not support specifying input parameters while creating the run. You need to update your parameters in the AWS SDK for Python (Boto3) pipeline file before compiling. Replace <experiment-name> and <job-name> with any names. Replace <pipeline-id> with the ID of your submitted pipeline. Replace <your-role-arn> with the ARN of kfp-example-sagemaker-execution-role. Replace <your-bucket-name> with the name of the Amazon S3 bucket you created.

kfp run submit --experiment-name <experiment-name> --run-name <job-name> --pipeline-id <pipeline-id> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"

You can also directly submit a run using the compiled pipeline package created as the output of the dsl-compile command.

kfp run submit --experiment-name <experiment-name> --run-name <job-name> --package-file <path-to-output> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"

Your output should look like the following:

Creating experiment aws.
Run 95084a2c-f18d-4b77-a9da-eba00bf01e63 is submitted
+--------------------------------------+--------+----------+---------------------------+
| run id                               | name   | status   | created at                |
+======================================+========+==========+===========================+
| 95084a2c-f18d-4b77-a9da-eba00bf01e63 | sm-job |          | 2020-04-30T20:36:41+00:00 |
+--------------------------------------+--------+----------+---------------------------+
3. Navigate to the UI to check the progress of the job.
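The KFP SDK also offers a programmatic alternative to the CLI commands above. The following is a minimal sketch, with the host and all parameter values as placeholders:

# Programmatic alternative to the kfp CLI commands above (KFP v1 SDK).
import kfp

# Connects to the in-cluster Pipelines service by default; pass
# host="<kfp-endpoint>" if you run this outside the cluster.
client = kfp.Client()

run = client.create_run_from_pipeline_package(
    "<path-to-output-tar.gz>",
    arguments={
        "role_arn": "<your-role-arn>",
        "bucket_name": "<your-bucket-name>",
    },
    run_name="<job-name>",
    experiment_name="<experiment-name>",
)
print(run.run_id)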
Upload and run the pipeline using the KFP UI
1. On the left panel, choose the Pipelines tab.

2. In the upper-right corner, choose +UploadPipeline.

3. Enter the pipeline name and description.

4. Choose Upload a file and enter the path to the tar.gz file you created using the CLI or with AWS SDK for Python (Boto3).

5. On the left panel, choose the Pipelines tab.

6. Find the pipeline you created.

7. Choose +CreateRun.

8. Enter your input parameters.

9. Choose Run.
Run predictions
Once your classification pipeline is deployed, you can run classification predictions against the endpoint that was created by the Deploy component. Use the KFP UI to check the output artifacts for sagemaker-deploy-model-endpoint_name. Download the .tgz file to extract the endpoint name, or check the SageMaker AI console in the region you used.
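If you prefer a programmatic lookup, here is a short Boto3 sketch that lists the most recently created endpoints (region assumed to be us-east-1 for this example):

# List recently created SageMaker endpoints to help find the one the
# Deploy component created.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")
response = sm.list_endpoints(SortBy="CreationTime", SortOrder="Descending")
for endpoint in response["Endpoints"]:
    print(endpoint["EndpointName"], endpoint["EndpointStatus"])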
Configure permissions to run predictions
If you want to run predictions from your gateway node, skip this section. To use any other machine to run predictions, assign the sagemaker:InvokeEndpoint permission to the IAM role used by the client machine.
1. On your gateway node, run the following to create an IAM policy file:

cat <<EoF > ./sagemaker-invoke.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:InvokeEndpoint"
            ],
            "Resource": "*"
        }
    ]
}
EoF
2. Attach the policy to the IAM role of the client node. Run the following command. Replace <your-instance-IAM-role> with the name of the IAM role. Replace <path-to-sagemaker-invoke-json> with the path to the policy file you created.

aws iam put-role-policy --role-name <your-instance-IAM-role> --policy-name sagemaker-invoke-for-worker --policy-document file://<path-to-sagemaker-invoke-json>
Run predictions
1. Create an AWS SDK for Python (Boto3) file from your client machine named mnist-predictions.py with the following content. Replace the ENDPOINT_NAME variable. The script loads the MNIST dataset, creates a CSV from those digits, then sends the CSV to the endpoint for prediction and prints the results.

import boto3
import gzip
import io
import json
import numpy
import pickle

ENDPOINT_NAME = '<endpoint-name>'
region = boto3.Session().region_name

# S3 bucket where the original MNIST data is downloaded and stored
downloaded_data_bucket = f"jumpstart-cache-prod-{region}"
downloaded_data_prefix = "1p-notebooks-datasets/mnist"

# Download the dataset
s3 = boto3.client("s3")
s3.download_file(downloaded_data_bucket, f"{downloaded_data_prefix}/mnist.pkl.gz", "mnist.pkl.gz")

# Load the dataset
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

# Simple function to create a CSV from our numpy array
def np2csv(arr):
    csv = io.BytesIO()
    numpy.savetxt(csv, arr, delimiter=',', fmt='%g')
    return csv.getvalue().decode().rstrip()

runtime = boto3.Session(region_name=region).client('sagemaker-runtime')

payload = np2csv(train_set[0][30:31])
response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=payload)
result = json.loads(response['Body'].read().decode())
print(result)

2. Run the AWS SDK for Python (Boto3) file as follows:

python mnist-predictions.py
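With the built-in K-Means algorithm, the printed result is typically a JSON document of the form {"predictions": [{"closest_cluster": ..., "distance_to_cluster": ...}]}; the exact fields depend on the algorithm version you deployed.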
View results and logs
When the pipeline is running, you can choose any component to check execution details, such as inputs and outputs. These details include the names of created resources.

If the KFP request is successfully processed and a SageMaker AI job is created, the component logs in the KFP UI provide a link to the job created in SageMaker AI. The CloudWatch logs are also provided if the job is successfully created.
If you run too many pipeline jobs on the same cluster, you may see an error message that indicates that you do not have enough pods available. To fix this, log in to your gateway node and delete the pods created by the pipelines you are not using:
kubectl get pods -n kubeflow
kubectl delete pods -n kubeflow <name-of-pipeline-pod>
Cleanup
When you're finished with your pipeline, you need to clean up your resources.
1. From the KFP dashboard, terminate your pipeline runs if they do not exit properly by choosing Terminate.

2. If the Terminate option doesn't work, log in to your gateway node and manually terminate all the pods created by your pipeline run as follows:

kubectl get pods -n kubeflow
kubectl delete pods -n kubeflow <name-of-pipeline-pod>

3. Using your AWS account, log in to the SageMaker AI service. Manually stop all training, batch transform, and HPO jobs. Delete models, data buckets, and endpoints to avoid incurring any additional costs. Terminating the pipeline runs does not stop the jobs in SageMaker AI.