Tune the hyperparameters of a machine learning model in SageMaker - AWS Step Functions

Tune the hyperparameters of a machine learning model in SageMaker

This sample project demonstrates using SageMaker to tune the hyperparameters of a machine learning model, and to batch transform a test dataset.

In this project, Step Functions uses a Lambda function to seed an Amazon S3 bucket with a test dataset. It then creates a hyperparameter tuning job using the SageMaker service integration. It then uses a Lambda function to extract the data path, saves the tuning model, extracts the model name, and then runs a batch transform job to perform inference in SageMaker.

For more information about SageMaker and Step Functions service integrations, see the following:

Note

This sample project may incur charges.

For new AWS users, a free usage tier is available. On this tier, services are free below a certain level of usage. For more information about AWS costs and the Free Tier, see SageMaker Pricing.

Step 1: Create the state machine

  1. Open the Step Functions console and choose Create state machine.

  2. Type Tune a machine learning model in the search box, and then choose Tune a machine learning model from the search results that are returned.

  3. Choose Next to continue.

  4. Choose Run a demo to create a read-only and ready-to-deploy workflow, or choose Build on it to create an editable state machine definition that you can build on and later deploy.

    This sample project deploys the following resources:

    • Three AWS Lambda functions

    • An Amazon Simple Storage Service (Amazon S3) bucket

    • An AWS Step Functions state machine

    • Related AWS Identity and Access Management (IAM) roles

    The following image shows the workflow graph for the Tune a machine learning model sample project:

    Workflow graph of the Tune a machine learning model sample project.
  5. Choose Use template to continue with your selection.

Next steps depend on your previous choice:

  1. Run a demo – You can review the state machine before you create a read-only project with resources deployed by AWS CloudFormation to your AWS account.

    You can view the state machine definition, and when you are ready, choose Deploy and run to deploy the project and create the resources.

    Deploying can take up to 10 minutes to create resources and permissions. You can use the Stack ID link to monitor progress in AWS CloudFormation.

    After deploy completes, you should see your new state machine in the console.

  2. Build on it – You can review and edit the workflow definition. You might need to set values for placeholders in the sample project before attemping to run your custom workflow.

Note

Standard charges might apply for services deployed to your account.

Step 2: Run the state machine

  1. On the State machines page, choose your sample project.

  2. On the sample project page, choose Start execution.

  3. In the Start execution dialog box, do the following:

    1. (Optional) Enter a custom execution name to override the generated default.

      Non-ASCII names and logging

      Step Functions accepts names for state machines, executions, activities, and labels that contain non-ASCII characters. Because such characters will not work with Amazon CloudWatch, we recommend using only ASCII characters so you can track metrics in CloudWatch.

    2. (Optional) In the Input box, enter input values as JSON. You can skip this step if you are running a demo.

    3. Choose Start execution.

    The Step Functions console will direct you to an Execution Details page where you can choose states in the Graph view to explore related information in the Step details pane.

Example State Machine Code

The state machine in this sample project integrates with SageMaker and AWS Lambda by passing parameters directly to those resources, and uses an Amazon S3 bucket for the training data source and output.

Browse through this example state machine to see how Step Functions controls Lambda and SageMaker.

For more information about how AWS Step Functions can control other AWS services, see Integrating services with Step Functions.

{ "StartAt": "Generate Training Dataset", "States": { "Generate Training Dataset": { "Resource": "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageMa-LambdaForDataGeneration-1TF67BUE5A12U", "Type": "Task", "Next": "HyperparameterTuning (XGBoost)" }, "HyperparameterTuning (XGBoost)": { "Resource": "arn:aws:states:::sagemaker:createHyperParameterTuningJob.sync", "Parameters": { "HyperParameterTuningJobName.$": "$.body.jobName", "HyperParameterTuningJobConfig": { "Strategy": "Bayesian", "HyperParameterTuningJobObjective": { "Type": "Minimize", "MetricName": "validation:rmse" }, "ResourceLimits": { "MaxNumberOfTrainingJobs": 2, "MaxParallelTrainingJobs": 2 }, "ParameterRanges": { "ContinuousParameterRanges": [{ "Name": "alpha", "MinValue": "0", "MaxValue": "1000", "ScalingType": "Auto" }, { "Name": "gamma", "MinValue": "0", "MaxValue": "5", "ScalingType": "Auto" } ], "IntegerParameterRanges": [{ "Name": "max_delta_step", "MinValue": "0", "MaxValue": "10", "ScalingType": "Auto" }, { "Name": "max_depth", "MinValue": "0", "MaxValue": "10", "ScalingType": "Auto" } ] } }, "TrainingJobDefinition": { "AlgorithmSpecification": { "TrainingImage": "433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest", "TrainingInputMode": "File" }, "OutputDataConfig": { "S3OutputPath": "s3://amzn-s3-demo-bucket/models" }, "StoppingCondition": { "MaxRuntimeInSeconds": 86400 }, "ResourceConfig": { "InstanceCount": 1, "InstanceType": "ml.m4.xlarge", "VolumeSizeInGB": 30 }, "RoleArn": "arn:aws:iam::012345678912:role/StepFunctionsSample-SageM-SageMakerAPIExecutionRol-1MNH1VS5CGGOG", "InputDataConfig": [{ "DataSource": { "S3DataSource": { "S3DataDistributionType": "FullyReplicated", "S3DataType": "S3Prefix", "S3Uri": "s3://amzn-s3-demo-bucket/csv/train.csv" } }, "ChannelName": "train", "ContentType": "text/csv" }, { "DataSource": { "S3DataSource": { "S3DataDistributionType": "FullyReplicated", "S3DataType": "S3Prefix", "S3Uri": "s3://amzn-s3-demo-bucket/csv/validation.csv" } }, "ChannelName": "validation", "ContentType": "text/csv" }], "StaticHyperParameters": { "precision_dtype": "float32", "num_round": "2" } } }, "Type": "Task", "Next": "Extract Model Path" }, "Extract Model Path": { "Resource": "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelPath-V0R37CVARUS9", "Type": "Task", "Next": "HyperparameterTuning - Save Model" }, "HyperparameterTuning - Save Model": { "Parameters": { "PrimaryContainer": { "Image": "433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest", "Environment": {}, "ModelDataUrl.$": "$.body.modelDataUrl" }, "ExecutionRoleArn": "arn:aws:iam::012345678912:role/StepFunctionsSample-SageM-SageMakerAPIExecutionRol-1MNH1VS5CGGOG", "ModelName.$": "$.body.bestTrainingJobName" }, "Resource": "arn:aws:states:::sagemaker:createModel", "Type": "Task", "Next": "Extract Model Name" }, "Extract Model Name": { "Resource": "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelName-8FUOB30SM5EM", "Type": "Task", "Next": "Batch transform" }, "Batch transform": { "Type": "Task", "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync", "Parameters": { "ModelName.$": "$.body.jobName", "TransformInput": { "CompressionType": "None", "ContentType": "text/csv", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://amzn-s3-demo-bucket/csv/test.csv" } } }, "TransformOutput": { "S3OutputPath": "s3://amzn-s3-demo-bucket/output" }, "TransformResources": { "InstanceCount": 1, "InstanceType": "ml.m4.xlarge" }, "TransformJobName.$": "$.body.jobName" }, "End": true } } }

For information about how to configure IAM when using Step Functions with other AWS services, see How Step Functions generates IAM policies for integrated services.

IAM Examples

These example AWS Identity and Access Management (IAM) policies generated by the sample project include the least privilege necessary to execute the state machine and related resources. We recommend that you include only those permissions that are necessary in your IAM policies.

The following IAM policy is attached to the state machine, and allows the state machine execution to access necessary SageMaker, Lambda, and Amazon S3 resources.

{ "Version": "2012-10-17", "Statement": [ { "Action": [ "sagemaker:CreateHyperParameterTuningJob", "sagemaker:DescribeHyperParameterTuningJob", "sagemaker:StopHyperParameterTuningJob", "sagemaker:ListTags", "sagemaker:CreateModel", "sagemaker:CreateTransformJob", "iam:PassRole" ], "Resource": "*", "Effect": "Allow" }, { "Action": [ "lambda:InvokeFunction" ], "Resource": [ "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageMa-LambdaForDataGeneration-1TF67BUE5A12U", "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelPath-V0R37CVARUS9", "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelName-8FUOB30SM5EM" ], "Effect": "Allow" }, { "Action": [ "events:PutTargets", "events:PutRule", "events:DescribeRule" ], "Resource": [ "arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTrainingJobsRule", "arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTransformJobsRule", "arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTuningJobsRule" ], "Effect": "Allow" } ] }

The following IAM policy is referenced in the TrainingJobDefinition and HyperparameterTuning fields of the HyperparameterTuning state.

{ "Version": "2012-10-17", "Statement": [ { "Action": [ "cloudwatch:PutMetricData", "logs:CreateLogStream", "logs:PutLogEvents", "logs:CreateLogGroup", "logs:DescribeLogStreams", "ecr:GetAuthorizationToken", "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage", "sagemaker:DescribeHyperParameterTuningJob", "sagemaker:StopHyperParameterTuningJob", "sagemaker:ListTags" ], "Resource": "*", "Effect": "Allow" }, { "Action": [ "s3:GetObject", "s3:PutObject" ], "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*", "Effect": "Allow" }, { "Action": [ "s3:ListBucket" ], "Resource": "arn:aws:s3:::amzn-s3-demo-bucket", "Effect": "Allow" } ] }

The following IAM policy allows the Lambda function to seed the Amazon S3 bucket with sample data.

{ "Version": "2012-10-17", "Statement": [ { "Action": [ "s3:PutObject" ], "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*", "Effect": "Allow" } ] }

For information about how to configure IAM when using Step Functions with other AWS services, see How Step Functions generates IAM policies for integrated services.