CreateTrainingPlan - Amazon SageMaker

CreateTrainingPlan

Creates a new training plan in SageMaker to reserve compute capacity.

Amazon SageMaker Training Plan is a capability within SageMaker that allows customers to reserve and manage GPU capacity for large-scale AI model training. It provides a way to secure predictable access to computational resources within specific timelines and budgets, without the need to manage underlying infrastructure.

How it works

Plans can be created for specific target resources, such as SageMaker Training Jobs or SageMaker HyperPod clusters. SageMaker then automatically provisions the resources, sets up the infrastructure, runs the workloads, and handles infrastructure failures.

Plan creation workflow

  • Users search for available plan offerings based on their requirements (e.g., instance type, count, start time, duration) using the SearchTrainingPlanOfferings API operation.

  • They create a plan by calling CreateTrainingPlan with the ID of the plan offering that best matches their needs.

  • After successful upfront payment, the plan's status becomes Scheduled.

  • The plan can be used to:

    • Queue training jobs.

    • Allocate to an instance group of a SageMaker HyperPod cluster.

  • When the plan start date arrives, it becomes Active. Based on available reserved capacity:

    • Training jobs are launched.

    • Instance groups are provisioned.
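The search-then-create workflow above can be sketched in Python. This is a minimal illustration, not an official sample. It assumes a client object shaped like boto3's SageMaker client (`boto3.client("sagemaker")`). The search parameter names (`InstanceType`, `InstanceCount`, `DurationHours`, `TargetResources`), the `"training-job"` target value, and the `TrainingPlanOfferings` response key are assumptions to verify against the SearchTrainingPlanOfferings reference. The helper name `reserve_capacity` is hypothetical. Only `TrainingPlanName`, `TrainingPlanOfferingId`, `Tags`, and `TrainingPlanArn` come from this page.

```python
from typing import Any, Optional


def reserve_capacity(
    client: Any,
    plan_name: str,
    instance_type: str,
    instance_count: int,
    duration_hours: int,
    target_resource: str = "training-job",  # assumed enum value
    tags: Optional[list[dict[str, str]]] = None,
) -> str:
    """Hypothetical helper: search for offerings, then create a plan.

    `client` is expected to behave like boto3.client("sagemaker").
    The search parameter names below are assumptions, not taken from
    the CreateTrainingPlan reference.
    """
    # Step 1: find plan offerings that match the capacity requirements.
    search = client.search_training_plan_offerings(
        InstanceType=instance_type,
        InstanceCount=instance_count,
        DurationHours=duration_hours,
        TargetResources=[target_resource],
    )
    offerings = search.get("TrainingPlanOfferings", [])
    if not offerings:
        raise LookupError("no training plan offering matches the request")

    # Step 2: create the plan from the chosen offering's ID.
    # Taking the first offering is a simplification; real code might
    # compare fees or start times before choosing.
    request: dict[str, Any] = {
        "TrainingPlanName": plan_name,
        "TrainingPlanOfferingId": offerings[0]["TrainingPlanOfferingId"],
    }
    if tags:
        request["Tags"] = tags
    response = client.create_training_plan(**request)
    return response["TrainingPlanArn"]
```

Passing the client in as a parameter keeps the sketch testable with a stub. With credentials and a Region configured, a call might look like `reserve_capacity(boto3.client("sagemaker"), "my-plan", "ml.p5.48xlarge", 1, 24)`.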

Plan composition

A plan can consist of one or more Reserved Capacities, each defined by a specific instance type, quantity, Availability Zone, duration, and start and end times. For more information about Reserved Capacity, see ReservedCapacitySummary.
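To make the composition concrete, the following hypothetical Python model represents a plan as one or more reserved capacities and totals the instance-hours each one reserves (instance count multiplied by duration). The class and field names are illustrative only; they are not the actual fields of ReservedCapacitySummary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReservedCapacity:
    """One block of reserved instances (illustrative fields only)."""

    instance_type: str
    instance_count: int
    availability_zone: str
    start_time: datetime
    end_time: datetime

    def __post_init__(self) -> None:
        if self.instance_count < 1:
            raise ValueError("instance_count must be positive")
        if self.end_time <= self.start_time:
            raise ValueError("end_time must be after start_time")

    @property
    def duration(self) -> timedelta:
        # The duration is implied by the start and end times.
        return self.end_time - self.start_time

    def instance_hours(self) -> float:
        return self.instance_count * self.duration.total_seconds() / 3600


@dataclass(frozen=True)
class TrainingPlan:
    """A plan made of one or more reserved capacities (illustrative)."""

    name: str
    reserved_capacities: tuple[ReservedCapacity, ...]

    def __post_init__(self) -> None:
        if not self.reserved_capacities:
            raise ValueError("a plan needs at least one reserved capacity")

    def total_instance_hours(self) -> float:
        return sum(rc.instance_hours() for rc in self.reserved_capacities)
```

For example, a plan with 4 instances for 24 hours in one Availability Zone and 2 instances for 48 hours in another reserves 96 + 96 = 192 instance-hours in total.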

Request Syntax

{
  "Tags": [
    {
      "Key": "string",
      "Value": "string"
    }
  ],
  "TrainingPlanName": "string",
  "TrainingPlanOfferingId": "string"
}

Request Parameters

For information about the parameters that are common to all actions, see Common Parameters.

The request accepts the following data in JSON format.

Tags

An array of key-value pairs to apply to this training plan.

Type: Array of Tag objects

Array Members: Minimum number of 0 items. Maximum number of 50 items.

Required: No

TrainingPlanName

The name of the training plan to create.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 64.

Pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,63}

Required: Yes

TrainingPlanOfferingId

The unique identifier of the training plan offering to use for creating this plan.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 256.

Pattern: ^[a-z0-9\-]+$

Required: Yes
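The parameter constraints above can be checked on the client side before a request is sent. The following is a minimal sketch, not an SDK feature: the function name is hypothetical, and it applies the documented patterns with `re.fullmatch`, assuming the whole value must match (the documented TrainingPlanName pattern has no trailing `$`). It checks only the Tags array length and the presence of a `Key`, not the Tag object's own constraints.

```python
import re
from typing import Optional

# Patterns and limits copied from the parameter descriptions above.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,63}")
OFFERING_ID_PATTERN = re.compile(r"[a-z0-9\-]+")
MAX_TAGS = 50


def validate_create_training_plan_request(
    name: str,
    offering_id: str,
    tags: Optional[list[dict[str, str]]] = None,
) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems: list[str] = []

    # TrainingPlanName: 1-64 characters, alphanumerics joined by hyphens.
    if not 1 <= len(name) <= 64:
        problems.append("TrainingPlanName must be 1-64 characters long")
    elif not NAME_PATTERN.fullmatch(name):
        problems.append(
            "TrainingPlanName must start and end with an alphanumeric "
            "character and contain only alphanumerics and hyphens"
        )

    # TrainingPlanOfferingId: 1-256 lowercase alphanumerics or hyphens.
    if not 1 <= len(offering_id) <= 256:
        problems.append("TrainingPlanOfferingId must be 1-256 characters long")
    elif not OFFERING_ID_PATTERN.fullmatch(offering_id):
        problems.append(
            "TrainingPlanOfferingId may contain only lowercase letters, "
            "digits, and hyphens"
        )

    # Tags: optional, at most 50 entries, each needing a Key.
    if tags is not None:
        if len(tags) > MAX_TAGS:
            problems.append(f"Tags may contain at most {MAX_TAGS} items")
        for i, tag in enumerate(tags):
            if "Key" not in tag:
                problems.append(f"Tags[{i}] is missing a Key")

    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once.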

Response Syntax

{ "TrainingPlanArn": "string" }

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

TrainingPlanArn

The Amazon Resource Name (ARN) of the created training plan.

Type: String

Length Constraints: Minimum length of 50. Maximum length of 2048.

Pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:training-plan/.*
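The returned ARN can be split into its parts using the documented pattern. This sketch adds capture groups to that pattern; the names `parse_training_plan_arn` and `TrainingPlanArn` are hypothetical, and `resource` is simply whatever follows `training-plan/`. The length check mirrors the documented constraints.

```python
import re
from typing import NamedTuple

# The documented response pattern, with capture groups added.
ARN_PATTERN = re.compile(
    r"arn:(aws[a-z\-]*):sagemaker:([a-z0-9\-]*):([0-9]{12}):training-plan/(.*)"
)


class TrainingPlanArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    resource: str  # everything after "training-plan/"


def parse_training_plan_arn(arn: str) -> TrainingPlanArn:
    # Documented length constraints: 50 to 2048 characters.
    if not 50 <= len(arn) <= 2048:
        raise ValueError("TrainingPlanArn must be 50-2048 characters long")
    match = ARN_PATTERN.fullmatch(arn)
    if match is None:
        raise ValueError(f"not a SageMaker training plan ARN: {arn!r}")
    return TrainingPlanArn(*match.groups())
```

For example, `parse_training_plan_arn("arn:aws:sagemaker:us-east-1:123456789012:training-plan/my-plan")` yields the partition `aws`, the Region `us-east-1`, the account `123456789012`, and the resource `my-plan`.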

Errors

For information about the errors that are common to all actions, see Common Errors.

ResourceInUse

The resource being accessed is in use.

HTTP Status Code: 400

ResourceLimitExceeded

You have exceeded a SageMaker resource limit. For example, you might have too many training jobs created.

HTTP Status Code: 400

ResourceNotFound

The resource being accessed was not found.

HTTP Status Code: 400
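All three errors above return HTTP 400, so a caller has to distinguish them by error code. The following sketch assumes the exception carries a botocore-style `response` dictionary with the code under `response["Error"]["Code"]`, as `botocore.exceptions.ClientError` does. The function and table names are hypothetical, and the descriptions are paraphrased from this page.

```python
from typing import Any

# Error codes and descriptions from the Errors list above.
CREATE_TRAINING_PLAN_ERRORS = {
    "ResourceInUse": "The resource being accessed is in use.",
    "ResourceLimitExceeded": "A SageMaker resource limit was exceeded.",
    "ResourceNotFound": "The resource being accessed was not found.",
}


def explain_create_training_plan_error(exc: Exception) -> str:
    """Map an SDK exception to the documented CreateTrainingPlan error, if any.

    Assumes a botocore-style `response` attribute; anything else is
    reported by its exception type.
    """
    response: Any = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code", "")
    if code in CREATE_TRAINING_PLAN_ERRORS:
        return f"{code}: {CREATE_TRAINING_PLAN_ERRORS[code]}"
    # Other codes are not specific to this action; see Common Errors.
    return f"{code or type(exc).__name__}: not specific to CreateTrainingPlan; see Common Errors"
```

In real code this would typically sit in an `except botocore.exceptions.ClientError as e:` block around the `create_training_plan` call.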

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following: