ResourceConfig
Describes the resources, including machine learning (ML) compute instances and ML storage volumes, to use for model training.
Contents
- VolumeSizeInGB
The size of the ML storage volume that you want to provision.
ML storage volumes store model artifacts and incremental states. Training algorithms might also use the ML storage volume for scratch space. If you want to store the training data in the ML storage volume, choose File as the TrainingInputMode in the algorithm specification.
When using an ML instance with NVMe SSD volumes, SageMaker doesn't provision Amazon EBS General Purpose SSD (gp2) storage. Available storage is fixed to the NVMe-type instance's storage capacity. SageMaker configures storage paths for training datasets, checkpoints, model artifacts, and outputs to use the entire capacity of the instance storage. For example, ML instance families with NVMe-type instance storage include ml.p4d, ml.g4dn, and ml.g5.
When using an ML instance with the EBS-only storage option and without instance storage, you must define the size of the EBS volume through VolumeSizeInGB in the ResourceConfig API. For example, ML instance families that use EBS volumes include ml.c5 and ml.p2.
To look up instance types and their instance storage types and volumes, see Amazon EC2 Instance Types. To find the default local paths defined by the SageMaker training platform, see Amazon SageMaker Training Storage Folders for Training Datasets, Checkpoints, Model Artifacts, and Outputs.
Type: Integer
Valid Range: Minimum value of 1.
Required: Yes
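For illustration, the following sketch shows a ResourceConfig for an EBS-only instance type; the instance type, count, and volume size are placeholder values chosen for this example, not recommendations.
# Minimal ResourceConfig sketch (illustrative values only).
# ml.c5 instances are EBS-only, so VolumeSizeInGB determines the provisioned
# storage; NVMe-backed families such as ml.p4d, ml.g4dn, and ml.g5 ignore it
# and use the fixed instance storage capacity instead.
resource_config = {
    "InstanceType": "ml.c5.xlarge",
    "InstanceCount": 1,
    "VolumeSizeInGB": 50,
}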
- InstanceCount
The number of ML compute instances to use. For distributed training, provide a value greater than 1.
Type: Integer
Valid Range: Minimum value of 0.
Required: No
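As a sketch, a distributed training setup differs only in the instance count; the instance type, count, and volume size below are illustrative.
# ResourceConfig sketch for distributed training (illustrative values).
resource_config = {
    "InstanceType": "ml.p3.16xlarge",
    "InstanceCount": 4,    # a value greater than 1 distributes training across instances
    "VolumeSizeInGB": 200,
}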
- InstanceGroups
The configuration of a heterogeneous cluster in JSON format.
Type: Array of InstanceGroup objects
Array Members: Maximum number of 5 items.
Required: No
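The sketch below shows one way a heterogeneous cluster might be described with InstanceGroup objects; the group names, instance types, and counts are placeholders for this example.
# Heterogeneous cluster sketch with two instance groups (placeholder values).
# Each InstanceGroup carries its own name, instance type, and instance count.
resource_config = {
    "VolumeSizeInGB": 100,
    "InstanceGroups": [
        {
            "InstanceGroupName": "data-processing",   # hypothetical group name
            "InstanceType": "ml.c5.9xlarge",
            "InstanceCount": 2,
        },
        {
            "InstanceGroupName": "model-training",    # hypothetical group name
            "InstanceType": "ml.p4d.24xlarge",
            "InstanceCount": 1,
        },
    ],
}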
- InstanceType
The ML compute instance type.
Note
SageMaker Training on Amazon Elastic Compute Cloud (EC2) P4de instances is in preview release starting December 9th, 2022.
Amazon EC2 P4de instances (currently in preview) are powered by 8 NVIDIA A100 GPUs with 80GB high-performance HBM2e GPU memory, which accelerate the speed of training ML models that need to be trained on large datasets of high-resolution data. In this preview release, Amazon SageMaker supports ML training jobs on P4de instances (ml.p4de.24xlarge) to reduce model training time. The ml.p4de.24xlarge instances are available in the following AWS Regions:
- US East (N. Virginia) (us-east-1)
- US West (Oregon) (us-west-2)
To request a quota limit increase and start using P4de instances, contact the SageMaker Training service team through your account team.
Type: String
Valid Values:
ml.m4.xlarge | ml.m4.2xlarge | ml.m4.4xlarge | ml.m4.10xlarge | ml.m4.16xlarge | ml.g4dn.xlarge | ml.g4dn.2xlarge | ml.g4dn.4xlarge | ml.g4dn.8xlarge | ml.g4dn.12xlarge | ml.g4dn.16xlarge | ml.m5.large | ml.m5.xlarge | ml.m5.2xlarge | ml.m5.4xlarge | ml.m5.12xlarge | ml.m5.24xlarge | ml.c4.xlarge | ml.c4.2xlarge | ml.c4.4xlarge | ml.c4.8xlarge | ml.p2.xlarge | ml.p2.8xlarge | ml.p2.16xlarge | ml.p3.2xlarge | ml.p3.8xlarge | ml.p3.16xlarge | ml.p3dn.24xlarge | ml.p4d.24xlarge | ml.p4de.24xlarge | ml.p5.48xlarge | ml.p5e.48xlarge | ml.c5.xlarge | ml.c5.2xlarge | ml.c5.4xlarge | ml.c5.9xlarge | ml.c5.18xlarge | ml.c5n.xlarge | ml.c5n.2xlarge | ml.c5n.4xlarge | ml.c5n.9xlarge | ml.c5n.18xlarge | ml.g5.xlarge | ml.g5.2xlarge | ml.g5.4xlarge | ml.g5.8xlarge | ml.g5.16xlarge | ml.g5.12xlarge | ml.g5.24xlarge | ml.g5.48xlarge | ml.g6.xlarge | ml.g6.2xlarge | ml.g6.4xlarge | ml.g6.8xlarge | ml.g6.16xlarge | ml.g6.12xlarge | ml.g6.24xlarge | ml.g6.48xlarge | ml.g6e.xlarge | ml.g6e.2xlarge | ml.g6e.4xlarge | ml.g6e.8xlarge | ml.g6e.16xlarge | ml.g6e.12xlarge | ml.g6e.24xlarge | ml.g6e.48xlarge | ml.trn1.2xlarge | ml.trn1.32xlarge | ml.trn1n.32xlarge | ml.m6i.large | ml.m6i.xlarge | ml.m6i.2xlarge | ml.m6i.4xlarge | ml.m6i.8xlarge | ml.m6i.12xlarge | ml.m6i.16xlarge | ml.m6i.24xlarge | ml.m6i.32xlarge | ml.c6i.xlarge | ml.c6i.2xlarge | ml.c6i.8xlarge | ml.c6i.4xlarge | ml.c6i.12xlarge | ml.c6i.16xlarge | ml.c6i.24xlarge | ml.c6i.32xlarge | ml.r5d.large | ml.r5d.xlarge | ml.r5d.2xlarge | ml.r5d.4xlarge | ml.r5d.8xlarge | ml.r5d.12xlarge | ml.r5d.16xlarge | ml.r5d.24xlarge | ml.t3.medium | ml.t3.large | ml.t3.xlarge | ml.t3.2xlarge | ml.r5.large | ml.r5.xlarge | ml.r5.2xlarge | ml.r5.4xlarge | ml.r5.8xlarge | ml.r5.12xlarge | ml.r5.16xlarge | ml.r5.24xlarge
Required: No
- KeepAlivePeriodInSeconds
The duration of time in seconds to retain configured resources in a warm pool for subsequent training jobs.
Type: Integer
Valid Range: Minimum value of 0. Maximum value of 3600.
Required: No
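For example, the following sketch keeps the provisioned instances in a warm pool for ten minutes after the job finishes; the values are illustrative.
# Warm pool sketch (illustrative values). KeepAlivePeriodInSeconds must be
# between 0 and 3600.
resource_config = {
    "InstanceType": "ml.g5.2xlarge",
    "InstanceCount": 1,
    "VolumeSizeInGB": 100,
    "KeepAlivePeriodInSeconds": 600,   # retain instances for 10 minutes
}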
- VolumeKmsKeyId
The AWS KMS key that SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) that run the training job.
Note
Certain Nitro-based instances include local storage, dependent on the instance type. Local storage volumes are encrypted using a hardware module on the instance. You can't request a VolumeKmsKeyId when using an instance type with local storage.
For a list of instance types that support local instance storage, see Instance Store Volumes.
For more information about local instance storage encryption, see SSD Instance Store Volumes.
The VolumeKmsKeyId can be in any of the following formats:
- // KMS Key ID
"1234abcd-12ab-34cd-56ef-1234567890ab"
- // Amazon Resource Name (ARN) of a KMS Key
"arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
Type: String
Length Constraints: Maximum length of 2048.
Pattern:
^[a-zA-Z0-9:/_-]*$
Required: No
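The sketch below passes the key in either format; both values are the documentation placeholders shown above, not real keys.
# ResourceConfig sketch with an encrypted EBS volume (placeholder key values).
resource_config = {
    "InstanceType": "ml.m5.xlarge",
    "InstanceCount": 1,
    "VolumeSizeInGB": 50,
    # KMS key ID form:
    "VolumeKmsKeyId": "1234abcd-12ab-34cd-56ef-1234567890ab",
    # or the ARN form:
    # "VolumeKmsKeyId": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
}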
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
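As one example, with the AWS SDK for Python (Boto3), ResourceConfig is passed as part of a CreateTrainingJob request. The sketch below uses placeholder names, ARNs, image URIs, and S3 locations; substitute your own values.
import boto3

sagemaker = boto3.client("sagemaker")

# All names, ARNs, URIs, and sizes below are placeholders for illustration.
sagemaker.create_training_job(
    TrainingJobName="example-training-job",
    RoleArn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    AlgorithmSpecification={
        "TrainingImage": "111122223333.dkr.ecr.us-west-2.amazonaws.com/example-image:latest",
        "TrainingInputMode": "File",
    },
    InputDataConfig=[
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://example-bucket/train/",
                }
            },
        }
    ],
    OutputDataConfig={"S3OutputPath": "s3://example-bucket/output/"},
    ResourceConfig={
        "InstanceType": "ml.c5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)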