StartMLModelTrainingJob - Neptune Data API

StartMLModelTrainingJob

Creates a new Neptune ML model training job. See Model training using the modeltraining command.

When invoking this operation in a Neptune cluster that has IAM authentication enabled, the IAM user or role making the request must have a policy attached that allows the neptune-db:StartMLModelTrainingJob IAM action in that cluster.
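As an illustration only, a policy document granting this action could be assembled as in the sketch below. The region, account ID, and cluster resource ID are placeholders, and the `arn:aws:neptune-db:...` resource layout is an assumption based on Neptune's general IAM data-access conventions, not something stated on this page.

```python
import json


def training_job_policy(region: str, account_id: str, cluster_resource_id: str) -> dict:
    """Build an IAM policy document allowing StartMLModelTrainingJob on one cluster."""
    # Assumed resource ARN layout; verify it against the Neptune IAM documentation.
    resource = f"arn:aws:neptune-db:{region}:{account_id}:{cluster_resource_id}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "neptune-db:StartMLModelTrainingJob",
                "Resource": resource,
            }
        ],
    }


# Placeholder values for illustration only.
policy = training_job_policy("us-east-1", "123456789012", "cluster-EXAMPLE")
print(json.dumps(policy, indent=2))
```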

Request Syntax

POST /ml/modeltraining HTTP/1.1
Content-type: application/json

{
   "baseProcessingInstanceType": "string",
   "customModelTrainingParameters": {
      "sourceS3DirectoryPath": "string",
      "trainingEntryPointScript": "string",
      "transformEntryPointScript": "string"
   },
   "dataProcessingJobId": "string",
   "enableManagedSpotTraining": boolean,
   "id": "string",
   "maxHPONumberOfTrainingJobs": number,
   "maxHPOParallelTrainingJobs": number,
   "neptuneIamRoleArn": "string",
   "previousModelTrainingJobId": "string",
   "s3OutputEncryptionKMSKey": "string",
   "sagemakerIamRoleArn": "string",
   "securityGroupIds": [ "string" ],
   "subnets": [ "string" ],
   "trainingInstanceType": "string",
   "trainingInstanceVolumeSizeInGB": number,
   "trainingTimeOutInSeconds": number,
   "trainModelS3Location": "string",
   "volumeEncryptionKMSKey": "string"
}
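The following minimal sketch shows how the request line, header, and JSON body above might map onto a prepared HTTP request using only the Python standard library. The endpoint host, port, and identifier values are hypothetical, and the request is built but never sent. On a cluster with IAM authentication enabled, the request would also need Signature Version 4 signing, which this sketch does not attempt.

```python
import json
import urllib.request


def build_training_request(endpoint: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /ml/modeltraining."""
    return urllib.request.Request(
        url=f"{endpoint}/ml/modeltraining",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and identifiers, for illustration only.
req = build_training_request(
    "https://my-neptune-cluster.example.com:8182",
    {
        "dataProcessingJobId": "example-processing-job-id",
        "trainModelS3Location": "s3://example-bucket/neptune-ml/training",
    },
)
print(req.get_method(), req.full_url)
```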

URI Request Parameters

The request does not use any URI parameters.

Request Body

The request accepts the following data in JSON format.

baseProcessingInstanceType

The type of ML instance used in preparing and managing training of ML models. This is a CPU instance chosen based on memory requirements for processing the training data and model.

Type: String

Required: No

customModelTrainingParameters

The configuration for custom model training. This is a JSON object.

Type: CustomModelTrainingParameters object

Required: No

dataProcessingJobId

The job ID of the completed data-processing job that has created the data that the training will work with.

Type: String

Required: Yes

enableManagedSpotTraining

Optimizes the cost of training machine-learning models by using Amazon Elastic Compute Cloud spot instances. The default is False.

Type: Boolean

Required: No

id

A unique identifier for the new job. The default is an autogenerated UUID.

Type: String

Required: No

maxHPONumberOfTrainingJobs

Maximum total number of training jobs to start for the hyperparameter tuning job. The default is 2. Neptune ML automatically tunes the hyperparameters of the machine learning model. To obtain a model that performs well, set maxHPONumberOfTrainingJobs to at least 10. In general, the more tuning runs, the better the results.

Type: Integer

Required: No

maxHPOParallelTrainingJobs

Maximum number of parallel training jobs to start for the hyperparameter tuning job. The default is 2. The number of parallel jobs you can run is limited by the available resources on your training instance.

Type: Integer

Required: No
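As a rough illustration of how maxHPONumberOfTrainingJobs and maxHPOParallelTrainingJobs interact, the sketch below computes a lower bound on the number of sequential rounds needed to run every tuning job. It assumes each round runs as many jobs in parallel as allowed, which is a simplification; actual scheduling is decided by the service. The function name is hypothetical.

```python
import math


def min_sequential_rounds(max_total_jobs: int = 2, max_parallel_jobs: int = 2) -> int:
    """Lower bound on sequential rounds needed to run all tuning jobs."""
    if max_total_jobs < 1 or max_parallel_jobs < 1:
        raise ValueError("both limits must be positive integers")
    # Each round runs at most max_parallel_jobs jobs, so round up.
    return math.ceil(max_total_jobs / max_parallel_jobs)


print(min_sequential_rounds())       # documented defaults: 2 total, 2 parallel
print(min_sequential_rounds(10, 2))  # the suggested minimum of 10 total jobs
```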

neptuneIamRoleArn

The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.

Type: String

Required: No

previousModelTrainingJobId

The job ID of a completed model-training job that you want to update incrementally based on updated data.

Type: String

Required: No

s3OutputEncryptionKMSKey

The AWS Key Management Service (AWS KMS) key that SageMaker uses to encrypt the output of the processing job. The default is None.

Type: String

Required: No

sagemakerIamRoleArn

The ARN of an IAM role for SageMaker execution. This must be listed in your DB cluster parameter group or an error will occur.

Type: String

Required: No

securityGroupIds

The VPC security group IDs. The default is None.

Type: Array of strings

Required: No

subnets

The IDs of the subnets in the Neptune VPC. The default is None.

Type: Array of strings

Required: No

trainingInstanceType

The type of ML instance used for model training. All Neptune ML models support CPU, GPU, and multiGPU training. The default is ml.p3.2xlarge. Choosing the right instance type for training depends on the task type, graph size, and your budget.

Type: String

Required: No

trainingInstanceVolumeSizeInGB

The disk volume size of the training instance, in GB. Both input data and the output model are stored on disk, so the volume must be large enough to hold both. The default is 0. If the value is not specified or is 0, Neptune ML selects a disk volume size based on the recommendation generated in the data processing step.

Type: Integer

Required: No

trainingTimeOutInSeconds

Timeout in seconds for the training job. The default is 86,400 (1 day).

Type: Integer

Required: No

trainModelS3Location

The location in Amazon S3 where the model artifacts are to be stored.

Type: String

Required: Yes

volumeEncryptionKMSKey

The AWS Key Management Service (AWS KMS) key that SageMaker uses to encrypt data on the storage volume attached to the ML compute instances that run the training job. The default is None.

Type: String

Required: No
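To summarize the field list above, the sketch below checks a request body for the two required fields (dataProcessingJobId and trainModelS3Location) and shows the documented defaults for fields that are omitted. The helper names are hypothetical and the checks are client-side conveniences only; the service performs its own validation.

```python
REQUIRED_FIELDS = ("dataProcessingJobId", "trainModelS3Location")

# Defaults as documented above; the service applies them when a field is omitted.
DOCUMENTED_DEFAULTS = {
    "enableManagedSpotTraining": False,
    "maxHPONumberOfTrainingJobs": 2,
    "maxHPOParallelTrainingJobs": 2,
    "trainingInstanceType": "ml.p3.2xlarge",
    "trainingInstanceVolumeSizeInGB": 0,
    "trainingTimeOutInSeconds": 86400,
}


def make_training_body(**fields) -> dict:
    """Return a request body, dropping None values and checking required fields."""
    body = {name: value for name, value in fields.items() if value is not None}
    missing = [name for name in REQUIRED_FIELDS if not body.get(name)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    return body


def effective_settings(body: dict) -> dict:
    """Overlay the caller's fields on the documented defaults (for inspection only)."""
    return {**DOCUMENTED_DEFAULTS, **body}


# Placeholder identifiers for illustration only.
body = make_training_body(
    dataProcessingJobId="example-processing-job-id",
    trainModelS3Location="s3://example-bucket/neptune-ml/training",
    maxHPONumberOfTrainingJobs=10,
    id=None,  # omitted, so the service would autogenerate a UUID
)
print(effective_settings(body))
```

Only `body` would be sent in the request; `effective_settings` is a convenience for seeing which documented defaults would apply.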

Response Syntax

HTTP/1.1 200
Content-type: application/json

{
   "arn": "string",
   "creationTimeInMillis": number,
   "id": "string"
}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

arn

The ARN of the new model training job.

Type: String

creationTimeInMillis

The model training job creation time, in milliseconds.

Type: Long

id

The unique ID of the new model training job.

Type: String
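A minimal sketch of reading the three response elements follows. It assumes that creationTimeInMillis counts milliseconds since the Unix epoch, which this page does not state explicitly; the sample payload and its ARN are made-up placeholders.

```python
import json
from datetime import datetime, timezone


def parse_training_response(raw: str) -> dict:
    """Extract id, arn, and creation time from a response body."""
    data = json.loads(raw)
    # Assumption: the value is milliseconds since the Unix epoch (UTC).
    created = datetime.fromtimestamp(data["creationTimeInMillis"] / 1000, tz=timezone.utc)
    return {"id": data["id"], "arn": data["arn"], "created": created}


# Made-up sample payload for illustration only.
sample = '{"arn": "arn:aws:example", "creationTimeInMillis": 1700000000000, "id": "example-job-id"}'
result = parse_training_response(sample)
print(result["id"], result["created"].isoformat())
```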

Errors

For information about the errors that are common to all actions, see Common Errors.

BadRequestException

Raised when a request is submitted that cannot be processed.

HTTP Status Code: 400

ClientTimeoutException

Raised when a request timed out in the client.

HTTP Status Code: 408

ConstraintViolationException

Raised when a value in a request field did not satisfy required constraints.

HTTP Status Code: 400

IllegalArgumentException

Raised when an argument in a request is not supported.

HTTP Status Code: 400

InvalidArgumentException

Raised when an argument in a request has an invalid value.

HTTP Status Code: 400

InvalidParameterException

Raised when a parameter value is not valid.

HTTP Status Code: 400

MissingParameterException

Raised when a required parameter is missing.

HTTP Status Code: 400

MLResourceNotFoundException

Raised when a specified machine-learning resource could not be found.

HTTP Status Code: 404

PreconditionsFailedException

Raised when a precondition for processing a request is not satisfied.

HTTP Status Code: 400

TooManyRequestsException

Raised when the number of requests being processed exceeds the limit.

HTTP Status Code: 429

UnsupportedOperationException

Raised when a request attempts to initiate an operation that is not supported.

HTTP Status Code: 400
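The error names and status codes listed above can be collected into a lookup table, as in the sketch below. Treating 408 and 429 as retryable is an assumption made for illustration, not guidance from this page, and the helper name is hypothetical.

```python
# Error names and HTTP status codes as listed in this section.
ERROR_STATUS = {
    "BadRequestException": 400,
    "ClientTimeoutException": 408,
    "ConstraintViolationException": 400,
    "IllegalArgumentException": 400,
    "InvalidArgumentException": 400,
    "InvalidParameterException": 400,
    "MissingParameterException": 400,
    "MLResourceNotFoundException": 404,
    "PreconditionsFailedException": 400,
    "TooManyRequestsException": 429,
    "UnsupportedOperationException": 400,
}

# Assumption: timeouts and throttling may succeed on retry; other errors
# indicate a problem with the request itself.
RETRYABLE_STATUSES = {408, 429}


def is_retryable(error_name: str) -> bool:
    """Return True if the named error maps to a status assumed to be retryable."""
    return ERROR_STATUS.get(error_name) in RETRYABLE_STATUSES


for name in ("TooManyRequestsException", "MissingParameterException"):
    print(name, ERROR_STATUS[name], is_retryable(name))
```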
