Model training using the `modeltraining` command

You use the Neptune ML modeltraining command to create a model-training job, check its status, stop it, or list all active model-training jobs.

Creating a model-training job using the Neptune ML `modeltraining` command

A Neptune ML modeltraining command for creating a completely new job looks like this:
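A minimal sketch of such a request with the boto3 `neptunedata` client, assuming that the only difference from an incremental update is that `previousModelTrainingJobId` is omitted. The helper names `training_job_request` and `start_training_job` are hypothetical and exist only for illustration:

```python
def training_job_request(job_id, data_processing_job_id, s3_location,
                         previous_job_id=None):
    """Build keyword arguments for start_ml_model_training_job.

    Hypothetical helper: the keys are the parameter names documented
    on this page; the function itself is not part of any AWS SDK.
    """
    request = {
        "id": job_id,
        "dataProcessingJobId": data_processing_job_id,
        "trainModelS3Location": s3_location,
    }
    # Only an incremental update names a previous, completed training job.
    if previous_job_id is not None:
        request["previousModelTrainingJobId"] = previous_job_id
    return request


def start_training_job(client, request):
    """Submit the request using a boto3 'neptunedata' client."""
    return client.start_ml_model_training_job(**request)
```

With a client created as `boto3.client('neptunedata', endpoint_url='https://your-neptune-endpoint:port')`, calling `start_training_job(client, training_job_request(...))` without `previous_job_id` would start a completely new job.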

A Neptune ML modeltraining command for creating an update job for incremental model training looks like this:

AWS CLI


aws neptunedata start-ml-model-training-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --id "(a unique model-training job ID)" \
  --data-processing-job-id "(the data-processing job-id of a completed job)" \
  --train-model-s3-location "s3://(your S3 bucket)/neptune-model-graph-autotrainer" \
  --previous-model-training-job-id "(the job ID of a completed model-training job to update)"

For more information, see start-ml-model-training-job in the AWS CLI Command Reference.

SDK


import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.start_ml_model_training_job(
    id='(a unique model-training job ID)',
    dataProcessingJobId='(the data-processing job-id of a completed job)',
    trainModelS3Location='s3://(your S3 bucket)/neptune-model-graph-autotrainer',
    previousModelTrainingJobId='(the job ID of a completed model-training job to update)'
)

print(response)

awscurl


awscurl https://your-neptune-endpoint:port/ml/modeltraining \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-training job ID)",
        "dataProcessingJobId" : "(the data-processing job-id of a completed job)",
        "trainModelS3Location" : "s3://(your S3 bucket)/neptune-model-graph-autotrainer",
        "previousModelTrainingJobId" : "(the job ID of a completed model-training job to update)"
      }'

Note

This example assumes that your AWS credentials are configured in your environment. Replace us-east-1 with the Region of your Neptune cluster.

curl


curl \
  -X POST https://your-neptune-endpoint:port/ml/modeltraining \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-training job ID)",
        "dataProcessingJobId" : "(the data-processing job-id of a completed job)",
        "trainModelS3Location" : "s3://(your S3 bucket)/neptune-model-graph-autotrainer",
        "previousModelTrainingJobId" : "(the job ID of a completed model-training job to update)"
      }'

A Neptune ML modeltraining command for creating a new job with a user-provided custom model implementation looks like this:

AWS CLI


aws neptunedata start-ml-model-training-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --id "(a unique model-training job ID)" \
  --data-processing-job-id "(the data-processing job-id of a completed job)" \
  --train-model-s3-location "s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer" \
  --model-name "custom" \
  --custom-model-training-parameters '{
    "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
    "trainingEntryPointScript": "(your training script entry-point name in the Python module)",
    "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
  }'

For more information, see start-ml-model-training-job in the AWS CLI Command Reference.

SDK


import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.start_ml_model_training_job(
    id='(a unique model-training job ID)',
    dataProcessingJobId='(the data-processing job-id of a completed job)',
    trainModelS3Location='s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer',
    modelName='custom',
    customModelTrainingParameters={
        'sourceS3DirectoryPath': 's3://(your Amazon S3 bucket)/(path to your Python module)',
        'trainingEntryPointScript': '(your training script entry-point name in the Python module)',
        'transformEntryPointScript': '(your transform script entry-point name in the Python module)'
    }
)

print(response)

awscurl


awscurl https://your-neptune-endpoint:port/ml/modeltraining \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-training job ID)",
        "dataProcessingJobId" : "(the data-processing job-id of a completed job)",
        "trainModelS3Location" : "s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer",
        "modelName": "custom",
        "customModelTrainingParameters" : {
          "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
          "trainingEntryPointScript": "(your training script entry-point name in the Python module)",
          "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
        }
      }'

Note

This example assumes that your AWS credentials are configured in your environment. Replace us-east-1 with the Region of your Neptune cluster.

curl


curl \
  -X POST https://your-neptune-endpoint:port/ml/modeltraining \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-training job ID)",
        "dataProcessingJobId" : "(the data-processing job-id of a completed job)",
        "trainModelS3Location" : "s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer",
        "modelName": "custom",
        "customModelTrainingParameters" : {
          "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
          "trainingEntryPointScript": "(your training script entry-point name in the Python module)",
          "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
        }
      }'

Parameters for `modeltraining` job creation

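id – (Optional) A unique identifier for the new job.

Type: string. Default: An autogenerated UUID.
dataProcessingJobId – (Required) The job ID of the completed data-processing job that created the data the training will work with.

Type: string.
trainModelS3Location – (Required) The location in Amazon S3 where the model artifacts are to be stored.

Type: string.
previousModelTrainingJobId – (Optional) The job ID of a completed model-training job that you want to update incrementally based on updated data.

Type: string. Default: none.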
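sagemakerIamRoleArn – (Optional) The ARN of an IAM role for SageMaker AI execution.

Type: string.
Note: This must be listed in your DB cluster parameter group or an error will occur.
neptuneIamRoleArn – (Optional) The ARN of an IAM role that provides Neptune access to SageMaker AI and Amazon S3 resources.

Type: string.
Note: This must be listed in your DB cluster parameter group or an error will occur.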
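modelName – (Optional) The model type for training. By default, the ML model is automatically based on the modelType used in data processing, but you can specify a different model type here.

Type: string. Default: rgcn for heterogeneous graphs and kge for knowledge graphs.
Valid values: For heterogeneous graphs: rgcn. For kge graphs: transe, distmult, or rotate. For a custom model implementation: custom.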
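baseProcessingInstanceType – (Optional) The type of ML instance used in preparing and managing training of ML models.

Type: string.
Note: This is a CPU instance chosen based on the memory requirements for processing the training data and model. See Selecting an instance for model training and model transform.
trainingInstanceType – (Optional) The type of ML instance used for model training. All Neptune ML models support CPU, GPU, and multi-GPU training.

Type: string. Default: ml.p3.2xlarge.

Note: Choosing the right instance type for training depends on the task type, the graph size, and your budget. See Selecting an instance for model training and model transform.
trainingInstanceVolumeSizeInGB – (Optional) The disk volume size of the training instance. Both the input data and the output model are stored on disk, so the volume size must be large enough to hold both data sets.

Type: integer. Default: 0.

Note: If not specified or set to 0, Neptune ML selects a disk volume size based on the recommendation generated in the data-processing step. See Selecting an instance for model training and model transform.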
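trainingTimeOutInSeconds – (Optional) Timeout in seconds for the training job.

Type: integer. Default: 86,400 (1 day).
maxHPONumberOfTrainingJobs – Maximum total number of training jobs to start for the hyperparameter tuning job.

Type: integer. Default: 2.

Note: Neptune ML automatically tunes the hyperparameters of the machine learning model. To obtain a model that performs well, use at least 10 jobs (in other words, set maxHPONumberOfTrainingJobs to 10). In general, the more tuning runs, the better the results.
maxHPOParallelTrainingJobs – Maximum number of parallel training jobs to start for the hyperparameter tuning job.

Type: integer. Default: 2.

Note: The number of parallel jobs you can run is limited by the resources available on your training instance.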
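subnets – (Optional) The IDs of the subnets in the Neptune VPC.

Type: list of strings. Default: none.
securityGroupIds – (Optional) The VPC security group IDs.

Type: list of strings. Default: none.
volumeEncryptionKMSKey – (Optional) The AWS Key Management Service (AWS KMS) key that SageMaker AI uses to encrypt data on the storage volume attached to the ML compute instances that run the training job.

Type: string. Default: none.
s3OutputEncryptionKMSKey – (Optional) The AWS Key Management Service (AWS KMS) key that SageMaker AI uses to encrypt the output of the processing job.

Type: string. Default: none.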
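enableInterContainerTrafficEncryption – (Optional) Enable or disable inter-container traffic encryption in training or hyperparameter tuning jobs.

Type: boolean. Default: true.

Note: The enableInterContainerTrafficEncryption parameter is only available in engine release 1.2.0.2.R3.
enableManagedSpotTraining – (Optional) Optimizes the cost of training machine-learning models by using Amazon Elastic Compute Cloud Spot Instances. For more information, see Managed Spot Training in Amazon SageMaker.

Type: boolean. Default: false.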
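customModelTrainingParameters – (Optional) The configuration for custom model training. This is a JSON object with the following fields:
- sourceS3DirectoryPath – (Required) The path to the Amazon S3 location where the Python module implementing your model is located. This must point to a valid existing Amazon S3 location that contains, at a minimum, a training script, a transform script, and a model-hpo-configuration.json file.
- trainingEntryPointScript – (Optional) The name of the entry point in your module of a script that performs model training and takes hyperparameters as command-line arguments, including fixed hyperparameters.
  
  Default: training.py.
- transformEntryPointScript – (Optional) The name of the entry point in your module of a script that should be run after the best model from the hyperparameter search has been identified, to compute the model artifacts necessary for model deployment. It should be able to run with no command-line arguments.
  
  Default: transform.py.
maxWaitTime – (Optional) The maximum time to wait, in seconds, when performing model training using Spot Instances. Should be greater than trainingTimeOutInSeconds.

Type: integer.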
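Several of the parameters above interact, so it can help to review a set of options before submitting a job. The following is a minimal sketch, not part of any AWS SDK: `review_training_options` is a hypothetical helper that only restates the guidance given in this list (the documented default of 86,400 seconds, the suggestion of at least 10 tuning jobs, and the required `sourceS3DirectoryPath` field). Whether the service itself rejects such requests is not stated here.

```python
def review_training_options(options):
    """Return advisory notes for a dict of modeltraining parameters.

    Hypothetical helper: it mirrors the guidance in the parameter list
    and does not call Neptune.
    """
    notes = []

    # maxWaitTime should be greater than the training timeout
    # (documented default: 86,400 seconds).
    timeout = options.get("trainingTimeOutInSeconds", 86400)
    max_wait = options.get("maxWaitTime")
    if max_wait is not None and max_wait <= timeout:
        notes.append("maxWaitTime should be greater than trainingTimeOutInSeconds")

    # The documented default is 2; at least 10 jobs is the suggested minimum.
    if options.get("maxHPONumberOfTrainingJobs", 2) < 10:
        notes.append("use at least 10 HPO training jobs for a well-performing model")

    # sourceS3DirectoryPath is the only required custom-model field.
    custom = options.get("customModelTrainingParameters")
    if custom is not None and "sourceS3DirectoryPath" not in custom:
        notes.append("customModelTrainingParameters.sourceS3DirectoryPath is required")

    return notes
```

An empty list means that none of these three checks found anything to flag; it says nothing about the other parameters.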

Getting the status of a model-training job using the Neptune ML `modeltraining` command

A sample Neptune ML modeltraining command for the status of a job looks like this:
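A minimal sketch of a status check using the boto3 `neptunedata` operation `get_ml_model_training_job`, wrapped in a polling loop. The helper name `wait_for_training_job`, the polling interval, and the set of terminal status strings are assumptions made for illustration; check the statuses your jobs actually report before relying on them.

```python
import time


def wait_for_training_job(client, job_id, poll_seconds=60, max_polls=None,
                          terminal_statuses=("Completed", "Failed", "Stopped")):
    """Poll a model-training job until it reports a terminal status.

    `client` is a boto3 'neptunedata' client. The terminal status
    strings are assumed values, not taken from this page.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        response = client.get_ml_model_training_job(id=job_id)
        status = response.get("status")
        if status in terminal_statuses:
            return status
        polls += 1
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

For a single check rather than a loop, `client.get_ml_model_training_job(id=job_id)` on its own returns the job description.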

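Parameters for `modeltraining` job status

id – (Required) The unique identifier of the model-training job.

Type: string.
neptuneIamRoleArn – (Optional) The ARN of an IAM role that provides Neptune access to SageMaker AI and Amazon S3 resources.

Type: string.
Note: This must be listed in your DB cluster parameter group or an error will occur.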

Stopping a model-training job using the Neptune ML `modeltraining` command

A sample Neptune ML modeltraining command for stopping a job looks like this:
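A minimal sketch of stopping a job using the boto3 `neptunedata` operation `cancel_ml_model_training_job`. The wrapper name `stop_training_job` is hypothetical; the keyword arguments it passes are the parameter names documented for this command.

```python
def stop_training_job(client, job_id, clean=False, neptune_iam_role_arn=None):
    """Stop a model-training job, optionally deleting its Amazon S3 artifacts.

    `client` is a boto3 'neptunedata' client. Hypothetical wrapper
    written for illustration only.
    """
    kwargs = {"id": job_id, "clean": clean}
    # The role ARN is optional, so send it only when the caller provides one.
    if neptune_iam_role_arn is not None:
        kwargs["neptuneIamRoleArn"] = neptune_iam_role_arn
    return client.cancel_ml_model_training_job(**kwargs)
```

Passing `clean=True` asks for all Amazon S3 artifacts to be deleted when the job is stopped, so the default here is `False`.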

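Parameters for `modeltraining` stop job

id – (Required) The unique identifier of the model-training job.

Type: string.
neptuneIamRoleArn – (Optional) The ARN of an IAM role that provides Neptune access to SageMaker AI and Amazon S3 resources.

Type: string.
Note: This must be listed in your DB cluster parameter group or an error will occur.
clean – (Optional) This flag specifies that all Amazon S3 artifacts should be deleted when the job is stopped.

Type: boolean. Default: FALSE.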

Listing active model-training jobs using the Neptune ML `modeltraining` command

A sample Neptune ML modeltraining command for listing active jobs looks like this:
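A minimal sketch of listing active jobs using the boto3 `neptunedata` operation `list_ml_model_training_jobs`. The wrapper name `list_training_job_ids` is hypothetical, the lower bound of 1 for `maxItems` is an assumption, and the sketch assumes the response carries the job IDs under an `ids` key.

```python
MAX_ITEMS_LIMIT = 1024  # maximum allowed value documented for maxItems


def list_training_job_ids(client, max_items=10):
    """Return the IDs of active model-training jobs.

    `client` is a boto3 'neptunedata' client. The range check is done
    locally; the lower bound of 1 is an assumption.
    """
    if not 1 <= max_items <= MAX_ITEMS_LIMIT:
        raise ValueError(f"max_items must be between 1 and {MAX_ITEMS_LIMIT}")
    response = client.list_ml_model_training_jobs(maxItems=max_items)
    # Assumed response shape: {'ids': ['job-id-1', 'job-id-2', ...]}
    return response.get("ids", [])
```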

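Parameters for `modeltraining` list jobs

maxItems – (Optional) The maximum number of items to return.

Type: integer. Default: 10. Maximum allowed value: 1024.
neptuneIamRoleArn – (Optional) The ARN of an IAM role that provides Neptune access to SageMaker AI and Amazon S3 resources.

Type: string.
Note: This must be listed in your DB cluster parameter group or an error will occur.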
