Processing the graph data exported from Neptune for training
The data-processing step takes the Neptune graph data created by the export process and creates the information that the Deep Graph Library (DGL) uses during training. This includes performing various data mappings and transformations.
Managing the data-processing step for Neptune ML
After you have exported from Neptune the data that you want to use for model training, you can start a data-processing job using a command like the following:
- AWS CLI
aws neptunedata start-ml-data-processing-job \
--endpoint-url https://your-neptune-endpoint:port \
--input-data-s3-location "s3://(S3 bucket name)/(path to your input folder)" \
--id "(a job ID for the new job)" \
--processed-data-s3-location "s3://(S3 bucket name)/(path to your output folder)" \
--config-file-name "training-job-configuration.json"
For more information, see start-ml-data-processing-job in the AWS CLI Command Reference.
- SDK
import boto3
from botocore.config import Config
client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)
response = client.start_ml_data_processing_job(
    inputDataS3Location='s3://(S3 bucket name)/(path to your input folder)',
    id='(a job ID for the new job)',
    processedDataS3Location='s3://(S3 bucket name)/(path to your output folder)',
    configFileName='training-job-configuration.json'
)
print(response)
- awscurl
awscurl https://your-neptune-endpoint:port/ml/dataprocessing \
--region us-east-1 \
--service neptune-db \
-X POST \
-H 'Content-Type: application/json' \
-d '{
      "inputDataS3Location" : "s3://(S3 bucket name)/(path to your input folder)",
      "id" : "(a job ID for the new job)",
      "processedDataS3Location" : "s3://(S3 bucket name)/(path to your output folder)",
      "configFileName" : "training-job-configuration.json"
    }'
This example assumes that your AWS credentials have been set up in your environment. Replace us-east-1 with the Region of your Neptune cluster.
- curl
curl \
-X POST https://your-neptune-endpoint:port/ml/dataprocessing \
-H 'Content-Type: application/json' \
-d '{
      "inputDataS3Location" : "s3://(S3 bucket name)/(path to your input folder)",
      "id" : "(a job ID for the new job)",
      "processedDataS3Location" : "s3://(S3 bucket name)/(path to your output folder)",
      "configFileName" : "training-job-configuration.json"
    }'
Details about how to use this command are explained in The dataprocessing command, along with information about how to obtain the status of a running job, how to stop a running job, and how to list all running jobs.
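As a sketch of the status, stop, and list operations mentioned above, the helpers below build the corresponding HTTP requests against the Neptune ML management endpoint. The exact URL shapes and the clean query parameter are assumptions drawn from the dataprocessing command pattern; verify them against the dataprocessing command documentation before use.

```python
# Hedged sketch: request builders for the dataprocessing management
# operations (get status, stop, list). Endpoint shapes are assumptions.

def status_request(endpoint, job_id):
    # GET on /ml/dataprocessing/(job id) returns the job's status.
    return ('GET', f'{endpoint}/ml/dataprocessing/{job_id}')

def stop_request(endpoint, job_id, clean=False):
    # DELETE stops a running job; clean=true (assumed flag) also
    # deletes the job's intermediate artifacts.
    url = f'{endpoint}/ml/dataprocessing/{job_id}'
    if clean:
        url += '?clean=true'
    return ('DELETE', url)

def list_request(endpoint, max_items=10):
    # GET on /ml/dataprocessing with maxItems lists job IDs.
    return ('GET', f'{endpoint}/ml/dataprocessing?maxItems={max_items}')
```

Each helper returns a (method, url) pair that you would sign and send with a SigV4-capable client such as awscurl, as in the examples above.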
Processing updated graph data for Neptune ML
You can also supply a previousDataProcessingJobId to the API to ensure that a new data-processing job uses the same processing method as a previous one. This is required when you want to get predictions for updated graph data in Neptune, either by retraining the old model on the new data or by recomputing the model artifacts on the new data.
You can do this using a command like the following:
- AWS CLI
aws neptunedata start-ml-data-processing-job \
--endpoint-url https://your-neptune-endpoint:port \
--input-data-s3-location "s3://(Amazon S3 bucket name)/(path to your input folder)" \
--id "(a job ID for the new job)" \
--processed-data-s3-location "s3://(Amazon S3 bucket name)/(path to your output folder)" \
--previous-data-processing-job-id "(the job ID of the previous data-processing job)"
For more information, see start-ml-data-processing-job in the AWS CLI Command Reference.
- SDK
import boto3
from botocore.config import Config
client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)
response = client.start_ml_data_processing_job(
    inputDataS3Location='s3://(Amazon S3 bucket name)/(path to your input folder)',
    id='(a job ID for the new job)',
    processedDataS3Location='s3://(Amazon S3 bucket name)/(path to your output folder)',
    previousDataProcessingJobId='(the job ID of the previous data-processing job)'
)
print(response)
- awscurl
awscurl https://your-neptune-endpoint:port/ml/dataprocessing \
--region us-east-1 \
--service neptune-db \
-X POST \
-H 'Content-Type: application/json' \
-d '{
      "inputDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your input folder)",
      "id" : "(a job ID for the new job)",
      "processedDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your output folder)",
      "previousDataProcessingJobId" : "(the job ID of the previous data-processing job)"
    }'
This example assumes that your AWS credentials have been set up in your environment. Replace us-east-1 with the Region of your Neptune cluster.
- curl
curl \
-X POST https://your-neptune-endpoint:port/ml/dataprocessing \
-H 'Content-Type: application/json' \
-d '{
      "inputDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your input folder)",
      "id" : "(a job ID for the new job)",
      "processedDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your output folder)",
      "previousDataProcessingJobId" : "(the job ID of the previous data-processing job)"
    }'
Set the value of the previousDataProcessingJobId parameter to the job ID of the previous data-processing job that corresponds to the trained model.
Deletion of nodes in the updated graph is not currently supported. If nodes have been removed in an updated graph, you have to start a completely new data-processing job rather than using previousDataProcessingJobId.
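The reuse rule above can be made explicit in code. The build_processing_params helper below is hypothetical (not part of any Neptune SDK); it assembles the keyword arguments for start_ml_data_processing_job and refuses to pass previousDataProcessingJobId when nodes have been deleted from the updated graph:

```python
def build_processing_params(input_s3, output_s3, job_id,
                            previous_job_id=None, nodes_deleted=False):
    """Assemble kwargs for start_ml_data_processing_job.

    Hypothetical helper: enforces the rule that an updated graph with
    deleted nodes cannot reuse a previous job's processing method.
    """
    params = {
        'inputDataS3Location': input_s3,
        'id': job_id,
        'processedDataS3Location': output_s3,
    }
    if previous_job_id is not None:
        if nodes_deleted:
            # Node deletion is not supported with previousDataProcessingJobId;
            # a completely new data-processing job is required instead.
            raise ValueError(
                'Nodes were deleted: start a new data-processing job '
                'without previousDataProcessingJobId.')
        params['previousDataProcessingJobId'] = previous_job_id
    return params
```

The resulting dictionary can be passed directly as client.start_ml_data_processing_job(**params).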