

本文為英文版的機器翻譯版本，如內容有任何歧義或不一致之處，概以英文版為準。

# 處理從 Neptune 匯出用於訓練的圖形資料
<a name="machine-learning-on-graphs-processing"></a>

資料處理步驟會取得匯出程序所建立的 Neptune 圖形資料，並建立 [Deep Graph Library (DGL)](https://www.dgl.ai/) 在訓練期間使用的資訊。這包括執行各種資料對應和轉換：
+ 解析節點和邊緣以建構 DGL 所需的圖形和 ID 對應檔案。
+ 將節點和邊緣屬性轉換為 DGL 所需的節點和邊緣特徵。
+ 將資料分割為訓練、驗證和測試集。

## 管理 Neptune ML 的資料處理步驟
<a name="machine-learning-on-graphs-processing-managing"></a>

從 Neptune 匯出您想要用於模型訓練的資料之後，您可以使用如下所示的命令啟動資料處理任務：

------
#### [ AWS CLI ]

```
aws neptunedata start-ml-data-processing-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --input-data-s3-location "s3://(S3 bucket name)/(path to your input folder)" \
  --id "(a job ID for the new job)" \
  --processed-data-s3-location "s3://(S3 bucket name)/(path to your output folder)" \
  --config-file-name "training-job-configuration.json"
```

如需詳細資訊，請參閱《 AWS CLI 命令參考》中的 [start-ml-data-processing-job](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/start-ml-data-processing-job.html)。

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.start_ml_data_processing_job(
    inputDataS3Location='s3://(S3 bucket name)/(path to your input folder)',
    id='(a job ID for the new job)',
    processedDataS3Location='s3://(S3 bucket name)/(path to your output folder)',
    configFileName='training-job-configuration.json'
)

print(response)
```

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/ml/dataprocessing \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "inputDataS3Location" : "s3://(S3 bucket name)/(path to your input folder)",
        "id" : "(a job ID for the new job)",
        "processedDataS3Location" : "s3://(S3 bucket name)/(path to your output folder)",
        "configFileName" : "training-job-configuration.json"
      }'
```

**注意**  
此範例假設您的 AWS 登入資料已在您的環境中設定。將 *us-east-1* 取代為 Neptune 叢集的區域。

------
#### [ curl ]

```
curl \
  -X POST https://your-neptune-endpoint:port/ml/dataprocessing \
  -H 'Content-Type: application/json' \
  -d '{
        "inputDataS3Location" : "s3://(S3 bucket name)/(path to your input folder)",
        "id" : "(a job ID for the new job)",
        "processedDataS3Location" : "s3://(S3 bucket name)/(path to your output folder)",
        "configFileName" : "training-job-configuration.json"
      }'
```

------

如何使用此命令的詳細資訊會在 [dataprocessing 命令](machine-learning-api-dataprocessing.md) 中加以說明，伴隨如何取得執行中工作狀態、如何停止執行中工作，以及如何列出所有執行中工作的相關資訊。

## 處理 Neptune ML 的更新圖形資料
<a name="machine-learning-on-graphs-processing-updated"></a>

您也可以將 `previousDataProcessingJobId` 提供給 API，以確保新的資料處理工作使用與先前工作相同的處理方法。當您想要透過對新資料重新訓練舊模型，或對新資料上重新計算模型成品，以在 Neptune 中取得更新圖形資料的預測時，這是必要的。

您可以使用如下所示的命令來執行此操作：

------
#### [ AWS CLI ]

```
aws neptunedata start-ml-data-processing-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --input-data-s3-location "s3://(Amazon S3 bucket name)/(path to your input folder)" \
  --id "(a job ID for the new job)" \
  --processed-data-s3-location "s3://(Amazon S3 bucket name)/(path to your output folder)" \
  --previous-data-processing-job-id "(the job ID of the previous data-processing job)"
```

如需詳細資訊，請參閱《 AWS CLI 命令參考》中的 [start-ml-data-processing-job](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/start-ml-data-processing-job.html)。

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.start_ml_data_processing_job(
    inputDataS3Location='s3://(Amazon S3 bucket name)/(path to your input folder)',
    id='(a job ID for the new job)',
    processedDataS3Location='s3://(Amazon S3 bucket name)/(path to your output folder)',
    previousDataProcessingJobId='(the job ID of the previous data-processing job)'
)

print(response)
```

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/ml/dataprocessing \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "inputDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your input folder)",
        "id" : "(a job ID for the new job)",
        "processedDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your output folder)",
        "previousDataProcessingJobId" : "(the job ID of the previous data-processing job)"
      }'
```

**注意**  
此範例假設您的 AWS 登入資料已在您的環境中設定。將 *us-east-1* 取代為 Neptune 叢集的區域。

------
#### [ curl ]

```
curl \
  -X POST https://your-neptune-endpoint:port/ml/dataprocessing \
  -H 'Content-Type: application/json' \
  -d '{
        "inputDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your input folder)",
        "id" : "(a job ID for the new job)",
        "processedDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your output folder)",
        "previousDataProcessingJobId" : "(the job ID of the previous data-processing job)"
      }'
```

------

將 `previousDataProcessingJobId` 參數值設定為對應至訓練模型之先前資料處理工作的工作 ID。

**注意**  
目前不支援更新圖形中的節點刪除。如果節點已在更新圖形中移除，您必須啟動全新的資料處理工作，而不是使用 `previousDataProcessingJobId`。