
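How to use SageMaker XGBoost

With SageMaker, you can use XGBoost as a built-in algorithm or as a framework. With XGBoost as a framework, you have more flexibility and access to more advanced scenarios because you can customize your own training scripts. The following sections describe how to use XGBoost with the SageMaker Python SDK and explain the input/output interface for the XGBoost algorithm. For information on how to use XGBoost from the Amazon SageMaker Studio Classic UI, see SageMaker JumpStart pretrained models.

Use XGBoost as a framework

Use XGBoost as a framework to run customized training scripts that can incorporate additional data processing into your training jobs. In the following code example, the SageMaker Python SDK provides the XGBoost API as a framework. This works similarly to how SageMaker provides other framework APIs, such as MXNet, TensorFlow, and PyTorch.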


import boto3
import sagemaker
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.session import Session
from sagemaker.inputs import TrainingInput

# initialize hyperparameters
hyperparameters = {
        "max_depth":"5",
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "verbosity":"1",
        "objective":"reg:squarederror",
        "num_round":"50"}

# set an output path where the trained model will be saved
bucket = sagemaker.Session().default_bucket()
prefix = 'DEMO-xgboost-as-a-framework'
output_path = 's3://{}/{}/{}/output'.format(bucket, prefix, 'abalone-xgb-framework')

# construct a SageMaker XGBoost estimator
# specify the entry_point to your xgboost training script
estimator = XGBoost(entry_point = "your_xgboost_abalone_script.py", 
                    framework_version='1.7-1',
                    hyperparameters=hyperparameters,
                    role=sagemaker.get_execution_role(),
                    instance_count=1,
                    instance_type='ml.m5.2xlarge',
                    output_path=output_path)

# define the data type and paths to the training and validation datasets
content_type = "libsvm"
train_input = TrainingInput("s3://{}/{}/{}/".format(bucket, prefix, 'train'), content_type=content_type)
validation_input = TrainingInput("s3://{}/{}/{}/".format(bucket, prefix, 'validation'), content_type=content_type)

# execute the XGBoost training job
estimator.fit({'train': train_input, 'validation': validation_input})

For an end-to-end example of using SageMaker XGBoost as a framework, see Regression with Amazon SageMaker XGBoost.
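The entry_point script itself is not shown on this page. As a rough sketch only, a script-mode training script typically starts by reading the hyperparameters, which arrive as command-line arguments, and the data and model locations, which SageMaker exposes through environment variables such as SM_CHANNEL_TRAIN, SM_CHANNEL_VALIDATION, and SM_MODEL_DIR. The function name parse_args and the fallback paths are illustrative assumptions, not part of the SageMaker API.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided locations (illustrative sketch)."""
    parser = argparse.ArgumentParser()

    # Hyperparameters arrive as command-line arguments, for example: --max_depth 5
    parser.add_argument("--max_depth", type=int, default=5)
    parser.add_argument("--eta", type=float, default=0.2)
    parser.add_argument("--gamma", type=float, default=4.0)
    parser.add_argument("--min_child_weight", type=float, default=6.0)
    parser.add_argument("--subsample", type=float, default=0.7)
    parser.add_argument("--verbosity", type=int, default=1)
    parser.add_argument("--objective", type=str, default="reg:squarederror")
    parser.add_argument("--num_round", type=int, default=50)

    # Data channels and the model directory are read from environment variables.
    # The fallback paths are assumptions used when the variables are not set.
    parser.add_argument(
        "--train",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    parser.add_argument(
        "--validation",
        default=os.environ.get("SM_CHANNEL_VALIDATION", "/opt/ml/input/data/validation"),
    )
    parser.add_argument(
        "--model_dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )

    # Ignore any extra arguments instead of failing on them
    args, _unknown = parser.parse_known_args(argv)
    return args
```

In a complete script, the parsed values would then be passed to the XGBoost training call, and the trained booster would be saved under args.model_dir.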

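Use XGBoost as a built-in algorithm

Use the XGBoost built-in algorithm to build an XGBoost training container, as shown in the following code example. You can automatically look up the XGBoost built-in algorithm image URI using the SageMaker image_uris.retrieve API. If you use Amazon SageMaker Python SDK version 1, use the get_image_uri API instead. To make sure that the image_uris.retrieve API finds the correct URI, see Common parameters for built-in algorithms. Then look up xgboost in the full list of built-in algorithm image URIs and available Regions.

After specifying the XGBoost image URI, use the XGBoost container to construct an estimator with the SageMaker Estimator API and start a training job. This XGBoost built-in algorithm mode does not incorporate your own XGBoost training script and runs directly on the input datasets.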

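Important

When you retrieve the SageMaker XGBoost image URI, do not use :latest or :1 as the image URI tag. You must specify one of the supported versions to choose the SageMaker-managed XGBoost container with the native XGBoost package version that you want to use. To find the package version migrated into the SageMaker XGBoost containers, see Docker Registry Paths and Example Code. Then choose your AWS Region and navigate to the XGBoost (algorithm) section.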
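One way to guard against the tags called out above is a small check before the image URI is requested. This is a minimal sketch; check_xgboost_version_tag is a hypothetical helper, not part of the SageMaker Python SDK, and it only rejects the two tags named in the note rather than validating against the list of supported versions.

```python
def check_xgboost_version_tag(tag):
    """Return the tag without a leading colon, or raise if it is :latest or :1.

    Hypothetical helper for illustration only.
    """
    forbidden = {"latest", "1"}
    normalized = tag.lstrip(":")
    if normalized in forbidden:
        raise ValueError(
            f"Tag '{tag}' must not be used; specify a supported version such as '1.7-1'."
        )
    return normalized
```

The returned value could then be passed as the version argument of image_uris.retrieve.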


import sagemaker
import boto3
from sagemaker import image_uris
from sagemaker.session import Session
from sagemaker.inputs import TrainingInput

# initialize hyperparameters
hyperparameters = {
        "max_depth":"5",
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "objective":"reg:squarederror",
        "num_round":"50"}

# set an output path where the trained model will be saved
bucket = sagemaker.Session().default_bucket()
prefix = 'DEMO-xgboost-as-a-built-in-algo'
output_path = 's3://{}/{}/{}/output'.format(bucket, prefix, 'abalone-xgb-built-in-algo')

# look up the XGBoost image URI for the AWS Region of the current session
# specify the version tag depending on your preference
region = sagemaker.Session().boto_region_name
xgboost_container = sagemaker.image_uris.retrieve("xgboost", region, "1.7-1")

# construct a SageMaker estimator that calls the xgboost-container
estimator = sagemaker.estimator.Estimator(image_uri=xgboost_container, 
                                          hyperparameters=hyperparameters,
                                          role=sagemaker.get_execution_role(),
                                          instance_count=1, 
                                          instance_type='ml.m5.2xlarge', 
                                          volume_size=5, # 5 GB 
                                          output_path=output_path)

# define the data type and paths to the training and validation datasets
content_type = "libsvm"
train_input = TrainingInput("s3://{}/{}/{}/".format(bucket, prefix, 'train'), content_type=content_type)
validation_input = TrainingInput("s3://{}/{}/{}/".format(bucket, prefix, 'validation'), content_type=content_type)

# execute the XGBoost training job
estimator.fit({'train': train_input, 'validation': validation_input})

For more information about how to set up XGBoost as a built-in algorithm, see the following notebook examples.

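Input/Output interface for the XGBoost algorithm

Gradient boosting operates on tabular data, with the rows representing observations, one column representing the target variable or label, and the remaining columns representing features.

The SageMaker implementation of XGBoost supports the following data formats for training and inference: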

text/libsvm (預設值)
text/csv
application/x-parquet
application/x-recordio-protobuf

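Note

There are a few considerations to be aware of regarding training and inference input:

For increased performance, we recommend using XGBoost with File mode, in which your data from Amazon S3 is stored on the training instance volumes.
For training with columnar input, the algorithm assumes that the target variable (label) is the first column. For inference, the algorithm assumes that the input has no label column.
For CSV data, the input should not have a header record.
For LIBSVM training, the algorithm assumes that the columns after the label column contain the zero-based index value pairs for features. So each row has the format: <label> <index0>:<value0> <index1>:<value1>.
For information on instance types and distributed training, see EC2 instance recommendation for the XGBoost algorithm.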
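The row layouts described in the considerations above can be sketched as two small formatting functions. The function names are hypothetical, and skipping zero-valued features in the LIBSVM row is an assumption based on the usual sparse LIBSVM convention rather than something this page requires.

```python
def to_libsvm_row(label, features):
    """Format one training row as '<label> <index0>:<value0> <index1>:<value1>'.

    Feature indices are zero-based. Zero-valued features are omitted
    (an assumption, following the common sparse LIBSVM convention).
    """
    pairs = [f"{i}:{v}" for i, v in enumerate(features) if v != 0]
    return " ".join([str(label)] + pairs)


def to_csv_row(features, label=None):
    """Format one CSV row without a header.

    For training, pass the label so that it becomes the first column.
    For inference, leave label as None so that no label column is written.
    """
    values = list(features) if label is None else [label] + list(features)
    return ",".join(str(v) for v in values)
```

For example, to_libsvm_row(15, [0.455, 0, 0.095]) gives the row 15 0:0.455 2:0.095, and to_csv_row([0.455, 0, 0.095], label=15) gives 15,0.455,0,0.095.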

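For CSV training input mode, the total memory available to the algorithm must be able to hold the training dataset. The total memory available is calculated as Instance Count * the memory available in the InstanceType. For libsvm training input mode, this is not required, but we recommend it.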
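The memory rule above is simple arithmetic, sketched below. The helper name fits_in_memory is hypothetical, and the per-instance memory is a value that you supply for your chosen instance type. The comparison uses the dataset size as given, so treat it as a lower bound: the in-memory footprint of a dataset can be larger than its size on disk.

```python
GIB = 1024 ** 3


def fits_in_memory(dataset_bytes, instance_count, memory_per_instance_gib):
    """Check the dataset size against Instance Count * memory per instance.

    Hypothetical helper: memory_per_instance_gib is supplied by the caller.
    """
    total_bytes = instance_count * memory_per_instance_gib * GIB
    return dataset_bytes <= total_bytes
```

For example, assuming 32 GiB per instance, a 40 GiB dataset does not fit on one instance but does fit across two.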

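For v1.3-1 and later, SageMaker XGBoost saves the model in the XGBoost internal binary format, using Booster.save_model. Previous versions use the Python pickle module to serialize and deserialize the model.

Note

Be mindful of versions when using a SageMaker XGBoost model in open source XGBoost. Versions 1.3-1 and later use the XGBoost internal binary format, while previous versions use the Python pickle module.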
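The version rule above can be expressed as a small check that decides which loading path applies. This is a sketch under the assumption that the version is written in the same form as the container tags on this page, such as 1.7-1 or 1.2-2; both function names are hypothetical.

```python
def parse_sagemaker_xgboost_version(version):
    """Turn a tag such as '1.3-1' into a comparable tuple such as (1, 3, 1)."""
    upstream, _, build = version.partition("-")
    parts = [int(p) for p in upstream.split(".")]
    # A tag without a build suffix is treated as build 0 (an assumption)
    parts.append(int(build) if build else 0)
    return tuple(parts)


def uses_binary_format(version):
    """Return True for v1.3-1 and later (binary format), False for earlier (pickle)."""
    return parse_sagemaker_xgboost_version(version) >= (1, 3, 1)
```

When uses_binary_format returns True, load the model with Booster.load_model; otherwise, load it with the pickle module.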

To use a model trained with SageMaker XGBoost v1.3-1 or later in open source XGBoost

Use the following Python code:


import xgboost as xgb

xgb_model = xgb.Booster()
xgb_model.load_model(model_file_path)
xgb_model.predict(dtest)

To use a model trained with previous versions of SageMaker XGBoost in open source XGBoost

Use the following Python code:


import pickle as pkl 
import tarfile

# extract the model artifact
with tarfile.open('model.tar.gz', 'r:gz') as t:
    t.extractall()

# load the pickled model
with open(model_file_path, 'rb') as f:
    model = pkl.load(f)

# prediction with test data
pred = model.predict(dtest)

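To differentiate the importance of labeled data points, use instance weight support

SageMaker XGBoost allows customers to differentiate the importance of labeled data points by assigning each instance a weight value. For text/libsvm input, customers can assign weight values to data instances by attaching them after the labels. For example: label:weight idx_0:val_0 idx_1:val_1.... For text/csv input, customers need to turn on the csv_weights flag in the parameters and attach the weight values in the column after the labels. For example: label,weight,val_0,val_1,....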
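The two weighted layouts above can be sketched as follows. The function names are hypothetical, zero-valued features are omitted from the LIBSVM row as an assumption based on the usual sparse convention, and setting csv_weights to "1" reflects the flag being turned on.

```python
def to_weighted_libsvm_row(label, weight, features):
    """Format a row as 'label:weight idx_0:val_0 idx_1:val_1 ...'."""
    pairs = [f"{i}:{v}" for i, v in enumerate(features) if v != 0]
    return " ".join([f"{label}:{weight}"] + pairs)


def to_weighted_csv_row(label, weight, features):
    """Format a row as 'label,weight,val_0,val_1,...'."""
    return ",".join(str(v) for v in [label, weight, *features])


# For text/csv input, the csv_weights flag also has to be turned on
# in the hyperparameters passed to the estimator.
weighted_csv_hyperparameters = {"csv_weights": "1"}
```

For example, to_weighted_libsvm_row(1, 0.5, [2.0, 0, 3.5]) gives 1:0.5 0:2.0 2:3.5, and to_weighted_csv_row(1, 0.5, [2.0, 0, 3.5]) gives 1,0.5,2.0,0,3.5.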
