Create a custom Docker container image for SageMaker and use it for model training in AWS Step Functions

Created by Julia Bluszcz (AWS), Neha Sharma (AWS), Aubrey Oosthuizen (AWS), Mohan Gowda Purushothama (AWS), and Mateusz Zaremba (AWS)

Environment: Production

Technologies: Machine learning and AI; DevOps

AWS services: Amazon ECR; Amazon SageMaker; AWS Step Functions

Summary

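This pattern shows how to create a Docker container image for Amazon SageMaker and use it to train a model in AWS Step Functions. By packaging custom algorithms in a container, you can run almost any code in the SageMaker environment, regardless of programming language, framework, or dependencies.

In the example SageMaker notebook provided, the custom Docker container image is stored in Amazon Elastic Container Registry (Amazon ECR). Step Functions then uses the container stored in Amazon ECR to run a Python processing script for SageMaker. Finally, the container exports the model to Amazon Simple Storage Service (Amazon S3).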

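Prerequisites and limitations

Prerequisites

An active AWS account
An AWS Identity and Access Management (IAM) role for SageMaker with Amazon S3 permissions
An IAM role for Step Functions
Familiarity with Python
Familiarity with the Amazon SageMaker Python SDK
Familiarity with the AWS Command Line Interface (AWS CLI)
Familiarity with the AWS SDK for Python (Boto3)
Familiarity with Amazon ECR
Familiarity with Docker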
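
The IAM role for Step Functions has to be assumable by the Step Functions service. As a minimal sketch, assuming you create that role yourself, the trust policy document could be built as shown here. The helper name build_trust_policy is hypothetical; states.amazonaws.com is the Step Functions service principal. The permissions the role needs (for example, to start SageMaker processing jobs) are not covered by this sketch.

```python
import json


def build_trust_policy(service_principal="states.amazonaws.com"):
    """Return an IAM trust policy that lets the given service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Serialize the policy so that it can be supplied when the role is created.
trust_policy_json = json.dumps(build_trust_policy(), indent=2)
print(trust_policy_json)
```

The resulting JSON string is the kind of value that the IAM CreateRole operation accepts as its AssumeRolePolicyDocument parameter.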

Product versions

AWS Step Functions Data Science SDK version 2.3.0
Amazon SageMaker Python SDK version 2.78.0

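Architecture

The following diagram shows an example workflow for creating a Docker container image for SageMaker and then using it for model training in Step Functions:

Workflow for creating a Docker container image for SageMaker to use for model training in Step Functions.

The diagram shows the following workflow:

A data scientist or DevOps engineer uses an Amazon SageMaker notebook to create a custom Docker container image.
The data scientist or DevOps engineer stores the Docker container image in an Amazon ECR private repository that is in a private registry.
The data scientist or DevOps engineer uses the Docker container to run a Python SageMaker processing job in a Step Functions workflow.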

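Automation and scale

The example SageMaker notebook in this pattern uses the ml.m5.xlarge notebook instance type. You can change the instance type to fit your use case. For more information about SageMaker notebook instance types, see Amazon SageMaker Pricing.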

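Tools

Amazon Elastic Container Registry (Amazon ECR) is a managed container image registry service that is secure, scalable, and reliable.
Amazon SageMaker is a managed machine learning (ML) service that helps you build and train ML models and then deploy them into a production-ready hosted environment.
Amazon SageMaker Python SDK is an open source library for training and deploying machine learning models on SageMaker.
AWS Step Functions is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications.
AWS Step Functions Data Science Python SDK is an open source library that helps you create Step Functions workflows that process and publish machine learning models.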

Epics

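Set up Amazon ECR and create a new private registry.

If you haven't already, set up Amazon ECR by following the instructions in Setting up with Amazon ECR in the Amazon ECR User Guide. Each AWS account is provided with a default private Amazon ECR registry.

Skills required: DevOps engineer

Create an Amazon ECR private repository.

Follow the instructions in Creating a private repository in the Amazon ECR User Guide.

Note: The repository that you create is where you store your custom Docker container images.

Skills required: DevOps engineer

Create a Dockerfile that includes the specifications needed to run your SageMaker processing job.

Create a Dockerfile that includes the specifications needed to run your SageMaker processing job. For instructions, see Adapting your own training container in the Amazon SageMaker Developer Guide. For more information about Dockerfiles, see the Dockerfile reference in the Docker documentation.

Example Jupyter notebook code cells to create a Dockerfile

Cell 1

# Make docker folder
!mkdir -p docker

Cell 2

%%writefile docker/Dockerfile
FROM python:3.7-slim-buster
RUN pip3 install pandas==0.25.3 scikit-learn==0.21.3
ENV PYTHONUNBUFFERED=TRUE
ENTRYPOINT ["python3"]

Skills required: DevOps engineer

Build your Docker container image and push it to Amazon ECR.

Build the container image from the Dockerfile that you created by running the docker build command. Then push the container image to Amazon ECR by running the docker push command. The AWS CLI is used to authenticate Docker to Amazon ECR. For more information, see Building and registering the container in Building your own algorithm container on GitHub.

Example Jupyter notebook code cells to build and register a Docker image

Important: Before you run the following cells, make sure that you have created a Dockerfile and stored it in the directory named docker. Also, make sure that you have created an Amazon ECR repository, and replace the ecr_repository value in the first cell with your repository's name.

Cell 1

import boto3
tag = ':latest'
account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.Session().region_name
ecr_repository = 'byoc'
image_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)

Cell 2

# Build docker image
!docker build -t $image_uri docker

Cell 3

# Authenticate to ECR
!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com

Cell 4

# Push docker image
!docker push $image_uri

Note: You must authenticate your Docker client to your private registry before you can use the docker push and docker pull commands. These commands push and pull images to and from the repositories in your registry.

Skills required: DevOps engineer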
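
The image URI and the registry host used for docker login follow the same naming scheme, so it can help to derive both from one place. The following is a minimal sketch of that scheme in plain Python, without calling AWS. The helper names build_image_uri and registry_host and the placeholder account ID are hypothetical, and the sketch assumes the standard amazonaws.com domain.

```python
import re


def build_image_uri(account_id, region, repository, tag="latest"):
    """Compose a private Amazon ECR image URI from its parts."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit string")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def registry_host(image_uri):
    """Return the registry host name, which is the target of docker login."""
    return image_uri.split("/", 1)[0]


# Placeholder values for illustration only.
example_uri = build_image_uri("123456789012", "us-east-1", "byoc")
print(example_uri)
print(registry_host(example_uri))
```

Keeping the tag as a separate argument avoids having to remember the leading colon that the notebook cell stores in the tag variable.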

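Create a Python script that includes your custom processing and model training logic.

Write the custom processing logic to run in your data processing script. Then, save it as a Python script named training.py.

For more information, see Bring your own model with SageMaker Script Mode on GitHub.

Example Python script that includes custom processing and model training logic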


%%writefile training.py
import os

from joblib import dump
from sklearn import datasets, svm


if __name__ == '__main__':
    # Load the built-in handwritten digits dataset
    digits = datasets.load_digits()

    # Create the classifier object
    clf = svm.SVC(gamma=0.001, C=100.)

    # Fit the model on every sample except the last one
    clf.fit(digits.data[:-1], digits.target[:-1])

    # Save the model in binary format to the processing output directory
    output_path = os.path.join('/opt/ml/processing/model', "model.joblib")
    dump(clf, output_path)

Skills required: Data scientist
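
The script writes the model to /opt/ml/processing/model, which exists only inside the processing container. To dry-run the saving logic on a local machine, one option is to make the output directory configurable. The following is a minimal sketch of that idea; the MODEL_OUTPUT_DIR environment variable and both helper names are hypothetical and are not part of SageMaker.

```python
import os
import tempfile

# Directory that the training script in this pattern writes to inside the container.
DEFAULT_MODEL_DIR = "/opt/ml/processing/model"


def resolve_model_path(filename="model.joblib", env=None):
    """Return the model path, honoring the hypothetical MODEL_OUTPUT_DIR override."""
    env = os.environ if env is None else env
    model_dir = env.get("MODEL_OUTPUT_DIR", DEFAULT_MODEL_DIR)
    return os.path.join(model_dir, filename)


def ensure_parent_dir(path):
    """Create the directory that will hold the file if it does not exist yet."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path


# Local dry run: point the override at a temporary directory and write a stub file.
with tempfile.TemporaryDirectory() as tmp:
    local_path = ensure_parent_dir(
        resolve_model_path(env={"MODEL_OUTPUT_DIR": os.path.join(tmp, "model")})
    )
    with open(local_path, "wb") as handle:
        handle.write(b"placeholder for the serialized model")
    wrote_file = os.path.isfile(local_path)

print(resolve_model_path(env={}))
print(wrote_file)
```

With no override set, the helper falls back to the container path, so the behavior inside SageMaker stays the same.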

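Create a Step Functions workflow that includes your SageMaker processing job as one of the steps.

Install and import the AWS Step Functions Data Science SDK, and upload the training.py file to Amazon S3. Then, use the Amazon SageMaker Python SDK to define a processing step in Step Functions.

Important: Make sure that you have created an IAM execution role for Step Functions in your AWS account.

Example environment setup and custom training script to upload to Amazon S3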


!pip install stepfunctions

import boto3
import stepfunctions
import sagemaker
import datetime

from stepfunctions import steps
from stepfunctions.inputs import ExecutionInput
from stepfunctions.steps import (
    Chain
)
from stepfunctions.workflow import Workflow
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket() 
role = sagemaker.get_execution_role()
prefix = 'byoc-training-model'

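# IAM execution role for Step Functions (see the prerequisites section).
# account_id is defined in the earlier notebook cell that builds image_uri.
workflow_execution_role = f"arn:aws:iam::{account_id}:role/AmazonSageMaker-StepFunctionsWorkflowExecutionRole"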

execution_input = ExecutionInput(
    schema={
        "PreprocessingJobName": str})


input_code = sagemaker_session.upload_data(
    "training.py",
    bucket=bucket,
    key_prefix="preprocessing.py",
)
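
For a single file, upload_data stores the object under key_prefix followed by the file name and returns the resulting S3 URI. With the values above, the key prefix preprocessing.py therefore acts as a folder-like prefix, not as the object's file name. The following sketch mirrors that naming rule in plain Python so that you can predict the value of input_code without calling AWS. The helper expected_upload_uri and the bucket name are hypothetical, and the rule is an assumption based on the SDK's behavior for single-file uploads.

```python
import os


def expected_upload_uri(local_path, bucket, key_prefix):
    """Predict the S3 URI of a single file uploaded under a key prefix."""
    file_name = os.path.basename(local_path)
    return f"s3://{bucket}/{key_prefix}/{file_name}"


# Placeholder bucket name for illustration only.
predicted_uri = expected_upload_uri("training.py", "my-bucket", "preprocessing.py")
print(predicted_uri)
```

A more descriptive prefix, such as a folder name for code artifacts, would work equally well as long as the same URI is passed to the processing input.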

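Example SageMaker processing step definition that uses a custom Amazon ECR image and Python script

Note: Make sure that you use the execution_input parameter to specify the job name. The parameter's value must be unique each time the job runs. Also, the training.py file is passed as an input to the ProcessingStep, which means that it is copied into the container. The ProcessingInput destination is the directory that contains the script path given as the second argument of container_entrypoint.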


script_processor = ScriptProcessor(command=['python3'],
                image_uri=image_uri,
                role=role,
                instance_count=1,
                instance_type='ml.m5.xlarge')


processing_step = steps.ProcessingStep(
    "training-step",
    processor=script_processor,
    job_name=execution_input["PreprocessingJobName"],
    inputs=[
        ProcessingInput(
            source=input_code,
            destination="/opt/ml/processing/input/code",
            input_name="code",
        ),
    ],
    outputs=[
        ProcessingOutput(
            source='/opt/ml/processing/model', 
            destination="s3://{}/{}".format(bucket, prefix), 
            output_name='byoc-example')
    ],
    container_entrypoint=["python3", "/opt/ml/processing/input/code/training.py"],
)
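
The note above says that the ProcessingInput destination and the script path in container_entrypoint must line up. One way to keep them consistent is to derive the entrypoint from the destination directory and the uploaded file's name. The following is a minimal sketch under that assumption; the helper names and the example bucket are hypothetical.

```python
import posixpath


def script_path_in_container(destination, s3_uri):
    """Join the input destination directory with the uploaded file's name."""
    # Container paths are POSIX paths regardless of the local operating system.
    return posixpath.join(destination, posixpath.basename(s3_uri))


def entrypoint_for(destination, s3_uri, interpreter="python3"):
    """Build a container entrypoint list that runs the copied script."""
    return [interpreter, script_path_in_container(destination, s3_uri)]


code_destination = "/opt/ml/processing/input/code"
entrypoint = entrypoint_for(
    code_destination, "s3://my-bucket/preprocessing.py/training.py"
)
print(entrypoint)
```

Passing the same code_destination value to ProcessingInput and to this helper removes the risk of the two strings drifting apart.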

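Example Step Functions workflow that runs a SageMaker processing job

Note: This example workflow includes only the SageMaker processing job step, not a complete Step Functions workflow. For a full example workflow, see Example notebooks in SageMaker in the AWS Step Functions Data Science SDK documentation.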


workflow_graph = Chain([processing_step])

workflow = Workflow(
    name="ProcessingWorkflow",
    definition=workflow_graph,
    role=workflow_execution_role
)

workflow.create()
# Execute workflow
execution = workflow.execute(
    inputs={
        "PreprocessingJobName": datetime.datetime.now().strftime("%Y%m%d-%H%M%S"),  # Each SageMaker processing job requires a unique name
    }
)
execution_output = execution.get_output(wait=True)

Skills required: Data scientist
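
A timestamp alone gives the same job name to two executions that start within the same second. The following is a minimal sketch of a name generator that adds a short random suffix and validates the result. The helper unique_job_name and the default prefix are hypothetical, and the pattern reflects an assumption about SageMaker's job-name constraint (up to 63 alphanumeric characters and hyphens, starting with an alphanumeric character).

```python
import datetime
import re
import uuid

# Assumed SageMaker job-name constraint; check the API reference before relying on it.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def unique_job_name(prefix="byoc-training", now=None):
    """Return a job name built from a prefix, a UTC timestamp, and a random suffix."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    timestamp = now.strftime("%Y%m%d-%H%M%S")
    suffix = uuid.uuid4().hex[:8]  # random part that separates same-second runs
    name = f"{prefix}-{timestamp}-{suffix}"
    if not JOB_NAME_PATTERN.match(name):
        raise ValueError(f"invalid job name: {name}")
    return name


job_name = unique_job_name()
print(job_name)
```

The returned string could be supplied as the PreprocessingJobName value in the inputs dictionary passed to workflow.execute.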
