Adapt your own inference container for Amazon SageMaker AI

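If you can't use any of the images listed in Pre-built SageMaker AI Docker images for your use case, you can build your own Docker container and use it in SageMaker AI for training and inference. To be compatible with SageMaker AI, your container must have the following characteristics:

- Your container must have a web server listening on port 8080.
- Your container must accept POST requests to the /invocations real-time endpoint and GET requests to the /ping health check endpoint. Requests sent to these endpoints must return within 60 seconds and have a maximum size of 6 MB.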
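
As a rough illustration of the contract above, the following sketch routes a single request the way a compatible container is expected to. It is not SageMaker AI code: `handle_request` is a hypothetical function, and echoing the input back stands in for real model inference.

```python
import json


def handle_request(method, path, body=b""):
    """Return (status_code, response_bytes) for one HTTP request.

    Hypothetical stand-in for a model server: /invocations just echoes
    the "input" field instead of running a model.
    """
    if method == "GET" and path == "/ping":
        # Health check: an empty 200 response signals the container is ready.
        return 200, b""
    if method == "POST" and path == "/invocations":
        request = json.loads(body)
        return 200, json.dumps({"output": request["input"]}).encode("utf-8")
    # Anything else is outside the contract.
    return 404, b""
```

In a real container these two routes are served by a web server bound to port 8080, as the rest of this guide shows with NGINX, Gunicorn, and Flask.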

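For more information and an example of how to build your own Docker container for training and inference with SageMaker AI, see Building your own algorithm container.

The following guide shows you how to use a JupyterLab space with Amazon SageMaker Studio Classic to adapt an inference container to work with SageMaker AI hosting. The example uses an NGINX web server, Gunicorn as a Python web server gateway interface, and Flask as a web application framework. You can use different applications to adapt your container as long as it meets the previously listed requirements. For more information about using your own inference code, see Custom Inference Code with Hosting Services.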

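Adapt your inference container

Use the following steps to adapt your own inference container to work with SageMaker AI hosting. The example shown in the following steps uses a pre-trained Named Entity Recognition (NER) model that uses the spaCy natural language processing (NLP) library for Python, together with the following:

- A Dockerfile to build the container that contains the NER model.
- Inference scripts to serve the NER model.

If you adapt this example for your use case, you must use the Dockerfile and inference scripts that are needed to deploy and serve your model.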

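Create a JupyterLab space with Amazon SageMaker Studio Classic (optional).

You can use any notebook to run the scripts that adapt your inference container for SageMaker AI hosting. This example shows you how to use a JupyterLab space within Amazon SageMaker Studio Classic to launch a JupyterLab application that comes with a SageMaker AI Distribution image. For more information, see SageMaker JupyterLab.

Upload a Dockerfile and inference scripts.

Create a new folder in your home directory. If you're using JupyterLab, choose the New Folder icon in the upper-left corner, and enter a folder name to contain your Dockerfile. In this example, the folder is called docker_test_folder.

Upload a Dockerfile text file into your new folder. The following is an example Dockerfile that creates a Docker container with a pre-trained Named Entity Recognition (NER) model from spaCy, along with the applications and environment variables needed to run the example: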


FROM python:3.8

RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         python3 \
         nginx \
         ca-certificates \
    && rm -rf /var/lib/apt/lists/*

RUN wget https://bootstrap.pypa.io/get-pip.py && python3 get-pip.py && \
    pip install flask gevent gunicorn && \
        rm -rf /root/.cache

# Pre-trained model package installation
RUN pip install spacy
# Download the model by its full package name, which is what predictor.py loads
RUN python -m spacy download en_core_web_sm


# Set environment variables
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

COPY NER /opt/program
WORKDIR /opt/program

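In the previous code example, the environment variable PYTHONUNBUFFERED keeps Python from buffering the standard output stream, which allows for faster delivery of logs to the user. The environment variable PYTHONDONTWRITEBYTECODE keeps Python from writing compiled bytecode .pyc files, which are unnecessary for this use case. The environment variable PATH is used to identify the location of the train and serve programs when the container is invoked.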
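
Because the serve program is located through PATH, it has to sit in a directory on PATH and be marked executable. The sketch below mimics that lookup with the standard library's `shutil.which`; `find_program` is a hypothetical helper written for this illustration, not part of SageMaker AI.

```python
import os
import shutil


def find_program(name, program_dir, base_path=""):
    """Resolve `name` the way a PATH lookup would, searching program_dir first.

    Returns the full path, or None when the file is missing or not executable.
    """
    search_path = os.pathsep.join(p for p in (program_dir, base_path) if p)
    return shutil.which(name, path=search_path)
```

For the Dockerfile above, `find_program("serve", "/opt/program")` returning `None` inside the container would suggest that the script was not copied or is missing its executable bit.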

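Create a new directory inside your new folder to contain the scripts that serve your model. This example uses a directory called NER, which contains the following scripts necessary to run this example:
- predictor.py – A Python script that contains the logic to load your model and perform inference with it.
- nginx.conf – A script to configure the web server.
- serve – A script that starts the inference server.
- wsgi.py – A helper script to serve the model.
Important
If you copy your inference scripts into a notebook ending in .ipynb and rename them, the scripts may contain formatting characters that prevent your endpoint from deploying. Instead, create a text file and rename it.

Upload a script to make your model available for inference. The following is an example script called predictor.py that uses Flask to provide the /ping and /invocations endpoints: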


from flask import Flask
import flask
import spacy
import os
import json
import logging

# Load the model
nlp = spacy.load('en_core_web_sm')
# If you plan to use your own model artifacts,
# your model artifacts should be stored in /opt/ml/model/


# The flask app for serving predictions
app = Flask(__name__)
@app.route('/ping', methods=['GET'])
def ping():
    # Check if the classifier was loaded correctly
    health = nlp is not None
    status = 200 if health else 404
    return flask.Response(response= '\n', status=status, mimetype='application/json')


@app.route('/invocations', methods=['POST'])
def transformation():
    
    #Process input
    input_json = flask.request.get_json()
    resp = input_json['input']
    
    #NER
    doc = nlp(resp)
    entities = [(X.text, X.label_) for X in doc.ents]

    # Transform predictions to JSON
    result = {
        'output': entities
        }

    resultjson = json.dumps(result)
    return flask.Response(response=resultjson, status=200, mimetype='application/json')

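In the previous script example, the /ping endpoint returns a status code of 200 if the model is loaded correctly, and 404 if the model is loaded incorrectly. The /invocations endpoint processes a request formatted in JSON, extracts the input field, and uses the NER model to identify entities and store them in the variable entities. The Flask application returns the response that contains these entities. For more information about these required health requests, see How Your Container Should Respond to Health Check (Ping) Requests.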
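
The example script assumes every request body is a JSON object with an input field. If you want to reject malformed requests explicitly, a validation step along the following lines could run before the model is called. This is a sketch only: `extract_text` is a hypothetical helper, and mapping its `ValueError` to an HTTP 400 response is left to the caller.

```python
import json


def extract_text(raw_body):
    """Return the "input" string from a JSON request body.

    Raises ValueError when the body is not JSON or lacks a string "input".
    """
    try:
        data = json.loads(raw_body)
    except (TypeError, ValueError) as exc:
        raise ValueError("request body is not valid JSON") from exc
    if not isinstance(data, dict) or not isinstance(data.get("input"), str):
        raise ValueError("expected a JSON object with a string 'input' field")
    return data["input"]
```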

Upload a script to start your inference server. The following script example, called serve, uses Gunicorn as an application server and Nginx as a web server:


#!/usr/bin/env python

# This file implements the scoring service shell. You don't necessarily need to modify it for various
# algorithms. It starts nginx and gunicorn with the correct configurations and then simply waits until
# gunicorn exits.
#
# The flask server is specified to be the app object in wsgi.py
#
# We set the following parameters:
#
# Parameter                Environment Variable              Default Value
# ---------                --------------------              -------------
# number of workers        MODEL_SERVER_WORKERS              the number of CPU cores
# timeout                  MODEL_SERVER_TIMEOUT              60 seconds

import multiprocessing
import os
import signal
import subprocess
import sys

cpu_count = multiprocessing.cpu_count()

model_server_timeout = os.environ.get('MODEL_SERVER_TIMEOUT', 60)
model_server_workers = int(os.environ.get('MODEL_SERVER_WORKERS', cpu_count))

def sigterm_handler(nginx_pid, gunicorn_pid):
    try:
        os.kill(nginx_pid, signal.SIGQUIT)
    except OSError:
        pass
    try:
        os.kill(gunicorn_pid, signal.SIGTERM)
    except OSError:
        pass

    sys.exit(0)

def start_server():
    print('Starting the inference server with {} workers.'.format(model_server_workers))


    # link the log streams to stdout/err so they will be logged to the container logs
    subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])
    subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])

    nginx = subprocess.Popen(['nginx', '-c', '/opt/program/nginx.conf'])
    gunicorn = subprocess.Popen(['gunicorn',
                                 '--timeout', str(model_server_timeout),
                                 '-k', 'sync',
                                 '-b', 'unix:/tmp/gunicorn.sock',
                                 '-w', str(model_server_workers),
                                 'wsgi:app'])

    signal.signal(signal.SIGTERM, lambda a, b: sigterm_handler(nginx.pid, gunicorn.pid))

    # Exit the inference server upon exit of either subprocess
    pids = set([nginx.pid, gunicorn.pid])
    while True:
        pid, _ = os.wait()
        if pid in pids:
            break

    sigterm_handler(nginx.pid, gunicorn.pid)
    print('Inference server exiting')

# The main routine to invoke the start function.

if __name__ == '__main__':
    start_server()

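The previous script example defines a signal handler function, sigterm_handler, which shuts down the Nginx and Gunicorn sub-processes when it receives a SIGTERM signal. The start_server function registers the signal handler, starts and monitors the Nginx and Gunicorn sub-processes, and captures the log streams.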
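
To make the effect of the two environment variables easier to see, the following sketch isolates how the Gunicorn command line is assembled. `gunicorn_command` is a hypothetical refactoring of the logic in serve, taking the environment as a plain mapping so it can be tried without starting any process.

```python
import multiprocessing


def gunicorn_command(env):
    """Build the Gunicorn argument list from an environment mapping."""
    # Environment values arrive as strings, so convert them explicitly.
    timeout = int(env.get("MODEL_SERVER_TIMEOUT", 60))
    workers = int(env.get("MODEL_SERVER_WORKERS", multiprocessing.cpu_count()))
    if workers < 1:
        raise ValueError("MODEL_SERVER_WORKERS must be at least 1")
    return [
        "gunicorn",
        "--timeout", str(timeout),
        "-k", "sync",
        "-b", "unix:/tmp/gunicorn.sock",
        "-w", str(workers),
        "wsgi:app",
    ]
```

Calling `gunicorn_command(os.environ)` inside the container would produce the same arguments that serve passes to `subprocess.Popen`.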

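Upload a script to configure your web server. The following script example, called nginx.conf, configures an Nginx web server that uses Gunicorn as an application server to serve your model for inference: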


worker_processes 1;
daemon off; # Prevent forking


pid /tmp/nginx.pid;
error_log /var/log/nginx/error.log;

events {
  # defaults
}

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  access_log /var/log/nginx/access.log combined;
  
  upstream gunicorn {
    server unix:/tmp/gunicorn.sock;
  }

  server {
    listen 8080 deferred;
    client_max_body_size 5m;

    keepalive_timeout 5;
    proxy_read_timeout 1200s;

    location ~ ^/(ping|invocations) {
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_redirect off;
      proxy_pass http://gunicorn;
    }

    location / {
      return 404 "{}";
    }
  }
}

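The previous script example configures Nginx to run in the foreground, sets the location where the error_log is captured, and defines upstream as the Gunicorn server's Unix socket. The server block listens on port 8080 and sets limits on the client request body size and timeout values. The server block forwards requests whose path starts with /ping or /invocations to the Gunicorn server at http://gunicorn, and returns a 404 error for all other paths.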
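
The location pattern `^/(ping|invocations)` is anchored only at the start of the path, so it is a prefix match. The following sketch reproduces the pattern with Python's `re` module to show which paths would be forwarded; `is_forwarded` is a hypothetical helper and only approximates Nginx, which also normalizes the URI before matching.

```python
import re

# Same regular expression as the `location ~` directive in nginx.conf.
FORWARDED = re.compile(r"^/(ping|invocations)")


def is_forwarded(path):
    """Return True when Nginx would proxy this path to Gunicorn."""
    return FORWARDED.search(path) is not None
```

Because nothing anchors the end of the pattern, a path such as /pingpong would also be proxied, and Flask would then answer it with its own 404.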

Upload any other scripts needed to serve your model. This example needs the following example script called wsgi.py to help Gunicorn find your application:


import predictor as myapp

# This is just a simple wrapper for gunicorn to find your app.
# If you want to change the algorithm file, simply change "predictor" above to the
# new file.

app = myapp.app

From the folder docker_test_folder, your directory structure should contain a Dockerfile and the folder NER. The NER folder should contain the files nginx.conf, predictor.py, serve, and wsgi.py, as follows:

The Dockerfile structure has inference scripts under the NER directory next to the Dockerfile.
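
Before building, it can help to confirm that every file is where the Dockerfile expects it. The following sketch checks the layout described above; `missing_files` is a hypothetical helper, and the file list simply mirrors this example.

```python
from pathlib import Path

# Files this example expects, relative to docker_test_folder.
EXPECTED = [
    "Dockerfile",
    "NER/nginx.conf",
    "NER/predictor.py",
    "NER/serve",
    "NER/wsgi.py",
]


def missing_files(root):
    """Return the expected files that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]
```

An empty list from `missing_files("docker_test_folder")` means the layout matches.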

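Build your own container.

From the folder docker_test_folder, build your Docker container. The following example command builds the Docker container that is configured in your Dockerfile: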
```
! docker build -t byo-container-test .
```
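The previous command builds a container called byo-container-test from the current working directory. For more information about the Docker build parameters, see Build arguments.
Note
If you get the following error message saying that Docker cannot find the Dockerfile, make sure that the Dockerfile has the correct name and has been saved to the directory.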
```
unable to prepare context: unable to evaluate symlinks in Dockerfile path:
lstat /home/ec2-user/SageMaker/docker_test_folder/Dockerfile: no such file or directory
```
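Docker looks for a file specifically called Dockerfile, without any extension, in the current directory. If you named it something else, you can pass in the file name manually with the -f flag. For example, if you named your Dockerfile Dockerfile-text.txt, build your Docker container using the -f flag followed by your file name, as follows: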
```
! docker build -t byo-container-test -f Dockerfile-text.txt .
```

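Push your Docker image to Amazon Elastic Container Registry (Amazon ECR).

In a notebook cell, push your Docker image to ECR. The following code example shows you how to build your container locally, log in, and push it to ECR: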


%%sh
# Name of algo -> ECR
algorithm_name=sm-pretrained-spacy

#make serve executable
chmod +x NER/serve
account=$(aws sts get-caller-identity --query Account --output text)
# Region, defaults to us-east-1
region=$(aws configure get region)
region=${region:-us-east-1}
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"
# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1
if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi
# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}
# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

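The previous example performs the following steps, which are necessary to push the example Docker container to ECR:

- Define the algorithm name as sm-pretrained-spacy.
- Make the serve file inside the NER folder executable.
- Set the AWS Region.
- Create an ECR repository if it doesn't already exist.
- Log in to ECR.
- Build the Docker container locally.
- Push the Docker image to ECR.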
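
The image name that the shell script assembles follows a fixed pattern, and the same string is needed again later when the model is created. The following sketch builds it in Python; `ecr_image_uri` is a hypothetical helper, and the host name format assumes a standard commercial AWS Region.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Return (registry_host, full_image_uri) for an ECR repository."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return registry, f"{registry}/{repository}:{tag}"
```

For example, `ecr_image_uri("123456789012", "us-east-2", "sm-pretrained-spacy")[1]` yields the same URI as the fullname variable in the script.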

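Set up the SageMaker AI client

If you want to use SageMaker AI hosting services for inference, you must create a model, create an endpoint configuration, and create an endpoint. To get inferences from your endpoint, you can use the SageMaker AI boto3 Runtime client to invoke your endpoint. The following code shows you how to set up both the SageMaker AI client and the SageMaker Runtime client using boto3: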
```
import boto3
from sagemaker import get_execution_role

sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')

account_id = boto3.client('sts').get_caller_identity()['Account']
region = boto3.Session().region_name

#used to store model artifacts which SageMaker AI will extract to /opt/ml/model in the container, 
#in this example case we will not be making use of S3 to store the model artifacts
#s3_bucket = '<S3Bucket>'

role = get_execution_role()
```
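In the previous code example, the Amazon S3 bucket is not used, but it is included as a comment to show how you would store model artifacts.

If you receive a permission error after you run the previous code example, you may need to add permissions to your IAM role. For more information about IAM roles, see Amazon SageMaker Role Manager. For more information about adding permissions to your current role, see AWS managed policies for Amazon SageMaker AI.

Create your model.

If you want to use SageMaker AI hosting services for inference, you must create a model in SageMaker AI. The following code example shows you how to create the spaCy NER model in SageMaker AI: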


from time import gmtime, strftime

model_name = 'spacy-nermodel-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
# MODEL S3 URL containing model artifacts as either model.tar.gz or extracted artifacts.
# Here we are not using Amazon S3 to store model artifacts, so the URL stays commented out.
#model_url = 's3://{}/spacy/'.format(s3_bucket)

container = '{}.dkr.ecr.{}.amazonaws.com/sm-pretrained-spacy:latest'.format(account_id, region)
instance_type = 'ml.c5d.18xlarge'

print('Model name: ' + model_name)
#print('Model data Url: ' + model_url)
print('Container image: ' + container)

container = {
'Image': container
}

create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    Containers = [container])

print("Model Arn: " + create_model_response['ModelArn'])

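The previous code example shows how to define model_url from s3_bucket if you use the Amazon S3 bucket from the comments in step 5 (setting up the SageMaker AI client), and how to define the ECR URI for the container image. The example defines ml.c5d.18xlarge as the instance type. You can also choose a different instance type. For more information about available instance types, see Amazon EC2 instance types.

In the previous code example, the Image key points to the container image URI. The create_model_response definition calls the create_model method with the model name, the role, and a list containing the container information, and the response it returns includes the model ARN.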
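
If you do keep model artifacts in Amazon S3, the container definition gains a ModelDataUrl key next to Image. The following sketch shows one way to build either variant; `container_definition` is a hypothetical helper, and the S3 URL in the usage note is a placeholder.

```python
def container_definition(image_uri, model_data_url=None):
    """Build one entry for the Containers list passed to create_model."""
    definition = {"Image": image_uri}
    if model_data_url is not None:
        # SageMaker AI extracts these artifacts to /opt/ml/model in the container.
        definition["ModelDataUrl"] = model_data_url
    return definition
```

With artifacts in S3, the call could look like `container_definition(image_uri, "s3://<S3Bucket>/spacy/model.tar.gz")`; without them, the result matches the dictionary used in the example above.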

Example output from the previous script follows:


Model name: spacy-nermodel-YYYY-MM-DD-HH-MM-SS
Container image: 123456789012.dkr.ecr.us-east-2.amazonaws.com/sm-pretrained-spacy:latest
Model Arn: arn:aws:sagemaker:us-east-2:123456789012:model/spacy-nermodel-YYYY-MM-DD-HH-MM-SS

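Configure and create an endpoint

To use SageMaker AI hosting for inference, you must also configure and create an endpoint. SageMaker AI will use this endpoint for inference. The following configuration example shows how to generate and configure an endpoint with the instance type and model name that you defined previously: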


endpoint_config_name = 'spacy-ner-config' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint config name: ' + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': instance_type,
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])
        
print("Endpoint config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

In the previous configuration example, create_endpoint_config_response associates the model_name with endpoint_config_name, a unique endpoint configuration name that is created with a timestamp.
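
The timestamp suffix keeps names unique across repeated runs. The following sketch wraps that pattern in a helper and checks the result against the naming rule as I understand it (letters, digits, and hyphens, at most 63 characters); treat `NAME_PATTERN` as an assumption to verify against the API reference, and `timestamped_name` as a hypothetical helper.

```python
import re
from time import gmtime, strftime

# Assumed naming rule: alphanumeric start, hyphens allowed, 63 characters max.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def timestamped_name(prefix, now=None):
    """Return '<prefix>-<UTC timestamp>' or raise ValueError if it looks invalid."""
    moment = now if now is not None else gmtime()
    name = f"{prefix}-{strftime('%Y-%m-%d-%H-%M-%S', moment)}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"invalid resource name: {name}")
    return name
```

Unlike the example above, this helper puts a hyphen between the prefix and the timestamp, which makes the generated names easier to read.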

Example output from the previous script follows:


Endpoint config name: spacy-ner-configYYYY-MM-DD-HH-MM-SS
Endpoint config Arn: arn:aws:sagemaker:us-east-2:123456789012:endpoint-config/spacy-ner-configYYYY-MM-DD-HH-MM-SS

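For more information about endpoint errors, see Why does my Amazon SageMaker AI endpoint go into the failed state when I create or update an endpoint?

Create an endpoint and wait for the endpoint to be in service.

The following code example creates the endpoint using the configuration from the previous configuration example and deploys the model: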


%%time

import time

endpoint_name = 'spacy-ner-endpoint' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint name: ' + endpoint_name)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Endpoint Status: " + status)

print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

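In the previous code example, the create_endpoint method creates the endpoint with the endpoint name generated earlier in the example, and prints the Amazon Resource Name of the endpoint. The describe_endpoint method returns information about the endpoint and its status. A SageMaker AI waiter then waits for the endpoint to be in service.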
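
If you would rather surface the failure reason yourself than rely on the built-in waiter, a manual polling loop is one alternative. The sketch below takes the describe call as a parameter so it can be exercised without AWS; `wait_for_endpoint` is a hypothetical helper, and the delay and attempt count are illustrative values.

```python
import time


def wait_for_endpoint(describe, endpoint_name, delay=30, max_attempts=120,
                      sleep=time.sleep):
    """Poll until the endpoint is InService; raise if it fails or times out.

    `describe` is expected to behave like sm_client.describe_endpoint.
    """
    for _ in range(max_attempts):
        response = describe(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = response.get("FailureReason", "unknown reason")
            raise RuntimeError(f"endpoint {endpoint_name} failed: {reason}")
        sleep(delay)
    raise TimeoutError(f"endpoint {endpoint_name} not in service after polling")
```

With the clients from this guide, the call would be `wait_for_endpoint(sm_client.describe_endpoint, endpoint_name)`.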

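Test your endpoint.

Once your endpoint is in service, send an invocation request to your endpoint. The following code example shows how to send a test request to your endpoint: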


import json
content_type = "application/json"
request_body = {"input": "This is a test with NER in America with "
                         "Amazon and Microsoft in Seattle, writing random stuff."}

# Serialize data for endpoint
payload = json.dumps(request_body)

# Endpoint invocation
response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=payload)

#Parse results
result = json.loads(response['Body'].read().decode())['output']
result

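In the previous code example, the json.dumps method serializes the request_body into a JSON-formatted string and saves it in the variable payload. The SageMaker AI Runtime client then uses the invoke_endpoint method to send the payload to your endpoint. The result contains the response from your endpoint after the output field is extracted.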
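
Since this guide states a 6 MB ceiling on request size, it can be worth checking the serialized payload before sending it. The following sketch does that on the client side; `serialize_request` is a hypothetical helper, and interpreting 6 MB as 6 * 1024 * 1024 bytes is an assumption.

```python
import json

MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # assumed interpretation of the 6 MB limit


def serialize_request(text):
    """Serialize the input text and refuse payloads over the size limit."""
    payload = json.dumps({"input": text})
    size = len(payload.encode("utf-8"))
    if size > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {size} bytes, over the {MAX_PAYLOAD_BYTES} byte limit")
    return payload
```

The returned string can be passed directly as the Body argument of invoke_endpoint.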

The previous code example should return the following output:


[['NER', 'ORG'],
 ['America', 'GPE'],
 ['Amazon', 'ORG'],
 ['Microsoft', 'ORG'],
 ['Seattle', 'GPE']]
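
The entities come back as two-element lists rather than tuples because JSON has no tuple type. If you want them organized by entity type, a small post-processing step such as the following can help; `group_by_label` is a hypothetical helper, not part of the example notebook.

```python
from collections import defaultdict


def group_by_label(result):
    """Group [text, label] pairs into a {label: [texts]} dictionary."""
    grouped = defaultdict(list)
    for text, label in result:
        grouped[label].append(text)
    return dict(grouped)
```

Applied to the output above, the ORG group would hold NER, Amazon, and Microsoft, and the GPE group would hold America and Seattle.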

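Delete your endpoint

After you have completed your invocations, delete your endpoint to conserve resources. The following code example shows you how to delete your endpoint, along with its endpoint configuration and model: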
```
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_client.delete_model(ModelName=model_name)
```
For a complete notebook containing the code in this example, see BYOC-Single-Model.
