Prerequisites

Note

Follow the instructions in this section if you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the SageMaker AI console.

To create a SageMaker Neo-compiled model, you need the following:

An Amazon ECR URI for a Docker image. You can choose one that meets your needs from this list.

An entry point script file:

For PyTorch and MXNet models:

If you trained your model using SageMaker AI, the training script must implement the functions described below. The training script serves as the entry point script during inference. In the example detailed in MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo, the training script (mnist.py) implements the required functions.

If you did not train your model using SageMaker AI, you need to provide an entry point script file (inference.py) that can be used at inference time. Depending on the framework (MXNet or PyTorch), the location of the inference script must conform to the SageMaker Python SDK Model Directory Structure for MXNet or the Model Directory Structure for PyTorch.
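As a rough illustration of such a layout, the sketch below bundles model artifacts and an entry point script into a model.tar.gz archive with the script placed under code/, which is the layout the examples on this page assume (code/inference.py). The helper name package_model and the file names are hypothetical placeholders; check the model directory structure documentation for your framework and version before relying on this layout.

```python
import os
import tarfile


def package_model(artifact_paths, inference_script, output_path):
    """Bundle model artifacts and an entry point script into a .tar.gz archive.

    Hypothetical helper: artifacts go to the archive root and the script
    goes to code/inference.py, matching the layout assumed on this page.
    """
    with tarfile.open(output_path, "w:gz") as tar:
        for path in artifact_paths:
            # Model artifacts sit at the root of the archive
            tar.add(path, arcname=os.path.basename(path))
        # The entry point script goes under the code/ directory
        tar.add(inference_script, arcname="code/inference.py")
    return output_path
```

For example, package_model(["model.pt"], "inference.py", "model.tar.gz") produces an archive containing model.pt and code/inference.py.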

When using the Neo Inference Optimized Container images with PyTorch and MXNet on CPU and GPU instance types, the inference script must implement the following functions:

model_fn: Loads the model. (Optional)
input_fn: Converts the incoming request payload into a numpy array.
predict_fn: Performs the prediction.
output_fn: Converts the prediction output into the response payload.
Alternatively, you can define transform_fn to combine input_fn, predict_fn, and output_fn.
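The following sketch illustrates how these functions fit together: model_fn runs once at load time, and each request then goes through transform_fn if the module defines it, otherwise through input_fn, predict_fn, and output_fn in that order. HandlerService and default_model_fn are hypothetical names used only for illustration; this is not the serving container's actual implementation.

```python
class HandlerService:
    """Hypothetical sketch of how a serving container might use inference.py."""

    def __init__(self, module, model_dir, default_model_fn):
        # model_fn is optional; fall back to a default loader when absent
        model_fn = getattr(module, "model_fn", default_model_fn)
        self.module = module
        self.model = model_fn(model_dir)  # loaded once, not per request

    def handle(self, payload, content_type, accept):
        m = self.module
        if hasattr(m, "transform_fn"):
            # In this sketch, transform_fn replaces input_fn + predict_fn + output_fn
            return m.transform_fn(self.model, payload, content_type, accept)
        data = m.input_fn(payload, content_type)
        prediction = m.predict_fn(data, self.model)
        return m.output_fn(prediction, accept)
```

Loading the model once in the constructor mirrors the examples on this page, which run warm-up inference during model load rather than on the first request.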

The following are examples of an inference.py script in a directory named code (code/inference.py) for PyTorch and MXNet (Gluon and Module). The examples first load the model and then serve it on image data on a GPU:

MXNet Module


import os
import numpy as np
import json
import mxnet as mx
import neomx  # noqa: F401
from collections import namedtuple

Batch = namedtuple('Batch', ['data'])

# Change the context to mx.cpu() if deploying to a CPU endpoint
ctx = mx.gpu()

def model_fn(model_dir):
    # The compiled model artifacts are saved in model_dir with the prefix 'compiled'
    sym, arg_params, aux_params = mx.model.load_checkpoint(
        os.path.join(model_dir, 'compiled'), 0)
    mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
    mod.bind(for_training=False,
             data_shapes=[('data', (1,3,224,224))],
             label_shapes=mod._label_shapes)
    mod.set_params(arg_params, aux_params, allow_missing=True)

    # Run warm-up inference on empty data during model load (required for GPU)
    data = mx.nd.empty((1,3,224,224), ctx=ctx)
    mod.forward(Batch([data]))
    return mod


def transform_fn(mod, image, input_content_type, output_content_type):
    # pre-processing
    decoded = mx.image.imdecode(image)
    resized = mx.image.resize_short(decoded, 224)
    cropped, crop_info = mx.image.center_crop(resized, (224, 224))
    normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                  mean=mx.nd.array([0.485, 0.456, 0.406]),
                                  std=mx.nd.array([0.229, 0.224, 0.225]))
    transposed = normalized.transpose((2, 0, 1))
    batchified = transposed.expand_dims(axis=0)
    casted = batchified.astype(dtype='float32')
    processed_input = casted.as_in_context(ctx)

    # prediction/inference
    mod.forward(Batch([processed_input]))

    # post-processing
    prob = mod.get_outputs()[0].asnumpy().tolist()
    prob_json = json.dumps(prob)
    return prob_json, output_content_type

MXNet Gluon


import os
import numpy as np
import json
import mxnet as mx
import neomx  # noqa: F401

# Change the context to mx.cpu() if deploying to a CPU endpoint
ctx = mx.gpu()

def model_fn(model_dir):
    # The compiled model artifacts are saved in model_dir with the prefix 'compiled'
    block = mx.gluon.nn.SymbolBlock.imports(
        os.path.join(model_dir, 'compiled-symbol.json'),
        ['data'],
        os.path.join(model_dir, 'compiled-0000.params'),
        ctx=ctx)

    # Hybridize the model & pass required options for Neo: static_alloc=True & static_shape=True
    block.hybridize(static_alloc=True, static_shape=True)

    # Run warm-up inference on empty data during model load (required for GPU)
    data = mx.nd.empty((1,3,224,224), ctx=ctx)
    warm_up = block(data)
    return block


def input_fn(image, input_content_type):
    # pre-processing
    decoded = mx.image.imdecode(image)
    resized = mx.image.resize_short(decoded, 224)
    cropped, crop_info = mx.image.center_crop(resized, (224, 224))
    normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                  mean=mx.nd.array([0.485, 0.456, 0.406]),
                                  std=mx.nd.array([0.229, 0.224, 0.225]))
    transposed = normalized.transpose((2, 0, 1))
    batchified = transposed.expand_dims(axis=0)
    casted = batchified.astype(dtype='float32')
    processed_input = casted.as_in_context(ctx)
    return processed_input


def predict_fn(processed_input_data, block):
    # prediction/inference
    prediction = block(processed_input_data)
    return prediction

def output_fn(prediction, output_content_type):
    # post-processing
    prob = prediction.asnumpy().tolist()
    prob_json = json.dumps(prob)
    return prob_json, output_content_type

PyTorch 1.4 and Older


import os
import torch
import torch.nn.parallel
import torch.optim
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from PIL import Image
import io
import json
import pickle


def model_fn(model_dir):
    """Load the model and return it.
    Providing this function is optional.
    There is a default model_fn available which will load the model
    compiled using SageMaker Neo. You can override it here.

    Keyword arguments:
    model_dir -- the directory path where the model artifacts are present
    """

    # The compiled model is saved as "compiled.pt"
    model_path = os.path.join(model_dir, 'compiled.pt')
    with torch.neo.config(model_dir=model_dir, neo_runtime=True):
        model = torch.jit.load(model_path)
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model = model.to(device)

    # We recommend that you run warm-up inference during model load
    sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
    with open(sample_input_path, 'rb') as input_file:
        model_input = pickle.load(input_file)
    if torch.is_tensor(model_input):
        model_input = model_input.to(device)
        model(model_input)
    elif isinstance(model_input, tuple):
        model_input = tuple(inp.to(device) for inp in model_input if torch.is_tensor(inp))
        model(*model_input)
    else:
        print("Only supports a torch tensor or a tuple of torch tensors")

    # Return the model in every case, not only when warm-up is skipped
    return model


def transform_fn(model, request_body, request_content_type,
                 response_content_type):
    """Run prediction and return the output.
    The function
    1. Pre-processes the input request
    2. Runs prediction
    3. Post-processes the prediction output.
    """
    # preprocess
    decoded = Image.open(io.BytesIO(request_body))
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]),
    ])
    normalized = preprocess(decoded)
    batchified = normalized.unsqueeze(0)
    # predict
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    batchified = batchified.to(device)
    output = model.forward(batchified)

    return json.dumps(output.cpu().numpy().tolist()), response_content_type

PyTorch 1.5 and Newer


import os
import torch
import torch.nn.parallel
import torch.optim
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from PIL import Image
import io
import json
import pickle


def model_fn(model_dir):
    """Load the model and return it.
    Providing this function is optional.
    There is a default_model_fn available, which will load the model
    compiled using SageMaker Neo. You can override the default here.
    The model_fn only needs to be defined if your model needs extra
    steps to load, and can otherwise be left undefined.

    Keyword arguments:
    model_dir -- the directory path where the model artifacts are present
    """

    # The compiled model is saved as "model.pt"
    model_path = os.path.join(model_dir, 'model.pt')
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.jit.load(model_path, map_location=device)
    model = model.to(device)

    return model


def transform_fn(model, request_body, request_content_type,
                 response_content_type):
    """Run prediction and return the output.
    The function
    1. Pre-processes the input request
    2. Runs prediction
    3. Post-processes the prediction output.
    """
    # preprocess
    decoded = Image.open(io.BytesIO(request_body))
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]),
    ])
    normalized = preprocess(decoded)
    batchified = normalized.unsqueeze(0)
    
    # predict
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    batchified = batchified.to(device)
    output = model.forward(batchified)
    return json.dumps(output.cpu().numpy().tolist()), response_content_type

For inf1 instances or onnx, xgboost, and keras container images

For all other Neo Inference Optimized container images, or for inferentia instance types, the entry point script must implement the following functions for the Neo Deep Learning Runtime:
- neo_preprocess: Converts the incoming request payload into a numpy array.
- neo_postprocess: Converts the prediction output from the Neo Deep Learning Runtime into the response body.
  
  Note
  The two preceding functions do not use any functionality of MXNet, PyTorch, or TensorFlow.
For examples of how to use these functions, see the Neo Model Compilation Sample Notebooks.
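A minimal sketch of such an entry point script follows. It assumes the signatures neo_preprocess(payload, content_type) and neo_postprocess(result) as used in the Neo sample notebooks (verify them against those notebooks), a single-output model, and JSON or CSV request bodies; the supported content types are illustrative choices. Only numpy and the standard library are used, consistent with the note that these functions do not depend on MXNet, PyTorch, or TensorFlow.

```python
import io
import json

import numpy as np


def neo_preprocess(payload, content_type):
    """Convert the raw request payload into a numpy array for the Neo runtime."""
    text = payload.decode("utf-8") if isinstance(payload, bytes) else payload
    if content_type == "application/json":
        return np.asarray(json.loads(text), dtype=np.float32)
    if content_type == "text/csv":
        # ndmin=2 keeps a single CSV row as shape (1, n_features)
        return np.loadtxt(io.StringIO(text), delimiter=",", ndmin=2, dtype=np.float32)
    raise ValueError(f"Unsupported content type: {content_type}")


def neo_postprocess(result):
    """Convert the Neo runtime output into a response body and content type."""
    # Assume a single model output; the runtime may wrap it in a list
    if isinstance(result, (list, tuple)):
        result = result[0]
    # Drop size-1 dimensions (such as the batch axis) before serializing
    output = np.squeeze(np.asarray(result))
    return json.dumps(output.tolist()), "application/json"
```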

For TensorFlow models

If your model requires custom pre-processing and post-processing logic before data is sent to the model, you must specify an inference.py entry point script file that can be used at inference time. The script must implement either a pair of input_handler and output_handler functions or a single handler function.

Note

If a handler function is implemented, input_handler and output_handler are ignored.

The following is an example inference.py script that you can bundle with the compiled model to perform custom pre-processing and post-processing for an image classification model. The SageMaker AI client sends the image file with the application/x-image content type to the input_handler function, where it is converted to JSON. The converted image is then sent to TensorFlow Serving (the TensorFlow Model Server) through its REST API.


import io
import json

import numpy as np
from PIL import Image

def input_handler(data, context):
    """ Pre-process request input before it is sent to TensorFlow Serving REST API
    
    Args:
    data (obj): the request data stream
    context (Context): an object containing request and configuration details
    
    Returns:
    (str): the JSON-serialized request body for the TensorFlow Serving REST API
    """
    f = data.read()
    f = io.BytesIO(f)
    image = Image.open(f).convert('RGB')
    batch_size = 1
    image = np.asarray(image.resize((512, 512)))
    image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
    body = json.dumps({"signature_name": "serving_default", "instances": image.tolist()})
    return body

def output_handler(data, context):
    """Post-process TensorFlow Serving output before it is returned to the client.
    
    Args:
    data (obj): the TensorFlow serving response
    context (Context): an object containing request and configuration details
    
    Returns:
    (bytes, string): data to return to client, response content type
    """
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))

    response_content_type = context.accept_header
    prediction = data.content
    return prediction, response_content_type
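To call an endpoint that uses this script, a client sends the raw image bytes with the application/x-image content type. The sketch below uses the invoke_endpoint operation of a boto3 sagemaker-runtime client; classify_image, the endpoint name, and the image path are placeholders, and the example assumes the endpoint returns JSON, as the output_handler above passes through the TensorFlow Serving response.

```python
import json


def classify_image(runtime_client, endpoint_name, image_path):
    """Send raw image bytes to a SageMaker AI endpoint and decode the JSON reply."""
    with open(image_path, "rb") as image_file:
        payload = image_file.read()

    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",  # the type input_handler expects
        Accept="application/json",          # requested response type
        Body=payload,
    )
    # "Body" is a streaming object; read() returns the raw response bytes
    return json.loads(response["Body"].read())
```

For example, pass boto3.client("sagemaker-runtime") as runtime_client together with the name of your deployed endpoint.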

If there is no custom pre-processing or post-processing, the SageMaker AI client converts the image file to JSON in a similar way before sending it to the SageMaker AI endpoint.

For more information, see Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK.

The URI of the Amazon S3 bucket that contains the compiled model artifacts.
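The three prerequisites (container image URI, entry point script, and S3 URI of the compiled artifacts) come together in a single CreateModel request. The sketch below uses the create_model operation of a boto3 sagemaker client. create_neo_model is a hypothetical helper, and SAGEMAKER_PROGRAM and SAGEMAKER_SUBMIT_DIRECTORY are environment variables that SageMaker framework containers commonly read; treat them as assumptions and confirm which variables your chosen Neo image expects.

```python
from urllib.parse import urlparse


def create_neo_model(sm_client, model_name, image_uri, model_data_url,
                     role_arn, entry_point=None, submit_directory=None):
    """Hypothetical helper: register a Neo-compiled model via CreateModel."""
    # The compiled artifacts must be referenced by an s3://bucket/key URI
    parsed = urlparse(model_data_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3:// URI, got {model_data_url!r}")

    container = {"Image": image_uri, "ModelDataUrl": model_data_url}
    environment = {}
    if entry_point:
        # Name of the entry point script, e.g. "inference.py" (assumed variable)
        environment["SAGEMAKER_PROGRAM"] = entry_point
    if submit_directory:
        # Where the container looks for the script (assumed variable)
        environment["SAGEMAKER_SUBMIT_DIRECTORY"] = submit_directory
    if environment:
        container["Environment"] = environment

    return sm_client.create_model(
        ModelName=model_name,
        PrimaryContainer=container,
        ExecutionRoleArn=role_arn,
    )
```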
