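Manage Model
The Edge Manager agent can load multiple models at a time and run inference with the loaded models on edge devices. The number of models the agent can load is determined by the available memory on the device. The agent validates the model signature and loads into memory all the artifacts produced by the edge packaging job. This step requires all the certificates described in the previous step to be installed along with the rest of the binary installation. If the model's signature cannot be validated, loading the model fails with an appropriate return code and reason.
The SageMaker Edge Manager agent provides a list of Model Management APIs that implement control plane and data plane APIs on edge devices. Along with this documentation, we recommend going through the sample client implementation, which shows canonical usage of the APIs described below.
The proto file is available as part of the release artifacts (inside the release tarball). This document lists and describes the usage of the APIs defined in this proto file.
There is a one-to-one mapping for these APIs in the Windows release, and sample code for an application implemented in C# is shared with the release artifacts for Windows. The instructions below are for running the agent as a standalone process and apply to the release artifacts for Linux.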
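Extract the archive based on your operating system. VERSION is broken into three components: <MAJOR_VERSION>.<YYYY-MM-DD>-<SHA-7>. For information on how to obtain the release version (<MAJOR_VERSION>), the timestamp of the release artifact (<YYYY-MM-DD>), and the repository commit ID (SHA-7), see Installing Edge Manager agent.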
- Linux
-
The archive can be extracted with the following command:
tar -xvzf <VERSION>.tgz
- Windows
-
The archive can be extracted with the UI or with the following command:
unzip <VERSION>.tgz
The release artifact hierarchy (after extracting the tar/zip archive) is shown below. The agent proto file is available under api/.
0.20201205.7ee4b0b
├── bin
│   ├── sagemaker_edge_agent_binary
│   └── sagemaker_edge_agent_client_example
└── docs
    ├── api
    │   └── agent.proto
    ├── attributions
    │   ├── agent.txt
    │   └── core.txt
    └── examples
        └── ipc_example
            ├── CMakeLists.txt
            ├── sagemaker_edge_client.cc
            ├── sagemaker_edge_client_example.cc
            ├── sagemaker_edge_client.hh
            ├── sagemaker_edge.proto
            ├── README.md
            ├── shm.cc
            ├── shm.hh
            └── street_small.bmp
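The extraction step and the lookup of the proto file can also be scripted. The following is a minimal Python sketch that uses only the standard library. The function name `extract_release` and the idea of searching for `agent.proto` instead of hard-coding its path are assumptions made for illustration; they are not part of the release artifacts.

```python
import tarfile
from pathlib import Path


def extract_release(archive: Path, dest: Path) -> Path:
    """Extract a release .tgz into dest and return the path of agent.proto."""
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" reads a gzip-compressed tar, the same format `tar -xvzf` handles.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    # Search for the file because the top-level directory name
    # depends on the release version.
    matches = sorted(dest.rglob("agent.proto"))
    if not matches:
        raise FileNotFoundError(f"agent.proto not found under {dest}")
    return matches[0]
```

Only use a helper like this on archives from a trusted source, because `extractall` writes whatever paths the archive contains.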
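Load Model
The Edge Manager agent supports loading multiple models. This API validates the model signature and loads into memory all the artifacts produced by the EdgePackagingJob operation. This step requires all the required certificates to be installed along with the rest of the agent binary installation. If the model's signature cannot be validated, this step fails with an appropriate return code and error messages in the log.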
// perform load for a model
// Note:
// 1. currently only local filesystem paths are supported for loading models.
// 2. multiple models can be loaded at the same time, as limited by available device memory
// 3. users are required to unload any loaded model to load another model.
// Status Codes:
// 1. OK - load is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
// 5. ALREADY_EXISTS - model with the same name is already loaded
// 6. RESOURCE_EXHAUSTED - memory is not available to load the model
// 7. FAILED_PRECONDITION - model is not compiled for the machine.
//
rpc LoadModel(LoadModelRequest) returns (LoadModelResponse);
- Input
-
//
// request for LoadModel rpc call
//
message LoadModelRequest {
  string url = 1;
  string name = 2; // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}
- Output
-
//
// response for LoadModel rpc call
//
message LoadModelResponse {
  Model model = 1;
}
//
// Model represents the metadata of a model
// url - url representing the path of the model
// name - name of model
// input_tensor_metadatas - TensorMetadata array for the input tensors
// output_tensor_metadatas - TensorMetadata array for the output tensors
//
// Note:
// 1. input and output tensor metadata could be empty for dynamic models.
//
message Model {
  string url = 1;
  string name = 2;
  repeated TensorMetadata input_tensor_metadatas = 3;
  repeated TensorMetadata output_tensor_metadatas = 4;
}
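Because the agent requires model names to match the regular expression in the `LoadModelRequest` comment, a client can check the name before making the call. The following Python sketch shows one way to do that. The helper names and the plain-dict stand-in for the request are hypothetical; a real client would populate the message class generated from the proto file.

```python
import re

# Pattern copied from the comment on LoadModelRequest.name.
MODEL_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def is_valid_model_name(name: str) -> bool:
    """Return True if name is acceptable as LoadModelRequest.name."""
    # fullmatch rejects a trailing newline, which "$" alone would tolerate.
    return MODEL_NAME_RE.fullmatch(name) is not None


def build_load_request(url: str, name: str) -> dict:
    """Build a plain-dict stand-in for LoadModelRequest (not a protobuf message)."""
    if not is_valid_model_name(name):
        raise ValueError(f"invalid model name: {name!r}")
    return {"url": url, "name": name}
```

The pattern allows letters, digits, and hyphens, but a name cannot start or end with a hyphen, and underscores are not allowed.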
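Unload Model
Unloads a previously loaded model. The model is identified by the alias that was provided during loadModel. If the alias is not found or the model is not loaded, an error is returned.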
//
// perform unload for a model
// Status Codes:
// 1. OK - unload is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist
//
rpc UnLoadModel(UnLoadModelRequest) returns (UnLoadModelResponse);
- Input
-
//
// request for UnLoadModel rpc call
//
message UnLoadModelRequest {
  string name = 1; // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}
- Output
-
//
// response for UnLoadModel rpc call
//
message UnLoadModelResponse {}
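The status codes for LoadModel and UnLoadModel suggest a simple client flow for replacing a model: if a load fails with ALREADY_EXISTS, unload the model and load it again. The following Python sketch illustrates that flow against a toy in-memory stand-in. `FakeAgent`, `AgentError`, and `reload_model` are hypothetical names used only for illustration; the stand-in loads nothing and only mirrors the ALREADY_EXISTS and NOT_FOUND codes listed in the proto comments.

```python
class AgentError(Exception):
    """Carries one of the status-code names listed in the proto comments."""

    def __init__(self, code: str, detail: str = ""):
        super().__init__(f"{code}: {detail}")
        self.code = code


class FakeAgent:
    """Hypothetical in-memory stand-in for the agent; it loads nothing."""

    def __init__(self):
        self._models = {}  # model name -> url

    def load_model(self, url: str, name: str) -> None:
        if name in self._models:
            raise AgentError("ALREADY_EXISTS", name)
        self._models[name] = url

    def unload_model(self, name: str) -> None:
        if name not in self._models:
            raise AgentError("NOT_FOUND", name)
        del self._models[name]

    def list_models(self) -> list:
        return sorted(self._models)


def reload_model(agent, url: str, name: str) -> None:
    """Load a model, replacing any model already loaded under the same name."""
    try:
        agent.load_model(url, name)
    except AgentError as err:
        if err.code != "ALREADY_EXISTS":
            raise  # other codes are not handled by this sketch
        agent.unload_model(name)
        agent.load_model(url, name)
```

With a real client, the same pattern would inspect the status code of the failed RPC instead of a custom exception.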
List Models
Lists all the loaded models and their aliases.
//
// lists the loaded models
// Status Codes:
// 1. OK - list is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
//
rpc ListModels(ListModelsRequest) returns (ListModelsResponse);
- Input
-
//
// request for ListModels rpc call
//
message ListModelsRequest {}
- Output
-
//
// response for ListModels rpc call
//
message ListModelsResponse {
  repeated Model models = 1;
}
Describe Model
Describes a model that is loaded on the agent.
//
// Status Codes:
// 1. OK - describe is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist
//
rpc DescribeModel(DescribeModelRequest) returns (DescribeModelResponse);
- Input
-
//
// request for DescribeModel rpc call
//
message DescribeModelRequest {
  string name = 1;
}
- Output
-
//
// response for DescribeModel rpc call
//
message DescribeModelResponse {
  Model model = 1;
}
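The metadata returned by DescribeModel can be used to size input buffers before calling Predict. The following Python sketch mirrors the `Model` and `TensorMetadata` messages as dataclasses and computes the element count of each input tensor from its shape. The dataclasses and the `input_element_counts` helper are illustrative stand-ins, and `data_type` is modeled as a plain string because the `DataType` enum is not shown in this document.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List


@dataclass
class TensorMetadata:
    name: str
    data_type: str  # stand-in for the DataType enum, which is not shown here
    shape: List[int] = field(default_factory=list)


@dataclass
class Model:
    url: str
    name: str
    input_tensor_metadatas: List[TensorMetadata] = field(default_factory=list)
    output_tensor_metadatas: List[TensorMetadata] = field(default_factory=list)


def input_element_counts(model: Model) -> dict:
    """Map each input tensor name to its element count (product of its shape)."""
    return {t.name: prod(t.shape) for t in model.input_tensor_metadatas}
```

For a dynamic model, where the metadata lists can be empty, the helper returns an empty dictionary, so the client has to determine the tensor sizes some other way.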
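Capture Data
Allows the client application to capture input and output tensors, and optionally auxiliary data, in an Amazon S3 bucket. The client application is expected to pass a unique capture ID with each call to this API. The ID can be used later to query the status of the capture.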
//
// allows users to capture input and output tensors along with auxiliary data.
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. ALREADY_EXISTS - capture initiated for the given capture_id
// 5. RESOURCE_EXHAUSTED - buffer is full cannot accept any more requests.
// 6. OUT_OF_RANGE - timestamp is in the future.
// 7. INVALID_ARGUMENT - capture_id is not of expected format.
//
rpc CaptureData(CaptureDataRequest) returns (CaptureDataResponse);
- Input
-
enum Encoding {
  CSV = 0;
  JSON = 1;
  NONE = 2;
  BASE64 = 3;
}
//
// AuxilaryData represents a payload of extra data to be captured along with inputs and outputs of inference
// encoding - supports the encoding of the data
// data - represents the data of shared memory, this could be passed in two ways:
// a. send across the raw bytes of the multi-dimensional tensor array
// b. send a SharedMemoryHandle which contains the posix shared memory segment id and
// offset in bytes to location of multi-dimensional tensor array.
//
message AuxilaryData {
  string name = 1;
  Encoding encoding = 2;
  oneof data {
    bytes byte_data = 3;
    SharedMemoryHandle shared_memory_handle = 4;
  }
}
//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
// tensor_metadata - represents metadata of the shared memory segment
// data_or_handle - represents the data of shared memory, this could be passed in two ways:
// a. send across the raw bytes of the multi-dimensional tensor array
// b. send a SharedMemoryHandle which contains the posix shared memory segment
// id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1; //optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}
//
// request for CaptureData rpc call
//
message CaptureDataRequest {
  string model_name = 1;
  string capture_id = 2; //uuid string
  Timestamp inference_timestamp = 3;
  repeated Tensor input_tensors = 4;
  repeated Tensor output_tensors = 5;
  repeated AuxilaryData inputs = 6;
  repeated AuxilaryData outputs = 7;
}
- Output
-
//
// response for CaptureData rpc call
//
message CaptureDataResponse {}
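Each CaptureData call needs a unique capture ID in UUID string form and an inference timestamp that is not in the future. The following Python sketch shows one way a client could prepare those two fields. The plain-dict request and the helper names are hypothetical, and the seconds/nanos layout assumes that `Timestamp` is the standard protobuf well-known type, which this document does not state.

```python
import time
import uuid


def new_capture_request(model_name: str, now: float = None) -> dict:
    """Build a plain-dict stand-in for CaptureDataRequest with a fresh capture_id."""
    if now is None:
        now = time.time()  # the current time, so never in the future
    seconds = int(now)
    return {
        "model_name": model_name,
        "capture_id": str(uuid.uuid4()),
        # Assumed layout: seconds and nanoseconds since the Unix epoch.
        "inference_timestamp": {
            "seconds": seconds,
            "nanos": int((now - seconds) * 1e9),
        },
        "input_tensors": [],
        "output_tensors": [],
    }


def is_uuid(value: str) -> bool:
    """Return True if value parses as a UUID string."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True
```

Keep the generated `capture_id` on the client side, because it is the only handle for querying the capture status later.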
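Get Capture Status
Depending on the models loaded, the input and output tensors can be large (for many edge devices). Capturing to the cloud can be time consuming, so CaptureData() is implemented as an asynchronous operation. The capture ID is a unique identifier that the client provides during the capture data call; this ID can be used to query the status of the asynchronous call.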
//
// allows users to query status of capture data operation
// Status Codes:
// 1. OK - capture status successfully retrieved
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - given capture id doesn't exist.
//
rpc GetCaptureDataStatus(GetCaptureDataStatusRequest) returns (GetCaptureDataStatusResponse);
- Input
-
//
// request for GetCaptureDataStatus rpc call
//
message GetCaptureDataStatusRequest {
  string capture_id = 1;
}
- Output
-
enum CaptureDataStatus {
  FAILURE = 0;
  SUCCESS = 1;
  IN_PROGRESS = 2;
  NOT_FOUND = 3;
}
//
// response for GetCaptureDataStatus rpc call
//
message GetCaptureDataStatusResponse {
  CaptureDataStatus status = 1;
}
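Because the capture runs asynchronously, a client typically polls GetCaptureDataStatus until the status is no longer IN_PROGRESS. The following Python sketch shows such a polling loop with a timeout. The `wait_for_capture` helper and the `get_status` callable, which stands in for the RPC and returns the `CaptureDataStatus` value as a string, are hypothetical; the default timeout and interval are arbitrary.

```python
import time
from typing import Callable


def wait_for_capture(get_status: Callable[[str], str], capture_id: str,
                     timeout_s: float = 30.0, interval_s: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the capture leaves IN_PROGRESS or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(capture_id)
        if status != "IN_PROGRESS":
            return status  # FAILURE, SUCCESS, or NOT_FOUND
        if time.monotonic() >= deadline:
            raise TimeoutError(f"capture {capture_id} still in progress")
        sleep(interval_s)
```

The `sleep` parameter exists only so that the wait can be replaced in tests; in normal use the default `time.sleep` is enough.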
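Predict
The predict API performs inference on a previously loaded model. It accepts a request in the form of tensors that are fed directly into the neural network. The output is the output tensor (or scalar) from the model. This is a blocking call.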
//
// perform inference on a model.
//
// Note:
// 1. users can choose to send the tensor data in the protobuf message or
// through a shared memory segment on a per tensor basis, the Predict
// method will handle the decode transparently.
// 2. serializing large tensors into the protobuf message can be quite expensive,
// based on our measurements it is recommended to use shared memory for
// tensors larger than 256KB.
// 3. SMEdge IPC server will not use shared memory for returning output tensors,
// i.e., the output tensor data will always be sent in byte form encoded
// in the tensors of PredictResponse.
// 4. currently SMEdge IPC server cannot handle concurrent predict calls, all
// these calls will be serialized under the hood. this shall be addressed
// in a later release.
// Status Codes:
// 1. OK - prediction is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - when model not found
// 5. INVALID_ARGUMENT - when tensor types mismatch
//
rpc Predict(PredictRequest) returns (PredictResponse);
- Input
-
// request for Predict rpc call
//
message PredictRequest {
  string name = 1;
  repeated Tensor tensors = 2;
}
//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
// tensor_metadata - represents metadata of the shared memory segment
// data_or_handle - represents the data of shared memory, this could be passed in two ways:
// a. send across the raw bytes of the multi-dimensional tensor array
// b. send a SharedMemoryHandle which contains the posix shared memory segment
// id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1; //optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}
//
// TensorMetadata represents the metadata for a tensor
// name - name of the tensor
// data_type - data type of the tensor
// shape - array of dimensions of the tensor
//
message TensorMetadata {
  string name = 1;
  DataType data_type = 2;
  repeated int32 shape = 3;
}
//
// SharedMemoryHandle represents a posix shared memory segment
// offset - offset in bytes from the start of the shared memory segment.
// segment_id - shared memory segment id corresponding to the posix shared memory segment.
// size - size in bytes of shared memory segment to use from the offset position.
//
message SharedMemoryHandle {
  uint64 size = 1;
  uint64 offset = 2;
  uint64 segment_id = 3;
}
- Output
-
PredictResponse only returns Tensors and not SharedMemoryHandle.
// response for Predict rpc call
//
message PredictResponse {
  repeated Tensor tensors = 1;
}
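The notes on Predict recommend shared memory for input tensors larger than 256KB and state that output tensors always come back as bytes. The following Python sketch shows how a client could serialize a float32 tensor into `byte_data`, decode returned bytes, and pick the transport for an input tensor. The helper names are hypothetical, and the little-endian float32 layout is an assumption; check the data type and byte order that your model and device actually use.

```python
import struct
from math import prod

SHARED_MEMORY_THRESHOLD = 256 * 1024  # bytes, from the note on Predict


def pack_float32(values, shape) -> bytes:
    """Serialize a flat list of floats as contiguous little-endian float32."""
    if len(values) != prod(shape):
        raise ValueError("number of values does not match shape")
    return struct.pack(f"<{len(values)}f", *values)


def unpack_float32(data: bytes) -> list:
    """Decode byte_data from a float32 output tensor back into floats."""
    count, remainder = divmod(len(data), 4)
    if remainder:
        raise ValueError("byte length is not a multiple of 4")
    return list(struct.unpack(f"<{count}f", data))


def choose_transport(num_bytes: int) -> str:
    """Pick the oneof field to populate for an input tensor."""
    if num_bytes > SHARED_MEMORY_THRESHOLD:
        return "shared_memory_handle"
    return "byte_data"
```

For example, a 1x3x224x224 float32 input is 602,112 bytes, so this sketch would route it through shared memory, while a small 2x2 tensor would travel inside the protobuf message.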