Gestionar el modelo

Modo de enfoque

Gestionar el modelo - Amazon SageMaker AI

Cargar modelo Descargar modelo Enumerar modelos Describir modelo Capturar datos Obtener estado de captura Predecir

El agente de Edge Manager puede cargar varios modelos a la vez y hacer inferencias con los modelos cargados en dispositivos periféricos. La cantidad de modelos que el agente puede cargar viene determinada por la memoria disponible en el dispositivo. El agente valida la firma del modelo y carga en la memoria todos los artefactos producidos por el trabajo de empaquetado periférico. Este paso requiere que se instalen todos los certificados necesarios descritos en los pasos anteriores junto con el resto de la instalación binaria. Si no se puede validar la firma del modelo, se produce un error al cargar el modelo con el código de retorno y el motivo correspondientes.

SageMaker El agente Edge Manager proporciona una lista de los sistemas de gestión de modelos APIs que implementan el plano de control y APIs el plano de datos en los dispositivos periféricos. Junto con esta documentación, recomendamos revisar el ejemplo de implementación del cliente, que muestra el uso canónico de lo que se describe a continuación. APIs

El archivo proto está disponible como parte de los artefactos de versión (dentro del archivo tarball de la versión). En este documento, enumeramos y describimos el uso de los que APIs figuran en este proto archivo.

nota

Están one-to-one mapeados APIs en la versión de Windows y se comparte un código de ejemplo para una aplicación implementada en C# con los artefactos de la versión para Windows. A continuación se indican las instrucciones para ejecutar el agente como un proceso independiente, aplicable a los artefactos de versión para Linux.

Extraiga el archivo en función de su sistema operativo. Donde VERSION se divide en tres componentes: <MAJOR_VERSION>.<YYYY-MM-DD>-<SHA-7>. Consulte Instalación del agente de Edge Manager para obtener información sobre cómo obtener la versión de lanzamiento (<MAJOR_VERSION>), la marca de tiempo del artefacto de versión (<YYYY-MM-DD>) y el ID de confirmación del repositorio (SHA-7)

anchor anchor

El archivo zip se puede extraer con el comando:


tar -xvzf <VERSION>.tgz

La jerarquía de artefactos de versión (después de extraer el archivo tar/zip) se muestra a continuación. El archivo proto del agente está disponible en api/.


0.20201205.7ee4b0b
├── bin
│         ├── sagemaker_edge_agent_binary
│         └── sagemaker_edge_agent_client_example
└── docs
├── api
│         └── agent.proto
├── attributions
│         ├── agent.txt
│         └── core.txt
└── examples
└── ipc_example
├── CMakeLists.txt
├── sagemaker_edge_client.cc
├── sagemaker_edge_client_example.cc
├── sagemaker_edge_client.hh
├── sagemaker_edge.proto
├── README.md
├── shm.cc
├── shm.hh
└── street_small.bmp

Cargar modelo

El agente de Edge Manager admite la carga de varios modelos. Esta API valida la firma del modelo y carga en la memoria todos los artefactos producidos por la operación EdgePackagingJob. Este paso requiere que se instalen todos los certificados necesarios junto con el resto de la instalación binaria de agente. Si no se puede validar la firma del modelo, se produce un error y aparecen en el registro el código de retorno y los mensajes de error pertinentes.


// perform load for a model
// Note:
// 1. currently only local filesystem paths are supported for loading models.
// 2. multiple models can be loaded at the same time, as limited by available device memory
// 3. users are required to unload any loaded model to load another model.
// Status Codes:
// 1. OK - load is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
// 5. ALREADY_EXISTS - model with the same name is already loaded
// 6. RESOURCE_EXHAUSTED - memory is not available to load the model
// 7. FAILED_PRECONDITION - model is not compiled for the machine.
//
rpc LoadModel(LoadModelRequest) returns (LoadModelResponse);

anchor anchor


//
// request for LoadModel rpc call
//
message LoadModelRequest {
  string url = 1;
  string name = 2;  // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}

Descargar modelo

Descarga un modelo cargado anteriormente. Se identifica mediante el alias del modelo que se proporcionó durante loadModel. Si no se encuentra el alias o el modelo no está cargado, devuelve un error.


//
// perform unload for a model
// Status Codes:
// 1. OK - unload is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist
//
rpc UnLoadModel(UnLoadModelRequest) returns (UnLoadModelResponse);

anchor anchor


//
// request for UnLoadModel rpc call
//
message UnLoadModelRequest {
 string name = 1; // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}

Enumerar modelos

Muestra todos los modelos cargados y sus alias.


//
// lists the loaded models
// Status Codes:
// 1. OK - unload is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
//
rpc ListModels(ListModelsRequest) returns (ListModelsResponse);

anchor anchor


//
// request for ListModels rpc call
//
message ListModelsRequest {}

Describir modelo

Describe un modelo que se carga en el agente.


//
// Status Codes:
// 1. OK - load is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
//
rpc DescribeModel(DescribeModelRequest) returns (DescribeModelResponse);

anchor anchor


//
// request for DescribeModel rpc call
//
message DescribeModelRequest {
  string name = 1;
}

Capturar datos

Permite a la aplicación cliente capturar los tensores de entrada y salida en el bucket de Amazon S3 y, opcionalmente, en el auxiliar. Se espera que la aplicación cliente transmita un ID de captura único junto con cada llamada a esta API. Esto se puede usar más adelante para consultar el estado de la captura.


//
// allows users to capture input and output tensors along with auxiliary data.
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 5. ALREADY_EXISTS - capture initiated for the given capture_id
// 6. RESOURCE_EXHAUSTED - buffer is full cannot accept any more requests.
// 7. OUT_OF_RANGE - timestamp is in the future.
// 8. INVALID_ARGUMENT - capture_id is not of expected format.
//
rpc CaptureData(CaptureDataRequest) returns (CaptureDataResponse);

anchor anchor


enum Encoding {
 CSV = 0;
 JSON = 1;
 NONE = 2;
 BASE64 = 3;
}

//
// AuxilaryData represents a payload of extra data to be capture along with inputs and outputs of inference
// encoding - supports the encoding of the data
// data - represents the data of shared memory, this could be passed in two ways:
// a. send across the raw bytes of the multi-dimensional tensor array
// b. send a SharedMemoryHandle which contains the posix shared memory segment id and
// offset in bytes to location of multi-dimensional tensor array.
//
message AuxilaryData {
 string name = 1;
 Encoding encoding = 2;
 oneof data {
 bytes byte_data = 3;
 SharedMemoryHandle shared_memory_handle = 4;
 }
}

//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
// tensor_metadata - represents metadata of the shared memory segment
// data_or_handle - represents the data of shared memory, this could be passed in two ways:
// a. send across the raw bytes of the multi-dimensional tensor array
// b. send a SharedMemoryHandle which contains the posix shared memory segment
// id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
 TensorMetadata tensor_metadata = 1; //optional in the predict request
 oneof data {
 bytes byte_data = 4;
 // will only be used for input tensors
 SharedMemoryHandle shared_memory_handle = 5;
 }
}

//
// request for CaptureData rpc call
//
message CaptureDataRequest {
 string model_name = 1;
 string capture_id = 2; //uuid string
 Timestamp inference_timestamp = 3;
 repeated Tensor input_tensors = 4;
 repeated Tensor output_tensors = 5;
 repeated AuxilaryData inputs = 6;
 repeated AuxilaryData outputs = 7;
}

Obtener estado de captura

Según los modelos cargados, los tensores de entrada y salida pueden ser grandes (para muchos dispositivos periféricos). La captura en la nube puede llevar mucho tiempo. Por tanto, CaptureData() se implementa como una operación asíncrona. Un ID de captura es un identificador único que el cliente proporciona durante la llamada de captura de datos, este ID se puede utilizar para consultar el estado de la llamada asíncrona.


//
// allows users to query status of capture data operation
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - given capture id doesn't exist.
//
rpc GetCaptureDataStatus(GetCaptureDataStatusRequest) returns (GetCaptureDataStatusResponse);

anchor anchor


//
// request for GetCaptureDataStatus rpc call
//
message GetCaptureDataStatusRequest {
  string capture_id = 1;
}

Predecir

La API predict realiza inferencias en un modelo cargado anteriormente. Acepta una solicitud en forma de tensor que se introduce directamente en la red neuronal. La salida es el tensor de salida (o escalar) del modelo. Esta es una llamada de bloqueo.


//
// perform inference on a model.
//
// Note:
// 1. users can chose to send the tensor data in the protobuf message or
// through a shared memory segment on a per tensor basis, the Predict
// method with handle the decode transparently.
// 2. serializing large tensors into the protobuf message can be quite expensive,
// based on our measurements it is recommended to use shared memory of
// tenors larger than 256KB.
// 3. SMEdge IPC server will not use shared memory for returning output tensors,
// i.e., the output tensor data will always send in byte form encoded
// in the tensors of PredictResponse.
// 4. currently SMEdge IPC server cannot handle concurrent predict calls, all
// these call will be serialized under the hood. this shall be addressed
// in a later release.
// Status Codes:
// 1. OK - prediction is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - when model not found
// 5. INVALID_ARGUMENT - when tenors types mismatch
//
rpc Predict(PredictRequest) returns (PredictResponse);

Input


// request for Predict rpc call
//
message PredictRequest {
string name = 1;
repeated Tensor tensors = 2;
}

//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
//    tensor_metadata - represents metadata of the shared memory segment
//    data_or_handle - represents the data of shared memory, this could be passed in two ways:
//                        a. send across the raw bytes of the multi-dimensional tensor array
//                        b. send a SharedMemoryHandle which contains the posix shared memory segment
//                            id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1; //optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}

//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
//    tensor_metadata - represents metadata of the shared memory segment
//    data_or_handle - represents the data of shared memory, this could be passed in two ways:
//                        a. send across the raw bytes of the multi-dimensional tensor array
//                        b. send a SharedMemoryHandle which contains the posix shared memory segment
//                            id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1; //optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}

//
// TensorMetadata represents the metadata for a tensor
//    name - name of the tensor
//    data_type  - data type of the tensor
//    shape - array of dimensions of the tensor
//
message TensorMetadata {
  string name = 1;
  DataType data_type = 2;
  repeated int32 shape = 3;
}

//
// SharedMemoryHandle represents a posix shared memory segment
//    offset - offset in bytes from the start of the shared memory segment.
//    segment_id - shared memory segment id corresponding to the posix shared memory segment.
//    size - size in bytes of shared memory segment to use from the offset position.
//
message SharedMemoryHandle {
  uint64 size = 1;
  uint64 offset = 2;
  uint64 segment_id = 3;
}

Output

nota

El PredictResponse solo devuelve Tensors y no SharedMemoryHandle.


// response for Predict rpc call
//
message PredictResponse {
   repeated Tensor tensors = 1;
}

anchor anchor


// request for Predict rpc call
//
message PredictRequest {
string name = 1;
repeated Tensor tensors = 2;
}

//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
//    tensor_metadata - represents metadata of the shared memory segment
//    data_or_handle - represents the data of shared memory, this could be passed in two ways:
//                        a. send across the raw bytes of the multi-dimensional tensor array
//                        b. send a SharedMemoryHandle which contains the posix shared memory segment
//                            id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1; //optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}

//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
//    tensor_metadata - represents metadata of the shared memory segment
//    data_or_handle - represents the data of shared memory, this could be passed in two ways:
//                        a. send across the raw bytes of the multi-dimensional tensor array
//                        b. send a SharedMemoryHandle which contains the posix shared memory segment
//                            id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1; //optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}

//
// TensorMetadata represents the metadata for a tensor
//    name - name of the tensor
//    data_type  - data type of the tensor
//    shape - array of dimensions of the tensor
//
message TensorMetadata {
  string name = 1;
  DataType data_type = 2;
  repeated int32 shape = 3;
}

//
// SharedMemoryHandle represents a posix shared memory segment
//    offset - offset in bytes from the start of the shared memory segment.
//    segment_id - shared memory segment id corresponding to the posix shared memory segment.
//    size - size in bytes of shared memory segment to use from the offset position.
//
message SharedMemoryHandle {
  uint64 size = 1;
  uint64 offset = 2;
  uint64 segment_id = 3;
}

Aviso JavaScript está desactivado o no está disponible en su navegador.

Para utilizar la documentación de AWS, debe estar habilitado JavaScript. Para obtener más información, consulte las páginas de ayuda de su navegador.

Convenciones del documento

Implemente el Model Package directamente con la API de implementación de SageMaker Edge Manager

SageMaker Fin de la vida útil de Edge Manager

En esta página

Seleccione sus preferencias de cookies

Personalizar preferencias de cookies

Esenciales

De rendimiento

Funcionales

De publicidad

No se pueden guardar las preferencias de cookies

Gestionar el modelo

nota

Temas

Cargar modelo

Descargar modelo

Enumerar modelos

Describir modelo

Capturar datos

Obtener estado de captura

Predecir

nota

En esta página

Related resources

¿Le ha servido de ayuda esta página?

Related resources

Tema siguiente:

Tema anterior:

¿Necesita ayuda?