Model Management

The Edge Manager agent can load multiple models at a time and run inference with the loaded models on edge devices. The number of models the agent can load is determined by the memory available on the device. The agent validates the model signature and loads into memory all the artifacts produced by the edge packaging job. This step requires all the certificates described in the previous steps to be installed along with the rest of the binary installation. If the model signature cannot be validated, loading the model fails and a corresponding return code and reason are returned.

The SageMaker Edge Manager agent provides a set of Model Management APIs that implement the control plane and data plane APIs on edge devices. Along with this documentation, we recommend reviewing the sample client implementation, which shows the canonical usage of the APIs described below.

The proto file is available as part of the release artifacts (inside the release tarball). This document lists the APIs defined in that proto file and describes how to use them.

Note

These APIs map one-to-one to the Windows release, and sample code for an application implemented in C# is shared with the release artifacts for Windows. The instructions below describe how to run the agent as a standalone process and apply to the release artifacts for Linux.

Extract the archive for your operating system. VERSION is made up of three components: <MAJOR_VERSION>.<YYYY-MM-DD>-<SHA-7>. See Installing Edge Manager agent for information on how to obtain the release version (<MAJOR_VERSION>), the timestamp of the release artifact (<YYYY-MM-DD>), and the repository commit ID (<SHA-7>).

The release tarball can be extracted with the following command:


tar -xvzf <VERSION>.tgz

The release artifact hierarchy (after extracting the tar/zip archive) is shown below. The agent proto file is available under the api/ directory.


0.20201205.7ee4b0b
├── bin
│   ├── sagemaker_edge_agent_binary
│   └── sagemaker_edge_agent_client_example
└── docs
    ├── api
    │   └── agent.proto
    ├── attributions
    │   ├── agent.txt
    │   └── core.txt
    └── examples
        └── ipc_example
            ├── CMakeLists.txt
            ├── sagemaker_edge_client.cc
            ├── sagemaker_edge_client_example.cc
            ├── sagemaker_edge_client.hh
            ├── sagemaker_edge.proto
            ├── README.md
            ├── shm.cc
            ├── shm.hh
            └── street_small.bmp
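
The top-level directory name in the listing above (0.20201205.7ee4b0b) appears to use a dotted form with the date written as YYYYMMDD, which differs slightly from the <MAJOR_VERSION>.<YYYY-MM-DD>-<SHA-7> pattern described earlier. As a minimal sketch, assuming the dotted form of that one example, a release name could be split into its three components as follows. The `ReleaseVersion` type and `parse_release_version` function are hypothetical helpers, not part of the release artifacts.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    major: int   # release version (<MAJOR_VERSION>)
    built: date  # timestamp of the release artifact
    sha7: str    # 7-character repository commit ID


# Assumed pattern, inferred only from the example name "0.20201205.7ee4b0b".
_VERSION_RE = re.compile(r"^(\d+)\.(\d{8})\.([0-9a-f]{7})$")


def parse_release_version(name: str) -> ReleaseVersion:
    """Split a dotted release name into major version, build date, and commit ID."""
    match = _VERSION_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    major, stamp, sha7 = match.groups()
    built = datetime.strptime(stamp, "%Y%m%d").date()
    return ReleaseVersion(int(major), built, sha7)
```

If your artifacts use the hyphenated form instead, the regular expression would need to be adjusted accordingly.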

Load Model

The Edge Manager agent supports loading multiple models. This API validates the model signature and loads into memory all the artifacts produced by the EdgePackagingJob operation. This step requires all the required certificates to be installed along with the rest of the agent binary installation. If the model signature cannot be validated, this step fails, and a corresponding return code and error messages are written to the log.


// perform load for a model
// Note:
// 1. currently only local filesystem paths are supported for loading models.
// 2. multiple models can be loaded at the same time, as limited by available device memory
// 3. users are required to unload any loaded model to load another model.
// Status Codes:
// 1. OK - load is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
// 5. ALREADY_EXISTS - model with the same name is already loaded
// 6. RESOURCE_EXHAUSTED - memory is not available to load the model
// 7. FAILED_PRECONDITION - model is not compiled for the machine.
//
rpc LoadModel(LoadModelRequest) returns (LoadModelResponse);


//
// request for LoadModel rpc call
//
message LoadModelRequest {
  string url = 1;
  string name = 2;  // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}
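
The naming rule in the comment above can be checked on the client side before calling LoadModel. The following is a minimal sketch that applies the documented regular expression as-is; the `is_valid_model_name` helper is hypothetical and is not part of the agent or its sample client.

```python
import re

# Regular expression copied from the LoadModelRequest comment.
MODEL_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def is_valid_model_name(name: str) -> bool:
    """Return True if the name starts and ends with an alphanumeric character
    and contains only alphanumerics and hyphens in between."""
    # fullmatch avoids accepting a trailing newline, which "$" alone would allow.
    return MODEL_NAME_RE.fullmatch(name) is not None
```

Under this pattern, names such as resnet-18 are accepted, while names that begin or end with a hyphen, or that contain underscores, are rejected.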

Unload Model

Unloads a previously loaded model. The model is identified by the alias that was provided in the LoadModel call. If the alias is not found or the model is not loaded, an error is returned.


//
// perform unload for a model
// Status Codes:
// 1. OK - unload is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist
//
rpc UnLoadModel(UnLoadModelRequest) returns (UnLoadModelResponse);


//
// request for UnLoadModel rpc call
//
message UnLoadModelRequest {
 string name = 1; // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}

List Models

Lists all the loaded models and their aliases.


//
// lists the loaded models
// Status Codes:
// 1. OK - list is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
//
rpc ListModels(ListModelsRequest) returns (ListModelsResponse);


//
// request for ListModels rpc call
//
message ListModelsRequest {}
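
To make the documented status codes for LoadModel, UnLoadModel, and ListModels more concrete, the following toy sketch mimics them with an in-memory registry. It is not the agent's implementation and does not talk to the agent: the `ToyModelRegistry` class, the `memory_budget` accounting, and the `size` parameter are all hypothetical and exist only to illustrate when ALREADY_EXISTS, RESOURCE_EXHAUSTED, and NOT_FOUND would be expected.

```python
from typing import Dict, List, Tuple


class ToyModelRegistry:
    """In-memory stand-in that mimics the documented status codes only."""

    def __init__(self, memory_budget: int) -> None:
        self._free = memory_budget  # hypothetical memory accounting
        self._loaded: Dict[str, Tuple[str, int]] = {}  # name -> (url, size)

    def load_model(self, url: str, name: str, size: int) -> str:
        if name in self._loaded:
            return "ALREADY_EXISTS"      # same alias is already loaded
        if size > self._free:
            return "RESOURCE_EXHAUSTED"  # not enough memory left
        self._loaded[name] = (url, size)
        self._free -= size
        return "OK"

    def unload_model(self, name: str) -> str:
        entry = self._loaded.pop(name, None)
        if entry is None:
            return "NOT_FOUND"           # alias was never loaded
        self._free += entry[1]           # release the memory it used
        return "OK"

    def list_models(self) -> List[str]:
        return sorted(self._loaded)
```

In this sketch, loading a second model under an alias that is already in use returns ALREADY_EXISTS until the first one is unloaded, and a model that does not fit in the remaining budget returns RESOURCE_EXHAUSTED.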

Describe Model

Describes a model that is loaded on the agent.


//
// Status Codes:
// 1. OK - describe is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist
//
rpc DescribeModel(DescribeModelRequest) returns (DescribeModelResponse);


//
// request for DescribeModel rpc call
//
message DescribeModelRequest {
  string name = 1;
}

Capture Data

Allows the client application to capture input and output tensors, and optionally auxiliary data, in an Amazon S3 bucket. The client application is expected to pass a unique capture ID with each call to this API. The capture ID can be used later to query the status of the capture.


//
// allows users to capture input and output tensors along with auxiliary data.
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. ALREADY_EXISTS - capture initiated for the given capture_id
// 5. RESOURCE_EXHAUSTED - buffer is full cannot accept any more requests.
// 6. OUT_OF_RANGE - timestamp is in the future.
// 7. INVALID_ARGUMENT - capture_id is not of expected format.
//
rpc CaptureData(CaptureDataRequest) returns (CaptureDataResponse);


enum Encoding {
  CSV = 0;
  JSON = 1;
  NONE = 2;
  BASE64 = 3;
}

//
// AuxilaryData represents a payload of extra data to be captured along with inputs and outputs of inference
//    encoding - supports the encoding of the data
//    data - represents the data of shared memory, this could be passed in two ways:
//               a. send across the raw bytes of the multi-dimensional tensor array
//               b. send a SharedMemoryHandle which contains the posix shared memory segment id and
//                  offset in bytes to location of multi-dimensional tensor array.
//
message AuxilaryData {
  string name = 1;
  Encoding encoding = 2;
  oneof data {
    bytes byte_data = 3;
    SharedMemoryHandle shared_memory_handle = 4;
  }
}

//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
//    tensor_metadata - represents metadata of the shared memory segment
//    data_or_handle - represents the data of shared memory, this could be passed in two ways:
//                        a. send across the raw bytes of the multi-dimensional tensor array
//                        b. send a SharedMemoryHandle which contains the posix shared memory segment
//                            id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1; //optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}

//
// request for CaptureData rpc call
//
message CaptureDataRequest {
  string model_name = 1;
  string capture_id = 2; //uuid string
  Timestamp inference_timestamp = 3;
  repeated Tensor input_tensors = 4;
  repeated Tensor output_tensors = 5;
  repeated AuxilaryData inputs = 6;
  repeated AuxilaryData outputs = 7;
}
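
Because capture_id is described as a UUID string and a timestamp in the future is reported as OUT_OF_RANGE, a client may want to check both before sending a CaptureDataRequest. The sketch below is one possible client-side pre-check, assuming that "uuid string" means the canonical hyphenated form (the agent's exact accepted format is not specified here). The helper names `new_capture_id`, `is_valid_capture_id`, and `precheck_capture` are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def new_capture_id() -> str:
    """Generate a fresh, random capture ID in canonical UUID form."""
    return str(uuid.uuid4())


def is_valid_capture_id(capture_id: str) -> bool:
    """Accept only canonical hyphenated UUID strings (an assumption)."""
    try:
        return str(uuid.UUID(capture_id)) == capture_id.lower()
    except ValueError:
        return False


def precheck_capture(capture_id: str, inference_time: datetime,
                     now: Optional[datetime] = None) -> str:
    """Mirror two documented CaptureData status codes on the client side."""
    now = now or datetime.now(timezone.utc)
    if not is_valid_capture_id(capture_id):
        return "INVALID_ARGUMENT"  # capture_id is not of expected format
    if inference_time > now:
        return "OUT_OF_RANGE"      # timestamp is in the future
    return "OK"
```

Generating a new ID for every call also avoids the ALREADY_EXISTS status, which is returned when a capture has already been initiated for the same capture_id.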

Get Capture Status

Depending on the models loaded, the input and output tensors can be large for many edge devices, and capturing them to the cloud can be time consuming. CaptureData() is therefore implemented as an asynchronous operation. A capture ID is a unique identifier that the client provides in the CaptureData call. This ID can be used to query the status of the asynchronous call.


//
// allows users to query status of capture data operation
// Status Codes:
// 1. OK - status query is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - given capture id doesn't exist.
//
rpc GetCaptureDataStatus(GetCaptureDataStatusRequest) returns (GetCaptureDataStatusResponse);


//
// request for GetCaptureDataStatus rpc call
//
message GetCaptureDataStatusRequest {
  string capture_id = 1;
}
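
Since the capture runs asynchronously, a client typically polls for its status. The response message is not shown in this section, so the following is only a sketch of the polling pattern: `get_status` stands in for whatever wrapper you write around GetCaptureDataStatus, and the status strings "IN_PROGRESS" and "SUCCESS" are hypothetical placeholders, not values taken from the proto file.

```python
import time
from typing import Callable


def wait_for_capture(get_status: Callable[[str], str], capture_id: str,
                     attempts: int = 5, delay_s: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the (hypothetical) status is no longer 'IN_PROGRESS',
    or until the attempt budget is used up. Returns the last status seen."""
    status = "IN_PROGRESS"
    for attempt in range(attempts):
        status = get_status(capture_id)
        if status != "IN_PROGRESS":
            return status
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before asking again
    return status
```

Passing `sleep` as a parameter keeps the function easy to exercise in tests without real delays.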

Predict

The Predict API runs inference on a previously loaded model. It accepts a request in the form of a tensor that is fed directly into the neural network. The output is the output tensor (or scalar) from the model. This is a blocking call.


//
// perform inference on a model.
//
// Note:
// 1. users can choose to send the tensor data in the protobuf message or
//    through a shared memory segment on a per tensor basis, the Predict
//    method will handle the decode transparently.
// 2. serializing large tensors into the protobuf message can be quite expensive,
//    based on our measurements it is recommended to use shared memory for
//    tensors larger than 256KB.
// 3. SMEdge IPC server will not use shared memory for returning output tensors,
//    i.e., the output tensor data will always be sent in byte form encoded
//    in the tensors of PredictResponse.
// 4. currently SMEdge IPC server cannot handle concurrent predict calls, all
//    these calls will be serialized under the hood. this shall be addressed
//    in a later release.
// Status Codes:
// 1. OK - prediction is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - when model not found
// 5. INVALID_ARGUMENT - when tensor types mismatch
//
rpc Predict(PredictRequest) returns (PredictResponse);
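
The notes above recommend shared memory for tensors larger than 256KB and inline bytes otherwise. A per-tensor choice between the two `oneof` fields could be sketched as follows, assuming "256KB" means 256 * 1024 bytes. The `choose_transport` function is a hypothetical helper; it only returns the name of the Tensor field to populate and does not create any shared memory segment.

```python
# "256KB" from the Predict notes, read here as 256 KiB (an assumption).
SHARED_MEMORY_THRESHOLD_BYTES = 256 * 1024


def choose_transport(tensor_nbytes: int) -> str:
    """Return which Tensor oneof field to populate for a tensor of this size."""
    if tensor_nbytes < 0:
        raise ValueError("tensor size cannot be negative")
    if tensor_nbytes > SHARED_MEMORY_THRESHOLD_BYTES:
        return "shared_memory_handle"  # avoid costly protobuf serialization
    return "byte_data"                 # small enough to send inline
```

For example, a 224 x 224 x 3 float32 image occupies 602,112 bytes and would go through shared memory under this rule, while a tensor of a few kilobytes would be sent inline.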

Input


//
// request for Predict rpc call
//
message PredictRequest {
  string name = 1;
  repeated Tensor tensors = 2;
}

//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
//    tensor_metadata - represents metadata of the shared memory segment
//    data_or_handle - represents the data of shared memory, this could be passed in two ways:
//                        a. send across the raw bytes of the multi-dimensional tensor array
//                        b. send a SharedMemoryHandle which contains the posix shared memory segment
//                            id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1; //optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}

//
// TensorMetadata represents the metadata for a tensor
//    name - name of the tensor
//    data_type  - data type of the tensor
//    shape - array of dimensions of the tensor
//
message TensorMetadata {
  string name = 1;
  DataType data_type = 2;
  repeated int32 shape = 3;
}

//
// SharedMemoryHandle represents a posix shared memory segment
//    offset - offset in bytes from the start of the shared memory segment.
//    segment_id - shared memory segment id corresponding to the posix shared memory segment.
//    size - size in bytes of shared memory segment to use from the offset position.
//
message SharedMemoryHandle {
  uint64 size = 1;
  uint64 offset = 2;
  uint64 segment_id = 3;
}
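
When a tensor is sent inline, byte_data carries the raw bytes of the contiguous multi-dimensional array, so its length has to agree with the shape in TensorMetadata. The sketch below packs a flat list of float32 values and checks that agreement. It makes several assumptions that are not stated in this section: little-endian byte order, row-major flattening done by the caller, and the label "FLOAT32" for data_type (the DataType enum is not shown here). The result is a plain dictionary that mirrors the message fields, not a generated protobuf object.

```python
import struct
from math import prod
from typing import Any, Dict, Sequence


def pack_float32_tensor(name: str, shape: Sequence[int],
                        values: Sequence[float]) -> Dict[str, Any]:
    """Pack already-flattened float32 values into a Tensor-like dictionary."""
    expected = prod(shape)  # number of elements implied by the shape
    if len(values) != expected:
        raise ValueError(f"shape {list(shape)} needs {expected} values, "
                         f"got {len(values)}")
    # "<" = little-endian (assumption), "f" = 4-byte IEEE 754 float.
    byte_data = struct.pack(f"<{expected}f", *values)
    return {
        "tensor_metadata": {
            "name": name,
            "data_type": "FLOAT32",  # hypothetical label for the DataType enum
            "shape": list(shape),
        },
        "byte_data": byte_data,
    }
```

A tensor with shape [1, 2, 2] therefore yields 16 bytes of byte_data, four bytes per element.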

Output

Note

The PredictResponse returns only Tensors, not a SharedMemoryHandle.


// response for Predict rpc call
//
message PredictResponse {
   repeated Tensor tensors = 1;
}
