Troubleshoot Neo Inference Errors
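This section describes how to prevent and resolve some common errors that you might encounter when deploying or invoking an endpoint. It applies to PyTorch 1.4.0 or later and MXNet v1.7.0 or later.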
-
If you define model_fn in your inference script, make sure that the first inference on valid input data (the warm-up inference) runs inside model_fn(). Otherwise, the following error message might appear on the terminal when you call the predict API:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from <users-sagemaker-endpoint> with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."
-
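Make sure that you set the environment variables shown in the table below. If they are not set, the following error messages might appear.
On the terminal: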
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (503) from <users-sagemaker-endpoint> with message "{ "code": 503, "type": "InternalServerException", "message": "Prediction failed" } ".
In CloudWatch:
W-9001-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - AttributeError: 'NoneType' object has no attribute 'transform'
Key                              Value
SAGEMAKER_PROGRAM                inference.py
SAGEMAKER_SUBMIT_DIRECTORY       /opt/ml/model/code
SAGEMAKER_CONTAINER_LOG_LEVEL    20
SAGEMAKER_REGION                 <your region>
-
When you create the Amazon SageMaker model, make sure to set the MMS_DEFAULT_RESPONSE_TIMEOUT environment variable to 500 or higher. Otherwise, the following error message might appear on the terminal:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from <users-sagemaker-endpoint> with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."
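The warm-up advice for model_fn can be sketched as follows. This is a minimal, framework-neutral illustration, not the official inference script: _StandInModel, load_compiled_model, and make_sample_input are hypothetical names introduced here so the example runs without PyTorch or MXNet. In a real script the loader would be the framework's own call (for PyTorch, possibly torch.jit.load) and the sample input would match the shape the model was compiled for.

```python
class _StandInModel:
    """Hypothetical stand-in for a compiled model; not part of any real library."""

    def __init__(self):
        self.calls = 0

    def __call__(self, batch):
        # The first call stands in for the slow one-time initialization that a
        # compiled model may perform on its first inference.
        self.calls += 1
        return [2.0 * x for x in batch]


def load_compiled_model(model_dir):
    # Hypothetical loader. A real PyTorch script might call torch.jit.load
    # on the compiled model file found under model_dir.
    return _StandInModel()


def make_sample_input():
    # Assumption: a valid input with the shape the model was compiled for.
    return [0.0, 1.0, 2.0]


def model_fn(model_dir):
    """Load the model and run one warm-up inference before returning it."""
    model = load_compiled_model(model_dir)
    model(make_sample_input())  # warm-up inference on valid input
    return model
```

Because the warm-up call happens while the model is loading, the cost of the first inference is paid before the endpoint starts serving requests rather than during the first invocation.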
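The environment variables from the list above could be collected in one place when the model is defined. The sketch below only builds a container definition dictionary and does not call AWS. The helper name build_primary_container and its arguments are assumptions made for illustration; the keys Image, ModelDataUrl, and Environment follow the container definition shape used by the SageMaker CreateModel API.

```python
MIN_RESPONSE_TIMEOUT = 500  # lower bound stated in the list above


def build_primary_container(image_uri, model_data_url, region,
                            response_timeout=MIN_RESPONSE_TIMEOUT):
    """Return a container definition with the variables named in this section.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if response_timeout < MIN_RESPONSE_TIMEOUT:
        raise ValueError(
            "MMS_DEFAULT_RESPONSE_TIMEOUT should be 500 or higher"
        )
    return {
        "Image": image_uri,
        "ModelDataUrl": model_data_url,
        # Environment values are passed as strings.
        "Environment": {
            "SAGEMAKER_PROGRAM": "inference.py",
            "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
            "SAGEMAKER_REGION": region,
            "MMS_DEFAULT_RESPONSE_TIMEOUT": str(response_timeout),
        },
    }
```

The returned dictionary could then be supplied as the PrimaryContainer argument of the boto3 SageMaker client's create_model call, together with ModelName and ExecutionRoleArn values of your own.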