Troubleshoot Neo inference errors
This section contains information about how to prevent and resolve some of the common errors you might encounter when deploying and/or invoking the endpoint. This section applies to PyTorch 1.4.0 or later and MXNet v1.7.0 or later.
- If you defined a `model_fn` in your inference script, be sure to run a first inference (a warm-up inference) through `model_fn()` with valid input data; otherwise, the following error message may appear in the terminal when the `predict` API is called:

  An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from <users-sagemaker-endpoint> with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."
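As a minimal sketch of this pattern for PyTorch (the model file name `model.pth` and the input shape `(1, 3, 224, 224)` are assumptions for illustration; match them to your own model), a `model_fn` that performs the warm-up inference before returning could look like:

```python
import os

import torch


def model_fn(model_dir):
    """Load the compiled model and run a warm-up inference.

    Running one inference here, with valid dummy input, keeps the first
    real invocation from timing out while the runtime initializes.
    """
    # "model.pth" is an assumed file name; use your artifact's name.
    model = torch.jit.load(
        os.path.join(model_dir, "model.pth"), map_location="cpu"
    )
    model.eval()
    # Warm-up inference: the shape (1, 3, 224, 224) is an assumption
    # for a typical image model; use your model's real input shape.
    with torch.no_grad():
        model(torch.rand(1, 3, 224, 224))
    return model
```

The key point is only that `model_fn` must exercise the model once before returning it, so the cost of initialization is paid at model load time rather than on the first `InvokeEndpoint` call.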
- Be sure to set the environment variables listed in the following table. If these variables are not set, you may see the following error messages:

  In the terminal:

  An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (503) from <users-sagemaker-endpoint> with message "{ "code": 503, "type": "InternalServerException", "message": "Prediction failed" } ".

  In CloudWatch:

  W-9001-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - AttributeError: 'NoneType' object has no attribute 'transform'
  | Key | Value |
  | --- | --- |
  | SAGEMAKER_PROGRAM | inference.py |
  | SAGEMAKER_SUBMIT_DIRECTORY | /opt/ml/model/code |
  | SAGEMAKER_CONTAINER_LOG_LEVEL | 20 |
  | SAGEMAKER_REGION | <your region> |

- Make sure the MMS_DEFAULT_RESPONSE_TIMEOUT environment variable is set to 500 or higher when you create the Amazon SageMaker model; otherwise, the following error message may appear in the terminal:

  An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from <users-sagemaker-endpoint> with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."
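A sketch of supplying these environment variables when creating the model (the region, S3 path, IAM role, and framework version below are placeholders, and the SageMaker Python SDK usage is shown commented out as an assumption about your deployment workflow):

```python
# Environment variables required by the inference container, as listed
# in the table above. "us-west-2" is a placeholder; use your own region.
neo_env = {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SAGEMAKER_REGION": "us-west-2",
    # 500 seconds or more, so the model server does not give up on the
    # container before a slow first inference completes.
    "MMS_DEFAULT_RESPONSE_TIMEOUT": "500",
}

# Hypothetical usage with the SageMaker Python SDK; model_data, role,
# and framework_version are placeholders for your own artifacts.
# from sagemaker.pytorch import PyTorchModel
#
# model = PyTorchModel(
#     model_data="s3://my-bucket/model.tar.gz",
#     role="arn:aws:iam::123456789012:role/MySageMakerRole",
#     entry_point="inference.py",
#     framework_version="1.4.0",
#     env=neo_env,
# )
```

Passing the variables through the model's `env` mapping means they are set on the container at creation time, before any endpoint is deployed from it.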