Monitoring Elastic Inference Accelerators
The following tools are provided to monitor and check the status of your Elastic Inference accelerators.
EI_VISIBLE_DEVICES
EI_VISIBLE_DEVICES is an environment variable that you use to control which Elastic Inference accelerator devices are visible to the deep learning frameworks. EI_VISIBLE_DEVICES can also be used with EI Tool. The variable is a comma-separated list of device ordinal numbers or device IDs. Use EI Tool to see all attached Elastic Inference accelerator devices.
EI_VISIBLE_DEVICES is used as follows. In this example, only the device with ordinal number 3 is used when starting the server.
EI_VISIBLE_DEVICES=3 amazonei_tensorflow_model_server --port=8502 --rest_api_port=8503 --model_name=ssdresnet --model_base_path=/home/ec2-user/models/ssdresnet
If EI_VISIBLE_DEVICES is not set, then all attached devices are visible. If EI_VISIBLE_DEVICES is set to an empty string, then none of the devices are visible.
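The three cases above (unset, empty string, explicit list) can be sketched as a small helper that prepares the environment for a child process. This is a minimal illustration, not part of the Elastic Inference tooling: the function name `env_with_visible_devices` and its `devices`/`base_env` parameters are hypothetical.

```python
import os


def env_with_visible_devices(devices, base_env=None):
    """Return a copy of the environment with EI_VISIBLE_DEVICES adjusted.

    devices=None  -> variable removed, so all attached devices are visible
    devices=[]    -> empty string, so no devices are visible
    devices=[...] -> comma-separated list of ordinals or device IDs
    """
    # Hypothetical helper: copy the environment so the caller's is untouched.
    env = dict(os.environ if base_env is None else base_env)
    if devices is None:
        env.pop("EI_VISIBLE_DEVICES", None)
    else:
        env["EI_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a framework or EI Tool.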
Using EI_VISIBLE_DEVICES with Multiple Devices
To pass multiple devices with EI_VISIBLE_DEVICES, use a comma-separated list. This list can contain device ordinal numbers or device IDs. The following command shows the use of multiple devices with EI Tool:
EI_VISIBLE_DEVICES=1,3 /opt/amazon/ei/ei_tools/bin/ei describe-accelerators -j
When using multiple Elastic Inference accelerators with EI_VISIBLE_DEVICES, the devices visible to the framework take on new ordinal numbers within the process, starting from zero. This renumbering happens only within the process. It does not affect the ordinal numbers of the devices outside of the process, and it does not affect devices that are not included in EI_VISIBLE_DEVICES.
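The renumbering can be pictured with a short sketch. It assumes that the in-process ordinals follow the order in which entries appear in EI_VISIBLE_DEVICES; the text above only states that numbering starts from zero, so that ordering is an assumption, and `in_process_ordinals` is a hypothetical helper rather than anything the tooling provides.

```python
def in_process_ordinals(visible_devices):
    """Map each in-process ordinal to the EI_VISIBLE_DEVICES entry behind it.

    Assumption: devices are renumbered in the order they are listed.
    """
    entries = [e.strip() for e in visible_devices.split(",") if e.strip()]
    return {new: original for new, original in enumerate(entries)}


# With EI_VISIBLE_DEVICES=1,3 the process would see ordinals 0 and 1.
mapping = in_process_ordinals("1,3")
```

Under that assumption, `mapping` is `{0: "1", 1: "3"}`: the device known outside the process as ordinal 3 is addressed as ordinal 1 inside it.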
Exporting EI_VISIBLE_DEVICES
To set the EI_VISIBLE_DEVICES variable for use with all child processes of the current shell process, use the following command:
export EI_VISIBLE_DEVICES=1,3
All processes subsequently launched from this shell use this value. To change this behavior, override or update the EI_VISIBLE_DEVICES value.
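The same inheritance rule applies when processes are launched from a program rather than a shell. The sketch below uses only the Python standard library; the child process is a stand-in that prints the value it sees, not an Elastic Inference command, and `child_sees` is a hypothetical helper.

```python
import os
import subprocess
import sys

# Stand-in child process: prints the EI_VISIBLE_DEVICES value it observes.
CHILD = [sys.executable, "-c",
         "import os; print(os.environ.get('EI_VISIBLE_DEVICES'))"]


def child_sees(env=None):
    """Launch the child and return the EI_VISIBLE_DEVICES value it reports."""
    result = subprocess.run(CHILD, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


# Comparable to `export EI_VISIBLE_DEVICES=1,3` in the shell.
os.environ["EI_VISIBLE_DEVICES"] = "1,3"
inherited = child_sees()

# Override the value for one child only; the parent's value is unchanged.
overridden = child_sees({**os.environ, "EI_VISIBLE_DEVICES": "0"})
```

Here `inherited` holds the exported value and `overridden` holds the per-child value, which mirrors prefixing a single shell command with `EI_VISIBLE_DEVICES=0`.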
EI Tool
EI Tool is a binary that comes with the latest version, v26.0, of the Conda DLAMI. You can also download it from the Amazon S3 bucket.
By default, running EI Tool as follows prints basic information about the Elastic Inference accelerators attached to the Amazon Elastic Compute Cloud instance.
    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ ./ei describe-accelerators
    EI Client Version: 1.5.0
    Time: Fri Nov 1 03:09:15 2019
    Attached accelerators: 2
    Device 0:
        Type: eia1.xlarge
        Id: eia-679e4c622d584803aed5b42ab6a97706
        Status: healthy
    Device 1:
        Type: eia1.xlarge
        Id: eia-6c414c6ee37a4d93874afc00825c2f28
        Status: healthy
The following topics describe options for using EI Tool from the command line.
Getting Help
There are two ways to get help when using EI Tool:
- EI Tool outputs usage information if a command is not provided.

    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ ./ei
    Usage: ei describe-accelerators [options]
    Print description of attached accelerators.
    Options:
      -j, --json    Print description of attached accelerators in JSON format.
      -h, --help    Print this help instructions and exit.
    ubuntu@ip-10-0-0-98:~/ei_tools/bin$ echo $?
    1
- You can use the -h and --help switches to output the same information.

    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ ./ei describe-accelerators -h
    Usage: ei describe-accelerators [options]
    Print description of attached accelerators.
    Options:
      -j, --json    Print description of attached accelerators in JSON format.
      -h, --help    Print this help instructions and exit.
    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ ./ei describe-accelerators --help
    Usage: ei describe-accelerators [options]
    Print description of attached accelerators.
    Options:
      -j, --json    Print description of attached accelerators in JSON format.
      -h, --help    Print this help instructions and exit.
JSON
EI Tool supports JSON output when describing attached Elastic Inference accelerators. The -j/--json switches can be used to print the accelerator state description as a JSON object.
    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ ./ei describe-accelerators -j
    {
      "ei_client_version": "1.5.0",
      "time": "Fri Nov 1 03:09:38 2019",
      "attached_accelerators": 2,
      "devices": [
        {
          "ordinal": 0,
          "type": "eia1.xlarge",
          "id": "eia-679e4c622d584803aed5b42ab6a97706",
          "status": "healthy"
        },
        {
          "ordinal": 1,
          "type": "eia1.xlarge",
          "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
          "status": "healthy"
        }
      ]
    }
    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ ./ei describe-accelerators --json
    {
      "ei_client_version": "1.5.0",
      "time": "Fri Nov 1 03:10:15 2019",
      "attached_accelerators": 2,
      "devices": [
        {
          "ordinal": 0,
          "type": "eia1.xlarge",
          "id": "eia-679e4c622d584803aed5b42ab6a97706",
          "status": "healthy"
        },
        {
          "ordinal": 1,
          "type": "eia1.xlarge",
          "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
          "status": "healthy"
        }
      ]
    }
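Because the output is a JSON object, it can be consumed from a script. The following sketch parses text in the shape shown above using Python's standard `json` module. The field names (`devices`, `ordinal`, `id`, `status`) are taken from the sample output; `list_devices` is a hypothetical helper, and the sketch assumes every device entry carries those three fields.

```python
import json


def list_devices(report_text):
    """Parse `ei describe-accelerators -j` output into (ordinal, id, status) tuples."""
    report = json.loads(report_text)
    return [(d["ordinal"], d["id"], d["status"]) for d in report["devices"]]
```

Each tuple pairs a device's ordinal and ID with its reported status, which is usually enough to choose a value for EI_VISIBLE_DEVICES.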
Errors
Errors encountered when running EI Tool are output to stderr.
The following illustrates an error encountered due to blocked outgoing traffic.
    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ ./ei describe-accelerators
    [Fri Nov 1 03:20:29 2019, 046923us] [Connect] Failed. Error message - Last Error:
    EI Error Code: [1, 4, 1]
    EI Error Description: Internal error
    EI Request ID: MX-EFBD3C87-6E8E-4E99-A855-949CB2A24E7F -- EI Accelerator ID: eia-679e4c622d584803aed5b42ab6a97706
    EI Client Version: 1.5.0
    [Fri Nov 1 03:20:44 2019, 055905us] [Connect] Failed. Error message - Last Error:
    EI Error Code: [1, 4, 1]
    EI Error Description: Internal error
    EI Request ID: MX-BD40C53D-6BBC-49A8-AF6D-27FF542DA38A -- EI Accelerator ID: eia-6c414c6ee37a4d93874afc00825c2f28
    EI Client Version: 1.5.0
    EI Client Version: 1.5.0
    Time: Fri Nov 1 03:20:44 2019
    Attached accelerators: 2
    Device 0:
        Type: eia1.xlarge
        Id: eia-679e4c622d584803aed5b42ab6a97706
        Status: not reachable
    Device 1:
        Type: eia1.xlarge
        Id: eia-6c414c6ee37a4d93874afc00825c2f28
        Status: not reachable
    ubuntu@ip-10-0-0-98:~/ei_tools/bin$ echo $?
    0
A JSON object is still output when the -j/--json switches are set. Because errors encountered when running EI Tool are written to stderr, stdout can still be parsed as a JSON object.
    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ ./ei describe-accelerators -j
    E1101 03:54:54.084712 25091 log_stream.cpp:232] [Connect] Failed. Error message - Last Error:
    EI Error Code: [1, 4, 1]
    EI Error Description: Internal error
    EI Request ID: MX-192D16B1-65CD-43AA-9CA8-0D717D134C0E -- EI Accelerator ID: eia-679e4c622d584803aed5b42ab6a97706
    EI Client Version: 1.5.0
    E1101 03:55:09.096704 25091 log_stream.cpp:232] [Connect] Failed. Error message - Last Error:
    EI Error Code: [1, 4, 1]
    EI Error Description: Internal error
    EI Request ID: MX-A4C4C90E-FC13-4D58-AA4F-54382222E8D7 -- EI Accelerator ID: eia-6c414c6ee37a4d93874afc00825c2f28
    EI Client Version: 1.5.0
    {
      "ei_client_version": "1.5.0",
      "time": "Fri Nov 1 03:55:09 2019",
      "attached_accelerators": 2,
      "devices": [
        {
          "ordinal": 0,
          "type": "eia1.xlarge",
          "id": "eia-679e4c622d584803aed5b42ab6a97706",
          "status": "not reachable"
        },
        {
          "ordinal": 1,
          "type": "eia1.xlarge",
          "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
          "status": "not reachable"
        }
      ]
    }
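A script that consumes this output needs to keep the two streams apart. The sketch below captures stdout and stderr separately with Python's `subprocess` module and parses only stdout. The helper names `describe_accelerators` and `not_healthy` are hypothetical, and the only status values assumed are the two that appear in the examples in this topic ("healthy" and "not reachable").

```python
import json
import subprocess


def describe_accelerators(command):
    """Run the command and return (parsed stdout JSON, raw stderr text)."""
    # stdout and stderr are captured separately, so error log lines
    # never end up in the text handed to the JSON parser.
    result = subprocess.run(command, capture_output=True, text=True)
    return json.loads(result.stdout), result.stderr


def not_healthy(report):
    """Return the IDs of devices whose status is anything other than 'healthy'."""
    return [d["id"] for d in report["devices"] if d["status"] != "healthy"]
```

A call might look like `describe_accelerators(["/opt/amazon/ei/ei_tools/bin/ei", "describe-accelerators", "-j"])`. In the plain-text error example earlier in this section the exit status was 0 even though both devices were not reachable, so this sketch inspects the status field rather than the return code.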
Using EI Tool with LD_LIBRARY_PATH
If there has been a change to your local LD_LIBRARY_PATH variable, you may have to modify your use of EI Tool. Include the following LD_LIBRARY_PATH value when using EI Tool:
LD_LIBRARY_PATH=/opt/amazon/ei/ei_tools/lib
The following example uses this value with a single Elastic Inference accelerator:
    EI_VISIBLE_DEVICES=1 LD_LIBRARY_PATH=/opt/amazon/ei/ei_tools/lib /opt/amazon/ei/ei_tools/bin/ei describe-accelerators -j
    {
      "ei_client_version": "1.5.3",
      "time": "Tue Nov 19 16:57:21 2019",
      "attached_accelerators": 1,
      "devices": [
        {
          "ordinal": 0,
          "type": "eia1.xlarge",
          "id": "eia-7f127e2640e642d48a7d4673a57581be",
          "status": "healthy"
        }
      ]
    }
Health Check
You can use Health Check to monitor the health of your Elastic Inference accelerators. The exit code of the Health Check command is 0 if all accelerators are healthy and reachable. If they are not, then the exit code is 1.
    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ ./health_check
    EI Client Version: 1.5.0
    Device 0: healthy
    Device 1: healthy
    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ echo $?
    0
The following illustrates an error caused by blocked traffic when running Health Check.
    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ ./health_check
    [Fri Nov 1 07:00:47 2019, 134735us] [Connect] Failed. Error message - Last Error:
    EI Error Code: [1, 4, 1]
    EI Error Description: Internal error
    EI Request ID: MX-A0558121-49D8-48DB-8CCB-9322D78BFCA5 -- EI Accelerator ID: eia-679e4c622d584803aed5b42ab6a97706
    EI Client Version: 1.5.0
    Device 0: not reachable
    [Fri Nov 1 07:01:02 2019, 143732us] [Connect] Failed. Error message - Last Error:
    EI Error Code: [1, 4, 1]
    EI Error Description: Internal error
    EI Request ID: MX-AC879033-FB46-46EE-B2B6-A76F5E674E0D -- EI Accelerator ID: eia-6c414c6ee37a4d93874afc00825c2f28
    EI Client Version: 1.5.0
    Device 1: not reachable
    ubuntu@ip-10-0-0-98:/opt/amazon/ei/ei_tools/bin$ echo $?
    1
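Since Health Check reports its result through the exit code, it can gate other work in a script. The following is a minimal sketch using Python's `subprocess` module; `accelerators_healthy` is a hypothetical helper, and the command to run is passed in by the caller.

```python
import subprocess


def accelerators_healthy(command):
    """Return True when the health check command exits with status 0."""
    # Output is captured only to keep it off the caller's terminal;
    # the decision is based on the exit code alone.
    return subprocess.run(command, capture_output=True).returncode == 0
```

For example, `accelerators_healthy(["/opt/amazon/ei/ei_tools/bin/health_check"])` would return False in the blocked-traffic case above, where the exit code is 1.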