Monitoring interactive endpoints
With Amazon EMR on EKS version 6.10 and later, interactive endpoints emit Amazon CloudWatch metrics for monitoring and troubleshooting kernel lifecycle operations. Metrics are triggered by interactive clients, such as EMR Studio or self-hosted Jupyter notebooks. Each operation supported by interactive endpoints has metrics associated with it. The operations are modeled as dimensions of each metric, as shown in the table below. Metrics emitted by interactive endpoints are visible in your account under the custom namespace EMRContainers.
Metric | Description | Unit |
---|---|---|
RequestCount | Cumulative number of requests for an operation processed by the interactive endpoint. | Count |
RequestLatency | The time between when a request arrives at the interactive endpoint and when the endpoint sends a response. | Millisecond |
4XXError | Emitted when a request for an operation results in a 4xx error during processing. | Count |
5XXError | Emitted when a request for an operation results in a 5xx server-side error. | Count |
KernelLaunchSuccess | Applicable only to the CreateKernel operation. Indicates the cumulative number of successful kernel launches up to and including this request. | Count |
KernelLaunchFailure | Applicable only to the CreateKernel operation. Indicates the cumulative number of kernel launch failures up to and including this request. | Count |
Each interactive endpoint metric has the following dimensions attached to it:
- ManagedEndpointId – Identifier for the interactive endpoint
- OperationName – The operation triggered by the interactive client
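The two dimensions above can also be used as a filter when discovering which metrics an endpoint has emitted. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) is installed and credentials are configured. The helper names `endpoint_dimensions` and `list_metric_names` are illustrative and do not come from the EMR documentation.

```python
NAMESPACE = "EMRContainers"  # custom namespace named in this guide


def endpoint_dimensions(managed_endpoint_id, operation_name):
    """Build the two dimensions attached to every interactive endpoint metric."""
    return [
        {"Name": "ManagedEndpointId", "Value": managed_endpoint_id},
        {"Name": "OperationName", "Value": operation_name},
    ]


def list_metric_names(managed_endpoint_id, operation_name, client=None):
    """Return the sorted metric names CloudWatch holds for one endpoint and operation."""
    if client is None:
        import boto3  # imported lazily so endpoint_dimensions works without boto3

        client = boto3.client("cloudwatch")
    paginator = client.get_paginator("list_metrics")
    pages = paginator.paginate(
        Namespace=NAMESPACE,
        Dimensions=endpoint_dimensions(managed_endpoint_id, operation_name),
    )
    return sorted({m["MetricName"] for page in pages for m in page["Metrics"]})
```

For example, `list_metric_names("<your-endpoint-id>", "CreateKernel")` would list the metrics recorded for kernel launches on that endpoint, where the endpoint ID is a placeholder for your own value.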
Possible values for the OperationName dimension are shown in the following table:
OperationName | Operation description |
---|---|
CreateKernel | Request that the interactive endpoint start a kernel. |
 | Request that the interactive endpoint list the kernels that have been previously started using the same session token. |
 | Request that the interactive endpoint get details about a specific kernel that has been previously started. |
 | Request that the interactive endpoint establish connectivity between the notebook client and the kernel. |
 | Publish |
 | Request that the interactive endpoint list the available kernel specs. |
 | Request that the interactive endpoint get the kernel specs of a kernel that has been previously launched. |
 | Request that the interactive endpoint get specific resources associated with the kernel specs that have been previously launched. |
Examples
To access the total number of kernels launched for an interactive endpoint on a given day:
- Select the custom namespace: EMRContainers
- Select your ManagedEndpointId and set OperationName to CreateKernel
- The RequestCount metric with the statistic SUM and a period of 1 day provides all the kernel launch requests made in the last 24 hours.
- The KernelLaunchSuccess metric with the statistic SUM and a period of 1 day provides all the successful kernel launch requests made in the last 24 hours.
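The same query can be issued programmatically. The sketch below assumes boto3 and uses the CloudWatch `get_metric_statistics` call. The function names `daily_sum_request` and `daily_sum` are hypothetical helpers introduced here for illustration.

```python
from datetime import datetime, timedelta, timezone

ONE_DAY_SECONDS = 86400


def daily_sum_request(metric_name, managed_endpoint_id, end_time=None):
    """Build get_metric_statistics parameters for a 1-day SUM on CreateKernel."""
    end = end_time or datetime.now(timezone.utc)
    return {
        "Namespace": "EMRContainers",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "ManagedEndpointId", "Value": managed_endpoint_id},
            {"Name": "OperationName", "Value": "CreateKernel"},
        ],
        "StartTime": end - timedelta(days=1),
        "EndTime": end,
        "Period": ONE_DAY_SECONDS,
        "Statistics": ["Sum"],
    }


def daily_sum(metric_name, managed_endpoint_id, client=None):
    """Return the metric's total over the last 24 hours."""
    if client is None:
        import boto3  # imported lazily so the request builder works without boto3

        client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(
        **daily_sum_request(metric_name, managed_endpoint_id)
    )
    # The 24-hour window may straddle two period boundaries, so add up
    # every datapoint that comes back rather than assuming exactly one.
    return sum(point["Sum"] for point in response["Datapoints"])
```

Calling `daily_sum("RequestCount", "<your-endpoint-id>")` and `daily_sum("KernelLaunchSuccess", "<your-endpoint-id>")` would give the two totals described above. The endpoint ID is a placeholder.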
To access the number of kernel failures for an interactive endpoint on a given day:
- Select the custom namespace: EMRContainers
- Select your ManagedEndpointId and set OperationName to CreateKernel
- The KernelLaunchFailure metric with the statistic SUM and a period of 1 day provides all the failed kernel launch requests made in the last 24 hours. You can also select the 4XXError and 5XXError metrics to see what kind of kernel launch failure occurred.
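To retrieve the failure count together with the 4XXError and 5XXError counts in one request, the CloudWatch `get_metric_data` call can be used. This is a rough sketch, assuming boto3. The names `failure_queries` and `failure_totals` are illustrative, and the query IDs `m0`, `m1`, `m2` are arbitrary labels chosen to satisfy the CloudWatch rule that an ID starts with a lowercase letter.

```python
from datetime import datetime, timedelta, timezone

FAILURE_METRICS = ("KernelLaunchFailure", "4XXError", "5XXError")


def failure_queries(managed_endpoint_id):
    """Build one 1-day SUM query per failure-related metric for CreateKernel."""
    dimensions = [
        {"Name": "ManagedEndpointId", "Value": managed_endpoint_id},
        {"Name": "OperationName", "Value": "CreateKernel"},
    ]
    return [
        {
            "Id": f"m{index}",
            "MetricStat": {
                "Metric": {
                    "Namespace": "EMRContainers",
                    "MetricName": name,
                    "Dimensions": dimensions,
                },
                "Period": 86400,
                "Stat": "Sum",
            },
        }
        for index, name in enumerate(FAILURE_METRICS)
    ]


def failure_totals(managed_endpoint_id, client=None):
    """Return {metric name: total over the last 24 hours}."""
    if client is None:
        import boto3  # imported lazily so failure_queries works without boto3

        client = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = client.get_metric_data(
        MetricDataQueries=failure_queries(managed_endpoint_id),
        StartTime=end - timedelta(days=1),
        EndTime=end,
    )
    names_by_id = {f"m{i}": name for i, name in enumerate(FAILURE_METRICS)}
    return {
        names_by_id[result["Id"]]: sum(result["Values"])
        for result in response["MetricDataResults"]
    }
```

Comparing the 4XXError and 5XXError totals against KernelLaunchFailure gives a quick indication of whether failures were mostly client-side or server-side. For longer time ranges the response may be paginated, which this sketch does not handle.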